
The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

Our chapter on hypothesis testing contained a claim that a P-value of 0.05 was only equivalent to ‘weak evidence’. The reasoning for this is partly based on Bayes factors: P = 0.05 can be shown to correspond, for some reasonable choices of prior distribution under the alternative hypothesis, to Bayes factors between 2.4 and 3.4, which Table 11.3 suggests is weak evidence. As we saw in Chapter 10, this led to a proposal to reduce the P-value required for claiming a ‘discovery’ to 0.005.

Unlike null-hypothesis significance testing, Bayes factors treat the two hypotheses symmetrically, and so can actively support a null hypothesis. And if we are willing to put prior probabilities on hypotheses, we might even calculate posterior probabilities of alternative theories for how the world works. Suppose, based on theoretical grounds alone, we judged it 50:50 whether the Higgs boson existed, corresponding to prior odds of 1. The data discussed in the last chapter gave a P-value of around 1/3,500,000, and this can be converted to a maximum Bayes factor of around 80,000 in favour of the Higgs boson, which is very strong evidence even according to legal usage.
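
The book does not spell out the calculation behind these conversions, but one standard calibration that reproduces the quoted figures is the bound that the Bayes factor in favour of the alternative can be at most 1/(-e · p · ln p), a result due to Sellke, Bayarri and Berger. A minimal Python sketch, assuming that bound rather than the author's own calculation:

```python
import math

def max_bayes_factor(p: float) -> float:
    """Upper bound on the Bayes factor in favour of the alternative,
    using the -e * p * ln(p) calibration (valid for P-values below 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

print(round(max_bayes_factor(0.05), 1))        # about 2.5: 'weak evidence' on the Table 11.3 scale
print(round(max_bayes_factor(1 / 3_500_000)))  # about 85,000, the 'around 80,000' quoted in the text
```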


Table 11.3: Kass and Raftery's scale for interpretation of Bayes factors in favour of a hypothesis.8


When combined with prior odds of 1, this turns into posterior odds of 80,000 to 1 for the existence of the Higgs boson, or a probability of 0.99999. But neither the legal nor scientific community generally approve of this kind of analysis, even if it can be used for Richard III.
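
The arithmetic in that first step is simply posterior odds = prior odds × Bayes factor, with the odds then converted back to a probability. A short sketch, continuing with the assumed figures above:

```python
def posterior_probability(prior_odds: float, bayes_factor: float) -> float:
    """Posterior odds are prior odds times the Bayes factor; then convert odds to a probability."""
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# A 50:50 prior (odds of 1) combined with a Bayes factor of roughly 80,000
print(posterior_probability(1, 80_000))   # 0.9999875..., which rounds to 0.99999
```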

 

 

An Ideological Battle


In this book we have moved from the informal examination of data, through communication with summary statistics, to using probability models to arrive at confidence intervals, P-values and so on. These standard inferential tools, with which generations of students have occasionally struggled, are known as ‘classical’ or ‘frequentist’ methods, since they are based on long-run sampling properties of statistics.

The alternative Bayesian approach is based on fundamentally different principles. As we have seen, external evidence about the unknown quantities, expressed as a prior distribution, is combined with evidence from the underlying probability model for the data, known as the likelihood, to give a final, posterior distribution which forms the basis for all conclusions.
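
As a concrete, if artificial, illustration of that mechanics (the example is mine, not the book's): a prior over an unknown proportion is multiplied point by point by a binomial likelihood on a grid, then renormalized to give the posterior.

```python
import numpy as np

# Hypothetical example: an unknown proportion theta, after observing 7 successes in 10 trials
theta = np.linspace(0.001, 0.999, 999)     # grid of candidate values for theta
prior = np.ones_like(theta)                # flat prior; genuine external evidence would change this
prior /= prior.sum()

likelihood = theta**7 * (1 - theta)**3     # binomial likelihood (constant factor dropped)

posterior = prior * likelihood             # Bayes' theorem, up to a normalizing constant
posterior /= posterior.sum()               # renormalize so the posterior sums to one

print(theta[np.argmax(posterior)])         # posterior mode, roughly 0.7 under the flat prior
```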

If we seriously adopt this statistical philosophy, the sampling properties of statistics become irrelevant. And so having spent years learning that a 95% confidence interval does not mean there is a 95% probability that the true value lies in the interval,* the poor student now has to forget all that: a Bayesian 95% uncertainty interval has precisely the latter meaning.
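
A small simulation, again my own illustration rather than anything in the book, makes the frequentist reading concrete: compute a standard 95% interval for a mean from many repeated samples and count how often it contains the true value; roughly 95% of the intervals do, which is a statement about the long-run procedure rather than about any single interval.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 10.0, 2.0, 50, 10_000
covered = 0

for _ in range(trials):
    sample = rng.normal(true_mean, sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)                 # estimated standard error of the mean
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += lo <= true_mean <= hi                     # does this interval contain the truth?

print(covered / trials)   # close to 0.95: the long-run coverage of the procedure
```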

But the argument about the ‘correct’ way to do statistical inference is even more complex than a simple dispute between frequentists and Bayesians. Just like political movements, each school splits into multiple factions who have often been in conflict with each other.

In the 1930s, a three-cornered fight erupted into the public arena. The forum was the Royal Statistical Society, which then as now meticulously recorded and published the discussion of papers presented at its meetings. When Jerzy Neyman proposed his theory of confidence intervals in 1934, Arthur Bowley, a strong advocate of the Bayesian approach, then known as inverse probability, said, ‘I am not at all sure that the “confidence” is not a “confidence trick”,’ and followed this by suggesting a Bayesian approach was necessary: ‘Does that really take us any further?… Does it really lead us towards what we need—the chance that in the universe which we are sampling the proportion is within… certain limits? I think it does not.’ The derisive linking of confidence intervals with confidence tricks continued in the subsequent decades.

The following year, in 1935, open warfare then broke out between two non-Bayesian camps, with Ronald Fisher on one side, and Jerzy Neyman and Egon Pearson on the other. The Fisherian approach was based on estimation using the ‘likelihood’ function, which expresses the relative support given to the different parameter values by the data, and hypothesis testing was based on P-values. In contrast, the Neyman—Pearson approach, which as we have seen was known as ‘inductive behaviour’, was very much focused on decision-making: if you decide the true answer is in a 95% confidence interval, then you will be right 95% of the time, and you should control Type I and Type II errors when hypothesis testing. They even suggested you should ‘accept’ the null hypothesis when it was included in the 95% confidence interval, a concept that was anathema to Fisher (and has subsequently been rejected by the statistical community).

Fisher first accused Neyman ‘of falling into the series of misunderstandings which his paper revealed’. Pearson then rose to Neyman’s defence, saying that ‘while he knew there was a widespread belief in Professor Fisher’s infallibility, he must, in the first place, beg leave to question the wisdom of accusing a fellow-worker of incompetence without, at the same time, showing that he had succeeded in mastering the argument.’ The acrimonious dispute between Fisher and Neyman continued for decades.

The struggle for statistical ideological supremacy continued after the Second World War, but over time the more standard, non-Bayesian schools have resolved into a pragmatic mix, with experiments generally designed using a Neyman—Pearson approach of Type I and Type II errors, but then analysed from a Fisherian perspective using P-values as measures of evidence. As we have seen in the context of clinical trials, this strange amalgam seems to work fairly well, leading prominent (Bayesian) statistician Jerome Cornfield to remark, ‘the paradox is that a solid structure of permanent value has, nevertheless, emerged, lacking only the firm logical foundation on which it was originally thought to have been built.’9

The purported advantages of conventional statistical methods over Bayesianism include the apparent separation of the evidence in the data from subjective factors; general ease in computation; wide acceptability and established criteria for ‘significance’; availability of software; and existence of robust methods that do not have to make strong assumptions about the shape of distributions. Whereas Bayesian enthusiasts would claim that the very ability to make use of external, and even explicitly subjective, elements is what enables more powerful inferences and predictions to be made.

The statistical community used to engage in lengthy vituperative arguments about the foundations of the subject, but now a guarded truce has been called and a more ecumenical approach is the norm, with methods chosen according to the practical context rather than their ideological credentials derived from Fisher, Neyman—Pearson or Bayes. This seems a sensible and pragmatic compromise in an argument that can appear somewhat obscure to non-statisticians. My personal view is that, while they may well disagree about the fundamentals of their subject, reasonable statisticians will generally come to similar conclusions. The problems that arise in statistical science do not generally come from the philosophy underlying the precise methods that are used. Instead, they are more likely to be due to inadequate design, biased data, inappropriate assumptions and, perhaps most important, poor scientific practice. And in the next chapter we shall take a look at this dark side of statistics.*

 

 

Summary


• Bayesian methods combine evidence from data (summarized by the likelihood) with initial beliefs (known as the prior distribution) to produce a posterior probability distribution for the unknown quantity.
