
The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

• Bayes’ theorem for two competing hypotheses can be expressed as posterior odds = likelihood ratio × prior odds; a worked numerical sketch follows this summary list.

• The likelihood ratio expresses the relative support for two hypotheses from an item of evidence, and is sometimes used to summarize forensic evidence in criminal trials.

• When the prior distribution comes from some physical sampling process, Bayesian methods are uncontroversial. Generally, however, a degree of judgement is necessary.

• Hierarchical models allow evidence to be pooled across multiple small analyses that are assumed to have parameters in common.

• Bayes factors are the equivalent of likelihood ratios for scientific hypotheses, and are a controversial substitute for null-hypothesis significance testing.

• The theory of statistical inference has a long history of controversy, but issues of quality of data and scientific reliability are more important.
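
To make the odds form of Bayes’ theorem concrete, here is a minimal sketch in Python. The prior odds and likelihood ratio below are made-up illustrative numbers, not figures from any case discussed in the book:

```python
# Posterior odds = likelihood ratio x prior odds, with illustrative numbers.
prior_odds = 1 / 1000        # assumed prior odds for the hypothesis
likelihood_ratio = 100       # evidence assumed 100x more likely under the hypothesis

posterior_odds = likelihood_ratio * prior_odds           # = 0.1
posterior_prob = posterior_odds / (1 + posterior_odds)   # convert odds to probability

print(f"posterior odds = {posterior_odds:.3f}")          # 0.100
print(f"posterior probability = {posterior_prob:.3f}")   # 0.091
```

Even evidence that is 100 times more likely under one hypothesis than the other leaves that hypothesis improbable if its prior odds were very low.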

 

 

CHAPTER 12


How Things Go Wrong

 

Does extra-sensory perception (ESP) exist?

 

In 2011, the eminent American social psychologist Daryl Bem published a major paper in a prominent psychology journal that featured the following experiment. A hundred students sat in front of a computer screen showing two curtains, and chose whether the left or the right curtain hid an image. The curtains then ‘opened’ to reveal whether they were correct, and this was repeated for a series of 36 images. The twist was that, unknown to the participants, the position of the image was determined at random after the subject had made their choice, and so any excess of correct choices over chance would be ascribed to precognition of where the image would appear.

Bem reported that, instead of the expected 50% success rate under the null hypothesis of no precognition, subjects chose correctly 53% of the time when an erotic image was shown (P = 0.01). The paper contained the results of eight further experiments in precognition, with over 1,000 participants and spread over ten years, and he observed statistically significant results in favour of precognition in eight of the nine studies. Is this convincing proof that extra-sensory perception (ESP) exists?
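
As a rough plausibility check on the arithmetic, a one-sided binomial test can compare a 53% hit rate against the 50% expected by chance. The exact number of trials with erotic images is not stated here, so the total below is purely an assumed figure for illustration:

```python
from scipy.stats import binomtest

# Assumed total of erotic-image trials; the true figure is not given in the text.
n_trials = 1560
n_hits = round(0.53 * n_trials)   # 53% correct, as reported

result = binomtest(n_hits, n_trials, p=0.5, alternative='greater')
print(f"hits = {n_hits}/{n_trials}, one-sided P = {result.pvalue:.3f}")
# With these assumed numbers the one-sided P-value comes out near 0.01.
```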

This book has, I hope, illustrated some powerful applications of statistical science in solving real-world problems, carried out with skill and care by practitioners who are mindful of its limitations and potential pitfalls. But the real world is not always so worthy of admiration. It is now time to look at what happens when the science and art of statistics are not carried out so well. We shall then look at how Bem’s paper was received and critiqued.

There is a reason so much attention is now being paid to poor-quality statistical practice: it has been blamed for what is known as the reproducibility crisis in science.

 

 

The ‘Reproducibility Crisis’


Chapter 10 explored John Ioannidis’s notorious 2005 claim that most published research findings were false, and since then many other researchers have argued that there is a fundamental lack of reliability in the published scientific literature. Scientists have failed to replicate studies done by their peers, suggesting that the original studies are not as trustworthy as previously thought. Although initially focused on medicine and biology, these accusations have since spread to psychology and other social sciences, although the actual percentage of claims that are either exaggerated or false is contested.

Ioannidis’s original claim was based on a theoretical model, but an alternative approach is to take past studies and try to replicate them, in the sense of conducting similar experiments and seeing if similar results are observed. The Reproducibility Project was a major collaboration in which 100 psychological studies were repeated with larger sample sizes, and so had higher power to detect a true effect if it existed. The project revealed that whereas 97% of the original studies had statistically significant results, only 36% of the replications did.1

Unfortunately, this was widely reported as implying that the remaining 63% of ‘significant’ studies were false claims, but this falls into the trap of making a strict division between studies that are either significant or not-significant. Distinguished US statistician and blogger Andrew Gelman has pointed out that ‘the difference between “significant” and “not significant” is not itself statistically significant’.2 In fact only 23% of original and replication studies had results that were significantly different from each other, which is perhaps a more appropriate estimate of the proportion of the original studies with exaggerated or false claims.
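
Gelman’s point is easy to demonstrate numerically. The following sketch uses invented estimates and standard errors: study A is comfortably ‘significant’ and study B is not, yet the difference between the two estimates is nowhere near significant:

```python
from math import sqrt
from scipy.stats import norm

# Invented summary statistics for two studies of the same effect.
est_a, se_a = 2.5, 1.0   # study A: z = 2.5, two-sided P ~ 0.012 ('significant')
est_b, se_b = 1.4, 1.0   # study B: z = 1.4, two-sided P ~ 0.16  ('not significant')

se_diff = sqrt(se_a**2 + se_b**2)     # standard error of the difference
z_diff = (est_a - est_b) / se_diff    # z ~ 0.78
p_diff = 2 * norm.sf(abs(z_diff))     # two-sided P ~ 0.44

print(f"z for the difference = {z_diff:.2f}, two-sided P = {p_diff:.2f}")
```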

Rather than thinking in terms of significant or not-significant as determining a ‘discovery’, it would be better to focus on the sizes of the estimated effects. The Reproducibility Project found that replication effects were on average in the same direction as the original studies, but were around half their magnitude. This points to an important bias in the scientific literature: a study which has found something ‘big’, at least some of which is likely to have been luck, is likely to lead to a prominent publication. In an analogy to regression to the mean, this might be termed ‘regression to the null’, where early exaggerated estimates of effects later decrease in magnitude towards the null hypothesis.
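
A small simulation can show how selecting only ‘significant’ results produces this regression to the null. All the numbers below (true effect, standard error, significance threshold) are assumptions chosen just to illustrate the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect, se, n_studies = 0.2, 0.1, 10_000
originals = rng.normal(true_effect, se, n_studies)          # noisy original estimates
published = originals[originals / se > 1.96]                # only 'significant' ones published
replications = rng.normal(true_effect, se, published.size)  # unbiased replications

print(f"mean published estimate:   {published.mean():.2f}")    # overshoots 0.2
print(f"mean replication estimate: {replications.mean():.2f}") # regresses back to ~0.2
```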

The claimed reproducibility crisis is a complex issue, rooted in the excessive pressure put on researchers to make ‘discoveries’ and publish their results in prestigious scientific journals, all of which is crucially dependent on finding statistically significant results. No single institution or profession is to blame. We have also shown, when discussing hypothesis testing, that even if statistical practice were perfect, the rarity of true and substantial effects means a substantial proportion of results that are claimed to be ‘significant’ are inevitably going to be false positives (Figure 10.5). But, as we now see, statistical practice is often far from perfect.
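
The arithmetic behind that inevitability is worth repeating. With illustrative inputs in the spirit of Figure 10.5 (the exact figures below are assumptions, not taken verbatim from it): if only 10% of tested hypotheses are true, tests use a 5% significance level, and power is 80%, then over a third of ‘significant’ findings are false positives even when every test is done correctly:

```python
# Assumed inputs in the spirit of Figure 10.5 (not taken verbatim from it).
prevalence, alpha, power = 0.10, 0.05, 0.80

true_positives = prevalence * power           # 8% of all tests
false_positives = (1 - prevalence) * alpha    # 4.5% of all tests
share_false = false_positives / (true_positives + false_positives)

print(f"share of 'significant' results that are false positives: {share_false:.0%}")  # 36%
```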


Statistics can be done badly at every stage of the PPDAC cycle. Right from the beginning, we may try to tackle a Problem that just cannot be answered with the information available. For example, if we set out to find out why teenage pregnancy rates have fallen so dramatically in the UK over the last decade, nothing in the observed data can offer an explanation.*

Then the Planning can go wrong, for example by

• Choosing a sample that is convenient and inexpensive rather than representative, for example telephone polls before elections.

• Asking leading questions or using misleading wording in surveys, such as ‘How much do you think you can save by buying online?’

• Failing to make a fair comparison, such as assessing homeopathy by only observing volunteers for the therapy.

• Designing a study that is too small and so has low power, meaning that true effects are likely to be missed (see the power sketch after this list).

• Failing to collect data on potential confounders, lack of blinding in randomized trials, and so on.
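
To see how quickly power falls away in small studies, here is a minimal sketch using a two-sample z-test approximation; the effect size and sample sizes are assumed purely for illustration:

```python
from math import sqrt
from scipy.stats import norm

def power_two_sample(effect_sd: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sample z-test for a standardized effect."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_effect = effect_sd * sqrt(n_per_arm / 2)  # expected z under the alternative
    return float(norm.sf(z_alpha - z_effect))

# Assumed true effect of 0.3 standard deviations:
for n in (50, 100, 350):
    print(f"n = {n:3d} per arm -> power = {power_two_sample(0.3, n):.2f}")
# n = 50 detects such an effect barely a third of the time; n = 350 almost always.
```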

 

As Ronald Fisher famously put it, ‘To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.’3

When it comes to collecting Data, common problems include excessive missing responses, people dropping out of the study, recruitment being much slower than anticipated, and the challenge of simply getting everything coded up efficiently. All these issues should have been foreseen and avoided by careful piloting.
