The Art of Statistics How to Learn from Data(15)
Author: David Spiegelhalter

Here are some more examples of how easy it might be to believe a causal link, when some other factor is influencing events:

• Many children are diagnosed with autism soon after being vaccinated. Does vaccination cause autism? No, these are events that happen at around the same age and inevitably there are some coincidental close occurrences.

• Out of the total number of people who die each year, a smaller proportion are left-handed than in the general population. Does that mean that left-handers live longer? No, this happens because people who are dying now were born in an era when children were forced to change to being right-handed, and so there are simply fewer older left-handers.¹⁰

• The average age at which popes die is older than that of the general population. Does this mean that being a pope helps you live longer? No, popes are selected from a group who have not died young (otherwise they could not be candidates).¹¹
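The last example can be checked with a small simulation. The sketch below is only illustrative: the lifespan distribution, the minimum age of 60 and the function name are all invented assumptions, not figures from the studies cited. It shows that simply requiring someone to be alive at a given age raises the average age at death of the selected group, with no benefit from the job itself.

```python
import random


def simulate_selection_bias(n_people=100_000, min_age_selected=60, seed=1):
    """Compare mean age at death overall and among those alive at selection age.

    All numbers are hypothetical: lifespans come from a rough bell curve
    (mean 72, sd 15), clipped to the range 0 to 105 years.
    """
    rng = random.Random(seed)
    ages_at_death = [
        min(105.0, max(0.0, rng.gauss(72, 15))) for _ in range(n_people)
    ]
    # Only people who survive to the selection age can ever be chosen.
    selected = [age for age in ages_at_death if age >= min_age_selected]
    mean_all = sum(ages_at_death) / len(ages_at_death)
    mean_selected = sum(selected) / len(selected)
    return mean_all, mean_selected


mean_all, mean_selected = simulate_selection_bias()
print(f"Mean age at death, everyone:           {mean_all:.1f}")
print(f"Mean age at death, alive at selection: {mean_selected:.1f}")
```

The selected group dies older on average purely because anyone who died young was excluded before selection took place.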

 

The myriad ways we can be caught out might encourage the idea that we can never conclude causation from anything other than a randomized experiment. But, perhaps ironically, this view was counteracted by the man responsible for the first modern randomized clinical trial.

 

 

Can We Ever Conclude Causation from Observational Data?


Austin Bradford Hill was a brilliant British applied statistician who was at the forefront of two world-changing scientific advances: he designed the streptomycin clinical trial mentioned earlier in the chapter, which essentially set the standards for all subsequent RCTs, and with Richard Doll in the 1950s he led the research that eventually confirmed the link between smoking and lung cancer. In 1965 he set out a list of criteria that needed to be considered before concluding that an observed link between an exposure and an outcome was causal, where an exposure might comprise anything from chemicals in the environment to habits such as smoking or lack of exercise.

These criteria have subsequently been much debated, and the version shown below was developed by Jeremy Howick and colleagues, separated into what they call direct, mechanistic and parallel evidence.¹²


Direct evidence:

1. The size of the effect is so large that it cannot be explained by plausible confounding.

2. There is appropriate temporal and/or spatial proximity, in that cause precedes effect and effect occurs after a plausible interval, and/or cause occurs at the same site as the effect.

3. Dose responsiveness and reversibility: the effect increases as the exposure increases, and the evidence is even stronger if the effect reduces upon reduction of the dose.

 

Mechanistic evidence:

4. There is a plausible mechanism of action, which could be biological, chemical, or mechanical, with external evidence for a ‘causal chain’.

 

Parallel evidence:

5. The effect fits with what is known already.

6. The effect is found when the study is replicated.

7. The effect is found in similar, but not identical, studies.

 

These guidelines might enable causation to be determined from anecdotal evidence, even in the absence of a randomized trial. For example, mouth ulcers have been observed to occur after aspirin is rubbed within the mouth, say to relieve tooth pain. The effect is dramatic (obeys guideline 1), occurs where rubbed (2), is a plausible response to an acidic compound (4), is not contradicted by current science and is similar to the known effect of aspirin in causing stomach ulcers (5), and has been repeatedly observed in multiple patients (6). So five out of seven guidelines are satisfied, the remaining two have not been tested, and so it is reasonable to conclude this is a genuine adverse reaction to the drug.


The Bradford Hill criteria apply to general scientific conclusions for populations. But we may also be interested in individual cases, say in civil litigation where courts need to decide whether a particular exposure (say the asbestos encountered in a job) caused a negative outcome in a specific person (say John Smith’s lung cancer). It can never be established with absolute certainty that the asbestos was the cause of the cancer, since it cannot be proved that the cancer would not have occurred without the exposure. Nevertheless, some courts have accepted that, on the ‘balance of probabilities’, a direct causal link has been established if the relative risk associated with the exposure is greater than two. Why two?

Presumably the reasoning behind this conclusion is as follows:

1. Suppose that, in the normal run of things, out of 1,000 men like John Smith, 10 would get lung cancer. If asbestos more than doubles the risk, then had these 1,000 men been exposed to asbestos, perhaps 25 would have developed lung cancer.

2. So of those exposed to asbestos who go on to develop lung cancer, fewer than half would have got lung cancer if they had not been exposed.

3. So more than half of the lung cancers in this group will have been caused by the asbestos.

4. Since John Smith is one of this group of people, then on the balance of probabilities his lung cancer was caused by the asbestos.
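The arithmetic behind these four steps can be written out as a short sketch. It uses the illustrative figures from step 1 (10 cases per 1,000 unexposed, 25 per 1,000 exposed) and follows the simplifying assumption implicit in the argument, namely that the excess cases are exactly the ones caused by the exposure. The function names are hypothetical.

```python
def attributable_fraction(baseline_cases, exposed_cases):
    """Return (relative risk, fraction of exposed cases attributed to exposure).

    Both counts must refer to groups of the same size, e.g. per 1,000 men.
    """
    if baseline_cases <= 0 or exposed_cases <= 0:
        raise ValueError("case counts must be positive")
    relative_risk = exposed_cases / baseline_cases
    excess_cases = exposed_cases - baseline_cases
    return relative_risk, excess_cases / exposed_cases


def fraction_from_relative_risk(relative_risk):
    """Same fraction expressed through the relative risk: 1 - 1/RR."""
    return 1 - 1 / relative_risk


rr, fraction = attributable_fraction(baseline_cases=10, exposed_cases=25)
print(f"Relative risk: {rr}")
print(f"Fraction of exposed cases attributed to asbestos: {fraction:.0%}")
print(f"Fraction when relative risk is exactly 2: {fraction_from_relative_risk(2):.0%}")
```

With these figures, 15 of the 25 cases among the exposed are excess cases, so 60% are attributed to asbestos. A relative risk of exactly two gives 50%, which is why two marks the threshold for 'more likely than not'.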

 

This kind of argument has led to a new area of study known as forensic epidemiology, which tries to use evidence derived from populations to draw conclusions about what might have caused individual events to occur. In effect this discipline has been forced into existence by people seeking compensation, but this is a very challenging area for statistical reasoning about causation.


The appropriate handling of causation remains contested within the field of statistics, whether it concerns pharmaceuticals or big ears, and without randomization it is rare to be able to draw confident conclusions. One imaginative approach takes advantage of the fact that many genes are spread essentially at random through the population, so it is as if we have been randomized to our specific version at conception. This is known as Mendelian randomization, after Gregor Mendel, who developed the modern idea of genetics.¹³
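The idea can be illustrated with simulated data. This is a minimal sketch under invented assumptions, not a method taken from the text: a gene variant is assigned at random, it influences the outcome only through the exposure, and an unobserved confounder affects both exposure and outcome. All effect sizes are made up. The gene-based figure is a simple ratio estimate (sometimes called the Wald ratio), chosen here for brevity.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_mendelian_randomization(n=20_000, true_effect=0.5, seed=0):
    """Return (naive estimate, gene-based estimate) of the exposure's effect."""
    rng = random.Random(seed)
    gene, exposure, outcome = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0   # variant 'randomized' at conception
        u = rng.gauss(0, 1)                  # unobserved confounder
        x = 1.0 * g + u + rng.gauss(0, 1)    # exposure depends on gene and confounder
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)  # gene acts only via exposure
        gene.append(g)
        exposure.append(x)
        outcome.append(y)
    # Naive estimate: slope of outcome on exposure, distorted by the confounder.
    naive = covariance(exposure, outcome) / covariance(exposure, exposure)
    # Ratio estimate: gene-outcome association divided by gene-exposure association.
    gene_based = covariance(gene, outcome) / covariance(gene, exposure)
    return naive, gene_based


naive, gene_based = simulate_mendelian_randomization()
print(f"True effect: 0.5, naive: {naive:.2f}, gene-based: {gene_based:.2f}")
```

In this simulation the naive estimate is pushed well above the true effect by the confounder, while the gene-based estimate should land close to it, because the gene is unrelated to the confounder by construction. Real applications depend on that assumption holding, which cannot be taken for granted.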

Other advanced statistical methods have been developed to try to adjust for potential confounders and so to get closer to an estimate of the actual effect of the exposure, and these are largely based on the important idea of regression analysis. And for this we must acknowledge, yet again, the fertile imagination of Francis Galton.

 

 

Summary


• Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.

• Causation is difficult to establish statistically, but well-designed randomized trials are the best available framework.

• Principles of blinding, intention-to-treat and so on have enabled large-scale clinical trials to identify moderate but important effects.

• Observational data may have background factors influencing the apparent observed relationships between an exposure and an outcome, which may be either observed confounders or lurking factors.

• Statistical methods exist for adjusting for other factors, but judgement is always required as to the confidence with which causation can be claimed.

 

 

CHAPTER 5


Modelling Relationships Using Regression


The ideas in previous chapters allow us to visualize and summarize a single set of numbers, and also to look at associations between pairs of variables. These basic techniques can take us a remarkably long way, but modern data will generally be a lot more complex. There will often be a list of possibly related variables, one of which we are particularly interested in explaining or predicting, whether it is an individual’s risk of cancer or a country’s future population. In this chapter we meet the important idea of a statistical model: a formal representation of the relationships between variables that we can use for the desired explanation or prediction. This inevitably means introducing some mathematical ideas, but the basic concepts should be clear without using algebra.
