The Art of Statistics: How to Learn from Data (12)
Author: David Spiegelhalter

For anyone who has spent time accumulating academic qualifications, this newspaper headline could have been alarming. But should we be concerned? This is a huge study based on a registry of the complete eligible population—not a sample—so we can confidently conclude that slightly more brain tumours really were found in more-educated people. But did all that sweating in the library overheat the brain and lead to some strange cell mutations? In spite of the newspaper headline, I doubt it. And to give them credit, the authors of the paper doubted it too, adding, ‘Completeness of cancer registration and detection bias are potential explanations for the findings.’ In other words, wealthy people with higher education are more likely to be diagnosed and get their tumour registered, an example of what is known as ascertainment bias in epidemiology.

 

 

‘Correlation Does Not Imply Causation’


We saw in the last chapter how Pearson’s correlation coefficient measures how close the points on a scatter-plot are to a straight line. When considering English hospitals conducting children’s heart surgery in the 1990s, and plotting the number of cases against their survival, the high correlation showed that bigger hospitals were associated with lower mortality. But we could not conclude that bigger hospitals caused the lower mortality.

This cautious attitude has a long pedigree. When Karl Pearson’s newly developed correlation coefficient was being discussed in the journal Nature in 1900, a commentator warned that ‘correlation does not imply causation’. In the succeeding century this phrase has been a mantra repeatedly uttered by statisticians when confronted by claims based on simply observing that two things tend to vary together. There is even a website that automatically generates idiotic associations, such as the delightful correlation of 0.96 between the annual per-capita consumption of mozzarella cheese in the US between 2000 and 2009, and the number of civil engineering doctorates awarded in each of those years.2
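To see how easily such an association can arise, here is a minimal Python sketch. It does not use the real mozzarella or doctorate figures: both series are invented, with hypothetical starting values, slopes and noise levels, and they are generated independently of each other. The only thing they share is that each drifts upwards over ten years.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def trending_series(start, slope, noise_sd, years, rng):
    """A straight-line trend plus independent random noise, one value per year."""
    return [start + slope * t + rng.gauss(0, noise_sd) for t in range(years)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Invented numbers, NOT the real data: the two series never influence each other.
cheese = trending_series(start=9.0, slope=0.2, noise_sd=0.05, years=10, rng=rng)
doctorates = trending_series(start=480.0, slope=20.0, noise_sd=5.0, years=10, rng=rng)

r = pearson(cheese, doctorates)
print(f"correlation between two unrelated trending series: {r:.2f}")
```

Because both series simply rise over time, the coefficient comes out close to 1 even though neither quantity has any influence on the other. A shared trend is one of the commonest ways to manufacture an impressive but meaningless correlation.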

There seems to be a deep human need to explain things that happen in terms of simple cause-and-effect relationships. I am sure we could all construct a good story about all those new engineers gorging on pizzas. There is even a word for the tendency to construct reasons for a connection between what are actually unrelated events: apophenia. The most extreme case is when simple misfortune or bad luck is blamed on others’ ill-will or even witchcraft.

Unfortunately, or perhaps fortunately, the world is a bit more complicated than simple witchcraft. And the first complication comes in trying to work out what we mean by ‘cause’.

 

 

What Is ‘Causation’ Anyway?


Causation is a deeply contested subject, which is perhaps surprising as it seems rather simple in real life: we do something, and that leads to something else. I jammed my thumb in the car door, and now it hurts.

But how do we know that my thumb would not have hurt anyway? Perhaps we can think of what is known as a counter-factual. If I hadn’t jammed my thumb in the door, then my thumb would not hurt. But this will always be an assumption, requiring the rewriting of history, since we can never really know for certain what I might have felt (although in this case I might be fairly confident that my thumb would not suddenly start hurting of its own accord).

This gets even trickier when we allow for the unavoidable variability that underlies everything interesting in real life. For example, the medical community now agrees that smoking cigarettes causes lung cancer, but it took decades for doctors to come to this conclusion. Why did it take so long? Because most people who smoke do not get lung cancer. And some people who do not smoke do get lung cancer. All we can say is that you are more likely to get lung cancer if you smoke than if you do not smoke, which is one reason why it took so long for laws to be enacted to restrict smoking.

So our ‘statistical’ idea of causation is not strictly deterministic. When we say that X causes Y, we do not mean that every time X occurs, then Y will too. Or that Y will only occur if X occurs. We simply mean that if we intervene and force X to occur, then Y tends to happen more often. So we can never say that X caused Y in a specific case, only that X increases the proportion of times that Y happens. This has two vital consequences for what we have to do if we want to know what causes what. First, in order to infer causation with real confidence, we ideally need to intervene and perform experiments. Second, since this is a statistical or stochastic world, we need to intervene more than once in order to amass evidence.
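This statistical idea of causation can be sketched as a small simulation. The two probabilities below are hypothetical values chosen purely for illustration, not estimates from any study: when we force X to occur, Y happens with probability 0.15, and when we prevent X, Y happens with probability 0.01.

```python
import random

def outcome_rate(n, prob, rng):
    """Fraction of n people who develop the outcome, each with probability prob."""
    return sum(rng.random() < prob for _ in range(n)) / n

def intervene(n, p_with_x, p_without_x, seed=0):
    """Force X on n people and withhold it from n others; return both outcome rates."""
    rng = random.Random(seed)
    rate_with_x = outcome_rate(n, p_with_x, rng)
    rate_without_x = outcome_rate(n, p_without_x, rng)
    return rate_with_x, rate_without_x

# Hypothetical probabilities, assumed for illustration only.
P_WITH_X = 0.15
P_WITHOUT_X = 0.01

for n in (20, 200, 200_000):
    with_x, without_x = intervene(n, P_WITH_X, P_WITHOUT_X, seed=42)
    print(f"n={n:>7}: Y occurs in {with_x:.3f} with X, {without_x:.3f} without X")
```

Under these assumptions, most people who experience X never experience Y, and a few who avoid X still do, yet X clearly raises the proportion of times Y happens. The smallest group sizes tend to give unstable estimates, which is why a single intervention, or a handful, cannot settle the question.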

And that leads us naturally to a delicate topic: conducting medical experiments on large numbers of people. Few of us might relish the idea of being experimented on, especially when life and death are concerned. Which makes it all the more remarkable that thousands of people have been willing to be part of huge studies in which neither they nor their doctor knew which treatment they would end up getting.


Do statins reduce heart attacks and strokes?

 

Every day I take a little white pill, a statin, because I have been told it lowers cholesterol and so reduces the risk of heart attacks and strokes. But what is its effect on me personally? I am almost certain that it causes my low-density lipoprotein (LDL) cholesterol to drop, since I was told my level had fallen soon after I started taking the tablets. This drop in LDL is a direct, essentially deterministic effect that I can assume is caused by the statin.

But I will never know if this daily ritual does me any good in the long run; it depends on which of my many possible future lives actually occurs. If I never have a heart attack or a stroke, I will have no idea whether I would have never had one even if I had not taken the tablets, and all this pill-popping for years was a waste of time. If I do have a heart attack or a stroke, I will not know if this event was delayed by taking the statin. All I can ever know is that, on average, it benefits a large group of people like me, and this knowledge is based on large clinical trials.

The purpose of a clinical trial is to carry out a ‘fair test’ that properly determines causation and estimates the average effect of a new medical treatment, without introducing biases that could give us the wrong idea of its effectiveness. A proper medical trial should ideally obey the following principles:

1. Controls: If we want to investigate the effect of statins on a population, we can’t just give statins to a few people, and then, if they don’t have a heart attack, claim this was due to the pill (regardless of the websites that use this form of anecdotal reasoning to market their products). We need an intervention group, who will be given statins, and a control group who will be given sugar pills or placebos.

2. Allocation of treatment: It is important to compare like with like, so the treatment and comparison groups have to be as similar as possible. The best way to ensure this is by randomly allocating participants to be treated or not, and then seeing what happens to them—this is known as a randomized controlled trial (RCT). Statin trials do this with enough people so that the two groups should be similar in all factors that could otherwise influence the outcome, including—and this is critically important—those factors that we don’t know about. These studies can be huge: in the UK Heart Protection Study carried out in the late 1990s, 20,536 people at raised risk of a heart attack or stroke were randomly allocated to take either 40 mg of simvastatin daily or a dummy tablet.3

3. People should be counted in the groups to which they were allocated: The people allocated to the ‘statin’ group in the Heart Protection Study (HPS) were included in the final analysis even if they did not take their statins. This is known as the ‘intention to treat’ principle, and can seem rather odd. It means that the final estimate of the effect of statins really measures the effect of being prescribed statins rather than actually taking them. In practice, of course, people will be strongly encouraged to take the tablets throughout the study, although after five years in the HPS 18% of those allocated a statin had stopped taking them, while as many as 32% of those initially allocated to a placebo tablet actually started taking statins during the trial. Since these people who switch treatments tend to muddy the difference between the groups, we might expect the apparent effect in an ‘intention-to-treat’ analysis to be less than the effect of actually taking the drug.
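The allocation and intention-to-treat principles can be illustrated with a rough simulation. This is a sketch under stated assumptions, not a reconstruction of the Heart Protection Study. The switching rates of 18% and 32% are taken from the text, but everything else is hypothetical: the hidden risk factor, the baseline risks of 10% and 30%, and the assumption that actually taking a statin multiplies a person's risk by 0.7. It also treats switching as all-or-nothing, whereas in a real trial people stop or start treatment gradually over the years.

```python
import random

def simulate_trial(n, true_ratio=0.7, stop_rate=0.18, drop_in_rate=0.32, seed=0):
    """Simulate a two-arm randomized trial and analyse it by allocated group.

    true_ratio   -- hypothetical risk multiplier for people who really take a statin
    stop_rate    -- share of the statin arm who do not take it (18% in the text)
    drop_in_rate -- share of the placebo arm who take a statin anyway (32% in the text)
    """
    rng = random.Random(seed)
    arms = {
        "statin": {"n": 0, "events": 0, "hidden": 0},
        "placebo": {"n": 0, "events": 0, "hidden": 0},
    }
    for _ in range(n):
        # A risk factor nobody has measured (hypothetical): it triples baseline risk.
        hidden = rng.random() < 0.5
        base_risk = 0.30 if hidden else 0.10

        # Random allocation: a fair coin, blind to the hidden factor.
        arm = "statin" if rng.random() < 0.5 else "placebo"

        # What the person actually takes may differ from the allocation.
        if arm == "statin":
            takes_statin = rng.random() >= stop_rate
        else:
            takes_statin = rng.random() < drop_in_rate

        risk = base_risk * true_ratio if takes_statin else base_risk
        event = rng.random() < risk

        # Intention to treat: count the person in the ALLOCATED arm regardless.
        arms[arm]["n"] += 1
        arms[arm]["events"] += event
        arms[arm]["hidden"] += hidden

    statin, placebo = arms["statin"], arms["placebo"]
    itt_ratio = (statin["events"] / statin["n"]) / (placebo["events"] / placebo["n"])
    return itt_ratio, statin["hidden"] / statin["n"], placebo["hidden"] / placebo["n"]

itt_ratio, hidden_statin, hidden_placebo = simulate_trial(200_000, seed=7)
print(f"share with hidden risk factor: statin {hidden_statin:.3f}, placebo {hidden_placebo:.3f}")
print(f"intention-to-treat risk ratio: {itt_ratio:.2f} (assumed effect of taking the drug: 0.70)")
```

With groups this large, random allocation should leave the hidden factor almost equally common in both arms, even though the analysis never looks at it. And because some people in each arm end up taking the other arm's treatment, the intention-to-treat risk ratio should sit closer to 1 than the assumed 0.7, meaning the apparent benefit is smaller than the effect of actually taking the tablets.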
