The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

The main recent innovation in randomized experimentation concerns ‘A/B’ testing in web design, in which users are (unknowingly) directed to alternative layouts for web pages, and measurements made of time spent on pages, click-throughs to advertisements, and so on. A series of A/B tests can rapidly lead to an optimized design, and the huge sample sizes mean that even small, but still potentially profitable, effects can be reliably detected. This has meant an entirely new community has had to learn about trial design, including the perils of making multiple comparisons that we will come to in Chapter 10.
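As a rough sketch of the arithmetic behind such a comparison (not an example from the book), the Python snippet below compares the click-through rates of two hypothetical page layouts with a standard two-proportion z-test; the layout names, counts and sample sizes are all invented for illustration.

```python
# A minimal A/B-test sketch: compare click-through rates for two page
# layouts with a two-proportion z-test. All numbers are illustrative.
from math import sqrt, erfc

clicks   = {"A": 1_200, "B": 1_350}      # click-throughs per layout (hypothetical)
visitors = {"A": 50_000, "B": 50_000}    # users randomly sent to each layout (hypothetical)

p_a = clicks["A"] / visitors["A"]
p_b = clicks["B"] / visitors["B"]

# Pooled proportion under the null hypothesis that the layouts are identical.
p_pool = (clicks["A"] + clicks["B"]) / (visitors["A"] + visitors["B"])
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors["A"] + 1 / visitors["B"]))
z = (p_b - p_a) / se
p_value = erfc(abs(z) / sqrt(2))         # two-sided tail of the standard normal

print(f"Click-through: A = {p_a:.2%}, B = {p_b:.2%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
# With samples this large, even a small absolute difference can be detected
# reliably -- hence the danger of multiple comparisons (Chapter 10).
```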

 

 

What Do We Do When We Can’t Randomize?

 

Why do old men have big ears?

 

It is easy for researchers to randomize if all they have to do is change a website: there is no effort to recruit participants since they don’t even know they are the subjects of an experiment, and there is no need to get ethical approval to use them as guinea pigs. But randomization is often difficult and sometimes impossible: we can’t test the effect of our habits by randomizing people to smoke or eat unhealthy diets (even though such experiments are performed on animals). When the data does not arise from an experiment, it is said to be observational. So often we are left with trying as best we can to sort out correlation from causation by using good design and statistical principles applied to observational data, combined with a healthy dose of scepticism.

The issue of old men’s ears might be rather less important than some of the topics in this book, but illustrates the need for choosing study designs that are appropriate for answering questions. Taking a problem-solving approach based on the PPDAC cycle, the Problem is that, certainly based on my personal observation, old men often seem to have big ears. Why could this be? An obvious Plan is to see whether, in the contemporary population, age is correlated with adult ear-length. It turns out that groups of medical researchers in the UK and Japan have collected Data in such a cross-sectional study: their Analysis showed a clear positive correlation, and their Conclusions were that ear-length was associated with age.7

The challenge is then to try to explain this association. Do ears carry on growing with age? Or did people who are old now always have bigger ears, and something has happened over the last decades to make more recent generations have smaller ears? Or is it that men with smaller ears simply die earlier for some reason? (There is a traditional Chinese belief that big ears predict a longer life.) Some imagination is required to think of what kind of studies could test these ideas. A prospective cohort study would follow young men through their lives, measuring their ears to check if they grew, or if those with smaller ears died earlier. This would take rather a long time, and so an alternative retrospective cohort study could take men who are old now and try to work out whether their ears had grown, perhaps using past photographic evidence. A case-control study could take men who have died, find men still alive who matched them in age and other factors known to predict longevity, and see if the survivors had bigger ears.*

And so the problem-solving cycle would start again.

 

 

What Can We Do When We Observe an Association?


This is where some statistical imagination is called for, and it can be an enjoyable exercise to guess the reasons why an observed correlation might be spurious. Some are fairly easy: the close correlation between mozzarella consumption and the number of civil engineers is presumably because both measures have been increasing over time. Similarly, any correlation between ice-cream sales and drownings is due to both being influenced by the weather. When an apparent association between two outcomes might be explained by some observed common factor that influences both, this common cause is known as a confounder: both the year and the weather are potential confounders since they can be recorded and considered in an analysis.

The simplest technique for dealing with confounders is to look at the apparent relationship within each level of the confounder. This is known as adjustment, or stratification. So for example we could explore the relationship between drownings and ice-cream sales on days with roughly the same temperature.
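A minimal sketch of what such a stratification looks like in practice is given below (in Python, with simulated daily data in which hot weather drives both quantities; all numbers are invented for illustration). The raw correlation between sales and drownings is strong, but within each temperature band it largely disappears.

```python
# Stratification sketch: ice-cream sales and drownings are both driven by
# temperature, so the raw correlation is large; within each temperature
# band ("stratum") the association largely disappears.  Simulated data.
import numpy as np

rng = np.random.default_rng(0)
n_days = 2_000
temperature = rng.uniform(5, 35, n_days)                  # daily temperature, deg C

# Both outcomes depend on temperature plus independent noise.
ice_cream = 10 * temperature + rng.normal(0, 40, n_days)  # daily sales
drownings = 0.3 * temperature + rng.normal(0, 3, n_days)  # daily incidents

print("Raw correlation:", round(np.corrcoef(ice_cream, drownings)[0, 1], 2))

# Adjust for the confounder by looking within 5-degree temperature bands.
bands = np.floor(temperature / 5)
for band in np.unique(bands):
    mask = bands == band
    r = np.corrcoef(ice_cream[mask], drownings[mask])[0, 1]
    print(f"{5*band:>4.0f}-{5*band+5:.0f} C: correlation = {r:+.2f}")
```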

But adjustment can produce some paradoxical results, as shown by an analysis of acceptance rates by gender at Cambridge University. In 1996 the overall acceptance rate to study five academic subjects in Cambridge was slightly higher for men (24% of 2,470 applicants) than it was for women (23% of 1,184 applicants). The subjects were all in what we today call STEM (science, technology, engineering and medicine) subjects, which have historically been studied predominantly by men. Was this a case of gender discrimination?

Take a careful look at Table 4.2. Although overall the acceptance rate was higher for men, the acceptance rate in each subject individually was higher for women. How can this apparent paradox occur? The explanation is that the women were more likely to apply for the more popular, and therefore more competitive, subjects with the lowest acceptance rates, such as medicine and veterinary medicine, and tended not to apply to engineering, which had a higher acceptance rate. In this case, therefore, we might conclude that there is no evidence of discrimination.


 

 

Table 4.2 Illustration of Simpson's Paradox using admission data for Cambridge in 1996. Overall, the acceptance rate was higher for men. But in each subject the acceptance rate was higher for women.


This is known as Simpson’s paradox, which occurs when the apparent direction of an association is reversed by adjusting for a confounding factor, requiring a complete change in the apparent lesson from the data. Statisticians revel in finding real-life examples of this, each further reinforcing the caution required in interpreting observational data. Nevertheless, it shows the insights gained by splitting data according to factors that may help explain observed associations.
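The arithmetic behind such a reversal can be reproduced with a tiny calculation. The sketch below uses invented numbers for two hypothetical subjects (not the actual Cambridge figures in Table 4.2), chosen so that women have the higher acceptance rate in each subject but the lower rate overall, because they apply mostly to the more competitive subject.

```python
# Simpson's paradox sketch: invented admission numbers for two subjects.
# Women do better within each subject, yet worse overall, because they
# apply mostly to the more competitive subject.
applicants = {
    #  subject:    (men applied, men accepted, women applied, women accepted)
    "Medicine":    (200, 30, 800, 128),   # 15% men vs 16% women accepted
    "Engineering": (800, 400, 200, 104),  # 50% men vs 52% women accepted
}

totals = {"men": [0, 0], "women": [0, 0]}
for subject, (m_app, m_acc, w_app, w_acc) in applicants.items():
    print(f"{subject:<12} men {m_acc/m_app:.0%}  women {w_acc/w_app:.0%}")
    totals["men"][0] += m_app;   totals["men"][1] += m_acc
    totals["women"][0] += w_app; totals["women"][1] += w_acc

for sex, (applied, accepted) in totals.items():
    print(f"Overall {sex}: {accepted/applied:.0%} of {applied} applicants accepted")
```

Running this prints a higher rate for women in both subjects, yet an overall rate of 43% for men against 23% for women: the direction of the association flips once the confounder (choice of subject) is taken into account.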


Does having a nearby Waitrose put £36,000 on the value of your house?

 

The claim that a nearby Waitrose ‘adds £36,000 to house price’ was credulously reported by the British media in 2017.8 But this was not a study of the change in house prices after a store opened, and Waitrose certainly did not experimentally randomize the placement of their new stores: it was simply a correlation between house prices and the closeness of supermarkets, particularly upscale ones like Waitrose.

The correlation almost certainly reflects Waitrose’s policy of opening stores in wealthier locations, and is therefore a fine example of the actual chain of causation being the precise opposite of what has been claimed. This is known, unsurprisingly, as reverse causation. More serious examples occur in studies examining the relationship between drinking alcohol and health outcomes, which generally find that non-drinkers have substantially higher death rates than moderate drinkers. How can this possibly make sense, given what we know about the impact of alcohol on the liver for example? This relationship has been partially attributed to reverse causation—those people who are more likely to die do not drink because they are ill already (possibly through excessive drinking in the past). More careful analyses now exclude ex-drinkers, and also ignore adverse health events in the first few years of the study, since these may be due to pre-existing conditions. Even with these exclusions, some overall health benefit from moderate drinking appears to remain, although it is deeply contested.

Another amusing exercise is to try to invent a narrative of reverse causation for any statistical claim based on correlation alone. My favourite is a study finding a correlation between US teenagers’ consumption of carbonated soft drinks and their tendency towards violence: while a newspaper reported this as ‘Fizzy Drinks Make Teenagers Violent’,9 perhaps it is just as plausible that being violent works up a thirst? Or more plausibly we could think of some common factors that might influence both, such as membership of a particular peer-group. Potential common causes that we do not measure are known as lurking factors, since they remain in the background, are not included in any adjustment, and are waiting to trip up naïve conclusions from observational data.
