The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

A survey would not be valid if the questions were biased in favour of a particular response. For example, in 2017 budget airline Ryanair announced that 92% of their passengers were satisfied with their flight experience. It turned out that their satisfaction survey permitted only the answers ‘Excellent, very good, good, fair, OK’.*

We have seen how positive or negative framing of numbers can influence the impression given, and similarly the framing of a question can influence the response. For example, a UK survey in 2015 asked people whether they supported or opposed ‘giving 16- and 17-year-olds the right to vote’ in the referendum on whether to leave the European Union, and 52% supported the idea while 41% opposed it. So the majority favoured this proposal when framed in terms of recognizing rights and empowering younger people.

But when the same respondents were asked the (logically identical) question of whether they supported or opposed ‘reducing the voting age from 18 to 16’ for the referendum, the proportion supporting the proposal dropped to 37%, with 56% opposing. So when framed in terms of a more risky liberalization, the proposal was opposed by the majority, a reversal in opinion brought about by simple rewording of the question.2
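
As a quick check on the size of this reversal, the Python sketch below uses only the percentages quoted above and computes net support (support minus opposition) for each wording. The function name `net_support` is an illustrative choice, not anything from the original survey.

```python
# Percentages quoted in the text for two wordings of the same question.
wordings = {
    "giving 16- and 17-year-olds the right to vote": {"support": 52, "oppose": 41},
    "reducing the voting age from 18 to 16": {"support": 37, "oppose": 56},
}


def net_support(result):
    """Support minus opposition, in percentage points."""
    return result["support"] - result["oppose"]


for wording, result in wordings.items():
    undecided = 100 - result["support"] - result["oppose"]
    print(f"{wording!r}: net {net_support(result):+d} points, {undecided}% neither")

# Dictionaries keep insertion order, so this unpacks the two wordings in turn.
rights, lowering = wordings.values()
swing = net_support(rights) - net_support(lowering)
print(f"Swing in net support from rewording alone: {swing} points")
```

Net support moves from +11 to -19 points, a 30-point swing, while the share giving neither answer stays at 7% under both wordings.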

The responses to questions can also be influenced by what has been asked beforehand, a process known as priming. Official surveys of wellbeing estimate that around 10% of young people in the UK consider themselves lonely, but an online questionnaire by the BBC found the far higher proportion of 42% among those choosing to answer. This figure may have been inflated by two factors: the self-report nature of the voluntary ‘survey’, and the fact that the question about loneliness had been preceded by a long series of enquiries as to whether the respondent in general felt a lack of companionship, isolated, left out, and so on, all of which might have primed them to give a positive response to the crucial question of feeling lonely.3

Going from sample (Stage 2) to study population (Stage 3): this depends on the fundamental quality of the study, also known as its internal validity: does the sample we observe accurately reflect what is going on in the group we are actually studying? This is where we come to the crucial way of avoiding bias: random sampling. Even children understand what it means to pick something at random: closing your eyes and reaching into a jumbled bag of sweets and seeing which colour comes out, or pulling a number out of a hat to see who gets a prize or a treat (or doesn’t). It has been used for millennia as a way of ensuring fairness and justice, when it is known as sortition,* and has been used as a way of allocating rewards,* running lotteries, and appointing people with power such as officials and jurors. It has also been involved in more sobering duties, such as choosing which young people should go off to war, or who to eat in a lifeboat lost at sea.

George Gallup, who essentially invented the idea of the opinion poll in the 1930s, came up with a fine analogy for the value of random sampling. He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir. A literal demonstration of what happens without that stir was provided by the 1969 Vietnam War draft lottery, which had to produce an ordered list of birthdays: men whose birthday was at the top of the list would be drafted first to go to Vietnam, and so on down the list. In a public attempt to make the process fair, 366 capsules were prepared, each containing a unique birthday, and capsules were intended to be picked from a box at random. But the capsules were put in the box in order of the month of the birthday, and were not properly mixed up. This might not have caused a problem if the men drawing out the capsules had delved down into the box, but as a remarkable video shows, they tended to take them from the top.4 The result was that it was bad luck to be born later in the year: 26 out of 31 birthdays in December ended up drafted, compared to only 14 of the 31 in January.
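
To see why the lack of stirring mattered, here is a toy simulation in Python. It is a sketch under invented assumptions, not a reconstruction of the 1969 draw: capsules are loaded January first so that December ends up on top, a poorly mixed draw takes one of only the top `reach` capsules each time (the value 20 is hypothetical), and the first 195 positions count as 'drafted' (a cut-off chosen for illustration). Under a good stir, each 31-day month should expect about 31 × 195/366 ≈ 16.5 drafted birthdays.

```python
import random

# 366 days: the lottery included 29 February.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# Hypothetical cut-off for this sketch: the first 195 draws count as "drafted".
DRAFT_CUTOFF = 195


def capsules_in_loading_order():
    """(month, day) pairs in calendar order; the last element sits on top of the box."""
    return [(m, d) for m, n in enumerate(DAYS_IN_MONTH, start=1) for d in range(1, n + 1)]


def well_stirred_draw(rng):
    """Every ordering equally likely: the 'good stir' of Gallup's soup."""
    capsules = capsules_in_loading_order()
    rng.shuffle(capsules)
    return capsules


def shallow_draw(rng, reach=20):
    """Toy model of poor mixing: each draw takes one of the top `reach` capsules."""
    box = capsules_in_loading_order()
    order = []
    while box:
        k = min(reach, len(box))
        # Pick uniformly among the k capsules nearest the top (end of the list).
        order.append(box.pop(len(box) - 1 - rng.randrange(k)))
    return order


def drafted_in_month(order, month, cutoff=DRAFT_CUTOFF):
    """How many birthdays from `month` fall within the first `cutoff` draws."""
    return sum(1 for m, _ in order[:cutoff] if m == month)


def average_drafted(draw, month, trials=2000, seed=1969):
    """Mean number of drafted birthdays in `month` over repeated simulated lotteries."""
    rng = random.Random(seed)
    return sum(drafted_in_month(draw(rng), month) for _ in range(trials)) / trials


for name, draw in [("well stirred", well_stirred_draw), ("shallow", shallow_draw)]:
    jan = average_drafted(draw, 1)
    dec = average_drafted(draw, 12)
    print(f"{name:>12}: January {jan:.1f}/31 drafted, December {dec:.1f}/31 drafted")
```

Because this toy model never reaches deep into the box, it is far more extreme than the real outcome of 26 December birthdays against 14 in January. Raising `reach` to 366 turns every draw into a uniform pick from the whole box, which reproduces the well-stirred result.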

The idea of adequate ‘stirring’ is crucial: if you want to be able to generalize from the sample to the population, you need to make sure your sample is representative. Just having masses of data does not guarantee a good sample, and can even give false reassurance. For example, polling companies performed miserably in the 2015 UK general election, even though they had sampled thousands of potential voters. A later inquiry blamed non-representative sampling, particularly from telephone polls: not only did landlines make up the majority of numbers called, but less than 10% of those who were called actually responded. This is hardly likely to be a representative sample.
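
The point that sheer quantity cannot rescue a non-representative sample can be sketched with a small Python simulation. All the numbers are hypothetical: the population splits 50:50 between two parties, and in the large poll supporters of party A answer a call 10% of the time while supporters of party B answer 7% of the time, loosely echoing the low response rates described above. The small poll calls far fewer people, but everyone answers. `naive_margin` is the usual 95% margin of error, which assumes the respondents are a random sample.

```python
import math
import random

TRUE_SUPPORT = 0.50  # hypothetical share of the population backing party A


def poll(n_called, respond_if_a, respond_if_b, rng):
    """Call n_called random people; each answers with a probability that depends
    on their view. Returns (estimated share for A, number of responses)."""
    a = b = 0
    for _ in range(n_called):
        backs_a = rng.random() < TRUE_SUPPORT
        answers = rng.random() < (respond_if_a if backs_a else respond_if_b)
        if answers:
            if backs_a:
                a += 1
            else:
                b += 1
    return a / (a + b), a + b


def naive_margin(share, n):
    """95% margin of error that assumes the respondents are a simple random sample."""
    return 1.96 * math.sqrt(share * (1 - share) / n)


rng = random.Random(2015)
# Large poll with low, opinion-dependent response (hypothetical 10% vs 7%).
big_est, big_n = poll(100_000, 0.10, 0.07, rng)
# Small poll in which everyone called answers.
small_est, small_n = poll(1_000, 1.0, 1.0, rng)

for label, est, n in [("large, biased", big_est, big_n), ("small, random", small_est, small_n)]:
    moe = naive_margin(est, n)
    print(f"{label:>13}: n = {n:>5}, estimate {est:.1%} ± {moe:.1%} (truth {TRUE_SUPPORT:.0%})")
```

With these settings the large poll should land near 59% (0.05/0.085) with a margin of about ±1 point, confidently excluding the true 50%, while the small poll is less precise but centred on the truth.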

Going from study population (Stage 3) to target population (Stage 4): finally, even with perfect measurement and a meticulous random sample, the results may still not reflect what we wanted to investigate in the first place if we have not been able to ask the people in whom we are particularly interested. We want our study to have external validity.

An extreme example is when our target population comprises people, whereas we have only been able to study animals, such as the effect of a chemical on mice. Less dramatic is when clinical trials of new drugs have been conducted only on adult men, but the drug is then used ‘off-label’ on women and children. We would like to know the effects on everyone, but this cannot be solved by statistical analysis alone—we inevitably need to make assumptions and be very cautious.

When We Have All the Data

Although the ideas of learning from data are neatly illustrated by looking at surveys, in fact much of the data used today is not based on random sampling, or indeed on any sampling at all. Routinely collected data on, say, online purchasing or social transactions, or for administering a system such as education or policing, can be re-purposed to help us understand what is going on in the world. In these situations we have all the data. In terms of the process of induction shown in Figure 3.1, there is no gap between Stages 2 and 3: the ‘sample’ and the study population are essentially the same. This does avoid any concern about having a small sample size, but many other problems can still remain.

Consider the question of how much crime there is in Britain, and the politically sensitive issue of whether it is going up or going down. There are two main sources of data: one survey-based and one administrative. First, the Crime Survey for England and Wales is a classic piece of sampling in which around 38,000 people are questioned each year about their experiences of crime. As with the Natsal sex survey, problems can arise when using the actual reports (Stage 1) to draw conclusions about respondents’ true experiences (Stage 2), since respondents may not tell the truth, say about drug crime in which they have themselves participated. Then we need to assume the sample is representative of the eligible population and take into account its limited size (Stage 2 to Stage 3), and finally acknowledge that the study design is not reaching some part of the overall target population, such as the fact that nobody under 16 or living in a communal residence is questioned (Stage 3 to Stage 4). Nevertheless, with suitable caveats, the Crime Survey for England and Wales is a ‘designated national statistic’ and is used for monitoring long-term trends.5
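
The ‘limited size’ step from Stage 2 to Stage 3 is the one part of this chain that standard sampling theory handles directly. A rough Python sketch, assuming simple random sampling (which ignores the real survey’s design) and a hypothetical 20% of respondents reporting some experience of crime, gives the usual 95% margin of error 1.96 × √(p(1 − p)/n). The subgroup of 1,000 is also hypothetical, included to show how precision falls for smaller slices of the sample.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


p = 0.20  # hypothetical share of respondents reporting some experience of crime
for n, label in [(38_000, "full sample (size quoted in the text)"),
                 (1_000, "hypothetical subgroup")]:
    print(f"{label}: n = {n:,}, estimate {p:.0%} ± {margin_of_error(p, n):.1%}")
```

This comes to about ±0.4 percentage points for the full sample and about ±2.5 points for the subgroup. The calculation says nothing about untruthful answers (Stage 1 to Stage 2) or about the people the survey never reaches (Stage 3 to Stage 4).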

The second source of data comprises the reports of crimes recorded by the police. This is done for administrative purposes and is not a sample: since every crime that is recorded in the country can be counted, the ‘study population’ is the same as the sample. Of course we still have to assume that the data recorded truly represent what happened to those victims who report crimes (Stage 1 to Stage 2), but the major problem occurs when we want to claim that the data on the study population (people who reported crimes) represent the target population of all crimes committed in England and Wales. Unfortunately, police-recorded crime systematically misses cases which the police do not record as a crime or which have not been reported by the victim: illegal drug use, for example, or thefts and vandalism that people choose not to report in case their area suffers a decline in property values. As an extreme example, after a report in November 2014 criticized the police’s recording practices, the number of recorded sexual offences rose from 64,000 in 2014 to 121,000 in 2017: a near doubling in three years.
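
The ‘near doubling’ can be made precise with a line of arithmetic. The Python sketch below uses only the two figures quoted above; it converts the three-year rise into an equivalent compound annual rate and says nothing about why the numbers rose.

```python
# Recorded sexual offences in England and Wales, as quoted in the text.
before, after, years = 64_000, 121_000, 3  # counts for 2014 and 2017

ratio = after / before                    # how many times larger the 2017 count is
annual_growth = ratio ** (1 / years) - 1  # equivalent constant yearly rise

print(f"2017 count is {ratio:.2f} times the 2014 count")
print(f"equivalent to about {annual_growth:.0%} a year, compounded")
```

The ratio is about 1.89, or roughly 24% a year compounded: a jump that followed criticism of recording practices rather than necessarily reflecting more offences.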
