The Art of Statistics How to Learn from Data(8)
Author: David Spiegelhalter

How many sexual partners have people in Britain really had?

The last chapter showed some remarkable results from a recent UK survey in which people reported the number of sexual partners they had had in their lifetime. Plotting these responses revealed various features, including a (very) long tail, a tendency to use round numbers such as 10 and 20, and more partners reported by men than women. But the researchers who spent millions of pounds collecting this data were not really interested in what these particular respondents said—after all, they were guaranteed complete anonymity. Their responses were a means to an end, which was to say something about the overall pattern of sexual partnerships in Britain—those of the millions of people who were not questioned about their sexual behaviour.

It is no trivial matter to go from the actual responses collected in a survey to conclusions about the whole of Britain. Actually, this is incorrect—it is incredibly easy to just claim that what these respondents say accurately represents what is really going on in the country. Media surveys about sex, where people volunteer to fill in forms on websites about what they say they get up to behind closed doors, do this all the time.

The process of going from the raw responses in the survey to claims about the behaviour of the whole country can be broken down into a series of stages:

1. The recorded raw data on the number of sexual partners that our survey participants report tells us something about…

2. The true number of partners of people in our sample, which tells us something about…

3. The number of partners of people in the study population—the ones who could potentially have been included in our survey—which tells us something about…

4. The number of sexual partners for people in Britain, which is our target population.

Where are the weakest points in this chain of reasoning? Going from the raw data (Stage 1) to the truth about our sample (Stage 2) means making some strong assumptions about how accurate respondents are when they say how many partners they have had, and there are many reasons to doubt them. We have already seen an apparent tendency for men to overstate, and women to understate, their partner count, possibly due to women not including partnerships they would rather forget, different tendencies to round up or round down, poor memory, and simple ‘social acceptability bias’.*
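Reporting habits alone can move the raw data (Stage 1) away from the truth about the sample (Stage 2). Below is a minimal Python sketch of one such mechanism: rounding to 'convenient' numbers. The rounding rule in the hypothetical `reported_count` function and the list of true counts are invented for illustration. They are not taken from Natsal.

```python
def reported_count(true_count: int) -> int:
    """Hypothetical reporting rule (an assumption, not survey data):
    counts under 10 are reported exactly, counts from 10 to 19 are
    rounded to the nearest 5, and counts of 20 or more to the nearest 10
    (halves round up)."""
    if true_count < 10:
        return true_count
    step = 5 if true_count < 20 else 10
    return ((true_count + step // 2) // step) * step

# Invented true partner counts for ten hypothetical respondents
true_counts = [1, 3, 7, 12, 13, 18, 23, 27, 34, 46]
reported = [reported_count(c) for c in true_counts]

print(reported)  # [1, 3, 7, 10, 15, 20, 20, 30, 30, 50]
# Heaping: count the answers that land exactly on a multiple of 10
print(sum(c % 10 == 0 for c in true_counts), "true vs",
      sum(r % 10 == 0 for r in reported), "reported")  # 0 true vs 6 reported
```

Under this assumed rule, none of the true counts is a multiple of 10, but six of the ten reported answers are. That is the kind of clustering on round numbers described above. The average barely changes (18.4 true against 18.6 reported), so a distorted distribution does not always mean a badly distorted mean.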

Going from our sample (Stage 2) to the study population (Stage 3) is perhaps the most challenging step. We first need to be confident that the people asked to take part in the survey are a random sample from those who are eligible: this should be fine for a well-organized study like Natsal. But we also need to assume that the people who actually agree to take part are representative, and this is less straightforward. The surveys have around a 66% response rate, which is remarkably good given the nature of the questions. However, there is some evidence that participation rates are slightly lower in those who are not so sexually active, possibly counterbalanced by the difficulty in getting interviews with more unconventional members of society.
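A small expected-value calculation shows what differential participation does to an estimate. This Python sketch assumes a hypothetical study population split into three groups. The partner counts and response rates are invented, not Natsal figures. It also assumes that whether someone takes part depends only on which group they belong to.

```python
from dataclasses import dataclass

@dataclass
class Group:
    share: float          # fraction of the study population in this group
    partners: float       # typical lifetime number of partners in the group
    response_rate: float  # probability that a sampled person takes part

# Invented figures chosen so the overall response rate is 66%;
# the least active group is slightly less willing to take part.
groups = [
    Group(share=0.4, partners=1, response_rate=0.60),
    Group(share=0.4, partners=5, response_rate=0.70),
    Group(share=0.2, partners=20, response_rate=0.70),
]

def population_mean(gs):
    """Mean number of partners across everyone in the study population."""
    return sum(g.share * g.partners for g in gs)

def overall_response_rate(gs):
    """Fraction of sampled people who agree to take part."""
    return sum(g.share * g.response_rate for g in gs)

def respondent_mean(gs):
    """Expected mean among those who respond: each group is weighted by
    the fraction of the population that is in it AND responds."""
    responders = overall_response_rate(gs)
    return sum(g.share * g.response_rate * g.partners for g in gs) / responders

print(f"population mean: {population_mean(groups):.2f}")        # 6.40
print(f"response rate:   {overall_response_rate(groups):.2f}")  # 0.66
print(f"respondent mean: {respondent_mean(groups):.2f}")        # 6.73
```

The overall response rate here is a respectable 66%. Even so, the slight under-participation of the least active group pushes the respondent average above the population average. If you instead lower the response rate of the most active group in this sketch, the bias goes the other way. That is the kind of counterbalancing the text describes.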

Finally, going from the study population (Stage 3) to the target population (Stage 4) is more straightforward, provided we can assume that the people who could potentially have been asked to participate represent the adult population of Britain. In Natsal’s case this should be assured by their careful survey design, based on a random sample of households, although this does mean that people in institutions such as prisons, the services or nunneries were not included.

By the time we have worked through all the things that can go wrong, it might be enough to make anyone sceptical about making any general claims about the true sexual behaviour of the country, based on what we are told by the respondents to the survey. But the whole point of statistical science is to smooth progress through these stages and finally, with due humility, be able to say what we can and cannot learn from data.

Learning from Data—the Process of ‘Inductive Inference’

The preceding chapters have assumed you have a problem, you get some data, you look at it, and then summarize it concisely. Sometimes the counting, measuring and describing is an end in itself. For instance, if we just want to know how many people passed through the Accident & Emergency department last year, the data can tell us the answer.

But often the question goes beyond simple description of data: we want to learn something bigger than just the observations in front of us, whether it is to make predictions (how many will come next year?), or say something more basic (why are the numbers increasing?).

Once we want to start generalizing from the data—learning something about the world outside our immediate observations—we need to ask ourselves the question, ‘Learn about what?’ And this requires us to confront the challenging idea of inductive inference.

Many people have a vague idea of deduction, thanks to Sherlock Holmes using deductive reasoning when he coolly announces that a suspect must have committed a crime. In real life deduction is the process of using the rules of cold logic to work from general premises to particular conclusions. If the law of the country is that cars should drive on the right, then we can deduce that on any particular occasion it is best to drive on the right. But induction works the other way, in taking particular instances and trying to work out general conclusions. For example, suppose we don’t know the customs in a community about kissing female friends on the cheek, and we have to try to work it out by observing whether people kiss once, twice, three times, or not at all. The crucial distinction is that deduction is logically certain, whereas induction is generally uncertain.

Figure 3.1 represents inductive inference as a generic diagram, showing the steps involved in going from data to the eventual target of our investigation: as we have seen, the data collected in the sex survey tells us about the behaviour of our sample, which we use to learn about the people who could have been recruited to the survey, from which we make some tentative conclusions about sexual behaviour in the whole country.

Of course, it would be ideal if we could go straight from looking at the raw data to making general claims about the target population. In standard statistics courses, observations are assumed to be drawn perfectly randomly and directly from the population of direct interest. But this is rarely the case in real life, and therefore we need to consider the entire process of going from raw data to our eventual target. And, as we have seen with the sex survey, problems can occur at each of the different stages.

Going from data (Stage 1) to the sample (Stage 2) raises problems of measurement: is what we record in our data an accurate reflection of what we are interested in? We want our data to be:

• Reliable, in the sense of having low variability from occasion to occasion, and so being a precise or repeatable number.

• Valid, in the sense of measuring what you really want to measure, and not having a systematic bias.
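You can see the difference between reliability and validity by simulating two hypothetical instruments that repeatedly measure the same true value. In this Python sketch the true value, the bias and the noise levels are all invented. Reliability is summarized by the spread (standard deviation) of repeated readings. Validity is summarized by their average error, meaning the presence or absence of a systematic bias.

```python
import random
import statistics

TRUE_VALUE = 10  # hypothetical true answer we are trying to measure

def measure(true_value, bias, noise_sd, rng):
    """One hypothetical reading: truth + systematic bias + random noise."""
    return true_value + bias + rng.gauss(0, noise_sd)

def summarize(bias, noise_sd, n=10_000, seed=1):
    """Return (average error, spread) over n repeated readings."""
    rng = random.Random(seed)
    readings = [measure(TRUE_VALUE, bias, noise_sd, rng) for _ in range(n)]
    return statistics.mean(readings) - TRUE_VALUE, statistics.stdev(readings)

# Valid but unreliable: no systematic bias, but readings vary a lot
err, spread = summarize(bias=0, noise_sd=3)
print(f"unbiased, noisy:  average error {err:+.2f}, spread {spread:.2f}")

# Reliable but not valid: readings barely vary, yet are consistently too high
err, spread = summarize(bias=2, noise_sd=0.5)
print(f"biased, precise:  average error {err:+.2f}, spread {spread:.2f}")
```

The first instrument should give an average error near 0 and a spread near 3. The second should give an average error near +2 and a spread near 0.5. A precise, repeatable number can still be consistently wrong.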

Figure 3.1

Process of inductive inference: each arrow can be interpreted as ‘tells us something about’1

For example, the adequacy of the sex survey depends on people giving the same or very similar answers to the same question each time they are asked, and this should not depend on the style of the interviewer or the vagaries of the respondent’s mood or memory. This can be tested to some extent by asking specific questions both at the start and end of the interview. The quality of the survey also requires the interviewees to be honest when they report their sexual activity, and not either systematically exaggerate or downplay their experiences. All these are fairly strong demands.
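A crude way to summarize such a start-and-end check is the proportion of respondents whose two answers agree, either exactly or within some tolerance. The answers below are invented, and `agreement` is a hypothetical helper. More formal reliability analyses often use measures such as Cohen's kappa or the intraclass correlation coefficient rather than raw percentage agreement.

```python
def agreement(first, second, tolerance=0):
    """Fraction of respondents whose two answers differ by at most `tolerance`."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return matches / len(first)

# Invented answers from eight respondents, asked at the start and at the end
start = [2, 5, 10, 1, 30, 4, 12, 7]
end = [2, 5, 12, 1, 25, 4, 10, 7]

print(agreement(start, end))               # 0.625 (5 of 8 identical)
print(agreement(start, end, tolerance=2))  # 0.875 (7 of 8 within 2)
```

Choosing the tolerance is itself a judgement call. A discrepancy of 2 matters far more for someone reporting 1 partner than for someone reporting 30.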
