
The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

Ideally both positive and negative frames should be presented if we want to provide impartial information, although the order of columns might still influence how the table is interpreted. The order of the rows of a table also needs to be considered carefully. Table 1.1 shows the hospitals in order of the number of operations in each, but if they had been presented, say, in order of mortality rates with the highest at the top of the table, this might give the impression that this was a valid and important way of comparing hospitals. Such league tables are favoured by the media and even some politicians, but can be grossly misleading: not only because the differences could be due to chance variation, but because the hospitals may be taking in very different types of cases. In Table 1.1, for example, we might suspect that Birmingham, one of the biggest and most well-known children’s hospitals, takes on the most severe cases, and so it would be unfair, to put it mildly, to highlight their apparently unimpressive overall survival rates.*
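One way to see why a league table can mislead is to simulate hospitals that are, by construction, equally good. This Python sketch assumes that every hospital has the same true survival rate (an invented 98%) and uses invented numbers of operations. None of these figures come from Table 1.1, and the function name is hypothetical. Any ranking that appears in the resulting "league table" is therefore produced by chance alone.

```python
import random


def simulate_league_table(sizes, true_survival=0.98, seed=1):
    """Observed survival rates for hospitals that all share the SAME true rate.

    Each operation survives independently with probability `true_survival`,
    so any spread in the observed rates is pure chance variation.
    Returns (name, number of operations, observed rate), best rate first.
    """
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    table = []
    for i, n in enumerate(sizes):
        survivors = sum(rng.random() < true_survival for _ in range(n))
        table.append((f"Hospital {i + 1}", n, survivors / n))
    # Rank the hospitals as a league table would: highest observed survival at the top.
    return sorted(table, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    hypothetical_sizes = [150, 300, 450, 600, 900, 1200]  # invented caseloads
    for name, n, rate in simulate_league_table(hypothetical_sizes):
        print(f"{name:<11} n={n:<5} observed survival={rate:.1%}")
```

Every hospital in this simulation is equally good, yet the printed table still has a "top" and a "bottom". Smaller hospitals tend to land at the extremes simply because their rates are based on fewer cases.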

The survival rates can be presented in a horizontal bar chart such as the one shown in Figure 1.1. A crucial choice is where to start the horizontal axis. If the values start from 0%, all the bars will be almost the full length of the graphic, which will clearly show the extraordinarily high survival rates, but the bars will be indistinguishable from one another. But the oldest trick of misleading graphics is to start the axis at, say, 95%, which will make the hospitals look extremely different, even if the variation is in fact attributable to chance alone.

Choosing the start of the axis therefore presents a dilemma. Alberto Cairo, author of influential books on data visualization,3 suggests you should always begin with a ‘logical and meaningful baseline’, which in this situation appears difficult to identify—my rather arbitrary choice of 86% roughly represents the unacceptably low survival in Bristol twenty years previously.
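A little arithmetic makes the effect of the baseline concrete. This minimal Python sketch uses a hypothetical helper, `bar_lengths`, and three invented survival percentages (not the figures from Table 1.1). It computes how long each bar would be, in characters, when the axis starts at 0%, at 86% or at 95%.

```python
def bar_lengths(values, baseline, axis_max=100.0, width=40):
    """Length in characters of each bar when the axis runs from `baseline` to `axis_max`.

    A bar's length is proportional to how far its value lies above the baseline.
    Values below the baseline are clipped to zero length.
    """
    span = axis_max - baseline
    return [max(0, round(width * (v - baseline) / span)) for v in values]


if __name__ == "__main__":
    rates = [97.0, 98.0, 99.0]  # hypothetical survival percentages, not Table 1.1
    for start in (0, 86, 95):
        lengths = bar_lengths(rates, start)
        ratio = max(lengths) / min(lengths)
        print(f"axis from {start}%: lengths {lengths}, longest/shortest = {ratio:.2f}")
```

With these invented values, the bars are 39, 39 and 40 characters long when the axis starts at 0%, but 16, 24 and 32 when it starts at 95%. The same one-percentage-point gaps look negligible in the first case and dramatic in the second.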

 

 

Figure 1.1

Horizontal bar chart of 30-day survival rates for thirteen hospitals. The choice of the start of the horizontal axis, here 86%, can have a crucial effect on the impression given by the graphic. If the axis starts at 0%, all the hospitals will look indistinguishable, whereas if we started at 95% the differences would look misleadingly dramatic.

 

 

I began this book with a quotation from Nate Silver, founder of the data-based platform FiveThirtyEight, who first became famous for accurately predicting the 2008 US presidential election. He eloquently expressed the idea that numbers do not speak for themselves: we are responsible for giving them meaning. This implies that communication is a key part of the problem-solving cycle, and I have shown in this section how the message from a set of simple proportions can be influenced by our choices of presentation.

We now need to introduce an important and convenient concept that will help us get beyond simple yes/no questions.

 

 

Categorical Variables


A variable is defined as any measurement that can take on different values in different circumstances; it’s a very useful shorthand term for all the types of observations that comprise data. Binary variables are yes/no questions such as whether someone is alive or dead and whether they are female or not: both of these vary between people, and can, even for gender, vary within people at different times. Categorical variables are measures that can take on two or more categories, which may be

• Unordered categories: such as a person’s country of origin, the colour of a car, or the hospital in which an operation takes place.

• Ordered categories: such as the rank of military personnel.

• Numbers that have been grouped: such as levels of obesity, which is often defined in terms of thresholds for the body mass index (BMI).*
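Grouping a number into ordered categories amounts to a lookup against a list of thresholds. This Python sketch assumes the commonly used adult BMI cut-offs of 18.5, 25 and 30 kg/m². The category labels and function names are illustrative choices, and definitions vary between organizations. Note that the categories produced are themselves ordered.

```python
from bisect import bisect_right

# Assumed cut-offs: the conventional adult thresholds (as used by the WHO).
# A value exactly on a cut-off falls into the higher category.
CUTOFFS = [18.5, 25.0, 30.0]
LABELS = ["underweight", "normal", "overweight", "obese"]  # ordered from low to high


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a numeric BMI onto an ordered category by counting the cut-offs at or below it."""
    return LABELS[bisect_right(CUTOFFS, value)]


if __name__ == "__main__":
    value = bmi(70, 1.75)
    print(f"BMI {value:.1f} -> {bmi_category(value)}")
```

Grouping discards information: a BMI of 25.0 and one of 29.9 both become "overweight". In exchange, it produces a small set of ordered categories that are easy to tabulate and chart.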

 

When it comes to presenting categorical data, pie charts allow an impression of the size of each category relative to the whole pie, but are often visually confusing, especially if they attempt to show too many categories in the same chart, or use a three-dimensional representation that distorts areas. Figure 1.2 shows a fairly hideous example modelled on the kind offered by Microsoft Excel, showing the proportions of the 12,933 child heart patients from Table 1.1 that are treated in each hospital.

Multiple pie charts are generally not a good idea, as comparisons are hampered by the difficulty in assessing the relative sizes of areas of different shapes. Comparisons are better based on height or length alone in a bar chart. Figure 1.3 shows a simpler, clearer example of a horizontal bar chart of the proportions being treated in each hospital.
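A bar chart like the one in Figure 1.3 rests on two steps. First, turn the count in each category into a percentage of the total. Then draw bars whose lengths are proportional to those percentages. Here is a minimal Python sketch that prints a crude text chart. The function names, and the three hospitals with their counts, are hypothetical and not taken from Table 1.1.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the overall total falling in each category."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


def text_bar_chart(counts: dict[str, int], width: int = 30) -> str:
    """One row per category; bar length is proportional to its share, measured from 0%.

    The largest share is drawn `width` characters long and the others are scaled to match.
    """
    pct = shares(counts)
    biggest = max(pct.values())
    name_w = max(len(name) for name in pct)
    rows = []
    for name, p in pct.items():
        bar = "#" * round(width * p / biggest)
        rows.append(f"{name:<{name_w}} {bar} {p:.1f}%")
    return "\n".join(rows)


if __name__ == "__main__":
    hypothetical = {"Hospital A": 600, "Hospital B": 300, "Hospital C": 100}
    print(text_bar_chart(hypothetical))
```

Every bar starts from zero and only its length varies. The reader therefore compares lengths along a common scale rather than the areas of differently shaped slices.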

 

 

Comparing a Pair of Proportions


We have seen how a set of proportions can be elegantly compared using a bar chart, and so it would be reasonable to think that comparing two proportions would be a trivial matter. But when these proportions represent estimates of the risks of experiencing some harm, then the way in which those risks are compared becomes a serious and contested issue. Here is a typical question:

 

 

Figure 1.2

The proportion of all child heart operations being carried out in each hospital, displayed in a 3D pie chart from Excel. This deeply unpleasant chart makes categories near the front look bigger, and so makes it impossible to make visual comparisons between hospitals.

 

 

Figure 1.3

Percentage of all child heart operations being carried out in each hospital: a clearer representation using a horizontal bar chart.

 

 

What’s the cancer risk from bacon sandwiches?

 

We’re all familiar with hyperbolic media headlines that warn us that something mundane increases the risk of some dread occurrence: I like to call these ‘cats cause cancer’ stories. For example, in November 2015 the World Health Organization’s International Agency for Research on Cancer (IARC) announced that processed meat was a ‘Group 1 carcinogen’, putting it in the same category as cigarettes and asbestos. This inevitably led to panicky headlines such as the Daily Record’s claim that ‘Bacon, Ham and Sausages Have the Same Cancer Risk as Cigarettes Warn Experts’.4

The IARC tried to quell the fuss by emphasizing that the Group 1 classification was about being confident that an increased risk of cancer existed at all, and said nothing about the actual magnitude of the risk. Lower down in the press release, the IARC reported that 50g of processed meat a day was associated with an increased risk of bowel cancer of 18%. This sounds worrying, but should it be?

The figure of 18% is known as a relative risk, since it represents the proportional increase in the risk of getting bowel cancer in a group of people who eat 50g of processed meat a day (which could, for example, represent a daily two-rasher bacon sandwich) compared with a group who don’t. Statistical commentators took this relative risk and reframed it as a change in absolute risk, which means the change in the actual proportion in each group who would be expected to suffer the adverse event.

They concluded that, in the normal run of things, around 6 in every 100 people who do not eat bacon daily would be expected to get bowel cancer in their lifetime. If 100 similar people ate a bacon sandwich every single day of their lives, then according to the IARC report we would expect 18% more of them to get bowel cancer, which means a rise from 6 to 7 cases out of 100.* That is one extra case of bowel cancer in all those 100 lifetime bacon-eaters, which does not sound as impressive as the relative risk (an 18% increase), and might serve to put this hazard into perspective. We need to distinguish what is actually dangerous from what sounds frightening.5
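The arithmetic behind this reframing can be checked in a few lines. This Python sketch assumes, as the commentators did, that the 18% increase applies multiplicatively to a lifetime baseline risk of 6 in 100. The function names are hypothetical.

```python
def absolute_risk_after(baseline_risk: float, relative_increase: float) -> float:
    """Absolute risk in the exposed group, given a baseline risk and a relative increase (0.18 for 18%)."""
    return baseline_risk * (1 + relative_increase)


def extra_cases_per_100(baseline_risk: float, relative_increase: float) -> float:
    """Change in absolute risk, expressed as extra cases per 100 people."""
    return 100 * (absolute_risk_after(baseline_risk, relative_increase) - baseline_risk)


if __name__ == "__main__":
    baseline = 6 / 100  # lifetime bowel-cancer risk without daily processed meat
    exposed = absolute_risk_after(baseline, 0.18)
    print("relative increase: 18%")
    print(f"absolute risk: {100 * baseline:.0f} -> {100 * exposed:.2f} cases per 100")
    print(f"extra cases per 100 people: {extra_cases_per_100(baseline, 0.18):.2f}")
```

The exposed risk comes out at 7.08 in 100, which rounds to the 7 quoted above. The extra 1.08 cases per 100 corresponds to the single extra case among the lifetime bacon-eaters.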
