
The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

 

Certain plots are so complex that it becomes difficult to spot interesting patterns with the naked eye. Take Figure 2.9, in which each line shows the rank of the popularity of a particular given name for boys born in England and Wales between 1905 and 2016.6 This represents an extraordinary social history, and yet on its own only communicates the rapidly changing fashions in naming, with the later, denser lines suggesting a greater breadth and diversity of names since the mid-nineties.

It is only by allowing interactivity that we can pick out specific lines of personal interest. For example, I’m intrigued to see the trend for David, a name which became particularly popular in the 1920s and 1930s, possibly due to the Prince of Wales (later the short-reigned Edward VIII) being called David. But its popularity has declined precipitously—in 1953 I was one of tens of thousands of Davids, but in 2016 only 1,461 were given that name, and over 40 names were more popular.

 

 

Figure 2.9

A screenshot of an interactive graph provided by the UK Office for National Statistics, showing the trend of the position of each boy’s name in a league table of popularity. My rather unimaginative parents gave me the most popular boy’s name in 1953, but I have since gone out of fashion, in direct contrast to Oliver. David has, however, shown some signs of recovery recently, possibly influenced by David Beckham.
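The league-table positions traced in Figure 2.9 can be derived from yearly counts of each name. What follows is only a minimal sketch of that idea in Python, not the method used by the Office for National Statistics. The function `rank_names` and the three-name table are hypothetical, and every count is invented for illustration except the 1,461 Davids in 2016, which comes from the text.

```python
def rank_names(counts):
    """Map each name to its rank (1 = most popular) from a dict of counts."""
    # Sort names from the largest count to the smallest.
    ordered = sorted(counts, key=lambda name: counts[name], reverse=True)
    # Position in the sorted list is the league-table rank.
    return {name: position for position, name in enumerate(ordered, start=1)}


# Invented counts for illustration only, apart from David in 2016 (1,461).
counts_by_year = {
    1953: {"David": 30000, "John": 28000, "Oliver": 200},
    2016: {"David": 1461, "John": 1100, "Oliver": 6000},
}

# One rank table per year; plotting a name's rank against year gives its line.
ranks_by_year = {
    year: rank_names(counts) for year, counts in counts_by_year.items()
}
```

With only three names in this toy table, David can fall no lower than third; the real data contain many more names, which is how David could be overtaken by more than 40 of them in 2016.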

 

 

Communication


This chapter has focused on summarizing and communicating data in an open and non-manipulative way; we do not want to influence our audiences’ emotions and attitudes, or convince them of a certain perspective. We just want to tell it how it is, or at least how it seems to be, and while we cannot ever claim to tell the absolute truth, we can at least try to be as truthful as possible.

Of course this attempt at scientific objectivity is easier said than done. When the Statistical Society of London (later the Royal Statistical Society) was set up in 1834 by Charles Babbage, Thomas Malthus and others, they loftily declared that ‘The Statistical Society will consider it to be the first and most essential rule of its conduct to exclude carefully all opinions from its transactions and publications—to confine its attention rigorously to facts—and, as far as it may be found possible, to facts which can be stated numerically and arranged in tables.’7 From the very start they took no notice whatsoever of this stricture, and immediately started inserting their opinions about what their data on crime, health and the economy meant and what should be done in response to it. Perhaps the best we can do now is recognize this temptation and do our best to keep our opinions to ourselves.

The first rule of communication is to shut up and listen, so that you can get to know about the audience for your communication, whether it might be politicians, professionals or the general public. We have to understand their inevitable limitations and any misunderstandings, and fight the temptation to be too sophisticated and clever, or put in too much detail.

The second rule of communication is to know what you want to achieve. Hopefully the aim is to encourage open debate, and informed decision-making. But there seems no harm in repeating yet again that numbers do not speak for themselves; the context, language and graphic design all contribute to the way the communication is received. We have to acknowledge we are telling a story, and it is inevitable that people will make comparisons and judgements, no matter how much we only want to inform and not persuade. All we can do is try to pre-empt inappropriate gut reactions by design or warning.

 

 

Storytelling with Statistics


This chapter has introduced the concept of data visualization, sometimes known as dataviz. These techniques are often used for researchers, or for fairly sophisticated audiences, using a standard armoury of plots that are selected for their value in gaining understanding and for exploring the data, rather than their purely visual appeal. When we have worked out the important messages in the data that we want to communicate, we might then go on to use infographics, or infoviz, to grab the attention of the audience and tell a good story.

Sophisticated infographics regularly appear in the media, but Figure 2.10 shows a fairly basic example which tells a strong story of social trends by bringing together the responses to three questions in the UK’s National Survey of Sexual Attitudes and Lifestyles (Natsal-3) in 2010: at what age did women and men first have sex, first start co-habiting, and have their first child?8 The median ages for each of these life events are plotted against the women’s year of birth, and the three points connected with a heavy vertical line. The steady lengthening of this line between women born in the 1930s and those in the 1970s displays the increased period in which effective contraception is necessary.

 

 

Figure 2.10

Infographic based on data from the third UK National Survey of Sexual Attitudes and Lifestyles (Natsal-3)—the lesson from the data is pointed out both visually and verbally.

 

 

Even more advanced are dynamic graphics, in which movement can be used to reveal patterns in the changes over time. The master of this technique was Hans Rosling, whose TED talks and videos set a new standard of storytelling with statistics, for example by showing the relationship between changing wealth and health through the animated movement of bubbles representing each country’s progress from 1800 to the present day. Rosling used his graphics to try to correct misconceptions about the distinction between ‘developed’ and ‘undeveloped’ countries, with the dynamic plots revealing that, over time, almost all countries moved steadily along a common path towards greater health and prosperity.*9


This chapter has demonstrated a continuum from simple descriptions and plots of raw data, through to complex examples of storytelling with statistics. Modern computing means that data visualization is becoming easier and more flexible, and since summary statistics can hide as well as illuminate, appropriate graphical displays are essential. Nevertheless, summarizing and communicating the raw numbers is only the first stage in the process of learning from data. To get further along this path, we need to address the fundamental idea of what we are trying to achieve in the first place.

 

 

Summary


• A variety of statistics can be used to summarize the empirical distribution of data-points, including measures of location and spread.

• Skewed data distributions are common, and some summary statistics are very sensitive to outlying values.

• Data summaries always hide some detail, and care is required so that important information is not lost.

• Single sets of numbers can be visualized in strip-charts, box-and-whisker plots and histograms.

• Consider transformations to better reveal patterns, and use the eye to detect patterns, outliers, similarities and clusters.

• Look at pairs of numbers as scatter-plots, and time-series as line-graphs.

• When exploring data, a primary aim is to find factors that explain the overall variation.

• Graphics can be both interactive and animated.

• Infographics highlight interesting features and can guide the viewer through a story, but should be used with awareness of their purpose and their impact.
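The point about sensitivity to outlying values can be illustrated with a small sketch in Python, using the standard `statistics` module. The helper `location_summaries` and both data sets are hypothetical, invented purely for illustration and not taken from any data discussed in this chapter.

```python
from statistics import mean, median


def location_summaries(values):
    """Return two measures of location for a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


# Invented data for illustration only.
typical = [2, 3, 3, 4, 5, 5, 6]
with_outlier = typical + [100]  # the same values plus one extreme observation

before = location_summaries(typical)
after = location_summaries(with_outlier)

print(before)
print(after)
```

In this example the single extreme value moves the mean from 4 to 16, while the median shifts only from 4 to 4.5, which is why the median is often the safer summary of location for skewed data.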

 

 

CHAPTER 3


Why Are We Looking at Data Anyway? Populations and Measurement
