The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter

Table 10.2

Observed and expected (in parentheses) counts of arm-crossing by gender: expected counts are calculated under the null hypothesis that arm-crossing is not associated with gender.


It is clear from Table 10.2 that the observed and expected counts are fairly similar, reflecting that the data are just about what we would expect under the null hypothesis. The chi-squared statistic is an overall measure of the dissimilarity between the observed and expected counts (its formula is given in the Glossary), and has the value 0.02. The P-value corresponding to this statistic, available from standard software, is 0.90, showing no evidence against the null hypothesis. It is reassuring that this P-value is essentially the same as the one obtained from the ‘exact’ test based on the hypergeometric distribution.
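As a rough illustration of how observed and expected counts combine into a chi-squared statistic, the following Python sketch computes the expected counts for a 2×2 table under the null hypothesis of no association, then the statistic and its P-value. The counts in `observed` are hypothetical placeholders and are not the values in Table 10.2; the function name is invented for this example. The optional continuity correction is one common adjustment for 2×2 tables, and whether the figures quoted above rely on it is not stated in the text.

```python
import math

def chi_squared_2x2(observed, continuity_correction=False):
    """Chi-squared test of association for a 2x2 table of counts.

    Returns (expected, statistic, p_value). Each expected count is
    row total * column total / grand total, which is what independence
    between rows and columns implies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if continuity_correction:
                # Shrink each discrepancy by 0.5 (never below zero).
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / exp

    # A 2x2 table has 1 degree of freedom, for which the chi-squared
    # tail area has the closed form erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value

# Hypothetical counts: rows are two groups, columns are two outcomes.
observed = [[13, 17], [19, 31]]
expected, statistic, p_value = chi_squared_2x2(observed)
```

When the observed counts sit close to the expected ones, as here, the statistic is small and the P-value is large, which is the pattern described for Table 10.2.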

The development and use of test statistics and P-values has traditionally formed much of a standard statistics course, and has unfortunately given the field a reputation for being largely about picking the right formula and using the right tables. Although this book tries to take a broader perspective on the subject, it is nevertheless valuable to revisit the examples we have discussed throughout the book with regard to their statistical significance.


1. Does the daily number of homicides in the UK follow a Poisson distribution?

 

Figure 8.5 showed, for England and Wales between 2014 and 2016, the observed counts of days with different numbers of homicides. There were a total of 1,545 incidents over 1,095 days, an average of 1.41 per day, and under the null hypothesis of a Poisson distribution with this mean, we would expect the counts shown in the final column of Table 10.3. Adapting the approach used for the analysis in Table 10.2, the discrepancy between the observed and expected counts can be summarized by a chi-squared goodness-of-fit test statistic—again, see the Glossary for details.
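The expected counts under the Poisson null hypothesis can be sketched in a few lines of Python. The totals (1,545 incidents over 1,095 days) come from the text; the choice to pool everything from six incidents upwards into a final category is an assumption made for this example, as are the function names. The observed counts from Table 10.3 are not reproduced here, so the sketch stops short of computing the test statistic on real data.

```python
import math

def poisson_expected_days(total_incidents, total_days, max_count):
    """Expected number of days with 0, 1, ..., max_count - 1 incidents,
    followed by a final 'max_count or more' category, under a Poisson
    model whose mean is the observed average per day."""
    mean = total_incidents / total_days
    expected = []
    for k in range(max_count):
        prob = math.exp(-mean) * mean ** k / math.factorial(k)
        expected.append(total_days * prob)
    # Whatever probability remains goes to the open-ended last category.
    expected.append(total_days - sum(expected))
    return mean, expected

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Totals quoted in the text; max_count=6 is an assumed grouping.
mean, expected = poisson_expected_days(1545, 1095, max_count=6)
```

Passing the observed day counts and `expected` to `chi_squared_statistic` would give the goodness-of-fit statistic; turning that into a P-value needs the chi-squared distribution with the appropriate degrees of freedom, which standard statistical software provides.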

The observed P-value of 0.96 is not significant, so there is no evidence to reject the null hypothesis (in fact the fit is so good as to almost be suspicious). Of course we should not then assume the null hypothesis is precisely true, but it should be reasonable to use it as an assumption when assessing, for example, the changes in homicide rates seen in Chapter 9.


2. Has the unemployment rate in the UK changed in the recent past?

 

In Chapter 7 we saw that a quarterly change in unemployment of −3,000 had a margin of error of ±77,000, based on ±2 standard errors. This means the 95% confidence interval runs from −80,000 to +74,000 and clearly contains the value 0, corresponding to no change in unemployment. The fact that this 95% interval includes 0 is logically equivalent to the point estimate (−3,000) being less than 2 standard errors from 0, meaning the change is not significantly different from 0.

This reveals the essential identity between hypothesis testing and confidence intervals:

• A two-sided P-value is less than 0.05 if the 95% confidence interval does not include the value specified by the null hypothesis (generally 0).

• A 95% confidence interval is the set of null hypotheses that are not rejected at P < 0.05.
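The equivalence can be checked numerically for the unemployment example. This is a minimal Python sketch using the figures quoted above (an estimate of −3,000 with a margin of error of 77,000, taken as 2 standard errors); the function name is invented here, and the P-value uses a normal approximation, which is an assumption rather than something stated in the text. Note that "2 standard errors" is a rounding of the more exact 1.96, so the two criteria can disagree in rare borderline cases.

```python
import math

def interval_and_test(estimate, margin_of_error, null_value=0.0):
    """Relate an 'estimate ± 2 standard errors' interval to a two-sided test."""
    standard_error = margin_of_error / 2
    lower = estimate - margin_of_error
    upper = estimate + margin_of_error
    # Distance of the estimate from the null value, in standard errors.
    z = (estimate - null_value) / standard_error
    # Two-sided P-value under a normal approximation (an assumption).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (lower, upper), z, p_value

interval, z, p_value = interval_and_test(-3000, 77000)
null_inside = interval[0] <= 0 <= interval[1]   # does the interval contain 0?
within_two_se = abs(z) < 2                      # is the estimate within 2 SEs of 0?
```

Here `null_inside` and `within_two_se` are both true: the two statements are the same check written in two ways.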

 

Table 10.3

Observed and expected days with specified number of homicide incidents in England and Wales, April 2014 to March 2016. A chi-squared goodness-of-fit test has a P-value of 0.96, indicating no evidence against the null hypothesis of a Poisson distribution.


This intimate link between hypothesis testing and confidence intervals should stop people misinterpreting results that are not statistically significantly different from 0—this does not mean that the null hypothesis is actually true, but simply that a confidence interval for the true value includes 0. Unfortunately, as we shall see later, this lesson is often ignored.


3. Does taking statins reduce the risk of heart attacks and strokes in people like me?

 

Table 10.4 repeats the results from the Heart Protection Study (HPS) previously shown in Table 4.1, but adds columns showing the confidence with which the benefits have been established. There is a close connection between the standard errors, the confidence intervals and the P-values. The confidence intervals for the risk reduction are roughly the estimate ± 2 standard errors (note the HPS rounds the relative reductions to whole numbers). The confidence intervals easily exclude the null hypothesis of 0%, corresponding to no effect of the statin, and so the P-values are very small—in fact the P-value for the 27% reduction in heart attacks is around 1 in 3 million. This is the consequence of carrying out such a massive study.

Other summary statistics might be used, such as the difference in absolute risks, but should all give similar P-values. The HPS researchers focus on the proportional reduction since it is fairly constant across subgroups, and therefore makes a good single summary measure. There are a number of different ways of calculating the confidence intervals, although these should only produce minor differences.

Table 10.4

The results reported at the end of the Heart Protection Study, showing the estimated relative effects, their standard errors, confidence intervals and P-values testing the null hypothesis of ‘no effect’.


4. Are mothers’ heights associated with their sons’ heights, once the fathers’ heights are taken into account?

 

In Chapter 5 we demonstrated a multiple linear regression with son’s height as the response (dependent) variable, and mother’s and father’s height as explanatory (independent) variables. The coefficients were shown in Table 5.3, but without any consideration of whether they could be considered significantly different from 0. To illustrate the way these results appear in statistical software, Table 10.5 reproduces the form of the output from the popular (free) R program.

As in Table 5.3, the intercept is the average of the sons’ heights, and the coefficients (labelled ‘Estimates’ in the output) represent the expected change in a son’s height for each one-inch difference between his mother’s or father’s height and the average height of the mothers or fathers. The standard error is calculated from a known formula, and is clearly small relative to the size of the coefficients.

The t-value, also known as a t-statistic, is a major focus of attention, since it is the link that tells us whether the association between an explanatory variable and the response is statistically significant. The t-value is a special case of what is known as a Student’s t-statistic. ‘Student’ was the pseudonym of William Gosset, who developed the method in 1908 while on secondment at University College London from the Guinness brewery in Dublin—they wanted to preserve their employee’s anonymity. The t-value is simply the estimate/standard error (this can be checked for the numbers in Table 10.5), and so can be interpreted as how far the estimate is away from 0, measured in the number of standard errors. Given a t-value and the sample size, the software can provide a precise P-value; for large samples, t-values greater than 2 or less than −2 correspond to P < 0.05, although these thresholds will be larger for smaller sample sizes. R uses a simple star system for P-values, from one * indicating P < 0.05, up to three stars *** indicating P < 0.001. In Table 10.5 the t-values are so large that the P-values are vanishingly small.
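The arithmetic behind a t-value and R's star notation can be sketched as follows. The estimate and standard error below are hypothetical and are not the values in Table 10.5; the function names are invented for this example. The P-value uses a normal approximation, which is reasonable only for large samples (software uses the t-distribution with the appropriate degrees of freedom). The star thresholds follow the legend R prints beneath its coefficient table, which also includes ‘**’ for P < 0.01 and ‘.’ for P < 0.1.

```python
import math

def t_value(estimate, standard_error):
    """How far the estimate is from 0, measured in standard errors."""
    return estimate / standard_error

def approx_two_sided_p(t):
    """Two-sided P-value from a normal approximation (large samples only)."""
    return math.erfc(abs(t) / math.sqrt(2))

def significance_stars(p_value):
    """Map a P-value to the codes in R's 'Signif. codes' legend."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""

# Hypothetical coefficient and standard error, for illustration only.
t = t_value(0.30, 0.05)
p = approx_two_sided_p(t)
stars = significance_stars(p)
```

Under this approximation a t-value of 2 gives a P-value just under 0.05, in line with the rule of thumb quoted above for large samples.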

Table 10.5

A reproduction of the output in R of a multiple regression using Galton’s data, with son’s height as the response variable, and mother’s and father’s height as explanatory variables. The t-value is the estimate divided by the standard error. The column headed Pr(>|t|) represents a two-sided P-value: the probability of getting such a large t-value, either positive or negative, under the null hypothesis that the true relationship is 0. The notation ‘2e-16’ means the P-value is less than 0.0000000000000002 (that is, 15 zeros after the decimal point). The final line shows the interpretations of the stars in terms of P-values.
