16.1 The Bootstrap Idea

Here is the example we will use to introduce the bootstrap approach.

Example 16.1 Average time looking at a Facebook profile.

Data set icon for face4.

In Example 12.18 (page 624), we compared the amount of time a Facebook user spends reading different types of profiles. Here, let’s focus on just the average time for the fourth profile (negative male). Figure 16.1 gives a histogram and Normal quantile plot of the 21 observations. The data are skewed to the right. Given the relatively small sample size, we have some concerns about using the t confidence interval procedures for these data.


Figure 16.1 Histogram and Normal quantile plot of the distribution of viewing times (in minutes) looking at a negative male Facebook profile page, Example 16.1. The data are right skewed.

The big idea: Resampling and the bootstrap distribution

Statistical inference is based on the sampling distributions of sample statistics. A sampling distribution is based on many random samples from the population. The bootstrap is a way of finding the sampling distribution, at least approximately, from just one sample. We first encountered the bootstrap in Chapter 7 (pages 403–404) when discussing nonparametric procedures. Here is a summary of the method in terms of our Facebook profile example:

Step 1: Resampling. In Example 16.1, we have just one simple random sample (SRS) of 21 cases. In place of many samples from the population, create many resamples by repeatedly sampling with replacement cases from this one SRS. Each resample is the same size as the original SRS.

Sampling with replacement means that after we randomly draw a case from the original sample, we put it back before drawing the next case. Think of drawing a number from a hat and then putting it back before drawing from the hat again. As a result, any number in the hat can be drawn more than once. If we sampled without replacement, we’d get the same set of numbers we started with, though in a different order. Figure 16.2 illustrates three resamples from an SRS of five cases. In practice, we draw hundreds or thousands of resamples, not just three.
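The resampling step can be sketched in a few lines of code. This is a minimal Python illustration (the text's own software examples use R); the five values are hypothetical, standing in for the SRS of size 5 in Figure 16.2.

```python
import random

random.seed(1)  # fix the seed so the resamples are reproducible

# A hypothetical SRS of n = 5 cases (any five numbers would do).
srs = [3.1, 5.4, 2.8, 6.0, 4.7]

# Draw three resamples, each the same size as the original SRS.
# random.choices samples WITH replacement, so values can repeat.
for i in range(3):
    resample = random.choices(srs, k=len(srs))
    mean = sum(resample) / len(resample)
    print(f"resample {i + 1}: {resample}  mean = {mean:.2f}")
```

Because the draws are with replacement, some original values typically appear more than once in a resample and others not at all, exactly as in Figure 16.2.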


Figure 16.2 The resampling idea. The top box is an SRS of size n=5. The three lower boxes are three resamples from this original sample. Some values from the original sample are repeated in the resamples because each resample is formed by sampling with replacement. We calculate the statistic of interest—the sample mean in this example—for the original sample and each resample.

Step 2: Bootstrap distribution. The sampling distribution of a statistic describes the values taken by the statistic in all possible samples of the same size from the population. The bootstrap distribution of a statistic summarizes the values taken by the statistic in all possible resamples of the same size. The bootstrap distribution gives information (i.e., shape and spread) about the sampling distribution.

Example 16.2 Bootstrap distribution of the mean time looking at a Facebook profile.

Data set icon for face4.

In Example 16.1, we want to estimate the average viewing time of a negative male Facebook profile μ, so we’ll use the sample mean x¯ as our statistic. For our one sample of 21 subjects, x¯=7.87 minutes. When we resample, we get different values of x¯, just as we would if we randomly sampled a new group of subjects to study.

We randomly generated 3000 resamples for these data. The mean for the resamples is 7.89 minutes, and the standard deviation is 1.22 minutes. Figure 16.3(a) gives a histogram of the bootstrap distribution of the means of 3000 resamples from the viewing time data. The Normal density curve with the mean 7.89 and standard deviation 1.22 is superimposed on the histogram. A Normal quantile plot is given in Figure 16.3(b). The Normal curve fits the data well, but some skewness is still present.
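The resampling in Example 16.2 was done with software. As a hedged Python sketch of the same computation (the full 21 viewing times are not listed in the text, so the six-value subset from Check-in question 16.1 stands in for them):

```python
import random
import statistics

random.seed(16)

# The six viewing times (minutes) from Check-in question 16.1.
times = [10.10, 7.95, 4.38, 8.60, 1.32, 12.25]

B = 3000  # number of resamples, as in Example 16.2
resample_means = [
    statistics.mean(random.choices(times, k=len(times))) for _ in range(B)
]

# Center and spread of the bootstrap distribution.
mean_boot = statistics.mean(resample_means)
se_boot = statistics.stdev(resample_means)
print(f"original mean  = {statistics.mean(times):.2f}")
print(f"bootstrap mean = {mean_boot:.2f}, bootstrap SE = {se_boot:.2f}")
```

A histogram and Normal quantile plot of `resample_means` would play the role of Figure 16.3.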


Figure 16.3 (a) The bootstrap distribution of 3000 resample means from the sample of Facebook profile viewing time data. The smooth curve is the Normal density function for the distribution, which matches the mean and standard deviation of the distribution of the resample means. (b) The Normal quantile plot confirms that the bootstrap distribution is slightly skewed to the right but fits the Normal distribution quite well.

According to the bootstrap idea, the bootstrap distribution represents the sampling distribution.

Example 16.3 Assessing the bootstrap distribution.

Let’s compare the bootstrap distribution with what we know about the sampling distribution of x¯.

Shape: We see that the bootstrap distribution is nearly Normal. The central limit theorem says that the sampling distribution of the sample mean x¯ is approximately Normal if n is large. So the bootstrap distribution shape is close to the shape we expect the sampling distribution to have for large enough n.

Center: The bootstrap distribution is centered close to the mean of the original sample: 7.89 minutes for the bootstrap distribution versus 7.87 minutes for the original sample. Therefore, the mean of the bootstrap distribution has little bias as an estimator of the mean of the original sample. We know that the sampling distribution of x¯ is centered at the population mean μ; that is, x¯ is an unbiased estimate of μ. So the resampling distribution behaves (starting from the original sample) as we expect the sampling distribution to behave (starting from the population).

Spread: The histogram and density curve in Figure 16.3(a) picture the variation among the resample means. We can get a numerical measure by calculating their standard deviation. Because this is the standard deviation of the 3000 values of x¯ that make up the bootstrap distribution, we call it the bootstrap standard error of x¯. The numerical value is 1.22. In fact, we know that the standard deviation of x¯ is σ/√n, where σ is the standard deviation of individual observations in the population. Our usual estimate of this quantity is the standard error of x¯, s/√n, where s is the standard deviation of our one SRS. For these data, s = 5.65 and

s/√n = 5.65/√21 = 1.23

The bootstrap standard error 1.22 is close to the theory-based estimate 1.23.
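The arithmetic in this comparison is easy to verify directly, using the values s = 5.65 and n = 21 given above:

```python
import math

# Theory-based estimate of the standard deviation of x-bar,
# using s = 5.65 and n = 21 from Example 16.3.
s, n = 5.65, 21
se = s / math.sqrt(n)
print(round(se, 2))  # rounds to 1.23
```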

In discussing Example 16.2, we used statistical theory to describe the sampling distribution of the sample mean x¯. We found that the bootstrap distribution created by resampling matches the properties of this sampling distribution. The heavy computation needed to produce the bootstrap distribution replaces the heavy theory (central limit theorem, mean, and standard deviation of x¯) used to describe the sampling distribution.

The great advantage of the resampling idea is that it often works in settings where theory does not apply. Of course, theory also has its advantages: we know exactly when it works. We don’t know exactly when resampling works, so “When can I safely bootstrap?” is a somewhat subtle question (see Section 16.3).

Figure 16.4 illustrates the bootstrap idea by comparing three distributions. Figure 16.4(a) shows the idea of the sampling distribution of the sample mean x¯: take many random samples from the population, calculate the mean x¯ for each sample, and collect these x¯-values into a distribution.


Figure 16.4 (a) The idea of the sampling distribution of the sample mean x¯: take very many samples, collect the x¯ values from each, and look at the distribution of these values. (b) The theory shortcut: if we know that the population values follow a Normal distribution, theory tells us that the sampling distribution of x¯ is also Normal. (c) The bootstrap idea: when theory fails and we can afford only one sample, that sample stands in for the population, and the distribution of x¯ in many resamples stands in for the sampling distribution.

Figure 16.4(b) shows how traditional inference works: statistical theory tells us that if the population has a Normal distribution, then the sampling distribution of x¯ is also Normal. If the population is not Normal but our sample is large, we can use the central limit theorem. If μ and σ are the mean and standard deviation of the population, the sampling distribution of x¯ has mean μ and standard deviation σ/√n. When it is available, theory is wonderful: we know the sampling distribution without the impractical task of actually taking many samples from the population.

Figure 16.4(c) shows the bootstrap idea: we avoid the task of taking many samples from the population by instead taking many resamples from a single sample. The values of x¯ from these resamples form the bootstrap distribution. We use the bootstrap distribution rather than theory to learn about the sampling distribution.

Check-in
  1. 16.1 A small bootstrap example. To illustrate the bootstrap procedure, let’s bootstrap a small random subset of the Facebook profile data: Data set icon for face46.

    10.10  7.95  4.38  8.60  1.32  12.25
    1. Sample with replacement from this initial SRS by rolling a die. Rolling a 1 means select the first member of the SRS, a 2 means select the second member, and so on. (You can also use Table B of random digits and respond only to digits 1 to 6.) Create 20 resamples of size n=6.

    2. Calculate the sample mean for each of the resamples.

    3. Make a stemplot of the means of the 20 resamples. This is an estimate of the bootstrap distribution.

    4. Calculate the bootstrap standard error.

  2. 16.2 Standard deviation versus standard error. Explain the difference between the standard deviation of a sample and the standard error of a statistic such as the sample mean.

Thinking about the bootstrap idea

It might appear that resampling creates new data out of nothing. Even the name “bootstrap” comes from the impossible image of “pulling yourself up by your own bootstraps.”2 However, the resamples are not used as if they were new data. The bootstrap distribution of the resample means is used only to estimate how the sample mean of one actual sample of size 21 would vary because of random sampling.

Using the same data for two purposes, to estimate a parameter and also to estimate the variability of the estimate, is perfectly legitimate. We do exactly this when we calculate x¯ to estimate μ and then calculate s/√n from the same data to estimate the variability of x¯.

What is new? First of all, we don’t rely on the formula s/√n to estimate the standard deviation of x¯. Instead, we use the ordinary standard deviation of the many x¯-values from our many resamples.3 Suppose that we take B resamples and call the means of these resamples x¯* to distinguish them from the mean x¯ of the original sample. We would then find the mean and standard deviation of the x¯*’s in the usual way.

To make clear that these are the mean and standard deviation of the means of the B resamples rather than the mean x¯ and standard deviation s of the original sample, we use a distinct notation:

meanboot = (1/B) ∑ x¯*        SEboot = √[ (1/(B − 1)) ∑ (x¯* − meanboot)² ]

These formulas go all the way back to Chapter 1. Once we have the values x¯*, we can just ask our software for their mean and standard deviation.
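As a check that these formulas really are just the familiar mean and standard deviation from Chapter 1, here is a direct Python translation applied to a tiny hypothetical set of B = 5 resample means:

```python
import math

# Hypothetical resample means x-bar* from B = 5 resamples.
xbar_star = [7.1, 8.0, 7.6, 8.4, 7.4]
B = len(xbar_star)

# meanboot = (1/B) * sum of the x-bar* values
mean_boot = sum(xbar_star) / B

# SEboot = sqrt( 1/(B - 1) * sum of (x-bar* - meanboot)^2 )
se_boot = math.sqrt(sum((x - mean_boot) ** 2 for x in xbar_star) / (B - 1))

print(mean_boot, round(se_boot, 3))
```

With real bootstrap output, B would be in the hundreds or thousands; the formulas are the same.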

Because we will often apply the bootstrap to statistics other than the sample mean, here is the general definition: the bootstrap standard error of a statistic is the standard deviation of the bootstrap distribution of that statistic.

Second, we don’t appeal to the central limit theorem or other theory to tell us that a sampling distribution is roughly Normal. We simply look at the bootstrap distribution using the methods of Chapter 1 to see if it is roughly Normal (or not). What we find determines how to proceed in constructing a confidence interval.

In summary, the bootstrap allows us to calculate standard errors for statistics for which we don’t have formulas and to check Normality of the sampling distribution for statistics that theory doesn’t easily handle. To apply the bootstrap idea, we must start with a statistic that estimates the parameter we are interested in. We come up with a suitable statistic by appealing to another principle that we have often applied without thinking about it.

This principle tells us to estimate a population mean μ by the sample mean x¯ and a population standard deviation σ by the sample standard deviation s. Estimate a population median by the sample median and a population regression line by the least-squares line calculated from a sample. The bootstrap idea itself is a form of the plug-in principle: substitute the data for the population and then draw samples (resamples) to mimic the process of building a sampling distribution.
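The plug-in principle and the bootstrap combine naturally for statistics without a simple standard error formula. As a hedged sketch, here is a Python version for the sample median, using the six viewing times from Check-in question 16.1 (there is no s/√n-style formula for the median, but the bootstrap recipe is unchanged):

```python
import random
import statistics

random.seed(3)

# Six viewing times (minutes) from Check-in question 16.1.
times = [10.10, 7.95, 4.38, 8.60, 1.32, 12.25]

# Plug-in estimate: the sample median stands in for the population median.
sample_median = statistics.median(times)

# Bootstrap standard error of the median: the standard deviation
# of the medians of many resamples.
B = 2000
medians = [
    statistics.median(random.choices(times, k=len(times))) for _ in range(B)
]
se_boot = statistics.stdev(medians)

print(f"median = {sample_median:.2f}, bootstrap SE = {se_boot:.2f}")
```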

Using software

Software is essential for bootstrapping in practice. Here is an outline of the program you would write if your software can choose random samples from a set of data but does not have bootstrap functions:

  1. Repeat B times

    1. Draw a resample with replacement from the data.

    2. Calculate the resample statistic.

    3. Save the resample statistic into a variable.

  2. Make a histogram and Normal quantile plot of the B resample statistics.
  3. Calculate the mean and standard deviation of the B statistics.
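The outline above translates almost line for line into code. This is a minimal Python sketch (the text goes on to use R's boot package instead); step 2, the plotting, is omitted here:

```python
import random
import statistics

def bootstrap(data, stat, B=1000, seed=None):
    """Return the B resample statistics for `stat` applied to `data`."""
    rng = random.Random(seed)
    stats = []
    for _ in range(B):                              # 1. Repeat B times:
        resample = rng.choices(data, k=len(data))   # 1a. draw with replacement
        stats.append(stat(resample))                # 1b-c. compute and save
    return stats

# 3. Summarize the B resample statistics.
data = [10.10, 7.95, 4.38, 8.60, 1.32, 12.25]  # Check-in 16.1 subset
results = bootstrap(data, statistics.mean, B=1000, seed=0)
print(statistics.mean(results), statistics.stdev(results))
```

Any statistic can be passed in place of `statistics.mean`, which is what makes the recipe so general.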
Example 16.4 Using software.

Data set icon for face4.

R has packages that contain various bootstrap functions, so we do not have to write them ourselves. If the 21 viewing times are saved as a variable, we can use functions to resample from the data, calculate the means of the resamples, and request both graphs and printed output. We can also ask that the bootstrap results be saved for later access.

The function plot within the package boot will generate graphs similar to those in Figure 16.3 so you can assess Normality. Figure 16.5 contains the default output from a call of the function boot. The variable Time contains the 21 viewing times, the function theta is specified to be the mean, and we request 3000 resamples. The original entry gives the mean x¯=7.87 of the original sample. Bias is the difference between the mean of the resample means and the original mean. If we add the entries for bias and original, we get the mean of the resample means, meanboot:

7.87+0.02=7.89

The bootstrap standard error is displayed under std. error. All these values except original will differ slightly if you take another 3000 resamples because the resamples are drawn at random.
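The three quantities in the R boot output (original, bias, and std. error) can be reproduced by hand. This Python sketch mirrors those fields; since the full 21 viewing times are not listed in the text, the six-value subset from Check-in question 16.1 stands in:

```python
import random
import statistics

random.seed(42)

# Six viewing times from Check-in 16.1, standing in for the 21-case sample.
times = [10.10, 7.95, 4.38, 8.60, 1.32, 12.25]

B = 3000
means = [statistics.mean(random.choices(times, k=len(times))) for _ in range(B)]

original = statistics.mean(times)          # "original" in the R boot output
bias = statistics.mean(means) - original   # "bias": mean of resample means minus original
std_error = statistics.stdev(means)        # "std. error": bootstrap standard error

print(f"original = {original:.2f}, bias = {bias:+.3f}, std. error = {std_error:.2f}")
```

As with the R output, the bias and standard error vary slightly from run to run because the resamples are random.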


Figure 16.5 R output of the bootstrap applied to the Facebook profile viewing time data, Example 16.4.

Section 16.1 SUMMARY

  • To bootstrap a statistic such as the sample mean, draw with replacement hundreds or thousands of resamples of the same size from a single original sample and calculate the statistic for each resample. This collection of resample statistics approximates the bootstrap distribution, which summarizes the values of all possible resample statistics from the single original sample.

  • A bootstrap distribution approximates the sampling distribution of the statistic, usually sharing the same shape and spread. It is centered at the statistic (from the original sample) when the sampling distribution is centered at the parameter (of the population). The bootstrap standard error is the standard deviation of the bootstrap distribution.

  • Use graphical and numerical summaries to determine whether the bootstrap distribution is approximately Normal and centered at the original statistic and to assess its spread. This information is used to determine the appropriate bootstrap confidence interval.

  • If we take B resamples and call the means of these resamples x¯* to distinguish them from the mean x¯ of the original sample, we find the mean and standard deviation of the x¯*’s in the usual way:

    meanboot = (1/B) ∑ x¯*        SEboot = √[ (1/(B − 1)) ∑ (x¯* − meanboot)² ]

    The bootstrap standard error of x¯ is an estimate of σ/√n, the standard deviation of x¯.

  • Using the sample mean to estimate the population mean is an example of the plug-in principle: use a quantity based on the sample to approximate a similar quantity from the population.

Section 16.1 EXERCISES

  1. 16.1 What’s wrong? For each of the following, explain what is wrong and why.

    1. The standard deviation of the bootstrap distribution will be approximately the same as the standard deviation of the original sample.

    2. The bootstrap distribution is created by resampling without replacement from the original sample.

    3. When generating the resamples, it is best to use a sample size smaller than the size of the original sample.

    4. The bootstrap distribution is created by resampling with replacement from the population.

  2. 16.2 Describing the method to obtain resamples. Suppose an SRS of size n=10 is obtained from the undergraduates at a large public university to measure the average stress level during dead week. Carefully detail two ways to randomly select resamples in this setting.

  3. 16.3 Gosset’s data on double stout sales. William Sealy Gosset worked at the Guinness Brewery in Dublin and made substantial contributions to the practice of statistics. In Exercise 1.37 (page 44), we examined Gosset’s data on the change in the double stout market before and after World War I (1914–1918). Here are the data for a sample of six of the regions in the original data: Data set icon for stout6.

    Bristol 94 Glasgow  66
    English P 46 Liverpool 140
    English Agents 78 Scottish  24

    1. Do you think that these data appear to be from a Normal distribution? Give reasons for your answer.

    2. Select five resamples from this set of data, making sure to describe how this sampling was performed.

    3. Compute the mean for each resample.

    4. Find the bootstrap standard error.

  4. 16.4 More on the bootstrap standard error. Refer to your work in the previous exercise.

    1. Do you expect your bootstrap standard error to be larger, smaller, or approximately equal to the standard deviation of the original sample of six regions? Explain your answer.

    2. Would your answer change if the bootstrap standard error were based on 500 resamples instead of 5? Explain your answer.

  5. 16.5 Assessing the bootstrap distribution. Refer to the data in Exercise 16.3. Figure 16.6 gives a histogram and a Normal quantile plot of 3000 resample means (labeled t*). What do these plots tell you about the sampling distribution of x¯ when n=6?


    Figure 16.6 R graphical output for the percent change in double stout sales bootstrap, Exercise 16.5.

  6. 16.6 Assessing another bootstrap distribution. Refer to the data in Check-in question 16.1 (page 16-6). Figure 16.7 gives a histogram and a Normal quantile plot of 3000 resample means (labeled t*). What do these plots tell you about the sampling distribution of x¯ when n=6?


    Figure 16.7 R graphical output for the Facebook viewing time bootstrap, Exercise 16.6.

  7. 16.7 Interpreting the output. Figure 16.8 gives output from R for the sample of ratios (as a percent) in Exercise 16.3. Summarize the results of the analysis using this output.


    Figure 16.8 R output for the percent change in double stout sales bootstrap, Exercise 16.7.

  8. 16.8 Interpreting bootstrap output. Figure 16.9 gives output from R for the sample of six viewing times in Check-in question 16.1 (page 16-6). Summarize the results of the analysis using this output.


    Figure 16.9 R output for the Facebook viewing time bootstrap, Exercise 16.8.

    Inspecting the bootstrap distribution of a statistic helps us judge whether the sampling distribution of the statistic is close to Normal. For each of the data sets in Exercises 16.9 through 16.12, determine whether you’d be comfortable using inference methods that rely on Normality by bootstrapping the sample mean x¯ using 2000 resamples. Construct a histogram and a Normal quantile plot to support your answer.

  9. 16.9 Bootstrap distribution of the time to start a business. We examined the distribution of the time to start a business for 187 countries in Example 1.43 (page 60). The distribution is clearly skewed and has an outlier. We view these data as coming from a process that gives times to start a business around the globe. Data set icon for tts.

  10. 16.10 Bootstrap distribution of average IQ score. The distribution of the 60 IQ test scores in Table 1.1 (page 15) is roughly Normal (see Figure 1.7), and the sample size is large enough that we expect a Normal sampling distribution. Data set icon for IQ.

  11. 16.11 Bootstrap distribution of delivery time from COSI. A random sample of eight times (in minutes) it took a delivery robot to bring you lunch from COSI to your dormitory (Example 7.1, page 387) are

    13.7  26.3  20.0  45.3  8.5  43.6  10.1  17.3

    The distribution has no outliers, but we cannot comfortably assess Normality from such a small sample. Data set icon for botdel.

  12. 16.12 Bootstrap distribution of average audio file length. The lengths (in seconds) of audio files found on an iPod (Table 7.3, page 399) are skewed. We previously transformed the data prior to using t procedures. Data set icon for songs.

  13. 16.13 Standard error versus the bootstrap standard error. We have two ways to estimate the standard deviation of a sample mean x¯: use the formula s/√n for the standard error or use the bootstrap standard error.

    1. Find the sample standard deviation s for the 60 IQ test scores in Exercise 16.10 and use it to find the standard error s/√n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.10? Data set icon for IQ.

    2. Find the sample standard deviation s for the time to start a business data in Exercise 16.9 and use it to find the standard error s/√n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.9? Data set icon for tts.

    3. Find the sample standard deviation s for the eight delivery times in Exercise 16.11 and use it to find the standard error s/√n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.11? Data set icon for botdel.

  14. 16.14 Visit lengths to a statistics help room. Table 5.1 (page 283) gives the length (in minutes) for a sample of 50 visits to a statistics help room. See Example 5.5 (page 283) for more details about these data. Data set icon for help50.

    1. Make a histogram of the 50 visit lengths. Describe the shape. Is it similar to the distribution of all 1264 recorded visits?

    2. The central limit theorem says that the sampling distribution of the sample mean x¯ becomes Normal as the sample size increases. Is the sampling distribution roughly Normal for n=50? To find out, bootstrap this SRS using 1000 resamples and inspect the bootstrap distribution of the mean. What do you conclude?

  15. 16.15 More on help room visit lengths. Here is an SRS of 10 of the visit lengths from Exercise 16.14: Data set icon for help10.

    2035142416015030516555

    We expect the sampling distribution of x¯ to be less close to Normal for samples of size 10 than for samples of size 50 from a skewed distribution.

    1. Create and inspect the bootstrap distribution of the sample mean for these data using 1000 resamples. Compared with your distribution from the previous exercise, is this distribution closer to or farther away from Normal?

    2. Compare the bootstrap standard errors for your two sets of resamples. Why is the standard error larger for the smaller SRS?