16.2 First Steps in Using the Bootstrap

To introduce the key ideas of resampling and bootstrap distributions, we studied an example in which we knew quite a bit about the actual sampling distribution (i.e., the sampling distribution of x¯ for large n). We saw that the bootstrap distribution agrees with the sampling distribution in shape and spread.

The center of the bootstrap distribution is not the same as the center of the sampling distribution. The sampling distribution of a statistic used to estimate a parameter is centered at the actual value of the parameter in the population, plus any bias. The bootstrap distribution is centered at the value of the statistic for the original sample, plus any bias. A key fact is that the two biases are similar in size even though the two centers may differ.

The bootstrap method is most useful in settings where we don’t know the sampling distribution of the statistic. The principles are the same for any statistic: the bootstrap distribution mimics the shape, spread, and bias of the sampling distribution, so we can use it to estimate the standard error and bias of a statistic even when theory tells us nothing about its sampling distribution.

Bootstrap t confidence intervals

If the bootstrap distribution of a statistic shows a Normal shape and small bias, we can construct a confidence interval for the parameter by using the bootstrap standard error and the familiar t distribution. We’ll use an example to show how this works.

Example 16.5 Grade point averages.

Data set icon for gpa.

A study of college students at a large university looked at grade point average (GPA) after three semesters of college as a measure of success. In Example 11.1 (page 565), we examined predictors of GPA. Let’s take a look at the distribution of the GPA for the 150 students in this study.

A histogram is given in Figure 16.10(a). The Normal quantile plot is given in Figure 16.10(b). The distribution is strongly skewed to the left. The Normal quantile plot suggests that there are several students with perfect (4.0) GPAs and one at or near the lower end of the distribution (0.0). These data are not Normally distributed.


Figure 16.10 Histogram and Normal quantile plot for 150 grade point averages, Example 16.5. The distribution is skewed to the left.

Because of the lack of symmetry, let’s abandon the mean in favor of a statistic that better focuses on the central part of a skewed distribution. We might choose the median, but in this case, we will use the 25% trimmed mean, the mean of the middle 50% of the observations. The median is simply the middle observation or the mean of the two middle observations, whereas the trimmed mean uses all of the middle 50% of the data. Thus, the trimmed mean often does a better job of representing the average of typical observations than does the median.

Our parameter is the 25% trimmed mean of the population of college student GPAs after three semesters at this large university. By the plug-in principle, the statistic that estimates this parameter is the 25% trimmed mean of the sample of 150 students. Because 25% of 150 is 37.5, we drop the 37 lowest and 37 highest GPAs and find the mean of the remaining 76 GPAs. The statistic is

x¯25%=2.950

Given the relatively large sample size (n=76) from this left-skewed distribution, we can use the central limit theorem to argue that the sampling distribution would be approximately Normal with mean near 2.950. Estimating its standard deviation, however, is a more difficult task. We can’t simply use the standard error of the sample mean based on the remaining 76 observations, as that would underestimate the true variability.

Fortunately, we don’t need any distribution facts to use the bootstrap. We bootstrap the 25% trimmed mean just as we bootstrapped the sample mean: draw 3000 resamples of size 150 from the 150 GPAs, calculate the 25% trimmed mean for each resample, and form the bootstrap distribution from these 3000 values.
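This resampling recipe can be sketched in Python (a sketch, not the book’s R code; the uniformly simulated GPAs below are hypothetical stand-ins for the actual data set, which is not listed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def trimmed_mean_25(x):
    """25% trimmed mean: drop the lowest 25% and highest 25% of the
    observations and average the rest.  For n = 150 this trims
    int(37.5) = 37 from each tail, leaving 76 observations."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(0.25 * len(x))
    return x[k:len(x) - k].mean()

def bootstrap(data, statistic, R=3000):
    """Draw R resamples of the same size, with replacement, and compute
    the statistic on each one."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    return np.array([statistic(rng.choice(data, size=n, replace=True))
                     for _ in range(R)])

# Hypothetical stand-in for the 150 observed GPAs:
gpa = rng.uniform(0.0, 4.0, size=150)
boot_stats = bootstrap(gpa, trimmed_mean_25)
se_boot = boot_stats.std(ddof=1)  # bootstrap standard error
```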

Example 16.6 Interpreting the bootstrap results.

Data set icon for gpa.

Figure 16.11 shows the bootstrap distribution of the 25% trimmed mean. Here is the summary output from R:

ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = GPA, statistic = theta, R = 3000)
Bootstrap Statistics :
     original       bias    std. error
t1*  2.949605  −0.002912     0.0778597


Figure 16.11 The bootstrap distribution of the 25% trimmed means for 3000 resamples from the GPA data in Example 16.6. The bootstrap distribution is approximately Normal.

What do we see?

Shape: The bootstrap distribution is close to Normal. This suggests that the sampling distribution of the trimmed mean is also close to Normal.

Center: The bootstrap estimate of bias is −0.003, which is small in magnitude relative to 2.950, the value of the statistic, and to 0.078, the bootstrap standard error. Because the bias is small, we know the statistic (the trimmed mean of the sample) has small bias as an estimate of the parameter (the trimmed mean of the population).

Spread: The bootstrap standard error of the statistic is

SEboot=0.078

This is an estimate of the standard deviation of the sampling distribution of the trimmed mean.
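The bias and standard error reported in the R output are simple summaries of the 3000 bootstrap values. A minimal Python sketch of the two calculations:

```python
import numpy as np

def bootstrap_summary(boot_stats, original_stat):
    """The two summaries printed by R's boot(): the bootstrap estimate of
    bias (mean of the bootstrap values minus the original statistic) and
    the bootstrap standard error (standard deviation of the bootstrap
    values)."""
    boot_stats = np.asarray(boot_stats, dtype=float)
    bias = boot_stats.mean() - original_stat
    se_boot = boot_stats.std(ddof=1)
    return bias, se_boot
```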

Recall the familiar one-sample t confidence interval (page 386) for the mean of a Normal population:

x¯ ± t*SE = x¯ ± t* (s/√n)

This interval is based on the Normal sampling distribution of the sample mean x¯ and the formula SE = s/√n for the standard error of x¯. When a bootstrap distribution is approximately Normal and has small bias, we can essentially use the same idea with the bootstrap standard error to get a confidence interval for any parameter.

caution Note that this interval uses the original value of the statistic, not the mean of the bootstrap distribution.
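The interval itself is just the statistic plus and minus a multiple of the bootstrap standard error. A small helper (a sketch, with a hypothetical name) makes the caution concrete: the center is the original statistic, not the bootstrap mean.

```python
def bootstrap_t_interval(statistic, se_boot, t_star):
    """Bootstrap t confidence interval: statistic ± t* × SE_boot.
    The center is the statistic from the ORIGINAL sample, not the
    mean of the bootstrap distribution."""
    margin = t_star * se_boot
    return (statistic - margin, statistic + margin)

# Example 16.7's numbers: trimmed mean 2.950, SE_boot 0.078, t* (75 df) = 1.992
low, high = bootstrap_t_interval(2.950, 0.078, 1.992)
```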

Example 16.7 Bootstrap t confidence interval of the trimmed mean.

Data set icon for gpa.

We want to estimate the 25% trimmed mean of the population of all college student GPAs after three semesters at this large university. We have an SRS of size n=150 but use only the middle 76 observations for our estimate. Because the bootstrap standard error accounts for the trimming of observations, we will use n=76 when computing degrees of freedom.4 The preceding software output shows that the trimmed mean of this sample is x¯25%=2.950 and that the bootstrap standard error of this statistic is SEboot=0.078. A 95% confidence interval for the population trimmed mean is, therefore,

x¯25% ± t*SEboot = 2.950 ± (1.992)(0.078) = 2.950 ± 0.155 = (2.795, 3.105)

Because Table D does not have entries for n − 2(37) − 1 = 75 degrees of freedom, we used the Excel function =T.INV.2T(0.05,75) to get t* = 1.992.

We are 95% confident that the 25% trimmed mean (the mean of the middle 50%) for the population of college student GPAs after three semesters at this large university is between 2.795 and 3.105.
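If your software lacks Excel’s T.INV.2T, the same critical value is available elsewhere. For instance, assuming SciPy is installed, stats.t.ppf gives it directly:

```python
from scipy import stats

t_star = stats.t.ppf(0.975, df=75)  # upper 2.5% point: two-sided 95%, 75 df
interval = (2.950 - t_star * 0.078, 2.950 + t_star * 0.078)
```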

Check-in
  1. 16.3 Bootstrap t confidence interval. Recall Example 16.1 (page 16-3). Suppose a bootstrap distribution is created using 3000 resamples and that the mean and standard deviation of the resample means are 7.84 and 1.20, respectively.

    1. What is the bootstrap estimate of the bias?

    2. What is the bootstrap standard error of x¯?

    3. Assume that the bootstrap distribution is reasonably Normal. Because the bias is small relative to the observed x¯, the bootstrap t confidence interval for the population mean μ is justified. Give the 95% bootstrap t confidence interval for μ.

  2. 16.4 Bootstrap t confidence interval for average audio file length. Return to or re-create the bootstrap distribution of the sample mean for audio file length in Exercise 16.12 (page 16-11). In Example 7.10 (page 400), the t confidence interval was applied to both the time lengths and the logarithm of the time measurements. Data set icon for songs.

    1. Inspect the bootstrap distribution. Are the conditions met so that we can use a bootstrap t confidence interval? Explain why or why not.

    2. Construct the 95% bootstrap t confidence interval.

    3. Compare the bootstrap results with the t confidence interval for average time reported in Example 7.10.

Bootstrapping to compare two groups

Two-sample problems are among the most common statistical settings. In a two-sample problem, we wish to compare two populations, such as male and female college students, based on separate samples from each population. When both populations are roughly Normal, the two-sample t procedures compare the two population means.

The bootstrap can also compare two populations, without the Normality condition and without the restriction to a comparison of means. The most important new idea in terms of implementation is that bootstrap resampling must mimic the “separate samples” design that produced the original data.

Example 16.8 Bootstrap comparison of GPAs.

Data set icon for gpa.

In Example 16.5, we looked at grade point average (GPA) after three semesters of college as a measure of success. How do GPAs compare between males and females? Figure 16.12 shows density curves and Normal quantile plots for the GPAs of 91 males and 59 females. The distributions are both far from Normal. Here are some summary statistics:

Sex          n     x¯      s
Male        91  2.784  0.859
Female      59  2.933  0.748
Difference      −0.149

The data suggest that GPAs tend to be slightly higher for females. The mean GPA for females is roughly 0.15 higher than the mean for males.


Figure 16.12 Density curves and Normal quantile plots of the distributions of GPA for males and females, Example 16.8.

Let’s consider estimating the difference between the population means, μ1 − μ2. We might be somewhat reluctant to use the two-sample t confidence interval because both samples are very skewed. Investigating the bootstrap distribution will allow us to assess this concern.

To compute this distribution and the bootstrap standard error for the difference in sample means x¯1 − x¯2, resample separately from the two samples. Each of our 3000 resamples consists of two group resamples, one of size 91 drawn with replacement from the male data and one of size 59 drawn with replacement from the female data. For each combined resample, compute the statistic x¯1 − x¯2. The 3000 differences form the bootstrap distribution. The bootstrap standard error is the standard deviation of the bootstrap distribution.
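This separate-resamples scheme can be sketched in Python (a sketch under stated assumptions; the Normally simulated GPAs below are hypothetical stand-ins for the real male and female samples):

```python
import numpy as np

rng = np.random.default_rng(0)

def boot_diff_means(group1, group2, R=3000):
    """Resample each group separately, mimicking the separate-samples
    design, and return the R bootstrap differences in means."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    diffs = np.empty(R)
    for i in range(R):
        r1 = rng.choice(g1, size=len(g1), replace=True)
        r2 = rng.choice(g2, size=len(g2), replace=True)
        diffs[i] = r1.mean() - r2.mean()
    return diffs

# Hypothetical stand-ins for the 91 male and 59 female GPAs:
males = rng.normal(2.784, 0.859, size=91).clip(0, 4)
females = rng.normal(2.933, 0.748, size=59).clip(0, 4)
diffs = boot_diff_means(males, females)
se_boot = diffs.std(ddof=1)
```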

Example 16.9 Assessing the bootstrap distribution.

The boot function in R automates this bootstrap procedure. Here is the R output for one set of 3000 differences:

STRATIFIED BOOTSTRAP
Call:
boot(data = gpa, statistic = meanDiff, R = 3000, strata = Sex)

Bootstrap Statistics :
     original     bias    std. error
t1* −0.1490259  0.003989901  0.1327419

Figure 16.13 shows that the bootstrap distribution is close to Normal. We can trust the bootstrap t confidence interval for these data. A 95% confidence interval for the difference in mean GPAs (males versus females) is, therefore,

x¯1 − x¯2 ± t*SEboot = −0.149 ± (2.002)(0.133) = −0.149 ± 0.266 = (−0.415, 0.117)

Because Table D does not have entries for min(n1 − 1, n2 − 1) = 58 degrees of freedom, we used the Excel function =T.INV.2T(0.05,58) to get t* = 2.002.


Figure 16.13 The bootstrap distribution and Normal quantile plot for the differences in means for the GPA data, Example 16.9.

We are 95% confident that the difference in the mean GPAs of males and females at this large university after three semesters is between −0.415 and 0.117. Because 0 is in this interval, we cannot conclude that the two population means are different. We discuss hypothesis testing in Section 16.5.

In this example, the bootstrap distribution of the difference is close to Normal. caution When the bootstrap distribution is non-Normal, we can’t trust the bootstrap t confidence interval. Fortunately, there are more general ways of using the bootstrap to get confidence intervals that can be safely applied when the bootstrap distribution is not Normal. These methods, which we discuss in Section 16.4, are the next step in practical use of the bootstrap.

Check-in
  1. 16.5 Bootstrap comparison of average reading abilities. Table 7.4 (page 414) gives the scores on a test of reading ability for two groups of third-grade students. The treatment group used “directed reading activities,” and the control group followed the same curriculum without the activities. Data set icon for drp.

    1. Bootstrap the difference in means x¯1x¯2 and report the bootstrap standard error.

    2. Inspect the bootstrap distribution. Are the conditions met for the use of a bootstrap t confidence interval? If so, give a 95% confidence interval.

    3. Compare the bootstrap results with the two-sample t confidence interval reported in Example 7.15 on page 415.

  2. 16.6 Formula-based versus bootstrap standard error. We have a formula (page 413) for the standard error of x¯1x¯2. This formula does not depend on Normality. How does this formula-based standard error for the data of Example 16.9 compare with the bootstrap standard error obtained in Example 16.9? Data set icon for gpa.

Example 16.10 Do all daily numbers have an equal payoff?

The New Jersey Pick-It Lottery is a daily numbers game run by the state of New Jersey. We’ll analyze the first 254 drawings after the lottery was started in 1975.5 Buying a ticket entitles a player to pick a number between 000 and 999. Half the money bet each day goes into the prize pool. (The state takes the other half.) The state picks a winning number at random, and the prize pool is shared equally among all winning tickets.

Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs if they win because the prize is shared among fewer tickets. Figure 16.14 is a scatterplot of the first 254 winning numbers and their payoffs. What patterns can we see?


Figure 16.14 The first 254 winning numbers in the New Jersey Pick-It Lottery and the payoffs for each, Example 16.10. To see patterns, we use least-squares regression (dashed line) and a scatterplot smoother (curve).

The straight line in Figure 16.14 is the least-squares regression line. The line shows a general trend of higher payoffs for larger winning numbers. The curve in the figure was fitted to the plot by a scatterplot smoother that follows local patterns in the data rather than being constrained to a straight line. The curve suggests that there were larger payoffs for numbers in the intervals 000 to 100, 400 to 500, 600 to 700, and 800 to 999.

Are the patterns displayed by the scatterplot smoother just chance? We can use the bootstrap distribution of the smoother’s curve to get an idea of how much random variability there is in the curve. Each resample “statistic” is now a curve rather than a single number. Figure 16.15 shows the curves that result from applying the smoother to 20 resamples from the 254 data points in Figure 16.14. In practice, we’d consider many more than 20 resamples, but the small number makes it easier to see the variation from resample to resample. The original curve is the thick line. The spread of the resample curves about the original curve shows the sampling variability of the output of the scatterplot smoother.
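The resample-the-curve idea can be sketched in Python. The book’s scatterplot smoother is not specified, so a crude running-mean smoother stands in for it here, and the lottery data are simulated; the point is only that each resample of (number, payoff) pairs yields a whole curve:

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth(x, y, grid, window=50):
    """Crude scatterplot smoother: at each grid point, average the payoffs
    of numbers within ±window.  A stand-in for a lowess-type smoother."""
    out = np.empty(len(grid))
    for i, g in enumerate(grid):
        sel = y[np.abs(x - g) <= window]
        out[i] = sel.mean() if sel.size else np.nan  # guard: empty window
    return out

# Hypothetical stand-in for the 254 (winning number, payoff) pairs:
numbers = rng.integers(0, 1000, size=254)
payoffs = 200 + 0.1 * numbers + rng.normal(0, 80, size=254)
grid = np.linspace(0, 999, 25)

original_curve = smooth(numbers, payoffs, grid)
curves = []
for _ in range(20):  # 20 bootstrap curves, as in Figure 16.15
    idx = rng.integers(0, len(numbers), size=len(numbers))  # resample pairs
    curves.append(smooth(numbers[idx], payoffs[idx], grid))
```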


Figure 16.15 The curves produced by the scatterplot smoother for 20 resamples from the data displayed in Figure 16.14. The curve for the original sample is the heavy line.

Nearly all the bootstrap curves mimic the general pattern of the original smoother curve, showing, for example, the same low average payoffs for numbers in the 200s and 300s. This suggests that these patterns are real, not just chance. In fact, when people pick “random” numbers, they tend to choose numbers starting with 2, 3, 5, or 7, so these numbers have lower payoffs. This pattern disappeared after 1976; it appears that players noticed the pattern and changed their number choices.

Section 16.2 SUMMARY

  • Bootstrap distributions mimic the shape, spread, and bias of sampling distributions.

  • The bootstrap estimate of the bias of a statistic is the mean of the bootstrap distribution minus the statistic for the original data. Small bias means that the bootstrap distribution is centered at the statistic of the original sample and suggests that the sampling distribution of the statistic is centered at the population parameter.

  • The bootstrap can estimate the sampling distribution, bias, and standard error of a wide variety of statistics, such as the trimmed mean, whether or not statistical theory tells us about their sampling distributions.

  • If the bootstrap distribution is approximately Normal and the bias is small, we can give a bootstrap t confidence interval for the parameter

    statistic ± t* SEboot

    where t* is the critical value of the t(n − 1) distribution with area C between −t* and t*. Do not use this t interval if the bootstrap distribution is not Normal or if it shows substantial bias.

  • To use the bootstrap to compare two populations, draw separate resamples from each sample and compute a statistic that compares the two groups. Repeat many times and use the bootstrap distribution for inference.

Section 16.2 EXERCISES

  1. 16.16 Use the bootstrap standard error and the t distribution for the confidence interval. Suppose you collect GPA data similar to that of Example 16.5 (page 16-12) from your university. You only collect n=100 GPAs, so the 25% trimmed mean is based on the middle n=50 observations. The mean of the sample is 2.917, the mean of the bootstrap distribution is 2.908, and the standard error is 0.090.

    1. What is the bootstrap estimate of bias?

    2. Do you think it reasonable to use the bootstrap t confidence interval? Explain your answer.

    3. Use the t distribution to find the 95% confidence interval.

  2. 16.17 Should you use the bootstrap t confidence interval? For each of the following situations, explain whether or not you would use the bootstrap standard error and the t distribution for the confidence interval. Give reasons for your answers.

    1. The bootstrap distribution of the mean is approximately Normal, and the difference between the mean of the data and the mean of the bootstrap distribution is large relative to the mean of the data.

    2. The bootstrap distribution of the mean is approximately Normal, and the difference between the mean of the data and the mean of the bootstrap distribution is small relative to the mean of the data.

    3. The bootstrap distribution of the mean is clearly skewed, and the difference between the mean of the data and the mean of the bootstrap distribution is large relative to the mean of the data.

    4. The bootstrap distribution of the mean is clearly skewed, and the difference between the mean of the data and the mean of the bootstrap distribution is small relative to the mean of the data.

  3. 16.18 Using the mean instead of the trimmed mean. In Example 16.7 (page 16-15), bootstrap results for the 25% trimmed mean of GPA were presented. Here are some results using the mean as the statistic:

    ORDINARY NONPARAMETRIC BOOTSTRAP
    Call:
    boot(data = GPA, statistic = theta, R = 3000)

    Bootstrap Statistics :
       original    bias   std. error
    t1* 2.842133  −0.002608911  0.06722723

    1. The bootstrap distribution (not shown) looks very Normal. Is it reasonable to construct the t confidence interval? Explain your answer.

    2. Construct the t confidence interval.

    3. A friend is confused about why the two analyses do not provide similar bootstrap means, especially given that both sampling distributions appear to be Normal. How would you explain this to your friend?

  4. 16.19 Bootstrap t confidence interval for the time to start a business. In Exercise 16.9 (page 16-17), we examined the bootstrap distribution for the times to start a business. Return to or re-create the bootstrap distribution of the sample mean for these 187 observations. Data set icon for tts.

    1. Find the bootstrap t 95% confidence interval for these data.

    2. Compare the interval you found in part (a) with the usual t interval.

    3. Which interval do you prefer? Give reasons for your answer.

  5. 16.20 Bootstrap t confidence interval for average delivery time by a robot. Return to or re-create the bootstrap distribution of the sample mean for the eight lunch delivery times in Exercise 16.11 (page 16-11). Data set icon for botdel.

    1. Although the sample is small, verify using graphs and numerical summaries of the bootstrap distribution that the distribution is reasonably Normal and that the bias is small relative to the observed x¯.

    2. The bootstrap t confidence interval for the population mean μ is, therefore, justified. Give the 95% bootstrap t confidence interval for μ.

    3. Give the usual t 95% interval and compare it with your interval from part (b).

  6. 16.21 Bootstrap t confidence interval for help room visit lengths. Return to or re-create the bootstrap distribution of the sample mean for the 50 visit lengths in Exercise 16.14 (page 16-11). Data set icon for help50.

    1. What is the bootstrap estimate of the bias? Verify from the graphs of the bootstrap distribution that the distribution is reasonably Normal and that the bias is small relative to the observed x¯. The bootstrap t confidence interval for the population mean μ is therefore justified.

    2. Give the 95% bootstrap t confidence interval for μ.

    3. The only difference between the bootstrap t and usual one-sample t confidence intervals is that the bootstrap interval uses SEboot in place of the formula-based standard error s/n. What are the values of the two standard errors?

  7. 16.22 Another bootstrap distribution of the trimmed mean. Bootstrap distributions and quantities based on them differ randomly when we repeat the resampling process. A key fact is that they do not differ very much if we use a large number of resamples. Figure 16.11 (page 16-14) shows one bootstrap distribution of the trimmed mean of the GPA data. Repeat the resampling of these data to get another bootstrap distribution of the trimmed mean. Data set icon for gpa.

    1. Plot the bootstrap distribution and compare it with Figure 16.11. Are the two bootstrap distributions similar?

    2. What are the values of the bias and bootstrap standard error for your new bootstrap distribution? How do they compare with the values given on page 16-14?

    3. Find the 95% bootstrap t confidence interval based on your bootstrap distribution. Compare it with the result in Example 16.7 (page 16-15).

  8. 16.23 Bootstrap distribution of the standard deviation s. For Example 16.5 (page 16-12), we bootstrapped the 25% trimmed mean of 150 GPAs. Another statistic whose sampling distribution is unfamiliar to us is the standard deviation s. Bootstrap s for these data. Discuss the shape and bias of the bootstrap distribution. Is the bootstrap t confidence interval for the population standard deviation σ justified? If it is, give a 95% confidence interval. Data set icon for gpa.

  9. 16.24 Bootstrap comparison of tree diameters. In Exercise 7.57 (page 432), you were asked to compare the mean diameter at breast height (DBH) for trees from the northern and southern halves of a land tract using a random sample of 30 trees from each region. Data set icon for nspines.

    1. Use a back-to-back stemplot or side-by-side boxplots to examine the data graphically. Does it appear reasonable to use standard t procedures?

    2. Bootstrap the difference in means x¯Northx¯South and look at the bootstrap distribution. Does it meet the conditions for a bootstrap t confidence interval?

    3. Report the bootstrap standard error and the 95% bootstrap t confidence interval.

    4. Compare the bootstrap results with the usual two-sample t confidence interval.

  10. 16.25 Bootstrapping a Normal data set. The following data are “really Normal.” They are an SRS from the standard Normal distribution N(0, 1), produced by a software Normal random number generator. Data set icon for normald.

    0.01 0.04 1.02 0.13 0.36 0.03 1.88 0.34 0.00 1.21
    0.02 1.01 0.58 0.92 1.38 0.47 0.80 0.90 1.16 0.11
    0.23 2.40 0.08 0.03 0.75 2.29 1.11 2.23 1.23 1.56
    0.52 0.42 0.31 0.56 2.69 1.09 0.10 0.92 0.07 1.76
    0.30 0.53 1.47 0.45 0.41 0.54 0.08 0.32 1.35 2.42
    0.34 0.51 2.47 2.99 1.56 1.27 1.55 0.80 0.59 0.89
    2.36 1.27 1.11 0.56 1.12 0.25 0.29 0.99 0.10 0.30
    0.05 1.44 2.46 0.91 0.51 0.48 0.02 0.54
    1. Make a histogram and Normal quantile plot. Do the data appear to be “really Normal”? From the histogram, does the N(0, 1) distribution appear to describe the data well? Why?

    2. Bootstrap the mean. Why do your bootstrap results suggest that t confidence intervals are appropriate?

    3. Give both the bootstrap and the formula-based standard errors for x¯. Give both the bootstrap and usual t 95% confidence intervals for the population mean μ.

  11. 16.26 Bootstrap distribution of the median. We will see in Section 16.3 that bootstrap methods often work poorly for the median. To illustrate this, bootstrap the sample median of the 21 viewing times we studied in Example 16.1 (page 16-3). Why is the bootstrap t confidence interval not justified? Data set icon for face4.

  12. 16.27 Bootstrap distribution of the mpg standard deviation. The Environmental Protection Agency (EPA) establishes the tests to determine the fuel economy of new cars, but it often does not perform them. Instead, the test protocols are given to the car companies, and the companies perform the tests themselves. To keep the industry honest, the EPA runs some audits each year. The following data are from one EPA audit. We studied these data in Exercise 7.11 (page 406), using methods based on Normal distributions. Data set icon for mileage.

    18.0 15.7 15.8 18.0 18.5 19.8 20.2 20.4
    16.9 18.3 19.8 17.2 16.7 17.7 19.5 18.0

    In addition to the average mpg, the EPA is also interested in how much variability there is in the mpg.

    1. Calculate the sample standard deviation s for these mpg values.

    2. We have no formula for the standard error of s. Find the bootstrap standard error for s.

    3. What does the standard error indicate about how accurate the sample standard deviation is as an estimate of the population standard deviation?

    4. Would it be appropriate to give a bootstrap t interval for the population standard deviation? Why or why not?