Chapter 16 Exercises in Chapter 16 Bootstrap Methods and Permutation Tests

Chapter 16 EXERCISES

16.73 Sex and GPA. In Example 16.7 (page 16-15), you used the bootstrap to find a 95% confidence interval for the 25% trimmed mean of GPA. Let’s change the statistic of interest to the 5% trimmed mean. Using Examples 16.5 through 16.7 as a guide, find the corresponding 95% confidence interval. Compare this interval with the one in Example 16.7.
16.74 Change the trim. Refer to the previous exercise. Change the statistic of interest to the 10% trimmed mean. Answer the questions in the previous exercise and also compare your new interval with the one you found there.
16.75 Compare the correlations. In Exercise 16.45 (page 16-37), we compared the mean GPA for males and females using the bootstrap. In Exercise 16.46, we used the bootstrap to examine the correlation between GPA and high school math grades. Find the correlations for men and women separately and determine whether there is evidence that they differ.
1. Find the correlation between GPA and high school math grades for the men. Use the bootstrap to find a 95% confidence interval for the population correlation.
2. Repeat part 1 for the women.
3. Use the bootstrap to test the null hypothesis that the population correlations for men and women are the same, $\rho_{\text{Men}} = \rho_{\text{Women}}$.
4. Summarize your findings.
16.76 Use the regression slope. Refer to the previous exercise, where we used correlations to address the question of whether or not the relationship between GPA and high school math grades is the same for men and women. In Exercise 16.50 (page 16-37), we used the bootstrap to examine the slope of the least-squares regression line for predicting GPA using high school math grades. Let’s compute the slope separately for males and females and determine whether or not they differ. This is another way to ask the question about whether or not the relationship between GPA and high school math grades is the same for males and females. Answer the questions from the previous exercise using the slope. Compare the results that you find here with those you found in the previous exercise.
16.77 Bootstrap confidence interval for the difference in proportions. Refer to Exercise 16.70 (page 16-49). We want a 95% confidence interval for the change from 2015 to 2020 in the proportions of U.S. residents who report that they have listened to at least one podcast. Bootstrap the sample data. Give all three bootstrap confidence intervals (t, percentile, and BCa). Compare the three intervals and summarize the results. Which intervals would you recommend? Give reasons for your answer.
16.78 Bootstrap confidence interval for the ratio. Here is one conclusion from the data in Table 16.3, described in Exercise 16.68: “The mean serum retinol level in uninfected children was 1.255 times the mean level in the infected children. A 95% confidence interval for the ratio of means in the population of all children in Papua New Guinea is . . . .”
1. Bootstrap the data and use the BCa interval to complete this conclusion.
2. Briefly describe the shape and bias of the bootstrap distribution. Does the bootstrap percentile interval agree closely with the BCa interval for these data?
16.79 Poetry: An occupational hazard. According to William Butler Yeats, “She is the Gaelic muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for she is restless, and will not let them remain long on earth.” One study designed to investigate this issue examined the age at death for writers from different cultures and sexes.¹³

In Example 1.27 (page 34), we examined the distributions of the age at death for female novelists, poets, and nonfiction writers. Figure 1.14 shows modified side-by-side boxplots for the three categories of writers. The poets do appear to die young! Note that there is an outlier among the nonfiction writers. This writer died at the age of 40—young for a nonfiction writer but not for a novelist or a poet! Let’s use the methods of this chapter to compare the ages at death for poets and nonfiction writers.
1. Use numerical and graphical summaries to describe the distribution of age at death for the poets. Do the same for the nonfiction writers.
2. Use the methods of Chapter 7 (page 417) to compare the means of the two distributions. Summarize your findings.
3. Use the bootstrap methods of this chapter to compare the means of the two distributions. Summarize your findings.
16.80 Medians for the poets. Refer to the previous exercise. Use the bootstrap methods of this chapter to compare the medians of the two distributions. Summarize your findings and compare them with part 3 of the previous exercise.
16.81 Permutation test for the poets. Refer to Exercise 16.79. Answer part 3 of that exercise using the permutation test. Summarize your findings and compare them with what you found in Exercise 16.79.
16.82 Variance for poets. Refer to Exercises 16.79 and 16.81.
1. Instead of comparing means, compare variances using the ratio of sample variances as the statistic. Summarize your findings.
2. Explain how questions about the equality of standard deviations are related to questions about the equality of variances.
3. Use the results of this exercise and the previous three exercises to address the question of whether or not the distributions of the poets and nonfiction writers are the same.
16.83 Bootstrap confidence interval for the median. Most software can generate random numbers that have the uniform distribution on 0 to 1. For example, Excel has the RAND() function (page 168) and R has the runif() function. Generate a sample of 50 observations from this distribution.
1. Figure 4.9 (page 229) shows the density curve of this distribution. What is the population median?
2. Bootstrap the sample median and describe the bootstrap distribution.
3. What is the bootstrap standard error? Compute a 95% bootstrap t confidence interval.
4. Find the 95% BCa confidence interval. Compare it with the interval in part 3. Is the bootstrap t interval reliable here?
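
As a starting point for this exercise, here is a minimal Python sketch of the sampling, resampling, and bootstrap t steps. The seeds, the number of resamples (2000), and the helper name `bootstrap_medians` are illustrative assumptions, not choices made by the text. The critical value 2.010 is the approximate upper 0.025 point of the t(49) distribution. The BCa interval is not computed here; statistical software is normally used for that.

```python
import random
import statistics

def bootstrap_medians(sample, n_resamples=2000, seed=0):
    """Return the medians of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.median(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Generate 50 observations from the uniform distribution on 0 to 1.
# The seed is arbitrary; a different seed gives a different sample.
rng = random.Random(16)
sample = [rng.random() for _ in range(50)]

boot = bootstrap_medians(sample)
observed = statistics.median(sample)
se_boot = statistics.stdev(boot)          # bootstrap standard error
bias = statistics.mean(boot) - observed   # bootstrap estimate of bias

# Bootstrap t interval: statistic +/- t* x SE_boot, with n - 1 = 49 degrees of freedom.
T_STAR = 2.010
t_interval = (observed - T_STAR * se_boot, observed + T_STAR * se_boot)

print(f"sample median = {observed:.3f}, SE_boot = {se_boot:.3f}, bias = {bias:.4f}")
print(f"95% bootstrap t interval: ({t_interval[0]:.3f}, {t_interval[1]:.3f})")
```

A histogram or Normal quantile plot of `boot` is the natural next step, because the bootstrap distribution of a median often takes only a few distinct values.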

16.84 Are female personal trainers, on average, younger? A fitness center employs 20 personal trainers. Here are the ages, in years, of the female and male personal trainers working at this center:

Male	25	26	23	32	35	29	30	28	31	32	29
Female	21	23	22	23	20	29	24	19	22

1. Make a back-to-back stemplot. Do you think the difference in mean ages will be significant?
2. A two-sample t test gives P<0.001 for the null hypothesis that the mean age of female personal trainers is equal to the mean age of male personal trainers. Do a two-sided permutation test to check the answer.
3. What do you conclude about using the t test? What do you conclude about the mean ages of the trainers?
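
The two-sided permutation test for these data can be sketched in Python as follows. The function name `permutation_test_means`, the seed, and the 9999 resamples are illustrative assumptions. This sketch also assumes one particular two-sided convention: it counts resamples whose difference in means is at least as far from zero as the observed difference.

```python
import random

male = [25, 26, 23, 32, 35, 29, 30, 28, 31, 32, 29]
female = [21, 23, 22, 23, 20, 29, 24, 19, 22]

def mean(values):
    return sum(values) / len(values)

def permutation_test_means(x, y, n_resamples=9999, seed=0):
    """Two-sided permutation test for a difference in means (x minus y)."""
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign ages to the two groups at random
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the observed data count as one arrangement.
    return observed, (extreme + 1) / (n_resamples + 1)

observed, p_value = permutation_test_means(male, female)
print(f"observed difference (male - female) = {observed:.2f} years")
print(f"two-sided permutation P-value = {p_value:.4f}")
```

Adding 1 to both counts keeps the estimated P-value from being exactly zero when no resample is as extreme as the data.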

16.85 Planning to attend a four-year college. A Pew survey asked U.S. teenagers whether they plan to attend a four-year college.¹⁴ For the boys, 51% of 461 survey participants said they planned to attend a four-year college. For the girls, 68% of 454 survey participants said this. Use the bootstrap to find a 95% confidence interval for the difference between the female proportion who said they planned to attend a four-year college and the male proportion.
16.86 Use a ratio for females versus males. Refer to the previous exercise. In many settings, researchers prefer to communicate the comparison of two proportions with a ratio. For teenagers planning to attend a four-year college, they would report that females are 1.33 (68/51) times as likely as males to say they plan to attend a four-year college. Use the bootstrap to give a 95% confidence interval for this ratio.
16.87 Another way to communicate the result. Refer to the previous two exercises. Here is another way to communicate the result: female teenagers are 33% more likely to say they plan to attend a four-year college than male teenagers.
1. Explain how the 33% is computed.
2. Use the bootstrap to give a 95% confidence interval for this estimate.
3. Based on this exercise and the previous two, which of the three ways is most effective for communicating the results? Give reasons for your answer.
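
One way to bootstrap the survey results in Exercises 16.85 and 16.86 is sketched below in Python. The counts 235 and 309 are assumptions reconstructed from the rounded percentages (51% of 461 and 68% of 454), so results may differ slightly from those based on the original data. The helper names, the seed, and the 2000 resamples are also illustrative, and only the percentile interval is shown.

```python
import random

# ASSUMPTION: counts reconstructed from the rounded percentages in the exercise.
boys = [1] * 235 + [0] * 226    # 235/461 is about 51% (1 = plans to attend)
girls = [1] * 309 + [0] * 145   # 309/454 is about 68%

def proportion(sample):
    return sum(sample) / len(sample)

def percentile_interval(values, level=0.95):
    """Approximate bootstrap percentile interval from a list of resampled statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1 - tail) * len(ordered)) - 1]
    return lower, upper

observed_diff = proportion(girls) - proportion(boys)
observed_ratio = proportion(girls) / proportion(boys)

rng = random.Random(0)
diffs, ratios = [], []
for _ in range(2000):
    # Resample each group separately, with replacement, keeping its sample size.
    p_boys = proportion(rng.choices(boys, k=len(boys)))
    p_girls = proportion(rng.choices(girls, k=len(girls)))
    diffs.append(p_girls - p_boys)
    ratios.append(p_girls / p_boys)

diff_ci = percentile_interval(diffs)
ratio_ci = percentile_interval(ratios)
print(f"difference = {observed_diff:.3f}, 95% interval ({diff_ci[0]:.3f}, {diff_ci[1]:.3f})")
print(f"ratio = {observed_ratio:.3f}, 95% interval ({ratio_ci[0]:.3f}, {ratio_ci[1]:.3f})")
```

Resampling the two groups separately mirrors how the data were collected: two independent samples with fixed sizes.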

PUTTING IT ALL TOGETHER

16.88 Sadness and spending. Refer to Exercise 7.47 (page 430). A study of sadness and spending randomized subjects to watch videos designed to produce sad or neutral moods. Each subject was given $10 and, after watching the video, was asked to trade $0.50 increments of the $10 for an insulated bottle of water. Here are the data:

Group	Purchase price
Neutral	0.00	2.00	0.00	1.00	0.50	0.00	0.50
Neutral	2.00	1.00	0.00	0.00	0.00	0.00	1.00
Sad	3.00	4.00	0.50	1.00	2.50	2.00	1.50	0.00	1.00
Sad	1.50	1.50	2.50	4.00	3.00	3.50	1.00	3.50

1. Use the two-sample t significance test (page 416) to compare the means of the two groups. Summarize your results.
2. Use the pooled two-sample t significance test (page 423) to compare the means of the two groups. Summarize your results.
3. Use a permutation test to compare the two groups. Summarize your results.
4. Discuss the differences among the results you found for parts 1, 2, and 3. Which method do you prefer? Give reasons for your answer.

16.89 Comparing the variances for sadness and spending. Refer to the previous exercise. Some treatments in randomized experiments such as this can cause variances to be different. Are the variances of the neutral and sad subjects equal?
1. Compute the ratio $F^* = s_1^2 / s_2^2$ and compare it to the F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. This is known as the F test for equality of variances. Summarize your results.
2. Compare the variances using a permutation test. Summarize your results.
3. Write a short paragraph comparing the F test with the permutation test for these data.
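
A permutation test that uses the ratio of sample variances as its statistic might be sketched in Python like this. Putting the sad group's variance in the numerator, the 4999 resamples, the seed, and the function names are illustrative assumptions. Because a ratio is not symmetric around its null value, the sketch doubles the smaller one-sided proportion to get a two-sided P-value, which is one common convention.

```python
import random
import statistics

neutral = [0.00, 2.00, 0.00, 1.00, 0.50, 0.00, 0.50,
           2.00, 1.00, 0.00, 0.00, 0.00, 0.00, 1.00]
sad = [3.00, 4.00, 0.50, 1.00, 2.50, 2.00, 1.50, 0.00, 1.00,
       1.50, 1.50, 2.50, 4.00, 3.00, 3.50, 1.00, 3.50]

def variance_ratio(x, y):
    """Ratio of sample variances, x over y."""
    return statistics.variance(x) / statistics.variance(y)

def permutation_p_value(x, y, statistic, n_resamples=4999, seed=0):
    """Two-sided permutation P-value: twice the smaller one-sided proportion."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least = at_most = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign purchase prices to the two groups at random
        value = statistic(pooled[:n_x], pooled[n_x:])
        at_least += value >= observed
        at_most += value <= observed
    upper = (at_least + 1) / (n_resamples + 1)
    lower = (at_most + 1) / (n_resamples + 1)
    return observed, min(1.0, 2 * min(upper, lower))

ratio, p_value = permutation_p_value(sad, neutral, variance_ratio)
print(f"variance ratio (sad / neutral) = {ratio:.3f}")
print(f"two-sided permutation P-value = {p_value:.4f}")
```

Shuffling all observations between groups tests whether the two distributions are identical, so a difference in means between the groups can also influence this test.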

16.90 Insurance fraud? Jocko’s Garage has been accused of insurance fraud. Data on estimates (in dollars) made by Jocko and another garage were obtained for 10 damaged vehicles. Here is what the investigators found:

Car	1	2	3	4	5
Jocko’s	1375	1550	1250	1300	900
Other	1250	1300	1250	1200	950
Car	6	7	8	9	10
Jocko’s	1500	1750	3600	2250	2800
Other	1575	1600	3300	2125	2600

1. Compute the mean estimate for Jocko and the mean estimate for the other garage. Report the difference in the means and the 95% standard t confidence interval. Be sure to choose the appropriate t procedure for your analysis and explain why you made this choice.
2. Use the bootstrap to find the confidence interval. Be sure to give details about how you used the bootstrap, which options you chose, and why.
3. Compare the t interval with the bootstrap interval.
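
Because both garages gave estimates for the same 10 vehicles, one plausible approach is to resample the vehicles, that is, the 10 within-vehicle differences. The Python sketch below takes that approach under stated assumptions: it uses the percentile interval, 2000 resamples, and an arbitrary seed, none of which is prescribed by the exercise.

```python
import random

jocko = [1375, 1550, 1250, 1300, 900, 1500, 1750, 3600, 2250, 2800]
other = [1250, 1300, 1250, 1200, 950, 1575, 1600, 3300, 2125, 2600]

# Within-vehicle differences (Jocko minus other garage), in dollars.
differences = [j - o for j, o in zip(jocko, other)]

def mean(values):
    return sum(values) / len(values)

def percentile_interval(values, level=0.95):
    """Approximate bootstrap percentile interval from a list of resampled statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1 - tail) * len(ordered)) - 1]
    return lower, upper

rng = random.Random(0)
n = len(differences)
# Each resample draws 10 vehicles with replacement and records the mean difference.
boot_means = [mean(rng.choices(differences, k=n)) for _ in range(2000)]

mean_diff = mean(differences)
ci = percentile_interval(boot_means)
print(f"mean difference = {mean_diff:.1f} dollars")
print(f"95% bootstrap percentile interval: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Resampling differences rather than the two columns separately preserves the pairing of estimates on the same vehicle.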

16.91 Other ways to look at Jocko’s estimates. Refer to the previous exercise. Let’s consider some other ways to analyze these data.
1. For each damaged vehicle, divide Jocko’s estimate by the estimate from the other garage. Perform your analysis on these data. Write a short report that includes numerical and graphical summaries, your estimate, the 95% t confidence interval, the 95% bootstrap confidence interval, and an explanation for all choices (such as whether you chose to examine the mean or the median, bootstrap options, etc.).
2. Compute the mean of Jocko’s estimates and the mean of the estimates made by the other garage. Divide Jocko’s mean by the mean for the other garage. Report this ratio and find a 95% confidence interval for this quantity. Be sure to justify choices that you made for the bootstrap.
3. Using what you have learned in this exercise and the previous one, how would you summarize the comparison of Jocko’s estimates with those made by the other garage? Assume that your audience knows very little about statistics but a lot about insurance.

16.92 Comparing two operators. Exercise 7.29 (page 409) gives these data on a delicate measurement of total body bone mineral content made by two operators on the same eight subjects:¹⁵

Operator	Subject
Operator	1	2	3	4	5	6	7	8
1	1.328	1.342	1.075	1.228	0.939	1.004	1.178	1.286
2	1.323	1.322	1.073	1.233	0.934	1.019	1.184	1.304

Do permutation tests give good evidence that measurements made by the two operators differ systematically? If so, in what way do they differ? Do two tests: one that compares centers and one that compares spreads.
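
Since each subject was measured by both operators, a permutation test here can swap the two measurements within a subject rather than shuffle all 16 values. With eight subjects there are only 2^8 = 256 possible swap patterns, so they can be enumerated exactly. The Python sketch below does this. The function names and the choice of statistics (difference in means for centers, difference in standard deviations for spreads) are illustrative assumptions.

```python
from itertools import product
import statistics

operator1 = [1.328, 1.342, 1.075, 1.228, 0.939, 1.004, 1.178, 1.286]
operator2 = [1.323, 1.322, 1.073, 1.233, 0.934, 1.019, 1.184, 1.304]

def mean_difference(x, y):
    return statistics.mean(x) - statistics.mean(y)

def sd_difference(x, y):
    return statistics.stdev(x) - statistics.stdev(y)

def paired_permutation_test(x, y, statistic):
    """Exact two-sided test: enumerate every within-subject swap pattern."""
    observed = statistic(x, y)
    extreme = total = 0
    for swaps in product([False, True], repeat=len(x)):
        xs = [b if swap else a for a, b, swap in zip(x, y, swaps)]
        ys = [a if swap else b for a, b, swap in zip(x, y, swaps)]
        total += 1
        # Small tolerance guards against floating-point ties being missed.
        if abs(statistic(xs, ys)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total

center_obs, center_p = paired_permutation_test(operator1, operator2, mean_difference)
spread_obs, spread_p = paired_permutation_test(operator1, operator2, sd_difference)
print(f"difference in means = {center_obs:.4f}, P-value = {center_p:.4f}")
print(f"difference in standard deviations = {spread_obs:.4f}, P-value = {spread_p:.4f}")
```

Because the observed arrangement and its complete mirror image are always counted, the smallest P-value this exact test can report is 2/256.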