Chapter 16 EXERCISES

  1. 16.73 Sex and GPA. In Example 16.7 (page 16-15), you used the bootstrap to find a 95% confidence interval for the 25% trimmed mean of GPA. Let’s change the statistic of interest to the 5% trimmed mean. Using Examples 16.5 through 16.7 as a guide, find the corresponding 95% confidence interval. Compare this interval with the one in Example 16.7. Data set: gpa.

  2. 16.74 Change the trim. Refer to the previous exercise. Change the statistic of interest to the 10% trimmed mean. Answer the questions in the previous exercise and also compare your new interval with the one you found there. Data set: gpa.

  3. 16.75 Compare the correlations. In Exercise 16.45 (page 16-37), we compared the mean GPA for males and females using the bootstrap. In Exercise 16.46, we used the bootstrap to examine the correlation between GPA and high school math grades. Find the correlations for men and women separately and determine whether there is evidence that they differ. Data set: gpa.

    1. Find the correlation between GPA and high school math grades for the men. Use the bootstrap to find a 95% confidence interval for the population correlation.

    2. Repeat part (a) for the women.

    3. Use the bootstrap to test the null hypothesis that the population correlations for men and women are the same, \(\rho_{\text{Men}} = \rho_{\text{Women}}\).

    4. Summarize your findings.

  4. 16.76 Use the regression slope. Refer to the previous exercise, where we used correlations to address the question of whether or not the relationship between GPA and high school math grades is the same for men and women. In Exercise 16.50 (page 16-37), we used the bootstrap to examine the slope of the least-squares regression line for predicting GPA using high school math grades. Let’s compute the slope separately for males and females and determine whether or not they differ. This is another way to ask the question about whether or not the relationship between GPA and high school math grades is the same for males and females. Answer the questions from the previous exercise using the slope. Compare the results that you find here with those you found in the previous exercise. Data set: gpa.

  5. 16.77 Bootstrap confidence interval for the difference in proportions. Refer to Exercise 16.70 (page 16-49). We want a 95% confidence interval for the change from 2015 to 2020 in the proportions of U.S. residents who report that they have listened to at least one podcast. Bootstrap the sample data. Give all three bootstrap confidence intervals (t, percentile, and BCa). Compare the three intervals and summarize the results. Which intervals would you recommend? Give reasons for your answer.

  6. 16.78 Bootstrap confidence interval for the ratio. Here is one conclusion from the data in Table 16.3, described in Exercise 16.68: “The mean serum retinol level in uninfected children was 1.255 times the mean level in the infected children. A 95% confidence interval for the ratio of means in the population of all children in Papua New Guinea is . . . .” Data set: retinol.

    1. Bootstrap the data and use the BCa interval to complete this conclusion.

    2. Briefly describe the shape and bias of the bootstrap distribution. Does the bootstrap percentile interval agree closely with the BCa interval for these data?

  7. 16.79 Poetry: An occupational hazard. According to William Butler Yeats, “She is the Gaelic muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for she is restless, and will not let them remain long on earth.” One study designed to investigate this issue examined the age at death for writers from different cultures and sexes.13

    In Example 1.27 (page 34), we examined the distributions of the age at death for female novelists, poets, and nonfiction writers. Figure 1.14 shows modified side-by-side boxplots for the three categories of writers. The poets do appear to die young! Note that there is an outlier among the nonfiction writers. This writer died at the age of 40, which is young for a nonfiction writer but not for a novelist or a poet! Let’s use the methods of this chapter to compare the ages at death for poets and nonfiction writers. Data set: poets.

    1. Use numerical and graphical summaries to describe the distribution of age at death for the poets. Do the same for the nonfiction writers.

    2. Use the methods of Chapter 7 (page 417) to compare the means of the two distributions. Summarize your findings.

    3. Use the bootstrap methods of this chapter to compare the means of the two distributions. Summarize your findings.

  8. 16.80 Medians for the poets. Refer to the previous exercise. Use the bootstrap methods of this chapter to compare the medians of the two distributions. Summarize your findings and compare them with part (c) of the previous exercise. Data set: poets.

  9. 16.81 Permutation test for the poets. Refer to Exercise 16.79. Answer part (c) of that exercise using the permutation test. Summarize your findings and compare them with what you found in Exercise 16.79. Data set: poets.

  10. 16.82 Variance for poets. Refer to Exercises 16.79 and 16.81.

    1. Instead of comparing means, compare variances using the ratio of sample variances as the statistic. Summarize your findings.

    2. Explain how questions about the equality of standard deviations are related to questions about the equality of variances.

    3. Use the results of this exercise and the previous three exercises to address the question of whether or not the distributions of the poets and nonfiction writers are the same. Data set: poets.

  11. 16.83 Bootstrap confidence interval for the median. Most software can generate random numbers that have the uniform distribution on 0 to 1. For example, Excel has the RAND() function (page 168) and R has the runif() function. Generate a sample of 50 observations from this distribution.

    1. Figure 4.9 (page 229) shows the density curve of this distribution. What is the population median?

    2. Bootstrap the sample median and describe the bootstrap distribution.

    3. What is the bootstrap standard error? Compute a 95% bootstrap t confidence interval.

    4. Find the 95% BCa confidence interval. Compare with the interval in (c). Is the bootstrap t interval reliable here?
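
The resampling steps in Exercise 16.83 can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not a prescribed solution: the helper name bootstrap_distribution, the seed, the choice of 2000 resamples, and the rounded critical value 2.01 (for 49 degrees of freedom) are all assumptions made for this example. It covers the bootstrap distribution, the bootstrap standard error, and the bootstrap t interval, and it does not attempt the BCa interval.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=2000, rng=None):
    """Compute `statistic` on n_resamples resamples drawn with replacement."""
    if rng is None:
        rng = random.Random()
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]


rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
sample = [rng.random() for _ in range(50)]  # 50 observations from uniform(0, 1)

sample_median = statistics.median(sample)
boot_medians = bootstrap_distribution(sample, statistics.median, rng=rng)

# Bootstrap standard error: standard deviation of the bootstrap distribution.
boot_se = statistics.stdev(boot_medians)
# Bootstrap estimate of bias: mean of the bootstrap distribution minus the statistic.
boot_bias = statistics.mean(boot_medians) - sample_median

t_star = 2.01  # approximate 0.975 quantile of t with 50 - 1 = 49 degrees of freedom
t_interval = (sample_median - t_star * boot_se,
              sample_median + t_star * boot_se)

print(f"sample median = {sample_median:.3f}")
print(f"bootstrap SE = {boot_se:.3f}, bias = {boot_bias:.3f}")
print(f"95% bootstrap t interval = ({t_interval[0]:.3f}, {t_interval[1]:.3f})")
```

A histogram or Normal quantile plot of boot_medians is the natural next step for judging shape. The bootstrap distribution of a median typically takes only a few distinct values, which is worth keeping in mind when judging the t interval.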

  12. 16.84 Are female personal trainers, on average, younger? A fitness center employs 20 personal trainers. Here are the ages, in years, of the female and male personal trainers working at this center (data set: train):

    Male 25 26 23 32 35 29 30 28 31 32 29
    Female 21 23 22 23 20 29 24 19 22
    1. Make a back-to-back stemplot. Do you think the difference in mean ages will be significant?

    2. A two-sample t test gives P<0.001 for the null hypothesis that the mean age of female personal trainers is equal to the mean age of male personal trainers. Do a two-sided permutation test to check the answer.

    3. What do you conclude about using the t test? What do you conclude about the mean ages of the trainers?
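
A two-sided permutation test such as the one in Exercise 16.84 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the seed, the 9999 resamples, and the convention of counting resampled differences at least as large in absolute value as the observed one are choices made for this example (doubling the smaller one-sided P-value is another common convention). The ages are those listed in the exercise.

```python
import random

male = [25, 26, 23, 32, 35, 29, 30, 28, 31, 32, 29]
female = [21, 23, 22, 23, 20, 29, 24, 19, 22]


def mean(values):
    return sum(values) / len(values)


def permutation_test_mean_diff(x, y, n_resamples=9999, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reassignment of ages to the two groups
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


observed, p_value = permutation_test_mean_diff(male, female)
print(f"observed difference (male - female) = {observed:.3f}")
print(f"two-sided permutation P-value = {p_value:.4f}")
```

Because the P-value is estimated from random resamples, it will vary slightly with the seed and the number of resamples.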

  13. 16.85 Planning to attend a four-year college. A Pew survey asked U.S. teenagers whether they plan to attend a four-year college.14 For the boys, 51% of 461 survey participants said they planned to attend a four-year college. For the girls, 68% of 454 survey participants said this. Use the bootstrap to find a 95% confidence interval for the difference between the female proportion who said they planned to attend a four-year college and the male proportion.

  14. 16.86 Use a ratio for females versus males. Refer to the previous exercise. In many settings, researchers prefer to communicate the comparison of two proportions with a ratio. For teenagers planning to attend a four-year college, they would report that females are 1.33 (68/51) times as likely as males to say they plan to attend a four-year college. Use the bootstrap to give a 95% confidence interval for this ratio.

  15. 16.87 Another way to communicate the result. Refer to the previous two exercises. Here is another way to communicate the result: female teenagers are 33% more likely to say they plan to attend a four-year college than male teenagers.

    1. Explain how the 33% is computed.

    2. Use the bootstrap to give a 95% confidence interval for this estimate.

    3. Based on this exercise and the previous two, which of the three ways is most effective for communicating the results? Give reasons for your answer.
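
For the two proportions in Exercises 16.85 through 16.87, one way to set up the bootstrap is to rebuild each sample as a list of 0s and 1s and resample each group separately. The sketch below is a hedged Python illustration. The counts 235 and 309 are assumptions obtained by rounding 51% of 461 and 68% of 454, since the exercises report only percents. The helper percentile_interval, the seed, and the 2000 resamples are likewise choices made for this example. Only the percentile interval is shown.

```python
import random


def percentile_interval(values, level=0.95):
    """Simple bootstrap percentile interval (hypothetical helper)."""
    ordered = sorted(values)
    alpha = (1 - level) / 2
    lower = ordered[round(alpha * len(ordered))]
    upper = ordered[round((1 - alpha) * len(ordered)) - 1]
    return lower, upper


# Assumed counts: 51% of 461 is about 235 boys, 68% of 454 is about 309 girls.
boys = [1] * 235 + [0] * (461 - 235)
girls = [1] * 309 + [0] * (454 - 309)

rng = random.Random(1)  # arbitrary seed for reproducibility
boot_diffs = []
boot_ratios = []
for _ in range(2000):
    # Resample each group separately, with replacement, at its original size.
    p_boys = sum(rng.choices(boys, k=len(boys))) / len(boys)
    p_girls = sum(rng.choices(girls, k=len(girls))) / len(girls)
    boot_diffs.append(p_girls - p_boys)
    boot_ratios.append(p_girls / p_boys)

diff_interval = percentile_interval(boot_diffs)
ratio_interval = percentile_interval(boot_ratios)
print(f"95% percentile interval for the difference: "
      f"({diff_interval[0]:.3f}, {diff_interval[1]:.3f})")
print(f"95% percentile interval for the ratio: "
      f"({ratio_interval[0]:.3f}, {ratio_interval[1]:.3f})")
```

The same loop yields both bootstrap distributions, so the difference and the ratio can be compared on the same resamples.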

PUTTING IT ALL TOGETHER

  1. 16.88 Sadness and spending. Refer to Exercise 7.47 (page 430). A study of sadness and spending randomized subjects to watch videos designed to produce sad or neutral moods. Each subject was given $10, and after watching the video, he or she was asked to trade $0.50 increments of their $10 for an insulated bottle of water. Here are the data (data set: sadness):

    Group    Purchase price
    Neutral  0.00 2.00 0.00 1.00 0.50 0.00 0.50 2.00 1.00 0.00 0.00 0.00 0.00 1.00
    Sad      3.00 4.00 0.50 1.00 2.50 2.00 1.50 0.00 1.00 1.50 1.50 2.50 4.00 3.00 3.50 1.00 3.50

    1. Use the two-sample t significance test (page 416) to compare the means of the two groups. Summarize your results.

    2. Use the pooled two-sample t significance test (page 423) to compare the means of the two groups. Summarize your results.

    3. Use a permutation test to compare the two groups. Summarize your results.

    4. Discuss the differences among the results you found for parts (a), (b), and (c). Which method do you prefer? Give reasons for your answer.

  2. 16.89 Comparing the variances for sadness and spending. Refer to the previous exercise. Some treatments in randomized experiments such as this can cause variances to be different. Are the variances of the neutral and sad subjects equal? Data set: sadness.

    1. Compute the ratio \(F^* = s_1^2 / s_2^2\) and compare it with the F distribution with \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom. This is known as the F test for equality of variances. Summarize your results.

    2. Compare the variances using a permutation test. Summarize your results.

    3. Write a short paragraph comparing the F test with the permutation test for these data.

  3. 16.90 Insurance fraud? Jocko’s Garage has been accused of insurance fraud. Data on estimates (in dollars) made by Jocko and another garage were obtained for 10 damaged vehicles. Here is what the investigators found (data set: garage):

    Car      1     2     3     4     5     6     7     8     9     10
    Jocko’s  1375  1550  1250  1300  900   1500  1750  3600  2250  2800
    Other    1250  1300  1250  1200  950   1575  1600  3300  2125  2600

    1. Compute the mean estimate for Jocko and the mean estimate for the other garage. Report the difference in the means and the 95% standard t confidence interval. Be sure to choose the appropriate t procedure for your analysis and explain why you made this choice.

    2. Use the bootstrap to find the confidence interval. Be sure to give details about how you used the bootstrap, which options you chose, and why.

    3. Compare the t interval with the bootstrap interval.
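
Because both garages gave estimates for the same 10 vehicles, one reasonable way to bootstrap the data in Exercise 16.90 is to resample the per-vehicle differences. The sketch below is a minimal Python illustration under that assumption. The seed, the 5000 resamples, and the use of a simple percentile interval are choices made for this example, and other options (bootstrap t, BCa) are equally open. The estimates are those listed in the exercise.

```python
import random

jocko = [1375, 1550, 1250, 1300, 900, 1500, 1750, 3600, 2250, 2800]
other = [1250, 1300, 1250, 1200, 950, 1575, 1600, 3300, 2125, 2600]

# Matched pairs: work with the difference (Jocko - Other) for each vehicle.
diffs = [j - o for j, o in zip(jocko, other)]
mean_diff = sum(diffs) / len(diffs)

rng = random.Random(7)  # arbitrary seed for reproducibility
n_resamples = 5000
boot_means = sorted(
    sum(rng.choices(diffs, k=len(diffs))) / len(diffs)
    for _ in range(n_resamples)
)

# Simple percentile interval: middle 95% of the sorted bootstrap means.
lower = boot_means[round(0.025 * n_resamples)]
upper = boot_means[round(0.975 * n_resamples) - 1]

print(f"mean difference (Jocko - Other) = {mean_diff:.1f} dollars")
print(f"95% bootstrap percentile interval = ({lower:.1f}, {upper:.1f})")
```

Resampling whole vehicles (pairs) rather than each garage's estimates separately keeps the matching intact, which is the same reasoning that leads to a matched pairs t procedure.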

  4. 16.91 Other ways to look at Jocko’s estimates. Refer to the previous exercise. Let’s consider some other ways to analyze these data. Data set: garage.

    1. For each damaged vehicle, divide Jocko’s estimate by the estimate from the other garage. Perform your analysis on these data. Write a short report that includes numerical and graphical summaries, your estimate, the 95% t confidence interval, the 95% bootstrap confidence interval, and an explanation for all choices (such as whether you chose to examine the mean or the median, bootstrap options, etc.).

    2. Compute the mean of Jocko’s estimates and the mean of the estimates made by the other garage. Divide Jocko’s mean by the mean for the other garage. Report this ratio and find a 95% confidence interval for this quantity. Be sure to justify choices that you made for the bootstrap.

    3. Using what you have learned in this exercise and the previous one, how would you summarize the comparison of Jocko’s estimates with those made by the other garage? Assume that your audience knows very little about statistics but a lot about insurance.

  5. 16.92 Comparing two operators. Exercise 7.29 (page 409) gives these data (data set: tbbmc) on a delicate measurement of total body bone mineral content made by two operators on the same eight subjects:15

    Subject     1      2      3      4      5      6      7      8
    Operator 1  1.328  1.342  1.075  1.228  0.939  1.004  1.178  1.286
    Operator 2  1.323  1.322  1.073  1.233  0.934  1.019  1.184  1.304

    Do permutation tests give good evidence that measurements made by the two operators differ systematically? If so, in what way do they differ? Do two tests: one that compares centers and one that compares spreads.
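
For the comparison of centers in Exercise 16.92, the two operators measured the same eight subjects, so one possible permutation test randomly swaps the two measurements within each subject, which is the same as flipping the sign of each difference. With only eight pairs, all 2^8 = 256 sign patterns can be listed, so no random resampling is needed. The sketch below is a hedged Python illustration of that idea. Working in integer thousandths (to avoid floating-point ties) and using the absolute sum of differences as the two-sided statistic are choices made for this example. A test that compares spreads needs a different statistic and is not shown.

```python
from itertools import product

operator1 = [1.328, 1.342, 1.075, 1.228, 0.939, 1.004, 1.178, 1.286]
operator2 = [1.323, 1.322, 1.073, 1.233, 0.934, 1.019, 1.184, 1.304]

# Per-subject differences (operator 1 - operator 2), in integer thousandths.
diffs = [round((a - b) * 1000) for a, b in zip(operator1, operator2)]
observed_sum = sum(diffs)

# Enumerate every way of swapping the two measurements within each subject.
n_patterns = 0
n_extreme = 0
for signs in product((1, -1), repeat=len(diffs)):
    n_patterns += 1
    permuted_sum = sum(s * abs(d) for s, d in zip(signs, diffs))
    if abs(permuted_sum) >= abs(observed_sum):
        n_extreme += 1

p_value = n_extreme / n_patterns
mean_diff = observed_sum / 1000 / len(diffs)

print(f"mean difference (operator 1 - operator 2) = {mean_diff:.4f}")
print(f"exact two-sided permutation P-value = {p_value:.3f} "
      f"({n_extreme} of {n_patterns} sign patterns)")
```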