Chapter 7 EXERCISES

For any two-sample t problems, try to use the degrees of freedom approximation provided by software. For exercises involving summarized data, this approximation is provided for you. If you instead use the conservative approximation, the smaller of n1 − 1 and n2 − 1, be sure to clearly state this.
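The two choices of degrees of freedom mentioned above can be sketched in a few lines of Python. This is a minimal sketch that assumes the software approximation is the usual Welch-Satterthwaite formula; the function names `welch_df` and `conservative_df` are illustrative and not taken from the text. The summary statistics come from Exercise 7.89.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula)."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def conservative_df(n1, n2):
    """Conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1


# Summary statistics from Exercise 7.89 (digital group, then control group).
k = welch_df(8.36, 41, 10.46, 28)
k_conservative = conservative_df(41, 28)
print(f"software-style df = {k:.2f}, conservative df = {k_conservative}")
```

The first value should land close to the k=49.35 quoted in Exercise 7.89. The conservative value is always smaller, which makes the resulting intervals wider and the tests less likely to reject.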

  1. 7.79 LSAT scores. The scores of four senior roommates on the Law School Admission Test (LSAT) are

    153 162 166 133

    Find the mean, the standard deviation, and the standard error of the mean. Is it appropriate to calculate a confidence interval based on these data? Explain why or why not. Data set icon for lsat.

  2. 7.80 Converting a two-sided P-value. You use statistical software to perform a significance test of the null hypothesis that two means are equal. The software reports a P-value for the two-sided alternative. Your alternative is that the first mean is greater than the second mean.

    1. The software reports t=1.85 with a P-value of 0.075. Would you reject H0 at α=0.05? Explain your answer.

    2. The software reports t=−1.85 with a P-value of 0.075. Would you reject H0 at α=0.05? Explain your answer.

  3. 7.81 Degrees of freedom and t*. As the degrees of freedom increase, the t distributions get closer and closer to the z (N(0, 1)) distribution. One way to see this is to look at how the value of t* for a 95% confidence interval changes with the degrees of freedom.

    1. Make a plot with degrees of freedom from 10 to 100 by 10 on the x axis and t* on the y axis. Also draw a horizontal line on the plot corresponding to the value of z*=1.96.

    2. Summarize the main features of the plot.

    3. Describe how this plot would change if you considered a 90% confidence interval.
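One way to obtain the t* values for this exercise without statistical software is sketched below. It is a rough numerical sketch, not a library routine: the helper names (`t_pdf`, `t_cdf`, `t_star`) are made up for illustration, the CDF is approximated with Simpson's rule, and the critical value is found by bisection. It prints a table rather than drawing the plot.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_star(df, conf=0.95):
    """Critical value with central area `conf`, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Z_STAR = 1.96  # Normal critical value for 95% confidence, as given in part (a)
for df in range(10, 101, 10):
    value = t_star(df)
    print(f"df = {df:3d}   t* = {value:.3f}   t* - z* = {value - Z_STAR:.3f}")
```

Calling `t_star(df, conf=0.90)` gives the values needed for the last part of the exercise.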

  4. 7.82 Sample size and margin of error. The margin of error for a confidence interval for μ depends on the confidence level, the sample standard deviation s, and the sample size. Fix the confidence level at 95% and the sample standard deviation at s=1 to examine the effect of the sample size. Find the margin of error for sample sizes of 11 to 101 by 10s; that is, let n=11, 21, 31, …, 101. Plot the margins of error versus the sample size and summarize the relationship.

  5. 7.83 Which design? The following situations all require inference about a mean or means. Identify each as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers.

    1. Your customers are college students. You are interested in comparing the interest in a new product that you are developing between those students who live in the dorms and those who live elsewhere.

    2. Your customers are college students. You are interested in finding out which of two new product labels is more appealing.

    3. Your customers are college students. You are interested in assessing their interest in a new product.

  6. 7.84 Identify the design. The following situations all require inference about a mean or means. Identify each as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers.

    1. You want to estimate the average age of your store’s customers.

    2. You do an SRS survey of your customers every year. One of the questions on the survey asks about customer satisfaction on a seven-point scale, with the response 1 indicating “very dissatisfied” and 7 indicating “very satisfied.” You want to see if the mean customer satisfaction has improved from last year.

    3. You ask an SRS of customers their opinions on each of two new floor plans for your store.

  7. 7.85 Number of critical food violations. The results of a major city’s restaurant inspections are available through its online newspaper.44 Critical food violations are those that put patrons at risk of getting sick and must immediately be corrected by the restaurant. An SRS of n=250 inspections was drawn from the collection of inspections since January 2012, resulting in x̄=1.031 violations and s=2.022 violations.

    1. Test the hypothesis that the average number of critical violations is less than 1.5, using a significance level of 0.05. State the two hypotheses, the test statistic, and the P-value.

    2. Construct a 95% confidence interval for the average number of critical violations and summarize your result.

    3. Which of the two summaries (significance test versus confidence interval) do you find more helpful in this case? Explain your answer.

    4. These data are integers ranging from 0 to 10. The data are also skewed to the right, with 79% of the values either a 0 or a 1. Given this information, do you think use of the t procedures is appropriate? Explain your answer.

  8. 7.86 Two-sample t test versus matched pairs t test. Consider the following data set. The data were actually collected in pairs, and each row represents a pair. Data set icon for paired.

    Group 1 Group 2
    48.86 48.88
    50.60 52.63
    51.02 52.55
    47.99 50.94
    54.20 53.02
    50.66 50.66
    45.91 47.78
    48.79 48.44
    47.76 48.92
    51.13 51.63
    1. Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative.

    2. Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value.

    3. Describe the differences in the two test results.
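The two analyses in this exercise differ only in which standard error is used. The following Python sketch computes both t statistics from the table above; P-values are left out because they require t distribution probabilities from software or Table D. Variable names are illustrative.

```python
import math
import statistics

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]
n = len(group1)

# (a) Two-sample analysis: ignores the pairing and adds the two variances.
mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
var1, var2 = statistics.variance(group1), statistics.variance(group2)
se_two = math.sqrt(var1 / n + var2 / n)
t_two = (mean1 - mean2) / se_two

# (b) Matched pairs analysis: works with the within-pair differences.
diffs = [a - b for a, b in zip(group1, group2)]
mean_diff = statistics.mean(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired
df_paired = n - 1

print(f"two-sample:    t = {t_two:.3f}  (SE = {se_two:.3f})")
print(f"matched pairs: t = {t_paired:.3f}  (SE = {se_paired:.3f}, df = {df_paired})")
```

Both analyses use the same estimated difference in means; only the standard error changes, because pairing removes the variation shared by the two members of each pair.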

  9. 7.87 Two-sample t test versus matched pairs t test, continued. Refer to the previous exercise. Perhaps an easier way to see the major difference in the two analysis approaches for these data is by computing 95% confidence intervals for the mean difference.

    1. Compute the 95% confidence interval using the two-sample t confidence interval.

    2. Compute the 95% confidence interval using the matched pairs t confidence interval.

    3. Compare the estimates (that is, the centers of the intervals) and margins of error. What is the major difference between the two approaches for these data?

  10. 7.88 Average service time. Another benchmark that was measured in the QSRMagazine.com drive-thru study, described in Exercise 7.49 (page 430), was the service time.45 A summary of the results (in seconds) for two of the chains is shown below:

    Chain       n    x̄       s
    Taco Bell   165  240.38  36.3
    McDonald’s  165  289.05  40.7
    1. Is there a difference in the average service time between these two chains? Test the null hypothesis that the chains’ average service time is the same. Use a significance level of 0.05.

    2. Construct a 95% confidence interval for the difference in average service time.

    3. Lex plans to go to Taco Bell and Sam to McDonald’s. Is the interval in part (b) likely to contain the difference in the service times they encounter? Explain your answer.

  11. 7.89 The efficacy of digital mindfulness training. There is growing evidence that in-person mindfulness training can reduce stress. Little is known, however, about the efficacy of self-guided digital training. To investigate this, a group of researchers randomized 69 participants to either a digital training group or a control group.46 For the digital group, participants were asked to complete the first 10 guided meditations using the mindfulness app Headspace. For the control group, participants were asked to listen to the 10 excerpts from an audiobook on mindfulness using Headspace. The following table summarizes the change from baseline in feelings of stress as measured using the Stress Overload Scale (SOS).

    Group    n   x̄     s
    Digital  41  5.39   8.36
    Control  28  0.10  10.46
    1. Can we conclude that the change from baseline is different across the two groups? Specify the hypotheses, test statistic, P-value, and conclusion using α=0.01. (Software gives k=49.35.)

    2. Can we conclude that the average stress level was reduced in the digital group? Specify the hypotheses, test statistic, P-value, and conclusion using α=0.01.

  12. 7.90 Incomplete follow-up. Refer to the previous exercise. The researchers report that 19 participants (n=13 digital and n=6 control) did not complete training and thus were not included in the analysis. Does this information in any way alter your conclusions in the previous exercise? Explain your answer.

  13. 7.91 Can mockingbirds learn to identify specific humans? A central question in urban ecology is why some animals adapt well to the presence of humans and others do not. The following results summarize part of a study of the northern mockingbird (Mimus polyglottos) that took place on a campus of a large university.47 For four consecutive days, the same human approached a nest and stood 1 meter away for 30 seconds and placed his or her hand on the rim of the nest. On the fifth day, a new person did the same thing. Each day, the distance of the human from the nest when the bird flushed was recorded. This was repeated for 24 nests. The human intruder varied his or her appearance (that is, wore different clothes) over the four days. We report results for only Days 1, 4, and 5 here. The response variable is flush distance, measured in meters.

    Day Mean s
    1  6.1 4.9
    4 15.1 7.3
    5  4.9 5.3
    1. Explain why this should be treated as a matched design.

    2. Unfortunately, the research article does not provide the standard error of the difference, only the standard error of the mean flush distance for each day. However, we can use the general addition rule for variances (page 247) to approximate it. If we assume that the correlation between the flush distance at Day 1 and Day 4 for each nest is ρ=0.40, what is the standard deviation for the difference in distance?

    3. Using your result in part (b), test the hypothesis that there is no difference in the flush distance across these two days. Use a significance level of 0.05.

    4. Repeat parts (b) and (c) but now compare Day 1 and Day 5, assuming a correlation between flush distances for each nest of ρ=0.30.

    5. Write a brief summary of your conclusions.
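The general addition rule for variances gives Var(X − Y) = Var(X) + Var(Y) − 2ρ·SD(X)·SD(Y). The Python sketch below applies it to the Day 1 versus Day 4 comparison using the table values and the assumed correlation ρ=0.40 from part (b); the function name is illustrative. The same function can be reused for Day 1 versus Day 5 with ρ=0.30.

```python
import math


def sd_of_difference(s_a, s_b, rho):
    """SD of a difference of two correlated variables (general addition rule)."""
    return math.sqrt(s_a ** 2 + s_b ** 2 - 2 * rho * s_a * s_b)


n_nests = 24

# Day 1 versus Day 4, with the correlation assumed in part (b).
sd_diff = sd_of_difference(4.9, 7.3, 0.40)
se_diff = sd_diff / math.sqrt(n_nests)
t_stat = (15.1 - 6.1) / se_diff  # matched pairs t statistic, df = n_nests - 1

print(f"SD of difference = {sd_diff:.3f} m")
print(f"SE of mean difference = {se_diff:.3f} m, t = {t_stat:.2f}, df = {n_nests - 1}")
```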

  14. 7.92 Analysis of tree size using the complete data set. The data used in Exercises 7.19 (page 407), 7.57, and 7.58 (page 432) were obtained by taking simple random samples from the 584 longleaf pine trees that were measured in the Wade Tract. The entire data set is given in the WADE data set. Find the 95% confidence interval for the mean DBH using the entire data set and compare this interval with the one that you calculated in Exercise 7.19. Write a report about these data. Include comments on the effect of the sample size on the margin of error, the distribution of the data, the appropriateness of the Normality-based methods for this problem, and the generalizability of the results to other similar stands of longleaf pine or other kinds of trees in this area of the United States and other areas. Data set icon for Wade.

  15. 7.93 Can snobby salespeople boost retail sales? Researchers asked 180 women to read a hypothetical shopping experience where they entered a luxury store (for example, Louis Vuitton, Gucci, Burberry) and asked a salesperson for directions to the items they sought. For half the women, the salesperson was condescending while doing this. The other half were directed in a neutral manner. After reading the experience, participants were asked various questions, including what price they were willing to pay (in dollars) for a particular product from the brand.48 Here is a summary of the results:

    Condition      n   x̄     s
    Condescending  90  4.44  3.98
    Neutral        90  3.95  2.88

    Were the participants who were treated rudely willing to pay more for the product? Analyze the data and write a report summarizing your work. Be sure to include details regarding the statistical methods you used, your assumptions, and your conclusions. If you use two-sample t procedures, software gives k=162.1 degrees of freedom.

  16. 7.94 A comparison of female high school students. A study was performed to determine the prevalence of the female athlete triad (low energy availability, menstrual dysfunction, and low bone mineral density) in high school students.49 A total of 80 high school athletes and 80 sedentary students were assessed. The following table summarizes several measured characteristics:

    Characteristic        Athletes x̄  Athletes s  Sedentary x̄  Sedentary s
    Body fat (%)               25.61        5.54         32.51         8.05
    Body mass index            21.60        2.46         26.41         2.73
    Calcium deficit (mg)      297.13      516.63        580.54       372.77
    Glasses of milk/day         2.21        1.46          1.82         1.24
    1. For each of the characteristics, test the hypothesis that the means are the same in the two groups. Use a significance level of 0.05 for each test. (Software gives k=140.1, 156.3, 143.7, and 154.0, respectively.)

    2. Write a short report summarizing your results.

  17. 7.95 More on snobby salespeople. Refer to Exercise 7.93. Researchers also asked a different 180 women to read the same hypothetical shopping experience, but now they entered a mass-market store (e.g., Gap, American Eagle, H&M). Here are those results (in dollars) for the two conditions:

    Condition      n   x̄     s
    Condescending  90  2.90  3.28
    Neutral        90  2.98  3.24

    Were the participants who were treated rudely willing to pay more for the product? Analyze the data and write a report summarizing your work. Be sure to include details regarding the statistical methods you used, your assumptions, and your conclusions. Also compare these results with the ones from Exercise 7.93. If you use two-sample t procedures, software gives k=178.0 degrees of freedom.

  18. 7.96 Transforming the response. Refer to Exercises 7.93 and 7.95. The researchers state that they took the natural log of the willingness to pay variable in order to “normalize the distribution” prior to analysis. Thus, their test results are based on log dollar measurements. For the t procedures used in these two exercises, do you feel this transformation is necessary? Explain your answer.

  19. 7.97 Competitive prices? A retailer entered into an exclusive agreement with a supplier who guaranteed to provide all products at competitive prices. The retailer eventually began to purchase supplies from other vendors who offered better prices. The original supplier filed a legal action claiming violation of the agreement. In defense, the retailer had an audit performed on a random sample of invoices. For each audited invoice, all purchases made from other suppliers were examined, and the prices were compared with those offered by the original supplier. For each invoice, the percent of purchases for which the alternate supplier offered a lower price than the original supplier was recorded.50 Here are the data:

     0 100  0 100  33  34 100  48  78 100  77 100  38
    68 100 79 100 100 100 100 100 100  89 100 100

    Report the average of the percents with a 95% margin of error. Do the sample invoices suggest that the original supplier’s prices are not competitive on the average? Data set icon for compete.

  20. 7.98 Weight-loss programs. In a study of the effectiveness of weight-loss programs, 47 subjects who were at least 20% overweight took part in a group support program for 10 weeks. Private weighings determined each subject’s weight at the beginning of the program and six months after the program’s end. The matched pairs t test was used to assess the significance of the average weight loss. The paper reporting the study said, “The subjects lost a significant amount of weight over time, t(46)=4.68, p<0.01.”51 It is common to report the results of statistical tests in this abbreviated style.

    1. Why was the matched pairs statistic appropriate?

    2. Explain to someone who knows no statistics but is interested in weight-loss programs what the practical conclusion is.

    3. The paper follows the tradition of reporting significance only at fixed levels, such as α=0.01. In fact, the results are more significant than “p<0.01” suggests. What can you say about the P-value of the t test?

  21. 7.99 Behavior of pet owners. On the morning of March 5, 1996, a train with 14 tankers of propane derailed near the center of the small Wisconsin town of Weyauwega. Six of the tankers were ruptured and burning when the 1700 residents were ordered to evacuate the town. Researchers study disasters like this so that effective relief efforts can be designed for future disasters. About half the households with pets did not evacuate all their pets. A study conducted after the derailment focused on problems associated with retrieval of the pets after the evacuation and characteristics of the pet owners. One of the scales measured “commitment to adult animals,” and the people who evacuated all or some of their pets were compared with those who did not evacuate any of their pets. Higher scores indicate that the pet owner is more likely to take actions that benefit the pet.52 Here are the data summaries:

    Group                       n    x̄     s
    Evacuated all or some pets  116  7.95  3.62
    Did not evacuate any pets   125  6.26  3.56

    Analyze the data and prepare a short report describing the results. (Software gives k=237.0.)

  22. 7.100 Sample size calculation. Example 7.13 (page 412) tells us that the heights of 10-year-old girls are N(56.9, 2.8) and those of boys are N(56.0, 3.5). The null hypothesis that the mean heights of 10-year-old boys and girls are equal is clearly false. The difference in mean heights is 56.9 − 56.0 = 0.9 inch. Small differences such as this can require large sample sizes to detect. To simplify our calculations, let’s assume that the standard deviations are the same (say, σ=3.2) and that we will measure the heights of an equal number of girls and boys. How many would we need to measure to have a 90% chance of detecting the (true) alternative hypothesis?
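A rough starting point for this calculation is the Normal-approximation formula n = 2(z_{α/2} + z_β)²σ²/δ² per group. The Python sketch below rests on two assumptions that the exercise does not state: a two-sided test at α=0.05, and a known σ so that Normal rather than noncentral t probabilities apply. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate per-group sample size for a two-sided two-sample z test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


# Values from the exercise: difference 0.9 inch, common sigma = 3.2.
n_needed = n_per_group(delta=0.9, sigma=3.2)
print(f"about {n_needed} girls and {n_needed} boys")
```

Software that uses the noncentral t distribution will typically report a value at or slightly above this approximation.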

  23. 7.101 Different methods of teaching reading. In the READ data set, the response variable Post3 is to be compared for three methods of teaching reading. The Basal method is the standard, or control, method, and the two new methods are DRTA and Strat. We can use the methods of this chapter to compare Basal with DRTA and Basal with Strat. Note that to make comparisons among three treatments it is more appropriate to use the procedures that we will learn in Chapter 12. Data set icon for read.

    1. Is the mean reading score with the DRTA method higher than that for the Basal method? Perform an analysis to answer this question, and summarize your results.

    2. Answer part (a) for the Strat method in place of DRTA.

  24. 7.102 Does flipping a classroom work? One approach to active learning is a “flipped classroom.” This commonly involves students watching video lectures outside of class and working on problem-solving activities in class. Research has primarily focused on comparing teaching approaches using end-of-class outcomes, such as final grade. In a recent project, researchers compared the lasting benefits of a flipped classroom by comparing grades received in the subsequent course of the series. Here are the results:53

    Group        n    x̄     s
    Flipped      166  2.45  1.09
    Traditional  129  1.59  1.20
    1. Do the average grades received differ across the two delivery styles of the previous course? Test the hypothesis that the average grades of the two groups are the same. (Software gives k=261.40.)

    2. Is this an experiment or an observational study? Explain your answer.

    3. Based on your answers to parts (a) and (b), what are your conclusions?

  25. 7.103 Conditions for inference. Suppose that your state contains 85 school corporations, and each corporation reports its expenditures per pupil. Is it proper to apply the one-sample t method to these data to give a 95% confidence interval for the average expenditure per pupil in your state? Explain your answer.

  26. 7.104 Food costs. The Consumer Expenditure Survey provides information on the buying habits of U.S. consumers.54 In the latest report, the average annual amount a person under the age of 25 spent on food was $4876, with a standard error of $248.

    1. Assuming a sample size of n=2000, calculate a 90% confidence interval for the average annual amount a person under the age of 25 spends on food.

    2. Will this interval capture 90% of all annual food expenditures by persons under the age of 25? Explain your answer.

  27. 7.105 Assessment of a foreign-language institute. The National Endowment for the Humanities sponsors summer institutes to improve the skills of high school teachers of foreign languages. One such institute hosted 20 French teachers for four weeks. At the beginning of the period, the teachers were given the Modern Language Association’s listening test of understanding of spoken French. After four weeks of immersion in French in and out of class, the listening test was given again. (The actual French spoken in the two tests was different, so that simply taking the first test should not improve the score on the second test.) The maximum possible score on the test is 36.⁵⁵ Here are the data: Data set icon for sumlang.

    Teacher Pretest Posttest Gain Teacher Pretest Posttest Gain
     1 32 34 2 11 30 36 6
     2 31 31 0 12 20 26 6
     3 29 35 6 13 24 27 3
     4 10 16 6 14 24 24 0
     5 30 33 3 15 31 32 1
     6 33 36 3 16 30 31 1
     7 22 24 2 17 15 15 0
     8 25 28 3 18 32 34 2
     9 32 26 -6 19 23 26 3
    10 20 26 6 20 23 26 3

    To analyze these data, we first subtract the pretest score from the posttest score to obtain the improvement for each teacher. These 20 differences form a single sample. They appear in the “Gain” columns. The first teacher, for example, improved from 32 to 34, so the gain is 34 − 32 = 2.

    1. State appropriate null and alternative hypotheses for examining the question of whether or not the course improves French spoken-language skills.

    2. Describe the gain data. Use numerical and graphical summaries.

    3. Perform the significance test. Give the test statistic, the degrees of freedom, and the P-value. Summarize your conclusion.

    4. Give a 95% confidence interval for the mean improvement.

  28. 7.106 Sign test for assessment of a foreign-language institute. Use the sign test to assess whether the summer institute of the previous exercise improves French listening skills. State the hypotheses, give the P-value using the binomial table (Table C), and report your conclusion. Data set icon for sumlang.
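The sign test drops the zero gains and treats the number of positive gains as Binomial(n, 1/2) under the null hypothesis. The Python sketch below applies this to the Gain columns of Exercise 7.105, using exact binomial probabilities in place of Table C and assuming the one-sided alternative that the institute improves scores.

```python
import math

# Gains for teachers 1-20, copied from the table in Exercise 7.105.
gains = [2, 0, 6, 6, 3, 3, 2, 3, -6, 6,
         6, 6, 3, 0, 1, 1, 0, 2, 3, 3]

nonzero = [g for g in gains if g != 0]          # ties carry no sign information
n_nonzero = len(nonzero)
n_positive = sum(1 for g in nonzero if g > 0)

# One-sided P-value: P(X >= n_positive) for X ~ Binomial(n_nonzero, 1/2).
p_value = sum(math.comb(n_nonzero, k)
              for k in range(n_positive, n_nonzero + 1)) / 2 ** n_nonzero

print(f"{n_positive} positive gains out of {n_nonzero} nonzero gains")
print(f"one-sided P-value = {p_value:.6f}")
```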

  29. 7.107 Approximating power. When software is not available to compute probabilities from the non-central t distribution, one can approximate the power by assuming the standard deviation σ is known and using the Normal distribution. Let’s compare the exact and approximate power calculations for a new mindfulness study, described in Exercise 7.89. Specifically, we’ll assume σ=10, α=0.05, there are n=41 subjects per group, and we want to detect any difference in the change of SOS scores that is more than 4.5 units. The exact power is 0.5212. Proceed through the following steps to get the approximate value.

    1. Given n1=n2=41, state the degrees of freedom for the two-sample pooled t test.

    2. Assuming α=0.05, use Table D to obtain the critical t value, t*.

    3. This means we reject when

      | ((x̄1 − x̄2) − 0) / (10√(2/41)) | ≥ t*

      Rewrite this event so that it is in terms of x̄1 − x̄2.

    4. Compute the probability of the event in part (c), now assuming (x̄1 − x̄2) ~ N(4.5, 10√(2/41)). How does it compare to the exact value?
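The four steps can be traced numerically. The Python sketch below assumes a known standard deviation of 10, 41 subjects per group, a true difference of 4.5, and a hand-entered critical value t*=1.990 for 80 degrees of freedom (the kind of value Table D provides; it is typed in here, not computed). The rejection rule is taken to be that the standardized difference exceeds t* in absolute value.

```python
import math
from statistics import NormalDist

sigma = 10.0        # standard deviation, treated as known
n_per_group = 41
delta = 4.5         # true difference in means under the alternative
t_star = 1.990      # assumed table value for df = 80, two-sided alpha = 0.05

# (a) Degrees of freedom for the pooled two-sample t test.
df = 2 * n_per_group - 2

# (c) Reject when |xbar1 - xbar2| is at least t* times the standard error.
se = sigma * math.sqrt(2 / n_per_group)
cutoff = t_star * se

# (d) Probability of rejecting when xbar1 - xbar2 ~ N(delta, se).
sampling_dist = NormalDist(mu=delta, sigma=se)
power_approx = (1 - sampling_dist.cdf(cutoff)) + sampling_dist.cdf(-cutoff)

print(f"df = {df}, reject when |xbar1 - xbar2| >= {cutoff:.3f}")
print(f"approximate power = {power_approx:.4f} (exact value quoted: 0.5212)")
```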

PUTTING IT ALL TOGETHER

  1. 7.108 Food intake and weight gain. If we increase our food intake, we generally gain weight. Nutrition scientists can calculate the amount of weight gain that would be associated with a given increase in calories. In one study, 16 nonobese adults, aged 25 to 36 years, were fed 1000 calories per day in excess of the calories needed to maintain a stable body weight. The subjects maintained this diet for eight weeks, so they consumed a total of 56,000 extra calories.56 According to theory, 3500 extra calories will translate into a weight gain of 1 pound. Therefore, we expect each of these subjects to gain 56,000/3500 = 16 pounds (lb). Here are the weights before and after the eight-week period, expressed in kilograms (kg): Data set icon for wtgain.

    Subject 1 2 3 4 5 6 7 8
    Weight before 55.7 54.9 59.6 62.3 74.2 75.6 70.7 53.3
    Weight after 61.7 58.8 66.0 66.2 79.0 82.3 74.3 59.3
    Subject 9 10 11 12 13 14 15 16
    Weight before 73.3 63.4 68.1 73.7 91.7 55.9 61.7 57.8
    Weight after 79.1 66.0 73.4 76.9 93.1 63.0 68.2 60.3
    1. For each subject, subtract the weight before from the weight after to determine the weight change.

    2. Find the mean and the standard deviation for the weight change.

    3. Calculate the standard error and the margin of error for 95% confidence. Report the 95% confidence interval for weight change in a sentence that explains the meaning of the 95%.

    4. Convert the mean weight gain in kilograms to mean weight gain in pounds. Because there are 2.2 pounds per kilogram, multiply the value in kilograms by 2.2 to obtain pounds. Do the same for the standard deviation and the confidence interval.

    5. Test the null hypothesis that the mean weight gain is 16 lb. Be sure to specify the null and alternative hypotheses, the test statistic with degrees of freedom, and the P-value. What do you conclude?

    6. Write a short paragraph explaining your results.
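The arithmetic for this exercise can be organized as in the Python sketch below. It uses the conversion factor of 2.2 lb per kg and stops at the t statistic; the critical value and P-value come from the t distribution with 15 degrees of freedom (Table D or software). Variable names are illustrative.

```python
import math
import statistics

before = [55.7, 54.9, 59.6, 62.3, 74.2, 75.6, 70.7, 53.3,
          73.3, 63.4, 68.1, 73.7, 91.7, 55.9, 61.7, 57.8]
after = [61.7, 58.8, 66.0, 66.2, 79.0, 82.3, 74.3, 59.3,
         79.1, 66.0, 73.4, 76.9, 93.1, 63.0, 68.2, 60.3]
LB_PER_KG = 2.2  # conversion factor used in the exercise

# (a)-(b) Weight change per subject, then its mean and standard deviation.
gains_kg = [a - b for a, b in zip(after, before)]
n = len(gains_kg)
mean_kg = statistics.mean(gains_kg)
sd_kg = statistics.stdev(gains_kg)

# (d) Rescaling by a constant multiplies both the mean and the SD.
mean_lb = mean_kg * LB_PER_KG
sd_lb = sd_kg * LB_PER_KG

# (e) One-sample t statistic for H0: mean gain = 16 lb.
se_lb = sd_lb / math.sqrt(n)
t_stat = (mean_lb - 16) / se_lb

print(f"mean gain = {mean_kg:.3f} kg = {mean_lb:.2f} lb")
print(f"SD = {sd_kg:.3f} kg = {sd_lb:.2f} lb, SE = {se_lb:.3f} lb")
print(f"t = {t_stat:.2f} with df = {n - 1}")
```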

  2. 7.109 Food intake and NEAT. Nonexercise activity thermogenesis (NEAT) provides a partial explanation for the results you found in the previous analysis. NEAT is energy burned by fidgeting, maintenance of posture, spontaneous muscle contraction, and other activities of daily living. In the study of the previous exercise, the 16 subjects increased their NEAT by 328 calories per day, on average, in response to the additional food intake. The standard deviation was 256.

    1. Test the null hypothesis that there was no change in NEAT versus the two-sided alternative. Summarize the results of the test and give your conclusion.

    2. Find a 95% confidence interval for the change in NEAT. Discuss the additional information provided by the confidence interval that is not evident from the results of the significance test.

  3. 7.110 Alcohol consumption and body composition. Individuals who consume large amounts of alcohol do not use the calories from this source as efficiently as calories from other sources. One study examined the effects of moderate alcohol consumption on body composition and the intake of other foods. Fourteen subjects participated in a crossover design where they either drank wine for the first six weeks and then abstained for the next six weeks or vice versa.57 During the period when they drank wine, the subjects, on average, lost 0.4 kilogram (kg) of body weight; when they did not drink wine, they lost an average of 1.1 kg. The standard deviation of the difference between the weight lost under these two conditions is 8.6 kg. During the wine period, they consumed an average of 2589 calories; with no wine, the mean consumption was 2575. The standard deviation of the difference was 210.

    1. Compute the differences in means and the standard errors for comparing body weight and caloric intake under the two experimental conditions.

    2. A report of the study indicated that there were no significant differences in these two outcome measures. Verify this result for each measure, giving the test statistic, degrees of freedom, and the P-value.

    3. One concern with studies such as this, with a small number of subjects, is that there may not be sufficient power to detect differences that are potentially important. Address this question by computing 95% confidence intervals for the two measures and discuss the information provided by the intervals.

    4. Here are some other characteristics of the study. The study periods lasted six weeks. All subjects were males between the ages of 21 and 50 years who weighed between 68 and 91 kg. They were all from the same city. During the wine period, subjects were told to consume two 135-milliliter (ml) servings of red wine per day and no other alcohol. The entire six-week supply was given to each subject at the beginning of the period. During the other period, subjects were instructed to refrain from any use of alcohol. All subjects reported that they complied with these instructions except for three subjects, who said that they drank no more than three to four 12-ounce bottles of beer during the no-alcohol period. Discuss how these factors could influence the interpretation of the results.

  4. 7.111 Do women perform better in school? Some research suggests that women perform better than men in school, but men score higher on standardized tests. Table 1.2 (page 24) presents data on a measure of school performance, grade point average (GPA), and a standardized test, IQ, for 78 seventh-grade students. Do these data lend further support to the previously found gender differences? Give graphical displays of the data and describe the distributions. Use significance tests and confidence intervals to examine this question, and prepare a short report summarizing your findings. Data set icon for grades.

  5. 7.112 Self-concept and school performance. Refer to the previous exercise. Although self-concept in this study was measured on a scale with values in the data set ranging from 20 to 80, many prefer to think of this kind of variable as having only two possible values: low self-concept or high self-concept. Find the median of the self-concept scores in Table 1.2, and define those students with scores at or below the median to be low-self-concept students and those with scores above the median to be high-self-concept students. Do high-self-concept students have GPAs that differ from those of low-self-concept students? What about IQ? Prepare a report addressing these questions. Be sure to include graphical and numerical summaries and confidence intervals, and state clearly the details of significance tests. Data set icon for grades.