For any two-sample t problems, try to use the degrees of freedom
approximation provided by software. For exercises involving
summarized data, this approximation is provided for you. If you
instead use the conservative approximation, the smaller of
7.79 LSAT scores. The scores of four senior roommates on the Law School Admission Test (LSAT) are
Find the mean, the standard deviation, and the standard error of
the mean. Is it appropriate to calculate a confidence interval
based on these data? Explain why or why not.
7.80 Converting a two-sided P-value. You use statistical software to perform a significance test of the null hypothesis that two means are equal. The software reports a P-value for the two-sided alternative. Your alternative is that the first mean is greater than the second mean.
The software reports
The software reports
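The conversion in Exercise 7.80 can be sketched as a small function. This is a minimal sketch, assuming the software computes its statistic as first mean minus second mean; the function name and the example inputs are illustrative assumptions, not values from the exercise.

```python
def one_sided_p_greater(two_sided_p: float, t_stat: float) -> float:
    """P-value for Ha: mu1 > mu2, given a two-sided P-value from a symmetric t test.

    Assumes t_stat is (first mean - second mean) / standard error.
    """
    if not 0.0 <= two_sided_p <= 1.0:
        raise ValueError("two_sided_p must lie in [0, 1]")
    half = two_sided_p / 2
    # A positive statistic points in the direction of Ha, so half of the
    # two-sided P-value applies; otherwise the one-sided P-value is 1 - half.
    return half if t_stat > 0 else 1 - half


# Hypothetical inputs, for illustration only.
p_same_direction = one_sided_p_greater(0.06, t_stat=1.9)
p_opposite_direction = one_sided_p_greater(0.06, t_stat=-1.9)
```

The sign of the statistic matters: the same two-sided P-value gives a small one-sided P-value only when the data point in the direction of the alternative.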
7.81 Degrees of freedom and
(a) Make a plot with degrees of freedom from 10 to 100 by 10 on
the x axis and
(b) Summarize the main features of the plot.
(c) Describe how this plot would change if you considered a 90% confidence interval.
7.82 Sample size and margin of error. The margin
of error for a confidence interval for
7.83 Which design? The following situations all require inference about a mean or means. Identify each as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers.
(a) Your customers are college students. You are interested in comparing the interest in a new product that you are developing between those students who live in the dorms and those who live elsewhere.
(b) Your customers are college students. You are interested in finding out which of two new product labels is more appealing.
(c) Your customers are college students. You are interested in assessing their interest in a new product.
7.84 Identify the design. The following situations all require inference about a mean or means. Identify each as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers.
(a) You want to estimate the average age of your store’s customers.
(b) You do an SRS survey of your customers every year. One of the questions on the survey asks about customer satisfaction on a seven-point scale, with the response 1 indicating “very dissatisfied” and 7 indicating “very satisfied.” You want to see if the mean customer satisfaction has improved from last year.
(c) You ask an SRS of customers their opinions on each of two new floor plans for your store.
7.85 Number of critical food violations.
The results of a major city’s restaurant inspections are available
through its online newspaper.44
Critical food violations are those that put patrons at risk of
getting sick and must immediately be corrected by the restaurant.
An SRS of
(a) Test the hypothesis that the average number of critical violations is less than 1.5, using a significance level of 0.05. State the two hypotheses, the test statistic, and the P-value.
(b) Construct a 95% confidence interval for the average number of critical violations and summarize your result.
(c) Which of the two summaries (significance test versus confidence interval) do you find more helpful in this case? Explain your answer.
(d) These data are integers ranging from 0 to 10. The data are also skewed to the right, with 79% of the values either a 0 or a 1. Given this information, do you think use of the t procedures is appropriate? Explain your answer.
7.86 Two-sample t test versus matched pairs
t test.
Consider the following data set. The data were actually collected
in pairs, and each row represents a pair.
Group 1 | Group 2 |
---|---|
48.86 | 48.88 |
50.60 | 52.63 |
51.02 | 52.55 |
47.99 | 50.94 |
54.20 | 53.02 |
50.66 | 50.66 |
45.91 | 47.78 |
48.79 | 48.44 |
47.76 | 48.92 |
51.13 | 51.63 |
(a) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative.
(b) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value.
(c) Describe the differences in the two test results.
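The two analyses in Exercise 7.86 can be set side by side in a short sketch. It uses only the Python standard library and stops at the t statistics and degrees of freedom; the P-values would additionally need a t distribution function, which is not included here. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]
n = len(group1)

# (Mistaken) two-sample analysis: treats the columns as independent samples.
v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 divisor)
se_two = sqrt(v1 / n + v2 / n)
t_two = (mean(group1) - mean(group2)) / se_two
# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch = (v1 / n + v2 / n) ** 2 / (
    (v1 / n) ** 2 / (n - 1) + (v2 / n) ** 2 / (n - 1)
)

# Matched pairs analysis: one-sample t on the within-pair differences.
diffs = [a - b for a, b in zip(group1, group2)]
se_paired = sqrt(variance(diffs) / n)
t_paired = mean(diffs) / se_paired
df_paired = n - 1

print(f"two-sample: t = {t_two:.3f}, Welch df = {df_welch:.1f}")
print(f"paired:     t = {t_paired:.3f}, df = {df_paired}")
```

Both analyses share the same numerator (the difference in means); only the standard error and the degrees of freedom change.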
7.87 Two-sample t test versus matched pairs t test, continued. Refer to the previous exercise. Perhaps an easier way to see the major difference in the two analysis approaches for these data is by computing 95% confidence intervals for the mean difference.
(a) Compute the 95% confidence interval using the two-sample t confidence interval.
(b) Compute the 95% confidence interval using the matched pairs t confidence interval.
(c) Compare the estimates (that is, the centers of the intervals) and margins of error. What is the major difference between the two approaches for these data?
7.88 Average service time. Another benchmark that was measured in the QSRMagazine.com drive-thru study, described in Exercise 7.49 (page 430), was the service time.45 A summary of the results (in seconds) for two of the chains is shown below:
Chain | n | $\bar{x}$ | s |
---|---|---|---|
Taco Bell | 165 | 240.38 | 36.3 |
McDonald’s | 165 | 289.05 | 40.7 |
(a) Is there a difference in the average service time between these two chains? Test the null hypothesis that the chains’ average service time is the same. Use a significance level of 0.05.
(b) Construct a 95% confidence interval for the difference in average service time.
(c) Lex plans to go to Taco Bell and Sam to McDonald’s. Is the interval in part (b) likely to contain the difference in the service times they encounter? Explain your answer.
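For exercises that give only summary statistics, such as Exercise 7.88, the unpooled two-sample t statistic can be computed directly from n, the sample mean, and s. The following is a minimal sketch using only the standard library; the function name is an illustrative assumption, and the P-value and critical value are left to software or tables.

```python
from math import sqrt


def two_sample_t_from_summary(n1, mean1, s1, n2, mean2, s2):
    """Unpooled two-sample t statistic with Welch and conservative df."""
    a, b = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    se = sqrt(a + b)                        # standard error of the difference
    t = (mean1 - mean2) / se
    df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    df_conservative = min(n1, n2) - 1       # smaller of n1 - 1 and n2 - 1
    return t, se, df_welch, df_conservative


# Summary statistics from the service-time table (seconds).
t, se, df_welch, df_cons = two_sample_t_from_summary(
    165, 240.38, 36.3,   # Taco Bell
    165, 289.05, 40.7,   # McDonald's
)
print(f"t = {t:.2f}, SE = {se:.3f}, Welch df = {df_welch:.1f}, conservative df = {df_cons}")
```

The same function applies to the other summary tables in this section by substituting their values.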
7.89 The efficacy of digital mindfulness training. There is growing evidence that in-person mindfulness training can reduce stress. Little is known, however, about the efficacy of self-guided digital training. To investigate this, a group of researchers randomized 69 participants to either a digital training group or a control group.46 For the digital group, participants were asked to complete the first 10 guided meditations using the mindfulness app Headspace. For the control group, participants were asked to listen to the 10 excerpts from an audiobook on mindfulness using Headspace. The following table summarizes the change from baseline in feelings of stress as measured using the Stress Overload Scale (SOS).
Group | n | $\bar{x}$ | s |
---|---|---|---|
Digital | 41 | | 8.36 |
Control | 28 | 0.10 | 10.46 |
(a) Can we conclude that the change from baseline is different
across the two groups? Specify the hypotheses, test statistic,
P-value, and conclusion using
(b) Can we conclude that the average stress level was reduced in
the digital group? Specify the hypotheses, test statistic,
P-value, and conclusion using
7.90 Incomplete follow-up. Refer to the previous
exercise. The researchers report that 19 participants
7.91 Can mockingbirds learn to identify specific humans?
A central question in urban ecology is why some animals adapt well
to the presence of humans and others do not. The following results
summarize part of a study of the northern mockingbird (Mimus polyglottos) that took place on a campus of a large university.47
For four consecutive days, the same human approached a nest and
stood 1 meter away for 30 seconds and placed his or her hand on
the rim of the nest. On the fifth day, a new person did the same
thing. Each day, the distance of the human from the nest when the
bird flushed was recorded. This was repeated for 24 nests. The
human intruder varied his or her appearance (that is, wore
different clothes) over the four days. We report results for only
Days 1, 4, and 5 here. The response variable is flush distance,
measured in meters.
Day | Mean | s |
---|---|---|
1 | 6.1 | 4.9 |
4 | 15.1 | 7.3 |
5 | 4.9 | 5.3 |
(a) Explain why this should be treated as a matched design.
(b) Unfortunately, the research article does not provide the
standard error of the difference, only the standard error of
the mean flush distance for each day. However, we can use the
general addition rule for variances (page 247)
to approximate it. If we assume that the correlation between
the flush distance at Day 1 and Day 4 for each nest is
(c) Using your result in part (b), test the hypothesis that there is no difference in the flush distance across these two days. Use a significance level of 0.05.
(d) Repeat parts (b) and (c) but now compare Day 1 and Day 5,
assuming a correlation between flush distances for each nest
of
(e) Write a brief summary of your conclusions.
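The variance rule used in Exercise 7.91 says that for paired measurements the variance of the difference is $s_1^2 + s_2^2 - 2 r s_1 s_2$. A minimal sketch of that calculation follows. It assumes the s column of the table holds per-day standard deviations, and the correlation value `R_PLACEHOLDER` is a made-up stand-in, not the value stated in the exercise.

```python
from math import sqrt


def se_mean_difference(s1: float, s2: float, r: float, n: int) -> float:
    """Standard error of the mean paired difference from two SDs and a correlation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    # Addition rule for variances with correlated variables.
    var_diff = s1**2 + s2**2 - 2 * r * s1 * s2
    return sqrt(var_diff / n)


R_PLACEHOLDER = 0.5  # hypothetical correlation, for illustration only

# Day 1 versus Day 4 across the 24 nests.
se_day1_vs_day4 = se_mean_difference(4.9, 7.3, R_PLACEHOLDER, n=24)
t_day1_vs_day4 = (6.1 - 15.1) / se_day1_vs_day4
print(f"SE = {se_day1_vs_day4:.3f}, t = {t_day1_vs_day4:.2f}, df = {24 - 1}")
```

A larger positive correlation shrinks the standard error, which is why ignoring the pairing (r = 0) gives a less sensitive test.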
7.92 Analysis of tree size using the complete data set.
The data used in
Exercises 7.19
(page 407),
7.57, and
7.58 (page 432) were obtained by taking simple random samples from the 584
longleaf pine trees that were measured in the Wade Tract. The
entire data set is given in the WADE data set. Find the 95%
confidence interval for the mean DBH using the entire data set and
compare this interval with the one that you calculated in
Exercise 7.19. Write a report about these data. Include comments on the effect
of the sample size on the margin of error, the distribution of the
data, the appropriateness of the Normality-based methods for this
problem, and the generalizability of the results to other similar
stands of longleaf pine or other kinds of trees in this area of
the United States and other areas.
7.93 Can snobby salespeople boost retail sales? Researchers asked 180 women to read a hypothetical shopping experience where they entered a luxury store (for example, Louis Vuitton, Gucci, Burberry) and asked a salesperson for directions to the items they sought. For half the women, the salesperson was condescending while doing this. The other half were directed in a neutral manner. After reading the experience, participants were asked various questions, including what price they were willing to pay (in dollars) for a particular product from the brand.48 Here is a summary of the results:
Condition | n | $\bar{x}$ | s |
---|---|---|---|
Condescending | 90 | 4.44 | 3.98 |
Neutral | 90 | 3.95 | 2.88 |
Were the participants who were treated rudely willing to pay more
for the product? Analyze the data and write a report summarizing
your work. Be sure to include details regarding the statistical
methods you used, your assumptions, and your conclusions. If you
use two-sample t procedures, software gives
7.94 A comparison of female high school students. A study was performed to determine the prevalence of the female athlete triad (low energy availability, menstrual dysfunction, and low bone mineral density) in high school students.49 A total of 80 high school athletes and 80 sedentary students were assessed. The following table summarizes several measured characteristics:
Characteristic | Athletes $\bar{x}$ | Athletes s | Sedentary $\bar{x}$ | Sedentary s |
---|---|---|---|---|
Body fat (%) | 25.61 | 5.54 | 32.51 | 8.05 |
Body mass index | 21.60 | 2.46 | 26.41 | 2.73 |
Calcium deficit (mg) | 297.13 | 516.63 | 580.54 | 372.77 |
Glasses of milk/day | 2.21 | 1.46 | 1.82 | 1.24 |
(a) For each of the characteristics, test the hypothesis that the
means are the same in the two groups. Use a significance level
of 0.05 for each test. (Software gives
(b) Write a short report summarizing your results.
7.95 More on snobby salespeople. Refer to Exercise 7.93. Researchers also asked a different 180 women to read the same hypothetical shopping experience, but now they entered a mass-market store (e.g., Gap, American Eagle, H&M). Here are those results (in dollars) for the two conditions:
Condition | n | $\bar{x}$ | s |
---|---|---|---|
Condescending | 90 | 2.90 | 3.28 |
Neutral | 90 | 2.98 | 3.24 |
Were the participants who were treated rudely willing to pay more
for the product? Analyze the data and write a report summarizing
your work. Be sure to include details regarding the statistical
methods you used, your assumptions, and your conclusions. Also
compare these results with the ones from
Exercise 7.93. If you use two-sample t procedures, software gives
7.96 Transforming the response. Refer to Exercises 7.93 and 7.95. The researchers state that they took the natural log of the willingness to pay variable in order to “normalize the distribution” prior to analysis. Thus, their test results are based on log dollar measurements. For the t procedures used in these two exercises, do you feel this transformation is necessary? Explain your answer.
7.97 Competitive prices? A retailer entered into an exclusive agreement with a supplier who guaranteed to provide all products at competitive prices. The retailer eventually began to purchase supplies from other vendors who offered better prices. The original supplier filed a legal action claiming violation of the agreement. In defense, the retailer had an audit performed on a random sample of invoices. For each audited invoice, all purchases made from other suppliers were examined, and the prices were compared with those offered by the original supplier. For each invoice, the percent of purchases for which the alternate supplier offered a lower price than the original supplier was recorded.50 Here are the data:
0 | 100 | 0 | 100 | 33 | 34 | 100 | 48 | 78 | 100 | 77 | 100 | 38 |
68 | 100 | 79 | 100 | 100 | 100 | 100 | 100 | 100 | 89 | 100 | 100 |
Report the average of the percents with a 95% margin of error. Do
the sample invoices suggest that the original supplier’s prices
are not competitive on the average?
7.98 Weight-loss programs. In a study of the
effectiveness of weight-loss programs, 47 subjects who were at
least 20% overweight took part in a group support program for 10
weeks. Private weighings determined each subject’s weight at the
beginning of the program and six months after the program’s end.
The matched pairs t test was used to assess the
significance of the average weight loss. The paper reporting the
study said, “The subjects lost a significant amount of weight over
time,
(a) Why was the matched pairs statistic appropriate?
(b) Explain to someone who knows no statistics but is interested in weight-loss programs what the practical conclusion is.
(c) The paper follows the tradition of reporting significance only
at fixed levels, such as
7.99 Behavior of pet owners. On the morning of March 5, 1996, a train with 14 tankers of propane derailed near the center of the small Wisconsin town of Weyauwega. Six of the tankers were ruptured and burning when the 1700 residents were ordered to evacuate the town. Researchers study disasters like this so that effective relief efforts can be designed for future disasters. About half the households with pets did not evacuate all their pets. A study conducted after the derailment focused on problems associated with retrieval of the pets after the evacuation and characteristics of the pet owners. One of the scales measured “commitment to adult animals,” and the people who evacuated all or some of their pets were compared with those who did not evacuate any of their pets. Higher scores indicate that the pet owner is more likely to take actions that benefit the pet.52 Here are the data summaries:
Group | n | $\bar{x}$ | s |
---|---|---|---|
Evacuated all or some pets | 116 | 7.95 | 3.62 |
Did not evacuate any pets | 125 | 6.26 | 3.56 |
Analyze the data and prepare a short report describing the
results. (Software gives
7.100 Sample size calculation.
Example 7.13
(page 412)
tells us that the heights of 10-year-old girls are
N(56.9, 2.8) and those of boys are N(56.0, 3.5). The
null hypothesis that the mean heights of 10-year-old boys and
girls are equal is clearly false. The difference in mean heights
is
7.101 Different methods of teaching reading.
In the READ data set, the response variable Post3 is to be
compared for three methods of teaching reading. The Basal method
is the standard, or control, method, and the two new methods are
DRTA and Strat. We can use the methods of this chapter to compare
Basal with DRTA and Basal with Strat. Note that to make
comparisons among three treatments it is more appropriate to use
the procedures that we will learn in
Chapter 12.
(a) Is the mean reading score with the DRTA method higher than that for the Basal method? Perform an analysis to answer this question, and summarize your results.
(b) Answer part (a) for the Strat method in place of DRTA.
7.102 Does flipping a classroom work? One approach to active learning is a “flipped classroom.” This commonly involves students watching video lectures outside of class and working on problem-solving activities in class. Research has primarily focused on comparing teaching approaches using end-of-class outcomes, such as final grade. In a recent project, researchers compared the lasting benefits of a flipped classroom by comparing grades received in the subsequent course of the series. Here are the results:53
Group | n | $\bar{x}$ | s |
---|---|---|---|
Flipped | 166 | 2.45 | 1.09 |
Traditional | 129 | 1.59 | 1.20 |
(a) Do the average grades received differ across the two delivery
styles of the previous course? Test the hypothesis that the
average grades of the two groups are the same. (Software gives
(b) Is this an experiment or an observational study? Explain your answer.
(c) Based on your answers to parts (a) and (b), what are your conclusions?
7.103 Conditions for inference. Suppose that your state contains 85 school corporations, and each corporation reports its expenditures per pupil. Is it proper to apply the one-sample t method to these data to give a 95% confidence interval for the average expenditure per pupil in your state? Explain your answer.
7.104 Food costs. The Consumer Expenditure Survey provides information on the buying habits of U.S. consumers.54 In the latest report, the average annual amount a person under the age of 25 spent on food was $4876, with a standard error of $248.
(a) Assuming a sample size of
(b) Will this interval capture 90% of all annual food expenditures by persons under the age of 25? Explain your answer.
7.105 Assessment of a foreign-language institute.
The National Endowment for the Humanities sponsors summer
institutes to improve the skills of high school teachers of
foreign languages. One such institute hosted 20 French teachers
for four weeks. At the beginning of the period, the teachers were
given the Modern Language Association’s listening test of
understanding of spoken French. After four weeks of immersion in
French in and out of class, the listening test was given again.
(The actual French spoken in the two tests was different, so that
simply taking the first test should not improve the score on the
second test.) The maximum possible score on the test is 36.55
Here are the data:
Teacher | Pretest | Posttest | Gain | Teacher | Pretest | Posttest | Gain |
---|---|---|---|---|---|---|---|
1 | 32 | 34 | 2 | 11 | 30 | 36 | 6 |
2 | 31 | 31 | 0 | 12 | 20 | 26 | 6 |
3 | 29 | 35 | 6 | 13 | 24 | 27 | 3 |
4 | 10 | 16 | 6 | 14 | 24 | 24 | 0 |
5 | 30 | 33 | 3 | 15 | 31 | 32 | 1 |
6 | 33 | 36 | 3 | 16 | 30 | 31 | 1 |
7 | 22 | 24 | 2 | 17 | 15 | 15 | 0 |
8 | 25 | 28 | 3 | 18 | 32 | 34 | 2 |
9 | 32 | 26 | -6 | 19 | 23 | 26 | 3 |
10 | 20 | 26 | 6 | 20 | 23 | 26 | 3 |
To analyze these data, we first subtract the pretest score from
the posttest score to obtain the improvement for each teacher.
These 20 differences form a single sample. They appear in the
“Gain” columns. The first teacher, for example, improved from 32
to 34, so the gain is
(a) State appropriate null and alternative hypotheses for examining the question of whether or not the course improves French spoken-language skills.
(b) Describe the gain data. Use numerical and graphical summaries.
(c) Perform the significance test. Give the test statistic, the degrees of freedom, and the P-value. Summarize your conclusion.
(d) Give a 95% confidence interval for the mean improvement.
7.106 Sign test for assessment of a foreign-language
institute.
Use the sign test to assess whether the summer institute of the
previous exercise improves French listening skills. State the
hypotheses, give the P-value using the binomial table (Table C), and report your conclusion.
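As a check on the table lookup in Exercise 7.106, the sign test P-value can also be computed exactly from the binomial distribution. This is a minimal sketch using the Gain column of the table in Exercise 7.105; it follows the usual convention of dropping zero gains and uses the one-sided alternative that gains tend to be positive.

```python
from math import comb

# Gain column for teachers 1 through 20.
gains = [2, 0, 6, 6, 3, 3, 2, 3, -6, 6, 6, 6, 3, 0, 1, 1, 0, 2, 3, 3]

# Drop ties (zero gains) before counting signs.
nonzero = [g for g in gains if g != 0]
n = len(nonzero)
n_positive = sum(g > 0 for g in nonzero)

# One-sided P-value: P(X >= n_positive) for X ~ Binomial(n, 1/2).
p_value = sum(comb(n, k) for k in range(n_positive, n + 1)) / 2**n

print(f"n = {n}, positive gains = {n_positive}, P-value = {p_value:.6f}")
```

With 16 positive gains among the 17 nonzero differences, the exact P-value is 18/2^17.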
7.107 Approximating power.
When software is not available to compute probabilities from the
non-central t distribution, one can approximate the power
by assuming the standard deviation
Given
Assuming
This means we reject when
Rewrite this event so that it is in terms of
Compute the probability of this event in part (c) now assuming
7.108 Food intake and weight gain. If we
increase our food intake, we generally gain weight. Nutrition
scientists can calculate the amount of weight gain that would be
associated with a given increase in calories. In one study, 16
nonobese adults, aged 25 to 36 years, were fed 1000 calories per
day in excess of the calories needed to maintain a stable body
weight. The subjects maintained this diet for eight weeks, so
they consumed a total of 56,000 extra calories.56
According to theory, 3500 extra calories will translate into a
weight gain of 1 pound. Therefore, we expect each of these
subjects to gain
Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Weight before (kg) | 55.7 | 54.9 | 59.6 | 62.3 | 74.2 | 75.6 | 70.7 | 53.3 |
Weight after (kg) | 61.7 | 58.8 | 66.0 | 66.2 | 79.0 | 82.3 | 74.3 | 59.3 |

Subject | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|
Weight before (kg) | 73.3 | 63.4 | 68.1 | 73.7 | 91.7 | 55.9 | 61.7 | 57.8 |
Weight after (kg) | 79.1 | 66.0 | 73.4 | 76.9 | 93.1 | 63.0 | 68.2 | 60.3 |
(a) For each subject, subtract the weight before from the weight after to determine the weight change.
(b) Find the mean and the standard deviation for the weight change.
(c) Calculate the standard error and the margin of error for 95% confidence. Report the 95% confidence interval for weight change in a sentence that explains the meaning of the 95%.
(d) Convert the mean weight gain in kilograms to mean weight gain in pounds. Because there are 2.2 lb per kilogram, multiply the value in kilograms by 2.2 to obtain pounds. Do the same for the standard deviation and the confidence interval.
(e) Test the null hypothesis that the mean weight gain is 16 lb. Be sure to specify the null and alternative hypotheses, the test statistic with degrees of freedom, and the P-value. What do you conclude?
(f) Write a short paragraph explaining your results.
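The arithmetic in Exercise 7.108 can be sketched with the standard library. The block below computes the weight changes, their mean and standard deviation, the conversion to pounds (taking 2.2 lb per kilogram), and the one-sample t statistic against 16 lb. The margin of error and P-value need a t critical value or distribution function from software or tables, so they are not computed here; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [55.7, 54.9, 59.6, 62.3, 74.2, 75.6, 70.7, 53.3,
          73.3, 63.4, 68.1, 73.7, 91.7, 55.9, 61.7, 57.8]
after = [61.7, 58.8, 66.0, 66.2, 79.0, 82.3, 74.3, 59.3,
         79.1, 66.0, 73.4, 76.9, 93.1, 63.0, 68.2, 60.3]

LB_PER_KG = 2.2
n = len(before)

# Weight change per subject, in kilograms (after minus before).
changes_kg = [a - b for a, b in zip(after, before)]
mean_kg = mean(changes_kg)
sd_kg = stdev(changes_kg)          # sample standard deviation
se_kg = sd_kg / sqrt(n)

# Multiplying by a constant scales the mean, SD, and SE by that constant.
mean_lb = mean_kg * LB_PER_KG
se_lb = se_kg * LB_PER_KG

# One-sample t statistic for H0: mean gain = 16 lb.
t_stat = (mean_lb - 16) / se_lb
df = n - 1

print(f"mean = {mean_kg:.3f} kg ({mean_lb:.2f} lb), sd = {sd_kg:.3f} kg")
print(f"t = {t_stat:.2f} with df = {df}")
```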
7.109 Food intake and NEAT. Nonexercise activity thermogenesis (NEAT) provides a partial explanation for the results you found in the previous analysis. NEAT is energy burned by fidgeting, maintenance of posture, spontaneous muscle contraction, and other activities of daily living. In the study of the previous exercise, the 16 subjects increased their NEAT by 328 calories per day, on average, in response to the additional food intake. The standard deviation was 256.
(a) Test the null hypothesis that there was no change in NEAT versus the two-sided alternative. Summarize the results of the test and give your conclusion.
(b) Find a 95% confidence interval for the change in NEAT. Discuss the additional information provided by the confidence interval that is not evident from the results of the significance test.
7.110 Alcohol consumption and body composition. Individuals who consume large amounts of alcohol do not use the calories from this source as efficiently as calories from other sources. One study examined the effects of moderate alcohol consumption on body composition and the intake of other foods. Fourteen subjects participated in a crossover design where they either drank wine for the first six weeks and then abstained for the next six weeks or vice versa.57 During the period when they drank wine, the subjects, on average, lost 0.4 kilogram (kg) of body weight; when they did not drink wine, they lost an average of 1.1 kg. The standard deviation of the difference between the weight lost under these two conditions is 8.6 kg. During the wine period, they consumed an average of 2589 calories; with no wine, the mean consumption was 2575. The standard deviation of the difference was 210.
(a) Compute the differences in means and the standard errors for comparing body weight and caloric intake under the two experimental conditions.
(b) A report of the study indicated that there were no significant differences in these two outcome measures. Verify this result for each measure, giving the test statistic, degrees of freedom, and the P-value.
(c) One concern with studies such as this, with a small number of subjects, is that there may not be sufficient power to detect differences that are potentially important. Address this question by computing 95% confidence intervals for the two measures and discuss the information provided by the intervals.
(d) Here are some other characteristics of the study. The study periods lasted six weeks. All subjects were males between the ages of 21 and 50 years who weighed between 68 and 91 kg. They were all from the same city. During the wine period, subjects were told to consume two 135-milliliter (ml) servings of red wine per day and no other alcohol. The entire six-week supply was given to each subject at the beginning of the period. During the other period, subjects were instructed to refrain from any use of alcohol. All subjects reported that they complied with these instructions except for three subjects, who said that they drank no more than three to four 12-ounce bottles of beer during the no-alcohol period. Discuss how these factors could influence the interpretation of the results.
7.111 Do women perform better in school?
Some research suggests that women perform better than men in
school, but men score higher on standardized tests.
Table 1.2
(page 24)
presents data on a measure of school performance, grade point
average (GPA), and a standardized test, IQ, for 78 seventh-grade
students. Do these data lend further support to the previously
found gender differences? Give graphical displays of the data
and describe the distributions. Use significance tests and
confidence intervals to examine this question, and prepare a
short report summarizing your findings.
7.112 Self-concept and school performance.
Refer to the previous exercise. Although self-concept in this
study was measured on a scale with values in the data set
ranging from 20 to 80, many prefer to think of this kind of
variable as having only two possible values: low self-concept or
high self-concept. Find the median of the self-concept scores in
Table 1.2,
and define those students with scores at or below the median to
be low-self-concept students and those with scores above the
median to be high-self-concept students. Do high-self-concept
students have GPAs that differ from those of low-self-concept
students? What about IQ? Prepare a report addressing these
questions. Be sure to include graphical and numerical summaries
and confidence intervals, and state clearly the details of
significance tests.