In Section 9.1, we discussed the use of the chi-square test to compare categorical-variable distributions of c populations. We now consider a slight variation on this scenario where we compare a sample from one population with a hypothesized distribution. Here is an example that illustrates the basic ideas.
The ACT study was designed to examine relationships among bone growth patterns, bone development, and calcium intake. Participants were more than 14,000 adolescents from six states: Arizona (AZ), California (CA), Hawaii (HI), Indiana (IN), Nevada (NV), and Ohio (OH). After the major goals of the study were completed, the investigators decided to do an additional analysis of the written comments made by the participants during the study.
Because the number of participants was so large, a sampling plan was devised to select sheets containing the written comments of approximately 10% of the participants. A systematic sample (see page 191) of every 10th comment sheet was retrieved from each storage container for analysis.8 Here are the counts for each of the six states:
Number of study participants in the sample | ||||||
---|---|---|---|---|---|---|
AZ | CA | HI | IN | NV | OH | Total |
167 | 257 | 257 | 297 | 107 | 482 | 1567 |
There were 1567 study participants in this sample. Did it result in a representative sample from the collection of all participants? One way to answer this is to use the proportions of students from each of the states in the original sample of more than 14,000 participants as the population values.9 Here are the proportions:
Population proportions | ||||||
---|---|---|---|---|---|---|
AZ | CA | HI | IN | NV | OH | Total |
0.105 | 0.172 | 0.164 | 0.188 | 0.070 | 0.301 | 1.000 |
Let’s see how well our sample reflects the state population proportions. We start by computing expected counts. Because 10.5% of the population is from Arizona, we expect the sample to have about 10.5% from Arizona. Therefore, because the sample has 1567 subjects, our expected count for Arizona is

$$\text{expected count for Arizona} = 0.105 \times 1567 = 164.535$$
Here are the expected counts for all six states:
Expected counts | ||||||
---|---|---|---|---|---|---|
AZ | CA | HI | IN | NV | OH | Total |
164.54 | 269.52 | 256.99 | 294.60 | 109.69 | 471.67 | 1567.01 |
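The expected counts in the table can be reproduced with a few lines of code. The following is a minimal Python sketch, not output from the packages used in this chapter; the variable names are illustrative, and exact decimal arithmetic is used so that rounding to two places behaves like hand calculation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sample size and hypothesized state proportions from Example 9.14.
n = 1567
proportions = {
    "AZ": Decimal("0.105"),
    "CA": Decimal("0.172"),
    "HI": Decimal("0.164"),
    "IN": Decimal("0.188"),
    "NV": Decimal("0.070"),
    "OH": Decimal("0.301"),
}

# Expected count for each state = n * hypothesized proportion.
expected = {state: n * p for state, p in proportions.items()}

# Round each expected count to two decimal places (half up),
# matching the two-decimal values shown in the table.
rounded = {
    state: e.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for state, e in expected.items()
}

for state in proportions:
    print(state, expected[state], rounded[state])

print("sum of unrounded counts:", sum(expected.values()))
print("sum of rounded counts:  ", sum(rounded.values()))
```

Comparing the two sums printed at the end is one way to see the effect of rounding each cell separately.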
9.15 Why is the sum 1567.01? Refer to the table of expected counts in Example 9.14. Explain why the sum of the expected counts is 1567.01 and not 1567.
9.16 Calculate the expected counts. Refer to Example 9.14. Find the expected counts for the other five states. Report your results with three places after the decimal, as we did for Arizona. When you sum using three decimal places, does the rounding error go away?
As we saw with the expected counts in the analysis of two-way tables in Section 9.1, we do not really expect the observed counts to be exactly equal to the expected counts. Different samples under the same conditions would give different counts. We expect the average of these counts to be equal to the expected counts when the null hypothesis is true. How close do we think the counts and the expected counts should be?
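The idea that observed counts vary from sample to sample but average out to the expected counts can be illustrated by simulation. This is a sketch under stated assumptions: it draws simple random samples with replacement from the hypothesized proportions (not the systematic sampling used in the study), and the number of replications and the seed are arbitrary choices.

```python
import random
from collections import Counter

states = ["AZ", "CA", "HI", "IN", "NV", "OH"]
proportions = [0.105, 0.172, 0.164, 0.188, 0.070, 0.301]
n = 1567             # sample size from Example 9.14
replications = 500   # arbitrary number of simulated samples (assumption)

rng = random.Random(2024)  # fixed seed so the run is repeatable
totals = Counter()
for _ in range(replications):
    # One simulated sample of n subjects when the null hypothesis is true.
    sample = rng.choices(states, weights=proportions, k=n)
    totals.update(sample)

# Average count per state across all simulated samples.
average_counts = {s: totals[s] / replications for s in states}
expected_counts = {s: n * p for s, p in zip(states, proportions)}

for s in states:
    print(f"{s}: average {average_counts[s]:.2f}, expected {expected_counts[s]:.2f}")
```

Individual simulated samples differ from one another, while the averages settle near the expected counts.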
We can think of our table of observed counts in Example 9.14 as a one-way table with six cells, each with a count of the number of subjects sampled from a particular state. Our question of interest is translated into a null hypothesis that says that the observed counts of students in the six states can be viewed as a random sample from the subjects in the ACT study. The alternative hypothesis is that the process generating the observed counts, a form of systematic sampling in this case, does not provide samples that are compatible with this hypothesis. In other words, the alternative hypothesis says that there is some bias in the way we selected the subjects whose comments we will examine.
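Our analysis of these data is very similar to the analyses of two-way tables that we studied in Section 9.1. We have already computed the expected counts. We now construct a chi-square statistic that measures how far the observed counts are from the expected counts. Here is a summary of the procedure.

The chi-square goodness-of-fit test. Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts in k cells. The null hypothesis specifies probabilities $p_1, p_2, \ldots, p_k$ for the possible outcomes; the alternative hypothesis says that the true probabilities differ from these. The expected count for cell i is $n p_i$, and the chi-square statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The degrees of freedom are $k - 1$, and P-values are computed from the chi-square distribution. Use this procedure when the expected counts are all 5 or more.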
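For Arizona, the observed count is 167. In Example 9.14, we calculated the expected count, 164.535. The contribution to the chi-square statistic for Arizona is

$$\frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(167 - 164.535)^2}{164.535} = 0.0369$$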
We use the same approach to find the contributions to the chi-square statistic for the other five states. The expected counts are all at least 5, so we can proceed with the significance test.
The sum of these six values is the chi-square statistic,

$$X^2 = 0.93$$

The degrees of freedom are the number of cells minus 1,

$$\text{df} = 6 - 1 = 5$$
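The whole calculation can be sketched in Python using only the standard library. The helper `chi2_sf` is a hypothetical name introduced here; it evaluates the chi-square upper-tail probability from closed-form expressions for integer degrees of freedom and is not a routine from Minitab, SPSS, or JMP.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series with terms
    # x^(j - 1/2) / (1 * 3 * ... * (2j - 1)), j = 1 .. (df - 1)/2.
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

# Observed counts and hypothesized proportions from Example 9.14.
observed = [167, 257, 257, 297, 107, 482]
proportions = [0.105, 0.172, 0.164, 0.188, 0.070, 0.301]

n = sum(observed)
expected = [n * p for p in proportions]

# Each cell contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(x2, df)

print("contributions:", [round(c, 4) for c in contributions])
print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.3f}")
```

With these inputs the statistic is about 0.93 on 5 degrees of freedom, and the P-value is about 0.968.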
9.17 Compute the chi-square statistic. For each of the other five states, compute the contribution to the chi-square statistic using the method illustrated for Arizona in Example 9.15. (You can use the expected counts that you found in Check-in question 9.16 for these calculations.) Show that the sum of these values is the chi-square statistic.
Software output from Minitab, SPSS, and JMP for this problem is given in Figure 9.10. Minitab and SPSS report the P-value as 0.968. JMP gives an additional place after the decimal, 0.9679. Note that the SPSS output includes a column titled “Residual.” For tables of counts, a residual for a cell is defined as

$$\text{residual} = \frac{\text{observed count} - \text{expected count}}{\sqrt{\text{expected count}}}$$
The chi-square statistic is the sum of the squares of these residuals. Note that the residual reported by SPSS is the numerator of this ratio.
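The relationship between residuals and the chi-square statistic can be checked numerically. In this short Python sketch the dictionary names are illustrative; `raw` holds observed minus expected (the numerator described above), and `residuals` holds that difference divided by the square root of the expected count.

```python
import math

observed = {"AZ": 167, "CA": 257, "HI": 257, "IN": 297, "NV": 107, "OH": 482}
proportions = {"AZ": 0.105, "CA": 0.172, "HI": 0.164,
               "IN": 0.188, "NV": 0.070, "OH": 0.301}
n = sum(observed.values())

raw = {}        # observed - expected
residuals = {}  # (observed - expected) / sqrt(expected)
for state, count in observed.items():
    expected = n * proportions[state]
    raw[state] = count - expected
    residuals[state] = raw[state] / math.sqrt(expected)

# Squaring and summing the residuals gives the chi-square statistic.
x2 = sum(r ** 2 for r in residuals.values())

for state in observed:
    print(f"{state}: raw {raw[state]:8.3f}  residual {residuals[state]:7.3f}")
print(f"sum of squared residuals = {x2:.2f}")
```

The sign of each residual shows whether a state is over- or under-represented relative to its expected count, information that is lost once the residuals are squared.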
Figure 9.10 Minitab, SPSS, and JMP outputs, Example 9.16.
Some software packages do not provide routines for computing the chi-square goodness-of-fit test. However, a very simple trick can be used to produce the results from software that can analyze two-way tables. Make a two-way table in which the first column contains k cells with the observed counts. Add a second column with counts that correspond exactly to the probabilities specified by the null hypothesis, with a very large number of observations. Then perform the chi-square significance test for two-way tables.
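The trick can be checked with a small experiment. This Python sketch assumes a hand-written two-way chi-square function (`two_way_chi_square`, a hypothetical helper, not a library routine) and an arbitrary choice of one billion for the "very large number of observations" in the second column.

```python
def two_way_chi_square(table):
    """Pearson chi-square statistic and df for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count = row total * column total / grand total.
            exp = row_totals[i] * col_totals[j] / grand_total
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

observed = [167, 257, 257, 297, 107, 482]
proportions = [0.105, 0.172, 0.164, 0.188, 0.070, 0.301]

# Second column: counts that match the null probabilities exactly,
# based on a very large number of observations (arbitrary choice).
big_n = 10 ** 9
table = [[o, big_n * p] for o, p in zip(observed, proportions)]
x2_two_way, df = two_way_chi_square(table)

# Direct goodness-of-fit statistic for comparison.
n = sum(observed)
x2_gof = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))

print(f"two-way statistic:         {x2_two_way:.5f} (df = {df})")
print(f"goodness-of-fit statistic: {x2_gof:.5f} (df = {len(observed) - 1})")
```

Because the second column is so large, the pooled row proportions are essentially the null probabilities and the second column contributes almost nothing to the statistic. The degrees of freedom also agree: (k − 1)(2 − 1) = k − 1.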
9.18 Distribution of M&M colors. M&M Mars Company has varied the mix of colors for M&M’s Plain Chocolate Candies over the years. These changes in color blends are made based on the results of consumer preference tests. Recently, the color distribution was reported to be 13% brown, 14% yellow, 13% red, 20% orange, 24% blue, and 16% green.10 You open up a 14-ounce bag of M&M’s and find 61 brown, 59 yellow, 49 red, 77 orange, 141 blue, and 88 green. Use a goodness-of-fit test to examine how well this bag fits the percents stated by M&M Mars Company.
A study of the effect of the full moon on aggressive behaviors of dementia patients included 15 patients, 14 of whom exhibited a greater number of aggressive behaviors on full moon days than on other days. The sign test (page 401) tests the null hypothesis that patients are equally likely to exhibit more aggressive behaviors on full moon days and on other days.
Because
To look at these data from the viewpoint of goodness of fit, we think of the data as two counts: patients who had a greater number of aggressive behaviors on full moon days and patients who had a greater number of aggressive behaviors on other days.
Counts | ||
---|---|---|
Moon | Other | Total |
14 | 1 | 15 |
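If the two outcomes are equally likely, the expected counts are both $15 \times 0.5 = 7.5$. The test statistic is

$$X^2 = \frac{(14 - 7.5)^2}{7.5} + \frac{(1 - 7.5)^2}{7.5} = 5.633 + 5.633 = 11.27$$

We have $\text{df} = k - 1 = 2 - 1 = 1$. From Table F, $P < 0.001$.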
The sign test can test the null hypothesis versus the one-sided alternative that there was a “moon effect.” Within the framework of the goodness-of-fit test, we test only the general alternative hypothesis that the distribution of the counts does not follow the specified probabilities. Note that the P-value for the sign test versus the one-sided alternative is 0.000488, approximately one-half of the value that we reported from Table F in Example 9.17.
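The two approaches to the full moon data can be compared side by side. The sketch below is illustrative Python, not textbook software output; it uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard Normal variable, so its upper-tail probability can be written with the complementary error function.

```python
import math

# Patients with more aggressive behaviors on full moon days vs. other days.
observed = [14, 1]
n = sum(observed)

# Equally likely outcomes under the null hypothesis.
expected = [n * 0.5, n * 0.5]

# Goodness-of-fit statistic with k - 1 = 1 degree of freedom.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1, P(chi-square >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
p_chi_square = math.erfc(math.sqrt(x2 / 2))

# One-sided sign test: P(X >= 14) when X is binomial with n = 15, p = 0.5.
p_sign = sum(math.comb(n, k) for k in range(14, n + 1)) / 2 ** n

print(f"X^2 = {x2:.2f}")
print(f"goodness-of-fit P-value (general alternative): {p_chi_square:.6f}")
print(f"sign test P-value (one-sided alternative):     {p_sign:.6f}")
```

The sign test value is exact, whereas the chi-square value relies on a large-sample approximation, so the two agree only roughly after allowing for the one-sided versus general alternative.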
The chi-square goodness-of-fit test is used to compare the sample distribution of a categorical variable from a population with a hypothesized distribution. The data for n observations with k possible outcomes are summarized as observed counts, $n_1, n_2, \ldots, n_k$, in k cells. The null hypothesis specifies probabilities $p_1, p_2, \ldots, p_k$ for the possible outcomes.
The analysis of these data is similar to the analyses of two-way tables discussed in Section 9.1. For each cell, the expected count is determined by multiplying the total number of observations n by the specified probability $p_i$.
9.14 What’s wrong? Each of the following statements contains an error. Describe each error and explain why the statement is wrong.
A goodness-of-fit test can be used to compare the observed distribution of a categorical variable with a distribution specified by an alternative hypothesis.
The residuals for a chi-square goodness-of-fit test are all positive.
The expected counts for a goodness-of-fit test are computed by multiplying the sample size by the sample proportion.
9.15 Is the coin fair? In Example 4.3 (page 207), we learned that the South African statistician John Kerrich tossed a coin 10,000 times while imprisoned by the Germans during World War II. The coin came up heads 5067 times.
Formulate the question about whether or not the coin was fair as a goodness-of-fit hypothesis.
Compute the expected counts and explain what they tell us.
9.16 Goodness of fit to a standard Normal distribution. Computer software generated 300 random numbers that should look as if they are from the standard Normal distribution. They are categorized into five groups: (1) less than or equal to
9.17 Test the hypothesis that the coin is fair. Refer to Exercise 9.15. Find the chi-square statistic and the P-value.
9.18 More on the goodness of fit to a standard Normal distribution. Refer to Exercise 9.16. Use software to generate a sample of 300 Normal random variables with mean 10 and standard deviation 5. Choose a set of intervals and perform the goodness-of-fit test.
9.19 Interpret the results of the coin tossing analysis. Refer to Exercises 9.15 and 9.17. Write a short summary of your analysis of John Kerrich’s coin tossing, including the results of the chi-square test.
9.20 Goodness of fit to a Poisson distribution. Refer to Example 5.30 (page 316), where a Poisson distribution is described as a model for the number of Wi-Fi slowdowns per day. The mean number of slowdowns is 3.7. In this setting, the probability for 0, 1, or 2 slowdowns is 0.28543, the probability for 3, 4, or 5 slowdowns is 0.54466, and the probability for 6 or more slowdowns is 0.16991. Suppose that you record the number of slowdowns for the next 100 days. Your observed counts of slowdowns are 27 for 0, 1, or 2 slowdowns, 56 for 3, 4, or 5 slowdowns, and 17 for 6 or more slowdowns. Use these data to test the hypothesis that slowdowns are distributed according to this Poisson distribution.
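The three probabilities quoted in this exercise can be reproduced from the Poisson probability function with mean 3.7. The sketch below is illustrative Python with a hypothetical helper `poisson_pmf`; it groups the outcomes as 0 to 2, 3 to 5, and 6 or more, which is the grouping whose probabilities match the quoted values. It stops short of carrying out the test itself.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean = 3.7  # mean number of slowdowns per day

# Hypothesized probabilities for the three groups of outcomes.
p_low = sum(poisson_pmf(k, mean) for k in range(0, 3))   # 0, 1, or 2
p_mid = sum(poisson_pmf(k, mean) for k in range(3, 6))   # 3, 4, or 5
p_high = 1 - p_low - p_mid                               # 6 or more

print(f"P(0 to 2)    = {p_low:.5f}")
print(f"P(3 to 5)    = {p_mid:.5f}")
print(f"P(6 or more) = {p_high:.5f}")
```

Multiplying each probability by the number of days observed gives the expected counts needed for the goodness-of-fit statistic.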
9.21 More on the goodness of fit to a Poisson distribution. Refer to the previous exercise. Repeat the analysis using 41, 35, and 24 as the observed counts. What do you conclude?