Chapter 9 EXERCISES

  1. 9.22 Translate each problem into an r×c table. For each of the following scenarios, restate the problem so that it can be analyzed using an r×c table. Give the values of r and c, the table, and its entries.

    1. A sample of undergraduate students were asked whether or not they were in favor of adding a civics requirement to the core curriculum. For the first-year students, 108 said Yes and 250 said No. For the fourth-year students, 131 said Yes and 111 said No.

    2. Four website designs are being compared. Forty-eight students have agreed to be subjects for the study, and they are each randomly assigned to watch one of the designs for as long as they like. For each student, the study directors record whether or not the website is watched for more than a minute. For the first design, 10 students watched for more than a minute; for the second, 6 watched for more than a minute; for the third, 11 students watched for more than a minute; and for the fourth, 2 students watched for more than a minute.

  2. 9.23 Sexual harassment online or in person. In the study described in Exercise 9.11, the students were also asked whether or not they were harassed in person and whether or not they were harassed online. Here are the data for the girls (data set harasg):

                          Harassed online
    Harassed in person    Yes     No
    Yes                   321    200
    No                     40    441
    1. Analyze these data using the method presented in Chapter 8 for comparing two proportions (page 474).

    2. Analyze these data using the method presented in this chapter for examining a relationship between two categorical variables in a 2×2 table.

    3. Use this example to explain the relationship between the chi-square test and the z test for comparing two proportions.

    4. The number of girls reported in this exercise is not the same as the number reported for Exercise 9.11. Suggest a possible reason for this difference.
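
One way to explore the connection asked about in part 3 is to compute both statistics from the same four counts. The sketch below is an illustration, not the text's worked solution: it assumes that "harassed in person" is treated as the explanatory variable (Exercise 9.25 asks you to examine that choice), it uses the pooled standard error for the z test, and the function names are invented for this example.

```python
import math

# Girls' counts from Exercise 9.23.
# Rows: harassed in person (Yes, No); columns: harassed online (Yes, No).
table = [[321, 200],
         [40, 441]]

def two_proportion_z(table):
    """Pooled z statistic comparing the proportion 'online = Yes' in the two rows."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2

z = two_proportion_z(table)
x2 = pearson_chi_square(table)

# Two-sided normal P-value; for a 2x2 table this is also the
# chi-square P-value with 1 degree of freedom.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, X^2 = {x2:.3f}, P = {p_value:.3g}")
```

For any 2×2 table, the squared pooled z statistic and X² agree up to rounding, so the printed values of z² and X² should match.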

  3. 9.24 Data for the boys. Refer to the previous exercise. Here are the corresponding data for boys (data set harasb):

                          Harassed online
    Harassed in person    Yes     No
    Yes                   183    154
    No                     48    578

    Using these data, repeat the analyses that you performed for the girls in Exercise 9.23. How do the results for the boys differ from those that you found for girls?

  4. 9.25 Repeat your analysis. In part (a) of Exercise 9.23, you had to decide which variable was the explanatory variable and which variable was the response variable when you computed the proportions to be compared. (Data set: harasb.)

    1. Did you use harassed online or harassed in person as the explanatory variable? Explain the reasons for your choice.

    2. Repeat the analysis that you performed in Exercise 9.23 with the other choice for the explanatory variable.

    3. Summarize what you have learned from comparing the results of using the different choices for analyzing these data.

  5. 9.26 Is there a random distribution of trees? In Example 6.1 (page 329), we examined data concerning the longleaf pine trees in the Wade Tract and concluded that the distribution of trees in the tract was not random. Here is another way to examine the same question. First, we divide the tract into four equal parts, or quadrants, in the east–west direction. Call the four parts Q1 through Q4. Then we take a random sample of 100 trees and count the number of trees in each quadrant. Here are the data (data set treeq):

    Quadrant Q1 Q2 Q3 Q4
    Count 18 22 39 21
    1. If the trees are randomly distributed, we expect to find 25 trees in each quadrant. Why? Explain your answer.

    2. We do not really expect to get exactly 25 trees in each quadrant. Why? Explain your answer.

    3. Perform the goodness-of-fit test for these data to determine if these trees are randomly scattered. Write a short report giving the details of your analysis and your conclusion.
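
The arithmetic behind the goodness-of-fit test in part 3 can be sketched as follows. This is a minimal illustration rather than a full report: it assumes equal expected counts under the "randomly distributed" hypothesis, and it obtains the P-value from a closed form for the chi-square upper tail that holds only for 3 degrees of freedom (software or Table F would normally be used instead).

```python
import math

# Observed tree counts in quadrants Q1..Q4, from Exercise 9.26.
observed = [18, 22, 39, 21]

n = sum(observed)
k = len(observed)
expected = [n / k] * k          # equal counts if trees are scattered at random

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# Upper-tail probability of a chi-square distribution with exactly 3 df:
#   P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, P = {p_value:.4f}")
```

The same pattern applies to the die in Exercise 9.27, with six categories, 500 tosses, and a tail formula appropriate for 5 degrees of freedom.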

  6. 9.27 Is the die fair? You suspect that a die has been altered so that the outcomes of a roll, the numbers 1 to 6, are not equally likely. You toss the die 500 times and obtain the following results (data set die):

    Outcome 1 2 3 4 5 6
    Count 69 84 99 78 98 72

    Compute the expected counts that you would need to use in a goodness-of-fit test for these data.

  7. 9.28 Perform the significance test. Refer to the previous exercise. Find the chi-square test statistic and its P-value, and write a short summary of your conclusions.

  8. 9.29 DFW rates. One measure of student success for colleges and universities is the percent of admitted students who graduate. Studies indicate that a key issue in retaining students is their performance in so-called gateway courses. These are courses that serve as prerequisites for other key courses that are essential for student success. One measure of student performance in these courses is the DFW rate, the percent of students who receive grades of D, F, or W (withdraw). A major project was undertaken to improve the DFW rate in a gateway course at a large midwestern university. The course curriculum was revised to make it more relevant to the majors of the students taking the course, a small group of excellent teachers taught the course, technology (including clickers and online homework) was introduced, and student support outside of the classroom was increased. The following table gives data on the DFW rates for the course over three years.¹¹ In Year 1, the traditional course was given; in Year 2, a few changes were introduced; and in Year 3, the course was substantially revised.

    Year      DFW rate    Number of students taking course
    Year 1       42.3%                                2408
    Year 2       24.9%                                2325
    Year 3       19.9%                                2126

    Do you think that the changes in this gateway course had an impact on the DFW rate? Write a report giving your answer to this question. Support your answer with an analysis of the data.

  9. 9.30 Lying to a teacher. One of the questions in a survey of high school students asked about lying to teachers.¹² The following table gives the numbers of students who said that they had lied to a teacher at least once during the past year, classified by sex. (Data set: lie.)

                          Sex
    Lied at least once    Male     Female
    Yes                   3,228    10,295
    No                    9,659     4,620
    1. Add the marginal totals to the table.

    2. Calculate appropriate percents to describe the results of this question.

    3. Summarize your findings in a short paragraph.

    4. Test the null hypothesis that there is no association between sex and lying to teachers. Give the test statistic and the P-value (with a sketch similar to the one on page 494) and summarize your conclusion. Be sure to include numerical and graphical summaries.

    5. The survey asked students if they lied, but we do not know if they answered the question truthfully. How does this fact affect the conclusions that you can draw from these data?

  10. 9.31 When do Canadian students enter private career colleges? A survey of 13,364 Canadian students who enrolled in private career colleges was conducted to understand student participation in the private postsecondary educational system.¹³ In one part of the survey, students were asked about their field of study and about when they entered college. Here are the results (data set canf):

                                            Time of entry
    Field of study    Number of students    Right after high school    Later
    Trades                           942                        34%      66%
    Design                           584                        47%      53%
    Health                          5085                        40%      60%
    Media/IT                        3148                        31%      69%
    Service                         1350                        36%      64%
    Other                           2255                        52%      48%

    In this table, the second column gives the number of students in each field of study. The next two columns give the conditional distribution of time of entry for each field of study.

    1. Use the data provided to make the 6×2 table of counts for this problem.

    2. Analyze the data.

    3. Write a summary of your conclusions. Be sure to include the results of your significance testing as well as a graphical summary.
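
Part 1 requires turning each row's sample size and percent into two counts. A minimal sketch of that conversion, followed by the chi-square computation for the resulting 6×2 table, is given below. The rounding rule (nearest whole student) is an assumption, since the published percents are themselves rounded, and the helper names `chi_square` and `chi2_upper_tail` are invented for this illustration.

```python
import math

# (number of students, percent entering right after high school), Exercise 9.31.
fields = {
    "Trades": (942, 34),
    "Design": (584, 47),
    "Health": (5085, 40),
    "Media/IT": (3148, 31),
    "Service": (1350, 36),
    "Other": (2255, 52),
}

# Assumption: round n * percent to the nearest whole student;
# everyone else in that field entered later.
table = []
for n, pct in fields.values():
    right_after = round(n * pct / 100)
    table.append([right_after, n - right_after])

def chi_square(table):
    """Return (X^2, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2, (len(row_totals) - 1) * (len(col_totals) - 1)

def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x), using only exp, erfc, and gamma."""
    h = x / 2
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total

x2, df = chi_square(table)
p_value = chi2_upper_tail(x2, df)

for name, row in zip(fields, table):
    print(f"{name:<10}{row[0]:>6}{row[1]:>6}")
print(f"X^2 = {x2:.1f}, df = {df}, P = {p_value:.3g}")
```

The same conversion applies to the loan percents in Exercise 9.32, where the second column would hold the students who did not use government loans.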

  11. 9.32 Government loans for Canadian students in private career colleges. Refer to the previous exercise. The survey also asked about how these college students paid for their education. A major source of funding was government loans. Here are the survey percents of Canadian private students who used government loans to finance their education, by field of study (data set cangov):

    Field of study    Number of students    Percent using government loans
    Trades                           942                               45%
    Design                           599                               53%
    Health                          5234                               55%
    Media/IT                        3238                               55%
    Service                         1378                               60%
    Other                           2300                               47%
    1. Construct the 6×2 table of counts for this exercise.

    2. Test the null hypothesis that the percent of students using government loans to finance their education does not vary with field of study. Be sure to provide all the details of your significance test.

    3. Summarize your analysis and conclusions. Be sure to include a graphical summary.

    4. The number of students reported in this exercise is not the same as the number reported in Exercise 9.31. Suggest a possible reason for this difference.

  12. 9.33 Are Mexican Americans less likely to be selected as jurors? Refer to Exercise 8.74 (page 485) concerning Castaneda v. Partida, the case where the Supreme Court review used the phrase “two or three standard deviations” as a criterion for statistical significance. Recall that there were 181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870 people selected for jury duty, 339 were Mexican Americans. We are interested in finding out if there is an association between being Mexican American and being selected as a juror. Formulate this problem using a two-way table of counts. Construct the 2×2 table using the variables Mexican American or not and juror or not. Find the X² statistic and its P-value. Square the z statistic that you obtained in Exercise 8.74 and verify that the result is equal to the X² statistic.

  13. 9.34 Goodness-of-fit to the uniform distribution. Computer software generated 500 random numbers that should look as if they are from the uniform distribution on the interval 0 to 1 (see page 229). They are categorized into five groups: (1) less than or equal to 0.2, (2) greater than 0.2 and less than or equal to 0.4, (3) greater than 0.4 and less than or equal to 0.6, (4) greater than 0.6 and less than or equal to 0.8, and (5) greater than 0.8. The counts in the five groups are 114, 92, 108, 101, and 85, respectively. The probabilities for these five intervals are all the same. What is this probability? Compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results.

  14. 9.35 More on goodness-of-fit to the uniform distribution. Refer to the previous exercise. Use software to generate your own sample of 500 uniform random variables on the interval from 0 to 1 and perform the goodness-of-fit test. Choose a different set of intervals than the ones used in the previous exercise.
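
One possible setup for this exercise is sketched below. Everything specific in it is an assumption made for illustration: the seed, the use of Python's `random` module as the "software," and the three unequal intervals (0, 0.1], (0.1, 0.5], and (0.5, 1]. Unequal intervals were chosen to show that each expected count is the sample size times the interval's width. With three categories there are 2 degrees of freedom, for which the chi-square upper tail is exactly exp(−x/2).

```python
import math
import random
from bisect import bisect_left

rng = random.Random(2024)      # arbitrary fixed seed so the run is repeatable
sample = [rng.random() for _ in range(500)]

# Assumed intervals, different from Exercise 9.34:
# (0, 0.1], (0.1, 0.5], (0.5, 1]
inner_cuts = [0.1, 0.5]
widths = [0.1, 0.4, 0.5]       # interval widths = probabilities under uniformity

# bisect_left places a value equal to a cut point in the lower interval,
# matching "less than or equal to" in the interval definitions.
observed = [0] * len(widths)
for u in sample:
    observed[bisect_left(inner_cuts, u)] += 1

expected = [len(sample) * w for w in widths]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(widths) - 1

p_value = math.exp(-x2 / 2)    # exact upper tail for chi-square with 2 df

print("observed:", observed)
print("expected:", expected)
print(f"X^2 = {x2:.3f}, df = {df}, P = {p_value:.3f}")
```

Changing the seed gives a new sample and a new P-value, which is relevant background for Exercise 9.36.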

  15. 9.36 Suspicious results? An instructor who assigned an exercise similar to the one described in the previous exercise received homework from a student who reported a P-value of 0.999. The instructor suspected that the student did not use the computer for the assignment but just made up some numbers for the homework. Why was the instructor suspicious? How would this scenario change if there were 2000 students in the class?

  16. 9.37 McNemar’s test. In Exercise 9.23 (page 511), you examined the relationship between being harassed online and being harassed in person for a sample of 1002 girls. An additional question can be asked about these data. Suppose we wanted to compare the proportion of girls who were harassed online with the proportion who were harassed in person. This is very much like the type of question that we studied in Section 8.2 (page 468). In that case, however, we used the assumption that the two samples used to calculate the proportions were independent. This assumption is not valid for our harassment data because the proportions are calculated from data provided by the same girls. McNemar’s test is the recommended procedure. The null hypothesis is that the two population proportions are equal, and the alternative is two-sided. The test examines the counts in the cells where the two responses do not agree. In our case, these are 200 and 40. Note that if these two counts are equal, then the proportions will be equal for any possible values of counts in the other two cells. McNemar’s test is equivalent to the goodness-of-fit test that we examined in Example 9.17. Find the sample proportions, report the results of the significance test, and write a short summary of your conclusions.
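
The goodness-of-fit formulation described above can be sketched as follows. This is a minimal illustration under stated assumptions: under the null hypothesis each disagreeing girl is equally likely to fall in either discordant cell, so both expected counts are half of 240, and no continuity correction is applied (some software applies one by default).

```python
import math

# Girls' counts from Exercise 9.23.
# Rows: harassed in person (Yes, No); columns: harassed online (Yes, No).
table = [[321, 200],
         [40, 441]]
n = sum(sum(row) for row in table)

# Sample proportions computed from the same 1002 girls.
p_in_person = sum(table[0]) / n
p_online = (table[0][0] + table[1][0]) / n

# Discordant cells: in person only, online only.
b, c = table[0][1], table[1][0]
expected = (b + c) / 2

# Goodness-of-fit statistic on the two discordant counts (1 degree of freedom).
x2 = (b - expected) ** 2 / expected + (c - expected) ** 2 / expected

# Chi-square upper tail with 1 df, written with the complementary error function.
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"in person: {p_in_person:.3f}, online: {p_online:.3f}")
print(f"X^2 = {x2:.2f}, df = 1, P = {p_value:.3g}")
```

Algebraically, this statistic reduces to (b − c)² / (b + c), the form in which McNemar's test is often stated.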

PUTTING IT ALL TOGETHER

  1. 9.38 Titanic! In 1912, the luxury liner Titanic, on its first voyage, struck an iceberg and sank. Some passengers got off the ship in lifeboats, but many died. Think of the Titanic disaster as an experiment in how the people of that time behaved when faced with death in a situation where only some can escape. The passengers are a sample from the population of their peers. Here is information about who lived and who died, by sex and economic status.¹⁴ (The data leave out a few passengers whose economic status is unknown.)

    Men
    Status     Died    Survived
    Highest     111          61
    Middle      150          22
    Lowest      419          85
    Total       680         168

    Women
    Status     Died    Survived
    Highest       6         126
    Middle       13          90
    Lowest      107         101
    Total       126         317
    1. Compare the percents of men and of women who died. Is there strong evidence that a higher proportion of men died in such situations? Why do you think this happened?

    2. Look only at the women. Describe how the three economic classes differ in the percent of women who died. Are these differences statistically significant?

    3. Now look only at the men and answer the same questions.
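
For parts 2 and 3, the percents and the chi-square statistic for each 3×2 table can be computed along the following lines. This is a sketch under stated assumptions: the function names are invented here, and the P-value uses exp(−X²/2), which is the exact chi-square upper tail only when there are 2 degrees of freedom, as there are for a 3×2 table.

```python
import math

# Counts from Exercise 9.38: each entry is [died, survived] for one economic status.
men = {"Highest": [111, 61], "Middle": [150, 22], "Lowest": [419, 85]}
women = {"Highest": [6, 126], "Middle": [13, 90], "Lowest": [107, 101]}

def percent_died(groups):
    """Percent who died within each economic status."""
    return {status: 100 * died / (died + survived)
            for status, (died, survived) in groups.items()}

def chi_square_3x2(groups):
    """Return (X^2, df, P-value) for a 3 x 2 table of counts."""
    table = list(groups.values())
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    p_value = math.exp(-x2 / 2)   # valid because df = 2 here
    return x2, df, p_value

for label, groups in (("Women", women), ("Men", men)):
    pct = percent_died(groups)
    x2, df, p_value = chi_square_3x2(groups)
    summary = ", ".join(f"{s} {v:.1f}%" for s, v in pct.items())
    print(f"{label}: {summary}; X^2 = {x2:.1f}, df = {df}, P = {p_value:.3g}")
```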

  2. 9.39 Health care fraud. Most errors in billing insurance providers for health care services involve honest mistakes by patients, physicians, or others involved in the health care system. However, fraud is a serious problem. The National Health Care Anti-Fraud Association estimates that tens of billions of dollars are lost to health care fraud each year.¹⁵ When fraud is suspected, an audit of randomly selected billings is often conducted. The selected claims are then reviewed by experts, and each claim is classified as allowed or not allowed. The distributions of the amounts of claims are frequently highly skewed, with a large number of small claims and a small number of large claims. Simple random sampling would likely be overwhelmed by small claims and would tend to miss the large claims, so stratification is often used. See the section on stratified sampling in Chapter 3 (page 184). Here are data from an audit that used three strata based on the sizes of the claims (small, medium, and large).¹⁶ (Data set: berrors.)

    Stratum    Sampled claims    Number not allowed
    Small                  59                     7
    Medium                 19                     6
    Large                   4                     2
    1. Construct the 3×2 table of counts for these data and include the marginal totals.

    2. Find the percent of claims that were not allowed in each of the three strata.

    3. State an appropriate null hypothesis to be tested for these data.

    4. Perform the significance test and report your test statistic with degrees of freedom and the P-value. State your conclusion.

    5. Is there a reason you should not trust the chi-square test for this setting? Explain your answer.
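
Part 5 turns on the sizes of the expected counts, which can be inspected with a short computation like the one below. The thresholds used (average expected count at least 5 and every expected count at least 1) are a commonly quoted guideline for tables larger than 2×2 and are written here as assumptions; check them against the guideline stated in this chapter.

```python
# (sampled claims, number not allowed) for each stratum, Exercise 9.39.
strata = {"Small": (59, 7), "Medium": (19, 6), "Large": (4, 2)}

# 3 x 2 table of counts with columns [not allowed, allowed].
table = [[not_allowed, sampled - not_allowed]
         for sampled, not_allowed in strata.values()]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell under the null hypothesis of no association.
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [e for row in expected for e in row]
min_expected = min(cells)
mean_expected = sum(cells) / len(cells)

# Assumed guideline thresholds (not taken from the exercise itself).
MIN_AVERAGE = 5.0
MIN_CELL = 1.0
trustworthy = mean_expected >= MIN_AVERAGE and min_expected >= MIN_CELL

for name, row in zip(strata, expected):
    print(f"{name:<8}{row[0]:>8.2f}{row[1]:>8.2f}")
print(f"smallest expected count = {min_expected:.2f}, "
      f"average = {mean_expected:.2f}, guideline met: {trustworthy}")
```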

  3. 9.40 Population estimates. Refer to the previous exercise. One reason to do an audit such as this is to estimate the number of claims that would not be allowed if all claims in a population were examined by experts. We have estimates of the proportions of such claims from each stratum based on our sample. With our simple random sampling of claims from each stratum, we have unbiased estimates of the corresponding population proportion for each stratum. Therefore, if we take the sample proportions and multiply by the population sizes, we would have the estimates that we need. Here are the population sizes for the three strata (data set berrors):

    Stratum    Claims in stratum
    Small                   3118
    Medium                   225
    Large                     41
    1. For each stratum, estimate the total number of claims that would not be allowed if all claims in the stratum had been audited.

    2. (Optional) Give margins of error for your estimates. (Hint: You first need to find standard errors for your sample estimates; see Chapter 8, page 452. Then you need to use the rules for variances given in Chapter 4, page 246, to find the standard errors for the population estimates. Finally, you need to multiply by z* to determine the margins of error.)
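
The steps in the hint can be sketched as follows. Several choices here are assumptions rather than part of the exercise: a 95% confidence level (z* = 1.96), the large-sample standard error for a sample proportion, and no finite population correction. With only 4 claims sampled in the Large stratum, the large-sample approximation is doubtful there.

```python
import math

# stratum: (sampled claims, number not allowed, claims in stratum)
strata = {
    "Small": (59, 7, 3118),
    "Medium": (19, 6, 225),
    "Large": (4, 2, 41),
}

Z_STAR = 1.96   # assumed: 95% confidence

results = {}
for name, (n, not_allowed, population) in strata.items():
    p_hat = not_allowed / n                        # sample proportion not allowed
    se_p = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat
    estimate = population * p_hat                  # estimated total not allowed
    se_total = population * se_p                   # multiplying by N scales the SE by N
    results[name] = (estimate, Z_STAR * se_total)

for name, (estimate, margin) in results.items():
    print(f"{name:<8} estimate = {estimate:7.1f}, margin of error = {margin:6.1f}")
```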