8.2 Comparing Two Proportions

When you complete this section, you will be able to:

Identify the counts and sample sizes for a comparison between two proportions, compute the sample proportions, and find their difference.
Use the large-sample method to find the confidence interval for a difference between two proportions and interpret the confidence interval.
Use the large-sample method to perform a significance test for comparing two proportions and interpret the results.
Find the sample size needed for a desired margin of error for the difference in proportions.
Find the sample size needed for a significance test comparing two proportions.

In studies about proportions, it is far more common to compare the proportions of two groups (such as subjects in a treatment group versus those in a control group). In the previous section, we learned the methods of inference for a single proportion. Our problem now concerns the comparison of two proportions.

Similar to our approach when comparing two means (page 410), we call the two groups being compared Population 1 and Population 2 and the two population proportions of “successes” p1 and p2. The data consist of two independent SRSs, of size n1 from Population 1 and size n2 from Population 2. The proportion of successes in each sample estimates the corresponding population proportion. Here is the notation we will use in this section:

Population	Population proportion	Sample size	Count of successes	Sample proportion
1	p1	n1	X1	p^1=X1/n1
2	p2	n2	X2	p^2=X2/n2

To compare the two populations, we use the difference between the two sample proportions:

D=p^1−p^2

When both sample sizes are sufficiently large, the sampling distribution of the difference D is approximately Normal.

EXAMPLE 8.11 Lost wallets.

If you lose your wallet, what is the chance that it will be returned to you? Does this chance depend on whether or not there is money in your wallet? The answer is somewhat surprising. A recent study collected data from 355 cities in 40 countries. Researchers designed experiments to examine these questions. For the United States, there were 300 wallets with no money and 300 wallets with money ($13.45).¹⁶ Here are the results:

Wallet condition	n	X	p^=X/n
Money	300	174	0.58
No money	300	111	0.37

In each row of this table, X is the number of wallets returned, and p^ is the sample proportion. The difference between the proportions is

D=p^1−p^2=0.58−0.37=0.21
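The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the function name `sample_proportion` are illustrative choices, not part of the study or of any statistical package.

```python
# Counts from Example 8.11: (sample size n, number of wallets returned X).
# The dictionary layout and names are illustrative assumptions.
wallets = {"Money": (300, 174), "No money": (300, 111)}


def sample_proportion(n, x):
    """Sample proportion p-hat = X / n."""
    return x / n


p_hat_1 = sample_proportion(*wallets["Money"])
p_hat_2 = sample_proportion(*wallets["No money"])
D = p_hat_1 - p_hat_2  # difference between the two sample proportions

print(f"p-hat 1 = {p_hat_1:.2f}, p-hat 2 = {p_hat_2:.2f}, D = {D:.2f}")
```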

Inference procedures for comparing proportions are z procedures based on the Normal approximation and on standardizing the difference D. The first step is to obtain the mean and standard deviation of D. By the addition rule for means, the mean of D is the difference of the means:

$$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$

That is, the difference D=p^1−p^2 between the sample proportions is an unbiased estimator of the population difference p1−p2. Similarly, the addition rule for variances tells us that the variance of D is the sum of the variances:

$$\sigma_D^2 = \sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2 = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$

Therefore, when n1 and n2 are large, D is approximately Normal with mean μD=p1−p2 and standard deviation

$$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
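As a quick numerical illustration of these two rules, the sketch below computes the mean and standard deviation of D for population values that are made up for this illustration (p1=0.5, n1=100, p2=0.4, n2=100). The function name is hypothetical.

```python
from math import sqrt


def mean_and_sd_of_difference(p1, n1, p2, n2):
    """Mean and standard deviation of D = p-hat1 - p-hat2 for independent samples."""
    mean = p1 - p2                                      # addition rule for means
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2  # variances add
    return mean, sqrt(variance)


# Hypothetical population values chosen only for illustration.
mean_d, sd_d = mean_and_sd_of_difference(p1=0.5, n1=100, p2=0.4, n2=100)
print(f"mean of D = {mean_d:.3f}, standard deviation of D = {sd_d:.3f}")
```

Here the variance is 0.25/100 + 0.24/100 = 0.0049, so the standard deviation is 0.07.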

Check-in

8.14 Rules for means and variances. Suppose that p1=0.3, n1=40, p2=0.6, n2=50. Find the mean and the standard deviation of the sampling distribution of p^1−p^2.
8.15 Effect of the sample sizes. Suppose that p1=0.3, n1=160, p2=0.6, n2=200.
1. Find the mean and the standard deviation of the sampling distribution of p^1−p^2.
2. The sample sizes here are four times as large as those in the previous Check-in question, but the population proportions are the same. Compare the results for this Check-in question with those that you found in the previous Check-in question. What is the effect of multiplying the sample sizes by 4?
8.16 Verify the formulas. It is quite easy to verify the formulas for the mean and standard deviation of the difference D.
1. What are the means and standard deviations of the two sample proportions p^1 and p^2?
2. Use the addition rule for means of random variables. What is the mean of D=p^1−p^2?
3. The two samples are independent. Use the addition rule for variances of random variables. What is the variance of D?

Large-sample confidence interval for a difference in proportions

To obtain a confidence interval for p1−p2, we once again replace the unknown parameters in the standard deviation with estimates to obtain an estimated standard deviation, or standard error. Here is the confidence interval we want.

Large-sample confidence interval for comparing two proportions

Choose an SRS of size n1 from a large population having proportion p1 of successes and an independent SRS of size n2 from another population having proportion p2 of successes. The estimate of the difference in the population proportions is

D=p^1−p^2

The standard error of D is

$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

and the margin of error of D for confidence level C is

m=z*SED

where the critical value z* is the value for the standard Normal density curve with area C between −z* and z*. A level C large-sample confidence interval for p1−p2 is

D±m

Use this method for 90%, 95%, or 99% confidence when the number of successes and the number of failures in each sample are both 10 or more.

Example 8.12 Confidence interval for lost wallets.

Let’s find a 95% confidence interval for the difference between the proportions of returned wallets with money and with no money. From Example 8.11 we know

Wallet condition	n	X	p^=X/n
Money	300	174	0.58
No money	300	111	0.37

and D=0.58−0.37=0.21.

To get the margin of error, we first calculate the standard error of D:

$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{(0.58)(1-0.58)}{300} + \frac{(0.37)(1-0.37)}{300}} = 0.0399$$

For 95% confidence, we have z*=1.96, so the margin of error is

m=z*SED=(1.96)(0.0399)=0.0781

The 95% confidence interval is

D±m=0.21±0.0781=(0.1319, 0.2881)

With 95% confidence, we can say that the difference in the proportions is between 0.1319 and 0.2881. Alternatively, we can report that the difference between the percent returned of wallets with money and the percent returned of wallets with no money is 21.0%, with a 95% margin of error of 7.8%.

In this example, we chose wallets with money as the first population. Had we chosen wallets with no money to be the first population, the estimate of the difference would be negative (−0.21), and the margin of error would be unchanged. Because it is easier to discuss positive numbers, we generally choose the first population to be the one with the higher proportion.
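The steps of this example can be sketched in Python using only the standard library. The function name `two_proportion_ci` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` rather than from a table, so the results can differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    # Standard error of D, using the sample proportions as estimates.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z* leaves area `confidence` between -z* and z* under the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = z_star * se
    return d, se, m, (d - m, d + m)


d, se, m, (lower, upper) = two_proportion_ci(174, 300, 111, 300)
print(f"D = {d:.2f}, SE = {se:.4f}, m = {m:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

Swapping the order of the two groups in the call changes the sign of the estimate and of both endpoints but leaves the margin of error unchanged.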

Example 8.13 Lost wallets confidence interval from software.

Figure 8.5 shows a spreadsheet that can be used as input to software. Separate columns label each count according to condition and response. Output from JMP and Minitab is given in Figure 8.6. Compare these outputs with the calculations that we performed in Example 8.12.

Figure 8.5 Spreadsheet that can be used as input to software that computes the confidence interval for the lost wallet data, Example 8.13.

Figure 8.6 JMP and Minitab outputs for the lost wallet confidence interval, Example 8.13.

Check-in

8.17 Age and commercial preference. A study was designed to compare two energy drink commercials. Participants were individuals aged 18 to 25 who regularly drink energy drinks. Each participant was shown the commercials in random order and asked to select the better one. Commercial A was selected by 46 out of the 103 participants aged 18 to 21 years and by 65 out of the 110 participants aged 22 to 25 years. Give an estimate of the difference in age proportions that favored Commercial A. Also construct a large-sample 95% confidence interval for this difference.
8.18 Confidence interval for age and commercial preference. Refer to the previous Check-in question. Construct a 95% confidence interval for the difference in proportions that favor Commercial B. Explain how you could have obtained these results from the calculations you did in Check-in question 8.17.

Example 8.14 Plus four for sex and sexual maturity.

In studies that look for a difference between sexes, a major concern is whether or not apparent differences are due to other variables that are associated with sex. Because boys mature more slowly than girls, a study of adolescents that compares boys and girls of the same age may confuse a sex effect with an effect of sexual maturity. The “Tanner score” is a commonly used measure of sexual maturity.¹⁸ Subjects are asked to determine their score by placing a mark next to a rough drawing of an individual at their level of sexual maturity. There are five different drawings, so the score is an integer between 1 and 5.

A pilot study included 12 girls and 12 boys from a population that will be used for a large experiment. Four of the boys and three of the girls had Tanner scores of 4 or 5, a high level of sexual maturity. Let’s find a 95% confidence interval for the difference between the proportions of boys and girls who have high (4 or 5) Tanner scores in this population. The numbers of successes and failures in both groups are not all at least 10, so the large-sample approach is not recommended. On the other hand, the sample sizes are both at least 5, so the plus four method is appropriate.

The plus four estimate of the population proportion for boys is

$$\tilde{p}_1 = \frac{X_1+1}{n_1+2} = \frac{4+1}{12+2} = 0.3571$$

For girls, the estimate is

$$\tilde{p}_2 = \frac{X_2+1}{n_2+2} = \frac{3+1}{12+2} = 0.2857$$

Therefore, the estimate of the difference is

D˜=p˜1−p˜2=0.3571−0.2857=0.071

The standard error of D˜ is

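$$SE_{\tilde{D}} = \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}} = \sqrt{\frac{(0.3571)(1-0.3571)}{12+2} + \frac{(0.2857)(1-0.2857)}{12+2}} = 0.1760$$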

For 95% confidence, z*=1.96 and the margin of error is

m=z*SED˜=(1.96)(0.1760)=0.345

The confidence interval is

D˜±m=0.071±0.345=(−0.274, 0.416)

With 95% confidence we can say that the difference in the proportions is between −0.274 and 0.416. Alternatively, we can report that the difference in the proportions of boys and girls with high Tanner scores in this population is 7.1%, with a 95% margin of error of 34.5%.

The very large margin of error in this example indicates that either boys or girls could be more sexually mature in this population and that the difference could be quite large. Although the interval includes the possibility that there is no difference, corresponding to p1=p2 or p1−p2=0, we should not conclude that there is no difference in the proportions. With small sample sizes such as these, the data do not provide us with a lot of information for our inference. This fact is expressed quantitatively through the very large margin of error.
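The plus four calculation in Example 8.14 follows the same pattern as the large-sample interval, with one success and one failure added to each sample. The sketch below mirrors the formulas used in the example; the function name `plus_four_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x1, n1, x2, n2, confidence=0.95):
    """Plus four interval for p1 - p2, following the steps of Example 8.14."""
    # Add one success and one failure to each sample (four observations in all).
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    d = p1 - p2
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = z_star * se
    return d, se, m, (d - m, d + m)


# Boys: 4 of 12 with high Tanner scores; girls: 3 of 12.
d_tilde, se_tilde, m_tilde, (lower_t, upper_t) = plus_four_ci(4, 12, 3, 12)
print(f"estimate = {d_tilde:.3f}, SE = {se_tilde:.4f}, m = {m_tilde:.3f}")
print(f"95% CI = ({lower_t:.3f}, {upper_t:.3f})")
```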

Significance test for a difference in proportions

Although we prefer to compare two proportions by giving a confidence interval for the difference between the two population proportions, it is sometimes useful to test the null hypothesis that the two population proportions are the same.

We standardize D=p^1−p^2 by subtracting its mean p1−p2 and then dividing by its standard deviation

$$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

If n1 and n2 are large, the standardized difference is approximately N(0, 1). For the large-sample confidence interval, we used sample estimates in place of the unknown population values in the expression for σD. Although this approach would lead to a valid significance test, we instead adopt the more common practice of replacing the unknown σD with an estimate that takes into account our null hypothesis H0: p1=p2. If these two proportions are equal, then we can view all the data as coming from a single population. Let p denote the common value of p1 and p2; then the standard deviation of D=p^1−p^2 is

$$\sigma_D = \sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}} = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

We estimate the common value of p by the overall proportion of successes in the two samples:

$$\hat{p} = \frac{\text{number of successes in both samples}}{\text{number of observations in both samples}} = \frac{X_1+X_2}{n_1+n_2}$$

This is called the pooled estimate of p because it combines, or pools, the information from both samples.

To estimate σD under the null hypothesis, we substitute p^ for p in the expression for σD. The result is a standard error for D that assumes H0: p1=p2:

$$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The subscript p on SEDp reminds us that we pooled data from the two samples to construct the estimate.

Significance test for comparing two proportions

To test the hypothesis

H0: p1=p2

compute the z statistic

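$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$$

where the pooled standard error is

$$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

and where the pooled estimate of the common value of p1 and p2 is

$$\hat{p} = \frac{X_1+X_2}{n_1+n_2}$$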

In terms of a standard Normal random variable Z, the approximate P-value for a test of H0 against

Ha: p1>p2 is P(Z≥z)

Ha: p1<p2 is P(Z≤z)

Ha: p1≠p2 is 2P(Z≥ |z|)

This z test is based on the Normal approximation to the binomial distribution. As a general rule, we will use it when the expected number of successes and the expected number of failures in each of the samples are at least 5.

Example 8.15 Lost wallets: The z test.

Are wallets with no money and wallets with money equally likely to be returned? We examine the data in Example 8.11 (page 469) to answer this question. Here is the data summary:

Wallet condition	n	X	p^=X/n
Money	300	174	0.58
No money	300	111	0.37

The sample proportions are certainly quite different, but we will perform a significance test to see if the difference is large enough to lead us to believe that the population proportions are not equal. Formally, we test the hypotheses

H0: p1=p2
Ha: p1≠p2

The pooled estimate of the common value of p is

$$\hat{p} = \frac{174+111}{300+300} = \frac{285}{600} = 0.475$$

The test statistic is calculated as follows:

$$SE_{Dp} = \sqrt{(0.475)(1-0.475)\left(\frac{1}{300} + \frac{1}{300}\right)} = 0.040774$$

$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}} = \frac{0.58-0.37}{0.040774} = 5.15$$

The P-value is 2P(Z≥5.15). Note that the largest value for z in Table A is 3.49. Therefore, from Table A, we can conclude that P<2(1−0.9998)=0.0004, although we know that the true P-value is smaller. Software gives P=0.00000026.

Here is our summary: 58% of the wallets with money and 37% of the wallets with no money were returned; the difference is statistically significant (z=5.15, P<0.0001). We have chosen to report the P-value as P<0.0001 because this communicates what we have found concisely. Do you think we should have reported P<0.0004 or P=0.00000026?
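The pooled z test in this example can be sketched as follows. The helper name `two_proportion_z_test` is hypothetical, and the P-value is computed from the standard Normal distribution in Python's `statistics` module rather than from Table A.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2 against the two-sided alternative."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    # Two-sided P-value: 2 P(Z >= |z|), computed in the lower tail for accuracy.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return pooled, se_pooled, z, p_value


pooled, se_pooled, z, p_value = two_proportion_z_test(174, 300, 111, 300)
print(f"pooled p-hat = {pooled:.3f}, SE = {se_pooled:.6f}, z = {z:.2f}")
print(f"two-sided P-value = {p_value:.2e}")
```

Computing the tail area as `cdf(-abs(z))` avoids subtracting a number very close to 1 from 1, which matters when z is this large.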

Example 8.16 Output for the lost wallets significance test.

We prefer to use software to obtain the significance test results for comparing the return rates of wallets with and without money. Output from JMP and Minitab is given in Figure 8.7. JMP reports the significance tests for the two-sided alternative and for the two one-sided alternatives. We are interested in the two-sided alternative. Therefore, we report the P-value as <0.0001. Minitab reports the test statistic, z=5.27, and gives the P-value as 0.000 (this means P<0.0005). This z is slightly larger than the 5.15 found in Example 8.15; the value 5.27 is what results from dividing the difference by the unpooled standard error (0.21/0.0399), so the output appears to use that standard error rather than the pooled one. The same P-value is given for the Fisher exact test. This test is an alternative to the large-sample significance test that we have discussed. It is preferred by many, particularly for small sample sizes.

Figure 8.7 JMP and Minitab outputs for the lost wallet significance test, Example 8.16. JMP (two-sample test for proportions, money minus no money): proportion difference, 0.21; lower 95%, 0.130715; upper 95%, 0.286504. Adjusted Wald test probabilities: <0.0001 for the null hypothesis that the difference is less than or equal to 0, 1.0000 for the null hypothesis that the difference is greater than or equal to 0, and <0.0001 for the null hypothesis that the difference equals 0. Minitab: sample 1, N = 300, event = 174, sample p = 0.580000; sample 2, N = 300, event = 111, sample p = 0.370000; difference, 0.21, with 95% CI (0.131871, 0.288129) based on the Normal approximation. For the test of H0: p1−p2=0 against H1: p1−p2≠0, the Normal approximation gives Z = 5.27 with P-value 0.000, and Fisher's exact method gives P-value 0.000.

Do you think that we could have argued that the proportion would be higher for wallets with money than for wallets without money before looking at the data in this example? This would allow us to use the one-sided alternative Ha: p1>p2. The P-value would be half of the value obtained for the two-sided test. Do you think that this approach is justified?

Check-in

8.19 The z test for age and commercial preference. Refer to Check-in question 8.17 (page 472). Test whether the proportions of the two age groups are the same versus the two-sided alternative at the 5% level.
8.20 Changing the alternative hypothesis. Refer to Check-in question 8.19. Does your conclusion change if you perform the test with the older participants designated as the first group (corresponding to p1)? Explain your answer.

Choosing a sample size for two sample proportions

In Section 8.1, we studied methods for determining the sample size using two settings. First, we used the margin of error for a single proportion as the criterion for choosing n (page 460). Second, we used the desired power of the significance test for a single proportion as the determining factor (page 462). We follow the same approach here for comparing two proportions.

Use the margin of error

Recall that the large-sample estimate of the difference in proportions is

$$D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2}$$

the standard error of the difference is

$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

and the margin of error for confidence level C is

m=z*SED

where z* is the value for the standard Normal density curve with area C between −z* and z*.

For a single proportion, we guessed a value for the true proportion and computed the margins of error for various choices of n. Here we use the same idea, but we need to guess values for the two proportions. We can display the results in a table, as in Example 8.9 (page 461), or in a graph, as in Exercise 8.29 (page 467).

Sample size for desired margin of error

The level C confidence interval for a difference in two proportions will have a margin of error approximately equal to a specified value m when the sample size for each of the two proportions is

$$n = \left(\frac{z^*}{m}\right)^2 \left(p_1^*(1-p_1^*) + p_2^*(1-p_2^*)\right)$$

Here, z* is the critical value for confidence level C, and p1* and p2* are guessed values for p1 and p2, the proportions of successes in the future sample.

The margin of error will be less than or equal to m if p1* and p2* are chosen to be 0.5. The common sample size required is then given by

$$n = \left(\frac{1}{2}\right)\left(\frac{z^*}{m}\right)^2$$

Note that to use the confidence interval, which is based on the Normal approximation, we still require that the number of successes and the number of failures in each of the samples are at least 10.

Example 8.17 Margin of error—based sample sizes for age and commercial preferences.

Consider the setting in Check-in question 8.17, where we compared the preferences of two age groups for two commercials. Suppose we want to do a study in which we perform a similar comparison using a 95% confidence interval that will have a margin of error of 0.2 or less. What should we choose for our sample size? Using m=0.2 and z*=1.96 in our formula, we have

$$n = \left(\frac{1}{2}\right)\left(\frac{z^*}{m}\right)^2 = \left(\frac{1}{2}\right)\left(\frac{1.96}{0.2}\right)^2 = 48.02$$

We would include 48 participants in each age group for our study.

Note that we have rounded the calculated value, 48.02, down because it is very close to 48 and we used p1*=p2*=0.5. The normal procedure would be to round the calculated value up to the next larger integer.
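The sample size formula is easy to evaluate for other guessed proportions. The following sketch uses a hypothetical function name and defaults to p1*=p2*=0.5. It returns the unrounded value so that the rounding decision stays with the reader.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(m, confidence=0.95, p1_guess=0.5, p2_guess=0.5):
    """Common sample size per group for a desired margin of error m (unrounded)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    spread = p1_guess * (1 - p1_guess) + p2_guess * (1 - p2_guess)
    return (z_star / m) ** 2 * spread


n_raw = sample_size_for_margin(0.2)
print(f"unrounded n = {n_raw:.2f}")
print(f"rounded up to the next integer: {ceil(n_raw)}")
```

Rounding up, the usual procedure, gives 49 per group; the example's choice of 48 reflects the judgment described above.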

Check-in

8.21 What would the margin of error be? Consider the setting in Check-in question 8.17 with n1=n2=32.
1. Compute the margins of error for each of the following scenarios: p1=0.6, p2=0.4; p1=0.7, p2=0.3; and p1=0.8, p2=0.2.
2. If you think that one of these scenarios is likely to fit your study, should you reconsider your choice of n1=32 and n2=32? Explain your answer.

Use the power of the significance test

When we studied using power to compute the sample size needed for a significance test for a single proportion, we used software. We will do the same for the significance test comparing two proportions.

Some software allows us to consider significance tests that are a little more general than the version we studied in this section. Specifically, we used the null hypothesis H0: p1=p2, which we can rewrite as H0: p1−p2=0. The generalization allows us to use values different from zero in the alternative way of writing H0. Therefore, we write H0: p1−p2=Δ0 for the null hypothesis, and we will need to specify Δ0=0 for the significance test that we studied.

Here is a summary of the inputs needed for software to perform the calculations:

The significance level α (the probability of rejecting the null hypothesis when it is true); usually we choose 5% for the α.
Power (probability of rejecting the null hypothesis when it is false); usually we choose 80% (0.80) for power.
The value of Δ0 in the null hypothesis H0: p1−p2=Δ0.
The alternative hypothesis, two-sided (Ha: p1≠p2) or one-sided (Ha: p1>p2) or (Ha: p1<p2).
Values for p1 and p2 in the alternative hypothesis.

Example 8.18 Sample sizes for age and commercial preferences.

Refer to Example 8.17, where we used the margin of error to find the sample sizes for comparing the preferences of two age groups for two commercials. Let’s find the sample sizes required for a significance test that the two proportions who prefer Commercial A are equal (Δ0=0) using a two-sided alternative with p1=0.55 and p2=0.75, α=0.05, and 80% (0.80) power. Outputs from JMP and Minitab are given in Figure 8.8. We need n1=n2=89 participants in each age group for our study.

Figure 8.8 JMP and Minitab outputs for finding the sample size, Example 8.18. JMP power calculator for testing H0: p1−p2=Δ0: alpha, 0.05; proportion 1, 0.55; proportion 2, 0.75; two-sided; null difference in proportion, 0; power, 0.8; result: sample size 1, 89; sample size 2, 89. Minitab power curve for a test of two proportions (baseline p = 0.55, alpha = 0.05, not-equal alternative, sample size 89 in each group): power on the vertical axis (0.0 to 1.0) against comparison p on the horizontal axis (0.2 to 0.9). The curve falls from about (0.25, 1.0) to about (0.54, 0.05) and then rises to about (0.83, 1.0), with a point marked at (0.75, 0.8).

Note that the Minitab output in Figure 8.8 gives the power curve for different alternatives. All of these have p1=0.55, which Minitab calls the “Baseline p,” while p2, the Comparison p, varies from approximately 0.3 to 0.8. We see that the power is essentially 100% (1) at the extremes. The power is 0.05, the Type I error (α), at p2=0.55, which corresponds to the null hypothesis.
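For readers without access to this software, a commonly used Normal-approximation formula gives a rough check on the required sample size. The sketch below is an assumption, not a description of what JMP or Minitab actually compute: it uses the pooled proportion under the null hypothesis and the separate proportions under the alternative, and the function name is hypothetical. For the inputs of Example 8.18 it rounds up to the same answer as the software.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_power(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test of H0: p1 = p2.

    Normal-approximation formula (an assumption; software may use another method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                          # common p under H0, equal group sizes
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (null_term + alt_term) ** 2 / (p1 - p2) ** 2


n_power = sample_size_for_power(0.55, 0.75)
print(f"unrounded n per group = {n_power:.2f}, rounded up = {ceil(n_power)}")
```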

Check-in

8.22 Find the sample sizes. Consider the setting in Example 8.17. Change p1 to 0.8 and p2 to 0.6. Find the required sample sizes.

Beyond the Basics

Relative risk

In Examples 8.11 and 8.12, we compared the chance that a wallet would be returned for wallets with money and wallets with no money by reporting the difference in the proportions with a confidence interval. Another way to compare two proportions is to take the ratio. This approach can be used in any setting, and it is particularly common in studies of medical treatments.

We think of each proportion as a risk that something (usually bad) will happen. We then compare these two risks with the ratio of the two proportions, which is called the relative risk (RR). Note that a relative risk of 1 means that the two proportions, p1 and p2, are equal. The procedure for calculating confidence intervals for relative risk is based on the same kind of principles that we have studied, but the details are somewhat more complicated. Fortunately, we can leave the details to software and concentrate on interpretation and communication of the results.

Example 8.19 Aspirin and blood clots: Relative risk.

In this study, 822 patients who had completed a standard treatment for blood clots (venous thromboembolism) were randomly assigned in equal numbers to receive low-dose aspirin or a placebo. Patients were monitored for several years for the occurrence of several related medical conditions. Counts of patients who experienced one or more of these conditions were reported for each year after the study began.¹⁹ The following table gives the data for a composite of events, termed “major vascular events.” Here, X is the number of patients who had a major event during the time they were monitored:

Population	n	X	p^=X/n
1 (aspirin)	411	45	0.1095
2 (placebo)	411	73	0.1776
Total	822	118	0.1436

The relative risk is

$$RR = \frac{\hat{p}_1}{\hat{p}_2} = \frac{45/411}{73/411} = 0.6164$$

Software gives the 95% confidence interval as 0.4364 to 0.8707. Taking aspirin reduced the occurrence of major events to 62% of what it is for patients taking the placebo, with a 95% confidence interval of 44% to 87%. The confidence interval does not include 1, so we conclude that the two proportions are different, with P<0.05.

Note that the confidence interval is not symmetric about the estimate. Relative risk is one of many situations where this occurs.
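The text leaves the details of the relative risk interval to software. One widely used approach builds a Normal interval for the logarithm of RR and then transforms back, which is why the result is not symmetric about the estimate. The sketch below uses that log-based method as an assumption; the source does not say which method the software used, although this method reproduces the interval reported above. The function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, confidence=0.95):
    """Relative risk with a log-based confidence interval (assumed method)."""
    rr = (x1 / n1) / (x2 / n2)
    # Approximate standard error of log(RR).
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Symmetric interval on the log scale, transformed back with exp().
    lower = exp(log(rr) - z_star * se_log)
    upper = exp(log(rr) + z_star * se_log)
    return rr, (lower, upper)


rr, (rr_lower, rr_upper) = relative_risk_ci(45, 411, 73, 411)
print(f"RR = {rr:.4f}, 95% CI = ({rr_lower:.4f}, {rr_upper:.4f})")
```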

Section 8.2 SUMMARY

The large-sample estimate of the difference in two population proportions is

D=p^1−p^2

where p^1 and p^2 are the sample proportions:

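$$\hat{p}_1 = \frac{X_1}{n_1} \quad \text{and} \quad \hat{p}_2 = \frac{X_2}{n_2}$$
The standard error of D is

$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$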
The level C margin of error of D is

m=z*SED

where z* is the value for the standard Normal density curve with area C between −z* and z*.
The level C large-sample confidence interval for p1−p2 is

D±m

We recommend using this interval for 90%, 95%, or 99% confidence when the number of successes and the number of failures in both samples are all at least 10. When sample sizes are smaller, alternative procedures such as the plus four estimate of the difference in two population proportions are recommended.
Significance tests of H0: p1=p2 use the z statistic

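$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$$

with P-values from the N(0, 1) distribution. In this statistic,

$$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

and p^ is the pooled estimate of the common value of p1 and p2:

$$\hat{p} = \frac{X_1+X_2}{n_1+n_2}$$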

Use this test when the number of successes and the number of failures in each of the samples are at least 5.
Relative risk is the ratio of two sample proportions:

$$RR = \frac{\hat{p}_1}{\hat{p}_2}$$

Confidence intervals for relative risk are often used to summarize the comparison of two proportions.

Now that you have completed this section, you will be able to:

Identify the counts and sample sizes for a comparison between two proportions, compute the sample proportions, and find their difference. Review Example 8.11 (page 469) and try Exercise 8.35.
Use the large-sample method to find the confidence interval for a difference between two proportions and interpret the confidence interval. Review Example 8.12 (page 470) and try Exercise 8.37.
Use the large-sample method to perform a significance test for comparing two proportions and interpret the results. Review Example 8.15 (page 475) and try Exercise 8.39.
Find the sample size needed for a desired margin of error for the difference in proportions. Review Example 8.17 (page 477) and try Exercise 8.49.
Find the sample size needed for a significance test comparing two proportions. Review Example 8.18 (page 478) and try Exercise 8.51.

Section 8.2 EXERCISES

8.34 What’s wrong? For each of the following, explain what is wrong and why.
1. A z statistic is used to test the null hypothesis that p^1=p^2.
2. If two sample proportions are equal, then the sample counts are equal.
3. A 95% confidence interval for the difference in two proportions includes errors due to nonresponse.
8.35 Identify the key elements. For each of the following scenarios, identify the populations, the counts, and the sample sizes; compute the two proportions and find their difference.
1. A study of tipping behaviors examined the relationship between the color of the shirt worn by the server and whether or not the customer left a tip.²⁰ There were 418 male customers in the study. Of the 69 customers served by a server wearing a red shirt, 40 left a tip. Of the 349 who were served by a server wearing a shirt of a different color, 130 left a tip.
2. A sample of 40 runners was used to compare two new routines for stretching. The runners were randomly assigned to one of the routines, which they followed for two weeks. Satisfaction with the routines was measured using a questionnaire at the end of the two-week period. For the first routine, 11 runners said that they were satisfied or very satisfied. For the second routine, 14 runners said that they were satisfied or very satisfied.
8.36 Apply the confidence interval guidelines. Refer to the previous exercise. For each of the scenarios, determine whether or not the guidelines for using the large-sample method for a 95% confidence interval are satisfied. Explain your answers.
8.37 Find the 95% confidence interval. Refer to Exercise 8.35. For each scenario, find the large-sample 95% confidence interval for the difference in proportions and use the scenario to explain the meaning of the confidence interval.
8.38 Apply the significance test guidelines. Refer to Exercise 8.35. For each of the scenarios, determine whether or not the guidelines for using the large-sample significance test are satisfied. Explain your answers.
8.39 Perform the significance test. Refer to Exercise 8.35. For each scenario, perform the large-sample significance test and use the scenario to explain the meaning of the significance test.
8.40 Teeth and military service. In 1898 the United States and Spain fought a war over the U.S. intervention in the Cuban War of Independence. At that time, the U.S. military was concerned about the nutrition of its recruits. Many did not have a sufficient number of teeth to chew the food provided to soldiers. As a result, it was likely that they would be undernourished and unable to fulfill their duties as soldiers. The requirements at that time specified that a recruit must have “at least four sound double teeth, one above and one below on each side of the mouth, and so opposed” so that they could chew food. Of the 58,952 recruits who were under the age of 20, 68 were rejected for this reason. For the 43,786 recruits who were 40 or over, 3801 were rejected.²¹
1. Find the proportion of rejects for each age group.
2. Find a 99% confidence interval for the difference in the proportions.
3. Use a significance test to compare the proportions. Write a short paragraph describing your results and conclusions.
4. Are the guidelines for the use of the large-sample approach satisfied for your work in parts 2 and 3? Explain your answers.
8.41 Physical education requirements. In the 1920s, about 97% of U.S. colleges and universities required a physical education course for graduation. Today, about 40% require such a course. A recent study of physical education requirements included 354 institutions: 225 private and 129 public. Among the private institutions, 60 required a physical education course, while among the public institutions, 101 required a course.²²
1. What are the explanatory and response variables for this exercise? Justify your answers.
2. What are the populations?
3. What are the statistics?
4. Use a 95% confidence interval to compare the private and the public institutions with regard to the physical education requirement.
5. Use a significance test to compare the private and the public institutions with regard to the physical education requirement.
6. For parts 4 and 5, verify that the guidelines for using the large-sample methods are satisfied.
7. Summarize your analysis of these data in a short paragraph.
8.42 Exergaming in Canada. Exergames are active video games such as rhythmic dancing games, virtual bicycles, balance board simulators, and virtual sports simulators that require a screen and a console. A study of exergaming practiced by students from grades 10 and 11 in Montreal, Canada, examined many factors related to participation in exergaming.²³ Of the 358 students who reported that they stressed about their health, 29.9% said that they were exergamers. Of the 851 students who reported that they did not stress about their health, 20.8% said that they were exergamers.
1. Define the two populations to be compared for this exercise.
2. What are the counts, the sample sizes, and the proportions?
3. Are the guidelines for the use of the large-sample confidence interval satisfied?
4. Are the guidelines for the use of the large-sample significance test satisfied?
8.43 Confidence interval for exergaming in Canada. Refer to the previous exercise. Find the 95% confidence interval for the difference in proportions. Write a short statement interpreting this result.
8.44 Significance test for exergaming in Canada. Refer to Exercise 8.42. Use a significance test to compare the proportions. Write a short statement interpreting this result.
8.45 Adult gamers versus teen gamers. A Pew Internet Project Data Memo presented data comparing adult gamers with teen gamers with respect to the devices on which they play. The data are from two surveys. The adult survey had 1063 gamers, while the teen survey had 1064 gamers. The memo reports that 54% of adult gamers played on game consoles (Xbox, PlayStation, Wii, etc.), while 89% of teen gamers played on game consoles.²⁴
1. Refer to the table that appears on page 468. Fill in the numerical values of all quantities that are known.
2. Find the estimate of the difference between the proportion of teen gamers who played on game consoles and the proportion of adults who played on these devices.
3. Is the large-sample confidence interval for the difference between two proportions appropriate to use in this setting? Explain your answer.
4. Find the 95% confidence interval for the difference.
5. Convert your estimated difference and confidence interval to percents.
6. The adult survey was conducted between October and December 2008, whereas the teen survey was conducted between November 2007 and February 2008. Do you think that this difference should have any effect on the interpretation of the results? Be sure to explain your answer.
8.46 Significance test for gaming on consoles. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.
8.47 Gamers on computers. The report described in Exercise 8.45 also presented data from the same surveys for gaming on computers (desktops or laptops). These devices were used by 73% of adult gamers and by 76% of teen gamers. Answer the questions given in Exercise 8.45 for gaming on computers.
8.48 Significance test for gaming on computers. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.
8.49 Find the sample size. You are planning a study in which you will use a 95% confidence interval to report the difference between two proportions. Find the sample size needed for a margin of error of 0.2 if you do not have a good guess at the values of the two proportions. How would your answer change if you were willing to guess p1*=0.65 and p2*=0.35?
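A sample size calculation of this kind can be sketched in a few lines. This is a minimal sketch, assuming equal sample sizes in the two groups and the usual large-sample formula n = (z*/m)^2 [p1*(1 − p1*) + p2*(1 − p2*)], with both guesses set to 0.5 when no better guess is available. The function name and the inputs in the example are hypothetical and differ from the values in the exercise.

```python
from math import ceil
from statistics import NormalDist


def n_for_margin(m, p1_guess=0.5, p2_guess=0.5, conf=0.95):
    """Per-group sample size for a desired margin of error m.

    Assumes equal group sizes. The default guesses of 0.5 give the
    most conservative (largest) sample size.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    variance_sum = p1_guess * (1 - p1_guess) + p2_guess * (1 - p2_guess)
    return ceil((z_star / m) ** 2 * variance_sum)


# Hypothetical inputs, not the values from the exercise.
print(n_for_margin(0.1))            # no guess: uses 0.5 for both groups
print(n_for_margin(0.1, 0.3, 0.6))  # with guessed proportions
```

Rounding up rather than to the nearest integer ensures that the margin of error is no larger than the one requested.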
8.50 Can we compare gaming on consoles with gaming on computers? Refer to Exercises 8.45 to 8.48. Do you think that you can use the large-sample confidence intervals for a difference in proportions to compare teens’ use of computers with teens’ use of consoles? Write a short paragraph giving the reason for your answer. (Hint: Look carefully at the assumptions needed for this procedure on page 470.)
8.51 Find the power. Consider testing the null hypothesis that two proportions are equal versus the two-sided alternative with α=0.05, 80% power, and equal sample sizes in the two groups.
1. For each of the following situations, find the required sample size: (i) p1=0.1 and p2=0.2, (ii) p1=0.2 and p2=0.3, (iii) p1=0.3 and p2=0.4, (iv) p1=0.4 and p2=0.5, (v) p1=0.5 and p2=0.6, (vi) p1=0.6 and p2=0.7, (vii) p1=0.7 and p2=0.8, and (viii) p1=0.8 and p2=0.9.
2. Write a short summary describing your results.
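Sample sizes for a significance test are normally found with statistical software. If you want to see what such software is doing, the following is a minimal sketch based on one common Normal-approximation formula for a two-sided test with equal group sizes. This particular formula is an assumption, not necessarily the method used by your software, so results may differ slightly. The function name and the proportions in the example are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_for_power(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for testing H0: p1 = p2.

    Uses a common Normal approximation: the null standard error is
    based on the average of p1 and p2, and the alternative standard
    error is based on p1 and p2 separately.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_part = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_part = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((null_part + alt_part) ** 2 / (p1 - p2) ** 2)


# Hypothetical proportions, not one of the situations in the exercise.
print(n_for_power(0.25, 0.40))
```

Because the formula depends on p1(1 − p1) and p2(1 − p2) as well as on the difference p1 − p2, two pairs of proportions with the same difference can require different sample sizes.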
8.52 Find the relative risk. Refer to Exercise 8.35. For each scenario, find the relative risk. Be sure to give a justification for your choice of proportions to use in the numerator and the denominator of the ratio. Use the scenarios to explain the meaning of the relative risk.
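The arithmetic for a relative risk is a ratio of two sample proportions. The sketch below shows the calculation and how the answer changes when the roles of numerator and denominator are exchanged. The function name and the counts are hypothetical and are not taken from Exercise 8.35.

```python
def relative_risk(x_num, n_num, x_den, n_den):
    """Ratio of the sample proportion in the numerator group
    to the sample proportion in the denominator group."""
    return (x_num / n_num) / (x_den / n_den)


# Hypothetical counts: 30 of 100 in one group, 15 of 100 in the other.
rr = relative_risk(30, 100, 15, 100)
rr_swapped = relative_risk(15, 100, 30, 100)
print(f"RR = {rr:.2f}, with groups exchanged RR = {rr_swapped:.2f}")
```

Exchanging the groups gives the reciprocal, so the choice of numerator determines whether the result reads as "twice as likely" or "half as likely".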