Chapter 14 Exercises in Chapter 14 Logistic Regression

Chapter 14 EXERCISES

14.23 Another logistic model for cell phones and age. Refer to Exercise 14.5 (page 14-10). Suppose that you use the actual value of age in years as the explanatory variable in a logistic regression model.
1. Describe the statistical model for logistic regression in this setting.
2. Interpret β1 in terms of an effect based on a difference in age of one year.
3. This model requires an assumption that is not needed in the model that you described in the previous exercise. Explain the assumption and describe a method for examining whether or not it is a reasonable assumption to make for these data. (Hint: Refer to Exercise 14.5 and Figure 14.7, pages 14-10 and 14-15.)
14.24 Compare the multiple logistic regression analysis with the two-way table. The data analyzed in Figure 14.11 were studied in Exercises 9.1, 9.3, 9.5, and 9.7 (pages 503 to 504) using a 2×6 table of counts. Compare these two approaches to the analysis of these data. Describe some strengths and weaknesses of each approach. Which do you prefer? Give reasons for your answer.
14.25 Exergaming in Canada. Exergames are active video games such as rhythmic dancing games, virtual bicycles, balance board simulators, and virtual sports simulators that require a screen and a console. A study of exergaming by students in grades 10 and 11 in Montreal, Canada, examined many factors related to participation in exergaming.⁵ Of the 358 students who reported that they stressed about their health, 29.9% said that they were exergamers. Of the 851 students who reported that they did not stress about their health, 20.8% said that they were exergamers. Analyze these data using logistic regression and write a summary of your analytical approach, your results, and your conclusions.
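When the only explanatory variable is a 0/1 indicator, the logistic regression estimates can be computed directly from the two group counts, which gives a way to check software output. The following is a minimal sketch under stated assumptions: the counts are recovered by rounding the reported percentages (29.9% of 358 and 20.8% of 851), the function name `logistic_from_two_groups` and its arguments are hypothetical, and 1.96 is used for an approximate 95% interval.

```python
import math

def logistic_from_two_groups(yes_1, n_1, yes_0, n_0, z_star=1.96):
    """Fit log(odds) = b0 + b1*x for a 0/1 variable x from group counts."""
    no_1, no_0 = n_1 - yes_1, n_0 - yes_0
    b0 = math.log(yes_0 / no_0)           # log odds in the x = 0 group
    b1 = math.log(yes_1 / no_1) - b0      # difference in log odds, x = 1 minus x = 0
    # Large-sample standard error of b1 for a 2x2 table of counts
    se_b1 = math.sqrt(1 / yes_1 + 1 / no_1 + 1 / yes_0 + 1 / no_0)
    return {
        "b0": b0,
        "b1": b1,
        "se_b1": se_b1,
        "z": b1 / se_b1,
        "odds_ratio": math.exp(b1),
        "ci_odds_ratio": (math.exp(b1 - z_star * se_b1),
                          math.exp(b1 + z_star * se_b1)),
    }

# Assumption: counts reconstructed by rounding the percentages in the exercise.
exergamers_stressed = round(0.299 * 358)      # x = 1: stressed about health
exergamers_not_stressed = round(0.208 * 851)  # x = 0: not stressed about health

fit = logistic_from_two_groups(exergamers_stressed, 358,
                               exergamers_not_stressed, 851)
for name, value in fit.items():
    print(name, value)
```

The z statistic and the interval for the odds ratio are the pieces a written summary would normally report; a full analysis would still use statistical software.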
14.26 More exergaming in Canada. Refer to the previous exercise. Another explanatory variable reported in this study was the amount of television watched per day. Of the 54 students who reported that they watched no TV, 11.1% were exergamers; for the 776 students who watched some TV but less than two hours, 20.6% were exergamers; and for the 370 students who watched two or more hours, 31.1% were exergamers. Use logistic regression to examine the relationship between TV watching and exergaming. Write a summary of your analytical approach, your results, and your conclusions.
14.27 Interpret the fitted model. If we apply the exponential function to the fitted model in Example 14.9 (page 14-9), we get

$$\text{odds} = e^{-2.56 + 1.125x} = e^{-2.56} \times e^{1.125x}$$

Show that for any value of the quantitative explanatory variable x, the odds ratio for increasing x by 1,

$$\frac{\text{odds}_{x+1}}{\text{odds}_{x}}$$

is $e^{1.125} = 3.08$. This justifies the interpretation given at the end of Example 14.9.
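One way to organize the argument is to write the odds at x + 1 and at x with the fitted coefficients and cancel the common factor. The lines below are a sketch of that algebra, using only the values −2.56 and 1.125 from the fitted model.

```latex
\[
\frac{\text{odds}_{x+1}}{\text{odds}_{x}}
  = \frac{e^{-2.56}\, e^{1.125(x+1)}}{e^{-2.56}\, e^{1.125x}}
  = e^{1.125(x+1) - 1.125x}
  = e^{1.125}
  \approx 3.08
\]
```

Because x cancels, the ratio is the same at every starting value of x.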
14.28 z and the X² statistic. Use the three outputs in Figure 14.8 (page 14-16) to explore the relationship between the z statistic and the X² statistic that we have discussed in this chapter (page 14-12).
1. Use the information in each output to calculate the z statistic. Verify that the three values are essentially the same (with no roundoff, they would be equal). This z statistic has approximately the standard Normal distribution if the null hypothesis (β1 = 0) is true.
2. Show that the square of z is close to the Wald statistic reported by SPSS and the X² statistic reported by JMP.
3. Note that Minitab uses a different calculation to obtain an X² statistic. Does the P-value for this statistic reported by Minitab lead to a different conclusion than the P-values given by SPSS and JMP? Explain your answer.
4. Comment on the reporting of P-values as 0.000 by Minitab and .000 by SPSS versus <0.0001 by JMP. Which do you prefer? Give reasons for your answer.
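The relationship between z and its square can be sketched numerically. The estimate and standard error below are hypothetical placeholders, not values read from Figure 14.8, and `wald_summary` is an illustrative helper name. The sketch computes z, squares it, and obtains the P-value both from the standard Normal distribution and from the chi-square distribution with 1 degree of freedom.

```python
import math

def wald_summary(b1, se_b1):
    """Return z, its square, and the matching P-values for H0: beta1 = 0."""
    z = b1 / se_b1
    x2 = z ** 2
    p_from_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided P(|Z| >= |z|)
    p_from_x2 = math.erfc(math.sqrt(x2 / 2))     # P(chi-square with 1 df >= x2)
    return z, x2, p_from_z, p_from_x2

# Hypothetical estimate and standard error, NOT taken from Figure 14.8.
z, x2, p_from_z, p_from_x2 = wald_summary(b1=1.2, se_b1=0.3)

print(f"z = {z:.2f}, z squared = {x2:.2f}")
print(f"P from z = {p_from_z:.6f}, P from X2 = {p_from_x2:.6f}")
# Rounding a very small P-value to three decimals hides its size.
print(f"P to three decimals: {p_from_z:.3f}")
```

The two P-values agree because the square of a standard Normal variable has the chi-square distribution with 1 degree of freedom.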
14.29 An example of Simpson’s paradox. Here is an example of Simpson’s paradox: the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group. The data concern the comparison of success rates for 2-pointers and 3-pointers for two teams in a game of basketball (see Example 2.48, page 135). Here are the data for all shots combined:

Outcome    Team A    Team B
Made       28        26
Missed     32        34
Shots      60        60

And here are the counts broken down by the type of shot:

Outcome    2-pointers          3-pointers
           A         B         A         B
Made       25        16        3         10
Missed     25        14        7         20
Shots      50        30        10        30
1. Use a logistic regression to model the odds of making a shot with team as the explanatory variable. Summarize the results of your analysis and give a 95% confidence interval for the odds ratio that team A makes a shot relative to team B.
2. Rerun your analysis in part 1 using the team (A or B) and the type of shot (2-pointer or 3-pointer) as explanatory variables. Summarize the results of your analysis and give a 95% confidence interval for the odds ratio of team A relative to team B.
3. Explain Simpson’s paradox in terms of your results.
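Before fitting the models, it can help to compute the sample odds ratios directly from the counts in the two tables. The sketch below does this for all shots combined and for each type of shot; `odds_ratio` is a hypothetical helper name. For the team-only model, the exponentiated team coefficient equals the combined sample odds ratio; the two-variable model needs software to fit.

```python
def odds_ratio(made_a, missed_a, made_b, missed_b):
    """Odds that team A makes a shot divided by the odds that team B does."""
    return (made_a / missed_a) / (made_b / missed_b)

# Counts copied from the two tables in the exercise.
combined = odds_ratio(28, 32, 26, 34)
two_pointers = odds_ratio(25, 25, 16, 14)
three_pointers = odds_ratio(3, 7, 10, 20)

print(f"All shots:  {combined:.3f}")
print(f"2-pointers: {two_pointers:.3f}")
print(f"3-pointers: {three_pointers:.3f}")
```

A value above 1 favors team A and a value below 1 favors team B, so comparing the combined ratio with the two within-type ratios shows the reversal.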
14.30 Reducing the number of workers. To be competitive in global markets, many corporations are undertaking major reorganizations. Often, these involve “downsizing” or a “reduction in force” (RIF), where substantial numbers of employees are terminated. Federal and various state laws require that employees be treated equally regardless of their age. In particular, employees over the age of 40 are in a “protected” class, and many allegations of discrimination focus on comparing employees over 40 with their younger coworkers. Here are the data for a recent RIF:

              Over 40
Terminated    No        Yes
Yes           8         48
No            554       745
1. Write the logistic regression model for this problem using the log odds of a RIF as the response variable and an indicator for over and under 40 years of age as the explanatory variable.
2. Explain the assumption concerning binomial distributions in terms of the variables in this exercise. To what extent do you think that these assumptions are reasonable?
3. Find b1 and its standard error SE_b1, as well as the 95% confidence interval for β1.
4. Transform the results to the odds scale and write a short summary of your work and conclusion.
5. If additional explanatory variables were available—for example, a performance evaluation—how would you use this information to study the RIF?
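The setup for this exercise can be sketched in symbols. Here p denotes the probability of termination and x is the over-40 indicator; the standard error uses the usual large-sample formula for a 2×2 table of counts, and the rounded values are hand calculations that should be checked against software output.

```latex
\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x,
\qquad
x = \begin{cases} 1 & \text{over 40} \\ 0 & \text{40 or under} \end{cases}
\]
\[
b_1 = \log\!\left(\frac{48/745}{8/554}\right) \approx 1.50,
\qquad
SE_{b_1} = \sqrt{\frac{1}{48} + \frac{1}{745} + \frac{1}{8} + \frac{1}{554}} \approx 0.386
\]
\[
b_1 \pm 1.96\, SE_{b_1}
\quad\Longrightarrow\quad
\left( e^{\,b_1 - 1.96\, SE_{b_1}},\; e^{\,b_1 + 1.96\, SE_{b_1}} \right)
\ \text{for the odds ratio}
\]
```

Exponentiating the endpoints of the interval for β1 gives the interval on the odds scale.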
14.31 Another example of Simpson’s paradox. Refer to Exercises 2.105 and 2.106 (page 139). Using Exercise 14.29 as a guide, analyze these data using logistic regression.
14.32 Predicting physical activity. Participation in physical activities typically declines between high school and young adulthood. This suggests that postsecondary institutions may be an ideal setting to address physical activity. A study looked at the association between physical activity and several behavioral and perceptual characteristics among midwestern college students.⁶ Of 663 students who met the vigorous activity guidelines for the previous week, 169 reported eating fruit two or more times per day. Of the 471 who did not meet the vigorous activity guidelines in the previous week, 68 reported eating fruit two or more times per day. Model the log odds of vigorous activity using an indicator variable for eating fruit two or more times per day as the explanatory variable. Summarize your findings.

The following four exercises use the GPAHI data file. We examine models for relating success as measured by the GPA to several explanatory variables. In Chapter 11, we used multiple regression methods for our analysis. Here, we define an indicator variable, HIGPA, to be 1 if the GPA is 3.0 or better and 0 otherwise.
14.33 Use high school grades to predict high grade point averages. Use a logistic regression to predict HIGPA using the three high school grade summaries as explanatory variables.
1. Summarize the results of the hypothesis test that the coefficients for all three explanatory variables are zero.
2. Give the coefficient for high school math grades with a 95% confidence interval. Do the same for the two other predictors in this model.
3. Summarize your conclusions based on parts 1 and 2.
14.34 Use SAT scores to predict high grade point averages. Use a logistic regression to predict HIGPA using the SATM and SATCR scores as explanatory variables.
1. Summarize the results of the hypothesis test that the coefficients for both explanatory variables are zero.
2. Give the coefficient for the SATM score with a 95% confidence interval. Do the same for the SATCR score.
3. Summarize your conclusions based on parts 1 and 2.
14.35 Use high school grades and SAT scores to predict high grade point averages. Run a logistic regression to predict HIGPA using the three high school grade summaries and the two SAT scores as explanatory variables. We want to produce an analysis that is similar to that done for the case study in Chapter 11.
1. Test the null hypothesis that the coefficients of the three high school grade summaries are zero; that is, test $H_0\colon \beta_{\text{HSM}} = \beta_{\text{HSS}} = \beta_{\text{HSE}} = 0$.
2. Test the null hypothesis that the coefficients of the two SAT scores are zero; that is, test $H_0\colon \beta_{\text{SATM}} = \beta_{\text{SATCR}} = 0$.
3. What do you conclude from the tests in parts 1 and 2?
14.36 Is there an effect of sex? In this exercise, we investigate the effect of sex (coded as 0 for males and 1 for females) on the odds of getting a high GPA.
1. Use sex to predict HIGPA using a logistic regression. Summarize the results.
2. Perform a logistic regression using sex and the two SAT scores to predict HIGPA. Summarize the results.
3. Compare the results of parts 1 and 2 with respect to how sex relates to HIGPA. Summarize your conclusions.

PUTTING IT ALL TOGETHER

14.37 Finding the best model. In Example 14.14 (page 14-18), we looked at a multiple logistic regression for movie profitability based on three explanatory variables. Complete the analysis by looking at the three models that include two explanatory variables and the three models that include only one variable. Create a table that includes the parameter estimates and their P-values as well as the overall X² statistic and degrees of freedom. Based on the results, which model do you feel is the best? Explain your answer.
14.38 Tipping behavior in Canada. The Consumer Report on Eating Share Trends (CREST) contains data from all provinces of Canada detailing away-from-home food purchases by roughly 4000 households per quarter. Researchers recently restricted their attention to restaurants at which tips would normally be given.⁷ From a total of 73,822 observations, “high” and “low” tipping variables were created based on whether the observed tip rate was above 20% or below 10%, respectively. They then used logistic regression to identify explanatory variables associated with either “high” or “low” tips. The following table summarizes what they termed the stereotype-related variables for the low-tip analysis:

Explanatory variable           Odds ratio
Senior adult                   1.099
Sunday                         1.098
English as second language     1.142
French-speaking Canadian       1.163
Alcoholic drinks               0.713
Lone male                      0.858

All coefficients were significant at the 0.01 level. Write a short summary explaining these results in terms of the odds of leaving a low tip.
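An odds ratio is often easier to describe as a percent change in the odds. The sketch below converts each odds ratio in the table to the percent by which the odds of a low tip are higher or lower when the characteristic is present; the dictionary and variable names are illustrative only.

```python
# Odds ratios copied from the table of stereotype-related variables.
odds_ratios = {
    "Senior adult": 1.099,
    "Sunday": 1.098,
    "English as second language": 1.142,
    "French-speaking Canadian": 1.163,
    "Alcoholic drinks": 0.713,
    "Lone male": 0.858,
}

# Percent change in the odds of a low tip: (odds ratio - 1) * 100.
percent_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}

for name, change in percent_change.items():
    direction = "higher" if change > 0 else "lower"
    print(f"{name}: odds of a low tip are {abs(change):.1f}% {direction}")
```

These are changes in odds, not in probabilities, so a summary should word them accordingly.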
