14.23 Another logistic model for cell phones and age. Refer to Exercise 14.5 (page 14-10). Suppose that you use the actual value of age in years as the explanatory variable in a logistic regression model.
(a) Describe the statistical model for logistic regression in this setting.
(b) Interpret
(c) This model requires an assumption that is not needed in the model that you described in the previous exercise. Explain the assumption and describe a method for examining whether or not it is a reasonable assumption to make for these data. (Hint: Refer to Exercise 14.5 and Figure 14.7, pages 14-10 and 14-15.)
14.24 Compare the multiple logistic regression analysis with the two-way table. The data analyzed in Figure 14.11 were studied in Exercises 9.1, 9.3, 9.5, and 9.7 (pages 503 to 504) using a
14.25 Exergaming in Canada. Exergames are active video games such as rhythmic dancing games, virtual bicycles, balance board simulators, and virtual sports simulators that require a screen and a console. A study of exergaming by students in grades 10 and 11 in Montreal, Canada, examined many factors related to participation in exergaming.5 Of the 358 students who reported that they stressed about their health, 29.9% said that they were exergamers. Of the 851 students who reported that they did not stress about their health, 20.8% said that they were exergamers. Analyze these data using logistic regression and write a summary of your analytical approach, your results, and your conclusions.
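One possible starting point for Exercise 14.25 is sketched below. With a single 0/1 explanatory variable, the logistic regression estimates can be computed directly from the two-way counts: the slope is the log of the sample odds ratio, and its standard error is the square root of the sum of the reciprocals of the four cell counts. The counts here are reconstructed by rounding the reported percentages, which is an assumption, and the function name `logit_from_2x2` is hypothetical.

```python
import math

def logit_from_2x2(events_1, n_1, events_0, n_0):
    """Estimates for log(odds) = b0 + b1*x with one 0/1 predictor x.

    events_1, n_1: successes and group size when x = 1
    events_0, n_0: successes and group size when x = 0
    """
    odds_1 = events_1 / (n_1 - events_1)
    odds_0 = events_0 / (n_0 - events_0)
    b0 = math.log(odds_0)            # log odds in the x = 0 group
    b1 = math.log(odds_1 / odds_0)   # log of the odds ratio
    # Large-sample standard error of b1 from the four cell counts
    se_b1 = math.sqrt(
        1 / events_1 + 1 / (n_1 - events_1)
        + 1 / events_0 + 1 / (n_0 - events_0)
    )
    z = b1 / se_b1
    # 95% confidence interval for the odds ratio
    ci = (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "z": z,
            "odds_ratio": math.exp(b1), "ci_95": ci}

# Assumption: counts recovered by rounding the reported percentages.
stressed_exergamers = round(0.299 * 358)
not_stressed_exergamers = round(0.208 * 851)

fit = logit_from_2x2(stressed_exergamers, 358, not_stressed_exergamers, 851)
for name, value in fit.items():
    print(name, value)
```

Statistical software should give essentially the same slope and standard error; small differences can arise from the rounding used to recover the counts.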
14.26 More exergaming in Canada. Refer to the previous exercise. Another explanatory variable reported in this study was the amount of television watched per day. Of the 54 students who reported that they watched no TV, 11.1% were exergamers; for the 776 students who watched some TV but less than two hours, 20.6% were exergamers; and for the 370 students who watched two or more hours, 31.1% were exergamers. Use logistic regression to examine the relationship between TV watching and exergaming. Write a summary of your analytical approach, your results, and your conclusions.
14.27 Interpret the fitted model. If we apply the exponential function to the fitted model in Example 14.9 (page 14-9), we get
Show that for any value of the quantitative explanatory variable x, the odds ratio for increasing x by 1,
is
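A sketch of the algebra behind this kind of result, written with generic fitted coefficients b0 and b1 as placeholders (an assumption, not the numerical values from Example 14.9):

```latex
\mathrm{odds}_x = e^{b_0 + b_1 x} = e^{b_0}\, e^{b_1 x}
\qquad\Longrightarrow\qquad
\frac{\mathrm{odds}_{x+1}}{\mathrm{odds}_x}
  = \frac{e^{b_0}\, e^{b_1 (x+1)}}{e^{b_0}\, e^{b_1 x}}
  = e^{b_1}
```

The ratio does not depend on x, which is why a single odds ratio summarizes the effect of a one-unit increase.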
14.28 z and the
Use the information in each output to calculate the z statistic. Verify that they are essentially the same (with no roundoff, they would be equal). This z statistic has approximately the standard Normal distribution if the null hypothesis
Show that the square of z is close to the Wald statistic reported by SPSS and the
Note that Minitab uses a different calculation to obtain a
Comment on the reporting of P-values as 0.000 by Minitab and .000 by SPSS versus
14.29 An example of Simpson’s paradox. Here is an example of Simpson’s paradox: the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group. The data concern the comparison of success rates for 2-pointers and 3-pointers for two teams in a game of basketball (see Example 2.48, page 135). Here are the data for all shots combined:
Outcome | Team A | Team B |
---|---|---|
Made | 28 | 26 |
Missed | 32 | 34 |
Shots | 60 | 60 |
And here are the counts broken down by the type of shot:
Outcome | 2-pointers, Team A | 2-pointers, Team B | 3-pointers, Team A | 3-pointers, Team B |
---|---|---|---|---|
Made | 25 | 16 | 3 | 10 |
Missed | 25 | 14 | 7 | 20 |
Shots | 50 | 30 | 10 | 30 |
(a) Use a logistic regression to model the odds of making a shot with team as the explanatory variable. Summarize the results of your analysis and give a 95% confidence interval for the odds ratio that team A makes a shot relative to team B.
(b) Rerun your analysis in part (a) using the team (A or B) and the type of shot (2-pointer or 3-pointer) as explanatory variables. Summarize the results of your analysis and give a 95% confidence interval for the odds ratio of team A relative to team B.
(c) Explain Simpson’s paradox in terms of your results.
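As a hand check on the software output for Exercise 14.29, the odds ratios can be computed directly from the tables. The sketch below is not a logistic regression fit. It computes the combined odds ratio (which equals the exponentiated team coefficient in the team-only model), the odds ratio within each shot type, and a Mantel-Haenszel pooled odds ratio, used here as a stand-in that should be roughly comparable to the team odds ratio from the model that also includes shot type. The variable and function names are hypothetical.

```python
def odds_ratio(made_a, missed_a, made_b, missed_b):
    """Odds of making a shot for team A divided by the odds for team B."""
    return (made_a / missed_a) / (made_b / missed_b)

# (made_A, missed_A, made_B, missed_B) for each type of shot
shots = {
    "2-pointers": (25, 25, 16, 14),
    "3-pointers": (3, 7, 10, 20),
}

# Odds ratio ignoring shot type: add the counts across the two strata
combined_counts = [sum(cell) for cell in zip(*shots.values())]
combined_or = odds_ratio(*combined_counts)

# Odds ratio within each shot type
stratum_or = {shot: odds_ratio(*cells) for shot, cells in shots.items()}

# Mantel-Haenszel pooled odds ratio across shot types
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in shots.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in shots.values())
mh_or = numerator / denominator

print("combined:", combined_or)
print("by shot type:", stratum_or)
print("Mantel-Haenszel pooled:", mh_or)
```

The combined odds ratio is above 1 while both within-stratum odds ratios are below 1, which is the reversal that defines Simpson's paradox in these data.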
14.30 Reducing the number of workers. To be competitive in global markets, many corporations are undertaking major reorganizations. Often, these involve “downsizing” or a “reduction in force” (RIF), where substantial numbers of employees are terminated. Federal and various state laws require that employees be treated equally regardless of their age. In particular, employees over the age of 40 are in a “protected” class, and many allegations of discrimination focus on comparing employees over 40 with their younger coworkers. Here are the data for a recent RIF:
Terminated | Not over 40 | Over 40 |
---|---|---|
Yes | 8 | 48 |
No | 554 | 745 |
(a) Write the logistic regression model for this problem using the log odds of a RIF as the response variable and an indicator for over and under 40 years of age as the explanatory variable.
(b) Explain the assumption concerning binomial distributions in terms of the variables in this exercise. To what extent do you think that these assumptions are reasonable?
(c) Find
(d) Transform the results to the odds scale and write a short summary of your work and conclusion.
(e) If additional explanatory variables were available (for example, a performance evaluation), how would you use this information to study the RIF?
14.31 Another example of Simpson’s paradox. Refer to Exercises 2.105 and 2.106 (page 139). Using Exercise 14.29 as a guide, analyze these data using logistic regression.
14.32 Predicting physical activity. Participation in physical activities typically declines between high school and young adulthood. This suggests that postsecondary institutions may be an ideal setting to address physical activity. A study looked at the association between physical activity and several behavioral and perceptual characteristics among midwestern college students.6 Of 663 students who met the vigorous activity guidelines for the previous week, 169 reported eating fruit two or more times per day. Of the 471 who did not meet the vigorous activity guidelines in the previous week, 68 reported eating fruit two or more times per day. Model the log odds of vigorous activity using an indicator variable for eating fruit two or more times per day as the explanatory variable. Summarize your findings.
The following four exercises use the GPAHI data file. We examine models for relating success as measured by the GPA to several explanatory variables. In Chapter 11, we used multiple regression methods for our analysis. Here, we define an indicator variable, HIGPA, to be 1 if the GPA is 3.0 or better and 0 otherwise.
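The HIGPA definition can be sketched in a few lines. The GPA values below are made up purely for illustration and are not taken from the GPAHI data file; the function name `higpa_indicator` is likewise hypothetical.

```python
def higpa_indicator(gpa, cutoff=3.0):
    """Return 1 if the GPA is at or above the cutoff (3.0 or better), else 0."""
    return 1 if gpa >= cutoff else 0

# Illustrative values only (not from the GPAHI data file)
example_gpas = [2.4, 3.0, 3.7, 2.99]
example_higpa = [higpa_indicator(gpa) for gpa in example_gpas]
print(example_higpa)
```

Note that a GPA of exactly 3.0 is coded as 1, matching the phrase "3.0 or better."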
14.33 Use high school grades to predict high grade point averages. Use a logistic regression to predict HIGPA using the three high school grade summaries as explanatory variables.
(a) Summarize the results of the hypothesis test that the coefficients for all three explanatory variables are zero.
(b) Give the coefficient for high school math grades with a 95% confidence interval. Do the same for the two other predictors in this model.
(c) Summarize your conclusions based on parts (a) and (b).
14.34 Use SAT scores to predict high grade point averages. Use a logistic regression to predict HIGPA using the SATM and SATCR scores as explanatory variables.
(a) Summarize the results of the hypothesis test that the coefficients for both explanatory variables are zero.
(b) Give the coefficient for the SATM score with a 95% confidence interval. Do the same for the SATCR score.
(c) Summarize your conclusions based on parts (a) and (b).
14.35 Use high school grades and SAT scores to predict high grade point averages. Run a logistic regression to predict HIGPA using the three high school grade summaries and the two SAT scores as explanatory variables. We want to produce an analysis that is similar to that done for the case study in Chapter 11.
(a) Test the null hypothesis that the coefficients of the three high school grade summaries are zero; that is, test
(b) Test the null hypothesis that the coefficients of the two SAT scores are zero; that is, test
(c) What do you conclude from the tests in (a) and (b)?
14.36 Is there an effect of sex? In this exercise, we investigate the effect of sex (coded as 0 for males and 1 for females) on the odds of getting a high GPA.
(a) Use sex to predict HIGPA using a logistic regression. Summarize the results.
(b) Perform a logistic regression using sex and the two SAT scores to predict HIGPA. Summarize the results.
(c) Compare the results of parts (a) and (b) with respect to how sex relates to HIGPA. Summarize your conclusions.
14.37 Finding the best model. In Example 14.14 (page 14-18), we looked at a multiple logistic regression for movie profitability based on three explanatory variables. Complete the analysis by looking at the three models that include two explanatory variables and the three models that include only one variable. Create a table that includes the parameter estimates and their P-values as well as the overall
14.38 Tipping behavior in Canada. The Consumer Report on Eating Share Trends (CREST) contains data from all provinces of Canada detailing away-from-home food purchases by roughly 4000 households per quarter. Researchers recently restricted their attention to restaurants at which tips would normally be given.7 From a total of 73,822 observations, “high” and “low” tipping variables were created based on whether the observed tip rate was above 20% or below 10%, respectively. They then used logistic regression to identify explanatory variables associated with either “high” or “low” tips. The following table summarizes what they termed the stereotype-related variables for the low-tip analysis:
Explanatory variable | Odds ratio |
---|---|
Senior adult | 1.099 |
Sunday | 1.098 |
English as second language | 1.142 |
French-speaking Canadian | 1.163 |
Alcoholic drinks | 0.713 |
Lone male | 0.858 |
All coefficients were significant at the 0.01 level. Write a short summary explaining these results in terms of the odds of leaving a low tip.
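When writing the summary, it can help to restate each odds ratio as a percent change in the odds of a low tip, since an odds ratio of r corresponds to a change of (r − 1) × 100%. The sketch below does this conversion for the table above; the function name `percent_change_in_odds` is hypothetical.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

# Odds ratios for a low tip, copied from the table
low_tip_odds_ratios = {
    "Senior adult": 1.099,
    "Sunday": 1.098,
    "English as second language": 1.142,
    "French-speaking Canadian": 1.163,
    "Alcoholic drinks": 0.713,
    "Lone male": 0.858,
}

percent_changes = {
    name: percent_change_in_odds(ratio)
    for name, ratio in low_tip_odds_ratios.items()
}

for name, change in percent_changes.items():
    direction = "higher" if change > 0 else "lower"
    print(f"{name}: odds of a low tip are {abs(change):.1f}% {direction}")
```

These are changes in odds, not in probabilities, and each one compares otherwise similar observations that differ only in the listed variable.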