Chapter 11 CHECK-IN QUESTIONS

  1. 11.1

    1. Final exam scores.

    2. 166.

    3. Seven.

    4. Math course anxiety, math test anxiety, numerical task anxiety, enjoyment, self-confidence, motivation, and perceived usefulness of the feedback sessions.

  2. 11.3 GPA is left-skewed, and there may be extreme observations at 0 and 0.5. SATM, SATCR, and SATW all look normally distributed, without any extreme values.

  3. 11.5 Yes, they are more or less randomly dispersed around 0.

Chapter 11 EXERCISES

  1. 11.1

    1. A small P-value indicates that at least one explanatory variable is significant.

    2. R2 is not obtained from squaring and adding the pairwise correlations.

    3. The null hypothesis should be stated in terms of the parameter β2.

  2. 11.3

    1. The response variable is dog life expectancy.

    2. n=168.

    3. p=2.

    4. The explanatory variables are the breed’s autosomal inbreeding coefficient and the logarithm of the adult male average weight.

  3. 11.5

    1. Life expectancy decreases as weight and inbreeding level increase.

    2. y^=9.25.

    3. e=1.77.

  4. 11.7

    1. (0.0334, 12.9666).

    2. (0.0344, 13.0344).

    3. (0.1232, 9.9232).

    4. (0.124, 9.676).

  5. 11.9

    1. y=β0+β1x1+β2x2+β3x3+β4x4+β5x5+β6x6+ϵ, where ϵ~N(0, σ) and independent.

    2. The sources of variation are model (DFM=p=6), error (DFE=n−p−1=176), and total (DFT=n−1=182).
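
The degrees of freedom in item 2 of 11.9 can be checked with a minimal sketch. It assumes n = 183 cases (implied by DFT = 182) and p = 6 explanatory variables; the function name `anova_df` is hypothetical and not from the text.

```python
def anova_df(n, p):
    """Return (DFM, DFE, DFT) for a multiple regression with n cases and p predictors."""
    dfm = p          # model degrees of freedom
    dfe = n - p - 1  # error degrees of freedom
    dft = n - 1      # total degrees of freedom
    return dfm, dfe, dft

print(anova_df(183, 6))  # (6, 176, 182)
```

Note that DFM + DFE = DFT, which is a quick consistency check on any ANOVA table.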

  6. 11.11

    1. Things seem as expected: the three anxiety variables have negative signs, and the other four variables all have positive signs.

    2. 7 and 158.

    3. Only the variable Feedback usefulness is significant (t=3.320, P-value=0.0011) at the 0.05 level.

  7. 11.13 R2=90/510=0.1765

    Source DF Sum of squares Mean square F
    Model  3  90 30  2.857
    Error 40 420 10.5
    Total 43 510
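
The entries in the 11.13 table follow from the sums of squares and degrees of freedom alone. The sketch below recomputes them; the function name `anova_table` is a hypothetical helper, not something from the text.

```python
def anova_table(ssm, sse, dfm, dfe):
    """Compute mean squares, the F statistic, and R^2 from an ANOVA decomposition."""
    msm = ssm / dfm              # mean square for the model
    mse = sse / dfe              # mean square for error
    return {
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,          # F = MSM / MSE
        "R2": ssm / (ssm + sse), # R^2 = SSM / SST
    }

table = anova_table(ssm=90, sse=420, dfm=3, dfe=40)
print(round(table["F"], 3), round(table["R2"], 4))  # 2.857 0.1765
```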
  8. 11.15 (a–d) Answers will vary.

  9. 11.17

    1. 10, which is the slope for x.

    2. 5, which is the slope for x.

    3. 5, which is the slope for x. Yes, it is true in general, as long as x is an indicator variable with values 0 and 1.
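
The general claim in item 3 of 11.17 can be illustrated with a small sketch. It assumes the exercise asks for the change in the mean response when x moves from 0 to 1; the data below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical data: x is a 0/1 indicator, y is the response.
x = [0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 20, 22, 24]

def mean(values):
    return sum(values) / len(values)

# Least-squares slope and intercept for simple linear regression.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Difference between the two group means.
mean_0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
mean_1 = mean([yi for xi, yi in zip(x, y) if xi == 1])

print(slope, mean_1 - mean_0)  # 10.0 10.0
```

With a 0/1 indicator, the fitted value at x = 0 is the intercept and the fitted value at x = 1 is the intercept plus the slope, so the slope equals the difference in group means.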

  10. 11.19

    1. df=(12,110), P-value<0.0001.

    2. No, the overall model is significant.

    3. Division, November, Weekend, Night, and Promotion are all significant in the presence of all the other explanatory variables.

    4. 52%.

    5. 15246.36.

    6. A prediction interval is more appropriate to represent this particular case.

  11. 11.21

    1. Teaching and Research are both right-skewed. Citations is left-skewed.

    2. Teaching and Research are very strongly linearly related (r=0.8906). Citations does not appear to be related to either Teaching or Research (r=0.04901 and 0.002, respectively).

  12. 11.23

    1. Overall=β0+β1Teaching+β2Research+β3Citations+ϵ, ϵ~N(0, σ) and independent.

    2. F=1384.08, P-value<0.0001, y^=4.69649+0.22313Teaching+0.38737Research+0.31088Citations.

    3. The residuals look evenly distributed around 0.

    4. For Teaching: (0.17659, 0.26967); for Research: (0.34649, 0.42825); for Citations: (0.27350, 0.34825).

    5. R2=98.90%; s=1.23731.
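
A t confidence interval for a regression coefficient has the form estimate ± margin, so each estimate in item 2 of 11.23 should sit at the midpoint of its interval in item 4. The sketch below checks this using only the numbers reported above; the variable names are illustrative.

```python
# Coefficient estimates from item 2 and interval endpoints from item 4 of 11.23.
estimates = {"Teaching": 0.22313, "Research": 0.38737, "Citations": 0.31088}
intervals = {
    "Teaching": (0.17659, 0.26967),
    "Research": (0.34649, 0.42825),
    "Citations": (0.27350, 0.34825),
}

summary = {}
for name, (lower, upper) in intervals.items():
    midpoint = (lower + upper) / 2  # should match the estimate up to rounding
    margin = (upper - lower) / 2    # equals t* times the standard error
    summary[name] = (midpoint, margin)
    print(f"{name}: midpoint={midpoint:.5f}, margin={margin:.5f}")
```

None of the three intervals contains 0, which agrees with all three coefficients being significant at the 0.05 level.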

  13. 11.25 Generally, all four plots show the same random scattering, and the conditions are met for a multiple regression model.

  14. 11.27 HSS and SATM are significant or very close (at the 0.05 level) in each of the models we considered; thus, these two variables would definitely be included. HSM, HSE, and SATW were not significant in any of the models we considered, so we likely would not want them in our model. If we had to choose from one of the four given, the model with HSM, HSS, and SATM seems like the best candidate.

  15. 11.29

    1. F=6.94, DF=4 and 22, P-value=0.0009.

    2. R2=55.77%.

    3. Only Admit has a significant t test (t=3.38, P-value=0.0027), and the other three are not significant when added to the model last.
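
The F statistic and R2 in 11.29 are linked: since F = (R2/DFM) / ((1 − R2)/DFE), solving for R2 gives R2 = DFM·F / (DFM·F + DFE). The sketch below applies this identity to the values in item 1; the function name `r2_from_f` is hypothetical.

```python
def r2_from_f(f, dfm, dfe):
    """Recover R^2 from the overall F statistic and its degrees of freedom."""
    return dfm * f / (dfm * f + dfe)

r2 = r2_from_f(6.94, dfm=4, dfe=22)
print(round(r2, 4))  # 0.5579
```

The result agrees with the reported 55.77% up to rounding in F.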

  16. 11.31

    1. For Model 1: 200; for Model 2: 199.

    2. t=3.09, P-value=0.0023.

    3. For Gene expression: t=2.44, P-value=0.0156; for RB.composite: t=3.33, P-value=0.0010.

    4. The relationship is still positive after adjusting for RB. When gene expression increases by 1, popularity increases by 0.204 in Model 1 and by 0.161 in Model 2 (with RB fixed).

  17. 11.33

    1. 8 and 786.

    2. 7.84%; it is not very predictive.

    3. Males and Hispanics consume energy drinks more frequently. Consumption also increases with risk-taking scores.

    4. Within a group of students with identical (or similar) values of those other variables, energy-drink consumption increases with increasing jock identity and increasing risk taking.

  18. 11.35

    1. F=10.44, P-value<0.0001, y^=23.39556−0.68175x1+0.10195x2.

    2. 17.71%.

    3. No violations.

    4. H0: β2=0, Ha: β2≠0; t=1.83, P-value=0.0696.

  19. 11.37

    1. Budget and Opening are right-skewed. Theaters and Ratings are left-skewed.

    2. The correlations are 0.403, 0.570, 0.625, 0.281, 0.151, and 0.022. Budget, Opening, and Theaters have the largest correlations among them (first three listed); Ratings is not highly correlated with any of the other three (last three listed).

  20. 11.39

    1. F=32.28, P-value<0.0001, USRevenue=β0+β1Budget+β2Opening+β3Theaters+β4Ratings+ϵ, where ϵ~ N(0,σ) and independent.

    2. y^=170.40874+0.09252Budget+1.91600Opening+0.02961Theaters+17.29923Ratings.

    3. The residual plot shows a slight downward trend, suggesting another model may be more appropriate.

    4. R2=77.26%.

  21. 11.41

    1. ($25,514,500, $158,200,600).

    2. ($28,098,300, $154,239,900).

    3. The intervals are similar.

  22. 11.43

    1. GINI and Corrupt skewed to the right, the other three skewed to the left. GINI, Democracy, and Life have the most skewness.

    2. LSI seems moderately correlated with Corrupt, Democracy, and Life (r=0.6974, 0.6092, and 0.7219) but is not related to GINI much at all (r=0.0503). Among the others, only Corrupt seems to be moderately related to both Democracy and Life (r=0.7474 and 0.6503); other relationships appear to be weak.

  23. 11.45

    1. Refer to your regression output.

    2. For example, the t statistic for the GINI coefficient grows from t=0.42 (P=0.675) to t=4.25 (P<0.0007). The Democracy t is 3.53 in the third model (P<0.0007) but drops to 0.71 (P=0.479) in the fourth model.

    3. A good choice is to use GINI, Life, and Corrupt. All three coefficients are significant, and R2=70%.

  24. 11.47

    1. F=22.34, P-value<0.0001, y^=334.03+19.50OC. The residual plot shows a possible outlier.

    2. F=21.62, P-value<0.0001, y^=57.70+6.41OC+53.87TRAP. TRAP is much more significant (t=3.50, P-value=0.0016) in this model than OC (t=1.25, P-value=0.2210).

  25. 11.49 All variables are Normal when log transformed. All pairs are positively associated: strongest between LVO+ and LVO− (r=0.8396) and weakest between LOC and LVO− (r=0.5545). Using logOC: y^=4.39+0.71logOC, t=6.57, P<0.0001, R2=59.83%, s=0.36. Using logOC and logTRAP: y^=4.26+0.43logOC+0.42logTRAP, t=2.56, P=0.0162, t=2.06, P=0.0484, R2=65.14%, s=0.34. Using all three: y^=0.87+0.39logOC+0.03logTRAP+0.67logVO−, t=3.40, P=0.0021, t=0.17, P=0.8624, t=5.71, P<0.0001, R2=84.21%, s=0.23. The best model uses only logOC and logVO−: y^=0.83298+0.40589logOC+0.68159logVO−, R2=84.19%, s=0.23.

  26. 11.51 Using logOC: y^=5.21+0.44logOC, t=3.59, P=0.0012, R2=30.75%, s=0.41. Using logOC and logTRAP: y^=5.04+0.06logOC+0.59logTRAP, t=0.31, P=0.7618, t=2.61, P=0.0144, R2=44.30%, s=0.37. Using all three: y^=1.57−0.29logOC+0.24logTRAP+0.81logVO+, t=−2.08, P=0.0468, t=1.47, P=0.1523, t=5.71, P<0.0001, R2=74.77%, s=0.26. The best model uses logVO+ alone: y^=1.75657+0.7305logVO+, R2=70.49%, s=0.27.

  27. 11.53

    1. PCB=β0+β1PCB52+β2PCB118+β3PCB138+β4PCB180+ϵ, where ϵ~N(0, σ) and independent.

    2. F=1456.18, P-value<0.0001, y^=0.93692+11.87270PCB52+3.76107PCB118+3.88423PCB138+4.18230PCB180. All individual predictors are significant.

    3. The residual plot shows a possible violation of constant variance. The residuals are Normal, except for two possible outliers.

  28. 11.55

    1. F=786.71, P-value<0.0001, y^=1.01840+12.64419PCB52+0.31311PCB118+8.25459PCB138.

    2. b118=0.31311, P-value=0.7083.

    3. b118=3.76107, P-value<0.0001.

    4. When we add PCB180 to the model, it makes PCB118 useful for prediction.

  29. 11.57 TEQ=β0+β1PCB52+β2PCB118+β3PCB138+β4PCB180+ϵ, where ϵ~N(0, σ) and independent. F=33.53, P-value<0.0001. Only PCB118 tests as significant individually. The residual plot shows a couple of potential outliers, which also cause a slight right-skew in the Normal quantile plot.

  30. 11.59

    1. The correlations are all positive; the largest correlation is 0.956 (LPCB and LPCB138), and the smallest is 0.227 (LPCB28 and LPCB180). There is one outlier (specimen 39) in LPCB28; it stands out because of the “stack” of values in the LPCB126 data set that arose from the adjustment of the zero terms.

    2. All correlations are higher with the transformed data.

  31. 11.61 A good model includes logPCB28, logPCB118, and logPCB126; R2=0.7764. Adding more variables doesn’t increase R2 much.

  32. 11.63

    1. Taste: 24.53, 20.95, 16.26, 23.9. Acetic: 5.50, 5.43, 0.57, 0.66. H2S: 5.94, 5.33, 2.13, 3.69. Lactic: 1.44, 1.45, 0.30, 0.43. None of the variables show striking deviations from Normality in the quantile plots. Taste and H2S are slightly right-skewed, and Acetic has an irregular shape. There are no outliers.

  33. 11.65 F=12.11, P-value=0.0017, y^=−61.49861+15.64777Acetic. R2=30.20%. The residuals are Normally distributed, but the scatterplots show that the residuals are linearly related to both H2S and Lactic.

  34. 11.67 F=27.55, P-value<0.0001, y^=−29.85883+37.71995Lactic. R2=49.59%. The residuals are Normally distributed, but the scatterplots show that the residuals are linearly related to both H2S and Lactic.

  35. 11.69 y^=−26.94+3.801Acetic+5.146H2S with s=10.89 and R2=0.582. For Acetic: t=0.84 (P-value=0.406). This two-variable model is not much better than the model with H2S alone (which explained 57.1% of the variation in Taste).

  36. 11.71 y^=−28.88+0.328Acetic+3.912H2S+19.671Lactic with s=10.13. R2=65.2%. Acetic is not significant (P-value=0.942); there is no gain in adding Acetic to the model with H2S and Lactic. Residuals appear to be Normally distributed and show no patterns in scatterplots with explanatory variables. It appears that the H2S/Lactic model is best.
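
A least-squares fit always passes through the point of means, so the intercept of each Taste model can be recovered as the mean of Taste minus the sum of each slope times its variable's mean. The sketch below does this for the models in 11.65 through 11.71. It assumes the first number listed for each variable in 11.63 is its sample mean; the helper name `implied_intercept` is hypothetical.

```python
# Sample means, assumed to be the first value listed for each variable in 11.63.
means = {"Taste": 24.53, "Acetic": 5.50, "H2S": 5.94, "Lactic": 1.44}

# Slopes reported in 11.65, 11.67, 11.69, and 11.71.
models = {
    "Acetic": {"Acetic": 15.64777},
    "Lactic": {"Lactic": 37.71995},
    "Acetic+H2S": {"Acetic": 3.801, "H2S": 5.146},
    "Acetic+H2S+Lactic": {"Acetic": 0.328, "H2S": 3.912, "Lactic": 19.671},
}

def implied_intercept(slopes, means, response="Taste"):
    """Intercept implied by b0 = ybar - sum(b_j * xbar_j)."""
    return means[response] - sum(b * means[name] for name, b in slopes.items())

intercepts = {label: implied_intercept(s, means) for label, s in models.items()}
for label, b0 in intercepts.items():
    print(f"{label}: b0 is about {b0:.2f}")
```

Because the means are rounded to two decimals, the implied intercepts match the fitted ones only approximately, but the sign and size are clear: the intercept is negative in every model.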

  37. 11.73

    1. For Age, y^=0.046−0.484Age; for age of matched control, y^=0.366−0.315Age; for age of retired football player, y^=0.073−0.835Age.

    2. Education-matched controls, age, and age for retired football players have statistically significant impacts on volume.