Chapter 10 Inference for Regression

Introduction

In this chapter, we continue our study of relationships between variables and describe methods for inference when there is a quantitative response variable and a single quantitative explanatory variable. These methods will help us address questions such as the following:

Is the trend in the annual number of tornadoes reported in the United States approximately linear? If so, what is the average yearly increase in the number of tornadoes? How many are predicted for next year?
For female college students, is a greater number of steps per day associated with a lower body mass index? How strong is the predictive relationship?
Is there a strong positive correlation between a state’s adult binge-drinking rate and the prevalence of underage drinking?

We first met the sample mean in Chapter 1, as a measure of the center of a collection of observations. Later, we learned that when the data are a random sample from a population, the sample mean is an unbiased estimate of the population mean μ. We then used the sample mean in Chapters 6 and 7 as the basis for confidence intervals and significance tests for inference about μ.

Now we take this same approach for the problem of fitting straight lines to data. In Chapter 2, we met the least-squares regression line ŷ = b0 + b1x as a description of a straight-line relationship between a response variable y and an explanatory variable x. At that point, however, we did not distinguish between sample and population. In this chapter, we think of the least-squares line computed from the sample as an estimate of the true population regression line.

Following the common practice of using Greek letters for population parameters, we write this population line as β0 + β1x. This notation reminds us that the intercept b0 of the fitted line estimates the intercept β0 of the population line, and the fitted slope b1 estimates the population slope β1.
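To make the distinction between the fitted line and the population line concrete, here is a minimal Python sketch. It assumes a hypothetical population line with β0 = 2 and β1 = 0.5 and Normal deviations about that line with standard deviation 1. None of these values come from the text, and the function name least_squares_line is ours. The sketch draws one random sample and computes b0 and b1 from the usual least-squares formulas.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope for y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical population values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                         # explanatory variable
y = BETA0 + BETA1 * x + rng.normal(0, SIGMA, size=50)   # response with Normal deviations

b0, b1 = least_squares_line(x, y)
print(f"population line: {BETA0} + {BETA1}x")
print(f"fitted line:     {b0:.3f} + {b1:.3f}x")
```

Changing the seed gives a different sample and therefore a different fitted line, while the population line stays fixed. That sample-to-sample variation in b0 and b1 is what inference for regression has to account for.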

Our discussion begins with an overview of the simple linear regression model and inference about the slope β1 and the intercept β0. Because regression lines are most often used for prediction, we then consider inference about either the mean response or an individual future observation of y for a given value of the explanatory variable x. We conclude the chapter with more of the computational details, including the use of analysis of variance (ANOVA). If you plan to read Chapter 11 on regression involving more than one explanatory variable, these details will be very useful.