In this chapter, we continue our study of relationships between
variables and describe methods for inference when there is a
quantitative response variable and a single quantitative explanatory
variable. These methods will help us address questions such as the
following:
- Is the trend in the annual number of tornadoes reported in the
  United States approximately linear? If so, what is the average
  yearly increase in the number of tornadoes? How many are predicted
  for next year?
- For female college students, is a greater number of steps per day
  associated with a lower body mass index? How strong is the
  predictive relationship?
- Is there a strong positive correlation between a state’s adult
  binge-drinking rate and the prevalence of underage drinking?

We first met the sample mean in Chapter 1, as a measure of the center
of a collection of observations. Later, we learned that when the data
are a random sample from a population, the sample mean is an unbiased
estimate of the population mean $\mu$. We then used the sample mean in
Chapters 6 and 7 as the basis for confidence intervals and
significance tests for inference about $\mu$.

Now we take this same approach for the problem of fitting straight
lines to data. In Chapter 2, we met the least-squares regression line
$\hat{y} = b_0 + b_1 x$ as a description of a straight-line
relationship between a response variable $y$ and an explanatory
variable $x$. At that point, however, we did not distinguish between
sample and population. In this chapter, we will now think of the
least-squares line computed from the sample as an estimate of the true
population regression line. Following the common practice of using
Greek letters for population parameters, we write this population line
as $\beta_0 + \beta_1 x$. This notation reminds us that the intercept
of the fitted line $b_0$ estimates the intercept of the population
line $\beta_0$, and the fitted slope $b_1$ estimates the slope of the
population line $\beta_1$.
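
The idea that the fitted line estimates a population line can be
illustrated with a small simulation. The sketch below is not from the
text: the population values BETA0, BETA1, and SIGMA, the helper names
least_squares and draw_sample, and the choice of Normal deviations
about the line are all assumptions made only for illustration. It
draws many samples from a known population line, fits the
least-squares line to each, and averages the fitted slopes.

```python
import random


def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical population line and spread, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0


def draw_sample(n, rng):
    """Draw n (x, y) pairs scattered about the assumed population line."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Assumption: deviations from the line are Normal with SD SIGMA.
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys


rng = random.Random(1)  # fixed seed so the run is repeatable
slopes = []
for _ in range(1000):
    xs, ys = draw_sample(30, rng)
    b0, b1 = least_squares(xs, ys)
    slopes.append(b1)

mean_slope = sum(slopes) / len(slopes)
print(f"mean of fitted slopes: {mean_slope:.3f} (population slope {BETA1})")
```

Each individual $b_1$ differs from sample to sample, but under these
assumptions the fitted slopes center on the population slope, in the
same way that sample means center on $\mu$.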

Our discussion begins with an overview of the simple linear regression
model and inference about the slope $\beta_1$ and the intercept
$\beta_0$. Because regression lines are most often used for
prediction, we then consider inference about either the mean response
or an individual future observation of $y$ for a given value of the
explanatory variable $x$. We conclude the chapter with more of the
computational details, including the use of analysis of variance
(ANOVA). If you plan to read Chapter 11 on regression involving more
than one explanatory variable, these details will be very useful.