In Chapters 1 and 2, we studied various numerical summaries that describe key features of a data set. In Chapter 3, we learned that these summaries are only as good as the methods by which the data are produced and that random sampling or randomized experimentation provides the most convincing evidence for conclusions. In this chapter, we begin our study of statistical inference, the process of drawing conclusions about a population from data. Our first step is to combine the methods that produce and summarize data with the basic concepts of probability (Chapter 4) to construct a sampling distribution. Such distributions provide the link between the population and the sample summary and are used to address questions such as the following:
We start this chapter with a description of the foundation for statistical inference. In Section 5.1, we not only discuss the use of statistics from a sample as estimates of parameters of a population but also describe the chance variation of a statistic when the data are produced by random sampling or randomized experimentation.
The sampling distribution of a statistic shows how the statistic would vary in identical repeated data collections. That is, the sampling distribution is a probability distribution that answers the question “What would happen if we did this experiment or sampling many times?” Understanding these distributions is the key to understanding statistical inference.
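To make the "many times" question concrete, the following is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not a method from this chapter: the population (10,000 values drawn from a right-skewed exponential distribution), the sample size of 25, the 2,000 repetitions, and the helper name `simulate_sample_means` are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, repetitions, seed=0):
    """Draw `repetitions` simple random samples (without replacement)
    and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


# Hypothetical population: 10,000 values from a right-skewed distribution.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

# Repeat the same data collection many times and record the statistic.
means = simulate_sample_means(population, sample_size=25, repetitions=2_000)

# Compare the center and spread of the population with those of the
# simulated sample means.
print("population mean:      ", statistics.mean(population))
print("mean of sample means: ", statistics.mean(means))
print("population std. dev.: ", statistics.pstdev(population))
print("std. dev. of means:   ", statistics.stdev(means))
```

The list `means` approximates the sampling distribution of the sample mean for this particular population and sample size. In a run of this sketch, the sample means should center near the population mean and vary much less than individual observations do.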
The last two sections of this chapter study the sampling distributions of two common statistics: the sample mean (for quantitative data) and the sample proportion or sample count (for categorical data). The general framework for constructing a sampling distribution is the same for all statistics, so we focus on statistics that are commonly used in inference. As part of this study, we revisit the Normal distributions and introduce two common discrete probability distributions: the binomial and Poisson distributions.
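As a preview of the categorical case, here is a small hedged sketch that simulates the sample count and sample proportion of "successes" across many repeated samples. The success probability `p = 0.3`, the sample size `n = 50`, the 5,000 repetitions, and the function name `simulate_counts` are assumptions chosen only for illustration. Each observation is treated as an independent success or failure, which is the setting in which the count follows a binomial distribution.

```python
import random
import statistics


def simulate_counts(p, n, repetitions, seed=0):
    """For each repetition, generate n independent success/failure
    outcomes with success probability p and return the success count."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n))
        for _ in range(repetitions)
    ]


p, n = 0.3, 50  # hypothetical values for illustration
counts = simulate_counts(p, n, repetitions=5_000)
proportions = [count / n for count in counts]

# Under the binomial model, the count has mean n*p and the
# proportion has mean p; the simulated averages should be close.
print("mean count:     ", statistics.mean(counts), "vs n*p =", n * p)
print("mean proportion:", statistics.mean(proportions), "vs p =", p)
```

The lists `counts` and `proportions` approximate the sampling distributions of the two statistics. A histogram of either one would show the shape that the later sections describe.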