5.2 The Sampling Distribution of a Sample Mean in Chapter 5 Sampling Distributions

5.2 The Sampling Distribution of a Sample Mean

When you complete this section, you will be able to:

Explain the difference between the sampling distribution of x¯ and the population distribution.
Determine the mean and standard deviation of x¯ for an SRS of size n from a population with mean μ and standard deviation σ.
Describe how many times larger n has to be for an SRS to reduce the standard deviation of x¯ by a certain factor.
Utilize the central limit theorem to approximate the sampling distribution of x¯ and perform probability calculations based on this approximation.

A variety of statistics are used to describe quantitative data. The sample mean, median, and standard deviation are all examples of statistics based on quantitative data. In the previous section, we learned that the general framework for constructing a sampling distribution is the same for all statistics and that we can approximate a sampling distribution through simulation. In this section, we will concentrate on the sample mean and study its sampling distribution using both simulation and statistical theory. Because sample means are just averages of observations, they are among the most frequently used statistics.

Suppose that you plan to survey 1000 undergraduates enrolled in four-year U.S. universities about their sleeping habits when at school. The sampling distribution of the average hours of sleep per night describes what this average would be if many simple random samples of 1000 students were drawn from the population of students in the United States. In other words, it gives you an idea of what you are likely to see from your survey. It tells you whether you should expect this average to be near the population mean and whether its margin of error is roughly ±1 hour or ±1 minute.

Before constructing this distribution, however, we need to consider another set of probability distributions that plays a role in statistical inference. Imagine choosing only one individual at random from the population and measuring a quantity. The values obtained from repeated draws of one individual from the population have a probability distribution that is called the population distribution.

Example 5.4 Total sleep time of college students.

A study of sleep quality and academic performance describes the distribution of sleep duration among students as approximately Normal, with a mean of 7.13 hours and standard deviation of 1.67 hours.⁴ Suppose that we select a college student at random and obtain their average sleep time. This result is a random variable X because, prior to the random sampling, we don’t know the sleep time. We do know, however, that in repeated sampling, X will have the same N(7.13, 1.67) distribution that describes the pattern of sleep time in the entire population. We call N(7.13, 1.67) the population distribution. This is the context in which we first met distributions, as density curves that provide models for the overall pattern of data.

In this example, the population of all college students actually exists so that we can, in principle, draw an SRS of students from it. Sometimes, our population of interest does not actually exist. For example, suppose that we are interested in studying final-exam scores in a statistics course, and we have the scores of the 103 students who took the course last semester. For the purposes of statistical inference, we might want to consider these 103 students as part of a hypothetical population of similar students who would take this course. In this sense, these 103 students represent not only themselves but also a larger population of similar students. The key idea is to think of the observations that you have as coming from a population with a probability distribution.

Check-in

5.6 Time spent using apps on a smartphone. In “How to Win Gen Z on Mobile,” App Annie reports that Gen Z smartphone users worldwide average 92.5 hours per month using the top 25 non-gaming apps.⁵
1. State the population that this report describes and the statistic.
2. Sketch what you think this population distribution would look like, making sure to specify its mean and median.

Now that we have made the distinction between the population distributions and sampling distributions, we can proceed with an in-depth study of the sampling distribution of a sample mean x¯. As we did in Example 5.3 (page 274), we will begin this study using simulation.

Example 5.5 Sample means are approximately Normal.

Data set icon for helprm.

Figure 5.6(a) displays the distribution of student visit lengths (in minutes) to a statistics help room at a large midwestern university. Students visiting the help room were asked to sign in upon arrival and then sign out when leaving. During the school year, there were 1838 visits to the help room but only 1264 recorded visit lengths because many visiting students forgot to sign out.⁶ The distribution is strongly skewed to the right, with a maximum visit length of 485 minutes. The population mean is μ=61.28 minutes.

Two histograms of visits to a help room. — Figure 5.6 (a) The distribution of 1264 visit lengths to a statistics help room during the school year, Example 5.5. (b) The distribution of the sample means x¯ for 500 random samples of size 50 from this population. The scales and histogram classes are exactly the same in both panels.

Table 5.1 Length (in minutes) of 50 visits to a statistics help room
5	6	14	15	20	20	20	28	30	30
30	30	31	33	35	40	41	41	41	50
50	55	55	55	55	55	60	60	60	65
65	65	66	67	75	75	80	85	85	86
90	90	98	99	110	122	142	150	160	165

Data set icon for help50.

Table 5.1 contains the lengths of a random sample of 50 visits from this population. The mean of these 50 visits is x¯=62.1 minutes. If we were to take another sample of size 50, we would likely get a different value of x¯. This is because this new sample would contain a different set of visits. To find the sampling distribution of x¯, we take many SRSs of size 50 and calculate x¯ for each sample. Figure 5.6(b) is the distribution of the values of x¯ for 500 random samples. The scales and choice of classes are exactly the same as in Figure 5.6(a) so that we can make a direct comparison.

Figure 5.6(b) illustrates several striking facts about the sampling distribution of a sample mean. First, the sample means are much less spread out than the individual visit lengths. Second, the sample means appear centered around the population mean μ=61.28. In fact, the mean of the 500 sample means is 61.09. Third, the Normal quantile plot in Figure 5.7 confirms that the distribution in Figure 5.6(b) is close to Normal. This is quite remarkable, given the skewed population distribution with several very large values.

A normal quantile plot of sample means. — Figure 5.7 Normal quantile plot of the 500 sample means in Figure 5.6(b). The distribution is close to Normal.
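The simulation behind Figure 5.6(b) can be sketched in a few lines of Python. The help-room data are not reproduced here, so this sketch assumes a stand-in right-skewed population (gamma-distributed values with a mean near 61 minutes). The population, the seed, and the function name simulate_sample_means are illustrative assumptions, not the authors' data or code.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, rng):
    """Draw `reps` simple random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the 1264 visit lengths: right-skewed values
# with mean about 61 minutes (gamma with shape 2 and scale 30.5).
population = [rng.gammavariate(2.0, 30.5) for _ in range(1264)]

means = simulate_sample_means(population, n=50, reps=500, rng=rng)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
print(f"population:   mean = {pop_mean:.2f}, sd = {pop_sd:.2f}")
print(f"sample means: mean = {statistics.fmean(means):.2f}, "
      f"sd = {statistics.stdev(means):.2f}")
```

With any reasonably skewed stand-in population, the two printed lines should show the first two facts: the sample means center near the population mean and are far less spread out than individual values.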

These three facts contribute heavily to the popularity of sample means in statistical inference. Let’s now study each of them more carefully using statistical theory.

The mean and standard deviation of x¯

The sample mean x¯ from a sample or an experiment is an estimate of the mean μ of the underlying population. The sampling distribution of x¯ is determined by:

The design used to produce the data.
The sample size n.
The population distribution.

Suppose we select an SRS of size n from a population and measure a variable X on each individual in the sample. The n measurements are values of n random variables X1,X2,…,Xn. A single Xi is a measurement on one individual selected at random from the population and, therefore, has the distribution of the population. If the population is large relative to the sample, we can consider X1,X2,…,Xn to be independent random variables, each having the same distribution. This is our probability model for measurements on each individual in an SRS.

The sample mean of an SRS of size n is

\[ \bar{x} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) \]

If the population has mean μ, then μ is the mean of the distribution of each observation Xi. To get the mean of x¯, we use the rules for means of random variables. Specifically,

\[ \mu_{\bar{x}} = \frac{1}{n}(\mu_{X_1} + \mu_{X_2} + \cdots + \mu_{X_n}) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \mu \]

That is, the mean of x¯ is the same as the mean of the population. The sample mean x¯ is, therefore, an unbiased estimator of the unknown population mean μ.

Because the observations are independent, the addition rule for variances also applies:

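\[ \sigma^2_{\bar{x}} = \left(\frac{1}{n}\right)^2 (\sigma^2_{X_1} + \sigma^2_{X_2} + \cdots + \sigma^2_{X_n}) = \left(\frac{1}{n}\right)^2 (\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \frac{\sigma^2}{n} \]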

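With n in the denominator, the variability of x¯ about its mean decreases as the sample size grows. Thus, a sample mean from a large sample will usually be very close to the true population mean μ. Here is a summary of these facts.

Let x¯ be the mean of an SRS of size n from a population having mean μ and standard deviation σ. The mean and standard deviation of x¯ are

\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]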

How precisely does a sample mean x¯ estimate a population mean μ? Because the values of x¯ vary from sample to sample, we must give an answer in terms of the sampling distribution. We know that x¯ is an unbiased estimator of μ, so its values in repeated samples are not systematically too high or too low. Most samples will give an x¯-value close to μ if the sampling distribution is concentrated close to its mean μ. Thus, the precision of estimation depends on the spread of the sampling distribution.

Because the standard deviation of x¯ is σ/√n, the standard deviation of the statistic decreases in proportion to the square root of the sample size. This means, for example, that a sample size must be multiplied by 4 in order to divide the statistic’s standard deviation in half. By comparison, a sample size must be multiplied by 100 in order to reduce the standard deviation by a factor of 10.

Example 5.6 Standard deviations for sample means of visit lengths.

The standard deviation of the population of visit lengths in Figure 5.6(a) is σ=41.84. The length of a single visit will often be far from the population mean. If we choose an SRS of 50 visits, the standard deviation of their mean length is

\[ \sigma_{\bar{x}} = \frac{41.84}{\sqrt{50}} = 5.92 \text{ minutes} \]

Averaging over more visits reduces the variability and makes it more likely that x¯ is close to μ. For example, if we were to sample 200 (4 × 50) visits, the standard deviation would be half as large:

\[ \sigma_{\bar{x}} = \frac{41.84}{\sqrt{200}} = 2.96 \text{ minutes} \]
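The arithmetic in Example 5.6 is easy to check by machine. The following is a minimal sketch; the function name sd_of_sample_mean is our own choice for illustration, and the value C = 10 is just one example of a reduction factor.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean for an SRS of size n."""
    return sigma / math.sqrt(n)

sigma = 41.84  # population standard deviation from Example 5.6

sd_50 = sd_of_sample_mean(sigma, 50)
sd_200 = sd_of_sample_mean(sigma, 200)
print(round(sd_50, 2), round(sd_200, 2))  # 5.92 2.96

# To cut the standard deviation by a factor of C, multiply n by C**2.
C = 10
sd_reduced = sd_of_sample_mean(sigma, 50 * C**2)
print(sd_50 / sd_reduced)  # the ratio is C, up to floating-point rounding
```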

Check-in

5.7 Find the mean and the standard deviation of the sampling distribution. Compute the mean and standard deviation of the sampling distribution of the sample mean when you plan to take an SRS of size 25 from a population with mean 215 and standard deviation 10.
5.8 The effect of increasing the sample size. In the setting of the previous exercise, repeat the calculations for a sample size of 100. Explain the effect of the sample size increase on the mean and standard deviation of the sampling distribution.

Before discussing the third fact, we have one comment on terminology. To maintain the distinction between parameters and statistics, the term “standard error” is sometimes used for the standard deviation of a statistic. Thus, the standard deviation of x¯ is called the standard error of x¯. In this book, we use the term “standard error” only in situations when the standard deviation of a statistic is estimated from the data. This is discussed more fully in Chapter 7 (page 384).

The central limit theorem

Data set icon for Vtm.

We have described the center and spread of the probability distribution of a sample mean x¯ but not its shape, which depends on the shape of the population distribution. An important distinction is between Normal and non-Normal population distributions. Here is one important but special case: if the population distribution is Normal, then so is the distribution of the sample mean.

Most population distributions are non-Normal. Yet Figures 5.6(b) and 5.7 show that means of samples of size 50 from a strongly skewed population are close to Normal. Clearly, there must be something more we can say about the sampling distribution of x¯ in the non-Normal setting.

One of the most famous facts of probability theory says that, for large sample sizes, the distribution of x¯ is close to a Normal distribution. This is true no matter what shape the population distribution has, as long as the population has a finite standard deviation σ. This is the central limit theorem. It is much more useful than the fact that the distribution of x¯ is exactly Normal if the population is exactly Normal.

Example 5.7 How close will the sample mean be to the population mean?

With the Normal distribution to work with, we can better describe how precisely a random sample of 50 visits estimates the mean length of all visits to the statistics help room. The population standard deviation for the 1264 visits in the population of Figure 5.6(a) is σ=41.84 minutes. From Example 5.6 we know σx¯=5.92 minutes. By the 95 part of the 68–95–99.7 rule, about 95% of all samples will have mean x¯ within 2 standard deviations of μ—that is, within ±11.8 minutes of μ. This value is what we earlier referred to as the margin of error (page 277).

If a margin of error of 11.8 minutes is not considered precise enough, we must consider a larger sample size to reduce the standard deviation of x¯.

Example 5.8 Reducing the standard deviation of x¯.

In the setting of Example 5.7, if we want to reduce the standard deviation of x¯ by a factor of 2, we must take a sample four times as large, n=4×50, or 200. Then

\[ \sigma_{\bar{x}} = \frac{41.84}{\sqrt{200}} = 2.96 \text{ minutes} \]

For samples of size 200, about 95% of the sample means will be within twice 2.96, or 5.9 minutes, of the population mean μ.

The standard deviation computed in Example 5.8 is actually too large. This is because the population size, N=1264, is not at least 20 times larger than the sample size, n=200. When N<20n, it is better to adjust the standard deviation of x¯ to reflect only the variance remaining in the population that is not in the sample. This is done by multiplying the unadjusted standard deviation by the finite population correction factor. This quantity is √((N−n)/(N−1)) and moves the standard deviation of x¯ toward 0 as n moves toward N. Applying this correction to Example 5.8, the standard deviation of x¯ is reduced by about 8% to

\[ \frac{41.84}{\sqrt{200}} \sqrt{\frac{1264-200}{1264-1}} = 2.72 \text{ minutes} \]

Thus, for samples of size 200, about 95% of the sample means will be within twice 2.72, or 5.4 minutes, of the population mean μ, rather than the 5.9 minutes reported in Example 5.8.
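The finite population correction can be wrapped in a small helper. This is a sketch only; the function name sd_of_sample_mean_fpc and its argument checks are our own assumptions.

```python
import math

def sd_of_sample_mean_fpc(sigma, n, N):
    """sigma/sqrt(n) times the finite population correction sqrt((N-n)/(N-1))."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

sigma, N = 41.84, 1264                      # values from Examples 5.6 and 5.8
unadjusted = sigma / math.sqrt(200)
adjusted = sd_of_sample_mean_fpc(sigma, 200, N)

print(round(unadjusted, 2), round(adjusted, 2))  # 2.96 2.72
print(round(2 * adjusted, 1))                     # 5.4
```

When n equals N the sample is the whole population, and the function returns 0: the sample mean is then the population mean with no sampling variability.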

Check-in

5.9 Use the 68–95–99.7 rule. You take an SRS of size 25 from a population with mean 215 and standard deviation 10. According to the central limit theorem, what is the approximate sampling distribution of the sample mean? Use the 95 part of the 68–95–99.7 rule to determine the margin of error.
5.10 Increasing the sample size. Refer to the previous Check-in question. Suppose that you increase the sample size to 225. Use the 95 part of the 68–95–99.7 rule to determine the margin of error. By what factor is the margin of error reduced?

The main point of Examples 5.7 and 5.8 is to demonstrate that the central limit theorem allows us to use Normal probability calculations to answer questions about sample means even when the population distribution is not Normal. Example 5.8, however, reminds us that if the population is very spread out, the √n in the formula for the standard deviation of x¯ implies that very large samples are needed to estimate the population mean precisely. It also reminds us to check the relative size of the sample when the population size is finite.

The central limit theorem can be used when “n is large.” How large n has to be for x¯ to be close to Normal depends on the population distribution. More observations are required if the shape of the population distribution is far from Normal. For the very skewed visit length population, samples of size 50 are large enough. However, if we had used a sample size of n=25, the associated sampling distribution would still be skewed, and the use of Normal probability calculations would be unwise. Here is a more detailed study of another skewed distribution.

Example 5.9 The central limit theorem in action.

Figure 5.8 shows the central limit theorem in action for another very non-Normal population. Figure 5.8(a) displays the density curve of a single observation from the population. The distribution is strongly right-skewed, and the most probable outcomes are near 0. The mean μ of this distribution is 1, and its standard deviation σ is also 1. This particular continuous distribution is called an exponential distribution. Exponential distributions are used as models for how long an iPhone will function properly and for the time between snaps you receive on Snapchat.

Four distribution curves. — Figure 5.8 The central limit theorem in action: the sampling distribution of sample means from a strongly non-Normal population becomes more Normal as the sample size increases, Example 5.9. The distribution of (a) 1 observation; (b) x¯ for 2 observations; (c) x¯ for 10 observations; and (d) x¯ for 25 observations.

Each curve is plotted on a graph with only 0 and 1 on the horizontal axis marked. On graph a, the curve is right skewed, falling with decreasing steepness from a point on the vertical axis. On graph b, the curve is right skewed, rising with decreasing steepness from the origin to a maximum at approximately 0.5, then falling with decreasing steepness across the graph. On graph c, the curve is slightly right skewed, rising with increasing steepness from 0.25 to a maximum at 0.9, then falling with decreasing steepness to 2. On graph d, the curve is roughly normal between 0.5 and 1.5, with a mean at 1. All values estimated.

Figures 5.8(b), (c), and (d) are the density curves of the sample means of 2, 10, and 25 observations from this population. As n increases, the shape becomes more Normal. The mean remains at μ=1, but the standard deviation decreases, taking the value 1/√n. The density curve for 10 observations is still somewhat skewed to the right but already resembles a Normal curve having μ=1 and σ=1/√10=0.32. The density curve for n=25 is yet more Normal. The contrast between the shape of the population distribution and of the distribution of the mean of 10 or 25 observations is striking.
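A short simulation gives a numerical view of the same pattern. The sketch below draws sample means from the exponential distribution with μ=σ=1 and reports their mean, standard deviation, and skewness for each sample size. The seed, the 20,000 repetitions, and the use of the average cubed z-score as a skewness measure are our own choices, not part of the example.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed for repeatability

def mean_of_exponentials(n):
    """Mean of n independent exponential observations with mu = sigma = 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (1, 2, 10, 25):
    means = [mean_of_exponentials(n) for _ in range(20000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(n, [round(r, 3) for r in results[n]], "theory sd:", round(n ** -0.5, 3))
```

The estimated standard deviations should track 1/√n, and the skewness should shrink toward 0 as n grows, which is the numerical counterpart of the curves becoming more Normal.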

Applet You can also use the Central Limit Theorem applet to study the sampling distribution of x¯. From one of three population distributions, 10,000 SRSs of a user-specified sample size n are generated, and a histogram of the sample means is constructed. You can then compare this estimated sampling distribution with the Normal curve that is based on the central limit theorem.

Example 5.10 Using the Central Limit Theorem applet.

In Example 5.9, we considered sample sizes of n=2, 10, and 25 from an exponential distribution. Figure 5.9 shows a screenshot of the Central Limit Theorem applet for the exponential distribution when n=10. The mean and standard deviation of this sampling distribution are 1 and 1/√10=0.316, respectively. From the 10,000 SRSs, the mean is estimated to be 1.001, and the estimated standard deviation is 0.319. These are both quite close to the true values. In Figure 5.8(c), we saw that the density curve for 10 observations is still somewhat skewed to the right. We can see this same behavior in Figure 5.9 when we compare the histogram with the Normal curve based on the central limit theorem.

A screen capture of two normal distribution curves using the statistical power applet. — Figure 5.9 Screenshot of the *Central Limit Theorem* applet for the exponential distribution when n=10, Example 5.10.

Try using the applet for the other sample sizes in Example 5.9. You should get histograms shaped like the density curves shown in Figure 5.8. You can also consider other sample sizes by sliding n from 1 to 100. As you increase n, the shape of the histogram moves closer to the Normal curve that is based on the central limit theorem.

Check-in

5.11 Use the Central Limit Theorem applet. Let’s consider the uniform distribution between 0 and 10. For this distribution, all intervals of the same length between 0 and 10 are equally likely. This distribution has a mean of 5 and standard deviation of 2.89.
1. Approximate the population distribution by setting n=1.
2. What are your estimates of the population mean and population standard deviation based on the 10,000 SRSs? Are these population estimates close to the true values?
3. Describe the shape of the histogram and compare it with the Normal curve.
5.12 Use the Central Limit Theorem applet again. Refer to the previous Check-in question. In the setting of Example 5.9, let’s approximate the sampling distribution for samples of size n=2, 10, and 25 observations.
1. For each sample size, compute the mean and standard deviation of x¯.
2. For each sample size, use the applet to approximate the sampling distribution. Report the estimated mean and standard deviation. Are they close to the true values calculated in part 1?
3. For each sample size, compare the shape of the sampling distribution with the Normal curve based on the central limit theorem.
4. For this population distribution, what sample size do you think is needed to make you feel comfortable using the central limit theorem to approximate the sampling distribution of x¯? Explain your answer.

Now that we know that the sampling distribution of the sample mean x¯ is approximately Normal for a sufficiently large n, let’s consider some probability calculations.

Example 5.11 Time between snaps.

Snapchat has more than 200 million daily users sending well over 3 billion snaps a day.⁷ Suppose that the time X between snaps you receive is governed by the exponential distribution with mean μ=8 minutes. You record the next 50 times between snaps. What is the probability that their average exceeds 7 minutes?

The central limit theorem says that the sample mean time x¯ (in minutes) between snaps has approximately the Normal distribution with mean equal to the population mean μ=8 minutes and, because the standard deviation of an exponential distribution equals its mean (σ=8 minutes), standard deviation

\[ \frac{\sigma}{\sqrt{50}} = \frac{8}{\sqrt{50}} = 1.13 \text{ minutes} \]

The sampling distribution of x¯ is, therefore, approximately N(8, 1.13). Figure 5.10 shows this Normal curve (solid) and also the actual density curve of x¯ (dashed).

Overlapping exact distribution curves and normal approximation curves. — Figure 5.10 The exact distribution (dashed) and the Normal approximation from the central limit theorem (solid) for the average time between snaps received, Example 5.11.

The probability we want is P(x¯>7.0). This is the area to the right of 7 under the solid Normal curve in Figure 5.10. A Normal distribution calculation gives

\[ P(\bar{x} > 7.0) = P\left(\frac{\bar{x} - 8}{1.13} > \frac{7.0 - 8}{1.13}\right) = P(Z > -0.88) = 0.8106 \]

The exactly correct probability is the area under the dashed density curve in the figure. It is 0.8094. The central limit theorem Normal approximation is off by only about 0.0012.
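Both probabilities in Example 5.11 can be reproduced with the Python standard library. This sketch rests on two facts not spelled out in the example: an exponential distribution has σ equal to μ, and the total of n independent exponential times has a gamma (Erlang) distribution, so P(total > t) equals the probability that a Poisson count with mean t/μ is at most n−1. The variable names are our own.

```python
import math
from statistics import NormalDist

mu = 8.0       # mean time between snaps (minutes)
sigma = 8.0    # an exponential distribution has sigma equal to mu
n = 50

# Normal approximation from the central limit theorem.
approx = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(7.0)

# Exact value: the total of n exponential times is gamma (Erlang) distributed,
# and P(total > t) equals P(Poisson(t / mu) <= n - 1).
lam = n * 7.0 / mu
exact = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))

print(f"Normal approximation: {approx:.4f}")
print(f"exact:                {exact:.4f}")
```

Because the code does not round z to −0.88 before looking up the probability, its Normal approximation differs slightly in the third decimal place from the table-based value 0.8106.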

We can also use this sampling distribution to talk about the total time between the 1st and 51st snap received.

Example 5.12 Convert the results to the total time.

There are 50 time intervals between the 1st and 51st snap. According to the central limit theorem calculations in Example 5.11,

P(x¯>7.0)=0.8106

We know that the sample mean is the total time divided by 50, so the event {x¯>7.0} is the same as the event {50x¯>50(7.0)}. We can say that the probability is 0.8106 that the total time is 50(7.0)=350 minutes (5.83 hours) or greater.

Check-in

5.13 Find a probability. Refer to Example 5.11. Find the probability that the mean time between snaps is less than 8 minutes. The exact probability is 0.5188. Compare your answer with the exact one.

Figure 5.11 summarizes the facts about the sampling distribution of x¯ in a way that emphasizes the big idea of a sampling distribution. The general framework for constructing the sampling distribution of x¯ is shown on the left and involves the following steps:

Take many random samples of size n from a population with mean μ and standard deviation σ.
Find the sample mean x¯ for each sample.
Collect all the x¯’s and display their distribution.

A diagram of a sampling distribution and a normal distribution curve. — Figure 5.11 The sampling distribution of a sample mean x¯ has mean μ and standard deviation σ/√n. The sampling distribution is Normal if the population distribution is Normal; it is approximately Normal for large samples in any case.

The sampling distribution of x¯ is shown on the right. Keep this figure in mind when performing probability calculations about x¯.

A few more facts related to the sampling distribution of x¯

Even though the central limit theorem is the big fact of probability theory in this section, there are several additional facts related to our investigations of the sampling distribution of x¯ that will be useful in describing methods of inference in later chapters.

The fact that the sample mean of an SRS from a Normal population has a Normal distribution is a special case of a more general fact: any linear combination of independent Normal random variables is also Normally distributed. That is, if X and Y are independent Normal random variables and a and b are any fixed numbers, aX+bY is also Normally distributed, and this is true for any number of Normal random variables. In particular, the sum or difference of independent Normal random variables has a Normal distribution. The mean and standard deviation of aX+bY are found as usual from the rules for means and variances. These facts are often used in statistical calculations. Here is an example.

Example 5.13 Getting to and from campus.

You live off campus and take the shuttle provided by your apartment complex to and from campus. Your time on the shuttle, in minutes, varies from day to day. The time going to campus X has the N(20, 4) distribution, and the time returning from campus Y varies according to the N(18, 8) distribution. If they vary independently, what is the probability that you will be on the shuttle for less time going to campus?

The difference in times X-Y is Normally distributed, with mean and variance

\[ \mu_{X-Y} = \mu_X - \mu_Y = 20 - 18 = 2 \]

\[ \sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y = 4^2 + 8^2 = 80 \]

Because √80=8.94, X−Y has the N(2, 8.94) distribution. Figure 5.12 illustrates the probability computation:

\[ P(X < Y) = P(X - Y < 0) = P\left(\frac{(X - Y) - 2}{8.94} < \frac{0 - 2}{8.94}\right) = P(Z < -0.22) = 0.4129 \]

Although, on average, it takes longer to go to campus than return, the trip to campus will take less time on roughly two of every five days.

A normal distribution curve of time difference. — Figure 5.12 The Normal probability calculation, Example 5.13. The difference in times going to campus and returning from campus (X-Y) is Normal, with mean 2 minutes and standard deviation 8.94 minutes.
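The calculation in Example 5.13 can be checked with NormalDist from Python's statistics module. This is a minimal sketch; the variable names are our own.

```python
import math
from statistics import NormalDist

to_campus = NormalDist(mu=20, sigma=4)     # X
from_campus = NormalDist(mu=18, sigma=8)   # Y

# Rules for means and variances of independent random variables.
mean_diff = to_campus.mean - from_campus.mean
sd_diff = math.sqrt(to_campus.variance + from_campus.variance)
diff = NormalDist(mean_diff, sd_diff)       # distribution of X - Y

print(round(mean_diff, 2), round(sd_diff, 2))  # 2.0 8.94
print(round(diff.cdf(0), 3))                    # P(X < Y)
```

The printed probability uses the unrounded z-score, so it differs slightly in the third decimal place from the table value 0.4129, which is based on z=−0.22.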

The second useful fact is that more general versions of the central limit theorem say that the distribution of a sum or an average of many small random quantities is close to Normal. This is true even if the quantities are not independent (as long as they are not too highly correlated) and even if they have different distributions (as long as no single random quantity is so large that it dominates the others). These more general versions of the central limit theorem suggest why the Normal distributions are common models for observed data. Any variable that is a sum of many small random influences will have approximately a Normal distribution.

Finally, the central limit theorem also applies to discrete random variables. An average of discrete random variables will never result in a continuous sampling distribution, but the Normal distribution often serves as a good approximation. In the next section, we will discuss the sampling distribution and Normal approximation for counts and proportions. This Normal approximation is just an example of the central limit theorem applied to these discrete random variables.
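To see the central limit theorem at work for a discrete variable, consider an illustration of our own that is not part of the text: the average of 10 rolls of a fair die. The sketch below computes the exact distribution of the total by repeated convolution and compares P(x¯ ≤ 3.0), which is P(total ≤ 30), with a Normal approximation. Evaluating the Normal curve at 30.5 rather than 30 is a continuity correction, an adjustment we assume here because the total takes only whole-number values.

```python
import math
from statistics import NormalDist

def sum_distribution(n, faces=6):
    """Exact pmf of the sum of n fair dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, faces + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / faces
        pmf = nxt
    return pmf

n = 10
pmf = sum_distribution(n)
exact = sum(p for total, p in pmf.items() if total <= 30)  # P(mean <= 3.0)

# One die has mean 3.5 and variance 35/12, so the sum has mean 3.5n
# and standard deviation sqrt(35n/12).
approx = NormalDist(3.5 * n, math.sqrt(35 * n / 12)).cdf(30.5)

print(f"exact = {exact:.4f}, Normal approximation = {approx:.4f}")
```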

Beyond the Basics

Weibull distributions

Our discussion of sampling distributions so far has concentrated on the Normal model to approximate the sampling distribution of the sample mean x¯. This model is important in statistical practice because of the central limit theorem and the fact that sample means are among the most frequently used statistics. Simplicity also contributes to its popularity. The parameter μ is easy to understand, and to estimate it, we use a statistic x¯ that is also easy to understand and compute.

There are, however, many other probability distributions that are used to model data in various circumstances. The time that a product, such as a Nintendo Switch, lasts before failing rarely has a Normal distribution. Earlier, we mentioned the use of the exponential distribution to model time to failure. Another class of continuous distributions, the Weibull distributions, is more commonly used in these situations.

Example 5.14 Weibull density curves.

Figure 5.13 shows the density curves of three members of the Weibull family. Each describes a different type of distribution for the time to failure of a product.

Three density curves. — Figure 5.13 Density curves for three members of the Weibull family of distributions, Example 5.14. The curves model (a) infant mortality, (b) early failure, and (c) old-age wear-out.

Figure 5.13(a) is a model for infant mortality. This describes products that often fail immediately, prior to delivery to the customer. However, if the product does not fail right away, it will likely last a long time. For products like this, a manufacturer might test them and ship only the ones that do not fail immediately.
Figure 5.13(b) is a model for early failure. These products do not fail immediately, but many fail early in their lives after they are in the hands of customers. This is disastrous, and the product or the process that makes it must be changed at once.
Figure 5.13(c) is a model for old-age wear-out. Most of these products fail only when they begin to wear out, and then many fail at about the same age.

A manufacturer certainly wants to know to which of these classes a new product belongs. To find out, engineers operate a random sample of products until they fail. From the failure time data, we can estimate the parameter (called the “shape parameter”) that distinguishes among the three Weibull distributions in Figure 5.13. The shape parameter has no simple definition like that of a population proportion or mean, and it cannot be estimated by a simple statistic such as p^ or x¯.

Two things save the situation. First, statistical theory provides general approaches for finding good estimates of any parameter. These general methods not only tell us how to use x¯ in the Normal settings but also how to estimate the Weibull shape parameter. Second, software can calculate the estimate from data even though there is no algebraic formula that we can write for the estimate. Statistical practice often relies on both mathematical theory and methods of computation more elaborate than the ones we will meet in this book. Fortunately, big ideas such as sampling distributions carry over to more complicated situations.⁸
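As a rough illustration of what such software does, the sketch below estimates a Weibull shape parameter by maximum likelihood. Maximum likelihood is one general estimation approach; the text does not say which method is intended, so treat this choice as our assumption. The failure times are simulated with an arbitrary true shape of 1.5, and the function name, seed, and search bounds are likewise illustrative.

```python
import math
import random

def weibull_shape_mle(times, lo=0.01, hi=50.0, tol=1e-8):
    """Estimate the Weibull shape parameter by solving the likelihood
    equation numerically (bisection); there is no closed-form solution."""
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)
    log_max = max(logs)

    def score(k):
        # Weights proportional to t**k, scaled by the largest time to avoid overflow.
        w = [math.exp(k * (lg - log_max)) for lg in logs]
        return sum(wi * lg for wi, lg in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    # score(k) increases with k, so bisection closes in on its single root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = random.Random(3)
true_shape = 1.5   # illustrative value only
# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
failure_times = [rng.weibullvariate(1.0, true_shape) for _ in range(2000)]
estimate = weibull_shape_mle(failure_times)
print(f"estimated shape = {estimate:.3f}")
```

The estimate comes from a numerical search rather than a formula, which is the point of the paragraph above: the computer finds the value of the shape parameter that best fits the failure times.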

Section 5.2 SUMMARY

The population distribution of a variable X is the distribution of its values for all members of the population. This distribution and the sample size n affect the distribution of x¯.
The sample mean x¯ of an SRS of size n drawn from a large population with mean μ and standard deviation σ has a sampling distribution with mean and standard deviation

\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
The sample mean x¯ is an unbiased estimator of the population mean μ and is less variable than a single observation.
The standard deviation of x¯ decreases in proportion to the square root of the sample size n. This means that to reduce the standard deviation by a factor of C, we need to increase the sample size by a factor of C².
The central limit theorem states that, for large n, the sampling distribution of x¯ is approximately N(μ, σ/√n) for any population with mean μ and finite standard deviation σ. This allows us to approximate probability calculations of x¯ using the Normal distribution.
Linear combinations of independent Normal random variables have Normal distributions. In particular, if the population has a Normal distribution, so does x¯.

Now that you have completed this section, you will be able to:

Explain the difference between the sampling distribution of x¯ and the population distribution. Review Example 5.5 (page 283) and try Exercise 5.15.
Determine the mean and standard deviation of x¯ for an SRS of size n from a population with mean μ and standard deviation σ. Review Example 5.6 (page 286) and try Exercises 5.19 and 5.23.
Describe how many times larger n has to be for an SRS to reduce the standard deviation of x¯ by a certain factor. Review Example 5.8 (page 287) and try Exercise 5.17.
Utilize the central limit theorem to approximate the sampling distribution of x¯ and perform probability calculations based on this approximation. Review Example 5.11 (page 290) and try Exercise 5.25.

Section 5.2 EXERCISES

5.13 What’s wrong? For each of the following statements, explain what is wrong and why.
1. If the population standard deviation is 20, then the standard deviation of x¯ for an SRS of 10 observations is 20/10=2.
2. When taking SRSs from a population, larger sample sizes will result in larger standard deviations of x¯.
3. For an SRS from a population, both the mean and the standard deviation of x¯ depend on the sample size n.
4. The larger the population, the bigger the sample size n needs to be for a desired standard deviation of x¯.
5.14 What’s wrong? For each of the following statements, explain what is wrong and why.
1. The central limit theorem states that for large n, the population mean μ is approximately Normal.
2. For large n, the distribution of observed values will be approximately Normal.
3. For sufficiently large n, the 68–95–99.7 rule says that x¯ should be within 2σ of μ about 95% of the time.
4. Refer to Figure 5.13. For x¯ to be approximately Normal, we will need to draw a larger sample size n from the distribution in panel (c) than in panel (a).

5.15 Generating a sampling distribution. Let’s illustrate the idea of a sampling distribution in the case of a very small sample from a very small population. The population is the 10 scholarship players currently on your women’s basketball team. For convenience, the 10 players have been labeled with the integers 0 to 9. For each player, the total amount of time spent (in minutes) on Twitter during the past week is recorded in the following table.

Player	0	1	2	3	4	5	6	7	8	9
Time (min)	118	24	89	85	74	135	116	107	60	99

The parameter of interest is the average amount of time on Twitter. The sample is an SRS of size n=3 drawn from this population of players. Because the players are labeled 0 to 9, a single random digit from Table B chooses one player for the sample.

1. Find the mean for the 10 players in the population. This is the population mean μ.
2. Use Table B to draw an SRS of size 3 from this population. (Note: You may sample the same player’s time more than once.) Write down the three times in your sample and calculate the sample mean x¯. This statistic is an estimate of μ.
3. Repeat this process nine more times, using different parts of Table B. Make a histogram of the 10 values of x¯. You are approximating the sampling distribution of x¯.
4. Is the center of your histogram close to μ? Explain why you’d expect it to get closer to μ the more times you repeated this sampling process.
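
If you would like to see what happens when the process is repeated far more than 10 times, software can stand in for Table B. The following is only a sketch: it replaces the table of random digits with a pseudorandom generator, and the function name, seed, and number of repetitions are assumptions made for illustration. It is not a substitute for working the exercise by hand.

```python
import random
import statistics

# Time on Twitter (minutes) for players labeled 0 to 9, from the table above.
times = [118, 24, 89, 85, 74, 135, 116, 107, 60, 99]

def sample_mean(rng, n=3):
    """Choose n random digits 0-9 (repeats allowed, as the note permits)
    and return the mean time of the players they select."""
    digits = [rng.randrange(10) for _ in range(n)]
    return statistics.fmean(times[d] for d in digits)

rng = random.Random(515)  # arbitrary seed so the run is repeatable
mu = statistics.fmean(times)  # population mean

ten_means = [sample_mean(rng) for _ in range(10)]       # as in the exercise
many_means = [sample_mean(rng) for _ in range(10_000)]  # many more repetitions

print("center of 10 sample means:    ", round(statistics.fmean(ten_means), 1))
print("center of 10,000 sample means:", round(statistics.fmean(many_means), 1))
```

Comparing the two centers with μ gives a feel for why more repetitions produce a better picture of the sampling distribution.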

5.16 Sleep duration of college students. In Example 5.4, the daily sleep duration among college students was approximately Normally distributed with mean μ=7.13 hours and standard deviation σ=1.67 hours. You plan to take an SRS of size n=60 and compute the average total sleep time.
1. What is the standard deviation for the average time?
2. Use the 95 part of the 68–95–99.7 rule to describe the variability of this sample mean.
3. What is the probability that your average will be below 6.9 hours?
5.17 Determining sample size. Refer to the previous exercise. You want to use a sample size such that about 95% of the averages fall within ±10 minutes (0.17 hour) of the true mean μ=7.13.
1. Based on your answer to part 2 in Exercise 5.16, should the sample size be larger or smaller than 60? Explain.
2. What standard deviation of x¯ do you need such that approximately 95% of all samples will have a mean within 10 minutes of μ?
3. Using the standard deviation you calculated in part 2, determine the number of students you need to sample.
5.18 Length of a movie on Netflix. Flixable reports that Netflix’s U.S. catalog contains almost 4000 movies.⁹ You are interested in determining the average length of these movies. Previous studies have suggested the standard deviation for this population is 34 minutes.
1. What is the standard deviation of the average length if you take an SRS of 25 movies from this population?
2. How many movies would you need to sample if you wanted the standard deviation of x¯ to be no larger than 5 minutes?
5.19 Bottling an energy drink. A bottling company uses a filling machine to fill cans with an energy drink. The cans are supposed to contain 250 milliliters (ml) each. The machine, however, has some variability, so the standard deviation of the volume is σ=0.27 ml. A sample of five cans is inspected each hour for process control purposes, and records are kept of the sample mean volume. If the process mean is exactly equal to the target value, what are the mean and standard deviation of the numbers recorded?
5.20 Average movie length on Netflix. Refer to Exercise 5.18. Suppose that the true mean movie length is 98.6 minutes, and you plan to take an SRS of n=50 movies.
1. Explain why it may be reasonable to assume that the average x¯ is approximately Normal even though the population distribution is likely skewed to the right.
2. Sketch the approximate Normal curve for the sample mean, making sure to specify its mean and standard deviation.
3. What is the probability that your sample mean will differ from the population mean by more than 2 minutes?
5.21 Can volumes. Averages are less variable than individual observations. It is reasonable to assume that the can volumes in Exercise 5.19 vary according to a Normal distribution. In that case, the mean x¯ of an SRS of cans also has a Normal distribution.
1. Make a sketch of the Normal curve for a single can. Add the Normal curve for the mean of an SRS of five cans on the same sketch.
2. What is the probability that the volume of a single randomly chosen can differs from the target value by 0.1 ml or more?
3. What is the probability that the mean volume of an SRS of five cans differs from the target value by 0.1 ml or more?
5.22 Number of friends on Facebook. In Australia, young people aged 18 to 29 have an average of 394 Facebook friends.¹⁰ This population distribution takes only integer values, so it is certainly not Normal. It is also highly skewed to the right. Suppose that σ=280 and you take an SRS of 70 Facebook users from this population.
1. For your sample, what are the mean and standard deviation of x¯, the mean number of friends per user?
2. Use the central limit theorem to find the probability that the average number of friends for an SRS of 70 Facebook users is greater than 425.
3. What are the mean and standard deviation of the total number of friends in your sample?
4. What is the probability that the total number of friends among your sample of 70 Facebook users is greater than 29,750?
5.23 Cholesterol levels of teenagers. A study of the health of teenagers plans to measure the blood cholesterol level of an SRS of 13- to 16-year-olds. The researchers will report the mean x¯ from their sample as an estimate of the mean cholesterol level μ in this population.
1. Explain to someone who knows no statistics what it means to say that x¯ is an “unbiased” estimator of μ.
2. The sample result x¯ is an unbiased estimator of the population truth μ no matter what size SRS the study chooses. Explain to someone who knows no statistics why a large sample gives more trustworthy results than a small sample.
5.24 Grades in a math course. Indiana University posts the grade distributions for its courses online.¹¹ In one spring semester, students in Math 118 received 16.1% A’s, 34.3% B’s, 29.2% C’s, 9.6% D’s, and 9.8% F’s.
1. Using the common scale A=4, B=3, C=2, D=1, F=0, take X to be the grade of a randomly chosen Math 118 student. Use the definitions of the mean (page 237) and standard deviation (page 245) for discrete random variables to find the mean μ and the standard deviation σ of grades in this section.
2. Math 118 is a large enough course that we can take the grades of an SRS of 25 students and not worry about the finite population correction factor. If x¯ is the average of these 25 grades, what are the mean and standard deviation of x¯?
3. What is the probability that a randomly chosen Math 118 student gets a B or better, P(X≥3)?
4. What is the approximate probability that the grade point average for 25 randomly chosen Math 118 students is B or better, P(x¯≥3)?
5. Explain why the probabilities in parts 3 and 4 are so different.
5.25 Weights of airline passengers. In 2019, the Federal Aviation Administration (FAA) updated its standard average passenger weight to be based on data from U.S. government health agency surveys.¹² It specified this average weight, which includes clothing, as 189 pounds in the summer (195 in the winter). These health agency surveys can also be used to determine the standard deviation, which we’ll assume is 47 pounds. Weights are not Normally distributed, especially when the population includes both men and women, but they are not very non-Normal. A commuter plane carries 25 passengers. What is the approximate probability that, in the winter, the total weight of the passengers exceeds 5225 pounds? (Hint: To apply the central limit theorem, restate the problem in terms of the mean weight.)
5.26 Investments in two funds. Jennifer invests her money in a portfolio that consists of 65% Fidelity 500 Index Fund and 35% Fidelity Tax-Free Bond Fund. Suppose that, in the long run, the annual real return X on the Index Fund has mean 10% and standard deviation 12%, the annual real return Y on the Bond Fund has mean 5% and standard deviation 3%, and the correlation between X and Y is -0.16.
1. The return on Jennifer’s portfolio is R=0.65X+0.35Y. What are the mean and standard deviation of R?
2. The distribution of returns is typically roughly symmetric but with more extreme high and low observations than a Normal distribution. The average return over a number of years, however, is close to Normal. If Jennifer holds her portfolio for 20 years, what is the approximate probability that her average return is greater than 5%?
3. The calculation you just made is not overly helpful because Jennifer isn’t really concerned about the mean return R¯. To see why, suppose that her portfolio returns 12% this year and 6% next year. The mean return for the two years is 9%. If Jennifer starts with $1000, how much does she have at the end of the first year? At the end of the second year? How does this amount compare with what she would have if both years had the mean return, 9%? Over 20 years, there may be a large difference between the ordinary mean R¯ and the geometric mean, which reflects the fact that returns in successive years multiply rather than add.

5	6	14	15	20	20	20	28	30	30
30	30	31	33	35	40	41	41	41	50
50	55	55	55	55	55	60	60	60	65
65	65	66	67	75	75	80	85	85	86
90	90	98	99	110	122	142	150	160	165
