The probability histograms and density curves that picture the probability distributions of random variables resemble our earlier pictures of distributions of data. In describing data, we moved from graphs to numerical measures such as means and standard deviations. Now we will make the same move to expand our descriptions of the distributions of random variables. We can speak of the mean winnings in a game of chance or the standard deviation of the randomly varying number of calls a travel agency receives in an hour. In this section, we will learn more about how to compute these descriptive measures and about the laws they obey.
In Chapter 1 (page 26), we learned that the mean x̄ of a set of observations is their ordinary average. The mean of a random variable X is also an average of its possible values, but with an essential difference: because the outcomes need not be equally likely, each outcome is weighted by its probability.
If we think of a random process as corresponding to the population, then the mean of the random variable is a characteristic of this population. Here is an example.
Most states and Canadian provinces have government-sponsored lotteries. Here is a simple lottery wager, from the Tri-State Pick 3 game that New Hampshire shares with Maine and Vermont. You choose a three-digit number, 000 to 999. The state chooses a three-digit winning number at random and pays you $500 if your number is chosen.
Because there are 1000 three-digit numbers, you have probability 1/1000 of winning. Taking X to be the amount your ticket pays you, the probability distribution of X is
Payoff X | $0 | $500 |
Probability | 0.999 | 0.001 |
The random process consists of drawing a three-digit number. The population consists of the numbers 000 to 999. Each of these possible outcomes is equally likely. In the setting of sampling in Chapter 3 (page 181), we can view the random process as selecting an SRS of size 1 from the population. The random variable X is $500 if the selected number is equal to the one that you chose and is $0 if it is not.
What is your average payoff from many tickets? The ordinary average of the two possible outcomes $0 and $500 is $250, but that makes no sense as the average because $500 is much less likely than $0. In the long run, you receive $500 once in every 1000 tickets and $0 on the remaining 999 of 1000 tickets. The long-run average payoff is
$500 × (1/1000) + $0 × (999/1000) = $0.50
or 50 cents. That number is the mean of the random variable X. (Tickets cost $1, so in the long run, the state keeps half the money you wager.)
If you play Tri-State Pick 3 several times, we would, as usual, call the mean of the actual amounts you win x̄. The mean we just calculated, $0.50, is a different quantity: it is the long-run average winnings you expect if you play a very large number of times.
4.16 Find the mean of the probability distribution. You toss a fair coin. If the outcome is heads, you win $20; if the outcome is tails, you win nothing. Let X be the amount that you win in a single toss of a coin. Find the probability distribution of this random variable and its mean.
Just as probabilities are an idealized description of long-run proportions, the mean of a probability distribution describes the long-run average outcome. We can't call this mean x̄, because it is not the average of a sample of observed outcomes; it is a property of the probability distribution itself. The common symbol for the mean of a probability distribution is μ, the Greek letter mu. To remind ourselves that we are talking about the mean of X, we often write μ_X rather than simply μ. In Example 4.33, μ_X = $0.50.
The mean of any discrete random variable is found just as in Example 4.33. It is an average of the possible outcomes—but it is a weighted average in which each outcome is weighted by its probability. Because the probabilities add to 1, we have total weight 1 to distribute among the outcomes. An outcome that occurs half the time has probability one-half and gets one-half the weight in calculating the mean. Here is the general definition.

Mean of a Discrete Random Variable. Suppose that X is a discrete random variable whose distribution is

Value of X | x_1 | x_2 | ... | x_k |
Probability | p_1 | p_2 | ... | p_k |

To find the mean of X, multiply each possible value by its probability and then add all the products:

μ_X = x_1 p_1 + x_2 p_2 + ... + x_k p_k
If first digits in a set of data all have the same probability, the probability distribution of the first digit X is then
First digit X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Probability | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 |
The mean of this distribution is
μ_X = 1(1/9) + 2(1/9) + 3(1/9) + 4(1/9) + 5(1/9) + 6(1/9) + 7(1/9) + 8(1/9) + 9(1/9)
= (1/9)(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45/9 = 5
Suppose that the random digits in Example 4.34 had a different probability distribution. In Example 4.15 (page 215), we described Benford’s law as a probability distribution that describes first digits of numbers in many real situations. Let’s calculate the mean for Benford’s law.
Here is the distribution of the first digit for data that follow Benford’s law. We use the letter V for this random variable to distinguish it from the one that we studied in Example 4.34. The distribution of V is
First digit V | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Probability | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
The mean of V is
μ_V = 1(0.301) + 2(0.176) + 3(0.125) + 4(0.097) + 5(0.079) + 6(0.067) + 7(0.058) + 8(0.051) + 9(0.046) = 3.441
The mean reflects the greater probability of smaller first digits under Benford’s law than when first digits 1 to 9 are equally likely.
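The weighted-average calculation in these two examples is easy to check with a few lines of code. Here is a minimal Python sketch; the helper name discrete_mean is our own, not part of any library.

```python
# Mean of a discrete random variable: the sum of (value x probability).

def discrete_mean(values, probs):
    """Weighted average of the possible values, each weighted by its probability."""
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must add to 1"
    return sum(x * p for x, p in zip(values, probs))

digits = range(1, 10)

# Example 4.34: all nine first digits equally likely
print(discrete_mean(digits, [1 / 9] * 9))                      # 5.0

# Example 4.35: first digits following Benford's law
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
print(discrete_mean(digits, benford))                          # 3.441
```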
Figure 4.13 locates the means of X and V on the two probability histograms. Because the discrete uniform distribution of Figure 4.13(a) is symmetric, the mean lies at the center of symmetry. We can’t precisely locate the mean of the right-skewed distribution of Figure 4.13(b) by eye; calculation is needed.
Figure 4.13 Locating the mean of a discrete random variable on the probability histogram for (a) digits between 1 and 9 chosen at random; (b) digits between 1 and 9 chosen from records that obey Benford’s law.
What about continuous random variables? The probability distribution of a continuous random variable X is described by a density curve. Chapter 1 (page 49) showed how to find the mean of the distribution: it is the point at which the area under the density curve would balance if it were made out of solid material. The mean lies at the center of symmetric density curves such as the Normal curves. Exact calculation of the mean of a distribution with a skewed density curve requires advanced mathematics.12 The idea that the mean is the balance point of the distribution applies to discrete random variables as well, but in the discrete case, we have a formula that gives us this point.
We would like to estimate the mean height μ of the population of all young women. This μ is also the mean μ_X of the random variable X obtained by choosing a young woman at random and measuring her height. To estimate μ, we choose an SRS of young women and use the sample mean x̄ of their heights as our estimate.
Statistics obtained from probability samples are random variables because their values vary in repeated sampling. The sampling distributions of statistics are just the probability distributions of these random variables.
It seems reasonable to use x̄ to estimate μ. An SRS should fairly represent the population, so the mean x̄ of the sample should be somewhere near the mean μ of the population. Of course, we don't expect x̄ to be exactly equal to μ, and we realize that, if we choose another SRS, the luck of the draw will probably produce a different x̄.
If x̄ is rarely exactly right and varies from sample to sample, why is it nonetheless a reasonable estimate of the population mean? Here is one answer: if we keep on taking larger and larger samples, the statistic x̄ is guaranteed to get closer and closer to the parameter μ and then stay that close. This remarkable fact is called the law of large numbers.
The behavior of x̄ mirrors the idea of probability: in the long run, the proportion of outcomes taking any value gets close to the probability of that value, and the average outcome gets close to the population mean. Here is an example of the law of large numbers in action.
The distribution of the heights of all young women is close to the Normal distribution with mean 64.5 inches and standard deviation 2.5 inches. Suppose that μ = 64.5 inches is the exact population mean. Figure 4.14 shows the behavior of the mean height x̄ of n women chosen at random from this population. The graph plots the value of x̄ as we add women to the sample, one at a time. The line starts at the height of the first woman chosen; the mean of the first two heights gives the second point on the line in the graph, and so on.
Figure 4.14 The law of large numbers in action, Example 4.36. As we take more observations, the sample mean always approaches the mean of the population.
At first, the graph shows that the mean of the sample changes as we take more observations. Eventually, however, the mean of the observations gets close to the population mean μ = 64.5 inches and settles down at that value.
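A graph like Figure 4.14 can be reproduced by simulation. The sketch below, which assumes NumPy and Matplotlib are available, draws heights from the N(64.5, 2.5) distribution of this example and plots the running mean against the number of observations; the sample size of 1000 and the seed are arbitrary choices of ours.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)      # seed chosen arbitrarily, for reproducibility
heights = rng.normal(loc=64.5, scale=2.5, size=1000)   # heights in inches, N(64.5, 2.5)

n = np.arange(1, heights.size + 1)
running_mean = np.cumsum(heights) / n    # x-bar after 1, 2, 3, ... observations

plt.plot(n, running_mean)
plt.axhline(64.5, linestyle="--")        # the population mean
plt.xlabel("Number of observations")
plt.ylabel("Mean of first n observations")
plt.show()
```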
4.17 Use the Law of Large Numbers applet.
The Law of Large Numbers applet animates a graph like
Figure 4.14
for rolling dice. Use it to better understand the law of large
numbers by making a similar graph.
The mean μ of a random variable is the average value of the variable in two senses. By its definition, μ is the average of the possible values, weighted by their probabilities of occurring. The law of large numbers says that μ is also the long-run average of many independent observations on the variable.
The law of large numbers says broadly that the average results of many independent observations are stable and predictable. The gamblers in a casino may win or lose, but the casino will win in the long run because the law of large numbers says what the average outcome of many thousands of bets will be. An insurance company deciding how much to charge for life insurance and a fast-food restaurant deciding how many beef patties to prepare also rely on the fact that averaging over many individuals produces a stable result. It is worth the effort to think a bit more closely about such an important fact.
Both the rules of probability and the law of large numbers describe the regular behavior of chance phenomena in the long run. Psychologists have discovered that our intuitive understanding of randomness is quite different from the true laws of chance.13 For example, most people believe in an incorrect “law of small numbers.” That is, we expect even short sequences of random events to show the kind of average behavior that in fact appears only in the long run.
Some teachers of statistics begin a course by asking students to toss a coin 50 times and bring the sequence of heads and tails to the next class. The teacher then announces which students just wrote down a random-looking sequence rather than actually tossing a coin. The faked tosses don’t have enough “runs” of consecutive heads or consecutive tails. Runs of the same outcome don’t look random to us but are, in fact, common. For example, the probability of a run of three or more consecutive heads or tails in just 10 tosses is greater than 0.8.14 The runs of consecutive heads or consecutive tails that appear in real coin tossing (and that are predicted by the mathematics of probability) seem surprising to us. Because we don’t expect to see long runs, we may conclude that the coin tosses are not independent or that some influence is disturbing the random behavior of the coin.
Belief in the law of small numbers influences behavior. If a basketball player makes several consecutive shots, both the fans and her teammates believe that she has a “hot hand” and is more likely to make the next shot. This is doubtful.
Careful study suggests that runs of baskets made or missed are no more frequent in basketball than would be expected if each shot were independent of the player’s previous shots. Baskets made or missed are just like heads and tails in tossing a coin. (Of course, some players make 30% of their shots in the long run and others make 50%, so a coin-toss model for basketball must allow coins with different probabilities of a head.) Our perception of hot or cold streaks simply shows that we don’t perceive random behavior very well.15
Our intuition doesn’t do a good job of distinguishing random
behavior from systematic influences. This is also true when we
look at data.
We need statistical inference to supplement exploratory analysis of
data because probability calculations can help verify that what we
see in the data is more than a random pattern.
The law of large numbers says that the actual mean outcome of many trials gets close to the distribution mean μ as more trials are made. It doesn't say how many trials are needed to guarantee a mean outcome close to μ. That depends on the variability of the random outcomes: the more variable the outcomes, the more trials are needed to ensure that the mean outcome is close to the distribution mean.
Serious gamblers sometimes follow a system of betting in which the
amount bet on each play depends on the outcome of previous plays.
You might, for example, double your bet on each spin of the
roulette wheel until you win—or, of course, until your fortune is
exhausted. Such a system tries to take advantage of the fact that
you have a memory even though the roulette wheel does not. Can you
beat the odds with a system based on the outcomes of past plays?
No. Mathematicians have established a stronger version of the law of large numbers that says that, if you do not have an infinite fortune to gamble with, your long-run average winnings remain the same as long as successive trials of the game (such as spins of the roulette wheel) are independent of each other.
You are in charge of a process that manufactures video screens for computer monitors. Your equipment measures the tension on the metal mesh that lies behind each screen and is critical to its image quality. You want to estimate the mean tension μ for the process, so you measure the tension on a sample of screens and use the sample mean x̄ of those measurements as your estimate.
We occasionally need to apply a linear transformation to a random variable, forming a new variable a + bX (to change units of measurement, for example), or to combine two random variables by adding them. What happens to the mean in these situations? Here are two simple examples that suggest the rules.
You are studying flaws in the painted finish of refrigerators made
by your firm. Dimples and paint sags are two kinds of surface flaw.
Not all refrigerators have the same number of dimples: many have
none, some have one, some two, and so on. You ask for the average
number of imperfections on a refrigerator. The inspectors report
finding an average of 0.7 dimples and 1.4 sags per refrigerator. How
many total imperfections of both kinds (on the average) are there on
a refrigerator? That’s easy: if the average number of dimples is 0.7
and the average number of sags is 1.4, then counting both gives an
average of 0.7 + 1.4 = 2.1 flaws per refrigerator.
In more formal language, the number of dimples on a refrigerator is a
random variable X that varies as we inspect one refrigerator
after another. We know only that the mean number of dimples is μ_X = 0.7. The number of sags is a second random variable Y having mean μ_Y = 1.4. (As usual, the subscripts keep straight which variable we are talking about.) The total number of dimples and sags on a refrigerator is the sum X + Y, and its mean is just the sum of the two individual means, μ_X + μ_Y = 2.1.
That’s an important rule for how means of random variables behave. Here’s another example describing a different rule.
The crickets living in a field have a mean length of 1.2 inches. What is the mean in centimeters? There are 2.54 centimeters in an inch, so the length of a cricket in centimeters is 2.54 times its length in inches. If we multiply every observation by 2.54, we also multiply their average by 2.54. The mean in centimeters must therefore be 2.54 × 1.2 = 3.05 centimeters.
The point of these examples is that means behave like averages. Here are the rules written in more formal language.

Rules for Means.
Rule 1. If X is a random variable and a and b are fixed numbers, then
μ_{a+bX} = a + bμ_X
Rule 2. If X and Y are random variables, then
μ_{X+Y} = μ_X + μ_Y
Rule 3. If X and Y are random variables, then
μ_{X−Y} = μ_X − μ_Y
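As a quick numerical check, here is a short Python sketch that applies these rules to the two informal examples above (the refrigerator flaws and the cricket lengths); the variable names are ours.

```python
# Rule 2 (sums): the mean of X + Y is the sum of the means.
mean_dimples, mean_sags = 0.7, 1.4
print(mean_dimples + mean_sags)     # 2.1 flaws per refrigerator, on average

# Rule 1 (linear transformation): multiplying X by b multiplies its mean by b.
mean_length_inches = 1.2
print(2.54 * mean_length_inches)    # about 3.05 centimeters
```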
In Exercise 4.35 (page 233), you described the probability distribution of the number of courses taken in the fall by students at a small liberal arts college. Here is the distribution:
Courses in the fall | 1 | 2 | 3 | 4 | 5 | 6 |
Probability | 0.06 | 0.06 | 0.12 | 0.20 | 0.41 | 0.15 |
For the spring semester, the distribution is a little different:
Courses in the spring | 1 | 2 | 3 | 4 | 5 | 6 |
Probability | 0.06 | 0.08 | 0.15 | 0.25 | 0.34 | 0.12 |
For a randomly selected student, let X be the number of courses taken in the fall semester and let Y be the number of courses taken in the spring semester. The means of these random variables are
μ_X = 1(0.06) + 2(0.06) + 3(0.12) + 4(0.20) + 5(0.41) + 6(0.15) = 4.29
μ_Y = 1(0.06) + 2(0.08) + 3(0.15) + 4(0.25) + 5(0.34) + 6(0.12) = 4.09
The mean course load for the fall is 4.29 courses, and the mean
course load for the spring is 4.09 courses. We assume that these
distributions apply to students who earned credit for courses taken
in the fall and the spring semesters. The mean of the total number
of courses taken for the academic year is
μ_{X+Y} = μ_X + μ_Y = 4.29 + 4.09 = 8.38
Note that it is not possible for a student to take 8.38 courses in an academic year. This number is the mean of the probability distribution.
In the previous exercise, we examined the number of courses taken in
the fall and in the spring at a small liberal arts college. Suppose
that we were interested in the total number of credit hours earned
for the academic year. We assume that for each course taken at this
college, three credit hours are earned. Let T be the total number of credit hours earned for the academic year, so that T = 3(X + Y). What is the mean of the distribution of T? To find the answer, we can use Rule 1 with a = 0 and b = 3:
μ_T = 3μ_{X+Y} = 3(8.38) = 25.14
The mean of the distribution of the total number of credit hours earned is 25.14.
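Here is a small Python check of the course-load calculations in the last two examples; discrete_mean is the same kind of weighted-average helper used earlier and is our own name, not a library function.

```python
def discrete_mean(values, probs):
    """Weighted average: each value weighted by its probability."""
    return sum(x * p for x, p in zip(values, probs))

courses = [1, 2, 3, 4, 5, 6]
fall   = [0.06, 0.06, 0.12, 0.20, 0.41, 0.15]
spring = [0.06, 0.08, 0.15, 0.25, 0.34, 0.12]

mu_x = discrete_mean(courses, fall)     # 4.29
mu_y = discrete_mean(courses, spring)   # 4.09

mu_total = mu_x + mu_y                  # addition rule for means: 8.38 courses
mu_credits = 3 * mu_total               # Rule 1 with a = 0, b = 3: 25.14 credit hours
print(mu_x, mu_y, mu_total, mu_credits)
```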
4.18 Find
4.19 Find
The mean is a measure of the center of a distribution. A basic
numerical description requires, in addition, a measure of the spread
or variability of the distribution. The variance and the standard
deviation are the measures of spread that accompany the choice of the
mean to measure center. Just as for the mean, we need a distinct
symbol to distinguish the variance of a random variable from the
variance s² of a set of data. We write the variance of a random variable X as σ²_X; once again, the subscript reminds us which variable we are talking about.
The definition of the variance σ²_X of a random variable is similar to the definition of the sample variance s² given in Chapter 1: both are averages of squared deviations from the mean.
For discrete random variables, the average we use is a weighted average in which each outcome is weighted by its probability. Calculating this weighted average is straightforward. For continuous random variables, however, advanced mathematics is needed. Here is the formula for the discrete case.

Variance of a Discrete Random Variable. Suppose that X is a discrete random variable whose distribution is

Value of X | x_1 | x_2 | ... | x_k |
Probability | p_1 | p_2 | ... | p_k |

and that μ_X is the mean of X. The variance of X is

σ²_X = (x_1 − μ_X)² p_1 + (x_2 − μ_X)² p_2 + ... + (x_k − μ_X)² p_k

The standard deviation σ_X of X is the square root of the variance.
In Example 4.42, we saw that the distribution of the number X of fall courses taken by students at a small liberal arts college is
Courses in the fall | 1 | 2 | 3 | 4 | 5 | 6 |
Probability | 0.06 | 0.06 | 0.12 | 0.20 | 0.41 | 0.15 |
We can find the mean and variance of X by arranging the
calculation in the form of a table. Both μ_X and σ²_X are sums of columns in this table.
x_i | p_i | x_i p_i | (x_i − μ_X)² p_i |
1 | 0.06 | 0.06 | (1 − 4.29)²(0.06) = 0.649446 |
2 | 0.06 | 0.12 | (2 − 4.29)²(0.06) = 0.314646 |
3 | 0.12 | 0.36 | (3 − 4.29)²(0.12) = 0.199692 |
4 | 0.20 | 0.80 | (4 − 4.29)²(0.20) = 0.016820 |
5 | 0.41 | 2.05 | (5 − 4.29)²(0.41) = 0.206681 |
6 | 0.15 | 0.90 | (6 − 4.29)²(0.15) = 0.438615 |
 | | μ_X = 4.29 | σ²_X = 1.825900 |
We see that μ_X = 4.29 and σ²_X = 1.8259. The standard deviation of X is the square root of the variance, σ_X = √1.8259 = 1.3513.
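The table calculation above can be verified in a few lines of Python; discrete_variance below is our own helper implementing the weighted average of squared deviations.

```python
def discrete_mean(values, probs):
    return sum(x * p for x, p in zip(values, probs))

def discrete_variance(values, probs):
    """Weighted average of squared deviations from the mean."""
    mu = discrete_mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

courses = [1, 2, 3, 4, 5, 6]
fall = [0.06, 0.06, 0.12, 0.20, 0.41, 0.15]

var_x = discrete_variance(courses, fall)
print(var_x)            # 1.8259
print(var_x ** 0.5)     # about 1.3513
```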
4.20 Find the variance and the standard deviation. The random variable X has the following probability distribution:
Value of X | 0 | 4 |
Probability | 0.2 | 0.8 |
Find the variance σ²_X and the standard deviation σ_X of X. Show your work.
What are the facts for variances that parallel Rules 1, 2, and 3 for
means?
The mean of a sum of random variables is always the sum of their
means, but this addition rule is true for variances only in special
situations.
The following example explains why.
Take X to be the percent of a family’s after-tax income that
is spent, and take Y to be the percent that is saved. When
X increases, Y decreases by the same amount. Though
X and Y may vary widely from year to year, their sum X + Y is always 100% and does not vary at all. It is the association between X and Y that prevents their variances from adding.
If random variables are independent, this kind of association between their values is ruled out, and their variances do add. Two random variables X and Y are independent if knowing that any event involving X alone did or did not occur tells us nothing about the occurrence of any event involving Y alone.
Probability models often assume independence when the random variables describe outcomes that appear unrelated to each other. You should ask in each instance whether the assumption of independence seems reasonable.
When random variables are not independent, the variance of their sum
depends on the correlation between them as well as on their
individual variances. In
Chapter 2, we met the
correlation r between two observed variables measured on the
same individuals. We defined the correlation r as an average of
the products of the standardized x and y observations.
The correlation between two random variables is defined in the same
way. We won’t give the details; it is enough to know that the
correlation between two random variables has the same basic properties
as the correlation r calculated from data. We use ρ, the Greek letter rho, for the correlation between two random variables.
Refer to the family finances setting in
Example 4.45. If X is the percent of a family’s after-tax income that is
spent and Y is the percent that is saved, then Y = 100 − X. This is a perfect straight-line relationship with a negative slope, so the correlation between X and Y is ρ = −1.
With the correlation at hand, we can state the rules for manipulating variances.

Rules for Variances of Random Variables.
Rule 1. If X is a random variable and a and b are fixed numbers, then
σ²_{a+bX} = b²σ²_X
Rule 2. If X and Y are independent random variables, then
σ²_{X+Y} = σ²_X + σ²_Y
σ²_{X−Y} = σ²_X + σ²_Y
This is the addition rule for variances of independent random variables.
Rule 3. If X and Y have correlation ρ, then
σ²_{X+Y} = σ²_X + σ²_Y + 2ρσ_Xσ_Y
σ²_{X−Y} = σ²_X + σ²_Y − 2ρσ_Xσ_Y
This is the general addition rule for variances of random variables.
Because a variance is the average of squared deviations from the mean, multiplying X by a constant b multiplies σ²_X by the square of the constant (the sign of b does not matter, because it is squared). Adding a constant a to a random variable changes its mean but does not change its variability, so the constant a in Rule 1 has no effect on the variance. Because the square of −1 is 1, the addition rule for independent variables says that the variance of a difference X − Y is the sum of the two variances, not their difference.
As with data, we prefer the standard deviation to the variance as a
measure of the variability of a random variable.
Rule 2 for variances implies that standard deviations of
independent random variables do
not
add. To combine standard deviations, use the rules for
variances.
For example, the standard deviations of 2X and −2X are both equal to 2σ_X, because this is the square root of the common variance 4σ²_X.
The payoff X of a $1 ticket in the Tri-State Pick 3 game is $500 with probability 1/1000 and 0 the rest of the time. Here is the combined calculation of mean and variance:
x_i | p_i | x_i p_i | (x_i − μ_X)² p_i |
0 | 0.999 | 0 | (0 − 0.5)²(0.999) = 0.24975 |
500 | 0.001 | 0.5 | (500 − 0.5)²(0.001) = 249.50025 |
 | | μ_X = 0.5 | σ²_X = 249.75 |
The mean payoff is 50 cents. The standard deviation is
σ_X = √249.75 = $15.80
The standard deviation is large compared with the mean: games of chance are risky because their outcomes are highly variable.
If you buy a Pick 3 ticket, your winnings are W = X − 1, because the dollar you paid for the ticket must be subtracted from the payoff. By the rules for means, the mean of the winnings is
μ_W = μ_X − 1 = −$0.50
That is, you lose an average of a half dollar on a ticket. The rules for variances remind us that the variance and standard deviation of the winnings W = X − 1 are the same as those of the payoff X, because subtracting the fixed ticket cost changes the mean but not the variability.
Using the information from the previous example, let’s express the
winnings in cents. The winnings in cents are 100X, and the
ticket cost is 100 cents. Our winnings in cents is the random
variable
Suppose now that you buy a $1 ticket on each of two different days. The payoffs X and Y on the two tickets are independent because separate drawings are held each day. Your total payoff is X + Y. The mean of the payoff for the two tickets is
μ_{X+Y} = μ_X + μ_Y = $0.50 + $0.50 = $1.00
Because X and Y are independent, the variance of X + Y is, by the addition rule for variances,
σ²_{X+Y} = σ²_X + σ²_Y = 249.75 + 249.75 = 499.5
The standard deviation of the total payoff is
σ_{X+Y} = √499.5 = $22.35
This is not the same as the sum of the individual standard deviations, which is $15.80 + $15.80 = $31.60. Variances of independent random variables add; standard deviations, in general, do not.
Suppose that we buy 20 lottery tickets. The same rules for means and variances of independent random variables work for any number of random variables. If the payoffs on the 20 tickets are independent, the mean total payoff is 20 × $0.50 = $10.00, the variance is 20 × 249.75 = 4995, and the standard deviation is √4995 = $70.68.
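The rules give exact answers, and a simulation illustrates the law of large numbers confirming them. This Python sketch assumes NumPy is available and simulates many independent sets of 20 Pick 3 tickets; the number of repetitions and the seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(seed=2)     # arbitrary seed, for reproducibility
n_repeats, n_tickets = 200_000, 20

# Each ticket pays $500 with probability 0.001 and $0 otherwise.
payoffs = np.where(rng.random((n_repeats, n_tickets)) < 0.001, 500.0, 0.0)
totals = payoffs.sum(axis=1)            # total payoff on 20 tickets

print(totals.mean())    # close to the exact mean, 20 x $0.50 = $10.00
print(totals.std())     # close to the exact standard deviation, sqrt(4995) = $70.68
```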
When we add random variables that are correlated, we need to use the correlation for the calculation of the variance but not for the calculation of the mean. Here is an example.
Consider a household where the monthly bill for natural gas averages $125, with a standard deviation of $75, while the monthly bill for electricity averages $174, with a standard deviation of $41. The correlation between the two bills is negative.
Let’s compute the mean and standard deviation of the sum of the
natural-gas bill and the electricity bill. We let X stand for
the natural-gas bill and Y stand for the electricity bill.
Then the total is X + Y, and by the addition rule for means,
μ_{X+Y} = μ_X + μ_Y = $125 + $174 = $299
To find the standard deviation, we first find the variance and then take the square root. From the general addition rule for variances of random variables,
σ²_{X+Y} = σ²_X + σ²_Y + 2ρσ_Xσ_Y = (75)² + (41)² + 2ρ(75)(41)
Because the correlation ρ between the two bills is negative, the last term reduces the variance below the sum of the two individual variances. Substituting the household's correlation and taking the square root gives the standard deviation.
The total of the natural-gas bill and the electricity bill has mean $299 and standard deviation $63.
The negative correlation in Example 4.51 is due to the fact that, in this household, natural gas is used for heating, and electricity is used for air-conditioning. So, when it is warm, the electricity charges are high, and the natural-gas charges are low. When it is cool, the reverse is true. This causes the standard deviation of the sum to be less than it would be if the two bills were uncorrelated (see Exercise 4.68, page 252).
There are situations where we need to combine several of our rules to find means and standard deviations. Here is an example.
To get enough calcium for optimal bone health, tablets containing calcium are often recommended to supplement the calcium in the diet. One study designed to evaluate the effectiveness of a supplement followed a group of young people for seven years. Each subject was assigned to take either a tablet containing 1000 milligrams of calcium per day (mg/d) or a placebo tablet that was identical except that it had no calcium.16 A major problem with studies like this one is compliance: subjects do not always take the treatments assigned to them.
In this study, the compliance rate declined to about 47% toward the end of the seven-year period. The standard deviation of compliance was 22%. Calcium from the diet averaged 850 mg/d, with a standard deviation of 330 mg/d. The correlation between compliance and dietary intake was 0.68. Let’s find the mean and standard deviation for the total calcium intake. We let S stand for the intake from the supplement and D stand for the intake from the diet.
We start with the intake from the supplement. Because the compliance is 47% and the amount in each tablet is 1000 mg, the mean for S is
μ_S = 0.47 × 1000 = 470 mg/d
Because the standard deviation of the compliance is 22%, the variance of S is
σ²_S = (1000)²(0.22)² = 48,400
The standard deviation is σ_S = √48,400 = 220 mg/d.
Be sure to verify which rules for means and variances are used in these calculations.
We can now find the mean and standard deviation for the total intake. The mean is
μ_{S+D} = μ_S + μ_D = 470 + 850 = 1320 mg/d
the variance is
σ²_{S+D} = σ²_S + σ²_D + 2ρσ_Sσ_D = 48,400 + (330)² + 2(0.68)(220)(330) = 256,036
and the standard deviation is
σ_{S+D} = √256,036 = 506 mg/d
The mean of the total calcium intake is 1320 mg/d, and the standard deviation is 506 mg/d.
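The arithmetic in this example combines Rule 1 (for the supplement intake, 1000 mg times the compliance) with the general addition rule for correlated random variables. Here is a short Python sketch of the same calculation; the variable names are ours.

```python
import math

# Supplement intake S = 1000 mg x compliance (Rule 1: mean and sd both scale by 1000).
mu_s = 1000 * 0.47            # 470 mg/d
sd_s = 1000 * 0.22            # 220 mg/d

# Dietary intake D and its correlation with the supplement intake.
mu_d, sd_d = 850, 330
rho = 0.68

mu_total = mu_s + mu_d                                     # 1320 mg/d
var_total = sd_s**2 + sd_d**2 + 2 * rho * sd_s * sd_d      # general addition rule
print(mu_total, var_total, math.sqrt(var_total))           # 1320, 256036, about 506
```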
The correlation in this example illustrates an unfortunate fact about compliance and having an adequate diet. Some of the subjects in this study have diets that provide an adequate amount of calcium, while others do not. The positive correlation between compliance and dietary intake tells us that those who have relatively high dietary intakes are more likely to take the assigned supplements. On the other hand, subjects with relatively low dietary intakes, the ones who need the supplement the most, are less likely to take the assigned supplements.
The probability distribution of a random variable X, like
a distribution of data, has a mean μ_X and a standard deviation σ_X.
The law of large numbers says that the average of the
values of X observed in many trials must approach μ_X.
The mean μ_X is the average of the possible values of X, each weighted by its probability of occurring:
μ_X = x_1 p_1 + x_2 p_2 + ... + x_k p_k
The variance σ²_X is the average of the squared deviations of the values of X from their mean:
σ²_X = (x_1 − μ_X)² p_1 + (x_2 − μ_X)² p_2 + ... + (x_k − μ_X)² p_k
The standard deviation σ_X is the square root of the variance. It measures the variability of the distribution about the mean.
The mean and variance of a continuous random variable can be computed from the density curve, but to do so requires more advanced mathematics.
The means and variances of random variables obey the following rules. If a and b are fixed numbers, then
μ_{a+bX} = a + bμ_X
σ²_{a+bX} = b²σ²_X
If X and Y are any two random variables having correlation ρ, then
μ_{X+Y} = μ_X + μ_Y
μ_{X−Y} = μ_X − μ_Y
σ²_{X+Y} = σ²_X + σ²_Y + 2ρσ_Xσ_Y
σ²_{X−Y} = σ²_X + σ²_Y − 2ρσ_Xσ_Y
If X and Y are independent, then ρ = 0. In that case,
σ²_{X+Y} = σ²_X + σ²_Y
σ²_{X−Y} = σ²_X + σ²_Y
To find the standard deviation, take the square root of the variance.
4.56 Different kinds of means. Explain the difference between the mean of a random variable and the mean of a sample.
4.57 Find the mean of the random variable. A random variable X has the following distribution:
X | | | 0 | 1 |
Probability | 0.1 | 0.2 | 0.4 | 0.3 |
Find the mean for this random variable. Show your work.
4.58 Servings of fruits and vegetables. The following table gives the distribution of the number of servings of fruits and vegetables consumed per day in a population:
Number of servings X | 0 | 1 | 2 | 3 | 4 | 5 |
Probability | 0.4 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 |
Find the mean for this random variable.
4.59 Explain what happens when the sample size gets large. Consider the following scenarios: (1) You take a sample of two observations on a random variable and compute the sample mean, (2) you take a sample of 100 observations on the same random variable and compute the sample mean, (3) you take a sample of 1000 observations on the same random variable and compute the sample mean. Explain in simple language how close you expect the sample mean to be to the mean of the random variable as you move from Scenario 1 to Scenario 2 to Scenario 3.
4.60 What’s wrong? In each of the following scenarios, there is something wrong. Describe what is wrong and give a reason for your answer.
If you toss a fair coin three times and get heads all three times, then the probability of getting a tail on the next toss is much greater than one-half.
If you multiply a random variable by 10, then the mean is multiplied by 10 and the variance is multiplied by 10.
When finding the mean of the sum of two random variables, you need to know the correlation between them.
4.61 Find some means. Suppose that X is a random variable with mean 20 and standard deviation 2. Also suppose that Y is a random variable with mean 40 and standard deviation 7. Assume that the correlation between X and Y is zero. Find the mean of the random variable Z for each of the following cases. Be sure to show your work.
4.62 Mean of the distribution for the number of aces. In Exercise 4.47 (page 234) you examined the probability distribution for the number of aces when you are dealt two cards in the game Texas hold ’em. Let X represent the number of aces in a randomly selected deal of two cards in this game. Here is the probability distribution for the random variable X:
Value of X | 0 | 1 | 2 |
Probability | 0.8507 | 0.1448 | 0.0045 |
Find the mean μ_X for this random variable.
4.63 Find the variance and the standard deviation. A random variable X has the following distribution:
X | | | 0 | 1 |
Probability | 0.1 | 0.2 | 0.4 | 0.3 |
Find the variance and the standard deviation for this random variable. Show your work.
4.64 Standard deviation of the number of aces. Refer to Exercise 4.62. Find the standard deviation of the number of aces.
4.65 Find some variances and standard deviations. Suppose that X is a random variable with mean 20 and standard deviation 3. Also suppose that Y is a random variable with mean 60 and standard deviation 2. Assume that the correlation between X and Y is zero. Find the variance and the standard deviation of the random variable Z for each of the following cases. Be sure to show your work.
4.66 Standard deviation for fruits and vegetables. Refer to Exercise 4.58. Find the variance and the standard deviation for the distribution of the number of servings of fruits and vegetables.
4.67 What happens if the correlation is not zero?
Suppose that X is a random variable with mean 20 and
standard deviation 3. Also suppose that Y is a random
variable with mean 60 and standard deviation 2. Assume that the
correlation between X and Y is 0.4. Find the
variance and the standard deviation of the random
variable Z for each of the following cases. Be sure to
show your work.
4.68 Suppose that the correlation is zero. Refer to Example 4.51 (page 248).
Recompute the standard deviation for the total of the natural-gas bill and the electricity bill, assuming that the correlation is zero.
Is this standard deviation larger or smaller than the standard deviation computed in Example 4.51? Explain why.
4.69 Find the mean of the sum.
Figure 4.12
(page 235)
displays the density curve of the sum Y.
The mean of a continuous random variable is the balance point of its density curve. Use this fact to find the mean of Y from Figure 4.12.
Use the same fact to find the means of
4.70 Calcium supplements and calcium in the diet. Refer to Example 4.52 (page 249). Suppose that people who have high intakes of calcium in their diets are more compliant than those who have low intakes. What effect would this have on the calculation of the standard deviation for the total calcium intake? Explain your answer.
4.71 Toss a four-sided die twice.
Role-playing games like Dungeons & Dragons use many
different types of dice. Suppose that a four-sided die has faces
marked 1, 2, 3, and 4. The intelligence of a character is
determined by rolling this die twice and adding 1 to the sum of
the spots. The faces are equally likely, and the two rolls are
independent. What is the average (mean) intelligence for such
characters? How spread out are their intelligences, as measured
by the standard deviation of the distribution?
4.72 Means and variances of sums. The rules for means and variances allow you to find the mean and variance of a sum of random variables without first finding the distribution of the sum, which is usually much harder to do.
A single toss of a balanced coin has either 0 or 1 head, each with probability 1/2. What are the mean and standard deviation of the number of heads?
Toss a coin four times. Use the rules for means and variances to find the mean and standard deviation of the total number of heads.
Example 4.27 (page 227) finds the distribution of the number of heads in four tosses. Find the mean and standard deviation from this distribution. Your results in parts (b) and (c) should agree.
4.73 What happens when the correlation is 1?
We know that variances add if the random variables involved are uncorrelated (ρ = 0), but not otherwise. The opposite extreme is perfect positive correlation (ρ = 1). Use the general addition rule for variances to show that, in this case, the standard deviations add: σ_{X+Y} = σ_X + σ_Y.
4.74 Will you assume independence? In which of the following games of chance would you be willing to assume independence of X and Y in making a probability model? Explain your answer in each case.
In blackjack, you are dealt two cards and examine the total points X on the cards (face cards count 10 points). You can choose to be dealt another card and compete based on the total points Y on all three cards.
In craps, the betting is based on successive rolls of two dice. X is the sum of the faces on the first roll, and Y the sum of the faces on the next roll.
4.75 Transform the distribution of heights from centimeters to inches. A report of the National Center for Health Statistics says that the heights of 20-year-old men have mean 176.8 centimeters (cm) and standard deviation 7.2 cm. There are 2.54 centimeters in an inch. What are the mean and standard deviation in inches?
4.76 Fire insurance. An insurance company looks
at the records for millions of homeowners and sees that the mean
loss from fire in a year is
4.77 Mean and standard deviation for 5 policies and for 20
policies.
In fact, the insurance company in the previous exercise sees
that in the entire population of homeowners, the mean loss from
fire is