5.44 The cost of Internet access. In Canada, households spent an average of $54.17 CDN monthly for high-speed Internet access.24 Assume that the standard deviation is $17.83. If you ask an SRS of 500 Canadian households with high-speed Internet how much they pay, what is the probability that the average amount will exceed $55?
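One way to set up this kind of calculation is sketched below. It assumes the sample mean of 500 households is approximately Normal with the stated mean and with standard deviation 17.83 divided by the square root of 500; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu = 54.17      # population mean monthly cost, from the exercise
sigma = 17.83   # assumed population standard deviation, from the exercise
n = 500         # sample size

# Standard deviation of the sample mean
se = sigma / sqrt(n)

# Normal approximation to the sampling distribution of the sample mean
prob_exceeds_55 = 1 - NormalDist(mu, se).cdf(55)

print(f"standard deviation of the sample mean: {se:.4f}")
print(f"approximate P(sample mean > 55): {prob_exceeds_55:.4f}")
```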
5.45 Dust in coal mines.
A laboratory weighs filters from a coal mine to measure the amount
of dust in the mine atmosphere. Repeated measurements of the
weight of dust on the same filter vary Normally, with standard
deviation
The laboratory reports the mean of three weighings of this filter. What is the distribution of this mean?
What is the probability that the laboratory will report a weight of 137.12 mg or higher for this filter?
5.46 The effect of sample size on the standard deviation. Assume that the standard deviation in a very large population is 100.
Calculate the standard deviation for the sample mean for samples of size 1, 4, 25, 100, 250, 500, 1000, and 5000.
Graph your results with the sample size on the x axis and the standard deviation on the y axis.
Summarize the relationship between the sample size and the standard deviation that your graph shows.
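The calculation in this exercise can be tabulated with a few lines of code. This is a minimal sketch that assumes the usual formula, population standard deviation divided by the square root of the sample size; plotting is left out so that only the standard library is needed.

```python
from math import sqrt

sigma = 100  # population standard deviation, from the exercise
sizes = [1, 4, 25, 100, 250, 500, 1000, 5000]

# Standard deviation of the sample mean for each sample size
sd_of_mean = {n: sigma / sqrt(n) for n in sizes}

for n, sd in sd_of_mean.items():
    print(f"n = {n:>4}: standard deviation of the sample mean = {sd:.3f}")
```

The printed pairs can be passed to any plotting tool to draw the graph the exercise asks for.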
5.47 Monitoring the emerald ash borer. The emerald ash borer is a beetle that poses a serious threat to ash trees. Purple traps are often used to detect or monitor populations of this pest. In the counties of your state where the beetle is present, thousands of traps are used to monitor the population. These traps are checked periodically. The distribution of beetle counts per trap is discrete and strongly skewed. A majority of traps have no beetles, and only a few will have more than two beetles. For this exercise, assume that the mean number of beetles trapped is 0.43, with a standard deviation of 0.95.
Suppose that your state does not have the resources to check
all the traps, so it plans to check only an SRS of
Use the central limit theorem to find the probability that the average number of beetles in 150 traps is greater than 0.55.
Do you think it is appropriate in this situation to use the central limit theorem? Explain your answer.
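A sketch of the central limit theorem calculation for the trap counts follows. It takes the mean 0.43, standard deviation 0.95, and sample size 150 from the exercise and treats the sample mean as approximately Normal, which is exactly the assumption the last question asks you to examine.

```python
from math import sqrt
from statistics import NormalDist

mu = 0.43     # mean beetles per trap, from the exercise
sigma = 0.95  # standard deviation per trap, from the exercise
n = 150       # number of traps in the sample

se = sigma / sqrt(n)  # standard deviation of the sample mean

# Central limit theorem: treat the sample mean as approximately Normal
prob_mean_above = 1 - NormalDist(mu, se).cdf(0.55)

print(f"standard deviation of the sample mean: {se:.4f}")
print(f"approximate P(sample mean > 0.55): {prob_mean_above:.4f}")
```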
5.48 Attitudes toward drinking and studies of behavior. Some of the methods in this chapter are based on approximations rather than exact probability results. We have given rules of thumb for safe use of these approximations.
You are interested in attitudes toward drinking among the 75 members of a fraternity. You choose 30 members at random to interview. One question is “Have you had five or more drinks at one time during the past week?” Suppose that, in fact, 30% of the 75 members would say Yes. Explain why you cannot safely use the B(30, 0.3) distribution for the count X in your sample who say Yes.
The National AIDS Behavioral Surveys found that 0.2% (that’s 0.002 as a decimal fraction) of adult heterosexuals had both received a blood transfusion and had a sexual partner from a group at high risk of AIDS. Suppose that this national proportion holds for your region. Explain why you cannot safely use the Normal approximation for the sample proportion who fall in this group when you interview an SRS of 1000 adults.
5.49 Benford’s law. It is a striking fact that the first digits of numbers in legitimate records often follow a distribution known as Benford’s law. Here it is:
First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Proportion | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
Fake records usually have fewer first digits 1, 2, and 3. What is the approximate probability, if Benford’s law holds, that among 1000 randomly chosen invoices there are 575 or fewer amounts with first digit 1, 2, or 3?
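The Benford calculation can be sketched with the Normal approximation to the binomial. The code below assumes the 1000 invoices are independent, adds the three tabulated proportions to get the success probability, and reports the approximation with and without a continuity correction; neither is the exact binomial value.

```python
from math import sqrt
from statistics import NormalDist

benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

p = sum(benford[:3])   # probability that the first digit is 1, 2, or 3
n = 1000               # number of invoices

mean_count = n * p
sd_count = sqrt(n * p * (1 - p))
approx = NormalDist(mean_count, sd_count)

prob_plain = approx.cdf(575)        # Normal approximation to P(X <= 575)
prob_corrected = approx.cdf(575.5)  # same, with a continuity correction

print(f"p = {p:.3f}, mean = {mean_count:.1f}, sd = {sd_count:.3f}")
print(f"P(X <= 575) is roughly {prob_plain:.4f} "
      f"(or {prob_corrected:.4f} with continuity correction)")
```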
5.50 Watching live television. A survey of 442 people aged 18 to 29 revealed that 30% watch live television every day.25 You take a random sample of 20 undergraduates from your university and ask them whether they watch live TV every day. If their rate matches the 30% rate:
What is the distribution of the number of students who say they watch live television every day?
What is the distribution of the number of students who say that they do not watch live television every day?
What is the probability that no more than 2 of the 20 students in your sample say that they watch live television every day?
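Binomial probabilities such as the one in the last part can be computed directly from the binomial formula. The helper names below (binom_pmf, binom_cdf) are illustrative, not from the text, and the sketch assumes the 20 responses are independent with a common success probability.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X with the B(n, p) distribution."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X with the B(n, p) distribution."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 20, 0.30
prob_at_most_2 = binom_cdf(2, n, p)
print(f"P(X <= 2) for B({n}, {p}): {prob_at_most_2:.4f}")
```

Calling binom_cdf(2, 20, 0.15) gives the corresponding probability under the 15% rate considered in Exercise 5.52.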
5.51 Leaking gas tanks. Leakage from underground gasoline tanks at service stations can damage the environment. It is estimated that 25% of these tanks leak. You examine 15 tanks chosen at random, independently of each other.
What is the mean number of leaking tanks in such a sample of 15?
What is the probability that 10 or more of the 15 tanks leak?
Now you do a larger study, examining a random sample of 2000 tanks nationally. What is the probability that at least 540 of these tanks are leaking?
5.52 Watching live television, continued. Refer to Exercise 5.50. You think that the undergraduate rate of those who watch live television every day at your university is 15%.
Using this rate, what is the expected number of students in your sample who say that they watch live television every day? What is the expected number of students who say that they do not watch live television every day? You should see that these two means add to 20, the total number of students.
What is the probability that no more than 2 of the 20 students in your sample say that they watch live television every day?
Based on your answer to part (b) and your answer to part (c)
of Exercise 5.50,
which of the two rates (30% or 15%) is more supported by an
observed count
5.53 Marks per round in cricket. Cricket is a dart game that uses the numbers 15 to 20 and the bull’s-eye. Each time you hit one of these regions, you score either 0, 1, 2, or 3 marks. Thus, in a round of three throws, a person can score 0 to 9 marks. Lex plans to play 20 games. Her distribution of marks per round is discrete and strongly skewed. A majority of her rounds result in 0, 1, or 2 marks, and only a few are more than 4 marks. Assume that her mean is 2.21 marks per round, with a standard deviation of 1.90, and that her 20 games will involve 140 rounds.
What are the mean and standard deviation of the average number
of marks
Using the central limit theorem, what is the probability that Lex averages fewer than 2 marks per round?
Do you think that the probability obtained in part (b) is a good approximation to the true probability in this setting? Explain your answer.
5.54 Common last names. The U.S. Census Bureau says that the 10 most common names in the United States are (in order) Smith, Johnson, Williams, Brown, Jones, Garcia, Miller, Davis, Rodriguez, and Martinez.26 These names account for 4.9% of all U.S. residents. Out of curiosity, you look at the authors of the textbooks for your current courses. There are 12 authors in all. Would you be surprised if none of the names of these authors were among the 10 most common? Give a probability to support your answer and explain the reasoning behind your calculation.
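One possible model for this question treats the 12 authors as independent draws from the U.S. population, each with probability 0.049 of having one of the 10 most common names. That independence is an assumption, and part of the exercise is to judge whether it is reasonable. Under it, the count of common names is binomial:

```python
n_authors = 12
p_common = 0.049  # share of U.S. residents with a top-10 name, from the exercise

# Under the binomial model B(12, 0.049), P(X = 0) = (1 - p)^n
prob_none = (1 - p_common) ** n_authors
print(f"P(no author has a top-10 name): {prob_none:.3f}")
```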
5.55 Use the Normal approximation. Suppose that we toss a fair coin. Use the Normal approximation to find the probability that the sample proportion of heads is
between 0.45 and 0.55 when
between 0.48 and 0.52 when
Use these results to describe the relationship between the
sample size and the precision of the estimate
5.56 Use the Probability applet. The Probability applet simulates tosses of a coin. You can choose the number of tosses n and the probability p of a head. You can therefore use the applet to simulate binomial random variables.
The count of misclassified sales records in
Example 5.20
has the binomial distribution with
What proportion of the 25 samples had exactly 0 bad records? Do you think this sample proportion is close to the probability?
Remember that this probability of 0.2863 tells us only what happens in the long run. Here we’re considering only 25 samples. If X is the number of samples out of 25 with exactly 0 misclassified records, what is the distribution of X?
Explain how to use the distribution in part (b) to describe
the sampling distribution of
5.57 A random walk.
A particle moves along the line in a random walk. That is, the
particle starts at the origin (position 0) and moves either right
or left in independent steps of length 1. If the particle moves to
the right with probability 0.6, its movement at the ith
step is a random variable
The position of the particle after k steps is the sum of these random movements,
Use the central limit theorem to find the approximate probability that the position of the particle after 500 steps is at least 200 to the right.
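The central limit theorem calculation for the random walk can be sketched as follows, assuming each step equals +1 with probability 0.6 and -1 with probability 0.4, as the description of unit steps to the right or left suggests. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_right = 0.6
k = 500  # number of steps

# Mean and variance of a single step taking values +1 and -1
step_mean = p_right * 1 + (1 - p_right) * (-1)
step_var = 1 - step_mean**2   # E[X^2] = 1 because every step has length 1

# Position after k independent steps: means and variances add
pos_mean = k * step_mean
pos_sd = sqrt(k * step_var)

# Normal approximation for the probability of being at least 200 to the right
prob_far_right = 1 - NormalDist(pos_mean, pos_sd).cdf(200)

print(f"mean position: {pos_mean:.1f}, standard deviation: {pos_sd:.3f}")
print(f"approximate P(position >= 200): {prob_far_right:.2e}")
```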
5.58 Tossing a die. You are tossing a balanced die that has probability 1/6 of coming up 1 on each toss. Tosses are independent. We are interested in how long we must wait to get the first 1.
The probability of a 1 on the first toss is 1/6. What is the probability that the first toss is not a 1 and the second toss is a 1?
What is the probability that the first two tosses are not 1s and the third toss is a 1? This is the probability that the first 1 occurs on the third toss.
Now you see the pattern. What is the probability that the first 1 occurs on the fourth toss? On the fifth toss?
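The pattern in these parts can be checked numerically. This sketch multiplies the probability of a non-1 on each of the first k - 1 tosses by the probability of a 1 on toss k, relying on the independence of tosses stated in the exercise; the function name is illustrative.

```python
def prob_first_one_on_toss(k, p=1/6):
    """Probability that the first 1 appears on toss k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

for k in range(1, 6):
    print(f"first 1 on toss {k}: {prob_first_one_on_toss(k):.4f}")
```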
5.59 The geometric distribution.
Generalize your work in the previous exercise. You have
independent trials, each resulting in a success or a failure. The
probability of a success is p on each trial. The binomial
distribution describes the count of successes in a fixed number of
trials. Now the number of trials is not fixed; instead, continue
until you get a success. The random variable Y is the
number of the trial on which the first success occurs. What are
the possible values of Y? What is the probability
5.60 Wi-Fi interruptions. Suppose that the number of Wi-Fi interruptions on your home network follows the Poisson distribution, with an average of 1.6 Wi-Fi interruptions per day.
Show that the probability of no interruptions on a given day is 0.2019.
Treating each day as a trial in a binomial setting, use the binomial formula to compute the probability of no interruptions in a week.
Now, instead of using the binomial model, let’s use the Poisson distribution exclusively. What is the mean number of Wi-Fi interruptions during a week?
Based on the Poisson mean of part (c), use the Poisson distribution to compute the probability of no interruptions in a week. Confirm that this probability is the same as that found in part (b). Explain in words why the two ways of computing the probability of no interruptions in a week give the same result.
Explain why the binomial probability that only one day in the week is not interruption free differs from the Poisson probability that only one interruption occurs during the week.
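The two routes to the probability of an interruption-free week can be compared numerically. This sketch assumes days are independent and uses the Poisson probability function directly; poisson_pmf is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

daily_mean = 1.6  # interruptions per day, from the exercise

# Probability of no interruptions on a single day
p_day = poisson_pmf(0, daily_mean)

# Binomial route: all 7 independent days are interruption free
p_week_binomial = p_day ** 7

# Poisson route: weekly count is Poisson with mean 7 * 1.6
p_week_poisson = poisson_pmf(0, 7 * daily_mean)

print(f"P(no interruptions in a day): {p_day:.4f}")
print(f"P(no interruptions in a week), binomial: {p_week_binomial:.3e}")
print(f"P(no interruptions in a week), Poisson:  {p_week_poisson:.3e}")
```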
5.61 Poisson distribution? Suppose you find in your spam folder an average of two spam emails every 10 minutes. Furthermore, you find that the rate of spam mail from midnight to 6 a.m. is twice the rate during other parts of the day. Explain whether or not the Poisson distribution is an appropriate model for the spam process.
5.62 A lottery payoff. A $1 bet in a state lottery’s Pick 3 game pays $500 if the three-digit number you choose exactly matches the winning number, which is drawn at random. Here is the distribution of the payoff X:
Payoff X | $0 | $500 |
Probability | 0.999 | 0.001 |
Each day’s drawing is independent of other drawings.
Joe buys a Pick 3 ticket twice a week. The number of times he wins follows a B(104, 0.001) distribution. Using the Poisson approximation to the binomial, what is the probability that he wins at least once?
The exact binomial probability is 0.0988. How accurate is the Poisson approximation here?
If Joe pays $5 a ticket, he needs to win at least twice a year to come out ahead. Using the Poisson approximation, what is the probability that Joe will come out ahead?
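The Poisson approximation to B(104, 0.001) can be sketched as below, using mean np = 0.104. The exact binomial value for at least one win is computed alongside for comparison; the variable names are illustrative.

```python
from math import exp

n, p = 104, 0.001
lam = n * p  # Poisson mean used to approximate B(104, 0.001)

# P(at least one win) = 1 - P(no wins)
p_win_poisson = 1 - exp(-lam)
p_win_exact = 1 - (1 - p) ** n   # exact binomial value for comparison

# P(at least two wins) = 1 - P(0 wins) - P(1 win) under the Poisson model
p_two_or_more = 1 - exp(-lam) * (1 + lam)

print(f"Poisson P(X >= 1): {p_win_poisson:.4f}")
print(f"exact   P(X >= 1): {p_win_exact:.4f}")
print(f"Poisson P(X >= 2): {p_two_or_more:.5f}")
```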
5.63 A test for ESP. In a test for ESP (extrasensory perception), the experimenter looks at cards that are hidden from the subject. Each card contains either a star, a circle, a wave, or a square. As the experimenter looks at each of 20 cards in turn, the subject names the shape on the card.
If a subject simply guesses the shape on each card, what is the probability of a successful guess on a single card? Because the cards are independent, the count of successes in 20 cards has a binomial distribution.
What is the probability that a subject correctly guesses at least 10 of the 20 shapes?
In many repetitions of this experiment with a subject who is guessing, how many cards will the subject guess correctly, on the average? What is the standard deviation of the number of correct guesses?
A standard ESP deck actually contains 25 cards. There are 5 different shapes, each of which appears on 5 cards. The subject knows that the deck has this makeup. Is a binomial model still appropriate for the count of correct guesses in one pass through this deck? If so, what are n and p? If not, why not?
5.64 A roulette payoff. A $1 bet on a single number on a casino’s roulette wheel pays $35 if the ball ends up in the number slot you choose. Here is the distribution of the payoff X:
Payoff X | $0 | $35 |
Probability | 0.974 | 0.026 |
Each spin of the roulette wheel is independent of other spins.
What are the mean and standard deviation of X?
Sam comes to the casino weekly and bets on 10 spins of the roulette wheel. What does the law of large numbers say about the average payoff Sam receives from his bets each visit?
What does the central limit theorem say about the distribution of Sam’s average payoff after betting on 520 spins in a year?
Sam comes out ahead for the year if his average payoff is greater than $1 (the amount he bet on each spin). What is the probability that Sam ends the year ahead? The true probability is 0.396. Does using the central limit theorem provide a reasonable approximation?
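The mean and standard deviation of the payoff, and the central limit theorem approximation for the yearly average, can be sketched as follows. The code uses only the payoff table above and assumes the 520 spins are independent; its result can then be compared with the true probability of 0.396 quoted in the exercise.

```python
from math import sqrt
from statistics import NormalDist

payoffs = [0, 35]
probs = [0.974, 0.026]

# Mean and standard deviation of a single payoff X
mean_x = sum(x * p for x, p in zip(payoffs, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(payoffs, probs))
sd_x = sqrt(var_x)

# Central limit theorem for the average payoff over 520 spins
n = 520
se = sd_x / sqrt(n)
prob_ahead = 1 - NormalDist(mean_x, se).cdf(1)

print(f"mean of X: {mean_x:.2f}, standard deviation of X: {sd_x:.4f}")
print(f"approximate P(average payoff > 1): {prob_ahead:.3f}")
```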
5.65 A roulette payoff revisited. Refer to the previous exercise. In part (d), the central limit theorem was used to approximate the probability that Sam ends the year ahead. The estimate was about 0.04 too small. Let’s see if we can get closer using the Normal approximation to the binomial with the continuity correction.
If Sam plans to bet on 520 roulette spins, he needs to win at least $520 to break even. If each win gives him $35, what is the minimum number of wins m he must have?
Given
Use the information in the previous two parts to compute
5.66 Learning a foreign language. Does delaying oral practice hinder learning a foreign language? Researchers randomly assigned 25 beginning students of Russian to begin speaking practice immediately and another 25 to delay speaking for four weeks. At the end of the semester both groups took a standard test of comprehension of spoken Russian. Suppose that in the population of all beginning students, the test scores for early speaking vary according to the N(32, 6) distribution and scores for delayed speaking have the N(29, 5) distribution.
What is the sampling distribution of the mean score
If the experiment were repeated many times, what would be the
sampling distribution of the difference
What is the probability that the experiment will find (misleadingly) that the mean score for delayed speaking is at least as large as that for early speaking?
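The last calculation can be sketched as follows, assuming N(32, 6) and N(29, 5) give the mean and standard deviation of each population and that the two groups of 25 are independent. The difference of the two sample means is then Normal, with variances adding.

```python
from math import sqrt
from statistics import NormalDist

n = 25                          # students per group
mu_early, sd_early = 32, 6      # early-speaking population, N(32, 6)
mu_delay, sd_delay = 29, 5      # delayed-speaking population, N(29, 5)

# Difference (delayed mean) - (early mean): means subtract, variances add
diff_mean = mu_delay - mu_early
diff_sd = sqrt(sd_delay**2 / n + sd_early**2 / n)

# Probability the delayed-speaking mean is at least as large as the early one
prob_misleading = 1 - NormalDist(diff_mean, diff_sd).cdf(0)

print(f"difference: mean {diff_mean}, standard deviation {diff_sd:.4f}")
print(f"P(delayed mean >= early mean): {prob_misleading:.4f}")
```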
5.67 Summer employment of college students. Suppose (as is roughly true) that 88% of college men and 82% of college women were employed last summer. A sample survey interviews SRSs of 400 college men and 400 college women. The two samples are, of course, independent.
What is the approximate distribution of the proportion
The survey wants to compare men and women. What is the
approximate distribution of the difference in the proportions
who worked,
What is the probability that in the sample a higher proportion of women than men worked last summer?
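A sketch of the calculation for the difference in sample proportions follows. It assumes each sample proportion is approximately Normal and that the samples are independent, so the variances of the two proportions add; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_men, p_men = 400, 0.88
n_women, p_women = 400, 0.82

# Difference (men - women) in sample proportions, Normal approximation
diff_mean = p_men - p_women
diff_sd = sqrt(p_men * (1 - p_men) / n_men
               + p_women * (1 - p_women) / n_women)

# A higher proportion of women than men means the difference is below 0
prob_women_higher = NormalDist(diff_mean, diff_sd).cdf(0)

print(f"difference: mean {diff_mean:.2f}, standard deviation {diff_sd:.4f}")
print(f"approximate P(women's proportion > men's): {prob_women_higher:.4f}")
```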
5.68 Income of working couples. A study of
working couples measures the income X of the husband and
the income Y of the wife in a large number of couples in
which both partners are employed. Suppose that you knew the means
Is it reasonable to take the mean of the total income
Is it reasonable to take the variance of the total income to
be
5.69 More on watching live television. Consider the settings of Exercises 5.50 and 5.52.
Using the reported 30% from the survey, what is the largest
number m out of
Now using the hypothesized rate of 15% and your answer to part
(a), what is
If you were to increase the sample size from
5.70 Iron depletion without anemia and physical performance. Several studies have shown a link between iron depletion without anemia (IDNA) and physical performance. In one study, the physical performance of 24 female collegiate rowers with IDNA was compared with that of 24 female collegiate rowers with normal iron status.27 Several different measures of physical performance were studied, but we’ll focus here on training-session duration. Assume that training-session duration of female rowers with IDNA is Normally distributed, with mean 58 minutes and standard deviation 11 minutes. Training-session duration of female rowers with normal iron status is Normally distributed, with mean 69 minutes and standard deviation 18 minutes.
What is the probability that the mean duration of the 24 rowers with IDNA exceeds 63 minutes?
What is the probability that the mean duration of the 24 rowers with normal iron status is less than 63 minutes?
What is the probability that the mean duration of the 24 rowers with IDNA is greater than the mean duration of the 24 rowers with normal iron status?
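The three probabilities can be sketched with the Normal sampling distributions of the two sample means. The code assumes the two groups of 24 rowers are independent samples, so the variance of the difference of the means is the sum of the two variances.

```python
from math import sqrt
from statistics import NormalDist

n = 24
mu_idna, sd_idna = 58, 11       # rowers with IDNA
mu_normal, sd_normal = 69, 18   # rowers with normal iron status

# Sampling distributions of the two sample means
mean_idna = NormalDist(mu_idna, sd_idna / sqrt(n))
mean_normal = NormalDist(mu_normal, sd_normal / sqrt(n))

prob_idna_above_63 = 1 - mean_idna.cdf(63)
prob_normal_below_63 = mean_normal.cdf(63)

# Difference (IDNA mean) - (normal-status mean) for independent samples
diff = NormalDist(mu_idna - mu_normal,
                  sqrt(sd_idna**2 / n + sd_normal**2 / n))
prob_idna_greater = 1 - diff.cdf(0)

print(f"P(IDNA mean > 63): {prob_idna_above_63:.4f}")
print(f"P(normal-status mean < 63): {prob_normal_below_63:.4f}")
print(f"P(IDNA mean > normal-status mean): {prob_idna_greater:.4f}")
```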
5.71 Treatment and control groups.
The previous exercise illustrates a common setting for statistical
inference. This exercise gives the general form of the sampling
distribution needed in this setting. We have a sample of
n observations from a treatment group and an independent
sample of m observations from a control group. Suppose that
the response to the treatment has the
Under the assumptions given, what is the distribution of
What is the distribution of
5.72 Risks and insurance. The idea of insurance
is that we all face risks that are unlikely but carry high cost.
Think of a fire destroying your home. So we form a group to
share the risk: we all pay a small amount, and the insurance
policy pays a large amount to those few of us whose homes burn
down. An insurance company looks at the records for millions of
homeowners and sees that the mean loss from fire in a year is
Explain clearly why it would be unwise to sell only 100 policies. Then explain why selling many thousands of such policies is a safe business.
Suppose the company sells the policies for $700. If the company sells 50,000 policies, what is the approximate probability that the average loss in a year will be greater than $700?
5.73 Binge drinking.
The Centers for Disease Control and Prevention finds that 28% of
people aged 18 to 24 years binge drank. Those who binge drank
averaged 9.3 drinks per episode and 4.2 episodes per month. The
study took a sample of over 18,000 people aged 18 to 24 years,
so the population proportion of people who binge drank is very
close to
What is the sample proportion of students at your college who binge drink?
If, in fact, the proportion of all students on your campus who binge drink is the same as the national 28%, what is the probability that the proportion in an SRS of 200 students is as large as or larger than the result of the administration’s sample?
A writer for the student paper says that the percent of students who binge drink is higher on your campus than nationally. Write a short letter to the editor explaining why the survey does not support this conclusion.
5.74 The ideal number of children. “What do you think is the ideal number of children for a family to have?” A Gallup Poll asked this question of 1020 randomly chosen adults. Roughly 41% thought that a total of three or more children was ideal.29
Suppose that
What is the probability that the sample proportion
What is the probability that a sample proportion
Combine these results to make a general statement about the effect of larger samples in a sample survey.
5.75 Is the ESP result better than guessing? When the ESP study of Exercise 5.63 discovers a subject whose performance appears to be better than guessing, the study continues at greater length. The experimenter looks at many cards bearing one of five shapes (star, square, circle, wave, and cross) in an order determined by random numbers. The subject cannot see the experimenter as the experimenter looks at each card in turn, in order to avoid any possible nonverbal clues. The answers of a subject who does not have ESP should be independent observations, each with probability 1/5 of success. We record 900 attempts.
What are the mean and the standard deviation of the count of successes?
What are the mean and the standard deviation of the proportion of successes among the 900 attempts?
What is the probability that a subject without ESP will be successful in at least 24% of 900 attempts?
The researcher considers evidence of ESP to be a proportion of successes so large that there is only probability 0.01 that a subject could do this well or better by guessing. What proportion of successes must a subject have to meet this standard? (Example 1.45, on page 62, shows how to do an inverse calculation for the Normal distribution that is similar to the type required here.)
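The four parts can be sketched with the binomial mean and standard deviation and the Normal approximation for the sample proportion. The inverse calculation uses inv_cdf from Python's statistics.NormalDist; the variable names are illustrative, and the Normal approximation is an assumption rather than an exact result.

```python
from math import sqrt
from statistics import NormalDist

n, p = 900, 1/5  # attempts and success probability for a guessing subject

# Count of successes: binomial mean and standard deviation
mean_count = n * p
sd_count = sqrt(n * p * (1 - p))

# Proportion of successes
mean_prop = p
sd_prop = sqrt(p * (1 - p) / n)

# Normal approximation for the sample proportion
prop_dist = NormalDist(mean_prop, sd_prop)
prob_at_least_24pct = 1 - prop_dist.cdf(0.24)

# Inverse calculation: proportion with only 0.01 probability above it
cutoff = prop_dist.inv_cdf(0.99)

print(f"count: mean {mean_count:.0f}, standard deviation {sd_count:.2f}")
print(f"proportion: mean {mean_prop:.2f}, standard deviation {sd_prop:.4f}")
print(f"approximate P(proportion >= 0.24): {prob_at_least_24pct:.5f}")
print(f"proportion needed for the 0.01 standard: {cutoff:.4f}")
```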
5.76 How large a sample is needed? The changing
probabilities you found in
Exercise 5.74 are due
to the fact that the standard deviation of the sample proportion