Answers to Odd-Numbered Exercises

CHAPTER 1 CHECK-IN QUESTIONS

  1. 1.1 Europe; 121.

  2. 1.3 Who: The cases are Initial Coin Offerings (ICOs); there are nine cases. What: There are four variables: ID, Name, Location, and Amount (in millions of dollars). The purpose of the data is to list ICOs that have raised more than $100 million.

  3. 1.5 Answers will vary.

  4. 1.7 Answers will vary. Both tell the same story, so it is down to personal preference.

  5. 1.9 Answers will vary.

  6. 1.11 Student preferences will vary. The stemplot has the advantage of showing each individual score.

  7. 1.13 The histogram shows more detail about the distribution and has the same shape as the stemplot with split stems.

  8. 1.15

    1. California, Florida, Texas, New York.
    2. These states do not influence the distribution of undergraduate students per 1000 people.
  9. 1.17 x̄=82.9

  10. 1.19 The ordered list is: 1 6 6 6 7 7 9 9 12 12 12 12 14 17 17 18 19 23 27 29 31 43 49 67 203. M=14. Without the outlier, the median is 13; with the outlier, the median is 14. The outlier does not influence the median greatly.
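The two medians in 1.19 can be checked with a minimal sketch using Python's standard `statistics` module. It assumes the outlier is the largest value, 203; the variable names are illustrative only.

```python
from statistics import median

# Ordered data as listed in the answer to check-in question 1.19.
values = [1, 6, 6, 6, 7, 7, 9, 9, 12, 12, 12, 12, 14, 17, 17, 18, 19,
          23, 27, 29, 31, 43, 49, 67, 203]

# Median of all 25 observations (the 13th value in the ordered list).
median_with = median(values)

# Assumption: the outlier is the largest value, 203, so drop the last entry.
# With 24 observations the median is the average of the 12th and 13th values.
median_without = median(values[:-1])

print(median_with, median_without)
```

Dropping one extreme value moves the median by only one unit, which is what the answer means by the median not being greatly influenced.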

  11. 1.21 Q1=75, Q3=93.

  12. 1.25 s²=182.77, s=13.52.

  13. 1.27 With Venezuela: IQR=18; without Venezuela: IQR=15.5. The IQR is affected very little by outliers.

  14. 1.29 According to the 68–95–99.7 rule, 95% of scores will fall within μ±2σ. Therefore, 95% of NAEP scores are between 202 and 362.

  15. 1.31 For 300, z=0.45.

  16. 1.33 For X=350, z=1.70, and the proportion less than 350 is the area to the left, which is 0.9554. For the proportion greater than or equal to 350, we calculate 1−0.9554=0.0446.

  17. 1.35 To get the top 20% of students, we need to solve for the 80th percentile. The corresponding z is 0.84. So a student needs to score at least 315.6.
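The Normal calculations in 1.29 through 1.35 can be checked with a short sketch. The mean 282 and standard deviation 40 are not stated in this section; they are assumptions inferred from the answers above (for example, 282±2×40 gives 202 and 362).

```python
from statistics import NormalDist

# Assumed parameters, inferred from the answers (not given in this section).
naep = NormalDist(mu=282, sigma=40)

# 1.29: the middle 95% of scores lies within two standard deviations.
low = naep.mean - 2 * naep.stdev
high = naep.mean + 2 * naep.stdev

# 1.31: standardized score for 300.
z_300 = (300 - naep.mean) / naep.stdev

# 1.33: proportion below 350, and proportion at or above 350.
p_below_350 = naep.cdf(350)
p_at_least_350 = 1 - p_below_350

# 1.35: score that cuts off the top 20% (the 80th percentile).
cutoff_top_20 = naep.inv_cdf(0.80)

print(low, high, z_300)
print(round(p_below_350, 4), round(p_at_least_350, 4))
print(round(cutoff_top_20, 1))
```

Software uses an unrounded z for the 80th percentile (about 0.8416), so the cutoff comes out slightly above the 315.6 obtained from the table value z=0.84.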

CHAPTER 1 EXERCISES

  1. 1.1

    1. The cases are student organizations.
    2. The variables are: Whether the majority of members are undergraduate or graduate students (undergraduate or graduate); Primary advisor email address (possible values are email addresses); Meeting day (Sunday–Saturday); and Number of members (0 to the number of students enrolled at the university).
    3. Number of members is quantitative, the rest are categorical.
    4. The name of the organization is the label.
    5. Who: part (a). What: parts (b) and (c). Why: We could look at the number of members depending on when they meet or whether they are graduate or undergraduate.
  2. 1.3

    1. The cases are employed students who have graduated.
    2. The variables are: Starting salary ($0 to $100,000); Employment industry (a list of 20); State of employment (U.S. states or other country).
    3. Salary is quantitative, the others are categorical.
    4. Yes, a label was used; it is the ID numbered 1 to 1255.
    5. Who: part (a). What: parts (b) and (c). Why: We could look at the starting salary based on where they are employed or their industry.
  3. 1.5

    1. The cases are employees.
    2. No, the last name alone cannot be treated as a label because there could be multiple people with the same last name. A label must be unique to each case in the data set.
    3. Employee identification number—label, last name—categorical, first name—categorical, middle initial—categorical, department—categorical, number of years—quantitative, salary—quantitative, education—categorical, age—quantitative.
  4. 1.7 Answers will vary. This could include enrollment, graduation rate, job placement rate, in-state tuition, out-of-state tuition, public/private institution, etc.

  5. 1.9 Answers will vary.

  6. 1.11

    1. A bar graph because we are comparing a quantitative variable (minutes) with a categorical variable (day of the week).
    2. Stemplot because, assuming that the grades are the percent grades and not letter grades, the values are quantitative with stems from 0 to 10, and we can feasibly write 120 numbers in the plot. We could also use a histogram, but with a stemplot, we can see the individual grades.
    3. We could use a pie chart if we turn the values into percents for each color. A bar graph could also work here.
    4. A histogram would be best for this data because the number of students in a graduating class is quantitative. We could also use a stemplot, but we would assume that there are too many high schools in the entire state of Iowa to write down all of them individually.
  7. 1.15

    1. The distribution is skewed to the right.
    2. There appears to be one large outlier at 4213.49.
    3. The shape is roughly symmetric, the center is around 3130, the range is from 2664.38 to 4213.49.
  8. 1.17

    1. Energy is highest in January, decreases toward the spring, increases again in July and August, is lower in September and October, and increases again in December.
    2. The graph makes it much easier to see the month-to-month variability over the 12 months than a table does.
  9. 1.19 The Pareto chart is easiest to read because it allows you to quickly tell which colors had the highest votes and which had the lowest votes for least favorite color.

  10. 1.21 It is slightly easier to read the Pareto chart because the many different categories make the pie chart harder to read.

  11. 1.23

    1. Four variables: GPA, IQ, and self-concept are quantitative; gender is categorical.
    2. The histogram is slightly easier to take in at a glance with all of the GPAs.
    3. Unimodal and skewed left, centered near 7.8, spread from 0.5 to 10.8.
    4. The males have a much larger spread and a much more left-skewed distribution.
  12. 1.25 Older coins are rarer: the older the year, the less likely a coin is to still be in circulation, so you probably won’t have many of them in your pockets.

  13. 1.27 Overall times to run the Boston Marathon decrease from 1972 to about 1982 and then plateau. Times stop improving around 2006.

  14. 1.29

    1. x̄=3208.44.
    2. M=3130.37.
    3. Because the distribution is right-skewed with a potential outlier, the median is a better measure of center.
  15. 1.31

    1. s=306.68.
    2. Q1=3027.64, Q3=3286.95.
    3. Min=2664.38 (this is the smallest value), Q1=3027.64 (this value has 25% of the observations below it), M=3130.37 (this is the middle observation, or has 50% of the observations below or above it), Q3=3286.95 (this value has 75% of the observations below it), Max=4213.49 (this is the largest value).
    4. The five-number summary would be better for this distribution because it is right-skewed with a potential outlier.
  16. 1.33

    1. The distribution is right-skewed with a potential outlier.
    2. The distribution is right-skewed.
    3. Preference will vary. The only advantage of the stemplot is that it preserves the data; otherwise, the histogram is likely better. The boxplot is also fine but hides some of the details that the histogram shows.
  17. 1.35

    1. (Graphs not reproduced here.)
    2. The KPOT values are right-skewed, whereas the KSUP values are fairly symmetric. The center for KSUP is higher than the center for the KPOT. Also, the KPOT values are more spread out than the KSUP values.
    3. It is easier to compare two groups when looking at side-by-side boxplots.
  18. 1.37

    1. x̄=122.9.
    2. M=102.5.
    3. The data set is right-skewed with an outlier, so the median is a better center.
  19. 1.39

    1. IQR=62.
    2. Outliers are below −26 or above 222. London is confirmed as an outlier.
    3. The first three quarters are about equal in length, and the last is extremely long.
    4. The main part of the distribution is relatively symmetric; there is one extreme high outlier. The minimum is about 25, the first quartile is about 70, the median is about 100, and the third quartile is about 130. There is a gap in the data from roughly 200 to about 425.
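The cutoffs in 1.39 come from the 1.5×IQR rule. The sketch below shows the arithmetic; the quartiles 67 and 129 are assumptions back-calculated from IQR=62 and the upper cutoff of 222, and `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Assumed quartiles: Q3 = 222 - 1.5 * 62 = 129 and Q1 = 129 - 62 = 67.
lower, upper = iqr_fences(67, 129)

print(lower, upper)
```

Because the minimum is about 25, no observation falls below the lower cutoff, so only high values can be flagged.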
  20. 1.41

    1. s=8.80.
    2. With n=50, the positions of Q1 and Q3 will be at 13 and 38. We find Q1=43.79 and Q3=57.02.
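The positions 13 and 38 follow from locating each quartile as the median of the observations on one side of the overall median's location. A minimal sketch of that rule, assuming that convention (other software may interpolate differently); `quartile_positions` is a hypothetical helper name:

```python
def quartile_positions(n):
    """Positions of Q1 and Q3 in an ordered list of n observations.

    Assumes each quartile is the median of the observations strictly
    to one side of the overall median's location.
    """
    half = n // 2                 # observations on each side of the median
    q1_pos = (half + 1) / 2       # median position within the lower half
    q3_pos = n - q1_pos + 1       # mirror image within the upper half
    return q1_pos, q3_pos

print(quartile_positions(50))
```

A position ending in .5 means the quartile is the average of the two neighboring ordered observations.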
  21. 1.43

    1. Because weight is quantitative and has a decent number of observations (n=25), a histogram is a good choice. Mean and standard deviation are a good starting point for numerical summaries.
    2. Now that we see the distribution is left-skewed, we know that using the mean and standard deviation was not a good choice. Median and quartiles would have been a better choice.
    3. Answers will vary depending on where the clusters are split.
  22. 1.45

    1. With the outliers: x̄=5.2, M=4.9. Without the outliers: x̄=5, M=4.9. The median didn’t change, but without the outliers, the mean is closer to the median.
    2. With the outliers: s=1.40, Q1=4.4, Q3=5.6. Without the outliers: s=0.88 (answers will vary), Q1=4.4, Q3=5.5. The values are nearly identical with and without the outliers, but the standard deviation decreased without the outliers.
    3. Outliers can strongly affect the standard deviation and mean, and they don’t affect the quartiles and median as much.
  23. 1.47 Relatively few people in the United States have a very large net worth (for example, Bill Gates, Oprah Winfrey, and Warren Buffett). These few families strongly affect the mean net worth of families in the United States and skew the distribution to the right.

  24. 1.49 The mean is $102,181.82. Ten of the employees make less than the mean. The median salary is $40,000.

  25. 1.51 The median doesn’t change, while the mean increases to $124,909.09.

  26. 1.53 For n=2, the median is also the average of the two values.

  27. 1.55

    1. The mean is 16, and the standard deviation is 4.97.
    2. The mean for the 20 cases is 15.75, and the standard deviation is 3.43.
    3. The mean didn’t change much, but the standard deviation decreased.
  28. 1.57 The mean is 5.082 pounds. The standard deviation is 2.86 pounds.

  29. 1.59 The 10% trimmed mean is 5.05, and the 20% trimmed mean is 5.0. These trimmed means are closer to the median than the original untrimmed mean was.
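A trimmed mean can be sketched as follows, assuming the stated percentage is removed from each end of the ordered data before averaging. The exercise's data are not reproduced in this section, so `sample` is hypothetical and `trimmed_mean` is an illustrative helper.

```python
from statistics import mean

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)    # number dropped from each tail
    return mean(ordered[k:len(ordered) - k])

# Hypothetical data with one large outlier (not the data from the exercise).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(mean(sample))                # pulled upward by the outlier
print(trimmed_mean(sample, 0.10))  # outlier and smallest value removed
print(trimmed_mean(sample, 0.20))
```

Trimming discards the extreme values first, which is why trimmed means tend to sit closer to the median than the untrimmed mean does.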

  30. 1.61

    1. Standardized values can be negative.
    2. 95% of the values will be within two standard deviations of the mean.
    3. They are switched: The mean should be 0, and the standard deviation should be 1.
  31. 1.63

    1. When the mean stays the same, the center of the curve stays at the same place. When the standard deviation increases from 2 to 4, the curve gets flatter and wider.
  32. 1.65 The table is given below.

    μ−3σ  μ−2σ  μ−1σ  μ    μ+1σ  μ+2σ  μ+3σ
    168   199   230   261  292   323   354
    1. 68% of reading values will be between 230 and 292. 95% of the reading scores will be between 199 and 323. 99.7% of the reading scores will be between 168 and 354.
  33. 1.67

    Value Standardized Score
    200 −1.97
    250 −0.35
    280  0.61
    300  1.26
    320  1.90
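The standardized scores in 1.67 follow z=(x−μ)/σ. A minimal sketch, assuming μ=261 and σ=31 as read from the table in the answer to 1.65:

```python
# Assumed parameters, read from the table in the answer to 1.65.
mu, sigma = 261, 31

scores = [200, 250, 280, 300, 320]

# Standardize each score and round to two decimals, as in the table.
z_scores = {x: round((x - mu) / sigma, 2) for x in scores}

for x, z in z_scores.items():
    print(x, z)
```

Scores below the mean of 261 standardize to negative values, and scores above it to positive values.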
  34. 1.69 The values of the eighth-grade geography scores associated with the percentiles, rounded up, with the given mean and standard deviation, are 221, 240, 261, 282, 301. These are very close to the table, so we are satisfied that the values of the eighth-grade geography scores are approximately normal.

  35. 1.71

    1. 68% of the women speak between 7856 and 20,738 words per day. 95% of women speak between 1415 and 27,179 words per day. 99.7% of women speak between −5026 and 33,620 words per day.
    2. It is not entirely reasonable because people cannot speak fewer than 0 words per day.
    3. 68% of the men speak between 4995 and 23,125 words per day. 95% of men speak between −4070 and 32,190 words per day. 99.7% of men speak between −13,135 and 41,255 words per day. This seems less reasonable because people cannot speak fewer than 0 words per day.
    4. Yes, based on this information, we think that potentially women speak more words per day than men.
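The intervals in 1.71 are μ±σ, μ±2σ, and μ±3σ. In the sketch below, the means and standard deviations are assumptions back-calculated from the 68% intervals (women: 14,297 and 6441; men: 14,060 and 9065), and `empirical_rule` is a hypothetical helper name.

```python
def empirical_rule(mu, sigma):
    """Intervals mu +/- k*sigma for k = 1, 2, 3 (the 68-95-99.7 rule)."""
    return {pct: (mu - k * sigma, mu + k * sigma)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}

# Assumed parameters, back-calculated from the 68% intervals in the answer.
women = empirical_rule(14297, 6441)
men = empirical_rule(14060, 9065)

for label, intervals in (("women", women), ("men", men)):
    for pct, (low, high) in intervals.items():
        print(label, pct, low, high)
```

A negative lower endpoint signals that the Normal model cannot be fully right for a count that has a floor at 0; this happens at 3σ for women and already at 2σ for men.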
  36. 1.73

    1. 75% of the observations lie below 0.75.
    2. 50% of observations lie below 0.50.
    3. 25% of observations lie between 0.50 and 0.75.
    4. This density curve is a square, making the area under the curve 1.
  37. 1.75 The mean is 0.5, the median is 0.5, Q1=0.25, and Q3=0.75.

  38. 1.77 The first quartile is about 0.67 standard deviation below the mean of a standard Normal distribution. The third quartile is about 0.67 standard deviation above the mean.

  39. 1.79

    1. 0.0322.
    2. 0.9678.
    3. 0.8159.
    4. 0.7837.
  40. 1.81

    1. z=0.47.
    2. z=0.67.
  41. 1.83 2.28% of adults are developmentally disabled, based on the criteria.

  42. 1.85 zJessica=1.02, zAshley=1.20.

  43. 1.87 Jorge’s equivalent ACT score is 31.16.

  44. 1.89 Renee scored in the 94.50th percentile.

  45. 1.91 The top 15% of all SAT scores are above 1242.

  46. 1.93 The quintiles for the ACT scores are Q20=16.96, Q40=20.15, Q60=22.85, Q80=26.04.

  47. 1.95

    1. 16.6% of women have low levels of HDL.
    2. 37.45% have protective levels of HDL.
    3. 45.95% of women are in the normal range.
  48. 1.97

    1. 0.62% of healthy adults have osteoporosis.
    2. 2.28% of the older population have osteoporosis.
  49. 1.99

    1. Q1=−0.6745 and Q3=0.6745.
    2. Q1=μ−0.6745σ and Q3=μ+0.6745σ.
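The quartiles in 1.99 can be reproduced from the inverse of the standard Normal cumulative distribution. A minimal sketch using the standard library; `normal_quartiles` is a hypothetical helper, and the values 100 and 15 in the usage line are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

# Quartiles of the standard Normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)

def normal_quartiles(mu, sigma):
    """Quartiles of a Normal distribution with the given mean and sd."""
    return mu + q1 * sigma, mu + q3 * sigma

print(round(q1, 4), round(q3, 4))
print(normal_quartiles(100, 15))   # illustrative parameters only
```

By symmetry the two quartiles are equally far from the mean, so the interquartile range of any Normal distribution is about 1.349σ.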
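  50. 1.101 The interquartile range is 1.3490, and 1.5×1.3490=2.0235. The 1.5×IQR rule flags values below Q1−2.0235=−2.698 or above Q3+2.0235=2.698, and the area in each of these tails is about 0.0035. So approximately 2×0.0035=0.0070, or 0.70%, of the observations are suspected outliers.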

  51. 1.103 The Normal quantile plot of DBH shows a slight S shape, which indicates that the diameter may not be Normally distributed.

  52. 1.107 We can clearly see that all sources of renewable energy have increased from 2008 to 2018, with wind and solar having the largest increases.

  53. 1.111

    1. For car makes (a categorical variable), use either a bar graph or pie chart. For car age (a quantitative variable), use a histogram, stemplot, or boxplot.
    2. Study time is quantitative, so use a histogram, stemplot, or boxplot. To show change over time, use a time plot (average hours studied against time).
    3. Use a bar graph or pie chart to show radio station preferences.
    4. Use a Normal quantile plot to see whether the measurements follow a Normal distribution.
  54. 1.115

    1. σ=7.5.
  55. 1.117

    1. μ=79 and σ=30.4.
  56. 1.119

    1. Most people will “round” their answers when asked to give an estimate like this; in fact, the most striking answers are ones such as 115, 170, or 230. The students who claimed 360 minutes (six hours) and 300 minutes (five hours) may have been exaggerating.
    2. Women seem to generally study more (or claim to), as there are none that claim less than 60 minutes per night. The center (median) for women is 170; for men the median is 120 minutes.
  57. 1.121 x̄=35.66, s=41.56, Min=0, Q1=1, M=11.5, Q3=68, Max=181. On average, the band pauses for 35.66 seconds; however, the largest portion of the time, they don’t pause at all. The distribution is strongly right-skewed and shows that sometimes the band pauses for as much as 181 seconds, or about 3 minutes, before playing the final note.

  58. 1.123 Antho2 is approximately Normally distributed. x̄=1.711, s=0.590.

  59. 1.125 The distribution is highly skewed to the right. The five-number summary is 0.0154, 0.0784, 0.1423, 0.6975, 4.2995.