
Chapter 1 Looking at Data—Distributions

1.1. The regular price for the Smokey Grill Ribs coupon is $20; the discount price is $11.
1.3. Who: The cases are coupons; there are seven cases. What: There are six variables: ID, Type, Name,
Item, RegPrice, and DiscPrice. Only RegPrice and DiscPrice have units in dollars. The data might be used
to compare coupons with one another to see which is better. We would not want to draw conclusions
about other coupons not listed.
1.5. Answers will vary. For example, giving the actual exam scores will allow the student to see the
absolute change in grade.
1.7. (a) The cases are students. (b) Four variables: Favorite choice for online research (“Google or Google
Scholar,” “Library database or website,” “Wikipedia or online encyclopedia,” “Other”), Age (reasonable
age range for first-year college students—17 to 30, etc.), Sex (M or F), and Major (could be a big list—
statistics, math, engineering, English, etc.). (c) Age is quantitative; the rest are categorical. (d) The labels are
the numbers 1 to 552. (e) Who: part (a) answer. What: parts (b) and (c). Why: We could look at the
distribution of favorite choice across different age groups, majors, and sex.
1.9. (a) The cases are employees. (b) Employee identification number (label), last name (label), first name
(label), middle initial (label), department (categorical), number of years (quantitative), salary
(quantitative), education (categorical), age (quantitative). (c) Sample data would vary.

1.11. Age: quantitative, possible values 16 to ? (what would the oldest student’s age be?). Sing:
categorical, yes/no. Can you play: categorical, no, a little, pretty well. Food: quantitative, possible values
$0 to ? (what would be the most a person might spend in a week?). Height: quantitative, possible values
2 feet to 9 feet (check the Guinness Book of World Records).
1.13. Answers will vary. A few possibilities would be graduation rate, student–professor ratio, and job
placement rate.
1.15. Answers will vary. For example, each state could be divided as a percentage of the total of the
nation’s fatalities to show state differences; the disadvantage is that states with more population would
have a higher number of fatalities. Instead, each state’s fatalities could be divided by the state population
to get a percentage for each state; this would be a better way to compare state-to-state rates of drunk-
driving fatalities.
1.17. Both regular and split stemplots are shown. The first exam scores are left-skewed; the middle is
around 80.

5 79
6 158
7 00335558
8 000223557
9 00122448

5 79
6 1
6 58
7 0033
7 5558
8 000223
8 557
9 0012244
9 8
1.19. (a) The stemplot is below. (b) Use two stems, even though one is blank. Seeing the gap is useful.

3 3
3 58
4 1223
4 6777899
5 1135
5 9
6 344
6
7 2

1.21. The larger classes hide a lot of detail; there are now only three bars in the histogram.

1.23. A stemplot or histogram can be used (possible stemplots are shown in the solution to Exercise 1.17
and histograms in Exercises 1.20 and 1.22); the distribution is unimodal and left-skewed, centered near
80, and ranges from 55 to 98. There are no apparent outliers.
1.25. (a) Shown below. (b) The United States is a clear outlier. It has four or five times as many Facebook
users as the other countries, despite having a population smaller than some of the other countries. (c) The
United States dominates; many other countries shown have similar numbers of Facebook users.

1.27. (a) Bar graph below. (b) Second class had the fewest passengers. Third class had by far the most,
over twice as many as in first class. (c) A bar graph of the percents (relative frequency) would have the
same features.

1.29. We divided the number of survivors by the number of passengers in each class and then multiplied
by 100 to get a percent. A bar graph is appropriate because we now have three “wholes” to consider.

1.31. (a) Stemplot shown below; each number is rounded to the nearest 10, so the entry 26|8 represents
2680. (b) The distribution is somewhat right-skewed. (c) There appears to be one small outlier at 2680.
(d) The shape is right-skewed, the center is around 3200, and the range is from 2680 to 3950.

26 8
27
28
29 5
30 37
31 02347779
32 0358
33 2333
34 68
35 29
36 28
37
38 02
39 5

1.33. (a) 2013 still has the highest usage in December and January. See time plot below. (b) The patterns
are very similar, but the values for the winter months in 2014 are somewhat higher than those in the 2013
winter months. These differences are most likely due to weather.

[Time plot (series Energy2013; vertical scale roughly 7 to 9.5) not reproduced here.]

1.35. Bar graph below right. For example, opinions about least-favorite color are somewhat more varied
than favorite colors. Interestingly, purple is liked and disliked by about the same percentage of people.

1.37. White is the most popular color in 2012 for North America, followed by Black, Silver, and Gray.
For marketing techniques, answers will vary.

1.39. (a) Four variables: GPA, IQ, and self-concept are quantitative; gender is categorical. (b) See below,
left. (c) Unimodal and skewed to the left, centered near 7.8, spread from 0.5 to 10.8. (d) There is more
variability among the boys; in fact, there seem to be two groups of boys: those with GPAs below 5 and
those with GPAs above 5.

1.41. The stemplot is at right, with split stems. The distribution is unimodal and skewed to the left, with
center around 59.5. Most self-concept scores are between 35 and 73, with a few below that, and one high
score of 80 (but not really high enough to be an outlier).

1.43. Without Suriname: x̄ = 16.29. With Suriname: x̄ = 23.96.


1.45. The ordered list is: 2 4 5 5 5 5 6 6 7 8 10 11 12 13 16 17 19 19 24 25 32 38 49 53 208 .
M = 12. Without the outlier, the median is 11.5; with the outlier, the median is 12. The outlier does not
greatly influence the median.
1.47. M = 84.

1.49. The mean is x̄ = (77 + 289 + 128 + ⋯ + 25)/80 = 196.575 min (the value 196 in the text was
rounded). The quartiles and median are in positions 20.5, 40.5, and 60.5. The sorted data are given below.
Based on this: Q1 = 54.5, M = 103.5, Q3 = 200.

1 2 2 3 4 9 9 9 11 19
19 25 30 35 40 44 48 51 52 54
55 56 57 59 64 67 68 73 73 75
75 76 76 77 80 88 89 90 102 103
104 106 115 116 118 121 126 128 137 138
140 141 143 148 148 157 178 179 182 199
201 203 211 225 274 277 289 290 325 367
372 386 438 465 479 700 700 951 1148 2631
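For readers checking these values with software, here is a minimal Python sketch of the position rule used above (a half-integer position such as 20.5 means: average the two neighboring observations); the helper name is ours.

```python
# The 80 sorted call lengths from the table above.
data = [1, 2, 2, 3, 4, 9, 9, 9, 11, 19,
        19, 25, 30, 35, 40, 44, 48, 51, 52, 54,
        55, 56, 57, 59, 64, 67, 68, 73, 73, 75,
        75, 76, 76, 77, 80, 88, 89, 90, 102, 103,
        104, 106, 115, 116, 118, 121, 126, 128, 137, 138,
        140, 141, 143, 148, 148, 157, 178, 179, 182, 199,
        201, 203, 211, 225, 274, 277, 289, 290, 325, 367,
        372, 386, 438, 465, 479, 700, 700, 951, 1148, 2631]

def at_position(sorted_vals, pos):
    """Value at a 1-based position; a half-position averages two neighbors."""
    lo = sorted_vals[int(pos) - 1]
    hi = sorted_vals[int(pos + 0.5) - 1]
    return (lo + hi) / 2

n = len(data)                            # 80
mean = sum(data) / n                     # 196.575
median = at_position(data, (n + 1) / 2)  # position 40.5 -> 103.5
q1 = at_position(data, 20.5)             # 54.5
q3 = at_position(data, 60.5)             # 200.0
```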

1.51. Shown below.

1.53. From software: s² = 157.07, s = 12.53.
1.55. Without Suriname: Min = 55, Q1 = 75, M = 84, Q3 = 93, Max = 97. With Suriname: Min = 2, Q1 =
5.5, M = 11.5, Q3 = 21.5, Max = 53. Of course, the max changes drastically with the outlier removed;
otherwise, the other numbers in the five-number summary do not change drastically. This shows that,
generally, the five-number summary is robust.
1.57. (a) x̄ = 3208.44. (b) M = 3130.37. (c) Because the distribution is right-skewed with a potential
outlier, the median is a better measure of center. See histogram in Exercise 1.61.
1.59. (a) s = 306.68. (b) Q1 = 3027.64, Q3 = 3286.95. (c) Min = 2664.38 (this is the smallest value), Q1 =
3027.64 (this value has 25% of the observations below it), M = 3130.37 (this is the middle observation,
with 50% of the observations below and 50% above it), Q3 = 3286.95 (this value has 75% of the
observations below it), Max = 4213.49 (this is the largest value). (d) The five-number summary would be
better for this distribution because it is right-skewed with a potential outlier. See histogram in 1.57.
1.61. (a) Shown below. The distribution is right-skewed with a potential outlier. (b) Shown below. (c)
Preference will vary. The only advantage of the stemplot is that it preserves the data; otherwise, the
histogram is likely better. The boxplot is also fine but hides some of the details that the histogram shows.

1.63. The KPOT values (shown on the left) are right-skewed, whereas the KSUP (shown on the right)
values are fairly symmetric. The center for KSUP (right) is higher than the center for the KPOT (left).
Also, the KPOT values (left) are more spread out than the KSUP values (right).

  KPOT | stem | KSUP
     6 | 26 |
     9 | 27 |
       | 28 | 8
  8865 | 29 |
  7753 | 30 |
 53220 | 31 | 5
986633 | 32 | 37
       | 33 | 02347779
     9 | 34 | 0358
   841 | 35 | 2333
     1 | 36 | 68
       | 37 | 29
       | 38 | 28
       | 39 |
       | 40 | 02
       | 41 | 5
     1 | 42 |

1.65. (a) x̄ = 122.9. (b) M = 102.5. (c) The data set is right-skewed with an outlier (London), so the
median is a better measure of center.
1.67. (a) IQR = 129 − 67 = 62. (b) Outliers are below 67 − 1.5 × 62 = −26 or above 129 + 1.5 × 62 = 222.
London is confirmed as an outlier. (c) The boxplots for (c) and (d) are shown on the following page. The
first three-quarters are about equal in length, and the last (upper quarter) is extremely long. (d) The main
part of the distribution is relatively symmetric; there is one extreme high outlier. The minimum is about
25, the first quartile is about 70, the median is about 100, and the third quartile is about 125. There is a
gap in the data from roughly 200 to about 425.

(e) The stemplot is at right. (f) Answers will vary. For example, the stemplot and the boxplot both
indicate the same shape distribution: relatively symmetric with an extremely high outlier. One can simply
approximate locations of the median and quartiles from the boxplot but get more exact (at least to within
rounding) with the stemplot.
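The fence arithmetic in part (b) is easy to automate; a minimal sketch (the function name is ours):

```python
def outlier_fences(q1, q3):
    """1.5 x IQR rule: values outside (lower, upper) are flagged as outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = outlier_fences(67, 129)  # (-26.0, 222.0)
# Any observation above 222 (such as London) is flagged as an outlier.
```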

1.69. (a) s = 8.80. (b) With n = 50, the positions of Q1 and Q3 will be at 13 and 38 (51 − 13, or 13 in from
the upper end of the sorted distribution). We find Q1 = 43.79 and Q3 = 57.02. (c) Answers will vary.
However, if students said the median was the better center in Exercise 1.64, they should pair that with the
quartiles and not the standard deviation.
1.71. (a) Answers will vary. Because weight is quantitative and has a decent number of observations (n =
25), a histogram is a good choice. Mean and standard deviation are a good starting point for numerical
summaries. (b) Answers will vary. Now that we see the distribution is left-skewed, the mean and standard
deviation were not a good choice; the median and quartiles would have been better. (c) Answers will vary.
One possible break is between 5.3 and 6.0. The summaries for the groups
should provide better summary statistics than the grouped data.

1.73. (a) With the outlier: x̄ = 5.235, M = 4.90. Without the outlier: x̄ = 5.265, M = 4.905. The values are
nearly identical with and without the outlier. (b) With the outlier: s = 1.406, Q1 = 4.40, Q3 = 5.60. Without
the outlier: s = 1.356, Q1 = 4.430, Q3 = 5.600. The values are nearly identical with and without the outlier.
(c) Even though there is one outlier, its removal does not change the numerical summaries at all. This is
partly due to the large sample and partly due to the fact that this outlier is not too far from the other
observations, so that removing it doesn’t have a huge effect on the analysis.
1.75. Celebrities and business executives (Bill Gates of Microsoft, Warren Buffett, Oprah, etc.) typically
have a very large net worth, which pulls the mean up, making it much larger than the median.

1.77. The mean is (7 × 55,000 + 3 × 80,000 + 650,000)/11 = $115,909.09. Ten of the employees make
less than the mean. M = $55,000.
1.79. The median doesn’t change, but the mean increases to $138,636.36.
1.81. The average would be 2.5 or less (an earthquake that isn’t usually felt). These do little or no
damage.
1.83. For n = 2 the median is also the average of the two values.
1.85. (a) The median of seven (sorted) points is the fourth, while the median of eight points is the average
of the fourth and fifth. If these are to be the same, the added point must be equal to the fourth point of the
original seven, so that the fourth and fifth points are now the same. (b) Regardless of the configuration of
the first seven points, if the eighth point is added so as to leave the median unchanged, then in that
(sorted) set of eight, the fourth and fifth points must be the same. Once we add a ninth point, one of these
two points will be the new middle (fifth) point, so the median will not change.
1.87. (a) Picking the same number for all four observations results in a standard deviation of 0. (b)
Picking 10, 10, 20, and 20 results in the largest standard deviation, s = 5.77. (c) For part (a), you may pick
any number as long as all observations are the same. For part (b), only one choice provides the largest
standard deviation.
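Part (b) can also be checked by brute force; a sketch, assuming the exercise restricts the choices to whole numbers from 10 to 20 (repeats allowed):

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Try every multiset of four whole numbers between 10 and 20 and keep
# the one with the largest standard deviation.
best = max(combinations_with_replacement(range(10, 21), 4), key=stdev)
print(best, round(stdev(best), 2))  # (10, 10, 20, 20) 5.77
```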
1.89. x̄ = 2.41(2.2) = 5.302 pounds and s = 1.25(2.2) = 2.75 pounds.

1.91. Full data set: x̄ = 196.575 and M = 103.5 minutes. The 10% and 20% trimmed means are
x̄ = 127.734 and x̄ = 111.917 minutes. While still larger than the median of the original data set, they are
much closer to the median than the ordinary untrimmed mean.
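A sketch of the trimming step itself (the helper is ours); applied to the 80 sorted call lengths from Exercise 1.49, it reproduces the values quoted above:

```python
def trimmed_mean(values, prop):
    """Drop the lowest and highest prop fraction, then average what is left."""
    vals = sorted(values)
    k = int(len(vals) * prop)        # observations trimmed from each end
    kept = vals[k:len(vals) - k]
    return sum(kept) / len(kept)

# With the call-length list from the Exercise 1.49 sketch:
# trimmed_mean(data, 0.10) -> 127.734...  (trims 8 from each end)
# trimmed_mean(data, 0.20) -> 111.917...  (trims 16 from each end)
```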
1.93. According to the rule, 95% of scores will fall between µ ± 2σ. Therefore, 95% of scores are between
288 − 2*38 = 212 and 288 + 2*38 = 364.

1.95. z = (350 − 288)/38 = 1.63.

1.97. For X = 350, z = (350 − 288)/38 = 1.63; the proportion less than 350 is the area to the left, which is
0.9484. For the proportion greater than or equal to 350, we calculate 1 − 0.9484 = 0.0516.

Area greater than 350 = Entire area under the Normal curve − Area less than 350
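Table A values can be reproduced from the standard Normal cumulative distribution function; a minimal sketch using only the Python standard library (the helper name phi is ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal CDF: the area to the left of z, which Table A tabulates."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (350 - 288) / 38    # about 1.63
p_less = phi(z)         # about 0.9484
p_greater = 1 - p_less  # about 0.0516
```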

1.99. To get the top 20% of students, we need to solve for the 80th percentile. The corresponding z is
0.84. So x = 288 + 38(0.84) = 319.92.
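Going the other way (from a percentile to a score) needs the inverse Normal CDF; a sketch assuming SciPy is available:

```python
from scipy.stats import norm

z80 = norm.ppf(0.80)  # 0.8416...; Table A rounds this to 0.84
x = 288 + 38 * z80    # about 319.98 (319.92 when the rounded 0.84 is used)
```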
1.101. (a) Answers will vary; the graph should be a mirror image. The mean and median should be equal.
(b) Answers will vary; example below. The mean should be farther left than the median.

1.103. (a – b) Shown below. (c) The curve shifts to the left or right, but the spread remains the same.

1.105. (a) The distribution is shown at right. (b – c) The table below indicates the desired ranges. These
values are shown on the x axis of the distribution.

Low High
68% 256 320
95% 224 352
99.7% 192 384
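A sketch of the rule itself; the mean 288 and standard deviation 32 are read back from the 68% range in the table:

```python
def rule_ranges(mu, sigma):
    """Central ranges given by the 68-95-99.7 rule for a Normal distribution."""
    return [(mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)]

rule_ranges(288, 32)  # [(256, 320), (224, 352), (192, 384)]
```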

1.107.
Value Percentile (Table A) Percentile (Software)
150 50 50
140 38.6 38.8
100 7.6 7.7
180 80.5 80.4
230 98.9 98.9

1.109. Using the N(153,34) distribution, we find the values corresponding to the given percentiles as
given below (using Table A). The actual scores are very close to the percentiles of the Normal
distribution. We can conclude these scores are at least approximately Normal.

Percentile   Score   Score with N(153, 34)
10%          110     109
25%          130     130
50%          154     153
75%          177     176
90%          197     197

1.111. (a) Ranges are shown in the table below. In both cases, some of the lower limits are negative,
which does not make sense; this happens because the women’s distribution is skewed, and the men’s
distribution has an outlier. Contrary to the conventional wisdom, the men’s mean is slightly higher,
although the outlier is at least partly responsible for that. (b) The means suggest that Mexican men and
women tend to speak more than people of the same gender from the United States.

Women Men
68% 8489 to 20,919 7158 to 22,886
95% 2274 to 27,134 −706 to 30,750
99.7% −3941 to 33,349 −8570 to 38,614

1.113. (a) 0.25 or 1/4. (b) Graph shown below. (c) (3 − 0)/4 = 0.75 or 75%. (d) (2.5 − 1.5)/4 = 0.25 or
25%.

1.115. (a) The mean is at point C; the median is at point B. (b) The mean and median are both at point A.
(c) The mean is at point A; the median is at point B.

1.117. (a) The applet shows an area of 0.6826 between −1.000 and 1.000, while the 68–95–99.7 rule
rounds this to 0.68. (b) Between −2.000 and 2.000, the applet reports 0.9544 (compared with the
rounded 0.95 from the 68–95–99.7 rule). Between −3.000 and 3.000, the applet reports 0.9974 (compared
with the rounded 0.997).

1.119. Graphs shown below from top left to bottom right. (a) 0.0808. (b) 0.9192. (c) 0.0228. (d) 0.9772
− 0.0808 = 0.8964.

1.121. (a) z = 1.17 or 1.18. (b) z = 1.17 or 1.18.

1.123. z = (130 − 100)/15 = 2. From Table A, 0.0228 or 2.28% qualify for membership.

1.125. For Joshua, z = (16 − 21.5)/5.4 = −1.02. For Anthony, z = (1050 − 1498)/316 = −1.42. Joshua has
the higher standardized score.
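Standardizing is what makes scores from different scales comparable; a one-function sketch:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

joshua = z_score(16, 21.5, 5.4)     # about -1.02 (ACT scale)
anthony = z_score(1050, 1498, 316)  # about -1.42 (SAT scale)
# The less negative z-score (Joshua's) is the better relative performance.
```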

1.127. z = (30 − 21.5)/5.4 = 1.57. Using x = µ + zσ, the equivalent SAT score is 1498 + 316(1.57) =
1994.12.

1.129. z = (19 − 21.5)/5.4 = −0.46. From Table A, we get 0.3228, so about the 32nd percentile.
1.131. The bottom 12% corresponds to the 12th percentile; using Table A gives z = −1.17 or −1.18.
Using −1.18, the SAT score needed is 1498 + 316(−1.18) = 1125.12. So scores 1125.12 and lower make
up the bottom 12% of all scores.
1.133. From Table A, the quartiles have z-scores of −0.675, 0, and 0.675. Using 1498 + 316(z) yields
scores of 1285, 1498, and 1711 (rounded to the nearest integer).
1.135. (a) z = (40 − 46)/13.6 = − 0.44. From Table A, 33% of men have low values of HDL. (Software
gives 32.95%.) (b) z = (60 – 46)/13.6 = 1.03. From Table A, 15.15% of men have protective levels of
HDL. (Software gives 15.16%.) (c) (1 − 0.1515) − 0.33 = 0.5185. 51.85% of men are in the intermediate
range for HDL. (Software gives 0.5188.)
1.137. (a) The first and last deciles for a standard Normal distribution are ±1.2816. (b) For a N(9.12, 0.15)
distribution, the first and last deciles are μ − 1.2816σ = 8.93 and μ + 1.2816σ = 9.31 oz.
1.139. (a) As the quartiles for a standard Normal distribution are ±0.6745, we have IQR = 1.3490. (b) c =
1.3490: For a N(μ, σ) distribution, the quartiles are Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ, so IQR = (μ +
0.6745σ) − (μ − 0.6745σ) = 1.3490σ.
1.141. To find these levels, use 55 + z(15.5), where z from Table A has the given percent as the area to the
left of z. By symmetry, the z for the 10th percentile is the negative of the z for the 90th percentile, and so on.

Percentile 10% 20% 30% 40% 50% 60% 70% 80% 90%
HDL level 35.2 42.0 46.9 51.1 55 58.9 63.1 68.0 74.8
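The whole table can be generated in one loop; a sketch assuming SciPy for the inverse CDF (values may differ from the table in the last digit, since the table used Table A's two-decimal z values):

```python
from scipy.stats import norm

for p in range(10, 100, 10):
    z = norm.ppf(p / 100)                # z with area p% to its left
    print(f"{p}%: {55 + z * 15.5:.1f}")  # e.g. 10%: 35.1, ..., 90%: 74.9
```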

1.143. (a) See solutions for Exercises 1.30 and 1.61. (b) The Normal quantile plot is shown below. The
data are roughly Normal, but there is one potential high outlier.

1.145. There is less energy used in 2014 from fossil fuels and more used from renewable sources. There
was almost no change in nuclear and electric power usage. Graphs shown below.

1.147. x̄ = 8.389, s = 1.965. Min = 4.9, Q1 = 6.9, M = 8.2, Q3 = 9.5, Max = 15.1. As shown in the
histogram, the distribution is somewhat right-skewed, so the five-number summary is a better description
for highway fuel consumption.

1.149. Answers may vary. One useful statistic might be the amount of time per injury for each country.
To get this, we take the Time divided by the number of injuries and then multiply by 60 to convert it to
seconds. The graph below shows the Time per injury (in seconds). It does look like some countries may
be flopping, as several spend two to three times as long per injury as other countries.
Note: One possible confounding factor is the severity of each injury.

1.151. (a) For car makes (a categorical variable), use either a bar graph or pie chart. For car age (a
quantitative variable), use a histogram, stemplot, or boxplot. (b) Study time is quantitative, so use a
histogram, stemplot, or boxplot. To show change over time, use a time plot (average hours studied
against time). (c) Use a bar graph or pie chart to show radio station preferences. (d) Use a Normal
quantile plot to see whether the measurements follow a Normal distribution.

1.153. Answers will vary.
1.155. (a) RDA = 75 = 60 + zσ. From Table A, z = 2.00. We have 15 = 2σ, so σ = 7.5. (b) The distribution
is shown below. Note that the UL is far off into the upper tail at 2000 mg/d.

1.157. (a) We were given that the mean is 84.1 and the 50th percentile was 79. In a Normal distribution,
these should be equal; one option would be to average these and estimate µ = (79 + 84.1)/2 = 81.55. The
5th percentile is 42 = 81.55 – 1.645σ. This implies σ = 24.04. The 95th percentile is 142 = 81.55 + 1.645σ.
This implies σ = 36.75. If we average the two estimates, we would have σ = 30.4. (b) The graph is shown
below. (c) From the two distributions, over half of women consume more vitamin C than they need, but
some consume far less.

1.159. (a) Not only are most responses multiples of 10, but many are multiples of 30 and 60. Most people
will “round” their answers when asked to give an estimate like this; in fact, the most striking answers are
ones such as 115, 170, or 230. The students who claimed 360 min (6 hours) and 300 min (5 hours) may
have been exaggerating. (Some students might also “consider suspicious” the student who claimed to
study 0 min per night. As a teacher, I can easily believe that such students exist, and I suspect that some
of your students might easily accept that claim as well.) (b) Women seem to generally study more (or
claim to), as none claim less than 60 min per night. The center (median) for women is 170; for men the
median is 120 min. (c) The boxplots are given. Opinions will vary.

1.161. No and no. It is easy to imagine examples of many different data sets with mean 0 and standard
deviation 1, for example, {−1,0,1} and {−2,0,0,0,0,0,0,0,2}. Likewise, for any given five numbers a ≤ b ≤
c ≤ d ≤ e (not all the same), we can create many data sets with that five-number summary simply by
taking those five numbers and adding some additional numbers in between them, for example (in
increasing order): 10, __, 20, __, __, 30, __, __, 40, __, 50. As long as the number in the first blank is between 10 and
20, and so on, the five-number summary will be 10, 20, 30, 40, 50.
1.163. x̄ = 35.66, s = 41.56. Min = 0, Q1 = 1, M = 11.5, Q3 = 68, Max = 181. On average, the band
pauses for 35.66 seconds; however, for a large portion of the time, they don’t pause at all. The
distribution, as shown in the histogram below, is strongly right-skewed and shows that sometimes the
band pauses for as much as 181 seconds or 3 min before playing the final note.

1.165. Antho1 is Normally distributed, judging from the histogram and Normal quantile plot. x̄ = 1.630, s = 0.521.

1.167. Antho3 is right-skewed; it also has a potential high outlier. Min = 0.0546, Q1 = 0.4506, M =
0.6781, Q3 = 1.1282, Max = 6.3109.

Chapter 2 Looking at Data: Relationships
2.1. (a) The cases are students. (b) Number of friends and time (average amount spent on Facebook per
week). (c) Both variables are quantitative because both are numeric and arithmetic operations are
possible.
2.3. Cases: cups of Mocha Frappuccino. Variables: size and price (both quantitative).
2.5. (a) Tweets. (b) Click count and length of tweet are quantitative. Day of week and sex are categorical.
Time of day could be quantitative (in the hours/minutes format as hh:mm) or categorical (if morning,
afternoon, etc.). (c) Click count is the response because the other variables can possibly explain the
number of click counts. The others could all be potentially explanatory because they can be controlled by
the researcher or are already set.
2.7. Answers will vary. Some possible variables are condition of the book (with values poor, good, or
excellent), number of pages, and binding type (with values hardback or paperbound), in addition to
purchase price and buy-back price. Cases would be individual textbooks. Here, we would likely be
interested in the relationship between the buy-back price (response variable) based on the other
explanatory variables such as the number of pages. We could also use the categorical variable, condition
of the book, to group the data.
2.9. Answers will vary. Some possible variables are university, size, etc., in addition to the average
number of tickets sold and the percentage of games won. Cases would be individual teams. Here, we
would likely be interested in whether there is a relationship between the average number of tickets sold and the
percentage of games won.
2.11. Rating looks Normally distributed. Price also looks somewhat Normal but has a high outlier.

2.13. (b) Shown below. (c) There is no difference in the plot except for the scale on the x axis.

2.15. Answers may vary. Most will likely prefer the plot with the log transformation because it spreads
out the data points and makes the plot easier to read and interpret.
2.17. (a) A negative association means that low values of one variable are associated with high values of
the other variable. (b) A stemplot is used for only a single quantitative variable. (c) We put the response
variable on the y axis and the explanatory variable on the x axis.
2.19. (a)

(b) The form is somewhat linear; the direction is positive; the strength is still weak. (c) There are a couple
of possible low outliers for Antho3. (d) Adding a line could be useful because the relationship is somewhat
linear (shown below).

(e) Answers will vary depending on software used; the smoothing does not contribute much in describing
the relationship.

2.21. (a)

(b) The form is linear; the direction is positive; the strength is very strong. (c) There is one outlier with an
unusually high value for both variables. (d) Yes, the line shows the direction and strength. (e) Answers
will vary depending on software used; the smoothing does not help at all because the relationship is quite
linear.

2.23. (a) Plot shown below. For all fuel types, as highway fuel consumption increases, so does carbon
dioxide emissions. (b) Vehicles with fuel type D have the largest emissions, while vehicles with fuel type
E have the smallest emissions. The other two types, X and Z, have fairly similar emissions.

2.25. (a) Shown below. (b) As nondominant arm strength increases, so does dominant arm strength.
There is one outlier with an extremely high nondominant arm strength. (c) The form is linear. The
direction is positive. The strength is strong. (d) There is one outlier with an extremely high nondominant
arm strength. (e) Yes, the relationship is linear except for the outlier.

2.27. Parents’ income is explanatory, and college debt is the response. Both variables are quantitative. We
would expect a negative association: for parents with lower incomes, student college debt would be high
and vice versa.
2.29. (a) Shown below. (b) Answers will vary depending on software used; the smoothing does not help,
as the relationship is quite linear.

2.31. The relationships between calories and alcohol content are quite similar for domestic and imported
beers. Also, the outlier for the imported beers is no longer an outlier, as there are several other domestic
beers that have a similar alcohol content.

2.33. (a) Shown below. (b) As time increases, logcount goes down. (c) The form is linear; the direction is
negative; the strength is extremely strong. (d) There are no outliers. (e) Yes, the relationship is very
linear, almost a perfect line.

2.35. (a) Shown below. (b) The plot is more linear than the original scatterplot. (c) Answers may vary, but
the log-transformed data should be preferred because it straightens out the relationship.

2.37. (a) Shown below. (b) The association is positive, linear, and strong. (c) Overall, the relationship is
strong, but it is stronger for women than for men. Male subjects generally have both larger lean body
mass and higher metabolic rates than women.

2.39. (a) This is a linear transformation. Dollars = 0 + 0.01 × Cents. (b) r = 0.671. (c) They are the same.
(d) Changing the units does not change the correlation.
2.41. (a) Strong and positive. (b) Strong and negative. (c) Weak and negative. (d) No linear relationship.
2.43. (a) r = 0.298. (b) Probably. The plot is somewhat linear. (c) No, if they were approximately equal,
the correlation would be closer to 1.
2.45. For fuel type D: r = 0.977. For fuel type E: r = 0.983. For fuel type X: r = 0.981. For fuel type Z: r =
0.976. The correlations for all four fuel types of vehicles are similar, around 0.98. The relationship
between CO2 emissions and highway fuel consumption is linear and strong for all four fuel types.
2.47. (a) r = 0.905. (b) Yes, the correlation is appropriate because the pattern is linear. There is one
outlier, but it fits the overall pattern.
2.49. The person who wrote the article interpreted a correlation close to 0 as if it were a correlation close
to −1 (implying a negative association between teaching ability and research productivity). Professor
McDaniel’s findings mean there is little linear association between research and teaching; for example,
knowing that a professor is a good researcher gives little information about whether she is a good or bad
teacher.
2.51. (a) r = −0.999. (b) Correlation is a good numerical summary here because the scatterplot is strongly
linear. (c) Both have a correlation whose magnitude is close to 1, which would indicate a strong relationship.
However, the correlation only makes sense if there is a linear relationship. The relationship in the
previous exercise is not linear.
2.53. (a) r = 0.905. (b) The correlation does a good job of describing the relationship because it is quite
linear; however, there is one outlier in this dataset, O’Doul’s, with an extremely low alcohol percent.
2.55. r = 0.912. Both correlations for the imported and domestic beers are quite similar, especially when
the outlier O’Doul’s is removed. The relationships between calories and percent alcohol for both types of
beers are linear and strong and quite similar in pattern.
2.57. Answers may vary. (a) With only two points, the correlation will be 1 or –1 because they form a
perfect straight line. (b) – (d) Shown below.

2.59. (a) r = −0.72971. (b) The correlation is not a good numerical summary for this relationship because
there is curvature in the plot.

2.61. (a) Almost all the data points are below the line.

2.63. Predictions for 300 and 600 would be trustworthy because they are within the range of the data for
NEA. We would not trust predictions from –300 and 800 because they are outside the range of the data.
This would be extrapolation.

2.65. (a) ŷ = 0.34475 + 0.15185Antho3. (b) The scatterplot is shown below. (c) The line does not fit the
data well. Several of the data points are very far from the line. (d) ŷ = 0.34475 + 0.15185(1.5) =
0.572525.

2.67. (a) ŷ = 56.75012 + 20.78205FuelConsHwy. (b) The scatterplot is shown below. (c) A single
regression line would not be a good fit for the four types of vehicles, even though the correlations were all
very close. From the plot, we can see there are several different lines that need to be accounted for with
separate regression lines.

2.69. (a) – (b) Plot shown below. (c) For the baseball group, there is a strong linear relationship between
bone strength of the dominant arm and the nondominant arm. For each unit increase in the nondominant,
the dominant arm bone strength increases by 1.373. The increase is greater than in the control group.

2.71. Predicted bone strength is 0.886 + 1.373 × 16 = 22.854 cm⁴/1000.
2.73. (a) – (d) below. (e) In terms of least squares, the first line is a better description of the relationship
between time and radioactive counts. The sum of its squared differences (the quantity minimized in
least-squares regression) is 8440.2; the sum for the second line is 187,706.

2.75. (a) and (d) Plot shown below. (b) The relationship is linear, positive, and strong. There are several
outliers with large population values. (c) The slope is b1 = r(sy/sx) = 0.98367(358,460/6,620,733) =
0.05326. The intercept is b0 = ȳ − b1x̄ = 302,136 − 0.05326(5,955,551) = −15,057 (−15,045 from
software). The regression line is ŷ = −15,057 + 0.05326x.
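The same arithmetic as a sketch; the small discrepancy between −15,057 and software's −15,045 is just the effect of rounding the slope before computing the intercept:

```python
r = 0.98367
s_y, s_x = 358_460, 6_620_733      # standard deviations of y and x
y_bar, x_bar = 302_136, 5_955_551  # means of y and x

b1 = r * s_y / s_x       # 0.053258..., which rounds to 0.05326
b0 = y_bar - b1 * x_bar  # about -15,044 with the unrounded slope;
                         # about -15,057 if b1 is first rounded to 0.05326
```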

2.77. (a) ŷ = –15047 + 0.05326(4,000,000) = 197,993 (197,987 from software). (b) ŷ = 8487 +
0.04846(4000000) = 202,327 (202,328 from software). (c) In this situation, the outliers did not change the
prediction for the median-sized state. Although they have populations much larger than the rest of the
states, the outliers follow the pattern of the regression line, so they are not influencing our regression
equation drastically.
2.79. (a) ŷ = 8491.907 + 0.048x. (b) r² = 0.942. (c) 94.2% of the variation in the number of
undergraduates is accounted for by the population size. (d) The software does not report the nature of the
relationship; it assumes a linear relationship in the calculations shown.
2.81. (a) Plot shown below. There seems to be a weak negative linear relationship between y and x, but
with one extreme outlier with a very high x value. (b) ŷ = 31.38274 − 0.09425x. (c) r² = 0.0224 or
2.24%. (d) Here, the outlier completely eliminated all evidence of a regression line. The x variable
accounts for only 2.24% of the variation in y, meaning there is no linear relationship at all. However, we
know this is wrong because it is mostly due to the outlier unnaturally twisting the regression line.

2.83. (a) ŷ = 6.41895 + 28.34687x. (b) r² = 0.8193; 81.93% of the variation in calories is explained by
the relationship with percent alcohol. (c) The relationship between calories and percent alcohol is linear,
positive, and strong; however, there does seem to be one low outlier, O’Doul’s, with a very low alcohol
content.

2.85. (a) The correlations and regression lines for all four datasets are essentially the same: r = 0.82 and
ŷ = 3 + 0.5x. For x = 10, ŷ = 3 + 0.5(10) = 8. (b) Plots shown below. (c) For Data A, the regression line
is a reasonable fit for the data. For Data B, there is obviously a curve in the scatterplot, and a
transformation is needed before a regression should be run. For Data C, this is a perfect relationship but
with one outlier, which is influential; a regression would not be valid for this data. For Data D, there is no
relationship between y and x at all; only the extreme x outlier even makes the regression line possible, but
it is definitely inappropriate to use regression on this dataset. Only for Data Set A should regression be
used.

2.87. (a) y = 15 – 2(4) = 7. (b) For each 1 unit increase in x, y decreases by 2. (c) The intercept is 15.
2.89. The results from the applet are shown as follows. The regression equation is
ŷ = 0.896x − 1517.935. This is a strong, positive relationship because r = 0.982. We can conclude that
NAEP scores are steadily increasing about 0.896 points per year.

2.91. r = √0.16 = 0.4. r has a positive sign because students who attended a higher percent of classes
earned higher grades.

2.93. The residuals sum to 0.01.

2.95. The residuals are calculated as Dominant – (0.886 + 1.373 × Nondominant).

ID   Nondominant   Dominant   Residual
5    21.0          40.3        10.58
6    14.6          20.8        −0.13
7    31.5          36.9        −7.24
8    14.9          21.2        −0.14
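A sketch of the residual computation, using the regression line quoted above:

```python
def residual(nondominant, dominant):
    """Observed minus predicted dominant-arm bone strength."""
    predicted = 0.886 + 1.373 * nondominant
    return dominant - predicted

residual(21.0, 40.3)  # 10.58, the entry for subject 5 above
```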

2.97. (a) – (b) The table and plot are shown below. (c) The residual plot looks random; the model using
logs is much better.

Time   LogCount   Predicted   Residual
1      6.35957    6.33244      0.02713
3      5.75890    5.81121     −0.05231
5      5.31321    5.28997      0.02323
7      4.77068    4.76874      0.00195

2.99. (a) Plot shown below.

(b) Residual plot below.

(c) The histogram below shows that California does not appear as an outlier for LogPopulation.

(d) The histogram below shows that California does not appear as an outlier for LogUndergrads.

(e) Looking at the scatterplot in part (a), California follows the pattern with the other states using the log
transformations; it does not appear as an outlier in terms of the relationship. The residual plot in part (b)

confirms this. (f) California is not influencing the regression line using the log transformations. Below are
residual plots with and without California removed; overall, the plots look nearly identical. Using logs
seems to fix the influential nature of California for this data set.

2.101. (a) If the line is pulled toward the influential point, the observation will not necessarily have a
large residual. (b) High correlation is always present if there is causation. (c) Extrapolation is using a
regression to predict for x-values outside the range of the data (here, using 20, for example).
2.103. Internet use does not cause people to have fewer babies. Possible lurking variables are
economic status of the country, levels of education, etc.
2.105. Answers will vary. For example, a reasonable explanation is that the cause-and-effect relationship
goes in the other direction: Doing well makes students or workers feel good about themselves, rather than
vice versa.
2.107. The explanatory and response variables were “consumption of herbal tea” and
“cheerfulness/health.” The most important lurking variable is social interaction; many nursing home
residents may have been lonely before students started visiting.
2.109. (a) It is difficult to draw the correct line by hand. (b) Most people tend to overestimate the slope
for a scatterplot with r = 0.6; that is, most students will find that the least-squares line (the one without the
ending dots) is less steep than the one they draw.

2.111. Each group has a positive association, but, when combined, the regression slope is negative.

2.113. Sum the rows of the table. 1278 met the requirements; 751 did not meet requirements.
2.115. Divide the cell count by the table total. 861/2029 = 0.4243.
2.117. Divide the cell count by the total number of children aged 11 to 13. 417/974 = 0.4281 (which
rounds to 43%).
2.119. (a) Because they want to see the effect of driver’s education courses, that is, the explanatory
variable. The number of accidents is the response. (b) Driver’s Ed would be the column (x) variable, and
number of accidents would be the row (y) variable. A possible table is shown below. (c) There are six
cells (two columns by three rows). For example, the first row, first column entry could be the number
who took driver’s education and had 0 accidents.

               Driver’s Ed: Yes   Driver’s Ed: No
0 accidents
1 accident
2+ accidents

2.121. (a) Age is the explanatory variable. Rejected is the response. With the dentistry available at that
time, it is reasonable to think that, as a person got older, he or she would have lost more teeth.
(b)

Under 20 20 to 25 25 to 30 30 to 35 35 to 40 Over 40
Yes 0.0002 0.0019 0.0033 0.0053 0.0086 0.0114
No 0.1761 0.2333 0.1663 0.1316 0.1423 0.1196

(c)

Marginal Distribution of Rejected
Yes       No
0.03081   0.96919

Marginal Distribution of Age
Under 20   20 to 25   25 to 30   30 to 35   35 to 40   Over 40
0.1763     0.2352     0.1696     0.1369     0.1509     0.131

(d) The conditional distribution of Rejected Given Age, because we have said Age is the explanatory
variable. (e) In the table, note that all columns sum to 1. We can clearly see the proportion of rejected
recruits increasing with increasing age.

Under 20 20 to 25 25 to 30 30 to 35 35 to 40 Over 40
Yes 0.0012 0.0082 0.0196 0.0389 0.0572 0.0868
No 0.9988 0.9918 0.9804 0.9611 0.9428 0.9132

2.123. Sex is the explanatory variable; Lied is the response. A table is shown below.

Lied    Male   Female   Total
Yes     5057    5966    11023
No      4145    5719     9864
Total   9202   11685    20887

To assess if there is a sex difference, we look at the conditional distribution of Lied Given Sex (table
below). The percentages for both males and females are quite similar. For the males, about 55% admitted
that they lied, whereas for the females, 51% admitted that they had lied. Males may be slightly more
willing to admit that they lied than females.
Note: This does not mean that they are more likely to lie; just that in this study a higher percentage
reported that they had lied than did their female counterparts.
Conditional Distribution of Lied Given Sex
Lied Male Female
Yes 54.96 51.06
No 45.04 48.94
Total 100 100
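Each conditional percentage is a cell count divided by its column total; a sketch that reproduces the table:

```python
counts = {"Male": {"Yes": 5057, "No": 4145},
          "Female": {"Yes": 5966, "No": 5719}}

for sex, col in counts.items():
    total = sum(col.values())                       # 9202 and 11685
    for lied, n in col.items():
        print(sex, lied, f"{100 * n / total:.2f}")  # e.g. Male Yes 54.96
```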

2.125. (a) There are 151 + 148 = 299 “high exercisers,” of which 151/299 = 50.5% get enough sleep and
49.5% (the rest) do not. (b) There are 115 + 242 = 357 “low exercisers,” of which 115/357 = 32.2% get
enough sleep and 67.8% (the rest) do not. (c) Those who exercise more than the median are more likely to
get enough sleep.
2.127. 63/2100 = 3.0% of Hospital A’s patients died, compared with 16/800 = 2.0% at B.
2.129. Answers will vary. In general, choose a to be any number from 0 to 300, and then all the other
entries can be determined.
2.131. Answers will vary. For example, causation might be a negative association between the setting on
a stove and the time required to boil a pot of water (higher setting, less time). Common response might be
a positive association between SAT score and grade point average. Both of these will have a positive
relationship with a person’s IQ. An example of confounding might be a negative association between
hours of TV watching and grade point average. Once again, people who are naturally smart could finish

required work faster and have more time for TV; those who are not as smart could become frustrated and
watch TV instead of doing homework.
2.133. This is a case of confounding: The association between dietary iron and anemia is difficult to
detect because malaria and helminths also affect iron levels in the body.
2.135. Responses will vary. For example, students who choose the online course might have more self-
motivation or better computer skills. A diagram is shown on the right; the generic “Student
characteristics” might be replaced with something more specific.

2.137. No, self-confidence and improving fitness could be common responses to some other personality
trait, or high self-confidence could make a person more likely to join the exercise program.
2.139. Patients suffering from more serious illnesses are more likely to go to larger hospitals (which may
have more or better facilities) for treatment. They are also likely to require more time to recuperate
afterward.

2.141. In this case, there may be a causative effect but in the direction opposite to the one suggested:
People who are overweight are more likely to be on diets and so choose artificial sweeteners over sugar.
(Also, heavier people are at a higher risk to develop diabetes; if they do, they are likely to switch to
artificial sweeteners.)
2.143. This is an observational study—students choose their “treatment” (to take or not take the refresher
sessions).

2.145. (a) Plot shown below. There is no real linear relationship between production and dwelling
permits. (b) ŷ = 110.95813 + 0.07315DwellPermit. (c) For each new index point of dwelling permits
issued, production increases by 0.07315. (d) The intercept is 110.95813. This is what we expect sales to
be when the index for permits issued for new dwellings is 0. This interpretation is not useful because we
do not have any countries with around 0 permits issued, so we do not know whether the relationship
observed extends that far. (e) ŷ = 110.95813 + 0.07315(224) = 127.3444. (d) e = 101 – 127.3444 = –
26.34444. (e) R-square = 1.99%. This is much smaller than the R-square value in the previous exercise. It
should be noted, however, that neither variable (production or sales) had a strong relationship to the
number of permits issued for new dwellings.

2.147. (a) Generally, as the percent under 15 increases in a province or territory, the percent of the
population over 65 in that province or territory decreases. (b) r = –0.851. The correlation gives a pretty
good representation of the relationship; however, there is an outlier, Nunavut, which does not follow the
pattern of the other provinces and territories.
2.149. (a) Plot shown below. (b) The three territories have smaller percentages of the population over 65
than any of the provinces. Additionally, two of the three territories have larger percentages of the
population under 15 than any of the provinces.

2.151. Answers may vary. Overall, only 31% of banks offer RDC. Of those that offer RDC, almost half,
48%, have an asset size of 201 or more. Conversely, of those that do not offer RDC, over half, 59%, have
an asset size of under 100. Generally speaking, banks that offer RDC are more likely to have larger asset
sizes, while the banks that do not offer RDC are more likely to have smaller asset sizes. Put differently, as
asset size increases, it is more likely that the bank will offer RDC.
Conditional Distribution: AssetSize Given OfferRDC
            OfferRDC: Yes   OfferRDC: No
Under100    26.92           58.75
101To200    25.21           25.10
201orMore   47.86           16.16
Total      100             100

Conditional Distribution: OfferRDC Given AssetSize
            Yes     No      Total
Under100    16.94   83.06   100
101To200    30.89   69.11   100
201orMore   56.85   43.15   100

[Bar graph, “Distribution of OfferRDC by Asset Size,” not reproduced here.]

2.153. (a) Table of Field by Country.

Field Canada France Germany Italy Japan UK US Total


SocBusLaw 64 153 66 125 259 152 878 1697
SciMathEng 35 111 66 80 136 128 355 911
ArtsHum 27 74 33 42 123 105 397 801
Educ 20 45 18 16 39 14 167 319
Other 30 289 35 58 97 76 272 857
Total 176 672 218 321 654 475 2069 4585

(b) Marginal Distribution of Country.

Country Frequency Percent


Canada 176 3.84
France 672 14.66
Germany 218 4.75
Italy 321 7
Japan 654 14.26
UK 475 10.36
US 2069 45.13

(c) Marginal Distribution of Field.

Field Frequency Percent


ArtsHum 801 17.47
Educ 319 6.96
Other 857 18.69
SciMathEng 911 19.87
SocBusLaw 1697 37.01

2.155. A school that accepts weaker students but graduates a higher-than-expected number of them would
have a positive residual, whereas a school with a stronger incoming class but a lower-than-expected
graduation rate would have a negative residual. It seems reasonable to measure school quality by how
much benefit students receive from attending the school.
2.157. (a) The residuals are positive at the beginning and end and negative in the middle. (b) The
behavior of the residuals agrees with the curved relationship seen in Figure 2.34.
2.159. (a) The regression equation for predicting salary from year is ŷ = 41.253 + 3.9331Year; for year
25, the predicted salary is 41.253+3.9331*25 = 139.58 thousand dollars, or about $139,600. (b) The log-
salary regression equation is ŷ = 3.8675 + 0.04832Year. At year 25, we predict 3.8675 + 0.04832(25) =
5.0755, so the predicted salary is e^5.0755 = 160.052, or about $160,050. (c) Although both predictions

involve extrapolation, the second is more reliable because it is based on a linear fit to a linear relationship.
(d) Interpreting relationships without a plot is risky. (e) Student summaries will vary but should include
comments about the importance of looking at plots and the risks of extrapolation.
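The back-transformation in part (b) is the step that is easiest to get wrong: the model predicts log salary, so the prediction must be exponentiated. A sketch:

```python
from math import exp

log_pred = 3.8675 + 0.04832 * 25  # 5.0755, predicted log salary at year 25
salary = exp(log_pred)            # about 160.05 (thousand dollars)
```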
2.161. (a) ŷ = 2990.42521 + 0.99216Salary2014_15. (b) Residual plot below. There are two outliers in
the y direction. One individual is being paid much higher than predicted; another is being paid much
lower than predicted.

2.163. Number of firefighters and amount of damage are common responses to the seriousness of the fire.
2.165. The scatterplot shows a strong linear positive association. The regression equation is ŷ = 2652.66
+ 0.00474MOE; this regression explains r² = 0.6217, or about 62%, of the variation in MOR. Thus, we can use
MOE to get fairly good (though not perfect) predictions of MOR.

2.167. (a) Table shown below. (b) 490/700 or 70% of male applicants are admitted; 280/500 or 56% of
female applicants are admitted. (c) For Business, 80% of male applicants are admitted; 90% of female
applicants are admitted. For Law, 10% of male applicants are admitted; 33% of female applicants are

admitted. (d) Three times as many men apply to business school as to law school, while the number of
women who apply to each school is much closer but with more women applying to law school. Because
the business school has much higher admittance rates, it makes the overall percentage for men much
higher when combined with the law school applicants. Similarly, because the law school admittance rates
are much worse, the fact that more women applied to law school makes their overall percentage look
much worse when combined with the business school applicants.

Sex Admit Deny Total


Male 490 210 700
Female 280 220 500
Total 770 430 1200

2.169. If we ignore the “year” classification, we see that Department A teaches 32 small classes out of 52,
or about 61.54%, whereas Department B teaches 42 small classes out of 106, or about 39.62%. (These
agree with the dean’s numbers.) For the report to the dean, students may analyze the numbers in a variety
of ways, some valid and some not. The key observations are: (i) When considering only first- and second-
year classes, A has fewer small classes (1/12 = 8.33%) than B (12/70 = 17.14%). Likewise, when
considering only upper-level classes, A has 31/40 = 77.5% and B has 30/36 = 83.33% small classes. The
graph on the right illustrates this. These numbers are given in the back of the text, so most students should
include this in their analysis! (ii) 40/52 = 76.92% of A’s classes are upper-level courses, compared with
36/106 = 33.96% of B’s classes.

2.171. (a) The distribution shown below shows a split between what people feel about the quality of the
recycled filters. (b) The conditional distributions are shown in the table below. Buyers of the filters are
much more likely to think the quality of the recycled filter is higher, while in contrast, Nonbuyers are
much more likely to think the quality of the recycled filter is lower. It is plausible that using the filters
may cause more favorable opinions.

Quality   Frequency   Percent
Higher    49          36.84
Same      32          24.06
Lower     52          39.10

Conditional Distribution of Quality Given Group
Group       Higher   Same    Lower   Total
Buyers      55.56    19.44   25.00   100
Nonbuyers   29.90    25.77   44.33   100

[Bar graph, “Opinion on quality of recycled filters,” not reproduced here.]
2.173. (a) The tables are shown below.


Female Titanic Passengers
           Class 1   Class 2   Class 3   Total
Survived   139       94        106       339
Died       5         12        110       127
Total      144       106       216       466

Male Titanic Passengers
           Class 1   Class 2   Class 3   Total
Survived   61        25        75        161
Died       118       146       418       682
Total      179       171       493       843

(b) If we look at the conditional distribution of survival given class for females, 139/144 = 96.53% of
first-class females survived, 94/106 = 88.68% survival among second-class females, and 106/216 =
49.07% survival among third-class females. Survival depended on class. (c) For males, 61/179 = 34.08%
survival among first-class, 25/171 = 14.62% survival among second-class, and 75/493 = 15.21% survival
among third-class. Once again, survival depended on class. (d) Females overall had much higher survival
rates than males.

2.175. Answers will vary.

Chapter 3 Producing Data
3.1. This is anecdotal evidence; the preference of a group of friends likely would not generalize to the
entire college.
3.3. This is anecdotal evidence; the opinion of the three students does not generalize to all students,
especially if the students were picked by the newspaper because of their strong opinions in the first place.
3.5. In order to evaluate the claim, we would need data for a sample of the Millennial Generation
including its brand preferences and measures of loyalties toward these brands.
3.7. Yes to both. Here are two examples. Pew Research regularly makes available its data (about six
months after the survey) for academic purposes; surveys are observational. The “warnings and
contraindications” packed with medicines are based on experiments.
3.9. This is an observational study. All the soap dispensers were run until their batteries died; there were
no treatments.
3.11. This is an experiment. Explanatory variable: apple form (juice or whole fruit); response variable:
how full the subject felt.
3.13. The data used were likely observational data from a sample survey. Most likely a random sample of
cans of tuna were measured and compared with the amount printed on the labels.
3.15. (a) This is a sample survey if it is supposed to represent only those who did not get tickets;
otherwise, it is anecdotal evidence and certainly doesn’t apply to all of those who tried to get tickets to the
concert. (b) This represents a sample survey of all those who tried to get tickets to the concert (it is
actually a voluntary response sample). (c) This also represents a sample survey of all those who tried to
get tickets to the concert. It differs from part (b) because only a sample of students was selected to
participate rather than allowing participants to be self-selected. (d) It is not an experiment; there is no
treatment. (e) The method used in part (a) is not very useful, especially if it was determined the students
were upset before being interviewed. The method in part (b) also can be problematic, as those who
choose to respond may have strong opinions, especially negative ones. The method in part (c) is likely the
best of the three in order to get an accurate representation of students’ opinions on how the tickets were
allocated.
3.17. (a) This is an experiment because the girls were assigned a controlled diet during the study period.
The experimental units/subjects are the girls, the treatments are the two controlled diets (low and high
calcium), and the response is the amount of calcium retained. The factor in this instance is the same as
the treatment, the diet assigned; it has two levels: low and high calcium.
3.19. (a) This study is biased because no control group was used. Therefore, the placebo effect may be
present. (b) To remove the bias, two treatment groups should be used: one with the aspirin and a control
group that receives a placebo.
Note: They also used individuals who had frequent headaches. Their responses to aspirin
certainly could be different from responses of other students (who likely do not use pain-reducing drugs
as frequently).

3.21. (Design diagram not reproduced here.)

3.23. Using a computer, answers will vary. Each treatment group should end up with 25 subjects.
3.25. (a) Experimental units were the 30 students. They are human, so we can use “subjects.” (b) We
have only one “treatment,” so the experiment is not comparative. One possibility is to randomly assign
half of the students to the online homework system and the other half to “standard” homework. (c) One
possibility is grades on an exam over the material from that month; compare those with the online
homework to those with “standard” homework.
3.27. (a) Experimental units (subjects): people who go to the website. Treatments: description of comfort
or showing discounted price. Response variable: shoe sales. (b) Comparative, because we have two
treatments. (c) One option to improve: randomly assign morning and afternoon treatments. (d) Yes, a
placebo (no special description or price) could give a “baseline” sales figure.
3.29. No, a matched pairs design would require that the two measurements be taken on the same subject,
in this case the person who visits the website, which would be impossible here.
3.31. (a) Shopping patterns may differ on Friday and Saturday. (b) Responses may vary in different
states. (c) A control is needed for comparison.
3.33. For example, new employees should be randomly assigned to either the current program or the new
one. One possible outcome would be whether the new employee is still with the company six months
later.
3.35. (a) The factors are calcium dose and vitamin D dose. There are nine treatments (each
calcium/vitamin D combination). (b) Assign 20 students to each group, with 10 of each gender. The
complete diagram (including the blocking step) would have a total of 18 branches, below is a portion of
that diagram, showing only three of the nine branches for each gender. (c) Randomization results will
vary. (d) Yes, there is a placebo. The group that gets 0 mg of both calcium and vitamin D serves as the
placebo group.

3.37. Applet, answers will vary. (a) The results of the first random selection are shown below. (b) Click
Sample again for the second 25. They will begin immediately after the end of the first sample (73 in the
results shown above). (c) Continue until 75 have been assigned to a group and check that there are 25
remaining in the “Population hopper.”

3.39. Design (a) is an experiment. Because the treatment is randomly assigned, the effect of other habits
would be “diluted” because they would be more or less equally split between the two groups. Therefore,
any difference in colon health between the two groups could be attributed to the treatment (bee pollen or
not).
Design (b) is an observational study. It is flawed because the women observed choose whether or not to
take bee pollen; one might reasonably expect that people who choose to take bee pollen have other dietary
or health habits that would differ from those who do not.
3.41. (a) Randomly assign half the girls to get high-calcium punch; the other half will get low-calcium
punch. The response variable is not clearly described in this exercise; the best we can say is “observe how
the calcium is processed.” (b) Shown below; those in the Sample section will receive the high calcium
punch. (c) Also shown below.

Camper   Punch type   Camper   Punch type   Camper   Punch type
1 low 11 high 21 high
2 high 12 high 22 high
3 high 13 high 23 low
4 low 14 low 24 high
5 high 15 high 25 high
6 low 16 high 26 high
7 low 17 low 27 high
8 low 18 high 28 low
9 low 19 low 29 low
10 low 20 low 30 low
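The same random assignment can be done in software instead of with Table B; a sketch (the seed is arbitrary, so the resulting split will generally differ from the table above):

```python
import random

random.seed(1)               # any seed; included only for reproducibility
campers = list(range(1, 31))
random.shuffle(campers)
high = sorted(campers[:15])  # these 15 campers get the high-calcium punch
low = sorted(campers[15:])   # the remaining 15 get the low-calcium punch
```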

3.43. Answers will vary. For example, the trainees and experienced professionals could evaluate the same
water samples.
3.45. (a) The sample is the 772 forest owners that the survey was sent to. (b) The population is all forest
owners from this region. (c) The response rate is the percentage of those who were sent the survey that
returned it: 348/772 = 45%. Additionally, we would like to know the sample design (among other things).
3.47. The next three labels selected are 114, 080, 094. These correspond to Malta, Iraq, and Kosovo.
3.49. See the solution to the previous exercise; for this problem, we need to choose three items instead of
two, but the setup is otherwise the same.
3.51. (a) The population is all 5674 season ticket holders. (b) The sample is the 150 fans who were sent
the survey. (c) The response rate is 98/150 = 65.33%. (d) The nonresponse rate is 52/150 = 34.67%. (e)
Answers will vary; offering free food or a raffle for free tickets to responders are some examples.
3.53. (a) Answers will vary depending on which line students use. They should break the line into three-
digit numbers and pick the first eight labels that are between 001 and 123. (b) Answers will vary on
preference between Excel and the random digit table.
3.55. (a) This statement confuses the ideas of population and sample. (If the entire population is found in
our sample, we have a census rather than a sample.) (b) "Dihydrogen monoxide" is H2O (water). Any
concern about the dangers posed by water most likely means that the respondent did not know what
dihydrogen monoxide was and was too embarrassed to admit it. (Conceivably, the respondent could have
known the question was about water and had concerns arising from a bad experience of flood damage or
near-drowning. But misunderstanding seems to be more likely.) (c) Honest answers to such questions are
difficult to obtain even in an anonymous survey; in a public setting like this, it would be surprising if
there were any raised hands (even though there are likely to be at least a few cheaters in the room).
3.57. The population is (all) local businesses. The sample is the 80 businesses selected. The nonresponse
rate is 34/80 = 42.5%.
3.59. Using line 136 and labels 01–33, we select 08, 14, 20, 09, 24, 12, 11, and 16. The actual complexes chosen may vary depending on labels; alphabetically we would choose: Burberry, Crestview, Georgetown, Cambridge, Nobb Hill, Country View, Country Square, and Fairington.

3.61. Applet, answers will vary.
3.63. Each student has a 20% chance: five out of 25 over-21 students, and three of 15 under-21 students.
This is not an SRS because not every group of eight students can be chosen; the only possible samples are
those with five older and three younger students. It is a stratified random sample.
3.65. The sample is random because the starting point is randomly selected (so every individual has an
equal chance to be selected before the process begins). Once the random starting point has been selected,
the rest of the sample is determined. There is no possibility of selecting, say, students 04 and 05 together (as in the previous exercise), which could happen in a simple random sample.
3.67. Label the students 01, . . ., 30 and use Table B. Then label the faculty 0, . . ., 9 and use the table
again. We were not told a particular starting line, so actual samples will vary.
Note: Students often try some fallacious method of choosing both samples simultaneously. We
simply want to choose two separate SRSs: one from the students and one from the faculty.
3.69. (a) This design would omit households without telephones or with unlisted numbers. Such
households would likely be made up of poor individuals (who cannot afford a phone), those who choose
not to have phones (perhaps because they use a cell phone exclusively) and those who do not wish to have
their phone numbers published. (b) Those with unlisted numbers would be included in the sampling
frame when a random-digit dialer is used.
3.71. These three proposals are clearly in increasing order of risk. Most students will likely consider that
(a) qualifies as minimal risk, and most will agree that (c) goes beyond minimal risk.
3.73. Answers will vary. It is good to plainly state the purpose of the research (“To study how people’s
religious beliefs and their feelings about authority are related”). Stating the research thesis (that orthodox
religious beliefs are associated with authoritarian personalities) would cause bias.
3.75. Answers will vary.
3.77. Answers will vary.
3.79. Answers will vary.
3.81. To control for changes in the mass spectrometer over time, we should alternate between control and
cancer samples.
3.83. They cannot be anonymous because the interviews are conducted in person in the subject’s home.
They are certainly kept confidential.
Note: For more information about this survey, see the GSS website: www.norc.org/GSS+Website
3.85. Answers will vary.
3.87. (a) Answers will vary. (b) The subjects should be told what kind of questions will be asked and how
long it will take. (c) Answers will vary. (d) Revealing the sponsor could bias the poll, especially if the
respondent doesn’t like or agree with the sponsor. However, the sponsor should be announced once the
results are made public; that way people can know the motivation for the study and judge whether it was
done appropriately, etc.
3.89. Answers will vary.

3.91. Answers will vary.
3.93. (a) You need information about a random selection of his games, not just the ones he chooses to talk
about. (b) These students may have chosen to sit in the front; all students should be randomly assigned to
their seats.
3.95. This is an experiment because each subject is (randomly, we assume) assigned to a treatment. The
explanatory variable is the price history seen by the subject (steady prices or fluctuating prices), and the
response variable is the price the subject expects to pay.
3.97. Answers will vary.
3.99. The two factors are gear (three levels) and steepness of the course (number of levels not specified).
Assuming there are at least three steepness levels, which seems like the smallest reasonable choice, that
means at least nine treatments. Randomization should be used to determine the order in which the
treatments are applied. Note that we must allow ample recovery time between trials, and it would be best
to have the rider try each treatment several times.
3.101. (a) One possible population: all full-time undergraduate students in the fall term on a list provided
by the registrar. (b) A stratified sample with 125 students from each class rank is one possibility. (c)
Mailed (or emailed) questionnaires might have high nonresponse rates. Telephone interviews exclude
those without phones and may mean repeated calling for those who do not answer. Face-to-face
interviews might be more costly than funding will allow. There might also be some response bias: Some
students might be hesitant about criticizing the faculty (while others might be far too eager to do so).
3.103. Use a block design: Separate men and women, and randomly allocate each gender among the six
treatments.
3.105. The latter method (CASI) will show a higher percentage of drug use because respondents will
generally be more comfortable (and more assured of anonymity) about revealing illegal or embarrassing
behavior to a computer than to a person, so they will be more likely to be honest.
3.107. Answers will vary.
3.109. Answers will vary.

Chapter 4 Probability: The Study of Randomness
4.1. Six of the first 10 digits on line 131 correspond to “heads,” so the proportion of heads is 6/10 = 60%.
Although the average number of heads in 10 tosses is 5, the actual outcome is random and can vary from
sample to sample.
4.3. (a) We can discuss the probability (chance) the temperature would be between 30 and 35 degrees
Fahrenheit, for example. (b) Depending on your school, student identification numbers are probably not
random. For example, at the author’s university, all student IDs begin with 900, which means that the first
three digits are all the same and not random. (c) The probability of an ace in a single draw is 4/52 if the
deck is well-shuffled.
4.5. Answers will vary depending on your set of 25 rolls.
4.7. Answers will vary. This particular set of rolls had 21/40 rolls that included at least one six. Continue
until you are convinced of the result.

4.9. S = {all numbers between 0 and 168}.


4.11. A favorite color being white, black, silver, gray, or red means it is not blue, brown, or some other color. This probability is 1 − (0.07 + 0.05 + 0.04) = 1 − 0.16 = 0.84. Adding three probabilities and subtracting that result from 1 is slightly easier than adding the five probabilities of interest.
4.13. These are disjoint events, so P(4 or (7 or more)) = P(4) + P(7 or more) = 0.097 + 0.155 = 0.252.
4.15. Heads and tails are equally likely on each toss, so the sample space of outcomes is {HH, HT, TH,
TT}. Getting a head and then a tail is one of these, so the probability is 1/4 = 0.25.
4.17. (a) S = {Yes, No}. (b) S = {0, 1, 2, ..., x}, where x is a reasonable upper limit. (c) S = {18, 19, 20, ...
} (presuming the friends of college students are most likely other college students). There is some leeway
here with the lower and upper ends of the ages. (d) Answers will vary by institution. Some have many
(for example, Purdue University has almost 200 possible majors); others have few (for example, a
culinary school presumably has just one major).
4.19. (a) Not equally likely: she wins close to 60% of her matches. (b) Equally likely. The chance of a 3
and the chance of a 4 are both 1/6. (c) Probably not equally likely because it depends on the intersection.
(d) Not equally likely: home teams win more than half their games.
4.21. (a) The probability that both of the two disjoint events occur is 0. (Multiplication of probabilities is appropriate for independent events, not disjoint events.) (b) Probabilities must be no more than 1; P(A and B) will be no more than 0.7. (We cannot determine this probability exactly from the given information.) (c) P(not A) = 1 − P(A) = 1 − 0.45 = 0.55.
4.23. There are five possible outcomes: S = {link1, link2, link3, link4, leave}.
4.25. (a) P(AB) = 1 − (0.42 + 0.11 + 0.44) = 1 − 0.97 = 0.03. (b) P(can donate) = P(O or B) = P(O) + P(B) = 0.44 + 0.11 = 0.55.
4.27. (a) No. These student categories are disjoint, and the probabilities sum to more than 1. (b) This is
legitimate (in terms of the probability rules), but the deck would be a nonstandard one. (c) This is
legitimate, but represents a “loaded” die (i.e., not a fair one).
4.29. (a) P(some education beyond high school, but no degree) = 1 – (0.12 + 0.31 + 0.29) = 0.28. (b) P(at
least high school) = 1 – P(didn’t finish high school) = 1 – 0.12 = 0.88.
4.31. For example, the probability for A-positive blood is (0.35)(0.84) = 0.294, and for A-negative blood it is (0.35)(0.16) = 0.056.

Blood type   A+     A–     B+     B–     AB+     AB–     O+      O–
Probability  0.294  0.056  0.084  0.016  0.0252  0.0048  0.4368  0.0832

4.33. (a) There are six arrangements of the digits 0, 5, and 9 (059, 095, 509, 590, 905, 950), so that
P(win) = 6/1000 = 0.006. (b) There are only three arrangements of the digits 2, 2, and 3 (223, 232, 322),
so that P(win) = 3/1000 = 0.003.
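Note: The arrangement counts can be verified by brute force; a minimal Python sketch:

    from itertools import product

    def win_prob(digits):
        # count the 1000 equally likely draws that rearrange `digits`
        target = sorted(digits)
        wins = sum(1 for d in product(range(10), repeat=3) if sorted(d) == target)
        return wins / 1000

    print(win_prob((0, 5, 9)))  # 0.006 (six arrangements)
    print(win_prob((2, 2, 3)))  # 0.003 (three arrangements)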
4.35. P(none are O-negative) = P(all are not O-negative) = (1 − 0.07)^10 = 0.4840, so P(at least one is O-negative) = 1 − P(none are O-negative) = 1 − 0.4840 = 0.5160.
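Note: A quick check of the complement-rule arithmetic:

    p_not_O_neg = 1 - 0.07        # P(one donor is not O-negative)
    print(1 - p_not_O_neg ** 10)  # P(at least one of 10 is O-negative) = 0.5160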
4.37. Note that A = (A and B) or (A and B^c), and the events (A and B) and (A and B^c) are disjoint, so Rule 3 says that P(A) = P((A and B) or (A and B^c)) = P(A and B) + P(A and B^c). If P(A and B) = P(A)P(B), then we have P(A and B^c) = P(A) − P(A)P(B) = P(A)(1 − P(B)), which equals P(A)P(B^c) by the complement rule.
4.39. The table shows the combinations. (a) Abdreona and Caleb's children can have alleles AA, AO, or OO, so they can have blood type A or O. (b) Either note that the four combinations in the table are equally likely, or compute P(type O) = P(O from Nancy and O from David) = 0.5² = 0.25 and P(type A) = 1 − P(type O) = 0.75.

A O
A AA AO
O AO OO

4.41. The table shows the combinations. Anna and Nathan's children can have alleles AB, AO, BO, or OO, so they can have any of the four blood types, all with equal probability: P(type A) = P(type AB) = P(type B) = P(type O) = 0.25. (a) P(at least one child has type O) = 1 − P(no child has type O) = 1 − P(all three have type A, AB, or B) = 1 − (0.75)³ = 0.578125 = 37/64. (b) P(all three have type O) = 0.25³ = 0.015625 = 1/64. P(first has type O, next two do not) = 0.25(0.75²) = 0.140625 = 9/64.

A O
B AB BO
O AO OO

4.43. Two tosses of a fair coin can result in HH, HT, TH, or TT. Each of these has probability 1/4.
Counting the number of heads in the two tosses, we have

x 0 1 2
P(X = x) 0.25 0.5 0.25

4.45.

x 1 2 3 4 5 6
P(X=x) 0.05 0.05 0.13 0.26 0.36 0.15

4.47. (a) P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.05 + 0.05 + 0.13 = 0.23. (b) P(X = 4 or X = 5) = 0.26 + 0.36 = 0.62. (c) P(X = 8) = 0.
4.49. (a) The possible values certainly can be negative; it is the probabilities that can’t be negative. (b)
Continuous random variables can take values from any interval, not just 0 to 1. (c) A Normal random
variable is continuous. (Also, a distribution is associated with a random variable, but “distribution” and
“random variable” are not the same things.)
4.51. Based on the information from Exercise 4.50, along with the complement rule, P(T) = 0.19 and P(T^c) = 0.81. P(TT) = 0.19² = 0.0361, P(TT^c) = (0.19)(0.81) = 0.1539, P(T^cT) = (0.81)(0.19) = 0.1539, and P(T^cT^c) = 0.81² = 0.6561. (c) Add up the probabilities that correspond to each value of X.

Outcome       TT      TT^c    T^cT    T^cT^c
Probability   0.0361  0.1539  0.1539  0.6561
X             0       1       1       2

X             0       1       2
Probability   0.0361  0.3078  0.6561

4.53. (a) Time is continuous. (b) Hits are discrete (you can count them). (c) Yearly income is discrete
(you can count money).
4.55. (a) The pairs are given below. We must assume that we can distinguish between, for example, "(1,2)" and "(2,1)"; otherwise, the outcomes are not equally likely. (b) Each pair has probability 1/36. (c) The value of X is given below each pair. For the distribution, we see that there are four pairs that add to 5, so P(X = 5) = 4/36. (d) P(7 or 11) = 6/36 + 2/36 = 8/36 = 2/9 = 0.222. (e) P(not 7) = 1 − 6/36 = 5/6 = 0.8333.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
  2     3     4     5     6     7
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
  3     4     5     6     7     8
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
  4     5     6     7     8     9
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
  5     6     7     8     9    10
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
  6     7     8     9    10    11
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
  7     8     9    10    11    12

[Histogram of the distribution of the sum of two dice, not reproduced here.]

Sum          2     3     4     5     6     7     8     9    10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
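Note: The distribution of the sum can be built by enumerating the 36 equally likely pairs; a minimal Python sketch:

    from fractions import Fraction
    from collections import Counter

    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    dist = {s: Fraction(c, 36) for s, c in counts.items()}
    print(dist[5])             # 1/9, i.e., 4/36
    print(dist[7] + dist[11])  # P(7 or 11) = 2/9
    print(1 - dist[7])         # P(not 7) = 5/6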

4.57. (a) Shown below. (b) P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.1 = 0.9. (c) "At most three nonword errors" is the event X ≤ 3: P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.1 + 0.3 + 0.3 + 0.2 = 0.9, while P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = 0.1 + 0.3 + 0.3 = 0.7.

[Probability histogram of the number of nonword errors, not reproduced here.]

4.59. (a) The height should be 1/2 because the area under the curve must be 1. (b) P(Y ≤ 1.6) = 0.8. (c) P(0.5 < Y < 1.7) = 0.6. (d) P(Y ≥ 0.95) = 0.525. [Sketches of the density curve with the relevant areas shaded are not reproduced here.]

4.61. P(8 ≤ X ≤ 10) = P((8 − 9)/0.0724 ≤ (X − 9)/0.0724 ≤ (10 − 9)/0.0724) = P(−13.8 ≤ Z ≤ 13.8). This probability is essentially 1; x̄ will almost certainly estimate µ within ±1 (in fact, it will almost certainly be much closer than this).

4.63. The possible values of X are $0 and $10, each with probability 0.5. The mean amount won will be
$0(0.5) + $10(0.5) = $5.

4.65. µ_Y = 12 + 6µ_X = 12 + 6(12) = 84.

4.67. µ_X = 0(0.3) + 3(0.7) = 2.1. σ²_X = (0 − 2.1)²(0.3) + (3 − 2.1)²(0.7) = 1.89. σ_X = √1.89 = 1.3748.

4.69. As the sample size gets larger, the standard deviation decreases. The mean for 1000 will be much
closer to µ than the mean for 2 (or 100) observations. Intuitively, as you sample more of the population,
you have better information about the population, so any estimates about the population should get
“better,” that is, closer to the true value(s).
4.71. σ²_X = (−1 − 0.6)²(0.2) + (0 − 0.6)²(0.3) + (1 − 0.6)²(0.2) + (2 − 0.6)²(0.3) = 1.24, σ_X = √1.24 = 1.1136.
4.73. (a) σ²_Z = 10²σ²_X = 100(4)² = 1600, σ_Z = 40. (b) σ²_Z = 12²σ²_X = 144(4)² = 2304, σ_Z = 48. (c) σ²_Z = σ²_X + σ²_Y − 2ρσ_Xσ_Y = (4)² + (8)² − 2(0.5)(4)(8) = 48, σ_Z = 6.928. (d) σ²_Z = σ²_X + σ²_−Y = (4)² + (−1)²(8)² − 2(0.5)(4)(8) = 48, σ_Z = 6.928. (e) σ²_Z = σ²_−2X + σ²_2Y = (−2)²(4)² + (2)²(8)² − 2(0.5)(8)(16) = 192, σ_Z = 13.856.

4.75. µ = 0(0.3) + 1(0.1) + 2(0.1) + 3(0.2) + 4(0.2) + 5(0.1) = 2.2.

4.77. σ = √[(0 − 0.1538)²(0.8507) + (1 − 0.1538)²(0.1448) + (2 − 0.1538)²(0.0045)] = √(0.0201228321 + 0.1036846829 + 0.015338045) = √0.13914556 = 0.373.

4.79. (a) σ = √(75² + 41²) = $85.48. (b) This is larger than the value calculated in the example; there, the negative correlation makes the standard deviation of the sum smaller.
4.81. The situation described in this exercise—“people who have high intakes of calcium in their diets are
more compliant than those who have low intakes”—implies a positive correlation between calcium intake
and compliance. Because of this, the variance of total calcium intake is greater than the variance we
would see if there were no correlation (as the calculations in Example 4.39 demonstrate).
4.83. (a) For a single toss, X = number of heads has possible values 0 and 1. µ_X = 0(0.5) + 1(0.5) = 0.5, and σ_X = √[(0 − 0.5)²(0.5) + (1 − 0.5)²(0.5)] = √0.25 = 0.5. (b) Tossing four times, we have µ_Y = 0.5 + 0.5 + 0.5 + 0.5 = 2 and σ_Y = √(0.25 + 0.25 + 0.25 + 0.25) = √1 = 1. (c) The use of the distribution is illustrated below. Summing the xp column, we find µ = 2; summing the last column gives σ² = 1, thus σ = 1.

x    P(X = x)    xp      (x − µ)²p
0 0.0625 0 0.25
1 0.25 0.25 0.25
2 0.375 0.75 0
3 0.25 0.75 0.25
4 0.0625 0.25 0.25
Note: This exercise illustrates an important fact: The result of four tosses of a coin is not the
same as multiplying the result of one toss by 4. That idea works for the mean but not for the variance and
standard deviation.
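Note: The table calculation in part (c) applies the definitions of the mean and variance directly; a minimal Python sketch:

    xs = [0, 1, 2, 3, 4]
    ps = [0.0625, 0.25, 0.375, 0.25, 0.0625]  # P(X = x) for four fair-coin tosses
    mu = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    print(mu, var)  # 2.0 and 1.0, so sigma = 1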
4.85. (a) Because cards are dealt without replacement, the total points X and Y are not independent. (b)
The result of one roll will not affect the other; X and Y are independent in this case.
4.87. Although the probability of having to pay for a total loss for one or more of the 10 policies is very
small, if this were to happen, it would be financially disastrous. On the other hand, for thousands of

policies, the law of large numbers says that the average claim on many policies will be close to the mean,
so the insurance company can be assured that the premiums they collect will (almost certainly) cover the
claims.
4.89. P(3 or 4 or 5) = P(3) + P(4) + P(5) = 1/6 + 1/6 + 1/6 = 1/2 = 0.5.
4.91. Let A be the event “next card is a heart” and B be “none of Doyle’s cards are hearts.” Then, P(A | B)
= 13/48 because (other than the four known cards in Slim’s hand) there are 48 cards, of which 13 are
hearts.
4.93. This computation uses the addition rule for disjoint events, which is appropriate for this setting
because B (full-time students) is made up of four disjoint groups (those in each of the four age groups).
4.95. [The solution is a figure, not reproduced here.]
4.97. (a) For disjoint events, P(A or B or C) = P(A) + P(B) + P(C) = 0.2 + 0.4 + 0.1 = 0.7. (b) Shown below. (c) P(not (A or B or C)) = 1 − P(A or B or C) = 1 − 0.7 = 0.3.

4.99. (a) P(B|A) = P(A and B)/P(A) = 0.2/0.5 = 0.4. (b) Shown below.

4.101. The letters assigned here are certainly not the only possible combinations. (a) Let A = "5 to 10 years old" and A^c = "11 to 13 years old" (these are the only two age groups under consideration). Let C = adequate calcium intake and C^c = inadequate calcium intake. (b) P(A) = 0.52; P(A^c) = 0.48; P(C^c | A) = 0.18; P(C^c | A^c) = 0.57. (c) Multiplying out the branches of the tree diagram (not reproduced here), P(inadequate calcium) = (0.52)(0.18) + (0.48)(0.57) = 0.3672.

4.103. These events are not independent; the probability of having inadequate calcium intake changes
with the age of the child. From the solution to Exercise 4.108, we have P(11 to 13 | inadequate calcium)
= 0.7451, which is not the same as P(11 to 13) = 0.48.
4.105. (a) 0.42 − 0.28 = 0.14. (b) 0.39 − 0.28 = 0.11. (c) 1 − (0.42 + 0.39 − 0.28) = 0.47. (d) The answers in (a) and (b) are found by a variation of the addition rule for disjoint events: We note that P(S) = P(S and E) + P(S and E^c) and P(E) = P(S and E) + P(S^c and E). In each case, we know the first two probabilities, and we find the third by subtraction. The answer for (c) is found by using the general addition rule to find P(S or E) and noting that S^c and E^c = (S or E)^c.
4.107. For a randomly chosen high school student, let L = “student admits to lying” and M = “student is
male,” so P(L) = 0.53, P(M) = 0.44, and P(M and L) = 0.24. Then, P(M or L) = P(M) + P(L) − P(M and L)
= 0.44 + 0.53 – 0.24 = 0.73.
4.109. (a) Shown in the table below. (b) P(four-year institution|woman) = P(four-year institution and
woman)/P(woman) = 0.3416/(0.3416+0.2301) = 0.5975.

Outcome Men Women
Four-year institution 0.2684 0.3416
Two-year institution 0.1599 0.2301

4.111. The tree diagrams are different because each branch on the tree is a conditional probability given
the previous branches. So, by rearranging the order of the branches, we are necessarily changing the
probabilities (assuming the events are not independent).

4.113. P(A|B) = P(A and B)/P(B) = 0.082/0.261 = 0.3142. If A and B are independent, then P(A|B) = P(A); but because P(A) = 0.138, A and B are not independent.
4.115. (a) "The vehicle is a light truck" is the event A^c, and P(A^c) = 0.69, given in the problem. (b) "The vehicle is an imported car" is the event (A and B), and P(A and B) = 0.08.
4.117. P(at least one offer) = 1 – P(None) = 1 – 0.1 = 0.9.
4.119. P(B|C) = P(B and C)/P(C) = 0.1/0.3 = 1/3 ≈ 0.33. P(C|B) = P(C and B)/P(B) = 0.1/0.5 = 1/5 = 0.2.
4.121. (a) Beth's brother has type aa, and he got one allele from each parent; but neither parent is albino, so neither could be type aa. (b) The table below shows the possible combinations, each of which is equally likely, so P(aa) = 0.25, P(Aa) = 0.5, and P(AA) = 0.25. (c) Beth is either AA or Aa, and P(AA | not aa) = 0.25/0.75 = 1/3, while P(Aa | not aa) = 0.50/0.75 = 2/3.

      A    a
 A   AA   Aa
 a   Aa   aa

4.123. Let C be the event "Toni is a carrier," T be the event "Toni tests positive," and D be "her son has DMD." We have P(C) = 2/3, P(T|C) = 0.7, and P(T|C^c) = 0.1. Therefore, P(T) = P(T and C) + P(T and C^c) = P(C)P(T|C) + P(C^c)P(T|C^c) = (2/3)(0.7) + (1/3)(0.1) = 0.5, and P(C|T) = P(T and C)/P(T) = (2/3)(0.7)/0.5 = 14/15 = 0.9333.
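Note: The tree-diagram computation is an application of Bayes' rule, and exact fractions make the 14/15 visible; a minimal Python sketch:

    from fractions import Fraction

    p_C = Fraction(2, 3)             # P(Toni is a carrier)
    p_T_C = Fraction(7, 10)          # P(tests positive | carrier)
    p_T_notC = Fraction(1, 10)       # P(tests positive | not a carrier)
    p_T = p_C * p_T_C + (1 - p_C) * p_T_notC
    print(p_T)                       # 1/2
    print(p_C * p_T_C / p_T)         # P(carrier | positive) = 14/15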
4.125. When repeated many times, the mean will be close to µ = 0.6(–8) + 0.4(5) = –2.4.
4.127. (a) Because the possible values of X are 2 and 3, the possible values of Y are Y = 5(2²) − 1 = 19 with probability 0.2 and Y = 5(3²) − 1 = 44 with probability 0.8. (b) µ = 19(0.2) + 44(0.8) = 39. σ² = (19 − 39)²(0.2) + (44 − 39)²(0.8) = 100, so σ = 10. (c) There are no rules for a quadratic function of a random variable; we must use the definitions.
4.129. (a) P(A) = 1/36, P(B) = 15/36. (b) P(A) = 1/36, P(B) = 15/36. (c) P(A) = 10/36, P(B) = 6/36. (d) P(A) = 10/36, P(B) = 6/36.
4.131. For each bet, the mean is the winning probability times the winning payout, plus the losing probability times −$10. These are summarized in the table below; all mean payoffs equal $0.

Point     Expected payoff
4 or 10   (1/3)(+20) + (2/3)(−10) = 0
5 or 9    (2/5)(+15) + (3/5)(−10) = 0
6 or 8    (5/11)(+12) + (6/11)(−10) = 0

Note: Alternatively, we can find the mean amount of money we have at the end of the bet. For example, if the point is 4 or 10, we end with either $30 or $0, and our expected ending amount is (1/3)($30) + (2/3)($0) = $10, equal to the amount of the bet.

4.133. (a) All the probabilities are between 0 and 1 and sum to 1. (b) P(tasters agree) = 0.03 + 0.07 + 0.25 + 0.20 + 0.06 = 0.61. (c) P(Taster 1 rates higher than 3) = 0.00 + 0.02 + 0.05 + 0.20 + 0.02 + 0.00 + 0.01 + 0.01 + 0.02 + 0.06 = 0.39. P(Taster 2 rates higher than 3) = 0.00 + 0.02 + 0.05 + 0.20 + 0.02 + 0.00 + 0.01 + 0.01 + 0.02 + 0.06 = 0.39.
4.135. This is the probability of nine (independent) losses, followed by a win; by the multiplication rule, this is (0.994)^9(0.006) = 0.00568.
4.137. P(no point is established) = 12/36 = 1/3. In Exercise 4.131, the probabilities of winning each odds bet were given as 1/3 for 4 and 10, 2/5 for 5 and 9, and 5/11 for 6 and 8. (The full tree diagram is large and is not reproduced here.) The probability of winning an odds bet on 4 or 10 (with a net payout of $20) is (3/36)(1/3) = 1/36. Losing that odds bet costs $10 and has probability (3/36)(2/3) = 2/36 (or 1/18). Similarly, the probability of winning an odds bet on 5 or 9 is (4/36)(2/5) = 2/45, and the probability of losing that bet is (4/36)(3/5) = 3/45 (or 1/15). For an odds bet on 6 or 8, we win $12 with probability (5/36)(5/11) = 25/396 and lose $10 with probability (5/36)(6/11) = 30/396 (or 5/66). To confirm that this game is fair, one can multiply each payoff by its probability and then add up all of those products. More directly, because each individual odds bet is fair (as was shown in the solution to Exercise 4.131), one can argue that taking the odds bet whenever it is available must be fair.

4.139. 1/18.

Chapter 5 Sampling Distributions
5.1. 84% is the statistic. The population is all women in the world; the parameter is the true proportion of all women who experience their first street harassment before the age of 17.
5.3. It is necessary for the amounts he collected to serve as a sample from a larger population, but it is not
entirely clear what this population would be. For example, if you consider that he could have played on
other days of the week, his records represent a sample of what he could have won or lost, and/or you are
considering future winnings as well, then his collection does serve as an approximation to a sampling
distribution. Without these assumptions, his records would be considered a population and would not
represent a sampling distribution.
5.5. The centers of the sampling distributions would be the same. The spread of the sampling distribution
for the SRS of 600 would be smaller than it would be for the SRS of 200.
5.7. (a) The population is the 153 English majors at your college. (b) The sample is the 30 selected to be
on the committee. (c) The statistic is the number or proportion from the 30 in favor of the change. (d) The
parameter would be the number or proportion of all 153 students who would favor the change. (e)
Answers will vary.
5.9. (a) Population: all students in the United States at four-year colleges. Sample: the 17,096 students
surveyed. (b) Population: all restaurant workers. Sample: the 100 people asked. (c) Population: all 584
longleaf pine trees. Sample: the 40 trees measured.
5.11. (a) We would expect the distribution to be right-skewed, causing the mean to be larger than the
median. Drawings will vary but should be right-skewed. (b) Take many samples of a fixed size n, and
record the sample median for each of these samples. A histogram of these sample medians would be an
approximation for the sampling distribution of the sample median.
5.13. Applet, answers will vary. (a) The shape of the sampling distribution should be roughly Normal
centered at 0.4. (b) The shape of the sampling distribution should be roughly Normal centered at 0.2.
5.15. Applet, answers will vary. (a) and (b) The mean should be “close” to 50.5. (c) The histogram
theoretically should be a Normal distribution centered at 50.5.
5.17. Population: all adults in the U.S. Statistic: mean of 37 hours and 6 min. Likely values: answers will
vary, any amount of time someone could spend during a month using mobile apps (4.5, 68, 0, etc).

5.19. µ_x̄ = µ = 44. σ_x̄ = σ/√n = 16/√576 = 0.667. When the sample size increases, the mean of the sampling distribution remains the same, but the standard deviation of the sampling distribution decreases.

5.21. With n = 2304, µ_x̄ = µ = 82 and σ_x̄ = σ/√n = 24/√2304 = 0.5. So x̄ ~ N(82, 0.5). About 95% of the time, x̄ should be between 81 and 83 (two standard deviations either side of the mean). With the larger sample size, the standard deviation decreased.

5.23. Applet, answers will vary. (a) Each sample size has µ_x̄ = 1. For n = 2, σ_x̄ = 1/√2 = 0.7071. For n = 10, σ_x̄ = 1/√10 = 0.3162. For n = 25, σ_x̄ = 1/√25 = 0.2. (b) Shown below is one realization for n = 25. (c) The mean and standard deviations should be fairly close to the values calculated in part (a). As the sample size increases, the sampling distribution of the sample mean gets less right-skewed and closer to Normal. (d) Answers will vary; generally, n = 40 is "large enough" for most skewed distributions like an exponential random variable.
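Note: The applet can be mimicked with a short simulation; a minimal Python sketch assuming an exponential population with µ = 1 and σ = 1 (the seed and number of repetitions are arbitrary):

    import random
    import statistics

    random.seed(0)
    def mean_and_sd_of_xbar(n, reps=10000):
        means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
                 for _ in range(reps)]
        return statistics.fmean(means), statistics.stdev(means)

    for n in (2, 10, 25):
        print(n, mean_and_sd_of_xbar(n))  # means near 1; SDs near 1/sqrt(n)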

5.25. (a) The standard deviation for n = 10 will be σ_x̄ = 10/√10. (b) Standard deviation decreases with increasing sample size. (c) µ_x̄ always equals µ and does not depend on the sample size n. (d) The population size N does not matter, as long as it is relatively large compared with n.
5.27. (a) µ = 120.5. (b) Answers will vary. (c) Answers will vary. (d) The center of the histogram should
theoretically be close to µ, but random sampling does not guarantee this. With larger n, we would expect
the mean of this sampling distribution to get closer to µ.

5.29. (a) Larger. (b) We need σ_x̄ ≤ 0.08/2 = 0.04. (c) We need 1.24/√n ≤ 0.04. Using algebra, we find n ≥ (1.24/0.04)² = 961, so n = 961.

5.31. The mean of the sample means will still be 250 ml. The standard deviation of the sample means will be σ_x̄ = 0.4/√5 = 0.1789 ml.

5.33. (a) Graph shown. (b) To be more than 1 ml away from the target value means the volume is less
than 249 or more than 251. Using symmetry, P = 2P(X < 249) = 2P(Z < −2.5) = 2(0.0062) = 0.0124. (c) P
= 2P(X < 249) = 2P(Z < −5.59), which is essentially 0.

5.35. (a) x̄ is not systematically higher than or lower than µ. (b) With large samples, x̄ is more likely to be close to µ.

5.37. (a) µ_x̄ = 0.4 and σ_x̄ = 0.9/√100 = 0.09. (b) z = (0.5 − 0.4)/0.09 = 1.11. From Table A, P(Z > 1.11) = 0.1335. (c) Yes, n = 100 is a large enough sample to be able to use the central limit theorem.

 208 − 190 
5.39. If W is total weight, and x = W/25, then P(W > 5200) = P( x > 208) = P  Z > =
 35 / 25 

P(Z > 2.57) = 0.0051.

5.41. (a) Assuming both samples are "large," ȳ has a N(µ_Y, σ_Y/√m) distribution, and x̄ has a N(µ_X, σ_X/√n) distribution. (b) ȳ − x̄ has a Normal distribution with mean µ_Y − µ_X and standard deviation √(σ_ȳ² + σ_x̄²) = √(σ_Y²/m + σ_X²/n).

5.43. n = 1391. X = 0.26(1391) = 361.66, so say 362. p̂ = X/n = 26% (as given in the problem).

5.45. (a) n = 1500. (b) Answers and reasons will vary. (c) If the choice is “Yes,” X = 1234. (d) For “Yes,”
pˆ = 1234 / 1500 = 0.8227.

5.47. B(20, 0.5).


5.49. (a) For the B(8, 0.3) distribution, P(X = 0) = 0.0576 and P(X ≥ 6) = 0.0113. (b) For the B(8, 0.7) distribution, P(X = 8) = 0.0576 and P(X ≤ 2) = 0.0113. (c) The number of "failures" in the B(8, 0.3) setting has the B(8, 0.7) distribution. With eight trials, zero successes is equivalent to eight failures, and six or more successes is equivalent to two or fewer failures.

5.51. (a) µ_X = np = 8(0.3) = 2.4; σ_X = √(np(1 − p)) = √(8(0.3)(1 − 0.3)) = 1.296. (b) µ_X = np = 8(0.7) = 5.6; σ_X = √(np(1 − p)) = √(8(0.7)(1 − 0.7)) = 1.296. (c) For the means, we have 2.4 + 5.6 = 8. The standard deviations in parts (a) and (b) are the same.
Note: For a binomial distribution, the average number of successes and failures will always add
up to the number of trials. Additionally, the standard deviations for the number of successes and failures
will always be the same.
5.53. From the previous exercise, we have µ = 0.5 and σ = 0.0408. (a) P(0.4 < p̂ < 0.6) = P((0.4 − 0.5)/0.0408 < Z < (0.6 − 0.5)/0.0408) = P(−2.45 < Z < 2.45). Using Table A, this becomes 0.9929 − 0.0071 = 0.9858. (b) Similarly, P(0.45 < p̂ < 0.55) = P(−1.23 < Z < 1.23) = 0.7814.

5.55. (a) P(X = 5) = e^(−4.4)(4.4)^5/5! = 0.1687. (b) Using software, P(X ≤ 5) = 0.7199.
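Note: The Poisson formula is easy to evaluate directly; a minimal Python sketch:

    from math import exp, factorial

    mu = 4.4
    def pmf(k):
        return exp(-mu) * mu**k / factorial(k)

    print(pmf(5))                         # about 0.1687
    print(sum(pmf(k) for k in range(6)))  # P(X <= 5), about 0.7199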

5.57. (a) Separate flips are independent (coins have no “memory,” so they do not get on a “streak” of
heads). (b) The coin is fair. The probabilities are still P(H) = P(T) = 0.5. (c) The parameters for a
binomial distribution are n and p. p̂ is a sample statistic. (d) This is best modeled with a Poisson
distribution.

5.59. (a) A B(200, p) distribution seems reasonable for this setting (even though we do not know what p
is). (b) This setting is not binomial; there is no fixed value of n. (c) A B(500, 1/12) distribution seems
appropriate for this setting. (d) This is not binomial because separate cards are not independent.
5.61. (a) The distribution of those who say they have stolen something is B(10, 0.2). The distribution of
those who do not say they have stolen something is B(10, 0.8). (b) X is the number who say they have
stolen something. P(X ≥ 4) = 1 – P(X ≤ 3) = 0.1209.
5.63. (a) µ = 10(0.2) = 2 will say they have stolen; µ = 10(0.8) = 8 will say they have not. (b) σ = √(10(0.2)(1 − 0.2)) = 1.265. (c) If p = 0.1, σ = √(10(0.1)(1 − 0.1)) = 0.949. If p = 0.01, σ = √(10(0.01)(1 − 0.01)) = 0.315. As p gets smaller, the standard deviation becomes smaller.

5.65. (a) P(X ≥ 12) = 0.0833 and P(X ≥ 13) = 0.0206, so 13 is the smallest value of m. (b) Using 75%,
P(X ≤ 13) = 0.9198. (c) The probability will decrease. When the sample size increases, the mean will
increase as well.
5.67. The probability that a digit is greater than 5 is P(6 or 7 or 8 or 9) = 0.1 + 0.1 + 0.1 + 0.1 = 0.4, and the probability that a digit is not greater than 5 is 0.6. (a) P(at least one of six digits is greater than 5) = 1 − P(no digits greater than 5) = 1 − (0.6)^6 = 0.9533. (b) µ = (40)(0.4) = 16.
5.69. (a) n = 4, p = 0.62. (b) See table and graph below. (c) µ = 4(0.62) = 2.48.

x 0 1 2 3 4
P(x) 0.0209 0.1361 0.3330 0.3623 0.1478

[Probability histogram of the number of young people who say "yes," not reproduced here.]

5.71. (a) Because 500(0.62) = 310 and 500(0.38) = 190, the approximate distribution is p̂ ~ N(0.62, 0.0217), where 0.0217 = √((0.62)(0.38)/500). P(0.59 < p̂ < 0.65) = P((0.59 − 0.62)/0.0217 < Z < (0.65 − 0.62)/0.0217) = P(−1.38 < Z < 1.38) = 0.9162 − 0.0838 = 0.8324. (b) If p = 0.9, the distribution of p̂ is approximately N(0.9, 0.0134) because 500(0.9) = 450 and 500(0.1) = 50. P(0.87 < p̂ < 0.93) = P((0.87 − 0.9)/0.0134 < Z < (0.93 − 0.9)/0.0134) = P(−2.24 < Z < 2.24) = 0.9875 − 0.0125 = 0.9750. (c) As p gets closer to 1, the probability of being within ±0.03 of p increases because the standard deviation decreases.

5.73. (a) The mean is µ_p̂ = p = 0.69, and the standard deviation is σ_p̂ = √(p(1 − p)/n) = 0.0008444. (b) µ ± 2σ
gives the range 68.83% to 69.17%. (c) This range is considerably narrower than the historical range. In
fact, 67% and 70% correspond to z = −23.7 and z = 11.8, suggesting that the observed percentages do not
come from a N(0.69, 0.0008444) distribution; that is, the population proportion has changed over time.
5.75. (a) p̂ = 56/200 = 0.28. (b) We need P(p̂ ≥ 0.28). p̂ is approximately N(0.28, 0.0317) because 200(0.28) = 56 and 200(0.72) = 144. P(p̂ ≥ 0.28) = P(Z ≥ (0.28 − 0.28)/0.0317) = P(Z ≥ 0) = 0.5. (c) Answers
may vary. The key is that there is a 50% chance of obtaining a sample proportion of 0.28 or larger if the proportion of binge drinkers at your campus is really 0.28, so it is likely that the percent of binge drinkers on your campus is similar to the national average.

5.77. (a) p = 1/4 = 0.25. (b) P(X ≥ 10) = 0.0139. (c) µ = np = 5 and σ = √(np(1 − p)) = √3.75 = 1.9365. (d)
No. The trials would not be independent because the subject may alter his/her guessing strategy based on
this information.

5.79. (a) X has a B(900, 1/5) distribution, with mean µ = np = 900(1/5) = 180 successes and standard deviation σ = √((900)(0.2)(0.8)) = 12 successes. (b) For p̂, the mean is µ_p̂ = p = 0.2 and σ_p̂ = √((0.2)(0.8)/900) = 0.01333. (c) P(p̂ > 0.24) = P(Z > 3) = 0.0013. (d) From a standard Normal distribution, P(Z > 2.326) = 0.01, so the subject must score 2.326 standard deviations above the mean: µ_p̂ + 2.326σ_p̂ = 0.20 + 2.326(0.01333) = 0.2310. This corresponds to 208 or more successes (correct guesses) in 900 attempts.
5.81. Poisson with µ = 2.768. (a) P(X = 0) = 0.0628. (b) P(X ≥ 3) = 1 – P(X ≤ 2) = 0.5229.

5.83. (a) µ = 50. (b) σ = √50 = 7.071. P(X > 60) = P(Z > 1.41) = 0.0793. Software gives 0.0786.

5.85. (a) With σ_x̄ = 0.09/√3 = 0.05196, x̄ has (approximately) a N(137 mg, 0.05196 mg) distribution. (b) P(x̄ ≥ 140) = P(Z ≥ 57.74), which is essentially 0.

5.87. (a) The distribution will be approximately Normal with mean µ_x̄ = 2.07 and standard deviation σ_x̄ = 2.11/√140 = 0.178. (b) P(x̄ < 2) = P(Z < −0.39) = 0.3483. (c) Yes, because n = 140 is large.
5.89. The probability that the first digit is 1, 2, or 3 is 0.301 + 0.176 + 0.125 = 0.602, so the number of
invoices for amounts beginning with these digits should have a binomial distribution with n = 1000 and p
= 0.602. More usefully, the proportion p̂ of such invoices should have approximately a Normal distribution with mean p = 0.602 and standard deviation √(p(1 − p)/1000) = 0.01548, so P(p̂ ≤ 575/1000 = 0.575) = P(Z ≤ −1.74) = 0.0409.
5.91. (a) µ X = np = 15(0.25) = 3.75. (b) P(X ≥ 10) = 0.000795. (c) P(X ≥ 540) = 0.0213 (0.0192 using the
Normal approximation).

5.93. (a) He needs $520/$35 = 14.857 (really 15) wins. (b) µ = 520(0.026) = 13.52 and σ = √(520(0.026)(1 − 0.026)) = 3.629. (c) Without the continuity correction, P(X ≥ 15) = P(Z ≥ 0.41) = 0.3409. With the continuity correction, we have P(X ≥ 14.5) = P(Z ≥ 0.27) = 0.3936. The continuity-corrected value is much closer to the exact binomial probability.
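Note: The comparison in (c) can be checked against the exact binomial probability, computable directly from the pmf; a minimal Python sketch:

    from math import comb, sqrt
    from statistics import NormalDist

    n, p = 520, 0.026
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    Z = NormalDist(mu, sigma)
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, n + 1))
    print(exact)            # exact P(X >= 15)
    print(1 - Z.cdf(15))    # Normal approximation, no continuity correction
    print(1 - Z.cdf(14.5))  # with continuity correction (closer to exact)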
5.95. (a) pˆ F is approximately N(0.82, 0.01921) and pˆ M is approximately N(0.88, 0.01625). (b) When we
subtract two independent Normal random variables, the difference is Normal. The new mean is the
difference of the two means (0.88 − 0.82 = 0.06), and the new variance is the sum of the variances
(0.000369 + 0.000264 = 0.000633), so p̂_M − p̂_F is approximately N(0.06, 0.02516). (c) P(p̂_F > p̂_M) = P(p̂_M − p̂_F < 0) = P(Z < −2.38) = 0.0087 (software gives 0.0085).

5.97. For each step of the random walk, the mean is µ = (1)(0.6) + (−1)(0.4) = 0.2, the variance is σ² = (1 − 0.2)²(0.6) + (−1 − 0.2)²(0.4) = 0.96, and the standard deviation is σ = √0.96 = 0.9798. Therefore, Y/500 has approximately a N(0.2, 0.04382) distribution, and P(Y ≥ 200) = P(Y/500 ≥ 0.4) = P(Z ≥ 4.56), which is essentially 0.
Note: The number R of right-steps has a binomial distribution with n = 500 and p = 0.6. Y ≥ 200
is equivalent to taking at least 350 right-steps, so we can also compute this probability as P(R ≥ 350), for
which software gives the exact value 0.00000215.
5.99. The Poisson distribution is not appropriate because the rate is not constant; it increases during the midnight-to-6 A.M. period.
5.101. Y has possible values 1, 2, 3, . . .. P(first one appears on toss k) = (5/6)^(k−1)(1/6) because tossing a 1 must be preceded by k − 1 "not ones."

Chapter 6 Introduction to Inference
6.1. σ_x̄ = σ/√28 = 2.10/√28 = $0.40.

6.3. As in the previous solution, take two standard deviations: $0.80.


Note: This is the whole idea behind a confidence interval: Probability tells us that x̄ is usually close to µ. That is equivalent to saying that µ is usually close to x̄.
6.5. (a) Student results will vary. In one simulation, 23/25 = 92% of the 80% confidence intervals
contained µ. (b) Student results will vary, but about 80% of their 80% confidence intervals should contain
µ.

6.7. The margin of error would be halved because n = 6375 is roughly four times n = 1593. All else being equal, the margin of error is proportional to 1/√n. x̄ ± z*(σ/√n) = 24,164 ± 1.96(8500/√6375) = 24,164 ± 208.66. The confidence interval would now be half as wide as in the previous exercise because the new margin of error is roughly half as large as the old margin of error.
6.9. n ≥ ((1.96)(3850)/500)² = 227.8; take n = 228. Note that you did not need the reported average salary; the margin of error for a confidence interval is based on the standard deviation, the confidence level, and the size of the sample.
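Note: The sample-size calculation generalizes; a minimal Python sketch:

    from math import ceil

    def n_for_margin(z_star, sigma, m):
        # smallest n with z* * sigma / sqrt(n) <= m
        return ceil((z_star * sigma / m) ** 2)

    print(n_for_margin(1.96, 3850, 500))  # 228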
6.11. Answers will vary. We may not get a response for a variety of reasons. Regardless, it is likely the
532 who responded are different from those who didn’t respond, so that our estimated margin of error is
not a good measure of accuracy.

6.13. The margins of error are (1.96)(20/√n), which yield 13.067, 7.84, 4.356, and 3.92, respectively.
(And, of course, all intervals are centered at 78.) Interval width decreases as sample size increases.

6.15. (a) She did not divide the standard deviation by √500 = 22.361. (b) Confidence intervals concern
the population mean, not the sample mean. (The value of the sample mean is known to be 8.6 and is the
center of the interval; it is the population mean that we do not know.) (c) 95% is a confidence level, not a
probability. Furthermore, it does not make sense to make probability statements about the population
mean μ, once a sample has been taken and a confidence interval computed (at that point, both the interval
and the unknown parameter are fixed, and the interval either contains the parameter, or it does not). (d)
The large sample size does not affect the distribution of individual alumni ratings (the population
distribution). The use of a Normal distribution is justified because the distribution of the sample mean is
approximately Normal when the sample is large, based on the central limit theorem.

6.17. (a) The margin of error is m = 1.96(2.8/√720) = 0.2045. The 95% confidence interval is (5.295, 5.704). (b) For 99% confidence, we have m = 2.576(2.8/√720) = 0.2688. The 99% confidence interval is (5.231, 5.769), which is wider than the 95% confidence interval.
6.19. For mean TRAP level, the margin of error is 2.29 U/l, and the 95% confidence interval for µ is 13.2 ± (1.96)(6.5/√31) = 13.2 ± 2.29 = 10.91 to 15.49 U/l.

6.21. Scenario A has a smaller margin of error. Both samples would have the same value of z* (1.96), but
the value of σ would likely be smaller for A because we might expect less variability in textbook cost for
freshman students than all students.

6.23. (a) The margin of error is m = 1.96(540/√2265) = 22.24. (b) To yield a margin of error of 15, we would need a larger sample than 2265. (c) n ≥ ((1.96)(540)/15)² = 4978.7, so we would need at least n = 4979.
6.25. (a) Larger; in order to reduce the standard deviation to 5 min (0.0833 hours), we need a larger sample. (b) We would need the standard deviation to be half of 5 min (0.0833 hours), which is 0.04167 hours. (c) n ≥ (1.24/0.04167)² = 885.5, so n = 886.

6.27. (a) The 95% confidence interval for the mean number of hours spent listening to the radio in a week is 11.5 ± (1.96)(8.3/√1200) = 11.5 ± 0.470 = 11.03 to 11.97 hours. (b) No. This is a range of values for the mean time spent, not for individual times. (See also the comment in the solution to Exercise 6.25.) (c)
The large sample size (n = 1200 students surveyed) allows us to use the central limit theorem to say x
has an approximately Normal distribution.
6.29. (a) We can be 95% confident but not certain. (The true population percentage is either in the
confidence interval or it isn’t.) (b) We obtained the interval 53.1% to 55.1% by a method that gives a
correct result (that is, includes the true value of what we are trying to estimate) 95% of the time. (c) For
95% confidence, the margin of error is about two standard deviations (that is, z* = 1.96), so
σ_estimate = 0.51%. (d) No, confidence intervals only account for random sampling error.

6.31. Multiply by (1.609 km / 1 mile)(1 gallon / 3.785 liters) = 0.4251 kpl per mpg. This gives x̄_kpl = 0.4251 x̄_mpg = 18.3515, with margin of error (1.96)(0.4251)(σ_mpg/√20) = 0.6521 kpl, so the 95% confidence interval is 17.6994 to 19.0036 kpl.
6.33. n ≥ ((1.96)(6.5)/1.5)² = 72.14; take n = 73.
6.35. No, this is not trustworthy. Because the numbers are based on voluntary response rather than an
SRS, the confidence interval methods of this chapter cannot be used; the interval does not apply to the
whole population.
6.37. (a) The probability that all five cover their means is 0.95^5 = 0.7738 (or use Table C for a binomial distribution with n = 5, p = 0.05, and k = 0). (b) The probability is now 0.99^5 = 0.9510. (c) C = (0.95)^(1/10) = 0.99488, or about 99.5%.
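Note: A quick check of the arithmetic for all three parts:

    print(0.95 ** 5)         # (a) about 0.7738
    print(0.99 ** 5)         # (b) about 0.9510
    print(0.95 ** (1 / 10))  # (c) about 0.99488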
6.39. If μ is the mean DXA reading for the phantom, we test H0: μ = 1.4 g/cm2 versus Ha: μ ≠ 1.4 g/cm2.
We use the "not equal" alternative here because an "inaccurate" reading could deviate in either direction.
6.41. P(Z < −1.57) = 0.0582, so the two-sided P-value is 2(0.0582) = 0.1164.

6.43. (a) For P-value = 0.05, the value of z is 1.645. (b) For a one-sided alternative (on the positive side),
z is statistically significant at α = 0.05 if z > 1.645.

6.45. (a) z = (33.5 − 30)/(14/√49) = 1.75. (b) For a one-sided alternative, P-value = P(Z > 1.75) = 0.0401. (c) For a two-sided alternative, P-value = 0.0802.
two-sided alternative, P-value = 0.0802.
6.47. (a) No; because 0.037 is less than 0.05, we reject H0 so that 30 falls outside the 95% confidence
interval. (b) Yes; because 0.037 is greater than 0.01, we fail to reject H0 so that 30 is inside the 99%
confidence interval.
6.49. (a) Yes. (b) No. (c) Because 0.033 ≤ 0.05, we reject H0. Because 0.033 > 0.01, we do not reject H0.
6.51. (a) One of the one-sided P-values is half as big as the two-sided P-value (0.038); the other is 1 − 0.038 = 0.962. (b) Suppose the null hypothesis is H0: µ = µ0. The smaller P-value (0.038) goes with the one-sided alternative that is consistent with the observed data; for example, if x̄ > µ0, then P = 0.038 for the alternative µ > µ0.

6.53. (a) Hypotheses should be stated in terms of the population mean, not the sample mean. (b) The null
hypothesis H0 should be that there is no change (μ = 21.2). (c) A small P-value is needed for significance;
P = 0.98 gives no reason to reject H0. (d) We compare the P-value, not the z-statistic, to α. (In this case,
such a small value of z would have a very large P-value—close to 0.5 for a one-sided alternative or close
to 1 for a two-sided alternative.)
6.55. (a) If μ is the mean score for the population of placement-test students, then we test H0: μ = 77
versus Ha: μ ≠ 77 because we have no prior belief about whether placement-test students will do better or
worse (we just wanted to know if they differ). (b) If μ is the mean time to complete the maze with rap
music playing, then we test H0: μ = 20 seconds versus Ha: μ > 20 seconds because we believe rap music
will make the mice finish more slowly. (c) If μ is the mean area of the apartments, we test H0: μ = 880 ft2
versus Ha: μ < 880 ft2 because we suspect the apartments are smaller than advertised.
6.57. (a) H0: μ = $42,800 versus Ha: μ > $42,800, where μ is the mean household income of mall
shoppers. (b) H0: μ = 0.4 hr versus Ha: μ ≠ 0.4 hr, where μ is this year’s mean response time.
6.59. (a) For Ha: μ > μ0, the P-value is P(Z > −1.33) = 0.9082. (b) For Ha: μ < μ0, the P-value is P(Z <
−1.33) = 0.0918. (c) For Ha: μ ≠ μ0, the P-value is 2P(Z < −1.33) = 2(0.0918) = 0.1836.

6.61. The center of the given confidence interval is x̄ = ($46,382 + $48,008)/2 = $47,195 (the average from the sample). The margin of error for the given confidence interval is $48,008 − x̄ = $813, which means σ_x̄ = 813/1.96 = 414.8. Using this information, we have H0: µ = $48,127 versus Ha: µ ≠ $48,127. z = (47,195 − 48,127)/414.8 = −2.25. P-value = 2P(Z < −2.25) = 2(0.0122) = 0.0244. We fail to reject H0 at the 1% level. The data do not provide evidence, at the 1% level, that the average income of graduates from your institution is different from the national average.
6.63. P = 0.09 means there is some evidence for the wage decrease, but it is not significant at the α = 0.05 level. Specifically, the researchers observed that average wages for peer-driven students were 13% lower
than average wages for ability-driven students, but (when considering overall variation in wages) such a
difference might arise by chance 9% of the time, even if student motivation had no effect on wages.
6.65. Even if the two groups (the health and safety class and the statistics class) had the same level of
alcohol awareness, there might be some difference in our sample due to chance. The difference observed
was large enough that it would rarely arise by chance. The reason for this difference might be that health
issues related to alcohol use are probably discussed in the health and safety class.
6.67. Because the difference for public school students was statistically significant, we can say the mean
score for them increased. The difference for private school students was not significant; that does not
mean that it didn’t increase, but rather that it didn’t increase enough to be called significant.
6.69. If μ is the mean difference between the two groups of children, we test H0: μ = 0 versus Ha: μ ≠ 0
(we only want to know if the difference is significant). The test statistic is z = (5.8 − 0)/1.4 = 4.14; P-value = 0.00003 from software. Our data provide very strong evidence that those subjects classified as having
low sleep efficiency had an average systolic blood pressure that is different than that of other adolescents.
Note: The exercise reports the standard deviation of the mean difference, rather than the sample standard deviation; that is, the reported value has already been divided by √238.
6.71. If μ is the mean east–west location, the hypotheses are H0: μ = 100 versus Ha: μ ≠ 100 (as in the previous exercise). For testing these hypotheses, we find z = (113.8 − 100)/(58/√584) = 5.75, P-value < 0.0001. We
reject H0. There is strong evidence that the trees are not uniformly spread from east to west.

6.73. (a) z = (103.3 − 95)/(30/√25) = 1.38, so the P-value = P(Z > 1.38) = 0.0838. Using α = 0.05, we fail to reject H0. There is not enough evidence to show that the older students have a higher SSHA mean. (b) The
There is not enough evidence to show that the older students have a higher SSHA mean. (b) The
important assumption is that this is an SRS from the population of older students. We also assume a
Normal distribution, but this is not crucial provided there are no outliers and little skewness.
6.75. (a) H0: μ = 0 mpg versus Ha: μ ≠ 0 mpg, where μ is the mean difference. (b) The mean of the 20 differences is x̄ = 2.73, so z = (2.73 − 0)/(3/√20) = 4.07, for which P-value < 0.0001. We reject H0. We conclude
that μ ≠ 0 mpg; that is, we have strong evidence that the computer’s reported fuel efficiency differs from
the driver’s computed values.

6.77. See the sample screen (for x̄ = 1) below. As one can judge from the shading under the Normal curve, it would take a value of at least about 0.7 to be significant, which is higher than in the previous exercise. (In fact, the cutoff is about 0.7355, which is approximately 2.326/√10.) Smaller α means that x̄ must be farther away from µ0 in order to reject H0.

6.79. With sample size n = 50, sample means greater than 0.3 are statistically significant. See the results
below for x = 0.1.

6.81. The P-value doubles, from the previous exercise, for each x̄ value because our P-value now represents two tail areas. Still, as x̄ moves farther away from µ0, the P-value gets smaller.

x̄     P-value
0.1   0.7518
0.2   0.5271
0.3   0.3428
0.4   0.2059
0.5   0.1139
0.6   0.0578
0.7   0.0269
0.8   0.0114
0.9   0.0044
1.0   0.0016

6.83. When a test is significant at the 1% level, it means that, if the null hypothesis were true, outcomes
similar to those seen are expected to occur fewer than 1 time in 100 repetitions of the experiment or
sampling. “Significant at the 5% level” means we have observed something that occurs in fewer than five
out of 100 repetitions (when H0 is true). Something that occurs “fewer than one time in 100 repetitions”
must also occur “fewer than five times in 100 repetitions,” so significance at the 1% level guarantees
significance at the 5% level (or any higher level). In mathematical terms, P-value < 0.01 means P-value <
0.05 as well.
6.85. Using Table D or software, we find that the 0.005 critical value is 2.576, and the 0.0025 critical
value is 2.807. Therefore, if 2.576 < |z| < 2.807—that is, either 2.576 < z < 2.807 or −2.807 < z <
−2.576—then z would be significant at the 1% level, but not at the 0.5% level with a two-sided test.
6.87. Because 0.674 < 0.72 < 0.841, the two-sided P-value is between 2(0.20) = 0.40 and 2(0.25) = 0.50. (Software tells us that P-value = 0.4694.)
6.89. Because the alternative is two-sided, the answer for z = −1.88 is the same as for z = 1.88. Because 1.645 < 1.88 < 1.960 (equivalently, −1.960 < −1.88 < −1.645), Table D says that 0.05 < P-value < 0.10, and Table A gives P-value = 2(0.0301) = 0.0602. Note there is no difference in the P-values for the two-sided alternative whether the sign on the test statistic is positive or negative.
6.91. In order to determine the effectiveness of alarm systems, we need to know the percent of all homes
with alarm systems and the percent of burglarized homes with alarm systems. For example, if only 10%
of all homes have alarm systems, then we should compare the proportion of burglarized homes with alarm
systems to 10%, not 50%. An alternate (but rather impractical) method would be to sample homes and
classify them according to whether or not they had an alarm system and also by whether or not they had
experienced a break-in at some point in the recent past. This would likely require a very large sample in
order to get a sufficiently large count of homes that had experienced break-ins.
6.93. The first test was barely significant at α = 0.05, while the second was significant at any reasonable
α.
6.95. A significance test answers only question (b). The P-value states how likely the observed effect (or
a stronger one) is if H0 is true, and chance alone accounts for deviations from what we expect. The
observed effect may be significant (very unlikely to be due to chance) and yet not be of practical
importance. And the calculation leading to significance assumes a properly designed study.
6.97. (a) If SES had no effect on LSAT results, there would still be some difference in scores due to
chance variation. “Statistically insignificant” means that the observed difference was no more than we
might expect from that chance variation. (b) If the results are based on a small sample, then, even if the
null hypothesis were not true, the test might not be sensitive enough to detect the effect. Knowing the
effects were small tells us that the statistically insignificant test result did not occur merely because of a
small sample size.

6.99. In each case, we find the test statistic z by dividing the observed difference (2453.7 − 2403.7 = 50 kcal/day) by 880/√n. (a) For n = 100, z = 0.57, so P = P(Z > 0.57) = 0.2843. (b) For n = 500, z = 1.27,
so P = P(Z > 1.27) = 0.1020. (c) For n = 2500, z = 2.84, so P = P(Z > 2.84) = 0.0023.
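Note: The pattern in (a)-(c), the same observed difference becoming more significant as n grows, is driven by the √n in the denominator; a minimal Python sketch:

    from math import sqrt
    from statistics import NormalDist

    Z = NormalDist()
    diff, sigma = 50, 880
    for n in (100, 500, 2500):
        z = diff / (sigma / sqrt(n))
        print(n, round(z, 2), round(1 - Z.cdf(z), 4))  # compare with (a)-(c)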
6.101. We expect more variation with small sample sizes than with large sample sizes, so even a large
difference between x and μ0 (or whatever measures are appropriate in our hypothesis test) might not turn
out to be significant. If we were to repeat the test with a larger sample, the decrease in the standard error
might give us a small enough P-value to reject H0.
6.103. This exercise asks students to create their own examples. Answers will vary.
6.105. Answers will vary.
6.107. P-value = 0.00001 = 1/100,000, so we would need n = 100,000 tests in order to expect one P-value
of this size (assuming that all null hypotheses are true). That is why we reject H0 when we see P-values
such as this: It indicates that our results would very rarely happen if H0 were true.
6.109. Using α/12 = 0.05/12 = 0.004167 as the cutoff, we reject the fifth (P-value = 0.001) and eleventh
(P-value < 0.002) tests.
6.111. The one with the larger sample size will have more power. More data will provide more
information, which should give us a better chance of finding differences between the data and the
hypothesis.
6.113. The power for μ = 35 will be higher than 0.73 because larger differences are easier to detect.

6.115. (a) Changing from the one-sided to the two-sided alternative decreases power (because more evidence is required to reject H0; consider that the critical value for a one-sided Ha is 1.645, while it is 1.96 for a two-sided Ha). (b) Decreasing σ increases power (there is less variability in the population, so
changes are easier to detect). (c) Power increases (larger sample sizes have narrower sampling
distributions and hence more power to detect change).
6.117. The applet reports the power as 0.986.
6.119. (a) It is better to overestimate σ somewhat because our particular random sample might have
shown a smaller s than the reality. In cases where the actual σ is much smaller than 2, we will simply end
up with more power (this is good: safer than underestimating and ending up with not enough power).

(b) For the alternative Ha: μ > 4, we reject H0 at the 5% significance level if z = (x̄ − 4)/(2/√100) > 1.645,
which happens when x̄ > 4.329. (c) When μ = 4.25, the probability of rejecting H0 is P(x̄ > 4.329) =
P(Z > (4.329 − 4.25)/(2/√100)) = P(Z > 0.395) = 0.3446 (0.3464 from software). (d) The power of this
test is not up to the 80% standard suggested in the text; he should collect a larger sample. (e) The needed
sample size is 396.
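A sketch of this power computation in Python (scipy assumed; this mirrors the Normal-theory calculation above, not any particular package's power routine):

from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 4, 2, 100
zstar = norm.isf(0.05)                      # 1.645 for the one-sided 5% test
cutoff = mu0 + zstar * sigma / sqrt(n)      # reject H0 when x-bar > cutoff (about 4.329)

mu_alt = 4.25
power = norm.sf((cutoff - mu_alt) / (sigma / sqrt(n)))
print(cutoff, power)                        # power about 0.346

# part (e): solve (mu_alt - mu0)/(sigma/sqrt(n)) = z_alpha + z_beta for n
n_needed = ((zstar + norm.isf(0.20)) * sigma / (mu_alt - mu0)) ** 2
print(n_needed)                             # about 395.7; round up to 396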

6.121. We reject H0 when z > 2.326, which is equivalent to x̄ > 485 + 2.326(100/√500) = 495.4, so the
power against μ = 499 is P(reject H0 when μ = 499) = P(x̄ > 495.4 when μ = 499) =
P(Z > (495.4 − 499)/(100/√500)) = P(Z > −0.80) = 0.7881. This is only slightly less than the “80%
power” standard; a slightly larger sample would suffice to ensure an adequate detection rate.
6.123. (a) The hypotheses are “subject should go to college” and “subject should join work force.” The
two types of errors are recommending someone go to college when (s)he is better suited for the work
force and recommending the work force for someone who should go to college. (b) In significance
testing, we typically wish to decrease the probability of wrongly rejecting H0 (that is, we want α to be
small); the answer to this question depends on which hypothesis is viewed as H0.
Note: For part (a), there is no clear choice for which should be the null hypothesis. In the past,
when fewer people went to college, one might have chosen “work force” as H0—that is, one might have
said, “we’ll assume this student will join the work force unless we are convinced otherwise.” Presently,
roughly two-thirds of graduates attend college, which might suggest H0 should be “college.”
6.125. From the description, we might surmise that we had two (or more) groups of students—say, an
exercise group and a control (or no-exercise) group. (a) For example, if μ is the mean difference in scores
between the two groups, we might test H0: μ = 0 versus Ha: μ ≠ 0. (Assuming we had no prior suspicion
about the effect of exercise, the alternative should be two-sided.) (b) With P-value = 0.13, we would not
reject H0. In plain language: The results observed do not differ greatly from what we would expect if
exercise had no effect on exam scores. (c) For example: Was this an experiment? What was the design?
How big were the samples?

6.127. (a – b) The confidence intervals were computed as x̄ ± 1.96(s/√n) and are shown in the table
below. (c) Because the confidence interval for boys is entirely above the confidence interval for girls for
each food intake, we could conclude that boys consume more of each, on average.

Boys Girls
Energy (kJ) 2399.9 to 2496.1 2130.7 to 2209.3
Protein (g) 24.00 to 25.00 21.66 to 22.54
Calcium (mg) 315.33 to 332.87 257.70 to 272.30

6.129. A sample screenshot and example plot are not shown but would be similar to those shown above
for Exercise 6.126. Most students (99.4% of them) should find that their final proportion is between 0.84
and 0.96; 85% will have a proportion between 0.87 and 0.93.
6.131. Because there is nonresponse, the accuracy is in question regardless of the small margin of error.
There is no guarantee the respondents are similar to the nonrespondents.

6.133. (a) x̄ = 5.33 mg/dl, so x̄ ± 1.960(0.9/√6) gives 4.61 to 6.05 mg/dl. (b) To test H0: μ = 4.8 mg/dl
versus Ha: μ > 4.8 mg/dl, we compute z = (5.33 − 4.8)/(0.9/√6) = 1.45 and P-value = 0.0735. We fail to
reject H0. There is not sufficient evidence that this patient has a mean phosphorus level that exceeds 4.8.
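A minimal Python check of this z interval and test (scipy assumed; x̄ = 5.33 is taken from the solution above):

from math import sqrt
from scipy.stats import norm

n, xbar, sigma, mu0 = 6, 5.33, 0.9, 4.8
z = (xbar - mu0) / (sigma / sqrt(n))
print(z, norm.sf(z))                 # z about 1.45, one-sided P about 0.07
m = 1.96 * sigma / sqrt(n)
print(xbar - m, xbar + m)            # about (4.61, 6.05)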

6.135. (a) The distribution is roughly symmetric; a stemplot is shown below. (b) 30.4 ± 1.96(7/√10) =
(26.06, 34.74). (c) H0: µ = 25, Ha: µ > 25. z = (30.4 − 25)/(7/√10) = 2.44, P-value = 0.0073. The mean
odor threshold for the beginning students is higher than the published threshold of 25.

2 034
3 011246
4 3

6.137. (a) Under H0, x̄ has a N(0%, 55%/√104) = N(0%, 5.3932%) distribution. (b)
z = (6.9 − 0)/(55/√104) = 1.28, so P = P(Z > 1.28) = 0.1003. (c) This is not significant at α = 0.05. There
is not enough evidence to say that compensation increased.
6.139. Yes. That’s the heart of why we care about statistical significance. Significance tests allow us to
discriminate between random differences (“chance variation”) that might occur when the null hypothesis
is true and differences that are unlikely to occur when H0 is true.

6.141. For each sample, find x̄, then take x̄ ± (1.96)(5/√15) = x̄ ± 2.53. We “expect” to see that 95 of
the 100 intervals will include 20 (the true value of μ); binomial computations show that (about 99% of the
time) 90 or more of the 100 intervals will include 20.

6.143. For each sample, find x̄, then compute z = (x̄ − 24)/(5/√15). Choose a significance level α and the
appropriate cutoff point z*; for example, with α = 0.05, reject H0 if |z| > 1.96. Because the true mean is
20, Z = (x̄ − 20)/(5/√15) has a N(0, 1) distribution, so the probability that we will fail to reject H0 is
P(−z* < (x̄ − 24)/(5/√15) < z*) = P(−z* < Z − 4/(5/√15) < z*) = P(3.0984 − z* < Z < 3.0984 + z*);
here, we have adjusted for the difference between the true mean and the value of the mean in H0. If α =
0.05 (z* = 1.96), this probability is P(1.14 < Z < 5.06) = 0.1271. Thus, we “expect” to (wrongly) fail to
reject H0 only about 12.71% of the time and correctly reject H0 about 87.29% of the time.
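Because the sign of the adjustment is easy to get backwards here, a quick simulation check in Python (numpy assumed; the N(20, 5) population, n = 15, and H0: μ = 24 are from the exercise) confirms the rates:

import numpy as np

rng = np.random.default_rng(1)
reps, n, mu_true, mu0, sigma = 100_000, 15, 20, 24, 5

xbar = rng.normal(mu_true, sigma, size=(reps, n)).mean(axis=1)
z = (xbar - mu0) / (sigma / np.sqrt(n))
print(np.mean(np.abs(z) < 1.96))     # proportion failing to reject H0; near 0.127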
6.145. Answers will vary.

Chapter 7 Inference for Means
7.1. (a) The standard error of the mean is s/√n = $180/√16 = $45. (b) The degrees of freedom are df =
n − 1 = 15.

7.3. The 95% confidence interval for μ is $766 ± (2.131)($180/√16) = ($670.105, $861.895).
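A one-line check of this interval in Python (scipy assumed):

from math import sqrt
from scipy.stats import t

n, xbar, s = 16, 766, 180
me = t.ppf(0.975, n - 1) * s / sqrt(n)   # t* = 2.131, margin of error about 95.9
print(xbar - me, xbar + me)              # about (670.1, 861.9)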

7.5. (a) Using df = n – 1 = 22, 0.04 < P-value < 0.05, which is significant at the 5% level. (b) Using df =
n – 1 = 8, 0.05 < P-value < 0.10, which is not significant at the 5% level. (c) Shown below.

7.7. From software, the 95% confidence interval is (2.0817, 26.9183).

Mean      95% CL Mean         Std Dev   95% CL Std Dev
14.5000   2.0817 to 26.9183   14.8541   9.8211 to 30.2320

7.9. A paired t test is appropriate because the number of receptivity displays was recorded twice for each
female when exposed to the two different videos.
7.11. Taking Hot Oil values minus the Oil Free values, x̄ = −1.6, s = 2.9965. Using df = n − 1 = 5 − 1 = 4,
t* = 2.776. The 95% confidence interval is −1.6 ± 2.776(2.9965/√5) = (−5.32, 2.12).
7.13. Although the sample size is quite large, n = 518, there are potential outliers, so we should not use t
procedures with these data.
7.15. (a) df = 14, t* = 2.145. (b) df = 27, t* = 2.052. (c) df = 27, t* = 1.703. (d) As sample size increases
(comparing a and b), the margin of error decreases. As confidence increases (comparing c and b), the
margin of error increases.
7.17. The 5% critical value for a t distribution with df = 23 is 1.714. Only one of the one-sided options
(reject H 0 when t > 1.714) is shown; the other is simply the mirror image of this sketch (shade the area to
the left of −1.714; reject when t < −1.714).

7.19. Yes; because we want µ > 0, we need the area to the right. This would give a P-value = 0.9625.

7.21. (a) df = n – 1 = 12. (b) 2.681 < t < 3.055. (c) Because the alternative is two-sided, we double the
upper-tail probability to find the P-value: 0.01 < P-value < 0.02. (d) t = 2.78 is significant at the 5% level
but not at the 1% level. (e) From software, P-value = 0.0167.
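The software value in part (e) can be reproduced in Python (scipy assumed):

from scipy.stats import t

print(2 * t.sf(2.78, 12))   # two-sided P-value, about 0.0167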
7.23. Let P be the given (two-sided) P-value. Suppose that the alternative is μ > μ0; then the one-sided
P-value is the area to the right. If x̄ is greater than μ0 and t is positive, then the one-sided P-value is
simply P/2. However, if x̄ is less than μ0 and t is negative, then P-value = area to the right = 1 − P/2.
Similarly, suppose the alternative is μ < μ0; then the one-sided P-value is the area to the left. So if x̄
is less than μ0 and t is negative, then the one-sided P-value = P/2, but if x̄ is greater than μ0 and t is
positive, then the one-sided P-value = 1 − P/2.

7.25. x̄ = 28.1563, s = 1.4841. df = 15, so t* = 2.131. The 95% confidence interval is
28.1563 ± 2.131(1.4841/√16) = (27.37, 28.95).
7.27. (a) The histogram below shows the data are approximately Normal, so t procedures are appropriate.
(b) The 95% confidence interval is 36.157 ± 2.603. (c) (33.55, 38.76). (d) Inference from the confidence

interval only applies to the mean, not the median. Other tools would need to be used for inference on the
median.

7.29. (a) If μ is the mean alcohol content by volume, we wish to test H0: μ = 4.7% versus Ha: μ ≠ 4.7%.
From these data, we have x̄ = 4.9767 and s = 0.03215. The test statistic is t = (4.9767 − 4.7)/(0.03215/√3)
= 14.907. With df = 2, we have 0.002 < P-value < 0.005 (software gives 0.0045). These data indicate the
mean alcohol content of Budweiser beer is not 4.7%. (b) The 95% confidence interval is 4.9767 ±
4.303(0.03215/√3) = (4.8968%, 5.0566%). Note that this interval is entirely above 4.7%. (c) For the cans
and bottles to be within 0.3% of the advertised level, they need to be between 4.7% and 5.3%. Because
our confidence interval is entirely contained in this range, it appears that Budweiser is within the
standards.
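A sketch of both the test and the interval in Python from the summary statistics (scipy assumed):

from math import sqrt
from scipy.stats import t

n, xbar, s, mu0 = 3, 4.9767, 0.03215, 4.7
se = s / sqrt(n)
tstat = (xbar - mu0) / se
print(tstat, 2 * t.sf(abs(tstat), n - 1))    # about 14.9, P about 0.0045
tstar = t.ppf(0.975, n - 1)                  # 4.303
print(xbar - tstar * se, xbar + tstar * se)  # about (4.897%, 5.057%)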
7.31. (a) If μ is the mean number of uses a person can produce in 5 minutes after witnessing rudeness, we
wish to test H0: μ = 10 versus Ha: μ < 10. (b) t = (7.88 − 10)/(2.35/√34) = −5.2603, with df = 33, for
which P-value < 0.0005. This is strong evidence that witnessing rudeness decreases performance.
7.33. (a) The histogram, boxplot, and Normal quantile plot reveal that the distribution has two peaks, so
the distribution is not Normal. The five-number summary is 2.2, 10.95, 28.5, 41.9, 69.3. (b) Maybe. We
have a large enough sample to overcome the non-Normal distribution, but we are sampling from a small
population. (c) x̄ = 27.29, s = 17.7058. df = 39, so using 30 we have t* = 2.042, and the 95% confidence
interval is 27.29 ± 2.042(17.7058/√40) = (21.57, 33.01). Software gives (21.63, 32.95). (d) One
could argue for either answer. We chose a random sample from this tract, so the main question is, can we
view trees in this tract as being representative of similar trees in the same area?

7.35. (a) R/6 = (95 − 0)/6 = 15.833. (b) df = 2499; using 1000, we have t* = 1.962. The 95% confidence
interval is 2.55 ± 1.962(15.833/√2500) = (1.93, 3.17). (c) The large number of observations eliminates
any concern about skewness, but the outliers pose a risk for using t procedures.

7.37. We wish to test H0: μ = 45 versus Ha: μ > 45. t = (52.98 − 45)/(10.34/√50) = 5.457. Using df = 49
(use 40), P-value < 0.0005. These data show that parents of ADHD children have, on average,
significantly elevated stress scores.
7.39. (a) We wish to test H0: μ = 0 versus Ha: μ ≠ 0, where μ is the mean change in NEAT.
t = (328 − 0)/(256/√16) = 5.125 with df = 15, for which P-value = 0.00012. There is strong evidence of a
change in NEAT. (b) With t* = 2.131, the 95% confidence interval for the change in NEAT activity is
191.6 to 464.4 kcal/day. This tells us how much of the additional calories might have been burned by the
increase in NEAT: It consumed 19% to 46% of the extra 1000 kcal/day.
7.41. (a) H0: μ = 0 versus Ha: μ ≠ 0. (b) With mean difference x̄ = 2.73 and standard deviation s =
2.8015, the test statistic is t = (2.73 − 0)/(2.8015/√20) = 4.358 with df = 19, for which P-value < 0.001
(software gives 0.0003). We have strong evidence that the results of the two computations are different.

7.43. (a) To test H0: μ = 925 picks versus Ha: μ > 925 picks, we have t = (938.2 − 925)/(24.2971/√36) =
3.27 with df = 35, for which P-value = 0.0012. (b) For H0: μ = 935 picks versus Ha: μ > 935 picks, we
have t = (938.2 − 935)/(24.2971/√36) = 0.80, again with df = 35, for which P-value = 0.2146. (c) The
90% confidence interval from the previous exercise was 931.3 to 945.0 picks, which includes 935 but not
925. For a test of H0: μ = μ0 versus Ha: μ ≠ μ0, we know that P-value < 0.10 for values of μ0 outside the
interval and P-value > 0.10 if μ0 is inside the interval. The one-sided P-value would be half of the
two-sided P-value.

7.45. (a) The differences are spread from −0.018 to 0.020 g, with mean x̄ = −0.0015 and standard
deviation s = 0.0122 g. The sample is too small for a stemplot to support definitive judgments about
skewness or symmetry, though there may be an outlier; a Normal quantile plot suggests the data are
approximately Normal, so t methods are appropriate. (b) For H0: μ = 0 versus Ha: μ ≠ 0, we find
t = (−0.0015 − 0)/(0.0122/√8) = −0.347 with df = 7, for which P-value = 0.7388. We cannot reject H0
based on this sample; the two operators are not significantly different in their results. (c) The 95%
confidence interval for μ is −0.0015 ± 2.365(0.0122/√8) = (−0.0117, 0.0087). (d) The subjects from this
sample may be representative of future subjects, but the test results and confidence interval are suspect
because this is not a random sample.
7.47. (a) We test H0: μ = 0 versus Ha: μ > 0, where μ is the mean change in score (that is, the mean
improvement, which is expected to increase). (b) The distribution is left-skewed, with mean x̄ = 2.5 and
s = 2.8928. (c) t = (2.5 − 0)/(2.8928/√20) = 3.8649, df = 19, and P-value = 0.0005; there is strong
evidence of improvement in listening test scores. (d) With df = 19, we have t* = 2.093, so the 95%
confidence interval is 2.5 ± 2.093(2.8928/√20) = 1.15 to 3.85.

7.49. The 95% confidence interval is 46.32 − 32.85 ± 2.160√(11.53²/16 + 15.33²/14) = (2.65, 24.29).
With the smaller sample sizes, the confidence interval is wider.
7.51. H0: μ = 5 versus Ha: μ > 5. The confidence interval is (5.48, 21.46). Because 5 is not in this interval,
we reject H0. The data show evidence that the improvement is greater than 5 points.
7.53. SPSS and SAS give both results (the SAS output refers to the unpooled result as the Satterthwaite
method), while JMP and Excel show only the unpooled procedures. The pooled t statistic is 2.279, for
which P-value = 0.052.
Note: When the sample sizes are equal—as in this case—the pooled and unpooled t statistics are
equal. (See the next exercise.) Both Excel and JMP refer to the unpooled test with the slightly misleading
phrase “assuming unequal variances.” The SAS output also implies that the variances are unequal for
this method. In fact, unpooled procedures make no assumptions about the variances. Finally, note that
both Excel and JMP can do pooled procedures as well as the unpooled procedures that are shown.

7.55. (a) Hypotheses should involve µ1 and µ2 (population means) rather than x̄1 and x̄2 (sample means).
(b) The samples are not independent; we would need to compare the 56 males to the 44 females. (c) We
need the P-value to be small (for example, less than 0.10) to reject H0. A large P-value like this gives no
reason to doubt H0. (d) Assuming the researcher computed the t statistic using x̄1 − x̄2, a positive value
of t does not support Ha. (The one-sided P-value would be 0.982, not 0.018.)
7.57. (a) We cannot reject H0: µ1 = µ2 in favor of the two-sided alternative at the 5% level because 0.05
< P-value < 0.10 (0.0771 from software). (b) We could reject H0 in favor of Ha: µ1 < µ2. A negative t
statistic means that x̄1 < x̄2, which supports the claim that µ1 < µ2, and the one-sided P-value would be
half of its value from part (a): 0.025 < P-value < 0.05 (0.0386 from software).
7.59. (a) Both distributions are Normally distributed, except the low-intensity class has a low outlier. (b)
H 0 : µ H = µ L , H α : µ H ≠ µ L . t = 5.30. df = 14. P-value < 0.001 (<0.0001 from software). The data are
significant at the 5% level, and there is evidence the noise levels are different between the high- and low-
intensity fitness classes. (c) Because the low-intensity class has an outlier, the t test is not appropriate. (d)
t = 6.31. df = 13. P-value < 0.001 (<0.0001 from software). Removing the outlier did not change the
results. (e) Because the outlier is not affecting the results, it is probably okay to report both tests. It would
be a good idea to investigate the outlier and see why it had such a low decibel value; if it were drastically

different in some way, it might be good to remove it and only report the test without it after mentioning
its removal.
7.61. (a) The t procedure is robust. Because n 1 + n 2 ≥ 40, we can use the t procedures on skewed data. (b)
H 0 : µ days = µ month , H α : µ days ≠ µ month . t = –2.42. df = 89. 0.01 < P-value < 0.02. The data are significant at
the 5% level, and there is evidence the means of the two groups are different. Those who are told 30/31
days have a smaller expectation interval on average than those who are told 1 month.

7.63. H0: µBrown = µBlue, Ha: µBrown > µBlue. t = (0.55 − (−0.38))/√(1.68²/40 + 1.53²/40) = 2.59.
df = 39. 0.005 < P-value < 0.01 (0.0058 from software). The data show that brown-eyed students appear
more trustworthy compared with their blue-eyed counterparts.
7.65. The nonresponse rate was (3866 – 1839)/3866 = 0.5243, or about 52.4% in spite of the incentive of
the gift card raffle and repeated email reminders. We do not know if these students were not Facebook
users, did not want to disclose their time spent on Facebook, their GPAs or other academic information,
or just ignored the survey. Because of this, the results should be viewed with caution.
7.67. (a) As shown in the histograms, the data are not Normally distributed. For the Neutral group, price
is heavily right-skewed. For the Sad group, price is somewhat uniform. Because neither distribution is
strongly skewed or has outliers, t procedures are still appropriate. (b) Table shown below. (c) H0: µN =
µS, Ha: µN ≠ µS. (d) t = (0.5714 − 2.1176)/√(0.73²/14 + 1.24²/17) = −4.31. df = 14 − 1 = 13, P-value <
0.001 (0.0002 from software). The data show evidence of a significant difference between the groups in
purchase price. The Sad group spent more than the Neutral group. (e) The 95% confidence interval is
0.5714 − 2.1176 ± 2.160√(0.73²/14 + 1.24²/17) = (−2.32, −0.77). Software gives (−2.284, −0.808).

Group   N    Price Mean   Price Std Dev
N       14   0.571429     0.730046
S       17   2.117647     1.244104
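A sketch of the two-sample computation in Python from the summary statistics above (scipy assumed; the conservative df = min(n1, n2) − 1 = 13 matches the solution):

from math import sqrt
from scipy.stats import t

n1, m1, s1 = 14, 0.5714, 0.73    # Neutral group
n2, m2, s2 = 17, 2.1176, 1.24    # Sad group

se = sqrt(s1**2 / n1 + s2**2 / n2)
tstat = (m1 - m2) / se                  # about -4.31
df = min(n1, n2) - 1                    # conservative choice, 13
print(tstat, 2 * t.sf(abs(tstat), df))  # two-sided P < 0.001
tstar = t.ppf(0.975, df)                # 2.160
d = m1 - m2
print(d - tstar * se, d + tstar * se)   # about (-2.32, -0.77)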

7.69. (a) The problem with averages on rating is that there is no guarantee the differences between ratings
are equal, so that going from a rating of 1 to 2, 2 to 3, etc., are equal. Taking averages assumes this, so it
is likely not appropriate. (b) The data are ratings from 1–5; as such, they certainly will not be Normally
distributed, but because n 1 + n 2 ≥ 40 and outliers are not possible, the t procedures can be used. (c)
McDonald’s: x̄ = 3.9937, s = 0.8930; Taco Bell: x̄ = 4.2208, s = 0.7331. (d) H0: µM = µT, Ha: µM ≠ µT.
t = −3.48. df = 307. P-value < 0.001 (0.0005 from software). The data are significant at the 5% level, and
there is evidence that the average customer ratings of the two chains differ.
7.71. (a) Assuming we have SRSs from each population, this seems reasonable. (b) H 0 : µ Early = µ Late , H α :
µ Early ≠ µ Late . (c) SE D = 1.0534, t = 1.614, df = 199, P-value = 0.1081 (0.1075 from software). We fail to
detect a difference in mean fat consumption for the two groups. (d) The 95% confidence interval for the
difference in mean fat consumption is (−0.39, 3.79). Software gives (–0.372, 3.772). We note that the
intervals contain 0; these support the results of the test.

7.73. To find a confidence interval (x̄1 − x̄2) ± t*SE_D, we need one of the following:

• Sample sizes and standard deviations, in which case we could find the interval in the usual way
• t and df, because t = (x̄1 − x̄2)/SE_D, so we could compute SE_D and use df to find t*
• df and a more accurate P-value, from which we could determine t and then proceed as above

The confidence interval could give us useful information about the magnitude of the difference (although,
with such a small P-value, we do know that a 95% confidence interval would not include 0).
7.75. This is a matched pairs design; for example, Monday hits are (at least potentially) not independent
of one another. The correct approach would be to use one-sample t methods on the seven differences
(Monday hits for design 1 minus Monday hits for design 2, Tuesday/1 minus Tuesday/2, and so on).
7.77. There could be things that are similar about the next eight employees who need new computers as
well as the following eight, which could bias the results (like being from the same office or department).
7.79. (a) Stemplots and boxplots are shown on the right. The north distribution is right-skewed, while the
south distribution is left-skewed. (b) The methods of this section seem to be appropriate in spite of the
skewness because the sample sizes are relatively large, and there are no outliers in either distribution. (c)
We test H 0 : µ N = µ S versus H a : µ N ≠ µ S ; we should use a two-sided alternative because we have no
reason (before looking at the data) to expect a difference in a particular direction. (d) The means and

standard deviations are x̄N = 23.7, sN = 17.5001, x̄S = 34.53, and sS = 14.2583 cm. Then, SE_D =
4.1213, and t = −2.629 with df = 29, 0.01 < P-value < 0.02 (0.011 from software). We conclude that the
means are different. (e) The 95% confidence interval is (−19.2614, −2.4053) [software gives (–19.0902, –
2.5765)]. The interval not only tells us that a difference exists, but that the northern trees are, on average,
between about 2.4 and 19 cm smaller in dbh than the trees in the southern part of the tract.
7.81. (a) Using df = 50, the 95% confidence interval is (−1.07, 7.07). (b) With 95% confidence, the mean
change in sales from last year to this year is between –1.07 and 7.07. Because the interval covers 0 and
includes some negative values, it is possible sales have actually decreased.
7.83. (a) We test H 0 : µ B = µ F versus H a : µ B > µ F . SE D = 0.5442 and t = 1.654, using df = 18 we have
0.05 < P-value < 0.10 (0.0532 from software); there is not quite enough evidence to reject H 0 at α = 0.05.
(b) The 95% confidence interval is (−0.2434, 2.0434). Software gives (–0.2021, 2.0021). (c) We need two
independent SRSs from Normal populations.

7.85. The pooled standard deviation is sp = 27.06, so the pooled standard error is sp√(1/32 + 1/33) =
6.713. The test statistic is t = 4.17, df = 63, P-value < 0.001. The 95% confidence interval is (14.57,
41.43). The results are similar.
7.87. See the solution to Exercise 7.79 for means and standard deviations. The pooled standard deviation
is s p = 15.9617, and the standard error is SE D = 4.1213. The significance test has t = −2.629, df = 58, and
P-value = 0.0110, so we have fairly strong evidence (though not quite significant at α = 0.01) that the
south mean is greater than the north mean. The 95% confidence interval is (−19.113, −2.554). Software
gives (–19.083, –2.584). All results are similar to those found in Exercise 7.85.

7.89. With sN = 17.5001, sS = 14.2583, and nN = nS = 30, we have

df = (17.5001²/30 + 14.2583²/30)² / [(17.5001²/30)²/(30 − 1) + (14.2583²/30)²/(30 − 1)] = 55.725.

7.91. (a) With sI = 7.8, nI = 115, sO = 3.4, and nO = 220, we have

df = (7.8²/115 + 3.4²/220)² / [(7.8²/115)²/(115 − 1) + (3.4²/220)²/(220 − 1)] = 137.066.

This agrees within rounding. (b) sp = √[((nI − 1)sI² + (nO − 1)sO²)/(nI + nO − 2)] = 5.332, which is
slightly closer to sO (the standard deviation from the larger sample). (c) With no assumption of equality,
SE_D = √(sI²/nI + sO²/nO) = 0.7626. With the pooled method, SE_D = sp√(1/nI + 1/nO) = 0.6136.
(d) With the pooled standard deviation, t = 18.74 and df = 333, for which P < 0.0001, and the 95%
confidence interval is as shown in the table below. With the smaller standard error, the t value is larger (it
had been 15.08), and the confidence interval is narrower. The P-value is also smaller (although both are
less than 0.0001). (e) With sI = 2.8, nI = 115, sO = 0.7, and nO = 220, we have

df = (2.8²/115 + 0.7²/220)² / [(2.8²/115)²/(115 − 1) + (0.7²/220)²/(220 − 1)] = 121.503.

The pooled standard deviation is sp = 1.734; the standard errors are (with no assumptions) SE = 0.2653
and (assuming equal standard deviations) SE = 0.1995. The pooled t is 24.56 (df = 333, P-value <
0.0001), and the 95% confidence intervals are shown in the table. The pooled and usual t procedures
compare similarly with the results for part (d): With the pooled procedure, t is larger, and the interval is
narrower.

           df    t*       Confidence interval
Part (d)   333   1.9671   10.2931 to 12.7069
           100   1.984    10.2827 to 12.7173
Part (e)   333   1.9671   4.5075 to 5.2925
           100   1.984    4.5042 to 5.2958
7.93. Using z* = 1.96, n = (1.96(11605)/5000)² = 20.69; round up to get n = 21. The table below shows
the calculations starting with 21; n = 24 guarantees the margin of error is less than 5000.

n margin of error t*
21 5282.6 2.086
22 5146.3 2.080
23 5018.7 2.074
24 4901.2 2.069
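The table's search can be automated; a sketch in Python (scipy assumed):

from math import ceil, sqrt
from scipy.stats import t

sigma, target = 11605, 5000
n = ceil((1.96 * sigma / target) ** 2)      # z-based starting point, 21
while t.ppf(0.975, n - 1) * sigma / sqrt(n) > target:
    n += 1
print(n)                                    # 24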

7.95. Higher; if the alternative µ is farther away from 18.5, then we will have more power.
7.97. Decrease; with a two-sided alternative, essentially we are cutting alpha in two pieces, one for each
side. This functionally smaller alpha equates to a smaller rejection region and thus a smaller probability of
rejecting, which means less power.
7.99. For the two-sided alternative, power = 0.72. For the larger standard deviation, power = 0.72.
7.101. (a) To halve the margin of error, the sample size needs to be quadrupled. (b) The sign test is less
powerful than the t test when the differences are close to Normal. (c) For a two-sided alternative, the
power would be the same at µ = 3 and 17 because they are the same distance from 10. (d) Increasing the
sample size has no effect on the probability of a Type I error; this is determined by the choice of α. It
will, however, decrease the probability of a Type II error.
7.103. No, the confidence interval is for the mean monthly rate, not the individual apartment rates.
7.105. (a) n = (1.96(5.2)/1)² = 103.9, so n = 104. (b) We would need to use the bigger sample to make
sure both margin of error conditions are met.
7.107. (a) From software, n = 18. (b) For n = 10, the power will be less than 90%; with a smaller sample
size, we will get less power. (c) For n = 10, the power is 0.80.
7.109. Answers will vary based on the choice of σ; s = 0.0122 so to be conservative we will use σ = 0.015.
With this, the power is only 0.09. The power is quite low; we would not be able to detect a difference of
0.002 confidently.
7.111. (a) Increase. A larger alpha gives a larger rejection region and thus a larger probability of rejecting,
which means more power. (b) From software, the power is 0.89.

7.113. The four standard deviations from Exercises 7.85 and 7.86 are sN = 17.5001, sS = 14.2583,
sE = 16.0743, and sW = 15.3314 cm. Using a larger σ for planning the study is advisable because it
provides a conservative (safe) estimate of the power. For example, if we choose a sample size to provide
80% power, and the true σ is smaller than that used for planning, the actual power of the test is greater
than the desired 80%. Results of additional power computations depend on what students consider to be
“other reasonable values of σ.” Shown in the table are some possible answers using the Normal
approximation. (Powers computed using the noncentral t distribution are slightly greater.)

σ    Power with n = 20   Power with n = 60
15   0.5334              0.9527
16   0.4809              0.9255
17   0.4348              0.8928
18   0.3945              0.8560
7.115. H0: median = 0, Ha: median > 0 (H0: p = 1/2, Ha: p > 1/2). One difference is 0; of the nine non-
zero differences, seven are positive. The P-value is P(X ≥ 7) = 0.0898 from a B(9, 0.5) distribution; there
is not quite enough evidence to conclude that Jocko’s estimates are higher. In Exercise 7.38, we were
able to reject H0; here we cannot.
7.117. H 0 : median = 0, H a : median > 0. Out of the 20 differences, 17 are positive (and none equal 0). The
P-value is P(X ≥ 17) = 0.0002 from a B(20, 0.5) distribution. So we reject H 0 and conclude that the
computer does record higher gas mileage than the driver. Using a t test, we found P-value = 0.0013,
which led to the same conclusion.
7.119. The mean is x̄ = 153.5, the standard deviation is s = 14.708, and the standard error of the mean is
s/√4 = 7.354. It would not be appropriate to construct a confidence interval because we cannot consider
these four scores to be an SRS.
7.121. (a) Shown below. (b) The plot shows that t* approaches z* = 1.96 as the df increases. (c) The plots
would be similar but t* would approach z* = 1.645 as the df increases.

[Plot for part (a): t* for 95% confidence versus df from 10 to 100, with a horizontal reference line at
z* = 1.96; the t* values decrease toward 1.96.]

7.123. (a) Use two independent samples (students who live in the dorms, and those who live elsewhere).
(b) Use a matched pairs design: Take a sample of college students and have each subject rate the appeal
of each label design (make sure to randomize the order in which the designs are presented). (c) Take a
single sample of college students and ask them to rate the appeal of the product.

7.125. (a) H0: µ = 1.5, Ha: µ < 1.5. t = (0.83 − 1.5)/(0.95/√200) = −9.974, with df = 199, for which
P-value ≈ 0. We reject H0 at the 5% significance level. (b) From Table D, use df = 100 and t* = 1.984, so
the 95% confidence interval for μ is 0.83 ± 1.984(0.95/√200) = 0.83 ± 0.133 = 0.697 to 0.963 violations
(with software, the interval is 0.698 to 0.962). (c) While the significance test lets us conclude that there
were fewer than 1.5 violations (on average), the confidence interval gives us a range of reasonable values
for the mean number of violations. (d) We have a large sample (n = 200), and the limited range means
that there are no extreme outliers, so t procedures should be safe.
7.127. (a) The two-sample confidence interval (using the conservative df = 9) is
−0.853 ± 2.262√(2.318²/10 + 1.924²/10) = −0.853 ± 2.155 = −3.008 to 1.302. (Software gives −2.859 to
1.153.) (b) The paired-sample confidence interval is −0.853 ± 2.262(1.269/√10) = −0.853 ± 0.908 =
(−1.761, 0.055). (c) The centers of the intervals are the same, but the margin of error for the independent
samples interval is much larger.
7.129. (a) The sample sizes are 41 and 197. We are looking at the average proportions across both
samples. (b) Because we have no indication that a direction was considered prior to data being collected,
the hypotheses are H 0 : µ B = µ W and H a : µ B ≠ µ W , where µ B is the average proportion of friends who are
black for students with a black roommate, and µ W is the average proportion of friends who are black for
students with a white roommate. (c) For the First Year: t = 0.982. With df = 52.3, P-value = 0.3305. With
df = 40, 0.30 < P-value < 0.40. For the Third Year: t = 2.126, df = 46.9, P-value = 0.0388. With df = 40,
0.02 < P-value < 0.04. (d) There was not a significant difference in the average proportion of black
friends at the middle of the first year; by the middle of the third year, the difference had become
significant. Students with black roommates had a larger average proportion of black friends than those
with white roommates.

7.131. (a) The mean difference in body weight change (with wine minus without wine) was x̄1 = 0.4 − 1.1
= −0.7 kg, with standard error SE1 = 8.6/√14 = 2.298 kg. The mean difference in caloric intake was x̄2 =
2589 − 2575 = 14 cal, with SE2 = 210/√14 = 56.125 cal. (b) The t statistics ti = x̄i/SEi, both with df =
13, are t1 = −0.305 (P1 = 0.7655) and t2 = 0.249 (P2 = 0.8069). (c) For df = 13, t* = 2.160, so the 95%
confidence intervals x̄i ± t*SEi are −5.66 to 4.26 kg (−5.67 to 4.27 with software) and −107.23 to 135.23
cal (−107.25 to 135.25 with software). (d) Students might note a number of factors in their discussions;
cal (−107.25 to 135.25 with software). (d) Students might note a number of factors in their discussions;
for example, all subjects were males, weighing 68 to 91 kg (about 150 to 200 lb), which may limit how
widely we can extend these conclusions.
7.133. (a) This is a matched pairs design because, at each of the 24 nests, the same mockingbird
responded on each day. (b) The variance of the difference is approximately s1² + s4² − 2ρs1s4 = 48.684,
so the standard deviation is 6.9774 m. (c) To test H0: µ1 = µ4 versus Ha: µ1 ≠ µ4, we have
t = (15.1 − 6.1)/(6.9774/√24) = 6.319 with df = 23, for which P-value < 0.001. (d) With correlation
ρ = 0.3, the variance of the difference is approximately s1² + s5² − 2ρs1s5 = 36.518, so the standard
deviation is 6.0430 m. To test H0: µ1 = µ5 versus Ha: µ1 ≠ µ5, we have t = (4.9 − 6.1)/(6.0430/√24) =
−0.973 with df = 23, for which P-value = 0.3407. (e) The significant difference between day 1 and day 4
suggests that the mockingbirds
altered their behavior when approached by the same person for four consecutive days; seemingly, the
birds perceived an escalating threat. When approached by a new person on day 5, the response was not
significantly different from day 1; this suggests that the birds saw the new person as less threatening than
a return visit from the first person.
7.135. How much a person eats or drinks may depend on how many people he or she is with. This means
that individual customers within each wine-label group probably cannot be considered to be independent
of one another, which is a fundamental assumption of the t procedures.
7.137. A two-sample t procedure was used. We assumed the data are approximately Normal. H0: µC =
µN, Ha: µC > µN. t = (4.44 − 3.95)/√(3.98²/90 + 2.88²/90) = 0.95. df = 89. 0.15 < P-value < 0.20. There
is no significant evidence that those treated rudely were willing to pay more. This is a two-sample
one-sided significance test; we assume the data were randomly selected and that the data contain no
outliers.

7.139. A two-sample t procedure was used. H0: µC = µN, Ha: µC > µN.
t = (2.90 − 2.98)/√(3.28²/90 + 3.24²/90) = −0.16. df = 89. P-value > 0.25. There is no significant
evidence that those treated rudely were willing to pay more. This is a two-sample one-sided significance
test; we assume the data were randomly selected and that the data contain no outliers. We found results
similar to those in the previous exercise: the condescending treatment did not boost sales.

7.141. The mean and standard deviation of the 25 numbers are x̄ = 77.76% and s = 32.6768%, so the
standard error is SE_x̄ = 6.5354%. For df = 24, Table D gives t* = 2.064, so the 95% confidence interval
is x̄ ± 13.49% = 64.27% to 91.25%. This seems to support the retailer’s claim: The original supplier’s
price was higher between 64% and 91% of the time.
7.143. Back-to-back stemplots (not reproduced here) show that the distributions appear similar; the most
striking difference is the relatively large number of boys with low GPAs. Testing the difference in GPAs
(H0: µB = µG, Ha: µB < µG), we obtain SE_D = 0.4582 and t = −0.91, which is not significant: with
df = 30, 0.15 < P-value < 0.20 (software gives 0.1839). The 95% confidence interval for the difference
µB − µG in GPAs is shown in the second table below. For the difference in IQs, we find SE_D = 3.1138.
With the same hypotheses as before, we find t = 1.64—fairly strong evidence but not quite significant at
the 5% level: 0.05 < P-value < 0.10 (df = 30). Software gives P-value = 0.0528 (df = 56.9). The 95%
confidence interval for the difference µB − µG in IQs is also shown there.

        n    GPA Mean   GPA s   IQ Mean   IQ s
Boys    47   7.282      2.319   111       12.12
Girls   31   7.697      1.721   105.8     14.27

      df   t*      Confidence interval
GPA   75   1.992   −1.3277 to 0.4979
      30   2.042   −1.3505 to 0.5207
IQ    57   2.003   −1.1167 to 11.3542
      30   2.042   −1.2397 to 11.4772

7.145. It is reasonable to have a prior belief that people who evacuated their pets would score higher, so
we test H 0 : µ 1 = µ 2 versus H α : µ 1 > µ 2 . We find SE D = 0.4630 and t = 3.65, which gives P-value <
0.0005 no matter how we choose degrees of freedom (115 or 237.0). As one might suspect, people who
evacuated their pets have a higher mean score. A 95% confidence interval for the difference is (0.7714,
2.6086); software gives (0.7779, 2.6021).
7.147. (a) We test H0: µB = µD versus Ha: µB < µD. Pooling might be appropriate for this problem, in
which case sp = 6.5707. Whether or not we pool, SE_D = 1.9811 and t = 2.87 with df = 42 (pooled), 39.3,
or 21, so P-value = 0.0032, 0.0033, or 0.0046. We conclude that the mean score using DRTA is higher
than the mean score with the Basal method. The difference in the average scores is 5.68; options for a
95% confidence interval for the difference µD − µB are given in the first table below. (b) We test H0:
µB = µS versus Ha: µB < µS. If we pool, sp = 5.7015. Whether or not we pool, SE_D = 1.7191 and
t = 1.88 with df = 42, 42.0, or 21, so P-value = 0.0337, 0.0337, or 0.0372. We conclude that the mean
score using Strat is higher than the Basal mean score. The difference in the average scores is 3.23;
options for a 95% confidence interval for the difference µS − µB are given in the second table below.

Group   n    x̄        s
Basal   22   41.0455   5.6356
DRTA    22   46.7273   7.3884
Strat   22   44.2727   5.7668

µD − µB:
df     t*       Confidence interval
39.3   2.0223   1.6754 to 9.6882
21     2.0796   1.5618 to 9.8018
21     2.080    1.5610 to 9.8026
42     2.0181   1.6837 to 9.6799
40     2.021    1.6779 to 9.6857

µS − µB:
df     t*       Confidence interval
42.0   2.0181   −0.2420 to 6.6966
21     2.0796   −0.3477 to 6.8023
21     2.080    −0.3484 to 6.8030
42     2.0181   −0.2420 to 6.6965
40     2.021    −0.2470 to 6.7015

7.149. No. What we have is nothing like an SRS of the population of school corporations; we have census
data for your state.

Chapter 8 Inference for Proportions
8.1. (a) n = 5013. (b) p is the (fixed, but unknown) population proportion of smartphone users who have
purchased an item after using the phone to search for information. (c) X = 2657. X is the number of
smartphone users from the sample who have purchased an item after using the phone to search for
information. (d) p̂ = 2657/5013 = 0.53.

8.3. (a) SE_p̂ = √(p̂(1 − p̂)/n) = √(0.53(1 − 0.53)/5013) = 0.00705. (b) 0.53 ± 0.0138. (c) (51.62%,
54.38%).
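A minimal check of these values in Python (standard library only):

from math import sqrt

n, phat = 5013, 0.53
se = sqrt(phat * (1 - phat) / n)    # about 0.00705
m = 1.96 * se                       # about 0.0138
print(se, phat - m, phat + m)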
8.5. For z = 1.34, the two-sided P-value is the area under a standard Normal curve above 1.34 and below
−1.34.

8.7. To test H0: p = 0.5 versus Ha: p ≠ 0.5: p̂ = 0.75, z = (0.75 − 0.5)/√((0.5)(1 − 0.5)/20) = 2.24.
P-value = 0.0250. The data show a significant difference between your product and your competitor’s.

8.9. (a) p̂ = 0.35, z = (0.35 − 0.5)/√((0.5)(1 − 0.5)/20) = −1.34. P-value = 0.1802. The results are exactly
the same. (b) The 95% confidence interval is 0.35 ± 1.96√((0.35)(1 − 0.35)/20) = (14.1%, 55.9%). The
values are 1 minus the values from the previous interval.
8.11. The plot is symmetric about 0.5, where it has its maximum (the maximum margin of error always
occurs at p̂ = 0.5).

[Plot for Exercise 8.11: margin of error versus p̂ from 0 to 1, symmetric about its maximum at p̂ = 0.5.]

8.13. From software, the power is 0.99.


8.15. (a) n = 300, X = 109. (b) p̂ = 109/300 = 0.3633. (c) p̂ = 36.33% is the estimate of p, the
population proportion of students at your college who regularly eat breakfast.

8.17. (a) p̂ = 462/1003 = 0.461. SE_p̂ = √(0.461(1 − 0.461)/1003) = 0.0157, m = 1.96(0.0157) = 0.0308.
(b) There are certainly more than 20(1003) = 20,060 cell phone owners; the number of “successes” was
462 > 10 and the number of “failures” was 1003 – 462 = 541 > 10. (c) 0.461 ± 0.0308 = (0.4302, 0.4918).
(d) We are 95% confident that between 43.0% and 49.2% of cell phone owners used their cell phone
while in a store within the last 30 days to call a friend or family member for advice about a purchase they
were considering.

8.19. (a) p̂ = 210/250 = 0.84. The standard error is SE_p̂ = √(0.84(1 − 0.84)/250) = 0.0232, and the margin
of error is 1.96(0.0232) = 0.0454. (b) The number of successes was 210, and the number of failures was
250 – 210 = 40. Both counts are greater than 10. It is reasonable that more than 20(250) = 5000 people are
customers of the dealership, but this was not an SRS; they asked all customers in the two-week period. (c)
The confidence interval is 0.84 ± 0.0454 = (0.7946, 0.8854). (d) Based on the sample, we estimate with
95% confidence that between 79.5% and 88.5% of the service department’s customers would recommend
the service to a friend; because this was not an SRS, we must use caution in relying on this interval.
8.21. n = (1.96/0.05)²(0.461)(1 − 0.461) = 381.8, so n = 382.
8.23. (a) Confidence cannot be bigger than 100%. (b) The margin of error only takes into account errors
from random sampling, not errors due to bias. (c) The P-value gives no indication of the truthfulness of
the null hypothesis; rather, it gives the amount of evidence in favor of the alternative hypothesis.

8.25. p̂ = 3274/5000 = 0.6548. 95% confidence interval:
0.6548 ± 1.96√((0.6548)(1 − 0.6548)/5000) = (0.642, 0.668).

8.27. (a) X = 0.89(1050) = 934.5, which rounds to 935. We cannot have fractions of respondents. (b)
0.89 ± 1.96 0.89(1 − 0.89) / 1050 = (0.8711, 0.9089). (c) (87.1%, 90.9%). (d) For example, parents might
be conscious of violence because of recent events in the news.

8.29. (a) Values of p̂ outside the interval 0.4 ± 1.96√(0.4(1 − 0.4)/40) = 0.2482 to 0.5518 will result in
rejecting H0. (b) For n = 80, values outside the interval 0.2926 to 0.5074 will result in rejecting H0.
(c) A sketch would show the effect of the larger sample size: the sampling distribution is narrower, and
so is the interval of p̂ values that do not lead to rejection.

8.31. (a) About (0.42)(159,949) = 67,179 students plan to study abroad. (b) For 99% confidence, the
margin of error is 2.576√(0.42(1 − 0.42)/159,949) = 0.00318. So, 0.42 ± 0.00318 = (0.41682, 0.42318).

8.33. With p̂ = 0.43 and n = 1430, the 95% confidence interval is 0.43 ± 1.96√(0.43(1 − 0.43)/1430) =
0.43 ± 0.0257 = (0.4043, 0.4557).

8.35. (a) For 99% confidence, the margin of error is 2.576√((0.87)(0.13)/430,000) = 0.001321. (b) One
source of error is indicated by the wide variation in response rates: We cannot assume that the statements
of respondents represent the opinions of nonrespondents. The effect of the participation fee is more
difficult to predict, but one possible impact is on the types of institutions that participate in the survey:
Even though the fee is scaled for institution size, larger institutions can more easily absorb it. These other
sources of error are much more significant than the sampling error, which is the only error accounted for
in the margin of error from part (a).

8.37. (a) p̂ = 390/1191 = 0.3275. For 95%, the margin of error is 1.96√(0.3275(1 − 0.3275)/1191) =
0.02665, and the interval is (0.3008, 0.3541). (b) Speakers and listeners probably perceive sermon length
differently (just as, say, students and lecturers have different perceptions of the length of a class period).

8.39. (a) p̂ = 5067/10000 = 0.5067. z = (0.5067 − 0.5)/√((0.5)(1 − 0.5)/10000) = 1.34. P-value = 0.1802.
This is not significant at the 5% level. The data do not provide evidence that the coin is biased. (b) The
95% confidence interval is 0.5067 ± 1.96√((0.5067)(1 − 0.5067)/10000) = (0.497, 0.516).

8.41. n = (1.96/0.015)²(1/4) = 4268.4, use n = 4269.
8.43. The table below gives the unrounded values; for each case, we would round up to the next integer.
To be sure we meet our target margin of error, we should use the largest sample, or n = 196.

p     required sample size n
0.1   70.56
0.2   125.44
0.3   164.64
0.4   188.16
0.5   196
0.6   188.16
0.7   164.64
0.8   125.44
0.9   70.56
8.45. Using software, n = 153.

8.47. Let D = p̂1 − p̂2. µD = 0.4 − 0.5 = −0.1, σD = √(0.4(1 − 0.4)/25 + 0.5(1 − 0.5)/30) = 0.1339.

8.49. (a) The means are µ_p̂1 = p1 and µ_p̂2 = p2. The standard deviations are σ_p̂1 = √(p1(1 − p1)/n1)
and σ_p̂2 = √(p2(1 − p2)/n2). (b) µD = µ_p̂1 − µ_p̂2 = p1 − p2. (c) σD² = σ²_p̂1 + σ²_p̂2 =
p1(1 − p1)/n1 + p2(1 − p2)/n2.

8.51. (–0.019, 0.225). We can just reverse the sign of the interval in the previous exercise.
8.53. We test H0: pW = pM versus Ha: pW < pM. The test statistic is the same: z = −1.65. The P-value =
0.0495, which is significant at the 5% level. The data do show evidence that more men than women favor
Commercial A.
8.55. Using software, we need 686 women and 686 men.
8.57. (a) This was an experiment; although we cannot assign who visits the establishment, we could
randomly assign which type of server they get. There were more than 10 successes and failures in each
group, so assuming random assignment, the guidelines are generally met. (b) This is an experiment; the
runners were randomly assigned; however, there are fewer than 10 successes in each group. Hence, the
guidelines are not met.
8.59. (a) The guidelines are met; the number of successes and failures in each group was at least five. (b)
The guidelines are met; the number of successes and failures in each group was at least five.
8.61. (a) RR = (40/69)/(130/349) = 1.56. Male customers are 1.56 times as likely to tip a red-shirted
server as one wearing a different color. (b) RR = (9/20)/(6/20) = 1.5. A runner is 1.5 times as likely to be
satisfied with the first stretching routine as with the second.
8.63. (a) Type of college is explanatory; response is whether physical education is required. (b) The
populations are private and public colleges and universities. (c) X1 = 101, n1 = 129, p̂1 = 101/129 =
0.7829. X2 = 60, n2 = 225, p̂2 = 60/225 = 0.2667. (d) 0.7829 − 0.2667 ±
1.96√(0.7829(1 − 0.7829)/129 + 0.2667(1 − 0.2667)/225) = (0.4245, 0.6079). We note this interval
does not contain 0; it appears that public institutions are more likely to require physical education.
(e) H0: p1 = p2. Ha: p1 ≠ p2. p̂ = (60 + 101)/(225 + 129) = 0.4548.
z = (0.7829 − 0.2667)/√(0.4548(1 − 0.4548)(1/225 + 1/129)) = 9.39. P-value ≈ 0. (f) There were 101
public institutions that require physical education and 28 that do not. There were 60 private institutions
that require physical education and 165 that do not. All these counts are greater than 5. We do not know
if the samples were SRSs. (g) It appears that public institutions are much more likely to require physical
education (by an estimated 42.45% to 60.79%) at 95% confidence.
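A sketch of parts (d) and (e) in Python (scipy assumed; the counts are from part (c)):

from math import sqrt
from scipy.stats import norm

x1, n1 = 101, 129                 # institutions requiring physical education
x2, n2 = 60, 225
p1, p2 = x1 / n1, x2 / n2
d = p1 - p2

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled, for the interval
print(d - 1.96 * se, d + 1.96 * se)                  # about (0.425, 0.608)

pool = (x1 + x2) / (n1 + n2)                         # pooled, for the test
z = d / sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
print(z, 2 * norm.sf(abs(z)))                        # z about 9.4, P near 0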

8.65. SE_D = √(0.299(1 − 0.299)/358 + 0.208(1 − 0.208)/851) = 0.0279. (0.299 − 0.208) ± 1.96(0.0279)
= (0.0363, 0.1457). Based on these samples, we’d estimate at 95% confidence that among Canadian
youth in 10th and 11th grades, between 3.6% and 14.6% more youths who stress about their health are
exergamers than those who do not stress about their health.
8.67. (a) Table below. The values of X 1 and X 2 are estimated as (0.54)(1063) and (0.89)(1064). (b) The
estimated difference is pˆ 2 − pˆ1 = 0.35. (c) Large-sample methods should be appropriate because we have
large, independent samples from two populations, and we have at least five successes and failures in each
sample. (d) With SE D = 0.01805, the 95% confidence interval is 0.35 ± 0.03537 = (0.3146, 0.3854). (e)
The estimated difference is about 35%, and the interval is about 31.5% to 38.5%. (f) A possible concern is
that adults were surveyed before Christmas, while teens were surveyed before and after Christmas. It
might be that some of those teens may have received game consoles as gifts but eventually grew tired of
them.

Population   Population proportion   Sample size   Count of successes   Sample proportion
1 (adults)   p1                      1063          574                  0.54
2 (teens)    p2                      1064          947                  0.89

8.69. (a) Table below. The values of X 1 and X 2 are estimated as (0.73)(1063) and (0.76)(1064). (b) The
estimated difference is pˆ 2 − pˆ1 = 0.03. (c) Large-sample methods should be appropriate because we have
large, independent samples from two populations, and we have at least five successes and failures in each
sample. (d) With SE D = 0.01889, the 95% confidence interval is 0.03 ± 0.03702 = (−0.0070, 0.0670). (e)
The estimated difference is about 3%, and the interval is about −0.7% to 6.7%. (f) As previously, a
possible concern is that adults were surveyed before Christmas.

Population   Population proportion   Sample size   Count of successes   Sample proportion
1 (adults)   p1                      1063          776                  0.73
2 (teens)    p2                      1064          809                  0.76

8.70. H0: p1 = p2. Ha: p1 ≠ p2. p̂ = (776 + 809)/(1063 + 1064) = 0.7452.
z = (0.76 − 0.73)/√(0.7452(1 − 0.7452)(1/1063 + 1/1064)) = 1.60. P-value = 0.1096. The difference in
gaming on computers is not statistically significant. It appears that teens and adults may game on
computers similarly.
8.71. No. This procedure requires independent samples from different populations. We have one sample
(of teens).
8.73. (a) Using software: (i) 199, (ii) 294, (iii) 356, (iv) 388, (v) 388, (vi) 356, (vii) 294, (viii) 199. (b) As
p 1 and p 2 get closer to 0.5, the necessary sample size to achieve the same power gets larger. As p 1 and p 2
get farther from 0.5, the necessary sample size to achieve the same power gets smaller.

8.75. (a) n = 2342, X = 1639. (b) p̂ = 1639/2342 = 0.7. SE_p̂ = √(0.7(1 − 0.7)/2342) = 0.0095. (c) The
95% confidence interval is 0.7 ± 1.96√((0.7)(1 − 0.7)/2342) = (0.681, 0.718). With 95% confidence, the
proportion of people who get their news from a desktop or laptop is between 68.1% and 71.8%. (d) Yes.
We have at least 10 successes and failures in the sample.
8.77. We have large samples from two independent populations (different age groups). p̂1 = 861/1055 =
0.8161, p̂2 = 417/974 = 0.4281. SE_D = √(0.8161(0.1839)/1055 + 0.4281(0.5719)/974) = 0.0198. The
95% confidence interval for the difference in the proportion of children in these age groups who get
enough calcium in their diets is (0.8161 − 0.4281) ± 1.96(0.0198) = (0.3492, 0.4268). Because the
interval is entirely above 0, we can conclude that children 5 to 10 years old are much more likely to get
adequate calcium in their diets than children 11 to 13 years old.
8.79. Answers will vary. The confidence interval gives more information on the actual size of the
difference.
8.81. (a) 0.67(1802) = 1207. (b) 0.67 ±1.96(0.01108) = (0.6483, 0.6917). (c) About 64.8% to 69.2% of all
Internet users use Facebook, at 95% confidence.
8.83. No. Many people use both; there was only one sample, not two independent samples.
8.85. (a) While there is only a 5% chance of any interval being wrong, we have six (roughly independent)
chances to make that mistake. (b) For 99.2% confidence, use z* = 2.65. (Using software, z* = 2.6521, or
2.6383 using the exact value of 0.05/6). (c) The margin of error for each interval is z*SE_p̂, so each
interval is about 1.35 times wider than in the previous exercise. (If intervals are rounded to three decimal
places, as in the table below, the results are the same regardless of the value of z* used.)

Genre Interval
Racing 0.705 to 0.775
Puzzle 0.684 to 0.756
Sports 0.643 to 0.717
Action 0.632 to 0.708
Adventure 0.622 to 0.698
Rhythm 0.571 to 0.649

8.87. The pooled estimate of the proportion is p̂ = 0.375 (the average of p̂1 and p̂2 because the sample
sizes were equal). Then, SE D = 0.01811, so z = (0.43 − 0.32)/SE D = 6.08, for which P-value < 0.0001.

8.89. With p̂ = 1006/1530 = 0.6575, we have SE_p̂ = 0.01213, so the 95% confidence interval is
p̂ ± 1.96SE_p̂ = 0.6575 ± 0.0238 = (0.6337, 0.6813).


8.91. H0: pF = pM. Ha: pF ≠ pM. Finding the number of men and women who drink at least five servings
per week of soft drinks, we have XM = (0.17)(1003) = 171 and XF = (0.15)(1003) = 150. p̂ = 0.1600,
SE_D = 0.0164, z = 1.28, P-value = 0.2009. There is no significant difference between Australian men
and women in terms of drinking five or more soft drinks per week.
n p̂ SE D z P-value
1,000 0.16 0.0232 0.86 0.3884
4,000 0.16 0.0116 1.73 0.0845
10,000 0.16 0.0073 2.73 0.0064

8.93. The difference becomes more significant (i.e., the P-value decreases) as the sample size increases.
For small sample sizes, the difference between p̂1 = 0.55 and p̂2 = 0.45 is not significant, but, with larger
sample sizes, we expect that the sample proportions should be better estimates of their respective
population proportions, so p̂1 – p̂2 = 0.1 suggests that p 1 ≠ p 2 .

n z P-value
60 1.1 0.2713
70 1.18 0.238
80 1.26 0.2077
100 1.41 0.1585
400 2.83 0.0047
500 3.16 0.0016
1000 4.47 0

8.95. (a) Using either trial and error or the formula derived in part (b), we find that at least n = 534 is
needed. (b) Generally, the margin of error is m = z*√(p̂1(1 − p̂1)/n + p̂2(1 − p̂2)/n); with p̂1 = p̂2 =
0.5, this is m = z*√(0.5/n). Solving for n, we find n = (z*/m)²/2.
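The exercise's target margin of error is not restated in this solution; n = 534 corresponds to m ≈ 0.06, which is assumed in this one-line Python check:

from math import ceil

zstar, m = 1.96, 0.06   # m = 0.06 inferred from n = 534; an assumption here
print(ceil((zstar / m) ** 2 / 2))   # 534 per group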

8.97. (a) p0 = 143,611/181,535 = 0.7911. (b) p̂ = 339/870 = 0.3897, σ_p̂ = 0.0138, and z =
(0.3897 − 0.7911)/0.0138 = −29.1, so P-value ≈ 0 (regardless of the Ha). This is strong evidence against
H0; we conclude that Mexican Americans are underrepresented on juries. (c) p̂1 = 339/870 = 0.3897,
while p̂2 = (143,611 − 339)/(181,535 − 870) = 0.7930. Then, p̂ = 0.7911, SE_D = 0.01382, and z =
−29.2—and again, we have P-value ≈ 0 and reject H0.

Chapter 9: Analysis of Two-Way Tables
9.1. (a) 61.08% of women use Instagram, while 38.92% do not. (b) 43.98% of men use Instagram, while
56.02% do not. (c) Graph shown below. (d) A greater percent of women use Instagram compared with
men.
9.3. Answers will vary. Most will probably prefer the distribution broken down by sex, which emphasizes
the difference between men and women and their Instagram usage.
9.5. Among all three fruit consumption groups, vigorous exercise is most likely. Incidence of low exercise
decreases with increasing fruit consumption.

9.7. Each expected count is computed as (row total × column total)/n. The expected counts
are shown in the table below.

Fruit          Physical Activity   Physical Activity   Physical Activity
Consumption    Low                 Medium              Vigorous            Total
Low            51.9                212.9               304.2               569
Medium         29.3                120.1               171.6               321
High           26.8                110                 157.2               294
Total          108                 443                 633                 1184
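A sketch of this computation in Python (numpy assumed; only the marginal totals from the table are needed):

import numpy as np

row_totals = np.array([569, 321, 294])   # fruit consumption: low, medium, high
col_totals = np.array([108, 443, 633])   # physical activity: low, medium, vigorous
n = row_totals.sum()                     # 1184

expected = np.outer(row_totals, col_totals) / n
print(expected.round(1))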

9.9. df = (r − 1)(c − 1). (a) df = 6, 0.025 < P-value < 0.05. (b) df = 6, 0.025 < P-value < 0.05. (c) df = 2,
0.10 < P-value < 0.15. (d) df = 2, 0.01 < P-value < 0.02.
9.11. (a) The conditional distributions are given in the table. Divide the number of “yes” and “no”
answers in each column by the column total. (b) The graph is shown at the right. (c) Explanatory variable
value 1 had proportionately fewer “yes” responses.

Response   Explanatory 1   Explanatory 2
Yes        0.357           0.452
No         0.643           0.548
Total      1               1

9.13. (a) pi = proportion of “yes” responses in group i. H0: p1 = p2, Ha: p1 ≠ p2. p̂1 = 0.357, p̂2 = 0.452.
(b) p̂ = (75 + 95)/(210 + 210) = 0.4048. z = −1.9882, P-value = 0.0469. We reject H0 at the 5% level
and conclude there is a difference in the proportion of “yes” answers for the two levels. (c) The P-values
agree (software gives P = 0.0469 for X² = 3.95 with df = 1). (d) z² = (−1.9882)² = 3.9529.
2

9.15.

Consoles   Adult Gamer   Teen Gamer   Total
Yes        574           489          1063
No         945           119          1064
Total      1519          608          2127

9.17. The table below gives the joint distribution (each count from Exercise 9.15 divided by n = 2127).

Consoles   Adult Gamer   Teen Gamer
Yes        26.99%        22.99%
No         44.43%        5.59%

For the marginal distribution of gamers: 71.42% are adult; 28.58% are teen. For the marginal distribution
of consoles: 49.98% play games on consoles; 50.02% do not. For adult gamers: 37.79% play games on
consoles; 62.21% do not. For teen gamers: 80.43% play games on consoles; 19.57% do not. For those
who play games on consoles: 54% are adult gamers; 46% are teen gamers. For those who do not play
games on consoles: 88.82% are adult gamers; 11.18% are teen gamers. Answers will vary for preference
of conditional distributions, but the usage of game consoles by gamer type is probably the most
informative. As shown in the graph below, a much higher percent of teen gamers play on consoles than
adult gamers.

[Bar graph: Console Use by Gamer Type, showing the proportion of adult and teen gamers who do and
do not play on consoles.]

9.19. The expected counts are in the table below.

Consoles   Adult Gamer   Teen Gamer
Yes        759.14        303.86
No         759.86        304.14

9.21. From software, X² = 315.7770, df = 1, P-value < 0.0001. The data provide evidence of an
association between type of gamer and whether or not he or she plays console games.
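A check in Python (scipy assumed; Yates' continuity correction is turned off to match the X² above):

from scipy.stats import chi2_contingency

table = [[574, 489], [945, 119]]   # consoles yes/no by adult/teen gamers
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(chi2, df, p)                 # about 315.78, df = 1, P < 0.0001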
9.23. X² = 315.7770 ≈ z² = (−17.77)² = 315.7729.
9.25. The joint probabilities are given in the table below. For the marginal distribution of sex: 48.98% are
boys; 51.02% are girls. For the marginal distribution of the number of times witnessing sexual
harassment: 71.36% have witnessed it more than once; 16.12% have witnessed it once; and 12.51% have
never witnessed it. For boys: 76.01% have witnessed it more than once; 12.98% have witnessed it once;
and 11.01% have never witnessed it. For girls: 66.90% have witnessed it more than once; 19.14% have
witnessed it once; and 13.96% have never witnessed it. For those witnessing sexual harassment more than
once: 52.17% are boys; 47.83% are girls. For those witnessing sexual harassment once: 39.43% are boys;
60.57% are girls. For those never witnessing sexual harassment: 43.09% are boys; 56.91% are girls.
Answers will vary for preference of conditional distributions, but the distribution of times by sex is
probably the most informative, showing that boys tend to say they have witnessed sexual harassment
somewhat more often than girls; boys are about 10 percentage points higher than girls in the “more than
once” category.

Times   Boys    Girls
More    37.23   34.13
Never   5.39    7.12
Once    6.36    9.77

9.27. X 2 = 20.822, df = 2, P-value < 0.0001. The data provide evidence of an association between sex and
the number of times witnessing sexual harassment.
9.29. This is due to rounding error.

9.31. CA: 0.5820; HI: 0.0000; IN: 0.0196; NV: 0.0660; OH: 0.2264; χ² = 0.0369 + 0.5820 + 0.0000 +
0.0196 + 0.0660 + 0.2264 = 0.9308.
9.33. (a) The null hypothesis is that the coin matches the theoretical proportions of 50/50, 50% heads and
50% tails, and is fair. The alternative is that the coin does not match the theoretical 50/50 and is biased.
(b) χ² = (5067 − 5000)²/5000 + (4933 − 5000)²/5000 = 1.7956. df = 1, 0.15 < P-value < 0.20. The data
do not provide evidence that the coin is biased.
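Note: The computation in (b) can be checked in one line of Python (a sketch, not part of the printed solution; scipy assumed).

    from scipy.stats import chisquare

    result = chisquare(f_obs=[5067, 4933], f_exp=[5000, 5000])
    print(result.statistic, result.pvalue)   # 1.7956 and about 0.18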
9.35. Answers will vary. Most results will give a fairly decent randomization and should fail to reject the
null hypothesis. Changing the interval likely will not change the result and should still fail to reject the
null hypothesis.

9.37. χ² = (10 − 17.15)²/17.15 + (55 − 36.008)²/36.008 + (22 − 39.186)²/39.186 +
(53 − 49.042)²/49.042 = 20.8548. df = 3, P-value < 0.0005 (0.0001 from software). The data are
significantly different from a Poisson distribution with mean 2.1.
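Note: scipy's chisquare function expects the observed and expected totals to match, and here they differ slightly (140 versus about 141.4), so the chi-square sum is easiest to reproduce directly (a sketch, not part of the printed solution; numpy and scipy assumed).

    import numpy as np
    from scipy.stats import chi2

    obs = np.array([10, 55, 22, 53])
    exp = np.array([17.15, 36.008, 39.186, 49.042])   # from Poisson(2.1)

    x2 = np.sum((obs - exp)**2 / exp)
    print(x2, chi2.sf(x2, df=3))   # about 20.85 and P ≈ 0.0001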
9.39. (a) H0: p1 = p2, Ha: p1 ≠ p2, where the proportions of interest are for those harassed in person.
p̂1 = 321/361 = 0.8892, p̂2 = 200/641 = 0.3120, p̂ = 521/1002 = 0.5200. We compute z = 17.556, P-
value ≈ 0. We conclude there is an association between being harassed online and in person. It appears
that those who have been harassed online are more likely to also be harassed in person. (b) We test H0:
there is no association between being harassed online and in person versus Ha: there is a relationship.
X² = 308.23, df = 1, P-value ≈ 0. (c) 17.556² = 308.21, which agrees with X² to within roundoff error.
(d) Perhaps one girl would not answer these questions.
9.41. (a) The solution to Exercise 9.39 used “harassed online” as the explanatory variable. (b) Changing
to use “harassed in person” for the two-proportion z test gives p̂1 = 321/521 = 0.6161,
p̂2 = 40/481 = 0.0832, and p̂ = 361/1002 = 0.3603. We again compute z = 17.556, P-value ≈ 0. No
changes will occur in the chi-square test. (c) The test statistic will be the same regardless of which
variable is viewed as explanatory.

9.43. For 600 tosses, the expected count for each outcome would be (1/6)(600) = 100.
9.45. (a) Table shown below. (b) 10.53% of Small, 29.41% of Medium, and 20% of Large were not
allowed. (c) In the 3 × 2 table, the expected count for large/not allowed is too small: (5)(12)/79 = 0.76.
(d) H0: There is no association between the size of the claim and whether or not it is allowed. Ha: There
is an association between the size of the claim and whether or not it is allowed. (e) From software,
X² = 3.456, df = 1, 0.05 < P-value < 0.10. The data do not provide evidence of an association between
the size of the claim and whether or not it is allowed.

          Allowed?
Stratum   Yes   No   Total
Small     51    6    57
Medium    12    5    17
Large     4     1    5
Total     67    12   79

9.47. The table below shows the given information translated into a 3 × 2 table. For example, in Year 1,
about (0.423)(2408) = 1018.584 students (which rounds to 1019) received DFW grades, and the rest,
(0.577)(2408) = 1389.416 students, passed. To test H0: the DFW rate has not changed, we compute
X² = 308.3, df = 2, P < 0.0001; very strong evidence of a change.

Year   DFW    Pass
1      1019   1389
2      579    1746
3      423    1703
9.49. (a) The approximate counts are shown below; for example, among those students in trades,
(0.34)(942) = 320.28 enrolled right after high school, and (0.66)(942) = 621.72 enrolled later. (b) In
addition to the chi-square test in part (c), students might note other things: for example, overall, 39.4%
of these students enrolled right after high school, and health is the most popular field, with about 38% of
these students. (c) We have strong enough evidence to conclude that there is an association between field
of study and when students enter college; the test statistic is X² = 276.1, with df = 5, for which the
P-value is very small. A graphical summary is not shown; a bar chart would be appropriate.
                 Time of Entry
Field of study   Right after high school   Later   Total
Trades           320                       622     942
Design           274                       310     584
Health           2034                      3051    5085
Media/IT         976                       2172    3148
Service          486                       864     1350
Other            1173                      1082    2255
Total            5263                      8101    13364

9.51. From software, X² = 852.4330, df = 1, P-value < 0.0001 (using Table F, P-value < 0.0005).
z² = (−29.2)² = 852.64 ≈ X², the difference being rounding error.

Person   Juror   Not a juror   Total
MA       339     143,272       143,611
Not MA   531     37,393        37,924
Total    870     180,665       181,535

9.53. Answers will vary. Most results will give a fairly decent randomization and should fail to reject the
null hypothesis. Changing the interval likely will not change the result and should still fail to reject the
null hypothesis.
9.55. (a) Each quadrant accounts for one fourth of the area, so we expect it to contain one fourth of the
100 trees. (b) Some random variation would not surprise us; we no more expect exactly 25 trees per
quadrant than we would expect to see exactly 50 heads when flipping a fair coin 100 times. (c)

χ² = (18 − 25)²/25 + (22 − 25)²/25 + (39 − 25)²/25 + (21 − 25)²/25 = 10.8. df = 3 and P-value =
0.0129. We conclude that the distribution is not random.

Chapter 10: Inference for Regression
10.1. (a) The slope is 0.8. (b) For each serving of fruits and vegetables in a calorie-controlled diet, the
expected average decrease in blood pressure goes up by 0.8. (c) For x = 7, µ y = 2.8 + 0.8(7) = 8.4. (d) The
distribution is N(8.4, 3.2). (e) 95% of the subpopulation will be between µ y ± 2σ. 8.4 ± 2(3.2) = 2.0 to
14.8.

10.3. (a) t = 1.4/0.65 = 2.154, df = 20 − 2 = 18, 0.04 < P-value < 0.05. At α = 0.05, we reject H0.
(b) t = 2.1/1.05 = 2, df = 30 − 2 = 28, 0.05 < P-value < 0.10. At α = 0.05, we fail to reject H0.
(c) t = 2.1/1.05 = 2, df = 100 − 2 = 98, 0.04 < P-value < 0.05. At α = 0.05, we reject H0.
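Note: Part (a) can be reproduced in Python (a sketch, not part of the printed solution; scipy assumed). The slope test uses t = b1/SE(b1) with n − 2 degrees of freedom.

    from scipy.stats import t as t_dist

    b1, se_b1, n = 1.4, 0.65, 20
    t_stat = b1 / se_b1
    print(t_stat, 2 * t_dist.sf(abs(t_stat), df=n - 2))   # about 2.154 and 0.045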
10.5. The margin of error is half the width of the confidence interval: (24.4 − 23.0)/2 = 0.7. The margin
of error depends on how far x is from x̄; because x = 11.0 is farther from the mean, the margin of error
there will be larger.

10.7. (a) The parameters of the regression model are β0, β1, and σ; the quantities given are estimates of
these. (b) The estimated slope b1 does not belong in the hypotheses; it should be H0: β1 = 0. (c) This is
backward; the prediction interval will be wider than the mean response interval.
10.9. Prediction intervals concern individuals instead of means. Departures from the Normal distribution
assumption would be more severe here (in terms of how the individuals vary around the regression line).
10.11. (a) ŷ = 8.25464 + 0.11259EDUC. The points in the residual plot, shown below, look scattered and
random so the assumptions are satisfied. The histogram shows the residuals are Normally distributed. (b)
Yes, the log transformed data can effectively be used for inference because the residuals have a normal
distribution and the residual plot appears random.

10.13. Not exactly. Because the list was narrowed before we took our SRS, our sample really only reflects
the schools that met the “academic quality” criteria and not all 500+ colleges.
10.15. (a) ŷ = 14,180 + 0.77688(7372) = $19,907.16. (b) ŷ = 14,180 + 0.77688(10566) = $22,388.51.
(c) The margin of error for Appalachian would be larger than the margin of error for Texas A&M because
its in-state cost is farther from the average in-state cost.

10.17. (a) See plots below. InCostAid looks somewhat linear but has several outliers. Admit also looks
linear but has a couple of outliers. GradRate looks weakly linear and may have one low outlier.
OutCostAid does not have a linear relationship with AveDebt. Baruch College is notable in each of the
plots. (b) The table is shown below. (c) InCostAid is easily the best predictor, as it is the only one that is
significant with AveDebt by itself.

Variable s P-value
InCostAid 3665.8 0.0058
Admit 4203.5 0.1882
GradRate 4282.8 0.3356
OutCostAid 4369.8 0.8025

10.19. (a) Plot shown below. A linear trend looks reasonable, and there is nothing unusual, such as
outliers. (b) ŷ = −24517 + 12.83450Year. (c) The intercept only describes what happens when x is 0;
although a negative number of tornadoes is not possible, year 0 is far outside the range of our data and
not of interest to us. (d) The residual plot, also shown below, looks mostly random, except there seems
to be a little more variation in the past 10 years or so than earlier. (e) The residuals are Normal, as shown
in the Normal quantile plot. (f) Yes, the residuals check out, and we can proceed with our regression
analysis.

10.21. (a) Plot shown below. (b) Plot shown below. The points are much closer to a straight line. (c) The
90% confidence interval is (0.37305, 0.43643). (d) For each year, capacity of memory (in log Kbytes)
increases between 0.37305 and 0.43643 with 90% confidence.

10.23. Table shown below. Answers will vary for preference.

Package   SS                MS            F          r2
SPSS      Sums of Squares   Mean Square   17.096     0.149
Minitab   Adj SS            Adj MS        17.10      14.85%
Excel     SS                MS            17.09644   0.148540143
JMP       Sums of Squares   Mean Square   17.0964    0.14854

10.25. (a) The null hypothesis should test the slope β1, not the intercept β0. (b) Sums of squares add;
mean squares do not. (c) The r² value measures explanatory power, not the P-value. (d) The total df is
equal to n − 1, not n.
10.27. (a) Plot shown below. The pattern shows that obligations are increasing linearly over time. (b) ŷ
= –3609.51 + 1.815Year. (c) The residuals are –0.38, 0.19, 0.76, –0.57.
s = √[((−0.38)² + (0.19)² + (0.76)² + (−0.57)²)/2] = 0.73587. (d) The model is yi = β0 + β1xi + εi, where
εi ~ N(0, σ). The estimate for β0 is b0 = −3609.51; the estimate for β1 is b1 = 1.815; the estimate for σ is
s = 0.73587. (e) The 95% confidence interval for the slope is (1.10702, 2.52298). Obligations go up
between 1.10702 and 2.52298 billion dollars each year.
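Note: The estimate of σ in part (c) is the residual standard error, s = √(Σei²/(n − 2)); a sketch in Python (not part of the printed solution; numpy assumed):

    import numpy as np

    residuals = np.array([-0.38, 0.19, 0.76, -0.57])
    s = np.sqrt(np.sum(residuals**2) / (len(residuals) - 2))
    print(s)   # about 0.736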

10.29. (a) t = 2.25, df = 40, 0.01 < P-value < 0.02; the correlation is significantly greater than 0 at the
5% level. (b) States with more adult binge drinking are more likely to have underage drinking.
r² = 0.1024: 10.24% of the variation in underage drinking can be accounted for by the prevalence of
adult binge drinking. (c) Even though most states were used, it is assumed that sampling took place
within each state; thus, we can still infer about the true unknown correlation. (Had we obtained different
samples from each state, we would have obtained different results.)
10.31.

Source           DF   SS        MS        F
Regression       1    5552.9    5552.9    28.22
Residual Error   23   4525.0    196.739
Total            24   10077.9
10.33. (a) SEb1 = 14.026/√((25 − 1)(18.33)²) = 0.15619. (b) Using df = 23, t* = 2.069, so the interval is
(0.4898, 1.1362). (c) The intercept is meaningful because it tells us what the EAFE is when the S&P is
0, meaning no return in U.S. markets. (d) SEb0 = 14.026·√(1/25 + (12.04)²/((25 − 1)(18.33)²)) = 3.377.
The interval is (−10.177, 3.797).
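Note: Both standard-error formulas can be checked numerically (a sketch, not part of the printed solution, using the quantities quoted above: s = 14.026, sx = 18.33, x̄ = 12.04, n = 25; numpy assumed).

    import numpy as np

    s, sx, xbar, n = 14.026, 18.33, 12.04, 25
    se_b1 = s / np.sqrt((n - 1) * sx**2)
    se_b0 = s * np.sqrt(1/n + xbar**2 / ((n - 1) * sx**2))
    print(se_b1, se_b0)   # about 0.1562 and 3.377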
10.35. The first plot shows nonconstant variance, as the points spread out moving from left to right. The
second plot also shows similar nonconstant variance. The third plot looks good, random, and has no
violations. The fourth plot could have a nonlinear pattern; on the ends of the plot all the points are
positive.
10.37. (a) ŷ = −0.01270 + 0.01796x. r² = 79.98%. The relationship is quite strong, showing a positive
association between the number of beers and BAC. (b) H0: β1 = 0, Ha: β1 ≠ 0. t = 7.48, df = 14, P-value
< 0.0001. There is a significant linear relationship between BAC and the number of beers. For each
additional beer drunk, BAC goes up by 0.01796 on average. (c) (0.04, 0.1142). Because the prediction
interval includes values over 0.08, he cannot be confident he would not be arrested.

10.39. This question will assume the California schools were removed. (a) H0: β1 = 0, Ha: β1 ≠ 0. (b) t =
10.67, P-value < 0.0001. There is evidence of a significant linear relationship between the 2014 and
2008 tuition amounts. (c) The 95% confidence interval is (0.83963, 1.24033). On average, each
additional dollar of 2008 tuition is associated with between $0.84 and $1.24 of 2014 tuition. (d) r² =
81.41%. (e) The intercept represents the tuition at x = 0; this is extrapolation.
        Note: Changes for using the entire dataset are: (b) t = 7.68, with the same P-value; (c) (0.74649,
1.28680) with a similar interpretation; (d) r² = 65.52%.
10.41. Again, this question will assume California schools were removed. (a) ŷ = 4466.19453 +
1.70030x. (b) The residual plot, shown below, looks random and scattered and the assumptions appear
valid. (c) Answers may vary. The r2 value is better using the year 2008 and is therefore likely the better
model.
Note: Changes for using the entire dataset are: (a) ŷ = 5367.62092 + 1.61450x; (b) The plot
has a few more points but overall looks similar.

10.43. All plots and graphs shown below. (a) Percent is strongly right-skewed; a lot of players have a
small percent of their salary devoted to incentive payments. The five-number summary is 0%, 0.31%,
1.43%, 17.65%, and 85.01%. Rating is also right-skewed; the five-number summary is 0, 2.25, 6.31,
12.69, and 27.88. (b) Only the residuals need to be Normal; because the data are somewhat right-skewed,
this could pose a threat to the results. (c) The relationship is quite scattered. The direction is positive, but
any linear relationship is weak. A large number of observations fall close to 0 percent. (d) ŷ = 6.24693 +
0.10634x. (e) The residual plot looks good, with no apparent violations. The Normal quantile plot shows
the violation of Normality and the right-skew we saw earlier.

6
10.45. (a) 30. Generally, this may be true because the sellers might expect buyers to “lowball,” but
markets will vary. (b) The relationship is linear, positive, and strong. (c) House 27 has an assessed value
of 167.9 but a sales price of 360.0. This observation is likely influencing the regression somewhat. (d) ŷ
= 9.0176 + 1.15705x. s = 37.34442. (e) ŷ = 9.43181 + 1.123x. s = 25.39177. (f) The outlier has some
influence on the regression; particularly, the first model that includes the outlier has a much larger
standard error than it does when the observation is removed. (g) Answers will vary.

10.47. (a) About r2 = 8.41% of the variability in AUDIT score is explained by (a linear regression on)
gambling frequency. (b) H 0 : ρ = 0. H a : ρ ≠ 0. t =9.12 (df = 906), for which P-value is very small. (c)
Even though this study found a significant positive correlation between gambling and drinking behaviors,
the relationship is weak. In addition, nonresponse is a problem because the students who did not answer
might have different characteristics from those who did. Because of this, we should be cautious about
considering these results to be representative of all first-year students at this university and even more
cautious about extending these results to the broader population of all first-year students.
10.49. All plots and graphs shown below. (a) Histograms with embedded numerical summaries are
shown below. IBI is slightly left-skewed; Forest is slightly right-skewed. (b) The scatterplot shows a
weak positive association, with more scatter in y for small x. (c) yi = β0 + β1xi + εi, where εi ~ N(0, σ).
(d) H0: β1 = 0, Ha: β1 ≠ 0. (e) ŷ = 59.91 + 0.1531Forest, s = 17.79. For the significance test, t = 1.92,
P-value = 0.0608, so the linear relationship between index of biotic integrity and percent of area that was
forest is not quite significant at the 5% level. (f) The residual plot shows that there is more variation for
small x. (g) As we can
see from a histogram and Normal quantile plot, the residuals are somewhat left-skewed but, otherwise,
seem reasonably close to Normal. (h) Student opinions may vary.

10.51. The precise results of these changes depend on which observation is changed. (Six observations
had 0% Forest and two had 100% Forest.) Specifically, if we change IBI to 0 for one of the first six
observations, the resulting P-value is between 0.019 (observation 6) and 0.041 (observation 3). Changing
one of the last two observations changes the P-value to 0.592 (observation 48) or 0.645 (observation 49).
In general, the first change decreases P-value (that is, the relationship is more significant) because it

8
accentuates the positive association. The second change weakens the association, so P-value increases
(the relationship is less significant).
10.53. Using the model with Area, ŷ = 52.92 + 0.4602(10) = 57.52, and the prediction interval is
(23.5598, 91.4892). Using the model with Forest, ŷ = 59.91 + 0.1531(63) = 69.55, and the prediction
interval is (33.2085, 105.9006). The model using Forest gives a slightly higher predicted value than the
model using Area; however, both prediction intervals are quite wide, suggesting a lot of error in
prediction.
10.55. (a) The scatterplot shows a very linear trend. (b) ŷ = −61.12 + 9.3187Year. r2 = 98.8%. (c) The
rate we seek is the slope. The 99% confidence interval is (8.3562, 10.2812).

10.57. (a) 2016 would be coded as 116. (b) ŷ = −61.12 + 9.3187(116) = 1019.8492, for a prediction of
3.0020m. (c) For a specific year, such as 2016, we would use a prediction interval.
10.59. For n = 15, t = 2.08; for n = 25, t = 2.77. The P-values are 0.0579 and 0.0109. The test with n = 25
is significant at the 0.05 level, but the other is not. Finding the same correlation with more data points is
stronger evidence that the observed correlation is not just due to chance.
10.61. (a) Plot shown below. There is a strong increasing linear relationship between the two test scores.
One student stands out as an outlier, with an ACT score of 21 and an SAT score of 420. (b) ŷ = 1.63 +
0.0214SAT. t = 10.78, P-value < 0.0005 (< 0.0001 from software). ACT and SAT scores appear to be
linearly related. (c) r = √0.667 = 0.8167.

10.63. (a) For SAT: x = 912.7 and s x = 180.1117. For ACT: y = 21.13 and s y = 4.7137. Therefore, the slope
is a 1 = 0.02617, and the intercept is a 0 = −2.7522. (b) The new line is dashed. (c) For example, the first
prediction is −2.7522 + (0.02617)(1000) = 23.42. Up to rounding error, the mean and standard deviation
of the predicted scores are the same as those of the ACT scores: y = 21.13 and s y = 4.7137.
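Note: A sketch in Python of the slope and intercept computation (not part of the printed solution; the numbers are those quoted above):

    xbar, sx = 912.7, 180.1117   # SAT mean and standard deviation
    ybar, sy = 21.13, 4.7137     # ACT mean and standard deviation

    slope = sy / sx                       # match the standard deviations
    intercept = ybar - slope * xbar       # force the line through (xbar, ybar)
    print(slope, intercept)               # about 0.02617 and -2.7522
    print(intercept + slope * 1000)       # prediction for SAT = 1000: about 23.42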

10.65. For each correlation, we compute t = r√(n − 2)/√(1 − r²). For n = 123, correlations between 0.24
and 0.28 have a P-value < 0.01; those between 0.20 and 0.23 have a P-value < 0.05. For n = 96,
correlations between 0.22 and 0.24 have a P-value < 0.05; the others are not significant.

Rule-breaking measure                Popularity   Gene Expression
Sample 1 (n = 123)
  RB.composite                       0.28**       0.26**
  RB.questionnaire                   0.22*        0.23*
  RB.video                           0.24**       0.20*
Sample 1, Caucasians only (n = 96)
  RB.composite                       0.22*        0.23*
  RB.questionnaire                   0.16         0.24*
  RB.video                           0.19         0.16
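Note: The significance computation behind the stars can be wrapped in a small Python function (a sketch, not part of the printed solution; scipy and numpy assumed).

    import numpy as np
    from scipy.stats import t as t_dist

    def corr_pvalue(r, n):
        # Two-sided P-value for H0: rho = 0, using t = r*sqrt(n-2)/sqrt(1-r^2)
        t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
        return 2 * t_dist.sf(abs(t_stat), df=n - 2)

    print(corr_pvalue(0.28, 123))   # < 0.01  (**)
    print(corr_pvalue(0.20, 123))   # < 0.05  (*)
    print(corr_pvalue(0.16, 96))    # not significant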

10.67. (a) For women: (14.72609, 33.32604).For men: (–9.46079, 42.96351). These intervals overlap
quite a bit so the slopes could be quite similar or even the same. (b) These quantities can be computed
from the data, but it is somewhat simpler to recall that they can be found from the sample standard
deviations sx,w and sx,m: sx,w√11 = 6.8684√11 = 22.78 and sx,m√6 = 6.6885√6 = 16.38. The women’s
standard error is smaller in part because it is divided by a larger n. (c) In order to reduce the standard error
for men, we should choose our new sample to include men with a wider variety of lean body masses.
Note: Just taking a larger sample will also likely reduce the standard error, but it is reduced even
more if we choose subjects with more variable lean body masses.

Chapter 11: Multiple Regression
11.1. (a) Final exam scores. (b) 166. (c) Seven. (d) Math course anxiety, math test anxiety, numerical task
anxiety, enjoyment, self-confidence, motivation, and perceived usefulness of the feedback sessions.
11.3. (a) A small P-value indicates only that at least one explanatory variable is significantly different
from zero, not all. (b) The description of R2 is appropriate, but it is not obtained from squaring and adding
the pairwise correlations. (c) The null hypothesis should be about the parameter β 2 , not about b 2 .
11.5. (a) For n =25, df = 22, t* = 2.074. The 95% confidence interval is 6.4 ± 2.074(3.3) = (–0.4442,
13.2442). (b) For n =43, df = 40, t* = 2.021. The 95% confidence interval is 6.4 ± 2.021(2.9) = (0.5391,
12.2609). (c) For n =25, df = 21, t* = 2.080. The 95% confidence interval is 4.8 ± 2.080(2.7) = (–0.816,
10.416). (d) For n =104, df = 100, t* = 1.984. The 95% confidence interval is 4.8 ± 1.984(1.8) = (1.2288,
8.3712).
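Note: All four intervals follow the same recipe, b ± t*·SE(b) with df = n − p − 1 (a sketch in Python, not part of the printed solution; scipy assumed).

    from scipy.stats import t as t_dist

    def coef_ci(b, se, df, level=0.95):
        tstar = t_dist.ppf(1 - (1 - level) / 2, df)
        return b - tstar * se, b + tstar * se

    print(coef_ci(6.4, 3.3, df=22))    # part (a): about (-0.44, 13.24)
    print(coef_ci(4.8, 1.8, df=100))   # part (d): about (1.23, 8.37)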

11.7. (a) y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + ε, where ε ~ N(0, σ) and
independent. (b) The sources of variation are model (DFM = p = 7), error (DFE = n − p − 1 = 134), and
total (DFT = n − 1 = 141).
11.9. (a) Answers may vary, but things do look similar to what is expected. The three anxiety variables
have negative signs, suggesting that higher anxiety is associated with lower final exam scores.
Additionally, the other four variables all have positive signs, suggesting that higher values of those
variables are associated with higher final exam scores. Because these other four variables are all
measuring what we might call “good” attitudes for the students, it may be understandable that they would
be positively associated with the final exam scores. (b) DFM = p = 7. DFE = n − p − 1 = 158. (c) The t
and P-values are shown in the table below. Only Feedback usefulness is significant at the 0.05 level;
none of the other variables is individually significant for final exam scores.

Variable Estimate SE t P-value


Math course anxiety –0.212 0.114 –1.860 0.0648
Math test anxiety –0.155 0.119 –1.303 0.1946
Numerical task anxiety –0.094 0.116 –0.810 0.4190
Enjoyment 0.176 0.114 1.544 0.1246
Self-confidence 0.118 0.114 1.035 0.3022
Motivation 0.097 0.115 0.843 0.4002
Feedback usefulness 0.644 0.194 3.320 0.0011

11.11. (a) Using α = 0.05 and df = 123 – 12 – 1 = 110 (use 100), for significance we need |t| > 1.984.
Thus, Division, November, Weekend, Night, and Promotion are all significant in the presence of all the
other explanatory variables. (b) df = 12 and 110, P-value < 0.001. (c) 52%. (d) 12,493.47 – 788.74 +
2992.75 + 1460.31 – 131.62 – 779.81 = 15,246.36. (e) Because we do not expect the same setting for very
many games, the mean response interval does not make sense; thus, a prediction interval is more
appropriate to represent this particular game and its specific settings.
11.13. Answers will vary. Most will likely prefer the scatterplot matrix because it shows which
relationships are actually somewhat linear. The benefit of the correlation matrix is that we can tell how
strong the relationship is if it is linear.

11.15. The four plots are shown below. Generally, all four plots show the same random scattering, and the
conditions are met for a multiple regression model.

11.17. (a) DFE = n − p − 1. For Model 1: 200. For Model 2: 199. (b) t = 0.204/0.066 = 3.09, P-value =
0.0023. (c) For gene expression: t = 0.161/0.066 = 2.44, P-value = 0.0153. For RB.composite: t =
0.100/0.030 = 3.33, P-value = 0.0010. (d) The relationship is still positive after adjusting for RB. When
gene expression increases by 1, popularity increases by 0.204 in Model 1 and by 0.161 in Model 2 (with
RB fixed).
11.19. (a) ŷ = 15656 + 43.08092Admit + 75.96892Grad + 0.83610InCostAid − 0.29900OutCostAid. (b)
R² = 43.53%. (c) F = 3.66, DF = 4 and 19, P-value = 0.0225. The data provide evidence of a significant
multiple linear regression between average debt and the predictors Admit, Grad, InCostAid, and
OutCostAid. (d) Used in a model together, the four predictors are significant in explaining the average
debt, F = 3.66, P-value = 0.0225. However, some of the predictors are not contributing significantly to
the model: the table below shows that only InCostAid has a significant t test, and the other three are not
significant when added last. We likely could get a more efficient model by removing some of the
nonsignificant predictors, which would make the model better overall.

Variable Parameter Standard t Value Pr > |t|
Estimate Error
Intercept 15656 5515.8709 2.84 0.0105
Admit 43.08092 63.54874 0.68 0.5060
GradRate 75.96892 54.01067 1.41 0.1757
InCostAid 0.8361 0.28907 2.89 0.0093
OutCostAid -0.299 0.16044 -1.86 0.0779

11.21. (a) The 95% prediction interval is (16462, 31773). (b) The 95% prediction interval is (15818,
31347). (c) The intervals give similar predictions for Ohio State University. The model using all four
predictors has only a slightly narrower prediction interval.
11.23. We have p = 8 explanatory variables and n = 795 observations. (a) The ANOVA F test has degrees
of freedom DFM = p = 8 and DFE = n – p –1 = 786. (b) This model explains only R2 = 7.84% of the
variation in energy-drink consumption; it is not very predictive. (c) A positive (negative) coefficient
means that large values of that variable correspond to higher (lower) energy-drink consumption.
Therefore, males and Hispanics consume energy drinks more frequently than females and non-Hispanics.
Consumption also increases with risk-taking scores. (d) Within a group of students with identical (or
similar) values of those other variables, energy-drink consumption increases with increasing jock identity
and increasing risk taking.
11.25. (a) From software, F = 10.44, P-value < 0.0001. There is a significant multiple linear regression
between BMI and the predictors x1 and x2: ŷ = 23.39556 − 0.68175x1 + 0.10195x2, where x1 = (PA −
8.614) and x2 = (PA − 8.614)². (b) The quadratic regression explains R² = 17.71% of the variation in
BMI. (c) The residual plot looks random; no violations. The Normal quantile plot shows that the
residuals are Normal. (d) H0: β2 = 0, Ha: β2 ≠ 0. From software, t = 1.83 with df = 97, for which P-value
= 0.0696, which is not significant but close.

Note: Many of the problems below have so many parts that putting all the plots together is not
ideal for reference. Thus, going forward, many of the tables, plots, and graphs are shown with their
corresponding parts.
11.27. (a) Budget and Opening are right-skewed. Opening has a large outlier that may be influential.
Theaters and Ratings are left-skewed.

(b) The correlations and scatterplot matrix show a linear relationship between the explanatory variables,
though none are very strong.

Budget Opening Theaters Ratings


Budget 1 0.40304 0.56998 0.28109
Opening 0.40304 1 0.62548 0.15132
Theaters 0.56998 0.62548 1 -0.02213
Ratings 0.28109 0.15132 -0.02213 1

11.29. (a) From software, F = 32.28, P-value < 0.0001. There is a significant multiple linear regression
between U.S. revenue and the predictors Budget, Opening, Theaters, and Ratings. USRevenue = β0 +
β1Budget + β2Opening + β3Theaters + β4Ratings + ε, where ε ~ N(0, σ) and independent. (b) ŷ =
−170.40874 + 0.09252Budget + 1.91600Opening + 0.02961Theaters + 17.29923Ratings. (c) The
residual plot shows a slight downward trend as well as some slight nonconstant variation, with residuals
less spread out at lower predicted values; neither is severe, but it does suggest that another model may be
more appropriate. (d) R² = 77.26%.

11.31. (a) ŷ = 66.3431, or $66,343,100; the 95% prediction interval is (−$25,514,500, $158,200,600).
(b) ŷ = 63.0708, or $63,070,800; the 95% prediction interval is (−$28,098,300, $154,239,900). (c)
Although the first model has a slightly larger predicted value, the interval widths are nearly identical;
both models give equally good predictions.
11.33. (a) Teaching and research are both right-skewed. Citations are left-skewed.

(b) The correlations are shown in the table below. Teaching and research are strongly linearly related.
Citations do not look related to either teaching or research.

Teaching Research Citations


Teaching 1 0.89934 0.18775
Research 0.89934 1 0.06905
Citations 0.18775 0.06905 1

[Scatterplot matrix of Teaching, Research, and Citations not shown.]

11.35. (a) Overall = β 0 + β 1 Teaching + β 2 Research + β 3 Citations + ε, where ε ~N(0, σ) and independent.
(b) From software, F = 736.38, P-value < 0.0001. There is a significant multiple linear regression
between overall total score and the predictors Teaching, Research, and Citations. ŷ = 8.16814 +
0.26432Teaching + 0.32800Research + 0.27513Citations. (c) For teaching: (0.19280, 0.33583). For
research: (0.26790, 0.38810). For Citations: (0.23850, 0.31176). The intervals should not contain 0
because all three predictors had significant t tests. (d) R2 = 97.74%. s = 1.72272.
11.37. (a) All distributions are skewed to varying degrees: GINI and CORRUPT to the right; the other
three to the left. CORRUPT, DEMOCRACY, and LIFE have the most skewness.

(b) As the correlations and scatterplots below show, LSI seems moderately correlated with Corrupt,
Democracy, and Life but is not related to GINI much at all. Among the others, only Corrupt seems to be
moderately related to both Democracy and Life; other relationships appear weak.

LSI GINI CORRUPT DEMOCRACY LIFE


LSI 1 -0.05025 0.69741 0.60923 0.72186
GINI -0.05025 1 -0.34846 -0.15219 -0.39614
CORRUPT 0.69741 -0.34846 1 0.74744 0.65034
DEMOCRACY 0.60923 -0.15219 0.74744 1 0.52471
LIFE 0.72186 -0.39614 0.65034 0.52471 1

[Scatterplot matrix of LSI, GINI, CORRUPT, DEMOCRACY, and LIFE not shown.]

11.39. (a) The coefficients, standard errors, t statistics, and P-values are given in the Minitab output
shown with the solution to Exercise 11.38. (b) Student observations will vary. For example, the t statistic
for the GINI coefficient grows from t = −0.42 (P = 0.675) to t = 4.25 (P < 0.0005). The DEMOCRACY t
is 3.53 in the third model (P < 0.0005) but drops to 0.71 (P = 0.479) in the fourth model. (c) A good
choice is to use GINI, LIFE, and CORRUPT (Minitab output follows). All three coefficients are
significant, and R2 = 70% is nearly the same as the fourth model from Exercise 11.34.
11.41. (a) From software, F = 22.34, P-value < 0.0001. There is a significant linear regression between
VO+ and the predictor OC. ŷ = 334.03439 + 19.50471OC. The residual plot shows a possible outlier
with a large residual value. (b) From software, F = 21.62, P-value < 0.0001. There is a significant

multiple linear regression between VO+ and the predictors OC and TRAP. ŷ = 57.70419 + 6.41466OC +
53.39331TRAP. In this model, TRAP is much more significant (t = 3.50, P-value = 0.0016) than OC (t =
1.25, P-value = 0.2210), suggesting it is a more useful predictor; this is consistent with the correlations
given in the previous exercise, which showed that TRAP was more highly correlated with VO+ than OC
was.
11.43. All variables are Normal or roughly Normal when log-transformed.

Scatterplots and correlations (below) show that all six pairs of variables are positively associated. The
strongest association is between LVO+ and LVO–; the weakest is between LOC and LVO–.

logvoplus logoc logtrap logvominus


logvoplus 1 0.77353 0.7549 0.83961
logoc 0.77353 1 0.79545 0.55449
logtrap 0.7549 0.79545 1 0.6642
logvominus 0.83961 0.55449 0.6642 1

[Scatterplot matrix of logvoplus, logoc, logtrap, and logvominus not shown.]

The regression equations for these transformed variables are given in the table below, along with
significance test results. Residual analysis for these regressions is not shown. The final conclusion is the
same as for the untransformed data: When we use all three explanatory variables to predict LVO+, the
coefficient of LTRAP is not significantly different from 0; we then find that the model that uses LOC and
LVO– to predict LVO+ is nearly as good (in terms of R2), thus making it the best of the bunch.

Variable Estimate SE t P-value R-square s


Intercept 4.38523 0.36426 12.04 <.0001
logOC 0.70602 0.10742 6.57 <.0001 0.5983 0.35801
Intercept 4.26023 0.35063 12.15 <.0001
logOC 0.43007 0.16805 2.56 0.0162
logTRAP 0.42397 0.20538 2.06 0.0484 0.6514 0.33943
Intercept 0.87239 0.6402 1.36 0.1842
logOC 0.39203 0.11538 3.40 0.0021
logTRAP 0.02746 0.15697 0.17 0.8624
logVO– 0.67248 0.11778 5.71 <.0001 0.8421 0.23266
Intercept 0.83298 0.58878 1.41 0.1682
logOC 0.40589 0.08242 4.92 <.0001
logVO– 0.68159 0.10378 6.57 <.0001 0.8419 0.22859

11.45. Switching logVO+ and logVO– does not change the individual and pairwise analysis done
previously.
The regression equations for these variables are given in the table below, along with significance test
results. Residual analysis for these regressions is not shown. In this situation, the best model for
predicting logVO– is just using the predictor logVO+ alone; neither biomarker variable makes a
worthwhile contribution to the prediction.

Variable Estimate SE t P-value R-square s


Intercept 5.21169 0.41615 12.52 <.0001
logOC 0.44033 0.12272 3.59 0.0012 0.3075 0.40901
Intercept 5.03784 0.38561 13.06 <.0001
logOC 0.05656 0.18482 0.31 0.7618
logTRAP 0.58962 0.22587 2.61 0.0144 0.4430 0.37329
Intercept 1.57276 0.66196 2.38 0.0249
logOC -0.29324 0.14072 -2.08 0.0468
logTRAP 0.24478 0.16618 1.47 0.1523
logVO+ 0.81336 0.14246 5.71 <.0001 0.7477 0.25587
Intercept 1.75657 0.59361 2.96 0.0061
logVO+ 0.7305 0.08776 8.32 <.0001 0.7049 0.26697

11.47. (a) PCB = β 0 + β 1 PCB52 + β 2 PCB118 + β 3 PCB138 + β 4 PCB180 + ε, where ε ~ N(0, σ) and
independent. (b) From software, F = 1456.18, P-value < 0.0001. There is a significant multiple linear
regression between PCB and the predictors. ŷ = 0.93692 + 11.87270PCB52 + 3.76107PCB118 +
3.88423PCB138 + 4.18230PCB180. Additionally, all individual predictors are significant. (c) The
residual plot shows a possible violation of constant variance. The residuals are Normal, except for two
possible outliers.

11.49. (a) From software, F = 786.71, P-value < 0.0001. There is a significant multiple linear regression
between PCB and the predictors. ŷ = –1.01840 + 12.64419PCB52 + 0.31311PCB118 +
8.25459PCB138. (b) b 118 = 0.31311, P-value = 0.7083. (c) b 118 = 3.76107, P-value < 0.0001. (d) This
illustrates how complicated multiple regression can be: When we add PCB180 to the model, it
complements PCB118, making it useful for prediction.

11.51. TEQ = β0 + β1PCB52 + β2PCB118 + β3PCB138 + β4PCB180 + ε, where ε ~ N(0, σ) and
independent. From software, F = 33.53, P-value < 0.0001. There is a significant multiple linear
regression between TEQ and the predictors. Not all predictors are significant; only PCB118 is, as shown
in the output below. The residual plot shows a couple of potential outliers, which are also causing a
slight right-skew in the Normal quantile plot.

Variable DF Parameter Standard t Pr > |t|


Estimate Error Value
Intercept 1 1.05997 0.18445 5.75 <.0001
pcb52 1 -0.09728 0.10938 -0.89 0.3772
pcb118 1 0.30618 0.09639 3.18 0.0023
pcb138 1 0.10579 0.0747 1.42 0.1616
pcb180 1 -0.00391 0.06478 -0.06 0.9521

11.53. (a) The correlations (all positive) are listed in the table below. The largest correlation is 0.956
(LPCB and LPCB138); the smallest 0.227 (LPCB28 and LPCB180) is not quite significantly different
from 0 (t = 1.91, P = 0.0607), but, with 28 correlations, such a P-value could easily arise by chance, so
we would not necessarily conclude that ρ = 0 (or that ρ ≠ 0).
Rather than showing all 28 scatterplots—which are all fairly linear and confirm the positive associations
suggested by the correlations—we have included only two of the interesting ones: LPCB against LPCB28
and LPCB against LPCB126. The former is notable because of one outlier (specimen 39) in LPCB28; the
latter stands out because of the “stack” of values in the LPCB126 data set that arose from the adjustment
of the zero terms. (The outlier in LPCB28 and the stack in LPCB126 can be seen in other plots involving
those variables; the two plots shown are the most appropriate for using the PCB congeners to predict
LPCB, as the next exercise asks.) (b) All correlations are higher with the transformed data. In part, this is
because these scatterplots do not exhibit the “greater scatter in the upper right,” which was seen in many
of the scatterplots of the original data.

LPCB28 LPCB52 LPCB118 LPCB126 LPCB138 LPCB153 LPCB180
LPCB52 0.795
LPCB118 0.533 0.671
LPCB126 0.272 0.331 0.739
LPCB138 0.387 0.540 0.890 0.792
LPCB153 0.326 0.519 0.780 0.647 0.922
LPCB180 0.227 0.301 0.654 0.695 0.896 0.867
LPCB 0.570 0.701 0.906 0.729 0.956 0.905 0.829
[Scatterplots of LPCB versus LPCB28 and LPCB versus LPCB126 not shown.]

11.55. Student results will vary with how many different models they try and what trade-off they consider
between “good” (in terms of large R2) and “simple” (in terms of the number of variables included in the
model). SAS output showing the highest R2 for each model size is shown below. A stepwise selection in
SAS selects the model with variables: logPCB28, logPCB118, and logPCB126.

# variables   R-Square   C(p)      MSE       Variables in Model
1             0.7294     10.7774   0.09776   logpcb126
2             0.7681     1.9430    0.08505   logpcb28 logpcb126
3             0.7764     1.6247    0.08328   logpcb28 logpcb126 logpcb118
4             0.7805     2.4761    0.08303   logpcb153 logpcb28 logpcb126 logpcb118
5             0.7814     4.2165    0.08399   logpcb153 logpcb28 logpcb52 logpcb126 logpcb118
6             0.7818     6.1037    0.08519   logpcb153 logpcb180 logpcb28 logpcb52 logpcb126 logpcb118
7             0.7822     8.0000    0.08644   logpcb138 logpcb153 logpcb180 logpcb28 logpcb52 logpcb126 logpcb118

11.57. (a) None of the variables show striking deviations from Normality in the quantile plots (not
shown). Taste and H2S are slightly right-skewed, and Acetic has an irregular shape. There are no outliers.

Variable   Mean         Median   Std Dev      Lower Quartile   Upper Quartile   Quartile Range
taste      24.5333333   20.95    16.2553828   13.4             37.3             23.9
acetic     5.4980333    5.425    0.5708784    5.236            5.892            0.656
h2s        5.9417667    5.329    2.1268792    3.912            7.601            3.689
lactic     1.442        1.45     0.30349      1.25             1.68             0.43

11.59. From software, F = 12.11, P-value = 0.0017. There is a significant regression between taste and
acetic. ŷ = –61.49861 + 15.64777Acetic. R2 = 30.20%. The scatterplot is shown below. The Normal
quantile plot shows the residuals are Normally distributed; however, the scatterplots of the residuals show
that the residuals are linearly related to both H2S and Lactic.

11.61. From software, F = 27.55, P-value < 0.0001. There is a significant regression between taste and
lactic. ŷ = −29.85883 + 37.71995Lactic. R² = 49.59%. The scatterplot is shown below. The Normal
quantile plot shows the residuals are Normally distributed; however, the scatterplots of the residuals
show that the residuals are linearly related to both Acetic and H2S.

11.63. The regression equation is ŷ = −26.94 + 3.801 Acetic + 5.146 H2S with s = 10.89 and R2 = 0.582.
The t value for the coefficient of Acetic is 0.84 (P-value = 0.406), indicating that it does not significantly
add to the model when H2S is used because Acetic and H2S are correlated (in fact, r = 0.618 for these
two variables). This model does a better job than any of the three simple linear regression models, but it is
not much better than the model with H2S alone (which explains 57.1% of the variation in Taste), as we
might expect from the t test result.

11.65. The regression equation is ŷ = −28.88 + 0.328 Acetic + 3.912 H2S + 19.671 Lactic with s =
10.13. The model explains 65.2% of the variation in Taste (the same as for the model with only H2S and
Lactic). Residuals of this regression appear to be Normally distributed and show no patterns in
scatterplots with the explanatory variables. (These plots are not shown.) Acetic is not significant (P-value
= 0.942); there is no gain in adding Acetic to the model with H2S and Lactic. It appears that the best
model is the H2S/Lactic model from the previous exercise.

Chapter 12 One-Way Analysis of Variance

12.1. (a) ANOVA tests the null hypothesis that the population means are all equal. (b) Experiments are
best for establishing causation. (c) ANOVA is used to compare means (and assumes that the variances
are equal). (d) Multiple comparison procedures are used when we wish to determine which means are
significantly different but have no specific relations in mind prior to looking at the data.

12.3. x ij = µ i + ε ij , i = 1, 2, 3, j = 1, 2, … , 20. ε ij ~ N(0, σ). There are I = 3 groups and n i = 20 subjects per
group. The parameters of the model are µ 1 , µ 2 , µ 3 , and σ.
12.5. (a) Yes, the largest s is less than twice the smallest s: 80 < 2(68) = 136. (b) The estimates for µ1,
µ2, and µ3 are 279, 245, and 258. s²p = [(20 − 1)(78)² + (20 − 1)(68)² + (20 − 1)(80)²]/[(20 − 1) +
(20 − 1) + (20 − 1)] = 5702.67. The estimate for σ is sp = √5702.67 = 75.516.
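Note: The pooled variance weights each sample variance by its degrees of freedom; a sketch in Python (not part of the printed solution; numpy assumed):

    import numpy as np

    n = np.array([20, 20, 20])
    s = np.array([78.0, 68.0, 80.0])

    sp2 = np.sum((n - 1) * s**2) / np.sum(n - 1)
    print(sp2, np.sqrt(sp2))   # about 5702.67 and 75.52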
12.7. The Normal quantile plot shows the residuals are Normally distributed.

12.9. (a) This sentence describes between-group variation. Within-group variation is the variation that
occurs by chance among members of the same group. (b) The sums of squares (not the mean squares) in
an ANOVA table will add; that is, SST = SSG + SSE. (c) The common population standard deviation σ
(not its estimate s p ) is a parameter. (d) A small P means the means are not all the same, but the
distributions may still overlap quite a bit. (See the “Caution” immediately preceding this exercise in the
text.)
12.11. (a) With I = 5 groups and N = 30, we have df = I − 1 = 4 and N − I = 25. In Table E, we see that
4.18 < F < 6.49. (b) A sketch (not shown) would mark the observed F value and the critical values from
Table E. (c) 0.001 < P-value < 0.01. (d) Because the P-value is small, we reject H0; however, this does
not say that all pairs of group means are different, only that at least one mean differs from the others.
12.13. Generally, the set with the largest difference between means and the smallest standard deviation
will be the most significant; if this is not clear, the ratio of the two can be used. Part (b) can be ruled out
because it has a larger sigma than part (a) with the same means. Of the remaining two, part (c) has a
bigger maximum difference relative to its sigma (10 to 3) than part (a) does (5 to 2), so part (c) will be
the most significant.
12.15. F = MSG/MSE. DFG = I – 1, DFE = N – I. (a) F = 340/50 = 6.8, DFG = 3 − 1 = 2, DFE = 63 − 3
= 60. 0.001 < P-value < 0.01. (b) DFG = 8 − 1 = 7, DFE = 48 − 8 = 40. MSG = 77/7 = 11, MSE =
190/40 = 4.75. F = 11/4.75 = 2.32, 0.025 < P-value < 0.05.
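Note: Part (b) as a quick computation (a sketch, not part of the printed solution; scipy assumed):

    from scipy.stats import f as f_dist

    dfg, dfe = 8 - 1, 48 - 8
    msg, mse = 77 / dfg, 190 / dfe
    F = msg / mse
    print(F, f_dist.sf(F, dfg, dfe))   # about 2.32 and P ≈ 0.04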

12.17. (a) Response: egg cholesterol level. Populations: chickens with different diets or drugs. I = 3, n 1 =
n 2 = n 3 =25, N = 75. (b) Response: rating on seven-point scale. Populations: the three groups of students. I
= 3, n 1 = 31, n 2 = 18, n 3 =45, N = 94. (c) Response: quiz score. Populations: students in each TA group. I
= 3, n 1 = n 2 = n 3 =14, N = 42.
12.19. For all three situations, we have H0: µ1 = µ2 = µ3 and Ha: not all of the µi are equal. DFG = I − 1 = 2,
DFE = N − I, and DFT = N − 1. The degrees of freedom for the F test are DFG and DFE.

Situation I N DFG DFE DFT df for F statistic


(a) Egg cholesterol level 3 75 2 72 74 F(2, 72)
(b) Student opinions 3 94 2 91 93 F(2, 91)
(c) Teaching assistants 3 42 2 39 41 F(2, 39)
12.21. (a) This sounds like a fairly well-designed experiment, so the results should at least apply to this
farmer’s breed of chicken. (b) It would be good to know what proportion of the total student body falls in
each of these groups—that is, is anyone overrepresented in this sample? (c) How well a TA teaches one
topic (power calculations) might not reflect that TA’s overall effectiveness.
12.23. (a) The plot suggests that both drugs cause an increase in activity level, and Drug B appears to
have a greater average effect. (b) Yes, the largest s is less than twice the smallest s: √17.2 = 4.15 <
2√7.75 = 2(2.784) = 5.57. s²p = 12.16 and sp = 3.487. (c) DFG = I − 1 = 4 and DFE = N − I = 20. (d)
Compared with an F(4, 20) distribution in Table E, we see that 2.25 < F < 2.87, so 0.05 < P-value < 0.10
(software gives 0.0642). We do not have significant evidence of a difference in mean effect.
12.25. (a) With I = 5 and N = 183, DFG = 4 and DFE = 178. (b) If the reported df were correct (meaning
32 athletes dropped out of the study), 151 athletes actually participated. (c) Answers will vary. For
example, the individuals could have been outliers in terms of their ability to withstand the water-bath
pain. In either case of low or high outliers, their removal would lessen the standard deviation for their
sport and move that sports mean (removing a high outlier would lower the mean, and removing a low
outlier would raise the mean).
12.27. To compare males and females, the coefficients are a 1 = 0.5, a 2 = 0.5, a 3 = −0.5, a 4 = −0.5.
Note: Technically any multiple of the coefficients would also be correct, such as −0.5, −0.5, 0.5,
0.5 or 1, 1, −1, −1.
12.29. Because there are only two groups, if the ANOVA test establishes that there are differences in
means, then we already know that the two means we have must be different. In this case contrasts and
multiple comparisons provide no further useful information.
12.31. The power would be larger. For larger differences between alternative means, λ gets bigger,
increasing our power to see these differences.
12.33. (a) Yes, the largest s is less than twice the smallest s: 0.824 < 2(0.657) = 1.314.
s²p = [(489 − 1)(0.804)² + (69 − 1)(0.824)² + (212 − 1)(0.657)²]/(489 + 69 + 212 − 3) = 0.5902, so
sp = 0.7683. (b) Comparing F = 17.66 with an F(2, 767) distribution, we find P-value < 0.001; the 1%
critical value is 4.633, so the observed value lies well above the bulk of this distribution. (c) For the
contrast ψ = µ2 − 0.5(µ1 + µ3), we test H0: ψ = 0 versus Ha: ψ > 0. We find c = 0.585 with SEc =
0.0977, so t = 5.99 with df = 767, and P-value < 0.0001.
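Note: The contrast test in part (c) can be checked numerically (a sketch, not part of the printed solution; numpy and scipy assumed):

    import numpy as np
    from scipy.stats import t as t_dist

    a = np.array([-0.5, 1.0, -0.5])      # psi = mu2 - 0.5*(mu1 + mu3)
    n = np.array([489, 69, 212])         # group sizes
    sp, c = 0.7683, 0.585                # pooled s and estimated contrast, from above

    se_c = sp * np.sqrt(np.sum(a**2 / n))
    t_stat = c / se_c
    print(se_c, t_stat, t_dist.sf(t_stat, df=n.sum() - 3))   # 0.0977, 5.99, P < 0.0001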
12.35. (a) ψ = µ2 − 0.5µ1 − 0.5µ4. (b) ψ = (1/3)µ1 + (1/3)µ2 + (1/3)µ4 − µ3.
12.37. Let µ1 be the placebo mean, µ2 and µ3 be the means for low and high doses of Drug A, and µ4
and µ5 be the means for low and high doses of Drug B. Recall from Exercise 12.23 that sp = 3.487. (a)
The first contrast is ψ1 = µ1 − (1/2)(µ2 + µ4); the second is ψ2 = (µ3 − µ2) − (µ5 − µ4). (b) The
estimated contrasts are c1 = 11.80 − 0.5(15.25 + 16.15) = −3.9 and c2 = (18.55 − 15.25) − (17.10 −
16.15) = 2.35. The respective standard errors are SEc1 = sp√(1/4 + 0.25/4 + 0/4 + 0.25/4 + 0/4) = 2.1353
and SEc2 = sp√(0/4 + 1/4 + 1/4 + 1/4 + 1/4) = sp = 3.487. (c) The first contrast is significant (t1 =
−1.826, one-sided P-value = 0.0414), but the second is not (t2 = 0.674, one-sided P-value = 0.2540). We
have enough evidence to conclude that low doses increase activity level over a placebo, but we cannot
conclude that activity-level changes due to increased dosage differ between the two drugs.
12.39. (a) If we believe the average score for the four off-task behaviors is less than the paper-and-pencil
control, a contrast would be ψ = µ7 − 0.25(µ1 + µ2 + µ3 + µ4), keeping the convention that we believe
the contrast will be positive. (b) H0: ψ = 0 and Ha: ψ > 0. (c) ψ̂ = 0.62333 − 0.25(0.62667 + 0.57 +
0.53667 + 0.53667) = 0.05583. The standard error is SEψ̂ = 0.1205√(1/21 + (0.25)²(1/21 + 1/20 + 1/20
+ 1/21)) = 0.0295. The test statistic is t = 0.05583/0.0295 = 1.894, with P-value = 0.0302. We can
conclude that students using paper and pencil (“on task”) do significantly better on quizzes such as these
than students who are off-task using digital technologies.
12.41. The power would be larger. For larger differences between alternative means, λ gets bigger,
increasing our power to see these differences.
12.43. (a) With roughly equal means, we have F = 0.00 and P-value = 0.9995. (b) As the means become
more different, F increases and P-value decreases. Here, we have F = 55.53 and P-value < 0.0001.

12.45. (a) The pricing among triple-play providers does seem different. Cable has the highest prices
followed by DSL. Fiber has the cheapest prices for triple-play. (b) Yes, the largest s is less than twice the
smallest s: 40.39 < 2(26.09) = 52.18. (c) df = 2 and 44, 0.025 < P-value < 0.05. There are significant
differences in triple-play pricing among the different provider platforms.
12.47. (a) Plot shown below. The Portals and Transactions stages seem to have higher integration features
than the Presence stage. (b) An observational study. They are not imposing a treatment on the winery. (c)
Yes, the largest s is less than twice the smallest s; 2.346 < 2(2.097) = 4.194. (d) This shouldn’t be a
problem because our inference is based on sample means, which will be approximately Normal given the
sample sizes. (e) F(2, 190); P-value < 0.001. There are significant differences in the number of market
integration features of the wineries among those with different website stages.

12.49. (a) H0: µ1 = µ2 = µ3, Ha: not all of the µi are equal. F = 5.31, P-value = 0.0067. There are
significant interval differences among the three groups. (b) The Bonferroni procedure shows that Group 2
is not significantly different from either Group 1 or Group 3; however, Group 3 is significantly different
(larger) from Group 1. (c) This is not appropriate. The regression assumes that Group 2 (coded as 2)
would have twice the effect of Group 1 (coded as 1), and Group 3 (coded as 3) would have three times the
effect of Group 1, etc. This is likely not true.
12.51. (a) Table shown below. Pooling is appropriate; the largest s is less than twice the smallest s:
0.621669 < 2(0.572914) = 1.146. (b) Histograms shown below; while the distributions aren’t Normal,
there are no outliers or extreme departures from Normality that would invalidate the results. We can
likely proceed with the ANOVA.

Level of Food   N    Score Mean   Score Std Dev
Comfort         22   4.8873       0.5729
Control         20   5.0825       0.6217
Organic         20   5.5835       0.5936
12.53. (a) H0: µ1 = µ2 = µ3, Ha: not all of the µi are equal. F = 8.89, P-value = 0.000. There are
significant differences in the number of minutes that the three groups are willing to volunteer. According
to the Tukey multiple comparison, the Comfort group is willing to donate significantly more minutes than
the Organic group. In other words, the Comfort group shows more prosocial behavior than the Organic
group. The Control group is in the middle, not significantly different from either the Comfort or Organic
group in the number of minutes they are willing to donate. (b) The residual plot shows a slight decrease in
variability, suggesting a possible violation of constant variance. The Normal quantile plot looks fine and
shows a roughly Normal distribution.
12.55. (a) Table shown below. (b) Yes, the largest s is less than twice the smallest s; 11.501 < 2(9.078) =
18.156. (c) Histograms shown below. All three distributions are roughly Normal.

Level of Group   N    Loss Mean   Loss Std Dev
Ctrl             35   −1.0086     11.5007
Grp              34   −10.7853    11.1392
Indiv            35   −3.7086     9.0784
12.57. (a) All weight loss values are divided by 2.2. (b) F = 7.77, df = 2 and 101, P-value = 0.0007. The
results are identical with the transformed data regarding the test statistic, df, and P-value: transforming
the response variable by a fixed amount has no effect on the ANOVA results.
12.59. (a) The variation in sample size is some cause for concern, but there can be no extreme outliers
on a 1-to-7 scale, so ANOVA is probably reliable. (b) Pooling is appropriate; the largest s is less than
twice the smallest s: 1.26 < 2(1.03) = 2.06. (c) With I = 5 groups and total sample size N = 410, we use
an F(4, 405) distribution. We can compare 5.69 with an F(4, 200) distribution in Table E and conclude
that P-value < 0.001, or with software determine that P-value = 0.0002.
(d) Hispanic Americans have the highest emotion scores, Japanese are in the middle, and the other three
cultures are the lowest.
12.61. (a) df = 2, 117. (b) P-value < 0.001. There are significant average reduction differences among the
different groups of bargainers. (c) Because the bargainer was the same person each time, there is no way
to differentiate if the average reduction was due to race/gender or due to individual ability to bargain.
Hence, the results would certainly not be generalizable.
12.63. (a) The means, standard deviations, and standard errors are given in the table below (all in grams
per cm²). (b) All three distributions appear to be reasonably close to Normal, and the standard deviations
are suitable for pooling. (c) ANOVA gives F = 7.72 (df 2 and 42) and P-value = 0.001, so we conclude
that the means are not all the same. (d) With df = 42, three comparisons, and α = 0.05, the Bonferroni
critical value is t** = 2.4937. The pooled standard deviation is sp = 0.01437, and the standard error of
each difference is SED = sp√(1/15 + 1/15) = 0.005247, so two means are significantly different if they
differ by t**·SED = 0.01308. The high-dose mean is significantly different from the other two. (e)
Briefly: High doses of kudzu isoflavones increase BMD.

[Plot of mean BMD (g/cm²) for the Control, Low dose, and High dose groups not shown.]

            n    x̄        s         sx̄
Control     15   0.2189   0.01159   0.002992
Low dose    15   0.2159   0.01151   0.002972
High dose   15   0.2351   0.01877   0.004847

Minitab Output: BMD versus Treatment

Source      DF   SS         MS         F      P
Treatment   2    0.003186   0.001593   7.72   0.001
Error       42   0.008668   0.000206
Total       44   0.011853

S = 0.01437   R-Sq = 26.88%   R-Sq(adj) = 23.39%
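Note: The Bonferroni cutoff in part (d) comes from t** = t(1 − α/2k) with k = 3 comparisons; a sketch in Python (not part of the printed solution; numpy and scipy assumed):

    import numpy as np
    from scipy.stats import t as t_dist

    means = {"Control": 0.2189, "Low dose": 0.2159, "High dose": 0.2351}
    sp, n, dfe, k, alpha = 0.01437, 15, 42, 3, 0.05

    tstar = t_dist.ppf(1 - alpha / (2 * k), dfe)   # about 2.494
    cutoff = tstar * sp * np.sqrt(1/n + 1/n)       # about 0.0131
    for i, (ga, ma) in enumerate(means.items()):
        for gb, mb in list(means.items())[i + 1:]:
            print(ga, "vs", gb, "significant:", abs(ma - mb) > cutoff)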

12.65. (a) Pooling is appropriate; the largest s is less than twice the smallest s: 27.364 < 2(16.594) =
33.188. (b) ANOVA gives F = 7.98 (df 2 and 27), for which P-value = 0.002. There are significant
differences in bone density between the different jump-condition groups; specifically, it looks like the
high-jump group has a much higher mean bone density than the other two groups.

            n    x̄       s
Control     10   601.1   27.364
Low jump    10   612.5   19.329
High jump   10   638.7   16.594

One-Way ANOVA: Density Versus Group

Source   DF   SS      MS     F      P
Group    2    7434    3717   7.98   0.002
Error    27   12580   466
Total    29   20013

S = 21.58   R-Sq = 37.14%   R-Sq(adj) = 32.49%

12.67. sp = √2.421 = 1.556 and df = 117. (a) The contrast is ψsex = µ1 − µ3; csex = 1.055 − 2.310 =
−1.255 and SEc = sp√(1/40 + 1/40) = 0.3479, so t = −1.255/0.3479 = −3.6, P-value < 0.001. The data
show a difference between the sexes among Hispanics. (b) The contrast is ψnat = µ1 − µ2; cnat = 1.055 −
1.050 = 0.005 and SEc = sp√(1/40 + 1/40) = 0.3479, so t = 0.005/0.3479 = 0.014, P-value > 0.25. The
data do not show evidence of a difference between nationalities among males. (c) The conclusions from
the contrasts were limited to comparing sex among Hispanics and nationality among males. Including
Anglo females would alleviate this limited inference by allowing comparison of sex within both
nationalities and of nationality within both sexes, as well as a possible interaction effect.
12.69. (a) Table shown below. Pooling is not appropriate; the largest s is more than twice the smallest s:
8.6603 > 2(2.8868) = 5.7736. (b) ANOVA gives F = 137.94 (df 5 and 12), for which P-value < 0.0005.
There are significant Gpi differences among the six types of scaffold material.

       n   Mean     s
ECM1   3   65.00%   8.66%
ECM2   3   63.33%   2.89%
ECM3   3   73.33%   2.89%
MAT1   3   23.33%   2.89%
MAT2   3   6.67%    2.89%
MAT3   3   11.67%   2.89%

Minitab Output: Gpi Versus Material

Source     DF   SS        MS       F        P
Material   5    13411.1   2682.2   137.94   0.000
Error      12   233.3     19.4
Total      17   13644.4

S = 4.410   R-Sq = 98.29%   R-Sq(adj) = 97.58%

12.71. For convenience, the means and sample sizes are repeated below. In Exercise 12.26, we found sp
= 18.421. (a) We have ψ1 = µC − 0.25(µ30×1 + µ30×2 + µ60×1 + µ60×2), ψ2 = 0.5(µ30×1 + µ30×2) −
0.5(µ60×1 + µ60×2), and ψ3 = 0.5(µ60×1 − µ60×2) − 0.5(µ30×1 − µ30×2). (b) c1 = −6.3 − 0.25(−17.4 − 18.4
− 24 − 24) = 14.65, c2 = 0.5(−17.4 − 18.4) − 0.5(−24.0 − 24.0) = 6.1, c3 = 0.5(−24 + 24) − 0.5(−17.4 +
18.4) = −0.5. SEc1 = 4.209, SEc2 = SEc3 = 3.784. (c) All tests are two-sided (we are interested only in
the comparisons; no direction was given). There were df = 114 for error in the ANOVA. The test results
are t1 = 14.65/4.209 = 3.481, P-value < 0.001 (from Table D, df = 100) and P-value = 0.0007 (from
software); t2 = 6.1/3.784 = 1.612, 0.10 < P-value < 0.20 (from Table D, df = 100) and P-value = 0.1097
(from software); t3 = −0.5/3.784 = −0.132, P-value > 0.5 (from Table D, df = 100) and P-value = 0.8952
(from software). On average, massage relieves arthritis pain in the knee better than conventional
treatment; there is not enough evidence of a difference between the 30- and 60-minute averages; and
there is no significant difference between the once- and twice-a-week effects across the two durations. It
appears that 60 min once or twice a week relieves knee osteoarthritis pain best.

n Mean
30 min, 1 × /wk 22 −17.4
30 min, 2 × /wk 24 −18.4
60 min, 1 × /wk 24 −24.0
60 min, 2 × /wk 25 −24.0
Usual care 24 −6.3
(control)
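
Each contrast estimate and standard error above follows the general formulas c = Σ aᵢx̄ᵢ and SE_c = s_p√(Σ aᵢ²/nᵢ). A Python sketch of the computation (a sketch only; the coefficient vectors encode ψ_1, ψ_2, ψ_3 as defined in part (a)):

import numpy as np

means = np.array([-17.4, -18.4, -24.0, -24.0, -6.3])  # 30x1, 30x2, 60x1, 60x2, control
ns = np.array([22, 24, 24, 25, 24])
sp = 18.421

a1 = np.array([-0.25, -0.25, -0.25, -0.25, 1])   # psi_1: control vs. average massage
a2 = np.array([0.5, 0.5, -0.5, -0.5, 0])         # psi_2: 30 min vs. 60 min
a3 = np.array([-0.5, 0.5, 0.5, -0.5, 0])         # psi_3: difference of differences

for a in (a1, a2, a3):
    c = np.sum(a * means)
    se = sp * np.sqrt(np.sum(a**2 / ns))
    print(c, se, c / se)   # 14.65/4.209, 6.1/3.784, -0.5/3.784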

12.73. (a) The plot shows granularity (because the scores are all integers), but otherwise independence is
not violated. (b) Pooling is appropriate; the largest s is less than twice the smallest s: 1.6 < 2(0.93) = 1.86.
(c) The individual Normal quantile plots for each shampoo show a lot of granularity due to the integer
values, but most look at least roughly Normal. (d) The Normal quantile plot of the residuals looks even
better and shows that the residuals are close to Normally distributed despite the granularity.
12.75. (a) The three contrasts are ψ_1 = (1/3)(µ_Pyr1 + µ_Pyr2 + µ_Keto) − µ_Placebo, ψ_2 = 0.5(µ_Pyr1 +
µ_Pyr2) − µ_Keto, and ψ_3 = µ_Pyr1 − µ_Pyr2. (b) The pooled standard deviation is s_p = √MSE = 1.1958.
The estimated contrasts and their standard errors are in the table below. For example,
SE_c1 = s_p√((1/9)(1/112) + (1/9)(1/109) + (1/9)(1/106) + 1/28) = 0.2355. (c) We test H_0: ψ_i = 0 versus
H_a: ψ_i ≠ 0 for each contrast. The t and P-values are given in the table. The Placebo mean is significantly
higher than the average of the other three, while the Keto mean is significantly lower than the average of
the two Pyr means. The difference between the Pyr means is not significant (meaning the second
application of the shampoo is of little benefit)—this agrees with our conclusion from Exercise 12.56.

c_1 = −12.51          c_2 = 1.269           c_3 = 0.191
SE_c1 = 0.2355        SE_c2 = 0.1413        SE_c3 = 0.1609
t_1 = −53.17          t_2 = 8.98            t_3 = 1.19
P-value_1 < 0.0005    P-value_2 < 0.0005    P-value_3 = 0.2359

12.77. Because the original measurements are percents, the instructions to “add 5% to each response”
could be interpreted in two ways: (1) new response = old response + 5 or (2) new response = old response
× 1.05. The table gives summary statistics for both interpretations. For (1), the means increase by 5, but
everything else remains the same; the ANOVA table (below) is identical to the one in the original
solution. For (2), both the means and standard deviations are multiplied by 1.05, SS and MS are
multiplied by 1.05², but F and the P-value remain the same.

             Version (1)          Version (2)
Material     Mean     s           Mean     s
ECM1         70       8.6603      68.25    9.0933
ECM2         68.3     2.8868      66.5     3.0311
ECM3         78.3     2.8868      77       3.0311
MAT1         28.3     2.8868      24.5     3.0311
MAT2         11.7     2.8868      7        3.0311
MAT3         16.7     2.8868      12.25    3.0311

One-Way ANOVA: Gpi+5 Versus Material
Source      DF   SS        MS       F        P
Material     5   13411.1   2682.2   137.94   0.000
Error       12     233.3     19.4
Total       17   13644.4
S = 4.410   R-Sq = 98.29%   R-Sq(adj) = 97.58%

One-Way ANOVA: Gpi*1.05 Versus Material
Source      DF   SS        MS       F        P
Material     5   14785.8   2957.2   137.94   0.000
Error       12     257.3     21.4
Total       17   15043.0
S = 4.630   R-Sq = 98.29%   R-Sq(adj) = 97.58%

12.79. A table of means and standard deviations is given. Quantile plots are not shown, but, apart from
the granularity of the scores and a few possible outliers, there are no marked deviations from Normality.
Pooling is reasonable for both PRE1 and PRE2; the ratios are 1.24 and 1.48. For both PRE1 and PRE2,
we test H 0 : µ B = µ D = µ S versus H a : at least one mean is different. Both tests have df 2 and 63. For PRE1,
F = 1.13 and P-value = 0.329; for PRE2, F = 0.11 and P-value = 0.895. There is no reason to believe that
the mean pretest scores differ between methods.

Method    n     PRE1 x̄   PRE1 s   PRE2 x̄   PRE2 s
Basal     22    10.5     2.972    5.27     2.7634
DRTA      22    9.72     2.694    5.09     1.9978
Strat     22    9.136    3.342    4.954    1.8639

Minitab Output: Analysis of Variance on PRE1
Source    DF    SS       MS      F      P
Group      2    20.58    10.29   1.13   0.329
Error     63    572.45    9.09
Total     65    593.03

Analysis of Variance on PRE2
Source    DF    SS       MS      F      P
Group      2    1.12     0.56    0.11   0.895
Error     63    317.14   5.03
Total     65    318.26
12.81. (a) Sampling plans will vary but should attempt to address how cultural groups will be determined:
Can we obtain such demographic information from the school administration? Do we simply select a
large sample and then poll each student to determine if he or she belongs to one of these groups? (b)
Answers will vary with the choice of H_a and the desired power. For example, with the alternative µ_1 =
µ_2 = 4.4, µ_3 = 5, and standard deviation σ = 1.2, three samples of size 75 will produce power 0.78. (See
output below.) (c) The report should make an attempt to explain the statistical issues involved;
specifically, it should convey that the sample sizes are sufficient to detect the anticipated differences
among the groups.
Minitab Output: Power and Sample Size
One-way ANOVA
Alpha = 0.05   Assumed standard deviation = 1.2
Factors: 1   Number of levels: 3
Maximum Difference   Sample Size   Power
0.6                  75            0.782579
The sample size is for each level.
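
Minitab's one-way ANOVA power calculation can be reproduced with the noncentral F distribution, taking the noncentrality parameter as λ = nΣ(µᵢ − µ̄)²/σ². A Python sketch under that assumption (values from the alternative stated above):

import numpy as np
from scipy import stats

mu = np.array([4.4, 4.4, 5.0])
sigma, n = 1.2, 75
k, N = len(mu), len(mu) * 75
lam = n * np.sum((mu - mu.mean())**2) / sigma**2   # noncentrality, 12.5
fcrit = stats.f.ppf(0.95, k - 1, N - k)            # 5% critical value of F(2, 222)
power = stats.ncf.sf(fcrit, k - 1, N - k, lam)     # ~0.78
print(power)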

12.83. The design can be similar, although the types of music might be different. Bear in mind that
spending at a casual restaurant will likely be less than at the restaurants examined in Exercise 12.62; this
might also mean that the standard deviations could be smaller. A pilot study might be necessary to get an
idea of the size of the standard deviations. Decide how big a difference in mean spending you would want
to detect, then do some power computations.
Chapter 13 Two-Way Analysis of Variance
13.1. x_ijk = µ_ij + ε_ijk, i = 1, 2, j = 1, 2, k = 1, …, n_ij, with ε_ijk ~ N(0, σ). We have I = 2, J = 2,
n_11 = 27, n_12 = 23, n_21 = 19, and n_22 = 31. The parameters of the model are µ_11, µ_12, µ_21, µ_22, and σ.
13.3. (a) Two-way ANOVA is used when there are two factors (explanatory variables). (The outcome
[response] variable is assumed to have a Normal distribution, meaning that it can take any value, at least
in theory.) (b) Each level of A should occur with all three levels of B. (Factor A has two levels.) (c) The
RESIDUAL part of the model represents the error. (d) DFAB = (I − 1)(J − 1).
13.5. (a) A large value of the AB F statistic indicates that we should reject the hypothesis of no
interaction. (b) The relationship is backward: Mean squares equal the sum of squares divided by degrees
of freedom. (c) Under H 0 , the ANOVA test statistics have an F distribution. (d) If the sample sizes are not
the same, the sums of squares may not add for “some methods of analysis.”
13.7. (a) There are nine cells with four observations each, so N = 36. DFA = 2, DFB = 2, DFAB = 4, DFE
= 27. (b) Plot below. (c) 0.025 < P-value < 0.05. There is a significant interaction effect. (d) Interaction is
not significant; the interaction plot should have roughly parallel lines.

13.9. (a) The factors are sex (I = 2) and age (J = 3). The response variable is the percent of pretend play.
The total number of observations is N = (2)(3)(11) = 66. (b) The factors are weeks after harvest (I = 5)
and amount of water (J = 2). The response variable is the percent of seeds germinating. The total number
of observations is N = 30 (three lots of seeds in each of the 10 treatment combinations). (c) The factors
are mixture (I = 6) and freezing/thawing cycles (J = 3). The response variable is the strength of the
specimen. The total number of observations is N = 54. (d) The factors are different colored tags (I = 4)
and the type of buyer (J = 2). The response variable is the total dollar amount spent. The total sample size
is N = 138.
13.11. (a) For those with real-time feedback, the average total cost for those not informed was only
slightly larger than the average total cost for those informed, while for those without feedback, the not-
informed average was much larger than for those who were informed. Additionally, we can see an
interaction effect. For those not informed, the lack of real-time feedback increased their spending, while
for those who were informed, the lack of real-time feedback decreased their spending. (b) F = 18.18, DF
= 3, 190, P-value < 0.0001. (c) The interaction term is significant (F = 16.18, P-value < 0.0001) as is the
Informed term (F = 37.06, P-value < 0.0001). The Smartcart term is not significant (F = 1.16, P-value =
0.2828). Effects are interpreted as in part (a).

13.13. (a) The same students were tested twice. (b) The plot shows a definite interaction; the control
group’s mean score decreased, while the expressive writing group’s mean increased somewhat. (c) No,
the largest s is more than twice the smallest s; 14.3 > 2(5.8) = 11.6.

13.15. (a) The Normality assumption is for the error terms, not the measurements; however, recall from
Chapter 12 that ANOVA is robust to reasonable departures from Normality, especially when sample sizes
are similar. (b) Yes, the largest s is less than twice the smallest: 1.62 < 2(0.82) = 1.64. The ANOVA table
is below. Based on this information, Age and Sex are significant factors at the 0.05 level, but the
interaction is not significant.

Source DF SS MS F P-value
Age 6 31.97 5.328 4.4 0.0003
Sex 1 44.66 44.66 36.879 0.0000
Age*Sex 6 13.22 2.203 1.819 0.0962
Error 232 280.95 1.211
13.17. (a) There appears to be an interaction effect; the lines are not parallel. (b) There appears to be a
significant Focus main effect, so the marginal means for Focus would be useful in explaining this
difference. (c) If each participant looked at a picture of each body type then their responses likely would
be related to each other, which violates the independence assumption.

13.19. (a) There appears to be an interaction between history and statement of thanks. For consumers with
a short history, the employee’s statement of thanks drastically increases the repurchase intent. For
consumers with a long history, there is not much difference in repurchase intent whether or not a thanks
was given. (b) Short: 6.245. Long: 7.45. No: 6.61. Yes: 7.085. The marginal mean for history is
somewhat meaningful, as generally the long history consumers are more likely to repurchase. But the
marginal mean for thank you is misleading, suggesting that, generally, no thanks is lower than yes thanks,
but that is only true for the short history group.

13.21. (a) The plot suggests a possible interaction because the lines are not parallel. (b) By subjecting the
same individual to all four treatments, rather than four individuals to one treatment each, we reduce the
within-groups variability (the residual), which makes it easier to detect between-groups variability (the
main effects and interactions).
13.23. (a) We’d expect reaction times to slow with older individuals. If bilingualism helps brain
functioning, we would not expect that group to slow as much as the monolingual group. The expected
interaction is seen in the plot; mean total reaction time for the older bilingual group is much less than for
the older monolingual group; the lines are not parallel. (b) The interaction is just barely not significant (F
= 3.67, P-value = 0.059). Both main effects are very significant (P-value = 0.000).
Minitab Output: Time Versus Age, Ling
Source DF SS MS F P
Age 1 479880 479880 195.01 0.000
Ling 1 63394 63394 25.76 0.000
Interaction 1 9031 9031 3.67 0.059
Error 76 187023 2461
Total 79 739329

S = 49.61 R-Sq = 74.70% R-Sq(adj) = 73.71%

13.25. (a) There appears to be an interaction effect. For the unfavorable outcome, there is no difference in
satisfaction between the favorable and unfavorable process; both scored equally low. For the favorable
outcome, those with a favorable process scored substantially higher than those with an unfavorable
process. (b) There doesn’t appear to be an interaction effect. Overall, the favorable outcome means were
only slightly higher than the unfavorable outcomes means. But both favorable process means were
generally quite high, much higher than both unfavorable process means, regardless of whether the
outcome was favorable or not. (c) Yes, there appears to be a three-factor interaction effect. Particularly
for those with an unfavorable outcome and a favorable process, the humor group scored much higher than
those with no humor. Also, for those with unfavorable outcome and unfavorable process, the humor
group scored worse than those with no humor.
13.27. A favorable process seems to have the greatest impact on satisfaction, followed by a favorable
outcome. Humor doesn’t seem to add much to satisfaction.

Humor   Mean      Process       Mean      Outcome       Mean
Yes     3.88      Favorable     4.80      Favorable     4.30
No      3.60      Unfavorable   2.74      Unfavorable   3.21

13.29. s_p² = 3.1493, so s_p = 1.7746. It is reasonable to pool the standard deviations because the largest s
is less than twice the smallest s: 2.024 < 2(1.601) = 3.202.
13.31. For general attitude there may be a slight interaction effect. Differences in sex are largest for
Canada, then the United States, and smallest for France, with females higher than males in all cases.
Additionally, Canada attitudes are generally the highest, followed by the United States, with France
having the worst attitudes among the three.
For product benefits, there may be a slight interaction effect. Differences in sex are largest for Canada,
then for the United States, and very little for France, with females higher than males in all cases.
Additionally, Canada benefit scores are generally the highest, followed by the United States, with France
having the lowest scores among the three.

For credibility of information, there seems to be an interaction effect. Differences in sex are largest for the
United States, then Canada, with females higher in both, but in France, males had a slightly higher score
for credibility than females. Additionally, Canada credibility scores are generally the highest, followed by
the United States, with France having the lowest scores among the three.
For purchase intention, there seems to be a small interaction effect. Differences in sex are largest for the
United States with females higher; for Canada and France there are small differences between sexes.
Canada and the United States seem to have much higher purchase intent scores than France.

13.33. Main effect A is not significant, 0.05 < P-value < 0.100 (DF = 2, 18). Main effect B is significant,
0.025 < P-value < 0.050 (DF = 2, 18). The interaction is not significant, P-value > 0.100 (DF = 4, 18).
13.35. (a) The means are nearly parallel and show little evidence of an interaction. (b) With equal sample
sizes, the pooled variance is simply the unweighted average of the variances: s_p² = 0.25(0.12² + 0.14² +
0.12² + 0.13²) = 0.016325. Therefore, s_p = 0.1278. (c) Note that all of these contrasts have been arranged
so that, if the researchers’ suspicions are correct, the contrast will be positive. To compare new-car
testosterone change to old-car change, the appropriate contrast is ψ 1 = 1/2(µ new,city + µ new,highway ) –
1/2(µ old,city + µ old,highway ). To compare city change to highway change for new cars, we take ψ 2 = µ new,city –
µ new,highway . To compare highway change to city change for old cars, we take ψ 3 = µ old,highway – µ old,city . (d)
By subjecting the same individual to all four treatments, rather than four individuals to one treatment
each, we reduce the within-groups variability (the residual) by eliminating the natural differences in
testosterone levels among the men, which makes it easier to detect the main effects and interactions of
interest.
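
A quick check of the pooled-variance arithmetic in part (b), as a Python sketch:

import numpy as np

s = np.array([0.12, 0.14, 0.12, 0.13])
sp2 = np.mean(s**2)          # unweighted average of the variances, 0.016325
print(sp2, np.sqrt(sp2))     # s_p ~0.1278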
13.37. (a) Plot below. (b) There seems to be a fairly large difference between the means based on how
much the rats were allowed to eat, but not very much difference based on the chromium level. There may
be an interaction: the NM mean is lower than the LM mean, while the NR mean is higher than the LR
mean. (c) The marginal means are L: 4.86, N: 4.871, M: 4.485, R: 5.246. For low chromium level (L), R
minus M is 0.63; for normal chromium (N), R minus M is 0.892. Mean GITH levels are lower for M than
for R; there is not much difference for L versus N. The difference between M and R is greater among rats
that had normal chromium levels in their diets (N).

13.39. (a) s_p = 38.14. (b) Yes, the largest s is less than twice the smallest s; 42.4 < 2(31.2) = 62.4. (c)
Sender Individual: 70.24; Sender Group: 48.65; Responder Individual: 60.46; Responder Group: 59.37.
(d) There appears to be an interaction effect. Generally, when the responder was an individual, more
money was sent than for a group responder, but when the sender was a group, they sent more money to an
individual responder, and when the sender was an individual, they sent more money to a group responder.
(e) For sender, P-value = 0.0033; the sender effect is significant. For responder, P-value = 0.9748; the
responder effect is not significant. For the interaction, P-value = 0.1522; the interaction effect is not
significant. Only the sender effect is significant; when the sender was an individual, he or she sent more
money.

Feedback    Agent   n     x̄       s
Simple      No      35    4.229   3.246
            Yes     32    3.250   4.429
Elaborate   No      34    3.324   3.591
            Yes     34    5.176   2.634
13.41. Yes; the iron-pot means are the highest, and the F statistic for testing the effect of the pot type is
very large. (In this case, the interaction does not weaken any evidence that iron-pot foods contain more
iron; it only suggests that, while iron pots increase iron levels in all foods, the effect is strongest for
meats.)
13.43. (a) Table below. (b) There are differences in diameter among the five different tools, as expected.
There are also differences in diameter across the shifts (times), although not as large as the tool
differences. There also appears to be a small interaction effect for some tool*time combinations. In
particular, tools 3, 4, and 5 are fairly consistent across times, with time 2 having the largest diameters and
time 1 the smallest. For tool 2, however, the time 1 diameters are larger than the time 3 diameters; for tool
1, the time 1 diameters are the largest, bigger than both the time 2 and time 3 diameters. (c) The Tool
effect is extremely significant: F = 412.94, DF = 4, 30, P-value < 0.0001. The Time effect is also very
significant: F = 43.60, DF = 2, 30, P-value < 0.0001. The interaction effect is significant as well: F =
7.65, DF = 8, 30, P-value < 0.0001. (d) Because all terms are significant, we would interpret the effects as
stated in part (b).

Tool   Time   N    Diameter Mean   Diameter Std Dev
1 1 3 25.031 0.0011547
1 2 3 25.028 0
1 3 3 25.026 0
2 1 3 25.017 0.0011547
2 2 3 25.02 0.002
2 3 3 25.016 0
3 1 3 25.006 0.0015275
3 2 3 25.013 0.0011547
3 3 3 25.009 0.0011547
4 1 3 25.012 0
4 2 3 25.019 0.0011547
4 3 3 25.014 0.004
5 1 3 24.997 0.0011547
5 2 3 25.006 0
5 3 3 25.000 0.0015275

13.45. (a) All three F values have df 1 and 945, and the P-values are < 0.001, < 0.001, and 0.1477. Sex
and handedness both have significant effects on mean lifetime, but there is no significant interaction. (b)
Women live about 6 years longer than men (on the average), while right-handed people average 9 more
years of life than left-handed people. “There is no interaction” means that handedness affects both sexes
in the same way, and vice versa.
13.47. (a) and (b) The table following lists the means and standard deviations (the latter in parentheses) of
the nitrogen contents of the plants. The two plots below suggest that plant 1 and plant 3 have the highest
nitrogen content, plant 2 is in the middle, and plant 4 is the lowest. (In the second plot, the points are so
crowded together that no attempt was made to differentiate among the different water levels.) There is no
consistent effect of water level on nitrogen content. Standard deviations range from 0.0666 to 0.3437, for
a ratio of 5.16—larger than we like. (c) Minitab output below. Both main effects and the interaction are
highly significant.

Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm


1 3.2543 2.7636 2.8429 2.9362 3.0519 3.0963 3.3334
(0.2287) (0.0666) (0.2333) (0.0709) (0.0909) (0.0815) (0.2482)
2 2.4216 2.0502 2.0524 1.9673 1.9560 1.9839 2.2184
(0.1654) (0.1454) (0.1481) (0.2203) (0.1571) (0.2895) (0.1238)
3 3.0589 3.1541 3.2003 3.1419 3.3956 3.4961 3.5437
(0.1525) (0.3324) (0.2341) (0.2965) (0.2533) (0.3437) (0.3116)
4 1.4230 1.3037 1.1253 1.0087 1.2584 1.2712 0.9788
(0.1738) (0.2661) (0.1230) (0.1310) (0.2489) (0.0795) (0.2090)

Minitab Output: Two-Way ANOVA: pctnit Versus Species, Water

Source DF SS MS F P
species 3 172.392 57.4639 1301.32 0.000
water 6 2.587 0.4311 9.76 0.000
Interaction 18 4.745 0.2636 5.97 0.000
Error 224 9.891 0.0442
Total 251 189.614

[Two interaction plots of mean percent nitrogen: one versus water level (1–7) with a line for each species
(1–4), and one versus species with a line for each water level.]

13.49. For each water level, there is highly significant evidence of variation in nitrogen level among plant
species (Minitab output below). For each water level, we have df = 32, six comparisons, and α = 0.05, so
the Bonferroni critical value is t** = 2.8123. (If we take into account that there are seven water levels, so
that overall we are performing 6 × 7 = 42 comparisons, we should take t** = 3.5579.) The table on the
right gives the pooled standard deviations s p , the standard errors of each difference SE D , and the
“minimum significant difference” MSD = t**SE D (two means are significantly different if they differ by at
least this amount). MSD1 uses t** = 2.8123, and MSD2 uses t** = 3.5579. As it happens, for either choice
of MSD, the only nonsignificant differences are between species 1 and 3 for water levels 1, 4, and 7.
Therefore, for every water level, species 4 has the lowest nitrogen level and species 2 is next. For water
levels 1, 4, and 7, species 1 and 3 are statistically tied for the highest level; for the other levels, species 3
is the highest, with species 1 coming in second.
Water level   s_p      SE_D     MSD1     MSD2
1 0.1824 0.086 0.2418 0.3059
2 0.2274 0.1072 0.3015 0.3814
3 0.1912 0.0902 0.2535 0.3208
4 0.1991 0.0939 0.264 0.334
5 0.1994 0.094 0.2643 0.3344
6 0.2318 0.1093 0.3073 0.3887
7 0.2333 0.11 0.3093 0.3913
Minitab Output: Pct Nitrogen vs Species for Water 1
Source DF SS MS F P
Species_1 3 18.3711 6.1237 184.05 0.000
error 32 1.0647 0.0333
Total 35 19.4358
Pct Nitrogen vs Species for Water 2
Source DF SS MS F P
species_2 3 17.9836 5.9945 115.93 0.000
Error 32 1.6546 0.0517
Total 35 19.6382
Pct Nitrogen vs Species for Water 3
Source DF SS MS F P
species_3 3 22.9171 7.6390 208.87 0.000
Error 32 1.1704 0.0366
Total 35 24.0875

Pct Nitrogen vs Species for Water 4


Source DF SS MS F P
species_4 3 25.9780 8.6593 218.37 0.000
Error 32 1.2689 0.0397
Total 35 27.2469
Pct Nitrogen vs Species for Water 5
Source DF SS MS F P
species_5 3 26.2388 8.7463 220.01 0.000
Error 32 1.2721 0.0398
Total 35 27.5109
Pct Nitrogen vs Species for Water 6
Source DF SS MS F P
species_6 3 28.0648 9.3549 174.14 0.000
Error 32 1.7191 0.0537
Total 35 29.7838
Pct Nitrogen vs Species for Water 7
Source DF SS MS F P
species_7 3 37.5829 12.5276 230.17 0.000
Error 32 1.7417 0.0544
Total 35 39.3246

13.51. (a) and (b) The tables below list the means and standard deviations (the latter in parentheses). The
means plots show that biomass (both fresh and dry) generally increases with water level for all plants.
Generally, species 1 and 2 have higher biomass for each water level, while species 3 and 4 are lower.
Standard deviation ratios are quite high for both fresh and dry biomass: 108.01/6.79 = 15.9 and
35.76/3.12 = 11.5. (c) Minitab output below. For both fresh and dry biomass, main effects and the
interaction are significant. (The interaction for fresh biomass has P-value = 0.04; other P-values are
smaller.)
Minitab Output: Two-Way ANOVA: Fresh Biomass Versus Species, Water
Source DF SS MS F P
Species 3 458295 152765 81.45 0.000
water 6 491948 81991 43.71 0.000
Interaction 18 60334 3352 1.79 0.004
Error 84 157551 1876
Total 111 1168129

Two-Way ANOVA: Dried Biomass Versus Species, Water


Source DF SS MS F P
Species 3 458295 152765 81.45 0.000
water 6 491948 81991 43.71 0.000
Interaction 18 60334 3352 1.79 0.004
Error 84 157551 1876
Total 111 1168129

[Means plots of fresh biomass and dried biomass versus water level (1–7), with a line for each species
(1–4).]

Fresh Biomass
Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm
1 109.095 165.138 168.825 215.133 258.900 321.875 300.880
(20.949) (29.084) (18.866) (42.687) (45.292) (46.727) (29.896)
2 116.398 156.750 254.875 265.995 347.628 343.263 397.365
(29.250) (46.922) (13.944) (59.686) (54.416) (98.553) (108.011)
3 55.600 78.858 90.300 166.785 164.425 198.910 188.138
(13.197) (29.458) (28.280) (41.079) (18.646) (33.358) (18.070)
4 35.128 58.325 94.543 96.740 153.648 175.360 158.048
(11.626) (6.789) (13.932) (24.477) (22.028) (32.873) (70.105)
Dry Biomass
Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm
1 40.565 63.863 71.003 85.280 103.850 136.615 120.860
(5.581) (7.508) (6.032) (10.868) (15.715) (16.203) (17.137)
2 34.495 57.365 79.603 95.098 106.813 103.180 119.625
(11.612) (6.149) (13.094) (25.198) (18.347) (25.606) (35.764)
3 26.245 31.865 36.238 64.800 64.740 74.285 67.258
(6.430) (11.322) (11.268) (9.010) (3.122) (12.277) (7.076)
4 15.530 23.290 37.050 34.390 48.538 61.195 53.600
(4.887) (3.329) (5.194) (11.667) (5.658) (12.084) (25.290)

13.53. For each water level, there is highly significant evidence of variation in the biomass level (both
fresh and dry) among plant species (Minitab output below). For each water level, we have df = 12, six
comparisons, and α = 0.05, so the Bonferroni critical value is t** = 3.1527. (If we take into account that
there are seven water levels, so that overall we are performing 6 × 7 = 42 comparisons, we should take t**
= 4.2192.) The table below gives the pooled standard deviations sp, the standard errors of each difference
SE D and the “minimum significant difference” MSD = t**SE D (two means are significantly different if
they differ by at least this amount). MSD1 uses t** = 3.1527, and MSD2 uses t** = 4.2192. Rather than
give a full listing of which differences are significant, we note that plants 3 and 4 are not significantly
different, nor are 1 and 3 (except for one or two water levels). All other plant combinations are
significantly different for at least three water levels. For fresh biomass, plants 2 and 4 are different for all
levels, and for dry biomass, 1 and 4 differ for all levels.
              Fresh biomass                                Dry biomass
Water level   s_p       SE_D      MSD1      MSD2           s_p       SE_D      MSD1      MSD2
1 20.0236 14.1588 44.6382 50.3764 7.6028 5.3760 16.9487 19.1274
2 31.4699 22.2526 70.1552 79.1735 7.6395 5.4019 17.0305 19.2197
3 19.6482 13.8934 43.8012 49.4318 9.5103 6.7248 21.2010 23.9263
4 43.7929 30.9663 97.6265 110.1762 15.5751 11.0133 34.7213 39.1846
5 38.2275 27.0310 85.2197 96.1746 12.5034 8.8412 27.8734 31.4565
6 59.3497 41.9666 132.3068 149.3147 17.4280 12.3235 38.8518 43.8462
7 66.7111 47.1719 148.7174 167.8348 23.7824 16.8167 53.0176 59.8329

Fresh Biomass vs Species, Water level 1 Dry Biomass vs Species, Water level 1
Source DF SS MS F P Source DF SS MS F P
Species 3 19107 6369 15.88 0.000 species 3 1411.2 470.4 8.14 0.003
Error 12 4811 401 Error 12 693.6 57.8
Total 15 23918 Total 15 2104.8
Fresh Biomass vs Species, Water level 2 Dry Biomass vs Species, Water level 2
Source DF SS MS F P Source DF SS MS F P
Species 3 35100 11700 11.81 0.001 Species 3 4597.1 1532.4 26.26 0.000
Error 12 11884 990 Error 12 700.3 58.4
Total 15 46984 Total 15 5297.4
Fresh Biomass vs Species, Water level 3 Dry Biomass vs Species, Water level 3
Source DF SS MS F P Source DF SS MS F P
Species 3 71898 23966 62.08 0.000 Species 3 6127.2 2042.4 22.58 0.000
Error 12 4633 386 Error 12 1085.3 90.4
Total 15 76531 Total 15 7212.6
Fresh Biomass vs Species, Water level 4 Dry Biomass vs Species, Water level 4
Source DF SS MS F P Source DF SS MS F P
species 3 62337 20779 10.83 0.001 Species 3 8634 2878 11.86 0.001
Error 12 23014 1918 Error 12 2911 243
Total 15 85351 Total 15 11545
Fresh Biomass vs Species, Water level 5 Dry Biomass vs Species, Water level 5
Source DF SS MS F P Source DF SS MS F P
species 3 99184 33061 22.62 0.000 Species 3 10026 3342 21.38 0.000
Error 12 17536 1461 Error 12 1876 156
Total 15 116720 Total 15 11902
Fresh Biomass vs Species, Water level 6 Dry Biomass vs Species, Water level 6
Source DF SS MS F P Source DF SS MS F P
Species 3 86628 28876 8.20 0.003 species 3 13460 4487 14.77 0.000
Error 12 42269 3522 Error 12 3645 304
Total 15 128897 Total 15 17105
Fresh Biomass vs Species, Water level 7 Dry Biomass vs Species, Water level 7
Source DF SS MS F P Source DF SS MS F P
Species 3 144376 48125 10.81 0.001 species 3 14687 4896 8.66 0.002
Error 12 53404 4450 Error 12 6787 566
Total 15 197780 Total 15 21474

13.55. (a) With I = 2, J = 3, and N = 180, the degrees of freedom for sex are 1 and 174; for floral
characteristic 2 and 174; for interaction 2 and 174. (b) Damage to males was higher for all characteristics.
For males, damage was highest under characteristic level 3, while for females, the highest damage
occurred at level 2. Interaction should be significant because the distance between the means increases
from floral type 1 to floral type 3. (c) Three of the standard deviations are at least half as large as the
means. Because the response variable (leaf damage) must be non-negative, this suggests that these
distributions are right-skewed; taking logarithms should reduce the skewness.
13.57. The table and plot of the means suggest that females have higher HSE grades than males. For a
given sex, there is not too much difference among majors. Normal quantile plots show no great deviations
from Normality, apart from the granularity of the grades (most evident among women in EO). In the
ANOVA, only the effect of sex is significant. Residual analysis (not shown) reveals some causes for
concern; for example, the variance does not appear to be constant.

Sex      CS            EO        Other
Male     n = 39        39        39
         x̄ = 7.7949    7.4872    7.4103
         s = 1.5075    2.1505    1.5681
Female   n = 39        39        39
         x̄ = 8.8462    9.2564    8.6154
         s = 1.1364    0.7511    1.1611

Minitab Output: HSE Versus Sex, Major
Source        DF    SS        MS        F       P
Sex            1    105.338   105.338   50.32   0.000
Major          2      5.880     2.940    1.40   0.248
Interaction    2      5.573     2.786    1.33   0.266
Error        228    477.282     2.093
Total        233    594.073
13.59. The following table and plot of the means suggest that students who stay in the sciences have
higher mean SATV scores than those who end up in the “Other” group. Female CS and EO students have
higher scores than males in those majors, but males have the higher mean in the “Other” group. Normal
quantile plots suggest some right-skewness in the “Women in CS” group and also some non-Normality in
the tails of the “Women in EO” group. Other groups look reasonably Normal. In the ANOVA table, only
the effect of major is significant.

Sex CS EO Other
Male n = 39 39 39
x = 526.949 507.846 487.564
s = 100.937 57.213 108.779
Female n = 39 39 39
x = 543.385 538.205 465.026
s = 77.654 102.209 82.184

Minitab Output: SATV Versus Sex, Major


Source DF SS MS F P
Sex 1 3824 3824.4 0.47 0.492
Major 2 150723 75361.7 9.32 0.000
Interaction 2 29321 14660.7 1.81 0.166
Error 228 1843979 8087.6
Total 233 2027848
Chapter 14 Logistic Regression
14.1. (a) odds = p/(1 − p) = (1/4)/(1 − 1/4) = 1/3, or “1 to 3.” (b) P(not a heart) = 3/4, so odds =
(3/4)/(1 − 3/4) = 3, or “3 to 1.”

14.3. odds_men = (87/140)/(1 − 87/140) = 1.6415. odds_women = (71/150)/(1 − 71/150) = 0.8987.
14.5. For men: log(1.6415) = 0.4956. For women: log(0.8987) = −0.1068.
14.7. Letting x = 1 for women, we have b_0 = 0.4956 and b_1 = −0.1068 − 0.4956 = −0.6024. The equation
is log(odds) = 0.4956 − 0.6024x. The odds ratio is e^−0.6024 = 0.548. (With x = 1 for men, the equation is
log(odds) = −0.1068 + 0.6024x, and the odds ratio is e^0.6024 = 1.826.)
14.9. In Example 14.7, b_1 = 0.69245; e^0.69245 = 1.9986.

14.11. (a) p̂ = 462/1003 = 0.4606. (b) odds = p̂/(1 − p̂) = 0.4606/0.5394 = 0.8539.

14.13. (a) The model is y = log(odds) = β 0 + β 1 x. Variable x = 1 if the customer is 25 years old or younger
and 0 otherwise. (b) β 0 is the log(odds) for people over 25 having used their cell phone to consult about a
purchase in the last 30 days; β 1 is the difference in log(odds) for those 25 or younger.
14.15. (a) n = 102,738. Of the recruits, 3869 were rejected and 98,869 were not. (b) The equation is
log(odds) = −6.76382 + 4.41058x.
14.17. (a) Odds ratio = e4.41058 = 82.3172. (b) The confidence interval is (64.7468, 104.65572). (c)
Recruits over 40 were about 82.3 times more likely to be rejected for service due to bad teeth than recruits
younger than 20. Further, we are 95% confident that recruits 40 years or older were between 64.7 and
104.7 times more likely to be rejected.
14.19. (a) G = 9413.210, P-value < 0.0005. We have overwhelming evidence that not all the slopes are 0
(at least one regression coefficient is not 0). (b) The confidence interval is
(e^(b_1 − z*SE_b1), e^(b_1 + z*SE_b1)). The
results are shown in the table below. (c) In each case, the hypotheses are H 0 : β i = 0, H a : β i ≠ 0. The test
statistic is z = b i /SE(b i ). In all cases, we have P-value ≈ 0. In each case, we have extremely strong
evidence that the slope is not zero; the probability of being rejected for service due to bad teeth is higher
for all older age groups than for the under 20 age group.

Age Group   b_i       SE(b_i)    95% Confidence interval    z
20 to 25    1.97180   0.127593   (5.5941, 9.2247)           15.45
25 to 30    2.85364   0.125048   (13.5793, 22.1399)         22.82
30 to 35    3.55806   0.123713   (27.5384, 44.7252)         28.76
35 to 40    3.96185   0.122837   (41.3094, 66.8606)         32.25
Over 40     4.41058   0.122513   (64.7449, 104.65879)       36.00
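
Each interval in this table comes from exponentiating b_i ± z*SE(b_i), as in part (b). A Python sketch for the last row (values taken from the table; 1.96 is z* for 95% confidence):

import numpy as np

b, se, zstar = 4.41058, 0.122513, 1.96
print(np.exp(b))                                     # odds ratio, ~82.32
print(np.exp(b - zstar*se), np.exp(b + zstar*se))    # ~ (64.7, 104.7)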

14.21. Answers will vary. For example, 68/58,952 = 0.0012 = 0.12% of those 20 or younger were
rejected. The rejection rate for those 20 to 25 was 647/78,639 = 0.0082 = 0.82%, 0.82/0.12 = 6.8333. That
analysis would indicate the older recruits were about 6.8333 times more likely to be rejected, while this
analysis indicates an odds ratio of 7.1836. The odds ratio is not directly comparable with the ratio of
probabilities because it compares odds (the chance of the event happening divided by the chance of the
event not happening). The first analysis is perhaps more easily understood by most people.

14.23. (a) For the high-blood-pressure group: p̂_hi = 55/3338 = 0.01648, odds_hi = 0.01648/(1 − 0.01648)
= 0.01675, or about 1 to 60. (b) For the low-blood-pressure group: p̂_low = 21/2676 = 0.00785, odds_low =
0.00791, or about 1 to 126. (c) The odds ratio is odds_hi/odds_low = 2.1176. Odds of death from
cardiovascular disease are about 2.1 times greater in the high-blood-pressure group.
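
A minimal Python sketch of these computations (counts taken from the text above):

p_hi, p_low = 55/3338, 21/2676
odds_hi = p_hi / (1 - p_hi)      # ~0.01675, about 1 to 60
odds_low = p_low / (1 - p_low)   # ~0.00791, about 1 to 126
print(odds_hi / odds_low)        # odds ratio, ~2.12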
14.25. (a) odds ratio = e^0.7505 = 2.1181. Using the interval from the previous exercise, the 95% odds-ratio
confidence interval is (e^0.2452, e^1.2558) = (1.28, 3.51). (b) We are 95% confident that the odds of death
from cardiovascular disease are about 1.28 to 3.51 times greater in the high-blood-pressure group than in
the low-blood-pressure group.
14.27. The odds that a student who watches no TV was an exergamer were 0.111/(1 − 0.111) = 0.1249.
The odds for a student who reported less than two hours of TV per day are 0.206/(1 − 0.206) = 0.2594.
The odds for a student who watches at least two hours of TV per day are 0.311/(1 − 0.311) = 0.4514. The
fitted logistic regression equation is log(odds) = −2.0793 + 0.7313(< 2 hours) + 1.2830(≥ 2 hours). Those
who watch less than two hours of TV are e^0.7313 = 2.08 times as likely to be an exergamer as those who
don’t watch any TV; from software, the 95% confidence interval is (0.874, 4.941). Similarly, those who
watch at least two hours of TV are e^1.2830 = 3.61 times as likely to be an exergamer as those who don’t
watch any TV; from software, the 95% confidence interval is (1.501, 8.669).
14.29. (a) It is not tested with an F test; a chi-square statistic with 4 degrees of freedom would be used.
(b) There is no error term in logistic regression. (c) The hypothesis should involve the parameter β_1, not
the estimate b_1. (d) The interpretation of the coefficients is affected by correlations among the
explanatory variables.
14.31. (a) log(odds) = −2.56 + 1.125(3) = 0.815, odds = e^0.815 = 2.26. (b) log(odds) = −2.56 + 1.125(3.69)
= 1.591, odds = e^1.591 = 4.91. (c) log(odds) = −2.56 + 1.125(4.09) = 2.041, odds = e^2.041 = 7.70.
14.33. (a) For low: p̂ = 88/1169 = 0.0753. For high: p̂ = 112/1246 = 0.0899. (b) For low: odds =
0.0753/0.9247 = 0.0814. For high: odds = 0.0899/0.9101 = 0.0988. (c) For low: –2.5083. For high: –
2.3150.
14.35. With b 1 = 3.1088 and SE(b 1 ) = 0.3879, the 99% confidence interval is 3.1088 ± 2.576(0.3879) =
(2.1096, 4.1080).
14.37. (a) From Minitab: z = 3.109/0.388 = 8.013. From SPSS: z = −3.109/0.388 = −8.013. From JMP: z =
−3.1087767/0.3878909 = −8.0145. (b) Using JMP, z² = (−8.0145)² = 64.23, which agrees with the value of
Wald from SPSS and X² from JMP. (c) Using the P-value from Minitab yields the same conclusion. (d)
Preferences among students will vary.
14.39. An odds ratio greater than 1 means a higher probability of a low tip. Therefore, the odds favor a
low tip from senior adults, those dining on Sunday, those who speak English as a second language, and
French-speaking Canadians. Diners who drink alcohol and lone males are less likely to leave low tips. For
example, senior adults are 1.099 times as likely to leave a low tip as their counterparts.

14.41. (a) The model is log(p/(1 − p)) = β_0 + β_1x, where x = 1 if the person is over 40, and 0 if the person
is under 40. (b) p is the probability that the person is terminated; this model assumes that the probability
of termination depends on age (over/under 40). In this case, that seems to have been the case, but we
might expect that other factors were taken into consideration. (c) The estimated odds ratio is e1.3504 =
3.859. A 95% confidence interval is 1.3504 ± 1.96(0.4130) = (0.5409, 2.1599). For odds, this translates to
(1.7176, 8.6701). The odds of being terminated are 1.7 to 8.7 times greater for those over 40. (d) We
could use the additional variables in a multiple logistic regression model to account for their effects
before assessing if age has an effect.
14.43. A two-way table is shown below. p̂_fruit = 169/237 = 0.7131, p̂_no = 494/897 = 0.5507. odds_fruit =
2.4853; odds_no = 1.2258. log(odds_no) = 0.2036 and log(odds_fruit) = 0.9104, so b_0 = 0.2036 and b_1 =
0.7068, and the equation is log(odds) = 0.2036 + 0.7068x. The odds ratio is e^0.7068 = 2.03, and a 95%
confidence interval for the odds ratio is 1.49 to 2.77. Thus, students eating fruit two or more times a day
are between 1.49 and 2.77 times as likely to be active as those not eating fruit two or more times a day.

              Eats Fruit?
Active?     Yes    No     Total
Yes         169    494    663
No           68    403    471
Total       237    897    1134
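
These hand computations can be reproduced directly from the table. The sketch below also recomputes the confidence interval using the usual large-sample standard error for a log odds ratio, √(1/a + 1/b + 1/c + 1/d); this is an assumption about how the software interval was obtained, but it matches (1.49, 2.77):

import numpy as np

a, b, c, d = 169, 68, 494, 403     # active/fruit, not/fruit, active/no fruit, not/no fruit
b1 = np.log(a/b) - np.log(c/d)     # log odds ratio, ~0.7068
se = np.sqrt(1/a + 1/b + 1/c + 1/d)
print(np.exp(b1), np.exp(b1 - 1.96*se), np.exp(b1 + 1.96*se))   # ~2.03, ~1.49, ~2.77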

14.45. For each group, the probability, odds, and log(odds) of being overweight are
p̂_no = 65,080/238,215 = 0.2732, odds_no = p̂_no/(1 − p̂_no) = 0.3759, log(odds_no) = −0.9785;
p̂_FF = 83,143/291,152 = 0.2856, odds_FF = p̂_FF/(1 − p̂_FF) = 0.3997, log(odds_FF) = −0.9170.

With x = 0 for no fast food and x = 1 for fast food, the logistic regression equation is log(odds) = −0.9785
+ 0.0614x. Software reports z = 9.97, P-value ≈ 0, leaving little doubt that the slope is not 0. A 95%
confidence interval for the odds ratio is (1.0506, 1.0763); the odds of being overweight for students at
schools close to fast-food restaurants are about 1.05 to 1.08 times greater than for students at schools that
are not close to fast food.
14.47. (a) From software, the X2 statistic for testing this hypothesis is 30.652 (df = 3), which has P-value
= 0.000. We conclude that at least one coefficient is not 0. (b) The fitted model is log(odds) = −7.759 +
0.1695HSM + 0.5113HSS + 0.1925HSE. The standard errors of the three coefficients are 0.1830, 0.2409,
and 0.2076, giving respective 95% confidence intervals (using 100 df, t* = 1.984) 0.1695 ± 1.984(0.1830)
= (−0.1936, 0.5326), 0.5113 ± 1.984(0.2409) = (0.0334, 0.9892), and 0.1925 ± 1.984(0.2076) =
(−0.2202, 0.6052). (c) Only the coefficient of HSS is significantly different from 0.
14.49. The coefficients and standard errors for the fitted model follow. (a) From software, the X2 statistic
for testing this hypothesis is given by SAS as 18.7174 (df = 3); because P-value = 0.0003, we reject H 0
and conclude that at least one of the high school grades adds a significant amount to the model with SAT
scores. (b) The X² statistic for testing this hypothesis is 9.3738 (df = 2); because P-value = 0.0092, we
reject H 0 ; at least one of the two SAT scores adds significantly to the model with high school grades. (c)
For modeling the odds of HIGPA, high school grades (specifically HSS) are useful; SAT scores
(specifically, math scores) are as well.
Partial SAS output:
Analysis of Maximum Likelihood Estimates
Parameter higpa DF Estimate Standard Wald Pr > ChiSq
Error Chi-Square
Intercept 0 1 12.1154 2.4167 25.1315 <.0001
hsm 0 1 -0.00977 0.1970 0.0025 0.9604
hss 0 1 -0.5441 0.2499 4.7412 0.0294
hse 0 1 -0.2777 0.2204 1.5877 0.2077
satm 0 1 -0.00991 0.00343 8.3272 0.0039
satcr 0 1 0.00259 0.00284 0.8324 0.3616
Test that all slopes are zero: G = 40.826, DF = 5, P-Value = 0.000
Linear Hypotheses Testing Results
Wald DF Pr > ChiSq
Label Chi-Square
test1_hs 18.7174 3 0.0003
test2_sat 9.3738 2 0.0092
Chapter 15 Nonparametric Tests
15.1. Group A ranks are 4, 6, 7, 8, 9. Group B ranks are 1, 2, 3, 5, 10.

# Rooms 736 1096 1214 1260 1544 1622 1628 1996 2019 3933
Group B B B A B A A A A B
Rank 1 2 3 4 5 6 7 8 9 10

15.3. H_0: There is no difference in the distribution of the number of rooms between the 25 top-ranked
spas and the spas ranked 26 to 50. H_a: There is a systematic difference in the distribution of the number
of rooms between the two groups. W = 4 + 6 + 7 + 8 + 9 = 34.

15.5. µ_W = (5)(11)/2 = 27.5, σ_W = √((5)(5)(11)/12) = 4.7871. We found W = 34; thus, using the
continuity correction, we get z = (33.5 − 27.5)/4.7871 = 1.25, P-value = 0.1056. There is not enough
evidence to show a systematic difference in the number of rooms between the 25 top-ranked spas and the
spas ranked 26 to 50.
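
A Python sketch of this Normal approximation (with continuity correction), using the rank-sum formulas µ_W = n_1(N + 1)/2 and σ_W = √(n_1n_2(N + 1)/12):

import numpy as np
from scipy import stats

n1, n2, W = 5, 5, 34
N = n1 + n2
mu_w = n1 * (N + 1) / 2                      # 27.5
sigma_w = np.sqrt(n1 * n2 * (N + 1) / 12)    # ~4.7871
z = (W - 0.5 - mu_w) / sigma_w               # continuity correction
print(z, stats.norm.sf(z))                   # ~1.25, P-value ~0.1056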

15.7. (a) For exergamers, no TV: 6/281= 2.14%; some TV: 160/281 = 56.94%; more than 2 hours TV:
115/281 = 40.93%. For non-exergamers, no TV: 48/919 = 5.22%; some TV: 616/919 = 67.03%; more
than 2 hours: 255/919 = 27.75%. (b) From software, χ2 = 20.068, df = 2, P-value < 0.0005. We conclude
there is an association between being an exergamer and the amount of TV watching.

15.9. W = 3 + 6 + 7 + 9 + 11 = 36.

15.11. µ_W = (5)(12)/2 = 30, σ_W = √((5)(6)(12)/12) = 5.4772.
15.13. (a) From the JMP output, we have W = 1,351,159.5, z = 8.5063, P-value < 0.0001. Education and
civic participation are related. From the table, we see that, as education level increases, individuals are
more likely to be engaged. (b) For those who are civically engaged, 66.1% have at least some college, and
39.2% are college graduates. The corresponding percents for those who are not civically engaged are

15.15. Back-to-back stemplots and summary statistics are shown below. The men’s distribution is
right-skewed, and the women’s distribution is reasonably symmetric. From software, W = 1421, P-value =
0.6890 (two-sided). We do not have enough evidence to conclude there is a difference between women
and men in words spoken per day.
With the larger samples, we can use t procedures for this exercise when we could not in the previous one.
From software, t = −0.11, two-sided P-value = 0.9155. There is not enough evidence to show a difference
in means between men and women.

       Men   Women
      3224 0 2
  99877555 0 5567888888899
2221100000 1 122244
     77765 1 556677777788
     44442 2 0111224
        76 2 5
        10 3 2
         6 3

         x̄        s       Min     Q1        M        Q3       Max
Men      14,060   9,065     695   7,464.5   11,118   22,740   36,345
Women    14,253   6,514   2,363   8,345.5   14,602   18,050   32,291

15.17. (a) Normal quantile plots are not shown; no departures from Normality were seen. (b) We test H_0:
µ_1 = µ_2 versus H_a: µ_1 > µ_2. x̄_1 = 0.676, s_1 = 0.119, x̄_2 = 0.406, s_2 = 0.268. From software, t = 2.06,
P-value = 0.0446. We have fairly strong evidence that high-progress readers have higher mean scores. (c)
H_0: There is no difference in the distribution of scores between the two groups. H_a: There is a systematic
difference in the distribution of scores between the two groups. From software, W = 36 and P-value =
0.0473. This is essentially the same P-value as in part (b).
15.19. (a) See table. (b) For Story 2, W = 8 + 9 + 4 + 7 + 10 = 38. Under H_0: µ_W = (5)(11)/2 = 27.5,
σ_W = √((5)(5)(11)/12) = 4.7871. (c) z = (37.5 − 27.5)/4.787 = 2.09, P-value = 0.0183. (d) See table.
Story 1 Story 1 Story 2 Story 2
Child Progress Score Rank Score Rank
1 high 0.55 4.5 0.8 8
2 high 0.57 6 0.82 9
3 high 0.72 8.5 0.54 4
4 high 0.7 7 0.79 7
5 high 0.84 10 0.89 10
6 low 0.4 3 0.77 6
7 low 0.72 8.5 0.49 3
8 low 0 1 0.66 5
9 low 0.36 2 0.28 1
10 low 0.55 4.5 0.38 2

15.21. H 0 : There is no difference in the distribution of taste between food made with Hot Oil and food
made Oil Free. H α : One oil preparation method has systematically higher taste ratings than the other. W+
= 11.5.

Oil
Expert Hot Oil Free Difference Rank
1 78 75 –3 3.5
2 84 85 1 1
3 62 67 5 5
4 73 75 2 2
5 63 66 3 3.5

15.23. µ_W+ = (5)(6)/4 = 7.5, σ_W+ = √((5)(6)(11)/24) = 3.7081. z = (11 − 7.5)/3.7081 = 0.94, and the
two-sided P-value = 0.3472. The data do not show a systematic difference in food taste between food
made with Hot Oil and food made Oil Free.
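
A Python sketch of the matching signed-rank approximation, using µ_W+ = n(n + 1)/4 and σ_W+ = √(n(n + 1)(2n + 1)/24) with a continuity correction:

import numpy as np
from scipy import stats

n, w_plus = 5, 11.5
mu = n * (n + 1) / 4                            # 7.5
sigma = np.sqrt(n * (n + 1) * (2*n + 1) / 24)   # ~3.7081
z = (w_plus - 0.5 - mu) / sigma                 # continuity correction
print(z, 2 * stats.norm.sf(z))                  # ~0.94, two-sided P ~0.347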
15.25. µ_W+ = (8)(9)/4 = 18, σ_W+ = √((8)(9)(17)/24) = 7.1414.
15.27. W+ = 35.
15.29. (a) n = 20. (b) W+ = 192.0. (c) P-value = 0.001. We reject the null hypothesis of no difference in
distributions and conclude one distribution is systematically higher than the other. (d) The estimated
median is 2.925. With n = 20, the median should be the average of the middle two absolute value
differences.
15.31. H 0 : median = 0 versus H a : median > 0. From software, W+ = 119, P-value = 0.001, which is strong
evidence that there are more aggressive incidents during Moon days. This agrees with the results of the t
and permutation tests.
15.33. (a) With this additional subject, six of the seven subjects rated drink A higher, and (as before) the
subject who preferred drink B only gave it a 2-point edge. The difference for the seventh subject will be
influential for a t test, because its difference (30) is an outlier. (b) For testing H 0 : µ d = 0 versus H a : µ d ≠
0, we have x = 7.8571 and s = 10.3187, so t = 2.01 (df = 6) and P-value = 0.091. (c) For testing H 0 :
Ratings have the same distribution for both drinks versus H a : One drink is systematically rated higher;
from software, we have W+ = 26.5 and P-value = 0.043. (d) The new data point is an outlier, which may
make the t procedure inappropriate. This also increases the standard deviation of the differences, which
makes t insignificant. The Wilcoxon test is not sensitive to outliers, and the extra data point makes it
powerful enough to reject H 0 .
15.35. (a) The data are right-skewed with no outliers. (b) From software: W+ = 31, P-value = 0.556. The
data do not show a systematic difference between the observations and 105.

15.37. We want to compare the attractiveness scores for k = 5 independent samples (the 102, 302, 502,
702, and 902 friend groups of subjects). Under the null hypothesis for ANOVA, each population is N(µ,
σ). An F test is used to compare the group means. The Kruskal–Wallis test only assumes a continuous
distribution in each population, and it uses a chi-square distribution for the test statistic.
15.39. Minitab gives the median of each group and the average rank for each group. Using the “adjusted
for ties” values, H = 17.05, P-value = 0.002. This indicates that the distributions are not all the same;
from the average ranks, the group with the lowest average rank was the 102 friend group; the group with
the highest average rank was the 302 friend group.
15.41. (a) H 0 : The distribution of BMD is the same for all three groups versus H a : At least one group is
systematically higher or lower. From software, H = 9.12, df = 2, P-value = 0.010. (b) In the solution to
Exercise 12.45, ANOVA yielded F = 7.72 (df 2 and 42) and P = 0.001. The ANOVA evidence is slightly
stronger, but (at α = 0.05) the conclusion is the same.
15.43. (a) Diagram below. (b) The stemplots (below) suggest greater density for high-jump rats and a
greater spread for the control group. (c) From software, H = 10.68 with P-value = 0.005. We conclude
that bone density differs among the groups. ANOVA tests H_0: all means are equal, assuming Normal
distributions with the same standard deviation. For Kruskal–Wallis, the null hypothesis is that the
distributions are the same (but not necessarily Normal). (d) There is strong evidence that the three groups
have different bone densities; specifically, the high-jump group has the highest average rank (and the
highest density), the low-jump group is in the middle, and the control group is lowest.

Stem   Control   Low Jump   High Jump
55     4
56     9
57
58               8
59     33        469
60     03        57
61     14
62     1                    2266
63               1258       1
64                          33
65     3                    00
66
67                          4

15.45. The table below gives the summary statistics. H 0 : all three varieties have the same length
distribution versus H a : at least one variety is systematically longer or shorter. From software, H = 45.36,
df = 2, P-value < 0.0005. There is evidence that at least one species has different lengths.

Variety n x s M
bihai 16 47.5975 1.2129 47.12
red 23 39.7113 1.7988 39.16
yellow 15 36.1800 0.9753 36.11
15.47. (a) On the right is a histogram of service times for Verizon customers. With only 10 CLEC service
calls, it is hardly necessary to make such a graph for them; we can simply observe that seven of those 10
calls took 5 hours, which is quite different from the distribution for Verizon customers. The means and
medians tell the same story: xV = 1.7263 hr, M V = 1 hr and xC = 3.80 hr, M C = 5 hr. (b) The CLEC
distribution has only two values (1 and 5) with n = 10. The t test is not reliable in situations like this.
From software, the Wilcoxon rank-sum test gives W = 786.5, which is highly significant (P-value =
0.0026 or 0.0006). We have strong evidence that response times for CLEC customers are longer. It is also
possible to apply the Kruskal–Wallis test (with two groups). While the P-values are slightly different (P-
value = 0.0052 or 0.0012 adjusted for ties), the conclusion is the same: We have strong evidence of a
difference in response times. (c) The nonparametric test makes no assumption about the Normality of the
sample means; the mean of the CLEC distribution cannot be Normal with our data.

15.49. Use the Wilcoxon rank sum test with a two-sided alternative. Software provides the following: for
meat, W = 15 and P-value = 0.4705, and for legumes, W = 10.5 and P-value = 0.0433 (or 0.0421). There
is no evidence of a difference in iron content for meat, but for legumes the evidence is significant at α =
0.05.

15.51. (a) The three pairwise comparisons are bihai–red, bihai–yellow, and red–yellow. (b) The test
statistics and P-values are given in the Minitab output below; all P-values are reported as 0 to four
decimal places. (c) All three are easily significant at the overall 0.05 level.
Minitab Output: Wilcoxon Rank-Sum Test for Bihai – Red
bihai N = 16 Median = 47.120
red N = 23 Median = 39.160
W = 504.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
Wilcoxon Rank-Sum Test for Bihai – Yellow
bihai N = 16 Median = 47.120
yellow N = 15 Median = 36.110
W = 376.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
Wilcoxon Rank-Sum Test for Red – Yellow
red N = 23 Median = 39.160
yellow N = 15 Median = 36.110
W = 614.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
Chapter 16 Bootstrap Methods and Permutation Tests
16.1. Answers will vary. (a) Not shown; students’ answers will vary substantially. (b) The theoretical
mean is 4.368; from our bootstrap, the mean is 4.316. (c) Not shown. (d) The theoretical standard error is
s/√n = 2.32/√6 = 0.947; from our bootstrap, the standard error is 0.920.

16.3. (a) Answers will vary, but a Normal quantile plot indicates that a Normal distribution cannot be
rejected. (b) Answers will vary. (c) The means could range from 24 to 140; however, based on 2000
resamples, the means should range between 33 and 122.
16.5. The mean of the bootstrap distribution is about 75 (indicated by the dashed line below the
histogram); it is roughly Normal, but the peak in the center is too high to be truly Normal. The Normal
quantile plot is straight, except for some straggling at the low end (and a bit at the high end). Based on
these plots, the bootstrap distribution is approximately Normal.
16.7. (a) The standard deviation of the bootstrap distribution will be approximately s/√n. (b) Bootstrap
samples are drawn with replacement from the original sample. (c) Use a sample size equal to the original
sample size. (d) The bootstrap distribution is created by sampling with replacement from the original
sample, not the population.
16.9. This bootstrap distribution of price appears Normal.

16.11. This bootstrap distribution of age appears Normal.


16.13. Bootstrap standard errors will vary. (a) s = 14.8; s/√60 = 1.911. From our bootstrap, the standard
error is 1.932. (b) s = 903.4; s/√71 = 107.21. From our bootstrap, the standard error is 108.1. (c) s =
14.854; s/√8 = 5.25. From our bootstrap, the standard error is 4.966.
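
The bootstrap standard errors here all come from the same recipe: resample the data with replacement many times and take the standard deviation of the resampled means. A Python sketch (the data vector is a stand-in, not the exercise data; with the actual n = 60 sample from part (a), SE_boot should land near 1.9):

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(50, 14.8, 60)        # stand-in sample; use the real data in practice
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(2000)]
print(np.std(boot_means, ddof=1))      # SE_boot; should be close to s/sqrt(n)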

16.15. (a) The bootstrap distribution, shown below, looks slightly less skewed than the one in the
previous exercise (possibly because some of the extreme outliers were not included). (b) The bootstrap
standard error for the sample of 80 calls is 36.56. For the sample of 10 calls, the standard error is 29.55.
This is smaller (not larger), possibly due to the lack of the extremely long calls in the small sample.

16.17. (a) The bootstrap distribution is right-skewed (refer to the solution to Exercise 16.12); a t interval
may not be appropriate. (b) Answers will vary based on a student’s bootstrap. For our bootstrap, the t
interval is 354 ± 2.021(42.99) = (267.1, 440.9) using t* for df = 49 (use 40). (c) The interval reported in
Example 7.11 is 266.6 to 441.6 seconds.
16.19. The summary statistics given in Example 16.6 include standard deviations s_1 = 0.859 for males
and s_2 = 0.748 for females, so SE_D = √(0.859²/91 + 0.748²/59) = 0.1326. The standard error reported by
the S-PLUS bootstrap routine in Example 16.6 is 0.1327419 (very close).
16.21. With n = 41, df = 40, so t* = 2.021. The interval is 109.8 ± (2.021)(8.2) = (93.23, 126.37).
16.23. Answers will vary based on students’ bootstraps. (a) With n = 1046, df = 1045 (use 1000), so t* =
1.962. Using our bootstrap, the interval is 29.89 ± (1.962)(0.494) = (28.92, 30.86). (b) The standard
deviation of the original sample is s = 14.41, so the interval is 29.88 ± (1.962)(14.41/√1046) = (29.00,
30.75). The intervals are almost identical. (c) Answers will vary.
16.25. Answers will vary based on a student’s bootstrap. (a) The bootstrap mean is 195.8, and the original
mean is 196.575, so the bias is small. The graphs are given in the solution to Exercise 16.14. The
bootstrap distribution is right-skewed but reasonably close to Normal. (b) With n = 80, df = 79 (use 60),
so t* = 2.000. Using our bootstrap, the interval is 195.8 ± (2.000)(36.56) = (122.68, 268.92). (c) SE_boot =
36.56; the formula SE is s/√80 = 342/√80 = 38.24. The regular t interval is 196.575 ± (2.000)(38.24) =
(119.1, 272.1).
16.27. The bootstrap distribution of the standard deviation (plots below) looks quite Normal. This
particular resample had SE_boot = 0.0489 and mean 0.814. The original sample of GPAs had s = 0.817, so
there is little bias. With n = 150, we can use df = 100, t* = 1.984, and find a 95% confidence interval for
the population standard deviation of 0.817 ± (1.984)(0.0489) = 0.7120 to 0.9140.

Typical Ranges
SE_boot    0.047 to 0.05
t lower    0.718 to 0.724
t upper    0.910 to 0.916

[Figures: bootstrap distribution of the GPA standard deviation (histogram of resample standard deviations and Normal quantile plot).]
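One reasonable way to reproduce numbers like these is with the boot package; this is a sketch only, which assumes a numeric vector gpa holding the 150 GPAs from the data file (not reproduced here) and uses an arbitrary R = 2000 resamples:

# Bootstrap of the sample standard deviation, as in Exercise 16.27.
library(boot)
sd.stat <- function(data, i) sd(data[i])    # statistic: std. dev. of the resample
b <- boot(gpa, sd.stat, R = 2000)
sd(b$t[, 1])                                # SEboot for the standard deviation
mean(b$t[, 1]) - sd(gpa)                    # estimated bias
sd(gpa) + c(-1, 1) * qt(0.975, df = 100) * sd(b$t[, 1])   # 95% bootstrap t interval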

16.29. (a) The data appear to be roughly Normal, though with the typical random gaps and bunches that
usually occur with relatively small samples. It appears from both the histogram and quantile plot that the
mean is slightly larger than zero, but the difference is not large enough to rule out a N(0, 1) distribution.
(b) The bootstrap distribution is extremely close to Normal with no appreciable bias. (c) The typical SE is
0.1308, and the t interval is −0.1357 to 0.3854. Typical results for SEboot and the bootstrap t interval are
in the table.

Typical Ranges
Bias: −0.016 to 0.02
SEboot: 0.12 to 0.14
t lower: −0.16 to −0.11
t upper: 0.36 to 0.41

[Figures: histogram and Normal quantile plot of the N(0, 1) data set; histogram and Normal quantile plot of the bootstrap distribution of the mean.]

16.31. (a) The sample standard deviation is s = 4.4149 mpg. (b) The typical range for SEboot is in the
table. (c) SEboot is quite large relative to s, suggesting that s is not an accurate estimate. (d) There is
substantial negative bias and some skewness, so a t interval is probably not appropriate.

Typical Ranges
Bias: −0.22 to −0.09
SEboot: 0.55 to 0.65
t lower: 3.07 to 3.26
t upper: 5.57 to 5.76

16.33. (a) The distribution of x̄ is N(26, 27/√n). (b) SEboot ranged from about 6.49 to about 7.08 (these
values will depend on the original random sample; our sample had a standard deviation of 22.5, which is
quite a bit smaller than the population standard deviation). (c) For n = 40, the original random sample had
s = 29.21. SEboot ranged from about 4.2 to 4.7. For n = 160, SEboot ranged from about 2.02 to about 2.25
(the original sample had s = 26.73). (d) We note that none of these histograms looks exactly Normal and
that the standard error decreases as n increases.
[Figures: bootstrap distributions of the resample mean for N(26, 27) samples with n = 10, n = 40, and n = 160.]

16.35. Student answers should vary depending on their samples. Students should notice that the bootstrap
distributions are approximately Normal for larger sample sizes. For small samples, the sample could be
skewed one way or the other in Exercise 16.33, and most should be right-skewed for Exercise 16.34.
Some of that skewness should come through into the bootstrap distribution.
16.37. (a) Both graphs indicate the bootstrap distribution is roughly Normal. The bootstrap mean is 7.881
and the original mean is 7.87, so the bias is small. (b) With n = 21, df = 20, so t* = 2.086. Using our
bootstrap, the interval is 7.881 ± (2.086)(1.221) = (5.33, 10.43). (c) The standard deviation of the original
sample is s = 5.65, so the interval is 7.87 ± (2.086)(5.65/√21) = (5.30, 10.44). The intervals are nearly
identical.

16.39. The bootstrap distribution is right-skewed; an interval based on t would not be appropriate. The
original sample had the statistic of interest θ̂ = 1.21; the bootstrap distribution had a sample mean a bit
higher because the bias is 0.045, with SEboot = 0.2336. The BCa confidence interval is (0.766, 1.671),
which is located about 0.11 higher than the regular bootstrap t interval, (0.653, 1.554), and lower than the
percentile interval, (0.860, 1.762).
16.41. (a) The bootstrap percentile and t intervals are similar, suggesting that the t intervals are
acceptable. (b) Every interval (percentile and t) includes 0.
Note: In the solution to Exercise 16.29, the percentile intervals were always 70% to 80% as wide as the t
intervals (because of the heavy tails of that bootstrap distribution). In this case, the width of the
percentile interval is 93% to 106% of the width of the t interval.

Typical Ranges
t lower: −0.16 to −0.11
t upper: 0.36 to 0.41
Percentile lower: −0.17 to −0.09
Percentile upper: 0.35 to 0.42

16.43. The 95% t interval given in Example 16.5 is 2.794 to 3.106. The 95% bootstrap percentile interval
is 2.793 to 3.095, as given in Example 16.8. The differences are small relative to the width of the
intervals, so they do not indicate appreciable skewness.
16.45. One set of 1000 repetitions gave the BCa interval as (0.4503, 0.8049). We see the bootstrap
distribution is left-skewed and that there is one possible high outlier as well. The lower end of the BCa
interval typically varies between 0.42 and 0.46 while the upper end varies between 0.795 and 0.805.
These intervals are lower than those found in Example 16.10.

16.47. The percentile interval is shifted to the right relative to the bootstrap t interval. The more accurate
intervals are shifted even farther to the right.
16.49. The interval students will report will vary; typical ranges are given in the table. We note that the
locations of the intervals are quite different (especially the low ends); this mirrors what is seen in
Exercise 16.47. These intervals are much lower (and narrower) than those in Exercise 16.47.

Typical Ranges
SEboot: 27.9 to 29.5
t lower: 48 to 52
t upper: 178 to 181
Percentile lower: 60 to 67
Percentile upper: 169 to 177
BCa lower: 66 to 72
BCa upper: 176 to 195

16.51. Typical ranges for the endpoints of the BCa interval (using males − females) are given in the
table. This interval is comparable with the −0.416 to 0.118 found in Example 16.6.

Typical Ranges
BCa lower: −0.427 to −0.372
BCa upper: 0.087 to 0.133

16.53. (a) The bootstrap distribution is left-skewed; simple bootstrap inference is inappropriate. (b) The
percentile interval is consistently about 0.941 to 0.995, while the BCa interval is 0.934 to 0.993. These
agree fairly well, but the BCa interval is shifted left and a bit wider. We do have significant evidence that
the correlation is not 0.

16.55. (a) The regression line is Rating = 26.724 + 1.207 PricePerLoad. (b) The ends of the bootstrap
distribution do not look very Normal; a t interval may not be appropriate. (c) The typical standard error of
the slope is 0.2846. With t* = 2.074 (df = 22), the typical confidence interval would be 1.207 ±
(2.074)(0.2846), or 0.6167 to 1.7973. All these intervals seem to be located higher.

Typical Ranges
SEboot: 0.228 to 0.242
t lower: 0.705 to 0.734
t upper: 1.68 to 1.71
Percentile lower: 0.83 to 0.86
Percentile upper: 1.74 to 1.81
BCa lower: 0.71 to 0.80
BCa upper: 1.62 to 1.69
16.57. The regression equation is Spending = 0.03567 + 2.91895 Population. (a) The residual plots
are shown below. The scatterplot indicates a possible increase in variability with population as well as
several outliers. The Normal plot also indicates potential outliers on both ends of the distribution. (b) We
see indications of outliers in both the histogram and the Normal quantile plot of the bootstrap distribution.
(c) The standard confidence interval for the slope is 2.91895 ± (2.021)(0.20152) = 2.512 to 3.326. All
other intervals are similar.
16.59. No, because we believe that one population has a smaller variability. In order to pool the data, the
permutation test requires that both populations be the same when H0 is true.
16.61. Enter the data with the score given to the phone and an indicator for each design. We have
hypotheses H0: µ1 = µ2 (no difference in preference) and Ha: µ1 ≠ µ2 (there is a preference for one of the
phones, but we don’t know which one). Resample the design indicators (without replacement) to scramble
them. Compute the mean score for each scrambled design group. Repeat the process many times. The P-
value of the test will be the proportion of resamples where the resampled difference in group means is
larger than the observed difference (in absolute value).
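A small R sketch of this procedure follows; the vectors score and design are hypothetical stand-ins for the actual data, and 5000 resamples is an arbitrary choice:

# Permutation test for a difference in mean scores between two designs.
set.seed(1)
score  <- c(5, 7, 6, 8, 4, 9, 7, 6)            # hypothetical ratings
design <- rep(c("A", "B"), each = 4)           # hypothetical design labels
obs <- abs(diff(tapply(score, design, mean)))  # observed |difference in means|
perm <- replicate(5000, {
  shuffled <- sample(design)                   # scramble labels, without replacement
  abs(diff(tapply(score, shuffled, mean)))
})
mean(perm >= obs)                              # two-sided permutation P-value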
16.63. If there is no relationship, we have H0: ρ = 0. We test this against Ha: ρ ≠ 0 (there is a relationship
of unspecified direction). Because there is no relationship under H0, we can resample one of the variables,
say screen satisfaction (without replacement), and compute the correlation between that and the original
scores for keyboard satisfaction. Repeat the process many times, keeping track of the proportion of
resamples where the correlation is greater in absolute value than that found in the original data. That
proportion is the P-value for the test.
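The same idea in a short R sketch, with invented satisfaction scores standing in for the real data:

# Permutation test for a correlation: permute one variable, recompute r.
set.seed(1)
screen   <- c(4, 5, 3, 4, 5, 2, 4, 3, 5, 4)    # hypothetical scores
keyboard <- c(3, 5, 2, 4, 4, 2, 5, 3, 4, 4)
obs  <- cor(screen, keyboard)
perm <- replicate(5000, cor(sample(screen), keyboard))
mean(abs(perm) >= abs(obs))                    # two-sided P-value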

16.65. (a) The observed difference in means is (57 + 61)/2 − (42 + 62 + 41 + 28)/4 = 59 − 43.25 = 15.75. (b)
Student results will vary, but should be one of the 15 (equally likely) possible values: –21, –20.25, –10.5,
–9, –6, –5.25, 0.75, 1.5, 3.75, 4.5, 4.5, 5.25, 15.75, 16.5, 19.5. (c) The histogram shape will vary
considerably. (d) Out of 20 resamples, the number that yields a difference of 15.75 (or more) has a
binomial distribution with n = 20 and p = 3/15, so students should get between 0 and 8 or 9 resamples that
give a value of 15.75 or larger, for a P-value ranging between 0 and 0.45. (e) As was noted in part (b),
three of the 15 equally likely resample possibilities give a difference of means greater than or equal to the
observed value, so the exact P-value is 3/15 = 0.2.
Note: To determine the 15 possible values, note that the six numbers sum to 291. If the first two numbers
add up to T, then the other four will add up to 291 − T, and the difference in means will be
T/2 − (291 − T)/4 = (3/4)T − 72.75. The values of T range from 28 + 41 = 69 to 61 + 62 = 123.
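The enumeration in the Note can be checked directly in R; this sketch lists all 15 equally likely differences and the exact P-value:

# All C(6,2) = 15 ways to choose which two of the six values form group 1.
vals  <- c(57, 61, 42, 62, 41, 28)
obs   <- mean(vals[1:2]) - mean(vals[-(1:2)])   # observed difference, 15.75
diffs <- apply(combn(6, 2), 2, function(i) mean(vals[i]) - mean(vals[-i]))
sort(diffs)          # the 15 equally likely differences listed in part (b)
mean(diffs >= obs)   # exact P-value: 3/15 = 0.2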
16.67. (a) H0: µEE = µLE and Ha: µEE ≠ µLE (the question of interest in Example 7.16 was whether we
could conclude the two groups were not the same). (b) x̄EE = 11.56, sEE = 4.306, x̄LE = 5.12, sLE = 4.622.
The test statistic is t = 2.28, df = 8, P-value = 0.0521. At the 0.05 level, we fail to reject the null
hypothesis that the two groups have the same mean weight loss, while noting that the P-value is just
slightly above 0.05. (c) With an observed difference in means of 6.44, the P-value for the permutation test
is the proportion of times the permuted sample means have a difference at least 6.44 in absolute value.
Many repetitions of the permutation test found typical P-values between 0.048 (where the null hypothesis
would be rejected at the 0.05 level) to 0.07 (where the null would not be rejected). (d) The results of one
set of 1000 resamples are shown below. Others will be similar. We note this BCa interval does not
contain 0, so we will reject equality of the mean weight loss. Many repetitions of the boot procedures
found a minimum value for the low end of the BCa interval of 0.35.
Intervals:
Level Normal Basic
95% (1.431, 11.756) (1.596, 11.722)
Level Percentile BCa
95% (1.158, 11.284) (1.480, 11.498)

16.69. (a) The two populations should be the same shape, but skewed—or otherwise clearly non-
Normal—so that the t test is not appropriate. (b) Either test is appropriate if the two populations are both
Normal with the same standard deviation. (c) We can use a t test but not a permutation test if both
populations are Normal with different standard deviations.
16.71. (a) We test H0: μ = 0 versus Ha: μ > 0, where μ is the population mean difference before and after
the summer language institute. We find t = 3.86, df = 19, and P-value = 0.0005. (b) The Normal quantile
plot looks odd because we have a small sample, and all differences are integers. (c) The P-value is almost
always less than 0.002. Both tests lead to the same conclusion: The difference is statistically significant
(the language institute did help comprehension).

[Figure: Normal quantile plot of gains in spoken French scores.]
[Figure: permutation distribution of the correlation (Normal quantile plot and histogram).]

16.73. (a) We have H0: ρ = 0 versus Ha: ρ ≠ 0. (b) The observed correlation is r = 0.671. We create
permutation samples and observe the proportion with correlations at least 0.671 in absolute value. We
should find a P-value of 0.002 or less. We conclude there is a correlation between price and rating for
laundry detergents.
16.75. For testing H0: σ1 = σ2 versus Ha: σ1 ≠ σ2, the permutation test P-value will almost always be
between 0.065 and 0.095. In the solution to Exercise 7.107, we found F = 1.506 with df = 29 and 29, for
which P-value = 0.2757—three or four times as large. In this case, the permutation test P-value is
smaller, which is typical of short-tailed distributions.
16.77. For the permutation test, we must resample in a way that is consistent with the null hypothesis.
Hence, we pool the data—assuming that the two populations are the same—and draw samples (without
replacement) for each group from the pooled data. For the bootstrap, we do not assume that the two
populations are the same, so we sample (with replacement) from each of the two data sets separately,
rather than pooling the data first.
Note: Shown below is the bootstrap distribution for the ratio of means; comparing this with the
permutation distribution from the previous solution illustrates the effect of the resampling method.
16.79. (a) We will test H0: µ1 = µ2 versus Ha: µ1 ≠ µ2, (males are coded as 1 in the data file). The observed
mean for males was 2.7835, and the observed mean for females was 2.9325. We seek the proportion of
resamples where the absolute value of the difference was at least 0.149. The P-value should generally be
between 0.25 and 0.32. This test finds no significant difference in GPA between the two sexes. (b) We
test H0: σ1/σ2 = 1 versus Ha: σ1/σ2 ≠ 1. The observed ratio is 0.8593/0.7477 = 1.149. The P-value is
generally between 0.235 and 0.270. We fail to detect a difference in the standard deviations of GPAs for
males and females. (c) Responses will vary.
16.81. (a) The standard test of H0: σ1 = σ2 versus Ha: σ1 ≠ σ2 leads to F = 0.3443 with df 13 and 16; P-
value = 0.0587. (b) The permutation P-value is typically between 0.02 and 0.03. (c) The P-values are
similar, even though, technically, the permutation test is significant at the 5% level, while the standard
test is (barely) not. Because the samples are too small to assess Normality, the permutation test is safer.
(In fact, the population distributions are discrete, so they cannot follow Normal distributions.)
16.83. The 95% t interval is narrower in some cases than the percentile interval. Typical results are
shown in the table. In Example 16.8, the percentile interval is 2.793 to 3.095 (a bit narrower and higher
than these) and the t interval is 2.80 to 3.10 (again, higher and narrower). This is at least in part explained
by eliminating more observations on either end with the 25% trim.

Typical Ranges
t lower: 2.748 to 2.757
t upper: 3.012 to 3.027
Percentile lower: 2.731 to 2.760
Percentile upper: 3.004 to 3.028

16.85. (a) The correlation for males is 0.4657. Because the bootstrap distribution does not look Normal,
we’ll focus on the percentile interval. The lower end of the percentile intervals ranged from 0.269 to
0.286, with the upper end ranging from 0.623 to 0.625. (b) The correlation for females is 0.3649. Again
focusing on the percentile interval, the intervals are wider. The low end of the percentile interval ranged
from 0.053 to 0.081 while the upper end ranged from 0.581 to 0.604. (c) Use a function like the one
below to bootstrap the difference in correlations. In the plot shown, we can clearly see that the bootstrap
distribution of the differences in correlations is very Normal. It is also clear that 0 will be included in the
interval [the interval for this bootstrap set was (−0.2426, 0.4229)]. All intervals examined had a low end
between −0.24 and −0.22 with a high end between 0.40 and 0.42. We can conclude that there is no
significant difference in the correlation between high school math grades and college GPA by gender.
> cd = function(D, i) {
+   # rows i are the resample indices; columns 2:3 hold GPA and HSM,
+   # and column 9 codes sex (1 = male, 2 = female)
+   M = subset(D[i, 2:3], D[i, 9] == 1)
+   F = subset(D[i, 2:3], D[i, 9] == 2)
+   cm = cor(M$GPA, M$HSM)
+   cf = cor(F$GPA, F$HSM)
+   return(cm - cf)   # difference in correlations, male minus female
+ }
16.87. The bootstrap distribution looks quite Normal, and (as a consequence) all of the bootstrap
confidence intervals are similar to each other and also are similar to the standard (large-sample)
confidence interval: 0.0112 to 0.0681.

Typical Ranges
Bias: −0.01 to 0.02
SEboot: 0.12 to 0.15
BCa lower: 0.97 to 1.06
BCa upper: 1.47 to 1.58
Percentile lower: 0.99 to 1.05
Percentile upper: 1.49 to 1.58
16.89. (a) Thirty-two poets died at an average age of 63.19 years (s = 17.30), with median age 68.
Twenty-four nonfiction writers died at an average age of 76.88 years (s = 14.10), with median age 77.5.
In the boxplots, we can clearly see that poets seem to die younger. Both distributions are somewhat
left-skewed, and the nonfiction writer who died at age 40 is a low outlier. (b) Using a two-sample t test,
we find that testing H0: µN = µP against Ha: µN ≠ µP gives t = 3.26 with P-value = 0.002 (df = 53). A 95%
confidence interval for the difference in mean ages is (5.27, 22.11): nonfiction writers seem to live on
average between 5.27 and 22.11 years longer than poets, at 95% confidence. (c) The bootstrap
distribution is symmetric and seems close to Normal, except at the ends of the distribution, so a bootstrap
t interval should be appropriate. The low ends of the bootstrap interval are typically between 5.26 and
5.82; the high ends are typically between 21.26 and 22.15. One particular interval seen was (5.40, 21.36).
Note this interval is a bit narrower than the two-sample t interval.

[Figure: side-by-side boxplots of writers’ age at death, nonfiction versus poems.]
16.91. The R permutation test for the mean ages returns a P-value of 0.006 (comparable with the 0.002
from the t test) in this instance. The 99% confidence interval for the P-value is between 0.0002 and
0.0185. We can determine there is a difference in mean age at death between poets and nonfiction writers.
Exact Permutation Test Estimated by Monte Carlo

data: Age by Type1
P-value = 0.006
alternative hypothesis: true mean Type1=2 - mean Type1=3 is not equal to 0
sample estimates:
mean Type1=2 - mean Type1=3
-13.6875

P-value estimated from 999 Monte Carlo replications
99 percent confidence interval on P-value:
0.0002072893 0.0184986927

16.93. All answers (including the shape of the bootstrap distribution) will depend strongly on the initial
sample of uniform random numbers. The median M of these initial samples will be between about 0.36
and 0.64 about 95% of the time; this is the center of the bootstrap t confidence interval. (a) For a uniform
distribution on 0 to 1, the population median is 0.5. Most of the time, the bootstrap distribution is quite
non-Normal; three examples are shown below. (b) SEboot typically ranges from about 0.04 to 0.12 (but
may vary more than that, depending on the original sample). The bootstrap t interval is therefore roughly
M ± 2SEboot. (c) The more sophisticated BCa and tilting intervals may or may not be similar to the
bootstrap t interval. The t interval is not appropriate because of the non-Normal shape of the bootstrap
distribution and because SEboot is unreliable for the sample median (it depends strongly on the sizes of the
gaps between the observations near the middle).
Note: Based on 5000 simulations of this exercise, the bootstrap t interval M ± 2SEboot will capture the
true median (0.5) only about 94% of the time (so it slightly underperforms its intended 95% confidence
level). In the same test, both the percentile and BCa intervals included 0.5 over 95% of the time, while at
the same time being narrower than the bootstrap t interval nearly two-thirds of the time. These two
measures (achieved confidence level and width of confidence interval) both confirm the superiority of the
other intervals. The bootstrap percentile, BCa, and tilting intervals do fairly well despite the high
variability in the shape of the bootstrap distribution. They give answers similar to the exact rank-based
confidence intervals obtained by inverting hypothesis tests. One variation of tilting intervals matches the
exact intervals.
16.95. See Exercise 8.63 for more details about this survey. The bootstrap distribution appears to be
close to Normal; bootstrap intervals are similar to the large-sample interval (0.3146 to 0.3854).

Typical Ranges
t lower: 0.31 to 0.32
t upper: 0.38 to 0.39
Percentile lower: 0.30 to 0.32
Percentile upper: 0.38 to 0.39

16.97. (a) This is the usual way of computing percent change: 89/54 − 1 = 1.65 – 1 = 0.65. (b) Subtract 1
from the confidence interval found in Exercise 16.96; this typically gives an interval similar to 0.55 to
0.75. (c) Preferences will vary.
16.99. (a) The mean ratio is 1.0596; the usual t interval is 1.0596 ± (2.262)(0.02355) = 1.0063 to 1.1128.
The bootstrap distribution for the mean is close to Normal, and the bootstrap confidence intervals (typical
ranges in the table) are usually similar to the usual t interval but slightly narrower. Bootstrapping the
median produces a clearly non-Normal distribution; the bootstrap t interval should not be used for the
median. (Ranges for median intervals are not given.) (b) The ratio of means is 1.0656; the bootstrap
distribution is noticeably skewed, so the bootstrap t is not a good choice, but the other methods usually
give intervals similar to 0.75 to 1.55. We also examined the bootstrap distribution for the ratio of the
medians; it is considerably less erratic than the median ratio, but we have still not included these
confidence intervals. (c) For example, the usual t interval from part (a) could be summarized by the
statement, “On average, Jocko’s estimates are 1% to 11% higher than those from other garages.”

Typical Ranges
(a) Mean ratio: t lower 1.00 to 1.02; t upper 1.10 to 1.12; Percentile lower 1.00 to 1.03; Percentile upper 1.09 to 1.11; BCa lower 1.00 to 1.03; BCa upper 1.09 to 1.11
(b) Ratio of means: t lower 0.59 to 0.68; t upper 1.46 to 1.54; Percentile lower 0.69 to 0.78; Percentile upper 1.45 to 1.64; BCa lower 0.69 to 0.78; BCa upper 1.45 to 1.66
Chapter 17 Statistics for Quality: Control and Capability
17.1. Answers will vary. For example, if brushing teeth, a possible flow chart could be

17.3. Answers will vary. Common causes for brushing teeth might be variability in the actual amount of
toothpaste used each time. Special causes might include having eaten unusually sticky foods, such as
caramels.

17.5. CL = µ = 80, UCL = 80 + 3(22/√5) = 109.52, LCL = 80 − 3(22/√5) = 50.48.
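The limits can be checked with a one-line R computation:

# Control limits for an x-bar chart with mu = 80, sigma = 22, n = 5.
mu <- 80; sigma <- 22; n <- 5
mu + c(-3, 3) * sigma / sqrt(n)   # LCL = 50.48, UCL = 109.52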

17.7. Answers here will vary but could include such steps as deciding on the order, locating the phone
number, making the call, etc.
17.9. As shown in the Pareto chart, most of the problems occur in the categories relating to the color coat.

17.11. Answers will vary. Possible causes might include the store being unusually busy or heavy traffic
during delivery, etc.

17.13. (a) For n = 10, the x̄ chart has CL = 1.8, with control limits μ ± 3σ/√10 = 1.658 and 1.942 inches.
The s chart has CL = 0.9727(0.15) = 0.146, with control limits B5σ = (0.276)(0.15) = 0.0414 inch and B6σ
= (1.669)(0.15) = 0.2504 inch. (b) For n = 2, the x̄ chart has CL = 1.8, with control limits μ ± 3σ/√2 =
1.482 and 2.118 inches. The s chart has CL = 0.7979(0.15) = 0.1197, with control limits B5σ = 0 and B6σ
= (2.606)(0.15) = 0.3909 inch. (c) There are 2.54 cm to an inch, so CL = (1.8)(2.54) = 4.572 cm, with
limits (1.575)(2.54) = 4.0005 cm and (2.025)(2.54) = 5.1435 cm. The new center line for the s chart will
be (0.1382)(2.54) = 0.3510 cm with control limits 0 and (0.3132)(2.54) = 0.7955 cm.
[Figures for Exercise 17.15: x̄ chart of meat weights (CL = 1.014, LCL = 0.98109, UCL = 1.04691) and s chart (CL = 0.01684, LCL = 0, UCL = 0.04324).]

17.15. (a) For the x̄ chart, CL = μ = 1.014 lb, with control limits μ ± 3σ/√3 = 0.9811 and 1.0469 lb. (b)
For n = 3, c4 = 0.8862, B5 = 0, and B6 = 2.276. For the s chart, we have CL = (0.8862)(0.019) = 0.01684
lb, with control limits 0 and (2.276)(0.019) = 0.04324 lb. (c) The control charts are shown above. (d) The
s chart is in control, but there were two out-of-control signals on the x̄ chart at samples 10 and 12.

17.17. (a) The center line is at μ = 11.5 Kp; the control limits should be at μ ± 3σ/√4 = 11.5 ± 0.3 = 11.2
and 11.8 Kp. (b) Graphs below. Set A had two points out of control (the 12th sample and the last); Set C
has the last two out of control. (c) Set B is from the in-control process. The process mean shifted suddenly
for Set A; it appears to have changed close to the 11th or 12th sample. The mean drifted gradually for the
process in Set C.

[Figures: x̄ charts for Data Sets A, B, and C, each with CL = 11.5 and control limits 11.2 and 11.8.]
17.19. For the s chart with n = 6, we have c4 = 0.9515, B5 = 0.029, and B6 = 1.874, so CL =
(0.9515)(0.002) = 0.001903 inch, and the control limits are (0.029)(0.002) = 0.000058 and (1.874)(0.002)
= 0.003748 inch. For the x̄ chart, CL = μ = 0.85 inch, and the control limits are μ ± 3σ/√6 = 0.85 ±
0.00245 = 0.8476 and 0.8525 inch.

17.21. For the x̄ chart, CL = 43, and the control limits are μ ± 3σ/√5 = 25.91 and 60.09. For n = 5, c4 =
0.9400, B5 = 0, and B6 = 1.964, so for the s chart CL = (0.9400)(12.74) = 11.9756, and the control limits
are 0 and (1.964)(12.74) = 25.02. The control charts show that Sample 5 was out of control on the s chart,
but it appears to have been special-cause variation, as there is no indication that the samples that followed
it were out of control.
17.23. (a) The process mean is the same as the center line: μ = 715. The control limits are three standard
errors from the mean, so 35 = 3σ/√4; solving gives σ = 23.3333. (b) If the mean changes to μ = 700, then
x̄ is approximately Normal with mean 700 and standard deviation σ/√4 = 11.6667, so x̄ will fall outside
the control limits with probability 1 − P(680 < x̄ < 750) = 1 − P(−1.71 < Z < 4.29) = 0.0436. (c) With μ
= 700 and σ = 10, x̄ is approximately Normal with mean 700 and standard deviation σ/√4 = 5, so x̄ will
fall outside the control limits with probability 1 − P(680 < x̄ < 750) = 1 − P(−4 < Z < 10) = 0.00003,
essentially 0; with the reduced process variation, the old limits almost never signal.
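A quick R check of the probabilities in (b) and (c):

# Chance that x-bar lands outside the fixed limits 680 and 750.
mu <- 700
se.b <- 23.3333 / sqrt(4)   # part (b): sigma = 23.3333
pnorm(680, mu, se.b) + pnorm(750, mu, se.b, lower.tail = FALSE)   # about 0.044
se.c <- 10 / sqrt(4)        # part (c): sigma = 10
pnorm(680, mu, se.c) + pnorm(750, mu, se.c, lower.tail = FALSE)   # about 0.00003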

17.25. The usual 3σ limits are μ ± 3σ/√n for an x̄ chart and (c4 ± 3c5)σ for an s chart. For 2σ limits,
simply replace “3” with “2.” (a) μ ± 2σ/√n. (b) (c4 ± 2c5)σ.

17.27. (a) Shrinking the control limits would increase the frequency of false alarms because the
probability of an out-of-control point when the process is in control will be higher (roughly 5% instead of
0.3%). (b) Quicker response comes at the cost of more false alarms. (c) The runs rule is better at detecting
gradual changes. (The one-point-out rule is generally better for sudden, large changes.)

17.29. We estimate σ̂ = s̄/c4 = 1.03/0.9213 = 1.1180, so the x̄ chart has center line 47.2 and control limits
47.2 ± 3σ̂/√4 = 45.523 and 48.877. The s chart has center line s̄ = 1.03 and control limits 0 and
(2.088)(σ̂) = 2.3344.

17.31. (a) The center line for the s chart will be at s̄ = 0.07177. For n = 4, c4 = 0.9213, and σ̂ = s̄/c4 =
0.07177/0.9213 = 0.07790. We’ll have UCL = 2.088(0.07790) = 0.16266 and LCL = 0. (b) None of the
sample standard deviations was greater than 0.16266, so variability is under control. (c) The center line
for the x̄ chart is 2.0078. The limits are 2.0078 ± 3σ̂/√4 = 2.0078 ± 0.1168 = 1.8910 to 2.1247 inches.
(d) These limits are slightly narrower than those found previously (1.85 to 2.15 inches).
17.33. Sketches will vary. One possible x̄ chart is shown, created with the assumption that the
experienced clerk processes invoices in an average of 2 min, while the new hire takes an average of 4 min.

[Figure: sketched x̄ chart with sample means alternating between about 2 and 4 minutes.]

Note: Such a process would not be considered to be in control for very long. Because both clerks “are
quite consistent, so that their times vary little from invoice to invoice,” each sample has a small value of
s, so the revised estimate of σ would likely be smaller. At that point, the control limits (based on that
smaller spread) will be moved closer to the center line.
17.35. (a) Average the 20 sample means and standard deviations and estimate μ to be μ̂ = x̄ = 2750.7 and
σ to be σ̂ = s̄/c4 = 345.5/0.9213 = 375.0. (b) In the s chart shown in Figure 17.7, most of the points fall
below the center line.
17.37. If the manufacturer practices SPC, it will provide some assurance that the phones are roughly
uniform in quality; as the text says, “We know what to expect in the finished product.” So, assuming that
uniform quality is sufficiently high, the purchaser does not need to inspect the phones as they arrive
because SPC has already achieved the goal of that inspection: to avoid buying many faulty phones. (Of
course, a few unacceptable phones may be produced and sold even when SPC is practiced—but
inspection would not catch all such phones anyway.)
17.39. The quantile plot does not suggest any serious deviations from Normality, so the natural tolerances
should be reasonably trustworthy.

[Figure: Normal quantile plot of losses ($).]

17.41. If we shift the process mean to 2500 mm, about 99% will meet the new specifications: P(1500 < X
< 3500) = P((1500 − 2500)/384 < Z < (3500 − 2500)/384) = P(−2.60 < Z < 2.60) = 0.9953 − 0.0047 =
0.9906.
17.43. The mean of the 17 in-control samples is x = 43.4118, and the standard deviation is 11.5833, so
the natural tolerances are x ± 3s = 8.66 to 78.16.

 44 − 43.4118 64 − 43.4118 
17.45. P(44 < X < 64) = P  < Z <  = P(0.05 < Z < 1.78) = 0.9625 – 0.5199
 11.5833 11.5833 
= 0.4426. Only about 44% of meters meet the specifications.
17.47. The limited precision of the measurements shows up in the granularity (stair-step appearance) of
the graph (above, right). Aside from this, there is no particular departure from Normality.
17.49. The quantile plot, while not perfectly linear, does not suggest any serious deviations from
Normality, so the natural tolerances should be reasonably trustworthy.

[Figure: Normal quantile plot of package weights.]
17.51. (a) (ii) A sudden change in the x chart: This would immediately increase the amount of time
required to complete the checks. (b) (i) A sudden change (decrease) in s or R because the new
measurement system will remove (or decrease) the variability introduced by human error. (c) (iii) A
gradual drift in the x chart (presumably a drift up, if the variable being tracked is the length of time to
complete a set of invoices).
17.53. The process is no longer the same as it was during the downward trend (from the 1950s into the
1980s). In particular, including those years in the data used to establish the control limits results in a mean
that is too high to use for current winning times and a standard deviation that includes variation
attributable to the “special cause” of the changing conditioning and professional status of the best runners.
Such special cause variation should not be included in a control chart.
17.55. LSL and USL are specification limits on the individual observations. This means that they do not
apply to averages and that they are specified as desired output levels, rather than being computed based on
observation of the process. LCL and UCL are control limits for the averages of samples drawn from the
process. They may be determined from past data, or independently specified, but the main distinction is
that the purpose of control limits is to detect whether the process is functioning “as usual,” while
specification limits are used to determine what percentage of the outputs meet certain specifications (are
acceptable for use).

17.57. For computing Ĉpk, note that the estimated process mean (2750.7 mm) lies closer to the USL. (a)
Ĉp = (4000 − 1000)/(6 × 383.8) = 1.3028 and Ĉpk = (4000 − 2750.7)/(3 × 383.8) = 1.0850. (b) Ĉp =
(3500 − 1500)/(6 × 383.8) = 0.8685 and Ĉpk = (3500 − 2750.7)/(3 × 383.8) = 0.6508.
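These indices follow directly from the standard formulas; a small R helper (the function name cap is ours) reproduces both parts:

# Cp = (USL - LSL)/(6 sigma); Cpk = min(USL - mu, mu - LSL)/(3 sigma).
cap <- function(lsl, usl, mu, sigma)
  c(Cp  = (usl - lsl) / (6 * sigma),
    Cpk = min(usl - mu, mu - lsl) / (3 * sigma))
cap(1000, 4000, 2750.7, 383.8)   # part (a): Cp = 1.3028, Cpk = 1.0850
cap(1500, 3500, 2750.7, 383.8)   # part (b): Cp = 0.8685, Cpk = 0.6508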

17.59. Using x̄ = 1.02997 lb and s = 0.0224 lb: (a) Ĉp = (1.09 − 0.95)/(6 × 0.0224) = 1.0417 and Ĉpk =
(1.09 − 1.03)/(3 × 0.0224) = 0.8929.
(b) Customers typically will not complain about a package that was too heavy.

17.61. (a) Cpk = (0.75 − 0.25)/(3 × 0.289) = 0.5767. Only 50% of the output meets the specifications. (b)
LSL and USL are 0.865 standard deviations above and below the mean, so the proportion meeting
specifications is P(−0.865 < Z < 0.865) = 0.6130. (c) The relationship between Cpk and the proportion of
the output meeting specifications depends on the shape of the distribution.

17.63. (a) Use the mean and standard deviation of the 85 remaining observations: μ̂ = x̄ = 43.4118 and
σ̂ = s = 11.5833. (b) Ĉp = (64 − 44)/(6σ̂) = 0.2878 and Ĉpk = 0 (because μ̂ is outside the specification
limits). This process has very poor capability: The mean is too low and the spread too great. Only about
44% of the process output meets specifications.
17.65. We have x̄ = 7.997 mm and s = 0.0025 mm; we assume that an individual bearing diameter X
follows a N(7.997, 0.0025) distribution. (a) P(7.992 < X < 8.000) = P((7.992 − 7.997)/0.0025 < Z <
(8.000 − 7.997)/0.0025) = P(−2 < Z < 1.2) = 0.8849 − 0.0228 = 0.8621, or 86.2%. (b) Ĉpk =
(8.000 − 7.996)/(3 × 0.0023) = 0.5797.

17.67. It is called Six-Sigma Quality because it allows for six or more standard deviations between the
process mean and each specification limit, instead of the usual three.

17.69. Students will have varying justifications for the sampling choice. Choosing six calls per shift gives
an idea of the variability and mean for the shift as a whole. If we took six consecutive calls (at a randomly
chosen time), we might see additional variability in x because sometimes those six calls might be
observed at particularly busy times (when a customer has to wait for a long time until a representative is

available or when a representative is using the restroom).


17.71. The outliers are 276 seconds (sample 28), 244 seconds (sample 42), and 333 seconds (sample 46).
After dropping those outliers, the standard deviations drop to 9.284, 6.708, and 31.011 seconds. (Sample
#39, the other out-of-control point, has two moderately large times, 144 and 109 seconds; if they are
removed, s drops to 3.416.)
17.73. (a) For those 10 months, there were 10(2650) = 26,500 total invoices (opportunities), so p̄ =
958/26,500 = 0.03615. (b) The center line is CL = p̄ = 0.03615, and the control limits are
p̄ ± 3√(p̄(1 − p̄)/2650) = 0.02527 and 0.04703.
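The limits can be verified in R:

# p chart limits from the counts in 17.73.
pbar <- 958 / 26500                                # 0.03615
pbar + c(-3, 3) * sqrt(pbar * (1 - pbar) / 2650)   # 0.02527 and 0.04703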
2650

17.75. The center line is at the historical rate (0.0218); the control limits are
0.0231 ± 3√(0.0218(0.9782)/500), or about 0.0035 and 0.0427.

17.77. The center line is at p̄ = 194/38,370 = 0.00506; the control limits should be at
p̄ ± 3√(p̄(1 − p̄)/1520), or about −0.0004 (use 0) and 0.01052.

17.79. (a) The student counts sum to 9218, while the absentee total is 3277, so p̄ = 3277/9218 = 0.3555
and n̄ = 921.8. (b) The center line is p̄ = 0.3555, and the control limits are p̄ ± 3√(p̄(1 − p̄)/921.8) =
0.3082 and 0.4028. The p chart suggests that absentee rates are in control. (c) For October, the limits are
0.3088 and 0.4022; for June, they are 0.3072 and 0.4038. These limits appear as solid lines on the p chart,
but they are not substantially different from the control limits found in (b). Unless n varies a lot from
sample to sample, it is sufficient to use n̄.

[Figure: p chart of student absenteeism by month, with CL = 0.3555 and control limits 0.3082 and 0.4028.]

17.81. (a) p̄ = 8000/1,000,000 = 0.008. We expect about (500)(0.008) = 4 defective orders per month. (b)
The center line is CL = p̄ = 0.008, and the control limits are p̄ ± 3√(p̄(1 − p̄)/500) = −0.00395 and
0.01995. (We take the lower control limit to be 0.) It takes at least 10 bad orders in a month to be out of
control because (500)(0.01995) = 9.975.
17.83. (a) The percents do not add to 100% because one customer might have several complaints; that is,
he or she could be counted in several categories. (b) Clearly, top priority should be given to the processes
of ease of adjustment/credits as well as both the accuracy and the completion of invoices, as the three
most common complaints involved invoices.

[Figure: Pareto chart of the percent of customers citing each complaint category.]

17.85. On one level, these two events are similar: Points above the UCL on an x (s) chart suggest that the
process mean (standard deviation) may have increased. The difference is in the implications of such an
increase (if not due to a special cause). For the mean, an increase might signal a need to recalibrate the
process in order to keep meeting specifications (that is, to bring the process back into control). An
increase in the standard deviation, on the other hand, typically does not indicate that adjustment or
recalibration is necessary, but it will require re-computation of the x chart control limits.

17.87. We find that s̄ = 7.65, so with c4 = 0.8862 and B6 = 2.276, we compute σ̂ = s̄/c4 = 7.65/0.8862 =
8.63, LCL = 0, and UCL = (2.276)(8.63) = 19.65. One point (from sample #1) is out of control. (And, if
that cause was determined and the point removed, a new chart would have s for sample #10 out of
control.) The second (lower) UCL line on the control chart is the final UCL, after removing both of those
samples.

17.89. (a) From the previous exercise, σ̂ = 7.295. Therefore, Cp = 50/(6σ̂) = 1.1423. This is a fairly small
value of Cp; the specification limits are just barely wider than the 6σ̂ width of the process distribution, so
if the mean wanders too far from 830, the capability will drop. (b) If we adjust the mean to be close to
830 mm × 10⁻⁴ (the center of the specification limits), we will maximize Cpk. Cpk is more useful when the
mean is not in the center of the specification limits. (c) The value of σ̂ used for determining Cp was
estimated from the values of s from our control samples. These estimate short-term variation (within
those samples) rather than the overall process variation.

17.91. (a) Use a p chart, with center line p̄ = 15/5000 = 0.003 and control limits p̄ ± 3√(p̄(1 − p̄)/100), or
0 to 0.0194. (b) There is little useful information to be gained from keeping a p chart: If the proportion
remains at 0.003, about 74% of samples will yield a proportion of 0, and about 22% of proportions will be
0.01. To call the process out of control, we would need to see two or more unsatisfactory films in a
sample of 100.
17.93. Several interpretations of this problem are possible, but, for most reasonable interpretations, the
probability is about 0.3%. From the description, it seems reasonable to assume that all three points are
inside the control limits; otherwise, the one-point-out rule would take effect. Furthermore, the phrase
“two out of three” could be taken to mean either “exactly two out of three,” or “at least two out of three.”
(Given what we are trying to detect, the latter makes more sense, but students may have other ideas.) For
the kth point, we name the following events:

• Ak = “that point is no more than 2σ/√n from the center line,”

• Bk = “that point is 2 to 3 standard errors from the center line.”


For an in-control process, P(Ak) = 95% (or 95.45%) and P(Bk) = 4.7% (or 4.28%). The first given
probability is based on the 68–95–99.7 rule; the second probability (in parentheses) comes from Table A
or software. Note that, for example, the probability that the first point gives no cause for concern, but the
second and third are more than 2σ/√n from, and on the same side of, the center line, would be:
(1/2)P(A1 ∩ B2 ∩ B3) = 0.10% (or 0.09%).

(The factor of 1/2 accounts for the second and third points being on the same side of the center line.) If
the “other” point is the second or third point, this probability is the same, so if we interpret “two out of
three” as meaning “exactly two out of three,” then the total probability is three times the above number:
P(false out-of-control signal from an in-control process) = 0.31% (or 0.26%).
With the (more-reasonable) interpretation “at least two out of three”: P(false out-of-control signal) =
(1/2)P(A1 ∩ B2 ∩ B3) + (1/2)P(B1 ∩ A2 ∩ B3) + (1/2)P(B1 ∩ B2) = 0.32% (or 0.27%).
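A numerical check of both interpretations, using exact Normal probabilities:

# P(A) = within 2 SEs of center; P(B) = between 2 and 3 SEs (either side).
pA <- 2 * pnorm(2) - 1            # about 0.9545
pB <- 2 * (pnorm(3) - pnorm(2))   # about 0.0428
3 * 0.5 * pA * pB^2               # "exactly two out of three": about 0.0026
0.5 * (2 * pA * pB^2 + pB^2)      # "at least two out of three": about 0.0027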
