# Moore-212007

pbs

November 27, 2007

9:50

SOLUTIONS TO ODD-NUMBERED EXERCISES

CHAPTER 1

1.27 The distribution of dates of coins will be skewed to the left because, as the dates go further back, there are fewer of those coins currently in circulation. Various left-skewed sketches are possible.

1.29 (a) Many more readers who completed the survey owned Brand A than Brand B. (b) It would be better to consider the proportion of readers owning each brand who required service calls. In this case, 22% of the Brand A owners required a service call, while 40% of the Brand B owners required a service call.

1.31 Some possible variables that could be used to measure the "size" of a company are number of employees, assets, and amount spent on research and development.

1.33 The stemplot shows that the costs per month are skewed slightly to the right, with a center of \$20 and a spread from \$8 to \$50. America Online and its larger competitors were probably charging around \$20, and the members of the sample who were early adopters of fast Internet access probably correspond to the monthly costs of more than \$30.

1.35 When you change the scales, some extreme changes on one scale will be barely noticeable on the other. Addition of white space in the graph also changes visual impressions.

1.37 (a) Household income is the total income of all persons in the household, so it will be higher than the income of any individual in the household. (c) Both distributions are fairly symmetric, although the distribution of mean personal income has two high outliers. The distribution of median household incomes has a larger spread and a higher center: the center of the distribution of mean personal income is about \$25,000, while the center of the distribution of median household income is about \$37,000.

1.39 The means are \$19,804.17, \$21,484.80, and \$21,283.92 for black men, white females, and white males, respectively. Since we have not taken into account the type of jobs performed by individuals in each category or years employed, we cannot make claims of discrimination without first adjusting for these factors.

1.41 The medians are \$18,383.50, \$19,960, and \$19,977 for black men, white females, and white males, respectively. The medians are smaller, but our general conclusions are similar.

1.43 Because of strong skewness to the right, the median is the lower number, \$330,000, and the mean is \$675,000.

1.45 For Asian countries the five-number summary is 1.3, 3.4, 4.65, 6.05, 8.8; for Eastern European countries it is −12.1, −1.6, 1.4, 4.3, 7.0. Side-by-side boxplots show that the growth of per capita consumption tends to be much higher for the Asian countries than for the Eastern European countries and that the growth of per capita consumption for the Eastern European countries is much more spread out.

1.47 (a) The mean is \$983.5. (b) The standard deviation is \$347.23. (c) Results should agree except for the number of digits retained.

1.49 A rare, catastrophic loss would be considered an outlier, and averages are not resistant to outliers. The five-number summary is more appropriate for describing the distributions of data with outliers.

1.51 (a) The five-number summary is 5.7, 11.7, 12.75, 13.5, 17.6. (b) The IQR is 1.8. Any low outliers would have to be less than 9, and any high outliers would have to be greater than 16.2. Therefore, Florida and Alaska are definitely outliers, but so is the state with 8.5% older residents.

1.53 (a) The five-number summary is 0, 0.75, 3.2, 7.8, 19.9. (b) The IQR is 7.05. Any low outliers would have to be less than −9.825, and any high outliers would have to be greater than 18.375. The histogram shows that the U.S., Australia, and Canada all have CO2 emissions much higher than the rest of the countries, but only the U.S. is officially considered an outlier using the 1.5 × IQR rule.

1.55 (a) The 5th percentile is approximately at the 748th position. The 95th percentile is approximately at the 14,211th position. (b) The 5th percentile value is approximately \$10,000. The 95th percentile value is approximately \$140,000.

1.57 (a) The distribution is skewed left with a possible low outlier. (b) Use stems from the tens place as −2, −1, −0, 0, 1, 2, 3.

1.59 The median is \$27.855. The mean is \$34.70 and is larger than the median because the distribution is skewed right.

1.61 (a) The mean change is x̄ = 28.767% and the standard deviation is s = 17.766%. (b) Ignoring the outlier, x̄ = 31.707% and the standard deviation is s = 14.150%. The low outlier has pulled the mean down toward it and increased the variability in the data. (c) "Identical" in this case probably means that the 5-liter vehicles were all the same make and model.

1.63 You will learn about the effects of outliers on the mean and median by interactively using the applet.

1.65 From the boxplot, the clear pattern is that as the level of education increases, the incomes tend to increase and also become more spread out.

1.67 Ignoring the District of Columbia, the histogram of violent crimes is fairly symmetric, with a center of about 450 violent crimes per 100,000. The spread is from about 100 to 1000 violent crimes per 100,000 for the 50 states, with the District of Columbia being a high outlier with a rate of slightly over 2000 violent crimes per 100,000.

1.69 Both data sets have a mean of 7.501 and a standard deviation of 2.032. Data A has a distribution that is left-skewed, while Data B has a distribution that is fairly symmetric except for one high outlier. Thus, we see two distributions with quite different shapes but with the same mean and standard deviation.

1.71 (a) The five-number summary is 0.9%, 3.0%, 4.95%, 6.6%, 14.2%. (b) The mean is larger because the distribution is moderately skewed to the right with a high outlier.

1.73 (a) A histogram is better because the data set is of moderate size. (b) The two low outliers are −26.6% and −22.9%. The distribution is fairly symmetric, with a median, or center, of 2.65%. The spread is from −16.5% to 24.4%. (c) The mean is 1.551%. At the end of the month, you would have \$101.55. (d) At the end of the month, you would have \$74.40. Excluding the two outliers gives a mean of 1.551% and a standard deviation of 8.218%. Both have changed; quartiles and medians are relatively unaffected.

1.75 \$2.36 million must be the mean, since fewer than half of the players' salaries were above it.
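The five-number summaries and the 1.5 × IQR outlier test used in Exercises 1.51 and 1.53 can be checked with a short Python sketch. The helper names and the quartile convention (medians of the lower and upper halves) are illustrative, not part of the exercises:

```python
# A sketch of the five-number summary and the 1.5 x IQR outlier rule.
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max), taking Q1 and Q3 as medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[:n // 2]
    upper = xs[(n + 1) // 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(q1, q3):
    """Observations outside these fences are flagged by the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Exercise 1.51 reports Q1 = 11.7 and Q3 = 13.5, so IQR = 1.8 and the
# fences fall at 9 and 16.2, matching the solution above.
low, high = outlier_fences(11.7, 13.5)
```

The Exercise 1.53 fences follow the same way: `outlier_fences(0.75, 7.8)` gives −9.825 and 18.375.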

1.77 Answers will vary. You can use the Uniform distribution for an example of a symmetric distribution, and there are many choices for a left-skewed distribution.

1.79 Answers will vary.

1.81 (a) No. The distribution is not exactly Normal.

1.83 (a) Mean is C, median is B. (b) Mean is A, median is A. (c) Mean is A, median is B.

1.85 (a) This apartment building is no longer an outlier; it fits in perfectly with the rest of the data. It's not even one of the top two data points now. (b) The distribution looks fairly normal except for 2 new high outliers (#11 and #13). It looks much better than the distributions in 1.83.

1.87 Eleanor's z-score is 1.8, while Gerald's z-score is 1.5, so Eleanor scored relatively higher.

1.89 (a) 0.2404. (b) 0.4584. (c) 0.1711, or 17.11%.

1.91 (a) 0.0122. (b) 0.9878.

1.93 (a) No. The plot suggests no major deviations except for a possible low outlier.

1.95 (b) 452. (c) 712.

1.97 The normal quantile plot shows a fairly good fit to the diagonal line in the middle, but the two outliers stick out again.

1.99 For symmetric data, the points will follow the diagonal fairly closely. For left-skewed data, the normal quantile plot will show the highest and lowest points below the diagonal. For right-skewed data, the normal quantile plot will show the highest and lowest points above the diagonal.

1.101 The stemplot shows a fairly symmetric distribution with a high and low outlier.

1.103 (a) Mean = −0.025, standard deviation = 0.0454. (b) x̄ ± 3s = (−0.4136, 0.6316).

1.105 (a) 0.2180. (b) 0.0384. (c) 0.9494.

1.107 0.2236.

1.109 (a) Between −21% and 47%. (b) The beginning years are all below average until January 1966. (c) The values are high (in the 80s) between 1/00 and 7/00, but then the last month to be above average is 4/01. Then there is a severe drop with no recovery. (d) The dot-com economy was still booming in January 2000. As 2001 progressed, the dot-com economy started to crash, and September 11, 2001, did bad things to the economy as well.

1.111 (a) The first and third quartiles for the standard Normal distribution are approximately ±0.67. (b) The first quartile is about 255 days, and the third quartile is about 277 days.

1.113 Normal quantile plot looks fairly linear, even at the ends.

1.115 The histogram is fairly rectangular and the Normal quantile plot is S-shaped.

1.117 (a) The mean is 64.5. (b) 64 inches to 74 inches. (c) 16%.

1.119 (a) Min = 2 thousand barrels, Q1 = 21.5 thousand barrels, M = 37.8 thousand barrels, Q3 = 60.1 thousand barrels, and the maximum is 204.9 thousand barrels; x̄ = 48.25 thousand barrels. Because of the right-skewness, the mean is larger than the median. The boxplot and the histogram clearly show the right-skewness.

1.121 (a) 11/22 = 50%.

1.123 Gender and automobile preference are categorical. Age and household income are quantitative.

1.125 The distribution is right-skewed, with three high outliers. The spread goes from \$200 thousand to slightly over \$12 million. The median salary is \$1.6 million and the mean salary is slightly over \$3.5 million. Since the distribution is skewed right and has three high outliers, we expect the mean to be substantially larger than the median.

1.127 (a) For normal corn, the minimum is 318 grams, Q1 = 337 grams, M = 358 grams, Q3 = 400.5 grams, and the maximum is 477 grams. For new corn, the minimum is 272 grams, Q1 = 383.5 grams, M = 406.5 grams, Q3 = 428.5 grams, and the maximum is 462 grams. Overall, new corn gives higher weight gains, but it is difficult to be certain of this from only the five-number summary. (b) For normal corn, x̄ = 366.30 and s = 50.80; for the new corn, x̄ = 402.95 and s = 42.73. The mean weight gain for the new corn is 36.65 grams larger than for the normal corn.

1.129 (a) From the histogram or the boxplot, it is clear that the distribution is skewed to the right. (b) The box containing the first quartile, the median, and the third quartile is fairly symmetric, with the third quartile being slightly farther from the median. The maximum is much farther from the median than the minimum, which could suggest right-skewness or just a high outlier. (c) The boxplot indicates very, very right-skewed data.

1.131 The mean is μ = 250 and the standard deviation is σ = 175. The median is not given exactly, but it could be taken at 100.

1.133 The range of values is extreme, making plotting difficult. The mean of the populations is 583,778 and is clearly quite far from the center of the data; the median is 159,994. Division into three groups is fairly arbitrary; a natural way to proceed is to use cutoffs that correspond to gaps in the data. For example, the group of the most populous counties might consist of the 8 counties with populations over 1 million, and we would sample from all of these counties. There are 27 counties with populations between 100,000 and 1 million, and we could sample households from half of these counties. Finally, there are 23 counties with populations below 100,000, and a smaller fraction of these could be sampled. The next break is not as clear.

1.135 Answers will vary. For the mean, the distribution should look fairly symmetric with a center at about 20. For the standard deviation, the distribution should look slightly skewed to the right with a center close to 5. Because of the small number of observations, the Normal quantile plot may not be that smooth, but it shouldn't appear to deviate from a straight line in a systematic way. For s, the Normal quantile plot should not look nearly as much like a straight line as when plotting the 20 values of x̄, suggesting the distribution of s is not Normal.
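The standard Normal quartiles of about ±0.67 used in Exercise 1.111 can be verified with Python's `statistics.NormalDist` (Python 3.8+). The N(266, 16) model for human pregnancy lengths is the usual one behind the 255- and 277-day quartiles; treat the parameters as an assumption here:

```python
# Quartiles of the standard Normal and of N(266, 16), as in Exercise 1.111.
from statistics import NormalDist

z_q1 = NormalDist().inv_cdf(0.25)     # about -0.67
z_q3 = NormalDist().inv_cdf(0.75)     # about +0.67

# Assumed model: pregnancy lengths are N(mu = 266 days, sigma = 16 days).
pregnancy = NormalDist(mu=266, sigma=16)
q1_days = pregnancy.inv_cdf(0.25)     # about 255 days
q3_days = pregnancy.inv_cdf(0.75)     # about 277 days
```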

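Several of the chapter 1 answers (for instance 1.43 and 1.125) rest on the mean being pulled toward a long right tail while the median resists outliers. A minimal sketch with made-up salary-like values (in \$millions):

```python
# The mean follows the long tail; the median is resistant (made-up data).
from statistics import mean, median

salaries = [0.3, 0.5, 0.8, 1.2, 1.6, 2.4, 12.0]   # right-skewed: one huge value
skewed_mean = mean(salaries)       # pulled up toward the 12.0
skewed_median = median(salaries)   # the middle value, unaffected by the 12.0
```

Here the mean is about 2.69 while the median stays at 1.2; changing the largest value moves the mean sharply but leaves the median untouched.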
CHAPTER 2

2.1 (a) Time spent studying is the explanatory variable, and grade on the exam is the response. (b) Explore the relationship between the two variables. (c) Yearly rainfall is the explanatory variable, and the yield of a crop is the response variable. (d) It may be most reasonable to simply explore the relationship between these two variables. (e) Economic class of the father is the explanatory variable, and that of the son is the response.

2.3 Explanatory variable: parent income. Response variable: money student borrows. Both variables are quantitative and probably have a negative association, because the more money the parents have, the less a student will probably need to borrow to pay for college tuition.

2.5 (a) Speed is the explanatory variable and would be plotted on the x axis. (b) At first, fuel used decreases as speed increases; then, at about 60 km/h, it increases as speed increases. (c) No. At very slow speeds and at very high speeds, engines are very inefficient and use more fuel, so both low and high speeds correspond to high values of fuel used, and we cannot say that the variables are positively or negatively associated. (d) The points lie close to a simple curved form.

2.7 Hand wipe is the explanatory variable, and whether or not the skin appears abnormally irritated is the response. Both are categorical variables.

2.9 (a) The longer the duration, the greater the decline. (b) The association is not very strong. (c) This bear market appears to have had a decline of 48% and a duration of about 21 months.

2.11 (a) Explanatory variable = accounts; response variable = assets. (b) The plot shows a positive association, but two points stand out at the top right corner of the plot. (c) Strong. (d) Charles Schwab and Fidelity.

2.13 (a) The city gas mileage is approximately 62 MPG, and the highway mileage is 68 MPG. (b) Highway gas mileage tends to be between 5 and 10 MPG greater than city mileage. (c) It appears to fall roughly on the line defined by the pattern of the remaining points.

2.15 (a) The means for the market sectors are: consumer (30.314), financial services (32.317), natural resources (23.760), and technology (54.960). (b) Technology was the best place to invest in 2003. (c) Because market sector is a categorical variable, we cannot say that the variables are positively or negatively associated.

2.17 (a) If a person has a high income, his or her household income will also be high, so personal and household incomes are positively associated. Household incomes will always be greater than or equal to the incomes of the individuals in the household, and median household income must be larger than median personal income. We therefore expect mean personal income for a state to be positively associated with its median household income. (b) We expect the distribution of both personal and household incomes to be strongly right-skewed. If the distribution of incomes in a state is strongly right-skewed, we know that the mean will be larger than the median. Although mean household income must be larger than mean personal income, median household income could be smaller than mean personal income.

2.19 (a) The overall pattern is linear, and the association is positive. (b) A straight line that slopes up from left to right describes the general trend reasonably well. (c) Strong: the points lie close to a line, so the relationship is reasonably strong.

2.21 The pattern is strong, positive, and linear. Hot dogs that are high in calories are generally high in sodium. (b) Brand 13 has the fewest calories, so this probably corresponds to "Eat Slim Veal Hot Dogs."

2.23 (a) Business starts is taken as the explanatory variable, so it is represented on the horizontal axis. (b) The association is positive. These states are scattered throughout the country. (c) Of the four points in the right side of the plot, Florida is the one with the fewest failures. (d) The outlier is California. (e) The four states outside the cluster are California, Florida, New York, and Texas. They are the most populous states in the country.

2.25 We arbitrarily put City MPG on the horizontal axis of our plot. The variables are positively associated. The strength of the relationship (ignoring outliers) is moderately strong.

2.27 (a) The means are: Consumer = −7.87, Financial services = 35.07, Technology = −24.03, and Utilities and natural resources = 25.66. (b) Financial services and Utilities and natural resources were good places to invest. (c) Market sector is categorical, so we cannot speak of positive or negative association between market sector and total return.

2.29 The correlation r measures the strength of a straight-line relationship. (a) The points in one figure appear to lie more closely along a line than do those in the other, (b) so its correlation is closer to 1. (d) The two points are the District of Columbia and Connecticut. The District of Columbia is a city with a few very wealthy inhabitants and many poorer inhabitants, resulting in a high mean income. Many very wealthy people who work in New York City live in Connecticut.

2.31 (a) High values of duration go with high values of decline, so the association is positive. (b) The points are not tightly clustered about a line, so the correlation is not near 1.

2.33 (a) In this case, neither variable is necessarily the explanatory variable, so we arbitrarily choose one for the horizontal axis. (b) x̄ = 18.6, sx = 0.928; ȳ = 29.07, sy = 0.953. (c) The correlation has no unit of measurement. The correlation, computed from statistical software,
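The correlations reported in Exercises 2.19–2.33 are all instances of r as the average product of standardized values; a small self-contained sketch (the data values here are hypothetical):

```python
# r = (1/(n-1)) * sum of z_x * z_y, the definition behind these exercises.
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    n = len(x)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)

r_perfect = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])   # exact line: r = 1
r_noisy = pearson_r([1, 2, 3, 4], [2, 5, 4, 9])     # positive but weaker
```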

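The regression facts used repeatedly in the solutions that follow — slope b = r(sy/sx), intercept a = ȳ − b·x̄, and the least-squares line passing through (x̄, ȳ) — can be sketched directly from summary statistics. The numbers below are hypothetical:

```python
# Least-squares line from summary statistics, as in the slope and
# intercept computations in the regression exercises.
def least_squares_line(xbar, ybar, sx, sy, r):
    b = r * sy / sx          # slope
    a = ybar - b * xbar      # intercept
    return a, b

a, b = least_squares_line(xbar=18.0, ybar=29.0, sx=0.9, sy=3.0, r=0.95)
y_hat_at_xbar = a + b * 18.0   # the fitted line passes through (xbar, ybar)
```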
2.35 There is a straight-line pattern that is fairly strong, and the association is positive. There do not appear to be any extreme outliers from the straight-line pattern.

2.37 (a) Slightly parabolic, opening downward. (b) r² increases from 0.660 to 0.968.

2.39 b = r(sy/sx) = 0.188999, and a = ȳ − b·x̄ = 1.0892, so ŷ = 1.0892 + 0.188999x. The least-squares regression line passes through the point (x̄, ȳ).

2.41 (a) The number y in inventory after x weeks must be y = 96 − 4x. (b) The graph shows a negatively sloping line. (c) After 25 weeks the equation predicts that there will be y = −4, which is not possible.

2.43 The magazine's report implies that there is a negative correlation between compensation of corporate CEOs and the performance of their company's stock. A more accurate statement might be that in companies that compensate their CEOs highly, the stock is just as likely to perform well as to not perform well — likewise for companies that do not compensate their CEOs highly.

2.45 (a) R² = 2.1%. The regression line with year does not do a good job of explaining the variation in returns. (b) No, you could not make an accurate prediction of stock returns, because the R² is so low and the scatterplot shows a very weak association between Treasury bills and stock returns. We would predict ŷ = ȳ = 9.07% when x = x̄.

2.47 (a) The percent is 35.5%, because r² = (0.596)² = 0.355.

2.49 (a) Spaghetti and snack cake appear to be outliers. (b) Leaving out spaghetti and snack cake, the remaining points appear to be more tightly clustered around a line. (c) These points are apparently not outliers from the line, because the correlation dropped quite a bit when they were removed.

2.51 Answers will vary, but results should look fairly random and without pattern.

2.53 (a) and (b) The plots look very different. Observation 1 is an outlier. In one figure, the outlier accentuates the linear trend because it appears to lie along the line defined by the other points, and after its removal the linear trend is not as pronounced. In the other figure, Observation 1 is not very influential: after removing the outlier, the remaining points appear more tightly clustered about the least-squares regression line.

2.55 (a) x̄ = 22.31, sx = 17.74; ȳ = 5.306, sy = 3.368. The correlation between x and y is 0.253. (b) The correlation between x* and y* is also 0.253. This is not surprising: r does not change when we change the units of measurement of x, y, or both. (c) Using a calculator, we found the sum to be −0.779. (d) The residuals show the same curved pattern.

2.57 (a) The scatterplot shows a curved pattern, not a straight-line relationship. The scatterplot suggests that the relation between y and x is a curved relationship. (c) No, the curved form means a straight-line summary is not appropriate.

2.59 (a) Gender is a categorical variable. (b) A correlation of 1.09 is not possible. (c) The correlation has no unit of measurement.

2.61 (a) There is a fairly strong, positive, linear relationship between appraised value and selling price. (b) ŷ = 127.27 + 1.0466x. The predicted selling price is 967.6 thousand dollars for a unit appraised for \$802.6 thousand. (d) The predicted selling price will be \$848.9. (e) The mean selling price is \$848.9, and the mean appraisal is \$688.8. R² is the percent of the variation in the selling price that is explained by the least-squares regression line of selling price on appraisal values.

2.63 (a) Slope = b = 0.16 and a = 30.2. (b) We would predict Julie's final exam score to be 78.8. (c) With 64% of the variation unexplained, the least-squares regression line would not be considered an accurate predictor of final-exam score.

2.65 We would predict Octavio to score 4.1 points above the class mean on the final exam. Fact 3 is true for this least-squares line.

2.67 (a) The predicted assets for DLJ Direct with 590 accounts will be 31.2. (b) The actual assets are 11.4, so the actual assets − the predicted assets = −20.879 = the residual. (c) The residual is −20.879. R² is the percent of the variation in the values of total assets that is explained by the least-squares regression line of total assets on number of accounts.

2.69 (a) ŷ = 15.0832 + 0.74776x. (b) If 793.6 is substituted for x, then ŷ will be 608.5, so the residual = 121.9.

2.71 If the correlation has increased to 0.8, then there is a stronger relationship between American and European stocks. When American stocks have gone down, European stocks have tended to go down also, so European stocks have not provided much protection against losses in American stocks.

2.73 (a) Yes, there is a fairly strong, positive, linear relationship. (b) R = 0.999993. We would expect the predicted absorbance to be very accurate based on the plot and the correlation, so the calibration does not need to be done again. (c) The predicted value is ŷ = 28.9.

2.75 (a) b = 0.590147, and intercept a = −17.5768. (b) ŷ = −17.5768 + 0.590147x. For x = 500, ŷ = 277.5. (c) r² = 0.334%. Smaller raises tended to be given to those who missed more days, and r = −0.058, so the variables are negatively associated.

2.77 The observed decline for this particular bear market is 14%, so the residual = −6.3, and the prediction is an overestimate.

2.79 (a) r = 0.9670, r² = 0.935. (b) R² = 93.5%.

2.81 (a) Number of Target stores = −0.5879 + 0.382345 × (number of Wal-Mart stores). (b) There are 254 Wal-Mart stores in Texas, so we predict number of Target stores = 96.5. (c) Number of Wal-Mart stores = 30.6571 + 1.13459 × (number of Target stores). (d) There are 90 Target stores in Texas, so we predict number of Wal-Mart stores = 132.8.

2.83 (a) ŷ = 43.525483 + 0.113301x. (b) R² = 93.4%.

2.85 (a) ŷ = 2.3101 + 1.8. (b) ŷ = 58.859. (c) ŷ = 127.270 + 1.0832x.

2.87 (c) The points, taken together, are moderately influential.

2.89 We would expect the correlation for individual stocks to be lower. Correlations based on averages (such as the stock index) are usually too high when applied to individuals.

2.91 Answers will vary but may include: age, years of education, years of experience in the job, and location.

2.93 The reasoning assumes that the correlation between the number of firefighters at a fire and the amount of damage done is due to causation. It is more plausible that a lurking variable, namely the size of the fire, is behind the correlation.

2.95 More generally, factors that may lead students to take more math in high school may also lead to more success in college. If lurking variables are present, then requiring students to take algebra and geometry may have little effect on success in college. Correlation does not imply causation.

2.97 Intelligence or support from parents may be a lurking variable.

2.99 The most seriously ill patients may be sent to larger hospitals rather than smaller hospitals.

2.101 (a) If the consumption of an item falls when price rises, we should see a negative association. (b) There is a curved pattern in the plot. (c) There are periodic fluctuations with two peaks around 1977 and 1986. There also appears to be an overall downward trend over time.

2.103 (a) r² = 0.365. This is the fraction of the variation in daily sales data that can be attributed to a linear relationship between daily sales and total item count. (b) ŷ = 44.6954 + 9.59163x.

2.105 Intelligence or family background may be lurking variables. Families with a tradition of more education and high-paying jobs may encourage children to follow a similar career path.

2.107 (a) The residuals are more widely scattered about the horizontal line as the predicted value increases. The regression model will predict low salaries more precisely, because lower predicted salaries have smaller residuals and hence are closer to the actual salaries. (b) For very low and very high numbers of years in the majors, the residuals tend to be negative; for intermediate numbers of years, the residuals tend to be positive. The model will overestimate the salaries of players who are new to the majors, will underestimate the salaries of players who have been in the major leagues about 8 years, and will overestimate the salaries of players who have been in the majors more than 15 years.

2.109 (a) The story states that these are the "ten kinds of business that employ the MOST people," so it would not be possible to have GREATER x-values than the ones listed in this data. (b) Yes, this is an extrapolation, because it is very unlikely that a kind of business that has 0 employees in 1997 would have 1.86 million employees only 5 years later. (a) The data describe 5375 students. (b) The number who smoke is 1004, so the percent who smoke is 18.68%.
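The residuals and R² that Exercises 2.45–2.107 rely on can be checked numerically: residual = observed y − predicted y, residuals sum to zero for a least-squares fit, and 1 − SSE/SST equals the squared correlation. The data here are hypothetical:

```python
# Residuals and R^2 = 1 - SSE/SST for a least-squares fit (toy data).
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
slope = sxy / stdev(x) ** 2
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

sse = sum(e * e for e in residuals)
sst = sum((b - my) ** 2 for b in y)
r = sxy / (stdev(x) * stdev(y))
r_squared = 1 - sse / sst       # fraction of variation in y explained
```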

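Exercise 2.89's point — correlations computed from averages run higher than correlations for the individuals behind them — can be shown with a deterministic toy example (all numbers invented):

```python
# Group averages can correlate perfectly even when individuals do not.
from statistics import mean, stdev

def corr(x, y):
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * sx * sy)

# Three groups of two individuals; within-group scatter hides the trend.
ind_x = [0.5, 1.5, 1.5, 2.5, 2.5, 3.5]
ind_y = [2.0, 0.0, 3.0, 1.0, 4.0, 2.0]
avg_x = [1.0, 2.0, 3.0]   # the group means of ind_x
avg_y = [1.0, 2.0, 3.0]   # the group means of ind_y

r_individuals = corr(ind_x, ind_y)   # weak (about 0.13)
r_averages = corr(avg_x, avg_y)      # exactly 1
```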
2.111 The percents of students who smoke are:

              Neither parent smokes   One parent smokes   Both parents smoke
  % who smoke        13.86%                 18.73%              36.12%

The data support the belief that parents' smoking increases smoking in their children.

2.113 (a)

            Accounting   Administration   Economics   Finance
  Female      30.22%         40.44%          2.22%      27.11%
  Male        34.78%         24.84%          3.65%      36.73%

The most popular choice for women is administration, while for men accounting and finance are the most popular. (b) 46.54% did not respond.

2.115 (a) Percent of Hospital A patients who died = 3%. Percent of Hospital B patients who died = 2%. (b) Percent of Hospital A patients who died who were classified as "poor" before surgery = 3.8%. Percent of Hospital B patients who died who were classified as "poor" before surgery = 4%. (c) Percent of Hospital A patients who died who were classified as "good" before surgery = 1%. Percent of Hospital B patients who died who were classified as "good" before surgery = 1.3%. (d) If you are in "good" condition before surgery or if you are in "bad" condition before surgery, your chance of dying is lower in Hospital A, so choose Hospital A. (e) The majority of patients in Hospital A are in poor condition before surgery, while the majority of patients in Hospital B are in good condition before surgery. Patients in poor condition are more likely to die, and this makes the overall number of deaths in Hospital A higher than in Hospital B, even though both types of patients fare better in Hospital A.

2.117 The marginal distribution for payment method is: cash (0.452), check (0.129), and credit card (0.419). The conditional distribution of payment method for impulse purchases is: cash (0.530), check (0.167), and credit card (0.303). The conditional distribution of payment method for planned purchases is: cash (0.155), check (0.351), and credit card (0.495). For impulse purchases, cash is most likely and check is least likely; for planned purchases, credit card is most likely. Answers will vary for explaining the choice of payment method for impulse purchases.

2.119 (a) The percent is 20.3%. (b) The percent is 9.8%.

2.121 There are 1716 thousands of older students. The distribution is:

                   2-year part-time   2-year full-time   4-year full-time   4-year part-time
  % older students      7.05%              43.61%             35.59%             13.75%

Older students tend to prefer full-time colleges and universities to part-time colleges and universities, with 79.2% of all older students enrolled in some full-time institution.

2.123 (a)

              Applicants < 40   Applicants ≥ 40
  Hired            6.44%             0.61%
  Not hired       93.56%            99.39%

(b) A graph shows results similar to those in (a). (c) Only a small percent of all applicants are hired, but the percent (6.44%) of applicants who are less than 40 who are hired is more than 10 times the percent (0.61%) of applicants who are 40 or older who are hired. (d) Lurking variables that might be involved are past employment history (why are older applicants without a job and looking for work?) or health.

2.125 (a)

               Relapse   No relapse
  Desipramine   41.67%     58.33%
  Lithium       75%        25%
  Placebo       83.33%     16.67%

(b) The data show that those taking desipramine had fewer relapses than those taking either lithium or a placebo. These results are interesting, but association does not imply causation.

2.127 (a) The sum is 59.920 thousand. The entry in the Total column is 59.918 thousand. The difference may be due to rounding off to the nearest thousand. (b)

  Never married   Married   Widowed   Divorced
     21.05%        57.67%    10.69%     10.61%

2.129 (a) A combined two-way table is

           Admit   Deny
  Male      490     210
  Female    280     220

(b) Converting to percents gives

           Admit   Deny
  Male      70%     30%
  Female    56%     44%

(c) The percents for each school are as follows:

  Business           Admit   Deny        Law      Admit   Deny
  Male                80%     20%        Male      10%     90%
  Female              90%     10%        Female    33%     67%
(d) If we look at the tables, we see that it is easier to get into the business school than the law school. It is hard to get into the law school, and more men apply to the business school than the law school, while more women apply to the law school than the business school. Because it is easier to get into the business school, the overall admission rate for men appears relatively high, and this makes the overall admission rate for women appear low.

2.131 (b) There is a fairly strong, positive, linear association with an outlier at Professor #9 (102, 144). Professor #9 does not follow the overall pattern and would be considered an outlier. (c) Professors #3 and #5 have 2002 salary similar to this one; Professor #16 has a 2005 salary similar to this one. (d) Professor #15 has the highest 2002 and 2005 values. The point is an outlier in the horizontal direction, but the data point follows the general trend of the other data, so it would not be considered an outlier.

2.133 (d) The association looks stronger as the range of the axes increases because the points look closer together. When you change the scales, some extreme changes on one scale will be barely noticeable on the other.

2.135 (a) r = 0.396. (b)

    Amps            10    13    15    14    12
    Average weight  9.5   11.5  11.5  12.5  10.5

(c) The correlation of amps with average weight (0.947) is greater than the correlation of amps with the individual weights. (d) The calculations are the same; units of measure don't change the correlation between two variables, which is a general fact. (e) The rounding didn't have an effect for these data because the part of each data point which was rounded off was a small percentage of the actual data value.

2.137 (a) If a professor had a 2002 salary of $0, his or her 2005 salary would be the intercept of the regression line, about $27,459. This is not a "practical" interpretation because no professor would have had a 2002 salary of $0. (c) The "zero intercept" line completely misses all the data because it is too low. The estimated intercept raises the least-squares regression line to the right height to pass through the data.

2.139 (a) With all data points: ŷ = 27459.44 + 0.896x, s = 8765, R² = 75.6%. With #15 removed: ŷ = 21681.387 + 0.952x, s = 8710.676, R² = 75.6%. (b) No, #15 is not an influential observation, because R², s, and the slope hardly changed at all when it was removed.

2.141 (a) 0.459. (b) 0.877. (c) 0.966.

2.143 High interest rates are weakly negatively associated with lower stock returns. Because many other factors (lurking variables) affect stock returns, one should not conclude from these data that the observed association is evidence that high interest rates cause low stock returns.

2.145 Removing this point would make the correlation closer to 0, and its location strongly influences the regression line.

2.147 Suppose the return on Fund A is always twice that of Fund B. Then if Fund B increases by 10%, Fund A increases by 20%, and if Fund B increases by 20%, Fund A increases by 40%. This is a case where there is a perfect straight-line relation between Funds A and B.

2.149 (a) and (b) The overall pattern shows strength decreasing as length increases until length is 9; then the pattern is relatively flat, with strength decreasing only very slightly for lengths greater than 9.
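The check in 2.139 (refit the line with the suspect point removed and compare slope, intercept, and R²) can be sketched in a few lines of Python. The data below are hypothetical, chosen only to illustrate the mechanics; the function itself is the standard least-squares computation.

```python
# Minimal least-squares fit; refit with a point removed to judge influence,
# as in Exercise 2.139.
def least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                    # slope
    a = my - b * mx                  # intercept
    r2 = (sxy * sxy) / (sxx * syy)  # fraction of variation explained
    return a, b, r2

# Hypothetical salary-like data; the last point is extreme in x.
xs = [40, 45, 50, 55, 60, 90]
ys = [52, 57, 63, 68, 74, 110]
print(least_squares(xs, ys))            # fit with all points
print(least_squares(xs[:-1], ys[:-1]))  # fit with the extreme point removed
```

If the two fits barely differ, the extreme point is an outlier in x but not an influential observation, which is the conclusion reached for Professor #15.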

2.149 (c) The least-squares line is Strength = 488.38 − 20.75 × Length. A straight line does not adequately describe these data because it fails to capture the "bend" in the pattern at Length = 9. (d) The equation for the lengths of 5 to 9 inches is Strength = 667.5 − 46.9 × Length; the equation for the lengths of 9 to 14 inches is Strength = 283.1 − 3.4 × Length. The two lines describe the data much better than a single line. I would want to know how the strength of the wood product compares to solid wood of various lengths. For what lengths is it stronger, and are these common lengths that are used in building?

2.151 Let x denote degree-days per day and y gas consumed per day. Then x̄ = 21.54, ȳ = 558.9, sx = 13.42, sy = 274.4, and r = 0.989. The slope is b = r(sy/sx) = 20.222 and the intercept is a = ȳ − bx̄ = 123.3, so the equation of the regression line is gas use = 123.3 + 20.222 × (degree-days). The slope has units of cubic feet per degree-day. To find the equation of the regression line for predicting degree-days from gas use, we interchange the roles of x and y and compute b = r(sx/sy) = 0.048 and a = x̄ − bȳ = −5.283. The equation of the regression line is degree-days = −5.283 + 0.048 × (gas use). The slope has units of degree-days per cubic foot.

2.153 (a) Selling price = 189,226 − 1334.49 × age. We see that for each 1-year increase in age, the selling price drops by $1334.49. The association is negative, indicating that older houses are associated with lower selling prices and newer houses with higher selling prices. (b) For a house built in 2000, age is 0 and selling price = $189,226. For a house built in 1999, age is 1 and selling price = $187,891.51. For a house built in 1998, age is 2 and selling price = $186,557.02. For a house built in 1997, age is 3 and selling price = $185,222.53. (c) 1900 (age = 100) is within the range of the data used to calculate the least-squares regression line, and 1899 (age = 101) is almost within the range, so we would probably trust the regression line to predict the selling price of a house in these years. However, 1850 (age = 150) is well outside the range of the data, and we would not trust the regression line to predict the selling price of such a house. The regression line predicts a house that is 150 years old to have a selling price of 189,226 − 1334.49 × 150 = −$10,947.50; a negative value makes no sense!

2.155 We convert the table entries into percents of the column totals to get the conditional distribution of heart attacks and strokes for the aspirin group and for the placebo group.

                           Aspirin group   Placebo group
    Fatal heart attacks        0.09%           0.24%
    Other heart attacks        1.17%           1.93%
    Strokes                    1.08%           0.89%

The data show that aspirin is associated with reduced rates of heart attacks but a slightly increased rate of strokes. Association does not imply causation. However, doctors were assigned at random to the treatments (aspirin or placebo), so there should be no systematic differences in the groups that received the two treatments, and the design of the study is such that it is difficult to identify lurking variables.

2.157 (a) The marginal distribution of opinion about quality is: higher (0.391), the same (0.368), and lower (0.241). (b) The proportions for "higher" and "lower" are very close.
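The two regression lines in 2.151 come straight from the summary statistics via b = r·sy/sx and a = ȳ − b·x̄; interchanging the roles of x and y gives a different line, not the algebraic inverse of the first. A small Python sketch (the decimals below are the approximate summary values quoted above):

```python
# Regression line from summary statistics, as in 2.151:
# slope b = r*sy/sx, intercept a = ybar - b*xbar.
def line_from_summaries(r, xbar, sx, ybar, sy):
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b

# Predict gas use from degree-days (approximate summary values).
a, b = line_from_summaries(0.989, 21.54, 13.42, 558.9, 274.4)
print(f"gas use = {a:.1f} + {b:.3f} * degree-days")

# Interchange the roles of x and y to predict degree-days from gas use.
a2, b2 = line_from_summaries(0.989, 558.9, 274.4, 21.54, 13.42)
print(f"degree-days = {a2:.2f} + {b2:.4f} * gas use")
```

Note that the second slope is r·sx/sy, so the product of the two slopes is r², not 1; the two prediction lines coincide only when r = ±1.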

3.39 This is a comparative experiment with two treatments: steady price and price cuts. The explanatory variable is the price history that is viewed. The response variable is the price the student would expect to pay for the detergent.

3.41 (a) The subjects are the 210 children. (b) The factor is the set of choices that are presented to each subject; the levels correspond to the three sets of choices. The response variable is whether they chose a milk drink or a fruit drink.

3.43 If charts or indicators are introduced in the second year and the electric consumption in the first year is compared with the second year, you won't know if the observed differences are due to the introduction of the chart or indicator or due to lurking variables.

3.45 A statistically significant result means that it is unlikely that the salary differences we are observing are just due to chance.

3.47 This experiment suffers from lack of realism, which limits the ability to apply the conclusions of the experiment to the situation of interest.

3.49 (a) This is a completely randomized design with two treatment groups. Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (b) Using line 141 on Table B, Group 1 will contain Afifi (02), Chen (04), Dubois (05), Fluharty (07), Gerson (08), Lucero (13), Quinones (16), Thompson (18), Travers (19), and Ullman (20). Group 2 will then contain the remaining subjects: Abate, Brown, Engel, Hwang, Iselin, Kendall, Loren, Mann, Nevesky, and Stall.

3.51 (a) Randomly assign your subjects to either Group 1 or Group 2. Each group will taste and rate both the regular and the light mocha drink: Group 1 will drink them in the regular/light order, Group 2 will drink them in the light/regular order, and then the results of the two groups will be compared to see if the order of tasting made a difference to the ratings. For each group, the taste ratings of the regular and light drinks will be compared. To properly blind the subjects, both mocha drinks should be in identical opaque (we are only measuring taste, not appearance) cups with no labels on them. (b) Using line 131 on Table B, ten subjects are selected for the regular/light group; the light/regular group will use the remaining 10 subjects.

3.53 To assign the 20 pairs to the four treatments, give each of the pairs a two-digit number from 01 to 20. The first 5 selected are assigned to Group 1, the next 5 to Group 2, and the next 5 to Group 3; the remaining 5 pairs are assigned to Group 4.

3.55 (a) The subjects and their excess weights, rearranged in increasing order of excess weight, are listed with the columns as the 5 blocks and the 4 subjects in each block labeled 1 to 4. With these conventions, Regimen A = Williams, …; Regimen B = Moses, …; Regimen C = Hernandez, …; Regimen D = Deng, … (the remaining subjects Birnbaum, Brunk, Cruz, Gutierrez, Jackson, Kaplan, McNiell, Morse, Obrach, Rodriguez, Rosen, Santiago, Smith, Tran, and Wilansky complete the four regimens). (b) Answers will vary using software.

3.57 (a) This is matched pairs data because each day two related measurements are taken. (b) The sample mean for the kill room is 2138, and the sample mean for the processing area is 314.
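The Table B assignments above can also be done with software: shuffle the labels and deal them into equal groups. The sketch below does this for the 20 pairs of 3.53; the seed is arbitrary (Table B line numbers don't translate to software seeds), so it illustrates the method, not the specific groups in the printed solution.

```python
import random

# Software version of the random assignment in 3.53: shuffle the 20
# two-digit pair labels and deal them into four groups of five.
labels = [f"{i:02d}" for i in range(1, 21)]
rng = random.Random(131)  # arbitrary fixed seed, for reproducibility only
rng.shuffle(labels)
groups = {g + 1: sorted(labels[g * 5:(g + 1) * 5]) for g in range(4)}
print(groups)
```

Any such assignment is equally valid; what matters is that every pair is equally likely to land in each treatment group.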

3.59 (a) The more seriously ill patients may be assigned to the new method by their doctors. The seriousness of the illness would be a lurking variable that would make the new treatment look worse than it really is. (b) Yes. In a controlled scientific study, the effects of factors other than the treatment can be eliminated or accounted for.

3.61 (a) Some people might object because some participants will be required to pay for part of their health care, while others will not. (b) As a practical issue, it may take a long time to carry out the study.

3.63 (a) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (b) The 5 children to be assigned to receive Set 1 have labels 119, 033, 199, 192, and 148.

3.65 (a) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (b) Each subject will do the task twice, once under each temperature condition in a random order. The difference in the number of correct insertions at the two temperatures is the observed response.

3.67 Draw a rectangular array with five rows and four columns, and label the plots across rows. The first 10 plots selected are labeled 19, 08, 10, 20, 09, 16, 06, 01, 02, and 07. These are assigned to Method A, and the remaining 10 are assigned to Method B.

3.69 The number 73% is a statistic and the number 68% is a parameter.

3.71 0.503 is a parameter and 0.515 is a statistic.

3.73 (a) The population is all people who live in Ontario. The sample is the 61,239 residents interviewed. (b) This is a very large probability sample, so it gives a good indication of where the true proportion is.

3.75 The number of adults is larger than the number of men, so the estimate based on all adults has the smaller sampling variability.

3.77 The larger sample size suggested by the faculty advisor will decrease the sampling variability of the estimates.

3.79 (a) p̂ = 0.8133. (b) It is impossible to know how far off this sample proportion is from the true population proportion, because we do not know what the true population proportion is.

3.81 (a) Worst case scenario would be that we have a p̂ of 1. (b) Worst case scenario is that the true proportion is 0.5; then p̂ would be at most 0.5 away from the true proportion.

3.83 (a) Approximately 10 million to 70 million dollars. (b) Approximately 5 million to 80 million dollars. (c) Approximately 18 million to 40 million dollars.

3.85 (a) 25.788 million dollars. (b) 30.234 million dollars. (c) 13.3 million dollars.

3.87 (a) Approximately 95% of the p̂ values are inside the interval (0.8072, 0.9196). (b) Approximately 95% of the p̂ values from 1200 samples lie in a much narrower interval.

3.89 Sampling variability decreases as the sample size increases.
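The claim in 3.89 (sampling variability shrinks as n grows) is easy to see by simulation. The sketch below draws many sample proportions for two sample sizes and compares their spreads; the population proportion 0.85 and the sample sizes are illustrative choices, not values from a specific exercise.

```python
import random

# Simulate the spread of the sample proportion p-hat for two sample sizes.
def phat_spread(p, n, reps, rng):
    phats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    mean = sum(phats) / reps
    # standard deviation of the simulated p-hat values
    return (sum((x - mean) ** 2 for x in phats) / reps) ** 0.5

rng = random.Random(1)
print(phat_spread(0.85, 100, 2000, rng))   # larger spread (small samples)
print(phat_spread(0.85, 1200, 2000, rng))  # smaller spread (large samples)
```

The theoretical spreads are sqrt(p(1 − p)/n), so multiplying n by 12 cuts the spread by a factor of about 3.5, which the simulation reproduces.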

4.37 (a) Continuous. All times greater than 0 are possible without any separation between values. (b) Discrete. It is a count that can take only the values 0, 1, 2, 3, or 4. (c) Continuous. Any number 0 or larger is possible without any separation between values. (d) Discrete. Household size can take only the values 1, 2, 3, 4, 5, 6, or 7.

4.39 (a) Legitimate. All the probabilities given are between 0 and 1 and they sum to 1. (b) Not legitimate. The probabilities of the outcomes do not sum to 1. (c) Not legitimate. The sum of the probabilities is greater than 1.

4.41 (a) P(A) = 0.29 and P(B) = 0.18. (b) "A does not occur" means the farm is 50 acres or more; P(A does not occur) = 0.71. (c) "A or B" means the farm is less than 50 acres or 500 or more acres; P(A or B) = 0.47.

4.43 (a) P(blue) = 0.1. (b) P(blue) = 0.2. (c) P(plain M&M is red, yellow, or orange) = 0.5. P(peanut M&M is red, yellow, or orange) = 0.45.

4.45 (a) The probability that a tire lasts more than 50,000 miles is P(X > 50,000) = 0.04. (b) The Normal distribution is continuous, so P(X = 60,000) = 0 and P(X ≥ 60,000) = P(X > 60,000) = 0.0344.

4.47 (a) All the probabilities given are between 0 and 1 and they sum to 1. (b) {X ≥ 1} means the household owns at least 1 car; P(X ≥ 1) = 0.91. (c) Households that have more cars than the garage can hold have 3 or more cars; 20% of households have more cars than the garage can hold.

4.49 (a) The 8 arrangements of preferences are NNN, NNO, NON, NOO, ONN, ONO, OON, OOO. Each must have probability 0.125. (c)

    X      0      1      2      3
    P(X)   0.125  0.375  0.375  0.125

4.51 (e) P(X = 1) = 0.25. (f) P(a randomly chosen household contains more than two persons) = P(X > 2) = 0.35.

4.53 If we record the size of the hard drive chosen by many, many customers in the 60-day period and compute the average of these sizes, the average will be close to μ = 18.5. Knowing μ is not very helpful, as it is not one of the possible choices and does not indicate which choice is most popular.

4.55 If the correlation between two variables is positive, then large values of one tend to be associated with large values of the other, resulting in a relatively small difference, and small values of one tend to be associated with small values of the other, again resulting in a relatively small difference. Thus the two variables vary together, and the difference tends to stay relatively small and varies little.

4.57 (a) μX = 280 and μY = 195. (b) Profit at the mall is 25X and μ25X = 7000; profit downtown is 35Y and μ35Y = 6825. (c) The combined profit is 25X + 35Y, and μ25X+35Y = 13,825.

4.59 (a) μX−Y = 100. (b) σ²X = 5600 and σ²Y = 6475, so σ²X−Y = σ²X + σ²Y = 12,075 and σX−Y = 109.9.

4.61 (a) μX = 280 and σX = 74.8. (b) μY = 195 and σY = 19.
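The profit means in 4.57 use the rule mean(aX + bY) = a·mean(X) + b·mean(Y); the one-screen check below uses the means given in the exercise.

```python
# Rules for means, as in 4.57: mean(aX + bY) = a*mean(X) + b*mean(Y).
mu_X, mu_Y = 280, 195          # mean units sold at the mall and downtown
mall = 25 * mu_X               # mean mall profit, 25X
downtown = 35 * mu_Y           # mean downtown profit, 35Y
combined = mall + downtown     # mean combined profit, 25X + 35Y
print(mall, downtown, combined)
```

This reproduces 7000, 6825, and 13,825. No independence assumption is needed for means; only variances require it (or a correlation term), as 4.59 shows.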

4.63 (a) If X is the time to bring the part from the bin to its position on the automobile chassis and Y is the time required to attach the part to the chassis, then the total time for the entire operation is X + Y. (b) We would not expect X and Y to be independent.

4.65 (a) We would expect X and Y to be independent, because they correspond to events that are widely separated in time. (b) Experience suggests that the amount of rainfall on one day is not closely related to the amount of rainfall on the next, because they are not widely separated in time; we might expect X and Y to be independent or, perhaps, slightly dependent. (c) Orlando and Disney World are close to each other, and rainfall usually covers more than just a very small geographic area, so we would not expect X and Y to be independent.

4.67 There are 1000 three-digit numbers (000 to 999). If you pick a number with three different digits, your probability of winning is 6/1000 and your probability of not winning is 994/1000. Your expected payoff is $0.50.

4.69 If X and Y are positively correlated, then large values of X and Y tend to occur together, resulting in a very large value of X + Y. Likewise, small values of X and Y tend to occur together, resulting in a small value of X + Y. Thus X + Y exhibits larger variation when X and Y are positively correlated than if they are not.

4.71 (a)

    No. of transactions   0      1      2      3      4      5
    Proportion of clients 0.08   0.17   0.35   0.19   0.08   0.13

(b) It will not affect the mean. (c) Yes: if for 5 transactions the 0.125 is rounded to 0.13, the probabilities add up to 1; if the 0.125 is rounded down to 0.12, the probabilities add up to 0.997.

4.73 (a) −1280. (b) X takes the value −1280 with probability 0.997 and 24,720 with probability 0.003.

4.75 (a) Two important differences between the histograms are (1) the center of the distribution of the number of rooms of owner-occupied units is larger than the center for renter-occupied units, and (2) the spread for owner-occupied units is slightly larger. The histogram for renter-occupied units is more peaked and less spread out than the histogram for owner-occupied units. (b) μowned = 6.284 and μrented = 4.187. The mean number of rooms for owner-occupied units is larger, reflecting the larger center of that distribution. (c) σ²owned = 2.69204, so σowned = 1.64074; σ²rented = 1.71174, so σrented = 1.30833.

4.77 μX+Y = 31 seconds and σX+Y = 4.98.

4.79 (a) 720. (c) 379.
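The expected payoff in 4.67 is just sum of payoff × probability over the outcomes. The sketch below shows the computation; the payout amount is a hypothetical figure chosen for illustration, not the prize from the exercise.

```python
# Expected value of a discrete bet: sum of (payoff * probability).
def expected_value(outcomes):
    return sum(x * p for x, p in outcomes)

win_prob = 6 / 1000   # six winning orderings of three distinct digits
payout = 100.0        # hypothetical prize amount (not from the exercise)
ev = expected_value([(payout, win_prob), (0.0, 1 - win_prob)])
print(ev)
```

With the actual prize substituted for the hypothetical one, the same two-line computation gives the $0.50 expected payoff quoted above.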

4.81 (a) μX = 550 and σX = 5.7. (b) μX−550 = 0 and σX−550 = 5.7; subtracting a constant changes the mean but not the standard deviation. (c) μY = μ(9X/5)+32 = 1022 and σY = σ(9X/5)+32 = (9/5)(5.7) = 10.26.

4.83 (a) The two students are selected at random, so we expect their scores to be unrelated or independent. (b) μfemale−male = 15, σ²female−male = 2009, and σfemale−male = 44.8. (c) We cannot find the probability that the woman chosen scores higher than the man chosen, because we do not know the probability distribution for the scores of women or men.

4.85 μX = (μ + σ)(0.5) + (μ − σ)(0.5) = μ, and σ²X = (μ + σ − μX)²(0.5) + (μ − σ − μX)²(0.5) = σ², so σX = σ.

4.87 (a) σ²X+Y = 2795.75, so σX+Y = 52.88. (b) σ²Z = σ²2000X+3500Y = 1.35635594 × 10⁷.

4.89 σ²0.8W+0.2Y = 15.7, so σ0.8W+0.2Y = 3.96. This is smaller than the result computed in part (a) because we no longer include the positive term 2ρσ0.8Wσ0.2Y.

4.91 The mean return remains the same.

4.93 If ρXY = 1, then σ²X+Y = σ²X + σ²Y + 2ρXYσXσY = σ²X + σ²Y + 2σXσY = (σX + σY)². Thus σX+Y = σX + σY.

4.95 These are statistics. Pfeiffer undoubtedly tested only a sample of all the models produced by Apple, and these means are computed from these samples.

4.97 (a) Using statistical software we compute the mean of the 10 sizes to be μ = 69.4. (b) Using line 120, our SRS is companies 3, 5, and 7; x̄ = 67.7, which is close to the value of μ = 69.4 computed in part (a). (c) The center of our histogram appears to be at about 69.4.

4.99 That is not right. The law of large numbers says nothing about the next event; it only tells us what will happen if we keep track of the long-run average. Six of seven at-bats is hardly the "long run."

4.101 (a) To say that x̄ is an unbiased estimator of μ means that x̄ will neither systematically overestimate nor underestimate μ in repeated use: if we take many, many samples, calculate x̄ for each, and compute the average of these x̄-values, this average will be close to μ. (b) If we draw a large sample from a population, the results will vary less from sample to sample than the results we would obtain if our samples were small.

4.103 The sampling distribution of x̄ should be approximately N(1.6, 0.085).

4.105 500,000 is a parameter and 2.6 is a statistic.

4.107 19 is a parameter and 14 is a statistic.

4.109 Joe pays $1.00 to play each time, for an expected payout of $0.60, so in the long run his average winnings are −$0.40 per bet. The law of large numbers says that if Joe keeps track of his net winnings and computes the average per bet, he will find that he loses an average of 40 cents per bet. Similarly, the gambler who bets on red has expected winnings of −$0.053 per bet; the law of large numbers tells us that if he makes a large number of bets on red, keeps track of his net winnings, and computes the average of these, this average will be close to −$0.053.
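The variance rule behind 4.89 and 4.93 is var(X + Y) = var(X) + var(Y) + 2ρ·sd(X)·sd(Y); when ρ = 1 it collapses to (sd(X) + sd(Y))², which is the identity derived in 4.93. A two-line check:

```python
# sd of a sum under correlation rho, as used in 4.89 and 4.93:
# var(X+Y) = var(X) + var(Y) + 2*rho*sd(X)*sd(Y).
def sd_of_sum(sx, sy, rho):
    return (sx**2 + sy**2 + 2 * rho * sx * sy) ** 0.5

print(sd_of_sum(3.0, 4.0, 0.0))  # independent case: 5.0
print(sd_of_sum(3.0, 4.0, 1.0))  # perfectly correlated: 3 + 4 = 7.0
```

Dropping the 2ρ·sd(X)·sd(Y) term when ρ > 0, as in 4.89, always shrinks the result, which is why the diversified portfolio there has the smaller standard deviation.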

He will find that he loses about 5.3 cents per bet on average.

4.111 (a) The mean x̄ of n = 3 weighings will follow an N(123, 0.0462) distribution. (b) P(x̄ ≥ 124) = P(Z ≥ 21.65), which is approximately 0.

4.113 P(X > 210) = 0.0052.

4.115 0.1587.

4.117 (a) P(X < 295) = 0.1587. (b) The mean contents x̄ will vary according to an N(298, 1.225) distribution; P(x̄ < 295) = P(Z < −2.45) = 0.0071.

4.119 The range 0.11 to 0.19 will contain approximately 95% of the many x̄'s.

4.121 (a) P(x̄ > 400) = P(Z > 10.4), which is approximately 0. (b) We would need to know the distribution of weekly postal expenses to compute the probability that postage for a particular week will exceed $400. In part (a) we applied the central limit theorem, which does not require that we know the distribution of weekly expenses.

4.123 Sheila's mean glucose level x̄ will vary according to an N(125, 5) distribution. L must satisfy P(x̄ > L) = 0.05, and we find L = 133.2. There is always sampling variability, but this variability is reduced when an average is used instead of an individual measurement; the larger the sample size, the smaller the variability.

4.125 (a) E(X) = −620 and σX = 12,808. (b) E(Y) = 620 and σY = 12,808.

4.127 (a) 0.0866. (b) 0.0125. (c) The standard deviation is the same for both. (d) The probabilities using the central limit theorem are more accurate as the sample size increases. The 150-bag probability calculation is probably fairly accurate, but the 3-bag probability calculation is probably not.

4.129 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (b) P(worker is female) = 0.25. (c) P(not in occupation F) = 0.96. (d) P(occupation D or E) = 0.28. (e) P(not in occupation D or E) = 0.72.

4.131 P(Chavez promoted) = 0.45.

4.133 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (b) μ = 2.25.

4.135 The probability distribution is

    Y      1     2     3     4     5     6     7     8     9     10    11    12
    P(Y)   1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12

4.137 The weight of a carton varies according to an N(780, 17.3) distribution. Letting Y denote the weight of a carton, P(750 < Y < 825) = P(−1.73 < Z < 2.60) = 0.9535.

4.139 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (b) P(X ≤ 170) = 0.8558.

4.141 If a single home is destroyed by fire, the replacement cost could be several hundred thousand dollars. The money received from 12 policies (unless extra charges for costs and profits are huge) would not cover this replacement cost.
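Calculations like the one in 4.123 (find L with P(x̄ > L) = 0.05 for a Normal sampling distribution) can be reproduced with the standard normal CDF built from math.erf. The σ and n below are assumed values chosen to match an N(125, 5) sampling distribution; treat them as an illustration of the method.

```python
from math import erf, sqrt

# Normal CDF via the error function (no external libraries needed).
def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Sheila's mean glucose: xbar ~ N(125, sigma/sqrt(n)).
# sigma = 10 and n = 4 are assumed here, giving sd(xbar) = 5.
mu, sigma, n = 125, 10, 4
sd_xbar = sigma / sqrt(n)
L = mu + 1.645 * sd_xbar            # upper 5% point of the sampling distribution
print(round(L, 1))                  # close to the L = 133.2 found above
print(round(1 - norm_cdf(L, mu, sd_xbar), 3))  # verifies P(xbar > L) is about 0.05
```

The same norm_cdf function handles every "P(x̄ < c) = P(Z < z)" computation on this page once the mean and standard deviation of the sampling distribution are known.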

Although the chance that a home will be destroyed by fire is small, the risk to the company is too great. If one sells thousands of policies, one can appeal to the law of large numbers and feel confident that the mean loss per policy will be close to $250, and that the company will make money. The more policies the company sells, the better off the company will be, and the company can be reasonably sure that the amount it charges for extra costs and profits will be available for these costs and profits.

4.143 P(age at death ≥ 26) = 0.99058.

4.145 σX = 9707.

4.147 (a) S = {0, 1, 2, …, 100} is one possibility (but assuming a person could be employed for 100 years is extreme). (b) Because we are allowing only a finite number of possible values, X is discrete. (c) We included 101 possible values (0 to 100).

4.149 (a) S = {all numbers between 0 and 35 ml with no gaps}. (b) S is a continuous sample space; all values between 0 and 35 ml are possible with no gaps. (c) We included an infinite number of possible values.

CHAPTER 5

5.1 (a) 15% drink only cola. (b) 20% drink none of these beverages.

5.3 (a) 0.09608.

5.5 (a) 0.65³ = 0.2746. (b) The probability of the price being down in any given year is 1 − 0.65 = 0.35. Since the years are independent, the probability of the price being down in the third year is 0.35.

5.7 (a) 0.65908. (b) 0.66761.

5.9 (a) The probability that the rank of the second student falls into the five categories is unaffected by the rank of the first student selected.

5.11 (a) 0.32768. (b) 0.5450. (c) 0.1681.

5.13 (a) (0.91)⁶ = 0.568. (b) (0.09)⁶ = 0.0000005.

5.15 (a) W = {0, 1, 2, 3}. (b) and (c) Each arrangement of k D's and 3 − k F's has probability (0.27)ᵏ(0.73)³⁻ᵏ: DDD has probability 0.0197; DDF, DFD, and FDD each have probability 0.0532; DFF, FDF, and FFD each have probability 0.144; FFF has probability 0.389. Summing the arrangements, P(W = 3) = 0.0197, P(W = 2) = 0.160, P(W = 1) = 0.432, and P(W = 0) = 0.389.

5.17 μX = 303.35 and P(X ≤ 300) = 0.236.
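The arrangement table in 5.15 can be checked by brute force: enumerate all eight D/F arrangements, sum the probabilities of those with exactly k D's, and compare with the binomial formula. (The success probability 0.27 is the one implied by the arrangement probabilities above.)

```python
from itertools import product
from math import comb

# Enumerate the 8 arrangements of 5.15 and recover P(W = k), P(D) = 0.27.
p = 0.27

def p_w(k):
    # total probability of arrangements with exactly k D's
    return sum(p**s.count("D") * (1 - p)**s.count("F")
               for s in ("".join(t) for t in product("DF", repeat=3))
               if s.count("D") == k)

for k in range(4):
    # enumeration agrees with the binomial formula C(3,k) p^k (1-p)^(3-k)
    assert abs(p_w(k) - comb(3, k) * p**k * (1 - p)**(3 - k)) < 1e-12
    print(k, round(p_w(k), 3))
```

The printed values reproduce the table: 0.389, 0.432, 0.160, and 0.020 (rounded).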

5.19 (a) n = 10 and p = 0.25. (b) The possible values of X are 0, 1, 2, …, 10.

5.21 P(A and Rh-positive) = 0.3360 and P(A and Rh-negative) = 0.0640; P(B and Rh-positive) = 0.0924 and P(B and Rh-negative) = 0.0176; P(AB and Rh-positive) = 0.0336 and P(AB and Rh-negative) = 0.0064; P(O and Rh-positive) = 0.3780 and P(O and Rh-negative) = 0.0720.

5.23 Yes. A and B are independent if P(A and B) = P(A)P(B), and (0.4)(0.75) = 0.3.

5.25 (a) The 20 machinists selected are not an SRS from a large population of machinists and could have different success probabilities. The assumption of a fixed number of observations is also violated. (b) We know that the count of successes in an SRS from a large population containing a proportion p of successes is well approximated by the binomial distribution. This description fits this setting.

5.27 (a) The possible values of X are 0, 1, 2, 3, 4, 5. (b) P(X = 0) = 0.2373, P(X = 1) = 0.3955, P(X = 2) = 0.2637, P(X = 3) = 0.0879, P(X = 4) = 0.0146, and P(X = 5) = 0.0010. (c) These probabilities can be used to draw a histogram. (d) μ = 1.25 and σ = √(5(0.25)(0.75)) = 0.968.

5.29 μ = np = 8 and σ = √(np(1 − p)) = √(20(0.4)(0.6)) = 2.19.

5.31 P(X = 11) = 0.2447, and the remaining probabilities are computed in the same way; they can be used to draw a histogram.

5.33 (a) μX = np = 6. (b) μX = 60 if n = 120 and μX = 600 if n = 1200; μp̂ = p stays the same regardless of sample size.

5.35 (a) X is binomial with n = 1555 and p = 0.20. (b) Using the Normal approximation to the binomial for proportions, P(p̂ ≤ 0.193) = 0.254.

5.37 As the value of p gets closer to 1, there is less variability in the values of X.

5.39 (a) The probability of an 11 is 2/36, so the probability of three 11s in three independent throws is (2/36)³ = 0.000171. (b) The writer's first statement is correct. The odds against throwing three straight 11s are (1 − P)/P = 5831 to 1. When computing the odds for the three tosses, the writer multiplied the odds, which is not the correct way to compute the odds for the three throws.

5.41 μ = 16 and σ = √(np(1 − p)) = √(20(0.8)(0.2)) = 1.789.

5.43 P(X ≥ 10) was evaluated for a sample of size 20 and found to be extremely small. This probability is so small that it suggests p may be greater than 0.2.

5.45 μp̂ = p = 0.65, and σp̂ = √(p(1 − p)/n).

5.47 The proportion in the sample will be closer to 40% for larger sample sizes, so the chance of the proportion in the sample being as large as 50% decreases.
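The binomial facts used throughout this page, mean np and standard deviation √(np(1 − p)), can be verified directly against the probability distribution. The sketch below does this for the n = 5, p = 0.25 distribution of 5.27.

```python
from math import sqrt, comb

# Check mean np and sd sqrt(np(1-p)) against the binomial pmf for 5.27.
n, p = 5, 0.25
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mu = sum(k * q for k, q in enumerate(pmf))                 # mean from the pmf
sd = sqrt(sum((k - mu)**2 * q for k, q in enumerate(pmf))) # sd from the pmf

print(round(mu, 4), round(sd, 4))                # from the distribution
print(n * p, round(sqrt(n * p * (1 - p)), 4))    # from the shortcut formulas
```

Both lines print 1.25 and 0.9682, confirming the μ = 1.25 and σ = 0.968 quoted in 5.27(d).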

P(X ≤ 175) = 0. (d) n must be increased to 200. 5. Given that the customer defaults on the loan.Moore-212007 pbs November 27. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.67 5.12.7660. (c) Poisson with mean 2 × 48. (b) 0. 5.1472. √ √ (a) σ = μ = 2. P(X ≤ 70) = 0. (b) P(X ≤ 66) = 0. (d) 0.89% chance that the customer overdraws the account.69 5.59 5.52.4 = 9.53 5. (b) P(X ≤ 175) corresponds to Jodi scoring 70% or lower. (c) 0. √ √ (a) σ = μ = 17 = 4.5 is reasonable for the count of successes in an SRS of size n from a large population. 0. (b) 0.323.0620. the probability is 0. (b) 0. Using Bayes’s rule.57 5.25.0248. √ √ (a) μ = 180 and σ = np(1 − p) = 1500(0. (b) P(X ≤ 10) = 0.3090.2424.7 = 97. (c) 0.4776. Drawing a diagram should help.0344. (a) The binomial distribution with n = 150 and p = 0.0489. P(X ≤ 170) = 0.63 5. P(X ≤ 70) = 0.9986 = 0.0014.73 5. (a) The employees are independent and each equally likely to be hospitalized.51 5. (c) Poisson with mean 1/4 × 14 = 3. These answers are simpler to see if you ﬁrst draw a tree diagram.4335.2148. Using the Normal approximation.7254 = 0. (a) Poisson distribution with a mean of 12 × 7 = 84 accidents.1730 = 0. (b) P(X > 5) = 1 − P(X ≤ 5) = 1 − 0. (b) 0.0821.20. 0. Using Bayes’s rule. (b) We expect 75 businesses to respond. (b) Poisson with mean 1/2 × 14 = 7.256. then P(B|A) = P(B). (a) 0. (d) 0.2746. P(Y < 1/2 and Y > X ) = 1/8.87. For a 30-minute √ √ period. (b) Using the Normal approximation.4617.0300.3150.7. (c) Using the Normal approximation.8270. (b) For a 15-minute period.586.4450.5970. (a) 0.55 5. (c) k = 3. 2007 9:50 S-28 SOLUTIONS TO ODD-NUMBERED EXERCISES included in the histogram from part (c) and should be a good indication of the center of the distribution.61 5.8989.71 5.3333.75 5.5905 = 0.62. which is not the case.3 = 1. the probability is 0.88) = 12.9700 = 0.1251.938.49 (a) Using the Normal approximation.85 5. (a) 0. (a) P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.0491. there is an 89. σ = μ = 48.7 = 6.98.65 5.4. (a) 0. (a) 0. 0. 
(c) If the events A and B were independent, then P(B|A) = P(B).
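The independence check used here — compare P(A and B) with P(A)P(B), or equivalently P(B | A) with P(B) — can be written out directly. The probabilities below are made up for illustration:

```python
def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """A and B are independent exactly when P(A and B) = P(A)P(B)."""
    return abs(p_a_and_b - p_a * p_b) < tol

# Hypothetical numbers: P(A) = 0.5, P(B) = 0.4.
print(is_independent(0.5, 0.4, 0.20))  # True:  P(A and B) = P(A)P(B)
print(is_independent(0.5, 0.4, 0.25))  # False: P(B | A) = 0.5, not 0.4
```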

. (c) Using the Normal approximation. (c) The people are independent from each other. the probability is 0. (a) μ = 3. which is not true by comparing the answers in (a) and (b). Using Bayes’s rule. 5. n = 1244. (c) With the credit manager’s policy. .0404. it is important to take the observations under similar conditions.84.5 6.751.1065. (b) 0.115 (a) 0. (d) Using software. (b) 0. (a) The response rate is 1468/13. .117 Using Bayes’s rule.68. the vast majority of those whose future credit is denied would not have defaulted.75.97 5.1029.99 5. (b) If there are systematic patterns in the organizations that did not respond.55 100 0.Moore-212007 pbs November 27. (b) 0. (c) P(ﬁrst one on kth toss) = ( 5 )k−1 ( 1 ).01592. . 2 × (standard deviation for x) = \$44. .674.109 (a) Drawing the tree diagram will be helpful in solving the remaining parts.98 6. 5.91 million dollars. . (189.48.787. 5.5).93 5.107 (a) 0. CHAPTER .11 . 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-29 5.6 6. 6 6 6 6 6 6 5. the probability is 0.0336. 5.1416. 5. (b) 0.3 6.1129.9992 = 0. The small margin of error is probably not a good measure of the accuracy of the survey’s results. 250. (b) 0.89 (a) Using Bayes’s rule. the survey results may be biased. Yes. (b) Expect 94 not to have defaulted.105 0.627. (b) ( 5 )2 ( 1 ).103 (a) The probability of a success p should be the same for each observation. we should have P(in labor force) = P(in labor force | college graduate).1930. (c) If the events “in labor force” and “college graduate” were independent. (b) The probability of the driver being male for observations made outside a church on Sunday morning may differ from the probability for observations made on campus after a dance. 5. . P(X ≥ 1245) = 0.09 million dollars).95 5. 5.75. . A and B are independent if P(A and B) = P(A)P(B) and 0. Sample size Margin of error 10 3.101 (a) μ = 1250. (c) 0.10 20 2. (b) P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0. (b) Using the Normal approximation. (a) 0.7 6.125.5596. 
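The margin-of-error table above follows m = z*·σ/√n, so quadrupling n halves m. Assuming σ = 5 and the 95% value z* = 1.96 (an assumption that reproduces the tabled 0.98 at n = 100):

```python
from math import sqrt

def margin_of_error(sigma, n, z_star=1.96):
    """Margin of error z* * sigma / sqrt(n) for a z confidence interval."""
    return z_star * sigma / sqrt(n)

# Quadrupling the sample size halves the margin of error.
for n in (10, 20, 40, 100):
    print(n, round(margin_of_error(sigma=5.0, n=n), 2))
```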
To ensure that this is true, take the observations under similar conditions, such as the same location and time of day.

528. (c) 1. The smaller standard deviation leads to a smaller margin of error.23 6.17 6. we do not know if our interval correctly includes the population percent or not. Thus.137.863. management vs.25 6.25 minutes).Moore-212007 pbs November 27. 128.43).96 ± 0. we do not know if our interval correctly includes the true population percent or not. not to any particular interval produced by the method. white collar jobs. H0 : μ = 0. 95% refers to the method.061).57 years). (b) The standard deviation of the mean weight in kilograms is 0.31). the width of the conﬁdence interval (or the size of the margin of error) decreases. or (101. (c) A 95% conﬁdence interval for the mean weight.77 years or (11.531. (b) The margin of error is 3%. 63.58. This would not be considered strong evidence that Cleveland differs from the national average. Ha : μ < 0.27 6. the margin of error only covers random variation.8 ± 4. (a) No.9186. 35. The conﬁdence coefﬁcient of 95% is the probability that the method we used will give an interval containing the correct value of the population mean study time.41 (a) 115 ± 13.25 minutes. 55%).1142. 2007 9:50 S-30 SOLUTIONS TO ODD-NUMBERED EXERCISES As sample size increases. we say we are 95% conﬁdent that this is one of the correct intervals. The mean weight of runners in pounds is 136. 12.21 6. Because the method yields a correct result 95% of the time.35 6. 30.29 6. \$18. (a) 95% conﬁdence means that this particular interval was produced by a method that will give an interval that contains the true percent of the population that will vote for Ringel 95% of the time. 6.37 6. 6. (b) P-value = 0.75 minutes. (b) This particular interval was produced by a method that will give an interval that contains the true percent of the population that like their job 95% of the time. The interval includes 50%. of the population is (60.37. The results are not trustworthy.39 . so we cannot be “conﬁdent” that the true percent is 50% or even slightly less than 50%.29. 
(a) The mean weight of the runners in kilograms is x = 61.664). 5. 11. We are 95% conﬁdent that the true population percent falls in this interval. 140.099.31 6. entry. the election is too close to call from the results of the poll.and mid-level employees.13 The students in your major will have a smaller standard deviation because many of them will be taking the same classes which require the same textbooks. It does not tell us about individual study times. Some possibilities include blue collar vs. so the 95% conﬁdence interval is 52% ± 3% = (49%.19 6.8 ± 0.490.567.0208.37. in kilograms. In pounds we get (132. Phone-in polls are not SRSs. (b) No. When we apply the method once. (a) z = −1. Use n = 75.15 6. The formula for a conﬁdence interval is relevant if our sample is an SRS (or can plausibly be considered an SRS).53. The standard deviation of the mean weight in pounds is 2. Answers will vary.51 or (26. \$17.33 6.90 ± \$961. When we apply the method once. Hence.062. n = 74. (d) No. or (\$16.03 years.
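The weight solutions above convert kilograms to pounds by multiplying the mean, the standard deviation, and both confidence-interval endpoints by the same conversion factor (about 2.2 lb per kg). A sketch with made-up endpoints:

```python
KG_TO_LB = 2.2  # approximate conversion factor

def rescale_interval(low, high, factor):
    """A linear change of units rescales both CI endpoints by the same factor."""
    return low * factor, high * factor

# Hypothetical 95% CI for a mean weight, in kilograms:
lo_kg, hi_kg = 60.0, 63.5
lo_lb, hi_lb = rescale_interval(lo_kg, hi_kg, KG_TO_LB)
print(round(lo_lb, 1), round(hi_lb, 1))  # 132.0 139.7
```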

Z = −0. If a signiﬁcance level of 0. Because it is unlikely that we would obtain data this strongly in support of the calcium supplement by chance. There is not enough evidence to say that private four-year students have signiﬁcantly higher debt than public four-year borrowers. do not reject the null hypothesis.05. (b) With a P-value of 0. Ha : μ = 4. (c) Reject the null hypothesis whenever the P-value is ≤ α.807 and z < −2. (a) The null hypothesis should be μ = 0. (b) H0 : μ A = μ B . P-value = 2(0. (b) The standard deviation of the sample mean should be 18/ 30.008. in the population who name economics as their favorite subject. (a) H0 : μ = 31. results that are as strongly or more strongly in support of the calcium supplement if it is really no more effective than the placebo. Ha : ρ > 0.41 If the homebuilders have no idea whether Cleveland residents spend more or less than the national average.43 6.108. then they are not sure whether μ is larger or smaller than 31%.07 we would not reject H0 : μ = 15. The signiﬁcance level corresponding to z ∗ = 2 is 0.9124. p F . not the sample statistic x.0026. z-values that are signiﬁcant at the α = 0.807. and thus 15 would fall outside the 90% conﬁdence interval.9713.51 6.63 6. H0 : μ = 100. and the percent of females.0574.005 level are z > 2.05 is used. The P-value is the probability that we would observe. (a) Answers may vary. 0. Ha : μ = 31%. (b) No.049 < 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-31 6. In this case the P-value is 0.57 6.07 we would reject H0 : μ = 15. and the signiﬁcance level corresponding to z ∗ = 3 is 0.61 6. (a) With a P-value of 0.53 6. (c) The hypothesis test uses the population parameter μ. the P-value is large. (b) P-value = 0. The alternative hypothesis could be √ μ > 0. Ha : p M > p F .45 6. Ha : μ A > μ B .65 . Do not reject the null hypothesis. Ha : μ A = μ B where group A consists of students who exercise regularly and group B consists of students who do not exercise regularly. 
(c) H0 : μ = 1400, Ha : μ < 1400.
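Several solutions above double a one-tail Normal probability to get a two-sided P-value, as in "P-value = 2(0.4562) = 0.9124": P = 2·P(Z ≥ |z|). A sketch:

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided P-value 2 * P(Z >= |z|) for a standard Normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.96), 3))  # 0.05
print(round(two_sided_p(3.0), 4))   # 0.0027
```

For a one-sided alternative, drop the factor of 2 and use the appropriate tail.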

The result is not signiﬁcant at the 5% level. (b) z > 1. so we would not reject the hypothesis that μ = 5 at the 5% level.75 6. (b) z = 1.Moore-212007 pbs November 27. surveys of acquaintances only) will produce data for which statistical inference is not appropriate. we would be willing to reject the null hypothesis even when no practical effect exists. (a) (99. This is a call-in poll.0013. which means that our sample statistic would be 0 standard deviations away from the null hypothesis mean. The 95% conﬁdence interval does not contain 7.3821. so we would reject H0 at the 5% level.79 6. (c) It would be good to know how the sample was selected and if this was an observational study or an experiment. Our results are based on the mean of a sample of 40 observations. 6.89 6.56) < 0.64. The P-value is 0. (a) No.225). With a signiﬁcance level this high.71 6. and such polls are not random samples. This is reasonably strong evidence against the null hypothesis that the population mean corn yield is 135.96.67 z = 3. Poorly designed experiments also provide examples of data for which statistical inference is not valid.81 6. (a) H0 : μ = 7.5 corresponds to a Z value of 0. and such a mean may vary approximately according to a Normal distribution (by the central limit theorem) even if the population is not Normal. (a) z > 1. (a) z = 1. so we would not reject H0 at the 5% level.2040 and the result is not signiﬁcant at the 5% level. Ha : μ = 7. but we would still reject the null hypothesis.69 6.77 6. The P-value is P(Z ≥ 3.91 6.0164. (a) The P-value = 0.2040 and the result is not signiﬁcant at the 1% level. Answers will vary. The 95% conﬁdence interval in part (a) contains 105.645. 2007 9:50 S-32 SOLUTIONS TO ODD-NUMBERED EXERCISES statistics.83 6.041.002. The approximate P-value is < 0. 6. We reject H0 at the 5% level if z > 1. so our conclusions are probably still valid. A P-value of 0. (b) The hypotheses are H0 : μ = 105. (a) P-value = 0. (b) P-value = 0. 
Statistical significance and practical significance are not necessarily the same thing.
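The rejection rules quoted above (z > 1.645 one-sided, |z| > 1.96 two-sided at α = 0.05) come from standard Normal quantiles, which can be computed directly:

```python
from statistics import NormalDist

def z_critical(alpha, two_sided):
    """Standard Normal critical value for a test at significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05, two_sided=False), 3))  # 1.645
print(round(z_critical(0.05, two_sided=True), 2))   # 1.96
```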

95 Using the Bonferroni procedure for k = 6 tests with α = 0.4 24 8.58. 6.99 6. 6.05.05/6 = 0.05 that we would observe a difference as large as or larger than this by chance if.44.88.5040. (8.” (b) The error probability one chooses to control usually depends on which error is considered more serious.82) 9. (3.97 6.90266.68.101 Since 80 is farther away from 50 than 70 is.72) 8.08. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-33 6.32) 5. The probability is less than 0. Of the six P-values given. This is quite a bit larger than the power of 0. x ± 0. .115 Industries with SHRUSED values above the median were found to have cash ﬂow elasticities less than those for industries with lower SHRUSED values. on average the cash ﬂow elasticities for the two types of industries are the same. 6. The probability of a Type II error at μ = 298 is 1 minus the power of the test at μ = 298. in fact. (7. which is 1 − 0. This is a “false-positive. linear relationship between age and months employed.008 and 0.62) 6.105 The power of the test against the alternative μ = 298 is 0. and so it is unlikely that the observed difference is merely accidental.22) 4. (a) X has a binomial distribution with n = 77 and p = 0.18.3 (2. 34. z = 2. In most cases.9 26 9. As sample size increases.109 (a) The hypotheses are H0 : patient does not need to see a doctor and Ha : patient does need to see a doctor. the consequences can be serious.72) 7. (c) Let μ denote the mean DMS odor threshold among all beginning oenology students.” (2) The patient is diagnosed as not needing to see a doctor when.4 23 7. We test the hypotheses H0 : μ = 25. (8. All the widths of all the conﬁdence intervals are exactly the same. 6. (4.001.103 (a) The power of the test against the alternative μ = 298 is 0. 0.5 25 8.111 Age Months employed.117 (a) The stemplot shows that the data are roughly symmetric. the power will be higher than 0. the patient does need to see one.3 22 6.05/6 = 0.2 20 5.98.0083. (b) (26.22) 9.74 μg/l). 
the patient does not need to see a doctor. (8.0083 (or less) for statistical signiﬁcance for each test. power increases. (b) The power is higher. 6. Ha : μ > 25. The program can make two types of error: (1) The patient is told to see a doctor when.Moore-212007 pbs November 27. This probability is quite low.05. 6.32 18 2.4960 that we found in Exercise 6.4960 = 0. The alternative μ = 295 is farther away from μ0 = 300 than μ = 298 and so is easier to detect. are below 0.5. 6.9 19 4.98.05.4960.9099.107 The probability of a Type I error is α = 0.06 μg/l.52) 5.62) There is a strong. This is a “false-negative. we should require a P-value of α/k = 0. 3.81. positive.58. in fact. a false-negative is considered more serious. If you have an illness and it is not detected. (6.0 21 5. in fact. (4.08. (b) The probability that 2 or more are signiﬁcant is 0. 6.113 Answers will vary. x 95% CI. only two. 6.
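The Bonferroni procedure above divides the overall significance level across the k tests: each individual P-value is compared with α/k, which for k = 6 and α = 0.05 is about 0.0083. A sketch (the P-values below are illustrative):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each of k tests as significant only if its P-value <= alpha/k."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]

# Six tests at overall alpha = 0.05 -> per-test threshold 0.05/6 = 0.0083.
print(bonferroni_significant([0.001, 0.008, 0.02, 0.10, 0.30, 0.5]))
# [True, True, False, False, False, False]
```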

123 (a) Increasing the size n of a sample will decrease the width of a level C conﬁdence interval.11.05.67 ± 3. we would expect the proportion of times we reject to be about 0. 6. we would not expect to get exactly the same number of intervals to contain μ = 240. Thus. (c) m = 10. the probability that we will obtain data that lead us to incorrectly reject H0 is 0. the effect of the program is confounded with the reasons why some women chose to participate and some didn’t. If we repeated the simulations. This is consistent with the meaning of a 0. (c) The study is not good evidence that requiring job training of all welfare mothers will greatly reduce the percent who remain on welfare for several years.80.50. 6.60 ± 4.127 (a) Assume that in the population of all mothers with young children. In 100 trials where H0 was true. and so results will vary from one simulation to the next.125 No. (b) The sample size is large. of the simulations contained μ = 240.119 \$782.121 (a) The authors probably want to draw conclusions about the population of all adult Americans. They were not assigned using randomization. the proportion of times was 0.05 signiﬁcance level. (b) Increasing the size n of a sample will decrease the P-value. The probability is less than 0.Moore-212007 pbs November 27. and so in a very large number of simulations we would expect about 50% to contain μ = 240.02 = (\$776. 6. (b) 95% conﬁdence means that the method used to construct the interval 21% ± 4% will produce an interval that contains the true difference 95% of the time. or 60%. 6.05.0073. . those who would choose to attend the training program and those who would not choose to attend actually remain on welfare at the same rate.38 ± 4.131 (a) We used statistical software to conduct the simulations. 2007 9:50 S-34 SOLUTIONS TO ODD-NUMBERED EXERCISES The P-value is 0. (e) 15 of the 25. The null hypothesis is either true or false.129 Only in 5 cases did we reject H0 . 6. 
and the central limit theorem suggests that it is probably reasonable to assume that the sampling distribution for the sample means is approximately Normal. 6. The population to which their conclusions most clearly apply is all people listed in the Indianapolis telephone directory. Thus. Mothers chose to participate in the training program.05.92 (c) None of these intervals overlap. Statistical signiﬁcance at the 0. which suggests that the observed differences are likely to be real. (c) Increasing the sample size n will increase the power. \$788. The probability that any given simulation contains μ = 240 is 0.01 that we would observe a difference as or more extreme than that actually observed.82 ± \$6.05 level means that if the null hypothesis is true. (b) Store type 95% conﬁdence interval Food stores Mass merchandisers Pharmacies 18. 6. (d) The calculations agree with our simulation result up to roundoff error. we say that we are 95% conﬁdent that this particular interval is accurate. Because the method is reliable 95% of the time. This is strong evidence that the mean odor threshold for beginning oenology students is higher than the published threshold of 25 μg/l.84).61 48.45 32. This is because each simulation is random.
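The confidence-interval simulations described above can be reproduced in a few lines: draw repeated samples, form the interval each time, and count how often it covers the true mean. Here μ = 240 echoes the exercise; σ, n, the 95% z*, and the number of repetitions are illustrative:

```python
import random
from math import sqrt

def coverage(mu=240.0, sigma=20.0, n=25, reps=1000, z_star=1.96, seed=1):
    """Fraction of simulated 95% z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        m = z_star * sigma / sqrt(n)  # margin of error with known sigma
        hits += (xbar - m <= mu <= xbar + m)
    return hits / reps

print(coverage())  # close to 0.95, varying from one simulation seed to the next
```

As the solutions note, each run is random, so repeated simulations will not give exactly the same coverage count.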

938). there is little difference between critical values for the t and the Normal distribution.3218.31 . the t ∗ decreases for the same conﬁdence level. P-value ≈ 0.37876. so there are certainly stores with a percent change that is negative. 7.36917. not for individuals.2 is 0. (b) There are 11 degrees of freedom. Hypotheses are H0 : μ = 500 and Ha : μ > 500.87. (b) A 95% conﬁdence interval for the mean annual earnings of hourly-paid white female workers at this bank is (\$22.11 7. . (c) (27. The 95% conﬁdence interval for the mean monthly rent is (\$481. Hypotheses are H0 : μ = 0% and Ha : μ = 0%.262 using degrees of freedom n − 1 = 9. . 27.9 7. and using software. This illustrates that for large degrees of freedom. .9394. (b) 2.63).8% and the standard deviation is 15%.3 7.000.574 with 9 df. (b) The 95% conﬁdence interval is (20. \$27.811. which agrees quite well with the bootstrap intervals.0005.29 7. \$604.Moore-212007 pbs November 27. 0. As the conﬁdence level increases.12). We conclude there is strong evidence that the mean annual earnings exceed \$20. t ∗ increases for the same sample size.797 using degrees of freedom n − 1 = 24. (c) The mean is 4.914). we get P-value = 0. t = 1. (c) No.27 7.25 7. 40. (d) As sample size increases. (a) The stemplot using split stems shows the data are clearly skewed to the right and have several high outliers. 63.005 < P-value < 0. not signiﬁcant at 1%.11. (b) t ∗ = 2.025.7 7. (a) The data are skewed right with 3 points that are particularly high. .13 7.205. .02.21 7. (b) P-value < 0.350. . 59. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-35 CHAPTER . (d) 0.23 7.15 7.02145.19. and using software.552 with 19 df.5 (a) SEx = 24. s = 21.022.01.064 using degrees of freedom n − 1 = 24. A 95% conﬁdence interval for the mean loss in vitamin C is (50.7 7. (c) 0. and using software.025 and 0.89). (a) The t statistic must exceed 2. 
since we are interested in whether the average sales are different from last month (no direction of the difference is specified).
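The one-sample t statistics quoted throughout chapter 7 are all t = (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom. A sketch from summary statistics — the numbers below are illustrative, not from a specific exercise:

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """t statistic (xbar - mu0)/(s/sqrt(n)) and its n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1

t, df = one_sample_t(xbar=34.0, mu0=30.0, s=10.0, n=25)
print(round(t, 2), df)  # 2.0 24
```

The P-value then comes from the t distribution with df degrees of freedom (Table D or software).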

65 7.0021 using the Normal approximation or 0.67 7.61 7.13. (b) (4. Randomization makes the two groups similar except for the treatment and is the best way to ensure that no bias is introduced. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-37 data is 19. (c) P-value lies between 2 × 0.59 7. SPSS and Minitab report both the standard deviation and the standard error of the mean for each group. (a) The data are probably not exactly normally distributed because there are only 5 discrete answer choices. If we use the 68–95–99. while SAS reports only the standard error of the mean for each group. and Minitab provides the t with Satterthwaite degrees of freedom. There is not enough evidence to say that there is a signiﬁcant difference between the average consumption of sugary drinks 7. (b) Yes.69 7.55 (a) Two-sided signiﬁcance test.2 < P-value < 0.57. and n = 24 since the zero difference is dropped. all report the conﬁdence interval for the mean difference. since you just want to know if there is evidence of a difference in the two designs. Reject the null hypothesis. Reject the null hypothesis. All report degrees of freedom and P-values to various accuracies. First write pooled variance as the average of the individual variances and then use this expression in the pooled t to see that it gives the same result for equal sample sizes.7% rule. 5. (d) The sample sizes are so large that a little skewness will not affect the results of the two-sample comparison of means test. There is strong evidence of a difference in daily sales for the two designs.63 7. There is strong evidence the average self-efﬁcacy score for the intervention group is signiﬁcantly higher than the average self-efﬁcacy score for the control group. 7. 0. 0.426). P-value < 0.) (d) t = 3. (e) (0. Do not reject the null hypothesis.49. but be sure to double the P-value when checking your answer with this one. t = 1.3. (b) Excel reports the two variances.0033 using the binomial distribution. 
The two-sample comparison of means t test is fairly robust.0025 = 0.9 oz. There is very strong evidence the average exposure to respirable dust is signiﬁcantly higher for the drill and blast workers than it is for the concrete workers.44.57 7.002. The pooled t = 17.005 and 2 × 0. The same problem with negative consumption for the older kids (starting with 95%) as well. then 68% of the younger kids would consume between −2. (b) 29 df using the conservative approximation. t = 18. P-value is close to 0.374. while SAS seems to provide the most information because it includes more information about the two groups individually. The P-value is P(X ≥ 19) = 0.Moore-212007 pbs November 27.5 and 18. SPSS and SAS provide the t with Satterthwaite degrees of freedom as well as the pooled t. (A “=” would also be appropriate in the alternative hypothesis. (c) H0 : μ I = μC and Ha : μ I > μC . (c) H0 : μD/B = μC and Ha : μD/B > μC . of sweetened drinks every day. (a) All report both means. (c) Excel is doing the pooled two-sample t procedure. (e) Excel has the least information.001 = 0. (d) With the exception of Excel.669).0005. (b) H0 : μolder = μyounger and Ha : μolder = μyounger .191. This is a matched pairs experiment with the pairs being the two measurements on each day of the week. the large sample sizes would compensate for uneven distributions. (a) No.71 .13 with 133 df. Results are almost identical to those of Example 7.
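The "Satterthwaite degrees of freedom" that SPSS, SAS, and Minitab report come from Welch's approximation for the unpooled two-sample t. A sketch of the formula (the sample values below are illustrative):

```python
def welch_df(s1, n1, s2, n2):
    """Satterthwaite/Welch approximate degrees of freedom for a two-sample t."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# With equal spreads and equal sizes this reduces to the pooled n1 + n2 - 2.
print(round(welch_df(10.0, 15, 10.0, 15), 1))  # 28.0
```

The conservative alternative the solutions also use is the smaller of n1 − 1 and n2 − 1.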

2). (a) The hypotheses are H0 : μ0 = μ3 and Ha : μ0 > μ3 .85 7. (b) The value of the t statistic is 2.5.47. and the P-value is approximately zero. (b) The 90% conﬁdence interval is (19. so in spite of the skewness appearing in the histograms. (e) How were these children selected to participate? Where they chosen because they consume such large quantities of sweetened drinks? 7. These values would be used in the t statistic. then the percent sales would have decreased. 160. where 0 corresponds to immediately after baking and 3 corresponds to three days after baking. Do not reject the null hypothesis.965.07/2 = 0.532).85. 4.932. so it is very unlikely that the ﬁrst mean is greater than the second mean.07/2 = 0. so the improvement is greater for those who took piano lessons.58).77. (b) P-value is 0.73 (a) Reject the null hypothesis because 0 is not inside the conﬁdence interval. df = 21. You reject at both the 1% and 5% levels of signiﬁcance. 34. and the sum of the sample sizes is close to 40. (c) (−5. 18.46) using 40 degrees of freedom as the conservative estimate from the t-table. Our sample mean difference is well above 0. (a) Using 50 df to be conservative and using Table D.662) using software.54.16. which is included in the conﬁdence interval. The analysis would proceed by ﬁrst taking the change in score and then computing the mean and standard deviation of the changes. Reject the null hypothesis. (c) These are observational data. (b) The individuals in the study are not random samples from low-ﬁtness and high-ﬁtness groups of middle-aged men. (a) The hypotheses are H0 : μLow = μHigh and Ha : μLow = μHigh . t = −0. and P-value > 0.79 7. t = −8.014.45).32. 2007 9:50 S-38 SOLUTIONS TO ODD-NUMBERED EXERCISES between the older and younger groups of children. (b) As sample size increases.91 7. The t procedures are not particularly appropriate here. If the true difference in means was −1.93 . 
A 99% CI will be wider than a 95% CI because the t* value increases as the confidence level increases.
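A matched pairs analysis, as in the bread and piano-lesson exercises above, takes the within-pair differences and runs a one-sample t on them. A sketch with made-up differences:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(differences):
    """One-sample t on within-pair differences, testing mean difference = 0."""
    n = len(differences)
    d_bar, s_d = mean(differences), stdev(differences)
    return d_bar / (s_d / sqrt(n)), n - 1

t, df = paired_t([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical differences
print(round(t, 3), df)  # 4.243 4
```

A zero difference is conventionally dropped before computing n, as noted in one solution above.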

1 There is a signiﬁcant difference between the high.125 0. (a) t ∗ = 12. (b) F = 3.129 Between 0.103 (a) The value in the table is 647.7411.Moore-212007 pbs November 27.99 7.981).101 The hypotheses are H0 : σ1 = σ2 and Ha : σ1 = σ2 . (b) t ∗ = 4.37 0.87 1.29 0. P-value = 2 × 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-39 0. 7. δ) = 0.105 (a) The hypotheses are H0 : σM = σW and Ha : σM > σW . and using an F(1.5 > 0. The powers obtained using the Normal approximation are quiter close to these values.7765.4855.24 1. 7.9769.139 0.14 0.3390.001 and 0. serving ordered .2981 = 0.907.16 0.2 and 0. 7.3 ≈ 0. F = 1.002 and 0.20.06 0. n = 50 is 0. 7.45 0. (b) F = 1.71.5 > 0.95 7.859.28 2.1 and 0.123 0.630 0. so there is little evidence of a difference in variances between the two groups. (c) The increase in degrees of freedom means that the value of the t statistic needs to be less extreme for the pooled t statistic in order to ﬁnd a statistically signiﬁcant difference.005 Between 0.44 1. n = 75 is 0.609 2.303.79.9460. (a) The upper 5% critical value is F ∗ = 2.136 0.47 3.1528.109 (a) Using SAS.02 Between 0.5 Conclusion Reject H0 Do not reject H0 Reject H0 Do not reject H0 Reject H0 Do not reject H0 Do not reject H0 Do not reject H0 0. These powers were calculated using SAS.2 Between 0.48 0.08 0. DF.0764 = 0. (c) Using an F(19. 7.24 2 sH 170 + 2 sL 224 t 3.10. which is greater than the power found in (a).59. 17) distribution P-value > 0. δ) = 0.5962.39 0.002 Between 0. and using an F(33. (b) Signiﬁcant at the 10% level but not at the 5% level.23 0. 35.107 The power for n = 25 is 0.21 0.05 Do not reject H0 and 0. (b) Using SAS. 7. 1) distribution and statistical software.94.127 0.86 P-value Between 0. POWER = 1-PROBT(t ∗ . n = 125 is 0.01 Reject H0 and 0.74. 7.125 0.143 0. 43) distribution and statistical software.005 > 0. P-value = 2 × 0.97 The approximate degrees of freedom are 37. POWER = 1-PROBT(t ∗ .119 0. 
so there is little evidence of a difference in variances between the two groups.01 0. so there is little evidence that the males have larger variability in their scores. DF.0165. (c) The conﬁdence interval for the difference is (4. n = 100 is 0. well-dressed staff.111 Perceived quality Food served in promised time Quickly corrects mistakes Well-dressed staff Attractive menu Serving accurately Well-trained personnel Clean dining area Employees adjust to needs Employees know menu Convenient hours xH − xL 0.and low-performing restaurants in regards to food served in promised time.8787. This gives fairly strong evidence that the mean SSHA score is lower for men than for women at this college.
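The F statistics above for comparing two spreads are ratios of sample variances, conventionally with the larger variance in the numerator. The standard deviations below are illustrative:

```python
def f_ratio(s1, s2):
    """F statistic for comparing two standard deviations: larger s^2 over smaller."""
    a, b = sorted((s1**2, s2**2), reverse=True)
    return a / b

print(round(f_ratio(12.0, 10.0), 2))  # 1.44
```

The P-value comes from the F distribution with (numerator n − 1, denominator n − 1) degrees of freedom; as the solutions caution, this test is not robust to non-Normal data.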

Side-by-side boxplot shows cotton much higher than ramie.573).113 (a) Study was done in South Korea. This suggests that more than half of the products at the alternate supplier are priced lower than the original supplier.98.3077).137 As the degrees of freedom increase for small values of n. the F test is not robust to skewness.96 but never become greater than 1.207.312. (c) The P-value is approximately equal to zero. P-value is very close to 0 so reject the null hypothesis. 2007 9:50 S-40 SOLUTIONS TO ODD-NUMBERED EXERCISES food accurately. 7. There is strong evidence that cotton has a signiﬁcantly higher mean lightness than ramie. Response rate is low (394/950). Only selected QSRs were studied. The P-values of both are approximately zero. . There is not a signiﬁcant difference in the other qualities. 7. (b) Yes.012). (b) The t procedures are robust for large sample sizes even when the distributions are slightly skewed. 0. (b) (−0.39219. 7.0765. P-value is close to 0 so we can reject the null hypothesis.954. As sample sizes get larger. and employees know menu. 7. 7. There is strong evidence that hotel managers have a signiﬁcantly higher average masculinity score than the general male population. 0. 7. 7. (c) The middle 95% of scores would be from 29. 7.96. (b) (12.121 (1. 7. 7. therefore they used a single sample t test.115 H0 : μ = 4.555 ± 0. the t values are close to z = 1. results may not apply to other countries. x R = 41. H0 : μC = μ R and Ha : μC > μ R .131 (a) (0. 7. (b) Answers will vary. 92. and we can conclude that the program was effective. slightly skewed right but fairly symmetric.127 (a) This study used a matched pairs design. 7.6488. the t test is robust to skewness.09).133 (a) No.9998. t = −8.88 and Ha : μ > 4. results may not apply to other QSRs.139. we would trust the results more if the rate were higher. sC = 0. (d) The scores for the ﬁrst minute are clearly much lower than the scores for the 15th minute. 
Results of the t tests show that there is a signiﬁcant difference in the average SATM scores between the two sexes.25377. the value of t rapidly approaches the z score.Moore-212007 pbs November 27. The fact that no differences were found when the demographics of this study were compared with the demographics of similar studies suggests that we do not have a serious problem with bias based on these characteristics.117 (a) No outliers.66 to 44. 7. t = 46.129 (a) H0 : μ1 = μ2 and Ha : μ1 < μ2 . We need to assume fairly normally distributed data without outliers and that a simple random sample was taken from each group.123 2. P-value ≈ 0.6035). Conclude that the workers were faster than the students.55. t = 21. (b) The average weight loss in this program was signiﬁcantly different from zero. s R = 0.125 The 95% conﬁdence interval on percentage of lower priced products at the alternate supplier is (64.119 x C = 48.162. 7.9513.98. 13.135 The F test result shows that there is no reason to believe the variances of the two sexes are unequal.88.
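The F test for equality of spread mentioned in solutions 7.135 and 7.137 compares the two sample variances. A sketch of the statistic, with hypothetical standard deviations (not the book's data):

```python
def variance_ratio_f(s1, s2):
    """F statistic for comparing two sample standard deviations:
    larger sample variance over smaller, so F >= 1."""
    v1, v2 = s1 * s1, s2 * s2
    return max(v1, v2) / min(v1, v2)

# Hypothetical standard deviations for two groups
f = variance_ratio_f(3.1, 2.0)
```

An F near 1 is consistent with equal variances; as the solutions note, this test (unlike t) is not robust to skewness.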

59 8. 0. np0 = 150 and n(1 − p0 ) = 350.9 m: 0. (b) (0. Ha : p = 0.2 0. 0.64.47 8.642.267). 0. The P-value is approximately 0.34. 0.112.5 0. 0.49 ˆ For p2 : mean = 2 σ p1 − p2 ˆ ˆ = 2 σ p1 ˆ + p2 (1− p2 ) .3 0.1802. ˆ n n = μ p1 −μ p2 = ˆ ˆ (0.43 8. 0. (c) Safe. np0 = 4 < 10 and n(1 − p0 ) = 6 < 10.0820 0.089).7 0. The test statistic z = 4. (c) These results are consistent (same P-value) with the previous exercise.53 8.Moore-212007 pbs November 27. (a) μ p1 − p2 = −0. Also.35 (0.3078.05. we cannot automatically conclude that a total of 24 = 15 + 9 applicants lied about one or the other.0877 0. (0.33 8.5.5165). We round up to get n = 201.64. we would not reject the null hypothesis that the probability that Kerrich’s coin comes up heads is 0. standard deviation σ p2 = ˆ ˆ p1 − p2 .93. 0. (a) Not safe. (d) Safe.0877 0. not a designed experiment. 8. np0 = 60 and n(1 − p0 ) = 40.1 and σ p1 − p2 = 0.45 8. H0 : p1 = p2 . (b) μ D = μ p1 − p2 ˆ ˆ n p1 (1− p1 ) p2 (1− p2 ) 2 σ p2 = + .295.61 .0947.294). The margin of error for the 90% conﬁdence interval is 0. (b) (0. Round up to get n = 451. ˆ (a) p = 0.0537 8. (a) H0 : p = 0. standard deviation σ p1 = ˆ ˆ μ p2 = p2 . Both are greater than 10.0691. Observational studies generally do not provide a good basis for concluding causality.339).4 0.768. Plus four interval: (−0.106. this is not a random sample from the population of all bicyclists who were fatally injured in a bicycle accident.1 0. P-value = 0. ˆ ˆ ˆ ˆ (a) H0 : pImpulse = pPlanned and Ha : pImpulse = pPlanned .125. n = 450. (b) Safe.46. (0. (c) 8.912). This is not strong evidence against the null hypothesis that the sample represents the state in regard to rural versus urban residence in terms of the proportion of urban residents. The P-value = 0. 0.02.0716 0. The P-value = 0.0895 0. n = 1711. only association.3524.31 8.6 0. Z interval: (−0. ˆ p: 0.3168.272).51 ˆ (a) For p1 : mean μ p1 = p1 . 0.768). 
2007 9:50 S-42 SOLUTIONS TO ODD-NUMBERED EXERCISES are not necessarily mutually exclusive events.0537 0. This is strong evidence that a higher proportion of complainers than noncomplainers leave voluntarily. We conclude there is not strong evidence against the hypothesis that the sample represents the state in regard to rural versus urban residence in terms either of the proportion of rural residents or in terms of the proportion of urban residents.41 8. Z = −1.39 8.37 8. Do not reject the null hypothesis. Ha : p1 > p2 . (a) The test statistic is z = 1. Both are greater than 10. (c) This is an observational study.0820 0.0716 0.18. and since this is larger than 0.0788). and X = 542 (the number who tested positive) are basic summary statistics.8 0.4969. (c) (−0. n 8.0301.57 2 σD p1 (1− p1 ) . (b) The test statistic is z = −0.55 8.289. There is not enough evidence to say 8.
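The "safe / not safe" answers above apply the rule that the Normal approximation for a single proportion requires both np0 and n(1 − p0) to be at least 10, and the z tests use the standard one-proportion statistic. A minimal sketch with made-up counts:

```python
import math

def large_sample_ok(n, p0):
    """Rule used in these solutions: the Normal approximation is 'safe'
    when both n*p0 and n*(1 - p0) are at least 10."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

def one_prop_z(x, n, p0):
    """z statistic for H0: p = p0, with x successes in n trials."""
    phat = x / n
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Hypothetical counts: 60 successes in 100 trials, testing p0 = 0.5
ok = large_sample_ok(100, 0.5)
z = one_prop_z(60, 100, 0.5)
```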

0273. (c) (−0.141 and p2 = 0.14.1963.511.408. P-value = 0. P-value is approximately 0. H0 : p1 = p2 . There is evidence of a signiﬁcant difference between the proportion of male athletes who admit to cheating and the proportion of female athletes who admit to cheating.2125. This is strong evidence that the proportion of males born in Copenhagen with an abnormal male chromosome who have criminal records is larger than that of males born in Copenhagen with a normal male chromosome. We would recommend that ﬂip-up shields be used on new tractors. H0 : p1 = p2 . Let p1 denote the proportion of all Danish males born in Copenhagen with a normal male chromosome who have had criminal records and p2 the proportion of all Danish males born in Copenhagen with an abnormal male chromosome who have had criminal records. The interval does not contain 0. (b) (0.79 8.1787). Ha : p1 < p2 . (b) SE D = ˜ 0.029. (a) z = 5. (a) H0 : p1 = p2 . and this is strong evidence that the opinions differed in the two counties. Someone who gambles will be less likely to respond to the survey. ˆ ˆ p1 (1− p1 ) contributes more to the standard error of the difference (b) The term n1 because n 1 = 191 is so much smaller than n 2 = 1520. 0. This is strong evidence that there is a difference in the proportions of females and males who were in a fatal bicycle accident.038.8.4115). in fact. Benton County: p2 = 0.69 8.71 8.34.2377). Ha : p1 = p2 .339. Ha : p1 = p2 . P-value = approximately 0. Do you think men or women are more likely to report that they do not gamble when. 0.5. so reject the null hypothesis. ˆ ˆ (a) Tippecanoe County: p1 = 0. −0.7333. This is not strong evidence that there is a difference in preference for natural trees versus artiﬁcial trees between urban and rural households. Z = −0. 0. We test the hypotheses H0 : p1 = p2 . P-value = 0.77 8. Ha : p1 = p2 . z = −5.75 8. P-value = 0.2186. ˆ ˆ (a) p1 = 0.51. and tested positive. 8. 
We could conclude that there is reasonably strong evidence that the proportion of cockroaches that will die on glass is less than the proportion that will die on plasterboard. were tested for alcohol. 0. Zero is well outside this interval. There is not enough evidence to say that the applicants are lying in different proportions than they did 6 months ago. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-43 the difference in credit card use between impulse and planned purchases is statistically signiﬁcant. (a) (−0. P-value = 0.482).23. (b) z = 1. A 95% conﬁdence interval is (−0. There is not enough evidence to say that the detergent preferences are signiﬁcantly different for people with hard water and people with soft water.73 8. The 95% conﬁdence interval (male − female) is (0.1389).81 . Ha : p1 = p2 . Z = 1.67 8.40.63 H0 : pS = pH and Ha : pS = pH .0209. (c) (0.91.2542.143). Do not reject the null hypothesis. P-value = approximately 0.Moore-212007 pbs November 27.0281. they do gamble? 8. This is strong evidence that there is a difference in the proportions of the two types of shields removed. 0. so that bolt-on shields appear to be removed more often than ﬂip-up shields. z = −3.0003. We test the hypotheses H0 : p1 = p2 .65 8.253. Z = 14. (b) z = −1. P-value < 0.
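The many two-proportion tests in this stretch of Chapter 8 (shield types, cheating rates, voluntary departures, and so on) all use the pooled two-proportion z statistic. A sketch with illustrative counts (not the textbook's data):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)            # pooled estimate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 60/200 in group 1 versus 45/200 in group 2
z = two_prop_z(60, 200, 45, 200)
```

The sign of z indicates which group's sample proportion is larger; two-sided P-values double the tail area.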

231. Users (total number) Analysis of education Analysis of income 1132 871 Nonusers (total number) 852 677 8. Ha : p1 = p2 . The 95% CI (users − nonusers) is (−0. For nonusers. H0 : p1 = p2 . P-value = 0. so reject the null hypothesis.Moore-212007 pbs November 27. so you could summarize by saying that the margin of error was no greater than 3. 2007 9:50 S-44 SOLUTIONS TO ODD-NUMBERED EXERCISES 8. Let p1 be the proportion of all die-hard fans who attend a Cubs game at least once a month and p2 the proportion of less loyal fans who attend a Cubs game at least once a month.1141.39 0. personalities of the servers. it is not a serious limitation for this study. there is variability involved in taking a sample.3 24 20 17 7 3 n 247 247 247 247 247 1371 1371 m (in %) 3. 0. Did one server only do the repeating while the other server did no repeating.102.2019). There is strong evidence that the users and nonusers differ signiﬁcantly in the proportion of college graduates. Z = 6.0626). 0.1676.97 with a P-value near 0.38. (c) Cultural differences. z = 9. (b) H0 : p1 = p2 . Ha : p1 = p2 . 0.00 2.69 0. so do not reject the null hypothesis.783. 95% CI (repeat − no repeat) is (0.72 2.0022. We test H0 : p1 = p2 .04.83 (a) and (b) Category Download less Peer-to-peer E-mail and IM Web sites iTunes Overall use of new services Overall use paid services ˆ p (in %) 38 33. the proportion of “rather not say” is 0. You could also separate out the last 2 questions by saying their margin of error was less than 1%. P-value = approximately 0. the proportion of “rather not say” is 0.05. so reject the null hypothesis. This is strong evidence that the proportion of die-hard Cubs fans who attend a game at least once a month is larger than the proportion of less loyal 8. 8.205. and gender differences could all play a role in interpreting these results.85 (a) H0 : p1 = p2 . Z = 3. 
Z = 1.46 (c) Argument for (A): Readers should understand that the population percent is not guaranteed to be at the sample percent. (b) 95% CI (users – nonusers) = (0.55 2.91 . There is not much evidence of a signiﬁcant difference in the proportion of “rather not say” answers between users and nonusers.0106.09 3. P-value = 0. Since the nonresponse rate is not signiﬁcantly different for the two groups. Ha : p1 > p2 . There is strong evidence of a signiﬁcant difference in the proportion of tips received between servers who repeat the customer’s order and those who do not repeat the order.430). or did they switch off? (d) Answers will vary. Argument for (B): Listing each individual margin of error does seem excessive. 8. Ha : p1 = p2 . pnorepeat = 0.89 ˆ ˆ (a) prepeat = 0.87 For users.517.09% for each of these questions.
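The confidence intervals for differences of proportions quoted throughout these solutions use the unpooled standard error. A minimal sketch, with hypothetical counts rather than the book's data:

```python
import math

def diff_prop_ci(x1, n1, x2, n2, z=1.96):
    """Large-sample confidence interval for p1 - p2,
    using the unpooled standard error (z = 1.96 gives 95%)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical counts: 50/100 versus 40/100
lo, hi = diff_prop_ci(50, 100, 40, 100)
```

An interval that excludes 0 corresponds to a significant difference at the matching two-sided level.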

11. This does not include 0. The margin of error is m = 0. A 95% conﬁdence interval for the difference in the two proportions is (−0.09.390.93 8. This is consistent with the results in Example 8.5.5)(0. 8.8.0013. The z statistic and P-value are almost the same as in part (b). 8.5) .3755.438 25 0. z = −3. We test the hypotheses H0 : p = 0. the margin of error decreases. This conclusion is justiﬁed provided the trial run can be considered a random sample of all (future) runs from the modiﬁed process and that items produced by the process are independently conforming or nonconforming.76(1025) = 779 (after rounding off). 8. This is strong evidence that there is a difference in the proportions of male and female college students who engage in frequent binge drinking. −0.99 8.S. P-value = approximately 0. P-value = approximately 0.062 As sample size increases.107 (a) p0 = 0. adults who are male. (b) H0 : p1 = p2 .20.6736). it is not possible to guarantee a margin of error of 0.97 8. 0.00319). z = −29.95 The number of people in the survey who said they had at least one credit card is n = 0. (c) z = −29.791.098 400 0. P-value = approximately 0. n2 This leads to ˆ 8.5645). .16. Ha : p < 0. Let p be the proportion of products that will fail to conform to speciﬁcations in the modiﬁed process.139 150 0. the proportion of U. z = −3.11. 8.S. m = 1. We would conclude that there is strong evidence that the proportion of men with low blood pressure who die from cardiovascular disease is less than the proportion of men with high blood pressure who die from cardiovascular disease. (0.277 50 0. A 95% conﬁdence interval for the difference in the two proportions is (0. the former value. P-value = 0.5)(0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-45 Cubs fans who attend a game at least once a month. Let p1 denote the proportion of male college students who engage in frequent binge drinking and p2 the proportion of female college students who engage in frequent binge drinking. 
This is strong evidence that the proportion of nonconforming items is less than 0.196 100 0. P-value = 0. 0. (b) p = 0.11. so we would conclude that the proportion of heavy lottery players who are male is different from the proportion of U. We test the hypotheses H0 : p1 = p2 .113 200 0.01411.105 Starting with a sample size of 25 for the ﬁrst sample. adults who are male.101 (a) Let p1 denote the proportion of men with low blood pressure who die from cardiovascular disease and p2 the proportion of men with high blood pressure who die from cardiovascular disease. so the magnitude of the difference is large.069 500 0.960 a negative value for n 2 .485.0275. Ha : p1 < p2 . The 95% conﬁdence interval is (0.0008. Ha : p1 = p2 . z = 9. which is not feasible.5524.15 or less.103 n m 10 0.5) 25 + (0.Moore-212007 pbs November 27. We would conclude that there is strong evidence that the probability that a randomly selected juror is Mexican American is less than the proportion of eligible voters who are Mexican American.
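The "round up to get n = ..." answers above come from solving the margin-of-error formula for n, using p* = 0.5 as the conservative guess that guarantees the margin for any true proportion. A sketch (the m = 0.03 call is illustrative, not a textbook exercise):

```python
import math

def n_for_margin(m, z=1.96, pstar=0.5):
    """Smallest n with z*sqrt(pstar*(1 - pstar)/n) <= m.
    pstar = 0.5 maximizes pstar*(1 - pstar), so this n guarantees
    the margin of error no matter what the true p is."""
    return math.ceil((z / m) ** 2 * pstar * (1 - pstar))

n = n_for_margin(0.03)   # a 3-percentage-point margin at 95% confidence
```

Because the required n grows like 1/m², halving the margin of error roughly quadruples the sample size.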

or obtained from a shelter. and no intervention has a 20. The older employees (over 40) are almost twice as likely to fall into the lowest performance category but are only 1/3 as likely to fall into the highest category. 9.000. df = 4. This is highly signiﬁcant.241 and P = 0.4121. df = 2. Although there appears to be a strong increasing trend.028. (c) χ 2 = 163.37 9. Statistical software gives P-value = 0. Phoenix: χ 2 = 2.001.611. and P-value < 0. It is easily veriﬁed that z 2 = χ 2 .572 and P < 0.000. The percent continued to increase in the late 1980s but much more slowly. so the difference in percent on time for the two airlines is highly signiﬁcant. stray. χ 2 = 3. with some leveling off in the early 1990s. while a much higher percent of cats than dogs come from other sources such as born in home. df = 12. χ 2 = 43. The percent started increasing again in the mid-1990s. San Diego: χ 2 = 4. (b) H0 : There is no relationship between intervention and response rate.31 9. df = 4. A much higher percent of dogs than cats come from private sources.2% response rate. df = 2. (a) Combined χ 2 = 13. df = 2.45 9. the changes in the percents are quite small (from 60% to about 66%). Use the information on nonresponse rate.037.487. indicating a relationship between the PEOPLE score and ﬁeld of study. note that science has an unusually large percent of low-scoring students relative to other ﬁelds.6% response rate.845 and P = 0. P-value = 0.000. z = 6.35 9.346 and P = 0. P-value = 0. There is strong evidence of a relationship between winning or losing this year and winning or losing next year. but very slowly. The source of dogs and cats differs.Moore-212007 pbs November 27.223 and P < 0. Intervention seems to increase the response rate. χ 2 = 24. In Exercise 9.126. while liberal arts and education have an unusually large percent of high-scoring students relative to the other ﬁelds.0005. San Francisco: χ 2 = 21.29 9.513. (a) A phone call has a 68. Seattle: χ 2 = 14. df = 2. 
while for the data in this exercise the opposite is true.000. P-value = 0.9. but that percent increased quite rapidly until the mid-1980s. χ 2 = 19.47 .683. There is no evidence of a difference in the income distribution of the customers at the two stores. with America West being the winner. P-value = 0. (b) χ 2 = 6. with a phone call being more effective than a letter. P-value = 0.903 and P < 0. and take a larger sample size than necessary to make sure that you have enough observations with nonresponse accounted for. Women represented a very small percent in 1970. when it reached about 60%. At the 5% level of signiﬁcance we would conclude a relationship between the source of the cat and whether or not the cat is brought to an animal shelter.411.33 9. (b) Los Angeles: χ 2 = 3. Ha : There is a relationship.41 9.413. df = 1. P-value = 0.000. The data show no evidence the response rates vary by industry.43 9.969 with 13 degrees of freedom (P-value = 0.696). 2007 9:50 S-48 SOLUTIONS TO ODD-NUMBERED EXERCISES 9. (c) Most of the effects illustrating the paradox are statistically signiﬁcant.81.072. (a) Column percents because the “source” of the cat is the explanatory variable. Among the major differences.20 good performance continued.27 9.001.277.25 χ 2 = 50. The graph shows an increasing trend. a letter has a 43.7% response rate.1977 and χ 2 = 38.39 9. yet χ 2 = 9.
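The chi-square statistics reported throughout Chapter 9 compare observed counts in a two-way table with the expected counts (row total × column total / grand total). A minimal sketch with a made-up 2 × 2 table:

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    two-way table of observed counts (given as a list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand   # expected count under H0
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# Hypothetical observed counts
x2, df = chi_square([[10, 20], [20, 10]])
```

For a 2 × 2 table (df = 1), this statistic equals the square of the two-proportion z statistic, which is why the solutions note z² = χ².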

7 10.91. (c) MEAN OVERSEAS RETURN = 4. (a) χ 2 = 24.10 .5 10. min = 2.66% increase in overseas returns. (e) For 2001. min = 29..94.51 CHAPTER . so we can reject the null hypothesis. .. there is a strong. (b) The output gives t = 6. which indicates no gender differences for the high versus low mastery categories. It’s very hard to tell if there are any outliers or unusual patterns because the relationship is so weak.66 × U.3 Predicted wages = \$449.24 (d) SPENDING = β0 + β1 · YEAR + ε.7%. and the regression standard error is s = 16.11 (a) r = 0. overseas returns are 4.03 and P-value = 0.. A 1% increase in U.863. and P-value < 0.0005.0005 and there is strong evidence that the population correlation ρ > 0.14 is a comparison of several populations based on separate samples from each. Ha : β1 > 0. The mutual funds that had the highest returns last year did so well partly because they were a good investment and partly because of good luck (chance). Inspection of the data shows that the males have higher percents in the two high social comparison categories. max = 9. From SPSS. 9. When U. the hypothesis test in part (d) has t = 3.S. positive. s = 17.18 −0.7 + 0.400 + 1. max = 70.535. (h) The assumptions are reasonable—we are assuming a simple random sample from the population.12 is a test of independence based on a single sample. 10.32 with 49 degrees of freedom.S..91. This agrees with what we found from the full 4 × 2 table. (b) β1 = 0. and linear. t = 6. (c) χ 2 = 0.400.76. (g) Yes..29. The ε term in this model accounts for the differences in overseas returns in years when the U. (a) β0 = 4. P-value < 0. with estimates β0 = −2651.9. ˆ (b) y = −2651..32 with 49 degrees of freedom. . 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-49 9.280.16 is a test of independence based on a single sample.Moore-212007 pbs November 27. returns is associated with a 0. β1 = 1.923 + 0. while the females have higher percents in the two low social comparison categories.2898. 
linear relationship between year and spending.7.49 Exercise 9. (e) y = 52. skewed left but no outliers.S. IBI: x = 65. the normal probability plot looks good. (b) Scatterplot looks weak.30 −0..9%. The residual is \$2.13 is a comparison of several populations based on separate samples from each. P-value < 0.13 Area: x = 28.001. ε = 0.415 and a P-value of 0. 10.340x. H0 : β1 = 0.6270.. R 2 = 19.3177. SEb1 = 0.9 b1 = 0. return is the same.12 0. The trend might not stay the same beyond 1999. positive.24 −0.45 and P-value < 0. Exercise 9. 10. returns are 0%. RETURN + ε. Residual = −\$60.. Ha : β1 = ˆ 0. (d) H0 : β1 = 0.460x. (c) IBI = β0 + β1 · AREA + ε. This is an extrapolation. There is strong evidence that Area and IBI have a linear relationship.1 10. df = 2. Exercise 9. (c) Year 1995 1996 1997 1998 1999 Residual 0. fairly symmetric and normally distributed except for 2 high outliers.0005.0992. (f) The residual plot shows that the residuals get slightly less spread out as Area increases. t = 6. the predicted spending is \$29.94. (a) Yes. (b) χ 2 = 23.S.66.6700. Exercise 9.0005.340. but overall it doesn’t look too bad. 10. s = 18.714.

05.9. 10. (c) Income = 24. (b) y ± t ∗ SEμ = (397.5 to 10.8209 to 27982. The relationship is weak.6844).5 to 7.86. The conclusion we reached in Exercise 10. The conclusion we reached in Exercise 10. and the Normal probability and residual plots look good. the intercept and slope for the least-squares regression line change from 4786.187469.17 is not changed. ˆ ˆ 10.8209 to 8039. (c) t = 5.29 with 49 degrees of freedom. so that age explains only about 3% of the variation in income. respectively.0005. We conclude that there is strong evidence that ρ > 0.835.03525. P-value < 0. 10. P-value = 0. This is strong evidence that β0 > 0. 10.1135. (b) b0 = 2. (b) Selling price = 4786.0005.51 with 48 degrees of freedom. (b) Older men might earn more than younger men because of seniority or experience. 454. respectively.17 H0 : ρ = 0.26 and 89.1%.Moore-212007 pbs November 27. 3. The correlation is positive.6% to 71.17 is not changed. Younger men might earn more than older men because they are better trained for certain types of high-paying jobs (technology) or have more education. Ha : ρ = 0. 10.8209 × Square footage. No one will purchase T-bills if they do not offer a rate greater than 0. The correlation is a numerical description of the linear relationship between Area and Forest. (b) Regression inference is robust against a moderate lack of Normality for larger sample sizes. 10. ˆ ˆ .54). At the upper (higher income) portion of each vertical stack the points are more dispersed.8 and 73.3029. r 2 = 0. so there is a positive association between age and income in the sample. which is very weak.21 (a) We see the skewness in the plot by looking at the vertical stacks of points.1135 × Age. 10.0001.23 (a) The intercept tells us what the T-bill percent will be when inﬂation is at 0%. (c) (0.8%. the predicted income increases by \$892. This agrees with what we saw in the test using the slope in the 10. and the t statistic for the slope changes from 10.6% to 59. 
(d) (1.97.46 and 92.6476. 2007 9:50 S-50 SOLUTIONS TO ODD-NUMBERED EXERCISES there is a fairly linear (although weak) relationship between IBI and Area.3745 + 892. (b) r 2 decreases from 69.46 + 92. with only a very few at the highest incomes.33 (a) y ± t ∗ SEμ = (391. and the t statistic for the slope changes from 10.14. r = 0.696 and indicates that square footage is helpful for predicting selling price.061. there is approximately the same spread above and below the regression line on the scatterplot which is fairly uniform for all Area values on the plot. Ha : ρ > 0. 10.35). There is a statistically signiﬁcant straight-line relationship between selling price and square footage. There is not enough evidence to say that the correlation is signiﬁcantly different from 0 if a signiﬁcance level of 0.19 (a) Each of the vertical stacks corresponds to an integer value of age.15 Area is the better explanatory variable for regression with IBI.29 (a) r 2 increases from 69.25 (a) r 2 = 0. 10.874.46 and 92. The slope tells us that for each additional year in age. the intercept and slope for the least-squares regression line change from 4786. 449. t = 10. 0.5039. so do not reject the null hypothesis.200261).31 (a) Growth is much faster than linear.6660 and SEb0 = 0. P-value < 0. There are many points in the bottom (lower income) portion of each stack.27 H0 : ρ = 0. IBI and Forest have a much weaker relationship than IBI and Area. (b) The plot looks very linear.3070. 10.05 is used. P-value < 0.
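The tests of H0: ρ = 0 in solutions such as 10.17 and 10.27 convert the sample correlation r into a t statistic with n − 2 degrees of freedom. A sketch with illustrative values (not the book's):

```python
import math

def corr_t(r, n):
    """t statistic (df = n - 2) for H0: rho = 0,
    from a sample correlation r based on n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical: r = 0.5 from n = 27 observations
t = corr_t(0.5, 27)
```

In simple regression this t is identical to the t for the slope, which is why the solutions say the two tests agree.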

09944.1936. 10. F = 39. (b) (33. 53.0% PI.203). P-value = 7.496). (c) df = 137.47 SEb1 = 0.314). 10. latitude.874. r 2 = 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-51 10. There is strong evidence that the slope of the population regression line of proﬁtability on reputation is nonzero.36% of the variation in proﬁtability among these companies is explained by regression on reputation. 10.03996.140586). 10.0417. (b) This interval is wider because the regression output uses all 5712 men in the data set. (b) Under the column labeled 95.996).41 (58.885 − 0.111822.63 Answers will vary. but using the t table we would use df = 100 to be conservative.61 r 2 = 0.0001. Total SS 10. the pharmacy would be charging less of a markup. P-value for F statistic = 0.915. 10. ˆ 10.49 The hypotheses are H0 : β1 = 0. but the value of r 2 in part (b) tells us that only 19.3177. They are not exactly the same because the IBI and Forest did not have as strong a relationship as IBI and Area.Moore-212007 pbs November 27.78. which equals the value of R-square in the output. t = 6. and P-value = 0. It is too wide to be very useful.0% CI.57 (0.0001 = P-value for t statistic for the slope. ˆ ˆ (b) y = 2. 10. A 95% prediction interval for a future response estimates a single IBI for a 31 km2 area. This interval includes values larger than 0.51 (a) r 2 = = 0.780. 0.53 s = 19. 62.35 (a) (62. 10.37 The conﬁdence intervals for mean response and the prediction intervals are both fairly similar. For testing H0 : β1 = 0.914 and t = 6.3745 + 892.1135 × Age = 51.446.43 (a) (44. 10. 0. (c) Under the column labeled 95.010). so we can reject the null hypothesis. There is strong evidence that (log) markup and (log) cost have a negative linear relationship (evidence that charge compression is taking place).041. 100.11428).295x where y is the predicted (log) markup and x is the (log) cost.797).08. 
we see that the desired interval is (−41.492) is the square of the t statistic (6.65 (a) For more expensive items.041) for the slope. the P-value is less than 0. (d) It would depend on the type of terrain and the location. .36% of the variation in proﬁtability among these companies is explained by regression on reputation. (b) 19. 55. (b) s = Residual MS = 2. 10.39 (a) y = 24. √ F = t.005. 10.637.1801. 71.563E − 08. 10. so Steve can’t be conﬁdent that he won’t be arrested if he drives and is stopped.4489.55 (a) H0 : β1 = 0.45 A 90% prediction interval for the BAC of someone who drinks 5 beers is (0. Ha : β1 = 0.2534. Mountain regions.735. (c) Part (a) tells us that the test of whether the slope of the regression line of proﬁtability on reputation is 0 is statistically signiﬁcant. (c) A 95% conﬁdence interval for mean response is the interval for the average IBI for all areas of 31 km2 . √ Regression SS 10.379. 10. we see that the desired interval is (49.579. while the one-sample t procedure uses only the 195 men of age 30. and proximity to industrial areas might make a difference. 145.59 The F statistic (36. Ha : β1 = 0. Ha : β1 < 0.
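Solutions 10.51 and 10.59 use two ANOVA-table facts: r² = Regression SS / Total SS, and in simple regression the F statistic (df 1 and n − 2) is the square of the slope's t statistic. A sketch with made-up sums of squares:

```python
def simple_anova(reg_ss, total_ss, n):
    """For simple linear regression: r^2 = SSReg/SSTot and the
    ANOVA F statistic with 1 and n - 2 degrees of freedom.
    F equals the square of the slope t statistic."""
    r2 = reg_ss / total_ss
    residual_ms = (total_ss - reg_ss) / (n - 2)
    f = reg_ss / residual_ms
    return r2, f

# Hypothetical ANOVA table entries
r2, f = simple_anova(80.0, 100.0, 12)
```

So sqrt(F) recovers |t|, matching the "F = t²" remark in the solutions.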

10. A two-sided test of the slope gives a t = 41. but not for women who work in large banks.79 (a) y = 106. (b) The ˆ moderately strong linear relationship is good. 10. For women who work in small banks. the P-value for the test of the hypothesis that the slope is 0 is 0. 0. where y is the predicted number of employees and x is the number of rooms.6) is 35. and the residual plot has a distinct funnel shape. and positive relationship between the number of students and the total yearly expenditures. the P-value for the test of the hypothesis that the slope is 0 is 0. We might use length of service to predict wages for women who work in small banks. but the outliers are not. (c) y = ˆ 101.8% of the variation in lean.589x. 10. The 95% conﬁdence interval for the slope is (−0. the least-squares regression line of lean on year explains 99. However. (d) H0 : β1 = 0. 1.75 It appears that regressing wages on length of service will do a better job of explaining wages for women who work in small banks than for women who work in large banks.774 in the original model to 129. The standard error of regression is s = 4. The s drops from 188. The least squares regression ˆ ˆ line is y = 0.Moore-212007 pbs November 27. and the regression standard error is s = 1. (e) (0. 10.891.530x.526 + 0.67 (a) Some questions to consider: Is this a national chain or an independent restaurant? What kind of food is prepared? Do the restaurants have similar stafﬁng and experience? (b) Answers will vary. so we cannot reject the null hypothesis. the normal probability plot does not look good.77 For women who work for large banks.73 The scatterplot shows a strong.3% in the model without the outliers.253.02 < P-value < 0. 10. where y is the predicted total yearly expenditure and x is the number of students.181. The new equation of the line is y = 72.6. This is a multiple of more than 8 times s and would be considered an extreme outlier from the pattern of the data from 1975 to 1987. 
(b) The R 2 drops from 60.70. .69 (a) The relationship looks positive. however. and moderately strong except for potential outliers for the two largest hotels (1388 and 1590 rooms each). linear.71 (a) Hotel 1 (1388 rooms) and Hotel 11 (1590 rooms) are the two outliers. which contains 0. r 2 = 99. so reject the null hypothesis.91066 meters. 10. linear.981 + 0. These results sound very promising. The assumptions for regression may not be appropriate here. Ha : β1 = 0 with t = 4.088.4%. (b) A plot of the data shows that these data closely follow a straight line.623 and a P-value close to 0. or 2.105.526 + 0. and 0.001.81 t = 2.287 and P-value from SPSS of 0. The 95% conﬁdence interval for the slope from the original model did not contain 0. and the P-value is now 0. There is not enough evidence to say that the slope is signiﬁcantly different from 0. There is strong evidence that there is a signiﬁcant linear relationship between the number of rooms and the number of employees working at hotels in Toronto. R 2 is very good at 98.6.8%.151 in the model ˆ without the outliers.04. which tells us that for the years 1975 to 1987.2134.16. ˆ 10.775).514x.284). 2007 9:50 S-52 SOLUTIONS TO ODD-NUMBERED EXERCISES 10. The new t test statistic is 1.5% in the original model to just 26. the absolute difference between the observed value in 1918 (coded value of 71) and the value predicted by the least-squares regression line (coded value 106.0002.
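The outlier-deletion answers above (R² and s changing sharply when the two large hotels are removed) rest on computing R² and the regression standard error from observed and fitted values. A minimal sketch with hypothetical values:

```python
def r2_and_s(ys, fitted):
    """R^2 = 1 - SSE/SSTot and the regression standard error
    s = sqrt(SSE / (n - 2)), from observed and fitted values."""
    n = len(ys)
    ybar = sum(ys) / n
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sstot = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sstot, (sse / (n - 2)) ** 0.5

# Hypothetical observed and fitted values; a perfect fit gives R^2 = 1, s = 0
r2, s = r2_and_s([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

Refitting with and without suspect points and comparing these two numbers is exactly the comparison the solutions describe.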

000. Depth: fairly normally distributed.1 11. the unrounded values are s = 2.538 on the original scale).5 Pred.50 + 0.3 11.4 M 90 11 14 2.21 13..55 + 0..808 · weight + 25.445 − 10.44958 and s 2 = 6.308 1. one low outlier.00) + 120(43. The name given to s in the output is S.643 (compared to 0.21 Saw 6 has the most negative residual of all.9 11. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-53 CHAPTER . and the coefﬁcient of assets is much smaller. 11. and Saws 2 and 11 both have negative residuals. .11 With General Motors and Wal-Mart deleted.533 on the original scale). 11. For SAS.238 log(assets) + 0.44958 and s 2 = 6. Weight plot looks good with possibly an outlier at Weight = 9.7 11. (b) The explanatory variables are the number of banks and deposits.84 11.416 s 43. The coefﬁcient of sales has more than doubled. This would require less in the form of assets and increases the amount of sales.. no outliers. The correlation between log(proﬁts) and log(assets) is 0. The name given to s in the output is Std.00045. Saw 8 has a positive residual. (b) Residuals vs. and the high outliers were eliminated from the other two plots. Error of the Estimate.744 0.674 · amps + 70.15 (a) Variable Price Weight Amps Depth x 91.00) + 150(38. The two high outliers are different than the high outliers in sales. the unrounded values are s = 2.030 · depth √ (b) 27. (c) p = 2.0553 sales. SPACE = 210 + 160(45. Residuals vs. ˆ 11.569 (compared to 0.. For SPSS.574. one low outlier.4 ft2 . (a) The response variable is bank assets.53 2.0765 Min 30 9 10 2. The name given to s in the output is Root MSE.Moore-212007 pbs November 27. no outliers.. Amps: very left-skewed.24) + 160(17. Residuals vs. (d) n = 54.. However. log(proﬁts) = −1. The linear association between log(assets) and log(sales) appears much stronger.00) + 65(315. we cannot say that . proﬁts = 1. 11.19 (a) A histogram shows that the residuals are fairly Normally distributed with no outliers.11 . For Minitab.000. 
Weight: fairly normally distributed.449581635 and s 2 = 6. 11.032 1. The name given to s in the output is Standard Error.533.455 on the original scale).17 (a) y = −303.00496 assets + 0.526 (compared to 0. Therefore.831 = 774.5 Max 150 13 15 2.2 Q1 50 11 12 2. The correlation between the explanatory variables log(assets) and log(sales) is 0.5 (b) and (c) Price: fairly normally distributed. Depth looks funnel shaped. The distribution of sales is skewed to the right with two high outliers. 11.4 Q3 140 12 15 2.478 log(sales). 11. the unrounded values are s = 2.000450185. Amps looks good with possibly an outlier at Amps = 10. The correlation between log(proﬁts) and log(sales) is 0.. It is not surprising that Wal-Mart has high sales relative to its assets because its primary business function is the distribution of products to ﬁnal users.450 and s 2 = 6.25) = 41... the unrounded values are s = 2.13 For Excel.
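The fitted-value answers in Chapter 11 (the SPACE calculation, and predictions such as ŷ = b0 + b1·x1 + ... + bk·xk) just evaluate the multiple regression equation at given x values. A sketch with made-up coefficients, not a fitted model from the book:

```python
def predict(intercept, coefs, xs):
    """Fitted value from a multiple regression equation:
    b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

# Hypothetical fitted equation: y-hat = 210 + 2*x1 + 3*x2
yhat = predict(210.0, [2.0, 3.0], [10.0, 20.0])
```

Residuals then follow as observed y minus this fitted value, which is how the large negative residual for Saw 6 would be found.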

All four packages report the same coefficient (0.03756), standard error (0.03432), t statistic (0.91), and P-value (0.362); they differ only in the precision and notation displayed (for example, 0.03755888 or 3.756E-02 for the coefficient, 0.034315568 for the standard error, 0.913647251 for t, and 0.361902429 for the P-value).

11.23 (a) Software gives the mean, standard deviation, and five-number summary for Share, Accounts, and Assets. (b) Use split stems. (c) All three distributions appear to be skewed to the right, with high outliers.

11.25 (a) Share = 5.85 + 0.00663 Accounts + 0.0478 Staff. (b) s = 5.8.

11.27 All summaries are smaller except the minimums. The largest effect is on the mean and standard deviation.

11.29 Share = 1.16 − 0.00031 Accounts + 0.0828 Assets.

11.31 (a) Stemplots show that all four distributions are right-skewed and that gross sales, cash items, and credit items have high outliers. The summary statistics for Gross, Cash, Credit, and Check show the same pattern. (b) There is a strong linear trend between gross sales and both cash and credit card items. The association is less strong with check items, but this is due to an outlier.

11.33 Plot the residuals versus the three explanatory variables and give a Normal quantile plot for the residuals. There are no obvious problems in the plots.

11.35 (a) The fitted equation for TBill98 includes the term 0.157 Assets, with s = 3.16.

11.37 HSS does not help much in predicting GPA when math and English grades are available for prediction.

11.57 (a) The hypotheses about the jth explanatory variable are H0: βj = 0 and Ha: βj ≠ 0. The degrees of freedom for the t statistics are 5650, so values of t less than −1.96 or greater than 1.96 lead to rejection of the null hypothesis. (b) The statistically significant explanatory variables are loan size, percent down payment, length of loan, and unsecured loan. (c) The interest rate is lower for larger loans, lower for a higher percent down payment, lower for longer length loans, and higher for an unsecured loan.

11.59 (a) y varies Normally with mean μGPA = β0 + 8β1 + 9β2 + 7β3 and standard deviation σ. (b) The GPA of students with a B+ in math, an A− in science, and a B in English has a Normal distribution whose estimated mean is obtained by substituting these grades into the fitted equation.

11.61 (a) yi = β0 + β2002x2002i + βyearsxyearsi + εi. (b) The parameters are β0, β2002, βyears, and σ. (c) The estimates b0, b2002, and byears come from software, along with s = 9035. (d) F = 20.177 with degrees of freedom 2 and 13; the P-value is close to 0, so we reject the null hypothesis and conclude that years in rank and 2002 salary contain information that can be used to predict 2005 salary. (e) R² = 75.6%. (f) There is evidence the coefficient for 2002 salary is significantly different from zero, but the coefficient for years in rank is not; both tests use df = 13. When included with 2002 salary, years in rank is not very useful in predicting 2005 salary. However, years in rank is useful when the same regression analysis is performed without the data on 2002 salary.

11.63 ŷ = 112.7 + 1351.8x. The slope is significantly different from zero (P-value = 0.000 from software).

11.65 From the stemplot, six of the potential outliers are the six most expensive homes. Three of these (the three most expensive) are clearly outliers for this location.

11.67 Substituting x = 1000 and x = 1500 square feet into the fitted equation gives the predicted prices for 1000- and 1500-square-foot homes.

11.69 For 1000-square-foot and 1500-square-foot homes, this model's predictions are close to those of Exercise 11.67; overall, the two sets of predictions are fairly similar.

11.71 The pooled t test statistic is t = 2.875 with df = 35 and 0.005 < P-value < 0.01, so we reject the null hypothesis. These results agree (up to roundoff) with the results for the coefficient of Bed3 in Example 11.

11.73 The trend is increasing and roughly linear as one goes from 0 to 2 garages and then levels off.

11.75 With an extra half bath, the predicted price is higher than without one; the difference in price is about $15,000.

11.77 Because we have no data on smaller homes with an extra half bath, our regression equation is probably not trustworthy.

11.79 (a) The relationship between μy and x is curved and increasing. (b) The relationship between μy and x is curved, first decreasing and then increasing. (c) The relationship between μy and x is curved and decreasing.
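The rejection rule in 11.57 — reject when t is outside ±1.96 — uses the Normal approximation, which is excellent at df = 5650. A sketch of the rule and the corresponding two-sided P-value (`two_sided_p` and `reject_at_5pct` are illustrative names, not software output):

```python
import math

def two_sided_p(t):
    """Two-sided P-value under the standard Normal approximation (fine for huge df)."""
    return math.erfc(abs(t) / math.sqrt(2))

def reject_at_5pct(t):
    """Reject H0: beta_j = 0 at the 5% level when |t| exceeds 1.96."""
    return abs(t) > 1.96
```

For example, the pooled t statistic of 2.875 from 11.71 is comfortably in the rejection region.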

11.81 For part (a), the difference in means (Group B − Group A) = 3 = the coefficient of x. This can be verified directly for the other parts and shown to be true in general.

11.83 For part (a), the difference in the intercepts (Group B − Group A) = 120 − 80 = 40 = the coefficient of x1, and the difference in the slopes (coefficient of x2 for Group B − coefficient of x2 for Group A) = 19 − 7 = 12 = the coefficient of x1x2. This can be verified directly for the other parts and shown to be true in general.

11.85 (a) Assets = 7.540 − 0.0046 Account + 0.000034 Account². (b) 0.000034 ± 0.000021. (c) The t statistic for the squared term is significant (P = 0.007); the quadratic term is useful for predicting assets in a model that already contains the linear term. (d) The variables Account and the square of Account are highly correlated.

11.87 (a) R1² ≈ 65% and R2² ≈ 62%. (b) The F statistic is small; software gives P = 0.393, so the difference is not significant.

11.89 (a) Price vs. Promotions: negative, moderate, fairly linear, no outliers. Price vs. Discount: negative, weak, linear, no outliers. (b) The table gives the mean and standard deviation of expected price for each combination of promotions (1, 3, 5, 7) and discount (10%, 20%, 30%, 40%); all means are near 4.2. (c) At every promotion level, the 10% discount yields the highest expected price, and for every discount level, 1 promotion yields the highest expected price. The residual plot for promotions looks OK, but the residual plot for discount seems to have a curved shape, so try the quadratic term for discount and the interaction of discount and promotion. If the quadratic term for discount is used, R² increases to about 61% and s decreases, and all coefficients are significantly different from 0. If the quadratic term is removed and the interaction of discount and promotion is used instead, the coefficient for the interaction term is not significant. The best model is the one with the quadratic term for discount: ŷ = 5.61 − 0.102 Promotion − 0.060 Discount + 0.001 Discount², and the P-value is still very close to 0.

11.91 In the previous exercise, the F test statistic is about 81, and t² agrees with F up to rounding error.

11.93 (a) Area: x̄ = 28.29, s = 17.714, right-skewed, 2 high outliers. Forest: x̄ = 39.39, s = 32.204, right-skewed, no outliers. IBI: x̄ = 65.94, s = 18.280, left-skewed, no outliers. (b) All relationships are linear and moderately weak. Area and Forest have a negative relationship, but the others are positive. Data point #40 looks like a potential outlier on the Area/Forest scatterplot, but not anyplace else.
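The indicator-variable identities in 11.81 and 11.83 can be checked numerically. This sketch hard-codes the Group A and Group B lines from 11.83 (intercepts 80 and 120, slopes 7 and 19) and verifies that a single model with an indicator x1 and interaction x1·x2 reproduces both lines:

```python
# Group-specific lines from 11.83: Group A: y = 80 + 7*x2, Group B: y = 120 + 19*x2.
def group_line(group, x2):
    return (80 + 7 * x2) if group == "A" else (120 + 19 * x2)

# One combined model with indicator x1 (0 = Group A, 1 = Group B):
#   40 = 120 - 80 is the intercept gap (coefficient of x1),
#   12 = 19 - 7 is the slope gap (coefficient of x1*x2).
def combined(x1, x2):
    return 80 + 40 * x1 + 7 * x2 + 12 * x1 * x2
```

Setting x1 = 0 or 1 collapses the combined model to the Group A or Group B line exactly, which is the general fact the solution asks you to verify.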

11.93 (c) yi = β0 + βAreaxAreai + βForestxForesti + εi. (d) H0: βArea = βForest = 0; Ha: the coefficients are not both 0. F ≈ 12 with P-value close to 0, so we reject H0. (e) R² = 35.7%; only 35.7% of the variation in IBI is explained by Area and Forest. The fitted model is ŷ = 40.629 + 0.569 Area + 0.234 Forest. (f) Residual plots look good — random and no outliers. (g) A histogram of the residuals looks slightly skewed left, but the Normal probability plot looks good. (h) Yes; answers will vary.

11.95 (a) The distributions of PCB, PCB52, PCB118, PCB138, and PCB180 are all skewed to the right, with several (5 to 7) high outliers each; software gives each variable's mean and standard deviation. (b) All the explanatory variables are significantly correlated with PCB, and all the variables are positively correlated with each other, although PCB52 is the most weakly correlated with PCB and with the other explanatory variables. Only the correlation between PCB52 and PCB180 is not significant (P-value = 0.478). (c) The fitted model is P̂CB = 0.63 + 14.442 PCB52 + 2.600 PCB118 + 4.054 PCB138 + 4.109 PCB180, with R² ≈ 99% and F = 2628 (P-value = 0.000). All coefficients are significant, the residuals look fairly Normally distributed, and the residual plots look random. This confirms the general linear relationships we saw between the variables.

11.97 (a) #50 and #65 are the two potential outliers; #50 (residual = −22.09) is badly overestimated by the model. (b) After removing these data points, all coefficients are significant again, and R² and s are very close to what they were in the original model. There are two new potential outliers, though, at #44 and #58, but ignoring data is not a good idea.

11.99 (a) β0 = 0 and β1 = β2 = β3 = 1, because TEQ = TEQPCB + TEQDIOXIN + TEQFURAN. (b) The error terms are all 0, so σ = 0. (c) The regression confirms what we did in (a) and (b).

11.101 (a) Results will vary with software. (b) All correlations are significant. (c) The table (base-10 logs) shows that after the log transformation the distributions of PCB138, PCB153, PCB180, PCB28, PCB52, PCB126, PCB118, PCB, and TEQ are fairly symmetric — at most 1 low and/or 1 or 2 high outliers each — in contrast to the strong right-skewness on the original scale.

11.103 Answers will vary. If you start with the full model and then drop one variable at a time, one good possibility is leaving out Log PCB126, because then all the coefficients are significant at the 5% level.

11.105 Answers will vary. The scatterplots show that both x1 and x2 have positive linear relationships with y and with each other. With both variables in the model, neither the coefficient for x1 nor the coefficient for x2 is significant; if just x1 is used, its coefficient is significant, and if just x2 is used, its coefficient is significant. Multiple regression is complicated. For the log PCB regressions, a good model with only 4 variables keeps Log PCB28, Log PCB52, Log PCB138, and Log PCB118 (coefficients 0.108, 0.101, 0.400, and 0.151), with R² = 97.8% and small s; a model with 3 variables keeps Log PCB28, Log PCB52, and Log PCB138 (coefficients 0.088, 0.103, and 0.821), with R² = 96.2%.

11.107 (a) R² is small: although the t statistic shows that the coefficient on the variable LOS is significantly different from zero, LOS does not explain more than 12% of the variation in wages, so this does not appear to be a strong linear model.

11.109 (a) Vitamin C = 46 − 6.05 Days. (b) It appears there is a significant linear relationship between vitamin C level and the number of days after baking (t ≈ −10, P-value close to 0), but the residuals show a systematic pattern: the values go from positive to negative to positive as the number of days increases. (c) Refitting with a squared term gives a quadratic model with coefficient 0.763 on Days²; the squared term is significant, so choose the model with the squared term. A scatterplot of residuals vs. days for the second model shows no systematic pattern.

11.111 Wages = 349 + 0.6 LOS.

11.113 Wages = 302.5 + 0.67 × LOS + 71.8 × Size.

11.115 (a) Corn yield ≈ −46 + 4.8 × Soybean yield. (b) It appears that the relationship may be slightly curved. (c) This isn't always the best approach either.

11.117 (a) Corn yield ≈ −607 + 3.3 × Year + 2.9 × Soybean yield; the t statistics have small P-values, so it makes sense to include year in the regression model. (b) H0: all coefficients are equal to 0; Ha: at least one coefficient is not equal to zero. F ≈ 233 with df = 3 and 36, indicating that the model provides significant prediction of corn yield. (c) It appears that there is an increasing and then decreasing effect when looking at the residuals plotted against year, so a squared term in year (coefficient ≈ −0.045 on Year²) is worth adding; it appears that the squared term is significant in the model. (d) R² without the squared term is 0.93, and with the squared term it is slightly higher. (e) The residuals all appear random when plotted against the explanatory variables.

11.119 For the linear model, the predicted yield for 2001 was 127.41; for the quadratic model, it was 136.14. Both 95% prediction intervals are wide (lower endpoints near 114 and 101, respectively). The linear model gave a closer prediction to the actual yield.
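Fits such as the wage and corn-yield equations above are ordinary least squares; for one predictor, the slope and intercept have the closed form b1 = Sxy/Sxx and b0 = ȳ − b1x̄. A minimal sketch (the data points are made up to lie exactly on y = 2 + 3x):

```python
def least_squares(xs, ys):
    """Simple linear regression: return (intercept b0, slope b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross-deviations
    sxx = sum((x - mx) ** 2 for x in xs)                    # x deviations squared
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])  # exact line y = 2 + 3x
```

Statistical software does the same computation (generalized to several predictors) and additionally reports s, t statistics, and R².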

11.121 The scatterplots do not show strong relationships. The plot of SATM vs. GPA shows two possible outliers on the left side; the plot of SATV vs. GPA does not show any obvious outliers.

11.123 The residuals do not show any obvious patterns or problems.

11.125 (a) GPA = 0.582 + 0.155 HSM + 0.050 HSS + 0.044 HSE + 0.00061 SATM + 0.00059 SATV. (b) H0: all coefficients are equal to 0; Ha: at least one coefficient is not equal to zero. This means that we assume the model is not a significant predictor of GPA and let the data provide evidence that the model is a significant predictor. F ≈ 27 with P-value close to 0, so this model provides significant prediction of GPA. (c) The 95% confidence interval for the HSM coefficient (upper endpoint 0.2565) does not contain zero, while the interval for the SATM coefficient (upper endpoint 0.001815) does contain zero. (d) HSM: t = 5.99 with P-value close to 0; SATM: t = 0.63 with a large P-value. Based on the large P-value associated with the SATM coefficient, one should conclude that SATM does not provide significant prediction of GPA. (e) s = 0.703. (f) R² = 0.194.

11.127 Looking at the sample of males only shows the same results when compared to the sample of all students: R² = 0.184 (compared to 0.20), and while HSM is a significant predictor, the t statistics for the other coefficients have P-values greater than 0.10.

11.129 (a) Adding gender and the gender interactions gives GPA = 0.666 + 0.193 HSM + ... + 0.067 Gender + 0.05 GHSM − 0.05 GHSS − 0.012 GHSE. (b) The F statistic for the gender-related terms is small with a large P-value. This indicates there is no reason to include gender and the interactions.

CHAPTER 12

12.1 A flowchart for making coffee might look like: Measure coffee → Grind coffee → Add coffee and water to coffeemaker → Brew coffee → Pour coffee into mug and add milk and sugar if desired. In making a cause-and-effect diagram, consider what factors might affect the final cup of coffee at each stage of the flowchart.

12.3 In Exercise 12.1 we described the process of making a good cup of coffee. Some sources of common cause variation are variation in how long the coffee has been stored and the conditions under which it has been stored, variation in how finely ground the coffee is, variation in the measured amount of coffee used, variation in the amount of water added to the coffeemaker, variation in the length of time the coffee sits between when it has finished brewing and when it is drunk, and variation in the amount of milk and/or sugar added. Some special causes that might at times drive the process out of control are a bad batch of coffee beans, a malfunction of the coffeemaker or a power outage, a serious mismeasurement of the amount of coffee used or the amount of water used, interruptions that result in the coffee sitting a long time before it is drunk, and the use of milk that has gone bad.

12.5 (a) CL = 11.5, UCL = 11.8, LCL = 11.2.

12.7 (b) In the chart for data set A, samples 12 and 20 fall above the UCL. In the chart for data set B, no samples are outside the control limits. In the chart for data set C, samples 19 and 20 are above the UCL. (c) Data set B comes from a process that is in control. Data set A comes from a process in which the mean shifted suddenly, and data set C comes from a process in which the mean drifted gradually upward.

12.9 A Pareto chart indicates that the hospital ought to study DRGs 209 and 116 first in attempting to reduce its losses. The 9 DRGs account for 80.5% of total losses.

12.11 5:45 A.M. alarm rings → 5:45 A.M. get out of bed → 5:46 A.M. start coffeemaker → 5:47 A.M. shower, shave, and dress → 6:15 A.M. breakfast → 6:30 A.M. brush teeth → 6:35 A.M. leave home → 6:50 A.M. park at school → 6:55 A.M. arrive at office.

12.13 My main reasons for late arrivals at work are "don't get out of bed when the alarm rings" (responsible for about 40% of late arrivals), "too long eating breakfast" (responsible for about 25% of late arrivals), "too long showering, shaving, and dressing" (responsible for about 20% of late arrivals), and "slow traffic on the way to work" (responsible for about 15% of late arrivals).

12.15 For the s chart, UCL = 0.87662, CL = 0.46065, and LCL = 0.

12.17 First sample: x̄ ≈ 48, s ≈ 8. Second sample: x̄ ≈ 46, s ≈ 13.

12.19 We want to compute the probability that x̄ will be beyond the control limits, that is, that x̄ will be either larger than UCL = 713 or smaller than LCL = 687. The desired probability is about 0.0024.

12.21 (a) The 2σ control limits are UCL = μ + 2(σ/√n) and LCL = μ − 2(σ/√n). (b) For an s chart, the 2σ control limits are UCL = (c4 + 2c5)σ and LCL = (c4 − 2c5)σ.

12.23 Presumably, a given sample is equally likely to be from the experienced clerk or the inexperienced clerk, and which clerk a sample comes from will be random. Thus, the x̄ chart should display two types of points: those with relatively small values (corresponding to the experienced clerk) and those with relatively large values (corresponding to the inexperienced clerk). Both types should occur about equally often in the chart, and the pattern of large and small values should appear random.

12.25 (a) μ = 275 and σ ≈ 37. (b) 99.06% of monitors will meet the new specifications if the process is centered at 250 mV.

12.27 By practicing statistical process control, the manufacturer is, in essence, inspecting samples of the monitors it produces and fixing any problems that arise. The control charts the manufacturer creates are a record of this inspection process. Incoming inspection is thus redundant and no longer necessary.

12.29 The plot shows no serious departures from Normality. For Normal data the natural tolerances are trustworthy, so based on our Normal quantile plot, the natural tolerances we found in the previous exercise are trustworthy.

12.31 (a) For the x̄ chart, UCL ≈ 60, CL ≈ 43.4, and LCL ≈ 25; for the s chart, UCL ≈ 25 with LCL = 0. (b) In Figure 12.7 we see that most of the points lie below 40 (and more than half of those below 40 lie well below 40). The s chart suggests that typical values of s are below 40; of the points above 40, all but one (Sample 12) are only slightly larger than 40. The s chart shows a lack of control at sample point 5 (the 8:15 sample on 3/8), but otherwise neither chart shows a lack of control. We would want to find out what happened at sample 5 to cause the lack of control in the s chart.
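The x̄ chart limits used throughout these exercises are μ ± kσ/√n, with k = 3 for the usual chart and k = 2 for the 2σ limits of 12.21. A sketch with hypothetical values (the function name and inputs are illustrative):

```python
import math

def xbar_limits(mu, sigma, n, k=3.0):
    """Control limits for an x-bar chart: mu +/- k * sigma / sqrt(n)."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

# Hypothetical process: mu = 10, sigma = 2, samples of size n = 4.
lcl, ucl = xbar_limits(10, 2, 4)        # 3-sigma limits
lcl2, ucl2 = xbar_limits(10, 2, 4, k=2)  # 2-sigma limits, as in 12.21
```

Tightening k from 3 to 2 narrows the limits, which catches shifts sooner at the cost of more false alarms.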

12.49 Cp is referred to as the potential capability index because it measures process variability against the standard provided by external specifications for the output of the process. It estimates what the process is capable of producing if the process could be centered, that is, if the process target were centered between the specification limits. Cpk is referred to as the actual capability index because it considers both the center and the variability against the standard provided by the external specifications.

12.51 (a) μ̂ = 43.41 and σ̂ ≈ 12.4. (b) Both Cp and Cpk are small. The capability is poor: the process is not centered (we estimate the process mean to be 43.41, but the midpoint of the specification limits is 54), and the process variability is large (we saw in Exercise 12.33 that the natural tolerances for the process are 43.41 ± 37.2, a much wider range than the specification limits of 54 ± 10).

12.53 (a) Only 17.78% of clip openings will meet specifications if the process remains in its current state.

12.55 Cp = (USL − LSL)/6σ. If the process is properly centered, then its mean μ will be halfway between USL and LSL; if Cp ≥ 2, then USL − LSL ≥ 12σ, and hence USL and LSL will be at least 6σ from μ. This is called six-sigma quality.

12.57 If the same representatives work a given shift, and calls can be expected to arrive at random during a given shift, then the process mean and standard deviation should be stable over the entire shift, and random selection should lead to sensible estimates of them; thus it would be more reasonable to time 6 consecutive calls. If different representatives work during a shift, or if the rate of calls varies over a shift, then the process may not be stable over the entire shift — there may be changes in either the process mean or the process standard deviation — and stability would more likely be seen only over much shorter time periods. In this case, it might make sense to choose calls at random from all calls received during the shift. A random sample from all calls received during the shift would, however, overestimate the short-term process variability because it would include variability due to special causes.

12.59 The three outliers are a response time of 276 seconds (occurring in Sample 28, resulting in a value of s ≈ 93), a response time of 244 seconds (Sample 42, s ≈ 107), and a response time of 333 seconds (Sample 46, s ≈ 119). When the outliers are omitted, the values of s for these three samples drop dramatically.

12.61 (a) p̂ is the total number of unpaid invoices divided by the total number of opportunities; the monthly samples contain about n = 921 invoices each, giving p̂ ≈ 0.006. (b) CL = p̂, UCL ≈ 0.013, and the computed LCL is negative and so is set to 0.

12.63 The chart uses CL = p̄ with UCL and LCL = p̄ ± 3√(p̄(1 − p̄)/n).

12.65 (a) p̄ = 0.356, so CL = 0.356. (b) UCL = 0.4043 and LCL = 0.3087. (c) For October, UCL = 0.4027 and LCL = 0.3093; for June, UCL = 0.4033 and LCL = 0.3077. These exact limits do not affect our conclusions in this case. The process (proportion of students per month with 3 or more unexcused absences) appears to be in control; there are no months with unusually high or low proportions of absences.

12.67 (a) p̂ = 0.008, and if the manufacturer processes 500 orders per month, we would expect 4 defective orders per month. (b) For the p chart, CL = 0.008, UCL = 0.020, and the computed LCL is negative and so is set to 0. To be above the UCL, the number of defective orders in a month must be greater than 10.
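The capability indexes of 12.49–12.55 follow directly from their definitions: Cp ignores centering, while Cpk penalizes an off-center mean. A sketch (the specification and process numbers in the test are hypothetical, not from the exercises):

```python
def cp(usl, lsl, sigma):
    """Potential capability: spec width relative to 6 sigma of process spread."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Actual capability: distance from the mean to the NEARER spec limit, over 3 sigma."""
    return min(usl - mu, mu - lsl) / (3 * sigma)
```

When the process is centered (μ at the midpoint of the limits), Cpk equals Cp; as the mean drifts toward a limit, Cpk falls below Cp, which is exactly the situation diagnosed in 12.51.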

12.69 (a) The percents add up to more than 100% because customers can have more than one complaint. (b) The category with the largest number of complaints is the ease of obtaining invoice adjustments/credits, so we might target this area for improvement.

12.71 (a) Presumably we are interested in monitoring and controlling the amount of time the system is unavailable. We might measure the time the system is not available in sample time periods and use an x̄ and an s chart to monitor how this length of time varies. (b) To monitor the time to respond to requests for help, we might measure the response times in a sample of time periods and use an x̄ and an s chart to monitor how the response times vary. (c) We might examine samples of programming changes and record the proportion in the sample that are not properly documented. Because we are monitoring a proportion, we would use a p chart.

12.73 CL = 7.65 with UCL ≈ 19; the computed LCL is negative and so is set to 0. Attempts 2, 4, and 10 are above the UCL.

12.75 (a) Cp ≈ 1. This value tells us that the specification limits will lie just within 3 standard deviations of the process mean if the process mean is in the center of the specification limits. (b) Cpk depends on the value of the process mean; if the process mean can easily be adjusted, then it is easy to change the value of Cpk. A better measure of the process capability is to center the process mean within the specification limits and then compute a capability index. (c) We used s/c4 to estimate σ, the process variation. Using s/c4 is likely to give a slightly too small estimate of the process variation and hence a slightly too large (optimistic) estimate of Cp. A better estimate would be to compute the sample standard deviation s of all 22 × 3 = 66 observations in the samples, since this captures all the variation in the output of the process (including sample-to-sample variation); using that estimate in Cp is ultimately more informative about the process capability.

12.77 (a) We would use a p chart. (b) If the proportion of unsatisfactory films is 0.003, then in a sample of 100 films the expected number of unsatisfactory films is 0.3. Most of our samples will have no unsatisfactory films, and plotting the sample values (most of which will be 0) will not be very informative.

12.79 (a) With p̄ = 0.506, the control limits would be UCL = 0.718 and LCL = 0.294. (b) The new sample proportions are not in control according to the original control limits from part (a). The first sample is out of control, perhaps reflecting an initial lack of skill or initial problems in using the new system; after the first sample, the process does appear to be in control, although sample 10 is very close to the upper control limit. With the new p̂, the limits for future samples should be UCL = 0.896 and LCL = 0.642.

CHAPTER 13

13.1 (a) Each year, sales are lowest for the first two quarters and then increase in the third and fourth quarters. Sales decrease from the fourth quarter of one year to the first quarter of the next. (b) The pattern described is obvious in the time plot.
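The p chart limits used in 12.61–12.79 are p̄ ± 3√(p̄(1 − p̄)/n), with a negative lower limit set to 0. A sketch:

```python
import math

def p_chart_limits(pbar, n):
    """(LCL, UCL) for a p chart; a negative LCL is clamped to 0."""
    half_width = 3 * math.sqrt(pbar * (1 - pbar) / n)
    return max(0.0, pbar - half_width), pbar + half_width
```

With a tiny defect rate such as the p̄ = 0.008 of 12.67, the computed lower limit is negative, so LCL = 0 — which is why only points above the UCL can signal trouble there.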

(c) There is usually a peak in January, a drop until April, and then an increase until the next January each year. (d) The pattern described in part (a) is repeated year after year.

13.3 (a) Sales = 5903.76 + 99.75x, with sales in millions of dollars, where x takes on values 1, 2, . . . . (b) x = 1 corresponds to the fourth quarter of 1995, so the first quarter of 2002 corresponds to x = 26. (c) The slope is the increase in sales (in millions of dollars) that occurs from one quarter to the next.

13.5 (b) There is no obvious increase or decrease overall, but there is lots of month-to-month variability. (c) The biggest dips come in August each year, and then there is another major (but not quite as big) dip in December each year; there are smaller dips in June each year. In August 2002 there is a dip, but it's just not as dramatic as for all the other years. The June 2002 dip was bigger than all the other June dips, so that might help explain why the August dip isn't as big as expected.

13.7 (a) The seasonally adjusted series is a little smoother. (b) Seasonally adjusting the DVD player sales data smoothed the time series a little, but not to the degree that seasonally adjusting the sales data in Figure 13.7 did. This suggests that the seasonal pattern in the DVD player sales data is not as strong as it is in the monthly retail sales data. (c) The plot mimics the pattern of seasonal variation in the original series.

13.9 (a) Sales = 7858.22 + 118.54x − 2274.21X1 − 2564.58X2 − 2022.79X3. (b) If we know that X1 = X2 = X3 = 0, then we know that we are not in any of the first three quarters, and hence must be in the fourth quarter. (c) The intercept again represents the fourth quarter of 1995.

13.11 (b) There is a positive, fairly linear trend over time, although the trend levels off after 1998. (d) There are two temporary troughs: a big one from January 1991 to November 1994 and a smaller one from January 1998 to January 2003. There could also be considered a temporary peak at the very end of the data set, where there is a steep increase.

13.13 (a) The seasonality factors are 0.923, 0.885, 0.960, and 1.231 for quarters 1 through 4. The fourth-quarter seasonality factor of 1.231 tells us that fourth-quarter sales are typically 23.1% above the average for all four quarters. (b) The average of the factors is 0.99975, which is close to 1.

13.15 (a) The dashed line in the time plot corresponds to the least-squares line. (b) Using the trend-only model, we find that the sales for the first three quarters tend to be overpredicted and those for the fourth quarter are underpredicted. (c) In the past, predictions in the first three quarters tended to be slightly more accurate than in the fourth quarter.

13.17 (a) From the trend-and-season model, the forecast for the first quarter of 2002 is $8188 million and for the fourth quarter of 2002 is $11,359 million. (b) From the trend-only model, the forecasts are $8871 million for the first quarter and $9228 million for the fourth quarter. The first-quarter forecast has been multiplied by 0.923 (the trend-only model typically overpredicts the first quarter) and the fourth-quarter forecast by 1.231 (it typically underpredicts the fourth quarter), giving $8188 million and $11,360 million. (c) The trend-and-season model and the trend-only model with seasonality factors give similar predictions, as both are adjusting the trend model for the seasonality effects.

13.19 (a) For the trend-only model R² = 35%, and for the trend-and-season model R² = 86.8%. The trend-and-season model explains much more of the variability in the JCPenney sales. (b) For the trend-only model s ≈ 1170, and for the trend-and-season model s ≈ 566. (c) The trend-and-season model closely follows the original series. (d) It is clear from the plot that the trend-and-season model is a substantial improvement over the trend-only model.
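The 13.17 forecasts simply multiply the trend-only forecast by the quarter's seasonality factor. A sketch — the Q1 (0.923) and Q4 (1.231) factors are stated in the text, while the assignment of 0.885 and 0.960 to Q2 and Q3 is an inference from the factor list:

```python
# Seasonality factors by quarter; Q2/Q3 ordering is an assumption, not stated in the text.
FACTORS = {1: 0.923, 2: 0.885, 3: 0.960, 4: 1.231}

def season_adjusted_forecast(trend_forecast, quarter):
    """Scale a trend-only forecast by the quarter's seasonality factor."""
    return trend_forecast * FACTORS[quarter]
```

For example, scaling the trend-only first-quarter 2002 forecast of $8871 million by 0.923 reproduces the ≈$8188 million trend-and-season figure; the factors average to essentially 1, as 13.13(b) requires.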

13.19 (b) For the trend-only model s = 1170.…, and for the trend-and-season model s = 566.…. (c) The trend-and-season model closely follows the original series. (d) It is clear from the plot that the trend-and-season model is a substantial improvement over the trend-only model.

13.21 (a) There are two very large temporary troughs (approximately) between July 1990 and July 1995 and then between April 1997 and November 2003. At the end of the data, there is a sharp rise. (b) Autocorrelation is not apparent in the lagged residual plot. The correlation between successive residuals et and et−1 is only 0.….

13.23 (a) The residuals for the fourth quarters are positive, and most of the residuals for the first three quarters are negative. (b) A closer look at the time plot suggests a positive autocorrelation. The majority of pairs of successive residuals shows the first residual lower than the second residual in the pair. (c) The lagged residual plot shows a beautiful, positive, linear relationship between et−1 and et. Yes, we have strong evidence of autocorrelation.

13.25 (a) The other group of 10 points has December as the y coordinate. The outlying groups of points have the December sales as either the x or y coordinate. (b) The correlation of 0.9206 suggests a strong autocorrelation in much of the time series. The correlation between et−1 and et is 0.…; the December values are what is reducing the correlation from 0.9206 to 0.…. If there were a seasonal adjustment, the correlation would be closer to 0.9206. (c) If we looked at the seasonally adjusted time series, these two sets of 10 points should no longer stand out from the remaining points.

13.27 (a) Fitting the simple linear regression model using yt as the response variable and yt−1 as the predictor gives the equation ŷt = 34.9 + 0.996yt−1. The sales in July 2001 are 5170 thousands of units, so the forecast of August 2001 sales using this model is 5163.…. (b) Fitting the AR(1) model gives the equation ŷt = 13.… + 0.992yt−1. The constant for the AR(1) model is much smaller than the constant (intercept) for the simple linear regression model. The forecast of August 2001 sales using this model is 5162.…; the August 2001 estimates are very close. (c) The coefficients of yt−1 are very similar in parts (a) and (b), but the constants differ. The AR(1) model is preferred because it was estimated using maximum likelihood, which is preferred over least-squares for time series models.

13.29 (a) The 12-month moving average forecast predicts the November 2002 price by the average of the preceding 12 months. Averaging the last 12 values in the series gives the 12-month moving average forecast as \$3.…. The 120-month moving average forecast predicts the November 2002 price by the average of the preceding 120 months. Averaging the last 120 values in the series gives the 120-month moving average forecast as \$3.…. (b) Going to the Web page gives the actual winter wheat price received by Montana farmers for November 2002 as \$4.…. The 120-month moving average forecast is slightly better, but neither captures the sharp rise in price that occurred over the latter part of 2002.
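The k-month moving-average forecast used in 13.29 — predict the next value by the mean of the preceding k observations — is a one-liner. The price series below is illustrative, not the Montana wheat data:

```python
def moving_average_forecast(series, k):
    """Forecast the next value as the mean of the last k observations."""
    if len(series) < k:
        raise ValueError("need at least k observations")
    window = series[-k:]
    return sum(window) / k

# Made-up monthly prices; a 3-month moving average forecast.
prices = [3.2, 3.4, 3.9, 4.1, 4.5]
print(moving_average_forecast(prices, 3))  # (3.9 + 4.1 + 4.5) / 3
```

A long span (like the 120-month average) smooths out recent swings, which is exactly why neither span can capture a sharp late rise in the series.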

13.31 (a) When w = 0.1 the coefficients are 0.1, 0.09, 0.081, 0.0729, 0.06561, 0.059049, 0.0531441, 0.0478297, 0.0430467, 0.0387420, and 0.348678. (b) When w = 0.5 the coefficients are 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0039062, 0.0019531, 0.0009766, and 0.0009766. (c) When w = 0.9 the coefficients are 0.9, 0.09, 0.009, 0.0009, 0.00009, 0.000009, 0.0000009, 0.0000001, 0.0000000, 0.0000000, and 0.0000000. Note that, as indicated in the text, the values of the coefficients in the forecast model decrease exponentially in value, with the exception of the last coefficient. (d) The value of w = 0.9 puts more weight on the most recent value of the time series in the computation of a forecast, so w = 0.9 would be best for forecasting the monthly ups and downs in orange prices. (e) The curves for w = 0.9 and w = 0.5 show a more rapid exponential decrease than the curve for w = 0.1, which decreases more slowly. (f) The coefficients of y1 are 0.348678, 0.0009766, and 0.0000000 for the values w = 0.1, 0.5, and 0.9, respectively; the model with w = 0.1 puts the greatest weight on y1 in the calculation of a forecast.

13.33 (a) ŷt = −0.…. (b) ŷt = −2.…. (c) The slope is very similar for both equations; however, the y-intercept is not the same.

13.35 (a) There is a moderate positive linear relationship. The correlation is large and the relationship looks strong. Thus, the relationship is strong enough to make production 12 months ago a good predictor of this month's production. (b) ŷ = 181.… + …x, R² = 0.692.

13.37 (a) Fitting the simple linear regression model using yt as the response variable and yt−1 as the predictor gives ŷt = 100.9 + 0.568yt−1; R² = 0.327 and the regression standard error is 47.…. (b) An AR(1) model was fitted, and the estimated autoregression equation is ŷt = 100.4 + 0.572yt−1; R² = 0.327 and the model standard error is 47.…. The estimates are fairly close. (c) The equations obtained for least-squares and maximum likelihood are almost identical, so there is no clear indication which fitting method is preferred, although in general maximum likelihood is preferred to least-squares in time series models; it is better to use the AR(1) model. The two fits give ŷJuly2005 = 704.… and ŷJuly2005 = 703.…; the forecasts are fairly similar. (d) There is very little difference between the values of R² and the model standard error s in this example.

13.39 (a) The moving averages do appear to be very linear. (c) The moving average line and predicted value line completely overlap until about 1996. From 1996 on until the end of the data, there is a slight spiraling of the moving averages around the predicted value line, but the lines are still very close.

13.41 (b) The smoothness decreases as the value of w increases. (c) The Minitab output for an exponential smoothing model provides forecasts for each value of the smoothing constant. These were then used to forecast the orange price for February 2001. The results are summarized below. The model with w = 0.5 provided the forecast that was closest to the actual value.

Smoothing constant w    Prediction
0.1                     218.998
0.5                     219.976
0.9                     216.329

(d) The January 2001 data point was added to the series and the models with the three weighting constants were fit to the new series. The results are summarized on the next page.
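The coefficient lists in 13.31 follow directly from the smoothing recursion: the weight on the value j periods back is w(1 − w)^j, and the leftover weight (1 − w)^(n−1) falls on the initial value y1, which is why the last coefficient breaks the exponential pattern. A minimal sketch:

```python
def exp_smooth_forecast(series, w):
    """Exponentially smoothed forecast, started at y_hat = y(1):
    y_hat(t+1) = w*y(t) + (1 - w)*y_hat(t)."""
    yhat = series[0]
    for y in series[1:]:
        yhat = w * y + (1 - w) * yhat
    return yhat

def smoothing_weights(w, n):
    """Weights on y(n), y(n-1), ..., y(2), then y(1), implied by the
    recursion: w, w(1-w), w(1-w)^2, ..., with (1-w)^(n-1) on y(1)."""
    weights = [w * (1 - w) ** j for j in range(n - 1)]
    weights.append((1 - w) ** (n - 1))
    return weights

# With w = 0.5 and 5 observations the weights are
# 0.5, 0.25, 0.125, 0.0625, and 0.0625 (the last one on y1).
print(smoothing_weights(0.5, 5))
```

The weights always sum to 1, so a constant series is forecast exactly; larger w concentrates weight on the most recent values, matching the discussion in part (d).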

Smoothing constant w    Prediction
0.1                     219.518
0.5                     221.764
0.9                     223.478

The actual value in February was 229.…, so in this case the model with w = 0.9 provided the forecast that was closest to the actual value.

13.43 (a) If we use a span of k = 1, the moving average forecast equation would be ŷt = yt−1. (d) We used k = 4 and k = 100. The moving averages with a span of k = 4 are not particularly smooth. The moving averages with a span of k = 100 are quite smooth.

13.45 (a) If we use an exponential smoothing forecast equation with smoothing constant of w = 1, the forecast equation is ŷt = yt−1. (d) We used w = 0.2 and w = 0.9. The exponential smoothing model with w = 0.2 is smoother than the exponential smoothing model with w = 0.9.

13.47 (a) There are no clear seasonal patterns in the time series plot. (b) We fitted a line to the data using statistical software and obtained ŷ = −487.… + …(year); R² = 0.…, s = 5048.…. (c) We fitted a second-degree polynomial to the data using statistical software and obtained ŷ = 20.…; R² = 0.991149, s = 4091.…. (d) We fitted a third-degree polynomial to the data using statistical software and obtained ŷ = −1.…; R² = 0.993532, s = 3684.…. (e) Both the quadratic and cubic models appear to fit appreciably better than the straight-line model. The cubic model fits a little better (a bit larger R² and a bit smaller s) than the quadratic model and might be a slightly better choice for the trend equation. Using the model from part (b), we find the forecast for the July 24, 2002, exchange rate to be 0.…. Using the model from part (c), we find the forecast for the July 24, 2002, exchange rate to be 0.…. The actual exchange rate on July 24, 2002 (from the web site www.oanda.com/convert/fxhistory), is 1.…. Both forecasts underestimate the actual exchange rate.

13.49 (a) To fit the trend-and-season model, we must first define 11 indicator variables, as in Example 13.…. We let Jan. = 1 if the month is January, 0 otherwise; Feb. = 1 if the month is February, 0 otherwise; and so on through November. Adding an indicator variable for December would be redundant. Using software we obtained the estimated trend-and-season model ln(UNITS) = 10.… with Case coefficient 0.069 and monthly coefficients 0.761 (Jan.), 0.786 (Feb.), 0.556 (Mar.), 0.544 (Apr.), 0.632 (May), 0.371 (Jun.), 0.525 (Jul.), 0.533 (Aug.), 0.041 (Sep.), 0.057 (Oct.), and 0.225 (Nov.). (b) If Jan. = Feb. = … = Nov. = 0, then we know the month must be December; we can determine whether the month is December from the other indicator variables. (c) To use the equation in part (a) to forecast for July 2002, we notice that this would correspond to case 64. We set all indicators equal to 0 except the indicator for July, which we set to 1. We get ln(UNITS) = 14.582, so we forecast sales for July 2002 to be UNITS = e^14.582 = 2,…. (d) The forecast using the seasonality factors in a trend-and-season model from Example 13.… is UNITS = 2,…. The forecast in part (c) is slightly larger.
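The indicator-variable setup in 13.49 — eleven 0/1 variables for January through November, with December encoded by "all zeros" — can be sketched directly. The month abbreviations below mirror the solution's naming:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov"]  # December is the omitted baseline

def month_indicators(month):
    """Return the 11 indicator variables for a month name.

    A December observation sets every indicator to 0, which is why a
    twelfth indicator would be redundant.
    """
    return [1 if month == m else 0 for m in MONTHS]

print(month_indicators("Jul"))  # 1 in the July slot, 0 elsewhere
print(month_indicators("Dec"))  # all zeros -> December
```

Forecasting July 2002 then amounts to plugging in the case number and this indicator row, exactly as part (c) describes.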

13.51 We tried spans of 12, 24, and 36. The plot with span k = 36 does the best job of smoothing the minor ups and downs while still capturing the major jump about two-thirds of the way through the data. An even larger value of k might also work well. Using the moving average model with span k = 36, Minitab forecasts the June 2002 average price per pound of coffee as 3.124.

13.53 The AR(1) model does not smooth the minor ups and downs in the data, although it does capture the major jump in the series that occurs two-thirds of the way through the values. Using the AR(1) model, Minitab forecasts the June 2002 average price per pound of coffee as 3.036. (e) The forecast using the AR(1) model from Example 13.… is 1.…; the forecast in part (c) is considerably larger.

CHAPTER 14

14.1 I = 3; n1 = 20, n2 = 20, and n3 = 20; N = 60; x̄1 = 150, x̄2 = 175, and x̄3 = 200.

14.3 The ANOVA model is xij = μi + εij. The εij are assumed to be from an N(0, σ) distribution, where σ is the common standard deviation. The parameters of the model are the I population means μ1, μ2, and μ3, and the common standard deviation σ.

14.5 Stemplots for the three groups are given below. The distribution for the Basal group appears to be centered at a slightly larger score than that for the DRTA group and perhaps for the Strat group. The distribution of the DRTA scores shows some right-skewness. There are no clear outliers in any of the groups.

Basal        DRTA          Strat
0 | 4        0 |           0 | 445
0 | 67       0 | 6777      0 | 666777
0 | 888999   0 | 888889999 0 | 889
1 | 01       1 | 000       1 | 111
1 | 22222223 1 | 2233      1 | 22333
1 | 45       1 | 5         1 | 44
1 | 6        1 | 6         1 |

14.7 (a) The largest standard deviation is 120 and the smallest is 80. The ratio of these is 120/80 = 1.5. This ratio is less than 2, so it is reasonable to pool the standard deviations for these data. We estimate σ by sp = 101.….

14.9 SSG + SSE = 20.58 + 572.45 = 593.03 = SST. DFG + DFE = 2 + 63 = 65 = DFT.

14.11 MST = 9.124. Using statistical software (we used Minitab) we find the mean of all 66 observations in Table 14.1 to be 9.788 and the variance of all 66 observations in Table 14.1 to be 9.124.

14.13 (a) The ANOVA F statistic has 3 numerator degrees of freedom and 20 denominator degrees of freedom. (b) The F statistic would need to be larger than 3.10 to have a P-value less than 0.05.
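The pooling check used in 14.7 — pool only when the largest sample standard deviation is less than twice the smallest — is the textbook's rule of thumb and is trivial to encode:

```python
def ok_to_pool(std_devs):
    """Textbook rule of thumb: pooling the group standard deviations is
    reasonable when the largest is less than twice the smallest."""
    return max(std_devs) < 2 * min(std_devs)

print(ok_to_pool([120, 80, 100]))  # ratio 120/80 = 1.5 < 2 -> True
print(ok_to_pool([25, 10]))        # ratio 25/10 = 2.5 -> False
```

This is the same check applied repeatedly in the later exercises (e.g., the intensity/recall/frequency data), where one group can fail the rule while the others pass.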

14.15 (a) It is always true that sp = √MSE. In Excel, MSE is found in the ANOVA table in the row labeled "Within Groups" and under the column labeled MS. Take the square root of this entry (0.616003 in this case) to get sp. (b) sp = √MSE. SAS calls sp "Root MSE." (c) The pooled estimate sp is an estimate for σ, which is a parameter.

14.17 Try the applet. You should find that the F statistic increases and the P-value decreases if the pooled standard error is kept fixed and the variation among the group means increases.

14.19 The standard error for the contrast is SEc = 2.….

14.21 The 95% confidence interval is 15 ± 3.29 = (11.71, 18.29).

14.23 x̄2 − x̄3 = 46.7273 − 44.2727 = 2.4545, and the standard error for this difference is 1.29…, so t23 = 2.4545/1.29… = 1.90….

14.25 The value of t∗∗ for the Bonferroni procedure when α = 0.05 is t∗∗ = 2.29. Because |t23| < t∗∗, using the Bonferroni procedure we would not reject the null hypothesis; we cannot conclude that the population means for groups 2 and 3 are different.

14.27 From the output in Figure 14.…, Group 1 differs from Groups 2, 3, and 5, and Group 4 differs from Groups 2 and 5.

14.29
Group     Mean     SD     n
Group 1   150.2    19.1   30
Group 2   121.…    18.…   30
Group 3   117.…    18.…   30
Group 4   140.…    20.…   30
Group 5   129.…    22.…   30

We see that Groups 1 and 4 have the largest means.

14.31 The Bonferroni 95% confidence interval is (−2.…, ….). This interval includes 0. Because of this, the difference is not significant.

14.33 (a)
n     F∗      DFG   DFE   λ    Power
10    2.866   3     36    …    0.10243
20    2.725   3     76    …    0.16988
30    2.683   3     116   …    0.24317
40    2.663   3     156   …    0.31887
50    2.651   3     196   …    0.39427
100   2.627   3     396   …    0.70883

(b) As sample size increases, power increases. (c) Of the sample sizes selected in the table above, 100 would be best since the power is fairly low at smaller sample sizes.

14.35 (a) The response variable needs to be quantitative, not categorical. (b) You do want to use one-way ANOVA when there are at least three means to be compared.
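The identity in 14.15 — sp = √MSE — is the same as pooling the group variances weighted by their degrees of freedom. A minimal sketch (illustrative group sizes and standard deviations, not the exercise's output):

```python
import math

def pooled_sd(sizes, sds):
    """Pooled estimate of the common sigma:
    sp^2 = sum((n_i - 1) * s_i^2) / sum(n_i - 1), i.e. sp = sqrt(MSE)."""
    num = sum((n - 1) * s * s for n, s in zip(sizes, sds))
    den = sum(n - 1 for n in sizes)
    return math.sqrt(num / den)

# With equal group sizes, sp^2 is simply the average of the variances:
# (3^2 + 4^2) / 2 = 12.5, so sp = sqrt(12.5).
print(pooled_sd([30, 30], [3.0, 4.0]))
```

Taking the square root of the "Within Groups" MS entry in an Excel ANOVA table, as the solution describes, computes exactly this quantity.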

14.35 (c) You can never conclude from an F test that all the means are different. The alternative hypothesis is that "not all the means are the same."

14.37 (a) F(3, …). (b) The P-value would be between 0.025 and 0.050.

14.39 (a) F(4, …). (b) 48.…. (c) 41.…. (d) 204.…. F = 4.…, P-value = 0.0365 (from software).

14.41 (a) I = 3; n1 = 220, n2 = 145, n3 = 76; N = 441; response variable = 1 to 5 rating of "Did you discuss the presentation with any of your friends?" (b) I = 3; ni = 25; N = 75; response = 1 to 10 rating of the game. (c) I = 3; ni = 5; N = 15; response = cholesterol level.

14.43 For parts (a), (b), and (c): H0: μ1 = μ2 = μ3 and Ha: not all the means are the same. (a) DFG = 2, DFE = 438, DFT = 440; F(2, 438). (b) DFG = 2, DFE = 72, DFT = 74; F(2, 72). (c) DFG = 2, DFE = 12, DFT = 14; F(2, 12).

14.45 Answers will vary.

14.47 (a) Yes; it is not unreasonable to group these data together, because 20 < 2 × 15. However, university policies, teacher policies, and student backgrounds might not be the same from school to school. F = 1.…, P-value = 0.3538 (from software); do not reject the null hypothesis. With a large P-value at the 5% significance level, there is not enough evidence to say that any of the means are different.

14.49 (a) In general, as the number of accommodations increased, the mean grade decreased, but it is not a steady decline. Since the mean grade is affected similarly for 2, 3, and 4 accommodations, it is not unreasonable to group these data together. (b) So many decimal places are not necessary; no additional information is gained by having so many. Using 2 decimal places would be fine. (d) It is never a good idea to eliminate data points without a good reason. (e) These data do not represent 245 independent observations; some students were measured multiple times. (f) Answers will vary. (g) There is no control group, so it is impossible to comment on the effectiveness of the accommodations.

14.51 (a) Checking whether the largest s < 2 × smallest s for each group indicates that it is appropriate to pool the standard deviations for intensity and recall but not frequency. (b) The P-value for all three one-way ANOVA tests is < 0.001. For Frequency, Intensity, and Recall there is strong evidence that not all the means are the same. (c) The biggest s is not exactly less than 2 times the smallest s (0.66233 vs. 2 × 0.…), but it is close. Pooling should not be used for frequency (sp would be 0.82745). (d) Hispanic Americans were highest in each of the three groups.

Asian and Japanese were both low, and European and Indian are close together in the middle. Answers will vary. People near a university might not be representative of all citizens of those countries. This may affect how broadly the results can be applied. (e) The chi-square test has a test statistic of 11.353 and a P-value of 0.023, so at the 5% significance level there is enough evidence to say that there is a relationship between gender and culture. In other words, there is evidence that the proportion of men and women is not the same for all of the cultural groups.

14.53 (a) The histogram of bone density for the control group shows a skewed-right distribution. The histograms for the low and high groups show fairly symmetric distributions. There are no outliers for any groups. The following are the (mean, standard deviation) for each group: control (0.218, 0.0116), low (0.216, 0.0115), high (0.235, 0.0188). The largest s is < 2 × smallest s, so it is appropriate to pool the standard deviations. A side-by-side boxplot of the groups shows heavy overlap of control and low but not much overlap of high with the other two groups. A means plot shows that control is slightly higher than low, but high is much higher than the other two groups. (b) The normal quantile plots for low and high look good, but the plot for control shows some deviation from Normality towards the very low and very high ends. (c) F = 7.718, P-value = 0.001. There is strong evidence that not all the means are the same. (d) Bonferroni multiple comparisons show that high is significantly different from both the control and low groups, but the control and low groups are not significantly different from each other. (e) The high dose of kudzu isoflavones yields a significantly greater mean bone density for the femur of a rat than the control or low dose does. There is no significant difference in mean bone density between the control and low dose.

14.55 (a)
Bonferroni Tests
Condition i − j   Difference   Std. error   P-value
1−0               −4.…         1.321        0.238149
3−0               −13.…        1.321        0.001551
3−1               −9.…         1.321        0.008542
5−0               −33.…        1.321        0.000017
5−1               −29.…        1.321        0.000033
5−3               −20.…        1.321        0.000219
7−0               −40.…        1.321        0.000007
7−1               −36.…        1.321        0.000012
7−3               −26.…        1.321        0.000053
7−5               −6.…         1.321        0.036734

At the α = 0.01 level, all the differences are significantly different from 0 except the difference between 0 and 1 days after baking, and the difference between 5 and 7 days after baking.

14.57 (a) The ANOVA in Exercise 14.56 did not reject the hypothesis at the 0.05 level that any of the group means differed. Thus, no further analysis on which group means differed is appropriate.


(b)
Bonferroni Tests
Condition i − j   Difference   Std. error   P-value
1−0               −0.110000    0.0608       0.752553
3−0               −0.140000    0.0608       0.514112
3−1               −0.030000    0.0608       0.999966
5−0               −0.045000    0.0608       0.998871
5−1                0.065000    0.0608       0.982858
5−3                0.095000    0.0608       0.861026
7−0               −0.385000    0.0608       0.014421
7−1               −0.275000    0.0608       0.061029
7−3               −0.245000    0.0608       0.096013
7−5               −0.340000    0.0608       0.025003

The only group means that are significantly different at the α = 0.05 level are the difference between 0 and 7 days after baking and between 7 and 5 days after baking.

14.59 (a) The plots show that Group 2 has a modest outlier, but otherwise there are no serious departures from Normality. The assumption that the data are (approximately) Normal is not unreasonable. (b)

Number of promotions   Sample size   Mean      Std. dev.
1                      40            4.22400   0.273410
3                      40            4.06275   0.174238
5                      40            3.75900   0.252645
7                      40            3.54875   0.275031

(c) The ratio of the largest to the smallest is 0.275031/0.174238 = 1.58. This is less than 2, so it is not unreasonable to assume that the population standard deviations are equal. (d) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    10.9885          3.66285       59.903   ≤ 0.0001
Error    156                  9.53875          0.061146
Total    159                  20.5273
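An ANOVA table like this one can be computed by hand from the raw data: SSG measures variation of the group means around the grand mean, SSE measures variation within groups, and F = MSG/MSE. A sketch with tiny illustrative data (not the promotions data):

```python
def one_way_anova(groups):
    """One-way ANOVA from raw data: returns (DFG, DFE, SSG, SSE, F)."""
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / len(all_obs)
    # Between-group sum of squares: group sizes times squared mean deviations.
    ssg = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    dfg = len(groups) - 1
    dfe = len(all_obs) - len(groups)
    f = (ssg / dfg) / (sse / dfe)
    return dfg, dfe, ssg, sse, f

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
dfg, dfe, ssg, sse, f = one_way_anova(groups)
print(dfg, dfe, round(ssg, 3), round(sse, 3), round(f, 3))  # 2 6 26.0 6.0 13.0
```

Note also that DFG + DFE = DFT and SSG + SSE = SST, the additivity used in 14.9.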

The F statistic has 3 numerator and 156 denominator degrees of freedom. The P-value is ≤ 0.0001, and we would conclude that there is strong evidence that the population mean expected prices associated with the different numbers of promotions are not all equal.

14.61 (a)

Group      Sample size   Mean        Std. dev.
Piano      34             3.61765    3.05520
Singing    10            −0.300000   1.49443
Computer   20             0.450000   2.21181
None       14             0.785714   3.19082


(b) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    207.281          69.0938       9.2385   ≤ 0.0001
Error    74                   553.437          7.47887
Total    77                   760.718

The F statistic has 3 numerator and 74 denominator degrees of freedom. The P-value is ≤ 0.0001, and we would conclude that there is strong evidence that the population mean changes in scores associated with the different types of instruction are not all equal.

14.63 The contrast is ψ = μ1 − (1/3)μ2 − (1/3)μ3 − (1/3)μ4. To test the null hypothesis H0: ψ = 0, the t statistic is t = c/SEc = 3.306/0.636 = 5.20, with P-value ≤ 0.001. We conclude that there is strong statistical evidence that the mean of the piano group differs from the average of the means of the other three groups.

14.65 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i − j   Difference   Std. error   P-value
2−1           11.4000      9.653        0.574592
3−1           37.6000      9.653        0.001750
3−2           26.2000      9.653        0.033909
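The contrast computation in 14.63 follows a fixed recipe: the sample contrast is c = Σ aᵢx̄ᵢ, its standard error is SEc = sp·√(Σ aᵢ²/nᵢ), and t = c/SEc. A sketch with illustrative numbers (not the piano-study values):

```python
import math

def contrast_t(coeffs, means, sizes, sp):
    """t statistic for a contrast: c = sum(a_i * xbar_i),
    SE_c = sp * sqrt(sum(a_i^2 / n_i)), t = c / SE_c."""
    c = sum(a * m for a, m in zip(coeffs, means))
    se = sp * math.sqrt(sum(a * a / n for a, n in zip(coeffs, sizes)))
    return c, se, c / se

# Illustrative: group 1 against the average of groups 2-4,
# equal group sizes, pooled sd 2.0.
coeffs = [1, -1/3, -1/3, -1/3]
c, se, t = contrast_t(coeffs, [5.0, 2.0, 3.0, 4.0], [10, 10, 10, 10], 2.0)
print(round(c, 4), round(se, 4), round(t, 4))  # 2.0 0.7303 2.7386
```

The resulting t is compared to a t distribution with the error degrees of freedom, exactly as in the solution's t = 3.306/0.636 = 5.20.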

At the α = 0.05 level we see that Group 3 (the high-jump group) differs from the other two. The other two groups (the control group and the low-jump group) are not significantly different. It appears that the mean density after 8 weeks is different (higher) for the high-jump group than for the other two.

14.67 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i − j   Difference   Std. error   P-value
2−1           0.120000     0.3751       0.985534
3−1           2.62250      0.3751       0.000192
3−2           2.50250      0.3751       0.000274

At the α = 0.05 level we see that Group 3 (the iron pots) differs from the other two. The other two groups (the aluminum and clay pots) are not significantly different. It appears that the mean iron content of yesiga wet' when it is cooked in iron pots is different (higher) than when it is cooked in the other two. 14.69 (a) Plots show no serious departures from Normality (but note the granularity of the Normal probability plot; it appears that values are rounded to the nearest 5%), so the Normality assumption is not unreasonable.
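Bonferroni comparisons like the ones in these exercises can be sketched without distribution tables by comparing each pairwise t to a Bonferroni critical value t∗∗ looked up for the chosen α and number of pairs (2.29 is one such table value used earlier in the chapter; all other numbers below are illustrative):

```python
import math
from itertools import combinations

def bonferroni_pairs(means, sizes, sp, t_star):
    """Pairwise comparisons with a Bonferroni critical value t_star.

    For each pair (i, j): SE = sp * sqrt(1/n_i + 1/n_j),
    t = (xbar_i - xbar_j) / SE, significant when |t| > t_star.
    """
    results = {}
    for i, j in combinations(range(len(means)), 2):
        se = sp * math.sqrt(1 / sizes[i] + 1 / sizes[j])
        t = (means[i] - means[j]) / se
        results[(i, j)] = (t, abs(t) > t_star)
    return results

# Illustrative: three groups, pooled sd 2.0, Bonferroni critical value 2.29.
res = bonferroni_pairs([10.0, 10.5, 14.0], [20, 20, 20], 2.0, 2.29)
for pair, (t, sig) in sorted(res.items()):
    print(pair, round(t, 3), sig)
```

This reproduces the pattern seen above: one group far from the others is flagged in every pair involving it, while close groups are not significantly different.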


(b)
Bonferroni Tests
Group i − j    Difference   Std. error   P-value
ECM2 − ECM1     −1.66667    3.600        1.00000
ECM3 − ECM1      8.33333    3.600        0.450687
ECM3 − ECM2     10.0000     3.600        0.223577
MAT1 − ECM1    −41.6667     3.600        0.000001
MAT1 − ECM2    −40.0000     3.600        0.000002
MAT1 − ECM3    −50.0000     3.600        0.000000
MAT2 − ECM1    −58.3333     3.600        0.000000
MAT2 − ECM2    −56.6667     3.600        0.000000
MAT2 − ECM3    −66.6667     3.600        0.000000
MAT2 − MAT1    −16.6667     3.600        0.008680
MAT3 − ECM1    −53.3333     3.600        0.000000
MAT3 − ECM2    −51.6667     3.600        0.000000
MAT3 − ECM3    −61.6667     3.600        0.000000
MAT3 − MAT1    −11.6667     3.600        0.101119
MAT3 − MAT2      5.00000    3.600        0.957728

At the α = 0.05 level we see that none of the ECMs differ from each other, that all the ECMs differ from all the other types of materials (the MATs), and that MAT1 and MAT2 differ from each other. The most striking differences are those between the ECMs and the other materials.

14.71 (a)

Group    Sample size   Mean      Std. dev.
Lemon    6             47.1667   6.79461
White    6             15.6667   3.32666
Green    6             31.5000   9.91464
Blue     6             14.8333   5.34478

(b) The hypotheses are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. ANOVA tests whether the mean numbers of insects trapped by the different colors are the same or whether they differ. If they differ, ANOVA does not tell us which ones differ. (c)

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    4218.46          1406.15       30.552   ≤ 0.0001
Error    20                   920.500          46.0250
Total    23                   5138.96

sp = 6.78. We conclude that there is strong evidence of a difference in the mean number of insects trapped by the different colors.

14.73 (a)

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   3                    104,855.87       34,951.96     15.86
Error    32                   70,500.59        2,203.14
Total    35                   175,356.46       5,010.18

(b) H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. (c) The F statistic has the F(3, 32) distribution. The P-value is smaller than 0.001. (d) sp² = 2,203.14. This quantity corresponds to MSE in the ANOVA table.

14.75 (a) sp = 3.…. (b)

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   2                    17.22            8.61          2.21
Error    206                  803.46           3.90
Total    208                  820.68

(c) H0: μ1 = μ2 = μ3, Ha: not all of the μi are equal. (d) The F statistic has the F(2, 206) distribution. The P-value is greater than 0.100. We conclude that these data do not provide evidence that the mean weight gains of pregnant women in these three countries differ.

14.77 (a) A contrast is ψ1 = (0.5)μ1 + (0.5)μ2 − μ3; it would be reasonable to test the hypotheses H0: ψ1 = 0, Ha: ψ1 > 0. (b) A contrast is ψ2 = μ1 − μ2; it would be reasonable to test the hypotheses H0: ψ2 = 0, Ha: ψ2 ≠ 0.

14.79 (a) For the contrast ψ1 = (0.5)μ1 + (0.5)μ2 − μ3 and the contrast ψ2 = μ1 − μ2. (b) Estimate of ψ1: c1 = 49. Estimate of ψ2: c2 = −10. (c) SEc1 = 6.…, SEc2 = 7.…. (d) The test statistic for H0: ψ1 = 0 is t = 7.90. The P-value is smaller than 0.005. We conclude that there is strong evidence that the average of the mean SAT mathematics scores for computer science and engineering and other sciences majors is larger than the mean SAT mathematics scores for all other majors. For H0: ψ2 = 0, the test statistic is t = 1.…. The P-value is greater than 0.10. We conclude that there is not strong evidence that the average of the mean SAT mathematics scores for computer science majors differs from that for engineering and other science majors. (e) A 95% confidence interval for ψ1 is 49 ± 12.46 = (36.54, 61.46). A 95% confidence interval for ψ2 is −10 ± 14.54 = (−24.54, 4.54).

14.81 (a) Question 1. Contrast: ψ1 = μ1 − μ2. Hypotheses: H0: ψ1 = 0, Ha: ψ1 > 0. Question 2. Contrast: ψ2 = μ1 − (0.5)μ2 − (0.5)μ3. Hypotheses: H0: ψ2 = 0, Ha: ψ2 > 0. Question 3. Contrast: ψ3 = μ3 − (1/3)μ1 − (1/3)μ2 − (1/3)μ4. Hypotheses: H0: ψ3 = 0, Ha: ψ3 > 0. (b) Question 1: P-value > 0.10. We do not have evidence that T is better than C. Question 2: 0.05 < P-value < 0.10. There is at best weak evidence that T is better than the average of C and S. Question 3: P-value < 0.005. There is strong evidence that J is better than the average of the other three groups. (c) This is an observational study. Males were not assigned at random to treatments, although the researchers tried to match those in the groups with respect to age and other characteristics. There are reasons why people choose to jog or choose to be sedentary that may affect other aspects of their health. It is always risky to draw conclusions of causality from a single (small) observational study, no matter how well designed it is in other respects.

14.83 For each pair of means we get the following: Group 1 (T) vs. Group 2 (C): t12 = −0.66; |−0.66| is not larger than t∗∗ = 2.81, so we do not have strong evidence that T and C differ.

Group 1 (T) vs. Group 3 (J): t13 = −3.…; |t13| is larger than t∗∗ = 2.81, so we have strong evidence that T and J differ. Group 1 (T) vs. Group 4 (S): t14 = 3.…; |t14| is larger than t∗∗ = 2.81, so we have strong evidence that T and S differ. Group 2 (C) vs. Group 3 (J): t23 = −2.29; |−2.29| is not larger than t∗∗ = 2.81, so we do not have strong evidence that C and J differ. Group 2 (C) vs. Group 4 (S): t24 = 3.…; |t24| is larger than t∗∗ = 2.81, so we have strong evidence that C and S differ. Group 3 (J) vs. Group 4 (S): t34 = 6.…; |t34| is larger than t∗∗ = 2.81, so we have strong evidence that J and S differ.

14.85
n     DFG   DFE   F∗       λ    Power
50    2     147   3.0576   …    …
100   2     297   3.0261   …    …
150   2     447   3.0158   …    …
175   2     522   3.0130   …    0.8548
200   2     597   3.0108   …    …

A sample size of 175 gives reasonable power. If it is difficult or expensive to include more women in the study, one might consider a sample size of 150 per group. The gain in power by using 200 women per group may not be worthwhile unless it is easy to get women for the study.

14.87 (a) The groups 0, 1, 3, 5, and 7 each have sample size 2, with means and standard deviations as given by software. (b) The sample sizes would be the same in both tables. Means, standard deviations, and standard errors change the same way as individual values do. The means, standard deviations, and standard errors above could have been obtained from those in Exercise 14.… by dividing each by 64 and multiplying the result by 100. The degrees of freedom, the F statistic, and the P-value are all the same as in Exercise 14.…. (c) The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square
Groups   4                    6263.…           1565.…
Error    5                    …                …
Total    9                    …

We conclude that there is strong evidence that the group means differ, that is, that the mean % vitamin C content is not the same for all conditions.

14.89 (a) The ANOVA table with the incorrect observation is

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   3                    40.…             …             2.…
Error    20                   …                …
Total    23                   …

The P-value is larger than 0.10, so we would conclude that we do not have strong evidence that the mean number of insects that will be trapped differs between the different-colored traps. (b) The results are very different. In Exercise 14.71, P-value ≤ 0.0001, and we concluded that there was strong evidence that the mean number of insects that will be trapped differs between the different-colored traps. The outlier increased the sum of squares of error considerably, and this results in a much smaller value of F. (c)

Group          Count   Mean       Std. dev.
Lemon yellow   6       114.8333   164.…
White          6        15.6667     3.32666
Green          6        31.5000     9.91464
Blue           6        14.8333     5.34478

The unusually large values of the mean and standard deviation might indicate that there was an error in the data recorded for the lemon yellow trap.

14.91 (a) The pattern in the scatterplot is a roughly linear, decreasing trend. (b) The test in regression that tests the null hypothesis that the explanatory variable has no linear relationship with the response variable is the t test of whether or not the slope is 0. (c) Using software we obtain the following:

Variable               Coefficient   Std. error of coeff.   t ratio   P-value
Constant               4.36453       0.0401                 109       ≤ 0.0001
Number of promotions   −0.116475     0.0087                 −13.3     ≤ 0.0001

The t statistic for testing whether the slope is 0 is −13.3, and we see that the P-value is ≤ 0.0001. Thus, there is strong evidence that the slope is not 0; because the slope is different from 0, the mean expected price is changing as the number of promotions changes. The ANOVA in Exercise 14.59 showed that there is strong evidence that the mean expected price is different for the different numbers of promotions, and this is consistent with our regression results here. In this example the regression is more informative: it not only tells us that the means differ but also gives us information about how they differ.
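The slope t test described in 14.91 is b1/SE(b1), with SE(b1) = s/√Sxx. A self-contained sketch with illustrative data (a small decreasing trend, not the expected-price data set):

```python
import math

def slope_t(xs, ys):
    """Least-squares fit and the t statistic for H0: slope = 0.

    b1 = Sxy / Sxx, b0 = ybar - b1*xbar,
    s^2 = SSE / (n - 2), t = b1 / (s / sqrt(Sxx)).
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    t = b1 / math.sqrt(s2 / sxx)
    return b0, b1, t

# Illustrative data with a clear negative trend.
b0, b1, t = slope_t([1, 3, 5, 7], [4.2, 4.0, 3.8, 3.5])
print(round(b0, 3), round(b1, 4), round(t, 2))  # 4.335 -0.115 -13.28
```

A large |t| with a tiny P-value corresponds to the ANOVA's conclusion that the group means differ, but the regression additionally estimates by how much the mean response drops per unit increase in the explanatory variable.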