SOLUTIONS TO ODD-NUMBERED EXERCISES
CHAPTER 1
1.1 Answers will vary.

1.3 (a) The bars can be drawn in any order since major is a categorical variable. (b) The percents add to 98.5%, so a pie chart would not be appropriate without a category for Other.

1.5 The histogram is unimodal and fairly symmetric except for a high outlier (the Toyota Prius at 51). The range of the data without the outlier is 19 to 37 mpg.

1.7 (a) See the solution to Exercise 1.5. (b) The Rolls-Royce Phantom (19 mpg) and Mercedes-Benz E55 AMG (21 mpg) have the lowest mileage. However, there are no low outliers.

1.9 (a) It is fairly symmetric with one low outlier. Excluding the low outlier, the spread is from about −18% to 18%. (b) About 1%. (c) The smallest is about −18% and the largest is about 18%. (d) About 35% to 40%.

1.11 From the stemplot, the center is $28 and the spread is from $3 to $93. There are no obvious outliers. Examination of the stemplot shows that the distribution is clearly right-skewed. With or without split stems, the conclusions are the same. (A sketch of how such a stemplot can be produced follows the solutions for this group of exercises.)

1.13 (a) and (b) are bar graphs. (c) One way to do this is with the “clustered bar graph” option in SPSS.

1.15 (a) The top five are Texas, Minnesota, Oklahoma, Missouri, and Illinois. The bottom five are Alaska, Puerto Rico, Rhode Island, Nevada, and Vermont. (b) The distribution is strongly unimodal and right-skewed with a large peak in the 0 ≤ damage < 10 group. The range is 0 to 90. The three states with the most damage (Texas, Minnesota, and Oklahoma) may be outliers. (c) Answers will vary.

1.17 (a) Alaska has 5.7% and Florida has 17.6% older residents. (b) The distribution is unimodal and symmetric with a peak around 13%. Without Alaska and Florida, the range is 8.5% to 15.6%.

1.19 GM had more complaints than Toyota each year, but both companies seem to have fewer complaints in general over time.

1.21 (a) The individuals are the different cars. (b) The variables are vehicle type (categorical), transmission type (categorical), number of cylinders (which can be treated as categorical because it divides the cars into only a few categories), city MPG (quantitative), and highway MPG (quantitative).

1.23 (a) The different public mutual funds are the individuals. (b) The variables “category” and “largest holding” are both categorical, while the variables “net assets” and “year-to-date return” are quantitative. (c) Net assets are in millions of dollars, and year-to-date return is given as a percent.

1.25 Some possible variables are cost of living, taxes, utility costs, number of similar facilities in the area, and average age of residents in the location.
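The answer to 1.11 (and to 1.33 later in this chapter) reasons directly from a stemplot of the raw data, which is not reproduced in this manual. The short sketch below shows how such a stemplot could be generated in software; the values used are hypothetical, chosen only to echo the center (about $28) and spread ($3 to $93) described in the solution to 1.11, and are not the textbook's data.

```python
# Minimal stemplot sketch (illustrative data only, not the textbook's data set).
from collections import defaultdict

def stemplot(values):
    """Print a simple stemplot: stems are tens digits, leaves are ones digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[int(v) // 10].append(int(v) % 10)
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        print(f"{stem:>2} | {leaves}")

# Hypothetical monthly costs in dollars, centered near $28 with a long right tail.
stemplot([3, 9, 12, 15, 18, 22, 25, 26, 28, 28, 31, 35, 42, 57, 68, 93])
```

A right-skewed distribution shows up in such a plot as many leaves on the low stems and only a few scattered leaves on the high stems, which is the pattern the solutions to 1.11 and 1.33 describe.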

1.27 The distribution of dates of coins will be skewed to the left, because as the dates go further back there are fewer of those coins still in circulation. Various left-skewed sketches are possible.

1.29 (a) Many more readers who completed the survey owned Brand A than Brand B. (b) It would be better to consider the proportion of readers owning each brand who required service calls. In this case, 22% of the Brand A owners required a service call, while 40% of the Brand B owners required a service call.

1.31 Some possible variables that could be used to measure the “size” of a company are number of employees, assets, and amount spent on research and development.

1.33 The stemplot shows that the costs per month are skewed slightly to the right, with a center of $20 and a spread from $8 to $50. America Online and its larger competitors were probably charging around $20, and the members of the sample who were early adopters of fast Internet access probably correspond to the monthly costs of more than $30.

1.35 When you change the scales, some extreme changes on one scale will be barely noticeable on the other. Adding white space to the graph also changes the visual impression.

1.37 (a) Household income is the total income of all persons in the household, so it will be at least as high as the income of any individual in the household. (c) Both distributions are fairly symmetric, although the distribution of mean personal income has two high outliers. The distribution of median household incomes has a larger spread and a higher center. The center of the distribution of mean personal income is about $25,000, while the center of the distribution of median household income is about $37,000.

1.39 The means are $19,804.17, $21,484.80, and $21,283.92 for black men, white females, and white males, respectively. Since we have not taken into account the type of jobs performed by individuals in each category or years employed, we cannot make claims of discrimination without first adjusting for these factors.

1.41 The medians are $18,383.50, $19,960, and $19,977 for black men, white females, and white males, respectively. The medians are smaller than the means, but our general conclusions are similar.

1.43 Because of strong skewness to the right, the median is the lower number, $330,000, and the mean is $675,000.

1.45 For Asian countries the five-number summary is 1.3, 3.4, 4.65, 6.05, 8.8; for Eastern European countries it is −12.1, −1.6, 1.4, 4.3, 7.0. Side-by-side boxplots show that the growth of per capita consumption tends to be much higher for the Asian countries than for the Eastern European countries and that the growth for the Eastern European countries is much more spread out.

1.47 (a) The mean is $983.50. (b) The standard deviation is $347.23. (c) Results should agree except for the number of digits retained.

1.49 A rare, catastrophic loss would be considered an outlier, and averages are not resistant to outliers. The five-number summary is more appropriate for describing distributions of data with outliers.

1.51 (a) The five-number summary is 5.7, 11.7, 12.75, 13.5, 17.6. (b) The IQR is 1.8. Any low outliers would have to be less than 9, and any high outliers would have to be greater than 16.2 (see the sketch following this group of solutions). Therefore, Florida and Alaska are definitely outliers, but so is the state with 8.5% older residents.
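The outlier checks in 1.51 and 1.53 apply the 1.5 × IQR rule to the reported quartiles. A minimal sketch of that arithmetic is shown below, using only the quartiles quoted in the solution to 1.51 (the full state-by-state data set is not reproduced here).

```python
# 1.5 × IQR outlier fences for the percent of older residents (quartiles from 1.51).
q1, q3 = 11.7, 13.5
iqr = q3 - q1                  # 1.8
low_fence = q1 - 1.5 * iqr     # 9.0: values below this are flagged as low outliers
high_fence = q3 + 1.5 * iqr    # 16.2: values above this are flagged as high outliers

# Alaska (5.7%), Florida (17.6%), and the state at 8.5% all fall outside the fences.
for state, pct in [("Alaska", 5.7), ("Florida", 17.6), ("State at 8.5%", 8.5)]:
    flagged = pct < low_fence or pct > high_fence
    print(f"{state}: {pct}% -> {'outlier' if flagged else 'not an outlier'}")
```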

1.53 (a) The five-number summary is 0, 0.75, 3.2, 7.8, 19.9. (b) The IQR is 7.05. Any low outliers would have to be less than −9.825, and any high outliers would have to be greater than 18.375. The histogram shows that the U.S., Australia, and Canada all have CO2 emissions much higher than the rest of the countries, but only the U.S. is officially considered an outlier using the 1.5 × IQR rule.

1.55 (a) The 5th percentile is approximately at the 748th position, and the 95th percentile is approximately at the 14,211th position. (b) The 5th percentile value is approximately $10,000, and the 95th percentile value is approximately $140,000.

1.57 (a) The distribution is skewed left with a possible low outlier. (b) Use stems from the tens place: −2, −1, −0, 0, 1, 2, 3.

1.59 The median is $27.855. The mean is $34.70 and is larger than the median because the distribution is skewed right.

1.61 (a) The mean change is x̄ = 28.767% and the standard deviation is s = 17.766%. (b) Ignoring the outlier, x̄ = 31.707% and s = 14.150%. The low outlier has pulled the mean down toward it and increased the variability in the data. (c) “Identical” in this case probably means that the 5-liter vehicles were all the same make and model.

1.63 You will learn about the effects of outliers on the mean and median by working interactively with the applet; the sketch following this group of solutions illustrates the same idea numerically.

1.65 From the boxplots, the clear pattern is that as the level of education increases, incomes tend to increase and also become more spread out.

1.67 Ignoring the District of Columbia, the histogram of violent crime rates is fairly symmetric, with a center of about 450 violent crimes per 100,000. The spread is from about 100 to 1000 violent crimes per 100,000 for the 50 states, with the District of Columbia being a high outlier at slightly over 2000 violent crimes per 100,000.

1.69 Both data sets have a mean of 7.501 and a standard deviation of 2.032. Data A has a left-skewed distribution, while Data B has a distribution that is fairly symmetric except for one high outlier. Thus, we see two distributions with quite different shapes but with the same mean and standard deviation.

1.71 (a) The five-number summary is 0.9%, 3.0%, 4.95%, 6.6%, 14.2%. (b) The mean is larger because the distribution is moderately skewed to the right with a high outlier.

1.73 (a) A histogram is better because the data set is of moderate size. (b) The two low outliers are −26.6% and −22.9%. The distribution is fairly symmetric, with a median, or center, of 2.65%. The spread is from −16.5% to 24.4%. (c) The mean is 1.551%. At the end of the month, you would have $101.55. (d) At the end of the month, you would have $74.40. Excluding the two outliers gives a mean of 1.551% and a standard deviation of 8.218%. Both have changed. Quartiles and medians are relatively unaffected.

1.75 $2.36 million must be the mean, since fewer than half of the players’ salaries were above it.
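Solutions 1.59 through 1.63 all rest on the fact that the mean and standard deviation respond strongly to skewness and outliers, while the median and quartiles resist them. The sketch below illustrates that behavior with a small set of made-up values (the textbook's data sets are not reproduced here); the particular numbers are illustrative only.

```python
# Illustrative only: one extreme value pulls the mean toward it and inflates s,
# while the median barely moves. These values are made up, not the textbook's data.
from statistics import mean, median, stdev

values = [22, 24, 25, 27, 28, 29, 31, 33, 35, 95]   # 95 plays the role of the outlier
trimmed = values[:-1]                                # the same data without it

for label, data in [("with outlier", values), ("without outlier", trimmed)]:
    print(f"{label:16s} mean = {mean(data):6.2f}  "
          f"median = {median(data):5.1f}  s = {stdev(data):6.2f}")
```

With the outlier, the mean sits well above the median; dropping it moves the mean back toward the bulk of the data and cuts the standard deviation sharply, the same pattern noted (in the opposite direction, for a low outlier) in the solution to 1.61.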

x ± 2s = (−0. (a) Mean is C.91 1.0122.93 1. (a) No.9878. but the two outliers stick out again. You can use the Uniform distribution for an example of a symmetric distribution. 10. (a) 0. 1. (c) 16%.77 1. (c) 0. Answers will vary. The plot suggests no major deviations except for a possible low outlier.115 The histogram is fairly rectangular and the Normal quantile plot is S-shaped.2404.4584.1711 or 17. For symmetric data. while Gerald’s z-score is 1.105 (a) 0. and there are many choices for a left-skewed distribution. and the third quartile is about 277 days. 0.2180.0384. 1. (a) 0. (c) 0. (b) 452. (b) The beginning years are all below average until January 1966. (b) 0. The distribution is not exactly normal.5. so Eleanor scored relatively higher. 1. (b) Mean is A.9%. (a) 0. Then there is a severe drop with no recovery.2389. 1. 1. 1. (b) 0.99 1.117 (a) The mean is 64. but then the last month to be above average is 4/01. this apartment building is no longer an outlier. the normal quantile plot will show the highest and lowest points below the diagonal. the points will follow the diagonal fairly closely. For left-skewed data. 2007 9:50 S-4 SOLUTIONS TO ODD-NUMBERED EXERCISES 1.2236.83 1.87 1. It fits in perfectly with the rest of the data. (c) 72.25 grams per mile driven. 90.9494.97 1. The normal quantile plot shows a fairly good fit to the diagonal line in the middle. (d) 0.85 1.107 0. (c) 712. median is A.8.67.109 (a) Between −21% and 47%. (c) Mean is A.1956). Answers will vary.Moore-212007 pbs November 27.111 (a) The first and third quartiles for the standard Normal distribution are approximately ±0.81 1. 0.11%.0505. x ± 3s = (−0.7%. It looks much better than the distributions in 1. For right-skewed data. even at the ends.6764. median is B. As 2001 . It’s not even one of the top two data points now. (b) The distribution looks fairly normal except for 2 new high outliers (#11 and #13). 0. (b) 0. 0.113 Normal quantile plot looks fairly linear. (d) The dot. the normal quantile plot will show the highest and lowest points above the diagonal.79 1. standard deviation = 0. median is B. 1.65 to 2. (b) The first quartile is about 255 days. 1. 1.0505.89 1.4136).94.6316). (a) Mean = −0. 1. 1. (c) The values are high (in the 80s) between 1/00 and 7/00. (b) x ± s = (−0. (c) There are several answers for (a) but (b) is unique.101 The stemplot shows a fairly symmetric distribution with a high and low outlier.103 0.025.49.5948.0454. (b) 0. Eleanor’s z-score is 1. (b) 64 inches to 74 inches.com economy was still booming in January 2000.95 (a) 1. 10. (c) 0. 1.0224. and 100%.

6 million and the mean salary is slightly over $3. Q 1 = 337 grams.000.8 thousand barrels and x = 48.12475. the minimum is 2 thousand barrels. 1.5 grams. did bad things to the economy as well. and a smaller fraction of these could be sampled. (b) For normal corn. 1.5 grams.8 thousand barrels. (c) The box plot indicates very.78. The next break is not as clear. Q 1 = 21.123 Gender and automobile preference are categorical. M = 358 grams.125 The distribution is right-skewed. the dot. Q 3 = 1.127 (a) For normal corn.Moore-212007 pbs November 27. and the maximum is 204. and the maximum is 477 grams. the group of the most populous counties might consist of the 8 counties with populations over 1 million. but it is difficult to be certain of this from only the five-number summary.30 and s = 50. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-5 progressed. 1. 1. Almost all the data was 1. Because of the rightskewness. For the mean. For example. Age and household income are quantitative.com economy started to crash.129 (a) From the histogram or the boxplot. Q 1 = 383.25 thousand barrels. The mean weight gain for the new corn is 36. M = 406. For new corn.131 The mean is μ = 250 and the standard deviation is σ = 175. being pulled up by the strong right-skewness of the distribution. new corn gives higher weight gains. and we could sample households from half of these counties. the median.1 thousand barrels.95 and s = 42. and Q 3 are all the same. and we would sample from all of these counties. the minimum is 272 grams. but it could be taken at 100. M.5 grams. Finally. but it shouldn’t appear to deviate . (b) The box has a length of 0.9 thousand barrels. (c) 0.121 (a) 11/22 = 50%.119 (a) Min = 1. very right. 1. Division into three groups is fairly arbitrary. indicating some right-skewness. The boxplot and the histogram clearly show the right-skewness. 2001. M = 1.80. the minimum is 318 grams. it is clear that the distribution is skewed to the right. Q 3 = 400. 1. we expect the mean to be substantially larger than the median. the distribution should look fairly symmetric with a center at about 20. There are 27 counties with populations between 100. The median is 159. the Normal quantile plot may not be that smooth. This makes sense because Q 1 . The spread goes from $200 thousand to slightly over $12 million. the mean of the populations is 583. (b) Negative revenue growth means the previous year’s revenue was higher than this year’s revenue. Q 1 = 1. Q 3 = 428. and the maximum is 462 grams.5 grams. The maximum is much farther from the median than the minimum. The median salary is $1. (b) M = 37. there are 23 counties with populations below 100.73. The box containing the first quartile. Overall. x = 366. with the third quartile being slightly farther from the median. A natural way to proceed is to use cutoffs that correspond to gaps in the data. and September 11. Q 3 = 60.5 million. making plotting difficult.133 The range of values is extreme.994 and is clearly quite far from the center of the data. 1. 1.778. which could suggest right-skewness or just a high outlier. and for the new corn x = 402. Since the distribution is skewed right and has three high outliers. 1.5 thousand barrels. skewed data. (c) M = 37.000. (d) The top 25% of all telecom companies had revenue growth greater than 0.000 and 1 million. with three high outliers.65 grams larger than for the normal corn.82%.135 Answers will vary. and the third quartile is fairly symmetric. Because of the small number of observations.

At very slow speeds and at very high speeds. .1 (a) Time spent studying is the explanatory variable. (c) This bear market appears to have had a decline of 48% and a duration of about 21 months.5 2. fuel used decreases. Both variables are quantitative and probably have a negative association because the more money the parents have. (a) Explanatory variable = accounts. (d) The points lie close to a simple curved form. Both are categorical variables. We therefore expect mean personal income for a state to be positively associated with its median household income. financial services (32. natural resources (23. The Normal quantile plot should not look nearly as much like a straight line as when plotting the 20 values of x.2 2. . . . (a) Speed is the explanatory variable and would be plotted on the x axis. . The plot shows a positive association. it increases as speed increases. For the standard deviation. Hand wipe is the explanatory variable. at about 60 km/h. and technology (54. suggesting the distribution of s is not Normal. and linear.3 2. and the yield of a crop is the response variable.11 2. (e) Economic class of the father is the explanatory variable. but two points stand out at the top right corner of the plot. (a) The longer the duration. (c) No. 2007 9:50 S-6 SOLUTIONS TO ODD-NUMBERED EXERCISES from a straight line in a systematic way. so personal and household incomes are positively associated. .7 2. and that of the son is the response. both low and high speeds correspond to high values of fuel used. because market sector is a categorical variable. so we cannot say that the variables are positively or negatively associated. and then.Moore-212007 pbs November 27. the distribution should look slightly skewed to the right with a center close to 5.15 2. Highway gas mileage tends to be between 5 and 10 MPG greater than city mileage.13 2. (c) It appears to fall roughly on the line defined by the pattern of the remaining points. (b) A straight line that slopes up from left to right describes the general trend reasonably well. (a) If a person has a high income. (b) Explore the relationship between the two variables. positive. (c) Strong. so the relationship is reasonably strong. (b) The pattern is roughly linear.314). . and the highway mileage is 68 MPG. .17 . (c) Yearly rainfall is the explanatory variable. and grade on the exam is the response. Household incomes will always be greater than or equal to 2. It may be most reasonable to simply explore the relationship between these two variables.317). (d) Charles Schwab and Fidelity. (d) Many factors affect salary and the number of sick days one takes. (b) Technology was the best place to invest in 2003.760). CHAPTER . . response variable = assets. . the greater the decline.960). (b) At first. (c) In the scatterplot. the less a student will probably need to borrow to pay for college tuition.9 2. and whether or not the skin appears abnormally irritated is the response. Response variable: money student borrows. engines are very inefficient and use more fuel. his or her household income will also be high. (a) The means for the market sectors are: consumer (30. The association is not very strong. Explanatory variable: parent income. (a) The city gas mileage is approximately 62 MPG.

Technology = −24. (b) The correlation in Figure 2. the association is positive. Computation shows r = 0. the mean is 1. 2.03. Utilities and natural resources = 25. The District of Columbia is a city with a few very wealthy inhabitants and many poorer inhabitants. (a) High values of duration go with high values of decline.31 2.66. so we cannot speak of positive or negative association between market sector and total return. The points in Figure 2.19 (a) The overall pattern is linear.000 and the standard deviation is 16. (b) Financial services and Utilities and natural resources were good places to invest. The correlation r measures the strength of a straightline relationship. neither variable is necessarily the explanatory variable. (a) Explanatory variable = price. median household income could be smaller than mean personal income. so this probably corresponds to “Eat Slim Veal Hot Dogs. Florida is the one with the fewest failures. We expect the distribution of both personal and household incomes to be strongly right-skewed. and Texas. linear relationship and the correlation is 0.87. Hot dogs that are high in calories are generally high in sodium. so it is represented on the horizontal axis. Financial services = 35.Moore-212007 pbs November 27. The points are not tightly clustered about a line. we know that the mean will be larger than the median. and median household income must be larger than median personal income.07. (b) If the distribution of incomes in a state is strongly right-skewed.68. (c) Of the four points in the right side of the plot. but results should look fairly random and without pattern. The correlation is 0.3215. 2. They are the most populous states in the country.928. y = 29.27 2.6. (a) The means are: Consumer = −7.29 2. (c) The association is positive and is reasonably strong.31 we found that the scatterplot shows a strong. (a) In 2. (c) Market sector is categorical. (b) x = 18. (c) The correlation should be close to 0. (b) Brand 13 has the fewest calories.21 2. (d) The outlier is California. the mean is 50. and r = 0. Florida.8487. positive.” (a) Business starts is taken as the explanatory variable. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-7 the incomes of the individuals in the household. so the correlation is not near 1. Many very wealthy people who work in New York City live in Connecticut.25 2.4. The variables are positively associated.325. positive. The pattern is strong. sx = 0. Although mean household income must be larger than mean personal income.23 2. For deforestation. These states are scattered throughout the country. We arbitrarily put City MPG on the horizontal axis of our plot.955.738 and the standard deviation is 0. (b) Answers will vary. We therefore expect median household income in a state to be larger than mean personal income in that state.33 .3055. (b) The association is positive. (e) The four states outside the cluster are California.6. and the strength is moderately strong.955. so the association is positive.2 is closer to 1 than the correlation in Figure 2. and linear. resulting in a high mean income. New York. (d) The two points are the District of Columbia and Connecticut. (a) In this case. (b) For price. (c) The overall pattern of the plot is roughly linear.2 appear to lie more closely along a line than do those in Figure 2. The strength of the relationship (ignoring outliers) is moderately strong. s y = 0.

is r = 0. (d) The residuals show the same curved pattern. The correlation.75%. (a) Slightly parabolic. (a) The number y in inventory after x weeks must be y = 96 − 4x. This is not surprising: r does not change when we change the units of measurement of x. opening downward. sy r = 0.51 2. the stock is just as likely to perform well as to not perform well—likewise for companies that do not compensate their CEOs highly. y. The scatterplot suggests that the relation between y and x is a curved relationship.1889.59 2.Moore-212007 pbs November 27. There is a straight-line pattern that is fairly strong. s y = 3.5% because r 2 = (0.3%. The predicted selling price is 967.0916.355.47 2.35 2. (b) x = 22. (b) R 2 = 2. (b) r 2 increases from 0. 2007 9:50 S-8 SOLUTIONS TO ODD-NUMBERED EXERCISES 2. the remaining points appear more tightly clustered about the least-squares regression line. (b) 0.047x. No. (b) The graph shows a negatively sloping line.0892 + 0. In Figure 2. Thus. computed from statistical software. not that there is no correlation. (c) No. (c) y = 9.2. you could not make an accurate prediction of stock returns because the R 2 is so low and the scatterplot shows a very weak association between Treasury bills and stock returns. Observation 1 is not very influential.31. y). (a) and (b) The plots look very different.188999x.660 to 0.596)2 = 0. 2. we would predict y = y = 9.74776x. The regression line with year does not do a good job of explaining the variation in returns. In Figure 2. (c) The correlation between x and y is 0.41 2.61 .37 (a) 0.09 is not possible. the linear trend is not as pronounced.45 2. ˆ (a) y = 1. or both.968.39 2. There do not appear to be any extreme outliers from the straight-line pattern. b = r sx = 0. positive. after removing the outliers. very weak. A more accurate statement might be that in companies that compensate their CEOs highly.253.57 2. and a = y − bx = 1. ˆ (a) The percent is 35.253.779. the remaining points appear to be more tightly clustered around a line.600.07% when x = x = 1.55 2.707x. (a) The scatterplot shows a curved pattern. (a) There is a fairly strong. not a straight-line relationship.6 thousand dollars for a unit appraised for $802. (a) Gender is a categorical variable. The magazine’s report implies that there is a negative correlation between compensation of corporate CEOs and the performance of their company’s stock. The least-squares regression line passes through the ˆ point (x.995.74. (c) Using a calculator we found the sum to be −0.8.49 2. y = 5.53 2. sx = 17. linear relationship between appraised value ˆ and selling price.306. (b) y = 127. the outlier accentuates the linear trend because it appears to lie along the line defined by the other points.368. (b) y = 6.08%.898.43 2. After its removal.01. (c) The correlation has no unit of measurement.270 + 1. ˆ (a) y = 2. (c) These points are apparently not outliers because the correlation dropped quite a bit when they were removed. After 25 weeks the equation predicts that there will be y = −4. (b) No. (b) A correlation of 1.08% + ˆ 1. The correlation between x ∗ and y ∗ is also 0. Observation 1 is an outlier.707.95918 + 6. and after its removal. which is very low.

525483 + 0.64. r = −0. and so the variables are negatively associated.2.879.1%.87 . (c) The residual is −20. positive. Experiment with the applet on the Web and comment.4%. y = ˆ 58. (b) r 2 is 39.047x. (b) R 2 = 93. (c) R 2 = 93.67 2. (a) Number of Target stores = −0. We would predict Octavio to score 4.6571 + 0.5879 + 1.590147. so we predict number of Target stores = 96. There are 254 Wal-Mart stores in Texas.81 2.8.76946. so we predict number of Wal-Mart stores = 132. (c) y = 127.130.4%.859. If the correlation has increased to 0. For x = 500.76946 = the residual. and b = 0. The observed decline for this particular bear market is 14%.79 2. so the residual is −14.3. 2.4232.73 2.382345 × (number of Wal-Mart stores). (b) R = 0.1 points above the class mean on the final exam. so the prediction is an overestimate. (b) y = −17. y = 43.999993.879. so the residual = 121. (a) R 2 = 93. The actual assets − the predicted assets = −20. (c) The points. Leaving out spaghetti and snack cake. there is a fairly strong. and b = 0.6 is substituted for x.13459 × (number of Target stores). There are 90 Target stores in Texas. Fact 3 is true for this least-squares line.334%.8. ˆ (a) r = 0.71 2. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-9 2. r 2 = 0. and intercept = a = −17. taken together.464 + 0.5%.0832x. linear relationship.69 2. and the mean appraisal is $688.1275.77 2. (b) The actual assets are 11.83 2. ˆ ˆ (a) y = 15.76%. a = 1. so the calibration does not need to be done again.860.8814 + 1. (d) The predicted selling price will be $848. (d) There are 90 Target stores in Texas. We would expect the predicted absorbance to be very accurate based on the plot and the correlation. ˆ (a) Spaghetti and snack cake appear to be outliers.3101 + 1. (b) y = ˆ 1.75 2.988.391.928. then y will be 48. (b) For all 10 points.14721x. (c) Only the intercept is affected by the change in units. a = 1. (e) The mean selling price is $848. (a) The predicted assets for DLJ Direct with 590 accounts will be 31. are moderately influential. (c) The predicted value is y = 28.0832. With 64% of the variation unexplained.85 2.334%. When American stocks have gone down.Moore-212007 pbs November 27.65 2.849.2. If 793. (b) We would predict Julie’s final exam score to be 78. (a) Yes. y = 58.63 ˆ (a) Slope = b = 0. the least-squares regression line would not be considered an accurate predictor of final-exam score. Therefore. then there is a stronger relationship between American and European stocks.113301x. (c) Number of Wal-Mart stores = 30.270 + 1.590147. (a) b = 0.9. R 2 is the percent of the variation in the values of total assets that is explained by the least-squares regression line of total assets on number of accounts.2.16 and a = 30.858x.30356x. so European stocks have not provided much protection against losses in American stocks.750.1275 + 0. R 2 is the percent of the variation in the selling price that is explained ˆ by the least-squares regression line of selling price on appraisal values.5768. Thus.36. (b) There are 254 Wal-Mart stores in Texas. European stocks have tended to go down also. (c) r 2 = 0. Smaller raises tended to be given to those who missed more days. R 2 = 86. Yes. so the residual = −6.

107 (a) The residuals are more widely scattered about the horizontal line as the predicted value increases. years of education. The regression model will predict low salaries more precisely because lower predicted salaries have smaller residuals and hence are closer to the actual salaries.9101.6954 + 9. the plot shows a modest positive association. For intermediate numbers of years the residuals tend to be positive. and will overestimate the salaries of players who have been in the majors more than 15 years. .68%. and this could account for the observed correlation. namely the size of the fire. The reasoning assumes that the correlation between the number of firefighters at a fire and the amount of damage done is due to causation.105 Intelligence or family background may be lurking variables. (b) The number who smoke is 1004. However. Correlations based on averages are usually higher than the correlation one would compute for the individual values from which an average was computed. is behind the correlation. Answers will vary but may include: age.86 million employees only 5 years later.Moore-212007 pbs November 27.365. 2. and location. There also appears to be an overall downward trend over time. Intelligence or support from parents may be a lurking variable. (b) y = 44.89 We would expect the correlation for individual stocks to be lower.99 2. Correlations based on averages (such as the stock index) are usually too high when applied to individuals. years of experience in job. (c) There are periodic fluctuations with two peaks around 1977 and 1986. 2. Families with a tradition of more education and high-paying jobs may encourage children to follow a similar career path. The model will overestimate the salaries of players that are new to the majors. (b) We would expect the correlation to be smaller. we should see a negative ˆ association.93 2. If lurking variables are present. For very low and very high numbers of years in the majors. so the percent who smoke is 18. No. this is an extrapolation because it is very unlikely that a kind of business that has 0 employees in 1997 would have 1.103 (a) r 2 = 0. 2. 2007 9:50 S-10 SOLUTIONS TO ODD-NUMBERED EXERCISES 2. This is the fraction of the variation in daily sales data that can be attributed to a linear relationship between daily sales and total item count. then requiring students to take algebra and geometry may have little effect on success in college. r 2 = 0. 2.59163x.91 2. 2. (b) Yes.101 (a) If the consumption of an item falls when price rises. the residuals tend to be negative. factors that may lead students to take more math in high school may also lead to more success in college.109 (a) The data describe 5375 students. The most seriously ill patients may be sent to larger hospitals rather than smaller hospitals. Correlation does not imply causation. More generally. (a) The story states that these are the “ten kinds of business that employ the MOST people” so it would not be possible to have GREATER x-values than the ones listed in this data. It is more plausible that a lurking variable.95 2. will underestimate the salaries of players who have been in the major leagues about 8 years. (b) There is a curved pattern in the plot.97 2.

86% 18.73% 36. while for men accounting and finance are the most popular.12% The data support the belief that parents’ smoking increases smoking in their children.61% .419). For both types of purchases.155). 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-11 (c) Neither parent smokes One parent smokes Both parents smoke Total Percent 2.167).75% 4-year full-time 35.8%.84% 3.113 (a) Female Accounting Administration Economics Finance 30. (d) If you are in “good” condition before surgery or if you are in “bad” condition before surgery. (b) Percent of Hospital A patients who died who were classified as “poor” before surgery = 3. 2.44% 2. (e) The majority of patients in Hospital A are in poor condition before surgery.121 There are 1716 thousands of older students. cash is most likely.59% 4-year part-time 13.303).40%. your chance of dying is lower in Hospital A.22% 40. The conditional distribution of payment method for planned purchases is: cash (0.65% The most popular choice for women is administration. The distribution is 2-year part-time % older students 7. Patients in poor condition are more likely to die. Percent of Hospital B patients who died who were classified as “poor” before surgery = 4%.3%.Moore-212007 pbs November 27.530). The conditional distribution of payment method for impulse purchases is: cash (0.23% 2239 41.47% 1356 25.66% 1780 33.78% 24. and credit card (0. 2.117 The marginal distribution for payment method is: cash (0. so choose Hospital A. while the majority of patients in Hospital B are in good condition before surgery. even though both types of patients fare better in Hospital A. Percent of Hospital B patients who died = 2%. check is least likely. and credit card (0.351).129). check (0. Answers will vary for explaining the choice of payment method for impulse purchases.05% 2-year full-time 43.11% Male 34. For planned purchases.111 Neither parent One parent Both parents smokes smokes smoke % students who smoke 13.115 (a) Percent of Hospital A patients who died = 3%. 2.495). and credit card (0. check (0. credit card is most likely. check (0. Percent of Hospital B patients who died who were classified as “good” before surgery = 1. 2. 2.22% 27.58% 22. (c) Percent of Hospital A patients who died who were classified as “good” before surgery = 1%.452).54% did not respond. For impulse purchases. and this makes the overall number of deaths in Hospital A higher than in Hospital B. (b) 46.119 (a) The percent is 20. (b) The percent is 9.85%.

05% 57.67% (b) The data show that those taking desipramine had fewer relapses than those taking either lithium or a placebo. 2.44%) of applicants who are less than 40 who are hired is more than 10 times the percent (0. (d) Lurking variables that might be involved are past employment history (why are older applicants without a job and looking for work?) or health.33% 16.123 (a) Hired Applicants < 40 Applicants ≥ 40 6. The entry in the Total column is 59. These results are interesting but association does not imply causation. (c) Only a small percent of all applicants are hired.129 (a) A combined two-way table is Admit Male Female (b) Converting to percents gives Admit Male Female 70% 56% Deny 30% 44% 490 280 Deny 210 220 (c) The percents for each school are as follows: Business Admit Deny Male Female 80% 90% 20% 10% Male Female Law Admit 10% 33% Deny 90% 67% .56% 99.Moore-212007 pbs November 27.125 (a) Desipramine Relapse No relapse 41.33% Lithium 75% 25% Placebo 83. 2.61% Not hired 93.73% 2. 2007 9:50 S-12 SOLUTIONS TO ODD-NUMBERED EXERCISES Older students tend to prefer full-time colleges and universities to part-time colleges and universities. The difference may be due to rounding off to the nearest thousand.61%) of applicants who are 40 or older who are hired. but the percent (6.918 thousand.127 (a) The sum is 59.54% 10.69% 10. with 79.920 thousand. 2.2% of all older students enrolled in some fulltime institution.67% 58. (b) Never married Married Widowed Divorced 21.44% 0.39% (b) A graph shows results similar to those in (a).

Fund A increases by 40%. and more men apply to the business school than the law school. (c) 0.5 13 11.141 (a) 0. Fund B increases by 20%.459. (b) 0. R 2 = 75.5 15 11. 2. 2.952x. (e) The rounding didn’t have an effect for this data because the part of each data point which was rounded off was a small percentage of the actual data value. the overall admission rate for men appears relatively high.966. ˆ 2. (c) The “zero intercept” line completely misses all the data because it is too low.137 (a) If a professor had a 2002 salary of $0. 2. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-13 (d) If we look at the tables. one should not conclude from these data that the observed association is evidence that high interest rates cause low stock returns.396. (d) Professor #15 has the highest 2002 and 2005 values. but the data point follows the general trend of the other data.135 (a) 0. #15 is not an influential observation because R 2 .56 (c) 0. Then if Fund B increases by 10%. 2.966. 2.Moore-212007 pbs November 27. 144.877.131 (b) There is a fairly strong. his or her 2005 salary would be $27.44. The point is an outlier in the horizontal direction.200). s. Professor #16 has a 2005 salary similar to this one. The estimated intercept raises the least-squares regression line to the right height to pass through the data.300. This is a case where there is a perfect straight-line relation between Funds A and B. s = 8710. 2.966.145 Removing this point would make the correlation closer to 0.147 Suppose the return on Fund A is always twice that of Fund B. ˆ With #15 removed: y = 21681. while more women apply to the law school than the business school. (c) Professors #3 and #5 have 2002 salary similar to this one. The correlation of amps with average weight is greater than the correlation of amps with the individual weights. and this makes the overall admission rate for women appear low.387 + 0.5 14 12.5 12 10. It is hard to get into the law school. 2. 2. It would not be considered an outlier. and if Fund B increases by 20%.149 (a) and (b) The overall pattern shows strength decreasing as length increases until length is 9. positive. Professor #9 does not follow the overall pattern and would be considered an outlier. s = 8765.676.139 (a) With all data points: y = 27459.6%. and the slope hardly changed at all when it was removed.143 High interest rates are weakly negatively associated with lower stock returns. (b) Amps Average weight 10 9.6%. (b) No.896x. 2. This is not a “practical” interpretation because no professor would have had a 2002 salary of $0. which is a general fact.441 + 0. we see that it is easier to get into the business school than the law school.554. and its location strongly influences the regression line. R 2 = 75.133 (d) The association looks stronger as the range of axes increases because the points look closer together. (d) The calculations are the same. Because many other factors (lurking variables) affect stock returns. then the pattern is relatively flat with strength decreasing only . linear association with an outlier from point Professor #9 (102. Because it is easier to get into the business school. Units of measure don’t change the correlation between two variables.

so there should be no systematic differences in the groups that received the two treatments. 2. and r = 0. and are these common lengths that are used in building? 2. (b) For a house built in 2000. a negative value makes no sense! (d) r = −0.947. (c) The least-squares line is Strength = 488.24% 1. age is 2 and selling price = $186.241). indicating that older houses are associated with lower selling prices and newer houses with higher selling prices.17% 1.544.9 × Length. The proportions for “higher” and “lower” are very close. and 1899 (age = 101) is almost within the range.75 × Length. and we would not trust the regression line to predict the selling price of such a house.89% The data show that aspirin is associated with reduced rates of heart attacks but a slightly increased rate of strokes. To find the equation of the regression line for predicting degree-days from gas use.682.51.5 − 46. The equation of the regression line is gas use = 123.226 − 1334. age is 3 and selling price = $185.419.49 × age. We see that for each 1-year increase in age the selling price drops by $1334.226.02. A straight line does not adequately describe these data because it fails to capture the “bend” in the pattern at Length = 9.391).048 and a = −5.889.153 (a) Selling price = 189.222 × (degree-days).226. The association is negative.49 × 150 = −$10. b = r sx = 20. y = 558.368). same (0. s y = 274. However.222 and a = y − bx = 123. x = 21.151 Let x denote degree-days per day and y gas consumed per day.155 We convert the table entries into percents of the column totals to get the conditional distribution of heart attacks and strokes for the aspirin group and for the placebo group.4 × Length. age is 1 and selling price = $187.53. The regression line predicts a house that is 150 years old to have a selling price = 189.226 − 1334. the design of the study is such that it is difficult to identify lurking variables. 2. lower (0.09% 1.222. we interchange the roles of x and y and compute b = 0.048 × (gas use).Moore-212007 pbs November 27. The slope has units of cubic feet per degree-day.891.49. 1850 (age = 150) is well outside the range of the data. Thus.93% 0.283 + 0.283.226 + 20.557. For a house built in 1998. Association does not imply causation.383. age is 0 and selling price = 189.1 − 3. 2007 9:50 S-14 SOLUTIONS TO ODD-NUMBERED EXERCISES very slightly for lengths greater than 9. Doctors were assigned at random to the treatments (aspirin or placebo). For a house built in 1999. Aspirin group Fatal heart attacks Other heart attacks Strokes 0. . For what lengths is it stronger. The equation for the lengths of 9 to 14 inches is 667.08% Placebo group 0.157 (a) The marginal distribution of opinion about quality is: higher (0. I would want to know how the strength of the wood product compares to solid wood of various lengths. (c) 1900 (age = 100) is within the range of the data used to calculate the leastsquares regression line.50.38 − 20. so we would probably trust the regression line to predict the selling price of a house in these years. For a house built in 1997. 2. (d) The equation for the lengths of 5 to 9 inches is 283.989. The two lines describe the data much better than a single line. The slope has units of degree-days per cubic feet. The equation of the regression line is degree-days = −5. sy sx = 13.

.194).11 3. continuing from where you left off in the table. .258). (b) U. same (0. 002. . . (c) All regulators from the supplier. 4400 and select an SRS of 44 of these. You would expect that the higher rate of no-answer was probably during the second period. as more families are likely to be gone on vacation. and 074.299). Many variables (lurking variables) could have changed over the five years to increase the unemployment rate.9 3. .S. 131.250). 273. (a) All adult U. The selected sample includes the retailers numbered 400. and lower (0. 10—Fleming. . select 5 small accounts. The conditional distribution of opinion for nonbuyers is: higher (0.7 3. more actively training for some athletic event.15 . . . with one from each pair completing the training and the other not. these are probably the tellers who are more experienced or more interested in advancing their careers. These are the proportions of opinion in general. The first 5 midsize accounts have labels 417. For buyers. (b) To make this an experiment. .443).5 3. 500 and select an SRS of 25 of these. CHAPTER . . residents. 12—Gates. 1452. .” Yes. or just the regulators in the last shipment. or more seriously trying to lose weight. randomly assign 25 members to a control group that just has them visiting a health club and 25 members to a treatment group which has them visiting a health club and working with a trainer.13 3.Moore-212007 pbs November 27. 322. 211. These individuals may be more health conscious. Label the small accounts 0001. and the response variables are whether or not the individual considers the particular features essential in a health plan. 3. The unseen bias in the study as written in the exercise is that since the tellers were allowed to select themselves for advanced training. The selected sample is 04—Bowman. 350. start with a sample of tellers.” For both groups. . . numbering across rows. Label the midsize accounts 001. not separated out for buyers and nonbuyers.3 3. and lower (0. all of the same experience and ability. 19—Naber. and 097.1 It is an observational study because information is gathered without imposing a treatment. we can conclude that using recycled filters causes more favorable opinions. . 417.556). (b) The conditional distribution of opinion for buyers is: higher (0. . households. and 13—Goel. the largest proportion was for “lower. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-15 with the proportion for “lower” slightly bigger. and randomly select half to participate in the advanced training while the other half get no additional training. We have used two-digit labels. same (0. . 17—Liao. 0002. 247. . and the first 5 small accounts have labels 3698. 2480. Label the retailers from 001 to 440. . (a) To make this an experiment. 077. The explanatory variable is consumer’s gender. First select 5 midsize accounts and. 494.” For nonbuyers. and 3716. 172. the majority voted for “higher. The unseen bias involved in using the study as written in the exercise is that currently individuals have decided on their own to work with the trainer.3 3. such as a recession or factories leaving the area.S. 208. the smallest proportion was for “same. 2605. An alternate way of doing this is to treat it as a matched pairs experiment where pairs of tellers with similar backgrounds and experience are paired up.

or “individuals. and 359. We labeled tracts numbered in the 1000s one to six.” (b) “The elimination of the tenure system at public institutions should be considered as a means of increasing the accountability of faculty to the public. 0727. and the treatments are the two levels of medication.29 3. (b) The sample is 150 randomly selected businesses. The labels and blocks selected are 21 (block 3002). Various labels are possible. (d) Answers will vary. (b) There are testimonials on the Web site. 04—A1101. numbering across rows. The response variable is the number of pain episodes reported. 3. 115. (b) There are two factors—temperature and stirring rate—and six treatments. (c) 51. the first 5 males are those with the labels 1369. Some businesses are listed by a person’s name. (a) The committee intends for the population to be all local businesses. 75. but in actuality the population they are using is all local businesses listed in the telephone book. This is a voluntary response sample.35 3. Continuing in the table. and 10 (block 2003). How did the committee know which names in the phone book were businesses? (a) The population “eating-and-drinking establishments” in the large city. (b) The population is the congressman’s constituents.37 . 155.33 3.” Subjects are the 300 sickle cell patients. 2007 9:50 S-16 SOLUTIONS TO ODD-NUMBERED EXERCISES 3.17 (a) The site is a true random number generator which uses little variations in amplitude in atmospheric noise picked up by a radio. 2. (b) A simple random sample of size n would allow every set of n individuals an equal chance of being selected.33%.23 3.19 3. 052. so the systematic sample includes clusters numbered 35. and some only in the white pages. The clusters numbered 1. and in the 3000s nineteen to forty-four.” are needed. in the 2000s seven to eighteen. Some businesses list only in the yellow pages. (a) The individuals are different batches and the response is the yield. and 1868. 4.27 3. and persons with strong opinions on the subject are more likely to take the time to write. 0815. (a) The selected number is 35. Label the alphabetized lists 001 to 500 for the females and 0001 to 2000 for the males. Some businesses may only use cell phones or Web sites and choose not to be listed in the phone book at all. (c) The population is all claims filed in a given month. We have used two-digit labels. 23 (block 3004).Moore-212007 pbs November 27. A systematic sample doesn’t allow every set of n individuals a chance of being selected. 19 (block 3000). (a) “A national system of health insurance should be favored because it would provide health insurance for everyone and would reduce administrative costs. 159. 1025.31 3.21 3. and 11—A2220. and 5 would have no chance of being selected in a systematic sample.3. 18 (block 2011). The selected sample is 12—B0986. 3. (c) and (d) Answers will vary.25 3. and 195. Lay them out in a diagram like that in Figure 3. (c) Twelve batches. while at the same time the question of whether such a move would have a deleterious effect on the academic freedom so important to such institutions cannot be ignored. the factor is the type of medication given. 087. The first 5 females are those with the labels 138.

Group 1 will contain: Dubois (05). Group 2 will then contain the remaining subjects: Abate. Iselin. The response variable is the price the student would expect to pay for the detergent. and the next 5 to Group 3. 06. Regimen B = Moses. A statistically significant result means that it is unlikely that the salary differences we are observing are just due to chance. 02. The first 5 selected are assigned to Group 1. However.57 . Quinones (16). For each group. 08. The remaining 5 are assigned to Group 4. Brown. Kendall. 03. 09. Loren. Mann. (d) The first 3. 01. and the sample mean for the processing is 314. 17. To properly blind the subjects. The explanatory variable is the price history that is viewed. The light/regular group will use the remaining 10 subjects.43 3. Stall. Hwang. This is a comparative experiment with two treatments: steady price and price cuts. Nevesky.55 3.39 To assign the 20 pairs to the four treatments. Group 1 is pairs numbered 16. The levels correspond to the three sets of choices. 07. Chen (04). Wilansky. Jackson.49 3. (a) This is a completely randomized design with two treatment groups. Group 3 is 18.45 3.41 3. Morse. the regular/light group will use subjects with labels: 12. 15. Group 2 is 13. Cruz. If charts or indicators are introduced in the second year. (a) Randomly assign your subjects to either Group 1 or Group 2. Gutierrez. Birnbaum.4 and 3. and 11. (b) Using line 141 on Table B. Ullman (20). 05. The response variable is whether they chose a milk drink or a fruit drink. Use a diagram like those in Figures 3. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-17 3. 05. and 08.47 3. Group 1 will drink them in the regular/light order.5 to describe the design. give each of the pairs a two-digit number from 01 to 20.4 and 3. (a) The subjects and their excess weights. Gerson (08). Brown. 04. Rodriguez. and then the results of the two groups will be compared to see if the order of tasting made a difference to the ratings.5 to describe the design. 19.53 3. and 10. McNiell. (b) The factor is the set of choices that are presented to each subject. Engel. Regimen C = Hernandez. (b) The sample mean for the kill room is 2138. Fluharty (07). 06. and Group 2 will drink them in the light/regular order. 10. you won’t know if the observed differences are due to the introduction of the chart or indicator or due to lurking variables. the next 5 to Group 2. Travers (19).Moore-212007 pbs November 27. With these conventions. Brunk.5. This experiment suffers from lack of realism and limits the ability to apply the conclusions of the experiment to the situation of interest. Regimen D = Deng. Santiago. (c) Use a diagram like those in Figures 3. The remaining 5 pairs are assigned to Group 4. (a) The subjects are the 210 children. Tran. not appearance) cups with no labels on them. and the electric consumption in the first year is compared with the second year. are listed with the columns as the 5 blocks and the 4 subjects in each block labeled 1 to 4. Each group will taste and rate both the regular and the light mocha drink.51 3. Afifi (02). 19. both mocha drinks should be in identical opaque (we are only measuring taste. the taste ratings of the regular and light drinks will be compared. and Rosen. rearranged in increasing order of excess weight. Obrach. Lucero (13). Kaplan. (b) Answers will vary using software. Thompson (18). Smith. Regimen A = Williams. 09. 16. but using line 131 on Table B. (a) This is matched pairs data because each day two related measurements are taken.

08. 0.5 to describe the design.8 away ˆ from the true proportion. 10. Use a diagram like those in Figures 3. then p would only be 0.239 residents interviewed. 20. (b) Each subject will do the task twice.9196. In a controlled scientific study. 09.85 3. ˆ (b) Approximately 95% of the p values are inside the interval (0. (b) If we use a p of 0. ˆ (a) Worst case scenario would be that we have a p of 1. (a) Approximately 10 million to 70 million dollars.4 and 3. 01. the effects of factors other than the treatment can be eliminated or accounted for. Label the plots across rows. (a) Population is all people who live in Ontario.61 3.63 3.89 . The difference in the number of correct insertions at the two temperatures is the observed response.4 and 3. (a) Use a diagram like those in Figures 3. ˆ (a) p = 0. (d) Sampling variability decreases as the sample size increases. (b) Use a diagram like those in Figures 3.75 3. Sample is the 61. (a) Use a diagram like those in Figures 3. 3. (c) It is impossible to know how far off this sample proportion is from the true population proportion because we do not know what the true population proportion is. while others will not.788 million dollars. 2007 9:50 S-18 SOLUTIONS TO ODD-NUMBERED EXERCISES 5 children to be assigned to receive Set 1 have labels 119. This is a very large probability sample. 3.515 is a statistic. The first 10 plots selected are labeled 19.65 3.Moore-212007 pbs November 27.5 to describe the design. which is close to the other medians.67 3.59 (a) The more seriously ill patients may be assigned to the new method by their doctors.4 and 3. 02.5 to describe the design.8133 away. so this is a good indication of where the true proportion is. 06. and 148. (a) 25. (b) Worst case scenario is the true proportion is 0.9196). (b) Approximately 5 million to 80 million dollars. 033.503 is a parameter and 2.71 3. 16. 199.79 3. and the remaining 10 are assigned to Method B. The seriousness of the illness would be a lurking variable that would make the new treatment look worse than it really is. These are assigned to Method A.5 away from the true proportion at most. The number of adults is larger than the number of men. ˆ (c) Approximately 95% of the p values from 1200 samples are between 0.87 3. (c) 13. which is smaller. (c) Approximately 18 million to 40 million dollars.69 3.4 and 3. (b) Yes. Then the estimate would be 0. (b) 30.8133. once under each temperature condition in a random order. and 07.81 3. 192.3 million dollars.73 3. The number 73% is a statistic and the number 68% is a parameter.234 million dollars.5 to describe the design. 2. Some people might object because some participants will be required to pay for part of their health care.77 3.83 3. it may take a long time to carry out the study.5.8072 and 0. which would be 0.8072. Draw a rectangular array with five rows and four columns. (b) As a practical issue. which is smaller. The larger sample size suggested by the faculty advisor will decrease the sampling variability of the estimates.

For a sample of invoice numbers 9. An activist for patients’ rights might be acceptable. (c) This is not acceptable. 1.8. 3. (c) These 10 samples can be used to draw a histogram.17%. 0. ˆ ˆ (a) Starting at line 101.93 3. (b) The explanatory variable is the type of ad which is shown to the student.Moore-212007 pbs November 27.107 Answers will vary.2. and 6 corresponding to days past due of 6. 3.115 (a) This is a completely randomized design with 2 treatment groups (20 women in each group). 0. 0. and their viewpoint is important to consider. the center ˆ should be 0. (c) Most people probably will not be able to accurately remember how many movies they have watched in a movie theater over the past 12 months. 3. (b) This is acceptable as long as no names will be reported and the social psychologist doesn’t interfere in any way. 3. (a) This is acceptable. Scientists may be so concerned with the experiment itself that they do not fully consider the experience of participating in the experiment from the subject’s point of view.6. (b) Answers will vary.111 (a) This is an experiment because the students are reacting to the ads they were shown. 0.101 (a) The pollsters must tell the potential respondents what type of questions will be asked and how long it will take to complete the survey. 0. However. however. 12. (b) This is required so the respondents can make sure the polling group is legitimate or so the respondent can issue a complaint if necessary. the average is x = 10. (d) The center is close to 0.95 3. p = 0. (a) Many of the subjects will be non-scientists. and the response variable is the expected price for the cola which the student states. record the each women’s choice of company. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-19 3.4. The other two courses have acceptable alternatives which make the use of the students more ethical. which does not seem ethical. (d) The results would be more accurate if the question concerned just the past month.6.0. and 0.4. and 15.2. The ads are the treatment which is applied to the subjects. At the end of the experiment. 3. This is not anonymous because it takes place in the person’s home. (b) Using Table B. 7. 0.6. 8.6.6.103 Psychology 001 uses dependent subjects. 3. That is a very long time period. it is confidential if the name/address is then separated from the response before the results are publicized. (b) Answers will vary depending on the random numbers you choose. The center of the histogram of the 10 repetitions should not be far from 8.91 (a) Answers will vary depending on the random numbers you choose.109 Answers will vary. If more samples were taken. One group would receive the brochures for Company A and for Company B with child-care. 3.105 Answers will vary. The other group would receive the brochures for Company A and for Company B without child-care. and they were not warned ahead of time to keep track of their ticket stubs. (b) 37. Choosing a member of the clergy might be tricky—which religion would be picked? A medical doctor might be considered just another scientist.8.6 since p is an unbiased estimator of p.99 3. the activist would have a bias before even reviewing the experiment. 3.97 3. 0.113 (a) All adults. (b) The 9 additional samples have p = 0. starting at line .

3.4 and 3. Chen (06).117 This is an observational study because no treatment is applied.6 becomes smaller as the sample size increases.Moore-212007 pbs November 27. Leading questions can introduce strong bias.5 to describe the design. (b) Use a diagram like those in Figures 3. (c) The first five subjects assigned to the beta carotene group are subjects 731.125 The nonresponse rate can produce serious bias.137 Answers will vary depending on the random numbers you choose. 3. Brown (04). (f) People who eat lots of fruits and vegetables tend to have different diets. Increasing the sample size decreases the sampling variability of p . Gupta (16). Ullmann (38). The two types of muffin are the treatments. One group is assigned to each treatment. 3. To carry out the random assignment. 304. First number the 20 recitations from 01 to 20. (d) Neither the subjects nor those evaluating the subjects’ response knew which treatments were applied. McNeill (27). 10 in each group. The explanatory variables are the four different supplement combinations. Gerson (14). Travers (36). about half. 3. etc.133 (a) The factors are “storage method” at three levels and “when cooked” at two levels. 3. and Afifi (03).119 This is a matched pairs experiment. 3. Morse (28). of the rats with genetic defects would be in the experimental group. Answers will vary depending on where in the table you start. The histograms are becoming more concentrated about their center. may exercise more. 3.123 (a) There may be systematic differences between the recitations attached to the two lectures. 253. (b) Randomly assign the 20 recitations to the two groups. 3.135 Subjects should not be told where each burger comes from and in fact shouldn’t be told which two burger chains are being compared. Adamson (02). Danielson (09). 3. 3. enter Table B and read two-digit groups until 10 recitations have been selected. . 3. Chen (06). For the firms sent follow-up questionnaires. (e) Any observed differences are due to chance. Kim (23). (b) One possible design is to take the group of judges and divide them at random into six groups. and 296. Roberts (32). it is important to draw the histograms using the same ˆ scales. Group 1 will consist of: Wong (40). nonresponse was a serious problem as well. For the original questionnaires. may smoke less. Rivera (31).6. Sugiwara (34). which is p = 0.127 Take an SRS from each of the four groups for a stratified sample.131 (a) The response variable is whether or not the subject gets colon cancer. there was a 37% response rate. 2007 9:50 S-20 SOLUTIONS TO ODD-NUMBERED EXERCISES 161. (c) Having each subject taste fries from each of the six treatments in a random order is a block design and eliminates the variability between subjects. The other 20 subjects would be in Group 2.121 The wording of questions has the most important influence on the answers to a survey. and both of these questions lead the respondent to answer Yes.129 This is a sensitive question and many people are embarrassed to admit that they do not vote. 3.139 For comparison purposes. which results in contradictory responses. The researcher is just measuring the men as they already are. or 5. We would expect that in a long series of random assignments. Janle (21). 470. Edwards (11). ˆ so the chance of getting a value of p far from 0.

(a) S = {any number (including fractional values) between 0 and 24 hours}. 10. the center was about 65% or 66%. .29 4.13 4. We tossed a thumbtack 100 times and it landed with the point up 39 times. (a) P(Y > 300) = 0. 8. (a) P(completely satisfied) = 0. 4.15 4. there is a 0. The longest run of “not buys” was 5.02. (a) In our 100 draws the number in which at least 14 people had a favorable opinion was 37. P(death was related to some other occupation) = 0.27 4. the percentages add up to 100%. The probabilities do not sum to 1. (c) 0.37.000}.747. . (a) Answers will vary.287.17 4. 3. A success is an item which is returned. .919.19 4. (a) Yes.11 We spun a nickel 50 times and got 22 heads. (d) 0. .6. (b) This simulation represents 100 randomly selected items which have been purchased. We estimate the probability of heads to be 0. (b) The probability that T is less than 1 is 0. Answers will vary. Model 3: Not legitimate. (c) The probability that T is less than 0. .790.025. (c) 0.25 4. Over the long run. 7.9 4.39.7 4. Model 4: Not legitimate. (c) The shape of our histogram was roughly symmetric and bell-shaped.33 4.65.21 4. (a) P(not farmland) = 0. The first 200 digits contain 21 0s. (c) 0. . . 9. (a) 0. (e) We might take S = {any possible number. 2.105.44. 6. and the values ranged from 58% to 74%. (c) 0. The proportion of 0s is 0.5 4. . (a) The area of a triangle is (1/2) × height × base = (1/2) × 1 × 2 = 1. The probabilities do not sum to 1. (c) P(something other than farmland or forest) = 0. Some of the probabilities are greater than 1. Model 1: Not legitimate.08. 11.1 4.5. The approximate probability of landing point up is 0. (a) 0. (b) 1.69.39 since probabilities sum to 1.93. (a) P(Y > 1) = 0.19. (b) 0. 12}. (b) 0. The histogram in (b) is more spread out.07. 4. P(death was either agriculture-related or manufacturing-related) = 0. . The percent of simulated customers who bought a new computer is 50%.31 4.Moore-212007 pbs November 27. (d) One possibility is S = {any number 0 or larger}. (b) S = {any integer value between 0 and 11. but should be close to 6%.35 .23 4.14. (b) The shape of our histogram was roughly symmetric and bell-shaped. (c) S = {0. (a) The number of “buys” (1s) in our simulation was 50. and the values ranged from 40% to 90%. (b) P(Y > 370) = 0.28. Model 2: Legitimate. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-21 CHAPTER . (d) Both distributions are roughly symmetric with similar centers.01.4 4.3 4.5 is 0. (b) P(dissatisfied) = 0. 5. .1125. 1.06 probability that an item which is purchased will be returned. the center appeared to be about 65%. (b) The longest run of “buys” (1s) was 4. (b) 0.5. either positive or negative}. The approximate probability is 0. (b) P(either farmland or forest) = 0.253.
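Several of these answers estimate a probability by the long-run proportion of times an event occurs (coin spins, thumbtack tosses, simulated "buys"). Here is a minimal sketch of that idea; the assumed true probability of 0.39 is purely an illustration.

# Long-run relative frequency as an estimate of a probability (assumed p = 0.39).
import numpy as np

rng = np.random.default_rng(42)
p_true = 0.39                            # assumption for the demo, not from an exercise
outcomes = rng.random(100_000) < p_true
print("estimated probability:", round(outcomes.mean(), 3))   # settles near 0.39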

000 miles is P(X > 50. P(A or B) = 0. 6. (c) Households that have more cars than the garage can hold have 3 or more cars. If we record the size of the hard drive chosen by many. NOO. σ X −Y = 10. so P(X = 60. (b) Not legitimate.43.225. Profit downtown is 35 and μ35Y = 6825.Moore-212007 pbs November 27. yellow. ONN.47.000) = 0. (a) Continuous. the average will be close to 18. or 4.000) = 0. NNO. (a) Legitimate. (d) Discrete.43 4.71. The probabilities of the outcomes do not sum to 1.125. σY = 6475. 1.39 4. (c) P(X > 5) = 0.1. All times greater than 0 are possible without any separation between values. 4.000.0344.0344. (b) P(X ≥ 5) = 0. (b) Discrete. If the correlation between two variables is positive. P(B) = 0.29.91.125 P(X ) 0. P(X ≥ 1) = 0. (a) The probability that a tire lasts more than 50.825. (a) P(A) = 0. σ X −Y = 100. σY = 5600. (c) The normal distribution is continuous. as it is not one of the possible choices and does not indicate which choice is most popular.000) = 0 and P(X ≥ 60. Knowing μ is not very helpful. Also. μY = 195.55 4.000) = P(X > 60.47 4. (a) The 8 arrangements of preferences are NNN.5.11.5. small values of one tend to be associated with small values of the other.2. Each must have probability 0. This suggests that when two variables are positively associated. 2007 9:50 S-22 SOLUTIONS TO ODD-NUMBERED EXERCISES 4. (c) X 0 1 0. It is a count that can take only the values 0. or 7. and σ X = 74.45 (a) All the probabilities given are between 0 and 1 and they sum to 1. (b) P(X > 60.61 .5.83. (c) P(plain M&M is red.00. (b) {X ≥ 1} means the household owns at least 1 car. (f) P(a randomly chosen household contains more than two persons) = P(X > 2) = 0. ONO. P(A does not occur) = 0. (b) Profit at the mall is 25X and μ25X = 7000. Any number 0 or larger is possible without any separation between values.32.47. 5. (a) μ X = 280.18.375. 3.375 2 0. many customers in the 60-day period and compute the average of these sizes. (b) P(blue) = 0.375 3 0. and σY = 138. then large values of one tend to be associated with large values of the other. resulting in a relatively small difference. (d) P(2 < X ≤ 4) = 0. again resulting in a relatively small difference.49 4. they vary together and the difference tends to stay relatively small and varies little.57 4. (c) Continuous. (a) All the probabilities given are between 0 and 1 and they sum to 1.65. 4. Household size can take only the values 1.41 (a) P(blue) = 0.51 4. and σY = 80. or orange) = 0. 3. (b) “A does not occur” means the farm is 50 acres or more. σY = 19. (b) μY = 195.37 4. The sum of the probabilities is greater than 1.125 4.5. (c) Not legitimate.75.000) = 0. OON. μ = 18. 2. NON. (b) P(X = 2) = 0. (e) P(X = 1) = 0. or orange) = 0.00. 2 2 (a) μ X = 280.59 4. P(peanut M&M is red.53 4. OOO. 2.04. yellow. μ25X +35Y = 13. 2 μY = 445. (c) “A or B” means the farm is less than 50 acres or 500 or more acres. 2 μ X −Y = 100. 20% of households have more cars than the garage can hold.5. (c) The combined profit is 25X + 35Y.
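The means and standard deviations quoted in this stretch come from μ = Σ x P(x) and σ² = Σ (x − μ)² P(x). A quick sketch follows, using the 0, 1, 2, 3 distribution with probabilities 0.125, 0.375, 0.375, 0.125 tabulated above.

# Mean and standard deviation of a discrete random variable from its probability table.
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]

mu = sum(x * p for x, p in zip(values, probs))
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
print("mu =", mu, " sigma =", round(var ** 0.5, 3))   # 1.5 and about 0.866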

73 4.720 0. (c) 379. If X and Y are positively correlated. μ X +Y = 31 seconds. (c) Orlando and Disney World are close to each other. The histogram for renter-occupied units is more peaked and less spread out than 2 2 the histogram for owner-occupied units.71 4.0010.47.28 bets per second. σrented = 1. the probabilities add up to 0. This reflects the fact that the center of the distribution of the number of rooms of owner-occupied units is larger than the center of the distribution of the number of rooms of renter-occupied units.35 3 0.69 218. small values of X and Y tend to occur together.99. Rainfall usually covers more than just a very small geographic area. X + Y exhibits larger variation when X and Y are positively correlated than if they are not. If the 0. There are 1000 three-digit numbers (000 to 999). resulting in a small value of X + Y .63 (a) 720.64074.77 4. your probability of winning is 6/1000 and your probability of not winning is 994/1000. 4.003 −1280 0.13 (c) Yes. then the probabilities add up to 1.67 (a) 0. If you pick a number with three different digits. (a) If X is the time to bring the part from the bin to its position on the automobile chassis and Y is the time required to attach the part to the chassis.71174. We would not expect X and Y to be independent.08 1 0. (a) −1280.18 5 0. (b) 0. resulting in a very large value of X + Y . then large values of X and Y tend to occur together. (b) Experience suggests that the amount of rainfall on one day is not closely related to the amount of rainfall on the next. of transactions % of clients 0 0. then the total time for the entire operation is X + Y . (b) It will not affect the mean. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-23 4.19 4 0.125 is rounded to 0.30833. σ X +Y = 4. (b) No.12.65 4. because they are not widely separated in time.49998. (a) We would expect X and Y to be independent because they correspond to events that are widely separated in time.997 (a) Two important differences between the histograms are (1) the center of the distribution of the number of rooms of owner-occupied units is larger than the center of the distribution of the number of rooms of renter-occupied units and (2) the spread of the distribution of the number of rooms of owner-occupied units is slightly larger than the spread of the distribution of the number of rooms of renter-occupied units.98. (b) X Probability 4. 4. μrented = 4.125 is rounded down to 0.79 .17 2 0. σrented = 1. Likewise.75 4.1131. perhaps slightly dependent.284.187.13. If X and Y have correlation 0. σ X +Y = 4. σowned = 2. σowned = 1.3.Moore-212007 pbs November 27. Thus. We might expect X and Y to be independent or. The mean number of rooms for owner-occupied units is larger than the mean number of rooms for renter-occupied units. Your expected payoff is $0. (c) The answer in (a) will remain the same in both cases. (b) μowned = 6.69204. if for 5 transactions the 0.

2 (a) μ X = 550. this average would be close to μ. σ X −550 = 32. (b) μfemale−male = 15.000 is a parameter. That is not right. Pfeiffer undoubtedly tested only a sample of all the models produced by Apple. σ0. 2 σ0. 4.95 These are statistics. So σ X +Y = σ X + σY . 14 is a statistic. σY = σ(9X/5)+32 = 105. P(x > 2) = P(Z > 4. Thus. Thus. Six of seven at-bats is hardly the “long run.95. 4.2Y = 15.8W +0. σ X +Y = σ X + σY + 2ρ X Y σ X σY = σ X + σY + 2σ X σY = (σ X + σY )2 . 4.2Y . in the long run. It only tells us what will happen if we keep track of the long-run average. His expected winnings are −$0.76). he will find that he loses an average of 40 cents per bet.81 4.5) = σ . The law of large numbers tells us that if the gambler makes . 2007 9:50 S-24 SOLUTIONS TO ODD-NUMBERED EXERCISES 4.60.4 computed in part (a). σ Z = 5.8W σ0.” Furthermore.135635594 × 10 .599.83 4.947.89 4. The law of large numbers says that in the long run.5) + 2 2 (μ − σ − μ X ) (0.4. σ X = σ . the law of large numbers says nothing about the next event. 5.00 to play each time. The mean return remains the same.93 4. and these means are computed from these samples. 2 2 (c) μY = μ(9X/5)+32 = 1022. Thus.101 (a) To say that x is an unbiased estimator of μ means that x will neither systematically overestimate or underestimate μ in repeated use and that if we take many.763.Moore-212007 pbs November 27. (b) Using line 120. x = 67. 4. repeat this many times.109 The gambler pays $1.00 for an expected payout of $0. 4.5) + (μ − σ )(0. which is close to the value of μ = 69. (b) σ Z = σ2000X +3500Y = 13 3. σfemale−male = 2009 and 2 σfemale−male = 44.812. many samples.7.70 because we no longer include the positive term 2ρσ0. (a) The two students are selected at random and we expect their scores to 2 be unrelated or independent.40. (a) Using statistical software we compute the mean of the 10 sizes to be μ = 69.99 4. 2 2 2 (a) σ X +Y = 7. 7.6.2Y = 3. (b) μ X −550 = 0.91 4. 4.5 and σ X −550 = 5. (c) We cannot find the probability that the woman chosen scores higher than the man chosen because we do not know the probability distribution for the scores of women or men.6 is a statistic.107 19 is a parameter. the average payout to Joe will be 60 cents.5) = μ.7. Thus. which is approximately 0. (c) The center of our histogram appears to be at about 69. Joe pays $1.85 2 μ X = (μ + σ )(0.25. these results will vary less from sample to sample than the results we would obtain if our samples were small.105 500. and compute the average of these x-values. compute the value of some statistic (such as x).053 per bet.674.97 4. This is smaller than the result in Exercise 4.8W +0.13. calculate x for each. if Joe keeps track of his net winnings and computes the average per bet. and keep track of our results.000.87 4.3 and σY = 10. (b) If we draw a large sample from a population. and 5. our SRS is companies 3. 4. The law of large numbers tells us that the long-run average will be close to 34%. σ X = 5. 2 2 2 2 2 If ρ X Y = 1.82. However. 0. so in the long run his average winnings are −$0.75 and σ X +Y = 2795.103 The sampling distribution of x should be approximately N (1.26. σ X = (μ + σ − μ X )2 (0.085).
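The addition rules used throughout these answers, μ(X+Y) = μX + μY and σ²(X+Y) = σ²X + σ²Y + 2ρσXσY, can be checked numerically. The sketch below is only a demonstration; the means, standard deviations, and correlation are assumed values, not data from an exercise.

# Numerical check of the addition rules for means and variances (assumed demo values).
import numpy as np

rng = np.random.default_rng(3)
mu_x, mu_y = 550.0, 100.0
sd_x, sd_y, rho = 32.0, 10.0, 0.5
cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
X, Y = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000).T
print(round((X + Y).mean(), 1), mu_x + mu_y)                               # means add
print(round((X + Y).var(), 0), sd_x**2 + sd_y**2 + 2 * rho * sd_x * sd_y)  # rule with correlation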

131 P(Chavez promoted) = 0.60) = 0.28. We find L = 133. (b) P(x ≥ 124) = P(Z ≥ 21.45. (b) E(Y ) = 620 and = 12.73 < Z < 2. 17. (c) There is always sampling variability.55. P(x < 295) = P(Z < −2. (b) 9.031.113 P(X > 210. P(750 < Y < 825) = P(−1. 4.9535.9990.8558.137 The weight of a carton varies according to an N (780. 4.11 to 0.808. this average will be close to −$0. and computes the average of these.19 will contain approximately 95% of the many x’s.123 Sheila’s mean glucose level x will vary according to an N (125. (e) P(not in occupation D or E) = 0. keeps track of his net winnings. 4.94). but this variability is reduced when an average is used instead of an individual measurement.0071. which is approximately 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-25 a large number of bets on red. 5) distribution. (c) The standard deviation is the same for both. 4.1587.65). (b) P(worker is female) = 0.Moore-212007 pbs November 27. (b) The mean contents x will vary according to an N (298. (b) 0.45) = 0. 4. the smaller the variability. (d) The probabilities using the central limit theorem are more accurate as the sample size increases. (d) P(occupation D or E) = 0.0866. 4.125 (a) E(X ) = −620 and = 12. Letting Y denote the weight of a carton.031. 4.808.0462) distribution. 4. 4.3 cents per bet on average.0125.0052.135 The probability distribution is Y 1 2 3 4 5 6 7 8 9 10 11 12 P(Y ) 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 4. (b) μ = 2.53) = 0.225) distribution. 1. 4.96. which does not require that we know the distribution of weekly expenses. (c) P(not in occupation F) = 0. The larger the sample size.127 (a) 0.111 (a) The mean x of n = 3 weighings will follow an N (123. The money received from 12 policies (unless extra charges for costs and profits are huge) would not cover this replacement cost.119 The range 0.121 (a) P(x > 400) = P(Z > 10.72.133 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (c) 0. He will find he loses about 5.3544.32) distribution. . (b) We would need to know the distribution of weekly postal expenses to compute the probability that postage for a particular week will exceed $400. which is approximately 0.129 (a) 0.141 If a single home is destroyed by fire.053. 4. 4. 4. 4. L must satisfy P(x > L) = 0. 4. 0.117 (a) P(X < 295) = 0.05. The 150-bag probability calculation is probably fairly accurate.25.43. but the 3-bag probability calculation is probably not.139 (a) All the probabilities listed are between 0 and 1 and they sum to 1.115 0. In part (a) we applied the central limit theorem. the replacement cost could be several hundred thousand dollars.
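The probabilities and cutoffs in this group come from the sampling distribution of the mean, x̄ ~ N(μ, σ/√n). A brief sketch of both kinds of calculation follows; the parameters N(125, 10) and n = 4 echo the glucose-level setup described above but should be read as an illustration only.

# Probabilities and cutoffs for x-bar ~ N(mu, sigma/sqrt(n)) (illustrative numbers).
from scipy.stats import norm

mu, sigma, n = 125, 10, 4
se = sigma / n ** 0.5                      # 5
print(norm.sf(140, loc=mu, scale=se))      # P(x-bar > 140)
print(norm.ppf(0.95, loc=mu, scale=se))    # L with P(x-bar > L) = 0.05, about 133.2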

σ X = 9707.160 0. 100} is one possibility (but assuming a person could be employed for 100 years is extreme). (a) 15% drink only cola.5 5.73)2 = 0.7 5. (b) Because we are allowing only a finite number of possible values. The more policies the company sells. 1.73)2 = 0. Since the years are independent. .0041. 4. and that the company will make money. (c) We included 101 possible values (0 to 100).91)6 = 0.65908.27)2 (0. 0.568.27)2 (0.144 (0. 3}. 5.57. .144 (0.0000005. .826. . X is discrete. . 2007 9:50 S-26 SOLUTIONS TO ODD-NUMBERED EXERCISES Although the chance that a home will be destroyed by fire is small.145 σ X = 94. one can appeal to the law of large numbers and feel confident that the mean loss per policy will be close to $250.73) = 0. .35.1 5. 2. the company can be reasonably sure that the amount it charges for extra costs and profits will be available for these costs and profits.27)(0. 2.35.65 = 0.17 . (c) 0.11 5. .389 0. .27)2 (0.432 Arrangements Probability of each arrangement Probability for W W =2 W =3 5. (c) 0.389 (0. 0.0197 Part (c) 0. the risk to the company is too great. 0. (b) 20% drink none of these beverages. (b) The probability of the price being down in any given year is 1 − 0.144 (0.73)2 = 0. . 2 4. 3. (a) W = {0.73) = 0. Thus.27)(0.6.99058.0532 (0.8.27)(0. .09)6 = 0.Moore-212007 pbs November 27. If one sells thousands of policies.73)3 = 0.0532 (0.32768. (a) 0. 4.147 (a) S = {0. (c) 0.09608. .0532 (0. .73) = 0.149 (a) S = {all numbers between 0 and 35 ml with no gaps}. . (b) 0. (b) 0.9 (a) The probability that the rank of the second student falls into the five categories is unaffected by the rank of the first student selected.27)3 = 0. (b) (0.1681. (b) S is a continuous sample space. the better off the company will be.337.35. 1.143 P(age at death ≥ 26) = 0.197 (a) (0. CHAPTER . Part (b) W W =0 W =1 DDD DDF DFD FDD DFF FDF FFD FFF (0. μ X = 303. the probability of the price being down in the third year is 0.236. .5 5.5450.66761. 4.13 5. (c) We included an infinite number of possible values.2746. (a) 0.15 0.3 5. All values between 0 and 35 ml are possible with no gaps.

σ = 0. (a) n = 10 and p = 0. (b) The writer’s first statement is correct. σ = 1. P(X ≤ 300) = 0. √ √ μ = 8 and σ = np(1 − p) = 20(0. P(X = 4) = 0.0640.455.35) = 1. (c) P(X = 0) = 0. This probability is extremely small and suggests that p may be greater than 0.25 and √ σ = np(1 − p) = 5(0. there is less variability in the values of X .0074. and P(AB and Rh-negative) = 0. P(X = 3) = 0. 5. The odds 3 against throwing three straight 11s.65. P(A and Rh-negative) = 0.0720. P(B and Rh-negative) = 0. (b) P(X ≥ 10) was evaluated for a sample of size 20 and found to be 0. This description fits this setting.27 5.6) = 2.342.37 5.0064. The value of 3.000171.3124.33 5. ˆ (a) X is binomial with n = 1555 and p = 0.193) = 0. (b) We know that the count of successes in an SRS from a large population containing a proportion p of successes is well approximated by the binomial distribution.0146. and when p = 0.3 = (0.0101.4)(0.254.0879.4. 5. 1.5256. P(AB and Rh-positive) = 0. the writer multiplied the odds.21 5. P(A and Rh-positive) = 0. 0.2) = 1. (c) P(X ≤ 2)√ P(X = 0)+ = P(X = 1) + P(X = 2) = 0. As the value of p gets closer to 1.25 5. (b) P(X = 2) = 0.0488. (a) X can be 0.45 5. μ p = p = 0. (a) μx = np = 6. and μx = 600 if ˆ n = 1200. (d) μ = 3. P(X = 2) = 0.789. 2.3364.37.3360.75) = 1. The assumption of a fixed number of observations is violated. P(X = 1) = 0.0010.65)(0.43 5.3780. μ p = 0. which is not the correct way to compute the odds for the three throws. (b) P(X = 0) = 0.3762. (b) Using the Normal approximation to the binomial for proportions. P(O and Rh-positive) = 0.0924. 1.0019.0053.6)(0. The proportion in the sample will be closer to 40% for larger sample sizes. 4.191. (b) μx = 60 if n = 120.99. are 1−P = 1−(2/36) = 5831 to 1.9. however.35 5. (b) The possible values of X are 0.2373.Moore-212007 pbs November 27. 2. (a) The 20 machinists selected are not an SRS from a large population of machinists and could have different success probabilities.47 .20. 3.20.3955. (a) Using the Normal approximation. P(X = 11) = 0.25.41 5. P(X = 4) =√ 0. so the chance of the proportion in the sample being as large as 50% decreases.8)(0. (a) n = 5 and p = 0.2451.067. μ p stays the same regardless of sample size. The probability of three 11s in three independent throws is 0. (a) The probability of an 11 is 2/36.25)(0. P (2/36)3 When computing the odds for the three tosses. P(B and Rhpositive) = 0. and P(X = 5) = 0. P(X = 1) = 0. (d) μ = 2. Using software for the binomial distribution. 4.23 Yes. P(O and Rh-negative) = 0.19 5.29 5.0176. √ √ (a) μ = 16.5 and σ = np(1 − p) = √ 10(0. A and B are independent if P(A and B) = P(A)P(B) and 0.0336.2447.2637.31 5.39 5.2816. These probabilities can be used to draw a histogram. and ˆ ˆ ˆ the P( p ≤ 0.1811.25 should be 5. P(X = 2) = 0. 3. P(X = 3) = 0. P(X ≥ 100) = 0.5. (c) When p = 0.5). and P(X = 5) = 0.1160. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-27 5. (b) σ = np(1 − p) = 20(0. σ p = 0.
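The binomial answers here follow from the pmf, the mean np, and the standard deviation sqrt(np(1 − p)). A small sketch for n = 5 and p = 0.65, one of the parameter pairs appearing above:

# Binomial pmf, mean, and standard deviation for n = 5, p = 0.65.
from scipy.stats import binom

n, p = 5, 0.65
for k in range(n + 1):
    print(k, round(binom.pmf(k, n, p), 4))
print("mean:", n * p, " sd:", round((n * p * (1 - p)) ** 0.5, 4))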

P(X ≤ 175) = 0. (d) n must be increased to 200. 5. Given that the customer defaults on the loan.Moore-212007 pbs November 27. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.67 5.12.7660. (c) Poisson with mean 2 × 48. (b) 0. 5.1472. √ √ (a) σ = μ = 2. P(X ≤ 70) = 0. (b) P(X ≤ 66) = 0. (d) 0.89% chance that the customer overdraws the account.69 5.59 5.52.4 = 9.53 5. (b) P(X ≤ 175) corresponds to Jodi scoring 70% or lower. (c) 0. √ √ (a) σ = μ = 17 = 4.5 is reasonable for the count of successes in an SRS of size n from a large population. 0. (b) 0.323.0620. the probability is 0. (b) 0. Using Bayes’s rule.57 5.25.0248. √ √ (a) μ = 180 and σ = np(1 − p) = 1500(0. (b) P(X ≤ 10) = 0.3090.2424.7 = 97. (c) 0.4776. Drawing a diagram should help.0344. (a) The binomial distribution with n = 150 and p = 0.0489. P(X ≤ 170) = 0.63 5. P(X ≤ 70) = 0.9986 = 0.0014.73 5. (a) The employees are independent and each equally likely to be hospitalized.51 5. (c) Poisson with mean 1/4 × 14 = 3. These answers are simpler to see if you first draw a tree diagram.4335.2148. Using the Normal approximation.7254 = 0. (a) Poisson distribution with a mean of 12 × 7 = 84 accidents.1730 = 0. (b) P(X > 5) = 1 − P(X ≤ 5) = 1 − 0. (b) 0.0821.20. 0. Using Bayes’s rule. (b) We expect 75 businesses to respond. (b) Poisson with mean 1/2 × 14 = 7.256. then P(B|A) = P(B). (a) 0. (d) 0.2746. P(Y < 1/2 and Y > X ) = 1/8.87. For a 30-minute √ √ period. (b) Using the Normal approximation.4617.0300.3150.7. (c) Using the Normal approximation.8270. (b) For a 15-minute period.586.4450.5970. (a) 0.55 5. (c) k = 3. 2007 9:50 S-28 SOLUTIONS TO ODD-NUMBERED EXERCISES included in the histogram from part (c) and should be a good indication of the center of the distribution.61 5.8989.71 5.3333.75 5.5905 = 0.62. which is not the case.3 = 1. the probability is 0.88) = 12.9700 = 0.1251.938.49 (a) Using the Normal approximation.85 5. (a) 0. (a) P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.0491. there is an 89. σ = μ = 48.7 = 6.98.65 5.4. (a) 0. (a) 0. 0. (c) If the events A and B were independent.87 .12)(0.77 5. (c) P(X > 30) = 1 − P(X ≤ 30) = 1 − 0.79 5.2061. (a) Poisson distribution with a mean of 48.81 5.9982. σ = μ = 97. P(X ≥ 100) = 1 − P(X ≤ 99) = 1 − 0. as in this case. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.83 5. (b) 0. (b) 0. P(X ≥ 50) = 1 − P(X ≤ 49) = √ √ 0.4095.0018 = 0.5.
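The Poisson answers use two facts: counts over disjoint periods add (12 per day gives 12 × 7 = 84 per week), and σ = sqrt(μ), which also drives the Normal approximation. A brief sketch follows; the cutoff of 100 accidents is an assumed example, not a value from a specific exercise.

# Poisson tail probability and its Normal approximation (mean 84 per week; cutoff assumed).
from scipy.stats import norm, poisson

mu = 12 * 7
print("exact Poisson:", round(poisson.sf(99, mu), 4))                      # P(X >= 100)
print("Normal approx:", round(norm.sf(99.5, loc=mu, scale=mu ** 0.5), 4))  # continuity correction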

. (c) Using the Normal approximation. (c) The people are independent from each other. the probability is 0. (a) μ = 3. which is not true by comparing the answers in (a) and (b). Using Bayes’s rule. 5. n = 1244. (c) With the credit manager’s policy. .0404. it is important to take the observations under similar conditions.84.5 6.751.1065. (b) 0.115 (a) 0. (d) Using software. (b) 0. (a) The response rate is 1468/13. .117 Using Bayes’s rule.68. the vast majority of those whose future credit is denied would not have defaulted.75.97 5.1029.99 5. (b) If there are systematic patterns in the organizations that did not respond.55 100 0.Moore-212007 pbs November 27. (b) 0. (c) P(first one on kth toss) = ( 5 )k−1 ( 1 ).01592. . 2 × (standard deviation for x) = $44. .674.109 (a) Drawing the tree diagram will be helpful in solving the remaining parts.98 6. 5.91 million dollars. . (189.48.787. 5.5).93 5.107 (a) 0. CHAPTER .11 . 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-29 5.6 6. 6 6 6 6 6 6 5. the probability is 0.0336. 5.1416. 5. (b) 0.3 6.1129.9992 = 0. The small margin of error is probably not a good measure of the accuracy of the survey’s results. 250. (b) 0.89 (a) Using Bayes’s rule. the survey results may be biased. Yes. (b) Expect 94 not to have defaulted.105 0.627. (b) ( 5 )2 ( 1 ).103 (a) The probability of a success p should be the same for each observation. we should have P(in labor force) = P(in labor force | college graduate).1930. (c) If the events “in labor force” and “college graduate” were independent. (b) The probability of the driver being male for observations made outside a church on Sunday morning may differ from the probability for observations made on campus after a dance. 5. . P(X ≥ 1245) = 0.09 million dollars).95 5. 5.75. . A and B are independent if P(A and B) = P(A)P(B) and 0. Sample size Margin of error 10 3.101 (a) μ = 1250. (c) 0.10 20 2. (b) P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0. (b) Using the Normal approximation. (a) 0.7 6.125.5596. To ensure that this is true.064.6)(0. Use n = 1245. (c) P(X ≤ 8) = 0.111 (a) ( 5 )( 1 ).1 6. 5. P(X ≥ 275) = 0.19 40 1. such as the same location and time of day.4557.000 = 0.3 = (0.113 (a) 0. and each person has a 70% chance of being male. P(X ≤ 80) = 0. . .9 The standard deviation for x is $22.91 5.0008. 0.024. the probability is 0.

528. (c) 1. The smaller standard deviation leads to a smaller margin of error.23 6.17 6. we do not know if our interval correctly includes the population percent or not. Thus.137.863. management vs.25 6.25 minutes).Moore-212007 pbs November 27. 128.43).96 ± 0. we do not know if our interval correctly includes the true population percent or not. not to any particular interval produced by the method. white collar jobs. H0 : μ = 0. 95% refers to the method.061).57 years). (b) The standard deviation of the mean weight in kilograms is 0.31). the width of the confidence interval (or the size of the margin of error) decreases. or (101. (c) A 95% confidence interval for the mean weight.77 years or (11.531. (b) The margin of error is 3%. 63.58. This would not be considered strong evidence that Cleveland differs from the national average. Ha : μ < 0.27 6. the margin of error only covers random variation.8 ± 4. (a) No.9186. 35. The confidence coefficient of 95% is the probability that the method we used will give an interval containing the correct value of the population mean study time.41 (a) 115 ± 13.25 minutes. 55%).1142. 2007 9:50 S-30 SOLUTIONS TO ODD-NUMBERED EXERCISES As sample size increases. we say we are 95% confident that this is one of the correct intervals. The mean weight of runners in pounds is 136. 12.21 6. Because the method yields a correct result 95% of the time.35 6. 30.29 6. $18. (a) 95% confidence means that this particular interval was produced by a method that will give an interval that contains the true percent of the population that will vote for Ringel 95% of the time. 6.37 6. 6. (b) P-value = 0.75 minutes. (b) This particular interval was produced by a method that will give an interval that contains the true percent of the population that like their job 95% of the time. The interval includes 50%. of the population is (60.37. The results are not trustworthy.39 . so we cannot be “confident” that the true percent is 50% or even slightly less than 50%.29. (a) The mean weight of the runners in kilograms is x = 61.664). 5. 11. We are 95% confident that the true population percent falls in this interval. 140.099.31 6. entry. the election is too close to call from the results of the poll.and mid-level employees.13 The students in your major will have a smaller standard deviation because many of them will be taking the same classes which require the same textbooks. It does not tell us about individual study times. Some possibilities include blue collar vs. so the 95% confidence interval is 52% ± 3% = (49%.19 6.8 ± 0.490.567.0208.37. in kilograms. In pounds we get (132. Phone-in polls are not SRSs. (b) No. When we apply the method once. (a) z = −1. Use n = 75.15 6. The formula for a confidence interval is relevant if our sample is an SRS (or can plausibly be considered an SRS).53. The standard deviation of the mean weight in pounds is 2. Answers will vary.51 or (26. $17.33 6.90 ± $961. When we apply the method once. Hence.062. n = 74. (d) No. or ($16.03 years.
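Every interval in this stretch has the form x̄ ± z*σ/√n, and the sample-size answers invert the margin of error, n = (z*σ/m)². A minimal sketch with stand-in numbers (none taken from a specific exercise):

# z confidence interval for a mean, and the sample size for a target margin of error.
import math
from scipy.stats import norm

xbar, sigma, n = 61.8, 4.5, 24          # stand-in summary values
zstar = norm.ppf(0.975)                 # 1.96 for 95% confidence
m = zstar * sigma / math.sqrt(n)
print("95% CI:", (round(xbar - m, 2), round(xbar + m, 2)))

target_m = 1.0                          # assumed desired margin of error
print("n needed:", math.ceil((zstar * sigma / target_m) ** 2))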

Z = −0. If a significance level of 0. Because it is unlikely that we would obtain data this strongly in support of the calcium supplement by chance. There is not enough evidence to say that private four-year students have significantly higher debt than public four-year borrowers. do not reject the null hypothesis.05. (b) With a P-value of 0. Ha : μ = 4. (c) Reject the null hypothesis whenever the P-value is ≤ α.807 and z < −2. (a) The null hypothesis should be μ = 0. (b) H0 : μ A = μ B . P-value = 2(0. (b) The standard deviation of the sample mean should be 18/ 30.008. in the population who name economics as their favorite subject. (a) H0 : μ = 31. results that are as strongly or more strongly in support of the calcium supplement if it is really no more effective than the placebo. Ha : ρ > 0.41 If the homebuilders have no idea whether Cleveland residents spend more or less than the national average.43 6.108. then they are not sure whether μ is larger or smaller than 31%.07 we would not reject H0 : μ = 15. The significance level corresponding to z ∗ = 2 is 0.9124. p F . not the sample statistic x.0026. z-values that are significant at the α = 0.807. and thus 15 would fall outside the 90% confidence interval.9713.51 6.63 6. H0 : μ = 100. and the percent of females.0574.005 level are z > 2.05 is used. The P-value is the probability that we would observe. (a) Answers may vary. 0. Ha : μ = 31%. (b) No.049 < 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-31 6. In this case the P-value is 0.57 6.07 we would reject H0 : μ = 15. and the significance level corresponding to z ∗ = 3 is 0.61 6. (a) With a P-value of 0.53 6. (c) The hypothesis test uses the population parameter μ. the P-value is large. (b) P-value = 0. The alternative hypothesis could be √ μ > 0. Ha : p M > p F .45 6. Ha : μ A > μ B .65 . Do not reject the null hypothesis. Ha : μ A = μ B where group A consists of students who exercise regularly and group B consists of students who do not exercise regularly. (c) H0 : μ = 1400. where the parameter of interest is the correlation ρ between income and the percent of disposable income that is saved by employed young adults. p M . simply by chance.049 > 0. Ha : μ = 100.01.0287. The appropriate hypotheses are H0 : μ = 31%.59 6. (a) H0 : p M = p F . Ha : μ < 1400. (a) Yes. which is very small. (b) No. (a) P-value = 0.55 6.0456. 0. There is not enough evidence to say that exercise significantly affects how students perform on their final exam in 6.Moore-212007 pbs November 27. where μ A is the mean score on the test of basketball skills for the population of all sixthgrade students if all were treated as those in group A and μ B is the mean score on the test of basketball skills for the population of all sixth-grade students if all were treated as those in group B. this is strong evidence against the assumption that the effect of the calcium supplement is the same as that of the placebo. (c) H0 : ρ = 0. (c) P-value = 0. (b) H0 : μ = 4. and thus 15 would not fall outside the 95% confidence interval.49 6. where the parameters of interest are the percent of males. Ha : μ > 31. There is not enough evidence to say that the average north-south location is significantly different from 100. but one possibility is a two-sample comparison of means test with H0 : μ A = μ B .4562) = 0.47 6.
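The test statistics here are z = (x̄ − μ0)/(σ/√n), with one-sided P-values from the standard Normal tail and two-sided P-values doubled. A short sketch with assumed inputs:

# One-sample z test: statistic, one-sided and two-sided P-values (assumed inputs).
from scipy.stats import norm

xbar, mu0, sigma, n = 30.2, 31.0, 4.0, 100
z = (xbar - mu0) / (sigma / n ** 0.5)
print("z =", round(z, 2))
print("P (Ha: mu < mu0):", round(norm.cdf(z), 4))
print("P (two-sided):  ", round(2 * norm.sf(abs(z)), 4))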

The result is not significant at the 5% level. (b) z > 1. so we would not reject the hypothesis that μ = 5 at the 5% level.75 6. (b) z = 1.Moore-212007 pbs November 27. surveys of acquaintances only) will produce data for which statistical inference is not appropriate. we would be willing to reject the null hypothesis even when no practical effect exists. (a) (99. This is a call-in poll.0013. which means that our sample statistic would be 0 standard deviations away from the null hypothesis mean. The 95% confidence interval does not contain 7.3821. so we would reject H0 at the 5% level.79 6. (c) It would be good to know how the sample was selected and if this was an observational study or an experiment. Our results are based on the mean of a sample of 40 observations. 6.89 6.56) < 0.64. The P-value is 0. (a) No.225). With a significance level this high.71 6. and such polls are not random samples. This is reasonably strong evidence against the null hypothesis that the population mean corn yield is 135.96.67 z = 3. Poorly designed experiments also provide examples of data for which statistical inference is not valid.81 6. (a) H0 : μ = 7.5 corresponds to a Z value of 0. and such a mean may vary approximately according to a Normal distribution (by the central limit theorem) even if the population is not Normal. (a) z > 1. (a) z = 1. so we would not reject H0 at the 5% level.2040 and the result is not significant at the 5% level. Ha : μ = 7. but we would still reject the null hypothesis.69 6.77 6. The P-value is P(Z ≥ 3.91 6.0164. (a) The P-value = 0.2040 and the result is not significant at the 1% level. Answers will vary. The 95% confidence interval in part (a) contains 105.645. 2007 9:50 S-32 SOLUTIONS TO ODD-NUMBERED EXERCISES statistics.83 6.041.002. The approximate P-value is < 0. 6. We reject H0 at the 5% level if z > 1. so our conclusions are probably still valid. A P-value of 0. (b) The hypotheses are H0 : μ = 105. (a) P-value = 0. (b) P-value = 0. Statistical significance and practical significance are not necessarily the same thing. The conclusion is not justified.65.56. We would conclude that there is strong evidence that these 5 sonnets come from a population with a mean number of new words that is larger than 6. The result is significant at the 5% level.93 . Ha : μ = 105.87 6.1711.73 6. and thus we have evidence that the new sonnets are not by our poet.9.001.645. (b) Yes. 109.85 6. (c) P-value = 0. In part (b) we reject H0 in favor of Ha : μ = 0 if either z is too large or z is too small because both extremes are evidence against H0 in favor of Ha : μ = 0. (b) 5 is in the 95% confidence interval. (c) No. Any convenience sample (phone-in or write-in polls.96 or z < −1. Statistical inference is not valid for badly designed surveys or experiments. (b) The P-value is 0. (c) In part (a) we reject H0 only for large values of z because such values are evidence against H0 in favor of Ha : μ > 0.

95 Using the Bonferroni procedure for k = 6 tests with α = 0.4 24 8.58. 6.99 6. 6.05.05/6 = 0.05 that we would observe a difference as large as or larger than this by chance if.44.88.5040. (8.” (b) The error probability one chooses to control usually depends on which error is considered more serious.82) 9. (3.97 6.90266.68.101 Since 80 is farther away from 50 than 70 is.72) 8.08. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-33 6.32) 5. The probability is less than 0. Of the six P-values given. This is quite a bit larger than the power of 0. x ± 0. .115 Industries with SHRUSED values above the median were found to have cash flow elasticities less than those for industries with lower SHRUSED values. on average the cash flow elasticities for the two types of industries are the same. 6. The probability of a Type II error at μ = 298 is 1 minus the power of the test at μ = 298. in fact. (7. which is 1 − 0. This is a “false-positive. linear relationship between age and months employed.008 and 0.62) 6.105 The power of the test against the alternative μ = 298 is 0. and so it is unlikely that the observed difference is merely accidental.22) 4. (a) X has a binomial distribution with n = 77 and p = 0.18.3 (2. 34. z = 2. In most cases.9 26 9. As sample size increases.109 (a) The hypotheses are H0 : patient does not need to see a doctor and Ha : patient does need to see a doctor. the consequences can be serious.72) 7. (c) Let μ denote the mean DMS odor threshold among all beginning oenology students.” (2) The patient is diagnosed as not needing to see a doctor when.4 23 7. We test the hypotheses H0 : μ = 25. (8. All the widths of all the confidence intervals are exactly the same. 6. (4.001.103 (a) The power of the test against the alternative μ = 298 is 0. 0.5 25 8.111 Age Months employed.117 (a) The stemplot shows that the data are roughly symmetric. the power will be higher than 0. the patient does need to see one.3 22 6.05/6 = 0.2 20 5.98.0083. (b) (26.22) 9.74 μg/l). the patient does not need to see a doctor. (8.0083 (or less) for statistical significance for each test. power increases. (b) The power is higher. 6. Ha : μ > 25. The program can make two types of error: (1) The patient is told to see a doctor when.Moore-212007 pbs November 27. This probability is quite low.05. 6.32 18 2.4960 that we found in Exercise 6.4960 = 0. The alternative μ = 295 is farther away from μ0 = 300 than μ = 298 and so is easier to detect. are below 0.5. 6.9 19 4.98.05.4960.9099.107 The probability of a Type I error is α = 0.06 μg/l.52) 5.62) There is a strong. This is a “false-negative. we should require a P-value of α/k = 0. 3.81. positive.58. in fact. a false-negative is considered more serious. If you have an illness and it is not detected. (6.0 21 5. in fact. (4.08. (b) The probability that 2 or more are significant is 0. 6.113 Answers will vary. x 95% CI. only two. 6.
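Two recurring calculations in this group are the Bonferroni per-test cutoff α/k and the power of a z test against a specific alternative. The sketch below shows one way to set these up for a one-sided lower-tail test; the alternative, σ, and n in the power example are assumed demonstration values, not those of the exercise.

# Bonferroni cutoff, and power of a one-sided z test against mu = mu0 + delta (assumed values).
from scipy.stats import norm

alpha, k = 0.05, 6
print("per-test cutoff:", round(alpha / k, 4))          # about 0.0083, as used above

mu0, delta, sigma, n = 300, -2, 3, 5                    # assumed values for the demo
se = sigma / n ** 0.5
crit = mu0 - norm.ppf(0.95) * se                        # reject H0 if x-bar falls below this
power = norm.cdf((crit - (mu0 + delta)) / se)           # P(reject) when mu = mu0 + delta
print("power:", round(power, 4))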

123 (a) Increasing the size n of a sample will decrease the width of a level C confidence interval.11.05.67 ± 3. we would expect the proportion of times we reject to be about 0. 6. we would not expect to get exactly the same number of intervals to contain μ = 240. Thus. (c) m = 10. the probability that we will obtain data that lead us to incorrectly reject H0 is 0. the effect of the program is confounded with the reasons why some women chose to participate and some didn’t. If we repeated the simulations. This is consistent with the meaning of a 0. (c) The study is not good evidence that requiring job training of all welfare mothers will greatly reduce the percent who remain on welfare for several years.80.50. 6.60 ± 4.127 (a) Assume that in the population of all mothers with young children. In 100 trials where H0 was true. and so results will vary from one simulation to the next.125 No. (b) The sample size is large. of the simulations contained μ = 240.119 $782.121 (a) The authors probably want to draw conclusions about the population of all adult Americans. They were not assigned using randomization. the proportion of times was 0.05 significance level. (b) Increasing the size n of a sample will decrease the P-value. The probability is less than 0.Moore-212007 pbs November 27. and so in a very large number of simulations we would expect about 50% to contain μ = 240.02 = ($776. 6. (b) 95% confidence means that the method used to construct the interval 21% ± 4% will produce an interval that contains the true difference 95% of the time. or 60%. 6.05.0073. . those who would choose to attend the training program and those who would not choose to attend actually remain on welfare at the same rate.38 ± 4.131 (a) We used statistical software to conduct the simulations. 2007 9:50 S-34 SOLUTIONS TO ODD-NUMBERED EXERCISES The P-value is 0. (e) 15 of the 25. The null hypothesis is either true or false.129 Only in 5 cases did we reject H0 . 6. and the central limit theorem suggests that it is probably reasonable to assume that the sampling distribution for the sample means is approximately Normal. 6. The population to which their conclusions most clearly apply is all people listed in the Indianapolis telephone directory. Thus. Mothers chose to participate in the training program.05.92 (c) None of these intervals overlap. Statistical significance at the 0. which suggests that the observed differences are likely to be real. (c) Increasing the sample size n will increase the power. $788. The probability that any given simulation contains μ = 240 is 0.01 that we would observe a difference as or more extreme than that actually observed.82 ± $6.05 level means that if the null hypothesis is true. (b) Store type 95% confidence interval Food stores Mass merchandisers Pharmacies 18. 6. (d) The calculations agree with our simulation result up to roundoff error. we say that we are 95% confident that this particular interval is accurate. Because the method is reliable 95% of the time. This is strong evidence that the mean odor threshold for beginning oenology students is higher than the published threshold of 25 μg/l.84).61 48.45 32. This is because each simulation is random.

938). there is little difference between critical values for the t and the Normal distribution.3218.31 . the t ∗ decreases for the same confidence level. P-value ≈ 0.37876. so there are certainly stores with a percent change that is negative. 7.36917. not for individuals.2 is 0. (b) There are 11 degrees of freedom. Hypotheses are H0 : μ = 500 and Ha : μ > 500.87. (b) A 95% confidence interval for the mean annual earnings of hourly-paid white female workers at this bank is ($22.11 7. . (c) (27. The 95% confidence interval for the mean monthly rent is ($481. Hypotheses are H0 : μ = 0% and Ha : μ = 0%.262 using degrees of freedom n − 1 = 9. . 27.9 7. and using software. This illustrates that for large degrees of freedom. .9394. (b) 2.63).8% and the standard deviation is 15%.3 7.000.574 with 9 df. (b) The 95% confidence interval is (20. $27.811. which agrees quite well with the bootstrap intervals.0005.29 7. $604.Moore-212007 pbs November 27. 0. As the confidence level increases.12). We conclude there is strong evidence that the mean annual earnings exceed $20. t ∗ increases for the same sample size.797 using degrees of freedom n − 1 = 24. (c) The mean is 4.914). we get P-value = 0. t = 1. (c) No.27 7.25 7. 40. (d) As sample size increases. (a) The stemplot using split stems shows the data are clearly skewed to the right and have several high outliers. 63.005 < P-value < 0. not significant at 1%.11. (b) t ∗ = 2.025.7 7. (a) The data are skewed right with 3 points that are particularly high. .13 7.205. .02.21 7. (b) P-value < 0.350. . 59. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-35 CHAPTER . (d) 0.23 7.15 7.02145.19. and using software.552 with 19 df.5 (a) SEx = 24. s = 21.022.01.064 using degrees of freedom n − 1 = 24. A 95% confidence interval for the mean loss in vitamin C is (50.7 7. (c) 0. and using software.025 and 0.89). (a) The t statistic must exceed 2. since we are interested in whether the average sales are different from last month (no direction of the difference is specified). (b) (59.47 with 49 df.19 7.7287 × 10−6 using 119 degrees of freedom. (b) t = 2. (17.075. 19. σx = 3.719.81).9916. (b) This result can be found in Table D in the row corresponding to z ∗ . (c) Excel gives 9. The lesson is that for large sample sizes the t procedures are very robust.093 and 2. (f) Excel gives 0. (a) Degrees of freedom = 119. (b) x = 34.5768. There is strong evidence that the average sales have increased.9902).000 and Ha : μ > 20. (a) There is no obvious skewness and there are no outliers present. clearly nonNormal. t = 4. (e) Significant at 5%. . but use df = 100 on Table D to be conservative. . 000.156. The power of the t test against the alternative μ = 2. (a) t ∗ = 2.54.17 7.02 < P-value < 0. (a) Degrees of freedom = 19. The hypotheses are H0 : μ = 20.1 7. (a) Two-sided. . this is the confidence interval for the population average weight. . (a) 0. (c) t ∗ = 2.00. We conclude that there is not much evidence that the mean rent of all advertised apartments exceeds $500.
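The one-sample answers in Chapter 7 use SE = s/√n, critical values t* with n − 1 degrees of freedom, and the statistic t = (x̄ − μ0)/SE. A hedged sketch with made-up summary statistics:

# One-sample t interval and t test from summary statistics (made-up values).
from scipy.stats import t

xbar, s, n, mu0 = 34.7, 21.7, 25, 30.0     # assumed demo summaries
se = s / n ** 0.5
tstar = t.ppf(0.975, df=n - 1)             # 95% critical value with 24 df, about 2.064
print("95% CI:", (round(xbar - tstar * se, 2), round(xbar + tstar * se, 2)))

tstat = (xbar - mu0) / se
print("t =", round(tstat, 2), " one-sided P:", round(t.sf(tstat, df=n - 1), 4))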

t = 1.76.19 micrograms). (b) No.51 7.17.75) for the mean at the factory.47 7. where μ represents the average difference in vitamin C between the measurement five months later in Haiti and the factory measurement. There is very strong evidence that vitamin C is destroyed as a result of storage and shipment. A 95% confidence interval for the mean price received by farmers for corn sold in October is ($1. (b) x ≥ 34.Moore-212007 pbs November 27. There is no need for increasing the sample size beyond 50.53 . 38. 70. There is extremely strong evidence that the scores improved over six months.000. and 0. P-value < 0. The hypotheses are H0 : μ = 0 and Ha : μ > 0. problems with missing values for one survey due to nonresponse or customers leaving the pool would make the paired sample difficult. We have converted quantitative data into categorical data by sorting the streams into very poor.54. Taking the differences (Variety A – Variety B) to determine if there is evidence that Variety A has the higher yield corresponds to the hypotheses H0 : μ = 0 and Ha : μ > 0 for the mean of the differences.15.122. $2. poor.29 with 9 df.39 7. 2007 9:50 S-36 SOLUTIONS TO ODD-NUMBERED EXERCISES 7. −5.734. sample proportions use Z ∗ . (b) The best answer is 2 independent samples.55.35 7.10 < P-value < 0. (a) The hypotheses are H0 : μ = 0 and Ha : μ < 0. A 95% confidence interval for the mean score on the question “Feeling welcomed at Purdue” is (3. A 95% confidence interval for the mean amount of D-glucose in cockroach hindguts under these conditions is (18.45). which is in agreement with the confidence interval (2.48) for the mean after five months.37 7.0005. (a) H0 : population median = 0 and Ha : population median > 0.96. 3. 6. so the sample proportion is 0. −3.69 micrograms.41 7.745.12) for the mean change. (a) Single sample. t = 6.34. matched pairs would also be an acceptable answer. (b) The number of pairs with a positive difference in our 7. and other. (36. or the population of differences (time to complete task with left-hand thread) − (time to complete task with right-hand thread). Even though we are looking at changes of opinion over time from the same basic group. (c) The 95% confidence intervals are (40. the hypotheses would be H0 : p = 1/2 and Ha : p > 1/2. where μ represents the average improvement in scores over six months for preschool children. The ratio of mean time for right-hand threads as a percent of mean time for lefthand threads is x R / x L = 88. (b) t = −4.866). Do not reject the null hypothesis.513) = 1.33 (a) 6 streams are classified very poor or poor out of 49 total.43 7. if the exact same sample was used both years with a stable population of customers. The 90% confidence interval for the mean time advantage is (−21.88) obtained in Exercise 7.95 with 26 df. not t ∗ . so those using the right-hand threads complete the task in about 90% of the time it takes those using the left-hand threads. However.49 7. and (−7. 44. (a) t ∗ = 2.45 7.55. Sample proportions are for categorical data.90 with 33 df and P-value < 0. There is not enough evidence to support that Variety A tomatoes have a higher mean yield than Variety B tomatoes. (c) The power is P(Z ≥ −4. If p is the probability of completing the task faster with the right-hand thread.47).0005. sample means are for quantitative data.7%. and using software.403.
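The matched pairs analyses above reduce to a one-sample t test on the differences, and the sign test replaces it with a binomial count of positive differences. A small sketch of both, on made-up difference data:

# Matched pairs t test on the differences, plus the sign test (made-up data).
import numpy as np
from scipy.stats import ttest_1samp, binom

diffs = np.array([2.1, -0.4, 1.7, 0.9, 3.2, -1.1, 0.6, 1.4])   # hypothetical differences
tres = ttest_1samp(diffs, popmean=0, alternative="greater")
print("t =", round(tres.statistic, 2), " P =", round(tres.pvalue, 4))

n_pos = int((diffs > 0).sum())
n_nonzero = int((diffs != 0).sum())          # zero differences are dropped, as above
print("sign test P:", round(binom.sf(n_pos - 1, n_nonzero, 0.5), 4))   # P(X >= n_pos)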

65 7.0021 using the Normal approximation or 0.67 7.61 7.13. (b) (4. Randomization makes the two groups similar except for the treatment and is the best way to ensure that no bias is introduced. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-37 data is 19. (c) P-value lies between 2 × 0.59 7. SPSS and Minitab report both the standard deviation and the standard error of the mean for each group. (a) The data are probably not exactly normally distributed because there are only 5 discrete answer choices. If we use the 68–95–99. while SAS reports only the standard error of the mean for each group. and Minitab provides the t with Satterthwaite degrees of freedom. There is not enough evidence to say that there is a significant difference between the average consumption of sugary drinks 7. (b) Yes.69 7.55 (a) Two-sided significance test.2 < P-value < 0.57. and n = 24 since the zero difference is dropped. all report the confidence interval for the mean difference. since you just want to know if there is evidence of a difference in the two designs. Reject the null hypothesis. Reject the null hypothesis. All report degrees of freedom and P-values to various accuracies. First write pooled variance as the average of the individual variances and then use this expression in the pooled t to see that it gives the same result for equal sample sizes.7% rule. 5. (d) The sample sizes are so large that a little skewness will not affect the results of the two-sample comparison of means test. There is strong evidence of a difference in daily sales for the two designs.63 7. There is strong evidence the average self-efficacy score for the intervention group is significantly higher than the average self-efficacy score for the control group. 7. 0. 0.426). P-value < 0.) (d) t = 3. (e) (0. Do not reject the null hypothesis.49. but be sure to double the P-value when checking your answer with this one. t = 1.3. (b) Excel reports the two variances.0033 using the binomial distribution. The two-sample comparison of means t test is fairly robust.0025 = 0.9 oz. There is very strong evidence the average exposure to respirable dust is significantly higher for the drill and blast workers than it is for the concrete workers.44.57 7.002. The pooled t = 17.005 and 2 × 0. The same problem with negative consumption for the older kids (starting with 95%) as well. then 68% of the younger kids would consume between −2. (b) 29 df using the conservative approximation. t = 18. P-value is close to 0.374. while SAS seems to provide the most information because it includes more information about the two groups individually. The P-value is P(X ≥ 19) = 0.Moore-212007 pbs November 27.5 and 18. SPSS and SAS provide the t with Satterthwaite degrees of freedom as well as the pooled t. (A “=” would also be appropriate in the alternative hypothesis. (c) H0 : μ I = μC and Ha : μ I > μC . (c) H0 : μD/B = μC and Ha : μD/B > μC . of sweetened drinks every day. (a) All report both means. (c) Excel is doing the pooled two-sample t procedure. (e) Excel has the least information.001 = 0. (d) With the exception of Excel.669).0005. (b) H0 : μolder = μyounger and Ha : μolder = μyounger .191. This is a matched pairs experiment with the pairs being the two measurements on each day of the week. the large sample sizes would compensate for uneven distributions. (a) No.71 .13 with 133 df. Results are almost identical to those of Example 7.
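The two-sample comparisons in this run are Welch t tests (Satterthwaite degrees of freedom) or, in Excel's case, the pooled version. Because the raw data are not reproduced here, the sketch below works from summary statistics; the summaries themselves are placeholders.

# Welch two-sample t test from summary statistics (placeholder summaries).
from scipy.stats import ttest_ind_from_stats

res = ttest_ind_from_stats(mean1=4.9, std1=1.2, nobs1=115,
                           mean2=4.5, std2=1.3, nobs2=100,
                           equal_var=False)          # Welch / Satterthwaite version
print("t =", round(res.statistic, 2), " P =", round(res.pvalue, 4))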

2). (a) The hypotheses are H0 : μ0 = μ3 and Ha : μ0 > μ3 .85 7. (b) The value of the t statistic is 2.5.47. and the P-value is approximately zero. (b) The 90% confidence interval is (19. so in spite of the skewness appearing in the histograms. (e) How were these children selected to participate? Where they chosen because they consume such large quantities of sweetened drinks? 7. These values would be used in the t statistic. then the percent sales would have decreased. 160. where 0 corresponds to immediately after baking and 3 corresponds to three days after baking. Do not reject the null hypothesis.965.07/2 = 0.532).85. 4.932. so it is very unlikely that the first mean is greater than the second mean.07/2 = 0. so the improvement is greater for those who took piano lessons.58).77. (b) P-value is 0.73 (a) Reject the null hypothesis because 0 is not inside the confidence interval. df = 21. You reject at both the 1% and 5% levels of significance. 34. and the sum of the sample sizes is close to 40. (c) (−5. 18.46) using 40 degrees of freedom as the conservative estimate from the t-table. Our sample mean difference is well above 0. (a) Using 50 df to be conservative and using Table D.662) using software.54.16. which is included in the confidence interval. The analysis would proceed by first taking the change in score and then computing the mean and standard deviation of the changes. Reject the null hypothesis. (c) These are observational data. (b) The individuals in the study are not random samples from low-fitness and high-fitness groups of middle-aged men. (a) The hypotheses are H0 : μLow = μHigh and Ha : μLow = μHigh . t = −0. and P-value > 0.79 7. t = −8.014.45).32. 2007 9:50 S-38 SOLUTIONS TO ODD-NUMBERED EXERCISES between the older and younger groups of children. (b) As sample size increases.91 7. The t procedures are not particularly appropriate here. If the true difference in means was −1.93 . A 99% CI will be wider than a 95% CI because the t ∗ value increases as the confidence level increases. Using 33 df. 95% CI = (15. the 95% confidence interval is (−0.29.77 7.89 7. The 95% confidence interval for the cost of the extra bedroom is (−0. the t procedures can be used. (b) The 90% confidence interval is (−11. so it is reasonable to conclude that the first mean is greater than the second mean.23. H0 : μ A = μ B can also be written as H0 : μ A − μ B = 0. which indicates there is strong evidence that the vitamin C content has decreased after three days.81 7.Moore-212007 pbs November 27. 10. where 0 corresponds to immediately after baking and 3 corresponds to three days after baking.87 7. This would now be a matched pairs design. since we have before and after measurements on the same men. The sample data is not normally distributed either. the 95% confidence interval is (1. (b) The negative values correspond to a decrease in sales. (d) The sample sizes are very uneven and fairly small.223 with a P-value of 7. and P-value = 0. so the possibility of bias is definitely present. which indicates no evidence of a loss of vitamin E. (a) The Normal quantile plots are fairly linear. t = 22.662.83 7.9). 24. Our sample mean difference (x 1 − x 2 ) is far below 0.75 7. (a) P-value is 1 − 0.2.9. df = 1. 6. (a) The hypotheses are H0 : μ0 = μ3 and Ha : μ0 > μ3 . the margin of error decreases.035.

1 There is a significant difference between the high.125 0. (a) t ∗ = 12. (b) F = 3.129 Between 0.103 (a) The value in the table is 647.7411.Moore-212007 pbs November 27.99 7.981).101 The hypotheses are H0 : σ1 = σ2 and Ha : σ1 = σ2 . (b) t ∗ = 4.37 0.87 1.29 0. P-value = 2 × 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-39 0. 7. δ) = 0.105 (a) The hypotheses are H0 : σM = σW and Ha : σM > σW . and using an F(1.5 > 0. The powers obtained using the Normal approximation are quiter close to these values.7765.4855.24 1. 7.9769.139 0.14 0.3390.001 and 0. serving ordered .2981 = 0.907.16 0.2 and 0. 7.3 ≈ 0. F = 1.002 and 0.20.06 0. n = 50 is 0. 7.45 0. (b) F = 1.71.5 > 0.95 7.859.28 2.1 and 0.123 0.630 0. so there is little evidence of a difference in variances between the two groups. (c) The increase in degrees of freedom means that the value of the t statistic needs to be less extreme for the pooled t statistic in order to find a statistically significant difference.005 Between 0.44 1. n = 75 is 0.609 2.303.79.9460. (a) The upper 5% critical value is F ∗ = 2.136 0.47 3.1528.109 (a) Using SAS.02 Between 0.5 Conclusion Reject H0 Do not reject H0 Reject H0 Do not reject H0 Reject H0 Do not reject H0 Do not reject H0 Do not reject H0 0. These powers were calculated using SAS.2 Between 0.48 0.08 0. DF.0764 = 0. (c) Using an F(19. 7.24 2 sH 170 + 2 sL 224 t 3.10. which is greater than the power found in (a).59. 17) distribution P-value > 0. δ) = 0.5962.39 0.002 Between 0. and using an F(33. (b) Significant at the 10% level but not at the 5% level.23 0. 35.107 The power for n = 25 is 0.21 0.05 Do not reject H0 and 0. (b) Using SAS. 7. 1) distribution and statistical software.94.127 0.86 P-value Between 0. POWER = 1-PROBT(t ∗ . n = 125 is 0.01 Reject H0 and 0.74. 7.125 0.143 0. 43) distribution and statistical software.005 > 0. P-value = 2 × 0.97 The approximate degrees of freedom are 37. POWER = 1-PROBT(t ∗ .119 0. so there is little evidence of a difference in variances between the two groups.01 0. so there is little evidence that the males have larger variability in their scores. DF.0165. (c) The confidence interval for the difference is (4. n = 100 is 0. well-dressed staff.111 Perceived quality Food served in promised time Quickly corrects mistakes Well-dressed staff Attractive menu Serving accurately Well-trained personnel Clean dining area Employees adjust to needs Employees know menu Convenient hours xH − xL 0.and low-performing restaurants in regards to food served in promised time.8787. This gives fairly strong evidence that the mean SSHA score is lower for men than for women at this college.
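The F statistics quoted above are ratios of sample variances, F = s1²/s2², referred to an F(n1 − 1, n2 − 1) distribution, with the P-value doubled for the two-sided alternative. A brief sketch with assumed summary values:

# Two-sided F test for equality of variances from summary statistics (assumed values).
from scipy.stats import f

s1, n1 = 12.5, 20
s2, n2 = 9.8, 18
F = (s1 ** 2) / (s2 ** 2)
p_one = f.sf(F, n1 - 1, n2 - 1)
print("F =", round(F, 2), " two-sided P =", round(2 * min(p_one, 1 - p_one), 4))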

Side-by-side boxplot shows cotton much higher than ramie.573).113 (a) Study was done in South Korea. This suggests that more than half of the products at the alternate supplier are priced lower than the original supplier.98.3077).137 As the degrees of freedom increase for small values of n. the F test is not robust to skewness.96 but never become greater than 1.207.312. (c) The P-value is approximately equal to zero. P-value is very close to 0 so reject the null hypothesis. 2007 9:50 S-40 SOLUTIONS TO ODD-NUMBERED EXERCISES food accurately. 7. There is strong evidence that cotton has a significantly higher mean lightness than ramie. Response rate is low (394/950). Only selected QSRs were studied. The P-values of both are approximately zero. . There is not a significant difference in the other qualities. 7. (b) Yes.012). (b) The t procedures are robust for large sample sizes even when the distributions are slightly skewed. 0. (b) (−0.39219. 7.0765. P-value is close to 0 so we can reject the null hypothesis.954. As sample sizes get larger. and employees know menu. 7. 7. There is strong evidence that hotel managers have a significantly higher average masculinity score than the general male population. 0. 7. 7. (c) The middle 95% of scores would be from 29. 7.96. (b) (12.121 (1. 7. 7. therefore they used a single sample t test.115 H0 : μ = 4.555 ± 0. the t values are close to z = 1. results may not apply to other countries. x R = 41. H0 : μC = μ R and Ha : μC > μ R .131 (a) (0. 7. (b) Answers will vary. 92. and we can conclude that the program was effective. slightly skewed right but fairly symmetric.127 (a) This study used a matched pairs design. 7.6488. the t test is robust to skewness.09).133 (a) No.9998. t = −8.88 and Ha : μ > 4. results may not apply to other QSRs.139. we would trust the results more if the rate were higher. sC = 0. (d) The scores for the first minute are clearly much lower than the scores for the 15th minute. Results of the t tests show that there is a significant difference in the average SATM scores between the two sexes.25377. the value of t rapidly approaches the z score.Moore-212007 pbs November 27. The fact that no differences were found when the demographics of this study were compared with the demographics of similar studies suggests that we do not have a serious problem with bias based on these characteristics.117 (a) No outliers.66 to 44. 7. t = 46.129 (a) H0 : μ1 = μ2 and Ha : μ1 < μ2 . We need to assume fairly normally distributed data without outliers and that a simple random sample was taken from each group.123 2. P-value ≈ 0.6035). Conclude that the workers were faster than the students.55. t = 21. (b) The average weight loss in this program was significantly different from zero. s R = 0.125 The 95% confidence interval on percentage of lower priced products at the alternate supplier is (64.119 x C = 48.162. 7.9513.98. 13.135 The F test result shows that there is no reason to believe the variances of the two sexes are unequal.88.
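Several of the answers above reduce a matched pairs design to a one-sample t test on the within-pair differences. A minimal sketch of that reduction, assuming numpy and scipy are available; the before/after numbers are invented purely for illustration.

import numpy as np
from scipy import stats

# Invented before/after measurements on the same subjects (matched pairs).
before = np.array([185, 201, 174, 198, 192, 210, 188, 179])
after = np.array([180, 195, 176, 190, 188, 204, 184, 178])

diff = before - after                       # analyze the differences
t, p_two_sided = stats.ttest_1samp(diff, 0.0)
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2

print("mean difference =", diff.mean())
print("t =", t, " one-sided P =", p_one_sided)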

5 8.41).21 8.17 8.” testing whether the proportion who answer “Yes” is equal to 0. The methods have the same margin of error = 0. ˆ (a) p ± Z ∗ SE p (forgot the Z ∗ ). 0.00132.218.7 (0. (b) Teens 16–19 years old may have jobs.64%. .62 and P-value = 0.376).5 is used.324.3 is appropriate.25 8.3 8. .359. (d) Yes. (a) 0. Need to sample at least 601 people if a p ∗ of 0. (b) (0. A person could have both lied about having a degree (for example. If we know how many answered “Yes. (b) No. 16–17 year olds. 0. (a) (0.0126. which is small. H0 : p = 0. 0.11 8. It would make more sense to group the teens as 12–15 year olds.25. .23 8. but the plus four method shifts the interval slightly higher.1052. 0.19 8. Those most likely not to reply are the cheaters. having an advanced degree such as a master’s or Ph.23.” (b) The test statistic is z = 1.47).29 .Moore-212007 pbs November 27. the person delivering the sermon probably thinks the sermon is shorter than it actually is. (a) (0. and a 15% response rate is very low.316. not ˆ the sample statistics. .355).401). .22.) and about their major (for example. approximately 95% of these intervals would contain the population proportion. (0. their undergraduate major if they lied about having a master’s degree but did have a bachelor’s degree). 0.15 8. No. (b) Hypotheses need population parameters. 0. . 0. If we were to repeat our sampling many times and compute a confidence interval from each sample. (0. .8 8. 0.9 8.21.634 ± 0.27 8.745).D. 0. 99% confidence interval is wider because the Z ∗ is bigger.1 8. Smaller samples sizes will make a bigger difference when the sample proportion is close to 0. Because lying about having a degree and lying about major 8. so we can trust these results. (0. 0. . over the long run. What about the schools that can’t afford the fee? Just because the sample size is large doesn’t mean that good data was collected.635.251). (0.48). Because the proportion who answer “Yes” is 1 minus the proportion who answer “No. (c) Nonresponse rate is only 3.301. 18 and 19 year olds may be living on their own.75 is equivalent to testing whether the proportion who answer “No” is 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-41 CHAPTER . 0.027. The response rate and other issues may be larger sources of error here than pure statistical variation quantified by the margin of error.300.354). (a) m = 0.” we automatically know how many answered “No. and the congregation would probably think the sermons are longer than they actually are.13 8. (0. (a) (0.384). a high nonresponse rate would skew the results. This plus four interval is shifted slightly to the right of the original interval of (0. . . An example would be 1/100. (b) Students most likely not to respond are the cheaters.47). and 18–19 year olds.21.
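The single-proportion answers in this chapter combine three calculations: the large-sample interval p-hat ± z*sqrt(p-hat(1 − p-hat)/n), the plus four interval (add two successes and two failures before computing), and the sample size needed for a stated margin of error, n ≥ (z*/m)² p*(1 − p*). A sketch of all three, assuming scipy is available; the counts are arbitrary, and m = 0.04 is the margin that reproduces the 601 quoted above when p* = 0.5.

import math
from scipy.stats import norm

def prop_ci(x, n, conf=0.95, plus_four=False):
    # Large-sample or "plus four" confidence interval for a proportion.
    if plus_four:
        x, n = x + 2, n + 4
    p = x / n
    z = norm.ppf(1 - (1 - conf) / 2)
    m = z * math.sqrt(p * (1 - p) / n)
    return p - m, p + m

def sample_size(m, conf=0.95, p_star=0.5):
    # Smallest n whose margin of error is at most m under the guess p*.
    z = norm.ppf(1 - (1 - conf) / 2)
    return math.ceil((z / m) ** 2 * p_star * (1 - p_star))

# Arbitrary counts, for illustration only.
print(prop_ci(330, 950))
print(prop_ci(330, 950, plus_four=True))
print(sample_size(m=0.04))   # 601 when p* = 0.5, matching the count quoted above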

59 8. 0. np0 = 150 and n(1 − p0 ) = 350.9 m: 0. (b) (0. Ha : p = 0.2 0. 0.64.47 8.642.267). 0. The P-value is approximately 0.34. 0.112.5 0. 0.49 ˆ For p2 : mean = 2 σ p1 − p2 ˆ ˆ = 2 σ p1 ˆ + p2 (1− p2 ) .3 0.1802. ˆ n n = μ p1 −μ p2 = ˆ ˆ (0.43 8. 0. (c) Safe. np0 = 4 < 10 and n(1 − p0 ) = 6 < 10.0820 0.089).7 0. The test statistic z = 4. (c) These results are consistent (same P-value) with the previous exercise.53 8.Moore-212007 pbs November 27. (a) μ p1 − p2 = −0. Also.35 (0.3078.05. we cannot automatically conclude that a total of 24 = 15 + 9 applicants lied about one or the other.0877 0. (0.33 8.5.5165). We round up to get n = 201.64. we would not reject the null hypothesis that the probability that Kerrich’s coin comes up heads is 0. standard deviation σ p2 = ˆ ˆ p1 − p2 .93. 0. (a) Not safe. (d) Safe.0877 0. not a designed experiment. 8. np0 = 60 and n(1 − p0 ) = 40.1 and σ p1 − p2 = 0.45 8. H0 : p1 = p2 . (b) μ D = μ p1 − p2 ˆ ˆ n p1 (1− p1 ) p2 (1− p2 ) 2 σ p2 = + .295.61 .0947.294). The margin of error for the 90% confidence interval is 0. (b) (0. Round up to get n = 451. ˆ (a) p = 0.0537 8. (a) H0 : p = 0. standard deviation σ p1 = ˆ ˆ μ p2 = p2 . Both are greater than 10.0691. Observational studies generally do not provide a good basis for concluding causality.339).4 0.768. Plus four interval: (−0.106. this is not a random sample from the population of all bicyclists who were fatally injured in a bicycle accident.1 0. P-value = 0. ˆ ˆ ˆ ˆ (a) H0 : pImpulse = pPlanned and Ha : pImpulse = pPlanned .125. n = 450. (b) Safe.46. (0. (c) 8.912). This is not strong evidence against the null hypothesis that the sample represents the state in regard to rural versus urban residence in terms of the proportion of urban residents. The P-value = 0. 0.02.0716 0. The P-value = 0.0895 0. n = 1711. only association.3524.31 8.6 0. Z interval: (−0. ˆ p: 0.3168.272).51 ˆ (a) For p1 : mean μ p1 = p1 . 0.768). 2007 9:50 S-42 SOLUTIONS TO ODD-NUMBERED EXERCISES are not necessarily mutually exclusive events.0537 0. This is strong evidence that a higher proportion of complainers than noncomplainers leave voluntarily. We conclude there is not strong evidence against the hypothesis that the sample represents the state in regard to rural versus urban residence in terms either of the proportion of rural residents or in terms of the proportion of urban residents.41 8. Z = −1.39 8.37 8. Do not reject the null hypothesis. Ha : p1 > p2 . (a) The test statistic is z = 1. Both are greater than 10. (c) This is an observational study.0820 0.0716 0.18. and since this is larger than 0.0788). and X = 542 (the number who tested positive) are basic summary statistics.8 0.4969. (c) (−0. n 8.0301.57 2 σD p1 (1− p1 ) . (b) The test statistic is z = −0.55 8.289. There is not enough evidence to say 8.
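The facts used above are that p1-hat − p2-hat has mean p1 − p2 and standard deviation sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2), and that the significance tests replace p1 and p2 with a single pooled estimate. A small sketch of the pooled two-proportion z test, assuming scipy is available; the counts below are invented.

import math
from scipy.stats import norm

def two_prop_z(x1, n1, x2, n2):
    # z test of H0: p1 = p2 using the pooled estimate of the common p.
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))            # two-sided P-value

# Invented counts, for illustration only.
print(two_prop_z(x1=140, n1=250, x2=110, n2=240))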

0273. (c) (−0.141 and p2 = 0.14.1963.511.408. P-value = 0. P-value is approximately 0. H0 : p1 = p2 . There is evidence of a significant difference between the proportion of male athletes who admit to cheating and the proportion of female athletes who admit to cheating.2125. This is strong evidence that the proportion of males born in Copenhagen with an abnormal male chromosome who have criminal records is larger than that of males born in Copenhagen with a normal male chromosome. We would recommend that flip-up shields be used on new tractors. H0 : p1 = p2 . Let p1 denote the proportion of all Danish males born in Copenhagen with a normal male chromosome who have had criminal records and p2 the proportion of all Danish males born in Copenhagen with an abnormal male chromosome who have had criminal records. The interval does not contain 0. (b) (0.79 8.1787). Ha : p1 < p2 . (b) SE D = ˜ 0.029. (a) z = 5. (a) H0 : p1 = p2 . and this is strong evidence that the opinions differed in the two counties. Someone who gambles will be less likely to respond to the survey. ˆ ˆ p1 (1− p1 ) contributes more to the standard error of the difference (b) The term n1 because n 1 = 191 is so much smaller than n 2 = 1520. 0. This is strong evidence that there is a difference in the proportions of females and males who were in a fatal bicycle accident.038.8.4115). in fact. Benton County: p2 = 0.69 8.71 8.34.2377). Ha : p1 = p2 .339. Ha : p1 = p2 . P-value = approximately 0. Do you think men or women are more likely to report that they do not gamble when. 0.5. so reject the null hypothesis. ˆ ˆ (a) Tippecanoe County: p1 = 0. −0.7333. This is not strong evidence that there is a difference in preference for natural trees versus artificial trees between urban and rural households. Z = −0. 0. We test the hypotheses H0 : p1 = p2 . P-value = 0.77 8. Ha : p1 = p2 . z = −5.75 8. P-value = 0.2186. ˆ ˆ (a) p1 = 0.51. and tested positive. 8. We could conclude that there is reasonably strong evidence that the proportion of cockroaches that will die on glass is less than the proportion that will die on plasterboard. were tested for alcohol. 0. Zero is well outside this interval. There is not enough evidence to say that the applicants are lying in different proportions than they did 6 months ago. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-43 the difference in credit card use between impulse and planned purchases is statistically significant. (a) (−0. P-value = 0.482).23. (b) z = 1. A 95% confidence interval is (−0. There is not enough evidence to say that the detergent preferences are significantly different for people with hard water and people with soft water.73 8. The 95% confidence interval (male − female) is (0.1389).81 . Ha : p1 = p2 . Z = 1.67 8.40.63 H0 : pS = pH and Ha : pS = pH .0209. (c) (0.91.2542.143). Do not reject the null hypothesis. P-value = approximately 0.Moore-212007 pbs November 27.0281. they do gamble? 8. This is strong evidence that there is a difference in the proportions of the two types of shields removed. 0. so that bolt-on shields appear to be removed more often than flip-up shields. z = −3.0003. We test the hypotheses H0 : p1 = p2 .65 8.253. Z = 14. (b) z = −1. P-value < 0.
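Most of the intervals reported on this page are large-sample confidence intervals for p1 − p2, which use the unpooled standard error (unlike the tests, which pool). A minimal sketch with invented counts, assuming scipy is available.

import math
from scipy.stats import norm

def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    # Large-sample CI for p1 - p2 using the unpooled standard error.
    p1, p2 = x1 / n1, x2 / n2
    z = norm.ppf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Invented counts, for illustration only.
print(diff_prop_ci(x1=90, n1=200, x2=300, n2=800))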

231. Users (total number) Analysis of education Analysis of income 1132 871 Nonusers (total number) 852 677 8. Ha : p1 = p2 . The 95% CI (users − nonusers) is (−0. For nonusers. H0 : p1 = p2 . P-value = 0. so reject the null hypothesis.Moore-212007 pbs November 27. so you could summarize by saying that the margin of error was no greater than 3. 2007 9:50 S-44 SOLUTIONS TO ODD-NUMBERED EXERCISES 8. Let p1 be the proportion of all die-hard fans who attend a Cubs game at least once a month and p2 the proportion of less loyal fans who attend a Cubs game at least once a month.1141.39 0. personalities of the servers. it is not a serious limitation for this study. there is variability involved in taking a sample.3 24 20 17 7 3 n 247 247 247 247 247 1371 1371 m (in %) 3. 0. Did one server only do the repeating while the other server did no repeating.102.2019). There is strong evidence that the users and nonusers differ significantly in the proportion of college graduates. Z = 6.0626). 0.1676.97 with a P-value near 0.38. (c) Cultural differences. z = 9. (b) H0 : p1 = p2 . Ha : p1 = p2 . 0.00 2.69 0. so do not reject the null hypothesis.783. 95% CI (repeat − no repeat) is (0.72 2.0022. We test H0 : p1 = p2 .04.83 (a) and (b) Category Download less Peer-to-peer E-mail and IM Web sites iTunes Overall use of new services Overall use paid services ˆ p (in %) 38 33. the proportion of “rather not say” is 0. You could also separate out the last 2 questions by saying their margin of error was less than 1%. P-value = approximately 0. the proportion of “rather not say” is 0.05. so reject the null hypothesis. This is strong evidence that the proportion of die-hard Cubs fans who attend a game at least once a month is larger than the proportion of less loyal 8. 8.205. and gender differences could all play a role in interpreting these results.85 (a) H0 : p1 = p2 . Z = 3. Z = 1.46 (c) Argument for (A): Readers should understand that the population percent is not guaranteed to be at the sample percent. (b) 95% CI (users – nonusers) = (0.55 2.91 . There is not much evidence of a significant difference in the proportion of “rather not say” answers between users and nonusers.0106.09 3. P-value = 0. Since the nonresponse rate is not significantly different for the two groups. Ha : p1 > p2 . There is strong evidence of a significant difference in the proportion of tips received between servers who repeat the customer’s order and those who do not repeat the order.430). or did they switch off? (d) Answers will vary. Argument for (B): Listing each individual margin of error does seem excessive. 8. Ha : p1 = p2 . pnorepeat = 0.89 ˆ ˆ (a) prepeat = 0.87 For users.517.09% for each of these questions.

11. This does not include 0. The margin of error is m = 0. A 95% confidence interval for the difference in the two proportions is (−0.09.390.93 8. This is consistent with the results in Example 8.5.5)(0. 8.8.0013. The z statistic and P-value are almost the same as in part (b). 8.5) .3755.438 25 0. z = −3. We test the hypotheses H0 : p = 0. the margin of error decreases. This conclusion is justified provided the trial run can be considered a random sample of all (future) runs from the modified process and that items produced by the process are independently conforming or nonconforming.76(1025) = 779 (after rounding off). 8. This is strong evidence that there is a difference in the proportions of male and female college students who engage in frequent binge drinking. −0.99 8.S. P-value = approximately 0. P-value = approximately 0.062 As sample size increases.107 (a) p0 = 0. adults who are male. (b) H0 : p1 = p2 .20.6736). it is not possible to guarantee a margin of error of 0.97 8. 0.00319). z = −29.95 The number of people in the survey who said they had at least one credit card is n = 0. (c) z = −29.791.098 400 0. P-value = approximately 0. n2 This leads to ˆ 8.5645). .16. Ha : p < 0. Let p be the proportion of products that will fail to conform to specifications in the modified process.139 150 0. the proportion of U. z = −3.11. 8.S. m = 1. We would conclude that there is strong evidence that the proportion of men with low blood pressure who die from cardiovascular disease is less than the proportion of men with high blood pressure who die from cardiovascular disease. (0.277 50 0. A 95% confidence interval for the difference in the two proportions is (0. the former value. P-value = 0.5)(0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-45 Cubs fans who attend a game at least once a month. Let p1 denote the proportion of male college students who engage in frequent binge drinking and p2 the proportion of female college students who engage in frequent binge drinking. This is strong evidence that the proportion of nonconforming items is less than 0.196 100 0. P-value = 0. 0. (b) p = 0.11. so we would conclude that the proportion of heavy lottery players who are male is different from the proportion of U. We test the hypotheses H0 : p1 = p2 .113 200 0.01411.105 Starting with a sample size of 25 for the first sample. adults who are male.101 (a) Let p1 denote the proportion of men with low blood pressure who die from cardiovascular disease and p2 the proportion of men with high blood pressure who die from cardiovascular disease. so the magnitude of the difference is large.069 500 0.960 a negative value for n 2 .485.0275. Ha : p1 < p2 . The 95% confidence interval is (0.0008. Ha : p1 = p2 . z = 9. which is not feasible.5524.15 or less.103 n m 10 0.5) 25 + (0.Moore-212007 pbs November 27. We would conclude that there is strong evidence that the probability that a randomly selected juror is Mexican American is less than the proportion of eligible voters who are Mexican American.
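The margins of error tabulated above shrink in proportion to 1/sqrt(n); the values appear consistent with m = 1.96 sqrt(0.5/n), that is, comparing two proportions from samples of equal size n under the conservative guess p = 0.5, although the exact setup is stated in the exercise itself. A sketch that regenerates a table of that form under this assumption:

import math

def margin(n, p=0.5, z=1.96):
    # Margin of error for the difference of two sample proportions,
    # each based on n observations, using the conservative guess p = 0.5.
    return z * math.sqrt(2 * p * (1 - p) / n)

for n in [10, 25, 50, 100, 150, 200, 400, 500]:
    print(n, round(margin(n), 3))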

(c) H0 : There is no association between gender and admission.11 (b) % of males who are admitted: 490/800 = 61. . . Ha: There is an association between gender and admission. French or not French music playing. so the columns are type of music.6% of all firms are unsuccessful. which is the same as H0 : p1 = p2 .8% of successful firms and 72.7 9. % of females who are admitted to business school: 200/300 = 66.67%. . (e) Business: χ 2 = 0. . There is not enough evidence to say that there is an association between gender and admission.25%.31)2 = 10. z 2 = (3. .248 from SPSS. . Do not reject the null hypothesis. . but one example would be: X A B C Totals 9.Moore-212007 pbs November 27. (c) No relation says that whether or not a person is a label user has no relationship to gender.106 from SPSS. % of females who are admitted to law school: 200/400 = 50%.291)2 = 10. df = 10.67%.5 9. % of females who are admitted: 400/700 = 57.949. P-value = 0. Do not reject the null hypothesis. Answers will vary.9 9. then firms with exclusive territories and those lacking exclusive territories should both have 27. (a) Minitab calculates χ 2 = 10.83.3 9.3% of unsuccessful firms offer exclusive territories. 87. P-value = 1 from SPSS.14%. 2007 9:50 S-46 SOLUTIONS TO ODD-NUMBERED EXERCISES CHAPTER . There is not enough evidence to say that there is an association between gender and admission in law school. There is not enough evidence to say that there is an association between gender and admission in business school. (f) Simpson’s paradox: Because the business school has so many more students both admitted and rejected (and 600 men apply. P-value = 0. .335.1 (a) Simplest is two columns. (d) % of males who are admitted to business school: 400/600 = 66.9 9. more . (b) The explanatory variable is the type of music because we think this influences the type of wine purchased. χ 2 = 2. Law: χ 2 = 1.6% unsuccessful firms. % of males who are admitted to law school: 90/200 = 45%. .610. If there is no association between success and exclusive territories. 28 firms lack exclusive territories and 27. . Do not reject the null hypothesis. French wine purchased or other wine purchased. and two rows.13 (a) Two-way table: Admit Gender Male Female Total Yes 490 400 890 No 310 300 610 Total 800 700 1500 10 10 10 30 Y 10 10 10 30 Z 10 10 10 30 Totals 30 30 30 90 9. (b) (3.95.
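The admissions two-way table above can be checked with a Pearson chi-square test of association; for a 2 × 2 table the statistic equals the square of the two-proportion z statistic. A sketch using scipy rather than the SPSS or Minitab output quoted in these solutions; correction=False suppresses the Yates continuity correction so the ordinary Pearson X² is returned.

import numpy as np
from scipy.stats import chi2_contingency

# Admit (Yes, No) counts by gender, as in the two-way table above.
table = np.array([[490, 310],    # male applicants
                  [400, 300]])   # female applicants

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print("X-squared =", chi2, " df =", dof, " P-value =", p)
print("expected cell counts:")
print(expected)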

54%. There is strong evidence to say that there is an association between initial major and transferred area.3% in the young adult category and 66. and 26. P-value = 0.001 from SPSS. There is strong evidence to say that there is an association between gender and visits to the H. it may be more likely that others are cheating too. Visits H. Reject the null hypothesis. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-47 than any gender to any program).021. bihai flowers. (b) Answers will vary. Reject the null hypothesis.19 9. P-value = 0. There is not enough evidence to say that there is an association between model dress and magazine readership age group.527.107 from SPSS.96%. (d) If one member of a team is cheating.17 9. Department A teaches a much larger percentage of small classes.15 % of Department A’s classes which are small: 32/52 = 61. % of Department B’s classes which are for third and fourth year students: 36/106 = 33. The 0 cell count does not invalidate the significance test. P-value is close to 0 from SPSS. χ 2 = 15. it changes the overall results when business and law are combined. (c) The people most likely not to respond are those who gamble.Moore-212007 pbs November 27. The marginal percentages for age group were 33. then χ 2 = 7. χ 2 = 2.62%.92%. but here is one possibility: If we change all the sample sizes to 1000 but keep the percentages the same.21 Initial major Biology Chemistry Mathematics Physics Transferred to other 202 64 38 33 χ 2 = 50. P-value is close to 0 from SPSS. The teammates also may have had a discussion about how to fill out the form. It is the expected counts that need to be 5 or greater for a 2 × 2 table in order for the chi-square test to be appropriate. but Department A teaches more than twice the percentage of third and fourth year students as Department B. so the results may be biased towards “no” answers. The marginal percentages for model dress were 73. bihai Gender Female Male Total 9.23 Yes 20 0 20 No 29 21 50 Total 49 21 70 9.675. % of Department A’s classes which are for third and fourth year students: 40/52 = 76. Reject the null hypothesis. If we change all the “yes” answer counts to 100 but keep the percentages the same.4% had models dressed sexually. .591. Do not reject the null hypothesis. There is very strong evidence that there is an association between collegiate sports division and report of cheating. 9.6% of the ads had models dressed not sexually. (a) χ 2 = 76.749 and the P-value increases to 0. χ 2 = 12.7% in the mature adult category. % of Department B’s classes which are small: 42/106 = 39.713 and the P-value remains very close to 0.

or obtained from a shelter. and no intervention has a 20. The older employees (over 40) are almost twice as likely to fall into the lowest performance category but are only 1/3 as likely to fall into the highest category. 9.000. df = 4. This is highly significant.241 and P = 0.4121. df = 2. Although there appears to be a strong increasing trend.028. (c) χ 2 = 163.37 9. Statistical software gives P-value = 0. Phoenix: χ 2 = 2.001.611. and P-value < 0. It is easily verified that z 2 = χ 2 .572 and P < 0.000. The percent continued to increase in the late 1980s but much more slowly. so the difference in percent on time for the two airlines is highly significant. stray. χ 2 = 3. with some leveling off in the early 1990s. while a much higher percent of cats than dogs come from other sources such as born in home. df = 12. χ 2 = 43. The percent started increasing again in the mid-1990s. San Diego: χ 2 = 4. (b) H0 : There is no relationship between intervention and response rate.31 9. df = 4. A much higher percent of dogs than cats come from private sources.2% response rate. df = 2. (a) Combined χ 2 = 13. df = 2.45 9. the changes in the percents are quite small (from 60% to about 66%). Use the information on nonresponse rate.037.487. indicating a relationship between the PEOPLE score and field of study. note that science has an unusually large percent of low-scoring students relative to other fields.6% response rate.845 and P = 0. P-value = 0.000. z = 6.35 9.346 and P = 0. P-value = 0. There is strong evidence of a relationship between winning or losing this year and winning or losing next year. but very slowly. The source of dogs and cats differs.Moore-212007 pbs November 27.223 and P < 0. Intervention seems to increase the response rate. χ 2 = 24. In Exercise 9.126. while liberal arts and education have an unusually large percent of high-scoring students relative to the other fields.0005. San Francisco: χ 2 = 21.29 9.513. (a) A phone call has a 68. Seattle: χ 2 = 14. df = 2. while for the data in this exercise the opposite is true.000. P-value = 0.9. but that percent increased quite rapidly until the mid-1980s. χ 2 = 19.47 .683. There is no evidence of a difference in the income distribution of the customers at the two stores. with America West being the winner. P-value = 0. (b) χ 2 = 6. with a phone call being more effective than a letter. P-value = 0.903 and P < 0. and take a larger sample size than necessary to make sure that you have enough observations with nonresponse accounted for. Women represented a very small percent in 1970. when it reached about 60%. At the 5% level of significance we would conclude a relationship between the source of the cat and whether or not the cat is brought to an animal shelter.411.33 9. (b) Los Angeles: χ 2 = 3. Ha : There is a relationship.41 9.413. df = 1. P-value = 0.000. The data show no evidence the response rates vary by industry.43 9.969 with 13 degrees of freedom (P-value = 0.696). 2007 9:50 S-48 SOLUTIONS TO ODD-NUMBERED EXERCISES 9. (c) Most of the effects illustrating the paradox are statistically significant.81.072. (a) Column percents because the “source” of the cat is the explanatory variable. Among the major differences.20 good performance continued.27 9.001.277.25 χ 2 = 50. The graph shows an increasing trend. a letter has a 43.7% response rate.1977 and χ 2 = 38.39 9. yet χ 2 = 9.

7 10.91. (c) MEAN OVERSEAS RETURN = 4. (a) χ 2 = 24.10 .5 10. min = 2.66% increase in overseas returns. (e) For 2001. min = 29..94.51 CHAPTER . so we can reject the null hypothesis. .. there is a strong. (b) The output gives t = 6. which indicates no gender differences for the high versus low mastery categories. It’s very hard to tell if there are any outliers or unusual patterns because the relationship is so weak.66 × U.3 Predicted wages = $449.24 (d) SPENDING = β0 + β1 · YEAR + ε.7%. and the regression standard error is s = 16.11 (a) r = 0. overseas returns are 4.03 and P-value = 0.. A 1% increase in U.863. and P-value < 0.0005.0005 and there is strong evidence that the population correlation ρ > 0.14 is a comparison of several populations based on separate samples from each. Ha : β1 > 0. The mutual funds that had the highest returns last year did so well partly because they were a good investment and partly because of good luck (chance). Inspection of the data shows that the males have higher percents in the two high social comparison categories. max = 9. From SPSS. 9. When U. the hypothesis test in part (d) has t = 3.S. positive. s = 17.18 −0.7 + 0.400 + 1. max = 70.535. (h) The assumptions are reasonable—we are assuming a simple random sample from the population.12 is a test of independence based on a single sample. 10.32 with 49 degrees of freedom.S..91. This agrees with what we found from the full 4 × 2 table. (b) β1 = 0. and linear. t = 6. (c) χ 2 = 0.400.76. (g) Yes..29. The ε term in this model accounts for the differences in overseas returns in years when the U. (a) β0 = 4. P-value < 0. with estimates β0 = −2651.9. ˆ (b) y = −2651..32 with 49 degrees of freedom. . 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-49 9.280.16 is a test of independence based on a single sample.Moore-212007 pbs November 27. returns is associated with a 0. β1 = 1.923 + 0. while the females have higher percents in the two low social comparison categories.2898. linear relationship between year and spending.7.49 Exercise 9. (e) y = 52. skewed left but no outliers.S. IBI: x = 65. the normal probability plot looks good. (b) Scatterplot looks weak.30 −0..9%. The residual is $2.13 is a comparison of several populations based on separate samples from each. P-value < 0.13 Area: x = 28.001. ε = 0.415 and a P-value of 0. 10.340x. H0 : β1 = 0.6270.. R 2 = 19.3177. SEb1 = 0.9 b1 = 0. return is the same.12 0. The trend might not stay the same beyond 1999. positive.24 −0.45 and P-value < 0. Exercise 9. 10. returns are 0%. RETURN + ε. Residual = −$60.. Ha : β1 = ˆ 0. (d) H0 : β1 = 0.460x. (c) IBI = β0 + β1 · AREA + ε. This is an extrapolation. There is strong evidence that Area and IBI have a linear relationship.1 10. df = 2. Exercise 9. (c) Year 1995 1996 1997 1998 1999 Residual 0. fairly symmetric and normally distributed except for 2 high outliers.0005.0992. (f) The residual plot shows that the residuals get slightly less spread out as Area increases. t = 6. the predicted spending is $29.94. (a) Yes. (b) χ 2 = 23.S.66.6700. Exercise 9.0005.340. but overall it doesn’t look too bad. 10. s = 18.714.
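The slope inferences in this chapter follow a single pattern: fit least squares, then use t = b1/SEb1 with n − 2 degrees of freedom for tests and b1 ± t*SEb1 for intervals. A compact sketch on simulated data, assuming numpy and scipy are available; the data are synthetic, not from any exercise.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, 30)                 # simulated explanatory variable
y = 10 + 0.5 * x + rng.normal(0, 4, 30)    # simulated response

res = stats.linregress(x, y)               # slope, intercept, r, P-value, SE of slope
n = len(x)
tstar = stats.t.ppf(0.975, n - 2)
ci_slope = (res.slope - tstar * res.stderr, res.slope + tstar * res.stderr)

print("b1 =", res.slope, " t =", res.slope / res.stderr, " P =", res.pvalue)
print("r^2 =", res.rvalue ** 2, " 95% CI for slope:", ci_slope)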

05.9. 10. (c) Income = 24. (b) y ± t ∗ SEμ = (397.5 to 10.8209 to 27982. The relationship is weak.6844).5 to 7.86. The conclusion we reached in Exercise 10. The conclusion we reached in Exercise 10. and the Normal probability and residual plots look good. the intercept and slope for the least-squares regression line change from 4786.187469.17 is not changed. ˆ ˆ 10.8209 to 8039. (c) t = 5.29 with 49 degrees of freedom. so that age explains only about 3% of the variation in income. respectively.0005. We conclude that there is strong evidence that ρ > 0.835.03525. P-value < 0. 10. P-value = 0. This is strong evidence that β0 > 0. 10.1135. (b) b0 = 2. (b) Selling price = 4786.0005.51 with 48 degrees of freedom. (b) Older men might earn more than younger men because of seniority or experience. 454. respectively.17 H0 : ρ = 0.26 and 89.1%.Moore-212007 pbs November 27. 3. The correlation is positive.6% to 71.17 is not changed. Younger men might earn more than older men because they are better trained for certain types of high-paying jobs (technology) or have more education. Ha : ρ = 0. 10.8209 × Square footage. No one will purchase T-bills if they do not offer a rate greater than 0. The correlation is a numerical description of the linear relationship between Area and Forest. (b) Regression inference is robust against a moderate lack of Normality for larger sample sizes. 10. ˆ ˆ .54). At the upper (higher income) portion of each vertical stack the points are more dispersed.8 and 73.3029. r 2 = 0. so there is a positive association between age and income in the sample. which is very weak.21 (a) We see the skewness in the plot by looking at the vertical stacks of points.1135 × Age. 10.0001.23 (a) The intercept tells us what the T-bill percent will be when inflation is at 0%. (c) (0.8%. the predicted income increases by $892. This agrees with what we saw in the test using the slope in the 10. and the t statistic for the slope changes from 10.6% to 59. (d) (1.97.46 and 92.6476. 2007 9:50 S-50 SOLUTIONS TO ODD-NUMBERED EXERCISES there is a fairly linear (although weak) relationship between IBI and Area.3745 + 892. (b) r 2 decreases from 69.46 + 92. with only a very few at the highest incomes.33 (a) y ± t ∗ SEμ = (391. and the t statistic for the slope changes from 10.14. r = 0.696 and indicates that square footage is helpful for predicting selling price.061. there is approximately the same spread above and below the regression line on the scatterplot which is fairly uniform for all Area values on the plot. Ha : ρ > 0. 10.35). There is a statistically significant straight-line relationship between selling price and square footage. There is not enough evidence to say that the correlation is significantly different from 0 if a significance level of 0.19 (a) Each of the vertical stacks corresponds to an integer value of age.15 Area is the better explanatory variable for regression with IBI.29 (a) r 2 increases from 69.25 (a) r 2 = 0. 10.874.46 and 92. The slope tells us that for each additional year in age. the intercept and slope for the least-squares regression line change from 4786. 449. t = 10. 0.5039. so do not reject the null hypothesis.200261).31 (a) Growth is much faster than linear.6660 and SEb0 = 0. P-value < 0. There are many points in the bottom (lower income) portion of each stack.27 H0 : ρ = 0. IBI and Forest have a much weaker relationship than IBI and Area. (b) The plot looks very linear.3070. 10.05 is used. P-value < 0.
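The distinction drawn above between a confidence interval for the mean response and the wider prediction interval for a single future response comes down to two standard errors: s sqrt(1/n + (x* − x-bar)²/Sxx) for the mean response and s sqrt(1 + 1/n + (x* − x-bar)²/Sxx) for prediction, where Sxx is the sum of squared deviations of x. A from-scratch sketch on simulated data, assuming numpy and scipy; x* = 31 is an arbitrary illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(10, 70, 25)
y = 20 + 0.4 * x + rng.normal(0, 5, 25)     # simulated data

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # regression standard error
sxx = np.sum((x - x.mean()) ** 2)

xstar = 31.0                                # arbitrary new x value
yhat = b0 + b1 * xstar
se_mean = s * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / sxx)
se_pred = s * np.sqrt(1 + 1 / n + (xstar - x.mean()) ** 2 / sxx)
tstar = stats.t.ppf(0.975, n - 2)

print("95% CI for mean response:", (yhat - tstar * se_mean, yhat + tstar * se_mean))
print("95% prediction interval: ", (yhat - tstar * se_pred, yhat + tstar * se_pred))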

09944.1936. 10. F = 39. (b) (33. 53.0% PI.203). P-value = 7.496). (c) df = 137.47 SEb1 = 0.314). 10. latitude.874. r 2 = 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-51 10. There is strong evidence that the slope of the population regression line of profitability on reputation is nonzero.36% of the variation in profitability among these companies is explained by regression on reputation. 10.03996.140586). 10.0417. (b) This interval is wider because the regression output uses all 5712 men in the data set. (b) Under the column labeled 95.996).41 (58.885 − 0.111822.63 Answers will vary. but using the t table we would use df = 100 to be conservative.61 r 2 = 0.0001. Total SS 10. the pharmacy would be charging less of a markup. P-value for F statistic = 0.915. 10. ˆ 10.49 The hypotheses are H0 : β1 = 0. but the value of r 2 in part (b) tells us that only 19.3177. They are not exactly the same because the IBI and Forest did not have as strong a relationship as IBI and Area.Moore-212007 pbs November 27.78. which equals the value of R-square in the output. t = 6. and P-value = 0. It is too wide to be very useful.0% CI.57 (0.0001 = P-value for t statistic for the slope. ˆ ˆ (b) y = 2. 10. A 95% prediction interval for a future response estimates a single IBI for a 31 km2 area. This interval includes values larger than 0.51 (a) r 2 = = 0.780. 0.53 s = 19. 62.35 (a) (62. 10.37 The confidence intervals for mean response and the prediction intervals are both fairly similar. For testing H0 : β1 = 0.914 and t = 6.3745 + 892.1135 × Age = 51.446.43 (a) (44. 10. 0. (c) Under the column labeled 95.010). so we can reject the null hypothesis. There is strong evidence that (log) markup and (log) cost have a negative linear relationship (evidence that charge compression is taking place).041. 100.11428).295x where y is the predicted (log) markup and x is the (log) cost.797).08. we see that the desired interval is (−41.492) is the square of the t statistic (6.65 (a) For more expensive items.041) for the slope. the P-value is less than 0. (d) It would depend on the type of terrain and the location. .36% of the variation in profitability among these companies is explained by regression on reputation. (b) 19. 55. (b) s = Residual MS = 2. 10.39 (a) y = 24. √ F = t.005. 10.637.1801. 71.563E − 08. 10. so Steve can’t be confident that he won’t be arrested if he drives and is stopped.4489.55 (a) H0 : β1 = 0.45 A 90% prediction interval for the BAC of someone who drinks 5 beers is (0. Ha : β1 = 0.2534. Mountain regions.735. (c) Part (a) tells us that the test of whether the slope of the regression line of profitability on reputation is 0 is statistically significant. (c) A 95% confidence interval for mean response is the interval for the average IBI for all areas of 31 km2 . √ Regression SS 10.379. 10. we see that the desired interval is (49.579. while the one-sample t procedure uses only the 195 men of age 30. and proximity to industrial areas might make a difference. 145.59 The F statistic (36. Ha : β1 = 0. Ha : β1 < 0.

10. A two-sided test of the slope gives a t = 41. but not for women who work in large banks.79 (a) y = 106. (b) The ˆ moderately strong linear relationship is good. 10. For women who work in small banks. the P-value for the test of the hypothesis that the slope is 0 is 0. 0. where y is the predicted number of employees and x is the number of rooms.6) is 35. and the residual plot has a distinct funnel shape. and positive relationship between the number of students and the total yearly expenditures. the P-value for the test of the hypothesis that the slope is 0 is 0. We might use length of service to predict wages for women who work in small banks. but the outliers are not. (c) y = ˆ 101.8% of the variation in lean.589x. 10. The 95% confidence interval for the slope is (−0. the least-squares regression line of lean on year explains 99. However. (d) H0 : β1 = 0. 1.75 It appears that regressing wages on length of service will do a better job of explaining wages for women who work in small banks than for women who work in large banks.774 in the original model to 129. The standard error of regression is s = 4. The s drops from 188. The least squares regression ˆ ˆ line is y = 0.Moore-212007 pbs November 27. and the regression standard error is s = 1. (e) (0. 10.891.530x.526 + 0.67 (a) Some questions to consider: Is this a national chain or an independent restaurant? What kind of food is prepared? Do the restaurants have similar staffing and experience? (b) Answers will vary. so we cannot reject the null hypothesis. the normal probability plot does not look good.77 For women who work for large banks.73 The scatterplot shows a strong.3% in the model without the outliers.253.02 < P-value < 0. 10. where y is the predicted total yearly expenditure and x is the number of students.181. The new equation of the line is y = 72.6. This is a multiple of more than 8 times s and would be considered an extreme outlier from the pattern of the data from 1975 to 1987. (b) The R 2 drops from 60.70. .69 (a) The relationship looks positive. however. and moderately strong except for potential outliers for the two largest hotels (1388 and 1590 rooms each). linear.71 (a) Hotel 1 (1388 rooms) and Hotel 11 (1590 rooms) are the two outliers. which contains 0. r 2 = 99. so reject the null hypothesis.91066 meters. 10. linear.981 + 0. These results sound very promising. The assumptions for regression may not be appropriate here. Ha : β1 = 0 with t = 4.088.4%. (b) A plot of the data shows that these data closely follow a straight line.623 and a P-value close to 0. or 2.105.526 + 0. and 0.001.81 t = 2.287 and P-value from SPSS of 0. The 95% confidence interval for the slope from the original model did not contain 0. and the P-value is now 0. There is not enough evidence to say that the slope is significantly different from 0. There is strong evidence that there is a significant linear relationship between the number of rooms and the number of employees working at hotels in Toronto. R 2 is very good at 98.6.8%.151 in the model ˆ without the outliers.04. which tells us that for the years 1975 to 1987.2134.16. ˆ 10.775).514x.284). 2007 9:50 S-52 SOLUTIONS TO ODD-NUMBERED EXERCISES 10. The new t test statistic is 1.5% in the original model to just 26. the absolute difference between the observed value in 1918 (coded value of 71) and the value predicted by the least-squares regression line (coded value 106.0002.

000. Depth: fairly normally distributed.1 11. the unrounded values are s = 2.538 on the original scale).5 Pred.50 + 0.3 11.4 M 90 11 14 2.21 13..55 + 0..808 · weight + 25.445 − 10.44958 and s 2 = 6.308 1. one low outlier.00) + 120(43. The name given to s in the output is S.643 (compared to 0.21 Saw 6 has the most negative residual of all.9 11. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-53 CHAPTER . and the coefficient of assets is much smaller. 11. and Saws 2 and 11 both have negative residuals. .11 With General Motors and Wal-Mart deleted.533 on the original scale). 11. For SAS.238 log(assets) + 0.44958 and s 2 = 6. Weight plot looks good with possibly an outlier at Weight = 9.7 11. (b) The explanatory variables are the number of banks and deposits.84 11.416 s 43. The coefficient of sales has more than doubled. This would require less in the form of assets and increases the amount of sales.. no outliers. The correlation between log(profits) and log(assets) is 0. The name given to s in the output is Std.00045. Saw 8 has a positive residual. (b) Residuals vs. and the high outliers were eliminated from the other two plots. Error of the Estimate.744 0.674 · amps + 70.15 (a) Variable Price Weight Amps Depth x 91.00) + 150(38. The two high outliers are different than the high outliers in sales. the unrounded values are s = 2.030 · depth √ (b) 27. (c) p = 2.0553 sales. SPACE = 210 + 160(45. Residuals vs. ˆ 11.569 (compared to 0.. For SPSS.574. one low outlier.4 ft2 . (a) The response variable is bank assets.53 2.0765 Min 30 9 10 2. The name given to s in the output is Root MSE.Moore-212007 pbs November 27. no outliers.. Amps: very left-skewed.24) + 160(17. Residuals vs. (d) n = 54.. However. log(profits) = −1. The linear association between log(assets) and log(sales) appears much stronger.00) + 65(315. we cannot say that . profits = 1. 11.19 (a) A histogram shows that the residuals are fairly Normally distributed with no outliers.11 . For Minitab.000. Weight: fairly normally distributed.449581635 and s 2 = 6. 11.032 1. The name given to s in the output is Standard Error.533.455 on the original scale).17 (a) y = −303.00496 assets + 0.526 (compared to 0. Therefore.831 = 774.5 Max 150 13 15 2.2 Q1 50 11 12 2. The correlation between the explanatory variables log(assets) and log(sales) is 0.5 (b) and (c) Price: fairly normally distributed. Depth looks funnel shaped. The distribution of sales is skewed to the right with two high outliers. 11.4 Q3 140 12 15 2.478 log(sales). 11. the unrounded values are s = 2.000450185. Amps looks good with possibly an outlier at Amps = 10. The correlation between log(profits) and log(sales) is 0.. It is not surprising that Wal-Mart has high sales relative to its assets because its primary business function is the distribution of products to final users.450 and s 2 = 6.25) = 41... the unrounded values are s = 2.13 For Excel.

488.3619 . Package Excel Minitab SPSS SAS Coefficient 0. A scatterplot of Residuals vs. 11. Rank looks completely random. 27.9 St.361902429 0. cash items.5 20. The association is less strong with check items but this is due to an outlier. 11.160 Eng + 0. 11.33 Plot the residuals versus the three explanatory variables and give a Normal quantile plot for the residuals.00 5. 1.038 0.0478 Staff. (b) s = 5.8 (b) Use split stems.03755888 0.80 134 5. 11.dev.034315568 0.91 P-value 0.00 15. (b) s = 1.3 20.0 Q1 2.98 (b) There is a strong linear trend between gross sales and both cash and credit card items.883 + 0.27 All summaries are smaller except minimums.3 Max.68 Median 263. 11. 2007 9:50 S-54 SOLUTIONS TO ODD-NUMBERED EXERCISES there is a relationship between rank and the residuals.Moore-212007 pbs November 27.2 Median 8.dev.80 Q1 2.9 Q3 11.30 125 1. Largest effect is on the mean and standard deviation.30 125 1.00 316.37 HSS does not help much in predicting GPA.29 Share = 1.157 Assets and s = 3.03756 t-statistic 0.30 Max.91 . with high outliers. 1.00 St.162. 7.00 Min.914 0. There are no obvious problems in the plots.31 (a) Stemplots show that all four distributions are right-skewed and that gross sales.362 .5 9.63 293 12.03432 3. (c) All three distributions appear to be skewed to the right.30 11.74 886 76.96 794 48.501.35 (a) TBill98 = 0. Variable Share Accounts Assets Mean 6.60 909 38.25 (a) Share = 5.04 7.00663 Accounts + 0.362 0.50 2500 219.52 20. Variable Gross Cash Credit Check Mean 320.913647251 0.432E-02 0.27 Median 6.03432 Standard error 0.dev.85 509 15. 11. 12.50 132 5. 11.0828 Assets.60 392 13.80 602.3 19.16 − 0.07 7.00031 Accounts + 0. with math and English grades available for prediction.85 + 0. 180.90 909 38.1 11.35 Min. 4.80 14.70 Q3 10.138 Arch + 0.76 St.03756 . and credit items have high outliers.23 (a) Variable Share Accounts Assets Mean 8.

65 with degrees of freedom q = 3 and n − p−1 = 218. The degrees of freedom for the t statistics are 2215. we did not expect the confidence interval to include 0.0001. 11. (b) The null hypothesis should be H0 : β2 = 0. lower when there is a cosigner.2402) HSM. higher when there is a young borrower. (c) HSM. conclude that the explanatory variable is not useful in prediction when all other variables are available to use for prediction. total income. higher when there is a bad credit report. 2.41 The two models give similar predictions. young borrower. lower for a higher percent down payment.4415. 3. cosigner. df = 40 − 30 − 1 = 9.0234.7585). however.917. HSE: 12. lower for longer length loans. R 2 . values that are less than −1.0607 HSE.1%.624 + 0.0%. which has the F(2.76%. although HSE is more helpful without HSS in the model. Software gives P = 0. as more dual-earner couples sign up for DB plans.55 (a) The hypotheses about the jth explanatory variable are H0 : β j = 0 and Ha : β j = 0. HSM is still the most important variable in predicting GPA. 11. gives the proportion of the variation in the response variable that is explained by the explanatory variables. F = 13. 60) distribution.I. (1. lower when the borrower owns a home. .0% C.I.3%. Another possible correction would be: in each subpopulation. 11. −0.0% P.5%. R22 = 6. length of loan.84086 95.183 HSM + 0.Moore-212007 pbs November 27. higher for an unsecured loan. Very little of the variability in the response is explained by this set of 3 explanatory variables.96 or greater than 1. (b) R 2 = 11. Fit 1.2%.43 (a) HSM.001. unsecured loan.0806) 95. There is strong evidence that the coefficient of P5 is not 0.51 (a) The squared multiple correlation.6011.47 (a) t = −8. fewer would be signing up for DC plans. High school grades contribute significantly to explaining GPA when SAT scores are already in the model. P-value < 0. and years at current address.39 GPA = 0. y varies Normally with a mean given by the population regression equation. lower for those with higher total income. 11. If a t is not significant. HSE: 20.49 The difference in signs makes sense because employees generally sign up for either the DC or the DB plan. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-55 11. 11. (c) The interest rate is lower for larger loans. 11. The total number of dual-earner couples available to sign up for a plan probably stays about the same from year to year. bad credit report. HSS. (b) HSM. Almost all of the information for predicting GPA is contained in HSM. (e) HSM: 19. HSE: 20.6135.53 (a) F = 4.45 R12 = 21. (c) One of the assumptions for multiple regression is that the deviations εi are independent Normal random variables with mean 0 and a common standard deviation σ . percent down payment. and lower when the number of years at current address is higher. own home. HSE 11. Software gives P < 0. (0.96 will lead to rejection of the null hypothesis. (b) (−1. At the 5% level.15%. HSS: 20. (b) The significant explanatory variables are loan size. 11. Since we rejected the null hypothesis in part (a). (d) HSS.34%. and after removing the grade variables.
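The multiple regression quantities referred to throughout this chapter (fitted coefficients, the regression standard error s, R², the overall F statistic, and the individual t statistics) can be read off any regression routine's output. A sketch using statsmodels on simulated data, for illustration only; it is not the Excel, Minitab, SPSS, or SAS output quoted in these solutions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(50, 10, n)
x2 = rng.normal(5, 2, n)
y = 3 + 0.8 * x1 - 1.5 * x2 + rng.normal(0, 4, n)   # simulated response

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print("coefficients:", fit.params)
print("s (root MSE):", np.sqrt(fit.mse_resid))
print("R-squared:", fit.rsquared)
print("F =", fit.fvalue, " P =", fit.f_pvalue)
print("t statistics:", fit.tvalues)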

77 Because we have no data on smaller homes with an extra half bath. rank in years is useful when performing the same regression analysis without the data from 2002 salary.65 From the stemplot.236. σ . and unsecured loan. (b) The relationship between μ y and x is curved. 11.16.7 + 1351. values that are less than −1. s = 9035.63 y = 112.177 with degrees of freedom 2 and 13. . predicted price = $99. t = 0. P-value = 0. t = 2.73 The trend is increasing and roughly linear as one goes from 0 to 2 garages and then levels off. 11. however. (c) b0 = 30.915.109.109. 11.43. (b) t = 4. predicted price = $97. reject the null hypothesis. 11. percent down payment.79 (a) The relationship between μ y and x is curved and increasing. (c) Years in rank included with the 2002 salary produced these results: coefficient = 57. years in rank is not very useful in predicting 2005 salary. (c) The interest rate is lower for larger loans.017. Without an extra half bath. but the coefficient for years in rank is not.563.69 For 1000-square-foot homes. For 1500-square-foot homes.915.96 will lead to rejection of the null hypothesis. do not reject the null hypothesis.783.62. 11. predicted price = $79. βyears . ˆ 11. 11. b year s = 57.67 1000-square-foot homes.750.981. 11.47.392.837x.59 (a) y varies Normally with a mean μGPA = β0 + 8β1 + 9β2 + 7β3 . We conclude years in rank and 2002 salary contain information that can be used to predict 2005 salary. the two sets of predictions are fairly similar. lower for a higher percent down payment.6%.000 from software. (c) The relationship between μ y and x is curved and decreasing.71 The pooled t test statistic is t = 2. length of loan. Three of these are clearly outliers (the three most expensive) for this location.725.621. β2002 .01. (b) The GPA of students with an B+ in math. 11. lower for longer length loans.Moore-212007 pbs November 27. For 1500-squarefoot homes.719. The difference in price is $15.253. There is evidence the coefficient for 2002 salary is significantly different from zero. and B in English has a Normal distribution with an estimated mean of 2. At the 5% level. our regression equation is probably not trustworthy. first decreasing and then increasing.392.71. 11. When included with the data concerning 2002 salary.75 With an extra half bath. For years in rank.61 (a) yi = β0 + β2002 x2002i + βyears xyearsi + εi .875 with df = 35 and 0. Years in rank are useful for predicting 2006 salary. six of these are the six most expensive homes. (d) F = 20. (b) The statistically significant explanatory variables are loan size.134. Both tests use df = 13.005 < P-value < 0. (e) R 2 = 75. 11. predicted price = $96. (b) β0 . Overall. predicted price = $78. P-value is close to 0.57 (a) The hypotheses about the jth explanatory variable are H0 : β j = 0 and Ha : β j = 0. predicted price = $84. P-value = 0.96 or greater than 1. (f) For 2002 salary. P-value = 0.585. and P-value = 0. t = 0. and higher for an unsecured loan. 2007 9:50 S-56 SOLUTIONS TO ODD-NUMBERED EXERCISES 11. b2002 = 0.518.865. The degrees of freedom for the t statistics are 5650.243. These results agree (up to roundoff) with the results for the coefficient of Bed3 in Example 11.041. A− in science.

and after removing the squared term. then 5. 2 high outliers Forest 39. 0. (b) F = 1.393. 0. 0. (b) 0. df = 7.29 17.2346 3. Area and Forest have a negative relationship. no outliers (b) All relationships are linear and moderately weak. 0. fairly linear. then 20%. but the others are positive. the R 2 increases to 61.284.000021. 0. weak.780. 11.225.1760 3. 0.09 with 1 and 13 df.689. 0. 0.540 − 0. 0. no outliers IBI 65. Therefore.809. (d) The variables Account and the square of Account are highly correlated. Software gives P = 0.000034 Account2. Data point #40 looks like a . Promotions 1 3 5 7 10 4. and R 2 and s are very close to what they were in the original model.05. Promotions: negative.89. 0. then 3.251.61 − 0. then 7. The squared term does not contribute significantly to explaining salary.87 (a) R12 = 65. Description of stemplot 28. dev.280 Left skewed. Price vs.001 Discount2 .2699 20 4.058. 11.756.2040 4.524. the difference in means (Group B − Group A) = 3 = the coefficient of x. which agrees with F up to rounding error.91 In the previous exercise.76.Moore-212007 pbs November 27. then 30%. (c) t = 3. the s decreases to 0. then 40%.85 (a) Assets = 7.2648 4.10.060 Discount+ 0. 0.920. 0. 11. and the P-value is still very close to 0. Discount: negative.76.2685 4.94 18.204 Right skewed. The best model is the one with the quadratic term for ˆ discount: y = 5.000034 ± 0.2511.2331 4.097.89 (a) Price vs. the F test statistic is 81. (c) t = −1. moderate.2407 30 4.2618 40 4.2429 4. the residual plot for promotions looks ok but the residual plot for discount seems to have a curved shape. If the quadratic term is removed and the interaction of discount and promotion is used instead.423. the 1 promotion yields the highest expected price. The difference in the intercepts (Group B − Group A) = 120 − 80 = 40 = coefficient of x1 .48%.93 (a) Area Mean St. 11.81 For part (a). All coefficients are significantly different from 0. try the quadratic term for discount and the interaction of discount and promotion.3856 4.39 32. the coefficient for the interaction term is not significant. 0.83 For part (a).38%.2707 4. The quadratic term is useful for predicting assets in a model that already contains the linear term. (b) The table below shows the mean and standard deviation for expected price for each combination of promotion and discount.2144 (b) and (c) At every promotion level.3155. R22 = 62. and t 2 = 1. 0.714 Right skewed. and P = 0. linear.1629 3. 11. the difference in slopes (coefficient x2 for Group B − coefficient x2 for Group A) = 19 − 7 = −12 = the coefficient of x1 x2 .007.0046 Account + 0.094.102 Promotion − 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-57 11. 0.1520 4.1%. This can be verified directly for the other parts and shown to be true in general. For every discount level. 0. If the quadratic term for discount is used.269. the 10% discount yields the highest expected price.1848 4. This can be verified directly for the other parts and shown to be true in general. 11.

(b) y = 1.776. 1 low and 1 high outlier Fairly symmetric. 1 high outlier Fairly symmetric. (h) Yes.000.4028 0. All the explanatory variables are significantly correlated with PCB.600 PCB118 + 4. 1 low and 2 high outliers Right skewed. but ignoring data is not a good idea.3494 0. 11.8627 4. 7 high outliers (b) All the variables are positively correlated with each other. There are two new potential outliers though at #44 and #58.7%. 11. but the Normal probability plot looks good. This model has R 2 = 97. s = 4. (c) Confirms what we did in (a) and (b). (c) yi = β0 + βArea xAreai + βForest xForesti + εi .3906 1.103 Answers will vary. (b) The error terms are all 0. no outliers 11. only 35.234 Forest. so σ = 0.3483 0. 11.3592 0. (e) R 2 = 35. 6 high outliers Skewed right.504 0.95 (a) PCB PCB52 PCB118 PCB138 PCB180 Mean 68. 0.4%.3717 1.628 + 14.3354 −1. All coefficients are significant again.7009 0.101 (a) Results will vary with software. 2007 9:50 S-58 SOLUTIONS TO ODD-NUMBERED EXERCISES potential outlier on the Area/Forest scatterplot.8576 0.0191 5.701 0.478). All coefficients are significant. although PCB52 is the most weakly correlated with PCB and with the other explanatory variables. 1 low and 1 high outlier Fairly symmetric. P-value = 0. 1 low outlier Fairly symmetric.2563 6. We had general linear relationships between the variables and the residuals look good. no outliers Fairly symmetric.2591 Description of stemplot Fairly symmetric. The residuals look fairly Normally distributed.97 (a) #50 and #65 are the two potential outliers.054 PCB138 + 4.4235 −0.3914 0. but not anyplace else.7% of the variation in IBI is explained by Area and Forest.5% and .685. (b) Most software should ignore these data points. #50 (residual = −22. (c) Table below uses Base 10 logs. s = 14.555. 5 high outliers Skewed right.5167 0.7397 0. dev. 5 high outliers Skewed right.1584 St. Ha : The coefficients are not both 0.5793 −0. no outliers Right skewed. (g) A histogram of the residuals looks slightly skewed left.629 + 0. P-value = ˆ 0.0864) is the ˆ overestimate.Moore-212007 pbs November 27. 5 high outliers Skewed right.4918 0.3495 St. Log of variable PCB138 PCB153 PCB180 PCB28 PCB52 PCB126 PCB118 PCB TEQ Mean 0. Only the correlation between PCB52 and PCB180 is not significant (P-value = 0.9864 Description of distribution Skewed right.5983 3.569 Area + 0. F = 2628. 16 high outliers Fairly symmetric. but if you start with the full model and then drop one variable at a time. dev.99 (a) β0 = 0. R 2 = 99. y = 40. 11.442 PCB52 + 2. (f) Residual plots look good—random and no outliers. β1 = β2 = β3 = 1 because TEQ = TEQPCB + TEQDIOXIN + TEQFURAN. (d) H0 : βArea = βForest = 0. one good possibility is leaving out Log PCB126 because all the coefficients are significant at the 5% level. However.9580 3. and the residual plots look random.4674 0. F = 12. 59.8268 4.109 PCB 180.972.

3 Days + 0.09.212 + 0.8% and s = 0. 11. 152.119 For the linear model.87.90 from the previous model.109 (a) Vitamin C = 46 − 6. Neither the coefficient for x1 or for x2 is significant.107 (a) R 2 = 18.214 + 0. its coefficient is significant. (c) It appears that there is an increasing and then decreasing effect when looking at the residuals plotted against year. R 2 = 0.108 Log PCB28 + 0.000503. (c) This isn’t always the best approach either.08 and df = 7. While the t statistic shows that the coefficient on the variable LOS is significantly different from zero.29.1 − 11. Ha : At least one coefficient is not equal to zero. (c) R 2 = 0.045 × Year2 + 3.059. The P-value = 0. (c) Vitamin C = 50.5 + 0. (b) The residuals appear fairly Normal.2+4. t = −10. Multiple regression is complicated.82. It appears that the squared term is significant in the model. (b) All correlations are significant.8×Soybean yield. 11.62.Moore-212007 pbs November 27.117 (a) Corn yield = −607. (b) It appears that the relationship may be slightly curved. df = 8.41 with a 95% prediction interval of (114. 11. t statistics have small P-values.14 with a 95% prediction interval of (101. If the students choose a lower significance level.04. it does not appear that this is a strong linear model.151 Log PCB118. t = 16. a ˆ good model with only 4 variables would be: y = 1. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-59 ˆ s = 0.9 × Soybean yield.101 Log PCB52 + 0. This model has an R 2 = 97.73 with a P-value = 0.287 + 0. 45). P-value = 0. t = 1. LOS does not explain more than 12% of the variation in wages. the t statistics for the coefficients are: t = −5. 11.152 Log PCB153 + 0.091 Log PCB28 + 0. (b) H0 : All coefficients are equal to 0. For the quadratic model. (d) In the order they appear in the model. 11.087 Log PCB52. It makes sense to include year in the regression model. predicted yield for 2001 was 127. P-value = 0.400 Log PCB138+ 0. F = 233.668 Log PCB138 + 0.152.67 × LOS + 71.12. 11. The least-squares regression line is: y = 1. t = 2.95.103 Log PCB52.111 Wages = 349 + 0. If just x1 is used.006. t = 9. This model has an R 2 = 96. 11. R 2 = 0.61). P-value = 0. If just x2 is used. 158.0638. predicted yield for 2001 was 136.105 Answers will vary. df = 3 and 36.088 Log PCB28 + 0. .05 Days.86.132 Log PCB180 + 0. The residuals show a systematic pattern in that the values go from positive to negative to positive as the number of days increases.1421. This indicates that the model provides significant prediction of corn yield.763 Days2 . compared to 0. Choose the model with the squared term.5 + 0. A model with 3 variables would be: ˆ y = 1.115 (a) Corn yield = −46. (d) R 2 without the squared term is 0.3x Year −0. days in the second model shows that there is no systematic pattern.512 with a P-value = 0. F = 3. its coefficient is significant. 11.0600. (e) The residuals all appear random when plotted against the explanatory variables. The scatterplots show that both x1 and x2 have positive linear relationships with y and with each other. s = 1.6 LOS.113 Wages = 302. The squared term t statistic is 6. The linear model gave a closer prediction to actual.9%.0581. A look at the scatterplot of residuals vs.99.821 Log PCB138 + 0.82 with a P-value = 0.93 and with the squared term is 0.2% and s = 0. P-value = 0. It appears there is a significant linear relationship between vitamin C level and the number of days after baking. P-value = 0. R 2 = 0.144 Log PCB153 + 0.8 × Size.

0.155 HSM + 0.123 The residuals do not show any obvious patterns or problems.05 GHSM − 0. A Pareto chart indicates that the hospital ought to study DRGs 209 and 116 first in attempting to reduce its losses.129 (a) GPA = 0. Based on the large P-value associated with the SATM coefficient.7 . a malfunction of the coffeemaker or a power outage.05 GHSS − 0.966. The plot of SATM vs. a serious mismeasurement of the amount of coffee used or the amount of water used. (d) F = 0. (b) H0 : All coefficients are equal to 0. F = 26.319. SATM: (−0. The t statistics for each coefficient have P-values greater than 0.. variation in the amount of water added to the coffeemaker. (c) Data set B comes from a process that is in control.. (b) In the chart for data set A one finds that samples 12 and 20 fall above UCL.666 + 0. the interval describing SATM coefficient does contain zero.10 for all except the explanatory variable HSM.127 Looking at the sample of males only shows the same results when compared to the sample of all students.5 12. Yes.20) and while HSM is a significant predictor. Ha : At least one coefficient is not equal to zero. (f) R 2 = 0. GPA does not show any obvious outliers.1 A flowchart for making coffee might look like Measure coffee ∅ Grind coffee ∅ Add coffee and water to coffeemaker ∅ Brew coffee ∅ Pour coffee into mug and add milk and sugar if desired.. CHAPTER .067 Gender + 0.184 (compared to 0.012 GHSE. (d) HSM: t = 5.999. 0. 11.703. In the chart for data set B no samples are outside the control limits.125 GPA = 0.1942. In the chart for data set C.. and the use of milk that has gone bad.8. Some sources of common cause variation are variation in how long the coffee has been stored and the conditions under which it has been stored.. one should conclude that SATM does not provide significant prediction of GPA. In making a cause-and-effect diagram.63 and P-value = 0. R 2 = 0.99. and variation in the amount of milk and/or sugar added. This means that we assume the model is not a significant predictor of GPA and let the data provide evidence that the model is a significant predictor. UCL = 11.2. P-value = 0. The plot of SATV vs.143. 11. consider what factors might affect the final cup of coffee at each stage of the flowchart. (e) s = 0. P-value = 0.1 we described the process of making a good cup of coffee. P-value = 0.5.582 + 0. variation in the length of time the coffee sits between when it has finished brewing and when it is drunk..3 12. variation in how finely ground the coffee is. (c) HSM: (0. (a) CL = 11.044 HSE + 0..00061 SATM.121 The scatterplots do not show strong relationships.Moore-212007 pbs November 27. LCL = 11.193 HSM + 0. samples 19 and 20 are above UCL.. 11. SATM: t = 0. The 9 DRGs account for 80.1295. (b) Verify.2565).5% of total losses.050 HSS + 0. variation in the measured amount of coffee used. interruptions that result in the coffee sitting a long time before it is drunk. This model provides significant prediction of GPA. 2007 9:50 S-60 SOLUTIONS TO ODD-NUMBERED EXERCISES 11. 12. (c) Verify.00059. Some special causes that might at times drive the process out of control would be a bad batch of coffee beans.12 . GPA shows two possible outliers on the left side.. This indicates there is no reason to include gender and the interactions.001815). Data set A comes 12. In Exercise 12. 11. HSS and HSE are not. .

13 My main reasons for late arrivals at work are “don’t get out of bed when the alarm rings” (responsible for about 40% of late arrivals). alarm rings ∅ 5:45 A. a given sample is equally likely to be from the experienced clerk or the inexperienced clerk. LCL = (c4 − 2c5 )σ . and “slow traffic on the way to work” (responsible for about 15% of late arrivals). UCL = 0. s = 8.044.7 we see that most of the points ˆ ˆ lie below 40 (and more than half of those below 40 lie well below 40). 12. CL = 43..M. Second sample: x = 46. leave home ∅ 6:50 A.31 99.Moore-212007 pbs November 27.M. CL = 11. (b) In Figure 12. .9. inspecting samples of the monitors it produces and fixing any problems that arise.98.M. The desired probability is 0.27 By practicing statistical process control. “too long showering. (b) For an s chart. LCL = 0. For the s chart we note that UCL = 25. 12. 12. that x will be either larger than UCL = 713 or smaller than LCL = 687. 12.87662.02. s = 13.1591.25 (a) μ = 275. Thus. arrive at office. 12.15 For the s chart. get out of bed ∅ 5:46 A.94.9 CL = 0. start coffeemaker ∅ 5:47 A.M. √ √ 12..M. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-61 from a process in which the mean shifted suddenly.23 Presumably.. Incoming inspection is thus redundant and no longer necessary.46065. “too long eating breakfast” (responsible for about 25% of late arrivals). For Normal data the natural tolerances are trustworthy. The s chart shows a lack of control at sample point 5 (8:15 3/8). For the x chart. UCL = 0.M. For the x chart UCL = 60.5. the manufacturer is. LCL = 0.8750. all but one (Sample 12) are only slightly larger than 40.M. 12.06% of monitors will meet the new specifications if the process is centered at 250 mV..19 We want to compute the probability that x will be beyond the control limits. 12. which is consistent with the estimate of σ in part (a)....87338. 12. and which clerk a sample comes from will be random.21 (a) 2σ control limits are UCL = μ + 2(σ n).11 5:45 A.03. The s chart suggests that typical values of s are below 40. 12. LCL = 0. that is. so based on our Normal quantile plot.0023568. shaving. breakfast ∅ 6:30 A. Data set C comes from a process in which the mean drifted gradually upward. in essence.17 First sample: x = 48.09.M.001128. the 2σ control limits are UCL = (c4 + 2c5 )σ . shave. and dress ∅ 6:15 A. σ = 37. and the pattern of large and small values should appear random. Of the points above 40.065. brush teeth ∅ 6:35 A. and dressing” (responsible for about 20% of late arrivals). Both types should occur about equally often in the chart.. We would want to find out what happened at sample 5 to cause a lack of control in the s chart. 12. but otherwise neither chart shows a lack of control. LCL = 25. 12. CL = 0. LCL = μ − 2(σ n). shower. the natural tolerances we found in the previous exercise are trustworthy. park at school ∅ 6:55 A. The control charts the manufacturer creates are a record of this inspection process. LCL = 0. UCL = 1. CL = 0.29 The plot shows no serious departures from Normality. the x chart should display two types of points: those that have relatively small values (corresponding to the experienced clerk) and those with relatively large values (corresponding to the inexperienced clerk)..M.
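The x-bar and s chart limits used in Exercises 12.15 through 12.21 follow directly from the process parameters. A short sketch of the arithmetic, with illustrative values of the mean, standard deviation, and sample size that are not taken from any particular exercise:

import numpy as np

# Illustrative process parameters (not from a specific exercise).
mu, sigma, n = 700.0, 12.0, 4

# x-bar chart: center line with 3-sigma limits; Exercise 12.21 uses 2-sigma limits instead.
CL_x = mu
UCL_x = mu + 3 * sigma / np.sqrt(n)
LCL_x = mu - 3 * sigma / np.sqrt(n)

# s chart: limits use the constants c4 and c5 (tabulated by sample size);
# for n = 4, c4 is about 0.9213 and c5 is about 0.3889.
c4, c5 = 0.9213, 0.3889
CL_s = c4 * sigma
UCL_s = (c4 + 3 * c5) * sigma
LCL_s = max(0.0, (c4 - 3 * c5) * sigma)

print(CL_x, LCL_x, UCL_x)
print(CL_s, LCL_s, UCL_s)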

They may take more breaks for water or become tired more quickly and thus accomplish less. resulting in control limits that are too wide to effectively signal unusually fast or slow times. ˆ ˆ ˆ ˆ 12. If it is the temperature of the office. 12.085. as the temperature in the office gradually increases. Thus.303. where the process is relatively stable). LCL and UCL for x are the lower and upper control limits for the means of samples of several individual measurements on the output of a process. We would expect to see this reflected in a sudden change (decrease) in the level of an s or R chart. we should consider only the process variation (variability in recent times. 12. There are thus two sources of variation in the data.34 should be considered only approximate. (c) The capability index formulas make sense for normal distributions. to determine if times in the next few years are unusual.Moore-212007 pbs November 27. because of the lack of precision.39 (a) If all pilots suddenly adopt the policy of “work to rule” (and are now more careful and thorough about doing all the checks). we should probably view these data as only approximately Normal and thus the calculations in Exercise 12. These horizontal bands correspond to observations having the same value. With greater precision. One is the process variation that we would observe if times were stable (random variation that we might observe over short periods of time).16% of meters meet specifications.43 LSL and USL define the acceptable set of values for individual measurements on the output of a process. these observations would undoubtedly differ and would probably appear to come from a Normal distribution.41 The winning time for the Boston Marathon has been gradually decreasing over time. . They represent the actual performance of the process when it is in control. 12. Cpk = 1. (b) Cp = 0. However. This would be reflected in a gradual shift up on the x chart.17. In using the standard deviation s of all the times between 1950 and 2002. the temperature will gradually increase as outdoor temperature increases.74%. we include both of the sources of variation. one would expect to see a sudden increase in the time to complete the checks but probably not much of a change in the variability of the time.651. If the measurement of interest is the number of invoices prepared. the workers gradually become less comfortable. and the other is variation due to the downward trend in times (the large differences between times from the early 1950s compared to recent times).47 (a) Cpk = 1. Using s overestimates the process variation. (b) 99. then as time passes. We might expect to see a gradual decrease in the number of invoices prepared and hence a gradual shift down in the x chart. (b) Presumably samples of measurements made by the laser system will have a much smaller standard deviation than samples of measurements made by hand.869. 50% meet specifications. Cpk = 0.33 The natural tolerances for the distance between the holes are μ ± 3σ = 43. These limits represent the desired performance of the process. 2007 13:26 S-62 SOLUTIONS TO ODD-NUMBERED EXERCISES 12. 12. (c) Presumably the measurement of interest is either the temperature in the office or the number of invoices prepared.41 ± ˆ ˆ 37. but for distributions that are clearly not normal they will give misleading results.37 The large-scale pattern in the plot looks Normal in that the centers of the horizontal bands of points appear to lie along a straight line. However. 12.45 (a) Cp = 1. 
we would expect to see a sudden change in level on the x chart. resulting from the limited precision (roundoff). 12.35 About 43.

02. and hence USL and LSL will be at least 6σ from μ.57 If the same representatives work a given shift. Cpk = 0. a much wider range than the specification limits of 54 ± 10).63 UCL = 0.53 (a) 17. and calls can be expected to arrive at random during a given shift.55 Cp = USL − LSL 6σ .01 (Sample 46). 12.4043 and LCL = 0. the process mean and standard deviation should be stable over the entire shift.39.750. CL = 0. and the process variability is large (we saw in Exercise 12. we would expect to see 4 defective orders per month.Moore-212007 pbs November 27. and no obvious trends. (b) UCL = 0.008 and if the manufacturer processes 500 orders per month. It estimates what the process is capable of producing if the process could be centered. then the process may not be stable over the entire shift. If different representatives work during a shift. and random selection should lead to sensible estimates of the process mean and standard deviation.67 (a) p = 0. There may be changes in either the process mean or the process standard deviation over the shift. LCL = 0.68. CL = 0.008. and thus it would be more reasonable to time 6 consecutive calls.33 that the natural tolerances for the process are 43. this is called six-sigma quality.0334. The reasons are that the process is not centered (we estimate the process mean to be 43. 12.8. the values of s become s = 9.49 Cp is referred to as the potential capability index because it measures process variability against the standard provided by external specifications for the output of the process. or if the rate of calls varies over a shift. ˆ ˆ 12. so if Cp ≥ 2 we must have USL − LSL ≥ 22σ . 12. If the process is properly centered. Thus. LCL = 0. It estimates what the process is capable of if the process target is centered between the specification limits. 12. When the outliers are omitted. The capability is ˆ ˆ poor (both indexes are small).0435.17. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-63 12. (b) For the p chart.41 ± 37. resulting in a value of s = 93. (c) For October. p = 0. σ = 12.20. 12.59 The three outliers are a response time of 276 seconds (occurring in Sample 28).61 (a) The total number of opportunities for unpaid invoices = 128. s = 31.71 (Sample 42). (b) Cp = 0. resulting in a value of s = 107. Stability would more likely be seen only over much shorter time periods.0233. The process (proportion of students per month with 3 or more unexcused absences) appears to be in control.65 (a) p = 0.27. n = 921.3087.4027 and LCL = 0. In this case. UCL = 0. s = 6.41. UCL = 0. A random sample from all calls received during the shift would overestimate the process variability because it would include variability due to special causes.28 (Sample 28).020.53. and a response time of 333 seconds (occurring in Sample 46). CL = 0. a response time of 244 seconds (occurring in Sample 42). Cpk is referred to as the actual capability because index because it considers both the center and the variability against the standard provided by external specifications for the output of the process.356. (b) Cpk = 0. 12. That is. UCL = 0.356. then its mean μ will be halfway between USL and LSL. These exact limits do not affect our conclusions in this case. CL = 0.06. 12. For June.0334.00599.78% of clip openings will meet specifications if the process remains in its ˆ current state.4033.3093.3077. .41. then it might make sense to choose calls at random from all calls received during the shift.51 (a) μ = 43. LCL = 0.01307. 12. but the midpoint of the specification limits is 54). 
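The capability indexes in Exercises 12.45 through 12.55 compare the specification limits with the process spread. A sketch of the two formulas, using made-up specification limits and process estimates:

# Made-up specification limits and estimated process mean and standard deviation.
LSL, USL = 44.0, 64.0
mu_hat, sigma_hat = 54.0, 3.0

Cp = (USL - LSL) / (6 * sigma_hat)                        # potential capability
Cpk = min(USL - mu_hat, mu_hat - LSL) / (3 * sigma_hat)   # actual capability

print(round(Cp, 3), round(Cpk, 3))
# When the process is centered (mu halfway between LSL and USL) the two indexes agree;
# "six-sigma quality" (Cp >= 2) requires USL - LSL to be at least 12*sigma.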
resulting in a value of s = 119. there are no months with unusually high or low proportions of absences. (b) UCL = 0.

and LCL = 0.79 (a) With p = 0. then in a sample of 100 films the expected number of unsatisfactory films is 0. A better estimate would be to compute the sample standard deviation s of all 22 × 3 = 66 observations in the samples. (b) Cpk depends on the value of the process mean.508. If the process mean can easily be adjusted.896.71 (a) Presumably we are interested in monitoring and controlling the amount of time the system is unavailable.. After the first sample. 12. and LCL = 0.. the number of defective orders must be greater than 10..718. 4. most of our samples will have no unsatisfactory films. Attempts 2. 13. This value tells us that the specification limits will lie just within 3 standard deviations of the process mean if the process mean is in the center of the specification limits. (c) We might examine samples of programming changes and record the proportion in the sample that are not properly documented. The control limits would be UCL = 0. ˆ 12. the new limits for future samples should be UCL = 0.. sales are lowest for the first two quarters and then increase in the third and fourth quarters. Sales decrease from the fourth quarter of one year to the first quarter of the next. 12. 12. (b) If the proportion of unsatisfactory films is 0. UCL = 0. To be above the UCL. CL = 0.. . and 10 are above the UCL. CHAPTER .019. 12. and LCL = 0.294. 2007 9:50 S-64 SOLUTIONS TO ODD-NUMBERED EXERCISES LCL = 0.14.73 CL = 7. . Thus.702. 12..003. LCL = 0. (b) To monitor the time to respond to requests for help we might measure the response times in a sample of time periods and use an x and an s chart to monitor how the response times vary. (b) The new sample proportions are not in control according to the original control limits from part (a). perhaps reflecting an initial lack of skill or initial problems in using the new system.642. the process is in control with respect to short-term variation.75 (a) Cp = 1. Using s/c4 is likely to give a slightly too small estimate of the process variation and hence a slightly too large (optimistic) estimate of Cp . This gives an estimate of all the variation in the output of the process (including sample-to-sample variation).69 (a) The percents add up to larger than 100% because customers can have more than one complaint. (b) The pattern described is obvious in the time plot.1 (a) Each year. the process variation. so using Cp is ultimately more informative about the process capability.13 . UCL = 19.Moore-212007 pbs November 27. A better measure of the process capability is to center the process mean within the specification limits and then compute a capability index. Cpk and Cp will be the same. although sample 10 is very close to the upper control limit. (c) We used s/c4 to estimate σ . We might measure the time the system is not available in sample time periods and use an x and an s chart to monitor how this length of time the system is unavailable varies. (b) The category with the largest number of complaints is the ease of obtaining invoice adjustments/credits.506.003. we would use a p chart. the process does appear to be in control.65. it is easy to change the value of Cpk . and plotting the sample values (most of which will be 0) will not be very informative. If the process is properly centered.77 (a) We would use a p chart.. With new p = 0..3. so we might target this area for improvement. Because we are monitoring a proportion.. The first sample is out of control.

(c) There is usually a peak in January. there is a dip. and 1. 24. (b) The first quarter of 2002 is $8871.13 (b) There is no obvious increase or decrease overall.231 tells us that fourth-quarter sales are typically 23. 13.3 (a) Sales = 5903.15 (a) The dashed line in the time plot corresponds to the least-squares line. (d) The pattern described in part (a) is repeated year after year. (d) There are two temporary troughs: a big one from January 1991 to November 1994 and a smaller one from January 1998 to January 2003. (c) The biggest dips come in August each year. and hence must be in the fourth quarter.231 to account for the fact that the trend-only model typically underpredicts the fourth quarter. (a) Seasonally adjusted series is a little smoother. The fourth quarter of 2002 is $9228. fairly linear trend over time. (b) If we know that X 1 = X 2 = X 3 = 0. (a) The seasonality factors are 0. This suggests that the seasonal pattern in the DVD player sales data is not as strong as it is in the monthly retail sales data. and then an increase until the next January each year. and then there is another major (but not quite as big) dip in December each year. The fourth-quarter seasonality factor of 1. 0. . The fourth quarter of 2002 is $11. but it’s just not as dramatic as for all the other years.885. Using the trend-only model. then we know that we are not in any of the first three quarters. There could also be considered a temporary peak at the very end of the data set where there is a steep increase. but there is lots of month-tomonth variability. while the fourthquarter forecast has been multiplied by 1. (c) In the past. However.17 (a) The first quarter of 2002 is $8188.75x with sales in millions of dollars and x takes on values 1. so that might help explain why the August dip isn’t as big as expected.9 13.22 million. which is close to 1. (a) Sales = 7858. (b) The average is 0. In August 2002. respectively. 13.94 million.97 million. (b) Seasonally adjusting the DVD player sales data smoothed the time series a little but not to the degree that seasonally adjusting the sales data in Figure 13. (b) The first-quarter forecast has been multiplied by 0. 13.76 + 99.21X 1 − 2564.923 as the trend-only model typically overpredicts the first quarter.1% above the average for all four quarters. There are smaller dips in June each year. (b) The fourth quarter of 1995. .359. we find that the sales for the first three quarters tend to be overpredicted and those for the fourth quarter are underpredicted. 0.79X 3. a drop until April.54x − 2274. 13. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-65 (c) There appears to be a positive trend. although the trend levels off after 1998. 13.58X 2 − 2022. (c) The trend-andseason model and the trend-only model with seasonality factors give similar predictions as both are adjusting the trend model for the seasonality effects. the June 2002 dip was bigger than all the other June dips.22 + 118. so the first quarter of 2002. . (c) The plot mimics the pattern of seasonal variation in the original series. 2. . The trend-and-season model explains much more of the variability .999.231 for quarters 1 through 4.Moore-212007 pbs November 27. (c) The slope is the increase in sales (in millions of dollars) that occurs from one quarter to the next.7 did. 13.83 million.19 (a) For the trend-only model R 2 = 35% and for the trend-and-season model R 2 = 86. predictions in the first three quarters tended to be slightly more accurate than in the fourth quarter.923.8%.960.7 13.5 13. 
(c) The intercept again represents the fourth quarter of 1995.11 (b) There is a positive.
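The trend-only forecasts in Exercises 13.15 through 13.17 are adjusted by multiplying by a quarterly seasonality factor. A sketch of that adjustment; the trend equation and factors below are illustrative stand-ins rather than the exact fitted values:

# Illustrative trend-only model and quarterly seasonality factors.
def trend(x):                        # x = quarter number (1, 2, 3, ...)
    return 5900.0 + 100.0 * x

factors = [0.92, 0.96, 0.89, 1.23]   # quarters 1-4; a Q4 factor above 1 raises that forecast

def trend_and_season(x, quarter):
    """Multiply the trend forecast by the seasonality factor for that quarter."""
    return trend(x) * factors[quarter - 1]

print(trend_and_season(29, 1))       # first quarter of a future year
print(trend_and_season(32, 4))       # fourth quarter of the same year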

The sales in July 2001 are 5170 thousands of units.03125.21 (a) There are two very large temporary troughs (approximately) between July 1990 and July 1995 and then between April 1997 and November 2003. s = 1170.29 (a) The 12-month moving average forecast predicts the November 2002 price by the average of the preceding 12 months. The correlation between et−1 and et is 0.25 (a) The other group of 10 points has December as the y coordinate. 0. 0. 0. 13.0000001. and .9 + 0. and most of the residuals for the first three quarters are negative.009. 0. The constant for the AR(1) model is much smaller than the constant (intercept) for the simple linear regression model.31. Averaging the last 12 values in the series gives the 12-month moving average forecast as $3. Yes.996yt−1 .0039062. so the forecast of August 2001 sales using this model is 5162.985. but neither captures the sharp rise in price that occurred over the latter part of 2002.0000009. 0.0019531.1. The AR(1) model is preferred because it was estimated using maximum likelihood.125.Moore-212007 pbs November 27.081. 0. 2007 9:50 S-66 SOLUTIONS TO ODD-NUMBERED EXERCISES in the JCPenney sales. 13.0009766.27 (a) Fitting the simple linear regression model using yt as the response variable and yt−1 as the predictor gives the equation yt = 34.25. The 120-month moving average forecast predicts the November 2002 price by the average of the preceding 120 months. (c) When w = 0.9.015625.9 the coefficients are 0.39.23 (a) The residuals for the fourth quarters are positive. 0. positive. 0. 0.095.5 the coefficients are 0. If there were a seasonal adjustment. and 0. 0.0387420.28. 0. 0. which is preferred over least-squares for time series models. and for the trendand-season model. (d) It is clear from the plot that the trend-and-season model is a substantial improvement over the trend-only model. 0. 0.31 (a) When w = 0. 0. 13. 13.22. 0.06561. 0. 0.09. so the forecast of August 2001 sales using this model is 5163. (b) Autocorrelation is not apparent in the lagged residual plot. The 120-month moving average forecast is slightly better.4573.64.1 the coefficients are 0.992yt−1 .0531441. Averaging the last 120 values in the series gives the 120-month moving average forecast as $3. (b) A closer look at the time plot suggests a positive autocorrelation. but the constants differ. (b) The correlation of 0. The majority of pairs of successive residuals shows the first residual lower than the second residual in the pair.0729. (c) The coefficients of yt−1 are very similar in parts (a) and (b). the correlation would be closer to 0. The outlying groups of points have the December sales as either the x or y coordinate.00009.0 + 0. (c) The lagged residual plot shows a beautiful.0625. (b) Fitting the AR(1) model gives the equation yt = 13. 13.7. The sales in July 2001 are 5170 thousands of units. The correlation between successive residuals et and et−1 is only 0.09.0000000. 0. there is a sharp rise. and 0.0478297. 0.0009.0430467. and this is what is reducing the correlation from 0.9206. linear relationship between et−1 and et . 0. 13.0078125. these two sets of 10 points should no longer stand out from the remaining points. (b) For the trend-only model. s = 566. 0. At the end of the data. (b) When w = 0.9206 to 0. (c) The trend-and-season model closely follows the original series. 0.000009. (c) If we looked at the seasonally adjusted time series. (b) Going to the Web page gives the actual winter wheat price received by Montana farmers for November 2002 as $4. 
The August 2001 estimates are very close.9206 suggests a strong autocorrelation in much of the time series.5. we have strong evidence of autocorrelation.059049. 0.
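Exercises 13.29 through 13.31 compare span-k moving average forecasts with exponentially smoothed forecasts for different weights w. A small sketch of both forecasts; the series used here is a placeholder:

import numpy as np

y = np.array([3.1, 3.4, 3.3, 3.6, 3.9, 4.1])   # placeholder time series

def moving_average_forecast(series, k):
    """Forecast the next value by the mean of the last k observations."""
    return float(np.mean(series[-k:]))

def exponential_smoothing_forecast(series, w):
    """Recursive smoother: larger w puts more weight on recent values, and the
    weights on past observations fall off as w, w(1-w), w(1-w)^2, ..."""
    s = series[0]
    for obs in series[1:]:
        s = w * obs + (1 - w) * s
    return float(s)

print(moving_average_forecast(y, 3))
print(exponential_smoothing_forecast(y, 0.1), exponential_smoothing_forecast(y, 0.9))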

0000000 for the values w = 0. (c) Yes. but the model with w = 0. although in general maximum likelihood is preferred to least-squares in time series models.1 puts the greatest weight on y1 in the calculation of a forecast.327 and the regression standard error is 47. (d) There is very little difference between the values of R 2 the model standard error s in this example. (c) The equations obtained for least-squares and maximum likelihood are almost identical. ˆ 13. The curve for w = 0. (e) The curves for w = 0. The correlation is large and the relationship looks strong.39 (a) The moving averages do appear to be very linear.1 0.327 and the model standard error is 47.Moore-212007 pbs November 27. the relationship is strong enough to make production 12 months ago a good predictor or this month’s production. The model with w = 0.35 (a) There is a moderate positive linear relationship.976 The forecasts are fairly similar. respectively. ˆ ˆ ˆ 13. Thus. w = 0.568yt−1 .1. 13. the values of the coefficients in the forecast model decrease exponentially in value with the exception of this last coefficient. (b) The smoothness decreases as the value of w increases.5 0. These were then used to forecast the orange price for February 2001. there is a slight spiraling of the moving averages around the predicted value line.49217. (c) The Minitab output for an exponential smoothing model provides forecasts for each value of the smoothing constant.329 216.89. yJuly2005 = 704.000977.37 (a) Fitting the simple linear regression model using yt as the response variable and yt−1 as the predictor gives yt = 100. 13. The assumptions necessary for the regression model are no longer valid.998 219. (b) An AR(1) model was fitted and the estimated autoregression equation is yt = 100. The results are summarized below. (d) The January 2001 data point was added to the series and the models with the three weighting constants were fit to the new series. and w = 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-67 0. From 1996 on until the end of the data. yJuly2005 = 703. . which decreases more slowly.33 (a) yt = −0.9.481x. (b) 0. The value of w = 0.41 (a).5822. (f) The coefficients of y1 are 0. and 0. (c) The moving average line and predicted value line completely overlap until about 1996. (d). 13.5 show a more rapid exponential decrease than the curve for w = 0.5 provided the forecast that was closest to the actual value. The estimates are fairly close. so it is better to use the AR(1) model. The results are summarized on the next page. the y-intercept is not the same.572yt−1 .692. there is no clear indication which fitting method is preferred. (b) y = 181.0000000.1.372 + 1.45385 + ˆ 1. R 2 = 0.9 puts more weight on the most recent value of the time series in the computation of a forecast.4 + 0. Note that as indicated in the text. (c) The slope is very similar for both equations.9 would be best for forecasting the monthly ups and downs in orange prices. however.9 Prediction 218. (b) yt = −2.9 and w = 0.021 + 1.5. 0.348678.004yt−1 . Smoothing constant w 0.00877yt−1 .9 + 0. but he lines are still very close.74. R 2 = 0.

120. The moving averages with a span of k = 100 are quite smooth. (c). We set all indicators equal to 0 except the indicator for July.225(Nov.384 − 20. . Adding an indicator variable for December would be redundant.oanda. .)+0.45 (a) If we use an exponential smoothing forecast equation with smoothing constant ˆ of w = 1.Moore-212007 pbs November 27. Using the model from part (c). R 2 = 0. The forecast in part (c) is slightly larger. (e) Both the quadratic and cubic models appear to fit appreciably better than the straight line model. .9 provided the forecast that was closest to the actual value. we must first define 11 indicator variables.691 + 0. as in Example 13.)−0. R 2 = 0. If we use an exponential smoothing forecast equation with smoothing constant of w = 0.5 is UNITS = 2. = 0.371(Jun. Nov.3. .47 (a) There are no clear seasonal patterns in the time series plot.9. (b).993532.49 (a) To fit the trend-and-season model.198.137 + 1.9 Prediction 219.278. 2002 (from the web site www.478 The actual value in February was 229. (d) We used k = 4 and k = 100.) − 0. Both forecasts underestimate the actual exchange rate. (c) We fitted a second degree polyˆ nomial to the data using statistical software and obtained y = 20.057(Oct.802.430 + 256.171(year)2 + 0. 0 otherwise. Mar. 13.43 (a) If we use a span of k = 1.525(Jul. = 1 if the month is February.518 221.556(Mar. 13. Using software we obtained the estimated trend-and-season model ln(UNITS) = 10.) − 0.) − 0. (c).544(Apr.761(Jan. we notice that this would correspond to case 64.3(year) + 5.582 Thus.533(Aug. and so on through November. then we know the month must be December. The moving averages with a span of k = 4 are not particularly smooth. the forecast equation is yt = yt−1 . = 1 if the month is January. = 0.697. (c) To use the equation in part (a) to forecast for July 2002. We let Jan.)−0. = 0. R 2 = 0. (e) Using the model from part (b). Feb. which we set to 1. we find the forecast for the July 24.9.041(Sep.1 0.5 0. the moving average forecast equation would be ˆ yt = yt−1 .) − 0.149355(year)3 . the forecast equation is ˆ yt = yt−1 .01140.082. 13.786(Feb. (b) We fitted ˆ a line to the data using statistical software and obtained y = −487. 2002. = 0.com/convert/fxhistory). (d) The forecast using the seasonality factors in a trend-and-season model from Example 13.36356(year)2 . (d) We used w = 0. 13.586. The actual exchange rate on July 24. so in this case the model with w = 0. we forecast sales for July 2002 to be UNITS = e14.753. (d) We fitted a third degree polynomial to the data using statistical software and obtained ˆ y = −1.620. Feb. and s = 5048.152.735(year). (b) If Jan. and s = 3684. we find the forecast for the July 24. 0 otherwise. 2002.)−0. 2007 9:50 S-68 SOLUTIONS TO ODD-NUMBERED EXERCISES Smoothing constant w 0.632 (May)−0. .101. The cubic model fits a little better (a bit larger R 2 and a bit smaller s) than the quadratic model and might be a slightly better choice for the trend equation.752.991149. exchange rate to be 0.582 = 2. and s = 4091. exchange rate to be 0.2 is smoother than the exponential smoothing model with w = 0. We get ln(UNITS) = 14.733(year) − 872.764 223.)− 0.). We can determine whether the month is December from the other indicator variables.069(Case) − 0.2 and w = 0. The exponential smoothing model with w = 0. is 1. (b).6.

14. Using the AR(1) model. μ2 . and the common standard deviation σ . The plot with span k = 36 does the best job of smoothing the minor ups and downs while still capturing the major jump about two-thirds of the way through the data..Moore-212007 pbs November 27. The ratio of these is 120/80 = 1.11 MST = 9. (b) The F statistic would need to be larger than 3.10 is 1.. so it is reasonable to pool the standard deviations for these data. An even larger value of k might also work well. x 3 = 200. and μ3 .45 = 593.25178. σ ) distribution where σ is the common standard deviation. N = 60.. 14. We estimate σ by sp = 101.3 14.13 (a) The ANOVA F statistic has 3 numerator degrees of freedom and 20 denominator degrees of freedom... 14.... Basal 0 4 0 67 0 888999 1 01 1 22222223 1 45 1 6 DRTA 0 0 6777 0 888889999 1 000 1 2233 1 5 1 6 Strat 0 445 0 666777 0 889 1 111 1 22333 1 44 1 14. although it does capture the major jump in the series that occurs two-thirds of the way through the values.125. . The distribution of the DRTA scores shows some right-skewness.515. This ratio is less than 2.1 to be 9.7 14.53 The AR(1) model does not smooth the minor ups and downs in the data.9 SSG + SSE = 20.10 to have a P-value less than 0. 24.03 = SST.14 . There are no clear outliers in any of the groups. 13.51 We tried spans of 12.5 The distribution for the Basal group appears to be centered at a slightly larger score than that for the DRTA group and perhaps for the Strat group.01245.58 + 572. 13. (a) The largest standard deviation is 120 and the smallest is 80. DFG + DFE = 2 + 63 = 65 = DFT. The εij are assumed to be from an N (0. and n 3 = 20. . and 36. Minitab forecasts the June 2002 average price per pound of coffee as 3.32 Stem plots for the three groups are given below. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-69 (e) The forecast using the AR(1) model from Example 13.036. Using statistical software (we used Minitab) we find the mean of all 66 observations in Table 14.788 and the variance of all 66 observations in Table 14. CHAPTER . x 2 = 175. n 2 = 20. n 1 = 20.. The ANOVA model is xij = μi + εij .5. Using the moving average model with span k = 36. The forecast in part (c) is considerably larger. (b) I = 3.1 to be 9. 14.124.05. The parameters of the model are the I population means μ1 .1 I = 3. Minitab forecasts the June 2002 average price per pound of coffee as 3. x 1 = 150.

70883 (b) As sample size increases. In Excel.15 (a) It is always true that sp = MSE.7126 3.8a. (c) Of the sample sizes selected in the table above.29117 Power 0. Group 1 differs from Groups 2.31887 0.042 PROBF 0.27 From the output in Figure 14.6168 4.39427 0. 18. 1 1 6. 14.21 The 95% confidence interval is 15 ± 3.9042 1. which is a parameter. t23 = 1.25 The value of t ∗∗ for the Bonferroni procedure when α = 0.2 121.75683 0.8084 2.29 Group Group 1 Group 2 Group 3 Group 4 Group 5 Mean 150. Because of this. not categorical.d. Hence.83012 0. 14.90378. Take the square root of this entry (0. 100 would be best since the power is fairly low at smaller sample sizes. 14.683 2.616003 in this case) to get sp . and 5. power increases.16 x 2 − x 3 = 2.29. using the Bonferroni procedure we would not reject the null hypothesis that the population means for groups 2 and 3 are different.7273 − 44. 3. t23 = 2.e a SD 19. 14. 46.17 Try the applet.Moore-212007 pbs November 27.968).2. 14.33 (a) n 10 20 30 40 50 100 F∗ 2.1 18.24317 0.968 = (11.521 9.19 The standard error for the contrast SEc = 2. (b) You do want to use one-way ANOVA when there are at least three means to be compared.10243 0.866 2.725 2.90378 = 1. Thus.29.31 The Bonferroni 95% confidence interval is (−2.3 18.4 22.2727 14.35 (a) The response variable needs to be quantitative.627 DFG 3 3 3 3 3 3 DFE 36 76 116 156 196 396 λ 0. 14. sp = MSE.651 2.2c.22).29| < t ∗∗ . This interval includes 0.1).05 is t ∗∗ = 2.16988 0. SAS √ calls sp “Root MSE.d 117.23 t23 = = 1. You should find that the F statistic increases and the P-value decreases if the pooled standard error is kept fixed and the variation among the group means increases. .6 n 30 30 30 30 30 We see that Groups 1 and 4 have the largest means.9b.2b.60573 0.” (b) sp = MSE. 14. 2007 9:50 S-70 SOLUTIONS TO ODD-NUMBERED EXERCISES √ 2 14.68113 0.1 20. MSE is found in the ANOVA table in the row labeled “Within Groups” and under the column labeled MS. 14.4545 1. (c) The pooled estimate sp is an estimate for σ .4545 and the standard error for this difference is 1.89757 0.31 + 22 22 14.29.46 (see Example 14. Group 4 differs from Groups 2 and 5.c 129.032.663 2. 7. Because |1.e 140.
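Power tables like the one summarized for Exercise 14.33 can be reproduced from the noncentral F distribution. A sketch, with assumed group means, a common standard deviation, and equal group sizes; all of the numeric inputs here are illustrative:

import numpy as np
from scipy import stats

# Illustrative alternative: I = 3 groups with these means, common sigma, n per group.
means = np.array([150.0, 175.0, 200.0])
sigma, n = 60.0, 50
I = len(means)

dfg, dfe = I - 1, I * (n - 1)
lam = n * np.sum((means - means.mean()) ** 2) / sigma ** 2   # noncentrality parameter
f_star = stats.f.ppf(0.95, dfg, dfe)                         # critical value at alpha = 0.05
power = 1 - stats.ncf.cdf(f_star, dfg, dfe, lam)

print(dfg, dfe, round(f_star, 3), round(lam, 2), round(power, 3))
# Repeating this for several values of n reproduces a power-versus-sample-size table.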

p F 0. N = 441. teacher policies. (e) These data do not represent 245 independent observations. (g) There is no control group.49 (a) In general. so it is impossible to comment on the effectiveness of the accommodations.43 For parts (a). n 2 = 145.050 2. 14.025 5. (b) So many decimal places are not necessary. the mean grade decreased.81 (c) Hispanic Americans were highest in each of the three groups. 14.41 0. Pooling should not be used (sp would be 0. (b) DFG = 2. For Emotion score. DFT = 440. it is not unreasonable to group these data together.51 (a) Checking whether the largest s < 2 × smallest s for each group indicates that it is appropriate to pool the standard deviations for intensity and recall but not frequency. N = 15. (b). (c) DFG = 2. because 20 < 2 × 15. (c) The biggest s is not exactly less than 2 times the smallest s (1. University policies. (c) I = 3. do not reject the null hypothesis. 72). 14. (a) DFG = 2. 14. 35). Using 2 decimal places would be fine. 14.3538 (from software). DFE = 12. and 40.400.86). Since the mean grade is affected similarly for 2. 7). 3.61. (c) 41.77 (b) The P-value would be between 0. F(2. n 3 = 76. with a large P-value at the 5% significance level. and recall there is strong evidence that not all the means are the same.866. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-71 14.001. DFE = 438. response variable = 1 to 5 rating of “Did you discuss the presentation with any of your friends?” (b) I = 3. (f) Answers will vary.07 0. (b) F(2.42 0. 2 × 0. as the number of accommodations increased. the European. response = cholesterol level.41 (a) I = 3.89 0.025 2.47 (a) Yes.100 3. n i = 25.45 Answers will vary. F(2.010 8. 14. (b) F(4.100. and Indian scores were low but close together. and 4 accommodations. F (2.66233 vs. There is not enough evidence to stay that any of the means are different.35 0. For . (d) 204.001 18.0365 (from software). so for frequency. 438). student backgrounds might not be the same from school to school. response = 1 to 10 rating of the game. DFT = 74.39 (a) F(4. (b) 48.14. DFT = 14.85 0.37 (a) F(3.001 4.82745). 405).100 1.050 and 0.Moore-212007 pbs November 27. F = 1. P-value = 0.45 0. DFE = 72. intensity. 14. p F 0. The P-value for all three one-way ANOVA tests is < 0.100. Asian. 18). Some students were measured multiple times. and (c): H0 : μ1 = μ2 = μ3 and Ha: Not all the means are the same. n i = 5. F = 4.” However. in this case. but it is not a steady decline. but it is close. (d) It is never a good idea to eliminate data points without a good reason. N = 75. (c) You can never conclude from an F test that all the means are different.010 3. 36.050 4. 12).000. n 1 = 220. No additional information is gained by having so many. P-value = 0.95 0. The alternative hypothesis is that “not all the means are the same.

but the plot for control shows some deviation from Normality towards the very low and very high ends.321 1. The largest s is < 2 × smallest s.001551 0.000017 0. 14. The histograms for low and high groups show fairly symmetric distributions. high (0. Asian and Japanese were both low. There are no outliers for any groups. This may affect how broadly the results can be applied.09500 1. (e) The chi-square test has a test statistic of 11. and the difference between 5 and 7 days after baking. In other words. At the α = 0. (d) Answers will vary. all the differences are significantly different from 0 except the difference between 0 and 1 days after baking.55 Condition i − j 1−0 3−0 3−1 5−0 5−1 5−3 7−0 7−1 7−3 7−5 Bonferroni Tests Difference Std.321 1.05 level that any of the group means differed.321 1.01 level.321 1.321 1.219.56 did not reject the hypothesis at the 0.5400 −9.0116). P-value = 0.4750 −4.57 (a) The ANOVA in Exercise 14.353 and a P-value of 0.235. (e) The high dose of kudzu isoflavones yields a significantly greater mean bone density for the femur of a rat than the control or low dose does.Moore-212007 pbs November 27.000012 0.05 level except the difference after 5 and 7 days.0115). 0. .023. so it is appropriate to pool the standard deviations. 2007 9:50 S-72 SOLUTIONS TO ODD-NUMBERED EXERCISES Frequency and Intensity and Recall.3850 −33. A means plot shows that control is slightly higher than low but high is much higher than the other two groups. standard deviation) for each group: control (0. no further analysis on which group means differed is appropriate.000219 0.53 (a) The histogram of bone density for the control group shows a skewed right distribution. People near a university might not be representative of all citizens of those countries. There is strong evidence that not all the means are the same. 14.000053 0.000033 0.216. and European and Indian are close together in the middle. 0.321 1.036734 0. so at the 5% significance level there is enough evidence to say that there is a relationship between gender and culture.38000 −40.6350 −13.718.321 1. error −6. The following are the (mean.238149 The differences in the group means are all significantly different from 0 at the α = 0. 0.001.9100 −20. A side-by-side boxplot of the groups shows heavy overlap of control and low but not much overlap of high with the other two groups.000007 0. There is no significant difference in mean bone density between the control and low dose.75000 −26.321 1. (d) Bonferroni multiple comparisons shows that high is significantly different from both control and low groups. low (0. there is evidence that the proportion of men and women is not the same for all of the cultural groups.2900 −29. (c) F = 7.321 P-value 0.321 1. Thus. but control and low groups are not significantly different from each other. 14. (b) The normal quantile plots for low and high look good.008542 0.1600 −36.0188).


(b) Bonferroni Tests

Condition i − j   Difference   Std. error   P-value
1 − 0             −0.110000    0.0608       0.752553
3 − 0             −0.140000    0.0608       0.514112
3 − 1             −0.030000    0.0608       0.999966
5 − 0             −0.045000    0.0608       0.998871
5 − 1              0.065000    0.0608       0.982858
5 − 3              0.095000    0.0608       0.861026
7 − 0             −0.385000    0.0608       0.014421
7 − 1             −0.275000    0.0608       0.061029
7 − 3             −0.245000    0.0608       0.096013
7 − 5             −0.340000    0.0608       0.025003

The only group means that are significantly different at the α = 0.05 level are the difference between 0 and 7 days after baking and between 7 and 5 days after baking.
14.59 (a) The plots show that Group 2 has a modest outlier, but otherwise there are no serious departures from Normality. The assumption that the data are (approximately) Normal is not unreasonable. (b)

Number of promotions   Sample size   Mean      Std. dev.
1                      40            4.22400   0.273410
3                      40            4.06275   0.174238
5                      40            3.75900   0.252645
7                      40            3.54875   0.275031

(c) The ratio of the largest to the smallest is 0.275031/0.174238 = 1.58. This is less than 2, so it is not unreasonable to assume that the population standard deviations are equal. (d) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    10.9885          3.66285       59.903   ≤ 0.0001
Error    156                  9.53875          0.061146
Total    159                  20.5273

The F statistic has 3 numerator and 156 denominator degrees of freedom. The P-value is ≤ 0.0001, and we would conclude that there is strong evidence that the population mean expected prices associated with the different numbers of promotions are not all equal.
14.61 (a)

Group      Sample size   Mean        Std. dev.
Piano      34            3.61765     3.05520
Singing    10            −0.300000   1.49443
Computer   20            0.450000    2.21181
None       14            0.785714    3.19082
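Group summaries like the one above feed into the one-way ANOVA F test reported in part (b). A minimal sketch of that test on raw scores with scipy; the lists here are placeholders rather than the actual data:

from scipy import stats

# Placeholder scores for the four instruction groups.
piano = [2, 5, 7, 1, 4]
singing = [-1, 0, -2, 1]
computer = [0, 2, -1, 3]
none = [1, 0, 2, -1]

f_stat, p_value = stats.f_oneway(piano, singing, computer, none)
print(round(f_stat, 3), p_value)
# Degrees of freedom are I - 1 for Groups and N - I for Error,
# matching the rows of the ANOVA table.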


(b) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    207.281          69.0938       9.2385   ≤ 0.0001
Error    74                   553.437          7.47887
Total    77                   760.718

The F statistic has 3 numerator and 74 denominator degrees of freedom. The P-value is ≤ 0.0001, and we would conclude that there is strong evidence that the population mean changes in scores associated with the different types of instruction are not all equal.
14.63 The contrast is ψ = μ1 − (1/3)μ2 − (1/3)μ3 − (1/3)μ4. To test the null hypothesis H0: ψ = 0, the t statistic is t = c/SEc = 3.306/0.636 = 5.20. P-value ≤ 0.001. We conclude that there is strong statistical evidence that the mean of the piano group differs from the average of the means of the other three groups.
14.65 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i − j   Difference   Std. error   P-value
2 − 1         11.4000      9.653        0.574592
3 − 1         37.6000      9.653        0.001750
3 − 2         26.2000      9.653        0.033909

At the α = 0.05 level we see that Group 3 (the high-jump group) differs from the other two. The other two groups (the control group and the low-jump group) are not significantly different. It appears that the mean density after 8 weeks is different (higher) for the high-jump group than for the other two.
14.67 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i − j   Difference   Std. error   P-value
2 − 1         0.120000     0.3751       0.985534
3 − 1         2.62250      0.3751       0.000192
3 − 2         2.50250      0.3751       0.000274

At the α = 0.05 level we see that Group 3 (the iron pots) differs from the other two. The other two groups (the aluminum and clay pots) are not significantly different. It appears that the mean iron content of yesiga wet' when it is cooked in iron pots is different (higher) than when it is cooked in the other two.
14.69 (a) Plots show no serious departures from Normality (but note the granularity of the Normal probability plot; it appears that values are rounded to the nearest 5%), so the Normality assumption is not unreasonable.
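The Bonferroni tables in these solutions (14.65, 14.67, and the one that follows) report each pairwise difference, a standard error based on the pooled standard deviation, and an adjusted P-value. A sketch of the computation for a single pair, under assumed summary values:

import numpy as np
from scipy import stats

# Assumed summaries for one comparison: group means, sizes, MSE, error df,
# and the total number of pairwise comparisons being made.
xbar_i, xbar_j = 2.62, 0.12
n_i, n_j = 4, 4
mse, dfe = 0.28, 9
n_pairs = 3

diff = xbar_i - xbar_j
se = np.sqrt(mse * (1.0 / n_i + 1.0 / n_j))      # uses the pooled s_p = sqrt(MSE)
t = diff / se
p_unadjusted = 2 * stats.t.sf(abs(t), dfe)
p_bonferroni = min(1.0, n_pairs * p_unadjusted)  # Bonferroni adjustment

print(round(diff, 4), round(se, 4), round(p_bonferroni, 4))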


(b) Bonferroni Tests

Group i − j    Difference   Std. error   P-value
ECM2 − ECM1    −1.66667     3.600        1.00000
ECM3 − ECM1     8.33333     3.600        0.450687
ECM3 − ECM2    10.0000      3.600        0.223577
MAT1 − ECM1    −41.6667     3.600        0.000001
MAT1 − ECM2    −40.0000     3.600        0.000002
MAT1 − ECM3    −50.0000     3.600        0.000000
MAT2 − ECM1    −58.3333     3.600        0.000000
MAT2 − ECM2    −56.6667     3.600        0.000000
MAT2 − ECM3    −66.6667     3.600        0.000000
MAT2 − MAT1    −16.6667     3.600        0.008680
MAT3 − ECM1    −53.3333     3.600        0.000000
MAT3 − ECM2    −51.6667     3.600        0.000000
MAT3 − ECM3    −61.6667     3.600        0.000000
MAT3 − MAT1    −11.6667     3.600        0.101119
MAT3 − MAT2     5.00000     3.600        0.957728

At the α = 0.05 level we see that none of the ECMs differ from each other, that all the ECMs differ from all the other types of materials (the MATs), and that MAT1 and MAT2 differ from each other. The most striking differences are those between the ECMs and the other materials.
14.71 (a)

Group    Sample size   Mean      Std. dev.
Lemon    6             47.1667   6.79461
White    6             15.6667   3.32666
Green    6             31.5000   9.91464
Blue     6             14.8333   5.34478

(b) The hypotheses are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. ANOVA tests whether the mean numbers of insects trapped by the different colors are the same or whether they differ. If they differ, ANOVA does not tell us which ones differ. (c)

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    4218.46          1406.15       30.552   ≤ 0.0001
Error    20                   920.500          46.0250
Total    23                   5138.96

sp = 6.78. We conclude that there is strong evidence of a difference in the mean number of insects trapped by the different colors.
14.73 (a)

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   3                    104,855.87       34,951.96     15.86
Error    32                   70,500.59        2,203.14
Total    35                   175,356.46       5,010.18
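Exercises 14.63 and 14.75 through 14.83 work with contrasts of the form ψ = Σ ci μi. The sketch below reproduces (up to rounding) the contrast estimate c = 3.306, standard error 0.636, and t = 5.20 reported for Exercise 14.63, using the group summaries from 14.61 and sp = √MSE; any other contrast only changes the coefficient vector.

import numpy as np
from scipy import stats

# Group summaries from Exercise 14.61: piano, singing, computer, none.
means = np.array([3.61765, -0.300000, 0.450000, 0.785714])
n = np.array([34, 10, 20, 14])
mse, dfe = 7.47887, 74
sp = np.sqrt(mse)

# Contrast from Exercise 14.63: piano versus the average of the other three groups.
c = np.array([1.0, -1/3, -1/3, -1/3])

psi_hat = np.sum(c * means)                  # about 3.306
se_c = sp * np.sqrt(np.sum(c ** 2 / n))      # about 0.636
t = psi_hat / se_c                           # about 5.20
p_two_sided = 2 * stats.t.sf(abs(t), dfe)    # well below 0.001

print(round(psi_hat, 3), round(se_c, 3), round(t, 2), p_two_sided)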

5)μ2 − μ3 . Ha : ψ2 = 0. This quantity corresponds to MSE in the ANOVA table.5)μ2 − μ3 .65| is larger than t ∗∗ = 2.25. 2007 9:50 S-76 SOLUTIONS TO ODD-NUMBERED EXERCISES (b) H0 : μ1 = μ2 = μ3 = μ4 .005. P-value < 0. (c) SEc1 = 6.81 (a) Question 1. Question 3. Ha : not all of the μi are equal. Ha : ψ1 > 0.65. although the researchers tried to match those in the groups with respect to age and other characteristics. Hypotheses: H0 : ψ3 = 0. 14. (b) Question 1: P-value > 0.001.005.5)μ1 + (0. so we do not have strong evidence that T and C differ. (d) The F statistic has the F(2.46). sp = 46. Ha : ψ3 > 0. SEc2 = 7. is t = 7. it would be reasonable to test the hypotheses H0 : ψ1 = 0. The P-value is smaller than 0. (b) Source Groups Error Total Degrees of freedom 2 206 208 Sum of squares 17.15. no matter how well designed it is in other respects. 0. 14.5)μ1 + (0.66| is not larger than t ∗∗ = 2.83 For each pair of means we get the following: Group 1 (T) vs.14. For H0 : ψ2 = 0. Hypotheses: H0 : ψ2 = 0.46 Mean square 8. (c) This is an observational study. There is at best weak evidence that T is better than the average of C and S.81. 206) distribution. We conclude that these data do not provide evidence that the mean weight gains of pregnant women in these three countries differ.203. Group 2 (C): t12 = −0. We conclude that there is not strong evidence that the average of the mean SAT mathematics scores for computer science majors differs from that for engineering and other science majors. Group 3 (J): t13 = −3.46.46 = (−24.Moore-212007 pbs November 27.29. the test statistic is t = 1. it would be reasonable to test the hypotheses H0 : ψ2 = 0. (e) A 95% confidence interval for ψ1 is 49 ± 12. The P-value is greater than 0. Question 2: 0. 2 14.4 175. so we have strong evidence that T and J differ. A 95% confidence interval for ψ2 is −10 ± 14. 61.21 (c) H0 : μ1 = μ2 = μ3 . (d) The test statistic for H0 : ψ1 = 0. |−.20.54 = (36. Thus.37.10 < P-value < 0.79 (a) For the contrast ψ1 = (0. 14. For the contrast ψ2 = μ1 − μ2 . We conclude that there is strong evidence that the average of the mean SAT mathematics scores for computer science and engineering and other sciences majors is larger than the mean SAT mathematics scores for all other majors.5)μ4 . (c) The F statistic has the F(3. 4.32.90 5010. Contrast: ψ2 = μ1 − (0. Question 2. Contrast: ψ1 = μ1 − μ2 . Ha : ψ1 > 0.22 803. There is strong evidence that J is better than the average of the other three groups. Males were not assigned at random to treatments.46.81. Ha : ψ1 > 0.5)μ2 − (0.75 (a) sp = 3.356. 14. |−3. (b) A contrast is ψ2 = μ1 − μ2 .66. Ha : ψ2 = 0.90.77 (a) A contrast is ψ1 = (0. .61 3. there are reasons why people choose to jog or choose to be sedentary that may affect other aspects of their health. 32) distribution. Hypotheses: H0 : ψ1 = 0.54). We do not have evidence that T is better than C. 2 (d) sp = 2.75. Group 1 (T) vs. Contrast: ψ3 = μ3 − (1/3)μ1 − (1/3)μ2 − (1/3)μ4 . (b) Estimate of ψ1 : c1 = 49.18 F 2. Ha : ψ2 > 0. Question 3: P-value < 0. It is always risky to draw conclusions of causality from a single (small) observational study.100.10 < P-value < 0. Estimate of ψ2 : c2 = −10.94. Ha : not all of the μi are equal.

2.44. standard deviations.8 6.44 by dividing each by 64 and multiplying the result by 100.81. dev.0547 19.73 11.8548 A sample size of 175 gives reasonable power. 14.419845 Std. 14.0158 3. The means.81.65 F 2. so we have strong evidence that T and S differ.74 ≤ 0.852 1.22| is larger than t ∗∗ = 2.85 n 50 100 150 175 200 DFG 2 2 2 2 2 DFE 147 297 447 522 597 F∗ 3. so we have strong evidence that C and S differ.853 176. Group 3 (J) vs.673 Mean Square 13.1460 .81. 14. that the mean % vitamin C content is not the same for all conditions.87 (a) Group 0 1 3 5 7 Sample size 2 2 2 2 2 Mean 76.0576 3.7336 0.25840 We conclude that there is strong evidence that the group means differ.352 0.34 9.0001 4. Group 4 (S): t14 = 3. |3.78 5. |−2.39753 3.69043 0.3 135.0130 3.606.97 21. the F statistic.792. and P-value are all the same as in Exercise 14.32561 1. |6. standard deviations. that is.0108 λ 2.89 (a) The ANOVA table with the incorrect observation is Source Groups Error Total Degrees of freedom 3 20 23 Sum of squares 40.99 367.81. (c) The ANOVA table is Source Degrees of freedom Sum of squares Groups Error Total 4 5 9 6263. one might consider a sample size of 150 per group. Group 4 (S): t34 = 6. The gain in power by using 200 women per group may not be worthwhile unless it is easy to get women for the study.5453 0.56 8. |3.297 (b) The sample sizes would be the same in both tables.Moore-212007 pbs November 27. Group 4 (S): t24 = 3. so we do not have strong evidence that C and J differ.12 Power 0.29| is not larger than t ∗∗ = 2.0032 P-value 0.14.5547 34.86| is larger than t ∗∗ = 2. so we have strong evidence that J and S differ.26 Mean square F P-value 1565.2950 0. 2007 9:50 SOLUTIONS TO ODD-NUMBERED EXERCISES S-77 Group 1 (T) vs. and standard errors above could have been obtained from those in Exercise 14. error 1. The degrees of freedom. Group 2 (C) vs.29. and standard errors change the same way as individual values do.20429 1.3984 13 Std. Means.195 0.8017 0. If it is difficult or expensive to include more women in the study.22.820.1016 65. Group 3 (J): t23 = −2.14| is larger than t ∗∗ = 2.695 2. Group 2 (C) vs.86.0261 3.2920 6285.

Lemon yellow White Green Blue 6 6 6 6 114. 0. . 14.10.8333 164. (c) Using software we obtain the following: Variable Constant Number of promotions Coefficient 4. and we concluded that there was strong evidence that the mean number of insects that will be trapped differs between the different-colored traps.49 showed that there is strong evidence that the mean expected price is different for the different numbers of promotions. the mean expected price is changing as the number of promotions changes.91 (a) The pattern in the scatterplot is a roughly linear.0401 0. error of coeff.667 15.32666 9.36453 −0. because if the slope is different from 0. (b) The results are very different. so we would conclude that we do not have strong evidence that the mean number of insects that will be trapped differs between the different-colored traps.416 3. Thus.3 P-value ≤ 0. The ANOVA in Exercise 14.0001. (c) Group Count Mean Std. dev. In Exercise 14.0001.91464 5. In this example the regression is more informative. and we see that P-value is ≤ 0. P-value ≤ 0.0001 The t statistic for testing whether the slope is 0 is −133. decreasing trend. This is consistent with our regression results here. (b) The test in regression that tests the null hypothesis that the explanatory variable has no linear relationship with the response variable is the t test of whether or not the slope is 0.6667 31.0001 ≤ 0.34478 The unusually large values of the mean and standard deviation might indicate that there was an error in the data recorded for the lemon yellow trap. 2007 9:50 S-78 SOLUTIONS TO ODD-NUMBERED EXERCISES The P-value is larger than 0. It not only tells us that the means differ but also gives us information about how they differ. there is strong evidence that the slope is not 0.61.116475 Std.0087 t ratio 109 −13. The outlier increased the sum of squares of error considerably.Moore-212007 pbs November 27. and this results in a much smaller value of F.5000 14.
