
4/20/2007

24 Answers

Mix and Match


1. f 2. e 3. c 4. g 5. h 6. j 7. a 8. d 9. b 10. i

True/False
11. True
12. False. The value of s_e typically decreases, but it does not have to. R² must increase.
13. False. It's called a marginal slope because it includes the effects of other explanatory variables.
14. True
15. False. It might be smaller, but it does not have to be. It depends on the size and sign of any indirect effects.
16. False. The marginal and partial slopes need not even have the same sign, much less both be close to zero.
17. True
18. True
19. False. We should conclude only that at least some deviation from this hypothesis occurs. It may not be the case that both are different from zero; perhaps only one of them differs from zero.
20. True
21. False. Its primary use is locating the effects of leveraged outliers.
22. True

Think About It
23. Most likely we have some collinearity. Busy areas attract a lot of fast food outlets because sales are high (positive correlation). Among densely populated areas, however, the number of competitors reduces the sales of a store (negative partial slope). You'd like to have the densely populated area to yourself: the more competitors that are around, the lower your sales for a given population density.
24. The two explanatory variables, test score and education, are evidently redundant. Once you know one, the other adds little value. Both are positively correlated (evidently), so either has a positive correlation with performance. But once you know the educational background, the score on the qualifying test adds little additional value.
25. a) Estimated Salary = b0 + 5 Age + 2 Test Score
b) The indirect effect is 10 $M/point = 2 years/point * 5 $M/year, larger than the direct effect.
c) The marginal effect is the direct plus the indirect effect, or 10 + 2 = 12 $M/point.
d) You're not going to be much older, so we need the partial effect. Raising the test score by 5 points nets $10,000 annually. It's probably worth it if you're going to stay with the company long enough to earn it back.
26. a) No, not without the intercept.
b) Positive. The marginal slope is -0.1 + 0.7 * 0.2 = 0.04.
c) A young person with lots of money to spend.
27. a) The correlation of something with itself is 1.
b) You cannot, not without knowing the variance of x1.
c) The partial and marginal slopes will be the same because the two explanatory variables are evidently uncorrelated. There can be no indirect effect.
28. a) Yes. R² is at least as large as 0.7408² > 0.54.
b) The same as the correlation, 0.7408. When the variables are standardized, the correlations become covariances, so we have the covariances and variances.
c) They differ because the two explanatory variables are correlated.
29. a) The fitted values are
    87 + 0.3 * 250 + 1.5 * 100 = 312, or $312,000 revenue per month
    87 + 0.3 * 200 + 1.5 * 75 = 259.5, or $259,500 revenue per month
Expand to the second location.
b) The intercept, $87,000, resembles a fixed cost. The intercept estimates fixed revenue that is present regardless of the distance to the destination or the population. Perhaps it's money earned from air freight or other services provided by the airline. Without a confidence interval, we cannot be sure whether the value is really far from zero. It might be a large extrapolation.
c) Among comparably populated cities, flights to those that are 100 miles farther away produce 0.3 * 100 = 30, or $30,000, more revenue per month, on average.


d) If we compare revenue from flights to cities that are equally distant from the hub, average monthly revenue to larger cities is higher by about $1.50 per person.
30. a) The estimated margin for the location near the office complex is
    Est Margin = 54 - 0.0073 * 2250 + 0.0216 * 400 = 46.215%
whereas at the more isolated complex the margin is
    Est Margin = 54 - 0.0073 * 300 + 0.0216 * 50 = 52.89%
Choose the more isolated site.
b) The intercept, 54% operating margin, is a baseline value added to the estimated margin for every hotel. Without seeing the scale of the other variables, we cannot tell whether the intercept is an extrapolation if interpreted as the predicted value for a hotel in a very isolated location with no competitors or offices.
c) The negative slope indicates that, on average, sites with more competing rooms have lower operating margins (at a slope of about 0.0073% per additional competing room).
d) The slope for office space shows that sites near offices earn higher margins: on average, about a 0.02% gain in margin per additional 1000 square feet.
31. a,b) The filled-in table is

                 Intercept   Distance   Population
    Estimate     87.3543     0.3428     1.4789
    SE           55.0459     0.0925     0.2515
    t-statistic   1.5869     3.7060     5.8803
    p-value       0.10       <0.01      <0.01

c) Yes, the t-statistic for Distance is larger than 2 in absolute size.
d) Based on the fit of this model, the confidence interval for 10 times the slope for population is 14.789 - 2 * 2.515 to 14.789 + 2 * 2.515, or 9.759 to 19.819 thousand dollars. The relevant standard error rounds to 2.5, so we should keep 1 decimal place and give the interval as 9.8 to 19.8 thousand dollars ($9,800 to $19,800).
32. a,b) The completed table of output is

                 Estimate   SE       t-statistic   p-value
    Intercept    53.9826    5.1777   10.4259       <<.001
    Rooms        -0.0073    0.0013   -5.6154       <<.001
    Office        0.0216    0.0176    1.2273       0.2

c) No, the p-value is larger than 0.05 and the t-statistic is less than 2.
d) Yes, because the absolute value of the t-statistic is larger than 2.
33. a) Yes. The overall F-statistic is (0.74/0.26) * ((37-1-2)/2) ≈ 48.4 >> 4.
b) The standard deviation of the residuals around the fit is $32,700. Given that the conditions of the model check out, we ought to be able to predict monthly revenue to within about $65,000 with 95% confidence.
34. a) Yes, because the overall F-statistic is F = (0.45/0.55) * ((100-1-2)/2) ≈ 39.7 >> 4.
b) We could predict to within about 2 times s_e, or 16.8%, with 95% confidence.
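The overall F-statistics in Exercises 33 and 34 follow directly from R², the number of cases n, and the number of explanatory variables k. The following is a minimal Python sketch of that arithmetic; the function name overall_f_statistic is ours (not from the text), and the inputs are the values quoted in the answers above.

```python
def overall_f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-statistic: (R^2 / (1 - R^2)) * ((n - 1 - k) / k).

    r2 is the R-squared of the fit, n the number of cases, and
    k the number of explanatory variables (hypothetical argument names).
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return (r2 / (1.0 - r2)) * ((n - 1 - k) / k)


# Exercise 33: R^2 = 0.74, n = 37, two explanatory variables
f_33 = overall_f_statistic(0.74, 37, 2)

# Exercise 34: R^2 = 0.45, n = 100, two explanatory variables
f_34 = overall_f_statistic(0.45, 100, 2)

print(round(f_33, 1), round(f_34, 1))
```

Both values are far above the rough threshold of 4 used throughout these answers.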


You Do It
35. Diamonds
a) The plots show the discrete properties of the data: we have only several fixed lengths and widths. Width is very highly related to price. The two x's are not highly correlated. The plots look straight enough (particularly the one for width).

[Figure: scatterplots of Price ($) versus Length (Inch), Price ($) versus Width (mm), and Length (Inch) versus Width (mm)]

b) The largest correlation (0.95) is between price and width. Evidently width tells you more about how much gold is in the chain than length does.

                     Price ($)   Length (Inch)   Width (mm)
    Price ($)        1.0000      0.1998          0.9544
    Length (Inch)    0.1998      1.0000          0.0355
    Width (mm)       0.9544      0.0355          1.0000

c) The fit of this model has R² = 0.94 and s_e = $57, with these coefficients:

    Term             Estimate     Std Error   t Ratio   Prob>|t|
    Intercept        -405.635     62.11863    -6.53     <.0001
    Length (Inch)    8.8838083    2.654034     3.35     0.0026
    Width (mm)       222.48894    11.64679    19.10     <.0001

d) First, the overall fit of the model is not straight enough; there's a trend in the residuals. Second, the model is missing an obviously important variable: the amount of gold in the chain.

[Figure: Price ($) residuals versus predicted price]

e) We formed the volume of the chain as the length (in mm) times the width². This in a way gets at the amount of gold in the chain, though not perfectly. The residuals have some pattern left, but there's not the clear trend seen before, and now we can identify some outliers (a bargain and an expensive chain) that were hidden.

[Figure: Price ($) residuals versus predicted price for the model with volume]

f) Here's the fit for the improved model. With the added volume, the other two explanatory variables, particularly the length, lose importance. The model looks much straighter, with a much smaller s_e near $17. There's still a problem in the residuals, but they are much smaller. Our proxy for gold isn't perfect for the heavier chains.

    R²    0.994674
    s_e   17.0672

    Term              Estimate     Std Error   t Ratio   Prob>|t|
    Intercept         55.118884    34.43198     1.60     0.1225
    Length (inch)     0.0451975    0.971144     0.05     0.9633
    Width (mm)        -30.59663    16.27885    -1.88     0.0724
    Volume (cu mm)    0.0930388    0.005845    15.92     <.0001
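The volume proxy from part (e) and the fitted equation from part (f) can be sketched in a few lines of Python. This is only an illustration: the function names and the example chain (20 inches long, 2 mm wide) are hypothetical, the conversion uses 25.4 mm per inch, and the coefficients are the rounded estimates from the table above.

```python
MM_PER_INCH = 25.4  # exact conversion factor from inches to millimeters


def chain_volume_proxy(length_in: float, width_mm: float) -> float:
    """Proxy for the amount of gold: length (converted to mm) times width squared."""
    return length_in * MM_PER_INCH * width_mm ** 2


def predicted_price(length_in: float, width_mm: float) -> float:
    """Fitted price ($) from the improved model in part (f)."""
    volume = chain_volume_proxy(length_in, width_mm)
    return (55.118884
            + 0.0451975 * length_in
            - 30.59663 * width_mm
            + 0.0930388 * volume)


# Hypothetical chain: 20 inches long, 2 mm wide
volume = chain_volume_proxy(20, 2.0)
price = predicted_price(20, 2.0)
print(f"volume proxy = {volume:.0f} cu mm, predicted price = ${price:.2f}")
```

Because width enters the proxy squared, most of the predicted price comes from the volume term rather than from length or width alone, which matches the loss of importance of those two terms in the table.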

36. Convenience shopping
a) The scatterplots appear straight enough. The only problems appear to be a scattering of outliers, such as the point highlighted in the figures.

[Figure: scatterplots of Sales (Dollars) versus Volume (Gallons), Sales (Dollars) versus Car Washes, and Volume (Gallons) versus Car Washes]

b) The largest correlation is between volume of gas and sales. Car washes are slightly correlated with both of these, but not very much. It seems that sales at the car wash are not very predictive of either gas volume or sales at the store.

                        Sales (Dollars)   Volume (Gallons)   Car Washes
    Sales (Dollars)     1.0000            0.6496             0.1700
    Volume (Gallons)    0.6496            1.0000             0.1242
    Car Washes          0.1700            0.1242             1.0000

c) The fitted model is

    R²    0.430022
    s_e   245.6717

    Term                Estimate     Std Error   t Ratio   Prob>|t|
    Intercept           1112.1759    77.8611     14.28     <.0001
    Volume (Gallons)    0.3150315    0.022442    14.04     <.0001
    Car Washes          0.2326914    0.1166       2.00     0.0469

d) The outliers are scattered and not very serious with so much data. The residuals are nearly normal.

[Figure: Sales (Dollars) residuals versus predicted values, with histogram and normal quantile plot of the residuals]

e) The slope for car washes indicates that among stations with comparable levels of gasoline sales, those that sell more car washes generate higher sales in the connected convenience store. The size of the effect is small, however: between nothing and $0.47 in added daily sales (on average) per additional wash. The interval is 0.2326914 - 2 * 0.1166 to 0.2326914 + 2 * 0.1166, or -0.0005 to 0.4659, which we round to 2 decimals. The reported p-value is slightly less than 0.05 because the precise cutoff with this number of cases is 1.96 rather than our approximate 2. After rounding, however, the difference does not matter: the lower endpoint is basically zero.
37. Download
a) The file sizes increased steadily over the day, meaning that these two explanatory variables are closely associated. The scatterplots of transfer time on file size and on time of day seem reasonably linear, though there may be some bending in the plot of transfer time on the time of day.

[Figure: scatterplots of Transfer Time (sec) versus File Size (MB), Transfer Time (sec) versus Time, and File Size (MB) versus Time]

b) The marginal and partial slopes for the file size will be very different. We will not easily be able to separate their influence from one another. The file size and time of day are virtually redundant, so the indirect effect of file size will be very large.
c) The multiple regression is

    R²    0.624569
    s_e   6.283617

    Term                      Estimate     Std Error   t Ratio   Prob>|t|
    Intercept                 7.1388209    2.885703     2.47     0.0156
    File Size (MB)            0.3237435    0.179818     1.80     0.0757
    Time (hours since 8 am)   -0.185726    3.16189     -0.06     0.9533

d) Somewhat, but not completely. The residual plot suggests slightly more variation for larger file sizes. The effect is fairly subtle and is also evident in a time plot of the residuals. There is also a slight negative dependence over time, with the residuals oscillating back and forth from positive to negative. Again, the effect is not too strong (albeit significant by the Durbin-Watson test, D = 2.67). The residuals appear nearly normal, with no evidence of bending patterns.

[Figure: Transfer Time (sec) residuals versus predicted values, with histogram and normal quantile plot of the residuals]

e) No. The outcomes of these tests are weird. The overall F-statistic, approximately F = (0.624/(1-0.624)) * (77/2) ≈ 64, is very significant (being much larger than 4). On the other hand, the t-statistics in the tabular summary are both less than 2. Thus, we can reject H0: β1 = β2 = 0, but cannot reject either H0: β1 = 0 or H0: β2 = 0.
f) The key difference is the increase in the standard error of the slope. The confidence interval for the partial slope for file size from the multiple regression is 0.3237435 - 2 * 0.179818 to 0.3237435 + 2 * 0.179818, or about -0.04 to 0.68 seconds per MB, a huge range that includes zero. The confidence interval for the marginal slope is 0.3133 - 2 * 0.0275 to 0.3133 + 2 * 0.0275, or about 0.2583 to 0.3683 seconds per MB. The estimates (slopes) are about the same, but the range in the multiple regression is much larger.
g) The direct effect of file size (from the multiple regression) is 0.32 sec/MB. The indirect effect (from the simple regressions) is (0.0562 hours since 8 am/MB) * (-0.186 sec/hour after 8 am) = -0.0104532 sec/MB, which is very small. The path diagram tells you only about the difference between the indirect and direct effects (the slopes in the simple and multiple regressions), not about the change in the standard errors.
38. Home prices
a) Some of the homes are large and expensive, making these leveraged outliers. The relationships appear linear. One particularly large home has 7 baths (bet they have someone else do the cleaning). The two explanatory variables are related, as you would expect.
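The path-diagram arithmetic in Exercise 37(g) rests on the identity that the marginal slope equals the direct (partial) effect plus the indirect effect. A small Python sketch of that check for file size follows; the function names are hypothetical, and the inputs are the estimates quoted in the Exercise 37 answers.

```python
def indirect_effect(slope_x2_on_x1: float, partial_slope_x2: float) -> float:
    """Indirect effect of x1 on y that runs through x2.

    slope_x2_on_x1: slope from the simple regression of x2 on x1
    partial_slope_x2: partial slope of x2 in the multiple regression
    """
    return slope_x2_on_x1 * partial_slope_x2


def marginal_slope(direct: float, indirect: float) -> float:
    """Marginal slope = direct (partial) effect + indirect effect."""
    return direct + indirect


# Exercise 37: x1 = file size (MB), x2 = time (hours since 8 am)
direct = 0.3237435                            # partial slope of file size
indirect = indirect_effect(0.0562, -0.185726)  # (hours/MB) * (sec/hour)
marginal = marginal_slope(direct, indirect)

print(f"indirect = {indirect:.4f} sec/MB, marginal = {marginal:.4f} sec/MB")
```

The result agrees, up to rounding, with the marginal slope of 0.3133 sec/MB used in part (f).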
[Figure: scatterplots of Price ($M) versus Sq Feet, Price ($M) versus Num Bath Rms, and Sq Feet versus Num Bath Rms]

b) The estimated model is

    R²    0.533512
    s_e   81.03068

    Term            Estimate     Std Error   t Ratio   Prob>|t|
    Intercept       107.41869    19.59055    5.48      <.0001
    Sq Feet         45.16066     5.78193     7.81      <.0001
    Num Bath Rms    14.793861    11.74715    1.26      0.2099

c) There's no sign of the usual changing variation. This looks to meet the usual assumptions. The concern remains the presence of the leveraged outlier. The residuals are nearly normal.

[Figure: Price ($M) residuals versus predicted values, with histogram and normal quantile plot of the residuals]

d) Yes. The overall F-statistic is F = (0.5335/(1-0.5335)) * (150-1-2)/2 ≈ 84, which is much larger than the 4 needed to assure statistical significance.
e) The confidence interval for the marginal slope is 82.3267 - 2 * 9.4291 to 82.3267 + 2 * 9.4291, or 63.4685 to 101.1849: about 63 to 101 thousand dollars per bathroom. For the partial slope, the confidence interval is 14.7939 - 2 * 11.7472 to 14.7939 + 2 * 11.7472, or -8.7005 to 38.2883: about -9 to 38 thousand dollars per bathroom. The widths of the intervals are comparable, but the estimates are rather different. The estimates change because of the correlation between the two explanatory variables (evident in part a), which implies a large indirect effect.
f) She's unlikely to recover the cost of the conversion from the sale price. The value of converting space (the partial slope; the conversion to a bathroom does not increase the size of the home) is from -9 to 38 thousand dollars, and her cost of 40 thousand lies outside this range. Don't do it (unless she just wants another bathroom).
39. Production costs
a) The scatterplots are OK: roughly linear with a few troublesome outliers. These jobs feature expensive material costs, but relatively typical labor hours and average costs.

[Figure: scatterplots of Average Cost ($/unit) versus Material Cost ($/unit), Average Cost ($/unit) versus Labor Hours, and Material Cost ($/unit) versus Labor Hours]

b) The estimated multiple regression is

    R²    0.336022
    s_e   7.337964

    Term                      Estimate     Std Error   t Ratio   Prob>|t|
    Intercept                 19.873795    2.084669    9.53      <.0001
    Material Cost ($/unit)    2.2842944    0.444853    5.13      <.0001
    Labor Hours (Hrs/unit)    34.357028    4.490999    7.65      <.0001

c) Yes, the indicated model meets the usual conditions. For example, the residual plot looks fine and the residuals are nearly normal.

[Figure: Average Cost ($/unit) residuals versus predicted values, with histogram and normal quantile plot of the residuals]

d) Yes, the estimated slope for labor hours is far from zero (more than 7.6 standard errors from zero).
e) The confidence interval for labor is 34.357028 - 2 * 4.490999 to 34.357028 + 2 * 4.490999, or 25 to 43 $/hour. Evidently, the cost of additional labor on these jobs runs about $25 to $43 per hour, on average. There's little indirect effect because there's little correlation between labor and material costs. It's not as if we're pulling in expensive labor for valuable materials.
f) To within about $14.70 per unit. That's a fairly wide margin considering that some of the less expensive orders cost only $30 per unit. The prediction could be off on such orders by 50%.
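Parts (e) and (f) both use the rough "plus or minus 2" rule that runs through these answers: an approximate 95% confidence interval is the estimate ± 2 standard errors, and predictions are accurate to within about 2 s_e. A minimal Python sketch with the Exercise 39 numbers; the function names are ours, and the multiplier 2 is the approximation used in the text rather than an exact t quantile.

```python
def approx_conf_interval(estimate: float, std_error: float,
                         multiplier: float = 2.0) -> tuple[float, float]:
    """Approximate 95% confidence interval: estimate +/- multiplier * SE."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


def prediction_margin(s_e: float, multiplier: float = 2.0) -> float:
    """Approximate 95% margin for predicting a new case: multiplier * s_e."""
    return multiplier * s_e


# Exercise 39(e): partial slope for labor hours and its standard error
lower, upper = approx_conf_interval(34.357028, 4.490999)

# Exercise 39(f): standard deviation of the residuals
margin = prediction_margin(7.337964)

print(f"labor slope: {lower:.1f} to {upper:.1f} $/hour")
print(f"prediction margin: about ${margin:.2f} per unit")
```

Rounded to presentation precision, the interval is the 25 to 43 $/hour reported in part (e), and the margin is the $14.70 in part (f).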

40. Leases
a) Other than the outliers (which are rather expensive for their size and age, marked with x's in the plots further below), the plots look reasonably linear, though the association is not very strong. The two explanatory variables appear unrelated, so the marginal and partial slopes will be similar.

[Figure: scatterplots of Cost per Sq Foot versus 1/Sq Feet, Cost per Sq Foot versus Age, and 1/Sq Feet versus Age]

b) The estimated model is

    R²    0.329793
    s_e   1.438612

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    15.466548    0.177344    87.21     <.0001
    1/Sq Feet    3263.0632    538.5394     6.06     <.0001
    Age          0.0352693    0.004673     7.55     <.0001

c) Yes, as far as the data go (but see part f). The outliers indicate either curvature or perhaps a change in the variation for some group of leases. Even so, the residuals are nearly normal. The model is close to meeting the conditions of the MRM, and we should proceed.

[Figure: Cost per Sq Foot residuals versus predicted values, with histogram and normal quantile plot of the residuals]

d) Yes. F = (0.3298/(1-0.3298)) * (223-1-2)/2 ≈ 54, which is much larger than 4 and thus statistically significant.
e) Among leases for the same amount of office space, those in older buildings appear slightly more expensive. The average cost of a lease in a 5-year-old building is about 3 to 5 cents more per square foot than comparable space in a 4-year-old building. Details for the confidence interval: 0.035269 - 2 * 0.004673 to 0.035269 + 2 * 0.004673, or 0.026 to 0.045.
f) This model does not address the location of the buildings. This lurking variable could have a considerable impact on the slopes in this model. Perhaps that's why the older buildings cost more: it's not the age of the buildings, it's the location, and the older buildings are in a nice part of town.
41. R&D expenses
a) The scatterplots (all on log scales) show strongly linear trends, both between y and the explanatory variables and between the explanatory variables themselves.

[Figure: scatterplots of Log R&D Expense versus Log Assets, Log R&D Expense versus Log Net Sales, and Log Assets versus Log Net Sales]

b) The estimated model is

    R²    0.80991
    s_e   0.869808

    Term             Estimate     Std Error   t Ratio   Prob>|t|
    Intercept        -1.203173    0.089859    -13.39    <.0001
    Log Assets       0.5831633    0.052146     11.18    <.0001
    Log Net Sales    0.2284876    0.053194      4.30    <.0001

c) The residuals are skewed, even on the log scale. The range below zero is more extreme than the range above. That is, the variation of the negative residuals is larger than the variation of the positive residuals. As a result, the data are not nearly normal. The model would not be suitable for prediction (i.e., 95% prediction intervals would not have the right coverage). The CLT suggests that inferences about slopes are OK, but not predictions for individual companies.

[Figure: Log R&D Expense residuals versus predicted values, with histogram and normal quantile plot of the residuals]

d) Yes, because the t-statistic (4.3) indicates that this slope is significantly different from zero. Hence, the addition of this explanatory variable significantly increases R².
e) The confidence interval for the partial elasticity of R&D expenses with respect to net sales is 0.2284876 - 2 * 0.053194 to 0.2284876 + 2 * 0.053194, or 0.1220996 to 0.3348756: about (to presentation precision) 0.12 to 0.33. Among companies with equal assets, R&D spending averages between 0.12 and 0.33 percent higher among those with 1% higher net sales.
f) Yes, it's considerably smaller. The marginal elasticity is 0.79 ± 0.04, so the confidence intervals for the estimates do not even overlap. The simple explanation for the difference is that the partial elasticity estimates the effect of percentage differences in net sales among companies with equal assets. The marginal elasticity includes the indirect effect: the benefit of having more assets (which itself has a positive partial elasticity).
42. Cars
a) The calibration and residual plots show a small amount of curvature (the fit underpredicts the price of the small cars) as well as large changes in the variation.

[Figure: calibration plot of Base Price MSRP, actual versus predicted (P < .0001, RSq = 0.67, RMSE = 9898.8), and plot of Base Price MSRP residuals versus predicted values]

b) Not entirely, but it's a big step forward. For the model with logs, the fit appears more linear with more similar (though perhaps still changing) variation. A big improvement, and good enough to continue.

[Figure: calibration plot of Log 10 Price, actual versus predicted (P < .0001, RSq = 0.77, RMSE = 0.1026), and plot of Log 10 Price residuals versus predicted values]

c) It might be zero, but it does not have to be exactly zero. The confidence interval for the partial elasticity is -0.0177 - 2 * 0.1063 to -0.0177 + 2 * 0.1063, or about -0.23 to 0.19. Zero lies inside the confidence interval.
d) The confidence interval for the marginal elasticity for weight is 1.432 - 2 * 0.133 to 1.432 + 2 * 0.133, or about 1.17 to 1.70. The estimated marginal and partial elasticities have similar standard errors, but the partial elasticity is near zero (not significantly different from zero).
e) Yes. The indirect effect of log10 weight on log10 price is almost the same as the marginal slope. Hence, there's nothing left for the direct effect. All of the effect of this variable comes from its indirect effect via changes in HP. The indirect effect for log10 weight is 1.0378 (the slope in the regression of log10 HP on log10 weight) times the direct effect for log10 HP, 1.3964: 1.0378 * 1.3964 ≈ 1.4492.
f) Yes, weight has an effect, but only indirectly. If all we know is that one car is heavier than another, the heavier car is likely to cost more (on average). If the cars have the same HP, however, we'd expect the two cars to be comparably priced, on average.
43. OECD
a) The scatterplots show a very strong association between y and the second predictor. This second variable appears more associated with the GDP, as well as having a more linear relation. The scatterplots seem reasonably linear.

[Figure: scatterplots of GDP (per cap) versus Trade Bal (%GDP), GDP (per cap) versus Muni Waste (kg/person), and Trade Bal (%GDP) versus Muni Waste (kg/person)]

b) The two x's are correlated (r ≈ 0.3). The slope for the trade balance will change because of the presence of indirect effects.
c) The estimated model is

    R²    0.772618
    s_e   6934.623

    Term                      Estimate     Std Error   t Ratio   Prob>|t|
    Intercept                 -4622.225    4796.003    -0.96     0.3440
    Trade Bal (%GDP)          959.60593    232.7805     4.12     0.0003
    Muni Waste (kg/person)    62.184369    9.153925     6.79     <.0001

d) Yes. For example, the residuals have similar variances (left) and are nearly normal (right). Of course, with only 29 cases, we cannot be very sure and we may have missed a subtle problem.

[Figure: GDP (per cap) residuals versus predicted values (left), with histogram and normal quantile plot of the residuals (right)]

e) The direct path from trade balance to y has coefficient 960, and the path from waste to y has coefficient 62. The path from trade balance to municipal waste has the slope from the fit
    Estimated Muni Waste (kg/person) = 503.93174 + 7.7335205 Trade Bal (%GDP)
Similarly, the path from municipal waste to trade balance has the slope from
    Estimated Trade Bal (%GDP) = -4.990754 + 0.0119591 Muni Waste (kg/person)
The indirect effect for trade balance is thus 7.7335205 * 62.184369 ≈ 481. As a check, the sum of the direct and indirect effects is 960 + 481 = 1441, which is the marginal slope for the trade balance. Because the indirect effect is positive, the marginal slope is larger than the partial slope. On average, countries with larger exports have more consumption (producing more trash), and this consumption contributes to GDP.
f) The 95% confidence interval for the slope for municipal waste is 62.1843 - 2 * 9.1539 to 62.1843 + 2 * 9.1539, or $43.8765 to $80.4921 more GDP per kilogram of waste. Because the standard error rounds to 9, the interval would be reported as $44 to $80. The interval does not include zero, so we conclude that β2 is not zero. This does not mean countries should produce more waste. Rather, it means that at a given trade balance, countries with more waste per person have larger GDP per person. The model is not causal.
44. Hiring
a) The scatterplots seem reasonably linear, though the association is weak in each case. The association between the two explanatory variables is particularly weak. This plot may have two clusters of employees.

[Figure: scatterplots of Log Profit versus Log Accounts, Log Profit versus Log Commission, and Log Accounts versus Log Commission]

b) Because the association between the two explanatory variables is weak, the marginal and partial elasticities should be similar.
c) The estimated model is

    RSquare                       0.279374
    Root Mean Square Error        0.671333
    Observations (or Sum Wgts)    464

    Term              Estimate     Std Error   t Ratio   Prob>|t|
    Intercept         8.3716563    0.117483    71.26     <.0001
    Log Accounts      0.1995083    0.029552     6.75     <.0001
    Log Commission    0.1325818    0.016318     8.12     <.0001

d) The residuals show little pattern, though negative residuals seem more dispersed (more variable) than positive residuals. The residuals are a bit skewed, but the deviations are only in the lower extreme. As a whole, the residuals are nearly normal.

[Figure: Log Profit residuals versus predicted values, with histogram and normal quantile plot of the residuals]

e) The confidence interval for the partial elasticity is 0.1995083 - 2 * 0.029552 to 0.1995083 + 2 * 0.029552, or about (to presentation precision) 0.14 to 0.26. The marginal elasticity lies above this interval. It looks like there was more of an indirect effect than we anticipated.
f) The path diagram shows that the partial elasticity for the number of accounts is 0.20 and the partial elasticity for early commission is 0.13. The indirect effect for the log of the number of accounts is 0.0908 ≈ 0.1325 * 0.6855, where 0.6855 is the slope from the regression of log commission on log accounts. Notice that this checks (up to rounding errors): the sum of the direct and indirect effects is the marginal elasticity given in the text, 0.09 + 0.20 = 0.29.
g) To answer this question requires that you believe the MRM and treat these effects as causal. Because there could be other factors at work, that's wishful thinking. If you do choose to believe the model, then go with the program that concentrates on developing accounts. The partial elasticity of the number of accounts is larger than the partial elasticity of the early commissions, so put the effort here.
45. Promotion
a) The scatterplots are vaguely linear, with weak associations between the two predictors and the response. The largest correlation is between the two explanatory variables, so the marginal and partial slopes will likely differ.

[Figure: scatterplots of Market Share versus Detail Voice, Market Share versus Sample Voice, and Detail Voice versus Sample Voice]

b) The estimated model is

    R²    0.280169
    s_e   0.006605
    n     39

    Term            Estimate     Std Error   t Ratio   Prob>|t|
    Intercept       0.2127433    0.004656    45.69     <.0001
    Detail Voice    0.0191598    0.065153     0.29     0.7704
    Sample Voice    0.0216912    0.008333     2.60     0.0133

c) The residuals look fine, though rather variable (i.e., the model does not explain much variation). The Durbin-Watson test does not find a pattern over time (D = 2.07). The residuals are also nearly normal.

[Figure: Market Share residuals versus predicted values and versus row number, with histogram and normal quantile plot of the residuals]

d) Yes. F = (0.28/(1-0.28)) * (39-1-2)/2 ≈ 7 > 4, so the effect is statistically significant.
e) No. The partial effect for detailing is not significantly different from zero.
f) No. The model is not causal. The partial slope for detailing is not significantly different from zero (i.e., zero is in the 95% confidence interval), but this does not mean detailing has no effect. It only means, as in the statement of the question in part e, that at a given level of sample share, periods with a higher share of detailing have not shown gains in market share. Since detailing and sampling tend to come together, it is hard to separate the two. Perhaps the best advice would be to do some experiments.
46. Apple
a) All three variables are correlated with each other, with common outlying events (such as October 1987). The correlations are modest in size, but the relationships are reasonably linear.

[Figure: scatterplots of Apple Return versus Market Return, Apple Return versus IBM Return, and Market Return versus IBM Return]

b) The estimated model is

    R²    0.216589
    s_e   0.13255
    n     300

    Term             Estimate     Std Error   t Ratio   Prob>|t|
    Intercept        0.0048214    0.007868    0.61      0.5405
    Market Return    1.3168817    0.204542    6.44      <.0001
    IBM Return       0.2275089    0.110773    2.05      0.0409

c) The residuals and model appear fine. There is little dependence over time (DW = 1.94), and the scatterplot of residuals on the fitted values looks about as good as they come. Even the outlier period (October 1987, marked in the plot) is on target. The residuals are also nearly normal.

[Figure: Apple Return residuals versus predicted values, with histogram and normal quantile plot of the residuals]

d) The confidence interval for the market effect is [1.3168817 - 2 * 0.204542, 1.3168817 + 2 * 0.204542] = [0.9077977, 1.7259657], or 0.91 to 1.73. This range is so wide as to allow the possibility of the marginal and partial slopes being the same. That is, the partial slope is smaller, but not by much considering the sampling variation.

e) The confidence interval for the slope of IBM returns is [0.2275089 - 2 * 0.110773, 0.2275089 + 2 * 0.110773] = [0.0059629, 0.4490549], or 0.01 to 0.45 (presentation precision). The interpretation is that, among months with equal returns on the market, months in which IBM returned 1% more saw Apple's return higher as well, by somewhere from near zero to about 0.5%, on average.

f) Yes, the improvement is statistically significant because the confidence interval for the slope of IBM returns does not include zero (just barely). As to a better trading strategy, perhaps not, unless we can anticipate movements in IBM. Related ideas known as pairs trading rely on correlations between the movements of two stocks to identify opportunities to buy one and sell the other.
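The interval arithmetic in parts d) and e) can be reproduced with a short script. This is a minimal sketch that uses the rough "estimate plus or minus 2 standard errors" rule from the answer rather than an exact t quantile; the function name approx_ci is an illustrative choice, not something from the text.

```python
def approx_ci(estimate, std_error, multiplier=2.0):
    """Approximate 95% interval: estimate +/- multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width

# Estimates and standard errors copied from the table in part b).
market_ci = approx_ci(1.3168817, 0.204542)
ibm_ci = approx_ci(0.2275089, 0.110773)

print("Market Return: %.2f to %.2f" % market_ci)
print("IBM Return:    %.2f to %.2f" % ibm_ci)

# The IBM slope is judged significant (at roughly the 5% level)
# when zero falls outside its interval.
ibm_excludes_zero = not (ibm_ci[0] <= 0.0 <= ibm_ci[1])
print("IBM interval excludes zero:", ibm_excludes_zero)
```

The lower endpoint for IBM sits only slightly above zero, which is why the answer to part f) says the interval excludes zero "just barely".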

4M Leasing

a) Without an estimated value for the residual price, the manufacturer may not be able to cover costs when the cars are returned. Perhaps it should have charged more for mileage if this factor has a large effect on resale value.

b) We need multiple regression because it is likely that the two factors are related; namely, that older cars have been driven farther. If we use marginal estimates of these effects, for example, we'll in effect double count for the age of the car when we estimate the impact of mileage on the residual value. That might lead us to charge more than we need to cover our costs. That's OK (from the manufacturer's point of view), but we might be losing profitable sales due to charging too much.

c) Most of the curvature we have seen in previous examples with cars (see Chapter 20) comes from combining very different models: for example, there's more variation in attributes among very expensive cars than among cheaper cars. Also, the nonlinear patterns that come as cars lose value (you cannot lose $10,000 for many years and stay positive) become more evident as cars get much older.

d) The plots appear straight enough, and we can see the collinearity between the two proposed explanatory variables. A few outliers appear in the plots, but none of these seems extreme.
[Figure: scatterplots of Price on Age, Price on Mileage, and Age on Mileage]

e)

R2   0.510372
se   3178.879
n    218

Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   40323.937    721.8478      55.86     <.0001
Age         -1853.803    288.8791      -6.42     <.0001
Mileage     -0.124023    0.02375       -5.22     <.0001
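Written out, the fitted equation from this table is Estimated Price = 40323.937 - 1853.803 Age - 0.124023 Mileage. The sketch below evaluates it, assuming Age is measured in years and Mileage in miles (as the plot axes suggest). The function name and the example car (3 years old, 30,000 miles) are hypothetical, chosen only for illustration.

```python
def estimated_price(age_years, mileage):
    """Fitted resale price in dollars from the multiple regression in part e)."""
    intercept = 40323.937
    slope_age = -1853.803      # dollars per year of age, at fixed mileage
    slope_mileage = -0.124023  # dollars per mile, at fixed age
    return intercept + slope_age * age_years + slope_mileage * mileage

# Hypothetical car: 3 years old with 30,000 miles.
price = estimated_price(3, 30000)
print(round(price, 2))
```

Because both slopes are partial slopes, changing age while holding mileage fixed shifts the estimate by the age slope alone, and likewise for mileage.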

f) The residuals have similar variances and are nearly normal. The diagonal stripes come from rounding of the prices. There's one unusually expensive car among these ($13,800, in row 17), but otherwise nothing stands out as particularly troublesome.
[Figure: Price Residual on Price Predicted; histogram of residuals (Count) with Normal Quantile Plot]

g) For the effect of age on residual value, the 95% confidence interval is [-1853.803 - 2 * 288.8791, -1853.803 + 2 * 288.8791] = [-2431.5612, -1276.0448], which rounds to a drop in resale value of about $1,280 to $2,430 per year. For mileage, [-0.124023 - 2 * 0.02375, -0.124023 + 2 * 0.02375] = [-0.171523, -0.076523], which rounds to a drop of $0.077 to $0.172 per mile.

h) To cover the loss in value of the car over the term of the lease, I recommend that we structure the lease for a 3-series BMW to cost $2,400 per year with an additional $0.18 per mile. These estimates on average will cover the costs due to aging with 95% confidence. [You might also suggest a lease that allows 10,000 miles, pushing that 10000 * 0.172 = $1,720 into the set annual price, with say $0.17 per additional mile. This would make the total cost per year 2400 + 1720 = $4,100, or about $350 per month. In fact, at the time of this writing, you could lease a BMW 325i for 36 months at $420 per month, plus $3,400 down and $0.20 per mile over 30,000.]

i) This analysis ignores the fact that these cars cost different amounts at the time of purchase. We have not observed the actual loss in value; we've only seen how time (and mileage) has affected their value. They did not all start from the same initial cost. Also, we have not identified other differences among these cars, such as special options that might increase the value of the car further. Other factors, such as a blow to the reputation of BMW, would also make the estimates from this model inaccurate.
