Professional Documents
Culture Documents
Learning Objectives
1. Understand how regression analysis can be used to develop an equation that estimates
2. Understand the differences between the regression model, the regression equation, and
3. Know how to fit an estimated regression equation to a set of sample data based on the
4. Be able to determine how good a fit is provided by the estimated regression equation and
compute the sample correlation coefficient from the regression analysis output.
5. Understand the assumptions necessary for statistical inference and be able to test for a
significant relationship.
7. Learn how to use a residual plot to make a judgment as to the validity of the regression
assumptions.
1. a.
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
xi 15 yi 40
x 3 y 8
d. n 5 n 5
( xi x )( yi y ) 26 ( xi x ) 2 10
( xi x )( yi y ) 26
b1 2.6
( xi x )2 10
b0 y b1 x 8 (2.6)(3) 0.2
yˆ 0.2 2.6 x
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
2. a.
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
( xi x )( yi y ) 540 ( xi x ) 2 180
( xi x )( yi y ) 540
b1 3
( xi x ) 2 180
b0 y b1 x 35 (3)(11) 68
yˆ 68 3 x
e. yˆ 68 3(10) 38
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
3. a.
xi 50 yi 83
x 10 y 16.6
b. n 5 n 5
( xi x )( yi y ) 171 ( xi x ) 2 190
( xi x )( yi y ) 171
b1 0.9
( xi x )2 190
yˆ 7.6 0.9 x
c. yˆ 7.6 0.9(6) 13
4. a.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
b. There appears to be a positive linear relationship between the percentage of women
working in the five companies (x) and the percentage of management jobs held by
c. Many different straight lines can be drawn to provide a linear approximation of the
line that “best” represents the relationship according to the least squares criterion.
( x i x )( y i y ) 624 ( x i x ) 2 480
( xi x )( yi y ) 624
b1 1.3
( xi x )2 480
b0 y b1 x 43 1.3(60) 35
yˆ 35 1.3 x
5. a.
25
Number of Defective Parts
20
15
10
0
0 2 4 6 8 10 12
Line Speed (meters per minute)
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
b. There appears to be a negative relationship between line speed (meters per minute)
c. Let x = line speed (meters per minute) and y = number of defective parts.
( xi x )( yi y ) 60 ( xi x ) 2 40
( xi x )( yi y ) 60
b1 1.5
( xi x ) 2 40
b0 y b1 x 17 (1.5)(7) 27.5
yˆ 27.5 1.5 x
6. a.
number of passing yards per attempt and y = the percentage of games won by the
team.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
c. x xi / n 680 / 10 6.8 y yi / n 464 / 10 46.4
( xi x )( yi y ) 121.6 ( xi x ) 2 7.08
( xi x )( yi y ) 121.6
b1 17.1751
( xi x ) 2 7.08
yˆ 70.391 17.1751x
d. The slope of the estimated regression line is approximately 17.2. So, for every
increase of one yard in the average number of passes per attempt, the percentage of
e. With an average number of passing yards per attempt of 6.2, the predicted percentage
of games won is ŷ = –70.391 + 17.175(6.2) = 36%. With a record of seven wins and
nine losses, the percentage of Kansas City Chiefs wins is 43.8 or approximately 44%.
Considering the small data size, the prediction made using the estimated regression
7. a.
150
140
130
Annual Sales ($1000s)
120
110
100
90
80
70
60
50
0 2 4 6 8 10 12 14
Years of Experience
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
b. Let x = years of experience and y = annual sales ($1000s)
( xi x )( yi y ) 568 ( xi x ) 2 142
( xi x )( yi y ) 568
b1 4
( xi x )2 142
b0 y b1 x 108 (4)(7) 80
y 80 4 x
8. a.
4.5
4.0
Satisfaction
3.5
3.0
2.5
2.0
2.0 2.5 3.0 3.5 4.0 4.5
Speed of Execution
( x i x )( y i y ) 2.4 ( x i x ) 2 2.6
( xi x )( yi y ) 2.4
b1 .9077
( xi x ) 2 2.6
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
b0 y b1 x 3.2 (.9077)(3.3) .2046
yˆ .2046 .9077 x
d. The slope of the estimated regression line is approximately .9077. So, a one unit
increase in the speed of execution rating will increase the overall satisfaction rating
by approximately .9 points.
e. The average speed of execution rating for the other brokerage firms is 3.4. Using this
as the new value of x for Zecco.com, we can use the estimated regression equation
Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.
9. a.
25
20
Landscaping Expenditures ($000)
15
10
0
0 200 400 600 800 1000
Home Value ($000)
b. The scatter diagram indicates a positive linear relationship between x = Home Value
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
($ thousands) and y = landscaping expenditures ($ thousands).
( xi x )( y i y ) 8194.15 ( xi x ) 2 382633.5
( xi x )( yi y ) 8194.15
b1 .021415
( xi x ) 2 382633.5
y = .0214 x + 6.3091
10. a.
350.00
300.00
250.00
Price ($)
200.00
150.00
100.00
50.00
0.00
0 10 20 30 40 50
Age (years)
b. The scatter diagram indicates a positive linear relationship between x = age of wine
and y = price of a 750-ml bottle of wine. In other words, the price of the wine
( x i x )( y i y ) 3838.2 ( x i x ) 2 552.1
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
( xi x )( yi y ) 3838.2
b1 6.952
( xi x ) 2 552.1
yˆ 9.0216 6.952 x
d. The slope of the estimated regression line is approximately 6.95. So, for every
11. a.
85
80
75
Overall Score
70
65
60
55
50
400 600 800 1000 1200 1400
Price ($)
b. The scatter diagram indicates a positive linear relationship between x = price ($) and y
= overall score.
( xi x )( yi y ) 11,900
b1 .021212
( xi x )2 561,000
yˆ 53.864 .0212 x
d. The slope of .0212 means that spending an additional $100 in price will increase the
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
overall score by approximately two points.
12. a.
10.00
8.00
6.00
% Return Coca-Cola
4.00
2.00
0.00
-6.00 -4.00 -2.00 0.00 2.00 4.00 6.00 8.00 10.00
-2.00
-4.00
% Return S&P 500
percentage return of the S&P 500 and y = percentage return for Coca-Cola.
c. x xi / n 2.00 / 10 .2 y yi / n 15 / 10 1.5
( xi x )( y i y ) 95 ( xi x ) 2 179.6
( xi x )( yi y ) 95
b1 .5290
( xi x ) 2 179.6
yˆ 1.3942 .529 x
d. A 1-percent increase in the percentage return of the S&P 500 will result in a .529
e. The beta of .529 for Coca-Cola differs somewhat from the beta of .82 reported by
Yahoo Finance. This is likely to the result of differences in the period over which the
data were collected and the amount of data used to calculate the beta. Note: Yahoo
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
uses the last five years of monthly returns to calculate beta.
13. a.
30.0
Reasonable Amount of Itemized
25.0
Deductions ($1000s)
20.0
15.0
10.0
5.0
0.0
0.0 20.0 40.0 60.0 80.0 100.0 120.0 140.0
( xi x )( yi y ) 1233.7 ( xi x ) 2 7648
( xi x )( yi y ) 1233.7
b1 0.1613
( xi x )2 7648
y 4.68 016
. x
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
14. a.
9
8
Number of Days Absent 7
6
5
4
3
2
1
0
0 5 10 15 20
Distance to Work (kilometers)
The scatter diagram indicates a negative linear relationship between x = distance to work
b. x xi / n 90 / 10 9 y yi / n 50 / 10 5
( xi x )( y i y ) 95 ( xi x ) 2 276
( xi x )( yi y ) 95
b1 .3442
( xi x ) 2 276
b0 y b1 x 5 ( .3442)(9) 8.0978
yˆ 8.0978 .3442 x
six days.
15. a. The estimated regression equation and the mean for the dependent variable are:
The sum of squares resulting from error and the total sum of squares are:
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Thus, SSR = SST – SSE = 80 – 12.4 = 67.6
The least squares line provided a good fit; 84.5% of the variability in y has been
16. a. The estimated regression equation and the mean for the dependent variable are:
yˆi 68 3x y 35
The sum of squares due to error and the total sum of squares are
The least squares line provided an excellent fit; 87.6% of the variability in y has been
Note: the sign for r is negative because the slope of the estimated regression equation is
negative.
(b1 = –3)
17. The estimated regression equation and the mean for the dependent variable are:
The sum of squares due to error and the total sum of squares are:
SSR 1512.376
r2 .84
b. SST 1800
c. r r 2 .84 .917
19. a. The estimated regression equation and the mean for the dependent variable are:
ŷ = 80 + 4x y = 108
The sum of squares resulting from error and the total sum of squares are:
We see that 93% of the variability in y has been explained by the least squares line.
( xi x )( yi y ) 14,125 ( xi x ) 2 4.396
( xi x )( yi y ) 14,125
b1 3213.15
( xi x ) 2 4.396
yˆ 28,941.7 3213.15 x
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
SSR = SST – SSE = 52,120,800 – 6,735,080 = 45,385,720
Thus, an estimate of the price for a bike that weighs 7 kilograms is $6451.
( xi x )( yi y ) 712,500
b1 7.6
( xi x ) 2 93, 750
y 1246.67 7.6 x
b. $7.60
c. The sum of squares resulting from error and the total sum of squares are:
We see that 95.87% of the variability in y has been explained by the estimated regression
equation.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
SSR 9524.97
r2 .9013
SST 10,568
90% of the variability in the dependent variable was explained by the linear
c. r r .9013 .95
2
c. ( xi x ) 2 10
s 2.033
sb1 0.643
( xi x ) 2
10
b1 2.6
t 4.044
sb1 .643
d.
Using t table (3 degrees of freedom), area in tail is between .01 and .025.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Source of Sum of Degrees of Mean F p-value
Total 80.0 4
c. ( xi x ) 2 180
s 8.7560
sb1 0.6526
( xi x ) 2
180
b1 3
t 4.59
sb1 .653
d.
Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less
than .02.
than .025.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Source of Variation Sum of Degrees of Mean F p-value
Total 1850 4
b. ( xi x ) 190
2
s 6.5141
sb1 0.4726
( xi x ) 2
190
b1 .9
t 1.90
sb1 .4726
Using t table (3 degrees of freedom), area in tail is between .05 and .10.
related.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
related.
( x x ) 2
14,950
s 8.4797
sb1 .0694
(x x )
i
2
14,950
b1 .318
t 4.58
sb1 .0694
Using t table (4 degrees of freedom), area in tail is between .005 and .01.
c.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Variation Squares Freedom Square
Total 1,800 5
27. a.
The scatter diagram suggests a positive linear relationship between the two
variables.
( xi x )( yi y ) 187136 ( xi x ) 2.5638
2
( xi x )( yi y ) 187136
b1 72990.51
( xi x ) 2 2.5638
yˆ 72990.51 x 110210
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
c. SSE = ( yi yˆi )2 1047.747 SST = ( yi y )2 = 14706.9
Thus,
28. The sum of squares due to error and the total sum of squares are:
Thus,
We can use either the t test or F test to determine whether speed of execution and overall
( xi x ) 2 2.6
s .3997
sb1 .2479
( xi x ) 2
2.6
b1 .9077
t 3.66
sb 1
.2479
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Using t table (9 degrees of freedom), the area in the tail is less than .005; the p-
Because we can reject H0: 1 = 0 we conclude that speed of execution and overall
Using the F table (1 degree of freedom numerator and 9 denominator), the p-value
Because we can reject H0: 1 = 0 we conclude that speed of execution and overall
Total 3.5800 10
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
29. SSE = ( yi yˆi ) 2 233,333.33 SST = ( yi y ) 2 = 5,648,333.33
Thus,
Total 5,648,333.33 5
Using the F table (1 degree of freedom numerator and 4 denominator), the p-value
Because p-value , we reject H0: β1 = 0. Production volume and total cost are
related.
Thus,
s 260.7575 16.1480
( xi x )2 = 56.655
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
s 16.148
sb1 2.145
(x i x) 2
56.655
b1 12.966
t 6.045
sb1 2.145
Using the F table (1 degree of freedom numerator and 8 denominator), the p-value
32. a. s = 2.033
x 3 ( xi x )2 10
1 ( x* x )2 1 (4 3) 2
s yˆ * s 2.033 1.11
n ( xi x ) 2
5 10
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
yˆ * t /2 s yˆ*
or 7.07 to 14.13.
1 ( x* x )2 1 (4 3)2
spred s 1 2.033 1 2.32
n ( xi x )2 5 10
c.
d. ŷ t /2 spred
*
or 3.22 to 17.98.
33. a. s = 8.7560
x 11 ( xi x )2 180
1 ( x* x )2 1 (8 11)2
s yˆ * s 8.7560 4.3780
n ( xi x ) 2
5 180
yˆ * t /2syˆ*
or 30.07 to 57.93.
1 ( x* x ) 2 1 (8 11) 2
spred s 1 8.7560 1 9.7895
n ( xi x ) 2
5 180
c.
ŷ * t /2 spred
d.
44 + 3.182(9.7895) = 44 + 31.15
or 12.85 to 75.15.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
34. s = 6.5141
x 10 ( xi x )2 190
1 ( x* x )2 1 (12 10)2
s yˆ* s 6.5141 3.0627
n ( xi x )2 5 190
yˆ * t /2syˆ*
or 8.65 to 28.15.
1 ( x* x ) 2 1 (12 10) 2
spred s 1 6.5141 1 7.1982
n ( xi x )2 5 190
ŷ * t /2 spred
or -4.50 to 41.30.
The two intervals are different because there is more variability associated with
x 4.6 ( xi x )2 32.4
1 ( x* x )2 1 (3 4.6)2
syˆ* s 9.7172 4.11
n ( xi x )2 10 32.4
yˆ * t /2syˆ*
or 21.32 to 40.28.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
1 ( x* x )2 1 (3 4.6)2
spred s 1 9.7172 1 10.55
n ( xi x )2 10 32.4
c.
ŷ * t /2 spred
or 6.47 to 55.13.
d. As expected, the prediction interval is much wider than the confidence interval. This
is because it is more difficult to predict the waiting time for an individual customer
arriving with three people in line than it is to estimate the mean waiting time for a
1 ( x* x ) 2 1 (9 7)2
s yˆ* s 4.6098 1.6503
n ( xi x )2 10 142
36. a.
yˆ * t /2syˆ*
yˆ * 80 4 x * 80 4(9) 116
1 ( x* x ) 2 1 (9 7)2
spred s 1 4.6098 1 4.8963
n ( xi x )2 10 142
b.
ŷ * t /2 spred
c. As expected, the prediction interval is much wider than the confidence interval. This
is because it is more difficult to predict annual sales for one new salesperson with
nine years of experience than it is to estimate the mean annual sales for all
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
salespersons with nine years of experience.
37. a. x 57 ( xi x )2 7648
s2 = 1.88 s = 1.37
1 ( x* x )2 1 (52.5 57)2
s yˆ * s 1.37 0.52
n ( xi x ) 2 7 7648
yˆ * t /2syˆ*
spred
b. = 1.47
d. Any deductions exceeding the $16,860 upper limit could suggest an audit.
b. x 575 ( xi x )2 93,750
1 ( x* x )2 1 (500 575)2
spred s 1 241.52 1 267.50
n ( xi x )2 6 93,750
ŷ * t /2 spred
or $3,815.10 to $6,278.24.
c. Based on one month, $6,000 is not out of line because $3,815.10 to $6,278.24 is the
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
prediction interval. However, a sequence of five to seven months with consistently
39. a. With
x* = 89, yˆ * 17.49 1.0334x* 17.49 1.0334(89) $109.46
s 220.2 14.391
1 ( x* x ) 2 1 (89 105)2
s yˆ* s 14.8391 6.1819
n ( xi x )2 9 4100
or $94.84 to $124.08.
1 ( x* x ) 2 1 (128 105)2
spred s 1 14.8391 1 16.525
n ( xi x )2 9 4100
ŷ * t /2 spred
or $110.69 to $188.85.
40. a. 9
b. ŷ = 20.0 + 7.21x
c. 1.3626
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Using Excel, the p-value corresponding to F = 28.00 is .0011.
b1 B1 .8951 0
t 6.01
sb1 .149
b.
Using the t table (8 degrees of freedom), the area in the tail is less than
.005.
b. 30
c. F = MSR / MSE = 6828.6/82.1 = 83.17
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
43. a.
800
700
600
500
Total Batch Time
400
300
200
100
0
0 100 200 300 400 500 600 700
Quantity
ŷ = .7631(Quantity) + 117.5419
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
The intercept is the estimate for the setup time (117.5 minutes), and the slope (.7631
1000
900
800
700
600
Price ($)
500
400
300
200
100
0
1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2
Weight (kg)
b. There appears to be a negative linear relationship between the two variables. The
Analysis of Variance
Total 17 597626
Model Summary
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
S R-sq R-sq(adj)
Coefficients
Regression Equation
R Large residual
xi 70 yi 76
x 14 y 15.2
45. a. n 5 n 5
( xi x )( yi y ) 200 ( xi x ) 2 126
( xi x )( yi y ) 200
b1 1.5873
( xi x ) 2 126
yˆ 7.02 1.59 x
c.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
6
Residuals
0
-2
-4
-6
0 5 10 15 20 25
x
With only five observations, it is difficult to determine if the assumptions are satisfied.
However, the plot does suggest curvature in the residuals that would indicate that the
error term assumptions are not satisfied. The scatter diagram for these data also indicates
d. s 23.78
2
1 ( xi x ) 2 1 ( x 14) 2
hi i
n ( xi x ) 2
5 126
e. The standardized residual plot has the same shape as the original residual plot. The
curvature observed indicates that the assumptions regarding the error term may not be
satisfied.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
46. a. yˆ 2.32 .64 x
b.
4
3
2
1
Residuals
0
-1
-2
-3
-4
0 5 10
x
The assumption that the variance is the same for all values of x is
yˆ 29.4 1.55 x
Because p-value = .05, we conclude that the two variables are related.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
c.
10
0
Residuals
-5
-10
-15
25 35 45 55 65
Predicted Values
d. The residual plot leads us to question the assumption of a linear relationship between
x and y. Even though the relationship is significant at the .05 level of significance, it
48. a. yˆ 80 4 x
2
Residuals
-2
-4
-6
-8
0 2 4 6 8 10 12 14
x
Regression Statistics
Multiple R 0.8092
R Square 0.6549
Observations 61.0000
ANOVA
df SS MS F Significance F
Total 60 9,142.6756
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
b.
30
25
20
Residual 15
10
5
0
0.00 5,000.00 10,000.00 15,000.00 20,000.00 25,000.00 30,000.00
-5
-10
-15
Density
c. The residual plot leads us to question the assumption of constant variance of the error
terms. The plot indicates that the absolute value of the residuals is larger for larger
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
The first data point (y= 145, x=135) has a large standardized residual.
b.
2.5
2.0
1.5
Standardized Residual
1.0
0.5
0.0
-0.5
-1.0
The standardized residual plot indicates that the observation x = 135, y = 145 may be an
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
c. The scatter diagram follows.
150
145
140
135
130
125
y
120
115
110
105
100
100 110 120 130 140 150 160 170 180
The scatter diagram also indicates that the observation x = 135, y = 145 may be an
outlier; the implication is that for simple linear regression an outlier can be identified by
Analysis of Variance
Total 7 101.500
Model Summary
S R-sq R-sq(adj)
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
3.18123 40.18% 30.21%
Coefficients
Regression Equation
y = 13.00 + 0.425 x
R Large residual
X Unusual X
The standardized residuals are: –1.00, –.40, .01, –.48, .25, .65, 2.00, –2.16.
The last two observations in the data set appear to be outliers because the
standardized residuals for these observations are 2.00 and –2.16, respectively.
Here we identify an observation as having high leverage if hi > 6/n; for these data, 6/n =
6/8 = .75. Because the leverage for the observation x = 22, y = 19 is .76, we would
an influential observation.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
c.
30
25
20
15
y
10
0
0 5 10 15 20 25
x
observation.
52. a.
120
100
Program Expenses ($)
80
60
40
20
0
0 5 10 15 20 25
Fund‐raising Expenses (%)
The scatter diagram does indicate potential influential observations. For example, the
22.2% fund-raising expense for the American Cancer Society and the 16.9% fund-raising
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
expense for the St. Jude Children’s Research Hospital look like they may each have a
large influence on the slope of the estimated regression line. And with a fund-raising
expense of 2.6%, the percentage spent on programs and services by the Smithsonian
Institution (73.7%) seems to be somewhat lower than would be expected; thus, this
Analysis of Variance
Total 9 855.2
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Fits and Diagnostics for Unusual Observations
R Large residual
X Unusual X
c. The slope of the estimated regression equation is –0.917. Thus, for every 1% increase
in the amount spent on fund-raising, the percentage spent on program expenses will
decrease by .917%; in other words, just a little less than 1%. The negative slope and
d. The output in part b indicates that there are two unusual observations:
standardized residual.
low side compared to most of the other supersized charities, the percentage spent
on program expenses appears to be much lower than one would expect. It appears
that the Smithsonian’s administrative expenses are too high. But thinking about
the expenses of running a large museum like the Smithsonian, the percentage
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
costs for a museum are higher than for some other types of organizations. The
especially large value of fund-raising expenses for the American Cancer Society
suggests that this obervation has a large influence on the estimated regression
equation. The following output shows the results if this observation is deleted
The y-intercept has changed slightly, but the slope has changed from –.917
to –1.00.
53. a.
60
50
Shopping Time (Minutes)
40
30
20
10
0
0 20 40 60 80 100 120 140
Arrival Time (Minutes before 6:00 p.m.)
b. There appears to be a positive relationship between the two variables, but observation
32 (111, 24) appears to be an observation with high leverage and may be especially
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
influential in terms of fitting a linear model to the data.
c.
d. In part c, we have highlighted observation 32, which has a standardized residual less
e. Looking at the scatter diagram in part a, observation 32 probably will have a lot of
Note that the slope of the estimated regression equation is now .2829 as compared to a
value of .2543 when this observation is included. Thus, we see that this observation has
an impact on the value of the slope of the fitted line and hence we would say that it is an
influential observation. Also note that the R2 value has increased from .65 to .75,
indicating that observation 32 was an outlier. If possible, Kroger should analyze the
dissimilar from the other observations (or if this observation suffers from data recording
error), then removing it from the analysis may be reasonable. Otherwise, Kroger should
be wary of removing the observation (and thereby inflating the accuracy of the model)
but instead investigate the possibility of additional data collection to reduce its influence.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
54. a.
2,500
1,500
1,000
500
0
0 100 200 300 400 500
Revenue ($ millions)
The scatter diagram does indicate potential outliers or influential observations (or both).
For example, the New York Yankees have both the highest revenue and value and appear
to be an influential observation. The Los Angeles Dodgers have the second highest value
Regression Statistics
Multiple R 0.9062
R Square 0.8211
Observations 30
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
ANOVA
df SS MS F Significance F
Total 29 4296009.367
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Thus, the estimated regression equation that can be used to predict the team’s value given the value of annual
c. The standard residual value for the Los Angeles Dodgers is 4.7 and should be treated as an outlier. To determine if the New
York Yankees point is an influential observation, we can remove the observation and compute a new estimated regression
equation. The results show that the estimated regresssion equation is ŷ = –449.061 + 5.2122 revenue. The following two
scatter diagrams illustrate the small change in the estimated regression equation after removing the observation for the New
York Yankees. These diagrams show that the effect of the New York Yankees observation on the regression results is not
that dramatic.
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Scatter Diagram Including the New York Yankees Observation
55. No. Regression or correlation analysis can never prove that two variables are causally
related.
56. The estimate of a mean value is an estimate of the average of all y values associated with
the same x. The estimate of an individual y value is an estimate of only one of the y
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
57. The purpose of testing whether 1 0 is to determine whether or not there is a significant
relationship between x and y. However, rejecting 1 0 does not necessarily imply a good
58. a.
1420
1400
1380
1360
S&P 500
1340
1320
1300
1280
1260
12200 12400 12600 12800 13000 13200 13400
DJIA
Analysis of Variance
Total 14 23346
Model Summary
S R-sq R-sq(adj)
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Coefficients
Regression Equation
c. Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value
d. With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.
f. The DJIA is not that far beyond the range of the data. With the excellent fit provided
by the estimated regression equation, we should not be too concerned about using the
59. a.
350.0
300.0
Selling Price ($1000s)
250.0
200.0
150.0
100.0
50.0
0.0
0 50 100 150 200 250 300
Size (sq. m.)
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
The scatter diagram suggests that there is a linear relationship between size and
approximately $176,692.
e. The estimated regression equation should provide a good estimate because r2 = 0.897.
f. This estimated equation might not work well for other cities. Housing markets are
also driven by other factors that influence demand for housing such as job market and
quality-of-life factors. For example, because of the existence of high-tech jobs and its
proximity to the ocean, Seattle, Washington, has houses that are quite different from
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
60. a.
The scatter diagram indicates a positive linear relationship between the two variables.
Online universities with higher retention rates tend to have higher graduation rates.
Analysis of Variance
Total 28 2725.3
Model Summary
S R-sq R-sq(adj)
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Coefficients
Regression Equation
R Large residual
X Unusual X
d. The estimated regression equation is able to explain 44.9% of the variability in the
graduation rate based on the linear relationship with the retention rate. It is not a great
fit, but the fit is reasonably good given the type of data.
standardized residual. With a retention rate of 51% it does appear that the graduation
rate of 25% is low compared to the results for other online universities. The president
of South University should be concerned after looking at the data. Using the
x value gives it large influence. With a retention rate of only 4%, the president of the
Analysis of Variance
Total 9 1004.5
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Usage 30
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Fit SE Fit 95% CI 95% PI
H0: 1 = 0.
Analysis of Variance
Total 5 34.000
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Variable Setting
Speed 10
b. Because the p-value corresponding to F = 11.33 = .028 < = .05, the relationship is
significant.
c. r 2 = .739; a good fit. The least squares line explained 73.9% of the variability in the
number of defects.
d. Using the output in part a, the 95% confidence interval is 12.294 to 17.2712.
63. a.
9
8
7
6
5
Days
4
3
2
1
0
0 5 10 15 20
Distance
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Analysis of Variance
Total 9 46.000
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Distance 5
There is a significant relationship between the number of days absent and the
distance to work.
e. The 95% confidence interval is 5.19502 to 7.5586 or approximately 5.2 to 7.6 days.
Analysis of Variance
Total 9 357650
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
Age 4
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
746.667 29.7769 (678.001, 815.332) (559.515, 933.818)
b. Because the p-value corresponding to F = 54.75 is .000 < α = .05, we reject H0: 1 =
Analysis of Variance
Total 9 3702.5
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Hours 95
b. Because the p-value corresponding to F = 57.42 is .000 < α = .05, we reject H0: 1 =
c. 84.65 points
Analysis of Variance
Total 9 107.04
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Horizon = 0.275 + 0.950 S&P 500
b. Because the p-value = 0.029 is less than α = .05, the relationship is significant.
c. r2 = .470. The least squares line does not provide an especially good fit.
Analysis of Variance
Total 19 1.0020
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
Variable Setting
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Adjusted_Gross Income 35000
b. Because the p-value = 0.038 is less than α = .05, the relationship is significant.
c. r2 = .217. The least squares line does not provide a good fit.
68. a.
18
16
14
12
Price (S1000s)
10
8
6
4
2
0
0 50 100 150 200
Kilometers (1000s)
b. There appears to be a negative relationship between the two variables that can be
approximated by a straight line. An argument could also be made that the relationship
is perhaps curvilinear because at some point a car has so many kilometers that its
Analysis of Variance
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
Source DF Adj SS Adj MS F-Value P-Value
Total 18 87.547
Model Summary
S R-sq R-sq(adj)
Coefficients
Regression Equation
e. r 2 = .5396; a reasonably good fit considering that the condition of the car is also an
f. The slope of the estimated regression equation is –.03657. Thus, a one-unit increase
in the value of x coincides with a decrease in the value of y equal to .03657. Because
the data were recorded in thousands, every additional 1,000 kilometers on the car’s
g. The predicted price for a 2007 Camry with 97,000 kilometers is ŷ = 16.474 –
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.
whether the seller is a private party or a dealer, this is probably not the price you
would offer for the car. But it should be a good starting point in figuring out what to
© 2020 Cengage Learning®. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or
in part.