

CHAPTER 14—Multiple Regression


14.1 SSE

14.2 Insert the x values into the least squares equation and solve for ŷ.

14.3 a. b0 = 29.347, b1 = 5.6128, b2 = 3.8344

b0 = 29.347 has no practical interpretation, since no house has a size of 0 square feet and a niceness rating of 0.

b1 = 5.6128 implies that we estimate that mean sales price increases by $5,612.80 for
each increase of 100 square feet in house size, when the niceness rating stays constant.

b2 = 3.8344 implies that we estimate that mean sales price increases by $3,834.40 for
each increase in niceness rating of 1, when the square footage remains constant.

b. ŷ = 29.347 + 5.6128(20) + 3.8344(8) = 172.28 ($172,280)
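As a quick check, the prediction in part b can be reproduced in a few lines of Python (a minimal sketch using the estimates from part a):

b0, b1, b2 = 29.347, 5.6128, 3.8344  # least squares estimates from part a
x1, x2 = 20, 8                       # home size (hundreds of sq ft), niceness rating
y_hat = b0 + b1 * x1 + b2 * x2       # 29.347 + 112.256 + 30.6752
print(y_hat)                         # 172.2782, i.e., about $172,280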

14.4 a. b0 = 7.5891, b1 = –2.3577, b2 = 1.6122, b3 = 0.5012

b. ŷ = 7.5891 – 2.3577(3.70) + 1.6122(3.90) + .5012(6.50) = 8.41099 (hundreds of thousands of bottles)

sales revenue = ($3.70)(841,099 bottles) = $3,112,066.30

14.5 a. b0 = 1946.8020, b1 = 0.0386, b2 = 1.0394, b3 = –413.7578

b. ŷ = 15,896.24725, from ŷ = 1946.802 + .03858(56194) + 1.0394(14077.88) – 413.7578(6.89) ≈ 15,896.25.

c. Therefore, actual hours were 17207.31 – 15896.25 = 1311.06 hours greater than predicted.

14.6 s² and s estimate the variance and standard deviation of the population of potential error term values.

14.7 a. The proportion of the total variation in the observed values of the dependent variable
explained by the multiple regression model.

b. The adjusted R-squared differs from R-squared by taking into consideration the number
of independent variables in the model.

14.8 The overall F test is used to determine whether at least one of the independent variables is significantly related to y.

14.9 (1)

(2) Total variation = 7447.5


Unexplained variation = 73.6
Explained variation = 7374


(3) R² = explained variation/total variation = 7374/7447.5 = .9901; adjusted R² = .9873

R² and adjusted R² are close together and close to 1.

(4) F(model) = (explained variation/k) / (unexplained variation/(n – (k + 1))) = (7374/2)/(73.6/7) ≈ 350.87

(5) Based on 2 and 7 degrees of freedom, F.05 = 4.74. Since F(model) = 350.87 > 4.74, we
reject H0: β1 = β2 = 0 by setting α = .05.

(6) Based on 2 and 7 degrees of freedom, F.01 = 9.55. Since F(model) = 350.87 > 9.55, we
reject H0: β1 = β2 = 0 by setting α = .01.

(7) p-value = 0.00 (which means less than .001). Since this p-value is less than α = .10, .05,
.01, and .001, we have extremely strong evidence that H0: β1 = β2 = 0 is false. That is,
we have extremely strong evidence that at least one of x1 and x2 is significantly related to
y.
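The computations in (1)–(7) can be verified with a short Python sketch (assuming SciPy is available; the sums of squares are those given in part (2)):

import scipy.stats as st

explained, unexplained = 7374.0, 73.6  # from part (2)
n, k = 10, 2                           # observations, independent variables
s2 = unexplained / (n - (k + 1))       # s^2 = 10.514
F = (explained / k) / s2               # about 350.7 (350.87 from unrounded sums of squares)
F_05 = st.f.ppf(0.95, k, n - (k + 1))  # 4.74
F_01 = st.f.ppf(0.99, k, n - (k + 1))  # 9.55
p_value = st.f.sf(F, k, n - (k + 1))   # essentially 0
print(F, F_05, F_01, p_value)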

14.10 (1) SSE = 1.4318; s² = SSE/(n – (k + 1)) = 1.4318/26 = .0551; s = √.0551 = .2347

(2) Total variation = 13.4586


Unexplained variation = 1.4318
Explained variation = 12.0268

(3) R² = 12.0268/13.4586 = .8936; adjusted R² = .8813

R² and adjusted R² are close together.


(4) F(model) = (explained variation/k) / (unexplained variation/(n – (k + 1))) = (12.0268/3)/(1.4318/26) = 72.80

(5) Based on 3 and 26 degrees of freedom, F.05 = 2.98. Since F(model) = 72.80 > 2.98, we
reject H0: β1 = β2 = β3 = 0 by setting α = .05.

(6) Based on 3 and 26 degrees of freedom, F.01 = 4.64. Since F(model) = 72.80 > 4.64, we
reject H0: β1 = β2 = β3 = 0 by setting α = .01.

(7) p-value is less than .001. Since this p-value is less than α = .10, .05, .01, and .001, we
have extremely strong evidence that H0: β1 = β2 = β3 = 0 is false. That is, we have
extremely strong evidence that at least one of x1, x2, x3 is significantly related to y.

14.11 (1) SSE = 1,798,712.2179

s² = SSE/(n – (k + 1)) = 1,798,712.2179/12 = 149,892.68483

s = √149,892.68483 = 387.15977

(2) total variation = 464,126,601.6

unexplained = 1,798,712.2179

explained = 462,327,889.39

(3) R² = 462,327,889.39/464,126,601.6 = .9961; adjusted R² = .9952

R² and adjusted R² are close to each other and close to 1.

(4) F = (explained variation/k) / (unexplained variation/(n – (k + 1))) = (462,327,889.39/3)/(1,798,712.2179/12) =
1028.131

(5) Based on 3 and 12 degrees of freedom, F.05 = 3.49

F = 1028.131 > F.05 = 3.49. Reject H0: β1 = β2 = β3 = 0 at α = .05


(6) Based on 3 and 12 degrees of freedom, F.01 = 5.95

F = 1028.131 > F.01 = 5.95. Reject H0: β1 = β2 = β3 = 0 at α = .01

(7) p-value = .0001. Reject H0 at α = .05, .01, and .001.

14.12 The independent variable is significantly related to y, with strong evidence in (a) and very strong evidence in (b).

14.13 Explanations will vary.

14.14 We first consider the intercept β0.

(1) b0 = 29.347, s_b0 = 4.891, t = 6.00
where t = b0/s_b0 = 29.347/4.891 = 6.00
(2) We reject H0: β0 = 0 (and conclude that the intercept is significant) with α = .05
if |t| > t.05/2 = t.025
Since t.025 = 2.365 (with n – (k + 1) = 10 – (2 + 1) = 7 degrees of freedom), we have t =
6.00 > t.025 = 2.365.
We reject H0: β0 = 0 with α = .05 and conclude that the intercept is significant at the .05
level.
(3) We reject H0: β0 = 0 with α = .01 if |t| > t.01/2 = t.005
Since t.005 = 3.499 (with 7 degrees of freedom), we have t = 6.00 > t.005 = 3.499.
We reject H0: β0 = 0 with α = .01 and conclude that the intercept is significant at
the .01 level.
(4) The Minitab output tells us that the p-value for testing H0: β0 = 0 is 0.000. Since this p-
value is less than each given value of α, we reject H0: β0 = 0 at each of these values of
α. We can conclude that the intercept is significant at the .10, .05, .01, and .001
levels of significance.

(5) A 95% confidence interval for β0 is

[b0 ± t_α/2 s_b0] = [b0 ± t.025 s_b0]

= [29.347 ± 2.365(4.891)]

= [17.780, 40.914]

This interval has no practical interpretation since β0 is meaningless.

(6) A 99% confidence interval for β0 is

[b0 ± t.005 s_b0] = [29.347 ± 3.499(4.891)]
= [12.233, 46.461]
We next consider β1.


(1) b1 = 5.6128, s_b1 = .2285, t = 24.56

where t = b1/s_b1 = 5.6128/.2285 = 24.56
(2), (3), and (4):
We reject H0: β1 = 0 (and conclude that the independent variable x1 is significant) at
level of significance α if |t| > t_α/2. Here t_α/2 is based on n – (k + 1) = 10 – 3 = 7 d.f.
For α = .05, t_α/2 = t.025 = 2.365, and for α = .01, t_α/2 = t.005 = 3.499.
Since t = 24.56 > t.025 = 2.365, we reject H0: β1 = 0 with α = .05.
Since t = 24.56 > t.005 = 3.499, we reject H0: β1 = 0 with α = .01.
Further, the Minitab output tells us that the p-value related to testing H0: β1 = 0 is 0.000.
Since this p-value is less than each given value of α, we reject H0 at each of these values
of α (.10, .05, .01, and .001).
The rejection points and p-values tell us to reject H0: β1 = 0 with α = .10, α = .05, α
= .01, and α = .001. We conclude that the independent variable x1 (home size) is
significant at the .10, .05, .01, and .001 levels of significance.
(5) and (6):
95% interval for β1:
[b1 ± t.025 s_b1] = [5.6128 ± 2.365(.2285)]
= [5.072, 6.153]
99% interval for β1:
[b1 ± t.005 s_b1] = [5.6128 ± 3.499(.2285)]
= [4.813, 6.412]
For instance, we are 95% confident that the mean sales price increases by between $5072
and $6153 for each increase of 100 square feet in home size, when the rating stays
constant.
Last, we consider β2.
(1) b2 = 3.8344, s_b2 = .4332, t = 8.85
where t = b2/s_b2 = 3.8344/.4332 = 8.85
(2), (3), and (4):
We reject H0: β2 = 0 (and conclude that the independent variable x2 is significant) at
level of significance α if |t| > t_α/2. Here, t_α/2 is based on n – (k + 1) = 10 – 3 = 7 d.f.
For α = .05, t_α/2 = t.025 = 2.365, and for α = .01, t_α/2 = t.005 = 3.499.
Since t = 8.85 > t.025 = 2.365, we reject H0: β2 = 0 with α = .05.
Since t = 8.85 > t.005 = 3.499, we reject H0: β2 = 0 with α = .01.
Further, the Minitab output tells us that the p-value related to testing H0: β2 = 0 is
0.000. Since this p-value is less than each given value of α, we reject H0 at each of these
values of α (.10, .05, .01, and .001).
The rejection points and p-values tell us to reject H0: β2 = 0 with α = .10, α = .05, α
= .01, and α = .001.


We conclude that the independent variable x2 (niceness rating) is significant at
the .10, .05, .01, and .001 levels of significance.
(5) and (6):
95% interval for β2:
[b2 ± t.025 s_b2] = [3.8344 ± 2.365(.4332)]
= [2.810, 4.860]
99% interval for β2:
[b2 ± t.005 s_b2] = [3.8344 ± 3.499(.4332)]
= [2.319, 5.350]
For instance, we are 95% confident that the mean sales price increases by between $2810
and $4860 for each increase of one rating point, when the home size remains constant.
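Every t statistic, rejection point, p-value, and confidence interval above follows the same recipe; here is a minimal Python sketch for β1, assuming SciPy and using the estimate and standard error quoted from the Minitab output above:

import scipy.stats as st

n, k = 10, 2
df = n - (k + 1)                       # 7 degrees of freedom
b, sb = 5.6128, 0.2285                 # b1 and its standard error s_b1
t = b / sb                             # 24.56
t025 = st.t.ppf(0.975, df)             # 2.365
t005 = st.t.ppf(0.995, df)             # 3.499
ci95 = (b - t025 * sb, b + t025 * sb)  # about [5.072, 6.153]
ci99 = (b - t005 * sb, b + t005 * sb)  # about [4.813, 6.412]
p = 2 * st.t.sf(abs(t), df)            # two-sided p-value, essentially 0
print(t, ci95, ci99, p)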
14.15 Works like Exercise 12.22
ŷ = b0 + b1x1 + b2x2 + b3x3
n – (k + 1) = 30 – (3 + 1) = 26
Rejection points:
t.025 = 2.056, t.005 = 2.779

H0: β0 = 0, t = 3.104; Reject H0 at α = .05 and α = .01

H0: β1 = 0, t = –3.696; Reject H0 at α = .05 and α = .01

H0: β2 = 0, t = 5.459; Reject H0 at α = .05 and α = .01

H0: β3 = 0, t = 3.981; Reject H0 at α = .05 and α = .01

p-value for testing H0: β1 = 0 is .001; Reject H0 at α = .01
p-value for testing H0: β2 = 0 is less than .001; Reject H0 at α = .001
p-value for testing H0: β3 = 0 is .0005; Reject H0 at α = .001
95% C.I.: [bj ± 2.056 s_bj]
99% C.I.: [bj ± 2.779 s_bj]
14.16 Works like Exercise 12.22
ŷ = b0 + b1x1 + b2x2 + b3x3
n – (k + 1) = 16 – (3 + 1) = 12
Rejection points:
t.025 = 2.179, t.005 = 3.055

H0: β0 = 0, t = 3.861; Reject H0 at α = .05 and α = .01

H0: β1 = 0, t = 2.958; Reject H0 at α = .05, not α = .01


H0: β2 = 0, t = 15.386; Reject H0 at α = .05 and α = .01

H0: β3 = 0, t = –4.196; Reject H0 at α = .05 and α = .01

p-value for testing H0: β1 = 0 is .0120; Reject H0 at α = .05
p-value for testing H0: β2 = 0 is .0001; Reject H0 at α = .001
p-value for testing H0: β3 = 0 is .0012; Reject H0 at α = .01
95% C.I.: [bj ± 2.179 s_bj]
99% C.I.: [bj ± 3.055 s_bj]
14.17 You can be x% confident that a confidence interval contains the true average value of y given
particular values of the x’s while you can be x% confident that a prediction interval contains
an individual value of y given particular values of the x’s.

14.18 The midpoint of both the confidence interval and the prediction interval is the value of ŷ
given by the least squares model.

14.19 a. Point estimate is ŷ = 172.28 ($172,280)

95% confidence interval is [168.56, 175.99]

b. Point prediction is ŷ = 172.28

95% prediction interval is [163.76, 180.80]

c. Stdev Fit = s√(Distance value) = 1.57

This implies that Distance value = (1.57 / s)²

= (1.57 / 3.242)²

= 0.2345

The 99% confidence interval for mean sales price is

[ŷ ± t.005 (Stdev Fit)], with t.005 based on 7 degrees of freedom

= [172.28 ± 3.499(1.57)]

= [172.28 ± 5.49]

= [166.79, 177.77]

The 99% prediction interval for an individual sales price is

[ŷ ± t.005 s√(1 + Distance value)]

= [172.28 ± 3.499(3.242)√(1.2345)]


= [172.28 ± 12.60]

= [159.68, 184.88]
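A Python sketch of the interval computations in part c (assuming SciPy; Stdev Fit and s are the output values quoted above):

import math
import scipy.stats as st

y_hat, s, df = 172.28, 3.242, 7
stdev_fit = 1.57                       # Stdev Fit = s * sqrt(Distance value)
dist = (stdev_fit / s) ** 2            # Distance value = 0.2345
t005 = st.t.ppf(0.995, df)             # 3.499
ci99 = (y_hat - t005 * stdev_fit, y_hat + t005 * stdev_fit)  # about [166.79, 177.77]
half = t005 * s * math.sqrt(1 + dist)  # about 12.60
pi99 = (y_hat - half, y_hat + half)    # about [159.68, 184.88]
print(ci99, pi99)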

14.20 a. 95% PI: [7.91876, 8.90255], i.e., between 791,876 and 890,255 bottles.
Minimum revenue: 791,876($3.70) = $2,929,941.20
b. 99% PI; t.005 = 2.779
[8.41065 ± 2.779(.235)√(1 + Distance value)] = [7.74465, 9.07665]

14.21 y = 17207.31 is above the upper limit of the interval [14906.2, 16886.3]; this y-value is
unusually high.

14.22 An independent variable that is measured on a categorical scale.

14.23 You use m – 1 dummy variables to model a categorical variable that can take on m values;
each dummy variable corresponds to one of the values, with the remaining value serving as the baseline.

14.24 The difference in the average of the dependent variable when the dummy variable equals 1 and
when the dummy variable equals 0.
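For illustration, the m – 1 dummy-variable coding can be produced with pandas; the column name and category values below are hypothetical:

import pandas as pd

# a categorical variable with m = 3 values
firms = pd.DataFrame({"type": ["stock", "mutual", "stock", "other"]})
# drop_first=True keeps m - 1 = 2 dummy columns; the dropped value is the baseline
dummies = pd.get_dummies(firms["type"], drop_first=True)
print(dummies)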

14.25 a. Different y-intercept for the different types of firms.

b. β2 equals the difference between the mean innovation adoption times of stock
companies and mutual companies.

c. p-value is less than .001; Reject H0 at both levels of α.

β2 is significant at both levels of α.

95% CI for β2: [4.9770, 11.1339]; 95% confident that for any size of insurance firm,
the mean speed at which an insurance innovation is adopted is between 4.9770 and
11.1339 months faster if the firm is a stock company rather than a mutual company.

d. No interaction

14.26 a. The pool coefficient is $25,862.30. Since the cost of the pool is $35,000, you expect to
recoup $25,862.30 / $35,000 = 74% of the cost.

b. There is not an interaction between pool and any other independent variable.

14.27 a. μB = β0
μM = β0 + βM
μT = β0 + βT
b. F = 184.57, p-value < .001:

Reject H0; conclude there are differences.

c.

95% C.I. for βM:

95% C.I. for βT:

d. Point estimate: 77.20; 95% confidence interval for the mean: [75.040, 79.360]; 95% prediction interval: [71.486, 82.914]

e. p-value < .001; the difference is significant.

14.28 a. The point estimate of the effect on the mean of campaign B compared to campaign A is
b4 = 0.2695.

The 95% confidence interval = [0.1262, 0.4128]

The point estimate of the effect on the mean of campaign C compared to campaign A is
b5 = 0.4396.

The 95% confidence interval = [0.2944, 0.5847]

Campaign C is probably most effective even though intervals overlap.

b. ŷ = 8.7154 – 2.768(3.7) + 1.6667(3.9) + 0.4927(6.5) + 0.4396 = 8.61621

Confidence interval = [8.5138, 8.71862]; Prediction interval = [8.28958, 8.94285]

c. β5 – β4 = the effect on the mean of Campaign C compared to Campaign B.


d. β5 – β4 is significant at α = 0.10 and α = 0.05 because p-value = 0.0179. Thus there is
strong evidence that β5 – β4 is greater than 0.

95% confidence interval = [0.0320, 0.3081]; since it doesn't contain 0, we are at least 95%
confident that β5 – β4 is greater than 0.

14.29 a. No interaction since p-values are so large.

b. ŷ = 8.61178 (861,178 bottles)

95% prediction interval = [8.27089, 8.95266], slightly wider than the corresponding interval in Exercise 14.28.

14.30 Complete: the model that uses all k independent variables.
Reduced: the model obtained by setting the regression parameters tested under H0 equal to zero (that is, by dropping the corresponding independent variables).

14.31 (k – g) denotes the number of regression parameters we have set equal to zero in H0.
[n – (k + 1)] denotes the denominator degrees of freedom.

14.32 To test H0: β4 = β5 = 0 in Model 2, we note that Model 2 is the complete model, and therefore
k = 5. If H0 is true, Model 2 becomes Model 1, which is the reduced model, and therefore k – g =
2. It follows that

F = [(SSE_R – SSE_C)/(k – g)] / [SSE_C/(n – (k + 1))] = 19.703

Based on 2 and 24 degrees of freedom, F.05 = 3.40. Since F = 19.703 is greater than F.05 = 3.40, we
reject H0: β4 = β5 = 0 at α = .05. We have seen in Exercise 12.45 that β4 = μB – μA and β5 = μC – μA.
Therefore, if β4 = β5 = 0, it follows that μB = μA and μC = μA. That is, H0: β4 = β5 = 0 is equivalent
to H0: μA = μB = μC.
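The partial F statistic used in Exercises 14.32–14.34 can be written as a small Python function (assuming SciPy). The SSE values passed in below are hypothetical placeholders, since the exercises' sums of squared errors are not reproduced here:

import scipy.stats as st

def partial_F(sse_reduced, sse_complete, k, g, n):
    # F for H0: the (k - g) extra parameters in the complete model are zero
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator

F = partial_F(sse_reduced=2.8, sse_complete=1.4, k=5, g=3, n=30)  # placeholder SSEs
F_05 = st.f.ppf(0.95, 5 - 3, 30 - (5 + 1))  # F.05 with 2 and 24 df = 3.40
print(F, F_05)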

14.33 Model 3 is the complete model; Model 1 is the reduced model.

F = [(SSE_R – SSE_C)/(k – g)] / [SSE_C/(n – (k + 1))] = 9.228, based on 4 and 22 degrees of freedom.

F.05 = 2.82 and F.01 = 4.31, based on 4 and 22 degrees of freedom.
Since 9.228 > 4.31 (and hence > 2.82), reject H0 at α = .05 and α = .01.


14.34 Model 3 is the complete model; Model 2 is the reduced model.

F = [(SSE_R – SSE_C)/(k – g)] / [SSE_C/(n – (k + 1))] = 0.1502, based on 2 and 22 degrees of freedom.

F.05 = 3.44, based on 2 and 22 degrees of freedom.
Since 0.1502 < 3.44, do not reject H0 at α = .05 or .01.

14.35 When plotting residuals versus predicted values of y you look for the residuals to fall in a
horizontal band to indicate constant variance. When plotting residuals versus the individual
x’s you look for the residuals to fall in a random pattern indicating a linear model is sufficient.
When doing a normal probability plot of the residuals you look for a straight line pattern to
conclude that the errors follow a normal distribution. Finally you plot the residuals versus
time (or observation order) and look for a random pattern to conclude that the error terms are
independent.
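Two of these diagnostic plots can be sketched in Python (assuming SciPy and matplotlib; the residuals below are simulated stand-ins for those of a fitted model):

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 30)  # stand-in residuals

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(range(len(residuals)), residuals)  # vs. observation order: look for randomness
ax1.axhline(0.0)
st.probplot(residuals, plot=ax2)               # normal plot: look for a straight line
plt.show()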

14.36 By doing a normal probability plot of the residuals.

14.37 a. Straight-line appearance.

b. Explanations will vary.

14.38 Autocorrelation still present based on the residual plot below.

[Residual plot: Residual (gridlines = std. error), ranging from about –7.1 to 4.7, versus Observation (0 to 20).]

14.39


14.40 The estimates of the coefficients indicate that at a specified square footage, adding rooms
increases selling price while adding bedrooms reduces selling price. Thus building both a
family room and a living room (while maintaining square footage) should increase sales price.
In addition, adding a bedroom at the cost of another room will tend to decrease selling price.

14.41 A promotion at a weekend game nets, on average, 4745 – 4690 = 55 additional attendees,
while a promotion at a day game nets, on average, 4745 + 5059 = 9804 additional attendees. Thus
promotions at day games gain, on average, more attendance.

The team should therefore change when some of the promotions are scheduled, moving them to day games.

14.42 a. Residual plots show all regression assumptions are being met.

b. Normal probability plot addresses whether or not error terms follow a normal
distribution. Since this plot is approximately a straight line we conclude the errors follow
a normal distribution.

14.43 a. d = 0.306604 < 0.5349, so classify this employee into group 0.

b. ŷ0 = –298.27 + 5.2(85) + 1.97(82) = 305.27

ŷ1 = –351.65 + 5.68(85) + 2.1(82) = 303.35

Since ŷ0 > ŷ1, classify the person into group 0.
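A minimal Python sketch of the classification rule in part b, using the two fitted score functions quoted above:

def score(b, x):
    # linear score b0 + b1*x1 + b2*x2
    return b[0] + b[1] * x[0] + b[2] * x[1]

x = (85, 82)                         # the employee's two scores
y0 = score((-298.27, 5.2, 1.97), x)  # group 0 function: 305.27
y1 = score((-351.65, 5.68, 2.1), x)  # group 1 function: 303.35
print(0 if y0 > y1 else 1)           # classify into the group with the larger score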

Internet Exercise:

14.44 a. All the scatterplots show linear relationships. Interpretations will vary. [Scatterplots omitted.]

b.

Regression Analysis: CityMPG versus Weight, WhlBase, Disp(L), Domestic

The regression equation is


CityMPG = 28.1 - 0.0115 Weight + 0.275 WhlBase + 0.646 Disp(L) - 1.55 Domestic

Predictor Coef SE Coef T P


Constant 28.108 6.926 4.06 0.000
Weight -0.011483 0.001405 -8.17 0.000
WhlBase 0.27535 0.09251 2.98 0.004
Disp(L) 0.6457 0.6031 1.07 0.287
Domestic -1.5503 0.6923 -2.24 0.028

S = 2.90295 R-Sq = 74.5% R-Sq(adj) = 73.3%

Analysis of Variance

Source DF SS MS F P
Regression 4 2163.98 541.00 64.20 0.000
Residual Error 88 741.59 8.43
Total 92 2905.57


b0 = 28.1, which would be the mean city MPG when all of the independent variables equal 0;
it has no practical meaning.
b1 = –0.0115, which means that we estimate city MPG goes down by 0.0115 for each 1-unit
increase in weight, holding the other variables constant.
b2 = 0.275, which means that we estimate city MPG goes up by 0.275 for each 1-unit increase
in wheelbase, holding the other variables constant.
b3 = 0.646, which means that we estimate city MPG goes up by 0.646 for each 1-unit increase
in displacement, holding the other variables constant.
b4 = –1.55, which means that we estimate the mean city MPG for domestic cars is 1.55 less
than the mean city MPG for imports, holding the other variables constant.
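For readers without Minitab, a fit like the one above can be sketched in Python with ordinary least squares. The six data rows below are hypothetical stand-ins, since the 93-car data set is not reproduced here:

import numpy as np

# hypothetical rows: Weight, WhlBase, Disp(L), Domestic, CityMPG
data = np.array([
    [2705.0, 102.0, 1.8, 0.0, 25.0],
    [3560.0, 111.0, 3.5, 0.0, 18.0],
    [3375.0, 106.0, 3.8, 1.0, 19.0],
    [2295.0,  98.0, 1.6, 0.0, 29.0],
    [4105.0, 115.0, 4.6, 1.0, 16.0],
    [3040.0, 104.0, 2.8, 1.0, 21.0],
])
X = np.column_stack([np.ones(len(data)), data[:, :4]])  # prepend intercept column
y = data[:, 4]
b, *_ = np.linalg.lstsq(X, y, rcond=None)               # least squares coefficients
print(b)  # b0, b_Weight, b_WhlBase, b_Disp, b_Domestic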

