You are on page 1of 34

Solution Manual for A Second Course in Statistics:

Regression Analysis, 8th Edition, William


Mendenhall, Terry T. Sincich

To download the complete and accurate content document, go to:


https://testbankbell.com/download/solution-manual-for-a-second-course-in-statistics-r
egression-analysis-8th-edition-william-mendenhall-terry-t-sincich/
Solution Manual for A Second Course in Statistics: Regression Analysis, 8th Edition, William

Chapter 9 9-1

Chapter

Special Topics in Regression


(Optional) 9
9.1 a. For this case, k = 15 is the knot value (i.e., the value of x at which the slope changes).
Since there is no discontinuity in the linear relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 ( x1 -15) x2
ì
ï1 if x > 15
where x1 = x and x2 = ï
í
ï
î0 if not
ï

b. For x £ 15, x2 = 0, and the model is E ( y ) =  0 + 1 x.

Thus,  0 = y-intercept and 1 = slope.

For x > 15, x2 = 1, the model is:

E ( y ) =  0 + 1 x +  2 ( x -15) = (  0 -15 2 ) + ( 1 +  2 ) x

Thus, (  0 -15 2 ) = y -intercept and ( 1 +  2 ) = slope.

c. If the two slopes are identical, then we must have  2 = 0. Therefore, to test for a
difference between the two slopes, we test:

H 0 :  2 = 0 against H a :  2 ¹ 0

using a t test.

9.2 a. For this case, k1 = 1.45 and k2 = 5.20 are the knot values. Since there is no
discontinuity in the linear relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 ( x1 -1.45) x2 +  3 ( x1 - 5.20) x3

ì
ï1 if x1 > 1.45 ìï1 if x1 > 5.20
where x1 = x and x2 = ïí and x3 = ïí
ï
î0 if not
ï ïïî0 if not

b. For x £ 1.45, x2 = 0, and x3 = 0, the model is E ( y ) =  0 + 1 x1.

Thus,  0 = y-intercept and 1 = slope.

For 1.45 < x £ 5.20, x2 = 1, and x3 = 0, the model is

E ( y ) =  0 + 1 x1 +  2 ( x1 -1.45) = (  0 -1.45 2 ) + ( 1 +  2 ) x1

Copyright © 2020 Pearson Education, Inc.

Visit TestBankBell.com to get complete for all chapters


9-2 Special Topics in Regression

Thus, (  0 -1.45 2 ) = y -intercept and ( 1 +  2 ) = slope.

For x > 5.20, x2 = 1, and x3 = 1, the model is

E ( y ) =  0 + 1 x1 +  2 ( x1 -1.45) +  3 ( x1 - 5.20)
= (  0 -1.45 2 - 5.20  3 ) + ( 1 +  2 +  3 ) x1

Thus, (  0 -1.45 2 - 5.20 3 ) = y -intercept and ( 1 +  2 +  3 ) = slope.

c. If all three of the slopes are identical, the model specifies that  2 =  3 = 0 Therefore, to
test for a difference in at least two of the three slopes, we test:

H 0 :  2 = 3 = 0
H a : At least one  i ¹ 0
using a partial F test.

9.3 a. For this case, k = 320 is the knot value. Since there is discontinuity in the linear
relationship, the appropriate linear model is:

E ( y ) =  0 + 1 x1 +  2 ( x1 - 320) x2 +  3 x2

ì1 if x > 320
ï
where x1 = x and x2 = ï
í
ï
î0 if not
ï

b. For x £ 320, x2 = 0, and the model is E ( y ) =  0 + 1 x

Thus,  0 = y-intercept and 1 = slope.

For x > 320, x2 = 1, and the model is:

E ( y ) =  0 + 1 x +  2 ( x - 320) +  3 = (  0 - 320  2 +  3 ) + ( 1 +  2 ) x

Thus, (  0 - 320  2 +  3 ) = y -intercept and ( 1 +  2 ) = slope.

c. To test for a difference between the two lines, we test:

H 0 :  2 = 3 = 0
H a : At least one  i ¹ 0
using a nested F test.

9.4 a. When age is less than the change point, x1 = 0 and x2 = 1. The equation is
E (BMI) =  0 + 1 (AGE ) x2 +  2 (AGE - CP) x1 =  0 + 1 (AGE ).
The slope of the line is 1.

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-3

b. When age is greater than the change point, x1 = 1 and x2 = 0. The equation is
E (BMI) =  0 + 1 (AGE ) x2 +  2 (AGE - CP) x1 =  0 +  2 (AGE - CP).
The slope of the line is  2 .

9.5 a. For this case, k1 = 50 and k2 = 100 are the knot values. Since there is no discontinuity
in the linear relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 ( x1 - 50) x2 +  3 ( x1 -100) x3

ïì1 if x1 > 50 ìï1 if x1 > 100


where x1 = x and x2 = ïí and x3 = ïí
ïïî0 if not ïïî0 if not

b. For x £ 50, x2 = 0, and x3 = 0, the model is E ( y ) =  0 + 1 x1.

Thus,  0 = y-intercept and 1 = slope.

For 50 < x £ 100, x2 = 1, and x3 = 0, the model is

E ( y ) =  0 + 1 x1 +  2 ( x1 - 50) = (  0 - 50 2 ) + ( 1 +  2 ) x1

Thus, (  0 - 50 2 ) = y -intercept and ( 1 +  2 ) = slope.

For x > 100, x2 = 1, and x3 = 1, the model is

E ( y ) =  0 + 1 x1 +  2 ( x1 - 50) +  3 ( x1 -100)
= (  0 - 50 2 -100  3 ) + ( 1 +  2 +  3 ) x1

Thus, (  0 - 50 2 -100 3 ) = y -intercept and ( 1 +  2 +  3 ) = slope.

c. For this case, k1 = 50, k2 = 75, and k3 = 100 are the knot values. Since there is no
discontinuity in the linear relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 ( x1 - 50) x2 +  3 ( x1 - 75) x3 +  4 ( x1 -100) x4

ïì1 if x1 > 50 ì
ï1 if x1 > 75 ì1 if x1 > 100
ï
where x1 = x and x2 = ïí , x3 = ï
í , and x4 = ï
í
ïïî0 if not ï
î0 if not
ï ï
î0 if not
ï

9.6 a. A plot of the cells against the number of hours is:

Copyright © 2020 Pearson Education, Inc.


9-4 Special Topics in Regression

Scatterplot of CELLS vs HOURS

20

15

CELLS

10

0
0 20 40 60 80 100 120 140
HOURS

There appears to be an increasing trend until hours reach approximately 70, then a
decreasing trend after hours exceeds 70.

b. For this case, k = 70 is the knot value. Since there is no discontinuity in the linear
relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 x2*
ìï1 if x ³ 70
where x1 = x, x2* = ( x1 - 70) x2 , and x2 = ïí
ïïî0 if x < 70

c. Using MINITAB, the results are:

Regression Analysis: CELLS versus X1, X2*


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 415.2 207.598 26.48 0.000
Error 14 109.7 7.839
Total 16 524.9

Model Summary
S R-sq R-sq(adj)
2.79981 79.09% 76.11%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 0.17 1.53 0.11 0.914
X1 0.2221 0.0340 6.53 0.000
X2* -0.2729 0.0631 -4.33 0.001

Regression Equation
CELLS = 0.17 + 0.2221 X1 - 0.2729 X2*

The least squares line is yˆ = 0.17 + 0.2221x1 - 0.2729 x2* .

d. To determine if the overall model is useful for predicting number of cells, we test:

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-5

H 0 : 1 =  2 = 0
H a : At least one  i ¹ 0

The test statistic is F = 26.48 and the p-value is p = 0.000. Since the p-value is less than
 ( p = 0.000 < 0.05), H0 is rejected. There is sufficient evidence to indicate that the
overall model is useful for predicting the number of cells at  = 0.05.

e. Less than 70 hours: Slope = 1 = 0.2221


More than 70 hours: Slope = 1 +  2 = 0.2221 - 0.2729 = -0.0508

f. To determine if the slopes differ, we test:

H0 : 2 = 0
Ha : 2 ¹ 0

The test statistic is t = -4.33 and the p-value is p = 0.001. Since the p-value is less than
 ( p = 0.001 < 0.05), H0 is rejected. There is sufficient evidence to indicate that that
the slopes differ for less than 70 hours and more than 70 hours at  = 0.05.

9.7 a. Knot values are 4 and 7.

b. The hypothesized model is E ( y )   0  1 x1   2  x1  4 x2   3  x1  7  x3 .


1 if x  4 1 if x  7
where x1  x , x2   , and x3  
0 if not 0 if not

c.
x4 4 x7 x7
Slope 1  1   2   1   2   3 
d. For every 1-point increase in performance over range x  4, satisfaction is estimated to
increase 5.05 units. For every 1-point increase in performance over the range 4  x  7,
satisfaction is estimated to increase 0.59 units. For every 1-point increase in
performance over the range 7  x  10, satisfaction is estimated to increase 1.45 units.

9.8 a. Yes, since there appears to be an increase in the slope starting at 1.5 cubic meters.

b. Since there is no discontinuity in the linear relationship, the appropriate model is:

E ( y ) =  0 + 1 x1 +  2 x2*
ì1 if x1 > 1.5
ï
where x1 = x, x2* = ( x1 -1.5) x2 , and x2 = ïí
ï
î0 if x1 £ 1.5
ï

c. Using MINITAB, the results are:

Copyright © 2020 Pearson Education, Inc.


9-6 Special Topics in Regression

Regression Analysis: BedLoad versus X1, X2*


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 0.007991 0.003996 118.39 0.000
Error 73 0.002464 0.000034
Total 75 0.010455

Model Summary
S R-sq R-sq(adj)
0.0058094 76.44% 75.79%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -0.00487 0.00223 -2.19 0.032
X1 0.01192 0.00208 5.73 0.000
X2* 0.05812 0.00814 7.14 0.000

Regression Equation
BedLoad = -0.00487 + 0.01192 X1 + 0.05812 X2*

To determine the adequacy of the model, we test:

H 0 : 1 =  2 = 0
H a : At least one  i ¹ 0

The test statistic is F = 118.39 and the p-value is p = 0.000. Since the p-value is less
than  ( p = 0.000 < 0.05), H0 is rejected. There is sufficient evidence to indicate that
that the overall model is useful for predicting the number of cells at  = 0.05.

d. For every 1 cubic meter per second increase in discharge rate for flows less than 1.5
cubic meters per second, the bedload transport rate is estimated to increase by 0.01192.
For every 1 cubic meter per second increase in discharge rate for flows greater than or
equal to 1.5 cubic meters per second, the bedload transport rate is estimated to increase
by 0.01192  0.05812  0.07004.

9.9 a. The plot of the data is:

Scatterplot of Length vs Load


5.5

5.0

4.5
Length

4.0

3.5

2.5 3.0 3.5 4.0


Load

Yes, there is a knot at 3.55.

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-7

b. Since there is no discontinuity in the linear relationship, the appropriate model is:
E ( y )   0  1 x1   2 x2*
1 if x  3.55
where x1  load, x2*   x1  3.55 x2 and x2  
0 if not

c. Using MINITAB, the results are:

Regression Analysis: Length versus X1, X2*


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 7.27925 3.63962 1371.01 0.000
Error 17 0.04513 0.00265
Total 19 7.32437

Model Summary
S R-sq R-sq(adj)
0.0515237 99.38% 99.31%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 2.221 0.115 19.25 0.000
X1 0.5287 0.0393 13.46 0.000
X2* 2.629 0.162 16.25 0.000

Regression Equation
Length = 2.221 + 0.5287 X1 + 2.629 X2*

yˆ  2.221  0.5287 x1  2.629 x2* .

To determine the adequacy of the model, we test:

H 0 : 1 =  2 = 0
H a : At least one  i ¹ 0

The test statistic is F = 1371.01 and the p-value is p = 0.000. Since the p-value is less
than  ( p = 0.000 < 0.05), H0 is rejected. There is sufficient evidence to indicate that
that the overall model is useful for predicting crack length at  = 0.05.

9.10 The fitted regression line is yˆ  320.63639  0.08338 x. For y p = 325,


y p - ˆ0 325 - 320.63639
xˆ = = = 52.334
ˆ 1
0.08338

9.11 The fitted regression line is yˆ  27.237  0.558 x. For y p = 75,


y p - ˆ0 75 - (-27.237)
xˆ = = = 183.22
ˆ 1
0.558

Copyright © 2020 Pearson Education, Inc.


9-8 Special Topics in Regression

For  = 0.05 with df = n - 2 = 8 - 2 = 6, t0.025 = 2.447. From the printout,


s = 5.269 and ˆ = 0.558.
1

2 2
SS xx = å x 2 - n ( x ) = 245,080 - 8(174.25) = 2175.5

The 95% confidence interval is


æ s ö÷ 2 2
1 ( xˆ - x ) æ 5.269 ö÷ 1 (183.22 -174.25)
xˆ  t /2 ççç ÷÷ 1 + +  183.22  2.447 çç ÷ 1 + +
çè ˆ1 ÷÷ø n SS xx èç 0.558 ÷ø 8 2175.5
 183.22  24.91  (158.31, 208.13)
We are 95% confident that the maximal oxygen uptake value will be between 158.31 and
208.13 when the mean heart rate is at 75% of the maximum heart rate.

9.12 We want to predict the pectin amount, x, required to yield a sweetness index of y p = 5.8.
The fitted regression line is yˆ  6.252  0.002311x. The estimate of x is calculated as follows:

y p - ˆ0 5.8 - (6.252)


xˆ = = = 195.7
ˆ1 -0.00231

For  = 0.05 with df = n - 2 = 24 - 2 = 22, t0.025 = 2.074. From the printout,


s = 0.214998 and ˆ1 = 6.252.

2 æ 6167 ö÷2
SS xx = å x 2 - n ( x ) = 1,641,115 - 24çç = 56, 452.96
çè 24 ÷ø÷

The 95% confidence interval is


æ s ö÷ 2 2
1 ( xˆ - x ) æ 0.215 ö÷ 1 (195.7 - 256.96)
xˆ  t /2 ççç ÷÷ 1 + +  195.7  2.074çç ÷ 1 + +
çè ˆ1 ÷ø÷ n SS xx èç -0.00231÷ø 24 56, 452.96
 195.7  203.2  (-7.5, 398.9)
We are 95% confident that the amount of pectin required to yield a sweetness index of 5.8
will fall between 0 and 398.9 parts per million.

9.13 Using MINITAB, the results are:

Regression Analysis: MASS versus TIME


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 89.79 89.7942 122.19 0.000
Error 21 15.43 0.7349
Total 22 105.23

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-9

Model Summary
S R-sq R-sq(adj)
0.857257 85.33% 84.64%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 5.221 0.296 17.64 0.000
TIME -0.1140 0.0103 -11.05 0.000

Regression Equation
MASS = 5.221 - 0.1140 TIME

The fitted regression line is yˆ  5.221  0.114 x. We want to predict the time, x, required to
yield a spillage mass of y p = 5.
y p  ˆ0 5  5.221
xˆ    1.94
ˆ1 0.114

For  = 0.05 with df = n - 2 = 23 - 2 = 21, t0.025 = 2.080. From the printout,


s = 0.857257 and ˆ1 = -0.114.

2 2 æ 526 ö÷2
SS xx = å x - n ( x ) = 18,936 - 23çç = 6906.609
çè 23 ÷÷ø

The 95% confidence interval is


1  xˆ  x  1 1.94  22.87 
2 2
 s   0.857257 
ˆx  t /2   1    1.94  2.08  1 
 ˆ1  n SS xx  0.114  23 6906.609
 1.94  16.46  14.52, 18.40
We are 95% confident that the amount of time used to produce 5 pounds of spillage will fall
between 0 and 18.40 minutes.

9.14 Using MINITAB, the results are:

Regression Analysis: DAMAGE versus DISTANCE


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 841.77 841.766 156.89 0.000
Error 13 69.75 5.365
Total 14 911.52

Model Summary
S R-sq R-sq(adj)
2.31635 92.35% 91.76%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 10.28 1.42 7.24 0.000
DISTANCE 4.919 0.393 12.53 0.000

Copyright © 2020 Pearson Education, Inc.


9-10 Special Topics in Regression

Regression Equation
DAMAGE = 10.28 + 4.919 DISTANCE

The fitted regression line is yˆ  10.28  4.919 x. We want to predict the distance, x, for a
residential fire causing damage of y p = 18.2.
y p  ˆ0 18.2  10.28
xˆ    1.61
ˆ 1
4.919

For  = 0.10 with df = n - 2 = 15 - 2 = 13, t0.05 = 1.771. From the printout,


s = 2.31635 and ˆ = 4.919. 1

2 2 æ 49.2 ö÷2
SS xx = å x - n ( x ) = 196.16 -15çç = 34.784
çè 15 ÷÷ø

The 90% confidence interval is


1  xˆ  x  1 1.61  3.28
2 2
 s   2.31635 
xˆ  t /2   1    1.61  1.771  1  
 ˆ1  n SS xx  4.919  15 34.784
 1.61  0.893  0.717, 2.503
We are 90% confident that the distance from a fire where the damage was $18,200 will fall
between 0.717 and 2.503 miles.

9.15 a. Using MINITAB, the results are:

Regression Analysis: DECREASE versus DOSAGE


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 265.15 265.152 107.14 0.000
Error 6 14.85 2.475
Total 7 280.00

Model Summary
S R-sq R-sq(adj)
1.57313 94.70% 93.81%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -2.03 1.60 -1.27 0.252
DOSAGE 6.061 0.586 10.35 0.000

Regression Equation
DECREASE = -2.03 + 6.061 DOSAGE

The fitted model is yˆ  2.03  6.061x.

b. To determine if the model is adequate, we test:

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-11

H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is t  10.35 and the p-value is p  0.000. Since the p-value is less
than  ( p = 0.000 < 0.05), H0 is rejected. There is sufficient evidence to indicate that
that the overall model is useful for predicting decrease in pulse rate at  = 0.05.

c. We want to predict dosage, x, for a reduction in pulse rate of y p = 10.


y p  ˆ0 10   2.03
xˆ    1.985
ˆ
1
6.061

For  = 0.05 with df = n - 2 = 8 - 2 = 6, t0.025 = 2.447. From the printout,


s = 1.57313 and ˆ = 6.061.
1

2 æ 20.5 ö÷2
SS xx = å x 2 - n ( x ) = 59.75 - 8çç = 7.21875
çè 8 ÷÷ø

The 95% confidence interval is


1  xˆ  x 
2
 s 
xˆ  t /2   1  
 ˆ1  n SS xx

1 1.985  2.5625
2
 1.57313 
 1.985  2.447  1 
 6.061  8 7.21875
 1.985  0.687  1.298, 2.672
We are 95% confident that to reduce a patient’s pulse rate by 10 beats/minute, the
dosage will be between 1.298 cc and 2.672 cc.

1 1 1
9.16 a. Use weights wi  b. Use weights wi  c. Use weights wi 
xi2 xi xi

1 1
d. Use weights wi   ni e. Use weights wi   xi
1 / ni 1 / xi

9.17 a. Using MINITAB, the results are:

Regression Analysis: DEFECTS versus SPEED


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 940.9 940.90 20.45 0.001
Error 13 598.0 46.00
Total 14 1538.9

Copyright © 2020 Pearson Education, Inc.


9-12 Special Topics in Regression

Model Summary
S R-sq R-sq(adj)
6.78252 61.14% 58.15%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -3.37 6.67 -0.50 0.622
SPEED 0.1940 0.0429 4.52 0.001

Regression Equation
DEFECTS = -3.37 + 0.1940 SPEED

The fitted model is yˆ  3.37  0.194 x.

To determine if the model is useful for predicting the number of defectives, we test:

H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is t  4.52 and the p-value is p  0.001. Since the p-value is less
than  ( p = 0.001 < 0.05), H0 is rejected. There is sufficient evidence to indicate that
that the model is useful for predicting the number of defectives at  = 0.05.

b. The plot of the residuals versus speed is:

Scatterplot of RESI vs SPEED


15

10

5
RESI

0 0

-5

-10
100 120 140 160 180 200
SPEED

The assumption of constant variance appears to be violated. As speed increases, the


spread of the residuals increases.

c. Let ˆi = ( yi - yˆi ) represent the residual for the ith-observation. To obtain the variance
of the 5 residuals at each level of x, we use the short-cut formula for the sample
variance.
æ 5 ö÷2
5
ççç å ˆi ÷÷÷
2 èi=1 ø
å ˆi -
5
s 2 = i=1
5 -1

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-13

The results are shown in the following table:


x Variance of Residuals s 2 s2 / x s2 / x 2 s2 / x
100 20.7 0.2070 0.00207 2.070
150 44.3 0.2953 0.00197 3.617
200 84.3 0.4215 0.00211 5.961

Note that when the variance s 2 is divided by x 2 for each level of x, the result is a
constant (approximately 0.002). Therefore, the variance is proportional to x 2 , and the
appropriate weights to use in a weighted least squares are:
1
wi =
xi2

d. Using MINITAB, the results of the weighted least squares is:

Regression Analysis: DEFECTS versus SPEED


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 0.04224 0.042240 22.30 0.000
Error 13 0.02462 0.001894
Total 14 0.06686

Model Summary
S R-sq R-sq(adj)
0.0435195 63.18% 60.34%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -3.06 5.41 -0.57 0.581
SPEED 0.1919 0.0406 4.72 0.000

Regression Equation
DEFECTS = -3.06 + 0.1919 SPEED

The standard deviation of the unweighted least squares slope is 0.0429 while the
standard deviation of the weighted least squares slope is 0.0406, a slight improvement.

e. The weighted residuals are found using wi  yi  yˆi  , where yˆi is the predicted value of
yi obtained using the weight wi in the weighted least squares regression. The plot of the
weighted residuals is:

Copyright © 2020 Pearson Education, Inc.


9-14 Special Topics in Regression

Scatterplot of W-RES vs SPEED


0.075

0.050

0.025
W-RES

0.000 0

-0.025

-0.050

100 120 140 160 180 200


SPEED

From the residual plot, it is evident that the spread of the residuals for the three values
of x is constant. Thus, it appears that weighted least squares has corrected the problem
of unequal variances.

9.18 Using MINITAB, the results are:

Regression Analysis: SALARY versus EXP, EXPSQ


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 13723582237 6861791118 103.99 0.000
Error 47 3101279310 65984666
Total 49 16824861546

Model Summary
S R-sq R-sq(adj)
8123.09 81.57% 80.78%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 20242 4423 4.58 0.000
EXP 522 617 0.85 0.401
EXPSQ 53.0 19.6 2.71 0.009

Regression Equation
SALARY = 20242 + 522 EXP + 53.0 EXPSQ

To determine the proper weights, we use the following:

s 2j s 2j s 2j

Group Range of x xj s 2j xj x 2j xj
1 1  x  10 4.25 15,525,326.75 3,653,018.06 859,533.66 7,530,889.65
2 11  x  20 16.864 57,582,147.82 3,414,574.80 202,481.52 14,022,073.71
3 21  x  30 25.05 93,581,188.29 3,735,775.98 149,132.77 18,697,549.45

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-15

s 2j
The column where the values are similar is the column . Thus, we would suggest using
xj
1
the weight of w j  .
xj

9.19 a. Using MINITAB, the results are:

Regression Analysis: ARSENIC versus DEPTH-FT


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 237209 237209 20.75 0.000
Error 325 3715353 11432
Total 326 3952562

Model Summary
S R-sq R-sq(adj)
106.920 6.00% 5.71%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 140.6 11.8 11.93 0.000
DEPTH-FT -0.669 0.147 -4.56 0.000

Regression Equation
ARSENIC = 140.6 - 0.669 DEPTH-FT

The residual plot is:


Scatterplot of RESI vs DEPTH-FT
600

500

400

300
RESI

200

100

0 0

-100

-200
0 50 100 150 200 250
DEPTH-FT

The assumption of constant variance appears to be violated. As the depth increases, the
spread of the variance decreases. However, since there are very few observations for
large values of x (depth), it is difficult to determine if there really is a problem with non-
constant variance.

b. To determine the proper weights, we use the following:


   
2
xj s 2j s 2j / x j s 2j / x 2j s 2j / x j s 2j x j s 2j x j s 2j x j
Grp Range

1 25  x  85 49.826 12787.730 256.6473 5.1509 1811.6115 637,162.54 31,747,316.03 90,265.51

2 90  x  155 124.362 7193.473 57.8430 0.4651 645.0519 894,595.21 111,253,711.75 80,219.99

3 160  x  225 178.563 3162.062 17.7084 0.0992 236.6329 564,625.62 100,820,961.99 42,253.77

Copyright © 2020 Pearson Education, Inc.


9-16 Special Topics in Regression

Since the spread of the residuals decreases as the depth increases, the weight will have
to be a function of1 / x j rather than a function of x j . From the table above, it appears
that the best weight would be 1 / 1 / x j  x j .  
c. Using x j as the weight, the results are:
Regression Analysis: ARSENIC versus DEPTH-FT
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 31681581 31681581 46.14 0.000
Error 325 223180138 686708
Total 326 254861719

Model Summary
S R-sq R-sq(adj)
828.679 12.43% 12.16%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 143.6 11.3 12.67 0.000
DEPTH-FT -0.749 0.110 -6.79 0.000

Regression Equation
ARSENIC = 143.6 - 0.749 DEPTH-FT

Comparing the original model in part a to this model, we find that the R2 has increased
from 0.0600 to 0.1243. In addition, the standard error of the slope has decreased from
0.147 to 0.110.

The plot of the weighted residuals is:


Scatterplot of Wt-RESI vs DEPTH-FT
6000

5000

4000

3000
Wt-RESI

2000

1000

0 0

-1000
0 50 100 150 200 250
DEPTH-FT

Although the weighted least squares regression has a slightly better fit, the problem of
non-constant variance does not appear to be corrected.

9.20 a. Using MINITAB, the results are:

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-17

Regression Analysis: HEATRATE versus INLET-TEMP


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 107601712 107601712 116.00 0.000
Error 65 60295496 927623
Total 66 167897208

Model Summary
S R-sq R-sq(adj)
963.132 64.09% 63.54%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 21977 1020 21.55 0.000
INLET-TEMP -9.291 0.863 -10.77 0.000

Regression Equation
HEATRATE = 21977 - 9.291 INLET-TEMP

The residual plot is:

Scatterplot of RESI vs INLET-TEMP

3000

2000

1000
RESI

0 0

-1000

-2000
900 1000 1100 1200 1300 1400 1500
INLET-TEMP

It appears that as the inlet temperature increases, the spread of the residuals tends to
decrease. However, there really does not appear to be much of a problem.

b. To determine the proper weights, we use the following:


Group Range x s2 s2 / x s 2j / x 2j s 2j / x j
j j j j
1 888  x  1057 987.08 1,755,598.27 1778.583 1.8019 55,879.13
2 1058  x  1135 1104.07 1,159,125.55 1049.869 0.9509 34,884.53
3 1136  x  1177 1158.00 492,545.24 425.341 0.3673 14,474.11
4 1178  x  1310 1268.33 714,271.65 563.158 0.4440 20,056.11
5 1311  x  1427 1373.38 96,384.44 70.180 0.0511 2,600.82

No weight looks particularly good since there did not appear to be a real problem with
1
non-constant variance. Thus, we would suggest using the weight of w j  .
xj

c. Using MINITAB, the results of the weighted least squares are:

Regression Analysis: HEATRATE versus INLET-TEMP

Copyright © 2020 Pearson Education, Inc.


9-18 Special Topics in Regression

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 96622 96622.4 113.60 0.000
Error 65 55285 850.5
Total 66 151908

Model Summary
S R-sq R-sq(adj)
29.1641 63.61% 63.05%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant 22273 1046 21.30 0.000
INLET-TEMP -9.545 0.896 -10.66 0.000

Regression Equation
HEATRATE = 22273 - 9.545 INLET-TEMP

Comparing the original model in part a to this model, we find that the R2 has decreased
from 0.6409 to 0.6361. In addition, the standard error of the slope has increased from
0.863 to 0.896. As noted earlier, there really does not appear to be much of a problem
with the assumption of non-constant variance. Using the weighted least squares has not
improved the model.

9.21 When the response (i.e., dependent) variable y is recorded as 0 or 1, three major problems
arise:
1. The assumption of normal errors is violated.
2. The assumption of equal error variances is violated.
3. Predicted values may take on nonsensical values (i.e., negative values or values greater
than 1).
The first two problems are the most serious, since we have little or no faith in any inferences
derived from an ordinary least squares regression model when the standard regression
assumptions are violated. There is a violation of constant error variance assumption;
violation of assumption of normal errors; predicted y is not bounded between 0 and 1.

9.22 Stage 1:

Using MINITAB, the results of the first stage are:


Regression Analysis: OWN versus INCOME
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 0.6684 0.6684 3.10 0.095
Error 18 3.8816 0.2156
Total 19 4.5500

Model Summary
S R-sq R-sq(adj)
0.464373 14.69% 9.95%

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-19

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -0.182 0.319 -0.57 0.576
INCOME 0.000012 0.000007 1.76 0.095

Regression Equation
OWN = -0.182 + 0.000012 INCOME

The least squares equation is yˆ  0.182  0.000012 x. The predicted values are:

Obs. i yˆi Obs. i yˆi


1 0.251 11 0.324
2 0.190 12 0.183
3 0.492 13 0.312
4 0.315 14 0.156
5 0.536 15 0.385
6 0.204 16 0.244
7 0.235 17 0.201
8 0.166 18 0.767
9 0.494 19 0.297
10 0.796 20 0.451

Stage 2:

The predicted values above are used to calculate the appropriate weights for weighted least
1
squares regression, where wi  .
yˆi 1  yˆi 
Using MINITAB, the weighted least squares regression results are:

Regression Analysis: OWN versus INCOME


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 3.675 3.675 3.53 0.076
Error 18 18.718 1.040
Total 19 22.394

Model Summary
S R-sq R-sq(adj)
1.01975 16.41% 11.77%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -0.205 0.284 -0.72 0.480
INCOME 0.000012 0.000006 1.88 0.076

Regression Equation
OWN = -0.205 + 0.000012 INCOME

Copyright © 2020 Pearson Education, Inc.


9-20 Special Topics in Regression

The weighted least squares equation is yˆ  0.205  0.000012 x.


To determine if the model is useful for predicting y, we test:

H 0 : 1  0
H a : 1  0

The test statistic is t  1.88 and the p-value is p  0.076 / 2  0.038. Since the p-value is less
than   p  0.038  0.05  , H0 is rejected. There is sufficient evidence to indicate that the
model is useful for predicting y at   0.05.

9.23 a.  0  y-intercept since x1  0 is not in the observed range of years of higher education
1  change in the probability of being hired for each additional year of education, all
other variables held constant.
 2  change in the probability of being hired for each additional year of experience, all
other variables held constant.
3  difference in the probability of being hired between males and females, all other
variables held constant.

b. Stage 1:

Using MINITAB, the results are:

Regression Analysis: HIRE versus EDUC, EXP, GENDER


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 3 3.279 1.0929 9.27 0.000
Error 24 2.828 0.1179
Total 27 6.107

Model Summary
S R-sq R-sq(adj)
0.343298 53.69% 47.90%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -0.844 0.281 -3.00 0.006
EDUC 0.1069 0.0423 2.53 0.018
EXP 0.0886 0.0207 4.28 0.000
GENDER 0.485 0.138 3.51 0.002

Regression Equation
HIRE = -0.844 + 0.1069 EDUC + 0.0886 EXP + 0.485 GENDER

The least squares equation is yˆ  0.844  0.1069 x1  0.0886 x2  0.485 x3 . The


predicted values are:

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-21

Obs. i 1 2 3 4 5 6 7 8 9 10
yˆi -0.026 0.068 0.814 0.548 -0.328 0.277 0.245 -0.062 -0.114 0.897
Obs. i 11 12 13 14 15 16 17 18 19 20
yˆi 0.245 0.454 -0.239 0.418 0.511 0.152 0.496 0.371 0.204 0.157
Obs. i 21 22 23 24 25 26 27 28
yˆi 0.026 0.282 0.939 0.381 0.100 0.371 0.954 0.861

Stage 2:

The predicted values above are used to calculate the appropriate weights for weighted
1
least squares regression, where wi  . Note: Where the predicted values are
yˆi 1  yˆi 
less than 0, we will replace the values with 0.01.

Using MINITAB, the weighted least squares regression results are:

Regression Analysis: HIRE versus EDUC, EXP, GENDER


Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 3 53.93 17.9752 21.79 0.000
Error 24 19.80 0.8250
Total 27 73.73

Model Summary
S R-sq R-sq(adj)
0.908294 73.14% 69.79%

Coefficients
Term Coef SE Coef T-Value P-Value
Constant -0.528 0.144 -3.66 0.001
EDUC 0.0750 0.0271 2.77 0.011
EXP 0.0747 0.0139 5.36 0.000
GENDER 0.3912 0.0976 4.01 0.001
Regression Equation
HIRE = -0.528 + 0.0750 EDUC + 0.0747 EXP + 0.3912 GENDER

The weighted least squares equation is yˆ  0.528  0.0750 x1  0.0747 x2  0.3912 x3 .

To determine if the model is useful for predicting y, we test:

H 0 : 1   2  3  0
H a : At least one i  0

The test statistic is F  21.79 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.05  , H0 is rejected. There is sufficient evidence to indicate that the
model is useful for predicting y at   0.05.

d. To determine if gender is an important predictor of hiring status, we test:

Copyright © 2020 Pearson Education, Inc.


9-22 Special Topics in Regression

H 0 : 3 = 0
H a : 3 ¹ 0

The test statistic is t = 4.01 and the p-value is p = 0.001. Since the p-value is less than
 ( p = 0.001 < 0.05), H0 is rejected. There is sufficient evidence to indicate that
gender is an important predictor of hiring status y. Since the estimate of  3 is positive, it
appears that males have a higher probability of being hired than females.

e. Using MINITAB, the 95% confidence interval for E ( y ) when x1 = 4, x2 = 3, and x3 = 0 :


Prediction for HIRE
Settings
Variable Setting
EDUC 4
EXP 3
GENDER 0

Prediction
Fit SE Fit 95% CI 95% PI
-0.0037493 0.0450864 (-0.0968032, 0.0893045) (-0.212196, 0.204697)
Weight = 101.01

The confidence interval is (-0.0968, 0.0893). That is, we estimate the probability of
being hired E ( y ), for female applicants ( x3 = 0) with x1 = 4 years of education and
x2 = 3 years of experience to fall between -0.0968 and 0.0893. Note that the interval
includes negative numbers, which are nonsensical probabilities.

9.24 a.   P  y  1 for each value of x. For this problem  is the probability that the household
owns a digital organizer given the households annual income.


b. odds  This represents the odds of that a household owns a digital organizer.
1

c. To determine if the model is useful in the prediction of PC ownership, we test:

H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is  2  2.4533 and the p-value is p  0.117. Since the p-value is not
less than   p  0.117  0.10  , H0 is not rejected. There is sufficient evidence to
indicate that the annual income of the head of the household is a useful predictor of PC
ownership at   0.10.

9.25 a.   probability that the rock sample is from the Maryland nappe.

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-23

  
b. The model is  *   0  1 x1 where  *  ln  
 1    

c. 1 is the change in the log-odds of a Maryland nappe for every 1-unit increase in FIA.

exp   0  80 1 
d. yˆ 
1  exp   0  80 1 

9.26 a. To determine if the overall model is statistically useful for predicting geese flight
response, we test:

H 0 : 1 =  2 = 0
H a : At least one  i ¹ 0

The test statistic is  2  259.18 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.01 , H0 is rejected. There is sufficient evidence to indicate the
overall model is statistically useful for predicting geese flight response at   0.01.

b. To determine if flight response of the geese depends on altitude of the helicopter, we


test:

H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is  2  9.38 and the p-value is p  0.002. Since the p-value is less
than   p  0.002  0.01 , H0 is rejected. There is sufficient evidence to indicate the
flight response of the geese depends on altitude of the helicopter at   0.01.

c. To determine if flight response of the geese depends on lateral distance of the helicopter
from the flock, we test:

H0 : 2 = 0
Ha : 2 ¹ 0

The test statistic is  2  252.48 and the p-value is p  0.000. Since the p-value is less
than   p  0.002  0.01 , H0 is rejected. There is sufficient evidence to indicate the
flight response of the geese depends on lateral distance of the helicopter from the flock
at   0.01.

d. The fitted model is ˆ *  2.395  0.1965 x1  0.2388 x2 . When x1  6 and x2  3,


e
2.395  0.1965 6   0.2388 3 

ˆ  0.946
1  e 2.395 0.1965 6  0.2388 3  
 

Copyright © 2020 Pearson Education, Inc.


9-24 Special Topics in Regression

9.27 a. Using MINITAB, the results are:

Binary Logistic Regression: HIRE versus EDUC, EXP, GENDER


Response Information
Variable Value Count
HIRE 1 9 (Event)
0 19
Total 28

Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 3 20.430 6.8100 20.43 0.000
EDUC 1 6.112 6.1117 6.11 0.013
EXP 1 14.878 14.8775 14.88 0.000
GENDER 1 11.321 11.3209 11.32 0.001
Error 24 14.735 0.6140
Total 27 35.165

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
58.10% 49.57% 22.73

Coefficients
Term Coef SE Coef VIF
Constant -14.25 6.08
EDUC 1.155 0.602 1.83
EXP 0.910 0.429 3.64
GENDER 5.60 2.60 3.55

Odds Ratios for Continuous Predictors


Odds Ratio 95% CI
EDUC 3.1736 (0.9747, 10.3331)
EXP 2.4839 (1.0708, 5.7618)
GENDER 271.4232 (1.6526, 44579.8931)

To determine if the model is adequate, we test:

H 0 : 1 =  2 =  3 = 0
H a : At least one  i ¹ 0

The test statistic is  2  20.43 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.05 , H0 is rejected. There is sufficient evidence to indicate the
overall model is adequate at   0.05.

b. To determine if gender is an important predictor of hiring status, we test:

H 0 : 3 = 0
H a : 3 ¹ 0

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-25

The test statistic is  2  11.32 and the p-value is p  0.001. Since the p-value is less
than   p  0.001  0.05 , H0 is rejected. There is sufficient evidence to indicate gender
is an important predictor of hiring status at   0.05.

c. Using MINITAB, the results are:

Prediction for HIRE


Settings
Variable Setting
EDUC 4
EXP 0
GENDER 1
Prediction
Fitted
Probability SE Fit 95% CI
0.0175489 0.0318498 (0.0004778, 0.400267)

When x1  4, x2  0, and x3  1, the 95% confidence interval for the mean response is
 0.0004778, 0.400267  .
9.28 From Exercise 6.11, the two variables selected as significant were well depth and percentage
of adjacent land allocated to industry. Using MINITAB, the results are:

Binary Logistic Regression: MTBE1 versus Depth, IndPct


Response Information
Variable Value Count
MTBE1 1 63 (Event)
0 128
Total 191

Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 2 10.153 5.077 10.15 0.006
Depth 1 5.108 5.108 5.11 0.024
IndPct 1 3.817 3.817 3.82 0.051
Error 188 232.060 1.234
Total 190 242.214

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
4.19% 3.37% 238.06

Coefficients
Term Coef SE Coef VIF
Constant -1.447 0.316
Depth 0.00625 0.00280 1.01
IndPct 0.0538 0.0279 1.01

Copyright © 2020 Pearson Education, Inc.


9-26 Special Topics in Regression

Odds Ratios for Continuous Predictors


Odds Ratio 95% CI
Depth 1.0063 (1.0008, 1.0118)
IndPct 1.0552 (0.9991, 1.1145)

To determine if the model is adequate for predicting detectible levels of MTBE, we test:

H 0 : 1 =  2 = 0
H a : At least one  i ¹ 0

The test statistic is  2  10.15 and the p-value is p  0.006. Since the p-value is less
than   p  0.006  0.05 , H0 is rejected. There is sufficient evidence to indicate the overall
model is adequate for predicting detectible levels of MTBE at   0.05.

Depth is a significant predictor of detectible levels of MTBE  2  5.11, p  0.024 and the  
percentage of adjacent land allocated to industry is very close to being a significant predictor
 
of detectible levels of MTBE  2  3.82, p  0.051 . Thus, we would recommend using this
model.

9.29 a.  = probability of landfast ice.

  
b. The model is  *   0  1 x1   2 x2   3 x3 , where  *  ln  .
 1    

c. Using MINITAB, the results are:

Binary Logistic Regression: y versus x1, x2, x3


Response Information
Variable Value Count
y 1 196 (Event)
0 220
Total 416

Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 3 70.46 23.485 70.46 0.000
x1 1 29.06 29.056 29.06 0.000
x2 1 64.06 64.059 64.06 0.000
x3 1 63.95 63.948 63.95 0.000
Error 412 504.86 1.225
Total 415 575.31

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
12.25% 11.73% 512.86

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-27

Coefficients
Term Coef SE Coef VIF
Constant 0.296 0.434
x1 4.128 0.809 1.34
x2 47.12 7.48 27.28
x3 -31.14 4.73 26.27

Odds Ratios for Continuous Predictors


Odds Ratio 95% CI
x1 62.0648 (12.7053, 303.1827)
x2 2.91936E+20 (1.26237E+14, 6.75132E+26)
x3 0.0000 (0.0000, 0.0000)

The fitted prediction equation is ˆ *  0.296  4.128 x1  47.12 x2  31.14 x3 .

d. To determine if the overall model is adequate, we test:

H 0 : 1 =  2 =  3 = 0
H a : At least one  i ¹ 0

The test statistic is  2  70.46 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.05 , H0 is rejected. There is sufficient evidence to indicate the
overall model is adequate in predicting the probability of landfast ice at   0.05.
e. The model is  *   0  1 x1   2 x2   3 x3   4 x1 x2   5 x1 x3   6 x2 x3 .

f. Using MINITAB, the results of fitting the complete model are:

Binary Logistic Regression: y versus x1, x2, x3, x1x2, x1x3, x2x3
Response Information
Variable Value Count
y 1 196 (Event)
0 220
Total 416

Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 6 102.649 17.1082 102.65 0.000
x1 1 0.985 0.9850 0.99 0.321
x2 1 0.733 0.7331 0.73 0.392
x3 1 37.632 37.6318 37.63 0.000
x1x2 1 1.567 1.5666 1.57 0.211
x1x3 1 0.065 0.0646 0.06 0.799
x2x3 1 26.783 26.7829 26.78 0.000
Error 409 472.664 1.1557
Total 415 575.313

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
17.84% 16.80% 486.66

Copyright © 2020 Pearson Education, Inc.


9-28 Special Topics in Regression

Coefficients
Term Coef SE Coef VIF
Constant 6.10 1.44
x1 -3.00 3.01 18.12
x2 10.6 12.5 63.74
x3 -39.69 7.73 55.66
x1x2 50.5 39.6 126.24
x1x3 -6.1 24.1 167.21
x2x3 56.2 12.1 33.77

Odds Ratios for Continuous Predictors


Odds Ratio 95% CI
x1 0.0500 (0.0001, 18.2084)
x2 38499.6206 (0.0000, 1.71183E+15)
x3 0.0000 (0.0000, 0.0000)
x1x2 8.46393E+21 (0.0000, 3.99376E+55)
x1x3 0.0021 (0.0000, 6.51110E+17)
x2x3 2.66545E+24 (1.45796E+14, 4.87299E+34)

The fitted prediction equation is


ˆ *  6.10  3.00 x1  10.6 x2  39.69 x3  50.5 x1 x2  6.1x1 x3  56.2 x2 x3 .

g. To determine if the interaction terms contribute to the model, we test:

H 0 :  4 = 5 = 6 = 0
H a : At least one  i ¹ 0

The test statistic is the difference in the  2 values from fitting the complete model and
the reduced model. The test statistic is  2   2 Complete   2 Reduced
 102.65  70.46  32.19.

Using   0.01, the critical value is  3,0.01


2
 11.3449. The rejection region is
 2  11.3449. Since the observed value of the test statistic falls in the rejection
 
region  2  32.19  11.3449 , H0 is rejected. There is sufficient evidence to indicate at
least one of the interaction terms contribute to the model at   0.01.

1 if sweet 1 if angular
9.30 First, we must code the variables. Let y   and x  
0 if sour 0 if rounded

  
The model is  *   0  1 x , where  *  ln  .
 1    
Using MINITAB, the results of the logistic regression are:

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-29

Binary Logistic Regression: Y versus X


Response Information
Variable Value Count
Y 1 42 (Event)
0 38
Total 80

Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 1 43.46 43.4637 43.46 0.000
Error 78 67.24 0.8620
Total 79 110.70

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
39.26% 38.36% 71.24

Coefficients
Term Coef SE Coef VIF
Constant -1.551 0.416
X 3.497 0.634 1.00

Odds Ratios for Continuous Predictors


Odds Ratio 95% CI
X 33.0000 (9.5280, 114.2941)

The fitted prediction equation is ˆ *  1.551  3.497 x.


To determine if the model is adequate for predicting sweet-tasting food, we test:

H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is  2  43.46 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.05 , H0 is rejected. There is sufficient evidence to indicate the overall
model is adequate in predicting sweet-tasting food at   0.05.

9.31 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: LOS


Statistics
Variable N Mean Variance
LOS 50 6.540 6.907

The mean (6.540) is very similar to the variance (6.907). Thus, Poisson regression is
justified.

b. The lengths of stay are whole numbers or count data. The value of length of stay cannot
be negative, unlike normal data with a mean and variance about the same.

Copyright © 2020 Pearson Education, Inc.


9-30 Special Topics in Regression

c. The Poisson regression model would be E  y   e 0  e 1x or ln  E  y    0  1 x.

d. Using MINITAB, the results of fitting a Poisson model are:

Poisson Regression Analysis: LOS versus FACTORS


Deviance Table
Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 1 18.39 18.3883 18.39 0.000
Error 48 32.64 0.6800
Total 49 51.03

Model Summary
Deviance Deviance
R-Sq R-Sq(adj) AIC
36.03% 34.07% 219.77

Coefficients
Term Coef SE Coef VIF
Constant 1.392 0.130
FACTORS 0.002094 0.000478 1.00

Goodness-of-Fit Tests
Test DF Estimate Mean Chi-Square P-Value
Deviance 48 32.64133 0.68003 32.64 0.956
Pearson 48 32.69350 0.68111 32.69 0.955

e. To determine if the model is adequate, we test:


H 0 : 1 = 0
H a : 1 ¹ 0

The test statistic is  2  18.39 and the p-value is p  0.000. Since the p-value is less
than   p  0.000  0.05 , H0 is rejected. There is sufficient evidence to indicate the
overall model is adequate in predicting length of stay at   0.05.

f. 1  0.002094. For each additional factor, the ln of the length of stay is estimated to
increase by 0.002094. A more practical interpretation is found by computing
e0.002094  1.002. The length of stay is estimated to increase by
100 1.002  1 %  0.2% for each additional factor.

9.32 There is sufficient evidence to indicate the model is adequate for predicting the number of
 
flycatchers killed  2  12.30, p  0.000 . To determine if the model was correctly
specified, we look at the “goodness-of-fit” test based on n  k  1  14  1  1  12 degrees
of freedom. If the model is correctly specified, then  2 / df will be close to 1. For this
example,  2 / df  10.18111 / 12  0.848. Since this is close to 1, it indicates the model is
correctly specified. In addition, the p-value for the “goodness-of-fit” test is p  0.600. Since
the p-value is larger than   0.05, this again indicates that the model is correctly specified.

Copyright © 2020 Pearson Education, Inc.


Chapter 9 9-31

1  0.0684. For each additional percent of nest box occupancy, the ln of the number of
flycatchers killed is estimated to increase by 0.0684. A more practical interpretation is found
by computing e0.0684  1.071. The length of stay is estimated to increase by
100 1.071  1 %  7.1% for each additional percent of nest box occupancy.

9.33 a. The Poisson regression model would be E  y   e 0  e 1 x1  e 2 x2  e 3 x3  e 4 x4 or


ln E  y    0  1 x1   2 x2  3 x3   4 x4 .

b. To determine if the model was correctly specified, we look at the “goodness-of-fit” test
based on n  k  1  14  1  1  12 degrees of freedom. If the model is correctly
specified, then  2 / df will be close to 1. For this example,  2 / df  664 / 126  5.270.
Since this is not close to 1, it indicates the model is not correctly specified. In addition,
the p-value for the “goodness-of-fit” test is p  0.001. Since the p-value is not larger
than   0.05, this again indicates that the model is not correctly specified.

c. 1  0.368. The difference in the ln of the number of flycatchers killed between the
Papilionidae species and the Hesperidae species is estimated to be 0.368. A more
practical interpretation is found by computing e0.368  1.445. The number of flycatchers
killed is estimated to be 100 1.445  1 %  44.5% greater for the Papilionidae species
than the Hesperidae species.

9.34 1  0.1148. For each 1- unit increase in the composite score, the ln of the number of
suicides increases by 0.1148. A more practical interpretation is found by computing
e0.1148  1.122. The number of suicides increases by 100 1.122  1 %  12.2% for each
additional unit increase in the composite score.

9.35 a. The Poisson regression model would be E  y   e 0  e 1 x1  e 2 x2 or


ln E  y    0  1 x1   2 x2 , where x1  level of sensation seeking and
1 if male
x2  
0 if female

b. 1  0.2315. For each 1- point increase in the sensation seeking level, the ln of the
number of alcoholic drinks consumed is estimated to increase by 0.2315, holding gender
constant. A more practical interpretation is found by computing e0.2315  1.260. The
number of alcoholic drinks consumed is estimated to increase
by 100 1.260  1 %  26.0% for each additional point increase in the sensation seeking
level, holding gender constant.

c. There is sufficient evidence to indicate the model is adequate for predicting the number

of alcoholic drinks consumed  2  227.3, p  0.0001 . 
d. To determine if the interaction terms contribute to the model, we test:

Copyright © 2020 Pearson Education, Inc.


Solution Manual for A Second Course in Statistics: Regression Analysis, 8th Edition, William

9-32 Special Topics in Regression

H 0 : 3 = 0
H a : 3 ¹ 0

The test statistic is the difference in the  2 values from fitting the complete model and
the reduced model. The test statistic is  2   2 Complete   2 Reduced .

e. The interaction model would be ln  E  y    0  1 x1   2 x2  3 x1 x2 .


The male model is ln  E  y    0  1 x1   2   3 x1    0   2    1  3  x1.
The female model is ln  E  y    0  1 x1.

For Males: The percentage change in the number of drinks consumed for a 1-point
 
increase in sensation seeking level is e 1  3  1 100%.

For Females: The percentage change in the number of drinks consumed for a 1-point

increase in sensation seeking level is e 1  1 100%.

Copyright © 2020 Pearson Education, Inc.

Visit TestBankBell.com to get complete for all chapters

You might also like