
Stat 252 – Homework # 4 Solutions – Fall 2018

STAT 252 Homework 4 Solutions (63 marks) – Due Wednesday, November 28 by 5pm

For questions that state “SHOW ALL STEPS”, write all the steps of a hypothesis test or confidence interval as
indicated below. For other questions that say do “NOT” show all steps, read the question carefully and
follow the exact instructions regarding what is required.

Whenever you are asked to “carry out the most appropriate test” and “SHOW ALL STEPS”:

i) Select the most appropriate hypothesis test and define the parameter(s) of interest.
ii) State clearly the null and alternative hypothesis in terms of the parameter(s).
iii) Calculate the test statistic, being sure to state its components.
iv) Calculate df. Determine the P-value for the test AND state the strength of the evidence against H0.
State whether P is less than or greater than alpha and, based on this comparison, decide whether to
reject or not reject H0. If the exact P-value is given in output, then report it as is. If not, then you must
estimate the P-value (within a range of values) using the appropriate statistical table.
v) Based on the research problem and referring to the significance level given, write a conclusion in
words.

Whenever you are asked to calculate a “confidence interval” and “SHOW ALL STEPS”:

i) State the critical value of the test statistic.


ii) Calculate the confidence interval, showing calculations of all its components.
iii) Interpret the interval.

Note: If you need to use the t-table or F-table and the degrees of freedom you need are NOT on the
table, round your degrees of freedom DOWN to the nearest one.

1. (Nine parts; 30 marks in total) A researcher wanted to determine the effect of driving experience and
driving violations (and the interaction of these two variables) on auto insurance premiums (the response
variable). He took a random sample of 17 drivers insured by a certain company and, for each driver, he
recorded driving experience (in years), the number of driving violations committed (within the last 3 years),
and the monthly premium paid (in dollars). He calculated the interaction term (driving experience x
violations). Based on the partial computer output from multiple linear regression analysis shown below,
answer parts (a) – (i) below. Assume that all the required assumptions for this model are satisfied.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .983 (a) | .966 | .958 | 8.796 |

a. Predictors: (Constant), Interaction, Driving_Experience, Violations
b. Dependent Variable: Monthly_Premium


ANOVA (a)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 28715.964 | 3 | 9571.988 | 123.704 | .000 (b) |
| Residual | 1005.919 | 13 | 77.378 | | |
| Total | 29721.882 | 16 | | | |

a. Dependent Variable: Monthly_Premium
b. Predictors: (Constant), Interaction, Driving_Experience, Violations

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 96.740 | 9.898 | | 9.774 | .000 | 75.357 | 118.124 |
| Driving_Experience | -1.441 | .676 | -.177 | -2.131 | .053 | -2.901 | .020 |
| Violations | 22.313 | 2.372 | .955 | 9.408 | .000 | 17.189 | 27.436 |
| Interaction | -.772 | .244 | -.229 | -3.159 | .008 | -1.299 | -.244 |

a. Dependent Variable: Monthly_Premium

(a) (5 marks) At the 5% significance level, perform a hypothesis test to determine whether the overall
multiple regression model is useful for making predictions about monthly premiums. Show ALL steps.

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: at least one $\beta_i \neq 0$, $i = 1, 2, 3$ (1 mark)

$F = \dfrac{SSR/k}{SSE/(n-(k+1))} = \dfrac{MSR}{MSE} = \dfrac{28715.964/3}{1005.919/[17-(3+1)]} = \dfrac{9571.98800}{77.37838} = 123.704$ (1.5 marks)

$df = (k,\ n-(k+1)) = (3,\ 17-(3+1)) = (3,\ 13)$
OR $F^{k}_{n-(k+1)} = F^{3}_{17-(3+1)} = F^{3}_{13}$ (0.5 marks)

P < 0.001, so there is extremely strong evidence against H0.
Since P < α (0.05), reject H0. (1 mark)

At the 5% significance level, the data provide sufficient evidence to conclude that the overall regression
model is useful for making predictions about monthly premiums. OR, at least one of the variables (driving
experience, driving violations, or their interaction) has an effect on monthly premiums.
(1 mark)
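
As a quick arithmetic check of part (a), the sketch below recomputes the mean squares and the F statistic from the sums of squares in the ANOVA table. The variable names are illustrative and are not part of the original solution; the P-value itself is not computed here and would still come from the output or an F-table.

```python
# Values copied from the ANOVA table in the regression output.
ssr = 28715.964  # regression sum of squares
sse = 1005.919   # residual (error) sum of squares
n, k = 17, 3     # sample size and number of predictors

df_regression = k
df_error = n - (k + 1)

msr = ssr / df_regression  # mean square for regression
mse = sse / df_error       # mean square error
f_stat = msr / mse

print(f"df = ({df_regression}, {df_error})")
print(f"MSR = {msr:.3f}, MSE = {mse:.5f}, F = {f_stat:.3f}")
```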

(b) (2 marks) What percentage of variation in monthly premiums is explained by the regression model?
(Determine the unadjusted percentage.)

$R^2 = \dfrac{SS_{REGR}}{SS_{TOTAL}} = \dfrac{28715.964}{29721.882} = 0.96616$
Therefore, 96.6% (unadjusted) of the variation in monthly premiums is explained by the regression model.

(c) (2 marks) What percentage of variation in monthly premiums is explained by the regression model?
(Determine the adjusted percentage.)

$R^2_{adj} = 1 - \dfrac{MSE}{MST} = 1 - \dfrac{SSE/(n-(k+1))}{SST/(n-1)}$

$= 1 - \dfrac{1005.919/[17-(3+1)]}{29721.882/(17-1)} = 1 - \dfrac{77.37838}{1857.61763} = 1 - 0.04165 = 0.95835$
Therefore, 95.8% (adjusted) of the variation in monthly premiums is explained by the regression model.
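
The unadjusted and adjusted values from parts (b) and (c) can be cross-checked with a few lines of Python. This is only a sketch using the sums of squares from the ANOVA table; the variable names are assumptions made for illustration.

```python
# Sums of squares copied from the ANOVA table.
ss_regr = 28715.964
sse = 1005.919
sst = 29721.882
n, k = 17, 3  # sample size and number of predictors

# Unadjusted R^2: share of total variation explained by the regression.
r2 = ss_regr / sst

# Adjusted R^2: compares mean squares, so it penalizes extra predictors.
mse = sse / (n - (k + 1))
mst = sst / (n - 1)
r2_adj = 1 - mse / mst

print(f"R^2 = {r2:.5f} ({100 * r2:.1f}%)")
print(f"adjusted R^2 = {r2_adj:.5f} ({100 * r2_adj:.1f}%)")
```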

(d) (2 marks) Find the standard error of the model.

$MSE = \dfrac{SSE}{n-(k+1)} = \dfrac{1005.919}{17-(3+1)} = 77.37838$

Standard error of the model is: $\hat{\sigma} = \sqrt{MSE} = \sqrt{77.37838} = 8.796498 \approx 8.796$

(e) (5 marks) Since it is virtually impossible that more years of driving experience would lead to a
driver having to pay higher monthly premiums, it is legitimate to perform a left-tailed test. Therefore, perform
the most appropriate test (at the 5% significance level) to determine whether there is a negative
relationship between years of driving experience and monthly premiums. Show ALL steps. Give both the
exact P-value from the computer output and the P-value from the appropriate statistical table.

Note: A one-tailed test requires a regression t-test, not a regression ANOVA F-test.
$H_0: \beta_2 = 0$ (There is no relationship between years of driving experience and monthly premiums.)
$H_a: \beta_2 < 0$ (There is a negative relationship between years of driving experience and monthly premiums.)
(1 mark)

$t = \dfrac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \dfrac{-1.441}{0.676} = -2.132$ (1 mark)

$df = n-(k+1) = 17-(3+1) = 13$ (0.5 marks)
From the computer output, the exact P-value = 0.053/2 = 0.0265 (0.5 marks)
From the t-table: 0.025 < P < 0.05 (0.5 marks)
There is strong evidence against H0.
Since P < α (0.05), reject H0 (1 mark)

Conclusion: At the 5% significance level, the data provide sufficient evidence to conclude that there is a
significant negative relationship between years of driving experience and monthly premiums.
(0.5 marks)
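
The step from the two-sided "Sig." value to the left-tailed P-value in part (e) can be sketched in Python as follows. The names are illustrative only. Halving is valid here because the estimate falls on the side named in Ha; the t statistic computed from the rounded table entries agrees with the reported value up to rounding.

```python
# Entries for Driving_Experience copied from the Coefficients table.
beta_hat = -1.441    # estimated slope
se_beta = 0.676      # standard error of the slope
p_two_sided = 0.053  # "Sig." column (two-tailed)

n, k = 17, 3
df = n - (k + 1)
t_stat = beta_hat / se_beta

# Left-tailed test (Ha: slope < 0): halve the two-sided P-value when the
# estimate is negative; otherwise the left-tail area is 1 - p/2.
if t_stat < 0:
    p_left = p_two_sided / 2
else:
    p_left = 1 - p_two_sided / 2

print(f"t = {t_stat:.3f} on {df} df, one-sided P = {p_left:.4f}")
print("reject H0" if p_left < 0.05 else "do not reject H0")
```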

(f) (4 marks) Calculate a 95% confidence interval for the slope of the interaction term (representing
interaction between years of driving experience and the number of driving violations). SHOW ALL STEPS.
Based on this confidence interval, what conclusion can you make about whether the interaction between
these two predictor variables has a significant effect on monthly premiums? Explain your answer.

For 95% confidence level, alpha = 0.05

At $df = n-(k+1) = 17-(3+1) = 13$,
$t_{n-(k+1),\,\alpha/2} = t_{17-(3+1),\,0.05/2} = t_{13,\,0.025} = 2.160$ (1 mark)

$\hat{\beta}_3 \pm t_{\alpha/2} \times SE(\hat{\beta}_3)$
$-0.772 \pm 2.160 \times 0.244$
$-0.772 \pm 0.52704$
$(-1.299,\ -0.245)$ (1.5 marks)

Conclusion: It is estimated with 95% confidence that the slope of the interaction term is between -1.299
and -0.245. (0.5 marks)

Since 0 is NOT inside this interval, we can say, with 95% confidence, that the interaction between years of
driving experience and the number of driving violations has a significant effect on monthly premiums.
(1 mark)
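
A minimal Python sketch of the interval in part (f), using the table's critical value 2.160 rather than computing it. The helper name `t_interval` is hypothetical and introduced only for this illustration.

```python
def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Interaction row of the Coefficients table; t_crit is t(13, 0.025) from the t-table.
lower, upper = t_interval(estimate=-0.772, std_error=0.244, t_crit=2.160)

# The interaction is judged significant at the 5% level if 0 lies outside the interval.
contains_zero = lower <= 0 <= upper

print(f"95% CI for the interaction slope: ({lower:.3f}, {upper:.3f})")
print("0 inside interval:", contains_zero)
```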

(g) (2 marks) Suppose that a driver who has 10 years of driving experience and has committed 3 driving
violations within the past 3 years has to pay a monthly premium of $100. What is the residual or error of
this observation?

$\hat{y} = 96.740 + (-1.441)(10) + (22.313)(3) + (-0.772)(10)(3) = 126.109$ (1 mark)

Residual = observed – predicted = $(y_i - \hat{y}_p) = \$100 - \$126.109 = -\$26.109$ (1 mark)
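
The fitted value and residual in part (g) follow directly from the estimated coefficients. The sketch below evaluates the fitted equation; the function name `predict_premium` and the dictionary keys are assumptions made for illustration.

```python
# Estimated coefficients copied from the "B" column of the Coefficients table.
COEF = {
    "intercept": 96.740,
    "experience": -1.441,
    "violations": 22.313,
    "interaction": -0.772,
}

def predict_premium(experience, violations):
    """Fitted monthly premium, including the experience x violations term."""
    return (COEF["intercept"]
            + COEF["experience"] * experience
            + COEF["violations"] * violations
            + COEF["interaction"] * experience * violations)

y_hat = predict_premium(experience=10, violations=3)
residual = 100 - y_hat  # observed minus predicted

print(f"predicted = {y_hat:.3f}, residual = {residual:.3f}")
```

A negative residual means the driver pays less than the model predicts.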

(h) (4 marks) Based on the values of the predictor variables given in part (g) (a driver who has 10 years of
driving experience and has committed 3 violations within the past 3 years), what is the 95% prediction
interval for all single observation responses of monthly premiums at those values of the predictor
variables? SHOW ALL STEPS. [Note: SE(Fit) = 3.249]

For 95% confidence level, alpha = 0.05

At $df = n-(k+1) = 17-(3+1) = 13$,
$t_{n-(k+1),\,\alpha/2} = t_{17-(3+1),\,0.05/2} = t_{13,\,0.025} = 2.160$ (1 mark)
Based on the values of the predictor variables given in part (g), $\hat{y}_p = 126.109$

$\hat{y}_p \pm t_{\alpha/2} \times \sqrt{\hat{\sigma}^2 + [SE(\text{Fit})]^2}$
$126.109 \pm 2.160 \times \sqrt{(8.796498)^2 + (3.249)^2}$
$126.109 \pm 2.160 \times 9.3773$
$126.109 \pm 20.2550$
$(105.854,\ 146.364)$ (2 marks)
It is estimated with 95% confidence that all single observation responses of monthly premiums at those
values of the predictor variables given in part (g) are between $105.854 and $146.364. (1 mark)

(i) (4 marks) Based on the values of the predictor variables given in part (g) (a driver who has 10 years of
driving experience and has committed 3 violations within the past 3 years), what is the 95% confidence
interval for mean monthly premium at those values of the predictor variables? SHOW ALL STEPS. [Note
again: SE(Fit) = 3.249]

At df = 13, $t_{n-(k+1),\,\alpha/2} = t_{17-(3+1),\,0.05/2} = t_{13,\,0.025} = 2.160$

Calculated in part (g), $\hat{y}_p = 126.109$ (1 mark)
$\hat{y}_p \pm t_{\alpha/2} \times SE(\text{Fit})$
$126.109 \pm 2.160 \times 3.249$
$126.109 \pm 7.01784$
$(119.091,\ 133.127)$ (2 marks)
It is estimated with 95% confidence that the mean monthly premium at those values of the predictor
variables given in part (g) is between $119.091 and $133.127. (1 mark)
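
The intervals in parts (h) and (i) share the same centre and critical value and differ only in the standard error used. The following sketch, with illustrative variable names, computes both side by side so the difference is easy to see.

```python
import math

y_hat = 126.109       # fitted value from part (g)
t_crit = 2.160        # t(13, 0.025) from the t-table
sigma_hat = 8.796498  # standard error of the model, from part (d)
se_fit = 3.249        # SE(Fit), given in the question

def interval(center, margin):
    return center - margin, center + margin

# Confidence interval for the MEAN response: uses SE(Fit) only.
ci_mean = interval(y_hat, t_crit * se_fit)

# Prediction interval for a SINGLE response: adds the model variance.
se_pred = math.sqrt(sigma_hat**2 + se_fit**2)
pi_single = interval(y_hat, t_crit * se_pred)

print(f"95% CI for the mean:       ({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
print(f"95% PI for one observation: ({pi_single[0]:.3f}, {pi_single[1]:.3f})")
```

The prediction interval is always the wider of the two, because a single premium varies around the mean in addition to the uncertainty in estimating that mean.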

2. (Eight parts; 33 marks in total) A study was conducted with the objective of increasing muzzle velocity of
mortar-like antipersonnel weaponry with grenade-type golf-ball-size ammunition. Some researchers
determined that the addition of an O-ring can reduce propellant gas escape in the muzzle and increase
muzzle velocity. Three explanatory variables were considered in the study: vent hole volume (in cubic
inches), the presence of an O-ring (with or without), and discharge hole area (in inches) which might affect
the pressure pulse of propellant gases. The researcher measured the muzzle velocity 8 times for each
combination of the four levels of discharge hole area (0.016, 0.03, 0.048, and 0.062) as well as with or
without an O-ring, for a total of 64 observations. In each trial, vent hole volume was also observed.

For parts (a) – (e): (No output is required for these parts) Consider the regression model below
(which will be referred to as the “original model” for parts (a) – (e)) for average muzzle velocity given the
variables, volume, ring, and area (where ring and area are treated as categorical variables and volume is a
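numerical variable).

$$
\begin{aligned}
\mu(\text{velocity} \mid \text{volume}, \text{ring}, \text{area}) ={}& \beta_0 + \beta_1\,\text{volume} + \beta_2\,\text{with} + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 \\
&+ \beta_6(\text{volume} \times \text{with}) + \beta_7(\text{volume} \times d_1) + \beta_8(\text{volume} \times d_2) + \beta_9(\text{volume} \times d_3) \\
&+ \beta_{10}(\text{volume} \times \text{with} \times d_1) + \beta_{11}(\text{volume} \times \text{with} \times d_2) + \beta_{12}(\text{volume} \times \text{with} \times d_3)
\end{aligned}
$$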

The indicator variables for ring and area are defined as follows:

Ring:
$\text{with} = 1$ if an O-ring is present (with O-ring); $\text{with} = 0$ if an O-ring is not present (without O-ring)

Area:
$d_1 = 1$ if discharge hole area is 0.016; $d_1 = 0$ otherwise
$d_2 = 1$ if discharge hole area is 0.030; $d_2 = 0$ otherwise
$d_3 = 1$ if discharge hole area is 0.048; $d_3 = 0$ otherwise
a) (6 marks) Referring to the original model, in terms of the regression coefficients, what is the effect of
volume on mean velocity? Find this effect in general, and then summarize the effect for all
combinations of levels of ring and area in the following chart.

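(2 marks) The effect of volume on mean velocity is (by definition):

$\mu(\text{velocity} \mid \text{volume} + 1, \text{ring}, \text{area}) - \mu(\text{velocity} \mid \text{volume}, \text{ring}, \text{area})$

$= \beta_1 + \beta_6\,\text{with} + \beta_7 d_1 + \beta_8 d_2 + \beta_9 d_3 + \beta_{10}(\text{with} \times d_1) + \beta_{11}(\text{with} \times d_2) + \beta_{12}(\text{with} \times d_3)$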

(4 marks) Thus, for each combination of levels of ring and area, we have:

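| Ring | Discharge Hole Area | Effect of volume on mean velocity |
|---|---|---|
| With O-Ring | 0.016 | $\beta_1 + \beta_6 + \beta_7 + \beta_{10}$ |
| With O-Ring | 0.030 | $\beta_1 + \beta_6 + \beta_8 + \beta_{11}$ |
| With O-Ring | 0.048 | $\beta_1 + \beta_6 + \beta_9 + \beta_{12}$ |
| With O-Ring | 0.062 | $\beta_1 + \beta_6$ |
| Without O-Ring | 0.016 | $\beta_1 + \beta_7$ |
| Without O-Ring | 0.030 | $\beta_1 + \beta_8$ |
| Without O-Ring | 0.048 | $\beta_1 + \beta_9$ |
| Without O-Ring | 0.062 | $\beta_1$ |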
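
The chart above can be generated mechanically by substituting the indicator values into the general expression for the effect of volume. The sketch below does this symbolically; the function `volume_effect_terms` and the use of string keys for the areas are hypothetical choices made only for illustration.

```python
# Index j of the area indicator d_j; area 0.062 is the baseline (all d_j = 0).
AREA_INDEX = {"0.016": 1, "0.030": 2, "0.048": 3}

def volume_effect_terms(with_ring, area):
    """Subscripts of the betas whose sum is the slope of volume."""
    terms = [1]                  # beta1 is always present
    if with_ring:
        terms.append(6)          # beta6: volume x with
    j = AREA_INDEX.get(area)
    if j is not None:
        terms.append(6 + j)      # beta7, beta8, beta9: volume x d_j
        if with_ring:
            terms.append(9 + j)  # beta10, beta11, beta12: volume x with x d_j
    return terms

for ring in (True, False):
    for area in ("0.016", "0.030", "0.048", "0.062"):
        label = "With O-Ring" if ring else "Without O-Ring"
        expr = " + ".join(f"beta{i}" for i in volume_effect_terms(ring, area))
        print(f"{label:<15} {area}  {expr}")
```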

b) (2 marks) Modify the original model to specify that the effect of volume on the mean of velocity is the
same with and without an O-ring, provided the discharge hole area is the same; otherwise, the effect of
volume on the mean of velocity is possibly different with and without O-ring when the discharge hole
area is not the same. Just state the constraint(s) needed. You do not have to rewrite the model.

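We have the following four constraints:

Area = 0.016: $\beta_1 + \beta_6 + \beta_7 + \beta_{10} = \beta_1 + \beta_7 \Rightarrow \beta_6 + \beta_{10} = 0$
Area = 0.030: $\beta_1 + \beta_6 + \beta_8 + \beta_{11} = \beta_1 + \beta_8 \Rightarrow \beta_6 + \beta_{11} = 0$
Area = 0.048: $\beta_1 + \beta_6 + \beta_9 + \beta_{12} = \beta_1 + \beta_9 \Rightarrow \beta_6 + \beta_{12} = 0$
Area = 0.062: $\beta_1 + \beta_6 = \beta_1 \Rightarrow \beta_6 = 0$

Putting them all together, this implies $\beta_6 = \beta_{10} = \beta_{11} = \beta_{12} = 0$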

c) (3 marks) Referring to the original model, set up a test to explore whether or not the effect of volume is
any different for the different areas when an O-ring is not present. Write out the null and alternative
hypotheses in terms of the regression coefficients and identify the null distribution of the test statistic.

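If the effect of volume on the mean velocity is the same for every discharge hole area when no O-ring is
present, then
$\beta_1 + \beta_7 = \beta_1 + \beta_8 = \beta_1 + \beta_9 = \beta_1$
$\Rightarrow \beta_7 = \beta_8 = \beta_9 = 0$

(2 marks) Thus,
$H_0: \beta_7 = \beta_8 = \beta_9 = 0$
$H_a$: at least one $\beta_i \neq 0$, $i = 7, 8, 9$

(1 mark) The null distribution of the test statistic is an $F^{df(r)-df(f)}_{df(f)} = F^{3}_{n-(k+1)} = F^{3}_{64-(12+1)} = F^{3}_{51}$
distribution, where 3 is the number of coefficients set to zero under H0 and k = 12 is the number of terms in the original (full) model.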

d) (4 marks) Referring to the original model, in terms of the regression coefficients, what is the effect of
ring (with O-ring vs. without O-ring) on the mean velocity? Find this effect in general, and then
summarize the effect for all levels of area in the following chart.

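(2 marks) The effect of ring (with O-ring vs. without O-ring) on the mean velocity is (by definition):

$$
\begin{aligned}
&\mu(\text{velocity} \mid \text{volume}, \text{ring} = \text{with}, \text{area}) - \mu(\text{velocity} \mid \text{volume}, \text{ring} = \text{without}, \text{area}) \\
&= \{\beta_0 + \beta_1\,\text{volume} + \beta_2 + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 + \beta_6\,\text{volume} + \beta_7(\text{volume} \times d_1) + \beta_8(\text{volume} \times d_2) \\
&\qquad + \beta_9(\text{volume} \times d_3) + \beta_{10}(\text{volume} \times d_1) + \beta_{11}(\text{volume} \times d_2) + \beta_{12}(\text{volume} \times d_3)\} \\
&\quad - \{\beta_0 + \beta_1\,\text{volume} + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 + \beta_7(\text{volume} \times d_1) + \beta_8(\text{volume} \times d_2) + \beta_9(\text{volume} \times d_3)\} \\
&= \beta_2 + \beta_6\,\text{volume} + \beta_{10}(\text{volume} \times d_1) + \beta_{11}(\text{volume} \times d_2) + \beta_{12}(\text{volume} \times d_3)
\end{aligned}
$$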

(2 marks) Thus, for each level of area we have:

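| Discharge Hole Area | Effect of O-ring (with vs. without) on the mean velocity |
|---|---|
| 0.016 | $\beta_2 + (\beta_6 + \beta_{10})\,\text{volume}$ |
| 0.030 | $\beta_2 + (\beta_6 + \beta_{11})\,\text{volume}$ |
| 0.048 | $\beta_2 + (\beta_6 + \beta_{12})\,\text{volume}$ |
| 0.062 | $\beta_2 + \beta_6\,\text{volume}$ |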

e) (2 marks) Referring to the original model, consider a test to explore whether or not there is any O-ring
effect on mean velocity. Write out the reduced model, under the null hypothesis, for this test.

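The reduced model under the null hypothesis would state that O-ring has no effect on mean velocity.
This would mean that, for every value of volume,

$\beta_2 + (\beta_6 + \beta_{10})\,\text{volume} = \beta_2 + (\beta_6 + \beta_{11})\,\text{volume} = \beta_2 + (\beta_6 + \beta_{12})\,\text{volume} = \beta_2 + \beta_6\,\text{volume} = 0$

OR $\Rightarrow \beta_2 = \beta_6 = \beta_{10} = \beta_{11} = \beta_{12} = 0$

Thus, the reduced model would be:

$$
\begin{aligned}
\mu(\text{velocity} \mid \text{volume}, \text{area}) ={}& \beta_0 + \beta_1\,\text{volume} + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 \\
&+ \beta_7(\text{volume} \times d_1) + \beta_8(\text{volume} \times d_2) + \beta_9(\text{volume} \times d_3)
\end{aligned}
$$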

For parts (f) – (g): Refer to the table below for the group definitions of a k = 8-mean model.
Let $\mu_i$, $i = 1, 2, \ldots, 8$, correspond to the mean of velocity for groups 1 to 8, respectively.

| Group | O-Ring | Discharge Hole Area |
|---|---|---|
| 1 | With O-Ring | 0.016 |
| 2 | With O-Ring | 0.030 |
| 3 | With O-Ring | 0.048 |
| 4 | With O-Ring | 0.062 |
| 5 | Without O-Ring | 0.016 |
| 6 | Without O-Ring | 0.030 |
| 7 | Without O-Ring | 0.048 |
| 8 | Without O-Ring | 0.062 |

Also, use the computer output in Tables 1 – 3 to answer parts (f) – (g):

Table 1: The One-Factor ANOVA table for the 8 means.

ANOVA (Velocity)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 46346.482 | ? | ? | ? | ? |
| Within Groups | 5477.462 | ? | ? | | |
| Total | 51823.944 | ? | | | |

Table 2:

Table 3:

Contrast Tests (Velocity, assume equal variances)

| Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|
| 1 | 179.22500 | 9.889986 | 18.122 | 56 | .000 |
| 2 | 40.55000 | 3.496638 | 11.597 | 56 | .000 |
| 3 | 290.08750 | 18.502470 | 15.678 | 56 | .000 |
| 4 | -550.88750 | 37.333880 | -14.756 | 56 | .000 |

f) (5 marks) Carry out the most appropriate test to determine if there are any significant differences in the
mean velocity among the 8 different groups. SHOW ALL STEPS. Use a 5% significance level.

(1 mark)
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 = \mu_7 = \mu_8$ (One-mean model)
$H_a$: Not all population means are equal (8-mean model)

(1.5 marks) The value of the test statistic is:

$F = \dfrac{MS_{Between}}{MS_{Error}}$ or $\dfrac{MS_{Treatments}}{MS_{Error}} = \dfrac{SS_{Treatments}/(k-1)}{SS_{Error}/(n-k)} = \dfrac{46346.482/(8-1)}{5477.462/(64-8)} = \dfrac{6620.926}{97.812} = 67.690$

(0.5 marks) $df = (k-1,\ n-k) = (8-1,\ 64-8) = (7,\ 56)$ OR $F^{k-1}_{n-k} = F^{8-1}_{64-8} = F^{7}_{56}$

(1.5 marks) The P-value is: P < 0.001 OR $P(F^{7}_{56} > 67.69) < 0.001$.
There is extremely strong evidence against H0. Since P < α (0.05), reject H0.

(0.5 marks) Conclusion: At the 5% significance level, the data provide sufficient evidence to conclude
that there is some difference in mean velocity among the 8 different groups (at least two means are
different).
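
The entries marked "?" in Table 1 (apart from Sig., which needs an F-table or software) can be filled in from the two sums of squares and the design (8 groups, 64 observations). The following is a minimal sketch with illustrative variable names.

```python
# Sums of squares copied from Table 1; group and sample counts from the design.
ss_between = 46346.482
ss_within = 5477.462
k, n = 8, 64  # number of groups, total observations

df_between = k - 1
df_within = n - k
df_total = n - 1

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"Between Groups: df = {df_between}, MS = {ms_between:.3f}, F = {f_stat:.3f}")
print(f"Within Groups:  df = {df_within}, MS = {ms_within:.3f}")
print(f"Total:          df = {df_total}, SS = {ss_between + ss_within:.3f}")
```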

g) (5 marks) What is the overall effect of ring (with O-ring vs. without O-ring) on the mean of velocity?
First, define a linear contrast that will define the overall effect of ring on mean velocity. Then, calculate
a 95% confidence interval for this effect. SHOW ALL STEPS. Based on this confidence interval, does it
appear that there is a difference in mean velocity with vs. without an O-ring?

(1 mark) $\gamma = \dfrac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} - \dfrac{\mu_5 + \mu_6 + \mu_7 + \mu_8}{4}$

Based on the output for contrast 1, we have:

(0.5 marks) Estimate: $\hat{\gamma} = \dfrac{1}{4}(179.225) = 44.806$

(0.5 marks) $S.E.(\text{Estimate}) = S.E.(\hat{\gamma}) = \dfrac{1}{4}(9.889986) = 2.473$

(1 mark) For 95% confidence, $C.V. = t^{*}_{56,\,0.025} \approx t^{*}_{50,\,0.025} = 2.009$.

(1 mark) The 95% confidence interval for $\gamma$ is then:

$\text{Estimate} \pm \{C.V. \times S.E.(\text{Estimate})\}$
$= 44.806 \pm (2.009)(2.473)$
$= (39.839,\ 49.774)$

(1 mark) Conclusion: It is estimated with 95% confidence that the mean of velocity is between 39.839
and 49.774 units larger when an O-ring is present. Since the confidence interval does not include
zero, we would reject $H_0: \gamma = 0$ in favour of $H_a: \gamma \neq 0$ at significance level 0.05. Thus, at 5%
significance, there is significant evidence of a difference in mean velocity with vs. without an O-ring.
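
The rescaling of contrast 1 and the resulting interval in part (g) can be sketched in Python as below. It assumes, as the solution does, that contrast 1 is the sum of the four with-O-ring means minus the sum of the four without-O-ring means, so that dividing by 4 gives the difference of averages. The variable names are illustrative. Because the sketch carries unrounded intermediate values, the last printed digit may differ slightly from the hand calculation.

```python
# Contrast 1 row copied from Table 3.
contrast_value = 179.22500
contrast_se = 9.889986
scale = 1 / 4   # converts sums of four means into averages
t_crit = 2.009  # t(50, 0.025) from the t-table, since df = 56 is rounded down

# Scaling a contrast by a constant scales its estimate and its SE alike.
estimate = scale * contrast_value
std_error = scale * contrast_se

margin = t_crit * std_error
lower, upper = estimate - margin, estimate + margin

print(f"estimate = {estimate:.3f}, SE = {std_error:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("interval excludes 0:", not (lower <= 0 <= upper))
```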

h) (6 marks) For this part, consider the three models defined below and use the three tables of computer
output analyzing those three models. Carry out the most appropriate test to determine if there are any
differences in the mean velocity for the four different levels of the discharge hole area, after accounting
for the volume and whether or not an O-ring is included. SHOW ALL STEPS. Use a 1% significance level.

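Model 1: $\mu(\text{velocity} \mid \text{area}) = \beta_0 + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3$

Model 2: $\mu(\text{velocity} \mid \text{volume}, \text{ring}) = \beta_0 + \beta_1\,\text{volume} + \beta_2\,\text{with}$

Model 3: $\mu(\text{velocity} \mid \text{volume}, \text{ring}, \text{area}) = \beta_0 + \beta_1\,\text{volume} + \beta_2\,\text{with} + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3$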

ANOVA table for model 1:


ANOVA table for model 2:

ANOVA table for model 3:

This is a question concerning model 3. In model 3 we are testing:

(1 mark)
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$ (Model 2)
$H_a$: at least one $\beta_i \neq 0$, $i = 3, 4, 5$ (Model 3)

(0.5 marks) From ANOVA table for Model 3 (full model):

$SSE = 6132.387$ and $df_E = n - (k+1) = 64 - (5+1) = 58$

(0.5 marks) From ANOVA table for Model 2 (reduced model):

$SSE = 19666.592$ and $df_E = n - (k+1) = 64 - (2+1) = 61$

(1.5 marks) The value of the test statistic is:

$F = \dfrac{[SSE(\text{reduced}) - SSE(\text{full})] \,/\, [df_E(\text{reduced}) - df_E(\text{full})]}{SSE(\text{full}) \,/\, df_E(\text{full})}$

$F = \dfrac{[19666.592 - 6132.387] \,/\, [61 - 58]}{6132.387 / 58} = \dfrac{13534.205 / 3}{6132.387 / 58} = 42.669$

(0.5 marks) $F^{df(r)-df(f)}_{df(f)} = F^{61-58}_{64-6} = F^{3}_{58}$

(1.5 marks) The P-value is: P < 0.001 OR $P(F^{3}_{58} > 42.67) < P(F^{3}_{50} > 42.669) < 0.001$
There is extremely strong evidence against H0. Since P < α (0.01), reject H0.

(0.5 marks) Conclusion: At the 1% significance level, the data provide sufficient evidence to conclude
that the mean velocity differs among the four levels of the discharge hole area, after accounting for the
volume and whether or not an O-ring is included.
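
The extra-sum-of-squares F statistic in part (h) compares a reduced model with a full model that contains it. The sketch below wraps that calculation in a small helper; the function name `extra_ss_f` is hypothetical, and the inputs are the SSE and error df values quoted in the solution for Models 2 and 3.

```python
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic and its (numerator, denominator) df."""
    df_num = df_reduced - df_full
    numerator = (sse_reduced - sse_full) / df_num  # drop in SSE per added term
    denominator = sse_full / df_full               # MSE of the full model
    return numerator / denominator, (df_num, df_full)

# Model 2 (reduced) vs. Model 3 (full), values as quoted in the solution.
f_stat, df = extra_ss_f(sse_reduced=19666.592, df_reduced=61,
                        sse_full=6132.387, df_full=58)

print(f"F = {f_stat:.3f} on {df} degrees of freedom")
```

The same helper applies to any pair of nested models, for example the original 13-coefficient model against one of its constrained versions from parts (b), (c), or (e), once their ANOVA tables are available.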