
CHAPTER 18

MULTIPLE REGRESSION

SECTIONS 1 - 3

MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

1. In a multiple regression analysis, if the model provides a poor fit, this indicates that:
a. the sum of squares for error will be large
b. the standard error of estimate will be large
c. the multiple coefficient of determination will be close to zero
d. All of the above
ANSWER: d

2. In a multiple regression analysis, when none of the independent variables is linearly related to the dependent variable, then
a. multiple t-tests of the individual coefficients will likely show some are significant
b. we will conclude erroneously that the model has some validity
c. the chance of erroneously concluding that the model is useful is substantially less
with the F-test than with multiple t-tests
d. All of the above statements are correct
ANSWER: d


3. In testing the validity of a multiple regression model, a large value of the F-test statistic
indicates that:
a. most of the variation in the independent variables is explained by the variation in y
b. most of the variation in y is explained by the regression equation
c. most of the variation in y is unexplained by the regression equation
d. the model provides a poor fit
ANSWER: b

4. Which of the following statements regarding multicollinearity is not true?


a. It exists in virtually all multiple regression models.
b. It is also called collinearity and intercorrelation.
c. It is a condition that exists when the independent variables are highly correlated with
the dependent variable.
d. It does not affect the F-test of the analysis of variance.
ANSWER: c

5. In a multiple regression analysis involving 25 data points, the squared standard error of estimate is calculated as sε² = 1.8 and the sum of squares for error as SSE = 36. Then, the number of independent variables must be:
a. 6
b. 5
c. 4
d. 3
ANSWER: c
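Worked check: since sε² = SSE/(n − k − 1), we have n − k − 1 = 36/1.8 = 20, so k = 25 − 20 − 1 = 4.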

6. When the independent variables are correlated with one another in a multiple regression
analysis, this condition is called:
a. heteroscedasticity
b. homoscedasticity
c. multicollinearity
d. elasticity
ANSWER: c

7. In a multiple regression model, the mean of the probability distribution of the error
variable ε is assumed to be:
a. 1.0
b. 0.0
c. Any value greater than 1
d. k, where k is the number of independent variables included in the model
ANSWER: b
8. The adjusted multiple coefficient of determination is adjusted for the:
a. number of regression parameters including the y-intercept
b. number of dependent variables and the sample size
c. number of independent variables and the sample size
d. coefficient of correlation and the significance level
ANSWER: c

9. In a multiple regression model, the standard deviation of the error variable ε is assumed
to be:
a. constant for all values of the independent variables
b. constant for all values of the dependent variable
c. 1.0
d. not enough information is given to answer this question
ANSWER: a

10. In multiple regression analysis, the ratio MSR/MSE yields the:


a. t-test statistic for testing each individual regression coefficient
b. F-test statistic for testing the validity of the regression equation
c. multiple coefficient of determination
d. adjusted multiple coefficient of determination
ANSWER: b

11. In a multiple regression analysis involving 6 independent variables, the sums of squares are calculated as: Total variation in Y = SSY = 900, SSR = 600, and SSE = 300. Then, the value of the F-test statistic for this model is:
a. 150
b. 100
c. 50
d. None of the above
ANSWER: d
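Worked note: F = (SSR/k)/(SSE/(n − k − 1)) = (600/6)/(300/(n − 7)). The sample size n is not given, so F cannot be evaluated, which is why none of the listed values is correct.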

12. In order to test the validity of a multiple regression model involving 5 independent
variables and 30 observations, the numerator and denominator degrees of freedom for the
critical value of F are, respectively,
a. 5 and 30
b. 6 and 29
c. 5 and 24
d. 6 and 25
ANSWER: c
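Worked check: numerator df = k = 5; denominator df = n − k − 1 = 30 − 5 − 1 = 24.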

13. In multiple regression models, the values of the error variable ε are assumed to be:
a. autocorrelated
b. dependent of each other
c. independent of each other
d. always positive
ANSWER: c

14. A multiple regression model involves 5 independent variables and a sample of 10 data
points. If we want to test the validity of the model at the 5% significance level, the
critical value is:
a. 6.26
b. 3.33
c. 9.36
d. 4.24
ANSWER: a
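Worked check: the critical value is F0.05,k,n−k−1 = F0.05,5,4 = 6.26.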

15. A multiple regression model involves 10 independent variables and 30 observations. If we want to test the parameter β4 at the 5% significance level, the critical value will be:
a. 2.093
b. 1.697
c. 2.228
d. 1.729
ANSWER: a
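Worked check: the t-test uses n − k − 1 = 30 − 10 − 1 = 19 degrees of freedom, so the two-tailed critical value is t0.025,19 = 2.093.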

16. In a multiple regression analysis involving k independent variables and n data points, the
number of degrees of freedom associated with the sum of squares for error is:
a. k-1
b. n-k
c. n-1
d. n-k-1
ANSWER: d

17. A multiple regression model has the form ŷ = 8 + 3x1 + 5x2 − 4x3. As x3 increases by one unit, with x1 and x2 held constant, y on average is expected to:
a. increase by 1 unit
b. increase by 12 units
c. decrease by 4 units
d. decrease by 16 units
ANSWER: c

18. The problem of multicollinearity arises when the:


a. dependent variables are highly correlated with one another
b. independent variables are highly correlated with one another
c. independent variables are highly correlated with the dependent variable
d. None of the above
ANSWER: b
19. To test the validity of a multiple regression model, we test the null hypothesis that the
regression coefficients are all zero by applying the:
a. t-test
b. z-test
c. F-test
d. All of the above
ANSWER: c

20. To test the validity of a multiple regression model involving two independent variables,
the null hypothesis is that:
a. β0 = β1 = β2
b. β1 = β2 = 0
c. β1 = β2
d. β1 ≠ β2
ANSWER: b

21. If multicollinearity exists among the independent variables included in a multiple regression model, then:
a. regression coefficients will be difficult to interpret
b. standard errors of the regression coefficients for the correlated independent variables
will increase
c. multiple coefficient of determination will assume a value close to zero
d. both (a) and (b) are correct statements
ANSWER: d

22. Which of the following is not true when we add an independent variable to a multiple
regression model?
a. Adjusted coefficient of determination can assume a negative value
b. Unadjusted coefficient of determination always increases
c. Unadjusted coefficient of determination may increase or decrease
d. Adjusted coefficient of determination may increase
ANSWER: c

23. A multiple regression model has the form ŷ = b0 + b1x1 + b2x2. The coefficient b1 is interpreted as the:
a. change in y per unit change in x1
b. change in y per unit change in x1 , holding x 2 constant
c. change in y per unit change in x1 , when x1 and x 2 values are correlated
d. change in the average value of y per unit change in x1 , holding x 2 constant
ANSWER: d

24. A multiple regression analysis involving three independent variables and 25 data points
results in a value of 0.769 for the unadjusted multiple coefficient of determination. Then,
the adjusted multiple coefficient of determination is:
a. 0.385
b. 0.877
c. 0.591
d. 0.736
ANSWER: d
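Worked check: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (0.231)(24/21) = 0.736.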

25. The coefficient of multiple determination ranges from:


a. 1.0 to ∞
b. 0.0 to 1.0
c. 1.0 to k, where k is the number of independent variables in the model
d. 1.0 to n, where n is the number of observations in the dependent variable
ANSWER: b

26. For a multiple regression model, the following statistics are given: Total variation in Y =
SSY = 500, SSE = 80, and n = 25. Then, the coefficient of determination is:
a. 0.84
b. 0.16
c. 0.3125
d. 0.05
ANSWER: a

27. For a multiple regression model the following statistics are given: Total variation in Y =
SSY = 250, SSE = 50, k = 4, and n = 20. Then, the coefficient of determination adjusted
for the degrees of freedom is:
a. 0.800
b. 0.747
c. 0.840
d. 0.775
ANSWER: b
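Worked check: R² = 1 − 50/250 = 0.80, so adjusted R² = 1 − (0.20)(19/15) = 0.747.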

28. A multiple regression model has the form: ŷ = 5.25 + 2x1 + 6x2. As x2 increases by one unit, holding x1 constant, the value of y will increase by:
a. 2 units
b. 7.25 units
c. 6 units on average
d. None of the above
ANSWER: c
29. The graphical depiction of the equation of a multiple regression model with k
independent variables (k > 1) is referred to as:
a. a straight line
b. response variable
c. response surface
d. a plane only when k = 3
ANSWER: c

30. A multiple regression model has:


a. only one independent variable
b. only two independent variables
c. more than one independent variable
d. more than one dependent variable
ANSWER: c

31. If all the points for a multiple regression model with two independent variables were on
the regression plane, then the multiple coefficient of determination would equal:
a. 0
b. 1
c. 2, since there are two independent variables
d. any number between 0 and 2
ANSWER: b

32. If none of the data points for a multiple regression model with two independent variables
were on the regression plane, then the multiple coefficient of determination would be:
a. –1.0
b. 1.0
c. any number between –1 and 1, inclusive
d. any number greater than or equal to zero but smaller than 1
ANSWER: d

33. The multiple coefficient of determination is defined as:


a. SSE/SSY
b. MSE/MSR
c. 1- (SSE/SSY)
d. 1- (MSE/MSR)
ANSWER: c

34. In a multiple regression model, the following statistics are given: SSE = 100, R² = 0.955,
k = 5, and n = 15. Then, the multiple coefficient of determination adjusted for degrees of
freedom is:
a. 0.955
b. 0.930
c. 0.900
d. 0.855
ANSWER: b
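Worked check: adjusted R² = 1 − (1 − 0.955)(14/9) = 1 − 0.07 = 0.93.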

35. In a multiple regression model, the error variable ε is assumed to have a mean of:
a. –1.0
b. 0.0
c. 1.0
d. Any value smaller than –1.0
ANSWER: b

36. For the following multiple regression model: yˆ = 2 − 3x1 + 4 x 2 + 5 x3 , a unit increase in
x1 , holding x 2 and x3 constant, results in:
a. an increase of 3 units in the value of y
b. a decrease of 3 units in the value of y
c. a decrease of 3 units on average in the value of y
d. an increase of 8 units in the value of y
ANSWER: c

37. In a multiple regression model, the probability distribution of the error variable ε is
assumed to be:
a. normal
b. nonnormal
c. positively skewed
d. negatively skewed
ANSWER: a

38. Which of the following measures can be used to assess the multiple regression model’s
fit?
a. sum of squares for error
b. sum of squares for regression
c. standard error of estimate
d. single t-test
ANSWER: c

39. In a multiple regression analysis involving 40 observations and 5 independent variables,


the following statistics are given: Total variation in Y = SSY = 350 and SSE = 50. Then,
the multiple coefficient of determination is:
a. 0.8408
b. 0.8571
c. 0.8469
d. 0.8529
ANSWER: b
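Worked check: R² = 1 − SSE/SSY = 1 − 50/350 = 0.8571.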
40. In a multiple regression analysis involving 20 observations and 5 independent variables,
the following statistics are given: Total variation in Y = SSY = 250 and SSE = 35. The
multiple coefficient of determination adjusted for degrees of freedom is:
a. 0.810
b. 0.860
c. 0.835
d. 0.831
ANSWER: a
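Worked check: R² = 1 − 35/250 = 0.86, so adjusted R² = 1 − (0.14)(19/14) = 0.81.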

41. In testing the validity of a multiple regression model involving 10 independent variables
and 100 observations, the numerator and denominator degrees of freedom for the critical
value of F will be, respectively,
a. 9 and 90
b. 10 and 100
c. 9 and 10
d. 10 and 89
ANSWER: d

42. In multiple regression analysis involving 10 independent variables and 100 observations,
the critical value of t for testing individual coefficients in the model will have:
a. 100 degrees of freedom
b. 10 degrees of freedom
c. 89 degrees of freedom
d. 9 degrees of freedom
ANSWER: c

43. For a multiple regression model,


a. SSY = SSR – SSE
b. SSE = SSR – SSY
c. SSR = SSE – SSY
d. SSY = SSE + SSR
ANSWER: d

44. In a regression model involving 50 observations, the following estimated regression


model was obtained: yˆ = 10.5 + 3.2 x1 + 5.8 x2 + 6.5 x3 . For this model, the following
statistics are given: SSR = 450 and SSE = 175. Then, the value of MSR is:
a. 12.50
b. 275
c. 150
d. 3.804
ANSWER: c
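Worked check: MSR = SSR/k = 450/3 = 150.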

45. In a regression model involving 30 observations, the following estimated regression


model was obtained: yˆ = 60 + 2.8 x1 + 1.2 x 2 − x3 . For this model, the following statistics
were given: Total variation in Y = SSY = 800 and SSE = 200. Then, the value of the F
statistic for testing the validity of this model is:
a. 26.00
b. 7.69
c. 3.38
d. 0.039
ANSWER: a
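Worked check: SSR = 800 − 200 = 600, MSR = 600/3 = 200, MSE = 200/(30 − 3 − 1) = 7.692, so F = 200/7.692 = 26.0.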

46. Most statistical software provides a p-value for testing each coefficient in the multiple regression model. In the case of b2, this represents the probability that:
a. b2 = 0
b. β 2 = 0
c. | b2 | could be this large if β 2 = 0
d. | b2 | could be this large if β 2 ≠ 0
ANSWER: c

47. In a regression model involving 60 observations, the following estimated regression


model was obtained: yˆ = 51.4 + 0.70 x1 + 0.679 x 2 − 0.378 x3 , and the following statistics
were given: SSY = 119,724 and SSR = 29,029.72. Then, the value of MSE is:
a. 1619.541
b. 9676.572
c. 1995.400
d. 5020.235
ANSWER: a
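Worked check: SSE = SSY − SSR = 119,724 − 29,029.72 = 90,694.28, so MSE = 90,694.28/(60 − 3 − 1) = 1619.541.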

48. In testing the validity of a multiple regression model in which there are four independent
variables, the null hypothesis is:
a. H 0 : β 1 = β 2 = β 3 = β 4 = 1
b. H 0 : β 0 = β 1 = β 2 = β 3 = β 4
c. H 0 : β 1 = β 2 = β 3 = β 4 = 0
d. H 0 : β 0 = β 1 = β 2 = β 3 = β 4 ≠ 0
ANSWER: c

49. For a set of 20 data points, a statistical software package listed the estimated multiple regression equation as ŷ = −8.61 + 22x1 + 7x2 + 28x3, and also listed the t statistic for testing the significance of each regression coefficient. Using the 5% significance level for testing whether b2 = 7 differs significantly from zero, the critical region is that the absolute value of t is greater than or equal to:
a. 1.746
b. 2.120
c. 1.337
d. 1.333
ANSWER: b
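Worked check: the t-test uses n − k − 1 = 20 − 3 − 1 = 16 degrees of freedom, so the two-tailed critical value is t0.025,16 = 2.120.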

50. For the multiple regression model: ŷ = 75 + 25x1 − 15x2 + 10x3, if x2 were to increase by 5, holding x1 and x3 constant, the value of y will:
a. increase by 5
b. increase by 75
c. decrease on average by 5
d. decrease on average by 75
ANSWER: d

51. In a multiple regression analysis, there are 20 data points and 4 independent variables,
and the sum of the squared differences between observed and predicted values of y is
180. The multiple standard error of estimate will be:
a. 6.708
b. 3.464
c. 9.000
d. 3.000
ANSWER: b
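Worked check: sε = √(SSE/(n − k − 1)) = √(180/15) = √12 = 3.464.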

52. A multiple regression analysis involving 4 independent variables results in a sum of squares for regression of 1200 and a sum of squares for error of 800. Then, the multiple coefficient of determination will be:
a. 0.667
b. 0.600
c. 0.400
d. 0.200
ANSWER: b
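Worked check: R² = SSR/(SSR + SSE) = 1200/2000 = 0.60.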

53. A multiple regression analysis involving 20 data points and 4 independent variables produced the following statistics: Total variation in Y = SSY = 200 and SSR = 160. Then, the multiple standard error of estimate will be:
a. 0.80
b. 3.266
c. 3.651
d. 1.633
ANSWER: d
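Worked check: SSE = 200 − 160 = 40, so sε = √(40/(20 − 4 − 1)) = √2.667 = 1.633.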

54. In a multiple regression analysis involving 25 data points and 5 independent variables,
the sum of squares terms are calculated as Total variation in Y = SSY = 500, SSR = 300,
and SSE = 200. In testing the validity of the regression model, the F value of the test
statistic will be:
a. 5.70
b. 2.50
c. 1.50
d. 0.176
ANSWER: a
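Worked check: MSR = 300/5 = 60, MSE = 200/(25 − 5 − 1) = 10.526, so F = 60/10.526 = 5.70.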

55. A multiple regression equation includes 5 independent variables, and the coefficient of
determination is 0.81. The percentage of the variation in y that is explained by the
regression equation is:
a. 81%
b. 90%
c. 86%
d. about 16%
ANSWER: a

56. In a simple linear regression problem, the following pairs of ( yi , yˆ i ) are given: (6.75,
7.42), (8.96, 8.06), (10.30, 11.65), and (13.24, 12.15). Then, the sum of squares for error
is
a. 39.2500
b. -0.0300
c. 4.2695
d. 39.2800
ANSWER: c
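Worked check: SSE = (−0.67)² + (0.90)² + (−1.35)² + (1.09)² = 0.4489 + 0.81 + 1.8225 + 1.1881 = 4.2695.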

57. In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that the
a. relationship between x1 and y is significant
b. estimated average of y increases by two units for each increase of one unit of x1
holding x2 constant
c. estimated average of y increases by two units for each increase of one unit of x1 ,
without regard to x2
d. estimated average of y is two when x1 equals 0
ANSWER: b

58. In a multiple regression model, the value of the coefficient of multiple determination has
to fall between
a. – 1 and + 1
b. 0 and + 1
c. – 1 and 0
d. Any pair of real numbers
ANSWER: b

59. In a multiple regression model, which of the following is correct regarding the value of R² adjusted for the degrees of freedom?
a. It can be negative
b. It has to be positive
c. It has to be larger than the coefficient of multiple determination
d. It can be larger than 1
ANSWER: a

60. An interaction term in a multiple regression model with two independent variables may
be used when
a. the coefficient of determination is small
b. there is a curvilinear relationship between the dependent and independent variables
c. neither one of the two independent variables contribute significantly to the regression
model
d. the relationship between x1 and y changes for differing values of x2
ANSWER: d

61. In a multiple regression model, the adjusted R 2


a. cannot be negative
b. can sometimes be negative
c. can sometimes be greater than + 1
d. has to fall between 0 and + 1
ANSWER: b

62. The coefficient of multiple determination R 2


a. measures the variation around the predicted regression equation
b. measures the proportion of variation in y that is explained by x1 and x2
c. measures the proportion of variation in y that is explained by x1 holding x2 constant
d. will have the same sign as b1
ANSWER: b

63. If a group of independent variables are not significant individually but are significant as a
group at a specified level of significance, this is most likely due to
a. autocorrelation
b. the presence of dummy variables
c. the absence of dummy variables
d. multicollinearity
ANSWER: d

TRUE / FALSE QUESTIONS

64. Multiple regression is the process of using several independent variables to predict a
number of dependent variables.
ANSWER: F

65. In multiple regression, the descriptor “multiple” refers to more than one dependent
variable.
ANSWER: F

66. For each x term in the multiple regression equation, the corresponding β is referred to as
a partial regression coefficient.
ANSWER: T

67. In a multiple regression problem, the regression equation is ŷ = 60.6 − 5.2 x1 + 0.75 x2 . The
estimated value for y when x1 = 3 and x2 = 4 is 48.
ANSWER: T

68. In reference to the equation yˆ = −0.80 + 0.12 x1 + 0.08 x2 , the value –0.80 is the y intercept.
ANSWER: T

69. In testing the significance of a multiple regression model in which there are three
independent variables, the null hypothesis is H 0 : β1 = β 2 = β 3 .
ANSWER: F

70. In a multiple regression problem involving 24 observations and three independent


variables, the estimated regression equation is ŷ = 72 + 3.2 x1 + 1.5 x2 − x3 . For this model,
SST = 800 and SSE = 245. Then, the value of the F statistic for testing the significance
of the model is 15.102.
ANSWER: T
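Worked check: SSR = 800 − 245 = 555, MSR = 555/3 = 185, MSE = 245/(24 − 3 − 1) = 12.25, so F = 185/12.25 = 15.102.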

71. A multiple regression equation includes 5 independent variables, and the coefficient of
determination is 0.81. Then, the percentage of the variation in y that is explained by the
regression equation is 90%.
ANSWER: F

72. In a multiple regression analysis involving 4 independent variables and 30 data points,
the number of degrees of freedom associated with the sum of squares for error, SSE, is
25.
ANSWER: T

73. In order to test the significance of a multiple regression model involving 4 independent
variables and 25 observations, the numerator and denominator degrees of freedom for the
critical value of F are 3 and 21, respectively.
ANSWER: F

74. In multiple regression analysis, the adjusted multiple coefficient of determination is


adjusted for the number of independent variables and the sample size.
ANSWER: T

75. A multiple regression analysis involving 25 data points and 4 independent variables produces SST = 400 and SSR = 300. Then, the multiple standard error of estimate is 5.
ANSWER: F
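Worked check: SSE = 400 − 300 = 100, so sε = √(100/(25 − 4 − 1)) = √5 = 2.236, not 5.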

76. Multicollinearity is present if the dependent variable is linearly related to one of the
explanatory variables.
ANSWER: F

77. In a multiple regression analysis involving 50 observations and 5 independent variables,


SST = 475 and SSE = 71.25. Then, the multiple coefficient of determination is 0.85.
ANSWER: T
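Worked check: R² = 1 − 71.25/475 = 0.85.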

78. A multiple regression model has the form ŷ = 6.75 + 2.25 x1 + 3.5 x2 . As x1 increases by
one unit, holding x2 constant, the value of y will increase by 9 units.
ANSWER: F

79. In reference to the multiple regression model ŷ = 40 + 15 x1 − 10 x2 + 5 x3 , if x 2 were to


increase by five units, holding x1 and x3 constant, then, the value of y would decrease
on average by 50 units.
ANSWER: T

80. A multiple regression model involves 40 observations and 4 independent variables


produces SST = 100,000 and SSR = 80,400. Then, the value of MSE is 560.
ANSWER: T
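Worked check: SSE = 100,000 − 80,400 = 19,600, so MSE = 19,600/(40 − 4 − 1) = 560.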

81. In order to test the significance of a multiple regression model involving 5 independent
variables and 30 observations, the numerator and denominator degrees of freedom for the
critical value of F are 5 and 24, respectively.
ANSWER: T

82. In reference to the equation yˆ = −0.80 + 0.12 x1 + 0.08 x2 , the value 0.12 is the average
change in y per unit change in x1 , when x2 is held constant.
ANSWER: T

83. In multiple regression, if the error sum of squares SSE equals the total variation in y, then
the value of the test statistic F is zero.
ANSWER: T

84. In reference to the equation yˆ = 1.86 − 0.51x1 + 0.60 x2 , the value 0.60 is the average change
in y per unit change in x2 , regardless of the value of x1 .
ANSWER: F

85. Most statistical software prints a second R² statistic, called the coefficient of determination adjusted for degrees of freedom, which has been adjusted to take into account the sample size and the number of independent variables.
ANSWER: T

86. In multiple regression, the standard error of estimate is defined by sε = √(SSE/(n − k)), where n is the sample size and k is the number of independent variables.
ANSWER: F

87. In regression analysis, the total variation in the dependent variable y, measured by Σ(yi − ȳ)², can be decomposed into two parts: the explained variation, measured by SSR, and the unexplained variation, measured by SSE.
ANSWER: T

88. In multiple regression, a large value of the test statistic F indicates that most of the
variation in y is unexplained by the regression equation and that the model is useless. A
small value of F indicates that most of the variation in y is explained by the regression
equation and that the model is useful.
ANSWER: F

89. When an additional explanatory variable is introduced into a multiple regression model,
coefficient of multiple determination adjusted for degrees of freedom can never decrease.
ANSWER: F

90. In multiple regression analysis, when the response surface (the graphical depiction of the
regression equation) hits every single point, the sum of squares for error SSE = 0, the
standard error of estimate sε = 0, and the coefficient of determination R 2 = 1.
ANSWER: T

91. In a multiple regression analysis involving k independent variables, the t-tests of the
individual coefficients allows us to determine whether β i ≠ 0 (for i = 1, 2, …., k), which
tells us whether a linear relationship exists between xi and y.
ANSWER: T

92. In multiple regression analysis, the problem of multicollinearity affects the t-tests of the
individual coefficients as well as the F-test in the analysis of variance for regression,
since the F-test combines these t-tests into a single test.
ANSWER: F

93. A multiple regression model is assessed to be good if the error sum of squares SSE and
the standard error of estimate sε are both small, the coefficient of multiple determination
R2 is close to 1, and the value of the test statistic F is large.
ANSWER: T

94. The most common method to remedy nonnormality or heteroscedasticity in regression analysis is to transform the dependent variable, y. The most commonly used transformations are y′ = log y (provided y > 0), y′ = y², y′ = √y (provided y ≥ 0), and y′ = 1/y.
ANSWER: T

95. In multiple regression analysis, and because of a commonly occurring problem called
multicollinearity, the t-tests of the individual coefficients may indicate that some
independent variables are not linearly related to the dependent variable, when in fact they
are.
ANSWER: T

96. Multicollinearity is present when there is a high degree of correlation between the
dependent variable and any of the independent variables.
ANSWER: F

97. The coefficient of multiple determination R 2 measures the proportion of variation in y


that is explained by the explanatory variables included in the model.
ANSWER: T

98. When an additional explanatory variable is introduced into a multiple regression model,
the coefficient of multiple determination will never decrease.
ANSWER: T

99. In regression analysis, we judge the magnitude of the standard error of estimate relative
to the values of the dependent variable, and particularly to the mean of y.
ANSWER: T

100. In calculating the standard error of the estimate, sε = √MSE, there are (n − k − 1) degrees of freedom, where n is the sample size and k is the number of independent variables in the model.
ANSWER: T

101. A multiple regression is called “multiple” because it has several explanatory variables.
ANSWER: T
102. The coefficient of multiple determination measures the proportion or percentage of the
total variation in the dependent variable y that is explained by the regression plane.
ANSWER: T
103. When an explanatory variable is dropped from a multiple regression model, the adjusted
coefficient of determination can increase.
ANSWER: T

104. The coefficient of multiple determination is calculated by dividing the regression sum of squares by the total sum of squares (SSR/SST) and subtracting that value from 1.
ANSWER: F

105. In a multiple regression model involving 5 independent variables, if the sum of the
squared residuals is 847 and the data set contains 40 points, then, the value of the
standard error of the estimate is 24.911.
ANSWER: F
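Worked check: MSE = 847/(40 − 5 − 1) = 24.911, so sε = √24.911 = 4.991; the stated value is the MSE, not the standard error.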

106. One of the consequences of multicollinearity in multiple regression is biased estimates on


the slope coefficients.
ANSWER: F

107. When an explanatory variable is dropped from a multiple regression model, the
coefficient of multiple determination can increase.
ANSWER: F

108. Multicollinearity is a situation in which two or more of the independent variables are
highly correlated with each other.
ANSWER: T

109. You have just run a regression in which the coefficient of multiple determination is 0.78.
To determine if this indicates that the independent variables explain a significant portion
of the variation in the dependent variable, you would perform an F – test.
ANSWER: T

110. From the coefficient of multiple determination, we cannot detect the strength of the
relationship between the dependent variable y and any individual independent variable.
ANSWER: T

111. The total sum of squares (SST) in a regression model will never exceed the regression
sum of squares (SSR).
ANSWER: F

112. A regression had the following results: SST = 92.25, SSE = 34.55. It can be said that
37.45% of the variation in the dependent variable is explained by the independent
variables in the regression.
ANSWER: F
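Worked check: R² = 1 − SSE/SST = 1 − 34.55/92.25 = 0.6255, so 62.55% of the variation is explained; 37.45% is the unexplained proportion.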

113. An interaction term in a multiple regression model involving two independent variables
may be used when the relationship between x1 and y changes for differing values of x2 .
ANSWER: T

114. Multicollinearity is present when there is a high degree of correlation between the
independent variables included in the regression model.
ANSWER: T

115. The interpretation of the slope is different in a multiple linear regression model as
compared to a simple linear regression model.
ANSWER: T

116. A multiple regression is called “multiple” because it has several data points, and multiple
dependent variables.
ANSWER: F

117. A value of the coefficient of multiple determination significantly above 0, accompanied by insignificant t-values for all parameter estimates, very often indicates a high correlation between the independent variables in the model.
ANSWER: T

118. One of the consequences of multicollinearity in multiple regression is inflated standard


errors in some or all of the estimated slope coefficients.
ANSWER: T

119. A regression analysis showed that SST = 112.18 and SSE = 33.65. It can be said that
70% of the variation in the dependent variable is explained by the independent variables
in the regression.
ANSWER: T

120. A multiple regression model has the form yˆ = b0 + b1 x1 + b2 x 2 . The coefficient b1 is


interpreted as the average change in y per unit change in x1 .
ANSWER: F

121. When an explanatory variable is dropped from a multiple regression model, the adjusted coefficient of multiple determination can increase.
ANSWER: T

122. The parameter estimates are biased when multicollinearity is present in a multiple
regression equation.
ANSWER: F

123. In trying to obtain a model to estimate grades on a statistics test, a professor wanted to
include, among other factors, whether the person had taken the course previously. To do
this, the professor included a dummy variable in her regression that was equal to 1 if the
person had previously taken the course, and 0 otherwise. The interpretation of the
coefficient associated with this dummy variable would be the average amount the repeat
students tended to be above or below non-repeaters, with all other factors the same.
ANSWER: T

124. When an additional explanatory variable is introduced into a multiple regression model,
the adjusted coefficient of multiple determination can never decrease.
ANSWER: F

125. If we have taken into account all relevant explanatory variables, the residuals from a
multiple regression should be random.
ANSWER: T

126. When an additional explanatory variable is introduced into a multiple regression model,
the coefficient of multiple determination will increase.
ANSWER: T

127. Multicollinearity will result in excessively low standard errors of the parameter estimates
reported in the regression output.
ANSWER: F

128. A multiple regression model is assessed to be perfect if the error sum of squares SSE = 0,
the standard error of estimate sε = 0, the coefficient of multiple determination R 2 =1, and
the value of the test statistic F = ∞ .
ANSWER: T

129. A multiple regression model is assessed to be poor if the error sum of squares SSE , and
the standard error of estimate sε are both large, the coefficient of multiple determination
R 2 is close to 0, and the value of the test statistic F is small.
ANSWER: T

STATISTICAL CONCEPTS & APPLIED QUESTIONS

130. Consider the following statistics of a multiple regression model: Total variation in y = SSY = 1000, SSE = 300, n = 50, and k = 4.
a. Determine the standard error of estimate.
b. Determine the multiple coefficient of determination.
c. Determine the F-statistic.

ANSWER:
a. sε = 2.582
b. R 2 = 70%
c. F = MSR/MSE = 26.25
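Worked check: sε = √(300/(50 − 4 − 1)) = √6.667 = 2.582; R² = 1 − 300/1000 = 0.70; F = (700/4)/(300/45) = 175/6.667 = 26.25.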

131. Consider the following statistics of a multiple regression model: n = 25, k = 5, b1 = -6.31,
and sb1 = 2.98. Can we conclude at the 1% significance level that x1 and y are linearly
related?

ANSWER:
H 0 : β 1 = 0 vs. H 1 : β 1 ≠ 0
Rejection region: | t | > t0.005,19 = 2.861, Test statistic: t = -2.117
Conclusion: Don't reject the null hypothesis. No, we cannot conclude that x1 and y are linearly related.

132. The computer output for the multiple regression model y = β 0 + β 1 x1 + β 2 x 2 + ε is shown
below. However, because of a printer malfunction some of the results are not shown.
These are indicated by the boldface letters a to i. Fill in the missing results (up to three
decimal places).

Predictor Coef StDev T


Constant a 6.15 4.11
x1 3.51 b 1.25
x2 -0.71 0.30 c

S=d R-Sq = e

ANALYSIS OF VARIANCE

Source of Variation df SS MS F
Regression 2 412 g i
Error 37 f h
Total 39 974

ANSWER:
a = 25.277 b = 2.808 c = -2.367 d = 3.897 e = .423
f = 562 g = 206 h = 15.189 i = 13.5623
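The missing entries follow from the standard ANOVA identities (SSE = SST − SSR, MS = SS/df, F = MSR/MSE, S = √MSE, R² = SSR/SST, and Coef = StDev × T). A minimal Python sketch of the computation; the variable names are illustrative, not part of the output:

    # Fill in the missing ANOVA output from standard identities.
    k = 2                       # number of independent variables
    df_total = 39               # given, so n = 40
    SSR, SST = 412.0, 974.0     # given sums of squares
    SSE = SST - SSR             # f = 562
    MSR = SSR / k               # g = 206
    MSE = SSE / (df_total - k)  # h = 15.189
    F = MSR / MSE               # i = 13.562
    S = MSE ** 0.5              # d = 3.897 (standard error of estimate)
    R_sq = SSR / SST            # e = 0.423
    # Coefficient table: T = Coef / StDev, so
    a = 6.15 * 4.11             # Coef of constant = 25.277
    b = 3.51 / 1.25             # StDev of x1 = 2.808
    c = -0.71 / 0.30            # T of x2 = -2.367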

FOR QUESTIONS 133 THROUGH 140, USE THE FOLLOWING NARRATIVE:


Narrative: Life Expectancy
An actuary wanted to develop a model to predict how long individuals will live. After consulting
a number of physicians, she collected the age at death (y), the average number of hours of
exercise per week ( x1 ), the cholesterol level ( x 2 ), and the number of points that the individual’s
blood pressure exceeded the recommended value ( x3 ). A random sample of 40 individuals was
selected. The computer output of the multiple regression model is shown below.

THE REGRESSION EQUATION IS

y = 55.8 + 1.79 x1 − 0.021x 2 − 0.016 x3

Predictor Coef StDev T


Constant 55.8 11.8 4.729
x1 1.79 0.44 4.068
x2 -0.021 0.011 -1.909
x3 -0.016 0.014 -1.143

S = 9.47 R-Sq = 22.5%

ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 3 936 312 3.477
Error 36 3230 89.722
Total 39 4166

133. {Life Expectancy Narrative} Is there enough evidence at the 10% significance level to
infer that the model is useful in predicting length of life?

ANSWER:
H 0 : β1 = β 2 = β 3 = 0
H 1 : At least one β i is not equal to zero.
Rejection region: F > F0.05,3,36 ≈ 2.84
Test statistic: F = 3.477
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 10% significance level to infer that the model is useful in predicting length of life.

134. {Life Expectancy Narrative} Is there enough evidence at the 1% significance level to
infer that the average number of hours of exercise per week and the age at death are
linearly related?

ANSWER:
H 0 : β 1 = 0 vs. H 1 : β 1 ≠ 0
Rejection region: | t | > t0.005,36 ≈ 2.724
Test statistic: t = 4.068
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 1% significance level to infer that the average number of hours of exercise per week and the age at death are linearly related.

135. {Life Expectancy Narrative} Is there enough evidence at the 5% significance level to
infer that the cholesterol level and the age at death are negatively linearly related?

ANSWER:
H 0 : β 2 = 0 vs. H 1 : β 2 < 0
Rejection region: t < - t0.05,36 ≈ -1.69
Test statistic: t = -1.909
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 5% significance level to infer that the cholesterol level and the age at death are negatively linearly related.

136. {Life Expectancy Narrative} Is there sufficient evidence at the 5% significance level to
infer that the number of points that the individual’s blood pressure exceeded the
recommended value and the age at death are negatively linearly related?

ANSWER:
H 0 : β 3 = 0 vs. H 1 : β 3 < 0
Rejection region: t < - t0.05,36 ≈ -1.69
Test statistic: t = -1.143
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 5% significance level to infer that the number of points that the individual's blood pressure exceeded the recommended value and the age at death are negatively linearly related.

137. {Life Expectancy Narrative} What is the coefficient of determination? What does this
statistic tell you?

ANSWER:
R 2 = 0.225. This means that 22.5% of the variation in the age at death is explained by the
three variables: the average number of hours of exercise per week, the cholesterol level,
and the number of points that the individual’s blood pressure exceeded the recommended
value, while 77.5% of the variation remains unexplained.
138. {Life Expectancy Narrative} Interpret the coefficient b1 .

ANSWER:
b1 = 1.79. This tells us for each additional hour increase of exercise per week, the age at
death on average is extended by 1.79 years (assuming that the other independent
variables in the model are held constant).

139. {Life Expectancy Narrative} Interpret the coefficient b2 .

ANSWER:
b2 = -0.021. This tells us that for each additional unit increase in the cholesterol level, the
age at death on average is shortened by .021 years or equivalently about a week
(assuming that the other independent variables in the model are held constant).

140. {Life Expectancy Narrative} Interpret the coefficient b3 .

ANSWER:
b3 = −0.016. This tells us that for each additional point that the individual's blood pressure exceeds the recommended value, the age at death on average is shortened by 0.016 years, or equivalently about six days (assuming that the other independent variables in the model are held constant).

FOR QUESTIONS 141 THROUGH 147, USE THE FOLLOWING NARRATIVE:


Narrative: Demographic Variables and TV
A statistician wanted to determine if the demographic variables of age, education, and income
influence the number of hours of television watched per week. A random sample of 25 adults
was selected to estimate the multiple regression model: y = β 0 + β 1 x1 + β 2 x 2 + β 3 x3 + ε , where
y is the number of hours of television watched last week, x1 is the age (in years), x 2 is the
number of years of education, and x3 is income (in $1,000). The computer output is shown
below.

THE REGRESSION EQUATION IS

y = 22.3 + 0.41x1 − 0.29 x 2 − 0.12 x3

Predictor Coef StDev T


Constant 22.3 10.7 2.084
x1 0.41 0.19 2.158
x2 -0.29 0.13 -2.231
x3 -0.12 0.03 -4.00

S = 4.51 R-Sq = 34.8%



ANALYSIS OF VARIANCE

Source of Variation df SS MS F
Regression 3 227 75.667 3.730
Error 21 426 20.286
Total 24 653

141. {Demographic Variables and TV Narrative} Test the overall validity of the model at the
5% significance level.

ANSWER:
H 0 : β1 = β 2 = β 3 = 0
H 1 : At least one β i is not equal to zero.
Rejection region: F > F0.05,3,21 = 3.07
Test statistic: F = 3.73
Conclusion: Reject the null hypothesis. The model is valid at α = .05.

142. {Demographic Variables and TV Narrative} Is there sufficient evidence at the 1%


significance level to indicate that hours of television watched and age are linearly
related?

ANSWER:
H 0 : β 1 = 0 vs. H 1 : β 1 ≠ 0
Rejection region: | t | > t0.005,21 = 2.831
Test statistic: t = 2.158
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 1% significance level to indicate that hours of television watched and age are linearly related.

143. {Demographic Variables and TV Narrative} Is there sufficient evidence at the 1%


significance level to indicate that hours of television watched and education are
negatively linearly related?

ANSWER:
H 0 : β 2 = 0 vs. H 1 : β 2 < 0
Rejection region: t < - t0.01,21 = -2.518
Test statistic: t = -2.231
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 1% significance level to indicate that hours of television watched and education are negatively linearly related.

144. {Demographic Variables and TV Narrative} What is the coefficient of determination?


What does this statistic tell you?

ANSWER:
R 2 = 0.348. This means that 34.8% of the variation in the number of hours of television
watched per week is explained by the three variables: age, number of years of education,
and income, while 65.2% remains unexplained.

145. {Demographic Variables and TV Narrative} Interpret the coefficient b1 .

ANSWER:
b1 = 0.41. This tells us that for each additional year of age, the number of hours of
television watched per week on average increases by 0.41 (assuming that the other
independent variables in the model are held constant).

146. {Demographic Variables and TV Narrative} Interpret the coefficient b2 .

ANSWER:
b2 = -0.29. This tells us that for each additional year of education, the number of hours of
television watched per week on average decreases by 0.29 (assuming that the other
independent variables in the model are held constant).

147. {Demographic Variables and TV Narrative} Interpret the coefficient b3 .

ANSWER:
b3 = -0.12. This tells us that for each additional $1,000 in income, the number of hours of television watched per week on average decreases by 0.12 (assuming that the other independent variables in the model are held constant).

FOR QUESTIONS 148 THROUGH 155, USE THE FOLLOWING NARRATIVE:


Narrative: Family Expenditure on Clothes
An economist wanted to develop a multiple regression model to enable him to predict the annual
family expenditure on clothes. After some consideration, he developed the multiple regression
model y = β 0 + β 1 x1 + β 2 x 2 + β 3 x3 + ε , where y is the annual family clothes expenditure (in
$1,000), x1 is the annual household income (in $1,000), x 2 is the number of family members,
and x3 is the number of children under 10 years of age. The computer output is shown below.

THE REGRESSION EQUATION IS

y = 1.74 + 0.091x1 + 0.93 x 2 + 0.26 x3

Predictor Coef StDev T


Constant 1.74 0.630 2.762
x1 0.091 0.025 3.640
x2 0.93 0.290 3.207
x3 0.26 0.180 1.444

S = 2.06 R-Sq = 59.6%

ANALYSIS OF VARIANCE

Source of df SS MS F
Variation
Regression 3 288 96 22.647
Error 46 195 4.239
Total 49 483

148. {Family Expenditure on Clothes Narrative} Test the overall model’s validity at the 5%
significance level.

ANSWER:
H 0 : β1 = β 2 = β 3 = 0
H 1 : At least one β i is not equal to zero.
Rejection region: F > F0.05,3,46 ≈ 2.84
Test statistic: F = 22.647
Conclusion: Reject the null hypothesis. Yes, the model is valid at α = .05.

149. {Family Expenditure on Clothes Narrative} Test at the 5% significance level to determine


whether annual household income and annual family clothes expenditure are positively
linearly related.

ANSWER:
H 0 : β 1 = 0 vs. H 1 : β1 > 0
Rejection region: t > t0.05,46 ≈ 1.68
Test statistic: t = 3.64
Conclusion: Reject the null hypothesis. Yes, annual household income and annual family
clothes expenditure are positively linearly related.

150. {Family Expenditure on Clothes Narrative} Test at the 1% significance level to


determine whether the number of family members and annual family clothes expenditure
are linearly related.

ANSWER:
H 0 : β 2 = 0 vs. H1 : β 2 ≠ 0
Rejection region: | t | > t0.005,46 ≈ 2.69
Test statistic: t = 3.207
Conclusion: Reject the null hypothesis. Yes, the number of family members and annual
family clothes expenditure are linearly related.

151. {Family Expenditure on Clothes Narrative} Test at the 1% significance level to


determine whether the number of children under 10 years of age and annual family
clothes expenditure are linearly related.

ANSWER:
H 0 : β 3 = 0 vs. H1 : β 3 ≠ 0
Rejection region: | t | > t0.005,46 ≈ 2.69
Test statistic: t = 1.444
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence to conclude that the
number of children under 10 years of age and annual family clothes expenditure are
linearly related.

152. {Family Expenditure on Clothes Narrative} What is the coefficient of determination?


What does this statistic tell you?

ANSWER:
R 2 = 0.596. This means that 59.6% of the variation in the annual family clothes
expenditure is explained by the three variables: annual household income, number of
family members, and number of children under 10 years of age, while 40.4% of the
variation remains unexplained.

153. {Family Expenditure on Clothes Narrative} Interpret the coefficient b1.

ANSWER:
b1 = 0.091. This tells us that for each additional $1000 in annual household income, the
annual family clothes expenditure increases on average by $91, assuming that the number
of family members, and the number of children under 10 years of age in the model are
held constant.

154. {Family Expenditure on Clothes Narrative} Interpret the coefficient b2 .

ANSWER:
b2 = 0.93. This tells us that for each additional family member, the annual family clothes
expenditure increases on average by $930, assuming that the annual household income,
and the number of children under 10 years of age in the model are held constant.

155. {Family Expenditure on Clothes Narrative} Interpret the coefficient b3 .

ANSWER:
b3 = 0.26. This tells us that for each additional child under the age of 10, the annual
family clothes expenditure increases on average by $260, assuming that the number of
family members and the annual household income in the model are held constant.

FOR QUESTIONS 156 THROUGH 163, USE THE FOLLOWING NARRATIVE:


Narrative: Student’s Final Grade
A statistics professor investigated some of the factors that affect an individual student’s final
grade in his course. He proposed the multiple regression model y = β 0 + β 1 x1 + β 2 x 2 + β 3 x3 + ε ,
where y is the final mark (out of 100), x1 is the number of lectures skipped, x 2 is the number
of late assignments, and x3 is the mid-term test mark (out of 100). The professor recorded the
data for 50 randomly selected students. The computer output is shown below.

THE REGRESSION EQUATION IS

ŷ = 41.6 − 3.18 x1 − 1.17 x 2 + .63 x3

Predictor Coef StDev T

Constant 41.6 17.8 2.337
x1 -3.18 1.66 -1.916
x2 -1.17 1.13 -1.035
x3 0.63 0.13 4.846

S = 13.74 R-Sq = 30.0%

ANALYSIS OF VARIANCE

Source of Variation df SS MS F
Regression 3 3716 1238.667 6.558
Error 46 8688 188.870
Total 49 12404

156. {Student’s Final Grade Narrative} What is the coefficient of determination? What does
this statistic tell you?

ANSWER:
R 2 = 0.30. This means that 30% of the variation in the student’s final grade in statistics
is explained by the three variables: number of lectures skipped, number of late
assignments, and mid-term test grade, while 70% remains unexplained.

157. {Student’s Final Grade Narrative} Do these data provide enough evidence to conclude at
the 5% significance level that the model is useful in predicting the final mark?

ANSWER:
H 0 : β1 = β 2 = β 3 = 0
H 1 : At least one β i is not equal to zero.
Rejection region: F > F0.05,3,46 ≈ 2.84
Test statistic: F = 6.558
Conclusion: Reject the null hypothesis. Yes, the model is useful in predicting the final
mark.

158. {Student’s Final Grade Narrative} Do these data provide enough evidence to conclude at
the 5% significance level that the final mark and the number of skipped lectures are
linearly related?

ANSWER:
H 0 : β 1 = 0 vs. H 1 : β 1 ≠ 0
Rejection region: | t | > t0.025,46 ≈ 2.014
Test statistic: t = -1.916
Conclusion: Don't reject the null hypothesis. No, there is not enough evidence to conclude at the 5% significance level that the final mark and the number of skipped lectures are linearly related.

159. {Student’s Final Grade Narrative} Do these data provide enough evidence at the 5%
significance level to conclude that the final mark and the number of late assignments are
negatively linearly related?

ANSWER:
H 0 : β 2 = 0 vs. H 1 : β 2 < 0
Rejection region: t < - t0.05,46 ≈ -1.679
Test statistic: t = -1.035
Conclusion: Don't reject the null hypothesis. No, there is not enough evidence at the 5% significance level to conclude that the final mark and the number of late assignments are negatively linearly related.

160. {Student’s Final Grade Narrative} Do these data provide enough evidence at the 1%
significance level to conclude that the final mark and the mid-term mark are positively
linearly related?

ANSWER:
H 0 : β 3 = 0 vs. H 1 : β 3 > 0
Rejection region: t > t0.01,46 ≈ 2.412
Test statistic: t = 4.846
Conclusion: Reject the null hypothesis. Yes, these data provide enough evidence at the
1% significance level to conclude that the final mark and the mid-term mark are
positively linearly related.

161. {Student’s Final Grade Narrative} Interpret the coefficient b1 .

ANSWER:
b1 = -3.18. This tells us that for each additional lecture skipped, the student’s final score
on average decreases by 3.18 points, assuming that the number of late assignments, and
the mid-term test mark (out of 100) in the model are held constant.

162. {Student’s Final Grade Narrative} Interpret the coefficient b2 .

ANSWER:
b2 = -1.17. This tells us that for each additional late assignment, the student’s final score
on average decreases by 1.17 points, assuming that the number of lectures skipped, and
the mid-term test mark (out of 100) in the model are held constant.

163. {Student’s Final Grade Narrative} Interpret the coefficient b3 .

ANSWER:
b3 = 0.63. This tells us that for each additional point on the mid-term test, the student's final score on average increases by 0.63 points, assuming that the number of lectures skipped and the number of late assignments in the model are held constant.
FOR QUESTIONS 164 THROUGH 182, USE THE FOLLOWING NARRATIVE:
Narrative: Real Estate
A real estate builder wishes to determine how house size is influenced by family income, family
size, and education of the head of household. House size is measured in hundreds of square feet,
income is measured in thousands of dollars, and education is measured in years. A partial
computer output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.865
R Square 0.748
Adjusted R Square 0.726
Standard Error 5.195
Observations 50

ANOVA
df SS MS F Signif F
Regression 3605.7736 901.4434 0.0001
Residual 1214.2264 26.9828
Total 49 4820.0000

Coeff. St. Error t Stat P-value


Intercept – 1.6335 5.8078 – 0.281 0.7798
Family Income 0.4485 0.1137 3.9545 0.0003
Family Size 4.2615 0.8062 5.286 0.0001
Education – 0.6517 0.4319 – 1.509 0.1383

164. {Real Estate Narrative} What percentage of the variability in house size is explained by
income?

ANSWER:
74.8% of the variability in house size is explained by income

165. {Real Estate Narrative} Which of the independent variables in the model are significant
at the 2% level?

ANSWER:
Family income and family size

166. {Real Estate Narrative} Which of the following values for the level of significance is the smallest for which all explanatory variables are significant individually: α = .01, .05, .10, and .15?

ANSWER:
α = .15

167. {Real Estate Narrative} When the builder used a simple linear regression model with
house size as the dependent variable and education as the independent variable, he
obtained an r 2 value of 23.0%. What additional percentage of the total variation in
house size has been explained by including family size and income in the multiple
regression?

ANSWER:
74.8% - 23.0% = 51.8%. This means that additional 51.8% of the total variation in house
size has been explained by including family size and income in the multiple regression.

168. {Real Estate Narrative} Which of the following values for the level of significance is the
smallest for which at least two explanatory variables are significant individually: α = .01,
.05, .10, and .15?

ANSWER:
α = .01

169. {Real Estate Narrative} Which of the following values for the level of significance is the
smallest for which the regression model as a whole is significant: α = .00005, .001, .01,
and .05?

ANSWER:
α = .001

170. {Real Estate Narrative} What is the predicted house size for an individual earning an
annual income of $40,000, having a family size of 4, and having 13 years of education?

ANSWER:
2488 square feet
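Worked check: ŷ = −1.6335 + 0.4485(40) + 4.2615(4) − 0.6517(13) = 24.88 hundred square feet, or about 2,488 square feet.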

171. {Real Estate Narrative} What minimum annual income would an individual with a
family size of 4 and 16 years of education need to attain a predicted 10,000 square foot
home?

ANSWER:
$211,850

172. {Real Estate Narrative} What minimum annual income would an individual with a
family size of 9 and 10 years of education need to attain a predicted 5,000 square foot
home?

ANSWER:
$44,140
173. {Real Estate Narrative} One individual in the sample had an annual income of $100,000,
a family size of 10, and an education of 16 years. This individual owned a home with an
area of 7,000 square feet. What is the residual (in hundreds of square feet) for this data
point?

ANSWER:
-5.40

174. {Real Estate Narrative} One individual in the sample had an annual income of $10,000, a
family size of 1, and an education of 8 years. This individual owned a home with an area
of 1,000 square feet (House = 10.00). What is the residual (in hundreds of square feet) for
this data point?

ANSWER:
y − ŷ = 10.00 − 1.899 = 8.101, or about 810 square feet, since ŷ = −1.6335 + 0.4485(10) + 4.2615(1) − 0.6517(8) = 1.899

175. {Real Estate Narrative} Suppose the builder wants to test whether the coefficient on
income is significantly different from 0. What is the value of the relevant t – statistic?

ANSWER:
t = 3.9545

176. {Real Estate Narrative} At the 0.01 level of significance, what conclusion should the
builder draw regarding the inclusion of income in the regression model?

ANSWER:
Income is significant in explaining house size and should be included in the model
because its p value of .0003 is less than 0.01.

177. {Real Estate Narrative} Suppose the builder wants to test whether the coefficient on
education is significantly different from 0. What is the value of the relevant t – statistic?

ANSWER:
t = - 1.509

178. {Real Estate Narrative} What is the value of the calculated F test statistic that is missing
from the output for testing whether the whole regression model is significant?

ANSWER:
F = 901.4434/26.9828 = 33.408

179. {Real Estate Narrative} At the 0.01 level of significance, what conclusion should the
builder draw regarding the inclusion of education in the regression model?

ANSWER:
Education is not significant in explaining house size and should not be included in the
model because its p-value of 0.1383 is larger than 0.01.

180. {Real Estate Narrative} What are the regression degrees of freedom that are missing from
the output?

ANSWER:
df = 3605.7736/901.4434 = 4

181. {Real Estate Narrative} What are the residual degrees of freedom that are missing from
the output?

ANSWER:
df = 1214.2264/26.9828 = 45

182. {Real Estate Narrative} The observed value of the F – statistic is missing from the
printout. What are the numerator and denominator degrees of freedom for this F –
statistic?

ANSWER:
df = 4 for the numerator, and 45 for the denominator

183. Three predictor variables are being considered for use in a linear regression model. Given
the correlation matrix below, does it appear that multicollinearity could be a problem?

x1 x2 x3
x1 1.000
x2 0.025 1.000
x3 0.968 0.897 1.000

ANSWER:
It appears that multicollinearity could be a problem because x3 is highly correlated with
both x1 and x2 .
184. Discuss some of the signals for the presence of multicollinearity.

ANSWER:
There are several clues to the presence of multicollinearity:
a. An independent variable known to be an important predictor ends up having a partial
regression coefficient that is not significant.
b. A partial regression coefficient exhibits the wrong sign.
c. When an independent variable is added or deleted, the partial regression coefficients for the other variables change dramatically.
A more practical way to identify multicollinearity is through the examination of a correlation matrix, which shows the correlation of each variable with each of the other variables. A high correlation between two independent variables is an indication of multicollinearity.

185. A statistician estimated the multiple regression model y = β0 + β1x1 + β2x2 + ε, with 45 observations. The computer output is shown below. However, because of a printer malfunction, some of the results are not shown. These are indicated by the boldface letters a to l. Fill in the missing results (up to three decimal places).

Predictor Coef StDev T


Constant a 3.51 2.03
x1 21.6 b 4.73
x2 -12.5 7.61 c

S=d R-Sq = e

ANALYSIS OF VARIANCE

Source of Variation df SS MS F
Regression f i j l
Error g 388 k
Total h 519

ANSWER:
a = 7.125 b = 4.567 c = -1.643 d = 3.039 e = .252 f =2
g = 42 h = 44 i = 131 j = 65.5 k = 9.238 l = 7.090
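Worked check, using the same ANOVA identities as in question 132: f = 2 and h = n − 1 = 44, so g = 44 − 2 = 42; i = SSR = 519 − 388 = 131; j = MSR = 131/2 = 65.5; k = MSE = 388/42 = 9.238; l = F = 65.5/9.238 = 7.090; d = √9.238 = 3.039; e = R² = 131/519 = 0.252; a = 3.51 × 2.03 = 7.125; b = 21.6/4.73 = 4.567; c = −12.5/7.61 = −1.643.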

186. What is meant by multicollinearity?

ANSWER:
Multicollinearity is a condition which indicates that two or more of the independent
variables are highly correlated with each other.

187. A multiple regression equation has been developed for y = daily attendance at a
community swimming pool, x1 = temperature (degrees Fahrenheit), and x2 = weekend
versus weekday, ( x2 =1 for Saturday and Sunday, and 0 for other days of the week.) For
the regression equation shown below, interpret each partial regression coefficient:
yˆ = 100 + 10 x1 + 175 x2 .

ANSWER:
The partial regression coefficient for x1 implies that, holding the day of the week
constant, a one degree Fahrenheit increase in the temperature will result in an increase of
10 in attendance. The partial regression coefficient for x2 implies that the attendance
increases by 175 people on Saturdays and Sundays (assuming a constant temperature).
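
As a check on the interpretation, the fitted equation can be coded directly; the
90-degree Saturday comparison below is an illustrative example, not part of the
original question:

    def predicted_attendance(temp_f: float, weekend: int) -> float:
        """Fitted model from question 187: x2 = 1 for Saturday/Sunday, else 0."""
        return 100 + 10 * temp_f + 175 * weekend

    # Holding temperature fixed at 90 degrees, the weekend effect is the
    # partial regression coefficient of x2: 175 additional attendees.
    print(predicted_attendance(90, 1) - predicted_attendance(90, 0))  # 175.0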

SECTION 4
MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

188. If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
a. Normality of the errors
b. Independence of errors
c. Homoscedasticity
d. None of the above.
ANSWER: b

189. If the Durbin-Watson statistic d has values smaller than 2, this indicates
a. a positive first-order autocorrelation
b. a negative first-order autocorrelation
c. no first-order autocorrelation at all
d. None of the above.
ANSWER: a

190. If the Durbin-Watson statistic d has values greater than 2, this indicates
a. a positive first-order autocorrelation
b. a negative first-order autocorrelation
c. no first-order autocorrelation at all
d. None of the above.
ANSWER: b

191. If the Durbin-Watson statistic has a value close to 4, which assumption is violated?
a. Normality of the errors
b. Independence of errors
c. Homoscedasticity
d. None of the above
ANSWER: b

192. The range of the values of the Durbin-Watson statistic d is
a. -4 ≤ d ≤ 4
b. -2 ≤ d ≤ 2
c. 0 ≤ d ≤ 4
d. 0 ≤ d ≤ 2
ANSWER: c

193. Which of the following statements is false?
a. Time series data refer to data that are gathered at a specific period of time
b. First-order autocorrelation is a condition in which a relationship exists between
consecutive residuals ei and ei-1, where i is the time period
c. Time series data refer to data that are gathered sequentially over a series of time
periods
d. None of the above
ANSWER: a

194. The Durbin-Watson test is used to test for positive first-order autocorrelation by
comparing its statistic value d to the critical values dL and dU available in most statistics
books. Which of the following statements is true?
a. If d < dL, we conclude that there is enough evidence to show that positive first-order
autocorrelation exists.
b. If d > dU, we conclude that there is not enough evidence to show that positive
first-order autocorrelation exists.
c. If dL ≤ d ≤ dU, we conclude that the test is inconclusive.
d. All of the above
ANSWER: d

195. In reference to the Durbin-Watson statistic d and the critical values dL and dU, which
of the following statements is false?
a. If d > 4 - dL, we conclude that negative first-order autocorrelation exists.
b. If d < 4 - dU, we conclude that there is not enough evidence to show that negative
first-order autocorrelation exists.
c. If dU ≤ d ≤ 4 - dU, we conclude that there is no evidence of first-order
autocorrelation.
d. None of the above
ANSWER: d

196. In reference to the Durbin-Watson statistic d and the critical values dL and dU, which
of the following statements is false?
a. If d < dL, we conclude that positive first-order autocorrelation exists.
b. If d > dU, we conclude that there is not enough evidence to show that positive
first-order autocorrelation exists.
c. If d < dL or d > 4 - dL, we conclude that there is no evidence of first-order
autocorrelation.
d. None of the above
ANSWER: c

TRUE / FALSE QUESTIONS

197. The Durbin-Watson d statistic is used to check the assumption of normality.
ANSWER: F

198. The Durbin-Watson test allows the statistics practitioner to determine whether there is
evidence of first-order autocorrelation.
ANSWER: T
199. The Durbin-Watson statistic d is defined as
d = Σ_{i=2}^{n} (ei - ei-1)² / Σ_{i=1}^{n} ei, where ei is the residual at time period i.
ANSWER: F
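
For reference, the correct statistic squares both the consecutive differences in the
numerator and the residuals in the denominator, which is why the statement above is
false. A minimal numpy sketch, with a made-up illustrative residual series:

    import numpy as np

    def durbin_watson(e: np.ndarray) -> float:
        """d = sum_{i=2..n}(e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
        return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

    e = np.array([0.5, -0.2, 0.1, 0.4, -0.3, 0.2])  # illustrative residuals
    print(round(durbin_watson(e), 3))  # values near 2 suggest uncorrelated errors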

200. The range of the values of the Durbin-Watson statistic d is 0 ≤ d ≤ 4.
ANSWER: T

201. Time series data refer to data that are gathered sequentially over a series of time periods.
ANSWER: T

202. Small values of the Durbin-Watson statistic d (d < 2) indicate a negative first-order
autocorrelation.
ANSWER: F

203. Large values of the Durbin-Watson statistic d (d > 2) indicate a positive first-order
autocorrelation.
ANSWER: F

204. If the value of the Durbin-Watson statistic d satisfies the inequality dL ≤ d ≤ dU,
where dL and dU are the critical values for d, then the test for positive first-order
autocorrelation is inconclusive.
ANSWER: T

205. If the value of the Durbin-Watson test statistic d satisfies the inequality d > 4 - dL,
where dL is a critical value of d, we conclude that positive first-order autocorrelation exists.
ANSWER: F

206. If the value of the Durbin-Watson test statistic d satisfies the inequalities d < dL or
d > 4 - dL, where dL is a critical value of d, we conclude that first-order
autocorrelation exists.
ANSWER: T

STATISTICAL CONCEPTS & APPLIED QUESTIONS

207. Test the hypotheses H0: There is no first-order autocorrelation vs. H1: There is
negative first-order autocorrelation, given that: Durbin-Watson statistic d = 1.75, n = 20,
k = 2, and α = 0.01.

ANSWER:
dL = 0.86 and dU = 1.27
The decision is made as follows:
If d > 4 - dL = 3.14, reject the null hypothesis and conclude that negative autocorrelation
is present.
If 2.73 = 4 - dU ≤ d ≤ 4 - dL = 3.14, the test is inconclusive.
If d < 4 - dU = 2.73, we conclude that there is no evidence of negative autocorrelation.
Since d = 1.75, we conclude that there is no evidence of negative autocorrelation.

208. Test the hypotheses H0: There is no first-order autocorrelation vs. H1: There is
positive first-order autocorrelation, given that: Durbin-Watson statistic d = 1.12, n = 45,
k = 5, and α = 0.05.

ANSWER:
dL = 1.29 and dU = 1.78
The decision is made as follows:
If d < dL = 1.29, reject the null hypothesis and conclude that positive autocorrelation is
present.
If 1.29 = dL ≤ d ≤ dU = 1.78, the test is inconclusive.
If d > dU = 1.78, we conclude that there is no evidence of positive autocorrelation.
Since d = 1.12, we reject the null hypothesis and conclude that positive autocorrelation is
present.
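
The three-zone decision rule used in questions 207 and 208 can be encoded directly;
dL and dU still have to be read from a Durbin-Watson table for the given n, k, and α
(the call below uses the question-208 values):

    def dw_positive_test(d: float, dL: float, dU: float) -> str:
        """One-sided Durbin-Watson test for positive first-order autocorrelation."""
        if d < dL:
            return "reject H0: positive autocorrelation is present"
        if d <= dU:
            return "test is inconclusive"
        return "no evidence of positive autocorrelation"

    print(dw_positive_test(1.12, dL=1.29, dU=1.78))  # reject H0 (question 208)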

209. If the residuals in a regression analysis of time-ordered data are not correlated, the
value of the Durbin-Watson d statistic should be near __________.

ANSWER:
2
210. If the value of the Durbin-Watson statistic d is small (d < 2), this indicates that a
__________ (positive/negative) first-order autocorrelation exists.

ANSWER:
positive

211. Test the hypotheses H0: There is no first-order autocorrelation vs. H1: There is
first-order autocorrelation, given that: Durbin-Watson statistic d = 1.89, n = 28, k = 3,
and α = 0.05.

ANSWER:
dL = 0.97 and dU = 1.41
The decision is made as follows:
If d < dL = 0.97 or d > 4 - dL = 3.03, reject the null hypothesis and conclude that
autocorrelation is present.
If 0.97 = dL ≤ d ≤ dU = 1.41, or 2.59 = 4 - dU ≤ d ≤ 4 - dL = 3.03, the test is
inconclusive.
If 1.41 = dU < d < 4 - dU = 2.59, we conclude that there is no evidence of autocorrelation.
Since d = 1.89, we conclude that there is no evidence of autocorrelation.
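
The two-tailed version combines the positive and negative rejection zones. A sketch
under the same assumption that dL and dU come from a table (here the question-211
values):

    def dw_two_tailed_test(d: float, dL: float, dU: float) -> str:
        """Two-tailed Durbin-Watson test for first-order autocorrelation."""
        if d < dL or d > 4 - dL:
            return "reject H0: autocorrelation is present"
        if dU <= d <= 4 - dU:
            return "no evidence of autocorrelation"
        return "test is inconclusive"

    print(dw_two_tailed_test(1.89, dL=0.97, dU=1.41))  # no evidence (question 211)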

212. If the value of the Durbin-Watson statistic d is large (d > 2), this indicates that a
__________ (positive/negative) first-order autocorrelation exists.

ANSWER:
negative

213. To use the Durbin-Watson test to test for positive first-order autocorrelation, the null
hypothesis will be H0: __________ (there is, there is no) first-order autocorrelation.

ANSWER:
there is no

214. To use the Durbin-Watson test to test for negative first-order autocorrelation, the null
hypothesis will be H0: __________ (there is, there is no) first-order autocorrelation.

ANSWER:
there is no

215. The range of the values of the Durbin-Watson statistic d is __________.

ANSWER:
0 ≤ d ≤ 4

216. Given that the Durbin-Watson test is conducted to test for positive first-order
autocorrelation with α = 0.05, n = 20, and two independent variables in the model,
the critical values for the test are dL = __________ and dU = __________, respectively.

ANSWER:
1.10 and 1.54
