QM 2 CHAP16
Chapter 14
Additional Topics in Regression Analysis
Chap 16-1
Chapter Goals
Model Verification
Chap 16-4
The Stages of Model Building
(continued)
Stages: Model Specification → Coefficient Estimation → Model Verification (*) → Interpretation and Inference
At the Model Verification stage:
Logically evaluate regression results in light of the model (i.e., are coefficient signs correct?)
Are any coefficients biased or illogical?
Evaluate regression assumptions (i.e., are residuals random and independent?)
If any problems are suspected, return to model specification and adjust the model
Chap 16-5
The Stages of Model Building
(continued)
Model Specification
Chap 16-7
Dummy Variable Models
(More than 2 Levels)
Consider a categorical variable with K levels
The number of dummy variables needed is one
less than the number of levels, K – 1
Example:
y = house price ; x1 = square feet
If style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy
variables are needed
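As a hedged illustration (not from the original slides), the sketch below builds the K – 1 = 2 dummy columns for a three-level style variable with pandas; the data values and column names are made up.

```python
import pandas as pd

# Made-up style values with three levels: ranch, split level, condo
styles = pd.Series(["ranch", "condo", "split level", "ranch", "condo"], name="style")

# K - 1 = 2 dummy variables; drop_first makes the first level (here "condo",
# since the levels sort alphabetically) the default category with all dummies = 0
dummies = pd.get_dummies(styles, prefix="style", drop_first=True).astype(int)
print(dummies)   # columns: style_ranch, style_split level
```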
Chap 16-8
Dummy Variable Models
(More than 2 Levels)
(continued)
Example: Let “condo” be the default category, and let x2
and x3 be used for the other two categories:
y = house price
x1 = square feet
x2 = 1 if ranch, 0 otherwise
x3 = 1 if split level, 0 otherwise
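A minimal sketch of fitting this specification with statsmodels; the data values below are made up purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Made-up sample: price in $1000s, square feet, and the two style dummies
price = np.array([245.0, 312.5, 279.9, 199.0, 335.0, 289.0])
sqft  = np.array([1400, 2100, 1800, 1100, 2400, 1900])
ranch = np.array([1, 0, 1, 0, 0, 1])   # x2 = 1 if ranch, 0 otherwise
split = np.array([0, 1, 0, 0, 1, 0])   # x3 = 1 if split level, 0 otherwise

X = sm.add_constant(np.column_stack([sqft, ranch, split]))
fit = sm.OLS(price, X).fit()
print(fit.params)   # intercept, square-feet slope, ranch shift, split-level shift
```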
Chap 16-9
Interpreting the Dummy Variable
Coefficients (with 3 Levels)
Consider the regression equation:
ŷ = 20.43 + 0.045x1 + 23.53x2 + 18.84x3

For a condo (x2 = x3 = 0):           ŷ = 20.43 + 0.045x1
For a ranch (x2 = 1, x3 = 0):        ŷ = 20.43 + 0.045x1 + 23.53
For a split level (x2 = 0, x3 = 1):  ŷ = 20.43 + 0.045x1 + 18.84

With the same square feet, a ranch will have an estimated average price 23.53 thousand dollars higher than a condo, and a split level will have an estimated average price 18.84 thousand dollars higher than a condo.
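A quick arithmetic check of these interpretations, plugging one common square footage into the estimated equation (the 2,000 square feet used below is an arbitrary illustrative value):

```python
# Coefficients from the estimated equation (price in $1000s)
b0, b1, b2, b3 = 20.43, 0.045, 23.53, 18.84

sqft = 2000                      # illustrative common square footage
condo = b0 + b1 * sqft           # x2 = x3 = 0
ranch = b0 + b1 * sqft + b2      # x2 = 1, x3 = 0
split = b0 + b1 * sqft + b3      # x2 = 0, x3 = 1

print(round(condo, 2), round(ranch - condo, 2), round(split - condo, 2))   # 110.43 23.53 18.84
```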
Chap 16-10
Experimental Design
Consider an experiment in which
four treatments will be used, and
the outcome also depends on three environmental factors that
cannot be controlled by the experimenter
Let variable z1 denote the treatment, where z1 = 1, 2, 3,
or 4. Let z2 denote the environment factor (the
“blocking variable”), where z2 = 1, 2, or 3
Z1   X1   X2   X3
1    0    0    0
2    1    0    0
3    0    1    0
4    0    0    1

Z2   X4   X5
1    0    0
2    1    0
3    0    1
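A hedged sketch of generating this coding in Python when z1 and z2 are stored as integer codes; the example codes and column names are assumptions.

```python
import pandas as pd

# Made-up treatment (z1 = 1..4) and blocking (z2 = 1..3) codes
df = pd.DataFrame({"z1": [1, 2, 3, 4, 2, 3], "z2": [1, 2, 3, 1, 2, 3]})

# Level 1 of each factor is the default category (all of its dummies = 0)
design = pd.DataFrame({
    "x1": (df["z1"] == 2).astype(int),
    "x2": (df["z1"] == 3).astype(int),
    "x3": (df["z1"] == 4).astype(int),
    "x4": (df["z2"] == 2).astype(int),
    "x5": (df["z2"] == 3).astype(int),
})
print(design)   # matches the coding table above
```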
Chap 16-13
Experimental Design Model
Chap 16-14
Lagged Values of the
Dependent Variable
Chap 16-18
Specification Bias
(continued)
Chap 16-20
Multicollinearity
(continued)
Including two highly correlated explanatory
variables can adversely affect the regression
results
No new information is provided
Can lead to unstable coefficients (large standard errors and low t-values)
Coefficient signs may not match prior expectations
Chap 16-21
Some Indications of
Strong Multicollinearity
Incorrect signs on the coefficients
Large change in the value of a previous
coefficient when a new variable is added to the
model
A previously significant variable becomes
insignificant when a new independent variable
is added
The estimate of the standard deviation of the
model increases when a variable is added to
the model
Chap 16-22
Detecting Multicollinearity
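One common diagnostic is the variance inflation factor, VIFj = 1 / (1 − Rj²), where Rj² comes from regressing the j-th explanatory variable on the other explanatory variables; values well above 5 to 10 are usually taken as a warning sign. A minimal sketch with statsmodels, using made-up data in which x2 is nearly a copy of x1:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up data: x2 is deliberately almost identical to x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # highly correlated with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j, name in enumerate(["const", "x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, j))
# x1 and x2 show very large VIFs; x3 stays near 1
```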
Chap 16-23
Assumptions of Regression
Normality of Error
Error values (ε) are normally distributed for any given
value of X
Homoscedasticity
The probability distribution of the errors has constant
variance
Independence of Errors
Error values are statistically independent
Chap 16-24
Residual Analysis
ei = yi − ŷi
The residual for observation i, ei, is the difference
between its observed and predicted value
Check the assumptions of regression by examining the
residuals
Examine for linearity assumption
Examine for constant variance for all levels of X
(homoscedasticity)
Evaluate normal distribution assumption
Evaluate independence assumption
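A hedged sketch of these residual checks in Python; the data, variable names, and plotting choices are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Made-up data and a simple fit
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3 + 2 * x + rng.normal(scale=1.5, size=x.size)
fit = sm.OLS(y, sm.add_constant(x)).fit()
e = fit.resid                                   # ei = yi - y-hat_i

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, e)                           # linearity / constant variance: look for no pattern
axes[0].axhline(0, color="grey")
axes[0].set(xlabel="x", ylabel="residual")
axes[1].hist(e, bins=10)                        # rough check of the normality assumption
sm.qqplot(e, line="45", fit=True, ax=axes[2])   # normal Q-Q plot of the residuals
plt.tight_layout()
plt.show()
```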
Residual Analysis for Linearity
[Plots of Y vs. x and of the residuals vs. x, contrasting a not-linear relationship with a linear one]
Chap 16-26
Residual Analysis for
Homoscedasticity
[Plots of Y vs. x and of the residuals vs. x, contrasting non-constant variance with constant variance]
Chap 16-27
Residual Analysis for
Independence
[Plots of the residuals vs. X, contrasting not-independent residuals with independent residuals]
Chap 16-28
Excel Residual Output
Chap 16-31
Autocorrelated Errors
Independence of Errors
Error values are statistically independent
Autocorrelated Errors
Residuals in one time period are related to residuals in
another period
Autocorrelation violates a least squares
regression assumption
Leads to coefficient standard error (sb) estimates that are too small (i.e., biased)
Thus t-values are too large and some variables may
appear significant when they are not
Chap 16-32
Autocorrelation
Autocorrelation is correlation of the errors
(residuals) over time
[Time (t) residual plot: here, the residuals show a cyclic pattern, not random]
The Durbin-Watson statistic:

d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²

d less than 2 may signal positive autocorrelation; d greater than 2 may signal negative autocorrelation
Chap 16-34
Testing for Positive
Autocorrelation
H0: positive autocorrelation does not exist
H1: positive autocorrelation is present
Calculate the Durbin-Watson test statistic = d
d can be approximated by d = 2(1 – r) , where r is the sample
correlation of successive errors
Find the values dL and dU from the Durbin-Watson table
(for sample size n and number of independent variables K)
Decision rule: reject H0 if d < dL; the test is inconclusive if dL ≤ d ≤ dU; do not reject H0 if d > dU
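A hedged sketch of this test in Python; the residual series is made up, and the critical values dL and dU shown are placeholders rather than real table values.

```python
import numpy as np

# Made-up residuals from a time-series regression (note the cyclic pattern)
e = np.array([2.1, 1.8, 1.2, -0.5, -1.4, -2.0, -1.1, 0.3, 1.5, 2.2])

# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# dL and dU would come from a Durbin-Watson table for the actual n and K
dL, dU = 0.88, 1.32   # placeholder values, not a real table lookup
if d < dL:
    print(f"d = {d:.3f} < dL: reject H0, positive autocorrelation is present")
elif d > dU:
    print(f"d = {d:.3f} > dU: do not reject H0")
else:
    print(f"d = {d:.3f} is between dL and dU: the test is inconclusive")
```

The same statistic is also available as durbin_watson in statsmodels.stats.stattools.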
Chap 16-35
Negative Autocorrelation
Decision regions for d (scale from 0 to 4):
Reject H0 (positive autocorrelation, ρ > 0) if d < dL
Inconclusive if dL ≤ d ≤ dU
Do not reject H0 if dU < d < 4 − dU
Inconclusive if 4 − dU ≤ d ≤ 4 − dL
Reject H0 (negative autocorrelation, ρ < 0) if d > 4 − dL
Chap 16-36
Testing for Positive
Autocorrelation
(continued)
Example with n = 25:

Excel/PHStat output:
Durbin-Watson Calculations
Sum of Squared Difference of Residuals: 3296.18
Sum of Squared Residuals: 3279.98
Durbin-Watson Statistic: 1.00494

[Scatterplot of Sales vs. Time with fitted line y = 30.65 + 4.7038x, R² = 0.8976]

d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t² = 3296.18 / 3279.98 = 1.00494
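A quick check of that arithmetic from the two sums in the output above:

```python
sum_sq_diff = 3296.18    # sum of squared differences of successive residuals
sum_sq_resid = 3279.98   # sum of squared residuals
d = sum_sq_diff / sum_sq_resid
print(round(d, 5))       # 1.00494
```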
Chap 16-37
Testing for Positive
Autocorrelation
(continued)
Here, n = 25 and there is K = 1 independent variable
Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
d = 1.00494 < dL = 1.29, so reject H0 and conclude that significant positive autocorrelation exists
Therefore the linear model is not the appropriate model to forecast sales
Decision: reject H0 since d = 1.00494 < dL
Chap 16-41