
Chapter 5

Diagnosing Model Problems

Outline
5.1 Multicollinearity
5.2 Heteroskedasticity
5.3 Autocorrelation

5.1 Multicollinearity

5.1.1 The nature of multicollinearity


5.1.2 Estimation in the presence of multicollinearity.
5.1.3 Detection of multicollinearity
5.1.4 Curing the problem

5.1.1. The Nature of Multicollinearity
• Originally it meant the existence of a “perfect,” or exact
linear relationship among some or all explanatory variables
of a regression model.
• Today, it includes perfect multicollinearity and less than
perfect multicollinearity.
• Wooldridge (2004): High (but not perfect) correlation
between two or more independent variables is called
multicollinearity.
• Perfect multicollinearity:
λ1X1 + λ2X2 + · · · + λkXk = 0
• Imperfect (less than perfect) multicollinearity:
λ1X1 + λ2X2 + · · · + λkXk + vi = 0
where vi is a stochastic error term.
5.1.1. The Nature of Multicollinearity

• A numerical example:
X3i = 5X2i → there is perfect collinearity between X2 and X3.
• The variable X3* was created from X3 by simply adding to it the numbers (vi = 2, 0, 7, 9, 2). Now there is no longer perfect collinearity between X2 and X3*. However, the two variables are highly correlated: the coefficient of correlation between them is 0.9959.
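A minimal sketch of this example in Python (numpy): the X2 values below are illustrative assumptions, since the original table is not reproduced here; only the added noise vi = 2, 0, 7, 9, 2 comes from the slide.

import numpy as np

X2 = np.array([10.0, 15.0, 18.0, 24.0, 30.0])   # hypothetical X2 values
X3 = 5 * X2                                      # X3i = 5*X2i: perfect collinearity with X2
v = np.array([2.0, 0.0, 7.0, 9.0, 2.0])          # stochastic error added to X3
X3_star = X3 + v                                 # no longer perfectly collinear with X2

print(np.corrcoef(X2, X3)[0, 1])        # exactly 1.0
print(np.corrcoef(X2, X3_star)[0, 1])   # roughly 0.9959: still highly correlated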
5.1.2. Estimation in the presence of multicollinearity
Perfect multicollinearity

If there is perfect collinearity, the coefficients β̂2 and β̂3 cannot be estimated.

Perfect collinearity → X3i = λX2i, where λ is a nonzero constant

→ the estimators are indeterminate
5.1.2. Estimation in the presence of multicollinearity
High multicollinearity
• The variances and covariances of β̂2 and β̂3 are given by
var(β̂2) = σ² / [Σx2i²(1 − r23²)]
var(β̂3) = σ² / [Σx3i²(1 − r23²)]
cov(β̂2, β̂3) = −r23σ² / [(1 − r23²)√(Σx2i²)√(Σx3i²)]
where r23 is the coefficient of correlation between X2 and X3.
• As r23 tends toward 1 (i.e., as collinearity increases), the variances and covariance of the estimators increase.
• Perfect collinearity: r23 = 1 and the variances are infinite.
Practical consequences of Multicollinearity

High multicollinearity
1. The OLS estimators have large variances and covariances,
making precise estimation difficult.
2. The confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. The t ratio of one or more coefficients tends to be statistically
insignificant.
4. Although the t ratio of one or more coefficients is statistically
insignificant, R2, the overall measure of goodness of fit, can be
very high.
5. The OLS estimators and their standard errors can be sensitive
to small changes in the data.
Example

Example
• Income and wealth together explain about 96 percent of the variation in consumption expenditure.
• Neither of the slope coefficients is individually
statistically significant.
• Not only is the wealth variable statistically
insignificant but also it has the wrong sign.
• H0 (β2 = β3 = 0) is rejected (F = 92.40) → consumption expenditure is related to income and wealth.
→ When collinearity is high, tests on individual
regressors are not reliable.
5.1.3. Detection of Multicollinearity

• High R² but insignificant t ratios. If R² is high, say in excess of 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero.
5.1.3. Detection of Multicollinearity

• High pair-wise correlations among regressors: a rule of thumb says that if a pair-wise correlation is high, say in excess of 0.8, then multicollinearity is a serious problem. → This is a sufficient but not a necessary condition. In models involving more than two explanatory variables, the pair-wise correlations will not provide an infallible guide to the presence of multicollinearity.
• Auxiliary regressions: to regress each Xi on the remaining
X variables and compute the corresponding R2. If the
computed F exceeds the critical Fi at the chosen level of
significance, it is taken to mean that the particular Xi is
collinear with other X’s;
5.1.3. Detection of Multicollinearity

• The speed with which the variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as:
VIF = 1 / (1 − r23²)
• Using this definition, we can express:
var(β̂2) = (σ² / Σx2i²) · VIF and var(β̂3) = (σ² / Σx3i²) · VIF
5.1.3. Detection of Multicollinearity

• Tolerance and variance inflation factor: if the VIF of a variable exceeds 10, which will happen if Rj² exceeds 0.90, that variable is said to be highly collinear.
• The inverse of the VIF is called tolerance (TOL). That is, TOLj = 1/VIFj = 1 − Rj².
• When Rj² = 1 (i.e., perfect collinearity), TOLj = 0.
• When Rj² = 0 (i.e., no collinearity whatsoever), TOLj = 1.
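A minimal sketch of computing VIF and TOL with Python and statsmodels; X is assumed to be a pandas DataFrame holding the explanatory variables, and the function name is illustrative.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    exog = sm.add_constant(X)          # intercept included in each auxiliary regression
    rows = []
    for j, name in enumerate(exog.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(exog.values, j)   # 1 / (1 - Rj^2)
        rows.append({"variable": name, "VIF": vif, "TOL": 1.0 / vif})
    return pd.DataFrame(rows)

Following the rule of thumb above, a VIF greater than 10 (Rj² above 0.90) flags that variable as highly collinear.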

5.1.4. Curing the problem

• Do Nothing:
• A priori information.
• Combining cross-sectional and time series data.
• Dropping a variable(s) and specification bias
• Transformation of variables
• Additional or new data.

Do Nothing
• Multicollinearity is essentially a data
deficiency problem and sometimes we have no
choice over the data we have available for
empirical analysis.

A priori information

Ex: Suppose we consider the Cobb-Douglas production function of a country:

Qt = A Lt^α Kt^β e^(Ut), or in logs: ln Qt = ln A + α ln Lt + β ln Kt + Ut

Or: Qt* = A* + α Lt* + β Kt* + Ut

- High correlation between K and L leads to large variances of the coefficient estimators.

- Based on the findings in prior literature, we know that the country has constant returns to scale: α + β = 1.
A priori information

Replacing β with 1 − α, we obtain:

Qt* = A* + α Lt* + (1 − α) Kt* + Ut

Qt* − Kt* = A* + α (Lt* − Kt*) + Ut

or Yt* = A* + α Zt* + Ut

where Yt* = Qt* − Kt* and Zt* = Lt* − Kt*.
→ We estimate α̂ and compute β̂ = 1 − α̂.
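A minimal sketch of imposing the constant-returns restriction in Python with statsmodels: regress (ln Q − ln K) on (ln L − ln K) and recover β̂ = 1 − α̂. The array names Q, L, K are assumed inputs.

import numpy as np
import statsmodels.api as sm

def crs_cobb_douglas(Q, L, K):
    y = np.log(Q) - np.log(K)                    # Y* = Q* - K*
    z = np.log(L) - np.log(K)                    # Z* = L* - K*
    res = sm.OLS(y, sm.add_constant(z)).fit()
    alpha_hat = res.params[1]
    beta_hat = 1.0 - alpha_hat                   # recovered from alpha + beta = 1
    return alpha_hat, beta_hat, res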

Combining cross-sectional and time series data
• Examine the demand for automobiles
ln Yt = β1 + β2 ln Pt + β3 ln It + ut
Where Y= number of cars sold, P= average
price, I= income, t= time.
→We estimate price elasticity and income
elasticity.
→In time series data, the price and income
variables tend to be highly collinear.

Combining cross-sectional and time series data
• If we have cross-sectional data, for example data generated by consumer panels or budget studies conducted by various private and governmental agencies, we can obtain a fairly reliable estimate of the income elasticity β̂3.
• Time series regression: Yt* = β1 + β2 ln Pt + ut
• where Yt* = ln Yt − β̂3 ln It

• Dropping a variable(s) and specification bias
-When we drop the wealth variable, the income variable is
now highly significant.
- But we may be committing a specification bias or specification error. Economic theory says that income and wealth should both be included in the model explaining consumption expenditure, so dropping the wealth variable would constitute specification bias.
- The remedy may be worse than the disease. Multicollinearity may prevent precise estimation of the parameters, whereas omitting a variable may seriously mislead us as to the true values of the parameters.
Transformation of variables - the first-difference form
Regression model: Yt = β0 + β1X1t + β2X2t + Ut

→ It must also hold at time (t − 1):

Yt−1 = β0 + β1X1,t−1 + β2X2,t−1 + Ut−1

→ Yt − Yt−1 = β1(X1t − X1,t−1) + β2(X2t − X2,t−1) + (Ut − Ut−1)

→ The first-difference regression model often reduces the severity of multicollinearity.
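A minimal sketch of the first-difference transformation with pandas and statsmodels; the DataFrame df and the column names Y, X1, X2 are assumptions, and the constant is dropped because it cancels in the differenced equation.

import pandas as pd
import statsmodels.api as sm

def first_difference_fit(df: pd.DataFrame):
    d = df[["Y", "X1", "X2"]].diff().dropna()     # Yt - Yt-1, X1t - X1,t-1, X2t - X2,t-1
    return sm.OLS(d["Y"], d[["X1", "X2"]]).fit()  # no intercept: beta0 cancels out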

Additional or new data
• Increasing the size of the sample may
attenuate the collinearity problem.

• As the sample size increases, Σxi² will increase → the variance of the estimators will decrease.
Assignments
• Questions 10.3, 10.10 in p.376-377, Gujarati.
• Problems 10.26-10.30 in p.382-384, Gujarati.

5.2 HETEROSCEDASTICITY
5.2.1. The nature of heteroscedasticity
5.2.2. OLS estimators
5.2.3. Detecting heteroscedasticity
5.2.4. Correcting for Heteroscedasticity

5.2.1. The Nature of Heteroscedasticity

▪ Assumption 4: Homoscedasticity, or equal variance of ui:
Var(ui|Xi) = σ²
The variation around the regression line is the same across the X values; it neither increases nor decreases as X varies.
▪ If the variances of ui are not the same,
Var(ui|Xi) = σi²
there is heteroscedasticity.

Examples or reasons for the heteroscedasticity (1)
▪ Error-learning models: as people learn, their errors of
behavior become smaller over time.
▪ As incomes grow, people have more discretionary income
and hence more scope for choice about the disposition of
their income. → The variance of consumption/savings is
likely to increase with income.
▪ Companies with larger profits are generally expected to
show greater variability in their dividend policies than
companies with lower profits.
▪ Growth-oriented companies are likely to show more
variability in their dividend payout ratio than established
companies.
Examples or reasons for the heteroscedasticity (2)
▪ Heteroscedasticity can also arise as a result of the presence
of outliers.
▪ An outlier is an observation that is much different (either
very small or very large) in relation to the observations in
the sample.
▪ An outlier is an observation from a different population to
that generating the remaining sample observations.
▪ The inclusion and exclusion of outliers, especially if the
sample is small can substantially alter the results of
regression analysis.

Examples or reasons for the heteroscedasticity (3)
▪ The heteroscedasticity may be due to the fact that
some important variables are omitted from the
model.
▪ Another source of the heteroscedasticity is skewness
in the distribution of one or more regressors.
▪ Due to incorrect data transformation or incorrect
functional form.
▪ Heteroscedasticity is likely to happen in cross-
sectional analysis.

5.2.2. OLS estimation in the presence of heteroscedasticity

• Two-variable model: Yi = β1 + β2Xi + ui

β̂2 = Σxiyi / Σxi² = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²)

If σ1² ≠ σ2² ≠ σ3² ≠ ..., i.e. heteroscedasticity:

var(β̂2) = Σxi²σi² / (Σxi²)²

If σ1² = σ2² = σ3² = ... = σ², i.e. homoscedasticity:

var(β̂2) = σ² / Σxi²
Consequences of Heteroscedasticity

▪ The OLS estimators remain unbiased: E(β̂k) = βk (k = 0, 1, ..., K).
▪ However, they no longer have minimum variance, so OLS is not efficient and the usual standard errors (and hence the t and F tests) are unreliable.
5.2.3. Detecting Heteroscedasticity

1. Graphical Method
2. Park Test
3. Glejser Test
4. Goldfeld–Quandt Test
5. Breusch–Pagan–Godfrey Test
6. White’s General Heteroscedasticity Test

1. Graphical Method
• If there is no a priori or empirical information about the nature of the heteroscedasticity, in practice we can do the regression analysis on the assumption that there is no heteroscedasticity and then do a post-mortem examination of the squared residuals ûi² to see whether they exhibit any systematic pattern.

• We plot ûi² against the estimated Ŷi to find out whether the estimated mean value of Y is systematically related to the squared residuals.
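A minimal sketch of this post-mortem plot in Python with matplotlib; results is assumed to be a fitted statsmodels OLS result object.

import matplotlib.pyplot as plt

def plot_squared_residuals(results):
    u2 = results.resid ** 2              # squared residuals
    yhat = results.fittedvalues          # estimated mean values of Y
    plt.scatter(yhat, u2)
    plt.xlabel("fitted values")
    plt.ylabel("squared residuals")
    plt.title("Post-mortem check for a systematic pattern")
    plt.show()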

Example- Testing for Heteroskedasticity
• Housing Price Equations (HPRICE1.WF1):
y = f(lotsize, sqrft, bdrms)
Where: y = House price ($1000)
lotsize= size of lot in square feet
sqrft= size of house in square feet
bdrms= number of bedrooms

Example- Testing for Heteroskedasticity
• EViews output

Example-Graphic method
[Scatter plot of squared residuals against the linear prediction (fitted values)]
2. Park Test
• Park suggests that σi² is some function of the explanatory variable Xi and proposes the functional form:

σi² = σ² Xi^β e^(vi)

or: ln σi² = ln σ² + β ln Xi + vi

• Since σi² is generally unknown, Park suggests using ûi² as a proxy and running the regression:

ln ûi² = ln σ² + β ln Xi + vi = α + β ln Xi + vi
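A minimal sketch of the Park test in Python with statsmodels, as an alternative to the EViews commands shown next; results (a fitted OLS result) and the regressor X are assumed inputs.

import numpy as np
import statsmodels.api as sm

def park_test(results, X):
    ln_u2 = np.log(results.resid ** 2)                       # proxy for ln(sigma_i^2)
    aux = sm.OLS(ln_u2, sm.add_constant(np.log(X))).fit()    # ln u^2 = alpha + beta ln X + v
    return aux.params[1], aux.tvalues[1], aux.pvalues[1]     # significant beta -> heteroscedasticity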
Example-Park test
• EViews commands:

genr luhatsq = log(uhat^2)
genr llotsize = log(lotsize)
genr lsqrft = log(sqrft)
genr lbdrms = log(bdrms)

Example - Park test using EViews
ls luhatsq c lsqrft

There is a statistically significant relationship between the two variables. Following the Park test, we conclude that heteroskedasticity exists in the error variance.

3. Glejser Test

Glejser suggests regressing the absolute value of the residuals, |ûi|, on the explanatory variable, using functional forms such as:
|ûi| = β1 + β2Xi + vi,  |ûi| = β1 + β2√Xi + vi,  |ûi| = β1 + β2(1/Xi) + vi
Use a t-test to test:
H0: β2 = 0 (Homoscedasticity)
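A minimal sketch of the Glejser test in Python with statsmodels, trying two of the functional forms above; results and X are assumed inputs.

import numpy as np
import statsmodels.api as sm

def glejser_test(results, X):
    abs_u = np.abs(results.resid)
    out = {}
    for label, z in {"X": X, "sqrt(X)": np.sqrt(X)}.items():
        aux = sm.OLS(abs_u, sm.add_constant(z)).fit()        # |u| = b1 + b2*f(X) + v
        out[label] = (aux.params[1], aux.pvalues[1])         # test H0: b2 = 0
    return out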

Example - Glejser Test
Use HPRICE1.WF1 with EViews

There is a statistically significant relationship between the two variables. Following the Glejser test, we conclude that heteroskedasticity exists in the error variance.
4. Breusch–Pagan–Godfrey Test
• The k-variable linear regression model:
Yi = β1 + β2X2i + ... + βkXki + ui
• Assume that the error variance σi² is described as:
σi² = f(α1 + α2Z2i + ... + αmZmi)
• Specifically, assume that:
σi² = α1 + α2Z2i + ... + αmZmi
→ σi² is a linear function of the nonstochastic variables Z. If α2 = α3 = ... = αm = 0, then σi² = α1, which is a constant.
→ To test whether the error variance is homoscedastic, we can test the hypothesis that α2 = α3 = ... = αm = 0.
4. Breusch–Pagan–Godfrey Test
• Step 1: Estimate the k-variable linear regression model by OLS and obtain the residuals ûi.
• Step 2: Obtain σ̃² = Σûi² / n.
• Step 3: Construct the variables pi defined as pi = ûi² / σ̃².
• Step 4: Regress pi on the Z's (or the X's): pi = α1 + α2Z2i + ... + αmZmi + vi.
• Step 5: Obtain the ESS of this regression and define Θ = ESS / 2.
• Step 6: Under the null hypothesis of homoscedasticity, Θ asymptotically follows the chi-square distribution with (m − 1) degrees of freedom.
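Rather than coding the steps above by hand, a minimal sketch using the ready-made Breusch-Pagan routine in statsmodels; results is a fitted OLS result and the Z variables are taken to be the model's own regressors.

from statsmodels.stats.diagnostic import het_breuschpagan

def bpg_test(results):
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
    return lm_stat, lm_pvalue      # a small p-value rejects homoscedasticity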

5. White’s General Heteroscedasticity Test

The White test proceeds as follows:


• Step 1. Given the data, we estimate Yi = β1 + β2X2i + β3X3i + ui and obtain the residuals ûi.
• Step 2. We then run the following (auxiliary) regression:
ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi
and obtain the R² from this auxiliary regression.
• Step 3. The null hypothesis of no heteroscedasticity is:
α2 = α3 = α4 = α5 = α6 = 0
• Step 4. The sample size (n) times the R² obtained from the auxiliary regression asymptotically follows the chi-square distribution, with df equal to the number of regressors (excluding the constant) in the auxiliary regression.
• If n·R² exceeds the critical chi-square value at the chosen level of significance, the conclusion is that there is heteroscedasticity.
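A minimal sketch of White's test in Python with statsmodels; the auxiliary regression with squares and cross-products is built internally, and results is assumed to be a fitted OLS result.

from statsmodels.stats.diagnostic import het_white

def white_test(results):
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, results.model.exog)
    return lm_stat, lm_pvalue      # lm_stat is n*R^2 from the auxiliary regression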
Example

White's test for Ho: homoskedasticity


against Ha: unrestricted heteroskedasticity
chi2(9) = 33.73
Prob > chi2 = 0.0001
→ Reject H0.

The Method of Generalized Least Squares (GLS)
• Let us continue with the two-variable model: Yi = β1 + β2Xi + ui

5.2.4 Correcting for Heteroscedasticity
• When σi² is known: the method of weighted least squares
• When σi² is not known
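A minimal sketch of weighted least squares for the known-σi² case, using statsmodels; sigma2 is the assumed (or estimated) error variance for each observation, and the weights are 1/σi².

import statsmodels.api as sm

def wls_fit(y, X, sigma2):
    return sm.WLS(y, sm.add_constant(X), weights=1.0 / sigma2).fit()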

When σi² is Not Known
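When σi² is not known, one common option (a sketch under that assumption, not the only remedy) is to keep the OLS point estimates and use White's heteroscedasticity-consistent (robust) standard errors:

import statsmodels.api as sm

def ols_with_robust_se(y, X):
    return sm.OLS(y, sm.add_constant(X)).fit(cov_type="HC1")   # White-type robust SEs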

Example
• EViews output

5.3. AUTOCORRELATION
5.3.1. The nature of autocorrelation
5.3.2. OLS estimation in the presence of
autocorrelation
5.3.3. Detection of autocorrelation
5.3.4. Correcting for autocorrelation

5.3.1. The nature of autocorrelation
Assumption 5:
cov(ut, us)=0 for all t≠s

This assumption states that the disturbances ut


and us are independently distributed, which is
called serial independence.

5.3.1. The nature of autocorrelation
If this assumption is no longer valid, then the
disturbances are not pair-wise independent,
but pair-wise auto-correlated (or Serially
Correlated).
This means that an error occurring at period t
may be carried over to the next period t+1.

Autocorrelation is most likely to occur in time


series data.
Why does serial correlation occur?

• Inertia
• Specification Bias: Excluded variables case.
• Specification Bias: Incorrect functional form
• Cobweb phenomenon
• Lags
• “Manipulation” of Data
• Data transformation

Why does serial correlation occur?

• Inertia: A salient feature of most economic time


series is inertia or sluggishness: GDP, price
indexes, production, employment and
unemployment exhibit cycles.

Why does serial correlation occur?
• Specification Bias: Excluded variables case:
- Suppose we have the following demand model:
Yt = β1 + β2X2t + β3X3t + β4X4t + Ut

- where Y = quantity of beef demanded, X2 = price of beef, X3 = consumer income, X4 = price of pork, t = time.
- For some reason we run the model:

Yt = β1 + β2X2t + β3X3t + Vt,  where Vt = β4X4t + Ut

→ The error term will reflect a systematic pattern, thus creating autocorrelation.
Why does serial correlation occur?
• Specification Bias: Incorrect functional form
- Suppose the correct model in a cost-output study is:
MCi = β1 + β2Qi + β3Qi² + Ui
where MC = marginal cost, Q = output.
- But we run the model: MCi = β1 + β2Qi + Vi
- so that: Vi = β3Qi² + Ui
→ The error term will reflect a systematic pattern, thus creating autocorrelation.
Why does serial correlation occur?
• Cobweb phenomenon: The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement:
Supplyt = β1 + β2Pt−1 + ut
• At the end of period t, price Pt turns out to be lower than Pt−1. Therefore, in period t + 1, farmers may very well decide to produce less than they did in period t. In this situation, the disturbances ut are not expected to be random, because if farmers overproduce in year t, they are likely to reduce their production in t + 1, and so on, leading to a cobweb pattern.
Why does serial correlation occur?
• Lags: In a time series regression of consumption
expenditure on income, it is not uncommon to find that the
consumption expenditure in the current period depends ,
among other things, on the consumption expenditure of the
previous period. Consumers do not change their
consumption habits readily for psychological, technological
or institutional reasons.

consumptiont = β1 + β2 incomet + β3 consumptiont−1 + ut


• This regression is known as autoregression because one of
the explanatory variables is the lagged value of the
dependent variable.

Why does serial correlation occur?
• Manipulation of data: in time series regressions involving quarterly data, such data are usually derived from monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. This smoothness may itself lend a systematic pattern to the disturbances, thereby introducing autocorrelation.

First-Order Autocorrelation
The simplest and most commonly observed is the
first-order autocorrelation.
Consider the multiple regression model:
Yt=β1+β2X2t+β3X3t+β4X4t+…+βkXkt+ut
in which the current observation of the error term
ut is a function of the previous (lagged)
observation of the error term:
ut=ρut-1+et

First-Order Autocorrelation
The coefficient ρ is called the first-order
autocorrelation coefficient and takes values from
-1 to +1.
This is called a first-order autoregressive scheme, AR(1).
It is obvious that the size of ρ will determine the
strength of serial correlation.
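A minimal sketch simulating an AR(1) error process ut = ρut−1 + et in Python, to illustrate how ρ governs the strength of serial correlation; the parameter values are illustrative.

import numpy as np

def simulate_ar1(n=200, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)              # white-noise innovations e_t
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]    # u_t = rho * u_{t-1} + e_t
    return u

With rho near 0 the series looks random; near +1 successive errors move together (positive autocorrelation); near −1 they tend to alternate in sign (negative autocorrelation).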

First-Order Autocorrelation
Three different cases:
(a) If ρ is zero, then we have no autocorrelation.
(b) If ρ approaches unity, the value of the previous observation of the error becomes more important in determining the value of the current error, and therefore a high degree of autocorrelation exists. In this case we have positive autocorrelation.
(c) If ρ approaches −1, we have a high degree of negative autocorrelation.

Higher-Order Autocorrelation
Second-order when:
ut=ρ1ut-1+ ρ2ut-2+et
Third-order when
ut=ρ1ut-1+ ρ2ut-2+ρ3ut-3 +et
p-th order when:
ut=ρ1ut-1+ ρ2ut-2+ρ3ut-3 +…+ ρput-p +et

5.3.2. OLS estimation in the presence of
autocorrelation

Yt = β1 + β2Xt + ut,   with E(ui uj) ≠ 0 (i ≠ j)

➢ Assume that the error term can be modeled as: ut = ρut−1 + εt, with −1 < ρ < 1
➢ ρ is known as the coefficient of autocovariance, and the error term εt satisfies the standard OLS assumptions.
5.3.2. OLS estimation in the presence of
autocorrelation
β̂2 = Sxy / Sxx = Σxtyt / Σxt²,   var(β̂2) = σ² / Σxt²

Under AR(1) autocorrelation:

var(β̂2)AR(1) = (σ² / Σxt²) [1 + 2ρ Σ(t=1..n−1) xtxt+1 / Σxt² + 2ρ² Σ(t=1..n−2) xtxt+2 / Σxt² + ... + 2ρ^(n−1) x1xn / Σxt²]

The coefficient estimator is still linear and unbiased. However, it no longer has minimum variance → it is not BLUE.
The BLUE estimator in the presence of autocorrelation

➢ Under the AR(1) process, the BLUE estimator of β2 is given by:

β̂2(GLS) = [Σ(t=2..n) (xt − ρxt−1)(yt − ρyt−1)] / [Σ(t=2..n) (xt − ρxt−1)²] + C

var(β̂2(GLS)) = σ² / [Σ(t=2..n) (xt − ρxt−1)²] + D

where C and D are correction factors that may be disregarded in practice.
Under autocorrelation, the estimators obtained by the method of GLS are BLUE. The method of GLS is covered in advanced courses.
Consequences of Using OLS in the presence
of autocorrelation
➢ The OLS estimators are no longer BLUE, and even if we use the usual variances, the confidence intervals derived from them are likely to be wider than those based on the GLS procedure.
➢ Hypothesis testing: we are likely to declare a coefficient statistically insignificant even though in fact it may be significant.
➢ One should therefore use GLS rather than OLS.

Consequences of Using OLS in the presence
of autocorrelation
➢ The estimated variance of the error term is likely to underestimate the true variance.
➢ R² is likely to be overestimated.
➢ Therefore, the usual t and F tests of significance are no longer valid and, if applied, are likely to give seriously misleading conclusions about the statistical significance of the estimated regression coefficients.

5.3.3. Detecting Autocorrelation
1. Graphical Method
2. The Runs Test
3. The Durbin Watson Test
4. A general test of autocorrelation: The
Breusch-Godfrey (BG) Test

1. Graphical Method
There are various ways of examining the residuals.
• The time sequence plot: Plot residuals against time.
• Plot the standardized residuals against time. The standardized residuals are the residuals divided by the standard error of the regression.
→ If either plot shows a pattern, the errors may not be random.

Example
• Effects of inflation and deficits on interest rate
(intdef.dta).

i3t = β1 + β2 inft + β3 deft + ut

• where i3= the three-month T-bill rate, inf=


annual inflation rate, def= the federal budget
deficit as a percentage of GDP.

Example
• EViews output

Example
The time-sequence plot of the residuals against time:

predict i3hat, xb
genr uhat = i3 - i3hat
scatter uhat year
Example
The standardized residuals against time (sduhat = uhat divided by the root MSE of the regression, 1.8432):

genr sduhat = uhat/1.8432
scatter sduhat year
2. The Runs Test
➢ Consider the list of estimated error terms; each residual can be positive or negative. In the following sequence there are three runs:
➢ (─ ─ ─ ─ ─ ─ ─ ─ ─) (+ + + + + + + + + + + + + + + + + + + + +) (─ ─ ─ ─ ─ ─ ─ ─ ─ ─)
➢ A run is defined as an uninterrupted sequence of one symbol or attribute, such as + or −.
➢ The length of a run is the number of elements in it. The sequence above has three runs: the first run has 9 minuses, the second has 21 pluses, and the last has 10 minuses.
2. The Runs Test
➢ By examining how runs behave in a strictly random sequence of observations, one can derive a test of the randomness of runs.
➢ Are the 3 runs observed in our illustrative example of 40 observations too many or too few compared with the number of runs expected in a strictly random sequence of 40 observations?
➢ If there are too many runs, the residuals change sign frequently, indicating negative serial correlation.
➢ If there are too few runs, this suggests positive autocorrelation.
2. The Runs Test

➢ Define
➢ N: total number of observations (N=N1+N2)
➢ N1: number of + symbols (i.e. + residuals)
➢ N2: number of ─ symbols (i.e. ─ residuals)
➢ R: number of runs
➢ Assuming that the N1 >10 and N2 >10, then
the number of runs is normally distributed
with:
2. The Runs Test

➢ Then:

E(R) = 2N1N2/N + 1

σR² = 2N1N2(2N1N2 − N) / [N²(N − 1)]

➢ If the null hypothesis of randomness is sustainable, then following the properties of the normal distribution we should expect that
Prob[E(R) − 1.96σR ≤ R ≤ E(R) + 1.96σR] = 0.95
➢ Decision rule: do not reject the null hypothesis of randomness with 95% confidence if R, the number of runs, lies in the preceding confidence interval; reject it otherwise.
The Runs Test - Example
R = 3, N1 = 19, N2 = 21, N = 40
E(R) = 2(19)(21)/40 + 1 = 20.95
σR = 3.1134

The 95% confidence interval for R in our example is:
20.95 ± 1.96 × 3.1134 = (14.85, 27.05)
Since R = 3 lies outside this interval, we reject the hypothesis of randomness: the residuals exhibit (positive) autocorrelation.
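A minimal sketch of the arithmetic behind this example in Python, using the formulas from the previous slide.

import math

N1, N2, R = 19, 21, 3
N = N1 + N2
ER = 2 * N1 * N2 / N + 1                                                   # expected runs: 20.95
sigma_R = math.sqrt(2 * N1 * N2 * (2 * N1 * N2 - N) / (N**2 * (N - 1)))    # about 3.1134
low, high = ER - 1.96 * sigma_R, ER + 1.96 * sigma_R                       # roughly (14.85, 27.05)
print("reject randomness" if R < low or R > high else "do not reject randomness")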

3. The Durbin Watson Test
➢ The most celebrated test for detecting serial
correlation.
➢ The Durbin-Watson d statistic:

d = Σ(t=2..n) (ût − ût−1)² / Σ(t=1..n) ût²
➢ It is simply the ratio of the sum of squared differences in successive residuals to the RSS.
➢ The number of observations in the numerator is n − 1, as one observation is lost in taking successive differences.
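A minimal sketch of computing d (and the implied ρ̂, using the approximation d ≈ 2(1 − ρ̂) shown two slides below) with statsmodels; results is assumed to be a fitted OLS result.

from statsmodels.stats.stattools import durbin_watson

def dw(results):
    d = durbin_watson(results.resid)     # sum of squared successive differences / RSS
    rho_hat = 1 - d / 2
    return d, rho_hat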
The assumptions underlying the d statistic

- The regression model includes the intercept term.


- The explanatory variables are nonstochastic, or fixed in
repeated sampling.
- The disturbances are generated by the first order
autoregressive scheme: ut=ρut-1+et

The assumptions underlying the d statistic

- The error term is assumed to be normally distributed.


- The regression model does not include the lagged value(s) of the dependent variable as one of the explanatory variables.
- There are no missing values in the data.
- Durbin-Watson have derived a lower bound dL and an
upper bound dU such that if the computed d lies
outside these critical values, a decision can be made
regarding the presence of positive or negative serial
correlation.

d statistic

d=
 t  t −1 − 2 uˆ t uˆ t −1
ˆ
u 2
+ ˆ
u 2

 21 −
 uˆ uˆ
t t −1


 t
ˆ
u 2 
  uˆ 2
t


d  2(1 − ̂ )

➢ Where ˆ =
 uˆ uˆ
t t −1

 uˆ 2
t

➢ But since -1 ≤  ≤ 1, this implies that 0 ≤ d ≤ 4.

101
d statistic
➢ If the statistic lies near the value 2, there is no
serial correlation.
➢ But if the statistic lies in the vicinity of 0, there
is positive serial correlation.
➢ The closer the d is to zero, the greater the
evidence of positive serial correlation.
➢ If it lies in the vicinity of 4, there is evidence of
negative serial correlation
➢ If it lies between dL and dU, or between 4 − dU and 4 − dL, we are in the zone of indecision.
3. The Durbin Watson Test

0 to dL: reject H0; evidence of positive autocorrelation
dL to dU: zone of indecision
dU to 4 − dU: no autocorrelation
4 − dU to 4 − dL: zone of indecision
4 − dL to 4: reject H0; evidence of negative autocorrelation
Modified d test

➢ Use the modified d test if d lies in the zone of indecision. Given the level of significance α:
➢ H0: ρ = 0 versus H1: ρ > 0: reject H0 at the α level if d < dU. That is, there is statistically significant evidence of positive autocorrelation.
➢ H0: ρ = 0 versus H1: ρ < 0: reject H0 at the α level if 4 − d < dU. That is, there is statistically significant evidence of negative autocorrelation.
➢ H0: ρ = 0 versus H1: ρ ≠ 0: reject H0 at the 2α level if d < dU or 4 − d < dU. That is, there is statistically significant evidence of either positive or negative autocorrelation.

The mechanics of the Durbin-Watson test

➢ Run the OLS regression and obtain the residuals


➢ Compute d
➢ For the given sample size and given number of
explanatory variables, find out the critical dL and dU.
➢ Follow the decision rule.

Example

Because this is time-series data, we should consider the possibility of autocorrelation. To run the Durbin-Watson test, we first declare the data as time series with the tsset command and then use the dwstat command.
Durbin-Watson d-statistic(3, 56) = 0.7161527
dL = 1.490, dU = 1.641
Since d < dL, we reject H0: there is evidence of positive autocorrelation.
4. The Breusch – Godfrey
➢ The BG test, also known as the LM test, is a general
test for autocorrelation in the sense that it allows for
(1) nonstochastic regressors such as the lagged values of
the regressand;
(2) higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and
(3) simple or higher-order moving averages of white-noise error terms.

The Breusch – Godfrey

➢ Consider the following model:

Yt = β1 + β2Xt + ut

ut = ρ1ut−1 + ρ2ut−2 + ... + ρput−p + εt

H0: ρ1 = ρ2 = ... = ρp = 0

➢ Estimate the regression by OLS and obtain the residuals ût.
➢ Run the auxiliary regression of ût on Xt and the lagged residuals ût−1, ..., ût−p, and obtain its R².
The Breusch – Godfrey

➢ If the sample size is large, Breusch and


Godfrey have shown that (n – p) R2 follow a
chi-square p df.
➢ If (n – p) R2 exceeds the critical value at the
chosen level of significance, we reject the null
hypothesis, in which case at least one rho is
statistically different from zero.
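A minimal sketch of the BG (LM) test in Python with statsmodels; results is a fitted OLS result and p is the chosen lag order of the autoregressive scheme.

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

def bg_test(results, p=2):
    lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(results, nlags=p)
    return lm_stat, lm_pvalue     # a small p-value: at least one rho differs from zero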

Example
Applying EViews

4. The Breusch – Godfrey

Points to note:
➢ The regressors included in the regression model may contain lagged values of the regressand Y. In the DW test, this is not allowed.
➢ The BG test is applicable even if the disturbances follow a pth-order moving-average (MA) process, that is, if ut is generated as:
ut = εt + λ1εt−1 + λ2εt−2 + ... + λpεt−p
➢ A drawback of the BG test is that the value of p, the length of the lag, cannot be specified a priori.
Model Misspecification vs. Pure Autocorrelation (read p. 441, Gujarati)
➢ It is important to find out whether the autocorrelation is pure autocorrelation and not the result of mis-specification of the model.
➢ Suppose that the Durbin-Watson test of a given regression model gives a value of 0.716. This indicates positive autocorrelation.
➢ However, could this correlation have arisen because the model was not correctly specified?
➢ Time series often exhibit trends, so add a trend variable to the equation.
5.3.4 Curing the problem (read p. 442, Gujarati)
We have four options:
1. Try to find out whether the autocorrelation is pure autocorrelation and not the result of mis-specification of the model. As we discussed in Section 5.3.1, sometimes we observe patterns in residuals because the model is mis-specified (that is, it has excluded some important variables) or because its functional form is incorrect.

5.3.4 Curing the problem
2. If it is pure autocorrelation, one can use an appropriate transformation of the original model so that the transformed model does not have the problem of (pure) autocorrelation. As in the case of heteroscedasticity, we will have to use some type of generalized least squares (GLS) method.
3. In large samples, we can use the Newey-West method to obtain standard errors of the OLS estimators that are corrected for autocorrelation. This method is actually an extension of White's heteroscedasticity-consistent standard errors method discussed in the context of heteroscedasticity. (A sketch of options 2 and 3 follows this list.)
4. In some situations we can continue to use the OLS method.
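A minimal sketch of options 2 and 3 in Python with statsmodels: a feasible GLS fit assuming AR(1) errors, and an OLS fit with Newey-West (HAC) standard errors; y, X, and the lag length are assumed inputs.

import statsmodels.api as sm

def remedies_for_autocorrelation(y, X, maxlags=4):
    Xc = sm.add_constant(X)
    gls_res = sm.GLSAR(y, Xc, rho=1).iterative_fit(maxiter=10)                    # GLS with AR(1) errors
    hac_res = sm.OLS(y, Xc).fit(cov_type="HAC", cov_kwds={"maxlags": maxlags})    # Newey-West SEs
    return gls_res, hac_res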
