
# Violations of Classical Linear Regression Assumptions

Mis-Specification
Assumption 1. $Y = X\beta + \varepsilon$

a. What if the true specification is $Y = X\beta + Z\gamma + \varepsilon$ but we leave out the relevant variable $Z$? Then the error in the estimated equation is really the sum $Z\gamma + \varepsilon$. Multiply the true regression by $X'$ to get the mis-specified OLS:

$$X'Y = X'X\beta + X'Z\gamma + X'\varepsilon.$$

The OLS estimator is

$$b = (X'X)^{-1}X'Y = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'Z\gamma + (X'X)^{-1}X'\varepsilon.$$

The last term vanishes on average, so we get $E[b] = \beta + (X'X)^{-1}X'Z\gamma$. Unless $\gamma = 0$ or, in the data, the regression of $Z$ on $X$ is zero, the OLS $b$ is biased.
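The bias formula in (a) can be checked numerically. The sketch below is only an illustration under assumed values: the sample size, coefficients, and the way `Z` is correlated with `X` are hypothetical, and the outcome is generated without noise so that the identity $b = \beta + (X'X)^{-1}X'Z\gamma$ holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = (0.8 * X[:, 1] + rng.normal(size=n)).reshape(-1, 1)  # Z is correlated with X
beta = np.array([1.0, 2.0])    # hypothetical true coefficients on X
gamma = np.array([3.0])        # hypothetical true coefficient on Z

def ols(A, y):
    """OLS coefficients (A'A)^{-1} A'y, computed by least squares."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Noise-free outcome, so the bias term can be checked exactly.
Y = X @ beta + Z @ gamma
b_short = ols(X, Y)            # regression that omits Z
bias = ols(X, Z) @ gamma       # (X'X)^{-1} X'Z gamma
print(b_short - beta, bias)    # the two vectors should agree
```

With noise added to `Y`, the same relation would hold only on average across samples.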
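b. What if the true specification is $Y = X\beta + \varepsilon$ but we include the irrelevant variable $Z$: $Y = X\beta + Z\gamma + (\varepsilon - Z\gamma)$? The error is $\varepsilon^* = \varepsilon - Z\gamma$, and $\mathrm{Var}(\varepsilon^*) = \mathrm{var}(\varepsilon) + \gamma'\,\mathrm{var}(Z)\,\gamma$.

The estimator of $[\beta\ \gamma]'$ is

$$\begin{bmatrix} b \\ g \end{bmatrix} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1} \begin{bmatrix} X'Y \\ Z'Y \end{bmatrix}.$$

The expected value of this is

$$E\begin{bmatrix} b \\ g \end{bmatrix} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1} \begin{bmatrix} X'X \\ Z'X \end{bmatrix}\beta = \begin{bmatrix} \beta \\ 0 \end{bmatrix}.$$

Thus OLS produces an unbiased estimate of the truth when irrelevant variables are added. However, the standard error of the estimate is in general enlarged by $g'Z'Zg/(n-k)$ (since $e^{*\prime}e^* = e'e - 2e'Zg + g'Z'Zg$). This could easily lead to the conclusion that $\beta = 0$ when in fact it is not.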
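A small numerical sketch of (b), under assumed values. It checks that the expected value of the over-specified estimator is $[\beta\ 0]'$, and compares the textbook sampling variances $\sigma^2(X'X)^{-1}$ and $\sigma^2([X\ Z]'[X\ Z])^{-1}$ for the slope on $X$. The variable names, the degree of correlation between `Z` and `X`, and $\sigma^2 = 1$ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 100, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = (0.9 * X[:, 1] + 0.3 * rng.normal(size=n)).reshape(-1, 1)  # irrelevant, but correlated with X
beta = np.array([1.0, 2.0])

XZ = np.hstack([X, Z])   # regressors of the over-specified model

# Expected value of [b; g] when the truth is Y = X beta + eps:
# ([X Z]'[X Z])^{-1} [X Z]' X beta
expected = np.linalg.solve(XZ.T @ XZ, XZ.T @ (X @ beta))

# Sampling variance of the slope on X without and with Z included
var_short = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
var_long = sigma2 * np.linalg.inv(XZ.T @ XZ)[1, 1]
print(expected, var_short, var_long)
```

The slope stays unbiased, but its variance is larger in the regression that carries the irrelevant, correlated regressor.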
c. What if the coefficients change within the sample, so $\beta$ is not a constant? Suppose that $\beta_i = \beta + Z_i\gamma$. Then the proper model is $Y = X(\beta + Z\gamma) + \varepsilon = X\beta + XZ\gamma + \varepsilon$. Thus we need to include the interaction term $XZ$. If we do not, then we are in situation (a) above, and the OLS estimates of the coefficients of $X$ will be biased. On the other hand, if we include the interaction term when it is not really appropriate, the estimators are unbiased but not minimum variance. We can get fooled about the true value of $\beta$.

How do you test whether the interactions belong or not? Run an unconstrained regression (which includes interactions) and then run a constrained regression (set interaction coefficients equal to zero). Then

$$\frac{(SSE_{const} - SSE_{unconst})/q}{SSE_{unconst}/(n-k)} \sim F_{q,\,n-k},$$

where $q$ is the number of interaction terms.
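The constrained-versus-unconstrained comparison can be sketched as follows. The data-generating values are hypothetical, and $k$ is taken here to be the number of coefficients in the unconstrained regression, which is an assumption about the notation. Only the F statistic is computed; comparing it with an $F_{q,\,n-k}$ critical value is left to a statistics table or library.

```python
import numpy as np

def sse(A, y):
    """Sum of squared OLS residuals from regressing y on the columns of A."""
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * z + 1.5 * x * z + rng.normal(size=n)  # a true interaction exists

A_const = np.column_stack([np.ones(n), x, z])   # interaction coefficient set to zero
A_unconst = np.column_stack([A_const, x * z])   # interaction included
q = A_unconst.shape[1] - A_const.shape[1]       # number of interaction terms
k = A_unconst.shape[1]                          # coefficients in the unconstrained model

F = ((sse(A_const, y) - sse(A_unconst, y)) / q) / (sse(A_unconst, y) / (n - k))
print(F)
```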
d. Many researchers do a "search" for the proper specification. This can lead to spurious results and we will look at this in some detail in a lecture to follow.
Censored Data and Frontier Regression
Assumption 2. $E[\varepsilon \mid X] = 0$.
Suppose that $E[\varepsilon_i \mid X] = \mu \ne 0$. Note: this is the same for all $i$. Then

$$b = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon.$$

Thus $E[b] = \beta + \mu(X'X)^{-1}X'\mathbf{1}$. The term $(X'X)^{-1}X'\mathbf{1}$ is the regression of $\mathbf{1}$ on $X$, but the first column of $X$ is $\mathbf{1}$, so the resulting regression coefficients must be $[1\ 0\ 0 \ldots 0]'$. As a result $E[b] = \beta + [\mu\ 0\ 0 \ldots 0]'$. Only the intercept is biased.
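A quick numerical check of this argument, with a hypothetical design matrix and a hypothetical $\mu = 0.7$: regressing a column of ones on an $X$ whose first column is the constant returns the coefficient vector $[1\ 0\ 0]'$, so the bias lands on the intercept only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column is the constant
ones = np.ones(n)

coef = np.linalg.solve(X.T @ X, X.T @ ones)   # (X'X)^{-1} X'1
mu = 0.7                                      # hypothetical nonzero error mean
bias = mu * coef                              # E[b] - beta
print(np.round(bias, 12))
```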
• Now suppose that $E[\varepsilon_i \mid X] = \mu_i$ but this varies with $i$. That is, the vector $\mu = (\mu_1, \ldots, \mu_n)'$ is not a constant times the vector of ones $\mathbf{1}$. By reasoning like the above, $E[b] = \beta + (X'X)^{-1}X'\mu$. The regression of $\mu$ on $X$ will in general have non-zero coefficients everywhere, and the estimate $b$ will be biased in all its elements.
In particular, what if the data were censored in the sense that only observations of $Y$ that are neither too small nor too large are included in the sample: $MIN \le Y_i \le MAX$? Then for values of $X_i$ such that $X_i\beta$ is very small or very large, only errors that are high and low respectively will lead to observations in the dataset. This can lead to the type of bias discussed above for all the coefficients, not just the intercept. See the graph below where the slope is also biased.
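The censoring bias described above can be reproduced with a small simulation. This is only a sketch: the true line, the error standard deviation, and the cutoffs `y_min` and `y_max` are hypothetical values chosen so that a noticeable share of observations is lost at both ends.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=n)   # true slope is 2

def fit(xv, yv):
    """OLS intercept and slope of yv on xv."""
    A = np.column_stack([np.ones(len(xv)), xv])
    return np.linalg.lstsq(A, yv, rcond=None)[0]

y_min, y_max = 5.0, 17.0                 # hypothetical MIN and MAX cutoffs
keep = (y >= y_min) & (y <= y_max)       # only these observations reach the sample

b_full = fit(x, y)                       # uses all observations
b_cens = fit(x[keep], y[keep])           # uses the censored sample only
print(b_full, b_cens)
```

In the censored sample the fitted slope is pulled toward zero, because low values of $X_i\beta$ survive only with high errors and high values only with low errors.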
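[Figure: scatter of Y against X with horizontal cutoffs at MIN and MAX. Points are labeled "Censored, too low" and "Censored, too high"; the lines are labeled "True Regression" and "Apparent, but biased regression line".]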
Frontier Regression: Stochastic Frontier Analysis¹

Cost Regression: $C_i = a + bQ_i + \varepsilon_i + \phi_i$

The term $a + bQ + \varepsilon$ represents the minimum cost measured with a slight measurement error $\varepsilon$. Given this, the actual costs must be above the minimum, so the inefficiency term $\phi$ must be positive. Suppose that $\phi$ has an exponential distribution:

$$f(\phi) = \frac{e^{-\phi/\lambda}}{\lambda} \quad \text{for } \phi \ge 0.$$

[Note: $E[\phi] = \lambda$ and $\mathrm{Var}[\phi] = \lambda^2$.] Suppose that the measurement error $\varepsilon \sim N(0, \sigma^2)$ and is independent of the inefficiency $\phi$. The joint probability of $\varepsilon$ and $\phi$ is

$$f(\varepsilon, \phi) = \frac{1}{\sigma\lambda\sqrt{2\pi}}\, e^{-\phi/\lambda - \frac{1}{2}\varepsilon^2/\sigma^2}.$$

Let the total error be denoted $\theta = \varepsilon + \phi$. [Note: $E[\theta] = \lambda$ and $\mathrm{Var}[\theta] = \sigma^2 + \lambda^2$.] Then the joint probability of the inefficiency and total error is

$$f(\theta, \phi) = \frac{1}{\sigma\lambda\sqrt{2\pi}}\, e^{-\phi/\lambda - \frac{1}{2}(\theta - \phi)^2/\sigma^2}.$$

The marginal distribution of the total error is found by integrating $f(\theta, \phi)$ with respect to $\phi$ over the range $[0, \infty)$. Using "complete-the-square" this can be seen to equal

$$f(\theta) = \frac{1}{\lambda}\, \Phi\!\left(\frac{\theta}{\sigma} - \frac{\sigma}{\lambda}\right) e^{-\theta/\lambda + \sigma^2/(2\lambda^2)},$$

where $\Phi$ is the cumulative standard normal.

To fit the model to $n$ data-points, we would select $a$, $b$, $\lambda$ and $\sigma$ to maximize the log-likelihood:

$$\ln(L) = -n\ln(\lambda) + n\frac{\sigma^2}{2\lambda^2} + \sum_i \ln\Phi\!\left(\frac{C_i - a - bQ_i}{\sigma} - \frac{\sigma}{\lambda}\right) - \sum_i \frac{C_i - a - bQ_i}{\lambda}.$$
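The marginal density and log-likelihood can be written down directly from these formulas. The sketch below uses only NumPy and the standard library; the function names and the parameter values in the numerical check are hypothetical, and no optimizer is included. Maximizing `log_likelihood` over $(a, b, \lambda, \sigma)$ would be done with a numerical optimization routine of your choice.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def total_error_pdf(theta, lam, sigma):
    """Marginal density f(theta) of theta = eps + phi (normal plus exponential)."""
    return (norm_cdf(theta / sigma - sigma / lam) / lam
            * math.exp(-theta / lam + sigma**2 / (2.0 * lam**2)))

def log_likelihood(params, C, Q):
    """ln L for the cost frontier C_i = a + b*Q_i + eps_i + phi_i."""
    a, b, lam, sigma = params
    theta = np.asarray(C, dtype=float) - a - b * np.asarray(Q, dtype=float)
    return float(sum(math.log(total_error_pdf(t, lam, sigma)) for t in theta))

# Numerical check that f(theta) integrates to one (trapezoid rule on a wide grid).
grid = np.linspace(-10.0, 40.0, 20001)
vals = np.array([total_error_pdf(t, lam=2.0, sigma=1.0) for t in grid])
area = float(np.sum((vals[1:] + vals[:-1]) / 2.0 * np.diff(grid)))
print(area)
```

Summing the log of `total_error_pdf` over observations is term-by-term the same as the expression for $\ln(L)$ above.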
Once we have estimated the parameters, we can measure the amount of inefficiency for each observation, $\phi_i$. The conditional pdf $f(\phi_i \mid \theta_i)$ is computed for $\theta_i = C_i - a - bQ_i$:

$$f(\phi_i \mid \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma\,\Phi(\theta_i/\sigma - \sigma/\lambda)} \exp\!\left[-\frac{1}{2}\left(\frac{\phi_i - (\theta_i - \sigma^2/\lambda)}{\sigma}\right)^2\right].$$

This is a normal distribution truncated to $\phi_i \ge 0$, and it has a mode of $\theta_i - \sigma^2/\lambda$, assuming this is positive. The degree of cost inefficiency is defined as $IE_i = e^{\phi_i}$; this is a number greater than 1, and the bigger it is the more inefficiently large is the cost. Of course, we do not know $\phi_i$, but if we evaluate $IE_i$ at the posterior mode $\theta_i - \sigma^2/\lambda$ it equals $IE_i \approx e^{C_i - a - bQ_i - \sigma^2/\lambda}$. Note that the term $\sigma^2/\lambda$ captures the idea that we do not precisely know what the minimum cost equals, so we slightly discount the measured cost to account for our uncertainty.

¹ Aigner, D., C. Lovell and P. Schmidt (1977), "Formulation and Estimation of Stochastic Frontier Production Function Models," J. Econometrics, 6:1 (July), 21-37; Kumbhakar, S. and C. Lovell (2000), Stochastic Frontier Analysis, Cambridge Univ Press. Free SFA software FRONTIER 4.1 is available at http://www.uq.edu.au/economics/cepa/frontier.htm .
Non-Spherical Errors
Assumption 3. $\mathrm{var}(Y \mid X) = \mathrm{var}(\varepsilon \mid X) = \sigma^2 I$
Suppose that $\mathrm{var}(\varepsilon \mid X) = \sigma^2 W$, where $W$ is a symmetric, positive definite matrix but $W \ne I$. What are the consequences for OLS?
a. $E[b] = E[(X'X)^{-1}X'(X\beta + \varepsilon)] = \beta + (X'X)^{-1}X'E[\varepsilon] = \beta$, so OLS is still unbiased even if $W \ne I$.
b. $\mathrm{Var}[b] = E[(b-\beta)(b-\beta)'] = (X'X)^{-1}X'E[\varepsilon\varepsilon']X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'WX(X'X)^{-1} \ne \sigma^2 (X'X)^{-1}$.
Hence, the OLS computed standard errors and t-stats are wrong. The OLS estimator will not be BLUE.
Generalized Least-Squares
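Suppose we find a matrix $P$ ($n \times n$) such that $PWP' = I$, or equivalently $W = P^{-1}P'^{-1}$ or $W^{-1} = P'P$ (use spectral decomposition). Multiply the regression model ($Y = X\beta + \varepsilon$) on the left by $P$: $PY = PX\beta + P\varepsilon$. Write $PY = Y^*$, $PX = X^*$ and $P\varepsilon = \varepsilon^*$, so in the transformed variables $Y^* = X^*\beta + \varepsilon^*$. Why do this? Look at the variance of $\varepsilon^*$:

$$\mathrm{Var}(\varepsilon^*) = E[\varepsilon^*\varepsilon^{*\prime}] = E[P\varepsilon\varepsilon'P'] = P\,E[\varepsilon\varepsilon']\,P' = \sigma^2 PWP' = \sigma^2 I.$$

The error $\varepsilon^*$ is spherical; that's why.

GLS estimator:

$$b^* = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'P'PX)^{-1}X'P'PY = (X'W^{-1}X)^{-1}X'W^{-1}Y.$$

Analysis of the transformed data equation says that GLS $b^*$ is BLUE. So it has lower variance than the OLS $b$.

$$\mathrm{Var}[b^*] = \sigma^2 (X^{*\prime}X^*)^{-1} = \sigma^2 (X'W^{-1}X)^{-1}$$

How do we estimate $\sigma^2$? [Note: from OLS, with $M = I - X(X'X)^{-1}X'$, $E[e'e]/(n-k) = E[\varepsilon'M\varepsilon]/(n-k) = E[\mathrm{tr}(\varepsilon'M\varepsilon)]/(n-k) = E[\mathrm{tr}(M\varepsilon\varepsilon')]/(n-k) = \mathrm{tr}(M E[\varepsilon\varepsilon'])/(n-k) = \sigma^2\,\mathrm{tr}(MW)/(n-k)$. Since $W \ne I$, in general $\mathrm{tr}(MW) \ne n-k$, so $E[e'e]/(n-k) \ne \sigma^2$.] Hence, to estimate $\sigma^2$ we need to use the errors from the transformed equation $Y^* = X^*b^* + e^*$:

$$s^{*2} = (e^{*\prime}e^*)/(n-k)$$

$$E[s^{*2}] = \mathrm{tr}(M^* E[\varepsilon^*\varepsilon^{*\prime}])/(n-k) = \sigma^2\,\mathrm{tr}(M^*PWP')/(n-k) = \sigma^2\,\mathrm{tr}(M^*)/(n-k) = \sigma^2.$$

Hence $s^{*2}$ is an unbiased estimator of $\sigma^2$.

Important Note: all of the above assumes that $W$ is known and that it can be factored into $P^{-1}P'^{-1}$.
How do we know $W$? Two special cases are autocorrelation and heteroskedasticity.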
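The GLS transformation can be sketched numerically as follows. Everything here is illustrative: the matrix `W` is a made-up symmetric positive definite matrix, and the factor `P` is obtained from a Cholesky factorization of $W^{-1}$ rather than the spectral decomposition mentioned in the text (any $P$ with $P'P = W^{-1}$ gives the same estimator).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# A hypothetical known W: symmetric, positive definite, not the identity.
A = rng.normal(size=(n, n))
W = A @ A.T / n + np.eye(n)

# Factor W^{-1} = P'P. Cholesky gives W^{-1} = L L', so P = L' works.
W_inv = np.linalg.inv(W)
P = np.linalg.cholesky(W_inv).T

X_star, Y_star = P @ X, P @ Y
b_transformed = np.linalg.lstsq(X_star, Y_star, rcond=None)[0]   # OLS on transformed data
b_formula = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ Y)    # (X'W^{-1}X)^{-1} X'W^{-1}Y

e_star = Y_star - X_star @ b_transformed
s2_star = float(e_star @ e_star) / (n - X.shape[1])              # s*^2, estimate of sigma^2
print(b_transformed, b_formula, s2_star)
```

The two coefficient vectors coincide, which is the equality $b^* = (X'W^{-1}X)^{-1}X'W^{-1}Y$ derived above.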
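Autocorrelated Errors
Suppose that $Y_t = X_t\beta + u_t$ (notice the subscript $t$ denotes time since this problem occurs most frequently with time-series data). Instead of assuming that the errors $u_t$ are iid, let us assume they are autocorrelated (also called serially correlated errors) according to the lagged formula

$$u_t = \rho u_{t-1} + \varepsilon_t,$$

where $\varepsilon_t$ is iid with variance $\sigma^2$ and $|\rho| < 1$. Successively lagging and substituting for $u_t$ gives the equivalent formula

$$u_t = \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \ldots$$

Using this, we can see that $E[u_t u_t] = \sigma^2(1 + \rho^2 + \rho^4 + \ldots) = \sigma^2/(1-\rho^2)$, $E[u_t u_{t-1}] = \rho\sigma^2/(1-\rho^2)$, $E[u_t u_{t-2}] = \rho^2\sigma^2/(1-\rho^2)$, ..., $E[u_t u_{t-m}] = \rho^m\sigma^2/(1-\rho^2)$. Therefore, the variance matrix of $u$ is

$$\mathrm{var}(u) = E[uu'] = \frac{\sigma^2}{1-\rho^2}\begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{bmatrix} = \sigma^2 W,$$

where

$$W = \frac{1}{1-\rho^2}\begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{bmatrix}$$

and

$$W^{-1} = \begin{bmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 \\ 0 & -\rho & 1+\rho^2 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -\rho \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}.$$

It is possible to show that $W^{-1}$ can be factored into $P'P$ where

$$P = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}.$$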
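The factorization can be verified numerically for a small case. The helper name `ar1_matrices` and the values $\rho = 0.6$, $n = 6$ are hypothetical choices for illustration.

```python
import numpy as np

def ar1_matrices(rho, n):
    """Return W, P'P (the tridiagonal W^{-1}), and P for AR(1) errors with |rho| < 1."""
    idx = np.arange(n)
    W = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)
    P = np.eye(n) - rho * np.eye(n, k=-1)     # 1 on the diagonal, -rho just below it
    P[0, 0] = np.sqrt(1.0 - rho**2)           # the only special row
    return W, P.T @ P, P

W, W_inv, P = ar1_matrices(rho=0.6, n=6)
print(np.round(W_inv, 3))                     # tridiagonal: 1, 1+rho^2, ..., 1 with -rho off the diagonal
print(np.allclose(W @ W_inv, np.eye(6)))      # P'P really is the inverse of W
```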
Given this $P$, the transformed data for GLS is

$$Y^* = PY = \begin{bmatrix} \sqrt{1-\rho^2}\,y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{bmatrix}, \quad X^* = \begin{bmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\,x_{11} & \cdots & \sqrt{1-\rho^2}\,x_{1p} \\ 1-\rho & x_{21} - \rho x_{11} & \cdots & x_{2p} - \rho x_{1p} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{n1} - \rho x_{n-1,1} & \cdots & x_{np} - \rho x_{n-1,p} \end{bmatrix}.$$

Notice that only the first element is unique. The rest just involves subtracting a fraction $\rho$ of the lagged value from the current value. Many modelers drop the first observation and use only the last $n-1$ because it is easier, but this throws away information and I would not recommend doing it unless you had a very large $n$. The Cochrane-Orcutt technique successively estimates $\rho$ from the errors and re-estimates $\beta$ based upon new transformed data ($Y^*$, $X^*$).
1. Guess a starting $\rho_0$.
2. At stage $m$, estimate $\beta$ in the model $Y_t - \rho_m Y_{t-1} = (X_t - \rho_m X_{t-1})\beta + \varepsilon_t$ using OLS. If the estimate $b_m$ is not different from the previous $b_{m-1}$, then stop. Otherwise, compute the error vector of the original, untransformed equation, $e_m = Y - Xb_m$.
3. Estimate $\rho$ in $e_{m,t} = \rho e_{m,t-1} + \varepsilon_t$ via OLS. This estimate becomes the new $\rho_{m+1}$. Go back to 2.
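The three steps can be sketched as a short loop. This is a minimal illustration, not a reference implementation: it drops the first observation (the simplification the text warns about), uses residuals of the untransformed equation to update $\rho$, and the tolerance, starting value, and simulated data are all hypothetical.

```python
import numpy as np

def ols(A, y):
    """OLS coefficients by least squares."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

def cochrane_orcutt(X, Y, rho0=0.0, tol=1e-8, max_iter=100):
    """Alternate between beta (given rho) and rho (given residuals)."""
    rho, b_prev = rho0, None
    for _ in range(max_iter):
        # Step 2: OLS on quasi-differenced data Y_t - rho*Y_{t-1}, X_t - rho*X_{t-1}
        b = ols(X[1:] - rho * X[:-1], Y[1:] - rho * Y[:-1])
        if b_prev is not None and np.max(np.abs(b - b_prev)) < tol:
            break
        b_prev = b
        # Step 3: regress residuals of the original equation on their own lag
        e = Y - X @ b
        rho = float(e[1:] @ e[:-1]) / float(e[:-1] @ e[:-1])
    return b, rho

# Simulated example with AR(1) errors
rng = np.random.default_rng(6)
n, rho_true = 2000, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho_true * u[t - 1] + eps[t]
Y = X @ np.array([1.0, 2.0]) + u

b_hat, rho_hat = cochrane_orcutt(X, Y)
print(b_hat, rho_hat)
```

Note that the constant column becomes $1 - \rho_m$ after quasi-differencing, so the coefficient on it is still the original intercept.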
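Durbin-Watson test for $\rho \ne 0$ in $u_t = \rho u_{t-1} + \varepsilon_t$.
1. Compute OLS errors $e$.
2. Calculate

$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}.$$

3. $d < 2 \Rightarrow \rho > 0$, $d > 2 \Rightarrow \rho < 0$, $d = 2 \Rightarrow \rho = 0$.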
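The statistic in step 2 is a one-liner. The function name and the two toy residual vectors below are hypothetical; they are chosen so the extremes are easy to verify by hand.

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: identical residuals, extreme positive autocorrelation
```

Expanding the numerator shows why 2 is the benchmark: $d$ is approximately $2(1 - r)$, where $r$ is the first-order autocorrelation of the residuals.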
Heteroskedasticity
Here we assume that the errors are independent, but not necessarily identically distributed. That is, the matrix $W$ is diagonal, but not the identity matrix. The most common way for this to occur is because $Y_i$ is the average response of a group $i$ that has a number of members $m_i$. Larger groups have smaller variance in the average response: $\mathrm{var}(\varepsilon_i) = \sigma^2/m_i$. Hence the variance matrix would be

$$\mathrm{Var}(\varepsilon) = \sigma^2\begin{bmatrix} \frac{1}{m_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{m_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{m_n} \end{bmatrix}.$$

A related example of this would be that $Y$ is the sum across the members of many similar elements, so that $\mathrm{var}(\varepsilon_i) = \sigma^2 m_i$ and

$$\mathrm{Var}(\varepsilon) = \sigma^2\begin{bmatrix} m_1 & 0 & \cdots & 0 \\ 0 & m_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & m_n \end{bmatrix}.$$

If we knew how big the groups were and whether we had the average or total response, we could substitute for $m_i$ in the above matrix $W$.
More generally, we think that the variance of $\varepsilon_i$ depends upon some variable $Z$. We can do a Glejser Test of this as follows.
1. Compute the OLS estimate $b$ and the residuals $e$.
2. Regress $|e_i|$ on $Z_i^\eta$, where $\eta = 1$, $-1$, and $\frac{1}{2}$.
3. If the coefficient of $Z^\eta$ is 0 then the model is homoscedastic, but if it is not zero, then the model has heteroskedastic errors.
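The steps above can be sketched as a function that returns the t-statistic on $Z^\eta$ in the auxiliary regression. Including a constant in that auxiliary regression is an assumption made here, and the function name, simulated data, and the range of `z` are hypothetical.

```python
import numpy as np

def glejser_t(X, Y, z, eta=1.0):
    """t-statistic on Z^eta when |e_i| is regressed on a constant and Z_i^eta."""
    e = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]      # step 1: OLS residuals
    A = np.column_stack([np.ones(len(z)), z ** eta])       # step 2: auxiliary regressors
    abs_e = np.abs(e)
    coef = np.linalg.lstsq(A, abs_e, rcond=None)[0]
    resid = abs_e - A @ coef
    s2 = float(resid @ resid) / (len(z) - A.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[1, 1])
    return float(coef[1] / se)                             # step 3: far from 0 suggests heteroskedasticity

rng = np.random.default_rng(7)
n = 2000
z = rng.uniform(1.0, 5.0, size=n)                          # hypothetical variance driver, kept positive
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + z * rng.normal(size=n)      # error s.d. proportional to z
print(glejser_t(X, Y, z, eta=1.0))
```

Running the function for each of $\eta = 1, -1, \frac{1}{2}$ and looking for a coefficient clearly different from zero follows the recipe in the list.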
In SPSS, you can correct for heteroskedasticity by using Analyze/Regression/Weight Estimation rather than Analyze/Regression/Linear. You have to know the variable $Z$, of course.
Trick: Suppose that $\sigma_t^2 = \sigma^2 Z_t^2$. Notice $Z$ is squared. Divide both sides of the equation by $Z$ to get $Y_t/Z_t = (X_t/Z_t)\beta + \varepsilon_t/Z_t$. This new equation has homoscedastic errors and so the OLS estimate of this transformed model is BLUE.
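The trick is GLS with $W = \mathrm{diag}(Z_t^2)$, which can be confirmed numerically. The simulated data and the range of `z` below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
z = rng.uniform(1.0, 4.0, size=n)                       # hypothetical Z_t, kept away from zero
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + z * rng.normal(size=n)   # var(eps_t) = sigma^2 * Z_t^2

# OLS on the equation divided through by Z_t
b_trick = np.linalg.lstsq(X / z[:, None], Y / z, rcond=None)[0]

# GLS with W = diag(Z_t^2): (X'W^{-1}X)^{-1} X'W^{-1}Y
W_inv = np.diag(1.0 / z**2)
b_gls = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ Y)
print(b_trick, b_gls)
```

Dividing by $Z_t$ also divides the constant column, so the transformed regression has $1/Z_t$ as a regressor in place of the intercept.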
Simultaneous Equations
Assumption 4. $X$ is fixed
Later in the semester we will return to the problem that $X$ is often determined by actors in the play we are studying rather than by us scientists. This is a serious problem in simultaneous equation models.
Multicollinearity
Assumption 5. $X$ has full column rank.
What is the problem if you have multicollinearity? In $X'X$ there will be some portions that look like a little square

$$\begin{bmatrix} x'x & x'x \\ x'x & x'x \end{bmatrix}$$

and this has a determinant close to zero, so its reciprocal will be near infinity. OLS is still BLUE, but the estimated $\mathrm{var}[b] = (X'X)^{-1}\,Y'(I - X(X'X)^{-1}X')Y/(n-k)$ can be very large.
If there is collinearity, then there exists a weighting vector $\alpha$ such that $X\alpha$ is close to the 0 vector. Of course, we cannot just allow $\alpha$ to be zero. Hence let's look for the value of $\alpha$ that minimizes $\|X\alpha\|^2$ subject to $\alpha'\alpha = 1$. The Lagrangian for this constrained optimization is $L = \alpha'X'X\alpha + \lambda(1 - \alpha'\alpha)$ and the first order conditions are $X'X\alpha - \lambda\alpha = 0$. This is the equation for the eigenvalue and eigenvector of $X'X$. Multiply the first order condition by $\alpha'$ and use the fact that eigenvectors have a length of 1 to see that $\alpha'X'X\alpha = \lambda$, so we are looking at the smallest of the eigenvalues when we seek collinearity. When is this eigenvalue "small" enough to measure serious collinearity? We compute a Condition Index as the square root of the ratio of the largest eigenvalue to the smallest eigenvalue:

$$CI \equiv \sqrt{\frac{\lambda_{largest}}{\lambda_{smallest}}}.$$

When the condition index is greater than 20 or 30, we have serious collinearity. In SPSS Regression/Linear/Statistics click "Collinearity Diagnostics."
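The condition index can be computed from the eigenvalues of $X'X$ as defined above. This sketch works on the raw, unscaled data, as in the text; some software rescales the columns first, so reported values may differ. The function name and the simulated columns are hypothetical.

```python
import numpy as np

def condition_index(X):
    """Square root of (largest eigenvalue / smallest eigenvalue) of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)     # ascending order; X'X is symmetric
    return float(np.sqrt(eigvals[-1] / eigvals[0]))

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)
x2_coll = x1 + 0.01 * rng.normal(size=n)      # nearly a copy of x1

print(condition_index(np.column_stack([x1, x2_indep])))   # close to 1
print(condition_index(np.column_stack([x1, x2_coll])))    # far above the 20 to 30 threshold
```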
Warning: Many people use the Variance Inflation Factor to identify collinearity. This should be avoided (see Chennamaneni, Echambadi, Hess and Syam 2009). The problem is that VIF confuses "collinearity" with "correlation" as follows. Let $R$ be the correlation matrix of $X$: $R = D^{-1/2}X'HXD^{-1/2}/(n-1)$, where $H = I - \mathbf{1}\mathbf{1}'/n$ is the mean-centering matrix and the standard deviation matrix is $D^{1/2} = \mathrm{sqrt}(\mathrm{diag}(X'HX)/(n-1))$. Compute $R^{-1}$. For example,

$$\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{1-\rho^2} & \frac{-\rho}{1-\rho^2} \\ \frac{-\rho}{1-\rho^2} & \frac{1}{1-\rho^2} \end{bmatrix}$$

and along the diagonal is $1/(1-\rho^2)$, which is called the Variance Inflation Factor (VIF). More generally $VIF_i = (1 - R_i^2)^{-1}$, where $R_i^2$ is the R-square from regressing $x_i$ on the $k-1$ other variables in $X$. The problem with VIF is that it starts with mean-centered data $HX$, when collinearity is a problem of the raw data $X$. In OLS we compute $(X'X)^{-1}$, not $(X'HX)^{-1}$. Chennamaneni et al. provide a variant of VIF that does not suffer from these problems.
What can you do if there is collinearity?
1) Do nothing. OLS is BLUE.
2) Get more information. Obtain more data or formalize the links between the elements of $X$.
3) Summarize $X$. Drop a variable or do principal component analysis (more on this in the next chapter of the textbook).
4) Use ridge regression. This appends a matrix $kI$ to the bottom of the exogenous data $X$ and appends a corresponding vector of 0's to the bottom of the endogenous data $Y$. This synthetic data obviously results in a biased estimator (biased toward 0 since the augmented data has $Y$ not responding to changes in $X$), but the augmented data $kI$ has orthogonal and hence maximally "not collinear" observations. Hence, the estimates become more precise. For $k \approx 0$, the improved precision dominates the bias.
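The data-augmentation view of ridge regression can be checked against the familiar closed form. Appending $kI$ to $X$ and zeros to $Y$ gives OLS coefficients $(X'X + k^2I)^{-1}X'Y$, so the penalty is $k^2$ in the usual ridge notation. The value of `k` and the simulated, nearly collinear data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p, k = 100, 3, 0.5
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n)   # nearly collinear column
Y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

# Append k*I below X and zeros below Y, then run OLS on the augmented data
X_aug = np.vstack([X, k * np.eye(p)])
Y_aug = np.concatenate([Y, np.zeros(p)])
b_ridge = np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]

# Equivalent closed form: (X'X + k^2 I)^{-1} X'Y
b_closed = np.linalg.solve(X.T @ X + k**2 * np.eye(p), X.T @ Y)
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
print(b_ols, b_ridge, b_closed)
```

The ridge coefficient vector is never longer than the OLS one, which is the shrinkage toward 0 described in item 4.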