Part 5 – Finite Sample Properties
matrix ; beduc=init(100,1,0)$
proc$
draw ; n=25 $
regress; quietly ; lhs=hhninc ; rhs = one,educ $
matrix ; beduc(i)=b(2) $
sample;all$
endproc$
execute ; i=1,100 $
histogram;rhs=beduc $
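The procedure above can be mirrored roughly in Python. This is only a sketch under stated assumptions: the original data are not available here, so `educ` and `hhninc` are synthetic stand-ins with made-up coefficients, the helper names `ols_slope` and `simulate_slopes` are hypothetical, and each draw of 25 observations is assumed to be taken without replacement.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from regressing y on a constant and x (closed form)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(x_pop, y_pop, reps=100, n=25, seed=12345):
    """Repeat reps times: draw n observations, estimate the slope, store it."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # Assumption: sampling without replacement from the full data set.
        idx = rng.sample(range(len(x_pop)), n)
        slopes.append(ols_slope([x_pop[i] for i in idx],
                                [y_pop[i] for i in idx]))
    return slopes

# Synthetic stand-in for the full data set (assumed values, not the original data).
rng = random.Random(0)
educ = [rng.uniform(7, 18) for _ in range(5000)]
hhninc = [0.1 + 0.02 * e + rng.gauss(0, 0.15) for e in educ]

beduc = simulate_slopes(educ, hhninc)
print(len(beduc), statistics.fmean(beduc), statistics.stdev(beduc))
```

The spread of the 100 stored slopes is the sampling variation that the histogram displays.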
$$
b = \beta + (X'X)^{-1}X'\varepsilon
  = \beta + (X'X)^{-1}\sum_{i=1}^{N} x_i\varepsilon_i
  = \beta + \sum_{i=1}^{N} (X'X)^{-1}x_i\varepsilon_i
  = \beta + \sum_{i=1}^{N} v_i\varepsilon_i
$$
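The decomposition can be checked numerically. The sketch below assumes a two-column X (a constant and one regressor), made-up true coefficients, and simulated disturbances; the helpers `xtx_inverse` and `matvec` are hypothetical names introduced only for this illustration. It computes b directly and again as β plus the weighted sum of disturbances, and the two should agree up to rounding.

```python
import random

def xtx_inverse(rows):
    """Inverse of X'X for a two-column X given as a list of (1, x) rows."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

rng = random.Random(1)
N = 25
beta = [1.0, 0.5]  # assumed true coefficients
X = [(1.0, rng.uniform(0, 10)) for _ in range(N)]
eps = [rng.gauss(0, 2.0) for _ in range(N)]
y = [beta[0] * r[0] + beta[1] * r[1] + e for r, e in zip(X, eps)]

inv = xtx_inverse(X)

# Direct OLS: b = (X'X)^{-1} X'y
Xty = [sum(r[k] * yi for r, yi in zip(X, y)) for k in range(2)]
b_direct = matvec(inv, Xty)

# Decomposition: b = beta + sum_i v_i * eps_i, with v_i = (X'X)^{-1} x_i
v = [matvec(inv, r) for r in X]
b_decomp = [beta[k] + sum(vi[k] * e for vi, e in zip(v, eps))
            for k in range(2)]

print(b_direct, b_decomp)
```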
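$$
\mathrm{Var}\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\end{bmatrix}
= E\left\{\mathrm{Var}\left[\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\end{bmatrix}\,\Bigg|\,X\right]\right\}
+ \mathrm{Var}\left\{E\left[\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\end{bmatrix}\,\Bigg|\,X\right]\right\}
= E\{\sigma^2 I\} + \mathrm{Var}\begin{bmatrix}0\\ 0\\ \vdots\\ 0\end{bmatrix}
= \sigma^2 I.
$$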
1. Linear estimator
2. Unbiased: E[b|X] = β
Source of the random behavior of b:
$$
b = \beta + \sum_{i=1}^{n} v_i\varepsilon_i, \qquad v_i = (X'X)^{-1}x_i',
$$
where $x_i$ is row i of X.
We derived $E[b|X]$ and $\mathrm{Var}[b|X]$ earlier. The distribution of $b|X$ is that of the linear combination of the disturbances, $\varepsilon_i$. If $\varepsilon_i$ has a normal distribution, denoted $\varepsilon_i \sim N[0,\sigma^2]$, then
$$
b|X = \beta + A\varepsilon, \quad \text{where } \varepsilon \sim N[0,\sigma^2 I] \text{ and } A = (X'X)^{-1}X',
$$
$$
b|X \sim N[\beta,\; A\sigma^2 I A'] = N[\beta,\; \sigma^2 (X'X)^{-1}].
$$
Note how b inherits its stochastic properties from ε.
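The mean and variance in the normal distribution above follow in one step each from $b|X = \beta + A\varepsilon$. A short worked version, using only $E[\varepsilon|X] = 0$, $\mathrm{Var}[\varepsilon|X] = \sigma^2 I$, and $A = (X'X)^{-1}X'$:

```latex
\begin{aligned}
E[b|X] &= \beta + A\,E[\varepsilon|X] = \beta, \\
\mathrm{Var}[b|X] &= A(\sigma^2 I)A'
  = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1}.
\end{aligned}
```

Normality then carries over because a linear function of a normally distributed vector is itself normally distributed.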
Both variable coefficients are significant and must be included in the model (as per
specification).
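$$
\mathrm{Var}[c] = \sigma^2\,[z'M_X z]^{-1} = \frac{\sigma^2}{z_*'z_*},
$$
where $z_*$ = the vector of residuals from the regression of z on X.
The $R^2$ in that regression is
$$
R^2_{z|X} = 1 - \frac{z_*'z_*}{\sum_{i=1}^{n}(z_i - \bar z)^2},
$$
so $z_*'z_* = (1 - R^2_{z|X})\sum_{i=1}^{n}(z_i - \bar z)^2$ and
$$
\mathrm{Var}[c] = \frac{\sigma^2}{(1 - R^2_{z|X})\sum_{i=1}^{n}(z_i - \bar z)^2}.
$$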
"Detecting" multicollinearity?
Variance inflation factor: $\mathrm{VIF}(z) = \dfrac{1}{1 - R^2_{z|X}}$.
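As a small illustration of the variance inflation factor, the sketch below assumes X contains only a constant and one other regressor x, so the auxiliary regression of z on X is a simple regression. The data are simulated, and the function names `r_squared_simple` and `vif` are hypothetical.

```python
import random
import statistics

def r_squared_simple(z, x):
    """R^2 from regressing z on a constant and a single regressor x."""
    z_bar, x_bar = statistics.fmean(z), statistics.fmean(x)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = szx / sxx
    # Residual sum of squares from the auxiliary regression of z on (1, x)
    resid_ss = sum(((zi - z_bar) - slope * (xi - x_bar)) ** 2
                   for zi, xi in zip(z, x))
    return 1.0 - resid_ss / szz

def vif(z, x):
    """Variance inflation factor 1 / (1 - R^2) for z given (1, x)."""
    return 1.0 / (1.0 - r_squared_simple(z, x))

rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(200)]
z_weak = [0.1 * xi + rng.gauss(0, 1) for xi in x]    # nearly unrelated to x
z_strong = [xi + rng.gauss(0, 0.1) for xi in x]      # nearly collinear with x

print(vif(z_weak, x), vif(z_strong, x))
```

The nearly collinear variable should show a much larger factor than the weakly related one, since its auxiliary R² is close to 1.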
Condition number larger than 30 is ‘large.’
In the Filippelli test, Stata found two coefficients so collinear that it dropped them from
the analysis. Most other statistical software packages have done the same thing, and
most authors have interpreted this result as acceptable for this test.
What is better: Include a variable that causes collinearity, or drop the variable
and suffer from a biased estimator?
Mean squared error would be the basis for comparison.
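The comparison can be written out as a sketch. Assume the model is $y = X\beta + z\gamma + \varepsilon$, where z is the variable in question and $\gamma$ (a symbol introduced here for illustration) is its true coefficient. Write $b_{long}$ for the estimator of $\beta$ when z is included and $b_{short}$ for the estimator when z is dropped, with $M_z = I - z(z'z)^{-1}z'$.

```latex
\begin{aligned}
\text{Include } z:\quad
  & E[b_{long}|X,z] = \beta, \qquad
    \mathrm{MSE}[b_{long}] = \mathrm{Var}[b_{long}|X,z] = \sigma^2 (X'M_z X)^{-1}, \\
\text{Drop } z:\quad
  & E[b_{short}|X,z] = \beta + (X'X)^{-1}X'z\,\gamma, \qquad
    \mathrm{Var}[b_{short}|X,z] = \sigma^2 (X'X)^{-1}, \\
  & \mathrm{MSE}[b_{short}] = \sigma^2 (X'X)^{-1}
    + \gamma^2 (X'X)^{-1}X'z\,z'X(X'X)^{-1}.
\end{aligned}
```

Dropping z lowers the variance but adds a squared-bias term, so which estimator has the smaller mean squared error depends on the size of $\gamma$ relative to $\sigma^2$ and on how strongly z is correlated with X.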
Some generalities. Assuming X has full rank, regardless of the condition,
b is still unbiased
Gauss-Markov still holds