Marcelo Sant’Anna
FGV EPGE
We strengthen the assumption about our sample in order to derive finite sample
properties of the estimators we are studying.
Assumption
The observations (yi , xi ) for i = 1, . . . , n are independent and identically
distributed (i.i.d.).
The independence assumption makes our lives easier, but it is strong and might not be reasonable in a variety of contexts.
Fortunately, it can be relaxed in many applications, allowing us to make inference that is ‘robust’ to some forms of dependence.
y_i = x_i′β + e_i
E[e_i | x_i] = 0
Homoskedastic: E[e_i² | x_i] = σ²(x_i) = σ²
Heteroskedastic: E[e_i² | x_i] = σ²(x_i) = σ_i²
V_β̂ = var(β̂ | X) = E[(β̂ − E[β̂|X])(β̂ − E[β̂|X])′ | X]
= E[(β̂ − β)(β̂ − β)′ | X]
= E[((X′X)⁻¹X′e)((X′X)⁻¹X′e)′ | X]
= E[(X′X)⁻¹X′ee′X(X′X)⁻¹ | X]
= (X′X)⁻¹X′ E[ee′|X] X(X′X)⁻¹, where D ≡ E[ee′|X].
What is D? A diagonal matrix: D_{n×n} = diag(σ₁², …, σ_n²).
Diagonal terms: E[e_i² | X] = E[e_i² | x_i] = σ_i² (by i.i.d. sampling).
Off-diagonal terms: E[e_i e_j | X] = E[e_i | x_i] E[e_j | x_j] = 0 (by independence across observations).
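The sandwich formula above can be checked numerically. The sketch below (assuming numpy; the design matrix and the variance function σ_i² = 0.5 + x_{i1}² are illustrative choices, not from the slides) builds a diagonal D and verifies that under a constant σ² the sandwich collapses to the familiar σ²(X′X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3

# Design matrix with an intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Heteroskedastic conditional variances sigma_i^2 depending on x_i
# (an illustrative functional form).
sigma2 = 0.5 + X[:, 1] ** 2            # sigma_i^2 = 0.5 + x_{i1}^2
D = np.diag(sigma2)                    # D = diag(sigma_1^2, ..., sigma_n^2)

# Exact sandwich variance: (X'X)^{-1} X' D X (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ X.T @ D @ X @ XtX_inv

# Under homoskedasticity (constant sigma^2 = 2) the sandwich collapses
# to the textbook formula sigma^2 (X'X)^{-1}.
V_homo = XtX_inv @ X.T @ np.diag(np.full(n, 2.0)) @ X @ XtX_inv
assert np.allclose(V_homo, 2.0 * XtX_inv)
```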
Marcelo Sant’Anna (FGV EPGE) Statistics II - Lec 3 July 17, 2019 5 / 16
Gauss-Markov theorem
Theorem
In the homoskedastic linear regression model with iid sampling, if β̃ is a linear
unbiased estimator of β then
var(β̃ | X) ≥ σ²(X′X)⁻¹.
β̃ = A′y for some n × k matrix A
E[β̃ | X] = A′Xβ, so β̃ is unbiased ⇐⇒ A′X = I
var(β̃ | X) = A′Aσ²
We now show the theorem. We need to show that, for any unbiased estimator (A′X = I),

A′A − (X′X)⁻¹ ≥ 0.

The trick is setting C = A − X(X′X)⁻¹. Note that C′X = A′X − (X′X)⁻¹X′X = I − I = 0. Re-writing,

A′A − (X′X)⁻¹ = (C + X(X′X)⁻¹)′(C + X(X′X)⁻¹) − (X′X)⁻¹
= C′C + C′X(X′X)⁻¹ + (X′X)⁻¹X′C + (X′X)⁻¹ − (X′X)⁻¹
= C′C ≥ 0,

since C′X = 0 and C′C is positive semi-definite.
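The argument above can be verified numerically: build a linear unbiased estimator as A = X(X′X)⁻¹ + C with C′X = 0 and check that A′A − (X′X)⁻¹ equals C′C and is positive semi-definite. A minimal sketch, assuming numpy and a simulated design (both are illustrative, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)

# Any linear unbiased estimator has weights A with A'X = I.
# Build one as A = X (X'X)^{-1} + C with C'X = 0, so A'X = I still holds.
M = np.eye(n) - X @ XtX_inv @ X.T          # annihilator: M X = 0
C = M @ rng.normal(size=(n, k))            # columns of C satisfy C'X = 0
A = X @ XtX_inv + C

assert np.allclose(A.T @ X, np.eye(k))     # A'X = I: unbiasedness

# Gauss-Markov step: A'A - (X'X)^{-1} = C'C, positive semi-definite.
gap = A.T @ A - XtX_inv
assert np.allclose(gap, C.T @ C)
assert np.linalg.eigvalsh(gap).min() > -1e-10
```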
The method of moments estimator of the error variance σ² = E[e_i²] is naturally

σ̂² = (1/n) Σ_{i=1}^n ê_i².
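The method-of-moments estimator and its degrees-of-freedom-corrected version s² = Σ ê_i²/(n − k) differ only by the factor n/(n − k). A short sketch, assuming numpy and a simulated regression (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# OLS residuals via least squares.
e_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

sigma2_hat = (e_hat ** 2).mean()         # method of moments: divides by n
s2 = (e_hat ** 2).sum() / (n - k)        # degrees-of-freedom corrected

# The two estimators differ exactly by the inflation factor n / (n - k).
assert np.isclose(s2, sigma2_hat * n / (n - k))
```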
Remember that

V_β̂ = var(β̂ | X) = (X′X)⁻¹X′ E[ee′|X] X(X′X)⁻¹, where D ≡ E[ee′|X].
Under homoskedasticity, a natural plug-in estimator is

V̂⁰_β̂ = (X′X)⁻¹ s², where s² = (1/(n − k)) Σ_{i=1}^n ê_i².

Under heteroskedasticity, plugging ê_i² into the sandwich gives the Eicker–White (HC0) estimator:

V̂^{HC0}_β̂ = (X′X)⁻¹ (Σ_{i=1}^n x_i x_i′ ê_i²) (X′X)⁻¹.
However, we know ê_i² is biased towards zero, so one may wish to inflate the estimator:
V̂^{HC1}_β̂ = (n/(n − k)) (X′X)⁻¹ (Σ_{i=1}^n x_i x_i′ ê_i²) (X′X)⁻¹

V̂^{HC2}_β̂ = (X′X)⁻¹ (Σ_{i=1}^n (1 − h_ii)⁻¹ x_i x_i′ ê_i²) (X′X)⁻¹

V̂^{HC3}_β̂ = (X′X)⁻¹ (Σ_{i=1}^n (1 − h_ii)⁻² x_i x_i′ ê_i²) (X′X)⁻¹

where h_ii = x_i′(X′X)⁻¹x_i is the leverage of observation i.
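The family of inflated estimators can be computed in a few lines. In the sketch below (assuming numpy; the simulated design and coefficients are illustrative), `V_hc0` denotes the uninflated sandwich Σ x_i x_i′ ê_i², and each correction reweights the residuals by the leverage:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n) * (1 + X[:, 1] ** 2)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e_hat = y - X @ beta_hat                      # OLS residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii = x_i'(X'X)^{-1}x_i

def sandwich(weights):
    """(X'X)^{-1} (sum_i w_i x_i x_i' e_i^2) (X'X)^{-1}"""
    meat = (X * (weights * e_hat ** 2)[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv

V_hc0 = sandwich(np.ones(n))                  # uninflated sandwich
V_hc1 = (n / (n - k)) * V_hc0                 # global n/(n-k) inflation
V_hc2 = sandwich(1 / (1 - h))                 # leverage correction
V_hc3 = sandwich(1 / (1 - h) ** 2)            # stronger leverage correction
```

Because 1/(1 − h_ii)² ≥ 1/(1 − h_ii) ≥ 1, each correction can only increase the estimated variances on the diagonal.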
Students in the same class share teacher, location, and peer effects, which should make their achievement outcomes correlated;
Bidders in an auction may share common information about the object being sold, which will imply correlation in their bidding behavior.
We will briefly discuss the implications of clustered data for standard covariance
estimation and how to make covariance estimates robust to this environment.
{(y_ig, x_ig) : g = 1, …, G; i = 1, …, n_g},
y_g = X_g β + e_g,
y = Xβ + e.
E[e_g | X_g] = 0.
Theorem
Given the assumptions above, the OLS estimator is still unbiased in the clustered world:

E[β̂ | X] = β.
Theorem
The variance of the OLS estimator, given the cluster assumptions above, is

V_β̂ = var(β̂ | X) = (X′X)⁻¹ (Σ_{g=1}^G X_g′ E[e_g e_g′ | X_g] X_g) (X′X)⁻¹.
When E[e_ig² | x_g] = σ² and E[e_ig e_jg | x_g] = ρσ² for j ≠ i, regressors do not vary within cluster, and all clusters have N observations, the formula above simplifies to

V_β̂ = (X′X)⁻¹ σ² (1 + ρ(N − 1)).
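This simplification can be verified exactly for an intercept-only regression, where the regressor is trivially constant within cluster. The sketch below (assuming numpy; G, N, σ², and ρ are illustrative values) builds the block-diagonal equicorrelated error covariance and compares the exact sandwich to the closed form:

```python
import numpy as np

G, N = 5, 4                    # G clusters of N observations each
n = G * N
sigma2, rho = 2.0, 0.3

# Intercept-only design: the regressor is constant within clusters.
X = np.ones((n, 1))

# Block-diagonal error covariance: within a cluster, sigma2 on the
# diagonal and rho*sigma2 off the diagonal; zero across clusters.
block = sigma2 * ((1 - rho) * np.eye(N) + rho * np.ones((N, N)))
D = np.kron(np.eye(G), block)

XtX_inv = np.linalg.inv(X.T @ X)
V_exact = XtX_inv @ X.T @ D @ X @ XtX_inv

# Closed form from the slide: (X'X)^{-1} sigma^2 (1 + rho (N - 1))
V_formula = XtX_inv * sigma2 * (1 + rho * (N - 1))
assert np.allclose(V_exact, V_formula)
```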
Arellano’s variance estimator robust to clustering is a direct extension of Eicker–White’s estimator to the clustered world:

V̂_β̂ = (X′X)⁻¹ (Σ_{g=1}^G X_g′ ê_g ê_g′ X_g) (X′X)⁻¹.
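The cluster sandwich sums an outer product of the score X_g′ ê_g within each cluster. A minimal sketch, assuming numpy and a simulated clustered sample (cluster sizes, coefficients, and seeds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
G, N, k = 20, 5, 2
n = G * N
cluster = np.repeat(np.arange(G), N)     # cluster id of each observation

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
e_hat = y - X @ (XtX_inv @ X.T @ y)      # OLS residuals

# Cluster-robust "meat": sum over clusters of X_g' e_g e_g' X_g
meat = np.zeros((k, k))
for g in range(G):
    idx = cluster == g
    Xg, eg = X[idx], e_hat[idx]
    s = Xg.T @ eg                        # score X_g' e_g  (k-vector)
    meat += np.outer(s, s)               # X_g' e_g e_g' X_g

V_cluster = XtX_inv @ meat @ XtX_inv
```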