
Statistics II - Least Squares Regression

Marcelo Sant’Anna

FGV EPGE

July 17, 2019



Samples

We strengthen the assumption about our sample in order to derive finite-sample properties of the estimators we are studying.

Assumption
The observations (yi , xi ) for i = 1, . . . , n are independent and identically
distributed (i.i.d.).

The independence assumption makes our lives easier, but it is strong and might not be reasonable in a variety of contexts.

Fortunately, it can be relaxed in many applications, allowing us to make inference that is ‘robust’ to some forms of dependence.



Linear regression model assumptions

Assumption (Linear Regression Model)


Observations (yᵢ, xᵢ) satisfy the linear regression equation

yᵢ = xᵢ′β + eᵢ
E[eᵢ | xᵢ] = 0,

have finite moments, and E[xᵢxᵢ′] is non-singular.

There are two cases for the conditional variance of errors:

Homoskedastic: E[eᵢ² | xᵢ] = σ²(xᵢ) = σ²
Heteroskedastic: E[eᵢ² | xᵢ] = σ²(xᵢ) = σᵢ²



Unbiasedness of OLS

Theorem (Conditional mean of OLS estimator)


In the linear regression model with iid sampling:
E[β̂ | X] = β.

From the expression of the OLS estimator:

β̂ = (X′X)⁻¹(X′y)
   = (X′X)⁻¹(X′(Xβ + e))
   = β + (X′X)⁻¹(X′e)

Taking expectations conditional on X:

E[β̂ − β | X] = E[(X′X)⁻¹(X′e) | X]
             = (X′X)⁻¹ X′ E[e | X] = 0
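This conditional unbiasedness is easy to check by simulation. Below is a minimal numpy sketch (the design and parameter values are our own illustration, not from the lecture): we hold X fixed, redraw e with E[e | X] = 0 many times, and average the OLS draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
beta = np.array([1.0, -2.0, 0.5])          # illustrative true coefficients
X = rng.normal(size=(n, k))                # design held fixed: we condition on X

reps = 5000
draws = np.empty((reps, k))
for r in range(reps):
    e = rng.normal(size=n)                 # errors with E[e|X] = 0
    y = X @ beta + e
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate

print(draws.mean(axis=0))                  # close to beta: E[beta-hat|X] = beta
```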



Variance of OLS estimator

Vβ̂ = var(β̂ | X) = E[(β̂ − E[β̂|X])(β̂ − E[β̂|X])′ | X]
    = E[(β̂ − β)(β̂ − β)′ | X]
    = E[((X′X)⁻¹X′e)((X′X)⁻¹X′e)′ | X]
    = E[(X′X)⁻¹X′ee′X(X′X)⁻¹ | X]
    = (X′X)⁻¹ X′ E[ee′ | X] X (X′X)⁻¹, where we define D ≡ E[ee′ | X].

What is D? A diagonal n × n matrix: D = diag(σ₁², …, σₙ²)

Diagonal terms: E[eᵢ² | X] = E[eᵢ² | xᵢ] = σᵢ²
Off-diagonal terms: E[eᵢeⱼ | X] = E[eᵢ | xᵢ] E[eⱼ | xⱼ] = 0, by independence across observations
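The sandwich formula can also be verified numerically. A small sketch under an assumed heteroskedastic design (the variance function σᵢ = 0.5 + |xᵢ₁| is purely illustrative): the exact conditional variance (X′X)⁻¹X′DX(X′X)⁻¹ should match the Monte Carlo variance of β̂.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
X = rng.normal(size=(n, k))
sigma_i = 0.5 + np.abs(X[:, 0])            # assumed variance function, illustration only

XtX_inv = np.linalg.inv(X.T @ X)
D = np.diag(sigma_i ** 2)                  # D = E[ee'|X]: diagonal by independence
V_exact = XtX_inv @ X.T @ D @ X @ XtX_inv  # the sandwich formula

reps = 20000
draws = np.empty((reps, k))
for r in range(reps):
    e = sigma_i * rng.normal(size=n)       # E[e_i|x_i] = 0, var(e_i|x_i) = sigma_i^2
    draws[r] = XtX_inv @ (X.T @ e)         # with beta = 0, beta-hat = (X'X)^{-1} X'e
print(np.diag(V_exact))
print(np.var(draws, axis=0))               # should match up to simulation noise
```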
Gauss-Markov theorem

Theorem
In the homoskedastic linear regression model with iid sampling, if β̃ is a linear
unbiased estimator of β, then

var(β̃ | X) ≥ σ²(X′X)⁻¹.

What does the relation ≥ mean for matrices? A ≥ B ⇐⇒ A − B is positive semi-definite.

The OLS estimator under homoskedasticity has variance Vβ̂ = σ²(X′X)⁻¹, since in that case D = σ²I. This establishes that the OLS β̂ is the famous BLUE (best linear unbiased estimator).

Gauss-Markov theorem

What does it mean to be a “linear unbiased estimator of β”?

Linear function of y:

β̃ = A′y,

where A is an n × k matrix that is a function of X. OLS is the particular case A = X(X′X)⁻¹.

Unbiasedness:

E[β̃ | X] = E[A′y | X] = A′E[y | X] = A′Xβ,

so β̃ is unbiased ⇐⇒ A′X = I.

And the variance of β̃ (under homoskedasticity):

var(β̃ | X) = A′E[ee′ | X]A = σ²A′A



Gauss-Markov theorem

β̃ = A′y
var(β̃ | X) = σ²A′A

We now show the theorem. We need to show that, for any linear unbiased estimator (A′X = I),

A′A − (X′X)⁻¹ ≥ 0.

The trick is setting C = A − X(X′X)⁻¹. Note that C′X = 0. Rewriting:

A′A − (X′X)⁻¹ = (C + X(X′X)⁻¹)′(C + X(X′X)⁻¹) − (X′X)⁻¹
             = C′C + (X′X)⁻¹X′C + C′X(X′X)⁻¹ + (X′X)⁻¹ − (X′X)⁻¹
             = C′C ≥ 0.
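A numerical illustration of this argument (the design and names are ours): any matrix C with C′X = 0 can be generated using the annihilator M = I − X(X′X)⁻¹X′, and the resulting gap A′A − (X′X)⁻¹ should have only non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X = rng.normal(size=(n, k))
XtX_inv = np.linalg.inv(X.T @ X)

# Any linear unbiased estimator can be written A = X(X'X)^{-1} + C with C'X = 0.
M = np.eye(n) - X @ XtX_inv @ X.T          # annihilator matrix: MX = 0
C = M @ rng.normal(size=(n, k))            # arbitrary C satisfying C'X = 0
A = X @ XtX_inv + C                        # then A'X = I, so A'y is unbiased

gap = A.T @ A - XtX_inv                    # equals C'C by the argument above
print(np.linalg.eigvalsh(gap).min())       # non-negative, up to rounding error
```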



Estimation of error variance

The method of moments estimator of the error variance σ² = E[eᵢ²] is naturally:

σ̂² = (1/n) Σ_{i=1}^n êᵢ²

Under homoskedasticity, this estimator will be biased:

E[σ̂² | X] = ((n − k)/n) σ²

An obvious fix for the bias is to rescale the estimator:

s² = (1/(n − k)) Σ_{i=1}^n êᵢ²
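A short simulation sketch (parameter values are illustrative) that exhibits the bias factor (n − k)/n of σ̂² and the unbiasedness of s²:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 30, 4, 2.0                  # small n, so the bias is visible
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection onto the column space of X

reps = 20000
mom, unb = 0.0, 0.0
for r in range(reps):
    y = X @ np.ones(k) + np.sqrt(sigma2) * rng.normal(size=n)
    ehat = y - P @ y                       # OLS residuals
    mom += ehat @ ehat / n                 # sigma-hat^2
    unb += ehat @ ehat / (n - k)           # s^2
print(mom / reps, sigma2 * (n - k) / n)    # ~ 1.73: biased downward
print(unb / reps, sigma2)                  # ~ 2.0: unbiased
```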



Covariance matrix estimation with homoskedasticity

Remember that

Vβ̂ = var(β̂ | X) = (X′X)⁻¹ X′ E[ee′ | X] X (X′X)⁻¹, with D ≡ E[ee′ | X].

Under homoskedasticity, D = σ²Iₙ, so:

Vβ̂ = σ²(X′X)⁻¹.

A natural unbiased estimator for Vβ̂ is then:

V̂β̂⁰ = s²(X′X)⁻¹.
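In code this estimator is one line. A minimal helper, assuming the OLS residuals ehat are already computed (the function name is ours):

```python
import numpy as np

def homoskedastic_cov(X, ehat):
    """V-hat^0 = s^2 (X'X)^{-1}, with s^2 = (1/(n-k)) * sum of squared residuals."""
    n, k = X.shape
    s2 = (ehat @ ehat) / (n - k)
    return s2 * np.linalg.inv(X.T @ X)
```

Standard errors for the individual coefficients are the square roots of the diagonal, e.g. np.sqrt(np.diag(homoskedastic_cov(X, ehat))).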



Covariance matrix estimation with heteroskedasticity

Vβ̂ = var(β̂ | X) = (X′X)⁻¹ X′ E[ee′ | X] X (X′X)⁻¹, with D ≡ E[ee′ | X].

If the true errors were observed, an ideal estimator would be:

V̂β̂ⁱᵈᵉᵃˡ = (X′X)⁻¹ (Σ_{i=1}^n xᵢxᵢ′eᵢ²) (X′X)⁻¹.

Why? Note that

E[V̂β̂ⁱᵈᵉᵃˡ | X] = (X′X)⁻¹ (Σ_{i=1}^n xᵢxᵢ′ E[eᵢ² | X]) (X′X)⁻¹
              = (X′X)⁻¹ (Σ_{i=1}^n xᵢxᵢ′σᵢ²) (X′X)⁻¹
              = (X′X)⁻¹ (X′DX) (X′X)⁻¹ = Vβ̂



Covariance matrix estimation with heteroskedasticity

A simple feasible estimator under heteroskedasticity was proposed by Eicker and White:

V̂β̂ᴴᶜ⁰ = (X′X)⁻¹ (Σ_{i=1}^n xᵢxᵢ′êᵢ²) (X′X)⁻¹.

However, we know êᵢ² is biased towards zero, so one may wish to inflate the estimator:

V̂β̂ᴴᶜ¹ = (n/(n − k)) (X′X)⁻¹ (Σ_{i=1}^n xᵢxᵢ′êᵢ²) (X′X)⁻¹
V̂β̂ᴴᶜ² = (X′X)⁻¹ (Σ_{i=1}^n (1 − hᵢᵢ)⁻¹ xᵢxᵢ′êᵢ²) (X′X)⁻¹
V̂β̂ᴴᶜ³ = (X′X)⁻¹ (Σ_{i=1}^n (1 − hᵢᵢ)⁻² xᵢxᵢ′êᵢ²) (X′X)⁻¹

where hᵢᵢ = xᵢ′(X′X)⁻¹xᵢ is the i-th leverage value. HC2 and HC3 are more conservative than HC0:

V̂β̂ᴴᶜ⁰ < V̂β̂ᴴᶜ² < V̂β̂ᴴᶜ³
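A sketch implementing all four estimators in numpy (the function name and interface are ours; mature libraries such as statsmodels ship these estimators under the same HC0–HC3 labels):

```python
import numpy as np

def hc_cov(X, ehat, kind="HC1"):
    """Heteroskedasticity-robust covariance estimators HC0-HC3."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage values h_ii
    u2 = ehat ** 2
    scale = {"HC0": u2,
             "HC1": u2 * n / (n - k),
             "HC2": u2 / (1 - h),
             "HC3": u2 / (1 - h) ** 2}[kind]
    meat = (X * scale[:, None]).T @ X             # sum_i scale_i * x_i x_i'
    return XtX_inv @ meat @ XtX_inv
```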
Clustered Sampling

In many contexts, the random sampling assumption might be too strong. Observations in the same group often share unobserved factors that affect the outcomes we want to measure:

Students in the same class share teacher, location, and peer effects, which should make their achievement outcomes correlated;

Bidders in an auction may share common information about the object being sold, which will imply a correlation in their bidding behavior;

Can you think of more examples?

We will briefly discuss the implications of clustered data for standard covariance
estimation and how to make covariance estimates robust to this environment.



Basic notation

Now we need a new index to identify the clusters of data:

{(y_ig, x_ig) : g = 1, …, G; i = 1, …, n_g},

and we can partition the observations into the different clusters:

y_g = (y_1g, …, y_{n_g g})′ and X_g = (x_1g, …, x_{n_g g})′.

The regression equation for cluster g is

y_g = X_g β + e_g,

and stacking up all clusters we get back

y = Xβ + e.



Unbiasedness of OLS in the clustered world

Assumption (Independent clusters)

The clusters (y_g, X_g) are mutually independent across g.

Assumption (Conditional mean (in cluster))

Cluster errors are conditionally mean zero for all g = 1, …, G:

E[e_g | X_g] = 0.

Theorem
Given the assumptions above, the OLS estimator is still unbiased in the clustered world:

E[β̂ | X] = β.

Variance of the OLS estimator

Theorem
The variance of the OLS estimator, given the cluster assumptions above, is

Vβ̂ = var(β̂ | X) = (X′X)⁻¹ (Σ_{g=1}^G X_g′ E[e_g e_g′ | X_g] X_g) (X′X)⁻¹.

 
When E[e_ig² | X_g] = σ² and E[e_ig e_jg | X_g] = ρσ² for j ≠ i, regressors do not vary within clusters, and all clusters have N observations, the formula above simplifies to

Vβ̂ = (X′X)⁻¹ σ² (1 + ρ(N − 1)).

For example, with ρ = 0.2 and N = 30, the variance is inflated by a factor of 1 + 0.2 × 29 = 6.8 relative to the i.i.d. case, so standard errors that ignore clustering would be understated by a factor of √6.8 ≈ 2.6.
Arellano’s variance estimator robust to clustering is a direct extension of the Eicker-White estimator to the clustered world:

V̂β̂ = (X′X)⁻¹ (Σ_{g=1}^G X_g′ ê_g ê_g′ X_g) (X′X)⁻¹
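A sketch of this estimator in numpy (the function name is ours), looping over clusters to build the middle “meat” matrix:

```python
import numpy as np

def cluster_cov(X, ehat, cluster_ids):
    """Arellano-style cluster-robust covariance for the OLS estimator."""
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        sel = cluster_ids == g
        s = X[sel].T @ ehat[sel]           # X_g' e-hat_g, a k-vector
        meat += np.outer(s, s)             # = X_g' e-hat_g e-hat_g' X_g
    return XtX_inv @ meat @ XtX_inv
```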
