
Econ 311: James J. Heckman

Panel Data Analysis


The notes on panel data analysis comprise three parts:
• Part I: Classical Models
• Part II: Feasible Estimators
• Part III: Modern Moment Estimation

Part I. Classical Models


This lecture reviews the early econometric literature on panel data analysis.
The lecture consists of five sections:
• Section 1: Defines the model and the error structure;
• Section 2: Derives the expression for the GLS estimator for panel data;
• Section 3: Demonstrates that the GLS estimator can be expressed as a weighted average of the "Within" and "Between" estimators;
• Section 4: Shows that OLS with individual dummy variables (LSDV) reproduces the "Within" estimator and derives its standard errors;
• Section 5: Derives the standard error of the "Between" estimator.

1 Standard panel data model


This section defines the model and the structure of the error term in the
standard panel data model:

$$Y_{it} = X_{it}\beta + \varepsilon_{it}, \qquad i = 1,\dots,I, \quad t = 1,\dots,T,$$
$$\varepsilon_{it} = f_i + U_{it},$$
$$E(f_i) = E(U_{it}) = 0,$$
$$E(f_i^2) = \sigma_f^2, \qquad E(U_{it}^2) = \sigma_u^2,$$
$$E(f_i f_{i'}) = 0, \quad i \neq i'.$$

U_{it} is iid over i and t. Assume that X_{it} is strictly exogenous, i.e.:
• E(U_{it} + f_i | X_{i1},...,X_{iT}) = 0 ∀ t;
• the distribution of X_i = (X_{i1},...,X_{iT})' does not depend on β.

These assumptions imply:

$$\operatorname{Cov}(\varepsilon_{it}, \varepsilon_{it'}) = \sigma_f^2 \;\;(t \neq t'), \qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{i't'}) = 0 \;\;(i \neq i').$$

Collecting each individual's time series into one vector, we get the representation:

$$Y_i = X_i\beta + \varepsilon_i, \qquad i = 1,\dots,I,$$

where Y_i is a T × 1 vector and X_i is a T × K matrix.
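To make the setup concrete, here is a minimal simulation sketch of this error-components model in Python (NumPy). All parameter values (I, T, K, β, σ_f, σ_u) are illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
I, T, K = 500, 5, 2                      # illustrative sizes
beta = np.array([1.0, -0.5])             # illustrative coefficient vector
sigma_f, sigma_u = 1.0, 1.0

X = rng.normal(size=(I, T, K))           # strictly exogenous regressors X_it
f = sigma_f * rng.normal(size=(I, 1))    # individual effect f_i, constant over t
U = sigma_u * rng.normal(size=(I, T))    # idiosyncratic error U_it
eps = f + U                              # eps_it = f_i + U_it
Y = X @ beta + eps                       # Y_it = X_it beta + eps_it

# The implied covariance structure: Cov(eps_it, eps_it') = sigma_f^2 for t != t'.
print(np.cov(eps[:, 0], eps[:, 1])[0, 1])    # close to sigma_f^2 = 1
```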

2 GLS estimator for panel data


First define the intraclass correlation coefficient:

$$\rho \equiv \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2}.$$

Look at the covariance matrix of ε_i = (ε_{i1},...,ε_{iT})':

$$E(\varepsilon_i\varepsilon_i') = (\sigma_f^2 + \sigma_u^2)\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \equiv (\sigma_f^2 + \sigma_u^2)A.$$

Now stack all of the disturbances (in groups of T) so that ε = (ε_1',...,ε_I')':

$$E(\varepsilon\varepsilon') \equiv \Omega = (\sigma_f^2 + \sigma_u^2)\begin{pmatrix} A & 0 & \cdots & 0 \\ 0 & A & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A \end{pmatrix}.$$

Note that Ω is not a scalar multiple of I.
Now stacking the X_i into a supervector X and the Y_i into a supervector Y, the GLS estimator is:

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y).$$

Note that OLS applied to the data yields unbiased estimators of β (because of
the exogeneity of X_{it} with respect to ε_{it}) but inefficient ones (because
the errors are correlated within individuals, so Ω is non-spherical). Further,
standard computer programs produce the wrong standard errors. The correct
covariance matrix of the OLS estimator is:

$$(X'X)^{-1}(X'\Omega X)(X'X)^{-1},$$

whereas standard OLS output assumes it to be:

$$\sigma^2(X'X)^{-1}.$$

Accordingly, inferences based on the OLS standard errors are incorrect.
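As a hedged illustration on simulated data (illustrative parameters throughout), the sketch below computes β̂_GLS directly from Ω and the correct sandwich covariance for pooled OLS:

```python
import numpy as np

rng = np.random.default_rng(1)
I, T, K = 200, 4, 2
beta = np.array([1.0, -0.5])
sigma_f2 = sigma_u2 = 1.0
rho = sigma_f2 / (sigma_f2 + sigma_u2)

X = rng.normal(size=(I * T, K))
f = np.repeat(rng.normal(scale=np.sqrt(sigma_f2), size=I), T)
Y = X @ beta + f + rng.normal(scale=np.sqrt(sigma_u2), size=I * T)

A = (1 - rho) * np.eye(T) + rho * np.ones((T, T))       # equicorrelated block
Omega = (sigma_f2 + sigma_u2) * np.kron(np.eye(I), A)   # block-diagonal Omega

Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)

# Correct (sandwich) covariance of pooled OLS under this Omega,
# versus the sigma^2 (X'X)^{-1} that standard OLS output assumes:
XtX_inv = np.linalg.inv(X.T @ X)
V_ols_correct = XtX_inv @ (X.T @ Omega @ X) @ XtX_inv
print(beta_gls)
print(np.sqrt(np.diag(V_ols_correct)))
```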

3 Interpretations of the GLS estimator

3.1 GLS estimator as a weighted average of "Within" and "Between" estimators
Write A = (1 − ρ)I + ριι', where ι is a T × 1 vector of 1's, so that we have:

$$A = \begin{pmatrix} 1-\rho & 0 & \cdots & 0 \\ 0 & 1-\rho & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1-\rho \end{pmatrix} + \begin{pmatrix} \rho & \rho & \cdots & \rho \\ \rho & \rho & \cdots & \rho \\ \vdots & & & \vdots \\ \rho & \rho & \cdots & \rho \end{pmatrix}.$$

We also get (as can be verified by matrix multiplication and some manipulation) that:

$$A^{-1} = \lambda_1\iota\iota' + \lambda_2 I,$$

where:

$$\lambda_1 = \frac{-\rho}{(1-\rho)(1-\rho+T\rho)}, \qquad \lambda_2 = \frac{1}{1-\rho}.$$
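This formula for A^{-1} is easy to verify numerically; a quick sketch with illustrative values of T and ρ:

```python
import numpy as np

T, rho = 4, 0.3                                      # illustrative values
iota = np.ones((T, 1))
A = (1 - rho) * np.eye(T) + rho * (iota @ iota.T)
lam1 = -rho / ((1 - rho) * (1 - rho + T * rho))
lam2 = 1 / (1 - rho)
A_inv = lam1 * (iota @ iota.T) + lam2 * np.eye(T)    # lambda_1 ii' + lambda_2 I
print(np.allclose(A_inv, np.linalg.inv(A)))          # True
```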
What is the GLS estimator doing? We have:

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y),$$

with

$$\Omega^{-1} = (\sigma_f^2+\sigma_u^2)^{-1}\begin{pmatrix} A^{-1} & 0 & \cdots & 0 \\ 0 & A^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A^{-1} \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_I \end{pmatrix}, \quad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_I \end{pmatrix}.$$

(The scalar factor (σ_f² + σ_u²)^{-1} cancels in the GLS formula, so we may work with the blocks A^{-1} directly.)

Thus, we conclude that:

$$\hat\beta_{GLS} = \left(\sum_{i=1}^{I} X_i'A^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{I} X_i'A^{-1}Y_i\right). \tag{*}$$

Then using the expression for A^{-1} given above, we get:

$$\sum_{i=1}^{I} X_i'A^{-1}X_i = T\lambda_1 \sum_{i=1}^{I} \frac{X_i'\iota\iota'X_i}{T} + \lambda_2 \sum_{i=1}^{I} X_i'X_i. \tag{**}$$

Now define:

$$X_i^* \equiv \frac{X_i'\iota}{T}.$$
So X_i* is a K × 1 vector of time-series means. Now we can establish that the
GLS estimator is a function of within and between variation:

$$\text{Total variation} \equiv T_{XX} \equiv \sum_{i=1}^{I} X_i'X_i.$$

The within variation is the sum of the variation of the individual data around
their time-series means, i.e.:

$$\text{Within variation} \equiv W_{XX} \equiv \sum_{i=1}^{I}\left(X_i - \iota X_i^{*\prime}\right)'\left(X_i - \iota X_i^{*\prime}\right).$$

Now:

$$X_i - \iota X_i^{*\prime} = X_i - \frac{\iota\iota'}{T}X_i = \left[I - \frac{\iota\iota'}{T}\right]X_i,$$

and observing that:

$$\left[I - \frac{\iota\iota'}{T}\right]\left[I - \frac{\iota\iota'}{T}\right] = I - \frac{\iota\iota'}{T} \quad \text{(idempotent)},$$

the within variation is also given by:

$$W_{XX} = \sum_{i=1}^{I} X_i'\left(I - \frac{\iota\iota'}{T}\right)X_i.$$

Similarly, the between variation is defined as:

$$\text{Between variation} \equiv B_{XX} \equiv T\sum_{i=1}^{I} X_i^{*}X_i^{*\prime} = \sum_{i=1}^{I}\frac{X_i'\iota\iota'X_i}{T}.$$

So we get directly from the definitions: T_XX = W_XX + B_XX.
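A quick numerical check of the decomposition T_XX = W_XX + B_XX, using simulated regressors and the projection matrices defined above:

```python
import numpy as np

rng = np.random.default_rng(2)
I, T, K = 50, 6, 3
M = np.eye(T) - np.ones((T, T)) / T       # within projection I - ii'/T
P = np.ones((T, T)) / T                   # between projection ii'/T

TXX = np.zeros((K, K)); WXX = np.zeros((K, K)); BXX = np.zeros((K, K))
for _ in range(I):
    Xi = rng.normal(size=(T, K))
    TXX += Xi.T @ Xi                      # total variation
    WXX += Xi.T @ M @ Xi                  # within variation
    BXX += Xi.T @ P @ Xi                  # between variation
print(np.allclose(TXX, WXX + BXX))        # True
```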

Then we can rewrite the decomposition of the first term of the GLS estimator
(see equations (*) and (**) above) as:

$$\sum_{i=1}^{I} X_i'A^{-1}X_i = T\lambda_1\sum_{i=1}^{I}\frac{X_i'\iota\iota'X_i}{T} + \lambda_2\left[\sum_{i=1}^{I}\frac{X_i'\iota\iota'X_i}{T} + W_{XX}\right]$$
$$= \lambda_2 W_{XX} + (\lambda_2 + T\lambda_1)\sum_{i=1}^{I}\frac{X_i'\iota\iota'X_i}{T} = \lambda_2 W_{XX} + (\lambda_2 + T\lambda_1)B_{XX}.$$

There is a similar decomposition for the other term, and we get:

$$\hat\beta_{GLS} = \left[\lambda_2 W_{XX} + (\lambda_2 + T\lambda_1)B_{XX}\right]^{-1}\left[\lambda_2 W_{XY} + (\lambda_2 + T\lambda_1)B_{XY}\right],$$

where B_XY and W_XY are defined analogously to B_XX and W_XX, respectively.
Now we define:

$$\theta \equiv 1 + T\frac{\lambda_1}{\lambda_2} = 1 - \frac{T\rho}{1-\rho+T\rho} = \frac{1-\rho+T\rho-T\rho}{1-\rho+T\rho} = \frac{1-\rho}{1-\rho+T\rho}.$$

Dividing through by λ₂:

$$\Longrightarrow \hat\beta_{GLS} = [W_{XX} + \theta B_{XX}]^{-1}[W_{XY} + \theta B_{XY}].$$


Thus the GLS estimator can be seen as a weighted average of two estimators.
Consider first the within estimator. It is obtained simply by taking deviations
from the time-series means. Averaging the model over t for each i gives:

$$Y_i^* = X_i^{*\prime}\beta + f_i + U_i^*,$$

so that we have:

$$Y_{it} - Y_i^* = (X_{it} - X_i^{*\prime})\beta + U_{it} - U_i^*.$$

∴ subtracting Y_i* produces an equation free of f_i: demeaning eliminates the
fixed effects from the model. Based on this, we obtain the within estimator as:

$$\hat\beta_W = (W_{XX})^{-1}W_{XY}.$$

Similarly, we define the between estimator, which simply runs on the group
means:

$$\hat\beta_B = (B_{XX})^{-1}B_{XY}.$$

From these definitions:

$$(W_{XX})\hat\beta_W = W_{XY}, \qquad (B_{XX})\hat\beta_B = B_{XY}$$

$$\Longrightarrow [W_{XX} + \theta B_{XX}]\hat\beta_{GLS} = W_{XX}\hat\beta_W + \theta B_{XX}\hat\beta_B,$$

∴ we have that:

$$\hat\beta_{GLS} = [W_{XX} + \theta B_{XX}]^{-1}[W_{XX}\hat\beta_W + \theta B_{XX}\hat\beta_B].$$
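A hedged simulation sketch computing β̂_W, β̂_B, and β̂_GLS via the weights above, and checking the weighted-average identity (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
I, T, K = 300, 5, 2
beta = np.array([1.0, -0.5])
sigma_f2 = sigma_u2 = 1.0
rho = sigma_f2 / (sigma_f2 + sigma_u2)
theta = (1 - rho) / (1 - rho + T * rho)

X = rng.normal(size=(I, T, K))
Y = X @ beta + np.sqrt(sigma_f2) * rng.normal(size=(I, 1)) \
    + np.sqrt(sigma_u2) * rng.normal(size=(I, T))

M = np.eye(T) - np.ones((T, T)) / T                  # demeaning matrix
P = np.ones((T, T)) / T                              # averaging matrix
WXX = np.einsum('itk,ts,isl->kl', X, M, X)
WXY = np.einsum('itk,ts,is->k', X, M, Y)
BXX = np.einsum('itk,ts,isl->kl', X, P, X)
BXY = np.einsum('itk,ts,is->k', X, P, Y)

beta_w = np.linalg.solve(WXX, WXY)                   # within estimator
beta_b = np.linalg.solve(BXX, BXY)                   # between estimator
beta_gls = np.linalg.solve(WXX + theta * BXX, WXY + theta * BXY)

# GLS as the matrix-weighted average of beta_w and beta_b:
avg = np.linalg.solve(WXX + theta * BXX, WXX @ beta_w + theta * BXX @ beta_b)
print(np.allclose(beta_gls, avg))                    # True
```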

Note that for a scalar regressor, β̂_GLS lies between β̂_W and β̂_B (but not
necessarily so in the general multivariate case). Below, we consider some
special cases:
• Case 1: Suppose ρ = 0. Then λ₁ = 0 =⇒ θ = 1, and β̂_GLS is simply OLS.
• Case 2: Suppose ρ = 1. Then A is singular and A^{-1} does not exist. We
have a degenerate case: there is no idiosyncratic error term in the model.
• Case 3: If the regressors are fixed over the spell, there is no within
variation; in this case W_XX = 0 and GLS is simply the between estimator.
• Case 4: Suppose that T → ∞ with ρ ≠ 0. Then we have (see the sketch after
this list):

$$\lim_{T\to\infty}\frac{T\lambda_1}{\lambda_2} = \lim_{T\to\infty}\frac{-T\rho}{1-\rho+T\rho} = -1 \;\Longrightarrow\; \theta \to 0 \;\Longrightarrow\; \hat\beta_{GLS} \to \hat\beta_W,$$

i.e., here the within estimator is the efficient estimator.
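Case 4 can be seen numerically: with ρ held fixed, θ shrinks toward zero as T grows (values illustrative):

```python
rho = 0.5
for T in (2, 5, 50, 500):
    print(T, (1 - rho) / (1 - rho + T * rho))   # theta -> 0 as T -> infinity
```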

3.2 Alternative decomposition of the GLS estimator


The A^{-1} matrix itself can be written in an interesting fashion, providing
another interpretation of the GLS estimator.

We define A^{-1} ≡ λ₂(I − kιι'), where k is simply given by −λ₁/λ₂.
We then decompose (I − kιι') as follows:

$$(I - k\iota\iota') \equiv F'F \equiv [I - c\,\iota\iota'][I - c\,\iota\iota'] = I - c\,\iota\iota' - c\,\iota\iota' + Tc^2\iota\iota' = I - (2c - Tc^2)\iota\iota'$$

$$\Longrightarrow 2c - Tc^2 = -\frac{\lambda_1}{\lambda_2}, \qquad F = I - c\,\iota\iota'.$$

Solve to get:

$$c = \frac{1}{T}\left[1 - \sqrt{\frac{1-\rho}{1-\rho+\rho T}}\right].$$

∴ the GLS estimator, which is:

$$\hat\beta_{GLS} = \left[\sum_{i=1}^{I}X_i'A^{-1}X_i\right]^{-1}\left[\sum_{i=1}^{I}X_i'A^{-1}Y_i\right],$$

can equivalently be computed by transforming the data in the following way.
Pre-multiply Y_i = X_iβ + ε_i by F to obtain:

$$FY_i = FX_i\beta + F\varepsilon_i.$$

OLS on this transformed regression yields the GLS estimator. Note that:

$$FY_i = Y_i - (cT)\,\iota Y_i^*, \qquad FX_i = X_i - (cT)\,\iota X_i^{*\prime},$$

where Y_i* = ι'Y_i/T and X_i* = X_i'ι/T.
Here again, if ρ = 0 then c = 0 and GLS is OLS applied to the original data.
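A hedged check that OLS on the F-transformed (quasi-demeaned) data reproduces GLS; this works because F'F = A^{-1}/λ₂, so the scalar λ₂ cancels (simulated data, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
I, T, K = 200, 4, 2
beta = np.array([1.0, -0.5])
sigma_f2 = sigma_u2 = 1.0
rho = sigma_f2 / (sigma_f2 + sigma_u2)

X = rng.normal(size=(I, T, K))
Y = X @ beta + np.sqrt(sigma_f2) * rng.normal(size=(I, 1)) \
    + np.sqrt(sigma_u2) * rng.normal(size=(I, T))

c = (1 - np.sqrt((1 - rho) / (1 - rho + rho * T))) / T
F = np.eye(T) - c * np.ones((T, T))                    # F = I - c ii'

FX = np.einsum('ts,isk->itk', F, X).reshape(I * T, K)  # F X_i, stacked
FY = (Y @ F.T).reshape(I * T)                          # F Y_i, stacked
beta_F = np.linalg.lstsq(FX, FY, rcond=None)[0]        # OLS on transformed data

# Direct GLS using A^{-1} = lam1 ii' + lam2 I:
lam1 = -rho / ((1 - rho) * (1 - rho + T * rho))
lam2 = 1 / (1 - rho)
A_inv = lam1 * np.ones((T, T)) + lam2 * np.eye(T)
XAX = np.einsum('itk,ts,isl->kl', X, A_inv, X)
XAY = np.einsum('itk,ts,is->k', X, A_inv, Y)
print(np.allclose(beta_F, np.linalg.solve(XAX, XAY)))  # True
```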

4 OLS controlling for fixed effects (LSDV model)


The panel data model considered here is:

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix} = \begin{pmatrix} \iota \\ 0 \\ \vdots \\ 0 \end{pmatrix} f_1 + \begin{pmatrix} 0 \\ \iota \\ \vdots \\ 0 \end{pmatrix} f_2 + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ \iota \end{pmatrix} f_N + \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}\beta + \begin{pmatrix} U_1 \\ \vdots \\ U_N \end{pmatrix},$$

$$Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{iT} \end{pmatrix}, \qquad X_i = \begin{pmatrix} X_{i1} \\ \vdots \\ X_{iT} \end{pmatrix}, \qquad E(U_iU_i') = \sigma^2 I_T, \quad E(U_iU_j') = 0, \; i \neq j.$$

If we use the following least squares regression setup, we can obtain unbiased
estimators of the f_i's and β:

$$Y = [d_1\; d_2\; \dots\; d_N]f + X\beta + U = Df + X\beta + U,$$

where d_i is an NT × 1 dummy variable equal to one for the T observations of
the ith unit and zero elsewhere. This is referred to as the least squares
dummy variable (LSDV) model. Since this model captures the presence of the
fixed effect, the standard results for the OLS model apply to the LSDV model,
so that the estimators of β are unbiased and consistent.
Now using the results on partitioned inverses (refer to Part IV of the lectures
on asymptotic theory), we can directly show that the β̂ estimator here is the
same as the within estimator. We know that:

$$\hat\beta_{LSDV} = [X'M_dX]^{-1}[X'M_dY], \qquad \text{where } M_d = I - D(D'D)^{-1}D'.$$

Given the special structure of D, we get:

$$M_d = \begin{pmatrix} F & 0 & \cdots & 0 \\ 0 & F & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & F \end{pmatrix}, \qquad \text{where } F \equiv I - \frac{\iota\iota'}{T}$$

(here F denotes the demeaning matrix, not the transform of Section 3.2). Then
from the definitions in the section above, we have:

$$\hat\beta_{LSDV} = \left[\sum_{i=1}^{I}X_i'FX_i\right]^{-1}\left[\sum_{i=1}^{I}X_i'FY_i\right] = \hat\beta_W \quad \text{(the within estimator)}.$$
i=1 i=1
Now define:

$$B \equiv \sum_{i=1}^{I} X_i'FX_i.$$

Then we have:

$$\operatorname{Var}(\hat\beta_{LSDV}) = B^{-1}\,E\!\left(\sum_{i=1}^{I}X_i'FU_iU_i'FX_i\right)B^{-1} = \sigma_u^2\,B^{-1}\left(\sum_{i=1}^{I}X_i'FFX_i\right)B^{-1} = \sigma_u^2\left(\sum_{i=1}^{I}X_i'FX_i\right)^{-1}$$

$$= \operatorname{Var}(\hat\beta_W), \qquad \text{since } \frac{\iota\iota'}{T}\frac{\iota\iota'}{T} = \frac{\iota\iota'}{T} \text{ implies } FF = F.$$

5 Standard error for the between estimator


Recall that we have:

$$\hat\beta_B = \left[\sum_{i=1}^{N}X_i'\frac{\iota\iota'}{T}X_i\right]^{-1}\left[\sum_{i=1}^{N}X_i'\frac{\iota\iota'}{T}Y_i\right]$$
$$= \beta + \left[\sum_{i=1}^{N}X_i'\frac{\iota\iota'}{T}X_i\right]^{-1}\left[\sum_{i=1}^{N}X_i'\frac{\iota\iota'}{T}U_i + \sum_{i=1}^{N}X_i'\iota f_i\right],$$

using (ιι'/T)ι = ι. Note that β̂_B is uncorrelated with β̂_W because f_i ⊥⊥ U_j ∀ i, j.

$$\operatorname{Var}(\hat\beta_B) = \left(\sum_i\frac{X_i'\iota\iota'X_i}{T}\right)^{-1}\left[\sigma_u^2\sum_i\frac{X_i'\iota\iota'X_i}{T} + \sigma_f^2\,T\sum_i\frac{X_i'\iota\iota'X_i}{T}\right]\left(\sum_i\frac{X_i'\iota\iota'X_i}{T}\right)^{-1}$$
$$= \left(\sigma_f^2 + \frac{\sigma_u^2}{T}\right)\left(\sum_i\frac{X_i'\iota\iota'X_i}{T^2}\right)^{-1} = \left(\sigma_f^2 + \frac{\sigma_u^2}{T}\right)\left(\sum_i X_i^*X_i^{*\prime}\right)^{-1}.$$
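A hedged Monte Carlo check of the between-estimator variance formula for a scalar regressor, holding the design fixed across replications (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
I, T = 200, 5
sigma_f = sigma_u = 1.0
beta = 1.0
x = rng.normal(size=(I, T))                 # fixed design, scalar regressor
xbar = x.mean(axis=1)                       # X_i^* (group means)

draws = []
for _ in range(2000):
    f = sigma_f * rng.normal(size=I)
    ubar = sigma_u * rng.normal(size=(I, T)).mean(axis=1)
    ybar = xbar * beta + f + ubar
    draws.append((xbar @ ybar) / (xbar @ xbar))   # between estimator on means
print(np.var(draws))                              # simulated Var(beta_B)
print((sigma_f**2 + sigma_u**2 / T) / (xbar @ xbar))   # formula above
```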
