Panel Data Analysis of Microeconomic Decisions: Fall 2020

2. static models
Overview
2.1 Fixed effects model
2.2 Random effects model
2.3 Discussion
2.4 Hausman Test
2.5 Goodness-of-fit
2.6 IV for panel
2.7 Robust inference
2.8 Testing for heteroskedasticity and autocorrelation
2.1 Fixed effects model (FE)
• fixed effects analysis allows for arbitrary correlation between an unobserved individual-specific effect $\alpha_i$ and the $x_{it}$
• the unobserved effects model is
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \quad \text{where } u_{it} \overset{iid}{\sim} (0, \sigma_u^2) \tag{6}$$
where $\alpha_i$ is the individual-specific intercept: the intercept varies over $i$ and is thus captured by the $\alpha_i$
• for consistency of the FE estimator we need a set of assumptions on the behavior of the error term $u_{it}$
• the most important assumption is strict exogeneity,
$$E(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = 0, \quad t = 1, \ldots, T$$
Excursus: Strict exogeneity
• in its most revealing form, strict exogeneity in an unobserved effects model is
$$E(y_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \alpha_i) = E(y_{it} \mid x_{it}, \alpha_i) = x_{it}'\beta + \alpha_i$$
– 2nd equality: functional form assumption on $E(y_{it} \mid x_{it}, \alpha_i)$
– 1st equality: interpretation of strict exogeneity: "once we control for $x_{it}$ and $\alpha_i$, $x_{is}$ has no partial effect on $y_{it}$ for $s \neq t$"
• strict exogeneity can be stated in terms of the idiosyncratic errors $u_{it}$
$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \alpha_i) = 0, \quad t = 1, \ldots, T$$
• it implies that the explanatory variables in each time period are uncorrelated with the idiosyncratic error in each time period
$$E(x_{it} u_{is}) = 0, \quad s, t = 1, \ldots, T$$
• this assumption is the key for OLS to consistently estimate $\beta$
• this assumption is much stronger than assuming zero contemporaneous correlation, $E(x_{it} u_{it}) = 0$, $t = 1, \ldots, T$, which we typically assume for OLS with cross-section data
• note: it still allows for arbitrary correlation between $\alpha_i$ and $x_{it}$ for all $t$ (we see later how to deal with this)
• a strictly exogenous variable is not allowed to depend on current, future, or past values of the error term
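The step from the conditional-mean statement to zero correlation uses the law of iterated expectations. A short derivation for any pair of periods $s, t$, using only the symbols defined above:

```latex
\begin{align*}
E(x_{is} u_{it})
  &= E\bigl[ E(x_{is} u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) \bigr] \\
  &= E\bigl[ x_{is}\, E(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) \bigr] \\
  &= E[x_{is} \cdot 0] = 0, \qquad s, t = 1, \ldots, T .
\end{align*}
```

The converse does not hold in general: zero correlation in every pair of periods is weaker than a zero conditional mean.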
• Exercise:
1. The assumption of strict exogeneity cannot hold if lagged dependent variables are included in the specification. Why? Illustrate the problem using $y_{it} = x_{it}'\beta + u_{it}$ and let $x_{it} = y_{i,t-1}$.
2. Consider the following strict exogeneity condition: $E(y_{it} \mid x_{i1}, \ldots, x_{iT}) = E(y_{it} \mid x_{it})$. Is this sufficient for strict exogeneity in the unobserved effects model? Discuss.
FE model: Within transformation
• how can we obtain a consistent estimate $\hat{\beta}$ for Equation (6) under strict exogeneity, $E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \alpha_i) = 0$, $t = 1, \ldots, T$?
• solution: transform Equation (6) to eliminate the unobserved effect $\alpha_i$
• within transformation: yields observations in deviations from individual means
1. calculate time averages, e.g. $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$, of all variables in Equation (6)
$$\bar{y}_i = \bar{x}_i'\beta + \alpha_i + \bar{u}_i$$
2. subtract the time means from Equation (6)
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (u_{it} - \bar{u}_i) \tag{7}$$
• result: regression model in deviations from individual means, from which $\alpha_i$ has been eliminated
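The two steps of the within transformation can be sketched in a few lines of Python for a single unit with a scalar regressor. This is only an illustration: the values of `alpha`, `beta`, and `T` are made up, and the error draws are simulated.

```python
import random

rng = random.Random(0)           # fixed seed so the run is reproducible
beta, T = 2.0, 4                 # hypothetical slope and number of periods
alpha = 5.0                      # unobserved effect of one unit i

x = [rng.gauss(0, 1) for _ in range(T)]
u = [rng.gauss(0, 1) for _ in range(T)]
y = [alpha + beta * x_t + u_t for x_t, u_t in zip(x, u)]   # Equation (6)

def demean(values):
    """Subtract the time average from every observation of one unit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

y_dm, x_dm, u_dm = demean(y), demean(x), demean(u)

# Equation (7): demeaned y equals beta * demeaned x + demeaned u,
# whatever value alpha takes.
max_gap = max(abs(yd - (beta * xd + ud))
              for yd, xd, ud in zip(y_dm, x_dm, u_dm))
print(f"largest deviation from Equation (7): {max_gap:.2e}")
```

Changing `alpha` to any other number leaves `y_dm` unchanged up to floating-point rounding, which is exactly the sense in which the transformation eliminates the unobserved effect.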
FE model & OLS
• we can now apply pooled OLS to the transformed model in Equation (7)
• if strict exogeneity holds, OLS provides us with a consistent estimate of $\beta$
• necessary for consistency is that the key pooled OLS assumption, $E(x_{it} u_{it}) = 0$, holds for the transformed quantities
$$E[(x_{it} - \bar{x}_i)(u_{it} - \bar{u}_i)] = 0, \quad t = 1, \ldots, T \tag{8}$$
– under strict exogeneity, $u_{it}$ is uncorrelated with $x_{is}$ for all $s, t = 1, \ldots, T$
– it follows that $u_{it}$ and $\bar{u}_i$ are uncorrelated with $x_{it}$ and $\bar{x}_i$ for $t = 1, \ldots, T$
• Equation (8) holds under strict exogeneity → OLS is consistent
• under strict exogeneity we can say even more than Equation (8):
– we know that $E(u_{it} - \bar{u}_i \mid x_{it}) = E(u_{it} \mid x_{it}) - E(\bar{u}_i \mid x_{it}) = 0$, because every element of $\bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$ has conditional mean zero given $x_{it}$
– and in turn this implies that $E(u_{it} - \bar{u}_i \mid x_{it} - \bar{x}_i) = 0$, since each $(x_{it} - \bar{x}_i)$ is just a function of $x_{i1}, \ldots, x_{iT}$
• under strict exogeneity the OLS estimator for the transformed model is unbiased
• Equation (8) does not hold if we assume something weaker than strict exogeneity
– for example, assuming only $E(x_{it} u_{it}) = 0$ does not ensure that $x_{is}$ is uncorrelated with $u_{it}$ for $s \neq t$
– the problem is that $\bar{u}_i$ and $\bar{x}_i$ are time averages and thus functions of the values in all periods $t = 1, \ldots, T$
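A small simulation can make the failure of Equation (8) under contemporaneous exogeneity concrete. The sketch below assumes a hypothetical AR(1) design in which the regressor is the lagged dependent variable, so that $x_{it} = y_{i,t-1}$ is uncorrelated with $u_{it}$ but correlated with earlier errors. The parameter values, the start-up value `y_prev`, and the helper `within_slope` are all illustrative choices, not part of the lecture.

```python
import random

def within_slope(panel):
    """Within (FE) slope for a scalar regressor: sum(x_dm * y_dm) / sum(x_dm^2)."""
    num = den = 0.0
    for xs, ys in panel:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

rng = random.Random(1)
rho, N, T = 0.5, 2000, 5           # hypothetical AR(1) coefficient and panel size

panel = []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    y_prev = alpha / (1 - rho) + rng.gauss(0, 1)   # start-up value y_i0
    xs, ys = [], []
    for _period in range(T):
        y_now = alpha + rho * y_prev + rng.gauss(0, 1)
        xs.append(y_prev)          # regressor x_it = y_{i,t-1}
        ys.append(y_now)
        y_prev = y_now
    panel.append((xs, ys))

rho_fe = within_slope(panel)
print(f"true rho = {rho}, within estimate = {rho_fe:.3f}")
```

With a short panel the within estimate falls clearly below the true coefficient even though $N$ is large, because $\bar{u}_i$ contains errors that fed into later values of the regressor.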
FE estimator
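• the FE estimator, $\hat{\beta}_{FE}$, is the OLS estimator from the regression of
$$y_{it} - \bar{y}_i \ \text{ on } \ x_{it} - \bar{x}_i, \quad t = 1, \ldots, T; \ i = 1, \ldots, N$$
• assume that $\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'$ is invertible (no perfect multicollinearity; a rank condition of this kind must be maintained for all moment estimators)
• the FE estimator is given by
$$\hat{\beta}_{FE} = \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) \tag{9}$$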
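As a rough illustration of Equation (9), the following sketch computes the FE estimator for a scalar regressor ($K = 1$), where the matrix inverse reduces to a division, and compares it with pooled OLS on the untransformed data. The data-generating process is an assumption made for the example: $x_{it}$ is built to be correlated with $\alpha_i$.

```python
import random

rng = random.Random(2)
beta, N, T = 1.0, 2000, 5          # hypothetical design, scalar regressor (K = 1)

panel = []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(T)]      # x_it correlated with alpha_i
    ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
    panel.append((xs, ys))

# Equation (9) with K = 1: ratio of two double sums over i and t.
num = den = 0.0
for xs, ys in panel:
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den += sum((x - x_bar) ** 2 for x in xs)
beta_fe = num / den

# Pooled OLS with a common intercept on the untransformed data.
all_x = [x for xs, _ in panel for x in xs]
all_y = [y for _, ys in panel for y in ys]
mx, my = sum(all_x) / len(all_x), sum(all_y) / len(all_y)
beta_pols = (sum((x - mx) * (y - my) for x, y in zip(all_x, all_y))
             / sum((x - mx) ** 2 for x in all_x))

print(f"beta = {beta}, FE = {beta_fe:.3f}, pooled OLS = {beta_pols:.3f}")
```

In this design the pooled OLS slope picks up the correlation between $x_{it}$ and $\alpha_i$, while the within estimator stays close to the true $\beta$.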
What about the αi?
• allowing for arbitrary correlation between $x_{it}$ and $\alpha_i$ comes at a price
• the within transformation removes $\alpha_i$, so it does not directly deliver an estimate $\hat{\alpha}_i$ of the individual-specific effect
– we cannot obtain coefficient estimates for time-constant observables, e.g. gender, race, or other time-fixed attributes
– reason: it is not possible to distinguish the effects of time-constant observables from the time-constant unobservables in $\alpha_i$
• So can we never obtain an estimate $\hat{\alpha}_i$?
Dummy variable regression
• traditional approaches view the $\alpha_i$ as parameters to be estimated along with $\beta$
• to do so, define $N$ dummy variables, one for each $i$: $d_{ij} = 1$ if $i = j$ and 0 otherwise, and specify
$$y_{it} = \sum_{j=1}^{N} \alpha_j d_{ij} + x_{it}'\beta + u_{it},$$
where $\alpha_1$ is the coefficient on $d_{i1}$, $\alpha_2$ the coefficient on $d_{i2}$, ...
• it is not very attractive to estimate such a model, since one has to estimate $N + K$ parameters from $N \times T$ observations
• moreover, the $\hat{\alpha}_i$ are not consistent for $N \to \infty$
– for each new $i$ a new $\alpha_i$ is added: information does not accumulate as $N \to \infty$
– consistency of $\hat{\alpha}_i$ requires $T \to \infty$, because information about $\alpha_i$ accumulates only within unit $i$
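The dummy variable regression and the within regression give the same slope estimate, which can be checked numerically. The sketch below is a minimal standard-library illustration under assumed values ($N = 5$, $T = 4$, $K = 1$); the helper `solve` is a small hand-written Gaussian elimination used only to keep the example self-contained.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

rng = random.Random(3)
beta, N, T = 1.0, 5, 4             # hypothetical small panel, K = 1
panel, design, outcome = [], [], []
for i in range(N):
    alpha = rng.gauss(0, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(T)]
    ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
    panel.append((xs, ys))
    for x_it, y_it in zip(xs, ys):
        dummies = [1.0 if j == i else 0.0 for j in range(N)]
        design.append(dummies + [x_it])    # regressors: d_i1, ..., d_iN, x_it
        outcome.append(y_it)

# Normal equations (X'X) z = X'y for the N + K = 6 parameters.
k = N + 1
xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
xty = [sum(r[a] * y_it for r, y_it in zip(design, outcome)) for a in range(k)]
coef = solve(xtx, xty)
alpha_hat, beta_lsdv = coef[:N], coef[N]

# Within estimator on the same data.
num = den = 0.0
for xs, ys in panel:
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    num += sum((x - x_bar) * (y_it - y_bar) for x, y_it in zip(xs, ys))
    den += sum((x - x_bar) ** 2 for x in xs)
beta_within = num / den

print(f"dummy variable slope = {beta_lsdv:.6f}, within slope = {beta_within:.6f}")
```

The example also shows the practical drawback named above: the system to solve has $N + K$ unknowns, so it grows with every additional cross-section unit, whereas the within regression only ever has $K$.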
Calculation and usefulness of α̂i
• since $\hat{\alpha}_i$ is the intercept for cross-section unit $i$, we can obtain it by
$$\hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}_{FE}$$
• the $\hat{\alpha}_i$ can then be used to compute sample statistics to get an idea of how heterogeneity is distributed in the population
• Example: use the $\hat{\alpha}_i$ to compute the sample average $\hat{\mu}_\alpha$
– $\hat{\mu}_\alpha \equiv N^{-1}\sum_{i=1}^{N} \hat{\alpha}_i = N^{-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{x}_i'\hat{\beta}_{FE})$, a consistent estimator of $\mu_\alpha = E(\alpha_i)$
– although we cannot estimate each $\alpha_i$ consistently, we can use the $\hat{\alpha}_i$ to estimate features of the population distribution of $\alpha_i$
• Example using estimates of $\alpha_i$: Guner, Kulikova & Llull (2018): "Marriage and health: Selection, protection, and assortative mating" (link)
• Stata: the reported intercept is an estimate of average heterogeneity
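The contrast between the noisy individual $\hat{\alpha}_i$ and the well-behaved average $\hat{\mu}_\alpha$ can be seen in a simulation. The sketch assumes a hypothetical design with $E(\alpha_i) = 2$, unit error variance, and a scalar regressor; none of these values come from the lecture.

```python
import random

rng = random.Random(4)
beta, mu_alpha, N, T = 1.0, 2.0, 2000, 4     # hypothetical design, K = 1

alphas, panel = [], []
for _ in range(N):
    alpha = rng.gauss(mu_alpha, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(T)]
    ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
    alphas.append(alpha)
    panel.append((xs, ys))

# Within estimator of beta (scalar version of Equation (9)).
num = den = 0.0
for xs, ys in panel:
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den += sum((x - x_bar) ** 2 for x in xs)
beta_fe = num / den

# alpha_hat_i = y_bar_i - x_bar_i * beta_fe, one value per unit.
alpha_hat = [sum(ys) / T - (sum(xs) / T) * beta_fe for xs, ys in panel]

mu_hat = sum(alpha_hat) / N                   # sample average of the alpha_hat_i
mse = sum((a_hat - a) ** 2 for a_hat, a in zip(alpha_hat, alphas)) / N

print(f"mu_alpha = {mu_alpha}, mu_hat = {mu_hat:.3f}")
print(f"mean squared error of the individual alpha_hat_i: {mse:.3f}")
```

In this design the error in each $\hat{\alpha}_i$ is dominated by $\bar{u}_i$, whose variance is $\sigma_u^2 / T$ and does not shrink as $N$ grows, while the average over units settles near $\mu_\alpha$.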
Asymptotic inference with fixed effects
• for the FE estimator to be efficient, we have to make additional assumptions about $u_{it}$
• in addition to $E(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = 0$ we assume
– homoskedasticity: errors have constant variance across $t$,
$$\mathrm{Var}(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = E(u_{it}^2 \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = E(u_{it}^2) = \sigma_u^2, \quad t = 1, \ldots, T$$
– no serial correlation: errors have zero covariance across different periods,
$$E(u_{it} u_{is} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = E(u_{it} u_{is}) = 0, \quad \text{all } t \neq s$$
• however, efficiency of OLS in the transformed model requires that $\{(u_{it} - \bar{u}_i) : t = 1, \ldots, T\}$ are homoskedastic across $t$ and serially uncorrelated
• it is not obvious that this holds and we should check this (whiteboard)
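A sketch of that check, using only the two assumptions above together with $\mathrm{Cov}(u_{it}, \bar{u}_i) = \mathrm{Var}(\bar{u}_i) = \sigma_u^2 / T$:

```latex
\begin{align*}
\operatorname{Var}(u_{it} - \bar{u}_i)
  &= \sigma_u^2 - \frac{2\sigma_u^2}{T} + \frac{\sigma_u^2}{T}
   = \sigma_u^2 \left(1 - \frac{1}{T}\right), \\
\operatorname{Cov}(u_{it} - \bar{u}_i,\; u_{is} - \bar{u}_i)
  &= 0 - \frac{\sigma_u^2}{T} - \frac{\sigma_u^2}{T} + \frac{\sigma_u^2}{T}
   = -\frac{\sigma_u^2}{T}, \qquad t \neq s, \\
E\Bigl[\sum_{t=1}^{T} (u_{it} - \bar{u}_i)^2\Bigr]
  &= T \sigma_u^2 \left(1 - \frac{1}{T}\right) = (T-1)\,\sigma_u^2 .
\end{align*}
```

The demeaned errors are therefore homoskedastic but negatively correlated across periods, with correlation $-1/(T-1)$. A standard result, stated here without proof, is that the within estimator nevertheless remains efficient under these assumptions; the last line indicates why each unit contributes only $T - 1$ degrees of freedom when $\sigma_u^2$ is estimated.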
• the covariance matrix of the FE estimator is given by
$$\mathrm{Var}(\hat{\beta}_{FE}) = \sigma_u^2 \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right)^{-1}$$
• note: $\sigma_u^2$ is the error variance of $u_{it}$, not of $(u_{it} - \bar{u}_i)$
– a consistent estimator of $\sigma_u^2$ for the FE model is
$$\hat{\sigma}_{u,FE}^2 = SSR / [N(T-1) - K]$$
– the conventional pooled OLS estimator, by contrast, is
$$\hat{\sigma}_{u,POLS}^2 = SSR / (NT - K)$$
– the difference between $SSR/(NT - K)$ and $SSR/[N(T-1) - K]$ can be substantial for small $T$
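To see how much the choice of denominator matters with a short panel, the sketch below simulates data with a known $\sigma_u^2 = 1$ and $T = 3$, runs OLS on the demeaned data, and computes both variance estimators from the same SSR. The design and all variable names are assumptions made for the illustration.

```python
import math
import random

rng = random.Random(5)
beta, sigma_u, N, T, K = 1.0, 1.0, 500, 3, 1   # hypothetical design

panel = []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(T)]
    ys = [alpha + beta * x + rng.gauss(0, sigma_u) for x in xs]
    panel.append((xs, ys))

# Demean within each unit, then run OLS on the demeaned data.
x_dm, y_dm = [], []
for xs, ys in panel:
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    x_dm += [x - x_bar for x in xs]
    y_dm += [y - y_bar for y in ys]
sxx = sum(x * x for x in x_dm)
beta_fe = sum(x * y for x, y in zip(x_dm, y_dm)) / sxx

ssr = sum((y - beta_fe * x) ** 2 for x, y in zip(x_dm, y_dm))
sigma2_fe = ssr / (N * (T - 1) - K)      # FE degrees of freedom
sigma2_pols = ssr / (N * T - K)          # conventional pooled OLS degrees of freedom

se_fe = math.sqrt(sigma2_fe / sxx)       # scalar version of the variance formula
se_pols = math.sqrt(sigma2_pols / sxx)

print(f"sigma2_fe = {sigma2_fe:.3f}, sigma2_pols = {sigma2_pols:.3f}")
print(f"se_fe = {se_fe:.4f}, se_pols = {se_pols:.4f}")
```

With $T = 3$ the pooled OLS denominator is roughly one and a half times the FE denominator, so the conventional estimate of $\sigma_u^2$ comes out well below its true value when applied to demeaned data.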
• Example: the Excel sheet illustrates differences in the degrees of freedom for $\hat{\sigma}_{u,FE}^2$ and $\hat{\sigma}_{u,POLS}^2$ for different $N$ and $T$
– we vary $N = 10$ to $N = 150$; $T = 3$, $T = 10$, and $T = 20$; $K = 5$
• note that typically $K$ is rather small compared to $N$ and $N \times T$, thus
– $\frac{1}{NT - K} \approx \frac{1}{NT}$, and
– $\frac{1}{N(T-1) - K} \approx \frac{1}{N(T-1)}$ for large $N$
• consequence: standard errors from pooled OLS on the demeaned data are too small; software corrects them by multiplying by the factor $\{(NT - K) / [N(T-1) - K]\}^{1/2}$
• Exercise: Is the FE error variance estimator identical to the one we obtain from a dummy variable regression? Why (not)? Explain.
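The degrees-of-freedom comparison is plain arithmetic and can be reproduced without the spreadsheet. The sketch below tabulates both denominators and the correction factor for the values of $T$ and $K$ named above; the intermediate value $N = 50$ and the function name `fe_correction` are choices made for this example.

```python
import math

def fe_correction(n, t, k):
    """Factor {(NT - K) / [N(T - 1) - K]}^(1/2) for pooled OLS standard errors."""
    return math.sqrt((n * t - k) / (n * (t - 1) - k))

K = 5
for T in (3, 10, 20):
    for N in (10, 50, 150):        # N = 50 is an illustrative intermediate value
        print(f"N={N:3d} T={T:2d}  df_POLS={N * T - K:5d}  "
              f"df_FE={N * (T - 1) - K:5d}  factor={fe_correction(N, T, K):.4f}")
```

For $N = 10$, $T = 3$, $K = 5$ the factor is $\sqrt{25/15} \approx 1.29$, so uncorrected standard errors would be understated by more than a fifth. The factor moves toward 1 as $T$ grows, in line with the approximation $\sqrt{T/(T-1)}$ for large $N$.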
First difference (FD) estimation
• remember the first lecture (section 1.1): for $T = 2$, $\alpha_i$ was eliminated by taking differences $t_2 - t_1$
• consider Equation (6) again
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \quad t = 1, \ldots, T$$
• first differencing (lagging the equation one period and subtracting) results in
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (u_{it} - u_{i,t-1})$$
$$\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}, \quad t = 2, \ldots, T \tag{10}$$
• as with the within transformation, first differencing eliminates $\alpha_i$
• the first-differenced model relies on the same strict exogeneity assumption as the within-transformed model
$$E(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i) = 0, \quad t = 1, \ldots, T$$
• under strict exogeneity, OLS on first differences is consistent because
$$E(\Delta x_{it} \Delta u_{it}) = E[(x_{it} - x_{i,t-1})(u_{it} - u_{i,t-1})] = 0, \quad t = 2, \ldots, T$$
• strict exogeneity also holds in the first-differenced equation, and thus this estimator is unbiased
$$E(\Delta u_{it} \mid \Delta x_{i2}, \Delta x_{i3}, \ldots, \Delta x_{iT}) = 0$$
• Exercise: Is strict exogeneity required to satisfy $E(\Delta x_{it} \Delta u_{it}) = 0$? Why (not)? Discuss.
• the FD estimator, $\hat{\beta}_{FD}$, is the OLS estimator from the regression of
$$\Delta y_{it} \ \text{ on } \ \Delta x_{it}, \quad t = 2, \ldots, T; \ i = 1, \ldots, N$$
• the FD estimator is given by (under a rank assumption)
$$\hat{\beta}_{FD} = \left( \sum_{i=1}^{N}\sum_{t=2}^{T} \Delta x_{it} \Delta x_{it}' \right)^{-1} \sum_{i=1}^{N}\sum_{t=2}^{T} \Delta x_{it} \Delta y_{it} \tag{11}$$
• note: by taking first differences we lose one time period, such that we have $T - 1$ time periods for each $i$
Errors in FD model
• the FE estimator is efficient in the class of estimators using the strict exogeneity assumption
– key to the efficiency of FE: homoskedasticity and no serial correlation in $u_{it}$
– under these assumptions, the FD estimator is less efficient than the FE estimator
• alternative assumption for FD: the first differences of the idiosyncratic errors are serially uncorrelated and have constant variance
$$E(e_i e_i' \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \alpha_i) = \sigma_e^2 I_{T-1}, \quad e_i \equiv (e_{i2}, \ldots, e_{iT})', \quad e_{it} \equiv \Delta u_{it}, \ t = 2, \ldots, T$$
• under this assumption: $u_{it} = u_{i,t-1} + e_{it}$
– thus, no serial correlation in $e_{it}$ implies that $u_{it}$ is a random walk
– random walk: the $e_{it}$ are iid, but $u_{it}$ exhibits substantial serial dependence
• FD is the opposite extreme of FE: if the $u_{it}$ follow a random walk, the FD estimator is the most efficient one under strict exogeneity
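The two sets of error assumptions can be compared directly through the second moments of $\Delta u_{it}$. A short worked comparison, using only the definitions above:

```latex
\begin{align*}
\text{$u_{it}$ serially uncorrelated:}\quad
  & \operatorname{Var}(\Delta u_{it}) = 2\sigma_u^2, \quad
    \operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = -\sigma_u^2, \quad
    \operatorname{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = -\tfrac{1}{2}, \\
\text{$u_{it}$ a random walk:}\quad
  & \operatorname{Var}(\Delta u_{it}) = \sigma_e^2, \quad
    \operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = 0 .
\end{align*}
```

Under the FE assumptions, differencing therefore induces negative first-order serial correlation, because adjacent differences share the term $u_{i,t-1}$ with opposite signs. Under the random walk assumption the differenced errors are well behaved and OLS on Equation (10) is the natural estimator.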
Asymptotic variance of the FD estimator
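• estimated asymptotic variance of $\hat{\beta}_{FD}$
$$\widehat{\mathrm{Avar}}(\hat{\beta}_{FD}) = \hat{\sigma}_e^2 \left( \sum_{i=1}^{N}\sum_{t=2}^{T} \Delta x_{it} \Delta x_{it}' \right)^{-1}$$
• to obtain a consistent estimate $\hat{\sigma}_e^2$ of $\sigma_e^2$, compute the OLS residuals $\hat{e}_{it} = \Delta y_{it} - \Delta x_{it}'\hat{\beta}_{FD}$ and obtain
$$\hat{\sigma}_e^2 = [N(T-1) - K]^{-1} \sum_{i=1}^{N}\sum_{t=2}^{T} \hat{e}_{it}^2$$
• unlike in the FE model, the degrees of freedom are correct, as dropping the first period appropriately captures the lost degrees of freedom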
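The FD estimator and its variance estimate can be sketched together for a scalar regressor. The example assumes a hypothetical design in which $u_{it}$ is a random walk with unit innovation variance, so that $\sigma_e^2 = 1$; all parameter values are illustrative.

```python
import math
import random

rng = random.Random(6)
beta, N, T, K = 1.0, 1000, 5, 1         # hypothetical design, scalar regressor

dx, dy = [], []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    u = 0.0
    xs, ys = [], []
    for _period in range(T):
        u += rng.gauss(0, 1)             # random walk: u_it = u_i,t-1 + e_it
        x = alpha + rng.gauss(0, 1)
        xs.append(x)
        ys.append(alpha + beta * x + u)
    # First differences for t = 2, ..., T (T - 1 observations per unit).
    dx += [xs[t] - xs[t - 1] for t in range(1, T)]
    dy += [ys[t] - ys[t - 1] for t in range(1, T)]

sxx = sum(d * d for d in dx)
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sxx          # Equation (11), K = 1

resid = [b - a * beta_fd for a, b in zip(dx, dy)]           # e_hat_it
sigma2_e = sum(e * e for e in resid) / (N * (T - 1) - K)
se_fd = math.sqrt(sigma2_e / sxx)

print(f"beta_fd = {beta_fd:.3f}, sigma2_e = {sigma2_e:.3f}, se = {se_fd:.4f}")
```

The differenced sample has exactly $N(T-1)$ rows, so the denominator $N(T-1) - K$ is what any OLS routine applied to the differenced data would use by default.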
Fixed effects versus first differencing
• for $T = 2$: FE and FD estimation produce identical estimates and inference
• for $T > 2$ and under strict exogeneity, the choice between FE and FD hinges on the assumption about the idiosyncratic errors
– FE: more efficient if the $u_{it}$ are serially uncorrelated
– FD: more efficient if $u_{it}$ follows a random walk
• if FE and FD produce different results, it is likely that strict exogeneity does not hold (the two estimators then have different probability limits)
• Stata example: compare the performance of the FE and FD estimators using GDP from slide 23 in chapter 1
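The equivalence for $T = 2$ is easy to verify numerically. The sketch below uses simulated data with a scalar regressor, no intercept in the differenced regression, and no period dummies in the within regression; these restrictions and all parameter values are assumptions of the example.

```python
import random

rng = random.Random(7)
beta, N = 1.0, 200                       # hypothetical design, T = 2, K = 1

panel = []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(2)]
    ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
    panel.append((xs, ys))

# FD: regress (y_i2 - y_i1) on (x_i2 - x_i1), no intercept.
dx = [xs[1] - xs[0] for xs, _ in panel]
dy = [ys[1] - ys[0] for _, ys in panel]
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# FE: regress demeaned y on demeaned x.
num = den = 0.0
for xs, ys in panel:
    x_bar, y_bar = sum(xs) / 2, sum(ys) / 2
    num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den += sum((x - x_bar) ** 2 for x in xs)
beta_fe = num / den

print(f"beta_fd = {beta_fd:.10f}, beta_fe = {beta_fe:.10f}")
```

The reason is that with two periods the deviations from the unit mean are $\pm \Delta x_{i2} / 2$, so numerator and denominator of the within estimator are both exactly half of their first-difference counterparts.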
Discussion: Which estimator exploits which variation?

(whiteboard)
• illustration: variation of FE, FD, OLS
• illustration: difference between within and between variation
• Simpson’s paradox: Why between variation can be misleading (link)
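As a starting point for the whiteboard discussion, the split of total variation into a within and a between part can be computed directly. The sketch below uses a single simulated variable with hypothetical dimensions; the unit-specific means are made up so that both components are non-trivial.

```python
import random

rng = random.Random(8)
N, T = 50, 6                             # hypothetical panel dimensions

# One variable x_it; unit i is centred on i % 5 to create between variation.
panel = [[rng.gauss(i % 5, 1) for _ in range(T)] for i in range(N)]

grand_mean = sum(sum(xs) for xs in panel) / (N * T)
unit_means = [sum(xs) / T for xs in panel]

total_ss = sum((x - grand_mean) ** 2 for xs in panel for x in xs)
within_ss = sum((x - m) ** 2 for xs, m in zip(panel, unit_means) for x in xs)
between_ss = T * sum((m - grand_mean) ** 2 for m in unit_means)

print(f"total = {total_ss:.2f}, within = {within_ss:.2f}, between = {between_ss:.2f}")
```

The within part is the variation left after subtracting unit means, which is what the FE estimator works with; first differencing likewise removes anything constant within a unit. Pooled OLS on the untransformed data draws on both parts.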