
Introduction to Pooled Cross Sections

EMET 8002
Lecture 5
August 13, 2009
Administrative Matters
Consultation hours:
Tuesdays, 3 pm to 5 pm
I'm going to be away again next Tuesday, so I'll hold consultation hours next week on Wednesday, August 19th, from 3 pm to 5 pm
Case study projects have now been assigned. If you did not receive the email that I sent on Tuesday, you can view the assignments on the course website
If you have not already done so, please contact your supervisor immediately. A few of the projects will require you to apply for data access, which should be done immediately!
Outline
Introduce pooled cross sections regression analysis
Chapter 13 in the text
Potential pitfalls of the difference-in-differences estimation strategy
Introduction to two-period panel data and the first
difference estimator
Pooling Independent Cross Sections
across Time
What is it?
It is obtained by sampling randomly from a large
population at two or more points in time
For example, randomly sampling households from ACT
residents in 2000 and 2008
A Spreadsheet View

Pooled cross sections        Panel
Unit    Period               Unit    Period
 1        1                   1        1
 2        1                   2        1
 3        1                   3        1
 4        2                   1        2
 5        2                   2        2
 6        2                   3        2
Formal notation
We can denote the pooled cross section as a random sample:

{ y_it, x_1it, x_2it, ..., x_kit },
  i = 1, 2, ..., N_1, N_1 + 1, ..., N_1 + N_2, ..., N_1 + N_2 + ... + N_T
  t = 1, 2, ..., T

where observations i = 1, ..., N_1 are drawn in period 1, observations i = N_1 + 1, ..., N_1 + N_2 in period 2, and so on through period T.
Pooling Independent Cross Sections
across Time
Consider the regression model:

y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + u_it,
  i = 1, 2, ..., N_1, N_1 + 1, ..., N_1 + N_2, ..., N_1 + N_2 + ... + N_T;  t = 1, 2, ..., T

Benefits:
Pooling can lead to larger sample sizes. This leads to more precise estimators and test statistics with more power. However, this is only true if the relationship between the dependent variable and at least some of the explanatory variables remains constant over time
If the x's are changing over time, pooling also provides additional variation in x with which to estimate its effect on y
Note: the error term may have the structure

u_it = δ_t + ε_it

For now, we'll make the following assumptions:

δ_t | X ~ (0, σ_δ²)   and   ε_it | X ~ (0, σ_ε²)

and that the two components of the error term are independent
Pooling Independent Cross Sections
across Time
Suppose the true error structure does include a year component, but we ignore that and run OLS on the following equation:

y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + u_it

Does it matter? Yes!
This introduces serial correlation between observations within the same time period:

E(u_it u_jt | X) = E[(δ_t + ε_it)(δ_t + ε_jt) | X]
                 = E(δ_t² | X) + E(δ_t ε_jt | X) + E(δ_t ε_it | X) + E(ε_it ε_jt | X)
                 = σ_δ²
                 ≠ 0   for i ≠ j, within each period t
Pooling Independent Cross Sections
across Time
This violates one of our assumptions for OLS!
The OLS coefficient estimate is still unbiased and
consistent
However, the variance-covariance matrix is
biased/inconsistent which leads to incorrect standard
errors and incorrect inference
This is similar to the problem of serial correlation in time series
models which we saw previously.
Thus, it makes sense to include time dummies (a.k.a. year effects):

y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + Σ_{s=2}^{T} δ_s D_st + ε_it

where D_st is a dummy equal to 1 when t = s (the first period serves as the base)
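For instance, a minimal Stata sketch (y, x1, x2, and year are hypothetical variable names; the xi: prefix expands i.year into a full set of year dummies, with the first year as the base):
  * pooled OLS with a full set of year effects
  xi: reg y x1 x2 i.year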
Interpretation of the year effects

How do we interpret the year dummies?

E(y_it | t = 1, x) = β_0 + β_1 x_1 + ... + β_k x_k
E(y_it | t = j, x) = β_0 + δ_j + β_1 x_1 + ... + β_k x_k
δ_j = E(y_it | t = j, x) − E(y_it | t = 1, x)

In words, each time dummy coefficient δ_j is the difference in the conditional expected value of y between the base year (t = 1) and the year t = j
Pooling Independent Cross Sections
across Time
Further benefits:
A further benefit of these data is that we can explore changes in the coefficients over time
This amounts to allowing some or all of the β's to have t-subscripts
In a two-period pooled cross section dataset with one explanatory variable:

y_it = β_0 + δ_0 D2_t + β_1 x_it + β_2 (D2_t × x_it) + u_it

where D2_t is a dummy equal to 1 in the second period
We can then use F-tests (including Chow tests) to test for changes in the regression model over time
Note: While changes in the coefficients may be interesting, one has to be very cautious in interpreting the source of the changes (e.g., as the impact of a policy or changing economic structure)
Example 13.1: Women's Fertility over Time
Dependent variable is the number of children born to a
woman.
Explanatory variables include socio-demographic
characteristics.
Pooled time series of cross-sections (GSS: 1972, 1974,
1976, 1978, 1980, 1982, 1984).
N = 1,129
Dataset: FERTIL1.RAW
One question of interest is: After controlling for other
observable factors, what happened to the fertility rate
over time?
Example 13.1: Women's Fertility over Time

In Stata:
  sort year
  by year: summarize kids

Mean number of children by year:

Year            1972     1974     1976     1978     1980     1982     1984
Mean kids       3.026    3.208    2.803    2.804    2.817    2.403    2.237
Year t − 1972            0.182   −0.223   −0.222   −0.209   −0.623   −0.789
Example 13.1: Women's Fertility over Time

In Stata: reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84

kids         Coef.       Std. Err.      t     P>|t|    [95% Conf. Interval]
educ       -.1284268     .0183486    -7.00    0.000    -.1644286    -.092425
age         .5321346     .1383863     3.85    0.000     .2606065    .8036626
agesq       -.005804     .0015643    -3.71    0.000    -.0088733   -.0027347
black       1.075658     .1735356     6.20    0.000     .7351631    1.416152
east         .217324     .1327878     1.64    0.102    -.0432192    .4778672
northcen     .363114     .1208969     3.00    0.003      .125902    .6003261
west        .1976032     .1669134     1.18    0.237    -.1298978    .5251041
farm       -.0525575       .14719    -0.36    0.721    -.3413592    .2362443
othrural   -.1628537      .175442    -0.93    0.353    -.5070887    .1813814
town        .0843532      .124531     0.68    0.498    -.1599893    .3286957
smcity      .2118791      .160296     1.32    0.187    -.1026379    .5263961
y74         .2681825      .172716     1.55    0.121    -.0707039    .6070689
y76        -.0973795     .1790456    -0.54    0.587     -.448685    .2539261
y78        -.0686665     .1816837    -0.38    0.706    -.4251483    .2878154
y80        -.0713053     .1827707    -0.39    0.697      -.42992    .2873093
y82        -.5224842     .1724361    -3.03    0.003    -.8608214    -.184147
y84        -.5451661     .1745162    -3.12    0.002    -.8875846   -.2027477
_cons      -7.742457     3.051767    -2.54    0.011    -13.73033   -1.754579
Example 13.1: Women's Fertility over Time

There may be heteroskedasticity in the previous model.
This could be related to the observed characteristics, or
It could simply be that the error variance is changing over time
Nonetheless, the usual heteroskedasticity-robust standard errors and t statistics are still valid
Just use the robust option with the regress command in Stata
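For example, the same specification as before with robust standard errors:
  reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84, robust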
Allowing the effect to change across
periods
We can also interact year dummy variables with key
explanatory variables to see if the effect of that
variable changed over time
Example 13.2: Changes in the returns to education and the gender wage gap

Consider the following regression model pooled over the years 1978 and 1985:

log(wage) = β_0 + δ_0 y85 + β_1 educ + δ_1 (y85 × educ) + β_2 exper
            + β_3 exper² + β_4 union + β_5 female + δ_5 (y85 × female) + u

The dataset is CPS78_85.RAW
reg lwage y85 educ y85educ exper expersq union female y85fem
Example 13.2: Changes in the returns to education and the gender wage gap

lwage        Coef.       Std. Err.      t     P>|t|    [95% Conf. Interval]
y85         .1178062     .1237817     0.95    0.341     -.125075    .3606874
educ        .0747209     .0066764    11.19    0.000     .0616206    .0878212
y85educ     .0184605     .0093542     1.97    0.049      .000106     .036815
exper       .0295843     .0035673     8.29    0.000     .0225846     .036584
expersq    -.0003994     .0000775    -5.15    0.000    -.0005516   -.0002473
union       .2021319     .0302945     6.67    0.000     .1426888    .2615749
female     -.3167086     .0366215    -8.65    0.000    -.3885663     -.244851
y85fem       .085052      .051309     1.66    0.098    -.0156251     .185729
_cons       .4589329     .0934485     4.91    0.000     .2755707     .642295
Chow test for structural change
across time
We can apply the Chow test to see if a multiple regression function differs across two time periods
We can do this in pooled cross sections by interacting all explanatory variables with time dummies and performing an F-test that the interactions are jointly insignificant
Usually, we allow the intercept to change over time and test only whether the slope parameters have changed
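A sketch of such a test in Stata for the two-period wage example (the additional interaction names are hypothetical):
  gen y85exper = y85*exper
  gen y85expsq = y85*expersq
  gen y85union = y85*union
  reg lwage y85 educ exper expersq union female y85educ y85exper y85expsq y85union y85fem
  * F-test that all the slope interactions are jointly zero
  test y85educ y85exper y85expsq y85union y85fem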
Policy Analysis with Pooled Cross
Sections
This type of data can be useful in identifying the impacts
of policies (or government programs) on various
outcomes
They are especially helpful if the policy experiment has
before & after and treatment & control dimensions
Consider a simple example:
We wish to estimate the impact of participating in a government program (i.e., the treatment) on an outcome, y
Let participation in the program be captured by the dummy variable:

D_i = 1 if affected ("treatment")
D_i = 0 if unaffected ("control")

A simple estimate of the treatment effect

One estimator of the treatment effect is given by the difference of means:

ȳ_T − ȳ_C

In a regression context, we could estimate this difference by:

y_i = α + δ D_i + u_i

Such a regression would work if:

E(y_i | D_i = 0) = α + E(u_i | D_i = 0)
E(y_i | D_i = 1) = α + δ + E(u_i | D_i = 1)
E(y_i | D_i = 1) − E(y_i | D_i = 0) = δ   iff   E(u_i | D_i = 1) = E(u_i | D_i = 0)
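In Stata, the two versions can be computed as follows (y and D are hypothetical variable names):
  * difference of means, with a two-sample t test
  ttest y, by(D)
  * the equivalent regression version
  reg y D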
A simple estimate of the treatment
effect
The difference in means between the treatment and control groups and the OLS estimate of δ are consistent estimates of the treatment effect ONLY if there are no other differences between the treatment and control groups
Sometimes we can add covariates (X's) to help control for differences between the treatment and control groups
Nonetheless, this is often an implausible assumption to make
However, we may be able to use time variation in the application of the program (before & after) combined with variation in treatment
Empirical example: The effect of building
an incinerator on house prices
The hypothesis that we are interested in testing is that the
announcement of the pending construction of an incinerator would
cause the prices of houses located nearby to fall, relative to houses
further away.
A house is considered to be close if it is within 3 miles of the
incinerator.
We have data on house prices for houses that sold in 1978, before the
announcement of the incinerator, and in 1981, after the
announcement.
We begin by regressing the real house price on a dummy variable for whether the house is close to the incinerator, using data from 1981 only (dataset: KIELMC.RAW)
Empirical example: The effect of building an incinerator on house prices

                            1981        1978        1978 & 1981
Near Incinerator          -30,688     -18,824        -18,824
                          (5,828)     (4,745)        (4,875)
                          [-5.27]     [-3.97]        [-3.86]
Year 1981                                             18,790
                                                     (4,050)
                                                     [4.64]
Near Incinerator × 1981                              -11,864
                                                     (7,457)
                                                     [-1.59]

(standard errors in parentheses, t statistics in brackets)
The Difference-in-Differences Model

Consider the following simple example where we allow:

E[u_i | D_i = 1] ≠ E[u_i | D_i = 0]

The model is:

y_it = α + δ D_it + θ_t + u_it

Suppose that the differences between treatment and control groups can be written:

E[u_it | D_i1 = 0, D_i2 = 0] = λ_C
E[u_it | D_i1 = 0, D_i2 = 1] = λ_T

Also assume that the time effects can be written (normalized) as:

θ_t = 0,  t = 1
θ_t = θ,  t = 2
The Difference-in-Differences Model

The expected outcomes in the before period (period 1) are:

E[y_i1 | D_i1 = 0, D_i2 = 0] = α + λ_C
E[y_i1 | D_i1 = 0, D_i2 = 1] = α + λ_T

In the after period (period 2):

E[y_i2 | D_i1 = 0, D_i2 = 0] = α + λ_C + θ
E[y_i2 | D_i1 = 0, D_i2 = 1] = α + λ_T + θ + δ

An estimate of δ can then be recovered by comparing:

(E[y_i2 | D_i1 = 0, D_i2 = 1] − E[y_i1 | D_i1 = 0, D_i2 = 1])
− (E[y_i2 | D_i1 = 0, D_i2 = 0] − E[y_i1 | D_i1 = 0, D_i2 = 0]) = δ
The Difference-in-Differences Model

The difference-in-differences estimator would then be based on (in expectation):

ȳ_T,2 − ȳ_T,1  →  θ + δ
ȳ_C,2 − ȳ_C,1  →  θ

Or, alternatively,

ȳ_T,2 − ȳ_C,2  →  (λ_T − λ_C) + δ
ȳ_T,1 − ȳ_C,1  →  (λ_T − λ_C)

In a regression framework, we would estimate this as:

y_it = α + θ AFTER_it + λ D_it + δ (D_it × AFTER_it) + u_it
The Difference-in-Differences Model

Of course, we could also add covariates.
In this specification, we denote:
  θ : common time effect
  λ : permanent differences (across T, C)
  δ : treatment effect (diff-in-diff)
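A minimal Stata sketch of this regression (y, treat, and after are hypothetical variable names):
  * the coefficient on treatXafter is the diff-in-diff estimate
  gen treatXafter = treat*after
  reg y after treat treatXafter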
Difference-in-differences
The key distinction between the difference-in-differences estimator and the difference in means is that we have relaxed the assumption about the distribution of the error terms across the treatment and control groups.
We no longer require the conditional expectation of the error term to be equal across groups; we only require the conditional expectation for each group to be constant over time
This may still be a strong assumption! You need to think about the validity of making this assumption.
Another Example: The Effect of Workers' Comp. on Injury Duration (Kentucky)

Data:              Before & After   Before & After     After        After
High Earner            0.115            0.233          0.274        0.462
                      (0.048)          (0.049)        (0.054)      (0.051)
After                  0.047            0.014
                      (0.041)          (0.045)
After*High Earner      0.175            0.229
                      (0.064)          (0.070)
Controls                Yes              No             Yes           No
R-squared              0.188            0.022          0.214        0.031
Sample size            5347             5347           2567         2567

Notes: Dependent variable is log duration of Workers' comp benefits. Controls include: age, sex, married, whether a hospital stay was required, indicators for the type of injury, and industry of job. After corresponds to the increase in the cap on weekly WC benefits.
Examples of difference-in-difference
For a good example of a paper using this strategy, see Duflo, Esther.
(2001). Schooling and labor market consequences of school
construction in Indonesia: Evidence from an unusual policy
experiment. American Economic Review. Vol. 91, No. 4, pp. 795-813.
For a good example of when the impact of the policy/program might have spillover effects on the control group, see Miguel, Edward and Michael Kremer. (2004). Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica. Vol. 72, No. 1, pp. 159-217.
How much should we trust diff-in-diff
estimates?
The following discussion is based on an excellent paper by
Bertrand, Duflo, and Mullainathan (2004) in the Quarterly
Journal of Economics
Many papers that employ difference-in-differences estimators use many years of data and focus on serially correlated outcomes, but ignore the fact that the resulting standard errors are inconsistent
Diff-in-diff estimates are usually based on estimating an equation of the form:

Y_ist = A_s + B_t + c X_ist + β I_st + ε_ist

where i denotes individuals, s denotes a state or group membership, and t denotes the time period
How much should we trust diff-in-diff
estimates?
An important point, that we will address later in the course, is possible
correlation of the error terms across individuals within a state/group in
a given year.
We are going to ignore this potential problem for now and assume that
the econometricians have appropriately dealt with correlation within
state-year cells
Hence, let's think of the data as being averaged over individuals within a state in each given year
Three factors make serial correlation an important issue in the difference-in-differences context:
Estimation often relies on long time series
The most commonly used dependent variables are typically highly serially correlated
The treatment variable, I_st, changes very little within a state over time
How much should we trust diff-in-diff
estimates?
How severe is the problem?
They examine how diff-in-diff performs on placebo laws,
where treated states (in the U.S.) are chosen at random as
is the year of passage of the placebo law
Since the laws are fictitious, a significant effect should only be found 5% of the time (i.e., the true null hypothesis of no effect is falsely rejected 5% of the time)
They use wages as the dependent variable over 21 years
They find rejection rates of the null hypothesis as high as 45%!
In other words, there is statistical evidence that these fake laws affected wages in close to half of the simulations
How much should we trust diff-in-diff
estimates?
Does this matter practically?
They find 92 diff-in-diff papers published between 1990 and 2000 in the following journals: the American Economic Review, the Industrial and Labor Relations Review, the Journal of Labor Economics, the Journal of Political Economy, the Journal of Public Economics, and the Quarterly Journal of Economics
69 of these papers use more than 2 time periods
Only 4 papers collapse the data into before-after
Thus, 65 papers have a potential serial correlation problem
Only 5 provide a serial correlation correction
How much should we trust diff-in-diff
estimates?
Some results:
When the treatment variable is not serially correlated, rejection rates of H_0: no effect are close to 5%
The overrejection problem increases with the serial correlation in the dependent variable
How much should we trust diff-in-diff
estimates?
Solutions:
Parametric methods
Specify an autocorrelation structure for the error term,
estimate its parameters, and use these parameters to compute
standard errors
This does not do a very good job of remedying the problem
With short time series, the OLS estimation of the autocorrelation
parameter is downward biased
The autocorrelation structure may be incorrectly specified
Block bootstrap
Bootstrapping is an advanced technique
It does poorly when the number of states/groups becomes small
How much should we trust diff-in-diff
estimates?
Solutions (continued):
Ignore time series information: average the before and after
data and estimate the model on two periods
This is difficult when treatment occurs at different times across
states since there is no longer a uniform before and after and it
is not even defined for control states
This can be corrected for though
This procedure works well, even when there are a small
number of states/groups
Arbitrary variance-covariance matrix
Does quite well in general, although the rejection rate
increases above 5% when the number of states/groups is small
Can be implemented in Stata using the cluster option (at the
state/group level, not the state-time cell)
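A sketch of this in Stata (variable names hypothetical; the key point is clustering on state, not on state-year):
  xi: reg y x i.state i.year, cluster(state)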
How much should we trust diff-in-diff
estimates?
Main message:
There is not one preferred correction mechanism
Collapsing the data into pre- and post- periods produced
consistent standard errors, even when the number of
states is small (although the power of this procedure
declines fast)
Allowing for an arbitrary autocorrelation process is also
viable when the number of groups is sufficiently large
Doing good econometrics is not easy!
Be very, very careful that all your assumptions are
met!
Panel data

If we have repeated observations on the same individuals (units, i) then we have longitudinal, or panel, data:

{ y_it, x_it },  i = 1, 2, 3, ..., N;  t = 1, 2, ..., T

Benefits of panel data:
Similar to repeated cross-sections;
BUT most importantly, we can exploit repeated observations on the same individual in order to control for certain types of unobserved heterogeneity, which otherwise might contaminate OLS estimation;
Panel data allow for richer controls for unobserved heterogeneity than just systematic differences between treatment and control.
Panel Data

Begin with two periods, for simplicity
Of course, we can do all the same stuff with panel data as with pooled cross-sections.
However, we can do more.
...and we will also have an additional statistical consideration, with the loss of independence across observations
Consider the simple regression model:

y_it = β_0 + β_1 x_it + v_it

Omitted variables bias for β_1 arises when corr(x_it, v_it) ≠ 0
Under certain assumptions, we will be able to exploit panel data in order to fix this bias (unobserved heterogeneity)
Panel Data

When corr(x_it, v_it) ≠ 0, this violates assumption MLR.4 (and its weaker version, MLR.4′)
Hence, OLS is no longer valid
Under some circumstances we can cope with this problem using panel data. This is another example of when one of the core OLS assumptions fails to hold.
Fixed Effects Error Structure

Imagine we can write the error term as:

v_it = δ_t + a_i + u_it

where
  v_it : composite error
  δ_t : time effect
  a_i : fixed effect
  u_it : idiosyncratic effect

Furthermore, assume that ALL of the omitted variables bias is due to

corr(x_it, a_i) ≠ 0

i.e., the correlation of x with fixed (and unobserved) individual characteristics.
First Difference Estimator

Consider the First Differenced (FD) estimator, based on:

y_i2 = β_0 + δ_0 + β_1 x_i2 + a_i + u_i2
y_i1 = β_0 + β_1 x_i1 + a_i + u_i1
Δy_i = δ_0 + β_1 Δx_i + Δu_i

The key point is that the fixed effects fall out.
By assumption, we also require

corr(Δx_i, Δu_i) = 0

Thus, by differencing we have eliminated the heterogeneity bias.
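A sketch of the two-period FD estimator in Stata (id, year, y, and x are hypothetical names; xtset declares the panel so the D. difference operator works):
  xtset id year
  reg D.y D.x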
Example: Crime and unemployment
We have data on crime and unemployment rates
for 46 cities in 1982 and 1987
Cities are the unit of observation, i
1982 and 1987 are the periods of observation, t
We'll try three specifications:

crmrte_it = β_0 + β_1 unem_it + u_it,   t = 1987 only
crmrte_it = β_0 + δ_0 d87_t + β_1 unem_it + u_it,   t = 1982, 1987
Δcrmrte_i = δ_0 + β_1 Δunem_i + Δu_i
Simple Example: Unemployment and crime (standard errors in parentheses)

Data:                  1987        1982, 1987    1982, 1987
                     (Levels)      (Levels)        (FD)
Constant              128.38        93.42          15.40
                      (20.76)      (12.74)        (4.70)
Unemployment Rate      -4.16        0.427           2.22
                       (3.42)      (1.19)         (0.88)
Y87                                  7.94
                                    (7.98)
N                        46           92             46
R-squared              0.033        0.012          0.127
Interpretation
Controlling for unemployment, crime has risen between 1982
and 1987 in these cities
Using just cross-sectional data (i.e., only the 1987 data) would suggest that higher unemployment is associated with lower crime rates; this is certainly not what we expect!
Using the first-differencing specification suggests that the partial
correlation between crime and unemployment is positive when
we control for city fixed effects (i.e., the negative partial
correlation we observed in the 1987 cross section was biased)
Caveats to the first difference estimator

It may be incorrect to assume that corr(Δx_i, Δu_i) = 0
We need variation in the Δx's
This means we cannot include variables that do not change over time across observations (e.g., race, country of birth, etc.)
It also means we cannot include variables for which the change would be the same for all observations (e.g., age)
Also, we cannot expect to get precise estimates on variables, such as education, which will tend to change for relatively few observations in a dataset
The FD Estimator more generally

The FD estimator provides a powerful strategy for dealing with omitted variables bias when panel data are available.
More generally, we can apply the model to multiple time periods (not just two):

y_it = β x_it + Σ_{s=2}^{T} δ_s D_st + a_i + u_it

In which case the FD estimator is based on:

Δy_it = β Δx_it + Σ_{s=2}^{T} δ_s ΔD_st + Δu_it
The FD Estimator and Program Evaluation

Of course, we can also use this framework for policy evaluation (difference-in-differences, as before).
The added benefit is that we can control for unobserved fixed effects at the level of the individual unit.
We do not require the same simple structure as with the pooled cross-sections.
But the framework is no panacea, since there may be very good reasons why

cov(Δx_it, Δu_it) ≠ 0

For example, unexplained changes in y (the error term) may be correlated with changes in policy.
Additional Considerations

Given that there is a time-series dimension to the FD estimator (and panel data more generally), we may need to account for serial correlation.
In addition, we may need to deal with heteroskedasticity.
While there are GLS (serial correlation) procedures available, the easiest solution would be to use the Newey-West variance-covariance matrix.
Example: County Crime Rates (NC)

Panel of North Carolina counties, 1981-1987.
How do various law enforcement variables affect the crime rate?
Base specification includes (in logs):
  Probability of arrest;
  Probability of conviction (conditional on arrest);
  Probability of prison (conditional on conviction);
  Average sentence (conditional on prison);
  Police per capita
Covariates:
  Region, urban, pop density, tax revenues
  Year effects
Estimated in levels and FD (ignoring serial correlation, etc.); see the Stata sketch below
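A sketch of the FD column in Stata (the variable names are assumptions, following the common NC county crime panel):
  xtset county year
  * first differences of the logged variables, plus year dummies
  xi: reg D.lcrmrte D.lprbarr D.lprbconv D.lprbpris D.lavgsen D.lpolpc i.year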
Example: County crime rates in North Carolina, 1981-1987

                 Pooled Cross Sections    First Differencing
log(prbarr)            -0.720                  -0.327
                       (0.037)                 (0.030)
log(prbconv)           -0.546                  -0.238
                       (0.026)                 (0.018)
log(prbpris)            0.248                  -0.165
                       (0.067)                 (0.026)
log(avgsen)            -0.087                  -0.022
                       (0.058)                 (0.022)
log(polpc)              0.366                   0.398
                       (0.030)                 (0.027)
Year effects             Yes                     Yes
No. observations         630                     540
R-squared                0.57                    0.43
Interpretation
Consider the impact of the probability of being
arrested:
The first-differencing estimates suggest that we were
overestimating the negative impact on the crime rate
(i.e., increasing the probability of arrest has less of an
impact once you remove county fixed effects)
Potential pitfalls

FD can be worse than pooled OLS if one or more of the explanatory variables is subject to measurement error
Differencing a poorly measured regressor reduces its variation relative to its correlation with the differenced error (see Wooldridge, 2002, Chapter 11 for more details)
This could be a problem with explanatory variables from household or firm surveys, especially ones in developing countries
Differencing with More Than Two
Time Periods
More on the error structure:
When doing FD estimation with more than two time periods, we must assume that the differenced errors Δu_it are uncorrelated over time (no serial correlation)
This assumption is sometimes reasonable, but it will not hold if we assume that the u_it themselves are uncorrelated over time:
If the u_it are serially uncorrelated with constant variance, then Δu_it and Δu_i,t-1 are negatively correlated (correlation = -0.5)
If u_it follows a stable AR(1) process, then Δu_it will be serially correlated
Only when u_it follows a random walk will Δu_it be serially uncorrelated
Differencing with More Than Two
Time Periods
Testing for serial correlation in the first-differenced equation:
First, we estimate our first-differenced equation and obtain the residuals, r̂_it = Δû_it
Run a simple pooled OLS regression of the residual on the lagged residual, for t = 3, ..., T and i = 1, ..., N, and compute a standard t test for the coefficient on the lagged residual
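A sketch of this test in Stata (generic names; assumes the data have been xtset so the D. and L. operators work):
  reg D.y D.x
  * residuals from the FD equation, then regress on their first lag
  predict r if e(sample), resid
  reg r L.r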
Differencing with More Than Two
Time Periods
Correcting for serial correlation:
In the presence of AR(1) serial correlation, we can use the Prais-Winsten FGLS estimator
The Cochrane-Orcutt procedure is less preferred, since we lose N observations by dropping the first time period
However, standard PW procedures will treat the observations as if they followed an AR(1) process over both i and t, which makes no sense in this situation since we have assumed independence across i
A detailed treatment of how to do this can be found in Wooldridge (2002)
Assumptions for Pooled OLS Using First Differences

Assumption FD.1: For each i, the model is

y_it = β_1 x_it1 + ... + β_k x_itk + a_i + u_it,   t = 1, ..., T

where the β_j are the parameters to be estimated and a_i is the unobserved effect.
Assumption FD.2: We have a random sample from the cross section.
Assumption FD.3: Each explanatory variable changes over time (for at least some i), and no perfect linear relationships exist among the explanatory variables.
Assumptions for Pooled OLS Using First Differences

Assumption FD.4: For each t, the expected value of the idiosyncratic error, given the explanatory variables in all time periods and the unobserved effect, is zero: E(u_it | X_i, a_i) = 0.
As stated, this assumption is stronger than is necessary for consistency, which only requires that Δu_it be uncorrelated with Δx_itj for all j = 1, ..., k and all t = 2, ..., T.
Under assumptions FD.1 through FD.4, the first-difference estimator is unbiased.
Assumptions for Pooled OLS Using First Differences

Assumption FD.5: The variance of the differenced errors, conditional on all explanatory variables, is constant (i.e., homoskedastic): var(Δu_it | X_i) = σ², t = 2, ..., T.
Assumption FD.6: For all t ≠ s, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): cov(Δu_it, Δu_is | X_i) = 0, t ≠ s.
Under assumptions FD.1 through FD.6, the FD estimator of β_j is the best linear unbiased estimator (conditional on the explanatory variables).
Comparison of assumptions with
standard OLS
Notice the strong similarities between the first differencing assumptions (FD) and those for standard OLS (MLR):
MLR.1 and FD.1 are basically the same, except we've now added repeated observations and an unobserved effect for each cross-sectional observation
MLR.2 and FD.2 are the same
MLR.3 and FD.3 are the same, except we've added the condition that there has to be at least some time variation for each of the explanatory variables
MLR.4 and FD.4 are the same, except that the condition is across all time periods (clearly FD.4 is the same as MLR.4 if T = 1)
Comparison of assumptions with
standard OLS
FD.5 is the same as MLR.5 (homoskedasticity), but applied to the differenced error terms
FD.6 is new. It assumes that there is no correlation over time in the differenced error terms (clearly this was not an issue when T = 1)
But recall that we had a no-serial-correlation assumption in time series models
Practice questions
In-chapter questions: 13.1, 13.3, 13.4, 13.5
End-of-chapter questions: C13.2, C13.7, C13.11 (i-iv)