
UN3412 Recitation Week 7

Qianyang Zhang

1 Regression with Panel Data

1.1 Motivation

Although multiple regression is a powerful tool for estimating the effect of variables on an outcome,
the OLS estimators of the regression coefficients can suffer from omitted variable bias if some of
the variables are unobserved and thus cannot be included in the regression. If we are dealing with
panel data, in which each observational unit, or entity, is observed at two or more time periods,
then there is a method that allows us to control for some types of omitted variables without actually
observing them. In this chapter, we will look at fixed effects regression, which is an extension of
multiple regression that exploits panel data to control for unobserved variables that differ across
entities but are constant over time. Regression with time fixed effects controls for unobserved variables
that are constant across entities but change over time.

1.2 Panel Data

• A Balanced Panel: a panel that has observations for each entity and each time period.

• An Unbalanced Panel: a panel that has some missing data for at least one time period for
at least one entity.
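
The two definitions above can be turned into a simple check. The following is a minimal sketch in Python, assuming the panel is stored as a list of (entity, period) pairs, one per non-missing observation; the function name `is_balanced` and the sample records are hypothetical.

```python
def is_balanced(observations):
    """Return True if every entity is observed in every time period.

    `observations` is an iterable of (entity, period) pairs, one pair
    per non-missing observation.
    """
    pairs = set(observations)
    entities = {e for e, _ in pairs}
    periods = {t for _, t in pairs}
    # A balanced panel contains every entity-period combination.
    return len(pairs) == len(entities) * len(periods)


# Hypothetical records: two states observed in 1982 and 1988.
balanced = [("NY", 1982), ("NY", 1988), ("CA", 1982), ("CA", 1988)]
# Same panel with the 1982 observation for CA missing.
unbalanced = [("NY", 1982), ("NY", 1988), ("CA", 1988)]

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```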

Example: Panel Data with Two Time Periods


Let $Z_i$ be a variable that determines the fatality rate in the $i$th state but does not change over time.
Accordingly, the population linear regression relating $Z_i$ and the real beer tax to the fatality rate
is

$$\text{Rate}_{it} = \beta_0 + \beta_1 \text{Tax}_{it} + \beta_2 Z_i + u_{it} \tag{1}$$

where $u_{it}$ is the error term, $i = 1, \dots, n$, and $t = 1, \dots, T$. Consider Equation (1) for each of
the two years 1982 and 1988:

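$$\text{Rate}_{i,1982} = \beta_0 + \beta_1 \text{Tax}_{i,1982} + \beta_2 Z_i + u_{i,1982} \tag{2}$$

$$\text{Rate}_{i,1988} = \beta_0 + \beta_1 \text{Tax}_{i,1988} + \beta_2 Z_i + u_{i,1988} \tag{3}$$

Then we can eliminate the fixed effect term $Z_i$ by subtracting Equation (2) from Equation (3):

$$\text{Rate}_{i,1988} - \text{Rate}_{i,1982} = \beta_1 (\text{Tax}_{i,1988} - \text{Tax}_{i,1982}) + u_{i,1988} - u_{i,1982} \tag{4}$$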
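
The differencing step can be checked numerically. The sketch below is a hypothetical illustration in Python: the parameter values and the four "states" are made up (they are not the traffic fatality data), and the error term is set to zero so that the algebra is easy to follow. It compares a levels regression that omits the unobserved variable with the regression of the change in the rate on the change in the tax.

```python
def ols_slope(x, y, intercept=True):
    """Slope from regressing y on a single regressor x."""
    if intercept:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        x = [v - mx for v in x]
        y = [v - my for v in y]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical parameter values and data (assumptions for illustration).
beta0, beta1, beta2 = 2.0, -0.5, 1.0
z = [0.0, 1.0, 2.0, 3.0]           # Z_i: unobserved, constant over time
tax_1982 = [1.0, 1.5, 2.5, 3.0]    # states with higher Z_i also tax more
tax_1988 = [1.2, 2.0, 2.6, 3.8]


def rate(tax):
    # Population regression with u_it = 0
    return [beta0 + beta1 * t + beta2 * zi for t, zi in zip(tax, z)]


rate_1982, rate_1988 = rate(tax_1982), rate(tax_1988)

# Levels regression on both years stacked, omitting Z_i
pooled = ols_slope(tax_1982 + tax_1988, rate_1982 + rate_1988)

# Regression of the 1988-1982 change on the change (no intercept)
d_tax = [b - a for a, b in zip(tax_1982, tax_1988)]
d_rate = [b - a for a, b in zip(rate_1982, rate_1988)]
differenced = ols_slope(d_tax, d_rate, intercept=False)

print(round(pooled, 3), round(differenced, 3))
```

In this made-up example the levels regression returns a positive slope (roughly 0.70) even though the true coefficient is -0.5, because the omitted variable is positively correlated with the tax. The differenced regression recovers -0.5.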


1.3 Fixed Effects Regression

The Fixed Effects Regression Model (Entity Fixed Effects)

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} \tag{5}$$

where $Z_i$ is an unobserved variable that varies from one state to the next but does not change over
time. We are interested in estimating $\beta_1$, the effect on $Y$ of $X$, holding constant the unobserved
state characteristics $Z$. Let $\alpha_i = \beta_0 + \beta_2 Z_i$; then we can rewrite Equation (5) as

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it} \tag{6}$$

Next, we will talk about four ways to estimate $\beta_1$.

• Pooled OLS

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$

When $E[\alpha_i X_{it}] = 0$ and $E[u_{it} X_{it}] = 0$, simply running an OLS regression of $Y$ on $X$ will yield
a consistent estimator of $\beta_1$.

• First Difference Approach

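$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$

$$Y_{i,t-1} = \beta_1 X_{i,t-1} + \alpha_i + u_{i,t-1}$$

$$\Rightarrow \underbrace{Y_{it} - Y_{i,t-1}}_{\Delta Y_{it}} = \beta_1 \underbrace{(X_{it} - X_{i,t-1})}_{\Delta X_{it}} + \underbrace{u_{it} - u_{i,t-1}}_{\Delta u_{it}}$$

$$\Rightarrow \Delta Y_{it} = \beta_1 \Delta X_{it} + \Delta u_{it}$$

When $E[\Delta u_{it} \Delta X_{it}] = 0$, running an OLS regression of $\Delta Y$ on $\Delta X$ will yield a consistent
estimator of $\beta_1$.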

• Within Group Transformation Approach

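$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$

$$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$$

$$\text{where } \bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}, \quad \bar{X}_i = \frac{1}{T} \sum_{t=1}^{T} X_{it}, \quad \bar{u}_i = \frac{1}{T} \sum_{t=1}^{T} u_{it}$$

$$\Rightarrow \underbrace{Y_{it} - \bar{Y}_i}_{\dot{Y}_{it}} = \beta_1 \underbrace{(X_{it} - \bar{X}_i)}_{\dot{X}_{it}} + \underbrace{u_{it} - \bar{u}_i}_{\dot{u}_{it}}$$

$$\Rightarrow \dot{Y}_{it} = \beta_1 \dot{X}_{it} + \dot{u}_{it}$$

When $E[\dot{u}_{it} \dot{X}_{it}] = 0$, running an OLS regression of $\dot{Y}$ on $\dot{X}$ will yield a consistent estimator
of $\beta_1$.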

• Least Squares Dummy Variable Approach

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$

Equivalently, the fixed effects regression model can be written in terms of a common intercept,
the $X$, and $n − 1$ binary variables representing all but one entity:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$$

$$\text{where } Dk_i = \begin{cases} 1, & \text{if } i = k \\ 0, & \text{otherwise} \end{cases}$$
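
The four estimators listed in this section can be compared on one simulated panel. The following is a rough sketch in plain Python under stated assumptions: the data-generating process, the sample sizes, the seed, and the helper names (`slope`, `demean`, `solve`) are all hypothetical choices for illustration, with the entity effect deliberately correlated with the regressor so that pooled OLS is biased.

```python
import random


def slope(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Simulated panel (all values are assumptions): true beta1 = 2.
random.seed(0)
n, T, beta1 = 40, 5, 2.0
X, Y = [], []
for i in range(n):
    a = random.gauss(0, 1)                            # entity-level draw
    xs = [a + random.gauss(0, 1) for _ in range(T)]   # X_it correlated with alpha_i
    ys = [beta1 * x + 3 * a + random.gauss(0, 0.5) for x in xs]  # alpha_i = 3a
    X.append(xs)
    Y.append(ys)
flat_x = [x for xs in X for x in xs]
flat_y = [y for ys in Y for y in ys]

# 1. Pooled OLS with a common intercept (ignores alpha_i)
b_pooled = slope(demean(flat_x), demean(flat_y))

# 2. First differences: regress Delta Y on Delta X
dx = [xs[t] - xs[t - 1] for xs in X for t in range(1, T)]
dy = [ys[t] - ys[t - 1] for ys in Y for t in range(1, T)]
b_fd = slope(dx, dy)

# 3. Within transformation: subtract entity means, then regress
wx = [v for xs in X for v in demean(xs)]
wy = [v for ys in Y for v in demean(ys)]
b_within = slope(wx, wy)

# 4. LSDV: common intercept, X, and dummies for entities 2..n
rows = []
for i in range(n):
    dummies = [1.0 if i == j else 0.0 for j in range(1, n)]
    rows += [[1.0, x] + dummies for x in X[i]]
k = n + 1
xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
xty = [sum(r[p] * y for r, y in zip(rows, flat_y)) for p in range(k)]
b_lsdv = solve(xtx, xty)[1]   # coefficient on X

print(b_pooled, b_fd, b_within, b_lsdv)
```

In this setup the within and LSDV estimates coincide up to floating-point error, the first-difference estimate is close to them, and the pooled OLS estimate is pushed well above the true value because the entity effect is correlated with the regressor.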

1.4 Regression with Time Fixed Effects

The Regression Model with Time Fixed Effects. When we control for variables that are
constant across entities but change over time, the population regression model can be modified as

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$$

where $S_t$ is unobserved and where the single $t$ subscript emphasizes that this variable changes
over time but is constant across states. Because $\beta_3 S_t$ represents variables that determine $Y_{it}$, if $S_t$
is correlated with $X_{it}$, then omitting $S_t$ from the regression leads to omitted variable bias.

• Time Effects Only


The Time Fixed Effects Regression Model with a single $X$ regressor is

$$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}$$

Equivalently, the time fixed effects regression model can be written in terms of a common intercept,
the $X$, and $T − 1$ binary variables representing all but one time period:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$$

$$\text{where } Bk_t = \begin{cases} 1, & \text{if } t = k \\ 0, & \text{otherwise} \end{cases}$$

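• Both Entity and Time Fixed Effects

The Two-way Fixed Effects Regression Model with both entity and time fixed effects, with a single $X$ regressor, is

$$Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}$$

Equivalently, the two-way fixed effects regression model can be written in terms of a common intercept,
the $X$, $n − 1$ binary variables representing all but one entity, and $T − 1$ binary variables representing
all but one time period:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$$

$$\text{where } Dk_i = \begin{cases} 1, & \text{if } i = k \\ 0, & \text{otherwise} \end{cases} \quad \text{and} \quad Bk_t = \begin{cases} 1, & \text{if } t = k \\ 0, & \text{otherwise} \end{cases}$$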
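
For a balanced panel, a standard alternative to including the binary variables is to subtract the entity mean and the time mean from each variable and add back the overall mean, then regress the transformed $Y$ on the transformed $X$. The notes do not spell this out, so the following Python sketch should be read as an illustration under that assumption. The small noise-free dataset, the parameter values, and the function names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def flat(m):
    return [v for row in m for v in row]


# Hypothetical balanced panel: n = 3 entities, T = 4 periods, no error term.
beta1 = 1.5
alpha = [0.0, 2.0, -1.0]       # entity effects alpha_i
lam = [0.0, 1.0, 3.0, 6.0]     # time effects lambda_t
X = [[0.5, 1.0, 2.5, 3.0],     # X trends upward along with lambda_t
     [2.0, 3.5, 4.0, 6.5],
     [0.0, 0.5, 1.0, 3.5]]
n, T = len(alpha), len(lam)
Y = [[beta1 * X[i][t] + alpha[i] + lam[t] for t in range(T)] for i in range(n)]


def entity_demean(m):
    """Subtract each entity's time average (removes alpha_i only)."""
    return [[v - mean(row) for v in row] for row in m]


def two_way_demean(m):
    """Subtract entity and time means, add back the grand mean."""
    row_m = [mean(row) for row in m]
    col_m = [mean([m[i][t] for i in range(n)]) for t in range(T)]
    grand = mean(flat(m))
    return [[m[i][t] - row_m[i] - col_m[t] + grand for t in range(T)]
            for i in range(n)]


b_entity_only = slope(flat(entity_demean(X)), flat(entity_demean(Y)))
b_two_way = slope(flat(two_way_demean(X)), flat(two_way_demean(Y)))

print(round(b_entity_only, 3), round(b_two_way, 3))
```

Because the time effects in this made-up panel rise together with $X$, removing only the entity effects leaves the slope biased upward (about 3.11 here), while the two-way transformation recovers 1.5.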

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \dots, n$$

1.5 Standard Errors for Fixed Effects Regression

Standard errors that are valid if uit is potentially heteroskedastic and potentially correlated over
time within an entity are referred to as heteroskedasticity-and-autocorrelation-robust (HAR)
standard errors.
The standard errors used in this chapter are one type of HAR standard errors, clustered standard
errors. The term clustered arises because these standard errors allow the regression errors to have
an arbitrary correlation within a cluster, but assume that the regression errors are uncorrelated
across clusters.
Like heteroskedasticity-robust standard errors in regression with cross-sectional data, clustered
standard errors are valid whether or not there is heteroskedasticity, autocorrelation, or both.
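
To make the idea of clustering concrete, the sketch below computes a clustered standard error for the within estimator with a single regressor. It is a simplified illustration in Python, not the exact formula used by any particular package: it uses the basic sandwich form, in which the products of the demeaned regressor and the residual are summed within each entity before being squared, and it omits the finite-sample degrees-of-freedom adjustments that statistical software typically applies. The simulated data, the seed, and the function names `demean_by` and `cluster_se` are assumptions.

```python
import math
import random


def demean_by(values, ids):
    """Subtract each entity's mean from its own observations."""
    sums, counts = {}, {}
    for v, g in zip(values, ids):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, ids)]


def cluster_se(x, resid, clusters):
    """Sandwich SE of a no-intercept OLS slope, allowing arbitrary
    correlation of the errors within a cluster (no small-sample correction)."""
    score = {}
    for xi, ui, g in zip(x, resid, clusters):
        score[g] = score.get(g, 0.0) + xi * ui    # sum x*u within cluster
    meat = sum(s * s for s in score.values())
    bread = sum(xi * xi for xi in x)
    return math.sqrt(meat) / bread


# Simulated panel (assumptions): errors are correlated over time within entity.
random.seed(1)
n, T, beta1 = 30, 6, 1.0
ids, x, y = [], [], []
for i in range(n):
    a = random.gauss(0, 1)        # entity effect, correlated with X
    shock = random.gauss(0, 1)    # persistent component of u_it for entity i
    for t in range(T):
        x_it = a + 0.5 * t + random.gauss(0, 1)
        u_it = shock * t / T + random.gauss(0, 0.5)
        ids.append(i)
        x.append(x_it)
        y.append(beta1 * x_it + a + u_it)

# Within estimator and its residuals
xw, yw = demean_by(x, ids), demean_by(y, ids)
b = sum(p * q for p, q in zip(xw, yw)) / sum(p * p for p in xw)
resid = [q - b * p for p, q in zip(xw, yw)]

se_clustered = cluster_se(xw, resid, ids)
# Treating every observation as its own cluster gives the
# heteroskedasticity-robust (but not autocorrelation-robust) version.
se_hetero_only = cluster_se(xw, resid, range(len(xw)))

print(b, se_clustered, se_hetero_only)
```

The only difference between the two standard errors is the level at which the products are summed before squaring, which is what allows the clustered version to account for correlation of the errors within an entity.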

1.6 The Fixed Effects Regression Assumptions

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}, \quad i = 1, \dots, n, \quad t = 1, \dots, T,$$

where $\beta_1$ is the causal effect of $X$ on $Y$ and

1. $u_{it}$ has conditional mean 0 (strict exogeneity assumption):

$$E[u_{it} \mid X_{i1}, \dots, X_{iT}, \alpha_i] = 0$$

2. $(X_{i1}, \dots, X_{iT}, u_{i1}, \dots, u_{iT})$, $i = 1, \dots, n$, are i.i.d. draws from their joint distribution.

3. Large outliers are unlikely: $(X_{it}, u_{it})$ have nonzero finite fourth moments.

4. There is no perfect multicollinearity.

For multiple regressors, $X_{it}$ should be replaced by the full list $X_{1,it}, \dots, X_{k,it}$.
Under the four assumptions for panel data above, the fixed effects estimator is consistent and is
approximately normally distributed when $n$ is large.

Note that:

• the 1st assumption will be violated if the current $u_{it}$ is correlated with past, present, or future
values of $X$.
• the 2nd assumption for panel data here holds that the variables are independent across entities
but makes no such restriction within an entity. It does not exclude the possibility of $u_{it}$ being
heteroskedastic, or of $X_{it}$ being correlated over time within an entity.

2 Estimating Fixed Effects Using Stata

use "Guns.dta", clear

* Begin logging
log using "FixedEffects.smcl", replace
* Generate new variable(s)
gen lnvio = ln(vio)
* Create a local macro holding the control variables
local controls incarc_rate density avginc pop pb1064 pw1064 pm1029
*----------- simple OLS -----------*
* Regression (1)
reg lnvio shall
* Regression (2)
reg lnvio shall `controls'

*------ adding Fixed Effects ------*

// Fixed effects using least squares dummy variable model
xi: reg lnvio shall `controls' i.stateid
estimates store ols

* Plot the fixed effects
predict lnviohat
separate lnvio, by(stateid)
separate lnviohat, by(stateid)
* twoway connected takes the y variables first and the x variable last
twoway connected lnviohat1-lnviohat56 shall
// Fixed effects: n entity-specific intercepts
* (using xtreg)
xtset stateid    // declaring the dataset to be a panel
xtreg lnvio shall `controls', fe
estimates store fixed

// Alternative way to estimate fixed effects: n entity-specific intercepts
* (using areg)
areg lnvio shall `controls', absorb(stateid)
estimates store areg

// Comparing the results obtained by the above three methods
estimates table ols fixed areg, star stats(N r2 r2_a)

/***** Reporting the R-square:
use the one provided by either regress or areg *****/

*------ adding Time Fixed Effects ------*

xtreg lnvio shall `controls' i.year, fe
testparm i.year    // joint F-test: are the time FE needed?
// if Prob > F is less than 0.05, reject the null that all year coefficients
// are zero at the 5% level, so the time FE should be included

log close

References

[1] James H. Stock and Mark W. Watson (2019). Introduction to Econometrics. Pearson.

[2] https://www.princeton.edu/~otorres/Panel101.pdf
