Qianyang Zhang
1.1 Motivation
Although multiple regression is a powerful tool for estimating the effect of variables on an outcome,
the OLS estimators of the regression coefficients can suffer from omitted variable bias if some of
the variables are unobserved and thus cannot be included in the regression. If we are dealing with
panel data, in which each observational unit, or entity, is observed at two or more time periods,
then there is a method that allows us to control for some types of omitted variables without actually
observing them. In this chapter, we will look at fixed effects regression, which is an extension of
multiple regression that exploits panel data to control for unobserved variables that differ across
entities but are constant over time. Regression with time fixed effects controls for unobserved variables
that are constant across entities but change over time.
• A Balanced Panel: a panel that has observations for each entity and each time period.
• An Unbalanced Panel: a panel that is missing data for at least one time period for
at least one entity.
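The two definitions can be turned into a simple check. The sketch below is only an illustration: the function name `is_balanced` and the toy state codes and years are assumptions made here, and the set of time periods is inferred from the data itself, so a period that is missing for every entity would go unnoticed.

```python
from collections import defaultdict

def is_balanced(observations):
    """Return True if every entity is observed in every time period.

    `observations` is an iterable of (entity, period) pairs, one pair
    per non-missing row of the panel.
    """
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity, period in observations:
        periods_by_entity[entity].add(period)
        all_periods.add(period)
    # Balanced means each entity's set of periods equals the full set.
    return all(seen == all_periods for seen in periods_by_entity.values())

# Hypothetical toy panels: two states, two years.
balanced = [("AL", 1982), ("AL", 1988), ("AZ", 1982), ("AZ", 1988)]
unbalanced = [("AL", 1982), ("AL", 1988), ("AZ", 1988)]  # AZ lacks 1982

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```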
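Consider the panel data regression model

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \tag{1}$$

where $Z_i$ is an unobserved, time-invariant variable, $u_{it}$ is the error term, $i = 1, \dots, n$, and $t = 1, \dots, T$. Consider Equation (1) for each of
the two years 1982 and 1988:

$$Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + \beta_2 Z_i + u_{i,1982}, \tag{2}$$

$$Y_{i,1988} = \beta_0 + \beta_1 X_{i,1988} + \beta_2 Z_i + u_{i,1988}. \tag{3}$$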
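Then we can eliminate the fixed effect term $Z_i$ by subtracting Equation (2) from Equation (3):

$$Y_{i,1988} - Y_{i,1982} = \beta_1 (X_{i,1988} - X_{i,1982}) + u_{i,1988} - u_{i,1982}. \tag{4}$$

More generally, consider the regression model for all $T$ periods,

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \tag{5}$$

where $Z_i$ is an unobserved variable that varies from one state to the next but does not change over
time. We are interested in estimating $\beta_1$, the effect on $Y$ of $X$, holding constant the unobserved
state characteristics $Z$. Let $\alpha_i = \beta_0 + \beta_2 Z_i$; then we can rewrite Equation (5) as

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}. \tag{6}$$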
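The differencing argument can be checked numerically. The following is a minimal sketch, not an analysis of real data: the parameter values, the four hypothetical entities, and the helper `ols_slope` are all assumptions made here, and the error term is set to zero so that the arithmetic is exact.

```python
def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical parameters and data (not from the text), with no error term:
# Y_it = b0 + b1 * X_it + b2 * Z_i.
b0, b1, b2 = 1.0, -0.5, 2.0
z = [1.0, 2.0, 3.0, 4.0]        # unobserved, fixed over time
x_1982 = [1.0, 2.0, 3.0, 4.0]   # correlated with z
x_1988 = [1.5, 3.0, 3.5, 6.0]

y_1982 = [b0 + b1 * x + b2 * zi for x, zi in zip(x_1982, z)]
y_1988 = [b0 + b1 * x + b2 * zi for x, zi in zip(x_1988, z)]

# Cross-sectional regression for 1982 alone: Z_i is omitted, so the slope is biased.
cross_section_slope = ols_slope(x_1982, y_1982)

# Regression of the 1988 - 1982 changes: Z_i has dropped out.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
difference_slope = ols_slope(dx, dy)

print(cross_section_slope)  # 1.5, far from b1
print(difference_slope)     # -0.5, equal to b1
```

In this toy setup the single-year regression even gets the sign wrong, while the regression in changes recovers the coefficient on X because the time-invariant term cancels.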
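• Pooled OLS
When $E[\alpha_i X_{it}] = 0$ and $E[u_{it} X_{it}] = 0$, simply running an OLS regression of $Y$ on $X$ will yield
a consistent estimator of $\beta_1$.
• OLS on Transformed Variables
When $E[\dot{u}_{it} \dot{X}_{it}] = 0$, where a dot marks a variable from which the entity fixed effect has been
removed (for example, by subtracting the entity mean), running an OLS regression of $\dot{Y}$ on $\dot{X}$ will
yield a consistent estimator of $\beta_1$.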
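• Least Squares Dummy Variable Approach
Equivalently, the fixed effects regression model can be written in terms of a common intercept,
the $X$, and $n - 1$ binary variables representing all but one entity:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it},$$

where $D2_i = 1$ if $i = 2$ and $D2_i = 0$ otherwise, and so forth.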
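The difference between pooled OLS and an estimator that removes the entity effect can be seen on a toy panel. This is a minimal sketch under stated assumptions: the data are hypothetical, the error term is zero, the entity effect is deliberately made to correlate with X, and the helper names (`ols_slope`, `demean_by_entity`) are introduced here for illustration. Entity demeaning is used as the transformation.

```python
def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def demean_by_entity(values, entities):
    """Subtract each entity's own mean from its observations."""
    totals, counts = {}, {}
    for v, e in zip(values, entities):
        totals[e] = totals.get(e, 0.0) + v
        counts[e] = counts.get(e, 0) + 1
    return [v - totals[e] / counts[e] for v, e in zip(values, entities)]

# Hypothetical panel: 3 entities, 3 periods, Y_it = 2 * X_it + alpha_i.
beta1 = 2.0
alpha = {"A": 10.0, "B": 0.0, "C": -10.0}   # falls as X rises across entities
entities = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [beta1 * xi + alpha[e] for xi, e in zip(x, entities)]

pooled_slope = ols_slope(x, y)
within_slope = ols_slope(demean_by_entity(x, entities),
                         demean_by_entity(y, entities))

print(pooled_slope)  # -1.0: biased because alpha_i is correlated with X
print(within_slope)  # 2.0: the entity effect has been removed
```

Regressing Y on X and a full set of entity dummies gives the same slope as the demeaned regression; demeaning simply avoids estimating the dummy coefficients explicitly.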
The Regression Model with Time Fixed Effects

When we control for variables that are
constant across entities but change over time, the population regression model can be modified as

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it},$$

where $S_t$ is unobserved and where the single $t$ subscript emphasizes that the control variable changes
over time but is constant across states. Because $\beta_3 S_t$ represents variables that determine $Y_{it}$, if $S_t$
is correlated with $X_{it}$, then omitting $S_t$ from the regression leads to omitted variable bias.
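If $S_t$ is the only omitted variable, the time fixed effects regression model is

$$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it},$$

where $\lambda_t = \beta_0 + \beta_3 S_t$. Equivalently, the time fixed effects regression model can be written in terms of a common intercept,
the $X$, and $T - 1$ binary variables representing all but one time period:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it},$$

where $B2_t = 1$ if $t = 2$ and $B2_t = 0$ otherwise, and so forth.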
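One way to see why time fixed effects absorb a variable that is common to all entities is to subtract period averages, in the same spirit as entity demeaning. The derivation is a sketch for a balanced panel; the symbol $\lambda_t$ for the combined time effect and the bar notation for averages across entities are labels assumed here.

```latex
\begin{align*}
Y_{it} &= \beta_1 X_{it} + \lambda_t + u_{it}, \\
\bar{Y}_t &= \beta_1 \bar{X}_t + \lambda_t + \bar{u}_t,
  \qquad \text{where } \bar{Y}_t = \frac{1}{n} \sum_{i=1}^{n} Y_{it}, \\
Y_{it} - \bar{Y}_t &= \beta_1 \left( X_{it} - \bar{X}_t \right) + \left( u_{it} - \bar{u}_t \right).
\end{align*}
```

Because $\lambda_t$ is the same for every entity in period $t$, it appears unchanged in the period average and cancels in the last line, leaving only $\beta_1$ to estimate.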
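• Both Entity and Time Fixed Effects
The two-way fixed effects regression model, with both entity and time fixed effects and a single $X$ regressor, is

$$Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it},$$

where $\alpha_i$ is the entity fixed effect and $\lambda_t$ is the time fixed effect. Equivalently, the model can be written in terms of a common intercept,
the $X$, $n - 1$ binary variables representing all but one entity, and $T - 1$ binary variables representing all but one time period:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}.$$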
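For a balanced panel, a model with both kinds of fixed effects can be estimated by subtracting entity means and period means and adding back the grand mean. The sketch below illustrates this on hypothetical, noise-free data; the helper names and all numbers are assumptions made here, and the shortcut is not valid as written for unbalanced panels.

```python
def group_means(values, keys):
    """Mean of `values` within each distinct key."""
    totals, counts = {}, {}
    for v, k in zip(values, keys):
        totals[k] = totals.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

def one_way_demean(values, keys):
    """Subtract the group mean (entity or period) from each observation."""
    means = group_means(values, keys)
    return [v - means[k] for v, k in zip(values, keys)]

def two_way_demean(values, entities, periods):
    """Subtract entity and period means, add back the grand mean (balanced panels)."""
    entity_mean = group_means(values, entities)
    period_mean = group_means(values, periods)
    grand_mean = sum(values) / len(values)
    return [v - entity_mean[e] - period_mean[t] + grand_mean
            for v, e, t in zip(values, entities, periods)]

def slope_through_origin(x, y):
    """OLS slope without an intercept (the inputs are already demeaned)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical balanced panel: Y_it = 2 * X_it + alpha_i + lambda_t, no error term.
beta1 = 2.0
alpha = {"A": 10.0, "B": 0.0, "C": -10.0}
lam = {1: 0.0, 2: 5.0, 3: -3.0}
entities = [e for e in "ABC" for _ in range(3)]
periods = [1, 2, 3] * 3
x = [1.0, 2.0, 4.0, 4.0, 5.0, 6.0, 7.0, 9.0, 9.0]
y = [beta1 * xi + alpha[e] + lam[t] for xi, e, t in zip(x, entities, periods)]

# Removing only the entity effect leaves lambda_t in the error term.
entity_only_slope = slope_through_origin(one_way_demean(x, entities),
                                         one_way_demean(y, entities))
# Removing both effects recovers beta1 up to floating-point rounding.
two_way_slope = slope_through_origin(two_way_demean(x, entities, periods),
                                     two_way_demean(y, entities, periods))

print(round(entity_only_slope, 4))
print(round(two_way_slope, 4))
```

In this example the entity-only estimate stays away from the true coefficient of 2 because the period effect is correlated with X, while the two-way transformation removes both sources of bias.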
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \dots, n$$
Standard errors that are valid if $u_{it}$ is potentially heteroskedastic and potentially correlated over
time within an entity are referred to as heteroskedasticity-and-autocorrelation-robust (HAR)
standard errors.
The standard errors used in this chapter are one type of HAR standard errors, clustered standard
errors. The term clustered arises because these standard errors allow the regression errors to have
an arbitrary correlation within a cluster, but assume that the regression errors are uncorrelated
across clusters.
Like heteroskedasticity-robust standard errors in regression with cross-sectional data, clustered
standard errors are valid whether or not there is heteroskedasticity, autocorrelation, or both.
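To make the idea of clustering concrete, the sketch below computes a clustered standard error for a one-regressor fixed effects estimator on a tiny hypothetical panel, alongside a heteroskedasticity-robust standard error that ignores within-entity correlation. The data and variable names are assumptions made here, and the formulas omit the degrees-of-freedom adjustments that statistical packages typically apply, so the numbers would not match packaged output exactly.

```python
import math

def demean_by_entity(values, entities):
    """Subtract each entity's own mean from its observations."""
    totals, counts = {}, {}
    for v, e in zip(values, entities):
        totals[e] = totals.get(e, 0.0) + v
        counts[e] = counts.get(e, 0) + 1
    return [v - totals[e] / counts[e] for v, e in zip(values, entities)]

# Hypothetical panel: 3 entities observed in 2 periods each.
entities = ["A", "A", "B", "B", "C", "C"]
x = [1.0, 3.0, 2.0, 4.0, 0.0, 2.0]
y = [2.0, 5.0, 1.0, 6.0, 3.0, 4.0]

# Fixed effects estimate from the entity-demeaned regression.
xd = demean_by_entity(x, entities)
yd = demean_by_entity(y, entities)
sxx = sum(a * a for a in xd)
beta_hat = sum(a * b for a, b in zip(xd, yd)) / sxx
resid = [b - beta_hat * a for a, b in zip(xd, yd)]

# Heteroskedasticity-robust only: square each observation's score separately.
meat_het = sum((a * r) ** 2 for a, r in zip(xd, resid))

# Clustered: sum the scores within each entity first, then square the sums,
# which allows arbitrary correlation of the errors within an entity.
score_by_entity = {}
for a, r, e in zip(xd, resid, entities):
    score_by_entity[e] = score_by_entity.get(e, 0.0) + a * r
meat_cluster = sum(s ** 2 for s in score_by_entity.values())

se_het = math.sqrt(meat_het) / sxx
se_cluster = math.sqrt(meat_cluster) / sxx

print(beta_hat)              # 1.5
print(round(se_het, 4))      # 0.3333
print(round(se_cluster, 4))  # 0.4714
```

The only difference between the two calculations is whether the products of regressor and residual are squared observation by observation or summed within an entity before squaring; in this example the within-entity sums reinforce each other, so the clustered standard error is the larger one.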
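$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}, \quad i = 1, \dots, n, \quad t = 1, \dots, T.$$

1. $u_{it}$ has conditional mean zero: $E(u_{it} \mid X_{i1}, \dots, X_{iT}, \alpha_i) = 0$.
2. $(X_{i1}, \dots, X_{iT}, u_{i1}, \dots, u_{iT})$, $i = 1, \dots, n$, are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: $(X_{it}, u_{it})$ have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
For multiple regressors, $X_{it}$ should be replaced by the full list $X_{1,it}, \dots, X_{N,it}$.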
Under the four assumptions for panel data above, the fixed effects estimator is consistent and is
normally distributed when n is large.
Note that:
• the 1st assumption will be violated if the current $u_{it}$ is correlated with past, present, or future
values of $X$.
• the 2nd assumption for panel data here holds that the variables are independent across entities
but makes no such restriction within an entity. It does not exclude the possibility of $u_{it}$ being
heteroskedastic, or $X_{it}$ being correlated over time within an entity.
2 Estimating Fixed Effects Using Stata
log close
References
[1] James H. Stock and Mark W. Watson (2019). Introduction to Econometrics. Pearson.