Fixed and Random Effects
Advanced Panel Data Methods
Fixed effects estimation
• The fixed effect $a_i$ can be eliminated in two ways:
1) first differencing (FD) or 2) the fixed effects (FE) transformation
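• Fixed effects transformation (FE), also called the within transformation:
$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, \dots, T$ (1)
$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i$ (2)
Subtracting (2) from (1): $y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i$
or $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$
• $\ddot{y}_{it}$, $\ddot{x}_{it}$, $\ddot{u}_{it}$ - time-demeaned data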
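• We can add more explanatory variables to the system:
$y_{it} = \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + a_i + u_{it}$
• Time-demeaned equation is:
$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \dots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}$
We estimate it by pooled OLS
• An explanatory variable that is constant over time (e.g. distance, gender) gets swept
away by the transformation: $\ddot{x}_{it} = 0$ for all i and t if $x_{it} = \bar{x}_i$ for all t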
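• For OLS on the time-demeaned equation to be valid, the following assumptions need to be satisfied:
1. Strict exogeneity: $E(u_{it} \mid X_i, a_i) = 0$
2. Homoskedasticity: $Var(u_{it} \mid X_i, a_i) = \sigma_u^2$
3. No serial correlation: $Cov(u_{it}, u_{is} \mid X_i, a_i) = 0$ for all $t \neq s$
Under these assumptions FE is more efficient than FD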
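The within transformation described above can be sketched in a few lines of Python. This is only a minimal illustration with a single regressor: the data are made up, and the helper names `within_demean` and `fe_slope` are hypothetical, not part of any library.

```python
from collections import defaultdict

def within_demean(ids, values):
    """Subtract each unit's time average from its own observations."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]

def fe_slope(ids, x, y):
    """Pooled OLS (no intercept) of time-demeaned y on time-demeaned x."""
    xd = within_demean(ids, x)
    yd = within_demean(ids, y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Illustrative data (assumed): y_it = 2 * x_it + a_i, with no noise
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 2.0, 2.5, 6.0]
a = {1: 10.0, 2: -5.0, 3: 3.0}          # unobserved fixed effects
y = [2.0 * xi + a[i] for i, xi in zip(ids, x)]

beta_fe = fe_slope(ids, x, y)           # slope is recovered despite a_i

# A time-constant regressor is swept away: every demeaned value is zero
gender = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
gender_demeaned = within_demean(ids, gender)
```

Because the fixed effects differ across units, a pooled regression on the raw data would generally not recover the slope; demeaning removes $a_i$ before the regression is run.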
Degrees of freedom
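• Total observations: NT
• Independent variables: k
• Pooled OLS on the time-demeaned data reports df = NT – k, which is too large
• True df = NT – N – k = N(T – 1) – k, since one df is lost for each cross-sectional unit in time-demeaning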
• Example: N = 54 firms, T = 3 years, k = 4
1) Number of observations: NT = 162
2) df = NT – N – k = 162 – 54 – 4 = 104
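The degrees-of-freedom arithmetic can be checked with a short sketch. It also shows the factor by which standard errors from a pooled OLS run on demeaned data would need to be rescaled, assuming that regression has no intercept and therefore uses NT – k. The function names are hypothetical.

```python
import math

def fe_degrees_of_freedom(n_units, n_periods, k):
    """df for FE on a balanced panel: NT - N - k = N(T - 1) - k."""
    return n_units * n_periods - n_units - k

def se_correction_factor(n_units, n_periods, k):
    """Rescale pooled-OLS standard errors from the demeaned regression
    (which uses NT - k) to the correct FE degrees of freedom."""
    naive_df = n_units * n_periods - k
    return math.sqrt(naive_df / fe_degrees_of_freedom(n_units, n_periods, k))

df = fe_degrees_of_freedom(54, 3, 4)      # the example above: 162 - 54 - 4
factor = se_correction_factor(54, 3, 4)   # sqrt(158 / 104), greater than 1
```

The factor exceeds 1, so standard errors computed with the naive df are too small.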
Example: Effect of job training on firm scrap rates
• N = 54 firms, T = 3 years: 1987, 1988, 1989
• Include lagged effect of grants given in 1988 and 1989
Problem with FE estimation
• It is unclear how to compute goodness-of-fit: the R-squared given in Table 14.1 is
based on the within transformation, i.e. how much of the time variation in y is
explained by the time variation in x. Other ways of computing it are also possible.
• Time-constant variables cannot be included in the model, but can be
interacted with year dummies. E.g. education in wage equation can be
interacted with each year dummy to see how return on education changed
over the years.
• Variables whose change across time is constant (e.g. experience, age) cannot be
distinguished from the year dummies when a full set of them is included, so they are dropped in FE estimation.
Dummy Variable Regression
• The traditional FE approach treats $a_i$ as a parameter to be estimated.
Here $a_i$ is the intercept for person i (or firm i, city i), estimated
along with the $\beta_j$. We need at least two time periods for this.
• To estimate the intercept for each i, we put in a dummy variable for each
cross-sectional observation along with explanatory variables. This method
is called dummy variable regression.
• When N is large, this requires too many explanatory variables, so it is not very practical.
• Advantages of DVR:
exactly the same estimates of the $\beta_j$ as FE;
standard errors and other statistics are identical;
degrees of freedom are computed properly and directly
Stata output
• Thus, FE estimator can be obtained by the dummy variable regression.
• The R-squared from DVR is rather high (0.92) because we are including a dummy variable for
each cross-sectional unit.
• The R-squared from DVR can be used to compute F tests assuming that the
classical linear model assumptions hold.
In particular, we can test the joint significance of all of the cross-sectional dummies (N –
1 dummies)
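• The estimated intercepts, $\hat{a}_i$, are directly available from the DVR. For FE, we
can compute them ourselves as follows:
$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_{i1} - \dots - \hat{\beta}_k \bar{x}_{ik}, \quad i = 1, \dots, N$
where $\hat{\beta}_j$ are the FE estimates.
• $\hat{a}_i$ tells us if the individual-specific effect is above or below the average value in the
sample.
e.g. in a model of crime we obtain $\hat{a}_i$ for a city to see if the unobserved fixed effects that contribute to
crime are above or below average.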
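The equivalence between DVR and FE, and the recovery of the intercepts as the unit mean of y minus the FE slope times the unit mean of x, can be checked numerically. The sketch below uses made-up data for three units and a hand-rolled normal-equations solver; `solve` and `ols` are hypothetical helpers written for this illustration.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Illustrative data (assumed): 3 units observed for 3 periods each
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 2.0, 2.5, 6.0]
y = [12.3, 13.9, 18.2, -5.1, 1.2, 4.8, 7.1, 7.9, 15.3]

# DVR: regress y on x plus one dummy per unit (no common intercept)
X_dvr = [[xi] + [1.0 if i == j else 0.0 for j in range(3)] for i, xi in zip(ids, x)]
coefs = ols(X_dvr, y)
beta_dvr, intercepts = coefs[0], coefs[1:]

# FE: pooled OLS on time-demeaned data
xbar = [sum(x[3 * j:3 * j + 3]) / 3 for j in range(3)]
ybar = [sum(y[3 * j:3 * j + 3]) / 3 for j in range(3)]
xd = [xi - xbar[i] for i, xi in zip(ids, x)]
yd = [yi - ybar[i] for i, yi in zip(ids, y)]
beta_fe = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)

# Intercepts recovered from the FE slope: a_hat_i = ybar_i - beta_fe * xbar_i
a_hat = [ybar[j] - beta_fe * xbar[j] for j in range(3)]
```

With N dummies the regressor matrix has N + k columns, which is why this approach becomes impractical for large N even though it yields the same slope.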
Fixed Effects or First Differencing?
• When T = 2, FE = FD (both unbiased under assumptions FE.1-FE.4)
FE must include a time dummy to be identical to FD, which includes an intercept
FD has an advantage over FE: it is more straightforward, being just a cross-sectional regression
• When $T \geq 3$, the choice depends on the relative efficiency of the estimators, which is determined by
the serial correlation of $u_{it}$ (assuming homoskedasticity of $u_{it}$)
When $u_{it}$ is serially uncorrelated, FE is more efficient than FD (FE > FD), and the standard errors of FE are valid.
When $u_{it}$ is serially correlated, the two are not easily compared. With strong positive correlation, e.g. a random walk,
FD > FE, as $\Delta u_{it}$ becomes serially uncorrelated.
• It’s a good idea to try both: if results are not sensitive, so much better.
• When FE and FD give substantively different results, it makes sense to report both
sets of results and to try to determine why they differ.
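The T = 2 equivalence can be verified on a small example: FD with an intercept and FE with a period-2 dummy give the same slope, and the FD intercept equals the FE coefficient on the dummy. The data below are made up for illustration and `ols2` is a hypothetical helper.

```python
def ols2(x1, x2, y):
    """OLS of y on two regressors x1, x2 without an intercept (closed form)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Illustrative two-period data (assumed) for 4 units: (period 1, period 2)
x = [(1.0, 2.5), (0.0, 3.0), (2.0, 2.2), (4.0, 3.1)]
y = [(3.1, 6.4), (-1.0, 5.2), (7.3, 8.1), (9.0, 8.8)]

# FD: cross-sectional regression of dy on an intercept and dx
dx = [b - a for a, b in x]
dy = [b - a for a, b in y]
mdx, mdy = sum(dx) / len(dx), sum(dy) / len(dy)
beta_fd = (sum((a - mdx) * (b - mdy) for a, b in zip(dx, dy))
           / sum((a - mdx) ** 2 for a in dx))
intercept_fd = mdy - beta_fd * mdx

# FE: time-demean x, y and the period-2 dummy, then pooled OLS
xd, yd, dd = [], [], []
for (xa, xb), (ya, yb) in zip(x, y):
    xm, ym = (xa + xb) / 2, (ya + yb) / 2
    xd += [xa - xm, xb - xm]
    yd += [ya - ym, yb - ym]
    dd += [-0.5, 0.5]          # dummy (0, 1) minus its time average 0.5
beta_fe, delta_fe = ols2(xd, dd, yd)
```

For T of 3 or more the two estimators generally differ, which is why comparing them is informative.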
FE with Unbalanced Panels
• Unbalanced panel: a panel data set, especially on individuals or firms, that has
missing years for at least some cross-sectional units in the sample.
• Estimation is almost the same
Total number of observations: $T_1 + T_2 + \dots + T_N$. One df is lost for each cross-sectional unit due to time-demeaning.
DVR follows exactly the same process.
• The more difficult issue with an unbalanced panel is determining why the panel is
unbalanced
With cities and states, for example, data on key variables are sometimes missing for certain years.
If the reason for missing values is not correlated with $u_{it}$, the unbalanced panel causes no problems.
But with cities, firms and individuals (e.g. how unionization affects firm profitability), missing
values can be a problem. e.g. firms missing in subsequent years may have gone out of business, so we have
a non-random sample. Will the estimators be unbiased?
• If the reason a firm leaves the sample (called attrition) is correlated with the idiosyncratic
error (those unobserved factors that change over time and affect profits), then the
resulting sample selection can cause biased estimators.
Example 14.3. Stata Output
Total observations: 148
We add two variables to the analysis: log(sales_it) and log(employ_it). Three firms are missing entirely and 5
additional observations are lost, leaving (54 – 3) × 3 – 5 = 148 observations. The estimated grant effect gets larger.
Wooldridge (2010, Ch10)
Random Effects Models
Example 14.4 Wage equation
• In this example $\hat{\theta}$ is closer to 1 than to 0, so RE is close to FE
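For reference, the standard textbook form of the RE (quasi-demeaning) transformation is sketched below. The notation is an assumption following the usual convention rather than something recovered from the slides: $\sigma_a^2$ and $\sigma_u^2$ are the variances of $a_i$ and $u_{it}$, and $v_{it} = a_i + u_{it}$ is the composite error.

```latex
\theta = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2} \right]^{1/2}

y_{it} - \theta\bar{y}_i
  = \beta_0 (1 - \theta)
  + \beta_1 (x_{it1} - \theta\bar{x}_{i1}) + \dots
  + \beta_k (x_{itk} - \theta\bar{x}_{ik})
  + (v_{it} - \theta\bar{v}_i)
```

When $\theta = 0$ the transformation leaves the data unchanged and RE reduces to pooled OLS; when $\theta = 1$ it is the full within transformation and RE coincides with FE. Values in between subtract only a fraction of each time average, which is why time-constant variables survive in RE.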
Hausman test
• It is common to see researchers apply both random effects and fixed effects, and then
formally test for statistically significant differences in the coefficients on the time-
varying explanatory variables.
• Hausman (1978) first proposed such a test.
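• The idea is to use the RE estimates unless the Hausman test rejects the RE assumption that $a_i$ is uncorrelated with the explanatory variables
• A failure to reject means either that the RE and FE estimates are close enough that it does not matter which is used, or that the FE estimates are too imprecise to detect a difference
• A rejection means the key RE assumption is false, so the FE estimates should be used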
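For a single time-varying coefficient, the Hausman statistic reduces to a scalar that can be computed by hand. The sketch below assumes the usual form, the squared difference in estimates divided by the difference in their variances, compared with a chi-square distribution with 1 df. The function name and the input numbers are hypothetical and are not taken from any table in the source.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), compared with chi-square(1).
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / diff_var
    # With 1 df: P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates, for illustration only
h, p = hausman_scalar(b_fe=-0.25, b_re=-0.21, var_fe=0.0225, var_re=0.0196)
reject_at_5_percent = p < 0.05
```

With several coefficients the statistic becomes a quadratic form in the vector of differences, and the degrees of freedom equal the number of time-varying coefficients compared.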
Stata codes
use [Link], clear
//Declare data to be panel data
xtset fcode year
//Summary stats
misstable sum lscrap d88 d89 grant lsales lemploy
Hausman test
// Fixed Effects
xtreg lscrap d88 d89 grant lsales lemploy, fe
estimates store fixed
// Random Effects
xtreg lscrap d88 d89 grant lsales lemploy, re
estimates store random
//Hausman test for FE vs RE
hausman fixed random, sigmamore
Result: the test fails to reject, so the RE estimates can be used
Correlated RE Approach
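• Consider the simple model
$y_{it} = \beta_1 x_{it} + a_i + u_{it}$ (1)
• Rather than assuming $a_i$ is uncorrelated with $x_{it}$ (RE) or taking away time averages to
remove $a_i$ (FE), we might model the correlation between $a_i$ and $x_{it}$:
$a_i = \alpha + \gamma \bar{x}_i + r_i$ (2)
where $\bar{x}_i$ is the time average of $x_{it}$
Assume $r_i$ is uncorrelated with each $x_{it}$
• (1) and (2) allow $a_i$ and $\bar{x}_i$ to be correlated. Substituting (2) into (1),
$y_{it} = \alpha + \beta_1 x_{it} + \gamma \bar{x}_i + r_i + u_{it}$
can be estimated using RE
• It can be shown that: $\hat{\beta}_{CRE} = \hat{\beta}_{FE}$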
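The result that the CRE slope equals the FE slope can be illustrated numerically. Note one substitution: the sketch estimates the augmented equation (y on a constant, x, and the time average of x) by pooled OLS rather than RE, since pooled OLS on this equation also reproduces the FE slope and needs no variance-component estimates. The data are made up, and `solve` and `ols` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Illustrative data (assumed): 3 units observed for 3 periods each
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 2.0, 2.5, 6.0]
y = [12.3, 13.9, 18.2, -5.1, 1.2, 4.8, 7.1, 7.9, 15.3]
xbar = [sum(x[3 * j:3 * j + 3]) / 3 for j in range(3)]
ybar = [sum(y[3 * j:3 * j + 3]) / 3 for j in range(3)]

# Augmented equation: y on a constant, x_it and the unit time average of x
X_cre = [[1.0, xi, xbar[i]] for i, xi in zip(ids, x)]
alpha_hat, beta_cre, gamma_hat = ols(X_cre, y)

# FE slope from the within transformation, for comparison
xd = [xi - xbar[i] for i, xi in zip(ids, x)]
yd = [yi - ybar[i] for i, yi in zip(ids, y)]
beta_fe = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)
```

The coefficient on the time average, `gamma_hat`, is what a regression-based test of RE against FE would examine: a value of zero would mean the time average adds nothing once x itself is included.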
Thank you