You are on page 1of 26

Panel Data Analysis:

Fixed and Random Effects Using


Stata

Dr. Abuzar Nomani


Abuzarnomani88@gmail.com

May 2023 https://www.youtube.com/channel/UCPO8r7bzTbze-zxuE2pRuqQ


Panel Data
• Panel data, also known as longitudinal data or Entity Year Y X1 X2 X3
cross-sectional time-series data, is a type of 1 1 # # # #
data used in statistical analysis that involves
observations of the same set of individuals or 1 2 # # # #
entities over time. 1 3 # # # #
2 1 # # # #
• In panel data, the same individuals or entities 2 2 # # # #
are observed at different points in time, 2 3 # # # #
allowing for analysis of changes over time and
3 1 # # # #
identification of individual and time-specific
effects. 3 2 # # # #
3 3 # # # #

See Stock and Watson, Introduction to Econometrics, chapter 10 “Regression with Panel Data”

Abuzar Nomani (abuzarnomani88@gmail.com) 2


The data: the long form
• The long form is a way of representing panel
data where each observation is listed as a Variable Variable Variable
separate row in the dataset. In the long form, ID Time 1 2 3
the dataset has a unique identifier variable 1 1 10 20 30
that links all observations for each 1 2 15 25 35
individual/entity, and a variable indicating the
time period of the observation. 2 1 12 22 32
2 2 18 28 38
3 1 8 18 28
3 2 13 23 33

Abuzar Nomani (abuzarnomani88@gmail.com) 3


The data: wide form
• The wide form is another way of representing
panel data where each individual/entity is
represented by a single row with columns for
each time period. In the wide form, each
column represents a specific time period and Var 1 Var 2 Var3 Var1 Var 2 Var3
each row represents a unique ID (T1) (T1) (T 1) (T 2) (T 2) (T 2)
individual/entity. 1 10 20 30 15 25 35
2 12 22 32 18 28 38
3 8 18 28 13 23 33

Abuzar Nomani (abuzarnomani88@gmail.com) 4


Reshape data
. reshape long
(note: j = 2001 2002 2003 2004 2005)

Data wide -> long

Number of obs. 3 -> 15


Number of variables 6 -> 3
j variable (5 values) -> year
xij variables:
x12001 x12002 ... x12005 -> x1

. reshape wide x1, i(id) j(year) string


(note: j = 2001 2002 2003 2004 2005)

Data long -> wide

Number of obs. 15 -> 3


Number of variables 3 -> 6
j variable (5 values) year -> (dropped)
xij variables:
x1 -> x12001 x12002 ... x12005

Abuzar Nomani (abuzarnomani88@gmail.com) 5


Setting data as panel
• Once the data is in long form, we need to set
it as panel so we can use Stata’s panel data xt
commands and the time series operators.

Balanced panel: all entities are


. xtset company year observed across all times.
varlist: company: string variable not allowed Unbalanced panel: some entities are
. xtset number year not observed in some years. Stata
panel variable: number (unbalanced) algorithms automatically account for
time variable: year, 2013 to 2020
delta: 1 unit
this
. xtset number year
panel variable: number (strongly balanced)
time variable: year, 2013 to 2018
delta: 1 unit

Abuzar Nomani (abuzarnomani88@gmail.com) 6


Visualization of data
• Once the data is set as panel, you can use a
series of xt commands to analyze it. 1 2

150
100
. xtline eps

• For more information type:

EPS

50
help xt

0
2012 2014 2016 20182012 2014 2016 2018
Company
Graphs by Number

Abuzar Nomani (abuzarnomani88@gmail.com) 7


Visualization of data
• All in one
1 2

150
. xtline eps roe npm

100
50
0
2012 2014 2016 20182012 2014 2016 2018
Company
EPS ROE
NPM
Graphs by Number

Abuzar Nomani (abuzarnomani88@gmail.com) 8


Data transformation/Histogram

.03
• gen ln_gdppc = ln(gdppc)

.02
• gen ln_cr = ln(cr + 2)

Density
• If the variable has negative

.01
values, you need to add a value

0
-200 -100 0 100

high enough so the minimum NPM

value is over zero (preferable 1).

.4
For example, if the lowest value

.3
in ‘varX’ is -1

Density

.2
.1
0
-1 0 1 2 3 4
ln_npm
Abuzar Nomani (abuzarnomani88@gmail.com)
9
Usage of panel data
• panel data analysis helps to overcome
omitted variable bias by controlling for
unobserved individual-specific factors that
may influence the relationship between the
variables of interest.

Abuzar Nomani (abuzarnomani88@gmail.com) 10


Example
• we can control for these unobserved factors
and obtain more accurate estimates of the
Size relationship using fixed effect model.
• Entity fixed effects. Variables that do not
financing Industry
change over time but vary across entities
(cultural factors, difference in business
practices across companies, etc.)
Profitability • Time fixed effects. Variables that change over
time but not across entities (i.e. national
policies, federal regulations, international
Brand Employes
agreements, etc.)

Management

Abuzar Nomani (abuzarnomani88@gmail.com) 11


Entity Fixed effect(Internal environ)

• want to investigate the effect of education on wages using panel data on a


sample of individuals over several years. We may include entity fixed effects to
control for unobserved differences between individuals that could affect their
wages, such as innate ability, personality traits, or family background.
• investigate the effect of corporate governance on firm performance using panel
data on a sample of firms over several years. We may include firm fixed effects to
control for unobserved differences between firms that could affect their
performance, such as the quality of management, the firm's reputation, or its
access to capital.
• They allow us to estimate the effect of variables of interest while controlling for stable differences between entities, such
as cultural factors or differences in business practices, that do not change over time but vary across entities.

Abuzar Nomani (abuzarnomani88@gmail.com) 12


Time fixed effect(External environ)

• investigate the effect of a new government policy on business


investment using panel data on a sample of firms over several years.
We may include time fixed effects to control for unobserved changes
in the economic environment that could affect business investment,
such as changes in interest rates, inflation, or global economic
conditions.
• time fixed effects in our analysis, we can capture the effects of these external changes over time and obtain
more accurate estimates of the effect of the policy on business investment, while controlling for other time-
varying factors that affect all firms in a similar way.

Abuzar Nomani (abuzarnomani88@gmail.com) 13


Entity Fixed effect model

• strict exogeneity assumption: It is assumed that there is a correlation


between the entity-specific error term (unobserved heterogeneity) and the
predictor variables within the same entity. This assumption allows the fixed
effects to capture and control for the unobserved heterogeneity that is specific to
each entity.
• 𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡
• i = 1…n ; t = 1….T
• Where:
• 𝑌𝑖𝑡 outcome variable (for entity i at time t)
• 𝛼𝑖 is the unknown intercept for each entity (n entity-specific intercepts).
• 𝑋𝑖𝑡 is a vector of predictors (for entity i at time t).
• 𝑢𝑖 within-entity error term ; 𝑒𝑖𝑡 overall error term.
• 𝛽 coefficient represents the effect of the predictor variables on the outcome variable, controlling for individual heterogeneity. Specifically, it
represents the change in the outcome variable for a given entity when a predictor variable changes one unit over time

Abuzar Nomani (abuzarnomani88@gmail.com) 14


The entity and time fixed effects regression model

• 𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝛿𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡


• i = 1…n ; t = 1….T
• Where:
• 𝑌𝑖𝑡 outcome variable (for entity i at time t).
• 𝛼𝑖 is the unknown intercept for each entity (n entity-specific intercepts).
• 𝑋𝑖𝑡 is a vector of predictors (for entity i at time t) .
• 𝛿𝑡 is the unknow coefficient for the time regressors (t), It represents the change
in the outcome variable for a unit change in the time predictor
• 𝑢𝑖 within-entity error term ; 𝑒𝑖𝑡 overall error term.
• the inclusion of both entity and time fixed effects in the regression model allows
us to control for both individual and time-specific heterogeneity and obtain
unbiased estimates of the effects of the predictors on the outcome variable.

Abuzar Nomani (abuzarnomani88@gmail.com) 15


Fixed effect idea
• The fixed effect model includes entity fixed effects in the regression model to account for
individual-specific or entity-specific heterogeneity.
• Entity fixed effects capture time-invariant differences between entities that are constant
over time but vary across entities.
• By including fixed effects, the model controls for unobserved heterogeneity that may
influence the outcome variable.
• The fixed effect model allows us to estimate the effects of time-varying variables on the
outcome variable while controlling for time-invariant entity-specific factors.
• It is suitable when we want to focus on within-entity variation and understand the effects
of time-varying predictors.
• The fixed effect model helps to control for individual/entity-specific characteristics that
remain constant over time, such as innate abilities or cultural factors.
• By accounting for fixed effects, we can obtain more accurate estimates of the effects of
the variables we're interested in.

Abuzar Nomani (abuzarnomani88@gmail.com) 16


Descriptive statistics
. sum eps roe npm

Variable Obs Mean Std. Dev. Min Max

eps 12 31.35333 49.31236 -14.26 133.66


roe 12 10.9575 7.905077 -7.87 19.32
npm 12 8.719167 7.141252 -9.11 14.72
. xtsum eps roe npm

Variable Mean Std. Dev. Min Max Observations

eps overall 31.35333 49.31236 -14.26 133.66 N = 12


between 44.89421 -.3916668 63.09833 n = 2
within 36.50138 -26.225 101.915 T = 6

roe overall 10.9575 7.905077 -7.87 19.32 N = 12


between 3.803056 8.268333 13.64667 n = 2
within 7.389263 -5.180833 22.00917 T = 6

npm overall 8.719167 7.141252 -9.11 14.72 N = 12


between 5.584965 4.77 12.66833 n = 2
within 5.829558 -5.160833 17.35917 T = 6
Fixed effects regression using xtreg, fe
𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖 + 𝑒𝑖 Fixed effect

Outcome Predictors Controlling for Total number of


heteroskedasticity Total entities
cases

. xtreg eps roe npm, fe robust

Fixed-effects (within) regression Number of obs = 12


Group variable: number Number of groups = 2

R-sq: within = 0.2337 variation in the dependent variable is Obs per group: min = 6
explained by the independent
between = 1.0000 variables in the model. avg = 6.0
overall = 0.5659 max = 6

estimated to account for potential F(1,1) = .


violations of assumptions, such as
corr(u_i, Xb) = 0.7985 heteroscedasticity or the presence of
Prob > F = .
correlation between the correlated errors.
error term and the predicted (Std. Err. adjusted for 2 clusters in number)
values of the independent 𝛽 Two-tail p-values test the hypothesis
that each coefficient is different from
variables
Robust 0 (according to its t-value). A value
lower than 0.05 will reject the null
eps Coef. Std. Err. t P>|t| [95% Conf. Interval]
and conclude that the predictor has a
significant effect on the outcome
roe -6.973354 9.527731 -0.73 0.598 -128.0347 114.0879 (95% significance).
npm 10.42195 12.12993 0.86 0.548 -143.7035 164.5474
_cons 16.89314 1.362807 12.40 0.051 -.4229617 34.20924
Intraclass correlation (rho), shows how
sigma_u 13.208038 much of the variance in the output is
sigma_e 37.468756 explained by the difference across
rho .11052755 (fraction of variance due to u_i) entities. In this example is 99%.
Fixed effects regression using xtreg, fe
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, fe robust

Fixed-effects (within) regression Number of obs = 168


Group variable: number Number of groups = 25

R-sq: within = 0.5209 Obs per group: min = 1


between = 0.3441 avg = 6.7
overall = 0.3291 max = 8 If this number is < 0.05 then
your model is ok. This is an
F(6,24) = 13.75 Ftest to see whether all the
corr(u_i, Xb) = -0.3871 Prob > F = 0.0000 coefficients in the model are
jointly different than zero.
(Std. Err. adjusted for 25 clusters in number)

Robust
ln_eps Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_cr -3.231048 2.664804 -1.21 0.237 -8.730933 2.268837


ln_qr 3.540039 2.651945 1.33 0.194 -1.933306 9.013384
ln_roce -.3406091 .7676624 -0.44 0.661 -1.924986 1.243768
ln_roa .1591794 .7432614 0.21 0.832 -1.374837 1.693195
ln_roe 1.026841 .4542418 2.26 0.033 .0893326 1.96435
ln_npm .1001502 .3322293 0.30 0.766 -.5855374 .7858379
_cons .1948932 .5428112 0.36 0.723 -.9254142 1.315201

sigma_u .83986913
sigma_e .49385331
rho .74307545 (fraction of variance due to u_i)
𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝛿𝑡 + 𝑢𝑖 + 𝑒𝑖 Entity and time fixed effects regression using xtreg, fe
Time Fixed effect

. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm i year, fe robust

Fixed-effects (within) regression Number of obs = 168


Group variable: number Number of groups = 25

R-sq: within = 0.6138 Obs per group: min = 1


between = 0.4704 avg = 6.7
overall = 0.4491 max = 8

F(8,24) = 15.91
corr(u_i, Xb) = -0.0878 Prob > F = 0.0000

(Std. Err. adjusted for 25 clusters in number)

Robust
ln_eps Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_cr -1.745499 2.435638 -0.72 0.481 -6.772409 3.281412


ln_qr 1.692942 2.402481 0.70 0.488 -3.265536 6.651419
ln_roce .0631209 .6523751 0.10 0.924 -1.283315 1.409557
ln_roa -.0676198 .6996967 -0.10 0.924 -1.511723 1.376483
ln_roe .8251312 .3356368 2.46 0.022 .1324108 1.517851
ln_npm -.0036716 .3204184 -0.01 0.991 -.6649826 .6576394
indas -.4810444 .1563704 -3.08 0.005 -.8037771 -.1583117
year .0062592 .0295927 0.21 0.834 -.0548171 .0673355
_cons -11.62416 59.68456 -0.19 0.847 -134.807 111.5587

sigma_u .71784076
sigma_e .44665697
rho .72089666 (fraction of variance due to u_i)
Random effect idea
• The random effect model assumes that individual-specific heterogeneity is unrelated to
the predictors we're studying.
• Instead of including entity fixed effects, the random effect model treats individual-
specific effects as random variables.
• These random variables have specific distribution assumptions, often assumed to follow
a normal distribution.
• The random effect model estimates the average effects of the predictors on the outcome
variable.
• It takes into account both within-entity variation (changes in predictors and outcome
over time) and between-entity variation (differences in average values across entities).
• The random effect model is useful when we want to capture overall or average effects of
the predictors on the outcome.
• It assumes that individual-specific effects are random and uncorrelated with the
predictors, allowing us to focus on the relationship between the predictors and the
outcome.

Abuzar Nomani (abuzarnomani88@gmail.com) 21


Random effects regression using xtreg, re
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, re robust

Random-effects GLS regression Number of obs = 168


Group variable: number Number of groups = 25

R-sq: within = 0.5026 Obs per group: min = 1


between = 0.5811 avg = 6.7
overall = 0.4732 max = 8

Wald chi2(6) = 141.76


corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

(Std. Err. adjusted for 25 clusters in number)

Robust
ln_eps Coef. Std. Err. z P>|z| [95% Conf. Interval]

ln_cr -1.708752 1.451515 -1.18 0.239 -4.553669 1.136165


ln_qr 2.054775 1.469891 1.40 0.162 -.8261592 4.93571
ln_roce -.5889436 .5884734 -1.00 0.317 -1.74233 .5644431
ln_roa .4986326 .7704968 0.65 0.518 -1.011513 2.008778
ln_roe 1.247742 .295727 4.22 0.000 .668128 1.827356
ln_npm -.2463293 .2087309 -1.18 0.238 -.6554344 .1627757
_cons .1572039 .4527437 0.35 0.728 -.7301575 1.044565

sigma_u .61152367
sigma_e .49385331
rho .60525987 (fraction of variance due to u_i)
Which to choose?

• Whenever there is a clear idea that individual characteristics of each


entity or group affect the regressors, use fixed effects.
• Random effects is used whenever there is reason to believe that
individual characteristics have no effect on the regressors
(uncorrelated).

• example, macroeconomic data collected for most countries overtime.


There might be a good reason to believe that countries’ economic
performance may be affected by their own internal characteristics:
type of government, political environment, cultural characteristics,
type of public policies, etc
Abuzar Nomani (abuzarnomani88@gmail.com) 23
The Hausman-test tests whether the individual characteristics are correlated with
the regressors (see Green, 2008, chapter 9). The null hypothesis is that they are not
(random effects).
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm i year, fe robust
. estimates store fe
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, re robust
. estimates store re
. hausman fe re

Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fe re Difference S.E.

ln_cr -3.231048 -1.708752 -1.522296 .8846641


ln_qr 3.540039 2.054775 1.485264 .8502234
ln_roce -.3406091 -.5889436 .2483345 .4311662
ln_roa .1591794 .4986326 -.3394532 .4305656
ln_roe 1.026841 1.247742 -.2209007 .1968139
ln_npm .1001502 -.2463293 .3464796 .2235597

b = consistent under Ho and Ha; obtained from xtreg


B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 10.86 If Prob > chi2 is < 0.05
Prob>chi2 = 0.0928 use fixed effects
Do we need random effects?
• The LM test helps you decide between a random effects regression and a simple
OLS regression. The null hypothesis in the LM test is that variances across
entities is equal to zero. This is, no significant difference across units (i.e. no
panel effect). The command in Stata is xttset0 type it right after running the
random effects model
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

ln_eps[number,t] = Xb + u[number] + e[number,t]

Estimated results:
Var sd = sqrt(Var)

ln_eps 1.150541 1.072633


e .2438911 .4938533
u .3739612 .6115237

Test: Var(u) = 0
chibar2(01) = 142.48
Prob > chibar2 = 0.0000
25
Suggested books / references
• Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley,
2007.
• Econometric Analysis of Panel Data, Badi H. Baltagi, Wiley, 2008.
• Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, 2008.
• An Introduction to Modern Econometrics Using Stata/ Christopher F. Baum, Stata Press, 2006.
• Data analysis using regression and multilevel/hierarchical models / Andrew Gelman, Jennifer Hill. Cambridge
; New York : Cambridge University Press, 2007.
• Data Analysis Using Stata/ Ulrich Kohler, Frauke Kreuter, 2 nd ed., Stata Press, 2009.
• Statistics with Stata / Lawrence Hamilton, Thomson Books/Cole, 2006.
• Statistical Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam Kachigan,
New York : Radius Press, c1986
• “Beyond “Fixed Versus Random Effects”: A framework for improving substantive and statistical analysis of
panel, time-series cross-sectional, and multilevel data” / Brandom Bartels
http://polmeth.wustl.edu/retrieve.php?id=838
• “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence” / Daniel Hoechle,
http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O.Keohane, Sidney
Verba, Princeton University Press, 1994.
• Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King, Cambridge
University Press, 1989.
Abuzar Nomani (abuzarnomani88@gmail.com) 26

You might also like