Dr. Abuzar Nomani

Panel Data Analysis:
Fixed and Random Effects Using

Stata
Dr. Abuzar Nomani

Abuzarnomani88@gmail.com
May 2023 https://www.youtube.com/channel/UCPO8r7bzTbze-zxuE2pRuqQ

Panel Data
• Panel data, also known as longitudinal data or Entity Year Y X1 X2 X3
cross-sectional time-series data, is a type of 1 1 # # # #
data used in statistical analysis that involves
observations of the same set of individuals or 1 2 # # # #
entities over time. 1 3 # # # #
2 1 # # # #
• In panel data, the same individuals or entities 2 2 # # # #
are observed at different points in time, 2 3 # # # #
allowing for analysis of changes over time and
3 1 # # # #
identification of individual and time-specific
effects. 3 2 # # # #
3 3 # # # #
See Stock and Watson, Introduction to Econometrics, chapter 10 “Regression with Panel Data”
Abuzar Nomani (abuzarnomani88@gmail.com) 2

The data: the long form
• The long form is a way of representing panel
data where each observation is listed as a Variable Variable Variable
separate row in the dataset. In the long form, ID Time 1 2 3
the dataset has a unique identifier variable 1 1 10 20 30
that links all observations for each 1 2 15 25 35
individual/entity, and a variable indicating the
time period of the observation. 2 1 12 22 32
2 2 18 28 38
3 1 8 18 28
3 2 13 23 33

The data: wide form
• The wide form is another way of representing
panel data where each individual/entity is
represented by a single row with columns for
each time period. In the wide form, each
column represents a specific time period and Var 1 Var 2 Var3 Var1 Var 2 Var3
each row represents a unique ID (T1) (T1) (T 1) (T 2) (T 2) (T 2)
individual/entity. 1 10 20 30 15 25 35
2 12 22 32 18 28 38
3 8 18 28 13 23 33

Reshape data
. reshape long
(note: j = 2001 2002 2003 2004 2005)
Data wide -> long
Number of obs. 3 -> 15

Number of variables 6 -> 3
j variable (5 values) -> year
xij variables:
x12001 x12002 ... x12005 -> x1
. reshape wide x1, i(id) j(year) string

(note: j = 2001 2002 2003 2004 2005)
Data long -> wide
Number of obs. 15 -> 3

Number of variables 3 -> 6
j variable (5 values) year -> (dropped)
xij variables:
x1 -> x12001 x12002 ... x12005

Setting data as panel
• Once the data is in long form, we need to set
it as panel so we can use Stata’s panel data xt
commands and the time series operators.
Balanced panel: all entities are

. xtset company year observed across all times.
varlist: company: string variable not allowed Unbalanced panel: some entities are
. xtset number year not observed in some years. Stata
panel variable: number (unbalanced) algorithms automatically account for
time variable: year, 2013 to 2020
delta: 1 unit
this
. xtset number year
panel variable: number (strongly balanced)
time variable: year, 2013 to 2018
delta: 1 unit

Visualization of data
• Once the data is set as panel, you can use a
series of xt commands to analyze it. 1 2
150
100
. xtline eps
• For more information type:
EPS
50
help xt
0
2012 2014 2016 20182012 2014 2016 2018
Company
Graphs by Number

Visualization of data
• All in one
1 2
150
. xtline eps roe npm
100
50
0
2012 2014 2016 20182012 2014 2016 2018
Company
EPS ROE
NPM
Graphs by Number

Data transformation/Histogram
.03
• gen ln_gdppc = ln(gdppc)
.02
• gen ln_cr = ln(cr + 2)
Density
• If the variable has negative
.01
values, you need to add a value
0
-200 -100 0 100
high enough so the minimum NPM
value is over zero (preferable 1).
.4
For example, if the lowest value
.3
in ‘varX’ is -1
Density
.2
.1
0
-1 0 1 2 3 4
ln_npm
Abuzar Nomani (abuzarnomani88@gmail.com)
9
Usage of panel data
• panel data analysis helps to overcome
omitted variable bias by controlling for
unobserved individual-specific factors that
may influence the relationship between the
variables of interest.

Example
• we can control for these unobserved factors
and obtain more accurate estimates of the
Size relationship using fixed effect model.
• Entity fixed effects. Variables that do not
financing Industry
change over time but vary across entities
(cultural factors, difference in business
practices across companies, etc.)
Profitability • Time fixed effects. Variables that change over
time but not across entities (i.e. national
policies, federal regulations, international
Brand Employes
agreements, etc.)
Management

Entity Fixed effect(Internal environ)
• want to investigate the effect of education on wages using panel data on a

sample of individuals over several years. We may include entity fixed effects to
control for unobserved differences between individuals that could affect their
wages, such as innate ability, personality traits, or family background.
• investigate the effect of corporate governance on firm performance using panel
data on a sample of firms over several years. We may include firm fixed effects to
control for unobserved differences between firms that could affect their
performance, such as the quality of management, the firm's reputation, or its
access to capital.
• They allow us to estimate the effect of variables of interest while controlling for stable differences between entities, such
as cultural factors or differences in business practices, that do not change over time but vary across entities.

Time fixed effect(External environ)
• investigate the effect of a new government policy on business

investment using panel data on a sample of firms over several years.
We may include time fixed effects to control for unobserved changes
in the economic environment that could affect business investment,
such as changes in interest rates, inflation, or global economic
conditions.
• time fixed effects in our analysis, we can capture the effects of these external changes over time and obtain
more accurate estimates of the effect of the policy on business investment, while controlling for other time-
varying factors that affect all firms in a similar way.

Entity Fixed effect model
• strict exogeneity assumption: It is assumed that there is a correlation

between the entity-specific error term (unobserved heterogeneity) and the
predictor variables within the same entity. This assumption allows the fixed
effects to capture and control for the unobserved heterogeneity that is specific to
each entity.
• 𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡
• i = 1…n ; t = 1….T
• Where:
• 𝑌𝑖𝑡 outcome variable (for entity i at time t)
• 𝛼𝑖 is the unknown intercept for each entity (n entity-specific intercepts).
• 𝑋𝑖𝑡 is a vector of predictors (for entity i at time t).
• 𝑢𝑖 within-entity error term ; 𝑒𝑖𝑡 overall error term.
• 𝛽 coefficient represents the effect of the predictor variables on the outcome variable, controlling for individual heterogeneity. Specifically, it
represents the change in the outcome variable for a given entity when a predictor variable changes one unit over time

The entity and time fixed effects regression model
• 𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝛿𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡

• i = 1…n ; t = 1….T
• Where:
• 𝑌𝑖𝑡 outcome variable (for entity i at time t).
• 𝛼𝑖 is the unknown intercept for each entity (n entity-specific intercepts).
• 𝑋𝑖𝑡 is a vector of predictors (for entity i at time t) .
• 𝛿𝑡 is the unknow coefficient for the time regressors (t), It represents the change
in the outcome variable for a unit change in the time predictor
• 𝑢𝑖 within-entity error term ; 𝑒𝑖𝑡 overall error term.
• the inclusion of both entity and time fixed effects in the regression model allows
us to control for both individual and time-specific heterogeneity and obtain
unbiased estimates of the effects of the predictors on the outcome variable.

Fixed effect idea
• The fixed effect model includes entity fixed effects in the regression model to account for
individual-specific or entity-specific heterogeneity.
• Entity fixed effects capture time-invariant differences between entities that are constant
over time but vary across entities.
• By including fixed effects, the model controls for unobserved heterogeneity that may
influence the outcome variable.
• The fixed effect model allows us to estimate the effects of time-varying variables on the
outcome variable while controlling for time-invariant entity-specific factors.
• It is suitable when we want to focus on within-entity variation and understand the effects
of time-varying predictors.
• The fixed effect model helps to control for individual/entity-specific characteristics that
remain constant over time, such as innate abilities or cultural factors.
• By accounting for fixed effects, we can obtain more accurate estimates of the effects of
the variables we're interested in.

Descriptive statistics
. sum eps roe npm
Variable Obs Mean Std. Dev. Min Max
eps 12 31.35333 49.31236 -14.26 133.66

roe 12 10.9575 7.905077 -7.87 19.32
npm 12 8.719167 7.141252 -9.11 14.72
. xtsum eps roe npm
Variable Mean Std. Dev. Min Max Observations
eps overall 31.35333 49.31236 -14.26 133.66 N = 12

between 44.89421 -.3916668 63.09833 n = 2
within 36.50138 -26.225 101.915 T = 6
roe overall 10.9575 7.905077 -7.87 19.32 N = 12

between 3.803056 8.268333 13.64667 n = 2
within 7.389263 -5.180833 22.00917 T = 6
npm overall 8.719167 7.141252 -9.11 14.72 N = 12

between 5.584965 4.77 12.66833 n = 2
within 5.829558 -5.160833 17.35917 T = 6
Fixed effects regression using xtreg, fe
𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖 + 𝑒𝑖 Fixed effect
Outcome Predictors Controlling for Total number of

heteroskedasticity Total entities
cases
. xtreg eps roe npm, fe robust
Fixed-effects (within) regression Number of obs = 12

Group variable: number Number of groups = 2
R-sq: within = 0.2337 variation in the dependent variable is Obs per group: min = 6
explained by the independent
between = 1.0000 variables in the model. avg = 6.0
overall = 0.5659 max = 6
estimated to account for potential F(1,1) = .

violations of assumptions, such as
corr(u_i, Xb) = 0.7985 heteroscedasticity or the presence of
Prob > F = .
correlation between the correlated errors.
error term and the predicted (Std. Err. adjusted for 2 clusters in number)
values of the independent 𝛽 Two-tail p-values test the hypothesis
that each coefficient is different from
variables
Robust 0 (according to its t-value). A value
lower than 0.05 will reject the null
eps Coef. Std. Err. t P>|t| [95% Conf. Interval]
and conclude that the predictor has a
significant effect on the outcome
roe -6.973354 9.527731 -0.73 0.598 -128.0347 114.0879 (95% significance).
npm 10.42195 12.12993 0.86 0.548 -143.7035 164.5474
_cons 16.89314 1.362807 12.40 0.051 -.4229617 34.20924
Intraclass correlation (rho), shows how
sigma_u 13.208038 much of the variance in the output is
sigma_e 37.468756 explained by the difference across
rho .11052755 (fraction of variance due to u_i) entities. In this example is 99%.
Fixed effects regression using xtreg, fe
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, fe robust

R-sq: within = 0.5209 Obs per group: min = 1

between = 0.3441 avg = 6.7
overall = 0.3291 max = 8 If this number is < 0.05 then
your model is ok. This is an
F(6,24) = 13.75 Ftest to see whether all the
corr(u_i, Xb) = -0.3871 Prob > F = 0.0000 coefficients in the model are
jointly different than zero.
(Std. Err. adjusted for 25 clusters in number)
Robust
ln_eps Coef. Std. Err. t P>|t| [95% Conf. Interval]
ln_cr -3.231048 2.664804 -1.21 0.237 -8.730933 2.268837

ln_qr 3.540039 2.651945 1.33 0.194 -1.933306 9.013384
ln_roce -.3406091 .7676624 -0.44 0.661 -1.924986 1.243768
ln_roa .1591794 .7432614 0.21 0.832 -1.374837 1.693195
ln_roe 1.026841 .4542418 2.26 0.033 .0893326 1.96435
ln_npm .1001502 .3322293 0.30 0.766 -.5855374 .7858379
_cons .1948932 .5428112 0.36 0.723 -.9254142 1.315201
sigma_u .83986913
sigma_e .49385331
rho .74307545 (fraction of variance due to u_i)
𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝛿𝑡 + 𝑢𝑖 + 𝑒𝑖 Entity and time fixed effects regression using xtreg, fe
Time Fixed effect
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm i year, fe robust


between = 0.4704 avg = 6.7
F(8,24) = 15.91
corr(u_i, Xb) = -0.0878 Prob > F = 0.0000
Robust
ln_eps Coef. Std. Err. t P>|t| [95% Conf. Interval]
ln_cr -1.745499 2.435638 -0.72 0.481 -6.772409 3.281412

ln_qr 1.692942 2.402481 0.70 0.488 -3.265536 6.651419
ln_roce .0631209 .6523751 0.10 0.924 -1.283315 1.409557
ln_roa -.0676198 .6996967 -0.10 0.924 -1.511723 1.376483
ln_roe .8251312 .3356368 2.46 0.022 .1324108 1.517851
ln_npm -.0036716 .3204184 -0.01 0.991 -.6649826 .6576394
indas -.4810444 .1563704 -3.08 0.005 -.8037771 -.1583117
year .0062592 .0295927 0.21 0.834 -.0548171 .0673355
_cons -11.62416 59.68456 -0.19 0.847 -134.807 111.5587
sigma_u .71784076
sigma_e .44665697
Random effect idea
• The random effect model assumes that individual-specific heterogeneity is unrelated to
the predictors we're studying.
• Instead of including entity fixed effects, the random effect model treats individual-
specific effects as random variables.
• These random variables have specific distribution assumptions, often assumed to follow
a normal distribution.
• The random effect model estimates the average effects of the predictors on the outcome
variable.
• It takes into account both within-entity variation (changes in predictors and outcome
over time) and between-entity variation (differences in average values across entities).
• The random effect model is useful when we want to capture overall or average effects of
the predictors on the outcome.
• It assumes that individual-specific effects are random and uncorrelated with the
predictors, allowing us to focus on the relationship between the predictors and the
outcome.

Random effects regression using xtreg, re
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, re robust
Random-effects GLS regression Number of obs = 168


between = 0.5811 avg = 6.7
Wald chi2(6) = 141.76

corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
Robust
ln_eps Coef. Std. Err. z P>|z| [95% Conf. Interval]
ln_cr -1.708752 1.451515 -1.18 0.239 -4.553669 1.136165

ln_qr 2.054775 1.469891 1.40 0.162 -.8261592 4.93571
ln_roce -.5889436 .5884734 -1.00 0.317 -1.74233 .5644431
ln_roa .4986326 .7704968 0.65 0.518 -1.011513 2.008778
ln_roe 1.247742 .295727 4.22 0.000 .668128 1.827356
ln_npm -.2463293 .2087309 -1.18 0.238 -.6554344 .1627757
_cons .1572039 .4527437 0.35 0.728 -.7301575 1.044565
sigma_u .61152367
sigma_e .49385331
Which to choose?
• Whenever there is a clear idea that individual characteristics of each

entity or group affect the regressors, use fixed effects.
• Random effects is used whenever there is reason to believe that
individual characteristics have no effect on the regressors
(uncorrelated).
• example, macroeconomic data collected for most countries overtime.

There might be a good reason to believe that countries’ economic
performance may be affected by their own internal characteristics:
type of government, political environment, cultural characteristics,
type of public policies, etc
The Hausman-test tests whether the individual characteristics are correlated with
the regressors (see Green, 2008, chapter 9). The null hypothesis is that they are not
(random effects).
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm i year, fe robust
. estimates store fe
. xtreg ln_eps ln_cr ln_qr ln_roce ln_roa ln_roe ln_npm, re robust
. estimates store re
. hausman fe re
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fe re Difference S.E.
ln_cr -3.231048 -1.708752 -1.522296 .8846641

ln_qr 3.540039 2.054775 1.485264 .8502234
ln_roce -.3406091 -.5889436 .2483345 .4311662
ln_roa .1591794 .4986326 -.3394532 .4305656
ln_roe 1.026841 1.247742 -.2209007 .1968139
ln_npm .1001502 -.2463293 .3464796 .2235597
b = consistent under Ho and Ha; obtained from xtreg

B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 10.86 If Prob > chi2 is < 0.05
Prob>chi2 = 0.0928 use fixed effects
Do we need random effects?
• The LM test helps you decide between a random effects regression and a simple
OLS regression. The null hypothesis in the LM test is that variances across
entities is equal to zero. This is, no significant difference across units (i.e. no
panel effect). The command in Stata is xttset0 type it right after running the
random effects model
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects
ln_eps[number,t] = Xb + u[number] + e[number,t]
Estimated results:
Var sd = sqrt(Var)
ln_eps 1.150541 1.072633

e .2438911 .4938533
u .3739612 .6115237
Test: Var(u) = 0
chibar2(01) = 142.48
Prob > chibar2 = 0.0000
25
Suggested books / references
• Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley,
2007.
• Econometric Analysis of Panel Data, Badi H. Baltagi, Wiley, 2008.
• Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, 2008.
• An Introduction to Modern Econometrics Using Stata/ Christopher F. Baum, Stata Press, 2006.
• Data analysis using regression and multilevel/hierarchical models / Andrew Gelman, Jennifer Hill. Cambridge
; New York : Cambridge University Press, 2007.
• Data Analysis Using Stata/ Ulrich Kohler, Frauke Kreuter, 2 nd ed., Stata Press, 2009.
• Statistics with Stata / Lawrence Hamilton, Thomson Books/Cole, 2006.
• Statistical Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam Kachigan,
New York : Radius Press, c1986
• “Beyond “Fixed Versus Random Effects”: A framework for improving substantive and statistical analysis of
panel, time-series cross-sectional, and multilevel data” / Brandom Bartels
http://polmeth.wustl.edu/retrieve.php?id=838
• “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence” / Daniel Hoechle,
http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O.Keohane, Sidney
Verba, Princeton University Press, 1994.
• Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King, Cambridge
University Press, 1989.

Dr. Abuzar Nomani

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Dr. Abuzar Nomani

Uploaded by

Copyright:

Available Formats

Panel Data Analysis:

Fixed and Random Effects Using

Dr. Abuzar Nomani

May 2023 https://www.youtube.com/channel/UCPO8r7bzTbze-zxuE2pRuqQ

Abuzar Nomani (abuzarnomani88@gmail.com) 2

Abuzar Nomani (abuzarnomani88@gmail.com) 3

Abuzar Nomani (abuzarnomani88@gmail.com) 4

Data wide -> long

Number of obs. 3 -> 15

. reshape wide x1, i(id) j(year) string

Data long -> wide

Number of obs. 15 -> 3

Abuzar Nomani (abuzarnomani88@gmail.com) 5

Balanced panel: all entities are

Abuzar Nomani (abuzarnomani88@gmail.com) 6

• For more information type:

Abuzar Nomani (abuzarnomani88@gmail.com) 7

Abuzar Nomani (abuzarnomani88@gmail.com) 8

high enough so the minimum NPM

value is over zero (preferable 1).

Abuzar Nomani (abuzarnomani88@gmail.com) 10

Abuzar Nomani (abuzarnomani88@gmail.com) 11

• want to investigate the effect of education on wages using panel data on a

Abuzar Nomani (abuzarnomani88@gmail.com) 12

• investigate the effect of a new government policy on business

Abuzar Nomani (abuzarnomani88@gmail.com) 13

• strict exogeneity assumption: It is assumed that there is a correlation

Abuzar Nomani (abuzarnomani88@gmail.com) 14

• 𝑌𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝛿𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡

Abuzar Nomani (abuzarnomani88@gmail.com) 15

Abuzar Nomani (abuzarnomani88@gmail.com) 16

Variable Obs Mean Std. Dev. Min Max

eps 12 31.35333 49.31236 -14.26 133.66

Variable Mean Std. Dev. Min Max Observations

eps overall 31.35333 49.31236 -14.26 133.66 N = 12

roe overall 10.9575 7.905077 -7.87 19.32 N = 12

npm overall 8.719167 7.141252 -9.11 14.72 N = 12

Outcome Predictors Controlling for Total number of

. xtreg eps roe npm, fe robust

Fixed-effects (within) regression Number of obs = 12

estimated to account for potential F(1,1) = .

Fixed-effects (within) regression Number of obs = 168

R-sq: within = 0.5209 Obs per group: min = 1

ln_cr -3.231048 2.664804 -1.21 0.237 -8.730933 2.268837

Fixed-effects (within) regression Number of obs = 168

R-sq: within = 0.6138 Obs per group: min = 1

(Std. Err. adjusted for 25 clusters in number)

ln_cr -1.745499 2.435638 -0.72 0.481 -6.772409 3.281412

Abuzar Nomani (abuzarnomani88@gmail.com) 21

Random-effects GLS regression Number of obs = 168

R-sq: within = 0.5026 Obs per group: min = 1

Wald chi2(6) = 141.76

(Std. Err. adjusted for 25 clusters in number)

ln_cr -1.708752 1.451515 -1.18 0.239 -4.553669 1.136165

• Whenever there is a clear idea that individual characteristics of each

• example, macroeconomic data collected for most countries overtime.

ln_cr -3.231048 -1.708752 -1.522296 .8846641

b = consistent under Ho and Ha; obtained from xtreg

Test: Ho: difference in coefficients not systematic

Breusch and Pagan Lagrangian multiplier test for random effects

ln_eps[number,t] = Xb + u[number] + e[number,t]

ln_eps 1.150541 1.072633

You might also like