Econometrics of Panel Data Models

Econometrics of Panel Data
1. Basics and examples
2. The generalized least squares estimator
3. Fixed effects model
4. Random effects model
Basics and examples
We observe variables for N units, called the cross-sections, over T consecutive periods:
$(Y_{it}, X_{it})$
$i = 1, \ldots, N$, with N the cross-sectional dimension.
$t = 1, \ldots, T$, with T the temporal dimension.
This gives a panel of size $N \times T$.
$Y_{it}$ is the income of family i during year t, for $1 \le i \le 1000$, and observed in the years 2000, 2001, 2002, so T = 3.
$Y_{it}$ is the unemployment rate for EU country i ($1 \le i \le 15$), observed monthly from 1998:01 up to 2001:12, so T = 48.
Note that:
T large, N small: multiple time series.
T small, N large: survey data on individuals/firms for a small number of waves.
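The (i, t) indexing can be made concrete with a small sketch of a balanced panel stored in "long" format, one row per unit and period. The sizes N = 3 and T = 4 and all variable names are illustrative assumptions, not part of the examples above.

```python
import numpy as np

N, T = 3, 4  # illustrative sizes only

# Long format: one row per (unit, period) pair, units stacked one after another.
unit = np.repeat(np.arange(1, N + 1), T)    # 1,1,1,1,2,2,2,2,3,3,3,3
period = np.tile(np.arange(1, T + 1), N)    # 1,2,3,4,1,2,3,4,1,2,3,4

panel_index = np.column_stack([unit, period])
print(panel_index.shape)  # N*T rows, 2 columns
```

The same ordering (all periods of unit 1, then all periods of unit 2, and so on) is what a stacked panel workfile typically uses.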
Example 1: South American countries
For 8 South American countries we want to model the real GDP per capita in 1985 prices (=Rgdl) as a function of the following explanatory variables:
Population in 1000s (Pop)
Real Investment share of GDP, in % (I)
Real Government share of GDP, in % (G)
Exchange Rate with U.S. dollar (XR)
Measure of Openness of the Economy (Open)
The data are in the file penn.wmf, already in EViews format.
We are particularly interested in the effect of openness on economic growth.
1. Create a pool object in EViews (/Object/New object). Give it a name and define the cross-section identifiers. These identifiers are the parts of the series names that identify the cross-section.
2. Open the XR variables as a group and plot them. Compute their log-differences, using the PoolGenr menu of the pool object and the expression logdifXR?=dlog(XR?). The ? is replaced by each cross-section identifier in turn. Plot the transformed variables.
3. Compute the medians of the variable I? for the different countries (use View/Descriptive Statistics within the pool object). Then compute the average value of I? for every year.
4. Estimate the regression model for Brazil, using /Quick/Estimate Equation and specifying in EViews the equation
dlog(rgdp_bra) c dlog(pop_bra) i_bra g_bra dlog(xr_bra) open_bra
5. Now we want to pool the data of all countries, to increase the sample size. Within the pool object, use /Estimate and specify: dependent variable=dlog(rgdp?); common coefficients=c dlog(pop?) i? g? dlog(xr?) open?. This is a pooled regression model.
6. Pooling the data ignores the fact that the data originate from different countries. Dummy variables for the different countries need to be added. This can be done by specifying the constant term as a cross-section specific coefficient. We obtain a fixed effects panel data model. Discuss the regression output.
7. The fixed effects panel data model assumes that the effect of openness is the same for all countries. How could you relax this assumption?
8. Use a Wald test to test whether all country effects are equal (to see how EViews labels the coefficients, use View/Representation). The country effects are called the fixed effects, and if they are significantly different then there is unobserved heterogeneity.
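The dlog transformation used in the steps above is the first difference of the logarithm, dlog(x)_t = log(x_t) - log(x_{t-1}). A minimal numpy sketch, with made-up exchange-rate levels that are an assumption for illustration only:

```python
import numpy as np

# Hypothetical exchange-rate levels for one country (made-up numbers).
xr = np.array([1.00, 1.10, 1.21, 1.21])

# Log-difference: approximately the period-to-period growth rate.
# The first observation is lost because it has no predecessor.
dlog_xr = np.diff(np.log(xr))
print(dlog_xr)
```

In a pool, the same transformation is applied separately to the series of each cross-section.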
The Generalized Least Squares estimator
Standard linear regression model:
$$Y_i = X_i'\beta + \varepsilon_i \quad (i = 1, \ldots, n)$$
with
$\mathrm{Var}(\varepsilon_i) = \sigma^2$ constant: homoscedastic errors
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$: uncorrelated errors
Under the standard model, the Ordinary Least Squares (OLS) estimator
is consistent, meaning that $\hat\beta_{OLS} \to \beta$ for n tending to infinity;
has the smallest variance among all unbiased estimators (for normal errors) and the smallest variance among all linear unbiased estimators.
One has that
$$\hat\beta_{OLS} = \left( \sum_{i=1}^{n} X_i X_i' \right)^{-1} \sum_{i=1}^{n} X_i Y_i$$
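The OLS formula can be checked numerically. The sketch below uses simulated data (sample size, coefficients and seed are assumptions) and relies on the fact that the two sums are X'X and X'y in matrix notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])                        # assumed values
y = X @ beta_true + rng.normal(size=n)

# sum_i X_i X_i' equals X'X and sum_i X_i Y_i equals X'y.
XtX = X.T @ X
Xty = X.T @ y
beta_ols = np.linalg.solve(XtX, Xty)
print(beta_ols)
```

Solving the linear system is numerically preferable to forming the inverse explicitly, but gives the same estimator.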
What if the errors are not homoscedastic and uncorrelated?
E.g. for panel data:
Cross-sectional heteroscedasticity
Correlation among cross-sections
Serial correlation within and across cross-sections
...
The Ordinary Least Squares (OLS) estimator is still consistent, but no longer optimal.
General linear regression model:
$$Y_i = X_i'\beta + \varepsilon_i \quad (i = 1, \ldots, n)$$
with
$\mathrm{Var}(\varepsilon_i) = \sigma_i^2$: heteroscedastic errors
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \sigma_{ij}$ for $i \neq j$: correlated errors.
One can still use OLS (not even a bad idea), if one uses
White standard errors (if heteroscedasticity)
Newey-West standard errors (if correlated errors + heteroscedasticity)
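As an illustration of White standard errors, the sketch below computes the basic heteroscedasticity-robust "sandwich" covariance (often called HC0) next to the classical one. The data-generating design is an assumption, and statistical packages may apply additional degrees-of-freedom corrections that are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Heteroscedastic errors: the standard deviation grows with |x| (assumed design).
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
resid = y - X @ beta_ols

# Classical covariance s^2 (X'X)^{-1}: valid only under homoscedasticity.
s2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# White sandwich: (X'X)^{-1} [sum_i e_i^2 X_i X_i'] (X'X)^{-1}.
meat = (X * resid[:, None] ** 2).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(se_classical, se_white)
```

The coefficient estimates are the same in both cases; only the standard errors differ.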
The Generalized Least Squares (GLS) estimator will be consistent and optimal and is given by
$$\hat\beta_{GLS} = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} X_i X_j' \right)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} X_i Y_j ,$$
where the weights $w_{ij}$ depend on the values of $\sigma_{ij}$.
More precisely: let $\Sigma$ be the $n \times n$ matrix with elements $\sigma_{ij}$, then
$$w_{ij} = (\Sigma^{-1})_{ij} .$$
Unfortunately, the values in $\Sigma$ are unknown.
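A minimal sketch of the GLS formula for the (unrealistic) case where the error covariance matrix is known. The covariance pattern 0.6^|i-j|, the sample size and the coefficients are assumptions chosen for illustration. In matrix form the double sums are X'WX and X'Wy, with W the inverse covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Assumed known error covariance: sigma_ij = 0.6^|i-j|.
idx = np.arange(n)
Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Sigma)                 # Sigma = L L'
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

W = np.linalg.inv(Sigma)                      # w_ij = (Sigma^{-1})_ij
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent view: OLS after transforming the data with L^{-1}.
Xs, ys = np.linalg.solve(L, X), np.linalg.solve(L, y)
beta_whitened = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print(beta_gls, beta_whitened)
```

The second computation shows why GLS is optimal: after the transformation the errors are homoscedastic and uncorrelated, so OLS on the transformed data satisfies the standard model.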
The Feasible Generalized Least Squares (FGLS) estimator proceeds in 2 steps:
1. Compute OLS and the residuals
$$r_i^{OLS} = Y_i - X_i'\hat\beta_{OLS} .$$
2. Use the above residuals to estimate the $\sigma_{ij}$. [This will require some additional assumptions on the structure of $\Sigma$.]
Then compute the GLS estimator with the estimated weights $\hat w_{ij}$.
The above scheme can be iterated, which gives the fully iterated GLS estimator.
Theoretical Example
Our sample of size n = 20 consists of two groups of equal size (e.g. men and women). There is no correlation among the observations, but we think that the variances of the error terms for men and women might differ.
[The error terms contain the omitted and unobserved variables. We might indeed think that their size is different for women than for men, e.g. when regressing salary on individual characteristics.]
$\sigma_i^2 = \sigma_{ii} = \sigma_M^2$ for $i = 1, \ldots, 10$
$\sigma_i^2 = \sigma_{ii} = \sigma_F^2$ for $i = 11, \ldots, 20$
$\sigma_{ij} = 0$ for $i \neq j$.
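Computation of the (Feasible) GLS estimator:
1. Compute the OLS estimator and the residuals $r_i^{OLS}$.
2. Estimate
$$\hat\sigma_M^2 = \frac{1}{10} \sum_{i=1}^{10} (r_i^{OLS})^2 \quad \text{and} \quad \hat\sigma_F^2 = \frac{1}{10} \sum_{i=11}^{20} (r_i^{OLS})^2 .$$
Due to the simple structure of the matrix $\Sigma$, we have
$$\hat w_i = \frac{1}{\hat\sigma_M^2} \ (i = 1, \ldots, 10) \quad \text{and} \quad \hat w_i = \frac{1}{\hat\sigma_F^2} \ (i = 11, \ldots, 20)$$
$$\hat\beta_{GLS} = \left( \sum_{i=1}^{n} \hat w_i X_i X_i' \right)^{-1} \sum_{i=1}^{n} \hat w_i X_i Y_i$$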
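The two-group example can be sketched in a few lines of numpy. The simulated data, the true standard deviations (1 and 3) and the coefficients are assumptions; the two steps follow the scheme described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sd = np.r_[np.full(10, 1.0), np.full(10, 3.0)]   # assumed true sigma_M and sigma_F
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * sd

# Step 1: OLS and residuals.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
r = y - X @ beta_ols

# Step 2: one variance estimate per group, turned into weights 1 / variance.
s2_M = np.mean(r[:10] ** 2)
s2_F = np.mean(r[10:] ** 2)
w = np.r_[np.full(10, 1 / s2_M), np.full(10, 1 / s2_F)]

# GLS with estimated weights: (sum w_i X_i X_i')^{-1} sum w_i X_i Y_i.
Xw = X * w[:, None]
beta_fgls = np.linalg.solve(Xw.T @ X, Xw.T @ y)
print(beta_ols, beta_fgls)
```

Observations from the noisier group receive a smaller weight, which is what makes the estimator more precise than OLS when the variances really differ.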
Application to panel data regression
Let $\varepsilon_{it}$ be the error term of a panel data regression model, with $1 \le i \le N$ and $1 \le t \le T$.
Three different specifications are common:
1. $\mathrm{Var}(\varepsilon_{it}) = \sigma^2$ and all covariances between error terms are zero. OLS can be applied (no weighting).
2. $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$ and all covariances between error terms are zero. We have cross-sectional heteroscedasticity. GLS can be applied (cross-section weights).
3. $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$, $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij}$, all other covariances zero. We now allow for contemporaneous correlation between cross-sections. GLS can be applied (SUR weights).
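To make specifications 2 and 3 concrete, the sketch below estimates the cross-section variances and the contemporaneous covariance matrix from a T x N matrix of first-step residuals. The residuals are simulated here (an assumption); in practice they would come from the OLS fit of the panel model.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 40, 3
# Hypothetical first-step residuals: row t, column i holds the residual of unit i at time t.
E = rng.normal(size=(T, N)) * np.array([1.0, 2.0, 4.0])

# Specification 2: one variance per cross-section, used as cross-section weights.
sigma2_i = np.mean(E ** 2, axis=0)
cs_weights = 1.0 / sigma2_i

# Specification 3: full N x N contemporaneous covariance matrix.
Sigma_hat = E.T @ E / T                      # element (i, j) is the mean over t of e_it * e_jt
corr_hat = Sigma_hat / np.sqrt(np.outer(sigma2_i, sigma2_i))
print(sigma2_i)
print(corr_hat)
```

Clearly unequal diagonal elements point to cross-sectional heteroscedasticity, and large off-diagonal correlations point to contemporaneous correlation.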
Example South American countries (continued)
1. Have a look at the residuals (View/Residuals/Graphs within the pool object). Compute the covariance and the correlation matrix of the residuals. (i) Is there cross-sectional heteroscedasticity? (ii) Is there contemporaneous correlation?
2. Now estimate the model with the appropriate GLS estimator. Do the results depend much on the weighting scheme?
3. Is there still serial correlation present in the residuals, i.e. (cross-)correlation at leads and lags? Hence, is the model capturing the dynamics in the data?
The Fixed Effects regression model
Fixed effects model:
$$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}$$
with $t = 1, \ldots, T$ time periods and $i = 1, \ldots, N$ cross-sectional units.
The $\alpha_i$ contain the omitted variables, constant over time, for every unit i.
The $\alpha_i$ are called the fixed effects, and induce unobserved heterogeneity in the model.
The $X_{it}$ are the observed part of the heterogeneity. The $\varepsilon_{it}$ contain the remaining omitted variables.
Testing for unobserved heterogeneity:
$$H_0 : \alpha_1 = \ldots = \alpha_N := \alpha$$
(Test for redundant fixed effects)
In case $H_0$ holds, there is no unobserved heterogeneity, and the model reduces to the pooled regression model:
$$Y_{it} = X_{it}'\beta + \alpha + \varepsilon_{it}$$
Ignoring unobserved heterogeneity may lead to severe bias of the estimated $\beta$, see figure:
[Figure: data for Cross Section 1, Cross Section 2 and Cross Section 3, together with the Pooled Regression line.]
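The bias from ignoring unobserved heterogeneity can be reproduced in a small simulation. The design is an assumption made for illustration: three units whose fixed effects are negatively related to the level of X, while the true slope within each unit is 1.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, beta = 3, 50, 1.0
alpha = np.array([10.0, 5.0, 0.0])        # assumed fixed effects
x_center = np.array([0.0, 5.0, 10.0])     # unit means of X, negatively related to alpha

unit = np.repeat(np.arange(N), T)
x = x_center[unit] + rng.normal(size=N * T)
y = beta * x + alpha[unit] + 0.5 * rng.normal(size=N * T)

# Pooled regression: one common intercept, fixed effects ignored.
Xp = np.column_stack([np.ones(N * T), x])
slope_pooled = np.linalg.lstsq(Xp, y, rcond=None)[0][1]

# Regression with one dummy per unit (fixed effects).
D = (unit[:, None] == np.arange(N)).astype(float)
slope_fe = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0][-1]
print(slope_pooled, slope_fe)
```

In this design the pooled slope is pulled towards zero because differences between units offset the within-unit relation, whereas the dummy-variable regression stays close to the true slope.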
LSDV estimation
LSDV = Least Squares Dummy Variable estimation
Rewrite the model as
$$Y_{it} = \alpha_1 D_i^1 + \ldots + \alpha_N D_i^N + X_{it}'\beta + \varepsilon_{it} ,$$
with $D_i^j = 1$ if $i = j$ and zero if $i \neq j$.
Estimate the model by OLS or GLS (weighting).
If necessary, use White/Newey-West type standard errors (also if GLS is used, see later).
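Within groups estimator
Compute averages of $X_{it}$ and $Y_{it}$ within each cross-sectional unit, $\bar X_{i.}$ and $\bar Y_{i.}$:
$$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}$$
$$\bar Y_{i.} = \bar X_{i.}'\beta + \alpha_i + \bar\varepsilon_{i.}$$
$$(Y_{it} - \bar Y_{i.}) = (X_{it} - \bar X_{i.})'\beta + (\varepsilon_{it} - \bar\varepsilon_{i.})$$
Regress the centered $Y_{it}$ on the centered $X_{it}$ by OLS.
By centering, the fixed effects are eliminated!
One can show that the within groups estimator is identical to LSDV.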
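The equality of the within groups estimator and LSDV can be verified numerically. The sketch assumes a balanced panel stored unit by unit; sizes, coefficients and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 5, 8
unit = np.repeat(np.arange(N), T)            # rows ordered unit by unit
X = rng.normal(size=(N * T, 2))
alpha = 3.0 * rng.normal(size=N)             # assumed fixed effects
y = X @ np.array([1.0, -0.5]) + alpha[unit] + rng.normal(size=N * T)

# Unit means (balanced panel, so a reshape is enough).
Xbar = X.reshape(N, T, -1).mean(axis=1)
ybar = y.reshape(N, T).mean(axis=1)

# Within estimator: OLS on data centered per unit.
beta_within = np.linalg.lstsq(X - Xbar[unit], y - ybar[unit], rcond=None)[0]

# LSDV: OLS with one dummy per unit.
D = (unit[:, None] == np.arange(N)).astype(float)
coef_lsdv = np.linalg.lstsq(np.column_stack([D, X]), y, rcond=None)[0]
beta_lsdv = coef_lsdv[N:]

# The fixed effects can be recovered afterwards from the unit means.
alpha_hat = ybar - Xbar @ beta_within
print(beta_within, beta_lsdv)
```

The within version avoids estimating N dummy coefficients directly, which matters when N is large.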
Comments
1. If a variable $X_{it}$ is constant in time for all cross-sections, the FE model cannot be estimated. Why?
2. The fixed effects model can be rewritten with a common intercept included as
$$Y_{it} = X_{it}'\beta + \alpha + \alpha_i^* + \varepsilon_{it} ,$$
and
$$\alpha_1^* + \alpha_2^* + \ldots + \alpha_N^* = 0 .$$
Obviously, we have $\alpha_i = \alpha + \alpha_i^*$, and $\alpha$ is the average of the fixed effects.
3. One can add time effects (or period effects) to the model:
$$Y_{it} = X_{it}'\beta + \alpha_i + \delta_t + \varepsilon_{it} ,$$
The $\delta_t$ contain the omitted variables, constant over cross-sections, at every time point t.
The time effects capture the business cycle.
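For a balanced panel, unit and time effects can both be removed by a two-way centering: subtract the unit mean and the period mean and add back the overall mean. The sketch below checks this against a regression with unit and period dummies; the single regressor, the sizes and the coefficients are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 6, 5
x = rng.normal(size=(N, T))                 # row i = unit, column t = period
alpha = rng.normal(size=(N, 1))             # unit effects (assumed)
delta = rng.normal(size=(1, T))             # time effects (assumed)
y = 2.0 * x + alpha + delta + 0.1 * rng.normal(size=(N, T))

def two_way_demean(a):
    """Subtract unit means and period means, add back the overall mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

xd, yd = two_way_demean(x), two_way_demean(y)
beta_twoway = (xd * yd).sum() / (xd ** 2).sum()

# Check against OLS with unit dummies and period dummies (one period dummy dropped).
unit = np.repeat(np.arange(N), T)
period = np.tile(np.arange(T), N)
D_unit = (unit[:, None] == np.arange(N)).astype(float)
D_time = (period[:, None] == np.arange(1, T)).astype(float)
Z = np.column_stack([D_unit, D_time, x.ravel()])
beta_dummies = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][-1]
print(beta_twoway, beta_dummies)
```

With unbalanced panels this simple centering no longer reproduces the dummy-variable estimator exactly, so the dummy regression is the safer reference there.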
4. If we think that the cross-sectional units are an i.i.d. sample (typical for micro-applications), but serial correlation or period heteroscedasticity is present (within each unit), then GLS can be more precise/efficient than OLS:
(a) $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$ and all covariances between error terms are zero. We have period heteroscedasticity. GLS can be applied (Period weights).
(b) $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$, $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_{ts}$, all other covariances zero. We allow for serial correlation. GLS can be applied (Period SUR weights).
Example: Grunfeld data

We consider investment data for 10 American firms from 1935-1954,
and consider the model
$$INV_{it} = \beta_1 VAL_{it} + \beta_2 CAP_{it} + \alpha_i + \varepsilon_{it}$$
for $1 \le i \le N = 10$ and $1 \le t \le T = 20$. The variables are
Gross investment for the firm (INV)
Value of the firm (VAL)
Real Value of the Capital stock (plant and equipment) (CAP)
The data are in the Excel file grunfeld2.xls.
1. Have a look at the data in the Excel file. Write down the number of observations, the number of variables, and the upper-left cell of the data matrix. Close the Excel file, create an unstructured workfile and read in the data (Proc/Import/Read Text-Lotus-Excel).
2. To apply a panel structure, double-click on the Range: line at the top of the workfile window, or select Proc/Structure/Resize Current Page. Select Dated Panel, and enter the appropriate variables as Date series and as Cross-section ID series.
3. Open the investment series. Explore the Descriptive Statistics and Tests menu.
4. Use View/Graph to (i) make a line plot of the time series for every cross-section and (ii) make boxplots of the distribution of investment over the different cross-sections and over time.
5. Use Quick/Estimate Equation to estimate the fixed effects model. Specify the equation inv c cap value and use Panel Options to indicate that you use fixed effects.
6. Interpret your outcome. Would it be useful to add period effects? Test whether this is necessary with View/Fixed/Random Effects Testing.
7. Select an appropriate weighting scheme within Panel Options. Interpret your outcome.
Random Effects model
Model
$$Y_{it} = c + X_{it}'\beta + \varepsilon_{it}$$
where the error term is decomposed as
$$\varepsilon_{it} = \alpha_i + v_{it} .$$
$\alpha_i$ is a random effect, $N(0, \sigma_\alpha^2)$. It is the permanent component of the error term.
$v_{it}$ is a noise term, $N(0, \sigma_v^2)$. It is the idiosyncratic component of the error term.
(The $v_{it}$ are uncorrelated among cross-sections and serially uncorrelated at all leads and lags, within and across cross-sections. The random effects are uncorrelated among cross-sections.)
At the price of one extra parameter $\sigma_\alpha^2$, the random effects model allows for correlation within cross-section units:
For every i and $t \neq s$:
$$\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \mathrm{Cov}(\alpha_i + v_{it}, \alpha_i + v_{is}) = \sigma_\alpha^2$$
The following variance decomposition holds:
$$\mathrm{Var}(\varepsilon_{it}) = \mathrm{Var}(\alpha_i + v_{it}) = \sigma_\alpha^2 + \sigma_v^2 .$$
Within groups/cross-sections correlation:
$$\rho = \mathrm{Corr}(\varepsilon_{it}, \varepsilon_{is}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_v^2} .$$
The larger the value of $\rho$, the more unobserved heterogeneity.
One estimates $\beta$ by Generalized Least Squares, and obtains the RE-estimator.
Different methods exist to make GLS feasible.
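A sketch of the RE-estimator for the idealized case where both variance components are known (in practice they must be estimated, which is what makes GLS feasible). It uses the standard result that GLS under the random effects covariance structure equals OLS on "quasi-demeaned" data, where a fraction theta of the unit mean is subtracted. The variance values, sizes and coefficients are assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 30, 4
sigma2_alpha, sigma2_v = 2.0, 1.0            # variance components, assumed known here
unit = np.repeat(np.arange(N), T)

x = rng.normal(size=N * T)
eps = (np.sqrt(sigma2_alpha) * rng.normal(size=N)[unit]
       + np.sqrt(sigma2_v) * rng.normal(size=N * T))
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(N * T), x])

rho = sigma2_alpha / (sigma2_alpha + sigma2_v)      # within-group correlation

# Quasi-demeaning: subtract a fraction theta of the unit mean from every variable,
# including the constant.
theta = 1.0 - np.sqrt(sigma2_v / (sigma2_v + T * sigma2_alpha))
Xbar = X.reshape(N, T, -1).mean(axis=1)[unit]
ybar = y.reshape(N, T).mean(axis=1)[unit]
beta_re = np.linalg.lstsq(X - theta * Xbar, y - theta * ybar, rcond=None)[0]

# Direct GLS with the block-diagonal error covariance, for comparison.
block = sigma2_v * np.eye(T) + sigma2_alpha * np.ones((T, T))
Omega_inv = np.kron(np.eye(N), np.linalg.inv(block))
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(rho, beta_re, beta_gls)
```

For theta = 0 the transformation gives pooled OLS, and for theta close to 1 it approaches the within groups (fixed effects) estimator, so RE sits between the two.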
Testing for correlated random effects:
The random effect $\alpha_i$ needs to be uncorrelated with the X-variables. This is a strong assumption. If it does not hold, there is an endogeneity problem, and the RE-estimator is inconsistent.
$$H_0 : \mathrm{Corr}(\alpha_i, X_{it}) = 0$$
The Hausman test compares two estimators: the FE estimator (consistent whether or not $H_0$ holds) and the RE estimator (consistent under $H_0$).
One rejects $H_0$ if the difference between the two estimators is large.
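A rough sketch of the Hausman comparison for a model with a single regressor. The design deliberately violates H0 (the unit effect is correlated with X), and the variance components are estimated from the within and between regressions, which is one of several possible choices. All numbers and names are assumptions, and this is not necessarily how any particular package computes the test.

```python
import numpy as np

rng = np.random.default_rng(10)
N, T = 200, 5
unit = np.repeat(np.arange(N), T)

# Assumed design: alpha_i is correlated with the unit-level part of x, so H0 is false.
m = rng.normal(size=N)
x = m[unit] + rng.normal(size=N * T)
alpha = 1.5 * m + 0.5 * rng.normal(size=N)
y = 1.0 + 2.0 * x + alpha[unit] + rng.normal(size=N * T)

xbar = x.reshape(N, T).mean(axis=1)
ybar = y.reshape(N, T).mean(axis=1)

# FE (within) estimator and estimate of the idiosyncratic variance.
xc, yc = x - xbar[unit], y - ybar[unit]
b_fe = (xc @ yc) / (xc @ xc)
sigma2_v = np.sum((yc - b_fe * xc) ** 2) / (N * (T - 1) - 1)
var_fe = sigma2_v / (xc @ xc)

# Between regression on unit means estimates sigma_alpha^2 + sigma_v^2 / T.
B = np.column_stack([np.ones(N), xbar])
res_b = ybar - B @ np.linalg.lstsq(B, ybar, rcond=None)[0]
sigma2_alpha = max(res_b @ res_b / (N - 2) - sigma2_v / T, 0.0)

# RE estimator by quasi-demeaning with the estimated variance components.
theta = 1.0 - np.sqrt(sigma2_v / (sigma2_v + T * sigma2_alpha))
Xq = np.column_stack([np.full(N * T, 1.0 - theta), x - theta * xbar[unit]])
yq = y - theta * ybar[unit]
b_re = np.linalg.lstsq(Xq, yq, rcond=None)[0][1]
var_re = (sigma2_v * np.linalg.inv(Xq.T @ Xq))[1, 1]

# Hausman statistic for one slope; 3.84 is the 5% critical value of chi-square(1).
H = (b_fe - b_re) ** 2 / (var_fe - var_re)
reject_H0 = H > 3.84
print(b_fe, b_re, H, reject_H0)
```

With several regressors the squared difference becomes a quadratic form in the difference of the coefficient vectors, weighted by the inverse of the difference of the two covariance matrices.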
Using fixed or random effects?
In econometrics, the fixed effects model seems to be the most appropriate ($H_0$ not needed).
If N is large, T is small, and the cross-sectional units are a random sample from a population, then the random effects model becomes attractive:
It is a parsimonious model that captures within-group correlation.
(For N large, FE requires estimation of many parameters.)
Random effects is popular for modeling grouped data:
(i) Sample of 1000 children coming from 30 different schools
(ii) Sample of 1000 persons from 20 different villages
...
Robust standard errors: For RE no weighted versions are available.
Using robust standard errors (or coefficient covariance) might be appropriate. This only affects the SE, not the estimators.
1. White cross-section: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$ and $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij}$.
[robust to cross-section heteroscedasticity and contemporaneous correlation among cross-sections; appropriate if $N \ll T$.]
2. White period: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$ and $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_{ts}$.
[robust to serial correlation within cross-sections and changing variances over time; appropriate if the cross-sections are a random sample and $T \ll N$.]
3. White diagonal: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_{it}^2$.
[robust to all forms of heteroscedasticity, but not robust to any type of correlation over time or across cross-sections.]
These can also be used for FE.
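The difference between the White diagonal and White period covariances can be sketched for a pooled regression. The period version sums the cross products of residuals within each unit, so it keeps the within-unit serial correlation that the diagonal version ignores. The simulated design is an assumption, and degrees-of-freedom corrections used by statistical packages are omitted.

```python
import numpy as np

rng = np.random.default_rng(11)
N, T = 100, 6
unit = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
# Errors with a unit-level component, hence serially correlated within units (assumed design).
e = rng.normal(size=N)[unit] + rng.normal(size=N * T)
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(N * T), x])

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# White diagonal: only the squared residual of each observation enters.
meat_diag = (X * resid[:, None] ** 2).T @ X

# White period: sum over units of (X_i' e_i)(X_i' e_i)', which keeps all
# within-unit cross products e_it * e_is.
scores = (X * resid[:, None]).reshape(N, T, -1).sum(axis=1)   # row i holds X_i' e_i
meat_period = scores.T @ scores

se_diag = np.sqrt(np.diag(XtX_inv @ meat_diag @ XtX_inv))
se_period = np.sqrt(np.diag(XtX_inv @ meat_period @ XtX_inv))
print(se_diag, se_period)
```

Both versions leave the coefficient estimates unchanged; only the "meat" of the sandwich covariance differs.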
Exercise: Consider the Grunfeld data in grundfeld2.wf1. The model was:
$$INV_{it} = \beta_1 VAL_{it} + \beta_2 CAP_{it} + \alpha_i + \varepsilon_{it}$$
1. Estimate the model as a random effects model.
2. What is the within-group correlation?
3. Perform the Hausman test (View/Fixed/Random Effects Testing/Correlated Random Effects).
4. Compute different types of robust SE. How does this affect the results?
