
PANEL DATA

(Ch. 10)

The recommended exercise questions from the textbook:


Chapter 10: All except (10.6), (10.10).

[1] What are panel data?


Panel data consist of observations on the same n entities at
two or more time periods (T ≥ 2). If the data set contains observations
on the variables X and Y, then the data are denoted
$(X_{it}, Y_{it})$, $i = 1, \ldots, n$ and $t = 1, \ldots, T$,
where the first subscript, i, refers to the entity being observed, and
the second subscript, t, refers to the date at which it is observed.

Balanced panel vs. unbalanced panel.

Balanced panel: The variables are observed for every entity in
every time period.
Unbalanced panel: Data are missing for at least one entity in at
least one time period.

We consider the analysis of balanced panels, but the extension to
unbalanced panels is straightforward.
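
As a quick illustration of the balanced/unbalanced distinction, the following minimal sketch checks whether every entity is observed in every time period. The function name `is_balanced` and the entity and year labels are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def is_balanced(rows):
    """Return True if every entity is observed in every time period.

    `rows` is an iterable of (entity, period) pairs, one per observation.
    """
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity, period in rows:
        periods_by_entity[entity].add(period)
        all_periods.add(period)
    return all(p == all_periods for p in periods_by_entity.values())

# Hypothetical identifiers: 3 entities observed over 4 years (n = 3, T = 4).
balanced = [(c, y) for c in ("US", "Japan", "Korea") for y in range(2001, 2005)]
unbalanced = balanced[:-1]  # drop the last observation (Korea, 2004)

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```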

[2] Revisiting Omitted Variable Bias
Issue:
Do alcohol taxes help decrease traffic deaths?
Data: fatality.wf1
48 U.S. states (excluding Alaska and Hawaii): n = 48.
1982-1988: T = 7.
fatality rate = number of traffic accident deaths per 10,000 people.
beertax = tax per case of beer ($).

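Estimation results for the 1982 data:

$\widehat{\text{FatalityRate}} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\,\text{BeerTax}$

Estimation results for the 1988 data:

$\widehat{\text{FatalityRate}} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\,\text{BeerTax}$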

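What is going on here?
Consider a simple multiple regression model (for a given time t):
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, $i = 1, \ldots, n$,
where $Z_i$ is a time-invariant regressor.

What do $\beta_1$ and $\beta_2$ measure?

$\beta_1$ measures the partial effect of $X_{it}$ on $Y_{it}$ with $Z_i$ held constant.
Similarly, $\beta_2$ measures the partial effect of $Z_i$ on $Y_{it}$ with $X_{it}$ held
constant.

What if you estimate $Y_{it} = \beta_0 + \beta_1 X_{it} + \text{error}_{it}$ instead? Then $Z_i$ is omitted and

$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2 \dfrac{\mathrm{cov}(X_{it}, Z_i)}{\mathrm{var}(X_{it})}$.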

Each state may have a different level of preference for alcohol
(say, $Z_i$ = Pal).
Pal (Z) and BeerTax (X) could be positively related: $\mathrm{cov}(X_{it}, Z_i) > 0$.

Pal (Z) would have a positive partial effect on FatalityRate ($\beta_2 > 0$).
Thus, $\hat{\beta}_1$ could be positive even if the true $\beta_1$ is negative.
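
The sign reversal can be checked numerically. The sketch below is only an illustration with made-up numbers: it sets the error term to zero and constructs Z so that cov(X, Z)/var(X) = 0.8 exactly. The helper `ols_slope` and all values are assumptions for this example, not part of the fatality data.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical numbers chosen so that cov(X, Z) / var(X) = 0.8 exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
e = [1.0, -2.0, 2.0, -2.0, 1.0]           # mean zero and uncorrelated with x
z = [0.8 * xi + ei for xi, ei in zip(x, e)]

beta0, beta1, beta2 = 2.0, -1.0, 2.0       # the true beta1 is negative
y = [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z)]  # u = 0

short_slope = ols_slope(x, y)              # regression that omits Z
predicted = beta1 + beta2 * ols_slope(x, z)
print(round(short_slope, 6), round(predicted, 6))
```

Here the regression that omits Z gives a slope of -1 + 2 × 0.8 = 0.6, which is positive even though the true $\beta_1$ is -1.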

How can we control for Pal using panel data?

[3] Panel Data with Two Time Periods
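Two equations for 1982 and 1988:
$\text{FatalityRate}_{i,1988} = \beta_0 + \beta_1 \text{BeerTax}_{i,1988} + \beta_2 Z_i + u_{i,1988}$.
$\text{FatalityRate}_{i,1982} = \beta_0 + \beta_1 \text{BeerTax}_{i,1982} + \beta_2 Z_i + u_{i,1982}$.
Subtracting the second equation from the first:
$\text{FatalityRate}_{i,1988} - \text{FatalityRate}_{i,1982} = \beta_1 (\text{BeerTax}_{i,1988} - \text{BeerTax}_{i,1982}) + (u_{i,1988} - u_{i,1982})$. (1)

No $Z_i$ in (1)! OLS on (1) will yield a consistent estimator of $\beta_1$.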
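
A minimal sketch of the before-and-after idea, using made-up numbers rather than the fatality data: the unobserved Z is positively related to X, so a single-year regression gives a positive slope, while the regression in differences recovers the true slope. The error term is set to zero, and the helper `ols_slope` and all values are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

beta0, beta1, beta2 = 2.0, -1.0, 3.0       # hypothetical true coefficients
z = [0.0, 1.0, 2.0, 3.0]                   # unobserved, time-invariant
x_1982 = [0.5, 1.0, 1.5, 2.0]              # here z = 2 * x_1982 - 1
x_1988 = [0.4, 1.3, 1.2, 2.6]

y_1982 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1982, z)]  # u = 0
y_1988 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1988, z)]  # u = 0

# Single cross-section: Z is omitted, so the slope is biased.
cross_section_slope = ols_slope(x_1982, y_1982)

# Before-and-after regression: differencing removes Z.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
difference_slope = ols_slope(dx, dy)

print(round(cross_section_slope, 6), round(difference_slope, 6))
```

In this example the 1982 cross-section slope is 5, while the differenced regression returns the true value of -1. The differenced regression here includes an intercept, which does not change the slope in this noise-free setting.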


Actual estimation results for (1):

$\widehat{\text{FatalityRate}_{1988} - \text{FatalityRate}_{1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(\text{BeerTax}_{1988} - \text{BeerTax}_{1982})$

Comments on the before-and-after estimation results.
As the real beer tax increases by $1 per case, the traffic fatality rate
falls by 1.04 deaths per 10,000 people.
This is a big effect, because the mean traffic fatality rate is
approximately 2 deaths per 10,000 people.
This before-and-after approach works well if T = 2. What should
we do if T > 2?

[4] Fixed Effects Regression
(A) A simple regression model:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$. (1)

Set $\alpha_i = \beta_0 + \beta_2 Z_i$. Then, we have
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$, (2)
which is called the fixed effects regression model.

For the ith cross-sectional entity, the regression line is (2). The
slope coefficient $\beta_1$ is the same for all i, but the intercepts $\alpha_i$
differ across entities (while staying constant over time).

Set:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}$, (3)
where $i = 1, \ldots, n$, $t = 1, \ldots, T$ (nT observations),
$D2_i = 1$ if i is the 2nd entity, and $D2_i = 0$ otherwise,
and the other dummy variables D3, ..., Dn are similarly defined.

In (3), $\alpha_1 = \beta_0$, $\alpha_2 = \beta_0 + \gamma_2$, ..., $\alpha_n = \beta_0 + \gamma_n$.

The slope coefficient $\beta_1$ and the n other parameters ($\beta_0, \gamma_2, \ldots, \gamma_n$) can
be estimated by OLS on model (3).

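Entity-demeaned OLS algorithm
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$, where $\bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}$.
------------------------------------
$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$. (4)

OLS estimator of $\beta_1$ from (4) = OLS estimator of $\beta_1$ from (3).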
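
The equality of the two estimators can be verified on a small example. The sketch below is an illustration under stated assumptions: the panel values are invented, and the helpers `solve` and `ols` are a bare-bones normal-equations solver written for this example, not a recommended numerical method.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical balanced panel: n = 3 entities, T = 3 periods.
x = {"A": [1.0, 2.0, 4.0], "B": [2.0, 3.0, 3.5], "C": [0.5, 1.5, 1.0]}
y = {"A": [3.1, 2.2, 0.4], "B": [6.0, 4.8, 4.9], "C": [1.2, 0.1, 0.9]}
entities = sorted(x)

# Model (3): constant, X, and a dummy for every entity except the first.
rows, yy = [], []
for e in entities:
    for xit, yit in zip(x[e], y[e]):
        dummies = [1.0 if e == other else 0.0 for other in entities[1:]]
        rows.append([1.0, xit] + dummies)
        yy.append(yit)
beta1_dummy = ols(rows, yy)[1]

# Model (4): demean X and Y within each entity, regress without an intercept.
num = den = 0.0
for e in entities:
    x_bar = sum(x[e]) / len(x[e])
    y_bar = sum(y[e]) / len(y[e])
    for xit, yit in zip(x[e], y[e]):
        num += (xit - x_bar) * (yit - y_bar)
        den += (xit - x_bar) ** 2
beta1_demeaned = num / den

print(beta1_dummy, beta1_demeaned)
```

The two printed values agree up to floating-point rounding. The demeaned version avoids estimating the n - 1 dummy coefficients, which matters when n is large.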

Least Squares Assumptions for the fixed effects model:

(FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.
(FEA.2) The data, $(X_{i1}, \ldots, X_{iT}, Y_{i1}, \ldots, Y_{iT})$, $i = 1, \ldots, n$, are a random
sample.
(FEA.3) $(X_{it}, u_{it})$ have nonzero finite fourth moments: large
outliers are unlikely.
(FEA.4) There is no perfect multicollinearity.
(FEA.5) No autocorrelation: $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, \ldots, X_{iT}, \alpha_i) = 0$ for all
$t \neq s$.

For multiple regressions, $X_{it}$ should be replaced by the full list $X_{1,it}, \ldots, X_{k,it}$.

What happens if (FEA.5) is violated?

(B) Extension to multiple Xs.
The fixed effects regression model is
$Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it}$, (5)
where i = 1, ... , n, and t = 1, ... , T.
Equivalently, the fixed effects model can be written as
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$. (6)
Entity-demeaned algorithm
$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \cdots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i)$. (7)

OLS estimators of $\beta_1, \ldots, \beta_k$ from (7) = OLS estimators of $\beta_1, \ldots, \beta_k$ from (6).

(C) Application to Traffic Deaths.


Fixed effects regression results:
$\widehat{\text{FatalityRate}} = \underset{(0.20)}{-0.66}\,\text{BeerTax} + \text{StateFixedEffects}$.

[5] Time and Entity Fixed Effects Model
(1) Motivation.
Return to our FatalityRate example:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$,
where $Y_{it}$ = FatalityRate; $X_{it}$ = BeerTax;
$Z_i$ = time-invariant preferences for alcohol or driving of the
people in state i;
$S_t$ = time-specific effects (common to all states), such as
overall automobile safety improvements.
Let $B1_t = 1$ if t is the first time period, and $B1_t = 0$ otherwise.
Define the dummy variables $B2_t, \ldots, BT_t$ similarly.

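(2) Time and Entity Fixed Effects Model:

$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$.
This model has many regressors (k + n + T - 1 coefficients, including the
intercept). We can still get reasonably accurate estimates
of $\beta_1, \ldots, \beta_k$, but the estimates of $\gamma_2, \ldots, \gamma_n$ and $\delta_2, \ldots, \delta_T$ are
inaccurate.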

(3) Application to traffic deaths

$\widehat{\text{FatalityRate}} = \underset{(0.25)}{-0.64}\,\text{BeerTax} + \text{StateFixedEffects} + \text{TimeFixedEffects}$.
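
For a balanced panel with one regressor, the time and entity fixed effects estimator can also be computed by demeaning in both directions instead of including all the dummy variables. The sketch below assumes a balanced panel, sets the error term to zero, and uses invented numbers; `two_way_demean` is a hypothetical helper name.

```python
# Hypothetical balanced panel with n = 3 entities and T = 4 periods.
beta1 = -0.5
alpha = [1.0, 3.0, -2.0]            # entity effects
lam = [0.0, 0.4, 0.1, -0.3]         # time effects common to all entities
x = [[1.0, 2.0, 4.0, 3.0],
     [2.0, 2.5, 1.0, 5.0],
     [0.5, 3.0, 2.0, 2.5]]
n, T = len(x), len(x[0])
y = [[beta1 * x[i][t] + alpha[i] + lam[t] for t in range(T)]
     for i in range(n)]             # u = 0

def two_way_demean(v):
    """Subtract entity and period means, then add back the grand mean."""
    ent = [sum(row) / T for row in v]
    per = [sum(v[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(ent) / n
    return [[v[i][t] - ent[i] - per[t] + grand for t in range(T)]
            for i in range(n)]

xd, yd = two_way_demean(x), two_way_demean(y)
num = sum(xd[i][t] * yd[i][t] for i in range(n) for t in range(T))
den = sum(xd[i][t] ** 2 for i in range(n) for t in range(T))
beta1_hat = num / den
print(round(beta1_hat, 6))
```

Both the entity effects and the time effects drop out after the transformation, so the slope is recovered exactly in this noise-free example. With an unbalanced panel this simple double demeaning is not sufficient.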

[6] Drunk Driving Laws and Traffic Deaths
Do drunk driving laws and economic conditions matter?

Drinking-age and drunk-driving laws do not matter very much.
Economic factors are important.
Specification (4) of Table 10.1 is the base model.
Average tax = $0.5/case,
and average fatality rate = 2 per 10,000 people.
As the tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per
10,000).
But this result is somewhat imprecise: the 95% confidence interval for
the effect of BeerTax is
-0.45 ± 1.96 × 0.22 = (-0.88, -0.02),
which is quite wide.
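
The arithmetic above can be reproduced with a few lines. This is only a sketch of the usual large-sample interval (estimate ± 1.96 × standard error); the function name `confidence_interval` is an assumption made for this example.

```python
def confidence_interval(estimate, std_error, critical_value=1.96):
    """Large-sample confidence interval: estimate +/- critical value * SE."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width

lower, upper = confidence_interval(-0.45, 0.22)
print(round(lower, 2), round(upper, 2))   # -0.88 -0.02

# Effect of a $0.5 increase in the beer tax on the fatality rate.
effect = -0.45 * 0.5
print(round(effect, 3))                   # -0.225
```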

[7] Eviews Exercise

(1) Exercise with an artificial panel data set named artificial_panel.xls.

There are four variables in the Excel file: country, year, y, and x. Each
variable has 12 observations, from the 3rd row to the 14th row. The data are
artificial numbers for three countries: US, Japan, and Korea. Notice that the
variable country is alphabetic, not numeric.

STEP 1: Open artificial_panel.xls using Excel. Then, using your mouse, select
the block of data and copy it.

STEP 2: Open EViews. Then, type the following in the EViews command window (the
narrow white window below the File, Edit, Object buttons):

create u 12 (enter)

This creates an undated workfile with 12 observations. Then, a workfile window will pop up.

Type the following in the EViews command window:

alpha country (enter)

data year y x (enter)

The command alpha is used to create alphabetic variables, while
data is for numeric variables.

Then, a spreadsheet will pop up.

Close the window by clicking on the X in the upper-right corner of the
window. EViews will ask you whether you want to delete the Untitled
Group. Click on the Yes button.

STEP 3: On the workfile, click on the Show button. Then, a Show window
will pop up. Type in the window:

country year y x

Click on OK. Then, a spreadsheet will pop up.

Click on the Edit+/- button and place your cursor on the first cell of the
country column. Then right-click and choose Paste.

Then, you will see that the data from the Excel file are pasted into the
spreadsheet.

Close the spreadsheet by clicking on the X in the upper-right corner.
EViews will ask you whether you want to delete the Untitled Group.
Click on the Yes button.

STEP 4: On the workfile, click the Save button. Choose the drive and
folder where you want to save the file, and enter the file name
artificial_panel.wf1.

Click on the Save button. Then, a Workfile Save window will pop
up. Just click on the OK button.

Then, you will be back at the workfile.

STEP 5: On the workfile, click the Proc button. Choose Structure/Resize
Current Page.

Then you will have the Workfile Structure window. Choose Dated
Panel.

Type 2001 for Start date, 2004 for End date, country for Cross-section
ID series, and year for Date series. Then, click on OK.

Then, you will be back at the workfile. Save it!

STEP 6: Click Objects/New Object.... Choose Equation and enter
art_pan as the name of the object. Then, an Equation Estimation
window will pop up. Type y x in the Equation specification box.

Then click on Panel Options.

Choose Fixed for Cross-section, Fixed for Period, and White
(diagonal) for Coef covariance method.

By choosing Fixed for Cross-section, you run the regression with
dummy variables for the individual entities. By choosing Fixed for
Period, you add time dummy variables to the regression.

STEP 7: Choose View/Fixed/Random Effects/Cross-section Effects
to display the estimated cross-section effects.

Choose View/Fixed/Random Effects/Period Effects to display the
estimated period effects.

Choose View/Fixed/Random Effects Testing/Redundant Fixed Effects
to test the joint significance of the fixed effects.
I found that the F and $\chi^2$ statistics for the individual dummy variables and the
time dummy variables are computed assuming that the error terms in the
regression model are homoskedastic over i and t. So, the results are not
reliable if the error terms are in fact heteroskedastic.
If you would like to test whether the time effects are statistically significant,
I suggest estimating your model choosing None for Period
but including the time dummy variables as regressors.

(2) Exercise with fatality.wf1.
------------------------------------------------------------
Variable name   Variable label
------------------------------------------------------------
state           State ID (FIPS) Code
year            Year
spircons        Spirits Consumption
unrate          Unemployment Rate
perinc          Per Capita Personal Income
emppop          Employment/Population Ratio
beertax         Tax on Case of Beer
sobapt          % Southern Baptist
mormon          % Mormon
mlda            Minimum Legal Drinking Age
dry             % Residing in Dry Counties
yngdrv          % of Drivers Aged 15-24
vmiles          Ave. Mile per Driver
vmilespd        Ave. Mile per 1,000 Driver
breath          Prelim. Breath Test Law
jaild           Mandatory Jail Sentence
comserd         Mandatory Community Service
jailcom         jaild + comserd
allmort         # of Vehicle Fatalities (#VF)
mrall           Vehicle Fatality Rate (VFR) = #VF/Population
vfrall          10,000*mrall = VFR per 10,000 people
allnite         # of Night-time VF (#NVF)
mralln          Night-time VFR (NVFR)
allsvn          # of Single VF (#SVF)
a1517           #NVF, 15-17 year olds
mra1517n        NVFR, 15-17 year olds
a1829           #VF, 18-20 year olds
a1820n          #NVF, 18-20 year olds
mra1820         VFR, 18-20 year olds
mra1820n        NVFR, 18-20 year olds
a2124           #VF, 21-24 year olds
mra2124         VFR, 21-24 year olds
a2124n          #NVF, 21-24 year olds
mra2124n        NVFR, 21-24 year olds
aidall          # of alcohol-involved VF
da18            Dummy variable for drinking age = 18
da19            Dummy variable for drinking age = 19
da20            Dummy variable for drinking age = 20
lincperc        Log of per capita real income
mraidall        Alcohol-Involved VFR
pop             Population
pop1517         Population, 15-17 year olds
pop1820         Population, 18-20 year olds
pop2124         Population, 21-24 year olds
miles           total vehicle miles (millions)
unus            U.S. unemployment rate
epopus          U.S. Emp/Pop Ratio
gspch           GSP Rate of Change
Dum1982
Dum1983
Dum1984
:
DUM1988
------------------------------------------------------------

Estimation of specification (4) in Table 10.1 (p. 368).

Dependent Variable: VFRALL


Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable Coefficient Std. Error t-Statistic Prob.

C -2.327171 1.316419 -1.767804 0.0782


BEERTAX -0.450272 0.222005 -2.028203 0.0435
DA18 0.027509 0.065473 0.420158 0.6747
DA19 -0.019096 0.039510 -0.483315 0.6293
DA20 0.030875 0.045689 0.675767 0.4998
JAILD 0.012644 0.031940 0.395866 0.6925
COMSERD 0.034135 0.114820 0.297289 0.7665
VMILESPD 0.008226 0.008368 0.983073 0.3264
LINCPERC 1.814889 0.472220 3.843312 0.0002
UNRATE -0.063043 0.011616 -5.427345 0.0000
DUM1982 0.533926 0.075931 7.031706 0.0000
DUM1983 0.435841 0.070418 6.189300 0.0000
DUM1984 0.246723 0.050392 4.896067 0.0000
DUM1985 0.155325 0.043688 3.555327 0.0004
DUM1986 0.189843 0.040808 4.652090 0.0000
DUM1987 0.087532 0.032452 2.697246 0.0074

Effects Specification

Cross-section fixed (dummy variables)

R-squared 0.939540 Mean dependent var 2.040444


Adjusted R-squared 0.925809 S.D. dependent var 0.570194
Log likelihood 183.8646 F-statistic 68.42532
Durbin-Watson stat 1.733929 Prob(F-statistic) 0.000000

Testing the significance of the individual and time dummy variables:
[Estimation choosing Fixed for Period and not using the time dummy variables as
regressors.]

Redundant Fixed Effects Tests


Equation: MIN
Test cross-section and period fixed effects

Effects Test Statistic d.f. Prob.

Cross-section F 44.772106 (47,273) 0.0000


Cross-section Chi-square 727.186063 47 0.0000
Period F 19.685127 (6,273) 0.0000
Period Chi-square 120.798386 6 0.0000
Cross-Section/Period F 40.398468 (53,273) 0.0000
Cross-Section/Period Chi-square 732.351587 53 0.0000
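
The degrees of freedom in this table can be checked by counting parameters. The sketch below assumes the nine slope regressors listed in the estimation output (BEERTAX through UNRATE), one intercept, 47 state dummies, and 6 time dummies; the variable names are chosen only for this example.

```python
n_entities, n_periods = 48, 7
n_obs = n_entities * n_periods          # 336 observations in the balanced panel
n_slopes = 9                            # BEERTAX, DA18, DA19, DA20, JAILD,
                                        # COMSERD, VMILESPD, LINCPERC, UNRATE

q_cross = n_entities - 1                # restrictions: state effects
q_period = n_periods - 1                # restrictions: time effects
q_both = q_cross + q_period             # restrictions: both sets jointly

# Denominator d.f. = observations minus all estimated coefficients.
df_denominator = n_obs - (n_slopes + 1 + q_both)
print(q_cross, q_period, q_both, df_denominator)   # 47 6 53 273
```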

Testing the significance of the time dummy variables:
[Estimation choosing None for Period and using the time dummy variables as
regressors.]

Wald Test:
Equation: MIN

Test Statistic Value df Probability

F-statistic 11.46715 (6, 273) 0.0000


Chi-square 68.80287 6 0.0000
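
In a Wald test of q linear restrictions, the chi-square statistic is commonly reported as q times the F-statistic. The numbers above appear consistent with that relationship, as this short check (using only the values shown in the output) suggests:

```python
f_stat = 11.46715       # F-statistic from the Wald test output
q = 6                   # number of restrictions (six time dummy variables)

chi_square = q * f_stat
print(round(chi_square, 4))
```

The result matches the reported chi-square value of 68.80287 up to rounding of the F-statistic.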

Comments on (FEA.5):
What if (FEA.5) fails, so that $\mathrm{corr}(u_{it}, u_{is} \mid X_{it}, X_{is}, \alpha_i) \neq 0$?
The OLS panel data estimators of $\beta_1$ are still unbiased and consistent.
The usual OLS standard errors will be wrong.
Use heteroskedasticity- and autocorrelation-consistent standard
errors (clustered standard errors).
The clustered SE formula is NOT the usual (heteroskedasticity-robust) SE
formula! [Appendix 10.2 (pp. 379-381)].
The clustered SE might not be very accurate if n is small.
EViews can compute these!

In EViews, choose White period instead of White (diagonal).
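
To see how the two formulas differ, here is a minimal sketch for a single entity-demeaned regressor. It is not the EViews implementation: no degrees-of-freedom correction is applied, and the function name and the numbers are hypothetical. The robust formula squares each product of regressor and residual separately, while the clustered formula first sums the products within an entity and then squares, which allows the errors of the same entity to be correlated over time.

```python
import math

def robust_and_clustered_se(x_dm, resid):
    """Return (heteroskedasticity-robust SE, clustered SE) for one slope.

    x_dm[i][t]: entity-demeaned regressor; resid[i][t]: fixed effects residual.
    No degrees-of-freedom correction is applied.
    """
    s_xx = sum(v * v for row in x_dm for v in row)
    # Robust: square (x * u) for every observation separately, then sum.
    meat_robust = sum((v * u) ** 2
                      for xr, ur in zip(x_dm, resid) for v, u in zip(xr, ur))
    # Clustered: sum x * u within each entity first, then square and sum.
    meat_cluster = sum(sum(v * u for v, u in zip(xr, ur)) ** 2
                       for xr, ur in zip(x_dm, resid))
    return math.sqrt(meat_robust) / s_xx, math.sqrt(meat_cluster) / s_xx

# Hypothetical numbers: 2 entities observed in 2 periods.
x_dm = [[1.0, -1.0], [2.0, -2.0]]
resid = [[1.0, -1.0], [-0.5, 0.5]]
se_robust, se_cluster = robust_and_clustered_se(x_dm, resid)
print(round(se_robust, 4), round(se_cluster, 4))
```

In this example the clustered standard error is larger than the robust one, because the products of regressor and residual have the same sign within each entity.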

Estimation of specification (7) in Table 10.1 (p. 368).

Dependent Variable: VFRALL


Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable Coefficient Std. Error t-Statistic Prob.

C -2.327171 1.915400 -1.214979 0.2254


BEERTAX -0.450272 0.319805 -1.407961 0.1603
DA18 0.027509 0.075267 0.365483 0.7150
DA19 -0.019096 0.053288 -0.358351 0.7204
DA20 0.030875 0.054076 0.570957 0.5685
JAILD 0.012644 0.017699 0.714386 0.4756
COMSERD 0.034135 0.142797 0.239043 0.8113
VMILESPD 0.008226 0.007355 1.118432 0.2644
LINCPERC 1.814889 0.683535 2.655150 0.0084
UNRATE -0.063043 0.013984 -4.508168 0.0000
DUM1982 0.533926 0.098541 5.418291 0.0000
DUM1983 0.435841 0.091540 4.761205 0.0000
DUM1984 0.246723 0.064103 3.848852 0.0001
DUM1985 0.155325 0.054832 2.832774 0.0050
DUM1986 0.189843 0.042774 4.438265 0.0000
DUM1987 0.087532 0.032445 2.697841 0.0074

Effects Specification

Cross-section fixed (dummy variables)

R-squared 0.939540 Mean dependent var 2.040444


Adjusted R-squared 0.925809 S.D. dependent var 0.570194
Durbin-Watson stat 1.733929 Prob(F-statistic) 0.000000

Average tax = $0.5/case,
and average fatality rate = 2 per 10,000 people.
As the tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per
10,000).
With clustered standard errors, the 95% confidence interval for the
effect of BeerTax is
-0.45 ± 1.96 × 0.32 = (-1.08, 0.18),
which is wider than (-0.88, -0.02) and now includes zero.
