Indiana University

University Information Technology Services
Linear Regression Models for Panel Data Using SAS, Stata,
LIMDEP, and SPSS*





Hun Myoung Park, Ph.D.








© 2005-2009
Last modified on September 2009








University Information Technology Services
Center for Statistical and Mathematical Computing
Indiana University
410 North Park Avenue Bloomington, IN 47408
(812) 855-4724 (317) 278-4740
http://www.indiana.edu/~statmath

*
The citation of this document should read: “Park, Hun Myoung. 2009. Linear Regression Models for Panel Data
Using SAS, Stata, LIMDEP, and SPSS. Working Paper. The University Information Technology Services (UITS)
Center for Statistical and Mathematical Computing, Indiana University.”
http://www.indiana.edu/~statmath/stat/all/panel
© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 2
http://www.indiana.edu/~statmath

This document summarizes linear regression models for panel data and illustrates how to
estimate each model using SAS 9.2, Stata 11, LIMDEP 9, and SPSS 17. This document does not
address nonlinear models (i.e., logit and probit models) and dynamic models, but focuses on
basic linear regression models.


1. Introduction
2. Least Squares Dummy Variable Regression
3. Panel Data Models
4. One-way Fixed Effect Models: Fixed Group Effect
5. One-way Fixed Effect Models: Fixed Time Effect
6. Two-way Fixed Effect Models
7. Random Effect Models
8. Poolability Test
9. Conclusion
Appendix
References


1. Introduction

Panel (or longitudinal) data are cross-sectional and time-series. There are multiple entities, each
of which has repeated measurements at different time periods. U.S. Census Bureau’s Census
2000 data at the state or county level are cross-sectional but not time-series, while annual sales
figures of Apple Computer Inc. for the past 20 years are time series but not cross-sectional. If
annual sales data of IBM, LG, Siemens, Microsoft, and AT&T during the same periods are also
available, they are panel data. The cumulative General Social Survey (GSS), American
National Election Studies (ANES), and Current Population Survey (CPS) data are not panel
data in the sense that individual respondents vary across survey years. Panel data may have
group effects, time effects, or both, which are analyzed by fixed effect and random effect
models.

1.1 Data Arrangement

A panel data set contains n entities or subjects (e.g., firms and states), each of which includes T
observations measured at time periods 1 through T. Thus, the total number of observations is nT.
Ideally, panel data are measured at regular time intervals (e.g., year, quarter, and month).
Otherwise, panel data should be analyzed with caution. A short panel data set has many
entities but few time periods (small T), while a long panel has many time periods (large T) but
few entities (Cameron and Trivedi 2009: 230).

Panel data have a cross-section (entity or subject) variable and a time-series variable. In Stata,
this arrangement is called the long form (as opposed to the wide form). While the long form has
both group (individual level) and time variables, the wide form includes either group or time
variable. Look at the following data set to see how panel data are arranged. There are 6 groups
(airlines) and 15 time periods (years). The .use command below loads a Stata data set through
TCP/IP, and the in 1/20 qualifier of the .list command displays the first 20 observations.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
(Cost of U.S. Airlines (Greene 2003))

. list airline year load cost output fuel in 1/20, sep(20)

+------------------------------------------------------------+
| airline year load cost output fuel |
|------------------------------------------------------------|
1. | 1 1 .534487 13.9471 -.0483954 11.57731 |
2. | 1 2 .532328 14.01082 -.0133315 11.61102 |
3. | 1 3 .547736 14.08521 .0879925 11.61344 |
4. | 1 4 .540846 14.22863 .1619318 11.71156 |
5. | 1 5 .591167 14.33236 .1485665 12.18896 |
6. | 1 6 .575417 14.4164 .1602123 12.48978 |
7. | 1 7 .594495 14.52004 .2550375 12.48162 |
8. | 1 8 .597409 14.65482 .3297856 12.6648 |
9. | 1 9 .638522 14.78597 .4779284 12.85868 |
10. | 1 10 .676287 14.99343 .6018211 13.25208 |
11. | 1 11 .605735 15.14728 .4356969 13.67813 |
12. | 1 12 .61436 15.16818 .4238942 13.81275 |
13. | 1 13 .633366 15.20081 .5069381 13.75151 |
14. | 1 14 .650117 15.27014 .6001049 13.66419 |
15. | 1 15 .625603 15.3733 .6608616 13.62121 |
16. | 2 1 .490851 13.25215 -.652706 11.55017 |
17. | 2 2 .473449 13.37018 -.626186 11.62157 |
18. | 2 3 .503013 13.56404 -.4228269 11.68405 |
19. | 2 4 .512501 13.8148 -.2337306 11.65092 |
20. | 2 5 .566782 14.00113 -.1708536 12.27989 |
+------------------------------------------------------------+

If data are structured in the wide form, you need to rearrange them first. Stata has the .reshape
command to rearrange a data set back and forth between the long and wide forms. The following
command changes the long form to the wide form, so that the wide form has only six
observations, each with a group variable and one variable per measure per year (4*15 = 60 variables).

. keep airline year load cost output fuel

. reshape wide cost output fuel load, i(airline) j(year)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)

Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 90 -> 6
Number of variables 6 -> 61
j variable (15 values) year -> (dropped)
xij variables:
cost -> cost1 cost2 ... cost15
output -> output1 output2 ... output15
fuel -> fuel1 fuel2 ... fuel15
load -> load1 load2 ... load15
-----------------------------------------------------------------------------

If you wish to rearrange the data set back to the long form, run the following command.

. reshape long cost output fuel load, i(airline) j(year)

In balanced panel data, all entities have measurements in all time periods. In a contingency
table of cross-sectional and time-series variables, each cell should have a frequency of exactly one.
When each entity in a data set has different numbers of observations due to missing values, the
panel data are not balanced. Some cells in the contingency table have zero frequency. In
unbalanced panel data, the total number of observations is not nT. Unbalanced panel data
entail some computational and estimation issues although most software packages are able to
handle both balanced and unbalanced data.
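
The balance check described above can be sketched in Stata as follows. This is a minimal sketch, assuming the airline data set is in memory in the long form with the group variable airline and the time variable year; the choice of commands is an illustration, not a step taken from the original analysis.

```stata
* Declare the panel structure; the output notes whether the panel is balanced
xtset airline year

* Summarize which entities are observed in which time periods
xtdescribe

* Contingency table of entity by time: every cell is 1 in a balanced panel
tabulate airline year
```

For the airline data, with 6 airlines observed over the same 15 years, every cell of the table should be one and the total should equal nT = 90.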

1.2 Fixed Effect versus Random Effect Models

Panel data models examine fixed and/or random effects of entity (individual or subject) or time.
The core difference between fixed and random effect models lies in the role of dummy
variables (Table 1.1). If dummies are considered as a part of the intercept, this is a fixed effect
model. In a random effect model, the dummies act as an error term.

A fixed group effect model examines group differences in intercepts, assuming the same slopes
and constant variance across entities or subjects. Since a group (individual specific) effect is
time invariant and considered a part of the intercept, $u_i$ is allowed to be correlated with other
regressors. Fixed effect models use least squares dummy variable (LSDV) and within effect
estimation methods. Ordinary least squares (OLS) regressions with dummies, in fact, are fixed
effect models.

Table 1.1 Fixed Effect and Random Effect Models
                   Fixed Effect Model                                   Random Effect Model
Functional form*   $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$    $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$
Intercepts         Varying across groups and/or times                   Constant
Error variances    Constant                                             Varying across groups and/or times
Slopes             Constant                                             Constant
Estimation         LSDV, within effect method                           GLS, FGLS
Hypothesis test    Incremental F test                                   Breusch-Pagan LM test
* $v_{it} \sim IID(0, \sigma_v^2)$

A random effect model, by contrast, estimates variance components for groups (or times) and
error, assuming the same intercept and slopes. $u_i$ is a part of the errors and thus should not be
correlated with any regressor; otherwise, a core OLS assumption is violated. The difference
among groups (or time periods) lies in their variance of the error term, not in their intercepts. A
random effect model is estimated by generalized least squares (GLS) when the $\Omega$ matrix, a
variance structure among groups, is known. The feasible generalized least squares (FGLS)
method is used to estimate the variance structure when $\Omega$ is not known. A typical example is
the groupwise heteroscedastic regression model (Greene 2003). There are various estimation
methods for FGLS including the maximum likelihood method and simulation (Baltagi and
Cheng 1994).

Fixed effects are tested by the (incremental) F test, while random effects are examined by the
Lagrange Multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis is not rejected,
the pooled OLS regression is favored. The Hausman specification test (Hausman 1978)
compares fixed effect and random effect models. If the null hypothesis that the individual
effects are uncorrelated with the other regressors in the model is not rejected, a random effect
model is better than its fixed counterpart.
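
The sequence of tests described above can be sketched in Stata as follows. This is a minimal sketch, assuming the airline panel is in memory in the long form; the specification (cost regressed on output, fuel, and load) and the stored-result names fixed and random are chosen here for illustration only.

```stata
* Declare the panel structure
xtset airline year

* Fixed effect (within) model; the footer of the output reports
* the F test that all u_i are zero
xtreg cost output fuel load, fe
estimates store fixed

* Random effect model, followed by the Breusch-Pagan LM test
xtreg cost output fuel load, re
estimates store random
xttest0

* Hausman specification test: fixed effect estimates first, random effect second
hausman fixed random
```

If neither the F test nor the LM test rejects its null hypothesis, pooled OLS is favored; the Hausman test matters only when a choice between the fixed and random effect models has to be made.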

If one cross-sectional or time-series variable is considered (e.g., country, firm, and race), this is
called a one-way fixed or random effect model. Two-way effect models have two sets of
dummy variables for group and/or time variables (e.g., state and year).

1.3 Estimation and Software Issues

The LSDV regression, within effect model, between effect model (group or time mean model),
GLS, and FGLS are fundamentally based on OLS in terms of estimation. Thus, any procedure
and command for OLS is good for linear panel data models (Table 1.2).

The REG procedure of SAS/STAT, Stata .regress (.cnsreg), LIMDEP regress$, and SPSS
regression commands all fit LSDV1 by dropping one dummy and have options to suppress
the intercept (LSDV2). SAS, Stata, and LIMDEP can estimate OLS with restrictions (LSDV3),
but SPSS cannot. In Stata, .cnsreg command requires restrictions defined in the .constraint
command.

Table 1.2 Procedures and Commands in SAS, Stata, LIMDEP, and SPSS
                     SAS 9.2           Stata 11      LIMDEP 9              SPSS 17
Regression (OLS)     PROC REG          .regress      Regress$              Regression
LSDV1                w/o a dummy       w/o a dummy   w/o a dummy           w/o a dummy
LSDV2                /NOINT            ,noconstant   w/o One in Rhs        /Origin
LSDV3                RESTRICT          .cnsreg       Cls:                  N/A
One-way fixed        TSCSREG /FIXONE   .xtreg, fe    Regress;Panel;Str=;   N/A
effect (within)      PANEL /FIXONE     .areg, abs    Fixed$
Two-way fixed        TSCSREG /FIXTWO   N/A           Regress;Panel;Str=;   N/A
(within effect)      PANEL /FIXTWO                   Period=;Fixed$
Between effect       PANEL /BTWNG      .xtreg, be    Regress;Panel;Str=;   N/A
                     PANEL /BTWNT                    Means$
One-way random       TSCSREG /RANONE   .xtreg, re    Regress;Panel;Str=;   N/A
effect               PANEL /RANONE     .xtgls        Random$
                     MIXED /RANDOM     .xtmixed
Two-way random       TSCSREG /RANTWO   .xtmixed      Regress;Panel;Str=;   N/A
                     PANEL /RANTWO                   Period=;Random$
Random coefficient   MIXED /RANDOM     .xtmixed      Regress;RPM=;Str=$    N/A
model                                  .xtrc

SAS, Stata, and LIMDEP also provide the procedures and commands that estimate panel data
models in a convenient way (Table 1.2). SAS/ETS has the TSCSREG and PANEL procedures
to estimate one-way and two-way fixed/random effect models.¹ These procedures estimate the
within effect model for a fixed effect model and by default employ the Fuller-Battese method
(1974) to estimate variance components for group, time, and error for a random effect model.
PROC TSCSREG and PROC PANEL also support other estimation methods such as Parks
(1967) autoregressive model and Da Silva moving average method.

PROC TSCSREG can handle balanced data only, whereas PROC PANEL is able to deal with
balanced and unbalanced data. PROC PANEL requires that each entity (subject) have more than
one observation. PROC TSCSREG provides one-way and two-way fixed and random effect models,
while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled
OLS regression (/POOLED) as well. PROC PANEL has BP and BP2 options to conduct the
Breusch-Pagan LM test for random effects, while PROC TSCSREG does not.² Despite
advanced features of PROC PANEL, the output of the two procedures is similar. PROC
MIXED is also able to fit random effect and random coefficient (parameter) models and
supports maximum likelihood estimation that is not available in PROC PANEL and TSCSREG.

¹ PROC PANEL was an experimental procedure in 9.1.3 but became a regular procedure in 9.2. SAS 9.1.3 users
need to download and install PROC PANEL from http://www.sas.com/apps/demosdownloads/setupintro.jsp.

The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a
between effect model with be, and a random effect model with re. This command, however,
does not directly fit two-way fixed and random effect models.³ The .areg command with the
absorb option, equivalent to the .xtreg with the fe option, fits the one-way within effect
model that has a large dummy variable set. A random effect model can also be estimated using
the .xtmixed command. Stata has .xtgls that fits panel data models with heteroscedasticity
across groups and/or autocorrelation within groups.
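
The relationship among these commands can be sketched as follows. This is a minimal sketch, assuming the airline panel is in memory in the long form; the specification (cost regressed on output, fuel, and load) is an illustration, and the factor-variable notation i.year relies on the syntax introduced in Stata 11.

```stata
* Declare the panel structure
xtset airline year

* One-way fixed group effect (within) model, fitted two ways
xtreg cost output fuel load, fe
areg cost output fuel load, absorb(airline)

* Two-way fixed effect model: add a set of year dummies to the fe model
xtreg cost output fuel load i.year, fe
```

The .xtreg and .areg commands should return the same slope estimates, although the goodness-of-fit statistics they report are defined differently.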

The LIMDEP Regress$ command with the Panel subcommand estimates panel data models.
The Fixed effect subcommand fits a fixed effect model, Random effect estimates a random
effect model, and Means is for a between effect model. SPSS has limited ability to analyze
panel data.

1.4 Data Sets

This document uses two data sets. A cross-sectional data set contains research and development
(R&D) expenditure data of the top 50 information technology firms presented in OECD
Information Technology Outlook 2004. A panel data set has cost data for U.S. airlines
(1970-1984), which are used in Econometric Analysis (Greene 2003). See the Appendix for the details.

² However, BP and BP2 produce invalid Breusch-Pagan statistics in cases of unbalanced data.
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/etsug_panel_sect041.htm.
³ You may fit the two-way fixed effect model by including a set of dummies and using the fe option. For the
two-way random effect model, you need to use the .xtmixed command instead of .xtreg.
2. Least Squares Dummy Variable Regression

A dummy variable is a binary variable that is coded as either one or zero. It is commonly used to
examine group and time effects in regression analysis. Consider a simple model of regressing
R&D expenditure in 2002 on 2000 net income and firm type. The dummy variable d1 is set to 1
for equipment and software firms and zero for telecommunication and electronics. The variable
d2 is coded in the opposite way. Take a look at the data structure (Figure 2.1).

Figure 2.1 Dummy Variable Coding for Firm Types
+-----------------------------------------------------------------+
| firm rnd income type d1 d2 |
|-----------------------------------------------------------------|
| LG Electronics 551 356 Electronics 0 1 |
| AT&T 254 4,669 Telecom 0 1 |
| IBM 4,750 8,093 IT Equipment 1 0 |
| Ericsson 4,424 2,300 Comm. Equipment 1 0 |
| Siemens 5,490 6,528 Electronics 0 1 |
| Verizon . 11,797 Telecom 0 1 |
| Microsoft 3,772 9,421 Service & S/W 1 0 |
… … … … … … … …

2.1 Model 1 without a Dummy Variable: Pooled OLS

The ordinary least squares (OLS) regression without dummy variables, a pooled regression
model, assumes a constant intercept and slope regardless of firm types. In the following
regression equation, $\beta_0$ is the intercept; $\beta_1$ is the slope of net income in 2000; and
$\varepsilon_i$ is the error term.

Model 1: $R\&D_i = \beta_0 + \beta_1 income_i + \varepsilon_i$

The pooled model fits the data well at the .05 significance level (F=7.07, p<.0115). $R^2$ of .1604
says that this model accounts for 16 percent of the total variance. The model has the intercept
of 1,482.697 and slope of .2231. For a $1 million increase in net income, a firm is likely to
increase R&D expenditure by $.2231 million (p<.012).

. use http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta, clear
( R&D expenditure of IT firm (OECD 2002))

. regress rnd income

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 1, 37) = 7.07
Model | 15902406.5 1 15902406.5 Prob > F = 0.0115
Residual | 83261299.1 37 2250305.38 R-squared = 0.1604
-------------+------------------------------ Adj R-squared = 0.1377
Total | 99163705.6 38 2609571.2 Root MSE = 1500.1

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2230523 .0839066 2.66 0.012 .0530414 .3930632
_cons | 1482.697 314.7957 4.71 0.000 844.8599 2120.533
------------------------------------------------------------------------------

Pooled model: R&D = 1,482.697 + .2231*income

Despite moderate goodness of fit statistics such as F and t, this is a naïve model. R&D
investment tends to vary across industries.

2.2 Model 2 with a Dummy Variable

You may assume that equipment and software firms have more R&D expenditure than other
types of companies. Let us take this group difference into account.⁴ We have to drop one of the
two dummy variables in order to avoid perfect multicollinearity. That is, OLS does not work
with the intercept and both dummies in a model. The $\delta_1$ in Model 2 is the coefficient of
equipment, service, and software companies.

Model 2: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$

Model 2 fits the data better than Model 1. The p-value of the F test is .0054 (significant at
the .01 level); $R^2$ is .2520, about .1 larger than that of Model 1; SSE (sum of squares due to
error or residual) decreases from 83,261,299 to 74,175,757 and SEE (square root of MSE) also
declines accordingly (1,500→1,435). The coefficient of d1 is statistically discernible from zero
at the .05 level (t=2.10, p<.043). Unlike Model 1, this model results in two different regression
equations for two groups. The difference lies in the intercepts, but the slope remains unchanged.

. regress rnd income d1

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 2, 36) = 6.06
Model | 24987948.9 2 12493974.4 Prob > F = 0.0054
Residual | 74175756.7 36 2060437.69 R-squared = 0.2520
-------------+------------------------------ Adj R-squared = 0.2104
Total | 99163705.6 38 2609571.2 Root MSE = 1435.4

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 1006.626 479.3717 2.10 0.043 34.41498 1978.837
_cons | 1133.579 344.0583 3.29 0.002 435.7962 1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.2050 + .2180*income = 1,133.579 + 1,006.6260*1 + .2180*income
d1=0: R&D = 1,133.5790 + .2180*income = 1,133.579 + 1,006.6260*0 + .2180*income

The slope .2180 indicates a positive impact of two-year-lagged net income on a firm’s R&D
expenditure. Equipment and software firms on average spend $1,007 million (=2,140-1,134)
more for R&D than telecommunication and electronics companies.
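
The group intercept for d1=1 does not have to be added up by hand. As a minimal sketch, assuming the rnd2002 data set is in memory, the .lincom command computes the sum of the intercept and the dummy coefficient after the regression, together with its standard error.

```stata
* Refit Model 2
regress rnd income d1

* Y-intercept of equipment and software firms (d1=1): _cons + coefficient of d1
lincom _cons + d1
```

The point estimate should match the sum of the two coefficients reported above, 1,133.579 + 1,006.626 = 2,140.205.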

⁴ The dummy variable (firm types) and regressors (net income) may or may not be correlated.

2.3 Visualization of Model 1 and 2

There is only a tiny difference in the slope (.2231 versus .2180) between Model 1 and Model 2.
The intercept 1,483 of Model 1, however, is quite different from 2,140 for equipment and
software companies and 1,134 for telecommunications and electronics in Model 2. This result
appears to be supportive of Model 2.

Figure 2.2 highlights differences between Model 1 and 2 more clearly. The red line (pooled) in
the middle is the regression line of Model 1; the dotted blue line at the top is one for equipment
and software companies (d1=1) in Model 2; finally the dotted green line at the bottom is for
telecommunication and electronics firms (d2=1 or d1=0).

Figure 2.2. Regression Lines of Model 1 and Model 2
[Figure: 2002 R&D Investment of OECD IT Firms. Y axis: R&D (USD Millions); X axis: Income (USD Millions).
Fitted lines: R&D=1483+.223*Income (pooled); R&D=2140+.218*Income (d1=1); R&D=1134+.218*Income (d1=0).]
Source: OECD Information Technology Outlook 2004. http://thesius.sourceoecd.org/


This plot shows that Model 1 ignores the group difference, and thus reports the misleading
intercept. The difference in the intercept between two groups of firms looks substantial.
However, the two models have similar slopes. Consequently, Model 2 considering a fixed
group effect (i.e., firm type) seems better than the simple Model 1. Compare goodness of fit
statistics (e.g., F, $R^2$, and SSE) of the two models. See Sections 3.2.2 and 4.7 for formal
hypothesis tests.
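
As a rough preview of such a test, the standard incremental F statistic for nested linear models can be computed from the two Stata outputs in Sections 2.1 and 2.2. In this sketch, $SSE_1$ and $SSE_2$ denote the residual sums of squares of Model 1 and Model 2, $J=1$ is the number of added dummies, and $n-k = 39-3 = 36$ is the residual degrees of freedom of Model 2; the notation is introduced here for illustration.

```latex
F = \frac{(SSE_1 - SSE_2)/J}{SSE_2/(n-k)}
  = \frac{(83{,}261{,}299.1 - 74{,}175{,}756.7)/1}{74{,}175{,}756.7/36}
  = \frac{9{,}085{,}542.4}{2{,}060{,}437.69}
  \approx 4.41
```

With a single added dummy, this F statistic is the square of the t statistic of d1 in Model 2 ($2.10^2 \approx 4.41$), so it carries the same p-value of about .043.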

2.4 Least Squares Dummy Variable Regression: LSDV1, LSDV2, and LSDV3

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with
dummy variables. Model 2 above is a typical example of LSDV. The key issue in LSDV is
how to avoid perfect multicollinearity, the so-called “dummy variable trap.” LSDV has three
approaches to avoid getting caught in the trap. These approaches differ from each other
with respect to model estimation and interpretation of dummy variable parameters (Suits 1984:
177). They produce different dummy parameter estimates, but their results are equivalent.

The first approach, LSDV1, drops a dummy variable as shown in Model 2 above. That is, the
parameter of the eliminated dummy variable is set to zero and is used as a baseline (Table 2.1). A
variable to be dropped, $d_{dropped}^{LSDV1}$ (d2 in Model 2), needs to be carefully (as opposed to
arbitrarily) selected so that it can play the role of the reference group effectively. LSDV2 includes all
dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero). Finally, LSDV3
includes the intercept and all dummies, and then imposes a restriction that the sum of parameters
of all dummies is zero. Each approach has a constraint (restriction) that reduces the number of
parameters to be estimated by one and thus makes the model identified. The following
functional forms compare these three LSDVs.

LSDV1: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$ or $R\&D_i = \beta_0 + \beta_1 income_i + \delta_2 d_{2i} + \varepsilon_i$
LSDV2: $R\&D_i = \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$
LSDV3: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$

Table 2.1. Three Approaches of the Least Squares Dummy Variable Regression Model
                      LSDV1                                               LSDV2                              LSDV3
Dummies included      $d_1^{LSDV1}, \ldots, d_d^{LSDV1}$                  $d_1^*, \ldots, d_d^*$             $d_1^{LSDV3}, \ldots, d_d^{LSDV3}$
                      except for $d_{dropped}^{LSDV1}$
Intercept?            $\alpha^{LSDV1}$                                    No                                 $\alpha^{LSDV3}$
All dummies?          No (d-1)                                            Yes (d)                            Yes (d)
Constraint            $\delta_{dropped}^{LSDV1} = 0$                      $\alpha^{LSDV2} = 0$               $\sum \delta_i^{LSDV3} = 0$
(restriction)?        (Drop one dummy)                                    (Suppress the intercept)           (Impose a restriction)
Actual dummy          $\delta_i^* = \alpha^{LSDV1} + \delta_i^{LSDV1}$,   $\delta_1^*, \ldots, \delta_d^*$   $\delta_i^* = \alpha^{LSDV3} + \delta_i^{LSDV3}$,
parameters            $\delta_{dropped}^* = \alpha^{LSDV1}$                                                  $\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*$
Meaning of a          How far away from the                               Actual intercept                   How far away from the
dummy coefficient     reference group (dropped)?                                                             average group effect?
$H_0$ of the t-test   $\delta_i^* - \delta_{dropped}^* = 0$               $\delta_i^* = 0$                   $\delta_i^* - \frac{1}{d}\sum \delta_i^* = 0$
Source: Constructed from Suits (1984) and David Good’s lecture (2004)

Three approaches end up fitting the same model but the coefficients of dummy variables in
each approach have different meanings and thus are numerically different (Table 2.1). A
parameter estimate in LSDV2, $\delta_d^*$, is the actual intercept (Y-intercept) of group d. It is easy to
interpret substantively. The t-test examines if $\delta_d^*$ is zero. In LSDV1, a dummy coefficient
shows the extent to which the actual intercept of group d deviates from the reference point (the
parameter of the dropped dummy variable), which is the intercept of LSDV1,
$\delta_{dropped}^* = \alpha^{LSDV1}$.⁵ The null hypothesis holds that the deviation from the reference
group is zero. In LSDV3, a dummy coefficient means how far its actual parameter is away from the
average group effect (Suits 1984: 178). The average effect is the intercept of LSDV3:
$\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*$. Therefore, the null hypothesis is that the deviation from
the average is zero. In short, each approach has a different baseline and thus tests a different
hypothesis but produces exactly the same parameter estimates of regressors. They all fit the same
model; given one LSDV fitted, in other words, we can replicate the other two LSDVs. Table 2.1
summarizes differences in estimation and interpretation of the three LSDVs.

⁵ In Model 2, $\hat{\delta}_1$ of 1,007 is the estimated (relative) distance between two types of firm (equipment and software
versus telecommunications and electronics). In Figure 2.2, the Y-intercept of equipment and software (absolute
distance from the origin) is 2,140 = 1,134+1,006. The Y-intercept of telecommunications and electronics is 1,134.

Which approach is better than the others? You need to consider both estimation and
interpretation issues carefully. In general, LSDV1 is often preferred because of easy estimation
in statistical software packages. Oftentimes researchers want to see how far dummy parameters
deviate from the reference group rather than what the actual intercept of each group is.
LSDV2 and LSDV3 involve some estimation problems; for example, LSDV2 reports an
incorrect $R^2$.

2.5 Estimating Three LSDVs

The SAS REG procedure, Stata .regress command, LIMDEP Regress$ command, and
SPSS Regression command all fit OLS and LSDVs. Let us estimate three LSDVs using SAS,
Stata, and LIMDEP.

2.5.1 LSDV 1 without a Dummy

LSDV 1 drops a dummy variable. The intercept is the actual parameter estimate (absolute
distance from the origin) of the dropped dummy variable. The coefficient of a dummy included
means how far its parameter estimate is away from the reference point or baseline (i.e., the
intercept).

Here we include d2 instead of d1 to see how a different reference point changes the result.
Check the sign of the dummy coefficient and the intercept.

PROC REG DATA=masil.rnd2002;
MODEL rnd = income d2;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

Number of Observations Read 50
Number of Observations Used 39
Number of Observations with Missing Values 11


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 2 24987949 12493974 6.06 0.0054
Error 36 74175757 2060438
Corrected Total 38 99163706


Root MSE 1435.42248 R-Square 0.2520
Dependent Mean 2023.56410 Adj R-Sq 0.2104
Coeff Var 70.93536


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 2140.20468 434.48460 4.93 <.0001
income 1 0.21801 0.08032 2.71 0.0101
d2 1 -1006.62593 479.37174 -2.10 0.0428

d2=0: R&D = 2,140.2047 + .2180*income = 2,140.2047 - 1,006.6259*0 + .2180*income
d2=1: R&D = 1,133.5788 + .2180*income = 2,140.2047 - 1,006.6259*1 + .2180*income

The intercept 2,140 is the Y-intercept of equipment and software firms, whose dummy is
dropped in the model (d1=1, d2=0). The coefficient -1,007 of telecommunications and
electronics means that its Y-intercept is 1,007 smaller than the 2,140 of equipment and software.
That is, 1,134 = 2,140 (baseline) - 1,007. Therefore, this model is identical to Model 2 in
Section 2.2. In short, dropping another dummy does not change the model, although it produces
different dummy coefficients.

Alternatively, you may use the GLM and MIXED procedures to get the same result.

PROC GLM DATA=masil.rnd2002;
MODEL rnd = income d2 /SOLUTION;
RUN;

PROC MIXED DATA=masil.rnd2002;
MODEL rnd = income d2 /SOLUTION;
RUN;

2.5.2 LSDV 2 without the Intercept

LSDV 2 includes all dummy variables and suppresses the intercept. The Stata .regress
command has the noconstant option to fit LSDV2. The coefficients of dummies are actual
parameter estimates; thus, you do not need to compute Y-intercepts of groups. This LSDV,
however, reports incorrect (inflated) $R^2$ (.7135 > .2520) and F (29.88 > 6.06). This is because
the X matrix does not have a column vector of ones and produces incorrect sums of squares of
model and total (Uyar and Erdem 1990: 298). However, the sum of squares of errors is correct
in any LSDV.

. regress rnd income d1 d2, noconstant

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 3, 36) = 29.88
Model | 184685604 3 61561868.1 Prob > F = 0.0000
Residual | 74175756.7 36 2060437.69 R-squared = 0.7135
-------------+------------------------------ Adj R-squared = 0.6896
Total | 258861361 39 6637470.79 Root MSE = 1435.4

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 2140.205 434.4846 4.93 0.000 1259.029 3021.38
d2 | 1133.579 344.0583 3.29 0.002 435.7962 1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income
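
The inflated $R^2$ can be traced directly to the total sum of squares. The sketch below uses the sums of squares printed in the Stata outputs of this section and Section 2.2; the labels "uncentered" and "centered" are standard terms added here for illustration. Without an intercept, the total sum of squares is taken around zero rather than around the mean of R&D, while SSE is the same in both outputs.

```latex
\begin{aligned}
R^2_{uncentered} &= 1 - \frac{SSE}{\sum y_i^2}
  = 1 - \frac{74{,}175{,}756.7}{258{,}861{,}361} \approx .7135 \\
R^2_{centered} &= 1 - \frac{SSE}{\sum (y_i - \bar{y})^2}
  = 1 - \frac{74{,}175{,}756.7}{99{,}163{,}705.6} \approx .2520
\end{aligned}
```

The second value is the $R^2$ reported by LSDV1 and is the one to use when comparing models.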

2.5.3 LSDV 3 with a Restriction

LSDV 3 includes the intercept and all dummies and then imposes a restriction on the model.
The restriction is that the sum of all dummy parameters is zero. The Stata .constraint
command defines a constraint, while the .cnsreg command fits a constrained OLS using the
constraint() option. The number in the parentheses indicates the constraint number defined in
the .constraint command.

. constraint 1 d1 + d2 = 0
. cnsreg rnd income d1 d2, constraint(1)

Constrained linear regression Number of obs = 39
F( 2, 36) = 6.06
Prob > F = 0.0054
Root MSE = 1435.4225

( 1) d1 + d2 = 0
------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 503.313 239.6859 2.10 0.043 17.20749 989.4184
d2 | -503.313 239.6859 -2.10 0.043 -989.4184 -17.20749
_cons | 1636.892 310.0438 5.28 0.000 1008.094 2265.69
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income = 1,637 + 503*1 + (-503)*0 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income = 1,637 + 503*0 + (-503)*1 + .2180*income
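
The three sets of estimates can be converted into one another by hand. The worked arithmetic below starts from the LSDV1 estimates of Model 2 (d2 dropped) and reproduces the LSDV2 and LSDV3 coefficients reported above; it follows the notation of Table 2.1, where $\delta_i^*$ is the actual intercept of group i and $\alpha$ is the intercept of the respective LSDV.

```latex
\begin{aligned}
\delta_1^* &= \alpha^{LSDV1} + \delta_1^{LSDV1} = 1{,}133.579 + 1{,}006.626 = 2{,}140.205 \\
\delta_2^* &= \alpha^{LSDV1} = 1{,}133.579 \\
\alpha^{LSDV3} &= \tfrac{1}{2}(\delta_1^* + \delta_2^*) = \tfrac{1}{2}(2{,}140.205 + 1{,}133.579) = 1{,}636.892 \\
\delta_1^{LSDV3} &= \delta_1^* - \alpha^{LSDV3} = 2{,}140.205 - 1{,}636.892 = 503.313 \\
\delta_2^{LSDV3} &= \delta_2^* - \alpha^{LSDV3} = 1{,}133.579 - 1{,}636.892 = -503.313
\end{aligned}
```

The first two lines are the LSDV2 dummy coefficients, and the last three are the LSDV3 intercept and dummy coefficients.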

The intercept is the average of actual parameter estimates: 1,637 = (2,140+1,134)/2. Since there
are two groups here, the coefficients of two dummies by definition share the same magnitude
($503) but have opposite signs. Equipment and software firms have an intercept of $2,140 million
in R&D expenditure, $503 million MORE than the average of overall IT firms
(=$2,140-$1,637), while telecommunications and electronics spend $503 million LESS than
the average (=$1,134-$1,637). In the SAS output below, the coefficient of RESTRICT is
virtually zero and, in theory, should be zero.

PROC REG DATA=masil.rnd2002;
MODEL rnd = income d1 d2;
RESTRICT d1 + d2 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read 50
Number of Observations Used 39
Number of Observations with Missing Values 11


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 2 24987949 12493974 6.06 0.0054
Error 36 74175757 2060438
Corrected Total 38 99163706


Root MSE 1435.42248 R-Square 0.2520
Dependent Mean 2023.56410 Adj R-Sq 0.2104
Coeff Var 70.93536


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 1636.89172 310.04381 5.28 <.0001
income 1 0.21801 0.08032 2.71 0.0101
d1 1 503.31297 239.68587 2.10 0.0428
d2 1 -503.31297 239.68587 -2.10 0.0428
RESTRICT -1 1.81899E-12 0 . .

* Probability computed using beta distribution.

Table 2.2 Estimating Three LSDVs Using SAS, Stata, LIMDEP, and SPSS

         LSDV 1                         LSDV 2                                   LSDV 3
SAS      PROC REG;                      PROC REG;                                PROC REG;
         MODEL rnd = income d2;         MODEL rnd = income d1 d2 /NOINT;         MODEL rnd = income d1 d2;
         RUN;                           RUN;                                     RESTRICT d1 + d2 = 0;
                                                                                 RUN;

Stata    . regress rnd income d2        . regress rnd income d1 d2, noconstant   . constraint 1 d1 + d2 = 0
                                                                                 . cnsreg rnd income d1 d2, constraint(1)

LIMDEP   REGRESS;                       REGRESS;                                 REGRESS;
         Lhs=rnd;                       Lhs=rnd;                                 Lhs=rnd;
         Rhs=ONE,income, d2$            Rhs=income, d1, d2$                      Rhs=ONE,income, d1, d2;
                                                                                 Cls: b(2)+b(3)=0$

SPSS     REGRESSION                     REGRESSION                               N/A
         /MISSING LISTWISE              /MISSING LISTWISE
         /STATISTICS COEFF R ANOVA      /STATISTICS COEFF R ANOVA
         /CRITERIA=PIN(.05) POUT(.10)   /CRITERIA=PIN(.05) POUT(.10)
         /NOORIGIN                      /ORIGIN
         /DEPENDENT rnd                 /DEPENDENT rnd
         /METHOD=ENTER income d2.       /METHOD=ENTER income d1 d2.

Table 2.2 compares how SAS, Stata, LIMDEP, and SPSS estimate LSDVs. SPSS is not able to
fit the LSDV3. In LIMDEP, ONE indicates the intercept to be included. Cls: b(2)+b(3)=0 fits
the model under the condition that the sum of parameter estimates of d1 (second parameter)
and d2 (third parameter) is zero. In SPSS, pay attention to the /ORIGIN option for LSDV2.

3. Panel Data Models

Panel data models examine group (individual-specific) effects, time effects, or both. These
effects are either fixed effect or random effect. A fixed effect model examines if intercepts vary
across groups or time periods, whereas a random effect model explores differences in error
variances. A one-way model includes only one set of dummy variables (e.g., firm), while a two-
way model considers two sets of dummy variables (e.g., firm and year). Model 2 in Chapter 2,
in fact, is a one-way fixed group effect panel data model.

3.1 Functional Forms and Notation

The parameter estimate of a dummy variable is a part of the intercept in a fixed effect model
and a component of error in the random effect model. Slopes remain the same across groups or
time periods. The functional forms of one-way panel data models are as follows.

Fixed group effect model: $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$, where $v_{it} \sim IID(0, \sigma_v^2)$
Random group effect model: $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$, where $v_{it} \sim IID(0, \sigma_v^2)$

Note that $u_i$ is a fixed or random effect and errors are independent identically distributed,
$v_{it} \sim IID(0, \sigma_v^2)$.

Notations used in this document include,
- $\bar{y}_{i\bullet}$ : dependent variable (DV) mean of group i.
- $\bar{y}_{\bullet t}$ : dependent variable (DV) mean at time t.
- $\bar{x}_{i\bullet}$ : means of independent variables (IVs) of group i.
- $\bar{x}_{\bullet t}$ : means of independent variables (IVs) at time t.
- $\bar{y}_{\bullet\bullet}$ : overall mean of the DV.
- $\bar{x}_{\bullet\bullet}$ : overall means of the IVs.
- n: the number of groups or firms
- T : the number of time periods
- N=nT : total number of observations
- k : the number of regressors excluding dummy variables
- K=k+1 (including the intercept)

3.2 Fixed Effect Models

There are several strategies for estimating fixed effect models. The least squares dummy
variable model (LSDV) uses dummy variables, whereas the within effect model does not. These
strategies, of course, produce the identical parameter estimates of non-dummy independent
variables. The between effect model fits the model using group and/or time means of dependent
and independent variables without dummies. Table 3.1 summarizes pros and cons of these
models.

3.2.1 Estimations: LSDV, Within Effect, and Between Effect Models

As discussed in Chapter 2, LSDV is widely used because it is relatively easy to estimate and
interpret substantively. This LSDV, however, becomes problematic when there are many
groups or subjects in panel data. If T is fixed and $nT \to \infty$, only coefficients of regressors are
consistent. The coefficients of dummy variables, $\alpha + u_i$, are not consistent since the number of
these parameters increases as nT increases (Baltagi 2001). This is the so-called incidental
parameter problem. Under this circumstance, LSDV is useless and thus calls for another
strategy, the within effect model.

A within group effect model does not need dummy variables, but it uses deviations from group
means. Thus, this model is the OLS of
$(y_{it} - \bar{y}_{i\bullet}) = (x_{it} - \bar{x}_{i\bullet})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\bullet})$
without an intercept.[6] The incidental parameter problem is no longer an issue. The parameter
estimates of regressors in the within effect model are identical to those of LSDV. The within
effect model in turn has several disadvantages.

Since this model does not report dummy coefficients, you need to compute them using the
formula $d_i^* = \bar{y}_{i\bullet} - \bar{x}_{i\bullet}'\beta$. Since no dummy is used, the within effect model has larger
degrees of freedom for error, resulting in small MSE (mean square error) and incorrect (smaller)
standard errors of parameter estimates. Thus, you have to adjust the standard error using the
formula
$$se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT-k}{nT-n-k}}.$$
Finally, $R^2$ of the within effect model is not correct
because the intercept is suppressed.
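
The three steps of the within transformation and the standard error adjustment described above
can be sketched in Stata as follows. This is only an illustration under stated assumptions: it
uses the airline data introduced in Chapter 4 (n=6, T=15, k=3), and the variable prefixes gm_
and w_ are hypothetical names chosen for this sketch.

```stata
* Minimal sketch of the within effect model (assumes the airline data of Chapter 4)
use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear

foreach v of varlist cost output fuel load {
    * step 1: group means (gm_ is a hypothetical prefix)
    egen gm_`v' = mean(`v'), by(airline)
    * step 2: deviations from the group means (w_ is a hypothetical prefix)
    generate w_`v' = `v' - gm_`v'
}

* step 3: OLS on the transformed variables without the intercept
regress w_cost w_output w_fuel w_load, noconstant

* adjust one standard error: df(within) = nT-k = e(df_r), df(LSDV) = nT-n-k = e(df_r) - 6
display _se[w_output]*sqrt(e(df_r)/(e(df_r) - 6))
```

The slope estimates should agree with those of LSDV, and the adjusted standard error should
agree with the LSDV standard error of the same regressor.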

Table 3.1 Comparison of Fixed Effect Models

Functional form
  LSDV1:           $y_i = i\alpha_i + X_i\beta + \varepsilon_i$
  Within Effect:   $y_{it} - \bar{y}_{i\bullet} = (x_{it} - \bar{x}_{i\bullet})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\bullet})$
  Between Effect:  $\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$

                              LSDV1       Within Effect                     Between Effect
Dummy                         Yes         No                                No
Dummy coefficient             Presented   Need to be computed               N/A
Transformation                No          Deviation from the group means    Group means
Intercept (estimation)        Yes         No                                Yes
$R^2$                         Correct     Incorrect
SSE                           Correct     Correct
MSE                           Correct     Smaller
Standard error of $\beta$     Correct     Incorrect (smaller)
$df_{error}$                  nT-n-k      nT-k (n larger)                   n-K
Observations                  nT          nT                                n

The between group effect model, the so-called group mean regression, uses group means of the
dependent and independent variables. This data aggregation reduces the number of
observations down to n. Then, run OLS of
$\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$. Table 3.1 contrasts LSDV, the
within effect model, and the between group models.

[6] You need to follow three steps: 1) compute group means of the dependent and independent variables; 2)
transform variables to get deviations from the group means; 3) run OLS with the transformed variables without the
intercept.

3.2.2 Testing Group Effects

In a regression of $y_{it} = \alpha + \mu_i + X_{it}'\beta + \varepsilon_{it}$, the null hypothesis is that all dummy parameters
except for the one dropped are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$. This hypothesis is tested by the
F test, which is based on loss of goodness-of-fit. The robust model in the following formula is
LSDV (or within effect model) and the efficient model is the pooled regression.[7]

$$\frac{(e'e_{Efficient} - e'e_{Robust})/(n-1)}{(e'e_{Robust})/(nT-n-k)} = \frac{(R^2_{Robust} - R^2_{Efficient})/(n-1)}{(1 - R^2_{Robust})/(nT-n-k)} \sim F(n-1, nT-n-k)$$


If the null hypothesis is rejected, you may conclude that the fixed group effect model is better
than the pooled OLS model.

3.2.3 Fixed Time Effect and Two-way Fixed Effect Models

For the fixed time effects model, you need to switch n and T, and i and t in the formulas.

- Model: $y_{it} = \alpha + \tau_t + X_{it}'\beta + \varepsilon_{it}$
- Within effect model: $(y_{it} - \bar{y}_{\bullet t}) = (x_{it} - \bar{x}_{\bullet t})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{\bullet t})$
- Dummy coefficients: $d_t^* = \bar{y}_{\bullet t} - \bar{x}_{\bullet t}'\beta$
- Correct standard errors: $se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{Tn-k}{Tn-T-k}}$
- Between effect model: $\bar{y}_{\bullet t} = \alpha + \bar{x}_{\bullet t}'\beta + \varepsilon_t$
- $H_0: \tau_1 = \dots = \tau_{T-1} = 0$.
- F-test: $\frac{(e'e_{Pooled} - e'e_{Within})/(T-1)}{(e'e_{Within})/(Tn-T-k)} \sim F(T-1, Tn-T-k)$.

The fixed group and time effect model uses slightly different formulas. The within effect model
of this two-way fixed model is estimated by five strategies (see Section 6.1).

- Model: $y_{it} = \alpha + \mu_i + \tau_t + X_{it}'\beta + \varepsilon_{it}$.
- Within effect model: $y_{it}^* = y_{it} - \bar{y}_{i\bullet} - \bar{y}_{\bullet t} + \bar{y}_{\bullet\bullet}$ and $x_{it}^* = x_{it} - \bar{x}_{i\bullet} - \bar{x}_{\bullet t} + \bar{x}_{\bullet\bullet}$.
- Dummy coefficients: $d_i^* = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})'\beta$ and $d_t^* = (\bar{y}_{\bullet t} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{\bullet t} - \bar{x}_{\bullet\bullet})'\beta$
- Correct standard errors: $se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT-k}{nT-n-T-k+1}}$
- $H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$.
- F-test: $\frac{(e'e_{Efficient} - e'e_{Robust})/(n+T-2)}{(e'e_{Robust})/(nT-n-T-k+1)} \sim F[(n+T-2), (nT-n-T-k+1)]$

[7] When comparing fixed effect and random effect models, the fixed effect estimates are considered as the robust
estimates and random effect estimates as the efficient estimates.


3.3 Random Effect Models

The one-way random group effect model is formulated as $y_{it} = \alpha + X_{it}'\beta + u_i + v_{it}$, $w_{it} = u_i + v_{it}$,
where $u_i \sim IID(0, \sigma_u^2)$ and $v_{it} \sim IID(0, \sigma_v^2)$. The $u_i$ are assumed independent of $v_{it}$ and $X_{it}$,
which are also independent of each other for all i and t. This assumption is not necessary in the
fixed effect model. The components of $Cov(w_{it}, w_{js}) = E(w_{it} w_{js})$ are $\sigma_u^2 + \sigma_v^2$ if i=j and t=s, and
$\sigma_u^2$ if i=j and $t \neq s$.[8] Thus, the $\Omega$ matrix or the variance structure of errors looks like,

$$\Omega_{T \times T} = \begin{bmatrix}
\sigma_u^2 + \sigma_v^2 & \sigma_u^2 & \dots & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 + \sigma_v^2 & \dots & \sigma_u^2 \\
\dots & \dots & \dots & \dots \\
\sigma_u^2 & \sigma_u^2 & \dots & \sigma_u^2 + \sigma_v^2
\end{bmatrix}$$

A random effect model is estimated by generalized least squares (GLS) when the variance
structure is known, and by feasible generalized least squares (FGLS) when the variance is
unknown. Compared to fixed effect models, random effect models are relatively difficult to
estimate. This document assumes panel data are balanced.

3.3.1 Generalized Least Squares (GLS)

When $\Omega$ is known (given), GLS based on the true variance components is BLUE and all the
feasible GLS estimators considered are asymptotically efficient as either n or T approaches
infinity (Baltagi 2001).
In GLS, you just need to compute $\theta$ using the $\Omega$ matrix:
$\theta = 1 - \sqrt{\frac{\sigma_v^2}{T\sigma_u^2 + \sigma_v^2}}$.[9] Then transform
variables as follows.
- $y_{it}^* = y_{it} - \theta \bar{y}_{i\bullet}$
- $x_{it}^* = x_{it} - \theta \bar{x}_{i\bullet}$ for all $X_k$
- $\alpha^* = 1 - \theta$


[8] This implies that $Corr(w_{it}, w_{js})$ is 1 if i=j and t=s, and $\sigma_u^2/(\sigma_u^2 + \sigma_v^2)$ if i=j and $t \neq s$.
[9] If $\theta = 0$, run pooled OLS. If $\theta = 1$ and $\sigma_v^2 = 0$, then run the within effect model.

Finally, run OLS on the transformed variables: $y_{it}^* = \alpha^* + x_{it}^{*\prime}\beta^* + \varepsilon_{it}^*$. Since $\Omega$ is often unknown,
FGLS is more frequently used than GLS.

3.3.2 Feasible Generalized Least Squares (FGLS)

If $\Omega$ is unknown, first you have to estimate $\theta$ using $\hat{\sigma}_u^2$ and $\hat{\sigma}_v^2$:
$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}}.$$

The $\hat{\sigma}_v^2$ is derived from the SSE (sum of squares due to error) of the within effect model or
from the deviations of residuals from group means of residuals:
$$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT-n-k} = \frac{e'e_{within}}{nT-n-k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\bullet})^2}{nT-n-k},$$
where $v_{it}$ are the residuals of the LSDV1.

The $\hat{\sigma}_u^2$ comes from the between effect model (group mean regression):
$$\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \frac{\hat{\sigma}_v^2}{T}, \text{ where } \hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K}.$$

Next, transform variables using $\hat{\theta}$ and then run OLS: $y_{it}^* = \alpha^* + x_{it}^{*\prime}\beta^* + \varepsilon_{it}^*$.
- $y_{it}^* = y_{it} - \hat{\theta} \bar{y}_{i\bullet}$
- $x_{it}^* = x_{it} - \hat{\theta} \bar{x}_{i\bullet}$ for all $X_k$
- $\alpha^* = 1 - \hat{\theta}$

The estimation of the two-way random effect model is skipped here.
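
The FGLS steps for the one-way random group effect model can be sketched in Stata as follows.
This is a minimal illustration rather than a definitive implementation: it assumes the balanced
airline data of Chapter 4 (n=6, T=15) with the dummies g1-g5 already in the data set, and the
names s2_v, s2_b, s2_u, theta, and the prefixes m_ and re_ are hypothetical. If s2_u turns out to
be negative, the sketch does not apply as written.

```stata
use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear

* sigma_v^2 = SSE/(nT-n-k) from LSDV1
quietly regress cost g1-g5 output fuel load
scalar s2_v = e(rss)/e(df_r)

* sigma_between^2 = SSE/(n-K) from the group mean regression
preserve
collapse (mean) cost output fuel load, by(airline)
quietly regress cost output fuel load
scalar s2_b = e(rss)/e(df_r)
restore

* variance component of the group effect and theta (T = 15)
scalar s2_u = s2_b - s2_v/15
scalar theta = 1 - sqrt(s2_v/(15*s2_b))
display "sigma_u^2 = " scalar(s2_u) "   theta = " scalar(theta)

* transform variables and run OLS without the intercept
foreach v of varlist cost output fuel load {
    egen m_`v' = mean(`v'), by(airline)
    generate re_`v' = `v' - scalar(theta)*m_`v'
}
generate re_one = 1 - scalar(theta)
regress re_cost re_one re_output re_fuel re_load, noconstant
```

The coefficient of re_one plays the role of the intercept. The built-in
. xtreg cost output fuel load, re theta command can be used as a cross-check after .tsset,
although its variance component estimates may differ in small details.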

3.3.3 Testing Random Effects (LM test)

The null hypothesis is that cross-sectional variance components are zero, $H_0: \sigma_u^2 = 0$. Breusch
and Pagan (1980) developed the Lagrange multiplier (LM) test (Greene 2003). In the following
formula, $\bar{e}$ is the n X 1 vector of the group specific means of pooled regression residuals and
$e'e$ is the SSE of the pooled OLS regression. The LM follows chi-squared distribution with
one degree of freedom.
$$LM_u = \frac{nT}{2(T-1)}\left[\frac{e'DD'e}{e'e} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1).$$

Baltagi (2001) presents the same LM test in a different way.
$$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(\sum_t e_{it}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(T\bar{e}_{i\bullet}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 \sim \chi^2(1).$$

The two-way random effect model has the null hypothesis of $H_0: \sigma_{u1}^2 = 0$ and $\sigma_{u2}^2 = 0$. The LM
test combines two one-way random effect models for group and time,
$LM_{u12} = LM_{u1} + LM_{u2} \sim \chi^2(2)$.
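
The one-way LM statistic can be computed by hand from pooled OLS residuals, following the
Baltagi form of the formula. The sketch below is an illustration under assumptions: it uses the
balanced airline data of Chapter 4 (n=6, T=15, nT=90), and the names res, rbar, first, Tr2, r2,
num, den, and LM are hypothetical.

```stata
use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
tsset airline year

* residuals of the pooled OLS and their group means
quietly regress cost output fuel load
predict res, residuals
egen rbar = mean(res), by(airline)

* numerator: sum over groups of (T * group mean residual)^2, one row per group
egen first = tag(airline)
generate Tr2 = (15*rbar)^2 if first
quietly summarize Tr2
scalar num = r(sum)

* denominator: SSE of the pooled OLS
generate r2 = res^2
quietly summarize r2
scalar den = r(sum)

scalar LM = (90/(2*(15 - 1)))*(scalar(num)/scalar(den) - 1)^2
display "LM = " scalar(LM) "   p = " chi2tail(1, scalar(LM))

* built-in Breusch-Pagan LM test after the random effect model
quietly xtreg cost output fuel load, re
xttest0
```

The built-in .xttest0 result is shown only as a cross-check; depending on the Stata version, it
may report the statistic and p-value in a slightly different form.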

3.4 Hausman Test: Fixed Effects versus Random Effects

The Hausman specification test compares the fixed versus random effects under the null
hypothesis that the individual effects are uncorrelated with the other regressors in the model
(Hausman 1978). If correlated ($H_0$ is rejected), a random effect model produces biased
estimators, violating one of the Gauss-Markov assumptions; so a fixed effect model is preferred.
Hausman's essential result is that the covariance of an efficient estimator with its difference
from an inefficient estimator is zero (Greene 2003).

$$m = (b_{Robust} - b_{Efficient})'\hat{\Sigma}^{-1}(b_{Robust} - b_{Efficient}) \sim \chi^2(k),$$
where $\hat{\Sigma} = Var[b_{Robust} - b_{Efficient}] = Var(b_{Robust}) - Var(b_{Efficient})$ is the difference in the estimated
covariance matrix of the parameter estimates between the LSDV model (robust) and the
random effects model (efficient). It is notable that an intercept and dummy variables SHOULD
be excluded in computation.
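
In Stata, the comparison described above is commonly run with the .hausman command after
storing both sets of estimates. The sketch below assumes the airline data of Chapter 4; the
names fixed and random are arbitrary labels chosen for this illustration.

```stata
use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
tsset airline year

* robust (consistent) estimates: fixed group effect model
quietly xtreg cost output fuel load, fe
estimates store fixed

* efficient estimates: random group effect model
quietly xtreg cost output fuel load, re
estimates store random

* the consistent estimator comes first, the efficient estimator second
hausman fixed random
```

In finite samples the estimated difference of the covariance matrices may fail to be positive
definite, in which case the reported statistic should be read with caution.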

3.5 Poolability Test

What is poolability? Poolability tests whether or not slopes are the same across groups or over
time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$. Remember that slopes
remain constant in fixed and random effect models; only intercepts and error variances matter.

The poolability test is undertaken under the assumption of $\mu \sim N(0, s^2 I_{NT})$. This test uses the F
statistic,
$$F_{obs} = \frac{(e'e - \sum e_i'e_i)/[(n-1)K]}{\sum e_i'e_i/[n(T-K)]} \sim F[(n-1)K, n(T-K)],$$
where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group i.
If the null hypothesis is rejected, the panel data are not poolable. Under this circumstance, you
may go to the random coefficient model or hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The F-test is
$$F_{obs} = \frac{(e'e - \sum e_t'e_t)/[(T-1)K]}{\sum e_t'e_t/[T(n-K)]} \sim F[(T-1)K, T(n-K)],$$
where $e_t'e_t$
is SSE of the OLS regression at time t.
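
The poolability test across groups can be sketched in Stata by summing the SSEs of separate
group regressions. This is an illustration under assumptions: it uses the airline data of
Chapter 4 (n=6, T=15, K=4), assumes that airline is coded 1 through 6, and the scalar names
sse_pooled, sse_groups, and F are hypothetical.

```stata
use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear

* SSE of the pooled OLS
quietly regress cost output fuel load
scalar sse_pooled = e(rss)

* sum of SSEs of the OLS regressions fitted group by group
scalar sse_groups = 0
forvalues i = 1/6 {
    quietly regress cost output fuel load if airline == `i'
    scalar sse_groups = scalar(sse_groups) + e(rss)
}

* F statistic with (n-1)K = 20 and n(T-K) = 66 degrees of freedom
scalar F = ((scalar(sse_pooled) - scalar(sse_groups))/((6 - 1)*4)) / (scalar(sse_groups)/(6*(15 - 4)))
display "F(20, 66) = " scalar(F) "   p = " Ftail(20, 66, scalar(F))
```

For the test over time, loop over the values of year instead and swap n and T in the degrees
of freedom.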
4. One-way Fixed Effect Models: Group Effects

A one-way fixed group model examines group differences in intercepts. The LSDV for this
fixed model needs to create as many dummy variables as the number of entities or subjects.
When many dummies are needed, the within effect model is useful since it transforms variables
using group means to avoid dummies. The between effect model uses group means of variables.

The sample panel data set includes cost and its related data of six U.S. airlines measured at 15
different time points. The following .use command reads a data set airline.dta
and .describe displays basic information of key variables.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear

. describe airline year cost output fuel load

storage display value
variable name type format label variable label
-----------------------------------------------------------------------------------------------
airline int %8.0g Airline name
year int %8.0g Year
cost float %9.0g Total cost in $1,000
output float %9.0g Output in revenue passenger miles, index number
fuel float %9.0g Fuel price
load float %9.0g Load factor

You need to declare the cross-sectional (airline) and time-series (year) variables using
the .tsset command.

. tsset airline year
panel variable: airline (strongly balanced)
time variable: year, 1 to 15
delta: 1 unit

Let us take a look at descriptive statistics of key variables using .xtsum.

. xtsum cost output fuel load

Variable | Mean Std. Dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
cost overall | 13.36561 1.131971 11.14154 15.3733 | N = 90
between | .9978636 12.27441 14.67563 | n = 6
within | .6650252 12.11545 14.91617 | T = 15
| |
output overall | -1.174309 1.150606 -3.278573 .6608616 | N = 90
between | 1.166556 -2.49898 .3192696 | n = 6
within | .4208405 -1.987984 .1339861 | T = 15
| |
fuel overall | 12.77036 .8123749 11.55017 13.831 | N = 90
between | .0237151 12.7318 12.7921 | n = 6
within | .8120832 11.56883 13.8513 | T = 15
| |
load overall | .5604602 .0527934 .432066 .676287 | N = 90
between | .0281511 .5197756 .5971917 | n = 6
within | .0460361 .4368492 .6581019 | T = 15

4.1 The Pooled OLS Regression Model

First, fit the pooled regression model without any dummy variable.

. regress cost output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 86) = 2419.34
Model | 112.705452 3 37.5684839 Prob > F = 0.0000
Residual | 1.33544153 86 .01552839 R-squared = 0.9883
-------------+------------------------------ Adj R-squared = 0.9879
Total | 114.040893 89 1.28135835 Root MSE = .12461

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8827385 .0132545 66.60 0.000 .8563895 .9090876
fuel | .453977 .0203042 22.36 0.000 .4136136 .4943404
load | -1.62751 .345302 -4.71 0.000 -2.313948 -.9410727
_cons | 9.516923 .2292445 41.51 0.000 9.0612 9.972645
------------------------------------------------------------------------------

The regression equation is cost = 9.5169 + .8827*output +.4540*fuel -1.6275*load. This model
fits the data well (F=2419.34, p<.0001 and $R^2$=.9883). We may, however, suspect that there is a
fixed group effect producing different intercepts across groups. Each airline may have a
significantly different level of cost, its Y-intercept, when all regressors are set to zero. This
difference is modeled as a fixed group effect.

As discussed in Chapter 2, there are three equivalent approaches of LSDV. They report the
identical parameter estimates of regressors except for dummy coefficients. Let us begin with
LSDV1.

4.2 LSDV1 without a Dummy

LSDV1 drops a dummy variable to get the model identified. LSDV1 produces correct ANOVA
information, goodness of fit, parameter estimates, and standard errors. As a consequence, this
approach is commonly used in practice. LSDV produces six regression equations for six
airlines. How can we draw these equations using LSDV1?

Airline 1: cost = 9.7059 + .9193*output +.4175*fuel -1.0704*load
Airline 2: cost = 9.6647 + .9193*output +.4175*fuel -1.0704*load
Airline 3: cost = 9.4970 + .9193*output +.4175*fuel -1.0704*load
Airline 4: cost = 9.8905 + .9193*output +.4175*fuel -1.0704*load
Airline 5: cost = 9.7300 + .9193*output +.4175*fuel -1.0704*load
Airline 6: cost = 9.7930 + .9193*output +.4175*fuel -1.0704*load

In SAS, PROC REG fits the OLS regression model. Let us drop the last dummy g6 and use it
as the reference group. Of course, you may drop another dummy variable to get the equivalent
result. LSDV1 fits the data better than does the pooled OLS. SSE decreases from 1.3354
to .2926, and $R^2$ increases from .9883 to .9974. Due to the dummies included, this model loses
five degrees of freedom (from 86 to 81).

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 8 113.74827 14.21853 3935.79 <.0001
Error 81 0.29262 0.00361
Corrected Total 89 114.04089


Root MSE 0.06011 R-Square 0.9974
Dependent Mean 13.36561 Adj R-Sq 0.9972
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.79300 0.26366 37.14 <.0001
g1 1 -0.08706 0.08420 -1.03 0.3042
g2 1 -0.12830 0.07573 -1.69 0.0941
g3 1 -0.29598 0.05002 -5.92 <.0001
g4 1 0.09749 0.03301 2.95 0.0041
g5 1 -0.06301 0.02389 -2.64 0.0100
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001

The parameter estimate of g6 is presented in the intercept (9.7930). Other dummy parameter
estimates are computed using the reference point. The actual intercept of airline 1, for example,
is computed as 9.7059 = 9.7930 + (-.0871)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0
or simply 9.7930 + (-.0871), where 9.7930 is the reference point, the intercept of this
model. The coefficient -.0871 says that the Y-intercept of airline 1 (9.7059) is .0871 smaller
than that of airline 6 (reference point).
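
The same arithmetic can be delegated to Stata's .lincom command, which also reports a standard
error for the combined estimate. The sketch below assumes that the airline data are in memory
with the dummies g1-g5 as used in this section.

```stata
* LSDV1 with g6 dropped as the reference group
quietly regress cost g1-g5 output fuel load

* actual Y-intercept of airline 1 = reference intercept + coefficient of g1
lincom _cons + g1

* actual Y-intercept of airline 3
lincom _cons + g3
```

The point estimates should agree with the intercepts listed at the beginning of this section
(9.7059 for airline 1 and 9.4970 for airline 3).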

Stata has the .regress command for OLS regression (LSDV). The output is identical to that of
PROC REG.

. regress cost g1-g5 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | -.0870617 .0841995 -1.03 0.304 -.2545924 .080469
g2 | -.1282976 .0757281 -1.69 0.094 -.2789728 .0223776
g3 | -.2959828 .0500231 -5.92 0.000 -.395513 -.1964526
g4 | .097494 .0330093 2.95 0.004 .0318159 .1631721
g5 | -.063007 .0238919 -2.64 0.010 -.1105443 -.0154697
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.793004 .2636622 37.14 0.000 9.268399 10.31761
------------------------------------------------------------------------------
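
The F test for the fixed group effect in Section 3.2.2 can be checked at this point. The sketch
below computes the statistic from the SSEs of the pooled OLS and LSDV1 and then runs the
equivalent built-in test; the scalar names sse_pooled, sse_lsdv, df_lsdv, and F are hypothetical,
and the airline data are assumed to be in memory.

```stata
* efficient model: pooled OLS
quietly regress cost output fuel load
scalar sse_pooled = e(rss)

* robust model: LSDV1
quietly regress cost g1-g5 output fuel load
scalar sse_lsdv = e(rss)
scalar df_lsdv = e(df_r)

* F = [(SSE_pooled - SSE_lsdv)/(n-1)] / [SSE_lsdv/(nT-n-k)] with n = 6
scalar F = ((scalar(sse_pooled) - scalar(sse_lsdv))/(6 - 1)) / (scalar(sse_lsdv)/scalar(df_lsdv))
display "F(5, " scalar(df_lsdv) ") = " scalar(F) "   p = " Ftail(5, scalar(df_lsdv), scalar(F))

* built-in joint test that all included dummy coefficients are zero
testparm g1-g5
```

With the SSEs reported above (1.3354 and .2926), the statistic works out to roughly 57.7 with
5 and 81 degrees of freedom, so the null hypothesis of no group effect would be rejected.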

In LIMDEP, run the Regress$ command to fit the LSDV1. Do not forget to include ONE for
the intercept in the Rhs subcommand.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:51:23PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 9.79302127 .26366104 37.142 .0000
G1 | -.08707202 .08419916 -1.034 .3042 .16666667
G2 | -.12830600 .07572778 -1.694 .0940 .16666667
G3 | -.29598860 .05002285 -5.917 .0000 .16666667
G4 | .09749253 .03300915 2.954 .0041 .16666667
G5 | -.06300770 .02389180 -2.637 .0100 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

What if we drop a different dummy variable, say g1, instead of g6? Since the different
reference point is applied, you will get different dummy coefficients. As shown in the above,
the intercept 9.7059 in this model is the actual parameter estimate (Y-intercept) of g1, which
was excluded from the model. The Y-intercept of airline 2 is computed to get
9.6647=9.7059-.0412. The Y-intercept of airline 2 (9.6647) is .0412 smaller than the reference
point of 9.7059. Actual Y-intercepts of other dummies are computed in this manner. The other
statistics such as parameter estimates of regressors and goodness-of-fit measures remain
unchanged. That is, choice of a dummy variable to be dropped does not change a model.

. regress cost g2-g6 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2 | -.0412359 .0251839 -1.64 0.105 -.0913441 .0088722
g3 | -.2089211 .0427986 -4.88 0.000 -.2940769 -.1237652
g4 | .1845557 .0607527 3.04 0.003 .0636769 .3054345
g5 | .0240547 .0799041 0.30 0.764 -.1349293 .1830387
g6 | .0870617 .0841995 1.03 0.304 -.080469 .2545924
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.705942 .193124 50.26 0.000 9.321686 10.0902
------------------------------------------------------------------------------

When you have not created dummy variables, take advantage of the .xi prefix command
(interaction expansion) to obtain the identical result. The Stata .xi, like .bysort, is used either
as an ordinary command or as a prefix command. .xi creates dummies from a categorical
variable specified in the term i. and then runs the command following the colon. Stata by
default drops the first dummy variable, while PROC TSCSREG and PROC PANEL in Section
4.5.2 drop the last dummy.

. xi: regress cost i.airline output fuel load

i.airline _Iairline_1-6 (naturally coded; _Iairline_1 omitted)

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iairline_2 | -.0412359 .0251839 -1.64 0.105 -.0913441 .0088722
_Iairline_3 | -.2089211 .0427986 -4.88 0.000 -.2940769 -.1237652
_Iairline_4 | .1845557 .0607527 3.04 0.003 .0636769 .3054345
_Iairline_5 | .0240547 .0799041 0.30 0.764 -.1349293 .1830387
_Iairline_6 | .0870617 .0841995 1.03 0.304 -.080469 .2545924
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.705942 .193124 50.26 0.000 9.321686 10.0902
------------------------------------------------------------------------------
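
As an alternative to the .xi prefix, Stata 11 and later accept factor-variable notation directly in
the variable list. The following sketch assumes the airline data are in memory and that airline
is coded 1 through 6; it is offered as an illustration, not as the approach used elsewhere in
this document.

```stata
* first category (airline 1) is the base by default, as with xi:
regress cost i.airline output fuel load

* choose airline 6 as the reference group, matching LSDV1 with g6 dropped
regress cost ib6.airline output fuel load
```

The second command should reproduce the estimates of . regress cost g1-g5 output fuel load.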

4.3 LSDV2 without the Intercept

LSDV2 reports actual parameter estimates of the dummies. You do not need to compute the actual
Y-intercepts any more. Because LSDV2 suppresses the intercept, you will get incorrect F and $R^2$
statistics. However, the SSE of LSDV2 is correct.

In PROC REG, you need to use the /NOINT option to suppress the intercept. Obviously, the F
value of 497,985 and $R^2$ of 1 are not likely. However, SSE, parameter estimates of regressors,
and their standard errors are correct. Make sure that the intercepts presented in the beginning of
Section 4.2 are what we got here using LSDV2.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load /NOINT;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 9 16191 1799.03381 497985 <.0001
Error 81 0.29262 0.00361
Uncorrected Total 90 16192


Root MSE 0.06011 R-Square 1.0000
Dependent Mean 13.36561 Adj R-Sq 1.0000
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

g1 1 9.70594 0.19312 50.26 <.0001
g2 1 9.66471 0.19898 48.57 <.0001
g3 1 9.49702 0.22496 42.22 <.0001
g4 1 9.89050 0.24176 40.91 <.0001
g5 1 9.73000 0.26094 37.29 <.0001
g6 1 9.79300 0.26366 37.14 <.0001
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001

Stata uses the noconstant option to suppress the intercept. Notice that noc is its abbreviation.

. regress cost g1-g6 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 9, 81) = .
Model | 16191.3043 9 1799.03381 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | 9.705942 .193124 50.26 0.000 9.321686 10.0902
g2 | 9.664706 .198982 48.57 0.000 9.268794 10.06062
g3 | 9.497021 .2249584 42.22 0.000 9.049424 9.944618
g4 | 9.890498 .2417635 40.91 0.000 9.409464 10.37153
g5 | 9.729997 .2609421 37.29 0.000 9.210804 10.24919
g6 | 9.793004 .2636622 37.14 0.000 9.268399 10.31761
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
------------------------------------------------------------------------------

In LIMDEP, you need to drop ONE from the Rhs subcommand to suppress the intercept.
Unlike SAS and Stata, LIMDEP reports correct R² (.9974) and F (3,936) even in LSDV2.

REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:53:24PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 9.70594925 .19312325 50.258 .0000 .16666667
G2 | 9.66471527 .19898117 48.571 .0000 .16666667
G3 | 9.49703267 .22495746 42.217 .0000 .16666667
G4 | 9.89051381 .24176245 40.910 .0000 .16666667
G5 | 9.73001357 .26094094 37.288 .0000 .16666667
G6 | 9.79302127 .26366104 37.142 .0000 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

4.4 LSDV3 with Restrictions

LSDV3 imposes the restriction that the dummy parameters sum to zero. PROC REG has the
RESTRICT statement to impose restrictions. LSDV3 reports the correct ANOVA table and
parameter estimates of regressors, but its dummy coefficients differ from those of LSDV1 and
LSDV2 because a different baseline (the group average) is used.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 8 113.74827 14.21853 3935.79 <.0001
Error 81 0.29262 0.00361
Corrected Total 89 114.04089


Root MSE 0.06011 R-Square 0.9974
Dependent Mean 13.36561 Adj R-Sq 0.9972
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.71353 0.22964 42.30 <.0001
g1 1 -0.00759 0.04562 -0.17 0.8683
g2 1 -0.04882 0.03798 -1.29 0.2023
g3 1 -0.21651 0.01606 -13.48 <.0001
g4 1 0.17697 0.01942 9.11 <.0001
g5 1 0.01647 0.03669 0.45 0.6547
g6 1 0.07948 0.04050 1.96 0.0532
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001
RESTRICT -1 3.01674E-15 7.82306E-11 0.00 1.0000*

* Probability computed using beta distribution.

Each dummy coefficient is the deviation from the averaged group effect (9.7135). The actual
intercept of airline 2, for example, is 9.6647 = 9.7135 + (-.0488). Notice that the RESTRICT
estimate of 3.01674E-15 is virtually zero.
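
The relationship between LSDV3 and LSDV2 can be checked numerically. The Python sketch below is an illustration rather than part of the original analysis; it copies the rounded estimates from the two PROC REG outputs and uses illustrative variable names.

```python
# LSDV3 estimates from the PROC REG output with the RESTRICT statement.
intercept = 9.71353  # averaged group effect
deviations = [-0.00759, -0.04882, -0.21651, 0.17697, 0.01647, 0.07948]  # g1..g6

# LSDV2 dummy coefficients (actual group intercepts) from Section 4.3.
lsdv2 = [9.70594, 9.66471, 9.49702, 9.89050, 9.73000, 9.79300]

# Actual intercept of each airline = averaged group effect + its deviation.
actual = [intercept + d for d in deviations]

for airline, (a, b) in enumerate(zip(actual, lsdv2), start=1):
    print(f"airline {airline}: LSDV3-based {a:.4f}   LSDV2 {b:.4f}")

# The restriction forces the deviations to sum to zero, so the LSDV3
# intercept equals the mean of the six LSDV2 intercepts.
sum_dev = sum(deviations)
mean_lsdv2 = sum(lsdv2) / len(lsdv2)
print(f"sum of deviations = {sum_dev:.5f}, mean of LSDV2 intercepts = {mean_lsdv2:.5f}")
```

The two sets of intercepts should agree up to the rounding of the printed estimates.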

In Stata, you have to use the .cnsreg command instead of .regress. The command, however,
does not provide an ANOVA table or goodness-of-fit statistics other than F and SEE
(the standard error of the residual, i.e., the square root of MSE).

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression Number of obs = 90
F( 8, 81) = 3935.79
Prob > F = 0.0000
Root MSE = 0.0601

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | -.0075859 .0456178 -0.17 0.868 -.0983509 .0831792
g2 | -.0488218 .0379787 -1.29 0.202 -.1243875 .0267439
g3 | -.2165069 .0160624 -13.48 0.000 -.2484661 -.1845478
g4 | .1769698 .0194247 9.11 0.000 .1383208 .2156189
g5 | .0164689 .0366904 0.45 0.655 -.0565335 .0894712
g6 | .0794759 .0405008 1.96 0.053 -.001108 .1600597
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
------------------------------------------------------------------------------

LIMDEP has the Cls subcommand to impose restrictions. Again, do not forget to include ONE
in Rhs. b(2) in Cls: indicates the parameter of the second variable, g1, listed in Rhs.

REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 31, 2009 at 06:39:21PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
| Restrictns. F[ 1, 80] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 9.71354097 .22964002 42.299 .0000
G1 | -.00759172 .04561756 -.166 .8682 .16666667
G2 | -.04882570 .03797853 -1.286 .2023 .16666667
G3 | -.21650830 .01606233 -13.479 .0000 .16666667
G4 | .17697283 .01942459 9.111 .0000 .16666667
G5 | .01647259 .03669023 .449 .6547 .16666667
G6 | .07948030 .04050059 1.962 .0532 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

In the output above, LSDV3 in LIMDEP reports the same constant and dummy coefficients as SAS
and Stata, up to rounding. You may compute the actual intercepts of groups in the same manner.
The actual intercept of airline 5, for example, is 9.7300 = 9.7135 + .0165.

4.5 Within Group Effect Model

The within effect model does not use dummy variables and thus has larger degrees of freedom,
smaller MSE, and smaller standard errors of parameters than those of LSDV. As a consequence,
you need to adjust the standard errors. This model does not report individual dummy coefficients
either; you need to compute them if they are really needed. The SAS TSCSREG and PANEL
procedures and the LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (square
root of MSE), R², and standard errors.

4.5.1 Estimating the Within Effect Model

First, let us manually estimate the within group effect model with Stata. You need to compute
group means.

. quietly egen gm_cost=mean(cost), by(airline)
. quietly egen gm_output=mean(output), by(airline)
. quietly egen gm_fuel=mean(fuel), by(airline)
. quietly egen gm_load=mean(load), by(airline)

You will get the following group means of variables.

+------------------------------------------------------+
| airline gm_cost gm_output gm_fuel gm_load |
|------------------------------------------------------|
| 1 14.67563 .3192696 12.7318 .5971917 |
| 2 14.37247 -.033027 12.75171 .5470946 |
| 3 13.37231 -.9122626 12.78972 .5845358 |
| 4 13.1358 -1.635174 12.77803 .5476773 |
| 5 12.36304 -2.285681 12.7921 .5664859 |
| 6 12.27441 -2.49898 12.7788 .5197756 |
+------------------------------------------------------+

Then transform dependent and independent variables to compute deviations from group means.

. quietly gen gw_cost = cost - gm_cost
. quietly gen gw_output = output - gm_output
. quietly gen gw_fuel = fuel - gm_fuel
. quietly gen gw_load = load - gm_load

Now, we are ready to run the within effect model. Keep in mind that you have to suppress the
intercept. The within effect model reports correct SSE and parameter estimates of regressors
but incorrect R² and standard errors of parameter estimates. Notice that the degrees of freedom
increase from 81 (LSDV) to 87 since six dummy variables are not used.

. regress gw_cost gw_output gw_fuel gw_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 3871.82
Model | 39.0683861 3 13.0227954 Prob > F = 0.0000
Residual | .292622861 87 .003363481 R-squared = 0.9926
-------------+------------------------------ Adj R-squared = 0.9923
Total | 39.361009 90 .437344544 Root MSE = .058

------------------------------------------------------------------------------
gw_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gw_output | .9192846 .028841 31.87 0.000 .86196 .9766092
gw_fuel | .4174918 .0146657 28.47 0.000 .3883422 .4466414
gw_load | -1.070396 .1946109 -5.50 0.000 -1.457206 -.6835858
------------------------------------------------------------------------------

You may compute group intercepts using $d_i^{*} = \bar{y}_{i\bullet} - \beta'\bar{x}_{i\bullet}$.
For example, the intercept of airline 5 is computed as
9.730 = 12.3630 - {.9193*(-2.2857) + .4175*12.7921 + (-1.0704)*.5665}. In order to get the
correct standard errors, you need to adjust them using the ratio of degrees of freedom of the
within effect model and LSDV. For example, the standard error of the logged output is computed
as .0299 = .0288*sqrt(87/81).
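
The two adjustments above can be applied to every airline and every regressor at once. The following Python sketch is only an illustration; it assumes the slopes, unadjusted standard errors, and group means printed in the Stata output of this section, and the variable names are not from the original.

```python
import math

# Slopes and unadjusted standard errors from the within regression
# (order: output, fuel, load).
beta = [0.9192846, 0.4174918, -1.070396]
se_within = [0.028841, 0.0146657, 0.1946109]

# Group means (gm_cost, gm_output, gm_fuel, gm_load) for airlines 1 through 6.
group_means = [
    (14.67563, 0.3192696, 12.7318, 0.5971917),
    (14.37247, -0.033027, 12.75171, 0.5470946),
    (13.37231, -0.9122626, 12.78972, 0.5845358),
    (13.1358, -1.635174, 12.77803, 0.5476773),
    (12.36304, -2.285681, 12.7921, 0.5664859),
    (12.27441, -2.49898, 12.7788, 0.5197756),
]

# Intercept of group i = mean of y in group i - beta' * means of x in group i.
intercepts = [y - sum(b * x for b, x in zip(beta, xs)) for y, *xs in group_means]

# Degrees of freedom: nT - k in the within regression, nT - n - k in LSDV.
n, T, k = 6, 15, 3
adjust = math.sqrt((n * T - k) / (n * T - n - k))  # sqrt(87/81)
se_adjusted = [se * adjust for se in se_within]

for airline, a in enumerate(intercepts, start=1):
    print(f"airline {airline}: intercept = {a:.4f}")
for name, se in zip(["output", "fuel", "load"], se_adjusted):
    print(f"{name}: adjusted standard error = {se:.7f}")
```

The computed intercepts can be compared with the LSDV2 dummy coefficients, and the adjusted standard errors with those of the LSDV outputs.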

4.5.2 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL of SAS/ETS allow users to fit the within effect model
conveniently. They in fact report LSDV1, but you do not need to create dummy variables or
compute deviations from group means.

PROC SORT DATA=masil.airline;
BY airline year;
RUN;

A data set needs to be sorted in advance by the variables that appear in the ID statement
of PROC TSCSREG and PROC PANEL. These cross-sectional and time-series variables may
be numeric or string in SAS. The /FIXONE option of the MODEL statement fits a one-way fixed
effect model.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;

The TSCSREG Procedure
Fixed One Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixOne
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.2926 DFE 81
MSE 0.0036 Root MSE 0.0601
R-Square 0.9974


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

5 81 57.73 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 -0.08706 0.0842 -1.03 0.3042 Cross Sectional Effect 1
CS2 1 -0.1283 0.0757 -1.69 0.0941 Cross Sectional Effect 2
CS3 1 -0.29598 0.0500 -5.92 <.0001 Cross Sectional Effect 3
CS4 1 0.097494 0.0330 2.95 0.0041 Cross Sectional Effect 4
CS5 1 -0.06301 0.0239 -2.64 0.0100 Cross Sectional Effect 5
Intercept 1 9.793004 0.2637 37.14 <.0001 Intercept
output 1 0.919285 0.0299 30.76 <.0001
fuel 1 0.417492 0.0152 27.47 <.0001
load 1 -1.0704 0.2017 -5.31 <.0001

The following PANEL procedure returns the same output.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;

Both PROC TSCSREG and PROC PANEL report correct (adjusted) MSE, SEE, R², and
standard errors, and conduct the F test for fixed group effect as well. They have strong
advantages over other software packages in this respect.

4.5.3 Using Stata

The Stata .xtreg command fits the within group effect model without creating dummy
variables. .xtreg should follow the .tsset command, which specifies the cross-sectional and
time-series variables. Both variables should be numeric in Stata; string variables are not
allowed in .tsset.

. quietly tsset airline year

The fe option of .xtreg indicates the within effect model and i(airline) specifies airline
as the independent unit. Once .tsset is executed, i(airline) is redundant. Compared with
LSDV1, this command reports an incorrect F of 3,604.80 and R² of .9926.

. xtreg cost output fuel load, fe i(airline)

Fixed-effects (within) regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9926 Obs per group: min = 15
between = 0.9856 avg = 15.0
overall = 0.9873 max = 15

F(3,81) = 3604.80
corr(u_i, Xb) = -0.3475 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
-------------+----------------------------------------------------------------
sigma_u | .1320775
sigma_e | .06010514
rho | .82843653 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5, 81) = 57.73 Prob > F = 0.0000

Like PROC PANEL, .xtreg reports correct standard errors and the F test for a fixed group
effect. But this command does not provide an analysis of variance (ANOVA) table, and its R² and
F statistic are not correct. The last line of the output tests the null hypothesis that the five
dummy parameters in LSDV1 are zero (i.e., μ₁=0, μ₂=0, μ₃=0, μ₄=0, and μ₅=0). Notice that the
intercept of 9.7135 is that of LSDV3.
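
The F and R² reported by .xtreg appear to be computed on the within-transformed data with the LSDV degrees of freedom. The Python sketch below is a hedged arithmetic check, not part of the original analysis; it assumes the sums of squares printed by the manual within regression in Section 4.5.1 and the corrected total from the LSDV ANOVA tables, with illustrative variable names.

```python
# Sums of squares from the manual within regression (Section 4.5.1).
ss_model_within = 39.0683861
sse = 0.292622861
ss_total_within = 39.361009   # variation of cost around the airline means

# Corrected total from the ANOVA tables of LSDV1 and LSDV3.
ss_total_overall = 114.04089

k = 3          # regressors: output, fuel, load
df_error = 81  # nT - n - k = 90 - 6 - 3

# R-squared relative to the within variation versus the overall variation.
r2_within = 1 - sse / ss_total_within
r2_lsdv = 1 - sse / ss_total_overall

# F test of the three regressors only; the group dummies are not tested here.
f_regressors = (ss_model_within / k) / (sse / df_error)

print(f"within R2 = {r2_within:.4f}, LSDV R2 = {r2_lsdv:.4f}")
print(f"F({k}, {df_error}) = {f_regressors:.2f}")
```

Under these assumptions, the within R² and the F for the regressors match the .xtreg output, while the LSDV R² matches PROC REG, PROC TSCSREG, and LIMDEP.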

Alternatively, you may use .areg to get the same result except for R², which is correct. The
intercept 9.7135 is the average of the six airline intercepts, that is, the intercept of LSDV3.

. areg cost output fuel load, absorb(airline)

Linear regression, absorbing indicators Number of obs = 90
F( 3, 81) = 3604.80
Prob > F = 0.0000
R-squared = 0.9974
Adj R-squared = 0.9972
Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
-------------+----------------------------------------------------------------
airline | F(5, 81) = 57.732 0.000 (6 categories)

4.5.4 Using LIMDEP

In LIMDEP, the Panel and Fixed subcommands in the Regress$ command fit a fixed effect
panel data model. The Str subcommand specifies a stratification variable.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Fixed$

+----------------------------------------------------+
| OLS Without Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:56:52PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 4 |
| Degrees of freedom = 86 |
| Residuals Sum of squares = 1.335450 |
| Standard error of e = .1246133 |
| Fit R-squared = .9882897 |
| Adjusted R-squared = .9878812 |
| Model test F[ 3, 86] (prob) =2419.33 (.0000) |
| Diagnostic Log likelihood = 61.76991 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 3] (prob) = 400.26 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.121594 |
| Akaike Info. Criter. = -4.121653 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel Data Analysis of COST [ONE way] |
| Unconditional ANOVA (No regressors) |
| Source Variation Deg. Free. Mean Square |
| Between 74.6799 5. 14.9360 |
| Residual 39.3611 84. .468584 |
| Total 114.041 89. 1.28136 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88273863 .01325455 66.599 .0000 -1.17430918
FUEL | .45397771 .02030424 22.359 .0000 12.7703592
LOAD | -1.62750780 .34530293 -4.713 .0000 .56046016
Constant| 9.51691223 .22924522 41.514 .0000

+----------------------------------------------------+
| Least Squares with Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:56:52PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Estd. Autocorrelation of e(i,t) .573531 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -90.48804 .3936109461D+02 .6548513 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 130.08647 .2926207777D+00 .9974341 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 95.740 5 .00000 31.875 5 84 .00000 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 536.889 8 .00000 3935.818 8 81 .00000 |
|(4) vs (2) 441.149 3 .00000 3604.832 3 81 .00000 |
|(4) vs (3) 136.633 5 .00000 57.733 5 81 .00000 |
+--------------------------------------------------------------------+

LIMDEP reports both the pooled OLS regression under the label OLS Without Group Dummy
Variables and the within effect model under Least Squares with Group Dummy
Variables. Like the SAS TSCSREG procedure, LIMDEP provides correct MSE, SEE, R², and
standard errors of the fixed effect model. LIMDEP also conducts the F test for checking a fixed
group effect (see the last line of the LIMDEP output above to get 57.733).

4.6 Between Group Effect Model: Group Mean Regression

A between effect model uses aggregate information, that is, group means of variables. In other
words, the unit of analysis is not an individual observation but an entity or subject. The number
of observations drops from nT to n. This group mean regression produces goodness-of-fit
measures and parameter estimates that differ from those of LSDV and the within effect model.

Let us compute group means and run OLS with them. The .collapse command computes
aggregate information and replaces the data set in memory with it. This model fits the data
relatively well, but its t-tests find only output significant at the .05 level. Note that ///
joins two command lines.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

Source | SS df MS Number of obs = 6
-------------+------------------------------ F( 3, 2) = 104.12
Model | 4.94698124 3 1.64899375 Prob > F = 0.0095
Residual | .031675926 2 .015837963 R-squared = 0.9936
-------------+------------------------------ Adj R-squared = 0.9841
Total | 4.97865717 5 .995731433 Root MSE = .12585

------------------------------------------------------------------------------
gm_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gm_output | .7824568 .1087646 7.19 0.019 .3144803 1.250433
gm_fuel | -5.523904 4.478718 -1.23 0.343 -24.79427 13.74647
gm_load | -1.751072 2.743167 -0.64 0.589 -13.55397 10.05182
_cons | 85.8081 56.48199 1.52 0.268 -157.2143 328.8305
------------------------------------------------------------------------------

The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between
effect model, but PROC TSCSREG does not. /BTWNG and /BTWNT fit the between group
and between time effect models, respectively.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNG;
RUN;

The PANEL Procedure
Between Groups Estimates

Dependent Variable: cost

Model Description

Estimation Method BtwGrps
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.0317 DFE 2
MSE 0.0158 Root MSE 0.1258
R-Square 0.9936


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

Intercept 1 85.80901 56.4830 1.52 0.2681 Intercept
output 1 0.782455 0.1088 7.19 0.0188
fuel 1 -5.52398 4.4788 -1.23 0.3427
load 1 -1.75102 2.7432 -0.64 0.5886

The Stata .xtreg command has the be option to fit the between effect model but does not
report the ANOVA table.

. xtreg cost output fuel load, be i(airline)

Between regression (regression on group means) Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.8808 Obs per group: min = 15
between = 0.9936 avg = 15.0
overall = 0.1371 max = 15

F(3,2) = 104.12
sd(u_i + avg(e_i.))= .1258491 Prob > F = 0.0095

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .7824552 .1087663 7.19 0.019 .3144715 1.250439
fuel | -5.523978 4.478802 -1.23 0.343 -24.79471 13.74675
load | -1.751016 2.74319 -0.64 0.589 -13.55401 10.05198
_cons | 85.80901 56.48302 1.52 0.268 -157.2178 328.8358
------------------------------------------------------------------------------

LIMDEP has the Means subcommand to fit the between effect model.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Means$

+----------------------------------------------------+
| Group Means Regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:04:12PM |
| LHS=YBAR(i.) Mean = 13.36561 |
| Standard deviation = .9978636 |
| WTS=NTi/Nobs Number of observs. = 6 |
| Model size Parameters = 4 |
| Degrees of freedom = 2 |
| Residuals Sum of squares = .3167277E-01 |
| Standard error of e = .1258427 |
| Fit R-squared = .9936383 |
| Adjusted R-squared = .9840957 |
| Model test F[ 3, 2] (prob) = 104.13 (.0095) |
| Diagnostic Log likelihood = 7.218541 |
| Restricted(b=0) = -7.953835 |
| Chi-sq [ 3] (prob) = 30.34 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -3.634619 |
| Akaike Info. Criter. = -3.910724 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .78244727 .10876126 7.194 .0000 .230256D-11
FUEL | -5.52443747 4.47865187 -1.234 .2174 .18642891
LOAD | -1.75094765 2.74304702 -.638 .5233 .32541105
Constant| 85.8148317 56.4811479 1.519 .1287

SAS, Stata, and LIMDEP all report the same result: SSE .0317, SEE .1258, F 104.12 (p=.0095),
and R² .9936.
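
These goodness-of-fit figures follow from the two sums of squares of the group mean regression and its two degrees of freedom. The Python sketch below is only an arithmetic illustration, assuming the sums of squares printed by the Stata .regress output on the collapsed data; the variable names are illustrative.

```python
import math

# Sums of squares from the group mean regression (six airline means).
ss_model = 4.94698124
sse = 0.031675926

n_groups, k = 6, 3
df_error = n_groups - k - 1  # 6 - 3 - 1 = 2

mse = sse / df_error
see = math.sqrt(mse)               # standard error of the estimate
f_stat = (ss_model / k) / mse      # model F with (3, 2) degrees of freedom
r2 = ss_model / (ss_model + sse)

print(f"SEE = {see:.4f}, F({k}, {df_error}) = {f_stat:.2f}, R2 = {r2:.4f}")
```

With only six group means and four parameters, two degrees of freedom remain, which is why the standard errors of the between effect model are so large.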

4.7 Testing Fixed Group Effects (F-test)

How do we know whether there is a significant fixed group effect? The null hypothesis is that
all dummy parameters except for one are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$.

In order to conduct an F-test, let us obtain the SSE (e'e) of 1.3354 from the pooled OLS
regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model.
Alternatively, you may draw R² of .9974 from LSDV1 or LSDV3 and .9883 from the pooled
OLS. Do not, however, use LSDV2 and the within effect model for R².

The F statistic is computed as

$$\frac{(1.3354 - .2926)/(6-1)}{(.2926)/(90-6-3)} = \frac{(.9974 - .9883)/(6-1)}{(1 - .9974)/(90-6-3)} \sim 57.7319\,[5, 81].$$

The large F statistic rejects the null hypothesis in favor of the fixed group effect model
(p<.0001). There is a fixed group effect in these panel data.
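
The two forms of the F statistic can be verified with a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions: it takes the SSE and R² values at the precision printed in the LIMDEP output, and the function names are hypothetical.

```python
def f_from_sse(sse_pooled, sse_lsdv, n, T, k):
    """F statistic for H0: no fixed group effect, from the two SSEs."""
    return ((sse_pooled - sse_lsdv) / (n - 1)) / (sse_lsdv / (n * T - n - k))


def f_from_r2(r2_pooled, r2_lsdv, n, T, k):
    """The same F statistic expressed with the two R-squared values."""
    return ((r2_lsdv - r2_pooled) / (n - 1)) / ((1 - r2_lsdv) / (n * T - n - k))


n, T, k = 6, 15, 3  # airlines, years, regressors

f_sse = f_from_sse(1.335450, 0.2926208, n, T, k)
f_r2 = f_from_r2(0.9882897, 0.9974341, n, T, k)

print(f"F from SSE = {f_sse:.2f}, F from R2 = {f_r2:.2f}, "
      f"df = ({n - 1}, {n * T - n - k})")
```

Both forms should agree closely. With R² rounded to four digits the second form drifts noticeably, so more digits are advisable when the R² version is used.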

The SAS TSCSREG and PANEL procedures, Stata .xtreg command, and LIMDEP Regress$
command by default conduct the F test. Alternatively, you may conduct the same test in
LSDV1. In SAS, add the TEST statement in PROC REG and then run the procedure again
(ANOVA table and parameter estimates are skipped).

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;

The REG Procedure
Model: MODEL1

Test 1 Results for Dependent Variable cost

Mean
Source DF Square F Value Pr > F

Numerator 5 0.20856 57.73 <.0001
Denominator 81 0.00361

In Stata, run the .test command, a follow-up command for the Wald test, right after
estimating the model.

. quietly regress cost g1-g5 output fuel load
. test g1 g2 g3 g4 g5

( 1) g1 = 0
( 2) g2 = 0
( 3) g3 = 0
( 4) g4 = 0
( 5) g5 = 0

F( 5, 81) = 57.73
Prob > F = 0.0000

4.8 Summary

Table 4.1 summarizes the estimation of a fixed effect model in SAS, Stata, and LIMDEP. The
SAS PANEL procedure is generally preferred to Stata and LIMDEP counterparts since it
produces correct statistics and conducts various hypothesis tests conveniently.

Table 4.1 Comparison of the Fixed Effect Model in SAS, Stata, and LIMDEP*

                   SAS 9                        Stata 11                         LIMDEP 9
OLS estimation     PROC REG;                    .regress, .cnsreg                Regress$
LSDV1              Correct                      Correct                          Correct (slightly different F)
LSDV2              Incorrect F, (adjusted) R²   Incorrect F, (adjusted) R²       Correct (slightly different F)
LSDV3              Correct                      .cnsreg; no ANOVA table and R²   Correct (slightly different F);
                                                                                 different dummy coefficients
Panel estimation   PROC TSCSREG; PROC PANEL;    .xtreg, .areg                    Regress; Panel$
Estimation type    LSDV1                        Within effect                    Within effect
SSE (e'e)          Correct                      No                               Correct
MSE or SEE         Correct (adjusted)           No                               Correct (adjusted) SEE
Model test (F)     No                           Incorrect                        Slightly different F
(adjusted) R²      Correct                      Incorrect (correct in .areg)     Correct
Intercept          Correct                      LSDV3 intercept                  No
Coefficients       Correct                      Correct                          Correct
Standard errors    Correct (adjusted)           Correct (adjusted)               Correct (adjusted)
Effect test (F)    Yes                          Yes                              Yes
Between effect     /BTWNG, /BTWNT               ,be                              Means;

* "Yes/No" indicates whether the software reports the statistics. "Correct/Incorrect" indicates whether the statistics
are different from those of the least squares dummy variable (LSDV) 1 without a dummy variable.

5. One-way Fixed Effect Models: Time Effects

A fixed time effect model investigates how time affects the intercept using time dummy
variables. The logic and method are the same as those of the fixed group effect model.

5.1 Least Squares Dummy Variable Models

The least squares dummy variable (LSDV) model produces the following fifteen regression
equations

Time 01: cost = 20.4959 + .8677*output - .4845*fuel -1.9544*load
Time 02: cost = 20.5782 + .8677*output - .4845*fuel -1.9544*load
Time 03: cost = 20.6559 + .8677*output - .4845*fuel -1.9544*load
Time 04: cost = 20.7409 + .8677*output - .4845*fuel -1.9544*load
Time 05: cost = 21.2000 + .8677*output - .4845*fuel -1.9544*load
Time 06: cost = 21.4118 + .8677*output - .4845*fuel -1.9544*load
Time 07: cost = 21.5035 + .8677*output - .4845*fuel -1.9544*load
Time 08: cost = 21.6542 + .8677*output - .4845*fuel -1.9544*load
Time 09: cost = 21.8297 + .8677*output - .4845*fuel -1.9544*load
Time 10: cost = 22.1140 + .8677*output - .4845*fuel -1.9544*load
Time 11: cost = 22.4655 + .8677*output - .4845*fuel -1.9544*load
Time 12: cost = 22.6515 + .8677*output - .4845*fuel -1.9544*load
Time 13: cost = 22.6167 + .8677*output - .4845*fuel -1.9544*load
Time 14: cost = 22.5524 + .8677*output - .4845*fuel -1.9544*load
Time 15: cost = 22.5369 + .8677*output - .4845*fuel -1.9544*load

5.1.1 LSDV1 without a Dummy

In the SAS REG procedure, include time dummy variables instead of group dummies. You need to
exclude one of the time dummies, say t15 here, in LSDV1.

PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 17 112.95270 6.64428 439.62 <.0001
Error 72 1.08819 0.01511
Corrected Total 89 114.04089


Root MSE 0.12294 R-Square 0.9905
Dependent Mean 13.36561 Adj R-Sq 0.9882
Coeff Var 0.91981


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 22.53677 4.94053 4.56 <.0001
t1 1 -2.04096 0.73469 -2.78 0.0070
t2 1 -1.95873 0.72275 -2.71 0.0084
t3 1 -1.88103 0.72036 -2.61 0.0110
t4 1 -1.79601 0.69882 -2.57 0.0122
t5 1 -1.33693 0.50604 -2.64 0.0101
t6 1 -1.12514 0.40862 -2.75 0.0075
t7 1 -1.03341 0.37642 -2.75 0.0076
t8 1 -0.88274 0.32601 -2.71 0.0085
t9 1 -0.70719 0.29470 -2.40 0.0190
t10 1 -0.42296 0.16679 -2.54 0.0134
t11 1 -0.07144 0.07176 -1.00 0.3228
t12 1 0.11457 0.09841 1.16 0.2482
t13 1 0.07979 0.08442 0.95 0.3477
t14 1 0.01546 0.07264 0.21 0.8320
output 1 0.86773 0.01541 56.32 <.0001
fuel 1 -0.48448 0.36411 -1.33 0.1875
load 1 -1.95440 0.44238 -4.42 <.0001

In Stata and LIMDEP, execute the following commands to fit the same LSDV1 (output is skipped).

. regress cost t1-t14 output fuel load

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$
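
The fifteen period-specific intercepts listed in 5.1 can be recovered from the LSDV1 output by adding each time dummy coefficient to the baseline intercept. The following is a minimal sketch in Python (not one of the packages covered in this document); the variable names are illustrative, and the numbers are copied from the PROC REG output above, so the results agree with the listed equations only up to rounding.

```python
# LSDV1 estimates copied from the PROC REG output above (t15 is the omitted baseline).
lsdv1_intercept = 22.53677
lsdv1_time_dummies = [
    -2.04096, -1.95873, -1.88103, -1.79601, -1.33693,
    -1.12514, -1.03341, -0.88274, -0.70719, -0.42296,
    -0.07144, 0.11457, 0.07979, 0.01546,
]

# Intercept of period k = baseline intercept + dummy coefficient of period k.
# The baseline period (t15) has no dummy, so its intercept is the LSDV1 intercept itself.
time_intercepts = [lsdv1_intercept + d for d in lsdv1_time_dummies] + [lsdv1_intercept]

for period, value in enumerate(time_intercepts, start=1):
    print(f"Time {period:02d}: intercept = {value:.4f}")
```

Each dummy coefficient in LSDV1 is therefore the distance of that period's intercept from the intercept of the omitted period.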

5.1.2 LSDV2 without the Intercept

In LIMDEP, take ONE out to fit LSDV2 by suppressing the intercept. Unlike SAS and Stata,
LIMDEP reports correct, although slightly different, F and $R^2$ statistics.

REGRESS;Lhs=COST;Rhs=T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:15:08PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Autocorrel Durbin-Watson Stat. = .2363289 |
| Rho = cor[e,e(-1)] = .8818355 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
T1 | 20.4959389 4.20954636 4.869 .0000 .06666667
T2 | 20.5781713 4.22154389 4.875 .0000 .06666667
T3 | 20.6558664 4.22419549 4.890 .0000 .06666667
T4 | 20.7408923 4.24576770 4.885 .0000 .06666667
T5 | 21.1999763 4.44035103 4.774 .0000 .06666667
T6 | 21.4117634 4.53864000 4.718 .0000 .06666667
T7 | 21.5034994 4.57141663 4.704 .0000 .06666667
T8 | 21.6541766 4.62290530 4.684 .0000 .06666667
T9 | 21.8297215 4.65692608 4.688 .0000 .06666667
T10 | 22.1139553 4.79266903 4.614 .0000 .06666667
T11 | 22.4654855 4.94992975 4.539 .0000 .06666667
T12 | 22.6514956 5.00861379 4.523 .0000 .06666667
T13 | 22.6167135 4.98616006 4.536 .0000 .06666667
T14 | 22.5523879 4.95596262 4.551 .0000 .06666667
T15 | 22.5369251 4.94055238 4.562 .0000 .06666667
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1875 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016

In SAS and Stata, use /NOINT and noconstant, respectively, to suppress the intercept and
estimate the same LSDV2 (output is skipped).

PROC REG DATA=masil.airline;
MODEL cost = t1-t15 output fuel load /NOINT;
RUN;

. regress cost t1-t15 output fuel load, noc

5.1.3 LSDV3 with a Restriction

In PROC REG, you need to impose a restriction using the RESTRICT statement.

PROC REG DATA=masil.airline;
MODEL cost = t1-t15 output fuel load;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 17 112.95270 6.64428 439.62 <.0001
Error 72 1.08819 0.01511
Corrected Total 89 114.04089


Root MSE 0.12294 R-Square 0.9905
Dependent Mean 13.36561 Adj R-Sq 0.9882
Coeff Var 0.91981


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 21.66698 4.62405 4.69 <.0001
t1 1 -1.17118 0.41783 -2.80 0.0065
t2 1 -1.08894 0.40586 -2.68 0.0090
t3 1 -1.01125 0.40323 -2.51 0.0144
t4 1 -0.92622 0.38177 -2.43 0.0178
t5 1 -0.46715 0.19076 -2.45 0.0168
t6 1 -0.25536 0.09856 -2.59 0.0116
t7 1 -0.16363 0.07190 -2.28 0.0258
t8 1 -0.01296 0.04862 -0.27 0.7907
t9 1 0.16259 0.06271 2.59 0.0115
t10 1 0.44682 0.17599 2.54 0.0133
t11 1 0.79834 0.32940 2.42 0.0179
t12 1 0.98435 0.38756 2.54 0.0132
t13 1 0.94957 0.36537 2.60 0.0113
t14 1 0.88524 0.33549 2.64 0.0102
t15 1 0.86978 0.32029 2.72 0.0083
output 1 0.86773 0.01541 56.32 <.0001
fuel 1 -0.48448 0.36411 -1.33 0.1875
load 1 -1.95440 0.44238 -4.42 <.0001
RESTRICT -1 -3.9462E-15 . . .

* Probability computed using beta distribution.

In Stata, define the restriction with the .constraint command and specify the restriction using
the constraint() option of the .cnsreg command.

. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost t1-t15 output fuel load, constraint(3)

Constrained linear regression Number of obs = 90
F( 17, 72) = 439.62
Prob > F = 0.0000
Root MSE = 0.1229

( 1) t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t1 | -1.171179 .4178338 -2.80 0.007 -2.004115 -.3382422
t2 | -1.088945 .4058579 -2.68 0.009 -1.898008 -.2798816
t3 | -1.011252 .4032308 -2.51 0.014 -1.815078 -.2074266
t4 | -.9262249 .3817675 -2.43 0.018 -1.687265 -.1651852
t5 | -.4671515 .1907596 -2.45 0.017 -.8474239 -.0868791
t6 | -.2553627 .0985615 -2.59 0.012 -.4518415 -.0588839
t7 | -.1636326 .0718969 -2.28 0.026 -.3069564 -.0203088
t8 | -.0129552 .0486249 -0.27 0.791 -.1098872 .0839768
t9 | .1625876 .0627099 2.59 0.012 .0375776 .2875976
t10 | .4468191 .175994 2.54 0.013 .0959814 .7976568
t11 | .7983439 .3294027 2.42 0.018 .1416916 1.454996
t12 | .9843536 .3875583 2.54 0.013 .2117702 1.756937
t13 | .9495716 .3653675 2.60 0.011 .2212248 1.677918
t14 | .8852448 .3354912 2.64 0.010 .2164554 1.554034
t15 | .8697821 .3202933 2.72 0.008 .2312891 1.508275
output | .8677268 .0154082 56.32 0.000 .8370111 .8984424
fuel | -.4844835 .3641085 -1.33 0.188 -1.210321 .2413535
load | -1.954404 .4423777 -4.42 0.000 -2.836268 -1.07254
_cons | 21.66698 4.624053 4.69 0.000 12.4491 30.88486
------------------------------------------------------------------------------

In LIMDEP, run the following command to fit the same LSDV3.

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)+b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:16:47PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Autocorrel Durbin-Watson Stat. = .2363289 |
| Rho = cor[e,e(-1)] = .8818355 |
| Restrictns. F[ 1, 71] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
T1 | -1.17119233 .41783540 -2.803 .0065 .06666667
T2 | -1.08895999 .40585988 -2.683 .0091 .06666667
T3 | -1.01126486 .40323211 -2.508 .0144 .06666667
T4 | -.92623900 .38176914 -2.426 .0178 .06666667
T5 | -.46715493 .19075952 -2.449 .0168 .06666667
T6 | -.25536788 .09856234 -2.591 .0116 .06666667
T7 | -.16363186 .07189683 -2.276 .0259 .06666667
T8 | -.01295461 .04862498 -.266 .7907 .06666667
T9 | .16259020 .06271009 2.593 .0116 .06666667
T10 | .44682406 .17599505 2.539 .0133 .06666667
T11 | .79835421 .32940389 2.424 .0179 .06666667
T12 | .98436437 .38755999 2.540 .0133 .06666667
T13 | .94958221 .36536879 2.599 .0114 .06666667
T14 | .88525662 .33549236 2.639 .0102 .06666667
T15 | .86979380 .32029396 2.716 .0083 .06666667
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1876 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016
Constant| 21.6671313 4.62407240 4.686 .0000
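
The link between LSDV2 and LSDV3 can be checked by hand: under the restriction that the time dummy coefficients sum to zero, the LSDV3 intercept is the average of the fifteen LSDV2 coefficients, and each LSDV3 dummy coefficient is a deviation from that average. The Python sketch below assumes only the LIMDEP LSDV2 coefficients reported in 5.1.2; the variable names are illustrative.

```python
# LSDV2 time coefficients T1-T15 copied from the LIMDEP output in Section 5.1.2.
lsdv2_time_coefs = [
    20.4959389, 20.5781713, 20.6558664, 20.7408923, 21.1999763,
    21.4117634, 21.5034994, 21.6541766, 21.8297215, 22.1139553,
    22.4654855, 22.6514956, 22.6167135, 22.5523879, 22.5369251,
]

# Under the restriction sum(dummy coefficients) = 0, the LSDV3 intercept is the
# average of the period-specific intercepts ...
lsdv3_intercept = sum(lsdv2_time_coefs) / len(lsdv2_time_coefs)

# ... and each LSDV3 dummy coefficient is a deviation from that average.
lsdv3_time_coefs = [c - lsdv3_intercept for c in lsdv2_time_coefs]

print(f"LSDV3 intercept: {lsdv3_intercept:.4f}")
for period, value in enumerate(lsdv3_time_coefs, start=1):
    print(f"t{period}: {value:.4f}")
```

The computed values can be compared with the Constant and T1-T15 rows of the LIMDEP LSDV3 output above.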

5.2 Within Time Effect Model

The within effect model for a fixed time effect needs to compute deviations from time means.
Keep in mind that the intercept should be suppressed.

5.2.1 Estimating the Fixed Time Effect Model

Let us manually estimate the fixed time effect model first.

. quietly egen tm_cost = mean(cost), by(year)
. quietly egen tm_output = mean(output), by(year)
. quietly egen tm_fuel = mean(fuel), by(year)
. quietly egen tm_load = mean(load), by(year)

+---------------------------------------------------+
| year tm_cost tm_output tm_fuel tm_load |
|---------------------------------------------------|
| 1 12.36897 -1.790283 11.63606 .4788587 |
| 2 12.45963 -1.744389 11.66868 .4868322 |
| 3 12.60706 -1.577767 11.67494 .52358 |
| 4 12.77912 -1.443695 11.73193 .5244486 |
| 5 12.94143 -1.398122 12.26843 .5635266 |
| 6 13.0452 -1.393002 12.53826 .5541809 |
| 7 13.15965 -1.302416 12.62714 .5607425 |
| 8 13.29884 -1.222963 12.76768 .5670587 |
| 9 13.4651 -1.067003 12.86104 .6179098 |
| 10 13.70187 -.9023156 13.23183 .6233943 |
| 11 13.91324 -.9205539 13.66246 .5802577 |
| 12 14.05984 -.8641667 13.82315 .5856243 |
| 13 14.12841 -.7923916 13.75979 .5803183 |
| 14 14.23517 -.6428015 13.67403 .5804528 |
| 15 14.32062 -.5527684 13.62997 .5797168 |
+---------------------------------------------------+

Once time means are ready, transform the dependent and independent variables and then run
OLS with the intercept suppressed.

. quietly gen tw_cost = cost - tm_cost
. quietly gen tw_output = output - tm_output
. quietly gen tw_fuel = fuel - tm_fuel
. quietly gen tw_load = load - tm_load

. regress tw_cost tw_output tw_fuel tw_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 2015.95
Model | 75.6459391 3 25.215313 Prob > F = 0.0000
Residual | 1.08819023 87 .012507934 R-squared = 0.9858
-------------+------------------------------ Adj R-squared = 0.9853
Total | 76.7341294 90 .852601437 Root MSE = .11184

------------------------------------------------------------------------------
tw_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tw_output | .8677268 .0140171 61.90 0.000 .8398663 .8955873
tw_fuel | -.4844836 .3312359 -1.46 0.147 -1.142851 .1738836
tw_load | -1.954404 .4024388 -4.86 0.000 -2.754295 -1.154514
------------------------------------------------------------------------------

If you want to get intercepts of years, use $d_t^{*} = \bar{y}_t - \bar{x}_t'\beta$. For example, the intercept of year 7
is 21.5035=13.1597-{.8677*(-1.3024) + (-.4845)*12.6271 + (-1.9544)*.5607}. As discussed
previously, standard errors of a within effect model need to be adjusted because the within
regression uses 87 (=90-3) degrees of freedom, whereas the correct number is 72 (=90-15-3). For
instance, the correct standard error of fuel price is computed as .3641= .3312*sqrt(87/72).

. sum cost output fuel load if year==7

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 6 13.15965 1.071738 11.88492 14.52004
output | 6 -1.302416 1.272691 -2.865108 .2550375
fuel | 6 12.62714 .0747646 12.48162 12.68725
load | 6 .5607425 .029541 .510342 .594495
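
The two hand calculations in this subsection (the intercept of year 7 and the degrees-of-freedom correction of a standard error) can be reproduced as follows. This is a small Python sketch using the within estimates and the year 7 means reported above; the variable names are illustrative, not part of any package.

```python
import math

# Within-estimator slopes (output, fuel, load) from the regression of deviations above.
slopes = [0.8677268, -0.4844836, -1.954404]

# Year 7 means from the .sum output above: cost, then (output, fuel, load).
mean_cost_year7 = 13.15965
mean_x_year7 = [-1.302416, 12.62714, 0.5607425]

# Intercept of year t = mean of y in year t - (means of x in year t)' * slopes
intercept_year7 = mean_cost_year7 - sum(b * x for b, x in zip(slopes, mean_x_year7))

# Degrees-of-freedom correction for the standard error of fuel.
n_obs, n_periods, n_regressors = 90, 15, 3
df_reported = n_obs - n_regressors               # 87, used by the within regression
df_correct = n_obs - n_periods - n_regressors    # 72, accounts for the 15 time means
se_fuel_reported = 0.3312359
se_fuel_adjusted = se_fuel_reported * math.sqrt(df_reported / df_correct)

print(f"Intercept of year 7: {intercept_year7:.4f}")
print(f"Adjusted SE of fuel: {se_fuel_adjusted:.4f}")
```

The same correction factor, sqrt(87/72), applies to the standard errors of output and load.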

5.2.2 Using SAS: PROC TSCSREG and PROC PANEL

Sort the data set by the variables (i.e., year and airline) that will appear in the ID
statement of PROC TSCSREG and PROC PANEL. The output is very similar to that of
LSDV1 in Section 5.1.1.

PROC SORT DATA=masil.airline;
BY year airline;
RUN;

PROC TSCSREG DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /FIXONE;
RUN;

(output is skipped)

The F test does not reject the null hypothesis of no fixed time effect (F=1.17, p=.3178); that is,
these panel data provide no evidence of a fixed time effect.

PROC PANEL DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /FIXONE;
RUN;

The PANEL Procedure
Fixed One Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixOne
Number of Cross Sections 15
Time Series Length 6


Fit Statistics

SSE 1.0882 DFE 72
MSE 0.0151 Root MSE 0.1229
R-Square 0.9905


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

14 72 1.17 0.3178


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 -2.04096 0.7347 -2.78 0.0070 Cross Sectional
Effect 1
CS2 1 -1.95873 0.7228 -2.71 0.0084 Cross Sectional
Effect 2
CS3 1 -1.88103 0.7204 -2.61 0.0110 Cross Sectional
Effect 3
CS4 1 -1.79601 0.6988 -2.57 0.0122 Cross Sectional
Effect 4
CS5 1 -1.33693 0.5060 -2.64 0.0101 Cross Sectional
Effect 5
CS6 1 -1.12514 0.4086 -2.75 0.0075 Cross Sectional
Effect 6
CS7 1 -1.03341 0.3764 -2.75 0.0076 Cross Sectional
Effect 7
CS8 1 -0.88274 0.3260 -2.71 0.0085 Cross Sectional
Effect 8
CS9 1 -0.70719 0.2947 -2.40 0.0190 Cross Sectional
Effect 9
CS10 1 -0.42296 0.1668 -2.54 0.0134 Cross Sectional
Effect 10
CS11 1 -0.07144 0.0718 -1.00 0.3228 Cross Sectional
Effect 11
CS12 1 0.114571 0.0984 1.16 0.2482 Cross Sectional
Effect 12
CS13 1 0.079789 0.0844 0.95 0.3477 Cross Sectional
Effect 13
CS14 1 0.015463 0.0726 0.21 0.8320 Cross Sectional
Effect 14
Intercept 1 22.53677 4.9405 4.56 <.0001 Intercept
output 1 0.867727 0.0154 56.32 <.0001
fuel 1 -0.48448 0.3641 -1.33 0.1875
load 1 -1.9544 0.4424 -4.42 <.0001

5.2.3 Using Stata

In the Stata .xtreg command, the fe option fits the fixed effect model. The following .iis
command specifies year as the panel identification variable; once it has been run, i(year) is redundant.

. iis year

. xtreg cost output fuel load, fe i(year)

Fixed-effects (within) regression Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9858 Obs per group: min = 6
between = 0.4812 avg = 6.0
overall = 0.5265 max = 6

F(3,72) = 1668.37
corr(u_i, Xb) = -0.1503 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8677268 .0154082 56.32 0.000 .8370111 .8984424
fuel | -.4844835 .3641085 -1.33 0.188 -1.210321 .2413535
load | -1.954404 .4423777 -4.42 0.000 -2.836268 -1.07254
_cons | 21.66698 4.624053 4.69 0.000 12.4491 30.88486
-------------+----------------------------------------------------------------
sigma_u | .8027907
sigma_e | .12293801
rho | .97708602 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(14, 72) = 1.17 Prob > F = 0.3178

Again, the intercept 21.6670 is the intercept of LSDV3 (see 5.1.3).

5.2.4 Using LIMDEP

In LIMDEP, specify a time-series variable for stratification in the Str= subcommand. The
pooled OLS part of the output is skipped. Do not forget to include ONE for the intercept.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:19:57PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Estd. Autocorrelation of e(i,t) .881836 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 15 |
| Smallest 6, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1868 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -120.52864 .7673414157D+02 .3271354 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 70.98362 .1088193393D+01 .9904579 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 35.659 14 .00117 2.605 14 75 .00404 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 418.684 17 .00000 439.617 17 72 .00000 |
|(4) vs (2) 383.025 3 .00000 1668.364 3 72 .00000 |
|(4) vs (3) 18.427 14 .18800 1.169 14 72 .31776 |
+--------------------------------------------------------------------+

The F statistic 1.169 appears on the last line of the output, (4) vs (3); it does not reject the null
hypothesis of no fixed time effect.
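
The likelihood ratio column of the same table follows directly from the reported log-likelihoods: each chi-squared statistic is twice the difference between the log-likelihood of the larger model and that of the smaller one. A minimal Python sketch, assuming only the four log-likelihoods printed above (the dictionary and function names are illustrative):

```python
# Log-likelihoods from "Test Statistics for the Classical Model" above.
log_lik = {
    1: -138.35814,  # (1) constant term only
    2: -120.52864,  # (2) group effects only
    3: 61.76991,    # (3) X-variables only
    4: 70.98362,    # (4) X and group effects
}

def lr_chi_squared(unrestricted, restricted):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (log_lik[unrestricted] - log_lik[restricted])

comparisons = [(2, 1), (3, 1), (4, 1), (4, 2), (4, 3)]
lr_stats = {pair: lr_chi_squared(*pair) for pair in comparisons}

for (u, r), stat in lr_stats.items():
    print(f"({u}) vs ({r}): chi-squared = {stat:.3f}")
```

The (4) vs (3) comparison is the likelihood ratio counterpart of the F test for the fixed time effect.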

5.3 Between Time Effect Model

The between effect model regresses time means of the dependent variable on those of the
independent variables. See Sections 3.2 and 4.6.

. collapse (mean) tm_cost=cost (mean) tm_output=output (mean) tm_fuel=fuel ///
(mean) tm_load=load, by(year)

. regress tm_cost tm_output tm_fuel tm_load

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 4074.33
Model | 6.21220479 3 2.07073493 Prob > F = 0.0000
Residual | .005590631 11 .000508239 R-squared = 0.9991
-------------+------------------------------ Adj R-squared = 0.9989
Total | 6.21779542 14 .444128244 Root MSE = .02254

------------------------------------------------------------------------------
tm_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tm_output | 1.133337 .0512898 22.10 0.000 1.020449 1.246225
tm_fuel | .3342486 .0228284 14.64 0.000 .2840035 .3844937
tm_load | -1.350727 .2478264 -5.45 0.000 -1.896189 -.8052644
_cons | 11.18505 .3660016 30.56 0.000 10.37949 11.99062
------------------------------------------------------------------------------
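
The difference between the between and within transformations is easiest to see with a single regressor. The Python sketch below uses a hypothetical toy panel (three years, two entities per year; these numbers are NOT the airline data) and illustrative helper names. It computes time means as the egen and collapse commands do, then fits the between model on the means and the within model on deviations from the means.

```python
from collections import defaultdict

# Hypothetical toy panel (NOT the airline data): (year, x, y), two entities per year.
panel = [
    (1, 1.0, 2.0), (1, 3.0, 4.0),
    (2, 2.0, 5.0), (2, 4.0, 7.0),
    (3, 5.0, 9.0), (3, 7.0, 11.0),
]

def slope(xs, ys, intercept=True):
    """One-regressor OLS slope, with or without an intercept."""
    if intercept:
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        xs = [x - x_bar for x in xs]
        ys = [y - y_bar for y in ys]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Time means, the analogue of egen ... mean(), by(year) or collapse (mean).
by_year = defaultdict(list)
for year, x, y in panel:
    by_year[year].append((x, y))
means = {
    year: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
    for year, obs in by_year.items()
}

# Between estimator: regress time means of y on time means of x (with intercept).
between_slope = slope([m[0] for m in means.values()], [m[1] for m in means.values()])

# Within estimator: regress deviations from time means, intercept suppressed.
within_slope = slope(
    [x - means[year][0] for year, x, _ in panel],
    [y - means[year][1] for year, _, y in panel],
    intercept=False,
)

print(f"between slope = {between_slope:.4f}, within slope = {within_slope:.4f}")
```

The two slopes differ because the between model uses only variation across time means, while the within model uses only variation around them.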

PROC PANEL has the /BTWNT option to estimate the between effect model.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNT;
RUN;

The PANEL Procedure
Between Time Periods Estimates

Dependent Variable: cost

Model Description

Estimation Method BtwTime
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.0056 DFE 11
MSE 0.0005 Root MSE 0.0225
R-Square 0.9991


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

Intercept 1 11.18504 0.3660 30.56 <.0001 Intercept
output 1 1.133335 0.0513 22.10 <.0001
fuel 1 0.334249 0.0228 14.64 <.0001
load 1 -1.35073 0.2478 -5.45 0.0002

Alternatively, use the be option in the Stata .xtreg command and the Means subcommand in
LIMDEP Regress$ command to get the same result.

. xtreg cost output fuel load, be i(year)

Between regression (regression on group means) Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9840 Obs per group: min = 6
between = 0.9991 avg = 6.0
overall = 0.9749 max = 6

F(3,11) = 4074.35
sd(u_i + avg(e_i.))= .0225441 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.133335 .0512897 22.10 0.000 1.020447 1.246223
fuel | .3342494 .0228284 14.64 0.000 .2840044 .3844943
load | -1.35073 .2478257 -5.45 0.000 -1.896191 -.8052695
_cons | 11.18504 .3660008 30.56 0.000 10.37948 11.9906
------------------------------------------------------------------------------

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Means$

+----------------------------------------------------+
| Group Means Regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:23:24PM |
| LHS=YBAR(i.) Mean = 13.36561 |
| Standard deviation = .6664301 |
| WTS=NTi/Nobs Number of observs. = 15 |
| Model size Parameters = 4 |
| Degrees of freedom = 11 |
| Residuals Sum of squares = .5590461E-02 |
| Standard error of e = .2254382E-01 |
| Fit R-squared = .9991009 |
| Adjusted R-squared = .9988557 |
| Model test F[ 3, 11] (prob) =4074.46 (.0000) |
| Diagnostic Log likelihood = 37.92650 |
| Restricted(b=0) = -14.67933 |
| Chi-sq [ 3] (prob) = 105.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -7.348200 |
| Akaike Info. Criter. = -7.361410 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | 1.13334032 .05128905 22.097 .0000 .111879D-13
FUEL | .33424795 .02282811 14.642 .0000 .111879D-13
LOAD | -1.35072980 .24782272 -5.450 .0000 .141312D-06
Constant| 11.1850651 .36599619 30.561 .0000

5.4 Testing Fixed Time Effects

The null hypothesis of the fixed time effect model is that all time dummy parameters except
one are zero: $H_0: \tau_1 = \cdots = \tau_{t-1} = 0$. The F statistic is

$\frac{(1.3354 - 1.0882)/(15 - 1)}{(1.0882)/(6*15 - 15 - 3)} \sim 1.1683\,[14, 72]$.

The small F statistic does not reject the null hypothesis of no fixed time effect (p<.3180).
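
The arithmetic of this F test can be reproduced in a few lines. The Python sketch below assumes the two sums of squared errors quoted in the formula above (1.3354 for the pooled OLS and 1.0882 for the fixed time effect model); the function name nested_f is illustrative.

```python
def nested_f(sse_restricted, sse_unrestricted, n_restrictions, df_resid):
    """F statistic for testing a restricted model against an unrestricted one."""
    numerator = (sse_restricted - sse_unrestricted) / n_restrictions
    denominator = sse_unrestricted / df_resid
    return numerator / denominator

n_groups, n_periods, n_regressors = 6, 15, 3

sse_pooled = 1.3354   # pooled OLS (no time dummies)
sse_lsdv = 1.0882     # fixed time effect model

df_num = n_periods - 1                                     # 14 restrictions
df_den = n_groups * n_periods - n_periods - n_regressors   # 72 residual df

f_stat = nested_f(sse_pooled, sse_lsdv, df_num, df_den)
print(f"F[{df_num}, {df_den}] = {f_stat:.4f}")
```

The p-value is not computed here because it requires the F distribution, which the Python standard library does not provide.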

SAS PROC PANEL, LIMDEP, and Stata .xtreg conduct the F test by default. You may
conduct the same test after LSDV1 using the TEST statement in SAS PROC REG or the Stata .test command.

PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
TEST t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

(output is skipped)

. quietly regress cost t1-t14 output fuel load
. test t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

( 1) t1 = 0
( 2) t2 = 0
( 3) t3 = 0
( 4) t4 = 0
( 5) t5 = 0
( 6) t6 = 0
( 7) t7 = 0
( 8) t8 = 0
( 9) t9 = 0
(10) t10 = 0
(11) t11 = 0
(12) t12 = 0
(13) t13 = 0
(14) t14 = 0

F( 14, 72) = 1.17
Prob > F = 0.3178
6. Two-way Fixed Effect Models

A two-way fixed effect model explores fixed effects of two group variables, two time variables, or
one group variable and one time variable. This chapter investigates fixed group and time effects,
so the model needs two sets of dummy variables, one for groups and one for time periods (i.e., airline and year).

6.1 Strategies of the Least Squares Dummy Variable Models

You may combine LSDV1, LSDV2, and LSDV3 to avoid perfect multicollinearity or the
dummy variable trap in a two-way fixed effect model. There are five strategies when
combining three LSDVs. Since .cnsreg does not allow suppressing the intercept, strategy 4
does not work in Stata. The first strategy of dropping two dummies is generally recommended
because of its convenience of model estimation and interpretation.

1. Drop one cross-section dummy and one time-series dummy.
2. Drop one cross-section dummy and suppress the intercept. Alternatively, drop one time
dummy and suppress the intercept.
3. Drop one cross-section dummy and impose a restriction on the time-series dummy
parameters: $\sum \tau_t = 0$. Alternatively, drop one time-series dummy and impose a
restriction on the cross-section dummy parameters: $\sum \mu_i = 0$.
4. Suppress the intercept and impose a restriction on the cross-section dummy parameters:
$\sum \mu_i = 0$. Alternatively, suppress the intercept and impose a restriction on the
time-series dummy parameters: $\sum \tau_t = 0$.
5. Include all dummy variables and impose two restrictions on the cross-section and
time-series dummy parameters: $\sum \mu_i = 0$ and $\sum \tau_t = 0$.

Each strategy produces different dummy coefficients but returns exactly the same parameter
estimates of regressors. In general, dummy coefficients are not of primary interest in panel data
models.

6.2 LSDV1 without Two Dummies

The first strategy excludes two dummy variables, one dummy from each set of dummy
variables. Let us exclude g6 for the sixth airline and t15 for the last time period.

. regress cost g1-g5 t1-t14 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 22, 67) = 1960.82
Model | 113.864044 22 5.17563838 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 0.9984
-------------+------------------------------ Adj R-squared = 0.9979
Total | 114.040893 89 1.28135835 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1742825 .0861201 2.02 0.047 .0023861 .346179
g2 | .1114508 .0779551 1.43 0.157 -.0441482 .2670499
g3 | -.143511 .0518934 -2.77 0.007 -.2470907 -.0399313
g4 | .1802087 .0321443 5.61 0.000 .1160484 .2443691
g5 | -.0466942 .0224688 -2.08 0.042 -.0915422 -.0018463
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
------------------------------------------------------------------------------

In SAS, run the following script to get the same result.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.94004 2.21823 5.83 <.0001
g1 1 0.17428 0.08612 2.02 0.0470
g2 1 0.11145 0.07796 1.43 0.1575
g3 1 -0.14351 0.05189 -2.77 0.0073
g4 1 0.18021 0.03214 5.61 <.0001
g5 1 -0.04669 0.02247 -2.08 0.0415
t1 1 -0.69314 0.33784 -2.05 0.0441
t2 1 -0.63844 0.33208 -1.92 0.0588
t3 1 -0.59580 0.32945 -1.81 0.0750
t4 1 -0.54215 0.31891 -1.70 0.0938
t5 1 -0.47304 0.23195 -2.04 0.0454
t6 1 -0.42720 0.18844 -2.27 0.0266
t7 1 -0.39598 0.17330 -2.28 0.0255
t8 1 -0.33985 0.15011 -2.26 0.0268
t9 1 -0.27189 0.13482 -2.02 0.0477
t10 1 -0.22739 0.07635 -2.98 0.0040
t11 1 -0.11180 0.03190 -3.50 0.0008
t12 1 -0.03364 0.04290 -0.78 0.4357
t13 1 -0.01773 0.03626 -0.49 0.6263
t14 1 -0.01865 0.03051 -0.61 0.5432
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012

In LIMDEP, the following command fits the same model (output is skipped).

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

6.3 LSDV1 + LSDV2: Drop a Dummy and Suppress the Intercept

The second strategy combines LSDV1 and LSDV2 to drop a dummy and suppress the intercept.
Let us drop a dummy g6 and suppress the intercept. Keep in mind that SSE is still correct but F
and $R^2$ are not.

. regress cost g1-g5 t1-t15 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 23, 67) = .
Model | 16191.4201 23 703.974786 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1742825 .0861201 2.02 0.047 .0023861 .346179
g2 | .1114508 .0779551 1.43 0.157 -.0441482 .2670499
g3 | -.143511 .0518934 -2.77 0.007 -.2470907 -.0399313
g4 | .1802087 .0321443 5.61 0.000 .1160484 .2443691
g5 | -.0466942 .0224688 -2.08 0.042 -.0915422 -.0018463
t1 | 12.2469 1.885399 6.50 0.000 8.48363 16.01018
t2 | 12.3016 1.891045 6.51 0.000 8.527062 16.07615
t3 | 12.34424 1.89341 6.52 0.000 8.564976 16.1235
t4 | 12.39789 1.903395 6.51 0.000 8.598694 16.19708
t5 | 12.467 1.991503 6.26 0.000 8.491942 16.44206
t6 | 12.51284 2.035334 6.15 0.000 8.450294 16.57538
t7 | 12.54406 2.05038 6.12 0.000 8.451487 16.63664
t8 | 12.60019 2.073782 6.08 0.000 8.460909 16.73948
t9 | 12.66815 2.090527 6.06 0.000 8.495438 16.84086
t10 | 12.71266 2.151893 5.91 0.000 8.417458 17.00785
t11 | 12.82824 2.221401 5.77 0.000 8.394303 17.26217
t12 | 12.9064 2.247972 5.74 0.000 8.41943 17.39337
t13 | 12.92231 2.237999 5.77 0.000 8.455241 17.38937
t14 | 12.9214 2.224893 5.81 0.000 8.480492 17.3623
t15 | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
------------------------------------------------------------------------------
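
The warning about F and R² can be checked by hand. The sketch below, in Python rather than one of the four packages, recomputes both statistics from the sums of squares reported above; the variable names are illustrative and not part of any package. Without an intercept, software measures the total sum of squares around zero (16191.5969) rather than around the mean (114.04089), which inflates R² toward 1.

```python
# Values copied from the Stata and SAS outputs above; names are illustrative.
sse = 0.176848775              # residual sum of squares (same in every LSDV variant)
uncorrected_sst = 16191.5969   # total SS around zero (no-intercept Stata output)
corrected_sst = 114.04089      # total SS around the mean (SAS LSDV1 output)
n_obs, n_params = 90, 23       # 90 observations, 23 estimated coefficients

df_error = n_obs - n_params    # 67
df_model = n_params - 1        # 22, because one coefficient plays the role of the constant

# R-squared as reported when the intercept is suppressed (misleading)
r2_uncentered = 1 - sse / uncorrected_sst
# R-squared and F based on the corrected total (the LSDV1 values)
r2_centered = 1 - sse / corrected_sst
f_centered = ((corrected_sst - sse) / df_model) / (sse / df_error)

print(round(r2_uncentered, 4), round(r2_centered, 4), round(f_centered, 2))
```

The centered values should agree with the LSDV1 output (R² of .9984 and F of about 1960.8) up to rounding of the inputs.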

Alternatively, you may drop one of the time dummies (t15 here) and suppress the intercept. The
dummy coefficients differ from those above, but the parameter estimates of the regressors remain
unchanged.

. regress cost g1-g6 t1-t14 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 23, 67) = .
Model | 16191.4201 23 703.974786 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | 13.11432 2.229552 5.88 0.000 8.66412 17.56453
g2 | 13.05149 2.229864 5.85 0.000 8.600665 17.50232
g3 | 12.79653 2.230546 5.74 0.000 8.344341 17.24872
g4 | 13.12025 2.223638 5.90 0.000 8.68185 17.55865
g5 | 12.89335 2.222204 5.80 0.000 8.45781 17.32888
g6 | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
------------------------------------------------------------------------------
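
The two no-intercept outputs above are reparameterizations of LSDV1: the suppressed intercept is absorbed by whichever set of dummies is included in full. The following Python sketch, with illustrative variable names and only a subset of the time dummies, reproduces the shifted coefficients from the LSDV1 estimates.

```python
# LSDV1 estimates copied from the output above (baseline: airline 6, period 15).
intercept = 12.94004
g_lsdv1 = {"g1": 0.1742825, "g2": 0.1114508, "g3": -0.143511,
           "g4": 0.1802087, "g5": -0.0466942, "g6": 0.0}   # g6 is the dropped baseline
t_lsdv1 = {"t1": -0.6931382, "t2": -0.6384366, "t14": -0.0186451,
           "t15": 0.0}                                     # subset; t15 is the baseline

# g6 dropped, no intercept: every time dummy absorbs the intercept.
no_int_g6_dropped = {name: intercept + coef for name, coef in t_lsdv1.items()}

# t15 dropped, no intercept: every group dummy absorbs the intercept.
no_int_t15_dropped = {name: intercept + coef for name, coef in g_lsdv1.items()}

for name, coef in {**no_int_g6_dropped, **no_int_t15_dropped}.items():
    print(name, round(coef, 4))
```

In both cases the coefficient of the former baseline (t15 or g6) equals the LSDV1 intercept, 12.94004.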

In SAS, execute the following script that has /NOINT to suppress the intercept.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t15 output fuel load /NOINT;
MODEL cost = g1-g6 t1-t14 output fuel load /NOINT;
RUN;

(output is skipped)

In LIMDEP, ONE should be taken out to suppress the intercept.

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15, OUTPUT,FUEL,LOAD$

(output is skipped)

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 03:58:13PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Autocorrel Durbin-Watson Stat. = .6035047 |
| Rho = cor[e,e(-1)] = .6982476 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 13.1139819 2.22955625 5.882 .0000 .16666667
G2 | 13.0511515 2.22986828 5.853 .0000 .16666667
G3 | 12.7961914 2.23055043 5.737 .0000 .16666667
G4 | 13.1199153 2.22364115 5.900 .0000 .16666667
G5 | 12.8930131 2.22220692 5.802 .0000 .16666667
G6 | 12.9397087 2.21823375 5.833 .0000 .16666667
T1 | -.69308729 .33783938 -2.052 .0441 .06666667
T2 | -.63838795 .33208126 -1.922 .0588 .06666667
T3 | -.59575348 .32944797 -1.808 .0750 .06666667
T4 | -.54210773 .31891465 -1.700 .0938 .06666667
T5 | -.47300784 .23194606 -2.039 .0454 .06666667
T6 | -.42717813 .18844068 -2.267 .0266 .06666667
T7 | -.39595152 .17329717 -2.285 .0255 .06666667
T8 | -.33982426 .15010661 -2.264 .0268 .06666667
T9 | -.27187359 .13481769 -2.017 .0477 .06666667
T10 | -.22737840 .07634935 -2.978 .0040 .06666667
T11 | -.11180525 .03190046 -3.505 .0008 .06666667
T12 | -.03364915 .04290088 -.784 .4356 .06666667
T13 | -.01774030 .03625541 -.489 .6262 .06666667
T14 | -.01864714 .03050793 -.611 .5431 .06666667
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3060 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0012 .56046016

Notice that LIMDEP reports the correct F (1960.83) and R² (.9984).

6.4 LSDV1 + LSDV3: Drop a Dummy and Impose a Restriction

The third strategy excludes one dummy from a set of dummy variables and imposes a
restriction on another set of dummy parameters. Let us drop a time dummy here and then
impose a restriction on group dummy parameters.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t14 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.98600 2.22540 5.84 <.0001
g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 -0.69314 0.33784 -2.05 0.0441
t2 1 -0.63844 0.33208 -1.92 0.0588
t3 1 -0.59580 0.32945 -1.81 0.0750
t4 1 -0.54215 0.31891 -1.70 0.0938
t5 1 -0.47304 0.23195 -2.04 0.0454
t6 1 -0.42720 0.18844 -2.27 0.0266
t7 1 -0.39598 0.17330 -2.28 0.0255
t8 1 -0.33985 0.15011 -2.26 0.0268
t9 1 -0.27189 0.13482 -2.02 0.0477
t10 1 -0.22739 0.07635 -2.98 0.0040
t11 1 -0.11180 0.03190 -3.50 0.0008
t12 1 -0.03364 0.04290 -0.78 0.4357
t13 1 -0.01773 0.03626 -0.49 0.6263
t14 1 -0.01865 0.03051 -0.61 0.5432
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 -1.9387E-16 . . .

* Probability computed using beta distribution.

In Stata, run the .cnsreg command with a constraint on the group dummy parameters. The
constraint(1) option tells .cnsreg to fit OLS under constraint 1, which must be defined
beforehand with the .constraint define command.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 t1-t14 output fuel load, constraint(1)

Constrained linear regression Number of obs = 90
F( 22, 67) = 1960.82
Prob > F = 0.0000
Root MSE = 0.0514

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1283264 .0460126 2.79 0.007 .0364849 .2201679
g2 | .0654947 .0389685 1.68 0.097 -.0122867 .1432761
g3 | -.1894671 .0156096 -12.14 0.000 -.220624 -.1583102
g4 | .1342526 .0183163 7.33 0.000 .097693 .1708121
g5 | -.0926504 .0373085 -2.48 0.016 -.1671184 -.0181824
g6 | -.0459561 .0416069 -1.10 0.273 -.1290038 .0370916
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.986 2.225402 5.84 0.000 8.544076 17.42792
------------------------------------------------------------------------------
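
The restricted estimates relate to LSDV1 in a simple way: imposing a sum-to-zero restriction on the group dummies centers the LSDV1 group coefficients on their mean and moves that mean into the intercept. The Python sketch below checks this against the outputs above; the names are illustrative only.

```python
# LSDV1 group coefficients (g6 = 0 is the dropped baseline) and intercept.
g_lsdv1 = [0.1742825, 0.1114508, -0.143511, 0.1802087, -0.0466942, 0.0]
intercept_lsdv1 = 12.94004

mean_g = sum(g_lsdv1) / len(g_lsdv1)

# Under g1 + ... + g6 = 0 each coefficient is a deviation from the average group effect.
g_restricted = [g - mean_g for g in g_lsdv1]
intercept_restricted = intercept_lsdv1 + mean_g

print([round(g, 5) for g in g_restricted])
print(round(intercept_restricted, 3))
```

The centered values should match the g1 through g6 estimates and the intercept (12.986) in the restricted outputs, up to rounding.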

In LIMDEP, run a Regress$ command with the Cls: subcommand. b(2) in the subcommand
indicates the second parameter estimate listed in the Rhs= subcommand. Therefore, LIMDEP
fits the LSDV1 under the constraint that the sum of all group dummy parameters, b(2) for g1
through b(7) for g6, is zero.

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 04:24:35PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Autocorrel Durbin-Watson Stat. = .6035047 |
| Rho = cor[e,e(-1)] = .6982476 |
| Restrictns. F[ 1, 66] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 12.9856603 2.22540616 5.835 .0000
G1 | .12832155 .04601257 2.789 .0069 .16666667
G2 | .06549116 .03896849 1.681 .0976 .16666667
G3 | -.18946893 .01560965 -12.138 .0000 .16666667
G4 | .13425504 .01831636 7.330 .0000 .16666667
G5 | -.09264719 .03730846 -2.483 .0156 .16666667
G6 | -.04595164 .04160692 -1.104 .2734 .16666667
T1 | -.69308729 .33783938 -2.052 .0442 .06666667
T2 | -.63838795 .33208126 -1.922 .0589 .06666667
T3 | -.59575348 .32944797 -1.808 .0751 .06666667
T4 | -.54210773 .31891465 -1.700 .0939 .06666667
T5 | -.47300784 .23194606 -2.039 .0454 .06666667
T6 | -.42717813 .18844068 -2.267 .0267 .06666667
T7 | -.39595152 .17329717 -2.285 .0255 .06666667
T8 | -.33982426 .15010661 -2.264 .0269 .06666667
T9 | -.27187359 .13481769 -2.017 .0478 .06666667
T10 | -.22737840 .07634935 -2.978 .0041 .06666667
T11 | -.11180525 .03190046 -3.505 .0008 .06666667
T12 | -.03364915 .04290088 -.784 .4356 .06666667
T13 | -.01774030 .03625541 -.489 .6262 .06666667
T14 | -.01864714 .03050793 -.611 .5432 .06666667
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3061 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0012 .56046016

Alternatively, you may drop one group dummy and impose a restriction on the time dummy
parameters. In LIMDEP, b(7) indicates the seventh parameter in the Rhs= list, the coefficient of
t1. The output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t15 output fuel load;
RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0;
RUN;

. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost g1-g5 t1-t15 output fuel load, constraint(3)

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$
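
Because b(k) in Cls: refers to the position of a variable in the Rhs= list, the indices shift whenever ONE or a dummy is added or removed. The Python helper below is a hypothetical convenience, not a LIMDEP feature; it only builds the restriction text from a list of variable names so that the indices can be checked mechanically.

```python
def cls_sum_to_zero(rhs, names):
    """Build a Cls: restriction forcing the coefficients of `names` to sum to zero.

    `rhs` is the ordered Rhs= variable list; b(k) is the 1-based position in it.
    """
    positions = [rhs.index(name) + 1 for name in names]
    return "Cls:" + "+".join(f"b({k})" for k in positions) + "=0"


# Rhs list of the command above: ONE, G1-G5, T1-T15, OUTPUT, FUEL, LOAD
rhs = (["ONE"] + [f"G{i}" for i in range(1, 6)]
       + [f"T{t}" for t in range(1, 16)] + ["OUTPUT", "FUEL", "LOAD"])
time_dummies = [f"T{t}" for t in range(1, 16)]

restriction = cls_sum_to_zero(rhs, time_dummies)
print(restriction)  # append $ to terminate the LIMDEP command
```

With ONE and five group dummies ahead of them, the time dummies occupy positions 7 through 21, which agrees with the Cls: subcommand above.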

6.5 LSDV2 + LSDV3: Suppress the Intercept and Impose a Restriction

The strategy of LSDV2 + LSDV3 includes both full sets of dummy variables and instead
suppresses the intercept and imposes a restriction. Stata does not support this approach. The
following procedure has a constraint on the group dummy parameters. Since the intercept is
suppressed, the reported F (266704) and R² (1.0000) are incorrect.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 23 16191 703.97479 266704 <.0001
Error 67 0.17685 0.00264
Uncorrected Total 90 16192


Root MSE 0.05138 R-Square 1.0000
Dependent Mean 13.36561 Adj R-Sq 1.0000
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 12.29286 1.89169 6.50 <.0001
t2 1 12.34756 1.89736 6.51 <.0001
t3 1 12.39019 1.89982 6.52 <.0001
t4 1 12.44384 1.90989 6.52 <.0001
t5 1 12.51295 1.99808 6.26 <.0001
t6 1 12.55879 2.04195 6.15 <.0001
t7 1 12.59002 2.05706 6.12 <.0001
t8 1 12.64615 2.08052 6.08 <.0001
t9 1 12.71410 2.09734 6.06 <.0001
t10 1 12.75861 2.15883 5.91 <.0001
t11 1 12.87419 2.22838 5.78 <.0001
t12 1 12.95236 2.25499 5.74 <.0001
t13 1 12.96826 2.24505 5.78 <.0001
t14 1 12.96735 2.23202 5.81 <.0001
t15 1 12.98600 2.22540 5.84 <.0001
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 5.89339E-14 1.250165E-9 0.00 1.0000*

* Probability computed using beta distribution.

You may impose an alternative restriction on the time variable to obtain the equivalent result
despite different dummy coefficients. The output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

In LIMDEP, the following commands are supposed to work, but they return different parameter
estimates and goodness-of-fit measures, probably due to LIMDEP's estimation method.

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$

(output is skipped)

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 04:47:10PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1790783 |
| Standard error of e = .5169924E-01 |
| Fit R-squared = .9984297 |
| Adjusted R-squared = .9979141 |
| Model test F[ 22, 67] (prob) =1936.37 (.0000) |
| Diagnostic Log likelihood = 152.1839 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 581.08 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.697046 |
| Akaike Info. Criter. = -5.708630 |
| Autocorrel Durbin-Watson Stat. = .6164424 |
| Rho = cor[e,e(-1)] = .6917788 |
| Restrictns. F[ 1, 66] (prob) = .68 (.4113) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 13.0058594 ......(Fixed Parameter).......
G2 | 12.9453125 216842.319 .000 1.0000 .16666667
G3 | 12.6894531 216842.319 .000 1.0000 .16666667
G4 | 13.0117188 216842.319 .000 1.0000 .16666667
G5 | 12.7812500 ......(Fixed Parameter).......
G6 | 12.8261719 ......(Fixed Parameter).......
T1 | -.39453125 306661.348 .000 1.0000 .06666667
T2 | -.33203125 433684.637 .000 1.0000 .06666667
T3 | -.29101563 216842.319 .000 1.0000 .06666667
T4 | -.24414063 306661.348 .000 1.0000 .06666667
T5 | -.16406250 ......(Fixed Parameter).......
T6 | -.10742188 ......(Fixed Parameter).......
T7 | -.07421875 ......(Fixed Parameter).......
T8 | -.02148438 ......(Fixed Parameter).......
T9 | .05859375 216842.319 .000 1.0000 .06666667
T10 | .10351563 216842.319 .000 1.0000 .06666667
T11 | .22070313 216842.319 .000 1.0000 .06666667
T12 | .30468750 216842.319 .000 1.0000 .06666667
T13 | .31250000 216842.319 .000 1.0000 .06666667
T14 | .31835938 216842.319 .000 1.0000 .06666667
T15 | .33203125 ......(Fixed Parameter).......
OUTPUT | .81399272 .03205125 25.397 .0000 -1.17430918
FUEL | .15204518 .16450594 .924 .3587 12.7703592
LOAD | -.88619366 .26338199 -3.365 .0013 .56046016

6.6 LSDV3 with Two Restrictions

The last strategy includes all group and time dummies and then imposes two restrictions on
group and time dummy parameters. Pay attention to the two RESTRICT statements in the
following PROC REG.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.66688 2.08107 6.09 <.0001
g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 -0.37402 0.19187 -1.95 0.0554
t2 1 -0.31932 0.18609 -1.72 0.0908
t3 1 -0.27669 0.18335 -1.51 0.1360
t4 1 -0.22304 0.17297 -1.29 0.2017
t5 1 -0.15393 0.08644 -1.78 0.0795
t6 1 -0.10809 0.04486 -2.41 0.0187
t7 1 -0.07686 0.03193 -2.41 0.0188
t8 1 -0.02073 0.02045 -1.01 0.3143
t9 1 0.04722 0.02908 1.62 0.1091
t10 1 0.09173 0.08115 1.13 0.2624
t11 1 0.20731 0.14914 1.39 0.1691
t12 1 0.28547 0.17564 1.63 0.1088
t13 1 0.30138 0.16603 1.82 0.0740
t14 1 0.30047 0.15362 1.96 0.0546
t15 1 0.31911 0.14749 2.16 0.0341
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 -2.5962E-16 4.04547E-11 -0.00 1.0000*
RESTRICT -1 -2.3598E-16 . . .

* Probability computed using beta distribution.

In Stata, execute the following command to get the same result. Notice that constraints 1 and 3
were defined above.

. cnsreg cost g1-g6 t1-t15 output fuel load, constraint(1 3)

Constrained linear regression Number of obs = 90
F( 22, 67) = 1960.82
Prob > F = 0.0000
Root MSE = 0.0514

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
( 2) t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1283264 .0460126 2.79 0.007 .0364849 .2201679
g2 | .0654947 .0389685 1.68 0.097 -.0122867 .1432761
g3 | -.1894671 .0156096 -12.14 0.000 -.220624 -.1583102
g4 | .1342526 .0183163 7.33 0.000 .097693 .1708121
g5 | -.0926504 .0373085 -2.48 0.016 -.1671184 -.0181824
g6 | -.0459561 .0416069 -1.10 0.273 -.1290038 .0370916
t1 | -.3740245 .191872 -1.95 0.055 -.7570026 .0089536
t2 | -.3193228 .1860877 -1.72 0.091 -.6907554 .0521097
t3 | -.2766893 .1833501 -1.51 0.136 -.6426576 .0892789
t4 | -.2230399 .1729671 -1.29 0.202 -.5682837 .1222038
t5 | -.1539291 .0864404 -1.78 0.079 -.3264649 .0186066
t6 | -.1080904 .0448591 -2.41 0.019 -.1976296 -.0185513
t7 | -.0768646 .0319336 -2.41 0.019 -.1406043 -.0131248
t8 | -.0207326 .0204506 -1.01 0.314 -.061552 .0200869
t9 | .0472205 .0290822 1.62 0.109 -.0108278 .1052688
t10 | .0917281 .0811525 1.13 0.262 -.0702531 .2537092
t11 | .2073105 .1491443 1.39 0.169 -.0903829 .5050039
t12 | .2854727 .1756365 1.63 0.109 -.0650993 .6360447
t13 | .3013791 .1660294 1.82 0.074 -.030017 .6327752
t14 | .3004686 .1536212 1.96 0.055 -.0061606 .6070978
t15 | .3191137 .1474883 2.16 0.034 .0247259 .6135015
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.66688 2.081068 6.09 0.000 8.513054 16.82071
------------------------------------------------------------------------------
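
With both restrictions in place, each set of dummy coefficients is centered on its own mean and both means move into the intercept. The Python sketch below starts from the LSDV1 estimates reported earlier in this section and recovers the time coefficients and the intercept shown above; the names are illustrative only.

```python
# LSDV1 estimates: g6 and t15 are the dropped baselines (coefficient 0).
intercept_lsdv1 = 12.94004
g_lsdv1 = [0.1742825, 0.1114508, -0.143511, 0.1802087, -0.0466942, 0.0]
t_lsdv1 = [-0.6931382, -0.6384366, -0.5958031, -0.5421537, -0.4730429,
           -0.4272042, -0.3959783, -0.3398463, -0.2718933, -0.2273857,
           -0.1118032, -0.033641, -0.0177346, -0.0186451, 0.0]

mean_g = sum(g_lsdv1) / len(g_lsdv1)
mean_t = sum(t_lsdv1) / len(t_lsdv1)

# Deviations from the average group effect and the average time effect.
g_restricted = [g - mean_g for g in g_lsdv1]
t_restricted = [t - mean_t for t in t_lsdv1]
intercept_restricted = intercept_lsdv1 + mean_g + mean_t

print(round(t_restricted[0], 5), round(t_restricted[14], 5))
print(round(intercept_restricted, 4))
```

The results should agree with t1 (-.37402), t15 (.31911), and the intercept (12.66688) in the outputs above, up to rounding.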

In LIMDEP, the following command returns the same result (output is skipped). Notice that
two restrictions in Cls: are separated by a comma.

REGRESS;Lhs=COST;
Rhs=One,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0,
b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)+b(22)=0$

6.7 Two-way Within Effect Model

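The two-way fixed effect model requires a transformation of the dependent and independent
variables using group means, time means, and overall means:
$y_{it}^{*} = y_{it} - \bar{y}_{i\bullet} - \bar{y}_{\bullet t} + \bar{y}_{\bullet\bullet}$ and
$x_{it}^{*} = x_{it} - \bar{x}_{i\bullet} - \bar{x}_{\bullet t} + \bar{x}_{\bullet\bullet}$.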

. gen w_cost = cost - gm_cost - tm_cost + m_cost
. gen w_output = output - gm_output - tm_output + m_output
. gen w_fuel = fuel - gm_fuel - tm_fuel + m_fuel
. gen w_load = load - gm_load - tm_load + m_load

Once data are transformed, run the OLS with the transformed variables. Do not forget to
suppress the intercept.

. regress w_cost w_output w_fuel w_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 307.86
Model | 1.87739643 3 .625798811 Prob > F = 0.0000
Residual | .176848774 87 .002032745 R-squared = 0.9139
-------------+------------------------------ Adj R-squared = 0.9109
Total | 2.05424521 90 .022824947 Root MSE = .04509

------------------------------------------------------------------------------
w_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
w_output | .8172487 .0279512 29.24 0.000 .7616927 .8728048
w_fuel | .16861 .1434621 1.18 0.243 -.1165364 .4537565
w_load | -.8828142 .2296907 -3.84 0.000 -1.339349 -.426279
------------------------------------------------------------------------------
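
To see why the transformation works, the following Python sketch applies it to a small synthetic balanced panel (three groups, four periods, no error term). All numbers are made up for illustration and have nothing to do with the airline data. Because the group and time effects enter additively, the transformation removes them exactly and the no-intercept slope recovers the true coefficient.

```python
# Synthetic balanced panel: 3 groups x 4 periods, no noise, true slope = 2.0.
groups, periods = range(1, 4), range(1, 5)
alpha = {1: 5.0, 2: -1.0, 3: 2.5}            # hypothetical group effects
gamma = {1: 0.0, 2: 0.3, 3: -0.7, 4: 1.1}    # hypothetical time effects
x = {(i, t): float(i * t) for i in groups for t in periods}
y = {(i, t): alpha[i] + gamma[t] + 2.0 * x[i, t] for i in groups for t in periods}


def within(v):
    """Two-way within transformation: v - group mean - time mean + overall mean."""
    gm = {i: sum(v[i, t] for t in periods) / len(periods) for i in groups}
    tm = {t: sum(v[i, t] for i in groups) / len(groups) for t in periods}
    m = sum(v.values()) / len(v)
    return {(i, t): v[i, t] - gm[i] - tm[t] + m for (i, t) in v}


w_y, w_x = within(y), within(x)

# OLS without an intercept on the transformed variables (single regressor).
slope = sum(w_x[k] * w_y[k] for k in w_x) / sum(w_x[k] ** 2 for k in w_x)
print(round(slope, 6))
```

The regressor must vary within both groups and periods after the transformation; a variable that is purely group-specific or purely time-specific would be wiped out entirely.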

Remember that F, R², standard errors, and DF_error are not correct. Standard errors need to be
adjusted; for instance, the standard error of the load factor is .2617=.2297*sqrt(87/67).
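
The same adjustment applies to every coefficient. As a rough check in Python (illustrative names only), the sketch below rescales all three within standard errors by sqrt(87/67) and recomputes the root MSE with the correct error degrees of freedom.

```python
import math

df_reported = 87   # 90 observations - 3 regressors, as assumed by the within regression
df_correct = 67    # 90 - 5 group dummies - 14 time dummies - 3 regressors - 1 intercept
scale = math.sqrt(df_reported / df_correct)

# Standard errors copied from the within regression output above.
se_within = {"output": 0.0279512, "fuel": 0.1434621, "load": 0.2296907}
se_adjusted = {name: se * scale for name, se in se_within.items()}

# Root MSE based on the correct degrees of freedom.
sse = 0.176848774
root_mse = math.sqrt(sse / df_correct)

for name, se in se_adjusted.items():
    print(name, round(se, 6))
print("Root MSE", round(root_mse, 5))
```

The adjusted values should match the LSDV standard errors (.031851, .163478, and .261737) and the root MSE of .05138.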

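The dummy variable coefficients are computed as
$d_{i}^{*} = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})'\beta$ and
$d_{t}^{*} = (\bar{y}_{\bullet t} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{\bullet t} - \bar{x}_{\bullet\bullet})'\beta$.
We need to compute the overall means and the group-specific means, say for airline 3.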

. sum cost output fuel load

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 90 13.36561 1.131971 11.14154 15.3733
output | 90 -1.174309 1.150606 -3.278573 .6608616
fuel | 90 12.77036 .8123749 11.55017 13.831
load | 90 .5604602 .0527934 .432066 .676287

. sum cost output fuel load if airline==3

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 15 13.37231 .5220657 12.56479 13.99694
output | 15 -.9122625 .2435335 -1.337794 -.6169364
fuel | 15 12.78972 .8177211 11.6851 13.831
load | 15 .5845359 .0324437 .524334 .654256

The dummy coefficient of airline 3, its deviation from the overall intercept, is -.1895 =
(13.3723-13.3656) - (-.9123-(-1.1743))*(.8172) - (12.7897-12.7704)*(.1686) - (.5845-.5605)*(-.8828).
The dummy coefficient of time period 9 is .0472 = (13.4651-13.3656) - (-1.0670-(-1.1743))*(.8172)
- (12.8610-12.7704)*(.1686) - (.6179-.5605)*(-.8828). See the SAS output in Section 6.6 to
cross-check the computation.

. sum cost output fuel load if year==9

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 6 13.4651 1.042032 12.20495 14.78597
output | 6 -1.067003 1.278931 -2.673258 .4779284
fuel | 6 12.86104 .0212523 12.83356 12.89337
load | 6 .6179098 .0376737 .546723 .654256
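
The hand computations above can be repeated with the unrounded means from the three .sum outputs. The Python sketch below (illustrative names; slopes copied from the within regression) also recovers the overall intercept as the overall mean of cost minus the fitted contribution of the overall regressor means.

```python
# Slope estimates from the two-way within regression.
beta = {"output": 0.8172487, "fuel": 0.16861, "load": -0.8828142}

# Means copied from the .sum outputs above.
overall = {"cost": 13.36561, "output": -1.174309, "fuel": 12.77036, "load": 0.5604602}
airline3 = {"cost": 13.37231, "output": -0.9122625, "fuel": 12.78972, "load": 0.5845359}
year9 = {"cost": 13.4651, "output": -1.067003, "fuel": 12.86104, "load": 0.6179098}


def dummy_coefficient(subset_means):
    """(ybar_subset - ybar_overall) - (xbar_subset - xbar_overall)' beta."""
    dy = subset_means["cost"] - overall["cost"]
    dx_beta = sum((subset_means[k] - overall[k]) * b for k, b in beta.items())
    return dy - dx_beta


d_airline3 = dummy_coefficient(airline3)
d_year9 = dummy_coefficient(year9)
overall_intercept = overall["cost"] - sum(overall[k] * b for k, b in beta.items())

print(round(d_airline3, 4), round(d_year9, 4), round(overall_intercept, 3))
```

These values should agree with g3 (-.18947), t9 (.04722), and the intercept (12.66688) in the Section 6.6 output, up to rounding of the means.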

6.8 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL have the /FIXTWO option to fit the two-way fixed effect
model. The data set needs to be sorted by the group and time variables that will be declared in
the ID statement in PROC PANEL.

PROC SORT DATA=masil.airline;
BY airline year;

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXTWO;
RUN;

The PANEL Procedure
Fixed Two Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixTwo
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.1768 DFE 67
MSE 0.0026 Root MSE 0.0514
R-Square 0.9984


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

19 67 23.10 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 0.174283 0.0861 2.02 0.0470 Cross Sectional Effect 1
CS2 1 0.111451 0.0780 1.43 0.1575 Cross Sectional Effect 2
CS3 1 -0.14351 0.0519 -2.77 0.0073 Cross Sectional Effect 3
CS4 1 0.180209 0.0321 5.61 <.0001 Cross Sectional Effect 4
CS5 1 -0.04669 0.0225 -2.08 0.0415 Cross Sectional Effect 5
TS1 1 -0.69314 0.3378 -2.05 0.0441 Time Series Effect 1
TS2 1 -0.63844 0.3321 -1.92 0.0588 Time Series Effect 2
TS3 1 -0.5958 0.3294 -1.81 0.0750 Time Series Effect 3
TS4 1 -0.54215 0.3189 -1.70 0.0938 Time Series Effect 4
TS5 1 -0.47304 0.2319 -2.04 0.0454 Time Series Effect 5
TS6 1 -0.4272 0.1884 -2.27 0.0266 Time Series Effect 6
TS7 1 -0.39598 0.1733 -2.28 0.0255 Time Series Effect 7
TS8 1 -0.33985 0.1501 -2.26 0.0268 Time Series Effect 8
TS9 1 -0.27189 0.1348 -2.02 0.0477 Time Series Effect 9
TS10 1 -0.22739 0.0763 -2.98 0.0040 Time Series Effect 10
TS11 1 -0.1118 0.0319 -3.50 0.0008 Time Series Effect 11
TS12 1 -0.03364 0.0429 -0.78 0.4357 Time Series Effect 12
TS13 1 -0.01773 0.0363 -0.49 0.6263 Time Series Effect 13
TS14 1 -0.01865 0.0305 -0.61 0.5432 Time Series Effect 14
Intercept 1 12.94004 2.2182 5.83 <.0001 Intercept
output 1 0.817249 0.0319 25.66 <.0001
fuel 1 0.16861 0.1635 1.03 0.3061
load 1 -0.88281 0.2617 -3.37 0.0012

6.9 Using Stata and LIMDEP

The Stata .xtreg command does not have an option for two-way fixed or two-way random
effect models. However, it can fit the two-way fixed effect model if a set of time dummies is
included as regressors (LSDV1) and the fe option handles the group effect.

. xtreg cost t1-t14 output fuel load, fe i(airline)

Fixed-effects (within) regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9955 Obs per group: min = 15
between = 0.9859 avg = 15.0
overall = 0.9885 max = 15

F(17,67) = 873.24
corr(u_i, Xb) = 0.3361 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.986 2.225402 5.84 0.000 8.544076 17.42792
-------------+----------------------------------------------------------------
sigma_u | .1306712
sigma_e | .05137639
rho | .86611203 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5, 67) = 69.05 Prob > F = 0.0000

The F statistic of 69.05 tests only whether the parameters of the group dummies g1 through g5
are all zero; it says nothing about the time dummies. You may double-check this test by running
the following commands.

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1=g2=g3=g4=g5=0

( 1) g1 - g2 = 0
( 2) g1 - g3 = 0
( 3) g1 - g4 = 0
( 4) g1 - g5 = 0
( 5) g1 = 0

F( 5, 67) = 69.05
Prob > F = 0.0000
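
The same F statistic can also be recovered from the sums of squared errors (SSEs) of the restricted and unrestricted models. The following Python sketch is an illustrative cross-check, not part of the Stata session. It assumes that the restricted model is the fixed time effect model (SSE 1.08819022, reported in 7.3) and the unrestricted model is the two-way fixed effect model (SSE .1768479, reported in the LIMDEP output of this section). The function name f_statistic is hypothetical.

```python
def f_statistic(sse_restricted, sse_unrestricted, df_num, df_den):
    """F statistic for two nested linear models, computed from their SSEs."""
    numerator = (sse_restricted - sse_unrestricted) / df_num
    denominator = sse_unrestricted / df_den
    return numerator / denominator

# Restricted model: fixed time effect model (time dummies, no group dummies)
sse_time_only = 1.08819022
# Unrestricted model: two-way fixed effect model (group and time dummies)
sse_two_way = 0.1768479

# 5 restrictions (g1-g5); 90 - 5 - 14 - 3 - 1 = 67 residual degrees of freedom
f_group = f_statistic(sse_time_only, sse_two_way, df_num=5, df_den=67)
print(round(f_group, 2))  # 69.05, matching F(5, 67) in the Stata output
```

Under these assumptions, the test of g1 through g5 is simply a comparison of the two-way model against the model with time dummies only.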

The following LIMDEP command fits the two-way fixed effect model. The Str= and Period=
subcommands specify the stratification (group) and time variables. The command presents the
pooled model and one-way group effect model as well, but reports an incorrect intercept for the
two-way fixed effect model, 12.667 (2.081). The pooled OLS and fixed group effect parts of the
output are skipped below since they are redundant.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group and Period Effects |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:27:40PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Estd. Autocorrelation of e(i,t) .651825 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
| Panel: Prds: Empty 0, Valid data 15 |
| Smallest 0, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3052 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0011 .56046016
Constant| 12.6665675 2.08107166 6.087 .0000

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -90.48804 .3936109461D+02 .6548513 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 130.08647 .2926207777D+00 .9974341 |
|(5) X ind.&time effects 152.74790 .1768479062D+00 .9984493 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 95.740 5 .00000 31.875 5 84 .00000 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 536.889 8 .00000 3935.818 8 81 .00000 |
|(4) vs (2) 441.149 3 .00000 3604.832 3 81 .00000 |
|(4) vs (3) 136.633 5 .00000 57.733 5 81 .00000 |
|(5) vs (4) 45.323 14 .00004 3.133 14 67 .00085 |
|(5) vs (3) 181.956 20 .00000 21.947 20 67 .00000 |
+--------------------------------------------------------------------+

6.10 Testing Two-way Fixed Effects

The null hypothesis is that the parameters of the group and time dummies are all zero:
$H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$. The F test compares
the pooled regression and the two-way fixed group and time effect model. The F statistic of
23.1085 rejects the null hypothesis at the .01 significance level (p<.0001).

$$\frac{(1.3354 - .1768)/(6 + 15 - 2)}{(.1768)/(6 \times 15 - 6 - 15 - 3 + 1)} \sim 23.1085\,[19, 67]$$


The SAS TSCSREG and PANEL procedures conduct this F-test for the group and time effects.
You may also run the following SAS REG procedure and Stata .regress command to perform
the same test. The Stata output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
TEST g1=g2=g3=g4=g5=t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

Test 1 Results for Dependent Variable cost

Mean
Source DF Square F Value Pr > F

Numerator 19 0.06098 23.10 <.0001
Denominator 67 0.00264

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1 g2 g3 g4 g5 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14
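
The joint F statistic for the group and time effects can be reproduced from the SSEs of the pooled OLS and the two-way fixed effect model. The Python sketch below is only an arithmetic cross-check under the assumption that the SSEs are those printed in the LIMDEP output of 6.9 (1.335449522 and .1768479062); the helper f_statistic is a hypothetical name. It also shows that the value 23.1085 comes from SSEs rounded to four digits, whereas the unrounded SSEs give the 23.10 that SAS reports.

```python
def f_statistic(sse_restricted, sse_unrestricted, df_num, df_den):
    """F statistic for two nested linear models, computed from their SSEs."""
    return ((sse_restricted - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)

n, T, k = 6, 15, 3                 # airlines, years, regressors (output, fuel, load)
df_num = (n - 1) + (T - 1)         # 5 group dummies + 14 time dummies
df_den = n * T - n - T - k + 1     # residual degrees of freedom of the two-way model

# SSEs rounded to four digits, as in the hand calculation
f_rounded = f_statistic(1.3354, 0.1768, df_num, df_den)

# Unrounded SSEs of the pooled OLS and the two-way fixed effect model
f_full = f_statistic(1.335449522, 0.1768479062, df_num, df_den)

print(df_num, df_den)              # 19 67
print(round(f_rounded, 4))         # 23.1085
print(round(f_full, 2))            # 23.1
```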
7. Random Effect Models

A random effect model examines how group and/or time affect error variances. This model is
appropriate for n individuals who were drawn randomly from a large population. This chapter
focuses on the feasible generalized least squares (FGLS) with variance component estimation
methods.
10


7.1 One-way Random Group Effect Model

When the omega matrix is not known, you have to estimate $\theta$ using the SSEs of the between
group effect model (.0317) and the fixed group effect model (.2926).

The variance component of error $\hat{\sigma}_v^2$ is .00361263 = .292622872/(6*15-6-3)
The variance component of group $\hat{\sigma}_u^2$ is .01559712 = .031675926/(6-4) - .00361263/15

Thus, $\hat{\theta}$ is

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.00361263}{15 \times .031675926/(6-4)}} = .87668488,$$

where $\hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K} = \frac{.031675926}{6-4} = .01583796$.

Next, transform the dependent and independent variables including the intercept using
$\hat{\theta}$.

. gen rg_cost = cost - .87668488*gm_cost
. gen rg_output = output - .87668488*gm_output
. gen rg_fuel = fuel - .87668488*gm_fuel
. gen rg_load = load - .87668488*gm_load
. gen rg_int = 1 - .87668488 // for the intercept
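
The computation of the variance components and theta, and the transformation carried out by the .gen commands, can be sketched in Python as follows. This is an illustration under stated assumptions, not a replacement for the Stata steps: the function names random_effect_theta and quasi_demean are hypothetical, the panel is assumed to be balanced, and the four-observation toy data set is made up solely to show the transformation.

```python
import math

def random_effect_theta(sse_within, sse_between, n, T, k):
    """Return (sigma_v^2, sigma_u^2, theta) for a balanced one-way random effect model."""
    sigma2_v = sse_within / (n * T - n - k)        # error variance from the fixed effect model
    sigma2_between = sse_between / (n - (k + 1))   # between model: k slopes plus an intercept
    sigma2_u = sigma2_between - sigma2_v / T       # variance component of the group effect
    theta = 1 - math.sqrt(sigma2_v / (T * sigma2_between))
    return sigma2_v, sigma2_u, theta

def quasi_demean(values, groups, theta):
    """Subtract theta times the group mean from each value (cost - theta*gm_cost)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - theta * sums[g] / counts[g] for v, g in zip(values, groups)]

# SSEs of the fixed group effect model and the between group effect model
sigma2_v, sigma2_u, theta = random_effect_theta(0.292622872, 0.031675926, n=6, T=15, k=3)
print(round(sigma2_v, 8), round(sigma2_u, 8), round(theta, 8))

# Hypothetical toy data: two groups with means 2 and 12
toy_cost = [1.0, 3.0, 10.0, 14.0]
toy_airline = [1, 1, 2, 2]
print(quasi_demean(toy_cost, toy_airline, theta=0.5))   # [0.0, 2.0, 4.0, 8.0]
```

With theta equal to 0 the transformation leaves the data unchanged (pooled OLS), and with theta equal to 1 it subtracts the full group mean (the within transformation of the fixed effect model); the estimated theta of about .8767 lies between the two.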

Finally, run the OLS with the transformed variables. Do not forget to suppress the intercept.
This is the groupwise heteroscedastic regression model (Greene 2003).

. regress rg_cost rg_int rg_output rg_fuel rg_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 4, 86) =19642.72
Model | 284.670313 4 71.1675783 Prob > F = 0.0000
Residual | .311586777 86 .003623102 R-squared = 0.9989
-------------+------------------------------ Adj R-squared = 0.9989
Total | 284.9819 90 3.16646556 Root MSE = .06019

------------------------------------------------------------------------------
rg_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rg_int | 9.627911 .2101638 45.81 0.000 9.210119 10.0457
rg_output | .9066808 .0256249 35.38 0.000 .8557401 .9576215
rg_fuel | .4227784 .0140248 30.15 0.000 .394898 .4506587
rg_load | -1.0645 .2000703 -5.32 0.000 -1.462226 -.6667731
------------------------------------------------------------------------------

10
Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a modified Wallace and
Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora method, and Henderson’s method III.
They also discuss maximum likelihood (ML) estimators, restricted ML estimators, minimum norm quadratic
unbiased estimators (MINQUE), and minimum variance quadratic unbiased estimators (MIVQUE). Based on a
Monte Carlo simulation, they argue that ANOVA estimators are Best Quadratic Unbiased estimators of the
variance components for the balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are
recommended for the unbalanced models.

7.2 Estimations in SAS, Stata, and LIMDEP

In SAS, the TSCSREG and PANEL procedures have the /RANONE option to fit the one-way
random effect model. These procedures by default use the Fuller and Battese (1974) estimation
method, which produces slightly different estimates from FGLS.

PROC PANEL has the /VCOMP=WK option for the Wansbeek and Kapteyn (1989) method,
which is the groupwise heteroscedastic regression. The BP option of the MODEL statement,
not available in PROC TSCSREG, conducts the Breusch-Pagan LM test for random effects.
Unlike PROC PANEL, PROC TSCSREG does not have the VCOMP= option to specify the
variance component estimation method.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE BP VCOMP=WK;
RUN;

The PANEL Procedure
Wansbeek and Kapteyn Variance Components (RanOne)

Dependent Variable: cost

Model Description

Estimation Method RanOne
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.3111 DFE 86
MSE 0.0036 Root MSE 0.0601
R-Square 0.9923


Variance Component Estimates

Variance Component for Cross Sections 0.016015
Variance Component for Error 0.003613


Hausman Test for
Random Effects

DF m Value Pr > m

2 1.63 0.4429


Breusch Pagan Test for Random
Effects (One Way)

DF m Value Pr > m

1 334.85 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.629513 0.2107 45.71 <.0001
output 1 0.906918 0.0257 35.30 <.0001
fuel 1 0.422676 0.0140 30.11 <.0001
load 1 -1.06452 0.2000 -5.32 <.0001

PROC PANEL and PROC TSCSREG estimate the same variance component for error (.0036)
but different variance components for groups (.0160 versus .4744). Notice that the output of
PROC TSCSREG (variance component estimates and Hausman test) differs somewhat between
SAS 9.2 and 9.1.3.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE;
RUN;

(output is skipped)

Alternatively, you may use PROC MIXED to get similar results (its output below is based on the
restricted log likelihood). The following script returns a set of random effect estimates. Unlike
SAS 9.1.3, SAS 9.2 requires the CLASS statement to explicitly specify an effect variable,
airline in this case.

PROC MIXED DATA=masil.airline;
CLASS airline;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) airline 0.01674
Residual 0.003609


Fit Statistics

-2 Res Log Likelihood -210.4
AIC (smaller is better) -206.4
AICC (smaller is better) -206.3
BIC (smaller is better) -206.8


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 107.49 <.0001


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.6322 0.2116 5 45.53 <.0001
output 0.9073 0.02581 81 35.16 <.0001
fuel 0.4225 0.01406 81 30.05 <.0001
load -1.0646 0.1998 81 -5.33 <.0001


Solution for Random Effects

Std Err
Effect airline Estimate Pred DF t Value Pr > |t|

Intercept 1 0.01012 0.06594 81 0.15 0.8784
Intercept 2 -0.03450 0.06239 81 -0.55 0.5818
Intercept 3 -0.2106 0.05507 81 -3.82 0.0003
Intercept 4 0.1691 0.05581 81 3.03 0.0033
Intercept 5 0.002981 0.06180 81 0.05 0.9616
Intercept 6 0.06291 0.06349 81 0.99 0.3247


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 81 1235.88 <.0001
fuel 1 81 903.03 <.0001
load 1 81 28.40 <.0001

In Stata, the .xtreg command has the re option to produce FGLS estimates. Let us specify
airline as a panel identification variable using the .iis command. The theta option reports
an estimated theta (.8767).

. iis airline

. xtreg cost output fuel load, re theta

Random-effects GLS regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9925 Obs per group: min = 15
between = 0.9856 avg = 15.0
overall = 0.9876 max = 15

Random effects u_i ~ Gaussian Wald chi2(3) = 11091.33
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .87668503

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9066805 .025625 35.38 0.000 .8564565 .9569045
fuel | .4227784 .0140248 30.15 0.000 .3952904 .4502665
load | -1.064499 .2000703 -5.32 0.000 -1.456629 -.672368
_cons | 9.627909 .210164 45.81 0.000 9.215995 10.03982
-------------+----------------------------------------------------------------
sigma_u | .12488859
sigma_e | .06010514
rho | .81193816 (fraction of variance due to u_i)
------------------------------------------------------------------------------

The sigma_u and sigma_e are square roots of the variance components for groups and errors
(.0156=.1249^2, .0036=.0601^2).
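
The relationships among sigma_u, sigma_e, rho, and theta in the .xtreg output can be verified with a few lines of arithmetic. The Python sketch below is an illustrative check only; it assumes a balanced panel with T=15 and uses the theta formula of 7.1. The variable names are chosen here for illustration.

```python
import math

T = 15                  # time periods per airline (balanced panel)
sigma_u = 0.12488859    # sigma_u from the .xtreg output
sigma_e = 0.06010514    # sigma_e from the .xtreg output

var_u = sigma_u ** 2    # variance component for groups
var_e = sigma_e ** 2    # variance component for errors

# Fraction of the total variance due to the group effect u_i
rho = var_u / (var_u + var_e)

# theta used in the FGLS transformation
theta = 1 - math.sqrt(var_e / (T * var_u + var_e))

print(round(var_u, 4), round(var_e, 4))   # 0.0156 0.0036
print(round(rho, 4), round(theta, 4))     # 0.8119 0.8767
```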

Alternatively, .xtmixed fits the same model, the random-intercept model, by restricted maximum
likelihood (REML). The || airline: part tells Stata to use airline as the subject (group) variable.
The standard deviations of the group and error components are reported under the labels
sd(_cons) and sd(Residual); squaring them gives the variance components (.0167=.1294^2,
.0036=.0601^2).

. xtmixed cost output fuel load || airline:,

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log restricted-likelihood = 105.20458
Iteration 1: log restricted-likelihood = 105.20458

Computing standard errors:

Mixed-effects REML regression Number of obs = 90
Group variable: airline Number of groups = 6

Obs per group: min = 15
avg = 15.0
max = 15


Wald chi2(3) = 11114.85
Log restricted-likelihood = 105.20458 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9073166 .025809 35.16 0.000 .856732 .9579013
fuel | .4225032 .0140598 30.05 0.000 .3949465 .45006
load | -1.064572 .1997763 -5.33 0.000 -1.456126 -.6730179
_cons | 9.632212 .211559 45.53 0.000 9.217564 10.04686
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
airline: Identity |
sd(_cons) | .1293723 .0429029 .0675403 .2478107
-----------------------------+------------------------------------------------
sd(Residual) | .0600715 .0047138 .051508 .0700588
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 107.49 Prob >= chibar2 = 0.0000

You may use maximum likelihood estimation to fit the random effect (or random intercept)
model. In SAS, add METHOD=ML to PROC MIXED. PROC PANEL and PROC TSCSREG do not
have such an option.

PROC MIXED DATA=masil.airline METHOD=ML;
CLASS airline;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) airline 0.01302
Residual 0.003494


Fit Statistics

-2 Log Likelihood -229.5
AIC (smaller is better) -217.5
AICC (smaller is better) -216.4
BIC (smaller is better) -218.7


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 105.92 <.0001


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.6186 0.2026 5 47.47 <.0001
output 0.9053 0.02466 81 36.72 <.0001
fuel 0.4234 0.01364 81 31.05 <.0001
load -1.0645 0.1962 81 -5.42 <.0001


Solution for Random Effects

Std Err
Effect airline Estimate Pred DF t Value Pr > |t|

Intercept 1 0.01306 0.05994 81 0.22 0.8281
Intercept 2 -0.03211 0.05640 81 -0.57 0.5707
Intercept 3 -0.2094 0.04900 81 -4.27 <.0001
Intercept 4 0.1676 0.04976 81 3.37 0.0012
Intercept 5 0.000761 0.05580 81 0.01 0.9892
Intercept 6 0.06008 0.05750 81 1.04 0.2992


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 81 1348.19 <.0001
fuel 1 81 963.88 <.0001
load 1 81 29.43 <.0001

In Stata, the mle option is used in .xtreg and .xtmixed commands to produce the same result.
You may also try .xtgls, which fits panel data models with heteroscedasticity across groups and
autocorrelation within groups. Notice that the variance components for groups and errors are
computed as .0130=.1141^2 and .0035=.0591^2. Compare the output of PROC MIXED above and
.xtreg below.

. xtreg cost output fuel load, re mle

Random-effects ML regression Number of obs = 90
Group variable: airline Number of groups = 6

Random effects u_i ~ Gaussian Obs per group: min = 15
avg = 15.0
max = 15

LR chi2(3) = 436.32
Log likelihood = 114.72896 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9053099 .0253759 35.68 0.000 .8555741 .9550458
fuel | .4233757 .013888 30.48 0.000 .3961557 .4505957
load | -1.064456 .196231 -5.42 0.000 -1.449062 -.6798506
_cons | 9.618648 .206622 46.55 0.000 9.213677 10.02362
-------------+----------------------------------------------------------------
/sigma_u | .1140843 .0345293 .0630373 .2064687
/sigma_e | .0591072 .0045701 .0507956 .0687787
rho | .7883772 .1047419 .5365302 .9344669
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 105.92 Prob>=chibar2 = 0.000

. xtmixed cost output fuel load || airline:, mle
(output is skipped)

. xtgls cost output fuel load, i(airline) panels(hetero) corr(independent)
(output is skipped)
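
One way to compare the SAS and Stata output is to convert the Stata log likelihoods into the fit statistics that PROC MIXED prints. The Python sketch below is an arithmetic illustration only. The parameter counts used for AIC are an assumption inferred from the printed values (four fixed effects plus two variance parameters under ML, and the two variance parameters alone under REML), not a statement of how either package is documented to work.

```python
# Log likelihoods reported by Stata
loglik_ml = 114.72896      # .xtreg ..., re mle
loglik_reml = 105.20458    # .xtmixed (REML)

# PROC MIXED reports -2 times the (restricted) log likelihood
minus2ll_ml = -2 * loglik_ml
minus2ll_reml = -2 * loglik_reml

# AIC = -2LL + 2p; the parameter counts p below are assumptions
aic_ml = minus2ll_ml + 2 * 6       # 4 fixed effects + 2 variance parameters
aic_reml = minus2ll_reml + 2 * 2   # 2 variance parameters only

print(round(minus2ll_ml, 1), round(aic_ml, 1))       # -229.5 -217.5
print(round(minus2ll_reml, 1), round(aic_reml, 1))   # -210.4 -206.4
```

These values agree with the "-2 Log Likelihood" and AIC lines of the two PROC MIXED outputs above, which suggests that the two packages maximize the same likelihoods.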

In LIMDEP, you have to specify Panel, Random Effect, and Het= subcommands for the
groupwise heteroscedastic model. LIMDEP estimates a slightly different variance component
for groups (.0119), thus producing different parameter estimates.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Het=AIRLINE;Random Effect$

+----------------------------------------------------+
| OLS Without Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 08:26:15PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 4 |
| Degrees of freedom = 86 |
| Residuals Sum of squares = 1.335450 |
| Standard error of e = .1246133 |
| Fit R-squared = .9882897 |
| Adjusted R-squared = .9878812 |
| Model test F[ 3, 86] (prob) =2419.33 (.0000) |
| Diagnostic Log likelihood = 61.76991 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 3] (prob) = 400.26 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.121594 |
| Akaike Info. Criter. = -4.121653 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel Data Analysis of COST [ONE way] |
| Unconditional ANOVA (No regressors) |
| Source Variation Deg. Free. Mean Square |
| Between 74.6799 5. 14.9360 |
| Residual 39.3611 84. .468584 |
| Total 114.041 89. 1.28136 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88273863 .01325455 66.599 .0000 -1.17430918
FUEL | .45397771 .02030424 22.359 .0000 12.7703592
LOAD | -1.62750780 .34530293 -4.713 .0000 .56046016
Constant| 9.51691223 .22924522 41.514 .0000

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
+----------------------------------------------------+

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i) |
| Estimates: Var[e] = .361260D-02 |
| Var[u] = .119159D-01 |
| Corr[v(i,t),v(i,s)] = .767356 |
| Lagrange Multiplier Test vs. Model (3) = 334.85 |
| ( 1 df, prob value = .000000) |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic = 334.85 |
| Sum of Squares .147779D+01 |
| R-squared .987042D+00 |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .90412380 .02461548 36.730 .0000 -1.17430918
FUEL | .42389869 .01374650 30.837 .0000 12.7703592
LOAD | -1.06455866 .19933132 -5.341 .0000 .56046016
Constant| 9.61063438 .20277404 47.396 .0000

7.3 One-way Random Time Effect Model

Let us compute $\hat{\theta}$ using the SSEs of the between time effect model (.0056) and the
fixed time effect model (1.0882).

The variance component for error $\hat{\sigma}_v^2$ is .01511375 = 1.08819022/(15*6-15-3)
The variance component for time $\hat{\sigma}_u^2$ is -.00201072 = .005590631/(15-4) - .01511375/6

The $\hat{\theta}$ is

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{n\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.01511375}{6 \times .005590631/(15-4)}} = -1.226263$$


. gen rt_cost = cost - (-1.226263)*tm_cost
. gen rt_output = output - (-1.226263)*tm_output
. gen rt_fuel = fuel - (-1.226263)*tm_fuel
. gen rt_load = load - (-1.226263)*tm_load
. gen rt_int = 1 - (-1.226263) // for the intercept

. regress rt_cost rt_int rt_output rt_fuel rt_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 4, 86) = .
Model | 79944.1804 4 19986.0451 Prob > F = 0.0000
Residual | 1.79271995 86 .020845581 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 79945.9732 90 888.288591 Root MSE = .14438

------------------------------------------------------------------------------
rt_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rt_int | 9.516098 .1489281 63.90 0.000 9.220038 9.812157
rt_output | .8883838 .0143338 61.98 0.000 .8598891 .9168785
rt_fuel | .4392731 .0129051 34.04 0.000 .4136186 .4649277
rt_load | -1.279176 .2482869 -5.15 0.000 -1.772754 -.7855982
------------------------------------------------------------------------------

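However, a variance cannot be negative, so the negative estimate of the variance component for
time is not admissible. SAS and Stata report this variance component as zero in the output below.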
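
The negative variance component and the effect of truncating it at zero can be illustrated with the Python sketch below. It is a hedged illustration: the truncation step is an assumption made here to mirror the zero variance component that SAS and Stata print, not a description of their internal algorithms, and the variable names are hypothetical.

```python
import math

n, T, k = 6, 15, 3
sse_within_time = 1.08819022     # SSE of the fixed time effect model
sse_between_time = 0.005590631   # SSE of the between time effect model

sigma2_v = sse_within_time / (T * n - T - k)        # error variance
sigma2_between = sse_between_time / (T - (k + 1))   # variance of the between time effect model
sigma2_u = sigma2_between - sigma2_v / n            # variance component for time (negative here)

# theta computed directly from the between variance, as in the hand calculation
theta_raw = 1 - math.sqrt(sigma2_v / (n * sigma2_between))

# Assumption: truncate the negative variance component at zero
sigma2_u_trunc = max(sigma2_u, 0.0)
theta_trunc = 1 - math.sqrt(sigma2_v / (n * sigma2_u_trunc + sigma2_v))

print(round(sigma2_u, 8), round(theta_raw, 6))   # negative variance and theta
print(theta_trunc)                               # 0.0
```

A theta of zero means that no fraction of the time means is subtracted, so the random time effect estimates reduce to the pooled OLS estimates.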

In SAS, use the TSCSREG or PANEL procedure with the /RANONE option. Notice that the
data are sorted by year and airline. The /VCOMP=WH option in the MODEL statement
employs Wallace and Hussain’s method to estimate variance components and produces the
same parameter estimates.

PROC SORT DATA=masil.airline;
BY year airline;

PROC TSCSREG DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /RANONE;
RUN;
(Output is skipped)

PROC PANEL DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /RANONE BP VCOMP=WH;
RUN;

The PANEL Procedure
Wallace and Hussain Variance Components (RanOne)

Dependent Variable: cost

Model Description

Estimation Method RanOne
Number of Cross Sections 15
Time Series Length 6


Fit Statistics

SSE 1.3354 DFE 86
MSE 0.0155 Root MSE 0.1246
R-Square 0.9883


Variance Component Estimates

Variance Component for Cross Sections 0
Variance Component for Error 0.016437


Hausman Test for
Random Effects

DF m Value Pr > m

2 12.17 0.0023


Breusch Pagan Test for Random
Effects (One Way)

DF m Value Pr > m

1 1.55 0.2135


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.516923 0.2292 41.51 <.0001
output 1 0.882739 0.0133 66.60 <.0001
fuel 1 0.453977 0.0203 22.36 <.0001
load 1 -1.62751 0.3453 -4.71 <.0001

PROC MIXED fits the same random time effect model although /SOLUTION in the
RANDOM statement does not work to produce random effect parameter estimates in this case.

PROC MIXED DATA=masil.airline;
CLASS year;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=year TYPE=UN;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) year 0
Residual 0.01553


Fit Statistics

-2 Res Log Likelihood -102.9
AIC (smaller is better) -100.9
AICC (smaller is better) -100.9
BIC (smaller is better) -100.2


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

0 0.00 1.0000


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.5169 0.2292 14 41.51 <.0001
output 0.8827 0.01325 72 66.60 <.0001
fuel 0.4540 0.02030 72 22.36 <.0001
load -1.6275 0.3453 72 -4.71 <.0001


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 72 4435.44 <.0001
fuel 1 72 499.92 <.0001
load 1 72 22.22 <.0001

In Stata, you have to switch group and time variables using the .tsset command.

. tsset year airline
panel variable: year (strongly balanced)
time variable: airline, 1 to 6
delta: 1 unit

. xtreg cost output fuel load, re i(year) theta

Random-effects GLS regression Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9843 Obs per group: min = 6
between = 0.9966 avg = 6.0
overall = 0.9883 max = 6

Random effects u_i ~ Gaussian Wald chi2(3) = 7258.03
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = 0

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8827385 .0132545 66.60 0.000 .8567602 .9087169
fuel | .453977 .0203042 22.36 0.000 .4141815 .4937724
load | -1.62751 .345302 -4.71 0.000 -2.30429 -.9507309
_cons | 9.516923 .2292445 41.51 0.000 9.067612 9.966233
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .12293801
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------

You may run the following command to get the same result.

. xtmixed cost output fuel load || year:,
(output is skipped)

In LIMDEP, you need to use the Str= and Random subcommands. The output below includes
only the random effect part. You may find that the parameter estimates of SAS, Stata, and
LIMDEP are slightly different from each other.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Het=YEAR;Random$

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 15 |
| Smallest 6, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i) |
| Estimates: Var[e] = .151138D-01 |
| Var[u] = .414686D-03 |
| Corr[v(i,t),v(i,s)] = .026705 |
| Lagrange Multiplier Test vs. Model (3) = 1.55 |
| ( 1 df, prob value = .213557) |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic = 1.55 |
| Sum of Squares .133564D+01 |
| R-squared .988288D+00 |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88285277 .01314515 67.162 .0000 -1.17430918
FUEL | .45500533 .02122856 21.434 .0000 12.7703592
LOAD | -1.66267268 .35084190 -4.739 .0000 .56046016
Constant| 9.52363173 .24108843 39.503 .0000

7.4 Two-way Random Effect Model in SAS

The random group and time effect model is formulated as
$y_{it} = \alpha + X_{it}'\beta + u_i + \gamma_t + \varepsilon_{it}$. Let us first estimate the
two-way FGLS using the SAS PANEL procedure with the /RANTWO option. The BP2 option conducts
the Breusch-Pagan LM test for the two-way random effect model.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANTWO;
RUN;
(Output is skipped)

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANTWO BP2;
RUN;

The PANEL Procedure
Fuller and Battese Variance Components (RanTwo)

Dependent Variable: cost

Model Description

Estimation Method RanTwo
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.2322 DFE 86
MSE 0.0027 Root MSE 0.0520
R-Square 0.9829


Variance Component Estimates

Variance Component for Cross Sections 0.017439
Variance Component for Time Series 0.001081
Variance Component for Error 0.00264


Hausman Test for
Random Effects

DF m Value Pr > m

3 6.93 0.0741

Breusch Pagan Test for Random
Effects (Two Way)

DF m Value Pr > m

2 336.40 <.0001

Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.362677 0.2440 38.38 <.0001
output 1 0.866448 0.0255 33.98 <.0001
fuel 1 0.436163 0.0172 25.41 <.0001
load 1 -0.98053 0.2235 -4.39 <.0001

The following .xtmixed command suffers from a convergence problem in this case, and the
LIMDEP command produces different results (output is skipped).

. xtmixed cost output fuel load || airline: || year:, mle

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Random Effect$

7.5 Testing Random Effect Models

The Breusch-Pagan Lagrange multiplier (LM) test is designed to test random effects. The null
hypothesis of the one-way random group effect model is that the individual-specific error
variance is zero: $H_0: \sigma_u^2 = 0$. If the null hypothesis is not rejected, the pooled
regression model is appropriate. The $e'e$ of the pooled OLS is 1.33544153 and
$\bar{e}'\bar{e}$ is .0665147.

LM is $334.8496 = \frac{6 \times 15}{2(15-1)} \left[ \frac{15^2 \times .0665}{1.3354} - 1 \right]^2 \sim \chi^2(1)$ with p<.0001.

With the large chi-squared of 334.8496, we reject the null hypothesis in favor of the random
group effect model. The SAS PANEL procedure with the /BP option and the LIMDEP Panel
and Het subcommands report the same LM statistic (see 7.2). In Stata, run the .xttest0
command right after estimating the one-way random group effect model.

. quietly xtreg cost output fuel load, re i(airline)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

cost[airline,t] = Xb + u[airline] + e[airline,t]

Estimated results:
| Var sd = sqrt(Var)
---------+-----------------------------
cost | 1.281358 1.131971
e | .0036126 .0601051
u | .0155972 .1248886

Test: Var(u) = 0
chi2(1) = 334.85
Prob > chi2 = 0.0000
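
The LM arithmetic for the group effect can also be checked outside the statistical packages. The following Python sketch plugs the two sums of squares reported in this section into the one-way formula. The function names `bp_lm_group` and `chi2_sf_1df` and their argument names are illustrative assumptions, not part of any package; the chi-squared(1) p-value relies on the identity P(chi2(1) > x) = erfc(sqrt(x/2)).

```python
import math

def bp_lm_group(n, T, mean_resid_ss, pooled_sse):
    """Breusch-Pagan LM statistic for a one-way random group effect (balanced panel).

    n, T          : number of groups and time periods
    mean_resid_ss : sum of squared group means of the pooled OLS residuals
    pooled_sse    : e'e of the pooled OLS regression
    """
    ratio = (T ** 2) * mean_resid_ss / pooled_sse
    return n * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Values reported in the text (6 airlines, 15 years)
lm_group = bp_lm_group(n=6, T=15, mean_resid_ss=0.0665147, pooled_sse=1.33544153)
p_group = chi2_sf_1df(lm_group)
print(round(lm_group, 2), p_group < 0.0001)
```

The result should agree with the chi2(1) value that .xttest0 prints, up to rounding.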

The null hypothesis of the one-way random time effect is that variance components for time are
zero, $H_0: \sigma_u^2 = 0$. The following LM test uses Baltagi's formula. The small chi-squared of
1.5472 does not reject the null hypothesis even at the .10 level. SAS and LIMDEP return the same
LM statistic (see 7.3).

LM is $\frac{Tn}{2(n-1)}\left[\frac{\sum_t (n\bar{e}_{\cdot t})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2 = \frac{15 \times 6}{2(6-1)}\left[\frac{.7817}{1.3354} - 1\right]^2 = 1.5472 \sim \chi^2(1)$ with p<.2135

. quietly xtreg cost output fuel load, re i(year)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

cost[year,t] = Xb + u[year] + e[year,t]

Estimated results:
| Var sd = sqrt(Var)
---------+-----------------------------
cost | 1.281358 1.131971
e | .0151138 .122938
u | 0 0

Test: Var(u) = 0
chi2(1) = 1.55
Prob > chi2 = 0.2135

The two-way random effect model has the null hypothesis that variance components for
groups and time are all zero. The LM statistic with two degrees of freedom is 336.3968 =
334.8496 + 1.5472 (p<.0001).
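
Because the two-way LM statistic is the sum of the two one-way statistics and is compared with a chi-squared distribution with two degrees of freedom, its p-value has the closed form exp(-x/2). A short Python check using the statistics quoted in this section (the variable names are illustrative only):

```python
import math

lm_group = 334.8496   # one-way LM for the group effect, as reported in the text
lm_time = 1.5472      # one-way LM for the time effect, as reported in the text

# Two-way LM statistic: sum of the two one-way statistics, chi-squared with 2 df
lm_two_way = lm_group + lm_time

# Upper-tail probabilities: chi2(1) via erfc, chi2(2) via exp
p_time = math.erfc(math.sqrt(lm_time / 2.0))
p_two_way = math.exp(-lm_two_way / 2.0)

print(round(lm_two_way, 4), round(p_time, 4), p_two_way < 0.0001)
```

The time-effect p-value should match the Prob > chi2 that .xttest0 reports for the random time effect model.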

7.6 Fixed Effects versus Random Effects

How do we compare a fixed effect model and its counterpart random effect model? The
Hausman specification test examines if the individual effects are uncorrelated with the other
regressors in the model. Since computation is complicated, let us conduct the test in Stata.

. tsset airline year
panel variable: airline (strongly balanced)
time variable: year, 1 to 15
delta: 1 unit

. quietly xtreg cost output fuel load, fe

. estimates store fixed_group

. quietly xtreg cost output fuel load, re

. hausman fixed_group .

---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fixed_group . Difference S.E.
-------------+----------------------------------------------------------------
output | .9192846 .9066805 .0126041 .0153877
fuel | .4174918 .4227784 -.0052867 .0058583
load | -1.070396 -1.064499 -.0058974 .0255088
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 2.12
Prob>chi2 = 0.5469
(V_b-V_B is not positive definite)

The Hausman statistic 2.12 is different from PROC PANEL’s 1.63 and Greene (2003)’s 4.16. This
is because SAS, Stata, and LIMDEP use different estimation methods and thus produce slightly
different parameter estimates. None of these tests, however, rejects the null hypothesis, so the
random effect model is favored.
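
The Hausman statistic is the quadratic form (b-B)'[(V_b-V_B)^(-1)](b-B) printed in the Stata output. The full covariance matrices are not shown there, so the Python sketch below uses only the reported differences and their standard errors. In other words, it assumes, purely as a simplification and not as Stata's method, that V_b-V_B is diagonal. It illustrates the structure of the statistic but does not reproduce the reported 2.12.

```python
# Differences (b - B) and sqrt(diag(V_b - V_B)) copied from the hausman output
diff = {"output": 0.0126041, "fuel": -0.0052867, "load": -0.0058974}
se = {"output": 0.0153877, "fuel": 0.0058583, "load": 0.0255088}

# Diagonal-only approximation: sum of squared standardized differences.
# ASSUMPTION: off-diagonal covariances are ignored, so this is not the full test.
h_diag = sum((diff[k] / se[k]) ** 2 for k in diff)
print(round(h_diag, 2))
```

The gap between this approximation and 2.12 comes from the covariances among the coefficient differences, which the full test takes into account.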

7.7 Summary

Table 7.1 summarizes random effect estimations in SAS, Stata, and LIMDEP. PROC PANEL
is highly recommended.

Table 7.1 Comparison of the Random Effect Model in SAS, Stata, LIMDEP*

                     SAS 9.2                                 Stata 11         LIMDEP 9
Procedure/Command    PROC TSCSREG        PROC PANEL          .xtreg           Regress; Panel$
One-way              /RANONE             /RANONE WK          re               Str=;Random$
Two-way              /RANTWO             /RANTWO             No               Str=;Period;Random$
SSE (e’e)            Slightly different  Correct             No               Incorrect
MSE or SEE           Slightly different  Correct             No               No
Model test (F)       No                  No                  Wald test        No
(adjusted) R^2       Slightly different  Slightly different  Incorrect        Incorrect
Intercept            Slightly different  Correct             Correct          Slightly different
Coefficients         Slightly different  Correct             Correct          Slightly different
Standard errors      Slightly different  Correct             Correct          Slightly different
Variance for group   Slightly different  Correct             Correct (sigma)  Slightly different
Variance for error   Correct             Correct             Correct (sigma)  Correct
Theta                No                  No                  theta            No
Breusch-Pagan (LM)   No                  BP, BP2             .xttest0         Yes
Hausman Test (H)     Incorrect           Yes                 .hausman         Yes (unstable)
* “Yes/No” means whether a software package reports the statistic. “Correct/incorrect” indicates whether the
statistics are different from those of the groupwise heteroscedastic regression.
8. Poolability Test

Table 8.1 summarizes the results of the pooled OLS, fixed effect, and random effect models. We
may ask, “Which model is better than the others?” Do we have to consider individual-specific
or time effects? Are these effects fixed or random?

Table 8.1 Summary of Pooled, Fixed Effect, and Random Effect Models
Model           Output    Fuel      Load       SSE/SEE  DF  F          R^2 (Adj.)
Pooled          .8827**   .4540**   -1.6275**  1.3354   86  2419.34    .9883
                (.0133)   (.0203)   (.3453)    (.1246)      (p<.0001)  (.9879)
Between group   .7825*    -5.5239   -1.7511    .0317     2  104.12     .9936
                (.1088)   (4.4787)  (2.7432)   (.1259)      (p<.0095)  (.9841)
Between time    1.1333**  .3342**   -1.3507**  .0056    11  4074.33    .9991
                (.0513)   (.0228)   (.2478)    (.0225)      (p<.0001)  (.9989)
Fixed group     .9193**   .4175**   -1.0704**  .2926    81  3935.79    .9974
                (.0299)   (.0152)   (.2017)    (.0601)      (p<.0001)  (.9972)
Fixed time      .8677**   -.4845    -1.9544**  1.0882   72  439.62     .9905
                (.0154)   (.3641)   (.4424)    (.1229)      (p<.0001)  (.9882)
Two-way fixed   .8173**   .1686     -.8828**   .1769    67  1960.82    .9984
                (.0319)   (.1635)   (.2617)    (.0514)      (p<.0001)  (.9979)
Random group    .9069**   .4227**   -1.0645**  .3111    86             .9923
                (.0257)   (.0140)   (.2000)    (.0601)
Random time     .8820**   .2749+    -2.0050**  1.1722   86             .9848
                (.0134)   (.0568)   (.4184)    (.1167)
Two-way random  .8664**   .4362**   -.9805**   .2322    86             .9829
                (.0255)   (.0172)   (.2235)    (.0520)


The poolability test examines whether data are poolable, that is, whether individual entities or
time periods share the same constant slopes of regressors. For the poolability test, you need to run
group by group OLS regressions and/or time by time OLS regressions. If the null hypothesis is
rejected, the panel data are not poolable. In this case, you may consider the random coefficient
model and hierarchical regression model.

8.1 Group by Group OLS Regression

In SAS, use the BY statement in PROC REG. Do not forget to sort the data set in advance.

PROC SORT DATA=masil.airline;
BY airline;

PROC REG DATA=masil.airline;
MODEL cost = output fuel load;
BY airline;
RUN;

In Stata, the if qualifier makes it easy to run group by group regressions.

forvalues i= 1(1)6 { // run group by group regression
display "OLS regression for group " `i'
regress cost output fuel load if airline==`i'
}

OLS regression for group 1

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 1843.46
Model | 3.41824348 3 1.13941449 Prob > F = 0.0000
Residual | .006798918 11 .000618083 R-squared = 0.9980
-------------+------------------------------ Adj R-squared = 0.9975
Total | 3.4250424 14 .244645886 Root MSE = .02486

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.18318 .0968946 12.21 0.000 .9699164 1.396444
fuel | .3865867 .0181946 21.25 0.000 .3465406 .4266329
load | -2.461629 .4013571 -6.13 0.000 -3.34501 -1.578248
_cons | 10.846 .2972551 36.49 0.000 10.19174 11.50025
------------------------------------------------------------------------------
OLS regression for group 2

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 3129.50
Model | 6.47622084 3 2.15874028 Prob > F = 0.0000
Residual | .007587838 11 .000689803 R-squared = 0.9988
-------------+------------------------------ Adj R-squared = 0.9985
Total | 6.48380868 14 .463129191 Root MSE = .02626

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.459104 .0792856 18.40 0.000 1.284597 1.63361
fuel | .3088958 .0272443 11.34 0.000 .2489315 .36886
load | -2.724785 .2376522 -11.47 0.000 -3.247854 -2.201716
_cons | 11.97243 .4320951 27.71 0.000 11.02139 12.92346
------------------------------------------------------------------------------
OLS regression for group 3

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 608.10
Model | 3.79286673 3 1.26428891 Prob > F = 0.0000
Residual | .022869767 11 .00207907 R-squared = 0.9940
-------------+------------------------------ Adj R-squared = 0.9924
Total | 3.8157365 14 .272552607 Root MSE = .0456

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .7268305 .1554418 4.68 0.001 .3847054 1.068956
fuel | .4515127 .0381103 11.85 0.000 .3676324 .5353929
load | -.7513069 .6105989 -1.23 0.244 -2.095226 .5926122
_cons | 8.699815 .8985786 9.68 0.000 6.722057 10.67757
------------------------------------------------------------------------------
OLS regression for group 4

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 777.86
Model | 7.37252558 3 2.45750853 Prob > F = 0.0000
Residual | .034752343 11 .003159304 R-squared = 0.9953
-------------+------------------------------ Adj R-squared = 0.9940
Total | 7.40727792 14 .52909128 Root MSE = .05621

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9353749 .0759266 12.32 0.000 .7682616 1.102488
fuel | .4637263 .044347 10.46 0.000 .3661192 .5613333
load | -.7756708 .4707826 -1.65 0.128 -1.811856 .2605148
_cons | 9.164608 .6023241 15.22 0.000 7.838902 10.49031
------------------------------------------------------------------------------
OLS regression for group 5

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 1999.89
Model | 7.08313716 3 2.36104572 Prob > F = 0.0000
Residual | .012986435 11 .001180585 R-squared = 0.9982
-------------+------------------------------ Adj R-squared = 0.9977
Total | 7.09612359 14 .506865971 Root MSE = .03436

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.076299 .0771255 13.96 0.000 .9065471 1.246051
fuel | .2920542 .0434213 6.73 0.000 .1964845 .3876239
load | -1.206847 .3336308 -3.62 0.004 -1.941163 -.4725305
_cons | 11.77079 .7430078 15.84 0.000 10.13544 13.40614
------------------------------------------------------------------------------
OLS regression for group 6

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 2602.49
Model | 11.1173565 3 3.70578551 Prob > F = 0.0000
Residual | .015663323 11 .001423938 R-squared = 0.9986
-------------+------------------------------ Adj R-squared = 0.9982
Total | 11.1330199 14 .795215705 Root MSE = .03774

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9673393 .0321728 30.07 0.000 .8965275 1.038151
fuel | .3023258 .0308235 9.81 0.000 .2344839 .3701678
load | .1050328 .4767508 0.22 0.830 -.9442886 1.154354
_cons | 10.77381 .4095921 26.30 0.000 9.872309 11.67532
------------------------------------------------------------------------------

8.2 Poolability Test across Groups

The null hypothesis of the poolability test across groups is $H_0: \beta_{ik} = \beta_k$. The $e'e$ is 1.3354,
the SSE of the pooled OLS regression. The $\sum e_i'e_i$ is .1007 = .0068 + .0076 + .0229 + .0348
+ .0130 + .0157.

The F statistic is $\frac{(1.3354 - .1007)/[(6-1)4]}{.1007/[6(15-4)]} = 40.4812 \sim F[20, 66]$ (computed from the unrounded sums of squares).

The large 40.4812 rejects the null hypothesis of poolability (p<.0001). We conclude that the
panel data are not poolable with respect to airline.
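
The poolability F statistic across groups can be reproduced from the residual sums of squares of the six airline regressions in 8.1. The Python sketch below is a minimal illustration; the function name `poolability_f` and its argument names are assumptions made for this example, and k = 4 counts the three regressors plus the intercept.

```python
def poolability_f(pooled_sse, separate_sse, n_splits, n_obs_each, k):
    """F statistic for poolability: pooled OLS vs. separate OLS per split.

    pooled_sse   : e'e of the pooled regression
    separate_sse : list of SSEs, one per group (or per time period)
    n_splits     : number of separate regressions (n groups or T periods)
    n_obs_each   : observations in each separate regression
    k            : parameters per regression, including the intercept
    """
    unrestricted = sum(separate_sse)
    df_num = (n_splits - 1) * k
    df_den = n_splits * (n_obs_each - k)
    f_stat = ((pooled_sse - unrestricted) / df_num) / (unrestricted / df_den)
    return f_stat, df_num, df_den

# Residual sums of squares from the six airline regressions in 8.1
group_sse = [0.006798918, 0.007587838, 0.022869767,
             0.034752343, 0.012986435, 0.015663323]
f_group, df1, df2 = poolability_f(1.33544153, group_sse,
                                  n_splits=6, n_obs_each=15, k=4)
print(round(f_group, 4), df1, df2)
```

Using the unrounded residual sums of squares matters here: the rounded values 1.3354 and .1007 give a slightly smaller F.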

8.3 Poolability Test over Time

The null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The sum of $e_t'e_t$ is
computed from the 15 time by time regressions.

forvalues i= 1(1)15 { // run year by year regression
display "OLS regression for year " `i'
regress cost output fuel load if year==`i'
}

(output is skipped)

. di .044807673 + .023093978 + .016506613 + .012170358 + .014104542 + ///
.000469826 + .063648817 + .085430285 + .049329439 + .077112957 + ///
.029913538 + .087240016 + .143348297 + .066075346 + .037256216

.7505079

The F statistic is $\frac{(1.3354 - .7505)/[(15-1)4]}{.7505/[15(6-4)]} = .4175 \sim F[56, 30]$.

The small F statistic does not reject the null hypothesis (p>.99), so the panel data are poolable
with respect to time.
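
The same arithmetic for the test over time can be sketched in Python from the 15 year-by-year residual sums of squares listed in the .di command. Variable names are illustrative, and k = 4 again counts the three regressors plus the intercept.

```python
# Residual sums of squares from the 15 year-by-year regressions (see the .di command)
year_sse = [0.044807673, 0.023093978, 0.016506613, 0.012170358, 0.014104542,
            0.000469826, 0.063648817, 0.085430285, 0.049329439, 0.077112957,
            0.029913538, 0.087240016, 0.143348297, 0.066075346, 0.037256216]

pooled_sse = 1.33544153      # e'e of the pooled OLS regression
T, n, k = 15, 6, 4           # periods, airlines, parameters per regression

unrestricted = sum(year_sse)
df_num = (T - 1) * k         # numerator degrees of freedom
df_den = T * (n - k)         # denominator degrees of freedom
f_time = ((pooled_sse - unrestricted) / df_num) / (unrestricted / df_den)
print(round(unrestricted, 7), round(f_time, 4), df_num, df_den)
```

With only six airlines per year, each yearly regression has just two residual degrees of freedom, which is why the denominator degrees of freedom are small.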

9. Conclusion

Panel data are analyzed to investigate group and time effects using fixed effect and random
effect models. The fixed effect model asks how group and/or time affect the intercept, while the
random effect model analyzes error variance structures affected by group and/or time. Slopes
are assumed unchanged in both fixed effect and random effect models.

A panel data set needs to be arranged in the long format as shown in Section 1.1. If the number
of groups (subjects) or time periods is extremely large, panel data models may be less useful
because the null hypothesis of F test is too strong. Then, you may consider categorizing
subjects to reduce the number of groups. If data are severely unbalanced, read output with
caution and consider dropping subjects with many missing data points. This document assumes
that data are balanced without missing values.

Fixed effect models are estimated by the least squares dummy variable (LSDV) regression and
the within effect model. LSDV has three approaches to avoid perfect multicollinearity. LSDV1
drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and
imposes restrictions instead. LSDV1 is commonly used since it produces correct statistics.
LSDV2 provides actual parameter estimates of groups (Y-intercepts), but reports incorrect $R^2$
and F statistic. Notice that the dummy parameters of the three LSDV approaches have different
meanings and thus lead to different t-tests.

The within effect model does not use dummy variables but deviations from group means. Thus,
this model is useful when there are many groups and/or time periods in the panel data set since
it is able to avoid the incidental parameter problem. The dummy parameter estimates need to be
computed afterward. Because of its larger degrees of freedom, the within effect model produces
incorrect MSE and standard errors of parameters. As a result, you need to adjust the standard
errors to conduct correct t-tests.
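
The standard-error adjustment for the within effect model is a degrees-of-freedom rescaling. OLS on group-demeaned data counts nT-k residual degrees of freedom (nT-k-1 if a constant is fitted), whereas the correct count also subtracts the n group means, giving nT-n-k. A minimal Python sketch follows; the function name and the unadjusted standard error of .0288 are hypothetical illustrations, not output reproduced from this document.

```python
import math

def adjust_within_se(se_within, n_obs, n_groups, k, constant_fitted=False):
    """Rescale a standard error from OLS on group-demeaned data.

    The naive residual df ignores the n_groups estimated group means;
    the correct residual df is n_obs - n_groups - k.
    """
    df_naive = n_obs - k - (1 if constant_fitted else 0)
    df_correct = n_obs - n_groups - k
    return se_within * math.sqrt(df_naive / df_correct)

# Airline panel: 90 observations, 6 groups, 3 regressors
factor = adjust_within_se(1.0, n_obs=90, n_groups=6, k=3)
adjusted = adjust_within_se(0.0288, n_obs=90, n_groups=6, k=3)  # hypothetical SE
print(round(factor, 4), round(adjusted, 4))
```

For the airline data the correct residual degrees of freedom are 90 - 6 - 3 = 81, the value listed for the fixed group effect model in Table 8.1.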

Random effect models are estimated by generalized least squares (GLS) and feasible
generalized least squares (FGLS). When the variance structure is known, GLS is used. If it is
unknown, FGLS estimates theta. Parameter estimates vary depending on estimation methods.
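
For a balanced one-way random group effect model, theta is commonly written as theta = 1 - sqrt(sigma_v^2 / (T*sigma_u^2 + sigma_v^2)), where sigma_v^2 is the idiosyncratic error variance (labeled e in Stata output) and sigma_u^2 is the group variance. The Python sketch below applies this standard textbook formula to the variance components printed in the .xttest0 output of Section 7.5 (Var(e) = .0036126, Var(u) = .0155972). The function name is an illustrative assumption.

```python
import math

def re_theta(var_u, var_e, T):
    """Theta for a balanced one-way random effect model.

    var_u : variance component for groups
    var_e : variance component for the idiosyncratic error
    T     : number of time periods per group
    """
    return 1.0 - math.sqrt(var_e / (T * var_u + var_e))

theta = re_theta(var_u=0.0155972, var_e=0.0036126, T=15)
print(round(theta, 4))
```

A theta near 1 means the FGLS transformation is close to full group demeaning, which is consistent with the random group estimates in Table 8.1 lying close to the fixed group estimates.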

Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange
multiplier test. The Hausman specification test compares a fixed effect model and a random
effect model. If the null hypothesis of no correlation is rejected, the fixed effect model is
preferred. Poolability is tested by running group by group or time by time regressions.

Among the four statistical packages addressed in this document, I would recommend SAS and
Stata. In particular, PROC PANEL provides various ways of analyzing panel data and reports
correct (adjusted) statistics (see Table 4.1 and 7.1). Stata is very handy for manipulating panel
data but reports incorrect F-test and $R^2$. LIMDEP is able to estimate various panel data models
but is not good at data management. SPSS is least recommended for panel data models.

Extensions to these basic linear panel data models include dynamic models with autocorrelation,
random coefficient models, hierarchical linear models, and logit/probit models.
Appendix: Data Sets

Data set 1: Data of the top 50 information technology firms presented in OECD Information
Technology Outlook 2004 (http://thesius.sourceoecd.org/).

URL: http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.csv
http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta

firm = IT company name
type = type of IT firm
rnd = 2002 R&D investment in current USD millions
income = 2000 net income in current USD millions
d1 = 1 for equipment and software firms and 0 for telecommunication and electronics

. tab type d1

| d1
Type of Firm | 0 1 | Total
----------------+----------------------+----------
Telecom | 18 0 | 18
Electronics | 17 0 | 17
IT Equipment | 0 6 | 6
Comm. Equipment | 0 5 | 5
Service & S/W | 0 4 | 4
----------------+----------------------+----------
Total | 35 15 | 50

. sum rnd income

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
rnd | 39 2023.564 1615.417 0 5490
income | 50 2509.78 3104.585 -732 11797


Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).

URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
http://www.indiana.edu/~statmath/stat/all/panel/airline.dta

airline = airline (six airlines)
year = year (fifteen years)
output0 = output in revenue passenger miles, index number
cost0 = total cost in $1,000
fuel0 = fuel price
load = load factor, the average capacity utilization of the fleet

. sum output0 cost0 fuel0 load

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
output0 | 90 .5449946 .5335865 .037682 1.93646
cost0 | 90 1122524 1192075 68978 4748320
fuel0 | 90 471683 329502.9 103795 1015610
load | 90 .5604602 .0527934 .432066 .676287
References

Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. John Wiley & Sons.
Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of
Alternative Estimators for the Unbalanced One-way Error Component Regression
Model." Journal of Econometrics, 62(2): 67-89.
Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to
Model Specification in Econometrics." Review of Economic Studies, 47(1):239-253.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and
Applications. New York: Cambridge University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. TX: Stata
Press.
Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC:
SAS Institute.
Fuller, Wayne A. and George E. Battese. 1973. "Transformations for Estimation of Linear
Models with Nested-Error Structure." Journal of the American Statistical
Association, 68(343) (September): 626-632.
Fuller, Wayne A. and George E. Battese. 1974. "Estimation of Linear Models with Crossed-
Error Structure." Journal of Econometrics, 2: 67-78.
Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide 1. Plainview,
New York: Econometric Software.
Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6):1251-1271.
SAS Institute. 2004. SAS/ETS 9.1 User’s Guide. Cary, NC: SAS Institute.
SAS Institute. 2004. SAS/STAT 9.1 User’s Guide. Cary, NC: SAS Institute.
SPSS Inc. 2007. SPSS 16.0 Command Syntax Reference. Chicago, IL: SPSS Inc.
Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.
Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College
Station, TX: Stata Press.
Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata
Press.
Suits, Daniel B. 1984. “Dummy Variables: Mechanics V. Interpretation.” Review of Economics
& Statistics 66 (1):177-180.
Uyar, Bulent, and Orhan Erdem. 1990. "Regression Procedures in SAS: Problems?" American
Statistician 44(4): 296-301.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel
Data. Cambridge, MA: MIT Press.








Acknowledgements

I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of
the School of Public and Environmental Affairs, Indiana University at Bloomington. I am also
grateful to Jeremy Albright, Dani Marinova, and Kevin Wilhite at the UITS Center for
Statistical and Mathematical Computing for comments and suggestions. A special thanks to
many readers around the world who have eagerly provided constructive feedback and
encouraged me to keep improving this document.


Revision History

- 2005.11 First draft
- 2008.04, 11 Corrected some errors and added Stata examples
- 2009.09 Second draft (updated LSDV section and analysis output)

while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled OLS regression (/POOLED) as well. PROC PANEL has BP and BP2 options to conduct the Breusch-Pagan LM test for random effects, while PROC TSCSREG does not.[2] Despite advanced features of PROC PANEL, the output of the two procedures is similar. PROC MIXED is also able to fit random effect and random coefficient (parameter) models and supports maximum likelihood estimation that is not available in PROC PANEL and TSCSREG.

The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a between effect model with be, and a random effect model with re. This command, however, does not directly fit two-way fixed and random effect models.[3] The .areg command with the absorb option, equivalent to the .xtreg with the fe option, fits the one-way within effect model that has a large dummy variable set. A random effect model can be also estimated using the .xtmixed command instead of .xtreg. For the two-way random effect model, you need to use the .xtmixed command. Stata has .xtgls that fits panel data models with heteroscedasticity across groups and/or autocorrelation within groups.

The LIMDEP Regress$ command with the Panel subcommand estimates panel data models. The Fixed effect subcommand fits a fixed effect model, Random effect estimates a random effect model, and Means is for a between effect model. SPSS has limited ability to analyze panel data.

1.4 Data Sets

This document uses two data sets. A cross-sectional data set contains research and development (R&D) expenditure data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004. A panel data set has cost data for U.S. airlines (1970-1984), which are used in Econometric Analysis (Greene 2003). See the Appendix for the details.

[2] BP and BP2 produce invalid Breusch-Pagan statistics in cases of unbalanced data. http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/etsug_panel_sect041.htm
[3] You may fit the two-way fixed effect model by including a set of dummies and using the fe option.

2. Least Squares Dummy Variable Regression

A dummy variable is a binary variable that is coded to either 1 or zero. It is commonly used to examine group and time effects in regression analysis. Consider a simple model of regressing R&D expenditure in 2002 on 2000 net income and firm type. The dummy variable d1 is set to 1 for equipment and software firms and zero for telecommunication and electronics. The variable d2 is coded in the opposite way. Take a look at the data structure (Figure 2.1).

Figure 2.1 Dummy Variable Coding for Firm Types

+-----------------------------------------------------------------+
| firm               rnd    income   type              d1    d2   |
|-----------------------------------------------------------------|
| LG Electronics     551       356   Electronics        0     1   |
| AT&T               254     4,669   Telecom            0     1   |
| IBM              4,750     8,093   IT Equipment       1     0   |
| Ericsson         4,424     2,300   Comm. Equipment    1     0   |
| Siemens          5,490     6,528   Electronics        0     1   |
| Verizon              .    11,797   Telecom            0     1   |
| Microsoft        3,772     9,421   Service & S/W      1     0   |
| …                    …         …   …                  …     …   |

2.1 Model 1 without a Dummy Variable: Pooled OLS

The ordinary least squares (OLS) regression without dummy variables, a pooled regression model, assumes a constant intercept and slope regardless of firm types. In the following regression equation, $\beta_0$ is the intercept, $\beta_1$ is the slope of net income in 2000, and $\varepsilon_i$ is the error term.

Model 1: $R\&D_i = \beta_0 + \beta_1 income_i + \varepsilon_i$

The pooled model fits the data well at the .05 significance level (F=7.07, p=.0115). R2 of .1604 says that this model accounts for 16 percent of the total variance. The model has the intercept of 1,482.697 and slope of .2231. For a $1 million increase in net income, a firm is likely to increase R&D expenditure by $.2231 million (p=.012).

. use http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta, clear
(R&D expenditure of IT firm (OECD 2002))

. regress rnd income

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  1,    37) =    7.07
       Model |  15902406.5     1  15902406.5           Prob > F      =  0.0115
    Residual |  83261299.1    37  2250305.38           R-squared     =  0.1604
-------------+------------------------------           Adj R-squared =  0.1377
       Total |  99163705.6    38   2609571.2           Root MSE      =  1500.1

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2230523   .0839066     2.66   0.012     .0530414    .3930632
       _cons |   1482.697   314.7957     4.71   0.000     844.8599    2120.533
------------------------------------------------------------------------------

Pooled model: R&D = 1,482.697 + .2231*income

Despite moderate goodness of fit statistics such as F and t, this is a naïve model. R&D investment tends to vary across industries.
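The fit statistics quoted for the pooled model can be re-derived from the sums of squares in the ANOVA part of the Stata output. The following is a minimal sketch in Python (not one of the four packages covered in this document); the variable names are illustrative, and the numbers are copied from the Model 1 output.

```python
import math

# Sums of squares and degrees of freedom copied from the Model 1 (pooled OLS) output.
ss_model, df_model = 15902406.5, 1
ss_resid, df_resid = 83261299.1, 37
ss_total = ss_model + ss_resid

# R-squared: share of the total variation explained by the model.
r2 = ss_model / ss_total
# Adjusted R-squared penalizes for the number of regressors.
adj_r2 = 1 - (1 - r2) * (df_model + df_resid) / df_resid
# Overall F statistic: mean square of the model over mean square error.
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
# Root MSE (called SEE in this document): square root of the mean square error.
root_mse = math.sqrt(ss_resid / df_resid)

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2), round(root_mse, 1))
```

The four values agree with R-squared, Adj R-squared, F, and Root MSE reported by .regress, which is a quick way to check that an output table has been read correctly.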
2.2 Model 2 with a Dummy Variable

You may assume that equipment and software firms have more R&D expenditure than other types of companies. Let us take this group difference into account.[4] We have to drop one of the two dummy variables in order to avoid perfect multicollinearity. That is, OLS cannot be estimated with an intercept and both dummies in the same model. The $\delta_1$ in Model 2 is the coefficient of the dummy for equipment, service, and software companies.

Model 2: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$

Model 2 fits the data better than Model 1. The p-value of the F test is .0054 (significant at the .01 level); R2 is .2520, about .09 larger than that of Model 1; SSE (sum of squares due to error or residual) decreases from 83,261,299 to 74,175,757 and SEE (square root of MSE) also declines accordingly (1,500→1,435). The coefficient of d1 is statistically discernible from zero at the .05 level (t=2.10, p=.043). Unlike Model 1, this model results in two different regression equations for two groups. The difference lies in the intercepts, but the slope remains unchanged.
. regress rnd income d1

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  2,    36) =    6.06
       Model |  24987948.9     2  12493974.4           Prob > F      =  0.0054
    Residual |  74175756.7    36  2060437.69           R-squared     =  0.2520
-------------+------------------------------           Adj R-squared =  0.2104
       Total |  99163705.6    38   2609571.2           Root MSE      =  1435.4

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |   1006.626   479.3717     2.10   0.043     34.41498    1978.837
       _cons |   1133.579   344.0583     3.29   0.002     435.7962    1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.2050 + .2180*income = 1,133.579 + 1,006.6260*1 + .2180*income
d1=0: R&D = 1,133.5790 + .2180*income = 1,133.579 + 1,006.6260*0 + .2180*income

The slope .2180 indicates a positive impact of two-year-lagged net income on a firm’s R&D expenditure. Equipment and software firms on average spend $1,007 million (=2,140.2-1,133.6) more for R&D than telecommunication and electronics companies with the same net income.
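The improvement of Model 2 over Model 1 can also be expressed as an F test on the reduction in SSE. The sketch below, written in Python for illustration only, uses the two residual sums of squares reported above; with a single added dummy, the square root of this F statistic should reproduce the t statistic of d1.

```python
import math

sse_pooled = 83261299.1  # residual SS of Model 1 (no dummy)
sse_dummy = 74175756.7   # residual SS of Model 2 (with d1)
df_resid = 36            # residual degrees of freedom of Model 2
n_restrictions = 1       # Model 1 restricts one dummy coefficient to zero

# F statistic based on the loss of goodness-of-fit when d1 is omitted.
f_stat = ((sse_pooled - sse_dummy) / n_restrictions) / (sse_dummy / df_resid)

# With one restriction, F equals the squared t statistic of the dummy.
t_equiv = math.sqrt(f_stat)

print(round(f_stat, 2), round(t_equiv, 2))
```

The result (F of about 4.41, t of about 2.10) matches the t value of d1 in the Model 2 output, so the dummy's t-test and the comparison of SSE between the two models tell the same story.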
2.3 Visualization of Model 1 and 2

[4] The dummy variable (firm types) and regressors (net income) may or may not be correlated.


There is only a tiny difference in the slope (.2231 versus .2180) between Model 1 and Model 2. The intercept 1,483 of Model 1, however, is quite different from 2,140 for equipment and software companies and 1,134 for telecommunications and electronics in Model 2. This result appears to be supportive of Model 2. Figure 2.2 highlights differences between Model 1 and 2 more clearly. The red line (pooled) in the middle is the regression line of Model 1; the dotted blue line at the top is one for equipment and software companies (d1=1) in Model 2; finally the dotted green line at the bottom is for telecommunication and electronics firms (d2=1 or d1=0).

Figure 2.2. Regression Lines of Model 1 and Model 2

[Figure: "2002 R&D Investment of OECD IT Firms". Vertical axis: R&D (USD Millions); horizontal axis: Income (USD Millions). Fitted lines, from top to bottom: R&D=2140+.218*Income; R&D=1483+.223*Income; R&D=1134+.218*Income.]

Source: OECD Information Technology Outlook 2004. http://thesius.sourceoecd.org/

This plot shows that Model 1 ignores the group difference and thus reports a misleading intercept. The difference in the intercept between the two groups of firms looks substantial, although the two models have similar slopes. Consequently, Model 2, which considers a fixed group effect (i.e., firm type), seems better than the simple Model 1. Compare goodness of fit statistics (e.g., F, R2, and SSE) of the two models. See Sections 3.2.2 and 4.7 for the formal hypothesis test.
2.4 Least Squares Dummy Variable Regression: LSDV1, LSDV2, and LSDV3

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with dummy variables. Model 2 above is a typical example of LSDV. The key issue in LSDV is how to avoid perfect multicollinearity, the so-called “dummy variable trap.” LSDV has three approaches to avoid getting caught in the trap. These approaches differ from each other with respect to model estimation and interpretation of dummy variable parameters (Suits 1984: 177). They produce different dummy parameter estimates, but they fit the same model and their results are equivalent.

The first approach, LSDV1, drops a dummy variable as shown in Model 2 above. That is, the parameter of the eliminated dummy variable is set to zero and is used as a baseline (Table 2.1). The variable to be dropped, $d_{dropped}^{LSDV1}$ (d2 in Model 2), needs to be carefully (as opposed to arbitrarily) selected so that it can play the role of the reference group effectively. LSDV2 includes all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero). Finally, LSDV3 includes the intercept and all dummies, and then imposes a restriction that the sum of parameters of all dummies is zero. Each approach has a constraint (restriction) that reduces the number of parameters to be estimated by one and thus makes the model identified. The following functional forms compare these three LSDVs.

LSDV1: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$ or $R\&D_i = \beta_0 + \beta_1 income_i + \delta_2 d_{2i} + \varepsilon_i$
LSDV2: $R\&D_i = \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$
LSDV3: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$

Table 2.1. Three Approaches of the Least Squares Dummy Variable Regression Model

| | LSDV1 | LSDV2 | LSDV3 |
| Dummies included | $d_1^{LSDV1}, \ldots, d_d^{LSDV1}$ except for $d_{dropped}^{LSDV1}$ | $d_1^*, \ldots, d_d^*$ | $d_1^{LSDV3}, \ldots, d_d^{LSDV3}$ |
| Intercept? | $\alpha^{LSDV1}$ | No | $\alpha^{LSDV3}$ |
| All dummies? | No (d-1) | Yes (d) | Yes (d) |
| Constraint (restriction)? | $\delta_{dropped}^{LSDV1} = 0$ (Drop one dummy) | $\alpha^{LSDV2} = 0$ (Suppress the intercept) | $\sum_i \delta_i^{LSDV3} = 0$ (Impose a restriction) |
| Actual dummy parameters | $\delta_i^* = \alpha^{LSDV1} + \delta_i^{LSDV1}$, $\delta_{dropped}^* = \alpha^{LSDV1}$ | $\delta_1^*, \delta_2^*, \ldots, \delta_d^*$ | $\delta_i^* = \alpha^{LSDV3} + \delta_i^{LSDV3}$, $\alpha^{LSDV3} = \frac{1}{d}\sum_i \delta_i^*$ |
| Meaning of a dummy coefficient | How far away from the reference group (dropped)? | Actual intercept | How far away from the average group effect? |
| H0 of the t-test | $\delta_i^* - \delta_{dropped}^* = 0$ | $\delta_i^* = 0$ | $\delta_i^* - \frac{1}{d}\sum_i \delta_i^* = 0$ |

Source: Constructed from Suits (1984) and David Good’s lecture (2004)
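Because the three LSDVs fit the same model, the estimates of one can be converted into the other two with the relationships in Table 2.1. The following Python sketch is an illustration only; it starts from the LSDV1 estimates of Model 2 (d2 dropped), and the variable names are hypothetical.

```python
# LSDV1 estimates from Model 2: d2 is dropped, so telecom/electronics is the baseline.
intercept_lsdv1 = 1133.579
delta_lsdv1 = {"d1": 1006.626, "d2": 0.0}  # the dropped dummy is fixed at zero

# LSDV2: actual group intercepts = LSDV1 intercept + LSDV1 dummy coefficient.
delta_star = {g: intercept_lsdv1 + c for g, c in delta_lsdv1.items()}

# LSDV3: the intercept is the average of the actual group intercepts,
# and each dummy coefficient is the deviation from that average.
intercept_lsdv3 = sum(delta_star.values()) / len(delta_star)
delta_lsdv3 = {g: v - intercept_lsdv3 for g, v in delta_star.items()}

print(delta_star)       # actual Y-intercepts of the two groups
print(intercept_lsdv3)  # average group effect
print(delta_lsdv3)      # deviations, which sum to zero
```

The converted values (group intercepts of about 2,140.2 and 1,133.6, an average of about 1,636.9, and deviations of about ±503.3) agree with the LSDV2 and LSDV3 estimates reported in Section 2.5.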

Three approaches end up fitting the same model but the coefficients of dummy variables in each approach have different meanings and thus are numerically different (Table 2.1). A parameter estimate in LSDV2, $\delta_d^*$, is the actual intercept (Y-intercept) of group d. It is easy to interpret substantively. The t-test examines if $\delta_d^*$ is zero. In LSDV1, a dummy coefficient shows the extent to which the actual intercept of group d deviates from the reference point (the parameter of the dropped dummy variable), which is the intercept of LSDV1, $\delta_{dropped}^* = \alpha^{LSDV1}$.[5]

[5] In Model 2, $\hat{\delta}_1$ of 1,007 is the estimated (relative) distance between two types of firm (equipment and software versus telecommunications and electronics). In Figure 2.2, the Y-intercept of equipment and software (absolute distance from the origin) is 2,140.2 = 1,133.6+1,006.6. The Y-intercept of telecommunications and electronics is 1,134.

LSDV2 reports a incorrect R2. 2. The intercept is the actual parameter estimate (absolute distance from the origin) of the dropped dummy variable. They all fit the same model. and LIMDEP.e. MODEL rnd = income d2. and SPSS Regression command all fit OLS and LSDVs. In general. The average effect is the intercept of LSDV3:  LSDV 3    i* . for example. RUN. d the null hypothesis is the deviation from the average is zero.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 11 The null hypothesis holds that the deviation from the reference group is zero. LSDV2 and LSDV3 involve some estimation problems. Here we include d2 instead of d1 to see how a different reference point changes the result.1 LSDV 1 without a Dummy LSDV 1 drops a dummy variable. Let us estimate three LSDVs using SAS. LIMDEP Regress$ command.5. Stata . Check the sign of the dummy coefficient and the intercept. 2.5 Estimating Three LSDVs The SAS REG procedure.1 summarizes differences in estimation and interpretation of the three LSDVs. Therefore.. The coefficient of a dummy included means how far its parameter estimate is away from the reference point or baseline (i. In short. Which approach is better than the others? You need to consider both estimation and interpretation issues carefully.regress command. each approach has a different baseline and thus tests a different hypothesis but produces exactly the same parameter estimates of regressors. Stata. in other words. PROC REG DATA=masil. Table 2. a dummy coefficient means how far its actual parameter is away from the average group effect 1 (Suits 1984: 178).rnd2002.indiana. Oftentimes researchers want to see how far dummy parameters deviate from the reference group rather than what are the actual intercept of each group.edu/~statmath 11 . we can replicate the other two LSDVs. LSDV1 is often preferred because of easy estimation in statistical software packages. In LSDV3. the intercept). 
The REG Procedure Model: MODEL1 Dependent Variable: rnd Number of Observations Read Number of Observations Used Number of Observations with Missing Values 50 39 11 Analysis of Variance Sum of Mean http://www. given one LSDV fitted.

you do not need to compute Y-intercepts of groups.140.007 smaller than 1. The Stata . This LSDV. however.133. whose dummy is dropped in the model (d1=1. this model is identical to Model 2 in Section 2.5.2. Alternatively.06 Pr > F 0. MODEL rnd = income d2 /SOLUTION.0101 0.48460 0.6259*0 + . The coefficient -1. thus.140.93 2. MODEL rnd = income d2 /SOLUTION.2 LSDV 2 without the Intercept LSDV 2 includes all dummy variables and suppresses the intercept.20468 0.rnd2002. 1.140.006.140 is the Y-intercept of equipment and software firms. regress rnd income d1 d2.42248 2023.134 = 2.1.6259*1 + . Therefore. you may use the GLM and MIXED procedures to get the same result.regress command has the noconstant option to fit LSDV2.2520 0.2180*income = 2.2180*income The intercept 2. reports incorrect (inflated) R2 (. This is because the X matrix does not have a column vector of 1 and produces incorrect sums of squares of model and total (Uyar and Erdem (1990: 298).7135 > .37174 Variable Intercept income d2 DF 1 1 1 t Value 4. .134 of equipment and software.93536 R-Square Adj R-Sq 0. However. PROC GLM DATA=masil.007.71 -2.5788 + .rnd2002.© 2005-2009 The Trustees of Indiana University (9/16/2009) Source Model Error Corrected Total DF 2 36 38 Squares 24987949 74175757 99163706 Linear Regression Models for Panel Data: 12 Square 12493974 2060438 F Value 6. noconstant http://www.1. dropping another dummy does not change the model although producing different dummy coefficients.2180*income d2=1: R&D = 1.2520) and F (29.62593 Standard Error 434.06).0001 0.indiana. RUN.007 of telecommunications and electronics means that its Y-intercept is -1.2047 .2180*income = 2. In short. d2=0). the sum of squares of errors is correct in any LSDV.0054 Root MSE Dependent Mean Coeff Var 1435.2104 Parameter Estimates Parameter Estimate 2140.2047 .10 Pr > |t| <.0428 d2=0: R&D = 2. 2. The coefficients of dummies are actual parameter estimates. 
PROC MIXED DATA=masil.006.edu/~statmath 12 .140 (baseline) – 1.88 > 6.08032 479. That is.56410 70. RUN.2047 + .21801 -1006.

3 LSDV 3 with a Restriction LSDV 3 includes the intercept and all dummies and then imposes a restriction on the model.2180*income d2=1: R&D = 1.6859 -2.010 .4225 ( 1) d1 + d2 = 0 -----------------------------------------------------------------------------rnd | Coef.637 + 503*1 + (-503)*0 + .140.637).6859 2.4184 d2 | -503.140 millions for R&D expenditure.constraint command defines a constraint.2180*income 2. t P>|t| [95% Conf.133.313 239.rnd2002.0551004 .140-$1.637 + 503*0 + (-503)*1 + . cnsreg rnd income d1 d2. RUN.205 + .06 0.69 ------------------------------------------------------------------------------ d1=1: R&D = 2.579 + .2180*income = 1.313 239.361 ------------------------------------------------------------------------------ d1=1: R&D = 2. the coefficient of RESTRICT is virtually zero and.2180*income = 1.10 0. Std.10 0. Err.5. Interval] -------------+---------------------------------------------------------------income | .133.0438 5.0583 3.579 + .637).0551004 . Since there are two groups here. Std.7135 0. RESTRICT d1 + d2 = 0. .79 Linear Regression Models for Panel Data: 13 Number of obs F( 3.4846 4.38 d2 | 1133. In the SAS output below. 36) Prob > F Root MSE = = = = 39 6.140+1.20749 _cons | 1636. http://www.205 + .71 0. while the .010 . in theory.20749 989. the coefficients of two dummies by definition share the same magnitude ($503) but have opposite directions.043 -989.2180*income The intercept is the average of actual parameter estimates: 1.000 1008.000 1259.2180*income d2=1: R&D = 1.140.2180066 .indiana. should be zero. Err. constraint 1 d1 + d2 = 0 .constraint command.69 -------------+-----------------------------Total | 258861361 39 6637470.edu/~statmath 13 .© 2005-2009 The Trustees of Indiana University (9/16/2009) Source | SS df MS -------------+-----------------------------Model | 184685604 3 61561868.0054 1435.0803248 2.029 3021. PROC REG DATA=masil.002 435. 36) Prob > F R-squared Adj R-squared Root MSE = = = = = = 39 29.133)/2. 
The number in the parenthesis indicates the constraint number defined in the .28 0. constraint(1) Constrained linear regression Number of obs F( 2.88 0.7 36 2060437.205 434. while telecommunications and electronics spend $503 millions LESS than the average (=$1. $503 millions MORE than the average expenditure of overall IT firms (=$2.579 344. Interval] -------------+---------------------------------------------------------------income | .043 17.3809128 d1 | 2140. MODEL rnd = income d1 d2. t P>|t| [95% Conf. The Stata .29 0.3809128 d1 | 503.4184 -17.71 0.4 -----------------------------------------------------------------------------rnd | Coef.0000 0.094 2265. The restriction is that the sum of all dummy parameters is zero.637 = (2.6896 1435.0803248 2.7962 1831.892 310.cnsreg command fits a constrained OLS using the constraint()option. Equipment and software firms invest $2.134-$1.1 Residual | 74175756.93 0.2180066 .

and SPSS LSDV 1 LSDV 2 LSDV 3 PROC REG.05) POUT(. Lhs=rnd.68587 239. constraint 1 d1+ d2 = 0 .10) /ORIGIN /DEPENDENT rnd /METHOD=ENTER income d1 d2.2104 Parameter Estimates Parameter Estimate 1636. * Probability computed using beta distribution. regress rnd income d1 d2.2 Estimating Three LSDVs Using SAS.income. PROC REG. MODEL rnd = income d1 d2.0428 0. Stata. d1.06 Pr > F 0.56410 70.10 . Rhs=income. Rhs=ONE.93536 R-Square Adj R-Sq 0.10 -2.31297 1. Pr > |t| <.10) /NOORIGIN /DEPENDENT rnd /METHOD=ENTER income d2. Lhs=rnd. d2. Stata LIMDEP SPSS . Table 2. RUN. d2$ REGRESSION /MISSING LISTWISE /STATISTICS COEFF R ANOVA /CRITERIA=PIN(. RESTRICT d1 + d2 = 0. . d2$ REGRESSION /MISSING LISTWISE /STATISTICS COEFF R ANOVA /CRITERIA=PIN(.05) POUT(. cnsreg rnd income d1 d2 const(1) REGRESS.0101 0.42248 2023. noconstant REGRESS. Rhs=ONE. PROC REG.71 2.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 14 The REG Procedure Model: MODEL1 Dependent Variable: rnd NOTE: Restrictions have been applied to parameter estimates.68587 0 Variable Intercept income d1 d2 RESTRICT DF 1 1 1 1 -1 t Value 5.89172 0. Number of Observations Read Number of Observations Used Number of Observations with Missing Values 50 39 11 Analysis of Variance Sum of Squares 24987949 74175757 99163706 Mean Square 12493974 2060438 Source Model Error Corrected Total DF 2 36 38 F Value 6.indiana.31297 -503. RUN.0428 . .28 2.0054 Root MSE Dependent Mean Coeff Var 1435.0001 0. RUN. Lhs=rnd. Cls: b(2)+b(3)=0$ N/A http://www. regress ind income d2 REGRESS. LIMDEP.81899E-12 Standard Error 310.04381 0.income.08032 239. d1. SAS MODEL rnd = income d2.21801 503.2520 0. MODEL rnd = income d1 d2 /NOINT.edu/~statmath 14 .

Table 2.2 compares how SAS, Stata, LIMDEP, and SPSS estimate LSDVs. SPSS is not able to fit the LSDV3. In LIMDEP, ONE indicates the intercept to be included. Cls: b(2)+b(3)=0 fits the model under the condition that the sum of parameter estimates of d1 (second parameter) and d2 (third parameter) is zero. In SPSS, pay attention to the /ORIGIN option for LSDV2.

3. Panel Data Models

Panel data models examine group (individual-specific) effects, time effects, or both. These effects are either fixed effect or random effect. A fixed effect model examines if intercepts vary across groups or time periods, whereas a random effect model explores differences in error variances. A one-way model includes only one set of dummy variables (e.g., firm), while a two-way model considers two sets of dummy variables (e.g., firm and year). Model 2 in Chapter 2, in fact, is a one-way fixed group effect panel data model.

3.1 Functional Forms and Notation

The parameter estimate of a dummy variable is a part of the intercept in a fixed effect model and a component of error in the random effect model. Slopes remain the same across groups or time periods. The functional forms of one-way panel data models are as follows.

Fixed group effect model: $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$, where $v_{it} \sim IID(0, \sigma_v^2)$
Random group effect model: $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$, where $v_{it} \sim IID(0, \sigma_v^2)$

Note that $u_i$ is a fixed or random effect and errors are independent identically distributed, $v_{it} \sim IID(0, \sigma_v^2)$.

Notations used in this document include,
$\bar{y}_{i\cdot}$ : dependent variable (DV) mean of group i.
$\bar{x}_{i\cdot}$ : means of independent variables (IVs) of group i.
$\bar{y}_{\cdot t}$ : dependent variable (DV) mean at time t.
$\bar{x}_{\cdot t}$ : means of independent variables (IVs) at time t.
$\bar{y}_{\cdot\cdot}$ : overall means of the DV.
$\bar{x}_{\cdot\cdot}$ : overall means of the IVs.
n : the number of groups or firms
T : the number of time periods
N=nT : total number of observations
k : the number of regressors excluding dummy variables
K=k+1 (including the intercept)

3.2 Fixed Effect Models

There are several strategies for estimating fixed effect models. The least squares dummy variable model (LSDV) uses dummy variables, whereas the within effect model does not. These strategies, of course, produce the identical parameter estimates of non-dummy independent variables. The between effect model fits the model using group and/or time means of dependent and independent variables without dummies. Table 3.1 summarizes pros and cons of these models.

3.2.1 Estimations: LSDV, Within Effect, and Between Effect Models

As discussed in Chapter 2, LSDV is widely used because it is relatively easy to estimate and interpret substantively. This LSDV, however, becomes problematic when there are many groups or subjects in panel data. If T is fixed and $nT \to \infty$, only coefficients of regressors are consistent. The coefficients of dummy variables, $\alpha + u_i$, are not consistent since the number of these parameters increases as nT increases (Baltagi 2001). This is the so called incidental parameter problem. Under this circumstance, LSDV is useless and thus calls for another strategy, the within effect model.

A within group effect model does not need dummy variables, but it uses deviations from group means. Thus, this model is the OLS of $(y_{it} - \bar{y}_{i\cdot}) = (x_{it} - \bar{x}_{i\cdot})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})$ without an intercept.[6] The incidental parameter problem is no longer an issue. The parameter estimates of regressors in the within effect model are identical to those of LSDV. The within effect model in turn has several disadvantages.

Since this model does not report dummy coefficients, you need to compute them using the formula $d_i^* = \bar{y}_{i\cdot} - \bar{x}_{i\cdot}'\beta$. Since no dummy is used, the within effect model has larger degrees of freedom for error, resulting in small MSE (mean square error) and incorrect (smaller) standard errors of parameter estimates. Thus, you have to adjust the standard error using the formula

$se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT-k}{nT-n-k}}$

Finally, R2 of the within effect model is not correct because the intercept is suppressed.

Table 3.1 Comparison of Fixed Effect Models

| | LSDV1 | Within Effect | Between Effect |
| Functional form | $y_i = i\alpha_i + X_i\beta + \varepsilon_i$ | $y_{it} - \bar{y}_{i\cdot} = (x_{it} - \bar{x}_{i\cdot})\beta + \varepsilon_{it} - \bar{\varepsilon}_{i\cdot}$ | $\bar{y}_{i\cdot} = \alpha + \bar{x}_{i\cdot}\beta + \varepsilon_i$ |
| Dummy | Yes | No | No |
| Dummy coefficient | Presented | Need to be computed | N/A |
| Transformation | No | Deviation from the group means | Group means |
| Intercept (estimation) | Yes | No | Yes |
| R2 | Correct | Incorrect | |
| SSE | Correct | Correct | |
| MSE | Correct | Smaller | |
| Standard error of $\beta$ | Correct | Incorrect (smaller) | |
| DFerror | nT-n-k | nT-k (n larger) | n-K |
| Observations | nT | nT | n |

The between group effect model, so called the group mean regression, uses group means of the dependent and independent variables. This data aggregation reduces the number of observations down to n.

[6] You need to follow three steps: 1) compute group means of the dependent and independent variables; 2) transform variables to get deviations from the group means; 3) run OLS with the transformed variables without the intercept.
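The three steps of the within effect estimation, the recovery of dummy coefficients, and the standard error adjustment can be sketched as follows. This is a minimal Python illustration under stated assumptions: the tiny panel is hypothetical (two groups, three periods, one regressor, no noise), and the helper names mean and adjust_se are introduced here only for the example.

```python
# Hypothetical balanced panel: n = 2 groups, T = 3 periods, one regressor.
# y is built as (group intercept) + 2 * x, so the true slope is 2.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "y": [12.0, 14.0, 16.0]},  # intercept 10
    "B": {"x": [2.0, 4.0, 6.0], "y": [9.0, 13.0, 17.0]},   # intercept 5
}

def mean(values):
    return sum(values) / len(values)

# Steps 1 and 2: compute group means, then deviations from the group means.
x_dev, y_dev = [], []
for g in panel.values():
    x_bar, y_bar = mean(g["x"]), mean(g["y"])
    x_dev += [x - x_bar for x in g["x"]]
    y_dev += [y - y_bar for y in g["y"]]

# Step 3: OLS on the deviations without an intercept (single-regressor formula).
beta = sum(x * y for x, y in zip(x_dev, y_dev)) / sum(x * x for x in x_dev)

# Dummy coefficients are not reported, so compute d_i* = ybar_i - xbar_i * beta.
d_star = {name: mean(g["y"]) - mean(g["x"]) * beta for name, g in panel.items()}

def adjust_se(se_within, n, T, k):
    """Rescale a within-model standard error to the LSDV error degrees of freedom."""
    return se_within * ((n * T - k) / (n * T - n - k)) ** 0.5

print(beta, d_star)
```

With real data the regressors form a matrix and the slope comes from a multivariate OLS routine, but the transformation and the correction factor sqrt((nT-k)/(nT-n-k)) are the same.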

Then, run OLS of $\bar{y}_{i\cdot} = \alpha + \bar{x}_{i\cdot}\beta + \varepsilon_i$. Table 3.1 contrasts LSDV, the within effect model, and the between group models.

3.2.2 Testing Group Effects

In a regression of $y_{it} = \alpha + \mu_i + X_{it}'\beta + \varepsilon_{it}$, the null hypothesis is that all dummy parameters except for one for the dropped are zero: $H_0: \mu_1 = \ldots = \mu_{n-1} = 0$. This hypothesis is tested by the F test, which is based on loss of goodness-of-fit. The robust model in the following formula is LSDV (or within effect model) and the efficient model is the pooled regression.[7]

$\frac{(e'e_{Efficient} - e'e_{Robust})/(n-1)}{(e'e_{Robust})/(nT-n-k)} = \frac{(R_{Robust}^2 - R_{Efficient}^2)/(n-1)}{(1 - R_{Robust}^2)/(nT-n-k)} \sim F(n-1, nT-n-k)$

If the null hypothesis is rejected, you may conclude that the fixed group effect model is better than the pooled OLS model.

3.2.3 Fixed Time Effect and Two-way Fixed Effect Models

For the fixed time effects model, you need to switch n and T, and i and t in the formulas.

Model: $y_{it} = \alpha + \tau_t + X_{it}'\beta + \varepsilon_{it}$
Within effect model: $(y_{it} - \bar{y}_{\cdot t}) = (x_{it} - \bar{x}_{\cdot t})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{\cdot t})$
Dummy coefficients: $d_t^* = \bar{y}_{\cdot t} - \bar{x}_{\cdot t}'\beta$
Correct standard errors: $se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{Tn-k}{Tn-T-k}}$
Between effect model: $\bar{y}_{\cdot t} = \alpha + \bar{x}_{\cdot t}\beta + \varepsilon_t$
$H_0: \tau_1 = \ldots = \tau_{T-1} = 0$
F-test: $\frac{(e'e_{Pooled} - e'e_{Within})/(T-1)}{(e'e_{Within})/(Tn-T-k)} \sim F(T-1, Tn-T-k)$

The fixed group and time effect model uses slightly different formulas. The within effect model of this two-way fixed model is estimated by five strategies (see Section 6.1).

Model: $y_{it} = \alpha + \mu_i + \tau_t + X_{it}'\beta + \varepsilon_{it}$
Within effect model: $y_{it}^* = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot}$ and $x_{it}^* = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}$
Dummy coefficients: $d_i^* = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) - (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})'\beta$ and $d_t^* = (\bar{y}_{\cdot t} - \bar{y}_{\cdot\cdot}) - (\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot})'\beta$

[7] When comparing fixed effect and random effect models, the fixed effect estimates are considered as the robust estimates and random effect estimates as the efficient estimates.
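The F test for fixed group effects compares the SSE (or R2) of the pooled OLS with that of LSDV. A minimal Python sketch follows; the function names and the input values in the example are hypothetical and serve only to show the arithmetic.

```python
def fixed_effect_f(sse_pooled, sse_lsdv, n, T, k):
    """F statistic for H0: all group dummy parameters are zero (SSE version)."""
    df_num = n - 1
    df_den = n * T - n - k
    f_stat = ((sse_pooled - sse_lsdv) / df_num) / (sse_lsdv / df_den)
    return f_stat, df_num, df_den

def fixed_effect_f_from_r2(r2_pooled, r2_lsdv, n, T, k):
    """Same test written with R-squared (both R-squared share one total SS)."""
    df_num = n - 1
    df_den = n * T - n - k
    return ((r2_lsdv - r2_pooled) / df_num) / ((1.0 - r2_lsdv) / df_den)

# Hypothetical panel: 6 groups, 15 periods, 3 regressors, total SS of 10.
f_stat, df_num, df_den = fixed_effect_f(1.5, 0.5, n=6, T=15, k=3)
f_from_r2 = fixed_effect_f_from_r2(0.85, 0.95, n=6, T=15, k=3)
print(f_stat, df_num, df_den, f_from_r2)
```

The two versions agree only when both R2 values are computed from the same total sum of squares, which is why R2 from LSDV1 (with an intercept) rather than from LSDV2 or the within effect model should be used. For the fixed time effect test, swap the roles of n and T.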

(nT  n  T  k  1)] (e' e Robust ) (nT  n  T  k  1) 3. run pooled OLS.9 Then transform  *  1 8 9 This implies that Corr ( wit . w js )  E ( wit w js ) are  u   v2 if i=j and t=s and  u2 if i=j and t  s .1 Generalized Least Squares (GLS) When  is known (given). The ui are assumed independent of vit and X it .   T 1  0 . . This assumption is not necessary in the 2 fixed effect model.8 Thus.edu/~statmath .. and If  u2 ( u2   v2 ) if i=j and t  s ... Compared to fixed effect models... w js ) is 1 if i=j and t=s.   0 .  u2  u2 A random effect model is estimated by generalized least squares (GLS) when the variance structure is known. The components of Cov( wit . * yit  yit   yi     * xit  xit   xi  for all Xk T   2 u  v2 2 v ..  u   v2   . . u2 ) and vit ~ IID(0.  2  u2  u    . you just need to compute  using the  matrix:   1  variables as follows. 3. In GLS. This document assumes panel data are balanced. GLS based on the true variance components is BLUE and all the feasible GLS estimators considered are asymptotically efficient as either n or T approaches infinity (Baltagi 2001).   2 . which are also independent of each other for all i and t.   n 1  0 and  1  .. the  matrix or the variance structure of errors looks like.indiana.3 Random Effect Models The one-way random group effect model is formulated as yit    X it '   ui  vit . wit  ui  vit where ui ~ IID(0. 2  u   v2  u2   u2  u2   v2   T T  . v2 ) .3. then run the within effect model....... If   1 and  v2  0 .© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 19    * Correct standard errors: sek  sek Within df error nT  k  sek LSDV nT  n  T  k  1 df error H 0 : 1  ... random effect models are relatively difficult to estimate.. (e' e Efficient  e' e Robust ) (n  T  2) F-test: ~ F [(n  T  2)... 19 http://www. 
and by feasible generalized least squares (FGLS) when the variance is unknown.  .
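The computation of $\theta$ and the transformation of a variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `gls_theta` and `quasi_demean` and the variance components passed to them are hypothetical, and a balanced panel with T periods per group is assumed, as elsewhere in this document.

```python
import math


def gls_theta(sigma_u2, sigma_v2, T):
    """theta = 1 - sqrt(sigma_v^2 / (T * sigma_u^2 + sigma_v^2))."""
    return 1.0 - math.sqrt(sigma_v2 / (T * sigma_u2 + sigma_v2))


def quasi_demean(values, theta):
    """Transform one group's series: x*_it = x_it - theta * (group mean of x)."""
    group_mean = sum(values) / len(values)
    return [x - theta * group_mean for x in values]


# Hypothetical variance components, chosen only to show the two limiting cases.
theta_pooled = gls_theta(sigma_u2=0.0, sigma_v2=1.0, T=15)  # no group variance
theta_within = gls_theta(sigma_u2=1.0, sigma_v2=0.0, T=15)  # no idiosyncratic variance
theta_middle = gls_theta(sigma_u2=1.0, sigma_v2=1.0, T=3)   # 1 - sqrt(1/4)
```

With $\theta = 0$ the variables are left unchanged (pooled OLS), and with $\theta = 1$ they become deviations from group means (the within effect model), which matches the two limiting cases noted for $\theta$.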

Finally, run OLS on the transformed variables: $y_{it}^{*} = \alpha^{*} + x_{it}^{*\prime}\beta^{*} + \varepsilon_{it}^{*}$.

3.3.2 Feasible Generalized Least Squares (FGLS)

If $\Omega$ is unknown, first you have to estimate $\theta$ using $\hat{\sigma}_u^2$ and $\hat{\sigma}_v^2$:

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}}$$

The $\hat{\sigma}_v^2$ is derived from the SSE (sum of squares due to error) of the within effect model or from the deviations of residuals from group means of residuals:

$$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT-n-k} = \frac{e'e_{within}}{nT-n-k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\bullet})^2}{nT-n-k},$$

where $v_{it}$ are the residuals of the LSDV1.

The $\hat{\sigma}_u^2$ comes from the between effect model (group mean regression):

$$\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \frac{\hat{\sigma}_v^2}{T}, \text{ where } \hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K}.$$

Next, transform variables using $\hat{\theta}$ and then run OLS: $y_{it}^{*} = \alpha^{*} + x_{it}^{*\prime}\beta^{*} + \varepsilon_{it}^{*}$.

$$y_{it}^{*} = y_{it} - \hat{\theta}\bar{y}_{i\bullet}$$
$$x_{it}^{*} = x_{it} - \hat{\theta}\bar{x}_{i\bullet} \text{ for all } X_k$$
$$\alpha^{*} = (1 - \hat{\theta})\alpha$$

The estimation of the two-way random effect model is skipped here. Since $\Omega$ is often unknown, FGLS is more frequently used than GLS.

3.3.3 Testing Random Effects (LM test)

The null hypothesis is that cross-sectional variance components are zero, $H_0: \sigma_u^2 = 0$. Breusch and Pagan (1980) developed the Lagrange multiplier (LM) test (Greene 2003). In the following formula, $\bar{e}$ is the n X 1 vector of the group specific means of pooled regression residuals and $e'e$ is the SSE of the pooled OLS regression. The LM follows chi-squared distribution with one degree of freedom.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{e'DD'e}{e'e} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1)$$

Baltagi (2001) presents the same LM test in a different way.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(\sum_t e_{it}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(T\bar{e}_{i\bullet}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 \sim \chi^2(1)$$
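The LM statistic in the second form above depends only on the pooled OLS residuals, so it can be sketched directly. The following Python function is an illustration, not part of any package: the name `breusch_pagan_lm` and the nested-list layout of the residuals (one inner list per group) are assumptions, and the panel must be balanced.

```python
def breusch_pagan_lm(residuals_by_group):
    """LM_u = nT / (2(T-1)) * [ sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1 ]^2.

    residuals_by_group: list of n lists, each holding the T pooled OLS
    residuals of one group (balanced panel assumed).
    """
    n = len(residuals_by_group)
    T = len(residuals_by_group[0])
    if any(len(group) != T for group in residuals_by_group):
        raise ValueError("a balanced panel is required")
    # Numerator: squared group sums of residuals, i.e. (T * group mean)^2.
    numerator = sum(sum(group) ** 2 for group in residuals_by_group)
    # Denominator: SSE of the pooled OLS regression.
    sse = sum(e * e for group in residuals_by_group for e in group)
    return n * T / (2.0 * (T - 1)) * (numerator / sse - 1.0) ** 2


# Made-up residuals for two groups and two periods, for illustration only.
lm_no_pattern = breusch_pagan_lm([[1.0, 0.0], [0.0, -1.0]])
lm_group_pattern = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
```

The returned value is compared with a chi-squared critical value with one degree of freedom (3.841 at the .05 level); a larger LM rejects the null hypothesis of no random group effect.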

The two way random effect model has the null hypothesis of $H_0: \sigma_{u1}^2 = 0$ and $\sigma_{u2}^2 = 0$. The LM test combines two one-way random effect models for group and time, $LM_{u12} = LM_{u1} + LM_{u2} \sim \chi^2(2)$.

3.4 Hausman Test: Fixed Effects versus Random Effects

The Hausman specification test compares the fixed versus random effects under the null hypothesis that the individual effects are uncorrelated with the other regressors in the model (Hausman 1978). If correlated (H0 is rejected), a random effect model produces biased estimators, violating one of the Gauss-Markov assumptions; so a fixed effect model is preferred. Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero (Greene 2003).

$$m = (b_{Robust} - b_{Efficient})'\hat{\Sigma}^{-1}(b_{Robust} - b_{Efficient}) \sim \chi^2(k),$$

where $\hat{\Sigma} = Var[b_{Robust} - b_{Efficient}] = Var(b_{Robust}) - Var(b_{Efficient})$ is the difference in the estimated covariance matrix of the parameter estimates between the LSDV model (robust) and the random effects model (efficient). It is notable that an intercept and dummy variables SHOULD be excluded in computation.

3.5 Poolability Test

What is poolability? Poolability tests whether or not slopes are the same across groups or over time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$. Remember that slopes remain constant in fixed and random effect models; only intercepts and error variances matter.

The poolability test is undertaken under the assumption of $\mu \sim N(0, s^2 I_{NT})$. This test uses the F statistic,

$$F_{obs} = \frac{(e'e - \sum e_i'e_i)/[(n-1)K]}{\sum e_i'e_i/[n(T-K)]} \sim F[(n-1)K,\ n(T-K)],$$

where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group i. If the null hypothesis is rejected, the panel data are not poolable. Under this circumstance, you may go to the random coefficient model or hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The F-test is

$$F_{obs} = \frac{(e'e - \sum e_t'e_t)/[(T-1)K]}{\sum e_t'e_t/[T(n-K)]} \sim F[(T-1)K,\ T(n-K)],$$

where $e_t'e_t$ is SSE of the OLS regression at time t.
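Once the pooled regression and the group-by-group regressions have been run in any package, the poolability F statistic across groups is simple arithmetic on their SSEs. The Python sketch below assumes a balanced panel; the function name `poolability_f` and the SSE values in the example are hypothetical and do not come from the airline data.

```python
def poolability_f(sse_pooled, sse_by_group, n, T, K):
    """F = [(e'e - sum e_i'e_i) / ((n-1)K)] / [sum e_i'e_i / (n(T-K))].

    sse_pooled:   SSE of the pooled OLS regression (e'e)
    sse_by_group: list of the n SSEs from separate OLS regressions per group
    K:            number of parameters estimated in each group regression
    Returns the F statistic with its numerator and denominator degrees of freedom.
    """
    sse_unrestricted = sum(sse_by_group)
    df_numerator = (n - 1) * K
    df_denominator = n * (T - K)
    f_stat = ((sse_pooled - sse_unrestricted) / df_numerator) / (
        sse_unrestricted / df_denominator
    )
    return f_stat, df_numerator, df_denominator


# Hypothetical SSEs for two groups observed over five periods with K = 2.
f_stat, df1, df2 = poolability_f(10.0, [2.0, 2.0], n=2, T=5, K=2)
```

The test over time follows the same pattern with the roles of n and T exchanged and the SSEs taken from the period-by-period regressions.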

4. One-way Fixed Effect Models: Group Effects

A one-way fixed group model examines group differences in intercepts. The LSDV for this fixed model needs to create as many dummy variables as the number of entities or subjects. When many dummies are needed, the within effect model is useful since it transforms variables using group means to avoid dummies. The between effect model uses group means of variables.

The sample panel data set includes cost and its related data of six U.S. airlines measured at 15 different time points. The following .use command reads a data set airline.dta and .describe displays basic information of key variables.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
. describe airline year cost output fuel load

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------------
airline         int    %8.0g                  Airline name
year            int    %8.0g                  Year
cost            float  %9.0g                  Total cost in $1,000
output          float  %9.0g                  Output in revenue passenger miles, index number
fuel            float  %9.0g                  Fuel price
load            float  %9.0g                  Load factor

You need to declare a cross-sectional (airline) and a time-series (year) variables using the .tsset command.

. tsset airline year
       panel variable:  airline (strongly balanced)
        time variable:  year, 1 to 15
                delta:  1 unit

Let us take a look at descriptive statistics of key variables using .xtsum.

. xtsum cost output fuel load

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
cost     overall |  13.36561   1.131971   11.14154    15.3733 |     N =      90
         between |             .9978636   12.27441   14.67563 |     n =       6
         within  |             .6650252   12.11545   14.91617 |     T =      15
                 |                                            |
output   overall | -1.174309   1.150606  -3.278573   .6608616 |     N =      90
         between |             1.166556   -2.49898   .3192696 |     n =       6
         within  |             .4208405  -1.987984   .1339861 |     T =      15
                 |                                            |
fuel     overall |  12.77036   .8123749   11.55017     13.831 |     N =      90
         between |             .0237151    12.7318    12.7921 |     n =       6
         within  |             .8120832   11.56883    13.8513 |     T =      15
                 |                                            |
load     overall |  .5604602   .0527934    .432066    .676287 |     N =      90
         between |             .0281511   .5197756   .5971917 |     n =       6
         within  |             .0460361   .4368492   .6581019 |     T =      15

4.1 The Pooled OLS Regression Model

First, fit the pooled regression model without any dummy variable.

. regress cost output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    86) = 2419.34
       Model |  112.705452     3  37.5684839           Prob > F      =  0.0000
    Residual |  1.33544153    86   .01552839           R-squared     =  0.9883
-------------+------------------------------           Adj R-squared =  0.9879
       Total |  114.040893    89  1.28135835           Root MSE      =  .12461

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8563895    .9090876
        fuel |    .453977   .0203042    22.36   0.000     .4136136    .4943404
        load |   -1.62751    .345302    -4.71   0.000    -2.313948   -.9410727
       _cons |   9.516923   .2292445    41.51   0.000       9.0612    9.972645
------------------------------------------------------------------------------

The regression equation is cost = 9.5169 + .8827*output + .4540*fuel - 1.6275*load. This model fits the data well (F=2419.34, p<.0001 and R2=.9883). We may, however, suspect that there is a fixed group effect producing different intercepts across groups. Each airline may have a significantly different level of cost, its Y-intercept, when all regressors are set to zero. This difference is modeled as a fixed group effect.

As discussed in Chapter 2, there are three equivalent approaches of LSDV. They report the identical parameter estimates of regressors except for dummy coefficients. Let us begin with LSDV1.

4.2 LSDV1 without a Dummy

LSDV1 drops a dummy variable to get the model identified. LSDV1 produces correct ANOVA information, goodness of fit, parameter estimates, and standard errors. As a consequence, this approach is commonly used in practice. LSDV1 fits the data better than does the pooled OLS. SSE decreases from 1.3354 to .2926, and R2 increases from .9883 to .9974. Due to the dummies included, this model loses five degrees of freedom (from 86 to 81).

LSDV produces six regression equations for six airlines.

Airline 1: cost = 9.7059 + .9193*output + .4175*fuel - 1.0704*load
Airline 2: cost = 9.6647 + .9193*output + .4175*fuel - 1.0704*load
Airline 3: cost = 9.4970 + .9193*output + .4175*fuel - 1.0704*load
Airline 4: cost = 9.8905 + .9193*output + .4175*fuel - 1.0704*load
Airline 5: cost = 9.7300 + .9193*output + .4175*fuel - 1.0704*load
Airline 6: cost = 9.7930 + .9193*output + .4175*fuel - 1.0704*load

How can we draw these equations using LSDV1? Let us drop the last dummy g6 and use it as the reference group. Of course, you may drop another dummy variable to get the equivalent result.

In SAS, PROC REG fits the OLS regression model.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

                        Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square    F Value    Pr > F
Model               8      113.74827      14.21853    3935.79    <.0001
Error              81        0.29262       0.00361
Corrected Total    89      114.04089

Root MSE            0.06011    R-Square    0.9974
Dependent Mean     13.36561    Adj R-Sq    0.9972
Coeff Var           0.44970

                        Parameter Estimates
                   Parameter      Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|
Intercept     1      9.79300       0.26366      37.14      <.0001
g1            1     -0.08706       0.08420      -1.03      0.3042
g2            1     -0.12830       0.07573      -1.69      0.0941
g3            1     -0.29598       0.05002      -5.92      <.0001
g4            1      0.09749       0.03301       2.95      0.0041
g5            1     -0.06301       0.02389      -2.64      0.0100
output        1      0.91928       0.02989      30.76      <.0001
fuel          1      0.41749       0.01520      27.47      <.0001
load          1     -1.07040       0.20169      -5.31      <.0001

The parameter estimate of g6 is presented in the intercept (9.7930). Other dummy parameter estimates are computed using the reference point. The actual intercept of airline 1, for example, is computed as 9.7059 = 9.7930 + (-.0871)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0 or simply 9.7930 + (-.0871), where 9.7930 is the reference point, the intercept of this model. The coefficient -.0871 says that the Y-intercept of airline 1 (9.7059) is .0871 smaller than that of airline 6 (reference point).

Stata has the .regress command for OLS regression (LSDV). The output is identical to that of PROC REG.

. regress cost g1-g5 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0870617   .0841995    -1.03   0.304    -.2545924     .080469
          g2 |  -.1282976   .0757281    -1.69   0.094    -.2789728    .0223776
          g3 |  -.2959828   .0500231    -5.92   0.000     -.395513   -.1964526
          g4 |    .097494   .0330093     2.95   0.004     .0318159    .1631721
          g5 |   -.063007   .0238919    -2.64   0.010    -.1105443   -.0154697
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
------------------------------------------------------------------------------

In LIMDEP, run the Regress$ command to fit the LSDV1. Do not forget to include ONE for the intercept in the Rhs subcommand.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 03:51:23PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528687     |
|              Akaike Info. Criter. =  -5.528017     |
| Autocorrel   Durbin-Watson Stat.  =  1.0264504     |
|              Rho = cor[e,e(-1)]   =   .4867748     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 Constant|    9.79302127       .26366104    37.142   .0000
 G1      |    -.08707202       .08419916    -1.034   .3042   .16666667
 G2      |    -.12830600       .07572778    -1.694   .0940   .16666667
 G3      |    -.29598860       .05002285    -5.917   .0000   .16666667
 G4      |     .09749253       .03300915     2.954   .0041   .16666667
 G5      |    -.06300770       .02389180    -2.637   .0100   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000  -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000   12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

What if we drop a different dummy variable, say g1, instead of g6? Since the different reference point is applied, you will get different dummy coefficients. The other statistics such as parameter estimates of regressors and goodness-of-fit measures remain unchanged. That is, choice of a dummy variable to be dropped does not change a model.

. regress cost g2-g6 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
          g3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
          g4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
          g5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
          g6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------

As shown in the above, the intercept 9.7059 in this model is the actual parameter estimate (Y-intercept) of g1, which was excluded from the model. The Y-intercept of airline 2 is computed to get 9.6647 = 9.7059 + (-.0412). The Y-intercept of airline 2 (9.6647) is .0412 smaller than the reference point of 9.7059. Actual Y-intercepts of other dummies are computed in this manner.

When you have not created dummy variables, take advantage of the .xi prefix command (interaction expansion) to obtain the identical result. .xi, like .bysort, is used either as an ordinary command or a prefix command. .xi creates dummies from a categorical variable specified in the term i. and then runs the command following the colon. Stata by default drops the first dummy variable, while PROC TSCSREG and PROC PANEL in Section 4.5.2 drop the last dummy.

. xi: regress cost i.airline output fuel load
i.airline         _Iairline_1-6       (naturally coded; _Iairline_1 omitted)

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Iairline_2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
 _Iairline_3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
 _Iairline_4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
 _Iairline_5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
 _Iairline_6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------

4.3 LSDV2 without the Intercept

LSDV2 reports actual parameter estimates of the dummies. You do not need to compute actual Y-intercept any more. Because LSDV2 suppresses the intercept, you will get incorrect F and R2 statistics. However, the SSE of LSDV2 is correct.

In PROC REG, you need to use the /NOINT option to suppress the intercept. Obviously, the F value of 497,985 and R2 of 1 are not likely. However, SSE, parameter estimates of regressors, and their standard errors are correct. Make sure that the intercepts presented in the beginning of Section 4.2 are what we got here using LSDV2.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load /NOINT;
RUN;
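Both claims of this section can be checked by hand from figures reported in Sections 4.2 and 4.3. The Python sketch below is only an arithmetic check: the variable names are hypothetical, and every number is copied from the Stata output in this document (LSDV1 coefficients, LSDV2 dummy estimates, SSE, and the corrected and uncorrected total sums of squares).

```python
# LSDV1 (g6 dropped): intercept and dummy coefficients from Section 4.2.
lsdv1_intercept = 9.793004
lsdv1_dummies = {1: -0.0870617, 2: -0.1282976, 3: -0.2959828,
                 4: 0.097494, 5: -0.063007, 6: 0.0}  # airline 6 is the reference

# LSDV2 (no intercept): dummy estimates from Section 4.3.
lsdv2_dummies = {1: 9.705942, 2: 9.664706, 3: 9.497021,
                 4: 9.890498, 5: 9.729997, 6: 9.793004}

# Actual intercept of each airline = reference intercept + its dummy coefficient.
actual_intercepts = {g: lsdv1_intercept + d for g, d in lsdv1_dummies.items()}
max_gap = max(abs(actual_intercepts[g] - lsdv2_dummies[g]) for g in lsdv2_dummies)

# Why F and R2 change when the intercept is suppressed: the total sum of
# squares is no longer taken around the mean of cost.
sse = 0.292622872             # identical in LSDV1 and LSDV2
total_ss_corrected = 114.040893     # around the mean, 89 df (LSDV1)
total_ss_uncorrected = 16191.5969   # around zero, 90 df (LSDV2)
mse = sse / 81

r2_lsdv1 = 1 - sse / total_ss_corrected
r2_lsdv2 = 1 - sse / total_ss_uncorrected
f_lsdv1 = ((total_ss_corrected - sse) / 8) / mse
f_lsdv2 = ((total_ss_uncorrected - sse) / 9) / mse
```

The intercepts recovered from LSDV1 agree with the LSDV2 dummy estimates up to rounding, and the inflated F and R2 of LSDV2 come entirely from replacing the corrected total sum of squares with the uncorrected one.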

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

NOTE: No intercept in model. R-Square is redefined.

                        Analysis of Variance
                                Sum of          Mean
Source               DF        Squares        Square    F Value    Pr > F
Model                 9          16191    1799.03381     497985    <.0001
Error                81        0.29262       0.00361
Uncorrected Total    90          16192

Root MSE            0.06011    R-Square    1.0000
Dependent Mean     13.36561    Adj R-Sq    1.0000
Coeff Var           0.44970

                        Parameter Estimates
                   Parameter      Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|
g1            1      9.70594       0.19312      50.26      <.0001
g2            1      9.66471       0.19898      48.57      <.0001
g3            1      9.49702       0.22496      42.22      <.0001
g4            1      9.89050       0.24176      40.91      <.0001
g5            1      9.73000       0.26094      37.29      <.0001
g6            1      9.79300       0.26366      37.14      <.0001
output        1      0.91928       0.02989      30.76      <.0001
fuel          1      0.41749       0.01520      27.47      <.0001
load          1     -1.07040       0.20169      -5.31      <.0001

Stata uses the noconstant option to suppress the intercept. Notice that noc is its abbreviation.

. regress cost g1-g6 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  9,    81) =       .
       Model |  16191.3043     9  1799.03381           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   9.705942    .193124    50.26   0.000     9.321686     10.0902
          g2 |   9.664706    .198982    48.57   0.000     9.268794    10.06062
          g3 |   9.497021   .2249584    42.22   0.000     9.049424    9.944618
          g4 |   9.890498   .2417635    40.91   0.000     9.409464    10.37153
          g5 |   9.729997   .2609421    37.29   0.000     9.210804    10.24919
          g6 |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
------------------------------------------------------------------------------

In LIMDEP, you need to drop ONE out of the Rhs subcommand to suppress the intercept. Unlike SAS and Stata, LIMDEP reports correct R2 (.9974) and F (3,936) even in LSDV2.

--> REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 03:53:24PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528687     |
|              Akaike Info. Criter. =  -5.528017     |
| Autocorrel   Durbin-Watson Stat.  =  1.0264504     |
|              Rho = cor[e,e(-1)]   =   .4867748     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 G1      |    9.70594925       .19312325    50.258   .0000   .16666667
 G2      |    9.66471527       .19898117    48.571   .0000   .16666667
 G3      |    9.49703267       .22495746    42.217   .0000   .16666667
 G4      |    9.89051381       .24176245    40.910   .0000   .16666667
 G5      |    9.73001357       .26094094    37.288   .0000   .16666667
 G6      |    9.79302127       .26366104    37.142   .0000   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000  -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000   12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

4.4 LSDV3 with Restrictions

LSDV3 imposes a restriction that the sum of the dummy parameters is zero. LSDV3 reports the correct ANOVA table and parameter estimates of regressors but produces different dummy coefficients, compared to those of LSDV1 and LSDV2, due to the different baseline (group average) used. PROC REG has the RESTRICT statement to impose restrictions.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                        Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square    F Value    Pr > F
Model               8      113.74827      14.21853    3935.79    <.0001
Error              81        0.29262       0.00361
Corrected Total    89      114.04089

Root MSE            0.06011    R-Square    0.9974
Dependent Mean     13.36561    Adj R-Sq    0.9972
Coeff Var           0.44970

                        Parameter Estimates
                   Parameter      Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|
Intercept     1      9.71353       0.22964      42.30      <.0001
g1            1     -0.00759       0.04562      -0.17      0.8683
g2            1     -0.04882       0.03798      -1.29      0.2023
g3            1     -0.21651       0.01606     -13.48      <.0001
g4            1      0.17697       0.01942       9.11      <.0001
g5            1      0.01647       0.03669       0.45      0.6547
g6            1      0.07948       0.04050       1.96      0.0532
output        1      0.91928       0.02989      30.76      <.0001
fuel          1      0.41749       0.01520      27.47      <.0001
load          1     -1.07040       0.20169      -5.31      <.0001
RESTRICT     -1  3.01674E-15   7.82306E-11       0.00      1.0000*

* Probability computed using beta distribution.

Notice that the 3.01674E-15 of RESTRICT is virtually zero. A dummy coefficient means the deviation from the averaged group effect (9.7135). The actual intercept of airline 2, for example, is 9.6647 = 9.7135 + (-.0488).

In Stata, you have to use the .cnsreg command instead of .regress. The command, however, does not provide an ANOVA table and goodness-of-fit statistics other than F and SEE (standard error of residual, or the square root of MSE).

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F(  8,    81) = 3935.79
                                                       Prob > F      =  0.0000
                                                       Root MSE      =   .0601

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0075859   .0456178    -0.17   0.868    -.0983509    .0831792
          g2 |  -.0488218   .0379787    -1.29   0.202    -.1243875    .0267439
          g3 |  -.2165069   .0160624   -13.48   0.000    -.2484661   -.1845478
          g4 |   .1769698   .0194247     9.11   0.000     .1383208    .2156189
          g5 |   .0164689   .0366904     0.45   0.655    -.0565335    .0894712
          g6 |   .0794759   .0405008     1.96   0.053     -.001108    .1600597
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
------------------------------------------------------------------------------

LIMDEP has the Cls subcommand to impose restrictions. Again, do not forget to include ONE in Rhs. b(2) in Cls: indicates the parameter of the second variable, g1, listed in Rhs.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
    Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| Model was estimated Aug 31, 2009 at 06:39:21PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528687     |
|              Akaike Info. Criter. =  -5.528017     |
| Autocorrel   Durbin-Watson Stat.  =  1.0264504     |
|              Rho = cor[e,e(-1)]   =   .4867748     |
| Restrictns.  F[  1,    80] (prob) =    .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 Constant|    9.71354097       .22964002    42.299   .0000
 G1      |    -.00759172       .04561756     -.166   .8682   .16666667
 G2      |    -.04882570       .03797853    -1.286   .2023   .16666667
 G3      |    -.21650830       .01606233   -13.479   .0000   .16666667
 G4      |     .17697283       .01942459     9.111   .0000   .16666667
 G5      |     .01647259       .03669023      .449   .6547   .16666667
 G6      |     .07948030       .04050059     1.962   .0532   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000  -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000   12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

The dummy coefficients that LIMDEP reports for LSDV3 are, as in SAS and Stata, deviations from the averaged group effect, so you may compute actual intercepts of groups in a manner similar to what you would do in SAS and Stata. The actual intercept of airline 5, for example, is 9.7300 = 9.7135 + .0165.

4.5 Within Group Effect Model

The within effect model does not use dummy variables and thus has larger degrees of freedom, smaller MSE, and smaller standard errors of parameters than those of LSDV. As a consequence, you need to adjust standard errors. This model does not report individual dummy coefficients either; you need to compute them if really needed.

4.5.1 Estimating the Within Effect Model

First, let us manually estimate the within group effect model with Stata. You need to compute group means.

. quietly egen gm_cost=mean(cost), by(airline)
. quietly egen gm_output=mean(output), by(airline)
. quietly egen gm_fuel=mean(fuel), by(airline)
. quietly egen gm_load=mean(load), by(airline)

You will get the following group means of variables.

+------------------------------------------------------+
| airline    gm_cost   gm_output    gm_fuel    gm_load |
|------------------------------------------------------|
|       1   14.67563    .3192696    12.7318   .5971917 |
|       2   14.37247    -.033027   12.75171   .5470946 |
|       3   13.37231   -.9122626   12.78972   .5845358 |
|       4    13.1358   -1.635174   12.77803   .5476773 |
|       5   12.36304   -2.285681    12.7921   .5664859 |
|       6   12.27441    -2.49898    12.7788   .5197756 |
+------------------------------------------------------+

Then transform dependent and independent variables to compute deviations from group means.

. quietly gen gw_cost = cost - gm_cost
. quietly gen gw_output = output - gm_output
. quietly gen gw_fuel = fuel - gm_fuel
. quietly gen gw_load = load - gm_load

Now, we are ready to run the within effect model. Keep in mind that you have to suppress the intercept.

. regress gw_cost gw_output gw_fuel gw_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 3871.82
       Model |  39.0683861     3  13.0227954           Prob > F      =  0.0000
    Residual |  .292622861    87  .003363481           R-squared     =  0.9926
-------------+------------------------------           Adj R-squared =  0.9923
       Total |   39.361009    90  .437344544           Root MSE      =    .058

------------------------------------------------------------------------------
     gw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gw_output |   .9192846    .028841    31.87   0.000       .86196    .9766092
     gw_fuel |   .4174918   .0146657    28.47   0.000     .3883422    .4466414
     gw_load |  -1.070396   .1946109    -5.50   0.000    -1.457206   -.6835858
------------------------------------------------------------------------------

The within effect model reports correct SSE and parameter estimates of regressors but incorrect R2 and standard errors of parameter estimates. Notice that the degrees of freedom increase from 81 (LSDV) to 87 since six dummy variables are not used. The SAS TSCSREG and PANEL procedures and LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (square root of MSE), R2, and standard errors.

You may compute group intercepts using $d_i^{*} = \bar{y}_{i\bullet} - \beta'\bar{x}_{i\bullet}$. For example, the intercept of airline 5 is computed as 9.730 = 12.3630 - {.9193*(-2.2857) + .4175*12.7921 + (-1.0704)*.5665}. In order to get the correct standard errors, you need to adjust them using the ratio of degrees of freedom of the within effect model and LSDV. For example, the standard error of the logged output is computed as .0299=.0288*sqrt(87/81).

4.5.2 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL of SAS/ETS allow users to fit the within effect model conveniently. They, in fact, report LSDV1, but you do not need to create dummy variables and compute deviations from group means. A data set needs to be sorted in advance by the variables that will appear in the ID statement of PROC TSCSREG and PROC PANEL. These time-series and cross-sectional variables may be numeric or string in SAS. /FIXONE of the MODEL statement fits a one-way fixed effect model.

PROC SORT DATA=masil.airline;
   BY airline year;

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /FIXONE;
RUN;

The TSCSREG Procedure
Fixed One Way Estimates
Dependent Variable: cost

          Model Description
Estimation Method            FixOne
Number of Cross Sections          6
Time Series Length               15

               Fit Statistics
SSE          0.2926    DFE              81
MSE          0.0036    Root MSE     0.0601
R-Square     0.9974

      F Test for No Fixed Effects
Num DF    Den DF    F Value    Pr > F
     5        81      57.73    <.0001

                          Parameter Estimates
                              Standard
Variable     DF    Estimate      Error    t Value    Pr > |t|    Label
CS1           1    -0.08706     0.0842      -1.03      0.3042    Cross Sectional Effect 1
.604 and R2 of .20169 -5.95 -2. fe i(airline) Fixed-effects (within) regression Group variable: airline R-sq: within = 0. i(airline) is redundant. Both variables should be numeric in Stata.0298901 30. string variables are not allowed in . Both PROC TSCSREG and PROC PANEL report correct (adjusted) MSE.9856 overall = 0.8598126 . ID airline year. Once .9192846 .76 27.9926.0299 0. t P>|t| [95% Conf.0151991 27.0500 0. R2.2637 0.919285 0.0941 <. and conduct the F test for fixed group effect as well.81) Prob > F = = corr(u_i.0 15 3604.76 0. .000 .000 9.0001 Effect 1 Cross Sectional Effect 2 Cross Sectional Effect 3 Cross Sectional Effect 4 Cross Sectional Effect 5 Intercept The following PANEL procedure returns the same output.47 -5.14 30.0041 0. quietly tsset airline year The fe option of .229641 42.2017 Linear Regression Models for Panel Data: 33 -1. Interval] -------------+---------------------------------------------------------------output | .47 0. Xb) = -0. PROC PANEL DATA=masil. MODEL cost = output fuel load /FIXONE.3475 -----------------------------------------------------------------------------cost | Coef.06301 9.80 0. and standard errors.0000 Obs per group: min = avg = max = F(3.3 Using Stata The Stata .tsset.© 2005-2009 The Trustees of Indiana University (9/16/2009) CS2 CS3 CS4 CS5 Intercept output fuel load 1 1 1 1 1 1 1 1 -0. SEE. This command report incorrect F 3.4174918 .256614 10.0330 0.000 .tsset is executed.000 -1. Std. 4.0704 0. xtreg cost output fuel load.0001 <.0152 0.92 2.0001 <.5.417492 -1.9873 Number of obs Number of groups = = 90 6 15 15.3872503 .31 0.713528 .793004 0.indiana.070396 .64 37.4477333 load | -1.097494 -0.471696 -.0001 <.9926 between = 0. They have strong advantages over other software packages in this respect. Err.9787565 fuel | .xtreg command fits the within group effect model without creating dummy variables.69 -5.17044 http://www.0239 0.30 0.6690963 _cons | 9. 
.0001 0.31 0.airline.xtreg indicates the within effect model and i(airline) specifies airline as the independent unit.0757 0.1283 -0.tsset command that specifies cross-sectional and timeseries variables.xtreg should follow the .0100 <.29598 0. RUN.edu/~statmath 33 .

000 .Fixed$ +----------------------------------------------------+ | OLS Without Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 27.Str=AIRLINE. μ1=0.1320775 sigma_e | .131971 | | WTS=none Number of observs. Std.0000) | | Diagnostic Log likelihood = 61. The last line of the output tests the null hypothesis that five dummy parameters in LSDV1 are zero (e. .areg to get the same result except for R2.471696 -.OUTPUT.Panel.7135 is the average of six airlines. you may use .3581 | | Chi-sq [ 3] (prob) = 400.121594 | | Akaike Info. Interval] -------------+---------------------------------------------------------------output | . R2 and F statistic are not correct.06011 -----------------------------------------------------------------------------cost | Coef. μ3=0.31 0. Notice that the intercept of 9.FUEL. 81) = 57.3872503 .© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 34 -------------+---------------------------------------------------------------sigma_u | .732 0. Err. Alternatively.edu/~statmath 34 .256614 10.713528 . absorb(airline) Linear regression.0151991 27.0000 Like PROC PANEL.47 0.33 (.0298901 30.20169 -5. REGRESS. = 90 | | Model size Parameters = 4 | | Degrees of freedom = 86 | | Residuals Sum of squares = 1. 81) Prob > F R-squared Adj R-squared Root MSE = 90 = 3604.17044 -------------+---------------------------------------------------------------airline | F(5.8598126 .9192846 . 86] (prob) =2419.000 9. = -4.121653 | +----------------------------------------------------+ http://www. But this command does not provide an analysis of variance (ANOVA) table. The intercept 9.06010514 rho | .5. absorbing indicators Number of obs F( 3.9972 = . the intercept of LSDV3.36561 | | Standard deviation = 1.g.26 (.335450 | | Standard error of e = .9787565 fuel | . and μ5=0).000 .76991 | | Restricted(b=0) = -138. which is correct.4477333 load | -1. 
μ4=0.9878812 | | Model test F[ 3.xtreg reports correct standard errors and the F test for a fixed group effect. .229641 42. = -4. Crt.1246133 | | Fit R-squared = .6690963 _cons | 9.LOAD.000 (6 categories) 4. areg cost output fuel load.80 = 0.73 Prob > F = 0. The Str subcommand specifies a stratification variable.30 0. the Panel and Fixed subcommands in the Regress$ command fit a fixed effect panel data model. Criter..Lhs=COST.070396 .9882897 | | Adjusted R-squared = .82843653 (fraction of variance due to u_i) -----------------------------------------------------------------------------F test that all u_i=0: F(5.4174918 . LogAmemiya Prd.indiana.76 0. 2009 at 03:56:52PM | | LHS=COST Mean = 13. μ2=0.9974 = 0.000 -1. t P>|t| [95% Conf.Rhs=ONE. 81) = 57.7135 is that of LSDV3.0000) | | Info criter.4 Using LIMDEP In LIMDEP.0000 = 0.

7703592 LOAD | -1. Autocorrelation of e(i.1140409821D+03 .51691223 .00000 31.00000 | |(4) vs (1) 536.9971806 | | Model test F[ 8.0000 -1.62750780 .9974341 | | Adjusted R-squared = .00000 | |(3) vs (1) 400.2926208 | | Standard error of e = .48804 .599 .45397771 . .02030424 22.818 8 81 .733 5 81 .00000 3604. 2009 at 03:56:52PM | | LHS=COST Mean = 13.7703592 LOAD | -1.9882897 | |(4) X and group effects 130.0865 | | Restricted(b=0) = -138.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 35 +----------------------------------------------------+ | Panel Data Analysis of COST [ONE way] | | Unconditional ANOVA (No regressors) | | Source Variation Deg. LogAmemiya Prd.149 3 . = 90 | | Model size Parameters = 9 | | Degrees of freedom = 81 | | Residuals Sum of squares = .329 3 86 .00000 57.2926207777D+00 . P value | |(2) vs (1) 95.0000000 | |(2) Group effects only -90.0000 12.756 .00000 2419.468584 | | Total 114.02988997 30. 81] (prob) =3935.17430918 FUEL | .3611 84.22924522 41.3936109461D+02 .01325455 66.256 3 .01519907 27.0000) | | Diagnostic Log likelihood = 130.20168924 -5.633 5 . Crt.889 8 .08647 .35814 . 14.36561 | | Standard deviation = 1. = -5.0000 12.t) . = -5.6548513 | |(3) X .0000) | | Info criter. 1. Valid data 6 | | Smallest 15.528687 | | Estd.359 .0000 -1.9974341 | +--------------------------------------------------------------------+ | Hypothesis Tests | | Likelihood Ratio Test F Tests | | Chi-squared d. Prob.82 (. F num.56046016 +--------------------------------------------------------------------+ | Test Statistics for the Classical Model | +--------------------------------------------------------------------+ | Model Log-Likelihood Sum of Squares R-squared | |(1) Constant term only -138.514 .9360 | | Residual 39. Criter.307 .indiana. Mean Square | | Between 74.89 (. 
Free.41749105 .00000 | +--------------------------------------------------------------------+ http://www.00000 | |(4) vs (2) 441.3581 | | Chi-sq [ 8] (prob) = 536.28136 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .88273863 .875 5 84 .91928814 .0000 .6010493E-01 | | Fit R-squared = . Largest 15 | | Average group size 15.variables only 61.56046016 Constant| 9.f.468 .1335449522D+01 .740 5 .17430918 FUEL | .6799 5.131971 | | WTS=none Number of observs.0000 .528017 | | Akaike Info.041 89.76991 .0000 +----------------------------------------------------+ | Least Squares with Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 27. denom.00000 | |(4) vs (3) 136.edu/~statmath 35 .573531 | +----------------------------------------------------+ +----------------------------------------------------+ | Panel:Groups Empty 0.00000 3935.00 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .34530293 -4.713 .832 3 81 .07039502 .

LIMDEP reports both the pooled OLS regression under the label OLS Without Group Dummy Variables and the within effect model under Least Squares with Group Dummy Variables. Like the SAS TSCSREG procedure, LIMDEP provides correct MSE, SEE, R2, and standard errors of the fixed effect model. LIMDEP also conducts the F test for checking a fixed group effect (see the last line of the LIMDEP output above to get 57.733).

4.6 Between Group Effect Model: Group Mean Regression

A between effect model uses aggregate information, group means of variables. In other words, the unit of analysis is not an individual observation, but entity or subject. The number of observations drops from nT to n. This group mean regression produces different goodness-of-fit measures and parameter estimates compared to those of LSDV and the within effect model.

Let us compute group means and run OLS with them. The .collapse command computes aggregate information and stores it into a new data set. Note that /// links two command lines.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
           gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  3,     2) =  104.12
       Model |  4.94698124     3  1.64899375           Prob > F      =  0.0095
    Residual |  .031675926     2  .015837963           R-squared     =  0.9936
-------------+------------------------------           Adj R-squared =  0.9841
       Total |  4.97865717     5  .995731433           Root MSE      =  .12585

------------------------------------------------------------------------------
     gm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gm_output |   .7824568   .1087646     7.19   0.019     .3144803    1.250433
     gm_fuel |  -5.523904   4.478718    -1.23   0.343    -24.79427    13.74647
     gm_load |  -1.751072   2.743167    -0.64   0.589    -13.55397    10.05182
       _cons |    85.8081   56.48199     1.52   0.268    -157.2143    328.8305
------------------------------------------------------------------------------

This model fits the data relatively well (R2 of .9936), but with only two degrees of freedom its t-tests find output alone to be significant at the .05 level.

The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between effect model, but PROC TSCSREG does not. /BTWNG and /BTWNT fit the between group and time effect models, respectively.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /BTWNG;
RUN;

The PANEL Procedure
Between Groups Estimates
Dependent Variable: cost

Model Description
Estimation Method            BtwGrps
Number of Cross Sections           6
Time Series Length                15

Fit Statistics
SSE          0.0317     DFE                2
MSE          0.0158     Root MSE      0.1258
R-Square     0.9936

Parameter Estimates
                              Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|    Label
Intercept    1    85.80901     56.4830       1.52      0.2681    Intercept
output       1    0.782455      0.1088       7.19      0.0188
fuel         1    -5.52398      4.4788      -1.23      0.3427
load         1    -1.75102      2.7432      -0.64      0.5886

The Stata .xtreg command has the be option to fit the between effect model but does not report the ANOVA table.

. xtreg cost output fuel load, be i(airline)

Between regression (regression on group means)  Number of obs      =        90
Group variable: airline                         Number of groups   =         6

R-sq:  within  = 0.8808                         Obs per group: min =        15
       between = 0.9936                                        avg =      15.0
       overall = 0.1371                                        max =        15

                                                F(3,2)             =    104.12
sd(u_i + avg(e_i.))=  .1258491                  Prob > F           =    0.0095

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .7824552   .1087663     7.19   0.019     .3144715    1.250439
        fuel |  -5.523978   4.478802    -1.23   0.343    -24.79471    13.74675
        load |  -1.751016    2.74319    -0.64   0.589    -13.55401    10.05198
       _cons |   85.80901   56.48302     1.52   0.268    -157.2178    328.8358
------------------------------------------------------------------------------

LIMDEP has the Means subcommand to fit the between effect model.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Means$

+----------------------------------------------------+
| Group Means Regression                             |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:04:12PM     |
| LHS=YBAR(i.) Mean                 =   13.36561     |
|              Standard deviation   =   .9978636     |
| WTS=NTi/Nobs Number of observs.   =          6     |
| Model size   Parameters           =          4     |
|              Degrees of freedom   =          2     |
| Residuals    Sum of squares       =   .3167277E-01 |
|              Standard error of e  =   .1258427     |
| Fit          R-squared            =   .9936383     |
|              Adjusted R-squared   =   .9840957     |
| Model test   F[  3,     2] (prob) = 104.13 (.0095) |
| Diagnostic   Log likelihood       =   7.218541     |
|              Restricted(b=0)      =  -7.953835     |
|              Chi-sq [  3]  (prob) =  30.34 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -3.634619     |
|              Akaike Info. Criter. =  -3.910724     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |    .78244727       .10876126     7.194   .0000   .230256D-11
 FUEL    |  -5.52443747      4.47865187    -1.234   .2174   -.18642891
 LOAD    |  -1.75094765      2.74304702     -.638   .5233    .32541105
 Constant|   85.8148317      56.4811479     1.519   .1287

SAS, Stata, and LIMDEP all report the same result: SSE .0317, SEE .1258, F 104.12 (p=.0095), and R2 .9936.

4.7 Testing Fixed Group Effects (F-test)

How do we know whether there is a significant fixed group effect? The null hypothesis is that all dummy parameters except for one are zero: $H_0: \mu_1 = \cdots = \mu_{n-1} = 0$.

In order to conduct an F-test, let us obtain the SSE (e'e) of 1.3354 from the pooled OLS regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model. Alternatively, you may draw R2 of .9974 from LSDV1 or LSDV3 and .9883 from the pooled OLS. Do not, however, use LSDV2 and the within effect model for R2. The F statistic is computed as

$$F = \frac{(1.3354 - .2926)/(6-1)}{(.2926)/(90-6-3)} = \frac{(.9974 - .9883)/(6-1)}{(1 - .9974)/(90-6-3)} \sim 57.7319\,[5, 81].$$

The R2 version reproduces 57.73 only with unrounded values (.9974341 and .9882897); the four-digit values shown give about 56.7.

The large F statistic rejects the null hypothesis in favor of the fixed group effect model (p<.0001). There is a fixed group effect in these panel data.

The SAS TSCSREG and PANEL procedures, the Stata .xtreg command, and the LIMDEP Regress$ command conduct the F test by default. Alternatively, you may conduct the same test in LSDV1. In SAS, add the TEST statement in PROC REG and then run the procedure again (ANOVA table and parameter estimates are skipped).

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 output fuel load;
   TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;

The REG Procedure
Model: MODEL1

Test 1 Results for Dependent Variable cost

                           Mean
Source           DF      Square    F Value    Pr > F
Numerator         5     0.20856      57.73    <.0001
Denominator      81     0.00361

In Stata, run the .test command, a follow-up command for the Wald test, right after estimating the model.

. quietly regress cost g1-g5 output fuel load

. test g1 g2 g3 g4 g5

 ( 1)  g1 = 0
 ( 2)  g2 = 0
 ( 3)  g3 = 0
 ( 4)  g4 = 0
 ( 5)  g5 = 0

       F(  5,    81) =   57.73
            Prob > F =    0.0000

4.8 Summary

Table 4.1 summarizes the estimation of a fixed effect model in SAS, Stata, and LIMDEP. The SAS PANEL procedure is generally preferred to Stata and LIMDEP counterparts since it produces correct statistics and conducts various hypothesis tests conveniently.

Table 4.1 Comparison of the Fixed Effect Model in SAS, Stata, and LIMDEP*

                    SAS 9                        Stata 11                        LIMDEP 9
OLS estimation      PROC REG;                    .regress, .cnsreg               Regress$
  LSDV1             Correct                      Correct                         Correct (slightly different F)
  LSDV2             Incorrect F, (adjusted) R2   Incorrect F, (adjusted) R2      Correct (slightly different F)
  LSDV3             Correct                      .cnsreg: No ANOVA table and R2  Correct (slightly different F); different dummy coefficients
Panel estimation    PROC TSCSREG; PROC PANEL;    .xtreg, .areg                   Regress; Panel$
  Estimation type   LSDV1                        Within effect                   Within effect
  SSE (e'e)         Correct                      No                              Correct
  MSE or SEE        Correct (adjusted)           No                              Correct (adjusted) SEE
  Model test (F)    No                           Incorrect                       Slightly different F
  (adjusted) R2     Correct                      Incorrect (correct in .areg)    Correct
  Intercept         Correct                      LSDV3 intercept                 No
  Coefficients      Correct                      Correct                         Correct
  Standard errors   Correct (adjusted)           Correct (adjusted)              Correct (adjusted)
  Effect test (F)   Yes                          Yes                             Yes
  Between effect    /BTWNG, /BTWNT               be                              Means;

* "Yes/No" indicates whether the software reports the statistic. "Correct/incorrect" indicates whether the statistic matches that of the least squares dummy variable (LSDV) 1 without a dummy variable.
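The F test of Section 4.7 can also be reproduced by hand from the two SSEs reported in the outputs of this chapter. The following Stata do-file lines are a minimal sketch, not part of the original analysis; the scalar names (sse_pooled, sse_lsdv, n_groups, n_periods, k_regs, F_stat) are made up for illustration, and the SSE values are copied from the LIMDEP output in Section 4.5.4.

```stata
* Hand computation of the F test for fixed group effects (Section 4.7).
* All scalar names below are illustrative, not names used elsewhere in the text.
scalar sse_pooled = 1.335450      // e'e of the pooled OLS
scalar sse_lsdv   = .2926208      // e'e of LSDV or the within effect model
scalar n_groups   = 6             // number of airlines (n)
scalar n_periods  = 15            // number of years (T)
scalar k_regs     = 3             // regressors excluding the intercept

scalar df_num = n_groups - 1
scalar df_den = n_groups*n_periods - n_groups - k_regs
scalar F_stat = ((sse_pooled - sse_lsdv)/df_num) / (sse_lsdv/df_den)

display "F(" df_num ", " df_den ") = " F_stat
display "p-value = " Ftail(df_num, df_den, F_stat)
```

The displayed F should be close to the 57.73 reported by PROC TSCSREG, .xtreg, and LIMDEP; small differences in the last digits come from rounding of the SSEs.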

5. One-way Fixed Effect Models: Time Effects

A fixed time effect model investigates how time affects the intercept using time dummy variables. The logic and method are the same as those of the fixed group effect model.

5.1 Least Squares Dummy Variable Models

The least squares dummy variable (LSDV) model produces the following fifteen regression equations.

Time 01: cost = 20.4959 + .8677*output -.4845*fuel -1.9544*load
Time 02: cost = 20.5782 + .8677*output -.4845*fuel -1.9544*load
Time 03: cost = 20.6559 + .8677*output -.4845*fuel -1.9544*load
Time 04: cost = 20.7409 + .8677*output -.4845*fuel -1.9544*load
Time 05: cost = 21.2000 + .8677*output -.4845*fuel -1.9544*load
Time 06: cost = 21.4118 + .8677*output -.4845*fuel -1.9544*load
Time 07: cost = 21.5035 + .8677*output -.4845*fuel -1.9544*load
Time 08: cost = 21.6542 + .8677*output -.4845*fuel -1.9544*load
Time 09: cost = 21.8297 + .8677*output -.4845*fuel -1.9544*load
Time 10: cost = 22.1140 + .8677*output -.4845*fuel -1.9544*load
Time 11: cost = 22.4655 + .8677*output -.4845*fuel -1.9544*load
Time 12: cost = 22.6515 + .8677*output -.4845*fuel -1.9544*load
Time 13: cost = 22.6167 + .8677*output -.4845*fuel -1.9544*load
Time 14: cost = 22.5524 + .8677*output -.4845*fuel -1.9544*load
Time 15: cost = 22.5369 + .8677*output -.4845*fuel -1.9544*load

5.1.1 LSDV1 without a Dummy

In SAS REG procedure, include time dummy variables instead of group dummies. You need to exclude one of time dummies, say t15 here, in LSDV1.

PROC REG DATA=masil.airline;
   MODEL cost = t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

Analysis of Variance
                                Sum of        Mean
Source               DF        Squares      Square    F Value    Pr > F
Model                17      112.95270     6.64428     439.62    <.0001
Error                72        1.08819     0.01511
Corrected Total      89      114.04089

Root MSE            0.12294     R-Square    0.9905
Dependent Mean     13.36561     Adj R-Sq    0.9882
Coeff Var           0.91981

Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      22.53677      4.94053       4.56      <.0001
t1            1      -2.04096      0.73469      -2.78      0.0070
t2            1      -1.95873      0.72275      -2.71      0.0084
t3            1      -1.88103      0.72036      -2.61      0.0110
t4            1      -1.79601      0.69882      -2.57      0.0122
t5            1      -1.33693      0.50604      -2.64      0.0101
t6            1      -1.12514      0.40862      -2.75      0.0075
t7            1      -1.03341      0.37642      -2.75      0.0076
t8            1      -0.88274      0.32601      -2.71      0.0085
t9            1      -0.70719      0.29470      -2.40      0.0190
t10           1      -0.42296      0.16679      -2.54      0.0134
t11           1      -0.07144      0.07176      -1.00      0.3228
t12           1       0.11457      0.09841       1.16      0.2482
t13           1       0.07979      0.08442       0.95      0.3477
t14           1       0.01546      0.07264       0.21      0.8320
output        1       0.86773      0.01541      56.32      <.0001
fuel          1      -0.48448      0.36411      -1.33      0.1875
load          1      -1.95440      0.44238      -4.42      <.0001

In Stata and LIMDEP, execute the following commands to fit the same LSDV1 (output is skipped).
. regress cost t1-t14 output fuel load
REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$
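The commands above take the time dummies t1 through t15 as given. If a data set does not yet contain them, one possible way to create them in Stata is sketched below. This is only an illustration: it assumes that year is coded 1 through 15, as the by(year) listings later in this section suggest, and that no variables named t1-t15 exist yet (otherwise .tabulate stops with an error).

```stata
* Sketch: build the fifteen time dummies and fit LSDV1.
* Assumes year takes the values 1, 2, ..., 15 and t1-t15 are not yet defined.
tabulate year, generate(t)              // creates t1-t15, one indicator per year
regress cost t1-t14 output fuel load    // LSDV1 with t15 as the omitted baseline

* Alternative without creating dummies (factor-variable notation, Stata 11):
regress cost ib15.year output fuel load // ib15. makes year 15 the base category
```

Both regressions should return the estimates of the SAS output above, with year 15 absorbed into the intercept.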

5.1.2 LSDV2 without the Intercept

In LIMDEP, take ONE out of the Rhs list and include all fifteen time dummies (T1 through T15) to fit LSDV2 with the intercept suppressed. Unlike SAS and Stata, LIMDEP reports correct, although slightly different, F and R2 statistics.
REGRESS;Lhs=COST;Rhs=T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:15:08PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Autocorrel   Durbin-Watson Stat.  =   .2363289     |
|              Rho = cor[e,e(-1)]   =   .8818355     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 T1      |   20.4959389      4.20954636     4.869   .0000    .06666667
 T2      |   20.5781713      4.22154389     4.875   .0000    .06666667
 T3      |   20.6558664      4.22419549     4.890   .0000    .06666667
 T4      |   20.7408923      4.24576770     4.885   .0000    .06666667
 T5      |   21.1999763      4.44035103     4.774   .0000    .06666667
 T6      |   21.4117634      4.53864000     4.718   .0000    .06666667
 T7      |   21.5034994      4.57141663     4.704   .0000    .06666667
 T8      |   21.6541766      4.62290530     4.684   .0000    .06666667
 T9      |   21.8297215      4.65692608     4.688   .0000    .06666667
 T10     |   22.1139553      4.79266903     4.614   .0000    .06666667
 T11     |   22.4654855      4.94992975     4.539   .0000    .06666667
 T12     |   22.6514956      5.00861379     4.523   .0000    .06666667
 T13     |   22.6167135      4.98616006     4.536   .0000    .06666667
 T14     |   22.5523879      4.95596262     4.551   .0000    .06666667
 T15     |   22.5369251      4.94055238     4.562   .0000    .06666667
 OUTPUT  |    .86772681       .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467       .36410984    -1.331   .1875   12.7703592
 LOAD    |  -1.95441438       .44237791    -4.418   .0000    .56046016

In SAS and Stata, use /NOINT and noconstant, respectively, to suppress the intercept and estimate the same LSDV2 (output is skipped).
PROC REG DATA=masil.airline; MODEL cost = t1-t15 output fuel load /NOINT; RUN;

. regress cost t1-t15 output fuel load, noc
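The LSDV2 dummy coefficients are simply the time intercepts, so they can be recovered from LSDV1 by adding each dummy coefficient to the LSDV1 intercept (the intercept of the omitted year 15). The lines below are a small sketch of that check in Stata; the numbers in the display lines are copied from the LSDV1 output in Section 5.1.1, and the choice of years 1 and 5 is arbitrary.

```stata
* Sketch: recover LSDV2 time intercepts from the LSDV1 estimates.
display 22.53677 + (-2.04096)    // intercept of time 1
display 22.53677 + (-1.33693)    // intercept of time 5
display 22.53677                 // intercept of time 15, the omitted baseline

* The same sums with standard errors, computed after fitting LSDV1:
quietly regress cost t1-t14 output fuel load
lincom _cons + t1
lincom _cons + t5
```

The results should agree, up to rounding, with the T1 and T5 rows of the LSDV2 output above.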

5.1.3 LSDV3 with a Restriction

In PROC REG, you need to impose a restriction using the RESTRICT statement.
PROC REG DATA=masil.airline;
   MODEL cost = t1-t15 output fuel load;
   RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

Analysis of Variance
                                Sum of        Mean
Source               DF        Squares      Square    F Value    Pr > F
Model                17      112.95270     6.64428     439.62    <.0001
Error                72        1.08819     0.01511
Corrected Total      89      114.04089

Root MSE            0.12294     R-Square    0.9905
Dependent Mean     13.36561     Adj R-Sq    0.9882
Coeff Var           0.91981

Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      21.66698      4.62405       4.69      <.0001
t1            1      -1.17118      0.41783      -2.80      0.0065
t2            1      -1.08894      0.40586      -2.68      0.0090
t3            1      -1.01125      0.40323      -2.51      0.0144
t4            1      -0.92622      0.38177      -2.43      0.0178
t5            1      -0.46715      0.19076      -2.45      0.0168
t6            1      -0.25536      0.09856      -2.59      0.0116
t7            1      -0.16363      0.07190      -2.28      0.0258
t8            1      -0.01296      0.04862      -0.27      0.7907
t9            1       0.16259      0.06271       2.59      0.0115
t10           1       0.44682      0.17599       2.54      0.0133
t11           1       0.79834      0.32940       2.42      0.0179
t12           1       0.98435      0.38756       2.54      0.0132
t13           1       0.94957      0.36537       2.60      0.0113
t14           1       0.88524      0.33549       2.64      0.0102
t15           1       0.86978      0.32029       2.72      0.0083
output        1       0.86773      0.01541      56.32      <.0001
fuel          1      -0.48448      0.36411      -1.33      0.1875
load          1      -1.95440      0.44238      -4.42      <.0001
RESTRICT     -1   -3.9462E-15            .          .           .

* Probability computed using beta distribution.
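Because the restriction forces the fifteen dummy coefficients to sum to zero, the LSDV3 intercept is the average of the fifteen time intercepts of LSDV2, and each LSDV3 dummy coefficient is the deviation of a time intercept from that average. The sketch below checks this in Stata; the first line uses intercepts copied from the LIMDEP LSDV2 output in Section 5.1.2, and the second part assumes the dummies t1-t15 exist in the data.

```stata
* Sketch: the LSDV3 intercept as the average of the LSDV2 time intercepts.
display (20.4959389 + 20.5781713 + 20.6558664 + 20.7408923 + 21.1999763 + ///
         21.4117634 + 21.5034994 + 21.6541766 + 21.8297215 + 22.1139553 + ///
         22.4654855 + 22.6514956 + 22.6167135 + 22.5523879 + 22.5369251)/15

* The same average, with its standard error, after fitting LSDV2:
quietly regress cost t1-t15 output fuel load, noconstant
lincom (t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15)/15
```

The average should come out near 21.667, the intercept reported in the restricted regression above.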

In Stata, define the restriction with the .constraint command and specify the restriction using the constraint() option of the .cnsreg command.
. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0

. cnsreg cost t1-t15 output fuel load, constraint(3)

Constrained linear regression                     Number of obs   =         90
                                                  F( 17,    72)   =     439.62
                                                  Prob > F        =     0.0000
                                                  Root MSE        =     0.1229

 ( 1)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |  -1.171179   .4178338    -2.80   0.007    -2.004115   -.3382422
          t2 |  -1.088945   .4058579    -2.68   0.009    -1.898008   -.2798816
          t3 |  -1.011252   .4032308    -2.51   0.014    -1.815078   -.2074266
          t4 |  -.9262249   .3817675    -2.43   0.018    -1.687265   -.1651852
          t5 |  -.4671515   .1907596    -2.45   0.017    -.8474239   -.0868791
          t6 |  -.2553627   .0985615    -2.59   0.012    -.4518415   -.0588839
          t7 |  -.1636326   .0718969    -2.28   0.026    -.3069564   -.0203088
          t8 |  -.0129552   .0486249    -0.27   0.791    -.1098872    .0839768
          t9 |   .1625876   .0627099     2.59   0.012     .0375776    .2875976
         t10 |   .4468191    .175994     2.54   0.013     .0959814    .7976568
         t11 |   .7983439   .3294027     2.42   0.018     .1416916    1.454996
         t12 |   .9843536   .3875583     2.54   0.013     .2117702    1.756937
         t13 |   .9495716   .3653675     2.60   0.011     .2212248    1.677918
         t14 |   .8852448   .3354912     2.64   0.010     .2164554    1.554034
         t15 |   .8697821   .3202933     2.72   0.008     .2312891    1.508275
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
------------------------------------------------------------------------------

In LIMDEP, run the following command to fit the same LSDV3.

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)+b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:16:47PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Autocorrel   Durbin-Watson Stat.  =   .2363289     |
|              Rho = cor[e,e(-1)]   =   .8818355     |
| Restrictns.  F[  1,    71] (prob) =    .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 T1      |  -1.17119233       .41783540    -2.803   .0065    .06666667
 T2      |  -1.08895999       .40585988    -2.683   .0091    .06666667
 T3      |  -1.01126486       .40323211    -2.508   .0144    .06666667
 T4      |   -.92623900       .38176914    -2.426   .0178    .06666667
 T5      |   -.46715493       .19075952    -2.449   .0168    .06666667
 T6      |   -.25536788       .09856234    -2.591   .0116    .06666667
 T7      |   -.16363186       .07189683    -2.276   .0259    .06666667
 T8      |   -.01295461       .04862498     -.266   .7907    .06666667
 T9      |    .16259020       .06271009     2.593   .0116    .06666667
 T10     |    .44682406       .17599505     2.539   .0133    .06666667
 T11     |    .79835421       .32940389     2.424   .0179    .06666667
 T12     |    .98436437       .38755999     2.540   .0133    .06666667
 T13     |    .94958221       .36536879     2.599   .0114    .06666667
 T14     |    .88525662       .33549236     2.639   .0102    .06666667
 T15     |    .86979380       .32029396     2.716   .0083    .06666667
 OUTPUT  |    .86772681       .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467       .36410984    -1.331   .1876   12.7703592
 LOAD    |  -1.95441438       .44237791    -4.418   .0000    .56046016
 Constant|   21.6671313      4.62407240     4.686   .0000

5.2 Within Time Effect Model

The within effect model for a fixed time effect needs to compute deviations from time means. Keep in mind that the intercept should be suppressed.

5.2.1 Estimating the Fixed Time Effect Model

Let us manually estimate the fixed time effect model first.

. quietly egen tm_cost = mean(cost), by(year)
. quietly egen tm_output = mean(output), by(year)
. quietly egen tm_fuel = mean(fuel), by(year)
. quietly egen tm_load = mean(load), by(year)

(the listing of the fifteen time means is skipped)

Once time means are ready, transform the dependent and independent variables and then run OLS with the intercept suppressed.

. quietly gen tw_cost = cost - tm_cost
. quietly gen tw_output = output - tm_output
. quietly gen tw_fuel = fuel - tm_fuel
. quietly gen tw_load = load - tm_load

. regress tw_cost tw_output tw_fuel tw_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 2015.95
       Model |  75.6459391     3   25.215313           Prob > F      =  0.0000
    Residual |  1.08819023    87  .012507934           R-squared     =  0.9858
-------------+------------------------------           Adj R-squared =  0.9853
       Total |  76.7341294    90  .852601437           Root MSE      =  .11184

------------------------------------------------------------------------------
     tw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   tw_output |   .8677268   .0140171    61.90   0.000     .8398663    .8955873
     tw_fuel |  -.4844836   .3312359    -1.46   0.147    -1.142851    .1738836
     tw_load |  -1.954404   .4024388    -4.86   0.000    -2.754295   -1.154514
------------------------------------------------------------------------------

If you want to get intercepts of years, use $d_t^{*} = \bar{y}_{\bullet t} - \beta'\bar{x}_{\bullet t}$. For example, the intercept of year 7 is 21.5035=13.1597-{.8677*(-1.3024) + (-.4845)*12.6271 + (-1.9544)*.5607}. As discussed previously, standard errors of a within effect model need to be adjusted. For instance, the correct standard error of fuel price is computed as .3641=.3312*sqrt(87/72).
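The two hand computations above (a year intercept and the adjusted standard errors) can be scripted rather than typed in. The following Stata lines are a sketch only; they reuse the tw_ variables generated above and plug in the year 7 means and the unadjusted standard errors printed in this section. The value 15 in the last line is the number of time dummies that the within regression does not count.

```stata
* Sketch: intercept of year 7 from its time means and the within estimates.
display 13.15965 - (.8677268*(-1.302416) + (-.4844836)*12.62714 + (-1.954404)*.5607425)

* Adjusted standard errors: multiply by sqrt(87/72), the ratio of the
* within-model and LSDV degrees of freedom.
display .0140171*sqrt(87/72)     // output
display .3312359*sqrt(87/72)     // fuel
display .4024388*sqrt(87/72)     // load

* The same adjustment using stored results after the within regression:
quietly regress tw_cost tw_output tw_fuel tw_load, noconstant
display _se[tw_fuel]*sqrt(e(df_r)/(e(df_r) - 15))
```

The adjusted values should match the LSDV standard errors of Section 5.1 (.0154 for output, .3641 for fuel, and .4424 for load).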

. sum cost output fuel load if year==7

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cost |         6    13.15965    1.071738   11.88492   14.52004
      output |         6   -1.302416    1.272691  -2.865108   .2550375
        fuel |         6    12.62714    .0747646   12.48162   12.68725
        load |         6    .5607425     .029541    .510342    .594495

5.2.2 Using SAS: PROC TSCSREG and PROC PANEL

You need to sort the data set by variables (i.e., year and airline), which will appear in the ID statement of PROC TSCSREG and PROC PANEL.

PROC SORT DATA=masil.airline;
   BY year airline;
RUN;

PROC TSCSREG DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /FIXONE;
RUN;

(output is skipped)

PROC PANEL DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /FIXONE;
RUN;

The output is very similar to that of LSDV1 in Section 5.1.1. The F test does not reject the null hypothesis of no fixed time effect (F=1.17, p=.3178); that is, there is no fixed time effect in these panel data.

The PANEL Procedure
Fixed One Way Estimates

Dependent Variable: cost

Model Description

Estimation Method            FixOne
Number of Cross Sections         15
Time Series Length                6

Fit Statistics

SSE          1.0882    DFE              72
MSE          0.0151    Root MSE     0.1229
R-Square     0.9905

F Test for No Fixed Effects

Num DF    Den DF    F Value    Pr > F

    14        72       1.17    0.3178

Parameter Estimates

                                  Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|
(rows for CS1 through CS14 and the intercept are skipped)
output        1     0.867727        0.0154      56.32      <.0001
fuel          1     -0.48448        0.3641      -1.33      0.1875
load          1      -1.9544        0.4424      -4.42      <.0001

5.2.3 Using Stata

In the Stata .xtreg command, the fe option fits the fixed effect model. The following .iis command specifies year as a panel identification variable. In this case, i(year) is redundant.

. iis year
. xtreg cost output fuel load, fe i(year)

Fixed-effects (within) regression               Number of obs      =        90
Group variable: year                            Number of groups   =        15

R-sq:  within  = 0.9858                         Obs per group: min =         6
       between = 0.4812                                        avg =       6.0
       overall = 0.5265                                        max =         6

                                                F(3,72)            =   1668.37
corr(u_i, Xb)  = -0.1503                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
-------------+----------------------------------------------------------------
     sigma_u |   .8027907
     sigma_e |  .12293801
         rho |  .97708602   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(14, 72) =     1.17              Prob > F = 0.3178

Again, the intercept 21.6670 is the intercept of LSDV3 (see 5.1.3).

5.2.4 Using LIMDEP

In LIMDEP, specify a time-series variable for stratification in the Str= subcommand. Do not forget to include ONE for the intercept. The pooled OLS part of the output is skipped.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group Dummy Variables           |
| Ordinary    least squares regression               |
| Model was estimated Aug 27, 2009 at 04:19:57PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Estd. Autocorrelation of e(i,t)       .881836      |
+----------------------------------------------------+
+----------------------------------------------------+
| Panel:Groups   Empty       0,   Valid data     15  |
|                Smallest    6,   Largest         6  |
|                Average group size            6.00  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |    .86772681       .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467       .36410984    -1.331   .1868   12.7703592
 LOAD    |  -1.95441438       .44237791    -4.418   .0000    .56046016

+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|        Model            Log-Likelihood  Sum of Squares   R-squared |
|(1) Constant term only       -138.35814  .1140409821D+03   .0000000 |
|(2) Group effects only       -120.52864  .7673414157D+02   .3271354 |
|(3) X - variables only         61.76991  .1335449522D+01   .9882897 |
|(4) X and group effects        70.98362  .1088193393D+01   .9904579 |
+--------------------------------------------------------------------+
|                        Hypothesis Tests                            |

|         Likelihood Ratio Test           F Tests                    |
|         Chi-squared   d.f.  Prob.       F    num. denom.   P value |
|(2) vs (1)    35.659    14  .00117      2.605   14    75     .00404 |
|(3) vs (1)   400.256     3  .00000   2419.329    3    86     .00000 |
|(4) vs (1)   418.684    17  .00000    439.617   17    72     .00000 |
|(4) vs (2)   383.025     3  .00000   1668.364    3    72     .00000 |
|(4) vs (3)    18.427    14  .18800      1.169   14    72     .31776 |
+--------------------------------------------------------------------+

You may find F statistic 1.169 at the last line of the output and do not reject the null hypothesis of no fixed time effect.

5.3 Between Time Effect Model

The between effect model regresses time means of dependent variables on those of independent variables. See Sections 3.2 and 4.6.

. collapse (mean) tm_cost=cost (mean) tm_output=output (mean) tm_fuel=fuel ///
      (mean) tm_load=load, by(year)

. regress tm_cost tm_output tm_fuel tm_load

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 4074.33
       Model |  6.21220479     3  2.07073493           Prob > F      =  0.0000
    Residual |  .005590631    11  .000508239           R-squared     =  0.9991
-------------+------------------------------           Adj R-squared =  0.9989
       Total |  6.21779542    14  .444128244           Root MSE      =  .02254

------------------------------------------------------------------------------
     tm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   tm_output |   1.133337   .0512898    22.10   0.000     1.020449    1.246225
     tm_fuel |   .3342486   .0228284    14.64   0.000     .2840035    .3844937
     tm_load |  -1.350727   .2478264    -5.45   0.000    -1.896189   -.8052644
       _cons |   11.18505   .3660016    30.56   0.000     10.37949    11.99062
------------------------------------------------------------------------------

PROC PANEL has the /BTWNT option to estimate the between effect model.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /BTWNT;
RUN;

The PANEL Procedure
Between Time Periods Estimates

Dependent Variable: cost

Model Description

Estimation Method           BtwTime
Number of Cross Sections          6
Time Series Length               15

Fit Statistics

SSE          0.0056    DFE              11

MSE          0.0005    Root MSE     0.0225
R-Square     0.9991

Parameter Estimates

                                  Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|    Label
Intercept     1     11.18504        0.3660      30.56      <.0001    Intercept
output        1     1.133335        0.0513      22.10      <.0001
fuel          1     0.334249        0.0228      14.64      <.0001
load          1     -1.35073        0.2478      -5.45      0.0002

Alternatively, use the be option in the Stata .xtreg command and the Means subcommand in the LIMDEP Regress$ command to get the same result.

. xtreg cost output fuel load, be i(year)

Between regression (regression on group means)  Number of obs      =        90
Group variable: year                            Number of groups   =        15

R-sq:  within  = 0.9840                         Obs per group: min =         6
       between = 0.9991                                        avg =       6.0
       overall = 0.9749                                        max =         6

                                                F(3,11)            =   4074.35
sd(u_i + avg(e_i.))=  .0225441                  Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   1.133335   .0512897    22.10   0.000     1.020447    1.246223
        fuel |   .3342494   .0228284    14.64   0.000     .2840044    .3844943
        load |   -1.35073   .2478257    -5.45   0.000    -1.896191   -.8052695
       _cons |   11.18504   .3660008    30.56   0.000     10.37948     11.9906
------------------------------------------------------------------------------

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Means$

+----------------------------------------------------+
| Group Means Regression                             |
| Ordinary    least squares regression               |
| Model was estimated Aug 27, 2009 at 04:23:24PM     |
| LHS=YBAR(i.) Mean                 =   13.36561     |
|              Standard deviation   =   .6664301     |
| WTS=NTi/Nobs Number of observs.   =         15     |
| Model size   Parameters           =          4     |
|              Degrees of freedom   =         11     |
| Residuals    Sum of squares       =   .5590461E-02 |
|              Standard error of e  =   .2254382E-01 |
| Fit          R-squared            =   .9991009     |
|              Adjusted R-squared   =   .9988557     |
| Model test   F[  3,    11] (prob) =4074.46 (.0000) |
| Diagnostic   Log likelihood       =   37.92650     |
|              Restricted(b=0)      =  -14.67933     |
|              Chi-sq [  3]  (prob) = 105.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -7.348200     |
|              Akaike Info. Criter. =  -7.361410     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |   1.13334032       .05128905    22.097   .0000   .111879D-13
 FUEL    |    .33424795       .02282811    14.642   .0000   .111879D-13

 LOAD    |  -1.35072980       .24782272    -5.450   .0000   .141312D-06
 Constant|   11.1850651       .36599619    30.561   .0000

5.4 Testing Fixed Time Effects

The null hypothesis of the fixed time effect model is that all time dummy parameters except one are zero: $H_0: \tau_1 = \dots = \tau_{T-1} = 0$. The F statistic is

$\frac{(1.3354 - 1.0882)/(15 - 1)}{1.0882/(6 \times 15 - 15 - 3)} \sim 1.1683\,[14, 72]$

where 1.3354 is the SSE of the pooled OLS and 1.0882 is the SSE of the fixed time effect model. The small F statistic does not reject the null hypothesis of no fixed time effect (p=.3178).

SAS PROC PANEL, LIMDEP, and Stata .xtreg by default conduct the F test. You may conduct the same test using the TEST statement in LSDV1 and the Stata .test command.

PROC REG DATA=masil.airline;
   MODEL cost = t1-t14 output fuel load;
   TEST t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

(output is skipped)

. quietly regress cost t1-t14 output fuel load
. test t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

 ( 1)  t1 = 0
 ( 2)  t2 = 0
 ( 3)  t3 = 0
 ( 4)  t4 = 0
 ( 5)  t5 = 0
 ( 6)  t6 = 0
 ( 7)  t7 = 0
 ( 8)  t8 = 0
 ( 9)  t9 = 0
 (10)  t10 = 0
 (11)  t11 = 0
 (12)  t12 = 0
 (13)  t13 = 0
 (14)  t14 = 0

       F( 14,    72) =    1.17
            Prob > F =    0.3178
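The F statistic of Section 5.4 can be reproduced directly from the two sums of squared errors. The sketch below is an illustration in Python rather than in SAS, Stata, or LIMDEP; the function name `f_statistic` and the variable names are assumptions made for this example, and the SSE values are the rounded figures 1.3354 (pooled OLS) and 1.0882 (fixed time effect model).

```python
def f_statistic(sse_restricted, sse_unrestricted, num_df, den_df):
    """F test of a restricted model against an unrestricted one, based on their SSEs."""
    numerator = (sse_restricted - sse_unrestricted) / num_df
    denominator = sse_unrestricted / den_df
    return numerator / denominator


n_groups, n_periods, n_regressors = 6, 15, 3

# Fourteen time dummies are tested; the LSDV model uses 15 intercepts plus 3 slopes.
num_df = n_periods - 1
den_df = n_groups * n_periods - n_periods - n_regressors

f_time = f_statistic(1.3354, 1.0882, num_df, den_df)
print(num_df, den_df, round(f_time, 4))
```

Because the inputs are rounded to four decimals, the result matches the reported 1.1683 only up to rounding; the packages compute it from unrounded sums of squares.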

6. Two-way Fixed Effect Models

A two-way fixed effect model explores fixed effects of two group variables, two time variables, or one group and one time variable. This chapter investigates fixed group and time effects. This model thus needs two sets of group and time dummy variables (i.e., airline and year).

6.1 Strategies of the Least Squares Dummy Variable Models

You may combine LSDV1, LSDV2, and LSDV3 to avoid perfect multicollinearity or the dummy variable trap in a two-way fixed effect model. There are five strategies when combining three LSDVs.

1. Drop one cross-section and one time-series dummy variables.
2. Drop one cross-section dummy and suppress the intercept. Alternatively, drop one time dummy and suppress the intercept.
3. Drop one cross-section dummy and impose a restriction on the time-series dummy parameters: $\sum \tau_t = 0$. Alternatively, drop one time-series dummy and impose a restriction on the cross-section dummy parameters: $\sum \mu_i = 0$.
4. Suppress the intercept and impose a restriction on the cross-section dummy parameters: $\sum \mu_i = 0$. Alternatively, suppress the intercept and impose a restriction on the time-series dummy parameters: $\sum \tau_t = 0$.
5. Include all dummy variables and impose two restrictions on the cross-section and time-series dummy parameters: $\sum \mu_i = 0$ and $\sum \tau_t = 0$.

Each strategy produces different dummy coefficients but returns exactly the same parameter estimates of regressors. In general, dummy coefficients are not of primary interest in panel data models. The first strategy of dropping two dummies is generally recommended because of its convenience of model estimation and interpretation. Since .cnsreg does not allow suppressing the intercept, strategy 4 does not work in Stata.

6.2 LSDV1 without Two Dummies

The first strategy excludes two dummy variables, one dummy from each set of dummy variables. Let us exclude g6 for the sixth airline and t15 for the last time period.

. regress cost g1-g5 t1-t14 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 22,    67) = 1960.82
       Model |  113.864044    22  5.17563838           Prob > F      =  0.0000
    Residual |  .176848775    67  .002639534           R-squared     =  0.9984
-------------+------------------------------           Adj R-squared =  0.9979
       Total |  114.040893    89  1.28135835           Root MSE      =  .05138

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499

          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.94004   2.218231     5.83   0.000     8.512434    17.36765
------------------------------------------------------------------------------

In SAS, run the following script to get the same result.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

Analysis of Variance

                                Sum of          Mean
Source               DF        Squares        Square    F Value    Pr > F
Model                22      113.86404       5.17564    1960.82    <.0001
Error                67        0.17685       0.00264
Corrected Total      89      114.04089

Root MSE              0.05138    R-Square     0.9984
Dependent Mean       13.36561    Adj R-Sq     0.9979
Coeff Var             0.38439

Parameter Estimates

                  Parameter      Standard
Variable    DF     Estimate         Error    t Value    Pr > |t|
Intercept    1     12.94004       2.21823       5.83      <.0001
g1           1      0.17428       0.08612       2.02      0.0470
g2           1      0.11145       0.07796       1.43      0.1575
g3           1     -0.14351       0.05189      -2.77      0.0073

g4           1      0.18021       0.03214       5.61      <.0001
g5           1     -0.04669       0.02247      -2.08      0.0415
t1           1     -0.69314       0.33784      -2.05      0.0441
t2           1     -0.63844       0.33208      -1.92      0.0588
t3           1     -0.59580       0.32945      -1.81      0.0750
t4           1     -0.54215       0.31891      -1.70      0.0938
t5           1     -0.47304       0.23195      -2.04      0.0454
t6           1     -0.42720       0.18844      -2.27      0.0266
t7           1     -0.39598       0.17330      -2.28      0.0255
t8           1     -0.33985       0.15011      -2.26      0.0268
t9           1     -0.27189       0.13482      -2.02      0.0477
t10          1     -0.22739       0.07635      -2.98      0.0040
t11          1     -0.11180       0.03190      -3.50      0.0008
t12          1     -0.03364       0.04290      -0.78      0.4357
t13          1     -0.01773       0.03626      -0.49      0.6263
t14          1     -0.01865       0.03051      -0.61      0.5432
output       1      0.81725       0.03185      25.66      <.0001
fuel         1      0.16861       0.16348       1.03      0.3061
load         1     -0.88281       0.26174      -3.37      0.0012

In LIMDEP, the following command fits the same model (output is skipped).

REGRESS;Lhs=COST;
   Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

6.3 LSDV1 + LSDV2: Drop a Dummy and Suppress the Intercept

The second strategy combines LSDV1 and LSDV2 to drop a dummy and suppress the intercept. Let us drop a dummy g6 and suppress the intercept. Keep in mind that SSE is still correct but F and R2 are not.

. regress cost g1-g5 t1-t15 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 23,    67) =       .
       Model |  16191.4201    23  703.974786           Prob > F      =  0.0000
    Residual |  .176848775    67  .002639534           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .05138

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499
          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |    12.2469   1.885399     6.50   0.000      8.48363    16.01018
          t2 |    12.3016   1.891045     6.51   0.000     8.527062    16.07615
          t3 |   12.34424    1.89341     6.52   0.000     8.564976     16.1235
          t4 |   12.39789   1.903395     6.51   0.000     8.598694    16.19708
          t5 |     12.467   1.991503     6.26   0.000     8.491942    16.44206
          t6 |   12.51284   2.035334     6.15   0.000     8.450294    16.57538
          t7 |   12.54406    2.05038     6.12   0.000     8.451487    16.63664
          t8 |   12.60019   2.073782     6.08   0.000     8.460909    16.73948
          t9 |   12.66815   2.090527     6.06   0.000     8.495438    16.84086
         t10 |   12.71266   2.151893     5.91   0.000     8.417458    17.00785
         t11 |   12.82824   2.221401     5.77   0.000     8.394303    17.26217
         t12 |    12.9064   2.247972     5.74   0.000      8.41943    17.39337
         t13 |   12.92231   2.237999     5.77   0.000     8.455241    17.38937
         t14 |    12.9214   2.224893     5.81   0.000     8.480492     17.3623

         t15 |   12.94004   2.218231     5.83   0.000     8.512434    17.36765
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
------------------------------------------------------------------------------

Alternatively, you may drop one of the time dummies and suppress the intercept. The dummy coefficients are different from those above but parameter estimates of regressors remain unchanged.

. regress cost g1-g6 t1-t14 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 23,    67) =       .
       Model |  16191.4201    23  703.974786           Prob > F      =  0.0000
    Residual |  .176848775    67  .002639534           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .05138

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   13.11432   2.229552     5.88   0.000      8.66412    17.56453
          g2 |   13.05149   2.229864     5.85   0.000     8.600665    17.50232
          g3 |   12.79653   2.230546     5.74   0.000     8.344341    17.24872
          g4 |   13.12025   2.223638     5.90   0.000      8.68185    17.55865
          g5 |   12.89335   2.222204     5.80   0.000      8.45781    17.32888
          g6 |   12.94004   2.218231     5.83   0.000     8.512434    17.36765
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
------------------------------------------------------------------------------

In SAS, execute the following script that has /NOINT to suppress the intercept.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t15 output fuel load /NOINT;
   MODEL cost = g1-g6 t1-t14 output fuel load /NOINT;
RUN;

(output is skipped)

In LIMDEP, ONE should be taken out to suppress the intercept.

REGRESS;Lhs=COST;
   Rhs=G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$

(output is skipped)

REGRESS;Lhs=COST;

   Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary    least squares regression               |
| Model was estimated Aug 30, 2009 at 03:58:13PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1768479     |
|              Standard error of e  =   .5137627E-01 |
| Fit          R-squared            =   .9984493     |
|              Adjusted R-squared   =   .9979401     |
| Model test   F[ 22,    67] (prob) =1960.83 (.0000) |
| Diagnostic   Log likelihood       =   152.7479     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 22]  (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.709580     |
|              Akaike Info. Criter. =  -5.721164     |
| Autocorrel   Durbin-Watson Stat.  =   .6035047     |
|              Rho = cor[e,e(-1)]   =   .6982476     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 G1      |   13.1139819      2.22955625     5.882   .0000    .16666667
 G2      |   13.0511515      2.22986828     5.853   .0000    .16666667
 G3      |   12.7961914      2.23055043     5.737   .0000    .16666667
 G4      |   13.1199153      2.22364115     5.900   .0000    .16666667
 G5      |   12.8930131      2.22220692     5.802   .0000    .16666667
 G6      |   12.9397087      2.21823375     5.833   .0000    .16666667
 T1      |   -.69308729       .33783938    -2.052   .0441    .06666667
 T2      |   -.63838795       .33208126    -1.922   .0588    .06666667
 T3      |   -.59575348       .32944797    -1.808   .0750    .06666667
 T4      |   -.54210773       .31891465    -1.700   .0938    .06666667
 T5      |   -.47300784       .23194606    -2.039   .0454    .06666667
 T6      |   -.42717813       .18844068    -2.267   .0266    .06666667
 T7      |   -.39595152       .17329717    -2.285   .0255    .06666667
 T8      |   -.33982426       .15010661    -2.264   .0268    .06666667
 T9      |   -.27187359       .13481769    -2.017   .0477    .06666667
 T10     |   -.22737840       .07634935    -2.978   .0040    .06666667
 T11     |   -.11180525       .03190046    -3.505   .0008    .06666667
 T12     |   -.03364915       .04290088     -.784   .4356    .06666667
 T13     |   -.01774030       .03625541     -.489   .6262    .06666667
 T14     |   -.01864714       .03050793     -.611   .5431    .06666667
 OUTPUT  |    .81725242       .03185102    25.659   .0000  -1.17430918
 FUEL    |    .16863516       .16347826     1.032   .3060   12.7703592
 LOAD    |   -.88281516       .26173663    -3.373   .0012    .56046016

Notice that LIMDEP reports correct F (1960.83) and R2 (.9984).

6.4 LSDV1 + LSDV3: Drop a Dummy and Impose a Restriction

The third strategy excludes one dummy from a set of dummy variables and imposes a restriction on another set of dummy parameters. Let us drop a time dummy here and then impose a restriction on group dummy parameters.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 t1-t14 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure

Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

Analysis of Variance

                                Sum of          Mean
Source               DF        Squares        Square    F Value    Pr > F
Model                22      113.86404       5.17564    1960.82    <.0001
Error                67        0.17685       0.00264
Corrected Total      89      114.04089

Root MSE              0.05138    R-Square     0.9984
Dependent Mean       13.36561    Adj R-Sq     0.9979
Coeff Var             0.38439

Parameter Estimates

                  Parameter      Standard
Variable    DF     Estimate         Error    t Value    Pr > |t|
Intercept    1     12.98600       2.22540       5.84      <.0001
g1           1      0.12833       0.04601       2.79      0.0069
g2           1      0.06549       0.03897       1.68      0.0975
g3           1     -0.18947       0.01561     -12.14      <.0001
g4           1      0.13425       0.01832       7.33      <.0001
g5           1     -0.09265       0.03731      -2.48      0.0155
g6           1     -0.04596       0.04161      -1.10      0.2733
t1           1     -0.69314       0.33784      -2.05      0.0441
t2           1     -0.63844       0.33208      -1.92      0.0588
t3           1     -0.59580       0.32945      -1.81      0.0750
t4           1     -0.54215       0.31891      -1.70      0.0938
t5           1     -0.47304       0.23195      -2.04      0.0454
t6           1     -0.42720       0.18844      -2.27      0.0266
t7           1     -0.39598       0.17330      -2.28      0.0255
t8           1     -0.33985       0.15011      -2.26      0.0268
t9           1     -0.27189       0.13482      -2.02      0.0477
t10          1     -0.22739       0.07635      -2.98      0.0040
t11          1     -0.11180       0.03190      -3.50      0.0008
t12          1     -0.03364       0.04290      -0.78      0.4357
t13          1     -0.01773       0.03626      -0.49      0.6263
t14          1     -0.01865       0.03051      -0.61      0.5432
output       1      0.81725       0.03185      25.66      <.0001
fuel         1      0.16861       0.16348       1.03      0.3061
load         1     -0.88281       0.26174      -3.37      0.0012
RESTRICT    -1   ...9387E-16          ...        ...         ...*

* Probability computed using beta distribution.
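The claim that each strategy yields different dummy coefficients but identical regressor estimates rests on a simple reparameterization, which can be illustrated with the group dummy estimates of Section 6.2. The sketch below is written in Python purely for illustration (the document itself uses SAS, Stata, and LIMDEP); the variable names are assumptions of this example, and the inputs are the LSDV1 intercept and g1 through g5 coefficients reported by Stata, with the dropped g6 entered as zero.

```python
# LSDV1 estimates from Section 6.2 (g6 and t15 were dropped).
intercept_lsdv1 = 12.94004
g_lsdv1 = [0.1742825, 0.1114508, -0.143511, 0.1802087, -0.0466942, 0.0]  # g6 is the baseline

# Suppressing the intercept (g1-g6 with t1-t14, Section 6.3):
# every airline gets its own intercept, the LSDV1 intercept plus its dummy coefficient.
g_noint = [intercept_lsdv1 + g for g in g_lsdv1]

# Imposing g1 + ... + g6 = 0 (Section 6.4):
# dummy coefficients become deviations from their mean, and the mean moves into the intercept.
mean_g = sum(g_lsdv1) / len(g_lsdv1)
g_sum0 = [g - mean_g for g in g_lsdv1]
intercept_sum0 = intercept_lsdv1 + mean_g

print([round(g, 5) for g in g_noint])
print(round(intercept_sum0, 5), [round(g, 5) for g in g_sum0])
```

The first list should agree with the g1 through g6 coefficients of the no-intercept regression, and the second with the restricted estimates above, up to rounding of the inputs. The same shift applies to the time dummies, which is why the slopes of output, fuel, and load never change.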

In Stata, you need to run the .cnsreg command with a constraint on the group dummy parameters. .cnsreg with the constraint(1) option fits OLS under constraint 1 defined in .constraint.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 t1-t14 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F( 22,    67) = 1960.82
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  0.0514

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1283264   .0460126     2.79   0.007     .0364849    .2201679
          g2 |   .0654947   .0389685     1.68   0.097    -.0122867    .1432761
          g3 |  -.1894671   .0156096   -12.14   0.000     -.220624   -.1583102
          g4 |   .1342526   .0183163     7.33   0.000      .097693    .1708121
          g5 |  -.0926504   .0373085    -2.48   0.016    -.1671184   -.0181824
          g6 |  -.0459561   .0416069    -1.10   0.273    -.1290038    .0370916
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |     12.986   2.225402     5.84   0.000     8.544076    17.42792
------------------------------------------------------------------------------

In LIMDEP, run a Regress$ command with the Cls: subcommand. b(2) in the subcommand indicates the second parameter estimate listed in the Rhs= subcommand: b(2) for g1 through b(7) for g6. Therefore, LIMDEP fits the LSDV1 under the constraint that the sum of all group dummy parameters is zero.

REGRESS;Lhs=COST;
   Rhs=ONE,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD;
   Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary    least squares regression               |
| Model was estimated Aug 30, 2009 at 04:24:35PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1768479     |
|              Standard error of e  =   .5137627E-01 |
| Fit          R-squared            =   .9984493     |
|              Adjusted R-squared   =   .9979401     |
| Model test   F[ 22,    67] (prob) =1960.83 (.0000) |

T14.06666667 T12 | -.T5.5432 . In LIMDEP. LogAmemiya Prd.69308729 . = .483 . Rsqd may be < 0.06666667 T11 | -.06666667 T3 | -.18844068 -2.G2.4356 .T10.06666667 T7 | -.03625541 -.489 .681 . . Criter. = -5.3061 12.airline.0976 .0000 G1 | .5 LSDV2 + LSDV3: Suppress the Intercept and Impose a Restriction The strategy of LSDV2 + LSDV3 includes all two sets of dummy variables and instead suppresses the intercept and imposes a restriction.13425504 .784 .11180525 .0008 .0269 .31891465 -1.26173663 -3.47300784 .373 .0041 . F (703.01774030 .39595152 .032 .16666667 G6 | -.789 .22540616 5.9748) and R2 are incorrect.T9.7479 | | Restricted(b=0) = -138.0442 .18946893 .88281516 .808 .6262 .T8.01864714 .16666667 T1 | -.9856603 2. | | Note.700 .0751 .0156 .06666667 T8 | -. RUN.0939 .56046016 Alternatively.16863516 .23194606 -2.06666667 T5 | -.16666667 G2 | .indiana.Lhs=COST.01560965 -12.22737840 .835 .285 .039 .06666667 T14 | -. cnsreg cost g1-g5 t1-t15 output fuel load. RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0.T3.06666667 T13 | -. Rsqd & F may be < 0.00 (*****) | | Not using OLS or no constant. The following procedure has a constraint on the group variable.G1.06666667 T10 | -.33982426 .54210773 .06666667 OUTPUT | .33208126 -1.6035047 | | Rho = cor[e.03050793 -.0000 . you may drop one group dummy and imposes a restriction on time dummy variables.03364915 .T2.922 .27187359 .2734 .0267 .267 .16666667 G5 | -.T11.13481769 -2.06666667 T2 | -.edu/~statmath 59 .33783938 -2.264 .6982476 | | Restrictns. MODEL cost = g1-g6 t1-t15 output fuel load /NOINT.06549116 .03730846 -2.505 .16666667 G4 | .104 .0000) | | Info criter.T4.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 59 | Diagnostic Log likelihood = 152.17329717 -2.42717813 .T6.FUEL.04595164 . F[ 1.03190046 -3. http://www. Since the intercept is suppressed.06666667 T6 | -.721164 | | Autocorrel Durbin-Watson Stat.G4. 
MODEL cost = g1-g5 t1-t15 output fuel load.G5.G3.e(-1)] = .OUTPUT. with restrictions imposed.138 .63838795 .978 . 66] (prob) = . Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$ 6.06666667 T4 | -. Stata does not support this approach.7703592 LOAD | -. Crt.3581 | | Chi-sq [ 22] (prob) = 582.052 . b(7) indicates the seventh parameter estimate for t1.12832155 .06666667 T9 | -.16666667 G3 | -.04601257 2.03185102 25.T12. The output is skipped. | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ Constant| 12.0069 . Rhs=ONE.709580 | | Akaike Info.32944797 -1. = -5.airline.16347826 1.330 .T15.0012 .LOAD.0255 .15010661 -2.T13.21 (.0000 -1.659 . PROC REG DATA=masil.T1.0589 . constraint(3) REGRESS.017 .81725242 .0000 .09264719 .17430918 FUEL | .0454 .01831636 7.03896849 1.04160692 -1.0478 .07634935 -2. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0 .59575348 .T7. PROC REG DATA=masil.04290088 -.611 .

0001 <.25499 2.09265 -0.99808 2.04601 0.14 7. RUN.0155 0.51 6. R-Square is redefined.22540 Variable g1 g2 g3 g4 g5 g6 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 DF 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 t Value 2.0001 <.05706 2.23202 2.64615 12.84 Pr > |t| 0.24505 2.0001 <.2733 <.0001 <.51295 12.81 5.71410 12.74 5.22838 2.04161 1.08052 2.87419 12.39019 12.edu/~statmath 60 .36561 0.0000 Parameter Estimates Parameter Estimate 0.09734 2.29286 12. Linear Regression Models for Panel Data: 60 The REG Procedure Model: MODEL1 Dependent Variable: cost NOTE: Restrictions have been applied to parameter estimates.96826 12.0001 Root MSE Dependent Mean Coeff Var 0.44384 12.00264 Source Model Error Uncorrected Total DF 23 67 90 F Value 266704 Pr > F <.15883 2.04195 2.33 -2.0001 <.10 6.13425 -0.89169 1.03897 0.0001 0.18947 0.96735 12.© 2005-2009 The Trustees of Indiana University (9/16/2009) RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0.78 5.12833 0.75861 12.0001 <.0001 <.05138 13.55879 12.0001 <.52 6.01561 0.79 1.12 6.26 6.0001 <.95236 12.34756 12.04596 12.08 6.59002 12.0069 0.15 6.0001 <.48 -1.0001 <. Number of Observations Read Number of Observations Used 90 90 NOTE: No intercept in model.0001 <.0001 <.68 -12.50 6.06549 -0.90989 1.38439 R-Square Adj R-Sq 1.89982 1.89736 1.0001 <.0975 <.52 6.98600 Standard Error 0.03731 0.97479 0.06 5.0000 1.78 5.0001 <.91 5.01832 0.indiana.0001 http://www. Analysis of Variance Sum of Squares 16191 0.17685 16192 Mean Square 703.

FUEL. The output is skipped..T9.FUEL.T10.3061 0.36561 | | Standard deviation = 1.319 . Criter. following commands are supposed to work.39453125 306661.00 <.T13.5169924E-01 | | Fit R-squared = .0000) | | Diagnostic Log likelihood = 152.89339E-14 Linear Regression Models for Panel Data: 61 25.G2..airline. 67] (prob) =1936.000 1.319 . = -5. REGRESS.T2.T13.1839 | | Restricted(b=0) = -138.8261719 . In LIMDEP.T14.© 2005-2009 The Trustees of Indiana University (9/16/2009) output fuel load RESTRICT 1 1 1 -1 0.. Crt.G2.indiana. G6 | 12.08 (.T14.06666667 T3 | -.OUTPUT.9453125 216842.0000 .000 1.26174 1.T15.16861 -0.81725 0.637 .33203125 433684.06666667 http://www.03185 0.4113) | | Not using OLS or no constant.e(-1)] = .9984297 | | Adjusted R-squared = . 2009 at 04:47:10PM | | LHS=COST Mean = 13.T3.T9.G6..T6.T12.T3.Lhs=COST. RUN.0012 1. LogAmemiya Prd..0000 .697046 | | Akaike Info.0000 .G3..16666667 G4 | 13. = 90 | | Model size Parameters = 23 | | Degrees of freedom = 67 | | Residuals Sum of squares = .708630 | | Autocorrel Durbin-Watson Stat..T4...G4.000 1.. F[ 1..06666667 T2 | -.250165E-9 * Probability computed using beta distribution..6917788 | | Restrictns.7812500 .T2.348 .0000* 0. Rhs=G1...000 1.Lhs=COST.T7..0000) | | Info criter.6894531 216842..319 ...37 (..(Fixed Parameter).. PROC REG DATA=masil.03 -3.LOAD. You may impose an alternative restriction on the time variable to obtain the equivalent result despite different dummy coefficients.. G2 | 12..16666667 G5 | 12..G6.T5..edu/~statmath 61 .29101563 216842.LOAD.16348 0. 66] (prob) = ..T8. | | Note. with restrictions imposed.T15.(Fixed Parameter). but they return different parameter estimates and goodness-of-fit measures probably due to its estimation method. Rhs=G1.66 1.131971 | | WTS=none Number of observs.T12.6164424 | | Rho = cor[e..T11.T4.0000 .0001 0.0058594 .G5. = .. RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0..68 (. 
| +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ G1 | 13..88281 5. Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$ +----------------------------------------------------+ | Linearly restricted regression | | Ordinary least squares regression | | Model was estimated Aug 30.T7. T1 | -.(Fixed Parameter).G5.0117188 216842.T11.T8.000 1.T1.37 0.1790783 | | Standard error of e = ..G4.3581 | | Chi-sq [ 22] (prob) = 581.0000 ...OUTPUT.T5.9979141 | | Model test F[ 22..000 1.319 .T10.G3.0000 . Rsqd & F may be < 0. = -5.16666667 G3 | 12. MODEL cost = g1-g6 t1-t15 output fuel load /NOINT. Rsqd may be < 0. Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$ (output is skipped) REGRESS.T6.T1.

33203125 .9984 0.0000 .86404 0. Pay attention to the two RESTRICT statements in the following PROC REG.indiana..0001 0..319 216842.. RUN.82 Pr > F <.79 Pr > |t| <.30468750 .000 1.(Fixed 216842.3587 -3.56046016 6.365 .04089 Mean Square 5.7703592 .0000 .17430918 12.06666667 .03205125 .10742188 -..319 216842. Parameter)....0069 http://www..88619366 306661.....06666667 . MODEL cost = g1-g6 t1-t15 output fuel load. 25.. RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0.07421875 -.397 ..(Fixed .9979 Parameter Estimates Parameter Estimate 12.36561 0...10351563 .348 .924 .319 216842. Parameter).000 1.31250000 ...319 216842...38439 R-Square Adj R-Sq 0.000 1...26338199 Linear Regression Models for Panel Data: 62 .. The REG Procedure Model: MODEL1 Dependent Variable: cost NOTE: Restrictions have been applied to parameter estimates..16406250 -.05859375 ..06666667 ....000 1.24414063 -.06666667 .06666667 ...02148438 . . Number of Observations Read Number of Observations Used 90 90 Analysis of Variance Sum of Squares 113.(Fixed ..12833 Standard Error 2.66688 0.airline...(Fixed ..81399272 .06666667 -1.0000 .04601 Variable Intercept g1 DF 1 1 t Value 6.0000 Parameter)..17685 114..319 216842.© 2005-2009 The Trustees of Indiana University (9/16/2009) T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15 OUTPUT FUEL LOAD | | | | | | | | | | | | | | | -. Parameter). PROC REG DATA=masil.31835938 .edu/~statmath 62 ..0000 ..0013 .000 1.000 1.16450594 .05138 13.08107 0.17564 0.0000 .(Fixed ..0000 Parameter)...... RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0..00264 Source Model Error Corrected Total DF 22 67 89 F Value 1960.....15204518 -.06666667 .0001 Root MSE Dependent Mean Coeff Var 0.000 1..6 LSDV3 with Two Restrictions The last strategy includes all group and time dummies and then imposes two restrictions on group and time dummy parameters.319 .0000 ..09 2....22070313 ..

1491443 1.0740 0.2766893 .055 -.0917281 .0892789 t4 | -.0000 0.02073 0.1583102 g4 | .79 0.2073105 .0341 <.202 -.109 -.1222038 t5 | -.1088 0.1406043 -.03 -3.96 0.3004686 .0181824 g6 | -. t P>|t| [95% Conf.03193 0.1976296 -.2624 0.37 -0.01832 0.edu/~statmath 63 .88281 -2.0702531 .1091 0.0001 0.1729671 -1.169 -.09265 -0.17297 0.1539291 .14914 0.17564 0.0155 0.0903829 .0156096 -12.© 2005-2009 The Trustees of Indiana University (9/16/2009) g2 g3 g4 g5 g6 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 output fuel load RESTRICT RESTRICT 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 0.78 0.0185513 t7 | -.30047 0.62 1.14 7.33 0.29 0.0364849 .0001 0.0373085 -2.63 0.08115 0.1052688 t10 | .0319336 -2.6360447 t13 | .00 .0795 0.0187 0.0472205 . * Probability computed using beta distribution.016 -. execute the following command to get the same result.0131248 t8 | -.41 -2.01 1.0290822 1.19187 0.13 0.030017 .2733 0.82 0.314 -.31911 0.48 -1.3061 0. In Stata.019 -.0389685 1.262 -.5050039 t12 | .68 -12.1833501 -1.0975 <.72 -1.0183163 7.0908 0.15393 -0.96 2.0926504 .14 0.000 -.39 1.0012 1. Err.7570026 .41 0.95 0.1080904 . cnsreg cost g1-g6 t1-t15 output fuel load.37402 -0.0370916 t1 | -.1290038 .68 0.41 -1.091 -.10809 -0.6426576 .48 0.220624 -.0108278 .13425 -0.82 0.33 -2.29 -1.1283264 .04161 0.074 -.3193228 .14749 0.27669 -0. Std.097 -.20731 0.1671184 -.1708121 g5 | -.03185 0.04596 -0.31932 -0.06549 -0.95 -1.01 0. Interval] -------------+---------------------------------------------------------------g1 | .097693 .82 1.78 -2.6327752 t14 | . 67) Prob > F Root MSE = = = = 90 1960.09173 0.055 -.04547E-11 .000 .2017 0.16 25.51 -1.indiana.1342526 .0000* .273 -.1894671 .62 0.0448591 -2.18947 0.0089536 t2 | -.0768646 .3013791 .3740245 .13 1.16348 0.28547 0.08644 0.5962E-16 -2.019 -.3598E-16 Linear Regression Models for Panel Data: 63 1.04486 0.1756365 1.0122867 .66 1.2537092 t11 | .5682837 .16861 -0.02908 0.0001 <.191872 -1.02045 0.6907554 . 
0.0188 0.0864404 -1.1536212 1.10 -1.3264649 .0654947 .0416069 -1.63 1.109 -.18609 0.16603 0.061552 .0061606 .81725 0.10 0. Notice that constraints 1 and 3 were defined above.03897 0.2201679 g2 | .0204506 -1.03731 0. . 0.3143 0.22304 -0.0514 ( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0 ( 2) t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0 -----------------------------------------------------------------------------cost | Coef.2854727 .0650993 .0521097 t3 | -.07686 -0.0811525 1.18335 0.0186066 t6 | -.007 .0207326 .51 0.1360 0.41 0.079 -.0554 0.1691 0.136 -.0200869 t9 | .72 0.0460126 2.39 0.1432761 g3 | -.2230399 .26174 4. constraint(1 3) Constrained linear regression Number of obs F( 22.30138 0.6070978 http://www.0459561 .1660294 1.0546 0.01561 0.15362 0.04722 0.1860877 -1.

indiana.24 0.36561 1.000 8.0279512 29.4537565 w_load | -.000 .T13.002032745 -------------+-----------------------------Total | 2.gm_load . b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)+b(22)=0$ 6.16861 .1165364 .gm_output .8808235 fuel | .04509 -----------------------------------------------------------------------------w_cost | Coef. .T14.2297*sqrt(87/67).T3.831 load | 90 .163478 1. standard errors.edu/~statmath 64 .7536739 .8728048 w_fuel | .339349 -.2617373 -3.1474883 2.0000 0.tm_fuel + m_fuel load .3603843 _cons | 12.G3.150606 -3.426279 ------------------------------------------------------------------------------ Remember that F. Err.6608616 fuel | 90 12.9139 0.8123749 11. . means. sum cost output fuel load Variable | Obs Mean Std. .87739643 3 .5604602 .Lhs=COST. Standard errors need to be adjusted.77036 .T15. Notice that two restrictions in Cls: are separated by a comma.G4.2296907 -3.7616927 . gen gen gen gen w_cost = w_output w_fuel = w_load = cost .T10.T8.081068 6.03 0. Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0.OUTPUT.3733 output | 90 -1.86 0.174309 1.243 -.001 -1. t P>|t| [95% Conf.gm_cost . 87) Prob > F R-squared Adj R-squared Root MSE = = = = = = 90 307.031851 25.625798811 Residual | . .T1. for instance. the following command returns the same result (output is skipped).0527934 .FUEL. run the OLS with the transformed variables. yit  yit  yi   yt  y and xit  xit  xi   xt  x .4949135 load | -. We need to compute overall means and group specific. .84 0.16861 .3191137 .T7.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 64 t15 | .2617=.T4. The dummy variable coefficients are computed as di*  ( yi   y )  ( xi   x )'  and dt*  ( yt  y )  ( xt  x )'  . . and DFerror are not correct.513054 16.1576935 .37 0. Std. 
noc Source | SS df MS -------------+-----------------------------Model | 1.tm_cost + m_cost = output .G1.82071 ------------------------------------------------------------------------------ In LIMDEP.432066 .LOAD. regress w_cost w_output w_fuel w_load.000 -1.8828142 .034 .66 0.176848774 87 .000 .T2.131971 11.T5.8172487 . R2.55017 13.18 0.T12. Interval] -------------+---------------------------------------------------------------w_output | . REGRESS. Dev.0247259 . say airline 3. Do not forget to suppress the intercept.306 -.T9.6135015 output | .8172487 .8828142 .14154 15.16 0. the standard error of the load factor is .05424521 90 .1434621 1.9109 .tm_output + m_output fuel .676287 http://www.022824947 Number of obs F( 3.66688 2.7 Two-way Within Effect Model The two-way fixed effect model requires a transformation of dependent and independent * * variables using group means. Rhs=One.G2.278573 .gm_fuel .T11.tm_load + m_load Once data are transformed.G6.T6. Min Max -------------+-------------------------------------------------------cost | 90 13.G5.405244 -.09 0.

654256 The actual (absolute) intercept of airline 3 is -.6169364 fuel | 15 12.7704)*(.0670-(-1.airline.7897-12. The actual intercept of time period 9 is .8172) -(12. PROC SORT DATA=masil.1743))*(.6179098 .067003 1.56479 13.9122625 .654256 6.0212523 12. .20495 14.0324437 .edu/~statmath 65 .524334 . Min Max -------------+-------------------------------------------------------cost | 6 13.89337 load | 6 .0376737 .4651 1.1686). sum cost output fuel load if year==9 Variable | Obs Mean Std.546723 .6 to cross-check the computation. The PANEL Procedure Fixed Two Way Estimates Dependent Variable: cost Model Description Estimation Method Number of Cross Sections Time Series Length FixTwo 6 15 Fit Statistics SSE MSE R-Square 0.6179-. MODEL cost = output fuel load /FIXTWO.1895 =(13. See the SAS output in Section 6.673258 .861012.8828).(. BY airline year.9984 DFE Root MSE 67 0.3656)-(-.5845-.0472=(13.8177211 11.5845359 .78597 output | 6 -1.37231 .indiana.3656)-(-1.83356 12. ID airline year. Dev.86104 . sum cost output fuel load if airline==3 Linear Regression Models for Panel Data: 65 Variable | Obs Mean Std.4651-13.4779284 fuel | 6 12. The data set needs to be sorted by the group and time variables that will be declared in the ID statement in PROC PANEL. PROC PANEL DATA=masil.8828).278931 -2. RUN.831 load | 15 .7704)*(.9123-(1.6851 13.1743))*(.3723-13.78972 .2435335 -1.8172) -(12.0514 http://www.airline.5220657 12.042032 12.© 2005-2009 The Trustees of Indiana University (9/16/2009) .(.99694 output | 15 -.337794 -.1768 0. Dev.1686).5605)*(-.8 Using SAS: PROC TSCSREG and PROC PANEL PROC TSCSREG and PROC PANEL have the /FIXTWO option to fit the two-way fixed effect model. Min Max -------------+-------------------------------------------------------cost | 15 13.0026 0.5605)*(-.

0441 0.81 -1.33985 -0.1348 0.01865 12.39598 -0.817249 0.83 25.10 Pr > F <.3378 0.4272 -0.04669 -0.174283 0.0305 2.69314 -0.43 -2.27 -2.111451 -0.3189 0.0519 0.0454 0.0415 0.50 -0.0255 0.0008 0.edu/~statmath 66 .63844 -0.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 66 F Test for No Fixed Effects Num DF 19 Den DF 67 F Value 23.02 -2.0938 0.94004 0.77 5.26 -2.1884 0.01773 -0.0319 0.0763 0.02 1.0363 0.0319 0.0588 0.61 5.0750 0.3294 0.05 -1.0001 Parameter Estimates Standard Error 0.03364 -0.70 -2.14351 0.0266 0.98 -3.92 -1.0001 0.61 -2.5958 -0.0268 0.1118 -0.5432 <.78 -0.47304 -0.0780 0.16861 -0.0001 <.0861 0.22739 -0.180209 -0.04 -2.27189 -0.28 -2.1501 0.37 Pr > |t| 0.indiana.1575 0.1635 0.0225 0.0040 0.0001 0.3321 0.3061 0.4357 0.66 1.0012 Label Cross Sectional Effect 1 Cross Sectional Effect 2 Cross Sectional Effect 3 Cross Sectional Effect 4 Cross Sectional Effect 5 Time Series Effect 1 Time Series Effect 2 Time Series Effect 3 Time Series Effect 4 Time Series Effect 5 Time Series Effect 6 Time Series Effect 7 Time Series Effect 8 Time Series Effect 9 Time Series Effect 10 Time Series Effect 11 Time Series Effect 12 Time Series Effect 13 Time Series Effect 14 Intercept 6.0470 0.0073 <.2617 Variable CS1 CS2 CS3 CS4 CS5 TS1 TS2 TS3 TS4 TS5 TS6 TS7 TS8 TS9 TS10 TS11 TS12 TS13 TS14 Intercept output fuel load DF 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Estimate 0.88281 t Value 2.0477 0.03 -3.0429 0.08 -2.2182 0.54215 -0.49 -0.1733 0.0321 0.6263 0.9 Using Stata and LIMDEP http://www.2319 0.

8033319 -.544076 17.1306712 sigma_e | .26 0. This command has Str and Period to specify stratification and time variables.001 -1.3959783 . 67) = Prob > F = 69.081).004 -. http://www.1118032 .0186451 .4949135 load | -.6394596 -.2273857 .5421537 .059 -1.0944011 t5 | -. Err.05137639 rho | . fe i(airline) Fixed-effects (within) regression Group variable: airline R-sq: within = 0.18844 -2.8808235 fuel | .626 -.edu/~statmath 67 . 67) = 69.0429008 -0.986 2.0319005 -3.031851 25.02 0.3398463 .70 0. The pooled OLS and fixed group effect parts of the entire output is skipped below since they are redundant.001 -.301271 .9859 overall = 0.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 67 The Stata .4272042 .9955 between = 0.2319459 -2.3294473 -1.025 -. quietly regress cost g1-g5 t1-t14 output fuel load .61 0.405244 -.1192713 .9360088 -.04 0.indiana.0000 The F statistic of 69.67) Prob > F = = corr(u_i.1501062 -2.28 0.033641 .027 -.436 -.3378385 -2.05 0.1576935 .5958031 .044 -1.045 -.027 -. . However.0617764 t4 | -.030508 -0. but reports the incorrect intercept in the two-way fixed model.5409901 -.178708 .0000 The following LIMDEP command fits the two-way fixed model.8172487 .16861 .000 8.2617373 -3.0243983 t3 | -.3361 -----------------------------------------------------------------------------cost | Coef. this command is able to fit the two-way fixed effect model by including a set of dummies for a group (LSDV1) and using the fe option.78 0. test g1=g2=g3=g4=g5=0 ( ( ( ( ( 1) 2) 3) 4) 5) g1 g1 g1 g1 g1 F( = g2 g3 g4 g5 0 = = = = 0 0 0 0 5.306 -.1732969 -2.0901007 .66 0.37978 -.84 0. Interval] -------------+---------------------------------------------------------------t1 | -.7418804 -.0510764 t7 | -.175477 -.0546315 t14 | -.05 Prob > F = 0. 12.0481295 t12 | -.03 0.543 -.3320802 -1.0100769 t6 | -. 
This command presents the pooled model and one-way group effect model as well.225402 5.0763495 -2.42792 -------------+---------------------------------------------------------------sigma_u | .05 0. Xb) = 0.xtreg command does not have an option for two-way fixed or two-way random effect models.6931382 .98 0.0362554 -0.367467 -.86611203 (fraction of variance due to u_i) -----------------------------------------------------------------------------F test that all u_i=0: F(5.92 0.9885 Number of obs Number of groups = = 90 6 15 15.0027964 t10 | -.0500762 t8 | -.2718933 .0177346 .000 .0795393 .05 tests only if parameters of g1 through g5 are all zero.81 0.1348175 -2.048 -.24 0.042249 output | . Std.0 15 873.3603843 _cons | 12.37 0.163478 1.253383 . .0749914 t11 | -. xtreg cost t1-t14 output fuel load.8828142 . t P>|t| [95% Conf.0000 Obs per group: min = avg = max = F(17.0519893 t13 | -.075 -1.49 0.6384366 .27 0.667 (2.7536739 .0188098 t2 | -.3189139 -1.094 -1.50 0. You may doublecheck this test by running the following commands.4730429 .040233 t9 | -.

256 3 .f. P value | |(2) vs (1) 95.947 20 67 .Panel.Period=YEAR.6548513 | |(3) X . Autocorrelation of e(i. = -5.3581 | | Chi-sq [ 22] (prob) = 582.133 14 67 .0011 .1768479062D+00 .7703592 LOAD | -.76991 .   T 1  0 .00 | | Panel: Prds: Empty 0.818 8 81 .74790 .00004 3.17430918 FUEL | ..35814 .889 8 .00000 | |(4) vs (2) 441.9882897 | |(4) X and group effects 130.2926207777D+00 .t) .087 . Largest 15 | | Average group size 15.OUTPUT.9979401 | | Model test F[ 22.3936109461D+02 .00000 | |(4) vs (1) 536.. LogAmemiya Prd.633 5 ..659 .9974341 | |(5) X ind.00085 | |(5) vs (3) 181.00000 3935.Fixed$ +----------------------------------------------------+ | Least Squares with Group and Period Effects | | Ordinary least squares regression | | Model was estimated Aug 27.00000 | |(4) vs (3) 136.323 14 . F num.5137627E-01 | | Fit R-squared = .16863516 .6665675 2. Largest 6 | | Average group size 6.9984493 | | Adjusted R-squared = .Str=AIRLINE.26173663 -3.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 68 REGRESS.709580 | | Akaike Info.00000 57.00000 2419. Valid data 15 | | Smallest 0.88281516 .131971 | | WTS=none Number of observs.36561 | | Standard deviation = 1.149 3 .indiana.00000 31.00000 21. Criter.733 5 81 .0000) | | Info criter.832 3 81 .1335449522D+01 .00000 | +--------------------------------------------------------------------+ 6. Prob. 2009 at 04:27:40PM | | LHS=COST Mean = 13.10 Testing Two-way Fixed Effects The null hypothesis is that parameters of group and time dummies are zero: H 0 : 1  .651825 | +----------------------------------------------------+ +----------------------------------------------------+ | Panel:Groups Empty 0.956 20 .373 . Valid data 6 | | Smallest 15.56046016 Constant| 12.08107166 6.FUEL.00000 3604.00000 | |(5) vs (4) 45.0000 -1.03185102 25.LOAD.81725242 . = -5.875 5 84 .740 5 .3052 12.Lhs=COST. 
Crt.edu/~statmath 68 .0000000 | |(2) Group effects only -90.16347826 1.1768479 | | Standard error of e = .variables only 61. denom.83 (.0000 +--------------------------------------------------------------------+ | Test Statistics for the Classical Model | +--------------------------------------------------------------------+ | Model Log-Likelihood Sum of Squares R-squared | |(1) Constant term only -138.21 (. = 90 | | Model size Parameters = 23 | | Degrees of freedom = 67 | | Residuals Sum of squares = .9984493 | +--------------------------------------------------------------------+ | Hypothesis Tests | | Likelihood Ratio Test F Tests | | Chi-squared d.721164 | | Estd. The F test compares the pooled regression and http://www.Rhs=ONE.&time effects 152..08647 .48804 .032 .329 3 86 .1140409821D+03 .00000 | |(3) vs (1) 400.00 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .0000) | | Diagnostic Log likelihood = 152. 67] (prob) =1960.7479 | | Restricted(b=0) = -138.   n 1  0 and  1  .

two-way fixed group and time effect model. The F statistic of 23.1085 rejects the null hypothesis at the .01 significance level (p<.0001).

$$F = \frac{(1.3354 - .1768)/(6 + 15 - 2)}{(.1768)/(6 \times 15 - 6 - 15 - 3 + 1)} \sim 23.1085\,[19, 67]$$

The SAS TSCSREG and PANEL procedures conduct this F-test for the group and time effects. You may also run the following SAS REG procedure and Stata .regress command to perform the same test. The Stata output is skipped.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t14 output fuel load;
   TEST g1=g2=g3=g4=g5=t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

       Test 1 Results for Dependent Variable cost

                                Mean
Source             DF         Square    F Value    Pr > F
Numerator          19        0.06098      23.10    <.0001
Denominator        67        0.00264

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1 g2 g3 g4 g5 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14
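The F statistic for the two-way fixed effects test can be reproduced by hand from the two sums of squared errors quoted in the text. The following is a minimal Python sketch of that arithmetic, not part of the SAS or Stata workflow; the function name `f_test_nested` and the variable names are illustrative assumptions, and the SSE values are the rounded figures used in the formula (1.3354 for the pooled model, .1768 for the two-way fixed effect model).

```python
def f_test_nested(sse_restricted, sse_unrestricted, df_num, df_den):
    """F statistic comparing a restricted model with an unrestricted one.

    Hypothetical helper for illustration: numerator is the SSE reduction per
    restriction, denominator is the unrestricted SSE per residual df.
    """
    return ((sse_restricted - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)


n, T, k = 6, 15, 3           # airlines, years, regressors (output, fuel, load)
sse_pooled = 1.3354          # pooled OLS SSE, rounded as in the text
sse_two_way = 0.1768         # two-way fixed effect SSE, rounded as in the text

df_num = (n - 1) + (T - 1)        # 5 group dummies + 14 time dummies = 19
df_den = n * T - n - T - k + 1    # 90 - 6 - 15 - 3 + 1 = 67

f_stat = f_test_nested(sse_pooled, sse_two_way, df_num, df_den)
print(df_num, df_den, round(f_stat, 4))
```

With unrounded SSEs the statistic differs slightly in the second decimal, which is why the SAS output reports 23.10.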

7. Random Effect Models

A random effect model examines how group and/or time affect error variances. This model is appropriate for n individuals who were drawn randomly from a large population. This chapter focuses on the feasible generalized least squares (FGLS) with variance component estimation methods.10

7.1 One-way Random Group Effect Model

When the omega matrix is not known, you have to estimate $\theta$ using the SSEs of the between group effect model (.0317) and the fixed group effect model (.2926).

The variance component of error $\hat{\sigma}_v^2$ is .00361263 = .292622872/(6*15-6-3).
The variance component of group $\hat{\sigma}_u^2$ is .01559712 = .031675926/(6-4) - .00361263/15.
Thus, $\hat{\theta}$ is .87668488:

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.00361263}{15 \times .031675926/(6-4)}}$$

where $\hat{\sigma}_{between}^2 = SSE_{between}/(n-K) = .031675926/(6-4) = .01583796$.

Next, transform the dependent and independent variables including the intercept using $\hat{\theta}$.

. gen rg_cost = cost - .87668488*gm_cost
. gen rg_output = output - .87668488*gm_output
. gen rg_fuel = fuel - .87668488*gm_fuel
. gen rg_load = load - .87668488*gm_load
. gen rg_int = 1 - .87668488 // for the intercept

Finally, run the OLS with the transformed variables. Do not forget to suppress the intercept. This is the groupwise heteroscedastic regression model (Greene 2003).

. regress rg_cost rg_int rg_output rg_fuel rg_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    86) =19642.72
       Model |  284.670313     4  71.1675783           Prob > F      =  0.0000
    Residual |  .311586777    86  .003623102           R-squared     =  0.9989
-------------+------------------------------           Adj R-squared =  0.9989
       Total |    284.9819    90  3.16646556           Root MSE      =  .06019

------------------------------------------------------------------------------
     rg_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      rg_int |   9.627911   .2101638    45.81   0.000     9.210119     10.0457

10 Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a modified Wallace and Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora method, and Henderson's method III. They also discuss maximum likelihood (ML) estimators, restricted ML estimators, minimum norm quadratic unbiased estimators (MINQUE), and minimum variance quadratic unbiased estimators (MIVQUE). Based on a Monte Carlo simulation, they argue that ANOVA estimators are Best Quadratic Unbiased estimators of the variance components for the balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are recommended for the unbalanced models.

   rg_output |   .9066808   .0256249    35.38   0.000     .8557401    .9576215
     rg_fuel |   .4227784   .0140248    30.15   0.000      .394898    .4506587
     rg_load |    -1.0645   .2000703    -5.32   0.000    -1.462226   -.6667731
------------------------------------------------------------------------------

7.2 Estimations in SAS, Stata, and LIMDEP

In SAS, the TSCSREG and PANEL procedures have the /RANONE option to fit the one-way random effect model. These procedures by default use the Fuller and Battese (1974) estimation method, which produces slightly different estimates from FGLS. PROC PANEL has the /VCOMP=WK option for the Wansbeek and Kapteyn (1989) method, which is the groupwise heteroscedastic regression. Unlike PROC PANEL, PROC TSCSREG does not have VCOMP= to specify the type of variance component estimation. The BP option of the MODEL statement, not available in PROC TSCSREG, conducts the Breusch-Pagan LM test for random effects.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANONE BP VCOMP=WK;
RUN;

                     The PANEL Procedure
      Wansbeek and Kapteyn Variance Components (RanOne)

Dependent Variable: cost

                      Model Description
          Estimation Method                 RanOne
          Number of Cross Sections               6
          Time Series Length                    15

                       Fit Statistics
          SSE          0.3111    DFE              86
          MSE          0.0036    Root MSE     0.0601
          R-Square     0.9923

                Variance Component Estimates
          Variance Component for Cross Sections    0.016015
          Variance Component for Error             0.003613

               Hausman Test for Random Effects
                 DF    m Value    Pr > m
                  2       1.63    0.4429

       Breusch Pagan Test for Random Effects (One Way)
                 DF    m Value    Pr > m
                  1     334.85    <.0001

                     Parameter Estimates
                                Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1    9.629513      0.2107      45.71      <.0001
output        1    0.906918      0.0257      35.30      <.0001
fuel          1    0.422676      0.0140      30.11      <.0001
load          1    -1.06452      0.2000      -5.32      <.0001

PROC PANEL and PROC TSCSREG estimate the same variance component for error (.0036) but a different variance component for groups (.0160 versus .4744).

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANONE;
RUN;

(output is skipped)

Notice that there are some differences in the output of PROC TSCSREG (variance component estimates and Hausman test) between SAS 9.2 and 9.13.

Alternatively, you may use PROC MIXED to get the same results. Unlike SAS 9.13, SAS 9.2 requires the CLASS statement to explicitly specify an effect variable, airline in this case. The following script returns a set of random effect estimates.

PROC MIXED DATA=masil.airline;
   CLASS airline;
   MODEL cost = output fuel load /SOLUTION;
   RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

                     The Mixed Procedure

               Covariance Parameter Estimates
          Cov Parm     Subject    Estimate
          UN(1,1)      airline     0.01674
          Residual                0.003609

                       Fit Statistics
          -2 Res Log Likelihood           -210.4
          AIC (smaller is better)         -206.4
          AICC (smaller is better)        -206.3

2116 0.05581 0.55 -3.05 -5.9876 Number of obs Number of groups = = 90 6 15 15.05507 0.4225 -1.0003 0.0001 In Stata.9925 between = 0.33 Obs per group: min = avg = max = Wald chi2(3) = Random effects u_i ~ Gaussian http://www.15 -0.49 Pr > ChiSq <.06349 Effect Intercept Intercept Intercept Intercept Intercept Intercept airline 1 2 3 4 5 6 Estimate 0.0001 <. iis airline .2106 0.02581 0.3247 Type 3 Tests of Fixed Effects Num DF 1 1 1 Den DF 81 81 81 Effect output fuel load F Value 1235.03 28.5818 0.0001 <.0001 <.0001 Solution for Random Effects Std Err Pred 0.0001 <. re theta Random-effects GLS regression Group variable: airline R-sq: within = 0. .indiana.8784 0. The theta option reports an estimated theta (.06239 0.1998 Effect Intercept output fuel load Estimate 9. xtreg cost output fuel load.06594 0.40 Pr > F <.03450 -0.9616 0.9856 overall = 0.edu/~statmath 73 . Let us specify airline as a panel identification variable using the .99 Pr > |t| 0.0001 Solution for Fixed Effects Standard Error 0.© 2005-2009 The Trustees of Indiana University (9/16/2009) BIC (smaller is better) Linear Regression Models for Panel Data: 73 -206.xtreg command has the re option to produce FGLS estimates.03 0.01406 0.0 15 11091.82 3.33 Pr > |t| <.8 Null Model Likelihood Ratio Test DF 1 Chi-Square 107.06291 DF 81 81 81 81 81 81 t Value 0.05 0.0001 <.8767).01012 -0.0646 DF 5 81 81 81 t Value 45.06180 0.0033 0.16 30.002981 0.88 903.iis command.1691 0.6322 0.9073 0. the .53 35.

corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
theta              = .87668503
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9066805    .025625    35.38   0.000     .8564565    .9569045
        fuel |   .4227784   .0140248    30.15   0.000     .3952904    .4502665
        load |  -1.064499   .2000703    -5.32   0.000    -1.456629    -.672368
       _cons |   9.627909    .210164    45.81   0.000     9.215995    10.03982
-------------+----------------------------------------------------------------
     sigma_u |  .12488859
     sigma_e |  .06010514
         rho |  .81193816   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The sigma_u and sigma_e are square roots of the variance components for groups and errors (.0156=.1249^2, .0036=.0601^2).

Alternatively, .xtmixed fits the same model, the random-intercept model. The || airline: option tells Stata to fit the model using the subject variable airline. Variance components for groups and errors are reported as standard deviations under the labels sd(_cons) and sd(Residual).

. xtmixed cost output fuel load || airline:,

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log restricted-likelihood =  105.20458
Iteration 1:   log restricted-likelihood =  105.20458
Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        90
Group variable: airline                         Number of groups   =         6
                                                Obs per group: min =        15
                                                               avg =      15.0
                                                               max =        15
                                                Wald chi2(3)       =  11114.85
Log restricted-likelihood =  105.20458          Prob > chi2        =    0.0000
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9073166    .025809    35.16   0.000      .856732    .9579013
        fuel |   .4225032   .0140598    30.05   0.000     .3949465      .45006
        load |  -1.064572   .1997763    -5.33   0.000    -1.456126   -.6730179
       _cons |   9.632212    .211559    45.53   0.000     9.217564    10.04686
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
airline: Identity            |
                   sd(_cons) |   .1293723   .0429029      .0675403    .2478107
-----------------------------+------------------------------------------------
                sd(Residual) |   .0600715   .0047138       .051508    .0700588
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   107.49 Prob >= chibar2 = 0.0000
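The link between the reported sigma_u, sigma_e, theta, and rho can be checked by hand. The following Python sketch assumes the usual balanced-panel formulas, theta = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)) and rho = sigma_u^2 / (sigma_u^2 + sigma_e^2); the function names are illustrative and are not part of Stata.

```python
import math


def re_theta(sigma_u, sigma_e, n_periods):
    """Quasi-demeaning weight of the random effect FGLS (balanced panel assumed)."""
    var_u = sigma_u ** 2
    var_e = sigma_e ** 2
    return 1.0 - math.sqrt(var_e / (n_periods * var_u + var_e))


def re_rho(sigma_u, sigma_e):
    """Fraction of the composite error variance that is due to the group effect."""
    var_u = sigma_u ** 2
    var_e = sigma_e ** 2
    return var_u / (var_u + var_e)


# Values copied from the .xtreg, re theta output (6 airlines, 15 years each).
SIGMA_U = 0.12488859
SIGMA_E = 0.06010514
N_PERIODS = 15

theta = re_theta(SIGMA_U, SIGMA_E, N_PERIODS)
rho = re_rho(SIGMA_U, SIGMA_E)
print("theta =", round(theta, 6), "rho =", round(rho, 6))
```

Both values should agree with the theta and rho lines of the .xtreg output up to rounding.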

003494 Fit Statistics -2 Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -229.8281 0.72 31.04 Pr > |t| 0.02466 0. PROC PANEL and TSCSREG do not have such option.1962 Effect Intercept output fuel load Estimate 9. CLASS airline.01302 0.airline METHOD=ML.edu/~statmath 75 . In SAS.47 36.6186 0.05640 0. MODEL cost = output fuel load /SOLUTION.0012 0.4 -218.57 -4.05 -5.0001 Solution for Random Effects Std Err Pred 0.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 75 You may use the maximum likelihood estimation to fit random effect (or random intercept) model.0001 <.05580 0. RUN.9892 0.0001 <.0001 <.04900 0.7 Null Model Likelihood Ratio Test DF 1 Chi-Square 105.1676 0.03211 -0.2094 0.22 -0.01364 0.2026 0. The Mixed Procedure Covariance Parameter Estimates Cov Parm UN(1.indiana.4234 -1.5 -216.27 3.000761 0.0001 Solution for Fixed Effects Standard Error 0.92 Pr > ChiSq <.37 0.5707 <.5 -217.1) Residual Subject airline Estimate 0. add METHOD=ML to PROC MIXED. RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION. PROC MIXED DATA=masil.0645 DF 5 81 81 81 t Value 47.05994 0.0001 0.05750 Effect Intercept Intercept Intercept Intercept Intercept Intercept airline 1 2 3 4 5 6 Estimate 0.2992 http://www.9053 0.01306 -0.42 Pr > |t| <.06008 DF 81 81 81 81 81 81 t Value 0.04976 0.01 1.

196231 -5.Lhs=COST.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 76 Type 3 Tests of Fixed Effects Num DF 1 1 1 Den DF 81 81 81 Effect output fuel load F Value 1348. 2009 at 08:26:15PM | | LHS=COST Mean = 13.Rhs=ONE.1140843 . Compare the output of PROC MIXED above and .OUTPUT. Interval] -------------+---------------------------------------------------------------output | .FUEL.36561 | | Standard deviation = 1.064456 .0119).0001 In Stata.Het=AIRLINE. = 90 | | Model size Parameters = 4 | | Degrees of freedom = 86 | | Residuals Sum of squares = 1.2064687 /sigma_e | .0045701 .LOAD.8555741 .Panel. you have to specify Panel. .6798506 _cons | 9. and Het= subcommands for the groupwise heteroscedastic model.618648 .4505957 load | -1.449062 -. xtreg cost output fuel load.xtreg below. the mle option is used in . Random Effect.9344669 -----------------------------------------------------------------------------Likelihood-ratio test of sigma_u=0: chibar2(01)= 105.0591^2. i(airline) panels(hetero) corr(independent) (output is skipped) In LIMDEP.0035 = . LIMDEP estimates a slightly different variance component for groups (. REGRESS.000 .0 15 436.1047419 .0253759 35. Notice that error variance components are computed as .0630373 .0507956 .88 29.000 .43 Pr > F <.000 -1.0591072 .indiana.000 .42 0.5365302 .013888 30.xtreg and .72896 -----------------------------------------------------------------------------cost | Coef.48 0.7883772 .1246133 | http://www.0687787 rho | .55 0.Random Effect$ +----------------------------------------------------+ | OLS Without Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 30.4233757 .9053099 .xtgls that fits panel data models with heteroscedasticity across and within groups. Err.edu/~statmath 76 .68 0. Std.xtmixed commands to produce the same result.3961557 .9550458 fuel | . You may also try .0345293 . 
thus producing different parameter estimates.Str=AIRLINE.0001 <.19 963.000 9. xtgls cost output fuel load.02362 -------------+---------------------------------------------------------------/sigma_u | . xtmixed cost output fuel load || airline:.0130=1141^2 and . re mle Random-effects ML regression Group variable: airline Random effects u_i ~ Gaussian Number of obs Number of groups = = 90 6 15 15. mle (output is skipped) .335450 | | Standard error of e = .0000 Obs per group: min = avg = max = LR chi2(3) Prob > chi2 = = Log likelihood = 114.92 Prob>=chibar2 = 0.0001 <.131971 | | WTS=none Number of observs.206622 46.32 0. z P>|z| [95% Conf.213677 10.

26 (.indiana.3611 84.t).01374650 30.28136 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .468584 | | Total 114.34530293 -4.226263)*tm_output rt_fuel = fuel .4) rt_cost = cost . Model (3) = 334.85 | | ( 1 df. ˆ The variance component for error  v2 is . Free.17430918 FUEL | .987042D+00 | +--------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |b/St. 86] (prob) =2419.56046016 Constant| 9.767356 | | Lagrange Multiplier Test vs.t) = e(i.121594 | | Akaike Info.713 .45397771 . Crt.341 .119159D-01 | | Corr[v(i.v(i.(-1.20277404 47.62750780 .edu/~statmath 77 .0882).01325455 66.6799 5.599 .7703592 LOAD | -1. prob value = .359 .000000) | | (High values of LM favor FEM/REM over CR model. Largest 15 | | Average group size 15. .06455866 .08819022/(15*6-15-3) ˆ The variance component for time  u2 is -.226263)*tm_load rt_int = 1 .00 | +----------------------------------------------------+ +--------------------------------------------------+ | Random Effects Model: v(i.|P[|Z|>z]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .0000 7.76991 | | Restricted(b=0) = -138.041 89. . LogAmemiya Prd.s)] = .226263) // for the intercept http://www.19933132 -5.0000) | | Info criter.147779D+01 | | R-squared .© 2005-2009 The Trustees of Indiana University (9/16/2009) | Fit R-squared = .(-1.361260D-02 | | Var[u] = . .22924522 41.t) + u(i) | | Estimates: Var[e] = .396 .17430918 FUEL | .0000 -1.85 | | Sum of Squares ..88273863 . 
gen gen gen gen gen ˆ  v2 .(-1.121653 | +----------------------------------------------------+ Linear Regression Models for Panel Data: 77 +----------------------------------------------------+ | Panel Data Analysis of COST [ONE way] | | Unconditional ANOVA (No regressors) | | Source Variation Deg. Valid data 6 | | Smallest 15.514 .42389869 .01511375/6 ˆ The  is .7703592 LOAD | -1.01511375 1 2 ˆ n between 6 * . Mean Square | | Between 74. Criter.3581 | | Chi-sq [ 3] (prob) = 400.005590631/(15-4).0000 12.0000 . .) | | Baltagi-Li form of LM Statistic = 334.Er.61063438 .005590631/(15 .3 One-way Random Time Effect Model ˆ Let us compute  using the SSEs of the between time effect model (.0056) and the fixed time effect model (1.02461548 36.0000 +----------------------------------------------------+ | Panel:Groups Empty 0.00201072 =.90412380 .01511375 = 1.0000 -1. = -4.226263  1  .0000 12.226263)*tm_fuel rt_load = load .837 .1.02030424 22. 1. .56046016 Constant| 9. 14. = -4.33 (.(-1.9882897 | | Adjusted R-squared = .(-1.51691223 .730 .0000) | | Diagnostic Log likelihood = 61.9360 | | Residual 39.0000 .9878812 | | Model test F[ 3.226263)*tm_cost rt_output = output .
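The variance components and theta of the random time effect model can be reproduced from the two sums of squared errors quoted in this section. The Python sketch below only repeats that arithmetic; the variable names are illustrative, and the quasi_demean helper is a hypothetical stand-in for the .gen commands that build rt_cost, rt_output, and so on.

```python
import math

# SSEs quoted in the text (airline cost data: 6 airlines, 15 years, 3 regressors).
SSE_BETWEEN_TIME = 0.005590631   # between time effect model
SSE_FIXED_TIME = 1.08819022      # fixed time effect model
N_GROUPS = 6
N_PERIODS = 15
K_SLOPES = 3

# Error variance: SSE of the fixed time effect model over nT - T - k.
var_error = SSE_FIXED_TIME / (N_GROUPS * N_PERIODS - N_PERIODS - K_SLOPES)
# Between variance: SSE of the between time effect model over T - (k + 1).
var_between = SSE_BETWEEN_TIME / (N_PERIODS - (K_SLOPES + 1))
# Variance component for time; it turns out negative for these data.
var_time = var_between - var_error / N_GROUPS
# Quasi-demeaning weight.
theta = 1.0 - math.sqrt(var_error / (N_GROUPS * var_between))


def quasi_demean(value, time_mean, weight):
    """Transform one observation: value - weight * (mean over the same period)."""
    return value - weight * time_mean


print(round(var_error, 8), round(var_time, 8), round(theta, 6))
```

A negative variance component, as obtained here, is a warning sign that the random time effect specification does not fit these data well.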

Err.000 -1.7855982 ------------------------------------------------------------------------------ However. RUN. Std.airline.9883 DFE Root MSE 86 0.9732 90 888.0155 0. ID year airline. BY year airline.15 0. t P>|t| [95% Conf.2482869 -5.288591 Linear Regression Models for Panel Data: 78 Number of obs F( 4.edu/~statmath 78 .4649277 rt_load | -1. use the TSCSREG or PANEL procedure with the /RANONE option.000 . 0.14438 -----------------------------------------------------------------------------rt_cost | Coef.79271995 86 .0451 Residual | 1. Notice that the data are sorted by year and airline.812157 rt_output | .9168785 rt_fuel | . The /VCOMP=WH option in the MODEL statement employs Wallace and Hussian’s method to estimating variance components and produces the same parameter estimates. (Output is skipped) PROC PANEL DATA=masil. PROC SORT DATA=masil. PROC TSCSREG DATA=masil. noc Source | SS df MS -------------+-----------------------------Model | 79944.1804 4 19986.8598891 .90 0.1246 http://www. RUN.000 . 86) Prob > F R-squared Adj R-squared Root MSE = = = = = = 90 .indiana.0000 1.airline. Interval] -------------+---------------------------------------------------------------rt_int | 9.0143338 61.000 9.020845581 -------------+-----------------------------Total | 79945.4392731 .98 0.0000 .04 0. ID year airline.3354 0.4136186 .8883838 . MODEL cost = output fuel load /RANONE BP VCOMP=WH. The PANEL Procedure Wallace and Hussain Variance Components (RanOne) Dependent Variable: cost Model Description Estimation Method Number of Cross Sections Time Series Length RanOne 15 6 Fit Statistics SSE MSE R-Square 1. the negative value of the variance component for time is not likely.279176 .772754 -. regress rt_cost rt_int rt_output rt_fuel rt_load.1489281 63.220038 9.0129051 34.516098 .© 2005-2009 The Trustees of Indiana University (9/16/2009) . MODEL cost = output fuel load /RANONE.0000 1.airline. In SAS.

62751 t Value 41.0001 <.51 66.1) Residual Subject year Estimate 0 0. PROC MIXED DATA=masil. The Mixed Procedure Covariance Parameter Estimates Cov Parm UN(1. RUN.882739 0. CLASS airline.0133 0.2 http://www.36 -4.edu/~statmath 79 .9 -100.55 Pr > m 0.0023 Breusch Pagan Test for Random Effects (One Way) DF 1 m Value 1.0001 PROC MIXED fits the same random time effect model although /SOLUTION in the RANDOM statement does not work to produce random effect parameter estimates in this case.2292 0.3453 Variable Intercept output fuel load DF 1 1 1 1 Estimate 9.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 79 Variance Component Estimates Variance Component for Cross Sections Variance Component for Error 0 0.01553 Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -102.0001 <.016437 Hausman Test for Random Effects DF 2 m Value 12.516923 0.0203 0.17 Pr > m 0.9 -100.453977 -1.9 -100.2135 Parameter Estimates Standard Error 0. RANDOM INTERCEPT / SUBJECT=airline TYPE=UN.indiana.airline. MODEL cost = output fuel load /SOLUTION.0001 <.71 Pr > |t| <.60 22.

                Null Model Likelihood Ratio Test
                  DF    Chi-Square      Pr > ChiSq
                   0          0.00          1.0000

                    Solution for Fixed Effects
                            Standard
Effect        Estimate         Error      DF    t Value    Pr > |t|
Intercept       9.5169        0.2292      14      41.51      <.0001
output          0.8827       0.01325      72      66.60      <.0001
fuel            0.4540       0.02030      72      22.36      <.0001
load           -1.6275        0.3453      72      -4.71      <.0001

                   Type 3 Tests of Fixed Effects
              Num     Den
Effect         DF      DF    F Value    Pr > F
output          1      72    4435.44    <.0001
fuel            1      72     499.92    <.0001
load            1      72      22.22    <.0001

In Stata, you have to switch group and time variables using the .tsset command.

. tsset year airline
       panel variable:  year (strongly balanced)
        time variable:  airline, 1 to 6
                delta:  1 unit

. xtreg cost output fuel load, re i(year) theta

Random-effects GLS regression                   Number of obs      =        90
Group variable: year                            Number of groups   =        15

R-sq:  within  = 0.9843                         Obs per group: min =         6
       between = 0.9966                                        avg =       6.0
       overall = 0.9883                                        max =         6

Random effects u_i ~ Gaussian                   Wald chi2(3)       =   7258.03
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
theta              = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8567602    .9087169
        fuel |    .453977   .0203042    22.36   0.000     .4141815    .4937724
        load |   -1.62751    .345302    -4.71   0.000     -2.30429   -.9507309
       _cons |   9.516923   .2292445    41.51   0.000     9.067612    9.966233
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .12293801
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

You may run the following command to get the same result.

. xtmixed cost output fuel load || year:,
(output is skipped)

In LIMDEP, you need to use the Str= and Random subcommands. The output below includes only the random effect part.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Het=YEAR;Random$

+----------------------------------------------------+
| Panel:Groups   Empty       0,   Valid data      15 |
|                Smallest    6,   Largest          6 |
|                Average group size             6.00 |
+----------------------------------------------------+
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =   .151138D-01  |
|             Var[u]              =   .414686D-03  |
|             Corr[v(i,t),v(i,s)] =   .026705      |
| Lagrange Multiplier Test vs. Model (3) =    1.55 |
| ( 1 df, prob value =  .213557)                   |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic =           1.55 |
|             Sum of Squares          .133564D+01  |
|             R-squared               .988288D+00  |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |    .88285277       .01314515    67.162   .0000  -1.17430918
 FUEL    |    .45500533       .02122856    21.434   .0000   12.7703592
 LOAD    |  -1.66267268       .35084190    -4.739   .0000    .56046016
 Constant|   9.52363173       .24108843    39.503   .0000

You may find that parameter estimates of SAS, Stata, and LIMDEP are slightly different from each other.

7.4 Two-way Random Effect Model in SAS

The random group and time effect model is formulated as

$$y_{it} = \alpha + \beta' X_{ti} + u_i + \gamma_t + \varepsilon_{it}.$$

Let us first estimate the two-way FGLS using the SAS PANEL procedure with the /RANTWO option. The BP2 option conducts the Breusch-Pagan LM test for the two-way random effect model.

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANTWO;
RUN;
(Output is skipped)

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANTWO BP2;
RUN;

The PANEL Procedure
Fuller and Battese Variance Components (RanTwo)

Dependent Variable: cost

Model Description
Estimation Method                 RanTwo

Number of Cross Sections               6
Time Series Length                    15

                    Fit Statistics
SSE            0.2322    DFE                86
MSE            0.0027    Root MSE       0.0520
R-Square       0.9829

              Variance Component Estimates
Variance Component for Cross Sections    0.017439
Variance Component for Time Series       0.001081
Variance Component for Error              0.00264

           Hausman Test for Random Effects
              DF    m Value    Pr > m
               3       6.93    0.0741

   Breusch Pagan Test for Random Effects (Two Way)
              DF    m Value    Pr > m
               2     336.40    <.0001

                  Parameter Estimates
                            Standard
Variable     DF   Estimate     Error   t Value   Pr > |t|
Intercept     1   9.362677    0.2440     38.38     <.0001
output        1   0.866448    0.0255     33.98     <.0001
fuel          1   0.436163    0.0172     25.41     <.0001
load          1   -0.98053    0.2235     -4.39     <.0001

The following .xtmixed command suffers from a convergence problem in this case, and the LIMDEP command produces different results (output is skipped).

. xtmixed cost output fuel load || airline: || year:, mle

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Random Effect$

7.5 Testing Random Effect Models

The Breusch-Pagan Lagrange multiplier (LM) test is designed to test random effects. The null hypothesis of the one-way random group effect model is that individual-specific or time-series error variances are zero: $H_0: \sigma_u^2 = 0$. If the null hypothesis is not rejected, the pooled

regression model is appropriate.

The e'e of the pooled OLS is 1.33544153 and $\bar{e}'\bar{e}$ is .0665147. LM is 334.8496:

$$LM = \frac{nT}{2(T-1)}\left[\frac{T^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 = \frac{6 \times 15}{2(15-1)}\left[\frac{15^2 \times .0665}{1.3354} - 1\right]^2 = 334.8496 \sim \chi^2(1) \text{ with } p<.0000$$

With the large chi-squared of 334.8496, we reject the null hypothesis in favor of the random group effect model. The SAS PANEL procedure with the /BP option and the LIMDEP Panel and Het subcommands report the same LM statistic (see 7.2). In Stata, run the .xttest0 command right after estimating the one-way random group effect model.

. quietly xtreg cost output fuel load, re i(airline)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        cost[airline,t] = Xb + u[airline] + e[airline,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                    cost |   1.281358       1.131971
                       e |   .0036126       .0601051
                       u |   .0155972       .1248886

        Test:   Var(u) = 0
                              chi2(1) =   334.85
                          Prob > chi2 =     0.0000

The null hypothesis of the one-way random time effect is that variance components for time are zero, $H_0: \sigma_u^2 = 0$. The following LM test uses Baltagi's formula. The small chi-squared of 1.5472 does not reject the null hypothesis at the .01 level.

$$LM = \frac{Tn}{2(n-1)}\left[\frac{\sum_t (n\bar{e}_{\cdot t})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2 = \frac{15 \times 6}{2(6-1)}\left[\frac{.7817}{1.3354} - 1\right]^2 = 1.5472 \sim \chi^2(1) \text{ with } p<.2135$$

. quietly xtreg cost output fuel load, re i(year)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        cost[year,t] = Xb + u[year] + e[year,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                    cost |   1.281358       1.131971
                       e |   .0151138        .122938
                       u |          0              0

        Test:   Var(u) = 0
                              chi2(1) =     1.55
                          Prob > chi2 =     0.2135

SAS and LIMDEP return the same LM statistic (see 7.3).
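Both one-way LM statistics can be reproduced from the sums of squares quoted in this section. The Python sketch below is only a check of that arithmetic: the function names are illustrative, the inputs are the rounded figures from the text, and the chi-squared(1) tail probability uses the identity P(X > x) = erfc(sqrt(x/2)).

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lm_group(n, t, ee_pooled, ee_group_means):
    """Breusch-Pagan LM for group effects from pooled OLS residuals."""
    return n * t / (2.0 * (t - 1)) * (t ** 2 * ee_group_means / ee_pooled - 1.0) ** 2


def lm_time(n, t, ee_pooled, sum_sq_time_totals):
    """LM for time effects; sum_sq_time_totals is the sum over t of (n * mean residual)^2."""
    return t * n / (2.0 * (n - 1)) * (sum_sq_time_totals / ee_pooled - 1.0) ** 2


N_GROUPS, N_PERIODS = 6, 15

lm_u = lm_group(N_GROUPS, N_PERIODS, 1.33544153, 0.0665147)
lm_t = lm_time(N_GROUPS, N_PERIODS, 1.3354, 0.7817)

print("LM (group):", round(lm_u, 4), "p =", chi2_sf_1df(lm_u))
print("LM (time): ", round(lm_t, 4), "p =", round(chi2_sf_1df(lm_t), 4))
```

Because the inputs are rounded, the results match the reported 334.8496 and 1.5472 only to about three or four significant digits.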

The two-way random effects model has the null hypothesis that variance components for groups and time are all zero. The LM statistic with two degrees of freedom is 336.3968 = 334.8496 + 1.5472 (p<.0001).

7.6 Fixed Effects versus Random Effects

How do we compare a fixed effect model and its counterpart random effect model? The Hausman specification test examines if the individual effects are uncorrelated with the other regressors in the model. Since computation is complicated, let us conduct the test in Stata.

. tsset airline year
       panel variable:  airline (strongly balanced)
        time variable:  year, 1 to 15
                delta:  1 unit

. quietly xtreg cost output fuel load, fe
. estimates store fixed_group
. quietly xtreg cost output fuel load, re
. hausman fixed_group .

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             | fixed_group        .          Difference          S.E.
-------------+----------------------------------------------------------------
      output |    .9192846     .9066805        .0126041        .0153877
        fuel |    .4174918     .4227784       -.0052867        .0058583
        load |   -1.070396    -1.064499       -.0058974        .0255088
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        2.12
                Prob>chi2 =      0.5469
                (V_b-V_B is not positive definite)

The Hausman statistic 2.12 is different from PROC PANEL's 1.63 and Greene (2003)'s 4.16. It is because SAS, Stata, and LIMDEP use different estimation methods to produce slightly different parameter estimates. These tests, however, do not reject the null hypothesis in favor of the random effect model.

7.7 Summary

Table 7.1 summarizes random effect estimations in SAS, Stata, and LIMDEP. PROC PANEL is highly recommended.

Table 7.1 Comparison of the Random Effect Model in SAS, Stata, LIMDEP*

                      SAS 9.2              SAS 9.2              Stata 11          LIMDEP 9
Procedure/Command     PROC TSCSREG         PROC PANEL           .xtreg            Regress; Panel$
--------------------------------------------------------------------------------------------------------
One-way               /RANONE              /RANONE WK           re                Str=;Random$
Two-way               /RANTWO              /RANTWO              No                Str=;Period;Random$
SSE (e'e)             Slightly different   Correct              No                Incorrect
MSE or SEE            Slightly different   Correct              No                No
Model test (F)        No                   No                   Wald test         No
(adjusted) R2         Slightly different   Slightly different   Incorrect         Incorrect
Intercept             Slightly different   Correct              Correct           Slightly different
Coefficients          Slightly different   Correct              Correct           Slightly different
Standard errors       Slightly different   Correct              Correct           Slightly different
Variance for group    Slightly different   Correct              Correct (sigma)   Slightly different
Variance for error    Correct              Correct              Correct (sigma)   Correct
Theta                 No                   No                   theta             No
Breusch-Pagan (LM)    No                   BP, BP2              .xttest0          Yes
Hausman Test (H)      Incorrect            Yes                  .hausman          Yes (unstable)
--------------------------------------------------------------------------------------------------------
* "Yes/No" means whether a software package reports the statistic. "Correct/incorrect" indicates whether the statistics are different from those of the groupwise heteroscedastic regression.
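The p-values attached to the two-way LM statistic and to the Hausman statistics can be checked without statistical software, because the chi-squared upper tail has a closed form for small degrees of freedom. The Python sketch below is an illustration limited to df = 1, 2, or 3; the function name is hypothetical, and the inputs are the rounded statistics quoted in section 7.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch handles df = 1, 2, 3 only")


# Two-way LM statistic: sum of the one-way group and time LM statistics.
lm_two_way = 334.8496 + 1.5472
p_lm_two_way = chi2_sf(lm_two_way, 2)

# Hausman statistics with 3 degrees of freedom.
p_hausman_stata = chi2_sf(2.12, 3)   # Stata .hausman, random group effect
p_hausman_sas = chi2_sf(6.93, 3)     # PROC PANEL /RANTWO output

print(round(lm_two_way, 4), p_lm_two_way)
print(round(p_hausman_stata, 4), round(p_hausman_sas, 4))
```

Small differences from the printed p-values are expected because the test statistics are rounded to two decimals before the tail probability is computed.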

0568) .4362** (.0000) 3935.0000) 104.0704** (.2749+ (.4540** (.9829 2419.9989) .0299) . In Stata.9069** (.1686 (.9883 (. Do not forget to sort the data set in advance.2322 (.1 summarizes the results of pooled OLS.7511 (2.1246) .7432) -1.4787) .9984 (. We may ask.9544** (.0000) The poolability test examine if data are poolable so that individual entities or time periods have the same constant slopes of regressors.3111 (.0001) 1960.62 (p<. “Which model is better than the others?” Do we have to consider individual-specific or time effect? Are these effects are fixed or random? Table 8.edu/~statmath 86 . use the BY statement in PROC REG.9991 (.1635) .0255) .1 Group by Group OLS Regression In SAS. For poolability test. BY airline.1 Summary of Pooled. forvalues i= 1(1)6 { // run group by group regression display "OLS regression for group " `i' regress cost output fuel load if airline==`i' } OLS regression for group 1 http://www.0514) .) . MODEL cost = output fuel load.8820** (.9905 (.9882) .33 (p<. the panel data are not poolable.2017) -1.0513) .0152) -. If the null hypothesis is rejected.0601) 1.34 (p<.8827** (.1088) 1. Fixed Effect.9923 .0134) .0000) 439.8828** (.1769 (.0203) -5. In this case.1333** (. RUN.indiana.3342** (.3453) -1.1229) .2478) -1.0225) .8173** (.0317 (. 8.3507** (. Poolability Test Table 8.0154) .3354 (. fixed effect.0645** (.9879) .4227** (.6275** (.9979) .3641) .2000) -2.7825* (.airline.79 (p<.0050** (.4184) -.9193** (.1259) .9936 (.0095) 4074.0172) -1.8677** (. the if qualifier makes it easy to run group by group regressions.9841) . you need to run group by group OLS regressions and/or time by time OLS regressions.4424) -.0882 (.0133) .0257) . PROC REG DATA=masil.1167) .9974 (. you may consider the random coefficient model and hierarchical regression model.9805** (.12 (p<. and random effect model.0056 (.9848 .4845 (.0140) .2235) 1.0319) .5239 (4.8664** (. BY airline.82 (p<.9972) . 
and Random Effect Models Model Output Fuel Load SSE/SEE DF Pooled Between group Between time Fixed group Fixed time Two-way fixed Random group Random time Two-way random .2926 (.0520) 86 2 11 81 72 67 86 86 86 F R2 (Adj.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 86 8.4175** (.1722 (. PROC SORT DATA=masil.2617) -1.airline.0601) 1.0228) .

Interval] -------------+---------------------------------------------------------------output | 1.9353749 .40727792 14 .6105989 -1. Std. t P>|t| [95% Conf.396444 fuel | . Err.000 .18318 .001 .022869767 11 .89 Prob > F = 0.9924 .000 -3.9953 0.000 10.095226 .26428891 Residual | .000 .3847054 1.13941449 Residual | .2376522 -11.79286673 3 1.68 0.9940 0. Interval] -------------+---------------------------------------------------------------output | .49031 -----------------------------------------------------------------------------OLS regression for group 5 Source | SS df MS -------------+-----------------------------Model | 7.0272443 11.0000 = 0.244 -2.9975 = .034752343 11 .7756708 .4266329 load | -2. Err.23 0.36886 load | -2.00207907 -------------+-----------------------------Total | 3.000 7. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 3129.000 . t P>|t| [95% Conf.461629 .000 6.102488 fuel | .13 0.0456 -----------------------------------------------------------------------------cost | Coef.068956 fuel | .41824348 3 1.463129191 Number of obs F( 3.92346 -----------------------------------------------------------------------------OLS regression for group 3 Source | SS df MS -------------+-----------------------------Model | 3.36104572 Number of obs = 15 F( 3.4637263 .164608 .5353929 load | -.5613333 load | -.8157365 14 .47622084 3 2.1554418 4.9699164 1.0968946 12.007587838 11 .3661192 .2972551 36.4320951 27.10 0.67757 -----------------------------------------------------------------------------OLS regression for group 4 Source | SS df MS -------------+-----------------------------Model | 7.34501 -1. Interval] -------------+---------------------------------------------------------------output | .7513069 .0000 0.000 .000618083 -------------+-----------------------------Total | 3. 
11) Prob > F R-squared Adj R-squared Root MSE = = = = = = 15 777.22 0.003159304 -------------+-----------------------------Total | 7.000 -3.722057 10.128 -1.4250424 14 .4707826 -1.32 0.40 0.3088958 .97243 .edu/~statmath 87 .0381103 11.044347 10.811856 .08313716 3 2.37252558 3 2.65 0. Err.9940 .71 0.8985786 9.459104 .3676324 .244645886 Linear Regression Models for Panel Data: 87 Number of obs F( 3.247854 -2.846 .699815 .201716 _cons | 11.4013571 -6.50 = 0.34 0.006798918 11 .3465406 . Std.86 0.02626 -----------------------------------------------------------------------------cost | Coef.4515127 .9980 = 0.0000 0.0181946 21.272552607 Number of obs F( 3.000 11.0759266 12.9988 = 0.000 .45750853 Residual | .49 0.000 1.05621 -----------------------------------------------------------------------------cost | Coef.7682616 1.0000 http://www.0000 = 0. Err.7268305 .284597 1.47 0.02139 12.6023241 15.000689803 -------------+-----------------------------Total | 6.21 0.19174 11.48380868 14 .5926122 _cons | 8.2605148 _cons | 9.0792856 18.15874028 Residual | .© 2005-2009 The Trustees of Indiana University (9/16/2009) Source | SS df MS -------------+-----------------------------Model | 3. Interval] -------------+---------------------------------------------------------------output | 1. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 1843.724785 .3865867 .25 0.indiana.838902 10.02486 -----------------------------------------------------------------------------cost | Coef.578248 _cons | 10.85 0. Std.46 0. Std.9985 = . 11) = 1999.2489315 .000 . t P>|t| [95% Conf.50025 -----------------------------------------------------------------------------OLS regression for group 2 Source | SS df MS -------------+-----------------------------Model | 6.46 = 0. t P>|t| [95% Conf.52909128 Number of obs F( 3.63361 fuel | .68 0. 11) Prob > F R-squared Adj R-squared Root MSE = = = = = = 15 608.

3354. Interval] -------------+---------------------------------------------------------------output | .2920542 .830 -.1050328 .13544 13.012170358 + .03436 -----------------------------------------------------------------------------cost | Coef.03774 -----------------------------------------------------------------------------cost | Coef.1173565 3 3.1007 (6  1)4 ~ 40.2344839 .029913538 + .0157. Err.0068 + .154354 _cons | 10.3701678 load | .9065471 1.0434213 6. t P>|t| [95% Conf.3023258 .076299 .0321728 30.7505079 http://www.3336308 -3. Std.09612359 14 . Err.004 -1. The sum of et ' et is computed from the 15 time by time regression.66 .038151 fuel | . the SSE of the pooled OLS regression.40614 -----------------------------------------------------------------------------OLS regression for group 6 Source | SS df MS -------------+-----------------------------Model | 11.62 0.023093978 + .96 0.246051 fuel | .872309 11.012986435 11 . The F statistic is (1.044807673 + .0076 + .0229 + .063648817 + .000 .1007 = .0771255 13.077112957 + /// .015663323 11 .70578551 Residual | .481220.000 9.indiana.9442886 1.© 2005-2009 The Trustees of Indiana University (9/16/2009) Residual | .77079 .49 = 0.1007 6(15  4) The large 40.014104542 + /// .2 Poolability Test across Groups The null hypothesis of the poolability test across groups is H 0 :  ik   k .037256216 .67532 ------------------------------------------------------------------------------ 8. The e' e is 1.22 0.7430078 15.3354  .3 Poolability Test over Time The null hypothesis of the poolability test over time is H 0 :  tk   k .085430285 + . t P>|t| [95% Conf.016506613 + .0000 = 0.0308235 9.9673393 .000 .000 .206847 .4767508 0.001423938 -------------+-----------------------------Total | 11.9982 = .087240016 + .000 .9982 0.0130 + .4812 rejects the null hypothesis of poolability (p< .81 0.4095921 26.0000). Std.30 0. di .9977 . 
forvalues i= 1(1)15 { // run year by year regression display "OLS regression for year " `i' regress cost output fuel load if year==`i' } (output is skipped) .066075346 + .000469826 + .77381 . Interval] -------------+---------------------------------------------------------------output | 1.000 10.4725305 _cons | 11. The ei ' ei is .1330199 14 .3876239 load | -1.0348 + .9986 = 0. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 2602. 8.edu/~statmath 88 .1964845 .8965275 1.143348297 + .506865971 Linear Regression Models for Panel Data: 88 R-squared = Adj R-squared = Root MSE = 0.795215705 Number of obs F( 3. We conclude that the panel data are not poolable with respect to airline.73 0.049329439 + .001180585 -------------+-----------------------------Total | 7.07 0.941163 -.84 0.

The F statistic is

$$F = \frac{(1.3354 - .7505)/[(15-1)4]}{.7505/[15(6-4)]} = .4175.$$

The small F statistic does not reject the null hypothesis in favor of poolable panel data with respect to time (p<.9991).
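Both poolability F statistics in this section follow the same recipe: compare the SSE of the pooled OLS with the summed SSEs of the separate regressions. The Python sketch below repeats that arithmetic with the residual sums of squares reported in the regression outputs; the function name and argument names are illustrative, and k_params counts the intercept together with the three slopes.

```python
def poolability_f(sse_pooled, sse_separate, n_units, obs_per_unit, k_params):
    """F test that all units (groups or periods) share the same k_params coefficients."""
    df_num = (n_units - 1) * k_params
    df_den = n_units * (obs_per_unit - k_params)
    f_stat = ((sse_pooled - sse_separate) / df_num) / (sse_separate / df_den)
    return f_stat, df_num, df_den


SSE_POOLED = 1.33544153

# Residual sums of squares of the six airline-by-airline regressions (11 df each).
SSE_BY_AIRLINE = [0.034752343, 0.022869767, 0.007587838,
                  0.006798918, 0.012986435, 0.015663323]

# Sum of the residual sums of squares of the 15 year-by-year regressions.
SSE_BY_YEAR_TOTAL = 0.7505079

f_group, df1_group, df2_group = poolability_f(
    SSE_POOLED, sum(SSE_BY_AIRLINE), n_units=6, obs_per_unit=15, k_params=4)
f_time, df1_time, df2_time = poolability_f(
    SSE_POOLED, SSE_BY_YEAR_TOTAL, n_units=15, obs_per_unit=6, k_params=4)

print(round(f_group, 4), (df1_group, df2_group))
print(round(f_time, 4), (df1_time, df2_time))
```

The large statistic across airlines and the small one across years reproduce the two conclusions above: the data are not poolable with respect to airline but are poolable with respect to time.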

9. Conclusion

Panel data are analyzed to investigate group and time effects using fixed effect and random effect models. The fixed effect model asks how group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time. Slopes are assumed unchanged in both fixed effect and random effect models.

Fixed effect models are estimated by the least squares dummy variable (LSDV) regression and the within effect model. LSDV has three approaches to avoid perfect multicollinearity. LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and imposes restrictions instead. LSDV1 is commonly used since it produces correct statistics. LSDV2 provides actual parameter estimates of groups (Y-intercepts), but reports incorrect R2 and F statistic. Notice that the dummy parameters of the three LSDV approaches have different meanings and thus conduct different t-tests.

The within effect model does not use dummy variables but deviations from group means. Thus, this model is useful when there are many groups and/or time periods in the panel data set since it is able to avoid the incidental parameter problem. The dummy parameter estimates need to be computed afterward. Because of its larger degrees of freedom, the within effect model produces incorrect MSE and standard errors of parameters. As a result, you need to adjust the standard errors to conduct correct t-tests.

Random effect models are estimated by the generalized least squares (GLS) and the feasible generalized least squares (FGLS). When the variance structure is known, GLS is used. If unknown, FGLS estimates theta. Parameter estimates vary depending on estimation methods.

Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange multiplier test. The Hausman specification test compares a fixed effect model and a random effect model. If the null hypothesis of no correlation is rejected, the fixed effect model is preferred. Poolability is tested by running group by group or time by time regressions.

Among the four statistical packages addressed in this document, I would recommend SAS and Stata. In particular, PROC PANEL provides various ways of analyzing panel data and reports correct (adjusted) statistics (see Tables 4.1 and 7.1). Stata is very handy for manipulating panel data but reports incorrect F-test and R2. LIMDEP is able to estimate various panel data models but is not good at data management. SPSS is least recommended for panel data models.

If the number of groups (subjects) or time periods is extremely large, panel data models may be less useful because the null hypothesis of the F test is too strong. Then, you may consider categorizing subjects to reduce the number of groups. If data are severely unbalanced, read output with caution and consider dropping subjects with many missing data points. This document assumes that data are balanced without missing values. A panel data set needs to be arranged in the long format as shown in Section 1.1. Extensions to these basic linear panel data models include dynamic models with autocorrelation, random coefficient models, hierarchical linear models, and logit/probit models.
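The standard error adjustment for the within effect model can be illustrated with a minimal sketch. The panel below is hypothetical (three groups, four periods, one regressor) and is not one of the data sets used in this document. The sketch assumes the usual correction, in which the within regression counts nT - k degrees of freedom while the correct count is nT - n - k because n group means were also estimated.

```python
import math

# Hypothetical balanced panel: 3 groups observed over 4 periods.
# y = group intercept + 2 * x + a small deterministic disturbance.
x_by_group = {"a": [1, 2, 3, 4], "b": [2, 3, 5, 6], "c": [1, 3, 4, 7]}
intercepts = {"a": 10.0, "b": 20.0, "c": 30.0}
noise = [0.1, -0.1, 0.1, -0.1]
y_by_group = {
    g: [intercepts[g] + 2.0 * x + e for x, e in zip(xs, noise)]
    for g, xs in x_by_group.items()
}


def mean(values):
    return sum(values) / len(values)


def demean(values):
    """Return deviations from the group mean."""
    m = mean(values)
    return [v - m for v in values]


# Within transformation: stack the group-demeaned x and y.
x_dev, y_dev = [], []
for g in x_by_group:
    x_dev += demean(x_by_group[g])
    y_dev += demean(y_by_group[g])

n = len(x_by_group)  # number of groups
nT = len(x_dev)      # total number of observations
k = 1                # number of regressors

# OLS without an intercept on the demeaned data.
sxx = sum(x * x for x in x_dev)
slope = sum(x * y for x, y in zip(x_dev, y_dev)) / sxx
sse = sum((y - slope * x) ** 2 for x, y in zip(x_dev, y_dev))

# The naive df ignores the n group means; the adjusted df accounts for them.
se_naive = math.sqrt(sse / (nT - k) / sxx)
se_adjusted = math.sqrt(sse / (nT - n - k) / sxx)
adjustment = math.sqrt((nT - k) / (nT - n - k))  # rescales the naive SE

# Group intercepts are computed afterward from the group means.
group_intercepts = {
    g: mean(y_by_group[g]) - slope * mean(x_by_group[g]) for g in x_by_group
}

print(f"slope = {slope:.4f}")
print(f"naive SE = {se_naive:.4f}, adjusted SE = {se_adjusted:.4f}")
print({g: round(a, 3) for g, a in group_intercepts.items()})
```

The slope itself is unaffected by the adjustment; only the MSE and the standard errors change, so the t statistics shrink by the factor stored in `adjustment`.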

Appendix: Data Sets

Data set 1: Data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/).
URL: http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.csv
http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta

firm = IT company name
type = type of IT firm
rnd = 2002 R&D investment in current USD millions
income = 2000 net income in current USD millions
d1 = 1 for equipment and software firms and 0 for telecommunication and electronics

. tab type d1

                |          d1
   Type of Firm |         0          1 |     Total
----------------+----------------------+----------
        Telecom |        18          0 |        18
    Electronics |        17          0 |        17
   IT Equipment |         0          6 |         6
Comm. Equipment |         0          5 |         5
  Service & S/W |         0          4 |         4
----------------+----------------------+----------
          Total |        35         15 |        50

. sum rnd income

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         rnd |        39    2023.564    1615.417          0       5490
      income |        50     2509.78    3104.585       -732      11797

Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).
URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
http://www.indiana.edu/~statmath/stat/all/panel/airline.dta

airline = airline (six airlines)
year = year (fifteen years)
output0 = output in revenue passenger miles, index number
cost0 = total cost in $1,000
fuel0 = fuel price
load = load factor, the average capacity utilization of the fleet

. sum output0 cost0 fuel0 load

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     output0 |        90    .5449946    .5335865    .037682    1.93646
       cost0 |        90     1122524     1192075      68978    4748320
       fuel0 |        90      471683    329502.9     103795    1015610
        load |        90    .5604602    .0527934    .432066    .676287

References

Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. John Wiley & Sons.
Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-way Error Component Regression Model." Journal of Econometrics, 62(2): 67-89.
Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics." Review of Economic Studies, 47(1): 239-253.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. College Station, TX: Stata Press.
Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC: SAS Institute.
Fuller, Wayne A., and George E. Battese. 1973. "Transformations for Estimation of Linear Models with Nested-Error Structure." Journal of the American Statistical Association, 68(343) (September): 626-632.
Fuller, Wayne A., and George E. Battese. 1974. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2: 67-78.
Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide 1. Plainview, New York: Econometric Software.
Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6): 1251-1271.
SAS Institute. 2004. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.
SAS Institute. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.
SPSS Inc. 2007. SPSS 16.0 Command Syntax Reference. Chicago, IL: SPSS Inc.
Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.
Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College Station, TX: Stata Press.
Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata Press.
Suits, Daniel B. 1984. "Dummy Variables: Mechanics V. Interpretation." Review of Economics & Statistics, 66(1): 177-180.
Uyar, Bulent, and Orhan Erdem. 1990. "Regression Procedures in SAS: Problems?" American Statistician, 44(4): 296-301.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Acknowledgements

I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of the School of Public and Environmental Affairs, Indiana University at Bloomington. I am also grateful to Jeremy Albright, Dani Marinova, and Kevin Wilhite at the UITS Center for Statistical and Mathematical Computing for comments and suggestions. A special thanks to many readers around the world who have eagerly provided constructive feedback and encouraged me to keep improving this document.

Revision History

 2005.11 First draft
 2008.04.11 Corrected some errors and added Stata examples
 2009.09 Second draft (updated LSDV section and analysis output)
