Indiana University

University Information Technology Services
Linear Regression Models for Panel Data Using SAS, Stata,
LIMDEP, and SPSS*





Hun Myoung Park, Ph.D.








© 2005-2009
Last modified on September 2009








University Information Technology Services
Center for Statistical and Mathematical Computing
Indiana University
410 North Park Avenue Bloomington, IN 47408
(812) 855-4724 (317) 278-4740
http://www.indiana.edu/~statmath

* The citation of this document should read: “Park, Hun Myoung. 2009. Linear Regression Models for Panel Data
Using SAS, Stata, LIMDEP, and SPSS. Working Paper. The University Information Technology Services (UITS)
Center for Statistical and Mathematical Computing, Indiana University.”
http://www.indiana.edu/~statmath/stat/all/panel
© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 2
http://www.indiana.edu/~statmath

2
This document summarizes linear regression models for panel data and illustrates how to
estimate each model using SAS 9.2, Stata 11, LIMDEP 9, and SPSS 17. It focuses on basic
linear regression models and does not address nonlinear models (e.g., logit and probit
models) or dynamic models.


1. Introduction
2. Least Squares Dummy Variable Regression
3. Panel Data Models
4. One-way Fixed Effect Models: Fixed Group Effect
5. One-way Fixed Effect Models: Fixed Time Effect
6. Two-way Fixed Effect Models
7. Random Effect Models
8. Poolability Test
9. Conclusion
Appendix
References


1. Introduction

Panel (or longitudinal) data are both cross-sectional and time-series: there are multiple
entities, each of which has repeated measurements at different time periods. The U.S. Census
Bureau’s Census 2000 data at the state or county level are cross-sectional but not time-series,
while annual sales figures of Apple Computer Inc. for the past 20 years are time-series but not
cross-sectional. If annual sales data of IBM, LG, Siemens, Microsoft, and AT&T during the same
periods are also available, the combined data are panel data. The cumulative General Social
Survey (GSS), American National Election Studies (ANES), and Current Population Survey (CPS)
data are not panel data in the sense that individual respondents vary across survey years.
Panel data may have group effects, time effects, or both, which are analyzed by fixed effect
and random effect models.

1.1 Data Arrangement

A panel data set contains n entities or subjects (e.g., firms and states), each of which has T
observations measured at time periods 1 through T. Thus, the total number of observations is nT.
Ideally, panel data are measured at regular time intervals (e.g., year, quarter, and month);
otherwise, they should be analyzed with caution. A short panel has many entities but few time
periods (small T), while a long panel has many time periods (large T) but few entities
(Cameron and Trivedi 2009: 230).

Panel data have a cross-section (entity or subject) variable and a time-series variable. In Stata,
this arrangement is called the long form (as opposed to the wide form). While the long form has
both group (individual level) and time variables, the wide form includes only one of them, either
the group or the time variable. Look at the following data set to see how panel data are arranged.
There are 6 groups (airlines) and 15 time periods (years). The .use command below loads a Stata
data set over the Internet (through TCP/IP), and the in 1/20 qualifier of the .list command
displays the first 20 observations.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
(Cost of U.S. Airlines (Greene 2003))

. list airline year load cost output fuel in 1/20, sep(20)

+------------------------------------------------------------+
| airline year load cost output fuel |
|------------------------------------------------------------|
1. | 1 1 .534487 13.9471 -.0483954 11.57731 |
2. | 1 2 .532328 14.01082 -.0133315 11.61102 |
3. | 1 3 .547736 14.08521 .0879925 11.61344 |
4. | 1 4 .540846 14.22863 .1619318 11.71156 |
5. | 1 5 .591167 14.33236 .1485665 12.18896 |
6. | 1 6 .575417 14.4164 .1602123 12.48978 |
7. | 1 7 .594495 14.52004 .2550375 12.48162 |
8. | 1 8 .597409 14.65482 .3297856 12.6648 |
9. | 1 9 .638522 14.78597 .4779284 12.85868 |
10. | 1 10 .676287 14.99343 .6018211 13.25208 |
11. | 1 11 .605735 15.14728 .4356969 13.67813 |
12. | 1 12 .61436 15.16818 .4238942 13.81275 |
13. | 1 13 .633366 15.20081 .5069381 13.75151 |
14. | 1 14 .650117 15.27014 .6001049 13.66419 |
15. | 1 15 .625603 15.3733 .6608616 13.62121 |
16. | 2 1 .490851 13.25215 -.652706 11.55017 |
17. | 2 2 .473449 13.37018 -.626186 11.62157 |
18. | 2 3 .503013 13.56404 -.4228269 11.68405 |
19. | 2 4 .512501 13.8148 -.2337306 11.65092 |
20. | 2 5 .566782 14.00113 -.1708536 12.27989 |
+------------------------------------------------------------+

If data are structured in the wide form, you need to rearrange the data first. Stata has the .reshape
command to rearrange a data set back and forth between the long and wide forms. The following
command changes the long form to the wide form, so that the wide form has only six observations
(one per airline), each with the group variable and one variable per measure and time period
(4 variables*15 years = 60, plus airline, for 61 variables in total).

. keep airline year load cost output fuel

. reshape wide cost output fuel load, i(airline) j(year)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)

Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 90 -> 6
Number of variables 6 -> 61
j variable (15 values) year -> (dropped)
xij variables:
cost -> cost1 cost2 ... cost15
output -> output1 output2 ... output15
fuel -> fuel1 fuel2 ... fuel15
load -> load1 load2 ... load15
-----------------------------------------------------------------------------

If you wish to rearrange the data set back to the long form, run the following command.

. reshape long cost output fuel load, i(airline) j(year)
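
To make the two layouts concrete outside Stata, the following Python sketch pivots a few rows of the listing above from the long form to the wide form and back. It is only an illustration under stated assumptions: the function names to_wide and to_long are hypothetical, only the cost column of the first two airlines and years is used, and the sketch does not reproduce everything the .reshape command does.

```python
# Long form: one row per (airline, year) pair; values copied from the listing above.
long_rows = [
    {"airline": 1, "year": 1, "cost": 13.9471},
    {"airline": 1, "year": 2, "cost": 14.01082},
    {"airline": 2, "year": 1, "cost": 13.25215},
    {"airline": 2, "year": 2, "cost": 13.37018},
]

def to_wide(rows, i, j, stubs):
    """Pivot long rows into one row per group i, with columns such as cost1, cost2."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[i], {i: row[i]})
        for stub in stubs:
            record[f"{stub}{row[j]}"] = row[stub]
    return [wide[key] for key in sorted(wide)]

def to_long(rows, i, j, stubs, periods):
    """Unpivot wide rows back into one row per (i, j) pair."""
    out = []
    for row in rows:
        for period in periods:
            record = {i: row[i], j: period}
            for stub in stubs:
                record[stub] = row[f"{stub}{period}"]
            out.append(record)
    return out

wide_rows = to_wide(long_rows, i="airline", j="year", stubs=["cost"])
round_trip = to_long(wide_rows, i="airline", j="year", stubs=["cost"], periods=[1, 2])
print(wide_rows[0])             # {'airline': 1, 'cost1': 13.9471, 'cost2': 14.01082}
print(round_trip == long_rows)  # True
```

The i, j, and stub arguments mirror the roles of i(airline), j(year), and the variable list in the .reshape commands above.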

In balanced panel data, all entities have measurements in all time periods. In a contingency
table of the cross-sectional and time-series variables, each cell should have a frequency of
exactly one. When entities have different numbers of observations due to missing values, the
panel data are not balanced, and some cells in the contingency table have zero frequency. In
unbalanced panel data, the total number of observations is not nT. Unbalanced panel data
entail some computational and estimation issues, although most software packages are able to
handle both balanced and unbalanced data.
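
The contingency-table check described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the packages' own diagnostics: the function name panel_balance and the small example panel are hypothetical and are not part of the airline data.

```python
from collections import Counter

def panel_balance(pairs):
    """Summarize the entity-by-time contingency table for (entity, time) pairs."""
    cells = Counter(pairs)
    entities = sorted({e for e, _ in pairs})
    periods = sorted({t for _, t in pairs})
    # Cells with zero frequency are missing observations.
    missing = [(e, t) for e in entities for t in periods if cells[(e, t)] == 0]
    # Cells with a frequency above one are duplicated observations.
    duplicated = [cell for cell, count in cells.items() if count > 1]
    return {
        "n": len(entities),
        "T": len(periods),
        "observations": len(pairs),
        "balanced": not missing and not duplicated,
        "missing": missing,
    }

# Hypothetical panel: 3 entities and 3 periods, with entity 3 missing period 2.
pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3)]
summary = panel_balance(pairs)
# Prints the balance flag, the observation count, and nT.
print(summary["balanced"], summary["observations"], summary["n"] * summary["T"])
```

In the hypothetical panel the observation count (8) falls short of nT (9), which is the signature of an unbalanced panel.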

1.2 Fixed Effect versus Random Effect Models

Panel data models examine fixed and/or random effects of entity (individual or subject) or time.
The core difference between fixed and random effect models lies in the role of dummy
variables (Table 1.1). If dummies are considered as a part of the intercept, this is a fixed effect
model. In a random effect model, the dummies act as an error term.

A fixed group effect model examines group differences in intercepts, assuming the same slopes
and constant variance across entities or subjects. Since a group (individual specific) effect is
time invariant and considered a part of the intercept, \(u_i\) is allowed to be correlated with other
regressors. Fixed effect models use least squares dummy variable (LSDV) and within effect
estimation methods. Ordinary least squares (OLS) regressions with dummies are, in fact, fixed
effect models.

Table 1.1 Fixed Effect and Random Effect Models
                   | Fixed Effect Model                                   | Random Effect Model
-------------------+------------------------------------------------------+-----------------------------------------------------
Functional form*   | \(y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}\)  | \(y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})\)
Intercepts         | Varying across groups and/or times                   | Constant
Error variances    | Constant                                             | Varying across groups and/or times
Slopes             | Constant                                             | Constant
Estimation         | LSDV, within effect method                           | GLS, FGLS
Hypothesis test    | Incremental F test                                   | Breusch-Pagan LM test
* \(v_{it} \sim IID(0, \sigma_v^2)\)

A random effect model, by contrast, estimates variance components for groups (or times) and
error, assuming the same intercept and slopes. \(u_i\) is a part of the errors and thus should not be
correlated with any regressor; otherwise, a core OLS assumption is violated. The difference
among groups (or time periods) lies in their variance of the error term, not in their intercepts. A
random effect model is estimated by generalized least squares (GLS) when the \(\Omega\) matrix, a
variance structure among groups, is known. The feasible generalized least squares (FGLS)
method is used to estimate the variance structure when \(\Omega\) is not known. A typical example is
the groupwise heteroscedastic regression model (Greene 2003). There are various estimation
methods for FGLS including the maximum likelihood method and simulation (Baltagi and
Cheng 1994).

Fixed effects are tested by the (incremental) F test, while random effects are examined by the
Lagrange Multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis is not rejected,
the pooled OLS regression is favored. The Hausman specification test (Hausman 1978)
compares fixed effect and random effect models. If the null hypothesis that the individual
effects are uncorrelated with the other regressors in the model is not rejected, a random effect
model is better than its fixed counterpart.
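
As a reference point for the Hausman test just described, its statistic is commonly written as follows. The notation is an assumption of this sketch rather than the document's own: \(\hat\beta_{FE}\) and \(\hat\beta_{RE}\) denote the fixed and random effect estimates of the slopes that both models estimate, \(\hat V(\cdot)\) their estimated covariance matrices, and k the number of those slopes.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[ \hat V(\hat\beta_{FE}) - \hat V(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\sim\; \chi^2(k) \quad \text{asymptotically under } H_0
```

Under the null hypothesis the two sets of estimates should differ only by sampling error, so H is small; a large H rejects the null hypothesis and favors the fixed effect model.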

If only one cross-sectional or time-series variable is considered (e.g., country, firm, or race),
this is called a one-way fixed or random effect model. Two-way effect models have two sets of
dummy variables for group and/or time variables (e.g., state and year).

1.3 Estimation and Software Issues

The LSDV regression, within effect model, between effect model (group or time mean model),
GLS, and FGLS are fundamentally based on OLS in terms of estimation. Thus, any procedure
and command for OLS is good for linear panel data models (Table 1.2).

The REG procedure of SAS/STAT, Stata .regress (.cnsreg), LIMDEP Regress$, and SPSS
Regression commands all fit LSDV1 by dropping one dummy and have options to suppress
the intercept (LSDV2). SAS, Stata, and LIMDEP can estimate OLS with restrictions (LSDV3),
but SPSS cannot. In Stata, the .cnsreg command requires restrictions defined by the .constraint
command.

Table 1.2 Procedures and Commands in SAS, Stata, LIMDEP, and SPSS
                      | SAS 9.2          | Stata 11     | LIMDEP 9                 | SPSS 17
----------------------+------------------+--------------+--------------------------+------------
Regression (OLS)      | PROC REG         | .regress     | Regress$                 | Regression
LSDV1                 | w/o a dummy      | w/o a dummy  | w/o a dummy              | w/o a dummy
LSDV2                 | /NOINT           | ,noconstant  | w/o One in Rhs           | /Origin
LSDV3                 | RESTRICT         | .cnsreg      | Cls:                     | N/A
One-way fixed         | TSCSREG /FIXONE  | .xtreg, fe   | Regress;Panel;Str=;      | N/A
effect (within)       | PANEL /FIXONE    | .areg, abs   | Fixed$                   |
Two-way fixed         | TSCSREG /FIXTWO  | N/A          | Regress;Panel;Str=;      | N/A
(within effect)       | PANEL /FIXTWO    |              | Period=;Fixed$           |
Between effect        | PANEL /BTWNG     | .xtreg, be   | Regress;Panel;Str=;      | N/A
                      | PANEL /BTWNT     |              | Means$                   |
One-way random        | TSCSREG /RANONE  | .xtreg, re   | Regress;Panel;Str=;      | N/A
effect                | PANEL /RANONE    | .xtgls       | Random$                  |
                      | MIXED /RANDOM    | .xtmixed     |                          |
Two-way random        | TSCSREG /RANTWO  | .xtmixed     | Regress;Panel;Str=;      | N/A
                      | PANEL /RANTWO    |              | Period=;Random$          |
Random coefficient    | MIXED /RANDOM    | .xtmixed     | Regress;RPM=;Str=$       | N/A
model                 |                  | .xtrc        |                          |

SAS, Stata, and LIMDEP also provide procedures and commands that estimate panel data
models in a convenient way (Table 1.2). SAS/ETS has the TSCSREG and PANEL procedures
to estimate one-way and two-way fixed/random effect models.[1] These procedures estimate the
within effect model for a fixed effect model and by default employ the Fuller-Battese method
(1974) to estimate variance components for group, time, and error for a random effect model.
PROC TSCSREG and PROC PANEL also support other estimation methods such as the Parks
(1967) autoregressive model and the Da Silva moving average method.

PROC TSCSREG can handle balanced data only, whereas PROC PANEL is able to deal with
balanced and unbalanced data. PROC PANEL requires that each entity (subject) have more than
one observation. PROC TSCSREG provides one-way and two-way fixed and random effect models,
while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled
OLS regression (/POOLED) as well. PROC PANEL has BP and BP2 options to conduct the
Breusch-Pagan LM test for random effects, while PROC TSCSREG does not.[2] Despite
advanced features of PROC PANEL, the output of the two procedures is similar. PROC
MIXED is also able to fit random effect and random coefficient (parameter) models and
supports maximum likelihood estimation that is not available in PROC PANEL and TSCSREG.

[1] PROC PANEL was an experimental procedure in SAS 9.1.3 but became a regular procedure in 9.2. SAS 9.1.3
users need to download and install PROC PANEL from http://www.sas.com/apps/demosdownloads/setupintro.jsp.

The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a
between effect model with be, and a random effect model with re. This command, however,
does not directly fit two-way fixed and random effect models.[3] The .areg command with the
absorb option, equivalent to .xtreg with the fe option, fits the one-way within effect
model that has a large dummy variable set. A random effect model can also be estimated using
the .xtmixed command. Stata also has .xtgls, which fits panel data models with heteroscedasticity
across groups and/or autocorrelation within groups.

The LIMDEP Regress$ command with the Panel subcommand estimates panel data models.
The Fixed effect subcommand fits a fixed effect model, Random effect estimates a random
effect model, and Means is for a between effect model. SPSS has limited ability to analyze
panel data.

1.4 Data Sets

This document uses two data sets. A cross-sectional data set contains research and development
(R&D) expenditure data of the top 50 information technology firms presented in OECD
Information Technology Outlook 2004. A panel data set has cost data for U.S. airlines (1970-1984),
which are used in Econometric Analysis (Greene 2003). See the Appendix for the details.

[2] However, BP and BP2 produce invalid Breusch-Pagan statistics in cases of unbalanced data.
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/etsug_panel_sect041.htm.
[3] You may fit the two-way fixed effect model by including a set of dummies and using the fe option. For the
two-way random effect model, you need to use the .xtmixed command instead of .xtreg.

2. Least Squares Dummy Variable Regression

A dummy variable is a binary variable that is coded to either 1 or zero. It is commonly used to
examine group and time effects in regression analysis. Consider a simple model of regressing
R&D expenditure in 2002 on 2000 net income and firm type. The dummy variable d1 is set to 1
for equipment and software firms and zero for telecommunication and electronics. The variable
d2 is coded in the opposite way. Take a look at the data structure (Figure 2.1).

Figure 2.1 Dummy Variable Coding for Firm Types
+-----------------------------------------------------------------+
| firm rnd income type d1 d2 |
|-----------------------------------------------------------------|
| LG Electronics 551 356 Electronics 0 1 |
| AT&T 254 4,669 Telecom 0 1 |
| IBM 4,750 8,093 IT Equipment 1 0 |
| Ericsson 4,424 2,300 Comm. Equipment 1 0 |
| Siemens 5,490 6,528 Electronics 0 1 |
| Verizon . 11,797 Telecom 0 1 |
| Microsoft 3,772 9,421 Service & S/W 1 0 |
… … … … … … … …

2.1 Model 1 without a Dummy Variable: Pooled OLS

The ordinary least squares (OLS) regression without dummy variables, a pooled regression
model, assumes a constant intercept and slope regardless of firm types. In the following
regression equation, \(\beta_0\) is the intercept; \(\beta_1\) is the slope of net income in 2000; and
\(\varepsilon_i\) is the error term.

Model 1: \(R\&D_i = \beta_0 + \beta_1 income_i + \varepsilon_i\)

The pooled model fits the data well at the .05 significance level (F=7.07, p=.0115). The \(R^2\)
of .1604 says that this model accounts for 16 percent of the total variance. The model has an
intercept of 1,482.697 and a slope of .2231. For a $1 million increase in net income, a firm is
likely to increase R&D expenditure by $.2231 million (p=.012).

. use http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta, clear
( R&D expenditure of IT firm (OECD 2002))

. regress rnd income

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 1, 37) = 7.07
Model | 15902406.5 1 15902406.5 Prob > F = 0.0115
Residual | 83261299.1 37 2250305.38 R-squared = 0.1604
-------------+------------------------------ Adj R-squared = 0.1377
Total | 99163705.6 38 2609571.2 Root MSE = 1500.1

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2230523 .0839066 2.66 0.012 .0530414 .3930632
_cons | 1482.697 314.7957 4.71 0.000 844.8599 2120.533
------------------------------------------------------------------------------

Pooled model: R&D = 1,482.697 + .2231*income
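
The fit statistics quoted for Model 1 follow directly from the sums of squares in the Stata output above. The short Python sketch below redoes that arithmetic; it uses only numbers copied from the output, and the variable names are chosen here for illustration.

```python
# Values copied from the Stata output for Model 1 (regress rnd income).
ss_model, df_model = 15902406.5, 1
ss_resid, df_resid = 83261299.1, 37
coef_income, se_income = 0.2230523, 0.0839066

ss_total = ss_model + ss_resid
r_squared = ss_model / ss_total                          # share of total variance explained
f_stat = (ss_model / df_model) / (ss_resid / df_resid)   # model MS over residual MS
root_mse = (ss_resid / df_resid) ** 0.5                  # square root of the residual MS
t_income = coef_income / se_income

print(round(r_squared, 4))   # 0.1604
print(round(f_stat, 2))      # 7.07
print(round(root_mse, 1))    # 1500.1
print(round(t_income, 2))    # 2.66
# With a single regressor, the F statistic equals the squared t statistic
# (up to rounding in the reported coefficient and standard error).
print(round(t_income ** 2, 2))
```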

Despite moderate goodness of fit statistics such as F and t, this is a naïve model. R&D
investment tends to vary across industries.

2.2 Model 2 with a Dummy Variable

You may assume that equipment and software firms have more R&D expenditure than other
types of companies. Let us take this group difference into account.[4] We have to drop one of the
two dummy variables in order to avoid perfect multicollinearity. That is, OLS does not work
with both dummies in a model. The \(\delta_1\) in Model 2 is the coefficient of equipment, service, and
software companies.

Model 2: \(R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i\)

Model 2 fits the data better than Model 1. The p-value of the F test is .0054 (significant at
the .01 level); \(R^2\) is .2520, about .09 larger than that of Model 1; SSE (sum of squares due to
error or residual) decreases from 83,261,299 to 74,175,757 and SEE (square root of MSE) also
declines accordingly (1,500→1,435). The coefficient of d1 is statistically discernible from zero
at the .05 level (t=2.10, p=.043). Unlike Model 1, this model results in two different regression
equations for two groups. The difference lies in the intercepts, but the slope remains unchanged.

. regress rnd income d1

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 2, 36) = 6.06
Model | 24987948.9 2 12493974.4 Prob > F = 0.0054
Residual | 74175756.7 36 2060437.69 R-squared = 0.2520
-------------+------------------------------ Adj R-squared = 0.2104
Total | 99163705.6 38 2609571.2 Root MSE = 1435.4

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 1006.626 479.3717 2.10 0.043 34.41498 1978.837
_cons | 1133.579 344.0583 3.29 0.002 435.7962 1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.2050 + .2180*income = 1,133.579 +1,006.6260*1 + .2180*income
d1=0: R&D = 1,133.5790 + .2180*income = 1,133.579 +1,006.6260*0 + .2180*income

The slope .2180 indicates a positive impact of two-year-lagged net income on a firm’s R&D
expenditure. Equipment and software firms on average spend $1,007 million (=2,140-1,134)
more for R&D than telecommunication and electronics companies.
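
The improvement of Model 2 over Model 1 can also be expressed as an incremental F statistic, the kind of test Section 1.2 names for fixed effects. The Python sketch below is an informal illustration using only numbers copied from the two Stata outputs above; the formal tests are those of Sections 3.2.2 and 4.7, and the variable names are chosen here for illustration.

```python
# Values copied from the two Stata outputs above.
sse_model1, df_resid_model1 = 83261299.1, 37   # pooled OLS (restricted model)
sse_model2, df_resid_model2 = 74175756.7, 36   # with the dummy d1 (unrestricted model)
intercept, coef_d1, se_d1 = 1133.579, 1006.626, 479.3717

# Incremental F: drop in SSE per restriction, relative to the unrestricted MSE.
restrictions = df_resid_model1 - df_resid_model2
f_incremental = ((sse_model1 - sse_model2) / restrictions) / (sse_model2 / df_resid_model2)
t_d1 = coef_d1 / se_d1

print(round(f_incremental, 2))  # 4.41
print(round(t_d1 ** 2, 2))      # 4.41, the same test because only one dummy is added

# Group-specific Y-intercepts implied by Model 2.
print(round(intercept + coef_d1 * 1, 3))  # 2140.205 (d1 = 1)
print(round(intercept + coef_d1 * 0, 3))  # 1133.579 (d1 = 0)
```

Because only one dummy is added, the incremental F statistic is the square of the t statistic of d1, so both lead to the same conclusion at the .05 level.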

2.3 Visualization of Model 1 and 2


[4] The dummy variable (firm types) and regressors (net income) may or may not be correlated.

There is only a tiny difference in the slope (.2231 versus .2180) between Model 1 and Model 2.
The intercept 1,483 of Model 1, however, is quite different from 2,140 for equipment and
software companies and 1,134 for telecommunications and electronics in Model 2. This result
appears to be supportive of Model 2.

Figure 2.2 highlights differences between Model 1 and 2 more clearly. The red line (pooled) in
the middle is the regression line of Model 1; the dotted blue line at the top is one for equipment
and software companies (d1=1) in Model 2; finally the dotted green line at the bottom is for
telecommunication and electronics firms (d2=1 or d1=0).

Figure 2.2. Regression Lines of Model 1 and Model 2
[Figure: 2002 R&D Investment of OECD IT Firms. Vertical axis: R&D (USD Millions); horizontal
axis: Income (USD Millions). Fitted lines: R&D=1483+.223*Income (pooled), R&D=2140+.218*Income
(d1=1), and R&D=1134+.218*Income (d1=0).]
Source: OECD Information Technology Outlook 2004. http://thesius.sourceoecd.org/

This plot shows that Model 1 ignores the group difference and thus reports a misleading
intercept. The difference in the intercept between the two groups of firms looks substantial,
whereas the two models have similar slopes. Consequently, Model 2, which considers a fixed
group effect (i.e., firm type), seems better than the simple Model 1. Compare the goodness of fit
statistics (e.g., F, \(R^2\), and SSE) of the two models. See Sections 3.2.2 and 4.7 for formal
hypothesis tests.

2.4 Least Squares Dummy Variable Regression: LSDV1, LSDV2, and LSDV3

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with
dummy variables. Model 2 above is a typical example of LSDV. The key issue in LSDV is
how to avoid perfect multicollinearity, the so-called “dummy variable trap.” LSDV has three
approaches to avoid getting caught in the trap. These approaches differ from each other
with respect to model estimation and interpretation of dummy variable parameters (Suits 1984:
177). They produce different dummy parameter estimates, but their results are equivalent.

The first approach, LSDV1, drops a dummy variable as shown in Model 2 above. That is, the
parameter of the eliminated dummy variable is set to zero and is used as a baseline (Table 2.1). The
variable to be dropped, \(d_{dropped}^{LSDV1}\) (d2 in Model 2), needs to be selected carefully (as
opposed to arbitrarily) so that it can play the role of the reference group effectively. LSDV2 includes
all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero). Finally, LSDV3
includes the intercept and all dummies, and then imposes a restriction that the sum of the parameters
of all dummies is zero. Each approach has a constraint (restriction) that reduces the number of
parameters to be estimated by one and thus makes the model identified. The following
functional forms compare these three LSDVs.

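LSDV1: \(R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i\) or \(R\&D_i = \beta_0 + \beta_1 income_i + \delta_2 d_{2i} + \varepsilon_i\)
LSDV2: \(R\&D_i = \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i\)
LSDV3: \(R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i\), subject to \(\delta_1 + \delta_2 = 0\)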

Table 2.1. Three Approaches of the Least Squares Dummy Variable Regression Model
                               | LSDV1 | LSDV2 | LSDV3
-------------------------------+-------+-------+-------
Dummies included               | \(d_1^{LSDV1}\) through \(d_d^{LSDV1}\), except for \(d_{dropped}^{LSDV1}\) | \(d_1^*\) through \(d_d^*\) | \(d_1^{LSDV3}\) through \(d_d^{LSDV3}\)
Intercept?                     | \(\alpha^{LSDV1}\) | No | \(\alpha^{LSDV3}\)
All dummies?                   | No (d-1) | Yes (d) | Yes (d)
Constraint (restriction)?      | \(\delta_{dropped}^{LSDV1} = 0\) (Drop one dummy) | \(\alpha^{LSDV2} = 0\) (Suppress the intercept) | \(\sum \delta_i^{LSDV3} = 0\) (Impose a restriction)
Actual dummy parameters        | \(\delta_i^* = \alpha^{LSDV1} + \delta_i^{LSDV1}\), \(\delta_{dropped}^* = \alpha^{LSDV1}\) | \(\delta_1^*, \delta_2^*, \ldots, \delta_d^*\) | \(\delta_i^* = \alpha^{LSDV3} + \delta_i^{LSDV3}\), \(\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*\)
Meaning of a dummy coefficient | How far away from the reference group (dropped)? | Actual intercept | How far away from the average group effect?
H0 of the t-test               | \(\delta_i^* - \delta_{dropped}^* = 0\) | \(\delta_i^* = 0\) | \(\delta_i^* - \frac{1}{d}\sum \delta_i^* = 0\)
Source: Constructed from Suits (1984) and David Good’s lecture (2004)

Three approaches end up fitting the same model, but the coefficients of dummy variables in
each approach have different meanings and thus are numerically different (Table 2.1). A
parameter estimate in LSDV2, \(\delta_d^*\), is the actual intercept (Y-intercept) of group d. It is easy to
interpret substantively. The t-test examines if \(\delta_d^*\) is zero. In LSDV1, a dummy coefficient
shows the extent to which the actual intercept of group d deviates from the reference point (the
parameter of the dropped dummy variable), which is the intercept of LSDV1,
\(\delta_{dropped}^* = \alpha^{LSDV1}\).[5] The null hypothesis holds that the deviation from the reference
group is zero. In LSDV3, a dummy coefficient means how far its actual parameter is away from the
average group effect (Suits 1984: 178). The average effect is the intercept of LSDV3:
\(\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*\). Therefore, the null hypothesis is that the deviation from the
average is zero. In short, each approach has a different baseline and thus tests a different
hypothesis but produces exactly the same parameter estimates of regressors. They all fit the same
model; in other words, given one fitted LSDV, we can replicate the other two. Table 2.1 summarizes
differences in estimation and interpretation of the three LSDVs.

[5] In Model 2, \(\hat{\delta}_1\) of 1,007 is the estimated (relative) distance between two types of firm (equipment and software
versus telecommunications and electronics). In Figure 2.2, the Y-intercept of equipment and software (absolute
distance from the origin) is 2,140 = 1,134+1,006. The Y-intercept of telecommunications and electronics is 1,134.
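
The claim that one fitted LSDV is enough to replicate the other two can be illustrated with the relations in Table 2.1. The Python sketch below starts from the actual group intercepts computed in Section 2.2 (2,140.205 and 1,133.579) and derives the LSDV1 and LSDV3 intercepts and dummy coefficients; the function names are hypothetical and the sketch covers point estimates only, not standard errors.

```python
def lsdv1_from_actual(actual, dropped):
    """LSDV1: the intercept is the dropped group's actual intercept;
    the other dummy coefficients are deviations from it."""
    intercept = actual[dropped]
    dummies = {g: a - intercept for g, a in actual.items() if g != dropped}
    return intercept, dummies

def lsdv3_from_actual(actual):
    """LSDV3: the intercept is the average actual intercept;
    the dummy coefficients are deviations from it and sum to zero."""
    intercept = sum(actual.values()) / len(actual)
    dummies = {g: a - intercept for g, a in actual.items()}
    return intercept, dummies

# Actual group intercepts (the LSDV2 dummy coefficients).
actual = {"d1": 2140.205, "d2": 1133.579}

print(lsdv1_from_actual(actual, dropped="d2"))  # baseline: telecommunications and electronics
print(lsdv1_from_actual(actual, dropped="d1"))  # baseline: equipment and software
print(lsdv3_from_actual(actual))                # baseline: average group effect
```

Up to floating-point rounding, the three calls give an intercept of 1,133.579 with a d1 coefficient of 1,006.626, an intercept of 2,140.205 with a d2 coefficient of -1,006.626, and an intercept of 1,636.892 with coefficients of 503.313 and -503.313, which are the estimates reported in Sections 2.2 and 2.5.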

Which approach is better than the others? You need to consider both estimation and
interpretation issues carefully. In general, LSDV1 is often preferred because of easy estimation
in statistical software packages. Oftentimes researchers want to see how far dummy parameters
deviate from the reference group rather than what the actual intercept of each group is.
LSDV2 and LSDV3 involve some estimation problems; for example, LSDV2 reports an
incorrect \(R^2\).

2.5 Estimating Three LSDVs

The SAS REG procedure, Stata .regress command, LIMDEP Regress$ command, and
SPSS Regression command all fit OLS and LSDVs. Let us estimate three LSDVs using SAS,
Stata, and LIMDEP.

2.5.1 LSDV 1 without a Dummy

LSDV 1 drops a dummy variable. The intercept is the actual parameter estimate (absolute
distance from the origin) of the dropped dummy variable. The coefficient of a dummy included
means how far its parameter estimate is away from the reference point or baseline (i.e., the
intercept).

Here we include d2 instead of d1 to see how a different reference point changes the result.
Check the sign of the dummy coefficient and the intercept.

PROC REG DATA=masil.rnd2002;
MODEL rnd = income d2;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

Number of Observations Read 50
Number of Observations Used 39
Number of Observations with Missing Values 11


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 2 24987949 12493974 6.06 0.0054
Error 36 74175757 2060438
Corrected Total 38 99163706


Root MSE 1435.42248 R-Square 0.2520
Dependent Mean 2023.56410 Adj R-Sq 0.2104
Coeff Var 70.93536


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 2140.20468 434.48460 4.93 <.0001
income 1 0.21801 0.08032 2.71 0.0101
d2 1 -1006.62593 479.37174 -2.10 0.0428

d2=0: R&D = 2,140.2047 + .2180*income = 2,140.2047 - 1,006.6259*0 + .2180*income
d2=1: R&D = 1,133.5788 + .2180*income = 2,140.2047 - 1,006.6259*1 + .2180*income

The intercept 2,140 is the Y-intercept of equipment and software firms, whose dummy is
dropped in the model (d1=1, d2=0). The coefficient -1,007 of telecommunications and
electronics means that its Y-intercept, 1,134, is 1,007 smaller than the 2,140 of equipment and
software. That is, 1,134 = 2,140 (baseline) - 1,007, up to rounding. Therefore, this model is
identical to Model 2 in Section 2.2. In short, dropping a different dummy does not change the
model, although it produces different dummy coefficients.

Alternatively, you may use the GLM and MIXED procedures to get the same result.

PROC GLM DATA=masil.rnd2002;
MODEL rnd = income d2 /SOLUTION;
RUN;

PROC MIXED DATA=masil.rnd2002;
MODEL rnd = income d2 /SOLUTION;
RUN;

2.5.2 LSDV 2 without the Intercept

LSDV 2 includes all dummy variables and suppresses the intercept. The Stata .regress
command has the noconstant option to fit LSDV2. The coefficients of dummies are actual
parameter estimates; thus, you do not need to compute Y-intercepts of groups. This LSDV,
however, reports an incorrect (inflated) \(R^2\) (.7135 > .2520) and F (29.88 > 6.06). This is because
the X matrix does not have a column vector of ones, so the sums of squares of model and total
are computed around zero rather than around the mean (Uyar and Erdem 1990: 298). However,
the sum of squares of errors is correct in any LSDV.

. regress rnd income d1 d2, noconstant

Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 3, 36) = 29.88
Model | 184685604 3 61561868.1 Prob > F = 0.0000
Residual | 74175756.7 36 2060437.69 R-squared = 0.7135
-------------+------------------------------ Adj R-squared = 0.6896
Total | 258861361 39 6637470.79 Root MSE = 1435.4

------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 2140.205 434.4846 4.93 0.000 1259.029 3021.38
d2 | 1133.579 344.0583 3.29 0.002 435.7962 1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income
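
The inflated R-squared of LSDV2 can be traced to the total sum of squares. The Python sketch below recomputes both versions of R-squared from numbers copied from the Stata outputs in Sections 2.2 and 2.5.2 and from the dependent mean in the SAS output of Section 2.5.1; the variable names are chosen here for illustration.

```python
# Sums of squares copied from the Stata outputs in Sections 2.2 and 2.5.2.
sse = 74175756.7               # residual SS, identical in every LSDV
sst_centered = 99163705.6      # total SS around the mean of rnd (model with an intercept)
sst_uncentered = 258861361.0   # total SS reported by the noconstant model

r2_centered = 1 - sse / sst_centered
r2_uncentered = 1 - sse / sst_uncentered

print(round(r2_centered, 4))    # 0.252
print(round(r2_uncentered, 4))  # 0.7135

# The two totals differ by n times the squared mean of the dependent variable.
n, mean_rnd = 39, 2023.56410    # from the SAS output in Section 2.5.1
print(sst_centered + n * mean_rnd ** 2)  # close to 258861361
```

The residual sum of squares is the same in both cases; only the denominator changes, which is why the fit appears better without any change in the fitted model.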

2.5.3 LSDV 3 with a Restriction

LSDV 3 includes the intercept and all dummies and then imposes a restriction on the model.
The restriction is that the sum of all dummy parameters is zero. The Stata .constraint
command defines a constraint, while the .cnsreg command fits a constrained OLS using the
constraint() option. The number in the parentheses indicates the constraint number defined in
the .constraint command.

. constraint 1 d1 + d2 = 0
. cnsreg rnd income d1 d2, constraint(1)

Constrained linear regression Number of obs = 39
F( 2, 36) = 6.06
Prob > F = 0.0054
Root MSE = 1435.4225

( 1) d1 + d2 = 0
------------------------------------------------------------------------------
rnd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .2180066 .0803248 2.71 0.010 .0551004 .3809128
d1 | 503.313 239.6859 2.10 0.043 17.20749 989.4184
d2 | -503.313 239.6859 -2.10 0.043 -989.4184 -17.20749
_cons | 1636.892 310.0438 5.28 0.000 1008.094 2265.69
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income = 1,637 + 503*1 + (-503)*0 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income = 1,637 + 503*0 + (-503)*1 + .2180*income

The intercept is the average of actual parameter estimates: 1,637 = (2,140+1,134)/2. Since there
are two groups here, the coefficients of two dummies by definition share the same magnitude
($503) but have opposite directions. Equipment and software firms invest $2,140 million in
R&D, $503 million MORE than the average expenditure of overall IT firms
(=$2,140-$1,637), while telecommunications and electronics spend $503 million LESS than
the average (=$1,134-$1,637). In the SAS output below, the coefficient of RESTRICT is
virtually zero and, in theory, should be zero.

PROC REG DATA=masil.rnd2002;
MODEL rnd = income d1 d2;
RESTRICT d1 + d2 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read 50
Number of Observations Used 39
Number of Observations with Missing Values 11


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 2 24987949 12493974 6.06 0.0054
Error 36 74175757 2060438
Corrected Total 38 99163706


Root MSE 1435.42248 R-Square 0.2520
Dependent Mean 2023.56410 Adj R-Sq 0.2104
Coeff Var 70.93536


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 1636.89172 310.04381 5.28 <.0001
income 1 0.21801 0.08032 2.71 0.0101
d1 1 503.31297 239.68587 2.10 0.0428
d2 1 -503.31297 239.68587 -2.10 0.0428
RESTRICT -1 1.81899E-12 0 . .

* Probability computed using beta distribution.
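The link between the LSDV2 intercepts and the LSDV3 estimates can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original analysis; it only assumes the two LSDV2 dummy coefficients reported in the Stata output (2140.205 and 1133.579), and the variable names are illustrative.

```python
# LSDV2 intercepts of the two groups (coefficients of d1 and d2).
lsdv2 = {"d1": 2140.205, "d2": 1133.579}

# Under the restriction d1 + d2 = 0, the LSDV3 intercept is the mean of the
# group intercepts and each dummy coefficient is a deviation from that mean.
intercept = sum(lsdv2.values()) / len(lsdv2)
deviations = {name: value - intercept for name, value in lsdv2.items()}

print(round(intercept, 3))
for name, dev in deviations.items():
    print(name, round(dev, 3))
```

The mean and the two deviations agree, up to rounding, with the _cons, d1, and d2 estimates that .cnsreg and PROC REG report for LSDV3.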

Table 2.2 Estimating Three LSDVs Using SAS, Stata, LIMDEP, and SPSS

SAS
  LSDV 1:
    PROC REG;
    MODEL rnd = income d2;
    RUN;
  LSDV 2:
    PROC REG;
    MODEL rnd = income d1 d2 /NOINT;
    RUN;
  LSDV 3:
    PROC REG;
    MODEL rnd = income d1 d2;
    RESTRICT d1 + d2 = 0;
    RUN;

Stata
  LSDV 1:
    . regress rnd income d2
  LSDV 2:
    . regress rnd income d1 d2, noconstant
  LSDV 3:
    . constraint 1 d1 + d2 = 0
    . cnsreg rnd income d1 d2, constraint(1)

LIMDEP
  LSDV 1:
    REGRESS;
    Lhs=rnd;
    Rhs=ONE,income, d2$
  LSDV 2:
    REGRESS;
    Lhs=rnd;
    Rhs=income, d1, d2$
  LSDV 3:
    REGRESS;
    Lhs=rnd;
    Rhs=ONE,income, d1, d2;
    Cls: b(2)+b(3)=0$

SPSS
  LSDV 1:
    REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF R ANOVA
    /CRITERIA=PIN(.05) POUT(.10)
    /NOORIGIN
    /DEPENDENT rnd
    /METHOD=ENTER income d2.
  LSDV 2:
    REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF R ANOVA
    /CRITERIA=PIN(.05) POUT(.10)
    /ORIGIN
    /DEPENDENT rnd
    /METHOD=ENTER income d1 d2.
  LSDV 3:
    N/A

Table 2.2 compares how SAS, Stata, LIMDEP, and SPSS estimate LSDVs. SPSS is not able to
fit LSDV3. In LIMDEP, ONE indicates that the intercept is included. Cls: b(2)+b(3)=0 fits
the model under the condition that the sum of the parameter estimates of d1 (second parameter)
and d2 (third parameter) is zero. In SPSS, pay attention to the /ORIGIN option, which
suppresses the intercept for LSDV2.

3. Panel Data Models

Panel data models examine group (individual-specific) effects, time effects, or both. These
effects are either fixed or random. A fixed effect model examines whether intercepts vary
across groups or time periods, whereas a random effect model explores differences in error
variances. A one-way model includes only one set of dummy variables (e.g., firm), while a
two-way model considers two sets of dummy variables (e.g., firm and year). Model 2 in
Chapter 2, in fact, is a one-way fixed group effect panel data model.

3.1 Functional Forms and Notation

The parameter estimate of a dummy variable is a part of the intercept in a fixed effect model
and a component of error in the random effect model. Slopes remain the same across groups or
time periods. The functional forms of one-way panel data models are as follows.

Fixed group effect model: $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$, where $v_{it} \sim IID(0, \sigma_v^2)$
Random group effect model: $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$, where $v_{it} \sim IID(0, \sigma_v^2)$

Note that $u_i$ is a fixed or random effect and errors are independent identically distributed,
$v_{it} \sim IID(0, \sigma_v^2)$.

Notation used in this document includes:
- $\bar{y}_{i\bullet}$: dependent variable (DV) mean of group i.
- $\bar{y}_{\bullet t}$: dependent variable (DV) mean at time t.
- $\bar{x}_{i\bullet}$: means of independent variables (IVs) of group i.
- $\bar{x}_{\bullet t}$: means of independent variables (IVs) at time t.
- $\bar{y}_{\bullet\bullet}$: overall mean of the DV.
- $\bar{x}_{\bullet\bullet}$: overall means of the IVs.
- n: the number of groups or firms
- T: the number of time periods
- N=nT: total number of observations
- k: the number of regressors excluding dummy variables
- K=k+1 (including the intercept)

3.2 Fixed Effect Models

There are several strategies for estimating fixed effect models. The least squares dummy
variable model (LSDV) uses dummy variables, whereas the within effect model does not. These
strategies, of course, produce the identical parameter estimates of non-dummy independent
variables. The between effect model fits the model using group and/or time means of dependent
and independent variables without dummies. Table 3.1 summarizes pros and cons of these
models.

3.2.1 Estimations: LSDV, Within Effect, and Between Effect Models

As discussed in Chapter 2, LSDV is widely used because it is relatively easy to estimate and
interpret substantively. LSDV, however, becomes problematic when there are many
groups or subjects in panel data. If T is fixed and $nT \to \infty$, only the coefficients of
regressors are consistent. The coefficients of dummy variables, $\alpha + u_i$, are not
consistent since the number of these parameters increases as nT increases (Baltagi 2001). This
is the so-called incidental parameter problem. Under this circumstance, LSDV is useless and
thus calls for another strategy, the within effect model.

A within group effect model does not need dummy variables, but it uses deviations from group
means. Thus, this model is the OLS of
$(y_{it} - \bar{y}_{i\bullet}) = (x_{it} - \bar{x}_{i\bullet})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\bullet})$
without an intercept.[6] The incidental parameter problem is no longer an issue. The parameter
estimates of regressors in the within effect model are identical to those of LSDV. The within
effect model in turn has several disadvantages.

Since this model does not report dummy coefficients, you need to compute them using the
formula $d_i^* = \bar{y}_{i\bullet} - \bar{x}_{i\bullet}'\beta$. Since no dummy is used, the
within effect model has larger degrees of freedom for error, resulting in a smaller MSE (mean
square error) and incorrect (smaller) standard errors of parameter estimates. Thus, you have to
adjust the standard errors using the formula

$$se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT-k}{nT-n-k}}.$$

Finally, $R^2$ of the within effect model is not correct because the intercept is suppressed.
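The within transformation and the recovery of dummy coefficients described above can be sketched in a few lines of Python. This is an illustration only, under loud assumptions: the panel is a hypothetical, noise-free data set with three groups, four periods, and a single regressor (so that OLS without an intercept reduces to a ratio of sums), and all variable names are invented for the example.

```python
# Hypothetical balanced panel: 3 groups, 4 periods, one regressor, no noise.
# The data are generated with group intercepts 1, 5, -3 and a slope of 2.
true_intercepts = [1.0, 5.0, -3.0]
x = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [[a + 2.0 * v for v in row] for a, row in zip(true_intercepts, x)]

def mean(values):
    return sum(values) / len(values)

# Step 1: group means of the dependent and independent variables.
x_bar = [mean(row) for row in x]
y_bar = [mean(row) for row in y]

# Step 2: deviations from the group means.
x_dev = [[v - m for v in row] for row, m in zip(x, x_bar)]
y_dev = [[v - m for v in row] for row, m in zip(y, y_bar)]

# Step 3: OLS without an intercept on the transformed variables.
sxy = sum(a * b for rx, ry in zip(x_dev, y_dev) for a, b in zip(rx, ry))
sxx = sum(a * a for rx in x_dev for a in rx)
beta = sxy / sxx

# Dummy coefficients: d_i* = ybar_i - xbar_i * beta.
d_star = [yb - xb * beta for yb, xb in zip(y_bar, x_bar)]

print(beta, d_star)
```

Because the toy data contain no noise, the slope and the three group intercepts used to generate the data are recovered. With real data the standard errors from step 3 would still need the degrees-of-freedom adjustment given above.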

Table 3.1 Comparison of Fixed Effect Models

                          LSDV1        Within Effect                    Between Effect
Functional form           (a)          (b)                              (c)
Dummy                     Yes          No                               No
Dummy coefficient         Presented    Need to be computed              N/A
Transformation            No           Deviation from the group means   Group means
Intercept (estimation)    Yes          No                               Yes
$R^2$                     Correct      Incorrect
SSE                       Correct      Correct
MSE                       Correct      Smaller
Standard error of $\beta$ Correct      Incorrect (smaller)
DF error                  nT-n-k       nT-k (larger by n)               n-K
Observations              nT           nT                               n

(a) $y_i = i\alpha_i + X_i\beta + \varepsilon_i$
(b) $y_{it} - \bar{y}_{i\bullet} = (x_{it} - \bar{x}_{i\bullet})'\beta + \varepsilon_{it} - \bar{\varepsilon}_{i\bullet}$
(c) $\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$
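The error degrees of freedom in Table 3.1 and the resulting standard error correction can be illustrated as follows. This Python sketch is not from the original text; it assumes a balanced panel with n=6, T=15, and k=3 (the dimensions of the airline data used in Section 4), and the function name is hypothetical.

```python
import math

def error_df(n, T, k):
    """Error degrees of freedom of the three estimators in a balanced panel."""
    K = k + 1  # regressors plus the intercept
    return {"lsdv": n * T - n - k, "within": n * T - k, "between": n - K}

df = error_df(n=6, T=15, k=3)

# Factor that inflates the within-model standard errors to the LSDV ones.
factor = math.sqrt(df["within"] / df["lsdv"])

print(df, round(factor, 4))
```

Under these assumptions the within model reports 87 error degrees of freedom against 81 for LSDV, so its standard errors need to be multiplied by roughly 1.036.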

The between group effect model, the so-called group mean regression, uses group means of the
dependent and independent variables. This data aggregation reduces the number of
observations down to n. Then, run OLS of
$\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$. Table 3.1 contrasts
LSDV, the within effect model, and the between group models.

[6] You need to follow three steps: 1) compute group means of the dependent and independent
variables; 2) transform variables to get deviations from the group means; 3) run OLS with the
transformed variables without the intercept.

3.2.2 Testing Group Effects

In a regression of $y_{it} = \alpha + \mu_i + X_{it}'\beta + \varepsilon_{it}$, the null
hypothesis is that all dummy parameters except for the one dropped are zero:
$H_0: \mu_1 = \dots = \mu_{n-1} = 0$. This hypothesis is tested by the F test, which is based on
loss of goodness-of-fit. The robust model in the following formula is LSDV (or the within
effect model) and the efficient model is the pooled regression.[7]

$$\frac{(e'e_{Efficient} - e'e_{Robust})/(n-1)}{(e'e_{Robust})/(nT-n-k)} = \frac{(R^2_{Robust} - R^2_{Efficient})/(n-1)}{(1 - R^2_{Robust})/(nT-n-k)} \sim F(n-1,\, nT-n-k)$$


If the null hypothesis is rejected, you may conclude that the fixed group effect model is better
than the pooled OLS model.
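As an illustration of this F test, the sketch below plugs in the sums of squares that the pooled OLS and LSDV1 fits of the airline data report in Section 4 (SSE of 1.33544153 and .292622872, total SS of 114.040893, n=6, T=15, k=3). The Python function is a hypothetical helper, not part of any of the four packages.

```python
def group_effect_f(sse_pooled, sse_lsdv, n, T, k):
    """F statistic for H0: all group dummy parameters are zero."""
    df1 = n - 1
    df2 = n * T - n - k
    f_stat = ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2)
    return f_stat, df1, df2

sse_pooled = 1.33544153   # efficient model: pooled OLS
sse_lsdv = 0.292622872    # robust model: LSDV1
sst = 114.040893          # corrected total sum of squares

f_stat, df1, df2 = group_effect_f(sse_pooled, sse_lsdv, n=6, T=15, k=3)

# The R-squared form of the formula gives the same statistic.
r2_pooled = 1 - sse_pooled / sst
r2_lsdv = 1 - sse_lsdv / sst
f_from_r2 = ((r2_lsdv - r2_pooled) / df1) / ((1 - r2_lsdv) / df2)

print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The statistic is about 57.7 with 5 and 81 degrees of freedom, far beyond conventional critical values, which points to the fixed group effect model over the pooled OLS for these data.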

3.2.3 Fixed Time Effect and Two-way Fixed Effect Models

For the fixed time effects model, you need to switch n and T, and i and t in the formulas.

- Model: $y_{it} = \alpha + \tau_t + X_{it}'\beta + \varepsilon_{it}$
- Within effect model: $(y_{it} - \bar{y}_{\bullet t}) = (x_{it} - \bar{x}_{\bullet t})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{\bullet t})$
- Dummy coefficients: $d_t^* = \bar{y}_{\bullet t} - \bar{x}_{\bullet t}'\beta$
- Correct standard errors: $se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{Tn-k}{Tn-T-k}}$
- Between effect model: $\bar{y}_{\bullet t} = \alpha + \bar{x}_{\bullet t}'\beta + \varepsilon_t$
- $H_0: \tau_1 = \dots = \tau_{T-1} = 0$.
- F-test: $\frac{(e'e_{Pooled} - e'e_{Within})/(T-1)}{(e'e_{Within})/(Tn-T-k)} \sim F(T-1,\, Tn-T-k)$.

The fixed group and time effect model uses slightly different formulas. The within effect model
of this two-way fixed model is estimated by five strategies (see Section 6.1).

- Model: $y_{it} = \alpha + \mu_i + \tau_t + X_{it}'\beta + \varepsilon_{it}$.
- Within effect model: $y_{it}^* = y_{it} - \bar{y}_{i\bullet} - \bar{y}_{\bullet t} + \bar{y}_{\bullet\bullet}$ and $x_{it}^* = x_{it} - \bar{x}_{i\bullet} - \bar{x}_{\bullet t} + \bar{x}_{\bullet\bullet}$.
- Dummy coefficients: $d_i^* = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})'\beta$ and $d_t^* = (\bar{y}_{\bullet t} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{\bullet t} - \bar{x}_{\bullet\bullet})'\beta$
- Correct standard errors: $se_k^* = se_k \sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT-k}{nT-n-T-k+1}}$
- $H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$.
- F-test: $\frac{(e'e_{Efficient} - e'e_{Robust})/(n+T-2)}{(e'e_{Robust})/(nT-n-T-k+1)} \sim F[(n+T-2),\, (nT-n-T-k+1)]$

[7] When comparing fixed effect and random effect models, the fixed effect estimates are
considered as the robust estimates and random effect estimates as the efficient estimates.
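The two-way within transformation listed above can be sketched for a single variable as follows. This is a minimal Python illustration on a hypothetical 2-by-3 balanced panel (the numbers are invented); it shows only the transformation, not the subsequent regression.

```python
# Hypothetical balanced panel: rows are groups (i), columns are periods (t).
y = [[3.0, 5.0, 4.0],
     [6.0, 9.0, 9.0]]
n, T = len(y), len(y[0])

group_mean = [sum(row) / T for row in y]                            # ybar_i.
time_mean = [sum(y[i][t] for i in range(n)) / n for t in range(T)]  # ybar_.t
grand_mean = sum(sum(row) for row in y) / (n * T)                   # ybar_..

# y*_it = y_it - ybar_i. - ybar_.t + ybar_..
y_star = [[y[i][t] - group_mean[i] - time_mean[t] + grand_mean
           for t in range(T)] for i in range(n)]

for row in y_star:
    print(row)
```

A useful sanity check is that the transformed values sum to zero within every group and within every period, which is what removes both sets of fixed effects.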


3.3 Random Effect Models

The one-way random group effect model is formulated as
$y_{it} = \alpha + X_{it}'\beta + u_i + v_{it}$, $w_{it} = u_i + v_{it}$, where
$u_i \sim IID(0, \sigma_u^2)$ and $v_{it} \sim IID(0, \sigma_v^2)$. The $u_i$ are assumed
independent of $v_{it}$ and $X_{it}$, which are also independent of each other for all i and t.
This assumption is not necessary in the fixed effect model. The components of
$Cov(w_{it}, w_{js}) = E(w_{it}w_{js})$ are $\sigma_u^2 + \sigma_v^2$ if i=j and t=s, and
$\sigma_u^2$ if i=j and $t \neq s$.[8] Thus, the $\Omega$ matrix or the variance structure of
errors looks like,

$$\Omega_{T \times T} = \begin{bmatrix}
\sigma_u^2 + \sigma_v^2 & \sigma_u^2 & \dots & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 + \sigma_v^2 & \dots & \sigma_u^2 \\
\dots & \dots & \dots & \dots \\
\sigma_u^2 & \sigma_u^2 & \dots & \sigma_u^2 + \sigma_v^2
\end{bmatrix}$$


A random effect model is estimated by generalized least squares (GLS) when the variance
structure is known, and by feasible generalized least squares (FGLS) when the variance is
unknown. Compared to fixed effect models, random effect models are relatively difficult to
estimate. This document assumes panel data are balanced.

3.3.1 Generalized Least Squares (GLS)

When $\Omega$ is known (given), GLS based on the true variance components is BLUE and all the
feasible GLS estimators considered are asymptotically efficient as either n or T approaches
infinity (Baltagi 2001).
In GLS, you just need to compute $\theta$ using the $\Omega$ matrix:
$\theta = 1 - \sqrt{\frac{\sigma_v^2}{T\sigma_u^2 + \sigma_v^2}}$.[9] Then transform
variables as follows.
- $y_{it}^* = y_{it} - \theta\bar{y}_{i\bullet}$
- $x_{it}^* = x_{it} - \theta\bar{x}_{i\bullet}$ for all $X_k$
- $\alpha^* = 1 - \theta$

Finally, run OLS on the transformed variables:
$y_{it}^* = \alpha^* + x_{it}^{*\prime}\beta^* + \varepsilon_{it}^*$. Since $\Omega$ is often
unknown, FGLS is more frequently used than GLS.

[8] This implies that $Corr(w_{it}, w_{js})$ is 1 if i=j and t=s, and
$\sigma_u^2/(\sigma_u^2 + \sigma_v^2)$ if i=j and $t \neq s$.
[9] If $\theta = 0$, run pooled OLS. If $\theta = 1$ and $\sigma_v^2 = 0$, then run the within
effect model.

3.3.2 Feasible Generalized Least Squares (FGLS)

If $\Omega$ is unknown, first you have to estimate $\theta$ using $\hat{\sigma}_u^2$ and
$\hat{\sigma}_v^2$:

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}}.$$

The $\hat{\sigma}_v^2$ is derived from the SSE (sum of squares due to error) of the within
effect model or from the deviations of residuals from group means of residuals:

$$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT-n-k} = \frac{e'e_{within}}{nT-n-k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\bullet})^2}{nT-n-k},$$

where $v_{it}$ are the residuals of the LSDV1.

The $\hat{\sigma}_u^2$ comes from the between effect model (group mean regression):

$$\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \frac{\hat{\sigma}_v^2}{T}, \quad \text{where } \hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K}.$$

Next, transform variables using $\hat{\theta}$ and then run OLS:
$y_{it}^* = \alpha^* + x_{it}^{*\prime}\beta^* + \varepsilon_{it}^*$.
- $y_{it}^* = y_{it} - \hat{\theta}\bar{y}_{i\bullet}$
- $x_{it}^* = x_{it} - \hat{\theta}\bar{x}_{i\bullet}$ for all $X_k$
- $\alpha^* = 1 - \hat{\theta}$

The estimation of the two-way random effect model is skipped here.
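The variance components and $\hat{\theta}$ of the one-way model can be computed from the two sums of squares as sketched below. This Python fragment is illustrative only: the function name is hypothetical, the within SSE is the LSDV1 value reported for the airline data in Section 4, and the between SSE of 0.03 is a made-up placeholder, not an estimate from any data set.

```python
import math

def fgls_theta(sse_within, sse_between, n, T, k):
    """Variance components and theta-hat for a balanced one-way panel."""
    K = k + 1                                    # regressors plus intercept
    sigma_v2 = sse_within / (n * T - n - k)      # from the within (LSDV) model
    sigma_between2 = sse_between / (n - K)       # from the group mean regression
    sigma_u2 = sigma_between2 - sigma_v2 / T
    theta = 1 - math.sqrt(sigma_v2 / (T * sigma_u2 + sigma_v2))
    return sigma_v2, sigma_u2, theta

sigma_v2, sigma_u2, theta = fgls_theta(
    sse_within=0.292622872,  # LSDV1 SSE of the airline data (Section 4)
    sse_between=0.03,        # HYPOTHETICAL value, for illustration only
    n=6, T=15, k=3)

print(sigma_v2, sigma_u2, theta)
```

With $\hat{\theta}$ in hand, each variable is transformed by subtracting $\hat{\theta}$ times its group mean before running OLS. A value of $\hat{\theta}$ near 1 makes the estimates close to the within effect model, and a value near 0 makes them close to pooled OLS.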

3.3.3 Testing Random Effects (LM test)

The null hypothesis is that cross-sectional variance components are zero,
$H_0: \sigma_u^2 = 0$. Breusch and Pagan (1980) developed the Lagrange multiplier (LM) test
(Greene 2003). In the following formula, $\bar{e}$ is the n × 1 vector of the group specific
means of pooled regression residuals and $e'e$ is the SSE of the pooled OLS regression. The LM
statistic follows the chi-squared distribution with one degree of freedom.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{e'DD'e}{e'e} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1).$$

Baltagi (2001) presents the same LM test in a different way.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(\sum_t e_{it}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(T\bar{e}_{i\bullet}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 \sim \chi^2(1).$$

The two-way random effect model has the null hypothesis of $H_0: \sigma_{u1}^2 = 0$ and
$\sigma_{u2}^2 = 0$. The LM test combines two one-way random effect models for group and time,
$LM_{u12} = LM_{u1} + LM_{u2} \sim \chi^2(2)$.
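The one-way LM statistic is easy to compute once the pooled OLS residuals are arranged by group. The Python sketch below follows the summation form of the formula; the function name and the 3-by-3 residual matrix are hypothetical and serve only to show the mechanics.

```python
def lm_random_effect(resid):
    """Breusch-Pagan LM statistic from pooled OLS residuals.

    resid[i][t] is the residual of group i at time t (balanced panel).
    """
    n, T = len(resid), len(resid[0])
    sum_sq = sum(e * e for row in resid for e in row)      # e'e
    group_sum_sq = sum(sum(row) ** 2 for row in resid)     # sum_i (sum_t e_it)^2
    return n * T / (2 * (T - 1)) * (group_sum_sq / sum_sq - 1) ** 2

# Hypothetical residuals (they sum to zero, as pooled OLS residuals do).
resid = [[1.0, 0.0, -1.0],
         [2.0, 1.0, 0.0],
         [-3.0, 0.0, 0.0]]
lm = lm_random_effect(resid)

# Equivalent form based on group means: sum_i (T * ebar_i)^2.
T = len(resid[0])
means = [sum(row) / T for row in resid]
lm_from_means = (len(resid) * T / (2 * (T - 1))
                 * (sum((T * m) ** 2 for m in means)
                    / sum(e * e for row in resid for e in row) - 1) ** 2)

print(lm, lm_from_means)
```

The statistic is compared with the chi-squared distribution with one degree of freedom, whose 5 percent critical value is 3.841. For the two-way test, the group and time statistics are added and compared with the chi-squared distribution with two degrees of freedom.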

3.4 Hausman Test: Fixed Effects versus Random Effects

The Hausman specification test compares the fixed versus random effects under the null
hypothesis that the individual effects are uncorrelated with the other regressors in the model
(Hausman 1978). If they are correlated ($H_0$ is rejected), a random effect model produces
biased estimators, violating one of the Gauss-Markov assumptions; so a fixed effect model is
preferred. Hausman's essential result is that the covariance of an efficient estimator with its
difference from an inefficient estimator is zero (Greene 2003).

$$m = (b_{Robust} - b_{Efficient})'\hat{\Sigma}^{-1}(b_{Robust} - b_{Efficient}) \sim \chi^2(k),$$

where $\hat{\Sigma} = Var[b_{Robust} - b_{Efficient}] = Var(b_{Robust}) - Var(b_{Efficient})$ is
the difference in the estimated covariance matrices of the parameter estimates between the LSDV
model (robust) and the random effects model (efficient). Note that the intercept and dummy
variables SHOULD be excluded from the computation.
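For a small number of slopes the Hausman statistic can be computed by hand. The following Python sketch handles the k=2 case with an explicit 2-by-2 inverse; the function name, the coefficient vectors, and the covariance matrices are all hypothetical values chosen for illustration, not estimates from this document.

```python
def hausman_2x2(b_robust, b_efficient, var_robust, var_efficient):
    """Hausman statistic for two slope coefficients (k = 2)."""
    d = [r - e for r, e in zip(b_robust, b_efficient)]
    # Sigma-hat = Var(b_robust) - Var(b_efficient), element by element.
    s = [[var_robust[i][j] - var_efficient[i][j] for j in range(2)]
         for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Quadratic form d' * inv(Sigma-hat) * d.
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# HYPOTHETICAL slope estimates and covariance matrices (no intercept, no dummies).
m = hausman_2x2(b_robust=[1.0, 2.0],
                b_efficient=[0.9, 2.1],
                var_robust=[[0.05, 0.01], [0.01, 0.04]],
                var_efficient=[[0.03, 0.01], [0.01, 0.02]])

print(m)
```

With two slopes the statistic is compared with the chi-squared distribution with two degrees of freedom (5 percent critical value 5.991). In practice the difference of the covariance matrices may fail to be positive definite in finite samples, in which case the statistic is not reliable.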

3.5 Poolability Test

What is poolability? The poolability test asks whether or not slopes are the same across groups
or over time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$.
Remember that slopes remain constant in fixed and random effect models; only intercepts and
error variances matter.

The poolability test is undertaken under the assumption of $\mu \sim N(0, s^2 I_{NT})$. This
test uses the F statistic,

$$F_{obs} = \frac{(e'e - \sum e_i'e_i)/[(n-1)K]}{\sum e_i'e_i/[n(T-K)]} \sim F[(n-1)K,\, n(T-K)],$$

where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for
group i. If the null hypothesis is rejected, the panel data are not poolable. Under this
circumstance, you may go to the random coefficient model or hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is
$H_0: \beta_{tk} = \beta_k$. The F-test is

$$F_{obs} = \frac{(e'e - \sum e_t'e_t)/[(T-1)K]}{\sum e_t'e_t/[T(n-K)]} \sim F[(T-1)K,\, T(n-K)],$$

where $e_t'e_t$ is the SSE of the OLS regression at time t.
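As a quick check on the degrees of freedom, assume the balanced airline panel used in Section 4, with n = 6 airlines, T = 15 periods, and k = 3 regressors, so that K = 4. The worked values below only illustrate the two formulas above.

```latex
\begin{aligned}
\text{Across groups:}\quad & (n-1)K = (6-1)\cdot 4 = 20, \qquad n(T-K) = 6\,(15-4) = 66
  \;\Rightarrow\; F_{obs} \sim F[20,\ 66] \\
\text{Over time:}\quad & (T-1)K = (15-1)\cdot 4 = 56, \qquad T(n-K) = 15\,(6-4) = 30
  \;\Rightarrow\; F_{obs} \sim F[56,\ 30]
\end{aligned}
```

Note that the test over time relies on period-specific regressions with only n = 6 observations and K = 4 parameters each, so it leaves just two error degrees of freedom per period.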
4. One-way Fixed Effect Models: Group Effects

A one-way fixed group model examines group differences in intercepts. The LSDV for this
fixed model needs to create as many dummy variables as the number of entities or subjects.
When many dummies are needed, the within effect model is useful since it transforms variables
using group means to avoid dummies. The between effect model uses group means of variables.

The sample panel data set includes cost and its related data of six U.S. airlines measured at 15
different time points. The following .use command reads a data set airline.dta
and .describe displays basic information of key variables.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear

. describe airline year cost output fuel load

storage display value
variable name type format label variable label
-----------------------------------------------------------------------------------------------
airline int %8.0g Airline name
year int %8.0g Year
cost float %9.0g Total cost in $1,000
output float %9.0g Output in revenue passenger miles, index number
fuel float %9.0g Fuel price
load float %9.0g Load factor

You need to declare the cross-sectional (airline) and time-series (year) variables using
the .tsset command.

. tsset airline year
panel variable: airline (strongly balanced)
time variable: year, 1 to 15
delta: 1 unit

Let us take a look at descriptive statistics of key variables using .xtsum.

. xtsum cost output fuel load

Variable | Mean Std. Dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
cost overall | 13.36561 1.131971 11.14154 15.3733 | N = 90
between | .9978636 12.27441 14.67563 | n = 6
within | .6650252 12.11545 14.91617 | T = 15
| |
output overall | -1.174309 1.150606 -3.278573 .6608616 | N = 90
between | 1.166556 -2.49898 .3192696 | n = 6
within | .4208405 -1.987984 .1339861 | T = 15
| |
fuel overall | 12.77036 .8123749 11.55017 13.831 | N = 90
between | .0237151 12.7318 12.7921 | n = 6
within | .8120832 11.56883 13.8513 | T = 15
| |
load overall | .5604602 .0527934 .432066 .676287 | N = 90
between | .0281511 .5197756 .5971917 | n = 6
within | .0460361 .4368492 .6581019 | T = 15

4.1 The Pooled OLS Regression Model

First, fit the pooled regression model without any dummy variable.

. regress cost output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 86) = 2419.34
Model | 112.705452 3 37.5684839 Prob > F = 0.0000
Residual | 1.33544153 86 .01552839 R-squared = 0.9883
-------------+------------------------------ Adj R-squared = 0.9879
Total | 114.040893 89 1.28135835 Root MSE = .12461

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8827385 .0132545 66.60 0.000 .8563895 .9090876
fuel | .453977 .0203042 22.36 0.000 .4136136 .4943404
load | -1.62751 .345302 -4.71 0.000 -2.313948 -.9410727
_cons | 9.516923 .2292445 41.51 0.000 9.0612 9.972645
------------------------------------------------------------------------------

The regression equation is cost = 9.5169 + .8827*output + .4540*fuel - 1.6275*load. This model
fits the data well (F=2419.34, p<.0001 and $R^2$=.9883). We may, however, suspect that there is
a fixed group effect producing different intercepts across groups. Each airline may have a
significantly different level of cost, its Y-intercept, when all regressors are set to zero.
This difference is modeled as a fixed group effect.

As discussed in Chapter 2, there are three equivalent approaches of LSDV. They report
identical parameter estimates of regressors except for dummy coefficients. Let us begin with
LSDV1.

4.2 LSDV1 without a Dummy

LSDV1 drops a dummy variable to get the model identified. LSDV1 produces correct ANOVA
information, goodness of fit, parameter estimates, and standard errors. As a consequence, this
approach is commonly used in practice. LSDV produces six regression equations for six
airlines. How can we draw these equations using LSDV1?

Airline 1: cost = 9.7059 + .9193*output +.4175*fuel -1.0704*load
Airline 2: cost = 9.6647 + .9193*output +.4175*fuel -1.0704*load
Airline 3: cost = 9.4970 + .9193*output +.4175*fuel -1.0704*load
Airline 4: cost = 9.8905 + .9193*output +.4175*fuel -1.0704*load
Airline 5: cost = 9.7300 + .9193*output +.4175*fuel -1.0704*load
Airline 6: cost = 9.7930 + .9193*output +.4175*fuel -1.0704*load

In SAS, PROC REG fits the OLS regression model. Let us drop the last dummy g6 and use it
as the reference group. Of course, you may drop another dummy variable to get the equivalent
result. LSDV1 fits the data better than does the pooled OLS. SSE decreases from 1.3354
to .2926, and $R^2$ increases from .9883 to .9974. Due to the dummies included, this model
loses five degrees of freedom (from 86 to 81).

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 8 113.74827 14.21853 3935.79 <.0001
Error 81 0.29262 0.00361
Corrected Total 89 114.04089


Root MSE 0.06011 R-Square 0.9974
Dependent Mean 13.36561 Adj R-Sq 0.9972
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.79300 0.26366 37.14 <.0001
g1 1 -0.08706 0.08420 -1.03 0.3042
g2 1 -0.12830 0.07573 -1.69 0.0941
g3 1 -0.29598 0.05002 -5.92 <.0001
g4 1 0.09749 0.03301 2.95 0.0041
g5 1 -0.06301 0.02389 -2.64 0.0100
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001

The parameter estimate of g6 is presented in the intercept (9.7930). Other dummy parameter
estimates are computed using the reference point. The actual intercept of airline 1, for example,
is computed as 9.7059 = 9.7930 + (-.0871)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 +
(-.0630)*0 or simply 9.7930 + (-.0871), where 9.7930 is the reference point, the intercept of
this model. The coefficient -.0871 says that the Y-intercept of airline 1 (9.7059) is .0871
smaller than that of airline 6 (reference point).

Stata has the .regress command for OLS regression (LSDV). The output is identical to that of
PROC REG.

. regress cost g1-g5 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | -.0870617 .0841995 -1.03 0.304 -.2545924 .080469
g2 | -.1282976 .0757281 -1.69 0.094 -.2789728 .0223776
g3 | -.2959828 .0500231 -5.92 0.000 -.395513 -.1964526
g4 | .097494 .0330093 2.95 0.004 .0318159 .1631721
g5 | -.063007 .0238919 -2.64 0.010 -.1105443 -.0154697
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.793004 .2636622 37.14 0.000 9.268399 10.31761
------------------------------------------------------------------------------

In LIMDEP, run the Regress$ command to fit the LSDV1. Do not forget to include ONE for
the intercept in the Rhs subcommand.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:51:23PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 9.79302127 .26366104 37.142 .0000
G1 | -.08707202 .08419916 -1.034 .3042 .16666667
G2 | -.12830600 .07572778 -1.694 .0940 .16666667
G3 | -.29598860 .05002285 -5.917 .0000 .16666667
G4 | .09749253 .03300915 2.954 .0041 .16666667
G5 | -.06300770 .02389180 -2.637 .0100 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

What if we drop a different dummy variable, say g1, instead of g6? Since a different
reference point is applied, you will get different dummy coefficients. As shown above,
the intercept 9.7059 in this model is the actual parameter estimate (Y-intercept) of g1, which
was excluded from the model. The Y-intercept of airline 2 is computed as 9.6647 = 9.7059 -
.0412; that is, it is .0412 smaller than the reference point of 9.7059. Actual Y-intercepts of
the other airlines are computed in this manner. The other statistics such as parameter
estimates of regressors and goodness-of-fit measures remain unchanged. That is, the choice of
a dummy variable to be dropped does not change the model.

. regress cost g2-g6 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2 | -.0412359 .0251839 -1.64 0.105 -.0913441 .0088722
g3 | -.2089211 .0427986 -4.88 0.000 -.2940769 -.1237652
g4 | .1845557 .0607527 3.04 0.003 .0636769 .3054345
g5 | .0240547 .0799041 0.30 0.764 -.1349293 .1830387
g6 | .0870617 .0841995 1.03 0.304 -.080469 .2545924
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.705942 .193124 50.26 0.000 9.321686 10.0902
------------------------------------------------------------------------------
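The two LSDV1 outputs (g6 dropped versus g1 dropped) describe the same fit, and one can be derived from the other by arithmetic. The Python sketch below starts from the Stata estimates with g6 as the reference group and re-expresses them with g1 as the reference group; the variable names are illustrative.

```python
# LSDV1 estimates with g6 dropped (reference group = airline 6).
cons_g6 = 9.793004
dummies_g6 = {1: -0.0870617, 2: -0.1282976, 3: -0.2959828,
              4: 0.097494, 5: -0.063007, 6: 0.0}

# Actual Y-intercept of each airline = reference intercept + its dummy.
intercepts = {a: cons_g6 + d for a, d in dummies_g6.items()}

# Same fit with g1 dropped: the new intercept is airline 1's Y-intercept and
# every dummy is shifted by the old coefficient of the new reference group.
cons_g1 = intercepts[1]
dummies_g1 = {a: dummies_g6[a] - dummies_g6[1] for a in dummies_g6 if a != 1}

print(round(cons_g1, 6))
for a, d in dummies_g1.items():
    print(f"g{a}", round(d, 7))
```

The shifted values agree, up to rounding, with the _cons and g2 through g6 coefficients in the output above, which is why the choice of the dropped dummy does not change the model.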

When you have not created dummy variables, take advantage of the .xi prefix command
(interaction expansion) to obtain the identical result. The Stata .xi, like .bysort, is used
either as an ordinary command or as a prefix command. .xi creates dummies from a categorical
variable specified in the term i. and then runs the command following the colon. Stata by
default drops the first dummy variable, while PROC TSCSREG and PROC PANEL in Section
4.5.2 drop the last dummy.

. xi: regress cost i.airline output fuel load

i.airline _Iairline_1-6 (naturally coded; _Iairline_1 omitted)

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 8, 81) = 3935.79
Model | 113.74827 8 14.2185338 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 0.9974
-------------+------------------------------ Adj R-squared = 0.9972
Total | 114.040893 89 1.28135835 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iairline_2 | -.0412359 .0251839 -1.64 0.105 -.0913441 .0088722
_Iairline_3 | -.2089211 .0427986 -4.88 0.000 -.2940769 -.1237652
_Iairline_4 | .1845557 .0607527 3.04 0.003 .0636769 .3054345
_Iairline_5 | .0240547 .0799041 0.30 0.764 -.1349293 .1830387
_Iairline_6 | .0870617 .0841995 1.03 0.304 -.080469 .2545924
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.705942 .193124 50.26 0.000 9.321686 10.0902
------------------------------------------------------------------------------

4.3 LSDV2 without the Intercept

LSDV2 reports actual parameter estimates of the dummies. You do not need to compute actual
Y-intercepts any more. Because LSDV2 suppresses the intercept, you will get incorrect F and
$R^2$ statistics. However, the SSE of LSDV2 is correct.

In PROC REG, you need to use the /NOINT option to suppress the intercept. Obviously, the F
value of 497,985 and $R^2$ of 1 are not plausible. However, SSE, parameter estimates of
regressors, and their standard errors are correct. Check that the intercepts presented at the
beginning of Section 4.2 are what we get here using LSDV2.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load /NOINT;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 9 16191 1799.03381 497985 <.0001
Error 81 0.29262 0.00361
Uncorrected Total 90 16192


Root MSE 0.06011 R-Square 1.0000
Dependent Mean 13.36561 Adj R-Sq 1.0000
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

g1 1 9.70594 0.19312 50.26 <.0001
g2 1 9.66471 0.19898 48.57 <.0001
g3 1 9.49702 0.22496 42.22 <.0001
g4 1 9.89050 0.24176 40.91 <.0001
g5 1 9.73000 0.26094 37.29 <.0001
g6 1 9.79300 0.26366 37.14 <.0001
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001

Stata uses the noconstant option to suppress the intercept. Notice that noc is its abbreviation.

. regress cost g1-g6 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 9, 81) = .
Model | 16191.3043 9 1799.03381 Prob > F = 0.0000
Residual | .292622872 81 .003612628 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | 9.705942 .193124 50.26 0.000 9.321686 10.0902
g2 | 9.664706 .198982 48.57 0.000 9.268794 10.06062
g3 | 9.497021 .2249584 42.22 0.000 9.049424 9.944618
g4 | 9.890498 .2417635 40.91 0.000 9.409464 10.37153
g5 | 9.729997 .2609421 37.29 0.000 9.210804 10.24919
g6 | 9.793004 .2636622 37.14 0.000 9.268399 10.31761
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
------------------------------------------------------------------------------

In LIMDEP, you need to drop ONE out of the Rhs subcommand to suppress the intercept.
Unlike SAS and Stata, LIMDEP reports correct R² (.9974) and F (3,936) even in LSDV2.

REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:53:24PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 9.70594925 .19312325 50.258 .0000 .16666667
G2 | 9.66471527 .19898117 48.571 .0000 .16666667
G3 | 9.49703267 .22495746 42.217 .0000 .16666667
G4 | 9.89051381 .24176245 40.910 .0000 .16666667
G5 | 9.73001357 .26094094 37.288 .0000 .16666667
G6 | 9.79302127 .26366104 37.142 .0000 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

4.4 LSDV3 with Restrictions

LSDV3 imposes a restriction that the sum of the dummy parameters is zero. PROC REG has
the RESTRICT statement to impose restrictions. LSDV3 reports the correct ANOVA table and
parameter estimates of regressors, but its dummy coefficients differ from those of LSDV1 and
LSDV2 because a different baseline (the group average) is used.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 8 113.74827 14.21853 3935.79 <.0001
Error 81 0.29262 0.00361
Corrected Total 89 114.04089


Root MSE 0.06011 R-Square 0.9974
Dependent Mean 13.36561 Adj R-Sq 0.9972
Coeff Var 0.44970


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.71353 0.22964 42.30 <.0001
g1 1 -0.00759 0.04562 -0.17 0.8683
g2 1 -0.04882 0.03798 -1.29 0.2023
g3 1 -0.21651 0.01606 -13.48 <.0001
g4 1 0.17697 0.01942 9.11 <.0001
g5 1 0.01647 0.03669 0.45 0.6547
g6 1 0.07948 0.04050 1.96 0.0532
output 1 0.91928 0.02989 30.76 <.0001
fuel 1 0.41749 0.01520 27.47 <.0001
load 1 -1.07040 0.20169 -5.31 <.0001
RESTRICT -1 3.01674E-15 7.82306E-11 0.00 1.0000*

* Probability computed using beta distribution.

A dummy coefficient is the deviation from the averaged group effect (9.714). The actual
intercept of airline 2, for example, is 9.6647 = 9.7135 + (-.0488). Notice that the 3.01674E-15 of
RESTRICT is virtually zero.
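
The relationship between the LSDV3 and LSDV2 estimates can be checked by hand. The following Python sketch is only an arithmetic illustration using the SAS estimates printed above (the dictionary names are made up for this example): adding each LSDV3 dummy coefficient to the LSDV3 intercept should recover the corresponding LSDV2 dummy coefficient up to rounding.

```python
# LSDV3 estimates from the PROC REG output with the RESTRICT statement.
lsdv3_intercept = 9.71353
lsdv3_dummies = {"g1": -0.00759, "g2": -0.04882, "g3": -0.21651,
                 "g4": 0.17697, "g5": 0.01647, "g6": 0.07948}

# LSDV2 estimates (actual group intercepts) from the /NOINT output.
lsdv2_dummies = {"g1": 9.70594, "g2": 9.66471, "g3": 9.49702,
                 "g4": 9.89050, "g5": 9.73000, "g6": 9.79300}

# Actual intercept of each airline = averaged group effect + deviation.
actual = {g: lsdv3_intercept + d for g, d in lsdv3_dummies.items()}

# The restriction says the deviations sum to zero.
restriction = sum(lsdv3_dummies.values())

# Equivalently, the LSDV3 intercept is the average of the LSDV2 intercepts.
average_intercept = sum(lsdv2_dummies.values()) / len(lsdv2_dummies)

for g in sorted(actual):
    print(g, round(actual[g], 4), lsdv2_dummies[g])
print("sum of deviations:", round(restriction, 5))
print("average intercept:", round(average_intercept, 4))
```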

In Stata, you have to use the .cnsreg command instead of .regress. The command, however,
does not provide an ANOVA table or goodness-of-fit statistics other than F and SEE
(standard error of the residual, i.e., the square root of MSE).

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression Number of obs = 90
F( 8, 81) = 3935.79
Prob > F = 0.0000
Root MSE = 0.0601

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | -.0075859 .0456178 -0.17 0.868 -.0983509 .0831792
g2 | -.0488218 .0379787 -1.29 0.202 -.1243875 .0267439
g3 | -.2165069 .0160624 -13.48 0.000 -.2484661 -.1845478
g4 | .1769698 .0194247 9.11 0.000 .1383208 .2156189
g5 | .0164689 .0366904 0.45 0.655 -.0565335 .0894712
g6 | .0794759 .0405008 1.96 0.053 -.001108 .1600597
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
------------------------------------------------------------------------------

LIMDEP has the Cls subcommand to impose restrictions. Again, do not forget to include ONE
in Rhs. b(2) in Cls: indicates the parameter of the second variable, g1, listed in Rhs.

REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 31, 2009 at 06:39:21PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Autocorrel Durbin-Watson Stat. = 1.0264504 |
| Rho = cor[e,e(-1)] = .4867748 |
| Restrictns. F[ 1, 80] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 9.71354097 .22964002 42.299 .0000
G1 | -.00759172 .04561756 -.166 .8682 .16666667
G2 | -.04882570 .03797853 -1.286 .2023 .16666667
G3 | -.21650830 .01606233 -13.479 .0000 .16666667
G4 | .17697283 .01942459 9.111 .0000 .16666667
G5 | .01647259 .03669023 .449 .6547 .16666667
G6 | .07948030 .04050059 1.962 .0532 .16666667
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

In the output shown above, the LSDV3 constant and dummy coefficients in LIMDEP agree with
those of SAS and Stata up to rounding, so you may compute actual intercepts of groups in the
same manner. The actual intercept of airline 5, for example, is 9.7300 = 9.7135 + .0165.

4.5 Within Group Effect Model

The within effect model does not use dummy variables and thus has larger degrees of freedom,
smaller MSE, and smaller standard errors of parameters than those of LSDV. As a consequence,
you need to adjust standard errors. This model does not report individual dummy coefficients
either; you need to compute them if really needed. The SAS TSCSREG and PANEL
procedures and LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (square
root of MSE), R², and standard errors.

4.5.1 Estimating the Within Effect Model

First, let us manually estimate the within group effect model with Stata. You need to compute
group means.

. quietly egen gm_cost=mean(cost), by(airline)
. quietly egen gm_output=mean(output), by(airline)
. quietly egen gm_fuel=mean(fuel), by(airline)
. quietly egen gm_load=mean(load), by(airline)

You will get the following group means of variables.

+------------------------------------------------------+
| airline gm_cost gm_output gm_fuel gm_load |
|------------------------------------------------------|
| 1 14.67563 .3192696 12.7318 .5971917 |
| 2 14.37247 -.033027 12.75171 .5470946 |
| 3 13.37231 -.9122626 12.78972 .5845358 |
| 4 13.1358 -1.635174 12.77803 .5476773 |
| 5 12.36304 -2.285681 12.7921 .5664859 |
| 6 12.27441 -2.49898 12.7788 .5197756 |
+------------------------------------------------------+

Then transform dependent and independent variables to compute deviations from group means.

. quietly gen gw_cost = cost - gm_cost
. quietly gen gw_output = output - gm_output
. quietly gen gw_fuel = fuel - gm_fuel
. quietly gen gw_load = load - gm_load

Now, we are ready to run the within effect model. Keep in mind that you have to suppress the
intercept. The within effect model reports correct SSE and parameter estimates of regressors
but incorrect R² and standard errors of parameter estimates. Notice that the degrees of freedom
increase from 81 (LSDV) to 87 since six dummy variables are not used.

. regress gw_cost gw_output gw_fuel gw_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 3871.82
Model | 39.0683861 3 13.0227954 Prob > F = 0.0000
Residual | .292622861 87 .003363481 R-squared = 0.9926
-------------+------------------------------ Adj R-squared = 0.9923
Total | 39.361009 90 .437344544 Root MSE = .058

------------------------------------------------------------------------------
gw_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gw_output | .9192846 .028841 31.87 0.000 .86196 .9766092
gw_fuel | .4174918 .0146657 28.47 0.000 .3883422 .4466414
gw_load | -1.070396 .1946109 -5.50 0.000 -1.457206 -.6835858
------------------------------------------------------------------------------
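
The manual procedure above (group means, deviations from group means, and a regression without an intercept) can be sketched outside Stata as well. The Python example below uses a tiny hypothetical panel with one regressor, not the airline data, and it is generated without noise so that the within estimator should recover the slope and the group intercepts used to build it.

```python
from statistics import mean

# Hypothetical toy panel: y = a_g + 2 * x with no error term.
true_intercepts = {"A": 1.0, "B": 5.0, "C": -3.0}
true_slope = 2.0
xs = {"A": [1.0, 2.0, 4.0], "B": [0.0, 3.0, 6.0], "C": [2.0, 5.0, 5.5]}
panel = [(g, x, true_intercepts[g] + true_slope * x) for g in xs for x in xs[g]]

# Step 1: group means (what egen ..., by(airline) does above).
gm_x = {g: mean(x for gg, x, _ in panel if gg == g) for g in xs}
gm_y = {g: mean(y for gg, _, y in panel if gg == g) for g in xs}

# Step 2: deviations from group means.
dev = [(x - gm_x[g], y - gm_y[g]) for g, x, y in panel]

# Step 3: OLS without an intercept on the transformed data.
slope = sum(dx * dy for dx, dy in dev) / sum(dx * dx for dx, _ in dev)

# Step 4: group intercepts from group means and the within slope.
intercepts = {g: gm_y[g] - slope * gm_x[g] for g in xs}

print("within slope:", slope)
print("group intercepts:", intercepts)
```

With real data the error term is not zero, so the slope is an estimate and its standard error needs the degrees-of-freedom adjustment discussed in this section.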

You may compute group intercepts using $d_i^* = \bar{y}_{i\bullet} - \beta'\bar{x}_{i\bullet}$. For example, the intercept of airline
5 is computed as 9.730 = 12.3630 – {.9193*(-2.2857) + .4175*12.7921 + (-1.0704)*.5665}. In
order to get the correct standard errors, you need to adjust them using the ratio of degrees of
freedom of the within effect model and LSDV. For example, the standard error of the logged
output is computed as .0299=.0288*sqrt(87/81).
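
Both computations can be reproduced from the Stata output above. The Python sketch below is only an arithmetic check (the dictionary names are illustrative): it computes the intercept of airline 5 from its group means and rescales the within-model standard errors by sqrt(87/81).

```python
from math import sqrt

# Within-model coefficients and standard errors from the regress ..., noc output.
coef = {"output": 0.9192846, "fuel": 0.4174918, "load": -1.070396}
se_within = {"output": 0.028841, "fuel": 0.0146657, "load": 0.1946109}

# Group means of airline 5 from the table of group means.
gm5 = {"cost": 12.36304, "output": -2.285681, "fuel": 12.7921, "load": 0.5664859}

# Group intercept: mean of y minus the fitted part evaluated at the group means.
d5 = gm5["cost"] - sum(coef[v] * gm5[v] for v in coef)

# Degrees of freedom: within model (nT - k) versus LSDV (nT - n - k).
n, T, k = 6, 15, 3
df_within = n * T - k        # 87
df_lsdv = n * T - n - k      # 81
adjust = sqrt(df_within / df_lsdv)

se_adjusted = {v: se_within[v] * adjust for v in se_within}

print("intercept of airline 5:", round(d5, 4))
for v in se_adjusted:
    print(v, round(se_adjusted[v], 5))
```

The adjusted standard errors should agree with those reported by LSDV (.0298901, .0151991, and .20169).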

4.5.2 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL of SAS/ETS allow users to fit the within effect model
conveniently. They, in fact, report LSDV1, but you do not need to create dummy variables or
compute deviations from group means.

PROC SORT DATA=masil.airline;
BY airline year;
RUN;

A data set needs to be sorted in advance by the variables that appear in the ID statement
of PROC TSCSREG and PROC PANEL. These cross-sectional and time-series variables may
be numeric or string in SAS. /FIXONE of the MODEL statement fits a one-way fixed effect
model.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;

The TSCSREG Procedure
Fixed One Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixOne
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.2926 DFE 81
MSE 0.0036 Root MSE 0.0601
R-Square 0.9974


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

5 81 57.73 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 -0.08706 0.0842 -1.03 0.3042 Cross Sectional
Effect 1
CS2 1 -0.1283 0.0757 -1.69 0.0941 Cross Sectional
Effect 2
CS3 1 -0.29598 0.0500 -5.92 <.0001 Cross Sectional
Effect 3
CS4 1 0.097494 0.0330 2.95 0.0041 Cross Sectional
Effect 4
CS5 1 -0.06301 0.0239 -2.64 0.0100 Cross Sectional
Effect 5
Intercept 1 9.793004 0.2637 37.14 <.0001 Intercept
output 1 0.919285 0.0299 30.76 <.0001
fuel 1 0.417492 0.0152 27.47 <.0001
load 1 -1.0704 0.2017 -5.31 <.0001

The following PANEL procedure returns the same output.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;

Both PROC TSCSREG and PROC PANEL report correct (adjusted) MSE, SEE, R², and
standard errors, and conduct the F test for fixed group effect as well. They have strong
advantages over other software packages in this respect.

4.5.3 Using Stata

The Stata .xtreg command fits the within group effect model without creating dummy
variables. .xtreg should follow the .tsset command that specifies cross-sectional and time-
series variables. Both variables should be numeric in Stata; string variables are not allowed
in .tsset.

. quietly tsset airline year

The fe option of .xtreg indicates the within effect model and i(airline) specifies airline
as the independent unit. Once .tsset is executed, i(airline) is redundant. This command
reports an incorrect F of 3,604 and R² of .9926.

. xtreg cost output fuel load, fe i(airline)

Fixed-effects (within) regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9926 Obs per group: min = 15
between = 0.9856 avg = 15.0
overall = 0.9873 max = 15

F(3,81) = 3604.80
corr(u_i, Xb) = -0.3475 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
-------------+----------------------------------------------------------------
sigma_u | .1320775
sigma_e | .06010514
rho | .82843653 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5, 81) = 57.73 Prob > F = 0.0000

Like PROC PANEL, .xtreg reports correct standard errors and the F test for a fixed group
effect. But this command does not provide an analysis of variance (ANOVA) table, and its R²
and F statistic are not correct. The last line of the output tests the null hypothesis that the
five dummy parameters in LSDV1 are zero (i.e., $\mu_1=0$, $\mu_2=0$, $\mu_3=0$, $\mu_4=0$, and
$\mu_5=0$). Notice that the intercept of 9.7135 is that of LSDV3.

Alternatively, you may use .areg to get the same result except for R², which is correct. The
intercept 9.7135 is the average of six airlines, the intercept of LSDV3.

. areg cost output fuel load, absorb(airline)

Linear regression, absorbing indicators Number of obs = 90
F( 3, 81) = 3604.80
Prob > F = 0.0000
R-squared = 0.9974
Adj R-squared = 0.9972
Root MSE = .06011

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9192846 .0298901 30.76 0.000 .8598126 .9787565
fuel | .4174918 .0151991 27.47 0.000 .3872503 .4477333
load | -1.070396 .20169 -5.31 0.000 -1.471696 -.6690963
_cons | 9.713528 .229641 42.30 0.000 9.256614 10.17044
-------------+----------------------------------------------------------------
airline | F(5, 81) = 57.732 0.000 (6 categories)

4.5.4 Using LIMDEP

In LIMDEP, the Panel and Fixed subcommands in the Regress$ command fit a fixed effect
panel data model. The Str subcommand specifies a stratification variable.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Fixed$

+----------------------------------------------------+
| OLS Without Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:56:52PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 4 |
| Degrees of freedom = 86 |
| Residuals Sum of squares = 1.335450 |
| Standard error of e = .1246133 |
| Fit R-squared = .9882897 |
| Adjusted R-squared = .9878812 |
| Model test F[ 3, 86] (prob) =2419.33 (.0000) |
| Diagnostic Log likelihood = 61.76991 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 3] (prob) = 400.26 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.121594 |
| Akaike Info. Criter. = -4.121653 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel Data Analysis of COST [ONE way] |
| Unconditional ANOVA (No regressors) |
| Source Variation Deg. Free. Mean Square |
| Between 74.6799 5. 14.9360 |
| Residual 39.3611 84. .468584 |
| Total 114.041 89. 1.28136 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88273863 .01325455 66.599 .0000 -1.17430918
FUEL | .45397771 .02030424 22.359 .0000 12.7703592
LOAD | -1.62750780 .34530293 -4.713 .0000 .56046016
Constant| 9.51691223 .22924522 41.514 .0000

+----------------------------------------------------+
| Least Squares with Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 03:56:52PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 9 |
| Degrees of freedom = 81 |
| Residuals Sum of squares = .2926208 |
| Standard error of e = .6010493E-01 |
| Fit R-squared = .9974341 |
| Adjusted R-squared = .9971806 |
| Model test F[ 8, 81] (prob) =3935.82 (.0000) |
| Diagnostic Log likelihood = 130.0865 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 8] (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.528017 |
| Akaike Info. Criter. = -5.528687 |
| Estd. Autocorrelation of e(i,t) .573531 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .91928814 .02988997 30.756 .0000 -1.17430918
FUEL | .41749105 .01519907 27.468 .0000 12.7703592
LOAD | -1.07039502 .20168924 -5.307 .0000 .56046016

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -90.48804 .3936109461D+02 .6548513 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 130.08647 .2926207777D+00 .9974341 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 95.740 5 .00000 31.875 5 84 .00000 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 536.889 8 .00000 3935.818 8 81 .00000 |
|(4) vs (2) 441.149 3 .00000 3604.832 3 81 .00000 |
|(4) vs (3) 136.633 5 .00000 57.733 5 81 .00000 |
+--------------------------------------------------------------------+

LIMDEP reports both the pooled OLS regression under the label OLS Without Group Dummy
Variables and the within effect model under Least Squares with Group Dummy
Variables. Like the SAS TSCSREG procedure, LIMDEP provides correct MSE, SEE, R², and
standard errors of the fixed effect model. LIMDEP also conducts the F test for checking a fixed
group effect (see the last line of the LIMDEP output above to get 57.733).

4.6 Between Group Effect Model: Group Mean Regression

A between effect model uses aggregate information, that is, group means of variables. In other
words, the unit of analysis is not an individual observation but an entity or subject. The number
of observations drops from nT to n. This group mean regression produces goodness-of-fit
measures and parameter estimates that differ from those of LSDV and the within effect model.

Let us compute group means and run OLS with them. The .collapse command computes
aggregate information and stores it in a new data set. This model fits the data relatively well,
but its t-tests report insignificant parameters except for output. Note that /// links two
command lines.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

Source | SS df MS Number of obs = 6
-------------+------------------------------ F( 3, 2) = 104.12
Model | 4.94698124 3 1.64899375 Prob > F = 0.0095
Residual | .031675926 2 .015837963 R-squared = 0.9936
-------------+------------------------------ Adj R-squared = 0.9841
Total | 4.97865717 5 .995731433 Root MSE = .12585

------------------------------------------------------------------------------
gm_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gm_output | .7824568 .1087646 7.19 0.019 .3144803 1.250433
gm_fuel | -5.523904 4.478718 -1.23 0.343 -24.79427 13.74647
gm_load | -1.751072 2.743167 -0.64 0.589 -13.55397 10.05182
_cons | 85.8081 56.48199 1.52 0.268 -157.2143 328.8305
------------------------------------------------------------------------------
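
The collapse-then-regress logic can be illustrated with a small hypothetical panel (not the airline data). In the Python sketch below, the data are constructed so that the relationship across group means has a slope of 2 while the relationship within each group has a slope of -1, which shows why between and within estimates need not agree.

```python
from statistics import mean

# Hypothetical toy panel: three groups, three time periods, one regressor.
panel = {
    "A": [(1.0, 6.0), (2.0, 5.0), (3.0, 4.0)],
    "B": [(3.0, 10.0), (4.0, 9.0), (5.0, 8.0)],
    "C": [(5.0, 14.0), (6.0, 13.0), (7.0, 12.0)],
}

# Collapse: one observation (group mean of x, group mean of y) per group.
means = {g: (mean(x for x, _ in obs), mean(y for _, y in obs))
         for g, obs in panel.items()}

# Between estimator: OLS with an intercept on the n group means.
xbar = mean(mx for mx, _ in means.values())
ybar = mean(my for _, my in means.values())
sxy = sum((mx - xbar) * (my - ybar) for mx, my in means.values())
sxx = sum((mx - xbar) ** 2 for mx, _ in means.values())
between_slope = sxy / sxx
between_intercept = ybar - between_slope * xbar

# Within estimator for comparison: OLS on deviations from group means.
dev = [(x - means[g][0], y - means[g][1]) for g, obs in panel.items() for x, y in obs]
within_slope = sum(dx * dy for dx, dy in dev) / sum(dx * dx for dx, _ in dev)

print("observations used by the between model:", len(means))
print("between slope:", between_slope, "intercept:", between_intercept)
print("within slope:", within_slope)
```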

The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between
effect model, but PROC TSCSREG does not. /BTWNG and /BTWNT fit the between group
and time effect models, respectively.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNG;
RUN;
The PANEL Procedure
Between Groups Estimates

Dependent Variable: cost

Model Description

Estimation Method BtwGrps
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.0317 DFE 2
MSE 0.0158 Root MSE 0.1258
R-Square 0.9936


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

Intercept 1 85.80901 56.4830 1.52 0.2681 Intercept
output 1 0.782455 0.1088 7.19 0.0188
fuel 1 -5.52398 4.4788 -1.23 0.3427
load 1 -1.75102 2.7432 -0.64 0.5886

The Stata .xtreg command has the be option to fit the between effect model but does not
report the ANOVA table.

. xtreg cost output fuel load, be i(airline)

Between regression (regression on group means) Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.8808 Obs per group: min = 15
between = 0.9936 avg = 15.0
overall = 0.1371 max = 15

F(3,2) = 104.12
sd(u_i + avg(e_i.))= .1258491 Prob > F = 0.0095

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .7824552 .1087663 7.19 0.019 .3144715 1.250439
fuel | -5.523978 4.478802 -1.23 0.343 -24.79471 13.74675
load | -1.751016 2.74319 -0.64 0.589 -13.55401 10.05198
_cons | 85.80901 56.48302 1.52 0.268 -157.2178 328.8358
------------------------------------------------------------------------------

LIMDEP has the Means subcommand to fit the between effect model.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Means$

+----------------------------------------------------+
| Group Means Regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:04:12PM |
| LHS=YBAR(i.) Mean = 13.36561 |
| Standard deviation = .9978636 |
| WTS=NTi/Nobs Number of observs. = 6 |
| Model size Parameters = 4 |
| Degrees of freedom = 2 |
| Residuals Sum of squares = .3167277E-01 |
| Standard error of e = .1258427 |
| Fit R-squared = .9936383 |
| Adjusted R-squared = .9840957 |
| Model test F[ 3, 2] (prob) = 104.13 (.0095) |
| Diagnostic Log likelihood = 7.218541 |
| Restricted(b=0) = -7.953835 |
| Chi-sq [ 3] (prob) = 30.34 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -3.634619 |
| Akaike Info. Criter. = -3.910724 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .78244727 .10876126 7.194 .0000 .230256D-11
FUEL | -5.52443747 4.47865187 -1.234 .2174 .18642891
LOAD | -1.75094765 2.74304702 -.638 .5233 .32541105
Constant| 85.8148317 56.4811479 1.519 .1287

SAS, Stata, and LIMDEP all report the same result: SSE .0317, SEE .1258, F 104.12 (p=.0095),
and R² .9936.

4.7 Testing Fixed Group Effects (F-test)

How do we know whether there is a significant fixed group effect? The null hypothesis is that
all dummy parameters except for one are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$.

In order to conduct an F-test, let us obtain the SSE (e’e) of 1.3354 from the pooled OLS
regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model.
Alternatively, you may draw R² of .9974 from LSDV1 or LSDV3 and .9883 from the pooled
OLS. Do not, however, use LSDV2 and the within effect model for R².

The F statistic is computed as

$$\frac{(1.3354 - .2926)/(6-1)}{(.2926)/(90-6-3)} = \frac{(.9974 - .9883)/(6-1)}{(1 - .9974)/(90-6-3)} \sim 57.7319\,[5, 81].$$

The large F statistic rejects the null hypothesis in favor of the fixed group effect model
(p<.0001). There is a fixed group effect in these panel data.
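
The F statistic can be reproduced from the sums of squares and R² values printed in the LIMDEP output above. The Python sketch below is only an arithmetic check; the function names are made up for this example. It also shows that the R² version is sensitive to rounding: with R² rounded to four digits, the result is noticeably smaller than 57.73.

```python
def f_from_sse(sse_pooled, sse_lsdv, n_groups, df_lsdv):
    """F test of n_groups - 1 restrictions from restricted/unrestricted SSE."""
    return ((sse_pooled - sse_lsdv) / (n_groups - 1)) / (sse_lsdv / df_lsdv)


def f_from_r2(r2_pooled, r2_lsdv, n_groups, df_lsdv):
    """The same test written in terms of R-squared."""
    return ((r2_lsdv - r2_pooled) / (n_groups - 1)) / ((1 - r2_lsdv) / df_lsdv)


n, T, k = 6, 15, 3
df_lsdv = n * T - n - k   # 90 - 6 - 3 = 81

# Full-precision values from the LIMDEP output.
f_sse = f_from_sse(1.335450, 0.2926208, n, df_lsdv)
f_r2 = f_from_r2(0.9882897, 0.9974341, n, df_lsdv)

# Four-digit R-squared values as quoted in the text.
f_r2_rounded = f_from_r2(0.9883, 0.9974, n, df_lsdv)

print(f"F from SSE:        {f_sse:.3f}")
print(f"F from R2:         {f_r2:.3f}")
print(f"F from rounded R2: {f_r2_rounded:.2f}")
```

For this reason the SSE version, or R² with full precision, is the safer choice when computing the test by hand.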

The SAS TSCSREG and PANEL procedures, Stata .xtreg command, and LIMDEP Regress$
command by default conduct the F test. Alternatively, you may conduct the same test in
LSDV1. In SAS, add the TEST statement in PROC REG and then run the procedure again
(ANOVA table and parameter estimates are skipped).

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;

The REG Procedure
Model: MODEL1

Test 1 Results for Dependent Variable cost

Mean
Source DF Square F Value Pr > F

Numerator 5 0.20856 57.73 <.0001
Denominator 81 0.00361

In Stata, run the .test command, a follow-up command for the Wald test, right after
estimating the model.

. quietly regress cost g1-g5 output fuel load
. test g1 g2 g3 g4 g5

( 1) g1 = 0
( 2) g2 = 0
( 3) g3 = 0
( 4) g4 = 0
( 5) g5 = 0

F( 5, 81) = 57.73
Prob > F = 0.0000

4.8 Summary

Table 4.1 summarizes the estimation of a fixed effect model in SAS, Stata, and LIMDEP. The
SAS PANEL procedure is generally preferred to Stata and LIMDEP counterparts since it
produces correct statistics and conducts various hypothesis tests conveniently.

Table 4.1 Comparison of the Fixed Effect Model in SAS, Stata, LIMDEP*

                  SAS 9                        Stata 11                       LIMDEP 9
OLS estimation    PROC REG;                    .regress, .cnsreg              Regress$
LSDV1             Correct                      Correct                        Correct (slightly different F)
LSDV2             Incorrect F, (adjusted) R²   Incorrect F, (adjusted) R²     Correct (slightly different F)
LSDV3             Correct                      .cnsreg;                       Correct (slightly different F);
                                               no ANOVA table and R²          different dummy coefficients
Panel estimation  PROC TSCSREG;                .xtreg, .areg                  Regress; Panel$
                  PROC PANEL;
Estimation type   LSDV1                        Within effect                  Within effect
SSE (e’e)         Correct                      No                             Correct
MSE or SEE        Correct (adjusted)           No                             Correct (adjusted) SEE
Model test (F)    No                           Incorrect                      Slightly different F
(adjusted) R²     Correct                      Incorrect (correct in .areg)   Correct
Intercept         Correct                      LSDV3 intercept                No
Coefficients      Correct                      Correct                        Correct
Standard errors   Correct (adjusted)           Correct (adjusted)             Correct (adjusted)
Effect test (F)   Yes                          Yes                            Yes
Between effect    /BTWNG, /BTWNT               ,be                            Means;
* “Yes/No” indicates whether the software reports the statistic. “Correct/Incorrect” indicates whether the statistic
agrees with that of the least squares dummy variable (LSDV) 1 model, which drops one dummy variable.

5. One-way Fixed Effect Models: Time Effects

A fixed time effect model investigates how time affects the intercept using time dummy
variables. The logic and method are the same as those of the fixed group effect model.

5.1 Least Squares Dummy Variable Models

The least squares dummy variable (LSDV) model produces the following fifteen regression
equations

Time 01: cost = 20.4959 + .8677*output - .4845*fuel -1.9544*load
Time 02: cost = 20.5782 + .8677*output - .4845*fuel -1.9544*load
Time 03: cost = 20.6559 + .8677*output - .4845*fuel -1.9544*load
Time 04: cost = 20.7409 + .8677*output - .4845*fuel -1.9544*load
Time 05: cost = 21.2000 + .8677*output - .4845*fuel -1.9544*load
Time 06: cost = 21.4118 + .8677*output - .4845*fuel -1.9544*load
Time 07: cost = 21.5035 + .8677*output - .4845*fuel -1.9544*load
Time 08: cost = 21.6542 + .8677*output - .4845*fuel -1.9544*load
Time 09: cost = 21.8297 + .8677*output - .4845*fuel -1.9544*load
Time 10: cost = 22.1140 + .8677*output - .4845*fuel -1.9544*load
Time 11: cost = 22.4655 + .8677*output - .4845*fuel -1.9544*load
Time 12: cost = 22.6515 + .8677*output - .4845*fuel -1.9544*load
Time 13: cost = 22.6167 + .8677*output - .4845*fuel -1.9544*load
Time 14: cost = 22.5524 + .8677*output - .4845*fuel -1.9544*load
Time 15: cost = 22.5369 + .8677*output - .4845*fuel -1.9544*load

5.1.1 LSDV1 without a Dummy

In the SAS REG procedure, include time dummy variables instead of group dummies. You need to
exclude one of the time dummies, say t15 here, in LSDV1.

PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 17 112.95270 6.64428 439.62 <.0001
Error 72 1.08819 0.01511
Corrected Total 89 114.04089


Root MSE 0.12294 R-Square 0.9905
Dependent Mean 13.36561 Adj R-Sq 0.9882
Coeff Var 0.91981


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 22.53677 4.94053 4.56 <.0001
t1 1 -2.04096 0.73469 -2.78 0.0070
t2 1 -1.95873 0.72275 -2.71 0.0084
t3 1 -1.88103 0.72036 -2.61 0.0110
t4 1 -1.79601 0.69882 -2.57 0.0122
t5 1 -1.33693 0.50604 -2.64 0.0101
t6 1 -1.12514 0.40862 -2.75 0.0075
t7 1 -1.03341 0.37642 -2.75 0.0076
t8 1 -0.88274 0.32601 -2.71 0.0085
t9 1 -0.70719 0.29470 -2.40 0.0190
t10 1 -0.42296 0.16679 -2.54 0.0134
t11 1 -0.07144 0.07176 -1.00 0.3228
t12 1 0.11457 0.09841 1.16 0.2482
t13 1 0.07979 0.08442 0.95 0.3477
t14 1 0.01546 0.07264 0.21 0.8320
output 1 0.86773 0.01541 56.32 <.0001
fuel 1 -0.48448 0.36411 -1.33 0.1875
load 1 -1.95440 0.44238 -4.42 <.0001
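
As a quick arithmetic check, each year's intercept in the fifteen equations of Section 5.1 can be recovered from this LSDV1 output by adding the year's dummy coefficient to the baseline intercept (year 15, the excluded dummy). The Python sketch below is illustrative only: the variable names are hypothetical, and the numbers are simply copied from the PROC REG output above.

```python
# Baseline intercept (year 15, the excluded dummy) from the LSDV1 output.
baseline = 22.53677

# Coefficients of t1-t14 copied from the PROC REG output above.
dummy_coefs = [-2.04096, -1.95873, -1.88103, -1.79601, -1.33693,
               -1.12514, -1.03341, -0.88274, -0.70719, -0.42296,
               -0.07144, 0.11457, 0.07979, 0.01546]

# Intercept of year k = baseline + coefficient of t_k.
# Year 15 has no dummy, so its intercept is the baseline itself.
intercepts = [baseline + c for c in dummy_coefs] + [baseline]

for year, a in enumerate(intercepts, start=1):
    print(f"Time {year:02d}: intercept = {a:.4f}")
```

The printed values should agree with the fifteen equations up to rounding in the last digit.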

In Stata and LIMDEP, execute the following commands to fit the same LSDV1 (output is skipped).

. regress cost t1-t14 output fuel load

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

5.1.2 LSDV2 without the Intercept

In LIMDEP, take ONE out to fit LSDV2 by suppressing the intercept. Unlike SAS and Stata,
LIMDEP reports correct, although slightly different, F and R² statistics.

REGRESS;Lhs=COST;Rhs=T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:15:08PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Autocorrel Durbin-Watson Stat. = .2363289 |
| Rho = cor[e,e(-1)] = .8818355 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
T1 | 20.4959389 4.20954636 4.869 .0000 .06666667
T2 | 20.5781713 4.22154389 4.875 .0000 .06666667
T3 | 20.6558664 4.22419549 4.890 .0000 .06666667
T4 | 20.7408923 4.24576770 4.885 .0000 .06666667
T5 | 21.1999763 4.44035103 4.774 .0000 .06666667
T6 | 21.4117634 4.53864000 4.718 .0000 .06666667
T7 | 21.5034994 4.57141663 4.704 .0000 .06666667
T8 | 21.6541766 4.62290530 4.684 .0000 .06666667
T9 | 21.8297215 4.65692608 4.688 .0000 .06666667
T10 | 22.1139553 4.79266903 4.614 .0000 .06666667
T11 | 22.4654855 4.94992975 4.539 .0000 .06666667
T12 | 22.6514956 5.00861379 4.523 .0000 .06666667
T13 | 22.6167135 4.98616006 4.536 .0000 .06666667
T14 | 22.5523879 4.95596262 4.551 .0000 .06666667
T15 | 22.5369251 4.94055238 4.562 .0000 .06666667
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1875 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016

In SAS and Stata, use /NOINT and noconstant, respectively, to suppress the intercept and
estimate the same LSDV2 (output is skipped).

PROC REG DATA=masil.airline;
MODEL cost = t1-t15 output fuel load /NOINT;
RUN;

. regress cost t1-t15 output fuel load, noc

5.1.3 LSDV3 with a Restriction

In PROC REG, use the RESTRICT statement to impose the restriction that the time dummy
parameters sum to zero.

PROC REG DATA=masil.airline;
MODEL cost = t1-t15 output fuel load;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 17 112.95270 6.64428 439.62 <.0001
Error 72 1.08819 0.01511
Corrected Total 89 114.04089


Root MSE 0.12294 R-Square 0.9905
Dependent Mean 13.36561 Adj R-Sq 0.9882
Coeff Var 0.91981


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 21.66698 4.62405 4.69 <.0001
t1 1 -1.17118 0.41783 -2.80 0.0065
t2 1 -1.08894 0.40586 -2.68 0.0090
t3 1 -1.01125 0.40323 -2.51 0.0144
t4 1 -0.92622 0.38177 -2.43 0.0178
t5 1 -0.46715 0.19076 -2.45 0.0168
t6 1 -0.25536 0.09856 -2.59 0.0116
t7 1 -0.16363 0.07190 -2.28 0.0258
t8 1 -0.01296 0.04862 -0.27 0.7907
t9 1 0.16259 0.06271 2.59 0.0115
t10 1 0.44682 0.17599 2.54 0.0133
t11 1 0.79834 0.32940 2.42 0.0179
t12 1 0.98435 0.38756 2.54 0.0132
t13 1 0.94957 0.36537 2.60 0.0113
t14 1 0.88524 0.33549 2.64 0.0102
t15 1 0.86978 0.32029 2.72 0.0083
output 1 0.86773 0.01541 56.32 <.0001
fuel 1 -0.48448 0.36411 -1.33 0.1875
load 1 -1.95440 0.44238 -4.42 <.0001
RESTRICT -1 -3.9462E-15 . . .

* Probability computed using beta distribution.

In Stata, define the restriction with the .constraint command and specify the restriction using
the constraint() option of the .cnsreg command.

. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost t1-t15 output fuel load, constraint(3)

Constrained linear regression Number of obs = 90
F( 17, 72) = 439.62
Prob > F = 0.0000
Root MSE = 0.1229

( 1) t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t1 | -1.171179 .4178338 -2.80 0.007 -2.004115 -.3382422
t2 | -1.088945 .4058579 -2.68 0.009 -1.898008 -.2798816
t3 | -1.011252 .4032308 -2.51 0.014 -1.815078 -.2074266
t4 | -.9262249 .3817675 -2.43 0.018 -1.687265 -.1651852
t5 | -.4671515 .1907596 -2.45 0.017 -.8474239 -.0868791
t6 | -.2553627 .0985615 -2.59 0.012 -.4518415 -.0588839
t7 | -.1636326 .0718969 -2.28 0.026 -.3069564 -.0203088
t8 | -.0129552 .0486249 -0.27 0.791 -.1098872 .0839768
t9 | .1625876 .0627099 2.59 0.012 .0375776 .2875976
t10 | .4468191 .175994 2.54 0.013 .0959814 .7976568
t11 | .7983439 .3294027 2.42 0.018 .1416916 1.454996
t12 | .9843536 .3875583 2.54 0.013 .2117702 1.756937
t13 | .9495716 .3653675 2.60 0.011 .2212248 1.677918
t14 | .8852448 .3354912 2.64 0.010 .2164554 1.554034
t15 | .8697821 .3202933 2.72 0.008 .2312891 1.508275
output | .8677268 .0154082 56.32 0.000 .8370111 .8984424
fuel | -.4844835 .3641085 -1.33 0.188 -1.210321 .2413535
load | -1.954404 .4423777 -4.42 0.000 -2.836268 -1.07254
_cons | 21.66698 4.624053 4.69 0.000 12.4491 30.88486
------------------------------------------------------------------------------

In LIMDEP, run the following command to fit the same LSDV3.

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)+b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:16:47PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Autocorrel Durbin-Watson Stat. = .2363289 |
| Rho = cor[e,e(-1)] = .8818355 |
| Restrictns. F[ 1, 71] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
T1 | -1.17119233 .41783540 -2.803 .0065 .06666667
T2 | -1.08895999 .40585988 -2.683 .0091 .06666667
T3 | -1.01126486 .40323211 -2.508 .0144 .06666667
T4 | -.92623900 .38176914 -2.426 .0178 .06666667
T5 | -.46715493 .19075952 -2.449 .0168 .06666667
T6 | -.25536788 .09856234 -2.591 .0116 .06666667
T7 | -.16363186 .07189683 -2.276 .0259 .06666667
T8 | -.01295461 .04862498 -.266 .7907 .06666667
T9 | .16259020 .06271009 2.593 .0116 .06666667
T10 | .44682406 .17599505 2.539 .0133 .06666667
T11 | .79835421 .32940389 2.424 .0179 .06666667
T12 | .98436437 .38755999 2.540 .0133 .06666667
T13 | .94958221 .36536879 2.599 .0114 .06666667
T14 | .88525662 .33549236 2.639 .0102 .06666667
T15 | .86979380 .32029396 2.716 .0083 .06666667
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1876 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016
Constant| 21.6671313 4.62407240 4.686 .0000
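
The link between the LSDV2 and LSDV3 estimates can be verified by hand. Because the LSDV3 dummy parameters are restricted to sum to zero, the LSDV3 intercept is the average of the fifteen year intercepts, and each LSDV3 dummy coefficient is a year's deviation from that average. The following Python sketch is an illustration under that reading; the variable names are hypothetical and the inputs are copied from the LIMDEP LSDV2 output in Section 5.1.2.

```python
# LSDV2 year intercepts (T1-T15) copied from the LIMDEP output in Section 5.1.2.
lsdv2 = [20.4959389, 20.5781713, 20.6558664, 20.7408923, 21.1999763,
         21.4117634, 21.5034994, 21.6541766, 21.8297215, 22.1139553,
         22.4654855, 22.6514956, 22.6167135, 22.5523879, 22.5369251]

# With the sum-to-zero restriction, the LSDV3 intercept is the mean of the year intercepts.
lsdv3_intercept = sum(lsdv2) / len(lsdv2)

# Each LSDV3 dummy coefficient is the deviation of a year's intercept from that mean.
lsdv3_dummies = [a - lsdv3_intercept for a in lsdv2]

print(f"LSDV3 intercept:           {lsdv3_intercept:.5f}")
print(f"LSDV3 coefficient of t1:   {lsdv3_dummies[0]:.5f}")
print(f"Sum of dummy coefficients: {sum(lsdv3_dummies):.1e}")
```

The results should match the Constant and T1 rows of the LSDV3 output above up to rounding, with the dummy coefficients summing to zero within floating-point error.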

5.2 Within Time Effect Model

The within effect model for a fixed time effect needs to compute deviations from time means.
Keep in mind that the intercept should be suppressed.

5.2.1 Estimating the Fixed Time Effect Model

Let us manually estimate the fixed time effect model first.

. quietly egen tm_cost = mean(cost), by(year)
. quietly egen tm_output = mean(output), by(year)
. quietly egen tm_fuel = mean(fuel), by(year)
. quietly egen tm_load = mean(load), by(year)

+---------------------------------------------------+
| year tm_cost tm_output tm_fuel tm_load |
|---------------------------------------------------|
| 1 12.36897 -1.790283 11.63606 .4788587 |
| 2 12.45963 -1.744389 11.66868 .4868322 |
| 3 12.60706 -1.577767 11.67494 .52358 |
| 4 12.77912 -1.443695 11.73193 .5244486 |
| 5 12.94143 -1.398122 12.26843 .5635266 |
| 6 13.0452 -1.393002 12.53826 .5541809 |
| 7 13.15965 -1.302416 12.62714 .5607425 |
| 8 13.29884 -1.222963 12.76768 .5670587 |
| 9 13.4651 -1.067003 12.86104 .6179098 |
| 10 13.70187 -.9023156 13.23183 .6233943 |
| 11 13.91324 -.9205539 13.66246 .5802577 |
| 12 14.05984 -.8641667 13.82315 .5856243 |
| 13 14.12841 -.7923916 13.75979 .5803183 |
| 14 14.23517 -.6428015 13.67403 .5804528 |
| 15 14.32062 -.5527684 13.62997 .5797168 |
+---------------------------------------------------+

Once time means are ready, transform the dependent and independent variables and then run
OLS with the intercept suppressed.

. quietly gen tw_cost = cost - tm_cost
. quietly gen tw_output = output - tm_output
. quietly gen tw_fuel = fuel - tm_fuel
. quietly gen tw_load = load - tm_load

. regress tw_cost tw_output tw_fuel tw_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 2015.95
Model | 75.6459391 3 25.215313 Prob > F = 0.0000
Residual | 1.08819023 87 .012507934 R-squared = 0.9858
-------------+------------------------------ Adj R-squared = 0.9853
Total | 76.7341294 90 .852601437 Root MSE = .11184

------------------------------------------------------------------------------
tw_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tw_output | .8677268 .0140171 61.90 0.000 .8398663 .8955873
tw_fuel | -.4844836 .3312359 -1.46 0.147 -1.142851 .1738836
tw_load | -1.954404 .4024388 -4.86 0.000 -2.754295 -1.154514
------------------------------------------------------------------------------

If you want to get the intercepts of individual years, use $d_t^{*} = \bar{y}_{t} - \bar{x}_{t}'\beta$. For example, the intercept of year 7
is 21.5035=13.1597-{.8677*(-1.3024) + (-.4845)*12.6271 + (-1.9544)*.5607}. As discussed
previously, standard errors of a within effect model need to be adjusted. For instance, the
correct standard error of fuel price is computed as .3641= .3312*sqrt(87/72).

. sum cost output fuel load if year==7

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 6 13.15965 1.071738 11.88492 14.52004
output | 6 -1.302416 1.272691 -2.865108 .2550375
fuel | 6 12.62714 .0747646 12.48162 12.68725
load | 6 .5607425 .029541 .510342 .594495
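
The two hand calculations in this section can be reproduced in a few lines. The Python sketch below is only an illustration with hypothetical variable names. It takes the within-effect slopes and the year-7 means reported above, recovers the year-7 intercept, and rescales the fuel standard error. The rescaling reflects that the within regression uses 90-3=87 residual degrees of freedom, whereas the LSDV model uses 90-15-3=72.

```python
import math

# Within-effect slope estimates for output, fuel, and load (from .regress above).
beta = [0.8677268, -0.4844836, -1.954404]

# Year-7 means of cost and of the three regressors (from the .sum output above).
ybar7 = 13.15965
xbar7 = [-1.302416, 12.62714, 0.5607425]

# Intercept of year 7: mean cost minus the fitted contribution of the mean regressors.
d7 = ybar7 - sum(b * x for b, x in zip(beta, xbar7))

# Degrees-of-freedom correction for the within-effect standard errors.
n_obs, k, t = 90, 3, 15
adjust = math.sqrt((n_obs - k) / (n_obs - t - k))
se_fuel = 0.3312359 * adjust  # unadjusted s.e. of tw_fuel times the correction

print(f"Intercept of year 7:   {d7:.4f}")
print(f"Adjusted s.e. of fuel: {se_fuel:.4f}")
```

Small differences from 21.5035 in the fourth decimal place are due to rounding of the reported means.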

5.2.2 Using SAS: PROC TSCSREG and PROC PANEL

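You need to sort the data set by the variables (i.e., year and airline) that will appear in the ID
statement of PROC TSCSREG and PROC PANEL. Because year is listed first in the ID statement,
SAS treats the fifteen years as cross sections, so the time effects are labeled CS1 through CS14 in
the output. The output is very similar to that of LSDV1 in Section 5.1.1.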

PROC SORT DATA=masil.airline;
BY year airline;
RUN;

PROC TSCSREG DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /FIXONE;
RUN;

(output is skipped)

The F test does not reject the null hypothesis of no fixed time effect (F=1.17, p=.3178); that is,
these panel data provide no evidence of a fixed time effect.

PROC PANEL DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /FIXONE;
RUN;

The PANEL Procedure
Fixed One Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixOne
Number of Cross Sections 15
Time Series Length 6


Fit Statistics

SSE 1.0882 DFE 72
MSE 0.0151 Root MSE 0.1229
R-Square 0.9905


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

14 72 1.17 0.3178


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 -2.04096 0.7347 -2.78 0.0070 Cross Sectional
Effect 1
CS2 1 -1.95873 0.7228 -2.71 0.0084 Cross Sectional
Effect 2
CS3 1 -1.88103 0.7204 -2.61 0.0110 Cross Sectional
Effect 3
CS4 1 -1.79601 0.6988 -2.57 0.0122 Cross Sectional
Effect 4
CS5 1 -1.33693 0.5060 -2.64 0.0101 Cross Sectional
Effect 5
CS6 1 -1.12514 0.4086 -2.75 0.0075 Cross Sectional
Effect 6
CS7 1 -1.03341 0.3764 -2.75 0.0076 Cross Sectional
Effect 7
CS8 1 -0.88274 0.3260 -2.71 0.0085 Cross Sectional
Effect 8
CS9 1 -0.70719 0.2947 -2.40 0.0190 Cross Sectional
Effect 9
CS10 1 -0.42296 0.1668 -2.54 0.0134 Cross Sectional
Effect 10
CS11 1 -0.07144 0.0718 -1.00 0.3228 Cross Sectional
Effect 11
CS12 1 0.114571 0.0984 1.16 0.2482 Cross Sectional
Effect 12
CS13 1 0.079789 0.0844 0.95 0.3477 Cross Sectional
Effect 13
CS14 1 0.015463 0.0726 0.21 0.8320 Cross Sectional
Effect 14
Intercept 1 22.53677 4.9405 4.56 <.0001 Intercept
output 1 0.867727 0.0154 56.32 <.0001
fuel 1 -0.48448 0.3641 -1.33 0.1875
load 1 -1.9544 0.4424 -4.42 <.0001

5.2.3 Using Stata

In the Stata .xtreg command, the fe option fits the fixed effect model. The following .iis
command specifies year as the panel identification variable; once it is set, i(year) is redundant.

. iis year

. xtreg cost output fuel load, fe i(year)

Fixed-effects (within) regression Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9858 Obs per group: min = 6
between = 0.4812 avg = 6.0
overall = 0.5265 max = 6

F(3,72) = 1668.37
corr(u_i, Xb) = -0.1503 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8677268 .0154082 56.32 0.000 .8370111 .8984424
fuel | -.4844835 .3641085 -1.33 0.188 -1.210321 .2413535
load | -1.954404 .4423777 -4.42 0.000 -2.836268 -1.07254
_cons | 21.66698 4.624053 4.69 0.000 12.4491 30.88486
-------------+----------------------------------------------------------------
sigma_u | .8027907
sigma_e | .12293801
rho | .97708602 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(14, 72) = 1.17 Prob > F = 0.3178

Again, the intercept 21.6670 is the intercept of LSDV3 (see 5.1.3).
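
The sigma_e and rho entries of the .xtreg output follow directly from quantities reported elsewhere in this chapter. The Python sketch below is a rough check with hypothetical variable names: sigma_e is taken as the root mean squared error of the fixed effect model (SSE 1.088193 with 72 degrees of freedom), and rho as the share of sigma_u squared in the sum of the two variances.

```python
import math

sse = 1.088193        # residual sum of squares of the fixed time effect model
df_resid = 72         # 90 observations - 15 years - 3 regressors
sigma_u = 0.8027907   # copied from the .xtreg output above

# sigma_e is the root mean squared error of the fixed effect model.
sigma_e = math.sqrt(sse / df_resid)

# rho is the fraction of the total variance due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

print(f"sigma_e = {sigma_e:.6f}")
print(f"rho     = {rho:.6f}")
```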

5.2.4 Using LIMDEP

In LIMDEP, specify a time-series variable for stratification in the Str= subcommand. The
pooled OLS part of the output is skipped. Do not forget to include ONE for the intercept.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:19:57PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 18 |
| Degrees of freedom = 72 |
| Residuals Sum of squares = 1.088193 |
| Standard error of e = .1229382 |
| Fit R-squared = .9904579 |
| Adjusted R-squared = .9882049 |
| Model test F[ 17, 72] (prob) = 439.62 (.0000) |
| Diagnostic Log likelihood = 70.98362 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 17] (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.009826 |
| Akaike Info. Criter. = -4.015291 |
| Estd. Autocorrelation of e(i,t) .881836 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 15 |
| Smallest 6, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .86772681 .01540818 56.316 .0000 -1.17430918
FUEL | -.48449467 .36410984 -1.331 .1868 12.7703592
LOAD | -1.95441438 .44237791 -4.418 .0000 .56046016

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -120.52864 .7673414157D+02 .3271354 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 70.98362 .1088193393D+01 .9904579 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 35.659 14 .00117 2.605 14 75 .00404 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 418.684 17 .00000 439.617 17 72 .00000 |
|(4) vs (2) 383.025 3 .00000 1668.364 3 72 .00000 |
|(4) vs (3) 18.427 14 .18800 1.169 14 72 .31776 |
+--------------------------------------------------------------------+

The F statistic of 1.169 appears in the last line of the output; it does not reject the null hypothesis
of no fixed time effect.
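
The likelihood ratio column of the LIMDEP table can be reproduced from the reported log-likelihoods, since each statistic is twice the difference between the log-likelihoods of the two models being compared. The Python sketch below is illustrative only; the dictionary and the helper function lr_stat are hypothetical names, not LIMDEP features.

```python
# Log-likelihoods from the "Test Statistics for the Classical Model" table above.
loglik = {
    1: -138.35814,  # constant term only
    2: -120.52864,  # group (here: year) effects only
    3: 61.76991,    # X variables only (pooled OLS)
    4: 70.98362,    # X and group effects (fixed time effect model)
}

def lr_stat(unrestricted, restricted):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik[unrestricted] - loglik[restricted])

for u, r in [(2, 1), (3, 1), (4, 1), (4, 2), (4, 3)]:
    print(f"({u}) vs ({r}): chi-squared = {lr_stat(u, r):.3f}")
```

The comparison of models (4) and (3) is the likelihood ratio counterpart of the F test for the fixed time effect.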

5.3 Between Time Effect Model

The between effect model regresses the time means of the dependent variable on those of the
independent variables. See Sections 3.2 and 4.6.

. collapse (mean) tm_cost=cost (mean) tm_output=output (mean) tm_fuel=fuel ///
(mean) tm_load=load, by(year)

. regress tm_cost tm_output tm_fuel tm_load

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 4074.33
Model | 6.21220479 3 2.07073493 Prob > F = 0.0000
Residual | .005590631 11 .000508239 R-squared = 0.9991
-------------+------------------------------ Adj R-squared = 0.9989
Total | 6.21779542 14 .444128244 Root MSE = .02254

------------------------------------------------------------------------------
tm_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tm_output | 1.133337 .0512898 22.10 0.000 1.020449 1.246225
tm_fuel | .3342486 .0228284 14.64 0.000 .2840035 .3844937
tm_load | -1.350727 .2478264 -5.45 0.000 -1.896189 -.8052644
_cons | 11.18505 .3660016 30.56 0.000 10.37949 11.99062
------------------------------------------------------------------------------
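
Note that the between effect regression is fit to the 15 yearly means rather than to the 90 original observations, which is why the residual degrees of freedom are 15-3-1=11. The Python sketch below, with hypothetical variable names, recomputes R² and the F statistic from the sums of squares reported above.

```python
# Sums of squares from the .regress output on the 15 yearly means.
ss_model, ss_resid = 6.21220479, 0.005590631
n_years, k = 15, 3           # 15 time means, 3 regressors

df_model = k
df_resid = n_years - k - 1   # the intercept uses one more degree of freedom

r_squared = ss_model / (ss_model + ss_resid)
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

print(f"Residual df = {df_resid}")
print(f"R-squared   = {r_squared:.4f}")
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
```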

PROC PANEL has the /BTWNT option to estimate the between effect model.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNT;
RUN;

The PANEL Procedure
Between Time Periods Estimates

Dependent Variable: cost

Model Description

Estimation Method BtwTime
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.0056 DFE 11
MSE 0.0005 Root MSE 0.0225
R-Square 0.9991


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

Intercept 1 11.18504 0.3660 30.56 <.0001 Intercept
output 1 1.133335 0.0513 22.10 <.0001
fuel 1 0.334249 0.0228 14.64 <.0001
load 1 -1.35073 0.2478 -5.45 0.0002

Alternatively, use the be option in the Stata .xtreg command and the Means subcommand in
LIMDEP Regress$ command to get the same result.

. xtreg cost output fuel load, be i(year)

Between regression (regression on group means) Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9840 Obs per group: min = 6
between = 0.9991 avg = 6.0
overall = 0.9749 max = 6

F(3,11) = 4074.35
sd(u_i + avg(e_i.))= .0225441 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.133335 .0512897 22.10 0.000 1.020447 1.246223
fuel | .3342494 .0228284 14.64 0.000 .2840044 .3844943
load | -1.35073 .2478257 -5.45 0.000 -1.896191 -.8052695
_cons | 11.18504 .3660008 30.56 0.000 10.37948 11.9906
------------------------------------------------------------------------------

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Means$

+----------------------------------------------------+
| Group Means Regression |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:23:24PM |
| LHS=YBAR(i.) Mean = 13.36561 |
| Standard deviation = .6664301 |
| WTS=NTi/Nobs Number of observs. = 15 |
| Model size Parameters = 4 |
| Degrees of freedom = 11 |
| Residuals Sum of squares = .5590461E-02 |
| Standard error of e = .2254382E-01 |
| Fit R-squared = .9991009 |
| Adjusted R-squared = .9988557 |
| Model test F[ 3, 11] (prob) =4074.46 (.0000) |
| Diagnostic Log likelihood = 37.92650 |
| Restricted(b=0) = -14.67933 |
| Chi-sq [ 3] (prob) = 105.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -7.348200 |
| Akaike Info. Criter. = -7.361410 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | 1.13334032 .05128905 22.097 .0000 .111879D-13
FUEL | .33424795 .02282811 14.642 .0000 .111879D-13
LOAD | -1.35072980 .24782272 -5.450 .0000 .141312D-06
Constant| 11.1850651 .36599619 30.561 .0000

5.4 Testing Fixed Time Effects

The null hypothesis of the fixed time effect model is that all time dummy parameters except
one are zero: $H_0: \tau_1 = \dots = \tau_{t-1} = 0$. The F statistic compares the SSE of the pooled
OLS (1.3354) with that of the fixed time effect model (1.0882):

$$F = \frac{(1.3354 - 1.0882)/(15 - 1)}{1.0882/(6 \times 15 - 15 - 3)} \approx 1.1683 \; [14, 72]$$

The small F statistic does not reject the null hypothesis of no fixed time effect (p=.3180).
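
The F statistic can be recomputed from the two sums of squared errors. The following Python sketch is an illustration with hypothetical variable names; it uses the rounded SSE values quoted in the text, so the result agrees with 1.1683 only up to rounding.

```python
sse_pooled = 1.3354   # SSE of the pooled OLS (no time dummies)
sse_lsdv = 1.0882     # SSE of the fixed time effect model
n, t, k = 6, 15, 3    # airlines, years, regressors

df_num = t - 1            # number of restrictions: 14 time dummies set to zero
df_den = n * t - t - k    # residual degrees of freedom of the LSDV model: 72

# Loss of fit per restriction, relative to the unrestricted mean squared error.
f_stat = ((sse_pooled - sse_lsdv) / df_num) / (sse_lsdv / df_den)

print(f"F({df_num}, {df_den}) = {f_stat:.4f}")
```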

SAS PROC PANEL, LIMDEP, and Stata .xtreg by default conduct the F test. You may
conduct the same test using the TEST statement in LSDV1 and the Stata .test command.

PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
TEST t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

(output is skipped)

. quietly regress cost t1-t14 output fuel load
. test t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

( 1) t1 = 0
( 2) t2 = 0
( 3) t3 = 0
( 4) t4 = 0
( 5) t5 = 0
( 6) t6 = 0
( 7) t7 = 0
( 8) t8 = 0
( 9) t9 = 0
(10) t10 = 0
(11) t11 = 0
(12) t12 = 0
(13) t13 = 0
(14) t14 = 0

F( 14, 72) = 1.17
Prob > F = 0.3178
6. Two-way Fixed Effect Models

A two-way fixed effect model explores fixed effects of two group variables, two time variables, or
one group variable and one time variable. This chapter investigates fixed group and time effects.
This model thus needs two sets of dummy variables, one for groups and one for time periods
(i.e., airline and year).

6.1 Strategies of the Least Squares Dummy Variable Models

You may combine LSDV1, LSDV2, and LSDV3 to avoid perfect multicollinearity or the
dummy variable trap in a two-way fixed effect model. There are five strategies when
combining three LSDVs. Since .cnsreg does not allow suppressing the intercept, strategy 4
does not work in Stata. The first strategy of dropping two dummies is generally recommended
because of its convenience of model estimation and interpretation.

1. Drop one cross-section dummy and one time-series dummy.
2. Drop one cross-section dummy and suppress the intercept. Alternatively, drop one time
dummy and suppress the intercept.
3. Drop one cross-section dummy and impose a restriction on the time-series dummy
parameters: $\sum \tau_t = 0$. Alternatively, drop one time-series dummy and impose a
restriction on the cross-section dummy parameters: $\sum \mu_i = 0$.
4. Suppress the intercept and impose a restriction on the cross-section dummy parameters:
$\sum \mu_i = 0$. Alternatively, suppress the intercept and impose a restriction on the
time-series dummy parameters: $\sum \tau_t = 0$.
5. Include all dummy variables and impose two restrictions on the cross-section and
time-series dummy parameters: $\sum \mu_i = 0$ and $\sum \tau_t = 0$.

Each strategy produces different dummy coefficients but returns exactly the same parameter
estimates for the regressors. In general, dummy coefficients are not of primary interest in panel data
models.
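
One way to see why the five strategies are equivalent is to count free parameters: every strategy leaves the same number of freely estimated coefficients and therefore the same residual degrees of freedom. The Python sketch below is a bookkeeping illustration only. The function and dictionary names are hypothetical, and only the first variant of each strategy is encoded, applied to the airline data (6 airlines, 15 years, 3 regressors, 90 observations).

```python
N_GROUPS, N_TIMES, N_REGRESSORS, N_OBS = 6, 15, 3, 90

# strategy: (intercept included?, group dummies, time dummies, restrictions)
strategies = {
    1: (True, N_GROUPS - 1, N_TIMES - 1, 0),   # drop two dummies
    2: (False, N_GROUPS - 1, N_TIMES, 0),      # drop one dummy, no intercept
    3: (True, N_GROUPS - 1, N_TIMES, 1),       # drop one dummy, one restriction
    4: (False, N_GROUPS, N_TIMES, 1),          # no intercept, one restriction
    5: (True, N_GROUPS, N_TIMES, 2),           # all dummies, two restrictions
}

def free_parameters(intercept, n_group_dummies, n_time_dummies, n_restrictions):
    """Coefficients in the model minus the linear restrictions imposed on them."""
    return (int(intercept) + n_group_dummies + n_time_dummies
            + N_REGRESSORS - n_restrictions)

for number, spec in strategies.items():
    p = free_parameters(*spec)
    print(f"Strategy {number}: {p} free parameters, residual df = {N_OBS - p}")
```

Each strategy yields 23 free parameters and 67 residual degrees of freedom, which matches the Residual row of the two-way LSDV output in Section 6.2.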

6.2 LSDV1 without Two Dummies

The first strategy excludes two dummy variables, one dummy from each set of dummy
variables. Let us exclude g6 for the sixth airline and t15 for the last time period.

. regress cost g1-g5 t1-t14 output fuel load

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 22, 67) = 1960.82
Model | 113.864044 22 5.17563838 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 0.9984
-------------+------------------------------ Adj R-squared = 0.9979
Total | 114.040893 89 1.28135835 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1742825 .0861201 2.02 0.047 .0023861 .346179
g2 | .1114508 .0779551 1.43 0.157 -.0441482 .2670499
g3 | -.143511 .0518934 -2.77 0.007 -.2470907 -.0399313
g4 | .1802087 .0321443 5.61 0.000 .1160484 .2443691
g5 | -.0466942 .0224688 -2.08 0.042 -.0915422 -.0018463
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
------------------------------------------------------------------------------

In SAS, run the following script to get the same result.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.94004 2.21823 5.83 <.0001
g1 1 0.17428 0.08612 2.02 0.0470
g2 1 0.11145 0.07796 1.43 0.1575
g3 1 -0.14351 0.05189 -2.77 0.0073
g4 1 0.18021 0.03214 5.61 <.0001
g5 1 -0.04669 0.02247 -2.08 0.0415
t1 1 -0.69314 0.33784 -2.05 0.0441
t2 1 -0.63844 0.33208 -1.92 0.0588
t3 1 -0.59580 0.32945 -1.81 0.0750
t4 1 -0.54215 0.31891 -1.70 0.0938
t5 1 -0.47304 0.23195 -2.04 0.0454
t6 1 -0.42720 0.18844 -2.27 0.0266
t7 1 -0.39598 0.17330 -2.28 0.0255
t8 1 -0.33985 0.15011 -2.26 0.0268
t9 1 -0.27189 0.13482 -2.02 0.0477
t10 1 -0.22739 0.07635 -2.98 0.0040
t11 1 -0.11180 0.03190 -3.50 0.0008
t12 1 -0.03364 0.04290 -0.78 0.4357
t13 1 -0.01773 0.03626 -0.49 0.6263
t14 1 -0.01865 0.03051 -0.61 0.5432
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012

In LIMDEP, the following command fits the same model (output is skipped).

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

6.3 LSDV1 + LSDV2: Drop a Dummy and Suppress the Intercept

The second strategy combines LSDV1 and LSDV2 to drop a dummy and suppress the intercept.
Let us drop a dummy g6 and suppress the intercept. Keep in mind that SSE is still correct but F
and R² are not.

. regress cost g1-g5 t1-t15 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 23, 67) = .
Model | 16191.4201 23 703.974786 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1742825 .0861201 2.02 0.047 .0023861 .346179
g2 | .1114508 .0779551 1.43 0.157 -.0441482 .2670499
g3 | -.143511 .0518934 -2.77 0.007 -.2470907 -.0399313
g4 | .1802087 .0321443 5.61 0.000 .1160484 .2443691
g5 | -.0466942 .0224688 -2.08 0.042 -.0915422 -.0018463
t1 | 12.2469 1.885399 6.50 0.000 8.48363 16.01018
t2 | 12.3016 1.891045 6.51 0.000 8.527062 16.07615
t3 | 12.34424 1.89341 6.52 0.000 8.564976 16.1235
t4 | 12.39789 1.903395 6.51 0.000 8.598694 16.19708
t5 | 12.467 1.991503 6.26 0.000 8.491942 16.44206
t6 | 12.51284 2.035334 6.15 0.000 8.450294 16.57538
t7 | 12.54406 2.05038 6.12 0.000 8.451487 16.63664
t8 | 12.60019 2.073782 6.08 0.000 8.460909 16.73948
t9 | 12.66815 2.090527 6.06 0.000 8.495438 16.84086
t10 | 12.71266 2.151893 5.91 0.000 8.417458 17.00785
t11 | 12.82824 2.221401 5.77 0.000 8.394303 17.26217
t12 | 12.9064 2.247972 5.74 0.000 8.41943 17.39337
t13 | 12.92231 2.237999 5.77 0.000 8.455241 17.38937
t14 | 12.9214 2.224893 5.81 0.000 8.480492 17.3623
t15 | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
------------------------------------------------------------------------------
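
The missing F statistic and the R-squared of 1.0000 in the output above arise because, with the intercept suppressed, Stata bases both on the uncentered total sum of squares (16191.5969). The following is a minimal sketch in Python, not part of the original analysis, that recovers the correct values from the SSE. It assumes the corrected total sum of squares (114.04089) reported in the SAS LSDV1 output earlier in this section; the variable names are illustrative.

```python
# Values copied from the regression outputs in this section; names are illustrative.
sse = 0.176848775         # residual sum of squares (the same under every LSDV strategy)
sst_centered = 114.04089  # corrected total sum of squares from the SAS LSDV1 output
df_model = 22             # 5 group dummies + 14 time dummies + 3 regressors
df_error = 67             # 90 observations - 23 estimated parameters

# R-squared and F based on the centered (mean-corrected) total sum of squares
r2 = 1 - sse / sst_centered
f_stat = ((sst_centered - sse) / df_model) / (sse / df_error)

print(round(r2, 4), round(f_stat, 2))
```

The results should agree, up to rounding, with the F of 1960.82 and the R-squared of .9984 that SAS reports for the model with an intercept.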

Alternatively, you may drop one of the time dummies and suppress the intercept. The dummy
coefficients are different from those above, but the parameter estimates of the regressors remain
unchanged.

. regress cost g1-g6 t1-t14 output fuel load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 23, 67) = .
Model | 16191.4201 23 703.974786 Prob > F = 0.0000
Residual | .176848775 67 .002639534 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 16191.5969 90 179.906633 Root MSE = .05138

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | 13.11432 2.229552 5.88 0.000 8.66412 17.56453
g2 | 13.05149 2.229864 5.85 0.000 8.600665 17.50232
g3 | 12.79653 2.230546 5.74 0.000 8.344341 17.24872
g4 | 13.12025 2.223638 5.90 0.000 8.68185 17.55865
g5 | 12.89335 2.222204 5.80 0.000 8.45781 17.32888
g6 | 12.94004 2.218231 5.83 0.000 8.512434 17.36765
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
------------------------------------------------------------------------------
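
The group coefficients in this output are simply the LSDV1 intercept plus the corresponding LSDV1 group coefficients. The sketch below, written in Python for illustration only, checks this with the LSDV1 estimates copied from the earlier output; the list and variable names are hypothetical.

```python
# LSDV1 estimates (g6 and t15 dropped), copied from the earlier output.
intercept = 12.94004
g_lsdv1 = [0.1742825, 0.1114508, -0.143511, 0.1802087, -0.0466942, 0.0]  # g6 = 0 (baseline)

# With the intercept suppressed and all six group dummies included,
# each group coefficient becomes that group's own intercept.
g_noint = [intercept + g for g in g_lsdv1]

for k, value in enumerate(g_noint, start=1):
    print(f"g{k}: {value:.5f}")
```

The printed values can be compared with the g1 through g6 coefficients in the table above.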

In SAS, execute the following script that has /NOINT to suppress the intercept.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t15 output fuel load /NOINT;
MODEL cost = g1-g6 t1-t14 output fuel load /NOINT;
RUN;

(output is skipped)

In LIMDEP, ONE should be taken out to suppress the intercept.

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15, OUTPUT,FUEL,LOAD$

(output is skipped)

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 03:58:13PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Autocorrel Durbin-Watson Stat. = .6035047 |
| Rho = cor[e,e(-1)] = .6982476 |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 13.1139819 2.22955625 5.882 .0000 .16666667
G2 | 13.0511515 2.22986828 5.853 .0000 .16666667
G3 | 12.7961914 2.23055043 5.737 .0000 .16666667
G4 | 13.1199153 2.22364115 5.900 .0000 .16666667
G5 | 12.8930131 2.22220692 5.802 .0000 .16666667
G6 | 12.9397087 2.21823375 5.833 .0000 .16666667
T1 | -.69308729 .33783938 -2.052 .0441 .06666667
T2 | -.63838795 .33208126 -1.922 .0588 .06666667
T3 | -.59575348 .32944797 -1.808 .0750 .06666667
T4 | -.54210773 .31891465 -1.700 .0938 .06666667
T5 | -.47300784 .23194606 -2.039 .0454 .06666667
T6 | -.42717813 .18844068 -2.267 .0266 .06666667
T7 | -.39595152 .17329717 -2.285 .0255 .06666667
T8 | -.33982426 .15010661 -2.264 .0268 .06666667
T9 | -.27187359 .13481769 -2.017 .0477 .06666667
T10 | -.22737840 .07634935 -2.978 .0040 .06666667
T11 | -.11180525 .03190046 -3.505 .0008 .06666667
T12 | -.03364915 .04290088 -.784 .4356 .06666667
T13 | -.01774030 .03625541 -.489 .6262 .06666667
T14 | -.01864714 .03050793 -.611 .5431 .06666667
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3060 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0012 .56046016

Notice that LIMDEP reports the correct F (1960.83) and $R^2$ (.9984).

6.4 LSDV1 + LSDV3: Drop a Dummy and Impose a Restriction

The third strategy excludes one dummy from a set of dummy variables and imposes a
restriction on another set of dummy parameters. Let us drop a time dummy here and then
impose a restriction on group dummy parameters.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t14 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.98600 2.22540 5.84 <.0001
g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 -0.69314 0.33784 -2.05 0.0441
t2 1 -0.63844 0.33208 -1.92 0.0588
t3 1 -0.59580 0.32945 -1.81 0.0750
t4 1 -0.54215 0.31891 -1.70 0.0938
t5 1 -0.47304 0.23195 -2.04 0.0454
t6 1 -0.42720 0.18844 -2.27 0.0266
t7 1 -0.39598 0.17330 -2.28 0.0255
t8 1 -0.33985 0.15011 -2.26 0.0268
t9 1 -0.27189 0.13482 -2.02 0.0477
t10 1 -0.22739 0.07635 -2.98 0.0040
t11 1 -0.11180 0.03190 -3.50 0.0008
t12 1 -0.03364 0.04290 -0.78 0.4357
t13 1 -0.01773 0.03626 -0.49 0.6263
t14 1 -0.01865 0.03051 -0.61 0.5432
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 -1.9387E-16 . . .

* Probability computed using beta distribution.

In Stata, run the .cnsreg command with a constraint on the group dummy parameters. The
constraint(1) option tells .cnsreg to fit OLS subject to constraint 1, which is defined beforehand
with the .constraint command.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 t1-t14 output fuel load, constraint(1)

Constrained linear regression Number of obs = 90
F( 22, 67) = 1960.82
Prob > F = 0.0000
Root MSE = 0.0514

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1283264 .0460126 2.79 0.007 .0364849 .2201679
g2 | .0654947 .0389685 1.68 0.097 -.0122867 .1432761
g3 | -.1894671 .0156096 -12.14 0.000 -.220624 -.1583102
g4 | .1342526 .0183163 7.33 0.000 .097693 .1708121
g5 | -.0926504 .0373085 -2.48 0.016 -.1671184 -.0181824
g6 | -.0459561 .0416069 -1.10 0.273 -.1290038 .0370916
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.986 2.225402 5.84 0.000 8.544076 17.42792
------------------------------------------------------------------------------
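
The restricted estimates above are a re-centering of the LSDV1 estimates: the sum-to-zero restriction subtracts the average group effect from each group coefficient and adds it to the intercept. The following Python sketch illustrates this under that reading; the names are hypothetical, and the inputs are the LSDV1 estimates reported earlier (g6 is the omitted baseline, so its LSDV1 coefficient is zero).

```python
# LSDV1 estimates with g6 as the baseline (copied from the earlier output).
intercept_lsdv1 = 12.94004
g_lsdv1 = [0.1742825, 0.1114508, -0.143511, 0.1802087, -0.0466942, 0.0]

# Imposing g1 + ... + g6 = 0 re-centers the group effects on their mean
# and moves that mean into the intercept.
mean_g = sum(g_lsdv1) / len(g_lsdv1)
g_restricted = [g - mean_g for g in g_lsdv1]
intercept_restricted = intercept_lsdv1 + mean_g

for k, value in enumerate(g_restricted, start=1):
    print(f"g{k}: {value:.7f}")
print(f"_cons: {intercept_restricted:.5f}")
```

The time dummy coefficients and the slopes of output, fuel, and load are unaffected because t15 remains the omitted time category.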

In LIMDEP, run a Regress$ command with the Cls: subcommand. b(2) in the subcommand
indicates the second parameter estimate listed in the Rhs= subcommand. Therefore, LIMDEP
fits the LSDV1 under the constraint that the sum of all group dummy parameters, b(2) for g1
through b(7) for g6, is zero.

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 04:24:35PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Autocorrel Durbin-Watson Stat. = .6035047 |
| Rho = cor[e,e(-1)] = .6982476 |
| Restrictns. F[ 1, 66] (prob) = .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 12.9856603 2.22540616 5.835 .0000
G1 | .12832155 .04601257 2.789 .0069 .16666667
G2 | .06549116 .03896849 1.681 .0976 .16666667
G3 | -.18946893 .01560965 -12.138 .0000 .16666667
G4 | .13425504 .01831636 7.330 .0000 .16666667
G5 | -.09264719 .03730846 -2.483 .0156 .16666667
G6 | -.04595164 .04160692 -1.104 .2734 .16666667
T1 | -.69308729 .33783938 -2.052 .0442 .06666667
T2 | -.63838795 .33208126 -1.922 .0589 .06666667
T3 | -.59575348 .32944797 -1.808 .0751 .06666667
T4 | -.54210773 .31891465 -1.700 .0939 .06666667
T5 | -.47300784 .23194606 -2.039 .0454 .06666667
T6 | -.42717813 .18844068 -2.267 .0267 .06666667
T7 | -.39595152 .17329717 -2.285 .0255 .06666667
T8 | -.33982426 .15010661 -2.264 .0269 .06666667
T9 | -.27187359 .13481769 -2.017 .0478 .06666667
T10 | -.22737840 .07634935 -2.978 .0041 .06666667
T11 | -.11180525 .03190046 -3.505 .0008 .06666667
T12 | -.03364915 .04290088 -.784 .4356 .06666667
T13 | -.01774030 .03625541 -.489 .6262 .06666667
T14 | -.01864714 .03050793 -.611 .5432 .06666667
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3061 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0012 .56046016

Alternatively, you may drop one group dummy and impose a restriction on the time dummy
parameters. In LIMDEP, b(7) indicates the seventh parameter estimate, which is for t1. The
output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t15 output fuel load;
RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0;
RUN;

. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost g1-g5 t1-t15 output fuel load, constraint(3)

REGRESS;Lhs=COST;
Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$

6.5 LSDV2 + LSDV3: Suppress the Intercept and Impose a Restriction

The strategy of LSDV2 + LSDV3 includes both sets of dummy variables and instead
suppresses the intercept and imposes a restriction. Stata does not support this approach. The
following procedure has a constraint on the group dummy parameters. Since the intercept is
suppressed, F (266704) and $R^2$ (1.0000) are incorrect.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 23 16191 703.97479 266704 <.0001
Error 67 0.17685 0.00264
Uncorrected Total 90 16192


Root MSE 0.05138 R-Square 1.0000
Dependent Mean 13.36561 Adj R-Sq 1.0000
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 12.29286 1.89169 6.50 <.0001
t2 1 12.34756 1.89736 6.51 <.0001
t3 1 12.39019 1.89982 6.52 <.0001
t4 1 12.44384 1.90989 6.52 <.0001
t5 1 12.51295 1.99808 6.26 <.0001
t6 1 12.55879 2.04195 6.15 <.0001
t7 1 12.59002 2.05706 6.12 <.0001
t8 1 12.64615 2.08052 6.08 <.0001
t9 1 12.71410 2.09734 6.06 <.0001
t10 1 12.75861 2.15883 5.91 <.0001
t11 1 12.87419 2.22838 5.78 <.0001
t12 1 12.95236 2.25499 5.74 <.0001
t13 1 12.96826 2.24505 5.78 <.0001
t14 1 12.96735 2.23202 5.81 <.0001
t15 1 12.98600 2.22540 5.84 <.0001
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 5.89339E-14 1.250165E-9 0.00 1.0000*

* Probability computed using beta distribution.

You may impose an alternative restriction on the time variable to obtain the equivalent result
despite different dummy coefficients. The output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

In LIMDEP, the following commands are supposed to work, but they return different parameter
estimates and goodness-of-fit measures, probably because of LIMDEP's estimation method.

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$

(output is skipped)

REGRESS;Lhs=COST;
Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$

+----------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 04:47:10PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1790783 |
| Standard error of e = .5169924E-01 |
| Fit R-squared = .9984297 |
| Adjusted R-squared = .9979141 |
| Model test F[ 22, 67] (prob) =1936.37 (.0000) |
| Diagnostic Log likelihood = 152.1839 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 581.08 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.697046 |
| Akaike Info. Criter. = -5.708630 |
| Autocorrel Durbin-Watson Stat. = .6164424 |
| Rho = cor[e,e(-1)] = .6917788 |
| Restrictns. F[ 1, 66] (prob) = .68 (.4113) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
G1 | 13.0058594 ......(Fixed Parameter).......
G2 | 12.9453125 216842.319 .000 1.0000 .16666667
G3 | 12.6894531 216842.319 .000 1.0000 .16666667
G4 | 13.0117188 216842.319 .000 1.0000 .16666667
G5 | 12.7812500 ......(Fixed Parameter).......
G6 | 12.8261719 ......(Fixed Parameter).......
T1 | -.39453125 306661.348 .000 1.0000 .06666667
T2 | -.33203125 433684.637 .000 1.0000 .06666667
T3 | -.29101563 216842.319 .000 1.0000 .06666667
T4 | -.24414063 306661.348 .000 1.0000 .06666667
T5 | -.16406250 ......(Fixed Parameter).......
T6 | -.10742188 ......(Fixed Parameter).......
T7 | -.07421875 ......(Fixed Parameter).......
T8 | -.02148438 ......(Fixed Parameter).......
T9 | .05859375 216842.319 .000 1.0000 .06666667
T10 | .10351563 216842.319 .000 1.0000 .06666667
T11 | .22070313 216842.319 .000 1.0000 .06666667
T12 | .30468750 216842.319 .000 1.0000 .06666667
T13 | .31250000 216842.319 .000 1.0000 .06666667
T14 | .31835938 216842.319 .000 1.0000 .06666667
T15 | .33203125 ......(Fixed Parameter).......
OUTPUT | .81399272 .03205125 25.397 .0000 -1.17430918
FUEL | .15204518 .16450594 .924 .3587 12.7703592
LOAD | -.88619366 .26338199 -3.365 .0013 .56046016

6.6 LSDV3 with Two Restrictions

The last strategy includes all group and time dummies and then imposes two restrictions on
group and time dummy parameters. Pay attention to the two RESTRICT statements in the
following PROC REG.

PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.


Number of Observations Read 90
Number of Observations Used 90


Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 22 113.86404 5.17564 1960.82 <.0001
Error 67 0.17685 0.00264
Corrected Total 89 114.04089


Root MSE 0.05138 R-Square 0.9984
Dependent Mean 13.36561 Adj R-Sq 0.9979
Coeff Var 0.38439


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 12.66688 2.08107 6.09 <.0001
g1 1 0.12833 0.04601 2.79 0.0069
g2 1 0.06549 0.03897 1.68 0.0975
g3 1 -0.18947 0.01561 -12.14 <.0001
g4 1 0.13425 0.01832 7.33 <.0001
g5 1 -0.09265 0.03731 -2.48 0.0155
g6 1 -0.04596 0.04161 -1.10 0.2733
t1 1 -0.37402 0.19187 -1.95 0.0554
t2 1 -0.31932 0.18609 -1.72 0.0908
t3 1 -0.27669 0.18335 -1.51 0.1360
t4 1 -0.22304 0.17297 -1.29 0.2017
t5 1 -0.15393 0.08644 -1.78 0.0795
t6 1 -0.10809 0.04486 -2.41 0.0187
t7 1 -0.07686 0.03193 -2.41 0.0188
t8 1 -0.02073 0.02045 -1.01 0.3143
t9 1 0.04722 0.02908 1.62 0.1091
t10 1 0.09173 0.08115 1.13 0.2624
t11 1 0.20731 0.14914 1.39 0.1691
t12 1 0.28547 0.17564 1.63 0.1088
t13 1 0.30138 0.16603 1.82 0.0740
t14 1 0.30047 0.15362 1.96 0.0546
t15 1 0.31911 0.14749 2.16 0.0341
output 1 0.81725 0.03185 25.66 <.0001
fuel 1 0.16861 0.16348 1.03 0.3061
load 1 -0.88281 0.26174 -3.37 0.0012
RESTRICT -1 -2.5962E-16 4.04547E-11 -0.00 1.0000*
RESTRICT -1 -2.3598E-16 . . .

* Probability computed using beta distribution.

In Stata, execute the following command to get the same result. Notice that constraints 1 and 3
were defined above.

. cnsreg cost g1-g6 t1-t15 output fuel load, constraint(1 3)

Constrained linear regression Number of obs = 90
F( 22, 67) = 1960.82
Prob > F = 0.0000
Root MSE = 0.0514

( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
( 2) t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g1 | .1283264 .0460126 2.79 0.007 .0364849 .2201679
g2 | .0654947 .0389685 1.68 0.097 -.0122867 .1432761
g3 | -.1894671 .0156096 -12.14 0.000 -.220624 -.1583102
g4 | .1342526 .0183163 7.33 0.000 .097693 .1708121
g5 | -.0926504 .0373085 -2.48 0.016 -.1671184 -.0181824
g6 | -.0459561 .0416069 -1.10 0.273 -.1290038 .0370916
t1 | -.3740245 .191872 -1.95 0.055 -.7570026 .0089536
t2 | -.3193228 .1860877 -1.72 0.091 -.6907554 .0521097
t3 | -.2766893 .1833501 -1.51 0.136 -.6426576 .0892789
t4 | -.2230399 .1729671 -1.29 0.202 -.5682837 .1222038
t5 | -.1539291 .0864404 -1.78 0.079 -.3264649 .0186066
t6 | -.1080904 .0448591 -2.41 0.019 -.1976296 -.0185513
t7 | -.0768646 .0319336 -2.41 0.019 -.1406043 -.0131248
t8 | -.0207326 .0204506 -1.01 0.314 -.061552 .0200869
t9 | .0472205 .0290822 1.62 0.109 -.0108278 .1052688
t10 | .0917281 .0811525 1.13 0.262 -.0702531 .2537092
t11 | .2073105 .1491443 1.39 0.169 -.0903829 .5050039
t12 | .2854727 .1756365 1.63 0.109 -.0650993 .6360447
t13 | .3013791 .1660294 1.82 0.074 -.030017 .6327752
t14 | .3004686 .1536212 1.96 0.055 -.0061606 .6070978
t15 | .3191137 .1474883 2.16 0.034 .0247259 .6135015
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.66688 2.081068 6.09 0.000 8.513054 16.82071
------------------------------------------------------------------------------
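
Under the second restriction, the time effects are centered on their average, and the intercept absorbs that average. As a rough check in Python (illustrative names only), the sketch below starts from the time coefficients and intercept of the model with only the group restriction imposed (Section 6.4, where t15 is omitted) and reproduces the time coefficients and the intercept shown above.

```python
# Time coefficients with t15 as the baseline (t15 = 0), copied from Section 6.4.
t_baseline = [-0.6931382, -0.6384366, -0.5958031, -0.5421537, -0.4730429,
              -0.4272042, -0.3959783, -0.3398463, -0.2718933, -0.2273857,
              -0.1118032, -0.033641, -0.0177346, -0.0186451, 0.0]
intercept_group_restricted = 12.986  # intercept when only the group restriction is imposed

# Imposing t1 + ... + t15 = 0 subtracts the mean time effect from each
# coefficient and adds it to the intercept.
mean_t = sum(t_baseline) / len(t_baseline)
t_restricted = [t - mean_t for t in t_baseline]
intercept_two_restrictions = intercept_group_restricted + mean_t

print(f"t1:    {t_restricted[0]:.7f}")
print(f"t15:   {t_restricted[14]:.7f}")
print(f"_cons: {intercept_two_restrictions:.5f}")
```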

In LIMDEP, the following command returns the same result (output is skipped). Notice that
two restrictions in Cls: are separated by a comma.

REGRESS;Lhs=COST;
Rhs=One,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0,
b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)+b(22)=0$

6.7 Two-way Within Effect Model

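The two-way fixed effect model requires a transformation of the dependent and independent
variables using group means, time means, and overall means:
$y_{it}^{*} = y_{it} - \bar{y}_{i\bullet} - \bar{y}_{\bullet t} + \bar{y}_{\bullet\bullet}$ and
$x_{it}^{*} = x_{it} - \bar{x}_{i\bullet} - \bar{x}_{\bullet t} + \bar{x}_{\bullet\bullet}$.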

. gen w_cost = cost - gm_cost - tm_cost + m_cost
. gen w_output = output - gm_output - tm_output + m_output
. gen w_fuel = fuel - gm_fuel - tm_fuel + m_fuel
. gen w_load = load - gm_load - tm_load + m_load
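
To make the transformation concrete, here is a small Python sketch on a made-up balanced panel (2 groups by 3 periods). The data and names are hypothetical and are not taken from the airline data set; the group, time, and overall means in the sketch presumably play the roles of the gm_, tm_, and m_ variables in the Stata commands above.

```python
# Toy balanced panel: 2 groups x 3 periods (made-up numbers for illustration).
y = {(1, 1): 2.0, (1, 2): 4.0, (1, 3): 6.0,
     (2, 1): 1.0, (2, 2): 5.0, (2, 3): 3.0}

groups = sorted({i for i, _ in y})
periods = sorted({t for _, t in y})


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


group_mean = {i: mean(y[i, t] for t in periods) for i in groups}
time_mean = {t: mean(y[i, t] for i in groups) for t in periods}
overall_mean = mean(y.values())

# Two-way within transformation: y* = y - group mean - time mean + overall mean
y_star = {(i, t): y[i, t] - group_mean[i] - time_mean[t] + overall_mean
          for (i, t) in y}

for key in sorted(y_star):
    print(key, y_star[key])
```

In a balanced panel the transformed variable sums to zero within every group and within every time period, which is why the intercept is suppressed in the regression on transformed data.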

Once data are transformed, run the OLS with the transformed variables. Do not forget to
suppress the intercept.

. regress w_cost w_output w_fuel w_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 3, 87) = 307.86
Model | 1.87739643 3 .625798811 Prob > F = 0.0000
Residual | .176848774 87 .002032745 R-squared = 0.9139
-------------+------------------------------ Adj R-squared = 0.9109
Total | 2.05424521 90 .022824947 Root MSE = .04509

------------------------------------------------------------------------------
w_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
w_output | .8172487 .0279512 29.24 0.000 .7616927 .8728048
w_fuel | .16861 .1434621 1.18 0.243 -.1165364 .4537565
w_load | -.8828142 .2296907 -3.84 0.000 -1.339349 -.426279
------------------------------------------------------------------------------

Remember that F, $R^2$, standard errors, and $DF_{error}$ are not correct. Standard errors need
to be adjusted; for instance, the standard error of the load factor is .2617=.2297*sqrt(87/67).
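
The same adjustment applies to every slope. The following Python sketch (illustrative names; the standard errors are copied from the output above) rescales them by the ratio of the reported to the correct error degrees of freedom, where the correct value is nT - n - T + 1 - k = 67.

```python
import math

n, T, k = 6, 15, 3                     # airlines, years, regressors
obs = n * T                            # 90 observations
df_reported = obs - k                  # 87: what OLS on the transformed data assumes
df_correct = obs - n - T + 1 - k       # 67: also counts the group and time means removed

# Standard errors from the regression on transformed variables (output above)
se_within = {"output": 0.0279512, "fuel": 0.1434621, "load": 0.2296907}

scale = math.sqrt(df_reported / df_correct)
se_adjusted = {name: se * scale for name, se in se_within.items()}

for name, se in se_adjusted.items():
    print(f"{name}: {se:.6f}")
```

The adjusted values can be checked against the LSDV standard errors of output, fuel, and load reported in the earlier sections.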

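The dummy variable coefficients are computed as
$d_{i}^{*} = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})'\beta$ and
$d_{t}^{*} = (\bar{y}_{\bullet t} - \bar{y}_{\bullet\bullet}) - (\bar{x}_{\bullet t} - \bar{x}_{\bullet\bullet})'\beta$.
We need to compute overall means and group-specific means, say for airline 3.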

. sum cost output fuel load

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 90 13.36561 1.131971 11.14154 15.3733
output | 90 -1.174309 1.150606 -3.278573 .6608616
fuel | 90 12.77036 .8123749 11.55017 13.831
load | 90 .5604602 .0527934 .432066 .676287

. sum cost output fuel load if airline==3

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 15 13.37231 .5220657 12.56479 13.99694
output | 15 -.9122625 .2435335 -1.337794 -.6169364
fuel | 15 12.78972 .8177211 11.6851 13.831
load | 15 .5845359 .0324437 .524334 .654256

The actual (absolute) intercept of airline 3 is
-.1895 = (13.3723-13.3656) - (-.9123-(-1.1743))*(.8172) - (12.7897-12.7704)*(.1686) - (.5845-.5605)*(-.8828).
The actual intercept of time period 9 is
.0472 = (13.4651-13.3656) - (-1.0670-(-1.1743))*(.8172) - (12.8610-12.7704)*(.1686) - (.6179-.5605)*(-.8828).
See the SAS output in Section 6.6 to cross-check the computation.

. sum cost output fuel load if year==9

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cost | 6 13.4651 1.042032 12.20495 14.78597
output | 6 -1.067003 1.278931 -2.673258 .4779284
fuel | 6 12.86104 .0212523 12.83356 12.89337
load | 6 .6179098 .0376737 .546723 .654256
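
The two hand computations above can be reproduced with a short Python sketch. The function name and dictionaries are hypothetical; the means are copied from the .sum outputs and the slopes from the within regression.

```python
# Slopes from the two-way within regression
beta = {"output": 0.8172487, "fuel": 0.16861, "load": -0.8828142}

# Means copied from the .sum outputs above
overall = {"cost": 13.36561, "output": -1.174309, "fuel": 12.77036, "load": 0.5604602}
airline3 = {"cost": 13.37231, "output": -0.9122625, "fuel": 12.78972, "load": 0.5845359}
year9 = {"cost": 13.4651, "output": -1.067003, "fuel": 12.86104, "load": 0.6179098}


def dummy_coefficient(sub_mean, overall_mean, slopes):
    """(ybar_sub - ybar_all) - (xbar_sub - xbar_all)' * beta"""
    dev_y = sub_mean["cost"] - overall_mean["cost"]
    dev_xb = sum(slopes[x] * (sub_mean[x] - overall_mean[x]) for x in slopes)
    return dev_y - dev_xb


d_airline3 = dummy_coefficient(airline3, overall, beta)
d_year9 = dummy_coefficient(year9, overall, beta)

print(f"airline 3: {d_airline3:.4f}")
print(f"year 9:    {d_year9:.4f}")
```

Because the inputs are rounded means, the results match the g3 and t9 estimates in Section 6.6 only up to rounding.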

6.8 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL have the /FIXTWO option to fit the two-way fixed effect
model. The data set needs to be sorted by the group and time variables that will be declared in
the ID statement in PROC PANEL.

PROC SORT DATA=masil.airline;
BY airline year;

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXTWO;
RUN;

The PANEL Procedure
Fixed Two Way Estimates

Dependent Variable: cost

Model Description

Estimation Method FixTwo
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.1768 DFE 67
MSE 0.0026 Root MSE 0.0514
R-Square 0.9984


F Test for No Fixed Effects

Num DF Den DF F Value Pr > F

19 67 23.10 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t| Label

CS1 1 0.174283 0.0861 2.02 0.0470 Cross Sectional
Effect 1
CS2 1 0.111451 0.0780 1.43 0.1575 Cross Sectional
Effect 2
CS3 1 -0.14351 0.0519 -2.77 0.0073 Cross Sectional
Effect 3
CS4 1 0.180209 0.0321 5.61 <.0001 Cross Sectional
Effect 4
CS5 1 -0.04669 0.0225 -2.08 0.0415 Cross Sectional
Effect 5
TS1 1 -0.69314 0.3378 -2.05 0.0441 Time Series
Effect 1
TS2 1 -0.63844 0.3321 -1.92 0.0588 Time Series
Effect 2
TS3 1 -0.5958 0.3294 -1.81 0.0750 Time Series
Effect 3
TS4 1 -0.54215 0.3189 -1.70 0.0938 Time Series
Effect 4
TS5 1 -0.47304 0.2319 -2.04 0.0454 Time Series
Effect 5
TS6 1 -0.4272 0.1884 -2.27 0.0266 Time Series
Effect 6
TS7 1 -0.39598 0.1733 -2.28 0.0255 Time Series
Effect 7
TS8 1 -0.33985 0.1501 -2.26 0.0268 Time Series
Effect 8
TS9 1 -0.27189 0.1348 -2.02 0.0477 Time Series
Effect 9
TS10 1 -0.22739 0.0763 -2.98 0.0040 Time Series
Effect 10
TS11 1 -0.1118 0.0319 -3.50 0.0008 Time Series
Effect 11
TS12 1 -0.03364 0.0429 -0.78 0.4357 Time Series
Effect 12
TS13 1 -0.01773 0.0363 -0.49 0.6263 Time Series
Effect 13
TS14 1 -0.01865 0.0305 -0.61 0.5432 Time Series
Effect 14
Intercept 1 12.94004 2.2182 5.83 <.0001 Intercept
output 1 0.817249 0.0319 25.66 <.0001
fuel 1 0.16861 0.1635 1.03 0.3061
load 1 -0.88281 0.2617 -3.37 0.0012

6.9 Using Stata and LIMDEP

The Stata .xtreg command does not have an option for two-way fixed or two-way random
effect models. However, it can fit the two-way fixed effect model if you include a set of time
dummies (LSDV1) and use the fe option, which absorbs the group effects.

. xtreg cost t1-t14 output fuel load, fe i(airline)

Fixed-effects (within) regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9955 Obs per group: min = 15
between = 0.9859 avg = 15.0
overall = 0.9885 max = 15

F(17,67) = 873.24
corr(u_i, Xb) = 0.3361 Prob > F = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t1 | -.6931382 .3378385 -2.05 0.044 -1.367467 -.0188098
t2 | -.6384366 .3320802 -1.92 0.059 -1.301271 .0243983
t3 | -.5958031 .3294473 -1.81 0.075 -1.253383 .0617764
t4 | -.5421537 .3189139 -1.70 0.094 -1.178708 .0944011
t5 | -.4730429 .2319459 -2.04 0.045 -.9360088 -.0100769
t6 | -.4272042 .18844 -2.27 0.027 -.8033319 -.0510764
t7 | -.3959783 .1732969 -2.28 0.025 -.7418804 -.0500762
t8 | -.3398463 .1501062 -2.26 0.027 -.6394596 -.040233
t9 | -.2718933 .1348175 -2.02 0.048 -.5409901 -.0027964
t10 | -.2273857 .0763495 -2.98 0.004 -.37978 -.0749914
t11 | -.1118032 .0319005 -3.50 0.001 -.175477 -.0481295
t12 | -.033641 .0429008 -0.78 0.436 -.1192713 .0519893
t13 | -.0177346 .0362554 -0.49 0.626 -.0901007 .0546315
t14 | -.0186451 .030508 -0.61 0.543 -.0795393 .042249
output | .8172487 .031851 25.66 0.000 .7536739 .8808235
fuel | .16861 .163478 1.03 0.306 -.1576935 .4949135
load | -.8828142 .2617373 -3.37 0.001 -1.405244 -.3603843
_cons | 12.986 2.225402 5.84 0.000 8.544076 17.42792
-------------+----------------------------------------------------------------
sigma_u | .1306712
sigma_e | .05137639
rho | .86611203 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5, 67) = 69.05 Prob > F = 0.0000

The F statistic of 69.05 tests only whether the parameters of the group dummies g1 through g5
are all zero; it does not test the time effects. You may double-check this test by running the
following commands.

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1=g2=g3=g4=g5=0

( 1) g1 - g2 = 0
( 2) g1 - g3 = 0
( 3) g1 - g4 = 0
( 4) g1 - g5 = 0
( 5) g1 = 0

F( 5, 67) = 69.05
Prob > F = 0.0000

The following LIMDEP command fits the two-way fixed effect model. The Str= and Period=
subcommands specify the stratification (group) and time variables. This command presents the
pooled model and one-way group effect model as well, but reports an incorrect intercept in the
two-way fixed model, 12.667 (2.081). The pooled OLS and fixed group effect parts of the output
are skipped below since they are redundant.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group and Period Effects |
| Ordinary least squares regression |
| Model was estimated Aug 27, 2009 at 04:27:40PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 23 |
| Degrees of freedom = 67 |
| Residuals Sum of squares = .1768479 |
| Standard error of e = .5137627E-01 |
| Fit R-squared = .9984493 |
| Adjusted R-squared = .9979401 |
| Model test F[ 22, 67] (prob) =1960.83 (.0000) |
| Diagnostic Log likelihood = 152.7479 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 22] (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -5.709580 |
| Akaike Info. Criter. = -5.721164 |
| Estd. Autocorrelation of e(i,t) .651825 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
| Panel: Prds: Empty 0, Valid data 15 |
| Smallest 0, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .81725242 .03185102 25.659 .0000 -1.17430918
FUEL | .16863516 .16347826 1.032 .3052 12.7703592
LOAD | -.88281516 .26173663 -3.373 .0011 .56046016
Constant| 12.6665675 2.08107166 6.087 .0000

+--------------------------------------------------------------------+
| Test Statistics for the Classical Model |
+--------------------------------------------------------------------+
| Model Log-Likelihood Sum of Squares R-squared |
|(1) Constant term only -138.35814 .1140409821D+03 .0000000 |
|(2) Group effects only -90.48804 .3936109461D+02 .6548513 |
|(3) X - variables only 61.76991 .1335449522D+01 .9882897 |
|(4) X and group effects 130.08647 .2926207777D+00 .9974341 |
|(5) X ind.&time effects 152.74790 .1768479062D+00 .9984493 |
+--------------------------------------------------------------------+
| Hypothesis Tests |
| Likelihood Ratio Test F Tests |
| Chi-squared d.f. Prob. F num. denom. P value |
|(2) vs (1) 95.740 5 .00000 31.875 5 84 .00000 |
|(3) vs (1) 400.256 3 .00000 2419.329 3 86 .00000 |
|(4) vs (1) 536.889 8 .00000 3935.818 8 81 .00000 |
|(4) vs (2) 441.149 3 .00000 3604.832 3 81 .00000 |
|(4) vs (3) 136.633 5 .00000 57.733 5 81 .00000 |
|(5) vs (4) 45.323 14 .00004 3.133 14 67 .00085 |
|(5) vs (3) 181.956 20 .00000 21.947 20 67 .00000 |
+--------------------------------------------------------------------+
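
The likelihood ratio column of the hypothesis test panel can be checked by hand from the
log-likelihoods listed above it. The Python sketch below is an added illustration, not part of
the LIMDEP output. The function name lr_statistic is hypothetical, and the formula
LR = 2(lnL_unrestricted - lnL_restricted) is the standard likelihood ratio statistic, which is
assumed here to be what LIMDEP reports.

```python
# Log-likelihoods copied from the LIMDEP "Test Statistics" panel above.
loglik = {
    1: -138.35814,  # (1) constant term only
    2: -90.48804,   # (2) group effects only
    3: 61.76991,    # (3) X variables only (pooled OLS)
    4: 130.08647,   # (4) X and group effects
    5: 152.74790,   # (5) X, group, and time effects
}


def lr_statistic(unrestricted, restricted):
    """Likelihood ratio chi-squared: 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (loglik[unrestricted] - loglik[restricted])


lr_4_vs_3 = lr_statistic(4, 3)  # group effects added to the pooled model
lr_5_vs_4 = lr_statistic(5, 4)  # time effects added to the group effect model
print(round(lr_4_vs_3, 3), round(lr_5_vs_4, 3))
```

The two values should agree with the rows "(4) vs (3)" and "(5) vs (4)" of the panel up to
rounding of the reported log-likelihoods.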

6.10 Testing Two-way Fixed Effects

The null hypothesis is that parameters of group and time dummies are zero:
$H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$. The F test compares the pooled
regression and two-way fixed group and time effect model. The F statistic of 23.1085 rejects
the null hypothesis at the .01 significance level (p<.0001).

$$\frac{(1.3354 - .1768)/(6 + 15 - 2)}{(.1768)/(6*15 - 6 - 15 - 3 + 1)} \sim 23.1085\,[19, 67]$$
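
The arithmetic of this F test can be retraced with a few lines of Python. This is an added
sketch under stated assumptions: the helper name f_statistic is hypothetical, the rounded SSEs
are those used in the formula above, and the longer SSEs are the sums of squares printed in the
LIMDEP output of 6.9.

```python
def f_statistic(sse_restricted, sse_unrestricted, df_num, df_den):
    """F = [(SSE_r - SSE_u) / df_num] / [SSE_u / df_den]."""
    return ((sse_restricted - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)


n, T, k = 6, 15, 3                # airlines, years, regressors (output, fuel, load)
df_num = (n - 1) + (T - 1)        # 19 group and time dummies under test
df_den = n * T - n - T - k + 1    # 67 residual degrees of freedom

# SSEs rounded to four decimals, as in the formula above.
f_rounded = f_statistic(1.3354, 0.1768, df_num, df_den)
# SSEs at the precision printed by LIMDEP (pooled OLS and two-way fixed model).
f_full = f_statistic(1.335449522, 0.1768479062, df_num, df_den)
print(df_num, df_den, round(f_rounded, 4), round(f_full, 2))
```

The rounded inputs give the 23.1085 quoted in the text, while the unrounded sums of squares
give a value close to the 23.10 that PROC REG reports; the difference is rounding only.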


The SAS TSCSREG and PANEL procedures conduct this F-test for the group and time effects.
You may also run the following SAS REG procedure and Stata .regress command to perform
the same test. The Stata output is skipped.

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
TEST g1=g2=g3=g4=g5=t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

Test 1 Results for Dependent Variable cost

Mean
Source DF Square F Value Pr > F

Numerator 19 0.06098 23.10 <.0001
Denominator 67 0.00264

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1 g2 g3 g4 g5 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

7. Random Effect Models

A random effect model examines how group and/or time affect error variances. This model is
appropriate for n individuals who were drawn randomly from a large population. This chapter
focuses on the feasible generalized least squares (FGLS) with variance component estimation
methods (see footnote 10).


7.1 One-way Random Group Effect Model

When the omega matrix is not known, you have to estimate $\theta$ using the SSEs of the between
group effect model (.0317) and the fixed group effect model (.2926).

The variance component of error $\hat{\sigma}_v^2$ is .00361263 = .292622872/(6*15-6-3)
The variance component of group $\hat{\sigma}_u^2$ is .01559712 = .031675926/(6-4) - .00361263/15

Thus, $\hat{\theta}$ is

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.00361263}{15 * .031675926/(6-4)}} = .87668488,$$

where $\hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K} = \frac{.031675926}{6-4} = .01583796$.
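
The three quantities above can be reproduced from the two SSEs alone. The following Python
sketch is an added illustration; the variable names are hypothetical, and k = 3 (regressors
excluding the intercept, so K = k + 1 = 4) is read off the divisors used in the text.

```python
import math

n, T, k = 6, 15, 3           # groups, periods, regressors excluding the intercept
sse_within = 0.292622872     # SSE of the fixed group effect model
sse_between = 0.031675926    # SSE of the between group effect model

var_v = sse_within / (n * T - n - k)        # variance component of error
var_between = sse_between / (n - (k + 1))   # error variance of the between model
var_u = var_between - var_v / T             # variance component of group
theta = 1.0 - math.sqrt(var_v / (T * var_u + var_v))

print(f"var_v={var_v:.8f}  var_u={var_u:.8f}  theta={theta:.8f}")
```

Because T*var_u + var_v equals T*var_between by construction, either form of the denominator
gives the same theta.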

Next, transform the dependent and independent variables including the intercept using $\hat{\theta}$.

. gen rg_cost = cost - .87668488*gm_cost
. gen rg_output = output - .87668488*gm_output
. gen rg_fuel = fuel - .87668488*gm_fuel
. gen rg_load = load - .87668488*gm_load
. gen rg_int = 1 - .87668488 // for the intercept

Finally, run the OLS with the transformed variables. Do not forget to suppress the intercept.
This is the groupwise heteroscedastic regression model (Greene 2003).

. regress rg_cost rg_int rg_output rg_fuel rg_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 4, 86) =19642.72
Model | 284.670313 4 71.1675783 Prob > F = 0.0000
Residual | .311586777 86 .003623102 R-squared = 0.9989
-------------+------------------------------ Adj R-squared = 0.9989
Total | 284.9819 90 3.16646556 Root MSE = .06019

------------------------------------------------------------------------------
rg_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rg_int | 9.627911 .2101638 45.81 0.000 9.210119 10.0457
rg_output | .9066808 .0256249 35.38 0.000 .8557401 .9576215
rg_fuel | .4227784 .0140248 30.15 0.000 .394898 .4506587
rg_load | -1.0645 .2000703 -5.32 0.000 -1.462226 -.6667731
------------------------------------------------------------------------------

Footnote 10: Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a
modified Wallace and Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora
method, and Henderson’s method III. They also discuss maximum likelihood (ML) estimators,
restricted ML estimators, minimum norm quadratic unbiased estimators (MINQUE), and minimum
variance quadratic unbiased estimators (MIVQUE). Based on a Monte Carlo simulation, they argue
that ANOVA estimators are Best Quadratic Unbiased estimators of the variance components for the
balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are recommended for the
unbalanced models.

7.2 Estimations in SAS, Stata, and LIMDEP

In SAS, the TSCSREG and PANEL procedures have the /RANONE option to fit the one-way
random effect model. These procedures by default use the Fuller and Battese (1974) estimation
method, which produces slightly different estimates from FGLS.

PROC PANEL has the /VCOMP=WK option for the Wansbeek and Kapteyn (1989) method,
which is the groupwise heteroscedastic regression. The BP option of the MODEL statement,
not available in PROC TSCSREG, conducts the Breusch-Pagan LM test for random effects.
Unlike PROC PANEL, PROC TSCSREG does not have VCOMP= to specify the type of
variance component estimation.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE BP VCOMP=WK;
RUN;

The PANEL Procedure
Wansbeek and Kapteyn Variance Components (RanOne)

Dependent Variable: cost

Model Description

Estimation Method RanOne
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.3111 DFE 86
MSE 0.0036 Root MSE 0.0601
R-Square 0.9923


Variance Component Estimates

Variance Component for Cross Sections 0.016015
Variance Component for Error 0.003613


Hausman Test for
Random Effects

DF m Value Pr > m

2 1.63 0.4429


Breusch Pagan Test for Random
Effects (One Way)

DF m Value Pr > m

1 334.85 <.0001


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.629513 0.2107 45.71 <.0001
output 1 0.906918 0.0257 35.30 <.0001
fuel 1 0.422676 0.0140 30.11 <.0001
load 1 -1.06452 0.2000 -5.32 <.0001

PROC PANEL and PROC TSCSREG estimate the same variance component for error (.0036)
but a different variance component for groups (.0160 versus .4744). Notice that there are some
differences in the output of PROC TSCSREG (variance component estimates and Hausman test)
between SAS 9.2 and 9.1.3.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE;
RUN;

(output is skipped)

Alternatively, you may use PROC MIXED to get similar results; the estimates differ slightly
because PROC MIXED uses restricted maximum likelihood (REML) by default. The following script
returns a set of random effect estimates. Unlike SAS 9.1.3, SAS 9.2 requires the CLASS statement
to explicitly specify an effect variable, airline in this case.

PROC MIXED DATA=masil.airline;
CLASS airline;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) airline 0.01674
Residual 0.003609


Fit Statistics

-2 Res Log Likelihood -210.4
AIC (smaller is better) -206.4
AICC (smaller is better) -206.3
BIC (smaller is better) -206.8


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 107.49 <.0001


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.6322 0.2116 5 45.53 <.0001
output 0.9073 0.02581 81 35.16 <.0001
fuel 0.4225 0.01406 81 30.05 <.0001
load -1.0646 0.1998 81 -5.33 <.0001


Solution for Random Effects

Std Err
Effect airline Estimate Pred DF t Value Pr > |t|

Intercept 1 0.01012 0.06594 81 0.15 0.8784
Intercept 2 -0.03450 0.06239 81 -0.55 0.5818
Intercept 3 -0.2106 0.05507 81 -3.82 0.0003
Intercept 4 0.1691 0.05581 81 3.03 0.0033
Intercept 5 0.002981 0.06180 81 0.05 0.9616
Intercept 6 0.06291 0.06349 81 0.99 0.3247


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 81 1235.88 <.0001
fuel 1 81 903.03 <.0001
load 1 81 28.40 <.0001

In Stata, the .xtreg command has the re option to produce FGLS estimates. Let us specify
airline as a panel identification variable using the .iis command. The theta option reports
an estimated theta (.8767).

. iis airline

. xtreg cost output fuel load, re theta

Random-effects GLS regression Number of obs = 90
Group variable: airline Number of groups = 6

R-sq: within = 0.9925 Obs per group: min = 15
between = 0.9856 avg = 15.0
overall = 0.9876 max = 15

Random effects u_i ~ Gaussian Wald chi2(3) = 11091.33
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .87668503

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9066805 .025625 35.38 0.000 .8564565 .9569045
fuel | .4227784 .0140248 30.15 0.000 .3952904 .4502665
load | -1.064499 .2000703 -5.32 0.000 -1.456629 -.672368
_cons | 9.627909 .210164 45.81 0.000 9.215995 10.03982
-------------+----------------------------------------------------------------
sigma_u | .12488859
sigma_e | .06010514
rho | .81193816 (fraction of variance due to u_i)
------------------------------------------------------------------------------

The sigma_u and sigma_e are square roots of the variance components for groups and errors
(.0156=.1249^2, .0036=.0601^2).
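
As a quick check on how the last three rows of the .xtreg output fit together, the Python
sketch below (an added illustration with hypothetical variable names) squares sigma_u and
sigma_e and then recomputes rho and theta from them. It assumes the usual definitions
rho = sigma_u^2/(sigma_u^2 + sigma_e^2) and the theta formula given in 7.1.

```python
import math

T = 15                   # years per airline
sigma_u = 0.12488859     # reported by .xtreg, re
sigma_e = 0.06010514

var_u = sigma_u ** 2     # variance component for groups
var_e = sigma_e ** 2     # variance component for errors
rho = var_u / (var_u + var_e)                          # share of variance due to u_i
theta = 1.0 - math.sqrt(var_e / (T * var_u + var_e))   # quasi-demeaning weight

print(f"var_u={var_u:.7f}  var_e={var_e:.7f}  rho={rho:.6f}  theta={theta:.6f}")
```

The recomputed rho and theta should match the rho and theta lines of the Stata output up to
rounding of the printed standard deviations.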

Alternatively, .xtmixed fits the same model, the random-intercept model. The || airline:,
option tells Stata to fit the model using the subject variable airline. Standard deviations of
the group and error components are reported under the labels sd(_cons) and sd(Residual);
squaring them gives the variance components (.0167=.1294^2, .0036=.0601^2).

. xtmixed cost output fuel load || airline:,

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log restricted-likelihood = 105.20458
Iteration 1: log restricted-likelihood = 105.20458

Computing standard errors:

Mixed-effects REML regression Number of obs = 90
Group variable: airline Number of groups = 6

Obs per group: min = 15
avg = 15.0
max = 15


Wald chi2(3) = 11114.85
Log restricted-likelihood = 105.20458 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9073166 .025809 35.16 0.000 .856732 .9579013
fuel | .4225032 .0140598 30.05 0.000 .3949465 .45006
load | -1.064572 .1997763 -5.33 0.000 -1.456126 -.6730179
_cons | 9.632212 .211559 45.53 0.000 9.217564 10.04686
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
airline: Identity |
sd(_cons) | .1293723 .0429029 .0675403 .2478107
-----------------------------+------------------------------------------------
sd(Residual) | .0600715 .0047138 .051508 .0700588
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 107.49 Prob >= chibar2 = 0.0000

You may use the maximum likelihood estimation to fit the random effect (or random intercept)
model. In SAS, add METHOD=ML to PROC MIXED. PROC PANEL and TSCSREG do not
have such an option.

PROC MIXED DATA=masil.airline METHOD=ML;
CLASS airline;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) airline 0.01302
Residual 0.003494


Fit Statistics

-2 Log Likelihood -229.5
AIC (smaller is better) -217.5
AICC (smaller is better) -216.4
BIC (smaller is better) -218.7


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 105.92 <.0001


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.6186 0.2026 5 47.47 <.0001
output 0.9053 0.02466 81 36.72 <.0001
fuel 0.4234 0.01364 81 31.05 <.0001
load -1.0645 0.1962 81 -5.42 <.0001


Solution for Random Effects

Std Err
Effect airline Estimate Pred DF t Value Pr > |t|

Intercept 1 0.01306 0.05994 81 0.22 0.8281
Intercept 2 -0.03211 0.05640 81 -0.57 0.5707
Intercept 3 -0.2094 0.04900 81 -4.27 <.0001
Intercept 4 0.1676 0.04976 81 3.37 0.0012
Intercept 5 0.000761 0.05580 81 0.01 0.9892
Intercept 6 0.06008 0.05750 81 1.04 0.2992


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 81 1348.19 <.0001
fuel 1 81 963.88 <.0001
load 1 81 29.43 <.0001

In Stata, the mle option is used in .xtreg and .xtmixed commands to produce the same result.
You may also try .xtgls that fits panel data models with heteroscedasticity across and within
groups. Notice that error variance components are computed as .0130=.1141^2 and .0035
= .0591^2. Compare the output of PROC MIXED above and .xtreg below.

. xtreg cost output fuel load, re mle

Random-effects ML regression Number of obs = 90
Group variable: airline Number of groups = 6

Random effects u_i ~ Gaussian Obs per group: min = 15
avg = 15.0
max = 15

LR chi2(3) = 436.32
Log likelihood = 114.72896 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9053099 .0253759 35.68 0.000 .8555741 .9550458
fuel | .4233757 .013888 30.48 0.000 .3961557 .4505957
load | -1.064456 .196231 -5.42 0.000 -1.449062 -.6798506
_cons | 9.618648 .206622 46.55 0.000 9.213677 10.02362
-------------+----------------------------------------------------------------
/sigma_u | .1140843 .0345293 .0630373 .2064687
/sigma_e | .0591072 .0045701 .0507956 .0687787
rho | .7883772 .1047419 .5365302 .9344669
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 105.92 Prob>=chibar2 = 0.000

. xtmixed cost output fuel load || airline:, mle
(output is skipped)

. xtgls cost output fuel load, i(airline) panels(hetero) corr(independent)
(output is skipped)

In LIMDEP, you have to specify Panel, Random Effect, and Het= subcommands for the
groupwise heteroscedastic model. LIMDEP estimates a slightly different variance component
for groups (.0119), thus producing different parameter estimates.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Het=AIRLINE;Random Effect$

+----------------------------------------------------+
| OLS Without Group Dummy Variables |
| Ordinary least squares regression |
| Model was estimated Aug 30, 2009 at 08:26:15PM |
| LHS=COST Mean = 13.36561 |
| Standard deviation = 1.131971 |
| WTS=none Number of observs. = 90 |
| Model size Parameters = 4 |
| Degrees of freedom = 86 |
| Residuals Sum of squares = 1.335450 |
| Standard error of e = .1246133 |
| Fit R-squared = .9882897 |
| Adjusted R-squared = .9878812 |
| Model test F[ 3, 86] (prob) =2419.33 (.0000) |
| Diagnostic Log likelihood = 61.76991 |
| Restricted(b=0) = -138.3581 |
| Chi-sq [ 3] (prob) = 400.26 (.0000) |
| Info criter. LogAmemiya Prd. Crt. = -4.121594 |
| Akaike Info. Criter. = -4.121653 |
+----------------------------------------------------+

+----------------------------------------------------+
| Panel Data Analysis of COST [ONE way] |
| Unconditional ANOVA (No regressors) |
| Source Variation Deg. Free. Mean Square |
| Between 74.6799 5. 14.9360 |
| Residual 39.3611 84. .468584 |
| Total 114.041 89. 1.28136 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88273863 .01325455 66.599 .0000 -1.17430918
FUEL | .45397771 .02030424 22.359 .0000 12.7703592
LOAD | -1.62750780 .34530293 -4.713 .0000 .56046016
Constant| 9.51691223 .22924522 41.514 .0000

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 6 |
| Smallest 15, Largest 15 |
| Average group size 15.00 |
+----------------------------------------------------+

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i) |
| Estimates: Var[e] = .361260D-02 |
| Var[u] = .119159D-01 |
| Corr[v(i,t),v(i,s)] = .767356 |
| Lagrange Multiplier Test vs. Model (3) = 334.85 |
| ( 1 df, prob value = .000000) |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic = 334.85 |
| Sum of Squares .147779D+01 |
| R-squared .987042D+00 |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .90412380 .02461548 36.730 .0000 -1.17430918
FUEL | .42389869 .01374650 30.837 .0000 12.7703592
LOAD | -1.06455866 .19933132 -5.341 .0000 .56046016
Constant| 9.61063438 .20277404 47.396 .0000

7.3 One-way Random Time Effect Model

Let us compute $\hat{\theta}$ using the SSEs of the between time effect model (.0056) and the fixed time
effect model (1.0882).

The variance component for error $\hat{\sigma}_v^2$ is .01511375 = 1.08819022/(15*6-15-3)
The variance component for time $\hat{\sigma}_u^2$ is -.00201072 = .005590631/(15-4) - .01511375/6

The $\hat{\theta}$ is

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{n\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.01511375}{6 * .005590631/(15-4)}} = -1.226263$$
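
The same hand computation for the time effect is sketched below in Python, mainly to make the
sign problem visible. This is an added illustration; the variable names and the is_admissible
flag are hypothetical, and the SSEs are the ones quoted above.

```python
import math

n, T, k = 6, 15, 3          # airlines, years, regressors excluding the intercept
sse_within = 1.08819022     # SSE of the fixed time effect model
sse_between = 0.005590631   # SSE of the between time effect model

var_v = sse_within / (T * n - T - k)        # variance component for error
var_between = sse_between / (T - (k + 1))   # error variance of the between model
var_u = var_between - var_v / n             # variance component for time
theta = 1.0 - math.sqrt(var_v / (n * var_between))

# A variance cannot be negative and theta should lie between 0 and 1.
is_admissible = var_u >= 0.0 and 0.0 <= theta <= 1.0
print(f"var_v={var_v:.8f}  var_u={var_u:.8f}  theta={theta:.6f}  ok={is_admissible}")
```

Here var_u comes out negative and theta falls outside the unit interval, so the flag is False
and the hand-transformed regression that follows should be read with caution.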


. gen rt_cost = cost - (-1.226263)*tm_cost
. gen rt_output = output - (-1.226263)*tm_output
. gen rt_fuel = fuel - (-1.226263)*tm_fuel
. gen rt_load = load - (-1.226263)*tm_load
. gen rt_int = 1 - (-1.226263) // for the intercept

. regress rt_cost rt_int rt_output rt_fuel rt_load, noc

Source | SS df MS Number of obs = 90
-------------+------------------------------ F( 4, 86) = .
Model | 79944.1804 4 19986.0451 Prob > F = 0.0000
Residual | 1.79271995 86 .020845581 R-squared = 1.0000
-------------+------------------------------ Adj R-squared = 1.0000
Total | 79945.9732 90 888.288591 Root MSE = .14438

------------------------------------------------------------------------------
rt_cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rt_int | 9.516098 .1489281 63.90 0.000 9.220038 9.812157
rt_output | .8883838 .0143338 61.98 0.000 .8598891 .9168785
rt_fuel | .4392731 .0129051 34.04 0.000 .4136186 .4649277
rt_load | -1.279176 .2482869 -5.15 0.000 -1.772754 -.7855982
------------------------------------------------------------------------------

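However, a negative variance component for time is not admissible, so the estimates above are
questionable. SAS and Stata below set the component to zero (theta = 0), which makes their
random time effect estimates identical to the pooled OLS estimates.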

In SAS, use the TSCSREG or PANEL procedure with the /RANONE option. Notice that the
data are sorted by year and airline. The /VCOMP=WH option in the MODEL statement
employs Wallace and Hussain’s method to estimate variance components and produces the
same parameter estimates.

PROC SORT DATA=masil.airline;
BY year airline;

PROC TSCSREG DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /RANONE;
RUN;
(Output is skipped)

PROC PANEL DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /RANONE BP VCOMP=WH;
RUN;

The PANEL Procedure
Wallace and Hussain Variance Components (RanOne)

Dependent Variable: cost

Model Description

Estimation Method RanOne
Number of Cross Sections 15
Time Series Length 6


Fit Statistics

SSE 1.3354 DFE 86
MSE 0.0155 Root MSE 0.1246
R-Square 0.9883


Variance Component Estimates

Variance Component for Cross Sections 0
Variance Component for Error 0.016437


Hausman Test for
Random Effects

DF m Value Pr > m

2 12.17 0.0023


Breusch Pagan Test for Random
Effects (One Way)

DF m Value Pr > m

1 1.55 0.2135


Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.516923 0.2292 41.51 <.0001
output 1 0.882739 0.0133 66.60 <.0001
fuel 1 0.453977 0.0203 22.36 <.0001
load 1 -1.62751 0.3453 -4.71 <.0001

PROC MIXED fits the same random time effect model although /SOLUTION in the
RANDOM statement does not work to produce random effect parameter estimates in this case.

PROC MIXED DATA=masil.airline;
CLASS year;
MODEL cost = output fuel load /SOLUTION;
RANDOM INTERCEPT / SUBJECT=year TYPE=UN;
RUN;

The Mixed Procedure

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) year 0
Residual 0.01553


Fit Statistics

-2 Res Log Likelihood -102.9
AIC (smaller is better) -100.9
AICC (smaller is better) -100.9
BIC (smaller is better) -100.2


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

0 0.00 1.0000


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 9.5169 0.2292 14 41.51 <.0001
output 0.8827 0.01325 72 66.60 <.0001
fuel 0.4540 0.02030 72 22.36 <.0001
load -1.6275 0.3453 72 -4.71 <.0001


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

output 1 72 4435.44 <.0001
fuel 1 72 499.92 <.0001
load 1 72 22.22 <.0001

In Stata, you have to switch group and time variables using the .tsset command.

. tsset year airline
panel variable: year (strongly balanced)
time variable: airline, 1 to 6
delta: 1 unit

. xtreg cost output fuel load, re i(year) theta

Random-effects GLS regression Number of obs = 90
Group variable: year Number of groups = 15

R-sq: within = 0.9843 Obs per group: min = 6
between = 0.9966 avg = 6.0
overall = 0.9883 max = 6

Random effects u_i ~ Gaussian Wald chi2(3) = 7258.03
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = 0

------------------------------------------------------------------------------
cost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .8827385 .0132545 66.60 0.000 .8567602 .9087169
fuel | .453977 .0203042 22.36 0.000 .4141815 .4937724
load | -1.62751 .345302 -4.71 0.000 -2.30429 -.9507309
_cons | 9.516923 .2292445 41.51 0.000 9.067612 9.966233
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .12293801
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------

You may run the following command to get the same result.

. xtmixed cost output fuel load || year:,
(output is skipped)

In LIMDEP, you need to use the Str= and Random subcommands. The output below includes
only the random effect part. You may find that parameter estimates of SAS, Stata, and
LIMDEP are slightly different from each other.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Het=YEAR;Random$

+----------------------------------------------------+
| Panel:Groups Empty 0, Valid data 15 |
| Smallest 6, Largest 6 |
| Average group size 6.00 |
+----------------------------------------------------+

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i) |
| Estimates: Var[e] = .151138D-01 |
| Var[u] = .414686D-03 |
| Corr[v(i,t),v(i,s)] = .026705 |
| Lagrange Multiplier Test vs. Model (3) = 1.55 |
| ( 1 df, prob value = .213557) |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic = 1.55 |
| Sum of Squares .133564D+01 |
| R-squared .988288D+00 |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
OUTPUT | .88285277 .01314515 67.162 .0000 -1.17430918
FUEL | .45500533 .02122856 21.434 .0000 12.7703592
LOAD | -1.66267268 .35084190 -4.739 .0000 .56046016
Constant| 9.52363173 .24108843 39.503 .0000

7.4 Two-way Random Effect Model in SAS

The random group and time effect model is formulated as
$y_{it} = \alpha + \beta' X_{ti} + u_i + \gamma_t + \varepsilon_{it}$. Let us first estimate the two way FGLS using
the SAS PANEL procedure with the /RANTWO option. The BP2 option conducts the Breusch-Pagan
LM test for the two-way random effect model.

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANTWO;
RUN;
(Output is skipped)

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANTWO BP2;
RUN;

The PANEL Procedure
Fuller and Battese Variance Components (RanTwo)

Dependent Variable: cost

Model Description

Estimation Method RanTwo
Number of Cross Sections 6
Time Series Length 15


Fit Statistics

SSE 0.2322 DFE 86
MSE 0.0027 Root MSE 0.0520
R-Square 0.9829


Variance Component Estimates

Variance Component for Cross Sections 0.017439
Variance Component for Time Series 0.001081
Variance Component for Error 0.00264


Hausman Test for
Random Effects

DF m Value Pr > m

3 6.93 0.0741

Breusch Pagan Test for Random
Effects (Two Way)

DF m Value Pr > m

2 336.40 <.0001

Parameter Estimates

Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 9.362677 0.2440 38.38 <.0001
output 1 0.866448 0.0255 33.98 <.0001
fuel 1 0.436163 0.0172 25.41 <.0001
load 1 -0.98053 0.2235 -4.39 <.0001

The following .xtmixed command suffers from a convergence problem in this case, and the
LIMDEP command produces different results (output is skipped).

. xtmixed cost output fuel load || airline: || year:, mle

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Random Effect$

7.5 Testing Random Effect Models

The Breusch-Pagan Lagrange multiplier (LM) test is designed to test random effects. The null
hypothesis of the one-way random group effect model is that individual-specific or time-series
error variances are zero: 0 :
2
0
=
u
H o . If the null hypothesis is not rejected, the pooled
© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 83
http://www.indiana.edu/~statmath

83
regression model is appropriate. The e’e of the pooled OLS is 1.33544153 and
e e'
is .0665147.

LM is $\frac{6 \cdot 15}{2(15-1)}\left[\frac{15^2 \cdot .0665}{1.3354} - 1\right]^2 = 334.8496 \sim \chi^2(1)$ with p < .0001.
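
As a check on the arithmetic above, the following minimal Python sketch recomputes the LM statistic from the two sums of squares reported in the text. Python is not one of the packages covered in this document, and the variable names are chosen here for illustration only.

```python
# Breusch-Pagan LM statistic for the one-way random group effect,
# recomputed from the sums reported in the text (n = 6 airlines, T = 15 years).
n, T = 6, 15
sse_pooled = 1.33544153      # e'e of the pooled OLS residuals
ss_group_means = 0.0665147   # sum of squared group means of those residuals

# T^2 * (sum of squared group means) / e'e
ratio = (T ** 2) * ss_group_means / sse_pooled
lm_group = (n * T) / (2 * (T - 1)) * (ratio - 1) ** 2

print(round(lm_group, 2))
```

The result should agree with the 334.85 reported by .xttest0 up to rounding of the inputs.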

With the large chi-squared of 334.8496, we reject the null hypothesis in favor of the random
group effect model. The SAS PANEL procedure with the /BP option and the LIMDEP Panel
and Het subcommands report the same LM statistic (see 7.2). In Stata, run the .xttest0
command right after estimating the one-way random group effect model.

. quietly xtreg cost output fuel load, re i(airline)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

cost[airline,t] = Xb + u[airline] + e[airline,t]

Estimated results:
| Var sd = sqrt(Var)
---------+-----------------------------
cost | 1.281358 1.131971
e | .0036126 .0601051
u | .0155972 .1248886

Test: Var(u) = 0
chi2(1) = 334.85
Prob > chi2 = 0.0000

The null hypothesis of the one-way random time effect model is that the variance component for
time is zero, $H_0: \sigma_u^2 = 0$. The following LM test uses Baltagi’s formula. The small
chi-squared of 1.5472 does not reject the null hypothesis at conventional significance levels.
SAS and LIMDEP return the same LM statistic (see 7.3).

LM is $\frac{Tn}{2(n-1)}\left[\frac{\sum_t (n\bar{e}_{\cdot t})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2 = \frac{15 \cdot 6}{2(6-1)}\left[\frac{.7817}{1.3354} - 1\right]^2 = 1.5472 \sim \chi^2(1)$ with p = .2135.

. quietly xtreg cost output fuel load, re i(year)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

cost[year,t] = Xb + u[year] + e[year,t]

Estimated results:
| Var sd = sqrt(Var)
---------+-----------------------------
cost | 1.281358 1.131971
e | .0151138 .122938
u | 0 0

Test: Var(u) = 0
chi2(1) = 1.55
Prob > chi2 = 0.2135

The two-way random effect model has the null hypothesis that the variance components for
groups and time are all zero. The LM statistic with two degrees of freedom is 336.3968 =
334.8496 + 1.5472 (p<.0001).
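
The three LM statistics can also be computed directly from the pooled OLS residuals. The function below is a hedged sketch in Python, assuming a balanced panel whose residuals are stored as a list of n rows (groups) with T entries (periods) each; the function name and the toy residuals are hypothetical and are not taken from the airline data.

```python
def breusch_pagan_lm(resid):
    """Return (LM_group, LM_time, LM_two_way) for a balanced panel.

    resid: list of n lists, each holding the T pooled OLS residuals of one group.
    """
    n, T = len(resid), len(resid[0])
    sse = sum(e * e for row in resid for e in row)
    # Squared residual sums within each group and within each time period
    group_ss = sum(sum(row) ** 2 for row in resid)
    time_ss = sum(sum(resid[i][t] for i in range(n)) ** 2 for t in range(T))
    lm_group = n * T / (2 * (T - 1)) * (group_ss / sse - 1) ** 2
    lm_time = n * T / (2 * (n - 1)) * (time_ss / sse - 1) ** 2
    # The two-way statistic is the sum of the two one-way statistics
    return lm_group, lm_time, lm_group + lm_time


# Toy residuals for n = 2 groups and T = 2 periods (illustration only)
toy = [[1.0, -1.0], [2.0, -2.0]]
lm_g, lm_t, lm_2 = breusch_pagan_lm(toy)
print(lm_g, lm_t, lm_2)
```

Each one-way statistic is compared with a chi-squared distribution with one degree of freedom, and their sum with two degrees of freedom.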

7.6 Fixed Effects versus Random Effects

How do we compare a fixed effect model and its counterpart random effect model? The
Hausman specification test examines if the individual effects are uncorrelated with the other
regressors in the model. Since computation is complicated, let us conduct the test in Stata.

. tsset airline year
panel variable: airline (strongly balanced)
time variable: year, 1 to 15
delta: 1 unit

. quietly xtreg cost output fuel load, fe

. estimates store fixed_group

. quietly xtreg cost output fuel load, re

. hausman fixed_group .

---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fixed_group . Difference S.E.
-------------+----------------------------------------------------------------
output | .9192846 .9066805 .0126041 .0153877
fuel | .4174918 .4227784 -.0052867 .0058583
load | -1.070396 -1.064499 -.0058974 .0255088
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 2.12
Prob>chi2 = 0.5469
(V_b-V_B is not positive definite)

The Hausman statistic of 2.12 differs from PROC PANEL’s 1.63 and Greene (2003)’s 4.16
because SAS, Stata, and LIMDEP use different estimation methods and thus produce slightly
different parameter estimates. None of these tests, however, rejects the null hypothesis, so
the random effect model is favored.
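
The quadratic form printed in the .hausman output can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the routine that Stata, SAS, or LIMDEP actually runs: the helper names are hypothetical, the covariance matrices must be supplied in full (the output above lists only the square roots of the diagonal of V_b-V_B), and the toy numbers are not from the airline data.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    m = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def hausman(b_fe, b_re, v_fe, v_re):
    """H = (b-B)' [V_b - V_B]^(-1) (b-B), compared with chi-squared(k)."""
    k = len(b_fe)
    d = [b_fe[i] - b_re[i] for i in range(k)]
    v = [[v_fe[i][j] - v_re[i][j] for j in range(k)] for i in range(k)]
    x = solve(v, d)  # x = [V_b - V_B]^(-1) (b-B)
    return sum(d[i] * x[i] for i in range(k))


# Toy inputs with two slopes (illustration only)
h = hausman([1.0, 2.0], [0.0, 1.0],
            [[3.0, 1.0], [1.0, 3.0]],
            [[1.0, 0.0], [0.0, 1.0]])
print(round(h, 4))
```

The sketch fails with a division by zero when V_b-V_B is singular, and it does not check positive definiteness, which is the condition the Stata output warns about.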

7.7 Summary

Table 7.1 summarizes random effect estimations in SAS, Stata, and LIMDEP. PROC PANEL
is highly recommended.

Table 7.1 Comparison of the Random Effect Model in SAS, Stata, and LIMDEP*

                       SAS 9.2             SAS 9.2             Stata 11          LIMDEP 9
Procedure/Command      PROC TSCSREG        PROC PANEL          .xtreg            Regress; Panel$
One-way                /RANONE             /RANONE WK          re                Str=;Random$
Two-way                /RANTWO             /RANTWO             No                Str=;Period;Random$
SSE (e’e)              Slightly different  Correct             No                Incorrect
MSE or SEE             Slightly different  Correct             No                No
Model test (F)         No                  No                  Wald test         No
(adjusted) R²          Slightly different  Slightly different  Incorrect         Incorrect
Intercept              Slightly different  Correct             Correct           Slightly different
Coefficients           Slightly different  Correct             Correct           Slightly different
Standard errors        Slightly different  Correct             Correct           Slightly different
Variance for group     Slightly different  Correct             Correct (sigma)   Slightly different
Variance for error     Correct             Correct             Correct (sigma)   Correct
Theta                  No                  No                  theta             No
Breusch-Pagan (LM)     No                  BP, BP2             .xttest0          Yes
Hausman Test (H)       Incorrect           Yes                 .hausman          Yes (unstable)

* “Yes/No” means whether a software package reports the statistic. “Correct/incorrect” indicates whether the
statistics are different from those of the groupwise heteroscedastic regression.
8. Poolability Test

Table 8.1 summarizes the results of the pooled OLS, fixed effect, and random effect models. We
may ask, “Which model is better than the others?” Do we have to consider individual-specific
or time effects? Are these effects fixed or random?

Table 8.1 Summary of Pooled, Fixed Effect, and Random Effect Models

Model           Output     Fuel       Load        SSE/SEE    DF   F           R² (Adj.)
Pooled          .8827**    .4540**    -1.6275**   1.3354     86   2419.34     .9883
                (.0133)    (.0203)    (.3453)     (.1246)         (p<.0000)   (.9879)
Between group   .7825*     -5.5239    -1.7511     .0317      2    104.12      .9936
                (.1088)    (4.4787)   (2.7432)    (.1259)         (p<.0095)   (.9841)
Between time    1.1333**   .3342**    -1.3507**   .0056      11   4074.33     .9991
                (.0513)    (.0228)    (.2478)     (.0225)         (p<.0000)   (.9989)
Fixed group     .9193**    .4175**    -1.0704**   .2926      81   3935.79     .9974
                (.0299)    (.0152)    (.2017)     (.0601)         (p<.0000)   (.9972)
Fixed time      .8677**    -.4845     -1.9544**   1.0882     72   439.62      .9905
                (.0154)    (.3641)    (.4424)     (.1229)         (p<.0001)   (.9882)
Two-way fixed   .8173**    .1686      -.8828**    .1769      67   1960.82     .9984
                (.0319)    (.1635)    (.2617)     (.0514)         (p<.0000)   (.9979)
Random group    .9069**    .4227**    -1.0645**   .3111      86               .9923
                (.0257)    (.0140)    (.2000)     (.0601)
Random time     .8820**    .2749+     -2.0050**   1.1722     86               .9848
                (.0134)    (.0568)    (.4184)     (.1167)
Two-way random  .8664**    .4362**    -.9805**    .2322      86               .9829
                (.0255)    (.0172)    (.2235)     (.0520)


The poolability test examines whether panel data are poolable, that is, whether the slopes of
regressors are the same across individual entities or time periods. For the poolability test, you
need to run group by group OLS regressions and/or time by time OLS regressions. If the null
hypothesis is rejected, the panel data are not poolable. In this case, you may consider the
random coefficient model and hierarchical regression model.

8.1 Group by Group OLS Regression

In SAS, use the BY statement in PROC REG. Do not forget to sort the data set in advance.

PROC SORT DATA=masil.airline;
BY airline;
RUN;

PROC REG DATA=masil.airline;
MODEL cost = output fuel load;
BY airline;
RUN;

In Stata, the if qualifier makes it easy to run group by group regressions.

forvalues i= 1(1)6 { // run group by group regression
display "OLS regression for group " `i'
regress cost output fuel load if airline==`i'
}

OLS regression for group 1

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 1843.46
Model | 3.41824348 3 1.13941449 Prob > F = 0.0000
Residual | .006798918 11 .000618083 R-squared = 0.9980
-------------+------------------------------ Adj R-squared = 0.9975
Total | 3.4250424 14 .244645886 Root MSE = .02486

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.18318 .0968946 12.21 0.000 .9699164 1.396444
fuel | .3865867 .0181946 21.25 0.000 .3465406 .4266329
load | -2.461629 .4013571 -6.13 0.000 -3.34501 -1.578248
_cons | 10.846 .2972551 36.49 0.000 10.19174 11.50025
------------------------------------------------------------------------------
OLS regression for group 2

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 3129.50
Model | 6.47622084 3 2.15874028 Prob > F = 0.0000
Residual | .007587838 11 .000689803 R-squared = 0.9988
-------------+------------------------------ Adj R-squared = 0.9985
Total | 6.48380868 14 .463129191 Root MSE = .02626

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.459104 .0792856 18.40 0.000 1.284597 1.63361
fuel | .3088958 .0272443 11.34 0.000 .2489315 .36886
load | -2.724785 .2376522 -11.47 0.000 -3.247854 -2.201716
_cons | 11.97243 .4320951 27.71 0.000 11.02139 12.92346
------------------------------------------------------------------------------
OLS regression for group 3

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 608.10
Model | 3.79286673 3 1.26428891 Prob > F = 0.0000
Residual | .022869767 11 .00207907 R-squared = 0.9940
-------------+------------------------------ Adj R-squared = 0.9924
Total | 3.8157365 14 .272552607 Root MSE = .0456

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .7268305 .1554418 4.68 0.001 .3847054 1.068956
fuel | .4515127 .0381103 11.85 0.000 .3676324 .5353929
load | -.7513069 .6105989 -1.23 0.244 -2.095226 .5926122
_cons | 8.699815 .8985786 9.68 0.000 6.722057 10.67757
------------------------------------------------------------------------------
OLS regression for group 4

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 777.86
Model | 7.37252558 3 2.45750853 Prob > F = 0.0000
Residual | .034752343 11 .003159304 R-squared = 0.9953
-------------+------------------------------ Adj R-squared = 0.9940
Total | 7.40727792 14 .52909128 Root MSE = .05621

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9353749 .0759266 12.32 0.000 .7682616 1.102488
fuel | .4637263 .044347 10.46 0.000 .3661192 .5613333
load | -.7756708 .4707826 -1.65 0.128 -1.811856 .2605148
_cons | 9.164608 .6023241 15.22 0.000 7.838902 10.49031
------------------------------------------------------------------------------
OLS regression for group 5

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 1999.89
Model | 7.08313716 3 2.36104572 Prob > F = 0.0000
Residual | .012986435 11 .001180585 R-squared = 0.9982
-------------+------------------------------ Adj R-squared = 0.9977
Total | 7.09612359 14 .506865971 Root MSE = .03436

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | 1.076299 .0771255 13.96 0.000 .9065471 1.246051
fuel | .2920542 .0434213 6.73 0.000 .1964845 .3876239
load | -1.206847 .3336308 -3.62 0.004 -1.941163 -.4725305
_cons | 11.77079 .7430078 15.84 0.000 10.13544 13.40614
------------------------------------------------------------------------------
OLS regression for group 6

Source | SS df MS Number of obs = 15
-------------+------------------------------ F( 3, 11) = 2602.49
Model | 11.1173565 3 3.70578551 Prob > F = 0.0000
Residual | .015663323 11 .001423938 R-squared = 0.9986
-------------+------------------------------ Adj R-squared = 0.9982
Total | 11.1330199 14 .795215705 Root MSE = .03774

------------------------------------------------------------------------------
cost | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
output | .9673393 .0321728 30.07 0.000 .8965275 1.038151
fuel | .3023258 .0308235 9.81 0.000 .2344839 .3701678
load | .1050328 .4767508 0.22 0.830 -.9442886 1.154354
_cons | 10.77381 .4095921 26.30 0.000 9.872309 11.67532
------------------------------------------------------------------------------

8.2 Poolability Test across Groups

The null hypothesis of the poolability test across groups is $H_0: \beta_{ik} = \beta_k$. The
$e'e$ is 1.3354, the SSE of the pooled OLS regression. The $\sum_i e_i'e_i$ is .1007 = .0068 +
.0076 + .0229 + .0348 + .0130 + .0157.

The F statistic is $\frac{(1.3354 - .1007)/[(6-1) \cdot 4]}{.1007/[6(15-4)]} = 40.4812 \sim F[20, 66]$.

The large F statistic of 40.4812 rejects the null hypothesis of poolability (p < .0001). We
conclude that the panel data are not poolable with respect to airline.

8.3 Poolability Test over Time

The null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The sum of
$e_t'e_t$ is computed from the 15 time by time regressions.

forvalues i= 1(1)15 { // run year by year regression
display "OLS regression for year " `i'
regress cost output fuel load if year==`i'
}

(output is skipped)

. di .044807673 + .023093978 + .016506613 + .012170358 + .014104542 + ///
.000469826 + .063648817 + .085430285 + .049329439 + .077112957 + ///
.029913538 + .087240016 + .143348297 + .066075346 + .037256216

.7505079

The F statistic is $\frac{(1.3354 - .7505)/[(15-1) \cdot 4]}{.7505/[15(6-4)]} = .4175 \sim F[56, 30]$.

The small F statistic does not reject the null hypothesis (p > .99), so the panel data are
poolable with respect to time.
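
Both poolability tests share the same F ratio, so the arithmetic can be sketched once in Python. The function name and argument names below are hypothetical; the inputs are the pooled SSE and the sums of the residual sums of squares from the separate regressions shown earlier, and k = 4 counts the intercept along with the three slopes.

```python
def poolability_f(sse_pooled, sse_separate, n_regs, obs_per_reg, k):
    """Return (F, df1, df2) for a poolability test on a balanced panel.

    sse_pooled:   SSE of the pooled OLS regression
    sse_separate: sum of SSEs over the separate regressions
    n_regs:       number of separate regressions (groups or time periods)
    obs_per_reg:  observations in each separate regression
    k:            parameters per regression, including the intercept
    """
    df1 = (n_regs - 1) * k
    df2 = n_regs * (obs_per_reg - k)
    f = ((sse_pooled - sse_separate) / df1) / (sse_separate / df2)
    return f, df1, df2


# Across groups: 6 airline-by-airline regressions with 15 observations each
f_group = poolability_f(1.33544153, 0.100658624, 6, 15, 4)
# Over time: 15 year-by-year regressions with 6 observations each
f_time = poolability_f(1.33544153, 0.7505079, 15, 6, 4)
print(f_group, f_time)
```

Each F value is then compared with the F distribution using the two degrees of freedom that the function returns.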

9. Conclusion

Panel data are analyzed to investigate group and time effects using fixed effect and random
effect models. The fixed effect model asks how group and/or time affect the intercept, while the
random effect model analyzes error variance structures affected by group and/or time. Slopes
are assumed unchanged in both fixed effect and random effect models.

A panel data set needs to be arranged in the long format as shown in Section 1.1. If the number
of groups (subjects) or time periods is extremely large, panel data models may be less useful
because the null hypothesis of the F test is too strong. In that case, you may consider
categorizing subjects to reduce the number of groups. If data are severely unbalanced, read
output with caution and consider dropping subjects with many missing data points. This
document assumes that data are balanced without missing values.

Fixed effect models are estimated by least squares dummy variable (LSDV) regression and by
the within effect model. LSDV has three approaches to avoid perfect multicollinearity. LSDV1
drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and
imposes restrictions instead. LSDV1 is commonly used since it produces correct statistics.
LSDV2 provides actual parameter estimates of groups (Y-intercepts), but reports incorrect R²
and F statistics. Notice that the dummy parameters of the three LSDV approaches have
different meanings, and thus their t-tests examine different hypotheses.

The within effect model does not use dummy variables but deviations from group means. Thus,
this model is useful when there are many groups and/or time periods in the panel data set since
it is able to avoid the incidental parameter problem. The dummy parameter estimates need to be
computed afterward. Because of its larger degrees of freedom, the within effect model produces
incorrect MSE and standard errors of parameters. As a result, you need to adjust the standard
errors to conduct correct t-tests.
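
The size of this adjustment for the airline data can be sketched as follows. This is a minimal Python illustration, assuming the within regression is run by OLS on demeaned data without an intercept, so that OLS counts nT - k degrees of freedom where nT - n - k is correct; the variable names are chosen here for illustration.

```python
import math

n, T, k = 6, 15, 3          # airlines, years, slopes (output, fuel, load)
df_ols = n * T - k          # 87: degrees of freedom OLS uses on demeaned data
df_within = n * T - n - k   # 81: correct degrees of freedom (see Table 8.1)

# Multiply each reported standard error by this factor
adjust = math.sqrt(df_ols / df_within)
print(round(adjust, 4))
```

Since the factor exceeds one, the unadjusted standard errors are too small and the unadjusted t statistics too large.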

Random effect models are estimated by generalized least squares (GLS) and feasible
generalized least squares (FGLS). When the variance structure is known, GLS is used. If it is
unknown, FGLS estimates theta. Parameter estimates vary depending on estimation methods.
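
For the one-way random group effect model, theta is commonly written as $\theta = 1 - \sqrt{\sigma_v^2 / (T\sigma_u^2 + \sigma_v^2)}$. The Python sketch below evaluates this expression with the variance components that .xttest0 reports in Section 7.5; it is an illustration with variable names chosen here, and other estimation methods give somewhat different components and hence a different theta.

```python
import math

T = 15                 # time periods per airline
var_e = 0.0036126      # idiosyncratic error variance, "e" in the .xttest0 output
var_u = 0.0155972      # group variance component, "u" in the .xttest0 output

# Share of each group mean that is subtracted in the FGLS transformation
theta = 1 - math.sqrt(var_e / (T * var_u + var_e))
print(round(theta, 4))
```

A theta close to one means that the random effect estimates lie close to the within (fixed group effect) estimates, while a theta of zero reduces the model to pooled OLS.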

Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange
multiplier test. The Hausman specification test compares a fixed effect model and a random
effect model. If the null hypothesis of no correlation is rejected, the fixed effect model is
preferred. Poolability is tested by running group by group or time by time regressions.

Among the four statistical packages addressed in this document, I would recommend SAS and
Stata. In particular, PROC PANEL provides various ways of analyzing panel data and reports
correct (adjusted) statistics (see Tables 4.1 and 7.1). Stata is very handy for manipulating panel
data but reports incorrect F-test and R² statistics. LIMDEP is able to estimate various panel
data models but is not good at data management. SPSS is least recommended for panel data
models.

Extensions to these basic linear panel data models include dynamic models with autocorrelation,
random coefficient models, hierarchical linear models, and logit/probit models.
Appendix: Data Sets

Data set 1: Data of the top 50 information technology firms presented in OECD Information
Technology Outlook 2004 (http://thesius.sourceoecd.org/).

URL: http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.csv
http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta

firm = IT company name
type = type of IT firm
rnd = 2002 R&D investment in current USD millions
income = 2000 net income in current USD millions
d1 = 1 for equipment and software firms and 0 for telecommunication and electronics

. tab type d1

| d1
Type of Firm | 0 1 | Total
----------------+----------------------+----------
Telecom | 18 0 | 18
Electronics | 17 0 | 17
IT Equipment | 0 6 | 6
Comm. Equipment | 0 5 | 5
Service & S/W | 0 4 | 4
----------------+----------------------+----------
Total | 35 15 | 50

. sum rnd income

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
rnd | 39 2023.564 1615.417 0 5490
income | 50 2509.78 3104.585 -732 11797


Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).

URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
http://www.indiana.edu/~statmath/stat/all/panel/airline.dta

airline = airline (six airlines)
year = year (fifteen years)
output0 = output in revenue passenger miles, index number
cost0 = total cost in $1,000
fuel0 = fuel price
load = load factor, the average capacity utilization of the fleet

. sum output0 cost0 fuel0 load

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
output0 | 90 .5449946 .5335865 .037682 1.93646
cost0 | 90 1122524 1192075 68978 4748320
fuel0 | 90 471683 329502.9 103795 1015610
load | 90 .5604602 .0527934 .432066 .676287
References

Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. Wiley, John & Sons.
Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of
Alternative Estimators for the Unbalanced One-way Error Component Regression
Model." Journal of Econometrics, 62(2): 67-89.
Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to
Model Specification in Econometrics." Review of Economic Studies, 47(1):239-253.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and
Applications. New York: Cambridge University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. TX: Stata
Press.
Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC:
SAS Institute.
Fuller, Wayne A. and George E. Battese. 1973. "Transformations for Estimation of Linear
Models with Nested-Error Structure." Journal of the American Statistical
Association, 68(343) (September): 626-632.
Fuller, Wayne A. and George E. Battese. 1974. "Estimation of Linear Models with Crossed-
Error Structure." Journal of Econometrics, 2: 67-78.
Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide 1. Plainview,
New York: Econometric Software.
Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6):1251-1271.
SAS Institute. 2004. SAS/ETS 9.1 User’s Guide. Cary, NC: SAS Institute.
SAS Institute. 2004. SAS/STAT 9.1 User’s Guide. Cary, NC: SAS Institute.
SPSS Inc. 2007. SPSS 16.0 Command Syntax Reference. Chicago, IL: SPSS Inc.
Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.
Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College
Station, TX: Stata Press.
Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata
Press.
Suits, Daniel B. 1984. “Dummy Variables: Mechanics V. Interpretation.” Review of Economics
& Statistics 66 (1):177-180.
Uyar, Bulent, and Orhan Erdem. 1990. "Regression Procedures in SAS: Problems?" American
Statistician 44(4): 296-301.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel
Data. Cambridge, MA: MIT Press.








Acknowledgements

I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of
the School of Public and Environmental Affairs, Indiana University at Bloomington. I am also
grateful to Jeremy Albright, Dani Marinova, and Kevin Wilhite at the UITS Center for
Statistical and Mathematical Computing for comments and suggestions. A special thanks to
many readers around the world who have eagerly provided constructive feedback and
encouraged me to keep improving this document.


Revision History

 2005.11 First draft
 2008.04, 11 Corrected some errors and added Stata examples
 2009.09 Second draft (updated LSDV section and analysis output)

© 2005-2009 The Trustees of Indiana University (9/16/2009)

Linear Regression Models for Panel Data: 2

This document summarizes linear regression models for panel data and illustrates how to estimate each model using SAS 9.2, Stata 11, LIMDEP 9, and SPSS 17. This document does not address nonlinear models (i.e., logit and probit models) and dynamic models, but focuses on basic linear regression models. 1. 2. 3. 4. 5. 6. 7. 8. 9. Introduction Least Squares Dummy Variable Regression Panel Data Models One-way Fixed Effect Models: Fixed Group Effect One-way Fixed Effect Models: Fixed Time Effect Two-way Fixed Effect Models Random Effect Models Poolability Test Conclusion Appendix References

1. Introduction
Panel (or longitudinal) data are cross-sectional and time-series. There are multiple entities, each of which has repeated measurements at different time periods. U.S. Census Bureau’s Census 2000 data at the state or county level are cross-sectional but not time-series, while annual sales figures of Apple Computer Inc. for the past 20 years are time series but not cross-sectional. If annual sales data of IBM, LG, Siemens, Microsoft, and AT&T during the same periods are also available, they are panel data. The cumulative General Social Survey (GSS), American National Election Studies (ANES), and Current Population Survey (CPS) data are not panel data in the sense that individual respondents vary across survey years. Panel data may have group effects, time effects, or the both, which are analyzed by fixed effect and random effect models. 1.1 Data Arrangement A panel data set contains n entities or subjects (e.g., firms and states), each of which includes T observations measured at 1 through t time period. Thus, the total number of observations is nT. Ideally, panel data are measured at regular time intervals (e.g., year, quarter, and month). Otherwise, panel data should be analyzed with caution. A short panel data set has many entities but few time periods (small T), while a long panel has many time periods (large T) but few entities (Cameron and Trivedi 2009: 230). Panel data have a cross-section (entity or subject) variable and a time-series variable. In Stata, this arrangement is called the long form (as opposed to the wide form). While the long form has both group (individual level) and time variables, the wide form includes either group or time variable. Look at the following data set to see how panel data are arranged. There are 6 groups

http://www.indiana.edu/~statmath

2

© 2005-2009 The Trustees of Indiana University (9/16/2009)

Linear Regression Models for Panel Data: 3

(airlines) and 15 time periods (years). The .use command below loads a Stata data set through TCP/IP and in 1/20 of the .list command displays the first 20 observations.
. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear (Cost of U.S. Airlines (Greene 2003)) . list airline year load cost output fuel in 1/20, sep(20) +------------------------------------------------------------+ | airline year load cost output fuel | |------------------------------------------------------------| | 1 1 .534487 13.9471 -.0483954 11.57731 | | 1 2 .532328 14.01082 -.0133315 11.61102 | | 1 3 .547736 14.08521 .0879925 11.61344 | | 1 4 .540846 14.22863 .1619318 11.71156 | | 1 5 .591167 14.33236 .1485665 12.18896 | | 1 6 .575417 14.4164 .1602123 12.48978 | | 1 7 .594495 14.52004 .2550375 12.48162 | | 1 8 .597409 14.65482 .3297856 12.6648 | | 1 9 .638522 14.78597 .4779284 12.85868 | | 1 10 .676287 14.99343 .6018211 13.25208 | | 1 11 .605735 15.14728 .4356969 13.67813 | | 1 12 .61436 15.16818 .4238942 13.81275 | | 1 13 .633366 15.20081 .5069381 13.75151 | | 1 14 .650117 15.27014 .6001049 13.66419 | | 1 15 .625603 15.3733 .6608616 13.62121 | | 2 1 .490851 13.25215 -.652706 11.55017 | | 2 2 .473449 13.37018 -.626186 11.62157 | | 2 3 .503013 13.56404 -.4228269 11.68405 | | 2 4 .512501 13.8148 -.2337306 11.65092 | | 2 5 .566782 14.00113 -.1708536 12.27989 | +------------------------------------------------------------+

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.

If data are structured in the wide form, you need to rearrange data first. Stata has the .reshape command to rearrange a data set back and forth between the long and wide form. The following command changes from the long form to wide one so that the wide form has only six observations that have a group variable and as many variables as the time period (4*15 year).
. keep airline year load cost output fuel . reshape wide cost output fuel load, i(airline) j(year) (note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) Data long -> wide ----------------------------------------------------------------------------Number of obs. 90 -> 6 Number of variables 6 -> 61 j variable (15 values) year -> (dropped) xij variables: cost -> cost1 cost2 ... cost15 output -> output1 output2 ... output15 fuel -> fuel1 fuel2 ... fuel15 load -> load1 load2 ... load15 -----------------------------------------------------------------------------

If you wish to rearrange the data set back to the long form, run the following command.
. reshape long cost output fuel load, i(airline) j(year)

In balanced panel data, all entities have measurements in all time periods. In a contingency table of cross-sectional and time-series variables, each cell should have only one frequency. When each entity in a data set has different numbers of observations due to missing values, the panel data are not balanced. Some cells in the contingency table have zero frequency. In
http://www.indiana.edu/~statmath 3

1. The feasible generalized least squares (FGLS) method is used to estimate the variance structure when  is not known.indiana. the dummies act as an error term. are fixed effect models. while random effects are examined by the Lagrange Multiplier (LM) test (Breusch and Pagan 1980). There are various estimation methods for FGLS including the maximum likelihood method and simulation (Baltagi and Cheng 1994). Fixed effect models use least squares dummy variable (LSDV) and within effect estimation methods. otherwise. estimates variance components for groups (or times) and error. The difference among groups (or time periods) lies in their variance of the error term. Table 1. http://www. If dummies are considered as a part of the intercept. Unbalanced panel data entail some computational and estimation issues although most software packages are able to handle both balanced and unbalanced data. within effect method Incremental F test Random Effect Model ' yit    X it   (ui  vit ) Constant Varying across groups and/or times Constant GLS.2 Fixed Effect versus Random Effect Models Panel data models examine fixed and/or random effects of entity (individual or subject) or time. the total number of observations is not nT. a random effect model is better than its fixed counterpart. not in their intercepts. by contrast. this is a fixed effect model. a variance structure among groups. Since a group (individual specific) effect is time invariant and considered a part of the intercept. a core OLS assumption is violated. A fixed group effect model examines group differences in intercepts.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 4 unbalanced panel data. In a random effect model. If the null hypothesis is not rejected.1). in fact. v2 ) A random effect model. FGLS Breusch-Pagan LM test vit ~ IID(0. ui is a part of the errors and thus should not be correlated to any regressor. is known. 
The Hausman specification test (Hausman 1978) compares fixed effect and random effect models. A random effect model is estimated by generalized least squares (GLS) when the  matrix. Fixed effects are tested by the (incremental) F test. The core difference between fixed and random effect models lies in the role of dummy variables (Table 1. A typical example is the groupwise heteroscedastic regression model (Greene 2003). the pooled OLS regression is favored. Ordinary least squares (OLS) regressions with dummies.1 Fixed Effect and Random Effect Models Fixed Effect Model ' Functional form* yit  (  ui )  X it   vit Intercepts Error variances Slopes Estimation Hypothesis test * Varying across groups and/or times Constant Constant LSDV.edu/~statmath 4 . assuming the same intercept and slopes. assuming the same slopes and constant variance across entities or subjects. ui is allowed to be correlated to other regressors. If the null hypothesis that the individual effects are uncorrelated with the other regressors in the model is not rejected.

If one cross-sectional or time-series variable is considered (e.g., country, firm, and race), this is called a one-way fixed or random effect model. Two-way effect models have two sets of dummy variables for group and/or time variables (e.g., state and year).

1.3 Estimation and Software Issues

The LSDV regression, within effect model, between effect model (group or time mean model), GLS, and FGLS are fundamentally based on OLS in terms of estimation. Thus, any procedure and command for OLS is good for linear panel data models (Table 1.2).

The REG procedure of SAS/STAT, Stata .regress (.cnsreg), LIMDEP regress$, and SPSS regression commands all fit LSDV1 by dropping one dummy and have options to suppress the intercept (LSDV2). SAS, Stata, and LIMDEP can estimate OLS with restrictions (LSDV3), but SPSS cannot. In Stata, the .cnsreg command requires restrictions defined in the .constraint command.

Table 1.2 Procedures and Commands in SAS, Stata, LIMDEP, and SPSS

| | SAS 9.2 | Stata 11 | LIMDEP 9 | SPSS 17 |
|---|---|---|---|---|
| Regression (OLS) | PROC REG | .regress | Regress$ | Regression |
| LSDV1 | w/o a dummy | w/o a dummy | w/o a dummy | w/o a dummy |
| LSDV2 | /NOINT | noconstant | w/o One in Rhs | /Origin |
| LSDV3 | RESTRICT | .cnsreg | Cls: | N/A |
| One-way fixed effect (within) | TSCSREG /FIXONE; PANEL /FIXONE | .xtreg, fe; .areg, abs | Regress;Panel;Str=;Fixed$ | N/A |
| Two-way fixed (within effect) | TSCSREG /FIXTWO; PANEL /FIXTWO | N/A | Regress;Panel;Str=;Period=;Fixed$ | N/A |
| Between effect | PANEL /BTWNG; PANEL /BTWNT | .xtreg, be | Regress;Panel;Str=;Means$ | N/A |
| One-way random effect | TSCSREG /RANONE; PANEL /RANONE; MIXED /RANDOM | .xtreg, re; .xtgls; .xtmixed | Regress;Panel;Str=;Random$ | N/A |
| Two-way random | TSCSREG /RANTWO; PANEL /RANTWO | .xtmixed | Regress;Panel;Str=;Period=;Random$ | N/A |
| Random coefficient model | MIXED /RANDOM | .xtmixed; .xtrc | Regress;RPM=;Str=$ | N/A |

SAS, Stata, and LIMDEP also provide the procedures and commands that estimate panel data models in a convenient way (Table 1.2). SAS/ETS has the TSCSREG and PANEL procedures to estimate one-way and two-way fixed/random effect models.1 These procedures estimate the within effect model for a fixed effect model and by default employ the Fuller-Battese method (1974) to estimate variance components for group, time, and error for a random effect model. PROC TSCSREG and PROC PANEL also support other estimation methods such as the Parks (1967) autoregressive model and the Da Silva moving average method.

PROC TSCSREG can handle balanced data only, whereas PROC PANEL is able to deal with balanced and unbalanced data. PROC PANEL requires that each entity (subject) has more than one observation. PROC TSCSREG provides one-way and two-way fixed and random effect models, while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled OLS regression (/POOLED) as well. PROC PANEL has BP and BP2 options to conduct the Breusch-Pagan LM test for random effects, while PROC TSCSREG does not.2 Despite advanced features of PROC PANEL, the output of the two procedures is similar. PROC MIXED is also able to fit random effect and random coefficient (parameter) models and supports maximum likelihood estimation that is not available in PROC PANEL and TSCSREG.

The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a between effect model with be, and a random effect model with re. This command, however, does not directly fit two-way fixed and random effect models.3 The .areg command with the absorb option, equivalent to .xtreg with the fe option, fits the one-way within effect model that has a large dummy variable set. A random effect model can also be estimated using the .xtmixed command. Stata has .xtgls that fits panel data models with heteroscedasticity across groups and/or autocorrelation within groups.

The LIMDEP Regress$ command with the Panel subcommand estimates panel data models. The Fixed effect subcommand fits a fixed effect model, Random effect estimates a random effect model, and Means is for a between effect model. SPSS has limited ability to analyze panel data.

1.4 Data Sets

This document uses two data sets. A cross-sectional data set contains research and development (R&D) expenditure data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004. A panel data set has cost data for U.S. airlines (1970-1984), which are used in Econometric Analysis (Greene 2003). See the Appendix for the details.

1 PROC PANEL was an experimental procedure in 9.13 but becomes a regular procedure in 9.2. SAS 9.13 users need to download and install PROC PANEL from http://www.sas.com/apps/demosdownloads/setupintro.jsp.
2 However, BP and BP2 produce invalid Breusch-Pagan statistics in cases of unbalanced data. http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/etsug_panel_sect041.htm
3 You may fit the two-way fixed effect model by including a set of dummies and using the fe option. For the two-way random effect model, you need to use the .xtmixed command instead of .xtreg.

2. Least Squares Dummy Variable Regression

A dummy variable is a binary variable that is coded to either 1 or zero. It is commonly used to examine group and time effects in regression analysis. Consider a simple model of regressing R&D expenditure in 2002 on 2000 net income and firm type. The dummy variable d1 is set to 1 for equipment and software firms and zero for telecommunication and electronics. The variable d2 is coded in the opposite way. Take a look at the data structure (Figure 2.1).

Figure 2.1 Dummy Variable Coding for Firm Types

+-----------------------------------------------------------------+
| firm               rnd   income   type               d1    d2   |
|-----------------------------------------------------------------|
| LG Electronics     551      356   Electronics         0     1   |
| AT&T               254    4,669   Telecom             0     1   |
| IBM              4,750    8,093   IT Equipment        1     0   |
| Ericsson         4,424    2,300   Comm. Equipment     1     0   |
| Siemens          5,490    6,528   Electronics         0     1   |
| Verizon              .   11,797   Telecom             0     1   |
| Microsoft        3,772    9,421   Service & S/W       1     0   |
| …                    …        …   …                   …     …   |

2.1 Model 1 without a Dummy Variable: Pooled OLS

The ordinary least squares (OLS) regression without dummy variables, a pooled regression model, assumes a constant intercept and slope regardless of firm types. In the following regression equation, $\beta_0$ is the intercept; $\beta_1$ is the slope of net income in 2000; and $\varepsilon_i$ is the error term.

Model 1: $R\&D_i = \beta_0 + \beta_1 income_i + \varepsilon_i$

The pooled model fits the data well at the .05 significance level (F=7.07, p<.0115). R2 of .1604 says that this model accounts for 16 percent of the total variance. The model has the intercept of 1,482.697 and slope of .2231. For a $ one million increase in net income, a firm is likely to increase R&D expenditure by $ .2231 million (p<.012).

. use http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta, clear
(R&D expenditure of IT firm (OECD 2002))

. regress rnd income

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  1,    37) =    7.07
       Model |  15902406.5     1  15902406.5           Prob > F      =  0.0115
    Residual |  83261299.1    37  2250305.38           R-squared     =  0.1604
-------------+------------------------------           Adj R-squared =  0.1377
       Total |  99163705.6    38   2609571.2           Root MSE      =  1500.1

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2230523   .0839066     2.66   0.012     .0530414    .3930632
       _cons |   1482.697   314.7957     4.71   0.000     844.8599    2120.533
------------------------------------------------------------------------------


Pooled model: R&D = 1,482.697 + .2231*income

Despite moderate goodness of fit statistics such as F and t, this is a naïve model. R&D investment tends to vary across industries.
2.2 Model 2 with a Dummy Variable

You may assume that equipment and software firms have more R&D expenditure than other types of companies. Let us take this group difference into account.4 We have to drop one of the two dummy variables in order to avoid perfect multicollinearity. That is, OLS does not work with both dummies in a model. The $\delta_1$ in Model 2 is the coefficient of equipment, service, and software companies.

Model 2: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$

Model 2 fits the data better than Model 1. The p-value of the F test is .0054 (significant at the .01 level); R2 is .2520, about .1 larger than that of Model 1; SSE (sum of squares due to error or residual) decreases from 83,261,299 to 74,175,757 and SEE (square root of MSE) also declines accordingly (1,500→1,435). The coefficient of d1 is statistically discernible from zero at the .05 level (t=2.10, p<.043). Unlike Model 1, this model results in two different regression equations for two groups. The difference lies in the intercepts, but the slope remains unchanged.
. regress rnd income d1

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  2,    36) =    6.06
       Model |  24987948.9     2  12493974.4           Prob > F      =  0.0054
    Residual |  74175756.7    36  2060437.69           R-squared     =  0.2520
-------------+------------------------------           Adj R-squared =  0.2104
       Total |  99163705.6    38   2609571.2           Root MSE      =  1435.4

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |   1006.626   479.3717     2.10   0.043     34.41498    1978.837
       _cons |   1133.579   344.0583     3.29   0.002     435.7962    1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.2050 + .2180*income = 1,133.579 + 1,006.6260*1 + .2180*income
d1=0: R&D = 1,133.5790 + .2180*income = 1,133.579 + 1,006.6260*0 + .2180*income

The slope .2180 indicates a positive impact of two-year-lagged net income on a firm’s R&D expenditure. Equipment and software firms on average spend $1,007 million (=2,140.205-1,133.579) more for R&D than telecommunication and electronics companies.
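The comparison of Model 1 and Model 2 can be re-derived from the sums of squares in the two Stata outputs. The following is a minimal Python sketch, not part of the original analysis; the variable names are illustrative assumptions, and the numbers are copied from the outputs above. It recomputes R2 and the root MSE, and the incremental F statistic for adding d1, whose square root should agree with the t statistic of d1.

```python
import math

# Sums of squares copied from the two Stata outputs (assumed transcribed correctly).
sse_model1 = 83261299.1   # residual SS of Model 1 (pooled OLS)
sse_model2 = 74175756.7   # residual SS of Model 2 (with d1)
sst = 99163705.6          # total SS, identical in both models
df_resid2 = 36            # residual degrees of freedom of Model 2 (39 - 3)
extra_params = 1          # Model 2 adds one parameter (the d1 coefficient)

r2_model1 = 1 - sse_model1 / sst
r2_model2 = 1 - sse_model2 / sst
root_mse2 = math.sqrt(sse_model2 / df_resid2)

# Incremental F test: loss of fit from dropping d1, relative to Model 2's MSE.
f_incremental = ((sse_model1 - sse_model2) / extra_params) / (sse_model2 / df_resid2)

print(round(r2_model1, 4), round(r2_model2, 4))
print(round(root_mse2, 1))
print(round(f_incremental, 2), round(math.sqrt(f_incremental), 2))
```

With a single added dummy, the incremental F test and the t-test of d1 are the same test, which is why the square root of F reproduces the reported t of 2.10.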
2.3 Visualization of Model 1 and 2

4 The dummy variable (firm types) and regressors (net income) may or may not be correlated.


There is only a tiny difference in the slope (.2231 versus .2180) between Model 1 and Model 2. The intercept 1,483 of Model 1, however, is quite different from 2,140 for equipment and software companies and 1,134 for telecommunications and electronics in Model 2. This result appears to be supportive of Model 2.

Figure 2.2 highlights differences between Model 1 and 2 more clearly. The red line (pooled) in the middle is the regression line of Model 1; the dotted blue line at the top is one for equipment and software companies (d1=1) in Model 2; finally the dotted green line at the bottom is for telecommunication and electronics firms (d2=1 or d1=0).

Figure 2.2. Regression Lines of Model 1 and Model 2

[Plot: 2002 R&D Investment of OECD IT Firms. X-axis: Income (USD Millions); Y-axis: R&D (USD Millions). Fitted lines, top to bottom: R&D=2140+.218*Income; R&D=1483+.223*Income; R&D=1134+.218*Income.]

Source: OECD Information Technology Outlook 2004. http://thesius.sourceoecd.org/

This plot shows that Model 1 ignores the group difference, and thus reports a misleading intercept. The difference in the intercept between the two groups of firms looks substantial. However, the two models have similar slopes. Consequently, Model 2 considering a fixed group effect (i.e., firm type) seems better than the simple Model 1. Compare goodness of fit statistics (e.g., F, R2, and SSE) of the two models. See Sections 3.2.2 and 4.7 for formal hypothesis tests.
2.4 Least Squares Dummy Variable Regression: LSDV1, LSDV2, and LSDV3

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with dummy variables. Model 2 above is a typical example of LSDV. The key issue in LSDV is how to avoid perfect multicollinearity, the so-called “dummy variable trap.” LSDV has three approaches to avoid getting caught in the trap. These approaches differ from each other with respect to model estimation and interpretation of dummy variable parameters (Suits 1984: 177). They produce different dummy parameter estimates, but their results are equivalent.

The first approach, LSDV1, drops a dummy variable as shown in Model 2 above. That is, the parameter of the eliminated dummy variable is set to zero and is used as a baseline (Table 2.1). The variable to be dropped, $d^{LSDV1}_{dropped}$ (d2 in Model 2), needs to be carefully (as opposed to arbitrarily) selected so that it can play the role of the reference group effectively. LSDV2 includes all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero). Finally, LSDV3 includes the intercept and all dummies, and then imposes a restriction that the sum of parameters of all dummies is zero. Each approach has a constraint (restriction) that reduces the number of parameters to be estimated by one and thus makes the model identified. The following functional forms compare these three LSDVs.
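LSDV1: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \varepsilon_i$ or $R\&D_i = \beta_0 + \beta_1 income_i + \delta_2 d_{2i} + \varepsilon_i$
LSDV2: $R\&D_i = \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$
LSDV3: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$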

Table 2.1. Three Approaches of the Least Squares Dummy Variable Regression Model

| | LSDV1 | LSDV2 | LSDV3 |
|---|---|---|---|
| Dummies included | $d_1^{LSDV1}$ to $d_d^{LSDV1}$ except for $d_{dropped}^{LSDV1}$ | $d_1^*$ to $d_d^*$ | $d_1^{LSDV3}$ to $d_d^{LSDV3}$ |
| Intercept? | $\alpha^{LSDV1}$ | No | $\alpha^{LSDV3}$ |
| All dummies? | No (d-1) | Yes (d) | Yes (d) |
| Constraint (restriction)? | $\delta_{dropped}^{LSDV1} = 0$ (Drop one dummy) | $\alpha^{LSDV2} = 0$ (Suppress the intercept) | $\sum \delta_i^{LSDV3} = 0$ (Impose a restriction) |
| Actual dummy parameters | $\delta_i^* = \alpha^{LSDV1} + \delta_i^{LSDV1}$, $\delta_{dropped}^* = \alpha^{LSDV1}$ | $\delta_1^*, \delta_2^*, \ldots, \delta_d^*$ | $\delta_i^* = \alpha^{LSDV3} + \delta_i^{LSDV3}$, $\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*$ |
| Meaning of a dummy coefficient | How far away from the reference group (dropped)? | Actual intercept | How far away from the average group effect? |
| H0 of the t-test | $\delta_i^* - \delta_{dropped}^* = 0$ | $\delta_i^* = 0$ | $\delta_i^* - \frac{1}{d}\sum \delta_i^* = 0$ |

Source: Constructed from Suits (1984) and David Good’s lecture (2004)
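The "actual dummy parameters" row of Table 2.1 implies that any one LSDV fit can be converted into the other two. The following Python sketch illustrates the conversion starting from the LSDV1 estimates of Model 2 (intercept 1,133.579, d1 coefficient 1,006.626, d2 dropped). The variable names are illustrative assumptions; only the two estimates come from the document.

```python
# LSDV1 estimates from Model 2 (d2 dropped, so d2's group is the baseline).
alpha_lsdv1 = 1133.579
delta_lsdv1 = {"d1": 1006.626, "d2": 0.0}  # the dropped dummy has coefficient 0

# LSDV2: actual group intercepts = LSDV1 intercept + LSDV1 dummy coefficient.
actual_intercepts = {name: alpha_lsdv1 + coef for name, coef in delta_lsdv1.items()}

# LSDV3: the intercept is the average of the actual intercepts, and each
# dummy coefficient is the deviation of a group from that average.
alpha_lsdv3 = sum(actual_intercepts.values()) / len(actual_intercepts)
delta_lsdv3 = {name: value - alpha_lsdv3 for name, value in actual_intercepts.items()}

print(actual_intercepts)  # LSDV2 dummy coefficients
print(alpha_lsdv3)        # LSDV3 intercept
print(delta_lsdv3)        # LSDV3 dummy coefficients, which sum to zero
```

Up to rounding, these values should match the LSDV2 and LSDV3 estimates reported in Section 2.5 (2,140.205 and 1,133.579; 1,636.892 and ±503.313).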

Three approaches end up fitting the same model but the coefficients of dummy variables in each approach have different meanings and thus are numerically different (Table 2.1). A parameter estimate in LSDV2, $\delta_d^*$, is the actual intercept (Y-intercept) of group d. It is easy to interpret substantively. The t-test examines if $\delta_d^*$ is zero. In LSDV1, a dummy coefficient shows the extent to which the actual intercept of group d deviates from the reference point (the parameter of the dropped dummy variable), which is the intercept of LSDV1, $\delta_{dropped}^* = \alpha^{LSDV1}$.5

5 In Model 2, $\hat{\delta}_1$ of 1,007 is the estimated (relative) distance between two types of firm (equipment and software versus telecommunications and electronics). In Figure 2.2, the Y-intercept of equipment and software (absolute distance from the origin) is 2,140 = 1,134+1,006. The Y-intercept of telecommunications and electronics is 1,134.
The null hypothesis holds that the deviation from the reference group is zero. In LSDV3, a dummy coefficient means how far its actual parameter is away from the average group effect (Suits 1984: 178). The average effect is the intercept of LSDV3: $\alpha^{LSDV3} = \frac{1}{d}\sum \delta_i^*$. Therefore, the null hypothesis is that the deviation from the average is zero.

In short, each approach has a different baseline and thus tests a different hypothesis but produces exactly the same parameter estimates of regressors. They all fit the same model; given one LSDV fitted, in other words, we can replicate the other two LSDVs. Table 2.1 summarizes differences in estimation and interpretation of the three LSDVs.

Which approach is better than the others? You need to consider both estimation and interpretation issues carefully. In general, LSDV1 is often preferred because of easy estimation in statistical software packages. Oftentimes researchers want to see how far dummy parameters deviate from the reference group rather than what the actual intercept of each group is. LSDV2 and LSDV3 involve some estimation problems; for example, LSDV2 reports an incorrect R2.

2.5 Estimating Three LSDVs

The SAS REG procedure, Stata .regress command, LIMDEP Regress$ command, and SPSS Regression command all fit OLS and LSDVs. Let us estimate three LSDVs using SAS, Stata, and LIMDEP.

2.5.1 LSDV 1 without a Dummy

LSDV 1 drops a dummy variable. The intercept is the actual parameter estimate (absolute distance from the origin) of the dropped dummy variable. The coefficient of a dummy included means how far its parameter estimate is away from the reference point or baseline (i.e., the intercept). Here we include d2 instead of d1 to see how a different reference point changes the result. Check the sign of the dummy coefficient and the intercept.

PROC REG DATA=masil.rnd2002;
   MODEL rnd = income d2;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

Number of Observations Read                         50
Number of Observations Used                         39
Number of Observations with Missing Values          11

                      Analysis of Variance
                            Sum of          Mean
Source             DF      Squares        Square   F Value   Pr > F
Model               2     24987949      12493974      6.06   0.0054
Error              36     74175757       2060438
Corrected Total    38     99163706

Root MSE          1435.42248    R-Square    0.2520
Dependent Mean    2023.56410    Adj R-Sq    0.2104
Coeff Var           70.93536

                      Parameter Estimates
                  Parameter      Standard
Variable    DF     Estimate         Error   t Value   Pr > |t|
Intercept    1   2140.20468     434.48460      4.93     <.0001
income       1      0.21801       0.08032      2.71     0.0101
d2           1  -1006.62593     479.37174     -2.10     0.0428

d2=0: R&D = 2,140.2047 + .2180*income = 2,140.2047 - 1,006.6259*0 + .2180*income
d2=1: R&D = 1,133.5788 + .2180*income = 2,140.2047 - 1,006.6259*1 + .2180*income

The intercept 2,140 is the Y-intercept of equipment and software firms, whose dummy is dropped in the model (d1=1, d2=0). The coefficient -1,007 of telecommunications and electronics means that its Y-intercept is 1,007 smaller than that of equipment and software: 1,134 = 2,140 (baseline) - 1,007. Therefore, this model is identical to Model 2 in Section 2.2. In short, dropping another dummy does not change the model although producing different dummy coefficients.

Alternatively, you may use the GLM and MIXED procedures to get the same result.

PROC GLM DATA=masil.rnd2002;
   MODEL rnd = income d2 /SOLUTION;
RUN;

PROC MIXED DATA=masil.rnd2002;
   MODEL rnd = income d2 /SOLUTION;
RUN;

2.5.2 LSDV 2 without the Intercept

LSDV 2 includes all dummy variables and suppresses the intercept. The Stata .regress command has the noconstant option to fit LSDV2. The coefficients of dummies are actual parameter estimates; thus, you do not need to compute Y-intercepts of groups. This LSDV, however, reports incorrect (inflated) R2 (.7135 > .2520) and F (29.88 > 6.06). This is because the X matrix does not have a column vector of 1 and produces incorrect sums of squares of model and total (Uyar and Erdem 1990: 298). However, the sum of squares of errors is correct in any LSDV.

. regress rnd income d1 d2, noconstant

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  3,    36) =   29.88
       Model |   184685604     3  61561868.1           Prob > F      =  0.0000
    Residual |  74175756.7    36  2060437.69           R-squared     =  0.7135
-------------+------------------------------           Adj R-squared =  0.6896
       Total |   258861361    39  6637470.79           Root MSE      =  1435.4

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |   2140.205   434.4846     4.93   0.000     1259.029     3021.38
          d2 |   1133.579   344.0583     3.29   0.002     435.7962    1831.361
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income

2.5.3 LSDV 3 with a Restriction

LSDV 3 includes the intercept and all dummies and then imposes a restriction on the model. The restriction is that the sum of all dummy parameters is zero. The Stata .constraint command defines a constraint, while the .cnsreg command fits a constrained OLS using the constraint() option. The number in the parentheses indicates the constraint number defined in the .constraint command.

. constraint 1 d1 + d2 = 0
. cnsreg rnd income d1 d2, constraint(1)

Constrained linear regression                     Number of obs   =        39
                                                  F(  2,    36)   =      6.06
                                                  Prob > F        =    0.0054
                                                  Root MSE        = 1435.4225
 ( 1)  d1 + d2 = 0
------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |    503.313   239.6859     2.10   0.043     17.20749    989.4184
          d2 |   -503.313   239.6859    -2.10   0.043    -989.4184   -17.20749
       _cons |   1636.892   310.0438     5.28   0.000     1008.094     2265.69
------------------------------------------------------------------------------

d1=1: R&D = 2,140.205 + .2180*income = 1,637 + 503*1 + (-503)*0 + .2180*income
d2=1: R&D = 1,133.579 + .2180*income = 1,637 + 503*0 + (-503)*1 + .2180*income

The intercept is the average of actual parameter estimates: 1,637 = (2,140+1,133)/2. Equipment and software firms invest $2,140 million for R&D expenditure, $503 million MORE than the average expenditure of overall IT firms (=$2,140-$1,637), while telecommunications and electronics spend $503 million LESS than the average (=$1,134-$1,637). Since there are two groups here, the coefficients of two dummies by definition share the same magnitude ($503) but have opposite directions.

In the SAS output below, the coefficient of RESTRICT is virtually zero and, in theory, should be zero.

PROC REG DATA=masil.rnd2002;
   MODEL rnd = income d1 d2;
   RESTRICT d1 + d2 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: rnd

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read                         50
Number of Observations Used                         39
Number of Observations with Missing Values          11

                      Analysis of Variance
                            Sum of          Mean
Source             DF      Squares        Square   F Value   Pr > F
Model               2     24987949      12493974      6.06   0.0054
Error              36     74175757       2060438
Corrected Total    38     99163706

Root MSE          1435.42248    R-Square    0.2520
Dependent Mean    2023.56410    Adj R-Sq    0.2104
Coeff Var           70.93536

                      Parameter Estimates
                  Parameter      Standard
Variable    DF     Estimate         Error   t Value   Pr > |t|
Intercept    1   1636.89172     310.04381      5.28     <.0001
income       1      0.21801       0.08032      2.71     0.0101
d1           1    503.31297     239.68587      2.10     0.0428
d2           1   -503.31297     239.68587     -2.10     0.0428
RESTRICT    -1  1.81899E-12             0       .        .

* Probability computed using beta distribution.

Table 2.2 Estimating Three LSDVs Using SAS, Stata, LIMDEP, and SPSS

| | LSDV 1 | LSDV 2 | LSDV 3 |
|---|---|---|---|
| SAS | PROC REG; MODEL rnd = income d2; RUN; | PROC REG; MODEL rnd = income d1 d2 /NOINT; RUN; | PROC REG; MODEL rnd = income d1 d2; RESTRICT d1 + d2 = 0; RUN; |
| Stata | . regress rnd income d2 | . regress rnd income d1 d2, noconstant | . constraint 1 d1 + d2 = 0 . cnsreg rnd income d1 d2, constraint(1) |
| LIMDEP | REGRESS; Lhs=rnd; Rhs=ONE,income,d2$ | REGRESS; Lhs=rnd; Rhs=income,d1,d2$ | REGRESS; Lhs=rnd; Rhs=ONE,income,d1,d2; Cls: b(2)+b(3)=0$ |
| SPSS | REGRESSION /MISSING LISTWISE /STATISTICS COEFF R ANOVA /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT rnd /METHOD=ENTER income d2. | REGRESSION /MISSING LISTWISE /STATISTICS COEFF R ANOVA /CRITERIA=PIN(.05) POUT(.10) /ORIGIN /DEPENDENT rnd /METHOD=ENTER income d1 d2. | N/A |

Table 2.2 compares how SAS, Stata, LIMDEP, and SPSS estimate LSDVs. SPSS is not able to fit the LSDV3. In LIMDEP, ONE indicates the intercept to be included. Cls: b(2)+b(3)=0 fits the model under the condition that the sum of parameter estimates of d1 (second parameter) and d2 (third parameter) is zero. In SPSS, pay attention to the /ORIGIN option for LSDV2.
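The inflated R2 that LSDV2 reports in Section 2.5.2 can be traced to the total sum of squares used in the denominator. The following Python sketch is an illustration with assumed variable names; the sums of squares, the sample size, and the dependent mean are taken from the Stata and SAS outputs in this chapter. It contrasts the R2 based on the total sum of squares around the mean (centered) with the one around zero (uncentered), which is what a model without an intercept reports.

```python
sse = 74175756.7              # residual SS, the same in LSDV1, LSDV2, and LSDV3
sst_centered = 99163705.6     # total SS around the mean of rnd (with intercept)
sst_uncentered = 258861361.0  # total SS around zero (noconstant output)

r2_centered = 1 - sse / sst_centered      # R2 reported by LSDV1 and LSDV3
r2_uncentered = 1 - sse / sst_uncentered  # R2 reported by LSDV2

# The two totals differ by n * mean(rnd)^2, which is why R2 is inflated.
n = 39
mean_rnd = 2023.56410  # "Dependent Mean" in the SAS output
gap = n * mean_rnd ** 2

print(round(r2_centered, 4), round(r2_uncentered, 4))
print(round(gap), round(sst_uncentered - sst_centered))
```

Because the residual sum of squares is identical, the correct R2 of an LSDV2 fit can be recovered by swapping in the centered total sum of squares.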

3. Panel Data Models

Panel data models examine group (individual-specific) effects, time effects, or both. These effects are either fixed effect or random effect. A fixed effect model examines if intercepts vary across groups or time periods, whereas a random effect model explores differences in error variances. A one-way model includes only one set of dummy variables (e.g., firm), while a two-way model considers two sets of dummy variables (e.g., firm and year). Model 2 in Chapter 2, in fact, is a one-way fixed group effect panel data model.

3.1 Functional Forms and Notation

The parameter estimate of a dummy variable is a part of the intercept in a fixed effect model and a component of error in the random effect model. Slopes remain the same across groups or time periods. The functional forms of one-way panel data models are as follows.

Fixed group effect model: $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$, where $v_{it} \sim IID(0, \sigma_v^2)$
Random group effect model: $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$, where $v_{it} \sim IID(0, \sigma_v^2)$

Note that $u_i$ is a fixed or random effect and errors are independent identically distributed, $v_{it} \sim IID(0, \sigma_v^2)$.

Notations used in this document include,
- $\bar{y}_{i\bullet}$: dependent variable (DV) mean of group i.
- $\bar{x}_{i\bullet}$: means of independent variables (IVs) of group i.
- $\bar{y}_{\bullet t}$: dependent variable (DV) mean at time t.
- $\bar{x}_{\bullet t}$: means of independent variables (IVs) at time t.
- $\bar{y}_{\bullet\bullet}$: overall means of the DV.
- $\bar{x}_{\bullet\bullet}$: overall means of the IVs.
- n: the number of groups or firms
- T: the number of time periods
- N=nT: total number of observations
- k: the number of regressors excluding dummy variables
- K=k+1 (including the intercept)

3.2 Fixed Effect Models

There are several strategies for estimating fixed effect models. The least squares dummy variable model (LSDV) uses dummy variables, whereas the within effect model does not. These strategies, of course, produce the identical parameter estimates of non-dummy independent variables. The between effect model fits the model using group and/or time means of dependent and independent variables without dummies. Table 3.1 summarizes pros and cons of these models.
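The statement that LSDV and the within effect model give identical estimates for non-dummy regressors can be checked on a small example. The Python sketch below uses a made-up balanced panel (three groups, three periods, one regressor); the data and the helper functions `solve` and `ols` are illustrative assumptions, not part of the document's data sets or software. It fits LSDV with all dummies and no intercept, fits the within model on deviations from group means, and recovers the dummy coefficients from the group means.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mean(values):
    return sum(values) / len(values)


# Made-up balanced panel: (group, x, y).
panel = [("A", 1, 2.1), ("A", 2, 3.9), ("A", 3, 6.2),
         ("B", 2, 7.0), ("B", 3, 9.1), ("B", 4, 10.8),
         ("C", 1, 0.9), ("C", 3, 5.2), ("C", 5, 8.8)]
groups = sorted({g for g, _, _ in panel})

# LSDV (all dummies, no intercept): y on x plus one dummy per group.
X = [[x] + [1.0 if g == h else 0.0 for h in groups] for g, x, _ in panel]
y = [yy for _, _, yy in panel]
lsdv = ols(X, y)  # [slope, intercept of A, intercept of B, intercept of C]

# Within effect model: deviations from group means, no intercept.
xbar = {h: mean([x for g, x, _ in panel if g == h]) for h in groups}
ybar = {h: mean([yy for g, _, yy in panel if g == h]) for h in groups}
Xw = [[x - xbar[g]] for g, x, _ in panel]
yw = [yy - ybar[g] for g, _, yy in panel]
within = ols(Xw, yw)  # [slope]

# Dummy coefficients recovered as group mean of y minus slope times group mean of x.
d_star = [ybar[h] - xbar[h] * within[0] for h in groups]

print(lsdv[0], within[0])
print(lsdv[1:], d_star)
```

The sketch solves the normal equations directly to stay dependency-free; in practice a statistical package would be used, as in the rest of this document.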

only coefficients of regressors are consistent. becomes problematic when there are many groups or subjects in panel data. If T is fixed and nT   . Finally. so called the group mean regression.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 17 3. This LSDV. resulting in small MSE (mean square error) and incorrect (smaller) standard errors of parameter estimates. 3) run OLS with the transformed variables without the intercept. The parameter estimates of regressors in the within effect model are identical to those of LSDV.1 Estimations: LSDV. Within Effect. http://www. Thus. The coefficients of dummy variables. This data aggregation reduces the number of You need to follow three steps: 1) compute group means of the dependent and independent variables.2.edu/~statmath 17 6 . LSDV is useless and thus calls for another strategy. Since this model does not report dummy coefficients. This is the so called incidental parameter problem.1 Comparison of Fixed Effect Models LSDV1 Within Effect Functional form y i  i i  X i    i yit  yi   xit  xi    it   i  Dummy Dummy coefficient Transformation Intercept (estimation) R2 SSE MSE Standard error of  DFerror Observations Yes Presented No Yes Correct Correct Correct Correct nT-n-k nT No Need to be computed Deviation from the group means No Incorrect Correct Smaller Incorrect (smaller) nT-k (n larger) nT Between Effect y i     xi    i No N/A Group means Yes n-K n The between group effect model. the within effect model has larger degrees of freedom for error. A within group effect model does not need dummy variables. Thus. uses group means of the dependent and independent variables. R2 of the within effect model is not correct  sek LSDV nT  n  k df error because the intercept is suppressed. however. Under this circumstance.6 The incidental parameter problem is no longer an issue.indiana. 
LSDV is widely used because it is relatively easy to estimate and interpret substantively. are not consistent since the number of these parameters increases as nT increases (Baltagi 2001). you need to compute them using the formula di*  yi   xi  '  Since no dummy is used. The within effect model in turn has several disadvantages. 2) transform variables to get deviations from the group means. and Between Effect Models As discussed in Chapter 2. the within effect model. but it uses deviations from group means. this model is the OLS of ( yit  yi  )  ( xit  xi  )'   ( it   i  ) without an intercept.   ui . * sek  sek Table 3. you have to adjust the standard error using the formula Within df error nT  k .

nT  n  k ) If the null hypothesis is rejected. This hypothesis is tested by the F test. 3. Dummy coefficients: di*  ( yi   y )  ( xi   x )'  and dt*  ( yt  y )  ( xt  x )'  7 When comparing fixed effect and random effect models. and the between group models. which is based on loss of goodness-of-fit.1). Then.edu/~statmath .. run OLS of yi     xi    i . and i and t in the formulas.indiana.    Model: yit     i   t  X it    it . 3.7 (e' e Efficient  e' e Robust ) (n  1) (e' e Robust ) (nT  n  k )  2 2 ( RRobust  REfficient ) (n  1) 2 (1  RRobust ) (nT  n  k ) ~ F (n  1...2 Testing Group Effects In a regression of yit    i  X it '    it . 18 http://www. you may conclude that the fixed group effect model is better than the pooled OLS model.3 Fixed Time Effect and Two-way Fixed Effect Models For the fixed time effects model. The within effect model of this two-way fixed model is estimated by five strategies (see Section 6. Table 3.2. (e' eWithin ) (Tn  T  k ) The fixed group and time effect model uses slightly different formulas. Tn  T  k ) . the null hypothesis is that all dummy parameters except for one for the dropped are zero: H 0 : 1  . the within effect model..2. you need to switch n and T.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 18 observations down to n.   T 1  0 .     Model: yit     t  X it '    it Within effect model: ( yit  yt )  ( xit  xt )'   ( it   t ) Dummy coefficients: dt*  yt  xt '  * Correct standard errors: sek  sek Within df error Tn  k  sek LSDV Tn  T  k df error    Between effect model: y t    xt   t H 0 :  1  . The robust model in the following formula is LSDV (or within effect model) and the efficient model is the pooled regression. * * Within effect Model: yit  yit  yi   yt  y and xit  xit  xi   xt  x .   n 1  0 . 
(e' ePooled  e' eWithin ) (T  1) F-test: ~ F (T  1.1 contrasts LSDV. the fixed effect estimates are considered as the robust estimates and random effect estimates as the efficient estimates.

* Correct standard errors: $se_k^{*} = se_k \sqrt{df_{error}^{Within} / df_{error}^{LSDV}} = se_k \sqrt{(nT-k)/(nT-n-T-k+1)}$

$H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$.

F-test: $\frac{(e'e_{Efficient} - e'e_{Robust})/(n+T-2)}{(e'e_{Robust})/(nT-n-T-k+1)} \sim F[(n+T-2),\ (nT-n-T-k+1)]$

3.3 Random Effect Models

The one-way random group effect model is formulated as $y_{it} = \alpha + X_{it}'\beta + u_i + v_{it}$, $w_{it} = u_i + v_{it}$, where $u_i \sim IID(0, \sigma_u^2)$ and $v_{it} \sim IID(0, \sigma_v^2)$. The $u_i$ are assumed independent of $v_{it}$ and $X_{it}$, which are also independent of each other for all i and t. This assumption is not necessary in the fixed effect model. The components of $Cov(w_{it}, w_{js}) = E(w_{it} w_{js})$ are $\sigma_u^2 + \sigma_v^2$ if i=j and t=s and $\sigma_u^2$ if i=j and $t \neq s$ [8]. Thus, the $\Omega$ matrix or the variance structure of errors looks like

$$\Omega_{T \times T} = \begin{bmatrix} \sigma_u^2 + \sigma_v^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_v^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_v^2 \end{bmatrix}$$

A random effect model is estimated by generalized least squares (GLS) when the variance structure is known, and by feasible generalized least squares (FGLS) when the variance is unknown. Compared to fixed effect models, random effect models are relatively difficult to estimate. This document assumes panel data are balanced.

3.3.1 Generalized Least Squares (GLS)

When $\Omega$ is known (given), GLS based on the true variance components is BLUE and all the feasible GLS estimators considered are asymptotically efficient as either n or T approaches infinity (Baltagi 2001). In GLS, you just need to compute $\theta$ using the $\Omega$ matrix: $\theta = 1 - \sqrt{\sigma_v^2 / (T\sigma_u^2 + \sigma_v^2)}$ [9]. Then transform variables as follows.

$y_{it}^{*} = y_{it} - \theta \bar{y}_{i\bullet}$
$x_{it}^{*} = x_{it} - \theta \bar{x}_{i\bullet}$ for all $X_k$
$\alpha^{*} = 1 - \theta$

[8] This implies that $Corr(w_{it}, w_{js})$ is 1 if i=j and t=s, and $\sigma_u^2 / (\sigma_u^2 + \sigma_v^2)$ if i=j and $t \neq s$.
[9] If $\theta = 0$, run pooled OLS. If $\theta = 1$ and $\sigma_v^2 = 0$, then run the within effect model.
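The transformation above can be sketched in a few lines. This is a minimal illustration in Python rather than in SAS, Stata, or LIMDEP; the function names `gls_theta` and `quasi_demean` are hypothetical, and the panel is assumed to be balanced and stored as one list of T observations per group.

```python
import math


def gls_theta(sigma_u2, sigma_v2, T):
    """theta = 1 - sqrt(sigma_v^2 / (T * sigma_u^2 + sigma_v^2))."""
    return 1.0 - math.sqrt(sigma_v2 / (T * sigma_u2 + sigma_v2))


def quasi_demean(panel, theta):
    """Subtract theta times the group mean from every observation.

    panel: list of groups, each a list of T values of one variable
    (hypothetical layout, balanced panel assumed).
    """
    transformed = []
    for group in panel:
        group_mean = sum(group) / len(group)
        transformed.append([value - theta * group_mean for value in group])
    return transformed


# Limiting cases: theta = 0 leaves the data unchanged (pooled OLS),
# theta = 1 yields deviations from group means (within effect model).
print(gls_theta(0.0, 1.0, 15))
print(gls_theta(1.0, 0.0, 15))
print(quasi_demean([[1.0, 3.0]], 1.0))
```

The same `theta` is applied to the dependent variable and to every regressor; only the variance components change when FGLS replaces the known values with estimates.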

Finally, run OLS on the transformed variables: $y_{it}^{*} = \alpha^{*} + x_{it}^{*\prime}\beta^{*} + \varepsilon_{it}^{*}$.

Since $\Omega$ is often unknown, FGLS is more frequently used than GLS.

3.3.2 Feasible Generalized Least Squares (FGLS)

If $\Omega$ is unknown, first you have to estimate $\theta$ using $\hat{\sigma}_u^2$ and $\hat{\sigma}_v^2$:

$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}}$

The $\hat{\sigma}_v^2$ is derived from the SSE (sum of squares due to error) of the within effect model or from the deviations of residuals from group means of residuals:

$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT-n-k} = \frac{e'e_{within}}{nT-n-k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\bullet})^2}{nT-n-k}$, where $v_{it}$ are the residuals of the LSDV1.

The $\hat{\sigma}_u^2$ comes from the between effect model (group mean regression):

$\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \frac{\hat{\sigma}_v^2}{T}$, where $\hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n-K}$.

Next, transform variables using $\hat{\theta}$ and then run OLS: $y_{it}^{*} = \alpha^{*} + x_{it}^{*\prime}\beta^{*} + \varepsilon_{it}^{*}$.

$y_{it}^{*} = y_{it} - \hat{\theta}\bar{y}_{i\bullet}$
$x_{it}^{*} = x_{it} - \hat{\theta}\bar{x}_{i\bullet}$ for all $X_k$
$\alpha^{*} = 1 - \hat{\theta}$

The estimation of the two-way random effect model is skipped here.

3.3.3 Testing Random Effects (LM test)

The null hypothesis is that cross-sectional variance components are zero, $H_0: \sigma_u^2 = 0$. Breusch and Pagan (1980) developed the Lagrange multiplier (LM) test (Greene 2003). In the following formula, $\bar{e}$ is the n X 1 vector of the group specific means of pooled regression residuals and $e'e$ is the SSE of the pooled OLS regression. The LM follows chi-squared distribution with one degree of freedom.

$LM_u = \frac{nT}{2(T-1)}\left[\frac{e'DD'e}{e'e} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1)$

Baltagi (2001) presents the same LM test in a different way.

$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(\sum_t e_{it}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(T\bar{e}_{i\bullet}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 \sim \chi^2(1)$
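As a rough illustration of the LM statistic, the following Python sketch evaluates the Baltagi form of the formula from pooled OLS residuals. The function name `breusch_pagan_lm` and the nested-list layout (one list of T residuals per group, balanced panel) are assumptions made for this example; the residuals in the usage lines are made up.

```python
def breusch_pagan_lm(residuals):
    """LM = nT / (2(T-1)) * [sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1]^2.

    residuals: list of n groups, each a list of T pooled OLS residuals.
    """
    n = len(residuals)
    T = len(residuals[0])
    # Squared group totals of residuals, summed over groups
    squared_group_totals = sum(sum(group) ** 2 for group in residuals)
    # Pooled OLS SSE (e'e)
    sse = sum(e * e for group in residuals for e in group)
    return n * T / (2.0 * (T - 1)) * (squared_group_totals / sse - 1.0) ** 2


# Made-up residuals for two groups observed twice
print(breusch_pagan_lm([[1.0, -1.0], [1.0, -1.0]]))
print(breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]]))
```

The result is compared with the chi-squared distribution with one degree of freedom; a large value rejects the null hypothesis of no random group effect.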

The two way random effect model has the null hypothesis of $H_0: \sigma_{u1}^2 = 0$ and $\sigma_{u2}^2 = 0$. The LM test combines two one-way random effect models for group and time, $LM_{u12} = LM_{u1} + LM_{u2} \sim \chi^2(2)$.

3.4 Hausman Test: Fixed Effects versus Random Effects

The Hausman specification test compares the fixed versus random effects under the null hypothesis that the individual effects are uncorrelated with the other regressors in the model (Hausman 1978). If correlated (H0 is rejected), a random effect model produces biased estimators, violating one of the Gauss-Markov assumptions; so a fixed effect model is preferred. Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero (Greene 2003).

$m = (b_{Robust} - b_{Efficient})'\,\hat{\Sigma}^{-1}\,(b_{Robust} - b_{Efficient}) \sim \chi^2(k)$,

where $\hat{\Sigma} = Var[b_{Robust} - b_{Efficient}] = Var(b_{Robust}) - Var(b_{Efficient})$ is the difference in the estimated covariance matrix of the parameter estimates between the LSDV model (robust) and the random effects model (efficient). It is notable that an intercept and dummy variables SHOULD be excluded in computation.

3.5 Poolability Test

What is poolability? Poolability tests whether or not slopes are the same across groups or over time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$. Remember that slopes remain constant in fixed and random effect models; only intercepts and error variances matter.

The poolability test is undertaken under the assumption of $\mu \sim N(0, s^2 I_{NT})$. This test uses the F statistic,

$F_{obs} = \frac{(e'e - \sum e_i'e_i)/[(n-1)K]}{\sum e_i'e_i/[n(T-K)]} \sim F[(n-1)K,\ n(T-K)]$,

where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group i. If the null hypothesis is rejected, the panel data are not poolable. Under this circumstance, you may go to the random coefficient model or hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The F-test is

$F_{obs} = \frac{(e'e - \sum e_t'e_t)/[(T-1)K]}{\sum e_t'e_t/[T(n-K)]} \sim F[(T-1)K,\ T(n-K)]$,

where $e_t'e_t$ is SSE of the OLS regression at time t.
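The poolability F statistic across groups only needs the pooled SSE and the SSEs of the group-by-group regressions. The sketch below is an illustration in Python; the function name `poolability_f` and the SSE values in the usage line are hypothetical, and K is taken to be the number of parameters in each group regression, as in the formula above.

```python
def poolability_f(sse_pooled, group_sses, T, K):
    """F = [(e'e - sum e_i'e_i) / ((n-1)K)] / [sum e_i'e_i / (n(T-K))].

    Returns the statistic with its numerator and denominator
    degrees of freedom.
    """
    n = len(group_sses)
    sse_unrestricted = sum(group_sses)  # sum of the n group-level SSEs
    df_num = (n - 1) * K
    df_den = n * (T - K)
    f_obs = ((sse_pooled - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)
    return f_obs, df_num, df_den


# Made-up example: 4 groups, 10 periods, 2 parameters per group regression
print(poolability_f(10.0, [1.0, 1.0, 1.0, 1.0], T=10, K=2))
```

The test over time follows the same pattern with the roles of n and T exchanged and the SSEs taken from the period-by-period regressions.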

4. One-way Fixed Effect Models: Group Effects

A one-way fixed group model examines group differences in intercepts. The LSDV for this fixed model needs to create as many dummy variables as the number of entities or subjects. When many dummies are needed, the within effect model is useful since it transforms variables using group means to avoid dummies. The between effect model uses group means of variables.

The sample panel data set includes cost and its related data of six U.S. airlines measured at 15 different time points. The following .use command reads a data set airline.dta and .describe displays basic information of key variables.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
. describe airline year cost output fuel load

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------------
airline         int    %8.0g                  Airline name
year            int    %8.0g                  Year
cost            float  %9.0g                  Total cost in $1,000
output          float  %9.0g                  Output in revenue passenger miles, index number
fuel            float  %9.0g                  Fuel price
load            float  %9.0g                  Load factor

You need to declare a cross-sectional variable (airline) and a time-series variable (year) using the .tsset command.

. tsset airline year
       panel variable:  airline (strongly balanced)
        time variable:  year, 1 to 15
                delta:  1 unit

Let us take a look at descriptive statistics of key variables using .xtsum.

. xtsum cost output fuel load

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
cost     overall |  13.36561   1.131971   11.14154    15.3733 |     N =      90
         between |             .9978636   12.27441   14.67563 |     n =       6
         within  |             .6650252   12.11545   14.91617 |     T =      15
                 |                                            |
output   overall | -1.174309   1.150606  -3.278573   .6608616 |     N =      90
         between |             1.166556   -2.49898   .3192696 |     n =       6
         within  |             .4208405  -1.987984   .1339861 |     T =      15
                 |                                            |
fuel     overall |  12.77036   .8123749   11.55017     13.831 |     N =      90
         between |             .0237151    12.7318    12.7921 |     n =       6
         within  |             .8120832   11.56883    13.8513 |     T =      15
                 |                                            |
load     overall |  .5604602   .0527934    .432066    .676287 |     N =      90
         between |             .0281511   .5197756   .5971917 |     n =       6
         within  |             .0460361   .4368492   .6581019 |     T =      15

4.1 The Pooled OLS Regression Model

First, fit the pooled regression model without any dummy variable.

. regress cost output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    86) = 2419.34
       Model |  112.705452     3  37.5684839           Prob > F      =  0.0000
    Residual |  1.33544153    86   .01552839           R-squared     =  0.9883
-------------+------------------------------           Adj R-squared =  0.9879
       Total |  114.040893    89  1.28135835           Root MSE      =  .12461

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8563895    .9090876
        fuel |    .453977   .0203042    22.36   0.000     .4136136    .4943404
        load |   -1.62751    .345302    -4.71   0.000    -2.313948   -.9410727
       _cons |   9.516923   .2292445    41.51   0.000       9.0612    9.972645
------------------------------------------------------------------------------

The regression equation is cost = 9.5169 + .8827*output + .4540*fuel - 1.6275*load. This model fits the data well (F=2419.34, p<.0001 and R2=.9883). We may, however, suspect that there is a fixed group effect producing different intercepts across groups. Each airline may have a significantly different level of cost, its Y-intercept, when all regressors are set to zero. This difference is modeled as a fixed group effect.

As discussed in Chapter 2, there are three equivalent approaches of LSDV. They report the identical parameter estimates of regressors except for dummy coefficients. Let us begin with LSDV1.

4.2 LSDV1 without a Dummy

LSDV1 drops a dummy variable to get the model identified. LSDV1 produces correct ANOVA information, goodness of fit, parameter estimates, and standard errors. As a consequence, this approach is commonly used in practice. LSDV1 fits the data better than does the pooled OLS. SSE decreases from 1.3354 to .2926, and R2 increases from .9883 to .9974. Due to the dummies included, this model loses five degrees of freedom (from 86 to 81).

LSDV produces six regression equations for six airlines.

Airline 1: cost = 9.7059 + .9193*output + .4175*fuel - 1.0704*load
Airline 2: cost = 9.6647 + .9193*output + .4175*fuel - 1.0704*load
Airline 3: cost = 9.4970 + .9193*output + .4175*fuel - 1.0704*load
Airline 4: cost = 9.8905 + .9193*output + .4175*fuel - 1.0704*load
Airline 5: cost = 9.7300 + .9193*output + .4175*fuel - 1.0704*load
Airline 6: cost = 9.7930 + .9193*output + .4175*fuel - 1.0704*load

How can we draw these equations using LSDV1?

In SAS, PROC REG fits the OLS regression model. Let us drop the last dummy g6 and use it as the reference group. Of course, you may drop another dummy variable to get the equivalent result.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

                          Analysis of Variance
                                 Sum of         Mean
Source             DF           Squares       Square    F Value    Pr > F
Model               8         113.74827     14.21853    3935.79    <.0001
Error              81           0.29262      0.00361
Corrected Total    89         114.04089

Root MSE            0.06011    R-Square    0.9974
Dependent Mean     13.36561    Adj R-Sq    0.9972
Coeff Var           0.44970

                     Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1      9.79300      0.26366      37.14      <.0001
g1           1     -0.08706      0.08420      -1.03      0.3042
g2           1     -0.12830      0.07573      -1.69      0.0941
g3           1     -0.29598      0.05002      -5.92      <.0001
g4           1      0.09749      0.03301       2.95      0.0041
g5           1     -0.06301      0.02389      -2.64      0.0100
output       1      0.91928      0.02989      30.76      <.0001
fuel         1      0.41749      0.01520      27.47      <.0001
load         1     -1.07040      0.20169      -5.31      <.0001

The parameter estimate of g6 is presented in the intercept (9.7930). Other dummy parameter estimates are computed using the reference point. The actual intercept of airline 1, for example, is computed as 9.7059 = 9.7930 + (-.0871)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0 or simply 9.7930 + (-.0871), where 9.7930 is the reference point, the intercept of this model. The coefficient -.0871 says that the Y-intercept of airline 1 (9.7059) is .0871 smaller than that of airline 6 (reference point).

Stata has the .regress command for OLS regression (LSDV). The output is identical to that of PROC REG.

. regress cost g1-g5 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0870617   .0841995    -1.03   0.304    -.2545924     .080469
          g2 |  -.1282976   .0757281    -1.69   0.094    -.2789728    .0223776

          g3 |  -.2959828   .0500231    -5.92   0.000     -.395513   -.1964526
          g4 |    .097494   .0330093     2.95   0.004     .0318159    .1631721
          g5 |   -.063007   .0238919    -2.64   0.010    -.1105443   -.0154697
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
------------------------------------------------------------------------------

In LIMDEP, run the Regress$ command to fit the LSDV1. Do not forget to include ONE for the intercept in the Rhs subcommand.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary    least squares regression               |
| Model was estimated Aug 27, 2009 at 03:51:23PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528017     |
|              Akaike Info. Criter. =  -5.528687     |
| Autocorrel   Durbin-Watson Stat.  =   1.0264504    |
|              Rho = cor[e,e(-1)]   =   .4867748     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 Constant|    9.79302127       .26366104    37.142   .0000
 G1      |    -.08707202       .08419916    -1.034   .3042   .16666667
 G2      |    -.12830600       .07572778    -1.694   .0940   .16666667
 G3      |    -.29598860       .05002285    -5.917   .0000   .16666667
 G4      |     .09749253       .03300915     2.954   .0041   .16666667
 G5      |    -.06300770       .02389180    -2.637   .0100   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000 -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000  12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

What if we drop a different dummy variable, say g1, instead of g6? Since the different reference point is applied, you will get different dummy coefficients. The other statistics such as parameter estimates of regressors and goodness-of-fit measures remain unchanged. That is, choice of a dummy variable to be dropped does not change a model.

As shown in the following output, the intercept 9.7059 in this model is the actual parameter estimate (Y-intercept) of g1, which was excluded from the model. The Y-intercept of airline 2 is computed to get 9.6647=9.7059-.0412. That is, the Y-intercept of airline 2 (9.6647) is .0412 smaller than the reference point of 9.7059. Actual Y-intercepts of other dummies are computed in this manner.

. regress cost g2-g6 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
          g3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
          g4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
          g5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
          g6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------

When you have not created dummy variables, take advantage of the .xi prefix command (interaction expansion) to obtain the identical result. The Stata .xi creates dummies from a categorical variable specified in the term i. and then runs the command following the colon. .xi, like .bysort, is used either as an ordinary command or a prefix command. Stata by default drops the first dummy variable, while PROC TSCSREG and PROC PANEL in Section 4.5.2 drop the last dummy.

. xi: regress cost i.airline output fuel load
i.airline         _Iairline_1-6       (naturally coded; _Iairline_1 omitted)

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Iairline_2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
 _Iairline_3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
 _Iairline_4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
 _Iairline_5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
 _Iairline_6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------

4.3 LSDV2 without the Intercept

LSDV2 reports actual parameter estimates of the dummies. You do not need to compute actual Y-intercepts any more. Because LSDV2 suppresses the intercept, you will get incorrect F and R2 statistics. However, the SSE of LSDV2 is correct.

In PROC REG, you need to use the /NOINT option to suppress the intercept. Obviously, the F value of 497,985 and R2 of 1 are not likely. However, SSE, parameter estimates of regressors, and their standard errors are correct. Make sure that the intercepts presented in the beginning of Section 4.2 are what we got here using LSDV2.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load /NOINT;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

NOTE: No intercept in model. R-Square is redefined.

                          Analysis of Variance
                                   Sum of          Mean
Source               DF           Squares        Square    F Value    Pr > F
Model                 9             16191    1799.03381     497985    <.0001
Error                81           0.29262       0.00361
Uncorrected Total    90             16192

Root MSE            0.06011    R-Square    1.0000
Dependent Mean     13.36561    Adj R-Sq    1.0000
Coeff Var           0.44970

                     Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
g1           1      9.70594      0.19312      50.26      <.0001
g2           1      9.66471      0.19898      48.57      <.0001
g3           1      9.49702      0.22496      42.22      <.0001
g4           1      9.89050      0.24176      40.91      <.0001
g5           1      9.73000      0.26094      37.29      <.0001
g6           1      9.79300      0.26366      37.14      <.0001
output       1      0.91928      0.02989      30.76      <.0001
fuel         1      0.41749      0.01520      27.47      <.0001
load         1     -1.07040      0.20169      -5.31      <.0001

Stata uses the noconstant option to suppress the intercept. Notice that noc is its abbreviation.

. regress cost g1-g6 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  9,    81) =       .
       Model |  16191.3043     9  1799.03381           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   9.705942    .193124    50.26   0.000     9.321686     10.0902
          g2 |   9.664706    .198982    48.57   0.000     9.268794    10.06062
          g3 |   9.497021   .2249584    42.22   0.000     9.049424    9.944618
          g4 |   9.890498   .2417635    40.91   0.000     9.409464    10.37153
          g5 |   9.729997   .2609421    37.29   0.000     9.210804    10.24919

          g6 |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
------------------------------------------------------------------------------

In LIMDEP, you need to drop ONE out of the Rhs subcommand to suppress the intercept. Unlike SAS and Stata, LIMDEP reports correct R2 (.9974) and F (3,936) even in LSDV2.

--> REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary    least squares regression               |
| Model was estimated Aug 27, 2009 at 03:53:24PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528017     |
|              Akaike Info. Criter. =  -5.528687     |
| Autocorrel   Durbin-Watson Stat.  =   1.0264504    |
|              Rho = cor[e,e(-1)]   =   .4867748     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 G1      |    9.70594925       .19312325    50.258   .0000   .16666667
 G2      |    9.66471527       .19898117    48.571   .0000   .16666667
 G3      |    9.49703267       .22495746    42.217   .0000   .16666667
 G4      |    9.89051381       .24176245    40.910   .0000   .16666667
 G5      |    9.73001357       .26094094    37.288   .0000   .16666667
 G6      |    9.79302127       .26366104    37.142   .0000   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000 -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000  12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

4.4 LSDV3 with Restrictions

LSDV3 imposes a restriction that the sum of the dummy parameters is zero. LSDV3 reports the correct ANOVA table and parameter estimates of regressors but produces different dummy coefficients due to the different baseline (group average) used, compared to those of LSDV1 and LSDV2. PROC REG has the RESTRICT statement to impose restrictions.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                          Analysis of Variance
                                 Sum of         Mean
Source             DF           Squares       Square    F Value    Pr > F
Model               8         113.74827     14.21853    3935.79    <.0001
Error              81           0.29262      0.00361
Corrected Total    89         114.04089

Root MSE            0.06011    R-Square    0.9974
Dependent Mean     13.36561    Adj R-Sq    0.9972
Coeff Var           0.44970

                     Parameter Estimates
                  Parameter       Standard
Variable    DF     Estimate          Error    t Value    Pr > |t|
Intercept    1      9.71353        0.22964      42.30      <.0001
g1           1     -0.00759        0.04562      -0.17      0.8683
g2           1     -0.04882        0.03798      -1.29      0.2023
g3           1     -0.21651        0.01606     -13.48      <.0001
g4           1      0.17697        0.01942       9.11      <.0001
g5           1      0.01647        0.03669       0.45      0.6547
g6           1      0.07948        0.04050       1.96      0.0532
output       1      0.91928        0.02989      30.76      <.0001
fuel         1      0.41749        0.01520      27.47      <.0001
load         1     -1.07040        0.20169      -5.31      <.0001
RESTRICT    -1  3.01674E-15    7.82306E-11       0.00      1.0000*

* Probability computed using beta distribution.

A dummy coefficient means the deviation from the averaged group effect (9.714). The actual intercept of airline 2, for example, is 9.6647 = 9.7135 + (-.0488). Notice that the 3.01674E-15 of RESTRICT is virtually zero.

In Stata, you have to use the .cnsreg command instead of .regress. The command, however, does not provide an ANOVA table and goodness-of-fit statistics other than F and SEE (standard error of the residual or error term, the square root of MSE).

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F(  8,    81) = 3935.79
                                                       Prob > F      =  0.0000
                                                       Root MSE      =   .0601

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0075859   .0456178    -0.17   0.868    -.0983509    .0831792
          g2 |  -.0488218   .0379787    -1.29   0.202    -.1243875    .0267439
          g3 |  -.2165069   .0160624   -13.48   0.000    -.2484661   -.1845478
          g4 |   .1769698   .0194247     9.11   0.000     .1383208    .2156189
          g5 |   .0164689   .0366904     0.45   0.655    -.0565335    .0894712
          g6 |   .0794759   .0405008     1.96   0.053     -.001108    .1600597
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
------------------------------------------------------------------------------

LIMDEP has the Cls subcommand to impose restrictions. Again, do not forget to include ONE in Rhs. b(2) in Cls: indicates the parameter of the second variable, g1, listed in Rhs.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
    Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary    least squares regression               |
| Model was estimated Aug 31, 2009 at 06:39:21PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =          9     |
|              Degrees of freedom   =         81     |
| Residuals    Sum of squares       =   .2926208     |
|              Standard error of e  =   .6010493E-01 |
| Fit          R-squared            =   .9974341     |
|              Adjusted R-squared   =   .9971806     |
| Model test   F[  8,    81] (prob) =3935.82 (.0000) |
| Diagnostic   Log likelihood       =   130.0865     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [  8]  (prob) = 536.89 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.528017     |
|              Akaike Info. Criter. =  -5.528687     |
| Autocorrel   Durbin-Watson Stat.  =   1.0264504    |
|              Rho = cor[e,e(-1)]   =   .4867748     |
| Restrictns.  F[  1,    80] (prob) =    .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 Constant|    9.71354097       .22964002    42.299   .0000
 G1      |    -.00759172       .04561756     -.166   .8682   .16666667
 G2      |    -.04882570       .03797853    -1.286   .2023   .16666667
 G3      |    -.21650830       .01606233   -13.479   .0000   .16666667
 G4      |     .17697283       .01942459     9.111   .0000   .16666667
 G5      |     .01647259       .03669023      .449   .6547   .16666667
 G6      |     .07948030       .04050059     1.962   .0532   .16666667
 OUTPUT  |     .91928814       .02988997    30.756   .0000 -1.17430918
 FUEL    |     .41749105       .01519907    27.468   .0000  12.7703592
 LOAD    |   -1.07039502       .20168924    -5.307   .0000   .56046016

LSDV3 in LIMDEP also reports dummy coefficients that differ from those of LSDV1 and LSDV2. But you may compute actual intercepts of groups in a manner similar to what you would do in SAS and Stata. With the estimates shown above, the actual intercept of airline 5, for example, is 9.7300 = 9.7135 + .0165.

4.5 Within Group Effect Model

The within effect model does not use dummy variables and thus has larger degrees of freedom, smaller MSE, and smaller standard errors of parameters than those of LSDV.
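The difference in degrees of freedom and MSE can be checked with a little arithmetic. The sketch below is written in Python for illustration only; it plugs in the airline example's dimensions (n=6, T=15, k=3) and the residual sums of squares that the Stata outputs in this document report (.292622872 for LSDV and .292622861 for the within model). The function names are hypothetical.

```python
def error_df(n, T, k, within):
    """Error degrees of freedom for the one-way fixed group effect model.

    The within model (no intercept, no dummies) uses nT - k, whereas
    LSDV spends n additional parameters on the group effects.
    """
    return n * T - k if within else n * T - n - k


def mse(sse, df):
    """Mean squared error: SSE divided by its error degrees of freedom."""
    return sse / df


n, T, k = 6, 15, 3  # six airlines, 15 periods, three regressors
df_within = error_df(n, T, k, within=True)
df_lsdv = error_df(n, T, k, within=False)
print(df_within, df_lsdv)
print(mse(0.292622861, df_within), mse(0.292622872, df_lsdv))
```

Both models share essentially the same SSE, so the smaller MSE of the within model comes entirely from its larger error degrees of freedom.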

As a consequence, you need to adjust standard errors. This model does not report individual dummy coefficients either; you need to compute them if really needed.

4.5.1 Estimating the Within Effect Model

First, let us manually estimate the within group effect model with Stata. You need to compute group means.

. quietly egen gm_cost=mean(cost), by(airline)
. quietly egen gm_output=mean(output), by(airline)
. quietly egen gm_fuel=mean(fuel), by(airline)
. quietly egen gm_load=mean(load), by(airline)

You will get the following group means of variables.

+------------------------------------------------------+
| airline    gm_cost   gm_output    gm_fuel    gm_load |
|------------------------------------------------------|
|       1   14.67563    .3192696    12.7318   .5971917 |
|       2   14.37247    -.033027   12.75171   .5470946 |
|       3   13.37231   -.9122626   12.78972   .5845358 |
|       4    13.1358   -1.635174   12.77803   .5476773 |
|       5   12.36304   -2.285681    12.7921   .5664859 |
|       6   12.27441    -2.49898    12.7788   .5197756 |
+------------------------------------------------------+

Then transform dependent and independent variables to compute deviations from group means.

. quietly gen gw_cost = cost - gm_cost
. quietly gen gw_output = output - gm_output
. quietly gen gw_fuel = fuel - gm_fuel
. quietly gen gw_load = load - gm_load

Now, we are ready to run the within effect model. Keep in mind that you have to suppress the intercept. Notice that the degrees of freedom increase from 81 (LSDV) to 87 since six dummy variables are not used.

. regress gw_cost gw_output gw_fuel gw_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 3871.82
       Model |  39.0683861     3  13.0227954           Prob > F      =  0.0000
    Residual |  .292622861    87  .003363481           R-squared     =  0.9926
-------------+------------------------------           Adj R-squared =  0.9923
       Total |   39.361009    90  .437344544           Root MSE      =    .058

------------------------------------------------------------------------------
     gw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gw_output |   .9192846    .028841    31.87   0.000       .86196    .9766092
     gw_fuel |   .4174918   .0146657    28.47   0.000     .3883422    .4466414
     gw_load |  -1.070396   .1946109    -5.50   0.000    -1.457206   -.6835858
------------------------------------------------------------------------------

The within effect model reports correct SSE and parameter estimates of regressors but incorrect R2 and standard errors of parameter estimates. The SAS TSCSREG and PANEL procedures and LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (square root of MSE), R2, and standard errors.

You may compute group intercepts using $d_i^{*} = \bar{y}_{i\bullet} - \beta'\bar{x}_{i\bullet}$. For example, the intercept of airline 5 is computed as 9.730 = 12.3630 - {.9193*(-2.2857) + .4175*12.7921 + (-1.0704)*.5665}. In order to get the correct standard errors, you need to adjust them using the ratio of degrees of freedom of the within effect model and LSDV.
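The two post-estimation steps just described, recovering group intercepts and rescaling standard errors, can be sketched as follows. This is an illustrative Python snippet, not part of the Stata session; it copies the group means and the within estimates reported in this section, and the names `group_intercept` and `adjust_se` are hypothetical.

```python
import math

# Group means reported in this section: (cost, output, fuel, load)
group_means = {
    1: (14.67563, 0.3192696, 12.7318, 0.5971917),
    2: (14.37247, -0.033027, 12.75171, 0.5470946),
    3: (13.37231, -0.9122626, 12.78972, 0.5845358),
    4: (13.1358, -1.635174, 12.77803, 0.5476773),
    5: (12.36304, -2.285681, 12.7921, 0.5664859),
    6: (12.27441, -2.49898, 12.7788, 0.5197756),
}
# Within estimates of the slopes for output, fuel, and load
slopes = (0.9192846, 0.4174918, -1.070396)


def group_intercept(y_mean, x_means, betas):
    """d_i = ybar_i - beta' xbar_i."""
    return y_mean - sum(b * x for b, x in zip(betas, x_means))


def adjust_se(se_within, df_within, df_lsdv):
    """Rescale a within standard error by sqrt(df_within / df_lsdv)."""
    return se_within * math.sqrt(df_within / df_lsdv)


intercepts = {
    airline: group_intercept(means[0], means[1:], slopes)
    for airline, means in group_means.items()
}
for airline, value in intercepts.items():
    print(airline, round(value, 4))

# Standard error of output: within value .028841, df 87 (within) and 81 (LSDV)
print(round(adjust_se(0.028841, 87, 81), 4))
```

The recovered intercepts should agree, up to rounding of the inputs, with the dummy coefficients that LSDV2 reports directly.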

For example, the standard error of the logged output is computed as .0299=.0288*sqrt(87/81).

4.5.2 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL of SAS/ETS allow users to fit the within effect model conveniently. They, in fact, report LSDV1, but you do not need to create dummy variables and compute deviations from group means. A data set needs to be sorted in advance by the variables, which will appear in the ID statement of PROC TSCSREG and PROC PANEL. These time-series and cross-sectional variables may be numeric or string in SAS. /FIXONE of the MODEL statement fits a one-way fixed effect model.

PROC SORT DATA=masil.airline;
   BY airline year;

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /FIXONE;
RUN;

The TSCSREG Procedure
Fixed One Way Estimates
Dependent Variable: cost

Model Description
Estimation Method               FixOne
Number of Cross Sections             6
Time Series Length                  15

Fit Statistics
SSE          0.2926    DFE             81
MSE          0.0036    Root MSE    0.0601
R-Square     0.9974

F Test for No Fixed Effects
Num DF    Den DF    F Value    Pr > F
     5        81      57.73    <.0001

                                Parameter Estimates
                              Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|    Label
CS1          1    -0.08706      0.0842      -1.03      0.3042    Cross Sectional Effect 1

SEE.0100 <.000 -1.0 15 3604. Xb) = -0.76 27. Interval] -------------+---------------------------------------------------------------output | .9787565 fuel | . fe i(airline) Fixed-effects (within) regression Group variable: airline R-sq: within = 0.95 -2.9192846 . 4.06301 9. and conduct the F test for fixed group effect as well.92 2.17044 http://www. Err.0757 0. .9926.20169 -5.xtreg should follow the .9856 overall = 0. .0941 <.0330 0.xtreg command fits the within group effect model without creating dummy variables.0704 0.© 2005-2009 The Trustees of Indiana University (9/16/2009) CS2 CS3 CS4 CS5 Intercept output fuel load 1 1 1 1 1 1 1 1 -0. and standard errors. i(airline) is redundant.47 -5. ID airline year.6690963 _cons | 9. RUN.713528 .0001 Effect 1 Cross Sectional Effect 2 Cross Sectional Effect 3 Cross Sectional Effect 4 Cross Sectional Effect 5 Intercept The following PANEL procedure returns the same output.471696 -.0299 0. This command report incorrect F 3. Std.0001 <. .0041 0.0298901 30.0152 0.airline.3475 -----------------------------------------------------------------------------cost | Coef.604 and R2 of .000 . Both variables should be numeric in Stata.919285 0.30 0.417492 -1.47 0. PROC PANEL DATA=masil.tsset is executed.5.64 37.229641 42.097494 -0.0001 0.000 9.80 0.31 0.81) Prob > F = = corr(u_i.29598 0.edu/~statmath 33 .8598126 .2017 Linear Regression Models for Panel Data: 33 -1.indiana.76 0. They have strong advantages over other software packages in this respect.69 -5.256614 10.4477333 load | -1.31 0.xtreg indicates the within effect model and i(airline) specifies airline as the independent unit.4174918 . t P>|t| [95% Conf.0151991 27.000 .9873 Number of obs Number of groups = = 90 6 15 15.9926 between = 0. MODEL cost = output fuel load /FIXONE.0500 0.0001 <.070396 . xtreg cost output fuel load.tsset.tsset command that specifies cross-sectional and timeseries variables.3872503 . R2. string variables are not allowed in .1283 -0.2637 0. 
Once .14 30.0000 Obs per group: min = avg = max = F(3.3 Using Stata The Stata . Both PROC TSCSREG and PROC PANEL report correct (adjusted) MSE.0001 <.793004 0. quietly tsset airline year The fe option of .0239 0.

9878812 | | Model test F[ 3.0000) | | Diagnostic Log likelihood = 61.0000 Like PROC PANEL.17044 -------------+---------------------------------------------------------------airline | F(5.26 (. 86] (prob) =2419.xtreg reports correct standard errors and the F test for a fixed group effect. 81) Prob > F R-squared Adj R-squared Root MSE = 90 = 3604. The intercept 9.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 34 -------------+---------------------------------------------------------------sigma_u | . absorbing indicators Number of obs F( 3.229641 42.0298901 30. μ2=0.1320775 sigma_e | . REGRESS.33 (.g.4 Using LIMDEP In LIMDEP.30 0.3581 | | Chi-sq [ 3] (prob) = 400.8598126 .31 0. 2009 at 03:56:52PM | | LHS=COST Mean = 13.1246133 | | Fit R-squared = . . 81) = 57.3872503 . μ1=0. and μ5=0).Str=AIRLINE.4174918 .9192846 .131971 | | WTS=none Number of observs. Err.0000) | | Info criter. = -4.000 .Panel. t P>|t| [95% Conf.Rhs=ONE.713528 . 81) = 57.471696 -.Lhs=COST. areg cost output fuel load.edu/~statmath 34 . Notice that the intercept of 9. the intercept of LSDV3.80 = 0.82843653 (fraction of variance due to u_i) -----------------------------------------------------------------------------F test that all u_i=0: F(5.000 .Fixed$ +----------------------------------------------------+ | OLS Without Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 27. which is correct.FUEL.000 (6 categories) 4.9972 = . Crt.732 0..20169 -5. .06011 -----------------------------------------------------------------------------cost | Coef.OUTPUT. But this command does not provide an analysis of variance (ANOVA) table.indiana.0000 = 0.7135 is the average of six airlines.47 0.5.9974 = 0. LogAmemiya Prd.LOAD.000 -1.335450 | | Standard error of e = .36561 | | Standard deviation = 1.9787565 fuel | . 
= 90 | | Model size Parameters = 4 | | Degrees of freedom = 86 | | Residuals Sum of squares = 1.06010514 rho | .000 9.areg to get the same result except for R2.121653 | +----------------------------------------------------+ http://www.0151991 27.7135 is that of LSDV3. Alternatively.76991 | | Restricted(b=0) = -138. the Panel and Fixed subcommands in the Regress$ command fit a fixed effect panel data model. μ4=0. The last line of the output tests the null hypothesis that five dummy parameters in LSDV1 are zero (e. = -4. you may use . absorb(airline) Linear regression.73 Prob > F = 0. Interval] -------------+---------------------------------------------------------------output | . μ3=0. The Str subcommand specifies a stratification variable.6690963 _cons | 9.9882897 | | Adjusted R-squared = .76 0.4477333 load | -1.256614 10. R2 and F statistic are not correct. Criter.121594 | | Akaike Info. Std.070396 .

7703592 LOAD | -1. P value | |(2) vs (1) 95.7703592 LOAD | -1.02988997 30. LogAmemiya Prd.07039502 .02030424 22.00000 | |(3) vs (1) 400.6548513 | |(3) X .0000) | | Diagnostic Log likelihood = 130.22924522 41.0000 +----------------------------------------------------+ | Least Squares with Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 27.359 .9974341 | | Adjusted R-squared = .t) . 2009 at 03:56:52PM | | LHS=COST Mean = 13.91928814 .131971 | | WTS=none Number of observs.6799 5.0000 .041 89.0000) | | Info criter.0000000 | |(2) Group effects only -90.76991 .832 3 81 .00000 31.9974341 | +--------------------------------------------------------------------+ | Hypothesis Tests | | Likelihood Ratio Test F Tests | | Chi-squared d.82 (.733 5 81 .0000 .28136 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .528687 | | Estd. Free.00000 3604.62750780 .3611 84.51691223 . F num.88273863 .17430918 FUEL | .00000 | |(4) vs (1) 536.599 .17430918 FUEL | . Autocorrelation of e(i. 1.89 (.307 . = -5.3936109461D+02 .889 8 .6010493E-01 | | Fit R-squared = .00000 2419. Prob.45397771 .3581 | | Chi-sq [ 8] (prob) = 536.9360 | | Residual 39. Largest 15 | | Average group size 15. 81] (prob) =3935.00000 | +--------------------------------------------------------------------+ http://www.528017 | | Akaike Info.01325455 66.34530293 -4.48804 .00000 | |(4) vs (3) 136. denom.1335449522D+01 .875 5 84 . 14.9971806 | | Model test F[ 8.329 3 86 .514 . 
Crt.633 5 .f.56046016 +--------------------------------------------------------------------+ | Test Statistics for the Classical Model | +--------------------------------------------------------------------+ | Model Log-Likelihood Sum of Squares R-squared | |(1) Constant term only -138.35814 .818 8 81 .56046016 Constant| 9.2926208 | | Standard error of e = .41749105 .756 .740 5 . = 90 | | Model size Parameters = 9 | | Degrees of freedom = 81 | | Residuals Sum of squares = .573531 | +----------------------------------------------------+ +----------------------------------------------------+ | Panel:Groups Empty 0.256 3 .36561 | | Standard deviation = 1.713 .0000 12.1140409821D+03 . Mean Square | | Between 74.0000 -1.00000 3935.variables only 61.9882897 | |(4) X and group effects 130. .468 .0000 12.indiana.468584 | | Total 114.00000 | |(4) vs (2) 441.00000 57.01519907 27.0865 | | Restricted(b=0) = -138.20168924 -5.2926207777D+00 .© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 35 +----------------------------------------------------+ | Panel Data Analysis of COST [ONE way] | | Unconditional ANOVA (No regressors) | | Source Variation Deg.149 3 .edu/~statmath 35 .0000 -1. = -5.00 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | . Criter. Valid data 6 | | Smallest 15.08647 .

LIMDEP reports both the pooled OLS regression under the label OLS Without Group Dummy Variables and the within effect model under Least Squares with Group Dummy Variables. Like the SAS TSCSREG procedure, LIMDEP provides correct MSE, SEE, R2, and standard errors of the fixed effect model. LIMDEP also conducts the F test for checking a fixed group effect (see the last line of the LIMDEP output above to get 57.733).

4.6 Between Group Effect Model: Group Mean Regression

A between effect model uses aggregate information, group means of variables. In other words, the unit of analysis is not an individual observation, but an entity or subject. The number of observations drops from nT to n. This group mean regression produces different goodness-of-fit measures and parameter estimates compared to those of LSDV and the within effect model.

Let us compute group means and run OLS with them. The .collapse command computes aggregate information and stores it in a new data set. Note that /// links two command lines.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
           gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  3,     2) =  104.12
       Model |  4.94698124     3  1.64899375           Prob > F      =  0.0095
    Residual |  .031675926     2  .015837963           R-squared     =  0.9936
-------------+------------------------------           Adj R-squared =  0.9841
       Total |  4.97865717     5  .995731433           Root MSE      =  .12585

------------------------------------------------------------------------------
     gm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gm_output |   .7824568   .1087646     7.19   0.019     .3144803    1.250433
     gm_fuel |  -5.523904   4.478718    -1.23   0.343    -24.79427    13.74647
     gm_load |  -1.751072   2.743167    -0.64   0.589    -13.55397    10.05182
       _cons |    85.8081   56.48199     1.52   0.268    -157.2143    328.8305
------------------------------------------------------------------------------

This model fits the data relatively well, but its t-tests report insignificant parameters.

The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between effect model, but PROC TSCSREG does not. /BTWNG and /BTWNT fit the between group and time effect models, respectively.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /BTWNG;
RUN;

The PANEL Procedure
Between Groups Estimates

Dependent Variable: cost

Model Description

Estimation Method            BtwGrps
Number of Cross Sections           6

343 -24. = 6 | | Model size Parameters = 4 | | Degrees of freedom = 2 | | Residuals Sum of squares = .1371 Number of obs Number of groups = = 90 6 15 15. .55401 10.75102 t Value 1. Err.79471 13.48302 1.Lhs=COST.34 (.8358 ------------------------------------------------------------------------------ LIMDEP has the Means subcommand to fit the between effect model.9936 DFE Root MSE 2 0.1087663 7.36561 | | Standard deviation = .8808 between = 0.23 0.64 Pr > |t| 0.74319 -0.52398 -1.OUTPUT.3427 0.13 (.751016 2.0000) | http://www.953835 | | Chi-sq [ 3] (prob) = 30. REGRESS.Str=AIRLINE.9978636 | | WTS=NTi/Nobs Number of observs.1258491 -----------------------------------------------------------------------------cost | Coef.19 0.589 -13.0188 0.2178 328. t P>|t| [95% Conf.xtreg command has the be option to fit the between effect model but does not report the ANOVA table.Rhs=ONE.indiana.80901 56.© 2005-2009 The Trustees of Indiana University (9/16/2009) Time Series Length Linear Regression Models for Panel Data: 37 15 Fit Statistics SSE MSE R-Square 0.9936 overall = 0.19 -1.4830 0. xtreg cost output fuel load.2) Prob > F = = sd(u_i + avg(e_i.523978 4. Std.9936383 | | Adjusted R-squared = .0158 0.5886 Label Intercept The Stata .Panel.1088 4.4788 2. Interval] -------------+---------------------------------------------------------------output | .23 -0.7432 Variable Intercept output fuel load DF 1 1 1 1 Estimate 85. be i(airline) Between regression (regression on group means) Group variable: airline R-sq: within = 0.250439 fuel | -5.2681 0.7824552 .52 0.218541 | | Restricted(b=0) = -7. 2009 at 04:04:12PM | | LHS=YBAR(i.478802 -1.0095 Obs per group: min = avg = max = F(3.12 0.782455 -5.FUEL. 2] (prob) = 104.64 0.80901 0.) 
Mean = 13.52 7.0 15 104.9840957 | | Model test F[ 3.1258 Parameter Estimates Standard Error 56.Means$ +----------------------------------------------------+ | Group Means Regression | | Ordinary least squares regression | | Model was estimated Aug 27.0317 0.74675 load | -1.019 .3144715 1.268 -157.1258427 | | Fit R-squared = .3167277E-01 | | Standard error of e = .edu/~statmath 37 .0095) | | Diagnostic Log likelihood = 7.))= .LOAD.05198 _cons | 85.

The REG Procedure Model: MODEL1 Test 1 Results for Dependent Variable cost Mean Square 0.9883) (6  1)  ~ 57. (.52443747 4.9936.edu/~statmath 38 . Crt.9883 from the pooled OLS.4811479 1. There is a fixed group effect in these panel data. http://www. Alternatively.xtreg command. The F statistic is computed as (1. and LIMDEP all report the same result: SSE . right after estimating the model. In order to conduct a F-test.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 38 | Info criter.10876126 7.9974) (90  6  3) The large F statistic rejects the null hypothesis in favor of the fixed group effect model (p<.78244727 . however.519 . Criter.00361 Source Numerator Denominator DF 5 81 F Value 57.airline. you may draw R2 of . PROC REG DATA=masil.638 .81] .0095). RUN.   n 1  0 .2926) (6  1) (.73 Pr > F <.3354 from the pooled OLS regression and ..1287 SAS.47865187 -1. SEE .|P[|Z|>z]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | . TEST g1 = g2 = g3 = g4 = g5 = 0.74304702 -.2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model.234 .9974 from LSDV1 or LSDV3 and .Er. MODEL cost = g1-g5 output fuel load.1258. you may conduct the same test in LSDV1. use LSDV2 and the within effect model for R2. F 104.2174 .7 Testing Fixed Group Effects (F-test) How do we know whether there is a significant fixed group effect? The null hypothesis is that all dummy parameters except for one are zero: H 0 : 1  .18642891 LOAD | -1.32541105 Constant| 85. let us obtain the SSE (e’e) of 1.20856 0.9974  .indiana. = -3.5233 . Stata.test command.910724 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |b/St. In SAS. Alternatively.12 (p<.3354  . Stata . The SAS TSCSREG and PANEL procedures. 4. LogAmemiya Prd. run the .7319[5.2926) (90  6  3) (1  .194 .0001 In Stata.. 
= -3. and LIMDEP Regress$ command by default conduct the F test. a follow-up command for the Wald test.634619 | | Akaike Info.230256D-11 FUEL | -5. Do not. add the TEST statement in PROC REG and then run the procedure again (ANOVA table and parameter estimates are skipped).75094765 2.0317.0000). and R2 .8148317 56.0000 .
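The F statistic for the fixed group effect can be checked by hand from the two sums of squared errors reported in the outputs above. The following is only a minimal sketch: the function and variable names are illustrative (not part of any package), and the inputs are the pooled OLS SSE (1.335450) and the LSDV SSE (.2926208) as printed by LIMDEP.

```python
# Recompute the F test for a fixed group effect from reported SSEs.
# Assumption: sse_pooled and sse_lsdv are copied from the LIMDEP output above.

def f_from_sse(sse_restricted, sse_unrestricted, df_num, df_den):
    """F = [(SSE_r - SSE_u) / df_num] / [SSE_u / df_den]."""
    return ((sse_restricted - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)

n_groups, n_periods, n_regressors = 6, 15, 3
sse_pooled = 1.335450    # pooled OLS (restricted model)
sse_lsdv = 0.2926208     # LSDV / within effect model (unrestricted model)

df_num = n_groups - 1                                      # number of restrictions
df_den = n_groups * n_periods - n_groups - n_regressors    # residual df of LSDV

f_stat = f_from_sse(sse_pooled, sse_lsdv, df_num, df_den)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

The result agrees, up to rounding, with the 57.73 reported by PROC TSCSREG, PROC PANEL, .xtreg, and LIMDEP.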

and LIMDEP.73 0.xtreg.8 Summary Table 4. PROC PANEL. /BTWNT * “Yes/No” means whether the software reports the statistics. (adjusted) R2 . Correct Incorrect F.indiana. http://www. The SAS PANEL procedure is generally preferred to Stata and LIMDEP counterparts since it produces correct statistics and conducts various hypothesis tests conveniently. Stata. 81) = Prob > F = 57.be Means. . Stata.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 39 . (adjusted) R2 Correct PROC TSCSREG.cnsreg Regress$ Correct Incorrect F. quietly regress cost g1-g5 output fuel load . LIMDEP* SAS 9 Stata 11 LIMDEP 9 OLS estimation LSDV1 LSDV2 LSDV3 Panel Estimation PROC REG.regress.cnsreg No ANOVA table and R2 .1 summarizes the estimation of a fixed effect model in SAS. Between effect /BTWNG. “Correct/incorrect” indicates whether the statistics are different from those of the least squares dummy variable (LSDV) 1 without a dummy variable. Table 4.0000 4. test g1 g2 g3 g4 g5 ( ( ( ( ( 1) 2) 3) 4) 5) g1 g2 g3 g4 g5 F( = = = = = 0 0 0 0 0 5. Panel$ Estimation type LSDV1 Within effect Within effect SSE (e’e) Correct No Correct MSE or SEE Correct (adjusted) No Correct (adjusted) SEE Model test (F) No Incorrect Slightly different F (adjusted) R2 Correct Incorrect (correct in . . .edu/~statmath 39 .areg) Correct Intercept Correct LSDV3 intercept No Coefficients Correct Correct Correct Standard errors Correct (adjusted) Correct (adjusted) Correct (adjusted) Effect test (F) Yes Yes Yes .1 Comparison of the Fixed Effect Model in SAS.areg Correct (slightly different F) Correct (slightly different F) Correct (slightly different F) Different dummy coefficients Regress.

5. One-way Fixed Effect Models: Time Effects

A fixed time effect model investigates how time affects the intercept using time dummy variables. The logic and method are the same as those of the fixed group effect model.

5.1 Least Squares Dummy Variable Models

The least squares dummy variable (LSDV) model produces the following fifteen regression equations.

Time 01: cost = 20.4959 + .8677*output - .4845*fuel - 1.9544*load
Time 02: cost = 20.5782 + .8677*output - .4845*fuel - 1.9544*load
Time 03: cost = 20.6559 + .8677*output - .4845*fuel - 1.9544*load
Time 04: cost = 20.7409 + .8677*output - .4845*fuel - 1.9544*load
Time 05: cost = 21.2000 + .8677*output - .4845*fuel - 1.9544*load
Time 06: cost = 21.4118 + .8677*output - .4845*fuel - 1.9544*load
Time 07: cost = 21.5035 + .8677*output - .4845*fuel - 1.9544*load
Time 08: cost = 21.6542 + .8677*output - .4845*fuel - 1.9544*load
Time 09: cost = 21.8297 + .8677*output - .4845*fuel - 1.9544*load
Time 10: cost = 22.1140 + .8677*output - .4845*fuel - 1.9544*load
Time 11: cost = 22.4655 + .8677*output - .4845*fuel - 1.9544*load
Time 12: cost = 22.6515 + .8677*output - .4845*fuel - 1.9544*load
Time 13: cost = 22.6167 + .8677*output - .4845*fuel - 1.9544*load
Time 14: cost = 22.5524 + .8677*output - .4845*fuel - 1.9544*load
Time 15: cost = 22.5369 + .8677*output - .4845*fuel - 1.9544*load

5.1.1 LSDV1 without a Dummy

In the SAS REG procedure, include time dummy variables instead of group dummies. You need to exclude one of the time dummies, say t15 here, in LSDV1.

PROC REG DATA=masil.airline;
   MODEL cost = t1-t14 output fuel load;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                    17      112.95270        6.64428     439.62    <.0001
Error                    72        1.08819        0.01511
Corrected Total          89      114.04089

Root MSE              0.12294    R-Square     0.9905
Dependent Mean       13.36561    Adj R-Sq     0.9882
Coeff Var             0.91981

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       22.53677        4.94053       4.56      <.0001
t1            1       -2.04096        0.73469      -2.78      0.0070
t2            1       -1.95873        0.72275      -2.71      0.0084
t3            1       -1.88103        0.72036      -2.61      0.0110
t4            1       -1.79601        0.69882      -2.57      0.0122
t5            1       -1.33693        0.50604      -2.64      0.0101
t6            1       -1.12514        0.40862      -2.75      0.0075
t7            1       -1.03341        0.37642      -2.75      0.0076
t8            1       -0.88274        0.32601      -2.71      0.0085
t9            1       -0.70719        0.29470      -2.40      0.0190
t10           1       -0.42296        0.16679      -2.54      0.0134
t11           1       -0.07144        0.07176      -1.00      0.3228
t12           1        0.11457        0.09841       1.16      0.2482
t13           1        0.07979        0.08442       0.95      0.3477
t14           1        0.01546        0.07264       0.21      0.8320
output        1        0.86773        0.01541      56.32      <.0001
fuel          1       -0.48448        0.36411      -1.33      0.1875
load          1       -1.95440        0.44238      -4.42      <.0001

In Stata and LIMDEP, execute the following commands to fit the same LSDV1 (output is skipped).
. regress cost t1-t14 output fuel load
REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$

5.1.2 LSDV2 without the Intercept

In LIMDEP, take ONE out to fit LSDV2 by suppressing the intercept. Unlike SAS and Stata, LIMDEP reports correct, although slightly different, F and R2 statistics.
REGRESS;Lhs=COST;Rhs=T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:15:08PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Autocorrel   Durbin-Watson Stat.  =   .2363289     |
|              Rho = cor[e,e(-1)]   =   .8818355     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 T1      |   20.4959389       4.20954636     4.869   .0000   .06666667
 T2      |   20.5781713       4.22154389     4.875   .0000   .06666667
 T3      |   20.6558664       4.22419549     4.890   .0000   .06666667
 T4      |   20.7408923       4.24576770     4.885   .0000   .06666667
 T5      |   21.1999763       4.44035103     4.774   .0000   .06666667
 T6      |   21.4117634       4.53864000     4.718   .0000   .06666667
 T7      |   21.5034994       4.57141663     4.704   .0000   .06666667
 T8      |   21.6541766       4.62290530     4.684   .0000   .06666667
 T9      |   21.8297215       4.65692608     4.688   .0000   .06666667
 T10     |   22.1139553       4.79266903     4.614   .0000   .06666667
 T11     |   22.4654855       4.94992975     4.539   .0000   .06666667
 T12     |   22.6514956       5.00861379     4.523   .0000   .06666667
 T13     |   22.6167135       4.98616006     4.536   .0000   .06666667
 T14     |   22.5523879       4.95596262     4.551   .0000   .06666667
 T15     |   22.5369251       4.94055238     4.562   .0000   .06666667
 OUTPUT  |    .86772681        .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467        .36410984    -1.331   .1875   12.7703592
 LOAD    |  -1.95441438        .44237791    -4.418   .0000   .56046016
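LSDV1 and LSDV2 are two parameterizations of the same model, so each LSDV2 dummy coefficient should equal the LSDV1 intercept plus the matching LSDV1 dummy coefficient, and the coefficient of the dropped dummy (t15) should equal the LSDV1 intercept itself. The sketch below checks this with the estimates reported above; the helper name `lsdv2_from_lsdv1` is hypothetical, and small gaps are expected because the two packages print different numbers of digits.

```python
# Rebuild the LSDV2 time intercepts from the LSDV1 estimates reported by PROC REG.
# Assumption: all numbers are copied from the SAS (LSDV1) and LIMDEP (LSDV2) outputs.

lsdv1_intercept = 22.53677
lsdv1_dummies = [  # t1 ... t14 (t15 is the omitted baseline)
    -2.04096, -1.95873, -1.88103, -1.79601, -1.33693, -1.12514, -1.03341,
    -0.88274, -0.70719, -0.42296, -0.07144, 0.11457, 0.07979, 0.01546,
]
lsdv2_reported = [  # T1 ... T15 from the LIMDEP output
    20.4959389, 20.5781713, 20.6558664, 20.7408923, 21.1999763,
    21.4117634, 21.5034994, 21.6541766, 21.8297215, 22.1139553,
    22.4654855, 22.6514956, 22.6167135, 22.5523879, 22.5369251,
]

def lsdv2_from_lsdv1(intercept, dummies):
    """Time-specific intercepts: baseline intercept plus each dummy coefficient."""
    return [intercept + d for d in dummies] + [intercept]

implied = lsdv2_from_lsdv1(lsdv1_intercept, lsdv1_dummies)
max_gap = max(abs(a - b) for a, b in zip(implied, lsdv2_reported))

for year, (a, b) in enumerate(zip(implied, lsdv2_reported), start=1):
    print(f"Time {year:02d}: implied {a:.4f}  reported {b:.4f}")
print(f"largest absolute gap: {max_gap:.5f}")
```

These implied values are also the intercepts of the fifteen regression equations listed in Section 5.1.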

In SAS and Stata, use /NOINT and noconstant, respectively, to suppress the intercept and estimate the same LSDV2 (output is skipped).
PROC REG DATA=masil.airline;
   MODEL cost = t1-t15 output fuel load /NOINT;
RUN;

. regress cost t1-t15 output fuel load, noc

5.1.3 LSDV3 with a Restriction

In PROC REG, you need to impose a restriction using the RESTRICT statement.
PROC REG DATA=masil.airline;
   MODEL cost = t1-t15 output fuel load;
   RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                    17      112.95270        6.64428     439.62    <.0001
Error                    72        1.08819        0.01511
Corrected Total          89      114.04089

Root MSE              0.12294    R-Square     0.9905
Dependent Mean       13.36561    Adj R-Sq     0.9882
Coeff Var             0.91981

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       21.66698        4.62405       4.69      <.0001
t1            1       -1.17118        0.41783      -2.80      0.0065
t2            1       -1.08894        0.40586      -2.68      0.0090
t3            1       -1.01125        0.40323      -2.51      0.0144
t4            1       -0.92622        0.38177      -2.43      0.0178
t5            1       -0.46715        0.19076      -2.45      0.0168
t6            1       -0.25536        0.09856      -2.59      0.0116
t7            1       -0.16363        0.07190      -2.28      0.0258
t8            1       -0.01296        0.04862      -0.27      0.7907
t9            1        0.16259        0.06271       2.59      0.0115
t10           1        0.44682        0.17599       2.54      0.0133
t11           1        0.79834        0.32940       2.42      0.0179
t12           1        0.98435        0.38756       2.54      0.0132
t13           1        0.94957        0.36537       2.60      0.0113
t14           1        0.88524        0.33549       2.64      0.0102
t15           1        0.86978        0.32029       2.72      0.0083
output        1        0.86773        0.01541      56.32      <.0001
fuel          1       -0.48448        0.36411      -1.33      0.1875
load          1       -1.95440        0.44238      -4.42      <.0001
RESTRICT     -1    -3.9462E-15              .          .       .

* Probability computed using beta distribution.
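Under the restriction that the time dummy coefficients sum to zero, the LSDV3 intercept is the average of the fifteen LSDV2 time intercepts, and each LSDV3 dummy coefficient is the deviation of its LSDV2 intercept from that average. The following sketch, with hypothetical names and values copied from the LSDV2 output in Section 5.1.2, illustrates the relationship; it is an arithmetic check, not a re-estimation.

```python
# Derive LSDV3 (sum-to-zero) estimates from the LSDV2 time intercepts.
# Assumption: lsdv2 holds T1 ... T15 as reported in the LSDV2 output.

lsdv2 = [
    20.4959389, 20.5781713, 20.6558664, 20.7408923, 21.1999763,
    21.4117634, 21.5034994, 21.6541766, 21.8297215, 22.1139553,
    22.4654855, 22.6514956, 22.6167135, 22.5523879, 22.5369251,
]

def lsdv3_from_lsdv2(time_intercepts):
    """Return (overall intercept, deviations from it), which sum to zero."""
    intercept = sum(time_intercepts) / len(time_intercepts)
    deviations = [d - intercept for d in time_intercepts]
    return intercept, deviations

lsdv3_intercept, lsdv3_dummies = lsdv3_from_lsdv2(lsdv2)

print(f"LSDV3 intercept: {lsdv3_intercept:.5f}")       # PROC REG reports 21.66698
print(f"t1 coefficient : {lsdv3_dummies[0]:.5f}")      # PROC REG reports -1.17118
print(f"t15 coefficient: {lsdv3_dummies[-1]:.5f}")     # PROC REG reports 0.86978
print(f"sum of dummies : {sum(lsdv3_dummies):.2e}")    # zero up to rounding
```

The tiny nonzero sum mirrors the -3.9462E-15 shown in the RESTRICT row: it is floating-point rounding, not a violation of the restriction.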

In Stata, define the restriction with the .constraint command and specify the restriction using the constraint() option of the .cnsreg command.
. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost t1-t15 output fuel load, constraint(3)

Constrained linear regression                          Number of obs =      90
                                                       F( 17,    72) =  439.62
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  0.1229

 ( 1)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |  -1.171179   .4178338    -2.80   0.007    -2.004115   -.3382422
          t2 |  -1.088945   .4058579    -2.68   0.009    -1.898008   -.2798816
          t3 |  -1.011252   .4032308    -2.51   0.014    -1.815078   -.2074266
          t4 |  -.9262249   .3817675    -2.43   0.018    -1.687265   -.1651852
          t5 |  -.4671515   .1907596    -2.45   0.017    -.8474239   -.0868791
          t6 |  -.2553627   .0985615    -2.59   0.012    -.4518415   -.0588839
          t7 |  -.1636326   .0718969    -2.28   0.026    -.3069564   -.0203088
          t8 |  -.0129552   .0486249    -0.27   0.791    -.1098872    .0839768
          t9 |   .1625876   .0627099     2.59   0.012     .0375776    .2875976
         t10 |   .4468191    .175994     2.54   0.013     .0959814    .7976568
         t11 |   .7983439   .3294027     2.42   0.018     .1416916    1.454996
         t12 |   .9843536   .3875583     2.54   0.013     .2117702    1.756937
         t13 |   .9495716   .3653675     2.60   0.011     .2212248    1.677918
         t14 |   .8852448   .3354912     2.64   0.010     .2164554    1.554034
         t15 |   .8697821   .3202933     2.72   0.008     .2312891    1.508275
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
------------------------------------------------------------------------------

In LIMDEP, run the following command to fit the same LSDV3.

REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)+b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:16:47PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Autocorrel   Durbin-Watson Stat.  =   .2363289     |
|              Rho = cor[e,e(-1)]   =   .8818355     |
| Restrictns.  F[  1,    71] (prob) =    .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 T1      |  -1.17119233        .41783540    -2.803   .0065   .06666667
 T2      |  -1.08895999        .40585988    -2.683   .0091   .06666667
 T3      |  -1.01126486        .40323211    -2.508   .0144   .06666667
 T4      |   -.92623900        .38176914    -2.426   .0178   .06666667
 T5      |   -.46715493        .19075952    -2.449   .0168   .06666667
 T6      |   -.25536788        .09856234    -2.591   .0116   .06666667
 T7      |   -.16363186        .07189683    -2.276   .0259   .06666667
 T8      |   -.01295461        .04862498     -.266   .7907   .06666667
 T9      |    .16259020        .06271009     2.593   .0116   .06666667
 T10     |    .44682406        .17599505     2.539   .0133   .06666667
 T11     |    .79835421        .32940389     2.424   .0179   .06666667
 T12     |    .98436437        .38755999     2.540   .0133   .06666667
 T13     |    .94958221        .36536879     2.599   .0114   .06666667
 T14     |    .88525662        .33549236     2.639   .0102   .06666667
 T15     |    .86979380        .32029396     2.716   .0083   .06666667
 OUTPUT  |    .86772681        .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467        .36410984    -1.331   .1876   12.7703592
 LOAD    |  -1.95441438        .44237791    -4.418   .0000   .56046016
 Constant|   21.6671313       4.62407240     4.686   .0000

5.2 Within Time Effect Model

by(year) = mean(output). .9544)*.012507934 -------------+-----------------------------Total | 76.63606 .1738836 tw_load | -1. For instance.05984 -.29884 -1.9023156 13.4788587 | | 2 12. transform the dependent and independent variables and then run OLS with the intercept suppressed.8641667 13.067003 12. .5670587 | | 9 13.15965 -1.tm_output fuel .6271 + (-1.154514 ------------------------------------------------------------------------------ If you want to get intercepts of years. by(year) +---------------------------------------------------+ | year tm_cost tm_output tm_fuel tm_load | |---------------------------------------------------| | 1 12. noc Source | SS df MS -------------+-----------------------------Model | 75. 5.tm_cost = output .12841 -. regress tw_cost tw_output tw_fuel tw_load.3024) + (-.5527684 13.5541809 | | 7 13. quietly quietly quietly quietly egen egen egen egen tm_cost = tm_output tm_fuel = tm_load = mean(cost). .302416 12.9853 = .5797168 | +---------------------------------------------------+ Once time means are ready.67403 .393002 12.2.indiana.5635266 | | 6 13.1597-{.90 0.5803183 | | 14 14. http://www.45963 -1.77912 -1. As discussed previously.6233943 | | 11 13.954404 .5607}.66246 .67494 . by(year) mean(fuel). Keep in mind that the intercept should be suppressed.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 45 The within effect model for a fixed time effect needs to compute deviations from time means.3312*sqrt(87/72).60706 -1.222963 12.26843 .4868322 | | 3 12.95 = 0.7341294 90 .3641= .852601437 Number of obs F( 3.53826 .73193 .5802577 | | 12 14.08819023 87 .23517 -.000 -2.147 -1. .577767 11. Err.32062 -.000 .91324 -.5607425 | | 8 13. 87) Prob > F R-squared Adj R-squared Root MSE = 90 = 2015.62714 .398122 12. by(year) mean(load). . .23183 . For example.66868 .82315 .4844836 .75979 .tm_fuel load .7923916 13. t P>|t| [95% Conf.8677268 . 
standard errors of a within effect model need to be adjusted.11184 -----------------------------------------------------------------------------tw_cost | Coef.4845)*12.9205539 13.70187 -.46 0.5856243 | | 13 14.142851 .8677*(-1.5244486 | | 5 12.76768 .4651 -1. Std.5035=13. Interval] -------------+---------------------------------------------------------------tw_output | .5804528 | | 15 14.790283 11.6428015 13.52358 | | 4 12.0000 = 0.86 0.86104 . the intercept of year 7 is 21.8955873 tw_fuel | -.3312359 -1.62997 .9858 = 0.8398663 . use d t*  y t   ' xt .443695 11.215313 Residual | 1.4024388 -4. .94143 -1. . quietly quietly quietly quietly gen gen gen gen tw_cost = tw_output tw_fuel = tw_load = cost .0452 -1.36897 -1.6459391 3 25.754295 -1.6179098 | | 10 13.1 Estimating the Fixed Time Effect Model Let us manually estimate the fixed time effect model first.tm_load . the correct standard error of fuel price is computed as .0140171 61.edu/~statmath 45 .744389 11.
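The standard-error correction for the within time effect model can be reproduced with a few lines of arithmetic. The sketch below assumes the unadjusted standard errors printed by the no-constant regression on the time-demeaned variables (.0140171, .3312359, and .4024388) and rescales them by the square root of the ratio of the two residual degrees of freedom (87 for the within regression, 72 for LSDV). The function name `adjust_se` is illustrative only.

```python
import math

# Rescale within-effect standard errors so they match the LSDV standard errors.
# Assumption: within_se values are copied from the within regression output above.

def adjust_se(se_within, df_within, df_lsdv):
    """The within regression ignores the df absorbed by the time means; correct for it."""
    return se_within * math.sqrt(df_within / df_lsdv)

n_obs, n_regressors, n_periods = 90, 3, 15
df_within = n_obs - n_regressors              # 87: what the within regression assumes
df_lsdv = n_obs - n_periods - n_regressors    # 72: correct residual degrees of freedom

within_se = {"output": 0.0140171, "fuel": 0.3312359, "load": 0.4024388}
adjusted_se = {name: adjust_se(se, df_within, df_lsdv) for name, se in within_se.items()}

for name, se in adjusted_se.items():
    print(f"{name:>6}: {within_se[name]:.7f} -> {se:.7f}")
```

The adjusted values agree with the LSDV standard errors reported in Section 5.1 (.0154, .3641, and .4424).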

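The within transformation itself is easy to trace on a toy example. The following sketch uses a hypothetical two-year panel with one regressor (the data are made up for illustration and are not the airline data): it subtracts time means, fits OLS without an intercept, and then recovers the year intercepts from the time means.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy panel: (year, y, x). Not the airline data.
rows = [(1, 2.0, 1.0), (1, 6.0, 3.0), (2, 7.0, 2.0), (2, 11.0, 4.0)]

# Time means of y and x.
by_year = defaultdict(list)
for year, y, x in rows:
    by_year[year].append((y, x))
y_bar = {t: mean(y for y, _ in obs) for t, obs in by_year.items()}
x_bar = {t: mean(x for _, x in obs) for t, obs in by_year.items()}

# Deviations from time means, then OLS through the origin.
dev = [(y - y_bar[t], x - x_bar[t]) for t, y, x in rows]
slope = sum(dy * dx for dy, dx in dev) / sum(dx * dx for _, dx in dev)

# Year intercepts: time mean of y minus slope times time mean of x.
intercepts = {t: y_bar[t] - slope * x_bar[t] for t in y_bar}

print(slope, intercepts)
```

With these toy numbers the common slope is 2 and the two year intercepts are 0 and 3, which is what a regression with one dummy per year would return.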
. sum cost output fuel load if year==7

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cost |         6    13.15965    1.071738   11.88492   14.52004
      output |         6   -1.302416    1.272691  -2.865108   .2550375
        fuel |         6    12.62714    .0747646   12.48162   12.68725
        load |         6    .5607425     .029541    .510342    .594495

5.2.2 Using SAS: PROC TSCSREG and PROC PANEL

You need to sort the data set by variables (i.e., year and airline), which will appear in the ID statement of PROC TSCSREG and PROC PANEL.

PROC SORT DATA=masil.airline;
   BY year airline;
RUN;

PROC TSCSREG DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /FIXONE;
RUN;

(output is skipped)

PROC PANEL DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /FIXONE;
RUN;

The output is very similar to that of LSDV1 in Section 5.1.1. The F test does not reject the null hypothesis of no fixed time effect (F=1.17, p=.3178); that is, there is no fixed time effect in these panel data.

                      The PANEL Procedure
                   Fixed One Way Estimates

Dependent Variable: cost

                      Model Description
          Estimation Method                 FixOne
          Number of Cross Sections              15
          Time Series Length                     6

                       Fit Statistics
          SSE           1.0882    DFE               72
          MSE           0.0151    Root MSE      0.1229
          R-Square      0.9905

                F Test for No Fixed Effects
          Num DF     Den DF    F Value    Pr > F

              14         72       1.17    0.3178

                             Parameter Estimates

                             Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   Label
CS1          1    -2.04096     0.7347     -2.78     0.0070   Cross Sectional Effect 1
CS2          1    -1.95873     0.7228     -2.71     0.0084   Cross Sectional Effect 2
CS3          1    -1.88103     0.7204     -2.61     0.0110   Cross Sectional Effect 3
CS4          1    -1.79601     0.6988     -2.57     0.0122   Cross Sectional Effect 4
CS5          1    -1.33693     0.5060     -2.64     0.0101   Cross Sectional Effect 5
CS6          1    -1.12514     0.4086     -2.75     0.0075   Cross Sectional Effect 6
CS7          1    -1.03341     0.3764     -2.75     0.0076   Cross Sectional Effect 7
CS8          1    -0.88274     0.3260     -2.71     0.0085   Cross Sectional Effect 8
CS9          1    -0.70719     0.2947     -2.40     0.0190   Cross Sectional Effect 9
CS10         1    -0.42296     0.1668     -2.54     0.0134   Cross Sectional Effect 10
CS11         1    -0.07144     0.0718     -1.00     0.3228   Cross Sectional Effect 11
CS12         1    0.114571     0.0984      1.16     0.2482   Cross Sectional Effect 12
CS13         1    0.079789     0.0844      0.95     0.3477   Cross Sectional Effect 13
CS14         1    0.015463     0.0726      0.21     0.8320   Cross Sectional Effect 14
Intercept    1    22.53677     4.9405      4.56     <.0001   Intercept
output       1    0.867727     0.0154     56.32     <.0001
fuel         1    -0.48448     0.3641     -1.33     0.1875
load         1     -1.9544     0.4424     -4.42     <.0001

5.2.3 Using Stata

In the Stata .xtreg command, the fe option fits the fixed effect model. The following .iis command specifies year as a panel identification variable. In this case, i(year) is redundant.

. iis year
. xtreg cost output fuel load, fe i(year)

Fixed-effects (within) regression               Number of obs      =        90
Group variable: year                            Number of groups   =        15

R-sq:  within  = 0.9858                         Obs per group: min =         6
       between = 0.4812                                        avg =       6.0
       overall = 0.5265                                        max =         6

                                                F(3,72)            =   1668.37
corr(u_i, Xb)  = -0.1503                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
-------------+----------------------------------------------------------------
     sigma_u |   .8027907
     sigma_e |  .12293801
         rho |  .97708602   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(14, 72) =     1.17              Prob > F = 0.3178

Again, the intercept 21.6670 is the intercept of LSDV3 (see 5.1.3).

5.2.4 Using LIMDEP

In LIMDEP, specify a time-series variable for stratification in the Str= subcommand. Do not forget to include ONE for the intercept. The pooled OLS part of the output is skipped.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group Dummy Variables           |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:19:57PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         18     |
|              Degrees of freedom   =         72     |
| Residuals    Sum of squares       =   1.088193     |
|              Standard error of e  =   .1229382     |
| Fit          R-squared            =   .9904579     |
|              Adjusted R-squared   =   .9882049     |
| Model test   F[ 17,    72] (prob) = 439.62 (.0000) |
| Diagnostic   Log likelihood       =   70.98362     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 17]  (prob) = 418.68 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -4.009826     |
|              Akaike Info. Criter. =  -4.015291     |
| Estd. Autocorrelation of e(i,t)       .881836      |
+----------------------------------------------------+
+----------------------------------------------------+
| Panel:Groups   Empty       0,   Valid data     15  |
|                Smallest    6,   Largest         6  |
|                Average group size            6.00  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |    .86772681       .01540818    56.316   .0000  -1.17430918
 FUEL    |   -.48449467       .36410984    -1.331   .1868   12.7703592
 LOAD    |  -1.95441438       .44237791    -4.418   .0000    .56046016
+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|       Model            Log-Likelihood  Sum of Squares    R-squared |
|(1) Constant term only      -138.35814  .1140409821D+03    .0000000 |
|(2) Group effects only      -120.52864  .7673414157D+02    .3271354 |
|(3) X - variables only        61.76991  .1335449522D+01    .9882897 |
|(4) X and group effects       70.98362  .1088193393D+01    .9904579 |
+--------------------------------------------------------------------+
|                        Hypothesis Tests                            |

|         Likelihood Ratio Test           F Tests                    |
|         Chi-squared   d.f.  Prob.       F    num. denom.   P value |
|(2) vs (1)    35.659    14  .00117     2.605   14     75     .00404 |
|(3) vs (1)   400.256     3  .00000  2419.329    3     86     .00000 |
|(4) vs (1)   418.684    17  .00000   439.617   17     72     .00000 |
|(4) vs (2)   383.025     3  .00000  1668.364    3     72     .00000 |
|(4) vs (3)    18.427    14  .18800     1.169   14     72     .31776 |
+--------------------------------------------------------------------+

You may find F statistic 1.169 at the last line of the output and do not reject the null hypothesis of no fixed time effect.

5.3 Between Time Effect Model

The between effect model regresses time means of dependent variables on those of independent variables. See Sections 3.2 and 4.6.

. collapse (mean) tm_cost=cost (mean) tm_output=output (mean) tm_fuel=fuel ///
           (mean) tm_load=load, by(year)

. regress tm_cost tm_output tm_fuel tm_load

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 4074.33
       Model |  6.21220479     3  2.07073493           Prob > F      =  0.0000
    Residual |  .005590631    11  .000508239           R-squared     =  0.9991
-------------+------------------------------           Adj R-squared =  0.9989
       Total |  6.21779542    14  .444128244           Root MSE      =  .02254

------------------------------------------------------------------------------
     tm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   tm_output |   1.133337   .0512898    22.10   0.000     1.020449    1.246225
     tm_fuel |   .3342486   .0228284    14.64   0.000     .2840035    .3844937
     tm_load |  -1.350727   .2478264    -5.45   0.000    -1.896189   -.8052644
       _cons |   11.18505   .3660016    30.56   0.000     10.37949    11.99062
------------------------------------------------------------------------------

PROC PANEL has the /BTWNT option to estimate the between effect model.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /BTWNT;
RUN;

                      The PANEL Procedure
                Between Time Periods Estimates

Dependent Variable: cost

                      Model Description
          Estimation Method                BtwTime
          Number of Cross Sections               6
          Time Series Length                    15

                       Fit Statistics
          SSE           0.0056    DFE               11

          MSE           0.0005    Root MSE      0.0225
          R-Square      0.9991

                       Parameter Estimates

                             Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   Label
Intercept    1    11.18504     0.3660     30.56     <.0001   Intercept
output       1    1.133335     0.0513     22.10     <.0001
fuel         1    0.334249     0.0228     14.64     <.0001
load         1    -1.35073     0.2478     -5.45     0.0002

Alternatively, use the be option in the Stata .xtreg command and the Means subcommand in LIMDEP Regress$ command to get the same result.

. xtreg cost output fuel load, be i(year)

Between regression (regression on group means)  Number of obs      =        90
Group variable: year                            Number of groups   =        15

R-sq:  within  = 0.9840                         Obs per group: min =         6
       between = 0.9991                                        avg =       6.0
       overall = 0.9749                                        max =         6

                                                F(3,11)            =   4074.35
sd(u_i + avg(e_i.))=  .0225441                  Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   1.133335   .0512897    22.10   0.000     1.020447    1.246223
        fuel |   .3342494   .0228284    14.64   0.000     .2840044    .3844943
        load |   -1.35073   .2478257    -5.45   0.000    -1.896191   -.8052695
       _cons |   11.18504   .3660008    30.56   0.000     10.37948     11.9906
------------------------------------------------------------------------------

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Means$

+----------------------------------------------------+
| Group Means Regression                             |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:23:24PM     |
| LHS=YBAR(i.) Mean                 =   13.36561     |
|              Standard deviation   =   .6664301     |
| WTS=NTi/Nobs Number of observs.   =         15     |
| Model size   Parameters           =          4     |
|              Degrees of freedom   =         11     |
| Residuals    Sum of squares       =   .5590461E-02 |
|              Standard error of e  =   .2254382E-01 |
| Fit          R-squared            =   .9991009     |
|              Adjusted R-squared   =   .9988557     |
| Model test   F[  3,    11] (prob) =4074.46 (.0000) |
| Diagnostic   Log likelihood       =   37.92650     |
|              Restricted(b=0)      =  -14.67933     |
|              Chi-sq [  3]  (prob) = 105.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -7.348200     |
|              Akaike Info. Criter. =  -7.361410     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |   1.13334032       .05128905    22.097   .0000   .111879D-13
 FUEL    |    .33424795       .02282811    14.642   .0000   .111879D-13

 LOAD    |  -1.35072980       .24782272    -5.450   .0000   .141312D-06
 Constant|   11.1850651       .36599619    30.561   .0000

5.4 Testing Fixed Time Effects

The null hypothesis of the fixed time effect model is that all time dummy parameters except one are zero: $H_0: \tau_1 = \dots = \tau_{t-1} = 0$. The F statistic is

$$\frac{(1.3354 - 1.0882)/(15 - 1)}{1.0882/(6 \times 15 - 15 - 3)} \sim 1.1683\,[14, 72]$$

The small F statistic does not reject the null hypothesis of no fixed time effect (p=.3178). SAS PROC PANEL, LIMDEP, and Stata .xtreg by default conduct the F test. You may conduct the same test using the TEST statement in LSDV1 and the Stata .test command.

PROC REG DATA=masil.airline;
   MODEL cost = t1-t14 output fuel load;
   TEST t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

(output is skipped)

. quietly regress cost t1-t14 output fuel load
. test t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

 ( 1)  t1 = 0
 ( 2)  t2 = 0
 ( 3)  t3 = 0
 ( 4)  t4 = 0
 ( 5)  t5 = 0
 ( 6)  t6 = 0
 ( 7)  t7 = 0
 ( 8)  t8 = 0
 ( 9)  t9 = 0
 (10)  t10 = 0
 (11)  t11 = 0
 (12)  t12 = 0
 (13)  t13 = 0
 (14)  t14 = 0

       F( 14,    72) =    1.17
            Prob > F =    0.3178
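The F statistic of this section, and the likelihood ratio statistic that LIMDEP reports for the same hypothesis in Section 5.2.4, can be recomputed from the sums of squared errors and log likelihoods printed earlier. The sketch below only redoes that arithmetic in Python; the function name is illustrative, and the p-value is left out because the Python standard library has no F distribution.

```python
def f_statistic(sse_restricted, sse_unrestricted, num_df, den_df):
    """F statistic comparing a restricted model with an unrestricted one."""
    numerator = (sse_restricted - sse_unrestricted) / num_df
    denominator = sse_unrestricted / den_df
    return numerator / denominator

n_airlines, n_years, n_regressors = 6, 15, 3
num_df = n_years - 1                                      # 14 time dummies
den_df = n_airlines * n_years - n_years - n_regressors    # 72

# SSE of pooled OLS (1.3354) and of the fixed time effect model (1.0882).
f_time = f_statistic(1.3354, 1.0882, num_df, den_df)

# Log likelihoods of models (3) and (4) in the LIMDEP output of Section 5.2.4.
loglik_pooled, loglik_fixed = 61.76991, 70.98362
lr_time = 2 * (loglik_fixed - loglik_pooled)

print(round(f_time, 4), round(lr_time, 3))
```

Both statistics have 14 numerator (or chi-squared) degrees of freedom, one for each time dummy that is set to zero under the null hypothesis.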

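The between time effect model of Section 5.3 can also be traced on a toy example. The sketch below uses a hypothetical three-year panel with one regressor (made-up numbers, not the airline data): it collapses the data to time means and then fits OLS on those means, which is what the .collapse and .regress pair does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy panel: (year, cost, output). Not the airline data.
rows = [(1, 2.0, 1.0), (1, 4.0, 3.0),
        (2, 5.0, 2.0), (2, 7.0, 4.0),
        (3, 9.0, 5.0), (3, 11.0, 7.0)]

# Step 1: collapse to time means (one observation per year).
by_year = defaultdict(list)
for year, cost, output in rows:
    by_year[year].append((cost, output))
tm_cost = [mean(c for c, _ in obs) for obs in by_year.values()]
tm_output = [mean(o for _, o in obs) for obs in by_year.values()]

# Step 2: simple OLS of the cost means on the output means.
x_bar, y_bar = mean(tm_output), mean(tm_cost)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tm_output, tm_cost))
sxx = sum((x - x_bar) ** 2 for x in tm_output)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print(len(tm_cost), slope, intercept)
```

Only three observations enter the regression here, just as only 15 time means enter the between time effect model of the airline data.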
6. Two-way Fixed Effect Models

A two-way fixed effect model explores fixed effects of two group variables, two time variables, or one group and one time variable. This chapter investigates fixed group and time effects. This model thus needs two sets of group and time dummy variables (i.e., airline and year).

6.1 Strategies of the Least Squares Dummy Variable Models

You may combine LSDV1, LSDV2, and LSDV3 to avoid perfect multicollinearity or the dummy variable trap in a two-way fixed effect model. There are five strategies when combining three LSDVs. The first strategy of dropping two dummies is generally recommended because of its convenience of model estimation and interpretation. Since .cnsreg does not allow suppressing the intercept, strategy 4 does not work in Stata.

1. Drop one cross-section dummy and one time-series dummy.
2. Drop one cross-section dummy and suppress the intercept. Alternatively, drop one time dummy and suppress the intercept.
3. Drop one cross-section dummy and impose a restriction on the time-series dummy parameters: $\sum \tau_t = 0$. Alternatively, drop one time-series dummy and impose a restriction on the cross-section dummy parameters: $\sum \mu_i = 0$.
4. Suppress the intercept and impose a restriction on the cross-section dummy parameters: $\sum \mu_i = 0$. Alternatively, suppress the intercept and impose a restriction on the time-series dummy parameters: $\sum \tau_t = 0$.
5. Include all dummy variables and impose two restrictions on the cross-section and time-series dummy parameters: $\sum \mu_i = 0$ and $\sum \tau_t = 0$.

Each strategy produces different dummy coefficients but returns exactly the same parameter estimates of regressors. In general, dummy coefficients are not of primary interest in panel data models.

6.2 LSDV1 without Two Dummies

The first strategy excludes two dummy variables, one dummy from each set of dummy variables. Let us exclude g6 for the sixth airline and t15 for the last time period.

. regress cost g1-g5 t1-t14 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 22,    67) = 1960.82
       Model |  113.864044    22  5.17563838           Prob > F      =  0.0000
    Residual |  .176848775    67  .002639534           R-squared     =  0.9984
-------------+------------------------------           Adj R-squared =  0.9979
       Total |  114.040893    89  1.28135835           Root MSE      =  .05138

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499

          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.94004   2.218231     5.83   0.000     8.512434    17.36765
------------------------------------------------------------------------------

In SAS, run the following script to get the same result.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t14 output fuel load;
RUN;

                      The REG Procedure
                        Model: MODEL1
                  Dependent Variable: cost

          Number of Observations Read          90
          Number of Observations Used          90

                      Analysis of Variance

                             Sum of        Mean
Source              DF      Squares      Square    F Value    Pr > F
Model               22    113.86404     5.17564    1960.82    <.0001
Error               67      0.17685     0.00264
Corrected Total     89    114.04089

          Root MSE            0.05138    R-Square    0.9984
          Dependent Mean     13.36561    Adj R-Sq    0.9979
          Coeff Var           0.38439

                      Parameter Estimates

                   Parameter    Standard
Variable    DF      Estimate       Error   t Value   Pr > |t|
Intercept    1      12.94004     2.21823      5.83     <.0001
g1           1       0.17428     0.08612      2.02     0.0470
g2           1       0.11145     0.07796      1.43     0.1575
g3           1      -0.14351     0.05189     -2.77     0.0073

g4           1       0.18021     0.03214      5.61     <.0001
g5           1      -0.04669     0.02247     -2.08     0.0415
t1           1      -0.69314     0.33784     -2.05     0.0441
t2           1      -0.63844     0.33208     -1.92     0.0588
t3           1      -0.59580     0.32945     -1.81     0.0750
t4           1      -0.54215     0.31891     -1.70     0.0938
t5           1      -0.47304     0.23195     -2.04     0.0454
t6           1      -0.42720     0.18844     -2.27     0.0266
t7           1      -0.39598     0.17330     -2.28     0.0255
t8           1      -0.33985     0.15011     -2.26     0.0268
t9           1      -0.27189     0.13482     -2.02     0.0477
t10          1      -0.22739     0.07635     -2.98     0.0040
t11          1      -0.11180     0.03190     -3.50     0.0008
t12          1      -0.03364     0.04290     -0.78     0.4357
t13          1      -0.01773     0.03626     -0.49     0.6263
t14          1      -0.01865     0.03051     -0.61     0.5432
output       1       0.81725     0.03185     25.66     <.0001
fuel         1       0.16861     0.16348      1.03     0.3061
load         1      -0.88281     0.26174     -3.37     0.0012

In LIMDEP, the following command fits the same model (output is skipped).

REGRESS;Lhs=COST;
   Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,
   OUTPUT,FUEL,LOAD$

6.3 LSDV1 + LSDV2: Drop a Dummy and Suppress the Intercept

The second strategy combines LSDV1 and LSDV2 to drop a dummy and suppress the intercept. Let us drop a dummy g6 and suppress the intercept. Keep in mind that SSE is still correct but F and R2 are not.

. regress cost g1-g5 t1-t15 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 23,    67) =       .
       Model |  16191.4201    23  703.974786           Prob > F      =  0.0000
    Residual |  .176848775    67  .002639534           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .05138

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499
          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |    12.2469   1.885399     6.50   0.000      8.48363    16.01018
          t2 |    12.3016   1.891045     6.51   0.000     8.527062    16.07615
          t3 |   12.34424    1.89341     6.52   0.000     8.564976     16.1235
          t4 |   12.39789   1.903395     6.51   0.000     8.598694    16.19708
          t5 |     12.467   1.991503     6.26   0.000     8.491942    16.44206
          t6 |   12.51284   2.035334     6.15   0.000     8.450294    16.57538
          t7 |   12.54406    2.05038     6.12   0.000     8.451487    16.63664
          t8 |   12.60019   2.073782     6.08   0.000     8.460909    16.73948
          t9 |   12.66815   2.090527     6.06   0.000     8.495438    16.84086
         t10 |   12.71266   2.151893     5.91   0.000     8.417458    17.00785
         t11 |   12.82824   2.221401     5.77   0.000     8.394303    17.26217
         t12 |    12.9064   2.247972     5.74   0.000      8.41943    17.39337
         t13 |   12.92231   2.237999     5.77   0.000     8.455241    17.38937
         t14 |    12.9214   2.224893     5.81   0.000     8.480492     17.3623

974786 Residual | .0100769 t6 | -.7418804 -.230546 5.04 0.626 -.27 0.223638 5.6394596 -.FUEL.T1.94004 2.26 0.50232 g3 | 12.8172487 .3320802 -1.027 -.T4.075 -1.2319459 -2.000 .0243983 t3 | -.0500762 t8 | -.T15.T13.8808235 fuel | .0000 1.94004 2.600665 17.4949135 load | -.163478 1.18844 -2.512434 17.LOAD$ (output is skippted) REGRESS.306 -.7536739 .229864 5. execute the following script that has /NOINT to suppress the intercept. PROC REG DATA=masil.3959783 .0186451 .222204 5. OUTPUT.36765 t1 | -.030508 -0.36765 output | .512434 17. RUN.2617373 -3.027 -.81 0.28 0.218231 5.03 0.543 -.033641 .344341 17.4949135 load | -.0546315 t14 | -. you may drop one of time dummies and suppress the intercept.5409901 -.000 .0795393 .85 0.001 -1.16861 .T5.6931382 .66 0.T9.4201 23 703. t P>|t| [95% Conf.000 8. noc Source | SS df MS -------------+-----------------------------Model | 16191.79653 2.253383 .37978 -.0510764 t7 | -.178708 .T14. (output is skippted) In LIMDEP.2617373 -3. Interval] -------------+---------------------------------------------------------------g1 | 13.301271 .12025 2.3603843 ------------------------------------------------------------------------------ In SAS.0763495 -2.T10.0027964 t10 | -.040233 t9 | -.1118032 .T3.218231 5.2718933 . http://www.8828142 .37 0.176848775 67 .3294473 -1. .1348175 -2.000 8.042249 output | .56453 g2 | 13.001 -1.03 0.89335 2.175477 -.000 8.1501062 -2.4272042 .001 -.98 0.edu/~statmath 55 .094 -1.T11.0749914 t11 | -.405244 -.88 0. Std.0519893 t13 | -.50 0. ONE should be taken out to suppress the intercept.indiana.1576935 .05138 -----------------------------------------------------------------------------cost | Coef.80 0.32888 g6 | 12.airline.T8.8033319 -.11432 2. MODEL cost = g1-g5 t1-t15 output fuel load /NOINT.49 0.906633 Number of obs F( 23.92 0. regress cost g1-g6 t1-t14 output fuel load.05149 2.000 8.0188098 t2 | -. 
67) Prob > F R-squared Adj R-squared Root MSE = = = = = = 90 .8808235 fuel | .367467 -.5969 90 179.70 0.G4.1576935 .031851 25.059 -1.3603843 ------------------------------------------------------------------------------ Alternatively.66412 17.Lhs=COST.90 0.T6.G5.78 0.4730429 .405244 -.229552 5. Rhs=G1.74 0.025 -.1732969 -2.2273857 .G2.163478 1.Lhs=COST.002639534 -------------+-----------------------------Total | 16191.T2.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 55 t15 | 12. Err.T7.T12.16861 .1192713 .5421537 .0617764 t4 | -.0177346 .0944011 t5 | -.68185 17.55865 g5 | 12. 0.0481295 t12 | -.37 0.000 8.031851 25.0319005 -3.045 -. REGRESS.048 -.24872 g4 | 13.83 0. MODEL cost = g1-g6 t1-t14 output fuel load /NOINT.8172487 .5958031 .000 8.61 0.436 -.05 0.306 -. The dummy coefficients are different from those above but parameter estimates of regressors remained unchanged.83 0.0901007 .0362554 -0.45781 17.044 -1.9360088 -.000 8.3398463 .0000 1.3189139 -1.3378385 -2.004 -.0000 .0429008 -0.7536739 .6384366 .8828142 .66 0.02 0.G3.

   Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,
   OUTPUT,FUEL,LOAD$

+----------------------------------------------------+
| Ordinary    least squares regression               |
| Model was estimated Aug 30, 2009 at 03:58:13PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1768479     |
|              Standard error of e  =   .5137627E-01 |
| Fit          R-squared            =   .9984493     |
|              Adjusted R-squared   =   .9979401     |
| Model test   F[ 22,    67] (prob) =1960.83 (.0000) |
| Diagnostic   Log likelihood       =   152.7479     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 22]  (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.709580     |
|              Akaike Info. Criter. =  -5.721164     |
| Autocorrel   Durbin-Watson Stat.  =   .6035047     |
|              Rho = cor[e,e(-1)]   =   .6982476     |
| Not using OLS or no constant. Rsqd & F may be < 0. |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 G1      |   13.1139819      2.22955625     5.882   .0000    .16666667
 G2      |   13.0511515      2.22986828     5.853   .0000    .16666667
 G3      |   12.7961914      2.23055043     5.737   .0000    .16666667
 G4      |   13.1199153      2.22364115     5.900   .0000    .16666667
 G5      |   12.8930131      2.22220692     5.802   .0000    .16666667
 G6      |   12.9397087      2.21823375     5.833   .0000    .16666667
 T1      |   -.69308729       .33783938    -2.052   .0441    .06666667
 T2      |   -.63838795       .33208126    -1.922   .0588    .06666667
 T3      |   -.59575348       .32944797    -1.808   .0750    .06666667
 T4      |   -.54210773       .31891465    -1.700   .0938    .06666667
 T5      |   -.47300784       .23194606    -2.039   .0454    .06666667
 T6      |   -.42717813       .18844068    -2.267   .0266    .06666667
 T7      |   -.39595152       .17329717    -2.285   .0255    .06666667
 T8      |   -.33982426       .15010661    -2.264   .0268    .06666667
 T9      |   -.27187359       .13481769    -2.017   .0477    .06666667
 T10     |   -.22737840       .07634935    -2.978   .0040    .06666667
 T11     |   -.11180525       .03190046    -3.505   .0008    .06666667
 T12     |   -.03364915       .04290088     -.784   .4356    .06666667
 T13     |   -.01774030       .03625541     -.489   .6262    .06666667
 T14     |   -.01864714       .03050793     -.611   .5431    .06666667
 OUTPUT  |    .81725242       .03185102    25.659   .0000  -1.17430918
 FUEL    |    .16863516       .16347826     1.032   .3060   12.7703592
 LOAD    |   -.88281516       .26173663    -3.373   .0012    .56046016

Notice that LIMDEP reports correct F (1960.83) and R2 (.9984).

6.4 LSDV1 + LSDV3: Drop a Dummy and Impose a Restriction

The third strategy excludes one dummy from a set of dummy variables and imposes a restriction on another set of dummy parameters. Let us drop a time dummy here and then impose a restriction on group dummy parameters.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 t1-t14 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

                      The REG Procedure

                        Model: MODEL1
                  Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

          Number of Observations Read          90
          Number of Observations Used          90

                      Analysis of Variance

                             Sum of        Mean
Source              DF      Squares      Square    F Value    Pr > F
Model               22    113.86404     5.17564    1960.82    <.0001
Error               67      0.17685     0.00264
Corrected Total     89    114.04089

          Root MSE            0.05138    R-Square    0.9984
          Dependent Mean     13.36561    Adj R-Sq    0.9979
          Coeff Var           0.38439

                      Parameter Estimates

                   Parameter    Standard
Variable    DF      Estimate       Error   t Value   Pr > |t|
Intercept    1      12.98600     2.22540      5.84     <.0001
g1           1       0.12833     0.04601      2.79     0.0069
g2           1       0.06549     0.03897      1.68     0.0975
g3           1      -0.18947     0.01561    -12.14     <.0001
g4           1       0.13425     0.01832      7.33     <.0001
g5           1      -0.09265     0.03731     -2.48     0.0155
g6           1      -0.04596     0.04161     -1.10     0.2733
t1           1      -0.69314     0.33784     -2.05     0.0441
t2           1      -0.63844     0.33208     -1.92     0.0588
t3           1      -0.59580     0.32945     -1.81     0.0750
t4           1      -0.54215     0.31891     -1.70     0.0938
t5           1      -0.47304     0.23195     -2.04     0.0454
t6           1      -0.42720     0.18844     -2.27     0.0266
t7           1      -0.39598     0.17330     -2.28     0.0255
t8           1      -0.33985     0.15011     -2.26     0.0268
t9           1      -0.27189     0.13482     -2.02     0.0477
t10          1      -0.22739     0.07635     -2.98     0.0040
t11          1      -0.11180     0.03190     -3.50     0.0008
t12          1      -0.03364     0.04290     -0.78     0.4357
t13          1      -0.01773     0.03626     -0.49     0.6263
t14          1      -0.01865     0.03051     -0.61     0.5432
output       1       0.81725     0.03185     25.66     <.0001
fuel         1       0.16861     0.16348      1.03     0.3061
load         1      -0.88281     0.26174     -3.37     0.0012
RESTRICT    -1    3.9387E-16           .         .          .

* Probability computed using beta distribution.
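The restricted estimates are a reparameterization of the LSDV1 estimates in Section 6.2: each group coefficient is shifted by the average of the six group effects (with the dropped g6 counted as zero), and the intercept absorbs that average. The sketch below redoes the arithmetic in Python with the LSDV1 figures reported by Stata; the variable names are illustrative only.

```python
# LSDV1 estimates from Section 6.2 (g6 is the dropped dummy, so its effect is 0).
lsdv1_intercept = 12.94004
lsdv1_group = {"g1": 0.1742825, "g2": 0.1114508, "g3": -0.143511,
               "g4": 0.1802087, "g5": -0.0466942, "g6": 0.0}

# Average group effect relative to the baseline airline.
shift = sum(lsdv1_group.values()) / len(lsdv1_group)

# Coefficients under the restriction g1 + ... + g6 = 0.
restricted_group = {name: coef - shift for name, coef in lsdv1_group.items()}
restricted_intercept = lsdv1_intercept + shift

# The intercept of each airline (LSDV2 with g1-g6 and no constant) is the
# LSDV1 intercept plus the LSDV1 group coefficient.
airline_intercepts = {name: lsdv1_intercept + coef
                      for name, coef in lsdv1_group.items()}

for name, coef in restricted_group.items():
    print(name, round(coef, 5))
print("intercept", round(restricted_intercept, 4))
```

The restricted intercept is therefore the average of the six airline intercepts, while the time dummy coefficients and the coefficients of output, fuel, and load are unaffected.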

In Stata, you need to run the .cnsreg command with a constraint on the group dummy parameters. .cnsreg with the constraint(1) option fits OLS under constraint 1 defined in .constraint.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 t1-t14 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F( 22,    67) = 1960.83
                                                       Prob > F      =  0.0000
                                                       Root MSE      =   .0514

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1283264   .0460126     2.79   0.007     .0364849    .2201679
          g2 |   .0654947   .0389685     1.68   0.097    -.0122867    .1432761
          g3 |  -.1894671   .0156096   -12.14   0.000     -.220624   -.1583102
          g4 |   .1342526   .0183163     7.33   0.000      .097693    .1708121
          g5 |  -.0926504   .0373085    -2.48   0.016    -.1671184   -.0181824
          g6 |  -.0459561   .0416069    -1.10   0.273    -.1290038    .0370916
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |     12.986   2.225402     5.84   0.000     8.544076    17.42792
------------------------------------------------------------------------------

In LIMDEP, run a Regress$ command with the Cls: subcommand. b(2) in the subcommand indicates the second parameter estimate listed in the Rhs= subcommand. Therefore, LIMDEP fits the LSDV1 under the constraint that the sum of all group dummy parameters, b(2) for g1 through b(7) for g6, is zero.

REGRESS;Lhs=COST;
   Rhs=ONE,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,
   OUTPUT,FUEL,LOAD;
   Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary    least squares regression               |
| Model was estimated Aug 30, 2009 at 04:24:35PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1768479     |
|              Standard error of e  =   .5137627E-01 |
| Fit          R-squared            =   .9984493     |
|              Adjusted R-squared   =   .9979401     |
| Model test   F[ 22,    67] (prob) =1960.83 (.0000) |

| Diagnostic   Log likelihood       =   152.7479     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 22]  (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.709580     |
|              Akaike Info. Criter. =  -5.721164     |
| Autocorrel   Durbin-Watson Stat.  =   .6035047     |
|              Rho = cor[e,e(-1)]   =   .6982476     |
| Restrictns.  F[  1,    66] (prob) =    .00 (*****) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 Constant|   12.9856603      2.22540616     5.835   .0000
 G1      |    .12832155       .04601257     2.789   .0069    .16666667
 G2      |    .06549116       .03896849     1.681   .0976    .16666667
 G3      |   -.18946893       .01560965   -12.138   .0000    .16666667
 G4      |    .13425504       .01831636     7.330   .0000    .16666667
 G5      |   -.09264719       .03730846    -2.483   .0156    .16666667
 G6      |   -.04595164       .04160692    -1.104   .2734    .16666667
 T1      |   -.69308729       .33783938    -2.052   .0442    .06666667
 T2      |   -.63838795       .33208126    -1.922   .0589    .06666667
 T3      |   -.59575348       .32944797    -1.808   .0751    .06666667
 T4      |   -.54210773       .31891465    -1.700   .0939    .06666667
 T5      |   -.47300784       .23194606    -2.039   .0454    .06666667
 T6      |   -.42717813       .18844068    -2.267   .0267    .06666667
 T7      |   -.39595152       .17329717    -2.285   .0255    .06666667
 T8      |   -.33982426       .15010661    -2.264   .0269    .06666667
 T9      |   -.27187359       .13481769    -2.017   .0478    .06666667
 T10     |   -.22737840       .07634935    -2.978   .0041    .06666667
 T11     |   -.11180525       .03190046    -3.505   .0008    .06666667
 T12     |   -.03364915       .04290088     -.784   .4356    .06666667
 T13     |   -.01774030       .03625541     -.489   .6262    .06666667
 T14     |   -.01864714       .03050793     -.611   .5432    .06666667
 OUTPUT  |    .81725242       .03185102    25.659   .0000  -1.17430918
 FUEL    |    .16863516       .16347826     1.032   .3061   12.7703592
 LOAD    |   -.88281516       .26173663    -3.373   .0012    .56046016

Alternatively, you may drop one group dummy and impose a restriction on the time dummy parameters. The output is skipped.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t15 output fuel load;
   RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0;
RUN;

. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost g1-g5 t1-t15 output fuel load, constraint(3)

In LIMDEP, b(7) indicates the seventh parameter estimate for t1.

REGRESS;Lhs=COST;
   Rhs=ONE,G1,G2,G3,G4,G5,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,
   OUTPUT,FUEL,LOAD;
   Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$

6.5 LSDV2 + LSDV3: Suppress the Intercept and Impose a Restriction

The strategy of LSDV2 + LSDV3 includes both sets of dummy variables and instead suppresses the intercept and imposes a restriction. Stata does not support this approach. The following procedure has a constraint on the group dummy parameters. Since the intercept is suppressed, F (266704) and R2 are incorrect.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;

   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

NOTE: No intercept in model. R-Square is redefined.

                          Analysis of Variance
                              Sum of         Mean
Source               DF      Squares       Square    F Value    Pr > F
Model                23        16191    703.97479     266704    <.0001
Error                67      0.17685      0.00264
Uncorrected Total    90        16192

Root MSE            0.05138    R-Square    1.0000
Dependent Mean     13.36561    Adj R-Sq    1.0000
Coeff Var           0.38439

                       Parameter Estimates
                   Parameter     Standard
Variable    DF      Estimate        Error    t Value    Pr > |t|
g1           1       0.12833      0.04601       2.79      0.0069
g2           1       0.06549      0.03897       1.68      0.0975
g3           1      -0.18947      0.01561     -12.14      <.0001
g4           1       0.13425      0.01832       7.33      <.0001
g5           1      -0.09265      0.03731      -2.48      0.0155
g6           1      -0.04596      0.04161      -1.10      0.2733
t1           1      12.29286      1.89169       6.50      <.0001
t2           1      12.34756      1.89736       6.51      <.0001
t3           1      12.39019      1.89982       6.52      <.0001
t4           1      12.44384      1.90989       6.52      <.0001
t5           1      12.51295      1.99808       6.26      <.0001
t6           1      12.55879      2.04195       6.15      <.0001
t7           1      12.59002      2.05706       6.12      <.0001
t8           1      12.64615      2.08052       6.08      <.0001
t9           1      12.71410      2.09734       6.06      <.0001
t10          1      12.75861      2.15883       5.91      <.0001
t11          1      12.87419      2.22838       5.78      <.0001
t12          1      12.95236      2.25499       5.74      <.0001
t13          1      12.96826      2.24505       5.78      <.0001
t14          1      12.96735      2.23202       5.81      <.0001
t15          1      12.98600      2.22540       5.84      <.0001

output       1       0.81725      0.03185      25.66      <.0001
fuel         1       0.16861      0.16348       1.03      0.3061
load         1      -0.88281      0.26174      -3.37      0.0012
[RESTRICT row: DF -1, Pr > |t| = 1.0000*; the estimate and standard error are numerically zero and not legible in the source.]
* Probability computed using beta distribution.

You may impose an alternative restriction on the time variable to obtain the equivalent result despite different dummy coefficients. The output is skipped.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 t1-t15 output fuel load /NOINT;
   RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

In LIMDEP, the following commands are supposed to work, but they return different parameter estimates and goodness-of-fit measures, probably due to its estimation method.

REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
   Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$
(output is skipped)

REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
   Cls:b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| Model was estimated Aug 30, 2009 at 04:47:10PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1790783     |
|              Standard error of e  =   .5169924E-01 |
| Fit          R-squared            =   .9984297     |
|              Adjusted R-squared   =   .9979141     |
| Model test   F[ 22,    67] (prob) =1936.37 (.0000) |
| Diagnostic   Log likelihood       =   152.1839     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 22]  (prob) = 581.08 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.697046     |
|              Akaike Info. Criter. =  -5.708630     |
| Autocorrel   Durbin-Watson Stat.  =   .6164424     |
|              Rho = cor[e,e(-1)]   =   .6917788     |
| Restrictns.  F[  1,    66] (prob) =    .68 (.4113) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
[LIMDEP coefficient table, rows G1 through T3: the group rows are flagged "(Fixed Parameter)" with very large standard errors; the individual values are not legible in the source.]

[LIMDEP coefficient table continued, rows T4 through T15, OUTPUT, FUEL, and LOAD: the individual values are not legible in the source.]

6.6 LSDV3 with Two Restrictions

The last strategy includes all group and time dummies and then imposes two restrictions, one on the group dummy parameters and one on the time dummy parameters. Pay attention to the two RESTRICT statements in the following PROC REG.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 t1-t15 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
   RESTRICT t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0;
RUN;

The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                          Analysis of Variance
                              Sum of         Mean
Source               DF      Squares       Square    F Value    Pr > F
Model                22    113.86404      5.17564    1960.82    <.0001
Error                67      0.17685      0.00264
Corrected Total      89    114.04089

Root MSE            0.05138    R-Square    0.9984
Dependent Mean     13.36561    Adj R-Sq    0.9979
Coeff Var           0.38439

                       Parameter Estimates
                   Parameter     Standard
Variable    DF      Estimate        Error    t Value    Pr > |t|
Intercept    1      12.66688      2.08107       6.09      <.0001
g1           1       0.12833      0.04601       2.79      0.0069

g2           1       0.06549      0.03897       1.68      0.0975
g3           1      -0.18947      0.01561     -12.14      <.0001
g4           1       0.13425      0.01832       7.33      <.0001
g5           1      -0.09265      0.03731      -2.48      0.0155
g6           1      -0.04596      0.04161      -1.10      0.2733
t1           1      -0.37402      0.19187      -1.95      0.0554
t2           1      -0.31932      0.18609      -1.72      0.0908
t3           1      -0.27669      0.18335      -1.51      0.1360
t4           1      -0.22304      0.17297      -1.29      0.2017
t5           1      -0.15393      0.08644      -1.78      0.0795
t6           1      -0.10809      0.04486      -2.41      0.0187
t7           1      -0.07686      0.03193      -2.41      0.0188
t8           1      -0.02073      0.02045      -1.01      0.3143
t9           1       0.04722      0.02908       1.62      0.1091
t10          1       0.09173      0.08115       1.13      0.2624
t11          1       0.20731      0.14914       1.39      0.1691
t12          1       0.28547      0.17564       1.63      0.1088
t13          1       0.30138      0.16603       1.82      0.0740
t14          1       0.30047      0.15362       1.96      0.0546
t15          1       0.31911      0.14749       2.16      0.0341
output       1       0.81725      0.03185      25.66      <.0001
fuel         1       0.16861      0.16348       1.03      0.3061
load         1      -0.88281      0.26174      -3.37      0.0012
[Two RESTRICT rows: DF -1 each, Pr > |t| marked with *; the estimates are numerically zero (on the order of 1E-16) and not legible in the source.]
* Probability computed using beta distribution.

In Stata, execute the following command to get the same result. Notice that constraints 1 and 3 were defined above.

. cnsreg cost g1-g6 t1-t15 output fuel load, constraint(1 3)

Constrained linear regression                          Number of obs =      90
                                                       F( 22,    67) = 1960.82
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  0.0514
 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
 ( 2)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1283264   .0460126     2.79   0.007     .0364849    .2201679
          g2 |   .0654947   .0389685     1.68   0.097    -.0122867    .1432761
          g3 |  -.1894671   .0156096   -12.14   0.000     -.220624   -.1583102
          g4 |   .1342526   .0183163     7.33   0.000      .097693    .1708121
          g5 |  -.0926504   .0373085    -2.48   0.016    -.1671184   -.0181824
          g6 |  -.0459561   .0416069    -1.10   0.273    -.1290038    .0370916
          t1 |  -.3740245    .191872    -1.95   0.055    -.7570026    .0089536
          t2 |  -.3193228   .1860877    -1.72   0.091    -.6907554    .0521097
          t3 |  -.2766893   .1833501    -1.51   0.136    -.6426576    .0892789
          t4 |  -.2230399   .1729671    -1.29   0.202    -.5682837    .1222038
          t5 |  -.1539291   .0864404    -1.78   0.079    -.3264649    .0186066
          t6 |  -.1080904   .0448591    -2.41   0.019    -.1976296   -.0185513
          t7 |  -.0768646   .0319336    -2.41   0.019    -.1406043   -.0131248
          t8 |  -.0207326   .0204506    -1.01   0.314     -.061552    .0200869
          t9 |   .0472205   .0290822     1.62   0.109    -.0108278    .1052688
         t10 |   .0917281   .0811525     1.13   0.262    -.0702531    .2537092
         t11 |   .2073105   .1491443     1.39   0.169    -.0903829    .5050039
         t12 |   .2854727   .1756365     1.63   0.109    -.0650993    .6360447
         t13 |   .3013791   .1660294     1.82   0.074     -.030017    .6327752
         t14 |   .3004686   .1536212     1.96   0.055    -.0061606    .6070978

         t15 |   .3191137   .1474883     2.16   0.034     .0247259    .6135015
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.66688   2.081068     6.09   0.000     8.513054    16.82071
------------------------------------------------------------------------------

In LIMDEP, the following command returns the same result (output is skipped). Notice that the two restrictions in Cls: are separated by a comma.

REGRESS;Lhs=COST;Rhs=One,G1,G2,G3,G4,G5,G6,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
   Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0,
       b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)+b(17)+b(18)+b(19)+b(20)+b(21)+b(22)=0$

6.7 Two-way Within Effect Model

The two-way fixed effect model requires a transformation of the dependent and independent variables using group means, time means, and overall means: $y_{it}^* = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot}$ and $x_{it}^* = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}$. In the commands below, the gm_, tm_, and m_ prefixes denote group means, time means, and overall means, respectively.

. gen w_cost = cost - gm_cost - tm_cost + m_cost
. gen w_output = output - gm_output - tm_output + m_output
. gen w_fuel = fuel - gm_fuel - tm_fuel + m_fuel
. gen w_load = load - gm_load - tm_load + m_load

Once data are transformed, run the OLS with the transformed variables. Do not forget to suppress the intercept.

. regress w_cost w_output w_fuel w_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) =  307.86
       Model |  1.87739643     3  .625798811           Prob > F      =  0.0000
    Residual |  .176848774    87  .002032745           R-squared     =  0.9139
-------------+------------------------------           Adj R-squared =  0.9109
       Total |  2.05424521    90  .022824947           Root MSE      =  .04509
------------------------------------------------------------------------------
      w_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    w_output |   .8172487   .0279512    29.24   0.000     .7616927    .8728048
      w_fuel |     .16861   .1434621     1.18   0.243    -.1165364    .4537565
      w_load |  -.8828142   .2296907    -3.84   0.000    -1.339349    -.426279
------------------------------------------------------------------------------

Remember that F, standard errors, R2, and DFerror are not correct. Standard errors need to be adjusted for the degrees of freedom; for instance, the standard error of the load factor is .2617=.2297*sqrt(87/67).

The dummy variable coefficients are computed as $d_i^* = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) - (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})'\beta$ and $d_t^* = (\bar{y}_{\cdot t} - \bar{y}_{\cdot\cdot}) - (\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot})'\beta$. We need to compute the overall means and the group specific means of, say, airline 3.

. sum cost output fuel load

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cost |        90    13.36561    1.131971   11.14154    15.3733
      output |        90   -1.174309    1.150606  -3.278573   .6608616
        fuel |        90    12.77036    .8123749   11.55017     13.831
        load |        90    .5604602    .0527934    .432066    .676287
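The standard-error adjustment and the dummy-coefficient formulas in this section can be checked with a few lines of arithmetic. The following Python sketch is not part of the original SAS, Stata, or LIMDEP workflow; it only re-computes figures quoted in this section. The variable and function names are illustrative, and the means for airline 3 and time period 9 are the rounded values used in the hand computations of this section.

```python
import math

# Degrees-of-freedom correction for the within-regression standard error.
# The OLS on transformed data uses df = 87; the correct LSDV df is 67.
se_within_load = 0.2296907
se_adjusted = se_within_load * math.sqrt(87 / 67)

# Two-way fixed effect slopes and (rounded) means quoted in this section.
beta = {"output": 0.8172, "fuel": 0.1686, "load": -0.8828}
overall = {"cost": 13.3656, "output": -1.1743, "fuel": 12.7704, "load": 0.5605}
airline3 = {"cost": 13.3723, "output": -0.9123, "fuel": 12.7897, "load": 0.5845}
period9 = {"cost": 13.4651, "output": -1.0670, "fuel": 12.8610, "load": 0.6179}


def dummy_coefficient(unit_means, overall_means, slopes):
    """Return (ybar_unit - ybar) - (xbar_unit - xbar)' beta."""
    effect = unit_means["cost"] - overall_means["cost"]
    for name, b in slopes.items():
        effect -= (unit_means[name] - overall_means[name]) * b
    return effect


d_airline3 = dummy_coefficient(airline3, overall, beta)
d_period9 = dummy_coefficient(period9, overall, beta)

print(round(se_adjusted, 4), round(d_airline3, 4), round(d_period9, 4))
# prints: 0.2617 -0.1895 0.0472
```

The two dummy coefficients agree with the g3 and t9 estimates of the LSDV3 with two restrictions in Section 6.6 (-.1895 and .0472), which is the cross-check the text recommends.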

. sum cost output fuel load if airline==3

    Variable |       Obs        Mean
-------------+----------------------
        cost |        15    13.37231
      output |        15   -.9122625
        fuel |        15    12.78972
        load |        15    .5845359
(Only the means, which enter the computation, are shown; the Std. Dev., Min, and Max columns are not fully legible in the source.)

The actual (absolute) intercept of airline 3 is -.1895 = (13.3723-13.3656) - (-.9123-(-1.1743))*(.8172) - (12.7897-12.7704)*(.1686) - (.5845-.5605)*(-.8828).

. sum cost output fuel load if year==9

    Variable |       Obs        Mean
-------------+----------------------
        cost |         6     13.4651
      output |         6   -1.067003
        fuel |         6    12.86104
        load |         6    .6179098

The actual intercept of time period 9 is .0472 = (13.4651-13.3656) - (-1.0670-(-1.1743))*(.8172) - (12.8610-12.7704)*(.1686) - (.6179-.5605)*(-.8828). See the SAS output in Section 6.6 to cross-check the computation.

6.8 Using SAS: PROC TSCSREG and PROC PANEL

PROC TSCSREG and PROC PANEL have the /FIXTWO option to fit the two-way fixed effect model. The data set needs to be sorted by the group and time variables that will be declared in the ID statement in PROC PANEL.

PROC SORT DATA=masil.airline;
   BY airline year;
RUN;

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /FIXTWO;
RUN;

The PANEL Procedure
Fixed Two Way Estimates
Dependent Variable: cost

Model Description
Estimation Method              FixTwo
Number of Cross Sections            6
Time Series Length                 15

Fit Statistics
SSE          0.1768    DFE              67
MSE          0.0026    Root MSE     0.0514
R-Square     0.9984

F Test for No Fixed Effects
Num DF    Den DF    F Value    Pr > F
    19        67      23.10    <.0001

                              Parameter Estimates
                                  Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|    Label
CS1           1     0.174283        0.0861       2.02      0.0470    Cross Sectional Effect 1
CS2           1     0.111451        0.0780       1.43      0.1575    Cross Sectional Effect 2
CS3           1     -0.14351        0.0519      -2.77      0.0073    Cross Sectional Effect 3
CS4           1     0.180209        0.0321       5.61      <.0001    Cross Sectional Effect 4
CS5           1     -0.04669        0.0225      -2.08      0.0415    Cross Sectional Effect 5
TS1           1     -0.69314        0.3378      -2.05      0.0441    Time Series Effect 1
TS2           1     -0.63844        0.3321      -1.92      0.0588    Time Series Effect 2
TS3           1      -0.5958        0.3294      -1.81      0.0750    Time Series Effect 3
TS4           1     -0.54215        0.3189      -1.70      0.0938    Time Series Effect 4
TS5           1     -0.47304        0.2319      -2.04      0.0454    Time Series Effect 5
TS6           1      -0.4272        0.1884      -2.27      0.0266    Time Series Effect 6
TS7           1     -0.39598        0.1733      -2.28      0.0255    Time Series Effect 7
TS8           1     -0.33985        0.1501      -2.26      0.0268    Time Series Effect 8
TS9           1     -0.27189        0.1348      -2.02      0.0477    Time Series Effect 9
TS10          1     -0.22739        0.0763      -2.98      0.0040    Time Series Effect 10
TS11          1      -0.1118        0.0319      -3.50      0.0008    Time Series Effect 11
TS12          1     -0.03364        0.0429      -0.78      0.4357    Time Series Effect 12
TS13          1     -0.01773        0.0363      -0.49      0.6263    Time Series Effect 13
TS14          1     -0.01865        0.0305      -0.61      0.5432    Time Series Effect 14
Intercept     1     12.94004        2.2182       5.83      <.0001    Intercept
output        1     0.817249        0.0319      25.66      <.0001
fuel          1      0.16861        0.1635       1.03      0.3061
load          1     -0.88281        0.2617      -3.37      0.0012

6.9 Using Stata and LIMDEP

The Stata .xtreg command does not have an option for two-way fixed or two-way random effect models. However, this command is able to fit the two-way fixed effect model by including a set of dummies for one dimension (here the time dummies t1-t14, as in LSDV1) and using the fe option for the other. It reports the incorrect intercept in the two-way fixed model, however: _cons below is 12.986, whereas the LSDV3 with two restrictions in Section 6.6 gives 12.667 (2.081).

. xtreg cost t1-t14 output fuel load, fe i(airline)

Fixed-effects (within) regression               Number of obs      =        90
Group variable: airline                         Number of groups   =         6

R-sq:  within  = 0.9955                         Obs per group: min =        15
       between = 0.9859                                        avg =      15.0
       overall = 0.9885                                        max =        15

                                                F(17,67)           =    873.25
corr(u_i, Xb)  = 0.3361                         Prob > F           =    0.0000
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |     12.986   2.225402     5.84   0.000     8.544076    17.42792
-------------+----------------------------------------------------------------
     sigma_u |   .1306712
     sigma_e |  .05137639
         rho |  .86611203   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5, 67) =    69.05               Prob > F = 0.0000

The F statistic of 69.05 tests only whether the parameters of g1 through g5 are all zero. You may double-check this test by running the following commands.

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1=g2=g3=g4=g5=0

 ( 1)  g1 - g2 = 0
 ( 2)  g1 - g3 = 0
 ( 3)  g1 - g4 = 0
 ( 4)  g1 - g5 = 0
 ( 5)  g1 = 0

       F(  5,    67) =   69.05
            Prob > F =    0.0000

The following LIMDEP command fits the two-way fixed model. This command has Str and Period to specify the stratification and time variables. It presents the pooled model and the one-way group effect model as well. The pooled OLS and fixed group effect parts of the output are skipped below since they are redundant.

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Fixed$

+----------------------------------------------------+
| Least Squares with Group and Period Effects        |
| Ordinary least squares regression                  |
| Model was estimated Aug 27, 2009 at 04:27:40PM     |
| LHS=COST     Mean                 =   13.36561     |
|              Standard deviation   =   1.131971     |
| WTS=none     Number of observs.   =         90     |
| Model size   Parameters           =         23     |
|              Degrees of freedom   =         67     |
| Residuals    Sum of squares       =   .1768479     |
|              Standard error of e  =   .5137627E-01 |
| Fit          R-squared            =   .9984493     |
|              Adjusted R-squared   =   .9979401     |
| Model test   F[ 22,    67] (prob) =1960.83 (.0000) |
| Diagnostic   Log likelihood       =   152.7479     |
|              Restricted(b=0)      =  -138.3581     |
|              Chi-sq [ 22]  (prob) = 582.21 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =  -5.709580     |
|              Akaike Info. Criter. =  -5.721164     |
| Estd. Autocorrelation of e(i,t)       .651825      |
+----------------------------------------------------+
+----------------------------------------------------+
| Panel:Groups   Empty      0,   Valid data      6   |
|                Smallest  15,   Largest        15   |
|                Average group size          15.00   |
| Panel: Prds:   Empty      0,   Valid data     15   |
|                Smallest   6,   Largest         6   |
|                Average group size           6.00   |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OUTPUT  |    .81725242       .03185102    25.659   .0000  -1.17430918
 FUEL    |    .16863516       .16347826     1.032   .3052   12.7703592
 LOAD    |   -.88281516       .26173663    -3.373   .0011   .56046016
 Constant|   12.6665675      2.08107166     6.087   .0000

+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|       Model               Log-Likelihood  Sum of Squares  R-squared |
|(1)  Constant term only       -138.35814  .1140409821D+03  .0000000 |
|(2)  Group effects only        -90.48804  .3936109461D+02  .6548513 |
|(3)  X - variables only         61.76991  .1335449522D+01  .9882897 |
|(4)  X and group effects       130.08647  .2926207777D+00  .9974341 |
|(5)  X ind.&time effects       152.74790  .1768479062D+00  .9984493 |
+--------------------------------------------------------------------+
|                        Hypothesis Tests                            |
|         Likelihood Ratio Test               F Tests                |
|         Chi-squared   d.f.  Prob.       F     num. denom.  P value |
|(2) vs (1)    95.740     5  .00000     31.875    5    84    .00000  |
|(3) vs (1)   400.256     3  .00000   2419.329    3    86    .00000  |
|(4) vs (1)   536.889     8  .00000   3935.818    8    81    .00000  |
|(4) vs (2)   441.149     3  .00000   3604.832    3    81    .00000  |
|(4) vs (3)   136.633     5  .00000     57.733    5    81    .00000  |
|(5) vs (4)    45.323    14  .00004      3.133   14    67    .00085  |
|(5) vs (3)   181.956    20  .00000     21.947   20    67    .00000  |
+--------------------------------------------------------------------+

6.10 Testing Two-way Fixed Effects

The null hypothesis is that the parameters of the group and time dummies are all zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$. The F test compares the pooled regression and the

two-way fixed group and time effect model.

$$F = \frac{(1.3354 - .1768)/(6 + 15 - 2)}{(.1768)/(6 \cdot 15 - 6 - 15 - 3 + 1)} = 23.1085 \sim F[19, 67]$$

The F statistic of 23.1085 rejects the null hypothesis at the .01 significance level (p<.0001).

The SAS TSCSREG and PANEL procedures conduct this F-test for the group and time effects. You may also run the following SAS REG procedure and Stata .regress command to perform the same test.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 t1-t14 output fuel load;
   TEST g1=g2=g3=g4=g5=t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;

Test 1 Results for Dependent Variable cost
                            Mean
Source             DF     Square    F Value    Pr > F
Numerator          19    0.06098      23.10    <.0001
Denominator        67    0.00264

. quietly regress cost g1-g5 t1-t14 output fuel load
. test g1 g2 g3 g4 g5 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14

The Stata output is skipped.
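The F statistic above, and the matching likelihood ratio statistic in the LIMDEP test table of Section 6.9, can be reproduced directly from the reported sums of squared errors and log likelihoods. The Python sketch below is only an arithmetic cross-check, not part of the original SAS, Stata, or LIMDEP code; the variable names are illustrative.

```python
# Sums of squared errors quoted in the text (rounded to four decimals).
sse_pooled = 1.3354   # pooled OLS (X variables only)
sse_twoway = 0.1768   # two-way fixed group and time effect model

n, T, k = 6, 15, 3    # groups, time periods, regressors (output, fuel, load)

df_num = n + T - 2                 # (n - 1) + (T - 1) restrictions = 19
df_den = n * T - n - T - k + 1     # error df of the two-way model = 67

f_stat = ((sse_pooled - sse_twoway) / df_num) / (sse_twoway / df_den)

# Likelihood ratio version, using the log likelihoods reported by LIMDEP
# for model (3) X variables only and model (5) X with group and time effects.
ll_pooled = 61.76991
ll_twoway = 152.74790
lr_stat = 2 * (ll_twoway - ll_pooled)

print(df_num, df_den, round(f_stat, 4), round(lr_stat, 3))
# prints: 19 67 23.1085 181.956
```

Note that LIMDEP lists 20 degrees of freedom for its "(5) vs (3)" comparison, so its F statistic (21.947) differs from the 19-restriction F test shown here.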

7. Random Effect Models

A random effect model examines how group and/or time affect error variances. This is the groupwise heteroscedastic regression model (Greene 2003). This model is appropriate for n individuals who were drawn randomly from a large population. This chapter focuses on the feasible generalized least squares (FGLS) with variance component estimation methods.[10]

[10] Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a modified Wallace and Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora method, and Henderson's method III. They also discuss maximum likelihood (ML) estimators, restricted ML estimators, minimum norm quadratic unbiased estimators (MINQUE), and minimum variance quadratic unbiased estimators (MIVQUE). Based on a Monte Carlo simulation, they argue that ANOVA estimators are Best Quadratic Unbiased estimators of the variance components for the balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are recommended for the unbalanced models.

7.1 One-way Random Group Effect Model

When the omega matrix is not known, you have to estimate $\theta$ using the SSEs of the between group effect model (.0317) and the fixed group effect model (.2926).

The variance component of error $\hat{\sigma}_v^2$ is .00361263 = .292622872/(6*15-6-3).

The variance component of group $\hat{\sigma}_u^2$ is .01559712 = .031675926/(6-4) - .00361263/15, that is, $\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \hat{\sigma}_v^2/T$, where $\hat{\sigma}_{between}^2 = SSE_{between}/(n-K) = .01583796$.

Thus, $\hat{\theta}$ is .87668488:

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}} = 1 - \sqrt{\frac{.00361263}{15 * .01583796}}$$

Next, transform the dependent and independent variables, including the intercept, using $\hat{\theta}$.

. gen rg_cost = cost - .87668488*gm_cost
. gen rg_output = output - .87668488*gm_output
. gen rg_fuel = fuel - .87668488*gm_fuel
. gen rg_load = load - .87668488*gm_load
. gen rg_int = 1 - .87668488 // for the intercept

Finally, run the OLS with the transformed variables. Do not forget to suppress the intercept.

. regress rg_cost rg_int rg_output rg_fuel rg_load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    86) =19642.72
       Model |  284.670313     4  71.1675783           Prob > F      =  0.0000
    Residual |  .311586777    86  .003623102           R-squared     =  0.9989
-------------+------------------------------           Adj R-squared =  0.9989
       Total |    284.9819    90  3.16646556           Root MSE      =  .06019
------------------------------------------------------------------------------
     rg_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      rg_int |   9.627911   .2101638    45.81   0.000     9.210119     10.0457

   rg_output |   .9066808   .0256249    35.38   0.000     .8557401    .9576215
     rg_fuel |   .4227784   .0140248    30.15   0.000      .394898    .4506587
     rg_load |    -1.0645   .2000703    -5.32   0.000    -1.462226   -.6667731
------------------------------------------------------------------------------

7.2 Estimations in SAS, Stata, and LIMDEP

In SAS, the TSCSREG and PANEL procedures have the /RANONE option to fit the one-way random effect model. These procedures by default use the Fuller and Battese (1974) estimation method, which produces slightly different estimates from FGLS. PROC PANEL has the /VCOMP=WK option for the Wansbeek and Kapteyn (1989) method, which is the groupwise heteroscedastic regression. The BP option of the MODEL statement, not available in PROC TSCSREG, conducts the Breusch-Pagan LM test for random effects. Unlike PROC PANEL, PROC TSCSREG does not have VCOMP= to specify the type of variance component estimation.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANONE BP VCOMP=WK;
RUN;

The PANEL Procedure
Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: cost

Model Description
Estimation Method              RanOne
Number of Cross Sections            6
Time Series Length                 15

Fit Statistics
SSE          0.3111    DFE              86
MSE          0.0036    Root MSE     0.0601
R-Square     0.9923

Variance Component Estimates
Variance Component for Cross Sections    0.016015
Variance Component for Error             0.003613

Hausman Test for Random Effects
DF    m Value    Pr > m
 2       1.63    0.4429

Breusch Pagan Test for Random Effects (One Way)
DF    m Value    Pr > m
 1     334.85    <.0001

                       Parameter Estimates
                                 Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     9.629513       0.2107      45.71      <.0001
output        1     0.906918       0.0257      35.30      <.0001
fuel          1     0.422676       0.0140      30.11      <.0001
load          1      -1.0645       0.2000      -5.32      <.0001

PROC PANEL and PROC TSCSREG estimate the same variance component for error (.0036) but a different variance component for groups (.0160 versus .4744). Notice that there are some differences in the output of PROC TSCSREG (variance component estimates and Hausman test) between SAS 9.2 and 9.1.3.

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANONE;
RUN;
(output is skipped)

Alternatively, you may use PROC MIXED to get the same results. Unlike SAS 9.1.3, SAS 9.2 requires the CLASS statement to explicitly specify an effect variable, airline in this case. The following script returns a set of random effect estimates.

PROC MIXED DATA=masil.airline;
   CLASS airline;
   MODEL cost = output fuel load /SOLUTION;
   RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION;
RUN;

The Mixed Procedure

Covariance Parameter Estimates
Cov Parm     Subject     Estimate
UN(1,1)      airline      0.01674
Residual                 0.003609

Fit Statistics
-2 Res Log Likelihood           -210.4
AIC (smaller is better)         -206.4
AICC (smaller is better)        -206.3

BIC (smaller is better)         -206.8

Null Model Likelihood Ratio Test
DF    Chi-Square    Pr > ChiSq
 1        107.49        <.0001

                    Solution for Fixed Effects
                          Standard
Effect       Estimate        Error     DF    t Value    Pr > |t|
Intercept      9.6322       0.2116      5      45.53      <.0001
output         0.9073      0.02581     81      35.16      <.0001
fuel           0.4225      0.01406     81      30.05      <.0001
load          -1.0646       0.1998     81      -5.33      <.0001

[Solution for Random Effects: predicted intercepts for airlines 1 through 6 with their standard errors (DF 81); the individual values are not legible in the source.]

          Type 3 Tests of Fixed Effects
            Num     Den
Effect       DF      DF    F Value    Pr > F
output        1      81    1235.88    <.0001
fuel          1      81     903.03    <.0001
load          1      81      28.40    <.0001

In Stata, the .xtreg command has the re option to produce FGLS estimates. The theta option reports an estimated theta (.8767). Let us specify airline as a panel identification variable using the .iis command.

. iis airline
. xtreg cost output fuel load, re theta

Random-effects GLS regression                   Number of obs      =        90
Group variable: airline                         Number of groups   =         6

R-sq:  within  = 0.9925                         Obs per group: min =        15
       between = 0.9856                                        avg =      15.0
       overall = 0.9876                                        max =        15

Random effects u_i ~ Gaussian                   Wald chi2(3)       =  11091.33

corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
theta              = .87668503

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9066805    .025625    35.38   0.000     .8564565    .9569045
        fuel |   .4227784   .0140248    30.15   0.000     .3952904    .4502665
        load |  -1.064499   .2000703    -5.32   0.000    -1.456629    -.672368
       _cons |   9.627909    .210164    45.81   0.000     9.215995    10.03982
-------------+----------------------------------------------------------------
     sigma_u |  .12488859
     sigma_e |  .06010514
         rho |  .81193816   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The sigma_u and sigma_e are square roots of the variance components for groups and errors (.0156=.1249^2, .0036=.0601^2).

Alternatively, .xtmixed fits the same model, the random-intercept model. The || airline:, option tells Stata to fit the model using the subject variable airline.

. xtmixed cost output fuel load || airline:,

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood =  105.20458
Iteration 1:   log restricted-likelihood =  105.20458

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        90
Group variable: airline                         Number of groups   =         6

                                                Obs per group: min =        15
                                                               avg =      15.0
                                                               max =        15

                                                Wald chi2(3)       =  11114.85
Log restricted-likelihood =  105.20458          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9073166    .025809    35.16   0.000      .856732    .9579013
        fuel |   .4225032   .0140598    30.05   0.000     .3949465      .45006
        load |  -1.064572   .1997763    -5.33   0.000    -1.456126   -.6730179
       _cons |   9.632212    .211559    45.53   0.000     9.217564    10.04686
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
airline: Identity            |
                   sd(_cons) |   .1293723   .0429029      .0675403    .2478107
-----------------------------+------------------------------------------------
                sd(Residual) |   .0600715   .0047138       .051508    .0700588
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   107.49 Prob >= chibar2 = 0.0000

Variance components for groups and errors are reported under the labels sd(_cons) and sd(Residual).
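The theta and rho reported by .xtreg can be checked by hand from sigma_u and sigma_e. The following is a minimal sketch, assuming the usual one-way FGLS definitions theta = 1 - sqrt(sigma_e^2/(T*sigma_u^2 + sigma_e^2)) and rho = sigma_u^2/(sigma_u^2 + sigma_e^2) with T = 15 periods per airline. The scalar names are illustrative only and do not appear in the original commands.

```stata
* Standard deviations of the variance components reported by .xtreg, re
scalar sigma_u = .12488859
scalar sigma_e = .06010514
* Number of time periods per airline (balanced panel)
scalar T = 15

* Quasi-demeaning weight used by FGLS (assumed textbook formula)
scalar theta = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2))

* Fraction of the total variance due to the group effect u_i
scalar rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)

display "theta = " theta
display "rho   = " rho
```

Both values should agree with the theta and rho lines of the .xtreg output up to rounding.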

003494 Fit Statistics -2 Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -229.5 -216.1962 Effect Intercept output fuel load Estimate 9.9892 0. RANDOM INTERCEPT / SUBJECT=airline TYPE=UN SOLUTION. CLASS airline.05640 0.05580 0. PROC PANEL and TSCSREG do not have such option. MODEL cost = output fuel load /SOLUTION.4234 -1.0001 <. add METHOD=ML to PROC MIXED.5 -217.42 Pr > |t| <.0001 <.9053 0.01302 0.5707 <.04900 0.01 1.05750 Effect Intercept Intercept Intercept Intercept Intercept Intercept airline 1 2 3 4 5 6 Estimate 0.47 36.57 -4.02466 0.4 -218.72 31. PROC MIXED DATA=masil.03211 -0.2992 http://www.000761 0. The Mixed Procedure Covariance Parameter Estimates Cov Parm UN(1.2094 0. In SAS.92 Pr > ChiSq <.0012 0.0645 DF 5 81 81 81 t Value 47.8281 0.06008 DF 81 81 81 81 81 81 t Value 0.2026 0.04976 0.0001 <.27 3.airline METHOD=ML.01306 -0.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 75 You may use the maximum likelihood estimation to fit random effect (or random intercept) model.37 0. RUN.01364 0.1) Residual Subject airline Estimate 0.0001 0.05994 0.indiana.1676 0.05 -5.6186 0.edu/~statmath 75 .0001 Solution for Fixed Effects Standard Error 0.7 Null Model Likelihood Ratio Test DF 1 Chi-Square 105.04 Pr > |t| 0.0001 Solution for Random Effects Std Err Pred 0.22 -0.

FUEL.1047419 . i(airline) panels(hetero) corr(independent) (output is skipped) In LIMDEP.6798506 _cons | 9. 2009 at 08:26:15PM | | LHS=COST Mean = 13.206622 46. Std.3961557 .55 0. LIMDEP estimates a slightly different variance component for groups (. Notice that error variance components are computed as .1246133 | http://www.0001 <. mle (output is skipped) .Rhs=ONE.92 Prob>=chibar2 = 0.48 0.7883772 .Panel.9053099 .000 .013888 30. You may also try . z P>|z| [95% Conf. and Het= subcommands for the groupwise heteroscedastic model.131971 | | WTS=none Number of observs.9550458 fuel | .72896 -----------------------------------------------------------------------------cost | Coef.xtgls that fits panel data models with heteroscedasticity across and within groups.0345293 . xtgls cost output fuel load.32 0.0000 Obs per group: min = avg = max = LR chi2(3) Prob > chi2 = = Log likelihood = 114.0119).0630373 .19 963.02362 -------------+---------------------------------------------------------------/sigma_u | .Het=AIRLINE.88 29.edu/~statmath 76 .xtreg below.0591^2.5365302 .0507956 .Lhs=COST.000 9.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 76 Type 3 Tests of Fixed Effects Num DF 1 1 1 Den DF 81 81 81 Effect output fuel load F Value 1348.0045701 .449062 -.196231 -5.43 Pr > F <. Compare the output of PROC MIXED above and .Random Effect$ +----------------------------------------------------+ | OLS Without Group Dummy Variables | | Ordinary least squares regression | | Model was estimated Aug 30.4233757 .OUTPUT. you have to specify Panel.9344669 -----------------------------------------------------------------------------Likelihood-ratio test of sigma_u=0: chibar2(01)= 105. re mle Random-effects ML regression Group variable: airline Random effects u_i ~ Gaussian Number of obs Number of groups = = 90 6 15 15. the mle option is used in .indiana.68 0.618648 .4505957 load | -1. . 
REGRESS.0687787 rho | .8555741 .1140843 .36561 | | Standard deviation = 1.000 .Str=AIRLINE.xtmixed commands to produce the same result.0001 <.335450 | | Standard error of e = .000 . Err.0253759 35.0001 In Stata.064456 .0591072 . xtmixed cost output fuel load || airline:. xtreg cost output fuel load.xtreg and .213677 10.000 -1. Interval] -------------+---------------------------------------------------------------output | . thus producing different parameter estimates. Random Effect.0 15 436.0035 = . = 90 | | Model size Parameters = 4 | | Degrees of freedom = 86 | | Residuals Sum of squares = 1.0130=1141^2 and .LOAD.2064687 /sigma_e | .42 0.

514 . .0000 .121594 | | Akaike Info.17430918 FUEL | .(-1.56046016 Constant| 9.56046016 Constant| 9.359 . 14.62750780 .0000 .1.226263)*tm_load rt_int = 1 . .) | | Baltagi-Li form of LM Statistic = 334.88273863 .v(i. .226263)*tm_cost rt_output = output .61063438 . 1.01325455 66.0000 -1.© 2005-2009 The Trustees of Indiana University (9/16/2009) | Fit R-squared = . Largest 15 | | Average group size 15.226263  1  .28136 | +----------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .42389869 .713 .t) = e(i.indiana.226263)*tm_output rt_fuel = fuel . Crt.0000 12. prob value = .90412380 .0000 +----------------------------------------------------+ | Panel:Groups Empty 0.(-1.edu/~statmath 77 .08819022/(15*6-15-3) ˆ The variance component for time  u2 is -.987042D+00 | +--------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |b/St.121653 | +----------------------------------------------------+ Linear Regression Models for Panel Data: 77 +----------------------------------------------------+ | Panel Data Analysis of COST [ONE way] | | Unconditional ANOVA (No regressors) | | Source Variation Deg.0000) | | Diagnostic Log likelihood = 61.00 | +----------------------------------------------------+ +--------------------------------------------------+ | Random Effects Model: v(i. Valid data 6 | | Smallest 15. gen gen gen gen gen ˆ  v2 .837 .33 (.34530293 -4.t) + u(i) | | Estimates: Var[e] = . = -4.85 | | ( 1 df. Criter.0882). LogAmemiya Prd.(-1.76991 | | Restricted(b=0) = -138.0056) and the fixed time effect model (1.20277404 47.7703592 LOAD | -1.119159D-01 | | Corr[v(i.3611 84.226263) // for the intercept http://www.19933132 -5.0000 7. .00201072 =.t). 
= -4.9878812 | | Model test F[ 3.226263)*tm_fuel rt_load = load . Model (3) = 334.(-1.51691223 .0000) | | Info criter..730 .|P[|Z|>z]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | .06455866 .26 (. Free. ˆ The variance component for error  v2 is .4) rt_cost = cost .0000 -1.01511375 1 2 ˆ n between 6 * .599 .01511375 = 1.767356 | | Lagrange Multiplier Test vs.01511375/6 ˆ The  is .s)] = . Mean Square | | Between 74.9882897 | | Adjusted R-squared = . 86] (prob) =2419.7703592 LOAD | -1.361260D-02 | | Var[u] = .041 89.005590631/(15-4).6799 5.17430918 FUEL | .0000 12.02030424 22.01374650 30.3 One-way Random Time Effect Model ˆ Let us compute  using the SSEs of the between time effect model (.(-1.02461548 36. .3581 | | Chi-sq [ 3] (prob) = 400.85 | | Sum of Squares .005590631/(15 .45397771 .341 .468584 | | Total 114.396 .9360 | | Residual 39.22924522 41.000000) | | (High values of LM favor FEM/REM over CR model.Er.147779D+01 | | R-squared .
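The arithmetic behind the variance components and theta of the random time effect model can be replayed with Stata scalars. This is a minimal sketch that only restates the figures quoted in the text (SSE of the between time effect model .005590631, SSE of the fixed time effect model 1.08819022, n = 6 airlines, T = 15 years, 4 parameters including the intercept). The scalar names are hypothetical.

```stata
* SSEs quoted in the text
scalar sse_between = .005590631
scalar sse_fixed   = 1.08819022

* Variance component for error: SSE of the fixed time effect model / (nT - T - k)
scalar s2_v = sse_fixed / (15*6 - 15 - 3)

* Between (time) variance: SSE of the between time effect model / (T - K)
scalar s2_b = sse_between / (15 - 4)

* Variance component for time; it comes out negative here
scalar s2_u = s2_b - s2_v/6

* theta = 1 - sqrt(s2_v / (n * s2_b))
scalar theta_t = 1 - sqrt(s2_v / (6*s2_b))

display "s2_v  = " s2_v
display "s2_u  = " s2_u
display "theta = " theta_t
```

The negative s2_u is what the text flags as an unlikely value, and the resulting theta is the weight that the gen commands use to build the rt_ variables.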

BY year airline.98 0.288591 Linear Regression Models for Panel Data: 78 Number of obs F( 4.2482869 -5. MODEL cost = output fuel load /RANONE.0129051 34. Err.0451 Residual | 1.000 .© 2005-2009 The Trustees of Indiana University (9/16/2009) . 86) Prob > F R-squared Adj R-squared Root MSE = = = = = = 90 . the negative value of the variance component for time is not likely.4136186 .9883 DFE Root MSE 86 0.1804 4 19986. The /VCOMP=WH option in the MODEL statement employs Wallace and Hussian’s method to estimating variance components and produces the same parameter estimates.airline. MODEL cost = output fuel load /RANONE BP VCOMP=WH.8598891 . noc Source | SS df MS -------------+-----------------------------Model | 79944.3354 0.0155 0.1489281 63. Interval] -------------+---------------------------------------------------------------rt_int | 9.airline.8883838 . RUN.9168785 rt_fuel | . ID year airline.000 9.4392731 . In SAS.7855982 ------------------------------------------------------------------------------ However.279176 .516098 . use the TSCSREG or PANEL procedure with the /RANONE option.90 0. (Output is skipped) PROC PANEL DATA=masil.indiana. RUN. regress rt_cost rt_int rt_output rt_fuel rt_load.220038 9.edu/~statmath 78 . ID year airline.9732 90 888. PROC TSCSREG DATA=masil.1246 http://www.772754 -.000 .812157 rt_output | . Notice that the data are sorted by year and airline.0000 1.4649277 rt_load | -1. The PANEL Procedure Wallace and Hussain Variance Components (RanOne) Dependent Variable: cost Model Description Estimation Method Number of Cross Sections Time Series Length RanOne 15 6 Fit Statistics SSE MSE R-Square 1.000 -1.0143338 61. Std.15 0.0000 1.0000 .020845581 -------------+-----------------------------Total | 79945.04 0. 0. t P>|t| [95% Conf. PROC SORT DATA=masil.14438 -----------------------------------------------------------------------------rt_cost | Coef.airline.79271995 86 .

0133 0.882739 0.0001 <. RUN.453977 -1.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 79 Variance Component Estimates Variance Component for Cross Sections Variance Component for Error 0 0.2 http://www. RANDOM INTERCEPT / SUBJECT=airline TYPE=UN.516923 0.55 Pr > m 0.9 -100.36 -4.51 66.62751 t Value 41.3453 Variable Intercept output fuel load DF 1 1 1 1 Estimate 9.1) Residual Subject year Estimate 0 0.60 22.edu/~statmath 79 . The Mixed Procedure Covariance Parameter Estimates Cov Parm UN(1.0001 <. MODEL cost = output fuel load /SOLUTION.2292 0.airline.9 -100.0023 Breusch Pagan Test for Random Effects (One Way) DF 1 m Value 1.9 -100.0001 PROC MIXED fits the same random time effect model although /SOLUTION in the RANDOM statement does not work to produce random effect parameter estimates in this case.0203 0.17 Pr > m 0. CLASS airline.71 Pr > |t| <.0001 <.2135 Parameter Estimates Standard Error 0.016437 Hausman Test for Random Effects DF 2 m Value 12. PROC MIXED DATA=masil.01553 Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -102.indiana.

0203042 22.4141815 .067612 9.0132545 66. you have to switch group and time variables using the .60 0.71 0.0001 <.2292 0.36 0. 1 to 6 1 unit .0001 <.9883 Number of obs Number of groups = = 90 15 6 6. Err.0000 Obs per group: min = avg = max = Wald chi2(3) Prob > chi2 = = Random effects u_i ~ Gaussian corr(u_i. http://www. Interval] -------------+---------------------------------------------------------------output | .51 0. z P>|z| [95% Conf.9507309 _cons | 9.000 .03 0.© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 80 Null Model Likelihood Ratio Test DF 0 Chi-Square 0.8827 0.000 -2.00 Pr > ChiSq 1.12293801 rho | 0 (fraction of variance due to u_i) ------------------------------------------------------------------------------ You may runt the following command to get the same result.000 9.02030 0.8827385 .5169 0.indiana.3453 Effect Intercept output fuel load Estimate 9.453977 .9966 overall = 0. . Std.51 66.516923 .9843 between = 0. tsset year airline panel variable: time variable: delta: year (strongly balanced) airline.62751 .4540 -1. re i(year) theta Random-effects GLS regression Group variable: year R-sq: within = 0.0001 <.0001 Type 3 Tests of Fixed Effects Num DF 1 1 1 Den DF 72 72 72 Effect output fuel load F Value 4435.0000 Solution for Fixed Effects Standard Error 0.edu/~statmath 80 .2292445 41.000 . X) = 0 (assumed) theta = 0 -----------------------------------------------------------------------------cost | Coef.0001 <.01325 0.0001 In Stata. xtreg cost output fuel load.22 Pr > F <.tsset command.6275 DF 14 72 72 72 t Value 41.345302 -4.36 -4.9087169 fuel | .0 6 7258.92 22.4937724 load | -1.44 499.8567602 .71 Pr > |t| <.30429 -.966233 -------------+---------------------------------------------------------------sigma_u | 0 sigma_e | .60 22.0001 <.

s)] = .55 | | ( 1 df.v(i.00 | +----------------------------------------------------+ +--------------------------------------------------+ | Random Effects Model: v(i. Let us first estimate the two way FGLS using the SAS PANEL procedure with the /RANTWO option.56046016 Constant| 9.35084190 -4.133564D+01 | | R-squared .t).t) = e(i.66267268 .503 .414686D-03 | | Corr[v(i.52363173 .162 . xtmixed cost output fuel load || year:.0000 . REGRESS. RUN.edu/~statmath 81 . ID airline year. MODEL cost = output fuel load /RANTWO BP2. and LIMDEP are slightly different each other.Lhs=COST.airline. You may find that parameter estimates of SAS.OUTPUT.55 | | Sum of Squares . ID airline year. MODEL cost = output fuel load /RANTWO.17430918 FUEL | . you need to use the Str= and Random subcommands.01314515 67.7703592 LOAD | -1.213557) | | (High values of LM favor FEM/REM over CR model.) | | Baltagi-Li form of LM Statistic = 1.739 .airline. Valid data 15 | | Smallest 6.Panel.|P[|Z|>z]| Mean of X| +--------+--------------+----------------+--------+--------+----------+ OUTPUT | . (Output is skipped) PROC PANEL DATA=masil. RUN.LOAD.45500533 . Largest 6 | | Average group size 6. prob value = .© 2005-2009 The Trustees of Indiana University (9/16/2009) .24108843 39. The output below includes only the random effect part. The BP2 option conducts the Breusch-Pagan LM test for the two-way random effect model.Er.Random$ +----------------------------------------------------+ | Panel:Groups Empty 0.t) + u(i) | | Estimates: Var[e] = .0000 7.Het=YEAR.FUEL.988288D+00 | +--------------------------------------------------+ +--------+--------------+----------------+--------+--------+----------+ |Variable| Coefficient | Standard Error |b/St.434 . Stata.Rhs=ONE.0000 -1. The PANEL Procedure Fuller and Battese Variance Components (RanTwo) Dependent Variable: cost Model Description Estimation Method RanTwo http://www. 
PROC TSCSREG DATA=masil.026705 | | Lagrange Multiplier Test vs.151138D-01 | | Var[u] = .0000 12.indiana. Model (3) = 1.4 Two-way Random Effect Model in SAS The random group and time effect model is formulated as y it     ' X ti  u i   t   it .Str=YEAR.88285277 . (output is skipped) Linear Regression Models for Panel Data: 81 In LIMDEP.02122856 21.

Number of Cross Sections          6
Time Series Length               15

                   Fit Statistics
SSE          0.2322    DFE             86
MSE          0.0027    Root MSE    0.0520
R-Square     0.9829

          Variance Component Estimates
Variance Component for Cross Sections   0.017439
Variance Component for Time Series      0.001081
Variance Component for Error             0.00264

     Hausman Test for Random Effects
   DF    m Value    Pr > m
    3       6.93    0.0741

 Breusch Pagan Test for Random Effects (Two Way)
   DF    m Value    Pr > m
    2     336.40    <.0001

                  Parameter Estimates
                             Standard
Variable    DF    Estimate      Error    t Value    Pr > |t|
Intercept    1    9.362677     0.2440      38.38      <.0001
output       1    0.866448     0.0255      33.98      <.0001
fuel         1    0.436163     0.0172      25.41      <.0001
load         1    -0.98053     0.2235      -4.39      <.0001

The following .xtmixed command suffers from convergence problem in this case and LIMDEP command produces different results (output is skipped).

. xtmixed cost output fuel load || airline: || year:, mle

REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Random Effect$

7.5 Testing Random Effect Models

The Breusch-Pagan Lagrange multiplier (LM) test is designed to test random effects. The null hypothesis of the one-way random group effect model is that individual-specific or time-series error variances are zero: $H_0: \sigma_u^2 = 0$. If the null hypothesis is not rejected, the pooled regression model is appropriate. The $e'e$ of the pooled OLS is 1.33544153 and $\bar{e}'\bar{e}$ is .0665147. The following LM test uses Baltagi's formula.

LM is 334.8496 $= \frac{nT}{2(T-1)}\left[\frac{T^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 = \frac{6 \times 15}{2(15-1)}\left[\frac{15^2 \times .0665}{1.3354} - 1\right]^2 \sim \chi^2(1)$ with p<.0000.

With the large chi-squared of 334.8496, we reject the null hypothesis in favor of the random group effect model. The SAS PANEL procedure with the /BP option and the LIMDEP Panel and Het subcommands report the same LM statistic (see 7.2). In Stata, run the .xttest0 command right after estimating the one-way random group effect model.

. quietly xtreg cost output fuel load, re i(airline)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        cost[airline,t] = Xb + u[airline] + e[airline,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                    cost |   1.281358       1.131971
                       e |   .0036126       .0601051
                       u |   .0155972       .1248886

        Test:   Var(u) = 0
                              chi2(1) =   334.85
                          Prob > chi2 =     0.0000

The null hypothesis of the one-way random time effect is that variance components for time are zero, $H_0: \sigma_u^2 = 0$. The following LM test uses Baltagi's formula.

LM is 1.5472 $= \frac{Tn}{2(n-1)}\left[\frac{\sum_t (n\bar{e}_{\cdot t})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2 = \frac{15 \times 6}{2(6-1)}\left[\frac{.7817}{1.3354} - 1\right]^2 \sim \chi^2(1)$ with p<.2135.

The small chi-squared of 1.5472 does not reject the null hypothesis at the .01 level. SAS and LIMDEP return the same LM statistic (see 7.3).

. quietly xtreg cost output fuel load, re i(year)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        cost[year,t] = Xb + u[year] + e[year,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                    cost |   1.281358       1.131971
                       e |   .0151138        .122938
                       u |          0              0

        Test:   Var(u) = 0
                              chi2(1) =     1.55
                          Prob > chi2 =     0.2135
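The two one-way LM statistics can be reproduced with a few Stata scalars. This is a minimal sketch that plugs the quantities quoted in the text into Baltagi's formula (n = 6 airlines, T = 15 years) and uses Stata's chi2tail() function for the p-values. The scalar names are illustrative, and the time effect statistic is built from the rounded figures .7817 and 1.3354, so its last digits may differ slightly from the reported value.

```stata
* One-way random group effect: LM = nT/(2(T-1)) * [T^2*ebar'ebar / e'e - 1]^2
scalar LM_group = (6*15)/(2*(15-1)) * ((15^2*.0665147)/1.33544153 - 1)^2

* One-way random time effect: LM = Tn/(2(n-1)) * [sum of squared period sums / e'e - 1]^2
scalar LM_time = (15*6)/(2*(6-1)) * (.7817/1.3354 - 1)^2

* Upper-tail chi-squared probabilities with one degree of freedom
scalar p_group = chi2tail(1, LM_group)
scalar p_time  = chi2tail(1, LM_time)

display "LM (group) = " LM_group "   p = " p_group
display "LM (time)  = " LM_time  "   p = " p_time
```

The same comparison is what .xttest0 reports as chi2(1) and Prob > chi2.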

The two way random effects model has the null hypothesis that variance components for groups and time are all zero. The LM statistic with two degrees of freedom is 336.3968 = 334.8496 + 1.5472 (p<.0001).

7.6 Fixed Effects versus Random Effects

How do we compare a fixed effect model and its counterpart random effect model? The Hausman specification test examines if the individual effects are uncorrelated with the other regressors in the model. Since computation is complicated, let us conduct the test in Stata.

. tsset airline year
       panel variable:  airline (strongly balanced)
        time variable:  year, 1 to 15
                delta:  1 unit

. quietly xtreg cost output fuel load, fe
. estimates store fixed_group
. quietly xtreg cost output fuel load, re
. hausman fixed_group .

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             | fixed_group        .          Difference          S.E.
-------------+----------------------------------------------------------------
      output |    .9192846     .9066805        .0126041        .0153877
        fuel |    .4174918     .4227784       -.0052867        .0058583
        load |   -1.070396    -1.064499       -.0058974        .0255088
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        2.12
                Prob>chi2 =      0.5469
                (V_b-V_B is not positive definite)

The Hausman statistic 2.12 is different from PROC PANEL's 1.63 and Greene (2003)'s 4.16. It is because SAS, Stata, and LIMDEP use different estimation methods to produce slightly different parameter estimates. These tests, however, do not reject the null hypothesis in favor of the random effect model.

7.7 Summary

Table 7.1 summarizes random effect estimations in SAS, Stata, and LIMDEP. PROC PANEL is highly recommended.

Table 7.1 Comparison of the Random Effect Model in SAS, Stata, LIMDEP*

                     SAS 9.2             SAS 9.2             Stata 11            LIMDEP 9
-----------------------------------------------------------------------------------------------------
Procedure/Command    PROC TSCSREG        PROC PANEL          .xtreg              Regress; Panel$
One-way              /RANONE             /RANONE WK          re                  Str=;Random$
Two-way              /RANTWO             /RANTWO             No                  Str=;Period;Random$
SSE (e'e)            Slightly different  Correct             No                  Incorrect
MSE or SEE           Slightly different  Correct             No                  No
Model test (F)       No                  No                  Wald test           No
(adjusted) R2        Slightly different  Slightly different  Incorrect           Incorrect
Intercept            Slightly different  Correct             Correct             Slightly different
Coefficients         Slightly different  Correct             Correct             Slightly different
Standard errors      Slightly different  Correct             Correct             Slightly different
Variance for group   Slightly different  Correct             Correct (sigma)     Slightly different
Variance for error   Correct             Correct             Correct (sigma)     Correct
Theta                No                  No                  theta               No
Breusch-Pagan (LM)   No                  BP, BP2             .xttest0            Yes
Hausman Test (H)     Incorrect           Yes                 .hausman            Yes (unstable)
-----------------------------------------------------------------------------------------------------
* "Yes/No" means whether a software package reports the statistic. "Correct/incorrect" indicates whether the statistics are different from those of the groupwise heteroscedastic regression.
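The p-values attached to the two-way LM statistic in 7.5 and to the Hausman statistic in 7.6 can be checked with Stata's chi2tail() function. This is a minimal sketch using the rounded statistics printed in the text, so the Hausman p-value may differ from the reported 0.5469 in the third decimal. The scalar names are illustrative.

```stata
* Two-way LM statistic: sum of the group and time one-way LM statistics
scalar LM_two = 334.8496 + 1.5472

* Upper-tail probability with two degrees of freedom
scalar p_LM_two = chi2tail(2, LM_two)

* Hausman statistic reported by .hausman, three degrees of freedom
scalar p_hausman = chi2tail(3, 2.12)

display "LM (two-way) = " LM_two "   p = " p_LM_two
display "Hausman p    = " p_hausman
```

A large Hausman p-value means the null hypothesis of uncorrelated individual effects is not rejected, which is why the text favors the random effect model here.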

the panel data are not poolable.2000) -2.79 (p<.1 Group by Group OLS Regression In SAS. use the BY statement in PROC REG.4362** (. fixed effect.4540** (. you may consider the random coefficient model and hierarchical regression model.2017) -1.8173** (.0601) 1. and Random Effect Models Model Output Fuel Load SSE/SEE DF Pooled Between group Between time Fixed group Fixed time Two-way fixed Random group Random time Two-way random .9989) .0000) The poolability test examine if data are poolable so that individual entities or time periods have the same constant slopes of regressors. “Which model is better than the others?” Do we have to consider individual-specific or time effect? Are these effects are fixed or random? Table 8. BY airline.4227** (.2322 (.1167) . and random effect model.9841) .82 (p<. If the null hypothesis is rejected. In this case.0134) . Do not forget to sort the data set in advance.7432) -1. forvalues i= 1(1)6 { // run group by group regression display "OLS regression for group " `i' regress cost output fuel load if airline==`i' } OLS regression for group 1 http://www.1686 (.1333** (.0001) 1960.0152) -.3641) .3507** (.3111 (.8677** (.0704** (.0319) .5239 (4. you need to run group by group OLS regressions and/or time by time OLS regressions.2235) 1.9805** (.7825* (.9972) .1722 (.8827** (.9936 (. For poolability test. PROC SORT DATA=masil.1635) . BY airline.0299) . Poolability Test Table 8.0225) .0520) 86 2 11 81 72 67 86 86 86 F R2 (Adj.1246) .edu/~statmath 86 .1088) 1.0203) -5.0056 (.0257) .33 (p<.8664** (.8820** (.9923 .4424) -.0154) .0000) 439.2617) -1.2478) -1. RUN. We may ask.0513) .9193** (.6275** (.4184) -.0601) 1.9974 (.9979) .9882) .34 (p<.1259) .9883 (.0172) -1.0514) .0000) 3935.3354 (.4787) .3342** (.2926 (.1 summarizes the results of pooled OLS.0000) 104.4175** (.7511 (2.0645** (.airline. 8.airline.0317 (.9544** (.9829 2419. MODEL cost = output fuel load. 
the if qualifier makes it easy to run group by group regressions.0882 (.0095) 4074.0133) .9984 (.2749+ (.9905 (.0140) .8828** (.0255) .3453) -1.1 Summary of Pooled. Fixed Effect.0568) . In Stata.62 (p<.4845 (.indiana.9991 (.12 (p<.) .0228) .© 2005-2009 The Trustees of Indiana University (9/16/2009) Linear Regression Models for Panel Data: 86 8. PROC REG DATA=masil.1229) .9069** (.9879) .9848 .0050** (.1769 (.

22 0.41824348 3 1.0456 -----------------------------------------------------------------------------cost | Coef.128 -1.000 7.36104572 Number of obs = 15 F( 3.© 2005-2009 The Trustees of Indiana University (9/16/2009) Source | SS df MS -------------+-----------------------------Model | 3.10 0.724785 .459104 .022869767 11 .34 0.40 0.000 -3.65 0.0000 0. Interval] -------------+---------------------------------------------------------------output | .13941449 Residual | .05621 -----------------------------------------------------------------------------cost | Coef.7268305 .3661192 .9980 = 0.46 0.indiana. Std.79286673 3 1.396444 fuel | .9353749 .0759266 12. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 1843.0000 = 0. t P>|t| [95% Conf.0000 0.4637263 . Interval] -------------+---------------------------------------------------------------output | . t P>|t| [95% Conf.5613333 load | -.37252558 3 2.23 0. t P>|t| [95% Conf.699815 .68 0.46 = 0.9699164 1.36886 load | -2.4013571 -6.000 11. Err.4250424 14 .102488 fuel | .001 . 11) Prob > F R-squared Adj R-squared Root MSE = = = = = = 15 777.25 0.3865867 .3088958 .000618083 -------------+-----------------------------Total | 3.6023241 15.095226 .2605148 _cons | 9. Std.47622084 3 2.49 0.0000 http://www.2376522 -11.000689803 -------------+-----------------------------Total | 6.02626 -----------------------------------------------------------------------------cost | Coef.19174 11.edu/~statmath 87 .9940 .000 .0792856 18.2972551 36.02139 12.86 0.722057 10.838902 10. Err.6105989 -1. Interval] -------------+---------------------------------------------------------------output | 1.0272443 11.50025 -----------------------------------------------------------------------------OLS regression for group 2 Source | SS df MS -------------+-----------------------------Model | 6.13 0.3676324 .007587838 11 .97243 .9975 = . t P>|t| [95% Conf.5926122 _cons | 8.000 .3847054 1. Err.40727792 14 .50 = 0.000 .201716 _cons | 11.85 0.0968946 12. 
11) Prob > F R-squared Adj R-squared Root MSE = = = = = = 15 608. Std.9988 = 0.48380868 14 .0000 = 0.006798918 11 .0181946 21.044347 10.284597 1.9940 0.49031 -----------------------------------------------------------------------------OLS regression for group 5 Source | SS df MS -------------+-----------------------------Model | 7.034752343 11 .26428891 Residual | .244 -2.08313716 3 2.1554418 4.18318 .8985786 9.15874028 Residual | .67757 -----------------------------------------------------------------------------OLS regression for group 4 Source | SS df MS -------------+-----------------------------Model | 7.32 0.68 0.21 0.34501 -1.00207907 -------------+-----------------------------Total | 3.45750853 Residual | .068956 fuel | .578248 _cons | 10.92346 -----------------------------------------------------------------------------OLS regression for group 3 Source | SS df MS -------------+-----------------------------Model | 3.003159304 -------------+-----------------------------Total | 7.000 . Err.52909128 Number of obs F( 3.463129191 Number of obs F( 3.7513069 .5353929 load | -.02486 -----------------------------------------------------------------------------cost | Coef.3465406 . 11) = 1999.63361 fuel | .244645886 Linear Regression Models for Panel Data: 87 Number of obs F( 3.71 0.000 10.89 Prob > F = 0.247854 -2.9953 0.4515127 .000 .8157365 14 .811856 .000 1.164608 .9985 = .4266329 load | -2. Std.000 6. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 3129.846 . Interval] -------------+---------------------------------------------------------------output | 1.2489315 .9924 .272552607 Number of obs F( 3.000 -3.0381103 11.4707826 -1.47 0.7682616 1.461629 .7756708 .4320951 27.000 .

1007 = .© 2005-2009 The Trustees of Indiana University (9/16/2009) Residual | .3 Poolability Test over Time The null hypothesis of the poolability test over time is H 0 :  tk   k .30 0.07 0.22 0.001180585 -------------+-----------------------------Total | 7.481220.004 -1.9442886 1. Err.1964845 . Std.0348 + .000 .029913538 + . 8.0130 + . Interval] -------------+---------------------------------------------------------------output | .085430285 + .0068 + .044807673 + .000469826 + .0229 + . The e' e is 1.9982 0.012986435 11 .0308235 9.000 .indiana.49 = 0.3336308 -3.1050328 . t P>|t| [95% Conf.941163 -. The sum of et ' et is computed from the 15 time by time regression.3354  .84 0.087240016 + .9982 = .066075346 + .9065471 1.2 Poolability Test across Groups The null hypothesis of the poolability test across groups is H 0 :  ik   k .0771255 13.0000).077112957 + /// .4767508 0.3023258 .9977 .77381 .1173565 3 3.9986 = 0.001423938 -------------+-----------------------------Total | 11.03774 -----------------------------------------------------------------------------cost | Coef.0321728 30.9673393 .000 10.3876239 load | -1.73 0.506865971 Linear Regression Models for Panel Data: 88 R-squared = Adj R-squared = Root MSE = 0.09612359 14 .0076 + . The ei ' ei is . We conclude that the panel data are not poolable with respect to airline.8965275 1.81 0.4812 rejects the null hypothesis of poolability (p< .1330199 14 .0157.016506613 + . di . Err.872309 11.edu/~statmath 88 . the SSE of the pooled OLS regression.70578551 Residual | .96 0.023093978 + .0434213 6.0000 = 0.4725305 _cons | 11.014104542 + /// .143348297 + .000 .012170358 + .7430078 15.154354 _cons | 10.049329439 + .2920542 .40614 -----------------------------------------------------------------------------OLS regression for group 6 Source | SS df MS -------------+-----------------------------Model | 11. t P>|t| [95% Conf.063648817 + .076299 . 
Interval] -------------+---------------------------------------------------------------output | 1.1007 6(15  4) The large 40.03436 -----------------------------------------------------------------------------cost | Coef.206847 . Std.038151 fuel | .795215705 Number of obs F( 3. The F statistic is (1.66 .7505079 http://www.830 -.3354.000 9.246051 fuel | .000 .67532 ------------------------------------------------------------------------------ 8.2344839 .037256216 .13544 13. forvalues i= 1(1)15 { // run year by year regression display "OLS regression for year " `i' regress cost output fuel load if year==`i' } (output is skipped) .62 0.3701678 load | .77079 .4095921 26.015663323 11 .1007 (6  1)4 ~ 40. 11) Prob > F R-squared Adj R-squared Root MSE = 15 = 2602.

The F statistic is $\frac{(1.3354 - .7505)/[(15-1)4]}{.7505/[15(6-4)]} \sim .4175[84, 30]$.

The small F statistic does not reject the null hypothesis in favor of poolable panel data with respect to time (p<.9991).
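Both poolability F statistics follow the same pattern: the restricted SSE of the pooled OLS (1.3354) is compared with the sum of SSEs from the separate regressions (.1007 across the 6 airlines, .7505 across the 15 years), with K = 4 parameters per regression. This is a minimal sketch using the rounded SSEs quoted in the text, so the group statistic comes out slightly below the reported 40.4812. The degrees of freedom passed to Ftail() are taken directly from the denominators of the two formulas (20 and 66 across groups, 56 and 30 over time), and the scalar names are illustrative.

```stata
* Poolability across groups: [(e'e - sum e_i'e_i)/((n-1)K)] / [sum e_i'e_i/(n(T-K))]
scalar F_group = ((1.3354 - .1007)/((6-1)*4)) / (.1007/(6*(15-4)))

* Poolability over time: [(e'e - sum e_t'e_t)/((T-1)K)] / [sum e_t'e_t/(T(n-K))]
scalar F_time = ((1.3354 - .7505)/((15-1)*4)) / (.7505/(15*(6-4)))

* Upper-tail F probabilities
scalar p_F_group = Ftail(20, 66, F_group)
scalar p_F_time  = Ftail(56, 30, F_time)

display "F (groups) = " F_group "   p = " p_F_group
display "F (time)   = " F_time  "   p = " p_F_time
```

A small p-value rejects poolability, as with the airlines, while a large one, as with the years, does not.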

9. Conclusion

Panel data are analyzed to investigate group and time effects using fixed effect and random effect models. The fixed effect model asks how group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time. Slopes are assumed unchanged in both fixed effect and random effect models.

Fixed effect models are estimated by the least squares dummy variable (LSDV) regression and the within effect model. LSDV has three approaches to avoid perfect multicollinearity. LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and imposes restrictions instead. LSDV1 is commonly used since it produces correct statistics. LSDV2 provides actual parameter estimates of groups (Y-intercepts), but reports incorrect R² and F statistic. Notice that the dummy parameters of the three LSDV approaches have different meanings and thus involve different t-tests.

The within effect model does not use dummy variables but deviations from group means. Thus, this model is useful when there are many groups and/or time periods in the panel data set since it is able to avoid the incidental parameter problem. The dummy parameter estimates need to be computed afterward. Because of its larger degrees of freedom, the within effect model produces an incorrect MSE and incorrect standard errors of parameters. As a result, you need to adjust the standard errors to conduct correct t-tests.

Random effect models are estimated by generalized least squares (GLS) and feasible generalized least squares (FGLS). When the variance structure is known, GLS is used. If unknown, FGLS estimates θ. Parameter estimates vary depending on estimation methods.

Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange multiplier test. The Hausman specification test compares a fixed effect model and a random effect model. If the null hypothesis of no correlation between the individual effects and the regressors is rejected, the fixed effect model is preferred. Poolability is tested by running group by group or time by time regressions.

Among the four statistical packages addressed in this document, I would recommend SAS and Stata. In particular, PROC PANEL provides various ways of analyzing panel data and reports correct (adjusted) statistics (see Tables 4.1 and 7.1). Stata is very handy for manipulating panel data but reports incorrect F-test and R². LIMDEP is able to estimate various panel data models but is not good at data management. SPSS is least recommended for panel data models.

If the number of groups (subjects) or time periods is extremely large, panel data models may be less useful because the null hypothesis of the F test (all dummy parameters are jointly zero) is too strong. Then, you may consider categorizing subjects to reduce the number of groups.

A panel data set needs to be arranged in the long format as shown in Section 1.1. This document assumes that data are balanced without missing values. If data are severely unbalanced, read output with caution and consider dropping subjects with many missing data points. Extensions to these basic linear panel data models include dynamic models with autocorrelation, random coefficient models, hierarchical linear models, and logit/probit models.
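The points about the within effect model (no dummies, group intercepts computed afterward, standard errors that need a degrees-of-freedom adjustment) can be illustrated on a toy example. The following is a minimal sketch, not the procedure of any of the four packages: the panel is made up, there is a single regressor, and all names are hypothetical. It assumes the usual adjustment factor sqrt((nT - k)/(nT - n - k)) for a balanced panel with n groups, T periods, and k regressors.

```python
import math

# Hypothetical balanced toy panel: 3 groups x 4 periods of (x, y) pairs.
panel = {
    "A": [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8)],
    "B": [(1.0, 5.0), (2.0, 6.1), (3.0, 6.9), (4.0, 8.2)],
    "C": [(1.0, 8.9), (2.0, 10.2), (3.0, 11.0), (4.0, 11.9)],
}
n = len(panel)                        # number of groups
T = len(next(iter(panel.values())))   # periods per group
k = 1                                 # number of regressors

# Group means of x and y.
means = {g: (sum(x for x, _ in obs) / T, sum(y for _, y in obs) / T)
         for g, obs in panel.items()}

# Within estimator: OLS on deviations from group means, without dummies.
sxx = sum((x - means[g][0]) ** 2 for g, obs in panel.items() for x, _ in obs)
sxy = sum((x - means[g][0]) * (y - means[g][1])
          for g, obs in panel.items() for x, y in obs)
slope = sxy / sxx

# Group intercepts (the dummy parameters) are recovered afterward.
intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}

# Residuals of the fit with one intercept per group and a common slope.
resid = {g: [y - intercepts[g] - slope * x for x, y in obs]
         for g, obs in panel.items()}
sse = sum(e * e for es in resid.values() for e in es)

# The naive within df ignores the n group means that were estimated.
se_naive = math.sqrt(sse / (n * T - k) / sxx)
se_lsdv = math.sqrt(sse / (n * T - n - k) / sxx)
adjust = math.sqrt((n * T - k) / (n * T - n - k))

print(f"slope={slope:.4f} naive se={se_naive:.4f} adjusted se={se_lsdv:.4f}")
```

In this sketch the residuals sum to zero within each group and are orthogonal to x, which are the least squares conditions of the dummy variable regression without an intercept (LSDV2). That is why the within slope and the recovered intercepts coincide with the LSDV estimates, while the naive standard error is too small by the factor `adjust`.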

Appendix: Data Sets

Data set 1: Data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/).

URL: http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.csv
     http://www.indiana.edu/~statmath/stat/all/panel/rnd2002.dta

firm = IT company name
type = type of IT firm
rnd = 2002 R&D investment in current USD millions
income = 2000 net income in current USD millions
d1 = 1 for equipment and software firms and 0 for telecommunication and electronics

. sum rnd income

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         rnd |        39    2023.564    1615.417          0       5490
      income |        50     2509.78    3104.585       -732      11797

. tab type d1

                |          d1
   Type of Firm |         0          1 |     Total
----------------+----------------------+----------
        Telecom |        18          0 |        18
    Electronics |        17          0 |        17
   IT Equipment |         0          6 |         6
Comm. Equipment |         0          5 |         5
  Service & S/W |         0          4 |         4
----------------+----------------------+----------
          Total |        35         15 |        50

Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).

URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
     http://www.indiana.edu/~statmath/stat/all/panel/airline.dta

airline = airline (six airlines)
year = year (fifteen years)
output0 = output in revenue passenger miles, index number
cost0 = total cost in $1,000
fuel0 = fuel price
load = load factor, the average capacity utilization of the fleet

. sum output0 cost0 fuel0 load

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     output0 |        90    .5449946    .5335865    .037682    1.93646
       cost0 |        90     1122524     1192075      68978    4748320
       fuel0 |        90      471683    329502.9     103795    1015610
        load |        90    .5604602    .0527934    .432066    .676287

References

Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. Wiley, John & Sons.

Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-way Error Component Regression Model." Journal of Econometrics, 62(2): 67-89.

Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics." Review of Economic Studies, 47(1): 239-253.

Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.

Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. TX: Stata Press.

Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC: SAS Institute.

Fuller, Wayne A., and George E. Battese. 1973. "Transformations for Estimation of Linear Models with Nested-Error Structure." Journal of the American Statistical Association, 68(343) (September): 626-632.

Fuller, Wayne A., and George E. Battese. 1974. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2: 67-78.

Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.

Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide 1. Plainview, New York: Econometric Software.

Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6): 1251-1271.

SAS Institute. 2004. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.

SAS Institute. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.

SPSS Inc. 2007. SPSS 16.0 Command Syntax Reference. Chicago, IL: SPSS Inc.

Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.

Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College Station, TX: Stata Press.

Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata Press.

Suits, Daniel B. 1984. "Dummy Variables: Mechanics V. Interpretation." Review of Economics & Statistics, 66(1): 177-180.

Uyar, Bulent, and Orhan Erdem. 1990. "Regression Procedures in SAS: Problems?" American Statistician, 44(4): 296-301.

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Acknowledgements

I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of the School of Public and Environmental Affairs, Indiana University at Bloomington. I am also grateful to Jeremy Albright, Dani Marinova, and Kevin Wilhite at the UITS Center for Statistical and Mathematical Computing for comments and suggestions. A special thanks to many readers around the world who have eagerly provided constructive feedback and encouraged me to keep improving this document.

Revision History

 2005.11 First draft
 2008.04.11 Corrected some errors and added Stata examples
 2009.09 Second draft (updated LSDV section and analysis output)
