Hsiao 1985

Econometric Reviews
ISSN: 0747-4938 (Print) 1532-4168 (Online) Journal homepage: http://www.tandfonline.com/loi/lecr20
Benefits and limitations of panel data
Cheng Hsiao
To cite this article: Cheng Hsiao (1985) Benefits and limitations of panel data, Econometric
Reviews, 4:1, 121-174, DOI: 10.1080/07474938508800078
To link to this article: https://doi.org/10.1080/07474938508800078
Published online: 21 Mar 2007.
ECONOMETRIC REVIEWS, 4(1), 121-174 (1985)
BENEFITS AND LIMITATIONS OF PANEL DATA
Cheng Hsiao
University of Southern California, Los Angeles 90089-0152
and
University of Toronto, Toronto M5S 1A1
Key Words and Phrases: panel data; longitudinal data; fixed
effects; random effects; linear models; nonlinear models;
dynamic models; latent variable models.
ABSTRACT
Observations for a number of cross-sectional units over time
have become increasingly available. The new data sources enable
econometricians to construct and test more complicated behavioral
models than a single cross sectional or time series data set
would allow. The availability of new data sources, however, also
raises new issues. In this paper we review some basic econo-
metric methods that have been used to analyze such data sets. We
also indicate areas of research where panel data may be useful.
Copyright © 1985 by Marcel Dekker, Inc.
1. INTRODUCTION
A panel (or longitudinal or temporal cross-sectional) data
set is one which follows the same sample of individuals over
time, and thus provides multiple observations on each individual
in the sample. Examples of these include the University of
Michigan's Panel Study of Income Dynamics (PSID) and the National
Longitudinal Surveys of Labor Market Experience (NLS). The PSID
has been collecting annual economic information from a represen-
tative national sample of about 6,000 families and 15,000 indi-
viduals since 1968. The data set contains over 5,000 variables
including employment, income, and human capital variables as well
as information on housing, travel to work, and mobility. The NLS
has followed samples of several age-sex cohorts: men 45 to 59 in
1966, young men 14 to 24 in 1966, women 30 to 44 in 1967, young
women 14 to 24 in 1968, and youth of both sexes 14 to 21 in 1979.
The first four cohorts were interviewed periodically for 15
years. The youth cohort interviews were planned to continue
annually for 6 years. The original sample size for each of the
1960s cohorts was about 5,000 individuals, and the 1979 cohort
started with over 12,000 individuals. The list of variables
collected is running into the thousands and, as M. Borus puts it,
"includes everything you always wanted to know about individuals
that the Census Bureau was not afraid to ask."
In addition to the PSID and NLS data sets, there are many
panel data sets in several Western countries, including the U.S.,
Canada, France, West Germany, and Switzerland.¹ At least two
factors have contributed to the proliferation of longitudinal
information. One is that panel data allow economists and other
social scientists to analyze, in depth, complex economic and
related issues which could not have been treated with equal rigor
using time series or cross-sectional data alone. Like
cross-sectional data, panel data describe each of a number of
individuals. Like time series data, they describe changes through
time. By blending characteristics of both cross-sectional and time
series data, more reliable research methods can be used in order
to investigate phenomena that otherwise could not have been dealt
with.
¹For the sources of U.S. (labor market) data sets, see
Ashenfelter and Solon (1982), and Borus (1982). For examples of
marketing data, see Beckwith (1972); biomedical data, see
Sheiner, Rosenberg, and Melmon (1972); financial market data
base, see Dielman, Nantell and Wright (1980).
The second factor is that the cost of developing useful
longitudinal data sets is no longer prohibitive. In some cases,
computerized matching of existing administrative records can
produce inexpensive longitudinal information, such as the Social
Security Administration's Continuous Work History Sample (CWHS).
In other cases, valuable longitudinal data bases can be generated
by computerized matching of existing administrative and survey
data, such as the PSID and the U.S. Current Population Survey.
Even in cases where the desired longitudinal information can be
collected only by initiating new surveys, such as the series of
negative income tax experiments in the U.S. and Canada, the
advance of computerized data management systems has made longi-
tudinal data development cost-effective in the last 20 years
(Ashenfelter and Solon (1982)).
The purpose of this paper is to review the major advantages
and limitations of panel data in the context of specific econo-
metric methodologies. In Section 2 we give an overview of the
major advantages of the informational content of panel data. In
Section 3 we review typical procedures for specification and
inference in various types of models. In Section 4 we use some
examples to illustrate how specific assumptions can be used to
advantage in exploiting the richness and the unique properties
of panel data. Some final remarks are in Section 5. The topics
to be discussed are obviously incomplete and the problems are
treated only in broad outlines. What we hope to provide, however,
is a summary of the main strands of thought in the literature
leading up to the recent advancements. A more complete
exposition of the research in this area and its attendant
technical details can be found in Chamberlain (1984), Hsiao (1985)
and the references cited therein.
2. BENEFITS OF PANEL DATA

Panel data usually offer a researcher a large number of data
points, hence improving the efficiency of econometric estimates.
Moreover, the use of panel data provides major benefits for
econometric estimation in at least three areas: (1) the identi-
fication of economic models and discrimination between competing
economic hypotheses; (2) the elimination or reduction of esti-
mation bias and (3) the reduction of problems of data multicol-
linearity.
2.1 Identification and Discrimination Between Competing Hypotheses
In economics, as in other branches of the social and behav-
ioral sciences, there are often competing theories. Examples of
these include the impact of collective bargaining on wages
(Freeman and Medoff (1981)), the appropriate short-term policy to
alleviate unemployment (Cripps and Tarling (1974), Phelps (1972)),
the effects of schooling on earnings (Griliches (1979)), and the
relationship between advertising expenditure and sales. Econo-
mists on opposite sides of these issues generally have very
different views concerning the operation of the economy and the
impact of institutions on economic performance. Some economists
believe unions indeed raise wages or that advertising truly
generates greater sales. Adherents of the opposite view tend to
regard the effects as more of an epiphenomenon than a substantive
force, and believe that observed differences are mainly due to
the sorting of workers or firms by their characteristics.
A single time series data set, whether aggregated or
disaggregated, does not contain information on inter-individual
differences. Hence it is not particularly useful for
discriminating between hypotheses which depend on different
social-demographic factors. Cross-sectional data, while containing
variation in micro-economic and demographic variables, cannot
be used to model dynamics. The estimated coefficients from a
single cross-section are more likely to reflect inter-individual
or inter-firm differences than intra-individual or intra-firm
dynamics, unless data on variables controlling for these differ-
ences are available and explicitly included in the chosen speci-
fication. For example, if information on worker quality is not
available, a cross-sectionally estimated coefficient for a union
status dummy in a wage equation may reflect either the impact of
trade unions or differences in worker quality.
Panel data, by providing sequential observations for a
number of individuals, often allow us to distinguish inter-
individual differences from intra-individual differences and to
construct the proper recursive structure for studying the issue
in question through a before and after effect. For instance, in
the above example even if information on worker quality is not
available, if a worker's ability stays constant or only changes
slowly over time, we can distinguish the hypothesis that unions
truly raise wages from the hypothesis that union effects are
largely illusory by comparing the wage differential of a worker
moving from a non-union to a union firm or vice versa. Under
the view that the differences between union and non-union wages
are mainly due to differences between union and non-union firms or
workers prior to unionism, or to post-union sorting, a worker's wage
should not be affected when he moves from a non-union to a union
firm or vice versa. On the other hand, if unions truly raise
wages, holding worker quality constant, the worker's wage should
rise as he moves from a non-union firm to a union firm.
In general, cross-sectional data can show what proportion of
the labour force is unemployed or describe the distribution of
wage rates at a point in time. The aggregate time series data
can provide indications of general trends and cyclical patterns
in unemployment, wages, income, and so forth. But neither
cross-sectional nor time series data can provide information regarding
how many of those unemployed in one month find employment in the
next month or whether the tendency of welfare families to stay on
welfare occurs simply because the same factors that cause them to
be on welfare keep them there or whether, in addition, the
experience of receiving welfare has some sort of addictive effect that
induces continuing welfare dependence (e.g., Chamberlain (1978),
Heckman (1978, 1981a,c), Heckman and Borjas (1980), Heckman and
Willis (1977)). Such questions of state persistence often
necessitate longitudinal tracking of the same individuals.
The distinction of "state dependence" or "population
heterogeneity" has important policy implications. An illuminating
example is provided by Ben-Porath (1973). Suppose a
cross-sectional estimate of a group of married women with identical
observed characteristics has a yearly participation rate of 50
percent. This could in one extreme imply that each woman in a
homogeneous population has a 50 percent chance of being in the
labor force in any given year, while at the other extreme it
might imply that 50 percent of the women in a heterogeneous
population always work and 50 percent never work. In the former
case each woman would be expected to spend half of her married
life in the labor force and half out of the labor force, and job
turnover would be expected to be frequent, with an average job
duration of, say, 2 years. In the latter case there is no
turnover, and current information about work status is a perfect
predictor of future work status.
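The two extremes of the participation example can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (the sample size, number of years, seed, and the helper names `participation_rate` and `persistence` are hypothetical and not taken from the paper). It shows that a single cross-section yields roughly the same 50 percent participation rate in both populations, while following the same women over time separates them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_women, n_years = 20_000, 10  # assumed, illustrative sizes

# Homogeneous population: each woman works in any given year with
# probability 0.5, independently across years.
homogeneous = rng.random((n_women, n_years)) < 0.5

# Heterogeneous population: half of the women always work, half never work.
always_works = rng.random(n_women) < 0.5
heterogeneous = np.repeat(always_works[:, None], n_years, axis=1)


def participation_rate(panel):
    """Share of women working in the first year (a single cross-section)."""
    return panel[:, 0].mean()


def persistence(panel):
    """Share of woman-years in work that are followed by another year in work."""
    working_now = panel[:, :-1]
    working_next = panel[:, 1:]
    return (working_now & working_next).sum() / working_now.sum()


for name, panel in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    print(name, round(participation_rate(panel), 3), round(persistence(panel), 3))
```

The cross-sectional rates are indistinguishable, whereas the year-to-year persistence is about one half in the homogeneous case and exactly one in the heterogeneous case, which is the information only the longitudinal dimension provides.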
2.2 Reducing Estimation Bias
A fundamental statistical problem facing every econometrician
is "the specification problem". By specification problem we
mean the selection of variables which are to be included in a
behavioral relationship as well as the manner in which these
variables are related to the variables which affect the outcome
but appear in the equation only through the error term. If the
effects of the omitted variables are correlated with the included
explanatory variables, and if these correlations are not expli-
citly allowed for, the resulting estimates will be biased.
For instance, consider a simple regression model

$$y_{it} = \mu + \beta' x_{it} + \rho' z_{it} + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{2.1}$$

where $x_{it}$ and $z_{it}$ are $k_1 \times 1$ and $k_2 \times 1$ vectors of exogenous
variables, $\mu$, $\beta$ and $\rho$ are $1 \times 1$, $k_1 \times 1$ and $k_2 \times 1$ vectors of constants,
respectively, and the error term $u_{it}$ is independently,
identically distributed over $i$ and $t$ with mean zero and variance
$\sigma_u^2$. It is well known that the least squares regression of $y_{it}$ on
$x_{it}$ and $z_{it}$ yields unbiased and consistent estimators of $\mu$, $\beta$ and
$\rho$. Now, suppose that $z_{it}$ are unobservable, and the covariances
between $x_{it}$ and $z_{it}$ are nonzero. Then the least squares regression
coefficients of $y_{it}$ on $x_{it}$ are biased. However, if repeated
observations for a group of individuals are available, they may
allow us to eliminate the effects of $z$. For example, if $z_{it} = z_i$
for all $t$ (i.e., $z$ stay constant through time for a given
individual but vary across individuals), we can take the first
difference of the individual observations over time and obtain

$$y_{it} - y_{i,t-1} = \beta'(x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1}), \qquad i = 1, \ldots, N, \quad t = 2, \ldots, T. \tag{2.2}$$

Similarly, if $z_{it} = z_t$ for all $i$ (i.e., $z$ stay constant across
individuals at a given time but exhibit variation through time),
we can take the deviation from the mean across individuals at a
given time and obtain

$$y_{it} - \bar{y}_t = \beta'(x_{it} - \bar{x}_t) + (u_{it} - \bar{u}_t), \tag{2.3}$$

where $\bar{y}_t = \sum_{i=1}^{N} y_{it}/N$, $\bar{x}_t = \sum_{i=1}^{N} x_{it}/N$, and $\bar{u}_t = \sum_{i=1}^{N} u_{it}/N$.
Least squares regression of (2.2) or (2.3) now provides
unbiased and consistent estimates of $\beta$. Nevertheless, if we only
have a single cross-sectional data set ($T = 1$) for the former case
($z_{it} = z_i$), or a single time series data set ($N = 1$) for the latter
case ($z_{it} = z_t$), such transformations cannot be performed. We
cannot get consistent estimates of $\beta$ unless there exist
instruments which are correlated with $x$ but are uncorrelated with $z$
and $u$.
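The contrast between a single cross-section and first differencing can be checked numerically. The following is a minimal sketch under assumed values: scalar $x_{it}$ and $z_i$, hypothetical coefficients `beta = 1.0` and `rho = 2.0`, and a hypothetical helper `ols_slope`. None of these numbers come from the paper. It generates data from a model of the form (2.1) with a time-invariant unobservable that is correlated with the regressor, then compares the cross-sectional least squares slope with the slope from the differenced equation (2.2).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 5_000, 2            # assumed panel dimensions
beta, rho = 1.0, 2.0       # hypothetical true coefficients

z = rng.normal(size=(N, 1))              # unobserved, time-invariant z_i
x = 0.8 * z + rng.normal(size=(N, T))    # x_it is correlated with z_i
u = rng.normal(size=(N, T))              # i.i.d. error term
y = 0.5 + beta * x + rho * z + u         # model of the form (2.1), mu = 0.5


def ols_slope(x, y):
    """Slope from a least squares regression of y on x with an intercept."""
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc * xc).sum())


# Single cross-section (first period only): z_i is omitted, so the slope is biased.
beta_cross_section = ols_slope(x[:, 0], y[:, 0])

# First differences over time: z_i drops out, as in (2.2).
beta_first_diff = ols_slope(x[:, 1] - x[:, 0], y[:, 1] - y[:, 0])

print(beta_cross_section, beta_first_diff)
```

Under these assumptions the cross-sectional slope absorbs part of the effect of the omitted $z_i$ and lies well above the true value, while the first-difference slope is close to it.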
MaCurdy's (1981) work on the life cycle labor supply of
prime-age males under certainty is an example of this approach.
Under certain simplifying assumptions, MaCurdy shows that a
worker's labor supply function may be written as (2.1) where y
is the logarithm of hours worked, x is the logarithm of the real
wage rate and z is the logarithm of the worker's (unobserved)
marginal utility of initial wealth which, as a summary measure of
a worker's lifetime wages and property income, is assumed to stay
constant through time but vary across individuals (i.e., $z_{it} = z_i$).
Given the economic problem, not only is $x_{it}$ correlated with
$z_i$, but every economic variable that could act as an instrument
for $x_{it}$ (such as education) is also correlated with $z_i$. Thus, in
general it is not possible to estimate $\beta$ consistently from a
cross-sectional data set, but if panel data are available one may
consistently estimate $\beta$ by first differencing (2.1).
2.3 Lessening the Problem of Multicollinearity
The shortage of degrees of freedom and severe multicollinearity
problems found in time series data often frustrate economists
who wish to determine the individual influences of each
explanatory variable. This problem arises because the informa-
tion provided by the sample is not "rich" enough to meet the
informational requirements of the model as specified. Given this
situation, one must either augment the sample information or
reduce the informational requirements of the model (for example,
by imposing prior restrictions on the parameters). Panel data,
because they offer many more degrees of freedom as well as
information on individual attributes, may reduce the gap between the
informational requirements of a model and the information
provided by the data. For example, in the estimation of distributed
lag models, it is common to impose prior restrictions on the lag
response coefficients in order to avoid the multicollinearity and
shortage of degrees of freedom commonly found in time
series data (Malinvaud (1970)). However, if the cross-sectional
units have identical lag-response coefficients, then we may use
cross-sectional information to estimate an unconstrained distrib-
uted lag model (Pakes and Griliches (1984)).
3. SPECIFICATION AND INFERENCE

3.1 A General Remark
While panel data allow us to construct and test more realistic
behavioral models which could not be identified using a
cross-section or time series data alone, the availability of new
data sources also raises new issues. New methods are constantly
being introduced and points of view are changing. In this sec-
tion we review some basic econometric methodologies that have
been used to study the individual entities within a pooled data
base and to draw generalized inferences about the population.
Suppose we have sample observations on certain characteristics
of $N$ individuals over $T$ time periods denoted by $y_{it}$,
$x_{kit}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$, $k = 1, \ldots, K$. Conventionally,
observations on $y_{it}$ are assumed to be the random outcome of some
experiment with a probability distribution conditional on vectors
of the characteristics $x_{it} = (x_{1it}, \ldots, x_{Kit})'$ and parameters
$\theta$ and $\phi$, where $\theta$ denotes the $m \times 1$ vector of parameters of primary
interest, and $\phi$ denotes the $n \times 1$ vector of nuisance parameters.
The nuisance parameters $\phi$ may or may not vary with $i$ and/or $t$.
The primary parameters $\theta$, which will also be referred to as
structural parameters, are assumed common for all $i$ and $t$.
One of the goals of panel data analysis is to use all available
information to make inferences on $\theta$.
To utilize the information on both the intertemporal dynam-
ics and the individualities found in a panel data set, it is
convenient to classify an economic variable into one of three
types: individual time-invariant, period individual-invariant,
and individual-time varying variables. The individual time-
invariant variables are variables that are the same for a given
cross-sectional unit through time but which vary across cross-
sectional units. Examples of these are attributes of individual
firm management, ability, sex, and socio-economic background
variables. The period individual-invariant variables are variables
that are the same for all cross-sectional units at a given
point in time but which vary through time. Examples of these are
prices, interest rates and widespread optimism or pessimism. The
individual-time varying variables are variables that vary across
cross-sectional units at a point in time and also exhibit varia-
tions through time. Examples of these variables are firm prof-
its, sales, and capital stock.
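The three-way classification can be stated operationally for a variable stored as an $N \times T$ array. The sketch below is illustrative only: the function name `classify` and the small example arrays are hypothetical, and the labels simply reuse the terminology of the text.

```python
import numpy as np


def classify(values):
    """Classify an N x T array (rows: cross-sectional units, columns: periods)."""
    values = np.asarray(values, dtype=float)
    # Compare every column with the first period, and every row with the first unit.
    varies_over_time = not np.allclose(values, values[:, :1])
    varies_across_units = not np.allclose(values, values[:1, :])
    if varies_over_time and varies_across_units:
        return "individual-time varying"
    if varies_across_units:
        return "individual time-invariant"
    if varies_over_time:
        return "period individual-invariant"
    return "constant"


# Hypothetical data for two units observed over three periods.
ability = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]            # fixed for each unit
interest_rate = [[0.05, 0.06, 0.07], [0.05, 0.06, 0.07]]  # common to all units
sales = [[10.0, 12.0, 11.0], [20.0, 19.0, 25.0]]          # varies both ways

for name, var in [("ability", ability), ("interest_rate", interest_rate), ("sales", sales)]:
    print(name, "->", classify(var))
```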
Since an econometric model is a simplification of a real
world phenomenon, there are large numbers of factors which affect
the outcome but which are not explicitly included as explanatory
variables. It is natural to assume that the effects of all the
omitted variables are also driven by these three types of
variables. For ease of exposition, we shall assume that the effects
of period individual-invariant variables can be amalgamated with
the effects of the individual-time varying variables, so that the
effects of omitted variables, $v_{it}$, are represented by

$$v_{it} = \alpha_i + u_{it}, \tag{3.1}$$

where $\alpha_i$ represents the individual specific effects and $u_{it}$
represents the individual-period varying effects. Typically, $\alpha_i$
and $u_{it}$ are assumed to be independent of each other.
When $\alpha_i$ are treated as fixed constants, the model is
referred to as a fixed effect model. When $\alpha_i$ are treated as random
variables, it is called a random-effect model. To unify these
two formulations we may assume from the outset that the effects
are random. The fixed effect model is viewed as one where
investigators make inferences conditional on the effects that are in
the sample. The random effect model is viewed as one where
investigators make unconditional or marginal inference with
respect to the population of all effects. There is really no
distinction in the "nature of the effect". It is up to the
investigator to decide whether he wants to make inference with
respect to the population characteristics or only with respect to
the effects that are in the sample.
In general, whether one wishes to consider the conditional
likelihood function or the marginal likelihood function depends
on the context of data, the manner in which they were gathered
and the environment from which they came. For instance, consider
an example where several technicians care for machines. The
effects of technicians may be assumed random. But if the
situation is not that each technician comes and goes, randomly sampled
from all employees, but that all are available, and if we want to
assess differences between these specific technicians, then the
fixed effect model is more appropriate. Similarly, if an experi-
ment involves hundreds of individuals that are considered a
random sample from some larger population, random effects are
more appropriate. But if the situation is one of analyzing just
a few individuals, say five or six, where the sole interest lies
in just these individuals, then individual effects would more
appropriately be fixed and not random. The situation to which a
model applies and the inferences based on it are the deciding
factors in determining whether we should treat effects as random
or fixed. When individual units in the sample are of interest,
the effects are more appropriately considered fixed. When infer-
ences will be made about the characteristics of a population from
which those in the data are considered to be a random sample,
then the effects should be considered random.²
Closely related to the question of fixed effect or random
effect inference is the impact of nonorthogonality between the
effects $\alpha_i$ and the included explanatory variables, $x_{it}$, on the
appropriateness of either approach. There are arguments which
suggest that the individual effects and the explanatory variables
are correlated (Mundlak (1961, 1978)). For example, in a panel
of farms observed over several years, suppose that $y_{it}$ is a
measure of the output of the i-th farm in the t-th season, $x_{it}$
are measured inputs, $\alpha_i$ represents the input reflecting soil
quality and other characteristics of the farm's location which
are known to the farmer but unknown to the investigating
econometricians, and $u_{it}$ reflects unmeasured inputs which are not
under the farmer's control (e.g., rainfall). The factor input
decisions $x_{it}$ are generally made before knowing $u_{it}$ but are
conditional on $\alpha_i$. Treating $\alpha_i$ as an unknown parameter is
equivalent to adding a time invariant variable to the set of
explanatory variables $x_{it}$. Therefore, using a fixed effect model can
eliminate the bias arising from the correlation between the
unobserved time-invariant effects and the included explanatory
variables. On the other hand, a random effect inference ignoring
the correlation between the effects and explanatory variables can
lead to biased estimation. However, this is a consequence of
misspecification rather than of the inappropriateness of the
random effect inference.
²In this sense, if N becomes large, one would not be interested
in the specific effects of each individual but rather in the
characteristics of the population. A random effect framework
would be more appropriate.
To gain some intuition about the role of independence
assumptions between the effects and the included explanatory
variables within a conditional and unconditional inference frame-
work, let us consider the following two experiments. Suppose a
population is made up of a certain composition of red and black
balls. The first experiment consists of N individuals, each
picking a fixed number of balls randomly from this population to
form his person specific urn. Each individual then makes T
independent trials of drawing a ball from his specific urn and
putting it back. The second experiment assumes that individuals
have different preferences for the composition of red and black
balls for their specific urns and allows these personal attri-
butes to affect the compositions of these urns. Specifically,
prior to making T independent trials with replacement from their
respective urns, individuals are allowed to take any number of
balls from the population until their composition reaches a
desired proportion.
If the researcher's interest lies in making inferences on an
individual urn's compositions of red and black balls, a fixed
effect model should be used whether the sample comes from the
first or the second experiment. If his interest is in the
population composition, a marginal or unconditional inference should
be used. However, in the first experiment, differences in
individual urns are outcomes of random sampling. The subscript i
is purely a labelling device with no substantive content. A
conventional random effect model assuming the independence
between $\alpha_i$ and $x_{it}$ would be appropriate. In the second experiment
the differences in individual urns reflect differences in per-
sonal attributes. A proper marginal inference has to allow for
these nonrandom influences.
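Formally, let $u_{it}$ and $\alpha_i$ be independent normal processes and
be mutually independent. In the case of the first experiment, $\alpha_i$
are independently distributed and independent of individual
attributes, $x_i = (x_{i1}', \ldots, x_{iT}')'$, so the distribution of $\alpha_i$ must be
expressible as random sampling from a univariate distribution
with mean $\mu$ and variance $\sigma_\alpha^2$ (Box and Tiao (1968), Chamberlain
(1980)). Thus, the conditional distribution of $\{(u_i + e\alpha_i)', \alpha_i \mid x_i\}$
is identical to the marginal distribution of $\{(u_i + e\alpha_i)', \alpha_i\}$,

$$\begin{pmatrix} u_i + e\alpha_i \\ \alpha_i \end{pmatrix} \sim N\left( \begin{pmatrix} e\mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma_u^2 I_T + \sigma_\alpha^2 ee' & \sigma_\alpha^2 e \\ \sigma_\alpha^2 e' & \sigma_\alpha^2 \end{pmatrix} \right), \tag{3.2}$$

where $u_i' = (u_{i1}, \ldots, u_{iT})$ and $e$ is a $T \times 1$ vector of ones.
In the second experiment, $\alpha_i$ is correlated with the
variables generating the process. Suppose that conditional on $x_i$,
$E(\alpha_i \mid x_i) = \alpha_i^* = a'x_i$ and $\mathrm{var}(\alpha_i \mid x_i) = \sigma_\omega^2$. Then the conditional
distribution of $\{(u_i + e\alpha_i)', \alpha_i \mid x_i\}$ is

$$\begin{pmatrix} u_i + e\alpha_i \\ \alpha_i \end{pmatrix} \sim N\left( \begin{pmatrix} e\alpha_i^* \\ \alpha_i^* \end{pmatrix}, \begin{pmatrix} \sigma_u^2 I_T + \sigma_\omega^2 ee' & \sigma_\omega^2 e \\ \sigma_\omega^2 e' & \sigma_\omega^2 \end{pmatrix} \right). \tag{3.3}$$

In both cases, the conditional density of $u_i + e\alpha_i$ given $\alpha_i$ is

$$(2\pi\sigma_u^2)^{-T/2} \exp\left\{ -\frac{1}{2\sigma_u^2}\, u_i'u_i \right\}. \tag{3.4}$$

But the marginal densities of $u_i + e\alpha_i$ given $x_i$ are different
((3.2) and (3.3), respectively). Under the independence
assumption $\{u_i + e\alpha_i \mid x_i\}$ has common mean $e\mu$ for $i = 1, \ldots, N$. Under
the assumption that $\alpha_i$ and $x_i$ are correlated, $\{u_i + e\alpha_i \mid x_i\}$ has a
different mean $e\alpha_i^*$ for each different $i$. It is the mistaken use
of (3.2) as the density of $(u_i + e\alpha_i \mid x_i)$ which creates the bias
in the estimates. It is not that in making inferences about the
characteristics of a population we should assume a fixed
effects model.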
Although the unconditional or random effect inference in
general will be more efficient than the conditional or fixed
effect inference, the conditional density (3.4) demonstrates that
the fixed effect model does not suffer from the bias due to the
omission of the relevant individual attributes. Furthermore, if
we condition on $\alpha_i$, there is no need to postulate the
distribution of $\alpha_i$. Thus, the fixed effect model has assumed paramount
importance in empirical studies (e.g., Ashenfelter (1978),
Hausman (1978), and Kiefer (1979)).
However, a typical panel contains a large number of
cross-sectional units and only a small number of time series
observations. If we treat $\alpha_i$ as parameters to be estimated, there is
not enough information available to determine $\alpha_i$. Namely, we
have the classical incidental parameters problem in which the
number of parameters increases with the number of observations
(Neyman and Scott (1948)). Whether the inconsistency in
estimating the fixed effects will give rise to inconsistency for
estimators of the structural parameters of interest depends on whether
the estimators of $\theta$ satisfy the Neyman-Scott principle. That is,
it depends on whether there exist functions of the observables,

$$\Psi_{Nj}(x_1, \ldots, x_N \mid \theta), \qquad j = 1, \ldots, m,$$

which are independent of the incidental parameters such that when
$\theta$ are the true values, $\Psi_{Nj}(x_1, \ldots, x_N \mid \theta)$ converge to zero in
probability as $N$ tends to infinity. If such functions exist,
then an estimator $\hat{\theta}$ derived by solving $\Psi_{Nj}(x_1, \ldots, x_N \mid \hat{\theta}) = 0$,
$j = 1, \ldots, m$, is consistent under suitable regularity conditions.
However, whether such functions exist depends on the type of
model being analyzed.
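A standard textbook illustration of the incidental parameters problem, used here as an assumption rather than as an example taken from the paper, is the model $y_{it} = \alpha_i + u_{it}$ with $u_{it}$ normal with variance $\sigma^2$ and $T$ fixed. The joint maximum likelihood estimator of $\sigma^2$ converges to $\sigma^2 (T-1)/T$ rather than $\sigma^2$ as $N$ grows, while an estimator built from within-unit deviations with a degrees-of-freedom correction does not depend on the $\alpha_i$ and remains consistent. The sketch below uses assumed values for `N`, `T`, and `sigma2`.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, sigma2 = 50_000, 2, 1.0   # assumed: many units, very short panel

alpha = rng.normal(scale=3.0, size=(N, 1))   # one incidental parameter per unit
y = alpha + rng.normal(scale=np.sqrt(sigma2), size=(N, T))

# Each alpha_i is estimated by the unit mean, which uses only T observations.
resid = y - y.mean(axis=1, keepdims=True)

# Joint MLE of sigma^2: divides by the total number of observations N*T.
sigma2_mle = (resid ** 2).sum() / (N * T)

# Corrected estimator: divides by N*(T-1); it is free of the alpha_i.
sigma2_corrected = (resid ** 2).sum() / (N * (T - 1))

print(sigma2_mle, sigma2_corrected)
```

With $T = 2$ the joint MLE settles near one half of the true variance no matter how large $N$ is, which is the sense in which more cross-sectional units alone do not resolve the problem.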
If the effects are treated as random, it is typical to
assume that the effects possess a probability density function
characterized by a finite number of parameters. Hence, there is
no incidental parameter problem. However, the random effect
specification circumvents the incidental parameter problem only by
making specific distributional assumptions. How restrictive such
assumptions need to be again depends on the type of model
being investigated. In the following subsections we shall review
the advantages and disadvantages of the fixed effect and random
effect approaches in eliminating the bias of the structural
parameter estimators for different types of models in the
context of the restrictiveness of the assumptions and efficiency
of the estimates.
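3.2 Linear Models
Consider the model

$$y_{it} = \beta' x_{it} + \alpha_i + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{3.6}$$

where $u_{it}$ are assumed to be independent of $x_{it}$. If $\alpha_i$ are
treated as fixed constants, the least squares dummy variable
(LSDV) estimators of $\beta$ and $\alpha_i$ are

$$\hat{\beta}_{CV} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) \right], \tag{3.7}$$

$$\hat{\alpha}_i = \bar{y}_i - \hat{\beta}_{CV}' \bar{x}_i, \qquad i = 1, \ldots, N, \tag{3.8}$$

where $\bar{x}_i = \sum_{t=1}^{T} x_{it}/T$ and $\bar{y}_i = \sum_{t=1}^{T} y_{it}/T$.
The LSDV estimator of $\beta$ is also called the within group or
covariance estimator because it is the least squares (LS)
estimator of the transformed model

$$y_{it} - \bar{y}_i = \beta'(x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i), \tag{3.9}$$

where $\bar{u}_i = \sum_{t=1}^{T} u_{it}/T$. The transformation of the data attained
by subtracting from each observation the time series mean for the
corresponding cross-sectional unit is called the covariance
transformation. Such a transformation eliminates the need to
include dummy variables to handle the individual effects in the
matrix of explanatory variables. Hence, we only need to invert a
matrix of order $K \times K$, not a matrix of order $(N+K) \times (N+K)$.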
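The equivalence between the dummy variable regression and the covariance transformation can be verified numerically. The sketch below is illustrative and rests on assumed values: the dimensions, coefficients, seed, and variable names are hypothetical. It fits the model once with $N$ individual dummies, an $(N+K)$-column design, and once on data demeaned by unit, which requires only a $K \times K$ system.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 30, 4, 2                    # assumed panel dimensions
beta = np.array([1.0, -0.5])          # hypothetical true slopes

alpha = rng.normal(size=N)
# Regressors are deliberately correlated with the individual effects.
x = rng.normal(size=(N, T, K)) + alpha[:, None, None]
y = x @ beta + alpha[:, None] + rng.normal(scale=0.1, size=(N, T))

# (a) LSDV: regress y on x and N individual dummies (no common intercept).
dummies = np.kron(np.eye(N), np.ones((T, 1)))          # (N*T) x N
design = np.hstack([x.reshape(N * T, K), dummies])     # (N*T) x (K+N)
coef, *_ = np.linalg.lstsq(design, y.reshape(N * T), rcond=None)
beta_lsdv, alpha_lsdv = coef[:K], coef[K:]

# (b) Covariance transformation: subtract unit means, then solve a K x K system.
xd = (x - x.mean(axis=1, keepdims=True)).reshape(N * T, K)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
beta_within = np.linalg.solve(xd.T @ xd, xd.T @ yd)
alpha_within = y.mean(axis=1) - x.mean(axis=1) @ beta_within

print(np.allclose(beta_lsdv, beta_within), np.allclose(alpha_lsdv, alpha_within))
```

Both routes give the same slope and intercept estimates up to floating point error; the second avoids building and factoring the large dummy variable design.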
The estimator (3.7) is independent of the incidental
parameters $\alpha_i$ and is consistent when either $N$ or $T$ or both tend to
infinity. The estimator $\hat{\alpha}_i$ (3.8) is consistent only when $T$
tends to infinity. The LSDV estimator is the best linear
unbiased estimator (BLUE) if $u_{it}$ is independently, identically
distributed (i.i.d.) with mean zero and variance $\sigma_u^2$.
Furthermore, if $u_{it}$ is normally distributed, it is also the MLE.
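If $\alpha_i$ are treated as random, the residuals $v_{it}$ and $v_{is}$
become serially correlated since both contain $\alpha_i$. Although we
can still apply the covariance transformation to eliminate the
individual period-invariant effect and the LS regression for the
transformed model (3.9) remains unbiased and consistent, it is no
longer the BLUE. The BLUE is the generalized least squares (GLS)
estimator.
When $\alpha_i$ are independent of $x_{it}$, and $u_{it}$ are i.i.d., the GLS
of $\beta$ is the weighted average of the between and within group
estimators (Maddala (1971)),

$$\hat{\beta}_{GLS} = \Delta \hat{\beta}_b + (I_K - \Delta)\hat{\beta}_{CV}, \tag{3.10}$$

where $\hat{\beta}_{CV}$ is the within group estimator (3.7), $\bar{x}$ and $\bar{y}$ denote overall means, and

$$\Delta = \psi T \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + \psi T \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right]^{-1} \left[ \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right],$$

$$\hat{\beta}_b = \left[ \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right]^{-1} \left[ \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}) \right], \qquad \psi = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}. \tag{3.11}$$
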
If $\psi \to 1$, $\hat{\beta}_{GLS}$ converges to the OLS. If $\psi \to 0$, the GLS of $\beta$
becomes the within estimator (LSDV). In essence, $\psi$ measures the
weight given to the between group variation. In the LSDV
procedure (or fixed effect model), this source of variation is
completely ignored. The OLS procedure corresponds to $\psi = 1$. The
between group and within group variations are just added up.
Thus, one may view the OLS and LSDV as somewhat all-or-nothing
ways of utilizing the between group variation. The procedure of
treating $\alpha_i$ as random provides a solution intermediate to
treating them all as different and treating them all as equal.
However, when $T$ tends to infinity, the GLS converges to the LSDV.
This is because when $T$ goes to infinity, we have an infinite
number of observations for each $i$. Therefore, we can consider
each $\alpha_i$ as a random variable which has been drawn once and
forever so that for each $i$ we can pretend that they are just like
fixed parameters.
Computationally, the GLS estimator is fairly simple to
implement. We only need to transform the data by subtracting a
fraction $(1-\psi^{1/2})$ of individual means $\bar{y}_i$ and $\bar{x}_i$
from their corresponding $y_{it}$ and $x_{it}$ values, then regress
$[y_{it} - (1-\psi^{1/2})\bar{y}_i]$ on $[x_{it} - (1-\psi^{1/2})\bar{x}_i]$.
When $\sigma_\alpha^2$ and $\sigma_u^2$ are unknown, we may substitute their
consistent estimates for them. The two step GLS estimator is
asymptotically equivalent to the GLS estimator.
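The partial-demeaning recipe can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a balanced panel, known variance components, an intercept in the model, and $\psi = \sigma_u^2/(\sigma_u^2 + T\sigma_\alpha^2)$; the function names and simulated values are illustrative only. It compares OLS on the transformed data with GLS computed directly from the T x T error covariance matrix $\sigma_u^2 I_T + \sigma_\alpha^2 ee'$.

```python
import numpy as np

def gls_by_quasi_demeaning(y, x, sigma_u2, sigma_a2):
    """GLS via OLS on data with a fraction of the unit means removed."""
    n_units, n_periods, n_regs = x.shape
    psi = sigma_u2 / (sigma_u2 + n_periods * sigma_a2)
    frac = 1.0 - np.sqrt(psi)
    y_star = y - frac * y.mean(axis=1, keepdims=True)
    x_star = x - frac * x.mean(axis=1, keepdims=True)
    # The intercept column is transformed too: 1 - frac = sqrt(psi).
    const = np.full((n_units * n_periods, 1), 1.0 - frac)
    design = np.hstack([const, x_star.reshape(-1, n_regs)])
    return np.linalg.lstsq(design, y_star.reshape(-1), rcond=None)[0]

def gls_by_stacked_formula(y, x, sigma_u2, sigma_a2):
    """GLS from (sum X_i' V^-1 X_i)^-1 (sum X_i' V^-1 y_i)."""
    n_units, n_periods, _ = x.shape
    v = sigma_u2 * np.eye(n_periods) + sigma_a2 * np.ones((n_periods, n_periods))
    v_inv = np.linalg.inv(v)
    big_x = np.concatenate([np.ones((n_units, n_periods, 1)), x], axis=2)
    xtx = np.einsum("itk,ts,isl->kl", big_x, v_inv, big_x)
    xty = np.einsum("itk,ts,is->k", big_x, v_inv, y)
    return np.linalg.solve(xtx, xty)

# Simulated random effects panel (hypothetical values).
rng = np.random.default_rng(1)
N, T, K = 40, 5, 2
sigma_u2, sigma_a2 = 1.0, 2.0
x = rng.normal(size=(N, T, K))
alpha = rng.normal(scale=np.sqrt(sigma_a2), size=(N, 1))
y = 0.5 + x @ np.array([1.0, -2.0]) + alpha + rng.normal(size=(N, T))

gls_quasi = gls_by_quasi_demeaning(y, x, sigma_u2, sigma_a2)
gls_direct = gls_by_stacked_formula(y, x, sigma_u2, sigma_a2)
print(gls_quasi, gls_direct)
```

In practice the variance components would be replaced by consistent estimates, giving the two step estimator described in the text; the sketch keeps them fixed so that the two computations can be compared exactly.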
When $\alpha_i$ are correlated with $x_{it}$, the random effect inference
invokes a joint distribution of $(\alpha_i, x_i')$. For linear regression
models it is sufficient to assume that
$$E^*(\alpha_i | x_i) = \mu + a'x_i \tag{3.12}$$
holds, where $E^*(\alpha_i | x_i)$ refers to the minimum mean square error
linear predictor (or projection) of $\alpha_i$ on $x_i$.³ Eq. (3.12) in
general will hold if the distribution of $(\alpha_i, x_i')$ does not depend
on i and the variances are finite.
Substituting the identity relation
$$\alpha_i = E^*(\alpha_i | x_i) + \omega_i \tag{3.13}$$
into (3.6), we can apply the GLS method to estimate $(\mu, \beta', a')$.
When (3.12) can be further approximated by
$$E^*(\alpha_i | x_i) = \mu + a^{*\prime}\bar{x}_i, \quad i = 1, \ldots, N, \tag{3.14}$$
where $a^*$ is a $K \times 1$ vector of constants, and $\omega_i$ is i.i.d., the
GLS estimator of $\beta$ under the assumption that $u_{it}$ are i.i.d.
happens to be identical to the fixed effect estimator of $\beta$ (3.7).
However it should be noted that this result is very special, and is
probably only true for linear models.

³If $E(\alpha_i | x_i)$ is linear, then $E^*(\alpha_i | x_i) = E(\alpha_i | x_i)$.
The application of the GLS estimator requires the knowledge
of the variance-covariance matrix of the residual. The addition
of cross-section dimension to the time series dimension provides
the possibility of exploring the (unknown) structure of the
residual variance-covariance matrix. To avoid imposing prior
restrictions on the variance-covariance matrix, Chamberlain
(1982, 1984) proposes to treat each period as an equation in a
multivariate setup so that arbitrary serial correlation and
heteroscedasticity patterns in the error process can be allowed
(also see Kiefer (1980) and MaCurdy (1982)).
Specifically, when T is fixed and N tends to infinity,
we can stack the T time period observations of the i-th
individual's characteristics into a vector $(y_i', x_i')$.
We assume that $(y_i', x_i')$ is an independent draw from a common
(unknown) multivariate distribution function with finite
fourth-order moments and with $E x_i x_i' = \Sigma_{xx}$ positive definite.
Then each individual observation vector corresponds to a T-variate
regression,
$$y_i = e\alpha_i + (I_T \otimes \beta')x_i + u_i, \quad i = 1, \ldots, N, \tag{3.15}$$
where $e$ is a $T \times 1$ vector of ones and $\otimes$ denotes the
Kronecker product.

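To allow for the possible correlation between $\alpha_i$ and $x_i$, we
assume that (3.13) holds. Then
$$E^*(y_i | x_i) = e\mu + \Pi x_i, \tag{3.16}$$
where
$$\Pi = I_T \otimes \beta' + ea'. \tag{3.17}$$
The equations (3.15) and (3.16) can be rewritten as
$$\tilde{y}_i = [I_T \otimes \tilde{x}_i']\pi + \nu_i, \tag{3.18}$$
where $\tilde{y}_i = y_i - Ey_i$, $\tilde{x}_i = x_i - Ex_i$, $\nu_i$ is the
projection residual, and $\pi' = \mathrm{vec}(\Pi')' = [\pi_1', \ldots, \pi_T']$
is a $1 \times KT^2$ vector with $\pi_t'$ denoting the t-th row of $\Pi$.
We can obtain the LS estimates of $\pi$ by regressing $(y_i - \bar{y})$
on $[I_T \otimes (x_i - \bar{x})']$. Using a version of the central limit
theorem we can show that $\sqrt{N}(\hat{\pi} - \pi)$ is asymptotically
normally distributed with mean zero and variance covariance matrix
$\Omega$, where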
and
A consistent estimator of $\Omega$ is readily available from the
corresponding sample moments,
where
If the conditional variance-covariance matrix is homoscedastic,
namely, $\mathrm{Var}(y_i | x_i) = V$ does not depend on $x_i$, (3.20) will
converge to $V \otimes \Sigma_{xx}^{-1}$.
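Equation (3.17) implies that $\Pi$ is subject to restrictions.
Let $\theta' = (\beta', a')$. We specify the restrictions on $\Pi$ ((3.17))
by the conditions that
$$\pi = f(\theta). \tag{3.21}$$
We can impose these restrictions by using a minimum distance
estimator. Namely, choose $\hat{\theta}$ to minimize
$$[\hat{\pi} - f(\theta)]'\hat{\Omega}^{-1}[\hat{\pi} - f(\theta)]. \tag{3.22}$$
Under the assumption that $f$ possesses continuous second
partial derivatives and the matrix of the first partial derivatives
$$F = \frac{\partial f}{\partial \theta'} \tag{3.23}$$
has full column rank in an open neighborhood containing the true
parameter $\theta$, the minimum distance estimator of (3.22), $\hat{\theta}$,
is consistent and $\sqrt{N}(\hat{\theta} - \theta)$ is asymptotically normally
distributed with mean zero and variance-covariance matrix
$$(F'\Omega^{-1}F)^{-1}. \tag{3.24}$$
The quadratic form
$$N[\hat{\pi} - f(\hat{\theta})]'\hat{\Omega}^{-1}[\hat{\pi} - f(\hat{\theta})] \tag{3.25}$$
converges to a chi-square distribution with $KT^2 - K(1+T)$ degrees of
freedom.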
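To make the counting of restrictions concrete, here is a small Python/NumPy sketch. It assumes the restriction on the T x KT matrix $\Pi$ takes the form $\Pi = I_T \otimes \beta' + ea'$, in which case $\pi = \mathrm{vec}(\Pi')$ is linear in $\theta = (\beta', a')'$ and the minimum distance problem has a closed-form solution. The helper names, the identity weighting matrix and the numerical values are illustrative assumptions; an efficient implementation would use a consistent estimate of $\Omega^{-1}$ as the weight and an estimated $\hat{\pi}$ rather than the exact $\pi$ used here.

```python
import numpy as np

def restriction_matrix(n_periods, n_regs):
    """Matrix G with vec(Pi') = G @ theta, theta = (beta', a')'."""
    T, K = n_periods, n_regs
    # Row t of Pi is (0, ..., beta', ..., 0) + a', so pi_t = E_t beta + a,
    # where E_t places I_K in the t-th block of a KT x K matrix.
    beta_part = np.vstack(
        [np.kron(np.eye(T)[:, [t]], np.eye(K)) for t in range(T)]
    )
    a_part = np.tile(np.eye(K * T), (T, 1))
    return np.hstack([beta_part, a_part])

def minimum_distance(pi_hat, G, weight):
    """Minimise (pi_hat - G theta)' weight (pi_hat - G theta) over theta."""
    return np.linalg.solve(G.T @ weight @ G, G.T @ weight @ pi_hat)

T, K = 3, 2
beta = np.array([1.0, -0.5])                      # hypothetical slopes
a = np.arange(1.0, K * T + 1.0) / 10.0            # hypothetical projection coefficients
Pi = np.kron(np.eye(T), beta[None, :]) + np.ones((T, 1)) @ a[None, :]
pi = Pi.reshape(-1)                               # rows of Pi stacked, i.e. vec(Pi')

G = restriction_matrix(T, K)
theta_hat = minimum_distance(pi, G, np.eye(pi.size))
overidentifying = pi.size - G.shape[1]            # K*T**2 - K*(1 + T)
print(theta_hat, overidentifying)
```

With T = 3 and K = 2 there are 18 unrestricted coefficients in $\pi$ and 8 free parameters in $\theta$, leaving 10 overidentifying restrictions, which is the degrees-of-freedom count quoted for the chi-square statistic.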
The advantage of the multivariate setup is that we only need
to assume that the T period observations of the characteristics
of the i-th individual are independently distributed across
cross-sectional units with finite fourth order moments. We do
not need to make specific assumptions about the error process.
Nor do we need to assume that $E(\alpha_i | x_i)$ is linear.⁴ The
Chamberlain minimum distance estimator is efficient within the class
of estimators which do not impose a priori restrictions on the
variance-covariance matrix of the T period error term. It also
allows us to test the specific structure of the error process.
However, if $u_{it}$ and $\omega_i$ conditional on $x_i$ are i.i.d. and are
mutually independent, the minimum distance estimator ignoring the
specific structure of the error process is not as efficient as
the GLS. Moreover, the computation of the minimum distance
estimator can be quite tedious while the two-step GLS estimator
is fairly easy to implement.
The conclusions for the single equation linear models also
hold for the linear simultaneous equation models. The fixed
effect linear simultaneous equation estimators for the structural
parameters again satisfy the Neyman-Scott principle (e.g.,
Schmidt (1984)); hence they are consistent as either N or T
or both tend to infinity. The conventional random effect
formulation leads to the single equation analog of the
error-components 2SLS or 3SLS estimators (Baltagi (1981)). If the
effects are correlated with the exogenous variables or if the
structure of the variance-covariance matrix of error terms is
unknown, we can generalize the Chamberlain minimum distance
procedure to estimate the structural parameters (for details, see
Hsiao (1985, ch. 5)).

⁴If $E(\alpha_i | x_i) \neq E^*(\alpha_i | x_i)$, in general there will be
heteroscedasticity because the residual will contain
$E(\alpha_i | x_i) - E^*(\alpha_i | x_i)$.
3.3 Dynamic Models
Two factors preclude the straightforward application of
estimation methods for linear static models to models containing
lagged dependent variables of the form (Balestra and Nerlove
(1966)),
$$y_{it} = \gamma y_{i,t-1} + \beta'x_{it} + \alpha_i + u_{it}. \tag{3.26}$$
One is the correlation between the time persistent errors and the
lagged dependent variables. The other is the problem of initial
observations. The issue of correlation between the residuals and
lagged dependent variables is not affected by the size of the
time series observations, T, while the initial value problem
arises only when T is small. When T is large, the weight of
the initial observation in the likelihood function becomes negli-
gible and it is appropriate to ignore this issue.
If $\alpha_i$ are treated as fixed constants, the LSDV estimators
for $\gamma$ and $\beta$ are biased when T is fixed.⁵ The bias is caused by
having to eliminate the unknown individual constant (or effects)
$\alpha_i$ from each observation, which creates the correlation of order
1/T between $(y_{i,t-1} - \bar{y}_{i,-1})$ and the residuals in the
transformed model,
$$y_{it} - \bar{y}_i = \gamma(y_{i,t-1} - \bar{y}_{i,-1}) + \beta'(x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i), \tag{3.27}$$
where $\bar{y}_{i,-1} = \sum_{t=1}^{T} y_{i,t-1}/T$, assuming $y_{i0}$ are observable.
When T is very large, the right-hand side variables become
asymptotically uncorrelated with the residuals and the LSDV
estimators are consistent. For small T, the bias for $\gamma$ is negative
and the bias for $\beta$ is positive if $\gamma > 0$. The bias does not go to
zero as $\gamma$ approaches zero (Anderson and Hsiao (1981), Nickell (1981)).

⁵In the case where there are no exogenous variables, as $N \to \infty$,
$\hat{\gamma} - \gamma$ converges to
$$-\frac{1+\gamma}{T-1}\left(1 - \frac{1}{T}\frac{1-\gamma^T}{1-\gamma}\right)\left\{1 - \frac{2\gamma}{(1-\gamma)(T-1)}\left[1 - \frac{1-\gamma^T}{T(1-\gamma)}\right]\right\}^{-1}.$$
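The size of this bias can be tabulated from the large-N expression usually attributed to Nickell (1981) for the model without exogenous regressors, namely $-\frac{1+\gamma}{T-1}(1-s)\{1 - \frac{2\gamma}{(1-\gamma)(T-1)}(1-s)\}^{-1}$ with $s = (1-\gamma^T)/[T(1-\gamma)]$. The short Python sketch below simply evaluates that expression; the function name is hypothetical, and the formula is taken as an assumption of the example rather than derived here.

```python
def nickell_bias(gamma, n_periods):
    """Large-N limit of (gamma_hat - gamma) for LSDV, no exogenous regressors.

    Assumes |gamma| < 1 and n_periods >= 2.
    """
    T = n_periods
    s = (1.0 - gamma ** T) / (T * (1.0 - gamma))
    numerator = -(1.0 + gamma) / (T - 1.0) * (1.0 - s)
    denominator = 1.0 - 2.0 * gamma / ((1.0 - gamma) * (T - 1.0)) * (1.0 - s)
    return numerator / denominator

for T in (2, 5, 10, 50):
    print(T, round(nickell_bias(0.0, T), 4), round(nickell_bias(0.5, T), 4))
```

At $\gamma = 0$ the expression reduces to $-1/T$, so the bias is present even when there is no true state dependence, and it shrinks only as T grows.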
When $\alpha_i$ are treated as random, the amalgamated error terms
$v_{it} = \alpha_i + u_{it}$ are serially correlated and are correlated with
$y_{i,t-1}$. Hence the LS estimator is biased. The LS estimates of $\gamma$
and $\beta$ are biased upward and biased toward zero, respectively
(Nerlove (1971), Trognon (1978)).
To implement the maximum likelihood procedure, we need to
initialize a time dependent stochastic process. For fixed T
the assumption about the initial observations directly affects
the interpretation of a random effect model and the consistency
of the MLE (Anderson and Hsiao (1981, 1982)).
If $y_{i0}$ are assumed to be fixed constants, the MLE is
consistent when N tends to infinity. However, fixed $y_{i0}$ and (3.26)
together imply that $\mathrm{Cov}(y_{i0}, \alpha_i) = 0$ and $\mathrm{Cov}(y_{it}, \alpha_i) \neq 0$
for t = 1, ..., T. In other words, the model permits a cross-sectional
unit to start from some arbitrary position and the individual
effects, $\alpha_i$, are not brought into the model in time 0 but affect
the process in time 1 and later. If the process has been in
operation prior to the time it is sampled, it does not seem
reasonable to view $y_{i0}$ as different from any other $y_{it}$,
t = 1, ..., T. Indeed, if the true $\mathrm{Cov}(y_{i0}, \alpha_i) \neq 0$, maximizing
the likelihood function L under the assumption of fixed initial
observations will not yield consistent estimators for $\gamma$ and $\beta$
if T is fixed, even when N is large, because
$E(\partial L/\partial\theta) \neq 0$, where $\theta$ denotes the structural parameters.
In general, it would seem more reasonable to allow an
individual to start at any arbitrary position, and also to allow for
the possibility that the starting period of the sample
observations need not coincide with the beginning of a stochastic
process by letting the individual effects affect all sample
observations, including $y_{i0}$.
In other words, we let $\mathrm{Cov}(y_{i0}, \alpha_i) \neq 0$.
However, there could be many different situations pertaining to
this assumption. One case often assumed by engineers is to let
$y_{it}$ be a state in time t. In each time period, $y_{it}$ is
independently distributed across i with finite mean and variance.
Then one does not need to know how the initial state is reached.
We only need to assume that $y_{i0}$ is independently distributed
across i with mean $\mu_{y0}$ and variance $\sigma^2_{y0}$. The MLE then
maximizes the likelihood function of N independently distributed
(T+1) component random vectors $(y_{i0}, \ldots, y_{iT})$ that are
characterized by a finite number of parameters. Hence it is consistent.
The assumption that $y_{i0}$ is randomly distributed with common
mean $\mu_{y0}$ may be reasonable if there are no exogenous variables,
$x$, driving the stochastic process. If there are individual-period
varying variables, $x_{it}$, driving the stochastic process,
then by the recursive substitutions of the relation (3.26), we
can write $y_{i0}$ as
$$y_{i0} = \beta'\sum_{j=0}^{\infty} x_{i,-j}\gamma^j + \frac{\alpha_i}{1-\gamma} + \sum_{j=0}^{\infty} u_{i,-j}\gamma^j. \tag{3.28}$$
Let $\omega_{i0}$ stand for $\beta'\sum_{j=0}^{\infty} x_{i,-j}\gamma^j$. Since $x_{it}$ are
different for different i and t, each $y_{i0}$ actually has different
mean $\omega_{i0}$ for i = 1, ..., N.
If we wish to estimate $\omega_{i0}$ together with $\gamma$ and $\beta$, there are
incidental parameters problems. Moreover, the MLE of $\omega_{i0}$ are not
independent of the MLE of $\gamma$ and $\beta$. Hence the MLE is inconsistent
if T is fixed, if it exists at all.
To circumvent the incidental parameter problem, Bhargava
and Sargan (1983) suggest that we predict $\omega_{i0}$ by all the observed
$x_i$ (also see Chamberlain (1984), Heckman (1981b)). Conditional
on $x_i$, we have
where $\epsilon_{i0}$ are i.i.d. and are allowed to be correlated with $v_{it}$.
Then the MLE again maximizes the likelihood function of N
independently distributed (T+1) component vectors $(y_{i0}, \ldots, y_{iT})$,
which is a function of a finite number of parameters, so it is
consistent under fairly general conditions.
Although when T is fixed the consistency of the MLE
depends on the correct formulation of the initial conditions, in
practice, we have very little information on the characteristics
of the initial conditions. To obtain consistent estimators of $\gamma$
and $\beta$, we may apply instrumental variable (IV) estimation methods.
Taking the first difference to eliminate $\alpha_i$, Chamberlain
(1984) suggests the stacking of all differenced equations for a
single individual as a system of (T-1) equations,
$$y_{it} - y_{i,t-1} = \gamma(y_{i,t-1} - y_{i,t-2}) + \beta'(x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1}), \quad t = 2, \ldots, T. \tag{3.30}$$
The system (3.30) allows arbitrary serial correlation in the
residual. It also does not involve the individual effects, $\alpha_i$.
Therefore, we can use simultaneous equation methods to estimate $\gamma$
and $\beta$ independent of whether $\alpha_i$ are treated as fixed or random or
as being correlated with $x_i$, provided $E(u_{it} | x_i) = 0$.
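A simplified illustration of the differencing-plus-instruments idea is sketched below in Python with NumPy. It is not the stacked (T-1)-equation system itself: it takes a single differenced equation, uses the current difference and lagged levels of a strictly exogenous x as instruments for the lagged difference of y, and estimates by two stage least squares. The data generating process, the function names and all parameter values are assumptions of the example.

```python
import numpy as np

def simulate_dynamic_panel(n_units, gamma, beta, burn_in=30, keep=4, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + beta*x_it + alpha_i + u_it.

    x_it = alpha_i + AR(1) noise, so x is correlated with alpha_i but
    strictly exogenous with respect to u_it.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)
    shock = rng.normal(size=n_units)
    y_prev = alpha / (1.0 - gamma)            # rough starting value
    ys, xs = [], []
    for _ in range(burn_in + keep):
        shock = 0.5 * shock + rng.normal(size=n_units)
        x_t = alpha + shock
        y_prev = gamma * y_prev + beta * x_t + alpha + rng.normal(size=n_units)
        ys.append(y_prev)
        xs.append(x_t)
    return np.column_stack(ys[-keep:]), np.column_stack(xs[-keep:])

def two_stage_least_squares(dep, regressors, instruments):
    """2SLS: project the regressors on the instruments, then run LS."""
    first_stage = np.linalg.lstsq(instruments, regressors, rcond=None)[0]
    fitted = instruments @ first_stage
    return np.linalg.lstsq(fitted, dep, rcond=None)[0]

y, x = simulate_dynamic_panel(20000, gamma=0.5, beta=1.0)
# Differenced equation for the last of four consecutive periods.
dy_t = y[:, 3] - y[:, 2]
dy_lag = y[:, 2] - y[:, 1]
dx_t = x[:, 3] - x[:, 2]
regressors = np.column_stack([dy_lag, dx_t])
instruments = np.column_stack([dx_t, x[:, 2], x[:, 1], x[:, 0]])

ols_fd = np.linalg.lstsq(regressors, dy_t, rcond=None)[0]   # inconsistent
iv_fd = two_stage_least_squares(dy_t, regressors, instruments)
print("LS on differences:", ols_fd)
print("IV on differences:", iv_fd)
```

In this design LS on the differenced equation understates $\gamma$ because the lagged difference of y contains $u_{i,t-1}$, while the IV estimates are close to the values used in the simulation without any assumption about the initial observations.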
The consistency of the IV estimators is independent of the
initial conditions. However, if we want efficient estimates or
also wish to test the maintained hypothesis on initial condi-
tions, it would seem more appropriate to rely on the maximum

likelihood principle. Different assumptions on the initial
conditions imply different restrictions on the serial covariance
matrix of an individual's behavioral equations. Therefore,
likelihood ratio tests can be used to test the maintained hypoth-
eses on initial conditions as well as the specific structure of
the error process (Bhargava and Sargan (1983)).
3.4 Nonlinear Models

Controlling for unobserved individual heterogeneity in
panel data analysis raises particularly difficult problems for
nonlinear models. First, in general there is no simple
transformation of the data which can eliminate the individual and/or
time specific effects. Secondly, the ML estimation of the individual
specific effects and the structural parameters is not generally
separable. Thirdly, the random effect specification requires
knowledge of the specific distributional assumption of the effects.
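Specifically, let us consider a dichotomous discrete choice
model in which (3.6) describes the behavior of the latent
variable $y_{it}$, where the observed variable $d_{it}$ takes the value
$$d_{it} = \begin{cases} 1 & \text{if } y_{it} > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3.31}$$
Then the probability of observing $d_{it} = 1$ is given by
$F(\beta'x_{it} + \alpha_i)$, where $F(\cdot)$ is the cumulative distribution
function of u. The log likelihood function for the fixed effect
model is given by
$$\log L = \sum_{i=1}^{N}\sum_{t=1}^{T}\{d_{it}\log F(\beta'x_{it}+\alpha_i) + (1-d_{it})\log[1 - F(\beta'x_{it}+\alpha_i)]\}. \tag{3.32}$$
The log likelihood function for the random effect model is given
by
$$\log L = \sum_{i=1}^{N}\log\int\prod_{t=1}^{T} F(\beta'x_{it}+\alpha)^{d_{it}}[1 - F(\beta'x_{it}+\alpha)]^{1-d_{it}}\,dH(\alpha), \tag{3.33}$$
where $H(\alpha)$ denotes the probability distribution function of $\alpha$.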

As one can see, unless $H(\cdot)$ is specified parametrically, (3.33)
is not defined. For instance, let F stand for the standard
cumulative normal distribution function. Conditioning on $\alpha_i$, the
model is a probit model. But the unconditional model is not
necessarily a probit unless $H(\alpha)$ is also normally distributed.
Moreover, even if this can be assumed, the model becomes a
multivariate probit because of the correlation between $y_{it}$ and
$y_{is}$. The MLE necessitates the evaluation of multiple integrations
which can be extremely complicated.
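When the effects are assumed normal and independent across units, the T-dimensional integral for one unit collapses to a single integral over $\alpha$, which can be approximated numerically. The Python sketch below uses Gauss-Hermite quadrature for this purpose; the choice of quadrature, the number of nodes, the function names and the index values are all assumptions of the example and not a method proposed in the text.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF evaluated elementwise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def re_probit_sequence_prob(d, index, sigma_alpha, n_nodes=40):
    """P(d_i1, ..., d_iT) when alpha_i ~ N(0, sigma_alpha^2) and F is normal.

    d     : 0/1 outcomes for one unit, shape (T,)
    index : beta'x_it for the same unit, shape (T,)
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    alpha = math.sqrt(2.0) * sigma_alpha * nodes      # change of variable
    p = normal_cdf(index[None, :] + alpha[:, None])   # shape (n_nodes, T)
    per_node = np.where(d[None, :] == 1, p, 1.0 - p).prod(axis=1)
    return float(weights @ per_node) / math.sqrt(math.pi)

index = np.array([0.3, -0.2])                         # hypothetical beta'x_it
patterns = [np.array(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
probs = [re_probit_sequence_prob(p, index, sigma_alpha=1.0) for p in patterns]
single = re_probit_sequence_prob(np.array([1]), np.array([0.3]), sigma_alpha=1.0)
print(probs, single)
```

For T = 1 the integral has the closed form $\Phi(\beta'x/\sqrt{1+\sigma_\alpha^2})$, which gives a convenient check on the quadrature; the log likelihood for the sample is the sum over units of the logarithm of such probabilities.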
Furthermore, if $\alpha_i$ and $x_{it}$ are correlated, we have to
specify the joint distribution of $(\alpha_i, x_i')$ in order to obtain
consistent estimates of structural parameters. An analogy to the
linear case is to assume that (Chamberlain (1980, 1984))
where $\omega_i$ is independent of $x_i$ and has a distribution function
$H^*(\omega)$. Then
However, there is a very important difference in this step
compared with the linear case. In the linear case it was not
restrictive to decompose $\alpha_i$ into its linear projection on $x_i$ and
an orthogonal residual. Now we are assuming that $E(\alpha_i | x_i)$ is
indeed linear, that $\omega_i$ is independent of $x_i$, and that $\omega_i$ has a
specific probability distribution. These are restrictive assumptions
and there is a payoff to relaxing them.
When $\alpha_i$ are treated as fixed constants, the distribution of
$\alpha_i$ plays no role and the computation of the MLE can be
simplified. However, the ML estimation of $\alpha_i$ and $\beta$ are not
separable. For typical panels, T is small and N is large. The
inconsistency of $\hat{\alpha}_i$ is transmitted into the MLE of $\beta$ (e.g., see
Heckman (1981a) for some Monte Carlo evidence). Moreover, in general,
simple functions for $\Psi_{Nj}$ (see above (3.5)) are not always easy to
find. Andersen (1970, 1973) has suggested a general approach to
find $\Psi_{Nj}$ by conditioning the likelihood function of the
observables on the minimum sufficient statistics $\tau_i$ for the
incidental parameters $\alpha_i$. Provided that $\tau_i$ exists and is not
dependent on the structural parameters $\theta$, the conditional
likelihood function of observables conditioning on $\tau_i$,
i = 1, ..., N, will no longer depend on $\alpha_i$. Hence under mild
regularity conditions, maximizing the conditional likelihood function
will yield consistent estimators of $\theta$.⁶ Unfortunately, for
nonlinear models, except for special cases such as the fixed effect
logit models (Andersen (1973), Chamberlain (1980)) or Poisson models
(Hausman, Hall and Griliches (1984)), it is not generally possible to
find the minimum sufficient statistic $\tau_i$ for the incidental
parameters $\alpha_i$ which are independent of the structural parameters
(e.g., the fixed effect probit model).

⁶When $u_{it}$ are normally distributed, the LSDV estimator of $\beta$ for
the linear model is the conditional MLE (Cornwell and Schmidt
(1984)).
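The logit case can be checked directly for T = 2, where the number of successes $d_{i1} + d_{i2}$ plays the role of the sufficient statistic. The Python sketch below computes the probability of the sequence (0, 1) conditional on exactly one success from the joint logit probabilities, for several values of the individual effect; the index values and function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_01_given_one_success(index1, index2, alpha):
    """P(d_i1 = 0, d_i2 = 1 | d_i1 + d_i2 = 1) under a fixed effect logit."""
    p1 = logistic(index1 + alpha)
    p2 = logistic(index2 + alpha)
    up = (1.0 - p1) * p2        # sequence (0, 1)
    down = p1 * (1.0 - p2)      # sequence (1, 0)
    return up / (up + down)

index1, index2 = 0.4, 1.1      # hypothetical values of beta'x_i1, beta'x_i2
conditional = [
    prob_01_given_one_success(index1, index2, a) for a in (-3.0, 0.0, 2.5)
]
closed_form = logistic(index2 - index1)
print(conditional, closed_form)
```

The conditional probability equals $\exp[\beta'(x_{i2}-x_{i1})]/\{1+\exp[\beta'(x_{i2}-x_{i1})]\}$ whatever the value of $\alpha_i$, so the conditional likelihood built from the units that switch state depends on $\beta$ alone. No comparable cancellation occurs when F is the normal distribution function.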
4. NON-STANDARD SITUATIONS
In the last section we discussed the general issues involved

in controlling for the unobserved heterogeneity in individual
sample responses in order to draw inferences about the common
aspects of the population. In this section we show that the
nature of panel data often allows econometricians to make use of
specific assumptions to identify a model that could not otherwise
be identified. The examples we consider are the use of covariance
restrictions, estimation of unconstrained distributed lag models
and errors in variables models.
4.1 Covariance Restrictions
Consider the income-schooling model as an example. We often

encounter a triangular model of the form (e.g., Chamberlain and
Griliches (1975), Griliches (1979)),
$$\begin{aligned} y_{1it} &= \beta_1'x_{it} + v_{1it}, \\ y_{2it} &= \gamma_{21}y_{1it} + \beta_2'x_{it} + v_{2it}, \\ y_{3it} &= \gamma_{31}y_{1it} + \gamma_{32}y_{2it} + \beta_3'x_{it} + v_{3it}, \end{aligned} \tag{4.1}$$
where $y_1$ denotes years of schooling, $y_2$ denotes a post school
test score, $y_3$ denotes earnings, and $x_{it}$ are vectors of
exogenous variables. If $v_{it}' = (v_{1it}, v_{2it}, v_{3it})$ are i.i.d. and
the components of $v_{it}$ are mutually independent, (4.1) is a
recursive system. The model is identified. If the components of $v_{it}$
are correlated with each other, then unless there are sufficient
numbers of exogenous variables in the first equation which do not
appear in the other equations, the system (4.1) is not identified
(e.g., Hsiao (1983)).

⁷In general, the dimension of the minimum sufficient statistic
will not be smaller than the sample size unless the distribution
is a member of the exponential family.
If the correlation of $(v_1, v_2, v_3)$ is due to a common omitted
variable, say,
$$v_{git} = d_g h_{it} + u_{git}, \quad g = 1, 2, 3, \tag{4.2}$$
where $u_{git}$ are uncorrelated both across equations
and across individuals: $E u_{git}u_{kjs} = 0$ for $i \neq j$,⁸ then even if
there are not enough exogenous variables to identify each
equation separately, it may still be possible to identify the model
by combining the restrictions on the variance-covariance matrix
with the restrictions on the coefficient matrix.
As an illustration, assume that h has a variance component
structure:
$$h_{it} = \alpha_i + \omega_{it}, \tag{4.4}$$
where $\alpha_i$ is the family (group) component which is invariant over
members (t) of the same family but is i.i.d. across i (groups)
with mean zero and variance $\sigma_\alpha^2$, and $\omega_{it}$ is the individual
component which is i.i.d. across i and t with mean zero and
variance $\sigma_\omega^2$ and is uncorrelated with $\alpha_i$. Then
$$E v_{it}v_{it}' = (\sigma_\alpha^2 + \sigma_\omega^2)dd' + \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2), \tag{4.5}$$
$$E v_{it}v_{is}' = \sigma_\alpha^2 dd', \quad t \neq s, \tag{4.6}$$
$$E v_{it}v_{js}' = 0, \quad i \neq j, \tag{4.7}$$
where $d' = (d_1, d_2, d_3)$. Note that the within group
variance-covariance matrix (4.6) is only of rank one.

⁸In the earning-schooling model h may be viewed as the unobserved
ability variable and $u_g$ as the measurement error.
For simplicity, we assume that there is no restriction on
the coefficients of exogenous variables. Under this assumption,
we may further ignore the existence of exogenous variables
without loss of generality because there are no excluded exogenous
variables which may be legitimately used as instruments for the
endogenous variables appearing in the equation. We also assume
$\gamma_{32} = 0$ and $T \geq 2$. If the variance-covariance matrix of the
error term is unrestricted, the model remains unidentified.
However, if $v_{it}$ satisfy (4.5)-(4.7), we may use y purged of the
common omitted variable h as instruments to identify the model
(Chamberlain (1977a,b)).
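To see how we may construct instruments by utilizing (4.6),
we rewrite (4.1) under the assumption that $\gamma_{32} = 0$, and
$\beta_g = 0$, g = 1, 2, 3, in terms of the following reduced form,
$$\begin{bmatrix} y_{1it} \\ y_{2it} \\ y_{3it} \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 + \gamma_{21}d_1 \\ d_3 + \gamma_{31}d_1 \end{bmatrix} h_{it} + \begin{bmatrix} u_{1it} \\ u_{2it} + \gamma_{21}u_{1it} \\ u_{3it} + \gamma_{31}u_{1it} \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} h_{it} + \begin{bmatrix} \epsilon_{1it} \\ \epsilon_{2it} \\ \epsilon_{3it} \end{bmatrix}. \tag{4.8}$$
Substituting $y_1$ for h in the reduced form equation for $y_g$, we
have
$$y_{git} = \frac{a_g}{a_1}y_{1it} + \epsilon_{git} - \frac{a_g}{a_1}\epsilon_{1it}, \quad g = 2, 3. \tag{4.9}$$
Equation (4.9) can be estimated by using $y_{1is}$, $s \neq t$, as an
instrument for $y_{1it}$ provided that $d_1\sigma_\alpha^2 \neq 0$. Purge $y_2$ of its
dependence on h and form the residual
$$w_{2it} = y_{2it} - \frac{a_2}{a_1}y_{1it} = \epsilon_{2it} - \frac{a_2}{a_1}\epsilon_{1it}. \tag{4.10}$$
Then, provided $d_2\sigma_1 \neq 0$, $w_2$ can be used as an instrument for $y_1$
in the structural equation $y_3$:
$$y_{3it} = \gamma_{31}y_{1it} + v_{3it}, \tag{4.11}$$
because it is uncorrelated with h and $u_3$, but is correlated
with $y_1$.⁹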
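Similarly, to identify the $y_2$ equation
$$y_{2it} = \gamma_{21}y_{1it} + v_{2it}, \tag{4.12}$$
we only need to interchange the roles of the $y_2$ and $y_3$ equations
and repeat the two stages. With $\gamma_{21}$ and $\gamma_{31}$ identified, in the
third stage we form the residuals
$$v_{git} = y_{git} - \gamma_{g1}y_{1it}, \quad g = 2, 3. \tag{4.13}$$
Then use $y_1$ as a proxy for h:
$$\begin{aligned} v_{2it} &= \frac{d_2}{d_1}y_{1it} + u_{2it} - \frac{d_2}{d_1}u_{1it}, \\ v_{3it} &= \frac{d_3}{d_1}y_{1it} + u_{3it} - \frac{d_3}{d_1}u_{1it}. \end{aligned} \tag{4.14}$$
Now $d_2/d_1$ and $d_3/d_1$ can be identified by a third application of
instrumental variables, using $y_{1is}$, $s \neq t$, as an instrument for
$y_{1it}$. (Note that only the ratio of the d's is identified due to
the indeterminate scale of the latent variance because
$d_g^2(\sigma_\alpha^2 + \sigma_\omega^2) = cd_g^2(\sigma_\alpha^2/c + \sigma_\omega^2/c)$.)

⁹If $d_2 = 0$, then $w_2 = y_2 - \gamma_{21}y_1 = u_2$. The variable $w_2$ is no
longer correlated with $y_1$.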
The reason that the model can be identified is because panel
data provides an additional set of cross-sibling covariances. By
combining these residual covariance restrictions with the more
conventional slope restrictions we may generate more than enough
equations to solve for the structural parameters.
4.2 Distributed Lag Models
A general distributed lag model for a single time series of
observations is usually written as
where for simplicity we assume that there is only one exogenous
variable, x, and conditional on $\{x_t\}$, the $u_t$ are independent draws
from a common distribution function. When no restrictions are
imposed on the lag coefficients, one cannot obtain consistent
estimates of $\beta_\tau$ even when $T \to \infty$ because the number of unknown
parameters increases with the number of observations. Moreover,
available samples often consist of fairly short time series on
variables which are highly correlated over time. There is not
sufficient information to obtain precise estimates of any of the
lag coefficients without specifying, a priori, that all of them
are a function of only a very small number of parameters (e.g.,
Koyck lag, Almon lag, etc. See Dhrymes (1971), Griliches (1967)
or Malinvaud (1970)).
When there are N time series, we may use cross-sectional
information to identify and estimate (at least some of the) lag
coefficients without having to specify a priori that the sequence
of lag coefficients progresses in a particular way (Pakes and
Griliches (1984)). For instance, consider the problem of using
panel data to estimate the model
t = 1, . . . , T ,
where
is the contribution of the unobserved presample x's to the
current values of y. We call (4.17) the truncation remainder.
Provided that we can identify the impact of the unobserved
truncation remainder term on the coefficients of $x_{t-\tau}$, the problem
of collinearity among $x_t, x_{t-1}, \ldots$, in a single time series may be
reduced by the cross-sectional differences in individual
characteristics. Thus, we may obtain consistent estimates of $\beta_\tau$,
$\tau = 0, \ldots, t-1$, by regressing (4.16) cross-sectionally.
The values of the truncation remainders ($b_{it}$) are determined
by the lag coefficients and the presample x's. Identification
requires constraints on either the lag coefficients or on the
stochastic process generating these x's. Because there are
usually many more degrees of freedom available in panel data, we
can use a different kind of prior restrictions than the usual
approach of constraining lag coefficients to identify the
truncation remainders (Pakes and Griliches (1984)).
For instance, if x is generated by a p-th order
autoregressive process, then for $q > 0$, the projection of the
unobserved presample x's on $x_i$ is given by
Since each element of $b_i = (b_{i1}, \ldots, b_{iT})'$ is just a different
linear combination of the same presample x's, assumption (4.18)
implies that
03
(9)
where $t,j = Zqzl Bt+q~j , J = I,..., Pa
Substituting (4.19) into (4.16), we have
Assuming (3.12), conditioning (4.20) on $x_i$, and passing through the
projection operator once more, we obtain
,
.,
t-j > p
where $t
,j = $t-p+j-l + @t,j if and = $t, otherwise
Subtracting $y_{i1}$ from $y_{it}$, we have
where v . - yit - Yil - qj = @t,j -

By construction, $E^*(\nu_{it} | x_i) = 0$. Hence we can either apply
the least squares or the more efficient Chamberlain minimum
distance estimator cross-sectionally to consistently estimate the
leading T-p-1 lag coefficients without having to impose any
restrictions on the sequence $\{\beta_\tau\}_{\tau=0}^{\infty}$. Of course, if T is small
relative to p, we will not be able to build up much information
on the tail of the lag distribution. This simply reflects that
short panels, by their very nature do not contain unconstrained
information on that tail. However, the early coefficients are
often of significant interest in themselves. Moreover, they may
provide a basis for restricting the lag structure to be a func-
tion of a small number of parameters in further work.
4.3 Measurement Errors
Economic quantities are frequently measured with error.

Furthermore, in many applications there are no exact measurable
counterparts of the conceptual variables. Proxy variables have
to be used. The standard treatment for the errors-in-variables
model requires extraneous information either in the form of
additional data (replication and/or instrumental variables) or
additional assumptions to identify the parameters of interest
(e.g., Aigner, et al. (1984)). In the panel data context a
variety of errors-in-variables models may be identifiable and
estimable without the use of external instruments (e.g.,
Ashenfelter, Deaton and Solon (1984), Griliches and Hausman
(1984)).
For example, consider the following single equation model,
$$y_{it} = \alpha_i + \beta x_{it} + u_{it}. \tag{4.23}$$
Suppose the true independent variable $x_{it}$ is not observed
directly. It is its erroneous reflection,
$$z_{it} = x_{it} + e_{it}, \tag{4.24}$$
that is observed, where $e_{it}$ is the measurement error. If LS is
applied to the observed variables, the equation to be estimated
is
$$y_{it} = \alpha_i + \beta z_{it} + (u_{it} - \beta e_{it}), \tag{4.25}$$
and the resulting estimates are biased because of the correlation
between the $z_{it}$ and the new composite disturbance term
$(u_{it} - \beta e_{it})$.
Suppose the measurement error, $e_{it}$, consists of two
independent components,
$$e_{it} = \eta_i + \epsilon_{it}, \tag{4.26}$$
where $\eta_i$ is individual time-invariant and is i.i.d. across i
with mean 0 and variance $\sigma_\eta^2$, and $\epsilon_{it}$ is individual-time
varying and is i.i.d. across i and t with mean 0 and variance
$\sigma_\epsilon^2$. Because the errors of measurement are assumed to have a
particular structure over time and across individuals, panel data can
provide the possibility of using different transformations of the
data to induce different and deducible changes in the biases in
the estimated parameters which can then be used to identify the
importance of measurement errors and recover the "true" parameter.
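Taking the first difference of the data, we eliminate the
contribution of the unobserved individual time-invariant
components from the specification and obtain
$$y_{it} - y_{i,t-1} = \beta(z_{it} - z_{i,t-1}) + [(u_{it} - u_{i,t-1}) - \beta(\epsilon_{it} - \epsilon_{i,t-1})]. \tag{4.27}$$
The LS estimate of (4.27) converges to
$$\operatorname*{plim}_{N\to\infty}\hat{\beta}_d = \beta\left(1 - \frac{2\sigma_\epsilon^2}{\mathrm{Var}(z_{it} - z_{i,t-1})}\right), \tag{4.28}$$
which is biased. But if $x_{it}$ is serially correlated, it is clear
that $z_{i,t-2}$ or $(z_{i,t-2} - z_{i,t-3})$ can be used as instruments for
$(z_{it} - z_{i,t-1})$. Even though T may be finite, the resulting IV
estimator is consistent so long as N tends to infinity.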
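Alternatively, we may obtain consistent estimates through a
comparison of magnitudes of the bias arrived at by subjecting a
model to different transformations (Griliches and Hausman (1984)).
For instance, if we use a covariance transformation to eliminate
the contributions of unobserved individual components, we have
$$y_{it} - \bar{y}_i = \beta(z_{it} - \bar{z}_i) + [(u_{it} - \bar{u}_i) - \beta(\epsilon_{it} - \bar{\epsilon}_i)]. \tag{4.29}$$
The LS regression of (4.29) converges to
$$\operatorname*{plim}_{N\to\infty}\hat{\beta}_w = \beta\left(1 - \frac{T-1}{T}\,\frac{\sigma_\epsilon^2}{\mathrm{Var}(z_{it} - \bar{z}_i)}\right). \tag{4.30}$$
Then consistent estimators of $\beta$ and $\sigma_\epsilon^2$ can be solved from (4.28)
and (4.30),
$$\hat{\beta} = \frac{\dfrac{2\hat{\beta}_w}{\mathrm{Var}(z_{it} - z_{i,t-1})} - \dfrac{(T-1)\hat{\beta}_d}{T\,\mathrm{Var}(z_{it} - \bar{z}_i)}}{\dfrac{2}{\mathrm{Var}(z_{it} - z_{i,t-1})} - \dfrac{T-1}{T\,\mathrm{Var}(z_{it} - \bar{z}_i)}}, \qquad \hat{\sigma}_\epsilon^2 = \frac{(\hat{\beta} - \hat{\beta}_d)\,\mathrm{Var}(z_{it} - z_{i,t-1})}{2\hat{\beta}}. \tag{4.31}$$

¹⁰As pointed out by S. Nickell, if $x_{it}$ is serially uncorrelated,
neither lagged $z_{it}$ nor (4.31) can be used to obtain consistent
estimators of $\beta$ because (4.28) and (4.30) converge to the same
value of $\beta[1 - \sigma_\epsilon^2/(\sigma_\epsilon^2 + \mathrm{Var}(x_{it}))]$.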
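The logic of combining the two biased estimates can be illustrated with population moments. The Python/NumPy sketch below assumes the usual attenuation expressions, $\beta[1 - 2\sigma_\epsilon^2/\mathrm{Var}(z_{it} - z_{i,t-1})]$ for the first-difference regression and $\beta[1 - \frac{T-1}{T}\sigma_\epsilon^2/\mathrm{Var}(z_{it} - \bar{z}_i)]$ for the within regression, together with a stationary AR(1) process for the true regressor. The process, the parameter values and the function names are hypothetical choices for the example.

```python
import numpy as np

def within_variance(cov):
    """Average over t of Var(v_t - v_bar) for a T-vector with covariance cov."""
    T = cov.shape[0]
    demean = np.eye(T) - np.ones((T, T)) / T
    return np.trace(demean @ cov @ demean) / T

def solve_beta_and_noise(b_d, var_dz, b_w, var_wz, T):
    """Solve the two probability-limit equations for beta and sigma_eps^2."""
    c = (T - 1) / T
    beta = (2 * b_w / var_dz - c * b_d / var_wz) / (2 / var_dz - c / var_wz)
    sigma_e2 = (beta - b_d) * var_dz / (2 * beta)
    return beta, sigma_e2

# Hypothetical population: x_it is a stationary AR(1) with variance 1 and
# autocorrelation rho; eps_it is white noise. The time-invariant error
# component eta_i drops out of both transformations and is omitted.
T, rho, beta_true, sigma_e2_true = 4, 0.6, 2.0, 0.3
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
cov_z = rho ** lags + sigma_e2_true * np.eye(T)

var_dz = cov_z[0, 0] + cov_z[1, 1] - 2 * cov_z[0, 1]          # Var(z_it - z_i,t-1)
var_wz = within_variance(cov_z)                                # Var(z_it - z_bar_i)
b_d = beta_true * (1 - 2 * sigma_e2_true / var_dz)             # first-difference plim
b_w = beta_true * (1 - (T - 1) / T * sigma_e2_true / var_wz)   # within plim

beta_solved, sigma_e2_solved = solve_beta_and_noise(b_d, var_dz, b_w, var_wz, T)
print(b_d, b_w, beta_solved, sigma_e2_solved)
```

Both transformed regressions understate $\beta$, by different amounts, and the pair of equations recovers the values used to build the example. Setting `rho = 0` makes the denominator in `solve_beta_and_noise` zero, which is the serially uncorrelated case in which the two limits coincide and the method breaks down.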
In general, if the measurement errors are known to possess

certain structures, consistent estimators may be available from a
method of moments and/or from an IV approach by utilizing the
panel structure of the data. Moreover, the IV approach can lead
to specification tests which permit an assessment of whether the
measurement error correlation assumptions, which provide the
rationale for the validity of the instruments, hold true (e.g.,
Griliches and Hausman (1984)).
5. CONCLUSION
In this paper we have reviewed some basic methods that have

been used for drawing inferences about a population while allow-
ing for heterogeneity of individual sampling responses. We have
also indicated areas in which the existence of panel data has
enabled researchers to examine previously untestable analytical
assumptions. The selection of topics was highly subjective, and
was intended to serve as illustrations, not as an exhaustive
compilation of the kinds of research panel data allows. For
instance, the important subject of continuous time duration
models (Heckman and Singer (1984)) was completely ignored in
deference to a forthcoming article by T. Lancaster.
Although panel data has opened up avenues of research that
simply could not have been pursued otherwise, it is not a panacea
for econometric researchers. The power of panel data depends on
the extent and reliability of the information it contains as well
as on the validity of the restrictions upon which the statistical
methods have been built. Otherwise, it may solve one problem but
aggravate another. For instance, as mentioned in section 3.4,
the ability of panel data to control for unobserved individual
effects in nonlinear models requires the imposition of very
specific distributional assumptions. These specific
distributional assumptions could in turn have significant impact on
the final estimates. Chamberlain (1984) used a sample of 924 married
women in the PSID to estimate both probit and logit specifications
of labor force participation decisions. The absolute value of
his conditional MLE of the logit model was 40% above his probit
estimate obtained by assuming that the individual effects $\alpha_i$,
conditional on $x_i$ (3.34), were also normal.
The sensitivity of econometric estimates to particular
assumptions is not confined to nonlinear models. It can happen
to linear models as well. Consider the following income-schooling
example given by Griliches (1979),
$$ y = \beta_0 + \beta_1 S + \beta_2 A + u , \tag{5.1} $$
where y is a measure of income or wage rate, S is a measure of
schooling, and A is an unmeasured ability variable which is
assumed to be positively related to S. The coefficients $\beta_1$
and $\beta_2$ are assumed to be positive. Under the assumption that S
and A are uncorrelated with u, the least squares estimate of
$\beta_1$ which ignores A is biased upward. The standard left-out
variable formula gives the size of this bias as
$$ E(\hat{\beta}_1) = \beta_1 + \beta_2 \frac{\sigma_{AS}}{\sigma_S^2} , \tag{5.2} $$
where $\sigma_S^2$ is the variance of S, and $\sigma_{AS}$ is the covariance between
A and S.
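The left-out-variable result can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: the parameter values, the joint normality of S and A, and the function names `ols_slope` and `simulate_omitted_ability` are all illustrative assumptions. It compares the least squares slope from regressing y on S alone with the value $\beta_1 + \beta_2 \sigma_{AS}/\sigma_S^2$.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a least squares regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def simulate_omitted_ability(n=200_000, beta1=0.08, beta2=0.5,
                             sigma_as=0.4, seed=0):
    """Hypothetical design: S and A jointly normal with unit variances."""
    rng = np.random.default_rng(seed)
    sigma_s2 = 1.0  # variance of schooling S (assumed)
    cov = np.array([[sigma_s2, sigma_as], [sigma_as, 1.0]])
    S, A = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    u = rng.normal(size=n)  # disturbance, independent of S and A
    y = 1.0 + beta1 * S + beta2 * A + u
    estimate = ols_slope(S, y)  # regression that ignores A
    predicted = beta1 + beta2 * sigma_as / sigma_s2  # left-out-variable formula
    return estimate, predicted

if __name__ == "__main__":
    est, pred = simulate_omitted_ability()
    print(f"least squares slope: {est:.4f}, formula: {pred:.4f}")
```

With positive $\beta_2$ and positive $\sigma_{AS}$, the simulated slope settles above the true $\beta_1$, which is the upward bias described in the text.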
If the omitted variable A is a purely "family" one,^11 that
is, if siblings have exactly the same level of A, then estimating
$\beta_1$ from within family data (i.e., from differences between
the brothers' earnings and differences between the brothers'
education) would eliminate the bias. But, if ability, apart
from having a family component, also has an individual component,
and this individual component is not independent of the schooling
variable, the within family estimates are not necessarily less
biased.
^11 Namely, the family effect $A_i$ has the same meaning as $\alpha_i$.
Suppose
$$ A_{it} = \alpha_i + \omega_{it} , \tag{5.3} $$
where i denotes the family, and t denotes members of the
family. If $\omega_{it}$ is uncorrelated with $S_{it}$, the
within (or LSDV) estimator is unbiased. But if the within
family covariance between A and S, $\sigma_{S\omega}$, is not equal to zero,
the expected value of the within estimator is
$$ E(\hat{\beta}_{1,w}) = \beta_1 + \beta_2 \frac{\sigma_{S\omega}}{\sigma_{S|w}^2} , $$
where $\sigma_{S|w}^2$ is the within family variance of S. The estimator
remains biased. Furthermore, if the reasons for the correlation
between A and S are largely individual rather than familial,
then going to within family data will drastically reduce $\sigma_{S|w}^2$
with little change to $\sigma_{AS}$ (or $\sigma_{S\omega}$), which would make this source
of bias even more serious.
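A small simulation can illustrate why within family data need not reduce the bias. This is a sketch under assumed conditions, not a reproduction of any study cited here: two siblings per family, a family component of schooling that is unrelated to ability, and individual components of schooling and ability with covariance `cov_s_omega`. All names and parameter values are hypothetical. In this design the pooled least squares slope should center on $\beta_1 + \beta_2 \sigma_{AS}/\sigma_S^2$ and the sibling-difference (within) slope on $\beta_1 + \beta_2 \sigma_{S\omega}/\sigma_{S|w}^2$.

```python
import numpy as np

def slope(x, y):
    """Slope from a least squares regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def sibling_simulation(n_families=200_000, beta1=0.08, beta2=0.5,
                       var_family_s=3.0, cov_s_omega=0.4, seed=0):
    """Hypothetical sibling design with purely individual S-A correlation."""
    rng = np.random.default_rng(seed)
    shape = (n_families, 2)  # two siblings per family
    alpha = rng.normal(size=(n_families, 1))  # family ability component
    family_s = rng.normal(scale=np.sqrt(var_family_s), size=(n_families, 1))
    # Individual components of schooling (e) and ability (omega):
    # unit variances, covariance cov_s_omega.
    cov = np.array([[1.0, cov_s_omega], [cov_s_omega, 1.0]])
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=shape)
    e, omega = draws[..., 0], draws[..., 1]
    S = family_s + e
    A = alpha + omega
    y = beta1 * S + beta2 * A + rng.normal(size=shape)
    pooled = slope(S.ravel(), y.ravel())                  # ignores family structure
    within = slope(S[:, 0] - S[:, 1], y[:, 0] - y[:, 1])  # sibling differences
    return pooled, within

if __name__ == "__main__":
    pooled, within = sibling_simulation()
    print(f"pooled: {pooled:.4f}, within family: {within:.4f}")
```

Because differencing removes the family variation in S but leaves the individual covariance with ability intact, the within estimate in this design lies further from $\beta_1$ than the pooled one.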
Moreover, if S is also a function of A and other socioeconomic
variables, (5.1) is only one behavioral equation in a
simultaneous equation model. Then, the probability limit of the
least squares estimate of $\beta_1$ is no longer (5.2), but is of the
form
$$ \operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{AS}}{\sigma_S^2} + \frac{\sigma_{uS}}{\sigma_S^2} , $$
where $\sigma_{uS}$ is the covariance between u and S.
If, as argued by Griliches (1977, 1979), schooling is the result,
at least in part, of optimizing behavior by individuals and their
family, $\sigma_{uS}$ could be negative. This opens the possibility that
the least squares estimates of the schooling coefficient may be
biased downward rather than upward. Furthermore, if the reasons
for $\sigma_{uS}$ being negative are again largely individual rather than
familial, and the within family covariance between A and S
reduces $\sigma_{AS}$ by roughly the same proportion as $\sigma_{S|w}^2$ is to $\sigma_S^2$,
there will be a significant decline in $\hat{\beta}_{1,w}$ relative to
$\hat{\beta}_1$. The size of this decline would be attributed to the
importance of ability and "family background", when in fact it
reflects nothing more than the simultaneity problems associated
with the schooling variable itself. In short, the simultaneity
problem could reverse the single equation conclusions.
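The argument can be written out as a short worked comparison. The sketch below rests on stated assumptions: $\sigma_{AS|w}$ and $\sigma_{uS|w}$ denote within family covariances and $k$ the ratio $\sigma_{S|w}^2/\sigma_S^2$. These three symbols are introduced here for illustration and do not appear in the original. Suppose $\sigma_{AS|w} = k\,\sigma_{AS}$ with $0 < k < 1$, and that $\sigma_{uS|w} \approx \sigma_{uS}$ because the correlation between u and S is largely individual.

```latex
\begin{align*}
\operatorname{plim}\hat{\beta}_{1}
  &= \beta_1 + \beta_2\,\frac{\sigma_{AS}}{\sigma_S^2}
     + \frac{\sigma_{uS}}{\sigma_S^2}, \\
\operatorname{plim}\hat{\beta}_{1,w}
  &= \beta_1 + \beta_2\,\frac{\sigma_{AS|w}}{\sigma_{S|w}^2}
     + \frac{\sigma_{uS|w}}{\sigma_{S|w}^2}
   \approx \beta_1 + \beta_2\,\frac{k\,\sigma_{AS}}{k\,\sigma_S^2}
     + \frac{\sigma_{uS}}{k\,\sigma_S^2}, \\
\operatorname{plim}\hat{\beta}_{1,w} - \operatorname{plim}\hat{\beta}_{1}
  &\approx \Bigl(\frac{1}{k} - 1\Bigr)\frac{\sigma_{uS}}{\sigma_S^2} .
\end{align*}
```

Under these assumptions the ability term is the same in both estimators, so the whole gap comes from the simultaneity term. It is negative whenever $\sigma_{uS} < 0$, and it grows as $k$ shrinks.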
These examples demonstrate that the usefulness of panel data
in answering particular questions depends on the
validity of the assumptions implicit in the model. Although
tests for some assumptions may be performed in certain limited
senses (e.g., Chamberlain (1984), Hausman (1978), Rudd (1984)),
there are no general nonparametric testing procedures. Given
that the real world is much more complicated than a model can
allow for, and that one cannot expect the data to be perfect or even of
the same quality in different contexts (e.g., attrition in panel
data, Ashenfelter (1983), Hausman & Wise (1979)), perhaps a more
fruitful approach would be to recognize the limitations
of the data explicitly rather than to assume that ideal data are available
to test various behavioral hypotheses. One can then develop
models and estimation methods based on what is observable and
what kind of data may be available, so that we may gain a better
understanding of what is really going on in the economy without
imposing arbitrary assumptions on our model and data (e.g.,
Amemiya (1973) and Heckman (1976, 1978) on sample selectivity).
ACKNOWLEDGEMENTS
This work was supported in part by the Social Sciences and
Humanities Research Council of Canada grant 410-84-1032 to the
University of Toronto. I would also like to thank D. Young for
editorial assistance and M. Denny, C. Manski, A. Melino, D.
Mountain, S. Nickell, D. Poirier, an associate editor and a
referee for helpful comments.
REFERENCES
Aigner, D.J., Hsiao, C., Kapteyn, A., & Wansbeek, T., (1984).
Latent variable models in econometrics. Handbook of Econometrics,
Vol. II, (Z. Griliches & M. Intriligator, Eds.)
Amsterdam: North-Holland, 1322-1393.
Amemiya, T., (1973). Regression analysis when the dependent

variable is truncated normal. Econometrica, 41, 997-1016.
Andersen, E.B., (1970). Asymptotic properties of conditional

maximum likelihood estimators. Journal of the Royal Statis-
tical Society, Series B, 32, 283-301.
Andersen, E.B., (1973). Conditional Inference and Models for

Measuring. København: Mentalhygiejnisk Forlag.
Anderson, T.W. & Hsiao, C., (1981). Estimation of dynamic models

with error components. Journal of the American Statistical
Association, 76, 598-606.
Anderson, T.W. & Hsiao, C., (1982). Formulation and estimation

of dynamic models using panel data. Journal of Econo-
metrics, 18, 47-82.
Ashenfelter, O., (1978). Estimating the effect of training

programs on earnings. Review of Economics and Statistics,
60, 47-57.
Ashenfelter, O., (1983). Determining participation in income-

tested social programs. Journal of the American Statistical
Association, 78, 517-525.
Ashenfelter, O. & Solon, G., (1982). Longitudinal labor market

data -- sources, uses and limitations. What's Happening to
American Labor Force and Productivity Measurements? Pro-
ceedings of a June 17, 1982 Conference sponsored by the
National Council on Employment Policy, The W.E. Upjohn
Institute for Employment Research, 109-126.
Ashenfelter, O., Deaton, A., & Solon, G., (1984). Does it make
sense to collect panel data in developing countries? Mimeo.
Balestra, P. & Nerlove, M., (1966). Pooling cross-section and

time series data in the estimation of a dynamic model: the
demand for natural gas. Econometrica, 34, 585-612.
Baltagi, B.H., (1981). Simultaneous equations with error compo-

nents. Journal of Econometrics, 17, 189-200.
Beckwith, N., (1972). Multivariate analysis of sales response of

competing brands to advertising. Journal of Marketing
Research, 9, 168-176.
Ben-Porath, Y., (1973). Labor force participation rates and the

supply of labor. Journal of Political Economy, 81, 697-704.
Bhargava, A. & Sargan, J.D., (1983). Estimating dynamic random

effects models from panel data covering short time periods.
Econometrica, 51, 1635-1659.
Borus, M.E., (1982). An inventory of longitudinal data sets of

interest to economists. Review of Public Data Bases, 10,
113-126.
Box, G.E.P. & Tiao, G.C., (1968). Bayesian estimation of means

for the random effects model. Journal of the American
Statistical Association, 63, 174-181.
Chamberlain, G., (1977a). Education, income and ability revis-

ited. Latent Variables in Socio-Economic Models, (D.J.
Aigner & A.S. Goldberger, Eds.) Amsterdam: North-Holland.
Chamberlain, G., (1977b). An instrumental variable interpreta-

tion of identification in variance-components and MIMIC
model. Kinometrics: Determinants of Social-Economic Success
Within and Between Families (P. Taubman, Ed.) Amsterdam:
North-Holland.
Chamberlain, G., (1978). On the use of panel data. Paper pre-

sented at the Social Science Research Council conference on
Life Cycle Aspects of Employment and the Labor Market, Mt.
Kisco, New York.
Chamberlain, G., (1980). Analysis of covariance with qualitative

data, Review of Economic Studies, 47, 225-238.
Chamberlain, G., (1982). Multivariate regression models for

panel data. Journal of Econometrics, 18, 5-46.
Chamberlain, G., (1984). Panel data. Handbook of Econometrics,

Vol. II (Z. Griliches & M. Intriligator, Eds.) Amsterdam:
North-Holland.
Chamberlain, G. & Griliches, Z., (1975). Unobservables with a

variance-components structure: ability, schooling and the
economic success of brothers. International Economics
Review, 16, 422-450.
Cornwell, C. & Schmidt, P., (1984). Panel data with cross-

sectional variation in slopes as well as intercepts. Mimeo,
Michigan State University.
Cripps, T. & Tarling, R., (1974). An analysis of the duration of

male unemployment in Great Britain, 1932-1973. Economic
Journal, 84, 289-316.
Dhrymes, P., (1971). Distributed Lags: Problems of Estimation

and Formulation. San Francisco: Holden-Day.
Dielman, T., Nantell, T., & Wright, R., (1980). Price effects of
stock repurchasing: a random coefficient regression approach.
Journal of Financial and Quantitative Analysis, 15, 175-189.
Freeman, R.B. & Medoff, J.L., (1981). The impact of collective

bargaining: illusion or reality? Mimeo.
Griliches, Z., (1967). Distributed lags: a survey. Econo-

metrica, 35, 16-49.
Griliches, Z., (1977). Estimating the returns to schooling: some

econometric problems. Econometrica, 45, 1-22.
Griliches, Z., (1979). Sibling models and data in economics:

beginning of a survey. Journal of Political Economy, 87,
supplement 2, S37-S64.
Griliches, Z. & Hausman, J., (1984). Errors-in-variables in

panel data. NBER Technical Paper no. 37.
Hausman, J.A., (1978). Specification tests in econometrics.

Hausman, J.A. & Wise, D., (1979). Attrition bias in experimental

and panel data: the Gary income maintenance experiment.
Hausman, J.A., Hall, B.H., & Griliches, Z., (1984). Econometric

models for count data with application to the patents-R&D
relationship. Econometrica, 52, 909-938.
Heckman, J.J., (1976). The common structure of statistical

models of truncation, sample selection, and limited depen-
dent variables and a simple estimator for such models.
Annals of Economic and Social Measurement, 5, 475-492.
Heckman, J.J., (1978). Simple statistical models for discrete

panel data developed and applied to test the hypothesis of
true state dependence against the hypothesis of spurious
state dependence. Annales de L'INSEE, No. 30-31, 227-269.
Heckman, J.J., (1979). Sample selection bias as a specification

error. Econometrica, 47, 153-161.
Heckman, J.J., (1981a). Statistical models for discrete panel

data. Structural Analysis of Discrete Data with Econometric
Applications, (C.F. Manski and D. McFadden, Eds.) Cambridge,

Mass.: MIT Press, 114-178.
Heckman, J.J., (1981b). The incidental parameters problem and

the problem of initial conditions in estimating a discrete
time-discrete data stochastic process. Structural Analysis
of Discrete Data with Econometric Applications, (C.F. Manski
and D. McFadden, Eds.) Cambridge, Mass.: MIT Press, 179-195.
Heckman, J.J., (1981c). Heterogeneity and state dependence.

Studies in Labor Markets, (S. Rosen, Ed.) Chicago: Universi-
ty of Chicago Press, 91-139.
Heckman, J.J. & Borjas, G., (1980). Does unemployment cause
future unemployment? Definitions, questions and answers
from a continuous time model of heterogeneity and state
dependence. Economica, 47, 247-283.
Heckman, J.J. & Singer, B., (1984). Econometric duration analysis.
Journal of Econometrics, 24, 63-132.
Heckman, J.J. & Willis, R., (1977). A beta-logistic model for

the analysis of sequential labor force participation by
married women. Journal of Political Economy, 85, 27-58.
Hsiao, C., (1983). Identification. Handbook of Econometrics,

Vol. I, (Z. Griliches and M. Intriligator, Eds.) Amsterdam:
North-Holland, 223-283.
Hsiao, C., (1985). Analysis of Panel Data. Cambridge: Cambridge

University Press (forthcoming).
Kiefer, N.M., (1979). Population heterogeneity and inference

from panel data on the effects of vocational education.
Journal of Political Economy, 87, part 2, S213-S226.
Kiefer, N.M., (1980). Estimation of fixed effect models for time

series of cross-sections with arbitrary intertemporal covar-
iance. Journal of Econometrics, 14, 195-202.
MaCurdy, T.E., (1981). An empirical model of labor supply in a

life cycle setting. Journal of Political Economy, 89,
1059-1085.
MaCurdy, T.E., (1982). The use of time series processes to model

the error structure of earnings in a longitudinal data
analysis. Journal of Econometrics, 18, 83-114.
Maddala, G.S., (1971). The use of variance components models in

pooling cross section and time series data. Econometrica,
39, 341-358.
Malinvaud, E., (1970). Statistical Methods of Econometrics, 2nd

Edition. Amsterdam: North-Holland.
Mundlak, Y., (1961). Empirical production function free of

management bias. Journal of Farm Economics, 43, 44-56.
Mundlak, Y., (1978). On the pooling of time series and cross

section data. Econometrica, 46, 69-85.
Nerlove, M., (1971). Further evidence on the estimation of

dynamic economic relations from a time series of cross
section. Econometrica, 39, 359-382.
Neyman, J. & Scott, E.L., (1948). Consistent estimates based on

partially consistent observations. Econometrica, 16, 1-32.
Nickell, S., (1979). Biases in dynamic models with fixed effects.

Econometrica, 49, 1399-1416.
Pakes, A. & Griliches, Z., (1984). Estimating distributed lags

in short panels with an application to the specification of
depreciation patterns and capital stock constructs. Review
of Economic Studies, 51, 243-262.
Phelps, E., (1972). Inflation Policy and Unemployment Theory:

The Cost Benefit Approach to Monetary Planning. London:
Macmillan.
Rudd, P., (1984). Tests of specification in econometrics.

Econometric Reviews, 3, 211-242.
Schmidt, P., (1984). Simultaneous equation models with fixed

effects. Mimeo, Michigan State University.
Sheiner, L., Rosenberg, B., & Melmon, K., (1972). Modeling of

individual pharmacokinetics for computer-aided drug dosage.
Computers and Biomedical Research, 5, 441-459.
Trognon, A., (1978). Miscellaneous asymptotic properties of

ordinary least squares and maximum likelihood estimators in
dynamic error components models. Annales de L'INSEE, 30-31,
631-657.
