
Journal of Econometrics 18 (1982) 5-46. North-Holland Publishing Company

MULTIVARIATE REGRESSION MODELS FOR PANEL DATA

Gary CHAMBERLAIN*

University of Wisconsin, Madison, WI 53706, USA
National Bureau of Economic Research, Cambridge, MA 02138, USA

The paper examines the relationship between heterogeneity bias and strict exogeneity in a distributed lag regression of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual specific random variables introduce nonlinearity and heteroskedasticity; so the paper provides an appropriate framework for the estimation of multivariate linear predictors. Restrictions are imposed using a minimum distance estimator. It is generally more efficient than the conventional estimators such as quasi-maximum likelihood. There are computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain. Some of these ideas are illustrated using the sample of Young Men in the National Longitudinal Survey. The paper reports regressions on the leads and lags of variables measuring union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.

1. Introduction

Suppose that we have a sample of individuals (or firms) followed over time: (x_it, y_it), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_it | x_i1, ..., x_iT, b_i0, ..., b_iJ, c_i) = Σ_{j=0}^J b_ij x_{i,t-j} + c_i,    t = J+1, ..., T.

The coefficients b_ij and c_i are allowed to vary across individuals but are constant over time. The population parameters of interest are β_j = E(b_ij), j = 0, ..., J. If the b_ij or c_i are correlated with x, then a least squares regression
*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided by the National Science Foundation (Grants No. SOC-7925959 and No. SES-8016383) and by funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison, by the Department of Health, Education, and Welfare pursuant to the provisions of the Economic Opportunity Act of 1964.

of y_t on x_t, ..., x_{t-J} will not provide a consistent estimator of the β_j (as N → ∞). We shall refer to this inconsistency as a heterogeneity bias.
In section 2, on identification, we consider first the case J = 0 and b_ij = β. We argue that the presence of heterogeneity bias will be signalled by a full set of lags and leads in the least squares regression of y_t on x_1, ..., x_T. Furthermore, if we let y = (y_1, ..., y_T), x = (x_1, ..., x_T) and consider the multivariate linear predictor E*(y | x) = Π_0 + Πx, then the T × T matrix Π should have a distinctive pattern: the off-diagonal elements within the same column are all equal. In that case,

E*(y_t − y_{t-1} | x) = β(x_t − x_{t-1}),

so there is just a contemporaneous relationship when we transform to first differences. I think that a test for such restrictions should accompany analysis of covariance type estimation.
There is an analogous question when J is finite and the b_j are random as well as c. Does E(y_t | x_1, ..., x_T) = E(y_t | x_t, ..., x_{t-J}) imply that there is no heterogeneity bias? We find that the answer is yes if x has a continuous distribution but not if x is discrete.

New issues arise as the order (J) of the distributed lag becomes infinite. We consider this problem in the context of a stationary stochastic process; c and the b_j are (shift) invariant random variables. There are invariant random variables with non-zero variance if and only if the process is not ergodic. We pose the following question: if

E*(y_t | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t-1}, ...),

so that y does not cause x according to the Sims (1972) definition, is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t-1}, ...).

Section 3 of the paper considers the estimation of multivariate linear predictors. There is a sample r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). We assume that r_i is independent and identically distributed (i.i.d.) according to some distribution with finite fourth moments. We do not assume that the regression function E(y_i | x_i) is linear; for although E(y_i | x_i, c_i) may be linear, there is generally no reason to insist that E(c_i | x_i) is linear. Furthermore, we allow the conditional variance V(y_i | x_i) to be an arbitrary function of x_i; the heteroskedasticity could, for example, be due to random coefficients. Let w_i be the vector formed from the squares and cross-products of the elements of r_i; let Π be the matrix of linear predictor coefficients:

E*(y_i | x_i) = Πx_i,

where Π = E(y_i x_i')[E(x_i x_i')]^{-1}. Then w_i is i.i.d. and Π is a differentiable function of E(w_i). So the problem is to make inferences about differentiable functions of a population mean, under random sampling. This is straightforward and the results have a variety of novel implications. Let Π̂ be the least squares estimator; let π̂ and π be the vectors formed from the columns of Π̂ and Π. Then √N(π̂ − π) → N(0, Ω) in distribution as N → ∞. The formula for Ω is not the standard one, since we are not assuming homoskedastic, linear regression.
We impose restrictions by using a minimum distance estimator: find the matrix satisfying the restrictions that is closest to Π̂ in the norm provided by Ω̂^{-1}, where Ω̂ is a consistent (as N → ∞) estimator of Ω. This leads to some surprising results. For example, consider a univariate linear predictor:

E*(y_i | x_i1, x_i2) = π_0 + π_1 x_i1 + π_2 x_i2.

We can impose the restriction that π_2 = 0 by using a least squares regression of y on x_1 to estimate π_1; however, this is asymptotically less efficient, in general, than our minimum distance estimator. The conventional estimator is a minimum distance estimator, but it is using a different norm.

A related result is that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables; three-stage least squares is also using the wrong norm. We provide more efficient estimators for the linear simultaneous equations model by applying our minimum distance procedure to the reduced form, thereby generalizing Malinvaud's (1970) minimum distance estimator. Suppose that the only restrictions are that certain structural coefficients are zero (and the normalization rule). We provide a generalization of three-stage least squares that has the same limiting distribution as our minimum distance estimator. There is a corresponding generalization of two-stage least squares.
We also consider the maximum likelihood estimator based on assuming that r_i has a multivariate normal distribution with mean μ and covariance matrix Σ. Then the slope coefficients in Π are functions of Σ and, more generally, we can consider estimating arbitrary functions of Σ subject to restrictions. When the normality assumptions do not hold, we refer to the estimator as a quasi-maximum likelihood estimator. The quasi-maximum likelihood estimator has the same limiting distribution as a certain minimum distance estimator; but in general that minimum distance estimator is not using the optimal norm. Hence our estimator is generally more efficient than the quasi-maximum likelihood estimator.
Section 4 of the paper presents an empirical example that illustrates some of the results. It is based on the panel of Young Men in the National Longitudinal Survey (Parnes); y_t is the logarithm of the individual's hourly wage, and x_t includes variables to indicate whether or not the individual's wage is set by collective bargaining; whether or not he lives in an SMSA; and whether or not he lives in the South. We present unrestricted least squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation: the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.

2. Identification

Suppose that a farmer is producing a product with a Cobb-Douglas technology:

y_t = βx_t + c + u_t,    0 < β < 1,    t = 1, ..., T,

where y_t is the logarithm of output, x_t is the logarithm of a variable input (labor), c represents an input that is fixed over time (soil quality), u_t represents a stochastic input (rainfall), which is not under the farmer's control, and t indexes the seasons. We shall assume that the farmer knows the product price (P_t) and the input price (W_t), which do not depend on his decisions, and that he knows c. The factor input decision, however, is made before knowing u_t, and we shall assume that x_t is chosen to maximize expected profits. Then the factor demand equation is

x_t = {ln β + ln[E(e^{u_t} | Ω_t)] + ln(P_t/W_t) + c}/(1 − β),

where Ω_t is the information set available to the farmer when he chooses x_t.¹
Although c is known to the farmer and affects his factor demand decisions, we assume that it is not known to the econometrician. He observes only y = (y_1, ..., y_T) and x = (x_1, ..., x_T) for each member of a sample of N farms.

Consider the least squares regression of y_1 on x_1 using just a single cross-section of the data. The population counterpart is

E*(y_1 | x_1) = π_0 + π_1 x_1,

where E* is the minimum mean square error linear predictor (the wide-sense regression function), with

π_1 = cov(y_1, x_1)/V(x_1),    π_0 = E(y_1) − π_1 E(x_1).

Cov(c, x_1) ≠ 0 if V(c) ≠ 0; then π_1 ≠ β and the least squares estimator of β does not converge to β as N → ∞. Furthermore, with a single cross-section, there would be no internal evidence of this heterogeneity bias.
This example

is discussed

in Mundlak

(1961,1963)

and in Zellner,

Kmenta,

G. Chamberlain,

:Illrltirariate

regression

With more than one observation per farm, however, we can consider the least squares regression of y_t on x = (x_1, ..., x_T). The population counterpart is

E*(y_t | x) = βx_t + E*(c | x) + E*(u_t | x).

Assume that

E*(c | x) = ψ + λ'x;

then λ = V^{-1}(x) cov(x, c). Even if E*(u_t | x) = 0, there will generally be a full set of lags and leads if V(c) ≠ 0. For example, if cov(x_t, c) = cov(x_1, c), t = 1, ..., T, then λ is proportional to the row sums of V^{-1}(x), and all of the elements of λ will typically be non-zero. I think that it is generally true that E*(c | x) depends on all of the x_t's if it depends on any of them. So the presence of heterogeneity bias will be signalled by a full set of lags and leads. Also, if E*(u | x) = 0, then the wide-sense multivariate regression will have a distinctive pattern:

Cov(y, x') V^{-1}(x) = βI_T + 1λ',

where 1 is a T × 1 vector of ones. The off-diagonal elements within the same column of Π are all equal.
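This pattern is easy to verify numerically. The following sketch is not from the paper: the jointly normal data-generating process, the factor loadings, and all parameter values are invented for illustration. It simulates a panel with a random intercept c correlated with x and checks that the estimated Π is close to βI_T + 1λ':

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200_000, 4, 0.5

# invented DGP: x_it loads on the farm effect c_i with loadings that vary by t
load = np.array([0.2, 0.4, 0.6, 0.8])
c = rng.normal(size=(N, 1))
x = c * load + rng.normal(size=(N, T))
y = beta * x + c + rng.normal(size=(N, T))

# Pi = Cov(y, x') V^{-1}(x): regress each y_t on (x_1, ..., x_T)
x_dm = x - x.mean(axis=0)
y_dm = y - y.mean(axis=0)
Pi = (y_dm.T @ x_dm) @ np.linalg.inv(x_dm.T @ x_dm)

# off-diagonal elements within each column should be (nearly) equal,
# and the diagonal should exceed the column's common off-diagonal value by beta
lam = np.array([np.delete(Pi[:, t], t).mean() for t in range(T)])
print(np.round(Pi, 3))
print(np.round(Pi.diagonal() - lam, 3))
```

Here λ differs across columns (the loadings vary with t), yet within each column the off-diagonal elements agree, which is exactly the testable restriction Π = βI_T + 1λ'.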
A common solution to the bias problem is some form of analysis of covariance.² For example, we can form the farm specific means (ȳ = Σ_{t=1}^T y_t/T, x̄ = Σ_{t=1}^T x_t/T) and the deviations around them (ỹ_t = y_t − ȳ, x̃_t = x_t − x̄), and then run a pooled least squares regression of ỹ on x̃. This is equivalent to first running the least squares regression of ỹ_t on x̃_t for each of the T cross-section samples, and then forming a weighted average of the T slope coefficients. The population counterpart of the tth least squares regression is

E*(ỹ_t | x̃_t) = βx̃_t + E*(u_t − ū | x_t − x̄).

So the least squares regression of ỹ_t on x̃_t provides a consistent (as N → ∞) estimator of β only if E*(u_t − ū | x_t − x̄) = 0. I would not expect this condition to hold unless

E*(u_t − u_{t-1} | x_2 − x_1, ..., x_T − x_{T-1}) = 0,    t = 2, ..., T,
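A small simulation may make the bias and its analysis-of-covariance fix concrete. The code below is an illustrative sketch only; the Gaussian design and all numbers are invented. A single cross-section regression of y_1 on x_1 picks up β + cov(c, x_1)/V(x_1), while the pooled regression on deviations from farm means recovers β under strict exogeneity of u:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 100_000, 5, 0.5

c = rng.normal(size=(N, 1))            # soil quality, known to the farmer
x = 0.8 * c + rng.normal(size=(N, T))  # input choice responds to c
u = rng.normal(size=(N, T))            # rainfall: strictly exogenous
y = beta * x + c + u

# single cross-section: pi_1 = beta + cov(c, x_1)/V(x_1), not beta
pi1 = np.cov(y[:, 0], x[:, 0])[0, 1] / np.var(x[:, 0])

# analysis of covariance: pooled regression on deviations from farm means
yt = y - y.mean(axis=1, keepdims=True)
xt = x - x.mean(axis=1, keepdims=True)
b_within = (yt * xt).sum() / (xt ** 2).sum()

print(pi1, b_within)
```

In this design the population value of pi1 is 0.5 + 0.8/1.64 ≈ 0.99, so the heterogeneity bias is large, while b_within is centered at 0.5.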

²This analysis of covariance estimator was used by Mundlak (1961). Related estimators have been discussed by Balestra and Nerlove (1966), Wallace and Hussain (1969), Amemiya (1971), Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in Chamberlain (1980).

so that x is strictly exogenous when we transform the model to first differences.³ The strict exogeneity restriction is testable since it implies that

E*(y_t − y_{t-1} | x_2 − x_1, ..., x_T − x_{T-1}) = E*(y_t − y_{t-1} | x_t − x_{t-1}),    t = 2, ..., T;

hence there are exclusion restrictions on the linear predictors.

A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,    t = 1, ..., T.

This implies that Π has the form βI_T + 1λ'. These restrictions on Π are testable; we can summarize them by saying that x is strictly exogenous conditional on c. The restrictions would fail to hold in the production function example if u_t is partly predictable from its past, so that E[exp(u_t) | Ω_t] depends on u_{t-1}, u_{t-2}, ....
Now suppose that the technology varies across the farms, so that

y_t = bx_t + c + u_t,

where b is a random variable that is constant over time. We shall refer to b and c as invariant random variables. Our discussion of E*(c | x) indicated that it depends on all of the x_t's if it depends on any of them. I would expect this to be true of E(c | x) as well. This general characteristic of invariant random variables is formulated in the following condition:

Condition (C). Let x* = (x_{t_1}, ..., x_{t_K}), where {t_1, ..., t_K} is some proper subset of {1, ..., T}. Let d be an invariant random variable. Then E(d | x) = E(d | x*) implies that E(d | x) = E(d).
Suppose that the parameter of interest is β = E(b). If b or c is correlated with x, then a least squares regression of y_t on x_t will not provide a consistent estimator of β. We have argued that such a heterogeneity bias will be signalled by a full set of lags and leads when we regress y_t on (x_1, ..., x_T). Under what conditions can we infer that there is no bias if we observe only a contemporaneous relationship? Proposition 1 provides some guidance; it can be extended easily to the case of a finite distributed lag.
Condition (R). Prob(x_n = x_{n-1}) = 0 for some integer n with 2 ≤ n ≤ T.

Proposition 1. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T.

If conditions (R) and (C) are satisfied, then

E(y_t | x) = E(y_t | x_t),    t = 1, ..., T,

implies that E(y_t | x) = βx_t + γ, where β = E(b) and γ = E(c).

Proof. The following equalities hold with probability one:

E(b | x) = [E(y_n | x) − E(y_{n-1} | x)]/(x_n − x_{n-1}) = [E(y_n | x_n) − E(y_{n-1} | x_{n-1})]/(x_n − x_{n-1}).

So E(b | x) = E(b | x_n, x_{n-1}); if T ≥ 3, then (C) implies that E(b | x) = E(b), and

E(c | x) = E(y_1 | x) − E(b | x)x_1 = E(y_1 | x_1) − βx_1;

hence E(c | x) = E(c | x_1) and so E(c | x) = E(c).    Q.E.D.

³The strict exogeneity terminology …

This analysis can be applied to linear transformations of the process. If we find that E(y_t | x) has a full set of lags and leads, then we can ask if that is just due to E(c | x) ≠ E(c). Let Δy_t = y_t − y_{t-1}, Δx_t = x_t − x_{t-1}, and Δx = (Δx_2, ..., Δx_T). Under the assumptions of the proposition, if

E(Δy_t | Δx) = E(Δy_t | Δx_t),

then

E(Δy_t | Δx) = βΔx_t.

Note that it is possible to find E(Δy_t | Δx) = E(Δy_t | Δx_t) even though E(b | x) ≠ E(b). For example, consider the stationary case in which cov(x_t, b) = cov(x_1, b); then E*(b | Δx) = E(b), and so E(b | Δx) = E(b) if the regression function of b on Δx is linear. Then we might find that E(y_t | x) has a full set of lags and leads even though E(Δy_t | Δx) does not.
The condition that prob(x_n = x_{n-1}) = 0 is necessary. For consider the following counter-example: E(b | x) = β_1 if x_1 = ⋯ = x_T, and E(b | x) = β_2 if not (β_1 ≠ β_2). Then E(y_t | x) = E(y_t | x_t) can hold, but β_2 ≠ E(b) unless prob(x_1 = ⋯ = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ⋯ = x_T, although this probability may become negligible for large T.

The following proposition provides insight into this distinction; it is based on a condition that is slightly weaker than (R):

Condition (R′). Prob(x_1 = x_2 = ⋯ = x_T) = 0.

Proposition 2. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T,

where T ≥ 2. Assume that condition (R′) holds and define

b̂ = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Then E(b̂) = E(b) if E(|b̂|) < ∞.⁴
Proof. The following equalities hold with probability one:

E(b̂ | x, b, c) = Σ_{t=1}^T b(x_t − x̄)² / Σ_{t=1}^T (x_t − x̄)² = b;

so if E(|b̂|) < ∞,

E(b̂) = E[E(b̂ | x, b, c)] = E(b).    Q.E.D.

Suppose that (y_i1, ..., y_iT, x_i1, ..., x_iT), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

b̂_i = Σ_{t=1}^T (y_it − ȳ_i)(x_it − x̄_i) / Σ_{t=1}^T (x_it − x̄_i)².

Then if the assumptions of Proposition 2 are satisfied, Σ_{i=1}^N b̂_i/N converges almost surely (a.s.) to E(b) as N → ∞. It is important that b̂_i is an unbiased estimator of E(b), since we are actually taking the unweighted
mean of a
⁴The assumption that E(|b̂|) < ∞ is not innocuous. For example, suppose that V(c) = V(b) = 0 and (x_t, y_t) is independent and identically distributed (t = 1, ..., T) according to a bivariate normal distribution. Then b̂ = b + {V(y_1 | x_1)/[(T−1)V(x_1)]}^{1/2} w, where w has Student's t-distribution with T − 1 degrees of freedom.

large number of these estimators. The lack of bias requires that x be strictly exogenous conditional on b, c. It would not be sufficient to assume that E(y_t | x_t, b, c) = bx_t + c. For example, if x_t = y_{t-1}, then our estimator would not converge to E(b), due to the small-T bias in least squares estimates of an autoregressive process.
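Proposition 2 can also be checked by simulation. In this sketch the random-coefficients design is invented, not taken from the paper: the slope b_i varies across units and is correlated with x, so pooled least squares would be biased, yet the unweighted mean of the unit-specific least squares slopes b̂_i is centered at E(b):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200_000, 6

b = 1.0 + 0.5 * rng.normal(size=(N, 1))   # random slope, E(b) = 1
c = rng.normal(size=(N, 1))
x = 0.5 * b + rng.normal(size=(N, T))     # x correlated with b
y = b * x + c + rng.normal(size=(N, T))

# unit-specific least squares slope through the unit's own means
xt = x - x.mean(axis=1, keepdims=True)
yt = y - y.mean(axis=1, keepdims=True)
b_i = (yt * xt).sum(axis=1) / (xt ** 2).sum(axis=1)

print(b_i.mean())   # close to E(b) = 1
```

With T = 6 and continuously distributed x, the denominator is non-zero with probability one and E(|b̂_i|) < ∞, so the unweighted average is consistent for E(b).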
Let D_i = 0 if x_i1 = ⋯ = x_iT, and D_i = 1 if not. We can compute b̂_i only for the group with D_i = 1. The sample mean of b̂_i for that group converges a.s. to E(b | D = 1), but we have no information on E(b | D = 0). So unless prob(D = 0) = 0, any value for E(b) is consistent with a given value for E(b | D = 1).⁵

If x_t has a continuous distribution, then the assumption that the regression function is linear (E(y_t | x_t, b, c) = bx_t + c) is very restrictive; the implication of this assumption (combined with strict exogeneity) is that we can obtain an unbiased estimator for b, and hence a consistent (as N → ∞) estimator for E(b). If x_t is a binary variable, then the assumption of linear regression is not restrictive at all; but there are fewer implications, since there is positive probability that b̂ is not defined for finite T.
The following extension of Proposition 1 to the case of a finite distributed lag is straightforward:⁶

Proposition 1′. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^J b_j x_{t-j} + c,    t = J+1, ..., T.

If conditions (R) and (C) are satisfied, then

E(y_t | x) = E(y_t | x_t, ..., x_{t-J}),    t = J+1, ..., T,

⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1 Σ_{t=1}^T x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T-1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.

⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.

implies that

E(y_t | x) = Σ_{j=0}^J β_j x_{t-j} + γ,

where β_j = E(b_j) = E(b_j | x), j = 0, ..., J, and γ = E(c).

The extension of Proposition 2 is also straightforward. There are new issues, however, in the infinite lag case, which we shall take up next.

Large number of lags. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^∞ β_j x_{t-j} + c,

where σ(x) is the information set (σ-field) generated by {..., x_{-1}, x_0, x_1, ...}, and Σ_{j=0}^J β_j x_{t-j} converges in mean square as J → ∞. Consider a regression version of the Sims (1972) condition for x to be strictly exogenous (y does not cause x),

E(y_t | σ(x)) = E(y_t | x_t, x_{t-1}, ...).

Does this condition imply that E(c | σ(x)) = E(c), so that there is no heterogeneity bias?

We shall consider this question in the context of a (strictly) stationary stochastic process. Since c does not change over time, it is an invariant random variable. The following proposition is proved in appendix A:
Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t-1}, ...),

where t is any integer.
It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t-1}, ...) + Σ_{j=0}^∞ β_j x_{t-j} = E(y_t | x_t, x_{t-1}, ...).

So we cannot detect the heterogeneity bias in this way: if a large number of lags have been included, then a small number of leads provides no additional information on c.
We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T' x_T,

where λ_T' = (λ_{T1}, ..., λ_{TT}) and x_T' = (x_1, ..., x_T). Stationarity implies that λ_T = τV^{-1}(x_T)1, where τ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T' x_T ≈ κ_T Σ_{t=1}^T x_t for some scalar κ_T. For example, if x_t = ρx_{t-1} + v_t, where v_t is serially uncorrelated with variance σ_v², then

λ_T' x_T ≈ [τ(1 − ρ)²/σ_v²] Σ_{t=1}^T x_t.

Now in this example, λ_T' x_T does not approach a limit as T → ∞ unless τ = cov(x_1, c) = 0. In fact cov(x_1, c) is zero here, since there is a non-trivial linear predictor only if Σ_{j=0}^J x_{t-j}/J converges to a non-degenerate random variable as J → ∞.

The general case is covered by the following proposition:

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | x_t, x_{t-1}, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^J x_{t-j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄)    if V(x̄) ≠ 0,
  = 0                   if V(x̄) = 0,

and ψ = E(d) − λE(x̄).

(See appendix A for proof.)

The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.
It follows from Proposition 4 that

E*(y_t | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t-1}, ...) = ψ + Σ_{j=0}^J (β_j + λ/J) x_{t-j} + r(J),

where r(J) converges in mean square to zero as J → ∞. So y does not cause x according to Sims' definition; but this does not imply that c is uncorrelated with the x's. If we include a large number of lags, then the bias in any one coefficient is a negligible λ/J, but the bias in the sum of the lag coefficients tends to λ as J → ∞. If we include K leads, then the sum of their coefficients is approximately Kλ/J, which is close to zero when J is much larger than K. If the β_j are zero for j > J*, then the lag coefficients beyond that point will be close to zero but their sum will be close to λ.
Under the stationarity assumption, there are non-degenerate invariant random variables if and only if the process is not ergodic. The basic result here is the (pointwise) ergodic theorem: Let g be a random variable on (Ω, F, P) with E(|g|) < ∞, and let g_t(ω) = g(S^t ω), where S is the shift transformation (see appendix A); then the following limit exists a.s.:

lim_{T→∞} Σ_{t=1}^T g_t/T = ḡ.

The limit ḡ is an invariant random variable; it is the expectation of g conditional on 𝒥, where 𝒥 is the information set (σ-field) generated by all of the invariant random variables. If V(ḡ) ≠ 0 for some g, then the process is not ergodic. In the ergodic case, all of the invariant random variables have degenerate distributions.
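A non-ergodic stationary process is easy to exhibit numerically. In the sketch below (invented for illustration, not from the paper), x_t = a + v_t, where a is an invariant random variable: each realization of the process has its own level, so the time average converges to a rather than to the constant E(x_t), and V(x̄) = V(a) ≠ 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, T = 8, 200_000

a = rng.normal(size=(n_paths, 1))   # invariant random variable: one draw per path
v = rng.normal(size=(n_paths, T))
x = a + v                           # strictly stationary, but not ergodic

xbar = x.mean(axis=1)               # time average along each path
print(np.round(np.c_[a.ravel(), xbar], 3))   # the two columns nearly coincide
```

Across realizations the time averages differ, which is exactly the non-degenerate invariant component that a correlated c can load on.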

Suppose that

E(y_t | σ(x), b, c) = bx_t + c,

and let

b̂_T = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Recall condition (R′): prob(x_1 = ⋯ = x_T) = 0. We want to examine the significance of condition (R′) as T → ∞ in the stationary case. Note that a limiting version of condition (R′) is

prob[V(x_1 | 𝒥) = 0] = 0.

If this condition holds, then

lim_{T→∞} b̂_T = [E(x_1 y_1 | 𝒥) − E(x_1 | 𝒥)E(y_1 | 𝒥)] / [E(x_1² | 𝒥) − [E(x_1 | 𝒥)]²] = b    a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | 𝒥) = 0, then the identification problem is more difficult. There is no information on b for the stayers; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about part of the b distribution.

3. Estimation

Consider a sample r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_i x_i') non-singular. Consider the minimum mean square error linear predictors,

E*(y_im | x_i) = π_m' x_i,    m = 1, ..., M,

with

E*(y_i | x_i) = Πx_i,    Π = (π_1, ..., π_M)'.⁷

We want to estimate Π subject to restrictions and to test those restrictions. For example, we may want to test whether a submatrix of Π has the form βI + 1λ'. I think that analysis of covariance estimation should be accompanied by such a test.

We shall not assume that the regression function E(y_i | x_i) is linear. For although E(y_i | x_i, c_i) may be linear (indeed, we hope that it is), there is generally

⁷This agrees with the definition in section 2 if x_i includes a constant.

no reason to insist that E(c_i | x_i) is linear. So we shall present a theory of inference for linear predictors. Furthermore, even if the regression function is linear, there may be heteroskedasticity due to random coefficients, for example.⁸ So we shall allow V(y_i | x_i) to be an arbitrary function of x_i.

3.1. The estimation of linear predictors

Let w_i be the vector formed from the distinct elements of r_i r_i' that have non-zero variance. Since r_i' = (x_i', y_i') is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π [π = vec(Π)]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^N w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec[(Σ_{i=1}^N x_i x_i')^{-1} Σ_{i=1}^N x_i y_i'].

By the strong law of large numbers, w̄ converges almost surely to μ⁰ as N → ∞, where μ⁰ is the true value of μ. Let π⁰ = h(μ⁰). Since h(μ) is continuous at μ = μ⁰, we have π̂ → π⁰ a.s. The central limit theorem implies that

√N(w̄ − μ⁰) → N(0, V(w_i))

in distribution. Since h(μ) is differentiable at μ = μ⁰, it follows⁹ that

√N(π̂ − π⁰) → N(0, Ω),

where

Ω = [∂h(μ⁰)/∂μ'] V(w_i) [∂h(μ⁰)/∂μ']'.

We have derived the limiting distribution of the least squares estimator. This approach was used by Cramér (1946) to obtain limiting normal distributions for sample correlation and regression coefficients (p. 367); he presents an explicit formula for the variance of the limiting distribution of a sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and Goldberger (1974) present the formula for the variance of the limiting distribution of a simple regression coefficient.

⁸Anderson (1969, 1970), Swamy (1970, 1974), Hsiao (1975), and Mundlak (1978a) discuss estimators that incorporate the particular form of heteroskedasticity that is generated by random coefficients.

⁹See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).
Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_i x_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / [Σ_{i=1}^N x_i²/N].

The definition of π gives E[(y_i − π⁰x_i)x_i] = 0, and so the central limit theorem implies that we obtain the same limiting distribution by working with

[Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / E(x_i²).

This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we show that

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)' ⊗ Φ_x^{-1} x_i x_i' Φ_x^{-1}],    (1)

where Φ_x = E(x_i x_i').
A consistent estimator of Ω is available from the corresponding sample moments:

Ω̂ = (1/N) Σ_{i=1}^N [(y_i − Π̂x_i)(y_i − Π̂x_i)' ⊗ S_x^{-1} x_i x_i' S_x^{-1}],    (2)

where S_x = Σ_{i=1}^N x_i x_i'/N.
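Eq. (2) is straightforward to compute. The sketch below uses an invented design, not the paper's application; it stacks r_i ⊗ (S_x^{-1} x_i), where r_i = y_i − Π̂x_i, so that Ω̂ is an average of outer products (only the ordering of vec(Π̂) is a convention here):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, M = 50_000, 3, 2

x = rng.normal(size=(N, K))
# heteroskedastic errors, so the robust formula differs from the standard one
e = rng.normal(size=(N, M)) * (0.5 + x[:, :1] ** 2)
Pi0 = np.array([[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]])
y = x @ Pi0.T + e

Sx = x.T @ x / N
Pi_hat = np.linalg.solve(Sx, x.T @ y / N).T          # M x K least squares coefficients

r = y - x @ Pi_hat.T                                 # residuals
h = x @ np.linalg.inv(Sx)                            # rows are (S_x^{-1} x_i)'
G = np.einsum('im,ik->imk', r, h).reshape(N, M * K)  # row i: r_i kron (S_x^{-1} x_i)
Omega_hat = G.T @ G / N                              # eq. (2)

print(Omega_hat.shape)
```

Writing Ω̂ as G'G/N makes it clear that the estimator is symmetric and positive semi-definite by construction.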

If the regression function E(y_i | x_i) is linear, then

Ω = E[V(y_i | x_i) ⊗ Φ_x^{-1} x_i x_i' Φ_x^{-1}].

If, in addition, V(y_i | x_i) is uncorrelated with x_i x_i', then

Ω = E[V(y_i | x_i)] ⊗ Φ_x^{-1}.

If the conditional variance function is homoskedastic, so that V(y_i | x_i) does not depend on x_i, then

Ω = V(y_i | x_i) ⊗ Φ_x^{-1}.

3.2. Imposing restrictions

Since Π is a function of E(w_i), restrictions on Π imply restrictions on E(w_i). Let the dimension of μ = E(w_i) be q. We shall specify the restrictions by the condition that μ depends only on a p × 1 vector θ of unknown parameters: μ = g(θ), where g is a known function and p ≤ q. The domain of θ is Θ, a subset of p-dimensional Euclidean space (R^p) that contains the true value θ⁰. So the restrictions imply that μ⁰ = g(θ⁰) is confined to a certain subset of R^q.

We can impose the restrictions by using a minimum distance estimator: choose θ̂ to

min_{θ∈Θ} [w̄ − g(θ)]' A_N [w̄ − g(θ)],

where A_N → Ψ a.s. and Ψ is positive definite.

The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958).¹¹ For completeness, we shall state a set of regularity conditions and the properties that they imply:

Assumption 1. a_N → g(θ⁰) a.s.; Θ is a compact subset of R^p that contains θ⁰; g is continuous on Θ, and g(θ) = g(θ⁰) for θ ∈ Θ implies that θ = θ⁰; A_N → Ψ a.s., where Ψ is positive definite.

Assumption 2. √N[a_N − g(θ⁰)] → N(0, Δ) in distribution; Θ contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ'.

Choose θ̂ to

min_{θ∈Θ} [a_N − g(θ)]' A_N [a_N − g(θ)].

Proposition 6. If Assumption 1 is satisfied, then θ̂ → θ⁰ a.s.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂ − θ⁰) → N(0, Λ), where

Λ = (G'ΨG)^{-1} G'ΨΔΨG (G'ΨG)^{-1}.

If Δ is positive definite, then Λ − (G'Δ^{-1}G)^{-1} is positive semi-definite; hence an optimal choice for Ψ is Δ^{-1}.

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N → Δ^{-1} a.s., then

N[a_N − g(θ̂)]' A_N [a_N − g(θ̂)] → χ²(q − p)

in distribution. (This is extended to the case of nested restrictions in Proposition 8′, appendix B.)¹²
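As a toy illustration of the minimum distance machinery and of Proposition 8 (all numbers invented, not from the paper): two noisy sample moments share one mean, so q = 2 and p = 1. Using A_N equal to the inverse of the estimated variance of w̄ (the optimal norm, already scaled by N), the estimator has a closed form and the minimized distance is asymptotically χ²(1):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000

# two measurements of a common mean theta0, with unequal variances
theta0 = 2.0
w = theta0 + rng.normal(size=(N, 2)) * np.array([1.0, 3.0])

wbar = w.mean(axis=0)
A = np.linalg.inv(np.cov(w.T) / N)    # A_N: inverse of estimated V(wbar); the N scaling
                                      # plays the role of the N factor in Proposition 8

# restriction: mu = g(theta) = (theta, theta)'; the minimum has a closed form
ones = np.ones(2)
theta_hat = (ones @ A @ wbar) / (ones @ A @ ones)

d = wbar - theta_hat * ones
stat = d @ A @ d                      # asymptotically chi-square, q - p = 1 d.o.f.
print(theta_hat, stat)
```

The closed form is the usual precision-weighted average, so the less noisy moment gets most of the weight; a least squares regression that simply ignored the second moment would correspond to a different (suboptimal) norm.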

Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is Θ_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ: choose δ̂ to

min_δ [π̂ − f(δ)]' Ω̂^{-1} [π̂ − f(δ)],

¹¹Since the proofs are simple, we shall keep the paper self-contained and include them in appendix B. The proofs are based on Chiang (1956), Ferguson (1958), and Malinvaud (1970, ch. 9).

where Ω̂ is given in eq. (2), and we assume that Θ_δ and f satisfy Assumptions 1 and 2. Then

√N(δ̂ − δ⁰) → N(0, (F'Ω^{-1}F)^{-1}),

where F = ∂f(δ⁰)/∂δ'.

We can also estimate δ⁰ by applying the minimum distance procedure to w̄ instead of to π̂. Suppose that the components of w_i are arranged so that w_i' = (w_i1', w_i2'), where w_i1 contains the components of x_i x_i'. Partition μ = E(w_i) conformably: μ' = (μ_1', μ_2'). Set θ' = (θ_1', θ_2') = (δ', μ_1'). Assume that V(w_i) is positive definite. Now choose θ̂ to

min_θ [w̄ − g(θ)]' V̂^{-1}(w_i) [w̄ − g(θ)],

where g_1(δ, μ_1) = μ_1. Then θ̂_1 gives an estimator of δ; it has the same limiting distribution as the estimator δ̂ that we obtained by applying the minimum distance procedure to π̂. (See Proposition 9, appendix B.)

This framework leads to some surprising results on efficient estimation. For a simple example, we shall use a univariate linear predictor model,

E*(y_i | x_i1, x_i2) = π_0 + π_1 x_i1 + π_2 x_i2.

Consider imposing the restriction π_2 = 0. Then the conventional estimator of π_1 is b_{yx_1}, the slope coefficient in the least squares regression of y on x_1. We shall show that this estimator is generally less efficient than the minimum distance estimator if the regression function is nonlinear or if there is heteroskedasticity.
Let fi,,it, be the slope coefficients in the least squares multiple regression
of y on x1,x2. The minrmum distance estimator
of a, under the restriction
rrZ =0 can be obtained
as 6=72r +r&
where r is chosen to minimize
the

(estimated) variance of the limiting distribution of δ̂; this gives

δ̂ = π̂₁ − (ω̂₁₂/ω̂₂₂)π̂₂,

where ω̂_jk is the estimated covariance between π̂_j and π̂_k in their limiting distribution. Since π̂₁ = b_{yx₁} − π̂₂b_{x₂x₁}, we have

δ̂ = b_{yx₁} − (b_{x₂x₁} + ω̂₁₂/ω̂₂₂)π̂₂,

and if E(yᵢ | xᵢ₁, xᵢ₂) is linear and V(yᵢ | xᵢ₁, xᵢ₂) = σ², then ω₁₂/ω₂₂ = −cov(xᵢ₁, xᵢ₂)/V(xᵢ₁) = −b_{x₂x₁} and δ̂ = b_{yx₁}. But in general δ̂ ≠ b_{yx₁}, and δ̂ is more efficient than b_{yx₁}. The source of the efficiency gain is that the limiting distribution for π̂₂ has a zero mean (if π₂ = 0), and so we can reduce variance without introducing any bias if π̂₂ is correlated with b_{yx₁}. Under the assumptions of linear regression and homoskedasticity, b_{yx₁} and π̂₂ are uncorrelated; but this need not be true in the more general framework that we are using.
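A numerical sketch of this efficiency argument (illustrative, not from the paper; the data-generating process and tolerances are assumptions): fit y on (x₁, x₂), estimate the heteroskedasticity-robust covariance ω̂ of (π̂₁, π̂₂), and form δ̂ = π̂₁ − (ω̂₁₂/ω̂₂₂)π̂₂, which in general differs from the conventional b_{yx₁}.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)
# True model has pi2 = 0, with heteroskedastic errors.
y = 1.0 + 0.8 * x1 + rng.normal(size=N) * (1.0 + x2**2)

X = np.column_stack([np.ones(N), x1, x2])
pi_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # (pi0, pi1, pi2)
u = y - X @ pi_hat

# Heteroskedasticity-robust (White) covariance of the coefficients.
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ (X.T * u**2) @ X @ XtX_inv
omega12, omega22 = V[1, 2], V[2, 2]

delta_hat = pi_hat[1] - (omega12 / omega22) * pi_hat[2]  # minimum distance
b_yx1 = np.cov(y, x1)[0, 1] / np.var(x1)                 # conventional estimator
```

Both estimators are consistent for π₁ here; the gain from δ̂ shows up in its sampling variance across repeated samples.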

3.3. Simultaneous equations: A generalization of two- and three-stage least squares

Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that (yᵢ, zᵢ, xᵢ₁, xᵢ₂) is i.i.d. according to some distribution with finite fourth moments, and that

yᵢ = δzᵢ + uᵢ,

where E(uᵢxᵢ₁) = E(uᵢxᵢ₂) = 0. Assume also that E(zᵢxᵢ₁) ≠ 0, E(zᵢxᵢ₂) ≠ 0. Then there are two instrumental variable estimators that both converge a.s. to δ:

δ̂_j = Σᵢ₌₁ᴺ yᵢxᵢⱼ / Σᵢ₌₁ᴺ zᵢxᵢⱼ,  j = 1, 2,

√N[(δ̂₁, δ̂₂)' − (δ, δ)'] →d N(0, Λ),

where the j,k element of Λ is

λ_jk = E[(yᵢ − δzᵢ)²xᵢⱼxᵢₖ] / [E(zᵢxᵢⱼ)E(zᵢxᵢₖ)],  j,k = 1, 2.


The two-stage least squares estimator combines δ̂₁ and δ̂₂ by forming ẑᵢ = π̂₁xᵢ₁ + π̂₂xᵢ₂, based on the least squares regression of z on x₁, x₂ (assume that E[(xᵢ₁, xᵢ₂)'(xᵢ₁, xᵢ₂)] is non-singular):

δ̂_2SLS = Σᵢ₌₁ᴺ ẑᵢyᵢ / Σᵢ₌₁ᴺ ẑᵢzᵢ.

Since π̂ⱼ →a.s. πⱼ, √N(δ̂_2SLS − δ) has the same limiting distribution as √N[α(δ̂₁ − δ) + (1 − α)(δ̂₂ − δ)], where

α = π₁E(zᵢxᵢ₁) / [π₁E(zᵢxᵢ₁) + π₂E(zᵢxᵢ₂)].

This suggests finding the τ that minimizes the variance of the limiting distribution of √N[τ(δ̂₁ − δ) + (1 − τ)(δ̂₂ − δ)]. The same distribution can be obtained by using a minimum distance estimator: choose δ̂ to

min_δ [(δ̂₁, δ̂₂)' − (δ, δ)']'Λ̂⁻¹[(δ̂₁, δ̂₂)' − (δ, δ)'].

This gives δ̂ = τ̂δ̂₁ + (1 − τ̂)δ̂₂, where

τ̂ = (λ̂¹¹ + λ̂¹²)/(λ̂¹¹ + 2λ̂¹² + λ̂²²),

and λ̂^{jk} is the j,k element of Λ̂⁻¹. The estimator obtained by using a consistent estimator of Λ has the same limiting distribution.

In general τ ≠ α, since τ is a function of fourth moments and α is not. Suppose, for example, that zᵢ = xᵢ₂. Then α = 0 but τ ≠ 0 unless

E{(yᵢ − δzᵢ)²xᵢ₂[xᵢ₁/E(xᵢ₁xᵢ₂) − xᵢ₂/E(xᵢ₂²)]} = 0.
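The optimal combination of the two instrumental variable estimators can be sketched as follows (illustrative, not from the paper; the simulated design is an assumption): compute δ̂₁ and δ̂₂, estimate Λ from residuals and fourth moments, and weight by τ̂ = (λ̂¹¹ + λ̂¹²)/(λ̂¹¹ + 2λ̂¹² + λ̂²²).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
delta0 = 1.5
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
z = x1 + x2 + rng.normal(size=N)                # z correlated with both instruments
u = rng.normal(size=N) * (1 + 0.5 * np.abs(x1)) # heteroskedastic error
y = delta0 * z + u

# The two just-identified IV estimators.
d1 = (y @ x1) / (z @ x1)
d2 = (y @ x2) / (z @ x2)

# Estimate the joint asymptotic covariance Lambda of (d1, d2).
e = y - d1 * z   # any consistent residual works
Lam = np.empty((2, 2))
for j, xj in enumerate((x1, x2)):
    for k, xk in enumerate((x1, x2)):
        Lam[j, k] = np.mean(e**2 * xj * xk) / (np.mean(z * xj) * np.mean(z * xk))

L = np.linalg.inv(Lam)
tau = (L[0, 0] + L[0, 1]) / (L[0, 0] + 2 * L[0, 1] + L[1, 1])
d_md = tau * d1 + (1 - tau) * d2   # minimum distance combination
```

Because Λ̂ involves fourth moments through e², τ̂ generally differs from the 2SLS weight α.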

If we add another equation, then we can consider the conventional three-stage least squares estimator. Its limiting distribution is derived in appendix B (Proposition 5); however, viewed as a minimum distance estimator, it is using the wrong norm in general.

Consider the standard simultaneous equations model:

yᵢ = Πxᵢ + vᵢ,  E(vᵢxᵢ') = 0,

Γyᵢ + Bxᵢ = uᵢ,

where ΓΠ + B = 0 and Γvᵢ = uᵢ. We are continuing to assume that yᵢ is M × 1, xᵢ is K × 1, rᵢ' = (xᵢ', yᵢ') is i.i.d. according to a distribution with finite fourth moments (i = 1,...,N), and that E(xᵢxᵢ') is non-singular. There are restrictions on Γ and B: m(Γ, B) = 0, where m is a known function. Assume that the implied restrictions on Π can be specified by the condition that π = vec(Π') = f(δ), where the domain of δ is T_δ, a subset of R^s that includes the true value δ⁰ (s ≤ MK). Assume that T_δ and f satisfy Assumptions 1 and 2; these properties could be derived from regularity conditions on m, as in Malinvaud (1970, prop. 2, p. 670). Choose δ̂ to

min_{δ∈T_δ} [π̂ − f(δ)]'Ω̂⁻¹[π̂ − f(δ)],

where Ω̂ is given in eq. (2); we assume that Ω in eq. (1) is positive definite. Let F = ∂f(δ⁰)/∂δ'. Then we have √N(δ̂ − δ⁰) →d N(0, Λ), where Λ = (F'Ω⁻¹F)⁻¹. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if vᵢvᵢ' is uncorrelated with xᵢxᵢ', so that Ω = E(vᵢvᵢ') ⊗ [E(xᵢxᵢ')]⁻¹ (vᵢ = yᵢ − Πxᵢ).
Now suppose that the only restrictions on Γ and B are that certain coefficients are zero, together with the normalization restrictions that the coefficient of y_im in the mth structural equation is one. Then we can give an explicit formula for Λ. Write the mth structural equation as

y_im = δ_m'z_im + u_im,

where the components of z_im are the variables in yᵢ and xᵢ that appear in the mth equation with unknown coefficients. Let there be M structural equations and assume that the true value Γ⁰ is non-singular. Let δ = (δ₁',...,δ_M')' be s × 1, and let Γ(δ) and B(δ) be parametric representations of Γ and B that satisfy the zero restrictions and the normalization rule. We can choose a compact set T_δ ⊂ R^s containing a neighborhood of the true value δ⁰, such that Γ(δ) is non-singular for δ ∈ T_δ. Then π = f(δ), where f(δ) = vec{[−Γ⁻¹(δ)B(δ)]'}. Assume that f(δ) = π⁰ implies that δ = δ⁰, so that the structural parameters are identified. Then T_δ and f satisfy Assumptions 1 and 2, and


√N(δ̂ − δ⁰) →d N(0, Λ). The formula for ∂π/∂δ' is given in Rothenberg (1973, p. 69):

∂π/∂δ' = −(Γ'⁻¹ ⊗ I_K)(I_M ⊗ Φ_xx⁻¹)Φ_zx',

where Φ_zx is block-diagonal: Φ_zx = diag{E(z_i1xᵢ'), ..., E(z_iMxᵢ')}, and Φ_xx = E(xᵢxᵢ'). So we have

Λ = {Φ_zx[E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ')]⁻¹Φ_zx'}⁻¹,

which reduces to

Λ = {Φ_zx[E(uᵢ⁰uᵢ⁰') ⊗ Φ_xx]⁻¹Φ_zx'}⁻¹,

the conventional asymptotic covariance matrix for three-stage least squares [Zellner and Theil (1962)], if uᵢ⁰uᵢ⁰' is uncorrelated with xᵢxᵢ'.
I shall present a generalization of three-stage least squares that has the same limiting distribution as the generalized minimum distance estimator. Let β = vec(B') and note that π = −(Γ⁻¹ ⊗ I_K)β. Then we have

[π̂ + (Γ⁻¹ ⊗ I_K)β]'Ω⁻¹[π̂ + (Γ⁻¹ ⊗ I_K)β] = [(Γ ⊗ I_K)π̂ + β]'Θ⁻¹[(Γ ⊗ I_K)π̂ + β],

where

Θ = (I ⊗ Φ_xx⁻¹)E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ')(I ⊗ Φ_xx⁻¹).

Let S_xx = N⁻¹Σᵢxᵢxᵢ' and form the matrix

Ψ̂ = N⁻¹Σᵢ ûᵢûᵢ' ⊗ xᵢxᵢ',  ûᵢ = Γ̂yᵢ + B̂xᵢ,

where Γ̂ →a.s. Γ⁰, B̂ →a.s. B⁰.

Now replace Θ by

Θ̂ = (I ⊗ S_xx⁻¹)Ψ̂(I ⊗ S_xx⁻¹),

and note that the mth block of (I ⊗ S_xx)[(Γ(δ) ⊗ I_K)π̂ + β(δ)] is N⁻¹Σᵢxᵢ(y_im − δ_m'z_im). Then we have the following distance function:

N[s_xy − S_zx'δ]'Ψ̂⁻¹[s_xy − S_zx'δ],

where S_zx = diag{N⁻¹Σᵢz_i1xᵢ', ..., N⁻¹Σᵢz_iMxᵢ'} and the mth block of s_xy is N⁻¹Σᵢxᵢy_im. This corresponds to Basmann's (1965) interpretation of three-stage least squares.¹³ Minimizing with respect to δ gives

δ̂_G3 = (S_zxΨ̂⁻¹S_zx')⁻¹(S_zxΨ̂⁻¹s_xy).

The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. √N(δ̂_G3 − δ⁰) →d N(0, Λ), where Λ = (Φ_zxΨ⁻¹Φ_zx')⁻¹ and Ψ = E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ'). This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.

Finally, we shall consider the generalization of two-stage least squares. Suppose that

y_i1 = δ₁'z_i1 + u_i1,

where E(xᵢu_i1) = 0, z_i1 is s₁ × 1, and rank[E(xᵢz_i1')] = s₁. We complete the system by setting

y_im = π_m'xᵢ + u_im,

where E(xᵢu_im) = 0 (m = 2,...,M). So z_im = xᵢ (m = 2,...,M). Let δ = (δ₁', π₂',...,π_M')' and apply the minimum distance procedure to obtain δ̂; since we are ignoring any restrictions on π_m (m = 2,...,M), δ̂₁ is a limited information minimum distance estimator.

¹³See Rothenberg (1973, p. 82). A more general derivation of this distance function can be obtained by following Hansen (1982). Also see White (1982).


We have √N(δ̂₁ − δ₁⁰) →d N(0, Λ₁₁), and evaluating the partitioned inverse gives

Λ₁₁ = {E(z_i1xᵢ')[E((u_i1⁰)²xᵢxᵢ')]⁻¹E(xᵢz_i1')}⁻¹,   (4)

where u_i1⁰ = y_i1 − δ₁⁰'z_i1.
We can obtain the same limiting distribution by using the following generalization of two-stage least squares. Let

Ψ̂₁₁ = N⁻¹Σᵢ û_i1²xᵢxᵢ',  û_i1 = y_i1 − δ̂₁'z_i1,

where δ̂₁ →a.s. δ₁⁰ (for example, δ̂₁ could be an instrumental variable estimator); then

δ̂_G2 = (Z₁'XΨ̂₁₁⁻¹X'Z₁)⁻¹(Z₁'XΨ̂₁₁⁻¹X'y₁),

where X is the N × K matrix whose ith row is xᵢ', Z₁ is the N × s₁ matrix whose ith row is z_i1', and y₁ = (y₁₁,...,y_N1)'. This is the estimator of δ₁ that we obtain by applying generalized three-stage least squares to the completed system, with no restrictions on π_m (m = 2,...,M). The limiting distribution of this estimator is derived in appendix B (Proposition 5):

Proposition 11. √N(δ̂_G2 − δ₁⁰) →d N(0, Λ₁₁), where Λ₁₁ is given in eq. (4). This generalized two-stage least squares estimator is asymptotically efficient in the class of limited information minimum distance estimators.
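A sketch of this generalized two-stage least squares estimator (illustrative, not from the paper; the simulated system and the choice of preliminary estimator are assumptions): a single equation with one endogenous regressor and three instruments, a preliminary 2SLS fit to form residuals, then the reweighted estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 50_000, 3                       # three instruments, one regressor
X = rng.normal(size=(N, K))
v = rng.normal(size=N)
z1 = X @ np.array([1.0, 0.7, -0.5]) + v                 # endogenous regressor
u1 = 0.8 * v + rng.normal(size=N) * (1 + X[:, 0]**2)    # correlated, heteroskedastic
delta0 = 2.0
y1 = delta0 * z1 + u1
Z1 = z1[:, None]

# Preliminary consistent estimate (ordinary 2SLS) to form residuals.
P = X @ np.linalg.solve(X.T @ X, X.T @ Z1)
d_2sls = np.linalg.lstsq(P, y1, rcond=None)[0].item()
uhat = y1 - d_2sls * z1

# Psi_11 = N^{-1} sum uhat_i^2 x_i x_i'; then the generalized 2SLS estimator.
Psi = (X * uhat[:, None]**2).T @ X / N
A = Z1.T @ X / N                    # sample version of E(z_i1 x_i')
b = X.T @ y1 / N                    # sample version of E(x_i y_i1)
Pi = np.linalg.inv(Psi)
d_g2 = np.linalg.solve(A @ Pi @ A.T, A @ Pi @ b).item()
```

Under homoskedasticity the two estimators coincide asymptotically; with heteroskedastic errors the reweighted version attains the smaller variance in eq. (4).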
3.4. Asymptotic efficiency: A comparison with the quasi-maximum likelihood estimator
Assume that rᵢ is i.i.d. (i = 1,...,N) from a distribution with E(rᵢ) = τ, V(rᵢ) = Σ, where Σ is a J × J positive definite matrix; the fourth moments are finite. Suppose that we wish to estimate functions of Σ subject to restrictions. Let σ = vec(Σ) and express the restrictions by the condition that σ = g(θ), where g is a function from Θ into R^q with a domain Θ ⊂ R^p that contains the true value θ⁰ (q = J²; p ≤ J(J + 1)/2). Let

S = N⁻¹Σᵢ₌₁ᴺ (rᵢ − r̄)(rᵢ − r̄)',  r̄ = N⁻¹Σᵢ₌₁ᴺ rᵢ.

If the distribution of rᵢ is multivariate normal, then, apart from an additive constant, the log-likelihood function is

L(τ, θ) = −(N/2){ln det Σ(θ) + tr[Σ⁻¹(θ)S] + (r̄ − τ)'Σ⁻¹(θ)(r̄ − τ)}.

If there are no restrictions on τ, then the maximum likelihood estimator of θ is a solution to the following problem: choose θ̂ to solve

min_θ {ln det Σ(θ) + tr[Σ⁻¹(θ)S]}.

We shall derive the properties of this estimator when the distribution of rᵢ is not necessarily normal; in that case we shall refer to the estimator as a quasi-maximum likelihood estimator (θ̂_QML).¹⁴

MaCurdy (1979) considered a version of this problem and showed that, under suitable regularity conditions, √N(θ̂_QML − θ⁰) has a limiting normal distribution; the covariance matrix, however, is not given by the standard information matrix formula. We would like to compare this distribution with the distribution of the minimum distance estimator. This comparison is based on theorem 1 in Ferguson (1958). In our notation, Ferguson considers the following problem: choose θ̂ to solve

W(s, θ)[s − g(θ)] = 0,

where s = vec(S). He derives the limiting distribution of √N(θ̂ − θ⁰) under regularity conditions on the functions W and g. These regularity conditions are particularly simple in our problem since W does not depend on S. We can state them as follows:

Assumption 3. Θ₀ ⊂ R^p is an open set containing θ⁰; g is a continuous, one-to-one mapping of Θ₀ into R^q with a continuous inverse; g has continuous second partial derivatives in Θ₀; rank[∂g(θ)/∂θ'] = p for θ ∈ Θ₀; Σ(θ) is non-singular for θ ∈ Θ₀.

In addition, we shall need s →a.s. g(θ⁰) and the central limit theorem result that

√N[s − g(θ⁰)] →d N(0, Δ),

where Δ = V[(rᵢ − τ⁰) ⊗ (rᵢ − τ⁰)]. Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within Θ₀ for sufficiently large N, and √N(θ̂_QML − θ⁰) →d N(0, Λ), where

Λ = (G'ΨG)⁻¹G'ΨΔΨG(G'ΨG)⁻¹,

and G = ∂g(θ⁰)/∂θ', Ψ = (Σ⁰ ⊗ Σ⁰)⁻¹. It will be convenient to rewrite this, imposing the symmetry restrictions on Σ. Let σ* be the J(J + 1)/2 × 1 vector formed by stacking the columns of the lower triangle of Σ. We can define a J² × [J(J + 1)/2] matrix T such that σ = Tσ*. The elements in each row of T are all 0 except for a single element which is one; T has full column rank. Let s = Ts*, g(θ) = Tg*(θ), G* = ∂g*(θ⁰)/∂θ', Ψ* = T'ΨT; then √N[s* − g*(θ⁰)] →d N(0, Δ*), where Δ* is the covariance matrix of the vector formed from the columns of the lower triangle of (rᵢ − τ⁰)(rᵢ − τ⁰)'. Now we can set

Λ = (G*'Ψ*G*)⁻¹(G*'Ψ*Δ*Ψ*G*)(G*'Ψ*G*)⁻¹.

Consider the following minimum distance estimator: choose θ̂_MD to

min_{θ∈T} [s* − g*(θ)]'A_N[s* − g*(θ)],

where T is a compact subset of Θ₀ that contains a neighborhood of θ⁰ and A_N →a.s. Ψ*. Then the following result is implied by Proposition 7:

Proposition 12. If Assumption 3 is satisfied, then √N(θ̂_QML − θ⁰) has the same limiting distribution as √N(θ̂_MD − θ⁰).

¹⁴The quasi-maximum likelihood terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).

If Δ* is non-singular, an optimal minimum distance estimator has A_N →a.s. ζΔ*⁻¹, where ζ is an arbitrary positive real number. If the distribution of rᵢ is normal, then Δ*⁻¹ = ½Ψ*; but in general Δ*⁻¹ is not proportional to Ψ*, since Δ* depends on fourth moments and Ψ* is a function of second moments. So in general θ̂_QML is less efficient than the optimal minimum distance estimator that uses

A_N = [N⁻¹Σᵢ₌₁ᴺ (sᵢ* − s*)(sᵢ* − s*)']⁻¹,   (5)

where sᵢ* is the vector formed from the lower triangle of (rᵢ − r̄)(rᵢ − r̄)'.

More generally, we can consider the class of consistent estimators that are continuously differentiable functions of s*: θ̂ = h(s*). Chiang (1956) shows that the minimum distance estimator based on Δ*⁻¹ has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on A_N in (5) attains this lower bound.
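The claim that Δ*⁻¹ = ½Ψ* under normality is a matrix identity involving the duplication matrix T; the following sketch (illustrative, not from the paper) checks it numerically for J = 2, using the closed form Δ* = 2T⁺(Σ ⊗ Σ)T⁺' that holds for normal rᵢ, with T⁺ the Moore–Penrose inverse of T.

```python
import numpy as np

J = 2
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Duplication matrix T: vec(Sigma) = T sigma*, where sigma* stacks the
# columns of the lower triangle of Sigma.
pairs = [(i, j) for j in range(J) for i in range(j, J)]
T = np.zeros((J * J, len(pairs)))
for col, (i, j) in enumerate(pairs):
    T[j * J + i, col] = 1.0   # position (i, j) in vec order
    T[i * J + j, col] = 1.0   # symmetric counterpart (same cell when i == j)

Psi_star = T.T @ np.linalg.inv(np.kron(Sigma, Sigma)) @ T

# Under normality the covariance of the lower-triangle second moments is
# Delta* = 2 T+ (Sigma kron Sigma) T+'.
T_plus = np.linalg.pinv(T)
Delta_star = 2.0 * T_plus @ np.kron(Sigma, Sigma) @ T_plus.T

identity_holds = np.allclose(np.linalg.inv(Delta_star), 0.5 * Psi_star)
```

Outside normality, Δ* must instead be estimated from fourth moments as in eq. (5), and the proportionality fails.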


4. An empirical example

We shall present an empirical example that illustrates some of the preceding results. The data come from the panel of Young Men in the National Longitudinal Survey (Parnes). The sample consists of 1454 young men who were not enrolled in school in 1969, 1970, or 1971, and who had complete data on the variables listed in table 1. Table 2a presents an unrestricted least squares regression of the logarithm of wage in 1969 on the union, SMSA, and region variables for all three years. The regression also includes a constant, schooling, experience, experience squared, and race. This regression is repeated using the 1970 wage and the 1971 wage.

Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.

Variable    Mean     Standard deviation
LW1         5.64     0.423
LW2         5.74     0.426
LW3         5.82     0.437
U1          0.336
U2          0.362
U3          0.364
U1U2        0.270
U1U3        0.262
U2U3        0.303
U1U2U3      0.243
SMSA1       0.697
SMSA2       0.627
SMSA3       0.622
RNS1        0.409
RNS2        0.404
RNS3        0.410
S           11.7     2.64
EXP69       5.11     3.71
EXP69²      39.8     46.6
RACE        0.264

LW1, LW2, LW3 - logarithm of hourly earnings (in cents) on the current or last job in 1969, 1970, 1971; U1, U2, U3 - 1 if wages on current or last job set by collective bargaining, 0 if not, in 1969, 1970, 1971; SMSA1, SMSA2, SMSA3 - 1 if respondent in SMSA, 0 if not, in 1969, 1970, 1971; RNS1, RNS2, RNS3 - 1 if respondent in South, 0 if not, in 1969, 1970, 1971; S - years of schooling completed; EXP69 - (age in 1969 − S − 6); RACE - 1 if respondent black, 0 if not.

Table 2a
Unrestricted least squares regressions:ᵃ coefficients (and standard errors) of U1, U2, U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3 in the LW1, LW2, and LW3 equations.

Table 2b
Unrestricted least squares regressions:ᵃ coefficients (and standard errors) of U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3 in the LW1, LW2, and LW3 equations.

ᵃAll regressions include 1, S, EXP69, EXP69², RACE. The standard errors are calculated using Ω̂ in eq. (2).


In section 2 we discussed the implications of a random intercept (c) and a random slope (b). If the leads and lags are due just to c, then the submatrices of Π corresponding to the union, SMSA, or region coefficients should have the form βI + ℓλ', where ℓ is a vector of ones. Consider, for example, the 3 × 3 submatrix of union coefficients: the off-diagonal elements in each column should be equal to each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and −0.009 to 0.010.

In table 2b we add a complete set of union interactions, so that, for the union variables at least, we have a general regression function. Now the submatrix of union coefficients is 3 × 7. If it equals (βI₃, 0) + ℓλ', then in the first three columns, the off-diagonal elements within a column should be equal; in the last four columns, all elements within a column should be equal.

I first imposed the restrictions on the SMSA and region coefficients, using the minimum distance estimator. Ω is estimated using the formula in eq. (2), section 3.1, and A_N = Ω̂⁻¹. The minimum distance statistic (Proposition 8) is 6.82, which is not a surprising value from a χ²(10) distribution. If we impose the restrictions on the union coefficients as well, then the 21 coefficients in table 2b are replaced by 8: one β and seven λ's. This gives an increase in the minimum distance statistic (Proposition 8', appendix B) of 19.36 − 6.82 = 12.54, which is not a surprising value from a χ²(13) distribution. So there is no evidence here against the hypothesis that all the lags and leads are generated by c.
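The kind of calculation behind these statistics can be sketched as follows (illustrative, not from the paper; the covariance matrix, sample values, and restriction layout are assumptions): write the 3 × 3 submatrix restriction Π_sub = βI₃ + ℓλ' as π = Fδ with δ = (β, λ₁, λ₂, λ₃)', then form the minimum distance estimate and statistic N[π̂ − Fδ̂]'Ω̂⁻¹[π̂ − Fδ̂].

```python
import numpy as np

# Restriction Pi_sub = beta*I_3 + ones(3) lambda', written as pi = F d.
F = np.zeros((9, 4))
for t in range(3):          # equation for y_t
    for j in range(3):      # coefficient on x_j
        r = j * 3 + t       # vec stacks columns
        F[r, 1 + j] = 1.0   # lambda_j appears in every row of column j
        if t == j:
            F[r, 0] = 1.0   # beta on the diagonal

rng = np.random.default_rng(4)
d0 = np.array([0.1, 0.05, -0.02, 0.03])       # hypothetical (beta, lambda)
pi_true = F @ d0
Omega = 0.01 * np.eye(9)   # stand-in for the sampling covariance of pi-hat
N = 1454
pi_hat = pi_true + rng.multivariate_normal(np.zeros(9), Omega / N)

W = np.linalg.inv(Omega)
d_hat = np.linalg.solve(F.T @ W @ F, F.T @ W @ pi_hat)  # minimum distance
resid = pi_hat - F @ d_hat
stat = N * resid @ W @ resid   # compare with chi2(9 - 4)
```

With 9 unrestricted coefficients and 4 free parameters, the statistic is referred to a χ²(5) distribution.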
Consider a transformation of the model in which the dependent variables are LW1, LW2 − LW1, and LW3 − LW2. First compute an unrestricted regression on all of the lags and leads (and union interactions); then impose the restriction that U, SMSA, and RNS appear in the LW2 − LW1 and LW3 − LW2 equations only as contemporaneous changes (E(y_t − y_{t−1} | x₁, x₂, x₃) = β(x_t − x_{t−1})). This is equivalent to the restriction that c generates all of the lags and leads, and we have seen that it is supported by the data. I also considered imposing all of the restrictions with the single exception of allowing separate coefficients for entering and leaving union coverage in the wage change equations. The estimates (standard errors) are 0.097 (0.019) and −0.119 (0.022). The standard error on the sum of the coefficients is 0.024, so again there is no evidence against the simple model with E(y_t | x₁, x₂, x₃, c) = βx_t + c.¹⁵

However, since the x_t's are binary variables, condition (R) in Proposition 1

¹⁵Using May–May CPS matches for 1977–1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and −0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was ≤ 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and −0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).


does not hold. For example, the union coefficients provide some evidence that E(b | x₁, x₂, x₃) is constant for the individuals who experience a change in union coverage [i.e., E(b | x₁, x₂, x₃) = β̄ if x₁ + x₂ + x₃ ≠ 0 or 3]; but there is no direct evidence on E(b | x₁, x₂, x₃) for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.
Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator.¹⁶ We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix (A_N) is the inverse of

Ω̃ = S_vv ⊗ S_xx⁻¹,   (6)

where S_vv is the sample covariance matrix of the least squares residuals and S_xx = N⁻¹Σᵢxᵢxᵢ'. We give the conventional standard errors based on (F'Ω̃⁻¹F)⁻¹ and the standard errors calculated according to Proposition 7, which do not require an assumption of homoskedastic linear regression. These standard errors are larger than the conventional ones, by about 30%. The estimated gain in efficiency from using the appropriate metric is not very large; the standard errors calculated according to Proposition 7 are about 10% larger when we use conventional GLS instead of the optimum minimum distance estimator.
Table 3a also presents the estimated λ's. Consider, for example, an individual who was covered by collective bargaining in 1969. The linear predictor of c increases by 0.089 if he is also covered in 1970, and it increases by 0.036 if he is covered in all three years. The predicted c for someone who is always covered is higher by 0.102 than for someone who is never covered.

Table 3b presents estimates under the constraint that λ = 0. The increment in the distance statistic is 89.08 − 19.36 = 69.72, which is a surprisingly large value to come from a χ²(13) distribution. If we constrain only the union λ's to be zero, then the increment is 57.06 − 19.36 = 37.7, which is surprisingly large coming from a χ²(7) distribution. So there is strong evidence for heterogeneity bias.

The union coefficient declines from 0.157 to 0.107 when we relax the λ = 0 restriction. The least squares estimates for the separate cross-sections, with
¹⁶We did not find much evidence for nonstationarity in the slope coefficients. If we allow the union β to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines to 18.51, giving 19.36 − 18.51 = 0.85; this is not a surprising value from a χ²(2) distribution. If we also free up β for SMSA and RNS, then the decline in the distance statistic is 18.51 − 13.44 = 5.07, which is not a surprising value from a χ²(4) distribution.

Table 3a
Restricted estimates.ᵃ

Coefficients (and standard errors) of:

          U                 SMSA              RNS
β̂        0.107 (0.016)     0.056 (0.020)     −0.082 (0.045)
β̂_GLS    0.121 (0.013)     0.050 (0.017)     −0.085 (0.040)
                (0.018)           (0.021)            (0.052)

λ̂:
U1       −0.023 (0.030)    SMSA1    0.086 (0.025)    RNS1    0.100 (0.072)
U2       −0.067 (0.040)    SMSA2   −0.008 (0.046)    RNS2   −0.021 (0.077)
U3       −0.082 (0.037)    SMSA3    0.032 (0.046)    RNS3   −0.128 (0.068)
U1U2      0.156 (0.057)
U1U3      0.152 (0.062)
U2U3      0.195 (0.059)
U1U2U3   −0.229 (0.085)

χ²(23) = 19.36

ᵃE*(y | x) = Πx = Π₁x₁ + Π₂x₂; x₁' = (U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3); x₂' = (1, S, EXP69, EXP69², RACE). Π₁ = (β_U I₃, 0, β_SMSA I₃, β_RNS I₃) + ℓλ'; Π₂ is unrestricted. The restrictions are expressed as π = Fδ, where δ is unrestricted. β̂ and λ̂ are minimum distance estimates with A_N = Ω̂⁻¹ in eq. (2), section 3.1; β̂_GLS and λ̂_GLS are minimum distance estimates with A_N = Ω̃⁻¹ in eq. (6), section 4 (λ̂_GLS is not shown in the table). The first standard error for β̂_GLS is the conventional one based on (F'Ω̃⁻¹F)⁻¹; the second standard error for β̂_GLS is based on (F'Ω̃⁻¹F)⁻¹F'Ω̃⁻¹Ω̂Ω̃⁻¹F(F'Ω̃⁻¹F)⁻¹. The χ² statistics are computed from N[π̂ − Fδ̂]'Ω̂⁻¹[π̂ − Fδ̂].

Table 3b
Restricted estimates under the constraint that λ = 0.ᵃ

Coefficients (and standard errors) of:

          U                 SMSA              RNS
β̂        0.157 (0.012)     0.120 (0.013)     −0.150 (0.016)

χ²(36) = 89.08

ᵃSee footnote to table 3a.

no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970, and 1971.¹⁷ So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of −0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.

5. Conclusion

We have examined the relationship between heterogeneity bias and strict exogeneity in distributed lag regressions of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite.

The individual specific random variables introduce nonlinearity and heteroskedasticity. So we have provided an appropriate framework for the estimation of multivariate linear predictors. We showed that the optimal minimum distance estimator is more efficient, in general, than the conventional estimators such as quasi-maximum likelihood. We provided computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain.
¹⁷Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966–1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation and industry specific job characteristics.


Some of these ideas were illustrated using the sample of Young Men in the National Longitudinal Survey. We examined regressions of wages on the leads and lags in union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.
Appendix A

Let Ω be a set of points, where ω ∈ Ω is a doubly infinite sequence of vectors of real numbers: ω = {..., ω₋₁, ω₀, ω₁, ...} = {ω_t, t ∈ I}, where ω_t ∈ R^q and I is the set of all integers. Let z_t(ω) = ω_t be the tth coordinate function. Let F be the σ-field generated by sets of the form

A = {ω: z_t(ω) ∈ B₁, ..., z_{t+k}(ω) ∈ B_k},

where t, k ∈ I and the B's are q-dimensional Borel sets. Let P be a probability measure defined on F such that {z_t, t ∈ I} is a (strictly) stationary stochastic process on the probability space (Ω, F, P).

The shift transformation S is defined by z_t(Sω) = z_{t+1}(ω). It is an invertible, measure-preserving transformation. A random variable d defined on (Ω, F, P) is invariant if d(Sω) = d(ω) except on a set with probability measure zero (almost surely or a.s.). A set A ∈ F is invariant if its indicator function is an invariant random variable.

We shall use E(d | G)_ω to denote the conditional expectation of the random variable d with respect to the σ-field G, evaluated at ω. Let x_t be a component of z_t, let σ(x) denote the σ-field generated by {..., x₋₁, x₀, x₁, ...}, and let E(d | x_t, x_{t−1}, ...) denote the expectation of d conditional on the σ-field generated by x_t, x_{t−1}, ....
Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...)  a.s.,

where t is any integer.

Proof. First we shall show that E(d | σ(x)) is an invariant random variable. Let f(ω) = d(Sω). A change of variable argument shows that

E(d | σ(x))_{Sω} = E(f | S⁻¹σ(x))_ω  a.s.

[See Billingsley (1965, example 10.3, p. 109).] Since d is an invariant random variable, we have d(Sω) = d(ω) a.s., and S⁻¹σ(x) = σ(x). Hence

E(d | σ(x))_{Sω} = E(d | σ(x))_ω  a.s.

Let σ(x_t, x_{t−1}, ...) denote the σ-field generated by (x_t, x_{t−1}, ...), and let T = ∩_{t∈I} σ(x_t, x_{t−1}, ...) be the left tail σ-field generated by the x process. Since E(d | σ(x)) is an invariant random variable, there is a version of E(d | σ(x)) that is measurable T. [See Rozanov (1967, lemma 6.1, p. 162).] Hence E(d | σ(x)) = E(d | T) a.s., and so E(d | σ(x)) = E(d | σ(x_t, x_{t−1}, ...)).  Q.E.D.
Let d be an invariant random variable and assume that E(d²) < ∞, E(x_t²) < ∞. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables {d, ..., x₋₁, x₀, x₁, ...}, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is (a, b) = E(ab). Then the linear predictor E*(d | ..., x₋₁, x₀, x₁, ...) is defined as the projection of d on the closed linear subspace generated by {1, ..., x₋₁, x₀, x₁, ...}.

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x₋₁, x₀, x₁, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^{J−1} x_{t−j}/J as J → ∞, t is any integer, and

λ = cov(d, x̄)/V(x̄)  if V(x̄) ≠ 0,
  = 0                if V(x̄) = 0,

ψ = E(d) − λE(x̄).

Proof. The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since d is an invariant random variable, we have cov(d, x_t) = cov(d, x₁) for all t. Let x̄_J = Σ_{j=1}^{J} x_j/J. Then cov(d, x̄_J) = cov(d, x₁), and so cov(d, x̄) = lim_{J→∞} cov(d, x̄_J) = cov(d, x₁). Since x̄ is an invariant random variable, we have cov(x̄, x_t) = cov(x̄, x₁), and so V(x̄) = lim_{J→∞} cov(x̄, x̄_J) = cov(x̄, x₁). Hence

cov(d − ψ − λx̄, x_t) = cov(d, x₁) − λcov(x̄, x₁) = 0,  t ∈ I.  Q.E.D.
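A small simulation of Proposition 4 (illustrative, not from the paper; the process x_t = d + e_t and all tolerances are assumptions): the time average x̄_J recovers the invariant variable d as J grows, and the linear predictor of d loads on x̄ with λ = cov(d, x̄)/V(x̄) ≈ 1 when d itself is the target.

```python
import numpy as np

rng = np.random.default_rng(5)
N, J = 20_000, 500   # N independent draws of the process, J periods each
d = rng.normal(size=N)                     # invariant random variable, one per draw
x = d[:, None] + rng.normal(size=(N, J))   # x_t = d + e_t with i.i.d. noise e_t
xbar = x.mean(axis=1)                      # time average; converges to d as J grows

# Proposition 4: E*(d | ..., x_{-1}, x_0, x_1, ...) = psi + lam * xbar.
lam = np.cov(d, xbar)[0, 1] / np.var(xbar)
psi = d.mean() - lam * xbar.mean()
```

Here the population value is λ = V(d)/[V(d) + σ²/J], which approaches 1 as J → ∞.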


Appendix B

Let rᵢ' = (xᵢ', yᵢ'), i = 1,...,N, where xᵢ' = (x_i1,...,x_iK) and yᵢ' = (y_i1,...,y_iM). Write the mth structural equation as

y_im = δ_m⁰'z_im + u_im⁰,  m = 1,...,M,

where the components of z_im are the variables in yᵢ and xᵢ that appear in the mth equation with unknown coefficients. Let S_zx be the following block-diagonal matrix:

S_zx = diag{N⁻¹Σᵢz_i1xᵢ', ..., N⁻¹Σᵢz_iMxᵢ'},

and let s_xy be the MK × 1 vector whose mth block is N⁻¹Σᵢxᵢy_im. Let uᵢ⁰' = (u_i1⁰,...,u_iM⁰), where u_im⁰ = y_im − δ_m⁰'z_im and δ_m⁰ is the true value of δ_m; let Φ_zx = E(S_zx). Let δ = (δ₁',...,δ_M')' be s × 1, and set

δ̂ = (S_zxD⁻¹S_zx')⁻¹(S_zxD⁻¹s_xy).

Proposition 5. Assume that (1) rᵢ is i.i.d. according to some distribution with finite fourth moments; (2) E[xᵢ(y_im − δ_m⁰'z_im)] = 0 (m = 1,...,M); (3) rank(Φ_zx) = s; and (4) D →a.s. Ψ as N → ∞, where Ψ is a positive definite matrix. Then √N(δ̂ − δ⁰) →d N(0, Λ), where

Λ = (Φ_zxΨ⁻¹Φ_zx')⁻¹Φ_zxΨ⁻¹E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ')Ψ⁻¹Φ_zx'(Φ_zxΨ⁻¹Φ_zx')⁻¹.

Proof. √N(δ̂ − δ⁰) = (S_zxD⁻¹S_zx')⁻¹S_zxD⁻¹N^{−1/2}Σᵢ(uᵢ⁰ ⊗ xᵢ). By the strong law of large numbers, S_zx →a.s. Φ_zx; Φ_zxΨ⁻¹Φ_zx' is an s × s positive definite matrix since rank(Φ_zx) = s. So we obtain the same limiting distribution by considering

(Φ_zxΨ⁻¹Φ_zx')⁻¹Φ_zxΨ⁻¹N^{−1/2}Σᵢ₌₁ᴺ (uᵢ⁰ ⊗ xᵢ).

Note that uᵢ⁰ ⊗ xᵢ is i.i.d. with E(uᵢ⁰ ⊗ xᵢ) = 0, V(uᵢ⁰ ⊗ xᵢ) = E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ'). Then applying the central limit theorem gives √N(δ̂ − δ⁰) →d N(0, Λ).  Q.E.D.

This result includes as special cases a number of the commonly used estimators. If z_im = xᵢ (m = 1,...,M) and D = I, then δ̂ is the least squares estimator and Λ reduces to the formula for Ω given in eq. (1) of section 3.1. If Ψ = E(uᵢ⁰uᵢ⁰') ⊗ E(xᵢxᵢ'), then Λ is the asymptotic covariance matrix for the three-stage least squares estimator. If Ψ = E(uᵢ⁰uᵢ⁰' ⊗ xᵢxᵢ'), then Λ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If

Ψ = diag{E(u_i1⁰²)E(xᵢxᵢ'), ..., E(u_iM⁰²)E(xᵢxᵢ')},

then we have the asymptotic covariance matrix for two-stage least squares. If

Ψ = diag{E(u_i1⁰²xᵢxᵢ'), ..., E(u_iM⁰²xᵢxᵢ')},

we have the asymptotic covariance matrix for generalized two-stage least squares. [Λ₁₁ is given in eq. (4), section 3.3.]
Next we shall derive the properties of the minimum distance estimator. Let

D_N(θ) = [a_N − g(θ)]'A_N[a_N − g(θ)],

and choose θ̂ to min_{θ∈T} D_N(θ). Assumptions 1 and 2 are stated in section 3.2.

Proposition 6. If Assumption 1 is satisfied, then θ̂ →a.s. θ⁰.

Proof. Let D*(θ) = [g(θ⁰) − g(θ)]'Ψ[g(θ⁰) − g(θ)]. D_N a.s. converges uniformly to D* on T. Let B be a neighborhood of θ⁰ and set T̄ = T − B. Then

min_{θ∈T̄} D_N(θ) →a.s. min_{θ∈T̄} D*(θ) = δ_B > 0.

Since δ_B > 0 and D_N(θ̂) →a.s. 0, it must be that θ̂ ∈ B a.s. for N sufficiently large. Since B is an arbitrary neighborhood of θ⁰, we have shown that θ̂ →a.s. θ⁰.  Q.E.D.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂ − θ⁰) →d N(0, Λ), where

Λ = (G'ΨG)⁻¹G'ΨΔΨG(G'ΨG)⁻¹.

If Δ is positive definite, then Λ − (G'Δ⁻¹G)⁻¹ is positive semi-definite; hence an optimal choice for Ψ is Δ⁻¹.

Proof. Let

s_N(θ) = ∂D_N(θ)/∂θ = −2(∂g'(θ)/∂θ)A_N[a_N − g(θ)].

Since θ̂ →a.s. θ⁰, for N sufficiently large we a.s. have s_N(θ̂) = 0. The mean value theorem implies that

s_N(θ̂) = s_N(θ⁰) + (∂s_N(θ*)/∂θ')(θ̂ − θ⁰)  a.s.,

for sufficiently large N, where θ* is on the line segment connecting θ̂ and θ⁰. [There is a different θ* for each row of ∂s_N(θ*)/∂θ'; the measurability of θ* follows from lemmas 2 and 3 of Jennrich (1969).] Since θ* →a.s. θ⁰, direct evaluation shows that

∂s_N(θ*)/∂θ' →a.s. 2G'ΨG,

which is non-singular. Hence

√N(θ̂ − θ⁰) = −[∂s_N(θ*)/∂θ']⁻¹√N s_N(θ⁰)  a.s.,

for sufficiently large N. We obtain the same limiting distribution by considering

(G'ΨG)⁻¹G'Ψ√N[a_N − g(θ⁰)].

To find an optimal Ψ, note that there is a non-singular matrix C such that Δ = CC'. Let G̃ = C⁻¹G and H = (G'ΨG)⁻¹G'ΨC. Then we have

Λ − (G'Δ⁻¹G)⁻¹ = H[I_q − G̃(G̃'G̃)⁻¹G̃']H',

which is positive semi-definite.  Q.E.D.

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N →a.s. Δ⁻¹, then

d_N = N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)] →d χ²(q − p).

42

Proof. For sufficiently large N we have

√N[g(θ̂) − g(θ⁰)] = G_N √N(θ̂ − θ⁰) a.s.,

where G_N →a.s. G. From the proof of Proposition 7, we have

√N(θ̂ − θ⁰) = R_N √N[a_N − g(θ⁰)] a.s.,

where R_N →a.s. R = (G′Δ⁻¹G)⁻¹ G′Δ⁻¹. Hence

√N[a_N − g(θ̂)] = √N[a_N − g(θ⁰)] − √N[g(θ̂) − g(θ⁰)] →d QCu,

where Q = I_q − GR, u ~ N(0, I_q), and C is a non-singular matrix such that CC′ = Δ; and

d_N = N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d u′C′Q′Δ⁻¹QCu.

Let G̃ = C⁻¹G and M_G = I_q − G̃(G̃′G̃)⁻¹G̃′; then M_G is a symmetric idempotent matrix with rank q − p and

C′Q′Δ⁻¹QC = M_G′ M_G = M_G.

Hence d_N →d u′M_G u ~ χ²(q − p). Q.E.D.
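To see Propositions 7 and 8 at work numerically, here is a sketch for a linear special case g(θ) = Gθ (G and Δ are hypothetical, and a_N is drawn directly from its limiting Gaussian approximation), where the minimum distance estimator has the closed form θ̂ = (G′A_N G)⁻¹ G′A_N a_N:

```python
import numpy as np

rng = np.random.default_rng(0)
q, p, N = 4, 1, 1000
G = np.ones((q, p))                      # hypothetical Jacobian of a linear g
Delta = np.diag([1.0, 2.0, 3.0, 4.0])    # hypothetical asy. covariance of sqrt(N) a_N
C = np.linalg.cholesky(Delta)

def Lam(Psi):
    # Sandwich formula of Proposition 7: (G'PsiG)^-1 G'Psi Delta Psi G (G'PsiG)^-1.
    M = np.linalg.inv(G.T @ Psi @ G)
    return M @ G.T @ Psi @ Delta @ Psi @ G @ M

# Optimality of Psi = Delta^-1: any other weight is weakly less efficient.
Lam_opt = Lam(np.linalg.inv(Delta))
assert np.allclose(Lam_opt, np.linalg.inv(G.T @ np.linalg.inv(Delta) @ G))
assert np.all(np.linalg.eigvalsh(Lam(np.eye(q)) - Lam_opt) >= -1e-12)

# Proposition 8: with A_N = Delta^-1, the distance statistic d_N is chi2(q - p).
A = np.linalg.inv(Delta)
d = np.empty(2000)
for r in range(d.size):
    a_N = C @ rng.standard_normal(q) / np.sqrt(N)       # theta0 = 0, so g(theta0) = 0
    theta_hat = np.linalg.solve(G.T @ A @ G, G.T @ A @ a_N)
    resid = a_N - G @ theta_hat
    d[r] = N * resid @ A @ resid

assert abs(d.mean() - (q - p)) < 0.3    # mean of a chi2(3) variate is 3
```

With q = 4 moments and p = 1 parameter the simulated mean of d_N settles near the q − p = 3 degrees of freedom, as the proposition predicts.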

Now consider imposing additional restrictions, which are expressed by the condition that θ = f(α), where α is s × 1 (s ≤ p). The domain of α is Υ₁, a subset of R^s that contains the true value α⁰. So θ = f(α) is confined to a certain subset of R^p.

Assumption 3. Υ₁ is a compact subset of R^s that contains α⁰; f is a continuous mapping from Υ₁ into Υ; f(α) = θ⁰ for α ∈ Υ₁ implies α = α⁰; Υ₁ contains a neighborhood of α⁰ in which f has continuous second partial derivatives; rank(F) = s, where F = ∂f(α⁰)/∂α′.

Let h(α) = g[f(α)]. Choose α̂ to

min_{α∈Υ₁} [a_N − h(α)]′ A_N [a_N − h(α)].

Proposition 9. If Assumptions 1, 2, and 3 are satisfied, if Δ is positive definite, and if A_N →a.s. Δ⁻¹, then d₁ − d₂ →d χ²(p − s), where

d₁ = N[a_N − h(α̂)]′ A_N [a_N − h(α̂)]

and d₂ is the statistic d_N of Proposition 8. Furthermore, d₁ − d₂ is independent of d₂ in their limiting joint distribution.

Proof. The assumptions on f and Υ₁ imply that h and Υ₁ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector (d₁, d₂) converges in distribution to (d₁*, d₂*), where

d₁* = u′M_H u,    d₂* = u′M_G u,

u ~ N(0, I_q), C is a non-singular matrix such that CC′ = Δ, G̃ = C⁻¹G, H̃ = C⁻¹H with H = GF = ∂h(α⁰)/∂α′, and

M_G = I_q − G̃(G̃′G̃)⁻¹G̃′,    M_H = I_q − H̃(H̃′H̃)⁻¹H̃′.

Since H̃ is in the column space of G̃, we have M_H M_G = M_G M_H = M_G; so M_H − M_G is a symmetric idempotent matrix with rank p − s. Hence

d₁ − d₂ →d u′(M_H − M_G)u ~ χ²(p − s).

Since cov[(M_H − M_G)u, M_G u] = (M_H − M_G)M_G = 0, we see that d₁* − d₂* is independent of d₂*. Q.E.D.
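A simulation sketch of Proposition 9 in a linear special case (hypothetical G and F, with Δ = I_q so the optimal weight is the identity; both fits are then closed-form projections). The difference d₁ − d₂ should behave as a χ²(p − s) variate that is uncorrelated with d₂:

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, s, N = 4, 2, 1, 5000
A = np.eye(q)                    # Delta = I, so the optimal weight is the identity
G = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.]])   # hypothetical Jacobian of g
F = np.array([[1.], [1.]])       # linear restriction theta = F alpha
H = G @ F                        # Jacobian of h(alpha) = g(f(alpha))

def md(a, J):
    # Closed-form linear minimum distance fit and its scaled distance statistic.
    est = np.linalg.solve(J.T @ A @ J, J.T @ A @ a)
    r = a - J @ est
    return N * r @ A @ r

reps = 3000
d1 = np.empty(reps)
d2 = np.empty(reps)
for i in range(reps):
    a_N = rng.standard_normal(q) / np.sqrt(N)   # theta0 = 0 satisfies the restriction
    d2[i] = md(a_N, G)                          # unrestricted fit: chi2(q - p)
    d1[i] = md(a_N, H)                          # restricted fit:   chi2(q - s)

diff = d1 - d2                                  # chi2(p - s) = chi2(1) in the limit
assert diff.min() > -1e-8                       # nesting implies d1 >= d2
assert abs(diff.mean() - (p - s)) < 0.15        # mean of chi2(1) is 1
assert abs(np.corrcoef(diff, d2)[0, 1]) < 0.1   # limiting independence
```

Because the column space of H is contained in that of G, d₁ ≥ d₂ holds in every sample, mirroring the idempotent-difference argument in the proof.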

In section 3.2 we considered applying the minimum distance procedure both to Π̂ and to W_N. We want to show that if the restrictions involve only Π, then the two procedures give estimators of Π with the same limiting distribution. First consider the effect of a one-to-one transformation from W to (π′, w′): let l(μ) be a function from R^q into R^q and let L = ∂l(μ⁰)/∂μ′, where μ⁰ = g(θ⁰). Let h(θ) = l[g(θ)]. Choose θ̂ to

min_{θ∈Υ} [l(a_N) − h(θ)]′ A_N [l(a_N) − h(θ)].

Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for g and Υ; (2) l is one-to-one and continuous on the range of g(θ) for θ ∈ Υ; l has continuous second partial derivatives in a neighborhood of g(θ⁰); L is non-singular; (3) Δ is positive definite and A_N →a.s. (LΔL′)⁻¹. Then √N(θ̂ − θ⁰) →d N(0, Λ), where Λ = (G′Δ⁻¹G)⁻¹.

Proof. By the δ-method,

√N[l(a_N) − h(θ⁰)] →d N(0, LΔL′).

Hence √N(θ̂ − θ⁰) →d N(0, Λ), where Λ = [H′(LΔL′)⁻¹H]⁻¹ and H = ∂h(θ⁰)/∂θ′. Since H = LG and L is non-singular, we have Λ = (G′Δ⁻¹G)⁻¹. Q.E.D.
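When l is itself linear, l(μ) = Lμ, the invariance in Proposition 9a holds exactly rather than only asymptotically, since then H = LG without approximation. A quick check with hypothetical G, Δ, and L:

```python
import numpy as np

rng = np.random.default_rng(3)
q, p = 3, 1
G = np.array([[1.0], [2.0], [3.0]])               # hypothetical Jacobian of g
Delta = np.eye(q) + 0.5 * np.ones((q, q))         # hypothetical positive definite Delta
L = rng.standard_normal((q, q)) + 3 * np.eye(q)   # non-singular linear transformation

# Transformed problem: Jacobian H = L G, weight (L Delta L')^-1.
H = L @ G
W = np.linalg.inv(L @ Delta @ L.T)
Lam_transformed = np.linalg.inv(H.T @ W @ H)

# Original problem with the optimal weight Delta^-1.
Lam_original = np.linalg.inv(G.T @ np.linalg.inv(Delta) @ G)

# Proposition 9a: the asymptotic covariance matrices coincide.
assert np.allclose(Lam_transformed, Lam_original)
```

The identity H′(LΔL′)⁻¹H = G′L′ L′⁻¹Δ⁻¹L⁻¹ LG = G′Δ⁻¹G is what the assertion verifies in floating point.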
Finally, consider augmenting a_N to a k × 1 vector c_N: c_N′ = (a_N′, b_N′), k ≥ q. (For example, we can augment W_N.) Assume that c_N →a.s. ζ⁰, where ζ⁰′ = (g(θ⁰)′, ζ₂⁰′), and assume that √N(c_N − ζ⁰) →d N(0, Φ). We shall let ζ₂ be unrestricted. Let ψ′ = (ψ₁′, ψ₂′) = (θ′, ζ₂′) be a 1 × n vector, where n = p + k − q; set m(ψ)′ = (g(θ)′, ψ₂′). Choose ψ̂ to

min_{ψ∈Υ₂} [c_N − m(ψ)]′ A^c_N [c_N − m(ψ)],

where A^c_N →a.s. Φ⁻¹. Then ψ̂₁ provides an estimator of θ⁰; we want to compare this estimator with the following one: choose θ̂ to

min_{θ∈Υ} [a_N − g(θ)]′ A_N [a_N − g(θ)],

where A_N →a.s. Δ⁻¹, Υ is a compact subset of R^p, and g is continuous on Υ. We shall set Υ₂ equal to the Cartesian product of Υ and R^{k−q}. Suppose that A_N and A^c_N are positive definite, and that the submatrix of (A^c_N)⁻¹ consisting of the first q rows and columns equals A_N⁻¹. Then we have the following result:

Proposition 9b. ψ̂₁ = θ̂.

Proof. Minimizing first with respect to ψ₂ gives

b_N − ψ̂₂ = −(A^c_{22})⁻¹ A^c_{21} [a_N − g(θ)],

where A^c_{st} is the s, t submatrix of A^c_N (s, t = 1, 2). Then the concentrated distance function is

[a_N − g(θ)]′ [A^c_{11} − A^c_{12}(A^c_{22})⁻¹A^c_{21}] [a_N − g(θ)] = [a_N − g(θ)]′ A_N [a_N − g(θ)],

since the submatrix of (A^c_N)⁻¹ consisting of the first q rows and columns is [A^c_{11} − A^c_{12}(A^c_{22})⁻¹A^c_{21}]⁻¹. So the introduction of unrestricted moments does not affect the minimum distance estimator. Q.E.D.
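Proposition 9b is easy to verify numerically in a linear case. In the sketch below (all matrices hypothetical), the augmented weight is A^c_N = Φ⁻¹, so the upper-left q × q submatrix of (A^c_N)⁻¹ is Φ₁₁ and setting A_N = Φ₁₁⁻¹ satisfies the hypothesis:

```python
import numpy as np

rng = np.random.default_rng(2)
q, k, p = 3, 5, 1
G = np.ones((q, p))                       # linear g(theta) = G theta (hypothetical)
a_N = rng.standard_normal(q)
b_N = rng.standard_normal(k - q)
c_N = np.concatenate([a_N, b_N])

R = rng.standard_normal((k, k))
Phi = R @ R.T + k * np.eye(k)             # positive definite Phi for c_N
Ac = np.linalg.inv(Phi)                   # augmented weight A_N^c = Phi^-1
A_N = np.linalg.inv(Phi[:q, :q])          # upper-left block of (A_N^c)^-1 is Phi_11

# Joint problem: m(psi) = (G theta, psi2) with psi2 unrestricted.
M = np.zeros((k, p + (k - q)))
M[:q, :p] = G
M[q:, p:] = np.eye(k - q)
psi_hat = np.linalg.solve(M.T @ Ac @ M, M.T @ Ac @ c_N)

# Marginal problem using a_N alone.
theta_hat = np.linalg.solve(G.T @ A_N @ G, G.T @ A_N @ a_N)

# Proposition 9b: the two estimators of theta coincide.
assert np.allclose(psi_hat[:p], theta_hat)
```

The equality holds because concentrating ψ₂ out of the joint quadratic form leaves exactly the Schur complement A^c_{11} − A^c_{12}(A^c_{22})⁻¹A^c_{21} = Φ₁₁⁻¹ as the effective weight on a_N − g(θ).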

References

Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R. Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis (Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite sample distribution function in predictive testing of explanatory economic models, Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94, 113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical Statistics 27, 336-351.
Cramér, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.
Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica 39, 341-358.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and Statistics 63, 43-52.
Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.
Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of time series and cross section data, in: C. Christ et al., eds., Measurement in economics (Stanford University Press, Stanford, CA).
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.
Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press, New Haven, CT).
Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).
Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.
Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).
Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.
Swamy, P.A.V.B., 1974, Linear models with random coefficients, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York).
Stafford, F.P. and G.J. Duncan, 1980, Do union members receive compensating wage differentials?, American Economic Review 70, 355-371.
Wallace, T.D. and A. Hussain, 1969, The use of error components models in combining time series with cross section data, Econometrica 37, 55-72.
White, H., 1980, Using least squares to approximate unknown regression functions, International Economic Review 21, 149-170.
White, H., 1980a, Nonlinear regression on cross section data, Econometrica 48, 721-746.
White, H., 1980b, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
White, H., 1982, Instrumental variables regression with independent observations, Econometrica 50, forthcoming.
Zellner, A. and H. Theil, 1962, Three-stage least squares: Simultaneous estimation of simultaneous equations, Econometrica 30, 54-78.
Zellner, A., J. Kmenta and J. Drèze, 1966, Specification and estimation of Cobb-Douglas production function models, Econometrica 34, 784-795.