Journal of Econometrics 18 (1982) 5-46.
MULTIVARIATE REGRESSION MODELS FOR PANEL DATA
Gary CHAMBERLAIN*


University of Wisconsin, Madison, WI 53706, USA, and National Bureau of Economic Research, Cambridge, MA 02138, USA

The paper examines the relationship between heterogeneity bias and strict exogeneity in a distributed lag regression of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual specific random variables introduce nonlinearity and heteroskedasticity; so the paper provides an appropriate framework for the estimation of multivariate linear predictors. Restrictions are imposed using a minimum distance estimator. It is generally more efficient than the conventional estimators such as quasi-maximum likelihood. There are computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain. Some of these ideas are illustrated using the sample of Young Men in the National Longitudinal Survey. The paper reports regressions on the leads and lags of variables measuring union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.

1. Introduction

Suppose that we have a sample of individuals (or firms) followed over time: (x_it, y_it), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_it | x_i1, ..., x_iT, b_i0, ..., b_iJ, c_i) = Σ_{j=0}^J b_ij x_{i,t−j} + c_i,   t = J+1, ..., T,

where b_i0, ..., b_iJ and c_i are random variables that differ across individuals but are constant over time. The population parameters of interest are β_j = E(b_ij), j = 0, ..., J. If the b_ij or c_i are correlated with x, then a least squares regression

*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided by the National Science Foundation (Grants No. SOC-7925959 and No. SES-8016383) and by funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison, by the Department of Health, Education, and Welfare pursuant to the provisions of the Economic Opportunity Act of 1964.

0165-7410/82/0000-0000/$02.75 © 1982 North-Holland

of y_t on x_t, ..., x_{t−J} will not provide a consistent estimator of the β_j (as N → ∞). We shall refer to this inconsistency as a heterogeneity bias.

In section 2, on identification, we consider first the case J = 0 and b_i0 = β_0, so that only the intercept c_i is random. We argue that the presence of heterogeneity bias will be signalled by a full set of lags and leads in the least squares regression of y_t on x_1, ..., x_T. Furthermore, if we let y = (y_1, ..., y_T)', x = (x_1, ..., x_T)' and consider the multivariate linear predictor E*(y | x) = Π_0 + Πx, then the T × T matrix Π should have a distinctive pattern: the off-diagonal elements within the same column are all equal. In that case, the heterogeneity bias is eliminated when we transform to first differences. I think that a test for such restrictions should accompany analysis of covariance type estimation.

There is an analogous question when J is finite and the b_j are random as well as c. Does E(y_t | x_1, ..., x_T) = E(y_t | x_t, ..., x_{t−J}) imply that there is no heterogeneity bias? We find that the answer is yes if x has a continuous distribution but not if x is discrete.

New issues arise as the order (J) of the distributed lag becomes infinite. We consider this problem in the context of a stationary stochastic process; c and the b_j are (shift) invariant random variables. There are invariant random variables with non-zero variance if and only if the process is not ergodic. We pose the following question: if

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...),

so that y does not cause x according to the Sims (1972) definition, is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t−1}, ...).

Section 3 presents a framework for the estimation of multivariate linear predictors. There is a sample r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). We assume that r_i is independent and identically distributed (i.i.d.) according to some distribution with finite fourth moments. We do not assume that the regression function E(y_i | x_i) is linear; for although E(y_i | x_i, c_i) may be linear, there is generally no reason to insist that E(c_i | x_i) is linear. Furthermore, we allow the conditional variance V(y_i | x_i) to be an arbitrary function of x_i; the heteroskedasticity could, for example, be due to random coefficients. Let w_i be the vector formed from the squares and cross-products of the elements of r_i; let Π be the matrix of linear predictor coefficients:


where Π = E(y_ix_i')[E(x_ix_i')]^{-1}. Then w_i is i.i.d. and Π is a differentiable function of E(w_i). So the problem is to make inferences about functions of a population mean, under random sampling. This is straightforward and the results have a variety of novel implications. Let Π̂ be the least squares estimator, and let π̂ and π be the vectors formed from the columns of Π̂ and Π. Then √N(π̂ − π) converges in distribution to N(0, Ω) as N → ∞. The formula for Ω is not the standard one, since we are not assuming homoskedastic, linear regression.

We impose restrictions by using a minimum distance estimator: find the matrix satisfying the restrictions that is closest to Π̂ in the norm provided by Ω̂^{-1}, where Ω̂ is a consistent (as N → ∞) estimator of Ω. This leads to some surprising results. For example, consider a univariate linear predictor: E*(y_i | x_i1, x_i2) = π_0 + π_1x_i1 + π_2x_i2. We can impose the restriction that π_2 = 0 by using a least squares regression of y on x_1 to estimate π_1; however, this is asymptotically less efficient, in general, than our minimum distance estimator. The conventional estimator is a minimum distance estimator, but it is using a different norm.

A related result is that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables; three-stage least squares is also using the wrong norm. We provide more efficient estimators for the linear simultaneous equations model by applying our minimum distance procedure to the reduced form, thereby generalizing Malinvaud's (1970) minimum distance estimator.

Suppose that the only restrictions are that certain structural coefficients are zero (and the normalization rule). We provide a generalization of three-stage least squares that has the same limiting distribution as our minimum distance estimator. There is a corresponding generalization of two-stage least squares.

We also consider the maximum likelihood estimator based on assuming that r_i has a multivariate normal distribution with mean μ and covariance matrix Σ. Then the slope coefficients in Π are functions of Σ and, more generally, we can consider estimating arbitrary functions of Σ subject to restrictions. When the normality assumptions do not hold, we refer to the estimator as a quasi-maximum likelihood estimator. The quasi-maximum likelihood estimator has the same limiting distribution as a certain minimum distance estimator; but in general that minimum distance estimator is not using the optimal norm. Hence our estimator is generally more efficient than the quasi-maximum likelihood estimator.

Section 4 of the paper presents an empirical example that illustrates some of the results. It is based on the panel of Young Men in the National Longitudinal Survey (Parnes); y_t is the logarithm of the individual's hourly wage, and x_t includes variables to indicate whether or not the individual's wage is set by collective bargaining; whether or not he lives in an SMSA; and whether or not he lives in the South. We present unrestricted least squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation. In this example, the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.

2. Identification

Suppose that a farmer is producing a product with a Cobb-Douglas technology:

y_t = βx_t + c + u_t,   0 < β < 1,   t = 1, ..., T,

where y_t is the logarithm of output, x_t is the logarithm of a variable input (labor), c represents an input that is fixed over time (soil quality), u_t represents a stochastic input (rainfall), which is not under the farmer's control, and t indexes the seasons. We shall assume that the farmer knows the product price (P_t) and the input price (W_t), which do not depend on his decisions, and that he knows c. The factor input decision, however, is made before knowing u_t, and we shall assume that x_t is chosen to maximize expected profits. Then the factor demand equation is

x_t = {ln β + ln[E(e^{u_t} | Ω_t)] + ln(P_t/W_t) + c}/(1 − β),

where Ω_t is the information set available to the farmer when he chooses x_t.¹

Although c is known to the farmer and affects his factor demand decisions, we assume that it is not known to the econometrician. He observes only y = (y_1, ..., y_T) and x = (x_1, ..., x_T) for each member of a sample of N farms.

Consider the least squares regression of y_1 on x_1 using just a single cross-section of the data. The population counterpart is the minimum mean square error linear predictor (the wide-sense regression function):

E*(y_1 | x_1) = π_0 + π_1x_1,   π_1 = cov(y_1, x_1)/V(x_1),   π_0 = E(y_1) − π_1E(x_1).

The factor demand equation implies that cov(c, x_1) ≠ 0 if V(c) ≠ 0; then π_1 ≠ β and the least squares estimator of β does not converge to β as N → ∞. Furthermore, with a single cross-section, there would be no internal evidence of this heterogeneity bias.
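The single cross-section bias can be illustrated numerically. The following sketch simulates the farmer example with β = 0.5 and a factor demand that rises with c; all parameter values are hypothetical, not from the paper:

```python
# A small simulation of the farmer example (an illustration, not from the
# paper): y_1 = beta*x_1 + c + u_1, with the input choice x_1 depending on
# the fixed factor c, so cov(c, x_1) != 0.
import numpy as np

rng = np.random.default_rng(0)
N, beta = 100_000, 0.5

c = rng.normal(size=N)                    # soil quality, known to the farmer
x1 = c / (1 - beta) + rng.normal(size=N)  # factor demand rises with c
y1 = beta * x1 + c + rng.normal(size=N)   # production function

# Least squares slope in a single cross-section: pi_1 = cov(y_1, x_1)/V(x_1).
pi1 = np.cov(y1, x1)[0, 1] / np.var(x1)
print(pi1)  # well above beta = 0.5: heterogeneity bias
```

With these (hypothetical) parameter values, π_1 = (βV(x_1) + cov(c, x_1))/V(x_1) is well above β, and nothing in the single cross-section reveals the bias.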

This example is discussed in Mundlak (1961, 1963) and in Zellner, Kmenta, and Drèze (1966). With T observations per farm, however, we can consider the least squares regression of y_t on x = (x_1, ..., x_T). The population counterpart is

E*(y_t | x) = βx_t + E*(c | x) + E*(u_t | x).

Assume that E*(c | x) = ψ + λ'x, where λ = V^{-1}(x)cov(x, c). There will generally be a full set of lags and leads if V(c) ≠ 0. For example, if cov(x_t, c) = cov(x_1, c), t = 1, ..., T, then λ is proportional to the row sums of V^{-1}(x), and all of the elements of λ will typically be non-zero. I think that it is generally true that E*(c | x) depends on all of the x_t's if it depends on any of them. So the presence of heterogeneity bias will be signalled by a full set of lags and leads. Also, if E*(u_t | x) = 0, then the wide-sense multivariate regression will have a distinctive pattern:

Π = cov(y, x)V^{-1}(x) = βI_T + 1λ',

where 1 is a T × 1 vector of ones; the off-diagonal elements within the same column of Π are all equal.

A common solution to the bias problem is some form of analysis of covariance. For example, we can form the farm specific means (ȳ = Σ_{t=1}^T y_t/T, x̄ = Σ_{t=1}^T x_t/T) and the deviations around them (ỹ_t = y_t − ȳ, x̃_t = x_t − x̄), and then run a pooled least squares regression of ỹ on x̃.² This is equivalent to first running the least squares regression of ỹ_t on x̃_t for each of the T cross-section samples, and then forming a weighted average of the T slope coefficients. The population counterpart of the tth least squares regression provides a consistent (as N → ∞) estimator of β only if E*(u_t − ū | x_t − x̄) = 0. I would not expect this condition to hold unless

E*(u_t − u_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = 0,   t = 2, ..., T,

²This estimator was used by Mundlak (1961). Related estimators have been discussed by Balestra and Nerlove (1966), Wallace and Hussain (1969), Amemiya (1971), Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in Chamberlain (1980).


so that x is strictly exogenous when we transform the model to first differences.³ The strict exogeneity restriction is testable since it implies that

E*(y_t − y_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = E*(y_t − y_{t−1} | x_t − x_{t−1}),   t = 2, ..., T;

hence there are exclusion restrictions on the linear predictors. A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,   t = 1, ..., T.

These restrictions on Π are testable; we can summarize them by saying that x is strictly exogenous conditional on c. The restrictions would fail to hold in the production function example if u_t is partly predictable from its past, so that E[exp(u_t) | Ω_t] depends on u_{t−1}, u_{t−2}, ....
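The analysis of covariance (within) estimator described above can be sketched on simulated data; in this hypothetical design u_t is strictly exogenous by construction, so the within estimator is consistent while the pooled regression is not:

```python
# A sketch of the analysis-of-covariance (within) estimator, on simulated
# data (the design and parameter values are hypothetical). Y and X are
# N x T arrays; the farm effect c is correlated with the input x.
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 50_000, 5, 0.5

c = rng.normal(size=(N, 1))                     # farm effect
X = c + rng.normal(size=(N, T))                 # input correlated with c
Y = beta * X + c + rng.normal(size=(N, T))      # u_t strictly exogenous here

Xd = X - X.mean(axis=1, keepdims=True)          # deviations from farm means
Yd = Y - Y.mean(axis=1, keepdims=True)

beta_within = (Yd * Xd).sum() / (Xd ** 2).sum() # pooled regression of y~ on x~
beta_pooled = (Y * X).sum() / (X ** 2).sum()    # ignores c: biased upward
print(beta_within, beta_pooled)
```

The within estimate is close to β = 0.5, while the pooled estimate absorbs the correlation between c and x.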

Now suppose that the technology varies across the farms, so that

y_t = bx_t + c + u_t,

where b is a random variable that is constant over time. We shall refer to b and c as invariant random variables. Our discussion of E*(c | x) indicated that it depends on all of the x_t's if it depends on any of them. I would expect this to be true of E(c | x) as well. This general characteristic of invariant random variables is formulated in the following condition:

Condition (C). Let x* = (x_{t_1}, ..., x_{t_K}), where {t_1, ..., t_K} is some proper subset of {1, ..., T}. Let d be an invariant random variable. Then E(d | x) = E(d | x*) implies that E(d | x) = E(d).

Suppose that the parameter of interest is β = E(b). If b or c is correlated with x, then a least squares regression of y_t on x_t will not provide a consistent estimator of β. We have argued that such a heterogeneity bias will be signalled by a full set of lags and leads when we regress y_t on (x_1, ..., x_T). Under what conditions can we infer that there is no bias if we observe only a contemporaneous relationship? Proposition 1 provides some guidance; it can be extended easily to the case of a finite distributed lag.

Condition (R). prob(x_t = x_n) = 0 for t ≠ n, with 1 ≤ t, n ≤ T.

Proposition 1. Suppose that

E(y_t | x, b, c) = bx_t + c,   t = 1, ..., T.

If conditions (C) and (R) are satisfied and T ≥ 3, then

E(y_t | x) = E(y_t | x_t),   t = 1, ..., T,

implies that E(b | x) = E(b) and E(c | x) = E(c).

Proof. The following equalities hold except on a set of probability zero:

E(y_t | x) − E(y_n | x) = E(b | x)(x_t − x_n),   t ≠ n,

so that E(b | x) = [E(y_t | x_t) − E(y_n | x_n)]/(x_t − x_n), which depends only on (x_t, x_n). So E(b | x) = E(b | x_t, x_n), and, if T ≥ 3, then (C) implies that E(b | x) = E(b) = β. Hence

E(c | x) = E(y_1 | x) − E(b | x)x_1 = E(y_1 | x_1) − βx_1;

hence E(c | x) = E(c | x_1) and so E(c | x) = E(c).   Q.E.D.

³The strict exogeneity terminology follows Sims (1972).

Proposition 1 suggests looking at first differences of the process. If we find that E(y_t | x) has a full set of lags and leads, then we can ask if that is just due to E(c | x) ≠ E(c). Let Δy_t = y_t − y_{t−1}, Δx_t = x_t − x_{t−1}, and Δx = (Δx_2, ..., Δx_T). Under the assumptions of the proposition, if

E(Δy_t | Δx) = E(Δy_t | Δx_t),

then

E(Δy_t | Δx) = βΔx_t.

Note that it is possible to find E(Δy_t | Δx) = E(Δy_t | Δx_t) even though E(b | x) ≠ E(b). For example, consider the stationary case in which cov(x_t, b) = cov(x_1, b); then E*(b | Δx) = E(b), and so E(b | Δx) = E(b) if the regression function of b on Δx is linear. Then we might find that E(y_t | x) has a full set of lags and leads even though E(Δy_t | Δx) does not.

The condition that prob(x_t = x_{t−1}) = 0 is necessary. For consider the following counter-example: E(b | x) = β_1 if x_1 = ... = x_T, and E(b | x) = β_2 if not; but β_2 ≠ E(b) unless prob(x_1 = ... = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ... = x_T, although this probability may become negligible for large T. The following proposition provides some additional insight into this distinction; it is based on a condition that is slightly weaker than (R):

Condition (R′). prob(x_1 = x_2 = ... = x_T) = 0.

Proposition 2. Suppose that

E(y_t | x, b, c) = bx_t + c,   t = 1, ..., T,

and that condition (R′) holds. Define

b̂ = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Then E(b̂) = E(b) if E(|b̂|) < ∞.⁴

Proof. The following equalities hold except on a set of probability zero:

E(b̂ | x, b, c) = Σ_{t=1}^T b(x_t − x̄)² / Σ_{t=1}^T (x_t − x̄)² = b;

so if E(|b̂|) < ∞, E(b̂) = E[E(b̂ | x, b, c)] = E(b).   Q.E.D.

Suppose that (y_i1, ..., y_iT, x_i1, ..., x_iT), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

b̂_i = Σ_{t=1}^T (y_it − ȳ_i)(x_it − x̄_i) / Σ_{t=1}^T (x_it − x̄_i)².

If the assumptions of Proposition 2 are satisfied, Σ_{i=1}^N b̂_i/N converges almost surely (a.s.) to E(b) as N → ∞. It is important that b̂_i is an unbiased estimator of E(b), since we are actually taking the unweighted mean of a random sample of the b̂_i.

⁴The assumption that E(|b̂|) < ∞ is not innocuous. For example, suppose that V(c) = V(b) = 0 and (x_t, y_t) is independent and identically distributed (t = 1, ..., T) according to a bivariate normal distribution. Then b̂ = b + {V(y_t | x_t)/[(T−1)V(x_t)]}^{1/2}w, where w has Student's t-distribution with T − 1 degrees of freedom.


The lack of bias requires that x be strictly exogenous conditional on b, c. It would not be sufficient to assume that E(y_t | x_t, b, c) = bx_t + c. For example, if x_t = y_{t−1}, then our estimator would not converge to E(b), due to the small T bias in least squares estimates of an autoregressive process.

Let D_i = 0 if x_i1 = ... = x_iT, D_i = 1 if not. We can compute b̂_i only for the group with D_i = 1. The sample mean of b̂_i for that group converges a.s. to E(b | D = 1), but we have no information on E(b | D = 0). So unless prob(D = 0) = 0, any value for E(b) is consistent with a given value for E(b | D = 1).⁵

If x_t has a continuous distribution, then the assumption that the regression function is linear (E(y_t | x_t, b, c) = bx_t + c) is very restrictive; the implication of this assumption (combined with strict exogeneity) is that we can obtain an unbiased estimator for b, and hence a consistent (as N → ∞) estimator for E(b). If x_t is a binary variable, then the assumption of linear regression is not restrictive at all; but there are fewer implications since there is positive probability that b̂ is not defined for finite T.
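Proposition 2 and the averaging of the b̂_i can be checked on simulated data. In the hypothetical design below, x is continuous (so b̂_i is defined a.s.), u is strictly exogenous, and the random slope b is correlated with the level of x:

```python
# Proposition 2 in code, on simulated data (a sketch; the design is
# hypothetical): each unit's least squares slope b_i is unbiased for E(b)
# under strict exogeneity, even though b and c are correlated with x.
import numpy as np

rng = np.random.default_rng(2)
N, T = 100_000, 6

b = 1.0 + 0.5 * rng.normal(size=(N, 1))     # random slopes, E(b) = 1
c = rng.normal(size=(N, 1))
X = c + b + rng.normal(size=(N, T))         # x correlated with b and c
Y = b * X + c + rng.normal(size=(N, T))

Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
b_i = (Yd * Xd).sum(axis=1) / (Xd ** 2).sum(axis=1)  # unit-specific slopes

print(b_i.mean())  # unweighted mean of the b_i, close to E(b) = 1
```

Note that the correlation of b and c with x works only through their time-invariant levels, which the within deviations remove; this is what makes each b̂_i unbiased here.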

The following extension of Proposition 1 to a finite distributed lag is straightforward:⁶

Proposition 1′. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^J b_jx_{t−j} + c,   t = J+1, ..., T.

If conditions (C) and (R) are satisfied, then

E(y_t | x) = E(y_t | x_t, x_{t−1}, ..., x_{t−J}),   t = J+1, ..., T,

⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1Σ_{t=1}^T x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T−1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.

⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.


implies that

E(y_t | x) = Σ_{j=0}^J β_jx_{t−j} + γ,   t = J+1, ..., T,

where β_j = E(b_j) = E(b_j | x), j = 0, ..., J, and γ = E(c). The extension of Proposition 2 is also straightforward. There are new issues, however, in the infinite lag case, which we shall take up next.

Large number of lags. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^∞ β_jx_{t−j} + c,

where σ(x) is the information set (σ-field) generated by {..., x_{−1}, x_0, x_1, ...}, and Σ_{j=0}^J β_jx_{t−j} converges in mean square as J → ∞. Consider a regression version of the Sims (1972) condition for x to be strictly exogenous (y does not cause x):

E(y_t | σ(x)) = E(y_t | x_t, x_{t−1}, ...).

Does this condition imply that E(c | σ(x)) = E(c), so that there is no heterogeneity bias?

We shall consider this question in the context of a (strictly) stationary stochastic process. Since c does not change over time, it is an invariant random variable. The following proposition is proved in appendix A:

Proposition 3. If d is an invariant random variable, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...),

where t is any integer.

It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t−1}, ...) + Σ_{j=0}^∞ β_jx_{t−j} = E(y_t | x_t, x_{t−1}, ...).

So we cannot use the Sims condition to detect the bias: once a large number of lags have been included, a small number of leads provide little additional information on c.

We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T'x_T,

where λ_T' = (λ_{T1}, ..., λ_{TT}) and x_T' = (x_1, ..., x_T). Stationarity implies that λ_T = τV^{-1}(x_T)1, where τ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T'x_T ≈ κ_TΣ_{t=1}^T x_t for some scalar κ_T. For example, if x_t = ρx_{t−1} + v_t, where v_t is serially uncorrelated, then λ_T'x_T does not converge to a limit as T → ∞ unless τ = cov(x_1, c) = 0. In fact cov(x_1, c) is zero here, since there is a non-trivial linear predictor only if Σ_{j=0}^J x_{t−j}/J converges to a non-degenerate random variable as J → ∞.

The general case is covered by the following proposition:

Proposition 4. If d is an invariant random variable, then

E*(d | ..., x_{t−1}, x_t, x_{t+1}, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^J x_{t−j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄)   if V(x̄) ≠ 0,   λ = 0   if V(x̄) = 0,

and ψ = E(d) − λE(x̄). (See appendix A for proof.)

The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.

It follows from Proposition 4 that

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...) = ψ + Σ_{j=0}^J (β_j + λ/J)x_{t−j} + r(J),

where r(J) is a remainder term. So y does not cause x according to Sims' definition; but this does not imply that c is uncorrelated with the x's. If we include a large number of lags, then the bias in any one coefficient is a negligible λ/J, but the bias in the sum of the lag coefficients tends to λ as J → ∞. If we include K leads, then the sum of their coefficients is approximately Kλ/J, which is close to zero when J is much larger than K. If the β_j are zero for j > J*, then the lag coefficients beyond that point will be close to zero but their sum will be close to λ.

Under the stationarity assumption, there are non-degenerate invariant random variables if and only if the process is not ergodic. The basic result here is the (pointwise) ergodic theorem: Let g be a random variable on (Ω, F, P) with E(|g|) < ∞, and let g_t(ω) = g(S^tω), where S is the shift transformation (see appendix A); then the following limit exists a.s.:

lim_{J→∞} Σ_{j=0}^J g_{t−j}/J = ḡ.

The limit ḡ is an invariant random variable; it is the expectation of g conditional on 𝒥, where 𝒥 is the information set (σ-field) generated by the invariant random variables. If V(ḡ) ≠ 0 for some g, then the process is not ergodic. In the ergodic case, all of the invariant random variables have degenerate distributions.

Suppose that

E(y_t | σ(x), b, c) = bx_t + c,

and let

b̂_T = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Recall condition (R′): prob(x_1 = ... = x_T) = 0. We want to examine the

significance of condition (R′) as T → ∞. A limiting version of condition (R′) is

prob[V(x_1 | 𝒥) = 0] = 0.

If this condition holds, then

lim_{T→∞} b̂_T = [E(x_1y_1 | 𝒥) − E(x_1 | 𝒥)E(y_1 | 𝒥)] / E{[x_1 − E(x_1 | 𝒥)]²} = b   a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | 𝒥) = 0, then the identification problem is more difficult. There is no information on b for the 'stayers'; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about the unobservable part of the b distribution.

3. Estimation

Consider a sample r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_ix_i') non-singular. Consider the minimum mean square error linear predictors,⁷

E*(y_im | x_i) = π_m'x_i,   m = 1, ..., M,

which we shall write as

E*(y_i | x_i) = Πx_i,

with Π = E(y_ix_i')[E(x_ix_i')]^{-1}. We would like to impose restrictions on Π and to test those restrictions.

For example, we may want to test whether a submatrix of Π has the form βI + 1λ'. I think that analysis of covariance estimation should be accompanied by such a test.

We shall not assume that the regression function E(y_i | x_i) is linear. For although E(y_i | x_i, c_i) may be linear (indeed, we hope that it is), there is generally no reason to insist that E(c_i | x_i) is linear. So we shall present a theory of inference for linear predictors. Furthermore, even if the regression function is linear, there may be heteroskedasticity due to random coefficients, for example.⁸ So we shall allow V(y_i | x_i) to be an arbitrary function of x_i.

⁷This agrees with the definition in section 2 if x_i includes a constant.

3.1. Estimation of linear predictors

Let w_i be the vector formed from the distinct elements of r_ir_i' that have non-zero variance. Since r_i' = (x_i', y_i') is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π [π = vec(Π)]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^N w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec[(Σ_{i=1}^N y_ix_i')(Σ_{i=1}^N x_ix_i')^{-1}].

w̄ converges almost surely to μ⁰ as N → ∞, where μ⁰ is the true value of μ. Let π⁰ = h(μ⁰). Since h(μ) is continuous at μ = μ⁰, we have π̂ → π⁰ a.s. The central limit theorem implies that

√N(w̄ − μ⁰) →d N(0, V(w_i)).

Since h(μ) is differentiable at μ⁰, this gives⁹

√N(π̂ − π⁰) →d N(0, Ω),

where Ω = [∂h(μ⁰)/∂μ']V(w_i)[∂h(μ⁰)/∂μ']'. Thus we obtain the limiting distribution of the least squares estimator. This approach was used by Cramér (1946) to obtain limiting normal

⁸Anderson (1969, 1970), Swamy (1970, 1974), Hsiao (1975), and Mundlak (1978a) discuss estimators that incorporate the particular form of heteroskedasticity that is generated by random coefficients.

⁹See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).


distributions for sample correlation and regression coefficients (p. 367); he presents an explicit formula for the variance of the limiting distribution of a sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and Goldberger (1974) present the formula for the variance of the limiting distribution of a simple regression coefficient.

Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_ix_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / [Σ_{i=1}^N x_i²/N].

The definition of π⁰ implies that E[(y_i − π⁰x_i)x_i] = 0; working with the central limit theorem and the law of large numbers, we obtain the limiting distribution

√N(π̂ − π⁰) →d N(0, E[(y_i − π⁰x_i)²x_i²]/[E(x_i²)]²).

This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we follow White's approach to obtain

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)' ⊗ Φ_x^{-1}x_ix_i'Φ_x^{-1}],   (1)

where Φ_x = E(x_ix_i'). A consistent estimator of Ω is readily available from the corresponding sample moments:

Ω̂ = (1/N) Σ_{i=1}^N (y_i − Π̂x_i)(y_i − Π̂x_i)' ⊗ S_x^{-1}x_ix_i'S_x^{-1} → Ω   a.s.,   (2)

where S_x = Σ_{i=1}^N x_ix_i'/N.
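Formula (2) can be computed directly. The following sketch (simulated data; the heteroskedastic design is hypothetical) estimates Π and the robust covariance Ω̂ of eq. (2):

```python
# Computing Pi and the robust covariance estimate of eq. (2) on simulated
# data (a sketch). Y is N x M, X is N x K, and pi = vec(Pi) stacks the
# columns of Pi, so Omega is MK x MK.
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 5_000, 3, 2

X = rng.normal(size=(N, K))
B = rng.normal(size=(K, M))
U = rng.normal(size=(N, M)) * (1 + X[:, :1] ** 2)  # heteroskedastic errors
Y = X @ B + U

Sx = X.T @ X / N                                   # sample E(x x')
Pi = np.linalg.solve(Sx, X.T @ Y / N).T            # M x K coefficient matrix

Sx_inv = np.linalg.inv(Sx)
Omega = np.zeros((M * K, M * K))
for i in range(N):
    v = Y[i] - Pi @ X[i]                           # residual y_i - Pi x_i
    a = Sx_inv @ np.outer(X[i], X[i]) @ Sx_inv     # Sx^{-1} x_i x_i' Sx^{-1}
    Omega += np.kron(np.outer(v, v), a)            # eq. (2) summand
Omega /= N
```

Omega estimates the asymptotic covariance of √N(π̂ − π) without assuming homoskedastic, linear regression.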


If the regression function is linear and the conditional variance is homoskedastic, so that V(y_i | x_i) does not depend on x_i, then (1) reduces to the standard formula E[(y_i − Πx_i)(y_i − Πx_i)'] ⊗ Φ_x^{-1}.

3.2. Imposing restrictions

The restrictions on Π imply restrictions on E(w_i).

by the

Let the dimension

of r=E(wi)

be q. We shall specify the restrictions

condition

that ~1 depends only on a p x 1 vector 8 of unknown

parameters:

p

=g(8), where g is a known function and psq. The domain of 8 is X a subset

of p-dimensional

Euclidean space (RP) that contains the true value 8. So the

restrictions imply that ~=g(6)

is confined to a certain subset of Rq.

We can impose the restrictions by using a minimum distance estimator: choose θ̂ to minimize

[w̄ − g(θ)]'A_N[w̄ − g(θ)],

where A_N converges to a positive definite matrix. The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958). For completeness, we shall state a set of regularity conditions and the properties that they imply:
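A minimal numerical sketch of the minimum distance estimator: here g is linear, g(θ) = θ1 with 1 = (1, 1)', i.e., the (hypothetical) restriction is that the two components of μ are equal, so that the minimizer with A_N = Δ̂^{-1} has a closed (GLS) form:

```python
# A minimal sketch of the minimum distance estimator on simulated data.
# The restriction g(theta) = (theta, theta)' and all data-generating values
# are hypothetical, chosen only to illustrate the quadratic-form fit.
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
# Two noisy measurements of a common mean theta0 = 2, with unequal variances.
w = np.column_stack([2 + rng.normal(size=N), 2 + 3 * rng.normal(size=N)])
a_N = w.mean(axis=0)                      # unrestricted estimate of mu
Delta = np.cov(w.T)                       # estimates avar of sqrt(N)(a_N - mu)

# With A_N = Delta^{-1} and linear g (gradient G = (1, 1)'), minimizing
# [a_N - g(theta)]' A_N [a_N - g(theta)] gives the GLS formula:
G = np.ones(2)
A_N = np.linalg.inv(Delta)
theta_hat = (G @ A_N @ a_N) / (G @ A_N @ G)
print(theta_hat)  # close to 2
```

Because A_N weights the precise component more heavily, θ̂ is typically more precise than the unweighted average of the two components, in line with Proposition 7.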


Assumption 1. a_N → g(θ⁰) a.s.; g is continuous on 𝒯 and g(θ) = g(θ⁰) for θ ∈ 𝒯 implies that θ = θ⁰; A_N → Ψ a.s., where Ψ is positive definite.

Assumption 2. √N[a_N − g(θ⁰)] →d N(0, Δ); 𝒯 contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ'.

Choose θ̂ to minimize

[a_N − g(θ)]'A_N[a_N − g(θ)],   θ ∈ 𝒯.

Proposition 6. If Assumption 1 is satisfied, then θ̂ → θ⁰ a.s.

Proposition 7. If Assumptions 1 and 2 are satisfied, then

√N(θ̂ − θ⁰) →d N(0, Λ),   Λ = (G'ΨG)^{-1}G'ΨΔΨG(G'ΨG)^{-1};

if Ψ = Δ^{-1}, then Λ = (G'Δ^{-1}G)^{-1}; hence an optimal choice for Ψ is Δ^{-1}.

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N → Δ^{-1} a.s., then

N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)] →d χ²(q − p).

(This is extended in Proposition 8′, appendix B.)¹²

Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is 𝒯_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ: choose δ̂ to minimize

[π̂ − f(δ)]'Ω̂^{-1}[π̂ − f(δ)],

¹²The proofs are included in appendix B. They are based on Chiang (1956), Ferguson (1958), and Malinvaud (1970, ch. 9).


where Ω̂ is given in (2). If 𝒯_δ and f satisfy Assumptions 1 and 2, and we assume that Ω is positive definite, then

√N(δ̂ − δ⁰) →d N(0, (F'Ω^{-1}F)^{-1}),

where F = ∂f(δ⁰)/∂δ'.

The minimum distance procedure can also be applied to w̄ instead of to π̂. Suppose that the components of w_i are arranged so that w_i' = (w_i1', w_i2'), where w_i1 contains the components of x_ix_i'. Partition μ = E(w_i) conformably: μ' = (μ_1', μ_2'). Set θ' = (θ_1', θ_2') = (δ', μ_1'). Assume that V(w_i) is positive definite. Now choose δ̂ to minimize

[w̄ − g(θ)]'V̂^{-1}(w_i)[w̄ − g(θ)];

this gives an estimator with the same limiting distribution as the estimator δ̂ that we obtained by applying the minimum distance procedure to π̂. (See Proposition 9, appendix B.)

This framework leads to some surprising results on efficient estimation. For a simple example, we shall use a univariate linear predictor model,

E*(y_i | x_i1, x_i2) = π_0 + π_1x_i1 + π_2x_i2.

Consider

imposing the restriction π_2 = 0. Then the conventional estimator of π_1 is b_{yx_1}, the slope coefficient in the least squares regression of y on x_1. We shall show that this estimator is generally less efficient than the minimum distance estimator if the regression function is nonlinear or if there is heteroskedasticity.

Let π̂₁, π̂₂ be the slope coefficients in the least squares multiple regression of y on x₁, x₂. The minimum distance estimator of π₁ under the restriction π₂ = 0 can be obtained as δ̂ = π̂₁ + τ̂π̂₂, where τ̂ is chosen to minimize the (estimated) variance of the limiting distribution: τ = −ω₁₂/ω₂₂, where ω_jk is the covariance between π̂_j and π̂_k in their limiting distribution. Since π̂₁ = b_{yx₁} − π̂₂b_{x₂x₁}, we have δ̂ = b_{yx₁} + (τ̂ − b_{x₂x₁})π̂₂. If E(y_i | x_i1, x_i2) is linear and V(y_i | x_i1, x_i2) = σ², then

ω₁₂/ω₂₂ = −cov(x_i1, x_i2)/V(x_i1),

and δ̂ = b_{yx₁}. But in general δ̂ ≠ b_{yx₁} and δ̂ is more efficient than b_{yx₁}. The source of the efficiency gain is that the limiting distribution for π̂₂ has a zero mean (if π₂ = 0), and so we can reduce variance without introducing any bias if π̂₂ is correlated with b_{yx₁}. Under the assumptions of linear regression and homoskedasticity, b_{yx₁} and π̂₂ are uncorrelated; but this need not be true in the more general framework that we are using.
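The efficiency gain described above can be checked numerically. The following sketch is not part of the original paper: the data-generating design, seeds, and variable names are invented for illustration. It computes the unrestricted least squares coefficients, their heteroskedasticity-robust limiting covariance, and the minimum distance estimator δ̂ = π̂₁ − (ω̂₁₂/ω̂₂₂)π̂₂ under π₂ = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000

# Simulated data in which pi2 = 0 holds but the errors are heteroskedastic,
# so b_yx1 is not the efficient estimator of pi1 (true pi1 = 0.8).
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)
y = 1.0 + 0.8 * x1 + (1.0 + np.abs(x1)) * rng.normal(size=N)

# Unrestricted least squares of y on (1, x1, x2): pihat = (pi0, pi1, pi2).
X = np.column_stack([np.ones(N), x1, x2])
pihat = np.linalg.solve(X.T @ X, X.T @ y)

# Heteroskedasticity-robust covariance of pihat (the metric the minimum
# distance estimator needs); omega[j, k] estimates acov(pihat_j, pihat_k).
u = y - X @ pihat
Sxx_inv = np.linalg.inv(X.T @ X / N)
meat = (X * (u**2)[:, None]).T @ X / N
omega = Sxx_inv @ meat @ Sxx_inv / N

# Minimum distance estimator under pi2 = 0: delta = pi1 + tau * pi2,
# tau = -omega12/omega22; its variance is omega11 - omega12^2/omega22,
# never larger than the variance of pihat_1 alone.
tau = -omega[1, 2] / omega[2, 2]
delta = pihat[1] + tau * pihat[2]
var_delta = omega[1, 1] - omega[1, 2]**2 / omega[2, 2]
```

The inequality var_delta ≤ ω̂₁₁ holds by construction, which is the numerical counterpart of the efficiency claim in the text.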

3.3. Simultaneous equations: A generalization of two- and three-stage least squares

Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that (y_i, z_i, x_i1, x_i2) is i.i.d. according to some distribution with finite fourth moments, and that

y_i = δz_i + u_i,

where E(u_ix_i1) = E(u_ix_i2) = 0. Assume also that E(z_ix_i1) ≠ 0, E(z_ix_i2) ≠ 0. Then there are two instrumental variable estimators that both converge a.s. to δ:

δ̂_j = Σᵢ y_ix_ij / Σᵢ z_ix_ij,   j = 1, 2,

and

√N[(δ̂₁, δ̂₂)′ − (δ, δ)′] →d N(0, Λ),

where the j,k element of Λ is

λ_jk = E[(y_i − δz_i)² x_ij x_ik] / [E(z_ix_ij) E(z_ix_ik)],   j, k = 1, 2.


The two-stage least squares estimator combines δ̂₁ and δ̂₂:

δ̂_TSLS = Σᵢ ẑ_iy_i / Σᵢ ẑ_iz_i,

where ẑ_i = π̂₁x_i1 + π̂₂x_i2, based on the least squares regression of z on x₁, x₂ (assume that E[(x_i1, x_i2)′(x_i1, x_i2)] is non-singular). This gives

δ̂_TSLS = α̂δ̂₁ + (1 − α̂)δ̂₂,   α̂ = π̂₁ Σᵢ z_ix_i1 / (π̂₁ Σᵢ z_ix_i1 + π̂₂ Σᵢ z_ix_i2).

Since α̂ →a.s. α, √N(δ̂_TSLS − δ) has the same limiting distribution as √N[α(δ̂₁ − δ) + (1 − α)(δ̂₂ − δ)]. Is this the optimal combination? The answer leads to the minimum distance estimator: choose δ̂ to

min_δ N(δ̂₁ − δ, δ̂₂ − δ) Λ̂⁻¹ (δ̂₁ − δ, δ̂₂ − δ)′,

which gives

δ̂ = τδ̂₁ + (1 − τ)δ̂₂,   τ = (λ¹¹ + λ¹²)/(λ¹¹ + 2λ¹² + λ²²),

where λ^{jk} is the j,k element of Λ⁻¹. The estimator obtained by using a consistent estimator of Λ has the same limiting distribution. In general τ ≠ α, since τ is a function of fourth moments and α is not. Suppose, for example, that z_i = x_i2. Then α = 0 but τ ≠ 0 unless

E{u_i² x_i2 [x_i1 E(x_i2²) − x_i2 E(x_i1x_i2)]} = 0.
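The optimal combination can be sketched in a few lines. This example is not from the original paper: the simulated design is invented for illustration. It forms the two instrumental variable estimators, estimates Λ from fourth moments, and combines them with the weight τ̂ built from the elements of Λ̂⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20000
delta0 = 2.0

# Two valid instruments x1, x2 for the regressor z; heteroskedastic error u.
x1 = rng.normal(size=N)
x2 = x1 + rng.normal(size=N)
z = x1 + x2 + rng.normal(size=N)
u = (1.0 + x1**2) * rng.normal(size=N)
y = delta0 * z + u

# The two simple instrumental variable estimators.
d1 = (y @ x1) / (z @ x1)
d2 = (y @ x2) / (z @ x2)

# Estimate Lambda: lambda_jk = E[u^2 x_j x_k] / [E(z x_j) E(z x_k)],
# using residuals from any consistent preliminary estimate of delta.
uh = y - d1 * z
X = np.column_stack([x1, x2])
num = (X * (uh**2)[:, None]).T @ X / N
zx = X.T @ z / N
Lam = num / np.outer(zx, zx)

# Optimal (minimum distance) combination: tau from the elements of Lam^-1.
Li = np.linalg.inv(Lam)
tau = (Li[0, 0] + Li[0, 1]) / Li.sum()
delta_md = tau * d1 + (1 - tau) * d2
```

Because Λ̂ involves fourth moments, τ̂ generally differs from the two-stage least squares weight α̂, exactly as the text argues.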

The conventional procedure for combining such estimators is the three-stage least squares estimator. Its limiting distribution is derived in appendix B (Proposition 5); however, viewed as a minimum distance estimator, it is using the wrong norm in general.

Consider the standard simultaneous equations model:

y_i = Πx_i + u_i,   E(u_ix_i′) = 0,

where ΓΠ + B = 0 and Γu_i = v_i. We are continuing to assume that y_i is M × 1, x_i is K × 1, r_i′ = (x_i′, y_i′) is i.i.d. according to a distribution with finite fourth moments (i = 1, …, N), and that E(x_ix_i′) is non-singular. There are restrictions on Γ and B: m(Γ, B) = 0, where m is a known function. Assume that the implied restrictions on Π can be specified by the condition that π = vec(Π′) = f(δ), where the domain of δ is T_δ, a subset of R^s that includes the true value δ⁰ (s ≤ MK). Assume that T_δ and f satisfy Assumptions 1 and 2; these properties could be derived from regularity conditions on m, as in Malinvaud (1970, prop. 2, p. 670).

Choose δ̂ to

min_{δ∈T_δ} N[π̂ − f(δ)]′ Ω̂⁻¹ [π̂ − f(δ)],

where Ω̂ is given in eq. (2); we assume that Ω is positive definite. Let F = ∂f(δ⁰)/∂δ′. Then we have √N(δ̂ − δ⁰) →d N(0, Λ), where Λ = (F′Ω⁻¹F)⁻¹. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if u_i⁰u_i⁰′ is uncorrelated with x_ix_i′, so that Ω = E(u_i⁰u_i⁰′) ⊗ [E(x_ix_i′)]⁻¹ (u_i⁰ = y_i − Π⁰x_i).

Now suppose that the only restrictions on Γ and B are that certain coefficients are zero, together with the normalization restrictions that the coefficient of y_im in the mth structural equation is one. Then we can give an explicit formula for Λ. Write the mth structural equation as

y_im = δ_m′z_im + v_im,

where the components of z_im are the variables in y_i and x_i that appear in the mth equation with unknown coefficients. Let there be M structural equations and assume that the true value Γ⁰ is non-singular. Let δ′ = (δ₁′, …, δ_M′) be s × 1, and let Γ(δ) and B(δ) be parametric representations of Γ and B that satisfy the zero restrictions and the normalization rule. We can choose a compact set T_δ ⊂ R^s containing a neighborhood of the true value δ⁰ such that Γ(δ) is non-singular for δ ∈ T_δ. Then π = f(δ), where f(δ) = vec{[−Γ⁻¹(δ)B(δ)]′}. Assume that f(δ) = π⁰ implies that δ = δ⁰, so that the structural parameters are identified. Then T_δ and f satisfy Assumptions 1 and 2, and √N(δ̂ − δ⁰) →d N(0, Λ). The formula for ∂π/∂δ′ is given in Rothenberg (1973, p. 69). So we have

Λ = {Φ_zx [E(v_i⁰v_i⁰′ ⊗ x_ix_i′)]⁻¹ Φ_zx′}⁻¹,

where Φ_zx = diag{E(z_i1x_i′), …, E(z_iMx_i′)}. If v_i⁰v_i⁰′ is uncorrelated with x_ix_i′, this reduces to

{Φ_zx [E(v_i⁰v_i⁰′) ⊗ E(x_ix_i′)]⁻¹ Φ_zx′}⁻¹,

which is the conventional asymptotic covariance matrix for three-stage least squares [Zellner and Theil (1962)].

I shall present a generalization of three-stage least squares that has the same limiting distribution as the generalized minimum distance estimator. Let β = vec(B′) and note that π = −(Γ⁻¹ ⊗ I_K)β. Then

[π̂ + (Γ⁻¹ ⊗ I_K)β]′ Ω̂⁻¹ [π̂ + (Γ⁻¹ ⊗ I_K)β] = [(Γ ⊗ I_K)π̂ + β]′ Θ̂⁻¹ [(Γ ⊗ I_K)π̂ + β],

where Θ̂ estimates

Θ = (I_M ⊗ Σ_x⁻¹) E(v_i⁰v_i⁰′ ⊗ x_ix_i′) (I_M ⊗ Σ_x⁻¹),   Σ_x = E(x_ix_i′).

Let v̂_i = Γ̂y_i + B̂x_i, where Γ̂ →a.s. Γ⁰ and B̂ →a.s. B⁰. Now replace Θ by

Θ̂ = (I_M ⊗ S_x⁻¹) Ψ̂ (I_M ⊗ S_x⁻¹),   Ψ̂ = N⁻¹ Σᵢ v̂_iv̂_i′ ⊗ x_ix_i′,

where S_x = N⁻¹ Σᵢ x_ix_i′, and note that applying (I_M ⊗ S_x) leads to the following distance function:

N[(Γ ⊗ S_x)π̂ + β]′ Ψ̂⁻¹ [(Γ ⊗ S_x)π̂ + β].

This corresponds to Basmann's (1965) interpretation of three-stage least squares.¹³ Minimizing with respect to δ gives

δ̂_G3 = (S_zx Ψ̂⁻¹ S_zx′)⁻¹ (S_zx Ψ̂⁻¹ s_xy),    (3)

where S_zx and s_xy are the sample moment matrices defined in appendix B.

The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. √N(δ̂_G3 − δ⁰) →d N(0, Λ), where Λ = (Φ_zx Ψ⁻¹ Φ_zx′)⁻¹. This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.
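The generalized three-stage least squares estimator in eq. (3) can be sketched directly from the sample moments. The example below is not from the original paper: it uses an invented two-equation simulated system in which each z_im is an exogenous subset of x_i, so that preliminary least squares residuals are consistent for forming Ψ̂.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 40000, 3
x = rng.normal(size=(N, K))

# Two structural equations; each z_im is a subset of x_i for simplicity.
z1, z2 = x[:, :2], x[:, 1:]
d1_0, d2_0 = np.array([1.0, -0.5]), np.array([0.3, 0.7])
v = (1 + x[:, [0]]**2) * rng.normal(size=(N, 2))   # heteroskedastic errors
y1 = z1 @ d1_0 + v[:, 0]
y2 = z2 @ d2_0 + v[:, 1]

# Preliminary residuals for Psi-hat.
vh = np.column_stack([y1 - z1 @ np.linalg.lstsq(z1, y1, rcond=None)[0],
                      y2 - z2 @ np.linalg.lstsq(z2, y2, rcond=None)[0]])

# S_zx (block diagonal), s_xy, and Psi-hat = N^-1 sum (vh_i vh_i' kron x_i x_i').
Szx = np.zeros((4, 2 * K))
Szx[:2, :K] = z1.T @ x / N
Szx[2:, K:] = z2.T @ x / N
sxy = np.concatenate([x.T @ y1, x.T @ y2]) / N
Psi = np.zeros((2 * K, 2 * K))
for m in range(2):
    for l in range(2):
        Psi[m*K:(m+1)*K, l*K:(l+1)*K] = (x * (vh[:, m] * vh[:, l])[:, None]).T @ x / N

# Generalized three-stage least squares, eq. (3).
Pinv = np.linalg.inv(Psi)
d_g3 = np.linalg.solve(Szx @ Pinv @ Szx.T, Szx @ Pinv @ sxy)
```

Replacing Ψ̂ by Ŝ_vv ⊗ S_x would give conventional three-stage least squares; the only difference is the weighting matrix.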

Finally, we shall consider the generalization of two-stage least squares. Suppose that

y_i1 = δ₁′z_i1 + v_i1,

where E(x_iv_i1) = 0, z_i1 is s₁ × 1, and rank[E(x_iz_i1′)] = s₁. We complete the system by setting

y_im = π_m′x_i + v_im,

where E(x_iv_im) = 0 (m = 2, …, M). So z_im = x_i (m = 2, …, M), and we can apply the minimum distance procedure to obtain δ̂₁; since we are ignoring any restrictions on π_m (m = 2, …, M), δ̂₁ is a limited information minimum distance estimator.

¹³See Rothenberg (1973, p. 82). A more general derivation of this distance function can be obtained by following Hansen (1982). Also see White (1982).


We have √N(δ̂₁ − δ₁⁰) →d N(0, Λ₁₁), and evaluating the partitioned inverse gives

Λ₁₁ = {E(z_i1x_i′) [E(v_i1⁰² x_ix_i′)]⁻¹ E(x_iz_i1′)}⁻¹,    (4)

where v_i1⁰ = y_i1 − δ₁⁰′z_i1.

We can obtain the same limiting distribution by using the following generalization of two-stage least squares: Let

Ψ̂₁₁ = N⁻¹ Σᵢ v̂_i1² x_ix_i′,   v̂_i1 = y_i1 − δ̄₁′z_i1,

where δ̄₁ →a.s. δ₁⁰ (for example, δ̄₁ could be an instrumental variable estimator); then

δ̂_G2 = (Z₁′X Ψ̂₁₁⁻¹ X′Z₁)⁻¹ (Z₁′X Ψ̂₁₁⁻¹ X′y₁),

where X, Z₁, and y₁ are the data matrices for x_i, z_i1, and y_i1. This is equivalent to applying generalized three-stage least squares to the completed system, with no restrictions on π_m (m = 2, …, M). The limiting distribution of this estimator is derived in appendix B (Proposition 5):

Proposition 11. √N(δ̂_G2 − δ₁⁰) →d N(0, Λ₁₁), where Λ₁₁ is given in eq. (4). This generalized two-stage least squares estimator is asymptotically efficient in the class of limited information minimum distance estimators.
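The two steps of this generalized two-stage least squares procedure can be sketched as follows. The example is not from the original paper: the single-equation design with one endogenous right-hand variable is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 30000, 3
x = rng.normal(size=(N, K))
eps = rng.normal(size=N)
z = x @ np.ones(K) + eps                      # endogenous: correlated with v
v = (1 + x[:, 0]**2) * rng.normal(size=N) + 0.8 * eps
y = 1.5 * z + v                               # true delta1 = 1.5

Z = z[:, None]                                # s1 = 1 right-hand-side variable
xz = x.T @ Z / N
xy = x.T @ y / N
Sxx = x.T @ x / N

# Step 1: ordinary two-stage least squares, used only to form residuals.
W = np.linalg.inv(Sxx)
d_2sls = np.linalg.solve(xz.T @ W @ xz, xz.T @ W @ xy)
vh = y - Z @ d_2sls

# Step 2: reweight by Psi11^-1, Psi11 = N^-1 sum vh_i^2 x_i x_i'.
Psi11 = (x * (vh**2)[:, None]).T @ x / N
Wg = np.linalg.inv(Psi11)
d_g2 = np.linalg.solve(xz.T @ Wg @ xz, xz.T @ Wg @ xy)
```

Under homoskedasticity Ψ̂₁₁ is proportional to S_xx and the two steps coincide; with heteroskedasticity the second step is the efficient one.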

3.4. Asymptotic efficiency: A comparison with the quasi-maximum likelihood estimator

Assume that r_i is i.i.d. (i = 1, …, N) from a distribution with V(r_i) = Σ, where Σ is a J × J positive definite matrix; the fourth moments are finite. Suppose that we wish to estimate functions of Σ subject to restrictions. Let σ = vec(Σ) and express the restrictions by the condition that σ = g(θ), where g is a function from T_θ into R^q with a domain T_θ ⊂ R^p that contains the true value θ⁰ (q = J²; p ≤ J(J + 1)/2). Let

S = N⁻¹ Σᵢ (r_i − r̄)(r_i − r̄)′,   s = vec(S).

If the distribution of r_i is multivariate normal, then the maximum likelihood estimator of θ based on the log-likelihood function is a solution to the following problem: choose θ̂ to

min_{θ∈T_θ} {ln det Σ(θ) + tr[Σ⁻¹(θ)S]}.

The distribution of r_i is not necessarily normal; in that case we shall refer to the estimator as a quasi-maximum likelihood estimator (θ̂_QML).¹⁴

MaCurdy (1979) considered a version of this problem and showed that, under suitable regularity conditions, √N(θ̂_QML − θ⁰) has a limiting normal distribution; the covariance matrix, however, is not given by the standard information matrix formula. We would like to compare this distribution with the distribution of the minimum distance estimator.

This comparison can be readily made by using Theorem 1 in Ferguson (1958). In our notation, Ferguson considers the following problem: choose θ̂ to solve

W(s, θ)[s − g(θ)] = 0.

He derives the limiting distribution of √N(θ̂ − θ⁰) under regularity conditions on the functions W and g. These regularity conditions are particularly simple in our problem since W does not depend on s. We can state them as follows:

Assumption 3. Θ₀ is an open subset of R^p containing θ⁰; g is a continuous, one-to-one mapping of Θ₀ into R^q with a continuous inverse; g has continuous second partial derivatives in Θ₀; rank[∂g(θ)/∂θ′] = p for θ ∈ Θ₀; Σ(θ) is non-singular for θ ∈ Θ₀.

We have √N[s − g(θ⁰)] →d N(0, Δ), where Δ = V[(r_i − μ) ⊗ (r_i − μ)]. Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within Θ₀ for sufficiently large N, and √N(θ̂_QML − θ⁰) →d N(0, Λ), where

¹⁴The quasi-maximum likelihood terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).

Λ = (G′ΨG)⁻¹ G′ΨΔΨG (G′ΨG)⁻¹,

and G = ∂g(θ⁰)/∂θ′, Ψ = (Σ⁰ ⊗ Σ⁰)⁻¹. It will be convenient to rewrite this, imposing the symmetry restrictions on Σ. Let σ* be the J(J + 1)/2 × 1 vector formed by stacking the columns of the lower triangle of Σ. We can define a J² × [J(J + 1)/2] matrix T such that σ = Tσ*. The elements in each row of T are all zero except for a single element which is one; T has full column rank. Let s = Ts*, g(θ) = Tg*(θ), G* = ∂g*(θ⁰)/∂θ′, Ψ* = T′ΨT; then √N[s* − g*(θ⁰)] →d N(0, Δ*), where Δ* is the covariance matrix of the vector formed from the columns of the lower triangle of (r_i − μ)(r_i − μ)′. Now we can set

Λ = (G*′Ψ*G*)⁻¹ (G*′Ψ*Δ*Ψ*G*) (G*′Ψ*G*)⁻¹.

Consider the following minimum distance estimator: choose θ̂_MD to

min_θ N[s* − g*(θ)]′ A_N [s* − g*(θ)],

where A_N →a.s. Ψ*. Then the following result is implied by Proposition 7:

Proposition 12. If Assumption 3 is satisfied, then √N(θ̂_QML − θ⁰) has the same limiting distribution as √N(θ̂_MD − θ⁰).

If Δ* is non-singular, an optimal minimum distance estimator has A_N →a.s. ζΔ*⁻¹, where ζ is an arbitrary positive real number. If the distribution of r_i is normal, then Δ*⁻¹ = ½Ψ*; but in general Δ*⁻¹ is not proportional to Ψ*, since Δ* depends on fourth moments and Ψ* is a function of second moments. So in general θ̂_QML is less efficient than the optimal minimum distance estimator that uses

A_N = [N⁻¹ Σᵢ (s_i* − s*)(s_i* − s*)′]⁻¹,    (5)

where s_i* is the vector formed from the lower triangle of (r_i − r̄)(r_i − r̄)′. More generally, we can consider the class of consistent estimators that are continuously differentiable functions of s*: θ̂ = h(s*). Chiang (1956) shows that the minimum distance estimator based on Δ*⁻¹ has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on A_N in (5) attains this lower bound.
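The optimal weight in (5) is easy to compute from the lower-triangle vectors s_i*. The sketch below is not from the original paper: it uses an invented one-factor covariance structure Σ = σ²I + τ²ll′ (with non-normal data), so g* is linear in θ and the optimal minimum distance estimator has a closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
N, J = 8000, 3

# Non-normal data whose covariance matrix is sigma2*I + tau2*l l'
# (sigma2 = 0.5, tau2 = 2.0 in this design).
c = rng.laplace(size=(N, 1))                   # common factor, Var = 2
r = c + 0.5 * rng.laplace(size=(N, J))         # idiosyncratic part, Var = 0.5

dev = r - r.mean(axis=0)
idx = np.tril_indices(J)
si = np.stack([np.outer(d, d)[idx] for d in dev])   # s_i*: lower triangles
s_star = si.mean(axis=0)

# Linear structure g*(theta) = G theta, theta = (sigma2, tau2).
G = np.column_stack([np.eye(J)[idx], np.ones(J * (J + 1) // 2)])

# Optimal weight (5): inverse of the sample fourth-moment matrix Delta*.
A = np.linalg.inv(np.cov(si, rowvar=False))
theta_md = np.linalg.solve(G.T @ A @ G, G.T @ A @ s_star)
```

With normal data the weight Ψ* would give the same limit; with the heavy-tailed data here the fourth-moment weight is the efficient one, which is the point of the comparison in the text.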


4. An empirical example

We shall present an empirical example that illustrates some of the preceding results. The data come from the panel of Young Men in the National Longitudinal Survey (Parnes). The sample consists of 1454 young men who were not enrolled in school in 1969, 1970, or 1971, and who had complete data on the variables listed in table 1. Table 2a presents an unrestricted least squares regression of the logarithm of wage in 1969 on the union, SMSA, and region variables for all three years. The regression also includes a constant, schooling, experience, experience squared, and race. This regression is repeated using the 1970 wage and the 1971 wage.

Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.

Variable    Mean    Standard deviation
LW1         5.64    0.423
LW2         5.74    0.426
LW3         5.82    0.437
U1          0.336
U2          0.362
U3          0.364
U1U2        0.270
U1U3        0.262
U2U3        0.303
U1U2U3      0.243
SMSA1       0.697
SMSA2       0.627
SMSA3       0.622
RNS1        0.409
RNS2        0.404
RNS3        0.410
S           11.7    2.64
EXP69       5.11    3.71
EXP69²      39.8    46.6
RACE        0.264

LW1, LW2, LW3: logarithm of hourly earnings (in cents) on the current or last job in 1969, 1970, 1971; U1, U2, U3: 1 if wages on current or last job set by collective bargaining, 0 if not, in 1969, 1970, 1971; SMSA1, SMSA2, SMSA3: 1 if respondent in SMSA, 0 if not, in 1969, 1970, 1971; RNS1, RNS2, RNS3: 1 if respondent in South, 0 if not, in 1969, 1970, 1971; S: years of schooling completed; EXP69: (age in 1969 − S − 6); RACE: 1 if respondent black, 0 if not.

[Table 2a. Unrestricted least squares regressions: coefficients (and standard errors) of U1, U2, U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3; dependent variables LW1, LW2, LW3. All regressions include (1, S, EXP69, EXP69², RACE); the standard errors are calculated using Ω̂ in eq. (2).]

[Table 2b. Unrestricted least squares regressions as in table 2a, with a complete set of union interactions (U1U2, U1U3, U2U3, U1U2U3) added; dependent variables LW1, LW2, LW3.]

Consider the hypothesis that the leads and lags are generated by a combination of a random intercept (c) and a random slope (b). If the leads and lags are due just to c, then the submatrices of Π corresponding to the union, SMSA, or region coefficients should have the form βI + lλ′. Consider, for example, the 3 × 3 submatrix of union coefficients: the off-diagonal elements in each column should be equal to each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and −0.009 to 0.010; not bad.

In table 2b we add a complete set of union interactions, so that, for the union variables at least, we have a general regression function. Now the submatrix of union coefficients is 3 × 7. If it equals (βI₃, 0) + lλ′, then in the first three columns, the off-diagonal elements within a column should be equal; in the last four columns, all elements within a column should be equal.
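The column pattern implied by (βI₃, 0) + lλ′ can be checked mechanically. The sketch below is purely illustrative (the values of β and λ are made up, not estimates from the paper): it constructs such a matrix and verifies the two patterns just described.

```python
import numpy as np

# Hypothetical beta and lambda values, only to illustrate the pattern.
beta = 0.1
lam = np.array([0.05, 0.04, -0.01, 0.02, 0.00, 0.03, 0.01])

Pi1 = np.hstack([beta * np.eye(3), np.zeros((3, 4))]) + np.outer(np.ones(3), lam)

# First three columns: off-diagonal entries within a column equal lambda_k.
for k in range(3):
    off = [Pi1[i, k] for i in range(3) if i != k]
    assert np.allclose(off, lam[k])
# Last four columns: all entries within a column equal lambda_k.
for k in range(3, 7):
    assert np.allclose(Pi1[:, k], lam[k])
```

The diagonal entries of the first three columns equal β + λ_k, which is why only the off-diagonal comparisons are informative there.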

I first imposed the restrictions on the SMSA and region coefficients, using the minimum distance estimator. Ω is estimated using the formula in eq. (2), section 3.1, and A_N = Ω̂⁻¹. The minimum distance statistic (Proposition 8) is 6.82, which is not a surprising value from a χ²(10) distribution. If we impose the restrictions on the union coefficients as well, then the 21 coefficients in table 2b are replaced by 8: one β and seven λ's. This gives an increase in the minimum distance statistic (Proposition 8′, appendix B) of 19.36 − 6.82 = 12.54, which is not a surprising value from a χ²(13) distribution. So there is no evidence here against the hypothesis that all the lags and leads are generated by c.
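When the restrictions are linear, π = Fδ, the minimum distance estimator and the distance statistic used above both have closed forms. The following sketch is not based on the paper's data: F, Ω, and the sampling scheme are invented purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(5)
N, q, s = 4000, 6, 2
F = rng.normal(size=(q, s))                    # hypothetical restriction matrix
delta0 = np.array([1.0, -1.0])
pi0 = F @ delta0

# Pretend pihat has limiting covariance Omega (invented here), so that
# pihat is approximately N(pi0, Omega/N) when the restrictions hold.
L = rng.normal(size=(q, q))
Omega = L @ L.T + q * np.eye(q)
pihat = pi0 + np.linalg.cholesky(Omega / N) @ rng.normal(size=q)

# Closed-form minimum distance (GLS) estimator and distance statistic.
A = np.linalg.inv(Omega)
dhat = np.linalg.solve(F.T @ A @ F, F.T @ A @ pihat)
resid = pihat - F @ dhat
dist = N * resid @ A @ resid       # approximately chi-square(q - s) here
```

Comparing dist with χ²(q − s) quantiles is exactly the test applied to the 6.82 and 19.36 values in the text.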

Consider a transformation of the model in which the dependent variables are LW1, LW2 − LW1, and LW3 − LW2. Start with a multivariate regression on all of the lags and leads (and union interactions); then impose the restriction that U, SMSA, and RNS appear in the LW2 − LW1 and LW3 − LW2 equations only as contemporaneous changes (E(y_t − y_{t−1} | x₁, x₂, x₃) = β(x_t − x_{t−1})). This is equivalent to the restriction that c generates all of the lags and leads, and we have seen that it is supported by the data. I also considered imposing all of the restrictions with the single exception of allowing separate coefficients for entering and leaving union coverage in the wage change equations. The estimates (standard errors) are 0.097 (0.019) and −0.119 (0.022). The standard error on the sum of the coefficients is 0.024, so again there is no evidence against the simple model with E(y_t | x₁, x₂, x₃, c) = βx_t + c.¹⁵

However, since the x_t's are binary variables, condition (R) in Proposition 1

¹⁵Using May–May CPS matches for 1977–1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and −0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was ≤ 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and −0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).


does not hold. For example, the union coefficients provide some evidence that E(b | x₁, x₂, x₃) is constant for the individuals who experience a change in union coverage [i.e., for x₁ + x₂ + x₃ ≠ 0 or 3]; but there is no direct evidence on E(b | x₁, x₂, x₃) for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.

Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator.¹⁶ We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix (A_N) is the inverse of the covariance estimator that assumes homoskedastic linear regression. We report the conventional standard errors based on (F′Â_NF)⁻¹ together with the standard errors calculated according to Proposition 7, which do not require an assumption of homoskedastic linear regression. These standard errors are larger than the conventional ones, by about 30%. The estimated gain in efficiency from using the appropriate metric is not very large; the standard errors calculated according to Proposition 7 are about 10% larger when we use conventional GLS instead of the optimum minimum distance estimator.

Table 3a also presents the estimated λ's. Consider, for example, an individual who was covered by collective bargaining in 1969. The linear predictor of c increases by 0.089 if he is also covered in 1970, and it increases by an additional 0.036 if he is covered in all three years. The predicted c for someone who is always covered is higher by 0.102 than for someone who is never covered.

Table 3b presents estimates under the constraint that λ = 0. The increment in the distance statistic is 89.08 − 19.36 = 69.72, which is a surprisingly large value to come from a χ²(13) distribution. If we constrain only the union λ's to be zero, then the increment is 57.06 − 19.36 = 37.7, which is surprisingly large coming from a χ²(7) distribution. So there is strong evidence for heterogeneity bias.

The union coefficient declines from 0.157 to 0.107 when we relax the λ = 0 restriction. The least squares estimates for the separate cross-sections, with

¹⁶We did not find much evidence for nonstationarity in the slope coefficients. If we allow the union β to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines to 18.51, giving 19.36 − 18.51 = 0.85; this is not a surprising value from a χ²(2) distribution. If we also free up β for SMSA and RNS, then the decline in the distance statistic is 18.51 − 13.44 = 5.07, which is not a surprising value from a χ²(4) distribution.

Table 3a
Restricted estimates.

Coefficients (and standard errors) of:

            U               SMSA            RNS
β̂           0.107 (0.016)   0.056 (0.020)   −0.082 (0.045)
β̂_GLS       0.121 (0.013)   0.050 (0.017)   −0.085 (0.040)
                  (0.018)         (0.021)          (0.052)

χ²(23) = 19.36

(The table also reports the estimated λ's for the union, SMSA, and region variables.)

E*(y_i | x_i) = Πx_i = Π₁x_i1 + Π₂x_i2; x_i1′ = (U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3); x_i2′ = (1, S, EXP69, EXP69², RACE). Π₁ = (β_U I₃, 0, β_SMSA I₃, β_RNS I₃) + lλ′; Π₂ is unrestricted. The restrictions are expressed as π = Fδ, where δ is unrestricted. β̂ and λ̂ are minimum distance estimates with A_N = Ω̂⁻¹ in eq. (2), section 3.1; β̂_GLS and λ̂_GLS are minimum distance estimates with the A_N in eq. (6), section 4 (λ̂_GLS is not shown in the table). The first standard error for β̂_GLS is the conventional one based on (F′Â_NF)⁻¹; the second standard error for β̂_GLS is based on (F′Â_NF)⁻¹F′Â_NΩ̂Â_NF(F′Â_NF)⁻¹. The χ² statistics are computed from N[π̂ − Fδ̂]′Ω̂⁻¹[π̂ − Fδ̂].


Table 3b
Restricted estimates under the constraint that λ = 0.

Coefficients (and standard errors) of:

U               SMSA            RNS
0.157 (0.012)   0.120 (0.013)   −0.150 (0.016)

χ²(36) = 89.08

See footnote to table 3a.

no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970 and 1971.¹⁷ So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of −0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.

5. Conclusion

We have examined the relationship between heterogeneity bias and strict exogeneity in distributed lag regressions of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual specific random variables introduce nonlinearity and heteroskedasticity. So we have provided an appropriate framework for the estimation of multivariate linear predictors. We showed that the optimal minimum distance estimator is more efficient, in general, than the conventional estimators such as quasi-maximum likelihood. We provided computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain.

¹⁷Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966–1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation and industry specific job characteristics.


Some of these ideas were illustrated using the sample of Young Men in the National Longitudinal Survey. We examined regressions of wages on the leads and lags in union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.

Appendix A

Let Ω be a set of points, where ω ∈ Ω is a doubly infinite sequence of vectors of real numbers:

ω = {…, ω₋₁, ω₀, ω₁, …} = {ω_t, t ∈ I},

where ω_t ∈ R^q and I is the set of all integers. Let z_t(ω) = ω_t be the tth coordinate function. Let F be the σ-field generated by sets of the form

A = {ω: z_t(ω) ∈ B₀, …, z_{t+k}(ω) ∈ B_k},

where t, k ∈ I and the B's are q-dimensional Borel sets. Let P be a probability measure defined on F such that {z_t, t ∈ I} is a (strictly) stationary stochastic process on the probability space (Ω, F, P).

The shift transformation S is defined by z_t(Sω) = z_{t+1}(ω). It is an invertible, measure-preserving transformation. A random variable d defined on (Ω, F, P) is invariant if d(Sω) = d(ω) except on a set with probability measure zero (almost surely or a.s.). A set A ∈ F is invariant if its indicator function is an invariant random variable.

We shall use E(d | G)_ω to denote the conditional expectation of the random variable d with respect to the σ-field G, evaluated at ω. Let x_t be a component of z_t, let σ(x) denote the σ-field generated by {…, x₋₁, x₀, x₁, …}, and let E(d | x_t, x_{t−1}, …) denote the expectation of d conditional on the σ-field generated by x_t, x_{t−1}, ….

Proposition 3. If d is an invariant random variable with E|d| < ∞, then E(d | σ(x)) = E(d | x_t, x_{t−1}, …) a.s., for any t ∈ I.

Proof. First we shall show that E(d | σ(x)) is an invariant random variable. Let f(ω) = d(Sω). A change of variable argument shows that

E(d | σ(x))_{Sω} = E(f | S⁻¹σ(x))_ω   a.s.

[See Billingsley (1965, example 10.3, p. 109).] Since d is an invariant random variable, we have d(Sω) = d(ω) a.s., and S⁻¹σ(x) = σ(x). Hence

E(d | σ(x))_{Sω} = E(d | σ(x))_ω   a.s.

Let σ(x_t, x_{t−1}, …) be the σ-field generated by (x_t, x_{t−1}, …), and let T = ∩_{t∈I} σ(x_t, x_{t−1}, …) be the left tail σ-field generated by the x process. Since E(d | σ(x)) is an invariant random variable, there is a version of E(d | σ(x)) that is measurable with respect to T. [See Rozanov (1967, lemma 6.1, p. 162).] Hence E(d | σ(x)) = E(d | T) a.s., and so E(d | σ(x)) = E(d | σ(x_t, x_{t−1}, …)). Q.E.D.

Let d be an invariant random variable and assume that E(d²) < ∞, E(x_t²) < ∞. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables {d, …, x₋₁, x₀, x₁, …}, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is (a, b) = E(ab). Then the linear predictor E*(d | …, x₋₁, x₀, x₁, …) is defined as the projection of d on the closed linear subspace generated by {1, …, x₋₁, x₀, x₁, …}.

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | …, x₋₁, x₀, x₁, …) = φ + λx̄,

where

x̄ = l.i.m._{J→∞} Σ_{j=1}^{J} x_{t−j}/J   (t is any integer),

λ = cov(d, x̄)/V(x̄)   if V(x̄) ≠ 0,
λ = 0                  if V(x̄) = 0,

and φ = E(d) − λE(x̄).

Proof. The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since d is an invariant random variable, we have cov(d, x_t) = cov(d, x₁) for all t. Let x̄_J = Σ_{j=1}^{J} x_{t−j}/J. Then cov(d, x̄_J) = cov(d, x₁), and so cov(d, x̄) = lim_{J→∞} cov(d, x̄_J) = cov(d, x₁). Since x̄ is an invariant random variable, we have cov(x̄, x_t) = cov(x̄, x₁), and so V(x̄) = lim_{J→∞} cov(x̄, x̄_J) = cov(x̄, x₁). Hence

cov(d − φ − λx̄, x_t) = cov(d, x₁) − λ cov(x̄, x₁) = 0,   t ∈ I.   Q.E.D.


Appendix B

Let r_i′ = (x_i′, y_i′), i = 1, …, N, where x_i′ = (x_i1, …, x_iK) and y_i′ = (y_i1, …, y_iM). Write the mth structural equation as

y_im = δ_m⁰′z_im + v_im,   m = 1, …, M,

where the components of z_im are the variables in y_i and x_i that appear in the mth equation with unknown coefficients. Let S_zx be the following block-diagonal matrix:

S_zx = diag{N⁻¹ Σᵢ z_i1x_i′, …, N⁻¹ Σᵢ z_iMx_i′},

and

s_xy′ = (N⁻¹ Σᵢ y_i1x_i′, …, N⁻¹ Σᵢ y_iMx_i′).

Let v_i⁰′ = (v_i1⁰, …, v_iM⁰), where v_im⁰ = y_im − δ_m⁰′z_im and δ_m⁰ is the true value of δ_m; let Φ_zx = E(S_zx). Let δ′ = (δ₁′, …, δ_M′) be s × 1, and set

δ̂ = (S_zx D⁻¹ S_zx′)⁻¹ (S_zx D⁻¹ s_xy).

Proposition 5. Assume that (1) r_i is i.i.d. according to some distribution with finite fourth moments; (2) E[x_i(y_im − δ_m⁰′z_im)] = 0 (m = 1, …, M); (3) rank(Φ_zx) = s; and (4) D →a.s. Ψ as N → ∞, where Ψ is a positive definite matrix. Then √N(δ̂ − δ⁰) →d N(0, Λ), where

Λ = (Φ_zx Ψ⁻¹ Φ_zx′)⁻¹ Φ_zx Ψ⁻¹ E(v_i⁰v_i⁰′ ⊗ x_ix_i′) Ψ⁻¹ Φ_zx′ (Φ_zx Ψ⁻¹ Φ_zx′)⁻¹.

Proof. We have

√N(δ̂ − δ⁰) = (S_zx D⁻¹ S_zx′)⁻¹ S_zx D⁻¹ N^{−1/2} Σᵢ (v_i⁰ ⊗ x_i).

By the strong law of large numbers, S_zx →a.s. Φ_zx; Φ_zx Ψ⁻¹ Φ_zx′ is an s × s positive definite matrix since rank(Φ_zx) = s. So we obtain the same limiting distribution by considering

(Φ_zx Ψ⁻¹ Φ_zx′)⁻¹ Φ_zx Ψ⁻¹ N^{−1/2} Σᵢ (v_i⁰ ⊗ x_i).

Then applying the central limit theorem, with V(v_i⁰ ⊗ x_i) = E(v_i⁰v_i⁰′ ⊗ x_ix_i′), gives √N(δ̂ − δ⁰) →d N(0, Λ). Q.E.D.

This estimator includes as special cases a number of the commonly used estimators. If z_im = x_i (m = 1, …, M) and D = I, then δ̂ is the least squares estimator and Λ reduces to the formula for Ω given in eq. (1) of section 3.1. If Ψ = E(v_i⁰v_i⁰′) ⊗ E(x_ix_i′), then Λ is the asymptotic covariance matrix for the three-stage least squares estimator. If Ψ = E(v_i⁰v_i⁰′ ⊗ x_ix_i′), then Λ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If

Ψ = diag{E(v_i1⁰²) E(x_ix_i′), …, E(v_iM⁰²) E(x_ix_i′)},

then we have the asymptotic covariance matrix for two-stage least squares. If

Ψ = diag{E(v_i1⁰² x_ix_i′), …, E(v_iM⁰² x_ix_i′)},

then we have the asymptotic covariance matrix for generalized two-stage least squares. [Λ₁₁ is given in eq. (4), section 3.3.]

Next we shall derive the properties of the minimum distance estimator. Let

D_N(θ) = [a_N − g(θ)]′ A_N [a_N − g(θ)],

and choose θ̂ to min_{θ∈T_θ} D_N(θ). Assumptions 1 and 2 are stated in section 3.2.

Proposition 6. If Assumption 1 is satisfied, then θ̂ →a.s. θ⁰.

Proof. Let D*(θ) = [g(θ⁰) − g(θ)]′ Ψ [g(θ⁰) − g(θ)]. D_N a.s. converges uniformly to D* on T_θ. Let B be a neighborhood of θ⁰ and set T̄ = T_θ − B. Then

min_{θ∈T̄} D_N(θ) →a.s. min_{θ∈T̄} D*(θ) = ε > 0,

whereas D_N(θ̂) ≤ D_N(θ⁰) →a.s. 0; so a.s. θ̂ ∈ B for sufficiently large N. Since B is an arbitrary neighborhood of θ⁰, we have shown that θ̂ →a.s. θ⁰. Q.E.D.

Proposition 7. If Assumptions 1 and 2 are satisfied, then

√N(θ̂ − θ⁰) →d N(0, Λ),   Λ = (G′ΨG)⁻¹ G′ΨΔΨG (G′ΨG)⁻¹,

where G = ∂g(θ⁰)/∂θ′. If Δ is positive definite, then Λ − (G′Δ⁻¹G)⁻¹ is positive semi-definite; hence an optimal choice for Ψ is Δ⁻¹.

Proof. Let

s_N(θ) = ∂D_N(θ)/∂θ = −2[∂g′(θ)/∂θ] A_N [a_N − g(θ)].

Since θ̂ →a.s. θ⁰, for N sufficiently large we a.s. have s_N(θ̂) = 0. The mean value theorem implies that

0 = s_N(θ̂) = s_N(θ⁰) + [∂s_N(θ*)/∂θ′](θ̂ − θ⁰)

for sufficiently large N, where θ* is on the line segment connecting θ̂ and θ⁰. [There is a different θ* for each row of ∂s_N(θ*)/∂θ′; the measurability of θ* follows from lemmas 2 and 3 of Jennrich (1969).] Since θ* →a.s. θ⁰, direct evaluation shows that

∂s_N(θ*)/∂θ′ →a.s. 2G′ΨG,

which is non-singular. Hence

√N(θ̂ − θ⁰) = −[∂s_N(θ*)/∂θ′]⁻¹ √N s_N(θ⁰)   a.s.,

for sufficiently large N. We obtain the same limiting distribution by considering

(G′ΨG)⁻¹ G′Ψ √N[a_N − g(θ⁰)];

hence √N(θ̂ − θ⁰) →d N(0, Λ).

If Δ is positive definite, let C be a non-singular matrix such that Δ = C′C. Let Ḡ = (C′)⁻¹G and B = (G′ΨG)⁻¹G′ΨC′. Then we have Λ = BB′, BḠ = I, and

Λ − (G′Δ⁻¹G)⁻¹ = B[I − Ḡ(Ḡ′Ḡ)⁻¹Ḡ′]B′,

which is positive semi-definite. Q.E.D.

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is positive definite, and if A_N →a.s. Δ⁻¹, then

d_N = N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d χ²(q − p).

Proof. For sufficiently large N we have

√N[g(θ̂) − g(θ⁰)] = G_N √N(θ̂ − θ⁰)   a.s.,

where G_N →a.s. G. From the proof of Proposition 7, we have

√N(θ̂ − θ⁰) = R_N √N[a_N − g(θ⁰)]   a.s.,

where R_N →a.s. R = (G′Δ⁻¹G)⁻¹G′Δ⁻¹. Hence

√N[a_N − g(θ̂)] = √N[a_N − g(θ⁰)] − √N[g(θ̂) − g(θ⁰)] →d QC′u,

where Q = I_q − GR, u ~ N(0, I_q), and C is a non-singular matrix such that C′C = Δ; so

N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d u′CQ′Δ⁻¹QC′u.

Let Ḡ = (C′)⁻¹G and M_G = I_q − Ḡ(Ḡ′Ḡ)⁻¹Ḡ′; then M_G is a symmetric idempotent matrix with rank q − p and

CQ′Δ⁻¹QC′ = M_G′M_G = M_G.

Hence d_N →d u′M_Gu, which is distributed as χ²(q − p). Q.E.D.

Now suppose that θ is subject to additional restrictions, which are expressed by the condition that θ = f(α), where α is s × 1 (s ≤ p). The domain of α is T_α, a subset of R^s that contains the true value α⁰. So θ = f(α) is confined to a certain subset of R^p.

Assumption 2′. T_α is a compact subset of R^s that contains α⁰; f is a continuous mapping from T_α into T_θ; f(α) = θ⁰ for α ∈ T_α implies α = α⁰; T_α contains a neighborhood of α⁰ in which f has continuous second partial derivatives; rank(F) = s, where F = ∂f(α⁰)/∂α′.

Let h(α) = g[f(α)]. Choose α̂ to

min_{α∈T_α} [a_N − h(α)]′ A_N [a_N − h(α)].

Proposition 9. If Assumptions 1, 2, and 3 are satisfied, and if $\Delta$ is positive definite, then $d_1 - d_2 \xrightarrow{d} \chi^2(p-s)$, where

$$ d_1 = N[a_N - h(\hat{\alpha})]'A_N[a_N - h(\hat{\alpha})], \qquad d_2 = N[a_N - g(\hat{\theta})]'A_N[a_N - g(\hat{\theta})]. $$

Furthermore, $d_1 - d_2$ is asymptotically independent of $d_2$.

Proof. The assumptions on $f$ and $\Upsilon_\alpha$ imply that $h$ and $\hat{\alpha}$ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector $(d_1, d_2)$ converges in distribution to $(d_1^*, d_2^*)$, where

$$ d_1^* = u'M_Hu, \qquad d_2^* = u'M_Gu, $$

$u \sim N(0, I_q)$, $C$ is a non-singular matrix such that $C'C = \Delta$, $\tilde{G} = C'^{-1}G$, $\tilde{H} = C'^{-1}H$ with $H = GF$, and

$$ M_G = I_q - \tilde{G}(\tilde{G}'\tilde{G})^{-1}\tilde{G}', \qquad M_H = I_q - \tilde{H}(\tilde{H}'\tilde{H})^{-1}\tilde{H}'. $$

Since the columns of $\tilde{H} = \tilde{G}F$ lie in the column space of $\tilde{G}$, $M_H - M_G$ is a symmetric idempotent matrix with rank $p-s$. Hence

$$ d_1^* - d_2^* = u'(M_H - M_G)u \sim \chi^2(p-s). $$

Since $\operatorname{cov}[(M_H - M_G)u,\, M_Gu] = (M_H - M_G)M_G = 0$, we see that $d_1^* - d_2^*$ is independent of $d_2^*$.   Q.E.D.
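The $\chi^2(p-s)$ limit of $d_1 - d_2$ in Proposition 9 admits a similar numerical sketch (again not from the original text): impose a linear restriction $\theta = F\alpha$, so that $h$ has Jacobian $H = GF$, and compare the restricted and unrestricted minimum distance fits. Dimensions and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, s, N, reps = 6, 3, 1, 500, 2000

G = rng.standard_normal((q, p))
F = rng.standard_normal((p, s))           # linear restriction theta = F @ alpha
H = G @ F                                 # Jacobian of h(alpha) = g(f(alpha))
C = rng.standard_normal((q, q)) + 3 * np.eye(q)
Delta = C.T @ C
W = np.linalg.inv(Delta)                  # optimal weight matrix
theta0 = F @ np.array([0.7])              # true theta satisfies the restriction

def md_stat(J, a):
    # minimum distance statistic for a linear model with Jacobian J
    b = np.linalg.solve(J.T @ W @ J, J.T @ W @ a)
    e = a - J @ b
    return N * e @ W @ e

diff = np.empty(reps)
for r in range(reps):
    a_N = G @ theta0 + C.T @ rng.standard_normal(q) / np.sqrt(N)
    diff[r] = md_stat(H, a_N) - md_stat(G, a_N)   # d1 - d2

print(round(diff.mean(), 2))              # near p - s = 2
```

In this linear Gaussian setting $d_1 - d_2 = u'(M_H - M_G)u$ exactly, so each draw is non-negative and the sample mean settles near $p - s$.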

Next we consider applying the minimum distance procedure both to $\pi$ and to $w$. We want to show that if the restrictions involve only $\pi$, then the two procedures give estimators of $\pi$ with the same limiting distribution. First consider the effect of a one-to-one transformation from $w$ to $(\pi', w_2')$: let $l(\rho)$ be a function from $R^q$ into $R^q$ and let $L = \partial l(\rho^0)/\partial\rho'$, where $\rho^0 = g(\theta^0)$. Let $h(\theta) = l[g(\theta)]$. Choose $\tilde{\theta}$ to

$$ \min_{\theta \in \Upsilon} [l(a_N) - h(\theta)]'A_N[l(a_N) - h(\theta)]. $$

Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for $g$ and $\Upsilon$; (2) $l$ is one-to-one and continuous on the range of $g(\theta)$ for $\theta \in \Upsilon$; $l$ has continuous second partial derivatives in a neighborhood of $g(\theta^0)$; $L$ is non-singular; (3) $\Delta$ is positive definite and $A_N \to (L\Delta L')^{-1}$ a.s. Then

$$ \sqrt{N}(\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Lambda), \qquad \Lambda = (G'\Delta^{-1}G)^{-1}. $$

Proof. By the $\delta$-method, $\sqrt{N}[l(a_N) - h(\theta^0)] \xrightarrow{d} N(0, L\Delta L')$. Hence $\sqrt{N}(\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = [H'(L\Delta L')^{-1}H]^{-1}$ and $H = \partial h(\theta^0)/\partial\theta'$. Since $H = LG$ and $L$ is non-singular, we have $\Lambda = (G'\Delta^{-1}G)^{-1}$.   Q.E.D.
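The invariance asserted in the proof — $[G'L'(L\Delta L')^{-1}LG]^{-1} = (G'\Delta^{-1}G)^{-1}$ for non-singular $L$ — can be verified numerically on arbitrary matrices. This is an illustrative check, not from the original text; all matrices below are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(2)
q, p = 5, 2
G = rng.standard_normal((q, p))
C = rng.standard_normal((q, q)) + 2 * np.eye(q)
Delta = C.T @ C                                 # positive definite avar matrix
L = rng.standard_normal((q, q)) + 2 * np.eye(q) # non-singular Jacobian of l

# asymptotic covariance from the untransformed problem
Lam_orig = np.linalg.inv(G.T @ np.linalg.inv(Delta) @ G)
# asymptotic covariance after the one-to-one transformation l
LG = L @ G
Lam_trans = np.linalg.inv(LG.T @ np.linalg.inv(L @ Delta @ L.T) @ LG)

print(np.allclose(Lam_orig, Lam_trans))         # True
```

The equality is exact: $L'(L\Delta L')^{-1}L = \Delta^{-1}$ whenever $L$ is non-singular, so the transformed problem carries no extra information.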

Finally, consider augmenting $a_N$ to a $k \times 1$ vector $c_N$: $c_N' = (a_N', b_N')$, $k \ge q$. (For example, we can augment $\hat{\pi}$ by adding $\hat{w}$.) Assume that $c_N \to \xi^0$ a.s., where $\xi^{0\prime} = (g(\theta^0)', \xi_2^{0\prime})$, and assume that $\sqrt{N}(c_N - \xi^0) \xrightarrow{d} N(0, \Phi)$. We shall let $\xi_2$ be unrestricted. Let $\psi' = (\psi_1', \psi_2') = (\theta', \xi_2')$ be a $1 \times n$ vector, where $n = p + k - q$; set $m(\psi)' = (g(\theta)', \psi_2')$. Choose $\hat{\psi}$ to

$$ \min_{\psi \in \Upsilon_\psi} [c_N - m(\psi)]'A_N^c[c_N - m(\psi)], $$

where $A_N^c \to \Phi^{-1}$ a.s. Then $\hat{\psi}_1$ provides an estimator of $\theta$; we want to compare this estimator with the following one: choose $\hat{\theta}$ to

$$ \min_{\theta \in \Upsilon} [a_N - g(\theta)]'A_N[a_N - g(\theta)], $$

where $A_N \to \Delta^{-1}$ a.s., $\Upsilon$ is a compact subset of $R^p$, and $g$ is continuous on $\Upsilon$. We shall set $\Upsilon_\psi$ equal to the Cartesian product of $\Upsilon$ and $R^{k-q}$. Suppose that $A_N$ and $A_N^c$ are positive definite, and that the submatrix of $(A_N^c)^{-1}$ consisting of the first $q$ rows and columns equals $A_N^{-1}$. Then we have the following result:

Proposition 9b. $\hat{\psi}_1 = \hat{\theta}$.

Proof. Partition $A_N^c$ conformably with $(a_N', b_N')$. Minimizing with respect to $\psi_2$ gives

$$ \hat{\psi}_2 - b_N = -(A_{22}^c)^{-1}A_{21}^c[a_N - g(\theta)], $$

and the concentrated distance function is

$$ [a_N - g(\theta)]'[A_{11}^c - A_{12}^c(A_{22}^c)^{-1}A_{21}^c][a_N - g(\theta)] = [a_N - g(\theta)]'A_N[a_N - g(\theta)]. $$

So the addition of unrestricted moments does not affect the minimum distance estimator.   Q.E.D.
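The concentration step rests on a standard block-inverse (Schur complement) identity: $A_{11}^c - A_{12}^c(A_{22}^c)^{-1}A_{21}^c$ equals the inverse of the leading $q \times q$ block of $(A_N^c)^{-1}$, which the hypothesis sets equal to $A_N^{-1}$. A quick numerical check of the identity (illustrative, not from the original text; the matrix is randomly generated):

```python
import numpy as np

rng = np.random.default_rng(3)
q, k = 3, 5
M = rng.standard_normal((k, k))
Ac = M @ M.T + k * np.eye(k)              # positive definite weight matrix A^c
A11, A12 = Ac[:q, :q], Ac[:q, q:]
A21, A22 = Ac[q:, :q], Ac[q:, q:]

# Schur complement that appears in the concentrated distance function
schur = A11 - A12 @ np.linalg.inv(A22) @ A21
# leading q x q block of the inverse weight matrix
lead_block_of_inv = np.linalg.inv(Ac)[:q, :q]

print(np.allclose(schur, np.linalg.inv(lead_block_of_inv)))   # True
```

So when the leading block of $(A_N^c)^{-1}$ is $A_N^{-1}$, the concentrated objective is exactly the original minimum distance objective, which is the content of Proposition 9b.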

References

Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R. Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis (Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite sample distribution function in predictive testing of explanatory economic models, Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94, 113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical Statistics 27, 336-351.
Cramér, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.
Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica 39, 341-358.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and Statistics 63, 43-52.
Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.
Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of time series and cross section data, in: C. Christ et al., eds., Measurement in economics (Stanford University Press, Stanford, CA).
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.
Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press, New Haven, CT).
Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).
Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.
Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).
Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.
Swamy, P.A.V.B., 1974, Linear models with random coefficients, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York).
Stafford, F.P. and G.J. Duncan, 1980, Do union members receive compensating wage differentials?, American Economic Review 70, 355-371.
Wallace, T.D. and A. Hussain, 1969, The use of error components models in combining time series with cross section data, Econometrica 37, 55-72.
White, H., 1980, Using least squares to approximate unknown regression functions, International Economic Review 21, 149-170.
White, H., 1980a, Nonlinear regression on cross section data, Econometrica 48, 721-746.
White, H., 1980b, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
White, H., 1982, Instrumental variables regression with independent observations, Econometrica 50, forthcoming.
Zellner, A. and H. Theil, 1962, Three-stage least squares: Simultaneous estimation of simultaneous equations, Econometrica 30, 54-78.
Zellner, A., J. Kmenta and J. Drèze, 1966, Specification and estimation of Cobb-Douglas production function models, Econometrica 34, 784-795.
