
The Stata Journal (2003) 3, Number 1, pp. 1–31

Instrumental variables and GMM: Estimation and testing

Christopher F. Baum, Boston College
Mark E. Schaffer, Heriot–Watt University
Steven Stillman, New Zealand Department of Labour

Abstract. We discuss instrumental variables (IV) estimation in the broader context of the generalized method of moments (GMM), and describe an extended IV estimation routine that provides GMM estimates as well as additional diagnostic tests. Stand-alone test procedures for heteroskedasticity, overidentification, and endogeneity in the IV context are also described.

Keywords: st0030, instrumental variables, generalized method of moments, endogeneity, heteroskedasticity, overidentifying restrictions, clustering, intra-group correlation

1 Introduction

The application of the instrumental variables (IV) estimator in the context of the classical linear regression model is, from a textbook perspective, quite straightforward: if the error distribution cannot be considered independent of the regressors' distribution, IV is called for, using an appropriate set of instruments. But applied researchers often must confront several hard choices.

An omnipresent problem in empirical work is heteroskedasticity. Although the consistency of the IV coefficient estimates is not affected by the presence of heteroskedasticity, the standard IV estimates of the standard errors are inconsistent, preventing valid inference. The usual forms of the diagnostic tests for endogeneity and overidentifying restrictions will also be invalid if heteroskedasticity is present. These problems can be partially addressed through the use of heteroskedasticity-consistent or "robust" standard errors and statistics. The conventional IV estimator (though consistent) is, however, inefficient in the presence of heteroskedasticity. The usual approach today when facing heteroskedasticity of unknown form is to use the generalized method of moments (GMM), introduced by Hansen (1982). GMM makes use of the orthogonality conditions to allow for efficient estimation in the presence of heteroskedasticity of unknown form.

In the twenty years since it was first introduced, GMM has become a very popular tool among empirical researchers. It is also a very useful heuristic tool. Many standard estimators, including IV and OLS, can be seen as special cases of GMM estimators, and are often presented as such in first-year graduate econometrics texts. Most of the diagnostic tests we discuss in this paper can also be cast in a GMM framework. We begin, therefore, with a short presentation of IV and GMM estimation in Section 2. We include here a

© 2003 Stata Corporation st0030

discussion of intra-group correlation or "clustering". If the error terms in the regression are correlated within groups, but not correlated across groups, then the consequences for IV estimation are similar to those of heteroskedasticity: the IV coefficient estimates are consistent, but their standard errors and the usual forms of the diagnostic tests are not. We discuss how clustering can be interpreted in the GMM context and how it can be dealt with in Stata to make efficient estimation, valid inference, and diagnostic testing possible.

Efficient GMM brings with it the advantage of efficiency in the presence of arbitrary heteroskedasticity, but at a cost of possibly poor finite-sample performance. If heteroskedasticity is in fact not present, then standard IV may be preferable. The usual Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker tests for the presence of heteroskedasticity in a regression equation can be applied to an IV regression only under restrictive assumptions. In Section 3, we discuss the test of Pagan and Hall (1983) designed specifically for detecting the presence of heteroskedasticity in IV estimation, and its relationship to these other heteroskedasticity tests.

Even when IV or GMM is judged to be the appropriate estimation technique, we may still question its validity in a given application: are our instruments "good instruments"? This is the question we address in Section 4. "Good instruments" should be both relevant and valid: correlated with the endogenous regressors and at the same time orthogonal to the errors. Correlation with the endogenous regressors can be assessed by an examination of the significance of the excluded instruments in the first-stage IV regressions. We may cast some light on whether the instruments satisfy the orthogonality conditions in the context of an overidentified model: that is, one in which a surfeit of instruments is available. In that context, we may test the overidentifying restrictions in order to provide some evidence of the instruments' validity. We present the variants of this test due to Sargan (1958), Basmann (1960), and, in the GMM context, Hansen (1982), and show how the generalization of this test, the C or "difference-in-Sargan" test, can be used to test the validity of subsets of the instruments.

Although there may well be reason to suspect nonorthogonality between regressors and errors, the use of IV estimation to address this problem must be balanced against the inevitable loss of efficiency vis-à-vis OLS. It is therefore very useful to have a test of whether or not OLS is inconsistent and IV or GMM is required. This is the Durbin–Wu–Hausman (DWH) test of the endogeneity of regressors. In Section 5, we discuss how to implement variants of the DWH test, and how the test can be generalized to test the endogeneity of subsets of regressors. We then show how the Hausman form of the test can be applied in the GMM context, how it can be interpreted as a GMM test, when it will be identical to the Hansen/Sargan/C test statistic, and when the two test statistics will differ.

We have written four Stata commands—ivreg2, ivhettest, overid, and ivendog—that, together with Stata's built-in commands, allow the user to implement all of the above estimators and diagnostic tests. The syntax diagrams for these commands are presented in the last section of the paper, and the electronic supplement presents annotated examples of their use.


2 IV and GMM estimation

The "Generalized Method of Moments" was introduced by L. Hansen in his celebrated 1982 paper. There are a number of good modern texts that cover GMM, and one recent prominent text, Hayashi (2000), presents virtually all the estimation techniques discussed in the GMM framework. A concise online text that covers GMM is Hansen (2000). The exposition below draws on Hansen (2000, Chapter 11); Hayashi (2000, Chapter 3); Wooldridge (2002, Chapter 8); Davidson and MacKinnon (1993); and Greene (2000). We begin with the standard IV estimator, and then relate it to the GMM framework. We then consider the issue of clustered errors, and finally turn to OLS.

2.1 The method of instrumental variables

The equation to be estimated is, in matrix notation,

$$y = X\beta + u, \qquad E(uu') = \Omega \qquad (1)$$

with typical row

$$y_i = X_i\beta + u_i \qquad (2)$$

The matrix of regressors X is n × K, where n is the number of observations. The error term u is distributed with mean zero and the covariance matrix Ω is n × n. Three special cases for Ω that we will consider are

$$\text{Homoskedasticity: } \Omega = \sigma^2 I \qquad (3)$$

$$\text{Heteroskedasticity: } \Omega = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{pmatrix} \qquad (4)$$

$$\text{Clustering: } \Omega = \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_M \end{pmatrix} \qquad (5)$$

where $\Sigma_m$ indicates an intra-cluster covariance matrix. For cluster m with t observations, $\Sigma_m$ will be t × t. Zero covariance between observations in the M different clusters gives the covariance matrix Ω, in this case, a block-diagonal form.


Some of the regressors are endogenous, so that $E(X_i u_i) \neq 0$. We partition the set of regressors into $[X_1\ X_2]$, with the $K_1$ regressors $X_1$ assumed under the null to be endogenous, and the $(K - K_1)$ remaining regressors $X_2$ assumed exogenous.

The set of instrumental variables is Z and is n × L; this is the full set of variables that are assumed to be exogenous; i.e., $E(Z_i u_i) = 0$. We partition the instruments into $[Z_1\ Z_2]$, where the $L_1$ instruments $Z_1$ are excluded instruments, and the remaining $(L - L_1)$ instruments $Z_2 \equiv X_2$ are the included instruments/exogenous regressors:

$$\text{Regressors } X = [X_1\ X_2] = [X_1\ Z_2] = [\text{Endogenous}\ \ \text{Exogenous}] \qquad (6)$$

$$\text{Instruments } Z = [Z_1\ Z_2] = [\text{Excluded}\ \ \text{Included}] \qquad (7)$$

The order condition for identification of the equation is L ≥ K; there must be at least as many excluded instruments as there are endogenous regressors. If L = K, the equation is said to be "exactly identified"; if L > K, the equation is "overidentified".

Denote by $P_Z$ the projection matrix $Z(Z'Z)^{-1}Z'$. The instrumental variables estimator of β is

$$\hat\beta_{IV} = \{X'Z(Z'Z)^{-1}Z'X\}^{-1}\,X'Z(Z'Z)^{-1}Z'y = (X'P_Z X)^{-1}X'P_Z y \qquad (8)$$

This estimator goes under a variety of names: the instrumental variables (IV) estimator, the generalized instrumental variables estimator (GIVE), or the two-stage least-squares (2SLS) estimator, the last reflecting the fact that the estimator can be calculated in a two-step procedure. We follow Davidson and MacKinnon (1993, 220) and refer to it as the IV estimator rather than 2SLS because the basic idea of instrumenting is central, and because it can be (and in Stata, is more naturally) calculated in one step as well as in two.
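To make (8) concrete, here is a minimal one-step NumPy sketch (our illustration, not part of the original article); the data-generating process and all variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical data: x is endogenous because it shares the confounder v
# with the error u; z is a valid (relevant and exogenous) instrument.
v = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + 0.5 * v + rng.normal(size=n)
u = 0.9 * v + rng.normal(size=n)
beta_true = np.array([1.0, 2.0])          # intercept, slope
X = np.column_stack([np.ones(n), x])      # regressors: constant + endogenous x
Z = np.column_stack([np.ones(n), z])      # instruments: constant + excluded z
y = X @ beta_true + u

# Equation (8): beta_IV = (X' P_Z X)^{-1} X' P_Z y, with P_Z = Z (Z'Z)^{-1} Z'.
# We avoid forming the n x n matrix P_Z explicitly.
XtZ = X.T @ Z
ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_iv = np.linalg.solve(XtZ @ ZtZ_inv @ Z.T @ X, XtZ @ ZtZ_inv @ Z.T @ y)

# OLS for comparison -- inconsistent here because E(x_i u_i) != 0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_iv, beta_ols)
```

With this design, the OLS slope is biased upward while the IV slope recovers the true coefficient.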

The asymptotic distribution of the IV estimator under the assumption of conditional homoskedasticity (3) can be written as follows. Let

$$Q_{XZ} = E(X_i'Z_i) \qquad (9)$$

$$Q_{ZZ} = E(Z_i'Z_i) \qquad (10)$$

and let $\hat u$ denote the IV residuals,

$$\hat u \equiv y - X\hat\beta_{IV} \qquad (11)$$

Then, the IV estimator is asymptotically distributed as $\hat\beta_{IV} \overset{A}{\sim} N\{\beta, V(\hat\beta_{IV})\}$, where

$$V(\hat\beta_{IV}) = \frac{1}{n}\,\sigma^2\,(Q_{XZ}'\,Q_{ZZ}^{-1}\,Q_{XZ})^{-1} \qquad (12)$$

Replacing $Q_{XZ}$, $Q_{ZZ}$, and $\sigma^2$ with their sample estimates

$$\hat Q_{XZ} = \frac{1}{n}X'Z \qquad (13)$$

$$\hat Q_{ZZ} = \frac{1}{n}Z'Z \qquad (14)$$

$$\hat\sigma^2 = \frac{\hat u'\hat u}{n} \qquad (15)$$

we obtain the estimated asymptotic variance–covariance matrix of the IV estimator:

$$V(\hat\beta_{IV}) = \hat\sigma^2\{X'Z(Z'Z)^{-1}Z'X\}^{-1} = \hat\sigma^2(X'P_Z X)^{-1} \qquad (16)$$

Note that some packages, including Stata's ivreg, include a degrees-of-freedom correction to the estimate of $\hat\sigma^2$ by replacing n with n − L. This correction is not necessary, however, since the estimate of $\hat\sigma^2$ would not be unbiased anyway (Greene 2000, 373). Our ivreg2 routine defaults to the large-sample formulas for the estimated error variance and covariance matrix; the user can request the small-sample versions with the option small.
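The algebra in (11)–(16) can be sketched numerically as follows (our illustration, with simulated data; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: one endogenous regressor x instrumented by z
v = rng.normal(size=n)
z = rng.normal(size=n)
x = z + 0.5 * v + rng.normal(size=n)
u = v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
y = X @ np.array([1.0, 2.0]) + u

# IV estimate, residuals (11), and sigma^2 (15)
P_ZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # P_Z X without forming P_Z
beta_iv = np.linalg.solve(P_ZX.T @ X, P_ZX.T @ y)
u_hat = y - X @ beta_iv
sigma2 = (u_hat @ u_hat) / n                   # large-sample formula; a
                                               # small-sample version would
                                               # use a df-corrected divisor

# Estimated asymptotic VCE (16): sigma^2 * (X' P_Z X)^{-1}
V_iv = sigma2 * np.linalg.inv(X.T @ P_ZX)
se = np.sqrt(np.diag(V_iv))
print(beta_iv, se)
```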

2.2 The generalized method of moments

The standard IV estimator is a special case of a generalized method of moments (GMM) estimator. The assumption that the instruments Z are exogenous can be expressed as $E(Z_i u_i) = 0$. The L instruments give us a set of L moments,

$$g_i(\hat\beta) = Z_i'\hat u_i = Z_i'(y_i - X_i\hat\beta) \qquad (17)$$

where $g_i$ is L × 1. The exogeneity of the instruments means that there are L moment conditions, or orthogonality conditions, that will be satisfied at the true value of β:

$$E\{g_i(\beta)\} = 0 \qquad (18)$$

Each of the L moment equations corresponds to a sample moment, and we write these L sample moments as

$$g(\hat\beta) = \frac{1}{n}\sum_{i=1}^{n} g_i(\hat\beta) = \frac{1}{n}\sum_{i=1}^{n} Z_i'(y_i - X_i\hat\beta) = \frac{1}{n}Z'\hat u \qquad (19)$$

The intuition behind GMM is to choose an estimator for β that solves $g(\hat\beta) = 0$. If the equation to be estimated is exactly identified, so that L = K, then we have as many equations—the L moment conditions—as we do unknowns—the K coefficients in $\hat\beta$. In this case, it is possible to find a $\hat\beta$ that solves $g(\hat\beta) = 0$, and this GMM estimator is in fact the IV estimator.

If the equation is overidentified, however, so that L > K, then we have more equations than we do unknowns, and in general it will not be possible to find a $\hat\beta$ that will set all L sample moment conditions to exactly zero. In this case, we take an L × L weighting matrix W and use it to construct a quadratic form in the moment conditions. This gives us the GMM objective function:

$$J(\hat\beta) = n\,g(\hat\beta)'\,W\,g(\hat\beta) \qquad (20)$$


A GMM estimator for β is the $\hat\beta$ that minimizes $J(\hat\beta)$. Deriving and solving the K first order conditions,

$$\frac{\partial J(\hat\beta)}{\partial\hat\beta} = 0 \qquad (21)$$

yields the GMM estimator:

$$\hat\beta_{GMM} = (X'ZWZ'X)^{-1}X'ZWZ'y \qquad (22)$$

Note that the results of the minimization, and hence the GMM estimator, will be the same for weighting matrices that differ by a constant of proportionality (we will make use of this fact below). Beyond this, however, there are as many GMM estimators as there are choices of weighting matrix W.
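For illustration (our sketch, not the authors' code), equation (22) can be coded directly for an arbitrary weighting matrix, and scaling W by a constant leaves the estimate unchanged:

```python
import numpy as np

def gmm_beta(X, Z, y, W):
    """Equation (22): beta = (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XtZ = X.T @ Z
    return np.linalg.solve(XtZ @ W @ XtZ.T, XtZ @ W @ (Z.T @ y))

# Simulated, overidentified example (hypothetical data)
rng = np.random.default_rng(1)
n = 2_000
v = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two excluded instruments
x = z1 + 0.5 * z2 + 0.5 * v + rng.normal(size=n)  # one endogenous regressor
u = v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])         # L = 3 > K = 2
y = X @ np.array([1.0, 2.0]) + u

# Weighting matrices that differ by a constant give the same estimator
W1 = np.linalg.inv(Z.T @ Z)
b1 = gmm_beta(X, Z, y, W1)
b2 = gmm_beta(X, Z, y, 7.0 * W1)
print(b1, b2)
```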

What is the optimal choice of weighting matrix? Denote by S the covariance matrix of the moment conditions g:

$$S = \frac{1}{n}E(Z'uu'Z) = \frac{1}{n}E(Z'\Omega Z) \qquad (23)$$

where S is an L × L matrix. The general formula for the distribution of a GMM estimator is

$$V(\hat\beta_{GMM}) = \frac{1}{n}\,(Q_{XZ}'WQ_{XZ})^{-1}(Q_{XZ}'WSWQ_{XZ})(Q_{XZ}'WQ_{XZ})^{-1} \qquad (24)$$

The efficient GMM estimator is the GMM estimator with an optimal weighting matrix W, one that minimizes the asymptotic variance of the estimator. This is achieved by choosing $W = S^{-1}$. Substitute this into (22) and (24), and we obtain the efficient GMM estimator

$$\hat\beta_{EGMM} = (X'ZS^{-1}Z'X)^{-1}X'ZS^{-1}Z'y \qquad (25)$$

with asymptotic variance

$$V(\hat\beta_{EGMM}) = \frac{1}{n}\,(Q_{XZ}'S^{-1}Q_{XZ})^{-1} \qquad (26)$$

Note the generality (the "G" of GMM) of the treatment thus far; we have not yet made any assumptions about Ω, the covariance matrix of the disturbance term. However, the efficient GMM estimator is not yet a feasible estimator, because the matrix S is not known. To be able to implement the estimator, we need to estimate S, and to do this, we need to make some assumptions about Ω.

2.3 GMM and heteroskedastic errors

Let us start with one of the most commonly encountered cases in cross-section analysis: heteroskedasticity of unknown form, but no clustering (4). We need a heteroskedasticity-consistent estimator of S. Such an $\hat S$ is available by using the standard "sandwich" approach to robust covariance estimation. Denote by $\hat\Omega$ the diagonal matrix of squared residuals,

$$\hat\Omega = \begin{pmatrix} \hat u_1^2 & & 0 \\ & \ddots & \\ 0 & & \hat u_n^2 \end{pmatrix} \qquad (27)$$

where $\hat u_i$ is a consistent estimate of $u_i$. Then a consistent estimator of S is

$$\hat S = \frac{1}{n}(Z'\hat\Omega Z) \qquad (28)$$

This works because, although we cannot hope to estimate the n diagonal elements of Ω with only n observations, they are sufficient to enable us to obtain a consistent estimate of the L × L matrix S.

The $\hat u$ used for the matrix in (27) can come from any consistent estimator of β; efficiency is not required. In practice, the most common choice for estimating $\hat u$ is the IV residuals. This gives us the algorithm for the feasible efficient two-step GMM estimator, as implemented in ivreg2, gmm and ivgmm0:[1]

1. Estimate the equation using IV.

2. Form the residuals $\hat u$. Use these to form the optimal weighting matrix $\hat W = \hat S^{-1} = \left\{\frac{1}{n}(Z'\hat\Omega Z)\right\}^{-1}$.

3. Calculate the efficient GMM estimator $\hat\beta_{EGMM}$ and its variance–covariance matrix using the estimated optimal weighting matrix and (25), (26), and (13). This yields

$$\hat\beta_{EGMM} = \{X'Z(Z'\hat\Omega Z)^{-1}Z'X\}^{-1}X'Z(Z'\hat\Omega Z)^{-1}Z'y \qquad (29)$$

with asymptotic variance

$$V(\hat\beta_{EGMM}) = \{X'Z(Z'\hat\Omega Z)^{-1}Z'X\}^{-1} \qquad (30)$$

A variety of other feasible GMM procedures are also possible. For example, the procedure above can be iterated by obtaining the residuals from the two-step GMM estimator, using these to calculate a new $\hat S$, using this in turn to calculate the three-step feasible efficient GMM estimator, and so forth, for as long as the user wishes or until the estimator converges; this is the "iterated GMM estimator".[2]
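The two-step recipe can be sketched as follows (our illustration; the heteroskedastic data-generating process is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
v = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + z2 + 0.5 * v + rng.normal(size=n)
u = (v + rng.normal(size=n)) * np.sqrt(0.2 + z1**2)  # heteroskedastic error
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])
y = X @ np.array([1.0, 2.0]) + u

# Step 1: estimate the equation by IV
XtZ, ZtZ = X.T @ Z, Z.T @ Z
W_iv = np.linalg.inv(ZtZ)
beta_iv = np.linalg.solve(XtZ @ W_iv @ XtZ.T, XtZ @ W_iv @ Z.T @ y)

# Step 2: residuals -> optimal weighting matrix, S = (1/n) Z' Omega_hat Z
u_hat = y - X @ beta_iv
S = (Z * u_hat[:, None]**2).T @ Z / n      # Z' diag(u_hat^2) Z / n
W = np.linalg.inv(S)

# Step 3: efficient GMM estimate (29) and its variance (30)/(26)
beta_gmm = np.linalg.solve(XtZ @ W @ XtZ.T, XtZ @ W @ Z.T @ y)
V_gmm = np.linalg.inv((XtZ / n) @ W @ (XtZ.T / n)) / n
print(beta_gmm, np.sqrt(np.diag(V_gmm)))
```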

[1] This estimator goes under various names: "2-stage instrumental variables" (2SIV), White (1982); "2-step 2-stage least squares", Cumby et al. (1983); and "heteroskedastic 2-stage least squares" (H2SLS), Davidson and MacKinnon (1993, 599).

[2] Another approach is to choose a different consistent but inefficient Step 1 estimator for the calculation of residuals used in Step 2. One common alternative to IV as the initial estimator is to use the residuals from the GMM estimator that uses the identity matrix as the weighting matrix. Alternatively, one may work directly with the GMM objective function. Note that the estimate of the optimal weighting matrix is derived from some $\hat\beta$. Instead of first obtaining an optimal weighting matrix and then taking it as given when minimizing (20), we can write the optimal weighting matrix as a function of $\hat\beta$, and choose $\hat\beta$ to minimize $J(\hat\beta) = n\,g_n(\hat\beta)'W(\hat\beta)g_n(\hat\beta)$. This is the "continuously updated GMM" of Hansen et al. (1996); it requires numerical optimization methods.


2.4 GMM, IV and homoskedastic versus heteroskedastic errors

Let us now see what happens if we impose the more restrictive assumption of conditional homoskedasticity on Ω (3). This means that the S matrix simplifies to

$$S = \frac{1}{n}E(Z'\Omega Z) = \sigma^2\,\frac{1}{n}E(Z'Z) \qquad (31)$$

The expectation term in (31) can be estimated by $(1/n)Z'Z$, but what about $\sigma^2$? As we noted above, the GMM estimator will be the same for weighting matrices that differ by a constant of proportionality. We can therefore obtain the efficient GMM estimator under conditional homoskedasticity if we simply ignore $\sigma^2$ and use as our weighting matrix

$$\hat W = \left(\frac{1}{n}Z'Z\right)^{-1} \qquad (32)$$

Substituting (32) into (22), we find that it reduces to the formula for the IV estimator in (8). To obtain the variance of the estimator, however, we do need an estimate of $\sigma^2$. If we use the residuals of the IV estimator to calculate $\hat\sigma^2 = (1/n)\hat u'\hat u$, we obtain

$$\hat S = \hat\sigma^2\,\frac{1}{n}Z'Z \qquad (33)$$

Finally, if we now set

$$\hat W = \hat S^{-1} = \left(\hat\sigma^2\,\frac{1}{n}Z'Z\right)^{-1} \qquad (34)$$

and substitute (34) into the formula for the asymptotic variance of the efficient GMM estimator (26), we find that it reduces to the formula for the asymptotic variance of the IV estimator (12). In effect, under the assumption of conditional homoskedasticity, the (efficient) iterated GMM estimator is the IV estimator, and the iterations converge after one step.[3]
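A quick numerical check (ours, on invented data) that the GMM estimator with weighting matrix (32) collapses to the IV formula in (8); the equality is algebraic, so it holds whatever the data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
z = rng.normal(size=(n, 3))                        # three excluded instruments
x = z @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])               # overidentified: L=4 > K=2
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# GMM with the homoskedasticity-optimal weighting matrix (32)
XtZ = X.T @ Z
W = np.linalg.inv(Z.T @ Z / n)
beta_gmm = np.linalg.solve(XtZ @ W @ XtZ.T, XtZ @ W @ Z.T @ y)

# IV / 2SLS via equation (8)
PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
print(beta_gmm, beta_iv)
```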

What are the implications of heteroskedasticity for the IV estimator? Recall that in the presence of heteroskedasticity, the IV estimator is inefficient but consistent, whereas the standard estimated IV covariance matrix is inconsistent. Asymptotically correct inference is still possible, however. In these circumstances the IV estimator is a GMM estimator with a suboptimal weighting matrix, and hence the general formula for the asymptotic variance of a general GMM estimator, (24), still holds. The IV weighting matrix $\hat W$ remains as in (32); what we need is a consistent estimate of S. This is easily done, using exactly the same method employed in two-step efficient GMM. First, form

[3] It is worth noting that the IV estimator is not the only such efficient GMM estimator under conditional homoskedasticity. Instead of treating $\hat\sigma^2$ as a parameter to be estimated in a second stage, what if we return to the GMM criterion function and minimize by simultaneously choosing $\hat\beta$ and $\hat\sigma^2$? The estimator that solves this minimization problem is in fact the Limited Information Maximum Likelihood estimator (LIML). In effect, under conditional homoskedasticity, the continuously updated GMM estimator is the LIML estimator. Calculating the LIML estimator does not require numerical optimization methods, since it can be calculated as the solution to an eigenvalue problem; for example, see Davidson and MacKinnon (1993, 644–51).


the "hat" matrix $\hat\Omega$ as in (27), using the IV residuals, and use this matrix to form the $\hat S$ matrix as in (28). Substitute this $\hat S$, the (suboptimal) IV weighting matrix $\hat W$ (32), and the sample estimates of $Q_{XZ}$ (13) and $Q_{ZZ}$ (14) into the general formula for the asymptotic variance of a GMM estimator (24), and we obtain an estimated variance–covariance matrix for the IV estimator that is robust to the presence of heteroskedasticity:

$$\text{Robust } V(\hat\beta_{IV}) = (X'P_Z X)^{-1}\,\{X'Z(Z'Z)^{-1}(Z'\hat\Omega Z)(Z'Z)^{-1}Z'X\}\,(X'P_Z X)^{-1} \qquad (35)$$

This is in fact the usual Eicker–Huber–White "sandwich" robust variance–covariance matrix for the IV estimator, available from ivreg or ivreg2 with the robust option.
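The sandwich in (35) can be sketched numerically (our illustration, on simulated heteroskedastic data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
z = rng.normal(size=n)
v = rng.normal(size=n)
x = z + 0.5 * v + rng.normal(size=n)
u = (v + rng.normal(size=n)) * np.sqrt(0.2 + z**2)   # heteroskedastic error
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
y = X @ np.array([1.0, 2.0]) + u

# IV estimate and residuals
ZtZ_inv = np.linalg.inv(Z.T @ Z)
PZX = Z @ ZtZ_inv @ (Z.T @ X)
beta_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
u_hat = y - X @ beta_iv

# Meat: X'Z (Z'Z)^{-1} (Z' Omega_hat Z) (Z'Z)^{-1} Z'X
ZOZ = (Z * u_hat[:, None]**2).T @ Z                  # Z' diag(u_hat^2) Z
meat = X.T @ Z @ ZtZ_inv @ ZOZ @ ZtZ_inv @ Z.T @ X
bread = np.linalg.inv(X.T @ PZX)                     # (X' P_Z X)^{-1}
V_robust = bread @ meat @ bread
se_robust = np.sqrt(np.diag(V_robust))
print(beta_iv, se_robust)
```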

2.5 Clustering, robust covariance estimation, and GMM

We turn now to the third special form of the disturbance covariance matrix Ω, clustering. Clustering arises very frequently in cross-section and panel data applications. For example, it may be reasonable to assume that observations on individuals drawn from the same family (cluster) are correlated with each other, but observations on individuals from different families are not. In the panel context, it may be reasonable to assume that observations on the same individual (cluster) in two different time periods are correlated, but observations on two different individuals are not.

As specified in (5), the form of clustering is very general. The intra-cluster correlation $\Sigma_m$ can be of any form, be it serial correlation, random effects, or anything else. The $\Sigma_m$'s may, moreover, vary from cluster to cluster (the cluster analog to heteroskedasticity). Even in these very general circumstances, however, efficient estimation and consistent inference is still possible.

As usual, what we need is a consistent estimate of S. Denote by $u_m$ the vector of disturbances for cluster m; if there are t observations in the cluster, then $u_m$ is t × 1. Let $\hat u_m$ be some consistent estimate of $u_m$. Finally, define $\hat\Sigma_m \equiv \hat u_m\hat u_m'$. If we now define $\hat\Omega_C$ as the block-diagonal form

$$\hat\Omega_C = \begin{pmatrix} \hat\Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \hat\Sigma_M \end{pmatrix} \qquad (36)$$

then an estimator of S that is consistent in the presence of arbitrary intra-cluster correlation is

$$\hat S = \frac{1}{n}(Z'\hat\Omega_C Z) \qquad (37)$$

The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is White (1984, 135–136). It is commonly employed in the context of panel data estimation; see Wooldridge (2002, 193), Arellano (1987), and Kézdi (2002). It is the standard Stata approach to clustering, implemented in, for example, robust, regress, and ivreg2.[4]

The cluster-robust covariance matrix for IV estimation is obtained exactly as in the preceding subsection except using $\hat S$ as defined in (37). This generates the robust standard errors produced by ivreg and ivreg2 with the cluster option. Similarly, GMM estimates that are efficient in the presence of arbitrary intra-cluster correlation are obtained exactly as in Subsection 2.3, except using the cluster-robust estimate of $\hat S$. This efficient GMM estimator is a useful alternative to the fixed or random effects IV estimators available from Stata's xtivreg because it relaxes the constraint imposed by the latter estimators that the correlation of individual observations within a group is constant.
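Because $\hat\Omega_C$ is block diagonal, (37) reduces to a sum of within-cluster score outer products, $\hat S = (1/n)\sum_m (Z_m'\hat u_m)(Z_m'\hat u_m)'$. A sketch (ours, on simulated clustered data):

```python
import numpy as np

rng = np.random.default_rng(5)
M, t = 200, 5                       # M clusters of t observations each
n = M * t
cluster = np.repeat(np.arange(M), t)
a = rng.normal(size=M)[cluster]     # cluster-level error component
z = rng.normal(size=n)
x = z + rng.normal(size=n)
u = a + rng.normal(size=n)          # errors correlated within clusters
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + u

# IV estimate and residuals
PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
u_hat = y - X @ beta_iv

# Cluster-robust S_hat, equation (37): sum over clusters of score outer products
S = np.zeros((Z.shape[1], Z.shape[1]))
for m in range(M):
    g_m = Z[cluster == m].T @ u_hat[cluster == m]   # Z_m' u_m
    S += np.outer(g_m, g_m)
S /= n
print(S)
```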

It is important to note here that, just as we require a reasonable number of diagonal elements (observations) for the usual "hat" matrix $\hat\Omega$, we also require a reasonable number of diagonal elements (clusters) for $\hat\Omega_C$. An extreme case is where the number of clusters M is ≤ K. When this is the case, rank($\hat S$) = M ≤ K ≤ L = rank(Z'Z). At this point, ivreg2 will either refuse to report standard errors (in the case of IV estimation) or exit with an error message (in the case of GMM estimation). But users should take care that, if the cluster option is used, then it ought to be the case that M ≫ K.[5]

2.6 GMM, OLS, and Heteroskedastic OLS (HOLS)

Our final special case of interest is OLS. It is not hard to see that under conditional homoskedasticity and the assumption that all the regressors are exogenous, OLS is an efficient GMM estimator. If the disturbance is heteroskedastic, OLS is no longer efficient, but correct inference is still possible through the use of the Eicker–Huber–White "sandwich" robust covariance estimator, and this estimator can also be derived using the general formula for the asymptotic variance of a GMM estimator with a suboptimal weighting matrix, (24).

A natural question is whether a more efficient GMM estimator exists, and the answer is "yes" (Chamberlain (1982) and Cragg (1983)). If the disturbance is heteroskedastic, there are no endogenous regressors, and the researcher has available additional moment conditions, i.e., additional variables that do not appear in the regression but that are known to be exogenous, then the efficient GMM estimator is that of Cragg (1983), dubbed "heteroskedastic OLS" (HOLS) by Davidson and MacKinnon (1993, 600). It can be obtained in precisely the same way as feasible efficient two-step GMM, except now the first-step inefficient but consistent estimator used to generate the residuals is OLS rather than IV. This estimator can be obtained using ivreg2 by specifying the gmm option, an

[4] There are other approaches to dealing with clustering that put more structure on the Ω matrix and hence are more efficient but less robust. For example, the Moulton (1986) approach to obtaining consistent standard errors is in effect to specify an "error components" (a.k.a. "random effects") structure in (36): $\Sigma_m$ is a matrix with diagonal elements $\sigma^2_u + \sigma^2_v$ and off-diagonal elements $\sigma^2_v$. This is then used with (24) to obtain a consistent estimate of the covariance matrix.

[5] Stata's official ivreg is perhaps excessively forgiving in this regard, and will indicate error only if M ≤ L; i.e., only if the number of instruments exceeds the number of clusters.


empty list of endogenous regressors, and the additional exogenous variables in the list of excluded instruments. If the gmm option is omitted, OLS estimates are reported.

2.7 To GMM or not to GMM?

The advantages of GMM over IV are clear: if heteroskedasticity is present, the GMM estimator is more efficient than the simple IV estimator, whereas if heteroskedasticity is not present, the GMM estimator is no worse asymptotically than the IV estimator.

Nevertheless, the use of GMM does come with a price. The problem, as Hayashi (2000, 215) points out, is that the optimal weighting matrix $\hat S$ at the core of efficient GMM is a function of fourth moments, and obtaining reasonable estimates of fourth moments may require very large sample sizes. The consequence is that the efficient GMM estimator can have poor small sample properties. In particular, Wald tests tend to over-reject the null (good news for the unscrupulous investigator in search of large t statistics, perhaps, but not for the rest of us). If in fact the error is homoskedastic, IV would be preferable to efficient GMM. For this reason, a test for the presence of heteroskedasticity when one or more regressors is endogenous may be useful in deciding whether IV or GMM is called for. Such a test was proposed by Pagan and Hall (1983), and we have implemented it in Stata as ivhettest. We describe this test in the next section.

3 Testing for heteroskedasticity

The Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker statistics are standard tests of the presence of heteroskedasticity in an OLS regression. The principle is to test for a relationship between the residuals of the regression and p indicator variables that are hypothesized to be related to the heteroskedasticity. Breusch and Pagan (1979), Godfrey (1978), and Cook and Weisberg (1983) separately derived the same test statistic. This statistic is distributed as $\chi^2$ with p degrees of freedom under the null of no heteroskedasticity, and under the maintained hypothesis that the error of the regression is normally distributed. Koenker (1981) noted that the power of this test is very sensitive to the normality assumption, and presented a version of the test that relaxed this assumption. Koenker's test statistic, also distributed as $\chi^2_p$ under the null, is easily obtained as $nR^2_c$, where $R^2_c$ is the centered $R^2$ from an auxiliary regression of the squared residuals from the original regression on the indicator variables. When the indicator variables are the regressors of the original equation, their squares, and their cross-products, Koenker's test is identical to White's $nR^2_c$ general test for heteroskedasticity (White 1980). These tests are available in Stata, following estimation with regress, using our ivhettest as well as via hettest and whitetst.
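The $nR^2_c$ statistic is simple to compute by hand; here is a sketch (ours, not the authors' implementation) on simulated residuals, using the levels and squares of a variable as the indicators:

```python
import numpy as np

def koenker_statistic(resid, Psi):
    """n * centered R^2 from regressing resid^2 on indicators Psi.
    Chi-squared with p = Psi.shape[1] degrees of freedom under the null."""
    n = resid.shape[0]
    e2 = resid**2
    W = np.column_stack([np.ones(n), Psi])            # add constant
    coef, *_ = np.linalg.lstsq(W, e2, rcond=None)
    fitted = W @ coef
    r2_c = 1 - np.sum((e2 - fitted)**2) / np.sum((e2 - e2.mean())**2)
    return n * r2_c

rng = np.random.default_rng(6)
n = 2_000
zvar = rng.normal(size=n)
Psi = np.column_stack([zvar, zvar**2])                # levels and squares
u_homo = rng.normal(size=n)                           # homoskedastic errors
u_het = rng.normal(size=n) * np.sqrt(0.2 + zvar**2)   # variance depends on z
stat_homo = koenker_statistic(u_homo, Psi)
stat_het = koenker_statistic(u_het, Psi)
print(stat_homo, stat_het)
```

Under the null, the statistic is small (roughly $\chi^2_2$ here); when the error variance depends on the indicators, it is large and the test rejects.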

As Pagan and Hall (1983) point out, the above tests will be valid tests for heteroskedasticity in an IV regression only if heteroskedasticity is present in that equation and nowhere else in the system. The other structural equations in the system (corresponding to the endogenous regressors $X_1$) must also be homoskedastic, even though


they are not being explicitly estimated.[6] Pagan and Hall derive a test that relaxes this requirement. Under the null of homoskedasticity in the IV regression, the Pagan–Hall statistic is distributed as $\chi^2_p$, irrespective of the presence of heteroskedasticity elsewhere in the system. A more general form of this test was separately proposed by White (1982). Our implementation is of the simpler Pagan–Hall statistic, available with the command ivhettest after estimation by ivreg, ivreg2, or ivgmm0. We present the Pagan–Hall test here in the format and notation of the original White (1980 and 1982) tests, however, to facilitate comparisons with the other tests noted above.[7]

Let Ψ be the n × p matrix of indicator variables hypothesized to be related to the heteroskedasticity in the equation, with typical row $\Psi_i$. These indicator variables must be exogenous, typically either instruments or functions of the instruments. Common choices would be

1. The levels, squares, and cross-products of the instruments Z (excluding the constant), as in the White (1980) test. This is the default in ivhettest.

2. The levels only of the instruments Z (excluding the constant). This is available in ivhettest by specifying the ivlev option.

3. The "fitted value" of the dependent variable. This is not the usual fitted value of the dependent variable, $X\hat\beta$. It is, rather, $\hat X\hat\beta$; i.e., the prediction based on the IV estimator $\hat\beta$, the exogenous regressors $Z_2$, and the fitted values of the endogenous regressors $\hat X_1$. This is available in ivhettest by specifying the fitlev option.

4. The "fitted value" of the dependent variable and its square (fitsq option).

The trade-off in the choice of indicator variables is that a smaller set of indicator variables will conserve degrees of freedom, at the cost of being unable to detect heteroskedasticity in certain directions.


[6] For a more detailed discussion, see Pagan and Hall (1983) or Godfrey (1988, 189–90).

[7] We note here that the original Pagan–Hall paper has a serious typo in the presentation of their nonnormality-robust statistic. Their equation (58b), 195, is missing the term (in their terminology) -2\mu_3 \psi (\hat{X}'\hat{X})^{-1} \hat{X}'D(D'D)^{-1}. The typo reappears in the discussion of the test by Godfrey (1988). The correction published in Pesaran and Taylor (1999) is incomplete, as it applies only to the version of the Pagan–Hall test with a single indicator variable.


Let

\bar{\Psi} = (1/n) \sum_{i=1}^{n} \Psi_i          dimension = 1 \times p
\hat{D} \equiv (1/n) \sum_{i=1}^{n} \Psi_i' (\hat{u}_i^2 - \hat{\sigma}^2)          dimension = p \times 1
\hat{\Gamma} = (1/n) \sum_{i=1}^{n} (\Psi_i - \bar{\Psi})' X_i \hat{u}_i          dimension = p \times K
\hat{\mu}_3 = (1/n) \sum_{i=1}^{n} \hat{u}_i^3
\hat{\mu}_4 = (1/n) \sum_{i=1}^{n} \hat{u}_i^4
\hat{X} = P_Z X          (38)

If u_i is homoskedastic and independent of Z_i, then Pagan and Hall (1983) (Theorem 8) show that under the null of no heteroskedasticity,

n \hat{D}' \hat{B}^{-1} \hat{D} \;\overset{A}{\sim}\; \chi^2_p          (39)

where

\hat{B} = B_1 + B_2 + B_3 + B_4
B_1 = (\hat{\mu}_4 - \hat{\sigma}^4) (1/n) \sum_{i=1}^{n} (\Psi_i - \bar{\Psi})'(\Psi_i - \bar{\Psi})
B_2 = -2 \hat{\mu}_3 \{(1/n) \Psi'\hat{X}\} \{(1/n) \hat{X}'\hat{X}\}^{-1} \hat{\Gamma}'
B_3 = B_2'
B_4 = 4 \hat{\sigma}^2 \, \hat{\Gamma} \{(1/n) \hat{X}'\hat{X}\}^{-1} \hat{\Gamma}'          (40)

This is the default statistic produced by ivhettest. Several special cases are worth

noting:

• If the error term is assumed to be normally distributed, then B_2 = B_3 = 0 and B_1 = 2\hat{\sigma}^4 (1/n) \sum_i (\Psi_i - \bar{\Psi})'(\Psi_i - \bar{\Psi}). This is available from ivhettest with the phnorm option.

• If the rest of the system is assumed to be homoskedastic, then B_2 = B_3 = B_4 = 0 and the statistic in (39) becomes the White/Koenker n R^2_c statistic. This is available from ivhettest with the nr2 option.

• If the rest of the system is assumed to be homoskedastic and the error term is assumed to be normally distributed, then B_2 = B_3 = B_4 = 0, B_1 = 2\hat{\sigma}^4 (1/n) \sum_i (\Psi_i - \bar{\Psi})'(\Psi_i - \bar{\Psi}), and the statistic in (39) becomes the Breusch–Pagan/Godfrey/Cook–Weisberg statistic. This is available from ivhettest with the bpg option.


All of the above statistics will be reported with the all option. ivhettest can also be employed after estimation via OLS or HOLS using regress or ivreg2. In this case, the default test statistic is the White/Koenker n R^2_c test.

The Pagan–Hall statistic has not been widely used in practice, perhaps because it is not a standard feature of most regression packages. For a discussion of the relative merits of the Pagan–Hall test, including some Monte Carlo results, see Pesaran and Taylor (1999). Their findings suggest caution in the use of the Pagan–Hall statistic, particularly in small samples; in these circumstances, the n R^2_c statistic may be preferred.
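To make the mechanics of the n R^2_c special case concrete, the following Python sketch (not the ivhettest code; the data-generating process and all names are illustrative) computes the White/Koenker statistic for a simulated IV regression, using the levels of the instruments as the indicators \Psi:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated IV setup: endogenous x1, exogenous z2, excluded instrument z1
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
v = rng.normal(size=n)
u = rng.normal(size=n) + 0.5 * v          # structural error, correlated with x1
x1 = 0.8 * z1 + 0.5 * z2 + v              # endogenous regressor
const = np.ones(n)

X = np.column_stack([x1, z2, const])      # regressors
Z = np.column_stack([z1, z2, const])      # full instrument set
y = x1 + z2 + u

# 2SLS residuals: beta = (X'PzX)^{-1} X'Pz y
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
uhat = y - X @ beta_iv

# Indicators Psi: levels of the instruments (the ivlev choice), p = 2
Psi = np.column_stack([z1, z2])
p = Psi.shape[1]

# Auxiliary regression of uhat^2 on [Psi, const]; use the centered R^2
W = np.column_stack([Psi, const])
g = uhat ** 2
coef, *_ = np.linalg.lstsq(W, g, rcond=None)
resid = g - W @ coef
r2_c = 1.0 - (resid @ resid) / ((g - g.mean()) @ (g - g.mean()))
nr2 = n * r2_c

# chi-squared survival function with 2 df is exp(-x/2)
pval = math.exp(-nr2 / 2)
print(f"White/Koenker nR^2_c = {nr2:.3f}, p-value = {pval:.3f}")
```

Since the simulated error here is homoskedastic, the statistic should usually be small relative to the \chi^2_2 critical value.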

4 Testing the relevance and validity of instruments

4.1 Testing the relevance of instruments

An instrumental variable must satisfy two requirements: it must be correlated with the included endogenous variable(s), and orthogonal to the error process. The former condition may be readily tested by examining the fit of the first-stage regressions. The first-stage regressions are reduced-form regressions of the endogenous variables X_1 on the full set of instruments Z; the relevant test statistics here relate to the explanatory power of the excluded instruments Z_1 in these regressions. A statistic commonly used, as recommended by, for example, Bound et al. (1995), is the R^2 of the first-stage regression with the included instruments "partialled out".[8] Alternatively, this may be expressed as the F test of the joint significance of the Z_1 instruments in the first-stage regression. However, for models with multiple endogenous variables, these indicators may not be sufficiently informative.

To illustrate the pitfalls facing empirical researchers here, consider the following simple example. The researcher has a model with two endogenous regressors and two excluded instruments. One of the two excluded instruments is highly correlated with each of the two endogenous regressors, but the other excluded instrument is just noise. The model is therefore basically unidentified: there is one good instrument but two endogenous regressors. But the Bound et al. (1995) F statistics and partial R^2 measures from the two first-stage regressions will not reveal this weakness. Indeed, the F statistics will be statistically significant, and without further investigation, the researcher will not realize that the model cannot be estimated in this form. To deal with this problem of "instrument irrelevance", either additional relevant instruments are needed, or one of the endogenous regressors must be dropped from the model. The statistics proposed by Bound et al. (1995) are able to diagnose instrument relevance only in the presence of a single endogenous regressor. When multiple endogenous regressors are used, other statistics are required.

[8] More precisely, this is the "squared partial correlation" between the excluded instruments Z_1 and the endogenous regressor in question. It is defined as (RSS_{Z_2} - RSS_Z)/TSS, where RSS_{Z_2} is the residual sum of squares in the regression of the endogenous regressor on Z_2, and RSS_Z is the RSS when the full set of instruments is used.
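The squared partial correlation just defined can be computed directly from its definition. A Python sketch (illustrative names and simulated data, not part of ivreg2), taking TSS as the total sum of squares of the endogenous regressor about its mean:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Included instruments Z2 (with constant) and one excluded instrument z1
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
const = np.ones(n)
x1 = 0.6 * z1 + 0.4 * z2 + rng.normal(size=n)   # endogenous regressor

def rss(Xmat, yvec):
    """Residual sum of squares from an OLS regression of yvec on Xmat."""
    b, *_ = np.linalg.lstsq(Xmat, yvec, rcond=None)
    e = yvec - Xmat @ b
    return e @ e

Z2 = np.column_stack([z2, const])               # included instruments only
Z = np.column_stack([z1, z2, const])            # full instrument set

# (RSS_Z2 - RSS_Z) / TSS
tss = (x1 - x1.mean()) @ (x1 - x1.mean())
partial_r2 = (rss(Z2, x1) - rss(Z, x1)) / tss
print(f"squared partial correlation = {partial_r2:.3f}")
```

With the instrument coefficient of 0.6 used here, the excluded instrument accounts for roughly a quarter of the variance of x1, so the statistic comes out well away from zero.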


One such statistic has been proposed by Shea (1997): a "partial R^2" measure that takes the intercorrelations among the instruments into account.[9] For a model containing a single endogenous regressor, the two R^2 measures are equivalent. The distribution of Shea's partial R^2 statistic has not been derived, but it may be interpreted like any R^2. As a rule of thumb, if an estimated equation yields a large value of the standard (Bound et al. 1995) partial R^2 and a small value of the Shea measure, one may conclude that the instruments lack sufficient relevance to explain all the endogenous regressors, and the model may be essentially unidentified.

The Bound et al. (1995) measures and the Shea partial R^2 statistic can be obtained via the first or ffirst options on the ivreg2 command.

The consequence of excluded instruments with little explanatory power is increased bias in the estimated IV coefficients (Hahn and Hausman 2002b). If their explanatory power in the first-stage regression is nil, the model is in effect unidentified with respect to that endogenous variable; in this case, the bias of the IV estimator is the same as that of the OLS estimator, IV becomes inconsistent, and nothing is gained from instrumenting (ibid.). If the explanatory power is simply "weak",[10] conventional asymptotics fail. What is surprising is that, as Staiger and Stock (1997) and others have shown, the "weak instrument" problem can arise even when the first-stage tests are significant at conventional levels (5% or 1%) and the researcher is using a large sample. One rule of thumb is that for a single endogenous regressor, an F statistic below 10 is cause for concern (Staiger and Stock 1997, 557). Since the size of the IV bias is increasing in the number of instruments (Hahn and Hausman 2002b), one recommendation when faced with this problem is to be parsimonious in the choice of instruments. For further discussion, see, for example, Staiger and Stock (1997), Hahn and Hausman (2002a, 2002b), and the references cited therein.

4.2 Overidentifying restrictions in GMM

We turn now to the second requirement for an instrumental variable. How can the

instrument’s independence from an unobservable error process be ascertained? If (and

only if) we have a surfeit of instruments—i.e., if the equation is overidentiﬁed—then we

can test the corresponding moment conditions described in (17); that is, whether the

instruments are uncorrelated with the error process. This condition will arise when the

order condition for identiﬁcation is satisﬁed in inequality: the number of instruments

excluded from the equation exceeds the number of included endogenous variables. This

test can and should be performed as a standard diagnostic in any overidentified instrumental variables estimation.[11] These are tests of the joint hypotheses of correct model specification and the orthogonality conditions, and a rejection may properly call either or both of those hypotheses into question.

[9] The Shea partial R^2 statistic may be easily computed according to the simplification presented in Godfrey (1999), who demonstrates that Shea's statistic for endogenous regressor i may be expressed as R^2_p = (\nu^{OLS}_{i,i} / \nu^{IV}_{i,i}) \left[ (1 - R^2_{IV}) / (1 - R^2_{OLS}) \right], where \nu_{i,i} is the estimated asymptotic variance of the coefficient.

[10] One approach in the literature, following Staiger and Stock (1997), is to define "weak" as meaning that the first-stage reduced-form coefficients are in a N^{-1/2} neighborhood of zero, or equivalently, holding the expectation of the first-stage F statistic constant as the sample size increases. See also Hahn and Hausman (2002b).

In the context of GMM, the overidentifying restrictions may be tested via the commonly employed J statistic of Hansen (1982). This statistic is none other than the value of the GMM objective function (20), evaluated at the efficient GMM estimator \hat{\beta}_{EGMM}. Under the null,

J(\hat{\beta}_{EGMM}) = n \, g(\hat{\beta})' \hat{S}^{-1} g(\hat{\beta}) \;\overset{A}{\sim}\; \chi^2_{L-K}          (41)

In the case of heteroskedastic errors, the matrix \hat{S} is estimated using the \hat{\Omega} matrix (27), and the J statistic becomes

J(\hat{\beta}_{EGMM}) = \hat{u}'Z (Z'\hat{\Omega}Z)^{-1} Z'\hat{u} \;\overset{A}{\sim}\; \chi^2_{L-K}          (42)

With clustered errors, the \hat{\Omega}_C matrix (37) can be used instead, and this J will be consistent in the presence of arbitrary intra-cluster correlation.

The J statistic is distributed as \chi^2 with degrees of freedom equal to the number of overidentifying restrictions L - K, rather than the total number of moment conditions L, because, in effect, K degrees of freedom are used up in estimating the coefficients of \beta. J is the most common diagnostic utilized in GMM estimation to evaluate the suitability of the model. A rejection of the null hypothesis implies that the instruments are not satisfying the orthogonality conditions required for their employment. This may be either because they are not truly exogenous, or because they are being incorrectly excluded from the regression. The J statistic is calculated and displayed by ivreg2 when the gmm, robust, or cluster options are specified. In the last case, the J statistic will be consistent in the presence of arbitrary intra-cluster correlation. This can be quite important in practice: Hoxby and Paserman (1998) have shown that the presence of intra-cluster correlation can readily cause a standard overidentification statistic to over-reject the null.
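The heteroskedastic case of the J statistic in (42) can be sketched in a few lines of Python. This is an illustrative two-step feasible efficient GMM calculation on simulated data (not the ivreg2 implementation), with \hat{\Omega} taken as the diagonal matrix of squared first-step 2SLS residuals:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# One endogenous regressor plus constant, three excluded instruments:
# L = 4 instruments, K = 2 regressors, so L - K = 2 overidentifying restrictions
Zx = rng.normal(size=(n, 3))
v = rng.normal(size=n)
u = rng.normal(size=n) + 0.5 * v
x1 = Zx @ np.array([0.5, 0.5, 0.5]) + v
const = np.ones(n)
X = np.column_stack([x1, const])
Z = np.column_stack([Zx, const])
y = x1 + 0.5 + u

# First step: 2SLS, i.e., GMM with weighting matrix (Z'Z)^{-1}
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b1 = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
u1 = y - X @ b1

# Second step: efficient GMM with S_hat = (1/n) Z' diag(u1^2) Z
S = (Z * (u1 ** 2)[:, None]).T @ Z / n
Sinv = np.linalg.inv(S)
A = X.T @ Z @ Sinv @ Z.T @ X
b2 = np.linalg.solve(A, X.T @ Z @ Sinv @ Z.T @ y)
u2 = y - X @ b2

# J = n * gbar' Sinv gbar, gbar = (1/n) Z'u; algebraically equal to
# u'Z (Z' Omega Z)^{-1} Z'u as in (42)
gbar = Z.T @ u2 / n
J = n * gbar @ Sinv @ gbar

# chi-squared survival function with L - K = 2 df is exp(-x/2)
pval = math.exp(-J / 2)
print(f"Hansen J = {J:.3f}, p-value = {pval:.3f}")
```

Since the instruments in this simulation really are orthogonal to the error, the p-value should typically be unremarkable rather than near zero.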

4.3 Overidentifying restrictions in IV

In the special case of linear instrumental variables under conditional homoskedasticity, the concept of the J statistic considerably predates the development of GMM estimation techniques. The ivreg2 procedure routinely presents this test, labeled as Sargan's statistic (Sargan 1958) in the estimation output.

Just as IV is a special case of GMM, Sargan's statistic is a special case of Hansen's J under the assumption of conditional homoskedasticity. Thus, if we use the IV optimal weighting matrix (34) together with the expression for J (41), we obtain

Sargan's statistic = (1/\hat{\sigma}^2) \, \hat{u}'Z(Z'Z)^{-1}Z'\hat{u} = \frac{\hat{u}'Z(Z'Z)^{-1}Z'\hat{u}}{\hat{u}'\hat{u}/n} = \frac{\hat{u}'P_Z\hat{u}}{\hat{u}'\hat{u}/n}          (43)

[11] Thus Davidson and MacKinnon (1993, 236): "Tests of overidentifying restrictions should be calculated routinely whenever one computes IV estimates." Sargan's own view, cited in Godfrey (1988), 145, was that regression analysis without testing the orthogonality assumptions is a "pious fraud".


It is easy to see from (43) that Sargan's statistic has an n R^2_u form (where R^2_u is the uncentered R^2), and it can be easily calculated this way by regressing the IV equation's residuals upon all instruments Z (both the included exogenous variables and those instruments that do not appear in the equation). The n R^2_u of this auxiliary regression will have a \chi^2_{L-K} distribution under the null hypothesis that all instruments are orthogonal to the error. This auxiliary regression test is that performed by overid after ivreg, and the statistic is also automatically reported by ivreg2.[12] A good discussion of this test is presented in Wooldridge (2002, 123).
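The equivalence between the direct form of (43) and the auxiliary-regression form is easy to verify numerically. A Python sketch (illustrative, not the overid code) computes Sargan's statistic both ways:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# One endogenous regressor plus constant, three excluded instruments (L - K = 2)
Zx = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x1 = Zx @ np.array([0.6, 0.6, 0.6]) + v
const = np.ones(n)
X = np.column_stack([x1, const])
Z = np.column_stack([Zx, const])
y = x1 + 0.5 + rng.normal(size=n) + 0.5 * v

# IV (2SLS) residuals
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
uhat = y - X @ beta_iv

# Direct form: u'Pz u / (u'u / n)
sargan_direct = (uhat @ Pz @ uhat) / (uhat @ uhat / n)

# Auxiliary-regression form: n * uncentered R^2 from regressing uhat on Z
b_aux, *_ = np.linalg.lstsq(Z, uhat, rcond=None)
fitted = Z @ b_aux
r2_u = (fitted @ fitted) / (uhat @ uhat)        # uncentered R^2
sargan_aux = n * r2_u

print(f"direct: {sargan_direct:.6f}  auxiliary nR^2_u: {sargan_aux:.6f}")
```

The two numbers agree to floating-point precision, because the fitted values of the auxiliary regression are exactly P_Z \hat{u}.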

The literature contains several variations on this test. The main idea behind these variations is that there is more than one way to consistently estimate the variance in the denominator of (43). The most important of these is that of Basmann (1960). Independently of Sargan, Basmann proposed an F(L-K, n-L) test of overidentifying restrictions,

Basmann's F statistic = \frac{\hat{u}'P_Z\hat{u}/(L-K)}{\hat{u}'M_Z\hat{u}/(n-L)}          (44)

where M_Z \equiv I - P_Z is the "annihilator" matrix and L is the total number of instruments. Note that since \hat{u}'M_Z\hat{u} = \hat{u}'\hat{u} - \hat{u}'P_Z\hat{u}, the same artificial regression can be used to generate both the Basmann and the Sargan statistics.

The difference between Sargan's and Basmann's statistics is that the former uses an estimate of the error variance from the IV regression estimated with the full set of overidentifying restrictions, whereas the latter uses an estimate from a regression without the overidentifying restrictions being imposed.[13] Either method will generate a consistent estimator of the error variance under the null of instrument validity, and hence the two statistics are asymptotically equivalent.
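Both statistics need only the two scalars \hat{u}'P_Z\hat{u} and \hat{u}'\hat{u}, per the identity \hat{u}'M_Z\hat{u} = \hat{u}'\hat{u} - \hat{u}'P_Z\hat{u}. A hedged Python sketch on simulated data (illustrative names throughout):

```python
import numpy as np

rng = np.random.default_rng(4)
n, L, K = 1000, 4, 2                      # 4 instruments, 2 regressors

Zx = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x1 = Zx @ np.array([0.6, 0.6, 0.6]) + v
const = np.ones(n)
X = np.column_stack([x1, const])
Z = np.column_stack([Zx, const])
y = x1 + 0.5 + rng.normal(size=n) + 0.5 * v

Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
uhat = y - X @ np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

q = uhat @ Pz @ uhat                      # u'Pz u, from the artificial regression
ssr = uhat @ uhat                         # u'u

sargan = q / (ssr / n)                    # Sargan: u'Pz u / (u'u / n)
basmann_F = (q / (L - K)) / ((ssr - q) / (n - L))   # uses u'Mz u = u'u - u'Pz u
print(f"Sargan = {sargan:.3f}, Basmann F = {basmann_F:.3f}")
```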

By default, the Sargan n R^2_u statistic and a \chi^2 version of Basmann's statistic (without the numerator degrees of freedom) are reported in the overid output. An alternative form of the Sargan statistic that uses a small-sample correction, replacing the estimate of the error variance \hat{u}'\hat{u}/n with \hat{u}'\hat{u}/(n-K), may be requested via the dfr option; this is also the version of the Sargan statistic reported by ivreg2 for IV estimation when the small option is used. "Pseudo-F" forms of the Sargan and Basmann tests, obtained by dividing the numerator \hat{u}'P_Z\hat{u} by L - K, may be requested via the f option. The all option displays all five statistics.

Neither the Sargan nor the Basmann statistic computed for an IV regression is valid in the presence of conditional heteroskedasticity. In this case, a heteroskedasticity-robust overidentification statistic can be calculated for an IV regression by applying a general result in the literature for a test of overidentification for a GMM estimator with

[12] Note that Stata's regress reports an uncentered R^2 only if the model does not contain a constant, and a centered R^2 otherwise. Consequently, overid calculates the uncentered R^2 itself; the uncentered total sum of squares of the auxiliary regression needed for the denominator of R^2_u is simply the residual sum of squares of the original IV regression.

[13] See Davidson and MacKinnon (1993, 235–236). The Basmann statistic uses the error variance from the estimate of their equation (7.54), and the pseudo-F form of the Basmann statistic is given by equation (7.55); the Sargan statistic is given by their (7.57).


a suboptimal weighting matrix, which is what IV amounts to in these circumstances.[14] It does not seem to have been noted in the literature that in the IV case, this "robustified Sargan statistic" is numerically identical to the J statistic computed from feasible efficient two-step GMM for that equation. Thus, if robust inference is sought in an instrumental variables model, one may calculate the test for overidentification via a standard J statistic. When the robust and/or cluster options are used with ivreg2 to estimate an IV regression with robust standard errors, the Hansen J statistic for feasible efficient two-step GMM is automatically reported.

4.4 Testing a subset of the overidentifying restrictions

The Hansen–Sargan tests for overidentification presented above evaluate the entire set of overidentifying restrictions. In a model containing a very large set of excluded instruments, such a test may have very little power. Another common problem arises when the researcher has prior suspicions about the validity of a subset of instruments, and wishes to test them.

In these contexts, a "difference-in-Sargan" statistic may usefully be employed.[15] The test is known under other names as well; e.g., Ruud (2000) calls it the "distance difference" statistic, and Hayashi (2000) follows Eichenbaum et al. (1988) and dubs it the C statistic; we will use the latter term. The C test allows us to test a subset of the original set of orthogonality conditions. The statistic is computed as the difference between two Sargan statistics (or, for efficient GMM, two J statistics): that for the (restricted, fully efficient) regression using the entire set of overidentifying restrictions, versus that for the (unrestricted, inefficient but consistent) regression using a smaller set of restrictions, in which a specified set of instruments are removed from the set. For excluded instruments, this is equivalent to dropping them from the instrument list. For included instruments, the C test hypothecates placing them in the list of included endogenous variables; in essence, treating them as endogenous regressors. The C test, distributed \chi^2 with degrees of freedom equal to the loss of overidentifying restrictions (i.e., the number of suspect instruments being tested), has the null hypothesis that the specified variables are proper instruments.

Although the C statistic can be calculated as the simple difference between the Hansen–Sargan statistics for two regressions, this procedure can generate a negative test statistic in finite samples. In the IV context, this problem can be avoided, and the C statistic guaranteed to be nonnegative, if the estimate of the error variance \hat{\sigma}^2 from the original (restricted, more efficient) IV regression is used to calculate the Sargan statistic for the unrestricted IV regression as well. The equivalent procedure in the GMM context is to use the \hat{S} matrix from the original estimation to calculate both J statistics. More precisely, \hat{S} from the restricted estimation is used to form the restricted J statistic, and the submatrix of \hat{S} with rows/columns corresponding to the unrestricted estimation is used to form the J statistic for the unrestricted estimation; see Hayashi (2000, 220).

[14] See Ahn (1995), Proposition 1, or, for an alternative formulation, Wooldridge (1995), Procedure 3.2.

[15] See Hayashi (2000, 218–221 and 232–234) or Ruud (2000, Chapter 22) for comprehensive presentations.


The C test is conducted in ivreg2 by specifying the orthog option and listing the instruments (either included or excluded) to be challenged. The equation must still be identified with these instruments either removed or reconsidered as endogenous if the C statistic is to be calculated. Note that if the unrestricted equation is exactly identified, the Hansen–Sargan statistic for the unrestricted equation will be zero, and the C statistic will coincide with the Hansen–Sargan statistic for the original (restricted) equation; this will be true irrespective of the instruments used to identify the unrestricted estimation. This illustrates how the Hansen–Sargan overidentification test is an "omnibus" test for the failure of any of the instruments to satisfy the orthogonality conditions, but at the same time requires that the investigator believe that at least some of the instruments are valid; see Ruud (2000, 577).
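The common-error-variance construction that keeps the C statistic nonnegative can be sketched as follows. This Python example (simulated data, illustrative names, not the ivreg2 code) challenges one excluded instrument, so the test has one degree of freedom:

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n = 1000

Zx = rng.normal(size=(n, 3))              # three excluded instruments
v = rng.normal(size=n)
x1 = Zx @ np.array([0.6, 0.6, 0.6]) + v
const = np.ones(n)
X = np.column_stack([x1, const])
y = x1 + 0.5 + rng.normal(size=n) + 0.5 * v

def iv_resid(Z):
    """2SLS residuals and projection matrix for instrument matrix Z."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    b = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
    return y - X @ b, Pz

Z_r = np.column_stack([Zx, const])        # restricted: full instrument set
Z_u = np.column_stack([Zx[:, :2], const]) # unrestricted: suspect instrument dropped

u_r, P_r = iv_resid(Z_r)
u_u, P_u = iv_resid(Z_u)

sigma2_r = u_r @ u_r / n                  # error variance from the restricted
sargan_r = (u_r @ P_r @ u_r) / sigma2_r   # (more efficient) regression ...
sargan_u = (u_u @ P_u @ u_u) / sigma2_r   # ... reused for the unrestricted one

C = sargan_r - sargan_u                   # df = 1 (one suspect instrument)
pval = math.erfc(math.sqrt(max(C, 0.0) / 2))   # chi-squared survival fn, 1 df
print(f"C = {C:.3f}, p-value = {pval:.3f}")
```

Because both Sargan statistics share the same \hat{\sigma}^2 and the unrestricted instrument set is nested in the restricted one, C is nonnegative by construction; the max(C, 0) guards only against rounding.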

4.5 Tests of overidentifying restrictions as Lagrange multiplier (score) tests

The Sargan test can be viewed as analogous to a Lagrange multiplier (LM) or score test.[16] In the case of OLS, the resemblance becomes exact. Consider the model Y = X_2\beta + u, for which the researcher wishes to test whether the additional variables Z_1 can be omitted; both X_2 and Z_1 are assumed to be exogenous. The LM test statistic of this hypothesis is obtained as n R^2_u from a regression of the OLS residuals \hat{u} on X_2 and Z_1. It is easy to see that this is in fact the same procedure used to obtain the Sargan statistic for the special case of no endogenous regressors: X = X_2 and Z = [Z_1 \; X_2]. This result carries over into GMM estimation using Cragg's HOLS: the J statistic for the HOLS estimator is a heteroskedasticity-robust LM-type test of the hypothesis that Z_1 can be omitted from the estimation.
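The OLS special case takes only a few lines to verify. A Python sketch (illustrative names and simulated data) of the LM statistic as n times the uncentered R^2 from regressing the OLS residuals on [X_2 Z_1]:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 800

const = np.ones(n)
x2 = rng.normal(size=n)
z1 = rng.normal(size=(n, 2))              # candidate omitted variables
X2 = np.column_stack([x2, const])
y = 2.0 * x2 + 1.0 + rng.normal(size=n)   # z1 truly omitted under the null

# OLS of y on X2, then regress the residuals on [X2 z1]
b, *_ = np.linalg.lstsq(X2, y, rcond=None)
uhat = y - X2 @ b

W = np.column_stack([X2, z1])             # Z = [Z1 X2] in the text's notation
b_aux, *_ = np.linalg.lstsq(W, uhat, rcond=None)
fitted = W @ b_aux
lm = n * (fitted @ fitted) / (uhat @ uhat)   # n * uncentered R^2, ~ chi2(2)
print(f"LM statistic = {lm:.3f}")
```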

When ivreg2 is used to generate OLS estimates, the Sargan statistic reported is an

LM test of the variables in the IV varlist. If the gmm option is chosen, HOLS estimates

are reported along with a robust LM statistic. As usual, the cluster option generates

a statistic that is robust to arbitrary intra-cluster correlation.

If the estimation method is OLS but the error is not homoskedastic, then the standard LM test is no longer valid. A heteroskedasticity-robust version is, however, available.[17] The robust LM statistic for OLS is numerically equivalent to the J statistic from feasible efficient two-step GMM, i.e., HOLS, a result that again does not seem to have been noted in the literature.

5 Testing for endogeneity of the regressors

There may well be reason to suspect nonorthogonality between regressors and errors, which can arise from several sources, after all, including classical errors-in-variables. Turning to IV or efficient GMM estimation for the sake of consistency must be balanced

[16] For a detailed discussion of the relationship between the different types of tests in a GMM framework, see Ruud (2000, Chapter 22).

[17] See Wooldridge (2002, 58–61) and Wooldridge (1995) for more detailed discussion.


against the inevitable loss of efficiency. As Wooldridge states, ". . . an important cost of performing IV estimation when x and u are uncorrelated: the asymptotic variance of the IV estimator is always larger, and sometimes much larger, than the asymptotic variance of the OLS estimator" (Wooldridge 2003, 490). Naturally, this loss of efficiency is a price worth paying if the OLS estimator is biased and inconsistent; thus, a test of the appropriateness of OLS, and the necessity to resort to instrumental variables or GMM methods, would be very useful. The intuition for such a test may also be couched in terms of the number of orthogonality conditions available. May all or some of the included endogenous regressors be appropriately treated as exogenous? If so, these restrictions can be added to the set of moment conditions, and more efficient estimation will be possible.

5.1 Durbin–Wu–Hausman tests for endogeneity in IV estimation

Many econometrics texts discuss the issue of “OLS versus IV” in the context of the

Durbin–Wu–Hausman (DWH) tests, which involve estimating the model via both OLS

and IV approaches and comparing the resulting coeﬃcient vectors. In the Hausman

form of the test, a quadratic form in the diﬀerences between the two coeﬃcient vectors,

scaled by the precision matrix, gives rise to a test statistic for the null hypothesis that

the OLS estimator is consistent and fully eﬃcient.

Denote by \hat{\beta}_c the estimator that is consistent under both the null and the alternative hypotheses, and by \hat{\beta}_e the estimator that is fully efficient under the null but inconsistent if the null is not true. The Hausman (1978) specification test takes the quadratic form

H = n (\hat{\beta}_c - \hat{\beta}_e)' D^{-} (\hat{\beta}_c - \hat{\beta}_e)   where   D = \left[ V(\hat{\beta}_c) - V(\hat{\beta}_e) \right]          (45)

where V(\hat{\beta}) denotes a consistent estimate of the asymptotic variance of \beta, and the operator ^{-} denotes a generalized inverse.

where V (

´

β) denotes a consistent estimate of the asymptotic variance of β, and the

operator

−

denotes a generalized inverse.

A Hausman statistic for a test of endogeneity in an IV regression is formed by choosing OLS as the efficient estimator \hat{\beta}_e and IV as the inefficient but consistent estimator \hat{\beta}_c. The test statistic is distributed as \chi^2 with K_1 degrees of freedom, this being the number of regressors being tested for endogeneity. The test is perhaps best interpreted not as a test for the endogeneity or exogeneity of regressors per se, but rather as a test of the consequence of employing different estimation methods on the same equation. Under the null hypothesis that OLS is an appropriate estimation technique, only efficiency should be lost by turning to IV; the point estimates should be qualitatively unaffected.

The Hausman statistic comes in several flavors, depending on which estimates of the asymptotic variances are used. An obvious possibility would be to use V(\hat{\beta}_{IV}) and V(\hat{\beta}_{OLS}) as generated by standard IV and OLS estimation; this would be the result if Stata's hausman command were used without any options. This is actually rarely done because, although asymptotically valid, it has the drawback of possibly generating a negative Hausman statistic in finite samples.[18] Avoiding this problem is straightforward, however. Recall that the standard asymptotic covariances for IV and OLS are

V (

´

β

IV

) = ´ σ

2

IV

(X

′

P

Z

X)

−1

V (

´

β

OLS

) = ´ σ

2

OLS

(X

′

X)

−1

(46)

Under the null, both the IV and the OLS estimates of the error variance are consistent estimators of \sigma^2, and either can be used to form the Hausman statistic. If a common estimate of \sigma^2 is used, then the generalized inverse of D is guaranteed to exist, and a positive test statistic is guaranteed.[19]

If the Hausman statistic is formed using the OLS estimate of the error variance, then the D matrix in (45) becomes

D = \hat{\sigma}^2_{OLS} \left[ (X'P_Z X)^{-1} - (X'X)^{-1} \right]          (47)

This version of the endogeneity test was first proposed by Durbin (1954) and separately by Wu (1973) (his T_4 statistic) and Hausman (1978). It can be obtained within Stata by using hausman with the sigmamore option in conjunction with estimation by regress, ivreg, and/or ivreg2.

If the Hausman statistic is formed using the IV estimate of the error variance, then the D matrix becomes

D = \hat{\sigma}^2_{IV} \left[ (X'P_Z X)^{-1} - (X'X)^{-1} \right]          (48)

This version of the statistic was proposed separately by Wu (1973) (his T_3 statistic) and Hausman (1978). It can be obtained within Stata by using hausman with the (undocumented) sigmaless option.

Use of hausman with the sigmamore or sigmaless options avoids the additional annoyance that, because Stata's hausman tries to deduce the correct degrees of freedom for the test from the rank of the matrix D, it may sometimes come up with the wrong answer. It will correctly report K_1 degrees of freedom for the test if a common estimate of the error variance is used, i.e., in either the Durbin (47) or Wu T_3 (48) forms of the statistic,[20] but not if both V(\hat{\beta}_{IV}) and V(\hat{\beta}_{OLS}) are used to form D. What will happen in this case is that hausman will report the correct \chi^2 statistic, but with degrees of freedom equal to K rather than K_1, and the user will have to calculate the correct p-value by hand.

[18] Readers should also bear in mind here and below that the estimates of the error variances may or may not have small-sample corrections, according to the estimation package used and the options chosen. If one of the variance–covariance matrices in D uses a small-sample correction, then so should the other.

[19] The matrix difference in (47) and (48) has rank K_1; see Greene (2000, 384–385). Intuitively, the variables being tested are those not shared by X and Z, namely the K_1 endogenous regressors X_1. The Hausman statistic for the endogeneity test can also be expressed in terms of a test of the coefficients of the endogenous regressors alone, with the rest of the \betas removed. In this alternate form, the matrix difference in the expression equivalent to (47) is positive definite and a generalized inverse is not required. See Bowden and Turkington (1984, 50–51).

[20] This works in the former two cases because the matrix difference in (47) and (48) has rank K_1; see note 19 above.


Although these different flavors of the DWH endogeneity test are asymptotically equivalent, they will differ numerically, and may perform differently in finite samples. Given the choice between forming the Hausman statistic using either \hat{\sigma}^2_{OLS} or \hat{\sigma}^2_{IV}, the standard choice is the former (the Durbin statistic) because under the null, both are consistent but the former is more efficient. The Durbin flavor of the test has the additional advantage of superior performance when instruments are weak (Staiger and Stock 1997).
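The Durbin flavor, i.e., the statistic in (45) with D as in (47), can be sketched in Python for a single endogenous regressor (K_1 = 1). All data and names are illustrative; the n in (45) cancels when finite-sample covariance matrices are used, as here:

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n = 1000

z1 = rng.normal(size=n)
v = rng.normal(size=n)
u = rng.normal(size=n) + 0.7 * v          # endogeneity: u correlated with x1
x1 = 0.8 * z1 + v
const = np.ones(n)
X = np.column_stack([x1, const])
Z = np.column_stack([z1, const])
y = x1 + 0.5 + u

# OLS and IV (2SLS) estimates
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

# Durbin statistic: common error variance from OLS, as in (47)
u_ols = y - X @ b_ols
sigma2_ols = u_ols @ u_ols / n
D = sigma2_ols * (np.linalg.inv(X.T @ Pz @ X) - np.linalg.inv(X.T @ X))
d = b_iv - b_ols
H = d @ np.linalg.pinv(D) @ d             # generalized inverse; rank(D) = K1 = 1
pval = math.erfc(math.sqrt(max(H, 0.0) / 2))   # chi-squared survival fn, 1 df
print(f"Durbin-Wu-Hausman = {H:.3f}, p-value = {pval:.4f}")
```

Because (X'P_Z X)^{-1} - (X'X)^{-1} is positive semidefinite, the statistic is nonnegative by construction; with the strong endogeneity built into this simulation, the test rejects decisively.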

5.2 Extensions: Testing a subset of the regressors for endogeneity,

and heteroskedastic-robust testing for IV and GMM estimation

In some contexts, the researcher may be certain that one or more regressors in X_1 are endogenous but may question the endogeneity of the others. In such a context, the DWH tests above are easily modified to apply to a subset of the endogenous regressors.

Consider dividing the set of endogenous regressors into two subsets, X_{1A} and X_{1B}, where only the second set of variables is to be tested for endogeneity. In the tests using the Hausman statistic formulation (45), the less efficient but consistent estimator \hat{\beta}_c remains the IV estimator \hat{\beta}_{IV}, but the fully efficient estimator is now the IV estimator \hat{\beta}_{IVB} from the regression in which X_{1A} is still treated as endogenous but X_{1B} is treated as exogenous. A positive test statistic can again be guaranteed if the estimate of the error variance \hat{\sigma}^2 used in the matrix D is from either of the two IV estimations, since both are consistent under the null. Again, use of the \hat{\sigma}^2 from the more efficient estimation is traditional.

The Hausman statistic framework of (45) for tests of the endogeneity of regressors

is available both for IV estimation with robust standard errors and for eﬃcient GMM

estimation. The procedure is essentially the same as in the standard IV versus OLS case

discussed above: estimate the equation twice, once with the regressors being tested as

exogenous (the more eﬃcient estimator) and once with the same regressors treated as

endogenous (the less eﬃcient but consistent estimator), and form the Hausman statistic

using the estimated coeﬃcients and (robust) covariance matrices.

If Stata's hausman command is used to form the statistic this way, the mildly annoying problem of a possibly negative Hausman statistic can arise, and furthermore, hausman will report the correct statistic but with the wrong degrees of freedom (K instead of the correct K_1). The way to guarantee a nonnegative test statistic is the same method used with the C test: the equivalent of the sigmamore option of hausman would be to use the \hat{S} matrix from the more efficient estimation to form the covariance matrix for the less efficient but consistent estimation as well; see Section 4.4. Unfortunately, this feature is not available with hausman,[21] nor can it easily be computed by hand, but it is available via the orthog option of ivreg2, as we shall see at the very end of this section.

21

Users beware: the sigmamore option following a robust estimation will not only fail to accomplish

this, but it will also generate an invalid test statistic.


5.3 Durbin–Wu–Hausman tests as GMM tests of orthogonality conditions

Readers at this point may be wondering about the relationship between the GMM tests of orthogonality conditions implemented by the Hansen–Sargan, Basmann, and C tests as discussed in Sections 4.2–4.4, and the Durbin–Wu–Hausman tests. The potential resemblance is even closer once we note that the application of the Hausman test is not limited to testing the endogeneity of regressors.

A Hausman test, like the C test, can be used to test a variety of combinations of the orthogonality conditions, not only those involving regressors but those involving excluded instruments as well. Denote by Le and Lc the number of total instruments in, respectively, the restricted (efficient) and the unrestricted (consistent but inefficient) regressions. Le − Lc is therefore the number of orthogonality conditions being tested. Also denote by K1c the number of endogenous regressors in the unrestricted regression. Then it can be shown that under conditional homoskedasticity, the Hausman statistic based on IV or GMM estimates, β̂e and β̂c, respectively, will be distributed as χ² with degrees of freedom = Min[Le − Lc, K1c]. In the conditional heteroskedasticity case, the degrees of freedom will be Le − Lc if Le − Lc ≤ K1c, but unknown otherwise (making the test impractical).²²
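The degrees-of-freedom rule just stated is easy to encode; here is a hypothetical Python helper (the function and argument names are ours, not from the paper):

```python
# Hypothetical helper encoding the degrees-of-freedom rule above: L_e and L_c are
# the instrument counts in the efficient and consistent regressions, and K1_c is
# the number of endogenous regressors in the unrestricted regression.
def hausman_df(L_e, L_c, K1_c, homoskedastic=True):
    tested = L_e - L_c              # number of orthogonality conditions tested
    if homoskedastic:
        return min(tested, K1_c)
    # under conditional heteroskedasticity the df are known only if tested <= K1_c
    return tested if tested <= K1_c else None

print(hausman_df(6, 5, 2), hausman_df(8, 5, 2, homoskedastic=False))
# -> 1 None
```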

What, then, is the difference between the GMM C test and the Hausman specification test? In fact, because the two estimators being tested are both GMM estimators, the Hausman specification test is a test of linear combinations of orthogonality conditions (Ruud 2000, 578–584). When the particular linear combination of orthogonality conditions being tested is the same for the C test and for the Hausman test, the two test statistics will be numerically equivalent. We can state this more precisely as follows: If Le − Lc ≤ K1c, the C statistic and the Hausman statistic are numerically equivalent.²³ If Le − Lc > K1c, the two statistics will be numerically different, the C statistic will have Le − Lc degrees of freedom, and the Hausman statistic will have K1c degrees of freedom in the conditional homoskedasticity case (and an unknown number of degrees of freedom in the conditional heteroskedasticity case).

One commonly encountered case in which the two statistics are exactly equivalent is in fact the one with which we began our discussion of DWH tests, namely when we want to test the endogeneity of regressors. An example of when the two test statistics differ would arise when the investigator has suspicions about a large number of excluded instruments. In this case, the number of instruments being tested, Le − Lc, may be larger than the K1c endogenous regressors in the less efficient estimation.

The intuition behind the circumstances in which the two statistics will differ follows from what is being tested. The Hausman test is a vector-of-contrasts test that detects changes in the coefficients of the regressors treated as endogenous in the consistent but inefficient specifications. When the number of moment conditions being tested is larger than the number of endogenous regressors that will be affected by them, the Hausman test will have fewer degrees of freedom than the C test. This means an investigator faces a trade-off when deciding which of the two tests to use: when the two tests differ, the Hausman test is a test of linear combinations of moment conditions, and is more powerful than the C test at detecting violations of restrictions on these linear combinations, but the latter test will be able to detect other violations of moment conditions that the former test cannot. As Ruud (2000, 585) points out, one of the appealing features of the Hausman test is that its particular linear combination of moment conditions also determines the consistency of the more efficient GMM estimator.

²² See Hausman and Taylor (1981) and Newey (1985), summarized by Hayashi (2000, 233–34).

²³ We also need to assume, of course, that the two tests use the same estimate of the error variance, σ̂², or the same Ŝ matrix.

There is an interesting semantic issue here: is there a difference between an "endogeneity test" and a test of "exogeneity" or "orthogonality"? The answer is, in the IV context, "not really". The DWH endogeneity tests are usually presented in textbooks as tests of "endogeneity", and the Hansen–Sargan–Basmann C tests are usually presented as tests of the "validity" or "exogeneity" of instruments—and we have adopted these conventions here—but they are all really just tests of orthogonality conditions. The reason for the different terminology relates, instead, to the circumstances in which the researcher is operating and in particular his/her starting point. Say we start with an IV estimation in which two regressors x1A and x1B are treated as endogenous and there are five excluded instruments. We suspect that we do not need to be instrumenting x1B, and so we employ the Hausman form of the DWH endogeneity test to see whether or not we can increase our set of orthogonality conditions from 5 to 6. Now consider a second researcher whose priors are somewhat less conservative; s/he starts with a specification in which x1A is still treated as endogenous but x1B is exogenous. S/he does, however, have the same suspicions about x1B, and so s/he employs a C test of its orthogonality to see whether or not s/he needs to reduce the set of orthogonality conditions from 6 to 5. The two tests are numerically the same, and are testing the same hypothesis—the exogeneity of x1B—and the only difference is the starting point of the researchers.

5.4 DWH endogeneity tests in practice

There are a variety of ways of conducting a DWH endogeneity test in Stata for the standard IV case with conditional homoskedasticity. Three equivalent ways of obtaining the Durbin flavor of the Durbin–Wu–Hausman statistic (47) are

1. Fit the less efficient but consistent model using IV, followed by the command hausman, save. Then, fit the fully efficient model by OLS (or by IV if only a subset of regressors is being tested for endogeneity), followed by hausman, sigmamore.

2. Fit the fully efficient model using ivreg2, specifying the regressors to be tested in the orthog option.

3. Fit the less efficient but consistent model using ivreg, then use ivendog to conduct an endogeneity test. This program will take as its argument a varlist consisting of the subset of regressors to be tested for endogeneity; if the varlist is empty, the full set of endogenous regressors is tested.


The latter two methods are of course more convenient than the first, as the test can be done in one step.

Yet another asymptotically equivalent flavor of the DWH test is available for standard IV estimation under conditional homoskedasticity, and is included in the output of ivendog. This is the test statistic introduced by Wu (1973) (his T2), and separately shown by Hausman (1978) to be calculated straightforwardly through the use of auxiliary regressions. We will refer to it as the Wu–Hausman statistic.²⁴

Consider a simplified version of our basic model (1) with a single endogenous regressor x1,

y = β1 x1 + X2 β2 + u  (49)

with X2 ≡ Z2 assumed exogenous (including the constant, if one is specified) and with excluded instruments Z1 as usual. The auxiliary regression approach involves estimating the reduced-form (first-stage) regression for x1:

x1 = Z1 Γ1 + X2 Γ2 + v = ZΓ + v  (50)

We are concerned with testing that x1 ⊥ u. Since by assumption each z in Z is uncorrelated with u, the first-stage regression implies that this condition is equivalent to a test of v ⊥ u. Exogeneity of the z's implies that v̂—the residuals from OLS estimation of the first-stage regression (50)—will be a consistent estimator of v. Thus, we augment (49) with v̂ and re-estimate it with OLS. A t-test of the significance of v̂ in this auxiliary regression is then a direct test of the null hypothesis—in this context, that θ = 0:

y = β1 x1 + X2 β2 + θv̂ + ε  (51)
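A small simulation makes the mechanics concrete. This is a hedged Python sketch of the auxiliary-regression test; the data-generating process is our own invention, not from the paper:

```python
import numpy as np

# Illustrative simulation of the auxiliary-regression DWH test: x1 is endogenous
# because its first-stage error v shares a component with the structural error u.
rng = np.random.default_rng(0)
n = 5000
z1 = rng.normal(size=n)                 # excluded instrument
common = rng.normal(size=n)
u = common + rng.normal(size=n)         # structural error
v = common + rng.normal(size=n)         # first-stage error, correlated with u
x1 = 0.8 * z1 + v
y = 1.0 * x1 + u

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), z1])   # first-stage regressors (constant included)
v_hat = x1 - Z @ ols(Z, x1)             # first-stage OLS residuals

# Augmented regression y = b0 + b1*x1 + theta*v_hat + eps; test theta = 0
W = np.column_stack([np.ones(n), x1, v_hat])
b = ols(W, y)
resid = y - W @ b
s2 = resid @ resid / (n - W.shape[1])
se_theta = np.sqrt(s2 * np.linalg.inv(W.T @ W)[2, 2])
t_theta = b[2] / se_theta
print(t_theta > 2)   # endogeneity should be detected in this design
```

Note that once v̂ is included, the coefficient on x1 in the augmented regression is also purged of the endogeneity bias, which is another way of seeing why a significant θ signals that OLS on (49) is inconsistent.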

The Wu–Hausman test may be readily generalized to multiple endogenous variables, since it merely requires the estimation of the first-stage regression for each of the endogenous variables, and augmentation of the original model with their residual series. The test statistic then becomes an F test, with numerator degrees of freedom equal to the number of included endogenous variables. One advantage of the Wu–Hausman F statistic over the other DWH tests for IV versus OLS is that with certain normality assumptions, it is a finite-sample test exactly distributed as F (see Wu (1973) and Nakamura and Nakamura (1981)). Wu's (1974) Monte Carlo studies also suggest that this statistic is to be preferred to the statistic using just σ̂²IV.

A version of the Wu–Hausman statistic for testing a subset of regressors is also available, as Davidson and MacKinnon (1993, 241–242) point out. The modified test involves estimating the first-stage regression for each of the K1B variables in X1B in order to generate a residual series. These residual series V̂B are then used to augment the original model,

y = X1A δ + X1B λ + X2 β + V̂B Θ + ε  (52)

which is then estimated via instrumental variables, with only X1A specified as included endogenous variables. The test for endogeneity of the variables in X1B is then a test for the joint significance of the Θ estimates; rejection of that null hypothesis implies that instruments must be provided for the X1B variables.

²⁴ A more detailed presentation of the test can be found in Davidson and MacKinnon (1993, 237–242).

An inconvenient complication here is that an ordinary F test for the significance of Θ in this auxiliary regression will not be valid, because the unrestricted sum of squares needed for the denominator is wrong, and obtaining the correct SSR requires further steps; see Davidson and MacKinnon (1993, Chapter 7). Only in the special case where the efficient estimator is OLS will an ordinary F test yield the correct test statistic. The auxiliary regression approach to obtaining the Wu–Hausman statistic described above has the further disadvantage of being computationally expensive and practically cumbersome when there are more than a few endogenous variables to be tested, because a residual series must be constructed separately for every endogenous variable being tested.

We have taken a different and simpler approach to programming the Wu–Hausman statistic in ivendog. The Durbin flavor of the Durbin–Wu–Hausman statistic (47) can be written as

Durbin DWH: χ²(K1B) = Q* / (USSR/n)  (53)

and the Wu–Hausman F statistic can be written as

Wu–Hausman: F(K1B, n − K − K1B) = (Q*/K1B) / {(USSR − Q*)/(n − K − K1B)}  (54)

where Q* is the difference between the restricted and unrestricted sums of squares given by the auxiliary regression (51) or (52), and USSR is the sum of squared residuals from the efficient estimate of the model.²⁵ From the discussion in the preceding section, however, we know that for tests of the endogeneity of regressors, the C statistic and the Hausman form of the DWH test are numerically equal, and when the error variance from the more efficient estimation is used, the Hausman form of the DWH test is the Durbin flavor. We also know from the discussion in Sections 4.3 and 4.4 that the C statistic is simply the difference of two Sargan statistics, one for the unrestricted/consistent estimation and one for the restricted/efficient estimation, and we can use the estimate of the error variance from the more efficient estimation for both. Finally, we can see from (53) and (54) that the Wu–Hausman F statistic can be easily calculated from the same quantities needed for the DWH statistic.

This means that the Wu–Hausman F statistic in (54) does not need to be calculated using the traditional auxiliary regression method, with all the first-stage regressions and generation of residual series as described above. Instead, it can be calculated using only three additional regressions: one to estimate the restricted/efficient model, and two artificial regressions to obtain the two Sargan statistics. More precisely, we can write

²⁵ See Wu (1973) or Nakamura and Nakamura (1981). Q* can also be interpreted as the difference between the sums of squares of the second-stage estimation of the efficient model with and without the residual series, and with the fitted values for the variables X1A maintained as endogenous. If the efficient model is OLS, then the second-stage estimation is simply OLS augmented by the residual series.


Durbin DWH: χ²(K1B) = (û′e PZ,X1B ûe − û′c PZ ûc) / (û′e ûe / n)  (55)

W–H: F(K1B, n − K − K1B) = {(û′e PZ,X1B ûe − û′c PZ ûc)/K1B} / [{û′e ûe − (û′e PZ,X1B ûe − û′c PZ ûc)}/(n − K − K1B)]  (56)

where ûe and ûc refer to the residuals from the restricted/efficient and the unrestricted/consistent estimations, respectively, and PZ,X1B is the projection matrix of the instruments Z augmented by the regressors X1B whose endogeneity is being tested.
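To make (55) and (56) concrete, here is a hedged Python sketch on simulated data, with one regressor tested for endogeneity (so X1B = x1 and K1B = 1) and the consistent model exactly identified; the setup is our own, not the paper's:

```python
import numpy as np

# Sketch of the Durbin DWH chi-squared (55) and Wu-Hausman F (56) on simulated
# data: efficient model = OLS of y on [const, x1]; consistent model = IV using z1.
rng = np.random.default_rng(1)
n = 2000
z1 = rng.normal(size=n)
common = rng.normal(size=n)
x1 = 0.8 * z1 + common + rng.normal(size=n)
y = x1 + common + rng.normal(size=n)          # x1 endogenous via 'common'

def proj(A):
    # projection matrix onto the column space of A
    return A @ np.linalg.pinv(A)

const = np.ones(n)
Z = np.column_stack([const, z1])              # instruments of the consistent model
ZX = np.column_stack([const, z1, x1])         # Z augmented by the tested regressor

X = np.column_stack([const, x1])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
u_e = y - X @ b_ols                           # efficient (OLS) residuals
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)      # exactly identified IV
u_c = y - X @ b_iv                            # consistent (IV) residuals

K, K1B = X.shape[1], 1
Qstar = u_e @ proj(ZX) @ u_e - u_c @ proj(Z) @ u_c
chi2 = Qstar / (u_e @ u_e / n)                             # equation (55)
F = (Qstar / K1B) / ((u_e @ u_e - Qstar) / (n - K - K1B))  # equation (56)
print(chi2 > 3.84, F > 3.84)   # both reject exogeneity of x1 in this design
```

Because the consistent model here is exactly identified, the term û′c PZ ûc is numerically zero, which anticipates the special case discussed next.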

A special case worth noting is when the unrestricted/consistent estimation is exactly identified. In that case, the Sargan statistic for that equation is zero, and hence û′c PZ ûc = 0. It is easy to see from the above that the Durbin flavor of the Durbin–Wu–Hausman χ² test statistic becomes identical to the Sargan statistic (43) for the restricted/efficient estimation, and the Wu–Hausman F statistic becomes identical to Basmann's F statistic (44).²⁶

Whereas we have available a large smorgasbord of alternative but asymptotically equivalent tests of endogeneity in the case of IV estimation under conditional homoskedasticity, there is much less choice when estimating either IV with a robust covariance matrix or efficient GMM. As noted above, the use of hausman to test regressors for endogeneity in the context of heteroskedasticity-robust or efficient GMM estimation will sometimes generate negative test statistics, and the degrees of freedom printed out for the statistic will be wrong. If Le − Lc > K1c, there is the additional problem that the degrees of freedom of the Hausman statistic are unknown. All these problems can be avoided and a valid endogeneity test statistic obtained simply through the use of the C statistic: estimate the restricted/fully efficient model with ivreg2, specifying the regressors to test for endogeneity in the orthog option.

6 Syntax diagrams

ivreg2 depvar [varlist1] (varlist2=varlist_iv) [weight] [if exp] [in range]
        [, gmm orthog(#) small level(#) hascons noconstant robust
        cluster(varname) first ffirst noheader nofooter eform(string)
        depname(varname) mse1 plus]

ivhettest [varlist] [, ivlev ivsq fitlev fitsq ph phnorm nr2 bpg all]

overid [, chi2 dfr f all]

ivendog [varlist]

²⁶ This is another way of illustrating that the estimate of the error variance used in Basmann's statistic comes from an estimation without any overidentifying restrictions being imposed; see the discussion of (44) above.


For a description of the options accepted by these commands, type help ivreg2, help ivhettest, help overid, or help ivendog.

7 Acknowledgments

We are grateful to Jeffrey Wooldridge, Fumio Hayashi, Barbara Sianesi, Arthur Lewbel, and seminar participants at Heriot–Watt University for their constructive suggestions. We also thank an associate editor of the Stata Journal, and members of the Stata user community who helped to test the software. Any remaining errors are our own.

8 References

Ahn, S. C. 1995. Robust GMM Tests for Model Specification. Arizona State University (Working Paper).

Arellano, M. 1987. Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics 49: 431–434.

Basmann, R. 1960. On finite sample distributions of generalized classical linear identifiability test statistics. Journal of the American Statistical Association 55(292): 650–659.

Bound, J., D. Jaeger, and R. Baker. 1995. Problems with instrumental variable estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association 90: 443–450.

Bowden, R. J. and D. A. Turkington. 1984. Instrumental Variables. Cambridge: Cambridge University Press.

Breusch, T. S. and A. R. Pagan. 1979. A simple test for heteroskedasticity and random coefficient variation. Econometrica 47: 1287–1294.

Chamberlain, G. 1982. Multivariate regression models for panel data. Journal of Econometrics 18: 5–46.

Cook, R. D. and S. Weisberg. 1983. Diagnostics for heteroscedasticity in regression. Biometrika 70: 1–10.

Cragg, J. 1983. More efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica 51: 751–763.

Cumby, R. E., J. Huizinga, and M. Obstfeld. 1983. Two-step two-stage least squares estimation in models with rational expectations. Journal of Econometrics 21: 333–355.

Davidson, R. and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. 2d ed. New York: Oxford University Press.

Durbin, J. 1954. Errors in variables. Review of the International Statistical Institute 22: 23–32.

Eichenbaum, M. S., L. P. Hansen, and K. J. Singleton. 1988. A time series analysis of representative agent models of consumption and leisure. Quarterly Journal of Economics 103(1): 51–78.

Godfrey, L. G. 1978. Testing for multiplicative heteroskedasticity. Journal of Econometrics 8: 227–236.

—. 1988. Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches. Cambridge: Cambridge University Press.

—. 1999. Instrument relevance in multivariate linear models. Review of Economics & Statistics 81(3): 550–552.

Greene, W. 2000. Econometric Analysis. 4th ed. Upper Saddle River, NJ: Prentice–Hall.

Hahn, J. and J. Hausman. 2002a. A new specification test for the validity of instrumental variables. Econometrica 70(1): 163–89.

—. 2002b. Notes on bias in estimators for simultaneous equation models. Economics Letters 75(2): 237–41.

Hansen, B. E. 2000. Econometrics. Manuscript, Madison, WI. http://www.ssc.wisc.edu/~bhansen/notes/notes.htm.

Hansen, L. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50(3): 1029–1054.

Hansen, L., J. Heaton, and A. Yaron. 1996. Finite sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14(3): 262–280.

Hausman, J. 1978. Specification tests in econometrics. Econometrica 46(6): 1251–1271.

Hausman, J. A. and W. E. Taylor. 1981. A generalized specification test. Economics Letters 8: 239–245.

Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.

Hoxby, C. and M. D. Paserman. 1998. Overidentification tests with grouped data. National Bureau of Economic Research (Technical Paper 223).

Kézdi, G. 2002. The Economic Returns to Education: Finite-Sample Properties of an IV Estimator. University of Michigan (Working Paper).

Koenker, R. 1981. A note on Studentizing a test for heteroskedasticity. Journal of Econometrics 17: 107–112.

Moulton, B. R. 1986. Random group effects and the precision of regression estimates. Journal of Econometrics 32: 385–397.

Nakamura, A. and M. Nakamura. 1981. On the relationships among several specification error tests presented by Durbin, Wu, and Hausman. Econometrica 49(6): 1583–1588.

Newey, W. 1985. Generalized method of moments specification testing. Journal of Econometrics 29: 229–256.

Pagan, A. R. and D. Hall. 1983. Diagnostic tests as residual analysis. Econometric Reviews 2(2): 159–218.

Pesaran, M. H. and L. W. Taylor. 1999. Diagnostics for IV regressions. Oxford Bulletin of Economics & Statistics 61(2): 255–281.

Ruud, P. A. 2000. An Introduction to Classical Econometric Theory. Oxford: Oxford University Press.

Sargan, J. 1958. The estimation of economic relationships using instrumental variables. Econometrica 26(3): 393–415.

Shea, J. 1997. Instrument relevance in multivariate linear models: A simple measure. Review of Economics & Statistics 79(2): 348–352.

Staiger, D. and J. H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65(3): 557–586.

White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–838.

—. 1982. Instrumental variables regression with independent observations. Econometrica 50(2): 483–499.

—. 1984. Asymptotic Theory for Econometricians. Orlando, FL: Academic Press.

Wooldridge, J. M. 1995. Score diagnostics for linear models estimated by two stage least squares. In Advances in Econometrics and Quantitative Economics: Essays in honor of Professor C. R. Rao, eds. G. S. Maddala, P. C. B. Phillips, and T. N. Srinivasan, 66–87. Cambridge, MA: Blackwell Publishers.

—. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

—. 2003. Introductory Econometrics: A Modern Approach. 2d ed. New York: South-Western College Publishing.

Wu, D.-M. 1973. Alternative tests of independence between stochastic regressors and disturbances. Econometrica 41(4): 733–750.

—. 1974. Alternative tests of independence between stochastic regressors and disturbances: Finite sample results. Econometrica 42(3): 529–546.


About the Authors

Christopher F. Baum is an associate professor of economics at Boston College. He is an associate editor of Computational Economics and The Stata Journal, and serves on the Advisory Council of the Society for Computational Economics. Baum founded and manages the Boston College Statistical Software Components (SSC) archive at RePEc (http://repec.org). His recent research has focused on the effects of uncertainty on international trade flows, bank lending behavior, and firms' cash holdings.

Mark E. Schaffer is professor of economics and Director of the Centre for Economic Reform and Transformation (CERT) at Heriot–Watt University, Edinburgh, Scotland. He is also a Research Fellow at the Centre for Economic Policy Research (CEPR), the Institute for the Study of Labor (IZA), and the William Davidson Institute. His research interests include various aspects of firm and household behavior in the transition countries of Eastern Europe, the former USSR, and East Asia.

Steven Stillman is a Senior Economic Researcher in the Labour Market Policy Group of the New Zealand Department of Labour. He is also an affiliated Research Fellow at the Institute for the Study of Labor (IZA) and the William Davidson Institute. His current research examines the effect of public policy and institutions on various dynamic aspects of household well-being in New Zealand, Russia, and the United States.


discussion of intra-group correlation or "clustering". If the error terms in the regression are correlated within groups, but not correlated across groups, then the consequences for IV estimation are similar to those of heteroskedasticity: the IV coefficient estimates are consistent, but their standard errors and the usual forms of the diagnostic tests are not. We discuss how clustering can be interpreted in the GMM context and how it can be dealt with in Stata to make efficient estimation, valid inference, and diagnostic testing possible.

Efficient GMM brings with it the advantage of consistency in the presence of arbitrary heteroskedasticity, but at a cost of possibly poor finite sample performance. If heteroskedasticity is in fact not present, then standard IV may be preferable. The usual Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker tests for the presence of heteroskedasticity in a regression equation can be applied to an IV regression only under restrictive assumptions. In Section 3, we discuss the test of Pagan and Hall (1983) designed specifically for detecting the presence of heteroskedasticity in IV estimation, and its relationship to these other heteroskedasticity tests.

Even when IV or GMM is judged to be the appropriate estimation technique, we may still question its validity in a given application: are our instruments "good instruments"? This is the question we address in Section 4. "Good instruments" should be both relevant and valid: correlated with the endogenous regressors and at the same time orthogonal to the errors. Correlation with the endogenous regressors can be assessed by an examination of the significance of the excluded instruments in the first-stage IV regressions. We may cast some light on whether the instruments satisfy the orthogonality conditions in the context of an overidentified model: that is, one in which a surfeit of instruments are available. In that context, we may test the overidentifying restrictions in order to provide some evidence of the instruments' validity. We present the variants of this test due to Sargan (1958), Basmann (1960), and, in the GMM context, Hansen (1982), and show how the generalization of this test, the C or "difference-in-Sargan" test, can be used to test the validity of subsets of the instruments.

Although there may well be reason to suspect nonorthogonality between regressors and errors, the use of IV estimation to address this problem must be balanced against the inevitable loss of efficiency vis-à-vis OLS. It is therefore very useful to have a test of whether or not OLS is inconsistent and IV or GMM is required. This is the Durbin–Wu–Hausman (DWH) test of the endogeneity of regressors. In Section 5, we discuss how to implement variants of the DWH test, and how the test can be generalized to test the endogeneity of subsets of regressors. We then show how the Hausman form of the test can be applied in the GMM context, how it can be interpreted as a GMM test, when it will be identical to the Hansen/Sargan/C test statistic, and when the two test statistics will differ.

We have written four Stata commands—ivreg2, ivhettest, overid, and ivendog—that, together with Stata's built-in commands, allow the user to implement all of the above estimators and diagnostic tests. The syntax diagrams for these commands are presented in the last section of the paper, and the electronic supplement presents annotated examples of their use.


2 IV and GMM estimation

The "Generalized Method of Moments" was introduced by L. Hansen in his celebrated 1982 paper. There are a number of good modern texts that cover GMM, and one recent prominent text, Hayashi (2000), presents virtually all the estimation techniques discussed in the GMM framework. A concise online text that covers GMM is Hansen (2000). The exposition below draws on Hansen (2000, Chapter 11); Hayashi (2000, Chapter 3); Wooldridge (2002, Chapter 8); Davidson and MacKinnon (1993); and Greene (2000). We begin with the standard IV estimator, and then relate it to the GMM framework. We then consider the issue of clustered errors, and finally turn to OLS.

2.1 The method of instrumental variables

The equation to be estimated is, in matrix notation,

y = Xβ + u,  E(uu′) = Ω  (1)

with typical row

yi = Xi β + ui  (2)

The matrix of regressors X is n × K, where n is the number of observations. The error term u is distributed with mean zero and the covariance matrix Ω is n × n. Three special cases for Ω that we will consider are:

Homoskedasticity: Ω = σ² I  (3)

Heteroskedasticity: Ω = diag(σ1², …, σi², …, σn²)  (4)

Clustering: Ω = diag(Σ1, …, Σm, …, ΣM)  (5)

where Σm indicates an intra-cluster covariance matrix. For cluster m with t observations, Σm will be t × t. Zero covariance between observations in the M different clusters gives the covariance matrix Ω, in this case, a block-diagonal form.
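As a toy sketch of this block-diagonal structure (Python, with invented numbers; the helper function is ours):

```python
import numpy as np

# Toy illustration of the clustered covariance structure (5): Omega is block
# diagonal, with one intra-cluster covariance block per cluster (sizes may differ).
def block_diag(blocks):
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        t = b.shape[0]
        out[i:i + t, i:i + t] = b
        i += t
    return out

sigma1 = np.array([[1.0, 0.3], [0.3, 1.0]])            # cluster 1: t = 2
sigma2 = np.array([[2.0, 0.5, 0.5],
                   [0.5, 2.0, 0.5],
                   [0.5, 0.5, 2.0]])                   # cluster 2: t = 3
omega = block_diag([sigma1, sigma2])
print(omega.shape, omega[0, 3] == 0.0)  # zero covariance across clusters
```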

Some of the regressors are endogenous, so that E(Xi ui) ≠ 0. We partition the set of regressors into [X1 X2], with the K1 regressors X1 assumed under the null to be endogenous, and the (K − K1) remaining regressors X2 assumed exogenous.

The set of instrumental variables is Z and is n × L; this is the full set of variables that are assumed to be exogenous, i.e., E(Zi ui) = 0. We partition the instruments into [Z1 Z2], where the L1 instruments Z1 are excluded instruments, and the remaining (L − L1) instruments Z2 ≡ X2 are the included instruments/exogenous regressors:

Regressors X = [X1 X2] = [X1 Z2] = [Endogenous Exogenous]  (6)

Instruments Z = [Z1 Z2] = [Excluded Included]  (7)

The order condition for identification of the equation is L ≥ K; there must be at least as many excluded instruments as there are endogenous regressors. If L = K, the equation is said to be "exactly identified"; if L > K, the equation is "overidentified".

Denote by PZ the projection matrix Z(Z′Z)−1Z′. The instrumental variables estimator of β is

β̂IV = {X′Z(Z′Z)−1Z′X}−1 X′Z(Z′Z)−1Z′y = (X′PZ X)−1 X′PZ y  (8)

This estimator goes under a variety of names: the instrumental variables (IV) estimator, the generalized instrumental variables estimator (GIVE), or the two-stage least-squares (2SLS) estimator, the last reflecting the fact that the estimator can be calculated in a two-step procedure. We follow Davidson and MacKinnon (1993, 220) and refer to it as the IV estimator rather than 2SLS because the basic idea of instrumenting is central, and because it can be (and in Stata, is more naturally) calculated in one step as well as in two.

The asymptotic distribution of the IV estimator under the assumption of conditional homoskedasticity (3) can be written as follows. Let

QXZ = E(Xi′ Zi)  (9)

QZZ = E(Zi′ Zi)  (10)

and let û denote the IV residuals,

û ≡ y − Xβ̂IV  (11)

Then the IV estimator is asymptotically distributed as β̂IV ∼ N{β, V(β̂IV)}, where

V(β̂IV) = (1/n) σ² (Q′XZ QZZ−1 QXZ)−1  (12)

Replacing QXZ, QZZ, and σ² with their sample estimates

Q̂XZ = (1/n) X′Z  (13)

Defining

Q_ZZ = (1/n) Z′Z   (14)

σ̂² = û′û / n   (15)

we obtain the estimated asymptotic variance–covariance matrix of the IV estimator:

V(β_IV) = σ̂² {X′Z(Z′Z)^{-1}Z′X}^{-1} = σ̂² (X′P_Z X)^{-1}   (16)

Note that some packages, including Stata's ivreg, include a degrees-of-freedom correction to the estimate of σ̂² by replacing n with n − L. This correction is not necessary, however, since the estimate of σ̂² would not be unbiased anyway (Greene 2000, 373). Our ivreg2 routine defaults to the large-sample formulas for the estimated error variance and covariance matrix; the user can request the small-sample versions with the option small.

2.2 The generalized method of moments

The standard IV estimator is a special case of a generalized method of moments (GMM) estimator. The assumption that the instruments Z are exogenous can be expressed as E(Z_i u_i) = 0. The L instruments give us a set of L moments,

g_i(β) = Z_i′ u_i = Z_i′ (y_i − X_i β)   (17)

where g_i is L × 1. The exogeneity of the instruments means that there are L moment conditions, or orthogonality conditions, that will be satisfied at the true value of β:

E{g_i(β)} = 0   (18)

Each of the L moment equations corresponds to a sample moment, and we write these L sample moments as

g(β) = (1/n) Σ_{i=1}^n g_i(β) = (1/n) Σ_{i=1}^n Z_i′ (y_i − X_i β) = (1/n) Z′û   (19)

The intuition behind GMM is to choose an estimator for β that solves g(β) = 0.

If the equation to be estimated is exactly identified, so that L = K, then we have as many equations—the L moment conditions—as we do unknowns—the K coefficients in β. In this case, it is possible to find a β that solves g(β) = 0, and this GMM estimator is in fact the IV estimator.

If the equation is overidentified, however, so that L > K, then we have more equations than we do unknowns, and in general it will not be possible to find a β that will set all L sample moment conditions to exactly zero. In this case, we take an L × L weighting matrix W and use it to construct a quadratic form in the moment conditions. This gives us the GMM objective function:

J(β) = n g(β)′ W g(β)   (20)
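In code, the sample moments (19) and the objective function (20) are only a few lines. The following is an illustrative NumPy sketch under an assumed overidentified design (L = 3 instruments, K = 2 regressors); the simulated data-generating process and the variable names are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, L = 500, 2, 3                            # overidentified: L > K
Z = rng.normal(size=(n, L))                    # instruments
u = rng.normal(size=n)
X = Z[:, :K] + 0.5 * rng.normal(size=(n, K))   # regressors correlated with Z
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + u

def g(beta):
    """Sample moment vector g(beta) = (1/n) Z'(y - X beta), an L-vector: eq. (19)."""
    return Z.T @ (y - X @ beta) / n

def J(beta, W):
    """GMM objective J(beta) = n g(beta)' W g(beta): eq. (20)."""
    gb = g(beta)
    return n * gb @ W @ gb

W = np.linalg.inv(Z.T @ Z / n)    # one possible L x L weighting matrix
print(J(beta_true, W))            # small: the moments nearly hold at the true beta
print(J(np.zeros(K), W))          # large: the moments are badly violated
```

With L > K no β sets all three sample moments exactly to zero, but the objective is far smaller at the true parameter vector than at a wrong one.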

A GMM estimator for β is the β that minimizes J(β). Deriving and solving the K first-order conditions

∂J(β)/∂β = 0   (21)

yields the GMM estimator:

β_GMM = (X′Z W Z′X)^{-1} X′Z W Z′y   (22)

Note that the results of the minimization, and hence the GMM estimator, will be the same for weighting matrices that differ by a constant of proportionality (we will make use of this fact below). Beyond this, however, there are as many GMM estimators as there are choices of weighting matrix W.

What is the optimal choice of weighting matrix? Denote by S the covariance matrix of the moment conditions g:

S = (1/n) E(Z′uu′Z) = (1/n) E(Z′ΩZ)   (23)

where S is an L × L matrix. The general formula for the distribution of a GMM estimator is

V(β_GMM) = (1/n) (Q′_XZ W Q_XZ)^{-1} (Q′_XZ W S W Q_XZ) (Q′_XZ W Q_XZ)^{-1}   (24)

The efficient GMM estimator is the GMM estimator with an optimal weighting matrix W, one that minimizes the asymptotic variance of the estimator. This is achieved by choosing W = S^{-1}. Substitute this into (22) and (24), and we obtain the efficient GMM estimator

β_EGMM = (X′Z S^{-1} Z′X)^{-1} X′Z S^{-1} Z′y   (25)

with asymptotic variance

V(β_EGMM) = (1/n) (Q′_XZ S^{-1} Q_XZ)^{-1}   (26)

Note the generality (the "G" of GMM) of the treatment thus far; we have not yet made any assumptions about Ω, the covariance matrix of the disturbance term. The efficient GMM estimator is not yet a feasible estimator, however, because the matrix S is not known. To be able to implement the estimator, we need to estimate S, and to do this, we need to make some assumptions about Ω.

2.3 GMM and heteroskedastic errors

Let us start with one of the most commonly encountered cases in cross-section analysis: heteroskedasticity of unknown form, but no clustering (4). We need a heteroskedasticity-consistent estimator of S. Such an Ŝ is available by using the standard "sandwich" approach to robust covariance estimation. Denote by Ω̂ the diagonal matrix of squared residuals:

Ω̂ = diag(û²_1, û²_2, ..., û²_n)   (27)

where û_i is a consistent estimate of u_i. Then a consistent estimator of S is

Ŝ = (1/n) (Z′Ω̂Z)   (28)

This works because, although we cannot hope to estimate the n diagonal elements of Ω with only n observations, they are sufficient to enable us to obtain a consistent estimate of the L × L matrix S.

The û used for the Ω̂ matrix in (27) can come from any consistent estimator of β; efficiency is not required.² In practice, the most common choice for estimating û is the IV residuals. This gives us the algorithm for the feasible efficient two-step GMM estimator, as implemented in ivreg2 (with the gmm option) and ivgmm0:¹

1. Estimate the equation using IV.

2. Form the residuals û. Use these to form the optimal weighting matrix Ŵ = Ŝ^{-1} = {(1/n)(Z′Ω̂Z)}^{-1}.

3. Calculate the efficient GMM estimator β_EGMM and its variance–covariance matrix using the estimated optimal weighting matrix and (25), (26), and (13). This yields

β_EGMM = {X′Z(Z′Ω̂Z)^{-1}Z′X}^{-1} X′Z(Z′Ω̂Z)^{-1}Z′y   (29)

with asymptotic variance

V(β_EGMM) = {X′Z(Z′Ω̂Z)^{-1}Z′X}^{-1}   (30)

A variety of other feasible GMM procedures are also possible. For example, the procedure above can be iterated by obtaining the residuals from the two-step GMM estimator, using these to calculate a new Ŝ, using this in turn to calculate the three-step feasible efficient GMM estimator, and so forth, for as long as the user wishes or until the estimator converges; this is the "iterated GMM estimator".

Alternatively, one may work directly with the GMM objective function. Note that the estimate of the optimal weighting matrix is derived from some β. Instead of first obtaining an optimal weighting matrix and then taking it as given when minimizing (20), we can write the optimal weighting matrix as a function of β, and choose β to minimize J(β) = n g(β)′ W(β) g(β). This is the "continuously updated GMM" of Hansen et al. (1996); it requires numerical optimization methods.

¹ This estimator goes under various names: "2-stage instrumental variables" (2SIV), White (1982); "2-step 2-stage least squares", Cumby et al. (1983); and "heteroskedastic 2-stage least squares" (H2SLS), Davidson and MacKinnon (1993, 599).

² Another approach is to choose a different consistent but inefficient Step 1 estimator for the calculation of residuals used in Step 2. One common alternative to IV as the initial estimator is to use the residuals from the GMM estimator that uses the identity matrix as the weighting matrix.
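The two-step algorithm can be sketched directly from (27)–(30). The following NumPy fragment is a minimal illustration with a simulated heteroskedastic data-generating process of our own devising; it is not the ivreg2 implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 1000, 2
Z = rng.normal(size=(n, 4))                               # L = 4 instruments
v = rng.normal(size=n)
X = Z[:, :K] + 0.5 * rng.normal(size=(n, K)) + 0.3 * v[:, None]  # endogenous regressors
u = (1.0 + 0.5 * Z[:, 0] ** 2) * rng.normal(size=n) + v   # heteroskedastic error
beta_true = np.array([1.0, -1.0])
y = X @ beta_true + u

# Step 1: IV (2SLS) residuals
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
uhat = y - X @ beta_iv

# Step 2: optimal weighting from the squared residuals, eqs. (27)-(28)
ZOZ = (Z * uhat[:, None] ** 2).T @ Z                      # Z' Omega_hat Z
A = X.T @ Z @ np.linalg.solve(ZOZ, Z.T @ X)
beta_egmm = np.linalg.solve(A, X.T @ Z @ np.linalg.solve(ZOZ, Z.T @ y))  # eq. (29)
V_egmm = np.linalg.inv(A)                                 # eq. (30)
```

The 1/n factor in (28) cancels in (29)–(30), so it is omitted here; repeating Step 2 with the new residuals until convergence would give the iterated GMM estimator described above.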

2.4 GMM, IV and homoskedastic versus heteroskedastic errors

Let us now see what happens if we impose the more restrictive assumption of conditional homoskedasticity on Ω (3). This means that the S matrix simplifies to

S = (1/n) E(Z′ΩZ) = σ² (1/n) E(Z′Z)   (31)

The expectation term in (31) can be estimated by (1/n)Z′Z, but what about σ²? As we noted above, the GMM estimator will be the same for weighting matrices that differ by a constant of proportionality. We can therefore obtain the efficient GMM estimator under conditional homoskedasticity if we simply ignore σ² and use as our weighting matrix

Ŵ = {(1/n) Z′Z}^{-1}   (32)

Substituting (32) into (22), we find that it reduces to the formula for the IV estimator in (8). To obtain the variance of the estimator, however, we do need an estimate of σ². If we use the residuals of the IV estimator to calculate σ̂² = (1/n)û′û, we obtain

Ŝ = σ̂² (1/n) Z′Z   (33)

Finally, if we now set

Ŵ = Ŝ^{-1} = {σ̂² (1/n) Z′Z}^{-1}   (34)

and substitute (34) into the formula for the asymptotic variance of the efficient GMM estimator (26), we find that it reduces to the formula for the asymptotic variance of the IV estimator (12). In effect, under conditional homoskedasticity, the efficient GMM estimator is the IV estimator. Because the weighting matrices in (32) and (34) differ only by a constant of proportionality, iterating changes nothing: the iterations converge after one step, and the (efficient) iterated GMM estimator is the IV estimator.³

What are the implications of heteroskedasticity for the IV estimator? Recall that in the presence of heteroskedasticity, the IV estimator is inefficient but consistent, whereas the standard estimated IV covariance matrix is inconsistent. Asymptotically correct inference is still possible, however. In these circumstances the IV estimator is a GMM estimator with a suboptimal weighting matrix, and so the general formula for the asymptotic variance of a general GMM estimator, (24), still holds. The IV weighting matrix Ŵ remains as in (32); what we need is a consistent estimate of Ŝ. This is easily done, using exactly the same method employed in two-step efficient GMM. First, form the "hat" matrix Ω̂ as in (27), using the IV residuals, and use this matrix to form the Ŝ matrix as in (28). Substitute this Ŝ, the (suboptimal) IV weighting matrix Ŵ (32), and the sample estimates of Q_XZ (13) and Q_ZZ (14) into the general formula for the asymptotic variance of a GMM estimator (24), and we obtain an estimated variance–covariance matrix for the IV estimator that is robust to the presence of heteroskedasticity:

Robust V(β_IV) = (X′P_Z X)^{-1} {X′Z(Z′Z)^{-1}(Z′Ω̂Z)(Z′Z)^{-1}Z′X} (X′P_Z X)^{-1}   (35)

This is in fact the usual Eicker–Huber–White "sandwich" robust variance–covariance matrix for the IV estimator, available from ivreg or ivreg2 with the robust option.

³ It is worth noting that the IV estimator is not the only such efficient GMM estimator under conditional homoskedasticity. Instead of treating σ² as a parameter to be estimated in a second stage, what if we return to the GMM criterion function and minimize by simultaneously choosing β and σ²? The estimator that solves this minimization problem is in fact the Limited Information Maximum Likelihood estimator (LIML). In effect, under conditional homoskedasticity, the continuously updated GMM estimator is the LIML estimator. Calculating the LIML estimator does not require numerical optimization methods, since it can be calculated as the solution to an eigenvalue problem; see, for example, Davidson and MacKinnon (1993, 644–51).

2.5 Clustering, robust covariance estimation, and GMM

We turn now to the third special form of the disturbance covariance matrix Ω, clustering. Clustering arises very frequently in cross-section and panel data applications. For example, it may be reasonable to assume that observations on individuals drawn from the same family (cluster) are correlated with each other, but observations on individuals from different families are not. In the panel context, it may be reasonable to assume that observations on the same individual (cluster) in two different time periods are correlated, but observations on two different individuals are not.

Denote by u_m the vector of disturbances for cluster m; if there are t observations in the cluster, then u_m is t × 1. Let û_m be some consistent estimate of u_m, and define Σ̂_m ≡ û_m û′_m. If we now define Ω̂_C as the block-diagonal form

Ω̂_C = diag(Σ̂_1, Σ̂_2, ..., Σ̂_M)   (36)

then an estimator of S that is consistent in the presence of arbitrary intra-cluster correlation is

Ŝ = (1/n) (Z′Ω̂_C Z)   (37)

The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is White (1984, 135–136). It is commonly employed in the context of panel data estimation; see, for example, Arellano (1987), Kézdi (2002), and Wooldridge (2002, 193).

As specified in (5), the form of clustering is very general. The intra-cluster correlation Σ_m can be of any form, be it serial correlation, random effects, or anything else. The Σ_m's may, moreover, vary from cluster to cluster (the cluster analog to heteroskedasticity). Even in these very general circumstances, efficient estimation and consistent inference is still possible.
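Because Ω̂_C is block diagonal, the matrix Z′Ω̂_C Z in (37) collapses to a sum of per-cluster outer products Σ_m (Z_m′û_m)(Z_m′û_m)′, which is how it is computed in practice. The following NumPy sketch, with an assumed balanced design and hypothetical residuals of our own construction, makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
M, t = 60, 5                              # M clusters of t observations each
n = M * t
cluster = np.repeat(np.arange(M), t)
L = 3
Z = rng.normal(size=(n, L))
effect = rng.normal(size=M)[cluster]      # cluster effect -> intra-cluster correlation
uhat = effect + rng.normal(size=n)        # stand-in for consistent residuals

# S_hat = (1/n) Z' Omega_C Z = (1/n) sum_m (Z_m' u_m)(Z_m' u_m)'   (eqs. 36-37)
S = np.zeros((L, L))
for m in range(M):
    idx = cluster == m
    zu = Z[idx].T @ uhat[idx]             # L-vector Z_m' u_m for cluster m
    S += np.outer(zu, zu)
S /= n
```

The per-cluster form never materializes the n × n matrix Ω̂_C, which matters when n is large.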

It is the standard Stata approach to clustering, implemented in, for example, robust, regress, and ivreg2.⁴ The cluster-robust covariance matrix for IV estimation is obtained exactly as in the preceding subsection, except using Ŝ as defined in (37). This generates the robust standard errors produced by ivreg and ivreg2 with the cluster option. Similarly, GMM estimates that are efficient in the presence of arbitrary intra-cluster correlation are obtained exactly as in Subsection 2.3, except using the cluster-robust estimate of Ŝ. This efficient GMM estimator is a useful alternative to the fixed or random effects IV estimators available from Stata's xtivreg, because it relaxes the constraint imposed by the latter estimators that the correlation of individual observations within a group is constant.

It is important to note here that, if the cluster option is used, then it ought to be the case that M >> K: just as we require a reasonable number of diagonal elements (observations) for the usual "hat" matrix Ω̂, we also require a reasonable number of diagonal elements (clusters) for Ω̂_C. An extreme case is where the number of clusters M is ≤ L, so that rank(Ŝ) = M ≤ L = rank(Z′Z) and Ŝ is rank-deficient. When this is the case, ivreg2 will either refuse to report standard errors (in the case of IV estimation) or exit with an error message (in the case of GMM estimation).⁵

⁴ There are other approaches to dealing with clustering that put more structure on the Ω matrix and hence are more efficient but less robust. For example, the Moulton (1986) approach to obtaining consistent standard errors is in effect to specify an "error components" (a.k.a. "random effects") structure in (36): Σ_m is a matrix with diagonal elements σ²_u + σ²_v and off-diagonal elements σ²_v.

⁵ Stata's official ivreg is perhaps excessively forgiving in this regard, and will indicate error only if M ≤ K, i.e., the number of regressors exceeds the number of clusters.

2.6 GMM, OLS, and Heteroskedastic OLS (HOLS)

Our final special case of interest is OLS. It is not hard to see that under conditional homoskedasticity and the assumption that all the regressors are exogenous, OLS is an efficient GMM estimator. If the disturbance is heteroskedastic, OLS is no longer efficient, but correct inference is still possible through the use of the Eicker–Huber–White "sandwich" robust covariance estimator, and this estimator can also be derived using the general formula for the asymptotic variance of a GMM estimator with a suboptimal weighting matrix, (24).

A natural question is whether a more efficient GMM estimator exists, and the answer is "yes" (Chamberlain (1982) and Cragg (1983)). If the disturbance is heteroskedastic, there are no endogenous regressors, and the researcher has available additional moment conditions, i.e., additional variables that do not appear in the regression but that are known to be exogenous, then the efficient GMM estimator is that of Cragg (1983), dubbed "heteroskedastic OLS" (HOLS) by Davidson and MacKinnon (1993, 600). It can be obtained in precisely the same way as feasible efficient two-step GMM, except now the first-step inefficient but consistent estimator used to generate the residuals is OLS rather than IV. This estimator can be obtained using ivreg2 by specifying the gmm option, an empty list of endogenous regressors, and the additional exogenous variables in the list of excluded instruments. If the gmm option is omitted, OLS estimates are reported.

2.7 To GMM or not to GMM?

The advantages of GMM over IV are clear: if heteroskedasticity is present, the GMM estimator is more efficient than the simple IV estimator, whereas if heteroskedasticity is not present, the GMM estimator is no worse asymptotically than the IV estimator.

Nevertheless, the use of GMM does come with a price. The problem, as Hayashi (2000, 215) points out, is that the optimal weighting matrix Ŝ at the core of efficient GMM is a function of fourth moments, and obtaining reasonable estimates of fourth moments may require very large sample sizes. The consequence is that the efficient GMM estimator can have poor small sample properties. In particular, Wald tests tend to over-reject the null (good news for the unscrupulous investigator in search of large t statistics, perhaps, but not for the rest of us). If in fact the error is homoskedastic, IV would be preferable to efficient GMM. For this reason, a test for the presence of heteroskedasticity when one or more regressors is endogenous may be useful in deciding whether IV or GMM is called for. Such a test was proposed by Pagan and Hall (1983), and we have implemented it in Stata as ivhettest. We describe this test in the next section.

3 Testing for heteroskedasticity

The Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker statistics are standard tests of the presence of heteroskedasticity in an OLS regression. The principle is to test for a relationship between the residuals of the regression and p indicator variables that are hypothesized to be related to the heteroskedasticity. Breusch and Pagan (1979), Godfrey (1978), and Cook and Weisberg (1983) separately derived the same test statistic. This statistic is distributed as χ² with p degrees of freedom under the null of no heteroskedasticity, and under the maintained hypothesis that the error of the regression is normally distributed. Koenker (1981) noted that the power of this test is very sensitive to the normality assumption, and presented a version of the test that relaxed this assumption. Koenker's test statistic, also distributed as χ²_p under the null, is easily obtained as nR²_c, where R²_c is the centered R² from an auxiliary regression of the squared residuals from the original regression on the indicator variables. When the indicator variables are the regressors of the original equation, their squares, and their cross-products, Koenker's test is identical to White's nR²_c general test for heteroskedasticity (White 1980). These tests are available in Stata, following estimation with regress, using our ivhettest as well as via hettest and whitetst.

As Pagan and Hall (1983) point out, the above tests will be valid tests for heteroskedasticity in an IV regression only if heteroskedasticity is present in that equation and nowhere else in the system. The other structural equations in the system (corresponding to the endogenous regressors X1) must also be homoskedastic, even though they are not being explicitly estimated.⁶

⁶ For a more detailed discussion, see Pagan and Hall (1983) or Godfrey (1988, 189–90).
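The auxiliary-regression form of the Koenker statistic is easy to compute by hand. In Stata one would simply use hettest or ivhettest; the NumPy sketch below, run on a simulated heteroskedastic data set of our own, is only to make the nR²_c construction concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 2
Psi = rng.normal(size=(n, p))                       # indicator variables
x = rng.normal(size=n)
u = np.exp(0.5 * Psi[:, 0]) * rng.normal(size=n)    # error variance depends on Psi[:, 0]
y = 1.0 + 2.0 * x + u

# squared residuals from the original OLS regression
Xmat = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
e2 = (y - Xmat @ b) ** 2

# auxiliary regression of squared residuals on the indicators; nR2_c ~ chi2(p) under the null
Aux = np.column_stack([np.ones(n), Psi])
gcoef = np.linalg.lstsq(Aux, e2, rcond=None)[0]
resid = e2 - Aux @ gcoef
R2c = 1.0 - resid @ resid / np.sum((e2 - e2.mean()) ** 2)
nR2 = n * R2c          # compare with chi2(p) critical values (5.99 at 5% for p = 2)
print(nR2)
```

With the heteroskedasticity built into this data-generating process, the statistic comfortably exceeds the 5% critical value of the χ²(2) distribution.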

Pagan and Hall derive a test that relaxes this requirement. Under the null of homoskedasticity in the IV regression, the Pagan–Hall statistic is distributed as χ²_p, irrespective of the presence of heteroskedasticity elsewhere in the system. A more general form of this test was separately proposed by White (1982); our implementation is of the simpler Pagan–Hall statistic, available with the command ivhettest after estimation by ivreg, ivreg2, or ivgmm0. We present the Pagan–Hall test here in the format and notation of the original White (1980 and 1982) tests, to facilitate comparisons with the other tests noted above.⁷

Let Ψ be the n × p matrix of indicator variables hypothesized to be related to the heteroskedasticity in the equation, with typical row Ψ_i. These indicator variables must be exogenous, typically either instruments or functions of the instruments. Common choices would be:

1. The levels, squares, and cross-products of the instruments Z (excluding the constant), as in the White (1980) test. This is the default in ivhettest.

2. The levels only of the instruments Z (excluding the constant). This is available in ivhettest by specifying the ivlev option.

3. The "fitted value" of the dependent variable, Xβ. This is not the usual fitted value of the dependent variable; it is, rather, the prediction based on the IV estimator β. This is available in ivhettest by specifying the fitlev option.

4. The "fitted value" of the dependent variable and its square (fitsq option).

The trade-off in the choice of indicator variables is that a smaller set of indicator variables will conserve degrees of freedom, at the cost of being unable to detect heteroskedasticity in certain directions.

Let

Ψ̄ = (1/n) Σ_{i=1}^n Ψ_i   (dimension 1 × p)
D ≡ (1/n) Σ_{i=1}^n Ψ_i′ (û_i² − σ̂²)   (dimension p × 1)
Γ = (1/n) Σ_{i=1}^n (Ψ_i − Ψ̄)′ X̂_i û_i   (dimension p × K)
μ̂_3 = (1/n) Σ_{i=1}^n û_i³
μ̂_4 = (1/n) Σ_{i=1}^n û_i⁴
X̂ = P_Z X   (38)

If u_i is homoskedastic and independent of Z_i, then Pagan and Hall (1983) (Theorem 8) show that under the null of no heteroskedasticity, asymptotically

n D′ B̂^{-1} D ∼ χ²_p   (39)

where

B = B1 + B2 + B3 + B4
B1 = (μ4 − σ⁴) (1/n) Σ (Ψ_i − Ψ̄)′(Ψ_i − Ψ̄)
B2 = −2 μ3 (1/n) Ψ′X̂ {(1/n) X̂′X̂}^{-1} Γ′
B3 = B2′
B4 = 4 σ² (1/n) Γ {(1/n) X̂′X̂}^{-1} Γ′   (40)

This is the default statistic produced by ivhettest. Several special cases are worth noting:

• If the error term is assumed to be normally distributed, then B2 = B3 = 0 and B1 = 2σ⁴(1/n)Σ(Ψ_i − Ψ̄)′(Ψ_i − Ψ̄). This is available from ivhettest with the phnorm option.

• If the rest of the system is assumed to be homoskedastic, then B2 = B3 = B4 = 0, and the statistic in (39) becomes the White/Koenker nR²_c statistic. This is available from ivhettest with the nr2 option.

• If the rest of the system is assumed to be homoskedastic and the error term is assumed to be normally distributed, then B2 = B3 = B4 = 0, B1 = 2σ⁴(1/n)Σ(Ψ_i − Ψ̄)′(Ψ_i − Ψ̄), and the statistic in (39) becomes the Breusch–Pagan/Godfrey/Cook–Weisberg statistic. This is available from ivhettest with the bpg option.

⁷ We note here that the original Pagan–Hall paper has a serious typo in the presentation of their nonnormality-robust statistic. Their equation (58b), 195, is missing the term (in their terminology) −2μ₃ψ(X′X)^{-1}X′D(D′D)^{-1}. The typo reappears in the discussion of the test by Godfrey (1988). The correction published in Pesaran and Taylor (1999) is incomplete, as it applies only to the version of the Pagan–Hall test with a single indicator variable.

4 4. Bound et al. this may be expressed as the F test of the joint signiﬁcance of the Z1 instruments in the ﬁrst-stage regression. either additional relevant instruments are needed. this is the “squared partial correlation” between the excluded instruments Z and 1 the endogenous regressor in question. the relevant test statistics here relate to the explanatory power of the excluded instruments Z1 in these regressions. ivhettest can also be employed after estimation via OLS or HOLS using regress or ivreg2. and without further investigation. The model is therefore basically unidentiﬁed: there is one good instrument but two endogenous regressors. It is deﬁned as (RSSZ2 − RSSZ )/T SS. consider the following simple example. perhaps because it is not a standard feature of most regression packages. 8 More precisely. The ﬁrst stage regressions are reduced form regressions of the endogenous variables X1 on the full set of instruments Z. (1995) are able to diagnose instrument relevance only in the presence of a single endogenous regressor. To illustrate the pitfalls facing empirical researchers here. for models with multiple endogenous variables. where RSSZ2 is the residual sum of squares in the regression of the endogenous regressor on Z2 . One of the two excluded instruments is highly correlated with each of the two endogenous regressors. the nRc statistic may be preferred. But the Bound et al. The Pagan–Hall statistic has not been widely used in practice. the F statistics will be statistically signiﬁcant. as recommended by. In this case. . (1995) F statistics and partial R2 measures from the two ﬁrst-stage regressions will not reveal this weakness. is the R2 of the ﬁrst-stage regression with the included instruments “partialled-out”. Indeed. including some Monte Carlo results. these indicators may not be suﬃciently informative. The researcher has a model with two endogenous regressors and two excluded instruments. other statistics are required. 
the researcher will not realize that the model cannot be estimated in this form. (1995).14 Instrumental variables and GMM: Estimation and testing All of the above statistics will be reported with the all option. To deal with this problem of “instrument irrelevance”. for example. and orthogonal to the error process.1 Testing the relevance and validity of instruments Testing the relevance of instruments An instrumental variable must satisfy two requirements: it must be correlated with the included endogenous variable(s). The former condition may be readily tested by examining the ﬁt of the ﬁrst stage regressions.8 Alternatively. For a discussion of the relative merits of the Pagan–Hall test. When multiple endogenous regressors are used. in these circumstances. and RSSZ is the RSS when the full set of instruments is used. A statistic commonly used. see Pesaran and Taylor (1999). Their ﬁndings suggest caution in the use of the Pagan–Hall statistic particularly 2 in small samples. 2 the default test statistic is the White/Koenker nRc test. or one of the endogenous regressors must be dropped from the model. The statistics proposed by Bound et al. but the other excluded instrument is just noise. However.
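For a single first-stage equation, both diagnostics reduce to a pair of regressions. The sketch below (Python/NumPy, simulated data, variable names ours) computes the F test of the joint significance of the excluded instruments and the squared partial correlation (RSS_{Z2} − RSS_Z)/TSS described above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
Z2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # included exogenous regressors
Z1 = rng.normal(size=(n, 2))                            # excluded instruments
x1 = Z2 @ np.array([0.5, 0.2]) + Z1 @ np.array([1.0, 0.05]) + rng.normal(size=n)

def rss(W, yv):
    """Residual sum of squares from an OLS regression of yv on W."""
    b = np.linalg.lstsq(W, yv, rcond=None)[0]
    r = yv - W @ b
    return r @ r

Z = np.column_stack([Z2, Z1])            # full instrument set
rss_r = rss(Z2, x1)                      # first stage without the excluded instruments
rss_u = rss(Z, x1)                       # full first-stage regression
q = Z1.shape[1]
F = ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))
tss = np.sum((x1 - x1.mean()) ** 2)
partial_R2 = (rss_r - rss_u) / tss       # squared partial correlation
print(F, partial_R2)
```

As the example in the text warns, with multiple endogenous regressors a healthy F statistic in each first stage is not sufficient: here the second instrument contributes almost nothing, which these single-equation measures alone would not reveal.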

One such statistic has been proposed by Shea (1997): a "partial R²" measure that takes the intercorrelations among the instruments into account.⁹ For a model containing a single endogenous regressor, the two R² measures are equivalent. The distribution of Shea's partial R² statistic has not been derived, but it may be interpreted like any R². As a rule of thumb, if an estimated equation yields a large value of the standard (Bound et al. 1995) partial R² and a small value of the Shea measure, one may conclude that the instruments lack sufficient relevance to explain all the endogenous regressors, and the model may be essentially unidentified. The Bound et al. (1995) measures and the Shea partial R² statistic can be obtained via the first or ffirst options on the ivreg2 command.

The consequence of excluded instruments with little explanatory power is increased bias in the estimated IV coefficients (Hahn and Hausman 2002b). If their explanatory power in the first stage regression is nil, the model is in effect unidentified with respect to that endogenous variable; in this case, the bias of the IV estimator is the same as that of the OLS estimator, IV becomes inconsistent, and nothing is gained from instrumenting (ibid., 557). If the explanatory power is simply "weak",¹⁰ conventional asymptotics fail. What is surprising is that, as Staiger and Stock (1997) and others have shown, the "weak instrument" problem can arise even when the first stage tests are significant at conventional levels (5% or 1%) and the researcher is using a large sample. One rule of thumb is that for a single endogenous regressor, an F statistic below 10 is cause for concern (Staiger and Stock 1997). Since the size of the IV bias is increasing in the number of instruments (Hahn and Hausman 2002b), one recommendation when faced with this problem is to be parsimonious in the choice of instruments. For further discussion, see, for example, Staiger and Stock (1997), Hahn and Hausman (2002a, 2002b), and the references cited therein.

⁹ The Shea partial R² statistic may be easily computed according to the simplification presented in Godfrey (1999), who demonstrates that Shea's statistic for endogenous regressor i may be expressed as R²_p = (ν^{OLS}_{i,i}/ν^{IV}_{i,i}) {(1 − R²_{IV})/(1 − R²_{OLS})}, where ν_{i,i} is the estimated asymptotic variance of the coefficient.

¹⁰ One approach in the literature, following Staiger and Stock (1997), is to define "weak" as meaning that the first stage reduced form coefficients are in a N^{-1/2} neighborhood of zero, or equivalently, holding the expectation of the first stage F statistic constant as the sample size increases. See also Hahn and Hausman (2002b).

4.2 Overidentifying restrictions in GMM

We turn now to the second requirement for an instrumental variable, that is, whether the instruments are uncorrelated with the error process. How can the instrument's independence from an unobservable error process be ascertained? If (and only if) we have a surfeit of instruments—i.e., if the equation is overidentified—then we can test the corresponding moment conditions described in (17). This condition will arise when the order condition for identification is satisfied in inequality: the number of instruments excluded from the equation exceeds the number of included endogenous variables. This test can and should be performed as a standard diagnostic in any overidentified instrumental variables estimation.¹¹ These are tests of the joint hypotheses of correct model specification and the orthogonality conditions, and a rejection may properly call either or both of those hypotheses into question.

In the context of GMM, the overidentifying restrictions may be tested via the commonly employed J statistic of Hansen (1982). This statistic is none other than the value of the GMM objective function (20), evaluated at the efficient GMM estimator β_EGMM. Under the null,

J(β_EGMM) = n g(β)′ Ŝ^{-1} g(β) ∼ χ²_{L−K}   (41)

The J statistic is distributed as χ² with degrees of freedom equal to the number of overidentifying restrictions L − K, rather than the total number of moment conditions L, because, in effect, K degrees of freedom are used up in estimating the coefficients of β. J is the most common diagnostic utilized in GMM estimation to evaluate the suitability of the model. A rejection of the null hypothesis implies that the instruments are not satisfying the orthogonality conditions required for their employment. This may be either because they are not truly exogenous, or because they are being incorrectly excluded from the regression.

In the case of heteroskedastic errors, the matrix Ŝ is estimated using the Ω̂ matrix (27), and the J statistic becomes

J(β_EGMM) = û′Z (Z′Ω̂Z)^{-1} Z′û ∼ χ²_{L−K}   (42)

With clustered errors, the Ω̂_C matrix (37) can be used instead, and this J will be consistent in the presence of arbitrary intra-cluster correlation. The J statistic is calculated and displayed by ivreg2 when the gmm, robust, or cluster options are specified. The consistency of J under clustering can be quite important in practice: Hoxby and Paserman (1998) have shown that the presence of intra-cluster correlation can readily cause a standard overidentification statistic to over-reject the null.

4.3 Overidentifying restrictions in IV

In the special case of linear instrumental variables under conditional homoskedasticity, the concept of the J statistic considerably predates the development of GMM estimation techniques. Just as IV is a special case of GMM, Sargan's statistic (Sargan 1958) is a special case of Hansen's J under the assumption of conditional homoskedasticity. If we use the IV optimal weighting matrix (34) together with the expression for J (41), we obtain

Sargan's statistic = (1/σ̂²) û′Z(Z′Z)^{-1}Z′û = {û′Z(Z′Z)^{-1}Z′û}/(û′û/n) = û′P_Z û/(û′û/n)   (43)

The ivreg2 procedure routinely presents this test for IV estimation, labeled as Sargan's statistic in the estimation output.

¹¹ Thus Davidson and MacKinnon (1993, 236): "Tests of overidentifying restrictions should be calculated routinely whenever one computes IV estimates." Sargan's own view, cited in Godfrey (1988, 145), was that regression analysis without testing the orthogonality assumptions is a "pious fraud".
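The equality in (43) between the quadratic-form and auxiliary-regression versions of Sargan's statistic can be verified numerically. The following is our own NumPy sketch with one endogenous regressor, three valid instruments, and a hypothetical data-generating process:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 800
Z = rng.normal(size=(n, 3))                       # L = 3 instruments, K = 1 regressor
v = rng.normal(size=n)
x = Z @ np.array([0.8, 0.8, 0.8]) + 0.5 * v + rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)                  # all instruments are valid here
y = 2.0 * x + u
X = x[:, None]

# IV residuals
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
uhat = y - X @ beta_iv

# Sargan = u'Pz u / (u'u / n), eq. (43)
sargan = (uhat @ Pz @ uhat) / (uhat @ uhat / n)

# equivalently, n times the uncentered R2 of the residuals regressed on all instruments
b_aux = np.linalg.lstsq(Z, uhat, rcond=None)[0]
R2u = 1.0 - np.sum((uhat - Z @ b_aux) ** 2) / np.sum(uhat ** 2)
print(sargan, n * R2u)
```

Under the null, both forms are the same number and approximately χ²(L − K) = χ²(2) distributed; with the valid instruments assumed here, the statistic is small.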

Independently of Sargan. and the pseudo-F form of the Basmann statistic is given by equation (7.55). The diﬀerence between Sargan’s and Basmann’s statistics is that the former uses an estimate of the error variance from the IV regression estimated with the full set of overidentifying restrictions. overid calculates the uncentered R2 itself. Basmann proposed an F (L − K. the uncentered 2 total sum of squares of the auxiliary regression needed for the denominator of Ru is simply the residual sum of squares of the original IV regression. may be requested via the f option.57). This auxiliary regression test is that performed by overid after ivreg. Schaﬀer. u′ PZ u/(L − K) Basmann’s F statistic = ′ (44) u MZ u/(n − L) where MZ ≡ I −PZ is the “annihilator” matrix and L is the total number of instruments. and hence the two statistics are asymptotically equivalent. may be requested via the dfr option. the same artiﬁcial regression can be used to generate both the Basmann and the Sargan statistics.C. whereas the latter uses an estimate from a regression without the overidentifying restrictions being imposed. 123). The nRu of this auxiliary regression will 2 have a χL−K distribution under the null hypothesis that all instruments are orthogonal to the error. Baum.13 Either method will generate a consistent estimator of the error variance under the null of instrument validity. An alternative form of the Sargan statistic that uses a small-sample correction. and S. obtained by dividing the numerator u′ PZ u by L − K.12 A good discussion of this test is presented in Wooldridge (2002. Neither the Sargan nor the Basmann statistics computed for an IV regression is valid in the presence of conditional heteroskedasticity. 235–236). 
a heteroskedasticityrobust overidentiﬁcation statistic can be calculated for an IV regression by applying a general result in the literature for a test of overidentiﬁcation for a GMM estimator with 12 Note that Stata’s regress reports an uncentered R2 only if the model does not contain a constant. M. The all option displays all ﬁve statistics. The Basmann statistic uses the error variance from the estimate of their equation (7. replacing the estimate of the error variance u′ u/n with u′ u/(n − K). “Pseudo-F ” forms of the Sargan and Basmann tests. this is also the version of the Sargan statistic reported by ivreg2 for IV estimation when the small option is used. and the statistic is also automatically reported by ivreg2. The most important of these is that of Basmann (1960).54). 13 See Davidson and MacKinnon (1993. F. In this case. and it can be easily calculated this way by regressing the IV equation’s residuals upon all instruments Z (both the included exogenous variables and those in2 struments that do not appear in the equation). . 2 By default the Sargan nRu statistic and a χ2 version of Basmann’s statistic (without the numerator degrees of freedom) are reported in the overid output. Note that since u′ MZ u = u′ u − u′ PZ u. the Sargan statistic is given by their (7. n − L) test of overidentifying restrictions. The literature contains several variations on this test. Stillman 17 2 2 It is easy to see from (43) that Sargan’s statistic has an nRu form (where Ru is the 2 uncentered R ). Consequently. The main idea behind these variations is that there is more than one way to consistently estimate the variance in the denominator of (43). and a centered R2 otherwise. E.

It does not seem to have been noted in the literature that in the IV case, this "robustified Sargan statistic" is numerically identical to the J statistic computed from feasible efficient two-step GMM for that equation.14 Thus, if robust inference is sought in an instrumental variables model, one may calculate the test for overidentification via a standard J statistic. When the robust and/or cluster options are used with ivreg2 to estimate an IV regression with robust standard errors, the Hansen J statistic for feasible efficient two-step GMM is automatically reported.

4.4 Testing a subset of the overidentifying restrictions

The Hansen–Sargan tests for overidentification presented above evaluate the entire set of overidentifying restrictions. In a model containing a very large set of excluded instruments, such a test may have very little power. Another common problem arises when the researcher has prior suspicions about the validity of a subset of instruments, and wishes to test them.

In these contexts, a "difference-in-Sargan" statistic may usefully be employed.15 The test is known under other names as well; e.g., Ruud (2000) calls it the "distance difference" statistic, and Hayashi (2000) follows Eichenbaum et al. (1988) and dubs it the C statistic; we will use the latter term. The C test allows us to test a subset of the original set of orthogonality conditions. The statistic is computed as the difference between two Sargan statistics (or, for efficient GMM, two J statistics): that for the (restricted, fully efficient) regression using the entire set of overidentifying restrictions, versus that for the (unrestricted, inefficient but consistent) regression using a smaller set of restrictions, in which a specified set of instruments are removed from the set. For excluded instruments, this is equivalent to dropping them from the instrument list. For included instruments, the C test hypothecates placing them in the list of included endogenous variables; in essence, treating them as endogenous regressors. The C test, distributed χ² with degrees of freedom equal to the loss of overidentifying restrictions (i.e., the number of suspect instruments being tested), has the null hypothesis that the specified variables are proper instruments.

Although the C statistic can be calculated as the simple difference between the Hansen–Sargan statistics for two regressions, this procedure can generate a negative test statistic in finite samples. In the IV context, this problem can be avoided, and the C statistic guaranteed to be nonnegative, if the estimate of the error variance σ² from the original (restricted, more efficient) IV regression is used to calculate the Sargan statistic for the unrestricted IV regression as well. The equivalent procedure in the GMM context is to use the S matrix from the original estimation to calculate both J statistics; see Hayashi (2000, 220). More precisely, S from the restricted estimation is used to form the restricted J statistic, and the submatrix of S with rows/columns corresponding to the unrestricted estimation is used to form the J statistic for the unrestricted estimation.

14 See Ahn (1995), Proposition 1, or Wooldridge (1995), Procedure 3.2, for an alternative formulation.
15 See Hayashi (2000, 218–221 and 232–234) or Ruud (2000, Chapter 22) for comprehensive presentations.
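The nonnegativity guarantee can be illustrated with a short NumPy sketch (our own example with an invented data-generating process, not code from the article): a C statistic for one suspect excluded instrument is formed by differencing two Sargan statistics that share the error-variance estimate from the restricted estimation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Hypothetical DGP: x1 endogenous; z1, z2 valid; z3 is the suspect instrument
zs = rng.standard_normal((n, 3))
common = rng.standard_normal(n)
x1 = zs @ np.array([1.0, 0.8, 0.5]) + common + rng.standard_normal(n)
u = 0.5 * common + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1])
y = X @ np.array([1.0, 2.0]) + u

def sargan_num(Zm, Xm):
    """2SLS with instruments Zm; return (u'P_Z u, residuals)."""
    P = Zm @ np.linalg.solve(Zm.T @ Zm, Zm.T)
    b = np.linalg.solve(Xm.T @ P @ Xm, Xm.T @ P @ y)
    e = y - Xm @ b
    return e @ P @ e, e

Z_full = np.column_stack([np.ones(n), zs])          # restricted: all instruments
Z_small = np.column_stack([np.ones(n), zs[:, :2]])  # unrestricted: drop z3

num_r, e_r = sargan_num(Z_full, X)
num_u, _ = sargan_num(Z_small, X)

sigma2_r = e_r @ e_r / n     # common error variance from the restricted fit
sargan_r = num_r / sigma2_r  # chi2(2) under H0
sargan_u = num_u / sigma2_r  # computed with the SAME sigma2
C = sargan_r - sargan_u      # chi2(1): tests orthogonality of z3
```

Because both statistics are scaled by sigma2_r and the smaller instrument set spans a subspace of the larger one, C cannot be negative, exactly as the text argues.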

The C test is conducted in ivreg2 by specifying the orthog option and listing the instruments (either included or excluded) to be challenged. The equation must still be identified with these instruments either removed or reconsidered as endogenous if the C statistic is to be calculated. Note that if the unrestricted equation is exactly identified, the Hansen–Sargan statistic for the unrestricted equation will be zero and the C statistic will coincide with the Hansen–Sargan statistic for the original (restricted) equation, and this will be true irrespective of the instruments used to identify the unrestricted estimation. This illustrates how the Hansen–Sargan overidentification test is an "omnibus" test for the failure of any of the instruments to satisfy the orthogonality conditions, but one that at the same time requires that the investigator believe that at least some of the instruments are valid; see Ruud (2000, 577).

4.5 Tests of overidentifying restrictions as Lagrange multiplier (score) tests

The Sargan test can be viewed as analogous to a Lagrange multiplier (LM) or score test;16 in the case of OLS, the resemblance becomes exact. Consider the model Y = X2β + u, for which the researcher wishes to test whether the additional variables Z1 can be omitted; both X2 and Z1 are assumed to be exogenous. The LM test statistic of this hypothesis is obtained as nR²_u from a regression of the OLS residuals u on X2 and Z1. It is easy to see that this is in fact the same procedure used to obtain the Sargan statistic for the special case of no endogenous regressors: X = X2 and Z = [Z1 X2]. When ivreg2 is used to generate OLS estimates, the Sargan statistic reported is an LM test of the variables in the IV varlist.

As usual, if the estimation method is OLS but the error is not homoskedastic, then the standard LM test is no longer valid. A heteroskedasticity-robust version is, however, available. This result carries over into GMM estimation using Cragg's HOLS: the J statistic for the HOLS estimator is a heteroskedasticity-robust LM-type test of the hypothesis that Z1 can be omitted from the estimation. The robust LM statistic for OLS is numerically equivalent to the J statistic from feasible efficient two-step GMM, i.e., HOLS—a result that again does not seem to have been noted in the literature.17 If the gmm option is chosen, HOLS estimates are reported along with a robust LM statistic, and the cluster option generates a statistic that is robust to arbitrary intra-cluster correlation.

16 For a detailed discussion of the relationship between the different types of tests in a GMM framework, see Ruud (2000, Chapter 22).
17 See Wooldridge (2002, 58–61) and Wooldridge (1995) for more detailed discussion.
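The LM computation just described is easy to replicate; the following sketch (our own illustration with a made-up data-generating process) forms the nR²_u statistic for the hypothesis that Z1 can be omitted from an OLS regression:

```python
import numpy as np

rng = np.random.default_rng(99)
n = 300

# Hypothetical DGP: y depends on X2 only; z1 is the candidate omitted variable
x2 = rng.standard_normal(n)
z1 = rng.standard_normal(n)
u = rng.standard_normal(n)
X2 = np.column_stack([np.ones(n), x2])
y = X2 @ np.array([1.0, 0.5]) + u

# OLS of y on X2 and its residuals
b_ols = np.linalg.lstsq(X2, y, rcond=None)[0]
uhat = y - X2 @ b_ols

# Auxiliary regression of the residuals on [X2, Z1]; LM = n * uncentered R^2
W = np.column_stack([X2, z1])
g = np.linalg.lstsq(W, uhat, rcond=None)[0]
resid = uhat - W @ g
r2_u = 1.0 - (resid @ resid) / (uhat @ uhat)
lm = n * r2_u   # asymptotically chi2(1) under H0: z1 can be omitted
```

Since the OLS residuals are already orthogonal to X2, only z1 can contribute to R²_u, mirroring the Sargan logic for the no-endogenous-regressors case.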

5 Testing for endogeneity of the regressors

There may well be reason to suspect nonorthogonality between regressors and errors—which can arise from several sources, including classical errors-in-variables. Turning to IV or efficient GMM estimation for the sake of consistency must be balanced against the inevitable loss of efficiency. As Wooldridge states, there is ". . . an important cost of performing IV estimation when x and u are uncorrelated: the asymptotic variance of the IV estimator is always larger, and sometimes much larger, than the asymptotic variance of the OLS estimator." (Wooldridge 2003, 490) Naturally, this loss of efficiency is a price worth paying if the OLS estimator is biased and inconsistent; thus, a test of the appropriateness of OLS, and of the necessity of resorting to instrumental variables or GMM methods, would be very useful. The intuition for such a test may also be couched in terms of the number of orthogonality conditions available. May all or some of the included endogenous regressors be appropriately treated as exogenous? If so, these restrictions can be added to the set of moment conditions, and more efficient estimation will be possible.

5.1 Durbin–Wu–Hausman tests for endogeneity in IV estimation

Many econometrics texts discuss the issue of "OLS versus IV" in the context of the Durbin–Wu–Hausman (DWH) tests, which involve estimating the model via both OLS and IV approaches and comparing the resulting coefficient vectors. Under the null hypothesis that OLS is an appropriate estimation technique, only efficiency should be lost by turning to IV; the point estimates should be qualitatively unaffected. A quadratic form in the differences between the two coefficient vectors, scaled by the precision matrix, gives rise to a test statistic for the null hypothesis that the OLS estimator is consistent and fully efficient.

The Hausman (1978) specification test takes the quadratic form

H = n (β_c − β_e)′ D⁻ (β_c − β_e),  where D = V(β_c) − V(β_e)    (45)

where V(β) denotes a consistent estimate of the asymptotic variance of β, and the operator ⁻ denotes a generalized inverse. Denote by β_c the estimator that is consistent under both the null and the alternative hypotheses, and by β_e the estimator that is fully efficient under the null but inconsistent if the null is not true. A Hausman statistic for a test of endogeneity in an IV regression is formed by choosing OLS as the efficient estimator β_e and IV as the inefficient but consistent estimator β_c. The test statistic is distributed as χ² with K1 degrees of freedom, this being the number of regressors being tested for endogeneity. The test is perhaps best interpreted not as a test for the endogeneity or exogeneity of regressors per se, but rather as a test of the consequence of employing different estimation methods on the same equation.

The Hausman statistic comes in several flavors, depending on which estimates of the asymptotic variances are used. An obvious possibility would be to use V(β_IV) and V(β_OLS) as generated by standard IV and OLS estimation; in the Hausman form of the test, this would be the result if Stata's hausman command were used without any options. This is actually rarely done because, although asymptotically valid, it has the drawback of possibly generating a negative Hausman statistic in finite samples.18

18 Readers should also bear in mind here and below that the estimates of the error variances may or may not have small-sample corrections, according to the estimation package used and the options chosen. If one of the variance–covariance matrices in D uses a small-sample correction, then so should the other.

Recall that the standard asymptotic covariances for IV and OLS are

V(β_IV) = σ²_IV (X′P_Z X)⁻¹    V(β_OLS) = σ²_OLS (X′X)⁻¹    (46)

Under the null, both the IV and the OLS estimates of the error variance are consistent estimators of σ², and either can be used to form the Hausman statistic. Avoiding the problem of a negative statistic is straightforward: if a common estimate of σ² is used, then the generalized inverse of D is guaranteed to exist and a positive test statistic is guaranteed,19 in either the Durbin (47) or Wu T3 (48) forms of the statistic.

If the Hausman statistic is formed using the OLS estimate of the error variance, then the D matrix in (45) becomes

D = σ²_OLS {(X′P_Z X)⁻¹ − (X′X)⁻¹}    (47)

This version of the endogeneity test was first proposed by Durbin (1954) and separately by Wu (1973) (his T4 statistic) and Hausman (1978). It can be obtained within Stata by using hausman with the sigmamore option in conjunction with estimation by regress, ivreg, and/or ivreg2. If the Hausman statistic is formed using the IV estimate of the error variance, then the D matrix becomes

D = σ²_IV {(X′P_Z X)⁻¹ − (X′X)⁻¹}    (48)

This version of the statistic was proposed separately by Wu (1973) (his T3 statistic) and Hausman (1978). It can be obtained within Stata by using hausman with the (undocumented) sigmaless option. The Hausman statistic for the endogeneity test can also be expressed in terms of a test of the coefficients of the endogenous regressors alone, with the rest of the βs removed; in this alternate form, the variables being tested are those not shared by X and Z, namely the K1 endogenous regressors X1. See Bowden and Turkington (1984, 50–51).

Use of hausman with the sigmamore or sigmaless options avoids an additional annoyance: because Stata's hausman tries to deduce the correct degrees of freedom for the test from the rank of the matrix D, it may sometimes come up with the wrong answer. It will correctly report K1 degrees of freedom for the test if a common estimate of the error variance is used,20 but not if both V(β_IV) and V(β_OLS) are used to form D. What will happen in this case is that hausman will report the correct χ² statistic, but with degrees of freedom equal to K rather than K1, and the user will have to calculate the correct p-value by hand.

19 The matrix difference in (47) and (48) has rank K1; see Greene (2000, 384–385).
20 This works in the former two cases because the matrix difference in (47) and (48) has rank K1; see note 19 above.
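The two flavors can be sketched directly from (47) and (48) (our own NumPy illustration with an invented data-generating process; the pinv call plays the role of the generalized inverse D⁻):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical DGP with one endogenous regressor x1 and three excluded instruments
zs = rng.standard_normal((n, 3))
common = rng.standard_normal(n)
x1 = zs @ np.array([1.0, 0.5, 0.5]) + common + rng.standard_normal(n)
u = 0.6 * common + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), zs])
y = X @ np.array([1.0, 2.0]) + u

XX_inv = np.linalg.inv(X.T @ X)
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
XPZX_inv = np.linalg.inv(X.T @ PZ @ X)

b_ols = XX_inv @ X.T @ y
b_iv = XPZX_inv @ X.T @ PZ @ y
s2_ols = (y - X @ b_ols) @ (y - X @ b_ols) / n   # sigma^2_OLS
s2_iv = (y - X @ b_iv) @ (y - X @ b_iv) / n      # sigma^2_IV

d = b_iv - b_ols
diff = XPZX_inv - XX_inv     # PSD matrix of rank K1 = 1 here

# Durbin flavor, D as in (47); Wu T3 flavor, D as in (48)
H_durbin = d @ np.linalg.pinv(s2_ols * diff) @ d   # chi2(K1) under H0
H_wu_t3 = d @ np.linalg.pinv(s2_iv * diff) @ d     # chi2(K1) under H0
```

Because each flavor uses a single common σ̂², both quadratic forms are nonnegative, and the two statistics differ only by the ratio of the two variance estimates.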

Although these different flavors of the DWH endogeneity test are asymptotically equivalent, they will differ numerically, and may perform differently in finite samples. Given the choice between forming the Hausman statistic using either σ²_OLS or σ²_IV, the standard choice is the former (the Durbin statistic) because under the null, both are consistent but the former is more efficient. The Durbin flavor of the test has the additional advantage of superior performance when instruments are weak (Staiger and Stock 1997).

5.2 Extensions: Testing a subset of the regressors for endogeneity, and heteroskedastic-robust testing for IV and GMM estimation

In some contexts, the researcher may be certain that one or more regressors in X1 is endogenous but may question the endogeneity of the others. In such a context, the DWH tests above are easily modified to apply to a subset of the endogenous regressors. Consider dividing the set of endogenous regressors into two subsets, X1A and X1B, where only the second set of variables is to be tested for endogeneity. The procedure is essentially the same as in the standard IV versus OLS case discussed above: estimate the equation twice, once with the regressors being tested as exogenous (the more efficient estimator) and once with the same regressors treated as endogenous (the less efficient but consistent estimator), and form the Hausman statistic (45) using the estimated coefficients and (robust) covariance matrices. In these tests, the less efficient but consistent estimator β_c remains the IV estimator β_IV, but the fully efficient estimator is now the IV estimator β_IVB from the regression in which X1A is still treated as endogenous but X1B is treated as exogenous. Again, the mildly annoying problem of a possibly negative Hausman statistic can arise. A positive test statistic can be guaranteed if the estimate of the error variance σ² used in the matrix D is from either of the two IV estimations, since both are consistent under the null.

The Hausman statistic framework of (45) for tests of the endogeneity of regressors is also available both for IV estimation with robust standard errors and for efficient GMM estimation. The way to guarantee a nonnegative test statistic here is the same method used with the C test: the equivalent of the sigmamore option of hausman would be to use the S matrix from the more efficient estimation to form the covariance matrix for the less efficient but consistent estimation as well. Unfortunately, this feature is not available with hausman,21 nor can it easily be computed by hand; if Stata's hausman command is used to form the statistic this way, hausman will report the correct statistic but with the wrong degrees of freedom (K instead of the correct K1). The test is, however, available via the orthog option of ivreg2 (see Section 4.4) and, furthermore, as we shall see at the very end of this section, it can be obtained via the C statistic.

21 Users beware: the sigmamore option following a robust estimation will not only fail to accomplish this, but it will also generate an invalid test statistic.

5.3 Durbin–Wu–Hausman tests as GMM tests of orthogonality conditions

Readers at this point may be wondering about the relationship between the GMM tests of orthogonality conditions implemented by the Hansen–Sargan, Basmann, and C tests as discussed in Sections 4.2–4.4, and the Durbin–Wu–Hausman tests. What, then, is the difference between the GMM C test and the Hausman specification test? In fact, as summarized by Hayashi (2000, 233–34), the Hausman specification test is a test of linear combinations of orthogonality conditions (Ruud 2000, 578–584). The resemblance is even closer once we note that the application of the Hausman test is not limited to testing the endogeneity of regressors: a Hausman test, like the C test, can be used to test a variety of combinations of the orthogonality conditions, not only those involving regressors but those involving excluded instruments as well.

When the particular linear combination of orthogonality conditions being tested is the same for the C test and for the Hausman test, the two test statistics will be numerically equivalent. One commonly encountered case in which the two statistics are exactly equivalent is in fact the one with which we began our discussion of DWH tests, namely when we want to test the endogeneity of regressors.

We can state this more precisely as follows. Denote by Le and Lc the number of total instruments in, respectively, the restricted (efficient) and the unrestricted (consistent but inefficient) regressions; Le − Lc is therefore the number of orthogonality conditions being tested. Also denote by K1c the number of endogenous regressors in the unrestricted regression. Then it can be shown that under conditional homoskedasticity, the Hausman statistic based on IV or GMM estimates will be distributed as χ² with degrees of freedom = Min[Le − Lc, K1c].22 If Le − Lc ≤ K1c, the C statistic and the Hausman statistic are numerically equivalent, and the C statistic will have Le − Lc degrees of freedom.23 If Le − Lc > K1c, the two statistics will be numerically different; in this case, the C statistic will have Le − Lc degrees of freedom, and the Hausman statistic will have K1c degrees of freedom in the conditional homoskedasticity case (and an unknown number of degrees of freedom in the conditional heteroskedasticity case). In the conditional heteroskedasticity case, the degrees of freedom will be Le − Lc if Le − Lc ≤ K1c but unknown otherwise (making the test impractical).

The intuition behind the circumstances in which the two statistics will differ follows from what is being tested. The Hausman test is a vector of contrasts test that detects changes in the coefficients of the regressors treated as endogenous in the consistent but inefficient specification. An example of when the two test statistics differ would arise when the investigator has suspicions about a large number of excluded instruments: the number of instruments being tested may then be larger than the K1c endogenous regressors in the less efficient estimation.

22 See Hausman and Taylor (1981) and Newey (1985).
23 We also need to assume, of course, that the two tests use the same estimate of the error variance σ², or the same S matrix, because the two estimators being tested, β_e and β_c, are both GMM estimators.

When the number of moment conditions being tested is larger than the number of endogenous regressors that will be affected by them, the Hausman test will have fewer degrees of freedom than the C test. This means an investigator faces a trade-off when deciding which of the two tests to use: when the two tests differ, the Hausman test is a test of linear combinations of moment conditions, and is more powerful than the C test at detecting violations of restrictions on these linear combinations, but the latter test will be able to detect other violations of moment conditions that the former test cannot. As Ruud (2000, 585) points out, one of the appealing features of the Hausman test is that its particular linear combination of moment conditions also determines the consistency of the more efficient GMM estimator.

There is an interesting semantic issue here: is there a difference between an "endogeneity test" and a test of "exogeneity" or "orthogonality"? The answer is, "not really". The DWH endogeneity tests are usually presented in textbooks as tests of "endogeneity", and the Hansen–Sargan–Basmann C tests are usually presented as tests of the "validity" or "exogeneity" of instruments—and we have adopted these conventions here—but they are all really just tests of orthogonality conditions. The reason for the different terminology relates, instead, to the circumstances in which the researcher is operating and in particular his/her starting point. Say we start with an IV estimation in which two regressors x1A and x1B are treated as endogenous and there are five excluded instruments. We suspect that we do not need to be instrumenting x1B, and so we employ the Hausman form of the DWH endogeneity test to see whether or not we can increase our set of orthogonality conditions from 5 to 6. Now consider a second researcher whose priors are somewhat less conservative: s/he starts, instead, with a specification in which x1A is still treated as endogenous but x1B is exogenous. S/he does, however, have the same suspicions about x1B, and so s/he employs a C test of its orthogonality to see whether or not s/he needs to reduce the set of orthogonality conditions from 6 to 5. The two tests are numerically the same and are testing the same hypothesis—the exogeneity of x1B—and the only difference is the starting point of the researchers.

5.4 DWH endogeneity tests in practice

There are a variety of ways of conducting a DWH endogeneity test in Stata for the standard IV case with conditional homoskedasticity. Three equivalent ways of obtaining the Durbin flavor of the Durbin–Wu–Hausman statistic (47) are:

1. Fit the less efficient but consistent model using IV, followed by the command hausman, save. Then fit the fully efficient model by OLS (or by IV if only a subset of regressors is being tested for endogeneity), followed by hausman, sigmamore.

2. Fit the fully efficient model using ivreg2, specifying the regressors to be tested in the orthog option.

3. Fit the less efficient but consistent model using ivreg, then use ivendog to conduct an endogeneity test. This program will take as its argument a varlist consisting of the subset of regressors to be tested for endogeneity; if the varlist is empty, the full set of endogenous regressors is tested.

The latter two methods are of course more convenient than the first, as the test can be done in one step.

Yet another asymptotically equivalent flavor of the DWH test is available for standard IV estimation under conditional homoskedasticity. This is the test statistic introduced by Wu (1973) (his T2), and separately shown by Hausman (1978) to be calculated straightforwardly through the use of auxiliary regressions. We will refer to it as the Wu–Hausman statistic;24 it is included in the output of ivendog. One advantage of the Wu–Hausman F statistic over the other DWH tests for IV versus OLS is that, with certain normality assumptions, it is a finite sample test exactly distributed as F (see Wu (1973) and Nakamura and Nakamura (1981)). Wu (1974)'s Monte Carlo studies also suggest that this statistic is to be preferred to the statistic using just σ²_IV.

Consider a simplified version of our basic model (1) with a single endogenous regressor x1,

y = β1 x1 + X2 β2 + u    (49)

with X2 ≡ Z2 assumed exogenous (including the constant, if one is specified) and with excluded instruments Z1 as usual. The auxiliary regression approach involves estimating the reduced form (first-stage) regression for x1:

x1 = Z1 Γ1 + X2 Γ2 + v = Z Γ + v    (50)

We are concerned with testing that x1 ⊥ u. Since by assumption each z in Z is uncorrelated with u, the first-stage regression implies that this condition is equivalent to a test of v ⊥ u. Exogeneity of the z's implies that v̂—the residuals from OLS estimation of the first-stage regression (50)—will be a consistent estimator of v. Thus, we augment (49) with v̂ and re-estimate it with OLS:

y = β1 x1 + X2 β2 + θ v̂ + ǫ    (51)

A t-test of the significance of v̂ in this auxiliary regression is then a direct test of the null hypothesis—in this context, that θ = 0.

The Wu–Hausman test may be readily generalized to multiple endogenous variables, since it merely requires the estimation of the first-stage regression for each of the endogenous variables, and augmentation of the original model with their residual series. The test statistic then becomes an F test, with numerator degrees of freedom equal to the number of included endogenous variables.

A version of the Wu–Hausman statistic for testing a subset of regressors is also available. The modified test involves estimating the first-stage regression for each of the K1B variables in X1B in order to generate a residual series. These residual series V̂_B are then used to augment the original model,

y = X1A δ + X1B λ + X2 β + V̂_B Θ + ǫ    (52)

which is then estimated via instrumental variables, with only X1A specified as included endogenous variables. The test for endogeneity of the variables in X1B is then a test for the joint significance of the Θ estimates.

24 A more detailed presentation of the test can be found in Davidson and MacKinnon (1993, 237–242).
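The single-regressor auxiliary-regression route of (49)–(51) can be sketched as follows (our own NumPy illustration with an invented data-generating process; it also cross-checks the textbook identity that the squared t statistic on v̂ equals the F test of dropping v̂):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400

# Hypothetical DGP matching (49)-(51): one endogenous x1, two excluded instruments
zs = rng.standard_normal((n, 2))
common = rng.standard_normal(n)
x1 = zs @ np.array([1.0, 0.7]) + common + rng.standard_normal(n)
u = 0.5 * common + rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + u

X2 = np.ones((n, 1))               # included exogenous: the constant
Z = np.column_stack([X2, zs])      # full instrument set

# First-stage regression (50) and its residuals vhat
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
vhat = x1 - Z @ gamma

# Augmented regression (51): y on x1, X2, and vhat
W = np.column_stack([x1, X2, vhat])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
resid = y - W @ coef
dof = n - W.shape[1]
s2 = resid @ resid / dof
cov = s2 * np.linalg.inv(W.T @ W)
t_theta = coef[-1] / np.sqrt(cov[-1, -1])  # t test of H0: theta = 0

# Equivalent F test of dropping vhat from the augmented regression
X = np.column_stack([x1, X2])
resid_r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
F_stat = (resid_r @ resid_r - resid @ resid) / (resid @ resid / dof)
```

With a single endogenous regressor the F form has one numerator degree of freedom, so t_theta ** 2 and F_stat coincide.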

As usual, rejection of that null hypothesis implies that instruments must be provided for the X1B variables. An inconvenient complication here is that an ordinary F test for the significance of Θ in this auxiliary regression will not be valid, as Davidson and MacKinnon (1993, 241–242) point out, because the unrestricted sum of squares needed for the denominator is wrong, and obtaining the correct SSR requires further steps. Only in the special case where the efficient estimator is OLS will an ordinary F test yield the correct test statistic. The auxiliary regression approach to obtaining the Wu–Hausman statistic described above has the further disadvantage of being computationally expensive and practically cumbersome when there are more than a few endogenous variables to be tested, because a residual series must be constructed separately for every endogenous variable being tested.

We have taken a different and simpler approach to programming the Wu–Hausman statistic in ivendog. From the discussion in the preceding section, we know that for tests of the endogeneity of regressors, the C statistic and the Hausman form of the DWH test are numerically equal, and when the error variance from the more efficient estimation is used, the Hausman form of the DWH test is the Durbin flavor. We also know from the discussion in Sections (4.3) and (4.4) that the C statistic is simply the difference of two Sargan statistics—one for the unrestricted/consistent estimation and one for the restricted/efficient estimation—and that we can use the estimate of the error variance from the more efficient estimation for both.

The Durbin flavor of the Durbin–Wu–Hausman statistic (47) can be written as

Durbin DWH: χ²(K1B) = Q∗ / (USSR/n)    (53)

and the Wu–Hausman F statistic can be written as

Wu–Hausman: F(K1B, n − K − K1B) = {Q∗/K1B} / {(USSR − Q∗)/(n − K − K1B)}    (54)

where Q∗ is the difference between the restricted and unrestricted sums of squares given by the auxiliary regression (51) or (52), and USSR is the sum of squared residuals from the efficient estimate of the model.25 Q∗ can also be interpreted as the difference between the sums of squares of the second-stage estimation of the efficient model with and without the residual series, with the fitted values for the variables X1A maintained as endogenous. If the efficient model is OLS, then the second-stage estimation is simply OLS augmented by the residual series.

More precisely, we can see from (53) and (54) that the Wu–Hausman F statistic can be easily calculated from the same quantities needed for the Durbin DWH statistic. Thus, the Wu–Hausman F statistic in (54) does not need to be calculated using the traditional auxiliary regression method, with all the first-stage regressions and generation of residual series as described above. Instead, it can be calculated using only three additional regressions: one to estimate the restricted/efficient model, and two artificial regressions to obtain the two Sargan statistics. Finally, we can write both statistics explicitly in terms of these quantities.

25 See Wu (1973) or Nakamura and Nakamura (1981); see also, however, Davidson and MacKinnon (1993, Chapter 7).
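The two computational routes can be compared directly in a NumPy sketch (our own illustration, invented data-generating process, not the ivendog code itself): testing the single endogenous regressor x1, so that the efficient model is OLS, it forms Q∗ and USSR from the augmented regression (51) and checks that Q∗ also equals the difference of the two Sargan numerators.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 500

# Hypothetical DGP: x1 endogenous, three excluded instruments
zs = rng.standard_normal((n, 3))
common = rng.standard_normal(n)
x1 = zs @ np.array([1.0, 0.5, 0.5]) + common + rng.standard_normal(n)
u = 0.5 * common + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), zs])
y = X @ np.array([1.0, 2.0]) + u
K, K1B = X.shape[1], 1

def ols_resid(W, v):
    """Residuals from an OLS regression of v on W."""
    return v - W @ np.linalg.lstsq(W, v, rcond=None)[0]

# Restricted/efficient model (OLS, x1 treated as exogenous)
u_e = ols_resid(X, y)
USSR = u_e @ u_e

# Auxiliary route: augment with the first-stage residual vhat, eq. (51)
vhat = ols_resid(Z, x1)
u_aug = ols_resid(np.column_stack([X, vhat]), y)
Qstar = USSR - u_aug @ u_aug

durbin = Qstar / (USSR / n)                                      # eq. (53)
wu_hausman_F = (Qstar / K1B) / ((USSR - Qstar) / (n - K - K1B))  # eq. (54)

# Sargan-difference route: u_e'P_[Z,x1] u_e - u_c'P_Z u_c
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_iv = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
u_c = y - X @ b_iv                       # unrestricted/consistent residuals
Ze = np.column_stack([Z, x1])            # instruments augmented by x1
PZe = Ze @ np.linalg.solve(Ze.T @ Ze, Ze.T)
Qstar_sargan = u_e @ PZe @ u_e - u_c @ PZ @ u_c
```

The agreement of Qstar and Qstar_sargan is exactly the equivalence that lets ivendog replace the auxiliary-regression method with three additional regressions.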

Durbin DWH: χ²(K1B) = (u_e′P_{Z,X1B} u_e − u_c′P_Z u_c) / (u_e′u_e/n)    (55)

W–H: F(K1B, n − K − K1B) = {(u_e′P_{Z,X1B} u_e − u_c′P_Z u_c)/K1B} / [{u_e′u_e − (u_e′P_{Z,X1B} u_e − u_c′P_Z u_c)}/(n − K − K1B)]    (56)

where u_e and u_c refer to the residuals from the restricted/efficient and the unrestricted/consistent estimations, respectively, and P_{Z,X1B} is the projection matrix of the instruments Z augmented by the regressors X1B whose endogeneity is being tested.

A special case worth noting is when the unrestricted/consistent estimation is exactly identified. In that case, the Sargan statistic for that equation is zero, and hence u_c′P_Z u_c = 0. It is easy to see from the above that the Durbin flavor of the Durbin–Wu–Hausman χ² test statistic then becomes identical to the Sargan statistic (43) for the restricted/efficient estimation, and the Wu–Hausman F statistic becomes identical to Basmann's F statistic (44).26

Whereas we have available a large smorgasbord of alternative but asymptotically equivalent tests of endogeneity in the case of IV estimation under conditional homoskedasticity, there is much less choice when estimating either IV with a robust covariance matrix or efficient GMM. As noted above, the use of hausman to test regressors for endogeneity in the context of heteroskedasticity-robust or efficient GMM estimation will sometimes generate negative test statistics, and if Le − Lc > K1c, there is the additional problem that the degrees of freedom of the Hausman statistic are unknown, and the degrees of freedom printed out for the statistic will be wrong. All these problems can be avoided, and a valid endogeneity test statistic obtained, simply through the use of the C statistic: estimate the restricted/fully efficient model with ivreg2, specifying the regressors to test for endogeneity in the orthog option.

26 This is another way of illustrating that the estimate of the error variance used in Basmann's statistic comes from an estimation without any overidentifying restrictions being imposed; see the discussion of (44) above.

6 Syntax diagrams

ivreg2 depvar [varlist1] (varlist2=varlist_iv) [weight] [if exp] [in range]
    [, gmm orthog(#) small level(#) hascons noconstant robust cluster(varname)
    first ffirst noheader nofooter eform(string) depname(varname) mse1 plus]

overid [, chi2 dfr f all]

ivhettest [varlist] [, ivlev ivsq fitlev fitsq ph phnorm nr2 bpg all]

ivendog [varlist]

2d ed. J. 1993. Econometrica 51: 751–763. Journal of the American Statistical Association 55(292): 650–659. Weisberg. 1995. Pagan. 1983. and J. or help ivendog. Bound. Breusch. On ﬁnite sample distributions of generalized classical linear identiﬁability test statistics. C. A simple test for heteroskedasticity and random coeﬃcient variation. S. 1984. Basmann. Cook. Barbara Sianesi. Cambridge: Cambridge University Press. Estimation and Inference in Econometrics. and seminar participants at Heriot–Watt University for their constructive suggestions. and A. Journal of Econometrics 18: 5–46. Oxford Bulletin of Economics and Statistics 49: 431–434. R. 1995. Baker. M. G. and members of the Stata user community who helped to test the software. E. G. help ivhettest. and M. R. 7 Acknowledgments We are grateful to Jeﬀrey Wooldridge. Huizinga. Jaeger. J. Econometrica 47: 1287–1294. Journal of the American Statistical Association 90: 443–450.. help overid. R. 8 References Ahn. and S. Davidson. 1960. D. and D. Biometrika 70: 1–10. New York: Oxford University Press. Robust GMM Tests for Model Speciﬁcation. Arthur Lewbel. . and R. T. D. More eﬃcient estimation in the presence of heteroskedasticity of unknown form. 1987. Any remaining errors are our own.. Fumio Hayashi. Arellano. R. J. Instrumental Variables. We also thank an associate editor of the Stata Journal. Cragg. Journal of Econometrics 21: 333– 355. Cumby. Turkington. type help ivreg2. Multivariate regression models for panel data. A. Chamberlain. Diagnostics for heteroscedasticity in regression. Computing robust standard errors for within–groups estimators. R. MacKinnon. Problems with instrumental variable estimation when the correlation between the instruments and the endogenous explanatory variables is weak. 1982. S. Arizona State University (Working Paper). R. 1983. Obstfeld.28 Instrumental variables and GMM: Estimation and testing For a description of the options accepted by these commands. 1983. 

About the Authors

Christopher F. Baum is an associate professor of economics at Boston College. He is an associate editor of Computational Economics and The Stata Journal, and serves on the Advisory Council of the Society for Computational Economics. Baum founded and manages the Boston College Statistical Software Components (SSC) archive at RePEc (http://repec.org). His recent research has focused on the effects of uncertainty on international trade flows, bank lending behavior, and firms' cash holdings.

Mark E. Schaffer is professor of economics and Director of the Centre for Economic Reform and Transformation (CERT) at Heriot–Watt University, Edinburgh, Scotland. He is also a Research Fellow at the Centre for Economic Policy Research (CEPR), the Institute for the Study of Labor (IZA), and the William Davidson Institute. His research interests include various aspects of firm and household behavior in the transition countries of Eastern Europe, the former USSR, and East Asia.

Steven Stillman is a Senior Economic Researcher in the Labour Market Policy Group of the New Zealand Department of Labour. He is also an affiliated Research Fellow at the Institute for the Study of Labor (IZA) and the William Davidson Institute. His current research examines the effect of public policy and institutions on various dynamic aspects of household well-being in New Zealand, Russia, and the United States.
