
# Identification, weak instruments and statistical inference in econometrics*

Jean-Marie Dufour†
Université de Montréal

First version: May 2003
This version: July 2003

* This paper is based on the author's Presidential Address to the Canadian Economics Association given on May 31, 2003, at Carleton University (Ottawa). The author thanks Bryan Campbell, Tarek Jouini, Lynda Khalaf, William McCausland, Nour Meddahi, Benoît Perron and Mohamed Taamouti for several useful comments. This work was supported by the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Alexander-von-Humboldt Foundation (Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Canada Council for the Arts (Killam Fellowship), the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Fonds de recherche sur la société et la culture (Québec), and the Fonds de recherche sur la nature et les technologies (Québec).

† Canada Research Chair Holder (Econometrics). Centre interuniversitaire de recherche en analyse des organisations (CIRANO), Centre interuniversitaire de recherche en économie quantitative (CIREQ), and Département de sciences économiques, Université de Montréal. Mailing address: Département de sciences économiques, Université de Montréal, C.P. 6128 succursale Centre-ville, Montréal, Québec, Canada H3C 3J7. TEL: 1 514 343 2400; FAX: 1 514 343 5831; e-mail: jean.marie.dufour@umontreal.ca. Web page: http://www.fas.umontreal.ca/SCECO/Dufour

ABSTRACT

We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions — a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, and Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria for assessing alternative procedures.

Key-words: hypothesis testing; confidence set; confidence interval; identification; testability; asymptotic theory; exact inference; pivotal function; nonparametric model; Bahadur-Savage; heteroskedasticity; serial dependence; unit root; simultaneous equations; structural model; instrumental variable; weak instrument; weak identification; simultaneous inference; projection; split-sample; conditional test; Monte Carlo test; bootstrap.

JEL classification numbers: C1, C12, C14, C15, C3, C5.

RÉSUMÉ

We analyze the inference problems associated with identification and testability in econometrics, underlining the similarity between the two questions. After a short review of the required statistical notions, we study in turn inference in nonparametric models as well as recent results on weakly identified structural models (or weak instruments). We observe that many hypotheses, for which tests are regularly proposed, are not in fact testable, while several frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-posed statistical problems and are often associated with an ill-advised use of asymptotic distributional results. Concerning nonparametric hypotheses, we analyze three basic problems for which such difficulties appear: (1) testing an hypothesis about a moment under overly weak restrictions on the form of the distribution; (2) inference under heteroskedasticity of unspecified form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress the importance of using pivotal functions — a condition not satisfied by the usual Wald-type methods based on standard errors — and we review recent developments in this area, with emphasis on the construction of valid tests and confidence regions. The techniques considered include the various proposed statistics, the use of bounds, sample splitting, projection techniques, conditioning, and Monte Carlo tests. Among the criteria used to evaluate the procedures, we stress the possibility of supplying a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for the endogenous explanatory variables of the model.

Key words: hypothesis testing; confidence region; confidence interval; identification; testability; asymptotic theory; exact inference; pivotal function; nonparametric model; Bahadur-Savage; heteroskedasticity; serial dependence; unit root; simultaneous equations; structural model; instrumental variable; weak instrument; simultaneous inference; projection; sample splitting; conditional test; Monte Carlo test; bootstrap.

JEL classification: C1, C12, C14, C15, C3, C5.

Contents

List of Propositions and Theorems

1. Introduction
2. Models
3. Statistical notions
   3.1. Hypotheses
   3.2. Test level and size
   3.3. Confidence sets and pivots
   3.4. Testability and identification
4. Testability, nonparametric models and asymptotic methods
   4.1. Procedures robust to nonnormality
   4.2. Procedures robust to heteroskedasticity of unknown form
   4.3. Procedures robust to autocorrelation of arbitrary form
5. Structural models and weak instruments
   5.1. Standard simultaneous equations model
   5.2. Statistical problems associated with weak instruments
   5.3. Characterization of valid tests and confidence sets
6. Approaches to weak instrument problems
   6.1. Anderson-Rubin statistic
   6.2. Projections and inference on parameter subsets
   6.3. Alternatives to the AR procedure
7. Extensions
   7.1. Multivariate regression, simulation-based inference and nonnormal errors
   7.2. Nonlinear models
8. Conclusion

List of Propositions and Theorems

4.1 Theorem: Mean non-testability in nonparametric models
4.2 Theorem: Characterization of heteroskedasticity robust tests
4.4 Theorem: Unit root non-testability in nonparametric models

1. Introduction

The main objective of econometrics is to supply methods for analyzing economic data, building models, and assessing alternative theories. Over the last 25 years, econometric research has led to important developments in many areas, such as: (1) new fields of applications linked to the availability of new data: financial data, micro-data, panels, qualitative variables; (2) new models: multivariate time series models, GARCH-type processes; (3) a greater ability to estimate nonlinear models which require an important computational capacity; (4) methods based on simulation: bootstrap, indirect inference, Markov chain Monte Carlo techniques; (5) methods based on weak distributional assumptions: nonparametric methods, asymptotic distributions based on "weak regularity conditions"; (6) the discovery of various nonregular problems which require nonstandard distributional theories, such as unit roots and unidentified (or weakly identified) models.

An important component of this work is the development of procedures for testing hypotheses (or models). Indeed, a view widely held by both scientists and philosophers is that testability, or the formulation of testable hypotheses, constitutes a central feature of scientific activity — a view we share. With the exception of mathematics, it is not clear that a discipline should be viewed as scientific if it does not lead to empirically testable hypotheses. But this requirement leaves open the question of formulating operational procedures for testing models and theories. To date, the only coherent — or, at least, the only well developed — set of methods are those supplied by statistical and econometric theory.

Last year, on the same occasion, MacKinnon (2002) discussed the use of simulation-based inference methods in econometrics, specifically bootstrapping, as a way of getting more reliable tests and confidence sets. In view of the importance of the issue, this paper also considers questions associated with the development of reliable inference procedures in econometrics. But our exposition will be, in a way, more specialized, and in another way, more general — and critical. Specifically, we shall focus on general statistical issues raised by identification in econometric models, and more specifically on weak instruments in the context of structural models [e.g., simultaneous equations models (SEM)]. We will find it useful to bring together two separate streams of literature: namely, results (from mathematical statistics and econometrics) on testability in nonparametric models, and the recent econometric research on weak instruments.[1] In particular, we shall emphasize that identification problems arise in both literatures and have similar consequences for econometric methodology. Further, the literature on nonparametric testability sheds light on various econometric problems and their solutions.

Simultaneous equations models (SEM) are related in a natural way to the concept of equilibrium postulated by economic theory, both in microeconomics and macroeconomics. So it is not surprising that SEM were introduced and most often employed in the analysis of economic data. Methods for estimating and testing such models constitute a hallmark of econometric theory and represent one of its most remarkable achievements. The problems involved are difficult, raising among various issues the possibility of observational equivalence between alternative parameter values (non-identification) and the use of instrumental variables (IV). Further, the finite-sample distributional theory of estimators and test statistics is very complex, so inference is typically based on large-sample approximations.[2]

[1] By a nonparametric model (or hypothesis), we mean a set of possible data distributions such that a distribution [e.g., the "true" distribution] cannot be singled out by fixing a finite number of parameter values.

IV methods have become a routine part of econometric analysis and, despite a lot of loose ends (often hidden by asymptotic distributional theory), the topic of SEM was dormant until a few years ago. Roughly speaking, an instrument should have two basic properties: first, it should be independent of (or, at least, uncorrelated with) the disturbance term in the equation of interest (exogeneity); second, it should be correlated with the included endogenous explanatory variables for which it is supposed to serve as an instrument (relevance). The exogeneity requirement has been well known from the very beginning of IV methods. The second one was also known from the theory of identification, but its practical importance was not well appreciated and often hidden from attention by lists of instruments relegated to footnotes (if not simply absent) in research papers. It returned to center stage with the discovery of so-called weak instruments, which can be interpreted as instruments with little relevance (i.e., weakly correlated with endogenous explanatory variables). Weak instruments lead to poor performance of standard econometric procedures, and cases where they have pernicious effects may be difficult to detect.[3] Interest in the problem also goes far beyond IV regressions and SEM, because it underscores the pitfalls in using large-sample approximations, as well as the importance of going back to basic statistical theory when developing econometric methods.

A parameter (or a parameter vector) in a model is not identified when it is not possible to distinguish between alternative values of the parameter. In parametric models, this is typically interpreted by stating that the postulated distribution of the data — as a function of the parameter vector (the likelihood function) — can be the same for different values of the parameter vector.[4] An important consequence of this sort of situation is a statistical impossibility: we cannot design a data-based procedure for distinguishing between equivalent parameter values (unless additional information is introduced). In particular, no reasonable test can be produced.[5] In nonparametric models, identification is more difficult to characterize because a likelihood function (involving a finite number of parameters) is not available and parameters are often introduced through more abstract techniques (e.g., functionals of distribution functions). But the central problem is the same: can we distinguish between alternative values of the parameter? So, quite generally, an identification problem can be viewed as a special form of non-testability. Specifically,

• identification involves the possibility of distinguishing different parameter values on the basis of the corresponding data distributions, while

• testability refers to the possibility of designing procedures that can discriminate between subsets of parameter values.

[2] For reviews, see Phillips (1983) and Taylor (1983).

[3] Early papers which called attention to the problem include: Nelson and Startz (1990a, 1990b), Buse (1992), Choi and Phillips (1992), Maddala and Jeong (1992), and Bound, Jaeger, and Baker (1993, 1995).

[4] For general expositions of the theory of identification in econometrics and statistics, the reader may consult Rothenberg (1971), Fisher (1976), Hsiao (1983), Prakasa Rao (1992), Bekker, Merckens, and Wansbeek (1994) and Manski (1995, 2003).

[5] By a reasonable test, we mean here a test that both satisfies a level constraint and may have power superior to the level when the tested hypothesis (the null hypothesis) does not hold. This will be discussed in greater detail below.

Alternatively, a problem of non-testability can be viewed as a form of non-identification (or underidentification). These problems are closely related. Furthermore, it is well known that one can create a non-identified model by introducing redundant parameters, and conversely identification problems can be eliminated by transforming the parameter space (e.g., by reducing the number of parameters). Problems of non-identification are associated with bad parameterizations, i.e., inappropriate choices of parameter representations. We will see below that the same remark applies quite generally to non-testability problems, especially in nonparametric setups.

In this paper, we pursue two main objectives: first, we analyze the statistical problems associated with non-identification within the broader context of testability; second, we review the inferential issues linked to the possible presence of weak instruments in structural models. More precisely, regarding the issue of testability, the following points will be emphasized:

1. many models and hypotheses are formulated in ways that make them fundamentally non-testable; in particular, this tends to be the case in nonparametric setups;

2. such difficulties arise in basic, apparently well-defined problems, such as: (a) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (b) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (c) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large;

3. some parameters tend to be non-testable (badly identified) in nonparametric models while others are not; in particular, non-testability easily occurs for moments (e.g., means, variances) while it does not for quantiles (e.g., medians); from this viewpoint, moments are not a good way of representing the properties of distributions in nonparametric setups, while quantiles are;

4. these phenomena underscore parametric nonseparability problems: statements about a given parameter (often interpreted as the parameter of interest) are not empirically meaningful without supplying information about other parameters (often called nuisance parameters); but hypotheses that set both the parameter of interest and some nuisance parameters may well be testable in such circumstances, so that the development of appropriate inference procedures should start from a joint approach;

5. to the extent that asymptotic distributional theory is viewed as a way of producing statistical methods which are valid under "weak" distributional assumptions, it is fundamentally misleading because, under nonparametric assumptions, such approximations are arbitrarily bad in finite samples.

Concerning weak instruments, we will review the associated problems and proposed solutions, with an emphasis on finite-sample properties and the development of tests and confidence sets which are robust to the presence of weak instruments. In particular, the following points will be stressed:

1. in accordance with basic statistical theory, one should always look for pivots as the fundamental ingredient for building tests and confidence sets; this principle appears to be especially important when identification problems are present;

2. parametric nonseparability arises in striking ways when some parameters may not be identified, so that proper pivots may easily involve many more parameters than the parameter of interest; this also indicates that the common distinction between parameters of interest and nuisance parameters can be quite arbitrary, if not misleading;

3. important additional criteria for evaluating procedures in such contexts include various forms of invariance (or robustness), such as: (a) robustness to weak instruments; (b) robustness to instrument exclusion; (c) robustness to the specification of the model for the endogenous explanatory variables in the equation(s) of interest;

4. weak instrument problems underscore in a striking way the limitations of large-sample arguments for deriving and evaluating inference procedures;

5. very few informative pivotal functions have been proposed in the context of simultaneous equations models;

6. the early statistic proposed by Anderson and Rubin (1949, AR) constitutes one of the (very rare) truly pivotal functions proposed for SEM; furthermore, it satisfies all the invariance properties listed above, so that it may reasonably be viewed as a fundamental building block for developing reliable inference procedures in the presence of weak instruments;

7. a fairly complete set of inference procedures that allow one to produce tests and confidence sets for all model parameters can be obtained through projection techniques;

8. various extensions and improvements over the AR method are possible, especially in improving power; however, it is important to note that these often come at the expense of using large-sample approximations or giving up robustness.

The literature on weak instruments is growing rapidly, and we cannot provide here a complete review. In particular, we will not discuss in any detail results on estimation, the detection of weak instruments, or asymptotic theory in this context. For that purpose, we refer the reader to the excellent survey recently published by Stock, Wright, and Yogo (2002).

The paper is organized as follows. In the next two sections, we review succinctly some basic notions concerning models (section 2) and statistical theory (section 3), which are important for our discussion. In section 4, we study testability problems in nonparametric models. In section 5, we review the statistical difficulties associated with weak instruments. In section 6, we examine a number of possible solutions in the context of linear SEM, while extensions to nonlinear or non-Gaussian models are considered in section 7. We conclude in section 8.

2. Models

The purpose of econometric analysis is to develop mathematical representations of data, which we call models or hypotheses (models subject to restrictions). An hypothesis should have two basic features.

1. It must restrict the expected behavior of observations, i.e., be informative. A non-restrictive hypothesis says nothing and, consequently, does not teach us anything: it is empirically empty, void of empirical content. The more restrictive a model is, the more informative it is, and the more interesting it is.

2. It must be compatible with available data; ideally, we would like it to be true.

However, these two criteria are not always compatible:

1. the information criterion suggests the use of parsimonious models that usually take the form of parametric models based on strong assumptions; note that the information criterion is emphasized by an influential view in the philosophy of science which stresses falsifiability as a criterion for the scientific character of a theory [Popper (1968)];

2. in contrast, compatibility with observed data is most easily satisfied by vague models which impose few restrictions; vague models may take the form of parametric models with a large number of free parameters or nonparametric models which involve an infinite set of free parameters and thus allow for weak assumptions.

Models can be classified as being either deterministic or stochastic. Deterministic models, which claim to make arbitrarily precise predictions, are highly falsifiable but always inconsistent with observed data. Accordingly, most models used in econometrics are stochastic. Such models are unverifiable: as with any theory that makes an indefinite number of predictions, we can never be sure that the model will not be put in jeopardy by new data. Moreover, they are logically unfalsifiable: in contrast with deterministic models, a probabilistic model is usually logically compatible with all possible observation vectors.

Given these facts, it is clear that any criterion for assessing whether an hypothesis is acceptable must involve a conventional aspect. The purpose of hypothesis testing theory is to supply a coherent framework for accepting or rejecting probabilistic hypotheses. It is a probabilistic adaptation of the falsification principle.[6]

3. Statistical notions

In this section, we review succinctly basic statistical notions which are essential for understanding the rest of our discussion. The general outlook follows modern statistical testing theory, derived from the Neyman-Pearson approach and described in standard textbooks, such as Lehmann (1986).

3.1. Hypotheses

Consider an observational experiment whose result can be represented by a vector of observations

    X^(n) = (X_1, ..., X_n)    (3.1)

[6] For further discussion on the issues discussed in this section, the reader may consult Dufour (2000).

where X_i takes real values, and let

    F̄(x) = F̄(x_1, ..., x_n) = P[X_1 ≤ x_1, ..., X_n ≤ x_n]    (3.2)

be its distribution, where x = (x_1, ..., x_n). We denote by F_n the set of possible distribution functions on R^n [F̄ ∈ F_n].

For various reasons, we prefer to represent distributions in terms of parameters. There are two ways of introducing parameters in a model. The first is to define a function from a space of probability distributions to a vector in some Euclidean space:

    θ : F_n → R^p .    (3.3)

Examples of such parameters include: the moments of a distribution (mean, variance, kurtosis, etc.), its quantiles (median, quartiles, etc.). Such functions are also called functionals. The second approach is to define a family of distribution functions which are indexed by a parameter vector θ:

    F(x) = F_0(x | θ)    (3.4)

where F_0 is a distribution function with a specific form. For example, if F_0(x | θ) represents a Gaussian distribution with mean µ and variance σ², we have θ = (µ, σ²).

A model is parametric if the distribution of the data is specified up to a finite number of (scalar) parameters. Otherwise, it is nonparametric. An hypothesis H_0 on X^(n) is an assertion

    H_0 : F̄ ∈ H_0 ,    (3.5)

where H_0 is a subset of F_n, the set of all possible distributions. The set H_0 may contain a single distribution (simple hypothesis) or several distributions (composite hypothesis). In particular, if we can write θ = (θ_1, θ_2), H_0 often takes the following form:

    H_0 ≡ {F(·) : F(x) = F_0(x | θ_1, θ_2) and θ_1 = θ_1^0} .    (3.6)

We usually abbreviate this as:

    H_0 : θ_1 = θ_1^0 .    (3.7)

In such a case, we call θ_1 the parameter of interest, and θ_2 a nuisance parameter: the parameter of interest is set by H_0 but the nuisance parameter remains unknown. H_0 may be interpreted as follows: there is at least one distribution in H_0 that can be viewed as a representation compatible with the observed "behavior" of X^(n). Then we can say that:

    H_0 is acceptable ⟺ [(∃F ∈ H_0) F is acceptable]    (3.8)

or, equivalently,

    H_0 is unacceptable ⟺ [(∀F ∈ H_0) F is unacceptable] .    (3.9)

It is important to note here that showing that H_0 is unacceptable requires one to show that all distributions in H_0 are incompatible with the observed data.

3.2. Test level and size

A test for H_0 is a rule by which one decides to reject or accept the hypothesis (or to view it as incompatible with the data). It usually takes the form:

    reject H_0 if S_n(X_1, ..., X_n) > c ,
    do not reject H_0 if S_n(X_1, ..., X_n) ≤ c .    (3.10)

The test has level α iff

    P_F[Rejecting H_0] ≤ α for all F ∈ H_0    (3.11)

or, equivalently,

    sup_{F ∈ H_0} P_F[Rejecting H_0] ≤ α ,    (3.12)

where P_F[·] is the function (probability measure) giving the probability of an event when the data distribution function is F. The test has size α if

    sup_{F ∈ H_0} P_F[Rejecting H_0] = α .    (3.13)

H_0 is testable if we can find a finite number c that satisfies the level restriction. Probabilities of rejecting H_0 for distributions outside H_0 (i.e., for F ∉ H_0) define the power function of the test.[7] Power describes the ability of a test to detect a "false" hypothesis. Alternative tests are typically assessed by comparing their powers: between two tests with the same level, the one with the highest power against a given alternative distribution F ∉ H_0 is deemed preferable (at least, under this particular alternative). Among tests with the same level, we typically like to have a test with the highest possible power against "alternatives of interest".
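The level condition (3.12) and the notion of power can both be checked by simulation for any given test. The following sketch is purely illustrative (the test, sample size, and parameter values are our own choices, not from the paper): it estimates the rejection frequency of the usual two-sided t-test of H_0 : µ = 0 at nominal level α = 0.05 with i.i.d. Gaussian data. Under the null the frequency should be close to α; under an alternative (µ = 0.5) it estimates power.

```python
import math
import random

random.seed(42)

def t_stat(x, mu0):
    """Student t-statistic for H0: mu = mu0."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)

def rejection_rate(mu, sigma, n=20, reps=5000, crit=2.093):
    """Fraction of samples where |t| exceeds crit (2.093 = t(19) 5% two-sided value)."""
    rejections = 0
    for _ in range(reps):
        x = [random.gauss(mu, sigma) for _ in range(n)]
        if abs(t_stat(x, 0.0)) > crit:
            rejections += 1
    return rejections / reps

size = rejection_rate(mu=0.0, sigma=1.0)   # under H0: close to 0.05
power = rejection_rate(mu=0.5, sigma=1.0)  # under the alternative: well above the level
print(size, power)
```

Because the t-statistic here has a distribution free of nuisance parameters under the null (a point developed in section 3.3), the same critical value controls the level for every σ.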

As the set H_0 gets larger, the test procedure must satisfy a bigger set of constraints: the larger is the set of distributions compatible with a null hypothesis, the stronger are the restrictions on the test procedure. In other words, the less restrictive an hypothesis is, the more restricted will be the corresponding test procedure. It is easy to understand that imposing a large set of restrictions on a test procedure may reduce its power against specific alternatives. There may be a point where the restrictions are no longer implementable, in the sense that no procedure which has some power can satisfy the level constraint: H_0 is non-testable. In such a case, we have an ill-defined test problem.

In a framework such as the one in (3.6), where we distinguish between a parameter of interest θ_1 and a nuisance parameter θ_2, this is typically due to heavy dependence of the distribution of S_n on the nuisance parameter θ_2. If the latter is specified, we may be able to find a (finite) critical value c = c(α, θ_2) that satisfies the level constraint (3.11). But, in ill-defined problems, c(α, θ_2) depends heavily on θ_2, so that it is not possible to find a useful (finite) critical value for testing H_0, i.e. sup_{θ_2} c(α, θ_2) = ∞. Besides, even if this is the case, this does not imply that an hypothesis that would fix both θ_1 and θ_2 is not testable: the hypothesis H_0 : (θ_1, θ_2) = (θ_1^0, θ_2^0) may be perfectly testable. But only a complete specification of the vector (θ_1, θ_2) does allow one to interpret the values taken by the test statistic S_n (nonseparability).

[7] More formally, the power function can be defined as the function: P(F) = P_F[Rejecting H_0] for F ∈ H_1 \ H_0, where H_1 is an appropriate subset of the set of all possible distributions F_n. Sometimes, it is also defined on the set H_1 ∪ H_0, in which case it should satisfy the level constraint for F ∈ H_0.
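A minimal numerical illustration of this dependence (our own sketch, not from the paper): suppose one tests H_0 : µ = 0 with the non-studentized statistic |√n X̄_n| when X_i ~ N(µ, σ²). Its null distribution is N(0, σ²), so the 5% critical value is roughly 1.96 σ: it grows without bound as the nuisance parameter σ increases, and sup over σ of c(α, σ) is infinite, whereas fixing both µ and σ would make the problem well defined.

```python
import math
import random

random.seed(0)

def empirical_critical_value(sigma, n=20, reps=4000, alpha=0.05):
    """Simulate the null distribution of |sqrt(n) * Xbar_n| when
    X_i ~ N(0, sigma^2) and return its empirical 1 - alpha quantile."""
    stats = []
    for _ in range(reps):
        xbar = sum(random.gauss(0.0, sigma) for _ in range(n)) / n
        stats.append(abs(math.sqrt(n) * xbar))
    stats.sort()
    return stats[int((1 - alpha) * reps)]

# The critical value scales with sigma (theoretically about 1.96 * sigma),
# so no single finite c can satisfy the level constraint for all sigma.
c1 = empirical_critical_value(1.0)
c100 = empirical_critical_value(100.0)
print(c1, c100)
```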

3.3. Confidence sets and pivots

If we consider an hypothesis of the form

    H_0(θ_1^0) : θ_1 = θ_1^0    (3.14)

and if we can build a different test S_n(θ_1^0; X_1, ..., X_n) for each possible value of θ_1^0, we can determine the set of values that can be viewed as compatible with the data according to the tests considered:

    C = {θ_1^0 : S_n(θ_1^0; X_1, ..., X_n) ≤ c(θ_1^0)} .    (3.15)

If

    P_F[Rejecting H_0(θ_1^0)] ≤ α for all F ∈ H(F_0, θ_1^0) ,    (3.16)

we have

    inf_{θ_1, θ_2} P[θ_1 ∈ C] ≥ 1 − α .    (3.17)

C is a confidence set with level 1 − α for θ_1. The set C covers the "true" parameter value θ_1 with probability at least 1 − α. The minimal probability of covering the true value of θ_1, i.e. inf_{θ_1, θ_2} P[θ_1 ∈ C], is called the size of the confidence set.
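The inversion recipe (3.15) can be carried out numerically: scan candidate values θ_1^0 and keep those the corresponding test does not reject. As an illustrative sketch (assuming the usual t-test for the mean of i.i.d. Gaussian data; the sample and grid below are our own), the endpoints of the accepted set reproduce, up to grid resolution, the classical interval X̄_n ± c · s_X/√n.

```python
import math

def t_stat(x, mu0):
    """Student t-statistic for H0: mu = mu0."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return math.sqrt(n) * (xbar - mu0) / s

def confidence_set(x, crit, grid):
    """Invert the test: keep every candidate value mu0 that is not rejected."""
    return [mu0 for mu0 in grid if abs(t_stat(x, mu0)) <= crit]

x = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 1.1]
crit = 2.262  # t(9) two-sided 5% critical value
grid = [i / 1000 for i in range(0, 2001)]  # candidate values in [0, 2]
accepted = confidence_set(x, crit, grid)
# The accepted set is an interval approximating Xbar +/- crit * s / sqrt(n)
print(min(accepted), max(accepted))
```

For this sample, X̄_n = 1.1 and the classical 95% interval is about [0.915, 1.285], which the inversion recovers. The same recipe works for any family of tests, including the identification-robust tests discussed in section 6, where the resulting sets need not be intervals.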

In practice, confidence regions (or confidence intervals) were made possible by the discovery of pivotal functions (or pivots): a pivot for θ_1 is a function S_n(θ_1; X_1, ..., X_n) whose distribution does not depend on unknown parameters (nuisance parameters); in particular, the distribution does not depend on θ_2. More generally, the function S_n(θ_1; X_1, ..., X_n) is boundedly pivotal if its distribution function may depend on θ but is bounded over the parameter space [see Dufour (1997)].

When we have a pivotal function (or a boundedly pivotal function), we can ﬁnd a point c such that:

P[S

n

(θ

1

; X

1

, . . . , X

n

) ≥ c] ≤ α , ∀θ

1

. (3.18)

For example, if X

1

, . . . , X

n

i.i.d.

∼ N[µ, σ

2

], the t-statistic

t

n

(µ) =

√

n(

¯

X

n

−µ)/s

X

(3.19)

where

¯

X

n

=

n

i=1

X

i

/n and s

X

=

n

i=1

(X

i

−

¯

X

n

)/(n − 1), follows a Student t(n − 1) distribution

which does not depend on the unknown values of µ and σ; hence, it is a pivot. By contrast,

√

n(

¯

X

n

−

µ) is not a pivot because its distribution depends on σ. More generally, in the classical linear model

with several regressors, the t statistics for individual coefﬁcients [say, t(β

i

) =

√

n(

ˆ

β

i

− β

i

)/ˆ σ

ˆ

β

i

]

8

constitute pivots because their distributions do not depend on unknown nuisance parameters; in

particular, the values of the other regression coefﬁcients disappear from the distribution.
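As a quick numerical illustration of pivotality, the following simulation (our own sketch, not from the paper; the helper names and parameter values are illustrative assumptions) checks that the null rejection frequency of the t test stays near 5% whatever the values of µ and σ:

```python
import math
import random

def t_stat(sample, mu):
    """tₙ(µ) = √n (X̄ₙ − µ) / s_X, with s_X² the usual unbiased variance."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)

def rejection_rate(mu, sigma, n=20, reps=5000, c=2.093, seed=0):
    """Empirical P[|tₙ(µ)| > c] when the data are i.i.d. N(µ, σ²).
    c = 2.093 is the two-sided 5% critical point of t(19)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_stat(sample, mu)) > c:
            count += 1
    return count / reps

# Pivotality: the rejection frequency is ~0.05 for every (µ, σ) pair.
for mu, sigma in [(0.0, 1.0), (10.0, 0.1), (-3.0, 25.0)]:
    print(mu, sigma, round(rejection_rate(mu, sigma), 3))
```

By contrast, repeating the exercise with the unstudentized statistic √n(X̄ₙ − µ) and a fixed cutoff would give rejection rates that change with σ, which is exactly what the text means by “not a pivot”.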

3.4. Testability and identification

When formulating and trying to solve test problems, two types of basic difficulties can arise. First, there is no valid test that satisfies reasonable properties [such as depending upon the data]: in such a case, we have a non-testable hypothesis, an empirically empty hypothesis. Second, the proposed statistic cannot be pivotal for the model considered: its distribution varies too much under the null hypothesis to determine a finite critical point satisfying the level restriction (3.18).

If an hypothesis is non-testable, we are not able to design a reasonable procedure for deciding whether it holds (without the introduction of additional data or information). This difficulty is closely related to the concept of identification in econometrics. A parameter θ is identifiable iff

    θ(F₁) ≠ θ(F₂) ⟹ F₁ ≠ F₂ .    (3.20)

For θ₁ ≠ θ₂, we can, in principle, design a procedure for deciding whether θ = θ₁ or θ = θ₂. The values of θ are testable. More generally, a parametric transformation g(θ) is identifiable iff

    g[θ(F₁)] ≠ g[θ(F₂)] ⟹ F₁ ≠ F₂ .    (3.21)

Intuitively, these definitions mean that different values of the parameter imply different distributions of the data, so that we may expect to be able to “tell” the difference by looking at the data. This is certainly the case when a unique distribution is associated with each parameter value [for example, we may use the Neyman-Pearson likelihood ratio test to make the decision], but this may not be the case when a parameter covers several distributions. In the next section, we examine several cases where this happens.

4. Testability, nonparametric models and asymptotic methods

We will now discuss three examples of test problems that look perfectly well defined and sensible at first sight, but turn out to be ill-defined when we look at them more carefully. These include: (1) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (2) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (3) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large.⁸

⁸ Further discussion of the issues discussed in this section is available in Dufour (2001). For related discussions, see also Horowitz (2001), Maasoumi (1992) and Pötscher (2002).

4.1. Procedures robust to nonnormality

One of the most basic problems in econometrics and statistics consists in testing an hypothesis about a mean, for example, its equality to zero. Tests on regression coefficients in linear regressions or, more generally, on parameters of models which are estimated by the generalized method of moments (GMM) can be viewed as extensions of this fundamental problem. If the simplest versions of the problem have no reasonable solution, the situation will not improve when we consider more complex versions (as done routinely in econometrics).

The problem of testing an hypothesis about a mean has a very well known and neat solution when the observations are independently and identically distributed (i.i.d.) according to a normal distribution: we can use a t test. The normality assumption, however, is often considered to be too “strong”. So it is tempting to consider a weaker (less restrictive) version of this null hypothesis, such as

    H₀(µ₀) : X₁, ..., Xₙ are i.i.d. observations with E(X₁) = µ₀ .    (4.1)

In other words, we would like to test the hypothesis that the observations have mean µ₀, under the general assumption that X₁, ..., Xₙ are i.i.d. Here H₀(µ₀) is a nonparametric hypothesis because the distribution of the data cannot be completely specified by fixing a finite number of parameters. The set of possible data distributions (or data generating processes) compatible with this hypothesis, i.e.,

    H(µ₀) = {distribution functions Fₙ ∈ Fₙ such that H₀(µ₀) is satisfied} ,    (4.2)

is much larger here than in the Gaussian case and imposes very strong restrictions on the test. Indeed, the set H(µ₀) is so large that the following property must hold.

Theorem 4.1 MEAN NON-TESTABILITY IN NONPARAMETRIC MODELS. If a test has level α for H₀(µ₀), i.e.

    P_{Fₙ}[Rejecting H₀(µ₀)] ≤ α for all Fₙ ∈ H(µ₀) ,    (4.3)

then, for any µ₁ ≠ µ₀,

    P_{Fₙ}[Rejecting H₀(µ₀)] ≤ α for all Fₙ ∈ H(µ₁) .    (4.4)

Further, if there is at least one value µ₁ ≠ µ₀ such that

    P_{Fₙ}[Rejecting H₀(µ₀)] ≥ α for at least one Fₙ ∈ H(µ₁) ,    (4.5)

then, for all µ₁ ≠ µ₀,

    P_{Fₙ}[Rejecting H₀(µ₀)] = α for all Fₙ ∈ H(µ₁) .    (4.6)

PROOF. See Bahadur and Savage (1956).

In other words [by (4.4)], if a test has level α for testing H₀(µ₀), the probability of rejecting H₀(µ₀) should not exceed the level irrespective of how far the “true” mean is from µ₀. Further [by (4.6)], if “by luck” the power of the test gets as high as the level, then the probability of rejecting should be uniformly equal to the level α. Here, the restrictions imposed by the level constraint are so strong that the test cannot have power exceeding its level: it should be insensitive to cases where the null hypothesis does not hold! An optimal test (say, at level .05) in such a problem can be run as follows: (1) ignore the data; (2) using a random number generator, produce a realization of a variable U according to a uniform distribution on the interval (0, 1), i.e., U ∼ U(0, 1); (3) reject H₀ if U ≤ .05. Clearly, this is not an interesting procedure. It is also easy to see that a similar result will hold if we add various nonparametric restrictions on the distribution, such as a finite variance assumption.

The above theorem also implies that tests based on the “asymptotic distribution” of the usual t statistic for µ = µ₀ [tₙ(µ₀) defined in (3.19)] have size one under H₀(µ₀):

    sup_{Fₙ ∈ H(µ₀)} P_{Fₙ}[ |tₙ(µ₀)| > c ] = 1    (4.7)

for any finite critical value c. In other words, procedures based on the asymptotic distribution of a test statistic have a size that deviates arbitrarily from their nominal size.

A way to interpret what happens here is through the distinction between pointwise convergence and uniform convergence. Suppose, to simplify, that the probability of rejecting H₀(µ₀) when it is true depends on a single nuisance parameter γ in the following way:

    Pₙ(γ) ≡ P_γ[ |tₙ(µ₀)| > c ] = 0.05 + (0.95) e^{−|γ|n} ,    (4.8)

where γ ≠ 0. Then, for each value of γ, the test has level 0.05 asymptotically, i.e.

    lim_{n→∞} Pₙ(γ) = 0.05 ,    (4.9)

but the size of the test is one for all sample sizes:

    sup_{γ>0} Pₙ(γ) = 1 , for all n .    (4.10)

Pₙ(γ) converges to a level of 0.05 pointwise (for each γ), but the convergence is not uniform, so that the probability of rejection is arbitrarily close to one for γ sufficiently close to zero (for all sample sizes n).
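The pointwise-versus-uniform distinction can be seen directly by evaluating the stylized rejection function (4.8). The short sketch below (illustrative; the function name is ours) computes Pₙ(γ) for a fixed γ ≠ 0 as n grows, and then over γ values shrinking toward zero at each fixed n:

```python
import math

def p_reject(gamma, n):
    """Null rejection probability from the stylized formula (4.8):
    Pₙ(γ) = 0.05 + 0.95·exp(−|γ|·n), defined for γ ≠ 0."""
    return 0.05 + 0.95 * math.exp(-abs(gamma) * n)

# Pointwise convergence: for fixed γ ≠ 0, Pₙ(γ) → 0.05 as n → ∞.
for n in (10, 100, 1000):
    print("n =", n, " Pₙ(0.1) =", round(p_reject(0.1, n), 4))

# Non-uniformity: at every n, taking γ close enough to 0 pushes Pₙ(γ) near 1,
# so sup over γ stays at 1 for all sample sizes.
for n in (10, 100, 1000):
    sup_approx = max(p_reject(g, n) for g in (10 ** -k for k in range(1, 12)))
    print("n =", n, " sup ≈", round(sup_approx, 4))
```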

Many other hypotheses lead to similar difficulties. Examples include:

1. hypotheses about various moments of Xₜ:

    H₀(σ²) : X₁, ..., Xₙ are i.i.d. observations such that Var(Xₜ) = σ² ,
    H₀(µₚ) : X₁, ..., Xₙ are i.i.d. observations such that E(Xₜᵖ) = µₚ ;

2. most hypotheses on the coefficients of a regression (linear or nonlinear), a structural equation (as in SEM), or a more general estimating function [Godambe (1960)]:

    H₀(θ₀) : gₜ(Zₜ, θ₀) = uₜ , t = 1, ..., T, where u₁, ..., u_T are i.i.d.

In econometrics, models of the form H₀(θ₀) are typically estimated and tested through a variant of the generalized method of moments (GMM), usually with weaker assumptions on the distribution of u₁, ..., u_T; see Hansen (1982), Newey and West (1987a), Newey and McFadden (1994) and Hall (1999). To the extent that GMM methods are viewed as a way to allow for “weak assumptions”, it follows from the above discussion that they constitute pseudo-solutions of ill-defined problems.

It is important to observe that the above discussion does not imply that all nonparametric hypotheses are non-testable. In the present case, the problem of non-testability could be eliminated by choosing another measure of central tendency, such as a median:

    H₀^0.5(m₀) : X₁, ..., Xₙ are i.i.d. continuous r.v.’s such that Med(Xₜ) = m₀ , t = 1, ..., n .

H₀^0.5(m₀) can be easily tested with a sign test [see Pratt and Gibbons (1981, Chapter 2)]. More generally, hypotheses on the quantiles of the distribution of observations in a random sample remain testable nonparametrically:

    H₀ᵖ(Q_{p₀}) : X₁, ..., Xₙ are i.i.d. observations such that P[Xₜ ≤ Q_{p₀}] = p , t = 1, ..., n .

Moments are not empirically meaningful functionals in nonparametric models (unless strong distributional assumptions are added), though quantiles are.
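A minimal implementation of such a sign test (our own sketch; the helper name and the data are purely illustrative) exploits the fact that, under H₀^0.5(m₀), the number of observations above m₀ is Binomial(n, 1/2) whatever the underlying continuous distribution, so the statistic is a pivot:

```python
import math

def sign_test_pvalue(sample, m0):
    """Exact two-sided sign test of H₀: Med(X) = m0 for i.i.d. continuous data.
    Under H₀ the count of observations above m0 is Binomial(n, 1/2),
    which involves no nuisance parameter."""
    signs = [x for x in sample if x != m0]      # ties have probability zero
    n = len(signs)
    s = sum(x > m0 for x in signs)
    # two-sided p-value: double the smaller binomial tail, capped at 1
    tail = sum(math.comb(n, k) for k in range(min(s, n - s) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

data = [2.1, 3.5, 0.4, 5.2, 1.9, 2.8, 4.0, 3.1, 2.5, 6.3]
print(sign_test_pvalue(data, 3.0))   # 3.0 splits the sample 5/5: p-value 1.0
print(sign_test_pvalue(data, -5.0))  # all signs positive: p = 2/2**10
```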

4.2. Procedures robust to heteroskedasticity of unknown form

Another common problem in econometrics consists in developing methods which remain valid for making inference on regression coefficients when the variances of the observations are not identical (heteroskedasticity). In particular, this may go as far as looking for tests which are “robust to heteroskedasticity of unknown form”. But it is not widely appreciated that this involves very strong restrictions on the procedures that can satisfy this requirement. To see this, consider the problem which consists in testing whether n observations are independent with common zero median, namely:

    H₀ : X₁, ..., Xₙ are independent random variables, each with a distribution symmetric about zero.    (4.11)

Equivalently, H₀ states that the joint distribution Fₙ of the observations belongs to the (huge) set H₀ = {Fₙ ∈ Fₙ : Fₙ satisfies H₀}: H₀ allows heteroskedasticity of unknown form. In such a case, we have the following theorem.

Theorem 4.2 CHARACTERIZATION OF HETEROSKEDASTICITY ROBUST TESTS. If a test has level α for H₀, where 0 < α < 1, then it must satisfy the condition

    P[ Rejecting H₀ | |X₁|, ..., |Xₙ| ] ≤ α under H₀ .    (4.12)

PROOF. See Pratt and Gibbons (1981, Section 5.10) and Lehmann and Stein (1949).

In other words, a valid test with level α must be a sign test — or, more precisely, its level must be equal to α conditional on the absolute values of the observations (which amounts to considering a test based on the signs of the observations). From this, the following remarkable property follows.

Corollary 4.3 If, for all 0 < α < 1, the condition (4.12) is not satisfied, then the size of the test is equal to one, i.e.

    sup_{Fₙ ∈ H₀} P_{Fₙ}[Rejecting H₀] = 1 .    (4.13)

In other words, if a test procedure does not satisfy (4.12) for all levels 0 < α < 1, then its true size is one irrespective of its nominal size. Most so-called “heteroskedasticity robust procedures” based on “corrected” standard errors [see White (1980), Newey and West (1987b), Davidson and MacKinnon (1993, Chapter 16), Cushing and McGarvey (1999)] do not satisfy condition (4.12) and consequently have size one.⁹

4.3. Procedures robust to autocorrelation of arbitrary form

As a third illustration, let us now examine the problem of testing the unit root hypothesis in the context of an autoregressive model whose order is infinite or is not bounded by a prespecified maximal order:

    Xₜ = β₀ + Σ_{k=1}^p λₖ Xₜ₋ₖ + uₜ , uₜ i.i.d. ∼ N[0, σ²] , t = 1, ..., n ,    (4.14)

where p is not bounded a priori. This type of problem has attracted a lot of attention in recent years.¹⁰ We wish to test:

    H̃₀ : Σ_{k=1}^p λₖ = 1    (4.15)

or, more precisely,

    H̃₀ : Xₜ = β₀ + Σ_{k=1}^p λₖ Xₜ₋ₖ + uₜ , t = 1, ..., n , for some p ≥ 0 ,
         Σ_{k=1}^p λₖ = 1 and uₜ i.i.d. ∼ N[0, σ²] .    (4.16)

About this problem, we can show the following theorem and corollary.

Theorem 4.4 UNIT ROOT NON-TESTABILITY IN NONPARAMETRIC MODELS. If a test has level α for H̃₀, i.e.

    P_{Fₙ}[Rejecting H̃₀] ≤ α for all Fₙ satisfying H̃₀ ,    (4.17)

then

    P_{Fₙ}[Rejecting H̃₀] ≤ α for all Fₙ .    (4.18)

PROOF. See Cochrane (1991) and Blough (1992).

⁹ For examples of size distortions, see Dufour (1981) and Campbell and Dufour (1995, 1997).
¹⁰ For reviews of this huge literature, the reader may consult: Banerjee, Dolado, Galbraith, and Hendry (1993), Stock (1994), Tanaka (1996), and Maddala and Kim (1998).


Corollary 4.5 If, for all 0 < α < 1, the condition (4.18) is not satisfied, then the size of the test is equal to one, i.e.

    sup_{Fₙ ∈ H₀} P_{Fₙ}[Rejecting H̃₀] = 1 ,

where H₀ is the set of all data distributions Fₙ that satisfy H̃₀.

As in the mean problem, the null hypothesis is simply too “large” (unrestricted) to allow testing from a finite data set. Consequently, all procedures that claim to offer corrections for very general forms of serial dependence [e.g., Phillips (1987), Phillips and Perron (1988)] are affected by these problems: irrespective of the nominal level of the test, the true size under the hypothesis H̃₀ is equal to one.

To get a testable hypothesis, it is essential to fix jointly the order of the AR process (i.e., a numerical upper bound on the order) and the sum of the coefficients: for example, we could consider the following null hypothesis where the order of the autoregressive process is equal to 12:

    H₀(12) : Xₜ = β₀ + Σ_{k=1}^{12} λₖ Xₜ₋ₖ + uₜ , t = 1, ..., n ,
             Σ_{k=1}^{12} λₖ = 1 and uₜ i.i.d. ∼ N[0, σ²] .    (4.19)

The order of the autoregressive process is an essential part of the hypothesis: it is not possible to separate inference on the unit root hypothesis from inference on the order of the process. Similar difficulties will also occur for most other hypotheses on the coefficients of (4.16). For further discussion of this topic, the reader may consult Sims (1971a, 1971b), Blough (1992), Faust (1996, 1999) and Pötscher (2002).

5. Structural models and weak instruments

Several authors in the past have noted that usual asymptotic approximations are not valid or lead to very inaccurate results when parameters of interest are close to regions where these parameters are no longer identifiable. The literature on this topic is now considerable.¹¹ In this section, we shall examine these issues in the context of SEM.

¹¹ See Sargan (1983), Phillips (1984, 1985, 1989), Gleser and Hwang (1987), Koschat (1987), Hillier (1990), Nelson and Startz (1990a, 1990b), Buse (1992), Choi and Phillips (1992), Maddala and Jeong (1992), Bound, Jaeger, and Baker (1993, 1995), Dufour and Jasiak (1993, 2001), McManus, Nankervis, and Savin (1994), Angrist and Krueger (1995), Hall, Rudebusch, and Wilcox (1996), Dufour (1997), Shea (1997), Staiger and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson (1998), Hall and Peixe (2000), Stock and Wright (2000), Hahn and Hausman (2002a, 2002b, 2002c), Hahn, Hausman, and Kuersteiner (2001), Dufour and Taamouti (2000, 2001a, 2001b), Startz, Nelson, and Zivot (2001), Kleibergen (2001a, 2001b, 2002a, 2002b, 2003), Bekker (2002), Bekker and Kleibergen (2001), Chao and Swanson (2001, 2003), Moreira (2001, 2003a, 2003b), Moreira and Poi (2001), Stock and Yogo (2002, 2003), Stock, Wright, and Yogo (2002), Wright (2002, 2003), Imbens and Manski (2003), Kleibergen and Zivot (2003), Perron (2003), and Zivot, Startz, and Nelson (2003).

5.1. Standard simultaneous equations model

Let us consider the standard simultaneous equations model:

    y = Yβ + X₁γ + u ,    (5.1)
    Y = X₁Π₁ + X₂Π₂ + V ,    (5.2)

where y and Y are T × 1 and T × G matrices of endogenous variables, X₁ and X₂ are T × k₁ and T × k₂ matrices of exogenous variables, β and γ are G × 1 and k₁ × 1 vectors of unknown coefficients, Π₁ and Π₂ are k₁ × G and k₂ × G matrices of unknown coefficients, u = (u₁, ..., u_T)′ is a T × 1 vector of structural disturbances, and V = [V₁, ..., V_T]′ is a T × G matrix of reduced-form disturbances. Further,

    X = [X₁, X₂] is a full-column rank T × k matrix ,    (5.3)

where k = k₁ + k₂. Finally, to get a finite-sample distributional theory for the test statistics, we shall use the following assumptions on the distribution of u:

    u and X are independent;    (5.4)
    u ∼ N[0, σ_u² I_T] .    (5.5)

(5.4) may be interpreted as the strict exogeneity of X with respect to u.

Note that the distribution of V is not otherwise restricted; in particular, the vectors V₁, ..., V_T need not follow a Gaussian distribution and may be heteroskedastic. Below, we shall also consider the situation where the reduced-form equation for Y includes a third set of instruments X₃ which are not used in the estimation:

    Y = X₁Π₁ + X₂Π₂ + X₃Π₃ + V ,    (5.6)

where X₃ is a T × k₃ matrix of explanatory variables (not necessarily strictly exogenous); in particular, X₃ may be unobservable. We view this situation as important because, in practice, it is quite rare that one can consider all the relevant instruments that could be used. Even more generally, we could also assume that Y obeys a general nonlinear model of the form:

    Y = g(X₁, X₂, X₃, V, Π) ,    (5.7)

where g(·) is a possibly unspecified nonlinear function and Π is an unknown parameter matrix.

The model presented in (5.1)–(5.2) can be rewritten in reduced form as:

    y = X₁π₁ + X₂π₂ + v ,    (5.8)
    Y = X₁Π₁ + X₂Π₂ + V ,    (5.9)

where π₁ = Π₁β + γ, v = u + Vβ, and

    π₂ = Π₂β .    (5.10)

Suppose now that we are interested in making inference about β.

(5.10) is the crucial equation governing identification in this system: we need to be able to recover β from the values of the regression coefficients π₂ and Π₂. The necessary and sufficient condition for identification is the well-known rank condition for the identification of β:

    β is identifiable iff rank(Π₂) = G .    (5.11)

We have a weak instrument problem when either rank(Π₂) < G (non-identification), or Π₂ is close to having deficient rank [i.e., rank(Π₂) = G with strong linear dependence between the rows (or columns) of Π₂]. There is no compelling definition of the notion of near-nonidentification, but reasonable characterizations include the condition that det(Π₂′Π₂) is “close to zero”, or that Π₂′Π₂ has one or several eigenvalues “close to zero”.

Weak instruments are notorious for causing serious statistical difficulties on several fronts: (1) parameter estimation; (2) confidence interval construction; (3) hypothesis testing. We now consider these problems in greater detail.

5.2. Statistical problems associated with weak instruments

The problems associated with weak instruments were originally discovered through their consequences for estimation. Work in this area includes:

1. theoretical work on the exact distribution of two-stage least squares (2SLS) and other “consistent” structural estimators and test statistics [Phillips (1983, 1984, 1985, 1989), Rothenberg (1984), Hillier (1990), Nelson and Startz (1990a), Buse (1992), Maddala and Jeong (1992), Choi and Phillips (1992), Dufour (1997)];

2. weak-instrument (local to non-identification) asymptotics [Staiger and Stock (1997), Wang and Zivot (1998), Stock and Wright (2000)];

3. empirical examples [Bound, Jaeger, and Baker (1995)].

The main conclusions of this research can be summarized as follows.

1. Theoretical results show that the distributions of various estimators depend in a complicated way on unknown nuisance parameters. Thus, they are difficult to interpret.

2. When identification conditions are not satisfied, standard asymptotic theory for estimators and test statistics typically collapses.

3. With weak instruments,

   (a) the 2SLS estimator becomes heavily biased [in the same direction as ordinary least squares (OLS)];

   (b) the distribution of the 2SLS estimator is quite far from the normal distribution (e.g., bimodal).

4. A striking illustration of these problems appears in the reconsideration by Bound, Jaeger, and Baker (1995) of a study on returns to education by Angrist and Krueger (1991, QJE). Using 329,000 observations, these authors found that replacing the instruments used by Angrist and Krueger (1991) with randomly generated (totally irrelevant) instruments produced very similar point estimates and standard errors. This result indicates that the original instruments were weak.

For a more complete discussion of estimation with weak instruments, the reader may consult Stock, Wright, and Yogo (2002).
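The bias of instrumental-variable estimates under a weak first stage is easy to reproduce by simulation. The sketch below (our own stylized design, not from the paper; all numbers are illustrative) uses the simplest case, one endogenous regressor and one instrument, and compares the median of the IV estimator under a strong and a very weak first stage; with an almost irrelevant instrument the estimator is pulled toward the inconsistent OLS limit:

```python
import random

def simulate_iv(pi, beta=1.0, rho=0.9, n=200, reps=1000, seed=0):
    """Median of the simple IV estimator β̂ = Σzᵢyᵢ / Σzᵢxᵢ in the design
    x = π·z + v, y = β·x + u, with corr(u, v) = rho (endogeneity).
    π controls instrument strength; π near 0 means a weak instrument."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        szy = szx = 0.0
        for _ in range(n):
            z = rng.gauss(0, 1)
            v = rng.gauss(0, 1)
            u = rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            x = pi * z + v
            y = beta * x + u
            szy += z * y
            szx += z * x
        estimates.append(szy / szx)
    estimates.sort()
    return estimates[reps // 2]

print("strong instrument (π = 1):   ", round(simulate_iv(1.0), 3))
print("weak instrument   (π = 0.02):", round(simulate_iv(0.02), 3))
```

With π = 1 the median sits near the true β = 1; with π = 0.02 it drifts toward 1 + rho, the endogeneity-contaminated limit, mirroring conclusion 3(a) above.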

5.3. Characterization of valid tests and confidence sets

Weak instruments also lead to very serious problems when one tries to perform tests or build confidence intervals on the parameters of the model. Consider the general situation where we have two parameters θ₁ and θ₂ [i.e., θ = (θ₁, θ₂)] such that θ₂ is no longer identified when θ₁ takes a certain value, say θ₁ = θ₁⁰:

    L(y | θ₁, θ₂) ≡ L(y | θ₁⁰) .    (5.12)

Theorem 5.1 If θ₂ is a parameter whose value is not bounded, then the confidence region C with level 1 − α for θ₂ must have the following property:

    P_θ[C is unbounded] > 0    (5.13)

and, if θ₁ = θ₁⁰,

    P_θ[C is unbounded] ≥ 1 − α .    (5.14)

PROOF. See Dufour (1997).

Corollary 5.2 If C does not satisfy the property given in the previous theorem, its size must be zero.

This will be the case, in particular, for any Wald-type confidence interval, obtained by assuming that

    t_{θ̂₂} = (θ̂₂ − θ₂)/σ̂_{θ̂₂}  approx. ∼  N(0, 1) ,    (5.15)

which yields confidence intervals of the form θ̂₂ − c σ̂_{θ̂₂} ≤ θ₂ ≤ θ̂₂ + c σ̂_{θ̂₂}, where P[|N(0, 1)| > c] ≤ α. By the above corollary, this type of interval has level zero, irrespective of the critical value c used:

    inf_θ P_θ[ θ̂₂ − c σ̂_{θ̂₂} ≤ θ₂ ≤ θ̂₂ + c σ̂_{θ̂₂} ] = 0 .    (5.16)

In such situations, the notion of standard error loses its usual meaning and does not constitute a valid basis for building confidence intervals. In SEM, for example, this applies to standard confidence intervals based on 2SLS estimators and their asymptotic “standard errors”.

Correspondingly, if we wish to test an hypothesis of the form H₀ : θ₂ = θ₂⁰, the size of any test of the form

    |t_{θ̂₂}(θ₂⁰)| = |(θ̂₂ − θ₂⁰)/σ̂_{θ̂₂}| > c(α)    (5.17)

will deviate arbitrarily from its nominal size. No unique large-sample distribution for t_{θ̂₂} can provide valid tests and confidence intervals based on the asymptotic distribution of t_{θ̂₂}. From a statistical viewpoint, this means that t_{θ̂₂} is not a pivotal function for the model considered. More generally, this type of problem affects the validity of all Wald-type methods, which are based on comparing parameter estimates with their estimated covariance matrix.

By contrast, in models of the form (5.1)–(5.5), the distribution of the LR statistics for most hypotheses on model parameters can be bounded and cannot move arbitrarily: likelihood ratios are boundedly pivotal functions and provide a valid basis for testing and confidence set construction [see Dufour (1997)].

The central conclusion here is: tests and confidence sets on the parameters of a structural model should be based on proper pivots.

6. Approaches to weak instrument problems

What should be the features of a satisfactory solution to the problem of making inference in structural models? We shall emphasize here four properties: (1) the method should be based on proper pivotal functions (ideally, a finite-sample pivot); (2) robustness to the presence of weak instruments; (3) robustness to excluded instruments; (4) robustness to the formulation of the model for the explanatory endogenous variables Y (which is desirable in many practical situations).

In the light of these criteria, we shall first discuss the Anderson-Rubin procedure, which in our view is the reference method for dealing with weak instruments in the context of standard SEM; second, the projection technique, which provides a general way of making a wide spectrum of tests and confidence sets; and third, several recent proposals aimed at improving and extending the AR procedure.

6.1. Anderson-Rubin statistic

A solution to testing in the presence of weak instruments has been available for more than 50 years [Anderson and Rubin (1949)] and is now center stage again [Dufour (1997), Staiger and Stock (1997)]. Interestingly, the AR method can be viewed as an alternative way of exploiting “instruments” for inference on a structural model, although it pre-dates the introduction of 2SLS methods in SEM [Theil (1953), Basmann (1957)], which later became the most widely used method

for estimating linear structural equations models.¹² The basic problem considered consists in testing the hypothesis

    H₀(β₀) : β = β₀    (6.1)

in model (5.1)–(5.4). In order to do that, we consider an auxiliary regression obtained by subtracting Yβ₀ from both sides of (5.1) and expanding the right-hand side in terms of the instruments. This yields the regression

    y − Yβ₀ = X₁θ₁ + X₂θ₂ + ε ,    (6.2)

where θ₁ = γ + Π₁(β − β₀), θ₂ = Π₂(β − β₀) and ε = u + V(β − β₀). Under the null hypothesis H₀(β₀), this equation reduces to

    y − Yβ₀ = X₁θ₁ + ε ,    (6.3)

so we can test H₀(β₀) by testing H′₀(β₀) : θ₂ = 0 in the auxiliary regression (6.2). This yields the following F-statistic — the Anderson-Rubin statistic — which follows a Fisher distribution under the null hypothesis:

    AR(β₀) = { [SS₀(β₀) − SS₁(β₀)]/k₂ } / { SS₁(β₀)/(T − k) } ∼ F(k₂, T − k) ,    (6.4)

where SS₀(β₀) = (y − Yβ₀)′M(X₁)(y − Yβ₀) and SS₁(β₀) = (y − Yβ₀)′M(X)(y − Yβ₀); for any full-rank matrix A, we denote P(A) = A(A′A)⁻¹A′ and M(A) = I − P(A). What plays the crucial role here is the fact that we have instruments (X₂) that can be related to Y but are excluded from the structural equation. To draw inference on the structural parameter β, we “hang” on the variables in X₂: if we add X₂ to the constrained structural equation (6.3), its coefficient should be zero. For these reasons, we shall call the variables in X₂ auxiliary instruments.

¹² The basic ideas for using instrumental variables for inference on structural relationships appear to go back to Working (1927) and Wright (1928). For an interesting discussion of the origin of IV methods in econometrics, see Stock and Trebbi (2003).
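The statistic (6.4) is just an F test on the X₂ block of the auxiliary regression, so it can be computed from two restricted residual sums of squares. The sketch below (our own illustrative implementation using NumPy; the simulated design and every number in it are assumptions, not taken from the paper) does this for G = 1:

```python
import numpy as np

def ar_stat(y, Y, X1, X2, beta0):
    """Anderson–Rubin statistic (6.4) for H₀: β = β₀ in y = Yβ + X₁γ + u.
    Regress y − Yβ₀ on (X₁, X₂) and F-test θ₂ = 0 on the X₂ block."""
    z = y - Y @ beta0

    def rss(X):  # residual sum of squares of z regressed on X
        resid = z - X @ np.linalg.lstsq(X, z, rcond=None)[0]
        return float(resid @ resid)

    X = np.hstack([X1, X2])
    T, k = X.shape
    k2 = X2.shape[1]
    ss0, ss1 = rss(X1), rss(X)
    return ((ss0 - ss1) / k2) / (ss1 / (T - k))

# Stylized simulated design: G = 1, k₁ = 1 (intercept), k₂ = 3, true β = 0.5.
rng = np.random.default_rng(0)
T = 500
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, 3))
V = rng.standard_normal((T, 1))
u = 0.8 * V[:, 0] + 0.6 * rng.standard_normal(T)   # endogenous errors
Y = X1 * 0.2 + X2 @ np.array([[1.0], [0.5], [0.5]]) + V
y = Y @ np.array([0.5]) + X1[:, 0] * 1.0 + u

print(ar_stat(y, Y, X1, X2, np.array([0.5])))  # a draw from ≈ F(3, 496): small
print(ar_stat(y, Y, X1, X2, np.array([2.0])))  # large: β₀ = 2 is far from β = 0.5
```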

Since the latter statistic is a proper pivot, it can be used to build confidence sets for β:

    C_β(α) = {β₀ : AR(β₀) ≤ F_α(k₂, T − k)} ,    (6.5)

where F_α(k₂, T − k) is the critical value for a test with level α based on the F(k₂, T − k) distribution. When there is only one endogenous explanatory variable (G = 1), this set has an explicit solution involving a quadratic inequation, i.e.

    C_β(α) = {β₀ : aβ₀² + bβ₀ + c ≤ 0} ,    (6.6)

where a = Y′HY, b = −2Y′Hy, c = y′Hy, and H ≡ M(X₁) − M(X)[1 + k₂ F_α(k₂, T − k)/(T − k)]. The set C_β(α) may easily be determined by finding the roots of the quadratic polynomial in equation (6.6); see Dufour and Jasiak (2001) and Zivot, Startz, and Nelson (1998) for details.
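For G = 1 the quadratic characterization (6.6) can be coded directly. The sketch below (ours; the simulated design is an illustrative assumption, and 2.64 approximates the 5% critical value of F(3, 296)) returns the interval between the two roots when the set is bounded, and None when it is unbounded or empty:

```python
import numpy as np

def ar_confidence_set(y, Y, X1, X2, f_crit):
    """AR confidence set (6.6) for one endogenous coefficient (G = 1):
    {β₀ : aβ₀² + bβ₀ + c ≤ 0} with a = Y′HY, b = −2Y′Hy, c = y′Hy,
    where H = M(X₁) − M(X)[1 + k₂·f_crit/(T − k)]."""
    X = np.hstack([X1, X2])
    T, k = X.shape
    k2 = X2.shape[1]
    M = lambda A: np.eye(T) - A @ np.linalg.pinv(A)   # M(A) = I − P(A)
    H = M(X1) - M(X) * (1 + k2 * f_crit / (T - k))
    Yv = Y.ravel()
    a = Yv @ H @ Yv
    b = -2.0 * (Yv @ H @ y)
    c = y @ H @ y
    disc = b * b - 4 * a * c
    if a > 0 and disc >= 0:                            # bounded interval case
        r = np.sqrt(disc)
        return ((-b - r) / (2 * a), (-b + r) / (2 * a))
    return None  # unbounded or empty: weak identification or misspecification

# Stylized simulated data (ours, not from the paper); true β = 0.5.
rng = np.random.default_rng(1)
T = 300
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, 3))
V = rng.standard_normal(T)
u = 0.8 * V + 0.6 * rng.standard_normal(T)
Yv = 0.2 + X2 @ np.array([1.0, 0.5, 0.5]) + V
y = 0.5 * Yv + 1.0 + u
print(ar_confidence_set(y, Yv.reshape(-1, 1), X1, X2, 2.64))  # 2.64 ≈ F at 5%
```

Rerunning this with a near-zero first-stage coefficient vector typically makes a ≤ 0, so the function returns None: the unbounded set itself signals weak identification, as the text explains next.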

When G > 1, the set C_β(α) is not in general an ellipsoid, but it remains fairly manageable by using the theory of quadrics [Dufour and Taamouti (2000)]. When the model is correct and its parameters are well identified by the instruments used, C_β(α) is a closed bounded set close to an ellipsoid. In other cases, it can be unbounded or empty. Unbounded sets are highly likely when the model is not identified, so they point to lack of identification. Empty confidence sets can occur (with a non-zero probability) when we have more instruments than parameters in the structural equation (5.1), i.e. the model is overidentified. An empty confidence set means that no value of the parameter vector β is judged to be compatible with the available data, which indicates that the model is misspecified. So the procedure provides, as an interesting byproduct, a specification test.¹³

It is also easy to see that the above procedure remains valid even if the extended reduced form (5.6) is the correct model for Y. In other words, we can leave out a subset of the instruments (X₃) and use only X₂: the level of the procedure will not be affected. Indeed, this will also hold if Y is determined by the general — possibly nonlinear — model (5.7). The procedure is robust to excluded instruments as well as to the specification of the model for Y. The power of the test may be affected by the choice of X₂, but its level is not. Since it is quite rare that an investigator can be sure relevant instruments have not been left out, this is an important practical consideration.

The AR procedure can be extended easily to deal with linear hypotheses which involve γ as well. For example, to test an hypothesis of the form

    H₀(β₀, γ₀) : β = β₀ and γ = γ₀ ,    (6.7)

we can consider the transformed model

    y − Yβ₀ − X₁γ₀ = X₁θ₁ + X₂θ₂ + ε .    (6.8)

Since, under H₀(β₀, γ₀),

    y − Yβ₀ − X₁γ₀ = ε ,    (6.9)

we can test H₀(β₀, γ₀) by testing H′₀(β₀, γ₀) : θ₁ = 0 and θ₂ = 0 in the auxiliary regression (6.8); see Maddala (1974). Tests for more general restrictions of the form

    H₀(β₀, ν₀) : β = β₀ and Rγ = ν₀ ,    (6.10)

where R is an r × K fixed full-rank matrix, are discussed in Dufour and Jasiak (2001).

The AR procedure thus enjoys several remarkable features. Namely, it is: (1) pivotal in finite samples; (2) robust to weak instruments; (3) robust to excluded instruments; (4) robust to the specification of the model for Y (which can be nonlinear with an unknown form); further, (5) the AR method provides asymptotically "valid" tests and confidence sets under quite weak distributional assumptions (basically, the assumptions that cover the usual asymptotic properties of linear regression); and (6) it can be extended easily to test restrictions and build confidence sets which also involve the coefficients of the exogenous variables, such as H_0(β_0, ν_0) in (6.10).

But the method also has its drawbacks. The main ones are: (1) the tests and confidence sets obtained in this way apply only to the full vector β [or (β′, γ′)′]; what can we do if β has more than one element? (2) power may be low if too many instruments are added (X_2 has too many variables) to perform the test, especially if the instruments are irrelevant; (3) the error normality assumption is restrictive, and we may wish to consider other distributional assumptions; (4) the structural equations are assumed to be linear. We will now discuss a number of methods which have been proposed in order to circumvent these drawbacks.

13. For further discussion of this point, see Kleibergen (2002b).

6.2. Projections and inference on parameter subsets

Suppose now that β [or (β′, γ′)′] has more than one component. The fact that a procedure with a finite-sample theory has been obtained for "joint hypotheses" of the form H_0(β_0) [or H_0(β_0, γ_0)] is not due to chance: since the distribution of the data is determined by the full parameter vector, there is no reason in general why one should be able to decide on the value of a component of β independently of the others. Such a separation is feasible only in special situations, e.g. in the classical linear model (without exact multicollinearity). Lack of identification is precisely a situation where the value of a parameter may be determined only after various restrictions (e.g., the values of other parameters) have been imposed. So parametric nonseparability arises here, and inference should start from a simultaneous approach. If the data generating process corresponds to a model where parameters are well identified, precise inferences on individual coefficients may be feasible. This raises the question of how one can move from a joint inference on β to its components.

A general approach to this problem consists in using a projection technique. If

    P[β ∈ C_β(α)] ≥ 1 − α ,    (6.11)

then, for any function g(β),

    P[g(β) ∈ g[C_β(α)]] ≥ 1 − α .    (6.12)

If g(β) is a component of β or (more generally) a linear transformation g(β) = w′β, the confidence set for a linear combination of the parameters, say w′β, takes the usual form [w′β̃ − σ̂z_α , w′β̃ + σ̂z_α] with β̃ a k-class type estimator of β; see Dufour and Taamouti (2000).^14

Another interesting feature comes from the fact that the confidence sets obtained in this way are simultaneous in the sense of Scheffé. More precisely, if {g_a(β) : a ∈ A} is a set of functions of β, then

    P[g_a(β) ∈ g_a[C_β(α)] for all a ∈ A] ≥ 1 − α .    (6.13)

If these confidence intervals are used to test different hypotheses, an unlimited number of hypotheses can be tested without losing control of the overall level.
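The projection approach can be illustrated numerically: invert AR-type tests over a grid of β values to obtain a joint confidence set, then take the smallest and largest value of each component over the set. The grid range, step and simulated design below are arbitrary illustration choices of ours:

```python
import numpy as np
from scipy import stats

def ar_stat(y, Y, X1, X2, beta0):
    """AR statistic: F-test of theta_2 = 0 in
    y - Y beta0 = X1 theta_1 + X2 theta_2 + eps."""
    T, k1, k2 = len(y), X1.shape[1], X2.shape[1]
    w = y - Y @ beta0
    X = np.column_stack([X1, X2])
    e_r = w - X1 @ np.linalg.lstsq(X1, w, rcond=None)[0]
    e_u = w - X @ np.linalg.lstsq(X, w, rcond=None)[0]
    return ((e_r @ e_r - e_u @ e_u) / k2) / ((e_u @ e_u) / (T - k1 - k2))

# Simulated design (hypothetical) with a two-component beta:
rng = np.random.default_rng(1)
T = 300
X1 = np.ones((T, 1))
X2 = rng.normal(size=(T, 4))
V = rng.normal(size=(T, 2))
Y = X2 @ rng.normal(size=(4, 2)) + V
beta_true = np.array([1.0, -0.5])
y = Y @ beta_true + X1 @ np.array([0.3]) + 0.4 * V[:, 0] + rng.normal(size=T)

# Joint 95% confidence set by test inversion over a grid:
crit = stats.f.ppf(0.95, 4, T - 1 - 4)
conf_set = [(b1, b2)
            for b1 in np.linspace(0.0, 2.0, 41)
            for b2 in np.linspace(-1.5, 0.5, 41)
            if ar_stat(y, Y, X1, X2, np.array([b1, b2])) <= crit]

# Projection: conservative 95% interval for each component of beta
# (an empty set would mean H0 is rejected everywhere on the grid)
if conf_set:
    ci_b1 = (min(b[0] for b in conf_set), max(b[0] for b in conf_set))
    ci_b2 = (min(b[1] for b in conf_set), max(b[1] for b in conf_set))
```

By (6.12), each projected interval has level at least 95%, and by (6.13) the two intervals are simultaneously valid.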

6.3. Alternatives to the AR procedure

With a view to improving the power of AR procedures, a number of alternative methods have been suggested recently. We will now discuss several of them.

a. Generalized auxiliary regression. A general approach to the problem of testing H_0(β_0) consists in replacing X_2 in the auxiliary regression

    y − Yβ_0 = X_1θ_1 + X_2θ_2 + ε    (6.14)

14. g[C_β(α)] takes the form of a bounded confidence interval as soon as the confidence set C_β(α) is bounded. For further discussion of projection methods, the reader may consult Dufour (1990, 1997), Campbell and Dufour (1997), Abdelkhalek and Dufour (1998), Dufour, Hallin, and Mizera (1998), Dufour and Kiviet (1998), and Dufour and Jasiak (2001).

by an alternative set of auxiliary instruments, say Z of dimension T × k̄_2. In other words, we consider the generalized auxiliary regression

    y − Yβ_0 = X_1θ_1 + Zθ̄_2 + ε    (6.15)

where θ̄_2 = 0 under H_0(β_0). So we can test H_0(β_0) by testing θ̄_2 = 0 in (6.15). The problem then consists in selecting Z so that the level can be controlled and power may be improved with respect to the AR auxiliary regression (6.14). For example, it is easy to see that the power of the AR test could become low if a large set of auxiliary instruments is used, especially if the latter are weak. So several alternative procedures can be generated by reducing the number of auxiliary instruments (the number of columns in Z).

At the outset, we should note that, if (5.8) were the correct model and Π = [Π_1, Π_2] were known, then an optimal choice from the viewpoint of power consists in choosing Z = X_2Π_2; see Dufour and Taamouti (2001b). The practical problem, of course, is that Π_2 is unknown. This suggests that we replace X_2Π_2 by an estimate, such as

    Z = X_2Π̃_2    (6.16)

where Π̃_2 is an estimate of the reduced-form coefficient Π_2 in (5.2). The problem then consists in choosing Π̃_2. For that purpose, it is tempting to use the least squares estimator Π̂ = (X′X)^{−1}X′Y. However, Π̂ and ε are not independent, and we continue to face a simultaneity problem with messy distributional consequences. Ideally, we would like to select an estimate Π̃_2 which is independent of ε.

b. Split-sample optimal auxiliary instruments. If we can assume that the error vectors (u_t, V_t′)′, t = 1, . . . , T, are independent, this approach to estimating Π may be feasible by using a split-sample technique: a fraction of the sample is used to obtain Π̃ and the rest to estimate the auxiliary regression (6.15) with Z = X_2Π̃_2. Under such circumstances, by conditioning on Π̃, we can easily see that the standard F test for θ̄_2 = 0 is valid. Further, this procedure is robust to weak instruments, excluded instruments as well as the specification of the model for Y [i.e., under the general assumptions (5.6) or (5.7)], as long as the independence between Π̃ and ε can be maintained. Of course, using a split sample may involve a loss in the effective number of observations, and there will be a trade-off between the efficiency gain from using a smaller number of auxiliary instruments and the observations that are "sacrificed" to get Π̃. Better results tend to be obtained by using a relatively small fraction of the sample to obtain Π̃ — 10% for example — and the rest for the main equation. For further details on this procedure, the reader may consult Dufour and Jasiak (2001) and Kleibergen (2002a).^15

A number of alternative procedures can be cast in the framework of equation (6.15).
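The split-sample procedure just described can be sketched as follows; the 10% split, the simulated design and all variable names are our own illustrative assumptions:

```python
import numpy as np
from scipy import stats

def split_sample_test(y, Y, X1, X2, beta0, frac=0.1):
    """Split-sample auxiliary-instrument test (a sketch): the first `frac`
    of the sample estimates the reduced-form coefficient Pi2; the rest runs
    regression (6.15) with Z = X2 @ Pi2_tilde and F-tests theta_bar_2 = 0."""
    T1 = int(frac * len(y))
    # Reduced-form estimate from the first subsample only
    Xa = np.column_stack([X1[:T1], X2[:T1]])
    B = np.linalg.lstsq(Xa, Y[:T1], rcond=None)[0]
    Pi2_tilde = B[X1.shape[1]:, :]
    # Auxiliary regression on the remaining observations
    y2, Y2, X1b, X2b = y[T1:], Y[T1:], X1[T1:], X2[T1:]
    Z = X2b @ Pi2_tilde                     # estimated optimal instruments
    w = y2 - Y2 @ beta0
    Xu = np.column_stack([X1b, Z])
    e_r = w - X1b @ np.linalg.lstsq(X1b, w, rcond=None)[0]
    e_u = w - Xu @ np.linalg.lstsq(Xu, w, rcond=None)[0]
    q, dof = Z.shape[1], len(w) - Xu.shape[1]
    F = ((e_r @ e_r - e_u @ e_u) / q) / ((e_u @ e_u) / dof)
    return F, stats.f.sf(F, q, dof)

# Simulated illustration (hypothetical design):
rng = np.random.default_rng(2)
T = 400
X1 = np.ones((T, 1))
X2 = rng.normal(size=(T, 6))
V = rng.normal(size=(T, 1))
Y = X2 @ rng.normal(size=(6, 1)) + V
u = 0.6 * V[:, 0] + rng.normal(size=T)
beta0 = np.array([0.8])
y = Y @ beta0 + 0.2 + u                    # H0(beta0) holds by construction
F, pval = split_sample_test(y, Y, X1, X2, beta0)
```

Because Π̃_2 is computed from observations independent of those entering the auxiliary regression, conditioning on Π̃_2 leaves the F-test valid in finite samples.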

c. LM-type GMM-based statistic. If we take Z = Z_WZ with

    Z_WZ = P[M(X_1)X_2]Y = P[M(X_1)X_2]M(X_1)Y = [M(X_1)X_2]Π̂_2 ,    (6.17)

15. Split-sample techniques often lead to important distributional simplifications; for further discussion of this type of method, see Angrist and Krueger (1995) and Dufour and Torrès (1998, 2000).

    Π̂_2 = [X_2′M(X_1)X_2]^{−1} X_2′M(X_1)Y ,    (6.18)

the F-statistic [say, F_GMM(β_0)] for θ̄_2 = 0 is a monotonic transformation of the LM-type statistic LM_GMM(β_0) proposed by Wang and Zivot (1998). Namely,

    F_GMM(β_0) = [(T − k_1 − G)/(GT)] LM_GMM(β_0) / [1 − (1/T)LM_GMM(β_0)]    (6.19)

where ν_1 = T − k_1 − G and

    LM_GMM(β_0) = [(y − Yβ_0)′P[Z_WZ](y − Yβ_0)] / [(y − Yβ_0)′M(X_1)(y − Yβ_0)/T] .    (6.20)

Note that Π̂_2 above is the ordinary least squares (OLS) estimator of Π_2 from the multivariate regression (5.2), so that F_GMM(β_0) can be obtained by computing the F-statistic for θ_2^∗ = 0 in the regression

    y − Yβ_0 = X_1θ_1^∗ + (X_2Π̂_2)θ_2^∗ + u .    (6.21)

When k_2 ≥ G, the statistic F_GMM(β_0) can also be obtained by testing θ_2^∗∗ = 0 in the auxiliary regression

    y − Yβ_0 = X_1θ_1^∗∗ + Ŷθ_2^∗∗ + u    (6.22)

where Ŷ = XΠ̂. It is also interesting to note that the OLS estimates of θ_1^∗∗ and θ_2^∗∗, obtained by fitting the latter equation, are identical to the 2SLS estimates of θ_1^∗∗ and θ_2^∗∗ in the equation

    y − Yβ_0 = X_1θ_1^∗∗ + Yθ_2^∗∗ + u .    (6.23)

The LM_GMM test may thus be interpreted as an approximation to the optimal test based on replacing the optimal auxiliary instrument X_2Π_2 by X_2Π̂_2. The statistic LM_GMM(β_0) is also numerically identical to the corresponding LR-type and Wald-type tests, based on the same GMM estimator (in this case, the 2SLS estimator of β).
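The equivalence between the monotone transformation (6.19) of LM_GMM(β_0) and the F-statistic for θ_2^∗ = 0 in regression (6.21) can be checked numerically. The sketch below computes both routes on a simulated design of our own choosing:

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.pinv(A)

def lm_gmm(y, Y, X1, X2, beta0):
    """LM_GMM of Wang and Zivot (1998), equation (6.20), using the
    instrument matrix Z_WZ of equation (6.17)."""
    T = len(y)
    M1 = np.eye(T) - proj(X1)
    Zwz = proj(M1 @ X2) @ Y
    w = y - Y @ beta0
    return (w @ proj(Zwz) @ w) / ((w @ M1 @ w) / T)

def f_gmm(y, Y, X1, X2, beta0):
    """F_GMM via the monotone transformation (6.19) of LM_GMM."""
    T, G, k1 = len(y), Y.shape[1], X1.shape[1]
    lm = lm_gmm(y, Y, X1, X2, beta0)
    return ((T - k1 - G) / (G * T)) * lm / (1.0 - lm / T)

# Simulated illustration (hypothetical design):
rng = np.random.default_rng(4)
T = 150
X1 = np.ones((T, 1))
X2 = rng.normal(size=(T, 3))
V = rng.normal(size=(T, 1))
Y = X2 @ rng.normal(size=(3, 1)) + V
beta0 = np.array([1.0])
y = Y @ beta0 + 0.5 * V[:, 0] + rng.normal(size=T)

# Regression route (6.21): F-test of theta*_2 = 0 in a regression of
# w = y - Y beta0 on [X1, X2 @ Pi2_hat]
M1 = np.eye(T) - proj(X1)
Pi2_hat = np.linalg.lstsq(M1 @ X2, M1 @ Y, rcond=None)[0]   # eq. (6.18)
Xu = np.column_stack([X1, X2 @ Pi2_hat])
w = y - Y @ beta0
e_r = w - X1 @ np.linalg.lstsq(X1, w, rcond=None)[0]
e_u = w - Xu @ np.linalg.lstsq(Xu, w, rcond=None)[0]
G = Y.shape[1]
F_reg = ((e_r @ e_r - e_u @ e_u) / G) / ((e_u @ e_u) / (T - 1 - G))
```

Both computations return the same number, which is how (6.19) and (6.21) fit together.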

As mentioned above, the distribution of this statistic will be affected by the fact that X_2Π̂_2 and u are not independent. In particular, it is influenced by the presence of weak instruments. But Wang and Zivot (1998) showed that the distribution of LM_GMM(β_0) is bounded by the χ²(k_2) distribution asymptotically. When k_2 = G (usually deemed the "just-identified" case, although the model may be under-identified in that case), we see easily [from (6.21)] that F_GMM(β_0) is (almost surely) identical with the AR statistic, i.e.

    F_GMM(β_0) = AR(β_0) if k_2 = G ,    (6.24)

so that F_GMM(β_0) follows an exact F(G, T − k) distribution, while for k_2 > G,

    G F_GMM(β_0) ≤ [(T − k_1 − G)/(T − k_1 − k_2)] k_2 AR(β_0) ,    (6.25)

so that the distribution of LM_GMM(β_0) can be bounded in finite samples by the distribution of a monotonic transformation of an F(k_2, T − k) variable [which, for T large, is very close to the χ²(k_2) distribution]. But, for T reasonably large, AR(β_0) will always reject when F_GMM(β_0) rejects (at a given level), so the power of the AR test is uniformly superior to that of the LM_GMM bound test.^16

d. Kleibergen's K test. If we take Z = Z_K with

    Z_K = P(X)[Y − (y − Yβ_0) s_uV(β_0)/s_uu(β_0)] = XΠ̃(β_0) ≡ Ỹ(β_0) ,    (6.26)

    Π̃(β_0) = Π̂ − π̂(β_0) s_uV(β_0)/s_uu(β_0) ,   Π̂ = (X′X)^{−1}X′Y ,    (6.27)

    π̂(β_0) = (X′X)^{−1}X′(y − Yβ_0) ,   s_uV(β_0) = (y − Yβ_0)′M(X)Y/(T − k) ,    (6.28)

    s_uu(β_0) = (y − Yβ_0)′M(X)(y − Yβ_0)/(T − k) ,    (6.29)

we obtain a statistic which reduces to the one proposed by Kleibergen (2002a) for k_1 = 0. More precisely, with k_1 = 0, the F-statistic F_K(β_0) for θ̄_2 = 0 is equal to Kleibergen's statistic K(β_0) divided by G:

    F_K(β_0) = K(β_0)/G .    (6.30)

This procedure tries to correct the simultaneity problem associated with the use of Ŷ in the LM_GMM statistic by "purging" it of its correlation with u [by subtracting the term π̂(β_0)s_uV(β_0)/s_uu(β_0) in Z_K]. In other words, F_K(β_0) and K(β_0) ≡ G F_K(β_0) can be obtained by testing θ̄_2 = 0 in the regression

    y − Yβ_0 = X_1θ_1 + Ỹ(β_0)θ̄_2 + u    (6.31)

where the fitted values Ŷ, which appear in the auxiliary regression (6.22) for the LM_GMM test, have been replaced by Ỹ(β_0) = Ŷ − Xπ̂(β_0)s_uV(β_0)/s_uu(β_0), which are closer to being orthogonal with u.
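A sketch of the computation of K(β_0) for the k_1 = 0 case, assembled from (6.26)-(6.30); the degrees-of-freedom convention in the F-statistic and the simulated design are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def k_stat(y, Y, X, beta0):
    """Kleibergen-type K(beta0), k1 = 0 case (a sketch): the reduced-form
    fitted values are 'purged' of their correlation with the structural
    error before being used as auxiliary instruments."""
    T, G = Y.shape
    k = X.shape[1]
    w = y - Y @ beta0
    XtX_inv = np.linalg.inv(X.T @ X)
    Pi_hat = XtX_inv @ X.T @ Y                 # reduced-form estimate (6.27)
    pi_hat = XtX_inv @ X.T @ w                 # eq. (6.28), left part
    MXw, MXY = w - X @ pi_hat, Y - X @ Pi_hat  # M(X) w and M(X) Y
    s_uV = (w @ MXY) / (T - k)                 # eq. (6.28), right part
    s_uu = (w @ MXw) / (T - k)                 # eq. (6.29)
    Yt = X @ (Pi_hat - np.outer(pi_hat, s_uV) / s_uu)   # Y_tilde, (6.26)
    # F-test of theta_bar_2 = 0 in a regression of w on Yt (no X1 here);
    # T - G residual degrees of freedom is our convention in this sketch
    e_u = w - Yt @ np.linalg.lstsq(Yt, w, rcond=None)[0]
    F_K = ((w @ w - e_u @ e_u) / G) / ((e_u @ e_u) / (T - G))
    return G * F_K                             # K = G * F_K, eq. (6.30)

# Simulated call (hypothetical design; k1 = 0, so X holds only X2):
rng = np.random.default_rng(5)
T = 250
X = rng.normal(size=(T, 5))
V = rng.normal(size=(T, 1))
Y = X @ rng.normal(size=(5, 1)) + V
beta0 = np.array([0.7])
y = Y @ beta0 + 0.5 * V[:, 0] + rng.normal(size=T)
K = k_stat(y, Y, X, beta0)
```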

If k_2 = G, we have F_K(β_0) = AR(β_0) ∼ F(G, T − k), while in the other cases (k_2 > G), we can see easily that the bound for F_GMM(β_0) in (6.25) also applies to F_K(β_0):

    G F_K(β_0) ≤ [(T − k_1 − G)/(T − k_1 − k_2)] k_2 AR(β_0) .    (6.32)

Kleibergen (2002a) did not supply a finite-sample distributional theory but showed (assuming k_1 = 0) that K(β_0) follows a χ²(G) distribution asymptotically under H_0(β_0), irrespective of the presence of weak instruments. This entails that the K(β_0) test will have power higher than that of the LM_GMM test [based on the χ²(k_2) bound], at least in the neighborhood of the null hypothesis, although not necessarily far away from the null hypothesis.

It is also interesting to note that the inequality (6.32) indicates that the distribution of K(β_0) ≡ G F_K(β_0) can be bounded in finite samples by a [k_2(T − k_1 − G)/(T − k)] F(k_2, T − k) distribution. However, because of the stochastic dominance of AR(β_0), there would be no advantage in using the bound to get critical values for K(β_0), for the AR test would then have better power.

16. The χ²(k_2) bound also follows in a straightforward way from (6.25). Note that Wang and Zivot (1998) do not provide the auxiliary regression interpretation (6.21) - (6.22) of their statistics. For details, see Dufour and Taamouti (2001b).

In view of the fact that the above procedure is based on estimating the mean of XΠ (using XΠ̂) and the covariances between the errors in the reduced form for Y and u [using s_uV(β_0)], it can become quite unreliable in the presence of excluded instruments.

e. Likelihood ratio test. The likelihood ratio (LR) statistic for β = β_0 was also studied by Wang and Zivot (1998). The LR test statistic in this case takes the form

    LR_LIML = T [ln(κ(β_0)) − ln(κ(β̂_LIML))]    (6.33)

where β̂_LIML is the limited information maximum likelihood (LIML) estimator of β and

    κ(β) = [(y − Yβ)′M(X_1)(y − Yβ)] / [(y − Yβ)′M(X)(y − Yβ)] .    (6.34)

Like LM_GMM, the distribution of LR_LIML depends on unknown nuisance parameters under H_0(β_0), but its asymptotic distribution is χ²(k_2) when k_2 = G and bounded by the χ²(k_2) distribution in the other cases [a result in accordance with the general LR distributional bound given in Dufour (1997)]. This bound can also be easily derived from the following inequality:

    LR_LIML ≤ [T/(T − k)] k_2 AR(β_0) ,    (6.35)

so that the distribution of LR_LIML is bounded in finite samples by the distribution of a [T k_2/(T − k)] F(k_2, T − k) variable; for details, see Dufour and Khalaf (2000). For T reasonably large, this entails that the AR(β_0) test will have power higher than that of the LR_LIML test [based on the χ²(k_2) bound], at least in the neighborhood of the null hypothesis. So the power of the AR test is uniformly superior to that of the LR_LIML bound test. Because the LR test depends heavily on the specification of the model for Y, it is not robust to excluded instruments.
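The statistic LR_LIML can be computed directly from (6.33)-(6.34), using the standard characterization of the LIML minimum of κ(β) as the smallest eigenvalue of a matrix pencil; the simulated design below is a hypothetical illustration of ours:

```python
import numpy as np

def resid(A, W):
    """Residuals of each column of W after regression on A, i.e. M(A) W."""
    return W - A @ np.linalg.lstsq(A, W, rcond=None)[0]

def lr_liml(y, Y, X1, X2, beta0):
    """LR_LIML of equation (6.33), with kappa(beta) as in (6.34).
    min_beta kappa(beta) is the smallest eigenvalue of the pencil
    (Ybar' M(X1) Ybar, Ybar' M(X) Ybar), where Ybar = [y, Y]."""
    T = len(y)
    X = np.column_stack([X1, X2])
    w = y - Y @ beta0
    e1, e2 = resid(X1, w), resid(X, w)
    kappa0 = (e1 @ e1) / (e2 @ e2)           # kappa(beta0)
    Ybar = np.column_stack([y, Y])
    A = resid(X1, Ybar).T @ resid(X1, Ybar)
    B = resid(X, Ybar).T @ resid(X, Ybar)
    kappa_min = min(np.real(np.linalg.eigvals(np.linalg.solve(B, A))))
    return T * (np.log(kappa0) - np.log(kappa_min))

# Simulated call (hypothetical design):
rng = np.random.default_rng(6)
T = 250
X1 = np.ones((T, 1))
X2 = rng.normal(size=(T, 4))
V = rng.normal(size=(T, 1))
Y = X2 @ rng.normal(size=(4, 1)) + V
beta0 = np.array([0.4])
y = Y @ beta0 + 0.3 + 0.5 * V[:, 0] + rng.normal(size=T)
LR = lr_liml(y, Y, X1, X2, beta0)
```

Since κ(β_0) can never fall below the LIML minimum, LR_LIML is nonnegative by construction.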

f. Conditional tests. A promising approach was recently proposed by Moreira (2003a). His suggestion consists in conditioning upon an appropriately selected portion of the sufficient statistics for a gaussian SEM. On assuming that the covariance matrix of the errors is known, the corresponding conditional distribution of various test statistics for H_0(β_0) does not involve nuisance parameters. The conditional distribution is typically not standard but may be established by simulation. Such an approach may lead to power gains. On the other hand, the assumption that error covariances are known is rather implausible, and the extension of the method to the case where the error covariance matrix is unknown is obtained at the expense of using a large-sample approximation. Like Kleibergen's procedure, this method yields an asymptotically similar test. For further discussion, see Moreira and Poi (2001) and Moreira (2003b).

g. Instrument selection procedures. Systematic search methods for identifying relevant instruments and excluding unimportant instruments have been discussed by several authors; see Hall, Rudebusch, and Wilcox (1996), Hall and Peixe (2000), Dufour and Taamouti (2001a), and Donald and Newey (2001). In this setup, the power of AR-type tests depends on a function of the model parameters called the concentration coefficient. One way to approach instrument selection is to maximize the concentration coefficient, so as to maximize test power. Robustness to instrument exclusion is very handy in this context. For further discussion, the reader may consult Dufour and Taamouti (2001a).

To summarize, in special situations, alternatives to the AR procedure may allow some power gains with respect to the AR test with an unreduced set of instruments. But they may also have some important drawbacks. In particular, (1) only an asymptotic distributional theory is supplied, (2) the statistics used are not pivotal in finite samples, although Kleibergen's and Moreira's statistics are asymptotically pivotal, and (3) they are not robust to instrument exclusion or to the formulation of the model for the explanatory endogenous variables. It is also of interest to note that finite-sample versions of several of these asymptotic tests may be obtained by using split-sample methods.

All the problems and techniques discussed above relate to sampling-based statistical methods. SEM can also be analyzed through a Bayesian approach, which can alleviate the indeterminacies associated with identification via the introduction of a prior distribution on the parameter space. Bayesian inferences always depend on the choice of prior distribution (a property viewed as undesirable in the sampling approach), but this dependence becomes especially strong when identification problems are present [see Gleser and Hwang (1987)]. This paper only aims at discussing problems and solutions which arise within the sampling framework, and it is beyond its scope to debate the advantages and disadvantages of Bayesian methods under weak identification. For additional discussion on this issue, see Kleibergen and Zivot (2003) and Sims (2001).

7. Extensions

We will discuss succinctly some extensions of the above results to multivariate setups (where several

structural equations may be involved), models with non-Gaussian errors, and nonlinear models.

7.1. Multivariate regression, simulation-based inference and nonnormal errors

Another approach to inference on a linear structural equation model is based on observing that the structural model (5.1) - (5.4) can be put in the form of a multivariate linear regression (MLR):

    Ȳ = XB + U    (7.1)

where Ȳ = [y, Y], B = [π, Π], U = [u, V] = [Ũ_1, . . . , Ũ_T]′, π = [π_1′, π_2′]′, Π = [Π_1′, Π_2′]′, π_1 = Π_1β + γ and π_2 = Π_2β.^17 This model is linear except for the nonlinear restriction π_2 = Π_2β.

Let us now make the assumption that the errors in the different equations for each observation, Ũ_t, satisfy the property:

    Ũ_t = J W_t ,   t = 1, . . . , T ,    (7.2)

17. Most of this section is based on Dufour and Khalaf (2000, 2001, 2002).

where the vector w = vec(W_1, . . . , W_T) has a known distribution and J is an unknown nonsingular matrix (which enters into the covariance matrix Σ of the error vectors Ũ_t). This distributional assumption is, in a way, more restrictive than the one made in section 5.1 — because of the assumption on V — and, in another way, less restrictive, because the distribution of u is not taken to be necessarily N[0, σ_u² I_T].

Consider now an hypothesis of the form

    H_0 : RBC = D    (7.3)

where R, C and D are fixed matrices. This is called a uniform linear (UL) hypothesis; for example, the hypothesis β = β_0 tested by the AR test can be written in this form [see Dufour and Khalaf (2000)]. The corresponding gaussian LR statistic is

    LR(H_0) = T ln(|Σ̂_0|/|Σ̂|)    (7.4)

where Σ̂ = Û′Û/T and Σ̂_0 = Û_0′Û_0/T are respectively the unrestricted and restricted estimates of the error covariance matrix. The AR test can also be obtained as a monotonic transformation of a statistic of the form LR(H_0). An important feature of LR(H_0) in this case is that its distribution under H_0 does not involve nuisance parameters and may be easily simulated (it is a pivot); see Dufour and Khalaf (2002). In particular, its distribution is completely invariant to the unknown J matrix (or the error covariance matrix). In such a case, even though this distribution may be complicated, we can use Monte Carlo test techniques — a form of parametric bootstrap — to obtain exact test procedures.^18 Multivariate extensions of AR tests, which impose restrictions on several structural equations, can be obtained in this way. Further, this approach allows one to consider any (possibly non-gaussian) distribution on w.
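The Monte Carlo test idea can be sketched as follows: because LR(H_0) in (7.4) is invariant to J and to the restricted-model coefficients, its null distribution can be simulated with standard normal errors (in the gaussian case), and an exact p-value obtained with the usual (N + 1) convention. The design below, including the zero-restriction hypothesis used as the UL example, is an illustrative assumption of ours:

```python
import numpy as np

def lr_ul(Ybar, X, X_r):
    """LR statistic (7.4) for a UL hypothesis that reduces the regressor
    matrix from X to X_r (a zero restriction on a block of B, i.e. a
    special case of RBC = D)."""
    T = len(Ybar)
    U = Ybar - X @ np.linalg.lstsq(X, Ybar, rcond=None)[0]
    U0 = Ybar - X_r @ np.linalg.lstsq(X_r, Ybar, rcond=None)[0]
    # the 1/T factors in Sigma-hat cancel in the log-determinant difference
    return T * (np.linalg.slogdet(U0.T @ U0)[1] - np.linalg.slogdet(U.T @ U)[1])

def mc_pvalue(stat_obs, sims):
    """Monte Carlo p-value with the (N+1) convention: exact when the
    statistic is pivotal and the simulated draws are i.i.d. under H0."""
    sims = np.asarray(sims)
    return (1 + np.sum(sims >= stat_obs)) / (len(sims) + 1)

# Illustration: MLR with two dependent variables, H0 drops three regressors
rng = np.random.default_rng(7)
T, m = 100, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
X_r = X[:, :1]                      # H0: last three rows of B are zero
Ybar = rng.normal(size=(T, m))      # data generated under H0 here
stat_obs = lr_ul(Ybar, X, X_r)
sims = [lr_ul(rng.normal(size=(T, m)), X, X_r) for _ in range(199)]
pval = mc_pvalue(stat_obs, sims)
```

With 199 replications, the resulting test has exact level 5% when the 0.05 rejection rule is applied, since the statistic is pivotal.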

More generally, it is of interest to note that the LR statistic for just about any hypothesis on B can be bounded by the LR statistic for an appropriately selected UL hypothesis: setting b = vec(B) and

    H_0 : Rb ∈ Δ_0    (7.5)

where R is an arbitrary q × k(G + 1) matrix and Δ_0 is an arbitrary subset of ℝ^q, the distribution of the corresponding LR statistic can be bounded by the LR statistic for a UL hypothesis (which is pivotal). This covers as special cases all restrictions on the coefficients of SEM (as long as they are written in the MLR form). To avoid the use of such bounds (which may be quite conservative), it is also possible to use maximized Monte Carlo tests [Dufour (2002)].

All the above procedures are valid for parametric models that specify the error distribution up to

an unknown linear transformation (the J matrix) which allows an unknown covariance matrix. It is

easy to see that these (including the exact procedures discussed in section 6) yield “asymptotically

valid” procedures under much weaker assumptions than those used to obtain ﬁnite-sample results.

However, in view of the discussion in section 4, the pitfalls and limitations of such arguments should

be remembered: there is no substitute for a provably exact procedure.

If we aim at getting tests and confidence sets for nonparametric versions of the SEM (where the error distribution involves an infinite set of nuisance parameters), this may be achievable by looking at distribution-free procedures based on permutations, ranks or signs. There is very little work on this topic in the SEM framework. For an interesting first look, however, the reader may consult the recent paper by Bekker (2002).

18. For further discussion of Monte Carlo test methods, see Dufour and Khalaf (2001) and Dufour (2002).

7.2. Nonlinear models

It is relatively difficult to characterize identification and study its consequences in nonlinear structural models. But problems similar to those noted for linear SEM do arise. Nonlinear structural models typically take the form:

    f_t(y_t, x_t, θ) = u_t ,   E_θ[u_t | Z_t] = 0 ,   t = 1, . . . , T,    (7.6)

where f_t(·) is a vector of possibly nonlinear relationships, y_t is a vector of endogenous variables, x_t is a vector of exogenous variables, θ is a vector of unknown parameters, Z_t is a vector of conditioning variables (or instruments) — usually with a number of additional distributional assumptions — and E_θ[·] refers to the expected value under a distribution with parameter value θ. In such models, θ can be viewed as identifiable if there is only one value of θ [say, θ = θ̄] that satisfies (7.6), and we have weak identification (or weak instruments) when the expected values E_θ̄[f_t(y_t, x_t, θ) | Z_t], t = 1, . . . , T, are "weakly sensitive" to the value of θ.

Research on weak identification in nonlinear models remains scarce. Nonlinearity makes it difficult to construct finite-sample procedures even in models where identification difficulties do not occur. So it is not surprising that work in this area has been based mostly on large-sample approximations. Stock and Wright (2000) studied the asymptotic distributions of GMM-based estimators and test statistics under conditions of weak identification (and weak "high level" asymptotic distributional assumptions). While GMM estimators of θ have nonstandard asymptotic distributions, the objective function minimized by the GMM procedure follows an asymptotic distribution which is unaffected by the presence of weak instruments: it is asymptotically pivotal. So tests and confidence sets based on the objective function can be asymptotically valid irrespective of the presence of weak instruments. These results are achieved for the full parameter vector θ, i.e. for hypotheses of the form θ = θ_0 and the corresponding joint confidence sets. This is not surprising: parametric nonseparability arises here for two reasons, model nonlinearity and the possibility of non-identification. Of course, once a joint confidence set for θ has been built, inference on individual parameters can be drawn via projection methods. Other contributions in this area include papers by Kleibergen (2001a, 2003), who proposed an extension of the K(β_0) test, and by Wright (2002, 2003), who proposed tests of underidentification and identification.
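The objective-function approach can be sketched for a simple nonlinear moment condition: invert an AR-type GMM objective, whose null distribution at the true parameter is asymptotically χ²(q) regardless of instrument strength, over a grid of θ values. The moment function, instruments and design below are hypothetical choices of ours:

```python
import numpy as np
from scipy import stats

def s_stat(u, Z):
    """AR-type GMM objective: T * gbar' W^{-1} gbar, where gbar averages
    the moment contributions g_t = Z_t u_t and W is their sample
    covariance. Asymptotically chi2(q) at the true theta under the null,
    whether or not the instruments are weak."""
    T, q = Z.shape
    g = Z * u[:, None]                  # T x q matrix of moment terms
    gbar = g.mean(axis=0)
    W = np.cov(g, rowvar=False)
    return T * gbar @ np.linalg.solve(W, gbar)

# Confidence set for a scalar theta in a hypothetical nonlinear moment
# u_t(theta) = y_t - exp(theta * x_t), obtained by test inversion:
rng = np.random.default_rng(8)
T = 500
x = rng.normal(size=T)
theta_true = 0.3
y = np.exp(theta_true * x) + rng.normal(size=T)
Z = np.column_stack([np.ones(T), x, x ** 2])
crit = stats.chi2.ppf(0.95, Z.shape[1])
conf_set = [th for th in np.linspace(-0.5, 1.0, 151)
            if s_stat(y - np.exp(th * x), Z) <= crit]
```

When the instruments carry little information about θ, the resulting set is simply wide (possibly the whole grid), rather than misleadingly short, which is the point of inverting the objective function instead of using Wald-type intervals.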

In view of the discussion in section 4, the fact that all these methods are based on large-sample approximations without a finite-sample theory remains a concern. However, a first attempt at deriving finite-sample procedures is available in Dufour and Taamouti (2001b). Under parametric assumptions on the errors, the hypothesis θ = θ_0 is tested by testing γ = 0 in an auxiliary regression of the form:

    f_t(y_t, x_t, θ_0) = z_t(θ_0, θ_1)γ + ε_t ,   t = 1, . . . , T,    (7.7)

where the z_t(θ_0, θ_1) are instruments chosen in a way that maximizes power against a reference alternative (point-optimal instruments). One gets in this way point-optimal tests [see King (1988) and Dufour and King (1991)]. Inference on nonlinear regressions is also covered by this setup. As in the case of linear SEM, sample-split techniques may be exploited to approximate optimal instruments, and projection methods can be used to draw inference on subvectors of θ.

8. Conclusion

By way of conclusion, we will summarize the main points made in this paper.

1. There are basic pitfalls and limitations faced in developing inference procedures in econometrics. If we are not careful, we can easily be led into ill-defined problems and find ourselves:

(a) trying to test a non-testable hypothesis, such as an hypothesis on a moment in the context of an insufficiently restrictive nonparametric model, or an hypothesis (e.g., a unit root hypothesis) on a dynamic model while allowing a dynamic structure with an unlimited (not necessarily infinite) number of parameters;

(b) trying to solve an inference problem using a technique that cannot deliver a solution because of the very structure of the technique, as in (i) testing an hypothesis on a mean (or median) under heteroskedasticity of unknown form, via standard least-squares-based "heteroskedasticity-robust" standard errors, or (ii) building a confidence interval for a parameter which is not identifiable in a structural model, via the usual technique based on standard errors. In particular, this type of difficulty arises for Wald-type statistics in the presence of weak instruments (or weakly identified models).

2. In many econometric problems (such as inference on structural models), several of the intuitions derived from the linear regression model and standard asymptotic theory can easily be misleading.

(a) Standard errors do not constitute a valid way of assessing parameter uncertainty and

building conﬁdence intervals.

(b) In many models, such as structural models where parameters may be underidentiﬁed,

individual parameters in statistical models are not generally meaningful, but parameter

vectors can be (at least in parametric models). We called this phenomenon parametric

nonseparability. As a result, restrictions on individual coefﬁcients may not be testable,

while restrictions on the whole parameter vector are. This feature should play a central

role in designing methods for dealing with weakly identiﬁed models.

3. The above difficulties underscore the pitfalls of large-sample approximations, which are typically based on pointwise (rather than uniform) convergence results and may be arbitrarily inaccurate in finite samples.

4. Concerning solutions to such problems, and more specifically in the context of weakly identified models, we have emphasized the following points.

(a) In accordance with basic statistical theory, one should always look for pivots as the

fundamental ingredient for building tests and conﬁdence sets.

(b) Pivots are not generally available for individual parameters, but they can be obtained in

a much wider set of cases for appropriately selected vectors of parameters.

(c) Given a pivot for a parameter vector, we can construct valid tests and conﬁdence sets

for the parameter vector.

(d) Inference on individual coefﬁcients may then be derived through projection methods.

5. In the speciﬁc example of SEM, the following general remarks are in our view important.

(a) Besides being pivotal, the AR statistic enjoys several remarkable robustness properties,

such as robustness to the presence of weak instruments, to excluded instruments or to

the speciﬁcation of a model for the endogenous explanatory variables.

(b) It is possible to improve the power of AR-type procedures (especially by reducing the number of instruments), but power improvements may come at the expense of using a possibly unreliable large-sample approximation or losing robustness (such as robustness to excluded instruments). As usual, there is a trade-off between power (which is typically increased by considering more restrictive models) and robustness (which involves considering a wider hypothesis).

(c) Trying to adapt and improve AR-type procedures (without ever forgetting basic statistical principles) constitutes the most promising avenue for dealing with weak instruments.


References

ABDELKHALEK, T., AND J.-M. DUFOUR (1998): "Statistical Inference for Computable General Equilibrium Models, with Application to a Model of the Moroccan Economy," Review of Economics and Statistics, LXXX, 520–534.

ANDERSON, T. W., AND H. RUBIN (1949): “Estimation of the Parameters of a Single Equation in

a Complete System of Stochastic Equations,” Annals of Mathematical Statistics, 20, 46–63.

ANGRIST, J. D., AND A. B. KRUEGER (1991): “Does Compulsory School Attendance Affect

Schooling and Earning?,” Quarterly Journal of Economics, CVI, 979–1014.

ANGRIST, J. D., AND A. B. KRUEGER (1995): “Split-Sample Instrumental Variables Estimates of

the Return to Schooling,” Journal of Business and Economic Statistics, 13, 225–235.

BAHADUR, R. R., AND L. J. SAVAGE (1956): “The Nonexistence of Certain Statistical Procedures

in Nonparametric Problems,” Annals of Mathematical Statistics, 27, 1115–1122.

BANERJEE, A., J. DOLADO, J. W. GALBRAITH, AND D. F. HENDRY (1993): Co-Integration, Error Correction, and the Econometric Analysis of Non-Stationary Data. Oxford University Press Inc., New York.

BASMANN, R. L. (1957): “A General Classical Method of Linear Estimation of Coefﬁcients in a

Structural Equation,” Econometrica, 25, 77–83.

BEKKER, P. A. (2002): "Symmetry-Based Inference in an Instrumental Variable Setting," Discussion paper, Department of Economics, University of Groningen, Groningen, The Netherlands.

BEKKER, P. A., AND F. KLEIBERGEN (2001): “Finite-Sample Instrumental Variables Inference

Using an Asymptotically Pivotal Statistic,” Discussion Paper TI 2001-055/4, Tinbergen Institute,

University of Amsterdam, Amsterdam, The Netherlands.

BEKKER, P. A., A. MERCKENS, AND T. J. WANSBEEK (1994): Identiﬁcation, Equivalent Models,

and Computer Algebra. Academic Press, Boston.

BLOUGH, S. R. (1992): “The Relationship between Power and Level for Generic Unit Root Tests

in Finite Samples,” Journal of Applied Econometrics, 7, 295–308.

BOUND, J., D. A. JAEGER, AND R. BAKER (1993): “The Cure can be Worse than the Disease:

A Cautionary Tale Regarding Instrumental Variables,” Technical Working Paper 137, National

Bureau of Economic Research, Cambridge, MA.

BOUND, J., D. A. JAEGER, AND R. M. BAKER (1995): “Problems With Instrumental Variables

Estimation When the Correlation Between the Instruments and the Endogenous Explanatory

Variable Is Weak,” Journal of the American Statistical Association, 90, 443–450.

BUSE, A. (1992): “The Bias of Instrumental Variables Estimators,” Econometrica, 60, 173–180.


CAMPBELL, B., AND J.-M. DUFOUR (1995): “Exact Nonparametric Orthogonality and Random

Walk Tests,” Review of Economics and Statistics, 77, 1–16.

(1997): “Exact Nonparametric Tests of Orthogonality and Random Walk in the Presence

of a Drift Parameter,” International Economic Review, 38, 151–173.

CHAO, J., AND N. R. SWANSON (2001): “Bias and MSE Analysis of the IV Estimator Under Weak

Identiﬁcation with Application to Bias Correction,” Discussion paper, Department of Economics,

Rutgers University, New Brunswick, New Jersey.

(2003): “Alternative Approximations of the Bias and MSE of the IV Estimator Under

Weak Identiﬁcation With an Application to Bias Correction,” Discussion paper, Department of

Economics, Rutgers University, New Brunswick, New Jersey.

CHOI, I., AND P. C. B. PHILLIPS (1992): “Asymptotic and Finite Sample Distribution Theory for

IV Estimators and Tests in Partially Identiﬁed Structural Equations,” Journal of Econometrics,

51, 113–150.

COCHRANE, J. H. (1991): “A Critique of the Application of Unit Root Tests,” Journal of Economic

Dynamics and Control, 15, 275–284.

CUSHING, M. J., AND M. G. MCGARVEY (1999): “Covariance Matrix Estimation,” in Mátyás

(1999), chap. 3, pp. 63–95.

DAVIDSON, R., AND J. G. MACKINNON (1993): Estimation and Inference in Econometrics. Ox-

ford University Press, New York.

DONALD, S. G., AND W. K. NEWEY (2001): “Choosing the Number of Instruments,” Economet-

rica, 69, 1161–1191.

DUFOUR, J.-M. (1981): “Rank Tests for Serial Dependence,” Journal of Time Series Analysis, 2,

117–128.

(1990): “Exact Tests and Conﬁdence Sets in Linear Regressions with Autocorrelated Er-

rors,” Econometrica, 58, 475–494.

(1997): “Some Impossibility Theorems in Econometrics, with Applications to Structural

and Dynamic Models,” Econometrica, 65, 1365–1389.

(2000): “Économétrie, théorie des tests et philosophie des sciences,” in Présentations

de l’Académie des lettres et des sciences humaines, vol. 53, pp. 166–182. Royal Society of

Canada/Société royale du Canada, Ottawa.

(2001): “Logique et tests d’hypothèses: réﬂexions sur les problèmes mal posés en

économétrie,” L’Actualité économique, 77(2), 171–190.


(2002): “Monte Carlo Tests with Nuisance Parameters: A General Approach to Finite-

Sample Inference and Nonstandard Asymptotics in Econometrics,” Journal of Econometrics,

forthcoming.

DUFOUR, J.-M., M. HALLIN, AND I. MIZERA (1998): “Generalized Runs Tests for Heteroskedas-

tic Time Series,” Journal of Nonparametric Statistics, 9, 39–86.

DUFOUR, J.-M., AND J. JASIAK (1993): “Finite Sample Inference Methods for Simultaneous

Equations and Models with Unobserved and Generated Regressors,” Discussion paper, C.R.D.E.,

Université de Montréal, 38 pages.

DUFOUR, J.-M., AND J. JASIAK (2001): “Finite Sample Limited Information Inference Methods

for Structural Equations and Models with Generated Regressors,” International Economic Re-

view, 42, 815–843.

DUFOUR, J.-M., AND L. KHALAF (2000): “Simulation-Based Finite-Sample Inference in Simul-

taneous Equations,” Discussion paper, C.R.D.E., Université de Montréal.

(2001): “Monte Carlo Test Methods in Econometrics,” in Companion to Theoretical Econo-

metrics, ed. by B. Baltagi, Blackwell Companions to Contemporary Economics, chap. 23, pp.

494–519. Basil Blackwell, Oxford, U.K.

(2002): “Simulation Based Finite and Large Sample Tests in Multivariate Regressions,”

Journal of Econometrics, 111(2), 303–322.

DUFOUR, J.-M., AND M. L. KING (1991): “Optimal Invariant Tests for the Autocorrelation Coef-

ﬁcient in Linear Regressions with Stationary or Nonstationary AR(1) Errors,” Journal of Econo-

metrics, 47, 115–143.

DUFOUR, J.-M., AND J. F. KIVIET (1998): “Exact Inference Methods for First-Order Autoregres-

sive Distributed Lag Models,” Econometrica, 66, 79–104.

DUFOUR, J.-M., AND M. TAAMOUTI (2000): “Projection-Based Statistical Inference in Linear

Structural Models with Possibly Weak Instruments,” Discussion paper, C.R.D.E., Université de

Montréal.

(2001a): “On Methods for Selecting Instruments,” Discussion paper, C.R.D.E., Université

de Montréal.

(2001b): “Point-Optimal Instruments and Generalized Anderson-Rubin Procedures for

Nonlinear Models,” Discussion paper, C.R.D.E., Université de Montréal.

DUFOUR, J.-M., AND O. TORRÈS (1998): “Union-Intersection and Sample-Split Methods in

Econometrics with Applications to SURE and MA Models,” in Handbook of Applied Economic

Statistics, ed. by D. E. A. Giles, and A. Ullah, pp. 465–505. Marcel Dekker, New York.

(2000): “Markovian Processes, Two-Sided Autoregressions and Exact Inference for Sta-

tionary and Nonstationary Autoregressive Processes,” Journal of Econometrics, 99, 255–289.


ENGLE, R. F., AND D. L. MCFADDEN (eds.) (1994): Handbook of Econometrics, Volume 4.

North-Holland, Amsterdam.

FAUST, J. (1996): “Near Observational Equivalence Problems and Theoretical Size Problems with

Unit Root Tests,” Econometric Theory, 12, 724–732.

FAUST, J. (1999): “Theoretical conﬁdence level problems with conﬁdence intervals for the spec-

trum of a time series,” Econometrica, 67, 629–637.

FISHER, F. M. (1976): The Identiﬁcation Problem in Econometrics. Krieger Publishing Company,

Huntington (New York).

GLESER, L. J., AND J. T. HWANG (1987): “The Nonexistence of 100(1 − α) Conﬁdence Sets of

Finite Expected Diameter in Errors in Variables and Related Models,” The Annals of Statistics,

15, 1351–1362.

GODAMBE, V. P. (1960): “An Optimum Property of Regular Maximum Likelihood Estimation,”

The Annals of Mathematical Statistics, 31, 1208–1212, Acknowledgement 32 (1960), 1343.

GRILICHES, Z., AND M. D. INTRILLIGATOR (eds.) (1983): Handbook of Econometrics, Volume

1. North-Holland, Amsterdam.

HAHN, J., AND J. HAUSMAN (2002a): “A New Speciﬁcation Test for the Validity of Instrumental

Variables,” Econometrica, 70, 163–189.

(2002b): “Notes on Bias in Estimators for Simultaneous Equation Models,” Economics

Letters, 75, 237–241.

(2002c): “Weak Instruments: Diagnosis and Cures in Empirical Econometrics,” Discus-

sion paper, Department of Economics, Massachusetts Institute of Technology, Cambridge, Mas-

sachusetts.

HAHN, J., J. HAUSMAN, AND G. KUERSTEINER (2001): “Higher Order MSE of Jackknife 2SLS,”

Discussion paper, Department of Economics, Massachusetts Institute of Technology, Cambridge,

Massachusetts.

HALL, A. R. (1999): “Hypothesis Testing in Models Estimated by GMM,” in Mátyás (1999),

chap. 4, pp. 96–127.

HALL, A. R., AND F. P. M. PEIXE (2000): “A Consistent Method for the Selection of Relevant

Instruments,” Discussion paper, Department of Economics, North Carolina State University,

Raleigh, North Carolina.

HALL, A. R., G. D. RUDEBUSCH, AND D. W. WILCOX (1996): “Judging Instrument Relevance

in Instrumental Variables Estimation,” International Economic Review, 37, 283–298.

HANSEN, L. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,”

Econometrica, 50, 1029–1054.


HILLIER, G. H. (1990): “On the Normalization of Structural Equations: Properties of Direction

Estimators,” Econometrica, 58, 1181–1194.

HOROWITZ, J. L. (2001): “The Bootstrap and Hypothesis Tests in Econometrics,” Journal of

Econometrics, 100(1), 37–40.

HSIAO, C. (1983): “Identiﬁcation,” in Griliches and Intrilligator (1983), chap. 4, pp. 223–283.

IMBENS, G. W., AND C. F. MANSKI (2003): “Conﬁdence Intervals for Partially Identiﬁed Pa-

rameters,” Discussion paper, Department of Economics, University of California at Berkeley,

Berkeley, California.

KING, M. L. (1988): “Towards a Theory of Point Optimal Testing,” Econometric Reviews, 6, 169–

218, Comments and reply, 219-255.

KLEIBERGEN, F. (2001a): “Testing Parameters in GMM Without Assuming That They are Identi-

ﬁed,” Discussion Paper TI 01-067/4, Tinbergen Institute, Amsterdam, The Netherlands.

(2001b): “Testing Subsets of Structural Coefﬁcients in the IV Regression Model,” Discus-

sion paper, Department of Quantitative Economics, University of Amsterdam.

(2002a): “Pivotal Statistics for Testing Structural Parameters in Instrumental Variables

Regression,” Econometrica, 70(5), 1781–1803.

(2002b): “Two Independent Statistics That Test Location and Misspeciﬁcation and Add-

Up to the Anderson-Rubin Statistic,” Discussion paper, Department of Quantitative Economics,

University of Amsterdam.

(2003): “Expansions of GMM Statistics That Indicate their Properties under Weak and/or

Many Instruments,” Discussion paper, Department of Quantitative Economics, University of Am-

sterdam.

KLEIBERGEN, F., AND E. ZIVOT (2003): “Bayesian and Classical Approaches to Instrumental

Variable Regression,” Journal of Econometrics, 114(1), 29–72.

KOSCHAT, M. A. (1987): “A Characterization of the Fieller Solution,” The Annals of Statistics, 15,

462–468.

LEHMANN, E. L. (1986): Testing Statistical Hypotheses, 2nd edition. John Wiley & Sons, New

York.

LEHMANN, E. L., AND C. STEIN (1949): “On the Theory of some Non-Parametric Hypotheses,”

Annals of Mathematical Statistics, 20, 28–45.

MAASOUMI, E. (1992): “Fellow’s Opinion: Rules of Thumb and Pseudo-Science,” Journal of

Econometrics, 53, 1–4.


MACKINNON, J. G. (2002): “Bootstrap Inference in Econometrics,” Canadian Journal of Eco-

nomics, 35(4), 615–645.

MADDALA, G. S. (1974): “Some Small Sample Evidence on Tests of Signiﬁcance in Simultaneous

Equations Models,” Econometrica, 42, 841–851.

MADDALA, G. S., AND J. JEONG (1992): “On the Exact Small Sample Distribution of the Instru-

mental Variable Estimator,” Econometrica, 60, 181–183.

MADDALA, G. S., AND I.-M. KIM (1998): Unit Roots, Cointegration and Structural Change.

Cambridge University Press, Cambridge, U.K.

MANSKI, C. (1995): Identiﬁcation Problems in the Social Sciences. Harvard University Press,

Cambridge and London.

MANSKI, C. F. (2003): Partial Identiﬁcation of Probability Distributions, Springer Series in Statistics. Springer-Verlag, New York.

MÁTYÁS, L. (ed.) (1999): Generalized Method of Moments Estimation. Cambridge University

Press, Cambridge, U.K.

MCMANUS, D. A., J. C. NANKERVIS, AND N. E. SAVIN (1994): “Multiple Optima and Asymp-

totic Approximations in the Partial Adjustment Model,” Journal of Econometrics, 62, 91–128.

MOREIRA, M. J. (2001): “Tests With Correct Size When Instruments Can Be Arbitrarily Weak,”

Discussion paper, Department of Economics, Harvard University, Cambridge, Massachusetts.

(2003a): “A Conditional Likelihood Ratio Test for Structural Models,” Econometrica,

71(4), 1027–1048.

(2003b): “A General Theory of Hypothesis Testing in the Simultaneous Equations Model,”

Discussion paper, Department of Economics, Harvard University, Cambridge, Massachusetts.

MOREIRA, M. J., AND B. P. POI (2001): “Implementing Tests with Correct Size in the Simultane-

ous Equations Model,” The Stata Journal, 1(1), 1–15.

NELSON, C. R., AND R. STARTZ (1990a): “The Distribution of the Instrumental Variable Estimator

and its t-ratio When the Instrument is a Poor One,” Journal of Business, 63, 125–140.

(1990b): “Some Further Results on the Exact Small Sample Properties of the Instrumental Variable

Estimator,” Econometrica, 58, 967–976.

NEWEY, W. K., AND D. MCFADDEN (1994): “Large Sample Estimation and Hypothesis Testing,”

in Engle and McFadden (1994), chap. 36, pp. 2111–2245.

NEWEY, W. K., AND K. D. WEST (1987a): “Hypothesis Testing with Efﬁcient Method of Moments

Estimators,” International Economic Review, 28, 777–787.


(1987b): “A Simple, Positive Semi-Deﬁnite, Heteroskedasticity and Autocorrelation Con-

sistent Covariance Matrix,” Econometrica, 55, 703–708.

PERRON, B. (2003): “Semiparametric Weak Instrument Regressions with an Application to the

Risk Return Tradeoff,” Review of Economics and Statistics, 85(2), 424–443.

PHILLIPS, P. C. B. (1983): “Exact Small Sample theory in the Simultaneous Equations Model,” in

Griliches and Intrilligator (1983), chap. 8, pp. 449–516.

(1984): “The Exact Distribution of LIML: I,” International Economic Review, 25, 249–261.

(1985): “The Exact Distribution of LIML: II,” International Economic Review, 26, 21–36.

(1987): “Time Series Regression with a Unit Root,” Econometrica, 55(2), 277–301.

(1989): “Partially Identiﬁed Econometric Models,” Econometric Theory, 5, 181–240.

PHILLIPS, P. C. B., AND P. PERRON (1988): “Testing for a Unit Root in Time Series Regression,”

Biometrika, 75, 335–346.

POPPER, K. (1968): The Logic of Scientiﬁc Discovery. Harper Torchbooks, New York, revised edn.

PÖTSCHER, B. (2002): “Lower Risk Bounds and Properties of Conﬁdence Sets for Ill-Posed Esti-

mation Problems with Applications to Spectral Density and Persistence Estimation, Unit Roots

and Estimation of Long Memory Parameters,” Econometrica, 70(3), 1035–1065.

PRAKASA RAO, B. L. S. (1992): Identiﬁability in Stochastic Models: Characterization of Proba-

bility Distributions. Academic Press, New York.

PRATT, J. W., AND J. D. GIBBONS (1981): Concepts of Nonparametric Theory. Springer-Verlag,

New York.

ROTHENBERG, T. J. (1971): “Identiﬁcation in Parametric Models,” Econometrica, 39, 577–591.

(1984): “Approximating the Distributions of Econometric Estimators and Test Statistics,”

in Handbook of Econometrics, Volume 2, ed. by Z. Griliches, and M. D. Intrilligator, chap. 15,

pp. 881–935. North-Holland, Amsterdam.

SARGAN, J. D. (1983): “Identiﬁcation and Lack of Identiﬁcation,” Econometrica, 51, 1605–1633.

SHEA, J. (1997): “Instrument Relevance in Multivariate Linear Models: A Simple Measure,” Re-

view of Economics and Statistics, LXXIX, 348–352.

SIMS, C. (1971a): “Distributed Lag Estimation When the Parameter Space is Explicitly Inﬁnite-

Dimensional,” Annals of Mathematical Statistics, 42, 1622–1636.

SIMS, C. A. (1971b): “Discrete Approximations to Continuous Time Distributed Lags in Econo-

metrics,” Econometrica, 39, 545–563.


(2001): “Thinking About Instrumental Variables,” Discussion paper, Department of Eco-

nomics, Princeton University, Princeton, New Jersey.

STAIGER, D., AND J. H. STOCK (1997): “Instrumental Variables Regression with Weak Instru-

ments,” Econometrica, 65, 557–586.

STARTZ, R., C. R. NELSON, AND E. ZIVOT (2001): “Improved Inference for the Instrumental

Variable Estimator,” Discussion paper, Department of Economics, University of Washington.

STOCK, J. H. (1994): “Unit Roots, Structural Breaks and Trends,” in Engle and McFadden (1994),

chap. 46, pp. 2740–2841.

STOCK, J. H., AND F. TREBBI (2003): “Who Invented IV Regression?,” The Journal of Economic

Perspectives, forthcoming.

STOCK, J. H., AND J. H. WRIGHT (2000): “GMM with Weak Identiﬁcation,” Econometrica, 68,

1097–1126.

STOCK, J. H., J. H. WRIGHT, AND M. YOGO (2002): “A Survey of Weak Instruments and Weak

Identiﬁcation in Generalized Method of Moments,” Journal of Business and Economic Statistics,

20(4), 518–529.

STOCK, J. H., AND M. YOGO (2002): “Testing for Weak Instruments in Linear IV Regression,”

Discussion Paper 284, N.B.E.R., Harvard University, Cambridge, Massachusetts.

(2003): “Asymptotic Distributions of Instrumental Variables Statistics with Many Weak In-

struments,” Discussion paper, Department of Economics, Harvard University, Cambridge, Mas-

sachusetts.

TANAKA, K. (1996): Time Series Analysis: Nonstationary and Noninvertible Distribution Theory.

John Wiley & Sons, New York.

TAYLOR, W. E. (1983): “On the Relevance of Finite Sample Distribution Theory,” Econometric

Reviews, 2, 1–39.

THEIL, H. (1953): “Repeated Least-Squares Applied to Complete Equation Systems,” Discussion

paper, Central Planning Bureau, The Hague, The Netherlands.

WANG, J., AND E. ZIVOT (1998): “Inference on Structural Parameters in Instrumental Variables

Regression with Weak Instruments,” Econometrica, 66, 1389–1404.

WHITE, H. (1980): “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48, 817–838.

WORKING, E. J. (1927): “What Do Statistical Demand Curves Show?,” The Quarterly Journal of

Economics, 41, 212–235.


WRIGHT, J. H. (2002): “Testing the Null of Identiﬁcation in GMM,” Discussion Paper 732, Inter-

national Finance Discussion Papers, Board of Governors of the Federal Reserve System, Wash-

ington, D.C.

(2003): “Detecting Lack of Identiﬁcation in GMM,” Econometric Theory, 19(2), 322–330.

WRIGHT, P. G. (1928): The Tariff on Animal and Vegetable Oils. Macmillan, New York.

ZIVOT, E., R. STARTZ, AND C. R. NELSON (1998): “Valid Conﬁdence Intervals and Inference in

the Presence of Weak Instruments,” International Economic Review, 39, 1119–1144.

(2003): “Inference in Partially Identiﬁed Instrumental Variables Regression with Weak

Instruments,” Discussion paper, Department of Economics, University of Washington.


ABSTRACT

We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions — a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, and Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria for assessing alternative procedures.
Key words: hypothesis testing; confidence set; confidence interval; identification; testability; asymptotic theory; exact inference; pivotal function; nonparametric model; Bahadur-Savage; heteroskedasticity; serial dependence; unit root; simultaneous equations; structural model; instrumental variable; weak instrument; weak identification; simultaneous inference; projection; split-sample; conditional test; Monte Carlo test; bootstrap. JEL classification numbers: C1, C12, C14, C15, C3, C5.


RÉSUMÉ

Nous analysons les problèmes d’inférence associés à l’identiﬁcation et à la testabilité en économétrie, en soulignant la similarité entre les deux questions. Après une courte revue des notions statistiques requises, nous étudions tour à tour l’inférence dans les modèles non-paramétriques ainsi que les résultats récents sur les modèles structurels faiblement identiﬁés (ou les instruments faibles). Nous remarquons que beaucoup d’hypothèses, pour lesquelles des tests sont régulièrement proposés, ne sont pas en fait testables, tandis que plusieurs méthodes économétriques fréquemment utilisées sont fondamentalement inappropriées pour les modèles considérés. De telles situations conduisent à des problèmes statistiques mal posés et sont souvent associées à un emploi mal avisé de résultats distributionnels asymptotiques. Concernant les hypothèses non-paramétriques, nous analysons trois problèmes de base pour lesquels de telles difﬁcultés apparaissent: (1) tester une hypothèse sur un moment avec des restrictions trop faibles sur la forme de la distribution; (2) l’inférence avec hétéroscédasticité de forme non spéciﬁée; (3) l’inférence dans les modèles dynamiques avec un nombre illimité de paramètres. Concernant les modèles faiblement identiﬁés, nous insistons sur l’importance d’utiliser des fonctions pivotales — une condition qui n’est pas satisfaite par les méthodes usuelles de type Wald basées sur l’emploi d’écart-types — et nous passons en revue les développements récents dans ce domaine, en mettant l’accent sur la construction de test et régions de conﬁance valides. Les techniques considérées comprennent les différentes statistiques proposées, l’emploi de bornes, la subdivision d’échantillon, les techniques de projection, le conditionnement et les tests de Monte Carlo. 
Parmi les critères utilisés pour évaluer les procédures, nous insistons sur la possibilité de fournir une théorie distributionnelle à distance finie, sur la robustesse par rapport à la présence d’instruments faibles ainsi que sur la robustesse par rapport à la spécification d’un modèle pour les variables explicatives endogènes du modèle. Mots clés : test d’hypothèse; région de confiance; intervalle de confiance; identification; testabilité; théorie asymptotique; inférence exacte; fonction pivotale; modèle non-paramétrique; Bahadur-Savage; hétéroscédasticité; dépendance sérielle; racine unitaire; équations simultanées; modèle structurel; variable instrumentale; instrument faible; inférence simultanée; projection; subdivision d’échantillon; test conditionnel; test de Monte Carlo; bootstrap. Classification JEL : C1, C12, C14, C15, C3, C5.


Contents

List of Propositions and Theorems   iii
1. Introduction   1
2. Models   4
3. Statistical notions   5
   3.1. Hypotheses   5
   3.2. Test level and size   7
   3.3. Confidence sets and pivots   8
   3.4. Testability and identification   9
4. Testability, nonparametric models and asymptotic methods   9
   4.1. Procedures robust to nonnormality   9
   4.2. Procedures robust to heteroskedasticity of unknown form   12
   4.3. Procedures robust to autocorrelation of arbitrary form   13
5. Structural models and weak instruments   14
   5.1. Standard simultaneous equations model   15
   5.2. Statistical problems associated with weak instruments   16
   5.3. Characterization of valid tests and confidence sets   17
6. Approaches to weak instrument problems   18
   6.1. Anderson-Rubin statistic   18
   6.2. Projections and inference on parameter subsets   21
   6.3. Alternatives to the AR procedure   21
7. Extensions   26
   7.1. Multivariate regression, simulation-based inference and nonnormal errors   26
   7.2. Nonlinear models   28
8. Conclusion   29

List of Propositions and Theorems

4.1 Theorem: Mean non-testability in nonparametric models   10
4.2 Theorem: Characterization of heteroskedasticity robust tests   12
4.4 Theorem: Unit root non-testability in nonparametric models   13

1. Introduction

The main objective of econometrics is to supply methods for analyzing economic data, building models, and assessing alternative theories. Over the last 25 years, econometric research has led to important developments in many areas, such as: (1) new fields of applications linked to the availability of new data: financial data, micro-data, panels, qualitative variables; (2) new models: multivariate time series models, GARCH-type processes; (3) a greater ability to estimate nonlinear models which require an important computational capacity; (4) methods based on simulation: bootstrap, indirect inference, Markov chain Monte Carlo techniques; (5) methods based on weak distributional assumptions: nonparametric methods, asymptotic distributions based on “weak regularity conditions”; (6) discovery of various nonregular problems which require nonstandard distributional theories, such as unit roots and unidentified (or weakly identified) models.

An important component of this work is the development of procedures for testing hypotheses (or models). Indeed, a view widely held by both scientists and philosophers is that testability, or the formulation of testable hypotheses, constitutes a central feature of scientific activity — a view we share. With the exception of mathematics, it is not clear a discipline should be viewed as scientific if it does not lead to empirically testable hypotheses. But this requirement leaves open the question of formulating operational procedures for testing models and theories. To date, the only coherent — or, at least, the only well developed — set of methods are those supplied by statistical and econometric theory.

Last year, on the same occasion, MacKinnon (2002) discussed the use of simulation-based inference methods in econometrics, specifically bootstrapping, as a way of getting more reliable tests and confidence sets. In view of the importance of the issue, this paper also considers questions associated with the development of reliable inference procedures in econometrics. But our exposition will be, in a way, more specialized, and in another way, more general — and critical. Specifically, we shall focus on general statistical issues raised by identification in econometric models, and more specifically on weak instruments in the context of structural models [e.g., simultaneous equations models (SEM)]. We will find it useful to bring together two separate streams of literature: namely, results (from mathematical statistics and econometrics) on testability in nonparametric models,[1] and the recent econometric research on weak instruments. In particular, we shall emphasize that identification problems arise in both literatures and have similar consequences for econometric methodology. Further, the literature on nonparametric testability sheds light on various econometric problems and their solutions.

Simultaneous equations models (SEM) are related in a natural way to the concept of equilibrium postulated by economic theory, both in microeconomics and macroeconomics. So it is not surprising that SEM were introduced and most often employed in the analysis of economic data. Methods for estimating and testing such models constitute a hallmark of econometric theory and represent one of its most remarkable achievements. The problems involved are difficult, raising among various issues the possibility of observational equivalence between alternative parameter values (non-identification) and the use of instrumental variables (IV).

Notes:
[1] By a nonparametric model (or hypothesis), we mean a set of possible data distributions such that a distribution [e.g., the “true” distribution] cannot be singled out by fixing a finite number of parameter values.

Further, the finite-sample distributional theory of estimators and test statistics is very complex, so inference is typically based on large-sample approximations.[2] IV methods have become a routine part of econometric analysis and, despite a lot of loose ends (often hidden by asymptotic distributional theory), the topic of SEM was dormant until a few years ago.

Roughly speaking, an instrument should have two basic properties: first, it should be independent of (or, at least, uncorrelated with) the disturbance term in the equation of interest (exogeneity); second, it should be correlated with the included endogenous explanatory variables for which it is supposed to serve as an instrument (relevance). The exogeneity requirement has been well known from the very beginning of IV methods. The second one was also known from the theory of identification, but its practical importance was not well appreciated and often hidden from attention by lists of instruments relegated to footnotes (if not simply absent) in research papers. It returned to center stage with the discovery of so-called weak instruments, which can be interpreted as instruments with little relevance (i.e., weakly correlated with endogenous explanatory variables). Weak instruments lead to poor performance of standard econometric procedures and cases where they have pernicious effects may be difficult to detect.[3] Interest in the problem also goes far beyond IV regressions and SEM, because it underscores the pitfalls in using large-sample approximations, as well as the importance of going back to basic statistical theory when developing econometric methods.

A parameter (or a parameter vector) in a model is not identified when it is not possible to distinguish between alternative values of the parameter. In parametric models, this is typically interpreted by stating that the postulated distribution of the data — as a function of the parameter vector (the likelihood function) — can be the same for different values of the parameter vector.[4] An important consequence of this sort of situation is a statistical impossibility: we cannot design a data-based procedure for distinguishing between equivalent parameter values (unless additional information is introduced). In nonparametric models, identification is more difficult to characterize because a likelihood function (involving a finite number of parameters) is not available and parameters are often introduced through more abstract techniques (e.g., functionals of distribution functions). But the central problem is the same: can we distinguish between alternative values of the parameter? So, quite generally, an identification problem can be viewed as a special form of non-testability:

• identification involves the possibility of distinguishing different parameter values on the basis of the corresponding data distributions, while
• testability refers to the possibility of designing procedures that can discriminate between subsets of parameter values.

For observationally equivalent parameter values, no reasonable test can be produced.[5] This will be discussed in greater detail below.

Notes:
[2] For reviews, see Phillips (1983) and Taylor (1983).
[3] Early papers which called attention to the problem include: Nelson and Startz (1990a, 1990b), Buse (1992), Maddala and Jeong (1992), Choi and Phillips (1992), and Bound, Jaeger, and Baker (1993, 1995).
[4] For general expositions of the theory of identification in econometrics and statistics, the reader may consult Rothenberg (1971), Fisher (1976), Hsiao (1983), Prakasa Rao (1992), Bekker, Merckens, and Wansbeek (1994) and Manski (1995, 2003).
[5] By a reasonable test, we mean here a test that both satisfies a level constraint and may have power superior to the level when the tested hypothesis (the null hypothesis) does not hold.
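The statistical impossibility just described can be made concrete with a toy example of our own (not from the paper): if a model depends on two parameters only through their product, the likelihood function takes exactly the same value at (b1, b2) = (1, 2) and (2, 1) for every possible sample, so no data-based procedure can separate these parameter points.

```python
import numpy as np

# Toy non-identified model (illustration only): y_t = b1*b2 + u_t, u_t ~ N(0, 1).
# Only the product b1*b2 is identified, so (b1, b2) = (1, 2) and (2, 1)
# imply the same data distribution.

def gaussian_loglik(y, b1, b2):
    """Log-likelihood of i.i.d. N(b1*b2, 1) observations."""
    mu = b1 * b2
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=100)   # data generated with b1*b2 = 2

print(np.isclose(gaussian_loglik(y, 1.0, 2.0),
                 gaussian_loglik(y, 2.0, 1.0)))  # True: observationally equivalent
```

Any test of b1 = 1 against b1 = 2 would need to extract information from the data, but the two likelihoods coincide identically; this is the sense in which equivalent parameter values cannot be distinguished unless additional information is introduced.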

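The pernicious effects of weak instruments can be illustrated with a small Monte Carlo sketch (the design and all numbers are our own arbitrary choices, not from the paper): with completely irrelevant instruments and strong endogeneity, the conventional 2SLS t-test of the true coefficient rejects far more often than its nominal 5% level.

```python
import numpy as np

# Monte Carlo sketch (illustrative): with irrelevant instruments (pi = 0) and
# strong endogeneity, the standard 2SLS Wald t-test of the *true* beta rejects
# far more often than its nominal 5% level.

rng = np.random.default_rng(7)
T, k, beta, rho = 100, 3, 0.5, 0.95
reps = 1000

rejections = 0
for _ in range(reps):
    Z = rng.normal(size=(T, k))                              # instruments
    u = rng.normal(size=T)                                   # structural error
    v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=T)   # corr(u, v) = rho
    Y = v                          # instruments completely irrelevant for Y
    y = beta * Y + u
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)                    # projection on Z
    b2sls = (Y @ P @ y) / (Y @ P @ Y)                        # 2SLS estimate
    resid = y - b2sls * Y
    se = np.sqrt((resid @ resid / T) / (Y @ P @ Y))          # conventional 2SLS s.e.
    if abs((b2sls - beta) / se) > 1.96:
        rejections += 1

print(round(rejections / reps, 2))  # rejection rate of a nominal 5% test: huge
```

The over-rejection here is not a small-sample wrinkle that more data would cure: with irrelevant instruments, the usual asymptotic argument behind the t-test breaks down entirely.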
3 . we review the inferential issues linked to the possible presence of weak instruments in structural models. many models and hypotheses are formulated in ways that make them fundamentally nontestable. (b) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form.. the following points will be stressed: 1. a problem of non-testability can be viewed as a form of non-identiﬁcation. while quantiles are. some parameters tend to be non-testable (badly identiﬁed) in nonparametric models while others are not. the following points will be emphasized: 1. these phenomena underscore parametric nonseparability problems: statements about a given parameter (often interpreted as the parameter of interest) are not empirically meaningful without supplying information about other parameters (often called nuisance parameters). 3. 2.d. by reducing the number of parameters).g.Alternatively. and conversely identiﬁcation problems can be eliminated by transforming the parameter space (e.g. medians). in particular. especially in nonparametric setups.g. in accordance with basic statistical theory. this tends to be the case in nonparametric setups. from this viewpoint. such as: (a) testing an hypothesis about a mean when the observations are independent and identically distributed (i. Furthermore. but hypotheses that set both the parameter of interest and some nuisance parameters may well be testable in such circumstances.i. it is well known that one can create a non-identiﬁed model by introducing redundant parameters.. variances) while it does not for quantiles (e. inappropriate choices of parameter representations. we will review the associated problems and proposed solutions. moments are not a good way of representing the properties of distributions in nonparametric setups. under nonparametric assumptions. one should always look for pivots as the fundamental ingredient for building tests and conﬁdence sets. These problems are closely related. 5. 
In this paper, we pursue two main objectives: first, we analyze the statistical problems associated with non-identification within the broader context of testability; second, we review the inferential issues linked to the possible presence of weak instruments in structural models, with an emphasis on finite-sample properties and the development of tests and confidence sets which are robust to the presence of weak instruments. More precisely, regarding the issue of testability, the following points will be emphasized:

1. Many models and hypotheses are formulated in ways that make them fundamentally nontestable; in particular, this tends to be the case in nonparametric setups.

2. Such difficulties arise in basic, apparently well-defined problems, such as: (a) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (b) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (c) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large.

3. These phenomena underscore parametric nonseparability problems: statements about a given parameter (often interpreted as the parameter of interest) are not empirically meaningful without supplying information about other parameters (often called nuisance parameters); but hypotheses that set both the parameter of interest and some nuisance parameters may well be testable in such circumstances.

4. In nonparametric setups, some parameters tend to be non-testable (badly identified) while others are not. In particular, non-testability easily occurs for moments (e.g., means, variances) while it does not for quantiles (e.g., medians): under nonparametric assumptions, moments are not a good way of representing the properties of distributions, while quantiles are.

5. To the extent that asymptotic distributional theory is viewed as a way of producing statistical methods which are valid under "weak" distributional assumptions, it is fundamentally misleading, because such approximations can be arbitrarily bad in finite samples.

Identification and testability problems are closely related. It is well known that one can create a non-identified model by introducing redundant parameters, and conversely that identification problems can be eliminated by transforming the parameter space (e.g., by reducing the number of parameters): problems of non-identification (or underidentification) are associated with bad parameterizations, i.e. inappropriate choices of parameter representations. The development of appropriate inference procedures should therefore start from a joint approach. Concerning weak instruments, we will review the associated problems and proposed solutions, and the following points will be stressed:

1. In accordance with basic statistical theory, one should always look for pivots as the fundamental ingredient for building tests and confidence sets; this principle appears to be especially important when identification problems are present.

2. Weak instrument problems underscore in a striking way the limitations of large-sample arguments for deriving and evaluating inference procedures.

3. Parametric nonseparability arises in striking ways when some parameters may not be identified, so that proper pivots may easily involve many more parameters than the parameter of interest; this also indicates that the common distinction between parameters of interest and nuisance parameters can be quite arbitrary.

4. Despite this, very few informative pivotal functions have been proposed in the context of simultaneous equations models.

5. Important additional criteria for evaluating procedures in such contexts include various forms of invariance (or robustness), such as: (a) robustness to weak instruments; (b) robustness to instrument exclusion; (c) robustness to the specification of the model for the endogenous explanatory variables in the equation(s) of interest.

6. The early statistic proposed by Anderson and Rubin (1949, AR) constitutes one of the (very rare) truly pivotal functions proposed for SEM; furthermore, it satisfies all the invariance properties listed above, so that it may reasonably be viewed as a fundamental building block for developing reliable inference procedures in the presence of weak instruments.

7. A fairly complete set of inference procedures that allow one to produce tests and confidence sets for all model parameters can be obtained through projection techniques.

8. Various extensions and improvements over the AR method are possible, especially in improving power; it is important to note, however, that these often come at the expense of using large-sample approximations or giving up robustness.

The literature on weak instruments is growing rapidly, and we cannot provide here a complete review; we refer the reader to the excellent survey recently published by Stock, Wright, and Yogo (2002). In particular, we will not discuss in any detail results on estimation or the detection of weak instruments.

The paper is organized as follows. In the next two sections, we review succinctly some basic notions concerning models (section 2) and statistical theory (section 3), which are important for our discussion. In section 4, we study testability problems in nonparametric models. In section 5, we review the statistical difficulties associated with weak instruments. In section 6, we examine a number of possible solutions in the context of linear SEM, while extensions to nonlinear or non-Gaussian models are considered in section 7. We conclude in section 8.

2. Models

The purpose of econometric analysis is to develop mathematical representations of data, which we call models or hypotheses (models subject to restrictions).

An hypothesis should have two basic features:

1. It must restrict the expected behavior of observations: it must be informative. A non-restrictive hypothesis says nothing and, consequently, does not teach us anything: it is empirically empty, void of empirical content. The more restrictive a model is, the more informative it is, and the more interesting it is.

2. It must be compatible with available data; ideally, we would like it to be true.

However, these two criteria are not always compatible:

1. The information criterion suggests the use of parsimonious models that usually take the form of parametric models based on strong assumptions. Note that the information criterion is emphasized by an influential view in philosophy of science which stresses falsifiability as a criterion for the scientific character of a theory [Popper (1968)].

2. Compatibility with observed data is most easily satisfied by vague models which impose few restrictions. Vague models may take the form of parametric models with a large number of free parameters or nonparametric models which involve an infinite set of free parameters and thus allow for weak assumptions.

Models can be classified as being either deterministic or stochastic. Deterministic models, which claim to make arbitrarily precise predictions, are highly falsifiable but always inconsistent with observed data. Accordingly, most models used in econometrics are stochastic. Such models are unverifiable: as with any theory that makes an indefinite number of predictions, we can never be sure that the model will not be put in jeopardy by new data. Moreover, they are logically unfalsifiable: in contrast with deterministic models, a probabilistic model is usually logically compatible with all possible observation vectors. Given these facts, it is clear that any criterion for assessing whether an hypothesis is acceptable must involve a conventional aspect. The purpose of hypothesis testing theory is to supply a coherent framework for accepting or rejecting probabilistic hypotheses; it is a probabilistic adaptation of the falsification principle. For further discussion of the issues raised in this section, the reader may consult Dufour (2000).

3. Statistical notions

In this section, we review succinctly basic statistical notions which are essential for understanding the rest of our discussion. The general outlook follows modern statistical testing theory, derived from the Neyman-Pearson approach and described in standard textbooks, such as Lehmann (1986).

3.1. Hypotheses

Consider an observational experiment whose result can be represented by a vector of observations

X(n) = (X1, ..., Xn)   (3.1)

where Xi takes real values, and let

F̄(x) = F(x1, ..., xn) = P[X1 ≤ x1, ..., Xn ≤ xn]   (3.2)

be its distribution, where x = (x1, ..., xn). We denote by Fn the set of possible distribution functions on R^n [F̄ ∈ Fn].

For various reasons, we prefer to represent distributions in terms of parameters. There are two ways of introducing parameters in a model. The first is to define a function from a space of probability distributions to a vector in some Euclidean space:

θ : Fn −→ R^p .   (3.3)

Such functions are also called functionals. Examples of such parameters include: the moments of a distribution (mean, variance, kurtosis, etc.) and its quantiles (median, quartiles, etc.). The second approach is to define a family of distribution functions which are indexed by a parameter vector θ:

F(x) = F0(x | θ)   (3.4)

where F0 is a distribution function with a specific form. For example, if F0(x | θ) represents a Gaussian distribution with mean µ and variance σ² [e.g., corresponding to a Gaussian law], we have θ = (µ, σ²). A model is parametric if the distribution of the data is specified up to a finite number of (scalar) parameters; otherwise, it is nonparametric.

An hypothesis H0 on X(n) is an assertion

H0 : F̄ ∈ H0 ,   (3.5)

where H0 is a subset of Fn, the set of all possible distributions. The set H0 may contain a single distribution (simple hypothesis) or several distributions (composite hypothesis). H0 may be interpreted as follows: there is at least one distribution in H0 that can be viewed as a representation compatible with the observed "behavior" of X(n). Then we can say that:

H0 is acceptable ⟺ (∃ F ∈ H0) F is acceptable   (3.6)

or, equivalently,

H0 is unacceptable ⟺ (∀ F ∈ H0) F is unacceptable .   (3.7)

In particular, H0 often takes the following form:

H0 ≡ {F(·) : F(x) = F0(x | θ1, θ2) and θ1 = θ1^0}   (3.8)

where θ = (θ1, θ2). In such a case, we call θ1 the parameter of interest, and θ2 a nuisance parameter: the parameter of interest is set by H0, but the nuisance parameter remains unknown. We usually abbreviate this as:

H0 : θ1 = θ1^0 .   (3.9)

It is important to note here that showing that H0 is unacceptable requires one to show that all distributions in H0 are incompatible with the observed data.

3.2. Test level and size

A test for H0 is a rule by which one decides to reject or accept the hypothesis (or to view it as incompatible with the data). It usually takes the form:

reject H0 if Sn(X1, ..., Xn) > c ,   (3.10)
do not reject H0 if Sn(X1, ..., Xn) ≤ c .   (3.11)

The test has level α iff

PF[Rejecting H0] ≤ α for all F ∈ H0   (3.12)

or, equivalently, sup_{F ∈ H0} PF[Rejecting H0] ≤ α, where PF[·] is the function (probability measure) giving the probability of an event when the data distribution function is F. The test has size α if

sup_{F ∈ H0} PF[Rejecting H0] = α .   (3.13)

H0 is testable if we can find a finite number c that satisfies the level restriction.

Probabilities of rejecting H0 for distributions outside H0 (i.e., for F ∉ H0) define the power function of the test. Power describes the ability of a test to detect a "false" hypothesis. Alternative tests are typically assessed by comparing their powers: between two tests with the same level, the one with the highest power against a given alternative distribution F ∉ H0 is deemed preferable (at least, under this particular alternative). More formally, the power function can be defined as the function

P(F) = PF[Rejecting H0] for F ∈ H1 \ H0 ,

where H1 is an appropriate subset of the set of all possible distributions Fn. Sometimes it is also defined on the set H1 ∪ H0, in which case it should satisfy the level constraint for F ∈ H0. We typically like to have a test with the highest possible power against "alternatives of interest".

As the set H0 gets larger, the test procedure must satisfy a bigger set of constraints: the larger is the set of distributions compatible with a null hypothesis, the stronger are the restrictions on the test procedure. In other words, the less restrictive an hypothesis is, the more restricted will be the corresponding test procedure. It is easy to understand that imposing a large set of restrictions on a test procedure may reduce its power against specific alternatives. There may be a point where the restrictions are no longer implementable, in the sense that no procedure which has some power can satisfy the level constraint: H0 is non-testable. In such a case, we have an ill-defined test problem. In a framework such as the one in (3.8), where we distinguish between a parameter of interest θ1 and a nuisance parameter θ2, this is typically due to heavy dependence of the distribution of Sn on the nuisance parameter θ2.

If the nuisance parameter θ2 is specified, we may be able to find a (finite) critical value c = c(α, θ2) that satisfies the level constraint (3.12). But, in ill-defined problems, c(α, θ2) depends heavily on θ2, so that

sup_{θ2} c(α, θ2) = ∞ ,

and it is not possible to find a useful (finite) critical value for testing H0. Besides, even if this is the case, the hypothesis H0 : (θ1, θ2) = (θ1^0, θ2^0), which fixes both θ1 and θ2, may be perfectly testable: only a complete specification of the vector (θ1, θ2) allows one to interpret the values taken by the test statistic Sn (nonseparability).

3.3. Confidence sets and pivots

If we consider an hypothesis of the form

H0(θ1^0) : θ1 = θ1^0   (3.14)

and if we can build a different test Sn(θ1^0; X1, ..., Xn) for each possible value of θ1^0, such that

sup_{θ2} P_{θ1^0, θ2}[Rejecting H0(θ1^0)] ≤ α ,   (3.15)

we can determine the set of values that can be viewed as compatible with the data according to the tests considered:

C = {θ1^0 : Sn(θ1^0; X1, ..., Xn) ≤ c(θ1^0)} .   (3.16)

Then

inf_{θ1, θ2} P[θ1 ∈ C] ≥ 1 − α ,   (3.17)

i.e. C is a confidence set with level 1 − α for θ1: the set C covers the "true" parameter value θ1 with probability at least 1 − α. The minimal probability of covering the true value of θ1, i.e. inf_{θ1, θ2} P[θ1 ∈ C], is called the size of the confidence set.

In practice, confidence regions (or confidence intervals) were made possible by the discovery of pivotal functions (or pivots): a pivot for θ1 is a function Sn(θ1; X1, ..., Xn) whose distribution does not depend on unknown parameters (nuisance parameters); in particular, the distribution does not depend on θ2. More generally, the function Sn(θ1; X1, ..., Xn) is boundedly pivotal if its distribution function may depend on θ but is bounded over the parameter space [see Dufour (1997)]. When we have a pivotal function (or a boundedly pivotal function), we can find a point c such that:

P[Sn(θ1; X1, ..., Xn) ≥ c] ≤ α , ∀θ1 .   (3.18)

For example, if X1, ..., Xn are i.i.d. N[µ, σ²], the t-statistic

tn(µ) = √n (X̄n − µ)/sX ,   (3.19)

where X̄n = Σ_{i=1}^{n} Xi/n and s²X = Σ_{i=1}^{n} (Xi − X̄n)²/(n − 1), follows a Student t(n − 1) distribution which does not depend on the unknown values of µ and σ; hence, it is a pivot. By contrast, √n (X̄n − µ) is not a pivot because its distribution depends on σ. More generally, in the classical linear model with several regressors, the t statistics for individual coefficients [say, t(β̂i) = √n (β̂i − βi)/σ̂_{β̂i}] constitute pivots because their distributions do not depend on unknown nuisance parameters; in particular, the values of the other regression coefficients disappear from the distribution.
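The pivotality of the t-statistic in (3.19) is easy to check directly: tn(µ) depends on a Gaussian sample only through standardized quantities, so its value is unchanged when µ and σ are shifted. The following sketch (an illustration in Python; the sample size, parameter values, and seed are arbitrary choices, not from the paper) draws samples from two very different (µ, σ) configurations using a common stream of standard normal shocks and verifies that the resulting t-statistics coincide.

```python
import math
import random

def t_stat(xs, mu):
    """Student t-statistic t_n(mu) = sqrt(n) * (xbar - mu) / s_X, as in (3.19)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)

def t_draws(mu, sigma, n=10, reps=1000, seed=42):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return the
    t-statistics computed at the true mean.  Fixing the seed fixes the
    underlying standard normals, so different (mu, sigma) settings reuse
    the same shocks z_i via x_i = mu + sigma * z_i."""
    rng = random.Random(seed)
    return [t_stat([rng.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(reps)]

# Two very different parameter configurations, same underlying shocks:
draws_a = t_draws(mu=0.0, sigma=1.0)
draws_b = t_draws(mu=5.0, sigma=10.0)

# t_n(mu) is location-scale invariant, so the draws agree (up to rounding):
# its distribution does not depend on (mu, sigma).  It is a pivot.
max_gap = max(abs(a - b) for a, b in zip(draws_a, draws_b))
print(max_gap)
```

The same experiment applied to √n (X̄n − µ) instead of tn(µ) would show the draws scaling with σ, confirming that the unstudentized statistic is not a pivot.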

3.4. Testability and identification

When formulating and trying to solve test problems, two types of basic difficulties can arise. First, there may be no valid test that satisfies reasonable properties [such as depending upon the data]: in such a case, we have a non-testable hypothesis, an empirically empty hypothesis. If an hypothesis is non-testable, we are not able to design a reasonable procedure for deciding whether it holds (without the introduction of additional data or information). Second, the proposed statistic may not be pivotal for the model considered: its distribution varies too much under the null hypothesis to determine a finite critical point satisfying the level restriction (3.18).

The first difficulty is closely related to the concept of identification in econometrics. A parameter θ is identifiable iff

θ(F1) = θ(F2) =⇒ F1 = F2 .   (3.20)

More generally, a parametric transformation g(θ) is identifiable iff

g[θ(F1)] = g[θ(F2)] =⇒ F1 = F2 .   (3.21)

Intuitively, these definitions mean that different values of the parameter imply different distributions of the data, so that we may expect to be able to "tell" the difference by looking at the data: for θ1 ≠ θ2, we can, in principle, design a procedure for deciding whether θ = θ1 or θ = θ2, so that the values of θ are testable. This is certainly the case when a unique distribution is associated with each parameter value [for example, we may use the Neyman-Pearson likelihood ratio test to make the decision], but this may not be the case when a parameter covers several distributions. For related discussions, see also Horowitz (2001), Maasoumi (1992) and Pötscher (2002).

4. Testability, nonparametric models and asymptotic methods

We will now discuss three examples of test problems that look perfectly well defined and sensible at first sight, but turn out to be ill-defined when we look at them more carefully. These include: (1) testing an hypothesis about a mean when the observations are independent and identically distributed (i.i.d.); (2) testing an hypothesis about a mean (or a median) with heteroskedasticity of unknown form; (3) testing the unit root hypothesis on an autoregressive model whose order can be arbitrarily large. Further discussion of the issues covered in this section is available in Dufour (2001).

4.1. Procedures robust to nonnormality

One of the most basic problems in econometrics and statistics consists in testing an hypothesis about a mean.

The problem of testing an hypothesis about a mean has a very well known and neat solution when the observations are independent and identically (i.i.d.) distributed according to a normal distribution: we can use a t test. The normality assumption, however, is often considered to be too "strong". So it is tempting to consider a weaker (less restrictive) version of this null hypothesis, under the general assumption that X1, ..., Xn are i.i.d., such as

H0(µ0) : X1, ..., Xn are i.i.d. observations with E(X1) = µ0 .   (4.1)

Here H0(µ0) is a nonparametric hypothesis, because the distribution of the data cannot be completely specified by fixing a finite number of parameters. The set of possible data distributions (or data generating processes) compatible with this hypothesis,

H(µ0) = {distribution functions Fn ∈ Fn such that H0(µ0) is satisfied} ,   (4.2)

is much larger here than in the Gaussian case and imposes very strong restrictions on the test. In other words, the probability of rejecting H0(µ0) should not exceed the level irrespective of how far the "true" mean is from µ0. Indeed, the set H(µ0) is so large that the following property must hold.

Theorem 4.1 (Mean non-testability in nonparametric models). If a test has level α for H0(µ0), i.e.

PFn[Rejecting H0(µ0)] ≤ α for all Fn ∈ H(µ0) ,   (4.3)

then, for any µ1 ≠ µ0,

PFn[Rejecting H0(µ0)] ≤ α for all Fn ∈ H(µ1) ;   (4.4)

further, if there is at least one value µ1 ≠ µ0 such that

PFn[Rejecting H0(µ0)] ≥ α for at least one Fn ∈ H(µ1) ,   (4.5)

then, for all µ,

PFn[Rejecting H0(µ0)] = α for all Fn ∈ H(µ) .   (4.6)

PROOF. See Bahadur and Savage (1956).

In other words [by (4.4)], the probability of rejecting H0(µ0) should not exceed the level irrespective of how far the "true" mean is from µ0; further [by (4.6)], if "by luck" the power of the test gets as high as the level, then the probability of rejecting should be uniformly equal to the level α. Here, the restrictions imposed by the level constraint are so strong that the test cannot have power exceeding its level: it should be insensitive to cases where the null hypothesis does not hold! An optimal test (say, at level .05) in such a problem can be run as follows: (1) ignore the data; (2) using a random number generator, produce a realization of a variable U according to a uniform distribution on the interval (0, 1), i.e. U ∼ U(0, 1); (3) reject H0 if U ≤ .05. Clearly, this is not an interesting procedure. It is also easy to see that a similar result will hold if we add various nonparametric restrictions on the distribution, such as a finite variance assumption. If the simplest versions of the problem have no reasonable solution, the situation will not improve when we consider more complex versions (as done routinely in econometrics).
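The force of Theorem 4.1 comes from the fact that H(µ0) contains distributions whose mean is held at µ0 by a large outlying component that almost never appears in a sample of size n. The simulation below (a hypothetical illustration in Python; the mixture weights, sample size, and the standard normal critical value 1.96 are all simplifying choices, not from the paper) shows the usual two-sided t-test rejecting H0 : E(X) = 0 most of the time under a distribution whose mean is exactly 0.

```python
import math
import random

def t_stat(xs, mu=0.0):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)

def draw_mixture(rng, n, delta=0.5, p=0.001):
    """With prob 1-p: N(delta, 1); with prob p: an atom placed so that the
    overall mean is exactly 0.  The atom is so rare that a typical sample
    of size n contains none of it and looks like an N(delta, 1) sample."""
    atom = -(1.0 - p) * delta / p   # ensures (1-p)*delta + p*atom = 0
    return [atom if rng.random() < p else rng.gauss(delta, 1.0)
            for _ in range(n)]

def rejection_rate(n=50, reps=2000, seed=7):
    rng = random.Random(seed)
    rejections = sum(abs(t_stat(draw_mixture(rng, n))) > 1.96
                     for _ in range(reps))
    return rejections / reps

rate = rejection_rate()
print(rate)  # far above the nominal 0.05, although E(X) = 0 is true
```

The t-test's rejection probability under this member of H(0) is close to one, which is exactly the behavior the level constraint (4.3) forbids: no test can simultaneously control its level over all of H(µ0) and retain power.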

The above theorem also implies that tests based on the "asymptotic distribution" of the usual t statistic for µ = µ0 [tn(µ0) defined in (3.19)] have size one under H0(µ0):

sup_{Fn ∈ H(µ0)} PFn[|tn(µ0)| > c] = 1   (4.7)

for any finite critical value c. In other words, procedures based on the asymptotic distribution of a test statistic have size that deviates arbitrarily from their nominal size. A way to interpret what happens here is through the distinction between pointwise convergence and uniform convergence. Suppose, to simplify, that the probability of rejecting H0(µ0) when it is true depends on a single nuisance parameter γ in the following way:

Pn(γ) ≡ Pγ[|tn(µ0)| > c] = 0.05 + (0.95) e^{−|γ| n} , where γ > 0 .   (4.8)

Then, for each value of γ, Pn(γ) converges to a level of 0.05,

lim_{n→∞} Pn(γ) = 0.05 ,   (4.9)

but the convergence is not uniform, and the size of the test is one for all sample sizes:

sup_{γ>0} Pn(γ) = 1 , for all n ,   (4.10)

so that the probability of rejection is arbitrarily close to one for γ sufficiently close to zero (for all sample sizes n). In other words, the test has level 0.05 pointwise (for each γ) asymptotically, but its size remains one.

Many other hypotheses lead to similar difficulties. Examples include:

1. hypotheses about various moments of Xt:

H0(σ²) : X1, ..., Xn are i.i.d. observations such that Var(Xt) = σ² ,
H0(µp) : X1, ..., Xn are i.i.d. observations such that E(Xt^p) = µp ;

2. most hypotheses on the coefficients of a regression (linear or nonlinear), a structural equation (as in SEM), or a more general estimating function [Godambe (1960)]:

H0(θ0) : gt(Zt, θ0) = ut , t = 1, ..., T , where u1, ..., uT are i.i.d.

In econometrics, models of the form H0(θ0) are typically estimated and tested through a variant of the generalized method of moments (GMM), usually with weaker assumptions on the distribution of u1, ..., uT; see Hansen (1982), Newey and West (1987a), Newey and McFadden (1994) and Hall (1999). To the extent that GMM methods are viewed as a way to allow for "weak assumptions", it follows from the above discussion that they constitute pseudo-solutions of ill-defined problems.
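The distinction between (4.9) and (4.10) is easy to verify numerically. In the sketch below (Python; the grid of n values and the rule γ = 1/n² are arbitrary illustrative choices), each fixed γ gives a rejection probability converging to 0.05 as n grows, yet for every n one can pick γ close enough to zero to push the rejection probability as close to one as desired.

```python
import math

def P(n, gamma):
    """Rejection probability in (4.8): P_n(gamma) = 0.05 + 0.95 * exp(-|gamma| * n)."""
    return 0.05 + 0.95 * math.exp(-abs(gamma) * n)

# Pointwise convergence (4.9): for fixed gamma > 0, P_n(gamma) -> 0.05.
for n in (10, 100, 1000, 10000):
    print(n, round(P(n, gamma=1.0), 6))

# No uniform convergence (4.10): letting gamma shrink with n (here gamma = 1/n^2)
# keeps the rejection probability near 1, so sup_{gamma>0} P_n(gamma) = 1 for every n.
for n in (10, 100, 1000, 10000):
    print(n, round(P(n, gamma=1.0 / n ** 2), 6))
```

The first loop shows the pointwise limit 0.05; the second shows that no sample size ever brings the worst-case rejection probability (the size) below one.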

4.2. Procedures robust to heteroskedasticity of unknown form

Another common problem in econometrics consists in developing methods which remain valid in making inference on regression coefficients when the variances of the observations are not identical (heteroskedasticity). In particular, this may go as far as looking for tests which are "robust to heteroskedasticity of unknown form". But it is not widely appreciated that this requirement involves very strong restrictions on the procedures that can satisfy it. To see this, consider the problem which consists in testing whether n observations are independent with common zero median, namely:

H0 : X1, ..., Xn are independent random variables, each with a distribution symmetric about zero.   (4.11)

Equivalently, H0 states that the joint distribution Fn of the observations belongs to the (huge) set H0 = {Fn ∈ Fn : Fn satisfies H0}: H0 allows heteroskedasticity of unknown form. In this case, we have the following theorem.

Theorem 4.2 (Characterization of heteroskedasticity-robust tests). If a test has level α for H0, where 0 < α < 1, then it must satisfy the condition

P[Rejecting H0 | |X1|, ..., |Xn|] ≤ α under H0 .   (4.12)

PROOF. See Pratt and Gibbons (1981, Section 5.10) and Lehmann and Stein (1949).

In other words, a valid test with level α must be a sign test — or, more precisely, its level must be equal to α conditional on the absolute values of the observations (which amounts to considering a test based on the signs of the observations).

It is important to observe that the above discussion does not imply that all nonparametric hypotheses are non-testable. Here, the problem of non-testability could be eliminated by choosing another measure of central tendency, such as a median:

H0(m0) : X1, ..., Xn are i.i.d. continuous r.v.'s such that Med(Xt) = m0 .

H0(m0) can easily be tested with a sign test [see Pratt and Gibbons (1981, Chapter 2)]. More generally, hypotheses on the quantiles of the distribution of the observations in a random sample remain testable nonparametrically:

H0(Qp0) : X1, ..., Xn are i.i.d. observations such that P[Xt ≤ Qp0] = p .

Moments are not empirically meaningful functionals in nonparametric models (unless strong distributional assumptions are added), though quantiles are.
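The sign test singled out by Theorem 4.2 is straightforward to implement: under H0 each observation exceeds the hypothesized median with probability 1/2 independently of its (possibly heteroskedastic) scale, so the number of positive signs is Binomial(n, 1/2) exactly. A minimal exact version might look as follows (a Python sketch; the data values are made up, and doubling the smaller tail is one common two-sided p-value convention among several).

```python
import math

def sign_test_pvalue(xs, m0=0.0):
    """Exact two-sided sign test of H0: Med(X) = m0.
    Valid under heteroskedasticity of unknown form: under H0 the signs of
    X_i - m0 are i.i.d. Bernoulli(1/2) whatever the individual scales are."""
    signs = [x > m0 for x in xs if x != m0]   # discard exact ties with m0
    n, k = len(signs), sum(signs)
    tail = min(k, n - k)
    # P[Bin(n, 1/2) <= tail], doubled for a two-sided test, capped at 1.
    p_one_sided = sum(math.comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * p_one_sided)

data = [0.3, 1.2, 0.8, 2.5, 0.1, 0.9, 4.2, 0.6, 1.1, -0.4]  # 9 positive, 1 negative
print(sign_test_pvalue(data, m0=0.0))  # → 0.021484375
```

Because the test conditions only on signs, its level is exactly α however wildly the individual variances differ, in accordance with condition (4.12).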

From Theorem 4.2, the following remarkable property follows.

Corollary 4.3. If, for all 0 < α < 1, the condition (4.12) is not satisfied, then the size of the test is equal to one, i.e.

sup_{Fn ∈ H0} PFn[Rejecting H0] = 1 .   (4.13)

In other words, if a test procedure does not satisfy (4.12) for all levels 0 < α < 1, then its true size is one irrespective of its nominal size. Most so-called "heteroskedasticity robust procedures" based on "corrected" standard errors [see White (1980), Newey and West (1987b), Cushing and McGarvey (1999), and Davidson and MacKinnon (1993, Chapter 16)] do not satisfy condition (4.12) and consequently have size one. For examples of size distortions, see Dufour (1981) and Campbell and Dufour (1995, 1997).

4.3. Procedures robust to autocorrelation of arbitrary form

As a third illustration, let us now examine the problem of testing the unit root hypothesis in the context of an autoregressive model whose order is infinite or is not bounded by a prespecified maximal order:

Xt = β0 + Σ_{k=1}^{p} λk Xt−k + ut , ut i.i.d. N[0, σ²] , t = 1, ..., n ,   (4.14)

where p is not bounded a priori. This type of problem has attracted a lot of attention in recent years; for reviews of this huge literature, the reader may consult Banerjee, Dolado, Galbraith, and Hendry (1993), Stock (1994), Tanaka (1996), and Maddala and Kim (1998). We wish to test:

H̃0 : Σ_{k=1}^{p} λk = 1   (4.15)

or, more precisely,

H̃0 : Xt = β0 + Σ_{k=1}^{p} λk Xt−k + ut , t = 1, ..., n , for some p ≥ 0 ,
     where Σ_{k=1}^{p} λk = 1 and ut i.i.d. N[0, σ²] .   (4.16)

About this problem, we can show the following theorem and corollary [see also Cochrane (1991) and Blough (1992)].

Theorem 4.4 (Unit root non-testability in nonparametric models). If a test has level α for H̃0, i.e.

PFn[Rejecting H̃0] ≤ α for all Fn satisfying H̃0 ,   (4.17)

then

PFn[Rejecting H̃0] ≤ α for all Fn .   (4.18)

From this, we obtain the following corollary.

Corollary 4.5. If, for all 0 < α < 1, the condition (4.18) is not satisfied, then the size of the test is equal to one, i.e.

sup_{Fn ∈ H̃0} PFn[Rejecting H̃0] = 1 ,

where H̃0 is the set of all data distributions Fn that satisfy H̃0.

As in the mean problem, the null hypothesis is simply too "large" (unrestricted) to allow testing from a finite data set. Consequently, all procedures that claim to offer corrections for very general forms of serial dependence [e.g., Phillips (1987), Phillips and Perron (1988)] are affected by these problems: irrespective of the nominal level of the test, the true size under the hypothesis H̃0 is equal to one. Similar difficulties will also occur for most other hypotheses on the coefficients of (4.14). To get a testable hypothesis, it is essential to fix jointly the order of the AR process (i.e., a numerical upper bound on the order) and the sum of the coefficients: for example, we could consider the following null hypothesis where the order of the autoregressive process is equal to 12:

H0(12) : Xt = β0 + Σ_{k=1}^{12} λk Xt−k + ut , t = 1, ..., n ,
         where Σ_{k=1}^{12} λk = 1 and ut i.i.d. N[0, σ²] .   (4.19)

The order of the autoregressive process is an essential part of the hypothesis: it is not possible to separate inference on the unit root hypothesis from inference on the order of the process. For further discussion of this topic, the reader may consult Sims (1971a, 1971b), Blough (1992), Faust (1996, 1999) and Pötscher (2002).

5. Structural models and weak instruments

Several authors in the past have noted that usual asymptotic approximations are not valid or lead to very inaccurate results when parameters of interest are close to regions where these parameters are no longer identifiable. The literature on this topic is now considerable [see Sargan (1983), Phillips (1984, 1985, 1989), Gleser and Hwang (1987), Koschat (1987), Hillier (1990), Nelson and Startz (1990a, 1990b), Buse (1992), Choi and Phillips (1992), Maddala and Jeong (1992), Bound, Jaeger, and Baker (1993, 1995), Dufour and Jasiak (1993, 2001), McManus, Nankervis, and Savin (1994), Angrist and Krueger (1995), Hall, Rudebusch, and Wilcox (1996), Dufour (1997), Shea (1997), Staiger and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson (1998, 2003), Hall and Peixe (2000), Stock and Wright (2000), Dufour and Taamouti (2000, 2003a, 2003b), Startz, Nelson, and Zivot (2001), Hahn, Hausman, and Kuersteiner (2001), Bekker and Kleibergen (2001), Moreira and Poi (2001), Chao and Swanson (2001, 2003), Bekker (2002), Hahn and Hausman (2002a, 2002b, 2003), Kleibergen (2001b, 2002a, 2002b, 2003), Kleibergen and Zivot (2003), Moreira (2001, 2003a, 2003b), Stock and Yogo (2002, 2003), Stock, Wright, and Yogo (2002), Wright (2002, 2003), Imbens and Manski (2003), Perron (2003)]. In this section, we shall examine these issues in the context of SEM.

5.1. Standard simultaneous equations model

Let us consider the standard simultaneous equations model:

y = Y β + X1 γ + u ,   (5.1)
Y = X1 Π1 + X2 Π2 + V ,   (5.2)

where y and Y are T × 1 and T × G matrices of endogenous variables, X1 and X2 are T × k1 and T × k2 matrices of exogenous variables, β and γ are G × 1 and k1 × 1 vectors of unknown coefficients, Π1 and Π2 are k1 × G and k2 × G matrices of unknown coefficients, u = (u1, ..., uT)′ is a T × 1 vector of structural disturbances, and V = [V1, ..., VT]′ is a T × G matrix of reduced-form disturbances. Further,

X = [X1, X2] is a full-column rank T × k matrix   (5.3)

where k = k1 + k2. Finally, to get a finite-sample distributional theory for the test statistics, we shall use the following assumptions on the distribution of u:

u and X are independent ;   (5.4)
u ∼ N[0, σu² IT] .   (5.5)

(5.4) may be interpreted as the strict exogeneity of X with respect to u. Note that the distribution of V is not otherwise restricted; in particular, the vectors V1, ..., VT need not follow a Gaussian distribution and may be heteroskedastic. The model presented in (5.1)–(5.2) can be rewritten in reduced form as:

y = X1 π1 + X2 π2 + v ,   (5.6)
Y = X1 Π1 + X2 Π2 + V ,   (5.7)

where π1 = Π1 β + γ and v = u + V β. Further, we shall also consider the situation where the reduced-form equation for Y includes a third set of instruments X3 which are not used in the estimation:

Y = X1 Π1 + X2 Π2 + X3 Π3 + V ,   (5.8)

where X3 is a T × k3 matrix of explanatory variables (not necessarily strictly exogenous); in particular, X3 may be unobservable. We view this situation as important because, in practice, it is quite rare that one can consider all the relevant instruments that could be used. Even more generally, we could also assume that Y obeys a general nonlinear model of the form:

Y = g(X1, X2, X3, V, Π) ,   (5.9)

where g(·) is a possibly unspecified nonlinear function and Π is an unknown parameter matrix.

Choi and Phillips (1992). When identiﬁcation conditions are not satisﬁed. The necessary and sufﬁcient condition for identiﬁcation is the well-known rank condition for the identiﬁcation of β : β is identiﬁable iff rank(Π2 ) = G . Wang and Zivot (1998). Weak instruments are notorious for causing serious statistical difﬁculties on several fronts: (1) parameter estimation. Phillips (1989). Stock and Wright (2000)]. (5. Theoretical results show that the distributions of various estimators depend in a complicated way on unknown nuisance parameters.where π 1 = Π1 β + γ . 1. Phillips (1985). but reasonable characterizations include the condition that det(Π2 Π2 ) is “close to zero”. rank(Π2 ) = k2 with strong linear dependence between the rows (or columns) of Π2 ].. Jaeger. (3) hypothesis testing. 3. We now consider these problems in greater detail. Rothenberg (1984). Maddala and Jeong (1992).2. 5. (5. or that Π2 Π2 has one or several eigenvalues “close to zero”. weak-instrument (local to non-identiﬁcation) asymptotics [Staiger and Stock (1997). With weak instruments. 16 . 3. There is no compelling deﬁnition of the notion of near-nonidentiﬁcation. or Π2 is close to having deﬁcient rank [i. (5. standard asymptotic theory for estimators and test statistics typically collapses.11) We have a weak instrument problem when either rank(Π2 ) < k2 (non-identiﬁcation). Dufour (1997)]. The main conclusions of this research can be summarized as follows. and π 2 = Π2 β . empirical examples [Bound. (2) conﬁdence interval construction. and Baker (1995)]. theoretical work on the exact distribution of two-stage least squares (2SLS) and other “consistent” structural estimators and test statistics [Phillips (1983). Work in this area includes: 1. Buse (1992). 2.10) is the crucial equation governing identiﬁcation in this system: we need to be able to recover β from the values of the regression coefﬁcients π 2 and Π2 . Nelson and Startz (1990a).e. Thus. 2. they are difﬁcult to interpret. 
Hillier (1990).10) Suppose now that we are interested in making inference about β. v = u + V β . Statistical problems associated with weak instruments The problems associated with weak instruments were originally discovered through its consequences for estimation. Phillips (1984). Nelson and Startz (1990a).
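The estimation problems described in point 3 above are easy to reproduce by simulation. The following sketch (Python with numpy; the sample size, the near-zero first-stage coefficients, the error correlation and the number of replications are all invented for illustration) generates a single-equation SEM with very weak instruments and compares the average bias of 2SLS with that of OLS:

```python
import numpy as np

rng = np.random.default_rng(42)
T, k2, beta, n_rep = 200, 4, 1.0, 500
pi2_weak = np.full(k2, 0.01)          # hypothetical near-zero first-stage coefficients

def tsls(y, Yend, X2):
    # 2SLS with instruments X2 (no included exogenous regressors in this toy model)
    P = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    Yhat = P @ Yend
    return float((Yhat @ y) / (Yhat @ Yend))

est_2sls, est_ols = [], []
for _ in range(n_rep):
    X2 = rng.standard_normal((T, k2))
    V = rng.standard_normal(T)
    u = 0.9 * V + np.sqrt(1 - 0.81) * rng.standard_normal(T)  # corr(u, V) = 0.9
    Yend = X2 @ pi2_weak + V          # weakly identified endogenous regressor
    y = beta * Yend + u
    est_2sls.append(tsls(y, Yend, X2))
    est_ols.append(float((Yend @ y) / (Yend @ Yend)))

bias_2sls = float(np.mean(est_2sls)) - beta
bias_ols = float(np.mean(est_ols)) - beta
```

With the near-zero Π2 used here, the 2SLS estimates concentrate near the OLS probability limit rather than near β, illustrating point 3(a).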

In such situations, the notion of standard error loses its usual meaning and does not constitute a valid basis for building confidence intervals. In SEM, for example, this applies to standard confidence intervals based on 2SLS estimators and their asymptotic "standard errors". Correspondingly, if we wish to test an hypothesis of the form H0 : θ2 = θ2⁰, the size of any test of the form

|t(θ2⁰)| = |θ̂2 − θ2⁰| / σ̂(θ̂2) > c(α)   (5.17)

will deviate arbitrarily from its nominal size: no unique large-sample distribution for t(θ̂2) can provide valid tests and confidence intervals based on the asymptotic distribution of t(θ̂2). From a statistical viewpoint, this means that t(θ̂2) is not a pivotal function for the model considered. More generally, this type of problem affects the validity of all Wald-type methods, which are based on comparing parameter estimates with their estimated covariance matrix.

By contrast, in models of the form (5.12), the distribution of the LR statistics for most hypotheses on model parameters can be bounded and cannot move arbitrarily: likelihood ratios are boundedly pivotal functions and provide a valid basis for testing and confidence set construction [see Dufour (1997)]. The central conclusion here is: tests and confidence sets on the parameters of a structural model should be based on proper pivots.

6. Approaches to weak instrument problems

What should be the features of a satisfactory solution to the problem of making inference in structural models? We shall emphasize here four properties: (1) the method should be based on proper pivotal functions (ideally, a finite-sample pivot); (2) robustness to the presence of weak instruments; (3) robustness to excluded instruments; (4) robustness to the formulation of the model for the explanatory endogenous variables Y (which is desirable in many practical situations). In the light of these criteria, we shall first discuss the Anderson-Rubin (AR) procedure, which in our view is the reference method for dealing with weak instruments in the context of standard SEM; second, the projection technique, which provides a general way of making a wide spectrum of tests and confidence sets; and third, several recent proposals aimed at improving and extending the AR procedure.

6.1. Anderson-Rubin statistic

A solution to testing in the presence of weak instruments has been available for more than 50 years [Anderson and Rubin (1949)] and is now center stage again [Dufour (1997), Staiger and Stock (1997)]. Interestingly, the AR method can be viewed as an alternative way of exploiting "instruments" for inference on a structural model, although it pre-dates the introduction of 2SLS methods in SEM [Theil (1953), Basmann (1957)], which later became the most widely used method for estimating linear structural equations models.12

12 The basic ideas for using instrumental variables for inference on structural relationships appear to go back to Working (1927) and Wright (1928). For an interesting discussion of the origin of IV methods in econometrics, see Stock and Trebbi (2003).
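Corollary 5.2 can be illustrated directly: a Wald-type interval is almost surely bounded, so for a parameter whose valid confidence region must be unbounded with positive probability, the Wald interval necessarily has level zero. A minimal sketch (Python with numpy; the toy ratio parameter θ2 = b/a, which is unidentified at a = 0, the sample size and the number of replications are all invented) verifies that the delta-method interval is bounded in every replication:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, a_true, b_true = 100, 0.0, 1.0   # a = 0: theta2 = b/a is not identified
c = 1.96                            # nominal 95% normal critical value
all_bounded = True
for _ in range(200):
    xa = a_true + rng.standard_normal(n)
    xb = b_true + rng.standard_normal(n)
    a_hat, b_hat = xa.mean(), xb.mean()
    theta2_hat = b_hat / a_hat
    # delta-method standard error of b_hat/a_hat (gradient (1/a, -b/a^2), unit variances)
    se = math.sqrt((1.0/n) * (1.0/a_hat**2 + b_hat**2/a_hat**4))
    lo, hi = theta2_hat - c*se, theta2_hat + c*se
    all_bounded = all_bounded and math.isfinite(lo) and math.isfinite(hi)
```

Every simulated interval is a bounded set, so by Theorem 5.1 the family of such intervals cannot have level 1 − α for this parameter.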

The basic problem considered consists in testing the hypothesis

H0(β0) : β = β0   (6.1)

in model (5.1) - (5.2). In order to do that, we consider an auxiliary regression obtained by subtracting Yβ0 from both sides of (5.1) and expanding the right-hand side in terms of the instruments. This yields the regression

y − Yβ0 = X1θ1 + X2θ2 + ε ,   (6.2)

where θ1 = γ + Π1(β − β0), θ2 = Π2(β − β0) and ε = u + V(β − β0). Under the null hypothesis H0(β0), this equation reduces to

y − Yβ0 = X1θ1 + ε ,   (6.3)

so we can test H0(β0) by testing H0′(β0) : θ2 = 0 in the auxiliary regression (6.2). This yields the following F-statistic — the Anderson-Rubin statistic — which follows a Fisher distribution under the null hypothesis:

AR(β0) = {[SS0(β0) − SS1(β0)]/k2} / {SS1(β0)/(T − k)} ∼ F(k2, T − k) ,   (6.4)

where SS0(β0) = (y − Yβ0)′M(X1)(y − Yβ0), SS1(β0) = (y − Yβ0)′M(X)(y − Yβ0) and, for any full-rank matrix A, we denote P(A) = A(A′A)⁻¹A′ and M(A) = I − P(A). What plays the crucial role here is the fact that we have instruments (X2) that can be related to Y but are excluded from the structural equation. To draw inference on the structural parameter β, we "hang" on the variables in X2: if we add X2 to the constrained structural equation (6.3), its coefficient should be zero. For these reasons, we shall call the variables in X2 auxiliary instruments.

Since the AR statistic is a proper pivot, it can be used to build confidence sets for β:

Cβ(α) = {β0 : AR(β0) ≤ Fα(k2, T − k)} ,   (6.5)

where Fα(k2, T − k) is the critical value for a test with level α based on the F(k2, T − k) distribution. When there is only one endogenous explanatory variable (G = 1), this set has an explicit solution involving a quadratic inequation, i.e.

Cβ(α) = {β0 : aβ0² + bβ0 + c ≤ 0} ,   (6.6)

where a = Y′HY, b = −2Y′Hy, c = y′Hy, and H ≡ M(X1) − M(X)[1 + k2 Fα(k2, T − k)/(T − k)]. The set Cβ(α) may easily be determined by finding the roots of the quadratic polynomial in equation (6.6). When G > 1, the set Cβ(α) is not in general an ellipsoid, but it remains fairly manageable by using the theory of quadrics [Dufour and Taamouti (2000)].

When the model is correct and its parameters are well identified by the instruments used, Cβ(α) is a closed bounded set close to an ellipsoid. In other cases, it can be unbounded or empty. Unbounded sets are highly likely when the model is not identified, so they point to lack of identification. Empty confidence sets can occur (with a non-zero probability) when we have more instruments than parameters in the structural equation (5.1).
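For G = 1, the equivalence between the AR confidence set (6.5) and the quadric form (6.6) is a finite-sample algebraic identity that can be checked numerically. A minimal sketch (Python with numpy; the data-generating values are invented, and the hardcoded critical value is an assumed approximation to the 5% point of F(3, 95)):

```python
import numpy as np

rng = np.random.default_rng(1)
T, k1, k2 = 100, 2, 3
k = k1 + k2
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
V = rng.standard_normal(T)
u = 0.5*V + rng.standard_normal(T)
Yend = X2 @ np.array([1.0, 0.5, -0.5]) + X1 @ np.array([0.2, 0.1]) + V
y = 0.7*Yend + X1 @ np.array([1.0, -1.0]) + u

def M(A):
    return np.eye(T) - A @ np.linalg.solve(A.T @ A, A.T)

M1, MX = M(X1), M(X)

def AR(b0):
    # Anderson-Rubin statistic (6.4) at the tested value b0
    e = y - b0*Yend
    ss0, ss1 = e @ M1 @ e, e @ MX @ e
    return ((ss0 - ss1)/k2) / (ss1/(T - k))

F_crit = 2.70   # approximate 5% critical value of F(3, 95); assumed here
H = M1 - MX * (1 + k2*F_crit/(T - k))
a = Yend @ H @ Yend
b = -2.0 * (Yend @ H @ y)
c = y @ H @ y
# when a > 0 and the discriminant is nonnegative, the set is the interval between the roots
disc = b*b - 4*a*c
roots = None
if a > 0 and disc >= 0:
    roots = ((-b - np.sqrt(disc))/(2*a), (-b + np.sqrt(disc))/(2*a))
```

Whenever a > 0, the set is bounded (an interval between the two roots, or empty); a ≤ 0 corresponds to the unbounded cases discussed above.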

Such empty sets arise when the model is overidentified. An empty confidence set means that no value of the parameter vector β is judged to be compatible with the available data, which indicates that the model is misspecified. So the procedure provides, as an interesting byproduct, a specification test.13

13 For further discussion of this point, see Kleibergen (2002b).

It is also easy to see that the above procedure remains valid even if the extended reduced form (5.6) is the correct model for Y: we can leave out a subset of the instruments (X3) and use only X2, and the level of the procedure will not be affected. Indeed, this will also hold if Y is determined by the general — possibly nonlinear — model (5.7). The procedure is thus robust to excluded instruments as well as to the specification of the model for Y. The power of the test may, however, be affected by the choice of X2. Since it is quite rare that an investigator can be sure relevant instruments have not been left out, this robustness is an important practical consideration.

The AR procedure can be extended easily to deal with linear hypotheses which involve γ as well. For example, to test an hypothesis of the form

H0(β0, γ0) : β = β0 and γ = γ0 ,   (6.7)

we can consider the transformed model

y − Yβ0 − X1γ0 = X1θ1 + X2θ2 + ε .   (6.8)

Since, under H0(β0, γ0),

y − Yβ0 − X1γ0 = ε ,   (6.9)

we can test H0(β0, γ0) by testing H0′(β0, γ0) : θ1 = 0 and θ2 = 0 in the auxiliary regression (6.8). Tests for more general restrictions of the form

H0(β0, ν0) : β = β0 and Rγ = ν0 ,   (6.10)

where R is an r × k1 fixed full-rank matrix, are discussed in Dufour and Jasiak (2001).

The AR procedure thus enjoys several remarkable features. Namely: (1) it is pivotal in finite samples; (2) it is robust to weak instruments; (3) it is robust to excluded instruments; (4) it is robust to the specification of the model for Y (which can be nonlinear with an unknown form); (5) it provides asymptotically "valid" tests and confidence sets under quite weak distributional assumptions (basically, the assumptions that cover the usual asymptotic properties of linear regression); and (6) it can be extended easily to test restrictions and build confidence sets which also involve the coefficients of the exogenous variables [such as H0(β0, γ0) or H0(β0, ν0)].

But the method also has its drawbacks. The main ones are: (1) the tests and confidence sets obtained in this way apply only to the full vector β [or (β′, γ′)′] — what can we do if β has more than one element? (2) power may be low if too many instruments are added (X2 has too many variables) to perform the test, especially if the instruments are irrelevant [see Maddala (1974)]; (3) the error normality assumption is restrictive, and we may wish to consider other distributional assumptions; (4) the structural equations are assumed to be linear. We will now discuss a number of methods which have been proposed in order to circumvent these drawbacks.
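The extension (6.8) is an ordinary F-test once β0 and γ0 are subtracted from y. The sketch below (Python with numpy; the design and all coefficient values are invented) computes the joint statistic in two equivalent ways — via regression sums of squares and via the projection formula — as a consistency check:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, k2 = 120, 2, 3
k = k1 + k2
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
beta0, gamma0 = 0.7, np.array([1.0, -0.5])
Yend = X2 @ np.array([1.0, 0.6, -0.4]) + rng.standard_normal(T)
y = beta0*Yend + X1 @ gamma0 + rng.standard_normal(T)

e0 = y - beta0*Yend - X1 @ gamma0     # the transformed dependent variable in (6.8)
# route 1: OLS of e0 on [X1, X2], F-test that all k coefficients are zero
coef, *_ = np.linalg.lstsq(X, e0, rcond=None)
ssr = float(np.sum((e0 - X @ coef)**2))
F_joint = ((e0 @ e0 - ssr)/k) / (ssr/(T - k))
# route 2: projection formula for the same statistic
P = X @ np.linalg.solve(X.T @ X, X.T)
F_proj = ((e0 @ P @ e0)/k) / ((e0 @ e0 - e0 @ P @ e0)/(T - k))
```

Under H0(β0, γ0) and assumptions (5.3) - (5.5), this statistic follows an F(k, T − k) distribution.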

6.2. Projections and inference on parameter subsets

Suppose now that β [or (β′, γ′)′] has more than one component. The fact that a procedure with a finite-sample theory has been obtained for "joint hypotheses" of the form H0(β0) [or H0(β0, γ0)] is not due to chance: since the distribution of the data is determined by the full parameter vector, there is no reason in general why one should be able to decide on the value of a component of β independently of the others. Such a separation is feasible only in special situations, e.g. in the classical linear model (without exact multicollinearity). Lack of identification is precisely a situation where the value of a parameter may be determined only after various restrictions (e.g., the values of other parameters) have been imposed. So parametric nonseparability arises here, and inference should start from a simultaneous approach. If the data generating process corresponds to a model where parameters are well identified, precise inferences on individual coefficients may be feasible. This raises the question of how one can move from a joint inference on β to inference on its components.

A general approach to this problem consists in using a projection technique. If

P[β ∈ Cβ(α)] ≥ 1 − α ,   (6.11)

then, for any function g(β),

P[g(β) ∈ g[Cβ(α)]] ≥ 1 − α .   (6.12)

If g(β) is a component of β or (more generally) a linear transformation g(β) = w′β, the projection-based confidence set g[Cβ(α)] takes the form of a bounded confidence interval as soon as the joint confidence set Cβ(α) is bounded.14

Another interesting feature comes from the fact that the confidence sets obtained in this way are simultaneous in the sense of Scheffé. More precisely, if {ga(β) : a ∈ A} is a set of functions of β, then

P[ga(β) ∈ ga[Cβ(α)] for all a ∈ A] ≥ 1 − α .   (6.13)

If these confidence intervals are used to test different hypotheses, an unlimited number of hypotheses can be tested without losing control of the overall level. For further discussion of projection methods, the reader may consult Dufour (1990, 1997), Campbell and Dufour (1997), Abdelkhalek and Dufour (1998), Dufour and Kiviet (1998), Hallin and Mizera (1998), Dufour and Taamouti (2000), and Dufour and Jasiak (2001).

14 In this case, the confidence set for a linear combination of the parameters, say w′β, takes the usual form [w′β̃ − σ̂ zα, w′β̃ + σ̂ zα], with β̃ a k-class type estimator of β; see Dufour and Taamouti (2000).
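Projection can be implemented crudely by scanning a grid: retain the β values accepted by the AR test and report the range of each component (or of any function w′β). A sketch for G = 2 (Python with numpy; the grid limits, the model coefficients and the hardcoded critical value — an assumed approximation to the 5% point of F(4, 145) — are all invented):

```python
import numpy as np

rng = np.random.default_rng(3)
T, k1, k2, G = 150, 1, 4, 2
k = k1 + k2
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
Pi2 = np.array([[1.0, 0.2], [0.5, 1.0], [-0.5, 0.3], [0.2, -0.8]])
V = rng.standard_normal((T, G))
Yend = X2 @ Pi2 + V
beta_true = np.array([0.5, -0.3])
y = Yend @ beta_true + X1[:, 0]*1.0 + 0.4*V[:, 0] + rng.standard_normal(T)

M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
MX = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

def AR(b):
    e = y - Yend @ b
    ss0, ss1 = e @ M1 @ e, e @ MX @ e
    return ((ss0 - ss1)/k2) / (ss1/(T - k))

F_crit = 2.45   # approximate 5% critical value of F(4, 145); assumed
grid = np.linspace(-1.5, 1.5, 41)
accepted = [(b1, b2) for b1 in grid for b2 in grid
            if AR(np.array([b1, b2])) <= F_crit]
# projections of the joint confidence set onto each component
proj1 = (min(p[0] for p in accepted), max(p[0] for p in accepted)) if accepted else None
proj2 = (min(p[1] for p in accepted), max(p[1] for p in accepted)) if accepted else None
```

By construction, every accepted point has its components inside the projected intervals, so each projected interval inherits the joint coverage 1 − α, and the intervals are simultaneous in the sense of (6.13).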

6.3. Alternatives to the AR procedure

In view of improving the power of AR procedures, a number of alternative methods have been recently suggested. We will now discuss several of them.

a. Generalized auxiliary regression. A general approach to the problem of testing H0(β0) consists in replacing X2 in the AR auxiliary regression

y − Yβ0 = X1θ1 + X2θ2 + ε   (6.14)

by an alternative set of auxiliary instruments, say Z of dimension T × k̄2, i.e. we consider the generalized auxiliary regression

y − Yβ0 = X1θ1 + Zθ̄2 + ε ,   (6.15)

where θ̄2 = 0 under H0(β0). So we can test H0(β0) by testing θ̄2 = 0 in (6.15). The problem then consists in selecting Z so that the level can be controlled and power may be improved with respect to the AR auxiliary regression (6.14). At the outset, it is easy to see that the power of the AR test could become low if a large set of auxiliary instruments is used, so several alternative procedures can be generated by reducing the number of auxiliary instruments (the number of columns in Z). A number of alternative procedures can be cast in the framework of equation (6.15); for further discussion, see Dufour and Jasiak (2001) and Kleibergen (2002a).

b. Split-sample optimal auxiliary instruments. If the reduced form (5.8) - (5.9) were the correct model and Π = [Π1′, Π2′]′ were known, then an optimal choice from the viewpoint of power consists in choosing Z = X2Π2. This suggests replacing X2Π2 by an estimate, such as

Z = X2Π̃2 ,   (6.16)

where Π̃2 is an estimate of the reduced-form coefficient Π2 in (5.2). The practical problem is that Π2 is unknown. It is tempting to use the least squares estimator Π̂ = (X′X)⁻¹X′Y. However, Π̂ and ε are not independent, and we continue to face a simultaneity problem with messy distributional consequences. Ideally, we would like to select an estimate Π̃2 which is independent of ε. If we can assume that the error vectors (ut, Vt′)′, t = 1, ..., T, are independent, this may be achievable by using a split-sample technique: a fraction of the sample is used to obtain Π̃ and the rest to estimate the auxiliary regression (6.15) with Z = X2Π̃2.15 Under such circumstances, by conditioning on Π̃, we can easily see that the standard F test for θ̄2 = 0 is valid. Further, this procedure is robust to weak instruments, to excluded instruments, and to the specification of the model for Y [i.e., (5.6) or (5.7)], as long as the independence between Π̃ and ε can be maintained. Of course, using a split-sample may involve a loss of the effective number of observations, and there will be a trade-off between the efficiency gain from using a smaller number of auxiliary instruments and the observations that are "sacrificed" to get Π̃. Better results tend to be obtained by using a relatively small fraction of the sample — 10% for example — to obtain Π̃ and the rest for the main equation. For further details on this procedure, see Dufour and Taamouti (2001b).

15 Split-sample techniques often lead to important distributional simplifications; for further discussion of this type of method, see Angrist and Krueger (1995) and Dufour and Torrès (1998, 2000).
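The split-sample idea can be sketched as follows (Python with numpy; the 10% split, the design and all coefficients are invented). The first-stage coefficients are estimated on a small subsample only, so that Π̃ is independent of the errors entering the auxiliary regression on the remaining observations:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k1, k2 = 300, 1, 5
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, k2))
Pi2 = np.array([1.0, 0.5, 0.5, -0.5, 0.3])
V = rng.standard_normal(T)
u = 0.6*V + rng.standard_normal(T)
Yend = X2 @ Pi2 + V
beta0 = 0.8
y = beta0*Yend + X1[:, 0]*0.5 + u

T1 = T // 10                          # 10% of the sample for the first stage
# first-stage estimate of Pi2 on the first T1 observations only
Xa = np.hstack([X1[:T1], X2[:T1]])
ca, *_ = np.linalg.lstsq(Xa, Yend[:T1], rcond=None)
Pi2_tilde = ca[k1:]
# auxiliary regression (6.15) on the remaining observations with Z = X2 @ Pi2_tilde
idx = slice(T1, T)
Z = (X2[idx] @ Pi2_tilde).reshape(-1, 1)
e0 = y[idx] - beta0*Yend[idx]
W = np.hstack([X1[idx], Z])
cw, *_ = np.linalg.lstsq(W, e0, rcond=None)
ssr1 = float(np.sum((e0 - W @ cw)**2))
c1, *_ = np.linalg.lstsq(X1[idx], e0, rcond=None)
ssr0 = float(np.sum((e0 - X1[idx] @ c1)**2))
df2 = (T - T1) - k1 - 1
F_split = ((ssr0 - ssr1)/1) / (ssr1/df2)  # F-test of the single Z coefficient
```

Conditionally on Π̃, F_split follows a standard F distribution under H0(β0), at the cost of the T1 "sacrificed" first-stage observations.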

c. LM-type GMM-based statistic. If we take Z = ZWZ with

ZWZ = P[M(X1)X2]Y = P[M(X1)X2]M(X1)Y = [M(X1)X2]Π̂2 ,   (6.17)
Π̂2 = [X2′M(X1)X2]⁻¹X2′M(X1)Y ,   (6.18)

the F-statistic [say, FGMM(β0)] for θ̄2 = 0 is a monotonic transformation of the LM-type statistic LMGMM(β0) proposed by Wang and Zivot (1998). Namely,

FGMM(β0) = [ν1/(GT)] LMGMM(β0) / [1 − (1/T)LMGMM(β0)] ,   (6.19)

where ν1 = T − k1 − G and

LMGMM(β0) = (y − Yβ0)′P[ZWZ](y − Yβ0) / [(y − Yβ0)′M(X1)(y − Yβ0)/T] .   (6.20)

Note that Π̂2 above is the ordinary least squares (OLS) estimator of Π2 from the multivariate regression (5.2), so that FGMM(β0) can be obtained by computing the F-statistic for θ2* = 0 in the regression

y − Yβ0 = X1θ1* + (X2Π̂2)θ2* + u .   (6.21)

The LMGMM test may thus be interpreted as an approximation to the optimal test based on replacing the optimal auxiliary instrument X2Π2 by X2Π̂2. As mentioned above, the distribution of this statistic will be affected by the fact that X2Π̂2 and u are not independent; in particular, it is influenced by the presence of weak instruments.

When k2 ≥ G, the statistic FGMM(β0) can also be obtained by testing θ2** = 0 in the auxiliary regression

y − Yβ0 = X1θ1** + Ŷθ2** + u ,   (6.22)

where Ŷ = XΠ̂. It is also interesting to note that the OLS estimates of θ1** and θ2**, obtained by fitting the latter equation, are identical to the 2SLS estimates of θ1** and θ2** in the equation

y − Yβ0 = X1θ1** + Yθ2** + u .   (6.23)

The statistic LMGMM(β0) is also numerically identical to the corresponding LR-type and Wald-type statistics based on the same GMM estimator (in this case, the 2SLS estimator of β).

When k2 = G (usually deemed the "just-identified" case, although the model may be under-identified in that case), we see easily [from (6.21)] that FGMM(β0) is (almost surely) identical with the AR statistic, i.e.

FGMM(β0) = AR(β0)  if k2 = G ,   (6.24)

so that FGMM(β0) follows an exact F(G, T − k) distribution, while for k2 > G,

[G/(T − k1 − G)] FGMM(β0) ≤ [k2/(T − k1 − k2)] AR(β0) ,   (6.25)

so that the distribution of LMGMM(β0) can be bounded in finite samples by the distribution of a monotonic transformation of an F(k2, T − k) variable.
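The identity (6.24) and the inequality (6.25) can both be verified numerically. A sketch (Python with numpy; G = 1 and all design choices are invented) computes FGMM(β0) via the auxiliary regression (6.21) and compares it with AR(β0) in a just-identified and an over-identified design:

```python
import numpy as np

T, k1 = 120, 2

def simulate(k2, seed):
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
    X2 = rng.standard_normal((T, k2))
    V = rng.standard_normal(T)
    u = 0.5*V + rng.standard_normal(T)
    Yend = X2 @ np.linspace(1.0, 0.5, k2) + V
    y = 0.7*Yend + X1 @ np.array([1.0, -0.5]) + u
    return X1, X2, Yend, y

def ar_stat(X1, X2, Yend, y, b0):
    X = np.hstack([X1, X2])
    M = lambda A: np.eye(T) - A @ np.linalg.solve(A.T @ A, A.T)
    e = y - b0*Yend
    ss0, ss1 = e @ M(X1) @ e, e @ M(X) @ e
    k2 = X2.shape[1]
    return ((ss0 - ss1)/k2) / (ss1/(T - k1 - k2))

def f_gmm(X1, X2, Yend, y, b0):
    # F-test of theta2* = 0 in the auxiliary regression (6.21), with G = 1
    M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    Pi2_hat = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ Yend)
    Z = (X2 @ Pi2_hat).reshape(-1, 1)
    e0 = y - b0*Yend
    W = np.hstack([X1, Z])
    r1 = e0 - W @ np.linalg.lstsq(W, e0, rcond=None)[0]
    r0 = e0 - X1 @ np.linalg.lstsq(X1, e0, rcond=None)[0]
    return ((r0 @ r0 - r1 @ r1)/1) / ((r1 @ r1)/(T - k1 - 1))

just = simulate(1, 11)      # k2 = G = 1: "just-identified"
over = simulate(3, 12)      # k2 = 3 > G = 1
ar1, fg1 = ar_stat(*just, 0.7), f_gmm(*just, 0.7)
ar3, fg3 = ar_stat(*over, 0.7), f_gmm(*over, 0.7)
```

With k2 = G the two regressions span the same space, so the statistics coincide to machine precision; with k2 > G the inequality (6.25) holds exactly in finite samples.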

Wang and Zivot (1998) also showed that the distribution of LMGMM(β0) is bounded by the χ2(k2) distribution asymptotically; for T large, the finite-sample bound [a monotonic transformation of an F(k2, T − k) variable] is very close to the χ2(k2) distribution. Since AR(β0) will always reject when FGMM(β0) rejects (at a given level), the power of the AR test is uniformly superior to that of the LMGMM bound test.16

16 The χ2(k2) bound also follows in a straightforward way from (6.25). Note that Wang and Zivot (1998) do not provide the auxiliary regression interpretation (6.22) of their statistic. For details, see Dufour and Taamouti (2001b).

d. Kleibergen's K test. If we take Z = ZK with

ZK = XΠ̃(β0) ≡ Ỹ(β0) = P(X)[Y − (y − Yβ0) sεV(β0)/sεε(β0)] ,   (6.26)
Π̃(β0) = Π̂ − π̂(β0) sεV(β0)/sεε(β0) ,   (6.27)
Π̂ = (X′X)⁻¹X′Y ,  π̂(β0) = (X′X)⁻¹X′(y − Yβ0) ,   (6.28)
sεε(β0) = (y − Yβ0)′M(X)(y − Yβ0)/(T − k) ,  sεV(β0) = (y − Yβ0)′M(X)Y/(T − k) ,   (6.29)

we obtain a statistic, FK(β0), which reduces to the one proposed by Kleibergen (2002a) for k1 = 0. More precisely, with k1 = 0, the F-statistic FK(β0) for θ̄2 = 0 is equal to Kleibergen's statistic K(β0) divided by G:

FK(β0) = K(β0)/G .   (6.30)

This procedure tries to correct the simultaneity problem associated with the use of Ŷ in the LMGMM statistic by "purging" Ŷ of its correlation with u [through the subtraction of the term π̂(β0)sεV(β0)/sεε(β0) in ZK]. Correspondingly, FK(β0) and K(β0) ≡ G FK(β0) can be obtained by testing θ̄2 = 0 in the regression

y − Yβ0 = X1θ1 + Ỹ(β0)θ̄2 + u ,   (6.31)

where the fitted values Ŷ, which appear in the auxiliary regression (6.22), have been replaced by Ỹ(β0), which is closer to being orthogonal with u.

Kleibergen (2002a) did not supply a finite-sample distributional theory, but showed (assuming k1 = 0) that K(β0) follows a χ2(G) distribution asymptotically under H0(β0), irrespective of the presence of weak instruments. If k2 = G, we have FK(β0) = AR(β0) ∼ F(G, T − k), while in the other cases (k2 > G), we can see easily that the bound for FGMM(β0) in (6.25) also applies to FK(β0):

[G/(T − k1 − G)] FK(β0) ≤ [k2/(T − k1 − k2)] AR(β0) ,   (6.32)

which indicates that the distribution of K(β0) ≡ G FK(β0) can be bounded in finite samples by the distribution of a [k2(T − k1 − G)/(T − k)]F(k2, T − k) variable. For T reasonably large, this entails that the K(β0) test will have power higher than that of the LMGMM test [based on the χ2(k2) bound], at least in the neighborhood of the null hypothesis, although not necessarily far away from it. However, because of the stochastic dominance of AR(β0), there would be no advantage in using the bound to get critical values for K(β0), for the AR test would then have better power. Finally, in view of the fact that this procedure is based on estimating the mean of XΠ (using XΠ̂) and the covariances between the errors in the reduced form for Y and u [using sεV(β0)], it can become quite unreliable in the presence of excluded instruments.
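A compact sketch of the K statistic in the k1 = 0 case (Python with numpy; a single instrument is used so that k2 = G = 1 and the exact equality FK(β0) = AR(β0) can be checked; all numerical values are invented):

```python
import numpy as np

rng = np.random.default_rng(6)
T, k2 = 120, 1                        # k1 = 0, as in Kleibergen (2002a)
X = rng.standard_normal((T, k2))
V = rng.standard_normal(T)
u = 0.5*V + rng.standard_normal(T)
Yend = X @ np.full(k2, 1.0) + V
b0 = 0.7
y = b0*Yend + u

e0 = y - b0*Yend
MX = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
s_ee = (e0 @ MX @ e0)/(T - k2)        # s_epsilon-epsilon(b0), as in (6.29)
s_eV = (e0 @ MX @ Yend)/(T - k2)      # s_epsilon-V(b0)
Pi_hat = np.linalg.solve(X.T @ X, X.T @ Yend)
pi_hat = np.linalg.solve(X.T @ X, X.T @ e0)
Ytilde = X @ (Pi_hat - pi_hat*(s_eV/s_ee))     # "purged" regressor, (6.26)-(6.27)
# F-statistic for the Ytilde coefficient in the regression of e0 on Ytilde (k1 = 0)
r1 = e0 - Ytilde*(float(Ytilde @ e0)/float(Ytilde @ Ytilde))
F_K = ((e0 @ e0 - r1 @ r1)/1) / ((r1 @ r1)/(T - 1))
K_stat = 1*F_K                         # K(b0) = G * F_K(b0), with G = 1
# AR statistic for comparison (k2 = G = 1, so F_K should equal AR exactly)
ss0, ss1 = e0 @ e0, e0 @ MX @ e0
AR_stat = ((ss0 - ss1)/k2) / (ss1/(T - k2))
```

In the just-identified case the purged regressor spans the same space as the instrument, so the K-based F-statistic coincides with AR(β0) in finite samples.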

e. Likelihood ratio test. The likelihood ratio (LR) statistic for β = β0 was also studied by Wang and Zivot (1998). The LR test statistic in this case takes the form

LRLIML = T [ln κ(β0) − ln κ(β̂LIML)] ,   (6.33)

where β̂LIML is the limited information maximum likelihood (LIML) estimator of β and

κ(β) = (y − Yβ)′M(X1)(y − Yβ) / [(y − Yβ)′M(X)(y − Yβ)] .   (6.34)

Like LMGMM(β0), the distribution of LRLIML depends on unknown nuisance parameters under H0(β0), but its asymptotic distribution is χ2(k2) when k2 = G and bounded by the χ2(k2) distribution in the other cases [a result in accordance with the general LR distributional bound given in Dufour (1997)]. This bound can also be easily derived from the following inequality:

LRLIML ≤ [T k2/(T − k)] AR(β0) ,   (6.35)

so that the distribution of LRLIML is bounded in finite samples by the distribution of a [T k2/(T − k)]F(k2, T − k) variable. For T reasonably large, this entails that the AR(β0) test will have power higher than that of the LRLIML test [based on the χ2(k2) bound], at least in the neighborhood of the null hypothesis: the power of the AR test is uniformly superior to the one of the LRLIML bound test. Because the LR test depends heavily on the specification of the model for Y, it is not robust to excluded instruments.

f. Conditional tests. A promising approach was recently proposed by Moreira (2003a). His suggestion consists in conditioning upon an appropriately selected portion of the sufficient statistics for a gaussian SEM. On the assumption that the covariance matrix of the errors is known, the corresponding conditional distribution of various test statistics for H0(β0) does not involve nuisance parameters. The conditional distribution is typically not standard, but it may be established by simulation. Such an approach may lead to power gains. Like Kleibergen's procedure, this method yields an asymptotically similar test. However, the assumption that error covariances are known is rather implausible, and the extension of the method to the case where the error covariance matrix is unknown is obtained at the expense of using a large-sample approximation. For further discussion, see Moreira and Poi (2001) and Moreira (2003b).

g. Instrument selection procedures. Systematic search methods for identifying relevant instruments and excluding unimportant instruments have been discussed by several authors; see Hall, Rudebusch, and Wilcox (1996), Hall and Peixe (2000), and Donald and Newey (2001). In this setup, the power of AR-type tests depends on a function of the model parameters called the concentration coefficient, so one way to approach instrument selection is to maximize the concentration coefficient, towards maximizing test power. For additional discussion of this issue, the reader may consult Dufour and Taamouti (2001a).
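The nonnegativity of LRLIML and the inequality (6.35) are finite-sample facts that can be checked directly; for G = 1, the LIML minimum of κ(β) is the smallest root of a 2 × 2 generalized eigenvalue problem. A sketch (Python with numpy; the design and coefficient values are invented):

```python
import numpy as np

rng = np.random.default_rng(10)
T, k1, k2 = 100, 1, 4
k = k1 + k2
X1 = np.ones((T, 1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
V = rng.standard_normal(T)
u = 0.5*V + rng.standard_normal(T)
Yend = X2 @ np.array([1.0, 0.5, -0.5, 0.3]) + V
y = 0.7*Yend + 0.5 + u
b0 = 0.7

M = lambda A: np.eye(T) - A @ np.linalg.solve(A.T @ A, A.T)
M1, MX = M(X1), M(X)
Ybar = np.column_stack([y, Yend])
A = Ybar.T @ M1 @ Ybar
B = Ybar.T @ MX @ Ybar
# kappa(beta_LIML) = smallest generalized eigenvalue of (A, B)
kappa_min = min(np.real(np.linalg.eigvals(np.linalg.solve(B, A))))

def kappa(b):
    e = y - b*Yend
    return (e @ M1 @ e)/(e @ MX @ e)

LR_liml = T*(np.log(kappa(b0)) - np.log(kappa_min))
AR0 = ((kappa(b0) - 1)*(T - k))/k2    # AR(b0) rewritten through kappa
bound = (T*k2/(T - k))*AR0            # right-hand side of (6.35)
```

Since κ(β) ≥ 1 for every β (so ln κ(β̂LIML) ≥ 0) and ln x ≤ x − 1, the statistic never exceeds the bound, which is the content of (6.35).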

To summarize, in special situations, alternatives to the AR procedure may allow some power gains with respect to the AR test based on an unreduced set of instruments. But they may themselves have some important drawbacks: (1) only an asymptotic distributional theory is supplied, although Kleibergen's and Moreira's statistics are asymptotically pivotal; (2) the statistics used are not pivotal in finite samples; (3) they are not robust to instrument exclusion or to the formulation of the model for the explanatory endogenous variables. Since it is quite rare that an investigator can be sure relevant instruments have not been left out, robustness to instrument exclusion is very handy in this context. It is also of interest to note that finite-sample versions of several of these asymptotic tests may be obtained by using split-sample methods.

All the problems and techniques discussed above relate to sampling-based statistical methods. SEM can also be analyzed through a Bayesian approach, which can alleviate the indeterminacies associated with identification via the introduction of a prior distribution on the parameter space. Bayesian inferences always depend on the choice of the prior distribution (a property viewed as undesirable in the sampling approach), but this dependence becomes especially strong when identification problems are present [see Gleser and Hwang (1987)]. This paper only aims at discussing problems and solutions which arise within the sampling framework, and it is beyond its scope to debate the advantages and disadvantages of Bayesian methods under weak identification. For further discussion, see Kleibergen and Zivot (2003) and Sims (2001).

7. Extensions

We will discuss succinctly some extensions of the above results to multivariate setups (where several structural equations may be involved), to models with non-Gaussian errors, and to nonlinear models.17

17 Most of this section is based on Dufour and Khalaf (2000, 2001, 2002).

7.1. Multivariate regression, simulation-based inference and nonnormal errors

Another approach to inference on a linear structural equation model is based on observing that the structural model (5.1) - (5.4) can be put in the form of a multivariate linear regression (MLR):

Ȳ = XB + Ū ,   (7.1)

where Ȳ = [y, Y], B = [π, Π], π = [π1′, π2′]′, Π = [Π1′, Π2′]′, π1 = Π1β + γ, π2 = Π2β, and Ū = [v, V] = [Ū1, ..., ŪT]′ with v = u + Vβ. This model is linear except for the nonlinear restriction π2 = Π2β. Let us now make the assumption that the errors in the different equations for each observation, Ūt, satisfy the property

Ūt = JWt ,  t = 1, ..., T ,   (7.2)
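The mapping from (5.1) - (5.2) to the MLR form (7.1) — together with the nonlinear restriction π2 = Π2β that it carries — can be checked mechanically on noise-free data, where the reduced-form regression recovers the coefficients exactly. A sketch (Python with numpy; all coefficient values are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
T, k1, k2, G = 60, 2, 3, 2
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
beta = np.array([0.5, -1.0])
gamma = np.array([2.0, 0.3])
Pi1 = rng.standard_normal((k1, G))
Pi2 = rng.standard_normal((k2, G))
# noise-free data (u = 0, V = 0) to isolate the coefficient restriction
Yend = X1 @ Pi1 + X2 @ Pi2
y = Yend @ beta + X1 @ gamma
Ybar = np.column_stack([y, Yend])                # [y, Y] as in (7.1)
B_hat, *_ = np.linalg.lstsq(X, Ybar, rcond=None) # MLR coefficient matrix B = [pi, Pi]
pi_hat = B_hat[:, 0]                             # reduced-form coefficients of y
Pi_hat = B_hat[:, 1:]                            # reduced-form coefficients of Y
pi2_hat = pi_hat[k1:]
Pi2_hat = Pi_hat[k1:, :]
```

Up to numerical precision, pi2_hat = Pi2_hat β and the first k1 coefficients satisfy π1 = Π1β + γ, i.e. exactly the restrictions that make the MLR a structural model.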

where the vector w = vec(W1, ..., WT) has a known distribution and J is an unknown nonsingular matrix (which determines the covariance matrix Σ of the error vectors Ūt). This distributional assumption is, in one way, more restrictive than the one made in section 5.1 — because of the assumption on V — and, in another way, less restrictive, because the distribution of u is not taken to be necessarily N[0, σu² IT].

Consider now an hypothesis of the form

H0 : RBC = D ,   (7.3)

where R, C and D are fixed matrices. This is called a uniform linear (UL) hypothesis; for example, the hypothesis β = β0 tested by the AR test can be written in this form [see Dufour and Khalaf (2000)]. The corresponding gaussian LR statistic is

LR(H0) = T ln(|Σ̂0|/|Σ̂|) ,   (7.4)

where Σ̂ = Û′Û/T and Σ̂0 = Û0′Û0/T are respectively the unrestricted and restricted estimates of the error covariance matrix. An important feature of LR(H0) in this case is that its distribution under H0 does not involve nuisance parameters and may easily be simulated (it is a pivot), even though this distribution may be complicated; in particular, its distribution is completely invariant to the unknown J matrix (or the error covariance matrix). Further, this approach allows one to consider any (possibly non-gaussian) distribution on w. The AR test can also be obtained as a monotonic transformation of a statistic of the form LR(H0), and multivariate extensions of AR tests, which impose restrictions on several structural equations, can be obtained in this way.

More generally, it is of interest to note that the LR statistic for just about any hypothesis on B can be bounded by the LR statistic for an appropriately selected UL hypothesis: setting b = vec(B) and

H0 : Rb ∈ Δ0 ,   (7.5)

where R is an arbitrary q × k(G + 1) matrix and Δ0 is an arbitrary subset of Rq, the distribution of the corresponding LR statistic can be bounded by the LR statistic for a UL hypothesis (which is pivotal). This covers as special cases all restrictions on the coefficients of SEM (as long as they are written in the MLR form). In such a case, we can use Monte Carlo test techniques — a form of parametric bootstrap — to obtain exact test procedures.18 To avoid the use of such bounds (which may be quite conservative), it is also possible to use maximized Monte Carlo tests [Dufour (2002)].

18 For further discussion of Monte Carlo test methods, see Dufour and Khalaf (2001) and Dufour (2002).

All the above procedures are valid for parametric models that specify the error distribution up to an unknown linear transformation (the J matrix), which allows for an unknown error covariance matrix. It is easy to see that these procedures (including the exact procedures discussed in section 6) yield "asymptotically valid" procedures under much weaker assumptions than those used to obtain the finite-sample results. However, in view of the discussion in section 4, the pitfalls and limitations of such arguments should be remembered: there is no substitute for a provably exact procedure.

If we aim at getting tests and confidence sets for nonparametric versions of the SEM (where the error distribution involves an infinite set of nuisance parameters), this may be achievable by looking at distribution-free procedures based on permutations, ranks or signs. There is very little work on this topic in the SEM context; for an interesting first look, the reader may consult the recent paper by Bekker (2002).
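A Monte Carlo test for a UL hypothesis can be sketched as follows (Python with numpy; the hypothesis, the dimensions and the number of replications N = 199 are invented, and the known error distribution is taken to be standard normal for simplicity). Because LR(H0) is pivotal under (7.2), simulating it under any admissible error distribution reproduces its null distribution exactly, and the usual Monte Carlo p-value (1 + #{LR_sim ≥ LR_obs})/(N + 1) yields an exact test:

```python
import numpy as np

rng = np.random.default_rng(8)
T, N = 40, 199                        # N simulated statistics for the Monte Carlo test

def lr_ul(Y, X, drop):
    # gaussian LR statistic T*ln(|Sig0|/|Sig|) for the UL hypothesis that the
    # coefficient rows indexed by `drop` are zero in every equation
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    Sig = (U.T @ U)/T
    keep = [i for i in range(X.shape[1]) if i not in drop]
    B0, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
    U0 = Y - X[:, keep] @ B0
    Sig0 = (U0.T @ U0)/T
    return T*(np.log(np.linalg.det(Sig0)) - np.log(np.linalg.det(Sig)))

X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
drop = [1, 2]                         # hypothesis: last two coefficient rows are zero
Y_obs = rng.standard_normal((T, 2))   # data generated under H0
lr_obs = lr_ul(Y_obs, X, drop)
lr_sim = np.array([lr_ul(rng.standard_normal((T, 2)), X, drop) for _ in range(N)])
p_mc = (1 + int(np.sum(lr_sim >= lr_obs))) / (N + 1)
```

Replacing the standard normal draws by any other known distribution for w leaves the procedure exact, which is the point of the simulation-based approach.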

. T. However. While GMM estimators of θ have nonstandard asymptotic distributions. . In such models. θ0 ) = zt (θ0 . θ can be viewed as identiﬁable if there is only one value of θ [say. Zt is a vector of conditioning variables (or instruments) — usually with a number of additional distributional assumptions — and Eθ [ · ] refers to the expected value under a distribution with parameter value θ. But problems similar to those noted for linear SEM do arise. a ﬁrst attempt at deriving ﬁnite-sample procedures is available in Dufour and Taamouti (2001b). In view the discussion in section 4. θ = ¯ that satisﬁes (7. Under parametric assumptions on the errors.e. yt is a vector endogenous variables. . Nonlinear structural models typically take the form: ft (yt . this may be achievable by looking at distribution-free procedures based on permutations. This is not surprising: parametric nonseparability arises here for two reasons. (7. 2003). however.error distribution involves an inﬁnite set of nuisance parameters). once a joint conﬁdence set for θ has been built. i. . inference on individual parameters can be drawn via projection methods. . who proposed an extension of the K(β 0 ) test. are “weakly sensitive” to the value of θ. θ is vector of unknown parameters. There is very little work on this topic in the SEM. θ) | Zt ] = 0 . ranks or signs. the objective function minimized by the GMM procedure follows an asymptotic distribution which is unaffected by the presence of weak instruments: it is asymptotically pivotal.6) where ft (·) is a vector of possibly nonlinear relationships. For an interesting ﬁrst look. θ1 )γ + εt . Eθ [ut | Zt ] = 0 .2. xt . T. Stock and Wright (2000) studied the asymptotic distributions of GMM-based estimators and test statistics under conditions of weak identiﬁcation (and weak “high level” asymptotic distributional assumptions). 
Nonlinearity makes it difﬁcult to construct ﬁnite-sample procedures even in models where identiﬁcation difﬁculties do not occur. and we θ] have weak identiﬁcation (or weak instruments) when the expected values E¯ [ft (yt . θ t. for hypotheses of the form θ = θ0 and the corresponding joint conﬁdence sets. Other contributions in this area include papers by Kleibergen (2001a. . So it is not surprising that work in this area has been mostly based on large-sample approximations. . . the reader should look at an interesting recent paper by Bekker (2002). T. xt . . 2002) proposed tests of underidentiﬁcation and identiﬁcation. θ) = ut .6). . Of course. the hypothesis θ = θ0 is tested by testing γ = 0 in an auxiliary regression of the form: ft (yt . 7. . the fact that all these methods are based on large-sample approximations without a ﬁnite-sample theory remains a concern. model nonlinearity and the possibility of non-identiﬁcation. xt . t. So tests and conﬁdence sets based on the objective function can be asymptotically valid irrespective of the presence of weak instruments.7) 28 . Research on weak identiﬁcation in nonlinear models remains scarce. xt is a vector of exogenous variables. and Wright (2003. . Nonlinear models It is relatively difﬁcult to characterize identiﬁcation and study its consequences in nonlinear structural models. (7. t. These results are achieved for the full parameter vector θ.

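The logic of an exact Monte Carlo test can be illustrated with a minimal sketch. This is not the MLR procedure discussed above: the statistic, sample size, and data below are hypothetical. The point is only that when a test statistic is pivotal under the null hypothesis, ranking the observed value among N simulated draws yields a p-value such that rejecting when p ≤ α has exact level α whenever α(N + 1) is an integer.

```python
import numpy as np

def monte_carlo_pvalue(stat_obs, simulate_stat, n_rep=99, seed=0):
    """Exact Monte Carlo p-value: rank the observed statistic among
    n_rep draws simulated from its (known, pivotal) null distribution."""
    rng = np.random.default_rng(seed)
    sims = np.array([simulate_stat(rng) for _ in range(n_rep)])
    # proportion of simulated statistics at least as large as the observed one
    return (1 + np.sum(sims >= stat_obs)) / (n_rep + 1)

# Illustration: H0: mean = 0 with Gaussian errors of unknown variance,
# using the absolute t-statistic, which is pivotal (the scale cancels).
T = 25
rng = np.random.default_rng(42)
y = rng.normal(loc=0.0, scale=2.0, size=T)  # hypothetical data under H0

t_obs = abs(y.mean() / (y.std(ddof=1) / np.sqrt(T)))

def sim_t(rng):
    w = rng.normal(size=T)  # unit variance suffices: t is scale-invariant
    return abs(w.mean() / (w.std(ddof=1) / np.sqrt(T)))

p = monte_carlo_pvalue(t_obs, sim_t, n_rep=99)
# Rejecting when p <= 0.05 gives an exactly level-0.05 test here,
# since (99 + 1) * 0.05 = 5 is an integer.
```

The same scheme applies to the multivariate LR-type criteria above whenever the errors are known up to the unknown transformation J, since that transformation cancels out of the statistic.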
As in the case of linear SEM, once a joint confidence set for θ has been built, inference on individual parameters can be drawn via projection methods. In the auxiliary regression (7.7), the zt(θ0, θ1) are instruments chosen in a way that maximizes power against a reference alternative (point-optimal instruments): one gets in this way point-optimal tests [see King (1988) and Dufour and King (1991)]. Inference on nonlinear regressions is also covered by this setup, and sample-split techniques may be exploited to approximate optimal instruments.

8. Conclusion

By way of conclusion, we will summarize the main points made in this paper.

1. There are basic pitfalls and limitations faced in developing inference procedures in econometrics. If we are not careful, we can easily be led into ill-defined problems and find ourselves:
(a) trying to test a non-testable hypothesis, such as an hypothesis on a moment in the context of an insufficiently restrictive nonparametric model, or an hypothesis (e.g. a unit root hypothesis) on a dynamic model while allowing a dynamic structure with an unlimited (not necessarily infinite) number of parameters;
(b) trying to solve an inference problem using a technique that cannot deliver a solution because of the very structure of the technique, as in (i) testing an hypothesis on a mean (or median) under heteroskedasticity of unknown form via standard least-squares-based "heteroskedasticity-robust" standard errors, or (ii) building a confidence interval for a parameter which is not identifiable in a structural model via the usual technique based on standard errors.

2. In many econometric problems (such as inference on structural models where parameters may be underidentified), several of the intuitions derived from the linear regression model and standard asymptotic theory can easily be misleading.
(a) Standard errors do not constitute a valid way of assessing parameter uncertainty and building confidence intervals; this type of difficulty arises for Wald-type statistics in the presence of weak instruments (or weakly identified models). These difficulties underscore the pitfalls of large-sample approximations, which are typically based on pointwise (rather than uniform) convergence results and may be arbitrarily inaccurate in finite samples.
(b) In many models, individual parameters are not generally meaningful, but parameter vectors can be (at least in parametric models); we called this phenomenon parametric nonseparability. As a result, restrictions on individual coefficients may not be testable, while restrictions on the whole parameter vector are. This feature should play a central role in designing methods for dealing with weakly identified models.

3. Concerning solutions to such problems, and more specifically in the context of weakly identified models, the following general remarks are in our view important.
(a) In accordance with basic statistical theory, one should always look for pivots as the fundamental ingredient for building tests and confidence sets.
(b) Pivots are not generally available for individual parameters, but they can be obtained in a much wider set of cases for appropriately selected vectors of parameters.
(c) Given a pivot for a parameter vector, we can construct valid tests and confidence sets for the parameter vector.
(d) Inference on individual coefficients may then be derived through projection methods.

4. In the specific example of SEM, we have emphasized the following points.
(a) Besides being pivotal, the AR statistic enjoys several remarkable robustness properties, such as robustness to the presence of weak instruments, to excluded instruments, or to the specification of a model for the endogenous explanatory variables.
(b) It is possible to improve the power of AR-type procedures (especially by reducing the number of instruments), but power improvements may come at the expense of using a possibly unreliable large-sample approximation or losing robustness (such as robustness to excluded instruments). As usual, there is a trade-off between power (which is typically increased by considering more restrictive models) and robustness (which involves considering a wider hypothesis).
(c) Trying to adapt and improve AR-type procedures (without ever forgetting basic statistical principles) constitutes the most promising avenue for dealing with weak instruments.
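The pivot-then-project logic of points 3(c) and 3(d) can be sketched numerically. The simulated design below is hypothetical (the sample size, instrument matrix, first-stage coefficients, and grid are illustrative choices, not taken from the paper): in a linear IV model with normal errors, the AR statistic for H0: β = β0 follows an F(k, T − k) distribution whatever the instrument strength, so inverting it over a grid yields a valid joint confidence set, whose coordinate-wise projections give (possibly conservative) intervals for individual coefficients.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, k = 200, 4                              # sample size, number of instruments
beta_true = np.array([1.0, -0.5])          # two endogenous coefficients
Z = rng.normal(size=(T, k))                # instruments
Pi = rng.normal(scale=0.5, size=(k, 2))    # first-stage coefficients
V = rng.normal(size=(T, 2))
Y = Z @ Pi + V                             # endogenous regressors
u = rng.normal(size=T) + 0.3 * V[:, 0]     # structural error, correlated with V
y = Y @ beta_true + u

Q, _ = np.linalg.qr(Z)                     # orthonormal basis of the instrument space

def ar_stat(b):
    """Anderson-Rubin statistic for H0: beta = b."""
    e = y - Y @ b                          # structural residuals at beta = b
    fit = Q.T @ e
    num = (fit @ fit) / k                  # e' P_Z e / k
    den = (e @ e - fit @ fit) / (T - k)    # e' M_Z e / (T - k)
    return num / den

crit = stats.f.ppf(0.95, k, T - k)         # AR(b) ~ F(k, T-k) under H0

# Invert the AR test over a grid to get a joint 95% confidence set,
# then project it on each axis (conclusion points 3(c)-(d)).
grid = np.linspace(-3.0, 3.0, 121)
accepted = np.array([[b1, b2] for b1 in grid for b2 in grid
                     if ar_stat(np.array([b1, b2])) <= crit])
if accepted.size:
    ci_b1 = (accepted[:, 0].min(), accepted[:, 0].max())
    ci_b2 = (accepted[:, 1].min(), accepted[:, 1].max())
else:
    print("empty confidence set at this level (possible, though unlikely)")
```

Under H0 the residuals reduce to u, so the first-stage coefficients Pi never enter the null distribution: weakening the instruments widens the projected intervals (possibly to unbounded sets) but leaves their level intact, which is the robustness property stressed in point 4(a).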
National Bureau of Economic Research. Equivalent Models.. F. W. J. D. Error Correction.. WANSBEEK (1994): Identiﬁcation. AND A. JAEGER . J. H ENDRY (1993): Co-Integration.. 77–83..” Technical Working Paper 137. AND A. 1115–1122. P. R. J. A. P. B EKKER . AND D. AND F. A. 443–450. University of Amsterdam.” Review of Economics and Statistics. B EKKER . with Application to a Model of the Moroccan Economy.. Amsterdam.” Journal of Business and Economic Statistics. Academic Press. A NGRIST. K RUEGER (1995): “Split-Sample Instrumental Variables Estimates of the Return to Schooling. D. T.” Annals of Mathematical Statistics. BAKER (1993): “The Cure can be Worse than the Disease: A Cautionary Tale Regarding Instrumental Variables. A. A. K LEIBERGEN (2001): “Finite-Sample Instrumental Variables Inference Using an Asymptotically Pivotal Statistic. AND R. D. and Computer Algebra. R. 31 .” Annals of Mathematical Statistics. The Netherlands.” Econometrica. Groningen. 46–63. A. Oxford University Press Inc. New York. AND H. P. University of Groningen.. 979–1014. (1957): “A General Classical Method of Linear Estimation of Coefﬁcients in a Structural Equation. A NGRIST. L.” Journal of the American Statistical Association.. J. The Netherlands. AND R. and the Econometric Analysis of Non-Stationary Data. (1992): “The Bias of Instrumental Variables Estimators. BASMANN . S AVAGE (1956): “The Nonexistence of Certain Statistical Procedures in Nonparametric Problems. R. Tinbergen Institute. B USE . (2002): “Symmetry-Based Inference in an Instrumental Variable Setting. AND L... J. D OLADO .” Journal of Applied Econometrics.” Econometrica. 90. Department of Economics. BAKER (1995): “Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak. 20. 13. B OUND . B.. J. 7. RUBIN (1949): “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations. A.. Boston. R. J. CVI. S. A NDERSON . 
A. Cambridge. W. LXXX.References A BDELKHALEK . BANERJEE . B LOUGH . (1992): “The Relationship between Power and Level for Generic Unit Root Tests in Finite Samples. J. MA. M. BAHADUR . 60. JAEGER . B OUND .” Quarterly Journal of Economics.-M. G ALBRAITH . 27.” Discussion Paper TI 2001-055/4. B EKKER . D. AND T. 173–180. K RUEGER (1991): “Does Compulsory School Attendance Affect Schooling and Earning?. D UFOUR (1998): “Statistical Inference for Computable General Equilibrium Models. 225–235. AND J. M ERCKENS .” Discussion paper. 295–308. T. 25. A. 520–534. B.

AND P.” Econometrica..” Discussion paper. New York. with Applications to Structural and Dynamic Models. AND M.-M.” in Présentations de l’Académie des lettres et des sciences humaines..” Discussion paper. I. 58. théorie des tests et philosophie des sciences. 77. New Brunswick.” Journal of Economic Dynamics and Control. J.” International Economic Review. 117–128.” Econometrica. 1–16. (2003): “Alternative Approximations of the Bias and MSE of the IV Estimator Under Weak Identiﬁcation With an Application to Bias Correction. M. G. (2000): “Économétrie.. 32 . DAVIDSON .C AMPBELL . 1365–1389. (1981): “Rank Tests for Serial Dependence. J.-M. 475–494. B. AND J. C. K. 3. M C G ARVEY (1999): “Covariance Matrix Estimation. C OCHRANE . 15. 171–190. pp. AND W. 113–150. (1990): “Exact Tests and Conﬁdence Sets in Linear Regressions with Autocorrelated Errors.” Journal of Time Series Analysis. 77(2). S WANSON (2001): “Bias and MSE Analysis of the IV Estimator Under Weak Identiﬁcation with Application to Bias Correction. Rutgers University. 69. B. Department of Economics. 65. S.. 1161–1191. G. 51. 38. D UFOUR . (2001): “Logique et tests d’hypothèses: réﬂexions sur les problèmes mal posés en économétrie. AND J.” in Mátyás (1999). pp.” Review of Economics and Statistics. J. (1997): “Some Impossibility Theorems in Econometrics. R. 63–95. C HOI . chap. C USHING . 166–182. 151–173. 275–284. R. AND N. D ONALD . 2.” L’Actualité économique. G. Rutgers University. P HILLIPS (1992): “Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identiﬁed Structural Equations.” Journal of Econometrics. Ottawa. 53.. vol. New Brunswick. C HAO . Department of Economics. N EWEY (2001): “Choosing the Number of Instruments.” Econometrica. (1991): “A Critique of the Application of Unit Root Tests.. New Jersey. New Jersey. D UFOUR (1995): “Exact Nonparametric Orthogonality and Random Walk Tests. J. H. Royal Society of Canada/Société royale du Canada. 
Oxford University Press. M AC K INNON (1993): Estimation and Inference in Econometrics. (1997): “Exact Nonparametric Tests of Orthogonality and Random Walk in the Presence of a Drift Parameter.

C. AND I. 111(2). 23.-M. Two-Sided Autoregressions and Exact Inference for Stationary and Nonstationary Autoregressive Processes.” Journal of Econometrics. (2000): “Markovian Processes. (2002): “Simulation Based Finite and Large Sample Tests in Multivariate Regressions. J. chap. Université de Montréal.D. AND J.” Journal of Nonparametric Statistics. D UFOUR .” Journal of Econometrics.R.. by B. J. 494–519.” Discussion paper.” Discussion paper. and A. J.” Journal of Econometrics.-M. ed.E. AND M.E. JASIAK (1993): “Finite Sample Inference Methods for Simultaneous Equations and Models with Unobserved and Generated Regressors. Baltagi. AND J. D UFOUR . 303–322. D UFOUR ..-M... Blackwell Companions to Contemporary Economics. Université de Montréal. E. (2001a): “On Methods for Selecting Instruments. Giles. 66.(2002): “Monte Carlo Tests with Nuisance Parameters: A General Approach to FiniteSample Inference and Nonstandard Asymptotics in Econometrics. M.. Ullah.” Discussion paper. D UFOUR .. C.. J. C.. Marcel Dekker.R. pp.D. 465–505. New York.K.E. 47. 9. 38 pages. D UFOUR . C.E. 115–143.D. D UFOUR . AND O. J.R. J. K HALAF (2000): “Simulation-Based Finite-Sample Inference in Simultaneous Equations.. M IZERA (1998): “Generalized Runs Tests for Heteroskedastic Time Series.” International Economic Review.. D UFOUR .-M.D.” Econometrica. Oxford. F.. (2001): “Monte Carlo Test Methods in Econometrics. C. 33 . A.-M. U. K IVIET (1998): “Exact Inference Methods for First-Order Autoregressive Distributed Lag Models.-M. 42. by D. 815–843. 39–86.-M.R. Université de Montréal.” in Handbook of Applied Economic Statistics. J.D.” Discussion paper..E.” Discussion paper. TAAMOUTI (2000): “Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments.. Basil Blackwell. AND M. 99.-M. D UFOUR .R. T ORRÈS (1998): “Union-Intersection and Sample-Split Methods in Econometrics with Applications to SURE and MA Models. 255–289. J. 
forthcoming.” in Companion to Theoretical Econometrics. AND L. K ING (1991): “Optimal Invariant Tests for the Autocorrelation Coefﬁcient in Linear Regressions with Stationary or Nonstationary AR(1) Errors. AND J. ed. Université de Montréal. 79–104. (2001b): “Point-Optimal Instruments and Generalized Anderson-Rubin Procedures for Nonlinear Models.” Journal of Econometrics. pp. L. JASIAK (2001): “Finite Sample Limited Information Inference Methods for Structural Equations and Models with Generated Regressors. H ALLIN . Université de Montréal.

W. 75. A. Massachusetts Institute of Technology. M. (1999): “Hypothesis Testing in Models Estimated by GMM. P. D.. 70. Amsterdam. 34 . M C FADDEN (eds. (2002c): “Weak Instruments: Diagnosis and Cures in Empirical Econometrics. F.) (1983): Handbook of Econometrics. H AUSMAN (2002a): “A New Speciﬁcation Test for the Validity of Instrumental Variables. 1208–1212. North-Holland. J. D. 37. pp. AND D. (1976): The Identiﬁcation Problem in Econometrics. 15. AND J. 283–298. (1999): “Theoretical conﬁdence level problems with conﬁdence intervals for the spectrum of a time series. chap. G LESER . (1982): “Large Sample Properties of Generalized Method of Moments Estimators. 96–127. L. Massachusetts. J. J. 724–732. Massachusetts. AND M. Volume 4. F. Huntington (New York). J. Krieger Publishing Company. Amsterdam. A.” Economics Letters. 629–637. P EIXE (2000): “A Consistent Method for the Selection of Relevant Instruments. R. AND F.” in Mátyás (1999). Department of Economics. G RILICHES . V. 50.) (1994): Handbook of Econometrics. AND J. H AHN . 1029–1054. Z. A. RUDEBUSCH . North Carolina. Department of Economics. H ALL . FAUST. 1343.E NGLE .” The Annals of Mathematical Statistics. G. North-Holland. L. Cambridge.. FAUST. Volume 1. R. 12. T. H AHN . 237–241. H AUSMAN .” Econometric Theory. W ILCOX (1996): “Judging Instrument Relevance in Instrumental Variables Estimation. AND D. (1996): “Near Observational Equivalence Problems and Theoretical Size Problems with Unit Root Tests. G ODAMBE .. J. F ISHER . North Carolina State University. J. (1960): “An Optimum Property of Regular Maximum Likelihood Estimation.” Econometrica. P. 1351–1362. H WANG (1987): “The Nonexistence of 100(1 − α) Conﬁdence Sets of Finite Expected Diameter in Errors in Variables and Related Models.” Discussion paper. 163–189.” Discussion paper. 31.. H ALL . (2002b): “Notes on Bias in Estimators for Simultaneous Equation Models.” Discussion paper. H ANSEN . AND G. 67. Cambridge. 
Raleigh.” International Economic Review. R. Ackowledgement 32 (1960). M.” Econometrica. Department of Economics... H ALL .” Econometrica. Massachusetts Institute of Technology. L. K UERSTEINER (2001): “Higher Order MSE of Jackknife 2SLS. R. I NTRILLIGATOR (eds..” The Annals of Statistics. 4.

H OROWITZ . S TEIN (1949): “On the Theory of some Non-Parametric Hypotheses.” Discussion paper. (1983): “Identiﬁcation. (1992): “Fellow’s Opinion: Rules of Thumb and Pseudo-Science. (1990): “On the Normalization of Structural Equations: Properties of Direction Estimators. L. University of California at Berkeley. (2001b): “Testing Subsets of Structural Coefﬁcients in the IV Regression Model. I MBENS .” in Griliches and Intrilligator (1983). L. (1987): “A Characterization of the Fieller Solution.” Discussion Paper TI 01-067/4. (1988): “Towards a Theory of Point Optimal Testing. 462–468.. University of Amsterdam. University of Amsterdam. H. Z IVOT (2003): “Bayesian and Classical Approaches to Instrumental Variable Regression. Comments and reply. M. 6.. F. (2001a): “Testing Parameters in GMM Without Assuming That They are Identiﬁed. 219-255. 53.” Discussion paper. J. 20.” Econometrica. (2002b): “Two Independent Statistics That Test Location and Misspeciﬁcation and AddUp to the Anderson-Rubin Statistic. AND C. Department of Quantitative Economics. 70(5).” Econometric Reviews. 4. G. AND C. W. K LEIBERGEN . Department of Quantitative Economics.H ILLIER . (2003): “Expansions of GMM Statistics That Indicate their Properties under Weak and/or Many Instruments. E. 28–45.” The Annals of Statistics. 58. The Netherlands. M. Amsterdam. K LEIBERGEN . (1986): Testing Statistical Hypotheses. 1–4. 223–283. 100(1). L EHMANN . L. chap. G.” Annals of Mathematical Statistics. University of Amsterdam. 29–72. 114(1). 15. Department of Quantitative Economics.” Discussion paper. AND E.” Journal of Econometrics. 2nd edition. C. (2002a): “Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression. California.” Journal of Econometrics. H SIAO . 37–40. Tinbergen Institute. KOSCHAT. 169– 218. A. E. L EHMANN . F..” Discussion paper. 1181–1194. L. John Wiley & Sons.” Journal of Econometrics. E.” Econometrica. 
M ANSKI (2003): “Conﬁdence Intervals for Partially Identiﬁed Parameters. M AASOUMI . 35 . 1781–1803. (2001): “The Bootstrap and Hypothesis Tests in Econometrics. K ING . pp. New York. Berkeley. F. Department of Economics.

S. 2111–2245..” Econometrica. Harvard University Press. Cambridge..” in Engle and McFadden (1994).” Econometrica. 91–128.. D. S. C. (1990b): “Some Further Results on the Exact Small Properties of the Instrumental Variable Estimator. 60. P OI (2001): “Implementing Tests with Correct Size in the Simultaneous Equations Model. (2003a): “A Conditional Likelihood Ratio Test for Structural Models. Cambridge and London. D. (1995): Identiﬁcation Problems in the Social Sciences. 63. N EWEY. Department of Economics.. M OREIRA . W EST (1987a): “Hypothesis Testing with Efﬁcient Method of Moments Estimators. Cambridge University Press. M C FADDEN (1994): “Large Sample Estimation and Hypothesis Testing. A. Massachusetts. G. Massachusetts. 1–15. AND J. M ANSKY. J. 62. AND B. 1027–1048. W. Cambridge. M OREIRA . N ELSON . G. Cambridge University Press.” Econometrica.M AC K INNON . 58... J. G.” Journal of Business. (ed. M ADDALA . AND I.” Discussion paper.” Canadian Journal of Economics. Springer-Verlag. C. L.) (1999): Generalized Method of Moments Estimation. M ADDALA . 36 . 71(4). M ADDALA .” International Economic Review. (2002): “Bootstrap Inference in Econometrics. AND D. J EONG (1992): “On the Exact Small Sample Distribution of the Instrumental Variable Estimator. J. C. U.” Journal of Econometrics. Springer Series in Statistics. 36. 615–645. NANKERVIS .” Discussion paper. Cambridge. (2001): “Tests With Correct Size When Instruments Can Be Arbitrarily Weak. P. W. 42. Harvard University. Department of Economics.” The Stata Journal. S TARTZ (1990a): “The Distribution of the Instrumental Variable Estimator and its t-ratio When the Instrument is a Poor One.K. pp. C. K. AND K. M. K IM (1998): Unit Roots. J. 841–851.K. Cointegration and Structural Change. G. 967–976. 1(1). E. (2003): Partial Identiﬁcation of Probability Distributions.-M. S. K. N EWEY. 777–787. AND R. M. (2003b): “A General Theory of Hypothesis Testing in the Simultaneous Equations Model. 
Harvard University.” Econometrica. chap. AND N. 181–183. M ÁTYÁS . M C M ANUS . 28. 125–140. R. M ANSKI . U. S AVIN (1994): “Multiple Optima and Asymptotic Approximations in the Partial Adjustment Model. New York. Cambridge.. 35(4). (1974): “Some Small Sample Evidence on Tests of Signiﬁcance in Simultaneous Equations Models.

G IBBONS (1981): Concepts of Nonparametric Theory.” Econometrica. 75. D. 70(3). 51. 37 . S HEA . 703–708. pp. B. Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. S IMS . Amsterdam. ed. 85(2).” Review of Economics and Statistics. 42. 335–346. (1971b): “Discrete Approximations to Continuous Time Distributed Lags in Econometrics. P ERRON . North-Holland. 545–563. (2002): “Lower Risk Bounds and Properties of Conﬁdence Sets for Ill-Posed Estimation Problems with Applications to Spectral Density and Persistence Estimation. T. 21–36. ROTHENBERG .” in Griliches and Intrilligator (1983).” in Handbook of Econometrics.” Biometrika. 55(2). Academic Press. 348–352. New York. P. (1968): The Logic of Scientiﬁc Discovery.” Econometrica. 277–301. B. 449–516. Positive Semi-Deﬁnite.. P RATT. New York. 25. P.” Annals of Mathematical Statistics. 39. chap. P ÖTSCHER . P HILLIPS . Intrilligator. 15.” International Economic Review. S IMS . 181–240. C. C. A. B. B. (1989): “Partially Identiﬁed Econometric Models. (1984): “Approximating the Distributions of Econometric Estimators and Test Statistics. 5. and M. 577–591. 424–443. chap. D. pp.” Econometric Theory. P ERRON (1988): “Testing for a Unit Root in Time Series Regression. L. revised edn. J. Griliches. 881–935. W. C. 39. by Z.” International Economic Review. P HILLIPS . (1987): “Time Series Regression with Unit Root. 8. B. J. (1985): “The Exact Distribution of LIML: II. P OPPER .” Econometrica. J. LXXIX. (1971): “Identiﬁcation in Parametric Models. (1983): “Identiﬁcation and Lack of Identiﬁcation. 55. K. AND P. P RAKASA R AO . S. Unit Roots and Estimation of Long Memory Parameters. New York. 26. 249–261. (1992): Identiﬁability in Stochastic Models: Characterization of Probability Distributions. AND J.” Econometrica. C. J.. (1971a): “Distributed Lag Estimation When the Parameter Space is Explicitly InﬁniteDimensional.” Review of Economics and Statistics. 
(1997): “Instrument Relevance in Multivariate Linear Models: A Simple Measure. Springer-Verlag.(1987b): “A Simple. S ARGAN . Harper Torchbooks. D. 1622–1636. 1035–1065. (1983): “Exact Small Sample theory in the Simultaneous Equations Model. (1984): “The Exact Distribution of LIML: I. Volume 2.” Econometrica. 1605–1633.” Econometrica. (2003): “Semiparametric Weak Instrument Regressions with an Application to the Risk Return Tradeoff.

T REBBI (2003): “Who Invented IV Regression?. Harvard University. R. H. J. 1097–1126. Department of Economics.. N. New Jersey. The Netherlands. Priceton University. Princeton. R. AND E. forthcoming.” Discussion paper. pp. (1953): “Repeated Least-Squares Applied to Complete Equation Systems. 20(4). J. Department of Economics. 48. S TAIGER .” Discussion paper. AND J. H. C. W RIGHT. J. H. W HITE . 2. The Hague. Massachusetts. AND M. S TOCK . S TOCK (1997): “Instrumental Variables Regression with Weak Instruments. Z IVOT (1998): “Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments. John Wiley & Sons.B. H.” Econometric Reviews. WANG . AND M. J. H. (1983): “On the Relevance of Finite Sample Distribution Theory. TANAKA . AND J. 518–529.(2001): “Thinking About Instrumental Variables. Z IVOT (2001): “Improved Inference for the Instrumental Variable Estimator. TAYLOR . 46. 66. S TOCK . 65. Cambridge. (2003): “Asymptotic Distributions of Instrumental Variables Statistics with Many Weak Instruments.” Econometrica. Massachusetts. S TOCK .” in Engle and McFadden (1994). AND F. (1980): “A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity. (1996): Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. S TOCK . New York. H. 2740–2841. K. 1–39. W ORKING .” Discussion paper. 817–838... University of Washington. 1389–1404.” The Journal of Economic Perspectives.” Discussion paper. 41. H. 557–586. chap. N ELSON .” Econometrica. 68. T HEIL .. Cambridge. Central Planing Bureau. H.. E. H.” Econometrica. J.” Econometrica. J. J. (1927): “What Do Statistical Demand Curves Show?. D. 38 . H. W. YOGO (2002): “A Survey of Weak Instruments and Weak Identiﬁcation in Generalized Method of Moments.” Discussion Paper 284. S TOCK .. Department of Economics. (1994): “Unit Root.. AND E. S TARTZ . W RIGHT (2000): “GMM with Weak Identiﬁcation.” Journal of Business and Economic Statistics. 212–235. 
Structural Breaks and Trends. Harvard University.E. YOGO (2002): “Testing for Weak Instruments in Linear IV Regression..” The Quarterly Journal of Economics. E.R. J.

” International Economic Review. 322–330. R. (2003): “Detecting Lack of Identiﬁcation in GMM. Macmillan. New York. E. 39.” Discussion Paper 732. S TARTZ . 39 .” Discussion paper.. Board of Governors of the Federal Reserve System. D. 19(2). H.W RIGHT. 1119–1144. R. (2003): “Inference in Partially Identiﬁed Instrumental Variables Regression with Weak Instruments. (2002): “Testing the Null of Identiﬁcation in GMM. Washington. N ELSON (1998): “Valid Conﬁdence Intervals and Inference in the Presence of Weak Instruments. University of Washington. Department of Economics. International Finance Discussion Papers.” Econometric Theory. AND C.C. J. (1928): The Tariff on Animal and Vegetable Oils. P. G. W RIGHT. Z IVOT.