
ARTICLE IN PRESS

Journal of Econometrics 144 (2008) 81–117


www.elsevier.com/locate/jeconom

Partial identification of probability distributions with misclassified data
Francesca Molinari
Department of Economics, Cornell University, 492 Uris Hall, Ithaca, NY 14853-7601, USA
Received 14 March 2006; received in revised form 19 August 2007; accepted 11 December 2007
Available online 6 January 2008

Abstract

This paper addresses the problem of data errors in discrete variables. When data errors occur, the observed variable is a
misclassified version of the variable of interest, whose distribution is not identified. Inferential problems caused by data
errors have been conceptualized through convolution and mixture models. This paper introduces the direct misclassification
approach. The approach is based on the observation that in the presence of classification errors, the relation between the
distribution of the ‘true’ but unobservable variable and its misclassified representation is given by a linear system of
simultaneous equations, in which the coefficient matrix is the matrix of misclassification probabilities. Formalizing the
problem in these terms allows one to incorporate any prior information into the analysis through sets of restrictions on the
matrix of misclassification probabilities. Such information can have strong identifying power. The direct misclassification
approach fully exploits it to derive identification regions for any real functional of the distribution of interest. A method
for estimating the identification regions and constructing their confidence sets is given, and illustrated with an empirical
analysis of the distribution of pension plan types using data from the Health and Retirement Study.
© 2007 Elsevier B.V. All rights reserved.

JEL classification: C10; C13; C14; J26

Keywords: Misclassification; Partial identification; Direct misclassification approach

1. Introduction

Error-ridden data constitute a significant problem in nearly all fields of science. There are many possible
sources of data errors. Examples include use of inexact measures because of high costs or infeasibility of exact
evaluation, tendency of study subjects to underreport socially undesirable behaviors and attitudes and
overreport socially desirable ones, or imperfect recall (or lack of knowledge) by study subjects. When data
errors are present, often the sampling process does not identify the probability distribution of interest, and
inference is impaired.
This paper addresses the problem of data errors in discrete variables. Interest in the question emerges from
the observation that much of the empirical work in economics and related fields is based on the analysis of
survey data. The reliability of these data is well documented to be less than perfect (see for example Bound
et al., 2001). Although survey questions may gather information on variables that are conceptualized as
continuous (e.g., age, earnings, etc.), a considerable part of the collected data is in the form of variables taking
values in finite sets. Examples include educational attainment, language proficiency, workers’ union status,
employment status, health conditions, and health/functional status.

Tel.: +1 6072556367; fax: +1 6072552818.
E-mail address: fm72@cornell.edu

0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2007.12.003

When data errors occur in variables of this type, it is natural to think about the problem in terms of
classification errors (see for example Bross, 1954; Aigner, 1973). An example may clarify this point. Suppose
that an analyst is interested in learning the distribution of pension plan types in the American population.
Three types are possible: defined benefit (DB), defined contribution (DC), and plans incorporating features of
both. Suppose that the analyst has data from a nationally representative survey which queried a random
sample of American households about their pension plans’ characteristics. Validation studies document that a
significant fraction of the reported plan types differ from the truth; for example, some people who truly have a
DB plan are erroneously classified as having a DC plan (Gustman and Steinmeier, 2001).
To formalize the problem, suppose that each member $l$ of a population $L$ is characterized by the vector
$(w_l, x_l) \in X \times X$, where $X$ is a discrete set, not necessarily ordered, denoted by $X \equiv \{1, 2, \ldots, J\}$, $2 \le J < \infty$.
Let a sampling process draw persons at random from $L$. Suppose that the analyst is interested in learning
features of the distribution $P(x)$ from the available data. However, she does not observe realizations of x, but
observes realizations of w, which may either equal or differ from the realizations of x. In the above example,
x denotes the true pension plan type and w the type reported in the survey.
Much of the existing literature on drawing inference in the presence of error-ridden data has conceptualized the
problem using either convolution models or mixture models. In the case of convolution models, a latent variable
$v \in V$ is introduced and w is assumed to measure x with chronic (i.e., affecting each observation) ‘errors-in-variables’:
$w = x + v$. Researchers using convolution models commonly assume that the latent variable v is
statistically independent from x or uncorrelated with x and has mean zero (see, e.g., Klepper and Leamer,
1984). In the case of mixture models, latent variables $v \in V$ and $z \in \{0, 1\}$ are introduced and w is viewed as a
contaminated version of x, generated by the mixture $w = zx + (1 - z)v$. In this model, z denotes whether x or v
is observed and realizations of w with $z = 1$ are said to be error free. Researchers using mixture models
commonly assume that the error probability $\Pr(z = 0)$ is known or at least that it can be bounded non-trivially
from above (see, e.g., Horowitz and Manski, 1995).
When a variable with finite support is imperfectly classified, it is widely recognized that the assumption,
typical in convolution models, of independence between measurement error and true variable cannot hold
(see, for example, Bound et al., 2001, p. 3735). Moreover, compelling evidence from validation studies suggests
that errors in the data are occasional rather than ‘chronic’: a significant part of the observed data are error
free. Mixture models seem therefore more suited for the analysis of such data. However, often the researcher
has prior information on the nature of the misclassification pattern that has transformed x into w. This
information may aid in identification, but cannot easily be exploited through a mixture model.
In this paper I propose an alternative framework, which I call the direct misclassification approach, to draw
inference on the distribution of discrete variables subject to classification errors. The approach does not rely
on the introduction of latent variables, but is based on the observation that in the presence of misclassification,
the relation between the observable distribution of w and the unobservable distribution of x is given by
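\[
\begin{bmatrix} \Pr(w = 1) \\ \vdots \\ \Pr(w = J) \end{bmatrix}
=
\begin{bmatrix}
\Pr(w = 1 \mid x = 1) & \cdots & \Pr(w = 1 \mid x = J) \\
\vdots & \ddots & \vdots \\
\Pr(w = J \mid x = 1) & \cdots & \Pr(w = J \mid x = J)
\end{bmatrix}
\begin{bmatrix} \Pr(x = 1) \\ \vdots \\ \Pr(x = J) \end{bmatrix}.
\tag{1.1}
\]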
In all that follows I denote by $\Pi^*$ the matrix of elements $\{\Pr(w = i \mid x = j)\}_{i,j \in X}$ which appears on the right-hand
side of the above equation. For $i \ne j$, $\Pr(w = i \mid x = j)$ is generally referred to as ‘misclassification
probability.’ Eq. (1.1) is a simple formalism and does not have content per se. However, it becomes potentially
informative when combined with assumptions on the matrix of misclassification probabilities $\Pi^*$; such
assumptions generate a misclassification model.
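To make Eq. (1.1) concrete, the following minimal Python sketch applies a hypothetical 3 × 3 matrix of misclassification probabilities to a hypothetical true distribution and returns the implied distribution of the reported variable. The numbers and the function name `observed_distribution` are illustrative assumptions, not values taken from any validation study.

```python
# Hypothetical three-category illustration of Eq. (1.1); all numbers are made up.

def observed_distribution(pi_star, p_x):
    """Return P^w = Pi* P^x, where pi_star[i][j] = Pr(w = i+1 | x = j+1)."""
    J = len(p_x)
    return [sum(pi_star[i][j] * p_x[j] for j in range(J)) for i in range(J)]

# Rows are indexed by the reported value w, columns by the true value x,
# so every column is a probability mass function and sums to one.
PI_STAR = [
    [0.90, 0.10, 0.05],
    [0.05, 0.80, 0.05],
    [0.05, 0.10, 0.90],
]
P_X = [0.5, 0.3, 0.2]  # assumed true distribution of x

P_W = observed_distribution(PI_STAR, P_X)
print([round(v, 4) for v in P_W])
```

Because the columns of the matrix sum to one, the implied vector of reported-type probabilities also sums to one. The identification problem studied in the paper runs in the opposite direction: the reported distribution is observed, while the matrix and the true distribution are not.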
The method that I introduce allows one to draw inference on $P(x)$ and on any real functional of this
distribution using Eq. (1.1) directly, when restrictions on the elements of $\Pi^*$ are imposed. Due to the
classification errors, the identification of the probability distribution $P(x)$ is partial and the inference on any of
its real functionals is in the form of identification regions, that is, sets collecting the feasible values of such
functionals. I show that these regions are ‘sharp,’ in the sense that they exhaust all the available information,
given the sampling process and the maintained assumptions. Manski (2003) gives an overview of the literature
on partial identification; for other work see, e.g., Hotz et al. (1997) and Blundell et al. (2007).
The restrictions imposed on $\Pi^*$ can have several origins, including validation studies, economic theory,
cognitive and social psychology, or information on the circumstances under which the data have been
collected. In this paper I study their identifying power in general. I then consider a few specific examples. As a
starting point, I assume that the researcher has a known lower bound on the probability that the realizations
of w and x coincide, i.e., $\Pr(w = x) \ge 1 - \lambda$, or, strengthening this assumption, that the researcher has a known
lower bound on the probability of correct report for each value that x can take, i.e., $\Pr(w = j \mid x = j) \ge 1 - \lambda$,
$\forall j \in X$. This information is often provided by validation studies or knowledge of the circumstances under
which the data have been collected.¹ In this paper it is regarded as ‘base-case’ information, and the
identification regions derived under these assumptions constitute the baseline of the analysis. Then I consider
the case of ‘constant probability of correct report’ and the case of ‘monotonicity in correct reporting.’ I show
that these assumptions can have identifying power when maintained alone, as well as when imposed jointly
with the base case assumptions.
The assumption of constant probability of correct report is motivated by the findings of validation studies.
For specific survey inquiries, these studies suggest that the probability of correct report, for at least a subset of
the values that x can take, is constant. For example, in the context of self-reports of employment status,
Poterba and Summers’ (1995) analysis suggests that there is approximately the same probability of correct
report for people who are employed and for those who are not in the labor force, but a much lower probability
of correct report for people who are unemployed.
The assumption of monotonicity in correct reporting is motivated by social psychology, which suggests that
when survey respondents are asked questions relative to socially and personally sensitive topics, they tend to
underreport socially undesirable behaviors and attitudes, and overreport socially desirable ones. This
suggestion is supported by validation studies, which often document, within a given survey inquiry, that the
probability of correct report of a certain alternative is greater than or equal to the probability of correct report
of a less socially desirable alternative. This is the case, for example, when survey respondents are asked about
their participation in welfare programs.
The proposed method allows the researcher to easily incorporate these assumptions, and in general any
restriction on the misclassification pattern, into the analysis. The method is easy to implement and often
computationally tractable (see Section 2.2 for a discussion of computational issues). Despite the fact that the
results of validation studies on discrete variables are often presented in the form of matrices of
misclassification probabilities (see, e.g., Bound et al., 2001), and the appeal of the simple formalization
given by the misclassification models, there appear to be no precedents to the direct use of Eq. (1.1) to deal
with the identification problems caused by classification errors.
However, there are precedents to the use of specific restrictions on misclassification probabilities. Aigner
(1973), Klepper (1988) and Bollinger (1996) imposed different sets of assumptions on the probabilities of
misclassifying a dichotomous variable x and derived sharp non-parametric bounds on the mean regression
$E(y \mid x)$. Their approach is close in spirit to the one in this paper, but their methods are designed exclusively for
binary variables and for the case in which specific assumptions hold. Swartz et al. (2004) discuss identification
problems due to misclassification from a Bayesian perspective. In particular, they focus on ‘permutation-type
non-identifiability,’ by which switching the positions of $\Pr(x = i)$ and $\Pr(x = j)$, and those of $P(w \mid x = i)$ and
$P(w \mid x = j)$, leaves the implied distribution $P(w)$ unchanged. They introduce several assumptions on the
matrix of misclassification probabilities which overcome this type of problem, and achieve point identification
by imposing a prior on the misclassification matrix and on $P(x)$.
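The permutation-type non-identifiability described above can be checked numerically. The sketch below is an illustration with made-up numbers: it swaps two columns of a hypothetical misclassification matrix together with the corresponding entries of a hypothetical true distribution, and compares the implied distributions of w. The helper names are assumptions introduced only for this example.

```python
# Illustration of permutation-type non-identifiability; all numbers are made up.

def observed_distribution(pi, p_x):
    """Return the distribution of w implied by the matrix pi and the vector p_x."""
    J = len(p_x)
    return [sum(pi[i][j] * p_x[j] for j in range(J)) for i in range(J)]

def swap_labels(pi, p_x, i, j):
    """Swap columns i and j of pi and entries i and j of p_x (0-based indices)."""
    pi_swapped = [row[:] for row in pi]
    for row in pi_swapped:
        row[i], row[j] = row[j], row[i]
    p_swapped = p_x[:]
    p_swapped[i], p_swapped[j] = p_swapped[j], p_swapped[i]
    return pi_swapped, p_swapped

PI = [[0.9, 0.2],
      [0.1, 0.8]]          # column j holds Pr(w = . | x = j)
P_X = [0.25, 0.75]

PI_SWAPPED, P_X_SWAPPED = swap_labels(PI, P_X, 0, 1)
P_W = observed_distribution(PI, P_X)
P_W_SWAPPED = observed_distribution(PI_SWAPPED, P_X_SWAPPED)
print(P_W, P_W_SWAPPED)
```

Both pairs generate the same distribution of w up to floating-point rounding, so data on w alone cannot distinguish between them.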

¹ Availability of an upper bound on the error probability is a commonplace assumption in the statistics literature on robust estimation,
which makes use of mixture models. For example, Hampel (1974) and Hampel et al. (1986) state that ‘the proportion of gross errors in
data, depending on circumstances, is normally between 0.1% and 10% with several percent being the rule rather than the exception’
(p. 387 and p. 28, respectively).

Most of the related literature in econometrics (e.g., Card, 1996; Hausman et al., 1998; Abrevaya and
Hausman, 1999; Lewbel, 2000; Dustmann and van Soest, 2000; Kane et al., 1999; Ramalho, 2002) proposes
methods imposing restrictions on misclassification probabilities to achieve parametric or semiparametric
identification of the quantities of interest (i.e., features of $P(y \mid x)$, or, less often, $P(x)$).² As such, these methods
are subject to criticisms against possible misspecifications; moreover, while the assumptions employed might
hold in some data sets, there might be other data sets for which they do not hold, and in that case the methods
cannot be applied. Additionally, often these assumptions are maintained for technical reasons and do not have
an obvious interpretation.
Horowitz and Manski (1995, HM henceforth) introduced fully non-parametric methods to draw inference
on features of the distribution of a random variable x when the sampling process is corrupted or
contaminated. They adopted a mixture model and showed that if the researcher has a (non-trivial) lower
bound $1 - \lambda$ on the probability that the realization of w is drawn from the distribution of x, informative
bounds can be obtained on any parameter of the distribution $P(x)$ that respects stochastic dominance. HM
showed that these bounds are sharp, in the sense that they exhaust all the available information, given the
sampling process and the maintained assumptions. The assumptions they entertain imply the base case
assumptions on $\Pi^*$ introduced above, namely $\Pr(w = x) \ge 1 - \lambda$, and $\Pr(w = j \mid x = j) \ge 1 - \lambda$, $\forall j \in X$.³ When
only these assumptions are maintained, in terms of identification of the types of parameters considered by
HM, the method developed in this paper is equivalent to the one they proposed.
However, often different, and perhaps more, information is available to the applied researcher beyond that
maintained by HM. This information can have strong identifying power, but cannot be easily used within a
mixture model. In particular, for each additional assumption that the researcher wants to bring to bear, she
needs to derive new sharp identification regions for the parameters of interest. Closed form results are often
not easy to obtain and different (possibly computationally challenging) calculation methods for the bounds
and confidence sets may need to be devised for each different set of assumptions.
The direct misclassification approach, on the other hand, does not rely on any specific set of
assumptions, but can incorporate any prior information on the misreporting pattern into the analysis. For
any set of maintained assumptions, the method guarantees sharpness of the implied identification regions,
and these regions and their confidence sets can be estimated using a relatively simple method introduced
in Section 2.
In this paper I focus on a single misclassified variable x. The method easily extends to drawing inference on
features of the distribution of x conditional on a perfectly observed covariate, or on the joint distribution of
several misclassified variables, taking values in finite sets. Given an outcome variable of interest $y \in Y$, the
approach also extends to drawing inference on features of the distribution $P(y \mid x)$ when x is subject to
classification errors. Moreover, it can allow one to draw inference when the data are not only error-ridden, but
also incomplete, a situation very common in practice. In fact, in the presence of both misclassified and missing
data, the matrix in Eq. (1.1) simply becomes rectangular rather than square, with additional rows giving the
probabilities of having missing data, conditional on the true values of x.
The paper is organized as follows. Section 2 introduces the method, describes connectedness properties of
the identification regions, outlines how the identification regions can be estimated consistently, and proposes a
procedure to calculate confidence sets for the identification regions. Section 3 studies the identifying power of
a few specific assumptions, some of which have not been previously considered in the literature. Section 4
illustrates the estimation method with an application to data on the distribution of pension plans’
characteristics in the American population. Section 5 discusses extensions of the direct misclassification
approach. Section 6 concludes. All of the mathematical details are in Appendix A.

² Specific restrictions include the following: Bross (1954), when introducing the misclassification problem for binary data, assumed that
$\Pr(w = 1 \mid x = 0)$ and $\Pr(w = 0 \mid x = 1)$ are of the same order of magnitude. Usually with binary data it is assumed either that
$\Pr(w = 1 \mid x = 0) = \Pr(w = 0 \mid x = 1) < \tfrac{1}{2}$ (e.g., Klepper, 1988; Card, 1996), or that $\Pr(w = 1 \mid x = 0) + \Pr(w = 0 \mid x = 1) < 1$ (e.g., Bollinger, 1996;
Hausman et al., 1998). When $J > 2$, it is assumed that other monotonicity restrictions between the elements of $\Pi^*$ hold (e.g., Abrevaya and
Hausman, 1999; Dustmann and van Soest, 2000), or that specific types of misclassification do not occur (Gong et al., 1990).
³ If the researcher has an upper bound $\lambda$ on the error probability, and the sampling process is corrupted, the first assumption follows; if
the sampling process is contaminated, the second assumption follows. These results are rigorously proved in Molinari (2003).

2. The direct misclassification approach

In all that follows, to keep the focus on identification, I treat identified quantities as population parameters,
and I assume that $\Pr(w = j) > 0$ $\forall j \in X$. A method to consistently estimate the identification regions and
construct their confidence sets is provided at the end of this section.
Let $P^w$ denote the column vector $[P^w_j, j \in X] \equiv [\Pr(w = j), j \in X]$, $P^x$ the column vector $[\Pr(x = j), j \in X]$,
and $\Pi^*$ the stochastic matrix which, through Eq. (1.1), generates the misclassification of x into w. Denote the
elements of $\Pi^*$ by $\pi^*_{ij} \equiv \Pr(w = i \mid x = j)$, $i, j \in X$, and the columns of $\Pi^*$ by $\pi^*_j$. Let $\Psi_X$ denote the space of
all probability distributions on $X$ and define analogously $\Psi_{X \times W}$; let $\mathbb{R}$ denote the real line. Let $\tau : \Psi_X \to \mathbb{R}$ be
a real functional of $P(x)$, denoted $\tau[P^x]$, with analogous definitions for functionals of the joint distribution of
$(w, x)$. A particularly simple functional of $P(x)$ is $\tau[P^x] = E[1(x = j)] = \Pr(x = j)$, $j \in X$. For any given matrix
of functionals of interest $\Theta$, let $H[\Theta]$ denote its identification region.
Given this notation, I can rewrite Eq. (1.1) as
\[ P^w = \Pi^* P^x. \tag{2.1} \]
The direct misclassification approach starts from the observation that $\Pr(x = j)$, $j \in X$ enters each of the J
equations in system (1.1). Hence, each one of these equations can, potentially, imply restrictions on $\Pr(x = j)$,
and therefore on $P^x$ and $\tau[P^x]$. The extent to which this is the case crucially depends on what assumptions are
imposed on the misreporting pattern.
The approach is quite intuitive. If $\Pi^*$ were known, and of full rank, I would be able to solve the system of
linear equations in (2.1) and uniquely identify $P^x$, and therefore $\tau[P^x]$. In practice, the misclassification
probabilities $\pi^*_{ij}$, $i, j \in X$ are known only to belong to a set $H[\Pi^*]$, defined below. This set accounts both for
the restrictions coming from probability theory, as well as for the restrictions on the misreporting pattern
coming from validation studies, social and cognitive psychology, economic theory, etc. Denote the elements of
$H[\Pi^*]$ by $\Pi \equiv \{\pi_{ij}\}_{i,j \in X}$, and the columns of this matrix by $\pi_j$, $j \in X$. When $H[\Pi^*]$ is not a singleton, $P^x$ is not
identified and $\tau[P^x]$ need not be identified, but only known, respectively, to lie in the identification regions
$H[P^x]$ and $H\{\tau[P^x]\}$.
The identification region $H[P^x]$ is defined as the set of column vectors $p^x = [p^x_k, k \in X]$, such that, given
$\Pi \in H[\Pi^*]$, $p^x$ solves system (2.1):
\[ H[P^x] = \{p^x : P^w = \Pi p^x, \ \Pi \in H[\Pi^*]\}. \tag{2.2} \]
In the next subsection, $H[\Pi^*]$ is formally defined and characterized in a way such that $\forall \Pi \in H[\Pi^*]$, $p^x_k \ge 0$,
$\forall k \in X$, and $\sum_{k=1}^{J} p^x_k = 1$.
Throughout this paper, the notation $p^x$ is reserved for elements of $H[P^x]$ and the notation $p^x_k$ for the kth
component of a vector $p^x$. Hence, $p^x_k$ and $p^x$ represent, respectively, feasible values of $\Pr(x = k)$, $k \in X$, and
$[\Pr(x = j), j \in X]$, given $\Pi \in H[\Pi^*]$ and Eq. (2.1). By construction
\[ p^x \equiv p^x(\Pi, P^w), \qquad p^x_k = p^x_k(\Pi, P^w), \quad k \in X. \]
For ease of notation, I omit the arguments of $p^x_k$ and $p^x$. The identification region $H\{\tau[P^x]\}$ is then defined as
\[ H\{\tau[P^x]\} = \{\tau[p^x] : p^x \in H[P^x]\}. \tag{2.3} \]
The set $H[\Pi^*]$ is of central importance for the identification of $P^x$ and $\tau[P^x]$, as the identification regions of
these functionals are defined on the basis of $H[\Pi^*]$. I denote by $H^P[\Pi^*]$ the set of matrices that satisfy the
probabilistic constraints and by $H^E[\Pi^*]$ the set of matrices satisfying the constraints coming from validation
studies and theories developed in the social sciences. Hence,
\[ H[\Pi^*] = H^P[\Pi^*] \cap H^E[\Pi^*]. \]
The geometry of $H[\Pi^*]$ and its connectedness properties are of particular interest. This is because the
continuous image of a connected set is connected. Hence, if $H[\Pi^*]$ is connected and $p^x$ is a continuous
function of $\Pi$, $H[P^x]$ is connected as well, and so is $H\{\tau[P^x]\}$ if $\tau(\cdot)$ is a continuous functional. Conversely, if
$H[\Pi^*]$ is not connected or if the functionals are not continuous, $H[P^x]$ and $H\{\tau[P^x]\}$ need not be
connected. This has implications for the estimation of the identification regions. Consider for example the case
that interest centers on a real valued functional $\tau[P^x]$. When $H\{\tau[P^x]\}$ is a connected set, it is given by the entire
interval between its smallest and its largest points. Hence by estimating these two points one obtains an
estimate of the entire identification region. When $H\{\tau[P^x]\}$ is disconnected, parts of the interval between the
smallest and the largest points are not feasible and therefore are not elements of the identification region.
Section 2.2 introduces a method to estimate $H\{\tau[P^x]\}$ when this is the case.
A relevant example of a case in which $p^x$ is a continuous function of $\Pi$ is obtained when each matrix
$\Pi \in H[\Pi^*]$ is of full rank. In this case, for each $\Pi \in H[\Pi^*]$, one can solve the linear system in (2.1), obtaining
$p^x = \Pi^{-1} P^w$. It is a well known result in matrix algebra that the inverse of a non-singular matrix is continuous
in the elements of the matrix (see, e.g., Campbell and Meyer, 1991, Chapter 10). A very simple condition
ensuring that each matrix $\Pi \in H[\Pi^*]$ is of full rank is that the probability of correct report is greater
than $\tfrac{1}{2}$ for each of the values that x can take.⁴ Validation studies suggest that this requirement is often satisfied
in practice.⁵
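For the binary case the inversion can be written out by hand. With $J = 2$ the matrix has columns $(\pi_{11}, 1 - \pi_{11})$ and $(1 - \pi_{22}, \pi_{22})$, its determinant is $\pi_{11} + \pi_{22} - 1$, and the first component of the solution is $(P^w_1 - (1 - \pi_{22}))/(\pi_{11} + \pi_{22} - 1)$. The following minimal Python sketch implements this closed form; the function names and the numerical inputs are assumptions made for illustration only.

```python
# Binary-case (J = 2) solution of P^w = Pi p^x; inputs below are made-up numbers.

def has_dominant_diagonal(pi11, pi22):
    """Sufficient full-rank condition from the text: pi_jj > 1/2 for j = 1, 2."""
    return pi11 > 0.5 and pi22 > 0.5

def solve_binary(pi11, pi22, pw1):
    """Return the implied Pr(x = 1), or None when the matrix is singular.

    The matrix is [[pi11, 1 - pi22], [1 - pi11, pi22]], so its determinant
    equals pi11 + pi22 - 1.
    """
    det = pi11 + pi22 - 1.0
    if abs(det) < 1e-12:
        return None
    return (pw1 - (1.0 - pi22)) / det

PX1 = solve_binary(0.9, 0.8, 0.3)                 # full-rank case
PX1_NEARBY = solve_binary(0.9 + 1e-6, 0.8, 0.3)   # small perturbation of pi11
SINGULAR = solve_binary(0.5, 0.5, 0.3)            # pi11 + pi22 = 1: no unique solution
print(has_dominant_diagonal(0.9, 0.8), PX1, PX1_NEARBY, SINGULAR)
```

A small change in the assumed probability of correct report moves the solution only slightly when the determinant is bounded away from zero, which is the continuity property the argument above relies on.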

2.1. The set $H[\Pi^*]$ and its geometry

I start by characterizing the set $H^P[\Pi^*]$ and its geometry. Probability theory requires that $\sum_{i=1}^{J} \pi_{ij} = 1$,
$\forall j \in X$, that $\pi_{ij} \ge 0$, $\forall i, j \in X$, and that, given $P^w$, Eq. (2.1), and $\Pi$, the implied $p^x$ gives a valid probability
measure. Denote by $H^P[\Pi^*]$ the set of $\Pi$’s that satisfy these probabilistic requirements, so that, throughout
the entire paper,
\[
H^P[\Pi^*] \equiv \left\{ \Pi :
\begin{array}{l}
\pi_{ij} \ge 0, \ \forall i, j \in X; \quad \sum_{i=1}^{J} \pi_{ij} = 1, \ \forall j \in X; \\
p^x_h \ge 0, \ \forall h \in X; \quad \sum_{h=1}^{J} p^x_h = 1
\end{array}
\right\}. \tag{2.4}
\]

Notice that the set $H^P[\Pi^*]$ can be defined alternatively using the notions of $(J - 1)$-dimensional simplex and
convex hull of a set of vectors. I use the following definitions:
Definition 1. The $(J - 1)$-dimensional simplex is the set $\Delta^{J-1} \equiv \{d \in \mathbb{R}^J_+ : d_1 + d_2 + \cdots + d_J = 1\}$.
Definition 2. The convex hull of a finite subset $\{n_1, n_2, \ldots, n_J\}$ of $\mathbb{R}^J$, denoted $\mathrm{conv}\{n_1, n_2, \ldots, n_J\}$, consists of all
the vectors of the form $a_1 n_1 + a_2 n_2 + \cdots + a_J n_J$ with $a_i \ge 0$ $\forall i = 1, \ldots, J$ and $\sum_{i=1}^{J} a_i = 1$. (Rockafellar, 1970,
Corollary 2.3.1.)
By definition, $P^w \in \Delta^{J-1}$. The set $H^P[\Pi^*]$ can be rewritten as
\[ H^P[\Pi^*] \equiv \{\Pi : \pi_j \in \Delta^{J-1} \text{ and } p^x_j \ge 0 \ \forall j \in X, \text{ and } P^w \in \mathrm{conv}\{\pi_1, \pi_2, \ldots, \pi_J\}\}. \tag{2.5} \]
In words, a matrix $\Pi$ is an element of $H^P[\Pi^*]$ if its columns are probability mass functions, the implied $p^x$ is a
probability mass function, and the vector $P^w$ can be expressed as a convex combination of the columns of $\Pi$.
This set of matrices also contains matrices that are not of full rank. Notably, it contains the matrix with each
column identical to $P^w$, denoted $\tilde{\Pi}$. This matrix plays an important role in Proposition 1 below.
To describe the geometry of $H^P[\Pi^*]$ I need to introduce another definition:
Definition 3. A subset $G$ of $\mathbb{R}^n$ is star convex with respect to $c_0 \in G$ if for each $c \in G$ the line segment joining $c$
and $c_0$ lies in $G$ (Munkres, 1991, p. 330).
Star convexity implies path-connectedness, which in turn implies connectedness. Given a set of matrices
$\mathcal{P} \subset \mathbb{R}^{J \times J}$, define the line segment between two matrices $\Pi_1, \Pi_2 \in \mathcal{P}$ as
\[ \Pi_\alpha = \alpha \Pi_1 + (1 - \alpha) \Pi_2, \quad \alpha \in [0, 1]. \]

⁴ If $\pi_{jj} > \tfrac{1}{2}$, $\forall j \in X$, $\forall \Pi \in H[\Pi^*]$, $\Pi^T$ is strictly diagonally dominant, and hence $\Pi$ is non-singular. An $n \times n$ matrix $A = \{a_{ij}\}$ is said to be
strictly diagonally dominant if, for $i = 1, 2, \ldots, n$, $|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ij}|$. A proof of the fact that if $A$ is strictly diagonally dominant, then $A$ is
non-singular, can be found in Horn and Johnson (1999, Theorem 6.1.10).
⁵ Among others, this is the case in the context of workers’ union status (see, e.g., Card, 1996), transfer program recipiency (see, e.g.,
Moore et al., 1996), employment status (see, e.g., Poterba and Summers, 1995), and 1- and 3-digit level classification of industry and
occupation (see, e.g., Mellow and Sider, 1983).

Then the set $\mathcal{P}$ is convex if given any two matrices $\Pi_1, \Pi_2 \in \mathcal{P}$, $\Pi_\alpha \in \mathcal{P}$ for all $\alpha \in (0, 1)$. Connectedness of the
set $H^P[\Pi^*]$ is established in the following proposition:

Proposition 1. The set $H^P[\Pi^*]$ is star convex with respect to $\tilde{\Pi}$. However, it is not star convex with respect to any
other of its elements.

The result in Proposition 1 implies that the set $H^P[\Pi^*]$ is not convex, because a convex set is star convex
with respect to each of its elements. The set $H^P[\Pi^*]$ is illustrated in Example 1 and in the first panel of Fig. 1.

Example 1. Suppose that x and w are binary, i.e., that $J = 2$, and let $P^w_1 = 0.3$. Then the matrix $\Pi$ is
determined by its two diagonal elements, $\pi_{11}$ and $\pi_{22}$, and
\[ p^x_1 \in [0, 1] : P^w_1 = \pi_{11} p^x_1 + (1 - \pi_{22})(1 - p^x_1). \]
It is easy to verify that
\[ H^P[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1], \ \pi_{22} \in [0, 1 - P^w_1]) \cup (\pi_{11} \in [P^w_1, 1], \ \pi_{22} \in [1 - P^w_1, 1])\}. \]
This set is plotted in the first panel of Fig. 1, and its star convexity is apparent.
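The set in Example 1 can also be explored numerically. The sketch below is an illustration only: it tests membership through the convex-hull condition (for $J = 2$, $P^w_1$ must lie between $\pi_{11}$ and $1 - \pi_{22}$), then samples line segments to check star convexity with respect to the point $(\pi_{11}, \pi_{22}) = (P^w_1, 1 - P^w_1)$ and to exhibit a failure of convexity. The function names, the grid, and the tolerance are assumptions of the example; a grid check is suggestive and is not a proof.

```python
# Numerical look at H^P for J = 2 and Pr(w = 1) = 0.3, as in Example 1.

PW1 = 0.3
CENTER = (PW1, 1.0 - PW1)  # matrix whose two columns both equal P^w

def in_hp(pi11, pi22, pw1=PW1, tol=1e-12):
    """True if P^w_1 is a convex combination of pi11 and 1 - pi22."""
    if not (-tol <= pi11 <= 1.0 + tol and -tol <= pi22 <= 1.0 + tol):
        return False
    lo, hi = sorted((pi11, 1.0 - pi22))
    return lo - tol <= pw1 <= hi + tol

def segment_stays_inside(point, anchor, steps=50):
    """Check sampled points on the segment joining `point` and `anchor`."""
    for k in range(steps + 1):
        a = k / steps
        pi11 = a * point[0] + (1.0 - a) * anchor[0]
        pi22 = a * point[1] + (1.0 - a) * anchor[1]
        if not in_hp(pi11, pi22):
            return False
    return True

GRID = [k / 20 for k in range(21)]
MEMBERS = [(a, b) for a in GRID for b in GRID if in_hp(a, b)]

# Every sampled member can be joined to CENTER without leaving the set ...
STAR_CONVEX_AT_CENTER = all(segment_stays_inside(m, CENTER) for m in MEMBERS)
# ... but the segment between the members (0, 0) and (1, 1) leaves it.
CONVEX_COUNTEREXAMPLE = not segment_stays_inside((0.0, 0.0), (1.0, 1.0))
print(STAR_CONVEX_AT_CENTER, CONVEX_COUNTEREXAMPLE)
```

The midpoint of the second segment is $(0.5, 0.5)$, for which $\pi_{11}$ and $1 - \pi_{22}$ both equal 0.5, so no convex combination of them can reproduce $P^w_1 = 0.3$.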

[Figure 1: six panels, each plotting $\pi_{22}$ (vertical axis, 0 to 1) against $\pi_{11}$ (horizontal axis, 0 to 1).
Panel 1: $H^P[\Pi^*]$. Panel 2: $H[\Pi^*]$ assuming $\pi_{11} = \pi_{22}$. Panel 3: $H[\Pi^*]$ assuming $\pi_{11} \ge \pi_{22}$.
Panel 4: $H[\Pi^*]$ assuming $\pi_{11} \le \pi_{22}$. Panel 5: $H[\Pi^*]$ assuming $\pi_{jj} \ge 0.2$ $\forall j \in X$. Panel 6: $H[\Pi^*]$ assuming $\pi_{jj} \ge 0.8$ $\forall j \in X$.]

Fig. 1. Geometry of the set $H^P[\Pi^*]$, and of the set $H[\Pi^*]$ under different assumptions, when $J = 2$ and $\Pr(w = 1) = 0.3$.

The problem which occurs in this example relates to the ‘permutation-type non-identifiability’ considered by
Swartz et al. (2004). For a given $\Pi^1 \in H^P[\Pi^*]$, one can obtain another $\Pi^2 \in H^P[\Pi^*]$ by letting $\pi^2_{11} = 1 - \pi^1_{22}$
and $\pi^2_{22} = 1 - \pi^1_{11}$. Letting $\tilde{p}^x_1 = (1 - p^x_1)$ yields $\pi^1_{11} p^x_1 + (1 - \pi^1_{22})(1 - p^x_1) = \pi^2_{11} \tilde{p}^x_1 + (1 - \pi^2_{22})(1 - \tilde{p}^x_1)$. This
explains the symmetry of $H^P[\Pi^*]$ around the line $\pi_{22} = 1 - \pi_{11}$.
Denote by $H^E[\Pi^*]$ the set of matrices that satisfy the restrictions on the misreporting pattern coming from
prior information. Then if, for example, validation studies suggest a uniform lower bound on the probability
of correct report for each $j \in X$,
\[ H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge 1 - \lambda \ \forall j \in X\}. \]
If social psychology suggests that individuals, when answering about the frequency with which they engage in
a certain socially desirable activity, either provide correct reports or overreport,
\[ H^E[\Pi^*] = \{\Pi : \pi_{ij} = 0 \ \forall i < j \in X\}. \]
Of course, plenty of other restrictions are possible.
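One way to think about such restrictions in code is as predicates on a candidate matrix, with the restricted set given by the matrices that pass every predicate. The sketch below is a hypothetical illustration for the two restrictions just listed; the function names and the example matrix are assumptions, and the indices are 0-based so that `pi[i][j]` stands for $\pi_{(i+1)(j+1)}$.

```python
# Restrictions on the misreporting pattern written as predicates (illustrative only).

def satisfies_lower_bound(pi, lam):
    """pi_jj >= 1 - lambda for every j (uniform lower bound on correct report)."""
    return all(pi[j][j] >= 1.0 - lam for j in range(len(pi)))

def satisfies_no_underreporting(pi, tol=0.0):
    """pi_ij = 0 for all i < j: the reported value never falls below the true one."""
    J = len(pi)
    return all(abs(pi[i][j]) <= tol for j in range(J) for i in range(j))

def in_he(pi, restrictions):
    """A matrix is in the restricted set if it passes every listed restriction."""
    return all(check(pi) for check in restrictions)

# Made-up matrix: column j holds Pr(w = . | x = j); only correct or higher reports occur.
PI = [
    [0.80, 0.00, 0.00],
    [0.15, 0.90, 0.00],
    [0.05, 0.10, 1.00],
]

LOOSE = [lambda m: satisfies_lower_bound(m, 0.25), satisfies_no_underreporting]
TIGHT = [lambda m: satisfies_lower_bound(m, 0.10), satisfies_no_underreporting]
print(in_he(PI, LOOSE), in_he(PI, TIGHT))
```

Adding a further restriction then amounts to appending another predicate, which mirrors the way additional prior information shrinks the restricted set.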
Because $H^P[\Pi^*]$ is connected, but not convex, when I take its intersection with the set $H^E[\Pi^*]$ I obtain a set
$H[\Pi^*]$ that might be disconnected, connected, or convex, depending on how $H^E[\Pi^*]$ slices $H^P[\Pi^*]$. Below I
provide three examples of sets $H^E[\Pi^*]$, which are further analyzed in Section 3. Each of these sets is trivially
convex, as it is defined by linear restrictions on $\Pi$, but its intersection with $H^P[\Pi^*]$ generates sets $H[\Pi^*]$ that can be disconnected,
connected, or convex. These examples are illustrated in the six panels of Fig. 1.
Example 2 (Constant probability of correct report). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} = \pi \ \forall j \in X\}$. Suppose that x and w
are binary, i.e. that $J = 2$. Then
\[
H[\Pi^*] =
\begin{cases}
\{\pi : \pi \in [0, P^w_1] \cup [1 - P^w_1, 1]\} & \text{if } P^w_1 < \tfrac{1}{2}, \\
\{\pi : \pi \in [0, 1 - P^w_1] \cup [P^w_1, 1]\} & \text{if } P^w_1 > \tfrac{1}{2}, \\
\{\pi : \pi \in [0, 1]\} & \text{if } P^w_1 = \tfrac{1}{2}.
\end{cases}
\]
Hence, if $P^w_1 \ne \tfrac{1}{2}$, $H[\Pi^*]$ is disconnected. This set is plotted in the second panel of Fig. 1, and its
disconnectedness is apparent. The set $H[\Pi^*]$ remains disconnected, if $P^w_1 \ne \tfrac{1}{2}$, even if the assumption of
constant probability of correct report is weakened to requiring that $\pi_{22} = \pi_{11} + \varepsilon$, as long as $|\varepsilon| < |1 - 2P^w_1|$
(and $\varepsilon$ is such that $\pi_{22} \in [0, 1]$).
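The consequence of this disconnectedness for the distribution of interest can be traced numerically. The sketch below takes $P^w_1 = 0.3$, walks over a grid of feasible values of the common probability of correct report from Example 2, and computes the implied $\Pr(x = 1)$ from the binary-case solution of Eq. (2.1), which with $\pi_{11} = \pi_{22} = \pi$ reads $(P^w_1 - (1 - \pi))/(2\pi - 1)$. The grid size and function names are assumptions of the example.

```python
# Implied values of Pr(x = 1) under a constant probability of correct report (J = 2).

PW1 = 0.3

def feasible_constant_pi(pw1, n=1001):
    """Grid points in [0, min(pw1, 1 - pw1)] or [max(pw1, 1 - pw1), 1], as in Example 2."""
    lo, hi = min(pw1, 1.0 - pw1), max(pw1, 1.0 - pw1)
    grid = [k / (n - 1) for k in range(n)]
    return [p for p in grid if p <= lo or p >= hi]

def implied_px1(pi, pw1):
    """Solve pw1 = pi * p + (1 - pi) * (1 - p) for p; requires pi != 1/2."""
    return (pw1 - (1.0 - pi)) / (2.0 * pi - 1.0)

FEASIBLE_PI = feasible_constant_pi(PW1)
IMPLIED = sorted(
    implied_px1(p, PW1) for p in FEASIBLE_PI if abs(2.0 * p - 1.0) > 1e-12
)
print(min(IMPLIED), max(IMPLIED))
```

For these inputs the implied values fall, up to rounding, into two separate intervals, one from 0 to 0.3 (generated by values of $\pi$ of at least 0.7) and one from 0.7 to 1 (generated by values of $\pi$ of at most 0.3), so estimating only the smallest and largest feasible points would overstate the region.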
Example 3 (Monotonicity in correct reporting). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge \pi_{(j+1)(j+1)} \ \forall j \in X\}$. Suppose that x and
w are binary, i.e. that $J = 2$, so that the monotonicity assumption simplifies to $\pi_{11} \ge \pi_{22}$. Then if $P^w_1 < \tfrac{1}{2}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1], \ \pi_{22} \in [0, \pi_{11}]) \cup (\pi_{11} \in [1 - P^w_1, 1], \ \pi_{22} \in [1 - P^w_1, \pi_{11}])\}. \]
If $P^w_1 \ge \tfrac{1}{2}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1], \ \pi_{22} \in [0, \min(1 - P^w_1, \pi_{11})]) \cup (\pi_{11} \in [P^w_1, 1], \ \pi_{22} \in [1 - P^w_1, \pi_{11}])\}. \]
Hence, if $P^w_1 < \tfrac{1}{2}$, $H[\Pi^*]$ is disconnected, but otherwise it is connected. This set is plotted in the third panel of
Fig. 1. Its disconnectedness is apparent given the choice of $P^w_1 = 0.3$. To see why the set can be connected, the
fourth panel of Fig. 1 plots the set $H[\Pi^*]$ obtained when the monotonicity assumption is $\pi_{11} \le \pi_{22}$ (in the
binary case, reversing the sign of the monotonicity assumption has an effect similar to maintaining $\pi_{11} \ge \pi_{22}$
but having $P^w_1 > \tfrac{1}{2}$).
Example 4 (Lower bound on the probability of correct report). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge 1 - \lambda \ \forall j \in X\}$. Suppose
that x and w are binary, i.e. that $J = 2$. Then if $1 > \lambda > \max\{P^w_1, 1 - P^w_1\}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [1 - \lambda, P^w_1], \ \pi_{22} \in [1 - \lambda, 1 - P^w_1]) \cup (\pi_{11} \in [P^w_1, 1], \ \pi_{22} \in [1 - P^w_1, 1])\}. \]
This set is connected through the point $\pi_{11} = P^w_1$, $\pi_{22} = 1 - P^w_1$, and is plotted in the fifth panel of Fig. 1 for
$P^w_1 = 0.3$ and $\lambda = 0.8$.
If $\max\{P^w_1, 1 - P^w_1\} > \lambda$, then
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : \pi_{11} \in [\max\{1 - \lambda, P^w_1\}, 1], \ \pi_{22} \in [\max\{1 - \lambda, 1 - P^w_1\}, 1]\}, \]
ARTICLE IN PRESS
F. Molinari / Journal of Econometrics 144 (2008) 81–117 89

and H½P%  is convex. This set is plotted in the sixth panel of Fig. 1. Its convexity is apparent given the choice
of Pw1 ¼ 0:3 and l ¼ 0:2.

2.2. Consistent estimation of the identification regions

The set $H[\Pi^*]$ can be disconnected, connected or convex. These properties are reflected in the shape of the identification regions of the functionals of interest, namely $H[P^x]$, $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$, for some vector of dimension $k$ of functionals $\Theta : \Psi_X \to \mathbb{R}^k$. Hence, it is important to have a method to calculate and consistently estimate the entire identification regions that is able to capture their possible disconnectedness and non-convexities. While the general identification approach proposed in Section 2.1 is valid for any set of restrictions on $\Pi^*$, here I focus on restrictions that satisfy certain regularity conditions, described in Assumptions C0 and C1 below, so that a simple estimator can be utilized.
Manski and Tamer (2002) introduced methods to estimate the entire identification region of a vector of
parameters of interest when the identification region cannot be expressed in closed form solution, but
is given by all values of the vector that minimize a specified objective function. Here I introduce a
related nonlinear programming estimator, using the same insight as in the linear programming estimator
proposed by Honore and Tamer (2006) and further discussed by Honore and Lleras-Muney (2006).
Observe that if I can calculate $H[P^x]$, I can then calculate $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$ for any functionals $\tau(\cdot)$ and $\Theta(\cdot)$ (for example, the mean of $x$, its variance, the Gini coefficient, etc.); hence, I focus on the calculation of $H[P^x]$.6
The set $H[P^x]$ consists of the vectors $p^x \in \Delta^{J-1}$ for which the equations

$$
P^w = \Pi p^x; \quad \pi_j \in \Delta^{J-1} \ \forall j; \quad \Pi \in H^E[\Pi^*], \tag{2.6}
$$

have a solution for $\Pi$. In general, $H^E[\Pi^*]$ can be written as

$$
H^E[\Pi^*] = \left\{ \Pi :
\begin{array}{l}
f_j(\Pi) \ge \mu_j,\ j = 1, \dots, q_1; \quad g_i(\Pi) \le \mu_{q_1 + i},\ i = 1, \dots, q_2; \\
h_k(\Pi) = \mu_{q_1 + q_2 + k},\ k = 1, \dots, q_3
\end{array}
\right\},
$$

where $q_1 + q_2 + q_3 = q$ is the number of constraints imposed, $\mu_j \in [0, M]$, $j = 1, \dots, q$, is a non-negative parameter bounded by some constant $M$, and $f_j : \mathbb{R}^{J^2} \to \mathbb{R}$, $g_i : \mathbb{R}^{J^2} \to \mathbb{R}$, and $h_k : \mathbb{R}^{J^2} \to \mathbb{R}$ are functions taking as arguments the elements of the matrix $\Pi$.
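To give a concrete example, if $X = \{1, 2, 3\}$ and

$$
H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge 0.8 \ \forall j \in X;\ 0.125 \le \pi_{12}\pi_{13} \le 0.33;\ \pi_{11} = \pi_{22}\},
$$

then $q_1 = 4$, $q_2 = 1$, $q_3 = 1$, $q = 6$, and

$$
\begin{aligned}
f_j(\Pi) &= \pi_{jj}, & \mu_j &= 0.8, \quad j = 1, 2, 3, \\
f_4(\Pi) &= \pi_{12}\pi_{13}, & \mu_4 &= 0.125, \\
g_1(\Pi) &= \pi_{12}\pi_{13}, & \mu_5 &= 0.33, \\
h_1(\Pi) &= \pi_{11} - \pi_{22}, & \mu_6 &= 0.
\end{aligned}
$$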


The equations in (2.6) have the same structure as the constraints in a nonlinear programming problem. Hence one can check whether a particular vector $\nu \in \Delta^{J-1}$ belongs to $H[P^x]$ by checking if a nonlinear programming problem that has constraints given by (2.6) has a solution with a specific value for the objective function. Consider the nonlinear programming problem

$$
Q(\nu) = \max_{\{\pi_{ij}\}, \{v_k\}} \ -\sum_k v_k \tag{2.7}
$$

subject to

$$
\begin{cases}
v_k \ge 0 \quad \forall k; \\
\pi_{ij} \ge 0, \quad i, j = 1, \dots, J; \\
1 - \sum_{i=1}^{J} \pi_{ij} = v_j, \quad j = 1, \dots, J; \\
P^w - \Pi\nu = [v_{J+1} \ \dots \ v_{2J}]^T; \\
f_l(\Pi) - \mu_l + v_{2J + l} \ge 0, \quad l = 1, \dots, q_1; \\
\mu_{q_1 + m} - g_m(\Pi) + v_{2J + q_1 + m} \ge 0, \quad m = 1, \dots, q_2; \\
h_s(\Pi) - \mu_{q_1 + q_2 + s} + v_{2J + q_1 + q_2 + s} = 0, \quad s = 1, \dots, q_3.
\end{cases} \tag{2.8}
$$

6
If the researcher is interested in a scalar valued functional of $P^x$, say, for example, $\tau[P^x] = \Pr(x = j)$, $j \in X$, and the matrix $\Pi$ is of full rank for any $\Pi \in H[\Pi^*]$, the extreme points of the identification region of this functional can be calculated and consistently estimated by solving nonlinear optimization problems subject to linear and nonlinear constraints. In particular, let $p^x = \Pi^{-1} P^w$, $\Pi \in H[\Pi^*]$. Then the smallest and the largest points in $H[\Pr(x = j)]$, $j \in X$ can be calculated as $p^{x,L}_j = \inf_{\Pi \in H[\Pi^*]} p^x_j(\Pi, P^w)$, $p^{x,U}_j = \sup_{\Pi \in H[\Pi^*]} p^x_j(\Pi, P^w)$. These extreme points are continuous functions of $P^w$, and therefore one can consistently estimate them by replacing $P^w$ with $P^w_N$.

I consider restrictions determining the set $H^E[\Pi^*]$ that satisfy the following conditions:
Assumption C0. For each $j = 1, \dots, q_1$, $i = 1, \dots, q_2$, and $k = 1, \dots, q_3$: $f_j(\Pi)|_{\Pi = 0} = g_i(\Pi)|_{\Pi = 0} = h_k(\Pi)|_{\Pi = 0} = 0$, and $f_j(\Pi)$, $g_i(\Pi)$, and $h_k(\Pi)$ are continuous on $[0, 1]^{J^2}$.
Let $\mathcal{P} \times \mathcal{V}$ denote the constraint set defined by (2.8). Assumption C0 is imposed to establish that the objective function in (2.7) achieves a maximum on (2.8). Observe that the set $\mathcal{P} \times \mathcal{V}$ is closed, because the constraints defining it are continuous, and non-empty, because it contains the vector $[\pi^0_1, \dots, \pi^0_J, v^0]$, with $\pi^0_{ij} = 0$ for $i, j = 1, \dots, J$; $v^0_j = 1$ for $j = 1, \dots, J$; $v^0_{J+j} = P^w_j$ for $j = 1, \dots, J$; $v^0_{2J+l} = \mu_l$, $l = 1, \dots, q_1$; $v^0_{2J+q_1+m} = 0$, $m = 1, \dots, q_2$; and $v^0_{2J+q_1+q_2+s} = \mu_{q_1+q_2+s}$, $s = 1, \dots, q_3$. Hence maximization of (2.7) on $\mathcal{P} \times \mathcal{V}$ is equivalent to maximization of (2.7) on

$$
\tilde{\mathcal{P}} \times \tilde{\mathcal{V}} = \left\{ [\pi_1, \dots, \pi_J, v] \in \mathcal{P} \times \mathcal{V} : -\sum_k v_k \ge -\sum_k v^0_k \right\},
$$

which is a closed and bounded set. The objective function in (2.7) is continuous, and therefore the result follows by the Bolzano–Weierstrass theorem.7 The optimal function has value zero if and only if all $v_k = 0$, that is, if a solution exists to (2.6). Hence, for given $\nu \in \Delta^{J-1}$ one can check whether $\nu \in H[P^x]$ by solving the above nonlinear programming problem and checking whether $v_k = 0$ for all $k$.
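The membership check can be illustrated with a brute-force stand-in for the optimizer. The sketch below is not the nonlinear programming routine described above: it replaces the solver with a grid search over $\Pi$, assumes $J = 2$ and the restriction $\pi_{jj} \ge 1 - \lambda$ for both $j$, and reports the smallest violation of $P^w = \Pi\nu$, which plays the role of the slack variables. Function names, the grid size, and the tolerance are hypothetical choices.

```python
def min_violation(nu1, p_w1, lam, steps=200):
    """Smallest |P^w_1 - (Pi nu)_1| over a grid of 2x2 column-stochastic
    matrices with pi_11 >= 1 - lam and pi_22 >= 1 - lam.
    With J = 2 the second equation of the system is redundant."""
    best = float("inf")
    for a in range(steps + 1):
        pi11 = (1.0 - lam) + lam * a / steps
        for b in range(steps + 1):
            pi22 = (1.0 - lam) + lam * b / steps
            fitted = pi11 * nu1 + (1.0 - pi22) * (1.0 - nu1)
            best = min(best, abs(p_w1 - fitted))
    return best


def in_region(nu1, p_w1, lam, tol=1e-3, steps=200):
    """Treat nu = (nu1, 1 - nu1) as a member of H[P^x] when the violation
    vanishes up to the grid tolerance (so the region is slightly enlarged)."""
    return min_violation(nu1, p_w1, lam, steps) <= tol
```

Scanning `nu1` over a grid of $[0, 1]$ then traces out an approximation of the identification region; a proper solver would be needed for larger $J$ or nonlinear restrictions.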
The above method for calculating identification regions has a natural sample analog counterpart, and under some regularity conditions about the functions defining the set $H^E[\Pi^*]$ and the sampling process, this estimator is consistent. In particular, I maintain the following:
Assumption C1. For each $j = 1, \dots, q_1$, $i = 1, \dots, q_2$, and $k = 1, \dots, q_3$, either (i) $f_j(\Pi)$, $g_i(\Pi)$ and $h_k(\Pi)$ are homogeneous functions of degree (respectively) $r_j, r_i, r_k \ge 1$, or (ii) $f_j(\Pi)$, $g_i(\Pi)$ and $h_k(\Pi)$ are multivariate polynomials in $\Pi$ with non-negative coefficients, or (iii) $f_j(\Pi)$ are convex functions, $g_i(\Pi)$ are concave functions, and either (i) or (ii) holds for $h_k(\Pi)$. Additionally, $g_i(\Pi) \ge 0$ and $h_k(\Pi) \ge 0$ on $[0, 1]^{J^2}$.
Assumption C2. (a) Let a random sample $\{w_i\}$, $i = 1, \dots, N$ be available, and let $P^w_{i,N} = \frac{1}{N} \sum_{j=1}^{N} 1(w_j = i)$, $i = 1, \dots, J$. (b) If the set $H^E[\Pi^*]$ contains constraints involving any parameters to be estimated, let these parameters enter the constraints additively. Without loss of generality, to simplify the notation, let the parameters to be estimated be $\mu_l$, $l = 1, \dots, \bar{q} \le q$. (c) Suppose that a random sample of size $n = N/\kappa$ for some constant $\kappa$ such that $0 < \kappa < \infty$ is available to estimate $\mu_l$, $l = 1, \dots, \bar{q}$, so that $\sqrt{N}(\mu_{l,n} - \mu_l) \xrightarrow{d} N(0, \kappa V_{\mu_l})$. (d) Let $\mu_l$ satisfy $\mu_l > 0$, $l = 1, \dots, \bar{q} \le q$.
In Section 3 I consider several examples of restrictions defining the set $H^E[\Pi^*]$ that satisfy Assumptions C0–C1. For example, suppose that a validation study provides a lower bound on the probability of correct report for each type $j = 1, \dots, J$, so that $H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge \mu_j,\ j \in X\}$. Then Assumptions C0–C1 are clearly satisfied. Moreover, if a validation (random) sample $\{\tilde{w}_i, \tilde{x}_i\}$, $i = 1, \dots, n$ is available (with $n = N/\kappa$, $0 < \kappa < \infty$), which does not point identify $P^x$ and $\Pi^*$, but allows one to conclude that

$$
\pi_{jj} \ge \mu_{j,n} = \frac{\sum_{i=1}^{n} 1(\tilde{w}_i = j,\ \tilde{x}_i = j)}{\sum_{i=1}^{n} 1(\tilde{x}_i = j)},
$$

then Assumption C2 is satisfied. The empirical analysis conducted in Section 4 shows that there are important cases in which a validation sample allows for root-N consistent estimation of $\mu_{j,n}$, but does not allow for point identification of $P^x$ or $\Pi^*$.

7
Alternative assumptions replacing Assumption C0 and yielding a non-empty closed and bounded constraint set for every $\nu \in \Delta^{J-1}$ would also imply this result.
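As a small illustration of the validation-sample bound, the sketch below computes the sample analog $\mu_{j,n}$, that is, the share of validation records with true value $j$ that were also reported as $j$. The function name and the `(w, x)` pair layout are assumptions made for the example.

```python
def correct_report_rate(pairs, j):
    """mu_{j,n}: among validation records (w, x) with x == j,
    the fraction that also have w == j."""
    n_true = sum(1 for _, x in pairs if x == j)
    if n_true == 0:
        raise ValueError("no validation records with true value j")
    n_both = sum(1 for w, x in pairs if x == j and w == j)
    return n_both / n_true


# Hypothetical validation sample of (reported, true) pairs.
sample = [(1, 1), (1, 1), (2, 1), (2, 2), (1, 2), (2, 2), (2, 2)]
mu_1 = correct_report_rate(sample, 1)
mu_2 = correct_report_rate(sample, 2)
```

The resulting values would then enter the constraints $\pi_{jj} \ge \mu_{j,n}$ additively, as Assumption C2(b) requires.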
Let $H^E_N[\Pi^*]$ denote the set $H^E[\Pi^*]$ obtained when $\mu_l$ is replaced by $\mu_{l,n}$, $l = 1, \dots, q$, with the convention that $\mu_{l,n} = \mu_l$ for $l = \bar{q} + 1, \dots, q$. Define an objective function $Q_N(\nu)$ as in (2.7)–(2.8), with $\mu_{l,n}$, $l = 1, \dots, q$, replacing $\mu_l$ and $P^w_N$ replacing $P^w$. Then the following consistency result holds:
Proposition 2. Let Assumptions C0, C1 and C2 hold. Define the set

$$
H_N[P^x] = \left\{ p^x_N \in \Delta^{J-1} : Q_N(p^x_N) \ge \sup_{\nu \in \Delta^{J-1}} Q_N(\nu) - \epsilon_N \right\}, \tag{2.9}
$$

where $\epsilon_N = N^{-\tau}$, $0 < \tau < \tfrac{1}{2}$. Then the set $H_N[P^x]$ is a consistent estimator of $H[P^x]$, in the sense that

$$
\rho(H_N[P^x], H[P^x]) \equiv \max\left\{ \sup_{p^x_N \in H_N[P^x]} \inf_{p^x \in H[P^x]} \|p^x_N - p^x\|,\ \sup_{p^x \in H[P^x]} \inf_{p^x_N \in H_N[P^x]} \|p^x_N - p^x\| \right\} \xrightarrow{p} 0.
$$
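A grid version of the set estimator in (2.9) can be sketched as follows. The sketch assumes the tolerance is $\epsilon_N = N^{-\tau}$ with $0 < \tau < 1/2$, takes the sample objective as a user-supplied callable `q_n` (a hypothetical name), and approximates the supremum by the maximum over the supplied grid.

```python
def estimate_region(q_n, grid, n_obs, tau=0.25):
    """Keep every grid point nu with Q_N(nu) >= max Q_N - N**(-tau)."""
    if not 0.0 < tau < 0.5:
        raise ValueError("tau must lie strictly between 0 and 1/2")
    values = [q_n(nu) for nu in grid]
    cutoff = max(values) - n_obs ** (-tau)
    return [nu for nu, value in zip(grid, values) if value >= cutoff]


# Toy objective (an assumption for illustration): zero on [0.4, 0.6],
# negative and decreasing away from that interval.
toy_q = lambda nu: -max(0.0, abs(nu - 0.5) - 0.1)
toy_grid = [i / 100 for i in range(101)]
kept = estimate_region(toy_q, toy_grid, n_obs=10**8, tau=0.25)
```

Because the cutoff slackens the maximum by $\epsilon_N$, the retained set slightly exceeds the zero set of the toy objective, and the excess shrinks as `n_obs` grows.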

Most of the calculations and estimations of $H[P^x]$ presented in this paper are performed using this nonlinear programming method. The method requires checking the value function of the sample analog of (2.7)–(2.8) for each $\nu \in \Delta^{J-1}$. Hence it works best, and the computations are easiest, when $J$ is a relatively small number. This is the case in many applications of interest. Examples include educational attainment, language proficiency, workers’ union status, employment status, health conditions, and health/functional status. When $J$ is a large number, the nonlinear programming problem becomes computationally harder. This issue has been acknowledged in the related literature on partial identification, and some solutions have been proposed. For example, Chernozhukov et al. (2004) and Ciliberto and Tamer (2004) have suggested the use of the Metropolis–Hastings algorithm to generate adaptive grid sets or the use of simulated annealing to perform the optimization over $\Delta^{J-1}$. While Ciliberto and Tamer’s empirical analysis is based on the optimization of a different objective function and the parameter space for $\nu$ in their case is not $\Delta^{J-1}$, their work shows that the computational problem is feasible for values of $J$ as large as 13.

2.3. Confidence sets for the identification regions8

The problem of the construction of confidence intervals for partially identified parameters was addressed by
Horowitz and Manski (1998, 2000). They considered the case in which the identification region of the
parameter of interest is an interval whose lower and upper bounds can be estimated from sample data, and
proposed confidence intervals that asymptotically cover the entire identification region with fixed probability.
For the same class of problems, Imbens and Manski (2004) suggested shorter confidence intervals that
uniformly cover the parameter of interest, rather than its identification region, with a prespecified probability.
Beresteanu and Molinari (2007) provide confidence sets and confidence collections for partially identified
parameters whose convex identification region is equal to the expectation of a properly defined set valued
random variable. These approaches are not applicable to the problem studied here, because our identification
regions are given by the set of values of the parameters of interest that solve a maximization problem, do not
have a closed form solution, and are not necessarily convex. The problem of construction of confidence sets
for identification regions of parameters obtained as the solution of the minimization of a criterion function has
recently been addressed by Chernozhukov et al. (2007). They provided a method to construct confidence sets that cover the identification region with probability asymptotically equal to $1 - \alpha$, and developed subsampling methods to implement this procedure. Here I introduce a different procedure, and show that the coverage property of these confidence sets follows directly from well known results in the literature (e.g., Rao, 1973; Cox and Hinkley, 1974). The counterpart of the simplicity of this approach is that the confidence sets may be conservative, in the sense that given a prespecified confidence coefficient $1 - \alpha$, $0 < \alpha < 1$, the confidence sets asymptotically cover the identification region with probability at least equal to $1 - \alpha$.

8
I am very grateful to Elie Tamer for suggestions that led to the construction of these confidence sets.

The main insight for the construction of the confidence sets for $H[P^x]$, denoted $C^{H[P^x]}_N$, is given by observing that the only parameters to be estimated for obtaining $H_N[P^x]$ in (2.9) are $P^w_{i,N}$, $i = 1, \dots, J - 1$, and $\mu_{l,n}$, $l = 1, \dots, \bar{q}$. Let $\hat{\omega}_N$ denote the $(J - 1 + \bar{q})$ vector collecting these estimators. Under Assumption C2, $\hat{\omega}_N$ is root-N consistent and asymptotically normal, and has a covariance matrix ($\mathrm{Var}(\omega)$) that can be consistently estimated from the data ($\widehat{\mathrm{Var}}(\hat{\omega}_N)$). Hence, if $c_{1-\alpha}$ denotes the $1 - \alpha$ quantile of the $\chi^2_{(J - 1 + \bar{q})}$ distribution, I construct a joint confidence ellipsoid for $\omega \equiv [(P^w_i)_{i = 1, \dots, J-1}, (\mu_l)_{l = 1, \dots, \bar{q}}]$ as

$$
C^{\omega}_N \equiv \left\{ \omega_0 : (\hat{\omega}_N - \omega_0)' \left(\widehat{\mathrm{Var}}(\hat{\omega}_N)\right)^{-1} (\hat{\omega}_N - \omega_0) \le c_{1-\alpha} \right\}.
$$
It follows from the results in Rao (1973, Section 7b) that

$$
\lim_{N \to \infty} \Pr(\omega \in C^{\omega}_N) = 1 - \alpha.
$$

Given $C^{\omega}_N$, I construct $C^{H[P^x]}_N$ as follows. For a given $\omega_0 \in C^{\omega}_N$, let $H_{\omega_0}[P^x]$ denote the identification region for $P^x$ obtained when $\hat{\omega}_N$ is replaced by $\omega_0$ in the estimation procedure described in the previous section. Let

$$
C^{H[P^x]}_N = \bigcup_{\omega_0 \in C^{\omega}_N} H_{\omega_0}[P^x].
$$

Then

$$
\omega \in C^{\omega}_N \implies H[P^x] \subseteq C^{H[P^x]}_N,
$$

and therefore

$$
\lim_{N \to \infty} \Pr\left(H[P^x] \subseteq C^{H[P^x]}_N\right) \ge 1 - \alpha.
$$

The confidence sets presented in Section 4 are obtained using this procedure. Using similar procedures one can construct confidence regions for $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$, where again $\tau(\cdot)$ and $\Theta(\cdot)$ denote functionals of $P(x)$.
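The union construction can be sketched for the simplest case. The example below makes several assumptions that are not in the text: $J = 2$, a known $\lambda < 1$ (so $\bar{q} = 0$ and the ellipsoid for $\omega = P^w_1$ is an interval), the base-case bounds of the form $[\max((P^w_1 - \lambda)/(1 - \lambda), 0), \min(1, P^w_1/(1 - \lambda))]$ as the identification region, a binomial variance estimate, and a hard-coded approximation 3.841 for the 0.95 quantile of the chi-square distribution with one degree of freedom.

```python
import math

CHI2_1_95 = 3.841  # approximate 0.95 quantile of chi-square(1)


def base_case_bounds(p_w, lam):
    """Bounds on Pr(x = j) when pi_jj >= 1 - lam for all j (lam < 1)."""
    lower = max((p_w - lam) / (1.0 - lam), 0.0)
    upper = min(1.0, p_w / (1.0 - lam))
    return lower, upper


def confidence_set(p_w_hat, n_obs, lam, crit=CHI2_1_95):
    """Union of the identification regions over the confidence interval for P^w_1."""
    var_hat = p_w_hat * (1.0 - p_w_hat) / n_obs
    half_width = math.sqrt(crit * var_hat)
    lo_w = max(0.0, p_w_hat - half_width)
    hi_w = min(1.0, p_w_hat + half_width)
    # Both bounds are nondecreasing and continuous in P^w_1, so the union of
    # the intervals is [lower bound at lo_w, upper bound at hi_w].
    return base_case_bounds(lo_w, lam)[0], base_case_bounds(hi_w, lam)[1]


ci = confidence_set(0.3, 1000, 0.2)
```

For $\hat{P}^w_1 = 0.3$, $N = 1000$ and $\lambda = 0.2$, the resulting set contains the point-estimate region $[0.125, 0.375]$ and widens it on both sides, which is the conservative behavior described above.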

3. Analysis of the identifying power of specific restrictions on $\Pi^*$

This section analyzes in detail examples of restrictions on the matrix $\Pi^*$ (which satisfy Assumptions C0–C1) coming from validation studies and theories developed in the social sciences. I suggest settings in which such assumptions may be credible, show their implications for the structure of $H[\Pi^*]$, and present results on the inferences that they allow one to draw on $P^x$ and $\tau[P^x]$. I show that when the ‘base-case’ assumptions are maintained, the direct misclassification approach is equivalent to the method proposed by HM and therefore it gives the same identification regions for $H[\Pr(x = j)]$, $j \in X$ as the ones they derived. Hence, I use these results as a benchmark to evaluate the identifying power of additional assumptions. Notice however that $H[\Pr(x = j)]$, $j \in X$ is just the projection of $H[P^x]$ on its $j$th component. Therefore, when $J > 2$, a comparison based simply on $H[\Pr(x = j)]$, $j \in X$ understates the identifying power of the additional assumptions. When $J = 2$, $H[P^x]$ is entirely described by $H[\Pr(x = 1)]$ and closed form bounds can be derived under different sets of assumptions, hence allowing for a full comparison.

3.1. Upper bound on the probability of data errors

Suppose that the researcher has a known lower bound on the probability that the realizations of $w$ and $x$ coincide, i.e., $\Pr(w = x) \ge 1 - \lambda$, or, strengthening this assumption, that the researcher has a known lower bound on the probability of correct report for each value that $x$ can take, i.e., $\Pr(w = j \mid x = j) \ge 1 - \lambda$, $\forall j \in X$.
Formally, consider the following:
Assumption 1. $\Pr(w = x) \ge 1 - \lambda > 0$
or, as a stronger version of Assumption 1, that:
Assumption 2. $\Pr(w = j \mid x = j) \ge 1 - \lambda > 0$, $\forall j \in X$.
Assumptions 1 and 2 are quite often satisfied in practice, mainly due to the availability of results of validation studies, and are therefore of particular interest. Additionally, Assumptions 1 and 2 exhaust the implications for the structure of $\Pi^*$ of the assumptions typically maintained by researchers adopting mixture models. Hence, the results obtained under these ‘base-case’ assumptions are particularly suited to evaluate the identifying power of additional prior information. In the next section I show that informative identification regions might be obtained even if one dispenses with Assumptions 1 and 2, when other information is available.
When the researcher has prior information suggesting that either Assumption 1 or the stronger Assumption 2 holds, she can specify the set $H^E[\Pi^*]$, respectively, as follows:

$$
H^{E,1}[\Pi^*] = \left\{ \Pi : \sum_{h=1}^{J} \pi_{hh}\, p^x_h \ge 1 - \lambda \right\},
$$

$$
H^{E,2}[\Pi^*] = \{\Pi : \pi_{jj} \ge 1 - \lambda \ \forall j \in X\},
$$

where $H^{E,1}[\Pi^*]$ denotes the set $H^E[\Pi^*]$ when Assumption 1 is maintained, and $H^{E,2}[\Pi^*]$ denotes the set $H^E[\Pi^*]$ when Assumption 2 is maintained. Notice that $H^{E,2}[\Pi^*] \subset H^{E,1}[\Pi^*]$. Proposition 3 gives closed form bounds on $\Pr(x = j)$, $j \in X$, for the case in which either Assumption 1 or 2 holds.
Proposition 3. (a) Suppose that Assumption 1 holds, and that no other information is available. Then from system (1.1) one can learn that

$$
H[\Pr(x = j)] = [\max(P^w_j - \lambda, 0),\ \min(1, P^w_j + \lambda)], \quad j \in X. \tag{3.1}
$$

(b) Suppose that Assumption 2 holds, and that no other information is available. Then from system (1.1) one can learn that

$$
H[\Pr(x = j)] = \left[ \max\left( \frac{P^w_j - \lambda}{1 - \lambda}, 0 \right),\ \min\left( 1, \frac{P^w_j}{1 - \lambda} \right) \right], \quad j \in X. \tag{3.2}
$$
The proof of Proposition 3 proceeds in two steps. First, it is shown that from the $j$th equation of system (1.1) one can learn, depending on the maintained assumption, that $\Pr(x = j)$ lies in one of the intervals in (3.1)–(3.2). Then it is shown that there exists a $\Pi \in H[\Pi^*]$ for which the extreme values of these intervals solve system (1.1). This implies that the bounds are sharp. This result establishes that when only Assumption 1 or Assumption 2 is maintained, only the $j$th equation in system (1.1) implies restrictions on $\Pr(x = j)$, $j \in X$. In the next section I show that when more structure is imposed on the matrix $\Pi$, several of the equations in system (1.1) imply restrictions on $\Pr(x = j)$, $j \in X$, and additional progress can be made.
The same identification regions as those in Proposition 3 were obtained by HM. They used a mixture model to study the problem of inference with corrupted and contaminated data, and assumed that a known lower bound is available on the probability that a realization of $w$ is drawn from the distribution of $x$. Molinari (2003) shows that under Assumptions 1 and 2, the identification regions for parameters that respect stochastic dominance obtained using the direct misclassification approach are also equivalent to those obtained by HM.
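The closed-form bounds in (3.1) and (3.2) are simple to evaluate. The following minimal sketch uses hypothetical function names and requires $0 \le \lambda < 1$, in line with the condition $1 - \lambda > 0$ in Assumptions 1 and 2.

```python
def bounds_assumption1(p_w_j, lam):
    """Interval (3.1): [max(P^w_j - lam, 0), min(1, P^w_j + lam)]."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must satisfy 0 <= lam < 1")
    return max(p_w_j - lam, 0.0), min(1.0, p_w_j + lam)


def bounds_assumption2(p_w_j, lam):
    """Interval (3.2): [max((P^w_j - lam)/(1 - lam), 0), min(1, P^w_j/(1 - lam))]."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must satisfy 0 <= lam < 1")
    lower = max((p_w_j - lam) / (1.0 - lam), 0.0)
    upper = min(1.0, p_w_j / (1.0 - lam))
    return lower, upper
```

For example, with $P^w_j = 0.3$ and $\lambda = 0.2$ the two calls give approximately $[0.1, 0.5]$ and $[0.125, 0.375]$, so the interval under the stronger Assumption 2 is nested in the one under Assumption 1.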

3.2. Constant probability of correct report

Consider the case that, conditional on the value of $x$, there is constant probability of correct report for at least a subset of the values that $x$ can take. Formally:
Assumption 3. $\Pr(w = j \mid x = j) = \pi^* \ge 1 - \lambda \ge 0$ $\forall j \in \tilde{X} \subseteq X$,
where $\pi^*$ is known only to lie in $[1 - \lambda, 1]$, and $\lambda$ is strictly less than 1 if a non-trivial upper bound on the probability of a data error is available.
There are various situations in which this assumption may be credible. For example, Poterba and Summers (1995) use CPS data (with Reinterview Survey) and provide evidence for the reinterviewed sub-sample that the rate of correct report of employment status is similar for individuals who are employed or not in the labor force ($\Pr(w = j \mid x = j) \simeq 0.99$), but much lower for individuals who are unemployed ($\Pr(w = j \mid x = j) \simeq 0.86$). Kane et al. (1999) provide evidence (Table 5, p. 18) that self-report of educational attainment is correct with similar probabilities for individuals with no college, some college but no AA degree, and AA degree ($\Pr(w = j \mid x = j) \simeq 0.92$), and is higher for individuals with at least a bachelor degree ($\Pr(w = j \mid x = j) \simeq 0.99$).
Assumption 3 may hold with $\tilde{X} = X$ when the misclassification is generated by specific types of interviewer recording errors. For example, the interviewer may sometimes mark one box at random in the questionnaire. Additionally, in the special case of dichotomous variables, some have argued that the misreporting of health disability is independent from true disability status (see Kreider and Pepper, 2007 for a discussion of this issue), or that the misreporting of workers’ union status is independent from true union status (see Bollinger, 1996 for a discussion of this issue). When this is the case, Assumption 3 holds.
In general, Assumption 3 does not place any restriction on $\Pr(w = i \mid x = j)$, $i \neq j$; $i, j \in X$, other than that the misreporting probabilities need to satisfy

$$
\sum_{i \neq j} \Pr(w = i \mid x = j) = 1 - \pi^*, \quad \forall j \in \tilde{X}.
$$

When $J = 2$, this implies that the two off-diagonal elements of $\Pi^*$ are equal; hence the only unknown element of $\Pi^*$ is $\pi^*$.
Suppose first that $\tilde{X} \subset X$, and without loss of generality let $\tilde{X} \equiv \{1, 2, \dots, h\}$, $2 \le h < J$. When this is the case, Eq. (1.1) can be rewritten as

$$
\begin{bmatrix}
\pi^* & \pi^*_{12} & \dots & \pi^*_{1J} \\
\pi^*_{21} & \pi^* & \dots & \pi^*_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
\pi^*_{J1} & \pi^*_{J2} & \dots & \pi^*_{JJ}
\end{bmatrix}
\begin{bmatrix}
\Pr(x = 1) \\ \Pr(x = 2) \\ \vdots \\ \Pr(x = J)
\end{bmatrix}
=
\begin{bmatrix}
\Pr(w = 1) \\ \Pr(w = 2) \\ \vdots \\ \Pr(w = J)
\end{bmatrix}, \tag{3.3}
$$

where $\pi^* \ge 1 - \lambda$ and, assuming that $\lambda$ constitutes a uniform upper bound for all the misclassification probabilities, $\pi^*_{ll} \ge 1 - \lambda$, $\forall l \in X \setminus \tilde{X}$. Then $H^E[\Pi^*]$ is defined as

$$
H^{E,3}[\Pi^*] = \{\Pi : \pi_{jj} = \pi \ge 1 - \lambda \ \forall j \in \tilde{X};\ \pi_{ll} \ge 1 - \lambda \ \forall l \in X \setminus \tilde{X}\}.
$$

Let $H^3[\Pi^*] = H^P[\Pi^*] \cap H^{E,3}[\Pi^*]$, where $H^P[\Pi^*]$ was defined in (2.4). Then one can immediately calculate $H[P^x]$ and $H\{\tau[P^x]\}$ using the nonlinear programming method described in Section 2, with $H^E[\Pi^*] = H^{E,3}[\Pi^*]$.
It is natural to ask whether Assumption 3 does have identifying power. To answer this question, I consider the case that the researcher has a non-trivial upper bound on the probability of data errors, i.e., that $\lambda < 1$, and compare the bounds on $\Pr(x = j)$, $j \in X$ derived in Proposition 3, Eq. (3.2), with the extreme points obtained using the nonlinear programming method, with $H^E[\Pi^*] = H^{E,3}[\Pi^*]$. In Section 3.4 I consider the case in which $x$ and $w$ are binary ($J = 2$) and show that Assumption 3 can have identifying power even when $\lambda = 1$.
Proposition 4 shows that if $P^w_i > 0$ for some $i \in \tilde{X} \setminus \{j\}$, the base case lower bound on $\Pr(x = j)$, $j \in \tilde{X}$, if informative, is never feasible when Assumption 3 (with $\tilde{X} \subset X$) is maintained; hence the lower bound on $\Pr(x = j)$, $j \in \tilde{X}$ under Assumption 3 is strictly greater than that in (3.2). For the case in which the base case upper bound on $\Pr(x = j)$, $j \in \tilde{X}$ is informative, Proposition 5 derives conditions under which such upper bound is not feasible when Assumption 3 (with $\tilde{X} \subset X$) is maintained, and shows that when those conditions are satisfied, this upper bound is strictly smaller than that in (3.2). When the base case lower and upper bounds (respectively) are not informative, Assumption 3 has no additional identifying power.

Proposition 4. (a) Suppose that Assumption 3 holds, with $\tilde{X} \subset X$, and that $P^w_j > \lambda$. Then the lower bound on $\Pr(x = j)$, $j \in \tilde{X}$ is strictly greater than the base case lower bound in (3.2). The base case lower bound in (3.2) is the sharp lower bound for $\Pr(x = k)$, $k \in X \setminus \tilde{X}$.
(b) Suppose that Assumption 3 holds, with $\tilde{X} \subset X$, and that $P^w_j \le \lambda$. Then the sharp lower bound on $\Pr(x = j)$, $j \in X$ coincides with the base case lower bound in (3.2), and is equal to 0.
Proposition 5. (a) Suppose that Assumption 3 holds, with $\tilde{X} \subset X$, and that $P^w_j < 1 - \lambda$.
If $\lambda \le \tfrac{1}{2}$, the upper bound on $\Pr(x = j)$, $j \in \tilde{X}$ is strictly smaller than the base case upper bound in (3.2) if and only if

$$
\exists\, k \in \tilde{X} \setminus \{j\} : P^w_j + P^w_k > (1 - \lambda) + \frac{\lambda}{1 - \lambda} P^w_j. \tag{3.4}
$$

If $\lambda > \tfrac{1}{2}$, the upper bound on $\Pr(x = j)$, $j \in \tilde{X}$ is strictly smaller than the base case upper bound in (3.2) if

$$
\exists\, k \in \tilde{X} \setminus \{j\} : P^w_k > \lambda. \tag{3.5}
$$

The base case upper bound in (3.2) is the sharp upper bound for $\Pr(x = k)$, $k \in X \setminus \tilde{X}$.
(b) Suppose that Assumption 3 holds, with $\tilde{X} \subset X$, and that $P^w_j \ge 1 - \lambda$. Then the sharp upper bound on $\Pr(x = j)$, $j \in X$ coincides with the base case upper bound in (3.2) and is equal to 1.
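The conditions in Proposition 5(a) can be checked mechanically from the observed distribution of $w$. The sketch below assumes that condition (3.4) reads $P^w_j + P^w_k > (1 - \lambda) + \frac{\lambda}{1 - \lambda} P^w_j$ and that condition (3.5) reads $P^w_k > \lambda$, each for some $k$ in the constant-report subset other than $j$. The function name and the dictionary layout for $P^w$ are hypothetical.

```python
def upper_bound_improves(p_w, j, subset, lam):
    """Return True when condition (3.4) (lam <= 1/2) or (3.5) (lam > 1/2)
    holds for some k in subset other than j.

    p_w maps each value of w to its probability. Note that (3.5) is stated
    as a sufficient condition only, so False is inconclusive when lam > 1/2.
    """
    p_j = p_w[j]
    if p_j >= 1.0 - lam:
        # Part (b) of the proposition: the base-case upper bound equals 1.
        return False
    others = [k for k in subset if k != j]
    if lam <= 0.5:
        threshold = (1.0 - lam) + lam / (1.0 - lam) * p_j
        return any(p_j + p_w[k] > threshold for k in others)
    return any(p_w[k] > lam for k in others)
```

For instance, with $P^w = (0.3, 0.6, 0.1)$, subset $\{1, 2\}$, $j = 1$ and $\lambda = 0.2$, the threshold is $0.875$ and $P^w_1 + P^w_2 = 0.9$ exceeds it, so the upper bound improves.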
The proofs of Propositions 4–5, parts (a), are based on showing that there is no $\Pi \in H^3[\Pi^*]$ for which the lower bound in (3.2) for $\Pr(x = j)$, $j \in \tilde{X}$ solves system (3.3), and that when condition (3.4) or condition (3.5) is satisfied, there is no $\Pi \in H^3[\Pi^*]$ for which the upper bound in (3.2) for $\Pr(x = j)$, $j \in \tilde{X}$ solves system (3.3). When the inference is on $\Pr(x = k)$, $k \in X \setminus \tilde{X}$, there is a $\Pi \in H^3[\Pi^*]$ that allows for the base case bounds in (3.2) to solve system (3.3). The proofs of Propositions 4–5, parts (b), are based on showing that when the bounds on $\Pr(x = j)$, $j \in X$ in (3.2) are not informative, one can find values of $\Pi \in H^3[\Pi^*]$ for which $p^x_j = 0$ and $p^x_j = 1$ solve system (3.3).
The results in Propositions 4–5 can be explained as follows: only a subset $\tilde{X}$ of the equations in system (1.1) are related to each other. Therefore, when drawing inference on $\Pr(x = j)$, $j \in X$, an improvement on the base case bound in (3.2) can be achieved only for $j \in \tilde{X}$. Consider now the case in which $\tilde{X} = X$. In this case the results of Propositions 4–5 apply directly, with $X$ replacing $\tilde{X}$. Of course, the identifying power of Assumption 3 is the highest in this case. In particular, Proposition 4 establishes that the lower bound for $\Pr(x = j)$, $j \in X$, if informative, improves for all $j$ when Assumption 3 is maintained with $\tilde{X} = X$.
A final consideration is relevant. Often the researcher might have prior information suggesting that Assumption 3 holds, but not exactly. That is, she might have prior information that the probability of correct report is only approximately constant: $\Pr(w = j \mid x = j) \approx \pi^*$, $\forall j \in \tilde{X} \subseteq X$. Then it is natural to ask how much variation in the probabilities of correct report is consistent with the conclusions of Propositions 4–5. For ease of exposition, consider the identification of $\Pr(x = 1)$, and let $\pi_{11} = \pi$.9 Molinari (2003) shows that as long as $|\pi_{jj} - \pi_{11}| < \lambda$, $\forall j \in \tilde{X} \setminus \{1\}$, and $\tilde{X} \subset X$, or $\tilde{X} = X$, the results of Proposition 4 continue to hold. A similar condition is derived for the results of Proposition 5.
Example 6 in Section 3.4 illustrates the identifying power of Assumption 3, both for the case in which $\tilde{X} \subset X$ and $\tilde{X} = X$, by comparing the identification regions $H[\Pr(x = j)]$, $j \in X$, $H[P^x]$ and $H[E(x)]$ obtained using the nonlinear programming method with $H^E[\Pi^*] = H^{E,3}[\Pi^*]$ with those obtained when only Assumption 2 is maintained.

3.3. Monotonicity in correct reporting

Social psychology suggests that when survey respondents are asked questions relative to socially and personally sensitive topics, they tend to underreport socially undesirable behaviors and attitudes, and overreport socially desirable ones. This suggestion is often supported by validation studies. In the context of questions of the type described above, these studies often document that $\Pr(w = j \mid x = j) \ge \Pr(w = j + 1 \mid x = j + 1)$, $\forall j \in \tilde{X} \subseteq X$. This is the case for example when survey respondents are asked about their participation in welfare programs, and $j = 1$ indicates non-participation, while $j = 2$ indicates participation, or when they are asked about their employment status, and $j = 1, 2$ indicates, respectively, employed or not in the labor force, while $j = 3$ indicates unemployed.

9
When drawing inference on $\Pr(x = j)$, $j \in \tilde{X}$, we can always define $\pi_{jj} = \pi$, and look at $\pi_{kk}$, $k \in \tilde{X} \setminus \{j\}$, as deviations from $\pi$.
Suppose that the set $X \equiv \{1, 2, \dots, J\}$ can be ordered according to the ‘social desirability’ of the values that $x$ can take, with $x = 1$ being the most desirable, and $x = J$ the least desirable. Suppose further that the researcher believes that there is monotonicity in correct reporting. Then she can maintain the following:

Assumption 4. $\Pr(w = j \mid x = j) \ge \Pr(w = j + 1 \mid x = j + 1)$, $\forall j \in X \setminus \{J\}$, $\Pr(w = J \mid x = J) \ge 1 - \lambda \ge 0$,

where $\lambda$ is strictly less than 1 if a non-trivial upper bound on the probability of a data error is available. When this assumption holds, $H^E[\Pi^*]$ is defined as

$$
H^{E,4}[\Pi^*] = \{\Pi : \pi_{jj} \ge \pi_{(j+1)(j+1)},\ \forall j \in X \setminus \{J\};\ \pi_{JJ} \ge 1 - \lambda\}.
$$

Let $H^4[\Pi^*] = H^P[\Pi^*] \cap H^{E,4}[\Pi^*]$, where $H^P[\Pi^*]$ was defined in (2.4). Then $H[P^x]$ and $H\{\tau[P^x]\}$ can be calculated using the nonlinear programming method described in Section 2, with $H^E[\Pi^*] = H^{E,4}[\Pi^*]$.
To verify that Assumption 4 does have identifying power I again consider the case that $\lambda < 1$, and compare the results obtained using the nonlinear programming method when Assumption 4 is maintained, with those of Proposition 3. In Section 3.4 I consider the case in which $x$ and $w$ are binary ($J = 2$), and show that Assumption 4 can have identifying power even when $\lambda = 1$.
Suppose that Assumption 4 holds. Proposition 6 shows that the base case lower bound in (3.2), when informative, is feasible for $\Pr(x = 1)$. However, for $j \in X \setminus \{1\}$, if $P^w_l > 0$ for some $l \in \{1, \dots, j - 1\}$, the base case lower bound in (3.2), when informative, is not feasible for $\Pr(x = j)$, and hence the lower bound under Assumption 4 is strictly greater than that in (3.2). Regarding the base case upper bound in (3.2), the same results as those in Proposition 5 hold, with $\tilde{X} = \{j, j + 1, \dots, J\}$. The proof of this proposition derives almost directly from the proofs of Propositions 4–5.

Proposition 6. Suppose that Assumption 4 holds.

(a) Let $P^w_j > \lambda$. Then if $j = 1$, the base case lower bound in (3.2) is the sharp lower bound for $\Pr(x = 1)$. The lower bound for $\Pr(x = j)$, $j \in X \setminus \{1\}$, is strictly greater than the base case lower bound in (3.2). The result of Proposition 4, part (b), is unchanged.
(b) Let $P^w_j < (1 - \lambda)$. Then the same results as in Proposition 5 hold, with $\tilde{X} = \{j, j + 1, \dots, J\}$. The result of Proposition 5, part (b), is unchanged.

Example 6 in Section 3.4 illustrates the identifying power of Assumption 4, by comparing the identification regions obtained using the nonlinear programming method with $H^E[\Pi^*] = H^{E,4}[\Pi^*]$ with those obtained when only Assumption 2 is maintained.

3.4. Dichotomous variables and numerical examples

When $x$ and $w$ are dichotomous variables, the identifying power of Assumptions 3 and 4 can be more easily appreciated, since the bounds on $H[P^x]$ can be derived explicitly. This section shows how. It then provides numerical examples of the identification regions obtained under Assumptions 2, 3 and 4, both for the case of $J = 2$ and 3.
Let $X = \{1, 2\}$. The problem of misclassification of a dichotomous variable has received much attention in the econometric, statistical, and epidemiological literature. It is in the context of misclassified dichotomous variables that most of the precedents to the use of restrictions on the misclassification probabilities take place.
To start, suppose that Assumption 3 holds. In the related literature it has often been assumed that $\Pr(w = 1 \mid x = 2) = \Pr(w = 2 \mid x = 1)$ and additionally that these misclassification probabilities are less than $\tfrac{1}{2}$ (see, e.g., Klepper, 1988; Card, 1996). Notice that with dichotomous variables Assumption 3 implies that Eq. (1.1) can be rewritten as

$$
\begin{bmatrix} \Pr(w = 1) \\ \Pr(w = 2) \end{bmatrix}
=
\begin{bmatrix} \pi^* & 1 - \pi^* \\ 1 - \pi^* & \pi^* \end{bmatrix}
\begin{bmatrix} \Pr(x = 1) \\ \Pr(x = 2) \end{bmatrix}.
$$

Hence, the identification region $H[P^x]$ can be inferred from the identification region

$$
H[\Pr(x = 1)] = \{p^x_1 : P^w_1 = \pi p^x_1 + (1 - \pi)(1 - p^x_1),\ \pi \in H^3[\Pi^*]\},
$$

where $H^3[\Pi^*]$ was defined in Example 2. Notice that if $\pi = \tfrac{1}{2}$, $P^w_1 = \tfrac{1}{2}$; in this case, $P(w \mid x) = P(w)$, i.e., $x$ and $w$ are statistically independent, and obviously knowledge of $P(w)$ does not provide any information on $P(x)$. If $P^w_1 \neq \tfrac{1}{2}$, then $\pi \neq \tfrac{1}{2}$. The following proposition characterizes explicitly $H[\Pr(x = 1)]$.

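Proposition 7. Let Assumption 3 hold, with $\tilde{X} = X \equiv \{1, 2\}$.

(a) If $\lambda < 1/2$, then
\[
H[\Pr(x = 1)] =
\begin{cases}
\left[ P^w_1,\ \min\left\{ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\ 1 \right\} \right] & \text{if } P^w_1 \ge 0.5, \\[2ex]
\left[ \max\left\{ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\ 0 \right\},\ P^w_1 \right] & \text{otherwise.}
\end{cases}
\]

(b) If $\lambda \ge 1/2$, then
\[
H[\Pr(x = 1)] =
\begin{cases}
[P^w_1,\ 1] & \text{if } P^w_1 > \lambda, \\[1ex]
\left[ 0,\ \dfrac{P^w_1 - \lambda}{1 - 2\lambda} \right] \cup [P^w_1,\ 1] & \text{if } \lambda \ge P^w_1 > \tfrac{1}{2}, \\[2ex]
[0,\ 1] & \text{if } P^w_1 = \tfrac{1}{2}, \\[1ex]
[0,\ P^w_1] \cup \left[ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\ 1 \right] & \text{if } \tfrac{1}{2} > P^w_1 \ge 1 - \lambda, \\[2ex]
[0,\ P^w_1] & \text{if } 1 - \lambda > P^w_1.
\end{cases}
\]

These identification regions are a subset of those in (3.2).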
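
The case distinctions in Proposition 7 are easy to mis-read, so a small sketch may help. The Python function below is only an illustration of the closed-form bounds stated in the proposition; the name `region_constant_report` and the representation of a region as a list of closed intervals are assumptions made here, not part of the paper.

```python
def region_constant_report(p_w1, lam):
    """Sketch of H[Pr(x = 1)] under Assumption 3 with dichotomous x and w.

    p_w1 is P^w_1 = Pr(w = 1); lam is the bound lambda from Proposition 7.
    The region is returned as a list of closed intervals (low, high).
    """
    if lam < 0.5:  # part (a)
        edge = (p_w1 - lam) / (1 - 2 * lam)
        if p_w1 >= 0.5:
            return [(p_w1, min(edge, 1.0))]
        return [(max(edge, 0.0), p_w1)]
    # part (b): lam >= 1/2
    if p_w1 == 0.5:
        return [(0.0, 1.0)]
    if p_w1 > lam:
        return [(p_w1, 1.0)]
    if p_w1 < 1 - lam:
        return [(0.0, p_w1)]
    # Here 1 - lam <= p_w1 <= lam and p_w1 != 1/2, so lam > 1/2 and the
    # denominator 1 - 2 * lam is non-zero.
    edge = (p_w1 - lam) / (1 - 2 * lam)
    if p_w1 > 0.5:
        return [(0.0, edge), (p_w1, 1.0)]
    return [(0.0, p_w1), (edge, 1.0)]


if __name__ == "__main__":
    # P^w_1 = 0.34 is the value used in Example 5.
    for lam in (1.0, 0.75, 0.4, 0.25, 0.1):
        intervals = region_constant_report(0.34, lam)
        print(lam, [(round(lo, 2), round(hi, 2)) for lo, hi in intervals])
```

Run with $P^w_1 = 0.34$, the sketch can be compared with the last column of Table 1; for $\lambda \ge 1/2$ it returns two disjoint intervals, as discussed in the text.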

The fact that if $\lambda \ge 1/2$, $H[\Pr(x = 1)]$ can be given by two disjoint intervals is a direct consequence of the possible disconnectedness of $H[\Pi^{\star}]$ arising when one assumes constant probability of correct report, and is described in Section 2 and in Example 2.
Suppose now that Assumption 4 holds. Also in this case the identification region $H[P^x]$ can be inferred from the identification region
\[
H[\Pr(x = 1)] = \{ p^x_1 : P^w_1 = \pi_{11} p^x_1 + (1 - \pi_{22})(1 - p^x_1),\ (\pi_{11}, \pi_{22}) \in H^{4}[\Pi^{\star}] \}, \tag{3.6}
\]
where $H^{4}[\Pi^{\star}]$ was defined in Example 3. Notice that again if $\pi_{11} = \pi_{22} = 1/2$, $P^w_1 = 1/2$; in this case, $P(w \mid x) = P(w)$, i.e., $x$ and $w$ are statistically independent, and obviously knowledge of $P(w)$ does not provide any information on $P(x)$. If $P^w_1 \neq 1/2$, then $\pi_{11}$ and $\pi_{22}$ cannot be jointly equal to $1/2$. The following proposition characterizes explicitly $H[\Pr(x = 1)]$.

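Proposition 8. Let Assumption 4 hold.

(a) If $\lambda < 1/2$, then
\[
H[\Pr(x = 1)] =
\begin{cases}
\left[ \max\left\{ \dfrac{P^w_1 - \lambda}{1 - \lambda},\ 0 \right\},\ \min\left\{ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\ 1 \right\} \right] & \text{if } P^w_1 \ge 0.5, \\[2ex]
\left[ \max\left\{ \dfrac{P^w_1 - \lambda}{1 - \lambda},\ 0 \right\},\ P^w_1 \right] & \text{otherwise.}
\end{cases}
\tag{3.7}
\]

(b) If $\lambda \ge 1/2$, then
\[
H[\Pr(x = 1)] =
\begin{cases}
\left[ \dfrac{P^w_1 - \lambda}{1 - \lambda},\ 1 \right] & \text{if } P^w_1 > \lambda, \\[2ex]
[0,\ 1] & \text{if } \lambda \ge P^w_1 \ge \tfrac{1}{2}, \\[1ex]
[0,\ P^w_1] \cup \left[ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\ 1 \right] & \text{if } \tfrac{1}{2} > P^w_1 \ge 1 - \lambda, \\[2ex]
[0,\ P^w_1] & \text{if } 1 - \lambda > P^w_1.
\end{cases}
\tag{3.8}
\]

These identification regions are a subset of those in (3.2).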
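
The bounds in Proposition 8 can be evaluated in the same mechanical way. The following Python sketch simply transcribes (3.7) and (3.8); the function name `region_monotone_report` and the list-of-intervals return type are illustrative assumptions, not notation from the paper.

```python
def region_monotone_report(p_w1, lam):
    """Sketch of H[Pr(x = 1)] under Assumption 4 with dichotomous x and w.

    p_w1 is P^w_1 = Pr(w = 1); lam is the bound lambda from Proposition 8.
    The region is returned as a list of closed intervals (low, high).
    """
    if lam < 0.5:  # part (a), Eq. (3.7)
        lower = max((p_w1 - lam) / (1 - lam), 0.0)
        if p_w1 >= 0.5:
            return [(lower, min((p_w1 - lam) / (1 - 2 * lam), 1.0))]
        return [(lower, p_w1)]
    # part (b), Eq. (3.8): lam >= 1/2
    if p_w1 > lam:
        # p_w1 <= 1 implies lam < 1 here, so the division is safe.
        return [((p_w1 - lam) / (1 - lam), 1.0)]
    if p_w1 >= 0.5:
        return [(0.0, 1.0)]
    if p_w1 >= 1 - lam:
        # p_w1 < 1/2 and p_w1 >= 1 - lam force lam > 1/2.
        return [(0.0, p_w1), ((p_w1 - lam) / (1 - 2 * lam), 1.0)]
    return [(0.0, p_w1)]


if __name__ == "__main__":
    # P^w_1 = 0.34 is the value used in Example 5.
    for lam in (1.0, 0.75, 0.4, 0.25, 0.1):
        intervals = region_monotone_report(0.34, lam)
        print(lam, [(round(lo, 2), round(hi, 2)) for lo, hi in intervals])
```

With $P^w_1 = 0.34$ the output can be compared with the "Monotonicity in correct reporting" column of Table 1.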


Again, the fact that if $\lambda \ge 1/2$ and $P^w_1 < 1/2$, $H[\Pr(x = 1)]$ can be given by two disjoint intervals is a direct consequence of the possible disconnectedness of $H[\Pi^{\star}]$ arising when one assumes monotonicity in correct reporting, and is described in Section 2 and in Example 3.
The following numerical example illustrates the identifying power of Assumptions 3 and 4 with $X = \{1, 2\}$ by comparing the bounds in Propositions 7 and 8 with those in (3.2) and showing how the bounds improve as $\lambda$ gets closer to the true misclassification parameter.
Example 5. Let $\Pr(x = 1) = 0.3$ and $\pi^{\star} = 0.9$, so that $P^w_1 = 0.34$. Table 1 gives lower and upper bounds on $\Pr(x = 1)$, when Assumptions 2, 3 and 4 are maintained, as $\lambda$ approaches $1 - \pi^{\star}$. Notice that the identification region for $\Pr(x = 1)$ when Assumptions 3 and 4 are maintained is informative even when $\lambda = 1$.

To conclude this section, I illustrate the identifying power of Assumption 3 (both for the case in which $\tilde{X} \subset X$ and $\tilde{X} = X$) and Assumption 4, when $J = 3$. I compare the identification regions $H[\Pr(x = j)]$, $j \in X$, $H[P^x]$ and $H[E(x)]$ obtained using the nonlinear programming method with $H^{E}[\Pi^{\star}] = H^{E,3}[\Pi^{\star}]$ and with $H^{E}[\Pi^{\star}] = H^{E,4}[\Pi^{\star}]$ with those obtained when only Assumption 2 is maintained.

Table 1
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case, with dichotomous variables, for different values of $\lambda$. Each cell reports $H[\Pr(x = 1)]$ under the maintained assumption.

| $\lambda$ | Base-case | Monotonicity in correct reporting | Constant probability of correct report |
|---|---|---|---|
| 1.000 | $[0, 1]$ | $[0, 0.34] \cup [0.66, 1]$ | $[0, 0.34] \cup [0.66, 1]$ |
| 0.750 | $[0, 1]$ | $[0, 0.34] \cup [0.82, 1]$ | $[0, 0.34] \cup [0.82, 1]$ |
| 0.400 | $[0.00, 0.57]$ | $[0.00, 0.34]$ | $[0.00, 0.34]$ |
| 0.250 | $[0.12, 0.45]$ | $[0.12, 0.34]$ | $[0.18, 0.34]$ |
| 0.100 | $[0.27, 0.38]$ | $[0.27, 0.34]$ | $[0.30, 0.34]$ |

Table 2
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case

| | Base-case | Monotonicity in correct reporting | Constant probability of correct report, $\tilde{X} = \{1, 2\}$ | Constant probability of correct report, $\tilde{X} = X$ | Exact value |
|---|---|---|---|---|---|
| $\Pr(x = 1)$ | $[0.180, 0.425]$ | $[0.180, 0.415]$ | $[0.235, 0.415]$ | $[0.235, 0.415]$ | 0.3 |
| $\Pr(x = 2)$ | $[0.434, 0.687]$ | $[0.525, 0.687]$ | $[0.525, 0.687]$ | $[0.551, 0.687]$ | 0.6 |
| $\Pr(x = 3)$ | $[0.000, 0.138]$ | $[0.000, 0.138]$ | $[0.000, 0.138]$ | $[0.000, 0.137]$ | 0.1 |
| $E(x)$ | $[1.575, 1.955]$ | $[1.585, 1.955]$ | $[1.585, 1.899]$ | $[1.585, 1.899]$ | 1.8 |
Fig. 2. Comparison of the identifying power of different assumptions for $H[P^x]$.

Example 6. Let $X = \{1, 2, 3\}$, $\lambda = 0.2$, $\pi^{\star} = 0.85$, $[\Pr(x = j), j \in X] = [0.3\ \ 0.6\ \ 0.1]^{T}$, and suppose that $\pi^{\star}_{21} = 0.11$, $\pi^{\star}_{12} = 0.13$, $\pi^{\star}_{13} = 0.04$, so that $P^w = [0.34\ \ 0.55\ \ 0.11]^{T}$; with these values, $E(x) = 1.8$. Table 2 gives the identification regions for $\tau[P^x] = \Pr(x = j)$, $j \in X$, and for $\tau[P^x] = E(x)$, when Assumption 2 alone is maintained, when Assumptions 2 and 3 are jointly maintained with $\tilde{X} = X$ and with $\tilde{X} = \{1, 2\}$, and when Assumptions 2 and 4 are jointly maintained. The improvement in the upper bound on $\Pr(x = 1)$ comes from the second equation of system (1.1); indeed $P^w_1 + P^w_2 = 0.89 > 0.885 = (1 - \lambda) + \frac{\lambda}{1 - \lambda} P^w_1$. Fig. 2 plots the identification regions $H[P^x]$ obtained under the different assumptions, mapping them in $\mathbb{R}^2$.
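
The arithmetic behind Example 6 can be checked directly. The Python sketch below rebuilds the $3 \times 3$ misclassification matrix from the values given in the example; the three off-diagonal entries not stated there ($\pi^{\star}_{31}$, $\pi^{\star}_{32}$, $\pi^{\star}_{23}$) are filled in under the assumption that each column of the matrix sums to one, and all variable names are illustrative.

```python
# Rows index the reported type w = 1, 2, 3; columns index the true type x = 1, 2, 3.
# Diagonal entries are 0.85; pi_21, pi_12, pi_13 are taken from Example 6.
# pi_31, pi_32, pi_23 are derived here from "columns sum to one" (an assumption
# of this sketch, following Pr(w = i | x = j) being a conditional distribution).
PI = [
    [0.85, 0.13, 0.04],
    [0.11, 0.85, 0.11],
    [0.04, 0.02, 0.85],
]
P_X = [0.3, 0.6, 0.1]
LAM = 0.2

# P^w = Pi P^x, the linear system (1.1).
p_w = [sum(row[j] * P_X[j] for j in range(3)) for row in PI]

# E(x) for x taking the values 1, 2, 3.
e_x = sum((j + 1) * p for j, p in enumerate(P_X))

# The comparison quoted in Example 6 for the upper bound on Pr(x = 1).
lhs = p_w[0] + p_w[1]
rhs = (1 - LAM) + LAM / (1 - LAM) * p_w[0]

if __name__ == "__main__":
    print("P^w  =", [round(v, 2) for v in p_w])
    print("E(x) =", round(e_x, 2))
    print("P^w_1 + P^w_2 > (1 - lam) + lam / (1 - lam) * P^w_1 :", lhs > rhs)
```

The example's text works with $P^w$ rounded to two decimals; the sketch keeps the unrounded values, which leaves the direction of the inequality unchanged.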

4. Estimation and inference for the distribution of pension plan types in the US

To illustrate estimation of the bounds and construction of the confidence sets, I consider data on the
distribution of pension plan characteristics in the American population age 51–61. The data are based on
household interviews obtained in the Health and Retirement Study (HRS), a longitudinal, nationally
representative study of older Americans, which in its base year of 1992 surveyed 12,652 individuals from 7,607
households, with at least one household member born between 1931 and 1941. The survey has been updated
every two years since 1992, and in 1998 a new cohort of 2,529 individuals born between 1942 and 1947
(so-called ‘War Babies’) was added to the HRS sample. I use data from the first HRS wave and from the War
Babies wave, focusing on the information collected on pension plan characteristics for people age 51–61 and
Table 3
Percentage with self-reported plan type conditional on firm report of plan type, for respondents reporting pension coverage on current job with a matched employer plan description

| Self-report | Provider report: DB | Provider report: DC | Provider report: Both |
|---|---|---|---|
| DB | 0.56 | 0.26 | 0.45 |
| DC | 0.15 | 0.54 | 0.18 |
| Both | 0.27 | 0.18 | 0.35 |
| Don’t Know | 0.02 | 0.02 | 0.02 |

Sample size: 2,907. Source: Gustman and Steinmeier (2001, Table 6C).

employed at the time of the survey. This provides two nationally representative cross-sections of the
population of interest. The question to be addressed is:

How did the distribution of pension plan types in the population of currently employed Americans, age
51–61, change between 1992 and 1998?

Three pension plan types are possible: DB, DC, and plans incorporating features of both (Both). DB and
DC plans differ greatly in their characteristics. As described by Gustman et al. (2000), in a DB pension the
benefit formula is specified by the plan sponsor, usually as a function of the worker’s highest salary, years of
service, and retirement age. Typically such plans reduce the benefit amount for retirement prior to the so-
called normal retirement age, and are financed by employer (pre-tax) contributions. DC plans do not specify
the retirement benefit, but they set how much is contributed into the account each year the worker remains
with the plan. Then the benefit payout is determined at retirement, as a function of how much has accumulated
in the worker’s account. The plan type can affect several pension-related variables, including pension wealth
and pension accrual. For example, there are DB plans in which an additional year of service is rewarded by
greater retirement benefits up to the firm’s early retirement age. Then the benefit accrual profile may flatten
out, and even become negative, if retirement is delayed further. By contrast, DC plans tend to be actuarially
neutral with regard to the retirement age, rewarding delayed retirement more monotonically.
It is then of interest to learn how the distribution of pension plan types has changed over time, as a
preliminary step before studying the relation between pension incentives and retirement and saving behavior.
The HRS data can provide valuable information in this direction. However, there is evidence that workers are
particularly misinformed about their pension plans’ characteristics, and it is therefore not obvious how to
make use of their reported pension plan descriptions to draw the inference of interest. Gustman and
Steinmeier (2001) linked data from the first HRS wave with restricted data from Social Security
Administration and employer provided pension plan descriptions, and documented that individuals with
matched data (approximately 51% of the entire HRS sample and 67% of currently employed respondents)
approaching retirement age are remarkably misinformed with regard to their pension plans’ characteristics.
Their results are reported in Table 3, and suggest that, overall, approximately 49% of the currently employed
individuals with matched data correctly identify their pension plan type, the remaining 51% providing a
wrong report.
For the individuals in the first HRS wave without a matched pension (33% of the sample) it is difficult to
determine the true plan type: on one side, Gustman and Steinmeier (2001) document that the sub-sample
without a matched pension is different from the sub-sample with a matched pension; on the other side, the
evidence for the sub-sample with matched pension casts doubts on the reliability of the self-reports. Moreover,
linked data are not available for individuals in subsequent waves, or for individuals in the War Babies wave.$^{10}$
Yet, the results of Gustman and Steinmeier’s (2001) analysis provide information on the misreporting pattern,

$^{10}$ Additionally, employer provided pension plan descriptions are not publicly accessible by HRS users. In particular, such data are not available for the analysis carried out in this paper.
Table 4
True fractions of pension plan types for the subset of respondents with matched data for 1992, as calculated by Gustman and Steinmeier (2001), Table 6A, and reported fractions of pension plan types for 1992 and 1998 (Author’s calculations)

| True fractions (matched data) | $t = 1992$: Point est. | $t = 1992$: 95% CI | Reported fractions | $t = 1992$: Point est. | $t = 1992$: 95% CI | $t = 1998$: Point est. | $t = 1998$: 95% CI |
|---|---|---|---|---|---|---|---|
| $\Pr_t(x = 1 \mid s = 1)$ | 0.48 | $[0.46, 0.50]$ | $\Pr_t(w = 1)$ | 0.44 | $[0.42, 0.45]$ | 0.28 | $[0.25, 0.30]$ |
| $\Pr_t(x = 2 \mid s = 1)$ | 0.21 | $[0.19, 0.22]$ | $\Pr_t(w = 2)$ | 0.30 | $[0.29, 0.32]$ | 0.38 | $[0.35, 0.41]$ |
| $\Pr_t(x = 3 \mid s = 1)$ | 0.31 | $[0.29, 0.33]$ | $\Pr_t(w = 3)$ | 0.26 | $[0.24, 0.27]$ | 0.34 | $[0.31, 0.37]$ |
| Sample size | $N = 2{,}907$ | | Sample size | $N = 4{,}244$ | | $N = 1{,}124$ | |

and such information can be exploited through the direct misclassification approach to draw inference on the
question of interest.
In all that follows I assume that the HRS respondents correctly report whether they are covered by a pension,$^{11}$ and I take firm reported plan types to be the ‘true’ plan types. I also ignore the observations with missing data (about 2% of the sample). Let $x = 1$ if the individual has a DB plan, $x = 2$ if the individual has a DC plan, and $x = 3$ if the individual has a plan combining features of both, so that $X \equiv \{1, 2, 3\}$. As before, $w \in X$ denotes the reported pension plan type. Let $P^{w,t} \equiv [\Pr_t(w = j), j \in X]$ and $P^{x,t} \equiv [\Pr_t(x = j), j \in X]$ denote, respectively, the vectors of fractions of reported pension plan types and true pension plan types at time $t = 1992, 1998$. For the respondents in the first HRS wave, let $s_l = 1$ denote the fact that individual $l \in L_{1992}$ has a matched pension plan description, $s_l = 0$ otherwise, and denote by $\Pi^{\star 1}_{1992}$ the matrix of misclassification probabilities that maps the true pension plan types into the reported types for individuals with matched pension plan descriptions. Let $\Pi^{\star 0}_{1992}$ denote the matrix of misclassification probabilities for the respondents in the first HRS wave without a matched plan description, and let $\Pi^{\star}_{1998}$ denote the matrix of misclassification probabilities for the entire sample of respondents in the War Babies wave. Table 3 reveals, up to statistical considerations, $\Pi^{\star 1}_{1992}$. From the HRS data and from Gustman and Steinmeier’s (2001) results one can learn $P^{w,1992}$, $P^{w,1998}$, and $[\Pr_{1992}(x = j \mid s = 1), j \in X]$. These values are reported in Table 4, along with 95% confidence intervals.
One might expect the misclassification pattern reported by Gustman and Steinmeier (2001) to hold for the entire set of respondents to the 1992 HRS survey. On the other hand, one might expect that the misclassification structure mapping true pension plan types into reported types changes over time, so that $\Pi^{\star 1}_{1992}$ can help in constructing $H[\Pi^{\star}_{1998}]$, but not reduce this set to a singleton. However, one might as well be tempted to entertain assumptions strong enough to achieve point identification of the quantity of interest. To test the credibility of these conjectures, I examine the following assumptions:
Assumption E1 (No selection). $\Pi^{\star}_{1992} = \Pi^{\star 1}_{1992}$.
Assumption E2 (No selection and no variation over time). $\Pi^{\star}_{1998} = \Pi^{\star 1}_{1992}$.

The first assumption states that the misreporting pattern for respondents in the first HRS wave with matched pension plan description holds for the entire sample of the first HRS wave. The second assumption states that the misreporting pattern for the respondents in the War Babies wave is the same as that for the respondents with matched data in the first HRS wave. When these assumptions are maintained, $\Pi^{\star}_{1992}$ and $\Pi^{\star}_{1998}$ are identified, and, since $\Pi^{\star 1}_{1992}$ is non-singular, one can use the equation $p^x = \Pi^{-1} P^w$ to attempt to learn $[\Pr_t(x = j), j \in X]$, $t = 1992, 1998$. Table 5 reports the results of this procedure, along with 95% bootstrap confidence intervals. The data reject the assumption that $\Pi^{\star}_{1998} = \Pi^{\star 1}_{1992}$: the vector obtained from solving $(\Pi^{\star 1}_{1992})^{-1} P^{w,1998}$ does not generate a valid probability measure. In particular, the first element of the implied
$^{11}$ This assumption is based on Gustman and Steinmeier’s (2001) comparison between people’s reports on their pension coverage in both the 1992 and 1994 waves of the HRS. This comparison shows that 93% of the respondents who declared to be covered by a pension or to be not covered by a pension in 1992 give the same answer in 1994. Of the remaining 7%, approximately 80% are individuals who declared not to be covered by a pension in 1992 but to be covered in 1994.
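Table 5
Implications of Assumption E1 (no selection) and Assumption E2 (no selection and no variation over time) for the identification regions of $[\Pr_t(x = j), j \in X]$, $t = 1992, 1998$. Rows give the three elements of $(\Pi^{\star 1}_{1992})^{-1} P^{w,t}$.

| Element of $(\Pi^{\star 1}_{1992})^{-1} P^{w,t}$ | $t = 1992$ (No selection): Point estimate | $t = 1992$: Bootstrap 95% CI | $t = 1998$ (No selection and no variation over time): Point estimate | $t = 1998$: Bootstrap 95% CI |
|---|---|---|---|---|
| First | 0.46 | $[0.36, 0.60]$ | $-0.86$ | $[-1.82, -0.43]$ |
| Second | 0.37 | $[0.33, 0.41]$ | 0.48 | $[0.31, 0.63]$ |
| Third | 0.17 | $[0.02, 0.28]$ | 1.38 | $[0.89, 2.37]$ |
| Sample size | $N = 4{,}244$ | | $N = 1{,}124$ | |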

vector is negative and its 95% confidence interval does not cover zero, and the last element is greater than one. Hence, point identification of $P^{x,1998}$ through Assumption E2 is not possible. On the other hand, the data do not reject the assumption that $\Pi^{\star}_{1992} = \Pi^{\star 1}_{1992}$, despite the possible selection problem. In all that follows I maintain Assumption E1 and focus the attention on the problem of inferring $H[P^{x,1998}]$. Of course, Assumption E1 can be relaxed, and $H[P^{x,1992}]$ can be estimated under weaker assumptions using the direct misclassification approach.
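
The inversion exercise behind Assumptions E1 and E2 can be sketched with the published two-decimal figures. The Python code below uses the entries of Table 3 as an estimate of $\Pi^{\star 1}_{1992}$ and the reported fractions of Table 4 as $P^{w,t}$. Two choices are assumptions of this sketch rather than the paper's stated procedure: the ‘Don’t Know’ row is dropped and each column is rescaled to sum to one, and the linear system is solved with a small hand-written Gauss-Jordan routine. Because the inputs are rounded, the numbers need not reproduce Table 5 exactly.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Table 3: rows are self-reports (DB, DC, Both), columns are provider reports.
TABLE3 = [
    [0.56, 0.26, 0.45],
    [0.15, 0.54, 0.18],
    [0.27, 0.18, 0.35],
]
# Assumption of this sketch: ignore "Don't Know" and rescale columns to sum to one.
col_sums = [sum(row[j] for row in TABLE3) for j in range(3)]
PI_1992 = [[row[j] / col_sums[j] for j in range(3)] for row in TABLE3]

# Reported fractions P^{w,t} from Table 4.
P_W = {1992: [0.44, 0.30, 0.26], 1998: [0.28, 0.38, 0.34]}

implied = {t: solve(PI_1992, p_w) for t, p_w in P_W.items()}


def is_probability_vector(v, tol=1e-6):
    """True if all entries lie in [0, 1] and the entries sum to one."""
    return all(-tol <= x <= 1 + tol for x in v) and abs(sum(v) - 1.0) <= tol


if __name__ == "__main__":
    for t, p_x in implied.items():
        print(t, [round(x, 2) for x in p_x], is_probability_vector(p_x))
```

Under these assumptions the 1992 solution is a valid probability vector, while the 1998 solution has a negative first element and a last element above one, which is the pattern that leads the text to reject Assumption E2.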
The main assumption that I maintain throughout the entire analysis, and that I use to exploit part of the information in $\Pi^{\star 1}_{1992}$ to learn $H[P^{x,1998}]$, is the following:
Assumption E3 (No reduction in awareness). $\pi_{jj,1998} \ge \pi_{jj,1992}$, $\forall j \in X$.
This assumption says that the fraction of individuals correctly identifying their pension plan type does not
decline over time. This in turn implies that lower bounds on the probability of correct report in 1992 provide
lower bounds on the probability of correct report in 1998. Assumption E3 is motivated by the observation that
in recent years the Social Security Administration and the Department of Labor have increasingly expanded
their efforts to improve individuals’ knowledge about pensions and about retirement saving in general (see
Gustman and Steinmeier, 2001 for a summary of recent interventions).
I now introduce two sets of assumptions, which I entertain along with Assumption E3 to construct the set $H[\Pi^{\star}_{1998}]$ and derive $H[P^{x,1998}]$. Of course, different empirical researchers might hold disparate beliefs about which of the assumptions in Cases 1 and 2 hold; moreover, they might bring to bear different prior information.
The identification regions $H[P^{x,1998}]$ are plotted in Fig. 3, along with their 95% confidence sets. The identification regions $H[\Pr_{1998}(x = j)]$, $j \in X$, are reported in Table 6, again with their 95% confidence intervals.
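Case 1:
\[
H[\Pi^{\star}_{1998}] = H^{P}[\Pi^{\star}] \cap \{ \Pi : \pi_{11} = \pi_{22} \ge 0.54,\ \pi_{22} \ge \pi_{33} \ge 0.35,\ \pi_{21} \le \pi_{12},\ \pi_{31} \le \pi_{13},\ \pi_{23} \le \pi_{13} \}.
\]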

Case 1 maintains Assumption E3 and builds on Assumption E1. I assume that certain of the findings of Gustman and Steinmeier (2001) for matched respondents in 1992 are informative about respondents in 1998. I assume that the probability of correct report for 1998 respondents who truly have a DB or a DC plan is at least as large as the corresponding probability for 1992 respondents. I also assume that persons with DB and DC pensions have the same probabilities of correct reports, these being at least as large as the probability of correct report by those whose pensions are of the Both type. This assumption is motivated by Table 3, which shows this pattern for 1992 respondents.
I also assume that various other features of Table 3 carry over to respondents in the War Babies wave. I assume that persons who truly have a pension plan of the Both type report their plan as DB more often than the reverse pattern, where persons with DB plans report themselves as having a plan of the Both type. I assume that persons who truly have a DC plan report a DB plan more often than individuals with a DB plan report a DC one. And I assume that persons who truly have a plan of the Both type report a DB plan more often than a DC one. These assumptions are expressed through the inequalities $\pi_{21} \le \pi_{12}$, $\pi_{31} \le \pi_{13}$, $\pi_{23} \le \pi_{13}$.

Fig. 3. Identification regions and confidence sets for $H[P^{x,1998}]$ under different assumptions. (Two panels, Case 1 and Case 2; each shows $H[P^{x,1998}]$ and its confidence set inside the simplex $\Delta^{2}$ with vertices $[1\ 0\ 0]$, $[0\ 1\ 0]$ and $[0\ 0\ 1]$.)

Table 6
Identification regions in cases 1–2 for $\Pr_{1998}(x = j)$, and point estimates for $\Pr_{1992}(x = j)$

| Maintained assumptions | $H[\Pr_t(x = 1)]$: Estimate | $H[\Pr_t(x = 1)]$: 95% CI | $H[\Pr_t(x = 2)]$: Estimate | $H[\Pr_t(x = 2)]$: 95% CI | $H[\Pr_t(x = 3)]$: Estimate | $H[\Pr_t(x = 3)]$: 95% CI |
|---|---|---|---|---|---|---|
| $t = 1992$ | 0.46 | $[0.36, 0.60]$ | 0.37 | $[0.33, 0.41]$ | 0.17 | $[0.02, 0.28]$ |
| Case 1, 1998 | $[0.00, 0.42]$ | $[0.00, 0.44]$ | $[0.11, 0.72]$ | $[0.10, 0.87]$ | $[0.00, 0.89]$ | $[0.00, 0.91]$ |
| Case 2, 1998 | $[0.00, 0.28]$ | $[0.00, 0.34]$ | $[0.35, 0.61]$ | $[0.28, 0.80]$ | $[0.11, 0.50]$ | $[0.00, 0.67]$ |

Sample size: $N = 4{,}244$ for 1992; $N = 1{,}124$ for 1998.

The first panel of Fig. 3 shows the estimate of $H[P^{x,1998}]$ obtained in Case 1, and its confidence set, mapped in $\mathbb{R}^2$. Interestingly, these sets are non-convex. For the construction of the confidence set, I estimated $P^{w,1998}$ using sample means, and took as estimates of the lower bounds in $H^{E}[\Pi^{\star}]$ the values $\mu_{1,n}$, $\mu_{2,n}$ in the (2,2) and (3,3) entries of Table 3. These estimates are borrowed from Gustman and Steinmeier (2001) and are based on a validation data set (respondents to the 1992 wave with matched pension plan descriptions) independent from the 1998 data, with $n = 2{,}907$. For the construction of the confidence ellipsoid for $[P^{w,1998}_1, P^{w,1998}_2, \mu_1, \mu_2]$ I used $\kappa = N/n = 1{,}124/2{,}907$. The estimates of $\Pr_{1992}(x = 1)$ and $H[\Pr_{1998}(x = 1)]$ reported in Table 6 suggest that the fraction of individuals having a DB plan should have declined between 1992 and 1998. However, the confidence set for $H[\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1)]$ covers negative numbers, and therefore the hypothesis $\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) < 0$ cannot be rejected. This shows that relatively mild restrictions yield a strong conclusion regarding the question of interest, although more assumptions are needed to obtain statistical significance.
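Case 2:
\[
H[\Pi^{\star}_{1998}] = H^{P}[\Pi^{\star}] \cap \left\{ \Pi :
\begin{pmatrix}
\pi_{11} = \pi_{22} \ge \pi_{33} \ge 0.54, \\
\pi_{21} \le \pi_{12},\ \pi_{31} \le \pi_{13},\ \pi_{23} \le \pi_{13}, \\
\pi_{21} \ge 0.10,\ \pi_{ij} \ge 0.15 \text{ for all other } i, j \in X,\ i \neq j
\end{pmatrix}
\right\}.
\]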

Case 2 builds on Case 1, as it retains all the assumptions maintained there. However, it is crucially set apart from the previous case, in that it requires a lower bound on each probability of misclassification. This in turn implies that, given any true pension plan type, the probability of correct report has to be necessarily less than one. This assumption is motivated by the large amount of misreporting of pension plan types which appears in Table 3, and which is documented at large by Gustman and Steinmeier (2001). Additionally, $\pi_{33}$ is required to have the same lower bound as $\pi_{11}$ and $\pi_{22}$. This is motivated by the large number of information campaigns on DC plans (in particular 401k) that has characterized the mid to late 1990s.
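
To make the two sets of restrictions concrete, the Python sketch below checks whether a candidate $3 \times 3$ matrix satisfies the inequalities listed in Case 1 and in Case 2. It covers only the restrictions on the entries of the matrix; it does not check the remaining requirement that the matrix be consistent with the observed $P^{w,1998}$ (membership in $H^{P}[\Pi^{\star}]$). The function names and the two candidate matrices are hypothetical and chosen purely for illustration.

```python
TOL = 1e-9


def columns_sum_to_one(pi):
    """Each column Pr(w = . | x = j) must be a probability distribution."""
    return all(
        abs(sum(pi[i][j] for i in range(3)) - 1.0) <= TOL
        and all(pi[i][j] >= -TOL for i in range(3))
        for j in range(3)
    )


def satisfies_case1(pi):
    """pi[i][j] stands for pi_{i+1, j+1} = Pr(w = i+1 | x = j+1); 1=DB, 2=DC, 3=Both."""
    return (
        columns_sum_to_one(pi)
        and abs(pi[0][0] - pi[1][1]) <= TOL   # pi_11 = pi_22
        and pi[0][0] >= 0.54 - TOL            # ... >= 0.54
        and pi[1][1] >= pi[2][2] - TOL        # pi_22 >= pi_33
        and pi[2][2] >= 0.35 - TOL            # pi_33 >= 0.35
        and pi[1][0] <= pi[0][1] + TOL        # pi_21 <= pi_12
        and pi[2][0] <= pi[0][2] + TOL        # pi_31 <= pi_13
        and pi[1][2] <= pi[0][2] + TOL        # pi_23 <= pi_13
    )


def satisfies_case2(pi):
    """Case 2 keeps Case 1 and adds lower bounds on pi_33 and on misreporting."""
    off_diagonal_ok = all(
        pi[i][j] >= (0.10 if (i, j) == (1, 0) else 0.15) - TOL
        for i in range(3)
        for j in range(3)
        if i != j
    )
    return satisfies_case1(pi) and pi[2][2] >= 0.54 - TOL and off_diagonal_ok


# Hypothetical candidates, not estimates from the paper.
CANDIDATE_A = [[0.70, 0.15, 0.22], [0.12, 0.70, 0.18], [0.18, 0.15, 0.60]]
CANDIDATE_B = [[0.60, 0.30, 0.40], [0.05, 0.60, 0.20], [0.35, 0.10, 0.40]]

if __name__ == "__main__":
    for name, pi in (("A", CANDIDATE_A), ("B", CANDIDATE_B)):
        print(name, satisfies_case1(pi), satisfies_case2(pi))
```

Candidate A satisfies both sets of restrictions, whereas candidate B satisfies Case 1 only, since its $\pi_{33}$ and several misreporting probabilities fall below the Case 2 lower bounds.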
Under these assumptions, the estimate of $H[P^{x,1998}]$ shrinks further. This allows one to conclude that the fraction of individuals having DB plans has decreased between 1992 and 1998; in particular, $\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) \ge 0.18$. This in turn implies that the fraction of individuals having either DC plans or plans incorporating features of both has increased sharply between 1992 and 1998. The confidence set for $H[\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1)]$ does not contain negative numbers, so that the assumption $\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) < 0$ can be rejected. The confidence set for $H[P^{x,1998}]$ in Case 2 is constructed again by estimating $P^{w,1998}$ using sample means, and taking as estimate of the lower bound for $\pi_{jj}$, $j = 1, 2, 3$, in $H^{E}[\Pi^{\star}]$ the value $\mu_n$ in the (2,2) entry of Table 3. However, the lower bounds for the other parameters are treated as constant, so that the confidence ellipsoid is constructed exclusively for the vector $[P^{w,1998}_1, P^{w,1998}_2, \mu]$.
By comparison, if one did not use all the information provided by Gustman and Steinmeier’s (2001) analysis, but imposed only a uniform lower bound on the probability of correct report (Assumption 2), the results of HM would apply. If one assumed $1 - \lambda = 0.35$, one would learn that $\Pr_{1998}(x = 1) \in [0, 0.79]$, $\Pr_{1998}(x = 2) \in [0, 1]$, $\Pr_{1998}(x = 3) \in [0, 0.97]$. If one assumed $1 - \lambda = 0.54$, one would learn that $\Pr_{1998}(x = 1) \in [0, 0.51]$, $\Pr_{1998}(x = 2) \in [0, 0.71]$, $\Pr_{1998}(x = 3) \in [0, 0.63]$. These bounds do not allow one to identify the sign of the change in the fraction of individuals having a DB plan.

5. Extensions

The direct misclassification approach can be easily extended to drawing inference in the presence of multiple
misclassified variables, regression with misclassified outcome, regression with misclassified regressor, and
jointly missing and misclassified outcomes. Below I list briefly the modifications of the approach that allow
inference in each of these cases.

5.1. Two or more misclassified variables

In this case, the researcher simply has to redefine variables. Suppose that interest centers on features of $P(x_1, x_2)$, $x_1 \in X_1 \equiv \{1, 2, \ldots, J_1\}$, $x_2 \in X_2 \equiv \{1, 2, \ldots, J_2\}$, $2 \le J_1, J_2 < \infty$, and the researcher observes only $(w_1, w_2)$, a misclassified version of $(x_1, x_2)$. She can then construct random variables $s$ and $r$, taking values in $S \equiv \{1, 2, \ldots, J_1 J_2\}$, and such that $s = (l - 1) J_1 + j$ if $x_1 = j$ and $x_2 = l$, and $r = (k - 1) J_1 + i$ if $w_1 = i$ and $w_2 = k$, so that each pair of values corresponds to exactly one element of $S$. She can then write the analogue of Eq. (1.1) for $r$ and $s$, and use the method proposed here to draw the inference of interest.
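
The redefinition of variables amounts to flattening a pair of category labels into a single index. A minimal Python sketch follows; the function names are hypothetical, and the sketch takes the offset to be the number of values of the first variable, $J_1$, which is what makes the map one-to-one onto $\{1, \ldots, J_1 J_2\}$ when $J_1 \neq J_2$.

```python
def joint_index(j, l, j1):
    """Map (x1, x2) = (j, l), with j in 1..J1 and l in 1..J2, to s in 1..J1*J2."""
    return (l - 1) * j1 + j


def split_index(s, j1):
    """Inverse of joint_index: recover (j, l) from s."""
    quotient, remainder = divmod(s - 1, j1)
    return remainder + 1, quotient + 1


if __name__ == "__main__":
    J1, J2 = 3, 2  # illustrative sizes, chosen unequal on purpose
    table = {(j, l): joint_index(j, l, J1) for l in range(1, J2 + 1) for j in range(1, J1 + 1)}
    print(table)
```

The same two functions can be used for the reported pair $(w_1, w_2)$, giving the index $r$.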

5.2. Regressions

(a) If interest centers on features of $P(x \mid s = s_0)$, where $s \in S$ is a perfectly observable discrete covariate with $\Pr(s = s_0) > 0$, and the researcher has prior information on $\Pi^{\star}_{s_0} \equiv \{\Pr(w = i \mid x = j, s = s_0)\}_{i,j \in X}$, the proposed method can be applied directly, with the event $s = s_0$ conditioning all the probabilities involved.
(b) Consider now the case that interest centers on features of $P(y \mid x)$, where $y$ is a perfectly observed outcome variable. The problem of regression with misclassified covariates has been widely studied (e.g., Aigner, 1973; Klepper, 1988; Bollinger, 1996; Card, 1996; Kane et al., 1999; Hu, 2006; Mahajan, 2006), and point identified or interval identified estimators have been proposed under specific sets of assumptions. The direct misclassification approach can be used to estimate the smallest point and the largest point in the identification region of (for example) a mean regression under any set of assumptions. Molinari (2003) shows how. Here I present ideas, for the special case in which the probability of correct report is greater than $1/2$ for each of the values that $x$ can take (and any additional assumption might hold). In this case any $\Pi \in H[\Pi^{\star}]$ is of full rank, so that $p^x = \Pi^{-1} P^w$. This implies that a feasible value of $[\Pr(x = j \mid w = i);\ i, j \in X]$ can be uniquely expressed as a function of $\Pi$. Hence, for each $\Pi \in H[\Pi^{\star}]$, I can use the results of HM to obtain sharp bounds for $E(y \mid w = i, x = j)$ and use the Law of Total Probability to infer sharp $\Pi$-dependent bounds on $E(y \mid x = j)$. Taking the infimum and the supremum, respectively, of the smallest and largest points in these bounds for $\Pi \in H[\Pi^{\star}]$ gives the smallest and the largest point in $H[E(y \mid x = j)]$, $j \in X$.
This same argument has been proposed by Dominitz and Sherman (2006), who studied the problem of inferring the distribution of test scores for truly English proficient students ($x = 1$), when only an imperfect indicator of English proficiency is available ($w = 1$). They used a mixture model with verification and assumed that students classified as English proficient ($w = 1$) are more likely to be truly English proficient ($x = 1$) than students classified as limited English proficient ($w = 2$). In terms of misclassification probabilities, this assumption translates into $\pi_{11} \ge P^w_1$.

5.3. Jointly missing and misclassified data

The data available to the empirical researcher are often not only error ridden, but also incomplete. Consider the example of survey respondents being asked about their pension plan type: not only can they report DB, DC, or Both, but they can as well choose not to respond to the question. Let $w = J + 1$ denote this outcome. Then system (1.1) needs to be enlarged to include the equation
\[
\Pr(w = J + 1) = \sum_{j=1}^{J} \Pr(w = J + 1 \mid x = j) \Pr(x = j).
\]

This simply implies that the set $H[\Pi^{\star}]$ is a set of rectangular matrices. The identification regions $H[P^x]$ and $H\{\tau[P^x]\}$ are still defined as in (2.2) and (2.3), and the nonlinear programming method can be used to consistently estimate them. Of course, there are additional constraints, one coming from the $(J + 1)$th equation in the above system, and the others from possible assumptions on the relationship between misreporting and non-response.
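
A small numerical illustration of the enlarged system may be useful. In the Python sketch below the $(J + 1) \times J$ matrix and the vector of true fractions are hypothetical values invented for illustration; only the structure (one extra row for non-response, columns summing to one) follows the text.

```python
J = 3

# Hypothetical (J + 1) x J matrix: rows are reports DB, DC, Both, non-response;
# columns are true types DB, DC, Both. Each column sums to one.
PI_RECT = [
    [0.55, 0.25, 0.44],
    [0.15, 0.53, 0.18],
    [0.26, 0.18, 0.34],
    [0.04, 0.04, 0.04],
]
# Hypothetical true distribution over the J plan types.
P_X = [0.3, 0.6, 0.1]

# Enlarged system: the first J rows are the equations of (1.1), the last row is
# Pr(w = J + 1) = sum_j Pr(w = J + 1 | x = j) Pr(x = j).
p_w = [sum(row[j] * P_X[j] for j in range(J)) for row in PI_RECT]

if __name__ == "__main__":
    print("P^w including non-response:", [round(v, 3) for v in p_w])
```

In this illustration non-response does not depend on the true type, so the last element of $P^w$ equals the common non-response rate; assumptions linking non-response to misreporting would change the last row of the matrix.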

6. Conclusions

This paper has studied the problem of drawing inference when a discrete variable is subject to classification
errors. This is a commonplace problem in surveys and elsewhere. The problem has long been conceptualized
through convolution and mixture models. This paper introduced the direct misclassification approach. The
approach is based on the observation that in the presence of classification errors, the relation between the
distribution of the ‘true’ but unobservable variable and its misclassified representation is given by a linear
system of simultaneous equations, in which the coefficient matrix is the matrix of misclassification
probabilities.
While this matrix is unknown, validation studies, economic theory, cognitive and social psychology, or
knowledge of the circumstances under which the data have been collected can provide information on the
misclassification pattern that has transformed the ‘true’ but unobservable variable into the observable but
possibly misclassified variable. The method introduced in this paper shows how to transform such prior
information into sets of restrictions on the (unknown) matrix of misclassification probabilities, and exploit
these restrictions to derive identification regions for any real functional of the distribution of interest. By
contrast, mixture models do not allow the researcher to easily exploit this type of prior information to learn
features of the distribution of interest. Convolution models, as usually implemented with the assumption of
independence between measurement error and ‘true’ variable, are not suited to analyze errors in discrete data.
The direct misclassification approach does not rely on any specific set of assumptions, but it can incorporate
into the analysis any prior information that the researcher might have on the misreporting pattern. In some
cases the implied identification regions have a simple closed form solution that allows for straightforward
estimation using sample analogs. When this is not the case, the identification regions can be estimated using
the nonlinear programming estimator introduced in this paper. Confidence sets that cover the true
identification region with probability at least equal to a prespecified confidence level can be constructed using
a simple procedure based on the inversion of a Wald statistic.

Acknowledgments

I am grateful to the Associate Editor, two anonymous reviewers, Tim Conley, Joel Horowitz, Rosa
Matzkin, and especially Chuck Manski for helpful comments and suggestions. I have benefitted from
discussions with T. Bar, G. Barlevy, L. Barseghyan, L. Blume, R. DiCecio, M. Goltsman, A. Guerdjikova,
G. Jakubson, N. Kiefer, R. Lentz, G. Menzio, B. Meyer, M. Peski, J. Sullivan, C. Taber, E. Tamer,
T. Tatur, and T. Vogelsang, and from the comments of seminar participants at Boston College, Chicago GSB,
Cornell, Duke, Georgetown, Pittsburgh, Penn, Penn State, Princeton, Purdue, Toronto, UCLA, UCL,
Virginia, and at the 2003 Southern Economic Association Meetings. All remaining errors are my own.
Research support from Northwestern University Dissertation Year Fellowship, the Center for Analytic
Economics at Cornell University, and the National Science Foundation Grant SES-0617482 is gratefully
acknowledged.

Appendix A. Proofs of Propositions

A.1. Propositions in Section 2

A.1.1. Proposition 1
Proof. Let $\Pi^{1} \in H^{P}[\Pi^{\star}]$. This means that $\exists\, \nu^{1} \in \Delta^{J-1}$ such that $\Pi^{1} \nu^{1} = P^w$. Now observe that for any $\nu \in \Delta^{J-1}$, $\tilde{\Pi} \nu = P^w$. Hence, for any $\alpha \in (0, 1)$ it holds that $(\alpha \Pi^{1} + (1 - \alpha) \tilde{\Pi}) \nu^{1} = P^w$, and therefore $(\alpha \Pi^{1} + (1 - \alpha) \tilde{\Pi}) \in H^{P}[\Pi^{\star}]$. To show that $H^{P}[\Pi^{\star}]$ is not star convex with respect to any other of its elements, consider a matrix $\Pi^{1} \in H^{P}[\Pi^{\star}]$ with $\Pi^{1} \neq \tilde{\Pi}$. Because $\Pi^{1} \neq \tilde{\Pi}$, it follows that there exists an $i \in X$ such that not all elements of the $i$th row of $\Pi^{1}$ are equal to $P^w_i$. Without loss of generality, let $i = 1$. Let $\pi^{1}_{1j} > P^w_1 > 0$ (a similar argument works for the case that $\pi^{1}_{1j} < P^w_1$), and without loss of generality suppose $j = 1$. Construct $\Pi^{2}$ as follows: $\pi^{2}_{\cdot 1} = P^w$, $\pi^{2}_{1k} = 1\ \forall k \in X \setminus \{1\}$. Then $\Pi^{2} \in H^{P}[\Pi^{\star}]$. Let $\Pi^{\alpha} = \alpha \Pi^{1} + (1 - \alpha) \Pi^{2}$. Then for any $\alpha \in [0, 1 - P^w_1)$ it follows that $\Pi^{\alpha} \notin H^{P}[\Pi^{\star}]$, because every element in the first row of the resulting matrix is strictly greater than $P^w_1$. $\square$

A.1.2. Proposition 2
For given vectors of positive probabilities $P^w$ and positive constants $\boldsymbol{\mu} \equiv [\mu_1, \ldots, \mu_{\bar{q}}]$, let $Q(\nu; P^w, \boldsymbol{\mu})$ denote the value function in the nonlinear programming problem (2.7)–(2.8). Observe that $Q_N(\nu) = Q(\nu; P^w_N, \boldsymbol{\mu}_n)$. Let $(v^{1}, \Pi^{1})$ be the maximizer for $Q(\nu; P^{w,1}, \boldsymbol{\mu}^{1})$, where $P^{w,1}, \boldsymbol{\mu}^{1}$ are arbitrary values of $P^w, \boldsymbol{\mu}$ (recall that this problem always has an optimal solution). I show that for different feasible arbitrary values $P^{w,2}, \boldsymbol{\mu}^{2}$, the difference $|Q(\nu; P^{w,1}, \boldsymbol{\mu}^{1}) - Q(\nu; P^{w,2}, \boldsymbol{\mu}^{2})|$ is $O_p(N^{-1/2})$. The strategy of this proof is similar to the one in Honoré and Lleras-Muney (2006), except that here some more complications arise due to the possible nonlinearity of some of the constraints. This establishes that
\[
\sup_{\nu \in \Delta^{J-1}} |Q_N(\nu) - Q(\nu)| \xrightarrow{p} 0
\quad \text{and} \quad
\frac{\sup_{\nu \in \Delta^{J-1}} |Q_N(\nu) - Q(\nu)|}{N} = o_p(1).
\]
The consistency result then follows from Manski and Tamer (2002, Proposition 5).
To simplify the notation, let $\bar{q} = q$, and assume that $q_1$ components of $\boldsymbol{\mu}$ are estimated for the greater-than-or-equal constraints, $q_2$ for the less-than-or-equal constraints, and $q_3$ for the equality constraints,

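$q_1 + q_2 + q_3 = q$. Let
\[
c_1 = \min\left\{ \min_{j} \frac{P^{w,2}_j}{P^{w,1}_j},\ \min_{l \in \{1,\dots,q_1\}} \frac{\mu^2_l}{\mu^1_l},\ \min_{m \in \{1,\dots,q_2\}} \frac{\mu^1_{q_1+m}}{\mu^2_{q_1+m}},\ \min_{s \in \{1,\dots,q_3\}} \frac{\mu^2_{q_1+q_2+s}}{\mu^1_{q_1+q_2+s}},\ \min_{s \in \{1,\dots,q_3\}} \frac{\mu^1_{q_1+q_2+s}}{\mu^2_{q_1+q_2+s}} \right\}.
\]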

  c1 P1 X0, and
This implies that 0oc1 p1. Let P
X
J X
J X
J
vj  1  p ij ¼ 1  c1 p1ij X1  p1ij X0; j ¼ 1; . . . ; J,
i¼1 i¼1 i¼1

X
J X
J Pw;2
j
vJþj  Pw;2
j  p ij xj ¼ Pw;2
j  c1 p1ij xj X v1Jþj X0; j ¼ 1; . . . ; J.
i¼1 i¼1 Pw;1
j

Notice that jvj  v1j jpJð1  c1 Þ and jvJþj  v1Jþj jpð1 þ JÞð1c
c1 Þ.
1

Consider now the constraints defining H E ½P% . Let r ¼ maxðt; maxl rl Þ, where t is the degree of the
polynomial. Observe that if f l ðP1 ÞXm1l then v12Jþl ¼ 0; if f l ðP1 Þom1l then v12Jþl ¼ m1l  f l ðP1 Þ. For
l ¼ 1; . . . ; q1 , let
8
> 
if f l ðP1 ÞXm1l and f l ðPÞXm2
>
<
0 l;
  m2 Þ
v2Jþl  f l ðP1 Þ  m1l  ðf l ðPÞ 1 
if f l ðP ÞXm and f l ðPÞom2 ;
1
(A.1)
l l l
>
>
: m2  f ðPÞ
 if f l ðP 1
Þom1l :
l l

The suggested values of v2Jþl are feasible. In fact, if f l ðP1 ÞXm1l the implied v2Jþl is obviously non-negative. If
f l ðP1 Þom1l ,
2
 ¼ m2  f l ðc1 P1 ÞXml m1  c1 f l ðP1 ÞXc1 v1 X0,
v2Jþl ¼ m2l  f l ðPÞ l
m1l l 2Jþl

where the first inequality follows from Assumption C1. Moreover,


 
 1  c1
1 2 1 1
jv2Jþl  v2Jþl jpjml  ml j þ jf l ðPÞ  f l ðP ÞjpM þ max jf l ðPÞjð1  cr1 Þ,
c1 P2½0;1J
2

where maxP2½0;1J 2 jf l ðPÞj is bounded because f l ðÞ is a continuous function on a compact set.
Regarding the less-than-or-equal constraints, observe that under Assumption C1 a monotone transforma-
tion of gm ðPÞ and mq1 þm leaves the constraint unaltered. Hence without loss of generality when gm ðÞ satisfies
Assumptions C1(i), let rm ¼ 1.
Now, notice that if gm ðP1 Þpm1q1 þm then v12Jþq1 þm ¼ 0; if gm ðP1 Þ4m1q1 þm then v12Jþq1 þm ¼ gm ðP1 Þ  m1q1 þm . For
m ¼ 1; . . . ; q2 , let
8

if gm ðP1 Þpm1q1 þm and gm ðPÞpm 2
>
> 0 q1 þm ;
>
> !
>
>
>
< m1 1
1   m2 
if gm ðP1 Þpm1q1 þm and gm ðPÞ4m 2
q1 þm  gm ðP Þ þ 1þr
gm ðPÞ q1 þm q1 þm ;
v2Jþq1 þm  c1
>
>
>
> 1
>
>  2
: c1þr gm ðPÞ  mq1 þm
> if gm ðP1 Þ4m1q1 þm :
1


This choice of v2Jþq1 þm satisfies the constraint in (2.8). In fact, if gm ðPÞpm2
q1 þm the constraint is satisfied with
v2Jþq1 þm ¼ 0, and in the other cases
! !
2  2  1  2 1 
mq1 þm  gm ðPÞ þ v2Jþq1 þm Xmq1 þm  gm ðPÞ þ 1þr gm ðPÞ  mq1 þm ¼  1 gm ðPÞX0,
c1 c11þr
2
where the last inequality follows because by Assumption C1 gm ðÞ is non-negative on ½0; 1J and 0oc1 p1 by
construction. Notice also that the suggested values of v2Jþq1 þm are feasible. In fact, if gm ðP1 Þpm1q1 þm the

1  
implied v2Jþq1 þm is obviously non-negative, because gm ðPÞXgm ðPÞ. On the other hand, recalling that by
c1þr
1
m1q1 þm
construction c1 pminm2f1;...;q2 g , if gm ðP1 Þ4m1q1 þm ,
m2q1 þm
1   m2 1 1 2 1 1 2
v2Jþq1 þm ¼ gm ðPÞ q1 þm ¼ 1þr gm ðc1 P Þ  mq1 þm X r gm ðP Þ  mq1 þm
c11þr c1 c1
1 1
X v X0.
cr1 2Jþq1 þm
Moreover, by Assumption C1(i)
 
 1 
  1 
jv2Jþq1 þm  v12Jþq1 þm jpjm2q1 þm  m1q1 þm j þ  1þr gm ðPÞ  gm ðP Þ
 c1 
! !
1  crþ1
p 1
M þ max gm ðPÞ ,
crþ1
1 P2½0;1J
2

where maxP2½0;1J 2 gm ðPÞ is bounded because gm ðÞ is a continuous function on a compact set. Finally, observe
that for the equality constraints the same calculations as above can be applied to hk ðPÞXmq1 þq2 þk and
hk ðPÞpmq1 þq2 þk , k ¼ 1; . . . ; q3 .
!
w;2 2 w;1 1 1  crþ1
Hence, for each n, Qðn; P ; l ÞX Qðn; P ; l Þ const  1
. Interchanging the role of Pw;1 and Pw;2
crþ1
1
!
rþ1
1  c2
yields Qðn; Pw;1 ; l1 ÞX Qðn; Pw;2 ; l2 Þ  const  , where
crþ1
2
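\[
c_2 = \min\left\{ \min_{j} \frac{P^{w,1}_j}{P^{w,2}_j},\ \min_{l \in \{1,\dots,q_1\}} \frac{\mu^1_l}{\mu^2_l},\ \min_{m \in \{1,\dots,q_2\}} \frac{\mu^2_{q_1+m}}{\mu^1_{q_1+m}},\ \min_{s \in \{1,\dots,q_3\}} \frac{\mu^2_{q_1+q_2+s}}{\mu^1_{q_1+q_2+s}},\ \min_{s \in \{1,\dots,q_3\}} \frac{\mu^1_{q_1+q_2+s}}{\mu^2_{q_1+q_2+s}} \right\}
\]
with $0 < c_2 \le 1$, so that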


! !
w;2 2 w;1 1 1  crþ1
1 1  crþ1
2
jQðn; P ; l Þ  Qðn; P ; l Þjpconst  þ const  .
crþ1
1 crþ1
2

Finally, under Assumption C2 the estimators PwN and ln are root-N consistent, so that
sup jQN ðnÞ  QðnÞj ¼ Op ðN 1=2 Þ.
n2DJ1

A.2. Propositions in Section 3

I first introduce and prove a lemma that is useful for the proof of some of the following propositions.
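Lemma 1. Suppose that Assumption 2 holds, and that $P^w_j > \lambda$, $j \in X$. Then $\frac{P^w_j - \lambda}{1 - \lambda}$ is an admissible value of $p^x_j$, and therefore solves the $j$th equation of system (1.1), if and only if the following conditions jointly hold: (a) $\pi_{jj} = 1$, and (b) $\pi_{ji} = \lambda$ $\forall i \in X \setminus \{j\}$ such that $p^x_i > 0$, so that $\sum_{i \neq j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$.

Proof. For $\frac{P^w_j - \lambda}{1 - \lambda} > 0$ to be an admissible value of $p^x_j$, the $j$th equation of system (1.1) requires that
\[
\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \neq j} \pi_{ji} p^x_i = P^w_j, \tag{A.2}
\]
and $\sum_{i \neq j} p^x_i = \frac{1 - P^w_j}{1 - \lambda}$. By Assumption 2, $\pi_{ji} \in [0, \lambda]$, $\forall i \in X \setminus \{j\}$, and $\pi_{jj} \in [1 - \lambda, 1]$. Notice that it is possible for $\pi_{ji} = \lambda$, $\forall i \in X \setminus \{j\}$, because the $\pi_{ji}$ are not related across $i$. (Recall that $1 - \pi_{kk} = \sum_{l \neq k} \pi_{lk} \le \lambda$, $\forall k \in X$.) Therefore,
\[
\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \neq j} \pi_{ji} p^x_i
\le \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \sum_{i \neq j} p^x_i
= \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda}
\le \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda} = P^w_j.
\]
Hence, Eq. (A.2) can be satisfied if and only if $\pi_{jj} = 1$, and $\sum_{i \neq j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$. That is, $\pi_{ji} = \lambda$ $\forall i \in X \setminus \{j\}$ such that $p^x_i > 0$. Notice that at least one value of $p^x_i > 0$, because $p^x_j = \frac{P^w_j - \lambda}{1 - \lambda} < 1$. □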
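The chain of inequalities in the proof can be illustrated with a small exact computation. This is a sketch under assumed values: the helper `lhs_A2` and the numbers ($\lambda = 1/5$, $P^w_j = 1/2$, three categories) are hypothetical and chosen only for illustration. It evaluates the left-hand side of (A.2) at the candidate value $p^x_j = (P^w_j - \lambda)/(1 - \lambda)$ and shows that equality with $P^w_j$ holds at $\pi_{jj} = 1$, $\pi_{ji} = \lambda$, while lowering either quantity leaves the left-hand side strictly below $P^w_j$.

```python
from fractions import Fraction as F

def lhs_A2(pi_jj, pi_off, p_j, p_off):
    """Left-hand side of (A.2): pi_jj * p_j + sum_i pi_ji * p_i over i != j."""
    return pi_jj * p_j + sum(a * b for a, b in zip(pi_off, p_off))

# Hypothetical values, chosen only for illustration.
lam, Pw_j = F(1, 5), F(1, 2)
p_j = (Pw_j - lam) / (1 - lam)        # candidate value from Lemma 1
p_off = [F(2, 5), F(9, 40)]           # remaining mass, sums to 1 - p_j

# Conditions (a) and (b) of the lemma: equality with Pw_j.
at_bound = lhs_A2(F(1), [lam, lam], p_j, p_off)

# Violating (a): a smaller diagonal entry falls short of Pw_j.
lower_diag = lhs_A2(F(9, 10), [lam, lam], p_j, p_off)

# Violating (b): an off-diagonal entry below lambda also falls short.
lower_off = lhs_A2(F(1), [F(1, 10), lam], p_j, p_off)
```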

A.2.1. Proposition 3
Proof. Without loss of generality, suppose that interest is in characterizing the identification region $H[\Pr(x = 1)]$.
(a) Assumption 1 holds: For the first equation of system (1.1) to be satisfied it must be that $p^x_1 = P^w_1 - \sum_{j=2}^{J} \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^{J} \pi_{i1}$. From the definition of $H^1[\Pi^*]$ it follows that
\[
\lambda \ge 1 - \sum_{h=1}^{J} \pi_{hh} p^x_h = \sum_{h=1}^{J} \sum_{i \neq h} \pi_{ih} p^x_h \ge \left( \sum_{j=2}^{J} \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^{J} \pi_{i1} \right).
\]
Hence from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\{P^w_1 - \lambda, 0\}$, and $p^x_1 \le \min\{1, P^w_1 + \lambda\}$. If $P^w_1 > \lambda$, the lower bound is achieved for $\sum_{j=2}^{J} \pi_{1j} p^x_j = \lambda$ and $\sum_{i=2}^{J} \pi_{i1} p^x_1 = 0$. If $P^w_1 < 1 - \lambda$, the upper bound is achieved for $\sum_{j=2}^{J} \pi_{1j} p^x_j = 0$ and $\sum_{i=2}^{J} \pi_{i1} p^x_1 = \lambda$. I now show that there are values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^1[\Pi^*]$ such that the corresponding $p^x \in H[P(x)]$.
Pw1
(a.1.1) Upper Bound, with Pw1 o1  l: Let p11 ¼ , pjj ¼ 1, j 2 X nf1g, pij ¼ 0, i; j 2 X nf1g, iaj, and
ðPw1 þ lÞ
define pi1 , i 2 X nf1g, as follows:
8
>
< l
w for i ¼ j ¼ minfk ¼ 2; . . . ; J : Pwk Xlg;
if 9 j41 : Pwj Xl; pi1 ¼ ðP1 þ lÞ
>
: 0; 8i 2 X ; iaf1; jg:
8
> Pw2
>
> for i ¼ 2;
>
> w
ðP1 þ lÞ
>
> 8
>
>
>
>
>
>

l P Pwk
i1 Pwi
< for i 2 X nf1; 2g;
>
>
< min  ; P Pwk
i1
ðPw1 þ lÞ k¼2 ðPw1 þ lÞ ðPw1 þ lÞ > i : pl;
if Pwj ol; 8j 2 X nf1g; pi1 ¼ : w
k¼2 ðP1 þ lÞ
>
>
>
> 8
>
>
>
> < for i 2 X nf1; 2g;
>
>
> iP
1 Pwk
>
>0
>
>
: :i :
> w 4l:
k¼2 ðP1 þ lÞ

It is easy to show that the suggested P belongs to H 1 ½P% , and allows for px1 ¼ Pw1 þ l and the implied pxj ,
j 2 X nf1g to solve system (1.1). Hence, px1 ¼ Pw1 þ l is a feasible value of Prðx ¼ 1Þ given the maintained
assumptions.
(a.1.2) Upper bound, with $P^w_1 \ge 1 - \lambda$: In this case the upper bound is not informative, but just set equal to 1. Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\sum_{i=2}^{J} \pi_{i1} = 1 - P^w_1 \le \lambda$, and $\pi_{i1} p^x_1 = \pi_{i1} = P^w_i \le \lambda$, $\forall i \in X$, $i \neq 1$. It is straightforward to verify that the suggested $\Pi \in H^1[\Pi^*]$, and allows for $p^x_1 = 1$, and the implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system (1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.
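(a.2.1) Lower bound, with $P^w_1 > \lambda$: Let $p^x_2 = P^w_2 + \lambda$, and $\pi_{12} = \frac{\lambda}{p^x_2}$, $\pi_{22} = 1 - \frac{\lambda}{p^x_2}$, and $\pi_{jj} = 1$, $\forall j \in X \setminus \{2\}$, so that $\pi_{i2} = 0$, $\forall i \in X \setminus \{1, 2\}$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $[i\ j] \neq [1\ 2]$. Then it is straightforward to verify that the suggested $\Pi \in H^1[\Pi^*]$, and allows for $p^x_1 = P^w_1 - \lambda$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, to solve system (1.1). Hence $P^w_1 - \lambda$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.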
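The construction in (a.2.1) can be verified mechanically. The sketch below is an illustration only: the function name `lower_bound_construction` and the numerical inputs are hypothetical, and Assumption 1 is read here as the restriction that the overall misclassification probability $1 - \sum_h \pi_{hh} p^x_h$ does not exceed $\lambda$, which is how it is used in part (a). The code builds the matrix and the implied distribution, then checks that the system is solved and that the error probability equals $\lambda$.

```python
from fractions import Fraction as F

def lower_bound_construction(Pw, lam):
    """Build (Pi, p) as in (a.2.1); requires Pw[0] > lam.

    Index 0 corresponds to category 1 and index 1 to category 2.
    """
    J = len(Pw)
    p = list(Pw)
    p[0] = Pw[0] - lam           # p_1 = Pw_1 - lambda
    p[1] = Pw[1] + lam           # p_2 = Pw_2 + lambda
    # Start from the identity matrix, then modify column 2 only.
    Pi = [[F(int(i == j)) for j in range(J)] for i in range(J)]
    Pi[0][1] = lam / p[1]        # pi_12 = lambda / p_2
    Pi[1][1] = 1 - lam / p[1]    # pi_22 = 1 - lambda / p_2
    return Pi, p

# Hypothetical inputs, chosen only for illustration.
Pw = [F(1, 2), F(3, 10), F(1, 5)]
lam = F(1, 10)
Pi, p = lower_bound_construction(Pw, lam)

implied_Pw = [sum(Pi[i][j] * p[j] for j in range(3)) for i in range(3)]
error_prob = 1 - sum(Pi[h][h] * p[h] for h in range(3))
column_sums = [sum(Pi[i][j] for i in range(3)) for j in range(3)]
```

With these inputs the implied distribution places probability $P^w_1 - \lambda = 2/5$ on the first category.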
(a.2.2) Lower bound, with $P^w_1 \le \lambda$: Then the lower bound is not informative, but just set equal to 0. Let $p^x_1 = 0$; this in turn implies $\sum_{j=2}^{J} p^x_j = 1$. Let $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$. Then $\sum_{j=2}^{J} \pi_{1j} p^x_j = P^w_1$. Moreover $\sum_{j=2}^{J} P^w_j = 1 - P^w_1 \ge 1 - \lambda$, hence $P^w_j \le 1 - P^w_1$ for each $j \in X \setminus \{1\}$. Let $\pi_{jj} = 1 - P^w_1$, $\forall j \in X \setminus \{1\}$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $i \neq 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \le 1$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^{J} p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist values of $\Pi \in H^1[\Pi^*]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, solve system (1.1), and hence it is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.
(a.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four cases: (1) $\lambda \le P^w_1 \le 1 - \lambda$; (2) $P^w_1 \le \min\{\lambda, 1 - \lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1 - \lambda\}$; (4) $1 - \lambda < P^w_1 < \lambda$. Here I describe in detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a detailed proof of all cases.
Let $\lambda \le P^w_1 \le 1 - \lambda$. It then follows that
\[
P^w_1 - \lambda \le p^x_1 \le P^w_1 + \lambda.
\]
Let $p^x_1 = P^w_1 + (1 - 2\alpha)\lambda$, for any $\alpha \in (0, 1)$. To find values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^1[\Pi^*]$ such that the corresponding $p^x \in H[P(x)]$, I distinguish two sub-cases:
Pw1
1. If ap 12, let p11 ¼ , pij ¼ 0, 8i ¼ 1; . . . ; J, j ¼ 2; . . . ; J. Choose pj1 and pxj , j 2 X nf1g, as
Pw1 þ ð1  2aÞl
Pw1
follows: if 9 j : Pwj X1  w ,
P1 þ ð1  2aÞl
8
>
<1  Pw1
for k ¼ j ¼ minfi ¼ 2; . . . ; J : Pwi Xlg;
pk1 ¼ Pw1 þ ð1  2aÞl
>
: 0; 8k 2 X ; kaf1; jg:
Pw1
If Pwj o1  ; 8j 2 X nf1g,
Pw1 þ ð1  2aÞl
8 w
< P2 
>

for k ¼ 2;
pk1 ¼ Pw1 P
k1
: min 1 
>  pi1 ; Pwk 8k 2 X nf1; 2g;
Pw1 þ ð1  2aÞl i¼2

pxj ¼ Pwj  pj1 ðPw1 þ ð1  2aÞlÞ.


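2. If $\alpha > \frac{1}{2}$, let $\pi_{jj} = 1$, $\forall j \in X \setminus \{2\}$, $\pi_{22} = \frac{P^w_2}{P^w_2 + (2\alpha - 1)\lambda}$, $\pi_{12} = \frac{(2\alpha - 1)\lambda}{P^w_2 + (2\alpha - 1)\lambda}$, and $p^x_2 = P^w_2 + (2\alpha - 1)\lambda$.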

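(b) Assumption 2 holds: For the first equation of system (1.1) to be satisfied, I need $\pi_{11} p^x_1 + \sum_{j=2}^{J} \pi_{1j} p^x_j = P^w_1$, where $\sum_{j=2}^{J} p^x_j = 1 - p^x_1$. From the definition of $H^2[\Pi^*]$, $\pi_{1j} \le \lambda$, $\forall j \in X \setminus \{1\}$, and $\pi_{11} \ge 1 - \lambda$. Let $\sum_{j=2}^{J} \pi_{1j} p^x_j = \bar{\pi}(1 - p^x_1)$, where $\bar{\pi} \in [0, \lambda]$. Then
\[
p^x_1 = \frac{P^w_1 - \bar{\pi}}{\pi_{11} - \bar{\pi}},
\]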

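and $p^x_1$ is well defined as long as $\pi_{11} \neq \bar{\pi}$. I distinguish a few cases.
1. If $P^w_1 < \min\{\lambda, 1 - \lambda\}$, one can pick $\bar{\pi} = P^w_1 < \lambda$, and $p^x_1 = 0$ is the lower bound. As for the upper bound, when $P^w_1 < 1 - \lambda \le \pi_{11}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the upper bound is achieved for $\pi_{11} = 1 - \lambda$, and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1 - \lambda}$.
2. If $\lambda \le P^w_1 \le 1 - \lambda$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the upper bound is achieved for $\pi_{11} = 1 - \lambda$, and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1 - \lambda}$, and the lower bound is achieved for $\pi_{11} = 1$, and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$.
3. If $1 - \lambda \le P^w_1 \le \lambda$, pick $\bar{\pi} = P^w_1 \le \lambda$, and $p^x_1 = 0$ is the lower bound. Pick $\pi_{11} = P^w_1 \ge 1 - \lambda$, and $p^x_1 = 1$ is the upper bound.
4. If $P^w_1 > \max\{\lambda, 1 - \lambda\}$, pick $\pi_{11} = P^w_1 \ge 1 - \lambda$, and $p^x_1 = 1$ is the upper bound. As for the lower bound, when $P^w_1 > \lambda \ge \bar{\pi}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the lower bound is achieved for $\pi_{11} = 1$, and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$.

To summarize, from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\left\{\frac{P^w_1 - \lambda}{1 - \lambda}, 0\right\}$ and $p^x_1 \le \min\left\{1, \frac{P^w_1}{1 - \lambda}\right\}$. I am left to show that one can find values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^2[\Pi^*]$ such that for any $p^x_1 \in \left[\max\left\{\frac{P^w_1 - \lambda}{1 - \lambda}, 0\right\}, \min\left\{1, \frac{P^w_1}{1 - \lambda}\right\}\right]$ the corresponding $p^x \in H[P(x)]$. I first show that this holds for the extreme points, and then that it holds for any point in the closed interval between the lower and the upper bound.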
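(b.1.1) Upper bound, with $P^w_1 < 1 - \lambda$: Let $\pi_{11} = 1 - \lambda$ and $\pi_{jj} = 1$, $\forall j > 1$. Then the system reduces to
\[
\begin{cases}
\dfrac{P^w_1}{1 - \lambda}(1 - \lambda) = P^w_1, \\[2mm]
\pi_{j1} \dfrac{P^w_1}{1 - \lambda} + p^x_j = P^w_j, & j = 2, \dots, J,
\end{cases}
\]
where $\sum_{j=2}^{J} \pi_{j1} = \lambda$, and $\sum_{j=2}^{J} P^w_j > \lambda$. Choose $\pi_{k1}$, $k \in X \setminus \{1\}$, as follows:
\[
\text{if } \exists\, j : P^w_j \ge \lambda, \quad
\pi_{k1} =
\begin{cases}
\lambda & \text{for } k = j = \min\{i = 2, \dots, J : P^w_i \ge \lambda\}, \\
0 & \forall k \in X,\ k \notin \{1, j\};
\end{cases}
\]
\[
\text{if } P^w_j < \lambda\ \forall j \in X \setminus \{1\}, \quad
\pi_{k1} =
\begin{cases}
P^w_2 & \text{for } k = 2, \\
\min\left\{\lambda - \sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\}.
\end{cases}
\tag{A.3}
\]
It is easy to show that the suggested $\Pi$ belongs to $H^2[\Pi^*]$, and allows for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, to solve system (1.1). Hence, $p^x_1 = \frac{P^w_1}{1 - \lambda}$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.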
(b.1.2) Upper bound, with $P^w_1 \ge 1 - \lambda$: In this case the upper bound is not informative, but just set equal to 1. Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\pi_{j1} = P^w_j$, $j = 1, \dots, J$. It is straightforward to verify that this $\Pi \in H^2[\Pi^*]$, and obviously allows for $p^x_1 = 1$ and the implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system (1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.
(b.2.1) Lower bound, with $P^w_1 > \lambda$: Let $\pi_{j1} = 0$, $\forall j \in X \setminus \{1\}$, and $\pi_{12} = \cdots = \pi_{1J} = \lambda$; then the first equation of system (1.1) is satisfied, and the implied $\Pi \in H^2[\Pi^*]$. Let $p^x_j = \frac{P^w_j}{1 - \lambda} \ge 0$, $j \in X \setminus \{1\}$. It is straightforward to verify that system (1.1) is satisfied. Hence $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$ is a feasible value for $\Pr(x = 1)$ given the maintained assumptions.

(b.2.2) Lower bound, with $P^w_1 \le \lambda$: Let $p^x_1 = 0$; this in turn implies $\sum_{j=2}^{J} p^x_j = 1$. Let $\pi_{1j} = P^w_1$ and $\pi_{jj} = 1 - P^w_1$, $\forall j > 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \ge 0$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^{J} p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist values of $\Pi \in H^2[\Pi^*]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, solve system (1.1), and hence it is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.
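(b.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four cases: (1) $\lambda \le P^w_1 \le 1 - \lambda$; (2) $P^w_1 \le \min\{\lambda, 1 - \lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1 - \lambda\}$; (4) $1 - \lambda < P^w_1 < \lambda$. Here I describe in detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a detailed proof of all cases.
Let $\lambda \le P^w_1 \le 1 - \lambda$. It then follows that $\frac{P^w_1 - \lambda}{1 - \lambda} \le p^x_1 \le \frac{P^w_1}{1 - \lambda}$. Let $p^x_1 = \frac{P^w_1 - \alpha\lambda}{1 - \lambda}$, for any $\alpha \in (0, 1)$. I show that there are values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^2[\Pi^*]$ such that the corresponding $p^x \in H[P(x)]$. Let $\pi_{11} = 1 - \lambda(1 - \alpha)$, $\pi_{1j} = \alpha\lambda$, $\forall j \in X \setminus \{1\}$, $\pi_{ij} = 0$, $\forall i, j \in X \setminus \{1\}$, $i \neq j$. Choose $\pi_{j1}$ and $p^x_j$, $j \in X \setminus \{1\}$, as follows:
\[
\text{if } \exists\, j : P^w_j \ge \lambda(1 - \alpha), \quad
\pi_{k1} =
\begin{cases}
\lambda(1 - \alpha) & \text{for } k = j = \min\{i = 2, \dots, J : P^w_i \ge \lambda\}, \\
0 & \forall k \in X,\ k \notin \{1, j\};
\end{cases}
\]
\[
\text{if } P^w_j < \lambda(1 - \alpha)\ \forall j \in X \setminus \{1\}, \quad
\pi_{k1} =
\begin{cases}
P^w_2 & \text{for } k = 2, \\
\min\left\{\lambda(1 - \alpha) - \sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\};
\end{cases}
\]
\[
p^x_j = \frac{1}{1 - \alpha\lambda}\left(P^w_j - \pi_{j1} \frac{P^w_1 - \alpha\lambda}{1 - \lambda}\right). \qquad \square
\]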
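The two pairs of bounds derived in this proof are simple enough to compute directly. The following sketch is illustrative only: the function names `bounds_assumption1`, `bounds_assumption2`, and `check_b21` are hypothetical, as are the numerical inputs. It evaluates the bounds from parts (a) and (b) and rebuilds the (b.2.1) construction to confirm that it solves system (1.1) at the lower bound under Assumption 2.

```python
from fractions import Fraction as F

def bounds_assumption1(Pw1, lam):
    """Part (a): [max(Pw1 - lam, 0), min(1, Pw1 + lam)]."""
    return max(Pw1 - lam, F(0)), min(F(1), Pw1 + lam)

def bounds_assumption2(Pw1, lam):
    """Part (b): [max((Pw1 - lam)/(1 - lam), 0), min(1, Pw1/(1 - lam))]."""
    return (max((Pw1 - lam) / (1 - lam), F(0)),
            min(F(1), Pw1 / (1 - lam)))

def check_b21(Pw, lam):
    """Rebuild the (b.2.1) construction and test whether Pi p = Pw."""
    J = len(Pw)
    p = [(Pw[0] - lam) / (1 - lam)] + [Pw[j] / (1 - lam) for j in range(1, J)]
    Pi = [[F(0)] * J for _ in range(J)]
    Pi[0][0] = F(1)                 # pi_11 = 1, so pi_j1 = 0 for j > 1
    for j in range(1, J):
        Pi[0][j] = lam              # pi_1j = lambda
        Pi[j][j] = 1 - lam          # pi_jj = 1 - lambda
    implied = [sum(Pi[i][j] * p[j] for j in range(J)) for i in range(J)]
    return implied == list(Pw) and sum(p) == 1

# Hypothetical inputs, chosen only for illustration.
b1 = bounds_assumption1(F(1, 2), F(1, 10))
b2 = bounds_assumption2(F(1, 2), F(1, 10))
ok = check_b21([F(1, 2), F(3, 10), F(1, 5)], F(1, 10))
```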

A.2.2. Proposition 4
Proof. (a) Suppose, without loss of generality, that $\tilde{X} = \{1, 2, \dots, h\}$, $2 \le h < J$, and consider $\Pr(x = 1)$. By Lemma 1, for $\frac{P^w_1 - \lambda}{1 - \lambda} > 0$ to solve the first equation of system (1.1), it must be that $\pi_{11} = \pi = 1$, and either $\pi_{1i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{1\}$, with $\sum_{i=2}^{J} \pi_{1i} p^x_i = \lambda \frac{1 - P^w_1}{1 - \lambda}$. Since $\pi_{22} = \pi$ by assumption, and $\pi = 1$, it follows that $\pi_{12} = 0$. Hence, for the first equation in system (1.1) to hold, $p^x_2 = 0$. Consider the second equation in system (1.1): when the first equation of the system holds, the second reduces to $\sum_{i=3}^{J} \pi_{2i} p^x_i = P^w_2$. However, for each $i \in X \setminus \{1\}$, if $\pi_{1i} = \lambda$, it follows that $\pi_{2i} = 0$, since $\sum_{k \neq l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if $\pi_{1i} < \lambda$, for the first equation in system (1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^{J} \pi_{2i} p^x_i = 0$. Therefore, since $P^w_2 > 0$, the lower bound in (3.2) is not feasible for $\Pr(x = 1)$, because the second equation of system (1.1) is not satisfied. Notice now that repeating the same argument for each of equations 3 to $h$ in system (1.1) implies, by a symmetry argument, that $\Pr(x = 1)$ cannot achieve the lower bound in (3.2).
For $k \in X \setminus \tilde{X}$, $\Pr(x = k)$ can achieve the lower bound in (3.2). Consider for example $\Pr(x = J)$. Let $\pi_{JJ} = 1$ and $\pi_{Ji} = \lambda$, $\forall i \in X \setminus \{J\}$. Then the last equation of system (1.1) is satisfied. These values of $\pi_{Ji}$, $i \in X$, imply that $\pi = 1 - \lambda$, and that $p^x_j = \frac{P^w_j}{1 - \lambda}$ for each $j \in X \setminus \{J\}$. It is obvious that the suggested $\Pi \in H^3[\Pi^*]$, and the implied $p^x_j$ solves system (1.1).
(b) Suppose that $P^w_1 \le \lambda$ and that $p^x_1 = 0$. Then $\sum_{j=2}^{J} p^x_j = 1$, and $p^x_j \ge 0$ $\forall j = 2, \dots, J$. Then the proof of Proposition 3, part (b.2.2), applies, with $\pi = 1 - P^w_1$, $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $i \neq 1$. Hence, it follows that $p^x_1 = 0$ is a value consistent with Assumption 3 if $P^w_1 \le \lambda$. □

A.2.3. Proposition 5

Proof. (a) Suppose, without loss of generality, that X~ ¼ f1; 2; . . . ; hg, 2phoJ, and consider Prðx ¼ 1Þ. For
Pw
px1 ¼ 1 o1 to be admissible in the first equation of system (1.1), it must be that p ¼ 1  l and
1l

PJ x
j¼2 p1j pj ¼ 0. Since pjj ¼ p, 8j 2 X~ , the second equation of the system becomes:

Pw1 XJ
p21 þ ð1  lÞpx2 þ p2j pxj ¼ Pw2 ,
1l j¼3

PJ x Pw1 PJ Pw1
where j¼3 pj ¼1  px2 . Let x
j¼3 p2j pj ¼ p̄ð1   px2 Þ, where p̄ 2 ½0; l, since the constraints
1l 1l
pij p1  ppl, 8iaj 2 X~ , and plk pl, 8lak 2 X nX~ , allow for p1j ¼ 0 or p1j ¼ l, 8j ¼ 2; . . . ; J. It follows that

Pw1
Pw2  p̄  ðp21  p̄Þ
px2 ¼ 1  l.
1  l  p̄
 
x Pw1
Notice that p2 must lie in 0; 1  . I need to distinguish three cases.
1l

1. 1  l  p̄40. Then
Pw1
Pw2  p̄  ðp21  p̄Þ
1  l X0 () p pp̄ þ ðPw  p̄Þ ð1  lÞ ,
21 2
1  l  p̄ Pw1
Pw
and one can always find values of p21 ; p̄ 2 ½0; l for which this inequality is satisfied. For px2 p1  1 it
1l
must be that

Pw1
Pw2  p̄  ðp21  p̄Þ w w w
1  l p1  P1 () p X l  1 þ P1 þ P2 ð1  lÞ.
21 w
1  l  p̄ 1l P1

As long as there exist values of p21 pl that satisfy the above inequality, the upper bound in (3.2) is
admissible. However,

l  1 þ Pw1 þ Pw2 l
ð1  lÞ4l () Pw1 þ Pw2 4ð1  lÞ þ Pw1 .
Pw1 1l

Hence, the upper bound in (3.2) can be rejected if

l
Pw1 þ Pw2 4ð1  lÞ þ Pw1 . (A.4)
1l

Pw1 þ Pw2  ð1  lÞ
2. 1  l  p̄ ¼ 0. Then p21 ¼ ð1  lÞ. Hence, the upper bound in (3.2) can be rejected if
Pw1
condition (A.4) is satisfied.
3. 1  l  p̄o0. Then
Pw1
Pw2  p̄  ðp21  p̄Þ
1  l X0 () p Xp̄ þ ðPw  p̄Þ ð1  lÞ .
21 2
1  l  p̄ Pw1

As long as there exist values of p21 pl that satisfy the above inequality, the upper bound in (3.2) is admissible.
However,
ð1  lÞ Pw ðl  p̄Þ
p̄ þ ðPw2  p̄Þ w 4l () Pw2 4p̄ þ 1 .
P1 1l

Hence, given that by assumption pij pl, 8iaj, i; j 2 X , the upper bound in (3.2) can be rejected if Pw2 4l. For
Pw
px2 p1  1 it must be that
1l
Pw1
Pw2  p̄  ðp21  p̄Þ w w w
1  l p1  P1 () p p l  1 þ P1 þ P2 ð1  lÞ.
21
1  l  p̄ 1l Pw1

As long as there exist values of p21 X0 that satisfy the above inequality, the upper bound in (3.2) is admissible.
However,
l  1 þ Pw1 þ Pw2
ð1  lÞo0 () Pw1 þ Pw2 oð1  lÞ.
Pw1

Hence, the upper bound in (3.2) can be rejected if one of the following holds: (i) Pw2 4l, or (ii)
Pw1 þ Pw2 oð1  lÞ.
Finally, notice that
8
>
> if lp 12 ; ð1  l  pij Þ40; 8iaj; i; j 2 X ;
>
> 8 8
>
> > < w w l
>
> > w
>
> > Pw 4l¼) P1 þ P2 4ð1  lÞ þ P1 1  l;
>
>
>
< >
> 2
: Pw þ Pw 4ð1  lÞ
>
< 1 2
>
>
1
if l4 2 ; 8 :
>
>
>
>
>
> < Pw þ Pw oð1  lÞ þ Pw l
>
> >
> Pw1 þ Pw2 oð1  lÞ¼) 1 2 1
1l
>
> >
>
>
> >
: : P2 ol
w
:

When lp 12, condition (A.4) is necessary and sufficient to define the cases in which the upper bound in (3.2) is
not feasible. When l4 12, it can still be the case that ð1  l  p̄Þ40 (but it does not need to be). If Pw2 4l, (A.4)
is implied, and the upper bound in (3.2) is not feasible. If Pw1 þ Pw2 oð1  lÞ, then condition (A.4) is not
satisfied, and if ð1  l  p̄Þ40, the upper bound in (3.2) can be feasible. Hence, when lX 12, Pw2 4l is a
sufficient condition for the upper bound in (3.2) to be not feasible.
Notice now that repeating the same argument for each of equations 3 to h in system (3.3), and solving each
one of them, respectively, for px3 ; px4 ; . . . ; pxh , implies by a symmetry argument that if lp 12, the upper bound in
(3.2) can be rejected if and only if
l
Pw1 þ Pwj 4ð1  lÞ þ Pw1 some j 2 X~ nf1g,
1l
while if l4 12, the upper bound in (3.2) can be rejected if

Pwj 4l some j 2 X~ nf1g.

Equations h þ 1 to J in system (3.3) do not imply any additional conditions under which the upper bound in
(3.2) is not feasible. Indeed, let k 2 X nX~ ; then
Pw1 X
p21 þ pkk pxk þ pkj pxj ¼ Pwk .
1l j2X nf2;kg
P Pw1
Let pkk ¼ 1, and, by the same argument as above, let j2X nf2;kg pkj pxj ¼ p̄ð1   pxk Þ, where p̄ must lie in
1l
½0; l. Then
 
w Pw1 Pw1
Pk  p21  p̄ 1 
1l 1l
pxk ¼ ,
1  p̄

where 1  p̄X1  l40. It is straightforward to verify that there are values of p21 ; p̄ 2 ½0; l for which
Pw Pw Pw
pxk 2 0; 1  1 . For example, if Pwk p1  1 , let p̄ ¼ p21 ¼ 0, so that pxk ¼ Pwk . If Pwk 41  1 and
1l 1l 1l
Pwk  l Pw1
w
Pk 4l, let p̄ ¼ p21 ¼ l, so that pk ¼x
p1  .
1l 1l
(b) Suppose that $P^w_1 > 1 - \lambda$, and that $p^x_1 = 1$. Then $p^x_j = 0$ $\forall j = 2, \dots, J$. Then pick $\pi = P^w_1$ (notice that $P^w_1 > 1 - \lambda$, hence the proposed value of $\pi$ is admissible), and $\pi_{j1} = P^w_j$ $\forall j = 2, 3, \dots, J$. Since $P^w_1 > 1 - \lambda$, it follows that $P^w_j < \lambda$ $\forall j = 2, 3, \dots, J$, hence the proposed values of $\pi_{j1}$, $\forall j = 2, 3, \dots, J$, are admissible, and therefore $p^x_1 = 1$ is admissible, and hence it is the upper bound. □

A.2.4. Proposition 6
Proof. (a) Lower bound.
Suppose that $j > 1$, and without loss of generality consider $\Pr(x = 2)$. By Lemma 1, for $p^x_2 = \frac{P^w_2 - \lambda}{1 - \lambda} > 0$ to solve the second equation of system (1.1), it must be that $\pi_{22} = 1$, and either $\pi_{2i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{2\}$, with $\sum_{i \neq 2} \pi_{2i} p^x_i = \lambda \frac{1 - P^w_2}{1 - \lambda}$. Since $\pi_{22} \le \pi_{11}$ by assumption, and $\pi_{22} = 1$, it follows that $\pi_{11} = 1$; hence, the first equation of system (1.1) reduces to $\sum_{i=3}^{J} \pi_{1i} p^x_i = P^w_1$. However, for each $i \in X \setminus \{1, 2\}$, if $\pi_{2i} = \lambda$, it follows that $\pi_{1i} = 0$, since $\sum_{k \neq l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if $\pi_{2i} < \lambda$, for the second equation in system (1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^{J} \pi_{1i} p^x_i = 0$. Therefore, since $P^w_1 > 0$, the lower bound in (3.2) is not feasible for $\Pr(x = 2)$. Notice now that repeating the same argument for $\Pr(x = j)$, $j \ge 3$, implies that $\Pr(x = j)$ cannot achieve the lower bound in (3.2).
Consider now $\Pr(x = 1)$, and let $\pi_{11} = 1$, and $\pi_{1i} = \lambda$, $\forall i \in X \setminus \{1\}$. Then the first equation of system (1.1) is satisfied. Let $p^x_j = \frac{P^w_j}{1 - \lambda}$ and $\pi_{jj} = 1 - \lambda$ for each $j \in X \setminus \{1\}$. It is obvious that the suggested $\Pi \in H^4[\Pi^*]$, and the implied $p^x_j$ solves system (1.1).
(b) Upper bound.
First, let $j = 1$, and $P^w_1 < (1 - \lambda)$. Then, as shown in the proof of Proposition 5, for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ it must be that $\pi_{11} = 1 - \lambda$ and $\sum_{i=2}^{J} \pi_{1i} p^x_i = 0$. But by Assumption 4, $\pi_{11} \ge \pi_{22} \ge \cdots \ge \pi_{JJ} \ge 1 - \lambda$, and therefore for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ to solve the first equation of system (1.1) it must be that $\pi_{jj} = 1 - \lambda$, $\forall j \in X$, and I am back to the case of constant probability of correct report, with $\tilde{X} = X$. Now let $j > 1$, and $P^w_j < (1 - \lambda)$. Then, again, for $p^x_j = \frac{P^w_j}{1 - \lambda}$ it must be that $\pi_{jj} = 1 - \lambda$ and $\sum_{i \neq j} \pi_{ji} p^x_i = 0$. But by Assumption 4, $\pi_{jj} \ge \pi_{(j+1)(j+1)} \ge \cdots \ge \pi_{JJ} \ge 1 - \lambda$, and therefore it must be that $\pi_{kk} = 1 - \lambda$, $\forall k \in \{j, j+1, \dots, J\}$, and I am back to the case of constant probability of correct report, with $\tilde{X} = \{j, j+1, \dots, J\}$. The result of Proposition 5 applies. □

A.2.5. Proposition 7
 
Proof. With dichotomous variables, $p^x_1(\pi) = \frac{P^w_1 - (1 - \pi)}{\pi - (1 - \pi)} = \frac{1}{2}\left(\frac{2P^w_1 - 1}{2\pi - 1} + 1\right)$, $\pi \in H^3[\Pi^*]$. Hence,
1. If $\lambda < \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \le 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for $\pi = 1$ and the upper bound for $\pi = \max(1 - \lambda, P^w_1)$.
2. If $\lambda \ge \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \Longrightarrow \pi \ge P^w_1 \ge \frac{1}{2}$; or (b) $\pi \le P^w_1 \le 1 - \pi \Longrightarrow \pi \le 1 - P^w_1 < \frac{1}{2}$; additionally, I need $\pi \ge 1 - \lambda$. Hence, the feasible values of $\pi$ are given by $\pi \in [1 - \lambda, 1 - P^w_1] \cup [P^w_1, 1]$. Notice that if $\lambda < P^w_1$, the feasible values of $\pi$ are given by $\pi \in [P^w_1, 1]$, and $p^x_1$ is decreasing in $\pi$; therefore the lower bound is achieved for $\pi = 1$ and the upper bound for $\pi = P^w_1$. When $\lambda > P^w_1$, for values of $\pi \in [P^w_1, 1]$ the previous result applies. For values of $\pi \in [1 - \lambda, 1 - P^w_1]$, $p^x_1$ is decreasing in $\pi$; therefore the upper bound is achieved for $\pi = 1 - \lambda$ and the lower bound for $\pi = 1 - P^w_1$.
3. If $\lambda < \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \ge 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for $\pi = 1 - \min(\lambda, P^w_1)$ and the upper bound for $\pi = 1$.
4. If $\lambda \ge \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \Longrightarrow \pi \ge 1 - P^w_1 > \frac{1}{2}$; or (b) $\pi \le P^w_1 \le 1 - \pi \Longrightarrow \pi \le P^w_1 < \frac{1}{2}$; additionally, I need $\pi \ge 1 - \lambda$. Hence, the feasible values of $\pi$ are given by $\pi \in [1 - \lambda, P^w_1] \cup [1 - P^w_1, 1]$. Notice that if $1 - \lambda > P^w_1$, the feasible values of $\pi$ are given by $\pi \in [1 - P^w_1, 1]$, and $p^x_1$ is increasing in $\pi$; therefore the lower bound is achieved for $\pi = 1 - P^w_1$ and the upper bound for $\pi = 1$. When $1 - \lambda < P^w_1$, for values of $\pi \in [1 - P^w_1, 1]$ the previous result applies. For values of $\pi \in [1 - \lambda, P^w_1]$, $p^x_1$ is increasing in $\pi$; therefore the upper bound is achieved for $\pi = P^w_1$ and the lower bound for $\pi = 1 - \lambda$.

It is easy to verify that these bounds are a subset of those in (3.2). □
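Cases 1 and 3 can be checked by brute force. The sketch below is an illustration under assumptions: the function names `px1` and `grid_bounds`, the grid size, and the numerical inputs are hypothetical. It also assumes that (3.2) denotes the interval $\left[\max\{(P^w_1 - \lambda)/(1 - \lambda), 0\}, \min\{1, P^w_1/(1 - \lambda)\}\right]$ obtained under Assumption 2, which is not restated in this appendix. The code scans $\pi \in [1 - \lambda, 1]$ on an exact rational grid, keeps the values of $p^x_1(\pi)$ that are valid probabilities, and compares the extremes with the closed-form expressions above.

```python
from fractions import Fraction as F

def px1(Pw1, pi):
    """p_1^x as a function of the common probability of correct report pi."""
    return (Pw1 - (1 - pi)) / (2 * pi - 1)

def grid_bounds(Pw1, lam, n=200):
    """Min and max of px1 over an exact grid of pi in [1 - lam, 1].

    Grid points where pi = 1/2 (undefined) or where px1 falls outside
    [0, 1] are skipped.
    """
    values = []
    for k in range(n + 1):
        pi = 1 - lam + lam * F(k, n)
        if 2 * pi == 1:
            continue
        p = px1(Pw1, pi)
        if 0 <= p <= 1:
            values.append(p)
    return min(values), max(values)

lam = F(1, 5)

# Case 1 (lam < 1/2, Pw1 >= 1/2): lower bound at pi = 1, upper at max(1 - lam, Pw1).
lo1, hi1 = grid_bounds(F(3, 5), lam)
closed1 = (px1(F(3, 5), F(1)), px1(F(3, 5), max(1 - lam, F(3, 5))))

# Case 3 (lam < 1/2, Pw1 < 1/2): lower bound at pi = 1 - min(lam, Pw1), upper at pi = 1.
lo3, hi3 = grid_bounds(F(3, 10), lam)
closed3 = (px1(F(3, 10), 1 - min(lam, F(3, 10))), px1(F(3, 10), F(1)))

# Assumed form of the bounds in (3.2), evaluated for case 1.
lo32 = max((F(3, 5) - lam) / (1 - lam), F(0))
hi32 = min(F(1), F(3, 5) / (1 - lam))
```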

A.2.6. Proposition 8
Proof. In this case, $p^x_1(\pi) = \frac{P^w_1 - (1 - \pi_{22})}{\pi_{11} - (1 - \pi_{22})}$, $(\pi_{11}, \pi_{22}) \in H^4[\Pi^*]$. Hence,
1. If $\lambda < \frac{1}{2}$, $1 - \pi_{22} \le P^w_1 \le \pi_{11}$, and $p^x_1(\pi)$ is increasing in $\pi_{22}$ and decreasing in $\pi_{11}$. Hence the lower bound is achieved for $\pi_{22} = 1 - \lambda$ and $\pi_{11} = 1$. The upper bound is achieved with $\pi_{22} = \pi_{11}$, since $\pi_{11}$ bounds $\pi_{22}$ from above. Hence if $P^w_1 \ge \frac{1}{2}$, the upper bound is achieved for $\pi_{11} = \pi_{22} = \max(1 - \lambda, P^w_1)$. If $P^w_1 < \frac{1}{2}$, the upper bound is achieved for $\pi_{11} = \pi_{22} = 1$.
2. If $\lambda \ge \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, either $1 - \pi_{22} \le P^w_1 \le \pi_{11}$ or $1 - \pi_{22} \ge P^w_1 \ge \pi_{11}$. Hence, either $\pi_{11} \in [1 - P^w_1, 1]$ and $\pi_{22} \in [1 - P^w_1, \pi_{11}]$, or $\pi_{11} \in [1 - \lambda, P^w_1]$ and $\pi_{22} \in [1 - \lambda, \pi_{11}]$. In the first case $p^x_1$ is increasing in $\pi_{22}$ and decreasing in $\pi_{11}$; the lower bound is achieved for $\pi_{11} = 1$, $\pi_{22} = 1 - P^w_1$. The upper bound is achieved with $\pi_{22} = \pi_{11} = 1$. In the second case $p^x_1$ is decreasing in $\pi_{22}$ and increasing in $\pi_{11}$; the lower bound is achieved with $\pi_{22} = \pi_{11} = 1 - \lambda$. The upper bound is achieved with $\pi_{11} = P^w_1$ and $\pi_{22} = 1 - \lambda$.
3. If $\lambda \ge \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, consider the following two cases. If $\lambda > P^w_1$ then $\pi_{11} = \pi_{22} = 1 - P^w_1$ are admissible values, and the implied $p^x_1 = 0$. Also, $\pi_{11} = P^w_1$ is an admissible value, and the implied $p^x_1 = 1$. If $\lambda < P^w_1$ then $\pi_{11} \in [P^w_1, 1]$, $\pi_{22} \in [1 - \lambda, \pi_{11}]$ and $1 - \pi_{22} \le P^w_1 \le \pi_{11}$. Then $p^x_1$ is decreasing in $\pi_{11}$ and increasing in $\pi_{22}$. Hence the lower bound is achieved for $\pi_{11} = 1$ and $\pi_{22} = 1 - \lambda$, and the upper bound is achieved with $\pi_{22} = \pi_{11} = P^w_1$. □

References

Abrevaya, J., Hausman, J.A., 1999. Semiparametric estimation with mismeasured dependent variables: an application to duration models
for unemployment spells. Annales d’Economie et de Statistique 55–56, 243–275.
Aigner, D.J., 1973. Regression with a binary independent variable subject to errors of observation. Journal of Econometrics 1, 49–60.
Beresteanu, A., Molinari, F., 2007. Asymptotic properties for a class of partially identified models. Econometrica, forthcoming.
Blundell, R., Gosling, A., Ichimura, H., Meghir, C., 2007. Changes in the distribution of male and female wages accounting for
employment composition using bounds. Econometrica 75, 323–363.
Bollinger, C.R., 1996. Bounding mean regressions when a binary regressor is mismeasured. Journal of Econometrics 73, 387–399.
Bound, J., Brown, C., Mathiowetz, N., 2001. Measurement error in survey data. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of
Econometrics, vol. 5. North-Holland, Elsevier Science, pp. 3705–3843.
Bross, I., 1954. Misclassification in 2 × 2 tables. Biometrics 10 (4), 478–486.
Campbell, S.L., Meyer, C.D., 1991. Generalized Inverses of Linear Transformations. Dover Publications, Inc., New York.
Card, D., 1996. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica 64 (4), 957–979.
Chernozhukov, V., Hong, H., Tamer, E., 2004. Inference on parameter sets in econometric models. Discussion paper, MIT, Duke and
Northwestern University.
Chernozhukov, V., Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets in econometric models.
Econometrica 75, 1243–1284.
Ciliberto, F., Tamer, E., 2004. Market structure and multiple equilibria in airline markets, Discussion paper, University of Virginia and
Northwestern University.

Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London, UK.
Dominitz, J., Sherman, R.P., 2006. Identification and estimation of bounds on school performance measures: a nonparametric analysis of
a mixture model with verification. Journal of Applied Econometrics 21, 1295–1326.
Dustmann, C., van Soest, A., 2000. Parametric and semiparametric estimation in models with misclassified dependent variables. IZA
Discussion Paper 218.
Gong, G., Whittemore, A.S., Grosser, S., 1990. Censored survival data with misclassified covariates: a case study of breast cancer
mortality. Journal of the American Statistical Association 85 (409), 20–28.
Gustman, A.L., Steinmeier, T.L., 2001. What people don’t know about their pension and social security. In: Gale, W.G., Shoven, J.B.,
Warshawsky, M.J. (Eds.), Public Policies and Private Pensions. Brookings Institution, Washington D.C.
Gustman, A.L., Mitchell, O.S., Samwick, A.A., Steinmeier, T.L., 2000. Evaluating pension entitlements. In: Mitchell, O.S., Hammond,
P.B., Rappaport, A.M. (Eds.), Forecasting Retirement Needs and Retirement Wealth. University of Pennsylvania.
Hampel, F.R., 1974. The influence curve and its role in robust estimation. Journal of the American Statistical Association 69 (346),
383–393.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions.
Wiley, New York.
Hausman, J., Abrevaya, J., Scott-Morton, F.M., 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of
Econometrics 87, 239–269.
Honore, B.E., Lleras-Muney, A., 2006. Bounds in competing risks models and the war on cancer. Econometrica 74, 1675–1698.
Honore, B.E., Tamer, E., 2006. Bounds on parameters in panel dynamic discrete choice models. Econometrica 74, 611–629.
Horn, R.A., Johnson, C.R., 1999. Matrix Analysis. Cambridge University Press, New York.
Horowitz, J.L., Manski, C.F., 1995. Identification and robustness with contaminated and corrupted data. Econometrica 63 (2), 281–302.
Horowitz, J.L., Manski, C.F., 1998. Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using
weights and imputations. Journal of Econometrics 84, 37–58.
Horowitz, J.L., Manski, C.F., 2000. Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal
of the American Statistical Association 95 (449), 77–84.
Hotz, V.J., Mullin, C.H., Sanders, S.G., 1997. Bounding causal effects using data from a contaminated natural experiment: analyzing the
effects of teenage childbearing. Review of Economic Studies 64, 575–603.
Hu, Y., 2006. Bounding parameters in a linear regression model with a mismeasured regressor using additional information. Journal of
Econometrics 133, 51–70.
Imbens, G.W., Manski, C.F., 2004. Confidence intervals for partially identified parameters. Econometrica 72 (6), 1845–1857.
Kane, T.J., Rouse, C.E., Staiger, D., 1999. Estimating returns to schooling when schooling is misreported, NBER Working Paper 7235.
Klepper, S., 1988. Bounding the effects of measurement error in regressions involving dichotomous variables. Journal of Econometrics 37,
343–359.
Klepper, S., Leamer, E.E., 1984. Consistent sets of estimates for regressions with errors in all variables. Econometrica 52 (1), 163–183.
Kreider, B., Pepper, J., 2007. Inferring disability status from corrupt data. Journal of Applied Econometrics, forthcoming.
Lewbel, A., 2000. Identification of the binary choice model with misclassification. Econometric Theory 16, 603–609.
Mahajan, A., 2006. Identification and estimation of regression models with misclassification. Econometrica 74, 631–665.
Manski, C.F., 2003. Partial Identification of Probability Distributions. Springer Series in Statistics. Springer, New York.
Manski, C.F., Tamer, E., 2002. Inference on regressions with interval data on a regressor or outcome. Econometrica 70 (2), 519–546.
Mellow, W., Sider, H., 1983. Accuracy of response in labor market surveys: evidence and implications. Journal of Labor Economics 1 (4),
331–344.
Molinari, F., 2003. Contaminated, corrupted, and missing data, Ph.D. Thesis, Northwestern University, available at
<http://www.arts.cornell.edu/econ/fmolinari/dissertation.pdf>.
Moore, J.C., Marquis, K.H., Bogen, K., 1996. The SIPP Cognitive Research Evaluation Experiment: Basic Results and Documentation,
Unpublished Report, U.S. Bureau of the Census.
Munkres, J.R., 1991. Analysis on Manifolds. Addison-Wesley, Reading, MA.
Poterba, J.M., Summers, L.H., 1995. Unemployment benefits and labor market transitions: a multinomial logit model with errors in
classification. The Review of Economics and Statistics 77 (2), 201–216.
Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics
106, 171–201.
Rao, C.R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton, New Jersey.
Swartz, T., Haitovsky, Y., Vexler, A., Yang, T., 2004. Bayesian identifiability and misclassification in multinomial data. Canadian Journal
of Statistics 32, 285–302.
