Abstract
This paper addresses the problem of data errors in discrete variables. When data errors occur, the observed variable is a
misclassified version of the variable of interest, whose distribution is not identified. Inferential problems caused by data
errors have been conceptualized through convolution and mixture models. This paper introduces the direct misclassification
approach. The approach is based on the observation that in the presence of classification errors, the relation between the
distribution of the ‘true’ but unobservable variable and its misclassified representation is given by a linear system of
simultaneous equations, in which the coefficient matrix is the matrix of misclassification probabilities. Formalizing the
problem in these terms allows one to incorporate any prior information into the analysis through sets of restrictions on the
matrix of misclassification probabilities. Such information can have strong identifying power. The direct misclassification
approach fully exploits it to derive identification regions for any real functional of the distribution of interest. A method
for estimating the identification regions and constructing their confidence sets is given, and illustrated with an empirical
analysis of the distribution of pension plan types using data from the Health and Retirement Study.
© 2007 Elsevier B.V. All rights reserved.
1. Introduction
Error-ridden data constitute a significant problem in nearly all fields of science. There are many possible
sources of data errors. Examples include use of inexact measures because of high costs or infeasibility of exact
evaluation, tendency of study subjects to underreport socially undesirable behaviors and attitudes and
overreport socially desirable ones, or imperfect recall (or lack of knowledge) by study subjects. When data
errors are present, often the sampling process does not identify the probability distribution of interest, and
inference is impaired.
This paper addresses the problem of data errors in discrete variables. Interest in the question emerges from
the observation that much of the empirical work in economics and related fields is based on the analysis of
Tel.: +1 6072556367; fax: +1 6072552818.
E-mail address: fm72@cornell.edu
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2007.12.003
82 F. Molinari / Journal of Econometrics 144 (2008) 81–117
survey data. The reliability of these data is well documented to be less than perfect (see for example Bound
et al., 2001). Although survey questions may gather information on variables that are conceptualized as
continuous (e.g., age, earnings, etc.), a considerable part of the collected data is in the form of variables taking
values in finite sets. Examples include educational attainment, language proficiency, workers’ union status,
employment status, health conditions, and health/functional status.
When data errors occur in variables of this type, it is natural to think about the problem in terms of
classification errors (see for example Bross, 1954; Aigner, 1973). An example may clarify this point. Suppose
that an analyst is interested in learning the distribution of pension plan types in the American population.
Three types are possible: defined benefit (DB), defined contribution (DC), and plans incorporating features of
both. Suppose that the analyst has data from a nationally representative survey which queried a random
sample of American households about their pension plans’ characteristics. Validation studies document that a
significant fraction of the reported plan types differ from the truth; for example, some people who truly have a
DB plan are erroneously classified as having a DC plan (Gustman and Steinmeier, 2001).
To formalize the problem, suppose that each member $l$ of a population $L$ is characterized by the vector
$(w_l, x_l) \in X \times X$, where $X$ is a discrete set, not necessarily ordered, denoted by $X \equiv \{1, 2, \ldots, J\}$, $2 \le J < \infty$.
Let a sampling process draw persons at random from $L$. Suppose that the analyst is interested in learning
features of the distribution $P(x)$ from the available data. However, she does not observe realizations of $x$, but
observes realizations of $w$, which can either equal or differ from the realizations of $x$. In the above example,
$x$ denotes the true pension plan type and $w$ the type reported in the survey.
Much of the existing literature on drawing inference in the presence of error-ridden data has conceptualized the
problem using either convolution models or mixture models. In the case of convolution models, a latent variable
$v \in V$ is introduced and $w$ is assumed to measure $x$ with chronic (i.e., affecting each observation) ‘errors-in-variables’:
$w = x + v$. Researchers using convolution models commonly assume that the latent variable $v$ is
statistically independent from $x$ or uncorrelated with $x$ and has mean zero (see, e.g., Klepper and Leamer,
1984). In the case of mixture models, latent variables $v \in V$ and $z \in \{0, 1\}$ are introduced and $w$ is viewed as a
contaminated version of $x$, generated by the mixture $w = zx + (1 - z)v$. In this model, $z$ denotes whether $x$ or $v$
is observed and realizations of $w$ with $z = 1$ are said to be error free. Researchers using mixture models
commonly assume that the error probability $\Pr(z = 0)$ is known or at least that it can be bounded non-trivially
from above (see, e.g., Horowitz and Manski, 1995).
When a variable with finite support is imperfectly classified, it is widely recognized that the assumption,
typical in convolution models, of independence between measurement error and true variable cannot hold
(see, for example, Bound et al., 2001, p. 3735). Moreover, compelling evidence from validation studies suggests
that errors in the data are occasional rather than ‘chronic’: a significant part of the observed data are error
free. Mixture models seem therefore more suited for the analysis of such data. However, often the researcher
has prior information on the nature of the misclassification pattern that has transformed x into w. This
information may aid in identification, but cannot easily be exploited through a mixture model.
In this paper I propose an alternative framework, which I call the direct misclassification approach, to draw
inference on the distribution of discrete variables subject to classification errors. The approach does not rely
on the introduction of latent variables, but is based on the observation that in the presence of misclassification,
the relation between the observable distribution of w and the unobservable distribution of x is given by
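\[
\begin{bmatrix} \Pr(w = 1) \\ \vdots \\ \Pr(w = J) \end{bmatrix}
=
\begin{bmatrix}
\Pr(w = 1 \mid x = 1) & \cdots & \Pr(w = 1 \mid x = J) \\
\vdots & \ddots & \vdots \\
\Pr(w = J \mid x = 1) & \cdots & \Pr(w = J \mid x = J)
\end{bmatrix}
\begin{bmatrix} \Pr(x = 1) \\ \vdots \\ \Pr(x = J) \end{bmatrix}.
\tag{1.1}
\]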
In all that follows I denote by $\Pi^*$ the matrix of elements $\{\Pr(w = i \mid x = j)\}_{i,j \in X}$ which appears on the right-hand
side of the above equation. For $i \ne j$, $\Pr(w = i \mid x = j)$ is generally referred to as ‘misclassification
probability.’ Eq. (1.1) is a simple formalism and does not have content per se. However, it becomes potentially
informative when combined with assumptions on the matrix of misclassification probabilities $\Pi^*$; such
assumptions generate a misclassification model.
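As a purely numerical illustration of Eq. (1.1), the following minimal sketch maps a distribution of the true variable into the distribution of the reported variable. The 3 x 3 matrix and the true distribution are invented for illustration and are not taken from any validation study; the function name `observed_distribution` is likewise an assumption of this sketch.

```python
def observed_distribution(pi_star, p_x):
    """Return P^w = Pi* P^x, where pi_star[i][j] stands for Pr(w = i+1 | x = j+1)."""
    J = len(p_x)
    return [sum(pi_star[i][j] * p_x[j] for j in range(J)) for i in range(J)]


# Hypothetical misclassification matrix: each COLUMN is a probability mass
# function over reported values, conditional on one true value.
pi_star = [
    [0.90, 0.10, 0.05],
    [0.05, 0.80, 0.05],
    [0.05, 0.10, 0.90],
]
p_x = [0.5, 0.3, 0.2]  # hypothetical distribution of the true variable

p_w = observed_distribution(pi_star, p_x)
print(p_w)  # distribution of the reported variable implied by Eq. (1.1)
```

Only `p_w` would be observable in practice; the inferential problem studied here runs in the opposite direction, from `p_w` back to `p_x`.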
The method that I introduce allows one to draw inference on $P(x)$ and on any real functional of this
distribution using Eq. (1.1) directly, when restrictions on the elements of $\Pi^*$ are imposed. Due to the
classification errors, the identification of the probability distribution $P(x)$ is partial and the inference on any of
its real functionals is in the form of identification regions, that is, sets collecting the feasible values of such
functionals. I show that these regions are ‘sharp,’ in the sense that they exhaust all the available information,
given the sampling process and the maintained assumptions. Manski (2003) gives an overview of the literature
on partial identification; for other work see, e.g., Hotz et al. (1997) and Blundell et al. (2007).
The restrictions imposed on $\Pi^*$ can have several origins, including validation studies, economic theory,
cognitive and social psychology, or information on the circumstances under which the data have been
collected. In this paper I study their identifying power in general. I then consider a few specific examples. As a
starting point, I assume that the researcher has a known lower bound on the probability that the realizations
of $w$ and $x$ coincide, i.e., $\Pr(w = x) \ge 1 - \lambda$, or, strengthening this assumption, that the researcher has a known
lower bound on the probability of correct report for each value that $x$ can take, i.e., $\Pr(w = j \mid x = j) \ge 1 - \lambda$,
$\forall j \in X$. This information is often provided by validation studies or knowledge of the circumstances under
which the data have been collected.¹ In this paper it is regarded as ‘base-case’ information, and the
identification regions derived under these assumptions constitute the baseline of the analysis. Then I consider
the case of ‘constant probability of correct report’ and the case of ‘monotonicity in correct reporting.’ I show
that these assumptions can have identifying power when maintained alone, as well as when imposed jointly
with the base case assumptions.
The assumption of constant probability of correct report is motivated by the findings of validation studies.
For specific survey inquiries, these studies suggest that the probability of correct report, for at least a subset of
the values that x can take, is constant. For example, in the context of self-reports of employment status,
Poterba and Summers’ (1995) analysis suggests that there is approximately the same probability of correct
report for people who are employed and for those who are not in the labor force, but a much lower probability
of correct report for people who are unemployed.
The assumption of monotonicity in correct reporting is motivated by social psychology, which suggests that
when survey respondents are asked questions relative to socially and personally sensitive topics, they tend to
underreport socially undesirable behaviors and attitudes, and overreport socially desirable ones. This
suggestion is supported by validation studies, which often document, within a given survey inquiry, that the
probability of correct report of a certain alternative is greater than or equal to the probability of correct report
of a less socially desirable alternative. This is the case, for example, when survey respondents are asked about
their participation in welfare programs.
The proposed method allows the researcher to easily incorporate these assumptions, and in general any
restriction on the misclassification pattern, into the analysis. The method is easy to implement and often
computationally tractable (see Section 2.2 for a discussion of computational issues). Despite the fact that the
results of validation studies on discrete variables are often presented in the form of matrices of
misclassification probabilities (see, e.g., Bound et al., 2001), and the appeal of the simple formalization
given by the misclassification models, there appear to be no precedents to the direct use of Eq. (1.1) to deal
with the identification problems caused by classification errors.
However, there are precedents to the use of specific restrictions on misclassification probabilities. Aigner
(1973), Klepper (1988) and Bollinger (1996) imposed different sets of assumptions on the probabilities of
misclassifying a dichotomous variable x and derived sharp non-parametric bounds on the mean regression
$E(y \mid x)$. Their approach is close in spirit to the one in this paper, but their methods are designed exclusively for
binary variables and for the case in which specific assumptions hold. Swartz et al. (2004) discuss identification
problems due to misclassification from a Bayesian perspective. In particular, they focus on ‘permutation-type
non-identifiability,’ by which switching the positions of $\Pr(x = i)$ and $\Pr(x = j)$, and those of $P(w \mid x = i)$ and
$P(w \mid x = j)$, leaves the implied distribution $P(w)$ unchanged. They introduce several assumptions on the
matrix of misclassification probabilities which overcome this type of problem, and achieve point identification
by imposing a prior on the misclassification matrix and on $P(x)$.
¹ Availability of a lower bound on the error probability is a commonplace assumption in the statistical literature on robust estimation,
which makes use of mixture models. For example, Hampel (1974) and Hampel et al. (1986) state that ‘the proportion of gross errors in
data, depending on circumstances, is normally between 0.1% and 10% with several percent being the rule rather than the exception’
(p. 387 and p. 28, respectively).
Most of the related literature in econometrics (e.g., Card, 1996; Hausman et al., 1998; Abrevaya and
Hausman, 1999; Lewbel, 2000; Dustmann and van Soest, 2000; Kane et al., 1999; Ramalho, 2002) proposes
methods imposing restrictions on misclassification probabilities to achieve parametric or semiparametric
identification of the quantities of interest (i.e., features of $P(y \mid x)$, or, less often, $P(x)$).² As such, these methods
are open to criticism for possible misspecification; moreover, while the assumptions employed might
hold in some data sets, there might be other data sets for which they do not hold, and in that case the methods
cannot be applied. Additionally, often these assumptions are maintained for technical reasons and do not have
an obvious interpretation.
Horowitz and Manski (1995, HM henceforth) introduced fully non-parametric methods to draw inference
on features of the distribution of a random variable x when the sampling process is corrupted or
contaminated. They adopted a mixture model and showed that if the researcher has a (non-trivial) lower
bound $1 - \lambda$ on the probability that the realization of $w$ is drawn from the distribution of $x$, informative
bounds can be obtained on any parameter of the distribution $P(x)$ that respects stochastic dominance. HM
showed that these bounds are sharp, in the sense that they exhaust all the available information, given the
sampling process and the maintained assumptions. The assumptions they entertain imply the base case
assumptions on $\Pi^*$ introduced above, namely $\Pr(w = x) \ge 1 - \lambda$, and $\Pr(w = j \mid x = j) \ge 1 - \lambda$, $\forall j \in X$.³ When
only these assumptions are maintained, in terms of identification of the types of parameters considered by
HM, the method developed in this paper is equivalent to the one they proposed.
However, often different, and perhaps more, information is available to the applied researcher beyond that
maintained by HM. This information can have strong identifying power, but cannot be easily used within a
mixture model. In particular, for each additional assumption that the researcher wants to bring to bear, she
needs to derive new sharp identification regions for the parameters of interest. Closed form results are often
not easy to obtain and different (possibly computationally challenging) calculation methods for the bounds
and confidence sets may need to be devised for each different set of assumptions.
The direct misclassification approach, on the other hand, does not rely on any specific set of
assumptions, but can incorporate any prior information on the misreporting pattern into the analysis. For
any set of maintained assumptions, the method guarantees sharpness of the implied identification regions,
and these regions and their confidence sets can be estimated using a relatively simple method introduced
in Section 2.
In this paper I focus on a single misclassified variable x. The method easily extends to drawing inference on
features of the distribution of x conditional on a perfectly observed covariate, or on the joint distribution of
several misclassified variables, taking values in finite sets. Given an outcome variable of interest $y \in Y$, the
approach also extends to drawing inference on features of the distribution $P(y \mid x)$ when $x$ is subject to
classification errors. Moreover, it can allow one to draw inference when the data are not only error-ridden, but
also incomplete, a situation very common in practice. In fact, in the presence of both misclassified and missing
data, the matrix in Eq. (1.1) simply becomes rectangular rather than square, with additional rows giving the
probabilities of having missing data, conditional on the true values of $x$.
The paper is organized as follows. Section 2 introduces the method, describes connectedness properties of
the identification regions, outlines how the identification regions can be estimated consistently, and proposes a
procedure to calculate confidence sets for the identification regions. Section 3 studies the identifying power of
a few specific assumptions, some of which have not been previously considered in the literature. Section 4
illustrates the estimation method with an application to data on the distribution of pension plans’
characteristics in the American population. Section 5 discusses extensions of the direct misclassification
approach. Section 6 concludes. All of the mathematical details are in Appendix A.
² Specific restrictions include the following: Bross (1954), when introducing the misclassification problem for binary data, assumed that
$\Pr(w = 1 \mid x = 0)$ and $\Pr(w = 0 \mid x = 1)$ are of the same order of magnitude. Usually with binary data it is assumed either that
$\Pr(w = 1 \mid x = 0) = \Pr(w = 0 \mid x = 1) < \tfrac{1}{2}$ (e.g., Klepper, 1988; Card, 1996), or that $\Pr(w = 1 \mid x = 0) + \Pr(w = 0 \mid x = 1) < 1$ (e.g., Bollinger, 1996;
Hausman et al., 1998). When $J > 2$, it is assumed that other monotonicity restrictions between the elements of $\Pi^*$ hold (e.g., Abrevaya and
Hausman, 1999; Dustmann and van Soest, 2000), or that specific types of misclassification do not occur (Gong et al., 1990).
³ If the researcher has an upper bound $\lambda$ on the error probability, and the sampling process is corrupted, the first assumption follows; if
the sampling process is contaminated, the second assumption follows. These results are rigorously proved in Molinari (2003).
In all that follows, to keep the focus on identification, I treat identified quantities as population parameters,
and I assume that $\Pr(w = j) > 0$ $\forall j \in X$. A method to consistently estimate the identification regions and
construct their confidence sets is provided at the end of this section.
Let $P^w$ denote the column vector $[P^w_j, j \in X] \equiv [\Pr(w = j), j \in X]$, $P^x$ the column vector $[\Pr(x = j), j \in X]$,
and $\Pi^*$ the stochastic matrix which, through Eq. (1.1), generates the misclassification of $x$ into $w$. Denote the
elements of $\Pi^*$ by $\pi^*_{ij} \equiv \{\Pr(w = i \mid x = j)\}$, $i, j \in X$, and the columns of $\Pi^*$ by $\pi^*_j$. Let $\Psi_X$ denote the space of
all probability distributions on $X$ and define analogously $\Psi_{X \times W}$; let $\mathbb{R}$ denote the real line. Let $\tau : \Psi_X \to \mathbb{R}$ be
a real functional of $P(x)$, denoted $\tau[P^x]$, with analogous definitions for functionals of the joint distribution of
$(w, x)$. A particularly simple functional of $P(x)$ is $\tau[P^x] = E[1(x = j)] = \Pr(x = j)$, $j \in X$. For any given matrix
of functionals of interest $\Theta$, let $H[\Theta]$ denote its identification region.
Given this notation, I can rewrite Eq. (1.1) as
\[ P^w = \Pi^* P^x. \tag{2.1} \]
The direct misclassification approach starts from the observation that $\Pr(x = j)$, $j \in X$, enters each of the $J$
equations in system (1.1). Hence, each one of these equations can, potentially, imply restrictions on $\Pr(x = j)$,
and therefore on $P^x$ and $\tau[P^x]$. The extent to which this is the case crucially depends on what assumptions are
imposed on the misreporting pattern.
The approach is quite intuitive. If $\Pi^*$ were known, and of full rank, I would be able to solve the system of
linear equations in (2.1) and uniquely identify $P^x$, and therefore $\tau[P^x]$. In practice, the misclassification
probabilities $\pi^*_{ij}$, $i, j \in X$, are known only to belong to a set $H[\Pi^*]$, defined below. This set accounts both for
the restrictions coming from probability theory and for the restrictions on the misreporting pattern
coming from validation studies, social and cognitive psychology, economic theory, etc. Denote the elements of
$H[\Pi^*]$ by $\Pi \equiv \{\pi_{ij}\}_{i,j \in X}$, and the columns of this matrix by $\pi_j$, $j \in X$. When $H[\Pi^*]$ is not a singleton, $P^x$ is not
identified and $\tau[P^x]$ need not be identified, but only known, respectively, to lie in the identification regions
$H[P^x]$ and $H\{\tau[P^x]\}$.
The identification region $H[P^x]$ is defined as the set of column vectors $p^x = [p^x_k, k \in X]$, such that, given
$\Pi \in H[\Pi^*]$, $p^x$ solves system (2.1):
\[ H[P^x] = \{p^x : P^w = \Pi p^x,\ \Pi \in H[\Pi^*]\}. \tag{2.2} \]
In the next subsection, $H[\Pi^*]$ is formally defined and characterized in a way such that $\forall \Pi \in H[\Pi^*]$, $p^x_k \ge 0$,
$\forall k \in X$, and $\sum_{k=1}^{J} p^x_k = 1$.
Throughout this paper, the notation $p^x$ is reserved for elements of $H[P^x]$ and the notation $p^x_k$ for the $k$th
component of a vector $p^x$. Hence, $p^x_k$ and $p^x$ represent, respectively, feasible values of $\Pr(x = k)$, $k \in X$, and
$[\Pr(x = j), j \in X]$, given $\Pi \in H[\Pi^*]$ and Eq. (2.1). By construction
$p^x \equiv p^x(\Pi, P^w)$,
connected. This has implications for the estimation of the identification regions. Consider for example the case
that interest centers on a real valued functional $\tau[P^x]$. When $H\{\tau[P^x]\}$ is a connected set, it is given by the entire
interval between its smallest and its largest points. Hence by estimating these two points one obtains an
estimate of the entire identification region. When $H\{\tau[P^x]\}$ is disconnected, parts of the interval between the
smallest and the largest points are not feasible and therefore are not elements of the identification region.
Section 2.2 introduces a method to estimate $H\{\tau[P^x]\}$ when this is the case.
A relevant example of a case in which $p^x$ is a continuous function of $\Pi$ is obtained when each matrix
$\Pi \in H[\Pi^*]$ is of full rank. In this case, for each $\Pi \in H[\Pi^*]$, one can solve the linear system in (2.1), obtaining
$p^x = \Pi^{-1} P^w$. It is a well known result in matrix algebra that the inverse of a non-singular matrix is continuous
in the elements of the matrix (see, e.g., Campbell and Meyer, 1991, Chapter 10). A very simple condition
ensuring that each matrix $\Pi \in H[\Pi^*]$ is of full rank is that the probability of correct report is greater
than $\tfrac{1}{2}$ for each of the values that $x$ can take.⁴ Validation studies suggest that this requirement is often satisfied
in practice.⁵
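The full-rank case can be illustrated numerically. The sketch below is a minimal example, assuming a hypothetical 3 x 3 candidate matrix whose diagonal entries all exceed 1/2; it recovers the implied distribution of the true variable by solving the linear system in (2.1) with plain Gaussian elimination. The helper names `solve_linear_system` and `correct_report_above_half`, and all numbers, are assumptions of this sketch rather than part of the method.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def correct_report_above_half(pi):
    """Sufficient condition for full rank discussed in the text: pi_jj > 1/2 for all j."""
    return all(pi[j][j] > 0.5 for j in range(len(pi)))


# Hypothetical candidate matrix (columns sum to one) and observed distribution.
pi = [
    [0.90, 0.10, 0.05],
    [0.05, 0.80, 0.05],
    [0.05, 0.10, 0.90],
]
p_w = [0.49, 0.275, 0.235]

assert correct_report_above_half(pi)
p_x = solve_linear_system(pi, p_w)  # p^x = Pi^{-1} P^w
print(p_x)
```

Repeating this solve for every candidate matrix in the feasible set traces out the identification region; a single solve corresponds to a single element of that set.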
Notice that the set $H^P[\Pi^*]$ can be defined alternatively using the notions of $(J-1)$-dimensional simplex and
convex hull of a set of vectors. I use the following definitions:
Definition 1. The $(J-1)$-dimensional simplex is the set $\Delta^{J-1} \equiv \{\delta \in \mathbb{R}^J_+ : \delta_1 + \delta_2 + \cdots + \delta_J = 1\}$.
Definition 2. The convex hull of a finite subset $\{\nu_1, \nu_2, \ldots, \nu_J\}$ of $\mathbb{R}^J$, denoted $\mathrm{conv}\{\nu_1, \nu_2, \ldots, \nu_J\}$, consists of all
the vectors of the form $\alpha_1 \nu_1 + \alpha_2 \nu_2 + \cdots + \alpha_J \nu_J$ with $\alpha_i \ge 0$ $\forall i = 1, \ldots, J$ and $\sum_{i=1}^{J} \alpha_i = 1$ (Rockafellar, 1970,
Corollary 2.3.1).
By definition, $P^w \in \Delta^{J-1}$. The set $H^P[\Pi^*]$ can be rewritten as
\[ H^P[\Pi^*] \equiv \{\Pi : \pi_j \in \Delta^{J-1} \text{ and } p^x_j \ge 0\ \forall j \in X, \text{ and } P^w \in \mathrm{conv}\{\pi_1, \pi_2, \ldots, \pi_J\}\}. \tag{2.5} \]
In words, a matrix $\Pi$ is an element of $H^P[\Pi^*]$ if its columns are probability mass functions, the implied $p^x$ is a
probability mass function, and the vector $P^w$ can be expressed as a convex combination of the columns of $\Pi$.
This set of matrices contains also matrices that are not of full rank. Notably, it contains the matrix with each
column identical to $P^w$, denoted $\tilde{\Pi}$. This matrix plays an important role in Proposition 1 below.
To describe the geometry of $H^P[\Pi^*]$ I need to introduce another definition:
Definition 3. A subset $\Gamma$ of $\mathbb{R}^n$ is star convex with respect to $\psi_0 \in \Gamma$ if for each $\psi \in \Gamma$ the line segment joining $\psi$
and $\psi_0$ lies in $\Gamma$ (Munkres, 1991, p. 330).
Star convexity implies path-connectedness, which in turn implies connectedness. Given a set of matrices
$\mathbf{\Pi} \subset \mathbb{R}^{J \times J}$, define the line segment between two matrices $\Pi^1, \Pi^2 \in \mathbf{\Pi}$ as
\[ \Pi^\alpha = \alpha \Pi^1 + (1 - \alpha) \Pi^2, \quad \alpha \in [0, 1]. \]
⁴ If $\pi_{jj} > \tfrac{1}{2}$, $\forall j \in X$, $\forall \Pi \in H[\Pi^*]$, $\Pi^T$ is strictly diagonally dominant, and hence $\Pi$ is non-singular. An $n \times n$ matrix $A = \{a_{ij}\}$ is said to be
strictly diagonally dominant if, for $i = 1, 2, \ldots, n$, $|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ij}|$. A proof of the fact that if $A$ is strictly diagonally dominant, then $A$ is
non-singular, can be found in Horn and Johnson (1999, Theorem 6.1.10).
⁵ Among others, this is the case in the context of workers’ union status (see, e.g., Card, 1996), transfer program recipiency (see, e.g.,
Moore et al., 1996), employment status (see, e.g., Poterba and Summers, 1995), and 1- and 3-digit level classification of industry and
occupation (see, e.g., Mellow and Sider, 1983).
Then the set $\mathbf{\Pi}$ is convex if given any two matrices $\Pi^1, \Pi^2 \in \mathbf{\Pi}$, $\Pi^\alpha \in \mathbf{\Pi}$ for all $\alpha \in (0, 1)$. Connectedness of the
set $H^P[\Pi^*]$ is established in the following proposition:
The result in Proposition 1 implies that the set $H^P[\Pi^*]$ is not convex, because a convex set is star convex
with respect to each of its elements. The set $H^P[\Pi^*]$ is illustrated in Example 1 and in the first panel of Fig. 1.
Example 1. Suppose that $x$ and $w$ are binary, i.e., that $J = 2$, and let $P^w_1 = 0.3$. Then the matrix $\Pi$ is
determined by its two diagonal elements, $\pi_{11}$ and $\pi_{22}$, and
\[ p^x_1 \in [0, 1] : P^w_1 = \pi_{11} p^x_1 + (1 - \pi_{22})(1 - p^x_1). \]
It is easy to verify that
\[ H^P[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1],\ \pi_{22} \in [0, 1 - P^w_1]) \cup (\pi_{11} \in [P^w_1, 1],\ \pi_{22} \in [1 - P^w_1, 1])\}. \]
This set is plotted in the first panel of Fig. 1, and its star convexity is apparent.
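The closed form in Example 1 can be checked against the defining equation. The following minimal sketch, with hypothetical function names `implied_px1` and `in_HP`, solves the binary equation for the implied value of $p^x_1$ and tests membership in the set using the two rectangles stated above. The small tolerance is an assumption added only to absorb floating-point rounding at the boundaries.

```python
def implied_px1(pi11, pi22, pw1):
    """Solve pw1 = pi11 * p + (1 - pi22) * (1 - p) for p.

    Returns None when pi11 + pi22 = 1, where the equation does not pin down p.
    """
    denom = pi11 + pi22 - 1.0
    if abs(denom) < 1e-12:
        return None
    return (pw1 - (1.0 - pi22)) / denom


def in_HP(pi11, pi22, pw1, tol=1e-12):
    """Membership test based on the two rectangles stated in Example 1."""
    lower = pi11 <= pw1 + tol and pi22 <= 1.0 - pw1 + tol
    upper = pi11 >= pw1 - tol and pi22 >= 1.0 - pw1 - tol
    inside_unit_square = 0.0 <= pi11 <= 1.0 and 0.0 <= pi22 <= 1.0
    return inside_unit_square and (lower or upper)


pw1 = 0.3
for pi11, pi22 in [(0.9, 0.9), (0.1, 0.1), (0.9, 0.5)]:
    print(pi11, pi22, in_HP(pi11, pi22, pw1), implied_px1(pi11, pi22, pw1))
```

In this sketch the pair (0.9, 0.5) is rejected, and the value returned by `implied_px1` for it falls outside [0, 1], which is the reason it cannot belong to the set. The pairs (0.9, 0.9) and (0.1, 0.1) are mirror images around the line $\pi_{22} = 1 - \pi_{11}$ and imply values of $p^x_1$ that sum to one.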
Fig. 1. Geometry of the set $H^P[\Pi^*]$, and of the set $H[\Pi^*]$ under different assumptions, when $J = 2$ and $\Pr(w = 1) = 0.3$. (Six panels, each plotting $\pi_{22}$ against $\pi_{11}$, with both axes running from 0 to 1.)
The problem which occurs in this example relates to the ‘permutation-type non-identifiability’ considered by
Swartz et al. (2004). For a given $\Pi^1 \in H^P[\Pi^*]$, one can obtain another $\Pi^2 \in H^P[\Pi^*]$ by letting $\pi^2_{11} = 1 - \pi^1_{22}$
and $\pi^2_{22} = 1 - \pi^1_{11}$. Letting $\tilde{p}^x_1 = (1 - p^x_1)$ yields $\pi^1_{11} p^x_1 + (1 - \pi^1_{22})(1 - p^x_1) = \pi^2_{11} \tilde{p}^x_1 + (1 - \pi^2_{22})(1 - \tilde{p}^x_1)$. This
explains the symmetry of $H^P[\Pi^*]$ around the line $\pi_{22} = 1 - \pi_{11}$.
Denote by $H^E[\Pi^*]$ the set of matrices that satisfy the restrictions on the misreporting pattern coming from
prior information. Then if, for example, validation studies suggest a uniform lower bound on the probability
of correct report for each $j \in X$,
\[ H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge 1 - \lambda\ \forall j \in X\}. \]
If social psychology suggests that individuals, when answering about the frequency with which they engage in
a certain socially desirable activity, either provide correct reports or overreport,
\[ H^E[\Pi^*] = \{\Pi : \pi_{ij} = 0\ \forall i < j \in X\}. \]
Of course, plenty of other restrictions are possible.
Because $H^P[\Pi^*]$ is connected, but not convex, when I take its intersection with the set $H^E[\Pi^*]$ I obtain a set
$H[\Pi^*]$ that might be disconnected, connected, or convex, depending on how $H^E[\Pi^*]$ slices $H^P[\Pi^*]$. Below I
provide three examples of sets $H^E[\Pi^*]$, which are further analyzed in Section 3. Each of these sets is trivially
convex, as it is linear in $\Pi$, but its intersection with $H^P[\Pi^*]$ generates sets $H[\Pi^*]$ that can be disconnected,
connected, or convex. These examples are illustrated in the six panels of Fig. 1.
Example 2 (Constant probability of correct report). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} = \pi\ \forall j \in X\}$. Suppose that $x$ and $w$
are binary, i.e. that $J = 2$. Then
\[
H[\Pi^*] =
\begin{cases}
\{\pi : \pi \in [0, P^w_1] \cup [1 - P^w_1, 1]\} & \text{if } P^w_1 < \tfrac{1}{2}, \\
\{\pi : \pi \in [0, 1 - P^w_1] \cup [P^w_1, 1]\} & \text{if } P^w_1 > \tfrac{1}{2}, \\
\{\pi : \pi \in [0, 1]\} & \text{if } P^w_1 = \tfrac{1}{2}.
\end{cases}
\]
Hence, if $P^w_1 \ne \tfrac{1}{2}$, $H[\Pi^*]$ is disconnected. This set is plotted in the second panel of Fig. 1, and its
disconnectedness is apparent. The set $H[\Pi^*]$ remains disconnected, if $P^w_1 \ne \tfrac{1}{2}$, even if the assumption of
constant probability of correct report is weakened to requiring that $\pi_{22} = \pi_{11} + \varepsilon$, as long as $|\varepsilon| < |1 - 2P^w_1|$
(and $\varepsilon$ is such that $\pi_{22} \in [0, 1]$).
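A minimal sketch of Example 2 follows, under the assumption that the three cases can be collapsed into "at most the smaller of $P^w_1$ and $1 - P^w_1$, or at least the larger". The function names `constant_report_region` and `px1_given_pi` are hypothetical.

```python
def constant_report_region(pw1):
    """Intervals of the common correct-report probability pi allowed in Example 2."""
    lo = min(pw1, 1.0 - pw1)
    hi = max(pw1, 1.0 - pw1)
    if lo == hi:  # P^w_1 = 1/2: every pi in [0, 1] is feasible
        return [(0.0, 1.0)]
    return [(0.0, lo), (hi, 1.0)]


def px1_given_pi(pi, pw1):
    """p^x_1 implied by pi_11 = pi_22 = pi (not defined at pi = 1/2)."""
    return (pw1 - (1.0 - pi)) / (2.0 * pi - 1.0)


region = constant_report_region(0.3)
print(region)  # two disjoint intervals, so the set is disconnected
print([px1_given_pi(pi, 0.3) for pi in (0.7, 1.0)])
```

With $P^w_1 = 0.3$, evaluating `px1_given_pi` at the endpoints of the upper interval gives values of approximately 0 and 0.3, which suggests how the gap in the feasible values of $\pi$ can carry over to the feasible values of $p^x_1$.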
Example 3 (Monotonicity in correct reporting). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge \pi_{(j+1)(j+1)}\ \forall j \in X\}$. Suppose that $x$ and
$w$ are binary, i.e. that $J = 2$, so that the monotonicity assumption simplifies to $\pi_{11} \ge \pi_{22}$. Then if $P^w_1 < \tfrac{1}{2}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1],\ \pi_{22} \in [0, \pi_{11}]) \cup (\pi_{11} \in [1 - P^w_1, 1],\ \pi_{22} \in [1 - P^w_1, \pi_{11}])\}. \]
If $P^w_1 \ge \tfrac{1}{2}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [0, P^w_1],\ \pi_{22} \in [0, \min(1 - P^w_1, \pi_{11})]) \cup (\pi_{11} \in [P^w_1, 1],\ \pi_{22} \in [1 - P^w_1, \pi_{11}])\}. \]
Hence, if $P^w_1 < \tfrac{1}{2}$, $H[\Pi^*]$ is disconnected, but otherwise it is connected. This set is plotted in the third panel of
Fig. 1. Its disconnectedness is apparent given the choice of $P^w_1 = 0.3$. To see why the set can be connected, the
fourth panel of Fig. 1 plots the set $H[\Pi^*]$ obtained when the monotonicity assumption is $\pi_{11} \le \pi_{22}$ (in the
binary case, reversing the sign of the monotonicity assumption has an effect similar to maintaining $\pi_{11} \ge \pi_{22}$
but having $P^w_1 > \tfrac{1}{2}$).
Example 4 (Lower bound on the probability of correct report). Let $H^E[\Pi^*] = \{\Pi : \pi_{jj} \ge 1 - \lambda\ \forall j \in X\}$. Suppose
that $x$ and $w$ are binary, i.e. that $J = 2$. Then if $1 > \lambda > \max\{P^w_1, 1 - P^w_1\}$,
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : (\pi_{11} \in [1 - \lambda, P^w_1],\ \pi_{22} \in [1 - \lambda, 1 - P^w_1]) \cup (\pi_{11} \in [P^w_1, 1],\ \pi_{22} \in [1 - P^w_1, 1])\}. \]
This set is connected through the point $\pi_{11} = P^w_1$, $\pi_{22} = 1 - P^w_1$, and is plotted in the fifth panel of Fig. 1 for
$P^w_1 = 0.3$ and $\lambda = 0.8$.
If $\max\{P^w_1, 1 - P^w_1\} > \lambda$, then
\[ H[\Pi^*] = \{\pi_{11}, \pi_{22} : \pi_{11} \in [\max\{1 - \lambda, P^w_1\}, 1],\ \pi_{22} \in [\max\{1 - \lambda, 1 - P^w_1\}, 1]\}, \]
and $H[\Pi^*]$ is convex. This set is plotted in the sixth panel of Fig. 1. Its convexity is apparent given the choice
of $P^w_1 = 0.3$ and $\lambda = 0.2$.
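Examples 3 and 4 both take the intersection of the probability-theory set with a restriction set. A minimal sketch for the binary case follows, in which a restriction is represented as a predicate on the diagonal elements. The names `in_HP`, `monotone`, `lower_bound` and `in_H` are hypothetical, and the membership test for the probability-theory set reuses the two rectangles from Example 1.

```python
def in_HP(pi11, pi22, pw1, tol=1e-12):
    """Binary-case membership in H^P, using the rectangles stated in Example 1."""
    lower = pi11 <= pw1 + tol and pi22 <= 1.0 - pw1 + tol
    upper = pi11 >= pw1 - tol and pi22 >= 1.0 - pw1 - tol
    inside_unit_square = 0.0 <= pi11 <= 1.0 and 0.0 <= pi22 <= 1.0
    return inside_unit_square and (lower or upper)


def monotone(pi11, pi22):
    """Example 3 restriction in the binary case: pi_11 >= pi_22."""
    return pi11 >= pi22


def lower_bound(lam):
    """Example 4 restriction: pi_jj >= 1 - lam for both values of j."""
    def restriction(pi11, pi22):
        return pi11 >= 1.0 - lam and pi22 >= 1.0 - lam
    return restriction


def in_H(pi11, pi22, pw1, restriction):
    """Membership in H = H^P intersected with H^E."""
    return in_HP(pi11, pi22, pw1) and restriction(pi11, pi22)


pw1 = 0.3
# The point joining the two rectangles of H^P violates monotonicity when pw1 < 1/2,
# which is why the intersection is disconnected in that case.
print(in_H(0.3, 0.7, pw1, monotone))
print(in_H(0.9, 0.9, pw1, lower_bound(0.2)))
```

Representing each restriction as a predicate keeps the two sources of information separate, so further restrictions can be tried without touching the probability-theory part.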
The set $H[\Pi^*]$ can be disconnected, connected or convex. These properties are reflected in the shape of the
identification regions of the functionals of interest, namely $H[P^x]$, $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$, for some vector of
dimension $k$ of functionals $\Theta : \Psi_X \to \mathbb{R}^k$. Hence, it is important to have a method to calculate and
consistently estimate the entire identification regions that is able to capture their possible disconnectedness
and non-convexities. While the general identification approach proposed in Section 2.1 is valid for any set of
restrictions on $\Pi^*$, here I focus on restrictions that satisfy certain regularity conditions, described in
Assumptions C0 and C1 below, so that a simple estimator can be utilized.
Manski and Tamer (2002) introduced methods to estimate the entire identification region of a vector of
parameters of interest when the identification region cannot be expressed in closed form solution, but
is given by all values of the vector that minimize a specified objective function. Here I introduce a
related nonlinear programming estimator, using the same insight as in the linear programming estimator
proposed by Honoré and Tamer (2006) and further discussed by Honoré and Lleras-Muney (2006).
Observe that if I can calculate $H[P^x]$, I can then calculate $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$ for any functionals $\tau(\cdot)$
and $\Theta(\cdot)$ (for example, the mean of $x$, its variance, the Gini coefficient, etc.); hence, I focus on the calculation
of $H[P^x]$.⁶
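The set $H[P^x]$ consists of the vectors $p^x \in \Delta^{J-1}$ for which the equations
\[ P^w = \Pi p^x, \quad \pi_j \in \Delta^{J-1}\ \forall j, \quad \Pi \in H^E[\Pi^*], \tag{2.6} \]
have a solution for $\Pi$. In general, $H^E[\Pi^*]$ can be written as
\[
H^E[\Pi^*] = \left\{ \Pi :
\begin{array}{l}
f_j(\Pi) \ge \mu_j,\ j = 1, \ldots, q_1; \quad g_i(\Pi) \le \mu_{q_1+i},\ i = 1, \ldots, q_2; \\
h_k(\Pi) = \mu_{q_1+q_2+k},\ k = 1, \ldots, q_3
\end{array}
\right\},
\]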
subject to
\[
\begin{cases}
v_k \ge 0 \quad \forall k; \\
\pi_{ij} \ge 0, \quad i, j = 1, \ldots, J; \\
1 - \sum_{i=1}^{J} \pi_{ij} = v_j, \quad j = 1, \ldots, J; \\
P^w - \Pi \nu = [v_{J+1} \ \ldots \ v_{2J}]^T; \\
f_l(\Pi) - \mu_l + v_{2J+l} \ge 0, \quad l = 1, \ldots, q_1; \\
\mu_{q_1+m} - g_m(\Pi) + v_{2J+q_1+m} \ge 0, \quad m = 1, \ldots, q_2; \\
h_s(\Pi) - \mu_{q_1+q_2+s} + v_{2J+q_1+q_2+s} = 0, \quad s = 1, \ldots, q_3.
\end{cases}
\tag{2.8}
\]
I consider restrictions determining the set $H^E[\Pi^*]$ that satisfy the following conditions:
Assumption C0. For each $j = 1, \ldots, q_1$, $i = 1, \ldots, q_2$, and $k = 1, \ldots, q_3$, $f_j(\Pi)|_{\Pi=0} = g_i(\Pi)|_{\Pi=0} = h_k(\Pi)|_{\Pi=0} = 0$
and $f_j(\Pi)$, $g_i(\Pi)$, and $h_k(\Pi)$ are continuous on $[0, 1]^{J^2}$.
Let $\Pi V$ denote the constraint set defined by (2.8). Assumption C0 is imposed to establish that the
objective function in (2.7) achieves a maximum on (2.8). Observe that the set $\Pi V$ is closed, because the
constraints defining it are continuous, and non-empty, because it contains the vector $[\pi^0_1,\dots,\pi^0_J, v^0]$, with
$\pi^0_{ij} = 0$ for $i,j = 1,\dots,J$, $v^0_j = 1$ for $j = 1,\dots,J$, $v^0_{J+j} = P^w_j$ for $j = 1,\dots,J$; $v^0_{2J+l} = \mu_l$, $l = 1,\dots,q_1$,
$v^0_{2J+q_1+m} = 0$, $m = 1,\dots,q_2$, and $v^0_{2J+q_1+q_2+s} = \mu_{q_1+q_2+s}$, $s = 1,\dots,q_3$. Hence maximization of (2.7) on $\Pi V$
is equivalent to maximization of (2.7) on
$$\widetilde{\Pi V} = \left\{ [\pi_1,\dots,\pi_J, v] \in \Pi V : -\sum_k v_k \ge -\sum_k v^0_k \right\},$$
which is a closed and bounded set. The objective function in (2.7) is continuous, and therefore the result
follows by the Bolzano–Weierstrass theorem.7 The optimal function has value zero if and only if all $v_k = 0$,
that is if a solution exists to (2.6). Hence, for given $\nu \in \Delta^{J-1}$ one can check whether $\nu \in H[P^x]$ by solving the
above nonlinear programming problem and checking whether $v_k = 0$ for all k.
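The "zero slack" check can be illustrated in the simplest setting. The Python sketch below assumes J = 2 and a single restriction: a known lower bound 1 - lam on each probability of correct report. With the candidate value fixed, the first equation of the system is linear in the two diagonal entries of the misclassification matrix, so the smallest violation can be written down directly rather than obtained from a solver. The function names, the grid search, and the numerical inputs are assumptions of this sketch, not the implementation used in the paper.

```python
def min_slack(nu1, pw1, lam):
    """Smallest violation of pw1 = pi11*nu1 + (1 - pi22)*(1 - nu1)
    over pi11, pi22 in [1 - lam, 1]; zero means nu1 is feasible."""
    lo = (1.0 - lam) * nu1          # attained at pi11 = 1 - lam, pi22 = 1
    hi = nu1 + lam * (1.0 - nu1)    # attained at pi11 = 1, pi22 = 1 - lam
    return max(0.0, lo - pw1, pw1 - hi)


def region_by_grid(pw1, lam, steps=1000, tol=1e-12):
    """Scan candidate values nu1 in [0, 1]; keep those with (numerically) zero slack."""
    kept = [k / steps for k in range(steps + 1)
            if min_slack(k / steps, pw1, lam) <= tol]
    return min(kept), max(kept)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(region_by_grid(pw1=0.34, lam=0.2))
```

The grid scan mirrors the idea of checking each candidate vector in turn; for larger J the slack would have to come from a numerical optimizer.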
The above method for calculating identification regions has a natural sample analog counterpart, and under
some regularity conditions about the functions defining the set $H^E[\Pi^\star]$ and the sampling process, this
estimator is consistent. In particular, I maintain the following:
Assumption C1. For each $j = 1,\dots,q_1$, $i = 1,\dots,q_2$, and $k = 1,\dots,q_3$, either (i) $f_j(\Pi)$, $g_i(\Pi)$ and $h_k(\Pi)$ are
homogeneous functions of degree (respectively) $r_j, r_i, r_k \ge 1$, or (ii) $f_j(\Pi)$, $g_i(\Pi)$ and $h_k(\Pi)$ are multivariate
polynomials in $\Pi$ with non-negative coefficients, or (iii) $f_j(\Pi)$ are convex functions, $g_i(\Pi)$ are concave
functions, and either (i) or (ii) holds for $h_k(\Pi)$. Additionally, $g_i(\Pi) \ge 0$ and $h_k(\Pi) \ge 0$ on $[0,1]^{J^2}$.
Assumption C2. (a) Let a random sample $\{w_i\}$, $i = 1,\dots,N$ be available, and let $P^w_{i,N} = \frac{1}{N}\sum_{j=1}^{N} 1(w_j = i)$,
$i = 1,\dots,J$. (b) If the set $H^E[\Pi^\star]$ contains constraints involving any parameters to be estimated, let these
parameters enter the constraints additively. Without loss of generality, to simplify the notation, let the
parameters to be estimated be $\mu_l$, $l = 1,\dots,\bar q \le q$. (c) Suppose that a random sample of size $n = N/k$ for some
constant k such that $0 < k < \infty$ is available to estimate $\mu_l$, $l = 1,\dots,\bar q$, so that $\sqrt{N}(\mu_{l,n} - \mu_l) \xrightarrow{d} N(0, kV_{\mu_l})$.
(d) Let $\mu_l$ satisfy $\mu_l > 0$, $l = 1,\dots,\bar q \le q$.
In Section 3 I consider several examples of restrictions defining the set $H^E[\Pi^\star]$ that satisfy Assumptions
C0–C1. For example, suppose that a validation study provides a lower bound on the probability of correct
7
Alternative assumptions replacing Assumption C0 and yielding a non-empty closed and bounded constraint set for every $\nu \in \Delta^{J-1}$
would also imply this result.
ARTICLE IN PRESS
F. Molinari / Journal of Econometrics 144 (2008) 81–117 91
report for each type $j = 1,\dots,J$, so that $H^E[\Pi^\star] = \{\Pi : \pi_{jj} \ge \mu_j,\ j \in X\}$. Then Assumptions C0–C1 are clearly
satisfied. Moreover, if a validation (random) sample $\{\tilde w_i, \tilde x_i\}$, $i = 1,\dots,n$ is available (with $n = N/k$, $0 < k < \infty$),
which does not point identify $P^x$ and $\Pi^\star$, but allows one to conclude that
$$\pi_{jj} \ge \mu_{j,n} = \frac{\sum_{i=1}^{n} 1(\tilde w_i = j,\ \tilde x_i = j)}{\sum_{i=1}^{n} 1(\tilde x_i = j)},$$
then Assumption C2 is satisfied. The empirical analysis conducted in Section 4 shows that there are important
cases in which a validation sample allows for root-N consistent estimation of $\mu_{j,n}$, but does not allow for point
identification of $P^x$ or $\Pi^\star$.
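The validation-sample ratio is simply the share of observations with true type j whose reported type is also j. A minimal Python sketch of that computation follows; the function name, the data layout (a list of (reported, true) pairs), and the handling of types that never appear are assumptions made here for illustration.

```python
def correct_report_rates(pairs, categories):
    """pairs: (w_tilde, x_tilde) observations from a hypothetical validation sample.
    Returns {j: share of observations with x_tilde == j that also have w_tilde == j}."""
    pairs = list(pairs)  # allow any iterable; we pass over the data once per category
    rates = {}
    for j in categories:
        n_true = sum(1 for w, x in pairs if x == j)
        n_match = sum(1 for w, x in pairs if x == j and w == j)
        # A type that never appears as a true value yields no estimate.
        rates[j] = n_match / n_true if n_true else None
    return rates


if __name__ == "__main__":
    sample = [(1, 1), (2, 1), (2, 2), (2, 2), (1, 2), (2, 2)]  # made-up data
    print(correct_report_rates(sample, categories=[1, 2]))
```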
Let $H^E_N[\Pi^\star]$ denote the set $H^E[\Pi^\star]$ obtained when $\mu_l$ is replaced by $\mu_{l,n}$, $l = 1,\dots,q$, with the convention
that $\mu_{l,n} = \mu_l$ for $l = \bar q + 1,\dots,q$. Define an objective function $Q_N(\nu)$ as in (2.7)–(2.8), with $\mu_{l,n}$, $l = 1,\dots,q$,
replacing $\mu_l$ and $P^w_N$ replacing $P^w$. Then the following consistency result holds:
Proposition 2. Let Assumptions C0, C1 and C2 hold. Define the set
$$H_N[P^x] = \left\{ \pi^x_N \in \Delta^{J-1} : Q_N(\pi^x_N) \ge \sup_{\nu \in \Delta^{J-1}} Q_N(\nu) - \epsilon_N \right\}, \tag{2.9}$$
where $\epsilon_N = N^{-\tau}$, $0 < \tau < \frac{1}{2}$. Then the set $H_N[P^x]$ is a consistent estimator of $H[P^x]$, in the sense that
$$\rho(H_N[P^x], H[P^x]) \equiv \max\left\{ \sup_{\pi^x_N \in H_N[P^x]}\ \inf_{\pi^x \in H[P^x]} \|\pi^x_N - \pi^x\|,\ \sup_{\pi^x \in H[P^x]}\ \inf_{\pi^x_N \in H_N[P^x]} \|\pi^x_N - \pi^x\| \right\} \xrightarrow{p} 0.$$
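The distance used in Proposition 2 is the Hausdorff distance between two sets. For intuition, the Python sketch below computes it for two finite sets of points, for instance grid approximations of an estimated and a true region. The function name and the restriction to finite, non-empty sets are assumptions of this illustration.

```python
import math


def hausdorff(a, b):
    """Hausdorff distance between two finite, non-empty sets of points (tuples)."""
    def directed(s, t):
        # Largest distance from a point of s to its nearest point in t.
        return max(min(math.dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    est = [(0.0, 0.0), (3.0, 4.0)]   # made-up points
    true = [(0.0, 0.0)]
    print(hausdorff(est, true))
```

Taking the maximum of the two directed distances matters: a set that contains the other but extends far beyond it is still far away in this metric.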
Most of the calculations and estimations of $H[P^x]$ presented in this paper are performed using this nonlinear
programming method. The method requires checking the value function of the sample analog of (2.7)–(2.8) for
each $\nu \in \Delta^{J-1}$. Hence it works best, and the computations are easiest, when J is a relatively small number. This
is the case in many applications of interest. Examples include educational attainment, language proficiency,
workers’ union status, employment status, health conditions, and health/functional status. When J is a large
number, the nonlinear programming problem becomes computationally harder. This issue has been
acknowledged in the related literature on partial identification, and some solutions have been proposed. For
example, Chernozhukov et al. (2004) and Ciliberto and Tamer (2004) have suggested the use of the
Metropolis–Hastings algorithm to generate adaptive grid sets or the use of simulated annealing to perform the
optimization over $\Delta^{J-1}$. While Ciliberto and Tamer’s empirical analysis is based on the optimization of a
different objective function and the parameter space for $\nu$ in their case is not $\Delta^{J-1}$, their work shows that the
computational problem is feasible for values of J as large as 13.
The problem of the construction of confidence intervals for partially identified parameters was addressed by
Horowitz and Manski (1998, 2000). They considered the case in which the identification region of the
parameter of interest is an interval whose lower and upper bounds can be estimated from sample data, and
proposed confidence intervals that asymptotically cover the entire identification region with fixed probability.
For the same class of problems, Imbens and Manski (2004) suggested shorter confidence intervals that
uniformly cover the parameter of interest, rather than its identification region, with a prespecified probability.
Beresteanu and Molinari (2007) provide confidence sets and confidence collections for partially identified
parameters whose convex identification region is equal to the expectation of a properly defined set valued
random variable. These approaches are not applicable to the problem studied here, because our identification
regions are given by the set of values of the parameters of interest that solve a maximization problem, do not
have a closed form solution, and are not necessarily convex. The problem of construction of confidence sets
for identification regions of parameters obtained as the solution of the minimization of a criterion function has
recently been addressed by Chernozhukov et al. (2007). They provided a method to construct confidence sets
8
I am very grateful to Elie Tamer for suggestions that led to the construction of these confidence sets.
that cover the identification region with probability asymptotically equal to $1-\alpha$, and developed subsampling
methods to implement this procedure. Here I introduce a different procedure, and show that the coverage
property of these confidence sets follows directly from well known results in the literature (e.g., Rao, 1973;
Cox and Hinkley, 1974). The counterpart of the simplicity of this approach is that the confidence sets may be
conservative, in the sense that given a prespecified confidence coefficient $1-\alpha$, $0 < \alpha < 1$, the confidence sets
asymptotically cover the identification region with probability at least equal to $1-\alpha$.
The main insight for the construction of the confidence sets for $H[P^x]$, denoted $C_N^{H[P^x]}$, is given by observing
that the only parameters to be estimated for obtaining $H_N[P^x]$ in (2.9) are $P^w_{i,N}$, $i = 1,\dots,J-1$, and $\mu_{l,n}$,
$l = 1,\dots,\bar q$. Let $\hat\omega_N$ denote the $(J-1+\bar q)$ vector collecting these estimators. Under Assumption C2, $\hat\omega_N$ is
root-N consistent and asymptotically normal, and has a covariance matrix ($\mathrm{Var}(\omega)$) that can be consistently
estimated from the data ($\widehat{\mathrm{Var}}(\hat\omega_N)$). Hence, if $c_{1-\alpha}$ denotes the $1-\alpha$ quantile of the $\chi^2_{(J-1+\bar q)}$ distribution, I
construct a joint confidence ellipsoid $C^{\omega}_N$ for $\omega \equiv [(P^w_i)_{i=1,\dots,J-1}; (\mu_l)_{l=1,\dots,\bar q}]$, and define the confidence set as
$$C_N^{H[P^x]} = \bigcup_{\omega_0 \in C^{\omega}_N} H_{\omega_0}[P^x].$$
Then
$$\omega \in C^{\omega}_N \implies H[P^x] \subseteq C_N^{H[P^x]},$$
and therefore
$$\lim_{N \to \infty} \Pr\left(H[P^x] \subseteq C_N^{H[P^x]}\right) \ge 1-\alpha.$$
The confidence sets presented in Section 4 are obtained using this procedure. Using similar procedures one
can construct confidence regions for $H\{\tau[P^x]\}$ and $H\{\Theta[P^x]\}$, where again $\tau(\cdot)$ and $\Theta(\cdot)$ denote functionals
of $P(x)$.
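To make the union construction concrete, the Python sketch below works through the simplest case under several stated assumptions: J = 2, no estimated restriction parameters (so the only estimated quantity is the reported share of type 1), a standard Wald-type interval for that share, and the closed-form base-case bounds that Proposition 3 gives later in the paper for a known error bound lam. The function names and inputs are hypothetical; this is an illustration of the idea rather than the procedure as implemented for Section 4.

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square distribution with 1 degree of freedom


def base_case_region(pw1, lam):
    """Closed-form bounds on Pr(x = 1) when each correct-report probability is >= 1 - lam."""
    return max((pw1 - lam) / (1.0 - lam), 0.0), min(1.0, pw1 / (1.0 - lam))


def confidence_region(pw1_hat, n_obs, lam, crit=CHI2_1_95):
    """Union of base-case regions over a Wald interval for the reported share."""
    half = math.sqrt(crit * pw1_hat * (1.0 - pw1_hat) / n_obs)
    w_lo, w_hi = max(0.0, pw1_hat - half), min(1.0, pw1_hat + half)
    # Both endpoints of the base-case region are nondecreasing in pw1,
    # so the union over [w_lo, w_hi] is an interval with these endpoints.
    return base_case_region(w_lo, lam)[0], base_case_region(w_hi, lam)[1]


if __name__ == "__main__":
    print(base_case_region(0.34, 0.2))         # region at the point estimate
    print(confidence_region(0.34, 1000, 0.2))  # wider set that accounts for sampling error
```

Because the set is a union over every parameter value in the ellipsoid, it covers the whole identification region whenever the ellipsoid covers the true parameter, which is why the coverage can exceed the nominal level.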
This section analyzes in detail examples of restrictions on the matrix $\Pi^\star$ (which satisfy Assumptions
C0–C1) coming from validation studies and theories developed in the social sciences. I suggest settings in
which such assumptions may be credible, show their implications for the structure of $H[\Pi^\star]$, and present
results on the inferences that they allow one to draw on $P^x$ and $\tau[P^x]$. I show that when the ‘base-case’
assumptions are maintained, the direct misclassification approach is equivalent to the method proposed by
HM and therefore it gives the same identification regions for $H[\Pr(x = j)]$, $j \in X$ as the ones they derived.
Hence, I use these results as a benchmark to evaluate the identifying power of additional assumptions. Notice
however that $H[\Pr(x = j)]$, $j \in X$ is just the projection of $H[P^x]$ on its jth component. Therefore, when $J > 2$, a
comparison based simply on $H[\Pr(x = j)]$, $j \in X$ understates the identifying power of the additional
assumptions. When $J = 2$, $H[P^x]$ is entirely described by $H[\Pr(x = 1)]$ and closed form bounds can be derived
under different sets of assumptions, hence allowing for a full comparison.
Suppose that the researcher has a known lower bound on the probability that the realizations of w and x
coincide, i.e., $\Pr(w = x) \ge 1-\lambda$, or, strengthening this assumption, that the researcher has a known lower
bound on the probability of correct report for each value that x can take, i.e., $\Pr(w = j | x = j) \ge 1-\lambda$, $\forall j \in X$.
Formally, consider the following:
Assumption 1. $\Pr(w = x) \ge 1-\lambda > 0$
or, as a stronger version of Assumption 1, that:
Assumption 2. $\Pr(w = j | x = j) \ge 1-\lambda > 0$, $\forall j \in X$.
Assumptions 1 and 2 are quite often satisfied in practice, mainly due to the availability of results of
validation studies, and are therefore of particular interest. Additionally, Assumptions 1 and 2 exhaust the
implications for the structure of $\Pi^\star$ of the assumptions typically maintained by researchers adopting mixture
models. Hence, the results obtained under these ‘base-case’ assumptions are particularly suited to evaluate the
identifying power of additional prior information. In the next section I show that informative identification
regions might be obtained even if one dispenses with Assumptions 1 and 2, when other information is available.
When the researcher has prior information suggesting that either Assumption 1 or the stronger Assumption
2 holds, she can specify the set $H^E[\Pi^\star]$, respectively, as follows:
$$H^{E,1}[\Pi^\star] = \left\{ \Pi : \sum_{h=1}^{J} \pi_{hh}\pi^x_h \ge 1-\lambda \right\},$$
$$H^{E,2}[\Pi^\star] = \{ \Pi : \pi_{jj} \ge 1-\lambda\ \ \forall j \in X \},$$
where $H^{E,1}[\Pi^\star]$ denotes the set $H^E[\Pi^\star]$ when Assumption 1 is maintained, and $H^{E,2}[\Pi^\star]$ denotes the set
$H^E[\Pi^\star]$ when Assumption 2 is maintained. Notice that $H^{E,2}[\Pi^\star] \subset H^{E,1}[\Pi^\star]$. Proposition 3 gives closed form
bounds on $\Pr(x = j)$, $j \in X$, for the case in which either Assumption 1 or 2 holds.
Proposition 3. (a) Suppose that Assumption 1 holds, and that no other information is available. Then from system
(1.1) one can learn that
$$H[\Pr(x = j)] = \left[\max(P^w_j - \lambda,\ 0),\ \min(1,\ P^w_j + \lambda)\right], \quad j \in X. \tag{3.1}$$
(b) Suppose that Assumption 2 holds, and that no other information is available. Then from system (1.1) one
can learn that
$$H[\Pr(x = j)] = \left[\max\left(\frac{P^w_j - \lambda}{1-\lambda},\ 0\right),\ \min\left(1,\ \frac{P^w_j}{1-\lambda}\right)\right], \quad j \in X. \tag{3.2}$$
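The two sets of closed-form bounds in Proposition 3 are easy to evaluate numerically. The Python sketch below codes them directly; the function names and the example inputs are illustrative assumptions, and the second function presumes lam < 1, as Assumption 2 requires.

```python
def bounds_assumption1(pw_j, lam):
    """Bounds (3.1): only Pr(w = x) >= 1 - lam is maintained."""
    return max(pw_j - lam, 0.0), min(1.0, pw_j + lam)


def bounds_assumption2(pw_j, lam):
    """Bounds (3.2): Pr(w = j | x = j) >= 1 - lam for every j (requires lam < 1)."""
    return max((pw_j - lam) / (1.0 - lam), 0.0), min(1.0, pw_j / (1.0 - lam))


if __name__ == "__main__":
    # Hypothetical reported share and error bound.
    print(bounds_assumption1(0.34, 0.2))
    print(bounds_assumption2(0.34, 0.2))
```

For the same inputs the second interval is nested in the first, which reflects Assumption 2 being the stronger restriction.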
The proof of Proposition 3 proceeds in two steps. First, it is shown that from the jth equation of system
(1.1) one can learn, depending on the maintained assumption, that $\Pr(x = j)$ lies in one of the intervals in
(3.1)–(3.2). Then it is shown that there exists a $\Pi \in H[\Pi^\star]$ for which the extreme values of these intervals solve
system (1.1). This implies that the bounds are sharp. This result establishes that when only Assumption 1 or
Assumption 2 is maintained, only the jth equation in system (1.1) implies restrictions on $\Pr(x = j)$, $j \in X$. In
the next section I show that when more structure is imposed on the matrix $\Pi$, several of the equations in
system (1.1) imply restrictions on $\Pr(x = j)$, $j \in X$, and additional progress can be made.
The same identification regions as those in Proposition 3 were obtained by HM. They used a mixture model
to study the problem of inference with corrupted and contaminated data, and assumed that a known lower
bound is available on the probability that a realization of w is drawn from the distribution of x. Molinari
(2003) shows that under Assumptions 1 and 2, the identification regions for parameters that respect stochastic
dominance obtained using the direct misclassification approach are also equivalent to those obtained by HM.
Consider the case that, conditional on the value of x, there is constant probability of correct report for at
least a subset of the values that x can take. Formally:
Assumption 3. $\Pr(w = j | x = j) = \pi^\star \ge 1-\lambda \ge 0$, $\forall j \in \tilde X \subseteq X$,
where $\pi^\star$ is known only to lie in $[1-\lambda, 1]$, and $\lambda$ is strictly less than 1 if a non-trivial upper bound on the
probability of a data error is available.
There are various situations in which this assumption may be credible. For example, Poterba and Summers
(1995) use CPS data (with Reinterview Survey) and provide evidence for the reinterviewed sub-sample that the
rate of correct report of employment status is similar for individuals who are employed or not in the labor
force ($\Pr(w = j | x = j) \simeq 0.99$), but much lower for individuals who are unemployed ($\Pr(w = j | x = j) \simeq 0.86$).
Kane et al. (1999) provide evidence (Table 5, p. 18) that self-report of educational attainment is correct with
similar probabilities for individuals with no college, some college but no AA degree, and AA degree
($\Pr(w = j | x = j) \simeq 0.92$), and is higher for individuals with at least a bachelor degree ($\Pr(w = j | x = j) \simeq 0.99$).
Assumption 3 may hold with $\tilde X = X$ when the misclassification is generated by specific types of interviewer
recording errors. For example, the interviewer may sometimes mark one box at random in the questionnaire.
Additionally, in the special case of dichotomous variables, some have argued that the misreporting of health
disability is independent from true disability status (see Kreider and Pepper, 2007 for a discussion of this
issue), or that the misreporting of workers’ union status is independent from true union status (see Bollinger,
1996 for a discussion of this issue). When this is the case, Assumption 3 holds.
In general, Assumption 3 does not place any restriction on $\Pr(w = i | x = j)$, $i \ne j$; $i, j \in X$, other than that the
misreporting probabilities need to satisfy
$$\sum_{i \ne j} \Pr(w = i | x = j) = 1 - \pi^\star, \quad \forall j \in \tilde X.$$
When $J = 2$, this implies that the two off-diagonal elements of $\Pi^\star$ are equal; hence the only unknown element
of $\Pi^\star$ is $\pi^\star$.
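Suppose first that $\tilde X \subset X$, and without loss of generality let $\tilde X \equiv \{1, 2, \dots, h\}$, $2 \le h < J$. When this is the case,
Eq. (1.1) can be rewritten as
$$\begin{bmatrix}
\pi^\star & \pi^\star_{12} & \dots & \pi^\star_{1J} \\
\pi^\star_{21} & \pi^\star & \dots & \pi^\star_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
\pi^\star_{J1} & \pi^\star_{J2} & \dots & \pi^\star_{JJ}
\end{bmatrix}
\begin{bmatrix} \Pr(x = 1) \\ \Pr(x = 2) \\ \vdots \\ \Pr(x = J) \end{bmatrix}
=
\begin{bmatrix} \Pr(w = 1) \\ \Pr(w = 2) \\ \vdots \\ \Pr(w = J) \end{bmatrix}, \tag{3.3}$$
where $\pi^\star \ge 1-\lambda$ and, assuming that $\lambda$ constitutes a uniform upper bound for all the misclassification
probabilities, $\pi^\star_{ll} \ge 1-\lambda$, $\forall l \in X \setminus \tilde X$. Then $H^E[\Pi^\star]$ is defined as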
Let $H^3[\Pi^\star] = H^P[\Pi^\star] \cap H^{E,3}[\Pi^\star]$, where $H^P[\Pi^\star]$ was defined in (2.4). Then one can immediately calculate
$H[P^x]$ and $H\{\tau[P^x]\}$ using the non-linear programming method described in Section 2, with $H^E[\Pi^\star] = H^{E,3}[\Pi^\star]$.
It is natural to ask whether Assumption 3 does have identifying power. To answer this question, I consider
the case that the researcher has a non-trivial upper bound on the probability of data errors, i.e., that $\lambda < 1$, and
compare the bounds on $\Pr(x = j)$, $j \in X$ derived in Proposition 3, Eq. (3.2), with the extreme points obtained
using the nonlinear programming method, with $H^E[\Pi^\star] = H^{E,3}[\Pi^\star]$. In Section 3.4 I consider the case in
which x and w are binary ($J = 2$) and show that Assumption 3 can have identifying power even when $\lambda = 1$.
Proposition 4 shows that if $P^w_i > 0$, for some $i \in \tilde X \setminus \{j\}$, the base case lower bound on $\Pr(x = j)$, $j \in \tilde X$, if
informative, is never feasible when Assumption 3 (with $\tilde X \subset X$) is maintained; hence the lower bound on
$\Pr(x = j)$, $j \in \tilde X$ under Assumption 3 is strictly greater than that in (3.2). For the case in which the base case
upper bound on $\Pr(x = j)$, $j \in \tilde X$ is informative, Proposition 5 derives conditions under which such upper
bound is not feasible when Assumption 3 (with $\tilde X \subset X$) is maintained, and shows that when those conditions
are satisfied, this upper bound is strictly smaller than that in (3.2). When the base case lower and upper bounds
(respectively) are not informative, Assumption 3 has no additional identifying power.
Proposition 4. (a) Suppose that Assumption 3 holds, with $\tilde X \subset X$, and that $P^w_j > \lambda$. Then the lower bound on
$\Pr(x = j)$, $j \in \tilde X$ is strictly greater than the base case lower bound in (3.2). The base case lower bound in (3.2) is
the sharp lower bound for $\Pr(x = k)$, $k \in X \setminus \tilde X$.
(b) Suppose that Assumption 3 holds, with $\tilde X \subset X$, and that $P^w_j \le \lambda$. Then the sharp lower bound on $\Pr(x = j)$,
$j \in X$ coincides with the base case lower bound in (3.2), and is equal to 0.
Proposition 5. (a) Suppose that Assumption 3 holds, with $\tilde X \subset X$, and that $P^w_j < 1-\lambda$.
If $\lambda \le \frac{1}{2}$, the upper bound on $\Pr(x = j)$, $j \in \tilde X$ is strictly smaller than the base case upper bound in (3.2) if and
only if
$$\exists\, k \in \tilde X \setminus \{j\} : P^w_j + P^w_k > (1-\lambda) + \frac{\lambda}{1-\lambda} P^w_j. \tag{3.4}$$
If $\lambda > \frac{1}{2}$, the upper bound on $\Pr(x = j)$, $j \in \tilde X$ is strictly smaller than the base case upper bound in (3.2) if
$$\exists\, k \in \tilde X \setminus \{j\} : P^w_k > \lambda. \tag{3.5}$$
The base case upper bound in (3.2) is the sharp upper bound for $\Pr(x = k)$, $k \in X \setminus \tilde X$.
(b) Suppose that Assumption 3 holds, with $\tilde X \subset X$, and that $P^w_j \ge 1-\lambda$. Then the sharp upper bound on
$\Pr(x = j)$, $j \in X$ coincides with the base case upper bound in (3.2) and is equal to 1.
The proofs of Propositions 4–5, parts (a), are based on showing that there is no $\Pi \in H^3[\Pi^\star]$ for which the
lower bound in (3.2) for $\Pr(x = j)$, $j \in \tilde X$ solves system (3.3), and that when condition (3.4) or condition (3.5) is
satisfied, there is no $\Pi \in H^3[\Pi^\star]$ for which the upper bound in (3.2) for $\Pr(x = j)$, $j \in \tilde X$ solves system (3.3).
When the inference is on $\Pr(x = k)$, $k \in X \setminus \tilde X$, there is a $\Pi \in H^3[\Pi^\star]$ that allows for the base case bounds in
(3.2) to solve system (3.3). The proofs of Propositions 4–5, parts (b), are based on showing that when the
bounds on $\Pr(x = j)$, $j \in X$ in (3.2) are not informative, one can find values of $\Pi \in H^3[\Pi^\star]$ for which $\pi^x_j = 0$
and $\pi^x_j = 1$ solve system (3.3).
The results in Propositions 4–5 can be explained as follows: only a subset $\tilde X$ of the equations in system (1.1)
are related to each other. Therefore, when drawing inference on $\Pr(x = j)$, $j \in X$, an improvement on the
base case bound in (3.2) can be achieved only for $j \in \tilde X$. Consider now the case in which $\tilde X = X$. In this case
the results of Propositions 4–5 apply directly, with X replacing $\tilde X$. Of course, the identifying power of
Assumption 3 is the highest in this case. In particular, Proposition 4 establishes that the lower bound for
$\Pr(x = j)$, $j \in X$, if informative, improves for all j when Assumption 3 is maintained with $\tilde X = X$.
A final consideration is relevant. Often the researcher might have prior information suggesting that
Assumption 3 holds, but not exactly. That is, she might have prior information that the probability of correct
report is only approximately constant: $\Pr(w = j | x = j) \approx \pi^\star$, $\forall j \in \tilde X \subseteq X$. Then it is natural to ask how much
variation in the probabilities of correct report is consistent with the conclusions of Propositions 4–5. For ease
of exposition, consider the identification of $\Pr(x = 1)$, and let $\pi_{11} = \pi$.9 Molinari (2003) shows that as long as
$|\pi_{jj} - \pi_{11}| < \lambda$, $\forall j \in \tilde X \setminus \{1\}$, and $\tilde X \subset X$, or $\tilde X = X$, the results of Proposition 4 continue to hold. A similar
condition is derived for the results of Proposition 5.
Example 6 in Section 3.4 illustrates the identifying power of Assumption 3, both for the case in which
$\tilde X \subset X$ and $\tilde X = X$, by comparing the identification regions $H[\Pr(x = j)]$, $j \in X$, $H[P^x]$ and $H[E(x)]$ obtained
using the nonlinear programming method with $H^E[\Pi^\star] = H^{E,3}[\Pi^\star]$ with those obtained when only
Assumption 2 is maintained.
Social psychology suggests that when survey respondents are asked questions relative to socially and
personally sensitive topics, they tend to underreport socially undesirable behaviors and attitudes, and
overreport socially desirable ones. This suggestion is often supported by validation studies. In the context of
questions of the type described above, these studies often document that $\Pr(w = j | x = j) \ge \Pr(w = j+1 | x = j+1)$,
$\forall j \in \tilde X \subseteq X$. This is the case for example when survey respondents are asked about their participation
in welfare programs, and $j = 1$ indicates non-participation, while $j = 2$ indicates participation, or when they
are asked about their employment status, and $j = 1, 2$ indicates, respectively, employed or not in the labor
force, while $j = 3$ indicates unemployed.
9
When drawing inference on $P(x = j)$, $j \in \tilde X$, we can always define $\pi_{jj} = \pi$, and look at $\pi_{kk}$, $k \in \tilde X \setminus \{j\}$, as deviations from $\pi$.
Suppose that the set $X \equiv \{1, 2, \dots, J\}$ can be ordered according to the ‘social desirability’ of the values
that x can take, with $x = 1$ being the most desirable, and $x = J$ the least desirable. Suppose further
that the researcher believes that there is monotonicity in correct reporting. Then she can maintain the
following:
Assumption 4. $\Pr(w = j | x = j) \ge \Pr(w = j+1 | x = j+1)$, $\forall j \in X \setminus \{J\}$, $\Pr(w = J | x = J) \ge 1-\lambda \ge 0$,
where $\lambda$ is strictly less than 1 if a non-trivial upper bound on the probability of a data error is available. When
this assumption holds, $H^E[\Pi^\star]$ is defined as
Example 6 in Section 3.4 illustrates the identifying power of Assumption 4, by comparing the identification
regions obtained using the nonlinear programming method with $H^E[\Pi^\star] = H^{E,4}[\Pi^\star]$ with those obtained
when only Assumption 2 is maintained.
When x and w are dichotomous variables, the identifying power of Assumptions 3 and 4 can be more easily
appreciated, since the bounds on $H[P^x]$ can be derived explicitly. This section shows how. It then provides
numerical examples of the identification regions obtained under Assumptions 2, 3 and 4, both for the case of
$J = 2$ and 3.
Let X ¼ f1; 2g. The problem of misclassification of a dichotomous variable has received much attention
in the econometric, statistical, and epidemiological literature. It is in the context of misclassified dichoto-
mous variables that most of the precedents to the use of restrictions on the misclassification probabilities take
place.
To start, suppose that Assumption 3 holds. In the related literature it has often been assumed that
$\Pr(w = 1 | x = 2) = \Pr(w = 2 | x = 1)$ and additionally that these misclassification probabilities are less than $\frac{1}{2}$ (see, e.g.,
Klepper, 1988; Card, 1996). Notice that with dichotomous variables Assumption 3 implies that Eq. (1.1)
can be rewritten as
$$\begin{bmatrix} \Pr(w = 1) \\ \Pr(w = 2) \end{bmatrix} = \begin{bmatrix} \pi^\star & 1-\pi^\star \\ 1-\pi^\star & \pi^\star \end{bmatrix} \begin{bmatrix} \Pr(x = 1) \\ \Pr(x = 2) \end{bmatrix}.$$
Hence, the identification region $H[P^x]$ can be inferred from the identification region
$$H[\Pr(x = 1)] = \{\pi^x_1 : P^w_1 = \pi\pi^x_1 + (1-\pi)(1-\pi^x_1),\ \pi \in H^3[\Pi^\star]\},$$
where $H^3[\Pi^\star]$ was defined in Example 2. Notice that if $\pi = \frac{1}{2}$, $P^w_1 = \frac{1}{2}$; in this case, $P(w|x) = P(w)$, i.e., x and w
are statistically independent, and obviously knowledge of $P(w)$ does not provide any information on $P(x)$.
If $P^w_1 \ne \frac{1}{2}$, then $\pi \ne \frac{1}{2}$. The following proposition characterizes explicitly $H[\Pr(x = 1)]$.
The fact that if $\lambda \ge \frac{1}{2}$, $H[\Pr(x = 1)]$ can be given by two disjoint intervals is a direct consequence of the
possible disconnectedness of $H[\Pi^\star]$ arising when one assumes constant probability of correct report, and is
described in Section 2 and in Example 2.
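The possibility of two disjoint intervals can be seen numerically. In the binary case with constant probability of correct report pi, the first equation reads pw1 = pi*p + (1 - pi)*(1 - p), so for pi different from 1/2 the implied value is p = (pw1 - (1 - pi)) / (2*pi - 1). The Python sketch below traces p over a grid of pi in [1 - lam, 1] and keeps the values that are valid probabilities. The function name, the grid, and the inputs are assumptions of this illustration, and it does not reproduce the closed-form characterization given in the paper's proposition.

```python
def region_constant_report(pw1, lam, steps=2000):
    """Values p = Pr(x = 1) in [0, 1] that solve pw1 = pi*p + (1 - pi)*(1 - p)
    for some pi on a grid over [1 - lam, 1]."""
    out = []
    for k in range(steps + 1):
        pi = (1.0 - lam) + lam * k / steps
        denom = 2.0 * pi - 1.0
        if abs(denom) < 1e-12:   # pi = 1/2: w carries no information on x
            continue
        p = (pw1 - (1.0 - pi)) / denom
        if 0.0 <= p <= 1.0:
            out.append(p)
    return sorted(out)


if __name__ == "__main__":
    # Hypothetical inputs: a weak bound (lam >= 1/2) versus a tight one.
    weak = region_constant_report(0.3, 0.8)
    tight = region_constant_report(0.3, 0.2)
    print(min(weak), max(weak))
    print(min(tight), max(tight))
```

With lam above 1/2 the grid admits values of pi on both sides of 1/2, and each side maps into its own interval for p; with lam below 1/2 only one branch survives.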
Suppose now that Assumption 4 holds. Also in this case the identification region $H[P^x]$ can be inferred from
the identification region
$$H[\Pr(x = 1)] = \{\pi^x_1 : P^w_1 = \pi_{11}\pi^x_1 + (1-\pi_{22})(1-\pi^x_1),\ (\pi_{11}, \pi_{22}) \in H^4[\Pi^\star]\}, \tag{3.6}$$
where $H^4[\Pi^\star]$ was defined in Example 3. Notice that again if $\pi_{11} = \pi_{22} = \frac{1}{2}$, $P^w_1 = \frac{1}{2}$; in this case,
$P(w|x) = P(w)$, i.e., x and w are statistically independent, and obviously knowledge of $P(w)$ does not provide
any information on $P(x)$. If $P^w_1 \ne \frac{1}{2}$, then $\pi_{11}$ and $\pi_{22}$ cannot be jointly equal to $\frac{1}{2}$. The following proposition
characterizes explicitly $H[\Pr(x = 1)]$.
To conclude this section, I illustrate the identifying power of Assumption 3 (both for the case in which
$\tilde X \subset X$ and $\tilde X = X$) and Assumption 4, when $J = 3$. I compare the identification regions $H[\Pr(x = j)]$, $j \in X$,
$H[P^x]$ and $H[E(x)]$ obtained using the nonlinear programming method with $H^E[\Pi^\star] = H^{E,3}[\Pi^\star]$ and with
$H^E[\Pi^\star] = H^{E,4}[\Pi^\star]$ with those obtained when only Assumption 2 is maintained.
Table 1
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case, with dichotomous
variables, for different values of l
Maintained assumptions
Table 2
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case
X~ ¼ f1; 2g X~ ¼ X
Prðx ¼ 1Þ ½0:180; 0:425 ½0:180; 0:415 ½0:235; 0:415 ½0:235; 0:415 0.3
Prðx ¼ 2Þ ½0:434; 0:687 ½0:525; 0:687 ½0:525; 0:687 ½0:551; 0:687 0.6
Prðx ¼ 3Þ ½0:000; 0:138 ½0:000; 0:138 ½0:000; 0:138 ½0:000; 0:137 0.1
EðxÞ ½1:575; 1:955 ½1:585; 1:955 ½1:585; 1:899 ½1:585; 1:899 1.8
Example 6. Let $X = \{1, 2, 3\}$, $\lambda = 0.2$, $\pi^\star = 0.85$, $[\Pr(x = j), j \in X] = [0.3\ \ 0.6\ \ 0.1]^T$, and suppose that
$\pi^\star_{21} = 0.11$, $\pi^\star_{12} = 0.13$, $\pi^\star_{13} = 0.04$, so that $P^w = [0.34\ \ 0.55\ \ 0.11]^T$; with these values, $E(x) = 1.8$. Table 2
gives the identification regions for $\tau[P^x] = \Pr(x = j)$, $j \in X$, and for $\tau[P^x] = E(x)$, when Assumption 2 alone is
maintained, when Assumptions 2 and 3 are jointly maintained with $\tilde X = X$ and with $\tilde X = \{1, 2\}$, and when
Assumptions 2 and 4 are jointly maintained. The improvement in the upper bound on $\Pr(x = 1)$ comes from
the second equation of system (1.1); indeed $P^w_1 + P^w_2 = 0.89 > 0.885 = (1-\lambda) + \frac{\lambda}{1-\lambda} P^w_1$. Fig. 2 plots the
identification regions $H[P^x]$ obtained under the different assumptions, mapping them in $R^2$.
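The numbers in Example 6 can be checked with a few lines of Python. The sketch below builds the 3 by 3 matrix of misclassification probabilities from the stated diagonal value 0.85 and the three stated off-diagonal entries, filling in the remaining entries from the requirement that each column sums to one (an inference made here, not a statement in the example). It then recomputes the distribution of w, the mean of x, and the inequality that the example says drives the improved upper bound on Pr(x = 1). Function and variable names are hypothetical.

```python
def observed_distribution(Pi, px):
    """P^w = Pi * pi^x, where Pi[i][j] = Pr(w = i+1 | x = j+1)."""
    return [sum(Pi[i][j] * px[j] for j in range(len(px))) for i in range(len(Pi))]


def condition_34(pw_j, pw_k, lam):
    """Inequality P^w_j + P^w_k > (1 - lam) + lam / (1 - lam) * P^w_j for one pair (j, k)."""
    return pw_j + pw_k > (1.0 - lam) + lam / (1.0 - lam) * pw_j


LAM = 0.2
PX = [0.3, 0.6, 0.1]
# Diagonal 0.85 and the entries (2,1), (1,2), (1,3) come from Example 6;
# the other off-diagonal entries are implied by each column summing to one.
PI = [
    [0.85, 0.13, 0.04],
    [0.11, 0.85, 0.11],
    [0.04, 0.02, 0.85],
]

if __name__ == "__main__":
    pw = observed_distribution(PI, PX)
    print([round(v, 2) for v in pw])                    # distribution of w, rounded
    print(sum((j + 1) * p for j, p in enumerate(PX)))   # E(x)
    print(condition_34(round(pw[0], 2), round(pw[1], 2), LAM))
```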
4. Estimation and inference for the distribution of pension plan types in the US
To illustrate estimation of the bounds and construction of the confidence sets, I consider data on the
distribution of pension plan characteristics in the American population age 51–61. The data are based on
household interviews obtained in the Health and Retirement Study (HRS), a longitudinal, nationally
representative study of older Americans, which in its base year of 1992 surveyed 12,652 individuals from 7,607
households, with at least one household member born between 1931 and 1941. The survey has been updated
every two years since 1992, and in 1998 a new cohort of 2,529 individuals born between 1942 and 1947
(so-called ‘War Babies’) was added to the HRS sample. I use data from the first HRS wave and from the War
Babies wave, focusing on the information collected on pension plan characteristics for people age 51–61 and
Table 3
Percentage with self-reported plan type conditional on firm report of plan type, for respondents reporting pension coverage on current job
with a matched employer plan description
DB DC Both
Sample size: 2,907. Source: Gustman and Steinmeier (2001, Table 6C).
employed at the time of the survey. This provides two nationally representative cross-sections of the
population of interest. The question to be addressed is:
How did the distribution of pension plan types in the population of currently employed Americans, age
51–61, change between 1992 and 1998?
Three pension plan types are possible: DB, DC, and plans incorporating features of both (Both). DB and
DC plans differ greatly in their characteristics. As described by Gustman et al. (2000), in a DB pension the
benefit formula is specified by the plan sponsor, usually as a function of the worker’s highest salary, years of
service, and retirement age. Typically such plans reduce the benefit amount for retirement prior to the so-
called normal retirement age, and are financed by employer (pre-tax) contributions. DC plans do not specify
the retirement benefit, but they set how much is contributed into the account each year the worker remains
with the plan. Then the benefit payout is determined at retirement, as a function of how much has accumulated
in the worker’s account. The plan type can affect several pension-related variables, including pension wealth
and pension accrual. For example, there are DB plans in which an additional year of service is rewarded by
greater retirement benefits up to the firm’s early retirement age. Then the benefit accrual profile may flatten
out, and even become negative, if retirement is delayed further. By contrast, DC plans tend to be actuarially
neutral with regard to the retirement age, rewarding delayed retirement more monotonically.
It is then of interest to learn how the distribution of pension plan types has changed over time, as a
preliminary step before studying the relation between pension incentives and retirement and saving behavior.
The HRS data can provide valuable information in this direction. However, there is evidence that workers are
particularly misinformed about their pension plans’ characteristics, and it is therefore not obvious how to
make use of their reported pension plan descriptions to draw the inference of interest. Gustman and
Steinmeier (2001) linked data from the first HRS wave with restricted data from the Social Security
Administration and employer-provided pension plan descriptions, and documented that individuals with
matched data (approximately 51% of the entire HRS sample and 67% of currently employed respondents)
approaching retirement age are remarkably misinformed with regard to their pension plans’ characteristics.
Their results are reported in Table 3, and suggest that, overall, approximately 49% of the currently employed
individuals with matched data correctly identify their pension plan type, the remaining 51% providing a
wrong report.
For the individuals in the first HRS wave without a matched pension (33% of the sample) it is difficult to
determine the true plan type: on one side, Gustman and Steinmeier (2001) document that the sub-sample
without a matched pension is different from the sub-sample with a matched pension; on the other side, the
evidence for the sub-sample with matched pension casts doubts on the reliability of the self-reports. Moreover,
linked data are not available for individuals in subsequent waves, or for individuals in the War Babies wave.10
Yet, the results of Gustman and Steinmeier’s (2001) analysis provide information on the misreporting pattern,
10
Additionally, employer provided pension plan descriptions are not publicly accessible by HRS users. In particular, such data are not
available for the analysis carried out in this paper.
Table 4
True fractions of pension plan types for the subset of respondents with matched data for 1992, as calculated by Gustman and Steinmeier
(2001), Table 6A, and reported fractions of pension plan types for 1992 and 1998 (Author’s calculations)

True, 1992 (matched data)                   Reported, 1992                         Reported, 1998
$\Pr_t(x = 1 | s = 1)$   0.48   [0.46, 0.50]     $\Pr_t(w = 1)$   0.44   [0.42, 0.45]     0.28   [0.25, 0.30]
$\Pr_t(x = 2 | s = 1)$   0.21   [0.19, 0.22]     $\Pr_t(w = 2)$   0.30   [0.29, 0.32]     0.38   [0.35, 0.41]
$\Pr_t(x = 3 | s = 1)$   0.31   [0.29, 0.33]     $\Pr_t(w = 3)$   0.26   [0.24, 0.27]     0.34   [0.31, 0.37]
Sample size N = 2,907                       Sample size N = 4,244                  N = 1,124
and such information can be exploited through the direct misclassification approach to draw inference on the
question of interest.
In all that follows I assume that the HRS respondents correctly report whether they are covered by a
pension,11 and I take firm reported plan types to be the ‘true’ plan types. I also ignore the observations with
missing data (about 2% of the sample). Let $x = 1$ if the individual has a DB plan, $x = 2$ if the individual has a
DC plan, and $x = 3$ if the individual has a plan combining features of both, so that $X \equiv \{1, 2, 3\}$. As before,
$w \in X$ denotes the reported pension plan type. Let $P^{w,t} \equiv [\Pr_t(w = j),\ j \in X]$ and $P^{x,t} \equiv [\Pr_t(x = j),\ j \in X]$
denote, respectively, the vectors of fractions of reported pension plan types and true pension plan types at time
$t = 1992, 1998$. For the respondents in the first HRS wave, let $s_l = 1$ denote the fact that individual $l \in L_{1992}$
has a matched pension plan description, $s_l = 0$ otherwise, and denote by $\Pi^{*1}_{1992}$ the matrix of misclassification
probabilities that maps the true pension plan types into the reported types for individuals with matched
pension plan descriptions. Let $\Pi^{*0}_{1992}$ denote the matrix of misclassification probabilities for the respondents in
the first HRS wave without a matched plan description, and let $\Pi^{*}_{1998}$ denote the matrix of misclassification
probabilities for the entire sample of respondents in the War Babies wave. Table 3 reveals, up to statistical
considerations, $\Pi^{*1}_{1992}$. From the HRS data and from Gustman and Steinmeier’s (2001) results one can learn
$P^{w,1992}$, $P^{w,1998}$, and $[\Pr_{1992}(x = j \mid s = 1),\ j \in X]$. These values are reported in Table 4, along with 95%
confidence intervals.
One might expect the misclassification pattern reported by Gustman and Steinmeier (2001) to hold for the
entire set of respondents to the 1992 HRS survey. On the other hand, one might expect that the
misclassification structure mapping true pension plan types into reported types changes over time, so that
$\Pi^{*1}_{1992}$ can help in constructing $H[\Pi^{*}_{1998}]$, but not reduce this set to a singleton. However, one might as well be
tempted to entertain assumptions strong enough to achieve point identification of the quantity of interest. To
test the credibility of these conjectures, I examine the following assumptions:
Assumption E1 (No selection). $\Pi^{*}_{1992} = \Pi^{*1}_{1992}$.
Assumption E2 (No selection and no variation over time). $\Pi^{*}_{1998} = \Pi^{*1}_{1992}$.
The first assumption states that the misreporting pattern for respondents in the first HRS wave with
matched pension plan description holds for the entire sample of the first HRS wave. The second assumption
states that the misreporting pattern for the respondents in the War Babies wave is the same as that for the
respondents with matched data in the first HRS wave. When these assumptions are maintained, $\Pi^{*}_{1992}$ and
$\Pi^{*}_{1998}$ are identified, and, since $\Pi^{*1}_{1992}$ is non-singular, one can use the equation $p^x = \Pi^{-1} P^w$ to attempt to learn
$[\Pr_t(x = j),\ j \in X]$, $t = 1992, 1998$. Table 5 reports the results of such procedure, along with 95% bootstrap
confidence intervals. The data reject the assumption that $\Pi^{*}_{1998} = \Pi^{*1}_{1992}$: the vector obtained from solving
$(\Pi^{*1}_{1992})^{-1} P^{w,1998}$ does not generate a valid probability measure. In particular, the first element of the implied
vector is negative and its 95% confidence interval does not cover zero, and the last element is greater than
one. Hence, point identification of $P^{x,1998}$ through Assumption E2 is not possible. On the other hand, the data
do not reject the assumption that $\Pi^{*}_{1992} = \Pi^{*1}_{1992}$, despite the possible selection problem. In all that follows I
maintain Assumption E1 and focus attention on the problem of inferring $H[P^{x,1998}]$. Of course,
Assumption E1 can be relaxed, and $H[P^{x,1992}]$ can be estimated under weaker assumptions using the direct
misclassification approach.

Table 5
Implications of Assumption E1 (no selection) and Assumption E2 (no selection and no variation over time) for the identification
regions of $[\Pr_t(x = j),\ j \in X]$, $t = 1992, 1998$
| Maintained assumptions | t = 1992: No selection | t = 1998: No selection and no variation over time |

11 This assumption is based on Gustman and Steinmeier’s (2001) comparison between people’s reports on their pension coverage in both
the 1992 and 1994 waves of the HRS. This comparison shows that 93% of the respondents who declared to be covered by a pension or to
be not covered by a pension in 1992 give the same answer in 1994. Of the remaining 7%, approximately 80% are individuals who declared
not to be covered by a pension in 1992 but to be covered in 1994.
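The check applied to Assumption E2 (solve $p^x = \Pi^{-1} P^w$, then ask whether the result is a probability vector) can be sketched in a few lines. This is only an illustration: the matrix `PI_HYPOTHETICAL` and both report vectors are made-up numbers, not the entries of Table 3 or Table 4, and the function names are my own.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs]; copied so the inputs are not modified.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def is_probability_vector(p, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to one (up to tol)."""
    return all(-tol <= v <= 1 + tol for v in p) and abs(sum(p) - 1) <= tol


# Hypothetical misclassification matrix: entry [i][j] stands for
# Pr(w = i+1 | x = j+1), so each column sums to one.
PI_HYPOTHETICAL = [
    [0.6, 0.2, 0.3],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.5],
]

pw_consistent = [0.42, 0.32, 0.26]    # generated by p^x = (0.5, 0.3, 0.2)
pw_inconsistent = [0.10, 0.45, 0.45]  # made-up reports this matrix cannot generate

px_ok = solve_linear_system(PI_HYPOTHETICAL, pw_consistent)
px_bad = solve_linear_system(PI_HYPOTHETICAL, pw_inconsistent)
print(px_ok, is_probability_vector(px_ok))
print(px_bad, is_probability_vector(px_bad))
```

When the implied vector has a negative entry, as in the second call, the maintained matrix is incompatible with the observed reports, which is the sense in which the data reject Assumption E2. The sketch ignores sampling variability, which the paper handles with bootstrap confidence intervals.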
The main assumption that I maintain throughout the entire analysis, and that I use to exploit part of the
information in $\Pi^{*1}_{1992}$ to learn $H[P^{x,1998}]$, is the following:
Assumption E3 (No reduction in awareness). $\pi_{jj,1998} \ge \pi_{jj,1992}$, $\forall j \in X$.
This assumption says that the fraction of individuals correctly identifying their pension plan type does not
decline over time. This in turn implies that lower bounds on the probability of correct report in 1992 provide
lower bounds on the probability of correct report in 1998. Assumption E3 is motivated by the observation that
in recent years the Social Security Administration and the Department of Labor have increasingly expanded
their efforts to improve individuals’ knowledge about pensions and about retirement saving in general (see
Gustman and Steinmeier, 2001 for a summary of recent interventions).
I now introduce two sets of assumptions, which I entertain along with Assumption E3 to construct the set
$H[\Pi^{*}_{1998}]$ and derive $H[P^{x,1998}]$. Of course, different empirical researchers might hold disparate beliefs about
which of the assumptions in Cases 1 and 2 hold; moreover, they might bring to bear different prior
information.
The identification regions for $H[P^{x,1998}]$ are plotted in Fig. 3, along with their 95% confidence sets. The
identification regions $H[\Pr_{1998}(x = j)]$, $j \in X$, are reported in Table 6, again with their 95% confidence
intervals.
Case 1:
$$H[\Pi^{*}_{1998}] = H^{P}[\Pi^{*}] \cap \{\Pi : \pi_{11} = \pi_{22} \ge 0.54;\ \pi_{22} \ge \pi_{33} \ge 0.35;\ \pi_{21} \le \pi_{12};\ \pi_{31} \le \pi_{13};\ \pi_{23} \le \pi_{13}\}.$$
Case 1 maintains Assumption E3 and builds on Assumption E1. I assume that certain of the findings of
Gustman and Steinmeier (2001) for matched respondents in 1992 are informative about respondents in 1998. I
assume that the probability of correct report for 1998 respondents who truly have a DB or a DC plan is at
least as large as the corresponding probability for 1992 respondents. I also assume that persons with DB and
DC pensions have the same probabilities of correct reports, these being at least as large as the probability of
correct report by those whose pensions are of the Both type. This assumption is motivated by Table 2, which
shows this pattern for 1992 respondents.
I also assume that various other features of Table 2 carry over to respondents in the War Babies wave. I
assume that persons who truly have a pension plan of the Both type report their plan as DB more often than
the reverse pattern, where persons with DB plans report themselves as having a plan of the Both type. I assume
that persons who truly have a DC plan report a DB plan more often than individuals with a DB plan report a
DC one. And I assume that persons who truly have a plan of the Both type report a DB plan more often than
a DC one. These assumptions are expressed through the inequalities $\pi_{21} \le \pi_{12}$, $\pi_{31} \le \pi_{13}$, $\pi_{23} \le \pi_{13}$.
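To make the Case 1 restrictions concrete, the following sketch tests whether a candidate 3x3 matrix satisfies them. It checks only that the matrix is column-stochastic and obeys the Case 1 equalities and inequalities; it does not check the remaining requirement for membership in $H^P[\Pi^*]$, namely that some probability vector $p^x$ reproduces the observed $P^w$. The function name and the example matrices are hypothetical.

```python
def satisfies_case1(pi, tol=1e-9):
    """Check column-stochasticity and the Case 1 restrictions for a 3x3 matrix.

    pi[i][j] stands for pi_{i+1,j+1} = Pr(w = i+1 | x = j+1).
    """
    n = 3
    entries_ok = all(-tol <= pi[i][j] <= 1 + tol for i in range(n) for j in range(n))
    columns_ok = all(abs(sum(pi[i][j] for i in range(n)) - 1) <= tol for j in range(n))
    p11, p22, p33 = pi[0][0], pi[1][1], pi[2][2]
    p12, p13, p21, p23, p31 = pi[0][1], pi[0][2], pi[1][0], pi[1][2], pi[2][0]
    restrictions_ok = (
        abs(p11 - p22) <= tol      # pi_11 = pi_22
        and p22 >= 0.54 - tol      # common value at least 0.54
        and p22 >= p33 - tol       # pi_22 >= pi_33
        and p33 >= 0.35 - tol      # pi_33 >= 0.35
        and p21 <= p12 + tol       # pi_21 <= pi_12
        and p31 <= p13 + tol       # pi_31 <= pi_13
        and p23 <= p13 + tol       # pi_23 <= pi_13
    )
    return entries_ok and columns_ok and restrictions_ok


candidate = [[0.60, 0.25, 0.40],
             [0.20, 0.60, 0.15],
             [0.20, 0.15, 0.45]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
too_low = [[0.50, 0.25, 0.40],
           [0.30, 0.50, 0.15],
           [0.20, 0.25, 0.45]]

print(satisfies_case1(candidate), satisfies_case1(identity), satisfies_case1(too_low))
```

Note that the identity matrix (no misreporting at all) passes these checks: Case 1 bounds the probabilities of correct report from below but places no lower bound on the misclassification probabilities.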
Fig. 3. Identification regions and confidence sets for $H[P^{x,1998}]$ under different assumptions. Two panels, Case 1 and Case 2; each shows $H[P^{x,1998}]$ and its confidence set (labeled CNH[P^{x,1998}] in the figure) inside the simplex $\Delta^2$ with vertices [1 0 0], [0 1 0], and [0 0 1].
Table 6
Identification regions in Cases 1–2 for $\Pr_{1998}(x = j)$, and point estimates for $\Pr_{1992}(x = j)$; the second entry in each cell is the 95% confidence interval

| | $x = 1$ | $x = 2$ | $x = 3$ |
| t = 1992 | 0.46 [0.36, 0.60] | 0.37 [0.33, 0.41] | 0.17 [0.02, 0.28] |
| Case 1, 1998 | [0.00, 0.42] [0.00, 0.44] | [0.11, 0.72] [0.10, 0.87] | [0.00, 0.89] [0.00, 0.91] |
| Case 2, 1998 | [0.00, 0.28] [0.00, 0.34] | [0.35, 0.61] [0.28, 0.80] | [0.11, 0.50] [0.00, 0.67] |
The first panel of Fig. 3 shows the estimate of $H[P^{x,1998}]$ obtained in Case 1, and its confidence set, mapped
in $\mathbb{R}^2$. Interestingly, these sets are non-convex. For the construction of the confidence set, I estimated $P^{w,1998}$
using sample means, and took as estimates of the lower bounds in $H^{E}[\Pi^{*}]$ the values $\mu_{1,n}$, $\mu_{2,n}$ in the (2,2) and
(3,3) entries of Table 3. These estimates are borrowed from Gustman and Steinmeier (2001) and are based on
validation data (respondents to the 1992 wave with matched pension plan descriptions) independent of the
1998 data, with $n = 2{,}907$. For the construction of the confidence ellipsoid for $[P^{w,1998}_1, P^{w,1998}_2, \mu_1, \mu_2]$ I used
$\kappa = N/n = 1{,}124/2{,}907$. The estimates of $\Pr_{1992}(x = 1)$ and $H[\Pr_{1998}(x = 1)]$ reported in Table 6 suggest that the fraction
of individuals having a DB plan should have declined between 1992 and 1998. However, the confidence set for
$H[\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1)]$ covers negative numbers, and therefore the hypothesis
$\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) < 0$ cannot be rejected. This shows that relatively mild restrictions yield a strong conclusion
regarding the question of interest, although more assumptions are needed to obtain statistical significance.
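Case 2:
$$H[\Pi^{*}_{1998}] = H^{P}[\Pi^{*}] \cap \left\{\Pi : \begin{array}{l} \pi_{11} = \pi_{22} \ge \pi_{33} \ge 0.54; \\ \pi_{21} \le \pi_{12};\ \pi_{31} \le \pi_{13};\ \pi_{23} \le \pi_{13}; \\ \pi_{21} \ge 0.10;\ \pi_{ij} \ge 0.15 \text{ for all other } i, j \in X,\ i \ne j \end{array}\right\}.$$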
Case 2 builds on Case 1, as it retains all the assumptions maintained there. However, it is crucially set apart
from the previous case in that it requires a lower bound on each probability of misclassification. This in turn
implies that, given any true pension plan type, the probability of correct report is necessarily less than
one. This assumption is motivated by the large amount of misreporting of pension plan types which appears in
Table 3, and which is documented at length by Gustman and Steinmeier (2001). Additionally, $\pi_{33}$ is required to
have the same lower bound as $\pi_{11}$ and $\pi_{22}$. This is motivated by the large number of information campaigns on
DC plans (in particular 401k) that characterized the mid to late 1990s.
Under these assumptions, the estimate of $H[P^{x,1998}]$ shrinks further. This allows one to conclude that the
fraction of individuals having DB plans has decreased between 1992 and 1998; in particular,
$\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) \ge 0.18$. This in turn implies that the fraction of individuals having either DC
plans or plans incorporating features of both has increased sharply between 1992 and 1998. The confidence set
for $H[\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1)]$ does not contain negative numbers, so that the hypothesis
$\Pr_{1992}(x = 1) - \Pr_{1998}(x = 1) < 0$ can be rejected. The confidence set for $H[P^{x,1998}]$ in Case 2 is constructed again by
estimating $P^{w,1998}$ using sample means, and taking as estimate of the lower bound for $\pi_{jj}$, $j = 1, 2, 3$, in $H^{E}[\Pi^{*}]$
the value $\mu_n$ in the (2,2) entry of Table 3. However, the lower bounds for the other parameters are treated as
constant, so that the confidence ellipsoid is constructed exclusively for the vector $[P^{w,1998}_1, P^{w,1998}_2, \mu]$.
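The bound on the change in the DB fraction follows from simple interval arithmetic on the Table 6 estimates. The sketch below treats the 1992 figure as a known point and ignores sampling variability, so it reproduces only the estimated regions, not the confidence sets; the function name is my own.

```python
def change_bounds(point_1992, region_1998):
    """Bounds on Pr_1992(x = 1) - Pr_1998(x = 1) given a 1992 point and a 1998 interval."""
    lower_1998, upper_1998 = region_1998
    # The largest 1998 value gives the smallest decline, and vice versa.
    return (round(point_1992 - upper_1998, 2), round(point_1992 - lower_1998, 2))


case1_change = change_bounds(0.46, (0.00, 0.42))
case2_change = change_bounds(0.46, (0.00, 0.28))
print(case1_change, case2_change)
```

In both cases the estimated lower bound on the decline is positive; under Case 2 it equals the 0.18 quoted in the text.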
By comparison, if one did not use all the information provided by Gustman and Steinmeier’s (2001)
analysis, but imposed only a uniform lower bound on the probability of correct report (Assumption 2), the
results of HM would apply. If one assumed $1 - \lambda = 0.35$, one would learn that $\Pr_{1998}(x = 1) \in [0, 0.79]$,
$\Pr_{1998}(x = 2) \in [0, 1]$, $\Pr_{1998}(x = 3) \in [0, 0.97]$. If one assumed $1 - \lambda = 0.54$, one would learn that
$\Pr_{1998}(x = 1) \in [0, 0.51]$, $\Pr_{1998}(x = 2) \in [0, 0.71]$, $\Pr_{1998}(x = 3) \in [0, 0.63]$. These bounds do not allow one to
identify the sign of the change in the fraction of individuals having a DB plan.
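A minimal sketch of these uniform-bound calculations, using the closed form stated in the proof of Proposition 3(b): $\max\{(P^w_j - \lambda)/(1 - \lambda), 0\} \le \Pr(x = j) \le \min\{1, P^w_j/(1 - \lambda)\}$. The inputs below are the rounded 1998 reported fractions from Table 4, so the output can differ in the second decimal from the figures quoted in the text, which were presumably computed from unrounded data. The function name is hypothetical.

```python
def hm_bounds(pw_j, lam):
    """Bounds on Pr(x = j) when every probability of correct report is at least 1 - lam."""
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    lower = max((pw_j - lam) / (1 - lam), 0.0)
    upper = min(1.0, pw_j / (1 - lam))
    return lower, upper


reported_1998 = [0.28, 0.38, 0.34]  # Pr_1998(w = j), j = 1, 2, 3, from Table 4
for one_minus_lam in (0.35, 0.54):
    lam = 1 - one_minus_lam
    print(one_minus_lam, [hm_bounds(p, lam) for p in reported_1998])
```

The lower bound becomes informative only when the reported fraction exceeds $\lambda$, which never happens here; this is why all the intervals above start at zero.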
5. Extensions
The direct misclassification approach can be easily extended to drawing inference in the presence of multiple
misclassified variables, regression with misclassified outcome, regression with misclassified regressor, and
jointly missing and misclassified outcomes. Below I list briefly the modifications of the approach that allow
inference in each of these cases.
5.1. Multiple misclassified variables
In this case, the researcher simply has to redefine variables. Suppose that interest centers on features of
$P(x_1, x_2)$, $x_1 \in X_1 \equiv \{1, 2, \ldots, J_1\}$, $x_2 \in X_2 \equiv \{1, 2, \ldots, J_2\}$, $2 \le J_1, J_2 < \infty$, and the researcher observes only
$(w_1, w_2)$, a misclassified version of $(x_1, x_2)$. She can then construct random variables $s$ and $r$, taking values in
$S \equiv \{1, 2, \ldots, J_1 J_2\}$, and such that $s = (l - 1)J_1 + j$ if $x_1 = j$ and $x_2 = l$, and $r = (k - 1)J_1 + i$ if $w_1 = i$ and
$w_2 = k$; the multiplier is the number of values of the first coordinate, $J_1$, so that the map is one-to-one onto $S$. She can then write the analogue of Eq. (1.1) for $r$ and $s$, and use the method proposed here to draw the
inference of interest.
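The redefinition of variables amounts to flattening a pair of categories into a single index. In the sketch below the multiplier is $J_1$, the number of values taken by the first coordinate, which is what makes the map one-to-one onto $\{1, \ldots, J_1 J_2\}$ when $j$ ranges over $\{1, \ldots, J_1\}$. The function names are my own.

```python
def encode(j, l, J1):
    """Map (x1, x2) = (j, l), with j in 1..J1 and l >= 1, to a single index."""
    return (l - 1) * J1 + j


def decode(s, J1):
    """Invert encode: recover (j, l) from the single index s."""
    l_minus_1, j_minus_1 = divmod(s - 1, J1)
    return j_minus_1 + 1, l_minus_1 + 1


J1, J2 = 3, 2
codes = [encode(j, l, J1) for l in range(1, J2 + 1) for j in range(1, J1 + 1)]
print(codes)
```

The same two functions apply to the reported pair $(w_1, w_2)$, so the joint problem becomes a single misclassified variable with $J_1 J_2$ categories.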
5.2. Regressions
(a) If interest centers on features of $P(x \mid s = s_0)$, where $s \in S$ is a perfectly observable discrete covariate with
$\Pr(s = s_0) > 0$, and the researcher has prior information on $\Pi^{*}_{s_0} \equiv \{\Pr(w = i \mid x = j, s = s_0)\}_{i,j \in X}$, the proposed
method can be applied directly, with the event $s = s_0$ conditioning all the probabilities involved.
(b) Consider now the case that interest centers on features of $P(y \mid x)$, where $y$ is a perfectly observed outcome
variable. The problem of regression with misclassified covariates has been widely studied (e.g., Aigner, 1973;
Klepper, 1988; Bollinger, 1996; Card, 1996; Kane et al., 1999; Hu, 2006; Mahajan, 2006), and point
identified or interval identified estimators have been proposed under specific sets of assumptions.
The direct misclassification approach can be used to estimate the smallest point and the largest point
in the identification region of (for example) a mean regression under any set of assumptions. Molinari
(2003) shows how. Here I present the ideas for the special case in which the probability of correct report is
greater than $\frac{1}{2}$ for each of the values that $x$ can take (and any additional assumption might hold). In
this case any $\Pi \in H[\Pi^{*}]$ is of full rank, so that $p^x = \Pi^{-1} P^w$. This implies that a feasible value of
$[\Pr(x = j \mid w = i),\ i, j \in X]$ can be uniquely expressed as a function of $\Pi$. Hence, for each $\Pi \in H[\Pi^{*}]$, I can use
the results of HM to obtain sharp bounds for $E(y \mid w = i, x = j)$ and use the Law of Total Probability to infer
sharp $\Pi$-dependent bounds on $E(y \mid x = j)$. Taking the infimum and the supremum, respectively, of the
smallest and largest points in these bounds for $\Pi \in H[\Pi^{*}]$ gives the smallest and the largest point in
$H[E(y \mid x = j)]$, $j \in X$.
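The step from a full-rank $\Pi$ to the conditional probabilities $\Pr(x = j \mid w = i)$ is an application of Bayes' rule, $\Pr(x = j \mid w = i) = \pi_{ij}\, p^x_j / P^w_i$, with $p^x = \Pi^{-1} P^w$. A two-category sketch follows; the matrix and the reported fractions are hypothetical numbers chosen so that each probability of correct report exceeds 1/2, and the function names are my own.

```python
def invert_2x2(pi, pw):
    """Return p^x = Pi^{-1} P^w for a 2x2 misclassification matrix."""
    (a, b), (c, d) = pi
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Pi is singular")
    return [(d * pw[0] - b * pw[1]) / det, (a * pw[1] - c * pw[0]) / det]


def posterior_given_report(pi, pw):
    """Return q[i][j] = Pr(x = j+1 | w = i+1) = pi[i][j] * p^x_j / P^w_i."""
    px = invert_2x2(pi, pw)
    return [[pi[i][j] * px[j] / pw[i] for j in range(2)] for i in range(2)]


PI_EXAMPLE = [[0.9, 0.2],   # Pr(w = 1 | x = 1), Pr(w = 1 | x = 2)
              [0.1, 0.8]]   # Pr(w = 2 | x = 1), Pr(w = 2 | x = 2)
PW_EXAMPLE = [0.55, 0.45]   # hypothetical reported fractions

posterior = posterior_given_report(PI_EXAMPLE, PW_EXAMPLE)
print(posterior)
```

Each row of `posterior` is a distribution over the true category given one reported value; repeating the calculation over every $\Pi$ in the identification region traces out the $\Pi$-dependent weights used in the bounds on $E(y \mid x = j)$.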
This same argument has been proposed by Dominitz and Sherman (2006), who studied the problem of
inferring the distribution of test scores for truly English proficient students ($x = 1$), when only an imperfect
indicator of English proficiency is available ($w = 1$). They used a mixture model with verification and assumed
that students classified as English proficient ($w = 1$) are more likely to be truly English proficient ($x = 1$) than
students classified as limited English proficient ($w = 2$). In terms of misclassification probabilities, this
assumption translates into $\pi_{11} \ge P^w_1$.
5.3. Jointly missing and misclassified outcomes
The data available to the empirical researcher are often not only error ridden, but also incomplete. Consider
the example of survey respondents being asked about their pension plan type: not only can they report DB,
DC, or Both, but they can as well choose not to respond to the question. Let $w = J + 1$ denote this outcome.
Then system (1.1) needs to be enlarged to include the equation
$$\Pr(w = J + 1) = \sum_{j=1}^{J} \Pr(w = J + 1 \mid x = j)\,\Pr(x = j).$$
This simply implies that the set $H[\Pi^{*}]$ is a set of rectangular matrices. The identification regions $H[P^x]$ and
$H\{\tau[P^x]\}$ are still defined as in (2.2) and (2.3), and the nonlinear programming method can be used to
consistently estimate them. Of course, there are additional constraints, one coming from the $(J + 1)$th
equation in the above system, and the others from possible assumptions on the relationship between
misreporting and non-response.
6. Conclusions
This paper has studied the problem of drawing inference when a discrete variable is subject to classification
errors. This is a commonplace problem in surveys and elsewhere. The problem has long been conceptualized
through convolution and mixture models. This paper introduced the direct misclassification approach. The
approach is based on the observation that in the presence of classification errors, the relation between the
distribution of the ‘true’ but unobservable variable and its misclassified representation is given by a linear
system of simultaneous equations, in which the coefficient matrix is the matrix of misclassification
probabilities.
While this matrix is unknown, validation studies, economic theory, cognitive and social psychology, or
knowledge of the circumstances under which the data have been collected can provide information on the
misclassification pattern that has transformed the ‘true’ but unobservable variable into the observable but
possibly misclassified variable. The method introduced in this paper shows how to transform such prior
information into sets of restrictions on the (unknown) matrix of misclassification probabilities, and exploit
these restrictions to derive identification regions for any real functional of the distribution of interest. By
contrast, mixture models do not allow the researcher to easily exploit this type of prior information to learn
features of the distribution of interest. Convolution models, as usually implemented with the assumption of
independence between measurement error and ‘true’ variable, are not suited to analyze errors in discrete data.
The direct misclassification approach does not rely on any specific set of assumptions, but it can incorporate
into the analysis any prior information that the researcher might have on the misreporting pattern. In some
cases the implied identification regions have a simple closed form solution that allows for straightforward
estimation using sample analogs. When this is not the case, the identification regions can be estimated using
the nonlinear programming estimator introduced in this paper. Confidence sets that cover the true
identification region with probability at least equal to a prespecified confidence level can be constructed using
a simple procedure based on the inversion of a Wald statistic.
Acknowledgments
I am grateful to the Associate Editor, two anonymous reviewers, Tim Conley, Joel Horowitz, Rosa
Matzkin, and especially Chuck Manski for helpful comments and suggestions. I have benefitted from
discussions with T. Bar, G. Barlevy, L. Barseghyan, L. Blume, R. DiCecio, M. Goltsman, A. Guerdjikova,
G. Jakubson, N. Kiefer, R. Lentz, G. Menzio, B. Meyer, M. Peski, J. Sullivan, C. Taber, E. Tamer,
T. Tatur, and T. Vogelsang, and from the comments of seminar participants at Boston College, Chicago GSB,
Cornell, Duke, Georgetown, Pittsburgh, Penn, Penn State, Princeton, Purdue, Toronto, UCLA, UCL,
Virginia, and at the 2003 Southern Economic Association Meetings. All remaining errors are my own.
Research support from Northwestern University Dissertation Year Fellowship, the Center for Analytic
Economics at Cornell University, and the National Science Foundation Grant SES-0617482 is gratefully
acknowledged.
A.1.1. Proposition 1
Proof. Let $\Pi^1 \in H^{P}[\Pi^{*}]$. This means that $\exists\, \nu^1 \in \Delta^{J-1}$ such that $\Pi^1 \nu^1 = P^w$. Now observe that for any
$\nu \in \Delta^{J-1}$, $\tilde{\Pi}\nu = P^w$. Hence, for any $\alpha \in (0, 1)$ it holds that $(\alpha\Pi^1 + (1 - \alpha)\tilde{\Pi})\nu^1 = P^w$, and therefore
$(\alpha\Pi^1 + (1 - \alpha)\tilde{\Pi}) \in H^{P}[\Pi^{*}]$. To show that $H^{P}[\Pi^{*}]$ is not star convex with respect to any other of its
elements, consider a matrix $\Pi^1 \in H^{P}[\Pi^{*}]$ with $\Pi^1 \ne \tilde{\Pi}$. Because $\Pi^1 \ne \tilde{\Pi}$, it follows that there exists an $i \in X$
such that not all elements of the $i$th row of $\Pi^1$ are equal to $P^w_i$. Without loss of generality, let $i = 1$. Let
$\pi^1_{1j} > P^w_1 > 0$ (a similar argument works for the case that $\pi^1_{1j} < P^w_1$), and without loss of generality suppose $j = 1$.
Construct $\Pi^2$ as follows: $\pi^2_{\cdot 1} = P^w$, $\pi^2_{1k} = 1$ $\forall k \in X \setminus \{1\}$. Then $\Pi^2 \in H^{P}[\Pi^{*}]$. Let $\Pi^{\alpha} = \alpha\Pi^1 + (1 - \alpha)\Pi^2$. Then
for any $\alpha \in [0, 1 - P^w_1)$ it follows that $\Pi^{\alpha} \notin H^{P}[\Pi^{*}]$, because every element in the first row of the resulting
matrix is strictly greater than $P^w_1$. ∎
A.1.2. Proposition 2
For given vectors of positive probabilities Pw and positive constants l ½m1 ; . . . ; mq̄ , let Qðn; Pw ; lÞ denote
the value function in the nonlinear programming problem (2.7)–(2.8). Observe that QN ðnÞ ¼ Qðn; PwN ; ln Þ. Let
ðv1 ; P1 Þ be the maximizer for Qðn; Pw;1 ; l1 Þ, where Pw;1 ; l1 are arbitrary values of Pw ; l (recall that this problem
has always an optimal solution). I show that for different feasible arbitrary values Pw;2 ; l2 , the difference
jQðn; Pw;1 ; l1 Þ Qðn; Pw;2 ; l2 Þj is Op ðN 1=2 Þ. The strategy of this proof is similar to the one in Honore and
Lleras-Muney (2006), except that here some more complications arise due to the possible nonlinearity of some
of the constraints. This establishes that
p supn2DJ1 jQN ðnÞ QðnÞj
sup jQN ðnÞ QðnÞj ! 0 and ¼ op ð1Þ.
n2DJ1 N
The consistency result then follows from Manski and Tamer (2002, Proposition 5).
To simplify the notation, let q̄ ¼ q, and assume that q1 components of l are estimated for the greater-than-
or-equal constraints, q2 for the less-than-or-equal constraints, and q3 for the equality constraints,
q1 þ q2 þ q3 ¼ q. Let
( )
Pw;2
j m2l m1q1 þm m2q1 þq2 þs m1q1 þq2 þs
c1 ¼ min min w;1 ; min 1 ; min ; min ; min .
j P l2f1;...;q1 g m m2f1;...;q2 g m2q þm s2f1;...;q3 g m1q þq þs s2f1;...;q3 g m2q þq þs
j l 1 1 2 1 2
c1 P1 X0, and
This implies that 0oc1 p1. Let P
X
J X
J X
J
vj 1 p ij ¼ 1 c1 p1ij X1 p1ij X0; j ¼ 1; . . . ; J,
i¼1 i¼1 i¼1
X
J X
J Pw;2
j
vJþj Pw;2
j p ij xj ¼ Pw;2
j c1 p1ij xj X v1Jþj X0; j ¼ 1; . . . ; J.
i¼1 i¼1 Pw;1
j
Notice that jvj v1j jpJð1 c1 Þ and jvJþj v1Jþj jpð1 þ JÞð1c
c1 Þ.
1
Consider now the constraints defining H E ½P% . Let r ¼ maxðt; maxl rl Þ, where t is the degree of the
polynomial. Observe that if f l ðP1 ÞXm1l then v12Jþl ¼ 0; if f l ðP1 Þom1l then v12Jþl ¼ m1l f l ðP1 Þ. For
l ¼ 1; . . . ; q1 , let
8
>
if f l ðP1 ÞXm1l and f l ðPÞXm2
>
<
0 l;
m2 Þ
v2Jþl f l ðP1 Þ m1l ðf l ðPÞ 1
if f l ðP ÞXm and f l ðPÞom2 ;
1
(A.1)
l l l
>
>
: m2 f ðPÞ
if f l ðP 1
Þom1l :
l l
The suggested values of v2Jþl are feasible. In fact, if f l ðP1 ÞXm1l the implied v2Jþl is obviously non-negative. If
f l ðP1 Þom1l ,
2
¼ m2 f l ðc1 P1 ÞXml m1 c1 f l ðP1 ÞXc1 v1 X0,
v2Jþl ¼ m2l f l ðPÞ l
m1l l 2Jþl
where maxP2½0;1J 2 jf l ðPÞj is bounded because f l ðÞ is a continuous function on a compact set.
Regarding the less-than-or-equal constraints, observe that under Assumption C1 a monotone transforma-
tion of gm ðPÞ and mq1 þm leaves the constraint unaltered. Hence without loss of generality when gm ðÞ satisfies
Assumptions C1(i), let rm ¼ 1.
Now, notice that if gm ðP1 Þpm1q1 þm then v12Jþq1 þm ¼ 0; if gm ðP1 Þ4m1q1 þm then v12Jþq1 þm ¼ gm ðP1 Þ m1q1 þm . For
m ¼ 1; . . . ; q2 , let
8
if gm ðP1 Þpm1q1 þm and gm ðPÞpm 2
>
> 0 q1 þm ;
>
> !
>
>
>
< m1 1
1 m2
if gm ðP1 Þpm1q1 þm and gm ðPÞ4m 2
q1 þm gm ðP Þ þ 1þr
gm ðPÞ q1 þm q1 þm ;
v2Jþq1 þm c1
>
>
>
> 1
>
> 2
: c1þr gm ðPÞ mq1 þm
> if gm ðP1 Þ4m1q1 þm :
1
This choice of v2Jþq1 þm satisfies the constraint in (2.8). In fact, if gm ðPÞpm2
q1 þm the constraint is satisfied with
v2Jþq1 þm ¼ 0, and in the other cases
! !
2 2 1 2 1
mq1 þm gm ðPÞ þ v2Jþq1 þm Xmq1 þm gm ðPÞ þ 1þr gm ðPÞ mq1 þm ¼ 1 gm ðPÞX0,
c1 c11þr
2
where the last inequality follows because by Assumption C1 gm ðÞ is non-negative on ½0; 1J and 0oc1 p1 by
construction. Notice also that the suggested values of v2Jþq1 þm are feasible. In fact, if gm ðP1 Þpm1q1 þm the
1
implied v2Jþq1 þm is obviously non-negative, because gm ðPÞXgm ðPÞ. On the other hand, recalling that by
c1þr
1
m1q1 þm
construction c1 pminm2f1;...;q2 g , if gm ðP1 Þ4m1q1 þm ,
m2q1 þm
1 m2 1 1 2 1 1 2
v2Jþq1 þm ¼ gm ðPÞ q1 þm ¼ 1þr gm ðc1 P Þ mq1 þm X r gm ðP Þ mq1 þm
c11þr c1 c1
1 1
X v X0.
cr1 2Jþq1 þm
Moreover, by Assumption C1(i)
1
1
jv2Jþq1 þm v12Jþq1 þm jpjm2q1 þm m1q1 þm j þ 1þr gm ðPÞ gm ðP Þ
c1
! !
1 crþ1
p 1
M þ max gm ðPÞ ,
crþ1
1 P2½0;1J
2
where maxP2½0;1J 2 gm ðPÞ is bounded because gm ðÞ is a continuous function on a compact set. Finally, observe
that for the equality constraints the same calculations as above can be applied to hk ðPÞXmq1 þq2 þk and
hk ðPÞpmq1 þq2 þk , k ¼ 1; . . . ; q3 .
!
w;2 2 w;1 1 1 crþ1
Hence, for each n, Qðn; P ; l ÞX Qðn; P ; l Þ const 1
. Interchanging the role of Pw;1 and Pw;2
crþ1
1
!
rþ1
1 c2
yields Qðn; Pw;1 ; l1 ÞX Qðn; Pw;2 ; l2 Þ const , where
crþ1
2
( )
Pw;1
j m1l m2q1 þm m2q1 þq2 þs m1q1 þq2 þs
c2 ¼ min min w;2 ; min 2 ; min ; min ; min
j P l2f1;...;q1 g m m2f1;...;q2 g m1q þm s2f1;...;q3 g m1q þq þs s2f1;...;q3 g m2q þq þs
j l 1 1 2 1 2
Finally, under Assumption C2 the estimators PwN and ln are root-N consistent, so that
sup jQN ðnÞ QðnÞj ¼ Op ðN 1=2 Þ.
n2DJ1
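I first introduce and prove a lemma that is useful for the proof of some of the following propositions.
Lemma 1. Suppose that Assumption 2 holds, and that $P^w_j > \lambda$, $j \in X$. Then $\frac{P^w_j - \lambda}{1 - \lambda}$ is an admissible value of $p^x_j$,
and therefore solves the $j$th equation of system (1.1), if and only if the following conditions jointly hold: (a) $\pi_{jj} = 1$,
and (b) $\pi_{ji} = \lambda$ $\forall i \in X \setminus \{j\}$ such that $p^x_i > 0$, so that $\sum_{i \ne j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$.
Proof. For $\frac{P^w_j - \lambda}{1 - \lambda} > 0$ to be an admissible value of $p^x_j$, the $j$th equation of system (1.1) requires that
$$\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \ne j} \pi_{ji} p^x_i = P^w_j, \qquad \text{(A.2)}$$
and $\sum_{i \ne j} p^x_i = \frac{1 - P^w_j}{1 - \lambda}$. By Assumption 2, $\pi_{ji} \in [0, \lambda]$, $\forall i \in X \setminus \{j\}$ and $\pi_{jj} \in [1 - \lambda, 1]$. Notice that it is possible
for $\pi_{ji} = \lambda$, $\forall i \in X \setminus \{j\}$, because the $\pi_{ji}$ are not related across $i$. (Recall that $1 - \pi_{kk} = \sum_{l \ne k} \pi_{lk} \le \lambda$, $\forall k \in X$.)
Therefore,
$$\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \ne j} \pi_{ji} p^x_i \le \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \sum_{i \ne j} p^x_i = \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda} \le \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda} = P^w_j.$$
Hence, Eq. (A.2) can be satisfied if and only if $\pi_{jj} = 1$, and $\sum_{i \ne j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$. That is, $\pi_{ji} = \lambda$ $\forall i \in X \setminus \{j\}$ such
that $p^x_i > 0$. Notice that at least one value of $p^x_i > 0$, because $p^x_j = \frac{P^w_j - \lambda}{1 - \lambda} < 1$. ∎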
A.2.1. Proposition 3
Proof. Without loss of generality, suppose that interest is in characterizing the identification region
$H[\Pr(x = 1)]$.
(a) Assumption 1 holds: For the first equation of system (1.1) to be satisfied it must be that
$p^x_1 = P^w_1 - \sum_{j=2}^{J} \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^{J} \pi_{i1}$. From the definition of $H^{1}[\Pi^{*}]$ it follows that
$$\lambda \ge 1 - \sum_{h=1}^{J} \pi_{hh} p^x_h = \sum_{h=1}^{J} \sum_{i \ne h} \pi_{ih} p^x_h \ge \left\{ \sum_{j=2}^{J} \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^{J} \pi_{i1} \right\}.$$
Hence from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\{P^w_1 - \lambda, 0\}$, and $p^x_1 \le \min\{1, P^w_1 + \lambda\}$.
If $P^w_1 > \lambda$, the lower bound is achieved for $\sum_{j=2}^{J} \pi_{1j} p^x_j = \lambda$ and $\sum_{i=2}^{J} \pi_{i1} p^x_1 = 0$. If $P^w_1 < 1 - \lambda$, the upper bound is
achieved for $\sum_{j=2}^{J} \pi_{1j} p^x_j = 0$ and $\sum_{i=2}^{J} \pi_{i1} p^x_1 = \lambda$. I now show that there are values of $p^x_j$, $j \in X \setminus \{1\}$, and
$\Pi \in H^{1}[\Pi^{*}]$ such that the corresponding $p^x \in H[P(x)]$.
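(a.1.1) Upper Bound, with $P^w_1 < 1 - \lambda$: Let $\pi_{11} = \frac{P^w_1}{P^w_1 + \lambda}$, $\pi_{jj} = 1$, $j \in X \setminus \{1\}$, $\pi_{ij} = 0$, $i, j \in X \setminus \{1\}$, $i \ne j$, and
define $\pi_{i1}$, $i \in X \setminus \{1\}$, as follows:
if $\exists\, j > 1 : P^w_j \ge \lambda$,
$$\pi_{i1} = \begin{cases} \dfrac{\lambda}{P^w_1 + \lambda} & \text{for } i = j = \min\{k = 2, \ldots, J : P^w_k \ge \lambda\}; \\ 0, & \forall i \in X,\ i \notin \{1, j\}; \end{cases}$$
if $P^w_j < \lambda$, $\forall j \in X \setminus \{1\}$,
$$\pi_{i1} = \begin{cases} \dfrac{P^w_2}{P^w_1 + \lambda} & \text{for } i = 2; \\ \min\left\{ \dfrac{\lambda}{P^w_1 + \lambda} - \sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda},\ \dfrac{P^w_i}{P^w_1 + \lambda} \right\} & \text{for } i \in X \setminus \{1, 2\},\ i : \sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda} \le \lambda; \\ 0 & \text{for } i \in X \setminus \{1, 2\},\ i : \sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda} > \lambda. \end{cases}$$
It is easy to show that the suggested $\Pi$ belongs to $H^{1}[\Pi^{*}]$, and allows for $p^x_1 = P^w_1 + \lambda$ and the implied $p^x_j$,
$j \in X \setminus \{1\}$ to solve system (1.1). Hence, $p^x_1 = P^w_1 + \lambda$ is a feasible value of $\Pr(x = 1)$ given the maintained
assumptions.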
(a.1.2) Upper bound, with $P^w_1 \ge 1 - \lambda$: In this case the upper bound is not informative, but just set equal to 1.
Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\sum_{i=2}^{J} \pi_{i1} = 1 - P^w_1 \le \lambda$, and $\pi_{i1} p^x_1 = \pi_{i1} = P^w_i \le \lambda$,
$\forall i \in X$, $i \ne 1$. It is straightforward to verify that the suggested $\Pi \in H^{1}[\Pi^{*}]$, and allows for $p^x_1 = 1$, and the
implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system (1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the
maintained assumptions.
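(a.2.1) Lower bound, with $P^w_1 > \lambda$: Let $p^x_2 = P^w_2 + \lambda$, and $\pi_{12} = \frac{\lambda}{p^x_2}$, $\pi_{22} = 1 - \frac{\lambda}{p^x_2}$, and $\pi_{jj} = 1$, $\forall j \in X \setminus \{2\}$, so
that $\pi_{i2} = 0$, $\forall i \in X \setminus \{1, 2\}$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \ne j$, $[i\ j] \ne [1\ 2]$. Then it is straightforward to verify that the
suggested $\Pi \in H^{1}[\Pi^{*}]$, and allows for $p^x_1 = P^w_1 - \lambda$ and the implied $p^x_j$, $j \in X \setminus \{1\}$ to solve system (1.1). Hence
$P^w_1 - \lambda$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.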
(a.2.2) Lower bound, with $P^w_1 \le \lambda$: Then the lower bound is not informative, but just set equal to 0. Let
$p^x_1 = 0$; this in turn implies $\sum_{j=2}^{J} p^x_j = 1$. Let $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$. Then $\sum_{j=2}^{J} \pi_{1j} p^x_j = P^w_1$. Moreover
$\sum_{j=2}^{J} P^w_j = 1 - P^w_1 \ge 1 - \lambda$, hence $P^w_j \le 1 - P^w_1$ for each $j \in X \setminus \{1\}$. Let $\pi_{jj} = 1 - P^w_1$, $\forall j \in X \setminus \{1\}$, and $\pi_{ij} = 0$,
$\forall i, j \in X$, $i \ne j$, $i \ne 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \le 1$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^{J} p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist
values of $\Pi \in H^{1}[\Pi^{*}]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$ solve system (1.1), and hence it is a
feasible value of $\Pr(x = 1)$ given the maintained assumptions.
(a.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four
cases: (1) $\lambda \le P^w_1 \le 1 - \lambda$; (2) $P^w_1 \le \min\{\lambda, 1 - \lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1 - \lambda\}$; (4) $1 - \lambda < P^w_1 < \lambda$. Here I describe in
detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a
detailed proof of all cases.
Let lpPw1 p1 l. It then follows that
Let px1 ¼ Pw1 þ ð1 2aÞl, for any a 2 ð0; 1Þ. To find values of pxj 2 X nf1g and P 2 H 1 ½P% such that the
corresponding px 2 H½PðxÞ, I distinguish two sub-cases:
Pw1
1. If ap 12, let p11 ¼ , pij ¼ 0, 8i ¼ 1; . . . ; J, j ¼ 2; . . . ; J. Choose pj1 and pxj , j 2 X nf1g, as
Pw1 þ ð1 2aÞl
Pw1
follows: if 9 j : Pwj X1 w ,
P1 þ ð1 2aÞl
8
>
<1 Pw1
for k ¼ j ¼ minfi ¼ 2; . . . ; J : Pwi Xlg;
pk1 ¼ Pw1 þ ð1 2aÞl
>
: 0; 8k 2 X ; kaf1; jg:
Pw1
If Pwj o1 ; 8j 2 X nf1g,
Pw1 þ ð1 2aÞl
8 w
< P2
>
for k ¼ 2;
pk1 ¼ Pw1 P
k1
: min 1
> pi1 ; Pwk 8k 2 X nf1; 2g;
Pw1 þ ð1 2aÞl i¼2
P
(b) Assumption 2 holds: For the first equation of system (1.1) to be satisfied, I need p11 px1 þ Jj¼2 p1j pxj ¼ Pw1 ,
PJ x
j¼2 pj ¼ 1 p1 . From the definition of H ½P , p1j pl, 8j 2 X nf1g, and p11 X1 l. Let
x 2 %
where
PJ
j¼2 p1j pj pp̄ð1 p1 Þ, where p̄ 2 ½0; l. Then
x x
Pw1 p̄
px1 ¼ ,
p11 p̄
ARTICLE IN PRESS
F. Molinari / Journal of Econometrics 144 (2008) 81–117 111
and px1 is well defined as long as p11 ap̄. I distinguish a few cases.
1. If $P^w_1 < \min\{\lambda, 1-\lambda\}$, one can pick $\bar{\pi} = P^w_1 < \lambda$, and $p^x_1 = 0$ is the lower bound. As for the upper bound,
when $P^w_1 < 1-\lambda \le \pi_{11}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and
$\bar{\pi}$. Hence the upper bound is achieved for $\pi_{11} = 1-\lambda$ and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1-\lambda}$.
2. If $\lambda \le P^w_1 \le 1-\lambda$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$.
Hence the upper bound is achieved for $\pi_{11} = 1-\lambda$ and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1-\lambda}$, and the lower
bound is achieved for $\pi_{11} = 1$ and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1-\lambda}$.
3. If $1-\lambda \le P^w_1 \le \lambda$, pick $\bar{\pi} = P^w_1 \le \lambda$, and $p^x_1 = 0$ is the lower bound. Pick $\pi_{11} = P^w_1 \ge 1-\lambda$, and $p^x_1 = 1$ is the
upper bound.
4. If $P^w_1 > \max\{\lambda, 1-\lambda\}$, pick $\pi_{11} = P^w_1 \ge 1-\lambda$, and $p^x_1 = 1$ is the upper bound. As for the lower bound, when
$P^w_1 > \lambda \ge \bar{\pi}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence
the lower bound is achieved for $\pi_{11} = 1$ and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1-\lambda}$.
To summarize, from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\left\{\frac{P^w_1 - \lambda}{1-\lambda}, 0\right\}$ and
$p^x_1 \le \min\left\{1, \frac{P^w_1}{1-\lambda}\right\}$. I am left to show that one can find values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^2[\Pi^*]$ such that for
any $p^x_1 \in \left[\max\left\{\frac{P^w_1 - \lambda}{1-\lambda}, 0\right\}, \min\left\{1, \frac{P^w_1}{1-\lambda}\right\}\right]$ the corresponding $p^x \in H[P(x)]$. I first show that this holds for
the extreme points, and then that it holds for any point in the closed interval between the lower and the upper
bound.
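The summary bounds can be evaluated directly. The following is a minimal sketch, not part of the original proof; the function name `assumption2_bounds` and its argument names are hypothetical, with `p_w1` standing for $P^w_1$ and `lam` for $\lambda$.

```python
def assumption2_bounds(p_w1, lam):
    """Bounds on Pr(x = 1) implied by the first equation of system (1.1)
    under Assumption 2 (illustrative helper, names are hypothetical)."""
    if not (0.0 <= p_w1 <= 1.0):
        raise ValueError("p_w1 must be a probability")
    if not (0.0 <= lam < 1.0):
        raise ValueError("lam must lie in [0, 1)")
    # Lower bound: max{(P_1^w - lambda) / (1 - lambda), 0}
    lower = max((p_w1 - lam) / (1.0 - lam), 0.0)
    # Upper bound: min{1, P_1^w / (1 - lambda)}
    upper = min(1.0, p_w1 / (1.0 - lam))
    return lower, upper


# One input for each of the four cases distinguished in the text.
for p_w1, lam in [(0.1, 0.2), (0.5, 0.2), (0.5, 0.6), (0.9, 0.2)]:
    print(p_w1, lam, assumption2_bounds(p_w1, lam))
```

The four inputs in the loop correspond, in order, to cases 1 to 4 above; in case 3 the interval is the uninformative $[0, 1]$.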
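(b.1.1) Upper bound, with $P^w_1 < 1-\lambda$: Let $\pi_{11} = 1-\lambda$ and $\pi_{jj} = 1$, $\forall j > 1$. Then the system reduces to
\[
\begin{cases}
(1-\lambda)\dfrac{P^w_1}{1-\lambda} = P^w_1, \\[2mm]
\pi_{j1}\dfrac{P^w_1}{1-\lambda} + p^x_j = P^w_j, \quad j = 2,\dots,J,
\end{cases}
\]
where $\sum_{j=2}^{J} \pi_{j1} = \lambda$, and $\sum_{j=2}^{J} P^w_j > \lambda$. Choose $\pi_{k1}$, $k \in X \setminus \{1\}$, as follows:
\[
\text{if } \exists\, j : P^w_j \ge \lambda, \quad
\pi_{k1} =
\begin{cases}
\lambda & \text{for } k = j = \min\{i = 2,\dots,J : P^w_i \ge \lambda\}, \\
0, & \forall k \in X,\ k \notin \{1, j\};
\end{cases}
\]
\[
\text{if } P^w_j < \lambda,\ \forall j \in X \setminus \{1\}, \quad
\pi_{k1} =
\begin{cases}
P^w_2 & \text{for } k = 2, \\
\min\left\{\lambda - \sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\}.
\end{cases}
\tag{A.3}
\]
It is easy to show that the suggested $\Pi$ belongs to $H^2[\Pi^*]$, and allows for $p^x_1 = \frac{P^w_1}{1-\lambda}$ and the implied $p^x_j$, $j \in X \setminus \{1\}$,
to solve system (1.1). Hence, $p^x_1 = \frac{P^w_1}{1-\lambda}$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.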
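The construction in (b.1.1) can be checked numerically. The sketch below is illustrative only: it assumes system (1.1) has the form $P^w = \Pi\, p^x$ with $\pi_{ij} = \Pr(w = i \mid x = j)$, so that the columns of $\Pi$ sum to one, and the function name `upper_bound_construction` is hypothetical.

```python
import numpy as np


def upper_bound_construction(P_w, lam):
    """Build the matrix Pi and vector p^x of step (b.1.1), following (A.3).

    Assumes P^w = Pi @ p^x with column-stochastic Pi (an assumption about
    the form of system (1.1)); index 0 plays the role of category 1.
    """
    P_w = np.asarray(P_w, dtype=float)
    J = P_w.size
    if not P_w[0] < 1.0 - lam:
        raise ValueError("(b.1.1) requires P_1^w < 1 - lambda")
    p1 = P_w[0] / (1.0 - lam)            # candidate upper bound for Pr(x = 1)
    Pi = np.eye(J)                        # pi_jj = 1 for j > 1
    Pi[0, 0] = 1.0 - lam                  # pi_11 = 1 - lambda
    big = [k for k in range(1, J) if P_w[k] >= lam]
    if big:
        Pi[big[0], 0] = lam               # all of lambda on the first such category
    else:
        remaining = lam                   # spread lambda greedily, as in (A.3)
        for k in range(1, J):
            Pi[k, 0] = min(max(remaining, 0.0), P_w[k])
            remaining -= Pi[k, 0]
    p_x = np.empty(J)
    p_x[0] = p1
    p_x[1:] = P_w[1:] - Pi[1:, 0] * p1    # implied p_j^x, j > 1
    return Pi, p_x


Pi, p_x = upper_bound_construction([0.3, 0.5, 0.2], lam=0.2)
print(Pi @ p_x)   # should reproduce the observed distribution P^w
```

In both branches the first column of `Pi` sums to one, the diagonal entries are at least $1-\lambda$, and `p_x` is a probability vector, which is what membership in $H^2[\Pi^*]$ and $H[P(x)]$ requires under the stated assumption.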
(b.1.2) Upper bound, with $P^w_1 \ge 1-\lambda$: In this case the upper bound is not informative, but just set equal to 1.
Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\pi_{j1} = P^w_j$, $j = 1,\dots,J$. It is straightforward to verify
that this $\Pi \in H^2[\Pi^*]$, and obviously allows for $p^x_1 = 1$ and the implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system
(1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.
(b.2.1) Lower bound, with $P^w_1 > \lambda$: Let $\pi_{j1} = 0$, $\forall j \in X \setminus \{1\}$, and $\pi_{12} = \cdots = \pi_{1J} = \lambda$; then the first equation of
system (1.1) is satisfied, and the implied $\Pi \in H^2[\Pi^*]$. Let $p^x_j = \frac{P^w_j}{1-\lambda} \ge 0$, $j \in X \setminus \{1\}$. It is straightforward to verify
that system (1.1) is satisfied. Hence $p^x_1 = \frac{P^w_1 - \lambda}{1-\lambda}$ is a feasible value for $\Pr(x = 1)$ given the maintained assumptions.
(b.2.2) Lower bound, with $P^w_1 \le \lambda$: Let $p^x_1 = 0$; this in turn implies $\sum_{j=2}^{J} p^x_j = 1$. Let $\pi_{1j} = P^w_1$ and $\pi_{jj} = 1 - P^w_1$,
$\forall j > 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \ge 0$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^{J} p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist values of
$\Pi \in H^2[\Pi^*]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, solve system (1.1), and hence it is a feasible value
of $\Pr(x = 1)$ given the maintained assumptions.
(b.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four
cases: (1) $\lambda \le P^w_1 \le 1-\lambda$; (2) $P^w_1 \le \min\{\lambda, 1-\lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1-\lambda\}$; (4) $1-\lambda < P^w_1 < \lambda$. Here I describe in
detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a
detailed proof of all cases.
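Let $\lambda \le P^w_1 \le 1-\lambda$. It then follows that $\frac{P^w_1 - \lambda}{1-\lambda} \le p^x_1 \le \frac{P^w_1}{1-\lambda}$. Let $p^x_1 = \frac{P^w_1 - a\lambda}{1-\lambda}$, for any $a \in (0,1)$. I show that
there are values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H^2[\Pi^*]$ such that the corresponding $p^x \in H[P(x)]$. Let
$\pi_{11} = 1 - \lambda(1-a)$, $\pi_{1j} = a\lambda$, $\forall j \in X \setminus \{1\}$, $\pi_{ij} = 0$, $\forall i, j \in X \setminus \{1\}$, $i \ne j$. Choose $\pi_{j1}$ and $p^x_j$, $j \in X \setminus \{1\}$, as follows:
\[
\text{if } \exists\, j : P^w_j \ge \lambda(1-a), \quad
\pi_{k1} =
\begin{cases}
\lambda(1-a) & \text{for } k = j = \min\{i = 2,\dots,J : P^w_i \ge \lambda(1-a)\}, \\
0, & \forall k \in X,\ k \notin \{1, j\};
\end{cases}
\]
\[
\text{if } P^w_j < \lambda(1-a),\ \forall j \in X \setminus \{1\}, \quad
\pi_{k1} =
\begin{cases}
P^w_2 & \text{for } k = 2, \\
\min\left\{\lambda(1-a) - \sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\};
\end{cases}
\]
\[
p^x_j = \frac{1}{1 - a\lambda}\left(P^w_j - \pi_{j1}\frac{P^w_1 - a\lambda}{1-\lambda}\right). \qquad \square
\]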
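The interior-point construction of (b.3), case (1), can be verified in the same spirit. This is a sketch under the assumption that system (1.1) reads $P^w = \Pi\, p^x$ with column-stochastic $\Pi$; the function name `interior_point_construction` is hypothetical, and the code places the mass $\lambda(1-a)$ on the first category whose observed probability is at least $\lambda(1-a)$.

```python
import numpy as np


def interior_point_construction(P_w, lam, a):
    """Pi and p^x for p_1^x = (P_1^w - a*lam) / (1 - lam), with 0 < a < 1.

    Illustrative only; assumes P^w = Pi @ p^x with column-stochastic Pi.
    """
    P_w = np.asarray(P_w, dtype=float)
    J = P_w.size
    if not (lam <= P_w[0] <= 1.0 - lam and 0.0 < a < 1.0):
        raise ValueError("case (1) requires lam <= P_1^w <= 1 - lam and 0 < a < 1")
    p1 = (P_w[0] - a * lam) / (1.0 - lam)
    Pi = np.zeros((J, J))
    Pi[0, 0] = 1.0 - lam * (1.0 - a)      # pi_11
    Pi[0, 1:] = a * lam                   # pi_1j, j > 1
    for j in range(1, J):
        Pi[j, j] = 1.0 - a * lam          # remaining mass of column j
    budget = lam * (1.0 - a)              # off-diagonal mass of column 1
    big = [k for k in range(1, J) if P_w[k] >= budget]
    if big:
        Pi[big[0], 0] = budget
    else:
        remaining = budget
        for k in range(1, J):
            Pi[k, 0] = min(max(remaining, 0.0), P_w[k])
            remaining -= Pi[k, 0]
    p_x = np.empty(J)
    p_x[0] = p1
    p_x[1:] = (P_w[1:] - Pi[1:, 0] * p1) / (1.0 - a * lam)
    return Pi, p_x


Pi, p_x = interior_point_construction([0.4, 0.35, 0.25], lam=0.2, a=0.5)
print(p_x, Pi @ p_x)
```

Letting `a` range over $(0, 1)$ traces out the open interval between the lower and upper bounds, which is the point of step (b.3).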
A.2.2. Proposition 4
Proof. (a) Suppose, without loss of generality, that $\tilde{X} = \{1, 2, \dots, h\}$, $2 \le h < J$, and consider $\Pr(x = 1)$. By
Lemma 1, for $p^x_1 = \frac{P^w_1 - \lambda}{1-\lambda} > 0$ to solve the first equation of system (1.1), it must be that $\pi_{11} = \pi = 1$, and either
$\pi_{1i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{1\}$, with $\sum_{i=2}^{J} \pi_{1i} p^x_i = \lambda\frac{1 - P^w_1}{1-\lambda}$. Since $\pi_{22} = \pi$ by assumption, and $\pi = 1$, it follows
that $\pi_{12} = 0$. Hence, for the first equation in system (1.1) to hold, $p^x_2 = 0$. Consider the second equation in
system (1.1): when the first equation of the system holds, the second reduces to $\sum_{i=3}^{J} \pi_{2i} p^x_i = P^w_2$. However, for
each $i \in X \setminus \{1\}$, if $\pi_{1i} = \lambda$, it follows that $\pi_{2i} = 0$, since $\sum_{k \ne l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if
$\pi_{1i} < \lambda$, for the first equation in system (1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^{J} \pi_{2i} p^x_i = 0$.
Therefore, since $P^w_2 > 0$, the lower bound in (3.2) is not feasible for $\Pr(x = 1)$, because the second equation of
system (1.1) is not satisfied. Notice now that repeating the same argument for each of equations 3 to $h$ in
system (1.1) implies, by a symmetry argument, that $\Pr(x = 1)$ cannot achieve the lower bound in (3.2).
For $k \in X \setminus \tilde{X}$, $\Pr(x = k)$ can achieve the lower bound in (3.2). Consider for example $\Pr(x = J)$. Let $\pi_{JJ} = 1$
and $\pi_{Ji} = \lambda$, $\forall i \in X \setminus \{J\}$. Then the last equation of system (1.1) is satisfied. These values of $\pi_{Ji}$, $i \in X$, imply that
$\pi = 1-\lambda$, and that $p^x_j = \frac{P^w_j}{1-\lambda}$ for each $j \in X \setminus \{J\}$. It is obvious that the suggested $\Pi \in H^3[\Pi^*]$, and the
implied $p^x_j$ solves system (1.1).
(b) Suppose that $P^w_1 \le \lambda$ and that $p^x_1 = 0$. Then $\sum_{j=2}^{J} p^x_j = 1$, and $p^x_j \ge 0$, $\forall j = 2,\dots,J$. Then the proof of
Proposition 3, part (b.2.2), applies, with $\pi = 1 - P^w_1$, $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \ne j$,
$i \ne 1$. Hence, it follows that $p^x_1 = 0$ is a value consistent with Assumption 3 if $P^w_1 \le \lambda$. $\square$
A.2.3. Proposition 5
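Proof. (a) Suppose, without loss of generality, that $\tilde{X} = \{1, 2, \dots, h\}$, $2 \le h < J$, and consider $\Pr(x = 1)$. For
$p^x_1 = \frac{P^w_1}{1-\lambda} < 1$ to be admissible in the first equation of system (1.1), it must be that $\pi = 1-\lambda$ and
$\sum_{j=2}^{J} \pi_{1j} p^x_j = 0$. Since $\pi_{jj} = \pi$, $\forall j \in \tilde{X}$, the second equation of the system becomes:
\[
\pi_{21}\frac{P^w_1}{1-\lambda} + (1-\lambda) p^x_2 + \sum_{j=3}^{J} \pi_{2j} p^x_j = P^w_2,
\]
where $\sum_{j=3}^{J} p^x_j = 1 - \frac{P^w_1}{1-\lambda} - p^x_2$. Let $\sum_{j=3}^{J} \pi_{2j} p^x_j = \bar{\pi}\left(1 - \frac{P^w_1}{1-\lambda} - p^x_2\right)$, where $\bar{\pi} \in [0, \lambda]$, since the constraints
$\pi_{ij} \le 1 - \pi \le \lambda$, $\forall i \ne j \in \tilde{X}$, and $\pi_{lk} \le \lambda$, $\forall l \ne k \in X \setminus \tilde{X}$, allow for $\pi_{1j} = 0$ or $\pi_{1j} = \lambda$, $\forall j = 2,\dots,J$. It follows that
\[
p^x_2 = \frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi})\dfrac{P^w_1}{1-\lambda}}{1 - \lambda - \bar{\pi}}.
\]
Notice that $p^x_2$ must lie in $\left[0, 1 - \frac{P^w_1}{1-\lambda}\right]$. I need to distinguish three cases.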
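1. $1 - \lambda - \bar{\pi} > 0$. Then
\[
\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi})\dfrac{P^w_1}{1-\lambda}}{1 - \lambda - \bar{\pi}} \ge 0
\iff \pi_{21} \le \bar{\pi} + (P^w_2 - \bar{\pi})\frac{(1-\lambda)}{P^w_1},
\]
and one can always find values of $\pi_{21}, \bar{\pi} \in [0, \lambda]$ for which this inequality is satisfied. For $p^x_2 \le 1 - \frac{P^w_1}{1-\lambda}$ it
must be that
\[
\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi})\dfrac{P^w_1}{1-\lambda}}{1 - \lambda - \bar{\pi}} \le 1 - \frac{P^w_1}{1-\lambda}
\iff \pi_{21} \ge \frac{\lambda - 1 + P^w_1 + P^w_2}{P^w_1}(1-\lambda).
\]
As long as there exist values of $\pi_{21} \le \lambda$ that satisfy the above inequality, the upper bound in (3.2) is
admissible. However,
\[
\frac{\lambda - 1 + P^w_1 + P^w_2}{P^w_1}(1-\lambda) > \lambda
\iff P^w_1 + P^w_2 > (1-\lambda) + P^w_1\frac{\lambda}{1-\lambda}.
\]
Hence, the upper bound in (3.2) can be rejected if
\[
P^w_1 + P^w_2 > (1-\lambda) + P^w_1\frac{\lambda}{1-\lambda}. \tag{A.4}
\]
2. $1 - \lambda - \bar{\pi} = 0$. Then $\pi_{21} = \frac{P^w_1 + P^w_2 - (1-\lambda)}{P^w_1}(1-\lambda)$. Hence, the upper bound in (3.2) can be rejected if
condition (A.4) is satisfied.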
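3. $1 - \lambda - \bar{\pi} < 0$. Then
\[
\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi})\dfrac{P^w_1}{1-\lambda}}{1 - \lambda - \bar{\pi}} \ge 0
\iff \pi_{21} \ge \bar{\pi} + (P^w_2 - \bar{\pi})\frac{(1-\lambda)}{P^w_1}.
\]
As long as there exist values of $\pi_{21} \le \lambda$ that satisfy the above inequality, the upper bound in (3.2) is admissible.
However,
\[
\bar{\pi} + (P^w_2 - \bar{\pi})\frac{(1-\lambda)}{P^w_1} > \lambda
\iff P^w_2 > \bar{\pi} + \frac{P^w_1(\lambda - \bar{\pi})}{1-\lambda}.
\]
Hence, given that by assumption $\pi_{ij} \le \lambda$, $\forall i \ne j$, $i, j \in X$, the upper bound in (3.2) can be rejected if $P^w_2 > \lambda$. For
$p^x_2 \le 1 - \frac{P^w_1}{1-\lambda}$ it must be that
\[
\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi})\dfrac{P^w_1}{1-\lambda}}{1 - \lambda - \bar{\pi}} \le 1 - \frac{P^w_1}{1-\lambda}
\iff \pi_{21} \le \frac{\lambda - 1 + P^w_1 + P^w_2}{P^w_1}(1-\lambda).
\]
As long as there exist values of $\pi_{21} \ge 0$ that satisfy the above inequality, the upper bound in (3.2) is admissible.
However,
\[
\frac{\lambda - 1 + P^w_1 + P^w_2}{P^w_1}(1-\lambda) < 0 \iff P^w_1 + P^w_2 < (1-\lambda).
\]
Hence, the upper bound in (3.2) can be rejected if one of the following holds: (i) $P^w_2 > \lambda$, or (ii)
$P^w_1 + P^w_2 < (1-\lambda)$.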
Finally, notice that
\[
\begin{cases}
\text{if } \lambda \le \tfrac{1}{2}: & (1 - \lambda - \pi_{ij}) > 0, \ \forall i \ne j,\ i, j \in X; \\[2mm]
\text{if } \lambda > \tfrac{1}{2}: &
\begin{cases}
P^w_2 > \lambda \implies
\begin{cases}
P^w_1 + P^w_2 > (1-\lambda) + P^w_1\dfrac{\lambda}{1-\lambda}, \\
P^w_1 + P^w_2 > (1-\lambda);
\end{cases} \\[4mm]
P^w_1 + P^w_2 < (1-\lambda) \implies
\begin{cases}
P^w_1 + P^w_2 < (1-\lambda) + P^w_1\dfrac{\lambda}{1-\lambda}, \\
P^w_2 < \lambda.
\end{cases}
\end{cases}
\end{cases}
\]
When $\lambda \le \tfrac{1}{2}$, condition (A.4) is necessary and sufficient to define the cases in which the upper bound in (3.2) is
not feasible. When $\lambda > \tfrac{1}{2}$, it can still be the case that $(1 - \lambda - \bar{\pi}) > 0$ (but it does not need to be). If $P^w_2 > \lambda$, (A.4)
is implied, and the upper bound in (3.2) is not feasible. If $P^w_1 + P^w_2 < (1-\lambda)$, then condition (A.4) is not
satisfied, and if $(1 - \lambda - \bar{\pi}) > 0$, the upper bound in (3.2) can be feasible. Hence, when $\lambda \ge \tfrac{1}{2}$, $P^w_2 > \lambda$ is a
sufficient condition for the upper bound in (3.2) to be not feasible.
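Notice now that repeating the same argument for each of equations 3 to $h$ in system (3.3), and solving each
one of them, respectively, for $p^x_3, p^x_4, \dots, p^x_h$, implies by a symmetry argument that if $\lambda \le \tfrac{1}{2}$, the upper bound in
(3.2) can be rejected if and only if
\[
P^w_1 + P^w_j > (1-\lambda) + P^w_1\frac{\lambda}{1-\lambda} \quad \text{for some } j \in \tilde{X} \setminus \{1\},
\]
while if $\lambda > \tfrac{1}{2}$, the upper bound in (3.2) can be rejected if
\[
P^w_j > \lambda \quad \text{for some } j \in \tilde{X} \setminus \{1\}.
\]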
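Equations $h + 1$ to $J$ in system (3.3) do not imply any additional conditions under which the upper bound in
(3.2) is not feasible. Indeed, let $k \in X \setminus \tilde{X}$; then
\[
\pi_{21}\frac{P^w_1}{1-\lambda} + \pi_{kk} p^x_k + \sum_{j \in X \setminus \{2, k\}} \pi_{kj} p^x_j = P^w_k.
\]
Let $\pi_{kk} = 1$, and, by the same argument as above, let $\sum_{j \in X \setminus \{2, k\}} \pi_{kj} p^x_j = \bar{\pi}\left(1 - \frac{P^w_1}{1-\lambda} - p^x_k\right)$, where $\bar{\pi}$ must lie in
$[0, \lambda]$. Then
\[
p^x_k = \frac{P^w_k - \pi_{21}\dfrac{P^w_1}{1-\lambda} - \bar{\pi}\left(1 - \dfrac{P^w_1}{1-\lambda}\right)}{1 - \bar{\pi}},
\]
where $1 - \bar{\pi} \ge 1 - \lambda > 0$. It is straightforward to verify that there are values of $\pi_{21}, \bar{\pi} \in [0, \lambda]$ for which
$p^x_k \in \left[0, 1 - \frac{P^w_1}{1-\lambda}\right]$. For example, if $P^w_k \le 1 - \frac{P^w_1}{1-\lambda}$, let $\bar{\pi} = \pi_{21} = 0$, so that $p^x_k = P^w_k$. If $P^w_k > 1 - \frac{P^w_1}{1-\lambda}$ and
$P^w_k > \lambda$, let $\bar{\pi} = \pi_{21} = \lambda$, so that $p^x_k = \frac{P^w_k - \lambda}{1-\lambda} \le 1 - \frac{P^w_1}{1-\lambda}$.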
(b) Suppose that $P^w_1 > 1-\lambda$, and that $p^x_1 = 1$. Then $p^x_j = 0$, $\forall j = 2,\dots,J$. Then pick $\pi = P^w_1$ (notice that
$P^w_1 > 1-\lambda$, hence the proposed value of $\pi$ is admissible), and $\pi_{j1} = P^w_j$, $\forall j = 2, 3, \dots, J$. Since $P^w_1 > 1-\lambda$, it
follows that $P^w_j < \lambda$, $\forall j = 2, 3, \dots, J$, hence the proposed values of $\pi_{j1}$, $\forall j = 2, 3, \dots, J$, are admissible, and
therefore $p^x_1 = 1$ is admissible, and hence it is the upper bound. $\square$
A.2.4. Proposition 6
Proof. (a) Lower bound.
Suppose that $j > 1$, and without loss of generality consider $\Pr(x = 2)$. By Lemma 1, for $p^x_2 = \frac{P^w_2 - \lambda}{1-\lambda} > 0$ to
solve the second equation of system (1.1), it must be that $\pi_{22} = 1$, and either $\pi_{2i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{2\}$,
with $\sum_{i \ne 2} \pi_{2i} p^x_i = \lambda\frac{1 - P^w_2}{1-\lambda}$. Since $\pi_{22} \le \pi_{11}$ by assumption, and $\pi_{22} = 1$, it follows that $\pi_{11} = 1$; hence, the first
equation of system (1.1) reduces to $\sum_{i=3}^{J} \pi_{1i} p^x_i = P^w_1$. However, for each $i \in X \setminus \{1, 2\}$, if $\pi_{2i} = \lambda$, it follows that
$\pi_{1i} = 0$, since $\sum_{k \ne l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if $\pi_{2i} < \lambda$, for the second equation in system
(1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^{J} \pi_{1i} p^x_i = 0$. Therefore, since $P^w_1 > 0$, the lower bound
in (3.2) is not feasible for $\Pr(x = 2)$. Notice now that repeating the same argument for $\Pr(x = j)$, $j \ge 3$, implies
that $\Pr(x = j)$ cannot achieve the lower bound in (3.2).
Consider now $\Pr(x = 1)$, and let $\pi_{11} = 1$, and $\pi_{1i} = \lambda$, $\forall i \in X \setminus \{1\}$. Then the first equation of system (1.1) is
satisfied. Let $p^x_j = \frac{P^w_j}{1-\lambda}$ and $\pi_{jj} = 1-\lambda$ for each $j \in X \setminus \{1\}$. It is obvious that the suggested $\Pi \in H^4[\Pi^*]$, and
the implied $p^x_j$ solves system (1.1).
(b) Upper bound.
First, let $j = 1$, and $P^w_1 < (1-\lambda)$. Then, as shown in the proof of Proposition 5, for $p^x_1 = \frac{P^w_1}{1-\lambda}$ it must be that
$\pi_{11} = 1-\lambda$ and $\sum_{i=2}^{J} \pi_{1i} p^x_i = 0$. But by Assumption 4, $\pi_{11} \ge \pi_{22} \ge \cdots \ge \pi_{JJ} \ge 1-\lambda$, and therefore for $p^x_1 = \frac{P^w_1}{1-\lambda}$
to solve the first equation of system (1.1) it must be that $\pi_{jj} = 1-\lambda$, $\forall j \in X$, and I am back to the case of
constant probability of correct report, with $\tilde{X} = X$. Now let $j > 1$, and $P^w_j < (1-\lambda)$. Then, again, for $p^x_j = \frac{P^w_j}{1-\lambda}$
it must be that $\pi_{jj} = 1-\lambda$ and $\sum_{i \ne j} \pi_{ji} p^x_i = 0$. But by Assumption 4, $\pi_{jj} \ge \pi_{(j+1)(j+1)} \ge \cdots \ge \pi_{JJ} \ge 1-\lambda$,
and therefore it must be that $\pi_{kk} = 1-\lambda$, $\forall k \in \{j, j+1, \dots, J\}$, and I am back to the case of constant
probability of correct report, with $\tilde{X} = \{j, j+1, \dots, J\}$. The result of Proposition 5 applies. $\square$
A.2.5. Proposition 7
Proof. With dichotomous variables, $p^x_1(\pi) = \frac{P^w_1 - (1-\pi)}{\pi - (1-\pi)} = \frac{1}{2}\left(\frac{2P^w_1 - 1}{2\pi - 1} + 1\right)$, $\pi \in H^3[\Pi^*]$. Hence,
1. If $\lambda < \tfrac{1}{2}$ and $P^w_1 \ge \tfrac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \le 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for
$\pi = 1$ and the upper bound for $\pi = \max(1-\lambda, P^w_1)$.
2. If $\lambda \ge \tfrac{1}{2}$ and $P^w_1 \ge \tfrac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \implies \pi \ge P^w_1 \ge \tfrac{1}{2}$; or (b)
$\pi \le P^w_1 \le 1 - \pi \implies \pi \le 1 - P^w_1 < \tfrac{1}{2}$; additionally, I need $\pi \ge 1-\lambda$. Hence, the feasible values of $\pi$ are given by
$\pi \in [1-\lambda, 1 - P^w_1] \cup [P^w_1, 1]$. Notice that if $\lambda < P^w_1$, the feasible values of $\pi$ are given by $\pi \in [P^w_1, 1]$, and $p^x_1$ is
decreasing in $\pi$; therefore the lower bound is achieved for $\pi = 1$ and the upper bound for $\pi = P^w_1$. When
$\lambda > P^w_1$, for values of $\pi \in [P^w_1, 1]$ the previous result applies. For values of $\pi \in [1-\lambda, 1 - P^w_1]$, $p^x_1$ is decreasing
in $\pi$; therefore the upper bound is achieved for $\pi = 1-\lambda$ and the lower bound for $\pi = 1 - P^w_1$.
3. If $\lambda < \tfrac{1}{2}$ and $P^w_1 < \tfrac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \ge 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for
$\pi = 1 - \min(\lambda, P^w_1)$ and the upper bound for $\pi = 1$.
4. If $\lambda \ge \tfrac{1}{2}$ and $P^w_1 < \tfrac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \implies \pi \ge 1 - P^w_1 > \tfrac{1}{2}$; or (b)
$\pi \le P^w_1 \le 1 - \pi \implies \pi \le P^w_1 < \tfrac{1}{2}$; additionally, I need $\pi \ge 1-\lambda$. Hence, the feasible values of $\pi$ are given by
$\pi \in [1-\lambda, P^w_1] \cup [1 - P^w_1, 1]$. Notice that if $1-\lambda > P^w_1$, the feasible values of $\pi$ are given by $\pi \in [1 - P^w_1, 1]$,
and $p^x_1$ is increasing in $\pi$; therefore the lower bound is achieved for $\pi = 1 - P^w_1$ and the upper bound for
$\pi = 1$. When $1-\lambda < P^w_1$, for values of $\pi \in [1 - P^w_1, 1]$ the previous result applies. For values of $\pi \in
[1-\lambda, P^w_1]$, $p^x_1$ is increasing in $\pi$; therefore the upper bound is achieved for $\pi = P^w_1$ and the lower bound for
$\pi = 1-\lambda$.
It is easy to verify that these bounds are a subset of those in (3.2). $\square$
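For the dichotomous case with $\lambda < \tfrac{1}{2}$ (cases 1 and 3 above), the closed-form bounds can be compared against a brute-force search over feasible values of $\pi$. The sketch below is only a numerical sanity check, not part of the proof; all function names are hypothetical, and the grid search relies on the assumption that $\pi$ is feasible exactly when $\pi \in [1-\lambda, 1]$ and $p^x_1(\pi) \in [0, 1]$.

```python
import numpy as np


def px1_constant_pi(p_w1, pi):
    """p_1^x(pi) = (P_1^w - (1 - pi)) / (2*pi - 1) for a binary variable."""
    return (p_w1 - (1.0 - pi)) / (2.0 * pi - 1.0)


def closed_form_bounds(p_w1, lam):
    """Bounds from cases 1 and 3 of the proof (requires lam < 1/2)."""
    if not 0.0 <= lam < 0.5:
        raise ValueError("this sketch only covers lam < 1/2")
    if p_w1 >= 0.5:
        pi_lower, pi_upper = 1.0, max(1.0 - lam, p_w1)      # case 1
    else:
        pi_lower, pi_upper = 1.0 - min(lam, p_w1), 1.0      # case 3
    return px1_constant_pi(p_w1, pi_lower), px1_constant_pi(p_w1, pi_upper)


def grid_bounds(p_w1, lam, n=20001):
    """Brute-force min and max of p_1^x(pi) over a grid of feasible pi."""
    pi = np.linspace(1.0 - lam, 1.0, n)
    pi = pi[np.abs(2.0 * pi - 1.0) > 1e-9]      # drop pi = 1/2, where p is undefined
    px1 = px1_constant_pi(p_w1, pi)
    ok = (px1 >= -1e-12) & (px1 <= 1.0 + 1e-12)  # keep values that are probabilities
    return px1[ok].min(), px1[ok].max()


for p_w1, lam in [(0.7, 0.2), (0.3, 0.4)]:
    print(closed_form_bounds(p_w1, lam), grid_bounds(p_w1, lam))
```

The two pairs printed on each line should agree up to the grid resolution.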
A.2.6. Proposition 8
Proof. In this case, $p^x_1(\pi) = \frac{P^w_1 - (1 - \pi_{22})}{\pi_{11} - (1 - \pi_{22})}$, $(\pi_{11}, \pi_{22}) \in H^4[\Pi^*]$. Hence,
1. If $\lambda < \tfrac{1}{2}$, $1 - \pi_{22} \le P^w_1 \le \pi_{11}$, and $p^x_1(\pi)$ is increasing in $\pi_{22}$ and decreasing in $\pi_{11}$. Hence the lower bound is
achieved for $\pi_{22} = 1-\lambda$ and $\pi_{11} = 1$. The upper bound is achieved with $\pi_{22} = \pi_{11}$, since $\pi_{11}$ bounds $\pi_{22}$
from above. Hence if $P^w_1 \ge \tfrac{1}{2}$, the upper bound is achieved for $\pi_{11} = \pi_{22} = \max(1-\lambda, P^w_1)$. If $P^w_1 < \tfrac{1}{2}$, the
upper bound is achieved for $\pi_{11} = \pi_{22} = 1$.
2. If $\lambda \ge \tfrac{1}{2}$ and $P^w_1 < \tfrac{1}{2}$, either $1 - \pi_{22} \le P^w_1 \le \pi_{11}$ or $1 - \pi_{22} \ge P^w_1 \ge \pi_{11}$. Hence, either $\pi_{11} \in [1 - P^w_1, 1]$ and
$\pi_{22} \in [1 - P^w_1, \pi_{11}]$, or $\pi_{11} \in [1-\lambda, P^w_1]$ and $\pi_{22} \in [1-\lambda, \pi_{11}]$. In the first case $p^x_1$ is increasing in $\pi_{22}$ and
decreasing in $\pi_{11}$; the lower bound is achieved for $\pi_{11} = 1$, $\pi_{22} = 1 - P^w_1$. The upper bound is achieved with
$\pi_{22} = \pi_{11} = 1$. In the second case $p^x_1$ is decreasing in $\pi_{22}$ and increasing in $\pi_{11}$; the lower bound is achieved
with $\pi_{22} = \pi_{11} = 1-\lambda$. The upper bound is achieved with $\pi_{11} = P^w_1$ and $\pi_{22} = 1-\lambda$.
3. If $\lambda \ge \tfrac{1}{2}$ and $P^w_1 \ge \tfrac{1}{2}$, consider the following two cases. If $\lambda > P^w_1$ then $\pi_{11} = \pi_{22} = 1 - P^w_1$ are admissible
values, and the implied $p^x_1 = 0$. Also, $\pi_{11} = P^w_1$ is an admissible value, and the implied $p^x_1 = 1$. If $\lambda < P^w_1$ then
$\pi_{11} \in [P^w_1, 1]$, $\pi_{22} \in [1-\lambda, \pi_{11}]$ and $1 - \pi_{22} \le P^w_1 \le \pi_{11}$. Then $p^x_1$ is decreasing in $\pi_{11}$ and increasing in $\pi_{22}$.
Hence the lower bound is achieved for $\pi_{11} = 1$ and $\pi_{22} = 1-\lambda$, and the upper bound is achieved with
$\pi_{22} = \pi_{11} = P^w_1$. $\square$
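Case 1 of this proof can be checked with a two-dimensional grid search over $(\pi_{11}, \pi_{22})$. The sketch below is illustrative: the function names are hypothetical, it takes the feasible set to be $\pi_{11} \ge \pi_{22} \ge 1-\lambda$ with $p^x_1 \in [0, 1]$ (an assumption about $H^4[\Pi^*]$ in the binary case), and it restricts attention to $\lambda \le P^w_1$ with $\lambda < \tfrac{1}{2}$ so that the stated extreme points are themselves feasible.

```python
import numpy as np


def px1_two_pi(p_w1, pi11, pi22):
    """p_1^x = (P_1^w - (1 - pi22)) / (pi11 - (1 - pi22))."""
    return (p_w1 - (1.0 - pi22)) / (pi11 - (1.0 - pi22))


def closed_form_monotone(p_w1, lam):
    """Extreme points named in case 1 (requires lam < 1/2 and lam <= p_w1)."""
    if not (0.0 <= lam < 0.5 and lam <= p_w1 <= 1.0):
        raise ValueError("this sketch assumes lam < 1/2 and lam <= p_w1 <= 1")
    lower = px1_two_pi(p_w1, 1.0, 1.0 - lam)            # pi11 = 1, pi22 = 1 - lam
    pi = max(1.0 - lam, p_w1) if p_w1 >= 0.5 else 1.0   # common value pi11 = pi22
    upper = px1_two_pi(p_w1, pi, pi)
    return lower, upper


def grid_bounds_monotone(p_w1, lam, n=401):
    """Brute-force min and max of p_1^x over pi11 >= pi22 >= 1 - lam."""
    grid = np.linspace(1.0 - lam, 1.0, n)
    pi11, pi22 = np.meshgrid(grid, grid, indexing="ij")
    px1 = px1_two_pi(p_w1, pi11, pi22)   # denominator is positive since lam < 1/2
    ok = (pi11 >= pi22) & (px1 >= -1e-12) & (px1 <= 1.0 + 1e-12)
    return px1[ok].min(), px1[ok].max()


for p_w1, lam in [(0.7, 0.2), (0.3, 0.2)]:
    print(closed_form_monotone(p_w1, lam), grid_bounds_monotone(p_w1, lam))
```

Because the grid contains the endpoints $1-\lambda$ and $1$, the brute-force extremes in these two examples coincide with the closed-form values up to floating-point error.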
References
Abrevaya, J., Hausman, J.A., 1999. Semiparametric estimation with mismeasured dependent variables: an application to duration models
for unemployment spells. Annales d’Economie et de Statistique 55–56, 243–275.
Aigner, D.J., 1973. Regression with a binary independent variable subject to errors of observation. Journal of Econometrics 1, 49–60.
Beresteanu, A., Molinari, F., 2007. Asymptotic properties for a class of partially identified models. Econometrica, forthcoming.
Blundell, R., Gosling, A., Ichimura, H., Meghir, C., 2007. Changes in the distribution of male and female wages accounting for
employment composition using bounds. Econometrica 75, 323–363.
Bollinger, C.R., 1996. Bounding mean regressions when a binary regressor is mismeasured. Journal of Econometrics 73, 387–399.
Bound, J., Brown, C., Mathiowetz, N., 2001. Measurement error in survey data. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of
Econometrics, vol. 5. North-Holland, Elsevier Science, pp. 3705–3843.
Bross, I., 1954. Misclassification in 2 × 2 tables. Biometrics 10 (4), 478–486.
Campbell, S.L., Meyer, C.D., 1991. Generalized Inverses of Linear Transformations. Dover Publications, Inc., New York.
Card, D., 1996. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica 64 (4), 957–979.
Chernozhukov, V., Hong, H., Tamer, E., 2004. Inference on parameter sets in econometric models. Discussion paper, MIT, Duke and
Northwestern University.
Chernozhukov, V., Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets in econometric models.
Econometrica 75, 1243–1284.
Ciliberto, F., Tamer, E., 2004. Market structure and multiple equilibria in airline markets, Discussion paper, University of Virginia and
Northwestern University.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London, UK.
Dominitz, J., Sherman, R.P., 2006. Identification and estimation of bounds on school performance measures: a nonparametric analysis of
a mixture model with verification. Journal of Applied Econometrics 21, 1295–1326.
Dustmann, C., van Soest, A., 2000. Parametric and semiparametric estimation in models with misclassified dependent variables. IZA
Discussion Paper 218.
Gong, G., Whittemore, A.S., Grosser, S., 1990. Censored survival data with misclassified covariates: a case study of breast cancer
mortality. Journal of the American Statistical Association 85 (409), 20–28.
Gustman, A.L., Steinmeier, T.L., 2001. What people don’t know about their pension and social security. In: Gale, W.G., Shoven, J.B.,
Warshawsky, M.J. (Eds.), Public Policies and Private Pensions. Brookings Institution, Washington D.C.
Gustman, A.L., Mitchell, O.S., Samwick, A.A., Steinmeier, T.L., 2000. Evaluating pension entitlements. In: Mitchell, O.S., Hammond,
P.B., Rappaport, A.M. (Eds.), Forecasting Retirement Needs and Retirement Wealth. University of Pennsylvania.
Hampel, F.R., 1974. The influence curve and its role in robust estimation. Journal of the American Statistical Association 69 (346),
383–393.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions.
Wiley, New York.
Hausman, J., Abrevaya, J., Scott-Morton, F.M., 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of
Econometrics 87, 239–269.
Honoré, B.E., Lleras-Muney, A., 2006. Bounds in competing risks models and the war on cancer. Econometrica 74, 1675–1698.
Honoré, B.E., Tamer, E., 2006. Bounds on parameters in panel dynamic discrete choice models. Econometrica 74, 611–629.
Horn, R.A., Johnson, C.R., 1999. Matrix Analysis. Cambridge University Press, New York.
Horowitz, J.L., Manski, C.F., 1995. Identification and robustness with contaminated and corrupted data. Econometrica 63 (2), 281–302.
Horowitz, J.L., Manski, C.F., 1998. Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using
weights and imputations. Journal of Econometrics 84, 37–58.
Horowitz, J.L., Manski, C.F., 2000. Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal
of the American Statistical Association 95 (449), 77–84.
Hotz, V.J., Mullin, C.H., Sanders, S.G., 1997. Bounding causal effects using data from a contaminated natural experiment: analyzing the
effects of teenage childbearing. Review of Economic Studies 64, 575–603.
Hu, Y., 2006. Bounding parameters in a linear regression model with a mismeasured regressor using additional information. Journal of
Econometrics 133, 51–70.
Imbens, G.W., Manski, C.F., 2004. Confidence intervals for partially identified parameters. Econometrica 72 (6), 1845–1857.
Kane, T.J., Rouse, C.E., Staiger, D., 1999. Estimating returns to schooling when schooling is misreported, NBER Working Paper 7235.
Klepper, S., 1988. Bounding the effects of measurement error in regressions involving dichotomous variables. Journal of Econometrics 37,
343–359.
Klepper, S., Leamer, E.E., 1984. Consistent sets of estimates for regressions with errors in all variables. Econometrica 52 (1), 163–183.
Kreider, B., Pepper, J., 2007. Inferring disability status from corrupt data. Journal of Applied Econometrics, forthcoming.
Lewbel, A., 2000. Identification of the binary choice model with misclassification. Econometric Theory 16, 603–609.
Mahajan, A., 2006. Identification and estimation of regression models with misclassification. Econometrica 74, 631–665.
Manski, C.F., 2003. Partial Identification of Probability Distributions. Springer Series in Statistics. Springer, New York.
Manski, C.F., Tamer, E., 2002. Inference on regressions with interval data on a regressor or outcome. Econometrica 70 (2), 519–546.
Mellow, W., Sider, H., 1983. Accuracy of response in labor market surveys: evidence and implications. Journal of Labor Economics 1 (4),
331–344.
Molinari, F., 2003. Contaminated, corrupted, and missing data, Ph.D. Thesis, Northwestern University, available at
http://www.arts.cornell.edu/econ/fmolinari/dissertation.pdf.
Moore, J.C., Marquis, K.H., Bogen, K., 1996. The SIPP Cognitive Research Evaluation Experiment: Basic Results and Documentation,
Unpublished Report, U.S. Bureau of the Census.
Munkres, J.R., 1991. Analysis on Manifolds. Addison-Wesley, Reading, MA.
Poterba, J.M., Summers, L.H., 1995. Unemployment benefits and labor market transitions: a multinomial logit model with errors in
classification. The Review of Economics and Statistics 77 (2), 201–216.
Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics
106, 171–201.
Rao, C.R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton, New Jersey.
Swartz, T., Haitovsky, Y., Vexler, A., Yang, T., 2004. Bayesian identifiability and misclassification in multinomial data. Canadian Journal
of Statistics 32, 285–302.