
Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models


Alberto Abadie

This article considers the problem of assessing the distributional consequences of a treatment on some outcome variable of interest when treatment intake is (possibly) nonrandomized, but there is a binary instrument available for the researcher. Such a scenario is common in observational studies and in randomized experiments with imperfect compliance. One possible approach to this problem is to compare the counterfactual cumulative distribution functions of the outcome with and without the treatment. This article shows how to estimate these distributions using instrumental variable methods, and a simple bootstrap procedure is proposed to test distributional hypotheses, such as equality of distributions and first-order and second-order stochastic dominance. These tests and estimators are applied to the study of the effects of veteran status on the distribution of civilian earnings. The results show a negative effect of military service during the Vietnam era that appears to be concentrated on the lower tail of the distribution of earnings. First-order stochastic dominance cannot be rejected by the data.

KEY WORDS: Compliers; Empirical processes; Kolmogorov–Smirnov test; Stochastic dominance.

Alberto Abadie is Assistant Professor, John F. Kennedy School of Government, Harvard University, Cambridge, MA 02138 (E-mail: alberto_abadie@harvard.edu). The author thanks Joshua Angrist for comments and for providing the data, and Jinyong Hahn, Kei Hirano, Guido Imbens, Tom Knox, Guido Kuersteiner, Whitney Newey, Emmanuel Saez, Jim Stock, seminar participants at the 2000 Econometric Society North American Winter Meeting in Boston and the 2001 Conference of Econometrics and Mathematical Economics in Rochester, the editor, an associate editor, and the referees for helpful comments.

© 2002 American Statistical Association. Journal of the American Statistical Association, March 2002, Vol. 97, No. 457, Theory and Methods.

1. INTRODUCTION

Although most empirical research on treatment effects focuses on the estimation of differences in mean outcomes, analysts have long been interested in methods for estimating the impact of a treatment on the entire distribution of outcomes. This is especially true in economics, where social welfare comparisons may require integration of utility functions under alternative distributions of income. Following Atkinson (1970), consider the class of symmetric utilitarian social welfare functions:

$$W(P, u) = \int_0^\infty u(y)\, dP(y),$$

where $P$ is an income distribution and $u\colon \mathbb{R} \mapsto \mathbb{R}$ is a twice continuously differentiable individual utility function. Let $P_{(1)}$ and $P_{(0)}$ denote the (potential) distributions that income would follow if the population were exposed to the treatment in one case, and excluded from the treatment in the other case. If $u$ is completely specified ($u = \bar{u}$) we rank $P_{(1)}$ and $P_{(0)}$ by comparing $W(P_{(1)}, \bar{u})$ and $W(P_{(0)}, \bar{u})$. Note that affine transformations of $u$ do not affect the ranking of any two distributions. Typically, $u$ is not fixed by the analyst but is restricted to have some desirable properties. In particular, social welfare ($W(P, u)$) is usually assumed to increase with the income of any subset of individuals in the population ($u' > 0$). If $u$ is affine, then $W(P, u)$ ranks income distributions solely on the basis of average income. However, distributional considerations often motivate social welfare functions that favor income redistribution to the poorer ($u' > 0$ and $u'' < 0$). Under these assumptions, stochastic dominance can be used to establish a partial ordering on the distributions of income. If two income distributions can be ranked by first-order stochastic dominance,

$$\int_0^x dP_{(1)}(y) \le \int_0^x dP_{(0)}(y) \quad \forall x \ge 0 \tag{1}$$

(for $P_{(1)}$ dominating $P_{(0)}$), then these distributions will be ranked in the same way by any monotonic utilitarian social welfare function ($u' > 0$). If two income distributions can be ranked by second-order stochastic dominance,

$$\int_0^x \left( \int_0^z dP_{(1)}(y) \right) dz \le \int_0^x \left( \int_0^z dP_{(0)}(y) \right) dz \quad \forall x \ge 0 \tag{2}$$

(for $P_{(1)}$ dominating $P_{(0)}$), then these distributions will be ranked in the same way by any concave monotonic utilitarian social welfare function ($u' > 0$, $u'' < 0$) (see Foster and Shorrocks 1988 for details). Therefore, stochastic dominance can be used to evaluate the distributional consequences of treatments under mild assumptions about social preferences. Another possible question is whether the treatment has any effect on the distribution of the outcome, that is, whether or not the two distributions $P_{(1)}$ and $P_{(0)}$ are the same.

In general, the assessment of the distributional consequences of treatments may be carried out by estimating $P_{(1)}$ and $P_{(0)}$. Estimation of the potential income distributions, $P_{(1)}$ and $P_{(0)}$, is straightforward when the treatment is randomly assigned in the population. However, this type of analysis becomes difficult in observational studies (or in randomized experiments with imperfect compliance) when treatment intake is not randomly determined. Recently, Imbens and Rubin (1997) have shown that, when there is a binary instrumental variable available for the researcher, the potential distributions of the outcome variable are identified for the subpopulation potentially affected in their treatment status by variation in the instrument (the so-called compliers). In addition, Abadie, Angrist and Imbens (in press) have studied distributional effects of treatments for compliers in instrumental variable models using quantile regression techniques. However, to date, no testing procedure has been proposed to compare entire potential outcome distributions for compliers. This article proposes a bootstrap strategy to perform this kind of comparison. In particular, equality of distributions, first-order and second-order stochastic dominance hypotheses, all important for social welfare comparisons, are considered.
The proposed method is applied to the study of the effects of Vietnam veteran status on the distribution of civilian earnings. Following Angrist (1990), random variation in enrollment induced by the Vietnam era draft lottery is used to identify the effects of veteran status on civilian earnings. However, the focus of the present article is not restricted to the average treatment effect for compliers. The entire marginal distributions of potential earnings for veterans and nonveterans are described for this subgroup of the population. These distributions differ in a notable way from the corresponding distributions of realized earnings. Veteran status appears to reduce lower quantiles of the earnings distribution, leaving higher quantiles unaffected. Although the data show a fair amount of evidence against equality in potential income distributions for veterans and nonveterans, statistical testing falls short of rejecting this hypothesis at conventional significance levels. First- and second-order stochastic dominance of the potential income distribution for nonveterans are not rejected by the data.

The rest of the article is structured as follows. In Section 2, a framework for identification of treatment effects in instrumental variable models is briefly reviewed. It also shows how to estimate the distributions of potential outcomes for compliers. In contrast with Imbens and Rubin (1997), who report histogram estimates of these distributions, here a simple method is shown to estimate the cumulative distribution functions (cdf) of the same variables. An approach based on cdfs rather than histograms is often convenient. First, the problem of choosing adequate binwidths for histograms is avoided. The cdf, estimated by instrumental variable methods, can be evaluated at each observation in our sample, just as for the conventional empirical distribution function. Moreover, as inspection of equations (1) and (2) reveals, first- and second-order stochastic dominance can be easily defined in terms of cdfs. Estimated cdfs may therefore suggest stochastic dominance between estimated distributions in a way that would be hard to visualize from histograms. In addition, tests for stochastic dominance can be easily constructed within the well-known family of tests which are based on differences in cdfs (see Darling 1957 for a review of this class of tests). In summary, an approach to estimation of distributions of potential outcomes based on cdfs is important because it is often easier to define, visualize, and test some distributional hypotheses of interest, such as first- or second-order stochastic dominance, using cdfs rather than histograms (see, however, Anderson 1996 for an approach to test for stochastic dominance based on histograms; approaches based on nonparametric density estimation can also be conceived for the case in which the outcome variable of interest has a continuous distribution). A complete description of the bootstrap strategy is also provided in Section 2, along with a proposition which states the asymptotic validity of the bootstrap for the tests proposed in this article. Section 3 describes the data and presents the empirical results. Section 4 concludes.

2. STATISTICAL METHODS

Let $Y_i(0)$ be the potential outcome for individual $i$ without treatment, and $Y_i(1)$ be the potential outcome for the same individual with treatment. Define $D_i$ to be the treatment participation indicator (that is, $D_i$ equals one when individual $i$ has been exposed to the treatment, $D_i$ equals zero otherwise). Let $Z_i$ be a binary variable that is independent of the responses $Y_i(0)$ and $Y_i(1)$ but that is correlated with $D_i$ in the population (an instrument). Denote $D_i(0)$ the value that $D_i$ would have taken if $Z_i = 0$; $D_i(1)$ has the same meaning for $Z_i = 1$. In practice, for any particular individual the analyst does not observe both potential treatment indicators $D_i(0)$ and $D_i(1)$. Instead the realized treatment $D_i = D_i(1) \cdot Z_i + D_i(0) \cdot (1 - Z_i)$ is observed. In the same fashion, the analyst does not observe both $Y_i(0)$ and $Y_i(1)$ for any individual $i$; one of these potential outcomes is counterfactual. Only the realized outcome, $Y_i = Y_i(1) \cdot D_i + Y_i(0) \cdot (1 - D_i)$, is observed. In the analysis of randomized experiments with imperfect compliance, $Z_i$ usually represents treatment assignment (randomized) whereas $D_i$ represents treatment intake (nonrandomized). In observational studies, instruments are often provided by the so-called natural experiments or quasiexperiments. For the rest of the article I use the following identifying assumption:

Assumption 2.1.
(i) Independence of the Instrument: $(Y_i(0), Y_i(1), D_i(0), D_i(1))$ is independent of $Z_i$.
(ii) First Stage: $0 < P(Z_i = 1) < 1$ and $P(D_i(1) = 1) > P(D_i(0) = 1)$.
(iii) Monotonicity: $P(D_i(1) \ge D_i(0)) = 1$.

Assumption 2.1 contains a set of nonparametric restrictions under which instrumental variable models identify the causal effect of the treatment for the subpopulation potentially affected in their treatment status by variation in the instrument: $D_i(1) = 1$ and $D_i(0) = 0$ (see Imbens and Angrist 1994; Angrist, Imbens, and Rubin 1996). This subpopulation is sometimes called compliers. When the treatment intake, $D_i$, is itself randomized, Assumption 2.1 holds for $Z_i = D_i$ and every individual is a complier.

Notice that there are some important exclusion restrictions implicit in the notation. First, for each individual $i$, potential treatment indicators $(D_i(0), D_i(1))$ are not affected by the values taken by the instrument for other individuals $Z_j$, $j \ne i$; in the same fashion, potential outcomes $(Y_i(0), Y_i(1))$ are not affected by the values taken by the treatment and instrument for other individuals $(Z_j, D_j)$, $j \ne i$. This restriction is called stable-unit-treatment-value-assumption (SUTVA) and is frequently used in statistical models of causal inference (see Rubin 1990). In addition, potential outcomes $(Y_i(0), Y_i(1))$ do not depend on $Z_i$. This last restriction, commonly invoked in instrumental variable models, allows us to attribute correlation between the instrument and the outcome variables to the effect of the treatment alone (see Angrist et al. 1996 for a more elaborate discussion of the restrictions in Assumption 2.1).

In this article, distributional effects of possibly nonrandomized treatments are studied by comparing the distributions of potential outcomes $Y_i(1)$ and $Y_i(0)$ with and without the treatment. The first step is to show that the identification conditions in Assumption 2.1 allow us to estimate these distributions for the subpopulation of compliers. To estimate the cdfs of potential outcomes for compliers, the following lemma will be useful.
Lemma 2.1. Let $h(\cdot)$ be a measurable function on the real line such that $E|h(Y_i)| < \infty$. If Assumption 2.1 holds, then

$$\frac{E[h(Y_i) D_i \mid Z_i = 1] - E[h(Y_i) D_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = E[h(Y_i(1)) \mid D_i(0) = 0, D_i(1) = 1], \tag{3}$$

and

$$\frac{E[h(Y_i)(1 - D_i) \mid Z_i = 1] - E[h(Y_i)(1 - D_i) \mid Z_i = 0]}{E[(1 - D_i) \mid Z_i = 1] - E[(1 - D_i) \mid Z_i = 0]} = E[h(Y_i(0)) \mid D_i(0) = 0, D_i(1) = 1]. \tag{4}$$

Proof. Note that $h(Y_i) D_i$ is equal to $h(Y_i(1))$ if $D_i = 1$ and equal to 0 if $D_i = 0$. By Lemma 4.2 in Dawid (1979), we have that $(h(Y_i(1)), 0, D_i(0), D_i(1))$ is independent of $Z_i$. Then by Theorem 1 in Imbens and Angrist (1994), we have that

$$E[h(Y_i(1)) \mid D_i(0) = 0, D_i(1) = 1] = \frac{E[h(Y_i) \cdot D_i \mid Z_i = 1] - E[h(Y_i) \cdot D_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}.$$

The second part of the lemma follows from an analogous argument.

Lemma 2.1 provides a simple way to estimate the cumulative distribution functions of the potential outcomes for compliers. Define $F^{C}_{(1)}(y) = E[1\{Y_i(1) \le y\} \mid D_i(1) = 1, D_i(0) = 0]$ and $F^{C}_{(0)}(y) = E[1\{Y_i(0) \le y\} \mid D_i(1) = 1, D_i(0) = 0]$. Apply Lemma 2.1 with $h(Y_i) = 1\{Y_i \le y\}$ to get

$$F^{C}_{(1)}(y) = \frac{E[1\{Y_i \le y\} D_i \mid Z_i = 1] - E[1\{Y_i \le y\} D_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}, \tag{5}$$

and

$$F^{C}_{(0)}(y) = \frac{E[1\{Y_i \le y\}(1 - D_i) \mid Z_i = 1] - E[1\{Y_i \le y\}(1 - D_i) \mid Z_i = 0]}{E[(1 - D_i) \mid Z_i = 1] - E[(1 - D_i) \mid Z_i = 0]}. \tag{6}$$

Suppose that we have a random sample, $\{(Y_i, D_i, Z_i)\}_{i=1}^{n}$, drawn from the studied population. The sample counterparts of equations (5) and (6) can be used to estimate $F^{C}_{(1)}(y)$ and $F^{C}_{(0)}(y)$ for $y = \{Y_1, \ldots, Y_n\}$. We can compare the distributions of potential outcomes by plotting the estimates of $F^{C}_{(1)}$ and $F^{C}_{(0)}$. This comparison tells us how the treatment affects different parts of the distribution of the outcome variable, at least for the subpopulation of compliers.

Researchers often want to formalize this type of comparison using statistical hypothesis testing. In particular, a researcher may want to compare $F^{C}_{(1)}$ and $F^{C}_{(0)}$ by testing the hypotheses of equality in distributions, first-order or second-order stochastic dominance. For two distribution functions $F_A$ and $F_B$, the hypotheses of interest can be formulated as follows.

Equality of distributions:

$$F_A(y) = F_B(y) \quad \forall y \in \mathbb{R}. \tag{H.1}$$

First-order stochastic dominance: $F_A$ dominates $F_B$ if

$$F_A(y) \le F_B(y) \quad \forall y \in \mathbb{R}. \tag{H.2}$$

Second-order stochastic dominance: $F_A$ dominates $F_B$ if

$$\int_{-\infty}^{y} F_A(x)\, dx \le \int_{-\infty}^{y} F_B(x)\, dx \quad \forall y \in \mathbb{R}. \tag{H.3}$$

One possible way to carry out these tests for the distributions of potential outcomes for compliers is to use statistics directly based on the comparison between the estimates for $F^{C}_{(1)}$ and $F^{C}_{(0)}$. However, it is easier to test the implications of these hypotheses on the two conditional distributions of the outcome variable, given $Z_i = 1$ and $Z_i = 0$. Denote $F_1$ the cdf of the outcome variable conditional on $Z_i = 1$, and define $F_0$ in the same way for $Z_i = 0$. That is, $F_1(y) = E[1\{Y_i \le y\} \mid Z_i = 1]$ and $F_0(y) = E[1\{Y_i \le y\} \mid Z_i = 0]$.

Proposition 2.1. Under Assumption 2.1, hypotheses (H.1)–(H.3) hold for $(F_A, F_B) = (F^{C}_{(1)}, F^{C}_{(0)})$ if and only if they hold for $(F_A, F_B) = (F_1, F_0)$.

Proof. From equations (5) and (6), we have

$$F^{C}_{(1)}(y) - F^{C}_{(0)}(y) = \frac{E[1\{Y_i \le y\} \mid Z_i = 1] - E[1\{Y_i \le y\} \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}.$$

Therefore, $F^{C}_{(1)} - F^{C}_{(0)} = K \cdot (F_1 - F_0)$ for $K = 1/(E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]) < \infty$, and the result of the proposition holds.

Of course $F_1$ and $F_0$ can easily be estimated by the empirical distribution of $Y_i$ for $Z_i = 1$ and $Z_i = 0$, respectively. Divide $(Y_1, \ldots, Y_n)$ into two subsamples given by different values for the instrument: $(Y_{1,1}, \ldots, Y_{1,n_1})$ are those observations with $Z_i = 1$ ($n_1 = \sum_i Z_i$) and $(Y_{0,1}, \ldots, Y_{0,n_0})$ are those with $Z_i = 0$ ($n_0 = \sum_i (1 - Z_i)$). Consider the empirical distribution functions

$$\mathbb{F}_{1,n_1}(y) = \frac{1}{n_1} \sum_{i=1}^{n_1} 1\{Y_{1,i} \le y\}, \qquad \mathbb{F}_{0,n_0}(y) = \frac{1}{n_0} \sum_{j=1}^{n_0} 1\{Y_{0,j} \le y\}.$$

Then, the Kolmogorov–Smirnov statistic provides a natural way to measure the discrepancy in the data from the hypothesis of equality of distributions. A two-sample Kolmogorov–Smirnov statistic can be defined as

$$T_n^{eq} = \left( \frac{n_1 n_0}{n} \right)^{1/2} \sup_{y \in \mathbb{R}} \left| \mathbb{F}_{1,n_1}(y) - \mathbb{F}_{0,n_0}(y) \right|. \tag{7}$$

Following McFadden (1989), the Kolmogorov–Smirnov statistic can be modified to test the hypotheses of first-order stochastic dominance (for $F_1$ dominating $F_0$)

$$T_n^{fsd} = \left( \frac{n_1 n_0}{n} \right)^{1/2} \sup_{y \in \mathbb{R}} \left( \mathbb{F}_{1,n_1}(y) - \mathbb{F}_{0,n_0}(y) \right), \tag{8}$$

and second-order stochastic dominance

$$T_n^{ssd} = \left( \frac{n_1 n_0}{n} \right)^{1/2} \sup_{y \in \mathbb{R}} \int_{-\infty}^{y} \left( \mathbb{F}_{1,n_1}(x) - \mathbb{F}_{0,n_0}(x) \right) dx. \tag{9}$$
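The statistics in equations (7)–(9) can be computed exactly from the two subsamples, because both empirical cdfs are step functions that jump only at observed values. The following Python sketch illustrates one way to do this. It is not the author's code; the function names `ecdf` and `ks_type_statistics` are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical cdf of `sample` evaluated at every point of `grid`."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


def ks_type_statistics(y1, y0):
    """Return (T_eq, T_fsd, T_ssd) as in equations (7)-(9).

    y1: outcomes observed with Z = 1; y0: outcomes observed with Z = 0.
    (Hypothetical helper; names are not taken from the article.)
    """
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    n1, n0 = y1.size, y0.size
    scale = np.sqrt(n1 * n0 / (n1 + n0))

    # Both empirical cdfs jump only at observed values, so the suprema
    # are attained on the sorted pooled support.
    grid = np.unique(np.concatenate([y1, y0]))
    diff = ecdf(y1, grid) - ecdf(y0, grid)

    t_eq = scale * np.max(np.abs(diff))
    t_fsd = scale * np.max(diff)

    # diff is zero below grid[0] and constant on [grid[k], grid[k+1]), so
    # its integral up to grid[k] is a cumulative sum of rectangle areas.
    integral = np.concatenate([[0.0], np.cumsum(diff[:-1] * np.diff(grid))])
    t_ssd = scale * np.max(integral)
    return t_eq, t_fsd, t_ssd
```

Because the difference of the two empirical cdfs is zero at the largest pooled observation, the one-sided statistics computed this way are never negative.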
Kolmogorov–Smirnov type nonparametric distance tests generally have good power properties. Unfortunately, the asymptotic distributions of the test statistics under the null hypotheses are generally unknown, because they depend on the underlying distribution of the data (see, e.g., Romano 1988). In this article, a bootstrap strategy is used to overcome this problem. This strategy is described by the following four steps:

Step 1: In what follows, let $T_n$ be a generic notation for $T_n^{eq}$, $T_n^{fsd}$ or $T_n^{ssd}$. Compute the statistic $T_n$ for the original samples $(Y_{1,1}, \ldots, Y_{1,n_1})$ and $(Y_{0,1}, \ldots, Y_{0,n_0})$.

Step 2: Resample $n$ observations $(\hat{Y}_1, \ldots, \hat{Y}_n)$ from $(Y_1, \ldots, Y_n)$ with replacement. Divide $(\hat{Y}_1, \ldots, \hat{Y}_n)$ into two samples: $(\hat{Y}_{1,1}, \ldots, \hat{Y}_{1,n_1})$ given by the $n_1$ first elements of $(\hat{Y}_1, \ldots, \hat{Y}_n)$, and $(\hat{Y}_{0,1}, \ldots, \hat{Y}_{0,n_0})$ given by the $n_0$ last elements of $(\hat{Y}_1, \ldots, \hat{Y}_n)$. Use these two generated samples to compute the test statistic $\hat{T}_{n,b}$.

Step 3: Repeat Step 2, $B$ times. Note that $n_0$ and $n_1$ are constant across bootstrap repetitions.

Step 4: Calculate the p-values of the tests with p-value $= \sum_{b=1}^{B} 1\{\hat{T}_{n,b} > T_n\}/B$. Reject the null hypotheses if the p-value is smaller than some significance level $\alpha$, $0 < \alpha < 0.5$.

By resampling from the pooled data set $(Y_1, \ldots, Y_n)$ we approximate the distribution of our test statistics when $F_1 = F_0$. Note that for (H.2) and (H.3), $F_1 = F_0$ represents the least favorable case for the null hypotheses. This strategy allows us to estimate the supremum of the probability of rejection under the composite null hypotheses, which is the conventional definition of test size. The proof of the next proposition shows that, under a nondegeneracy condition, the asymptotic distributions of $T_n^{fsd}$ and $T_n^{ssd}$ are continuous, perhaps except for an atom at zero that must have probability mass less than 0.5. Therefore, the restriction that $\alpha < 0.5$ is necessary to establish asymptotic size $\alpha$. This restriction is, however, consistent with conventional test levels.

Assumption 2.2. The distribution of the outcome variable is nondegenerate with bounded support.

Justification of the asymptotic validity of this procedure is provided by the following proposition.

Proposition 2.2. Under Assumption 2.2, the procedure described in Steps 1–4, for $T_n$ equal to the test statistics in equations (7)–(9) and hypotheses (H.1)–(H.3), (i) provides correct asymptotic size, $\alpha$, (ii) is consistent against any fixed alternative, (iii) has power (greater or equal to size) against contiguous alternatives.

This proposition is proven in Appendix A. Note that the distribution of $Y_i$ is not assumed to be continuous. This is important because outcome variables of interest in economics typically have probability atoms (for example, earnings variables typically have probability mass at zero; wage variables typically have probability mass at the minimum wage). If the outcome variable is absolutely continuous, then exact asymptotic results can be obtained (as in Dudley 1989, chap. 12). The bounded support assumption is stronger than necessary, but simplifies the asymptotic analysis considerably and is hardly restrictive for empirical applications.

The results of a simulation study to assess the small sample performance of the tests proposed in this article are reported in Appendix B. This simulation study suggests that the bootstrap distribution of the tests provides a good approximation to the nominal level even in fairly small samples.

The idea of using resampling techniques to obtain critical values for Kolmogorov–Smirnov type statistics probably originated with Bickel (1969) and has also been used by Romano (1988), McFadden (1989), Klecan, McFadden, and McFadden (1991), Præstgaard (1995) and Andrews (1997) among others. A related approach based on simulation of p-values can be found in Barrett and Donald (1999).

Note that Proposition 2.2 naturally applies to tests based on perfectly randomized experiments (in which $Z_i = D_i$ for all $i$). In such a case, the entire population is made up of compliers. Another interesting special case arises when $D_i(0) = 0$ for all $i$. This happens, for example, in randomized trials if individuals in the control group are perfectly excluded from treatment intake (not ruling out noncompliance in the treatment group). Then, the distribution of $(Y_i(0), Y_i(1))$ for the treated is the distribution for compliers (see, e.g., Abadie et al. in press).

3. EMPIRICAL EXAMPLE

The data used in this study consist of a sample of 11,637 white men, born in 1950–1953, from the March Current Population Surveys of 1979 and 1981–1985. Annual labor earnings, weekly wages, Vietnam veteran status and an indicator of draft-eligibility based on the Vietnam draft lottery outcome are provided for each individual in the sample. Additional information about the data can be found in Appendix C.

Figure 1 shows the empirical distribution of realized annual labor earnings (from now on, annual earnings) for veterans and nonveterans. We can observe that the distribution of earnings for veterans has higher low quantiles and lower high quantiles than that for nonveterans. Naive reasoning would lead us to conclude that military service during the Vietnam era reduced the probability of extreme earnings without a strong effect on average earnings. The difference in means is indeed quite small. On average, veterans earn only $264 less than nonveterans and this difference is not significant at conventional test levels. However, this analysis does not take into account that veteran status was not randomly assigned in the population. In fact, there was a strong selection process in the military during the Vietnam era. Some individuals volunteered, and others avoided enrollment using different methods, like student or occupational deferments. In addition, there was a screening process in the military prior to enrollment which disqualified some individuals for service for a variety of reasons such as having health problems or for having committed a felony (see Baskir and Strauss 1978 for an account of the issues involved in military enrollment during the Vietnam era). Thus, enrollment for military service during the Vietnam era was influenced by variables associated with future potential earnings.
[Figure: empirical cdfs of annual earnings for non-veterans and veterans. Horizontal axis: annual earnings, 10,000 to 60,000; vertical axis ticks from 0 to 0.9.]

Figure 1. Empirical Distributions of Earnings for Veterans and Nonveterans.

Therefore, we cannot draw causal inferences by simply comparing the distributions of realized earnings between veterans and nonveterans.

If draft eligibility is a valid instrument, then the marginal distributions of potential outcomes for compliers are consistently estimated by using sample analogs of equations (5) and (6). Figure 2 is the result of applying our data to those equations. Note that in finite samples, the instrumental variables estimates of the potential cdfs for compliers may not be nondecreasing functions (see Imbens and Rubin 1997 for a related discussion). The most remarkable feature of Figure 2 is the change in the estimated distributional effect of veteran status on earnings with respect to the naive analysis. The average effect of military service for compliers can be easily estimated using the techniques in Imbens and Angrist (1994). On average, veteran status is estimated to have a negative impact of $1,278 on earnings for compliers, although this effect is far from being statistically different from zero. Now, veteran status seems to reduce low quantiles of the income distribution, leaving high quantiles unaffected. If this characterization is true, the potential outcome for nonveterans would dominate that for veterans in the first-order stochastic sense.

Following the strategy described in Section 2, hypothesis testing is performed. First, the test statistics in equations (7)–(9) are computed for the draft-eligible/draft-ineligible samples. Then, the distributions of the test statistics under the least favorable null hypothesis are approximated by resampling from the pooled sample and recomputing the test statistics. In this way, we are able to make inference about hypotheses (H.1)–(H.3) for the subpopulation of compliers. (The computer code used for these calculations is available from the author on request.) Table 1 reports p-values for the tests of equality of distributions, first-order and second-order stochastic dominance. Notice that, for this example, the stochastic dominance tests are for earnings for nonveterans dominating earnings for veterans. The first row of Table 1 contains the results for annual earnings as the outcome variable. In the second row the analysis is repeated for weekly wages. Bootstrap resampling was performed 2,000 times ($B = 2{,}000$).

First, consider the results for annual earnings. The Kolmogorov–Smirnov statistic for equality of distributions is revealed to take an unlikely high value under the null hypothesis. However, we cannot reject equality of distributions at conventional test levels. The lack of evidence against the null hypothesis increases as we go from equality of distributions to first-order stochastic dominance, and from first-order to second-order stochastic dominance. The results for weekly wages are slightly different. For weekly wages we fall far from rejecting equality of distributions at conventional test levels.

This example illustrates how useful it can be to think in terms of distributional effects, and not merely average effects, when formulating the null hypothesis. Once we consider distributional effects, the belief that military service in Vietnam had a negative effect on civilian earnings can naturally be incorporated in the null hypothesis by first- or second-order stochastic dominance.
[Figure: estimated cdfs of potential annual earnings for compliers, veterans and non-veterans. Horizontal axis: annual earnings, 10,000 to 60,000; vertical axis ticks from 0 to 0.9.]

Figure 2. Estimated Distributions of Potential Earnings for Compliers.
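Estimates such as those plotted in Figure 2 are sample analogs of equations (5) and (6). A minimal Python sketch of those sample analogs follows. It is an illustration only: the function names and argument order are assumptions rather than the author's implementation, and NumPy is assumed to be available.

```python
import numpy as np


def _mean_weighted_indicator(y, w, grid):
    """Sample mean of w_i * 1{y_i <= g}, for each g in grid."""
    order = np.argsort(y)
    cum = np.concatenate([[0.0], np.cumsum(w[order])])
    return cum[np.searchsorted(y[order], grid, side="right")] / y.size


def complier_cdfs(y, d, z, grid=None):
    """Sample analogs of equations (5) and (6) (hypothetical helper).

    y: realized outcomes; d: treatment indicator; z: binary instrument.
    Returns (grid, F1_hat, F0_hat), the estimated complier cdfs with and
    without treatment evaluated on `grid` (default: observed outcomes).
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    z = np.asarray(z).astype(bool)
    grid = np.unique(y) if grid is None else np.asarray(grid, dtype=float)

    # Denominator of (5): E[D | Z = 1] - E[D | Z = 0].
    first_stage = d[z].mean() - d[~z].mean()

    num1 = (_mean_weighted_indicator(y[z], d[z], grid)
            - _mean_weighted_indicator(y[~z], d[~z], grid))
    num0 = (_mean_weighted_indicator(y[z], 1.0 - d[z], grid)
            - _mean_weighted_indicator(y[~z], 1.0 - d[~z], grid))

    f1_hat = num1 / first_stage
    # The denominator of (6) equals minus the denominator of (5).
    f0_hat = num0 / (-first_stage)
    return grid, f1_hat, f0_hat
```

Nothing in this sketch forces the returned estimates to be nondecreasing or to lie in $[0, 1]$, which matches the finite-sample caveat noted in Section 3.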

Table 1. Tests on Distributional Effects of Veteran Status on Civilian Earnings, p-values

Outcome variable    Equality in      First-order             Second-order
                    distributions    stochastic dominance    stochastic dominance
Annual earnings     .1245            .6260                   .7415
Weekly wages        .2330            .6490                   .7530

4. SUMMARY AND DISCUSSION OF POSSIBLE EXTENSIONS

When treatment intake is not randomized, instrumental variable models allow us to identify the effects of a treatment on some outcome variable, for the subpopulation whose treatment status is determined by variation in the instrument. For this group of the population, called compliers, the entire marginal distribution of the outcome under different treatments can be estimated. In this article, a strategy to test for distributional effects of treatments within the population of compliers is developed. In particular, the focus is on the equality of distributions, first-order and second-order stochastic dominance hypotheses. First, how to estimate the distributions of potential outcomes for compliers is explained. Then, bootstrap sampling is used to approximate the null distribution of the test statistics.

I illustrate this method with an application to the study of the effects of veteran status on civilian earnings. Following Angrist (1990), I use variation in veteran status induced by randomly assigned draft eligibility to identify the effects of interest. Estimates of cumulative distribution functions of potential outcomes for compliers show an adverse effect of military experience on the lower tail of the distribution of annual earnings. However, equality of distributions cannot be rejected at conventional confidence levels. First- and second-order stochastic dominance are not rejected by the data. Results are more favorable to the hypothesis of equality of distributions when using weekly wages as the outcome variable.

Equality of distributions and first- and second-order stochastic dominance are not the only hypotheses that can be tested using the bootstrap to compare the distribution of the outcome variable for different values of the instrument. For example, a test for a constant treatment effect, $\alpha = Y(1) - Y(0)$, can be implemented by applying the test of equality of distributions to $W_i = Y_i - \alpha \cdot D_i$. If $\alpha$ is unknown and needs to be estimated, the asymptotic distribution of the test statistic will be affected. Nuisance parameters may also arise if parametric models are used to adjust for the effect of covariates. Although estimation of nuisance parameters is not explicitly addressed in the present article, modifications along the lines of Romano (1988) or theorem 19.23 in van der Vaart (1998) look like promising starting points to obtain results analogous to those in Proposition 2.2.

Another interesting question is how to make the cdf estimators proposed in this article nondecreasing. One possible approach is to choose the nondecreasing function that minimizes a weighted average quadratic distance to the estimated cdf. This can be accomplished using well-known isotonic regression methods as in Robertson, Wright, and Dykstra (1988).
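One standard way to compute such a weighted least-squares nondecreasing fit is the pool-adjacent-violators algorithm. The sketch below is an illustration only, not a procedure taken from the article: the function name `pava` is hypothetical, equal weights are assumed by default because the article does not specify them, and the input is assumed to be the estimated cdf evaluated on an increasing grid of outcome values.

```python
import numpy as np


def pava(values, weights=None):
    """Weighted isotonic (nondecreasing) least-squares fit.

    values: estimated cdf on an increasing grid; weights: positive weights
    (equal weights if omitted, which is an assumption of this sketch).
    """
    v = np.asarray(values, dtype=float)
    w = np.ones_like(v) if weights is None else np.asarray(weights, dtype=float)

    means, wts, counts = [], [], []   # one entry per pooled block
    for vi, wi in zip(v, w):
        means.append(vi)
        wts.append(wi)
        counts.append(1)
        # Pool adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            total = wts[-2] + wts[-1]
            merged = (means[-2] * wts[-2] + means[-1] * wts[-1]) / total
            pooled = counts[-2] + counts[-1]
            means[-2:] = [merged]
            wts[-2:] = [total]
            counts[-2:] = [pooled]
    # Expand each block mean back to the original grid length.
    return np.repeat(means, counts)
```

An input that is already nondecreasing is returned unchanged, so the adjustment only affects regions where the instrumental variable estimate dips.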
290 Journal of the American Statistical Association, March 2002

Finally, using techniques similar to those in Appendix A, it can be seen that the results in Proposition 2.2 also hold for the permutation versions of the tests proposed in this article (for permutation tests, resampling is done without replacement). An appealing feature of permutation tests is that, by construction, they provide exact level in finite samples (see, e.g., Efron and Tibshirani 1993).

APPENDIX A: ASYMPTOTIC VALIDITY OF THE BOOTSTRAP

Proof of Proposition 2.2.

Part (i) can be proven by extending the argument in van der Vaart and Wellner (1996), chap. 3.7, to tests for first- and second-order stochastic dominance. Let $P_1$, $P_0$ be the probability laws of $Y$ conditional on $Z = 1$ and $Z = 0$, respectively. Let $Q$ be the probability law of $Z$, which is Bernoulli with parameter $\pi$. Define the empirical measures

$$\mathbb{P}_{1,n_1} = \frac{1}{n_1} \sum_{i=1}^{n_1} \delta_{Y_{1,i}}, \qquad \mathbb{P}_{0,n_0} = \frac{1}{n_0} \sum_{j=1}^{n_0} \delta_{Y_{0,j}},$$

where $\delta_Y$ indicates a probability mass point at $Y$. Let $\mathcal{F} = \{1\{(-\infty, y]\} : y \in \mathbb{R}\}$, that is, the class of indicators of all lower half lines in $\mathbb{R}$. Because $\mathcal{F}$ is known to be universally Donsker, by theorem 3.5.1 in van der Vaart and Wellner (1996) we have

$$G_{1,n_1} = n_1^{1/2} (\mathbb{P}_{1,n_1} - P_1) \Rightarrow G_{P_1}, \qquad G_{0,n_0} = n_0^{1/2} (\mathbb{P}_{0,n_0} - P_0) \Rightarrow G_{P_0}$$

in $l^\infty(\mathcal{F})$, where "$\Rightarrow$" denotes weak convergence, $l^\infty(\mathcal{F})$ is the set of all uniformly bounded real functions on $\mathcal{F}$ and $G_P$ is a $P$-Brownian bridge. Let

$$D_n = \left( \frac{n_1 n_0}{n} \right)^{1/2} (\mathbb{P}_{1,n_1} - \mathbb{P}_{0,n_0}).$$

As $n \to \infty$, $\pi_n = n_1/n \to \pi \in (0, 1)$ almost surely. Then, if $P_1 = P_0 = P$, $D_n \Rightarrow (1 - \pi)^{1/2} \cdot G_P - \pi^{1/2} \cdot G'_P$, where $G_P$ and $G'_P$ are independent versions of a $P$-Brownian bridge. Because $(1 - \pi)^{1/2} \cdot G_P - \pi^{1/2} \cdot G'_P$ is also a $P$-Brownian bridge, we have that $D_n \Rightarrow G_P$.

For $f \in \mathcal{F}$, let $a(f) = \sup\{t \in \mathbb{R} : f(t) = 1\}$ and $\lambda$ be the Lebesgue measure on $\mathbb{R}$. For $z \in l^\infty(\mathcal{F})$, define the following maps:

$$T^{eq}(z) = \sup_{f \in \mathcal{F}} |z(f)|, \qquad T^{fsd}(z) = \sup_{f \in \mathcal{F}} z(f), \qquad T^{ssd}(z) = \sup_{f \in \mathcal{F}} \int_{\{g \in \mathcal{F} : a(g) \le a(f)\}} z(g) \, d\mu(g),$$

where $\mu = \lambda \circ a$. Our test statistics are $T^{eq}(D_n)$, $T^{fsd}(D_n)$ and $T^{ssd}(D_n)$. Let $T$ be a generic notation for $T^{eq}$, $T^{fsd}$ or $T^{ssd}$. Notice that, for $z_1, z_2 \in l^\infty(\mathcal{F})$, $T(z_2) \le T(z_1) + T(z_2 - z_1)$. Because $T^{eq}$ is equal to the norm in $l^\infty(\mathcal{F})$, trivially $T^{eq}$ is continuous. $T^{fsd}$ is also continuous because $T^{fsd}(z_2 - z_1) \le T^{eq}(z_2 - z_1)$. Finally, the bounded support condition allows us to restrict ourselves to functions $z_2, z_1 \in \{x \in l^\infty(\mathcal{F}) : x(a^{-1}(t)) = 0 \text{ for } t \in (-\infty, l) \cup (u, \infty)\}$, for some real $l$, $u$ ($l < u$) such that the support of $P$ is contained in the interval $[l, u]$. Then, it is easy to see that $T^{ssd}(z_2 - z_1) \le (u - l) \cdot T^{fsd}(z_2 - z_1)$, hence $T^{ssd}$ is continuous. For the stochastic dominance tests we will use the least favorable case ($P_1 = P_0$) to derive the null asymptotic distribution. Under the least favorable null hypotheses, by continuity, the test statistics converge in distribution to $T^{eq}(G_P)$, $T^{fsd}(G_P)$, and $T^{ssd}(G_P)$, respectively. Note that, in general, the asymptotic distribution of our test statistics under the least favorable null hypotheses depends on the underlying probability $P$. It can easily be seen that our test statistics tend to infinity under any fixed alternative.

Let $c_P(\alpha) = \inf\{c : P(T(G_P) > c) \le \alpha\}$. Consider a test that rejects the null hypothesis if $T(D_n) > c_n$. Because $c_P(\alpha)$ depends on $P$, the sequence $\{c_n\}$ is determined by a resampling method. Consider the pooled sample $(Y_1, \ldots, Y_n) = (Y_{1,1}, \ldots, Y_{1,n_1}, Y_{0,1}, \ldots, Y_{0,n_0})$, and define the pooled empirical measure

$$\mathbb{H}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{Y_i};$$

then $\mathbb{P}_{1,n_1} - \mathbb{H}_n = (1 - \pi_n)(\mathbb{P}_{1,n_1} - \mathbb{P}_{0,n_0})$. Let $(\hat{Y}_1, \ldots, \hat{Y}_n)$ be a random sample from the pooled empirical measure. Define the bootstrap empirical measures:

$$\hat{\mathbb{P}}_{1,n_1} = \frac{1}{n_1} \sum_{i=1}^{n_1} \delta_{\hat{Y}_i}, \qquad \hat{\mathbb{P}}_{0,n_0} = \frac{1}{n_0} \sum_{j=n_1+1}^{n} \delta_{\hat{Y}_j}.$$

By theorem 3.7.7 in van der Vaart and Wellner (1996), if $n \to \infty$, then $n_1^{1/2} (\hat{\mathbb{P}}_{1,n_1} - \mathbb{H}_n) \Rightarrow G_H$ given almost every sequence $(Y_{1,1}, \ldots, Y_{1,n_1})$, $(Y_{0,1}, \ldots, Y_{0,n_0})$, where $H = \pi \cdot P_1 + (1 - \pi) \cdot P_0$. The same result holds for $n_0^{1/2} (\hat{\mathbb{P}}_{0,n_0} - \mathbb{H}_n)$. Let

$$\hat{D}_n = \left( \frac{n_1 n_0}{n} \right)^{1/2} (\hat{\mathbb{P}}_{1,n_1} - \hat{\mathbb{P}}_{0,n_0}).$$

Note that $T(\hat{D}_n) = T\big( (1 - \pi_n)^{1/2} n_1^{1/2} (\hat{\mathbb{P}}_{1,n_1} - \mathbb{H}_n) - \pi_n^{1/2} n_0^{1/2} (\hat{\mathbb{P}}_{0,n_0} - \mathbb{H}_n) \big)$. Therefore, $T(\hat{D}_n)$ converges in distribution to $T\big( (1 - \pi)^{1/2} G_H - \pi^{1/2} G'_H \big)$ almost surely, where $G_H$ and $G'_H$ are independent $H$-Brownian bridges. Because $(1 - \pi)^{1/2} G_H - \pi^{1/2} G'_H$ is also an $H$-Brownian bridge, we have that, if $P_1 = P_0 = P$, then $T(\hat{D}_n)$ converges in distribution to $T(G_P)$ almost surely. Let $\hat{P}$ be the bootstrap probability measure for the sample, and let

$$\hat{c}_n = \inf\{c : \hat{P}(T(\hat{D}_n) > c) \le \alpha\}.$$

To obtain the result of asymptotic size equal to $\alpha$, note that $T^{eq}$, $T^{fsd}$, and $T^{ssd}$ are convex continuous functionals. Note also that if $P$ is nondegenerate, $T(G_P)$ has support equal to $[0, \infty)$. By theorem 11.1 in Davydov, Lifshits, and Smorodina (1998), $T(G_P)$ has continuous and strictly increasing cdf everywhere except possibly at zero. If $P$ is nondegenerate, $\Pr(\sup_{f \in \mathcal{F}} |G_P(f)| = 0) = 0$, so $T^{eq}(G_P)$ has absolutely continuous distribution. It is left to be shown that $T^{fsd}(G_P)$ and $T^{ssd}(G_P)$ cannot have probability mass greater than 0.5 at zero. Because the variance of $G_P(f)$ is zero outside the support of $P$, we have that $\Pr(\sup_{f \in \mathcal{F}} G_P(f) = 0) = \Pr(\nexists f \in \mathcal{F} : G_P(f) > 0)$. By symmetry of Gaussian measures, $\Pr(\nexists f \in \mathcal{F} : G_P(f) > 0) = \Pr(\nexists f \in \mathcal{F} : G_P(f) < 0)$. By Assumption 2.2, $P$ is nondegenerate, hence $\Pr(\nexists f \in \mathcal{F} : G_P(f) > 0 \,\cap\, \nexists f \in \mathcal{F} : G_P(f) < 0) = 0$. Therefore, $1 \ge \Pr(\nexists f \in \mathcal{F} : G_P(f) > 0 \,\cup\, \nexists f \in \mathcal{F} : G_P(f) < 0) = 2 \cdot \Pr(\sup_{f \in \mathcal{F}} G_P(f) = 0)$. The same reasoning applies to $T^{ssd}(G_P)$ once we substitute $\int_{\{g \in \mathcal{F} : a(g) \le a(f)\}} G_P(g) \, d\mu(g)$ for $G_P(f)$. Therefore, for $\alpha < 0.5$, we have $\hat{c}_n \to c_P(\alpha)$ almost surely. Then, the first result of the theorem holds by continuity of $T(G_P) - c_P(\alpha)$ at zero.

By tightness of the limiting process, $\hat{c}_n$ is bounded in probability and the tests are consistent against any fixed alternative. This proves (i) and (ii).

To prove (iii), let $M = Q \cdot P$. Under $M$,

$$\begin{aligned}
D_n(f) &= \left( \frac{n_0 n_1}{n} \right)^{1/2} \big( \mathbb{P}_{1,n_1}(f) - \mathbb{P}_{0,n_0}(f) \big) \\
&= \frac{1}{n^{1/2}} \sum_{i=1}^{n} \left[ \left( \frac{n_0}{n_1} \right)^{1/2} Z_i - \left( \frac{n_1}{n_0} \right)^{1/2} (1 - Z_i) \right] \cdot \big( f(Y_i) - Pf(Y_i) \big) \\
&= \frac{1}{n^{1/2}} \sum_{i=1}^{n} \left[ \left( \frac{1 - \pi}{\pi} \right)^{1/2} Z_i - \left( \frac{\pi}{1 - \pi} \right)^{1/2} (1 - Z_i) \right] \cdot \big( f(Y_i) - Pf(Y_i) \big) + o_p(1).
\end{aligned}$$

Local alternatives are given by $M_n = Q \cdot P_n$, where $P_n$ is a sequence of conditional probability measures equal to $P_{z,n}$ for $Z = z$. $P_{0,n}$ and

Table 2. True Test Size in Small Samples, Monte Carlo Simulation

Column groups are distributions; columns within each group are nominal test levels.

                                          Empirical example     N(0, 1)               U(0, 1)               b(.5, 10)
Test                         Sample size  .10    .05    .01     .10    .05    .01     .10    .05    .01     .10    .05    .01

Equality of distributions             25  .119   .062   .017    .121   .063   .011    .130   .063   .014    .107   .053   .010
                                      50  .114   .059   .015    .127   .072   .017    .148   .085   .020    .108   .061   .013
                                     100  .114   .055   .012    .114   .058   .011    .127   .069   .016    .115   .060   .016
                                     250  .106   .051   .011    .119   .058   .011    .111   .053   .011    .121   .058   .013
                                     500  .099   .047   .010    .107   .052   .010    .112   .055   .010    .106   .053   .012
First-order stochastic                25  .122   .059   .015    .125   .060   .012    .134   .070   .014    .101   .050   .011
dominance                             50  .109   .055   .012    .135   .068   .018    .131   .072   .016    .118   .051   .013
                                     100  .106   .056   .012    .115   .058   .011    .123   .067   .016    .112   .056   .015
                                     250  .105   .053   .011    .120   .061   .013    .119   .056   .010    .110   .062   .014
                                     500  .091   .049   .010    .106   .055   .011    .114   .057   .011    .093   .046   .009
Second-order stochastic               25  .110   .058   .011    .106   .052   .006    .107   .050   .011    .101   .049   .009
dominance                             50  .101   .050   .012    .110   .059   .010    .103   .054   .011    .103   .052   .009
                                     100  .104   .051   .009    .100   .047   .007    .105   .052   .011    .105   .053   .012
                                     250  .098   .049   .011    .095   .048   .010    .100   .045   .007    .105   .051   .011
                                     500  .100   .048   .011    .102   .053   .009    .101   .051   .009    .092   .045   .011
s.e.                                      .005   .003   .002    .005   .003   .002    .005   .003   .002    .005   .003   .002
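One cell of a size table such as Table 2 could be simulated along the following lines. This is only a sketch under stated assumptions, not the code behind the reported figures: the function names are hypothetical, only the equality and first-order dominance statistics of Appendix A are implemented, draws are standard normal with an even split, and the iteration counts are far below the 2,000 bootstrap and 4,000 Monte Carlo iterations described in Appendix B. The bootstrap resamples with replacement from the pooled sample and rejects when the share of bootstrap statistics at least as large as the observed one does not exceed the nominal level, which is equivalent to comparing the statistic with the bootstrap critical value.

```python
import random


def scaled_cdf_gaps(y1, y0):
    """sqrt(n1*n0/n) * (F1(y) - F0(y)) at every distinct pooled point y."""
    n1, n0 = len(y1), len(y0)
    scale = (n1 * n0 / (n1 + n0)) ** 0.5
    s1, s0 = sorted(y1), sorted(y0)
    gaps, i, j = [], 0, 0
    for y in sorted(set(y1) | set(y0)):
        while i < n1 and s1[i] <= y:
            i += 1
        while j < n0 and s0[j] <= y:
            j += 1
        gaps.append(scale * (i / n1 - j / n0))
    return gaps


def t_eq(y1, y0):
    """Equality statistic: supremum of the absolute scaled cdf gap."""
    return max(abs(g) for g in scaled_cdf_gaps(y1, y0))


def t_fsd(y1, y0):
    """First-order dominance statistic: supremum of the signed scaled gap."""
    return max(scaled_cdf_gaps(y1, y0))


def bootstrap_pvalue(y1, y0, stat, n_boot=200, rng=None):
    """Share of pooled-bootstrap statistics at least as large as observed."""
    rng = rng or random.Random(0)
    pooled = list(y1) + list(y0)
    n1 = len(y1)
    observed = stat(y1, y0)
    exceed = 0
    for _ in range(n_boot):
        draw = rng.choices(pooled, k=len(pooled))  # resample with replacement
        if stat(draw[:n1], draw[n1:]) >= observed:
            exceed += 1
    return exceed / n_boot


def simulated_size(stat, n=25, level=0.10, n_mc=100, n_boot=100, seed=0):
    """Monte Carlo rejection rate under the null, standard normal draws."""
    rng = random.Random(seed)
    n1 = n // 2  # 1/1 split, approximate for odd n
    rejections = 0
    for _ in range(n_mc):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p = bootstrap_pvalue(sample[:n1], sample[n1:], stat, n_boot, rng)
        if p <= level:
            rejections += 1
    return rejections / n_mc
```

With these small iteration counts the simulated rejection rate is noisy, so it should be read as an order-of-magnitude check rather than as a replication of any entry in the table.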

$P_{1,n}$ approach a common limit $P$ in the following sense:

$$\int \left[ n^{1/2} \big( dP_{z,n}^{1/2} - dP^{1/2} \big) - \frac{1}{2} x_z \, dP^{1/2} \right]^2 \to 0 \quad \text{for } z = 0, 1, \qquad \text{(A.1)}$$

where $x_1$, $x_0$ are measurable real functions. Therefore,

$$\int \left[ n^{1/2} \big( dM_n^{1/2} - dM^{1/2} \big) - \frac{1}{2} x \, dM^{1/2} \right]^2 \to 0,$$

for $x(Z_i, Y_i) = (1 - Z_i) \cdot x_0(Y_i) + Z_i \cdot x_1(Y_i)$.

It can be shown (van der Vaart and Wellner (1996), lemma 3.10.11) that the sequences of product measures $M_n^n$ and $M^n$ are contiguous, $Mx = 0$, $Mx^2 < \infty$, and

$$\log \frac{dM_n^n}{dM^n} = \sum_{i=1}^{n} \log \frac{dM_n}{dM}(Z_i, Y_i) = \frac{1}{n^{1/2}} \sum_{i=1}^{n} x(Z_i, Y_i) - \frac{1}{2} Mx^2 + o_p(1),$$

under $M$. Therefore

$$\left( D_n(f), \; \log \frac{dM_n^n}{dM^n} \right)' \overset{M^n}{\Rightarrow} N\left( \begin{pmatrix} 0 \\ -Mx^2/2 \end{pmatrix}, \begin{pmatrix} P(f - Pf)^2 & \tau(f) \\ \tau(f) & Mx^2 \end{pmatrix} \right),$$

where $\tau(f) = \pi^{1/2} (1 - \pi)^{1/2} (\nu_1(f) - \nu_0(f))$ and $\nu_z(f) = P x_z f$. Applying LeCam's third lemma

$$D_n(f) \overset{M_n^n}{\Rightarrow} N\big( 0, P(f - Pf)^2 \big) + \pi^{1/2} (1 - \pi)^{1/2} (\nu_1(f) - \nu_0(f)).$$

Using the Donsker property of $\mathcal{F}$, we obtain the uniform version of the last result (see van der Vaart and Wellner 1996, theorem 3.10.12):

$$D_n \Rightarrow G_P + \pi^{1/2} (1 - \pi)^{1/2} \cdot (\nu_1 - \nu_0).$$

In addition, $\sup_{f \in \mathcal{F}} |n^{1/2} (P_{z,n} - P) f - \nu_z(f)| \to 0$ for $z = 0, 1$, and therefore $\sup_{f \in \mathcal{F}} |n^{1/2} (P_{1,n} - P_{0,n}) f - (\nu_1(f) - \nu_0(f))| \to 0$.

By contiguity arguments $\hat{c}_n \overset{M_n^n}{\to} c_P(\alpha)$. Then, using a version of Anderson's lemma for general Banach spaces (see, e.g., van der Vaart and Wellner 1996, lemma 3.11.4), we obtain the desired result for the test of equality of distributions.

The same result holds for first- and second-order dominance tests (note that for these tests the sequence of contiguous alternatives should be specified such that $T^{fsd}(\nu_1 - \nu_0) \ge 0$ and $T^{ssd}(\nu_1 - \nu_0) \ge 0$, respectively).

APPENDIX B: SMALL SAMPLE BEHAVIOR

To assess the small sample performance of the tests proposed in this article, a Monte Carlo study was conducted. To mimic as closely as possible the actual small sample behavior of these tests in real applications, one of the distributions used for the simulation study is the empirical distribution of annual earnings from the data used in Section 3. The other three distributions are a standard normal, a uniform on (0, 1), and a binomial with parameters (.5, 10). (Note that the simulation considers a distribution, the standard normal, that belongs to a larger family than permitted by the regularity conditions, because it does not have bounded support.) For each distribution and each Monte Carlo iteration, a sample of size $n$ was drawn ($n$ equal to 25, 50, 100, 250, and 500). Each sample was divided into two subsamples following the proportion of draft eligibles/noneligibles in the original data for the first distribution, and a 1/1 proportion (approximate for $n$ odd) for the other three distributions. Then, the test statistics in equations (7)–(9) were computed and the bootstrap tests were performed using 2,000 bootstrap iterations. This process was repeated for 4,000 Monte Carlo iterations. Table 2 shows the results of this simulation study for sample sizes equal to 25, 50, 100, 250, and 500 and nominal test levels equal to 0.10, 0.05, and 0.01. Asymptotic standard errors (as the number of Monte Carlo iterations tends to infinity) are reported in the last row of the table. The table shows highly satisfactory performance of the tests, even in fairly small samples ($n = 25$).

APPENDIX C: DATA DESCRIPTION

The data set was especially prepared for Angrist and Krueger (1995). Both annual earnings and weekly wages are in real terms. Weekly wages are imputed by dividing annual labor earnings by the number of weeks worked. The Vietnam era draft lottery is carefully

described in Angrist (1990), where the validity of draft eligibility as an instrument for veteran status is also discussed. This lottery was conducted every year between 1970 and 1974 and it used to assign numbers (from 1 to 365) to dates of birth in the cohorts being drafted. Men with lowest numbers were called to serve up to a ceiling determined every year by the Department of Defense. The value of that ceiling varied from 95 to 195 depending on the year. Here, an indicator for lottery numbers lower than 100 is used as an instrument for veteran status. The fact that draft eligibility affected the probability of enrollment along with its random nature makes this variable a good candidate to instrument veteran status.

[Received March 2000. Revised May 2001.]

REFERENCES

Abadie, A., Angrist, J. D., and Imbens, G. W. (in press), “Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings,” Econometrica.
Anderson, G. (1996), “Nonparametric Tests for Stochastic Dominance in Income Distributions,” Econometrica, 64, 1183–1193.
Andrews, D. W. K. (1997), “A Conditional Kolmogorov Test,” Econometrica, 65, 1097–1128.
Angrist, J. D. (1990), “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,” American Economic Review, 80, 313–336.
Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996), “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–472.
Angrist, J. D., and Krueger, A. B. (1995), “Split-Sample Instrumental Variables Estimates of the Return to Schooling,” Journal of Business and Economic Statistics, 13, 225–235.
Atkinson, A. B. (1970), “On the Measurement of Inequality,” Journal of Economic Theory, 2, 244–263.
Barrett, G., and Donald, S. (1999), “Consistent Tests for Stochastic Dominance,” unpublished manuscript.
Baskir, L. M., and Strauss, W. A. (1978), Chance and Circumstance: the Draft, the War, and the Vietnam Generation. New York: Alfred A. Knopf.
Bickel, P. J. (1969), “A Distribution Free Version of the Smirnov Two Sample Test in the p-Variate Case,” The Annals of Mathematical Statistics, 40, 1–23.
Darling, D. A. (1957), “The Kolmogorov–Smirnov, Cramer–von Mises Tests,” The Annals of Mathematical Statistics, 28, 823–838.
Davydov, Y. A., Lifshits, M. A., and Smorodina, N. V. (1998), Local Properties of Distributions of Stochastic Functionals. Providence: American Mathematical Society.
Dawid, A. P. (1979), “Conditional Independence in Statistical Theory,” Journal of the Royal Statistical Society, 41, 1–31.
Dudley, R. M. (1989), Real Analysis and Probability. New York: Chapman & Hall.
Efron, B., and Tibshirani, R. J. (1993), An Introduction to the Bootstrap. New York: Chapman & Hall.
Foster, J. E., and Shorrocks, A. F. (1988), “Poverty Orderings,” Econometrica, 56, 173–177.
Imbens, G. W., and Angrist, J. D. (1994), “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–476.
Imbens, G. W., and Rubin, D. B. (1997), “Estimating Outcome Distributions for Compliers in Instrumental Variable Models,” Review of Economic Studies, 64, 555–574.
Klecan, L., McFadden, R., and McFadden, D. (1991), “A Robust Test for Stochastic Dominance,” unpublished manuscript, MIT.
McFadden, D. (1989), “Testing for Stochastic Dominance,” in Studies in the Economics of Uncertainty in Honor of Josef Hadar, ed. by T. B. Fomby and T. K. Seo. New York: Springer-Verlag.
Præstgaard, J. T. (1995), “Permutation and Bootstrap Kolmogorov-Smirnov Tests for the Equality of Two Distributions,” Scandinavian Journal of Statistics, 22, 305–322.
Robertson, T., Wright, F. T., and Dykstra, R. L. (1988), Order Restricted Statistical Inference. New York: Wiley.
Romano, J. P. (1988), “A Bootstrap Revival of Some Nonparametric Distance Tests,” Journal of the American Statistical Association, 83, 698–708.
Rubin, D. B. (1990), “Formal Models of Statistical Inference for Causal Effects,” Journal of Statistical Planning and Inference, 25, 279–292.
van der Vaart, A. W. (1998), Asymptotic Statistics. New York: Cambridge University Press.
van der Vaart, A. W., and Wellner, J. A. (1996), Weak Convergence and Empirical Processes. New York: Springer-Verlag.
