
Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges


By Susan Athey, Guido Imbens, Thai Pham, and Stefan Wager∗

arXiv:1702.01250v1 [stat.ME] 4 Feb 2017

∗ Athey: Graduate School of Business, Stanford University, athey@stanford.edu. Imbens: Graduate School of Business, Stanford University, imbens@stanford.edu. Pham: Graduate School of Business, Stanford University, thaipham@stanford.edu. Wager: Department of Statistics, Columbia University, and Graduate School of Business, Stanford University, swager@stanford.edu. We are grateful for discussions with Jasjeet Sekhon and comments by Panos Toulis.

I. Introduction

There is a large literature in econometrics and statistics on semiparametric estimation of average treatment effects under the assumption of unconfounded treatment assignment. Recently this literature has focused on the setting with many covariates, where regularization of some kind is required. In this article we discuss some of the lessons from the earlier literature and their relevance for the many covariate setting, and propose some supplementary analyses to assess the credibility of the results.

II. The Set Up

We are interested in estimating an average treatment effect in a setting with a binary treatment. We use the potential outcome or Rubin Causal Model set up (Rubin [1974], Holland [1986], Imbens and Rubin [2015]). Each unit in a large population is characterized by a pair of potential outcomes $(Y_i(0), Y_i(1))$, with the estimand equal to the average causal effect:

$$\tau = \mathbb{E}[Y_i(1) - Y_i(0)],$$

or the average effect for the treated, $\tau_t = \mathbb{E}[Y_i(1) - Y_i(0) \mid W_i = 1]$. The treatment assignment for unit $i$ is $W_i \in \{0, 1\}$. For each unit in a random sample from the population we observe the treatment received and the realized outcome,

$$Y_i^{\mathrm{obs}} = \begin{cases} Y_i(0) & \text{if } W_i = 0, \\ Y_i(1) & \text{if } W_i = 1, \end{cases}$$

and pretreatment variables or features $X_i$. To identify $\tau$ we assume unconfoundedness (Rosenbaum and Rubin [1983])

$$W_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i,$$

and overlap of the covariate distributions,

$$e(x) \in (0, 1),$$

where the propensity score (Rosenbaum and Rubin [1983]) is $e(x) = \mathrm{pr}(W_i = 1 \mid X_i = x)$. Define the marginal treatment probability $p = \mathbb{E}[W_i]$, the conditional means of the potential outcomes, $\mu(w, x) = \mathbb{E}[Y_i(w) \mid X_i = x]$, the marginal means, $\mu_w = \mathbb{E}[Y_i(w)]$, and the conditional variances $\sigma^2(w, x) = \mathbb{V}(Y_i(w) \mid X_i = x)$. The efficient score for $\tau$, which plays a key role in the discussion, is

$$\phi(y, w, x; \tau, \mu(\cdot,\cdot), e(\cdot)) = w\,\frac{y - \mu(1, x)}{e(x)} - (1 - w)\,\frac{y - \mu(0, x)}{1 - e(x)} + \mu(1, x) - \mu(0, x) - \tau,$$

(Hahn [1998]) and the implied semiparametric variance bound is

$$\mathrm{AV} = \mathbb{E}\bigl[\phi(Y_i^{\mathrm{obs}}, W_i, X_i; \tau, \mu(\cdot,\cdot), e(\cdot))^2\bigr].$$

For the average effect for the treated, $\tau_t$, the efficient score function is

$$\phi'(y, w, x; \tau_t, \mu(\cdot,\cdot), e(\cdot)) = \frac{w}{p}\,\bigl(y - \mu(0, x) - \tau_t\bigr) - \frac{(1 - w)\,e(x)}{p\,(1 - e(x))}\,\bigl(y - \mu(0, x)\bigr).$$
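To make the efficient score for $\tau$ concrete, the sketch below evaluates it on simulated data and uses it to form an estimate of $\tau$ and a sample analogue of the variance bound. Everything about the simulation is an assumption made for illustration and is not taken from the paper: the one-covariate design, the logistic propensity score, the linear conditional means with $\tau = 1$, and the function names `efficient_score` and `simulate`. The nuisance functions are treated as known, which sidesteps the estimation issues discussed in the rest of the article.

```python
import numpy as np

def efficient_score(y, w, mu1, mu0, e, tau):
    """Efficient score phi for the average treatment effect, one value per unit."""
    return (w * (y - mu1) / e
            - (1 - w) * (y - mu0) / (1 - e)
            + mu1 - mu0 - tau)

def simulate(n, seed=0):
    """Hypothetical design: one covariate, logistic propensity, tau = 1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-0.5 * x))   # true propensity score e(x)
    w = rng.binomial(1, e)               # treatment indicator W_i
    mu0 = x                              # mu(0, x)
    mu1 = x + 1.0                        # mu(1, x), so the true tau is 1
    y = np.where(w == 1, mu1, mu0) + rng.normal(size=n)
    return y, w, mu1, mu0, e

y, w, mu1, mu0, e = simulate(20_000)

# phi is linear in tau, so the sample moment condition has a closed-form root:
# tau_hat is the average of the score evaluated at tau = 0.
tau_hat = np.mean(efficient_score(y, w, mu1, mu0, e, tau=0.0))

phi = efficient_score(y, w, mu1, mu0, e, tau_hat)
av_hat = np.mean(phi ** 2)               # sample analogue of AV = E[phi^2]
se_hat = np.sqrt(av_hat / len(y))        # implied standard error for tau_hat
print(tau_hat, se_hat)
```

In applications the true `mu1`, `mu0`, and `e` would be replaced by estimates, and the quality of those estimates is exactly what the supplementary analyses in this article are meant to probe.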
2 PAPERS AND PROCEEDINGS MONTH YEAR

A wide range of estimators for $\tau$ has been proposed in this setting (see Imbens and Wooldridge [2009] for a review). Some of the proposed estimators rely on matching (Abadie and Imbens [2006]). Others rely on different characterizations of the average treatment effect, using the propensity score (Hirano et al. [2001]),

$$\tau = \mathbb{E}\left[\frac{Y_i^{\mathrm{obs}} \cdot W_i}{e(X_i)} - \frac{Y_i^{\mathrm{obs}} \cdot (1 - W_i)}{1 - e(X_i)}\right],$$

the conditional expectation of the outcome,

$$\tau = \mathbb{E}\bigl[\mu(1, X_i) - \mu(0, X_i)\bigr],$$

(Hahn [1998]), or the efficient score representation

$$\tau = \mathbb{E}\left[W_i\,\frac{Y_i^{\mathrm{obs}} - \mu(1, X_i)}{e(X_i)} - (1 - W_i)\,\frac{Y_i^{\mathrm{obs}} - \mu(0, X_i)}{1 - e(X_i)} + \mu(1, X_i) - \mu(0, X_i)\right]$$

(van der Vaart [2000], Van Der Laan and Rubin [2006], Chernozhukov et al. [2016]). Corresponding estimators exist for the average effect for the treated.

Because the unconfoundedness assumption imposes no restrictions on the joint distribution of the observed variables $(Y_i^{\mathrm{obs}}, W_i, X_i)$, it follows by the general results for semiparametric estimators in Newey [1994] that all three approaches, substituting suitable (sometimes undersmoothed) nonparametric estimators of the propensity score and/or the conditional expectations of the potential outcomes, and replacing the expectations by averages, reach the semiparametric efficiency bound.

III. Four Issues

First we wish to raise four issues that have come up in the fixed-number-of-covariate case, and which are even more relevant in the many covariate setting.

A. Double Robustness

A consistent finding from the observational study literature with a fixed number of pretreatment variables is that the best estimators in practice involve both estimation of the conditional expectations of the potential outcomes and estimation of the propensity score, rendering them less sensitive to estimation error in either. (Although this appears to be less important in the case of a randomized experiment, where simply estimating the conditional expectation of the outcome automatically leads to robustness, e.g., Wager et al. [2016], because the propensity score is constant and therefore always correctly specified.) An important notion in the observational study literature, formalizing this idea, is that of so-called “doubly robust” estimators (Robins and Rotnitzky [1995], Robins et al. [1995], Scharfstein et al. [1999]) that rely for consistency only on consistent estimation of either the propensity score or the conditional outcome expectations, but do not require consistent estimation of both. See also Kang and Schafer [2007] for a critical perspective on these ideas. As a simple example to develop intuition for this, consider the standard omitted variable bias formula when estimating a regression function

$$Y_i^{\mathrm{obs}} = W_i \tau + X_i \beta + \varepsilon_i.$$

Omitting $X_i$ from this regression leads to a bias in the least squares estimator for $\tau$ if, first, the included regressor $W_i$ and the omitted regressor $X_i$ are correlated, and second, the omitted regressor has a nonzero coefficient. In this setting weighting by the inverse of, or conditioning on, the propensity score removes the correlation between $W_i$ and $X_i$. Therefore it eliminates the sensitivity to the parametric form in which $X_i$ is included, without introducing bias if the weights are misspecified but the regression function is correct.

Here we view estimators as at least in the spirit of doubly robust estimation if they attempt to adjust directly for the association between the treatment indicator and the covariates, through balancing, weighting, or otherwise, and adjust directly for the association between the potential outcomes and the covariates. There are multiple ways of obtaining such estimators. One can do so by subclassification on the propensity score in combination with regression within the subclasses, or weighting in combination with regression. For example, suppose we parametrize the conditional means as $\mu(w, x) = w\tau + x'\beta$, and the propensity score as $e(x) = 1/(1 + \exp(x'\gamma))$, and estimate the regression by weighted linear regression with weights equal to

$$\frac{W_i}{\sqrt{e(X_i; \hat\gamma)}} + \frac{1 - W_i}{\sqrt{1 - e(X_i; \hat\gamma)}}.$$

Then the estimator for $\tau$ is consistent if either the propensity score or the conditional expectations of the potential outcomes are correctly specified. Similarly, using the efficient score, if we estimate the average treatment effect by solving

$$\frac{1}{N} \sum_{i=1}^{N} \phi\bigl(Y_i^{\mathrm{obs}}, W_i, X_i; \tau, \hat\mu(\cdot,\cdot), \hat e(\cdot)\bigr) = 0,$$

as a function of $\tau$ given estimators $\hat\mu(\cdot,\cdot)$ and $\hat e(\cdot)$, then as long as the estimator for either $\mu(w, x)$ or $e(x)$ is consistent, the resulting estimator for $\tau$ is consistent. If we use general nonparametric estimators for $\mu(\cdot,\cdot)$ and $e(\cdot)$, this last estimator also has the property that the estimator for the finite dimensional component $\tau$ is asymptotically uncorrelated with the estimator for the nonparametric components $\mu(w, x)$ and $e(x)$. This orthogonality property (Chernozhukov et al. [2016]) follows from the representation of the estimator in terms of the efficient score. Note that the properties are distinct: not all estimators that have the orthogonality property are doubly robust.

B. Modifying the Estimand

A second issue is the choice of estimand. Much of the literature has focused on the average treatment effect $\mathbb{E}[Y_i(1) - Y_i(0)]$, or the average effect for the treated. A practical concern is that these estimands may be difficult to estimate precisely if the propensity score is close to zero for a substantial fraction of the population. This is a particular concern in settings with many covariates because regularization based on prediction criteria may downplay biases that are present in estimation of $\mu(w, x)$ in parts of the $(w, x)$ space with few observations, even if those values are important for the estimation of the average treatment effect. In that case one may wish to focus on a weighted average effect of the treatment. One can do so by trimming or weighting. Crump et al. [2006, 2009] and Li et al. [2014] suggest estimating

$$\tau_{\omega(\cdot)} = \frac{\mathbb{E}\bigl[\omega(X_i) \cdot (Y_i(1) - Y_i(0))\bigr]}{\mathbb{E}[\omega(X_i)]},$$

for $\omega(x) = e(x)(1 - e(x))$ or $\omega(x) = \mathbf{1}_{\alpha < e(x) < 1 - \alpha}$. The semiparametric efficiency bound for $\tau_{\omega(\cdot)}$ is (Hirano et al. [2001])

$$\mathrm{AV} = \frac{1}{\mathbb{E}[\omega(X_i)]^2}\,\mathbb{E}\left[\frac{\omega(X_i)^2 \sigma^2(1, X_i)}{e(X_i)} + \frac{\omega(X_i)^2 \sigma^2(0, X_i)}{1 - e(X_i)} + \omega(X_i)^2 \bigl(\mu(1, X_i) - \mu(0, X_i) - \tau_{\omega(\cdot)}\bigr)^2\right],$$

which can be an order of magnitude smaller than the asymptotic variance bound for $\tau$ itself.

In settings with limited or no heterogeneity in the treatment effects as a function of the covariates, these weights are particularly helpful and the weights $\omega(x) = e(x)(1 - e(x))$ lead to efficient estimators for $\tau$ in that case. The arguments in Crump et al. [2006, 2009] and Li et al. [2014] show that one may wish to impose a constant treatment effect in estimation even if substantively one does not find that assumption credible.

C. Weighting versus Balancing

Although weighting by the inverse of the treatment assignment probability balances pretreatment variables in expectation, it does not do so in finite samples. Recently there have been a number of estimators proposed that focus directly on balancing the pretreatment variables, bypassing estimation of the propensity score (Hainmueller [2012], Zubizarreta [2015], Graham et al. [2012, 2016], Athey et al. [2016]). Specifically, given a set of pretreatment variables $X_i$, one can look for a set of weights $\lambda_i$ such that

$$\frac{1}{N_t} \sum_{i=1}^{N} \lambda_i \cdot W_i \cdot X_i \approx \frac{1}{N_c} \sum_{i=1}^{N} \lambda_i \cdot (1 - W_i) \cdot X_i,$$

where $N_c$ and $N_t$ are the number of control and treated units respectively. The advantage of such weights is that they eliminate any biases associated with linear and additive effects in the pretreatment variables in the estimator

$$\hat\tau = \frac{\sum_{i=1}^{N} \lambda_i W_i Y_i^{\mathrm{obs}}}{\sum_{i=1}^{N} \lambda_i W_i} - \frac{\sum_{i=1}^{N} \lambda_i (1 - W_i) Y_i^{\mathrm{obs}}}{\sum_{i=1}^{N} \lambda_i (1 - W_i)},$$

whereas using the propensity score weights $\lambda_i = W_i / e(X_i) + (1 - W_i)/(1 - e(X_i))$ does so only in expectation.

D. Sensitivity

Consider the simple difference in average outcomes by treatment status as an estimator for the average treatment effect. The bias in this estimator arises from the presence of pretreatment variables that are associated with both the treatment and the potential outcomes. Pretreatment variables that are associated solely with the treatment, or solely with the potential outcomes, may make it difficult to estimate the propensity score or the conditional expectations of the potential outcomes, but such variables do not compromise the estimates of the average treatment effects. As a result it is not so much sparsity of the propensity score or sparsity of the conditional expectations, but sparsity of the product of the respective coefficients that matters. A summary measure of this association is the characterization of the bias as an expected value,

$$B = \bigl(\mathbb{E}[Y_i^{\mathrm{obs}} \mid W_i = 1] - \mathbb{E}[Y_i^{\mathrm{obs}} \mid W_i = 0]\bigr) - \tau = \frac{1}{p(1 - p)}\,\mathbb{E}[b(X_i)],$$

where the bias function $b(\cdot)$ is

$$b(x) = (e(x) - p) \times \bigl(p\,(\mu(0, x) - \mu_0) + (1 - p)\,(\mu(1, x) - \mu_1)\bigr).$$

Hence the bias is proportional to the covariance of the propensity score and a weighted average of the conditional expectations of the potential outcomes,

$$\mathrm{Cov}\bigl(e(X_i),\; p\,\mu(0, X_i) + (1 - p)\,\mu(1, X_i)\bigr).$$

The bias function at $x$ measures the contribution to the overall bias $B$ coming from units with $X_i = x$. It is flat in a randomized experiment, or in cases where the pretreatment variables are not associated with the outcome. As another special case, consider a setting where all the pretreatment variables are uncorrelated and have mean zero and unit variance. If $e(x) = p + x'\gamma$ and $\mu(w, x) = \tau w + x'\beta$, then $b(x) = \beta' x x' \gamma$, so that $B = \beta'\gamma/(p(1 - p))$, depending only on the product of the coefficients in the outcome equation and the propensity score.

Settings where the bias $B$ is large relative to the difference in average outcomes by treatment status, or where $b(\cdot)$ is very variable, are particularly challenging for estimating $\tau$. In our calculations below we report summary statistics of $\hat b(X_i)$, scaled by the standard deviation of the outcome.

IV. Three Estimators

Here we briefly discuss three of the most promising estimators that have been proposed for the case with many pretreatment variables. All three address biases from the association between pretreatment variables and potential outcomes and between pretreatment variables and treatment assignment. There are other estimators using machine learning methods that focus only on one of these associations, for example inverse propensity score weighting estimators that estimate the propensity score using machine learning methods (McCaffrey et al. [2004]), but we do not expect those to perform well. The first two estimators we discuss assume linearity of the conditional expectation of the potential outcomes in the, potentially many, covariates. How sensitive the results are in practice to this linearity assumption in settings with many covariates, where some of the covariates may be functions of underlying variables, remains to be seen.

A. The Double Selection Estimator (DSE)

Belloni et al. [2013] propose using LASSO (Tibshirani [1996]) as a covariate selection method. They do so first to select pretreatment variables that are important for explaining the outcome, and then to select pretreatment variables that are important for explaining the treatment assignment. They then combine the two sets of pretreatment variables and estimate a regression of the outcome on the treatment indicator and the union of the selected pretreatment variables.

B. The Approximate Residual Balancing Estimator (ARBE)

Athey et al. [2016] suggest using elastic net (Zou and Hastie [2005]) or LASSO (Tibshirani [1996]) to estimate the conditional outcome expectation, and then using an approximate balancing approach in the spirit of Zubizarreta [2015], as discussed in Section III.C, to further remove bias arising from remaining imbalances in the pretreatment variables.

C. The Doubly Robust Estimator (DRE) and the Double Machine Learning Estimator (DMLE)

In the general discussion of semiparametric estimation van der Vaart [2000] suggests estimating the finite dimensional component as the average of the influence function, with the infinite dimensional components estimated nonparametrically, leading to a doubly robust estimator in the spirit of Robins and Rotnitzky [1995], Robins et al. [1995], Scharfstein et al. [1999]. In the specific context of estimation of average treatment effects Van Der Laan and Rubin [2006] propose this estimator as a special case of the targeted maximum likelihood approach, suggesting various machine learning methods for estimation of the conditional outcome expectation and the propensity score. Chernozhukov et al. [2016], in the context of much more general estimation problems, propose a closely related estimator focusing on the orthogonality properties arising from the use of the efficient score. In the Chernozhukov et al. [2016] approach the sample is partitioned into $K$ subsamples, with the nonparametric component estimated on one subsample, and the parameter of interest estimated as the average of the influence function over the remainder of the sample. This is repeated $K$ times, and the estimators for the parameter of interest are averaged to obtain the final estimator, thereby further improving the properties in settings with many covariates. We report both the simple version, the DRE, and the averaged version, the DMLE.

V. Outstanding Challenges and Practical Recommendations

Here we present some practical recommendations for researchers estimating treatment effects, and discuss some of the remaining challenges for theoretical researchers.

A. Recommendations

The main recommendation is to report analyses beyond the point estimates and the associated standard errors. Supporting analyses should be presented to convey to the reader that the estimates are credible (Athey and Imbens [2016]). By credible we do not mean whether the unconfoundedness property holds, but whether the estimates effectively adjust for differences in the covariates. Here are four specific recommendations to do so.

1) (Robustness) Do not rely on a single estimation method. Many of the methods have attractive properties under slightly different sets of regularity conditions but rely on the same fundamental set of identifying assumptions.

These regularity conditions are difficult to assess in practice. Therefore, if the substantive results are not robust to the specific choice of estimator, it is unlikely that the results are credible.

2) (Overlap) Assess concerns with overlap by comparing the variance bound for $\tau$ and $\tau_{\omega(\cdot)}$ for a choice of $\omega(\cdot)$ that de-emphasizes parts of the covariate space with limited overlap. If there is a substantial efficiency difference between $\tau$ and $\tau_{\omega(\cdot)}$, report results for both.

3) (Specification Sensitivity) Split the sample based on median values of each of the covariates in turn, estimate the parameter of interest on both subsamples and average the estimates to assess sensitivity to the model specification (e.g., Athey and Imbens [2015a]).

4) (Half Sample Bias Estimates) Report half-sample estimates of the bias of the estimator, calculated as the estimator minus the average of estimates based on half samples, created by repeatedly randomly splitting the original sample into two equal-sized subsamples (Efron and Tibshirani [1994]). Asymptotic results rely on bias components of the asymptotic distribution vanishing. These estimates may shed light on the validity of such approximations. For example, they could reveal sensitivity to the choice of regularization parameter.

B. Some Illustrations

Here we illustrate these recommendations with the Connors et al. [1996] heart catheterization data, with 72 covariates. In a working paper version we provide two additional illustrations based on the Lalonde data. We report six estimators: the simple difference in average outcomes by treatment status, the OLS estimator with all covariates, the DS estimator (Belloni et al. [2013]), the ARB estimator (Athey et al. [2016]), and the DR and DML estimators (Van Der Laan and Rubin [2006], Chernozhukov et al. [2016]). In addition to the point estimates, we report simple bootstrap standard errors and the scaled bootstrap bias (SBB), calculated as the average difference between the estimates, based on equal size sample splits, and the overall estimate, scaled by the bootstrap standard error. In addition we report the average of the estimator based on sample splits, one for each covariate, where we split the sample by the median value of each covariate in turn. Given the splits we calculate the estimator for each of the two subsamples, and then average those. See Athey and Imbens [2015a] for details. We also report summary statistics of $\hat b(X_i)$, namely the average, the median and the 0.025, 0.25, 0.75 and 0.975 quantiles, based on random forest methods. We also present a histogram of the $\hat b(x)$ values.

For the Connors et al. [1996] data the methods do vary substantially, with the four estimators (ignoring the naive difference in means and the OLS estimator) ranging from 0.038 to 0.062. This range is substantial compared to the difference relative to the naive estimator of 0.074, and relative to the standard error. Trimming does not reduce this range substantially. The scaled bootstrap bias is as large as 29% of the standard error, so coverage of confidence intervals may not be close to nominal. Splitting systematically on the 70 covariates generates substantial variation in the estimates, with the standard deviation of the estimates (around 0.10) of the same order of magnitude as the standard errors of the original estimates (around 0.14). The tentative conclusion is that under unconfoundedness the average effect is likely to be positive, but with a range substantially wider than that captured by the confidence intervals based on any of the estimators.

C. Challenges

There are now more credible methods available for estimating average treatment effects under unconfoundedness with many covariates than there used to be, but there remain challenges in making these methods useful to practitioners. Here are some of the challenges remaining.

Table 1: An Illustration Based on the Connors et al. [1996] Heart Catheterization Data

                              Trimmed              Cov Split
Metric     ATT      (s.e.)    ATT       SBB        mean     std
Naive      0.074    0.014     0.038     -0.002     0.073    0.011
OLS        0.064    0.014     0.056      0.704     0.073    0.011
DSE        0.062    0.014     0.057     -0.213     0.061    0.007
ARBE       0.061    0.015     0.050     -0.157     0.061    0.007
DRE        0.038    0.012     0.039      0.084     0.039    0.006
DMLE       0.037    0.014     0.036      0.341     0.042    0.007

                              Quantiles
                   mean     0.025    0.250    0.500    0.750    0.975
b̂(Xi)/std(Yi)      0.07     -1.29    -0.54    0.25     0.58     1.29
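The half-sample bias and covariate-split diagnostics summarized in the SBB and Cov Split columns can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the simulated data, the number of splits, and the use of the naive difference in means as a stand-in estimator are all hypothetical. The sign convention follows recommendation 4 (the overall estimate minus the average of the half-sample estimates).

```python
import numpy as np

def diff_in_means(y, w, x):
    """Stand-in estimator (naive difference in means); x is accepted but unused."""
    return y[w == 1].mean() - y[w == 0].mean()

def half_sample_bias(estimator, y, w, x, n_splits=200, seed=0):
    """Overall estimate minus the average estimate over random equal-sized halves."""
    rng = np.random.default_rng(seed)
    n = len(y)
    overall = estimator(y, w, x)
    half_estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        for idx in (perm[: n // 2], perm[n // 2: 2 * (n // 2)]):
            half_estimates.append(estimator(y[idx], w[idx], x[idx]))
    return overall - np.mean(half_estimates)

def bootstrap_se(estimator, y, w, x, n_boot=200, seed=0):
    """Simple nonparametric bootstrap standard error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample units with replacement
        draws.append(estimator(y[idx], w[idx], x[idx]))
    return np.std(draws, ddof=1)

def covariate_split_estimates(estimator, y, w, x):
    """For each covariate: split at its median, estimate on both parts, average."""
    out = []
    for j in range(x.shape[1]):
        below = x[:, j] <= np.median(x[:, j])
        parts = [estimator(y[m], w[m], x[m]) for m in (below, ~below)]
        out.append(np.mean(parts))
    return np.array(out)

# Hypothetical data: three covariates, treatment depends on the first one.
rng = np.random.default_rng(1)
n, k = 2000, 3
x = rng.normal(size=(n, k))
w = rng.binomial(1, 1.0 / (1.0 + np.exp(-x[:, 0])))
y = 1.0 * w + x @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

bias = half_sample_bias(diff_in_means, y, w, x)
se = bootstrap_se(diff_in_means, y, w, x)
splits = covariate_split_estimates(diff_in_means, y, w, x)
print(bias / se, splits.mean(), splits.std())   # scaled bias, split mean, split std
```

Any estimator with the same `(y, w, x)` signature could be passed in place of `diff_in_means`. With binary or heavily tied covariates the median split can leave one part empty or without both treatment arms, so a real analysis would need to guard against that case.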

1) (Choice of Regularization) The regularization methods used continue to be based on optimal prediction for the infinite dimensional components of the influence function. Although in some cases this may be optimal in large samples, e.g., Wager et al. [2016], in many cases these methods do not focus on the ultimate object of interest, the average treatment effect, nor on the implication that not all errors in estimating the unknown functions matter equally. See Athey and Imbens [2015b] for some discussion of this issue.

2) (Choice of Prediction Methods) The leading estimators allow for the use of many different prediction methods for the infinite dimensional components, without guidance for practitioners on how to choose among these methods in practice.

3) (Supporting Analyses) There is more work needed on supporting analyses that are intended to provide evidence that in a particular data analysis the answer is credible.

REFERENCES

Alberto Abadie and Guido W Imbens. Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1):235–267, 2006.

Susan Athey and Guido Imbens. A measure of robustness to misspecification. The American Economic Review, 105(5):476–480, 2015a.

Susan Athey and Guido Imbens. Recursive partitioning for heterogeneous causal effects. arXiv preprint arXiv:1504.01132, 2015b.

Susan Athey and Guido Imbens. The state of applied econometrics-causality and policy evaluation. arXiv preprint arXiv:1607.00699, 2016.

Susan Athey, Guido Imbens, and Stefan Wager. Efficient inference of average treatment effects in high dimensions via approximate residual balancing. arXiv preprint arXiv:1604.07125, 2016.

Alexandre Belloni, Victor Chernozhukov, Ivan Fernández-Val, and Chris Hansen. Program evaluation with high-dimensional data. arXiv preprint arXiv:1311.2645, 2013.

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, et al. Double machine learning for treatment and causal parameters. arXiv preprint arXiv:1608.00060, 2016.

Alfred F Connors, Theodore Speroff, Neal V Dawson, Charles Thomas, Frank E Harrell, Douglas Wagner, Norman Desbiens, Lee Goldman, Albert W Wu, Robert M Califf, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA, 276(11):889–897, 1996.

Richard Crump, V Joseph Hotz, Guido Imbens, and Oscar Mitnik. Moving the goalposts: Addressing limited overlap in the estimation of average treatment effects by changing the estimand, 2006.

Richard K Crump, V Joseph Hotz, Guido W Imbens, and Oscar A Mitnik. Dealing with limited overlap in estimation of average treatment effects. Biometrika, pages 187–199, 2009.

Bradley Efron and Robert J Tibshirani. An introduction to the bootstrap. CRC Press, 1994.

Bryan Graham, Christine Pinto, and Daniel Egel. Inverse probability tilting for moment condition models with missing data. Review of Economic Studies, pages 1053–1079, 2012.

Bryan Graham, Christine Pinto, and Daniel Egel. Efficient estimation of data combination models by the method of auxiliary-to-study tilting (AST). Journal of Business and Economic Statistics, 34(2):288–301, 2016.

Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pages 315–331, 1998.

Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25–46, 2012.

Keisuke Hirano, Guido Imbens, Geert Ridder, and Donald Rubin. Combining panels with attrition and refreshment samples. Econometrica, pages 1645–1659, 2001.

Paul W Holland. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–970, 1986.

Guido Imbens and Jeffrey Wooldridge. Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1):5–86, 2009.

Guido W Imbens and Donald B Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.

Joseph DY Kang and Joseph L Schafer. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, pages 523–539, 2007.

Fan Li, Kari Lock Morgan, and Alan M Zaslavsky. Balancing covariates via propensity score weighting. arXiv preprint arXiv:1404.1785, 2014.

Daniel F McCaffrey, Greg Ridgeway, and Andrew R Morral. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9(4):403, 2004.

Whitney K Newey. The asymptotic variance of semiparametric estimators. Econometrica: Journal of the Econometric Society, pages 1349–1382, 1994.

James Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(1):122–129, 1995.

James Robins, Andrea Rotnitzky, and L.P. Zhao. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(1):106–121, 1995.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.

Daniel O Scharfstein, Andrea Rotnitzky, and James M Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448):1096–1120, 1999.

Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.

Mark J Van Der Laan and Daniel Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1), 2006.

Aad W. van der Vaart. Asymptotic Statistics. Number 3. Cambridge Univ Pr, 2000.

Stefan Wager, Wenfei Du, Jonathan Taylor, and Robert J Tibshirani. High-dimensional regression adjustments in randomized experiments. Proceedings of the National Academy of Sciences, 113(45):12673–12678, 2016.

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.

José R Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910–922, 2015.
