04 Ate

Inference for Average Treatment Effects
Kosuke Imai
Harvard University
STAT 186/GOV 2002 CAUSAL INFERENCE
Fall 2019
Kosuke Imai (Harvard) Average Treatment Effects Stat186/Gov2002 Fall 2019 1 / 15

Motivation
Two limitations of permutation inference:
1 causal heterogeneity
2 population inference
Fundamental problem of causal inference: cannot identify individual causal effects
Neyman’s approach:
1 Average treatment effects as causal quantities of interest: sample average treatment effect (SATE) and population average treatment effect (PATE)
2 Design-based approach: randomization of treatment assignment, random sampling
3 Asymptotic approximation rather than exact inference

Social Pressure and Turnout (Gerber, et al. 2008. Am. Political Sci. Rev.)
August 2006 Primary Election in Michigan

Statewide elections: Governor, US Senator
180,000 households
Send postcards with different messages
Randomly assign each household to a group (or treatment)

1 no message (control group)
2 civic duty message
3 “you are being studied” message (Hawthorne effect)
4 household social pressure message
5 neighborhood social pressure message

Neighborhood Social Pressure Message

“You are being studied” Message

Standard Empirical Analysis
Group           Control    Civic duty   Hawthorne   Self      Neighbor
Turnout rate    29.7%      31.5%        32.2%       34.5%     37.5%
# of voters     191,243    38,218       38,204      38,218    38,201
Neighborhood social pressure vs. Control:
$$\hat{\tau} = 37.5 - 29.7 = 7.8$$
$$\text{s.e.} = \sqrt{\frac{37.5 \times (100 - 37.5)}{38201} + \frac{29.7 \times (100 - 29.7)}{191243}} \approx 0.3$$
$$95\%\ \text{CI} = [7.8 - 1.96 \times 0.3,\ 7.8 + 1.96 \times 0.3] = [7.2,\ 8.4]$$
This calculation ignores the fact that some households have multiple voters: we will discuss this issue later in the course
How can we justify this standard difference-in-means analysis from the randomization perspective?
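The arithmetic for the neighborhood social pressure vs. control comparison can be checked with a short Python sketch. The function name `diff_in_proportions` is hypothetical; the inputs are the turnout rates and group sizes reported in the table, and the standard error treats the two groups as independent binomial samples, as the formula on the slide does.

```python
from math import sqrt

def diff_in_proportions(p1, n1, p0, n0, z=1.96):
    """Difference in rates (percentage points), its standard error, and a normal CI."""
    tau_hat = p1 - p0
    se = sqrt(p1 * (100 - p1) / n1 + p0 * (100 - p0) / n0)
    return tau_hat, se, (tau_hat - z * se, tau_hat + z * se)

# Figures reported in the table: Neighbor vs. Control
tau_hat, se, ci = diff_in_proportions(37.5, 38201, 29.7, 191243)
print(f"estimate = {tau_hat:.1f}, s.e. = {se:.2f}, 95% CI = [{ci[0]:.1f}, {ci[1]:.1f}]")
```

With the unrounded standard error (about 0.27) the interval is slightly narrower than the [7.2, 8.4] obtained from the rounded value 0.3.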

Estimation of the Sample Average Treatment Effect
Due to Neyman (1923); English translation: Neyman. 1990. Stat. Sci.

Difference-in-means estimator:
$$\hat{\tau} \equiv \frac{1}{n_1} \sum_{i=1}^{n} T_i Y_i - \frac{1}{n_0} \sum_{i=1}^{n} (1 - T_i) Y_i$$
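Unbiasedness (over repeated treatment assignments): since $E(T_i \mid O_n) = n_1/n$ under complete randomization,
$$E(\hat{\tau} \mid O_n) = \frac{1}{n_1} \sum_{i=1}^{n} E(T_i \mid O_n) Y_i(1) - \frac{1}{n_0} \sum_{i=1}^{n} \{1 - E(T_i \mid O_n)\} Y_i(0) = \frac{1}{n} \sum_{i=1}^{n} (Y_i(1) - Y_i(0)) = \text{SATE}$$
where $O_n = \{Y_i(0), Y_i(1)\}_{i=1}^{n}$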
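Unbiasedness over repeated treatment assignments can be checked exactly on a toy example by enumerating every possible assignment. This is a minimal sketch: the potential outcomes below are made-up illustrative values, and it assumes complete randomization with a fixed number of treated units.

```python
from itertools import combinations

# Hypothetical potential outcomes for n = 6 units (illustrative values only)
y0 = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
y1 = [2.0, 3.0, 6.0, 4.0, 9.0, 1.0]
n, n1 = len(y0), 3
n0 = n - n1

def diff_in_means(treated):
    """Difference-in-means estimate when the units in `treated` receive treatment."""
    mean_treated = sum(y1[i] for i in treated) / n1
    mean_control = sum(y0[i] for i in range(n) if i not in treated) / n0
    return mean_treated - mean_control

# Complete randomization: every subset of size n1 is equally likely
estimates = [diff_in_means(set(c)) for c in combinations(range(n), n1)]
expected_tau_hat = sum(estimates) / len(estimates)
sate = sum(a - b for a, b in zip(y1, y0)) / n

print(expected_tau_hat, sate)  # the two numbers agree up to floating-point error
```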

The Variance of the Difference-in-Means Estimator
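Variance of $\hat{\tau}$:
$$V(\hat{\tau} \mid O_n) = \frac{1}{n} \left( \frac{n_0}{n_1} S_1^2 + \frac{n_1}{n_0} S_0^2 + 2 S_{01} \right),$$
where for $t = 0, 1$,
$$S_t^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i(t) - \overline{Y(t)})^2 \quad \text{(sample variance of } Y_i(t)\text{)}$$
$$S_{01} = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i(0) - \overline{Y(0)})(Y_i(1) - \overline{Y(1)}) \quad \text{(sample covariance)}$$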
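The variance is NOT identifiable: the covariance term $S_{01}$ depends on both potential outcomes of the same unit, which are never jointly observed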
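The variance formula can be verified numerically when both potential outcomes are known, which is possible only in a simulation. The sketch below uses hypothetical potential outcomes and an unbalanced design (2 treated, 4 control) so that the $n_0/n_1$ and $n_1/n_0$ weights differ; `sample_cov` is an illustrative helper, not a library function.

```python
from itertools import combinations

# Hypothetical potential outcomes (illustrative values only)
y0 = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
y1 = [2.0, 3.0, 6.0, 4.0, 9.0, 1.0]
n, n1 = len(y0), 2
n0 = n - n1

def sample_cov(a, b):
    """Sample covariance with denominator n - 1 (the sample variance when a is b)."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)

s1_sq, s0_sq, s01 = sample_cov(y1, y1), sample_cov(y0, y0), sample_cov(y0, y1)
formula_var = (n0 / n1 * s1_sq + n1 / n0 * s0_sq + 2 * s01) / n

def diff_in_means(treated):
    """Difference-in-means estimate when the units in `treated` receive treatment."""
    mean_treated = sum(y1[i] for i in treated) / n1
    mean_control = sum(y0[i] for i in range(n) if i not in treated) / n0
    return mean_treated - mean_control

# Exact randomization variance: enumerate all equally likely assignments
estimates = [diff_in_means(set(c)) for c in combinations(range(n), n1)]
center = sum(estimates) / len(estimates)
exact_var = sum((e - center) ** 2 for e in estimates) / len(estimates)

print(formula_var, exact_var)  # the two numbers agree up to floating-point error
```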

Details of the Variance Derivation
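1 Let $X_i = Y_i(1) + n_1 Y_i(0)/n_0$ and $D_i = n T_i/n_1 - 1$, and write
$$V(\hat{\tau} \mid O_n) = \frac{1}{n^2} E\left\{ \left( \sum_{i=1}^{n} D_i X_i \right)^2 \,\Big|\, O_n \right\}$$
2 Show, for $i \neq j$,
$$E(D_i \mid O_n) = 0, \quad E(D_i^2 \mid O_n) = \frac{n_0}{n_1}, \quad E(D_i D_j \mid O_n) = -\frac{n_0}{n_1 (n-1)}$$
3 Use 1 and 2 to show
$$V(\hat{\tau} \mid O_n) = \frac{n_0}{n (n-1) n_1} \sum_{i=1}^{n} (X_i - \overline{X})^2$$
4 Substitute the potential outcome expressions for $X_i$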
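The moments in step 2 and the identity in step 3 can be checked by brute force on a small design. This is a sketch under the assumption of complete randomization with $n = 6$ and $n_1 = 2$; the helper names `d` and `expect` and the potential outcomes are hypothetical.

```python
from itertools import combinations

n, n1 = 6, 2
n0 = n - n1
assignments = [set(c) for c in combinations(range(n), n1)]

def d(i, treated):
    """D_i = n * T_i / n1 - 1 for a given set of treated units."""
    return n * (1 if i in treated else 0) / n1 - 1

def expect(f):
    """Expectation of f(treated) over all equally likely assignments."""
    return sum(f(t) for t in assignments) / len(assignments)

# Step 2: first and second moments of D_i
e_d = expect(lambda t: d(0, t))
e_d2 = expect(lambda t: d(0, t) ** 2)
e_dd = expect(lambda t: d(0, t) * d(1, t))
print(e_d, e_d2, n0 / n1)
print(e_dd, -n0 / (n1 * (n - 1)))

# Step 3: the variance written in terms of X_i (hypothetical potential outcomes)
y0 = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
y1 = [2.0, 3.0, 6.0, 4.0, 9.0, 1.0]
x = [a + n1 * b / n0 for a, b in zip(y1, y0)]
x_bar = sum(x) / n
lhs = expect(lambda t: sum(d(i, t) * x[i] for i in range(n)) ** 2) / n**2
rhs = n0 / (n * (n - 1) * n1) * sum((xi - x_bar) ** 2 for xi in x)
print(lhs, rhs)
```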

Conservative Variance Estimator
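The usual variance estimator is conservative on average:
$$V(\hat{\tau} \mid O_n) \le \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} = E\left( \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_0^2}{n_0} \,\Big|\, O_n \right)$$
where
$$\hat{\sigma}_t^2 = \frac{1}{n_t - 1} \sum_{i=1}^{n} 1\{T_i = t\} (Y_i - \overline{Y}_t)^2 \quad \text{for } t = 0, 1$$
Under the constant additive unit causal effect assumption, i.e., $Y_i(1) - Y_i(0) = c$ for all $i$,
$$S_{01} = \frac{1}{2} (S_1^2 + S_0^2) \quad \text{and} \quad V(\hat{\tau} \mid O_n) = \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0}$$
The optimal treatment assignment rule:
$$n_1^{opt} = \frac{n}{1 + S_0/S_1}, \quad n_0^{opt} = \frac{n}{1 + S_1/S_0}$$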
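A minimal Python sketch of the two formulas above: the conservative variance estimate computed from observed outcomes, and the optimal allocation for given standard deviations. The function names and the data are hypothetical, and in practice $S_1$ and $S_0$ would have to be guessed or taken from pilot data.

```python
from statistics import variance

def neyman_variance_estimate(y_treated, y_control):
    """Conservative estimate: within-group sample variances divided by group sizes."""
    return variance(y_treated) / len(y_treated) + variance(y_control) / len(y_control)

def neyman_allocation(n, s1, s0):
    """Group sizes minimizing S1^2/n1 + S0^2/n0 subject to n1 + n0 = n (before rounding)."""
    n1_opt = n / (1 + s0 / s1)
    n0_opt = n / (1 + s1 / s0)
    return n1_opt, n0_opt

# Hypothetical observed outcomes in each arm
y_treated = [2.0, 6.0, 9.0, 4.0]
y_control = [1.0, 3.0, 5.0, 0.0, 2.0, 4.0]
var_hat = neyman_variance_estimate(y_treated, y_control)

# Hypothetical standard deviations: the treated outcome is three times as variable
n1_opt, n0_opt = neyman_allocation(n=100, s1=3.0, s0=1.0)
print(var_hat, n1_opt, n0_opt)
```

The allocation puts more units in the arm whose outcome is more variable, in proportion to the standard deviations.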
Bounds on the Variance
Use of the Cauchy-Schwarz inequality:
1 Upper bound: sample correlation between $Y_i(1)$ and $Y_i(0)$ is 1
2 Lower bound: sample correlation between $Y_i(1)$ and $Y_i(0)$ is −1
$$\frac{n_0 n_1}{n} \left( \frac{S_1}{n_1} - \frac{S_0}{n_0} \right)^2 \le V(\hat{\tau} \mid O_n) \le \frac{n_0 n_1}{n} \left( \frac{S_1}{n_1} + \frac{S_0}{n_0} \right)^2$$
Constant additive unit causal effect implies that the sample correlation is 1 and $S_1 = S_0$, so
$$\frac{n_0 n_1}{n} \left( \frac{S_1}{n_1} + \frac{S_0}{n_0} \right)^2 = \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0}$$
Sharp bounds based on the entire marginal distributions: application of Hoeffding’s lemma (Aronow et al. 2015. Ann. Stat.)
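The Cauchy-Schwarz bounds can be illustrated numerically. The sketch below uses hypothetical potential outcomes, so the true variance is computable and can be compared with the bounds, which depend only on $S_1$ and $S_0$, and with the conservative quantity $S_1^2/n_1 + S_0^2/n_0$.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical potential outcomes (illustrative values only)
y0 = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
y1 = [2.0, 3.0, 6.0, 4.0, 9.0, 1.0]
n, n1 = len(y0), 2
n0 = n - n1

s1, s0 = sqrt(variance(y1)), sqrt(variance(y0))
y0_bar, y1_bar = mean(y0), mean(y1)
s01 = sum((a - y0_bar) * (b - y1_bar) for a, b in zip(y0, y1)) / (n - 1)

# True variance (requires the unobservable covariance S01)
true_var = (n0 / n1 * s1**2 + n1 / n0 * s0**2 + 2 * s01) / n

# Bounds that use only the marginal variances
lower = n0 * n1 / n * (s1 / n1 - s0 / n0) ** 2
upper = n0 * n1 / n * (s1 / n1 + s0 / n0) ** 2
neyman = s1**2 / n1 + s0**2 / n0

print(lower, true_var, upper, neyman)
```

The upper bound never exceeds $S_1^2/n_1 + S_0^2/n_0$: the gap between the two is $(S_1 - S_0)^2/n$, which vanishes when $S_1 = S_0$.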

Inference for Population Average Treatment Effect
Assumption: simple random sampling from an infinite population
Unbiasedness (over repeated sampling):
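$$E\{E(\hat{\tau} \mid O_n)\} = E(\text{SATE}) = \text{PATE}$$
Variance:
$$V(\hat{\tau}) = V\{E(\hat{\tau} \mid O_n)\} + E\{V(\hat{\tau} \mid O_n)\} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n_0}$$
where $\sigma_t^2$ is the population variance of $Y_i(t)$ for $t = 0, 1$
Unbiased variance estimator:
$$\widehat{V(\hat{\tau})} = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_0^2}{n_0} \quad \text{where} \quad E\{\widehat{V(\hat{\tau})}\} = V(\hat{\tau})$$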
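The population results can be illustrated with a small Monte Carlo sketch. Everything about the data-generating process here is an assumption made for illustration: a normal superpopulation with $\sigma_0^2 = 1$, $\sigma_1^2 = 2$ and PATE equal to 2, with $n = 20$ units sampled and $n_1 = 10$ treated in each replication.

```python
import random
from statistics import mean, variance

random.seed(186)
n, n1, reps = 20, 10, 5000
n0 = n - n1
sigma1_sq, sigma0_sq = 2.0, 1.0  # population variances implied by the draws below

estimates, var_estimates = [], []
for _ in range(reps):
    # Simple random sample from a hypothetical infinite population
    y0 = [random.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [a + random.gauss(2.0, 1.0) for a in y0]  # unit effects vary; PATE = 2
    # Complete randomization of n1 units to treatment
    treated = set(random.sample(range(n), n1))
    obs1 = [y1[i] for i in treated]
    obs0 = [y0[i] for i in range(n) if i not in treated]
    estimates.append(mean(obs1) - mean(obs0))
    var_estimates.append(variance(obs1) / n1 + variance(obs0) / n0)

target_var = sigma1_sq / n1 + sigma0_sq / n0
print(mean(estimates))                    # should be close to PATE = 2
print(variance(estimates), target_var)    # should be close to each other
print(mean(var_estimates), target_var)    # estimator is unbiased for V(tau_hat)
```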
Asymptotic Inference for PATE
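Hold $k = n_1/n$ constant:
$$\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} \underbrace{\left( \frac{T_i Y_i(1)}{k} - \frac{(1 - T_i) Y_i(0)}{1 - k} \right)}_{\text{i.i.d. with mean PATE \& variance } n V(\hat{\tau})}$$
Consistency via the Law of Large Numbers:
$$\hat{\tau} \xrightarrow{p} \text{PATE}$$
Asymptotic normality via the Central Limit Theorem:
$$\sqrt{n} (\hat{\tau} - \text{PATE}) \xrightarrow{d} N\left( 0, \frac{\sigma_1^2}{k} + \frac{\sigma_0^2}{1 - k} \right)$$
$(1 - \alpha) \times 100\%$ confidence intervals:
$$[\hat{\tau} - \text{s.e.} \times z_{\alpha/2},\ \hat{\tau} + \text{s.e.} \times z_{\alpha/2}]$$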
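A normal-approximation interval of this form can be computed from raw data as in the sketch below. The function name `ate_confidence_interval` and the data are hypothetical; the standard error is the square root of the variance estimator $\hat{\sigma}_1^2/n_1 + \hat{\sigma}_0^2/n_0$, and $z_{\alpha/2}$ is taken to be the upper $\alpha/2$ quantile of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def ate_confidence_interval(y, t, alpha=0.05):
    """Difference in means with a normal-approximation (1 - alpha) confidence interval."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    tau_hat = mean(y1) - mean(y0)
    se = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    return tau_hat, se, (tau_hat - z * se, tau_hat + z * se)

# Hypothetical observed outcomes and treatment indicators
y = [2.0, 1.0, 6.0, 3.0, 9.0, 5.0, 4.0, 0.0, 2.0, 4.0]
t = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
tau_hat, se, ci = ate_confidence_interval(y, t)
print(tau_hat, se, ci)
```

With only ten units the asymptotic approximation is not trustworthy; the example is sized for readability rather than realism.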

Exchange at the Royal Statistical Society
(Neyman et al. (1935) Suppl. of J. Royal Stat. Soc.)
Neyman: So long as the average yields of any treatments are identical, the question as to whether these treatments affect separate yields on single plots seems to be uninteresting
Fisher: It may be foolish, but that is what the z test was designed for,
and the only purpose for which it has been used.
Neyman: I am considering problems which are important from the point of view of agriculture.
Fisher: It may be that the question which Dr. Neyman thinks should be
answered is more important than the one I have proposed and
attempted to answer. I suggest that before criticizing previous work it is
always wise to give enough study to the subject to understand its
purpose.
Summary: Fisher vs. Neyman
Like Fisher, Neyman proposed randomization-based inference
Unlike Fisher,
1 estimands are average treatment effects
2 heterogeneous treatment effects are allowed
3 population as well as sample inference is possible
4 asymptotic approximation is required for inference
Reading: Imbens and Rubin, Chapter 6
