2023-03-19
Causal inference is inference about counterfactuals. We need a statistical model that can explicitly distinguish
factuals and counterfactuals.
Treatment ($D_i$): Indicator of treatment intake for unit $i$, where $i = 1, \ldots, N$.
Observed outcome ($Y_i$): Variable of interest whose value may be affected by the treatment.
$$Y_i = Y_{D_i i} = D_i Y_{1i} + (1 - D_i) Y_{0i}$$
Meaning that if $D_i = 1$, then $Y_i = Y_{1i}$, and if $D_i = 0$, then $Y_i = Y_{0i}$.
Potential outcomes ($Y_{di}$): Value of the outcome that would be realized if unit $i$ received the treatment $d$,
where $d = 0$ or $1$.
$Y_{1i}$: potential outcome for unit $i$ with treatment.
$Y_{0i}$: potential outcome for unit $i$ without treatment.
Causal effect / unit treatment effect: $\tau_i = Y_{1i} - Y_{0i}$
The fundamental problem of causal inference is that we can never observe both $Y_{1i}$ and $Y_{0i}$ for the
same $i$. This makes $\tau_i$ unidentifiable without further assumptions.
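The relationship between potential and observed outcomes can be sketched in a few lines of Python. This is only an illustration under assumed values: the outcome distribution, the constant unit effect of 3, and the coin-flip treatment are all hypothetical and not taken from the course material. It shows that each unit reveals exactly one of its two potential outcomes.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N = 8
# Hypothetical potential outcomes: y0[i] without treatment, y1[i] with treatment.
y0 = [rng.gauss(10, 2) for _ in range(N)]
y1 = [y0_i + 3 for y0_i in y0]  # assumed constant unit effect of 3 (illustration only)
d = [rng.randint(0, 1) for _ in range(N)]  # treatment indicator D_i

# Observed outcome: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i
y = [d_i * y1_i + (1 - d_i) * y0_i for d_i, y1_i, y0_i in zip(d, y1, y0)]

# What the analyst actually sees: the counterfactual outcome is missing (None).
observed = [
    {
        "D": d_i,
        "Y": y_i,
        "Y1": y1_i if d_i == 1 else None,
        "Y0": y0_i if d_i == 0 else None,
    }
    for d_i, y_i, y1_i, y0_i in zip(d, y, y1, y0)
]

for row in observed:
    print(row)
```

Every printed row has one potential outcome filled in and the other missing, so the unit effect cannot be computed from the observed columns alone.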
Key assumptions
Stable unit treatment value assumption (SUTVA):
$$Y_{(D_1, D_2, \ldots, D_N)i} = Y_{(D_1', D_2', \ldots, D_N')i} \quad \text{if } D_i = D_i'$$
This means:
1. No interference between units (spillover effects, contagion, dilution, etc.).
2. Stability of treatment across units (no different versions of the treatment).
Without SUTVA, even with only two units, there are far too many potential outcomes for unit 1:
$Y_{(0,0)1}, Y_{(1,0)1}, Y_{(0,1)1}, Y_{(1,1)1}$. This means that there are at least six causal effects for unit 1
(every pairwise difference between those four potential outcomes).
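The counting argument above can be checked by brute-force enumeration. The sketch below is illustrative only; the label format is made up for printing. It lists unit 1's potential outcomes when they depend on the whole treatment vector, and then every pairwise contrast between them.

```python
from itertools import combinations, product

N = 2  # two units, as in the example above

# Without SUTVA, unit 1's potential outcome is indexed by the full
# treatment vector (D_1, D_2), not just by its own treatment D_1.
assignments = list(product([0, 1], repeat=N))
labels = [f"Y_{a}1" for a in assignments]

# Each unordered pair of potential outcomes defines one causal contrast.
contrasts = list(combinations(labels, 2))

print(len(labels), "potential outcomes for unit 1:", labels)
print(len(contrasts), "pairwise contrasts:")
for first, second in contrasts:
    print(f"  {first} - {second}")

# With SUTVA, only unit 1's own treatment matters, leaving Y_11 and Y_01
# and therefore a single contrast.
```

The number of potential outcomes per unit is 2 to the power N, so the list grows quickly as units are added.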
Key estimands
Since unit-level causal effects are fundamentally unobservable, we instead focus on averages in most
situations.
The average treatment effect (ATE) is still identifiable: throughout this course we will consider various
assumptions under which it can be identified from the observed information.
$$\tau_{ATE} = \frac{1}{N} \sum_{i=1}^{N} [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i}]$$
The average treatment effect on the treated (ATT) generally differs from the ATE when $D_i$ and $Y_{di}$ are associated.
$$\tau_{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} D_i [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$
where $N_1 = \sum_{i=1}^{N} D_i$ is the number of treated units.
The average treatment effect on the control (ATC) by extension could be thought of as:
$$\tau_{ATC} = \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 0]$$
where $N_0 = N - N_1$ is the number of control units.
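To make the three estimands concrete, here is a small simulation in which both potential outcomes are known, which is never the case with real data. The outcome distributions and the selection rule are assumptions chosen only so that treatment is associated with the unit effects.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
N = 1000

# Hypothetical potential outcomes with heterogeneous unit effects tau_i.
y0 = [rng.gauss(0, 1) for _ in range(N)]
tau = [rng.gauss(2, 1) for _ in range(N)]
y1 = [y0_i + tau_i for y0_i, tau_i in zip(y0, tau)]

# Assumed selection rule: units with larger effects are more likely to be
# treated, so D_i is associated with the potential outcomes.
d = [1 if tau_i + rng.gauss(0, 1) > 2 else 0 for tau_i in tau]

n1 = sum(d)  # number of treated units
n0 = N - n1  # number of control units

effects = [y1_i - y0_i for y1_i, y0_i in zip(y1, y0)]
ate = sum(effects) / N
att = sum(d_i * e for d_i, e in zip(d, effects)) / n1
atc = sum((1 - d_i) * e for d_i, e in zip(d, effects)) / n0

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATC = {atc:.3f}")
# The ATE is the share-weighted average of the ATT and the ATC.
print(f"(N1 * ATT + N0 * ATC) / N = {(n1 * att + n0 * atc) / N:.3f}")
```

Under this assumed selection rule the ATT exceeds the ATC, and the ATE sits between them as their weighted average with weights $N_1/N$ and $N_0/N$.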
The conditional average treatment effect (CATE) is a subgroup effect: the treatment effect on units that have
particular characteristics $x$.
$$\tau_{CATE}(x) = E[Y_{1i} - Y_{0i} \mid X_i = x]$$
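A subgroup effect of this kind can be sketched with a single binary characteristic. The data below are simulated, and the effect sizes (1 for one subgroup, 3 for the other) are assumptions made purely for illustration.

```python
import random

rng = random.Random(2)  # fixed seed for reproducibility
N = 1000

# Hypothetical binary characteristic x_i (a subgroup flag).
x = [rng.randint(0, 1) for _ in range(N)]
y0 = [rng.gauss(0, 1) for _ in range(N)]
# Assumed unit effects: 1 for units with x = 0, 3 for units with x = 1.
y1 = [y0_i + 1 + 2 * x_i for y0_i, x_i in zip(y0, x)]


def cate(value):
    """Average of Y_1i - Y_0i over the units whose characteristic equals value."""
    effects = [
        y1_i - y0_i for y1_i, y0_i, x_i in zip(y1, y0, x) if x_i == value
    ]
    return sum(effects) / len(effects)


cate_by_x = {value: cate(value) for value in (0, 1)}
print(cate_by_x)
```

The overall ATE in this sketch is a mix of the two subgroup effects, weighted by how many units fall in each subgroup.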
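A natural estimator is the difference in means between the treated and control groups:
$$\tilde{\tau} = \frac{1}{N_1} \sum_{i=1}^{N_1} y_{1i} - \frac{1}{N_0} \sum_{i=1}^{N_0} y_{0i} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$
Here the first sum runs over the $N_1$ treated units and the second over the $N_0$ control units.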
Unfortunately, this estimator is biased if selection into treatment is associated with potential outcomes.
Proof (drawing also from Recitation 2, slide 5)
This is what we start out with:
$$\tilde{\tau} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$
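Because treated units reveal $Y_{1i}$ and control units reveal $Y_{0i}$, we have $E[Y_i \mid D_i = 1] = E[Y_{1i} \mid D_i = 1]$
and $E[Y_i \mid D_i = 0] = E[Y_{0i} \mid D_i = 0]$. Now we can add and subtract (so the terms cancel out)
$E[Y_{0i} \mid D_i = 1]$, the hypothetical expected outcome if the treatment group had not received treatment.
$$\tilde{\tau} = E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$$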
Here, the first half of our new $\tilde{\tau}$ is the average treatment effect on the treated:
$$E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i} \mid D_i = 1] = \tau_{ATT}$$
The second half of our new $\tilde{\tau}$ is the selection bias, because it represents the non-treatment-driven differences
in expected values between the treatment and control groups.
$$E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] = \text{Selection bias}$$
The only time that ATT will be identified here is when selection bias is zero, which will only happen
when the expected outcome of no treatment for the treatment group is the same as the expected outcome
of no treatment for the control group:
$$E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]$$
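The decomposition of the difference in means into ATT plus selection bias can be verified numerically. The sketch below uses simulated potential outcomes with an assumed constant effect of 2 and an assumed self-selection rule under which units with high untreated outcomes tend to take the treatment. None of these values come from the recitation.

```python
import random
from statistics import fmean

rng = random.Random(3)  # fixed seed for reproducibility
N = 2000

y0 = [rng.gauss(0, 1) for _ in range(N)]
y1 = [y0_i + 2 for y0_i in y0]  # assumed constant unit effect of 2

# Assumed self-selection: a high untreated outcome makes treatment more likely.
d = [1 if y0_i + rng.gauss(0, 1) > 0 else 0 for y0_i in y0]
# Observed outcome: Y_1i for treated units, Y_0i for control units.
y = [y1_i if d_i == 1 else y0_i for d_i, y1_i, y0_i in zip(d, y1, y0)]

treated = [i for i in range(N) if d[i] == 1]
control = [i for i in range(N) if d[i] == 0]

# Difference in means, computed from observed data only.
diff_in_means = fmean([y[i] for i in treated]) - fmean([y[i] for i in control])

# The two pieces of the decomposition; both need the unobservable Y_0i of treated units.
att = fmean([y1[i] - y0[i] for i in treated])
selection_bias = fmean([y0[i] for i in treated]) - fmean([y0[i] for i in control])

print(f"difference in means = {diff_in_means:.3f}")
print(f"ATT                 = {att:.3f}")
print(f"selection bias      = {selection_bias:.3f}")
print(f"ATT + selection bias = {att + selection_bias:.3f}")
```

Because treated units here would have had higher outcomes even without treatment, the selection bias is positive and the difference in means overstates the ATT.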
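Meanwhile (still going off of recitation), ATE will be identified when both:
$$E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]$$
AND
$$E[Y_{1i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 0]$$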
Research design can make it more likely that these conditions are met:
Best: Researcher randomizes the treatment.
Next best: Treatment assignment process is quasi-random and well understood (“natural experiments”)
Not so great: Treatment is “as if” random after statistical control (regression, matching)
Worst: Treatment is self-selected and no plausible control is available.
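The contrast between the best and worst designs in this ranking can be illustrated by applying two assignment mechanisms to the same simulated potential outcomes. Everything here is assumed for illustration: the constant effect of 2, the coin-flip randomization, and the self-selection rule that treats every unit with a positive untreated outcome.

```python
import random
from statistics import fmean

rng = random.Random(4)  # fixed seed for reproducibility
N = 5000

y0 = [rng.gauss(0, 1) for _ in range(N)]
y1 = [y0_i + 2 for y0_i in y0]  # assumed constant effect, so ATE = ATT = 2


def difference_in_means(d):
    """Mean observed outcome of treated units minus that of control units."""
    y = [y1_i if d_i == 1 else y0_i for d_i, y1_i, y0_i in zip(d, y1, y0)]
    treated = [y_i for y_i, d_i in zip(y, d) if d_i == 1]
    control = [y_i for y_i, d_i in zip(y, d) if d_i == 0]
    return fmean(treated) - fmean(control)


# Design 1: the researcher randomizes treatment with probability 1/2.
d_random = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
# Design 2: units self-select based on their untreated outcome.
d_self = [1 if y0_i > 0 else 0 for y0_i in y0]

est_random = difference_in_means(d_random)
est_self = difference_in_means(d_self)

print(f"true effect            = 2.000")
print(f"randomized estimate    = {est_random:.3f}")
print(f"self-selected estimate = {est_self:.3f}")
```

Under randomization the treated and control groups have similar untreated outcomes on average, so the difference in means lands close to the true effect. Under self-selection the estimate is inflated by the selection bias.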