Statistical Models For Causal Analysis - Causal Inference - Notes

Chapter 1: Statistical Models for Causal Analysis
MIT 17.802: Quantitative Research Methods II
2023-03-19
Lecture 1: Statistical Models for Causal Analysis
Basic concepts and definitions
Causal inference is inference about counterfactuals. We need a statistical model that can explicitly distinguish
factuals from counterfactuals.
Treatment (Di ): Indicator of treatment intake for unit i, where i = 1, . . . , N
Observed outcome (Yi ): Variable of interest whose value may be affected by the treatment
$$Y_i = Y_{D_i i} = D_i Y_{1i} + (1 - D_i) Y_{0i}$$
Meaning that if Di = 1, then Yi = Y1i , and if Di = 0, then Yi = Y0i .
Potential outcomes (Ydi ): Value of the outcome that would be realized if unit i received the treatment d
where d = 0 or 1
Y1i - potential outcome for unit i with treatment.
Y0i - potential outcome for unit i without treatment.
Causal effect / unit treatment effect (τi = Y1i − Y0i )
The fundamental problem of causal inference is that we can never observe both Y1i and Y0i for the
same i. This makes τi unidentifiable without further assumptions.
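The definitions above can be illustrated with a minimal sketch on simulated data. The potential outcomes, the constant effect of 2, and the coin-flip treatment are all assumptions made for illustration; in real data only one potential outcome per unit is ever recorded.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

N = 5
# Hypothetical potential outcomes (never jointly observable in practice).
y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [a + 2.0 for a in y0]                      # assumed constant unit effect of 2
d = [random.randint(0, 1) for _ in range(N)]    # assumed coin-flip treatment

# Observed outcome: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i
y_obs = [di * b + (1 - di) * a for di, a, b in zip(d, y0, y1)]

# The counterfactual is the potential outcome that was NOT realized.
y_missing = [(1 - di) * b + di * a for di, a, b in zip(d, y0, y1)]

for i in range(N):
    print(f"unit {i}: D={d[i]}  observed Y={y_obs[i]:+.2f}  "
          f"counterfactual (unobserved)={y_missing[i]:+.2f}")
```

Because the simulation holds both columns, the unit effect can be computed here; an analyst who sees only `d` and `y_obs` cannot compute it.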
Key assumptions
SUTVA: Stable Unit Treatment Value Assumption
$$Y_{(D_1, D_2, \ldots, D_N)\,i} = Y_{(D_1', D_2', \ldots, D_N')\,i} \quad \text{if } D_i = D_i'$$
This means:
1. No interference between units (for example spillover effects, contagion, or dilution).
2. Stability of treatment across units (no different versions of the treatment).
Without SUTVA, even with only two units, unit 1 already has four potential outcomes, one per assignment vector (D1 , D2 ):
Y(0,0)1 , Y(1,0)1 , Y(0,1)1 , Y(1,1)1 . These give six pairwise contrasts (4 choose 2), and so six candidate causal effects
for unit 1 (one potential outcome minus another).
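The counting argument can be checked with a short sketch. The helper name `assignment_vectors` is hypothetical; the code only enumerates labels of potential outcomes and makes no claim about their values.

```python
from itertools import combinations, product

def assignment_vectors(n_units):
    """All binary treatment-assignment vectors (D_1, ..., D_n)."""
    return list(product([0, 1], repeat=n_units))

# Without SUTVA, unit 1's outcome may depend on the whole vector (D1, D2).
no_sutva = assignment_vectors(2)
contrasts = list(combinations(no_sutva, 2))
print(no_sutva)        # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(len(contrasts))  # 6

# With SUTVA, only unit 1's own treatment matters: Y_01 and Y_11.
with_sutva = sorted({vec[0] for vec in no_sutva})
print(with_sutva)                              # [0, 1]
print(len(list(combinations(with_sutva, 2))))  # 1

# The number of potential outcomes per unit grows as 2**N without SUTVA.
for n in (2, 5, 10):
    print(n, len(assignment_vectors(n)))
```

With SUTVA the single remaining contrast is the unit treatment effect Y1i − Y0i defined earlier.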
Key estimands
Since unit-level causal effects are fundamentally unobservable, we instead focus on averages in most situations.
The average treatment effect (ATE) can still be identified under additional assumptions, and throughout this course
we will consider various assumptions under which it can be recovered from the observed information.
$$\tau_{ATE} = \frac{1}{N} \sum_{i=1}^{N} [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i}]$$
The average treatment effect on the treated (ATT) is not equal to ATE when Di and Ydi are associated.
$$\tau_{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} D_i [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$
Where $N_1 = \sum_{i=1}^{N} D_i$ is the number of treated units.
The average treatment effect on the control (ATC) by extension could be thought of as:
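$$\tau_{ATC} = \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 0]$$
Where $N_0 = \sum_{i=1}^{N} (1 - D_i)$ is the number of control units.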
The conditional average treatment effect (CATE) is a subgroup effect: the treatment effect on units that have
particular characteristics x.
τCATE (x) = E[Y1i − Y0i |Xi = x]
Where Xi is a pre-treatment covariate for unit i.
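To make the four estimands concrete, here is a minimal sketch that computes them on simulated potential outcomes. The data-generating process (a binary covariate, effects of 1 and 3, treatment probabilities of 0.2 and 0.8) is entirely an assumption chosen so that treatment is associated with the unit effects.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed for reproducibility
N = 10_000

# Hypothetical data-generating process; every number here is an assumption.
x = [random.randint(0, 1) for _ in range(N)]        # binary pre-treatment covariate
y0 = [random.gauss(0, 1) for _ in range(N)]         # potential outcome without treatment
y1 = [a + 1.0 + 2.0 * xi for a, xi in zip(y0, x)]   # unit effect: 1 if x = 0, 3 if x = 1
# Units with x = 1 are more likely to be treated, so D_i is associated with the effects.
d = [int(random.random() < (0.8 if xi == 1 else 0.2)) for xi in x]

effects = [b - a for a, b in zip(y0, y1)]           # tau_i, known only because we simulated it

ate = mean(effects)
att = mean(e for e, di in zip(effects, d) if di == 1)
atc = mean(e for e, di in zip(effects, d) if di == 0)
cate = {v: mean(e for e, xi in zip(effects, x) if xi == v) for v in (0, 1)}

print(f"ATE = {ate:.2f}, ATT = {att:.2f}, ATC = {atc:.2f}")
print(f"CATE(0) = {cate[0]:.2f}, CATE(1) = {cate[1]:.2f}")
```

Under these assumptions the ATT exceeds the ATE, which in turn exceeds the ATC, because high-effect units are over-represented among the treated.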

The most common naive estimator is a comparison of observed outcomes for the treated and untreated:
$$\tilde{\tau} = \frac{1}{N_1} \sum_{i=1}^{N} D_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) Y_i = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$
Unfortunately, this estimator is biased if selection into treatment is associated with potential outcomes.
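A small simulation can illustrate the bias. This is a sketch under assumed conditions: a constant unit effect of 1 and a hypothetical self-selection rule in which units with a higher untreated outcome are more likely to take the treatment.

```python
import random
from statistics import mean

random.seed(2)  # arbitrary seed for reproducibility
N = 20_000

y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [a + 1.0 for a in y0]                          # assumed constant effect, so ATE = ATT = 1
# Hypothetical self-selection: higher Y_0i makes treatment more likely.
d = [int(a + random.gauss(0, 1) > 0) for a in y0]
y = [b if di == 1 else a for a, b, di in zip(y0, y1, d)]   # observed outcome

# Naive estimator: difference in observed group means.
naive = (mean(yi for yi, di in zip(y, d) if di == 1)
         - mean(yi for yi, di in zip(y, d) if di == 0))
true_ate = mean(b - a for a, b in zip(y0, y1))

print(f"true ATE = {true_ate:.2f}, naive estimate = {naive:.2f}")
```

In this setup the naive difference in means overstates the effect, since the treated group would have had higher outcomes even without treatment.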
Proof (drawing also from Recitation 2, slide 5)
This is what we start out with:
τ̃ = E[Yi |Di = 1] − E[Yi |Di = 0]
Because Yi = Y1i when Di = 1 and Yi = Y0i when Di = 0, this is the same as:

τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 0]
Now we can add and subtract E[Y0i |Di = 1] (so the two terms cancel out). This is the hypothetical expected outcome
that treatment-group units would have had without treatment.
τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 0] + E[Y0i |Di = 1] − E[Y0i |Di = 1]
Then we switch the order around:
τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 1] + E[Y0i |Di = 1] − E[Y0i |Di = 0]
Here, the first half of our new τ̃ is the average treatment effect on the treated:
E[Y1i |Di = 1] − E[Y0i |Di = 1] = τATT
The second half of our new τ̃ is the selection bias, because it represents the differences in expected outcomes
between the treatment and control groups that are not driven by the treatment.
E[Y0i |Di = 1] − E[Y0i |Di = 0] = Selection bias
The naive estimator identifies the ATT only when selection bias is zero, which happens only when the expected
untreated outcome for the treatment group equals the expected untreated outcome for the control group:
E[Y0i |Di = 1] = E[Y0i |Di = 0] = E[Y0i ]
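The decomposition of the naive estimator into ATT plus selection bias can be verified numerically. The sketch below assumes heterogeneous effects with mean 1 and a hypothetical self-selection rule based on Y0i; the helper `cond_mean` is an illustrative name, not something from the lecture.

```python
import random
from statistics import mean

random.seed(3)  # arbitrary seed for reproducibility
N = 20_000

y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [a + random.gauss(1, 0.5) for a in y0]          # assumed heterogeneous effects, mean 1
d = [int(a + random.gauss(0, 1) > 0) for a in y0]    # hypothetical self-selection on Y_0i

def cond_mean(values, group):
    """Average of `values` over units whose treatment status equals `group`."""
    return mean(v for v, di in zip(values, d) if di == group)

naive = cond_mean(y1, 1) - cond_mean(y0, 0)            # E[Y1|D=1] - E[Y0|D=0]
att = cond_mean(y1, 1) - cond_mean(y0, 1)              # E[Y1|D=1] - E[Y0|D=1]
selection_bias = cond_mean(y0, 1) - cond_mean(y0, 0)   # E[Y0|D=1] - E[Y0|D=0]

print(f"naive = {naive:.3f}")
print(f"ATT + selection bias = {att + selection_bias:.3f}")
```

The two printed numbers agree up to floating-point error, because the identity holds exactly in any sample where both groups are non-empty. Note that `cond_mean(y0, 1)` is computable only in a simulation.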
Meanwhile (still going off of recitation), ATE will be identified when both:
E[Y0i |Di = 1] = E[Y0i |Di = 0] = E[Y0i ]
AND
E[Y1i |Di = 1] = E[Y1i |Di = 0] = E[Y1i ]
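The step from these two conditions to the ATE can be written out as a short derivation, using only the definitions already given. Substituting each condition into the naive comparison gives:

```latex
\begin{align*}
\tilde{\tau} &= E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
             &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
             &= E[Y_{1i}] - E[Y_{0i}] \\
             &= E[Y_{1i} - Y_{0i}] = \tau_{ATE}
\end{align*}
```

The third line uses both conditions at once, and the last line uses linearity of expectation.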
Research design can make it more likely that these conditions are met:
Best: Researcher randomizes the treatment.
Next best: Treatment assignment process is quasi-random and well understood (“natural experiments”)
Not so great: Treatment is “as if” random after statistical control (regression, matching)
Worst: Treatment is self-selected and no plausible control is available.
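The contrast between the best and worst designs can be sketched in a simulation. All settings are assumptions: untreated outcomes are standard normal, the self-selected treatment depends on Y0i, and the randomized treatment is a fair coin flip.

```python
import random
from statistics import mean

random.seed(4)  # arbitrary seed for reproducibility
N = 20_000

y0 = [random.gauss(0, 1) for _ in range(N)]   # hypothetical untreated potential outcomes

def selection_bias(d):
    """Sample analogue of E[Y0 | D = 1] - E[Y0 | D = 0] for assignment vector d."""
    treated = mean(a for a, di in zip(y0, d) if di == 1)
    control = mean(a for a, di in zip(y0, d) if di == 0)
    return treated - control

# Worst case (assumed rule): units self-select based on their own Y_0i.
d_self = [int(a + random.gauss(0, 1) > 0) for a in y0]
# Best case: the researcher randomizes with a fair coin, independent of Y_0i.
d_rand = [random.randint(0, 1) for _ in range(N)]

bias_self = selection_bias(d_self)
bias_rand = selection_bias(d_rand)
print(f"selection bias, self-selected: {bias_self:.3f}")
print(f"selection bias, randomized:    {bias_rand:.3f}")
```

Under randomization the selection bias is close to zero in a large sample, while under this self-selection rule it is clearly positive.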
