
Chapter 1: Statistical Models for Causal Analysis

MIT 17.802: Quantitative Research Methods II

2023-03-19

Lecture 1: Statistical Models for Causal Analysis

Basic concepts and definitions

Causal inference is inference about counterfactuals. We need a statistical model that can explicitly distinguish
factuals and counterfactuals.
Treatment (Di ): Indicator of treatment intake for unit i, where i = 1, . . . , N
Observed outcome (Yi ): Variable of interest whose value may be affected by the treatment
$$Y_i = Y_{D_i i} = D_i Y_{1i} + (1 - D_i) Y_{0i}$$
Meaning that if Di = 1, then Yi = Y1i , and if Di = 0, then Yi = Y0i .
Potential outcomes (Ydi ): Value of the outcome that would be realized if unit i received the treatment d
where d = 0 or 1
Y1i - potential outcome for unit i with treatment.
Y0i - potential outcome for unit i without treatment.
Causal effect / unit treatment effect (τi = Y1i − Y0i )
The fundamental problem of causal inference is that we can never observe both Y1i and Y0i for the
same i. This makes τi unidentifiable without further assumptions.
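
The definitions above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the potential outcomes, the constant unit effect of 2, the coin-flip treatment, and the variable names (`y0`, `y1`, `d`, `y_obs`, `tau`) are all invented for illustration and do not come from the lecture. The unit effects are computable here only because both potential outcomes are simulated; with real data one of them is always missing.

```python
import random

random.seed(0)  # arbitrary seed, used only for reproducibility

N = 5
# Hypothetical potential outcomes; in real data at most one is seen per unit.
y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [v + 2.0 for v in y0]  # assumed constant unit effect of 2 (illustration only)
d = [random.randint(0, 1) for _ in range(N)]  # assumed coin-flip treatment

# Observed outcome: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i
y_obs = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]

# Unit treatment effects tau_i = Y_1i - Y_0i (known only because we simulated both).
tau = [a - b for a, b in zip(y1, y0)]

for i in range(N):
    missing = "Y0" if d[i] == 1 else "Y1"
    print(f"unit {i + 1}: D={d[i]}, observed Y={y_obs[i]:.2f}, never observed: {missing}")
```

Each printed row shows one observed outcome and names the potential outcome that stays hidden, which is the fundamental problem in miniature.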

Key assumptions

SUTVA: Stable Unit Treatment Value Assumption


$$Y_{(D_1, D_2, \ldots, D_N)i} = Y_{(D_1', D_2', \ldots, D_N')i} \quad \text{if } D_i = D_i'$$

This means: 1. No interference between units (spillover effects, contagion, dilution, etc.). 2. Stability of
treatment across units (no different versions of the treatment).
Without SUTVA, even with only two units, unit 1 already has four potential outcomes, one for each
treatment vector (D1 , D2 ): Y(0,0)1 , Y(1,0)1 , Y(0,1)1 , Y(1,1)1 . This means that there are at least six causal
effects for unit 1 (one for each pair of those potential outcomes, taking one minus the other).
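
The counting argument can be checked by enumeration. The sketch below assumes binary treatments and counts one contrast per unordered pair of potential outcomes; the labels are plain strings used only for display.

```python
from itertools import combinations, product

n_units = 2  # the two-unit example from the text

# Without SUTVA, unit 1's outcome may depend on the whole treatment vector (D1, D2).
treatment_vectors = list(product([0, 1], repeat=n_units))
potential_outcomes = [f"Y({d1},{d2})1" for d1, d2 in treatment_vectors]

# Each unordered pair of distinct potential outcomes defines one contrast.
contrasts = list(combinations(potential_outcomes, 2))

print(len(potential_outcomes))  # 4
print(len(contrasts))           # 6
```

With N units the number of potential outcomes per unit grows as 2^N, which is why SUTVA is needed to keep the problem tractable.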

Key estimands

Since unit-level causal effects are fundamentally unobservable, we instead focus on averages in most situations.
The average treatment effect (ATE) is still not identified without further assumptions, and throughout this
course we will consider various assumptions under which it can be identified from the observed information.

$$\tau_{ATE} = \frac{1}{N} \sum_{i=1}^{N} [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i}]$$

The average treatment effect on the treated (ATT) is not equal to ATE when Di and Ydi are associated.

$$\tau_{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} D_i [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$

Where $N_1 = \sum_{i=1}^{N} D_i$, the number of treated units.
The average treatment effect on the control (ATC) by extension could be thought of as:
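$$\tau_{ATC} = \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 0]$$

Where $N_0 = \sum_{i=1}^{N} (1 - D_i)$, the number of control units.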

The conditional average treatment effect (CATE) is a subgroup effect: the treatment effect on units that have
particular characteristics x.

τCAT E (x) = E[Y1i − Y0i |Xi = x]

Where Xi is a pre-treatment covariate for unit i.
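
The four estimands can be computed side by side on simulated data where both potential outcomes are known. This is a sketch under invented assumptions: a binary covariate `x`, an effect of 1 when x = 0 and 3 when x = 1, and treatment uptake that is more likely when x = 1. None of these values come from the lecture.

```python
import random

random.seed(1)  # arbitrary seed, used only for reproducibility

N = 1000
x = [random.randint(0, 1) for _ in range(N)]  # hypothetical binary pre-treatment covariate
y0 = [random.gauss(0, 1) for _ in range(N)]
# Assumed effect: 1 when x = 0 and 3 when x = 1 (illustrative values only).
y1 = [b + (3.0 if xi == 1 else 1.0) for b, xi in zip(y0, x)]
# Assumed selection: units with x = 1 are more likely to take the treatment.
d = [1 if random.random() < (0.8 if xi == 1 else 0.2) else 0 for xi in x]

tau = [a - b for a, b in zip(y1, y0)]  # unit effects, known only in simulation

def mean(values):
    values = list(values)
    return sum(values) / len(values)

ate = mean(tau)
att = mean(t for t, di in zip(tau, d) if di == 1)
atc = mean(t for t, di in zip(tau, d) if di == 0)
cate = {v: mean(t for t, xi in zip(tau, x) if xi == v) for v in (0, 1)}

print(f"ATE={ate:.2f}  ATT={att:.2f}  ATC={atc:.2f}  CATE(0)={cate[0]:.2f}  CATE(1)={cate[1]:.2f}")
```

Because treatment uptake is tied to `x` and the effect differs by `x`, the ATT and ATC differ from the ATE in this setup, while the ATE remains the N1/N and N0/N weighted average of the two.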


The most common naive estimator is a comparison of observed outcomes for the treated and untreated:

$$\tilde{\tau} = \frac{1}{N_1} \sum_{i=1}^{N} D_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) Y_i = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$

Unfortunately, this estimator is biased if selection into treatment is associated with potential outcomes.
Proof (drawing also from Recitation 2, slide 5)
This is what we start out with:
τ̃ = E[Yi |Di = 1] − E[Yi |Di = 0]

Which, because we observe Y1i for treated units and Y0i for untreated units, is this:


τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 0]

Now we can add and subtract E[Y0i |Di = 1] (so the two terms cancel out). This is the hypothetical expected
outcome that the treatment group would have had without treatment.

τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 0] + E[Y0i |Di = 1] − E[Y0i |Di = 1]

Then we switch the order around:

τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 1] + E[Y0i |Di = 1] − E[Y0i |Di = 0]

Here, the first half of our new τ̃ is the average treatment effect on the treated:

E[Y1i |Di = 1] − E[Y0i |Di = 1] = τAT T

The second half of our new τ̃ is the selection bias, because it represents the difference in expected
untreated outcomes between the treatment and control groups, which is not driven by the treatment.

E[Y0i |Di = 1] − E[Y0i |Di = 0] = Selection bias
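
The decomposition of the naive estimator into ATT plus selection bias can be verified numerically. The sketch below assumes a constant effect of 1 and a self-selection rule in which units with a higher Y0 are more likely to be treated; both are invented for illustration, as are the variable names.

```python
import random

random.seed(2)  # arbitrary seed, used only for reproducibility

N = 10_000
y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [v + 1.0 for v in y0]  # assumed constant effect of 1
# Assumed self-selection: units with higher Y0 are more likely to be treated.
d = [1 if v + random.gauss(0, 1) > 0 else 0 for v in y0]
y = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]  # observed outcomes

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [i for i in range(N) if d[i] == 1]
control = [i for i in range(N) if d[i] == 0]

# Naive estimator: difference in mean observed outcomes.
naive = mean(y[i] for i in treated) - mean(y[i] for i in control)
# ATT and selection bias use Y0 for treated units, available only in simulation.
att = mean(y1[i] - y0[i] for i in treated)
selection_bias = mean(y0[i] for i in treated) - mean(y0[i] for i in control)

print(f"naive={naive:.3f}  ATT={att:.3f}  selection bias={selection_bias:.3f}")
```

In this setup the naive difference equals the ATT plus the selection bias up to floating-point error, and the bias is positive because the treated units would have had higher outcomes even without treatment.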

Here, the naive estimator identifies the ATT only when selection bias is zero, which happens only when the
expected outcome without treatment is the same for the treatment group as for the control group:

E[Y0i |Di = 1] = E[Y0i |Di = 0] = E[Y0i ]

Meanwhile (still going off of recitation), ATE will be identified when both of the following hold:

E[Y0i |Di = 1] = E[Y0i |Di = 0] = E[Y0i ]

AND

E[Y1i |Di = 1] = E[Y1i |Di = 0] = E[Y1i ]

Research design can make it more likely that these conditions are met:
Best: Researcher randomizes the treatment.
Next best: Treatment assignment process is quasi-random and well understood (“natural experiments”)
Not so great: Treatment is “as if” random after statistical control (regression, matching)
Worst: Treatment is self-selected and no plausible control is available.
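
The contrast between the best and worst designs can be sketched with a simulation. It assumes a constant effect of 1 (so the true ATE is 1), a fair coin flip for the randomized design, and a hypothetical self-selection rule in which units with Y0 above zero take the treatment. These choices are illustrative and not taken from the lecture.

```python
import random

random.seed(3)  # arbitrary seed, used only for reproducibility

N = 10_000
y0 = [random.gauss(0, 1) for _ in range(N)]
y1 = [v + 1.0 for v in y0]  # assumed constant effect of 1, so the true ATE is 1

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def naive_difference(d):
    """Difference in mean observed outcomes between treated and untreated units."""
    treated = [y1[i] for i in range(N) if d[i] == 1]
    control = [y0[i] for i in range(N) if d[i] == 0]
    return mean(treated) - mean(control)

# Randomized design: a fair coin flip, independent of the potential outcomes.
d_random = [random.randint(0, 1) for _ in range(N)]
# Self-selected design (assumed rule): units with Y0 above zero take the treatment.
d_selected = [1 if v > 0 else 0 for v in y0]

print(f"randomized:    {naive_difference(d_random):.3f}")
print(f"self-selected: {naive_difference(d_selected):.3f}")
```

Under the coin flip the naive difference should land close to the true ATE of 1, while under the self-selection rule it is inflated by the gap in Y0 between the two groups.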
