
Quantitative Social Science Methods, I, Lecture Notes:
Detecting and Reducing Model Dependence in Causal Inference

Gary King¹
Institute for Quantitative Social Science
Harvard University

August 17, 2020

¹ GaryKing.org
1 / 45 .
Detecting Model Dependence

Matching to Reduce Model Dependence

Three Matching Methods

Problems with Propensity Score Matching

The Matching Frontier

Detecting Model Dependence 2 / 45 .


Readings in Model Dependence

• King, Gary and Langche Zeng. “The Dangers of Extreme Counterfactuals,” Political Analysis, 14, 2 (2006): 131–159.
• King, Gary and Langche Zeng. “When Can History be Our Guide? The Pitfalls of Counterfactual Inference,” International Studies Quarterly, 51 (March 2007): 183–210.
• Related Software: WhatIf, MatchIt, Zelig, CEM

j.mp/causalinference

Detecting Model Dependence 3 / 45 .


Counterfactuals

• Three types:
1. Forecasts What will the mortality rate be in 2025?
2. What-if Questions What would have happened if the U.S. had
not invaded Iraq?
3. Causal Effects What is the causal effect of the Iraq war on
World GDP? (a factual minus a counterfactual)
• Counterfactuals are part of most social science research

Detecting Model Dependence 4 / 45 .


Which model would you choose? (Both fit the data well.)

• Compare prediction at 𝑥 = 1.5 to prediction at 𝑥 = 5


• How do you choose a model? 𝑅²? Some “test”? “Theory”?
• The bottom line: answers to some questions don’t exist in the data. We show how to determine which ones.
• Same for what-if questions, predictions, and causal inferences

Detecting Model Dependence 5 / 45 .
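A minimal numerical sketch of this point, not from the slides (the data-generating process and the two models are invented for illustration): two models that fit the observed data about equally well can agree near 𝑥 = 1.5, where there are data, and disagree at 𝑥 = 5, where there are none.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 3.0, 50)                    # observed x cluster well below 5
y = 1.0 + 0.8 * x + rng.normal(0, 0.3, 50)       # simple truth plus noise

# Two flexible models, both fit by least squares to the same data.
X1 = np.column_stack([np.ones_like(x), x])                         # linear in x
X2 = np.column_stack([np.ones_like(x), np.log(x), np.log(x)**2])   # quadratic in log(x)
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

def pred1(v): return b1 @ [1.0, v]
def pred2(v): return b2 @ [1.0, np.log(v), np.log(v)**2]

for v in (1.5, 5.0):   # interpolation vs. extrapolation
    print(f"x = {v}: model 1 predicts {pred1(v):.2f}, model 2 predicts {pred2(v):.2f}")
```

In-sample fit statistics for the two fits are typically very close; only the prediction far from the data exposes the model dependence.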


Model Dependence Proof

Model Free Inference


To estimate 𝐸(𝑌 ∣ 𝑋 = 𝑥) at 𝑥, average many observed 𝑌 with value 𝑥

Assumptions (Model-Based Inference)


1. Definition: model dependence at 𝑥 is the difference between
predicted outcomes for any two models that fit about equally
well.
2. The functional form follows strong continuity (think
smoothness, although it is less restrictive)

Result
The maximum degree of model dependence: a function of the
distance from the counterfactual to the data

Detecting Model Dependence 6 / 45 .


A Simple Measure of Distance from The Data

Figure: The Convex Hull

• Interpolation: Inside the convex hull


• Extrapolation: Outside the convex hull
• Works mathematically for any number of 𝑋 variables
• Software to determine whether a point is in the hull (which is all we need) without calculating the hull (which would take forever), so it’s fast; see GaryKing.org/whatif
Detecting Model Dependence 7 / 45 .
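The hull-membership check itself is a small linear-programming feasibility problem, which is one reason it can be fast: a counterfactual point 𝑥 is inside the convex hull of the observed covariates exactly when some convex combination of the observed rows equals 𝑥. Below is a minimal Python sketch of that idea (an illustration only, not the WhatIf package's code).

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, data):
    """True if `point` is a convex combination of the rows of `data`
    (weights w >= 0, sum(w) = 1, data.T @ w = point); no hull is constructed."""
    n = data.shape[0]
    A_eq = np.vstack([data.T, np.ones((1, n))])
    b_eq = np.append(np.asarray(point, dtype=float), 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))                  # observed covariates
print(in_convex_hull(X.mean(axis=0), X))        # interior point: True
print(in_convex_hull(np.full(5, 2.0), X))       # far outside the data: False
```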
Model Dependence Example
Replication of Doyle and Sambanis, APSR 2000
(From: King and Zeng, 2007)

• Data: 124 Post-World War II civil wars


• Dependent var: peacebuilding success
• Treatment: multilateral UN peacekeeping intervention (0/1)
• Control vars: war type, severity, duration; development
status,…
• Counterfactual question: Switch UN intervention for each
war
• Data analysis: Logit model
• The question: How model dependent are the results?
• Percent of counterfactuals in the convex hull: 0%
• ↝ without estimating any models, we know: inferences will
be model dependent
• For illustration: let’s find an example….

Detecting Model Dependence 8 / 45 .


Two Logit Models, Apparently Similar Results
Effect of Multilateral UN Intervention on Peacebuilding Success

                     Original “Interactive” Model        Modified Model
Variables             Coeff      SE     P-val        Coeff      SE     P-val
Wartype              −1.742    .609     .004        −1.666    .606     .006
Logdead               −.445    .126     .000         −.437    .125     .000
Wardur                 .006    .006     .258          .006    .006     .342
Factnum              −1.259    .703     .073        −1.045    .899     .245
Factnum2               .062    .065     .346          .032    .104     .756
Trnsfcap               .004    .002     .010          .004    .002     .017
Develop                .001    .000     .065          .001    .000     .068
Exp                  −6.016   3.071     .050        −6.215   3.065     .043
Decade                −.299    .169     .077         −.284    .169     .093
Treaty                2.124    .821     .010         2.126    .802     .008
UNOP4                 3.135   1.091     .004          .262   1.392     .851
Wardur*UNOP4            —       —        —            .037    .011     .001
Constant              8.609   2.157     .000         7.978   2.350     .000
N                       122                            122
Log-likelihood      −45.649                        −44.902
Pseudo 𝑅²              .423                           .433

Detecting Model Dependence 9 / 45 .


Model Dependence: Same Fit, Different Predictions

Detecting Model Dependence 10 / 45 .


Detecting Model Dependence

Matching to Reduce Model Dependence

Three Matching Methods

Problems with Propensity Score Matching

The Matching Frontier

Matching to Reduce Model Dependence 11 / 45 .


Readings, Matching
• Do powerful methods have to be complicated?
↝ “Causal Inference Without Balance Checking: Coarsened
Exact Matching” (PA, 2011. Stefano Iacus, Gary King, and
Giuseppe Porro)
• The most popular method (propensity score matching, used
in 140,000 articles!) sounds magical:
↝ “Why Propensity Scores Should Not Be Used for Matching” (PA, 2019; Gary King and Richard Nielsen)
• Matching methods optimize either imbalance (≈ bias) or # units pruned (≈ variance); users need both simultaneously:
↝ “The Balance-Sample Size Frontier in Matching Methods for
Causal Inference” (AJPS, 2017; Gary King, Christopher Lucas
and Richard Nielsen)
• Current practice, matching as preprocessing: violates current
statistical theory. So let’s change the theory:
↝ “A Theory of Statistical Inference for Matching Methods in
Causal Research” (Stefano Iacus, Gary King, Giuseppe Porro)
Matching to Reduce Model Dependence 12 / 45 .
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart, 2007: fig.1, Political Analysis)

[Figure: Outcome vs. Education in years (12–28); reproduction of Ho, Imai, King, and Stuart (2007), fig. 1, showing treated (T) and control units.]
Matching to Reduce Model Dependence 13 / 45 .
The Problems Matching Solves
Without Matching: Imbalance ↝ Model Dependence ↝ Researcher discretion ↝ Bias
With Matching: the same chain (Imbalance, Model Dependence, Researcher discretion, Bias), each term crossed out

A central project of statistics: Automating away human discretion


• Qualitative choice from unbiased estimates = biased
estimator
• e.g., Choosing from results of 50 randomized experiments
• Choosing based on “plausibility” is probably worse
• Conscientious effort doesn’t avoid biases (Banaji 2013)
• People do not have easy access to their own mental processes or feedback to avoid the problem (Wilson and Brekke 1994)
• Experts overestimate their ability to control personal biases more than nonexperts, and more prominent experts are the most overconfident
Matching to Reduce Model Dependence 14 / 45 .
What’s Matching?
• Notation: 𝑌𝑖 dep var, 𝑇𝑖 (1=treated, 0=control), 𝑋𝑖
confounders
• Treatment Effect for treated observation 𝑖:

TE𝑖 = 𝑌𝑖 (1) − 𝑌𝑖 (0)


= observed − unobserved

• Estimate 𝑌𝑖 (0) with 𝑌𝑗 with a matched (𝑋𝑖 ≈ 𝑋𝑗 ) control


• Quantities of Interest
1. SATT: Sample Average Treatment effect on the Treated:

SATT = Mean𝑖∈{𝑇𝑖=1} (TE𝑖)

2. FSATT: Feasible SATT (prune badly matched treateds too)


• Big convenience: Follow preprocessing with whatever
statistical method you’d have used without matching
• Pruning nonmatches makes control vars matter less: reduces
imbalance, model dependence, researcher discretion, & bias
Matching to Reduce Model Dependence 15 / 45 .
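A toy illustration of the estimator just defined (the numbers are invented): once each treated unit 𝑖 has a matched control 𝑗 with 𝑋𝑗 ≈ 𝑋𝑖, SATT is simply the average of 𝑌𝑖 − 𝑌𝑗 over the treated units.

```python
import numpy as np

# Outcomes for four treated units and their matched controls (made-up numbers).
y_treated = np.array([7.1, 6.4, 8.0, 5.9])        # observed Y_i(1)
y_matched_ctrl = np.array([6.5, 6.1, 7.2, 6.0])   # Y_j from the matched control, X_j ~ X_i

te = y_treated - y_matched_ctrl    # estimated TE_i for each treated unit
satt = te.mean()                   # SATT: average of TE_i over i in {T_i = 1}
print(round(satt, 3))              # 0.4
```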
Evaluating Reduction in Model Dependence
Empirical Illustration: Carpenter, AJPS, 2002

• Hypothesis: Democratic senate majorities slow FDA drug


approval time
• Data: 𝑛 = 408 new drugs (262 approved, 146 pending)
• Measured confounders: 18 (clinical factors, firm
characteristics, media variables, etc.)
• Model: lognormal survival
• QOI: Causal effect of Democratic Senate majority (identified
by Carpenter as not robust)
• Match: prune 49 units (2 treated, 17 control units)
• Run: 262,143 possible specifications; calculate SATT for each
• Evaluate: Variability in SATT across specifications
• (Normally we’d only use one or a few specifications)

Matching to Reduce Model Dependence 16 / 45 .


Reducing Model Dependence
[Figure: densities of the estimated in-sample average treatment effect for the treated (roughly −80 to −30) for the raw data and for the matched data, with the point estimate of Carpenter's specification using raw data marked.]

SATT Histogram: Effect of Democratic Senate majority on FDA drug approval time, across 262,143 specifications
Matching to Reduce Model Dependence 17 / 45 .
Another Example: Jeffrey Koch, AJPS, 2002
[Figure: densities of the estimated average treatment effect (roughly −0.05 to 0.10) for the raw data and for the matched data, with the point estimate of the raw data marked.]

SATT Histogram: Effect of being a highly visible female Republican candidate across 63 possible specifications with the Koch data
Matching to Reduce Model Dependence 18 / 45 .
Assumptions to Justify Current Practice
Existing Theory of Inference: Stop What You’re Doing!
• Framework: simple random sampling from a population
• Exact matching: Rarely possible; but would make estimation
easy
• Assumptions:
• Unconfoundedness: 𝑇 ⊥𝑌 (0) ∣ 𝑋 (Healthy & unhealthy get meds)
• Common support: Pr(𝑇 = 1∣𝑋 ) < 1 (𝑇 = 0, 1 are both possible)
• Approximate matching (bias correction, new variance
estimation): common, but all current practices would have to
change

Alternative Theory of Inference: It’s Gonna be OK!


• Framework: stratified random sampling from a population
• Define 𝐴: a stratum in a partition of the product space of 𝑋
(“continuous” variables have natural breakpoints)
• We already know and use these procedures: Group strong and
weak partisans; Don’t match college dropout with 1st year grad
student
• Assumptions:
• Set-wide Unconfoundedness: 𝑇 ⊥𝑌 (0) ∣ 𝐴
• Set-wide Common support: Pr(𝑇 = 1∣𝐴) < 1
• Fits all common matching methods & practices; no asymptotics
• Easy extensions for: multi-level, continuous, & mismeasured
treatments; 𝐴 too wide, 𝑛 too small
Matching to Reduce Model Dependence 19 / 45 .
Detecting Model Dependence

Matching to Reduce Model Dependence

Three Matching Methods

Problems with Propensity Score Matching

The Matching Frontier

Three Matching Methods 20 / 45 .


Matching: Finding Hidden Randomized Experiments
Types of Experiments

Balance on covariates:    Complete Randomization    Fully Blocked
Observed                  On average                Exact
Unobserved                On average                On average

↝ Fully blocked dominates complete randomization for:


imbalance, model dependence, power, efficiency, bias, research
costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Three Matching Methods 21 / 45 .


Method 1: Mahalanobis Distance Matching
(Approximates Fully Blocked Experiment)

Procedure
1. Preprocess (Matching)

• Distance(𝑋𝑐 , 𝑋𝑡 ) = √[(𝑋𝑐 − 𝑋𝑡 )′ 𝑆⁻¹ (𝑋𝑐 − 𝑋𝑡 )]
• Match each treated unit to the nearest control unit
• Control units: not reused; pruned if unused
• Prune matches if Distance>caliper
• (Many adjustments available to this basic method)
2. Estimation Difference in means or a model

Interpretation
• Quiz: Do you understand the distance trade-offs?
• Quiz: Does standardization help?
↝ Mahalanobis is for methodologists; in applications, use
Euclidean!

Three Matching Methods 22 / 45 .
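A minimal Python sketch of the greedy 1:1 procedure above (an illustration, not MatchIt's implementation; the data and the caliper value are made up).

```python
import numpy as np

def mahalanobis_match(X_t, X_c, caliper=np.inf):
    """Match each treated row of X_t to the nearest unused control row of X_c
    by Mahalanobis distance; drop matches farther away than the caliper."""
    S_inv = np.linalg.inv(np.cov(np.vstack([X_t, X_c]), rowvar=False))
    used, pairs = set(), {}
    for i, xt in enumerate(X_t):
        d = X_c - xt
        dist = np.sqrt(np.einsum("ij,jk,ik->i", d, S_inv, d))  # distances to all controls
        dist[list(used)] = np.inf                               # controls are not reused
        j = int(np.argmin(dist))
        if dist[j] <= caliper:                                  # prune if beyond the caliper
            pairs[i] = j
            used.add(j)
    return pairs

rng = np.random.default_rng(0)
X_c = rng.normal(size=(200, 2))                 # control covariates
X_t = rng.normal(0.5, 1.0, size=(30, 2))        # treated covariates, shifted
print(len(mahalanobis_match(X_t, X_c, caliper=1.0)), "treated units matched")
```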


Mahalanobis Distance Matching

[Figure: Age (20–80) vs. Education in years (12–28) scatter of treated (T) and control (C) units, shown as a sequence of animation frames: treateds appear, controls appear, each treated unit is matched to the nearest control, and unmatched controls are pruned.]

Three Matching Methods 23 / 45 .
Best Case: Mahalanobis Distance Matching

[Figure: the same Age-by-Education scatter with many control units available, so each treated unit finds a control that is nearly identical on both covariates.]

Three Matching Methods 24 / 45 .
Method 2: Coarsened Exact Matching
(Approximates Fully Blocked Experiment)

Procedure
1. Preprocess (Matching)
• Temporarily coarsen 𝑋 as much as you’re willing
• e.g., Education (grade school, high school, college, graduate)
• Apply exact matching to the coarsened 𝑋 , 𝐶(𝑋 )
• Sort observations into strata, each with unique values of 𝐶(𝑋 )
• Prune any stratum with 0 treated or 0 control units
• Pass on original (uncoarsened) units except those pruned
2. Estimation Difference in means or a model
• Weight controls in each stratum to equal treateds

Interpretation
• Quiz: Do you understand distance trade-offs?
• Quiz: What do you do if you have too few observations?

Three Matching Methods 25 / 45 .
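A minimal pandas sketch of the coarsen, exact-match, prune, and weight steps above (for illustration only; the cutpoints are arbitrary and the weighting is a simplified version of what the cem and MatchIt software compute).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treat": rng.integers(0, 2, 500),
    "age": rng.uniform(20, 80, 500),
    "educ": rng.integers(12, 29, 500),
})

# 1. Temporarily coarsen each covariate (cutpoints are arbitrary here).
age_bin = pd.cut(df["age"], [20, 30, 40, 55, 65, 80]).astype(str)
educ_bin = pd.cut(df["educ"], [11, 16, 18, 22, 28]).astype(str)
df["stratum"] = age_bin + "|" + educ_bin     # exact matching on the coarsened values

# 2. Prune every stratum that lacks either a treated or a control unit.
counts = df.groupby("stratum")["treat"].agg(["sum", "count"])
keep = counts.index[(counts["sum"] > 0) & (counts["sum"] < counts["count"])]
matched = df[df["stratum"].isin(keep)].copy()

# 3. Weight controls so that, within each stratum, controls count as much as treateds.
n_t = matched.groupby("stratum")["treat"].transform("sum")
n_c = matched.groupby("stratum")["treat"].transform(lambda t: (1 - t).sum())
matched["w"] = np.where(matched["treat"] == 1, 1.0, n_t / n_c)

print(len(df), "units before matching,", len(matched), "after")
```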


Coarsened Exact Matching

[Figure: Age vs. Education scatter of treated (T) and control (C) units, shown as a sequence of animation frames: the covariates are coarsened (Education into HS / BA / MA / PhD / 2nd PhD; Age into bins labeled Drinking age, Don't trust anyone over 30, The Big 40, Senior Discounts, Retirement, Old), strata containing both treated and control units are kept, and all other units are pruned.]

Three Matching Methods 26 / 45 .
Best Case: Coarsened Exact Matching

[Figure: the same Age-by-Education scatter with many control units, so the coarsened strata yield close matches for every treated unit.]

Three Matching Methods 27 / 45 .
Method 3: Propensity Score Matching
(Approximates Completely Randomized Experiment)

Procedure
1. Preprocess (Matching)
• Reduce 𝑘 elements of 𝑋 to a scalar 𝜋𝑖 ≡ Pr(𝑇𝑖 = 1 ∣ 𝑋) = 1/(1 + 𝑒^{−𝑋𝑖𝛽})
• Distance(𝑋𝑐 , 𝑋𝑡 ) = ∣𝜋𝑐 − 𝜋𝑡 ∣
• Match each treated unit to the nearest control unit
• Control units: not reused; pruned if unused
• Prune matches if Distance>caliper
• (Many adjustments available to this basic method)
2. Estimation Difference in means or a model

Interpretation
• Quiz: Do you understand distance trade-offs?
• Quiz: What do you do when one variable is very important?

Three Matching Methods 28 / 45 .
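A minimal sketch of this procedure (an illustration, not any package's implementation; the data-generating process and the caliper width are invented).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
T = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))

# 1. Reduce the k columns of X to the scalar propensity score pi_i = Pr(T_i = 1 | X_i).
pi = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]

treated = np.flatnonzero(T == 1)
controls = np.flatnonzero(T == 0)
caliper = 0.25 * pi.std()                       # caliper width chosen only for illustration
pairs, used = {}, set()
for i in treated:
    dist = np.abs(pi[controls] - pi[i])         # distance is |pi_c - pi_t|
    dist[[k for k, c in enumerate(controls) if c in used]] = np.inf   # no reuse of controls
    k = int(np.argmin(dist))
    if dist[k] <= caliper:                      # prune matches beyond the caliper
        pairs[i] = int(controls[k])
        used.add(int(controls[k]))

print(f"{len(pairs)} matched pairs from {len(treated)} treated units")
```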


Propensity Score Matching

[Figure: Age (20–80) vs. Education in years (12–28) scatter of treated (T) and control (C) units, shown as a sequence of animation frames: all units are projected onto a one-dimensional propensity score (0 to 1), matched on that scalar, and unmatched controls are pruned before the matches are mapped back into covariate space.]

Three Matching Methods 29 / 45 .
Best Case: Propensity Score Matching is Suboptimal

[Figure: the same scatter with many control units and the propensity score axis; even in the best case, units matched on the scalar score can differ substantially on Age and Education.]

Three Matching Methods 30 / 45 .
Detecting Model Dependence

Matching to Reduce Model Dependence

Three Matching Methods

Problems with Propensity Score Matching

The Matching Frontier

Problems with Propensity Score Matching 31 / 45 .


Random Pruning Increases Imbalance
Deleting data only helps if you’re careful!

• “Random pruning”: pruning process is independent of 𝑋


• Discrete example
• Sex-balanced dataset: treateds 𝑀𝑡 , 𝐹𝑡 , controls 𝑀𝑐 , 𝐹𝑐
• Randomly prune 1 treated & 1 control ↝ 4 possible datasets: 2
balanced {𝑀𝑡 , 𝑀𝑐 }, {𝐹𝑡 , 𝐹𝑐 }
2 imbalanced {𝑀𝑡 , 𝐹𝑐 }, {𝐹𝑡 , 𝑀𝑐 }
• ⟹ random pruning increases imbalance
• Continuous example
• Dataset: 𝑇 ∈ {0, 1} randomly assigned; 𝑋 any fixed variable;
with 𝑛 units
• Measure of imbalance: squared difference in means 𝑑², where 𝑑 = 𝑋̄𝑡 − 𝑋̄𝑐
• 𝐸(𝑑²) = 𝑉(𝑑) ∝ 1/𝑛 (note: 𝐸(𝑑) = 0)
• Random pruning ↝ 𝑛 declines ↝ 𝐸(𝑑 2 ) increases
• ⟹ random pruning increases imbalance
• Result is completely general

Problems with Propensity Score Matching 32 / 45 .
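A quick simulation of the continuous example (a sketch, not from the slides): estimate 𝐸(𝑑²) by averaging over many random prunings and watch it grow as 𝑛 shrinks, as 𝑉(𝑑) ∝ 1/𝑛 implies.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 1000, 2000
X = rng.normal(size=N)                     # any fixed covariate
T = rng.binomial(1, 0.5, size=N)           # treatment randomly assigned

def d_squared(keep):
    d = X[keep & (T == 1)].mean() - X[keep & (T == 0)].mean()
    return d ** 2

for n in (1000, 500, 250, 100):            # randomly prune down to n units
    vals = []
    for _ in range(reps):
        keep = np.zeros(N, dtype=bool)
        keep[rng.choice(N, size=n, replace=False)] = True
        vals.append(d_squared(keep))
    print(f"n = {n:4d}   estimated E(d^2) = {np.mean(vals):.5f}")
```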


PSM’s Statistical Properties

1. Low Standards: Sometimes helps, never optimizes


• Efficient relative to complete randomization, but
• Inefficient relative to (the more powerful) full blocking
• Other methods dominate:
𝑋𝑐 = 𝑋𝑡 ⇒ 𝜋𝑐 = 𝜋𝑡 but
𝜋𝑐 = 𝜋𝑡 ⇏ 𝑋𝑐 = 𝑋𝑡
2. The PSM Paradox: When you do “better,” you do worse
• Background: Random matching increases imbalance
• When PSM approximates complete randomization (to begin
with or, after some pruning) ↝ all 𝜋̂ ≈ 0.5 (or constant within
strata) ↝ pruning at random ↝ Imbalance ↝ Inefficiency ↝
Model dependence ↝ Bias
• If the data have no good matches, the paradox won’t be a
problem but you’re cooked anyway.
• Doesn’t PSM solve the curse of dimensionality problem?
Nope. The PSM Paradox gets worse with more covariates

Problems with Propensity Score Matching 33 / 45 .


PSM is Blind Where Other Methods Can See

[Figure: two panels, Mahalanobis (left) and Propensity Score (right), plotting treated (T) and control (C) units and their matches over X1, X2, simulation number (0–1000), and number of dropped observations (0–100); the pairs selected by PSM are spread far more widely over the covariates than the pairs selected by Mahalanobis matching.]

Problems with Propensity Score Matching 34 / 45 .
What Does PSM Match?

[Figure: two panels, MDM Matches (left) and PSM Matches (right), plotting treated and control units in (X1, X2) and highlighting the first, second, third, and final 25 matches made by each method.]

Controls: 𝑋1 , 𝑋2 ∼ Uniform(0,5)
Treateds: 𝑋1 , 𝑋2 ∼ Uniform(1,6)

Problems with Propensity Score Matching 35 / 45 .
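A sketch of this simulation setup (controls with 𝑋1, 𝑋2 ∼ Uniform(0,5), treateds with 𝑋1, 𝑋2 ∼ Uniform(1,6)), written to illustrate the figure's point rather than to reproduce the authors' code: greedy matching on the covariates keeps matched pairs close in (𝑋1, 𝑋2), while greedy matching on the estimated propensity score alone tends not to.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Xc = rng.uniform(0, 5, size=(100, 2))       # controls: X1, X2 ~ Uniform(0,5)
Xt = rng.uniform(1, 6, size=(100, 2))       # treateds: X1, X2 ~ Uniform(1,6)
X, T = np.vstack([Xc, Xt]), np.r_[np.zeros(100), np.ones(100)]

pi = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
pi_c, pi_t = pi[:100], pi[100:]

def greedy(dist):
    """1:1 greedy matching without replacement on a (treated x control) distance matrix."""
    used, pairs = set(), []
    for i in range(dist.shape[0]):
        row = dist[i].copy()
        row[list(used)] = np.inf
        j = int(np.argmin(row))
        used.add(j)
        pairs.append((i, j))
    return pairs

cov_dist = np.linalg.norm(Xt[:, None, :] - Xc[None, :, :], axis=2)   # distance in (X1, X2)
ps_dist = np.abs(pi_t[:, None] - pi_c[None, :])                      # distance in the score

for name, pairs in [("covariate matching", greedy(cov_dist)), ("PSM", greedy(ps_dist))]:
    gap = np.mean([cov_dist[i, j] for i, j in pairs])
    print(f"{name}: mean covariate distance between matched pairs = {gap:.2f}")
```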


PSM Increases Model Dependence & Bias

[Figure: two panels plotted against the number of units pruned (0–160), each comparing PSM with MDM. Left, model dependence: the variance of the estimate across 512 specifications (0.00–0.05). Right, bias: the maximum coefficient across the 512 specifications (2.0–4.0), with the true effect of 2 marked. Both quantities grow with pruning under PSM but not under MDM.]

𝑌𝑖 = 2𝑇𝑖 + 𝑋1𝑖 + 𝑋2𝑖 + 𝜖𝑖


𝜖𝑖 ∼ 𝑁 (0, 1)
Problems with Propensity Score Matching 36 / 45 .
The Propensity Score Paradox in Real Data

Finkel et al. (JOP, 2012) Nielsen et al. (AJPS, 2011)


[Figure: imbalance vs. number of units pruned in each dataset (up to about 3,000 and 2,500 units), for PSM, PSM with a 1/4 SD caliper, random pruning, CEM, and MDM, starting from the raw data; PSM's path rises toward the random-pruning path while CEM and MDM continue to reduce imbalance.]

Similar pattern for > 20 other real data sets we checked

Problems with Propensity Score Matching 37 / 45 .


Detecting Model Dependence

Matching to Reduce Model Dependence

Three Matching Methods

Problems with Propensity Score Matching

The Matching Frontier

The Matching Frontier 38 / 45 .


Tensions in Existing Matching Methods

• Maximize one metric; judge against another: propensity score matching, compared with variable-by-variable differences in means
• Choose 𝑛; check imbalance after: Propensity score matching,
Mahalanobis
• Choose imbalance; check 𝑛 after: exact matching, CEM

The Matching Frontier 39 / 45 .


The Matching Frontier 40 / 45 .
A Solution: The Matching Frontier

[Figure, built up over several animation frames: Imbalance (y-axis, more biased at the top, less biased at the bottom) vs. Number of Units Pruned (x-axis, low variance at the left, high variance at the right); individual matching Results #1–#4 appear as points, all lying above the theoretical (optimal) frontier, which bounds the impossible region beneath it.]

The Matching Frontier 41 / 45 .


How hard is the frontier to calculate?
• Consider 1 point on the SATT frontier:
• Start with the matrix of 𝑁 control units, 𝑋0
• Calculate imbalance for all (𝑁 choose 𝑛) subsets of rows of 𝑋0
• Choose the subset with the lowest imbalance
• Evaluations needed to compute the entire frontier:
• (𝑁 choose 𝑛) evaluations for each sample size 𝑛 = 𝑁, 𝑁 − 1, … , 1
• The combination is the (gargantuan) “power set”
• e.g., 𝑁 > 300 requires more imbalance evaluations than elementary particles in the universe
• ↝ It’s hard to calculate!
• We develop algorithms for the (optimal) frontier which:
• run very fast
• operate as “greedy” but are provably optimal
• do not require evaluating every subset
• work with very large data sets
• return the exact frontier (no approximation or estimation)
• ↝ It’s easy to calculate!

The Matching Frontier 42 / 45 .
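For a sense of the scale of the brute-force problem (an illustrative computation, not from the slides): the total number of subsets of 𝑁 = 300 control units is 2³⁰⁰, roughly 10⁹⁰, which dwarfs common estimates of the number of elementary particles in the universe (on the order of 10⁸⁰).

```python
from math import comb

N = 300
total = sum(comb(N, n) for n in range(N + 1))   # one term per sample size n
assert total == 2 ** N                          # i.e., the power set of the controls
print(f"2**{N} has {len(str(total))} digits")   # 91 digits, so about 10**90
```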


Constructing the FSATT Mahalanobis Frontier
[Figure, two panels shown as animation frames: left, the remaining data plotted as Covariate 2 vs. Covariate 1 with treated and control units and the next control to remove highlighted; right, the frontier: average Mahalanobis discrepancy (0.0–0.4) vs. number of observations dropped (0–20).]

The Matching Frontier 43 / 45 .
Discrete algorithm

Short version:
• Calculate bins
• Until balance stops improving, greedily prune a control unit
from the bin with the largest proportional difference between
control and treated units
[Figure: the data plotted as X2 (0.0–0.4) against X1, partitioned into Bins 1–5; a sequence of animation frames then tracks the L1 imbalance (0–0.4) against the number of observations pruned (0–40) as the greedy algorithm removes control units.]
The Matching Frontier 44 / 45 .
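A minimal sketch of this greedy step on one coarsened covariate (an illustration, not the MatchingFrontier package's code; the data and bin cutpoints are invented). Here L1 imbalance is half the sum of absolute differences between treated and control bin proportions.

```python
import numpy as np

rng = np.random.default_rng(0)
x_t = rng.uniform(1, 6, 60)                      # treated covariate values
x_c = rng.uniform(0, 5, 200)                     # control covariate values
edges = np.linspace(0, 6, 6)                     # 5 bins (cutpoints made up)

def l1(xt, xc):
    """L1 imbalance: half the sum of absolute differences in bin proportions."""
    pt, _ = np.histogram(xt, bins=edges)
    pc, _ = np.histogram(xc, bins=edges)
    return 0.5 * np.abs(pt / pt.sum() - pc / pc.sum()).sum()

frontier, xc = [l1(x_t, x_c)], x_c.copy()
while True:
    pt, _ = np.histogram(x_t, bins=edges)
    pc, _ = np.histogram(xc, bins=edges)
    surplus = pc / pc.sum() - pt / pt.sum()      # proportional surplus of controls per bin
    b = int(np.argmax(surplus))
    idx = np.flatnonzero((xc >= edges[b]) & (xc < edges[b + 1]))
    if surplus[b] <= 0 or idx.size == 0:
        break
    candidate = np.delete(xc, idx[0])            # greedily prune one control from bin b
    if l1(x_t, candidate) >= frontier[-1]:       # stop once balance no longer improves
        break
    xc = candidate
    frontier.append(l1(x_t, xc))

print(f"pruned {len(frontier) - 1} controls; L1 went {frontier[0]:.3f} -> {frontier[-1]:.3f}")
```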


Job Training Data: Frontier and Causal Estimates
[Figure: two panels against the number of observations pruned (0–15,000+): left, L1 imbalance (0.5–1.0); right, the causal estimate (−10,000 to 2,000).]

• 185 Ts; pruning most 16,252 Cs won’t increase variance much


• Huge bias-variance trade-off after pruning most Cs
• Estimates converge to experiment after removing bias
• No mysteries: basis of inference clearly revealed

The Matching Frontier 45 / 45 .
