12 Matching

Matching Methods
Kosuke Imai
Harvard University
Stat 186/Gov 2002 Causal Inference
Fall 2019
Kosuke Imai (Harvard) Matching Methods Stat186/Gov2002 Fall 2019 1 / 18

Motivation
Causal inference → inference for counterfactuals
Comparison between treated and control units
Consider the Average Treatment Effect for the Treated (ATT):
$$\tau_{\text{ATT}} = E(Y_i(1) - Y_i(0) \mid T_i = 1)$$
Regression → model-based imputation:
$$\hat{\tau}_{\text{reg}} = \frac{1}{n_1} \sum_{i=1}^{n} T_i \left( Y_i - \hat{\mu}_0(X_i) \right)$$
Regression can be model-dependent

Matching → nonparametric imputation:
$$\hat{\tau}_{\text{match}} = \frac{1}{n_1} \sum_{i=1}^{n} T_i \left( Y_i - \frac{1}{|M_i|} \sum_{i' \in M_i} Y_{i'} \right)$$
where $M_i$ is the “matched set” for treated unit $i$
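
A minimal sketch of the matching estimator above, assuming a single scalar covariate and k-nearest-neighbor matching with replacement. The function name `att_match` and the toy data are hypothetical and only illustrate how each treated unit's missing $Y_i(0)$ is imputed from its matched set.

```python
def att_match(y, t, x, k=1):
    """ATT by k-nearest-neighbor matching on a scalar covariate (with replacement)."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    total = 0.0
    for i in treated:
        # M_i: the k control units closest to treated unit i in the covariate
        matched = sorted(controls, key=lambda j: abs(x[i] - x[j]))[:k]
        imputed_y0 = sum(y[j] for j in matched) / len(matched)
        total += y[i] - imputed_y0
    return total / len(treated)

# Toy data (made up): outcome, treatment indicator, covariate
y = [5.0, 7.0, 2.0, 3.0, 6.0]
t = [1, 1, 0, 0, 0]
x = [1.0, 3.0, 1.1, 2.0, 3.2]
print(att_match(y, t, x), att_match(y, t, x, k=2))
```

Increasing `k` averages over more controls per treated unit, which tends to lower variance at the cost of using poorer matches.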

Matching as Nonparametric Preprocessing
(Ho, et al. 2007. Political Anal.)
[Figure: Ho et al. 2007, Fig. 1. Model sensitivity of ATE estimates for imbalanced raw and balanced matched data. An artificial data set of treated units (“T”) and control units (“C”); the vertical axis plots $Y_i$ and the horizontal axis plots $X_i$.]

Selection Bias
Assumptions
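1 Overlap: $0 < \Pr(T_i = 1 \mid X_i = x) < 1$ for any $x$
2 Ignorability: $\{Y_i(1), Y_i(0)\} \perp\!\!\!\perp T_i \mid X_i = x$ for any $x$
Bias decomposition (Heckman et al. 1998. Econometrica):
$$
\begin{aligned}
& E(Y_i(0) \mid T_i = 1) - E(Y_i \mid T_i = 0) \\
&= \underbrace{\int_{S_1 \setminus S} E(Y_i(0) \mid T_i = 1, X_i = x)\, dF_{X_i \mid T_i = 1}(x) - \int_{S_0 \setminus S} E(Y_i(0) \mid T_i = 0, X_i = x)\, dF_{X_i \mid T_i = 0}(x)}_{\text{bias due to lack of common support}} \\
&\quad + \underbrace{\int_{S} E(Y_i(0) \mid T_i = 0, X_i = x)\, d\{F_{X_i \mid T_i = 1}(x) - F_{X_i \mid T_i = 0}(x)\}}_{\text{bias due to imbalance of observables}} \\
&\quad + \underbrace{\int_{S} \{E(Y_i(0) \mid T_i = 1, X_i = x) - E(Y_i(0) \mid T_i = 0, X_i = x)\}\, dF_{X_i \mid T_i = 1}(x)}_{\text{bias due to unobservables}}
\end{aligned}
$$
where $S_t$ is the support of $X_i$ among units with $T_i = t$ and $S = S_1 \cap S_0$ is the region of common support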

Exact and Coarsened Exact Matching
Exact Matching → perfect covariate balance:
$$\tilde{F}(X_i \mid T_i = 1) = \tilde{F}(X_i \mid T_i = 0)$$
No model dependence
But, infeasible when
  covariate is continuous
  there are many covariates
Coarsened Exact Matching (CEM) (Iacus et al. 2011 Political Anal.)

discretize covariates so that you can match
many covariates are discrete
discrete categories may have substantive meanings
accounts for all interactions among coarsened variables
some treated units may have no matched controls (lack of overlap)
changes estimand
bias-variance tradeoff
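
A minimal sketch of coarsened exact matching, assuming user-chosen cutpoints for each covariate. The function `cem_att`, the cutpoints, and the toy data are hypothetical. Treated units whose stratum contains no control are dropped, which is the change of estimand noted above.

```python
from collections import defaultdict

def cem_att(y, t, x, cutpoints):
    """Coarsen each covariate at the given cutpoints, then match exactly on the bins."""
    def coarsen(row):
        # bin index of a value = number of cutpoints at or below it
        return tuple(sum(v >= c for c in cuts) for v, cuts in zip(row, cutpoints))

    strata = defaultdict(lambda: {"treated": [], "control": []})
    for yi, ti, xi in zip(y, t, x):
        strata[coarsen(xi)]["treated" if ti == 1 else "control"].append(yi)

    diffs, dropped = [], 0
    for cell in strata.values():
        if cell["treated"] and cell["control"]:
            y0_bar = sum(cell["control"]) / len(cell["control"])
            diffs.extend(yi - y0_bar for yi in cell["treated"])
        else:
            dropped += len(cell["treated"])  # treated units with no control in the stratum
    att = sum(diffs) / len(diffs) if diffs else float("nan")
    return att, dropped

# Toy data (made up): covariates are (age, binary indicator)
x = [(23, 0), (27, 0), (45, 1), (41, 1), (67, 0), (24, 0)]
t = [1, 0, 1, 0, 1, 0]
y = [10.0, 8.0, 12.0, 9.0, 15.0, 6.0]
cutpoints = [[30, 60], [1]]  # age bins: <30, 30-59, 60+; indicator kept as is
att, dropped = cem_att(y, t, x, cutpoints)
print(att, dropped)
```

Coarser bins drop fewer treated units but allow more within-stratum imbalance, which is the bias-variance tradeoff listed above.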

Matching based on Distance Measures
Common measures:
1 Mahalanobis distance:
$$D(X_i, X_j) = \sqrt{(X_i - X_j)^\top \tilde{\Sigma}^{-1} (X_i - X_j)}$$
2 (Estimated) Propensity score:
$$D(X_i, X_j) = |\hat{\pi}(X_i) - \hat{\pi}(X_j)| = |\widehat{\Pr}(T_i = 1 \mid X_i) - \widehat{\Pr}(T_j = 1 \mid X_j)|$$
or often with the linear predictor of logistic regression
$$D(X_i, X_j) = |\operatorname{logit}(\hat{\pi}(X_i)) - \operatorname{logit}(\hat{\pi}(X_j))|$$
Common matching methods (Rubin. 2006. Matched Sampling for Causal Effects. Cambridge University Press; Stuart. 2010. Stat. Sci.):
one-to-one, one-to-many
caliper
with and without replacement
optimal matching (Rosenbaum. 1989. J. Am. Stat. Assoc.)
full matching (Rosenbaum. 1991. J. Royal Stat. Soc. B; Hansen. 2004. J. Am. Stat. Assoc.)
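
A minimal sketch of one-to-one matching without replacement on the Mahalanobis distance, with an optional caliper. It assumes exactly two covariates and takes $\tilde{\Sigma}$ to be the sample covariance of the pooled data (an assumption; other choices are possible). The matching is greedy nearest-neighbor, not optimal matching, and all function names are hypothetical.

```python
import math

def sample_cov_2d(rows):
    """Entries (s11, s12, s22) of the sample covariance matrix of (x1, x2) pairs."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / (n - 1)
    return s11, s12, s22

def mahalanobis(a, b, cov):
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # quadratic form d' Sigma^{-1} d, using the explicit 2x2 inverse
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)

def greedy_match(x, t, caliper=math.inf):
    """Greedy one-to-one matching without replacement: {treated index: control index}."""
    cov = sample_cov_2d(x)
    controls = {j for j, tj in enumerate(t) if tj == 0}
    pairs = {}
    for i, ti in enumerate(t):
        if ti != 1 or not controls:
            continue
        j = min(sorted(controls), key=lambda c: mahalanobis(x[i], x[c], cov))
        if mahalanobis(x[i], x[j], cov) <= caliper:
            pairs[i] = j
            controls.remove(j)  # without replacement: each control is used once
    return pairs

# Toy data (made up)
x = [(0, 0), (4, 4), (0, 1), (4, 3), (0, 4), (4, 0)]
t = [1, 1, 0, 0, 0, 0]
print(greedy_match(x, t))
print(greedy_match(x, t, caliper=0.5))
```

A tight caliper leaves treated units unmatched when no control is close enough, so it trades sample size for match quality.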
Propensity Score as a Balancing Score (Rosenbaum and Rubin. 1983. Biometrika)
Probability of receiving the treatment:
$$\pi(X_i) = \Pr(T_i = 1 \mid X_i)$$
Balancing property:
$$T_i \perp\!\!\!\perp X_i \mid \pi(X_i)$$
Exogeneity given the propensity score (under exogeneity given covariates):
$$(Y_i(1), Y_i(0)) \perp\!\!\!\perp T_i \mid \pi(X_i)$$
Dimension reduction → propensity score matching

But, true propensity score is unknown: propensity score tautology
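
A short derivation sketch of the balancing property, using only the definition of $\pi(X_i)$ and the law of iterated expectations:

$$
\begin{aligned}
\Pr(T_i = 1 \mid X_i, \pi(X_i)) &= \Pr(T_i = 1 \mid X_i) = \pi(X_i), \\
\Pr(T_i = 1 \mid \pi(X_i)) &= E\{E(T_i \mid X_i) \mid \pi(X_i)\} = E\{\pi(X_i) \mid \pi(X_i)\} = \pi(X_i).
\end{aligned}
$$

The two probabilities coincide, so once $\pi(X_i)$ is given, $X_i$ carries no further information about $T_i$. This holds for the true propensity score; with an estimated score, balance has to be checked in the data.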

Checking Covariate Balance
Success of matching method depends on the resulting balance

Ideally, compare the joint distribution of all covariates
In practice, check lower-dimensional summaries (e.g., standardized mean difference, variance ratio, empirical CDF)
$$\text{standardized mean difference} = \frac{\dfrac{1}{n_1} \sum_{i=1}^{n} T_i \left( X_{ij} - \dfrac{1}{|M_i|} \sum_{i' \in M_i} X_{i'j} \right)}{\sqrt{\dfrac{1}{n_1 - 1} \sum_{i=1}^{n} T_i (X_{ij} - \bar{X}_{j1})^2}}$$
Frequent use of balance test
  failure to reject the null ≠ covariate balance
  problematic especially because matching reduces the number of observations
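
A minimal sketch of the standardized mean difference for a single covariate, following the formula above. The function name, the `matched_sets` mapping (treated index to list of matched control indices), and the toy data are hypothetical.

```python
import math

def standardized_mean_difference(xj, t, matched_sets):
    """SMD for one covariate; matched_sets maps each treated index to its control indices."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    n1 = len(treated)
    # numerator: average difference between each treated unit and its matched controls
    num = sum(
        xj[i] - sum(xj[k] for k in matched_sets[i]) / len(matched_sets[i])
        for i in treated
    ) / n1
    # denominator: standard deviation of the covariate among treated units
    xbar1 = sum(xj[i] for i in treated) / n1
    sd1 = math.sqrt(sum((xj[i] - xbar1) ** 2 for i in treated) / (n1 - 1))
    return num / sd1

# Toy data (made up)
xj = [1.0, 3.0, 5.0, 1.5, 2.5, 6.0]
t = [1, 1, 1, 0, 0, 0]
matched_sets = {0: [3], 1: [4], 2: [5]}
smd = standardized_mean_difference(xj, t, matched_sets)
print(smd)
```

Because the denominator uses the treated-group standard deviation, the measure does not shrink mechanically when matching discards control observations, unlike a t-statistic.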

An Empirical Example (Eggers and Hainmueller. 2009. Am. Political Sci. Rev.)
Estimating the financial benefits of political office
[Figure 3: Covariate Balance Before and After Matching, Conservative Candidates. Standardized bias (horizontal axis, −0.6 to 0.8) for each covariate in the matched and unmatched samples. Covariates shown: year of birth, year of death, female, occupation (teacher, barrister, solicitor, doctor, civil servant, local politician, business, white collar, union official, journalist, miner), schooling (Eton, public, regular, not reported), university (Oxbridge, degree, not reported), aristocrat.]

Equivalence Tests (Wellek. 2010. Testing Statistical Hypotheses of Equivalence and Noninferiority. Chapman & Hall.)
Null hypothesis of usual balance tests: treatment and control groups are the same
Problem: failure to reject the null ≠ the null is correct
Shift the burden of proof → reject the null hypothesis that treatment and control groups are different
Use of Equivalence tests (Hartman and Hidalgo. 2018. Am. J. Political Sci.):
$$H_0: |\tau| \geq \Delta \quad \text{and} \quad H_1: |\tau| < \Delta$$
for a pre-selected value of $\Delta > 0$
Two one-sided test procedure (TOST: $\alpha$ level):
$$\frac{\hat{\tau} + \Delta}{\sqrt{\widehat{V}(\hat{\tau})}} > z_{1-\alpha} \quad \text{and} \quad \frac{\hat{\tau} - \Delta}{\sqrt{\widehat{V}(\hat{\tau})}} < -z_{1-\alpha}$$
Two groups are equivalent if and only if both are rejected
α = probability of falsely concluding equivalence under the null
Inverting the test → the largest equivalence region which is consistent with the data at the (1 − α) × 100% confidence level
[Figure 1: Tests of equivalence versus tests of difference. The left panel (difference in means test) depicts the logic of tests of difference under the null hypothesis of no difference. The right panel (equivalence test) depicts the logic of tests of equivalence under the null hypothesis of difference.]

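
A minimal sketch of the two one-sided test procedure described above, assuming a normal reference distribution and a user-supplied variance estimate. The function name and the numeric inputs are hypothetical.

```python
from statistics import NormalDist

def tost_equivalent(tau_hat, var_hat, delta, alpha=0.05):
    """Return True if both one-sided null hypotheses are rejected at level alpha."""
    se = var_hat ** 0.5
    z = NormalDist().inv_cdf(1 - alpha)
    reject_lower = (tau_hat + delta) / se > z   # rejects tau <= -delta
    reject_upper = (tau_hat - delta) / se < -z  # rejects tau >= +delta
    return reject_lower and reject_upper

# Same point estimate, two different variance estimates (made-up numbers)
print(tost_equivalent(0.05, 0.01, delta=0.36))
print(tost_equivalent(0.05, 0.04, delta=0.36))
```

With the larger variance the procedure no longer concludes equivalence, so a small sample counts against a claim of balance rather than in its favor.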
Effect of Ethnic Quota on Redistribution (Dunning and Nilekani. 2013. Am. Political Sci. Rev.)
Ordered list determining villages whose council presidencies are reserved for scheduled castes
balance tests → number of households and mean female nonworkers have relatively small p-values

FIGURE 2 Results of Equivalence Tests (equivalence range: +/- 0.36 σ; horizontal axis of the plot: equivalence range in standard deviations σ)

Variable                          Observed mean difference   Equivalence confidence interval (+/-)
                                  (scale of variable)        (scale of variable)
# Illiterates                     −257.6                     633.9
# Marginal Workers                −12.5                      83.4
# Households                      −99                        209.2
Agricultural Laborers             −14.1                      93.8
Cultivators                       −57.9                      150.8
Female Nonworkers                 −198.1                     433
SC Population                     −15.6                      109.6
ST Population                     35.9                       105.5
Male Nonworkers                   −146.3                     309.6
Female Cultivators                −32.9                      76.8
HH Industry Workers               2.2                        19.4
Marginal Agriculture Workers      −9                         38.2
# Workers                         −179.8                     369.1
Population                        −544.8                     1090.1
Female Workers, Other Industry    −15.5                      46.8
Population (0−6)                  −107                       231.8
Female Illiterates                −142.2                     347.6
Percentage SC                     0                          0
Percentage ST                     0                          0

Note: The observed mean difference is the mean of the treated group minus the mean of the c…

Bias of Matching
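Bias of matching arises because of imbalance:
$$B(X_i, X_{M_i}) = E(Y_i(0) \mid T_i = 1, X_i) - E\left( \frac{1}{|M_i|} \sum_{i' \in M_i} Y_{i'} \,\Bigg|\, X_{M_i} \right)$$
where $X_{M_i} = \{X_{i'}\}_{i' \in M_i}$

Bias correction (Abadie and Imbens. 2011. J Bus Econ Stat):
$$
\begin{aligned}
\widehat{Y_i(0)} &= \frac{1}{|M_i|} \sum_{i' \in M_i} Y_{i'} + \widehat{B}(X_i, X_{M_i}) \\
&= \frac{1}{|M_i|} \sum_{i' \in M_i} \left\{ Y_{i'} + \hat{\beta}^\top (X_i - X_{i'}) \right\}
\end{aligned}
$$
where $\hat{\beta}$ is the estimated coefficient for the regression of $Y_{i'}$ on $X_{i'}$ using all $i' \in M_i$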
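
A minimal sketch of the bias-corrected imputation above for a scalar covariate. It assumes the slope $\hat{\beta}$ is fit by least squares on the matched controls pooled across all matched sets (one reading of the regression described above, labeled here as an assumption). The function name and the toy data are hypothetical.

```python
def bias_corrected_att(y, x, matched_sets):
    """Regression-adjusted matching estimate of the ATT for a scalar covariate."""
    # pool matched control observations (repeats kept if a control is reused)
    pool = [k for ks in matched_sets.values() for k in ks]
    xbar = sum(x[k] for k in pool) / len(pool)
    ybar = sum(y[k] for k in pool) / len(pool)
    sxx = sum((x[k] - xbar) ** 2 for k in pool)
    beta = sum((x[k] - xbar) * (y[k] - ybar) for k in pool) / sxx  # OLS slope
    total = 0.0
    for i, ks in matched_sets.items():
        # shift each matched control outcome to the treated unit's covariate value
        y0_hat = sum(y[k] + beta * (x[i] - x[k]) for k in ks) / len(ks)
        total += y[i] - y0_hat
    return total / len(matched_sets)

# Toy data (made up): controls lie on y = 2x, treated units on y = 2x + 1
y = [3.4, 8.0, 2.0, 4.0, 8.0]
x = [1.2, 3.5, 1.0, 2.0, 4.0]
matched_sets = {0: [2], 1: [4]}  # treated index -> matched control indices
att_bc = bias_corrected_att(y, x, matched_sets)
print(att_bc)
```

In this toy example the unadjusted matching estimate is 0.7 because the matches are inexact, while the adjusted estimate recovers the effect of 1 built into the data.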
Variance
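All matching estimators can be written as a weighting estimator:
$$
\begin{aligned}
\hat{\tau}_{\text{match}} &= \frac{1}{n_1} \sum_{i=1}^{n} T_i \left( Y_i - \frac{1}{|M_i|} \sum_{i' \in M_i} Y_{i'} \right) \\
&= \frac{1}{n_1} \sum_{i: T_i = 1} Y_i - \frac{1}{n_0} \sum_{i: T_i = 0} \underbrace{\left( \frac{n_0}{n_1} \sum_{i': T_{i'} = 1} \frac{\mathbf{1}\{i \in M_{i'}\}}{|M_{i'}|} \right)}_{W_i} Y_i
\end{aligned}
$$
Estimation error for the conditional ATT (CATT):
$$
\begin{aligned}
\hat{\tau}_{\text{match}} - \text{CATT} &= \underbrace{\frac{1}{n_1} \sum_{i: T_i = 1} \mu_0(X_i) - \frac{1}{n_0} \sum_{i: T_i = 0} W_i \cdot \mu_0(X_i)}_{\approx 0 \text{ if matched well and in a large sample}} \\
&\quad + \frac{1}{n_1} \sum_{i: T_i = 1} (Y_i(1) - \mu_1(X_i)) - \frac{1}{n_0} \sum_{i: T_i = 0} W_i (Y_i(0) - \mu_0(X_i))
\end{aligned}
$$
Assume matching is done well and the sample is relatively large
Conditional variance over sampling from a super-population,
$$
\begin{aligned}
V(\hat{\tau}_{\text{match}} \mid X, T) &\approx \frac{1}{n_1^2} \sum_{i: T_i = 1} V(Y_i(1) \mid X, T) + \frac{1}{n_0^2} \sum_{i: T_i = 0} W_i^2 \cdot V(Y_i(0) \mid X, T) \\
&= \sum_{i=1}^{n} \left\{ \frac{T_i}{n_1} + (1 - T_i) \frac{W_i}{n_0} \right\}^2 V(Y_i \mid X, T)
\end{aligned}
$$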
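
A minimal sketch of the weights $W_i$ and the weighting form of the estimator, plus the sum of squared unit weights that multiplies the conditional variances. The last function assumes a common conditional variance of 1 for every unit purely for illustration. Function names and toy data are hypothetical.

```python
def matching_weights(t, matched_sets):
    """W_i for each control: (n0 / n1) * sum over treated i' of 1{i in M_i'} / |M_i'|."""
    n1 = sum(t)
    n0 = len(t) - n1
    w = {i: 0.0 for i, ti in enumerate(t) if ti == 0}
    for controls in matched_sets.values():
        for k in controls:
            w[k] += (n0 / n1) / len(controls)
    return w

def att_weighting(y, t, w):
    """Treated mean minus the W-weighted control mean."""
    n1 = sum(t)
    n0 = len(t) - n1
    treated_mean = sum(yi for yi, ti in zip(y, t) if ti == 1) / n1
    weighted_control_mean = sum(w[i] * y[i] for i in w) / n0
    return treated_mean - weighted_control_mean

def variance_factor(t, w):
    """Sum of squared unit weights; the variance if V(Y_i | X, T) = 1 for all i (assumed)."""
    n1 = sum(t)
    n0 = len(t) - n1
    return sum((1 / n1) ** 2 if ti == 1 else (w[i] / n0) ** 2 for i, ti in enumerate(t))

# Toy data (made up); control 3 is matched to both treated units
y = [5.0, 7.0, 2.0, 3.0, 6.0]
t = [1, 1, 0, 0, 0]
matched_sets = {0: [2, 3], 1: [3]}
w = matching_weights(t, matched_sets)
print(w, att_weighting(y, t, w), variance_factor(t, w))
```

The weighted form gives the same value as averaging the matched differences directly, and a control that is reused in several matched sets receives a large $W_i$, which inflates the variance through $W_i^2$.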

Sensitivity Analysis (Rosenbaum. 2002. Observational Studies. Springer)
Consider a simple pair-matching of treated and control units
Assumption: treatment assignment is random within each pair
Question: How large a departure from the key (untestable)
assumption must occur for the conclusions to no longer hold?
Sensitivity analysis: for any pair $j$,
$$\frac{1}{\Gamma} \leq \frac{\Pr(T_i = 1 \mid S_i = j) / \Pr(T_i = 0 \mid S_i = j)}{\Pr(T_{i'} = 1 \mid S_{i'} = j) / \Pr(T_{i'} = 0 \mid S_{i'} = j)} \leq \Gamma \quad \text{where } \Gamma \geq 1$$
The model is
$$\Pr(T_i = 1 \mid X_i, U_i) = \frac{\exp(f(X_i) + \gamma U_i)}{1 + \exp(f(X_i) + \gamma U_i)} \quad \text{where } \exp(\gamma) = \Gamma$$
and a standardized unobserved confounder $0 \leq U_i \leq 1$
Within a pair, the conditional probability that a given unit is the treated one is bounded between 1/(1 + Γ) and Γ/(1 + Γ)
bound the p-value with Wilcoxon’s signed rank test or McNemar’s test
Smoking and Lung Cancer
Unobserved confounders:
“an error has been made, of an old kind, in arguing from correlation to causation, ... the possibility should be explored that the different smoking classes, non-smokers, cigarette smokers, cigar smokers, pipe smokers, etc., have adopted their habits partly by reason of their personal temperaments and dispositions, and are not lightly to be assumed to be equivalent in their genotypic composition.” (Fisher. 1958. Nature)
36,975 heavy smokers paired with nonsmokers based on age, race, education, marital status, various health history measures, etc. (Hammond. 1964. J. Natl. Cancer Inst.)
Of these pairs, 122 had exactly one person who died of lung cancer; in 110 of them, it was the heavy smoker
Sensitivity analysis based on McNemar’s test (maximum p-value): < 0.0001 (Γ = 3), 0.004 (Γ = 4), 0.03 (Γ = 5), 0.1 (Γ = 6)

Summary
Bias of regression due to covariate imbalance

Matching reduces bias by improving covariate balance
Various matching methods: propensity score, Mahalanobis, CEM,
full matching, other optimal matching methods
Importance of resulting balance → equivalence test
Matching does not eliminate bias due to unobservables → sensitivity analysis
Recommended readings:
Ho et al. 2007. “Matching as Nonparametric Preprocessing for
Reducing Model Dependence in Parametric Causal Inference.”
Political Analysis
Stuart. 2010. “Matching methods for causal inference: A review
and a look forward.” Statistical Science
Imbens and Rubin. Chapters 12–15, 17–19, and 22.
