
Rerandomization in Randomized Experiments
Kari Lock and Don Rubin
Harvard University
JSM 2010

The Gold Standard
Why are randomized experiments so good?
They yield unbiased estimates of the treatment effect

They eliminate (?) confounding factors
ON AVERAGE. For any particular experiment,
covariate imbalance is possible (and likely)

Rerandomization
Suppose you are doing a randomized experiment and
have covariate information available before conducting
the experiment

You randomize to treatment and control, but get a
bad randomization

Can you rerandomize?
Yes, but you first need to specify a concrete
definition of bad

1) Collect covariate data
2) Specify a criterion determining when a randomization is unacceptable, based on covariate balance
   (Re)randomize subjects to treated and control
   Check covariate balance
      unacceptable: rerandomize
      acceptable: proceed to 3)
3) Conduct experiment
4) Analyze results with a Fisher randomization test
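The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `rerandomize`, `diff_in_means`, `randomization_test`, and `balanced`, the simulated data, and the 0.2 threshold are all hypothetical. The sketch also assumes that the randomization test compares the observed statistic only against assignments that pass the same balance criterion.

```python
import random
from statistics import mean

def diff_in_means(values, assignment):
    """Mean of values for treated units minus mean for control units."""
    treated = [v for v, w in zip(values, assignment) if w]
    control = [v for v, w in zip(values, assignment) if not w]
    return mean(treated) - mean(control)

def rerandomize(n, n_treated, is_acceptable, rng, max_tries=10_000):
    """Draw random assignments until one satisfies the balance criterion."""
    units = list(range(n))
    for _ in range(max_tries):
        treated = set(rng.sample(units, n_treated))
        assignment = [i in treated for i in units]
        if is_acceptable(assignment):
            return assignment
    raise RuntimeError("no acceptable randomization found")

def randomization_test(y, observed, n_treated, is_acceptable, rng, draws=500):
    """Two-sided p-value under the sharp null of no effect, using only
    assignments that the balance criterion would have accepted."""
    stat = abs(diff_in_means(y, observed))
    extreme = 0
    for _ in range(draws):
        w = rerandomize(len(y), n_treated, is_acceptable, rng)
        if abs(diff_in_means(y, w)) >= stat - 1e-12:
            extreme += 1
    return extreme / draws

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(20)]          # one simulated covariate

def balanced(assignment):
    # Hypothetical criterion, fixed before randomizing. The absolute value
    # makes it blind to which group is labelled "treated".
    return abs(diff_in_means(x, assignment)) <= 0.2

w = rerandomize(20, 10, balanced, rng)            # accepted assignment
y = [xi + rng.gauss(0, 1) for xi in x]            # simulated outcomes, no effect
p = randomization_test(y, w, 10, balanced, rng)
```

Because the criterion depends only on the absolute difference, swapping the treated and control labels never changes whether an assignment is accepted.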
Unbiased
To maintain an unbiased estimate of the treatment
effect, the decision to rerandomize or not must be
automatic and specified in advance
blind to which group is treated

Theorem: If the treated and control groups are the
same size, and if for every unacceptable randomization
the exact opposite randomization is also unacceptable,
then rerandomization yields an unbiased estimate of
the treatment effect.




Mahalanobis Distance
Define overall covariate distance by
$$M = D' \Sigma^{-1} D$$
Under adequate sample sizes and pure randomization: $M \sim \chi^2_k$
$D_j$: Standardized difference between treated and control covariate means for covariate $j$
$k$ = number of covariates
$D = (D_1, \ldots, D_k)$
$\Sigma$ = covariate correlation matrix = $\mathrm{cov}(D)$
Choose $a$ and rerandomize when $M > a$
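As a rough sketch of how M might be computed for one candidate assignment, the code below works with the raw difference in covariate means and its covariance under pure randomization, (1/n_T + 1/n_C) times the sample covariance of the covariates, which gives the same quadratic form as the standardized version. The function names, the simulated data, and the threshold `a = 1.0` are assumptions made for illustration only.

```python
import random

def column_means(rows):
    """Mean of each covariate (column) over the given units (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sample_cov(rows):
    """Sample covariance matrix of the covariates (divisor n - 1)."""
    n, mu = len(rows), column_means(rows)
    k = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for i in reversed(range(k)):
        z[i] = (m[i][k] - sum(m[i][j] * z[j] for j in range(i + 1, k))) / m[i][i]
    return z

def mahalanobis_m(rows, assignment):
    """M = d' cov(d)^{-1} d, with d the treated-minus-control mean difference."""
    treated = [r for r, w in zip(rows, assignment) if w]
    control = [r for r, w in zip(rows, assignment) if not w]
    d = [t - c for t, c in zip(column_means(treated), column_means(control))]
    scale = 1 / len(treated) + 1 / len(control)
    z = solve(sample_cov(rows), d)
    return sum(di * zi for di, zi in zip(d, z)) / scale

rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(50, 10)] for _ in range(30)]  # simulated covariates
W = [i < 15 for i in range(30)]
rng.shuffle(W)                        # one pure randomization, 15 treated
M = mahalanobis_m(X, W)
a = 1.0                               # hypothetical threshold
needs_rerandomization = M > a
```

In practice `a` would be taken from the chi-squared distribution with k degrees of freedom so that a chosen proportion of randomizations is accepted; the fixed value here only keeps the sketch free of external libraries.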
Rerandomization Based on M
Since M follows a known distribution, it is easy to
specify the proportion of rejected randomizations
M is affinely invariant
Correlations between covariates are maintained
The variance reduction on each covariate is the
same (and known)
The variance reduction for any linear combination
of the covariates is known
Rerandomization
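Theorem: If $n_T = n_C$ and rerandomization occurs when $M > a$, then
$$E(\bar{X}_T - \bar{X}_C \mid M \le a) = 0,$$
$$\mathrm{cov}(\bar{X}_T - \bar{X}_C \mid M \le a) = v_a \, \mathrm{cov}(\bar{X}_T - \bar{X}_C),$$
and
$$v_a = \frac{2}{k} \cdot \frac{\gamma\left(\frac{k}{2} + 1, \frac{a}{2}\right)}{\gamma\left(\frac{k}{2}, \frac{a}{2}\right)},$$
where $\gamma$ is the incomplete gamma function.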
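The variance factor can be evaluated numerically. The sketch below uses a standard power series for the lower incomplete gamma function so that it runs with the standard library alone. The function names and the inputs `k = 8`, `a = 4.0` are illustrative assumptions, not values from the talk.

```python
import math

def lower_incomplete_gamma(s, x, tol=1e-15, max_terms=10_000):
    """Unnormalized lower incomplete gamma function, computed from the series
    gamma(s, x) = x^s e^{-x} * sum_{n>=0} x^n / (s (s+1) ... (s+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= x / (s + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + s * math.log(x))

def variance_ratio(k, a):
    """v_a = (2/k) * gamma(k/2 + 1, a/2) / gamma(k/2, a/2)."""
    return (2.0 / k) * (lower_incomplete_gamma(k / 2 + 1, a / 2)
                        / lower_incomplete_gamma(k / 2, a / 2))

v = variance_ratio(8, 4.0)   # illustrative inputs: 8 covariates, threshold 4
```

Smaller thresholds give smaller values of `variance_ratio`, at the cost of rejecting more randomizations; as `a` grows the factor approaches 1, which corresponds to pure randomization.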
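$$E(\bar{Y}_T - \bar{Y}_C \mid M \le a) = 0,$$
$$\mathrm{var}(\bar{Y}_T - \bar{Y}_C \mid M \le a) = \left[1 - (1 - v_a) R^2\right] \mathrm{var}(\bar{Y}_T - \bar{Y}_C),$$
where $R^2$ is the squared multiple correlation between the outcome and the covariates.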
[Figure: difference in covariate means and difference in outcome means, under pure randomization and re-randomization]
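Standardized Differences in Covariate Means
[Figure: standardized differences in covariate means (axis from -4 to 4) for the covariates male, age, collgpaa, actcomp, preflit, likelit, likemath, numbmath]
Values of $\mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid M \le a) / \mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C})$ printed beside the covariates: 0.14, 0.15, 0.17, 0.14, 0.16, 0.16, 0.16, 0.15 (theoretical $v_a$ = .16)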
Difference in Outcome Means Under Null
[Figure: difference in means (axis from -1.0 to 1.0) for the Math and Verbal outcomes, pure randomization vs. re-randomization]
$\mathrm{var}(\bar{Y}_T - \bar{Y}_C \mid M \le a) / \mathrm{var}(\bar{Y}_T - \bar{Y}_C) = .57$ (theory = .58)
Equivalent to increasing the sample size by a factor of 1.7
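The quoted numbers can be connected with a little arithmetic. The worked lines below assume that the theoretical values .16 and .58 refer to the same experiment, and that the variance of a difference in means is inversely proportional to the sample size. The implied $R^2$ is derived from those two values and is not stated in the slides.

```latex
\begin{align*}
1 - (1 - v_a) R^2 &= .58
  \quad\Longrightarrow\quad
  R^2 = \frac{1 - .58}{1 - .16} = \frac{.42}{.84} = .5 \\
\text{equivalent sample size factor} &= \frac{1}{.58} \approx 1.72
\end{align*}
```

Rounded to one decimal place, the second line agrees with the factor of 1.7 reported for the outcome.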
Conclusion
Rerandomization improves covariate balance
between the treated and control groups, and
increases precision in estimating the treatment effect
if the covariates are correlated with the response

Rerandomization gives the researcher more power
to detect a significant result, and more faith that an
observed effect is really due to the treatment
lock@stat.harvard.edu