An Observational Study
Amelia Haviland, Daniel S. Nagin, Paul R. Rosenbaum
RAND, Carnegie Mellon University, University of Pennsylvania
Abstract
Using data from the Montréal Longitudinal-Experimental Study of Boys, the effects
on subsequent violence of joining a gang at age 14 are studied controlling for
characteristics of boys prior to age 14. The boys are divided into trajectory groups
based on violence from ages 11 to 13, and within trajectory groups, joiners are
optimally matched to a variable number of controls using propensity scores, Mahalanobis
distances, and a combinatorial optimization algorithm. The trajectory groups define
meaningful subpopulations where effects may be different, while propensity scores and
optimal matching tend to balance twelve baseline covariates. By using between 1 and
7 controls for each joiner, greater efficiency is obtained than by pair matching, with
greater bias reduction than is available by matching in a fixed ratio. We develop
new efficiency bounds to guide decisions about the structure of the matching. Further
adjustments for covariates are made using permutational covariance adjustment.
The possible impact of failing to adjust for an important but unmeasured covariate is
examined using sensitivity analysis.
Key words: Covariance adjustment, instrumental variable, matching with variable
controls, mixture model, observational study, optimal matching, propensity score,
sensitivity analysis, trajectory group.
This work was supported by grants SES0345113 and SES99113700 from the Methodology, Measurement
and Statistics Program and the Statistics and Probability Program of the U.S. National Science
Foundation and the National Institute of Mental Health (RO1 MH6561101A2). It also made heavy use of
data collected with support from Québec’s CQRS and FCAR funding agencies, Canada’s NHRDP and
SSHRC funding agencies, and the Molson Foundation.
1 Introduction: Viewing Observational Studies of Development from the Perspective
of Controlled Experiments
A key aim of empirical research in developmental psychopathology and life course studies
is measuring the effect of therapeutic interventions, such as treatment for depression, or
life events, such as gang membership, on behavioral trajectories. Ideally, such effects would
be estimated with experimental data, but many interventions that affect development are
beyond experimental control, for ethical or practical reasons. In these situations inferences
must be drawn from observational data. Four decades ago Cochran (1965) reflected on the
design of studies which aim to draw causal inferences from observational data. He framed
his recommendations in the context of still earlier advice by Dorn (1953), who suggested
that the design of an observational study be organized around the question “How should
the study be conducted if it were possible to do it by controlled experimentation?” Certain
issues are common to an experiment and an observational study, and these shared issues
are brought into focus by thinking about the simpler situation of an experiment. One then
tries to reconstruct, to the limited extent possible, the circumstances of the experiment
from the observational data. Finally, one tries to address the weaknesses that are present
in the observational study but which would have been avoided in an experiment. A
similar perspective is discussed in Meyer (1995), Shadish, Cook and Campbell (2002) and
Rosenbaum (2002a, 2005a).
A well designed and executed experiment has four key ingredients: (1) a clearly specified
treatment, (2) good baseline measurements describing the subjects of the experiment prior
to treatment, (3) a well defined and properly executed method of randomizing treatment
assignment, and (4) well defined outcomes measured after treatment.
Regarding the first ingredient, we examine the possible effects of joining a gang at age
14 on adolescent males who had not been in gangs prior to age 14. Our methods allow us
to examine whether gang membership has a different effect on groups of boys distinguished
by their histories of violence. Also investigated are whether first-time gang membership
has an immediate effect on violence at 14 and whether effects persist from age 15 to 17.
Our data show that for most youth, gang membership is not persistent. What then is the
effect of currently being in a gang, as distinct from joining a gang and perhaps departing?
It is these questions we address in the current manuscript.
Our analysis draws upon two lines of research that in our view are helpful in bringing
the second and third of these ingredients of an experiment to the analysis of nonexperimental
longitudinal data. One line involves the use of finite mixture modeling to analyze developmental
trajectories in a group-based framework (cf. Nagin, 1999 & 2005; Nagin and Land, 1993; Muthen,
2001), and the second involves the use of propensity scores, matching, and stratification
for causal inference in observational data (cf. Rosenbaum and Rubin, 1983; Rosenbaum,
2002a). The trajectory groups are based on pretreatment measures of the variable which,
after treatment, is the outcome variable. Thus, fitting a group-based trajectory model to
pretreatment, baseline data permits the comparison of treated and control subjects who
appeared similar, in terms of developmental trajectory, prior to treatment. As such, the
trajectory groups serve as the baseline measure of response. Fitting propensity scores
using observed pretreatment measures or covariates permits the comparison of treated and
control groups that are balanced in terms of these observed covariates. Thus, the propensity
score serves to stochastically balance observed covariates as random assignment of
treatments would do in an experiment. Unlike randomization in an experiment, neither
propensity scores nor trajectory modeling can control for covariates that were not measured;
we examine this inescapable concern with the aid of a sensitivity analysis.
1.1 Data and Case Study: Joining a Gang in Montreal
We begin with a brief discussion of the data and case study that will be used to demonstrate
the proposed framework. As already noted, the case study examines the effect of
gang membership on violent delinquency. We chose this example for two
reasons. First, it is a substantively interesting and important problem. There is a long and
rich tradition of gang research in criminology because group influence, particularly among
adolescents, is thought to play a key role in criminal and delinquent behavior (Short and
Strodtbeck, 1965; Thornberry et al., 2003; Warr, 2002; Cohen, 1955; Cloward and Ohlin,
1960). Second, it is an archetypal example of a large class of important inference problems
in psychology and the social sciences more generally. Criminologists have long understood
that estimates of the facilitation effect of gang membership may be contaminated
by selection effects whereby the most delinquency-prone individuals choose to join gangs
(Lacourse et al., 2003; Thornberry et al., 2003). Similarly, selection effects of this sort are
a generic threat to the validity of causal inferences about the effects of events or experiences
on developmental trajectories because the likelihood of their occurrence is commonly tied
to the development of the very outcome under investigation. For example, just as gang
membership is more likely among the already violent, divorce is more likely among those
experiencing depression, which greatly complicates the task of inferring the effect of divorce
on depression (Halford & Bouma, 1997; Schmalling & Scher, 1997).
The data used in the case study are the product of the Montréal Longitudinal Study of
Boys. The 1037 male subjects in this study were in kindergarten at its outset in the spring
of 1984. They were next assessed in 1988 and then again annually until 1995 when their
average age was 17. The sample was drawn from 53 schools in the lowest socioeconomic
areas in Montréal, Canada. To control for cultural effects, boys were included in the
longitudinal study only if both their biological parents were born in Canada and their
biological parents’ mother tongue was French. This resulted in a homogeneous white,
French-speaking sample. Wide-ranging measurements of potentially important covariates
such as social and psychological function were made based on assessments by parents,
teachers, peers, self-reports of the boy himself, and administrative records from schools and
the juvenile court. These measurements include data on the boy’s behavior across many
domains (e.g., sexual activity and delinquency in adolescence), and social functioning (e.g.,
peer popularity). See Tremblay et al. (1987) for further details on this study.
The self-reported data on annual involvement in violent delinquency and participation
in delinquent groups, which we hereafter refer to as gangs, form the core of the analyses
we report in this paper. Queries on prior-year involvement in violent delinquency and
gangs were initiated in 1989 when the boys were age 11. Subjects were asked about the
frequency of their involvement in seven different types of violent delinquency within the
past year: threatening to attack someone, fist fighting, attacking an innocent person, gang
fighting, throwing objects at people, carrying weapons, and using weapons. These items
were each coded on a 4-point Likert scale (0 = never; 1 = once or twice; 2 = sometimes; 3
= often) and summed to form an overall scale of violent delinquency. This scale was used
to estimate the trajectory model over the period when the boys were 11 to 13 years old. In
estimating the treatment effect of gang membership at ages 14 to 17, the item pertaining
to gang fighting was excluded. Gang membership status in the prior year was based on the
subject’s response to the question: “During the past 12 months, were you part of a group
or a gang that committed reprehensible acts?”[1]
[1] The original version of the question as administered in French was: “Au cours des 12 derniers mois,
as-tu fais partie d’un groupe des jeunes (gang) qui fait des mauvais coups?”
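As a rough illustration of how the violent delinquency scale described above is put together, the following Python sketch sums the seven items, each coded 0 to 3, and optionally drops the gang fighting item, as is done for the outcome at ages 14 to 17. The item names and the example responses are hypothetical; they are not taken from the study's data files.

```python
# Hypothetical names for the seven self-report items. Each item is coded
# 0 = never, 1 = once or twice, 2 = sometimes, 3 = often.
ITEMS = (
    "threatening", "fist_fighting", "attacking_innocent", "gang_fighting",
    "throwing_objects", "carrying_weapons", "using_weapons",
)


def violence_scale(responses, exclude_gang_fighting=False):
    """Sum the Likert-coded items into an overall violent delinquency score."""
    total = 0
    for item in ITEMS:
        if exclude_gang_fighting and item == "gang_fighting":
            continue  # dropped when the score is used as the age 14-17 outcome
        value = responses[item]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{item} must be coded 0-3, got {value!r}")
        total += value
    return total


# Invented responses for one boy in one year (illustration only).
example = {
    "threatening": 1, "fist_fighting": 2, "attacking_innocent": 0,
    "gang_fighting": 1, "throwing_objects": 0, "carrying_weapons": 0,
    "using_weapons": 0,
}
full_score = violence_scale(example)                                 # all seven items
outcome_score = violence_scale(example, exclude_gang_fighting=True)  # six items
```

With seven items the full scale ranges from 0 to 21; with the gang fighting item removed it ranges from 0 to 18.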
1.2 Haviland and Nagin (2005): Modelling Pretreatment Developmental Trajectories
Our point of departure is recent work by Haviland and Nagin (2005) that tackles the same
type of inference problem that we address in this paper: events or experiences that might
alter an individual’s developmental trajectory. As in this paper, group-based trajectory
modeling plays a central role in Haviland and Nagin. The method is designed to identify
groups of individuals following approximately the same developmental trajectory over a
specified period of time (e.g., age 11 to 13) for the outcome of interest (e.g., violent
delinquency). Stated informally, prior to joining a gang, individuals in the same trajectory
group appeared to be headed along the same path, at least so far as violence is concerned.
The use of trajectory groups as a basis for inference leads to the estimation of trajectory
group-specific treatment effects. This is scientifically important because a key premise
of life course theories of development is that the magnitude, including the sign, of treatment
effects may depend upon a person’s developmental trajectory (Elder, 1985 & 1998;
Thornberry et al., 2003). Thus, the trajectory group framework allows for examination of
whether there are differences in treatment effects across substantively interesting groups
that are differentiated by their developmental history. The capability to estimate trajectory
group-specific treatment effects also allows researchers to examine the association of
the treatment effect estimates with characteristics of trajectory group members. This
capability is significant because it provides an empirical basis for better understanding how
treatment effects vary across variables that distinguish developmental history.
The trajectory groups can also be thought of as latent strata measuring the history
of the outcome variable. In the spirit of research using propensity scores and matching,
Haviland and Nagin use the trajectory groups and the attendant posterior probabilities of
group membership for each individual as a statistical device for creating balance on a key
covariate that may be confounding the treatment effect estimate, namely prior violence,
which notably consists of prior values of the posttreatment outcome measure. Specifically,
in the context of the illustrative application it is vital to account for the empirical reality
that gang membership is more likely among individuals with a history of violence. By
comparing gang joiners and nonjoiners in the same trajectory groups, one is comparing
individuals whose trend in violence, before joining a gang, was similar.
Haviland and Nagin demonstrated their approach with the same example that this
paper begins with: an analysis of the effect of first-time gang membership at age 14 on
violent delinquency at 14. In the current work, we substantially extend the prior analysis
to include inferences not only about the effect of first-time gang membership at age 14 on
violence at 14 but also about the effects at later ages. Also, we combine trajectory groups
with propensity scores to ensure balance on covariates other than prior violence.
As discussed in detail in Haviland and Nagin, application of group-based trajectory
modeling to the subset of individuals in the Montréal study with no self-reported history
of gang involvement from age 11 to 13 yielded the three-group model depicted in Figure
1. The x-axis of Figure 1 measures age and the y-axis measures the value of the Poisson
rate parameter, $\lambda_t$, which for each group measures the expected rate of violent offending
at age $t$. The two largest groups, called the Lows and the Mediums, were, respectively,
estimated to compose 46% and 48% of the sampled population. The third group, called
the Chronics, had a far higher rate of violent delinquency and was also far smaller. It was
estimated to compose 6% of the sampled population.
The tendency of more violent boys to join gangs is quite pronounced
in our data. The percent of boys who became first-time gang members at age 14 is smallest
for the low violence group (7%), next largest for the middle group (15%) and highest for
the chronic group (31%). Thus, a simple comparison of the age 14 violent delinquency
of the first-time gang joiners with their peers who did not join gangs would hopelessly
confound the facilitation effect of gang membership with the already higher violence rate
of first-time gang members.
More formally, the problem of confounding can be characterized in terms of covariate
imbalance. A covariate is a variable measured prior to treatment and hence unaffected
by a subsequent decision to apply or withhold the treatment. A covariate is said to be
out of balance if its distribution differs between the treated and untreated. For instance,
Haviland and Nagin found that gang joiners at age 14 had been more violent before age
14 and also less popular than boys who did not join gangs at age 14; prior violence and
peer rated popularity are two of the many covariates that were found to be out of balance.
The concern, of course, is that a difference in outcomes after treatment, say a difference in
level of violence at age 14 or 15, may not be an effect caused by joining a gang, but may
instead reflect the differences between joiners and nonjoiners that already existed prior to
age 14.
The essence of the Haviland and Nagin approach to creating balance is to perform
analyses within trajectory group. They find that, unlike contrasts of gang members and
nongang members that are not conditioned on their trajectory group membership, contrasts
which condition on trajectory group membership had very good success in achieving balance
between the first-time gang members and their counterparts who remain outside of gangs
on a wide range of potential confounders including most especially violence levels prior to
age 14.
The approach to achieving balance laid out in Haviland and Nagin is empirically based
and is not guaranteed to balance covariates, particularly covariates that are very different
from the outcome, such as popularity. Due to the statistical construction of group-based
trajectory models it is likely that the approach will have good success in achieving balance
within trajectory group on lagged outcomes, which, as argued above, we believe to be very
important. Still, no theorem assures that balance will be achieved even in expectation. Further,
as an empirical matter, the likelihood of achieving balance on other potential confounders
is lower than for lagged outcomes. By contrast, methods for achieving balance based on
propensity scores and matching are specifically designed to achieve this aim. For this
reason, integration of the inference strategy laid out in Haviland and Nagin with other formal
statistical approaches for achieving balance is desirable.
In the integrated framework described in this paper, trajectory groups summarize baseline
measures of the outcome and propensity scores provide a useful but limited surrogate
for random assignment of treatments. The integration is composed of a three-stage
analysis. The first stage involves estimating a group-based trajectory model for the outcome and
subjects of interest. In the context of our demonstration analysis of the violence facilitation
effect of gang membership, this step, which has already been described, involves estimation
of a trajectory model of violent delinquency from age 11 to 13 for individuals with no gang
involvement over this period. In the second stage each treated individual is matched with
one or more untreated individuals. The matching of gang joiners with nongang joiners,
carried out within trajectory group, attempts to find nonjoiners who are close on an
estimate of the propensity score, and on the individual variables that enter the propensity
score. We then check the degree of success of the matching strategy in achieving balance
between the first-time gang members (the treated) and their matched counterparts who
did not join gangs (the controls). In the third stage of the analysis the treatment effect of
the event of interest, gang membership in our case, is analyzed. Specifically, we examine
the effect of first-time gang membership at age 14 on violence at age 14 and beyond, within
and across trajectory group.
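The second stage can be pictured with a deliberately simplified sketch. The Python below pairs each joiner with the nearest unused nonjoiner on the estimated propensity score, within the same trajectory group and within a caliper. This is greedy one-to-one matching, a stand-in chosen for brevity; it is not the optimal matching with a variable number of controls that the paper describes. The record keys, the caliper value, and the example boys are all hypothetical.

```python
def greedy_match_within_groups(boys, caliper=0.1):
    """Pair each joiner with the nearest unused nonjoiner in the same group.

    `boys` is a list of dicts with hypothetical keys 'id', 'group',
    'joined' (bool) and 'score' (estimated propensity score).
    Returns {joiner_id: control_id or None}.
    NOTE: greedy 1:1 matching, not optimal variable-ratio matching.
    """
    matches = {}
    used = set()
    joiners = [b for b in boys if b["joined"]]
    # Handle the hardest-to-match joiners (highest scores) first.
    for j in sorted(joiners, key=lambda b: -b["score"]):
        candidates = [
            c for c in boys
            if not c["joined"] and c["group"] == j["group"] and c["id"] not in used
        ]
        best = min(candidates, key=lambda c: abs(c["score"] - j["score"]), default=None)
        if best is not None and abs(best["score"] - j["score"]) <= caliper:
            matches[j["id"]] = best["id"]
            used.add(best["id"])
        else:
            matches[j["id"]] = None  # no acceptable control in this group
    return matches


# Invented records (illustration only).
boys = [
    {"id": "a", "group": "low", "joined": True, "score": 0.10},
    {"id": "b", "group": "low", "joined": False, "score": 0.08},
    {"id": "c", "group": "low", "joined": False, "score": 0.30},
    {"id": "d", "group": "chronic", "joined": True, "score": 0.70},
    {"id": "e", "group": "chronic", "joined": False, "score": 0.35},
]
matches = greedy_match_within_groups(boys)
```

In this toy input the joiner in the low group finds a close control, while the joiner in the chronic group has no nonjoiner within the caliper and is left unmatched, which is the kind of failure the balance check in the second stage is meant to expose.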
1.3 Overview of Using Propensity Scores to Balance Observed Covariates in
Observational Studies
In the simplest randomized experiment, subjects are assigned to treatment or control by
the independent tosses of a fair coin, so that every individual has the same chance, namely
$1/2$, of receiving treatment rather than control. Randomization then warrants or becomes
the “reasoned basis” for inferences about the effects caused by the treatment; see Fisher
(1935) and Rubin (1974). In contrast, the defining feature of an observational study is
that randomization is not used to assign treatments, so that some individuals are more
likely to receive the treatment than others. For instance, the boys who joined gangs at
age 14 did not do so ‘at random, with equal probabilities;’ in fact, the boys who joined
gangs at age 14 tended to be quite different from those who did not, even several years
prior to age 14. In the absence of random assignment of treatments, there is nothing
to ensure that treated and control subjects were comparable prior to treatment, with
the consequence that differing outcomes after treatment may not be effects caused by
the treatment, but instead may simply reflect pretreatment differences. The problems
of interpretation created by the absence of random assignment are of two kinds: (i) the
probability of receiving treatment may vary with observed pretreatment characteristics or
covariates, so that the data at hand indicate that treated and control subjects were not
comparable prior to the start of treatment, and (ii) the probability of receiving treatment may
also vary with unobserved pretreatment covariates, so that adjustments for observed covariates may
be insufficient to make the treated and control groups comparable. As examples of (i), the
observed covariates clearly show that gang joiners were more violent and less popular than
nonjoiners prior to age 14. It is not possible to exhibit a difference of type (ii), because the
relevant covariates were not observed, but even with the twelve observed covariates we consider
in Table 6, it would hardly be surprising if some important pretreatment difference was
not adequately measured.
The propensity score is a device for removing imbalances in observed pretreatment
covariates, whether or not there are also imbalances in unobserved covariates, that is, a
device for addressing problem (i). Through matching or stratification on the propensity
score, one compares treated and control groups that appear comparable prior to treatment
in terms of observed covariates. A nontechnical survey of methods and results about
propensity scores is given by Joffe and Rosenbaum (1999), and for several case studies, see
Rosenbaum and Rubin (1984, 1985a), Smith (1997) and Dehejia and Wahba (1999).
The propensity score is the conditional probability of receiving the treatment rather
than the control given the observed covariates (Rosenbaum and Rubin, 1983). In the
simplest randomized experiment mentioned above, the propensity score would be $1/2$ for
every subject. In the current context, the propensity score is the conditional probability
of joining a gang at age 14, given the observed covariates, namely violence prior to age 14,
peer rated popularity, mother’s age at the birth of her first child, and so on. If two boys have
the same propensity score given observed covariates, say a .2 chance of joining a gang at
14, then these observed covariates will be of no further use in predicting which of these two
boys will join a gang at 14, so for these two boys, there will be no systematic tendency for
the observed covariates to be different for the joiner and the nonjoiner.
The propensity score is estimated from the data, perhaps using a logit model. Boys
with similar estimated propensity scores are compared, say by matching; then the joiners
and nonjoiners will have similar distributions of the observed covariates, such as past
violence, popularity, mother’s age, and so on. Obviously, unlike randomization which does
not use the covariates in balancing them, the propensity score can only be expected to
balance the observed covariates used to construct the score. Unobserved covariates need
to be addressed by other methods, such as sensitivity analyses, which we illustrate in §5.4.
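To make the estimation step concrete, here is a minimal sketch of fitting a logit model for the probability of joining and reading off estimated propensity scores. It uses plain gradient ascent on the log-likelihood in standard-library Python so that it is self-contained; in practice one would use a statistics package. The two standardized covariates (labelled prior violence and popularity), the ten records, the learning rate and the iteration count are all invented for illustration.

```python
import math


def fit_logit(X, z, lr=0.5, n_iter=5000):
    """Fit Pr(z = 1 | x) = 1 / (1 + exp(-(b0 + b.x))) by gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))  # observed minus fitted
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Estimated probability of treatment for covariate vector xi."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))


# Invented data: columns are (prior violence, popularity), both standardized.
X = [[-1.0, 0.5], [-0.5, 1.0], [0.0, 0.0], [0.5, -0.5], [1.0, -1.0],
     [1.5, 0.0], [-1.5, -0.5], [0.0, 1.0], [1.0, 0.5], [-0.5, -1.0]]
z = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]  # 1 = joined a gang (hypothetical)

beta = fit_logit(X, z)
scores = [propensity(beta, xi) for xi in X]
mean_joiners = sum(s for s, zi in zip(scores, z) if zi) / sum(z)
mean_nonjoiners = sum(s for s, zi in zip(scores, z) if not zi) / (len(z) - sum(z))
```

Boys with similar values in `scores` would then be compared, for example by matching; the sketch says nothing about covariates that were left out of `X`.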
There are two key theorems concerning propensity scores. Informally, they say: (1)
matching or stratifying on the propensity score tends to balance the observed covariates
used to construct the score, and (2) if there is no bias from unobserved covariates, that is if
problem (ii) above does not arise, then to adjust for the many observed covariates it suffices
to adjust for the unidimensional propensity score. In the current context, if one matches
gang joiners and nonjoiners on their probabilities of joining a gang given the twelve observed
covariates in Table 6 (that is, given violence before age 14, peer rated popularity, and so
on), then these twelve covariates will tend to be balanced in treated and control groups.
Also, if it were sufficient to adjust for these twelve observed covariates, if there were no
bias from covariates that were not measured, then one could estimate the effects of joining
a gang by adjusting for the 1-dimensional propensity score rather than the 12-dimensional
observed covariates. Formally, the two theorems in Rosenbaum and Rubin (1983) say: (1)
treatment assignment and the observed covariates are conditionally independent given the
propensity score, and (2) if treatment assignment is strongly ignorable given the observed
covariates then it is strongly ignorable given the propensity score alone.
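The formal statements can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in this section: $Z$ is the treatment indicator, $\mathbf{x}$ the observed covariates, $e(\mathbf{x})$ the propensity score, and $(r_T, r_C)$ the responses a subject would exhibit under treatment and under control.

```latex
% Assumed notation: Z = treatment indicator, x = observed covariates,
% e(x) = propensity score, (r_T, r_C) = potential responses.
\[
  e(\mathbf{x}) = \Pr(Z = 1 \mid \mathbf{x})
\]
% (1) Balancing property of the propensity score.
\[
  \mathbf{x} \;\perp\!\!\!\perp\; Z \;\big|\; e(\mathbf{x})
\]
% (2) Strong ignorability given x implies strong ignorability given e(x).
\[
  (r_T, r_C) \perp\!\!\!\perp Z \mid \mathbf{x}
  \ \text{ and } \ 0 < e(\mathbf{x}) < 1
  \quad\Longrightarrow\quad
  (r_T, r_C) \perp\!\!\!\perp Z \mid e(\mathbf{x})
\]
```

Statement (1) holds whether or not there are unobserved covariates; statement (2) is only useful when problem (ii) is absent.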
Intuitively, it seems reasonable to expect that the true value of a parameter would
perform better than an estimate of the parameter, and while that intuition is sometimes
true, it is not always true; see, for instance, Stuart (1955). In fact, when matching or
stratifying on propensity scores, estimated scores typically perform somewhat better than
theory says true propensity scores would perform; see, for instance, Rosenbaum and Rubin
(1984) where 74 observed covariates are better balanced by propensity scores than would
have been expected in a randomized experiment. The estimated propensity scores cannot
distinguish systematic imbalances from imbalances produced by chance, and the estimated
scores tend to remove some of the chance imbalances, whereas the true propensity score
removes only the systematic imbalance. Some theory related to estimated propensity
scores is discussed in Rosenbaum (1984, 1987).
1.4 Overview of Group-based Trajectory Modeling
This section provides a brief overview of group-based trajectory modeling. For more
elaboration, see Nagin (1999; 2005) or related work by Muthen (2001).
The group-based trajectory model is an application of finite mixture modeling in which
the population is assumed to be composed of a finite number $K$ of trajectory groups denoted
by the index $k$, $k = 1, 2, \ldots, K$. The three key outputs of group-based trajectory modeling
are: (1) the developmental trajectory of each group $k$, (2) the population base rate of each
group, and (3) for each individual $i$ the posterior probability of their membership in each
group $k$ given his or her actual measurements on the outcome of interest.
Let $Y_{it}$ denote the outcome variable of interest for individual $i$ at time $t$, and let
$\bar{Y}_{it}$ denote a vector of such outcomes from time 1 through $t$. In the current application,
$Y_{it}$ is the frequency of violent delinquency of individual $i$ in year $t$. The trajectory
model describes the conditional distribution, $P(\bar{Y}_{it} \mid G_i = k)$, of $\bar{Y}_{it}$ given that
individual $i$ is in group $k$, denoted $G_i = k$, and the frequency $\pi_k = \Pr(G_i = k)$ of group $k$.
In its most general form, the group-based trajectory model follows a finite mixture model,
\[
\Pr(\bar{Y}_{it}) = \sum_{k=1}^{K} \pi_k \, P(\bar{Y}_{it} \mid G_i = k),
\]
with the responses of distinct individuals $i$ being mutually independent.
In the current paper, we consider a specific form of this model in which $P(\bar{Y}_{it} \mid G_i = k)$
consists of independent Poisson variables whose parameters vary with time $t$ and trajectory
group $k$, that is,
\[
P(\bar{Y}_{it} \mid G_i = k) = \prod_{t} \frac{\exp(-\lambda_{kt}) \, \lambda_{kt}^{Y_{it}}}{Y_{it}!},
\]
so that the coordinates of $\bar{Y}_{it}$ are dependent for each individual $i$, but the dependence would
vanish within individual $i$’s latent trajectory group, $G_i = k$. The logarithms, $\log(\lambda_{kt})$, of
the expected frequencies, $\lambda_{kt}$, are modeled as a polynomial in time $t$ with coefficients
that depend upon the trajectory group $k$, that is,
\[
\log(\lambda_{kt}) = \theta_{0k} + \theta_{1k} t + \theta_{2k} t^2 + \cdots + \theta_{pk} t^p .
\]
The parameters, $\boldsymbol{\theta}$, of this function are permitted to vary freely across groups $k$.
This specification accords with much criminological research demonstrating that rates of
offending vary with age and also that there is considerable heterogeneity in the trajectories
of offending across individuals (Farrington, 1986; Nagin, Farrington, and Moffitt, 1995).
The observed delinquency histories, $(Y_{i,11}, Y_{i,12}, Y_{i,13})$, from age 11 to 13 of the $N$ boys
with no self-reported gang membership over this period form the basis for estimating the
trajectory model used to create the baseline measurements on delinquent history. Their
likelihood is
\[
L(\bar{Y}_{1t}, \ldots, \bar{Y}_{Nt}; \boldsymbol{\theta}, \boldsymbol{\pi})
= \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \prod_{t=11}^{13}
\frac{\exp(-\lambda_{kt}) \, \lambda_{kt}^{Y_{it}}}{Y_{it}!} .
\]
Maximization of this likelihood produces estimates of the parameters of the polynomial
functions that define the shape of each group’s trajectory of violent delinquency as
measured by $\lambda_{kt}$. In this application, for ages 11, 12, and 13, we use a log-linear function
of age, $\log(\lambda_{kt}) = \theta_{0k} + \theta_{1k} t$. The final key product of group-based trajectory modeling,
the posterior probability of membership in the $k$th trajectory group, $\Pr(G_i = k \mid \bar{Y}_{it})$, is
calculated postestimation for each individual $i$ in the estimation sample. It measures the
probability of each such individual’s membership in group $k$ given their history of violent
delinquency from age 11 to 13.
To permit matching of joiners and controls within trajectory groups, each boy was
attached to the trajectory group for which he had the highest estimated probability of
membership, although we use the estimates of $\Pr(G_i = k \mid \bar{Y}_{it})$ as well. On average this
highest estimated probability of membership was .85 or greater for each of the three groups
depicted in Figure 1. Figure 2 depicts the violence scores by trajectory group.
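The posterior probabilities and the highest-probability assignment rule described above can be sketched in a few lines of Python. The sketch applies Bayes' rule to a Poisson mixture with a log-linear rate in age. All parameter values are invented: the base rates loosely echo the 46%, 48% and 6% split reported earlier, but the trajectory coefficients are made up (and set flat in age for simplicity), so the output does not reproduce the fitted model.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of observing count y when the rate is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def rate(theta0, theta1, age):
    """Log-linear trajectory: log(lambda_kt) = theta0 + theta1 * t."""
    return math.exp(theta0 + theta1 * age)


def posterior_membership(y_by_age, groups):
    """Return Pr(group k | violence history) for each group, via Bayes' rule.

    `y_by_age` maps age -> observed count; `groups` is a list of dicts with
    hypothetical keys 'name', 'pi', 'theta0', 'theta1'.
    """
    joint = []
    for g in groups:
        like = 1.0
        for age, y in y_by_age.items():
            like *= poisson_pmf(y, rate(g["theta0"], g["theta1"], age))
        joint.append(g["pi"] * like)
    total = sum(joint)  # this boy's contribution to the mixture likelihood
    return [j / total for j in joint]


# Invented parameters (illustration only; slopes set to zero for simplicity).
groups = [
    {"name": "Low", "pi": 0.46, "theta0": -1.0, "theta1": 0.0},
    {"name": "Medium", "pi": 0.48, "theta0": 0.5, "theta1": 0.0},
    {"name": "Chronic", "pi": 0.06, "theta0": 1.8, "theta1": 0.0},
]


def assign(y_by_age):
    """Attach a boy to the group with the highest posterior probability."""
    post = posterior_membership(y_by_age, groups)
    best = max(range(len(groups)), key=lambda k: post[k])
    return groups[best]["name"], post


name_a, post_a = assign({11: 0, 12: 0, 13: 0})
name_b, post_b = assign({11: 2, 12: 1, 13: 2})
name_c, post_c = assign({11: 6, 12: 7, 13: 5})
```

Keeping the full posterior vector alongside the assigned group mirrors the text's remark that the estimated probabilities are used as well as the hard assignment.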
2 Trajectory Groups and Propensity Scores: The Boys Before Age 14
2.1 Violence Trajectory and Covariates Before Age 14
A total of 580 individuals in the Montreal Study reported no involvement with gangs from
age 11 through 13 and also had no more than one missing assessment of their violent
delinquency and gang involvement over this period.[2] The self-reported violent delinquency
of these individuals was used to estimate the trajectories reported in Figure 1. Of these
individuals, 68 joined gangs for the first time at age 14. The first-time gang joiners and
their counterparts who did not join gangs were distributed across the trajectory groups at
the frequencies in Table 1.

Table 1: Frequencies of Trajectory Groups Based on Violence Ages 11-13 and Gang Joining
at Age 14.

Status at Age 14    Low    Medium    Chronic
Gang Joiner          21        38          9
Not a Joiner        276       216         20
Total               297       254         29
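The joining rates quoted earlier for each trajectory group (7%, 15% and 31%) follow directly from the counts in Table 1. The short Python check below recomputes them; the dictionary layout and variable names are just one convenient way to hold the table.

```python
# Counts copied from Table 1.
table1 = {
    "Low": {"joiners": 21, "nonjoiners": 276},
    "Medium": {"joiners": 38, "nonjoiners": 216},
    "Chronic": {"joiners": 9, "nonjoiners": 20},
}

# Share of each trajectory group that joined a gang for the first time at 14.
rates = {
    group: counts["joiners"] / (counts["joiners"] + counts["nonjoiners"])
    for group, counts in table1.items()
}
percent = {group: round(100 * r) for group, r in rates.items()}

total_joiners = sum(c["joiners"] for c in table1.values())
total_boys = sum(c["joiners"] + c["nonjoiners"] for c in table1.values())
```

The totals also agree with the text: 68 first-time joiners among 580 boys.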
Our aim was to match, within trajectory group, each first-time gang joiner with one
or more of his counterparts who did not join at 14 but who had similar covariates prior to
age 14. These covariates include variables that are known to be correlated with violence
and include: violence scores at ages 10, 11, 12, and 13, peer rated popularity at age 11,
the age of the boy’s mother at the birth of her first child, peer rated aggression at age
11, teacher rated hyperactivity at age 11, self-reported number of sexual partners at age
13, teacher rated opposition at age 11, a rough IQ measure, and teacher rated physical
aggression at age 11.[3] Figure 3 depicts the covariates prior to matching, together with the
two-sided significance level from Wilcoxon’s rank sum test. If this had been a randomized
experiment in which the boys in the Montreal study had been selected at random to join
a gang at age 14, one covariate in twenty would be expected to yield a significance level
of 0.05 or less, whereas in Figure 3, ten of the twelve covariates have significance levels
less than 0.05. Before joining gangs, joiners were more violent than nonjoiners, were less
popular with their peers, were more aggressive, hyperactive, and oppositional, had more
sexual partners, and had mothers whose age at the birth of their first child was younger.
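The comparison with a randomized experiment can be made numerical. Under randomization each covariate has a 0.05 chance of a significance level of 0.05 or less, so about 0.6 of twelve covariates would be expected to cross that threshold. The Python sketch below also computes the chance of ten or more doing so, under the added simplifying assumption that the twelve tests are independent. That assumption is ours, for illustration only; the covariates here are correlated, so the number should be read as an order of magnitude.

```python
from math import comb


def prob_at_least(k, n, p):
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


n_covariates = 12
alpha = 0.05

# Expected number of covariates with significance level <= 0.05 by chance.
expected_significant = n_covariates * alpha

# Chance of ten or more such covariates, ASSUMING independent tests.
p_ten_or_more = prob_at_least(10, n_covariates, alpha)
```

Even allowing for correlation among the covariates, ten of twelve is far more imbalance than chance assignment would be expected to produce.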
[2] 282 boys were involved in gangs prior to age 14 and 128 had more than one missing assessment over
this age range. An additional 59 boys were either missing their gang membership status at age 14 or their
assessment of violent delinquency.
[3] Violence at age 10 was not used in estimating the trajectory model because it measured frequency at
age 10 and all prior years, whereas the later-year violence measurements are only for that year.
2.2 The Propensity to Join a Gang at Age 14
As a first step in matching the gang joiners with comparable nonjoiners, we estimated a
propensity score using the original 12 covariates in Figure 3 plus some others derived from
these covariates. The complete list of covariates is reported in Table 6, which will be
described in greater detail in §4.2. The propensity score, which measures the conditional
probability of joining a gang at age 14 given the covariates, was estimated using a single
logit model.
Figure 4 depicts the estimated propensity scores for joiners and potential controls in
each of the three trajectory groups. In the low and medium trajectory groups, there is a
substantial difference between joiners and potential controls, but there is also a fair amount
of overlap in the distributions, so credible matches would seem to be available. By contrast,
in the chronic group, the distributions exhibit limited overlap. Summary statistics on the
distributions for the chronics are reported in Table 2. The median propensity score among
the chronics who join gangs was above the maximum among the nonjoiners, and the lower
quartile among the joiners was above the upper quartile among the nonjoiners. Indeed,
the median propensity score for the 9 joiners in the chronic group is just slightly above
the maximum propensity score for all the other 571 boys in the sample. Whether or
not they joined a gang, the boys in the chronic group tended to have very high estimated
probabilities of gang membership compared to the other two groups: even the median
for the 20 nonjoiners in the chronic group is above the median for the joiners in the low
and medium groups, and half of the joiners in the chronic group are quite different from
everyone else.
These results indicate that the propensity for gang joining in the chronic group is
materially higher than in the low and medium groups, particularly for those individuals
who actually joined gangs. More importantly, the results suggest that it would be difficult
to find good matches for the gang joiners in the chronic group. This indeed turns out to
be the case. All attempts to find good matches for the gang joiners in the chronic group
failed dismally. One could try to skirt this problem by running a regression to adjust for
the covariate difference between joiners and nonjoiners in the chronic group and hoping for
the best. However, for this group the regression would largely consist of an extrapolation
rather than interpolation between joiners and nonjoiners whose covariate distributions
exhibit limited overlap. (It is remarkable how rare it is for an investigator to check whether
a covariance adjustment model is an interpolation of substantially overlapping covariate
distributions or an extrapolation of largely nonoverlapping covariate distributions.)

Table 2: Distribution of the Propensity Score in the Chronic Trajectory Group: Estimated
Probabilities of Joining a Gang at Age 14.
Chronic Group                   min    Quartile 1   Median   Quartile 3   max
Joiners (n = 9)                 0.21   0.40         0.69     0.84         0.98
Potential Controls (n = 20)     0.00   0.10         0.18     0.36         0.53
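
The overlap check described above can be made mechanical. The sketch below copies the five-number summaries from Table 2 and flags the two warning signs noted in the text; the function name `overlap_flags` and the choice of these two particular flags are illustrative assumptions, not a standard diagnostic.

```python
# Five-number summaries of the estimated propensity score, copied from Table 2.
joiners = {"min": 0.21, "q1": 0.40, "median": 0.69, "q3": 0.84, "max": 0.98}
controls = {"min": 0.00, "q1": 0.10, "median": 0.18, "q3": 0.36, "max": 0.53}

def overlap_flags(treated, control):
    """Return simple warning flags for limited overlap (an illustrative rule only)."""
    return {
        # Half of the treated group lies beyond every control.
        "treated_median_above_control_max": treated["median"] > control["max"],
        # The bulk of the treated group lies above the bulk of the controls.
        "treated_q1_above_control_q3": treated["q1"] > control["q3"],
    }

flags = overlap_flags(joiners, controls)
print(flags)
```

Both flags are raised for the chronic group, which is the situation in which a regression adjustment would be extrapolating rather than interpolating.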
We concluded that these data do not permit credible estimation of the effect of gang
membership for the boys in the chronic trajectory. Half of these gang joiners were unlike
anyone else in the study prior to age 14, and so there is no evidence in the data about
what would have happened to boys like this if they had not joined gangs. This was
disappointing because, even though this group consists of only a few boys, in many ways
the chronic group is quite interesting from a policy and scientific standpoint. Thus, we
made estimates of the violence facilitation effect of gang membership only for the numerous
but less violent boys in the low and medium groups, which hereafter are also referenced by
the indices $s = 1$ and $s = 2$, respectively.
3 Planning the Matching
3.1 Conceptual Issues about Efficiency
The “primary role” of matching in observational studies is to remove systematic biases due
to imbalances in observed covariates (Cochran 1965, p. 237), that is, biases which do not
diminish in magnitude as the sample size increases (technically, biases of order $O(1)$ in
estimated treatment effects), because these dominate the mean squared error as the sample
size increases. Nonetheless, the standard error of the estimator is also of concern, even
though it does diminish with increasing sample size (technically, it is of order
$O(1/\sqrt{n_+})$), and so the standard error’s relative contribution to the mean squared error
is negligible in large samples when systematic biases are present. In the limit, in very large
samples, there is really no tradeoff of systematic bias (of order $O(1)$) and standard error (of
order $O(1/\sqrt{n_+})$); rather, in the limit, bias is all-important. In practice, of course, the sample
size may be large but it is finite, so one does not want to be wasteful of statistical efficiency.
In the current section, we focus on the question of how the number of controls matched
to each joiner affects the standard error. In subsequent sections, we focus on the more
important issue of controlling bias from covariates.
Table 3: Elementary Efficiency Comparisons With Various Numbers of Controls Matched
to Each Treated Subject.
Controls $m_{si}$                      1      2      3      4      5      10     20     50     $\infty$
Variance multiplier $1 + 1/m_{si}$     2.00   1.50   1.33   1.25   1.20   1.10   1.05   1.02   1.00
Efficiency considerations are clarified with the aid of a simple model similar to that
used in the analysis of variance or the paired t-test; see, for instance, Rosenbaum and
Rubin (1985a) and Smith (1997). The $i$th gang joiner at 14 in trajectory group $s$ is
matched to $m_{si} \geq 1$ controls. Under the simple model, the response has an additive effect
for each matched set, a constant treatment effect, and independent errors with constant
variance $\sigma^2$; we use this model only for efficiency calculations in §3, not for inference in
later sections. If one subtracts from the response of the $i$th joiner in trajectory group $s$ the
average response of his $m_{si}$ controls, then this difference has variance
$\sigma^2 \left(1 + \frac{1}{m_{si}}\right)$. We
examine various technical consequences in §3.3, but a key point can be discussed without
additional technical detail. Table 3 shows how the variance multiplier, $1 + 1/m_{si}$, changes
as the number of controls $m_{si}$ changes. Notice, first, that there is an asymptote: for
matched pairs, $m_{si} = 1$, the multiplier is 2, but if each treated subject were matched to
infinitely many controls, the multiplier would drop to 1. In fact, using two controls rather
than one, $m_{si} = 2$, yields a multiplier of 1.5, halfway to $m_{si} = \infty$. The distance is halved
again, to 1.25, by using $m_{si} = 4$ rather than $m_{si} = 2$ controls. By contrast, the gain from
using $m_{si} = 10$ controls rather than $m_{si} = 5$ controls is much smaller. The key point here
is that if the sample size permits the use of more than one control per treated subject, then
there are substantial gains to be had using $m_{si} = 2$ controls, and meaningful gains from
$m_{si} = 4$, but for much larger values of $m_{si}$ the gains are no longer large. Moreover, as
theory suggests and as Smith (1997) shows in a case study, it becomes harder and harder to
find good matches as $m_{si}$ increases, so a large value of $m_{si}$ may yield a biased comparison
with a negligible gain in efficiency. See also the case study by Dehejia and Wahba (1999)
where model-based methods unaided by matching are distorted by the use of controls quite
unlike the treated subjects.
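
The entries of Table 3 follow directly from the multiplier $1 + 1/m_{si}$. A minimal sketch that recomputes them is given here; the function name `variance_multiplier` is ours and is used only for illustration.

```python
def variance_multiplier(m):
    """Variance of (treated response - mean of m controls), in units of sigma^2."""
    return 1.0 + 1.0 / m

# Recompute the finite columns of Table 3.
for m in (1, 2, 3, 4, 5, 10, 20, 50):
    print(m, round(variance_multiplier(m), 2))

# Fraction of the possible gain (from multiplier 2 down to 1) achieved with m controls.
def share_of_gain(m):
    return (variance_multiplier(1) - variance_multiplier(m)) / (variance_multiplier(1) - 1.0)
```

For example, `share_of_gain(2)` is one half and `share_of_gain(4)` is three quarters, which is the "halfway, then halved again" pattern described in the text.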
The discussion so far has focused on a single matched pair, but one faces the additional
choice of allowing the individual $m_{si}$ to vary while keeping their total $\sum_i m_{si}$ fixed. Again,
we develop this in some detail in §3.3, but the basic issue is clear with just two pairs, say
$i$ and $j$. Suppose joiner $i$ is matched to $m_{si}$ controls and joiner $j$ is matched to $m_{sj}$
controls, and their two treated-minus-control differences are averaged. That average of
two differences within two pairs has variance, in units of $\sigma^2$,
$\frac{1}{4}\left[\left(1 + \frac{1}{m_{si}}\right) + \left(1 + \frac{1}{m_{sj}}\right)\right]$.
Table 4 shows the efficiency in two pairs, $i$ and $j$, with a total of $m_{si} + m_{sj} = 7$ controls
shared between them. These variances are fairly stable, especially if $\min(m_{si}, m_{sj}) \geq 2$.
The worst variance, namely 2, in Table 3 is 100% bigger than the best variance, namely 1, in
Table 3, but the worst variance, namely 0.79, in Table 4 is 23% bigger than the best variance,
namely 0.65, in Table 4. If $\min(m_{si}, m_{sj}) \geq 2$, then the worst in Table 4 is less than 5%
higher than the best. In short, although the most efficient allocation has $m_{si}$ constant, the
same for all $i$, the efficiency changes only slowly as the $m_{si}$ are allowed to vary with their
total $\sum_i m_{si}$ fixed. Moreover, Ming and Rosenbaum (2000) show that one can achieve
much greater bias reduction by allowing the $m_{si}$ to vary. It is easy to see why this is so in
the current context. The most violent boys are the ones most likely to join a gang. If a
nonviolent boy, say $i$, joins a gang, then there will be an abundance of similar nonviolent
controls available to match to $i$, so $m_{si}$ should be set somewhat higher. On the other
hand, if an extremely violent boy, $j$, joins a gang, there will be comparatively few similar
controls available to match to $j$, so $m_{sj}$ should be set somewhat lower. Notice that, in
Table 4, setting $m_{si} = 5$ and $m_{sj} = 2$ would be nearly as efficient as setting $m_{si} = 4$ and
$m_{sj} = 3$, but the former would produce better matches given the types of boys who join
gangs.

Table 4: Efficiency in Two Pairs With a Total of 7 Controls.
$m_{si}$                                               1      2      3      4      5      6
$m_{sj}$                                               6      5      4      3      2      1
$m_{si} + m_{sj}$                                      7      7      7      7      7      7
$\frac{1}{4}[(1 + 1/m_{si}) + (1 + 1/m_{sj})]$         0.79   0.68   0.65   0.65   0.68   0.79
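
The comparisons drawn from Table 4 can be checked with a few lines of arithmetic. The sketch below recomputes the table and the two percentage comparisons quoted in the text; the function name `two_set_variance` is an illustrative choice.

```python
def two_set_variance(m_i, m_j):
    """Variance (in units of sigma^2) of the average of two treated-minus-control differences."""
    return 0.25 * ((1 + 1 / m_i) + (1 + 1 / m_j))

total_controls = 7
table4 = {
    (m_i, total_controls - m_i): two_set_variance(m_i, total_controls - m_i)
    for m_i in range(1, total_controls)
}

best = min(table4.values())
worst = max(table4.values())
# Worst allocation when each joiner keeps at least two controls.
worst_min2 = max(v for (m_i, m_j), v in table4.items() if min(m_i, m_j) >= 2)

print({k: round(v, 2) for k, v in table4.items()})
print("worst vs best: %.1f%% larger" % (100 * (worst / best - 1)))
print("worst vs best with min >= 2: %.1f%% larger" % (100 * (worst_min2 / best - 1)))
```

The unrounded ratio of worst to best is 38/31, about 23% larger, and it falls to about 4.5% once both set sizes are at least 2.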
From Table 1, there are 13.1 = 276/21 nonjoiners for each joiner in group 1 and
5.7 = 216/38 nonjoiners for each joiner in group 2. After some theoretical calculations
and some preliminary efforts at matching, we decided to match each joiner in group 1
to between 2 and 7 controls with an average of 5 controls, and each joiner in group 2 to
between 1 and 6 controls with an average of 3 controls. The remainder of the current
section describes the theoretical calculations of efficiency bounds that guided this decision.
A reader uninterested in these efficiency calculations could skip to §4 where the matching
is performed and evaluated.
3.2 Matching with Variable Numbers of Controls: A Flexible Strategy
There are $S$ strata, with $n_s$ treated subjects, $i = 1, \ldots, n_s$, $n_+ = \sum_{s=1}^{S} n_s$ treated subjects
in total, and $M_s$ potential controls in stratum $s$, with $M_s \geq n_s$, $s = 1, \ldots, S$. In Table 1,
setting aside trajectory group 3 for the reasons discussed in §2.2, $S = 2$, $n_1 = 21$, $n_2 = 38$,
$n_+ = 59$, $M_1 = 276$, and $M_2 = 216$. The $i$th treated subject in stratum $s$ will be matched
to $m_{si} \geq 1$ controls from stratum $s$, with no control matched to more than one treated
subject. In this study, the number of controls was vastly greater than the number of
treated subjects; therefore, we do not consider the alternative of “full matching” in which
a treated subject may be matched to several controls or a control may be matched to
several treated subjects (see Rosenbaum 1991, Gu and Rosenbaum 1993, Hansen 2004).
Every one of the $n_+ = 59$ joiners in groups 1 and 2 will be matched (Rosenbaum and
Rubin 1985b). Write $\mathbf{m} = (m_{11}, m_{12}, \ldots, m_{S,n_S})^T$, and write $m_{s+} = \sum_{i=1}^{n_s} m_{si}$, so that
$m_{s+} \leq M_s$ with equality if every control is matched.
In addition to strata, each subject has a vector of observed, pretreatment covariates;
see Table 6 below. Between each treated subject and each control in the same stratum,
there is a distance, such as the Mahalanobis distance (Rubin 1980), measuring how similar
these two subjects are with respect to the covariates. Each joiner in stratum $s$ will be
matched to at least $\alpha_s \geq 1$ controls and at most $\beta_s \geq \alpha_s$ controls with $m_{s+}$ controls
used in total in stratum $s$. Write $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_S)^T$, $\beta = (\beta_1, \ldots, \beta_S)^T$, and
$\mathbf{m}_+ = (m_{1+}, \ldots, m_{S+})^T$. Clearly, one must choose $m_{s+}$ so that $n_s \leq m_{s+} \leq M_s$ and one must
choose $(\alpha_s, \beta_s)$ to satisfy $n_s \alpha_s \leq m_{s+} \leq n_s \beta_s$; if this is true for $s = 1, \ldots, S$, we say that
the choice $(\alpha, \beta, \mathbf{m}_+)$ is possible. For instance, in stratum $s = 1$, the number of matched
controls, $m_{1+}$, must satisfy $n_1 = 21 \leq m_{1+} \leq M_1 = 276$ and $21\,\alpha_1 \leq m_{1+} \leq 21\,\beta_1$, so
$m_{1+} = 105 = 5 \times 21$, $\alpha_1 = 2$, $\beta_1 = 7$, is one of many possibilities.
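
The conditions for a possible choice are simple inequalities, so they are easy to check before any matching is attempted. A minimal sketch follows, with a hypothetical function name `is_possible` and plain-text argument names standing in for $n_s$, $M_s$, $m_{s+}$, $\alpha_s$, $\beta_s$.

```python
def is_possible(n_s, M_s, m_s_plus, alpha_s, beta_s):
    """Check the stated conditions for one stratum:
    1 <= alpha_s <= beta_s, n_s <= m_s+ <= M_s, and n_s*alpha_s <= m_s+ <= n_s*beta_s."""
    return (
        1 <= alpha_s <= beta_s
        and n_s <= m_s_plus <= M_s
        and n_s * alpha_s <= m_s_plus <= n_s * beta_s
    )

# Stratum 1 (low group) and stratum 2 (medium group), using the counts quoted in the text.
print(is_possible(21, 276, 105, 2, 7))   # the design adopted for stratum 1
print(is_possible(38, 216, 114, 1, 6))   # the design adopted for stratum 2
print(is_possible(21, 276, 105, 6, 7))   # not possible: 21 * 6 = 126 > 105
```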
Write $\mu_s = \lfloor m_{s+}/n_s \rfloor$ where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$ and $\lceil x \rceil$ is
the least integer greater than or equal to $x$; for instance, with $m_{1+} = 105$,
$\mu_1 = \lfloor m_{1+}/n_1 \rfloor = \lfloor 105/21 \rfloor = 5$. In general, the closest matches, in terms of distance, are obtained by taking
$\alpha_s = \beta_s = 1$, $m_{s+} = n_s$, and $\mu_s = 1$, whereas the smallest standard errors are obtained by
taking $m_{s+} = M_s$, $\alpha_s = \mu_s = \lfloor m_{s+}/n_s \rfloor$, $\beta_s = \lceil m_{s+}/n_s \rceil$, so these two considerations pull in
different directions. In stratum $s = 1$, these extremes have $\alpha_s = \beta_s = 1$ and $m_{s+} = n_s = 21$
or $m_{s+} = M_s = 276$, $\alpha_s = \mu_s = \lfloor m_{s+}/n_s \rfloor = \lfloor 276/21 \rfloor = 13$, $\beta_s = \lceil 276/21 \rceil = 14$. In a
careful case study, Smith (1997) showed that insisting upon matched pairs, $m_{s+} = n_s$, can
lead to a substantial loss in efficiency, together with the discarding of perfectly acceptable
controls, while insisting upon $m_{s+} = M_s$ provides negligible gains in efficiency when $M_s/n_s$
is large, together with substantially inferior matched controls. Ming and Rosenbaum
(2000) show that substantial gains in bias reduction are possible by allowing the set sizes
to vary, that is, by setting $\alpha_s < \mu_s$ and $\beta_s > \mu_s + 1$, often with little loss of efficiency.
Our strategy is to choose $m_{s+}$, $\alpha_s$, $\beta_s$ in §3.3 so that, even in the worst case, the loss
of efficiency is not large; then, in §4, to minimize total covariate distance subject to the
chosen $m_{s+}$, $\alpha_s$, $\beta_s$.
3.3 Sizes of Matched Sets: Worst Case Efficiency Bounds
For approximate efficiency calculations in this section only, suppose responses of distinct
subjects are mutually independent, every response has the same variance, $\sigma^2$, and the
covariates are entirely useless, that is, independent of treatment and response, so matching
on covariates neither removes bias nor reduces variance. This is, in a sense, the worst
situation for matching, because (i) imbalances in covariates do not create bias, so that the
tradeoff of bias and variance entirely focuses on variance, and (ii) matching does nothing,
in this case, to reduce the variance. Consider the difference between the response of the
$i$th treated subject and the average of this subject’s $m_{si}$ matched controls. The average of
the $n_+$ differences has variance
$$v(\mathbf{m}) = \frac{\sigma^2}{n_+^2} \sum_{s,i} \left(1 + \frac{1}{m_{si}}\right).$$
Obviously, there is little to be gained in $v(\mathbf{m})$ by increasing $m_{si}$ beyond a certain point,
because $1 + 1/m_{si}$ tends to 1, not zero, as $m_{si} \to \infty$. In other words, even with infinitely
many controls matched to each treated subject, the standard error $\sqrt{v(\mathbf{m})}$ cannot fall
below $\sigma/\sqrt{n_+}$ because there are only $n_+$ treated subjects. If $m_{si}$ were constant, the same
for all $s, i$, then $\sqrt{v(\mathbf{m})}$ is more than 40% above its asymptote for $m_{si} = 1$, but less than
10% above the asymptote for $m_{si} = 5$ and less than 5% above the asymptote for $m_{si} = 10$.
Moreover, the asymptote, obtained by letting $m_{si} \to \infty$, is not a real option in any actual
problem because $m_{s+} \leq M_s$ and the $M_s$ are finite; see Table 1 where, for instance in
stratum $s = 2$, the largest possible average value of $m_{si}$ is only $M_2/n_2 = 216/38 = 5.7$.
At the same time, it is much harder to find good matches on covariates as $m_{si}$ increases
(Smith 1997, Ming and Rosenbaum 2000).
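
The statements about the asymptote can be verified numerically from the variance of the average difference, which is proportional to the sum of $1 + 1/m_{si}$ over all matched sets. The sketch below does this for constant set sizes; the function names are illustrative, and $\sigma^2$ is set to 1 because it cancels in the ratio.

```python
import math

def variance_of_average(m, sigma2=1.0):
    """Variance of the average of len(m) treated-minus-control differences,
    where m lists the number of controls in each matched set."""
    n_plus = len(m)
    return sigma2 / n_plus ** 2 * sum(1 + 1 / m_si for m_si in m)

def se_relative_to_asymptote(k, n_plus=59):
    """Standard error with k controls per set, divided by sigma / sqrt(n_plus)."""
    se = math.sqrt(variance_of_average([k] * n_plus))
    asymptote = math.sqrt(1.0 / n_plus)
    return se / asymptote

for k in (1, 5, 10):
    print(k, round(se_relative_to_asymptote(k), 3))
```

With constant set sizes the ratio is $\sqrt{1 + 1/k}$ whatever the number of treated subjects, so the default of 59 joiners is only a convenience.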
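Write $\mathcal{M}_{\alpha,\beta,\mathbf{m}_+}$ for the set of possible values for the vector $\mathbf{m}$ of matched set sizes;
that is, $\mathbf{m} \in \mathcal{M}_{\alpha,\beta,\mathbf{m}_+}$ if and only if $\mathbf{m}$ is an $n_+$-dimensional vector with positive integer
coordinates $m_{si}$ such that $\alpha_s \leq m_{si} \leq \beta_s$, $i = 1, \ldots, n_s$, and $m_{s+} = \sum_{i=1}^{n_s} m_{si}$, for
$s = 1, \ldots, S$. Two elements of $\mathcal{M}_{\alpha,\beta,\mathbf{m}_+}$ are of particular interest, specifically $\underline{\mathbf{m}}$ and
$\overline{\mathbf{m}}$, which will now be defined. Informally, $\underline{\mathbf{m}}$ is as nearly constant as possible, while
$\overline{\mathbf{m}}$ is as dispersed as possible. Define $\underline{\mathbf{m}}$ by the rule: (i) if $m_{s+}/n_s$ is an integer, then
$\underline{m}_{si} = m_{s+}/n_s$ for $i = 1, \ldots, n_s$; otherwise, (ii) if $m_{s+}/n_s$ is not an integer then
$\underline{m}_{si} = \mu_s + 1$ for $i = 1, \ldots, m_{s+} - \mu_s n_s$ and $\underline{m}_{si} = \mu_s$ for
$i = (m_{s+} - \mu_s n_s) + 1, \ldots, n_s$, where $\mu_s = \lfloor m_{s+}/n_s \rfloor$. With $\alpha_s < \beta_s$,
define $\overline{\mathbf{m}}$ as follows
$$\overline{m}_{si} = \beta_s \quad \text{for } i = 1, \ldots, h_s,$$
$$\overline{m}_{si} = m_{s+} - (\beta_s - \alpha_s)\, h_s - \alpha_s (n_s - 1) \quad \text{for } i = h_s + 1,$$
$$\overline{m}_{si} = \alpha_s \quad \text{for } i = h_s + 2, \ldots, n_s,$$
where
$$h_s = \left\lfloor \frac{m_{s+} - n_s \alpha_s}{\beta_s - \alpha_s} \right\rfloor.$$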
If $\alpha_s = \beta_s = m_{s+}/n_s$, then take $\overline{m}_{si} = \alpha_s$. For any possible choice of $\alpha$, $\beta$, $\mathbf{m}_+$,
Proposition 1 determines the minimum and maximum standard error, $\sqrt{v(\mathbf{m})}$. Using
Proposition 1, one can select $\alpha$, $\beta$, $\mathbf{m}_+$ so the worst possible loss in efficiency is controlled.

Proposition 1 For possible $\alpha$, $\beta$, $\mathbf{m}_+$,
$$v(\underline{\mathbf{m}}) = \min_{\mathbf{m} \in \mathcal{M}_{\alpha,\beta,\mathbf{m}_+}} v(\mathbf{m}) \quad \text{and} \quad v(\overline{\mathbf{m}}) = \max_{\mathbf{m} \in \mathcal{M}_{\alpha,\beta,\mathbf{m}_+}} v(\mathbf{m}).$$

Proof. Notice that the contribution to $v(\mathbf{m})$ from each stratum $s$ is a symmetric and
convex function of $(m_{s1}, \ldots, m_{s,n_s})$, that is, a Schur convex function (Marshall and Olkin
1979), so $v(\mathbf{m})$ itself is the sum of $S$ Schur convex functions. (The function $v(\mathbf{m})$ itself
would be Schur convex on a symmetric domain, but $\mathcal{M}_{\alpha,\beta,\mathbf{m}_+}$ is only symmetric within
strata.) The result then follows immediately from properties of Schur convex functions,
specifically Proposition 4.C.1 of Marshall and Olkin (1979, p. 132) which is originally due
to J. H. B. Kemperman.
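
To make the two extreme allocations concrete, the sketch below constructs the least dispersed and most dispersed set sizes for one stratum and checks Proposition 1 by brute-force enumeration on a small invented case. The function names, and the small case with 3 treated subjects and 8 controls, are assumptions for illustration; the enumeration is a numerical check, not a proof.

```python
import itertools

def least_dispersed(n_s, m_s_plus):
    """Set sizes as nearly constant as possible, with total m_s_plus."""
    mu = m_s_plus // n_s
    extra = m_s_plus - mu * n_s          # this many sets receive mu + 1 controls
    return [mu + 1] * extra + [mu] * (n_s - extra)

def most_dispersed(n_s, m_s_plus, alpha_s, beta_s):
    """Set sizes as dispersed as possible within [alpha_s, beta_s], with total m_s_plus."""
    if alpha_s == beta_s:
        return [alpha_s] * n_s
    h = (m_s_plus - n_s * alpha_s) // (beta_s - alpha_s)
    if h >= n_s:                         # every set is at the maximum
        return [beta_s] * n_s
    middle = m_s_plus - (beta_s - alpha_s) * h - alpha_s * (n_s - 1)
    return [beta_s] * h + [middle] + [alpha_s] * (n_s - h - 1)

def sum_multipliers(m):
    """Sum of (1 + 1/m_si); the variance is proportional to this sum."""
    return sum(1 + 1 / x for x in m)

def brute_force_extremes(n_s, m_s_plus, alpha_s, beta_s):
    """Smallest and largest sum of multipliers over all allowed set-size vectors."""
    sums = [
        sum_multipliers(m)
        for m in itertools.product(range(alpha_s, beta_s + 1), repeat=n_s)
        if sum(m) == m_s_plus
    ]
    return min(sums), max(sums)

# The designs for the two strata described in the text.
print(most_dispersed(21, 105, 2, 7))    # twelve 7s, one 5, eight 2s
print(most_dispersed(38, 114, 1, 6))    # fifteen 6s, one 2, twenty-two 1s

# Small invented case for an exhaustive check.
lo, hi = brute_force_extremes(3, 8, 1, 5)
```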
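We now define two quantities, where $\rho_{\alpha,\beta,\mathbf{m}_+}$ compares the best and worst standard
errors with a possible $\alpha$, $\beta$, $\mathbf{m}_+$, and where $\tilde{\rho}_{\alpha,\beta,\mathbf{m}_+}$ compares use of all controls to the
worst standard errors with a possible $\alpha$, $\beta$, $\mathbf{m}_+$. The smallest possible value of $v(\mathbf{m})$
occurs if all controls are used, $m_{s+} = M_s$ for $s = 1, \ldots, S$, and the $m_{si}$ are as uniform
as possible. Define $\tilde{\mu}_s = \lfloor M_s/n_s \rfloor$ and $\tilde{\mathbf{m}}$ by the rule: (i) if $M_s/n_s$ is an integer, then
$\tilde{m}_{si} = M_s/n_s$ for $i = 1, \ldots, n_s$; otherwise, (ii) if $M_s/n_s$ is not an integer then
$\tilde{m}_{si} = \tilde{\mu}_s + 1$ for $i = 1, \ldots, M_s - \tilde{\mu}_s n_s$ and $\tilde{m}_{si} = \tilde{\mu}_s$ for
$i = (M_s - \tilde{\mu}_s n_s) + 1, \ldots, n_s$. Define
$$\rho_{\alpha,\beta,\mathbf{m}_+} = \sqrt{\frac{v(\underline{\mathbf{m}})}{v(\overline{\mathbf{m}})}} \quad \text{and} \quad \tilde{\rho}_{\alpha,\beta,\mathbf{m}_+} = \sqrt{\frac{v(\tilde{\mathbf{m}})}{v(\overline{\mathbf{m}})}}$$
so the standard error of the average difference would be $\rho_{\alpha,\beta,\mathbf{m}_+}$ times smaller with the
best choice of match frequencies, $\underline{\mathbf{m}}$, than with the worst, $\overline{\mathbf{m}}$, and similarly $\tilde{\rho}_{\alpha,\beta,\mathbf{m}_+}$ times
smaller with match frequencies $\tilde{\mathbf{m}}$ than with the worst, $\overline{\mathbf{m}}$. In typical problems, the length
of a confidence interval for a mean would be proportional to the standard error, so $\rho_{\alpha,\beta,\mathbf{m}_+}$
and $\tilde{\rho}_{\alpha,\beta,\mathbf{m}_+}$ describe the effect on the length of a confidence interval.

Table 5: Efficiency Calculations for Matching with Variable Controls. Above the double
line, calculations describe planning before matching. Below the double line, calculations
describe the actual match produced by minimum distance matching.
Stratum                                     $s = 1$ (Low)              $s = 2$ (Medium)                        All
Treated, $n_s$                              21                         38                                      59
Potential Controls, $M_s$                   276                        216                                     492
Matched Controls, $m_{s+}$                  105                        114                                     219
Minimum $\alpha_s$                          2                          1                                       1 or 2
Maximum $\beta_s$                           7                          6                                       7
$\rho_{\alpha,\beta,\mathbf{m}_+}$          97%                        90%                                     92%
$\tilde{\rho}_{\alpha,\beta,\mathbf{m}_+}$  92%                        84%                                     87%
$\overline{\mathbf{m}}$                     $7^{12}, 5^{1}, 2^{8}$     $6^{15}, 2^{1}, 1^{22}$                 219
$\underline{\mathbf{m}}$                    $5^{21}$                   $3^{38}$                                219
$\tilde{\mathbf{m}}$                        $14^{3}, 13^{18}$          $6^{26}, 5^{12}$                        492
==============================================================================================================
Actual Match Frequencies                    $7^{10}, 6^{1}, 5^{3}, 2^{7}$   $6^{9}, 5^{2}, 4^{3}, 3^{4}, 2^{6}, 1^{14}$   219
Actual $\mathbf{m}$ versus $\tilde{\mathbf{m}}$   92%                  87%                                     89%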
Table 5 shows some efficiency calculations for our proposed design. In stratum $s = 1$,
with $n_1 = 21$, $M_1 = 276$, $m_{1+} = 105$, $\alpha_1 = 2$, $\beta_1 = 7$, the least dispersed match frequencies
$\underline{\mathbf{m}}$ for the $n_1 = 21$ joiners has $(\underline{m}_{11}, \ldots, \underline{m}_{1,21})$ equal to $(5, 5, \ldots, 5)^T$ which we write $5^{21}$,
and the most dispersed $\overline{\mathbf{m}}$ has $(\overline{m}_{11}, \ldots, \overline{m}_{1,21})$ equal to $(7, 7, \ldots, 7, 5, 2, \ldots, 2)$, which we
write $7^{12}, 5^{1}, 2^{8}$, noting that $m_{1+} = 105 = (12 \times 7) + 5 + (8 \times 2)$. Similarly, using all controls
matched as uniformly as possible has $(\tilde{m}_{11}, \ldots, \tilde{m}_{1,21})$ equal to $(14, 14, 14, 13, \ldots, 13)$ or
$14^{3}, 13^{18}$. In stratum $s = 1$, the most efficient allocation $\underline{\mathbf{m}}$ with $m_{1+} = 105$ controls
yields a standard error that is 97% of the standard error for the least efficient allocation
$\overline{\mathbf{m}}$ with $m_{1+} = 105$ controls and $\alpha_1 = 2$, $\beta_1 = 7$. Also, in stratum $s = 1$, the most efficient
allocation with all controls, $\tilde{\mathbf{m}}$ with $m_{1+} = 276$, has a standard error that is 92% of the
standard error of the least efficient allocation $\overline{\mathbf{m}}$ with $m_{1+} = 105$ controls and $\alpha_1 = 2$,
$\beta_1 = 7$. That is, the $\alpha_1 = 2$, $\beta_1 = 7$, $m_{1+} = 105$ configuration is much more flexible,
so it can remove more bias, but it is never much less efficient than the $\alpha_1 = 13$, $\beta_1 = 14$,
$m_{1+} = 276$ design. Even in the worst case, even with irrelevant covariates and no bias to
be removed, the allocations in Table 5 are never very inefficient.
The frequencies actually obtained in §4 by optimal matching were not the least efficient
frequencies $\overline{\mathbf{m}}$, but rather the frequencies $\mathbf{m} \in \mathcal{M}_{\alpha,\beta,\mathbf{m}_+}$ given below the double line
in Table 5. The standard error using all $M_+ = 492$ controls allocated as uniformly as
possible is 89% of the standard error for the actual $\mathbf{m}$ with 219 controls and an uneven
allocation.
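
The percentages in Table 5 can be recomputed from the match frequencies alone, because the standard error is proportional to the square root of the sum of $1 + 1/m_{si}$ over matched sets. The sketch below does so; the dictionary layout (set size mapped to number of sets) and the function names are illustrative choices, and the frequencies are copied from Table 5.

```python
import math

def sum_multipliers(freq):
    """Sum of (1 + 1/m) over matched sets; freq maps set size m -> number of sets."""
    return sum(count * (1 + 1 / m) for m, count in freq.items())

def se_ratio(freq_a, freq_b):
    """Ratio of standard errors for two allocations over the same treated subjects."""
    return math.sqrt(sum_multipliers(freq_a) / sum_multipliers(freq_b))

def combine(freq_a, freq_b):
    """Pool the matched sets of two strata."""
    out = dict(freq_a)
    for m, count in freq_b.items():
        out[m] = out.get(m, 0) + count
    return out

low = {
    "worst": {7: 12, 5: 1, 2: 8},
    "best": {5: 21},
    "all_controls": {14: 3, 13: 18},
    "actual": {7: 10, 6: 1, 5: 3, 2: 7},
}
medium = {
    "worst": {6: 15, 2: 1, 1: 22},
    "best": {3: 38},
    "all_controls": {6: 26, 5: 12},
    "actual": {6: 9, 5: 2, 4: 3, 3: 4, 2: 6, 1: 14},
}
both = {key: combine(low[key], medium[key]) for key in low}

for name, f in (("low", low), ("medium", medium), ("all", both)):
    print(
        name,
        round(100 * se_ratio(f["best"], f["worst"])),          # best vs worst
        round(100 * se_ratio(f["all_controls"], f["worst"])),  # all controls vs worst
        round(100 * se_ratio(f["all_controls"], f["actual"])), # all controls vs actual
    )
```

Each printed row gives, in percent, the three ratios reported for that column of Table 5.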
We decided upon the allocation rules in Table 5 after trying several alternative rules
and obtaining the optimal matches, as in §4. As one might anticipate from Figure 4, it
was more difficult to find good matches on covariates in the declining group, $s = 2$, than in the
low group, $s = 1$, and so we tolerated somewhat lower values of the efficiency ratios
$\rho_{\alpha,\beta,\mathbf{m}_+}$ and $\tilde{\rho}_{\alpha,\beta,\mathbf{m}_+}$ in
group $s = 2$. In particular, when we tried matching with a minimum of $\alpha_2 = 2$ controls in
trajectory group $s = 2$, the balance on covariates in this trajectory group was unsatisfactory
by the standards discussed in §4.2, while with $\alpha_2 = 1$ good balance was attainable.
Three points deserve emphasis. First, there is a tradeoff between bias and variance,
between comparing comparable subjects and using as many subjects as possible, and one
faces diminishing returns if either consideration completely outweighs the other. Second,
although the efficiency bounds in the current section provide a rough guide to setting the
matching parameters, $\alpha$ and $\beta$, one should always check, as we do in §4.2, that matching
has actually balanced observed covariates, and adjust the matching parameters, $\alpha$ and
$\beta$, accordingly. Third, it is never appropriate to use extremely biased controls in the
hope of boosting efficiency; see for instance Rubin (1979) for a simulation and see Dehejia
and Wahba (1999) for a case study of the damage such controls can do to model-based
adjustments.
4 Optimal Matching: Method and Evaluation of Comparability
4.1 Matching Method: Separate Propensity Scores, Minimum Distance Matching
The matching was performed separately within the two trajectory groups, the lows ($s = 1$)
and the mediums ($s = 2$), using the allocations displayed in Table 5, so $m_{1+} = 105$ controls
were selected in the low group with at least $\alpha_1 = 2$ controls and at most $\beta_1 = 7$ controls
for each joiner, and $m_{2+} = 114$ controls were selected in the medium group with at least
$\alpha_2 = 1$ control and at most $\beta_2 = 6$ controls for each joiner. The matching attempted to
balance the covariates listed in Table 6. The twelve covariates in Figure 3 are identified
by an asterisk in Table 6, and the remaining covariates are derived from these twelve.
One derived covariate, which is obtained from the group-based trajectory model, is the
conditional posterior probability of being in the medium group given assignment to either
the low or medium group. Two separate propensity scores were estimated, one for each
trajectory group, to allow for different selection mechanisms across trajectories. The
propensity score was estimated using a logit model to predict joining a gang at age 14
from the other covariates in Table 6, and the logit of the propensity score is used in most
calculations.
For the two trajectory groups together, the mean of each covariate for the $n_+ = 59$ joiners at
age 14 (the ‘treated’ group) is $\bar{X}_t$ in Table 6, and the mean for the $M_+ = 492$ potential
controls is $\bar{X}_c$. Generally, the gang joiners were more violent than potential controls at
ages 10, 11, 12 and 13, were less popular with their peers, had higher posterior probabilities
of membership in the medium trajectory group, and of course higher propensity scores.
Some of the covariates in the upper portion of Table 6 had missing values, and their
associated missing value indicators are listed in the lower portion of Table 6. For instance,
the covariate ‘number of sexual partners’ was missing for $\bar{X}_t = 5\%$ of joiners and $\bar{X}_c = 1\%$
of potential controls. If missing value indicators are included as variables in the propensity
score, as we did, then the propensity score tends to balance the observed values of the
covariates and the pattern of missing value indicators, but of course it may not balance the
unobserved covariate values themselves; see the appendix of Rosenbaum and Rubin (1984).
For instance, it would tend to balance the observed values of ‘number of sexual partners’ and
the frequency of missing values, but cannot be expected to balance the missing ‘number
of sexual partners.’ When missing value indicators are included as predictors in a logit
model, the fitted propensity scores are unaffected by the numerical values used in place of
missing values, because the fitted coefficients adjust to compensate; we used the covariate’s
mean. Although this does not affect the propensity scores themselves, it does have a small
effect on the Mahalanobis distance described below, and we excluded missing values when
evaluating covariate balance, so this use of the mean does not affect balance measures
discussed in §4.2.
Within each trajectory group, we defined a distance between each joiner and each
potential control. The distance had two components. First, the distance was the Mahalanobis
distance computed from the covariates in Table 6, including the relevant group’s
logit propensity score, but excluding the missing value indicators. Second, if two individuals
differed on their logits of the propensity score by more than 0.2 times the standard
deviation of the logit of the propensity score, then a penalty of 200 was added to this distance.
Penalties are a standard device for effectively constraining an optimization problem
without formally introducing constraints. In this study, all but 7 of the 219 actual matches
avoided the penalty and thereby respected the constraint. We selected the controls to minimize
the total of the distances, that is, the total of the $m_{1+} = 105$ distances in the
low group and the total of the $m_{2+} = 114$ distances in the medium group. This is a combinatorial
optimization problem. We solved it using the tactic in Ming and Rosenbaum
(2001), who convert it into the familiar ‘optimal assignment problem’ and then solve the
optimal assignment problem using Bertsekas’ (1981) auction algorithm with a substantially
accelerated version of the S-Plus code in Rosenbaum (2002a, pp. 325-326). Faster software
in R is available from Hansen (2004).
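
The structure of this optimization can be illustrated on a toy problem. The sketch below adds the caliper penalty to a given base distance and then finds the minimum-total-distance match by exhaustive enumeration. This is emphatically not the auction algorithm used in the study: brute force is only feasible for a handful of subjects. The distances, logits, and function names are invented for illustration, and the base distances stand in for Mahalanobis distances computed elsewhere.

```python
import itertools
import statistics

def penalized_distance(base, logit_t, logit_c, sd_logit, caliper=0.2, penalty=200.0):
    """Add a fixed penalty when the logit propensity scores differ by more than
    caliper * sd_logit; otherwise return the base distance unchanged."""
    if abs(logit_t - logit_c) > caliper * sd_logit:
        return base + penalty
    return base

def brute_force_match(dist, alpha, beta, m_plus):
    """Exhaustively search assignments of controls to joiners.

    dist[i][k] is the distance from joiner i to control k. Each control is
    assigned to one joiner or left unused (None). Every joiner must receive
    between alpha and beta controls, with m_plus controls used in total.
    Returns (total distance, assignment tuple) for the best assignment.
    """
    n_t, n_c = len(dist), len(dist[0])
    best = None
    for assign in itertools.product([None] + list(range(n_t)), repeat=n_c):
        sizes = [sum(1 for a in assign if a == i) for i in range(n_t)]
        if sum(sizes) != m_plus or any(not alpha <= sz <= beta for sz in sizes):
            continue
        total = sum(dist[a][k] for k, a in enumerate(assign) if a is not None)
        if best is None or total < best[0]:
            best = (total, assign)
    return best

# Invented toy data: 2 joiners, 4 potential controls.
logit_joiners = [0.0, 1.0]
logit_controls = [0.05, 0.1, 0.9, 2.0]
base = [[0.3, 0.5, 2.0, 4.0],
        [2.5, 2.2, 0.4, 1.5]]

sd_logit = statistics.stdev(logit_joiners + logit_controls)
dist = [
    [penalized_distance(base[i][k], lt, lc, sd_logit)
     for k, lc in enumerate(logit_controls)]
    for i, lt in enumerate(logit_joiners)
]

total, assignment = brute_force_match(dist, alpha=1, beta=2, m_plus=3)
print(total, assignment)
```

In the toy problem the first joiner receives two controls and the second receives one, which mirrors the idea that set sizes adapt to the local supply of close controls.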
4.2 Comparability Before Age 14: Covariate Balance Before and After Matching
In this section, the matched sample is evaluated in terms of covariate balance. The
measures of balance are simple, traditional, and have been in use for some time; e.g.,
Rosenbaum and Rubin (1985a). For any one covariate $X$, let $\bar{X}_t$ and $\bar{X}_c$ be the means, and
let $s_{Xt}$ and $s_{Xc}$ be the standard deviations of $X$ for, respectively, all 59 joiners and all 492
potential controls, before stratification and matching. Also, let
$s_X = \sqrt{(s_{Xt}^2 + s_{Xc}^2)/2}$ be
an equally weighted combination of the two standard deviations. Because we use every joiner,
matching does not alter the mean $\bar{X}_t$ of $X$ for joiners. Joiner $i$ in stratum $s$ has covariate
value $X_{tsi}$ and is matched to $m_{si}$ controls with covariate values $X_{csij}$, $j = 1, \ldots, m_{si}$. Write
$$\bar{X}_{csi} = \frac{1}{m_{si}} \sum_{j=1}^{m_{si}} X_{csij} \quad \text{and} \quad \bar{X}_{mc} = \frac{1}{n_+} \sum_{s=1}^{S} \sum_{i=1}^{n_s} \bar{X}_{csi}. \qquad (1)$$
In words, $\bar{X}_{csi}$ is the average of the covariate for all controls matched to gang member $i$ in
trajectory group $s$, and $\bar{X}_{mc}$ is the average of those averages across all gang joiners regardless
of their trajectory group. One hopes to see covariate balance after matching, specifically
that $X_{tsi} - \bar{X}_{csi}$ is centered near zero, or that $\bar{X}_t - \bar{X}_{mc}$ is near zero. Table 6 reports $\bar{X}_t$
and $\bar{X}_c$ before matching, $\bar{X}_{mc}$ after matching, and two absolute standardized measures,
$$SD_X = \frac{|\bar{X}_t - \bar{X}_c|}{s_X}, \quad \text{and} \quad SD_{Xm} = \frac{|\bar{X}_t - \bar{X}_{mc}|}{s_X},$$
in which the denominators are the same. Aside from the missing value indicators, the
covariates in Table 6 are sorted by the standardized bias $SD_X$ before matching. Because
the two propensity scores were defined separately in the two trajectory groups, the various
measures for them are calculated only from individuals in the relevant trajectory group.
Before matching, the groups are almost a full standard deviation apart on the propensity
score, almost half a standard deviation apart on peer rated popularity, almost 40% of
a standard deviation apart on the posterior probability of membership in the medium
trajectory group, and roughly a quarter of a standard deviation apart on violence at ages
10, 11, 12, and 13. The balance is typically improved after matching, particularly when
the bias before matching was not small. We particularly note that the standardized bias
after matching is less than 5% of a standard deviation for violence scores at age 10, 11, 12,
and 13. As we have previously argued, balance on prior violence levels between gang joiners
and their matched controls is particularly important to generating a credible estimate of
the violence facilitation effect of gang membership. Thus, we are heartened that matching
resulted in near perfect balance on these covariates.
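
The balance measures are straightforward to compute once the matched sets are known. The following sketch uses invented covariate values; the function names are illustrative, and the sample standard deviations use the usual $n - 1$ divisor, which is an assumption since the text does not specify the divisor.

```python
import math
import statistics

def pooled_sd(treated, controls):
    """Equally weighted combination of the two standard deviations."""
    return math.sqrt((statistics.variance(treated) + statistics.variance(controls)) / 2)

def matched_control_mean(matched_sets):
    """Average over joiners of the mean covariate value of each joiner's matched controls."""
    set_means = [sum(s) / len(s) for s in matched_sets]
    return sum(set_means) / len(set_means)

def standardized_difference(mean_t, mean_c, s_x):
    """Absolute difference in means in units of the pooled standard deviation."""
    return abs(mean_t - mean_c) / s_x

# Invented data: covariate values for 3 joiners and 7 potential controls.
treated = [2.0, 3.0, 4.0]
all_controls = [0.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0]
# Matched controls for each joiner, in the same order as `treated`.
matched_sets = [[2.0, 1.0], [3.0], [4.0, 3.0, 2.0]]

s_x = pooled_sd(treated, all_controls)            # same denominator before and after
mean_t = statistics.mean(treated)
before = standardized_difference(mean_t, statistics.mean(all_controls), s_x)
after = standardized_difference(mean_t, matched_control_mean(matched_sets), s_x)
print(round(before, 2), round(after, 2))
```

Note that the denominator is computed once, from the unmatched groups, so that a change in the measure reflects only a change in the means.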
Figure 5 depicts the standardized biases for the 15 covariates, but not for their missing
value indicators. In the first two boxplots in Figure 5, the lower quartile of the standardized
biases before matching is above the upper quartile after matching. Figure 5 also
includes the standardized biases in the means after matching in the low and medium groups
separately. Fourteen covariates are included in these boxplots because only the propensity
score in group $s$ is used to describe that group. For any $X$, all four standardized biases in
Figure 5 have the same denominator $s_X$; only the means in the numerator change. Figure
5 suggests the improvement in balance is similar in both groups. Figure 5 and Table 6 also
show that the matching had good success in reducing the absolute standardized differences
to near or generally well below two tenths of a standard deviation.

Table 6: Covariate Imbalance Before and After Matching, for 15 Covariates and 7 Missing
Value Indicators. Absolute standardized difference in covariate means, before and after
matching. Covariate means for all gang joiners at age 14, all nonjoiners, and matched
nonjoiners. Covariates are ordered by the standardized bias before matching.
Covariate                                         $SD_X$   $SD_{Xm}$   $\bar{X}_t$   $\bar{X}_c$   $\bar{X}_{mc}$
Logit Propensity Score 1                          0.96     0.21        -2.05         -3.03         -2.27
Logit Propensity Score 2                          0.79     0.18        -1.38         -2.04         -1.53
Peer Rated Popularity, Age 11*                    0.47     0.18        -0.28          0.18         -0.10
Pr(Low | Low or Declining)                        0.38     0.02         0.65          0.50          0.66
Violence, Age 11*                                 0.26     0.04         1.24          0.93          1.19
Mother’s age at first birth*                      0.26     0.11        22.56         23.56         23.00
Peer Rated Aggression, Age 11*                    0.25     0.05         0.00         -0.23         -0.05
Violence, Age 10*                                 0.24     0.02         2.46          1.96          2.41
Violence, Age 12*                                 0.23     0.03         1.00          0.73          1.03
Teacher Rating of Hyperactivity, Age 11*          0.22     0.09         1.23          0.95          1.11
Violence, Age 13*                                 0.21     0.02         0.88          0.67          0.86
Number of Sexual Partners, Age 13*                0.21     0.06         0.23          0.14          0.21
Teacher Rating of Opposition, Age 11*             0.19     0.20         2.50          2.04          2.02
Intelligence Score*                               0.07     0.22         8.92          9.10          9.46
Teacher Rating of Physical Aggression, Age 11*    0.03     0.20         0.75          0.79          0.49
Number of Sexual Partners Missing                 0.27     0.15         0.05          0.01          0.03
Intelligence Score Missing                        0.16     0.06         0.03          0.01          0.03
Physical Aggression Missing                       0.15     0.15         0.05          0.02          0.08
Violence Age 13 Missing                           0.14     0.11         0.03          0.01          0.02
Mother’s Age Missing                              0.13     0.17         0.03          0.01          0.01
Popularity Missing                                0.09     0.02         0.15          0.12          0.16
Aggression Rating Missing                         0.09     0.02         0.15          0.12          0.16
Figure 6 depicts the covariates themselves, before and after matching, for the four
covariates in Table 6 with the largest standardized biases before matching. Note that
before matching the distribution of each of these covariates across the non-gang members
differs appreciably from that of the gang members, whereas after matching the distributions
for the gang and non-gang members are very similar. These results are another encouraging
sign that the matching had substantial success in bringing into balance covariates that in
the unmatched data were substantially out of balance.
As a methodological aside, the boxplots for the 59 joiners and the 273 unmatched controls
in Figure 6 are conventional boxplots. A joiner might be matched to between 1 and 7
controls, but an appropriate, directly adjusted analysis gives equal weight to each of the 59
joiners. Therefore, for each covariate, we created a weighted empirical distribution for the
matched controls. In the weighted distribution, if a joiner had 1 matched control then that
control received weight 1, but if the joiner had 2 matched controls then each one received
weight 1/2, and so on, up to 7 matched controls each with weight 1/7. The expectation of
this weighted empirical distribution is $A_{c}$ in (1), as displayed in Table 6. Alternatively,
if one computed $A_{c}$ in (1) not from the covariate $A_{csij}$ itself, but rather from the binary
variable indicating whether $A_{csij} \le r$, then the result is the weighted empirical distribution
evaluated at r. In Figure 6 and in several later boxplots, the quartiles for matched controls
are computed from this weighted empirical distribution.
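The weighting just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, the weights are normalized to sum to one across all matched sets, and the quantile rule (smallest value whose cumulative weight reaches q) is one common convention that the paper does not specify.

```python
def weighted_control_distribution(matched_sets):
    """matched_sets: one list of control covariate values per joiner.

    Each control in a set of k controls gets weight 1/k, rescaled by the
    number of sets so that all weights sum to 1 (every joiner counts equally).
    Returns (value, weight) pairs sorted by value.
    """
    n_sets = len(matched_sets)
    pairs = []
    for controls in matched_sets:
        weight = 1.0 / (len(controls) * n_sets)
        pairs.extend((value, weight) for value in controls)
    return sorted(pairs)


def weighted_mean(pairs):
    """Expectation of the weighted empirical distribution."""
    return sum(value * weight for value, weight in pairs)


def weighted_cdf(pairs, r):
    """Weighted empirical distribution function evaluated at r."""
    return sum(weight for value, weight in pairs if value <= r)


def weighted_quantile(pairs, q):
    """Smallest value whose cumulative weight reaches q (assumed convention)."""
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical covariate values for three joiners with 1, 2 and 4 controls.
sets = [[2.0], [0.0, 4.0], [1.0, 1.0, 1.0, 5.0]]
dist = weighted_control_distribution(sets)
print(weighted_mean(dist), weighted_cdf(dist, 1.0), weighted_quantile(dist, 0.5))
```

In this toy example the weighted mean equals the average of the three set averages, which is the direct-adjustment property the paragraph relies on.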
Although the missing value indicators at the bottom of Table 6 also show improved
balance overall, much of this occurred in the low trajectory group, where there was more
freedom to pick controls. There was little improvement in the missing indicators in the
medium group, in part because the Mahalanobis distance emphasized the actual covariates
and the missing indicators were included only indirectly through the propensity score.
Table 7: Comparison of Levels of Violence Among Gang Joiners at Age 14 and Matched
Controls. Two-sided significance levels from the Hodges-Lehmann aligned rank test, testing
no effect on the level or the change in violence. Changes in violence are the difference
between violence at a given age and the average violence for ages 10 to 13. The 95 percent
confidence interval for an additive effect on the level of violence is obtained by inverting
the test.
                              # Sets   Level of    Change in   95% Confidence
                                       Violence    Violence    Interval
Covariate  Age 10               59     0.488
Covariate  Age 11               57     0.671
Covariate  Age 12               58     0.209
Covariate  Age 13               57     0.600
Covariate  Average 10 to 13     59     0.631
Outcome    Age 14               59     0.0017      0.0074      [0.25, 1.00]
Outcome    Age 15               53     0.0089      0.0159      [0.14, 1.16]
Outcome    Age 16               48     0.411       0.509       [-0.25, 0.63]
Outcome    Age 17               50     0.556       0.606       [-0.30, 0.51]
Outcome    Average 14 to 17     59     0.015       0.023       [0.08, 0.79]
5 Outcomes in Late Adolescence for Boys Who Joined Gangs at 14 and for Matched
Controls
5.1 Violence and Change in Violence, Ages 14 to 17
Figure 7 reports boxplots of the violence scores for the 59 gang joiners for ages 14-17. Also
shown are the weighted empirical distributions for matched controls, defined in §4.2. Figure
8 shows counterpart boxplots by trajectory group. For the sample as a whole and for each
trajectory group, the plots suggest a pronounced upward shift in violence at ages 14 and
15. By age 16 the differences between the gang joiners at age 14 and their matched controls
seem to have largely dissipated, particularly for the medium violence trajectory group.
The aligned rank test of Hodges and Lehmann (1962) was used to test the null hypothesis
of no difference in a comparison of the level of violence for the joiner in a matched
set to the levels of violence for his several matched controls. The aligned rank test is
essentially a generalization of Wilcoxon’s signed rank test to the case of matching with
multiple controls. Table 7 displays the results for the sample as a whole. Prior to age
14, when none of the boys were in gangs, there is no significant difference in the level of
violence between the boys who would join at age 14 and their matched controls. After age
Table 8: Violence Outcomes Within Trajectory Groups. The outcome is the level of
violence at specific ages. Comparison of 21 joiners and 105 matched controls in the low
trajectory group and 38 joiners and 114 matched controls in the medium trajectory group
using the aligned rank test. Two-sided significance levels for testing no effect and 95 percent
confidence intervals for an additive effect formed by inverting the test.
Age   Group             Significance Level   95% CI
14    Low (s = 1)       0.008                [0.16, 1.13]
      Medium (s = 2)    0.033                [0.04, 1.21]
15    Low (s = 1)       0.034                [0.00, 1.40]
      Medium (s = 2)    0.086                [0.00, 1.41]
16    Low (s = 1)       0.044                [0.00, 1.16]
      Medium (s = 2)    0.753                [-0.88, 0.57]
17    Low (s = 1)       0.070                [0.00, 1.58]
      Medium (s = 2)    0.520                [-0.95, 0.33]
14, two variables are examined: (i) the level of violence at a given age, and (ii) the change
in the level of violence at a given age when compared to the average for this boy from ages
10 to 13. For the sample as a whole, at ages 14 and 15, the joiners were significantly
more violent than their matched controls, and the changes in their violence from baseline
were significantly greater than the changes for their matched controls, whereas at ages 16
and 17 the differences were not significant. In Table 7, a 95% confidence interval for an
additive effect of gang joining at age 14 on the subsequent level of violence is obtained by
inverting the aligned rank test.
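A minimal sketch of an aligned rank test for matched sets with one treated subject and a variable number of controls may help fix ideas. It rests on assumptions that are not taken from the paper: responses are aligned by subtracting the matched-set mean, ties receive midranks, and the significance level uses a normal approximation to the within-set randomization distribution. The function names and the data are hypothetical. Passing a nonzero `effect` tests an additive effect of that size, which is the step a confidence interval by test inversion repeats over a grid of values.

```python
import math
from statistics import mean


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def aligned_rank_test(sets, effect=0.0):
    """sets: list of (treated_response, [control_responses]).

    Tests H0: additive treatment effect equals `effect`.
    Returns (z, two_sided_p) from a normal approximation.
    """
    aligned = []
    for treated, controls in sets:
        adjusted = [treated - effect] + list(controls)  # treated comes first
        centre = mean(adjusted)                         # assumed alignment: set mean
        aligned.extend(value - centre for value in adjusted)
    ranks = midranks(aligned)

    statistic = expectation = variance = 0.0
    position = 0
    for treated, controls in sets:
        size = 1 + len(controls)
        set_ranks = ranks[position:position + size]
        set_mean = mean(set_ranks)
        statistic += set_ranks[0]
        # Under H0 the treated subject is a random pick within the set.
        expectation += set_mean
        variance += sum((r - set_mean) ** 2 for r in set_ranks) / size
        position += size
    z = (statistic - expectation) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical violence scores: (joiner, [matched controls]).
data = [(5.0, [1.0, 2.0]), (4.0, [0.0, 1.0, 1.0]), (6.0, [2.0]),
        (3.0, [0.0, 1.0]), (7.0, [2.0, 3.0, 1.0])]
print(aligned_rank_test(data))
print(aligned_rank_test(data, effect=2.0))
```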
Table 8 displays the results by trajectory group. Specifically, Table 8 reports inferences
about the effects of joining a gang at age 14 on the level of violence at subsequent ages,
separately for the Low (s = 1) and Medium (s = 2) trajectory groups, again using the
Hodges-Lehmann aligned rank test (see footnote 4). The patterns in Table 8 are intriguing.
They suggest that the effects of gang joining at age 14 may be more persistent for the low
group than the declining group. By age 15 the difference in violence between the age 14 joiners and
[Footnote 4: The reader will note that several confidence intervals end exactly at zero. Many boys had a violence
score of zero in at least one year. As a rank statistic, the aligned rank statistic takes discrete steps as
the parameter being tested is varied, which explains, for instance, the significance level of 0.034 in the low
group at age 15 together with a confidence interval, [0.00, 1.40], which ends sharply at zero. This says that
an additive effect of zero is not plausible, is rejected as too small at the 0.034 level, but that any positive
effect is not rejected at the 0.05 level. Similarly, in the medium group at age 15, a zero effect is not rejected
at the 0.05 level, because the significance level is 0.086, but any negative effect is rejected at the 0.05 level.]
non-joiners is no longer significant at the 0.05 level for the medium group, whereas for the
low group there is a significant difference until age 17. Nonetheless, we caution against
overinterpretation of this seeming difference in the persistence of the violence facilitation
effect between the two groups. The sample sizes within groups are small (there are only
21 joiners in the Low group), and so definitive statements are not possible. For
instance, every confidence interval in Table 8 includes an effect of 0.25. At age 16 a test
of the null hypothesis of an effect of 0.25 yields a two-sided significance level of 0.42 in the
Low group and 0.28 in the Medium group, so the same effect of 0.25 is entirely plausible
at age 16 for both Low and Medium groups; that is, the effect may well be the same in
both groups.
5.2 Gang Membership, Ages 14 to 17
Tables 9 and 10 describe the changes in gang membership over time, ages 14 to 17. In these
tables, the percents for controls are found by averaging over the controls matched to each
joiner, and averaging those averages, in parallel with the definition of $A_{c}$. By definition,
the joiners are all in gangs at age 14 and the controls are not. A year later, in Table 9,
only 39% of the joiners are still in gangs, while 10% of the controls have joined. By age
17, only 20% of the joiners are still in gangs and 16% of the matched controls are in gangs,
and these frequencies do not differ significantly as judged by the Mantel-Haenszel test for
binary responses in multiply matched sets.
Table 10 describes the changes in gang membership separately in the two trajectory
groups. The immediate decline in gang membership at age 15 is greater in the low group
than in the medium group, but 25% of joiners in the low group are in gangs at age 17, as
opposed to 17% in the medium group; however, this difference is not statistically significant
by Fisher’s exact test. (In detail, in the low group, there were 20 matched sets with violence
data on both an age 14 joiner and a control, and in 25% = 5/20 of these sets the joiner
was in a gang at 17, so in the low group each joiner is 5% of the group, whereas for the
medium group, the ratio was 17% = 5/30, and the two-sided p-value from Fisher’s exact
test was 0.49.) Also, fewer controls subsequently join gangs in the low group than in the
medium group. In principle, differences in the persistence in gang membership across the
groups might account for differences in the persistence of the violence facilitation effect
across groups, but of course neither difference was statistically significant.
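The Fisher comparison quoted in parentheses can be checked by hand from the counts given there (5 of 20 low-group joiners and 5 of 30 medium-group joiners in gangs at 17). The sketch below uses only the standard library; the function name is hypothetical, and the two-sided rule (sum the probabilities of all tables no more likely than the observed one, with margins fixed) is the usual convention, assumed here rather than stated in the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= observed * (1 + 1e-9))


# Rows: low group (5 in gangs, 15 not), medium group (5 in gangs, 25 not).
p_value = fisher_exact_two_sided(5, 15, 5, 25)
print(round(p_value, 2))  # 0.49
```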
Table 9 describes a commonplace event: as time passes, the integrity of the “treated”
Table 9: Gang Membership at Ages 14 to 17, for Joiners at 14 and Matched Controls.
The percent for controls is direct adjustment, that is, the average over matched sets of
the average gang membership among controls in each set. Two-sided p-values from the
Mantel-Haenszel test without continuity correction.
Age                               14     15         16      17
Joiners at 14 in Gangs (%)        100    39         25      20
Matched Controls in Gangs (%)     0      10         11      16
Mantel-Haenszel P-value           -      0.000046   0.027   0.42
Matched Sets                      59     56         48      50
Table 10: Gang Membership, Ages 14 to 17, for Joiners at Age 14 and Matched Controls,
by Trajectory Group. The percents for controls are directly adjusted, as in Table 6: each is
the average over joiners of the percentage of that joiner’s controls who are in gangs.
Trajectory Group                            Age 14   15   16   17
Low      Joiners at 14 in Gangs (%)         100      29   17   25
Low      Matched Controls in Gangs (%)      0        12   5    9
Medium   Joiners at 14 in Gangs (%)         100      46   30   17
Medium   Matched Controls in Gangs (%)      0        8    15   20
and “control” groups degrades, as subjects enter and leave. This occurs in both experiments
and observational studies. In the language of randomized clinical trials, Table
9 displays ‘noncompliance,’ that is, the tendency over time for joiners at age 14 to quit
gangs, and for matched controls to join them. The noncompliance analysis for randomized
experiments in Greevy et al. (2004) uses the ‘randomly assigned treatment’ as an instrument
for the ‘received treatment’; it gives appropriate inferences provided the assignment
is randomized, even if noncompliance is nonrandom and biased. Specifically, that analysis
assumes the randomized treatment assignment is untainted by self-selection bias, but the
effect of the treatment is a function of the treatment received, not the treatment assigned,
and the treatment received may be affected by self-selection bias. In the current context,
in which randomization was not used, this would mean that if the matching in §4
had matched comparable joiners and controls at age 14 (this appears to be true for the
covariates in Table 6, but is a matter of speculation for covariates not measured), then
the noncompliance analysis would be appropriate even if later decisions to exit or enter
gangs were nonrandom. For instance, it would not be surprising if subsequent violence,
inside gangs or outside, were related to exit or entry, but this would not invalidate the
instrumental variable analysis.
Table 11: Effects of Gang Membership on Violence Under Three Noncompliance Models.
Model        c      Estimate   95% CI
Transient    1      1.10       [0.29, 1.86]
Lingering    1/2    0.77       [0.20, 1.31]
Permanent    0      0.46       [0.11, 0.81]
We applied the noncompliance analysis in Greevy et al. (2004) in the following
way. As in that paper, we defined an indicator of gang membership at age a, for a =
14, 15, 16, 17, using exponential smoothing. Write $G_{i,a} = 1$ if boy i was in a gang at
age a, $G_{i,a} = 0$ otherwise, so $G_{i,14} = 1$ for joiners and $G_{i,14} = 0$ for controls, but for
a > 14, there is some switching, as indicated in Table 9. Define $\bar{G}_{i,14} = G_{i,14}$, and
$\bar{G}_{i,a+1} = c\,G_{i,a+1} + (1 - c)\,\bar{G}_{i,a}$, for a = 14, 15, 16, where $0 \le c \le 1$, with
$\bar{G}_{i,a+1}$ defined to be missing if either $G_{i,a+1}$ or $\bar{G}_{i,a}$ is missing. If c = 1 then
$\bar{G}_{i,a} = G_{i,a}$ simply indicates whether a boy is in a gang at age a; this is the “transient”
model, because only current gang membership matters. If c = 0 then $\bar{G}_{i,a} = G_{i,14}$ simply
indicates whether a boy joined a gang at age 14; this is the “permanent” model, because
effects at age 14 last until age 17. If 0 < c < 1, then past gang membership exerts
diminishing influence as the years pass. For instance, with c = 1/2, a boy who was in a gang
for the first two years, $G_{i,14} = 1$, $G_{i,15} = 1$, $G_{i,16} = 0$, $G_{i,17} = 0$, has
$\bar{G}_{i,14} = 1$, $\bar{G}_{i,15} = 1$, $\bar{G}_{i,16} = 1/2$, $\bar{G}_{i,17} = 1/4$. The model
with c = 1/2 is the “lingering” model. In each model, the effect on violence at age a is
modeled as $\beta \bar{G}_{i,a}$. As in Greevy et al. (2004), to test $H_0: \beta = \beta_0$, we subtracted
$\beta_0 \bar{G}_{i,a}$ from the violence at age a, averaged all of these adjusted values for each boy over ages
a = 14, 15, 16, 17, and applied the aligned rank test to these adjusted averages, comparing
joiners at 14 to their matched controls; see Greevy et al. (2004) for detailed discussion
and references to earlier work. Of course, for all c, when testing $H_0: \beta = 0$, this gives the
0.015 significance level in Table 7 for the average level of violence, ages 14 to 17. Table 11
shows the Hodges-Lehmann point estimate of $\beta$ under each model, together with the 95%
confidence interval.
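The exponential smoothing recursion is easy to trace numerically. The sketch below is illustrative only: the function name is hypothetical, membership at ages 14 to 17 is passed as a list, and `None` stands in for a missing value, propagating forward as the definition requires.

```python
from typing import List, Optional, Sequence


def smooth_membership(g: Sequence[Optional[int]], c: float) -> List[Optional[float]]:
    """Smoothed gang membership for ages 14, 15, 16, 17.

    g[0] is membership at age 14; each later smoothed value is
    c * (current membership) + (1 - c) * (previous smoothed value),
    and is missing (None) if either input is missing.
    """
    smoothed: List[Optional[float]] = [None if g[0] is None else float(g[0])]
    for current in g[1:]:
        previous = smoothed[-1]
        if current is None or previous is None:
            smoothed.append(None)
        else:
            smoothed.append(c * current + (1 - c) * previous)
    return smoothed


in_gang = [1, 1, 0, 0]                    # in a gang at 14 and 15 only
print(smooth_membership(in_gang, 1.0))    # transient: [1.0, 1.0, 0.0, 0.0]
print(smooth_membership(in_gang, 0.5))    # lingering: [1.0, 1.0, 0.5, 0.25]
print(smooth_membership(in_gang, 0.0))    # permanent: [1.0, 1.0, 1.0, 1.0]
```

The middle line reproduces the worked example in the text for a boy who was in a gang for the first two years.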
Based on the pattern seen in Table 7, the “permanent” model with c = 0 does not
look plausible: the difference in violence between joiners at age 14 and matched controls
seems to diminish with increasing age, perhaps because of the shift in gang membership
over time in Table 9. Figure 9 depicts residuals, as in Greevy et al. (2004), from the three
noncompliance models. If the model were correct and if the sample size were extremely
large, then each pair of boxplots would be the same, but different pairs of boxplots might
differ. Arguably, with c = 0 in Figure 9, the effect of $\hat{\beta} = 0.46$ is too small at age a = 14
and too large at age a = 17. The plots for c = 1 with $\hat{\beta} = 1.10$ and for c = 1/2 with
$\hat{\beta} = 0.77$ both look better than for c = 0, and neither looks dramatically better than the
other. In a much larger study, we would explore whether these results differ by trajectory
group; however, in the current study, the group-specific confidence intervals for ages 15 to
17 in Table 8 all include zero effect.
5.3 Covariance Adjustment of Matched Sets
It is sometimes possible to increase the effectiveness of bias adjustments or reduce sampling
error by combining matching with some form of covariance adjustment (Rubin 1979). Here,
we apply the method in Rosenbaum (2002b, §7), in which: (i) the hypothesized treatment
effect is subtracted from the responses of the 59 gang joiners to form adjusted responses
for the 278 = 59 + 219 matched boys, (ii) these 278 adjusted responses are regressed on
covariates to obtain 278 residuals, and (iii) the aligned rank test is applied to the residuals
within matched sets. In a randomized experiment in which one boy in each set is picked
at random for treatment, this procedure produces a randomization inference with the
correct level. In our implementation here, we used Huber’s M-estimates to perform the
regression, as implemented in S-Plus with Huber’s weight function. In Table 6, the 12
covariates marked with an asterisk are variables not derived from other variables, and the
adjustment used these 12 covariates. (Missing values of covariates were replaced by the
mean for the covariate.)
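Steps (i) and (ii) can be sketched as follows. This is a deliberately simplified stand-in, not the procedure used in the paper: it uses ordinary least squares on a single covariate instead of Huber's M-estimation on twelve covariates, and the function name and data are hypothetical. The residuals it returns are what step (iii) would then compare within matched sets.

```python
from statistics import mean


def adjusted_residuals(responses, treated, covariate, effect):
    """Residuals after removing a hypothesized effect and a linear covariate trend.

    (i)  subtract `effect` from the responses of treated subjects;
    (ii) regress the adjusted responses on one covariate by least squares
         (a simplification: the paper uses a robust fit on 12 covariates).
    """
    adjusted = [y - effect if is_treated else y
                for y, is_treated in zip(responses, treated)]
    x_bar, y_bar = mean(covariate), mean(adjusted)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, adjusted))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, adjusted)]


# Hypothetical data: response is exactly 2 + 3x, plus 1.5 for treated subjects.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t = [True, False, False, True, False, False]
y = [2 + 3 * xi + (1.5 if ti else 0.0) for xi, ti in zip(x, t)]
print(adjusted_residuals(y, t, x, effect=1.5))  # all close to zero
print(adjusted_residuals(y, t, x, effect=0.0))  # treated residuals stand out
```

When the hypothesized effect is the true one, nothing systematic separates treated from control residuals, which is what makes the subsequent rank test a test of that hypothesized value.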
For an additive effect on the level of violence at age 14, the covariance adjustment
yielded a two-sided significance level of 0.0026 for testing no effect and a 95% confidence
interval of [0.25, 1.02]; at age 15, the significance level for no effect was 0.014 with a 95%
confidence interval of [0.12, 1.14]; for the average level of violence for ages 14 to 17, the
significance level was 0.037 with a 95% confidence interval of [0.02, 0.76]. These results are
quite similar to the corresponding results in Table 7 without covariance adjustment. In
this one instance, the matching alone seems to have adequately controlled for the covariates
in the matched sets, with the covariance adjustment doing little more; however, this cannot
be counted upon in general.
5.4 Sensitivity to Bias from an Unobserved Covariate
The analysis in §4.2 suggests that the matching was quite successful in balancing the
measured covariates in Table 6, but there is the inevitable concern that some important
covariate may not have been measured. The analysis in Table 7 would be correct in a
randomized experiment in which one boy in each matched set were picked at random for
treatment, but this analysis would not be correct if an important unmeasured covariate
had not been controlled by matching. Here, we ask how such an unobserved covariate
might alter the analysis in Table 7. For a nontechnical overview of methods of sensitivity
analysis, see Rosenbaum (2005b). The method used here is described in detail in Gastwirth,
Krieger and Rosenbaum (2000), so only a brief description is given here. Suppose that
an unobserved binary covariate, u = 1 or u = 0, were associated with a $\Gamma \ge 1$ fold
increase in the odds of joining a gang at age 14. What is the largest possible one-sided
significance level for the aligned rank test allowing for the impact of failure to control for
u? Table 12 gives sharp upper bounds on the one-sided significance levels for no effect,
testing against increased violence among gang joiners, for several values of $\Gamma$. If $\Gamma = 1$,
then one obtains the randomization distribution and essentially the analysis in Table 7.
(For technical reasons, the sensitivity analysis is best viewed as one-sided. If one doubles
the values for $\Gamma = 1$ in Table 12, one obtains the corresponding two-sided significance levels
in Table 7.) The bounds in Table 12 are sharp in the sense that they are attained for a
certain unobserved covariate strongly related to violence at the given age. The increase
in violence at age 14 is insensitive to a 50% increase in the odds of joining a gang ($\Gamma = 1.5$)
associated with u = 1, as the maximum possible significance level of 0.036 is still less than
the conventional 0.05, but larger biases than this could explain the observed association.
The results at age 15 are more sensitive to unobserved bias than the results at age 14. The
ostensible effect of gang joining on violence at age 14 is not sensitive to small biases, but
it is far more sensitive to bias than, say, Hammond’s (1964) study of the effects of heavy
smoking on the risk of lung cancer, which becomes sensitive to bias at about $\Gamma = 6$; see
Rosenbaum (2002a, §4.3.2).
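Reading Table 12 amounts to finding, for each age, the largest tabulated value of the sensitivity parameter whose upper bound stays below 0.05. The short sketch below does only that lookup, using the bounds reported in Table 12; it does not compute the bounds themselves, and the dictionary layout and function name are hypothetical.

```python
# Upper bounds on the one-sided significance level, copied from Table 12,
# keyed by age and then by the value of the sensitivity parameter.
BOUNDS = {
    14: {1.0: 0.00084, 1.3: 0.012, 1.4: 0.021, 1.5: 0.036, 1.6: 0.056, 1.7: 0.082},
    15: {1.0: 0.0045, 1.3: 0.037, 1.4: 0.059, 1.5: 0.088, 1.6: 0.124, 1.7: 0.166},
}


def largest_insensitive_gamma(bounds, level=0.05):
    """Largest tabulated parameter value whose bound is still below `level`."""
    passing = [gamma for gamma, bound in bounds.items() if bound < level]
    return max(passing) if passing else None


for age, bounds in BOUNDS.items():
    print(age, largest_insensitive_gamma(bounds))

# Doubling the one-sided value at 1.0 recovers the two-sided level for age 14.
print(round(2 * BOUNDS[14][1.0], 4))
```

Among the tabulated values, the age 14 result holds up through 1.5 and the age 15 result only through 1.3, which is the sense in which the age 15 finding is more sensitive to unobserved bias.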
6 Concluding Remarks
We began this paper with the observation that a key aim of empirical research in developmental
psychopathology and life course studies is measuring the effects of therapeutic
interventions or important life events on behavioral trajectories. We also observed that
Table 12: Sensitivity to Unobserved Biases: Sharp Upper Bounds on the One-Sided Significance
Level for Testing No Effect on Level of Violence at Ages 14 and 15.
$\Gamma$    Age 14     Age 15
1.0    0.00084    0.0045
1.3    0.012      0.037
1.4    0.021      0.059
1.5    0.036      0.088
1.6    0.056      0.124
1.7    0.082      0.166
the use of experimental control to infer these effects was often impractical or unethical.
This paper presented an approach for inferring such effects from observational data that
attempts to recreate some key ingredients of a well-designed experiment.
The inference strategy was designed with three goals in mind. First, we wanted to
exploit the rich variety of measurements available in quality longitudinal studies. It is
particularly important to balance covariates strongly associated with both treatment status
and outcomes. These often include covariates that are the defining feature of longitudinal
data sets: prior values of the outcome variable and prior values of the treatment variable.
Second, we wanted to demonstrate a mode of analysis in which key results on pretreatment
balance and posttreatment outcomes can be communicated in a transparent fashion. Such
transparency is important to reporting statistical findings in a comprehensible fashion,
particularly to nontechnical audiences. Third, research on life course development can be
divided into two distinct literatures. One aims to document and understand individual
differences in developmental trajectories. The ultimate purpose of this literature is to develop
empirically verified theory of the predictors and consequences of alternative trajectories of
development. Research in this tradition relies primarily on prospective longitudinal studies
such as that used in this paper, and statistical inference is most commonly based on
regression-based statistical procedures. Another literature, which is more clinically or
policy oriented, aims to identify interventions or programs that can alter trajectories for the
better. For this type of research, experiments are the preferred statistical methodology.
Our third objective was to demonstrate a form of analysis based on group-based trajectory
modeling, propensity scores, and matching that can better unite these two strands of
research.
References
[1] Bergstralh, E. J., Kosanke, J. L., and Jacobsen, S. L. (1996), “Software for optimal
matching in observational studies,” Epidemiology, 7, 331-332.
[2] Bertsekas, D. P. (1981), “A new algorithm for the assignment problem,”
Mathematical Programming, 21, 152-171. Fortran code is available at:
http://www.mit.edu:8001//people/ dimitrib/home.html
[3] Cloward, R. A. and L. E. Ohlin. (1960). Delinquency and Opportunity: A Theory of
Delinquent Gangs. New York: Free Press.
[4] Cochran, W. G. (1965) The planning of observational studies of human populations
(with Discussion). Journal of the Royal Statistical Society, A128, 134-155.
[5] Cohen, A. K. (1955). Delinquent Boys: The Culture of the Gang. Glencoe, IL: Free
Press.
[6] Dehejia, R. H. and Wahba, S. (1999) Causal effects in nonexperimental studies:
Reevaluating the evaluation of training programs. Journal of the American Statistical
Association, 94, 1053-1062.
[7] Dorn, H.F. (1953). Philosophy of Inferences from Retrospective Studies. American
Journal of Public Health, 43:677-683.
[8] Elder, Jr., G. H. 1985. Perspectives on the Life Course. In: G.H. Elder, Jr., ed. Life
Course Dynamics. Ithaca: Cornell University Press.
[9] Elder, Jr. G. H. 1998. “The Life Course as Developmental Theory.” Child Development,
69: 1-12.
[10] Farrington, D. P. 1986. “Age and Crime.” In M. Tonry, and N. Morris, eds., Crime
and Justice: An Annual Review of Research. Vol. 7 Chicago: University of Chicago
Press.
[11] Fisher, R.A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
[12] Gastwirth, J. L., Krieger, A. M. & Rosenbaum, P. R. (2000). Asymptotic separability
in sensitivity analysis. J. R. Statist. Soc. B 62, 545-55.
[13] Greevy, R., Silber, J. H., Cnaan, A., and Rosenbaum, P. R. (2004), “Randomization
inference with imperfect compliance in the ACE-inhibitor after anthracycline randomized
trial,” Journal of the American Statistical Association, 99, 7-15.
[14] Gu, X. S. and Rosenbaum, P. R. (1993) Comparison of multivariate matching methods:
Structures, distances and algorithms. Journal of Computational and Graphical
Statistics, 2, 405-420.
[15] Halford, K. & Bouma, R. (1997), ‘Individual psychopathology and marital distress’,
in Halford, K. & Markman, H. (eds) Clinical Handbook of Marriage and Couples
Interventions, John Wiley and Sons, Chichester UK.
[16] Hammond, E. C. (1964). Smoking in relation to mortality and morbidity. Journal of
the National Cancer Institute 32, 1161–1188.
[17] Hansen, B. B. (2004), “Full matching in an observational study of coaching for the
SAT,” Journal of the American Statistical Association, 99, 609-618. R code is available
at: http://www.stat.lsa.umich.edu/~bbh/
[18] Haviland, A. and D.S. Nagin. 2005. “Causal Inference with Group-based Trajectory
Models.” Psychometrika, 70, 1-22.
[19] Hodges, J. L. and Lehmann, E. L. (1962), “Rank methods for combination of independent
experiments in the analysis of variance,” Annals of Mathematical Statistics,
33, 482-97.
[20] Hodges, J. L. and Lehmann, E. L. (1963), “Estimates of location based on ranks,”
Annals of Mathematical Statistics, 34, 598-611.
[21] Joffe, M. M. and Rosenbaum, P. R. (1999) Propensity scores. American Journal of
Epidemiology, 150, 327-333.
[22] Lacourse, E., D. Nagin, F. Vitaro, M. Claes, and R. E. Tremblay. 2003. “Developmental
Trajectories of Boys Delinquent Group Membership and Facilitation of Violent
Behaviors During Adolescence.” Development and Psychopathology, 15, 183-197.
[23] Lehmann, E. L. (1998), Nonparametrics, New Jersey: Prentice Hall.
[24] Marshall, A. W. and Olkin, I. (1979) Inequalities. New York: Academic.
[25] Meyer, B. D. (1995) Natural and quasi-experiments in economics. Journal of Business
and Economic Statistics, 13, 151-161.
[26] Ming, K. and Rosenbaum, P. R. (2000). Substantial gains in bias reduction from
matching with a variable number of controls. Biometrics 56, 118-124.
[27] Ming, K. and Rosenbaum, P. R. (2001), “A note on optimal matching with variable
controls using the assignment algorithm,” Journal of Computational and Graphical
Statistics, 10, 455-463.
[28] Muthén, B. O. 2001. “Second-Generation Structural Equation Modeling with a Combination
of Categorical and Continuous Latent Variables: New Opportunities for Latent
Class/Latent Curve Modeling.” In A. Sayers and L. Collins, eds., New Methods for
the Analysis of Change. Washington, D.C.: American Psychological Association.
[29] Nagin, D. S. 1999. "Analyzing Developmental Trajectories: A Semiparametric,
Group-based Approach." Psychological Methods, 4: 139-177.
[30] Nagin, D. S. (2005) Group-Based Modeling of Development. Cambridge, MA: Harvard
University Press.
[31] Nagin, D., D. Farrington, and T. Moffitt. 1995. “Life-Course Trajectories of Different
Types of Offenders.” Criminology, 33: 111-139.
[32] Nagin, D. S., & Land, K. C. (1993). Age, criminal careers, and population heterogeneity:
Specification and estimation of a nonparametric, mixed Poisson model. Criminology,
31(3), 327-362.
[33] Rosenbaum P. R. (1984) Conditional permutation tests and the propensity score in
observational studies. Journal of the American Statistical Association, 79, 565-574.
[34] Rosenbaum, P. R. (1987) Model-based direct adjustment. Journal of the American
Statistical Association, 82, 387-394.
[35] Rosenbaum, P.R. (1989), “Optimal matching in observational studies,” Journal of the
American Statistical Association, 84, 1024-32.
[36] Rosenbaum, P. R. (1991) A characterization of optimal designs for observational studies.
Journal of the Royal Statistical Society B53, 597-610.
[37] Rosenbaum, P. R. (2002a) Observational Studies (2nd edition). New York: Springer-Verlag.
[38] Rosenbaum, P.R. (2002b). Covariance adjustment in randomized experiments and
observational studies. Statistical Science 17, 286-327.
[39] Rosenbaum, P. R. (2005a) Observational study. In: Encyclopedia of Statistics in
Behavioral Science, 2005, eds., B. S. Everitt and D. C. Howell, New York: John Wiley
and Sons, pp. 1451-1462.
[40] Rosenbaum, P. R. (2005b) Sensitivity analysis in observational studies. In: Encyclopedia
of Statistics in Behavioral Science, 2005, eds., B. S. Everitt and D. C. Howell,
New York: John Wiley and Sons, pp. 1809-1814.
[41] Rosenbaum, P. and Rubin, D. (1983) The central role of the propensity score in
observational studies for causal effects. Biometrika 70, 41-55.
[42] Rosenbaum, P. & Rubin, D. (1984) Reducing bias in observational studies using
subclassification on the propensity score. Journal of the American Statistical Association,
79, 516-524.
[43] Rosenbaum, P. and Rubin, D. (1985a). Constructing a control group using multivariate
matched sampling methods that incorporate the propensity score. American
Statistician 39, 33–38.
[44] Rosenbaum, P. R. and Rubin, D. B. (1985b). The bias due to incomplete matching.
Biometrics 41, 103-116.
[45] Rosenbaum, P. R. and Silber, J. H. (2001) Matching and thick description in an
observational study of mortality after surgery. Biostatistics, 2, 217-232.
[46] Rubin, D. B. (1974) Estimating causal effects of treatments in randomized and
nonrandomized studies. Journal of Educational Psychology, 66, 688-701.
[47] Rubin, D. B. (1979) Using multivariate matched sampling and regression adjustment
to control bias in observational studies. Journal of the American Statistical Association,
74, 318–328.
[48] Rubin D. B. (1980). Bias reduction using Mahalanobis metric matching. Biometrics
36, 293-298.
[49] Schmaling, K. & Sher, T. (1997), ‘Physical health and relationships’, in Halford, K.
& Markman, H. (eds) Clinical Handbook of Marriage and Couples Interventions, pp.
323-338, John Wiley and Sons, Chichester UK.
[50] Shadish, W. R., Cook, T. D. & Campbell, D. T. (2002). Experimental and
Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton-Mifflin.
[51] Short, J. F., Jr. and F. L. Strodtbeck. (1965). Group Process and Gang Delinquency.
Chicago: University of Chicago Press.
[52] Smith, H. (1997), “Matching with multiple controls to estimate treatment effects in
observational studies,” Sociological Methodology 27, 325-353.
[53] Stuart, A. (1955) A paradox of statistical estimation. Biometrika, 42, 527-529.
[54] Thornberry, T., M. Krohn, A. Lizotte, C. Smith, and K. Tobin. 2003. Gangs and
Delinquency in Developmental Perspective. Cambridge, U.K.: Cambridge University
Press.
[55] Tremblay, R. E., Desmarais-Gervais, L., Gagnon, C., and Charlebois, P. (1987) The
preschool behavior questionnaire: stability of its factor structure between culture,
sexes, ages, and socioeconomic classes. International Journal of Behavioral Development,
10, 467-484.
[56] Warr, M. (2002). Companions in Crime: The Social Aspects of Criminal Conduct.
New York: Cambridge University Press.
[Figure 1 plot: x-axis Age (11.0 to 13.0); y-axis Lambda (0 to 6); curves for the Low, Medium, and Chronic groups.]
Figure 1: Expected Trajectories of Violent Delinquency in Groups Low (s = 1), Medium
(s = 2), and Chronic (s = 3).
[Figure 2 plot: three panels (Group 1, Group 2, Group 3), each with boxplots of Violence Score (0 to 15) at Age 10, Age 11, Age 12, and Age 13.]
Figure 2: Boxplots of Violence Scores for the Three Trajectory Groups, Ages 10 to 13,
When None of the Boys Were in Gangs. Low is trajectory group s = 1, Medium is group
s = 2, and Chronic is group s = 3.
[Figure 3 plot: twelve panels of boxplots comparing Joiners with Potential Controls. Panel titles: Violence Age 10, P=.0087; Violence Age 11, P=.0028; Violence Age 12, P=.044; Violence Age 13, P=.0032; Popularity, Age 11, P=.0015; Mother’s Age (Age at First Birth), P=.020; Peer Rated Aggression, Age 11, P=.0081; Teacher Rated Hyperactivity, Age 11, P=0.017; Sex Partners (Number of Partners), Age 13, P=.0024; Teacher Rated Opposition, Age 11, P=.034; IQ (IQ Test Measure), P=.89; Teacher Rated Aggression, Age 11, P=.27.]
Figure 3: Twelve Covariates Before Matching for Gang Joiners at Age 14 and for
Potential Controls Who Did Not Join at Age 14. The P-value is from Wilcoxon's
two-sided rank sum test.
[Figure 4 plot: three panels (Group 1, Group 2, Group 3), each with boxplots of Prob(Join) (0.0 to 1.0) for Joiners and All Others.]
Figure 4: Estimated Propensity Scores by Trajectory Group. Low is trajectory group s =
1, Medium is group s = 2, and Chronic is group s = 3.
[Figure 5 graphic: boxplots of Standardized Difference (vertical axis, 0.0 to 1.0) for four cases: Before Matching; After, Groups 1 & 2; After, Group 1; After, Group 2.]
Figure 5: Absolute Standardized Differences in Means for Gang Joiners at Age 14
Versus Controls, Before and After Matching, For 15 Covariates. Low is trajectory group
s = 1 and Medium is group s = 2.
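The quantity plotted in Figure 5 can be illustrated with a short Python sketch. It assumes one common convention, which the caption does not specify: the difference in means is divided by the square root of the average of the two group variances. The function name and the toy numbers are hypothetical, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def abs_std_diff(treated, controls):
    """Absolute difference in means in units of a pooled standard deviation.

    Assumption: the pooled SD is sqrt((var_treated + var_controls) / 2),
    using sample variances. Other conventions exist.
    """
    pooled_sd = sqrt((variance(treated) + variance(controls)) / 2)
    return abs(mean(treated) - mean(controls)) / pooled_sd


# Toy covariate values (hypothetical): means 6 and 3, both sample variances 4.
joiners = [4, 6, 8]
potential_controls = [1, 3, 5]
print(abs_std_diff(joiners, potential_controls))
```

A value near zero indicates that the covariate means of the two groups are close relative to the covariate's spread.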
[Figure 6 graphic: four panels of boxplots, each comparing Joiners, Matched, and Unmatched boys. Panel title and vertical axis label:
  Propensity Score (Propensity Score)
  Peer Rated Popularity (Popularity)
  Prob(Group 2 | 1 or 2) (Probability)
  Violence Age 11 (Violence)]
Figure 6: Four Covariates with Largest Initial Bias: (i) Group-Specific Propensity
Scores, (ii) Peer Rated Popularity, (iii) Conditional Probability of Trajectory Group
s = 2 Given Groups s = 1 or s = 2, and (iv) Violence at Age 11. For the 59 gang joiners
at age 14 and the 273 unmatched nonjoiners, these are conventional boxplots, whereas for
the 219 matched controls, the boxplot uses quartiles derived from the weighted empirical
distribution.
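As an illustration of quartiles from a weighted empirical distribution, here is a minimal Python sketch. It assumes, as a labeled assumption rather than something stated in the caption, that each control in a matched set with k controls receives weight 1/k, so that every matched set carries the same total weight. The function name and the toy scores are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative share of the total weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # Small tolerance guards against floating-point round-off.
        if cumulative >= q * total - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical matched sets: control scores for three joiners who were
# matched to 1, 2, and 4 controls respectively.
matched_sets = [[1], [2, 4], [3, 5, 7, 9]]

values, weights = [], []
for controls in matched_sets:
    for score in controls:
        values.append(score)
        weights.append(1 / len(controls))  # assumed weight: 1/k per control

quartiles = [weighted_quantile(values, weights, q) for q in (0.25, 0.5, 0.75)]
print(quartiles)  # [1, 2, 4]
```

With these weights, a joiner matched to seven controls counts no more toward the control boxplot than a joiner matched to one.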
[Figure 7 graphic: four panels, Age 14, Age 15, Age 16, and Age 17. Each panel shows boxplots of Violence (vertical axis) for Joiners and for Matched Controls.]
Figure 7: Violence Outcomes: Violence Scores Ages 14 to 17 for 59 Gang Joiners at Age
14 and 219 Matched Controls. The boxplot for the matched controls is based on quartiles
from the weighted empirical distribution.
[Figure 8 graphic: eight panels, Age 14, Age 15, Age 16, and Age 17 for Group 1, then the same four ages for Group 2. Each panel shows boxplots of Violence (vertical axis) for Joiners and for Matched Controls.]
Figure 8: Violence Outcomes by Trajectory Group, Ages 14 to 17. For matched
controls, boxplots use quartiles from the weighted empirical distribution. Low is
trajectory group s = 1 and Medium is group s = 2.
[Figure 9 graphic: twelve panels, Age 14, Age 15, Age 16, and Age 17 for each of alpha = 1, alpha = 1/2, and alpha = 0. Each panel shows boxplots of Violence (vertical axis) for Joiners and for Matched Controls.]
Figure 9: Residuals from three noncompliance models, α = 1, 0.5, 0. The 59 joiners at
age 14 are compared to their 219 matched controls. Boxplots for controls use quartiles
from the weighted empirical distribution.