
Gang Membership and Teen Violence:

An Observational Study
Amelia Haviland, Daniel S. Nagin, Paul R. Rosenbaum

RAND, Carnegie-Mellon University, University of Pennsylvania


Abstract
Using data from the Montréal Longitudinal-Experimental Study of Boys, the effects
on subsequent violence of joining a gang at age 14 are studied, controlling for
characteristics of boys prior to age 14. The boys are divided into trajectory groups
based on violence from ages 11 to 13, and within trajectory groups, joiners are
optimally matched to a variable number of controls using propensity scores, Mahalanobis
distances, and a combinatorial optimization algorithm. The trajectory groups define
meaningful subpopulations where effects may be different, while propensity scores and
optimal matching tend to balance twelve baseline covariates. By using between 1 and
7 controls for each joiner, greater efficiency is obtained than by pair matching, with
greater bias reduction than is available by matching in a fixed ratio. We develop
new efficiency bounds to guide decisions about the structure of the matching. Further
adjustments for covariates are made using permutational covariance adjustment.
The possible impact of failing to adjust for an important but unmeasured covariate is
examined using sensitivity analysis.
Key words: Covariance adjustment, instrumental variable, matching with variable
controls, mixture model, observational study, optimal matching, propensity score,
sensitivity analysis, trajectory group.

This work was supported by grants SES-0345113 and SES-99113700 from the Methodology,
Measurement and Statistics Program and the Statistics and Probability Program of the U.S.
National Science Foundation and the National Institute of Mental Health (RO1 MH65611-01A2).
It also made heavy use of data collected with the support from Québec's CQRS and FCAR
funding agencies, Canada's NHRDP and SSHRC funding agencies, and the Molson Foundation.
1 Introduction: Viewing Observational Studies of Development from the Perspective
of Controlled Experiments
A key aim of empirical research in developmental psychopathology and life course studies
is measuring the effect of therapeutic interventions, such as treatment for depression, or
life events, such as gang membership, on behavioral trajectories. Ideally, such effects would
be estimated with experimental data, but many interventions that affect development are
beyond experimental control, for ethical or practical reasons. In these situations inferences
must be drawn from observational data. Four decades ago Cochran (1965) reflected on the
design of studies which aim to draw causal inferences from observational data. He framed
his recommendations in the context of still earlier advice by Dorn (1953), who suggested
that the design of an observational study be organized around the question, "How should
the study be conducted if it were possible to do it by controlled experimentation?" Certain
issues are common to an experiment and an observational study, and these shared issues
are brought into focus by thinking about the simpler situation of an experiment. One then
tries to reconstruct, to the limited extent possible, the circumstances of the experiment
from the observational data. Finally, one tries to address the weaknesses that are present
in the observational study but which would have been avoided in an experiment. A
similar perspective is discussed in Meyer (1995), Shadish, Cook and Campbell (2002) and
Rosenbaum (2002a, 2005a).
A well designed and executed experiment has four key ingredients: (1) a clearly specified
treatment, (2) good baseline measurements describing the subjects of the experiment prior
to treatment, (3) a well defined and properly executed method of randomizing treatment
assignment, and (4) well defined outcomes measured after treatment.
Regarding the first ingredient, we examine the possible effects of joining a gang at age
14 on adolescent males who had not been in gangs prior to age 14. Our methods allow us
to examine whether gang membership has a different effect on groups of boys distinguished
by their histories of violence. Also investigated are whether first-time gang membership
has an immediate effect on violence at 14 and whether effects persist from age 15 to 17.
Our data show that for most youth, gang membership is not persistent. What then is the
effect of currently being in a gang, as distinct from joining a gang and perhaps departing?
It is these questions we address in the current manuscript.
Our analysis draws upon two lines of research that in our view are helpful in bringing
the second and third ingredients of an experiment to the analysis of non-experimental
longitudinal data. One line involves the use of finite mixture modeling to analyze
developmental trajectories in a group-based framework (cf. Nagin, 1999 & 2005; Nagin and
Land, 1993; Muthen, 2001) and the second involves the use of propensity scores, matching,
and stratification for causal inference in observational data (cf. Rosenbaum and Rubin,
1983; Rosenbaum, 2002a). The trajectory groups are based on pretreatment measures of the
variable which, after treatment, is the outcome variable. Thus, fitting a group-based
trajectory model to pretreatment, baseline data permits the comparison of treated and
control subjects who appeared similar, in terms of developmental trajectory, prior to
treatment. As such, the trajectory groups serve as the baseline measure of response.
Fitting propensity scores using observed pretreatment measures or covariates permits the
comparison of treated and control groups that are balanced in terms of these observed
covariates. Thus, the propensity score serves to stochastically balance observed covariates
as random assignment of treatments would do in an experiment. Unlike randomization in an
experiment, neither propensity scores nor trajectory modeling can control for covariates
that were not measured; we examine this inescapable concern with the aid of a sensitivity
analysis.
1.1 Data and Case Study: Joining a Gang in Montreal
We begin with a brief discussion of the data and case study that will be used to
demonstrate the proposed framework. As already noted, the case study examines the effect of
gang membership on violent delinquency. We chose this example for the case study for two
reasons. First, it is a substantively interesting and important problem. There is a long and
rich tradition of gang research in criminology because group influence, particularly among
adolescents, is thought to play a key role in criminal and delinquent behavior (Short and
Strodtbeck 1965; Thornberry et al., 2003; Warr, 2002; Cohen, 1955; Cloward and Ohlin,
1960). Second, it is an archetypal example of a large class of important inference problems
in psychology and the social sciences more generally. Criminologists have long understood
that estimates of the facilitation effect of gang membership may be contaminated
by selection effects whereby the most delinquency prone individuals choose to join gangs
(Lacourse et al., 2003; Thornberry et al., 2003). Similarly, selection effects of this sort are
a generic threat to validity of causal inferences about the effects of events or experiences
on developmental trajectories because the likelihood of their occurrence is commonly tied
to the development of the very outcome under investigation. For example, just as gang
membership is more likely among the already violent, divorce is more likely among those
experiencing depression, which greatly complicates the task of inferring the effect of divorce
on depression (Halford & Bouma, 1997; Schmalling & Scher, 1997).
The data used in the case study are the product of the Montréal Longitudinal Study of
Boys. The 1037 male subjects in this study were in kindergarten at its outset in the spring
of 1984. They were next assessed in 1988 and then again annually until 1995 when their
average age was 17. The sample was drawn from 53 schools in the lowest socioeconomic
areas in Montréal, Canada. To control for cultural effects, boys were included in the
longitudinal study only if both their biological parents were born in Canada and their
biological parents' mother tongue was French. This resulted in a homogeneous white,
French-speaking sample. Wide-ranging measurements of potentially important covariates
such as social and psychological function were made based on assessments by parents,
teachers, peers, self-reports of the boy himself, and administrative records from schools and
the juvenile court. These measurements include data on the boys' behavior across many
domains (e.g., sexual activity and delinquency in adolescence), and social functioning (e.g.,
peer popularity). See Tremblay et al. (1987) for further details on this study.
The self-reported data on annual involvement in violent delinquency and participation
in delinquent groups, which we hereafter refer to as gangs, form the core of the analyses
we report in this paper. Queries on prior year involvement in violent delinquency and
gangs were initiated in 1989 when the boys were age 11. Subjects were asked about the
frequency of their involvement in seven different types of violent delinquency within the
past year: threatening to attack someone, fist fighting, attacking an innocent person, gang
fighting, throwing objects at people, carrying weapons, and using weapons. These items
were each coded on a 4-point Likert-scale (0 = never; 1 = once or twice; 2 = sometimes; 3
= often) and summed to form an overall scale of violent delinquency. This scale was used
to estimate the trajectory model over the period when the boys were 11 to 13 years old. In
estimating the treatment effect of gang membership at ages 14 to 17, the item pertaining
to gang fighting was excluded. Gang membership status in the prior year was based on the
subject's response to the question: "During the past 12 months, were you part of a group
or a gang that committed reprehensible acts?" [1]
[1] The original version of the question as administered in French was: "Au cours des 12
derniers mois, as-tu fais partie d'un groupe des jeunes (gang) qui fait des mauvais coups?"
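The construction of the violence scale described above can be sketched in a few lines. This is only an illustration: the item keys, the function name `violence_scale`, and the example responses are hypothetical and are not the study's variable names or data. The sketch sums the seven 0 to 3 item codes for the baseline scale and drops the gang fighting item for the outcome scale used at ages 14 to 17.

```python
# Item keys paraphrase the seven behaviours listed in the text; they are
# illustrative labels, not the study's actual variable names.
ITEMS = [
    "threatening_to_attack",
    "fist_fighting",
    "attacking_innocent_person",
    "gang_fighting",
    "throwing_objects",
    "carrying_weapons",
    "using_weapons",
]


def violence_scale(responses, exclude=()):
    """Sum the 0-3 item codes, optionally leaving some items out."""
    total = 0
    for item in ITEMS:
        if item in exclude:
            continue
        code = responses[item]
        if code not in (0, 1, 2, 3):
            raise ValueError(f"{item}: expected a code from 0 to 3, got {code!r}")
        total += code
    return total


# One hypothetical boy's answers for a single year.
example = {
    "threatening_to_attack": 1,
    "fist_fighting": 2,
    "attacking_innocent_person": 0,
    "gang_fighting": 1,
    "throwing_objects": 0,
    "carrying_weapons": 0,
    "using_weapons": 0,
}

# Baseline scale (ages 11-13): all seven items.
baseline_score = violence_scale(example)
# Outcome scale (ages 14-17): the gang fighting item is excluded.
outcome_score = violence_scale(example, exclude={"gang_fighting"})
```

With seven items the baseline scale ranges from 0 to 21, and the outcome scale, with six items, from 0 to 18.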
1.2 Haviland and Nagin (2005): Modelling Pretreatment Developmental Trajectories
Our point of departure is recent work by Haviland and Nagin (2005) that tackles the same
type of inference problem that we address in this paper: events or experiences that might
alter an individual's developmental trajectory. As in this paper, group-based trajectory
modeling plays a central role in Haviland and Nagin. The method is designed to identify
groups of individuals following approximately the same developmental trajectory over a
specified period of time (e.g., age 11 to 13) for the outcome of interest (e.g., violent
delinquency). Stated informally, prior to joining a gang, individuals in the same trajectory
group appeared to be headed along the same path, at least so far as violence is concerned.
The use of trajectory groups as a basis for inference leads to the estimation of
trajectory group-specific treatment effects. This is scientifically important because a key
premise of life course theories of development is that the magnitude, including the sign, of
treatment effects may depend upon a person's developmental trajectory (Elder, 1985 & 1998;
Thornberry et al., 2003). Thus, the trajectory group framework allows for examination of
whether there are differences in treatment effects across substantively interesting groups
that are differentiated by their developmental history. The capability to estimate
trajectory group-specific treatment effects also allows researchers to examine the association
of the treatment effect estimates with characteristics of trajectory group members. This
capability is significant because it provides an empirical basis for better understanding how
treatment effects vary across variables that distinguish developmental history.
The trajectory groups can also be thought of as latent strata measuring the history
of the outcome variable. In the spirit of research using propensity scores and matching,
Haviland and Nagin use the trajectory groups and the attendant posterior probabilities of
group membership for each individual as a statistical device for creating balance on a key
covariate that may be confounding the treatment effect estimate, namely prior violence,
which notably consists of prior values of the post-treatment outcome measure. Specifically,
in the context of the illustrative application it is vital to account for the empirical reality
that gang membership is more likely among individuals with a history of violence. By
comparing gang joiners and nonjoiners in the same trajectory groups, one is comparing
individuals whose trend in violence, before joining a gang, was similar.
Haviland and Nagin demonstrated their approach with the same example that this
paper begins with, an analysis of the effect of first-time gang membership at age 14 on
violent delinquency at 14. In the current work, we substantially extend the prior analysis
to include inferences not only about the effect of first-time gang membership at age 14 on
violence at 14 but also on the effects at later ages. Also, we combine trajectory groups
with propensity scores to ensure balance on covariates other than prior violence.
As discussed in detail in Haviland and Nagin, application of group-based trajectory
modeling to the subset of individuals in the Montréal study with no self-reported history
of gang involvement from age 11 to 13 yielded the three-group model depicted in Figure
1. The x-axis of Figure 1 measures age and the y-axis measures the value of the Poisson
rate parameter, $\lambda_t$, which for each group measures the expected rate of violent
offending at age $t$. The two largest groups, called the Lows and the Mediums, were,
respectively, estimated to compose 46% and 48% of the sampled population. The third group,
called the Chronics, had a far higher rate of violent delinquency and was also far smaller.
It was estimated to compose 6% of the sampled population.
The tendency of more violent boys to join gangs is quite pronounced
in our data. The percent of boys who became first-time gang members at age 14 is smallest
for the low violence group (7%), next largest for the middle group (15%) and highest for
the chronic group (31%). Thus, a simple comparison of the age 14 violent delinquency
of the first-time gang joiners with their peers who did not join gangs would hopelessly
confound the facilitation effect of gang membership with the already higher violence rate
of first-time gang members.
More formally, the problem of confounding can be characterized in terms of covariate
imbalance. A covariate is a variable measured prior to treatment and hence unaffected
by a subsequent decision to apply or withhold the treatment. A covariate is said to be
out of balance if its distribution differs between the treated and untreated. For instance,
Haviland and Nagin found that gang joiners at age 14 had been more violent before age
14 and also less popular than boys who did not join gangs at age 14; prior violence and
peer rated popularity are two of the many covariates that were found to be out of balance.
The concern, of course, is that a difference in outcomes after treatment, say a difference in
level of violence at age 14 or 15, may not be an effect caused by joining a gang, but may
instead reflect the differences between joiners and nonjoiners that already existed prior to
age 14.
The essence of the Haviland and Nagin approach to creating balance is to perform
analyses within trajectory group. They find that unlike contrasts of gang members and
nongang members that are not conditioned on their trajectory group membership, contrasts
which condition on trajectory group membership had very good success in achieving balance
between the first-time gang members and their counterparts who remain outside of gangs
on a wide range of potential confounders including most especially violence levels prior to
age 14.
The approach to achieving balance laid out in Haviland and Nagin is empirically based
and is not guaranteed to balance covariates, particularly covariates that are very different
from the outcome, such as popularity. Due to the statistical construction of group-based
trajectory models it is likely that the approach will have good success in achieving balance
within trajectory group on lagged outcomes, which, as argued above, we believe to be very
important. Still, no theorem assures that balance will be achieved even in expectation.
Further, as an empirical matter, the likelihood of achieving balance on other potential
confounders is lower than for lagged outcomes. By contrast, methods for achieving balance
based on propensity scores and matching are specifically designed to achieve this aim. For
this reason, integration of the inference strategy laid out in Haviland and Nagin with other
formal statistical approaches for achieving balance is desirable.
In the integrated framework described in this paper, trajectory groups summarize baseline
measures of the outcome and propensity scores provide a useful but limited surrogate
for random assignment of treatments. The integration is composed of a three-stage analysis.
The first stage involves estimating a group-based trajectory model for the outcome and
subjects of interest. In the context of our demonstration analysis of the violence facilitation
effect of gang membership, this step, which has already been described, involves estimation
of a trajectory model of violent delinquency from age 11 to 13 for individuals with no gang
involvement over this period. In the second stage each treated individual is matched with
one or more untreated individuals. The matching of gang joiners with non-gang joiners,
carried out within trajectory group, attempts to find non-joiners who are close on an
estimate of the propensity score, and on the individual variables that enter the propensity
score. We then check the degree of success of the matching strategy in achieving balance
between the first-time gang members (the treated) and their matched counterparts who
did not join gangs (the controls). In the third stage of the analysis the treatment effect of
the event of interest, gang membership in our case, is analyzed. Specifically we examine
the effect of first-time gang membership at age 14 on violence at age 14 and beyond, within
and across trajectory group.
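To make the second stage concrete, here is a deliberately simplified sketch. It uses greedy one-to-one nearest-neighbour matching on the propensity score within trajectory group, which is a swapped-in simplification and not the optimal variable-ratio matching with Mahalanobis distances used in this paper. The function name `greedy_match`, the caliper of 0.1, and all identifiers and scores are hypothetical.

```python
def greedy_match(treated, controls, caliper=0.1):
    """Pair each treated unit with the nearest unused control in its group.

    treated, controls: lists of (unit_id, trajectory_group, propensity_score).
    Returns a list of (treated_id, control_id) pairs. Treated units with no
    control inside the caliper are left unmatched.
    """
    available = list(controls)
    pairs = []
    # Treat the highest scores first, since they usually have fewest candidates.
    for t_id, t_group, t_score in sorted(treated, key=lambda u: -u[2]):
        candidates = [
            c for c in available
            if c[1] == t_group and abs(c[2] - t_score) <= caliper
        ]
        if not candidates:
            continue  # no acceptable control in this trajectory group
        best = min(candidates, key=lambda c: abs(c[2] - t_score))
        available.remove(best)  # each control is used at most once
        pairs.append((t_id, best[0]))
    return pairs


# Hypothetical joiners and potential controls.
treated = [("j1", "Low", 0.20), ("j2", "Medium", 0.40), ("j3", "Chronic", 0.90)]
controls = [
    ("c1", "Low", 0.18),
    ("c2", "Low", 0.30),
    ("c3", "Medium", 0.35),
    ("c4", "Chronic", 0.40),
]
pairs = greedy_match(treated, controls)
unmatched = [t[0] for t in treated if t[0] not in {p[0] for p in pairs}]
```

In this toy input the joiner in the chronic group has no control within the caliper and stays unmatched. A greedy rule can also make poor early choices that an optimal algorithm would avoid, which is one reason to prefer optimal matching in practice.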
1.3 Overview of Using Propensity Scores to Balance Observed Covariates in
Observational Studies
In the simplest randomized experiment, subjects are assigned to treatment or control by
the independent tosses of a fair coin, so that every individual has the same chance, namely
1/2, of receiving treatment rather than control. Randomization then warrants or becomes
the reasoned basis for inferences about the effects caused by the treatment; see Fisher
(1935) and Rubin (1974). In contrast, the defining feature of an observational study is
that randomization is not used to assign treatments, so that some individuals are more
likely to receive the treatment than others. For instance, the boys who joined gangs at
age 14 did not do so at random, with equal probabilities; in fact, the boys who joined
gangs at age 14 tended to be quite different from those who did not, even several years
prior to age 14. In the absence of random assignment of treatments, there is nothing
to ensure that treated and control subjects were comparable prior to treatment, with
the consequence that differing outcomes after treatment may not be effects caused by
the treatment, but instead may simply reflect pretreatment differences. The problems
of interpretation created by the absence of random assignment are of two kinds: (i) the
probability of receiving treatment may vary with observed pretreatment characteristics or
covariates, so that the data at hand indicate treated and control subjects were not comparable
prior to the start of treatment, and (ii) the probability of receiving treatment may also vary
with unobserved pretreatment covariates, so that adjustments for observed covariates may
be insufficient to make the treated and control groups comparable. As examples of (i), the
observed covariates clearly show that gang joiners were more violent and less popular than
nonjoiners prior to age 14. It is not possible to exhibit a difference of type (ii), because the
relevant covariates were not observed, but even with the twelve observed covariates we
consider in Table 6, it would hardly be surprising if some important pretreatment difference
was not adequately measured.
The propensity score is a device for removing imbalances in observed pretreatment
covariates, whether or not there are also imbalances in unobserved covariates, that is, a
device for addressing problem (i). Through matching or stratification on the propensity
score, one compares treated and control groups that appear comparable prior to treatment
in terms of observed covariates. A nontechnical survey of methods and results about
propensity scores is given by Joffe and Rosenbaum (1999), and for several case-studies, see
Rosenbaum and Rubin (1984, 1985a), Smith (1997) and Dehejia and Wahba (1999).
The propensity score is the conditional probability of receiving the treatment rather
than the control given the observed covariates (Rosenbaum and Rubin, 1983). In the
simplest randomized experiment mentioned above, the propensity score would be 1/2 for
every subject. In the current context, the propensity score is the conditional probability
of joining a gang at age 14, given the observed covariates, namely violence prior to age 14,
peer rated popularity, mother's age at the birth of her first child, and so on. If two boys have
the same propensity score given observed covariates, say a .2 chance of joining a gang at
14, then these observed covariates will be of no further use in predicting which of these two
boys will join a gang at 14, so for these two boys, there will be no systematic tendency for
the observed covariates to be different for the joiner and the nonjoiner.
The propensity score is estimated from the data, perhaps using a logit model. Boys
with similar estimated propensity scores are compared, say by matching; then the joiners
and nonjoiners will have similar distributions of the observed covariates, such as past
violence, popularity, mother's age, and so on. Obviously, unlike randomization which does
not use the covariates in balancing them, the propensity score can only be expected to
balance the observed covariates used to construct the score. Unobserved covariates need
to be addressed by other methods, such as sensitivity analyses, which we illustrate in §5.4.
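As a rough illustration of estimating a propensity score with a logit model, the sketch below fits a logistic regression by plain gradient ascent using only the standard library. Everything in it is an assumption made for illustration: the data are synthetic, there is a single made-up covariate standing in for prior violence, and the function names, step size, and number of iterations are arbitrary choices rather than anything used in this paper.

```python
import math
import random


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logit(X, z, steps=2000, rate=1.0):
    """Fit Pr(z = 1 | x) = sigmoid(b0 + b . x) by gradient ascent on the
    mean log-likelihood. Returns [b0, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = zi - sigmoid(eta)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Estimated probability of treatment for covariate vector xi."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


# Synthetic data: one standardized covariate; higher values raise the
# chance of "joining". The true coefficients (-2, 1) are made up.
random.seed(0)
X = [[random.gauss(0.0, 1.0)] for _ in range(300)]
z = [1 if random.random() < sigmoid(-2.0 + 1.0 * x[0]) else 0 for x in X]

beta_hat = fit_logit(X, z)
scores = [propensity(beta_hat, xi) for xi in X]
```

The resulting `scores` are what one would match or stratify on. In real work a statistics package would normally be used in place of this hand-written optimizer.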
There are two key theorems concerning propensity scores. Informally, they say: (1)
matching or stratifying on the propensity score tends to balance the observed covariates
used to construct the score, and (2) if there is no bias from unobserved covariates, that is if
problem (ii) above does not arise, then to adjust for the many observed covariates it suffices
to adjust for the unidimensional propensity score. In the current context, if one matches
gang joiners and nonjoiners on their probabilities of joining a gang given the twelve observed
covariates in Table 6 (that is, given violence before age 14, peer rated popularity, and so
on), then these twelve covariates will tend to be balanced in treated and control groups.
Also, if it were sufficient to adjust for these twelve observed covariates, if there were no
bias from covariates that were not measured, then one could estimate the effects of joining
a gang by adjusting for the 1-dimensional propensity score rather than the 12-dimensional
observed covariates. Formally, the two theorems in Rosenbaum and Rubin (1983) say: (1)
treatment assignment and the observed covariates are conditionally independent given the
propensity score, and (2) if treatment assignment is strongly ignorable given the observed
covariates then it is strongly ignorable given the propensity score alone.
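The first theorem can be checked exactly on a toy population. The sketch below is an illustration under made-up assumptions: a single covariate with four hypothetical values, invented population shares, and invented treatment probabilities, with values A and B sharing one propensity score and C and D sharing another. It computes the covariate distribution among treated and control units, first overall and then within each propensity-score stratum.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical population: covariate value -> (population share, Pr(treated | x)).
population = {
    "A": (F(1, 10), F(1, 5)),
    "B": (F(3, 10), F(1, 5)),
    "C": (F(2, 5), F(1, 2)),
    "D": (F(1, 5), F(1, 2)),
}


def covariate_distribution(cells, treated):
    """Exact distribution of x among treated (or control) units in `cells`."""
    mass = {
        x: share * (e if treated else 1 - e)
        for x, (share, e) in cells.items()
    }
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items()}


# Without conditioning on the score, treated and controls differ on x.
overall_treated = covariate_distribution(population, treated=True)
overall_control = covariate_distribution(population, treated=False)

# Group covariate values by their propensity score and compare again.
strata = defaultdict(dict)
for x, (share, e) in population.items():
    strata[e][x] = (share, e)

balanced_within_strata = all(
    covariate_distribution(cells, True) == covariate_distribution(cells, False)
    for cells in strata.values()
)
```

Here value A makes up 1/19 of the treated but 4/31 of the controls overall, while inside each stratum the treated and control distributions coincide exactly. The example says nothing about unobserved covariates, which is the point of problem (ii).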
Intuitively, it seems reasonable to expect that the true value of a parameter would
perform better than an estimate of the parameter, and while that intuition is sometimes
true, it is not always true; see, for instance, Stuart (1955). In fact, when matching or
stratifying on propensity scores, estimated scores typically perform somewhat better than
theory says true propensity scores would perform; see, for instance, Rosenbaum and Rubin
(1984) where 74 observed covariates are better balanced by propensity scores than would
have been expected in a randomized experiment. The estimated propensity scores cannot
distinguish systematic imbalances from imbalances produced by chance, and the estimated
scores tend to remove some of the chance imbalances, whereas the true propensity score
removes only the systematic imbalance. Some theory related to estimated propensity
scores is discussed in Rosenbaum (1984, 1987).
1.4 Overview of Group-based Trajectory Modeling
This section provides a brief overview of group-based trajectory modeling. For more elab-
oration, see Nagin (1999; 2005) or related work by Muthen (2001).
The group-based trajectory model is an application of finite mixture modeling in which
the population is assumed to be composed of a finite number $K$ of trajectory groups denoted
by the index $k$, $k = 1, 2, \ldots, K$. The three key outputs of group-based trajectory
modeling are: (1) the developmental trajectory of each group $k$, (2) the population base
rate of each group, and (3) for each individual $i$ the posterior probability of their
membership in each group $k$ given his or her actual measurements on the outcome of interest.
Let $Y_{it}$ denote the outcome variable of interest for individual $i$ at time $t$, and let
$\mathbf{Y}_{it}$ denote a vector of such outcomes from time 1 through $t$. In the current
application, $Y_{it}$ is the frequency of violent delinquency of individual $i$ in year $t$.
The trajectory model describes the conditional distribution, $P(\mathbf{Y}_{it} \mid J_i = k)$,
of $\mathbf{Y}_{it}$ given that individual $i$ is in group $k$, denoted $J_i = k$, and the
frequency $\pi_k = \Pr(J_i = k)$ of group $k$. In its most general form, the group-based
trajectory model follows a finite mixture model,
$$\Pr(\mathbf{Y}_{it}) = \sum_{k=1}^{K} \pi_k \, P(\mathbf{Y}_{it} \mid J_i = k),$$
with the responses of distinct individuals $i$ being mutually independent.
In the current paper, we consider a specific form of this model in which
$P(\mathbf{Y}_{it} \mid J_i = k)$ consists of independent Poisson variables whose parameters
vary with time $t$ and trajectory group $k$, that is,
$$P(\mathbf{Y}_{it} \mid J_i = k) = \prod_{t} \frac{\exp(-\lambda_{kt}) \, \lambda_{kt}^{Y_{it}}}{Y_{it}!},$$
so that the coordinates of $\mathbf{Y}_{it}$ are dependent for each individual $i$, but the
dependence would vanish within individual $i$'s latent trajectory group, $J_i = k$. The
logarithms, $\log(\lambda_{kt})$, of the expected frequencies, $\lambda_{kt}$, are modeled as a
polynomial in time $t$ with coefficients that depend upon the trajectory group $k$, that is,
$\log(\lambda_{kt}) = \theta_{0k} + \theta_{1k} t + \theta_{2k} t^2 + \ldots + \theta_{pk} t^p$.
The parameters, $\boldsymbol{\theta}$, of this function are permitted to vary freely across
groups $k$. This specification accords with much criminological research demonstrating that
rates of offending vary with age and also that there is considerable heterogeneity in the
trajectories of offending across individuals (Farrington, 1986; Nagin, Farrington, and
Moffitt, 1995).
The observed delinquency histories, $(Y_{i,11}, Y_{i,12}, Y_{i,13})$, from age 11 to 13 of the
boys with no self-reported gang membership over this period form the basis for estimating the
trajectory model used to create the baseline measurements on delinquent history. Their
likelihood is
$$L(\mathbf{Y}_{1t}, \ldots, \mathbf{Y}_{Nt}; \boldsymbol{\pi}, \boldsymbol{\theta}) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \prod_{t=11}^{13} \frac{\exp(-\lambda_{kt}) \, \lambda_{kt}^{Y_{it}}}{Y_{it}!}.$$
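The likelihood above can be evaluated directly. The following is a minimal sketch, assuming the log-linear form of the rate in age and working on the log scale for numerical stability. The group shares are the estimates reported in the text (46%, 48%, 6%), but the coefficient values and the three example histories are hypothetical, as are the function names.

```python
import math


def poisson_logpmf(y, lam):
    """log of exp(-lam) * lam**y / y!"""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def rate(theta0, theta1, age):
    """Log-linear trajectory: log(lambda_kt) = theta0 + theta1 * age."""
    return math.exp(theta0 + theta1 * age)


def log_likelihood(histories, pi, theta, ages=(11, 12, 13)):
    """Sum over boys of log( sum_k pi_k * prod_t Poisson(y_it; lambda_kt) )."""
    total = 0.0
    for y in histories:
        per_group = []
        for pi_k, (t0, t1) in zip(pi, theta):
            logp = sum(
                poisson_logpmf(y_t, rate(t0, t1, a)) for y_t, a in zip(y, ages)
            )
            per_group.append(math.log(pi_k) + logp)
        m = max(per_group)  # log-sum-exp over groups
        total += m + math.log(sum(math.exp(v - m) for v in per_group))
    return total


pi = (0.46, 0.48, 0.06)                              # shares reported in the text
theta = ((-4.0, 0.25), (-2.0, 0.20), (0.5, 0.10))    # hypothetical coefficients
histories = [(0, 0, 1), (2, 1, 3), (6, 8, 7)]        # hypothetical counts, ages 11-13
ll = log_likelihood(histories, pi, theta)
```

Maximizing this function over the shares and coefficients, for example with a general-purpose optimizer or an EM algorithm, would give the estimates; that step is not shown here.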
Maximization of this likelihood produces estimates of the parameters of the polynomial
functions that define the shape of each group's trajectory of violent delinquency as
measured by $\lambda_{kt}$. In this application, for ages 11, 12, and 13, we use a log-linear
function of age, $\log(\lambda_{kt}) = \theta_{0k} + \theta_{1k} t$. The final key product of
group-based trajectory modeling, the posterior probability of membership in the $k$th
trajectory group, $\Pr(J_i = k \mid \mathbf{Y}_{it})$, is calculated post-estimation for each
individual $i$ in the estimation sample. It measures the probability of each such individual's
membership in group $k$ given their history of violent delinquency from age 11 to 13.
To permit matching of joiners and controls within trajectory groups, each boy was
assigned to the trajectory group for which he had the highest estimated probability of
membership, although we use the estimates of $\Pr(J_i = k \mid \mathbf{Y}_{it})$ as well. On
average this highest estimated probability of membership was .85 or greater for each of the
three groups depicted in Figure 1. Figure 2 depicts the violence scores by trajectory group.
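The posterior probabilities and the assignment rule just described follow from Bayes' rule applied to the Poisson mixture. The sketch below is illustrative only: the group shares are those reported in the text, but the Poisson rates are hypothetical constants chosen for the example, not the fitted trajectories, and the function names are made up.

```python
import math


def posterior_probabilities(y, pi, lam):
    """Pr(group k | history y) for one boy, by Bayes' rule.

    y:   violence counts at ages 11, 12, 13
    pi:  group shares pi_k
    lam: lam[k][t], Poisson rate of group k at each of the three ages
    """
    log_joint = []
    for pi_k, lam_k in zip(pi, lam):
        logp = sum(
            -l + y_t * math.log(l) - math.lgamma(y_t + 1)
            for y_t, l in zip(y, lam_k)
        )
        log_joint.append(math.log(pi_k) + logp)
    m = max(log_joint)  # subtract the maximum before exponentiating
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def assign_group(y, pi, lam, names):
    """Attach a boy to the group with the highest posterior probability."""
    post = posterior_probabilities(y, pi, lam)
    return names[max(range(len(names)), key=lambda k: post[k])]


groups = ("Low", "Medium", "Chronic")
pi = (0.46, 0.48, 0.06)          # shares reported in the text
lam = (                          # hypothetical rates, NOT the fitted values
    (0.3, 0.3, 0.3),
    (1.5, 1.5, 1.5),
    (5.0, 5.0, 5.0),
)

post = posterior_probabilities((0, 0, 0), pi, lam)
quiet_boy = assign_group((0, 0, 0), pi, lam, groups)
violent_boy = assign_group((6, 5, 7), pi, lam, groups)
```

Keeping `post` alongside the assigned label matters because the posterior probabilities themselves are also used, not only the modal group.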
2 Trajectory Groups and Propensity Scores: The Boys Before Age 14
2.1 Violence Trajectory and Covariates Before Age 14
A total of 580 individuals in the Montreal Study reported no involvement with gangs from
age 11 through 13 and also had no more than one missing assessment of their violent
delinquency and gang involvement over this period. [2] The self-reported violent delinquency
of these individuals was used to estimate the trajectories reported in Figure 1. Of these
individuals, 68 joined gangs for the first time at age 14. The first-time gang joiners and
their counterparts who did not join gangs were distributed across the trajectory groups at
the frequencies in Table 1.
Table 1: Frequencies of Trajectory Groups Based on Violence Ages 11-13 and Gang Joining
at Age 14.
Status at Age 14    Low    Medium    Chronic
Gang Joiner          21        38          9
Not a Joiner        276       216         20
Total               297       254         29
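As a quick arithmetic check, the counts in Table 1 reproduce the first-time joining rates quoted in Section 1.2 (7%, 15%, and 31%) as well as the totals of 580 boys and 68 joiners. The variable names below are illustrative.

```python
# Counts copied from Table 1.
joiners = {"Low": 21, "Medium": 38, "Chronic": 9}
non_joiners = {"Low": 276, "Medium": 216, "Chronic": 20}

rates = {}
for group in joiners:
    total = joiners[group] + non_joiners[group]
    rates[group] = joiners[group] / total
    print(f"{group:8s} n={total:3d}  joined at 14: {100 * rates[group]:.0f}%")

n_joiners = sum(joiners.values())
n_boys = n_joiners + sum(non_joiners.values())
```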
Our aim was to match, within trajectory group, each first-time gang joiner with one
or more of his counterparts who did not join at 14 but who had similar covariates prior to
age 14. These covariates include variables that are known to be correlated with violence
and include: violence scores at ages 10, 11, 12, and 13, peer rated popularity at age 11,
the age of the boy's mother at the birth of her first child, peer rated aggression at age
11, teacher rated hyperactivity at age 11, self-reported number of sexual partners at age
13, teacher rated opposition at age 11, a rough IQ measure, and teacher rated physical
aggression at age 11. [3] Figure 3 depicts the covariates prior to matching, together with the
two-sided significance level from Wilcoxon's rank sum test. If this had been a randomized
experiment in which the boys in the Montreal study had been selected at random to join
a gang at age 14, one covariate in twenty would be expected to yield a significance level
of 0.05 or less, whereas in Figure 3, ten of the twelve covariates have significance levels
less than 0.05. Before joining gangs, joiners were more violent than nonjoiners, were less
popular with their peers, were more aggressive, hyperactive, and oppositional, had more
sexual partners, and had mothers whose age at the birth of their first child was younger.
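The comparison with a randomized experiment can be put in rough numbers. The sketch below computes how many of twelve covariates would be expected to reach the 0.05 level by chance, and a binomial tail probability for seeing ten or more. The tail calculation assumes the twelve tests are independent, which is a simplification introduced here and not a claim of the paper, since the covariates are correlated; treat it as a yardstick only.

```python
from math import comb

n_covariates = 12   # covariates shown in Figure 3
alpha = 0.05        # significance threshold
observed = 10       # covariates with significance level below 0.05

expected_by_chance = n_covariates * alpha

# Binomial tail Pr(X >= observed) under the simplifying assumption of
# independent tests; correlated covariates would change this number.
tail = sum(
    comb(n_covariates, x) * alpha**x * (1 - alpha) ** (n_covariates - x)
    for x in range(observed, n_covariates + 1)
)

print(f"expected by chance: {expected_by_chance:.1f} of {n_covariates}")
print(f"Pr(at least {observed} significant) = {tail:.2e}")
```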
[2] 282 boys were involved in gangs prior to age 14 and 128 had more than one missing
assessment over this age range. An additional 59 boys were either missing their gang
membership status at age 14 or their assessment of violent delinquency.
[3] Violence at age 10 was not used in estimating the trajectory model because it measured
frequency at age 10 and all prior years, whereas the later year violence measurements are
only for that year.
2.2 The Propensity to Join a Gang at Age 14
As a first step in matching the gang joiners with comparable non-joiners we estimated a
propensity score using the original 12 covariates in Figure 3 plus some others derived from
these covariates. The complete list of covariates is reported in Table 6, which will be
described in greater detail in §4.2. The propensity score, which measures the conditional
probability of joining a gang at age 14 given the covariates, was estimated using a single
logit model.
Figure 4 depicts the estimated propensity scores for joiners and potential controls in
each of the three trajectory groups. In the low and medium trajectory groups, there is a
substantial difference between joiners and potential controls, but there is also a fair amount
of overlap in the distributions, so credible matches would seem to be available. By contrast,
in the chronic group, the distributions exhibit limited overlap. Summary statistics on the
distributions for the chronics are reported in Table 2. The median propensity score among
the chronics who join gangs was above the maximum among the nonjoiners, and the lower
quartile among the joiners was above the upper quartile among the nonjoiners. Indeed,
the median propensity score for the 9 joiners in the chronic group is just slightly above
the maximum propensity score for all the other 571 boys in the sample. Whether or
not they joined a gang, the boys in the chronic group tended to have very high estimated
probabilities of gang membership compared to the other two groups (even the median
for the 20 nonjoiners in the chronic group is above the median for the joiners in the low
and medium groups), and half of the joiners in the chronic group are quite different from
everyone else.
These results indicate that the propensity for gang joining in the chronic group is
materially higher than in the low and medium groups, particularly for those individuals
who actually joined gangs. More importantly, the results suggest that it would be difficult
to find good matches for the gang joiners in the chronic group. This indeed turns out to
be the case. All attempts to find good matches for the gang joiners in the chronic group
failed dismally. One could try to skirt this problem by running a regression to adjust for
the covariate difference between joiners and nonjoiners in the chronic group and hoping for
the best. However, for this group the regression would largely consist of an extrapolation
rather than interpolation between joiners and nonjoiners whose covariate distributions
exhibit limited overlap. (It is remarkable how rare it is for an investigator to check whether
a covariance adjustment model is an interpolation of substantially overlapping covariate
distributions or an extrapolation of largely nonoverlapping covariate distributions.)

Table 2: Distribution of the Propensity Score in the Chronic Trajectory Group: Estimated
Probabilities of Joining a Gang at Age 14.

Chronic Group                 Min    Quartile 1   Median   Quartile 3   Max
Joiners (n = 9)               0.21   0.40         0.69     0.84         0.98
Potential Controls (n = 20)   0.00   0.10         0.18     0.36         0.53

We concluded that these data do not permit credible estimation of the effect of gang
membership for the boys in the chronic trajectory. Half of these gang joiners were unlike
anyone else in the study prior to age 14, and so there is no evidence in the data about
what would have happened to boys like this if they had not joined gangs. This was
disappointing because, even though this group consists of only a few boys, in many ways
the chronic group is quite interesting from a policy and scientific standpoint. Thus, we
made estimates of the violence facilitation effect of gang membership only for the numerous
but less violent boys in the low and medium groups, which hereafter are also referenced by
the indices $s = 1$ and $s = 2$, respectively.
3 Planning the Matching
3.1 Conceptual Issues about Efficiency
The primary role of matching in observational studies is to remove systematic biases due
to imbalances in observed covariates (Cochran 1965, p. 237), that is, biases which do not
diminish in magnitude as the sample size increases (technically, biases of order $O(1)$ in
estimated treatment effects), because these dominate the mean squared error as the sample
size increases. Nonetheless, the standard error of the estimator is also of concern, even
though it does diminish with increasing sample size (technically, it is of order
$O(1/\sqrt{n_+})$), and so the standard error's relative contribution to the mean squared
error is negligible in large samples when systematic biases are present. In the limit, in
very large samples, there is really no trade-off of systematic bias (of order $O(1)$) and
standard error (of order $O(1/\sqrt{n_+})$); rather, in the limit, bias is all-important.
In practice, of course, the sample size may be large but it is finite, so one does not want
to be wasteful of statistical efficiency. In the current section, we focus on the question
of how the number of controls matched to each joiner affects the standard error. In
subsequent sections, we focus on the more important issue of controlling bias from
covariates.

Table 3: Elementary Efficiency Comparisons With Various Numbers of Controls Matched
to Each Treated Subject.

Controls $m_{si}$                     1      2      3      4      5      10     20     50     $\infty$
Variance multiplier $1 + 1/m_{si}$    2.00   1.50   1.33   1.25   1.20   1.10   1.05   1.02   1.00

Efficiency considerations are clarified with the aid of a simple model similar to that
used in the analysis of variance or the paired t-test; see, for instance, Rosenbaum and
Rubin (1985a) and Smith (1997). The $i$th gang joiner at 14 in trajectory group $s$ is
matched to $m_{si} \geq 1$ controls. Under the simple model, the response has an additive
effect for each matched set, a constant treatment effect, and independent errors with
constant variance $\sigma^2$; we use this model only for efficiency calculations in
Section 3, not for inference in later sections. If one subtracts from the response of the
$i$th joiner in trajectory group $s$ the average response of his $m_{si}$ controls, then
this difference has variance $\sigma^2 (1 + 1/m_{si})$. We examine various technical
consequences in Section 3.3, but a key point can be discussed without additional technical
detail. Table 3 shows how the variance multiplier, $1 + 1/m_{si}$, changes as the number of
controls $m_{si}$ changes. Notice, first, that there is an asymptote: for matched pairs,
$m_{si} = 1$, the multiplier is 2, but if each treated subject were matched to infinitely
many controls, the multiplier would drop to 1. In fact, using two controls rather than one,
$m_{si} = 2$, yields a multiplier of 1.5, half-way to $m_{si} = \infty$. The distance is
halved again, to 1.25, by using $m_{si} = 4$ rather than $m_{si} = 2$ controls. By
contrast, the gain from using $m_{si} = 10$ controls rather than $m_{si} = 5$ controls is
much smaller. The key point here is that if the sample size permits the use of more than
one control per treated subject, then there are substantial gains to be had using
$m_{si} = 2$ controls, and meaningful gains from $m_{si} = 4$, but for much larger values
of $m_{si}$ the gains are no longer large. Moreover, as theory suggests and as Smith (1997)
shows in a case study, it becomes harder and harder to find good matches as $m_{si}$
increases, so a large value of $m_{si}$ may yield a biased comparison with a negligible
gain in efficiency. See also the case study by Dehejia and Wahba (1999) where model-based
methods unaided by matching are distorted by the use of controls quite unlike the treated
subjects.
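
The diminishing returns described above can be checked numerically. The following is a minimal sketch, assuming only the simple model of this section; the function name `variance_multiplier` and the helper `fraction_of_possible_gain` are illustrative choices, not part of the study's software.

```python
import math

def variance_multiplier(m):
    """Variance multiplier 1 + 1/m for a joiner matched to m controls.

    m may be math.inf, which gives the asymptote of 1.
    """
    return 1.0 + 1.0 / m

def fraction_of_possible_gain(m):
    """Share of the distance from pair matching (multiplier 2) to the
    asymptote (multiplier 1) that is recovered by using m controls."""
    return (2.0 - variance_multiplier(m)) / (2.0 - 1.0)

# Numbers of controls considered in Table 3, including the limiting case.
controls = [1, 2, 3, 4, 5, 10, 20, 50, math.inf]
table3 = {m: round(variance_multiplier(m), 2) for m in controls}

for m in controls:
    print(m, table3[m], round(fraction_of_possible_gain(m), 2))
```

Two controls recover half of the possible gain and four controls recover three quarters, which is the halving pattern noted in the text.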
The discussion so far has focused on a single matched pair, but one faces the additional
choice of allowing the individual $m_{si}$ to vary while keeping their total
$\sum_i m_{si}$ fixed. Again, we develop this in some detail in Section 3.3, but the basic
issue is clear with just two pairs, say $i$ and $j$. Suppose joiner $i$ is matched to
$m_{si}$ controls and joiner $j$ is matched to $m_{sj}$ controls, and their two
treated-minus-control differences are averaged. That average of two differences within two
pairs has variance $\frac{1}{4}\left\{\left(1 + \frac{1}{m_{si}}\right) + \left(1 + \frac{1}{m_{sj}}\right)\right\}$
in units of $\sigma^2$. Table 4 shows the efficiency in two pairs, $i$ and $j$, with a
total of $m_{si} + m_{sj} = 7$ controls shared between them. These variances are fairly
stable, especially if $\min(m_{si}, m_{sj}) \geq 2$. The worst variance, namely 2, in
Table 3 is 100% bigger than the best variance, namely 1, in Table 3, but the worst
variance, namely 0.79, in Table 4 is 23% bigger than the best variance, namely 0.65, in
Table 4. If $\min(m_{si}, m_{sj}) \geq 2$, then the worst in Table 4 is less than 5%
higher than the best. In short, although the most efficient allocation has $m_{si}$
constant, the same for all $i$, the efficiency changes only slowly as the $m_{si}$ are
allowed to vary with their total $\sum_i m_{si}$ fixed. Moreover, Ming and Rosenbaum (2000)
show that one can achieve much greater bias reduction by allowing the $m_{si}$ to vary. It
is easy to see why this is so in the current context. The most violent boys are the ones
most likely to join a gang. If a nonviolent boy, say $i$, joins a gang, then there will be
an abundance of similar nonviolent controls available to match to $i$, so $m_{si}$ should
be set somewhat higher. On the other hand, if an extremely violent boy, $j$, joins a gang,
there will be comparatively few similar controls available to match to $j$, so $m_{sj}$
should be set somewhat lower. Notice that, in Table 4, setting $m_{si} = 5$ and
$m_{sj} = 2$ would be nearly as efficient as setting $m_{si} = 4$ and $m_{sj} = 3$, but the
former would produce better matches given the types of boys who join gangs.

Table 4: Efficiency in Two Pairs With a Total of 7 Controls.

$m_{si}$                                                   1      2      3      4      5      6
$m_{sj}$                                                   6      5      4      3      2      1
$m_{si} + m_{sj}$                                          7      7      7      7      7      7
$\frac{1}{4}\{(1 + 1/m_{si}) + (1 + 1/m_{sj})\}$           0.79   0.68   0.65   0.65   0.68   0.79

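
The two-pair comparison can be reproduced directly. This is a small sketch under the simple model above, with variances expressed in units of $\sigma^2$; the name `two_pair_variance` is a hypothetical helper introduced only for illustration.

```python
def two_pair_variance(m_i, m_j):
    """Variance (in units of sigma^2) of the average of two
    treated-minus-control differences, one with m_i controls and
    one with m_j controls."""
    return 0.25 * ((1 + 1 / m_i) + (1 + 1 / m_j))

TOTAL_CONTROLS = 7

# Every way of sharing 7 controls between two joiners, at least one each.
splits = [(m_i, TOTAL_CONTROLS - m_i) for m_i in range(1, TOTAL_CONTROLS)]
variances = {split: two_pair_variance(*split) for split in splits}

best = min(variances.values())
worst = max(variances.values())
# Worst case when each joiner is required to have at least 2 controls.
worst_with_floor = max(v for (a, b), v in variances.items() if min(a, b) >= 2)

for split in splits:
    print(split, format(variances[split], ".3f"))
print("worst / best:", format(worst / best, ".3f"))
print("worst / best with at least 2 controls each:",
      format(worst_with_floor / best, ".3f"))
```

The ratios computed from unrounded variances are close to, but not identical with, ratios computed from the two-decimal entries of Table 4.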
From Table 1, there are $276/21 = 13.1$ nonjoiners for each joiner in group 1 and
$216/38 = 5.7$ nonjoiners for each joiner in group 2. After some theoretical calculations
and some preliminary efforts at matching, we decided to match each joiner in group 1
to between 2 and 7 controls with an average of 5 controls, and each joiner in group 2 to
between 1 and 6 controls with an average of 3 controls. The remainder of the current
section describes the theoretical calculations of efficiency bounds that guided this
decision. A reader uninterested in these efficiency calculations could skip to Section 4,
where the matching is performed and evaluated.
3.2 Matching with Variable Numbers of Controls: A Flexible Strategy
There are $S$ strata, with $n_s$ treated subjects, $i = 1, \ldots, n_s$, in stratum $s$,
$n_+ = \sum_{s=1}^{S} n_s$ treated subjects in total, and $M_s$ potential controls in
stratum $s$, with $M_s \geq n_s$, $s = 1, \ldots, S$. In Table 1, setting aside trajectory
group 3 for the reasons discussed in Section 2.2, $S = 2$, $n_1 = 21$, $n_2 = 38$,
$n_+ = 59$, $M_1 = 276$, and $M_2 = 216$. The $i$th treated subject in stratum $s$ will be
matched to $m_{si} \geq 1$ controls from stratum $s$, with no control matched to more than
one treated subject. In this study, the number of controls was vastly greater than the
number of treated subjects; therefore, we do not consider the alternative of full matching
in which a treated subject may be matched to several controls or a control may be matched
to several treated subjects (see Rosenbaum 1991, Gu and Rosenbaum 1993, Hansen 2004).
Every one of the $n_+ = 59$ joiners in groups 1 and 2 will be matched (Rosenbaum and
Rubin 1985b). Write $\mathbf{m} = (m_{11}, m_{12}, \ldots, m_{S,n_S})^T$, and write
$m_{s+} = \sum_{i=1}^{n_s} m_{si}$, so that $m_{s+} \leq M_s$ with equality if every
control is matched.
In addition to strata, each subject has a vector of observed, pretreatment covariates;
see Table 6 below. Between each treated subject and each control in the same stratum,
there is a distance, such as the Mahalanobis distance (Rubin 1980), measuring how similar
these two subjects are with respect to the covariates. Each joiner in stratum $s$ will be
matched to at least $\alpha_s \geq 1$ controls and at most $\beta_s \geq \alpha_s$
controls, with $m_{s+}$ controls used in total in stratum $s$. Write
$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_S)^T$,
$\boldsymbol{\beta} = (\beta_1, \ldots, \beta_S)^T$, and
$\mathbf{m}_+ = (m_{1+}, \ldots, m_{S+})^T$. Clearly, one must choose $m_{s+}$ so that
$n_s \leq m_{s+} \leq M_s$ and one must choose $(\alpha_s, \beta_s)$ to satisfy
$n_s \alpha_s \leq m_{s+} \leq n_s \beta_s$; if this is true for $s = 1, \ldots, S$, we
say that the choice $(\boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{m}_+)$ is possible.
For instance, in stratum $s = 1$, the number of matched controls, $m_{1+}$, must satisfy
$n_1 = 21 \leq m_{1+} \leq M_1 = 276$ and $21\,\alpha_1 \leq m_{1+} \leq 21\,\beta_1$, so
$m_{1+} = 105 = 5 \times 21$, $\alpha_1 = 2$, $\beta_1 = 7$, is one of many possibilities.
Write $\mu_s = \lfloor m_{s+}/n_s \rfloor$, where $\lfloor x \rfloor$ is the greatest
integer less than or equal to $x$ and $\lceil x \rceil$ is the least integer greater than
or equal to $x$; for instance, with $m_{1+} = 105$,
$\mu_1 = \lfloor m_{1+}/n_1 \rfloor = \lfloor 105/21 \rfloor = 5$. In general, the closest
matches, in terms of distance, are obtained by taking $\alpha_s = \beta_s = 1$,
$m_{s+} = n_s$, and $\mu_s = 1$, whereas the smallest standard errors are obtained by
taking $m_{s+} = M_s$, $\alpha_s = \mu_s = \lfloor m_{s+}/n_s \rfloor$,
$\beta_s = \lceil m_{s+}/n_s \rceil$, so these two considerations pull in different
directions. In stratum $s = 1$, these extremes have $\alpha_s = \beta_s = 1$ and
$m_{s+} = n_s = 21$, or $m_{s+} = M_s = 276$,
$\alpha_s = \mu_s = \lfloor m_{s+}/n_s \rfloor = \lfloor 276/21 \rfloor = 13$,
$\beta_s = \lceil 276/21 \rceil = 14$. In a careful case study, Smith (1997) showed that
insisting upon matched pairs, $m_{s+} = n_s$, can lead to a substantial loss in
efficiency, together with the discarding of perfectly acceptable controls, while insisting
upon $m_{s+} = M_s$ provides negligible gains in efficiency when $M_s/n_s$ is large,
together with substantially inferior matched controls. Ming and Rosenbaum (2000) show that
substantial gains in bias reduction are possible by allowing the set sizes to vary, that
is, by setting $\alpha_s < \mu_s$ and $\beta_s > \mu_s + 1$, often with little loss of
efficiency. Our strategy is to choose $m_{s+}$, $\alpha_s$, $\beta_s$ in Section 3.3 so
that, even in the worst case, the loss of efficiency is not large; then, in Section 4, to
minimize total covariate distance subject to the chosen $m_{s+}$, $\alpha_s$, $\beta_s$.
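
The conditions for a possible choice of the total number of matched controls and of the minimum and maximum set sizes are simple integer inequalities, so they are easy to check before attempting a match. The sketch below is illustrative only; `is_possible` and its argument names are hypothetical, and the numeric inputs are the stratum sizes quoted in the text.

```python
def is_possible(n_treated, n_potential, n_matched, min_size, max_size):
    """Check one stratum: n_treated joiners, n_potential potential controls,
    n_matched controls to be used in total, and each joiner receiving
    between min_size and max_size controls."""
    return (
        1 <= min_size <= max_size
        and n_treated <= n_matched <= n_potential
        and n_treated * min_size <= n_matched <= n_treated * max_size
    )

def floor_and_ceiling_ratio(n_matched, n_treated):
    """Floor and ceiling of n_matched / n_treated using integer arithmetic."""
    floor_value = n_matched // n_treated
    ceiling_value = -(-n_matched // n_treated)
    return floor_value, ceiling_value

# Low group: 21 joiners, 276 potential controls, 105 matched, sizes 2 to 7.
low_ok = is_possible(21, 276, 105, 2, 7)
# Medium group: 38 joiners, 216 potential controls, 114 matched, sizes 1 to 6.
medium_ok = is_possible(38, 216, 114, 1, 6)

print(low_ok, medium_ok)
print(floor_and_ceiling_ratio(105, 21))  # average set size in the low group
print(floor_and_ceiling_ratio(276, 21))  # extremes when all controls are used
```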
3.3 Sizes of Matched Sets: Worst Case Efficiency Bounds
For approximate efficiency calculations in this section only, suppose responses of distinct
subjects are mutually independent, every response has the same variance, $\sigma^2$, and
the covariates are entirely useless, that is, independent of treatment and response, so
matching on covariates neither removes bias nor reduces variance. This is, in a sense, the
worst situation for matching, because (i) imbalances in covariates do not create bias, so
that the trade-off of bias and variance entirely focuses on variance, and (ii) matching
does nothing, in this case, to reduce the variance. Consider the difference between the
response of the $i$th treated subject and the average of this subject's $m_{si}$ matched
controls. The average of the $n_+$ differences has variance

$$v(\mathbf{m}) = \frac{\sigma^2}{n_+^2} \sum_{s,i} \left(1 + \frac{1}{m_{si}}\right).$$

Obviously, there is little to be gained in $v(\mathbf{m})$ by increasing $m_{si}$ beyond a
certain point, because $1 + 1/m_{si}$ tends to 1, not zero, as $m_{si} \to \infty$. In
other words, even with infinitely many controls matched to each treated subject, the
standard error $\sqrt{v(\mathbf{m})}$ cannot fall below $\sigma/\sqrt{n_+}$ because there
are only $n_+$ treated subjects. If $m_{si}$ were constant, the same for all $s, i$, then
$\sqrt{v(\mathbf{m})}$ is more than 40% above its asymptote for $m_{si} = 1$, but less than
10% above the asymptote for $m_{si} = 5$ and less than 5% above the asymptote for
$m_{si} = 10$. Moreover, the asymptote, obtained by letting $m_{si} \to \infty$, is not a
real option in any actual problem because $m_{s+} \leq M_s$ and the $M_s$ are finite; see
Table 1 where, for instance in stratum $s = 2$, the largest possible average value of
$m_{si}$ is only $M_2/n_2 = 216/38 = 5.7$. At the same time, it is much harder to find
good matches on covariates as $m_{si}$ increases (Smith 1997, Ming and Rosenbaum 2000).
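
The percentages quoted above follow from the variance formula with a constant number of controls per joiner. The following sketch, which assumes the worst-case model of this section, evaluates the variance of the average difference and the ratio of the standard error to its asymptote; the function names are illustrative.

```python
import math

def variance_of_average_difference(set_sizes, sigma2=1.0):
    """Variance of the average of the treated-minus-control differences,
    given one matched-set size per joiner (all strata pooled)."""
    n_plus = len(set_sizes)
    return sigma2 / n_plus**2 * sum(1 + 1 / m for m in set_sizes)

def se_relative_to_asymptote(set_sizes):
    """Standard error divided by sigma / sqrt(n_plus), its lower limit."""
    n_plus = len(set_sizes)
    asymptote = math.sqrt(1.0 / n_plus)
    return math.sqrt(variance_of_average_difference(set_sizes)) / asymptote

N_PLUS = 59  # joiners in the low and medium groups

for m in (1, 5, 10):
    ratio = se_relative_to_asymptote([m] * N_PLUS)
    print(m, format(100 * (ratio - 1), ".1f") + "% above the asymptote")
```

With a constant set size $m$ the ratio reduces to $\sqrt{1 + 1/m}$, so it does not depend on the number of joiners.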
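Write $\mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ for the set of
possible values for the vector $\mathbf{m}$ of matched set sizes; that is,
$\mathbf{m} \in \mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ if and
only if $\mathbf{m}$ is an $n_+$ dimensional vector with positive integer coordinates
$m_{si}$ such that $\alpha_s \leq m_{si} \leq \beta_s$, $i = 1, \ldots, n_s$, and
$m_{s+} = \sum_{i=1}^{n_s} m_{si}$, for $s = 1, \ldots, S$. Two elements of
$\mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ are of particular
interest, specifically $\underline{\mathbf{m}}$ and $\overline{\mathbf{m}}$, which will now
be defined. Informally, $\underline{\mathbf{m}}$ is as nearly constant as possible, while
$\overline{\mathbf{m}}$ is as dispersed as possible. Define $\underline{\mathbf{m}}$ by the
rule: (i) if $m_{s+}/n_s$ is an integer, then $\underline{m}_{si} = m_{s+}/n_s$ for
$i = 1, \ldots, n_s$; otherwise, (ii) if $m_{s+}/n_s$ is not an integer, then
$\underline{m}_{si} = \mu_s + 1$ for $i = 1, \ldots, m_{s+} - \mu_s n_s$ and
$\underline{m}_{si} = \mu_s$ for $i = (m_{s+} - \mu_s n_s) + 1, \ldots, n_s$. With
$\alpha_s < \beta_s$, define $\overline{\mathbf{m}}$ as follows:

$$\overline{m}_{si} = \begin{cases}
\beta_s & \text{for } i = 1, \ldots, h_s, \\
m_{s+} - (\beta_s - \alpha_s)\, h_s - \alpha_s (n_s - 1) & \text{for } i = h_s + 1, \\
\alpha_s & \text{for } i = h_s + 2, \ldots, n_s,
\end{cases}$$

where

$$h_s = \left\lfloor \frac{m_{s+} - n_s \alpha_s}{\beta_s - \alpha_s} \right\rfloor.$$

If $\alpha_s = \beta_s = m_{s+}/n_s$, then take $\overline{m}_{si} = \alpha_s$. For any
possible choice of $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\mathbf{m}_+$,
Proposition 1 determines the minimum and maximum standard error, $\sqrt{v(\mathbf{m})}$.
Using Proposition 1, one can select $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$,
$\mathbf{m}_+$ so the worst possible loss in efficiency is controlled.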
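Proposition 1 For possible $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\mathbf{m}_+$,

$$v(\underline{\mathbf{m}}) = \min_{\mathbf{m} \in \mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}} v(\mathbf{m})
\quad \text{and} \quad
v(\overline{\mathbf{m}}) = \max_{\mathbf{m} \in \mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}} v(\mathbf{m}).$$

Proof. Notice that the contribution to $v(\mathbf{m})$ from each stratum $s$ is a symmetric
and convex function of $(m_{s1}, \ldots, m_{s,n_s})$, that is, a Schur convex function
(Marshall and Olkin 1979), so $v(\mathbf{m})$ itself is the sum of $S$ Schur convex
functions. (The function $v(\mathbf{m})$ itself would be Schur convex on a symmetric
domain, but $\mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ is only
symmetric within strata.) The result then follows immediately from properties of Schur
convex functions, specifically Proposition 4.C.1 of Marshall and Olkin (1979, p. 132),
which is originally due to J. H. B. Kemperman.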
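
Proposition 1 can be checked by brute force in a small stratum. The sketch below builds the most nearly constant and the most dispersed allocations from the rules stated before the proposition, then compares them with an exhaustive search over every allowed allocation. All function names are illustrative, and the small example (4 joiners, 9 controls, set sizes from 1 to 4) is hypothetical rather than taken from the study.

```python
from itertools import product

def allocation_cost(sizes):
    """Sum of (1 + 1/m) over matched sets; the variance of the average
    difference is this sum times sigma^2 / n_plus^2."""
    return sum(1 + 1 / m for m in sizes)

def least_dispersed(n_treated, n_matched):
    """Set sizes as nearly constant as possible for one stratum."""
    base, extra = divmod(n_matched, n_treated)
    return [base + 1] * extra + [base] * (n_treated - extra)

def most_dispersed(n_treated, n_matched, min_size, max_size):
    """Set sizes as dispersed as possible for one stratum."""
    if min_size == max_size:
        return [min_size] * n_treated
    h = (n_matched - n_treated * min_size) // (max_size - min_size)
    if h >= n_treated:  # every joiner receives the maximum
        return [max_size] * n_treated
    middle = n_matched - (max_size - min_size) * h - min_size * (n_treated - 1)
    return [max_size] * h + [middle] + [min_size] * (n_treated - h - 1)

def brute_force_extremes(n_treated, n_matched, min_size, max_size):
    """Smallest and largest cost over every allowed allocation."""
    costs = [
        allocation_cost(sizes)
        for sizes in product(range(min_size, max_size + 1), repeat=n_treated)
        if sum(sizes) == n_matched
    ]
    return min(costs), max(costs)

smallest, largest = brute_force_extremes(4, 9, 1, 4)
print(least_dispersed(4, 9), most_dispersed(4, 9, 1, 4))
print(smallest, largest)
```

Exhaustive search is feasible only for tiny strata; the closed-form rules are what make the bounds usable with 21 or 38 joiners.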
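We now define two quantities, where
$\lambda_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ compares the best and worst
standard errors with a possible $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$,
$\mathbf{m}_+$, and where $\omega_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$
compares use of all controls to the worst standard errors with a possible
$\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\mathbf{m}_+$. The smallest possible value of
$v(\mathbf{m})$ occurs if all controls are used, $m_{s+} = M_s$ for $s = 1, \ldots, S$, and
the $m_{si}$ are as uniform as possible. Define
$\widetilde{\mu}_s = \lfloor M_s/n_s \rfloor$ and $\widetilde{\mathbf{m}}$ by the rule:
(i) if $M_s/n_s$ is an integer, then $\widetilde{m}_{si} = M_s/n_s$ for
$i = 1, \ldots, n_s$; otherwise, (ii) if $M_s/n_s$ is not an integer, then
$\widetilde{m}_{si} = \widetilde{\mu}_s + 1$ for
$i = 1, \ldots, M_s - \widetilde{\mu}_s n_s$ and $\widetilde{m}_{si} = \widetilde{\mu}_s$
for $i = (M_s - \widetilde{\mu}_s n_s) + 1, \ldots, n_s$. Define

$$\lambda_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+} = \sqrt{\frac{v(\underline{\mathbf{m}})}{v(\overline{\mathbf{m}})}}
\quad \text{and} \quad
\omega_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+} = \sqrt{\frac{v(\widetilde{\mathbf{m}})}{v(\overline{\mathbf{m}})}},$$

so the standard error of the average difference with the best choice of match
frequencies, $\underline{\mathbf{m}}$, is
$\lambda_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ times the standard error
with the worst, $\overline{\mathbf{m}}$, and similarly the standard error with match
frequencies $\widetilde{\mathbf{m}}$ is
$\omega_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ times the standard error
with the worst, $\overline{\mathbf{m}}$. In typical problems, the length of a confidence
interval for a mean would be proportional to the standard error, so
$\lambda_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ and
$\omega_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ describe the effect on the
length of a confidence interval.

Table 5: Efficiency Calculations for Matching with Variable Controls. Above the double
line, calculations describe planning before matching. Below the double line, calculations
describe the actual match produced by minimum distance matching.

Stratum                                                      s = 1 (Low)                           s = 2 (Medium)                                              All
Treated, $n_s$                                               21                                    38                                                          59
Potential Controls, $M_s$                                    276                                   216                                                         492
Matched Controls, $m_{s+}$                                   105                                   114                                                         219
Minimum $\alpha_s$                                           2                                     1                                                           1 or 2
Maximum $\beta_s$                                            7                                     6                                                           7
$\lambda$ (best versus worst)                                97%                                   90%                                                         92%
$\omega$ (all controls versus worst)                         92%                                   84%                                                         87%
$\overline{\mathbf{m}}$ (most dispersed)                     $7^{12}, 5^{1}, 2^{8}$                $6^{15}, 2^{1}, 1^{22}$                                     219
$\underline{\mathbf{m}}$ (least dispersed)                   $5^{21}$                              $3^{38}$                                                    219
$\widetilde{\mathbf{m}}$ (all controls)                      $14^{3}, 13^{18}$                     $6^{26}, 5^{12}$                                            492
==================================================================================================================================================================
Actual Match Frequencies                                     $7^{10}, 6^{1}, 5^{3}, 2^{7}$         $6^{9}, 5^{2}, 4^{3}, 3^{4}, 2^{6}, 1^{14}$                 219
Actual $\mathbf{m}$ versus $\widetilde{\mathbf{m}}$          92%                                   87%                                                         89%
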
Table 5 shows some efficiency calculations for our proposed design. In stratum $s = 1$,
with $n_1 = 21$, $M_1 = 276$, $m_{1+} = 105$, $\alpha_1 = 2$, $\beta_1 = 7$, the least
dispersed match frequencies $\underline{\mathbf{m}}$ for the $n_1 = 21$ joiners has
$(\underline{m}_{11}, \ldots, \underline{m}_{1,21})$ equal to $(5, 5, \ldots, 5)^T$, which
we write $5^{21}$, and the most dispersed $\overline{\mathbf{m}}$ has
$(\overline{m}_{11}, \ldots, \overline{m}_{1,21})$ equal to
$(7, 7, \ldots, 7, 5, 2, \ldots, 2)$, which we write $7^{12}, 5^{1}, 2^{8}$, noting that
$m_{1+} = 105 = (12 \times 7) + 5 + (8 \times 2)$. Similarly, using all controls matched
as uniformly as possible has $(\widetilde{m}_{11}, \ldots, \widetilde{m}_{1,21})$ equal to
$(14, 14, 14, 13, \ldots, 13)$ or $14^{3}, 13^{18}$. In stratum $s = 1$, the most
efficient allocation $\underline{\mathbf{m}}$ with $m_{1+} = 105$ controls yields a
standard error that is 97% of the standard error for the least efficient allocation
$\overline{\mathbf{m}}$ with $m_{1+} = 105$ controls and $\alpha_1 = 2$, $\beta_1 = 7$.
Also, in stratum $s = 1$, the most efficient allocation with all controls,
$\widetilde{\mathbf{m}}$ with $m_{1+} = 276$, has a standard error that is 92% of the
standard error of the least efficient allocation $\overline{\mathbf{m}}$ with
$m_{1+} = 105$ controls and $\alpha_1 = 2$, $\beta_1 = 7$. That is, the $\alpha_1 = 2$,
$\beta_1 = 7$, $m_{1+} = 105$ configuration is much more flexible, so it can remove more
bias, but it is never much less efficient than the $\alpha_1 = 13$, $\beta_1 = 14$,
$m_{1+} = 276$ design. Even in the worst case, even with irrelevant covariates and no bias
to be removed, the allocations in Table 5 are never very inefficient.
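
The percentages in Table 5 can be recomputed from the match frequencies alone, because the factor $\sigma^2/n_+^2$ cancels in every ratio of variances. The sketch below is a check of that arithmetic under the worst-case model of Section 3.3; the dictionary layout and the function names are illustrative, and the frequencies are copied from Table 5.

```python
import math

# Match frequencies from Table 5, written as {set size: number of joiners}.
PLANS = {
    "low": {
        "best": {5: 21},
        "worst": {7: 12, 5: 1, 2: 8},
        "all": {14: 3, 13: 18},
        "actual": {7: 10, 6: 1, 5: 3, 2: 7},
    },
    "medium": {
        "best": {3: 38},
        "worst": {6: 15, 2: 1, 1: 22},
        "all": {6: 26, 5: 12},
        "actual": {6: 9, 5: 2, 4: 3, 3: 4, 2: 6, 1: 14},
    },
}

def allocation(kind, strata):
    """List of set sizes, one per joiner, pooled over the given strata."""
    return [size
            for stratum in strata
            for size, count in PLANS[stratum][kind].items()
            for _ in range(count)]

def se_ratio(numerator_sizes, denominator_sizes):
    """Ratio of two standard errors of the average difference."""
    def cost(sizes):
        return sum(1 + 1 / m for m in sizes)
    return math.sqrt(cost(numerator_sizes) / cost(denominator_sizes))

report = {}
for label, strata in [("low", ["low"]), ("medium", ["medium"]),
                      ("all", ["low", "medium"])]:
    worst = allocation("worst", strata)
    everything = allocation("all", strata)
    report[label] = (
        round(100 * se_ratio(allocation("best", strata), worst)),
        round(100 * se_ratio(everything, worst)),
        round(100 * se_ratio(everything, allocation("actual", strata))),
    )

for label, percentages in report.items():
    print(label, percentages)
```

Each tuple holds, in order, best versus worst, all controls versus worst, and all controls versus the actual match, as whole percentages.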
The frequencies actually obtained in Section 4 by optimal matching were not the least
efficient frequencies $\overline{\mathbf{m}}$, but rather the frequencies
$\mathbf{m} \in \mathcal{M}_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ given
below the double line in Table 5. The standard error using all $M_+ = 492$ controls
allocated as uniformly as possible is 89% of the standard error for the actual
$\mathbf{m}$ with 219 controls and an uneven allocation.
We decided upon the allocation rules in Table 5 after trying several alternative rules
and obtaining the optimal matches, as in Section 4. As one might anticipate from Figure 4,
it was more difficult to find good matches on covariates in the declining group, $s = 2$,
than in the low group, $s = 1$, and so we tolerated somewhat lower values of the two
efficiency ratios, $\lambda_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$ and
$\omega_{\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{m}_+}$, in group $s = 2$. In
particular, when we tried matching with a minimum of $\alpha_2 = 2$ controls in trajectory
group $s = 2$, the balance on covariates in this trajectory group was unsatisfactory by
the standards discussed in Section 4.2, while with $\alpha_2 = 1$ good balance was
attainable.
Three points deserve emphasis. First, there is a trade-off between bias and variance,
between comparing comparable subjects and using as many subjects as possible, and one
faces diminishing returns if either consideration completely outweighs the other. Second,
although the efficiency bounds in the current section provide a rough guide to setting the
matching parameters, $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, one should always
check, as we do in Section 4.2, that matching has actually balanced observed covariates,
and adjust the matching parameters, $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$,
accordingly. Third, it is never appropriate to use extremely biased controls in the
hope of boosting efficiency; see for instance Rubin (1979) for a simulation and see Dehejia
and Wahba (1999) for a case study of the damage such controls can do to model-based
adjustments.
4 Optimal Matching: Method and Evaluation of Comparability
4.1 Matching Method: Separate Propensity Scores, Minimum Distance Matching
The matching was performed separately within the two trajectory groups, the lows ($s = 1$)
and the mediums ($s = 2$), using the allocations displayed in Table 5, so $m_{1+} = 105$
controls were selected in the low group with at least $\alpha_1 = 2$ controls and at most
$\beta_1 = 7$ controls for each joiner, and $m_{2+} = 114$ controls were selected in the
medium group with at least $\alpha_2 = 1$ control and at most $\beta_2 = 6$ controls for
each joiner. The matching attempted to balance the covariates listed in Table 6. The
twelve covariates in Figure 3 are identified by an asterisk in Table 6, and the remaining
covariates are derived from these twelve. One derived covariate, which is obtained from
the group-based trajectory model, is the conditional posterior probability of being in the
medium group given assignment to either the low or medium group. Two separate propensity
scores were estimated, one for each trajectory group, to allow for different selection
mechanisms across trajectories. The propensity score was estimated using a logit model to
predict joining a gang at age 14 from the other covariates in Table 6, and the logit of
the propensity score is used in most calculations.
For the two trajectory groups together, the mean of each covariate for the $n_+ = 59$
joiners at age 14 (the treated group) is $\bar{X}_t$ in Table 6, and the mean for the
$M_+ = 492$ potential controls is $\bar{X}_c$. Generally, the gang joiners were more
violent than potential controls at ages 10, 11, 12 and 13, were less popular with their
peers, had higher posterior probabilities of membership in the medium trajectory group,
and of course higher propensity scores. Some of the covariates in the upper portion of
Table 6 had missing values, and their associated missing value indicators are listed in
the lower portion of Table 6. For instance, the covariate number of sexual partners was
missing for $\bar{X}_t = 5\%$ of joiners and $\bar{X}_c = 1\%$ of potential controls. If
missing value indicators are included as variables in the propensity score, as we did,
then the propensity score tends to balance the observed values of the covariates and the
pattern of missing value indicators, but of course it may not balance the unobserved
covariate values themselves; see the appendix of Rosenbaum and Rubin (1984). For instance,
it would tend to balance the observed values of number of sexual partners and the
frequency of missing values, but cannot be expected to balance the missing number of
sexual partners. When missing value indicators are included as predictors in a logit
model, the fitted propensity scores are unaffected by the numerical values used in place of
missing values, because the fitted coefficients adjust to compensate; we used the
covariate's mean. Although this does not affect the propensity scores themselves, it does
have a small effect on the Mahalanobis distance described below, and we excluded missing
values when evaluating covariate balance, so this use of the mean does not affect balance
measures discussed in Section 4.2.
Within each trajectory group, we defined a distance between each joiner and each
potential control. The distance had two components. First, the distance was the
Mahalanobis distance computed from the covariates in Table 6, including the relevant
group's logit propensity score, but excluding the missing value indicators. Second, if two
individuals differed on their logits of the propensity score by more than 0.2 times the
standard deviation of the logit of the propensity score, then a penalty of 200 was added
to this distance. Penalties are a standard device for effectively constraining an
optimization problem without formally introducing constraints. In this study, all but 7 of
the 219 actual matches avoided the penalty and thereby respected the constraint. We
selected the controls to minimize the total of the distances, that is, the total of the
$m_{1+} = 105$ distances in the low group and the total of the $m_{2+} = 114$ distances in
the medium group. This is a combinatorial optimization problem. We solved it using the
tactic in Ming and Rosenbaum (2001), who convert it into the familiar optimal assignment
problem and then solve the optimal assignment problem using Bertsekas' (1981) auction
algorithm with a substantially accelerated version of the Splus code in Rosenbaum (2002a,
pp. 325-326). Faster software in R is available from Hansen (2004).
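
The penalty device can be illustrated in a few lines. The sketch below assumes that a matrix of Mahalanobis distances has already been computed, and adds the penalty of 200 wherever the logit propensity scores differ by more than 0.2 standard deviations. The function name and arguments are hypothetical, and the use of the sample standard deviation over all joiners and potential controls in the trajectory group is an assumption, since the text does not say which standard deviation was used. The optimal assignment step itself is not shown.

```python
import statistics

def add_caliper_penalty(base_distances, logit_joiners, logit_controls,
                        caliper=0.2, penalty=200.0):
    """Return a new joiner-by-control distance matrix with a penalty added
    when the logit propensity scores are too far apart.

    base_distances[i][k] is the distance between joiner i and control k.
    """
    # Assumption: the standard deviation is taken over everyone in the group.
    sd = statistics.stdev(list(logit_joiners) + list(logit_controls))
    width = caliper * sd
    return [
        [d + (penalty if abs(lj - lc) > width else 0.0)
         for d, lc in zip(row, logit_controls)]
        for row, lj in zip(base_distances, logit_joiners)
    ]

# Toy example: one joiner and two potential controls.
penalized = add_caliper_penalty(
    base_distances=[[1.5, 0.7]],
    logit_joiners=[-1.0],
    logit_controls=[-1.05, -3.0],
)
print(penalized)
```

In the toy example the second control is closer in covariate distance but far away on the propensity score, so after the penalty the first control becomes the preferred match.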
4.2 Comparability Before Age 14: Covariate Balance Before and After Matching
In this section, the matched sample is evaluated in terms of covariate balance. The
measures of balance are simple, traditional, and have been in use for some time; e.g.,
Rosenbaum and Rubin (1985a). For any one covariate $X$, let $\bar{X}_t$ and $\bar{X}_c$ be
the means, and let $s_{Xt}$ and $s_{Xc}$ be the standard deviations of $X$ for,
respectively, all 59 joiners and all 492 potential controls, before stratification and
matching. Also, let $s_X = \sqrt{(s_{Xt}^2 + s_{Xc}^2)/2}$ be an equally weighted
combination of the two standard deviations. Because we use every joiner, matching does not
alter the mean $\bar{X}_t$ of $X$ for joiners. Joiner $i$ in stratum $s$ has covariate
value $X_{tsi}$ and is matched to $m_{si}$ controls with covariate values $X_{csij}$,
$j = 1, \ldots, m_{si}$. Write

$$\bar{X}_{csi} = \frac{1}{m_{si}} \sum_{j=1}^{m_{si}} X_{csij}
\quad \text{and} \quad
\bar{\bar{X}}_c = \frac{1}{n_+} \sum_{s=1}^{S} \sum_{i=1}^{n_s} \bar{X}_{csi}. \qquad (1)$$

In words, $\bar{X}_{csi}$ is the average of the covariate for all controls matched to gang
joiner $i$ in trajectory group $s$, and $\bar{\bar{X}}_c$ is the average of those averages
across all gang joiners regardless of their trajectory group. One hopes to see covariate
balance after matching, specifically that $X_{tsi} - \bar{X}_{csi}$ is centered near zero,
or that $\bar{X}_t - \bar{\bar{X}}_c$ is near zero. Table 6 reports $\bar{X}_t$ and
$\bar{X}_c$ before matching, $\bar{\bar{X}}_c$ after matching, and two absolute
standardized measures,

$$SD_X = \frac{\left|\bar{X}_t - \bar{X}_c\right|}{s_X}
\quad \text{and} \quad
SD_{Xm} = \frac{\left|\bar{X}_t - \bar{\bar{X}}_c\right|}{s_X},$$

in which the denominators are the same. Aside from the missing value indicators, the
covariates in Table 6 are sorted by the standardized bias $SD_X$ before matching. Because
the two propensity scores were defined separately in the two trajectory groups, the various
measures for them are calculated only from individuals in the relevant trajectory group.
Before matching, the groups are almost a full standard deviation apart on the propensity
score, almost half a standard deviation apart on peer rated popularity, almost 40% of
a standard deviation apart on the posterior probability of membership in the medium
trajectory group, and roughly a quarter of a standard deviation apart on violence at ages
10, 11, 12, and 13. The balance is typically improved after matching, particularly when
the bias before matching was not small. We particularly note that the standardized bias
after matching is less than 5% of a standard deviation for violence scores at ages 10, 11,
12, and 13. As we have previously argued, balance on prior violence levels between gang
joiners and their matched controls is particularly important to generating a credible
estimate of the violence facilitation effect of gang membership. Thus, we are heartened
that matching resulted in near perfect balance on these covariates.
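
The two absolute standardized measures are straightforward to compute for a single covariate. The following sketch uses made-up numbers, not study data, and the function names are illustrative; it assumes missing values have already been excluded.

```python
import math
import statistics

def pooled_sd(joiners, potential_controls):
    """Equally weighted combination of the two standard deviations,
    computed before matching."""
    return math.sqrt((statistics.variance(joiners)
                      + statistics.variance(potential_controls)) / 2)

def matched_control_mean(matched_sets):
    """Average over joiners of the mean of each joiner's matched controls."""
    return statistics.fmean([statistics.fmean(s) for s in matched_sets])

def standardized_measures(joiners, potential_controls, matched_sets):
    """Absolute standardized differences before and after matching,
    both divided by the same pre-matching pooled standard deviation."""
    s_x = pooled_sd(joiners, potential_controls)
    mean_t = statistics.fmean(joiners)
    before = abs(mean_t - statistics.fmean(potential_controls)) / s_x
    after = abs(mean_t - matched_control_mean(matched_sets)) / s_x
    return before, after

# Hypothetical covariate values: 2 joiners, 4 potential controls,
# and one matched control for each joiner.
joiners = [2.0, 4.0]
potential_controls = [0.0, 1.0, 2.0, 3.0]
matched_sets = [[2.0], [3.0]]

before, after = standardized_measures(joiners, potential_controls, matched_sets)
print(format(before, ".3f"), format(after, ".3f"))
```

Keeping the denominator fixed at its pre-matching value means that a change in the measure reflects a change in the means only.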
Figure 5 depicts the standardized biases for the 15 covariates, but not for their missing
value indicators. In the first two boxplots in Figure 5, the lower quartile of the
standardized biases before matching is above the upper quartile after matching. Figure 5
also includes the standardized biases in the means after matching in the low and medium
groups separately. Fourteen covariates are included in these boxplots because only the
propensity score in group $s$ is used to describe that group. For any $X$, all four
standardized biases in Figure 5 have the same denominator $s_X$; only the means in the
numerator change. Figure 5 suggests the improvement in balance is similar in both groups.
Figure 5 and Table 6 also show that the matching had good success in reducing the absolute
standardized differences to near or generally well below two tenths of a standard
deviation.

Table 6: Covariate Imbalance Before and After Matching, for 15 Covariates and 7 Missing
Value Indicators. Absolute standardized difference in covariate means, before and after
matching. Covariate means for all gang joiners at age 14, all nonjoiners, and matched
nonjoiners. Covariates are ordered by the standardized bias before matching.

Covariate                                         $SD_X$   $SD_{Xm}$   $\bar{X}_t$   $\bar{X}_c$   $\bar{\bar{X}}_c$
Logit Propensity Score 1                          0.96     0.21        -2.05         -3.03         -2.27
Logit Propensity Score 2                          0.79     0.18        -1.38         -2.04         -1.53
Peer Rated Popularity, Age 11*                    0.47     0.18        -0.28         0.18          -0.10
Pr(Low | Low or Declining)                        0.38     0.02        0.65          0.50          0.66
Violence, Age 11*                                 0.26     0.04        1.24          0.93          1.19
Mother's age at first birth*                      0.26     0.11        22.56         23.56         23.00
Peer Rated Aggression, Age 11*                    0.25     0.05        0.00          -0.23         -0.05
Violence, Age 10*                                 0.24     0.02        2.46          1.96          2.41
Violence, Age 12*                                 0.23     0.03        1.00          0.73          1.03
Teacher Rating of Hyperactivity, Age 11*          0.22     0.09        1.23          0.95          1.11
Violence, Age 13*                                 0.21     0.02        0.88          0.67          0.86
Number of Sexual Partners, Age 13*                0.21     0.06        0.23          0.14          0.21
Teacher Rating of Opposition, Age 11*             0.19     0.20        2.50          2.04          2.02
Intelligence Score*                               0.07     0.22        8.92          9.10          9.46
Teacher Rating of Physical Aggression, Age 11*    0.03     0.20        0.75          0.79          0.49
Number of Sexual Partners Missing                 0.27     0.15        0.05          0.01          0.03
Intelligence Score Missing                        0.16     0.06        0.03          0.01          0.03
Physical Aggression Missing                       0.15     0.15        0.05          0.02          0.08
Violence Age 13 Missing                           0.14     0.11        0.03          0.01          0.02
Mother's Age Missing                              0.13     0.17        0.03          0.01          0.01
Popularity Missing                                0.09     0.02        0.15          0.12          0.16
Aggression Rating Missing                         0.09     0.02        0.15          0.12          0.16

Figure 6 depicts the covariates themselves, before and after matching, for the four
covariates in Table 6 with the largest standardized biases before matching. Note that
before matching the distribution of each of these covariates across the nongang members
differs appreciably from that of the gang members, whereas after matching the
distributions for the gang and nongang members are very similar. These results are another
encouraging sign that the matching was substantially successful in bringing into balance
covariates that in the unmatched data were substantially out of balance.
As a methodological aside, the boxplots for the 59 joiners and the 273 unmatched controls in Figure 6 are conventional boxplots. A joiner might be matched to between 1 and 7 controls, but an appropriate, directly adjusted analysis gives equal weight to each of the 59 joiners. Therefore, for each covariate, we created a weighted empirical distribution for the matched controls. In the weighted distribution, if a joiner had 1 matched control then that control received weight 1, but if the joiner had 2 matched controls then each one received weight $\frac{1}{2}$, and so on, up to 7 matched controls each with weight $\frac{1}{7}$. The expectation of this weighted empirical distribution is $A_c$ in (1), as displayed in Table 6. Alternatively, if one computed $A_c$ in (1) not from the covariate $A_{csij}$ itself, but rather from the binary variable indicating whether $A_{csij} \le r$, then the result is the weighted empirical distribution evaluated at $r$. In Figure 6 and in several later boxplots, the quartiles for matched controls are computed from this weighted empirical distribution.
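The weighting just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the quantile convention (smallest value whose cumulative weight reaches the target), and the toy data are all hypothetical.

```python
from fractions import Fraction

def control_weights(set_sizes):
    """Give each of the m controls matched to one joiner the weight 1/m."""
    return [Fraction(1, m) for m in set_sizes for _ in range(m)]

def weighted_cdf(values, weights, r):
    """Weighted empirical distribution at r: share of total weight with value <= r."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v <= r) / total

def weighted_quantile(values, weights, p):
    """Smallest observed value whose cumulative weight share is at least p
    (one of several possible quantile conventions; an assumption here)."""
    total = sum(weights)
    running = 0
    for v, w in sorted(zip(values, weights)):
        running += w
        if running >= p * total:
            return v
    return max(values)

# Hypothetical data: three joiners matched to 1, 2 and 3 controls,
# with control covariate values listed set by set.
set_sizes = [1, 2, 3]
controls = [4, 0, 2, 1, 1, 7]
weights = control_weights(set_sizes)

# Directly adjusted mean: equals the average of the three set averages (4, 1, 3).
weighted_mean = sum(v * w for v, w in zip(controls, weights)) / sum(weights)
median = weighted_quantile(controls, weights, Fraction(1, 2))
```

Because every matched set carries total weight 1, the weighted mean coincides with the average over joiners of the within-set control averages, which is the direct adjustment used in the tables.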
Although the missing value indicators at the bottom of Table 6 also show improved
balance overall, much of this occurred in the low trajectory group, where there was more
freedom to pick controls. There was little improvement in the missing indicators in the
medium group, in part because the Mahalanobis distance emphasized the actual covariates
and the missing indicators were included only indirectly through the propensity score.
Table 7: Comparison of Levels of Violence Among Gang Joiners at Age 14 and Matched Controls. Two-sided significance levels from the Hodges-Lehmann aligned rank test, testing no effect on the level or the change in violence. Changes in violence are the difference between violence at a given age and the average violence for ages 10 to 13. The 95 percent confidence interval for an additive effect on the level of violence is obtained by inverting the test.

Type       Age               # Sets   Level of Violence   Change in Violence   95% Confidence Interval
Covariate  Age 10            59       0.488
Covariate  Age 11            57       0.671
Covariate  Age 12            58       0.209
Covariate  Age 13            57       0.600
Covariate  Average 10 to 13  59       0.631
Outcome    Age 14            59       0.0017              0.0074               [0.25, 1.00]
Outcome    Age 15            53       0.0089              0.0159               [0.14, 1.16]
Outcome    Age 16            48       0.411               0.509                [-0.25, 0.63]
Outcome    Age 17            50       0.556               0.606                [-0.30, 0.51]
Outcome    Average 14 to 17  59       0.015               0.023                [0.08, 0.79]
5 Outcomes in Late Adolescence for Boys Who Joined Gangs at 14 and for Matched
Controls
5.1 Violence and Change in Violence, Ages 14 to 17
Figure 7 reports boxplots of the violence scores for the 59 gang joiners for ages 14-17. Also shown are the weighted empirical distributions for matched controls, defined in §4.2. Figure 8 shows counterpart boxplots by trajectory group. For the sample as a whole and for each trajectory group, the plots suggest a pronounced upward shift in violence at ages 14 and 15. By age 16 the differences between the gang joiners at age 14 and their matched controls seem to have largely dissipated, particularly for the medium violence trajectory group.
The aligned rank test of Hodges and Lehmann (1962) was used to test the null hypothesis of no difference in a comparison of the level of violence for the joiner in a matched set to the levels of violence for his several matched controls. The aligned rank test is essentially a generalization of Wilcoxon's signed rank test to the case of matching with multiple controls. Table 7 displays the results for the sample as a whole. Prior to age 14, when none of the boys were in gangs, there is not a significant difference in level of violence between the boys who would join at age 14 and their matched controls. From age 14 onward, two variables are examined: (i) the level of violence at a given age, and (ii) the change in the level of violence at a given age when compared to the average for this boy from ages 10 to 13. For the sample as a whole, at ages 14 and 15, the joiners were significantly more violent than their matched controls, and the changes in their violence from baseline were significantly greater than the changes for their matched controls; whereas, at ages 16 and 17, the differences were not significant. In Table 7, a 95% confidence interval for an additive effect of gang joining at age 14 on the subsequent level of violence is obtained by inverting the aligned rank test.

Table 8: Violence Outcomes Within Trajectory Groups. The outcome is the level of violence at specific ages. Comparison of 21 joiners and 105 matched controls in the low trajectory group and 38 joiners and 114 matched controls in the medium trajectory group using the aligned rank test. Two-sided significance levels for testing no effect and 95 percent confidence intervals for an additive effect formed by inverting the test.

Age   Group             Significance Level   95% CI
14    Low (s = 1)       0.008                [0.16, 1.13]
      Medium (s = 2)    0.033                [0.04, 1.21]
15    Low (s = 1)       0.034                [0.00, 1.40]
      Medium (s = 2)    0.086                [0.00, 1.41]
16    Low (s = 1)       0.044                [0.00, 1.16]
      Medium (s = 2)    0.753                [-0.88, 0.57]
17    Low (s = 1)       0.070                [0.00, 1.58]
      Medium (s = 2)    0.520                [-0.95, 0.33]
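For readers who want to see the mechanics, here is a minimal Python sketch of an aligned rank test for matched sets with one joiner and several controls, together with a grid-based inversion for the confidence interval. It is an illustration under assumptions, not the authors' implementation: it aligns by subtracting the set mean, uses average ranks for ties, relies on the large-sample normal approximation, and the data and function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def aligned_rank_test(sets):
    """sets: list of (joiner_response, [control_responses]).
    Returns (standardized deviate, two-sided p-value)."""
    aligned, owner, treated = [], [], []
    for s, (y_t, y_c) in enumerate(sets):
        ys = [y_t] + list(y_c)
        center = sum(ys) / len(ys)          # align: subtract the set mean
        for pos, y in enumerate(ys):
            aligned.append(y - center)
            owner.append(s)
            treated.append(pos == 0)
    ranks = average_ranks(aligned)          # rank all aligned responses together
    stat = sum(r for r, t in zip(ranks, treated) if t)
    expect = var = 0.0
    for s in range(len(sets)):              # null moments: joiner is a random set member
        q = [r for r, o in zip(ranks, owner) if o == s]
        m = sum(q) / len(q)
        expect += m
        var += sum((x - m) ** 2 for x in q) / len(q)
    z = (stat - expect) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))

def confidence_interval(sets, grid, alpha=0.05):
    """Invert the test: keep additive effects tau that are not rejected."""
    kept = [tau for tau in grid
            if aligned_rank_test([(y_t - tau, y_c) for y_t, y_c in sets])[1] >= alpha]
    return min(kept), max(kept)

# Hypothetical violence scores: (joiner, matched controls).
example_sets = [(3, [1, 0]), (2, [0]), (4, [1, 2, 0]), (1, [0, 0]), (5, [2, 1])]
z, p = aligned_rank_test(example_sets)
lo, hi = confidence_interval(example_sets, [k / 4 for k in range(-8, 25)])
```

Because the statistic is a rank statistic, the p-value changes in discrete steps as `tau` moves along the grid, so interval endpoints depend on the grid spacing chosen here.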
Table 8 displays the results by trajectory group. Specifically, Table 8 reports inferences about the effects of joining a gang at age 14 on the level of violence at subsequent ages, separately for the Low (s = 1) and Medium (s = 2) trajectory groups, again using the Hodges-Lehmann aligned rank test (see footnote 4). The patterns in Table 8 are intriguing. They suggest that the effects of gang joining at age 14 may be more persistent for the low group than for the declining group. By age 15 the difference in violence between the age 14 joiners and non-joiners is no longer significant at the .05 level for the medium group, whereas for the low group there is a significant difference until age 17. Nonetheless, we caution against overinterpretation of this seeming difference in the persistence of the violence facilitation effect between the two groups. The sample sizes within groups are small (there are only 21 joiners in the Low group), and so definitive statements are not possible. For instance, every confidence interval in Table 8 includes an effect of 0.25. At age 16 a test of the null hypothesis of an effect of 0.25 yields a two-sided significance level of 0.42 in the Low group and 0.28 in the Medium group, so the same effect of 0.25 is entirely plausible at age 16 for both Low and Medium groups; that is, the effect may well be the same in both groups.

Footnote 4: The reader will note that several confidence intervals end exactly at zero. Many boys had a violence score of zero in at least one year. As a rank statistic, the aligned rank statistic takes discrete steps as the parameter being tested is varied, which explains, for instance, the significance level of 0.034 in the low group at age 15 together with a confidence interval, [0.00, 1.40], which ends sharply at zero. This says that an additive effect of zero is not plausible, is rejected as too small at the 0.034 level, but that any positive effect is not rejected at the 0.05 level. Similarly, in the medium group at age 15, a zero effect is not rejected at the 0.05 level, because the significance level is 0.086, but any negative effect is rejected at the 0.05 level.
5.2 Gang Membership, Ages 14 to 17
Tables 9 and 10 describe the changes in gang membership over time, ages 14 to 17. In these tables, the percents for controls are found by averaging over the controls matched to each joiner, and averaging those averages, in parallel with the definition of $A_c$. By definition, the joiners are all in gangs at age 14 and the controls are not. A year later, in Table 9, only 39% of the joiners are still in gangs, while 10% of the controls have joined. By age 17, only 20% of the joiners are still in gangs and 16% of the matched controls are in gangs, and these frequencies do not differ significantly as judged by the Mantel-Haenszel test for binary responses in multiply matched sets.
Table 10 describes the changes in gang membership separately in the two trajectory groups. The immediate decline in gang membership at age 15 is greater in the low group than in the medium group, but 25% of joiners in the low group are in gangs at age 17, as opposed to 17% in the medium group; however, this difference is not statistically significant by Fisher's exact test. (In detail, in the low group, there were 20 matched sets with violence data on both an age 14 joiner and a control, and in 25% = 5/20 of these sets the joiner was in a gang at 17, so in the low group each joiner is 5% of the group, whereas for the medium group, the ratio was 17% = 5/30, and the two-sided p-value from Fisher's exact test was 0.49.) Also, fewer controls subsequently join gangs in the low group than in the medium group. In principle, differences in the persistence of gang membership across the groups might account for differences in the persistence of the violence facilitation effect across groups, but of course neither difference was statistically significant.
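The Fisher's exact test comparison of 5/20 with 5/30 can be checked with a short standard-library computation. The sketch below assumes the usual two-sided definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one); the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

# Low group: 5 of 20 joiners in a gang at 17; medium group: 5 of 30.
p_value = fisher_exact_two_sided(5, 15, 5, 25)
```

Rounded to two decimals, `p_value` agrees with the 0.49 reported in the text.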
Table 9: Gang Membership at Ages 14 to 17, for Joiners at 14 and Matched Controls. The percent for controls is direct adjustment, that is, the average over matched sets of the average gang membership among controls in each set. Two-sided p-values from the Mantel-Haenszel test without continuity correction.

Age                              14     15         16      17
Joiners at 14 in Gangs (%)       100    39         25      20
Matched Controls in Gangs (%)    0      10         11      16
Mantel-Haenszel P-value                 0.000046   0.027   0.42
Matched Sets                     59     56         48      50

Table 10: Gang Membership, Ages 14 to 17, for Joiners at Age 14 and Matched Controls, by Trajectory Group. The percents for controls are directly adjusted, as in Table 6: each is the average over joiners of the percentage of that joiner's controls who are in gangs.

Trajectory Group                            Age 14   15   16   17
Low      Joiners at 14 in Gangs (%)         100      29   17   25
Low      Matched Controls in Gangs (%)      0        12   5    9
Medium   Joiners at 14 in Gangs (%)         100      46   30   17
Medium   Matched Controls in Gangs (%)      0        8    15   20

Table 9 describes a commonplace event: as time passes, the integrity of the treated and control groups degrades, as subjects enter and leave. This occurs in both experiments and observational studies. In the language of randomized clinical trials, Table 9 displays noncompliance, that is, the tendency over time for joiners at age 14 to quit gangs, and for matched controls to join them. The noncompliance analysis for randomized experiments in Greevy et al. (2004) uses the randomly assigned treatment as an instrument for the received treatment; it gives appropriate inferences provided the assignment is randomized, even if noncompliance is nonrandom and biased. Specifically, that analysis assumes the randomized treatment assignment is untainted by self-selection bias, but the effect of the treatment is a function of the treatment received, not the treatment assigned, and the treatment received may be affected by self-selection bias. In the current context, in which randomization was not used, this would mean that if the matching in §4 had matched comparable joiners and controls at age 14 (this appears to be true for the covariates in Table 6, but is a matter of speculation for covariates not measured), then the noncompliance analysis would be appropriate even if later decisions to exit or enter gangs were nonrandom. For instance, it would not be surprising if subsequent violence, inside gangs or outside, were related to exit or entry, but this would not invalidate the instrumental variable analysis.
Table 11: Effects of Gang Membership on Violence Under Three Noncompliance Models.

Model       c      Point estimate   95% CI
Transient   1      1.10             [0.29, 1.86]
Lingering   1/2    0.77             [0.20, 1.31]
Permanent   0      0.46             [0.11, 0.81]
We applied the noncompliance analysis in Greevy et al. (2004) in the following way. As in that paper, we defined an indicator of gang membership at age $a$, for $a = 14, 15, 16, 17$, using exponential smoothing. Write $G_{ia} = 1$ if boy $i$ was in a gang at age $a$, $G_{ia} = 0$ otherwise, so $G_{i,14} = 1$ for joiners and $G_{i,14} = 0$ for controls, but for $a > 14$, there is some switching, as indicated in Table 9. Define $\tilde{G}_{i,14} = G_{i,14}$, and $\tilde{G}_{i,a+1} = c\,G_{i,a+1} + (1 - c)\,\tilde{G}_{i,a}$, for $a = 14, 15, 16$, where $0 \le c \le 1$, with $\tilde{G}_{i,a+1}$ defined to be missing if either $G_{i,a+1}$ or $\tilde{G}_{i,a}$ is missing. If $c = 1$ then $\tilde{G}_{i,a} = G_{i,a}$ simply indicates whether a boy is in a gang at age $a$; this is the transient model, because only current gang membership matters. If $c = 0$ then $\tilde{G}_{i,a} = G_{i,14}$ simply indicates whether a boy joined a gang at age 14; this is the permanent model, because effects at age 14 last until age 17. If $0 < c < 1$, then past gang membership exerts diminishing influence as the years pass. For instance, with $c = \frac{1}{2}$, a boy who was in a gang for the first two years, $G_{i,14} = 1$, $G_{i,15} = 1$, $G_{i,16} = 0$, $G_{i,17} = 0$, has $\tilde{G}_{i,14} = 1$, $\tilde{G}_{i,15} = 1$, $\tilde{G}_{i,16} = \frac{1}{2}$, $\tilde{G}_{i,17} = \frac{1}{4}$. The model with $c = \frac{1}{2}$ is the lingering model. In each model, the effect on violence at age $a$ is modeled as $\beta\,\tilde{G}_{i,a}$. As in Greevy et al. (2004), to test $H_0: \beta = \beta_0$, we subtracted $\beta_0\,\tilde{G}_{i,a}$ from the violence at age $a$, averaged all of these adjusted values for each boy over ages $a = 14, 15, 16, 17$, and applied the aligned rank test to these adjusted averages, comparing joiners at 14 to their matched controls; see Greevy et al. (2004) for detailed discussion and references to earlier work. Of course, for all $c$, when testing $H_0: \beta = 0$, this gives the 0.015 significance level in Table 7 for the average level of violence, ages 14 to 17. Table 11 shows the Hodges-Lehmann point estimate of $\beta$ under each model, together with the 95% confidence interval.
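The exponential smoothing recursion and the adjusted averages can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and `beta0` (standing for the hypothesized effect) are hypothetical, missing values are represented by `None`, and averaging only over the ages with non-missing adjusted values is an assumption of this sketch.

```python
from fractions import Fraction

def smoothed_membership(g, c):
    """Exponentially smoothed gang indicator for ages 14, 15, 16, 17.

    g: 0/1 indicators by age (None when missing); c: smoothing weight in [0, 1].
    A value is missing if the current indicator or the previous smoothed
    value is missing."""
    out = [g[0]]
    for current in g[1:]:
        prev = out[-1]
        if current is None or prev is None:
            out.append(None)
        else:
            out.append(c * current + (1 - c) * prev)
    return out

def adjusted_average(violence, g, c, beta0):
    """Average over ages of violence minus beta0 times smoothed membership.

    Ages with a missing violence score or smoothed indicator are skipped
    (an assumption made for this sketch)."""
    smooth = smoothed_membership(g, c)
    vals = [v - beta0 * s for v, s in zip(violence, smooth)
            if v is not None and s is not None]
    return sum(vals) / len(vals)

# The worked example in the text: in a gang at 14 and 15, out at 16 and 17.
half = Fraction(1, 2)
lingering = smoothed_membership([1, 1, 0, 0], half)   # c = 1/2
transient = smoothed_membership([1, 1, 0, 0], 1)      # c = 1
permanent = smoothed_membership([1, 1, 0, 0], 0)      # c = 0
```

The adjusted averages for joiners and their matched controls would then be passed to the aligned rank test, and repeating this over a grid of hypothesized effects gives the confidence interval by inversion.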
Based on the pattern seen in Table 7, the permanent model with $c = 0$ does not look plausible: the difference in violence between joiners at age 14 and matched controls seems to diminish with increasing age, perhaps because of the shift in gang membership over time in Table 9. Figure 9 depicts residuals, as in Greevy et al. (2004), from the three noncompliance models. If the model were correct and if the sample size were extremely large, then each pair of boxplots would be the same, but different pairs of boxplots might differ. Arguably, with $c = 0$ in Figure 9, the estimated effect of 0.46 is too small at age $a = 14$ and too large at age $a = 17$. The plots for $c = 1$ with an estimated effect of 1.10 and for $c = \frac{1}{2}$ with an estimated effect of 0.77 both look better than for $c = 0$, and neither looks dramatically better than the other. In a much larger study, we would explore whether these results differ by trajectory group; however, in the current study, the group-specific confidence intervals for ages 15 to 17 in Table 8 all include zero effect.
5.3 Covariance Adjustment of Matched Sets
It is sometimes possible to increase the effectiveness of bias adjustments or reduce sampling error by combining matching with some form of covariance adjustment (Rubin 1979). Here, we apply the method in Rosenbaum (2002b, §7), in which: (i) the hypothesized treatment effect is subtracted from the responses of the 59 gang joiners to form adjusted responses for the 278 = 59 + 219 matched boys, (ii) these 278 adjusted responses are regressed on covariates to obtain 278 residuals, and (iii) the aligned rank test is applied to the residuals within matched sets. In a randomized experiment in which one boy in each set is picked at random for treatment, this procedure produces a randomization inference with the correct level. In our implementation here, we used Huber's M-estimates to perform the regression, as implemented in S-Plus with Huber's weight function. In Table 6, the 12 covariates marked with an asterisk are variables not derived from other variables, and the adjustment used these 12 covariates. (Missing values of covariates were replaced by the mean for the covariate.)
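Steps (i) and (ii) can be sketched in Python as below. This is only an illustration under loudly stated assumptions: it uses ordinary least squares on a single covariate in place of Huber's M-estimation on twelve covariates, and all names and data are hypothetical. Step (iii) would apply an aligned rank test to the returned residuals within matched sets.

```python
def ols_residuals(y, x):
    """Residuals from a least-squares regression of y on one covariate x.
    (OLS is a stand-in here; the paper used Huber's M-estimates.)"""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]

def adjusted_residuals(y, treated, x, tau0):
    """Step (i): subtract the hypothesized effect tau0 from treated responses.
    Step (ii): regress the adjusted responses on the covariate, keep residuals."""
    adjusted = [yi - tau0 if t else yi for yi, t in zip(y, treated)]
    return ols_residuals(adjusted, x)

# Hypothetical data in which y = 2 + 3x, plus an effect of 1.5 for treated boys.
x = [0, 1, 2, 3]
treated = [True, False, False, True]
y = [3.5, 5.0, 8.0, 12.5]
residuals_at_truth = adjusted_residuals(y, treated, x, 1.5)
residuals_at_zero = adjusted_residuals(y, treated, x, 0.0)
```

When the hypothesized effect equals the effect used to generate the toy data, the adjusted responses are exactly linear in the covariate and the residuals vanish; under a wrong hypothesis, some treatment signal remains in the residuals for the rank test to detect.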
For an additive effect on the level of violence at age 14, the covariance adjustment yielded a two-sided significance level of 0.0026 for testing no effect and a 95% confidence interval of [0.25, 1.02]; at age 15, the significance level for no effect was 0.014 with a 95% confidence interval of [0.12, 1.14]; for the average level of violence for ages 14 to 17, the significance level was 0.037 with a 95% confidence interval of [0.02, 0.76]. These results are quite similar to the corresponding results in Table 7 without covariance adjustment. In this one instance, the matching alone seems to have adequately controlled for the covariates in the matched sets, with the covariance adjustment doing little more; however, this cannot be counted upon in general.
5.4 Sensitivity to Bias from an Unobserved Covariate
The analysis in §4.2 suggests that the matching was quite successful in balancing the measured covariates in Table 6, but there is the inevitable concern that some important covariate may not have been measured. The analysis in Table 7 would be correct in a randomized experiment in which one boy in each matched set were picked at random for treatment, but this analysis would not be correct if an important unmeasured covariate had not been controlled by matching. Here, we ask how such an unobserved covariate might alter the analysis in Table 7. For a nontechnical overview of methods of sensitivity analysis, see Rosenbaum (2005b). The method used here is described in detail in Gastwirth, Krieger and Rosenbaum (2000), so only a brief description is given here. Suppose that an unobserved binary covariate, $n = 1$ or $n = 0$, were associated with a $\Gamma \ge 1$ fold increase in the odds of joining a gang at age 14. What is the largest possible one-sided significance level for the aligned rank test allowing for the impact of failure to control for $n$? Table 12 gives sharp upper bounds on the one-sided significance levels for no effect, testing against increased violence among gang joiners, for several values of $\Gamma$. If $\Gamma = 1$, then one obtains the randomization distribution and essentially the analysis in Table 7. (For technical reasons, the sensitivity analysis is best viewed as one-sided. If one doubles the values for $\Gamma = 1$ in Table 12, one obtains the corresponding two-sided significance levels in Table 7.) The bounds in Table 12 are sharp in the sense that they are attained for a certain unobserved covariate strongly related to violence at the given age. The increase in violence at age 14 is insensitive to a 50% increase in the odds of joining a gang ($\Gamma = 1.5$) associated with $n = 1$, as the maximum possible significance level of 0.036 is still less than the conventional 0.05, but larger biases than this could explain the observed association. The results at age 15 are more sensitive to unobserved bias than the results at age 14. The ostensible effect of gang joining on violence at age 14 is not sensitive to small biases, but it is far more sensitive to bias than, say, Hammond's (1964) study of the effects of heavy smoking on the risk of lung cancer, which becomes sensitive to bias at about $\Gamma = 6$; see Rosenbaum (2002a, §4.3.2).
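To make the shape of such a calculation concrete, the sketch below computes a large-sample upper bound on the one-sided significance level for a sum-of-scores statistic in matched sets with one joiner each. It is a hedged illustration, not the authors' code: it follows one common reading of the separable approximation (in each set, choose the split of the ordered scores that maximizes the bounding mean, breaking ties by the larger variance), it assumes the scores are already computed (for example, aligned ranks), and the names `sensitivity_upper_bound` and `gamma` (the odds multiplier described above) are hypothetical.

```python
import math

def sensitivity_upper_bound(sets, gamma):
    """Approximate upper bound on the one-sided p-value.

    sets: list of score lists, with the joiner's score first in each list.
    gamma: odds multiplier (>= 1) attributed to the unobserved covariate."""
    stat = expect = var = 0.0
    for scores in sets:
        stat += scores[0]
        q = sorted(scores)
        n = len(q)
        best = None
        for a in range(1, n):
            low, high = q[:a], q[a:]          # the top n - a scores get odds gamma
            denom = a + gamma * (n - a)
            mu = (sum(low) + gamma * sum(high)) / denom
            nu = (sum(v * v for v in low) + gamma * sum(v * v for v in high)) / denom - mu * mu
            if best is None or (mu, nu) > best:
                best = (mu, nu)               # largest mean, then largest variance
        expect += best[0]
        var += best[1]
    z = (stat - expect) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability

# Hypothetical scores for two matched pairs (joiner first).
example = [[2, 1], [4, 3]]
p_random = sensitivity_upper_bound(example, 1.0)   # randomization analysis
p_gamma2 = sensitivity_upper_bound(example, 2.0)   # allowing a doubling of the odds
```

With `gamma = 1` every split gives the ordinary randomization mean and variance, so the bound reduces to the usual one-sided significance level; larger values of `gamma` raise the bound, which is the pattern tabulated for the study data.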
Table 12: Sensitivity to Unobserved Biases: Sharp Upper Bounds on the One-Sided Significance Level for Testing No Effect on Level of Violence at Ages 14 and 15.

$\Gamma$   Age 14    Age 15
1.0        0.00084   0.0045
1.3        0.012     0.037
1.4        0.021     0.059
1.5        0.036     0.088
1.6        0.056     0.124
1.7        0.082     0.166

6 Concluding Remarks
We began this paper with the observation that a key aim of empirical research in developmental psychopathology and life course studies is measuring the effects of therapeutic interventions or important life events on behavioral trajectories. We also observed that the use of experimental control to infer these effects was often impractical or unethical. This paper presented an approach for inferring such effects from observational data that attempts to recreate some key ingredients of a well designed experiment.
The inference strategy was designed with three goals in mind. First, we wanted to exploit the rich variety of measurements available in quality longitudinal studies. It is particularly important to balance covariates strongly associated with both treatment status and outcomes. These often include covariates that are the defining feature of longitudinal data sets: prior values of the outcome variable and prior values of the treatment variable. Second, we wanted to demonstrate a mode of analysis in which key results on pre-treatment balance and post-treatment outcomes can be communicated in a transparent fashion. Such transparency is important to reporting statistical findings in a comprehensible fashion, particularly to non-technical audiences. Third, research on life course development can be divided into two distinct literatures. One aims to document and understand individual differences in developmental trajectories. The ultimate purpose of this literature is to develop empirically verified theory of the predictors and consequences of alternative trajectories of development. Research in this tradition relies primarily on prospective longitudinal studies such as that used in this paper, and statistical inference is most commonly based on regression-based statistical procedures. Another literature, which is more clinically or policy oriented, aims to identify interventions or programs that can alter trajectories for the better. For this type of research, experiments are the preferred statistical methodology. Our third objective was to demonstrate a form of analysis based on group-based trajectory modeling, propensity scores, and matching that can better unite these two strands of research.
References
[1] Bergstralh, E. J., Kosanke, J. L., and Jacobsen, S. L. (1996), Software for optimal
matching in observational studies, Epidemiology, 7, 331-332.
[2] Bertsekas, D. P. (1981), A new algorithm for the assignment problem,
Mathematical Programming, 21, 152-171. Fortran code is available at:
http://www.mit.edu:8001//people/ dimitrib/home.html
[3] Cloward, R. A. and L. E. Ohlin. (1960). Delinquency and Opportunity: A Theory of
Delinquent Gangs. New York: Free Press.
[4] Cochran, W. G. (1965) The planning of observational studies of human populations
(with Discussion). Journal of the Royal Statistical Society, A128, 134-155.
[5] Cohen, A. K. (1955). Delinquent Boys: The Culture of the Gang. Glencoe, IL: Free
Press.
[6] Dehejia, R. H. and Wahba, S. (1999) Causal effects in nonexperimental studies:
Reevaluating the evaluation of training programs. Journal of the American Statistical
Association, 94, 1053-1062.
[7] Dorn, H.F. (1953). Philosophy of Inferences from Retrospective Studies. American
Journal of Public Health, 43:677-683.
[8] Elder, Jr., G. H. 1985. Perspectives on the Life Course. In: G.H. Elder, Jr., ed. Life
Course Dynamics. Ithaca: Cornell University Press.
[9] Elder, Jr. G. H. 1998. The Life Course as Developmental Theory. Child Development,
69: 1-12.
[10] Farrington, D. P. 1986. Age and Crime. In M. Tonry, and N. Morris, eds., Crime
and Justice: An Annual Review of Research. Vol. 7 Chicago: University of Chicago
Press.
[11] Fisher, R.A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
[12] Gastwirth, J. L., Krieger, A. M. & Rosenbaum, P. R. (2000). Asymptotic separability
in sensitivity analysis. J. R. Statist. Soc. B 62, 545-55.
[13] Greevy, R., Silber, J. H., Cnaan, A., and Rosenbaum, P. R. (2004), Randomization
inference with imperfect compliance in the ACE-inhibitor after anthracycline random-
ized trial, Journal of the American Statistical Association, 99, 7-15.
[14] Gu, X. S. and Rosenbaum, P. R. (1993) Comparison of multivariate matching meth-
ods: Structures, distances and algorithms. Journal of Computational and Graphical
Statistics, 2, 405-420.
[15] Halford, K. & Bouma, R. (1997), Individual psychopathology and marital distress,
in Halford, K. & Markman, H. (eds) Clinical Handbook of Marriage and Couples
Interventions, John Wiley and Sons, Chichester UK.
[16] Hammond, E. C. (1964). Smoking in relation to mortality and morbidity. Journal of
the National Cancer Institute 32, 1161-1188.
[17] Hansen, B. B. (2004), Full matching in an observational study of coaching for the
SAT, Journal of the American Statistical Association, 99, 609-618. R-code is available
at: http://www.stat.lsa.umich.edu/~bbh/
[18] Haviland, A. and D.S. Nagin. 2005. Causal Inference with Group-based Trajectory
Models. Psychometrika, 70, 1-22.
[19] Hodges, J. L. and Lehmann, E. L. (1962), Rank methods for combination of inde-
pendent experiments in the analysis of variance, Annals of Mathematical Statistics,
33, 482-97.
[20] Hodges, J. L. and Lehmann, E. L. (1963), Estimates of location based on ranks,
Annals of Mathematical Statistics, 34, 598-611.
[21] Joffe, M. M. and Rosenbaum, P. R. (1999) Propensity scores. American Journal of
Epidemiology, 150, 327-333.
[22] Lacourse, E., D. Nagin, F. Vitaro, M. Claes, and R. E. Tremblay. 2003. Developmental
Trajectories of Boys' Delinquent Group Membership and Facilitation of Violent
Behaviors During Adolescence. Development and Psychopathology, 15, 183-197.
[23] Lehmann, E. L. (1998), Nonparametrics, New Jersey: Prentice Hall.
[24] Marshall, A. W. and Olkin, I. (1979) Inequalities. New York: Academic.
[25] Meyer, B. D. (1995) Natural and quasi-experiments in economics. Journal of Business
and Economic Statistics, 13, 151-161.
[26] Ming, K. and Rosenbaum, P. R. (2000). Substantial gains in bias reduction from
matching with a variable number of controls. Biometrics 56, 118-124.
[27] Ming, K. and Rosenbaum, P. R. (2001), A note on optimal matching with variable
controls using the assignment algorithm, Journal of Computational and Graphical
Statistics, 10, 455-463.
[28] Muthén, B. O. 2001. Second-Generation Structural Equation Modeling with a Combination
of Categorical and Continuous Latent Variables: New Opportunities for Latent
Class/Latent Curve Modeling. In A. Sayers and L. Collins, eds., New Methods for
the Analysis of Change. Washington, D.C.: American Psychological Association.
[29] Nagin, D. S. 1999. "Analyzing Developmental Trajectories: A Semi-parametric,
Group-based Approach." Psychological Methods, 4: 139-177.
[30] Nagin, D. S. (2005) Group-Based Modeling of Development. Cambridge, MA: Harvard
University Press.
[31] Nagin, D., D. Farrington, and T. Moffitt. 1995. Life-Course Trajectories of Different
Types of Offenders. Criminology, 33: 111-139.
[32] Nagin, D. S., & Land, K. C. (1993). Age, criminal careers, and population heterogeneity:
Specification and estimation of a nonparametric, mixed Poisson model. Criminology,
31(3), 327-362.
[33] Rosenbaum P. R. (1984) Conditional permutation tests and the propensity score in
observational studies. Journal of the American Statistical Association, 79, 565-574.
[34] Rosenbaum, P. R. (1987) Model-based direct adjustment. Journal of the American
Statistical Association, 82, 387-394.
[35] Rosenbaum, P.R. (1989), Optimal matching in observational studies, Journal of the
American Statistical Association, 84, 1024-32.
[36] Rosenbaum, P. R. (1991) A characterization of optimal designs for observational stud-
ies. Journal of the Royal Statistical Society B53 597-610.
[37] Rosenbaum, P. R. (2002a) Observational Studies (2nd edition). New York: Springer-Verlag.
[38] Rosenbaum, P.R. (2002b). Covariance adjustment in randomized experiments and
observational studies. Statistical Science 17, 286-327.
[39] Rosenbaum, P. R. (2005a) Observational study. In: Encyclopedia of Statistics in Be-
havioral Science, 2005, eds., B. S. Everitt and D. C. Howell, New York: John Wiley
and Sons, pp. 1451-1462.
[40] Rosenbaum, P. R. (2005b) Sensitivity analysis in observational studies. In: Encyclo-
pedia of Statistics in Behavioral Science, 2005, eds., B. S. Everitt and D. C. Howell,
New York: John Wiley and Sons, pp. 1809-1814.
[41] Rosenbaum, P. and Rubin, D. (1983) The central role of the propensity score in
observational studies for causal effects. Biometrika 70, 41-55.
[42] Rosenbaum, P. & Rubin, D. (1984) Reducing bias in observational studies using
subclassification on the propensity score. Journal of the American Statistical Association,
79, 516-524.
[43] Rosenbaum, P. and Rubin, D. (1985a). Constructing a control group using multivariate
matched sampling methods that incorporate the propensity score. American
Statistician 39, 33-38.
[44] Rosenbaum, P. R. and Rubin, D. B. (1985b). The bias due to incomplete matching.
Biometrics 41, 103-116.
[45] Rosenbaum, P. R. and Silber, J. H. (2001) Matching and thick description in an
observational study of mortality after surgery. Biostatistics, 2, 217-232.
[46] Rubin, D. B. (1974) Estimating causal effects of treatments in randomized and
nonrandomized studies. Journal of Educational Psychology, 66, 688-701.
[47] Rubin, D. B. (1979) Using multivariate matched sampling and regression adjustment
to control bias in observational studies. Journal of the American Statistical Association,
74, 318-328.
[48] Rubin D. B. (1980). Bias reduction using Mahalanobis metric matching. Biometrics
36, 293-298.
[49] Schmaling, K. & Sher, T. (1997), Physical health and relationships, in Halford, K.
& Markman, H. (eds) Clinical Handbook of Marriage and Couples Interventions, pp.
323-338., John Wiley and Sons, Chichester UK.
[50] Shadish, W. R., Cook, T. D. & Campbell, D. T. (2002). Experimental and
Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton-Mifflin.
[51] Short, J. F., Jr. and F. L. Strodtbeck. (1965). Group Process and Gang Delinquency.
Chicago: University of Chicago Press.
[52] Smith, H. (1997), Matching with multiple controls to estimate treatment effects in
observational studies, Sociological Methodology 27, 325-353.
[53] Stuart, A. (1955) A paradox of statistical estimation. Biometrika, 42, 527-529.
[54] Thornberry, T., M. Krohn, A Lizotte, C. Smith, and K. Tobin. 2003. Gangs and
Delinquency in Developmental Perspective. Cambridge, U.K.: Cambridge University
Press.
[55] Tremblay, R. E., Desmarais-Gervais, L., Gagnon, C., and Charlebois, P. (1987) The
preschool behavior questionnaire: stability of its factor structure between culture,
sexes, ages, and socioeconomic classes. International Journal of Behavioral Develop-
ment, 10, 467-484.
[56] Warr, M. (2002). Companions in Crime: The Social Aspects of Criminal Conduct.
New York: Cambridge University Press.
[Figure 1 plot: y-axis "Lambda" (0 to 6) against x-axis "Age" (11.0 to 13.0); legend: Low, Medium, Chronic.]
Figure 1: Expected Trajectories of Violent Delinquency in Groups Low (s = 1), Medium (s = 2), and Chronic (s = 3).

[Figure 2 plots: three panels titled Group 1, Group 2, and Group 3; each shows boxplots of Violence Score (0 to 15) at Age 10, Age 11, Age 12, and Age 13.]
Figure 2: Boxplots of Violence Scores for the Three Trajectory Groups, Ages 10 to 13, When None of the Boys Were in Gangs. Low is trajectory group s = 1, Medium is group s = 2, and Chronic is group s = 3.


[Figure 3 plots: twelve panels of side-by-side boxplots, Joiners versus Potential Controls. Panel titles: Violence Age 10, P=.0087; Violence Age 11, P=.0028; Violence Age 12, P=.044; Violence Age 13, P=.0032; Popularity, Age 11, P=.0015; Mother's Age (Age at First Birth), P=.020; Aggression, Age 11 (Peer Rated), P=.0081; Hyperactivity, Age 11 (Teacher Rated), P=0.017; Sex Partners, Age 13 (Number of Partners), P=.0024; Opposition, Age 11 (Teacher Rated), P=.034; IQ (IQ Test Measure), P=.89; Aggression, Age 11 (Teacher Rated), P=.27.]
Figure 3: Twelve Covariates Before Matching for Gang Joiners at Age 14 and for Potential Controls Who Did Not Join at Age 14. The P-value is from Wilcoxon's two-sided rank sum test.

[Figure 4 plots: three panels titled Group 1, Group 2, and Group 3; each shows boxplots of Prob(Join) (0.0 to 1.0) for Joiners and All Others.]
Figure 4: Estimated Propensity Scores by Trajectory Group. Low is trajectory group s = 1, Medium is group s = 2, and Chronic is group s = 3.


0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
Before Matching After-Groups 1 & 2 After-Group 1 After-Group 2
S
t
a
n
d
a
r
d
i
z
e
d

D
i
f
f
e
r
e
n
c
e

Figure 5: Absolute Standardized Differences in Means for Gang Joiners at Age 14
Versus Controls, Before and After Matching, For 15 Covariates. Low is trajectory group
s = 1 and Medium is group s = 2.
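
The quantity plotted in Figure 5 can be sketched in Python under one common definition: the absolute difference in group means divided by the square root of the average of the two group variances. This section does not state the exact denominator used, nor how means of matched controls are weighted, so both the formula and the function name below are assumptions, and the numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def abs_standardized_difference(joiners, controls):
    """Absolute difference in means in units of a pooled standard deviation.

    Assumption: the pooled SD is sqrt((s_joiners^2 + s_controls^2) / 2),
    using sample variances of the two groups passed in.
    """
    pooled_sd = sqrt((variance(joiners) + variance(controls)) / 2)
    return abs(mean(joiners) - mean(controls)) / pooled_sd


# Hypothetical covariate values, not data from the study.
d = abs_standardized_difference([2, 4, 6], [1, 3, 5])
```

A value near 0 indicates that the two groups have similar means on that covariate, which is the sense in which matching is said to balance it.
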

[Figure 6: four panels, each comparing Joiners, Matched, and Unmatched boys. Panel titles, with the vertical axis label in parentheses: Propensity Score (Propensity Score); Peer Rated Popularity (Popularity); Prob(Group 2 | 1 or 2) (Probability); Violence Age 11 (Violence).]

Figure 6: Four Covariates with Largest Initial Bias: (i) Group-Specific Propensity
Scores, (ii) Peer Rated Popularity, (iii) Conditional Probability of Trajectory Group
s = 2 Given Groups s = 1 or s = 2, and (iv) Violence at Age 11. For the 59 gang joiners
at age 14 and the 273 unmatched nonjoiners, these are conventional boxplots, whereas for
the 219 matched controls, the boxplot uses quartiles derived from the weighted empirical
distribution.
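
One way to obtain quartiles from a weighted empirical distribution is sketched below in Python. It assumes that each control in a matched set with k controls receives weight 1/k, so that every matched set contributes the same total weight; that weighting, the function names, and the toy matched sets are assumptions for illustration, since this caption does not spell out the weights.

```python
def control_weights(matched_sets):
    """Flatten matched sets of control values and weight each control 1/k.

    matched_sets: list of lists, one inner list of control values per joiner.
    Assumption: a set with k controls gives each control weight 1/k.
    """
    values, weights = [], []
    for controls in matched_sets:
        for value in controls:
            values.append(value)
            weights.append(1 / len(controls))
    return values, weights


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # Small tolerance guards against floating-point shortfall.
        if cumulative >= q * total - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical matched sets: one joiner with 1 control, one with 2 controls.
values, weights = control_weights([[3.0], [1.0, 5.0]])
quartiles = [weighted_quantile(values, weights, q) for q in (0.25, 0.5, 0.75)]
```

Weighting in this way keeps a joiner with many matched controls from dominating the control boxplot.
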

[Figure 7: four panels, Age 14, Age 15, Age 16, and Age 17, each comparing Joiners with Matched Controls. Vertical axis: Violence, from 0 to 10.]

Figure 7: Violence Outcomes: Violence Scores Ages 14 to 17 for 59 Gang Joiners at Age
14 and 219 Matched Controls. The boxplot for the matched controls is based on quartiles
from the weighted empirical distribution.



[Figure 8: eight panels, each comparing Joiners with Matched Controls: Age 14, Age 15, Age 16, and Age 17 for Group 1, then the same four ages for Group 2. Vertical axis: Violence, from 0 to 10.]

Figure 8: Violence Outcomes by Trajectory Group, Ages 14 to 17. For matched
controls, boxplots use quartiles from the weighted empirical distribution. Low is
trajectory group s = 1 and Medium is group s = 2.


[Figure 9: twelve panels, each comparing Joiners with Matched Controls: Age 14, Age 15, Age 16, and Age 17 for alpha=1, then for alpha=1/2, then for alpha=0. Vertical axis: Violence, from 0 to 15.]

Figure 9: Residuals from three noncompliance models, α = 1, 0.5, 0. The 59 joiners at
age 14 are compared to their 219 matched controls. Boxplots for controls use quartiles
from the weighted empirical distribution.
