
DOI: 10.1111/1745-9133.12530

DEBATE
EVIDENCE-BASED APPROACHES TO IMPROVING JUVENILE JUSTICE PROGRAMMING

Revisited: Effective use of the large body of research on the effectiveness of programs for juvenile offenders and the failure of the model programs approach

Mark W. Lipsey

Human & Organizational Development, Vanderbilt University, Nashville, Tennessee

Correspondence
Mark W. Lipsey, Department of Human and Organizational Development, Vanderbilt University, 230 Appleton Place, PMB 181, Nashville, TN 37203-5721.
Email: mark.lipsey@vanderbilt.edu

Abstract
Elliott et al.'s essay is part of what is now a series of papers in Criminology and Public Policy on contrasting ways that the evidence base on effective interventions with juvenile offenders can be used to improve juvenile justice programming. Their paper has two themes—(a) a defense of the model programs approach as promoted by the Blueprints for Healthy Youth Development evidence-based program registry, and (b) an attack on the Standardized Program Evaluation Protocol scheme for assessing programs for juvenile offenders and the meta-analysis on which it is based. I address both those themes in this commentary in the context of the broader cross-disciplinary conversation about ways to apply research evidence to improve the effectiveness of the programs used in actual practice.

KEYWORDS
Evaluation, evidence-based program registries, evidence-based programs, juvenile delinquency, juvenile justice system, juvenile offenders, juvenile recidivism, meta-analysis, model programs

The Elliott, Buckley, Gottfredson, Hawkins, and Tolan (2020) essay is part of what is now a series
of papers in Criminology and Public Policy on contrasting ways that the evidence base on effec-
tive interventions with juvenile offenders can be used to improve juvenile justice programming.
The first paper (Baglivio et al., 2018) reported research showing that higher scores on a quality of
service delivery measure for interventions with juvenile offenders in residential placements were
associated with lower recidivism. That quality of service delivery measure is part of a scheme for
assessing the likelihood that a local juvenile justice program will be effective for reducing recidi-
vism that is based on a large meta-analysis my colleagues and I have conducted. In a commentary
on the Baglivio et al. paper (Lipsey, 2018), I contrasted the way that assessment scheme (called
the Standardized Program Evaluation Protocol [SPEP]) uses evidence on program effectiveness
with the approach represented in the model programs listings in the Office of Juvenile Justice
and Delinquency Prevention Model Programs Guide, CrimeSolutions, and Blueprints for Healthy
Youth Development. In particular, I noted that the model programs approach had failed to have
much actual impact on programming in juvenile justice systems due to limited uptake. I also
argued that there were good reasons to question whether the supporting research for many of the
model programs in those registries was adequate to ensure that the programs would be effective
when scaled up and used in routine practice.
The authors of Elliott et al. include the Blueprints founder, the current Blueprints director,
and several members of the Blueprints advisory board. Their paper has two themes—a defense
of the model programs approach as promoted by Blueprints, and an attack on the SPEP and the
meta-analysis on which it is based. I will address both those themes, but first it is instructive to
recognize that the issues raised in this dialogue are part of a broader cross-disciplinary conversa-
tion about ways to apply research evidence to improve the effectiveness of the programs used in
actual practice.

1 THE EVIDENCE-BASED PRACTICE MOVEMENT

The evidence-based practice (EBP) movement has revolutionized medicine and has increasing
currency in such nonmedical areas as education, mental health, social work, and criminal jus-
tice. In medicine, important developments for evidence-based treatment include recognition of
the limitations of a single study and the need for a sufficient number and scope of clinical trials
before drawing conclusions about effective treatment (Ioannidis, 2008). The associated challenge
of integrating evidence across multiple trials has propelled meta-analysis forward as a systematic
way to accomplish that integration. An especially relevant development in the present context
has been increased recognition of a category of treatments referred to as “complex interventions.”
These are interventions with multiple constituent elements, and possibly other features that add
to their complexity, in contrast to relatively less complex pharmaceutical or surgical interventions
(Minary et al., 2019; Moore, Evans, Hawkins, Littlecott, & Turley, 2017). On this point the chal-
lenges for establishing evidence-based interventions in medicine and for social–psychological–
behavioral interventions (behavioral interventions for short) of the sort found in education, men-
tal health, criminal justice, and so on are in close alignment. Virtually all of the behavioral inter-
ventions in these latter areas are complex interventions.
One challenge for research addressing the “what works” question for complex interventions
is defining the “what” so as to adequately identify the intervention being evaluated and guide
replication if it proves effective. The most typical approach has been to define the whole treat-
ment package with all its complexity as a single distinct treatment protocol that is then described
in an accompanying manual. To help practitioners identify and select appropriate manualized
evidence-based programs, various curated registries have been developed, such as the U.S. Depart-
ment of Education’s What Works Clearinghouse, the EBPs Resource Center sponsored by the
Department of Health and Human Services’ Substance Abuse and Mental Health Services Admin-
istration, and those in criminal justice mentioned earlier.
The development of manualized evidence-based treatments was a big and important step for
facilitating the dissemination and use of evidence-based behavioral programs. Over time, how-
ever, conceptual and practical problems with this approach have emerged that are related to both
the research that provides the evidence and the transfer of the results into practice. At the heart of
these issues is the very nature of complex behavioral interventions. Their complexity makes it dif-
ficult for different implementations of what is ostensibly the same treatment protocol to be similar
in all the ways that matter for their effectiveness. Moreover, part of the complexity of behavioral
interventions is that they typically involve extensive interpersonal interactions between providers
and the individuals served, and their success can depend heavily on the skill of the providers and
the relationships established between them and their clients. This interactional, relational aspect
of behavioral programs has been recognized, for example, in the importance placed on the con-
cept of therapeutic alliance in mental health treatment (Ardito & Rabellino, 2011; Hogue, Dauber,
Stambaugh, Cecero, & Liddle, 2006). This aspect of behavioral programs is difficult to capture in
a treatment manual and replicate in practice.
A related conundrum for complex behavioral interventions is defining the boundaries for what
follows the protocol and what departures, or local adaptations, are large enough to constitute a dif-
ferent treatment that cannot be assumed to be equally effective. Manualized behavioral programs
demand fidelity to the protocol that defines what was tested in the supporting research, but the
fidelity standards themselves rarely have an empirical basis that establishes which of them, with what degree of adherence, are critical to producing the intended effects. In concept, every reason-
able but nontrivial departure constitutes a different program that requires its own evidence base
and manual. Even with attempts to follow a treatment protocol with fidelity, researchers have gen-
erally shown little inclination to study the “same” intervention in multiple settings with different
samples. The result is that most of the behavioral programs listed in the evidence-based program
registries are supported by very few studies, often only one or two.
When replication of behavioral programs is attempted, especially by independent investigators,
the results often fall short of what was reported in the initial research. Sobering examples of this
come from the recent tiered evidence initiatives. The U.S. Department of Education’s Education
Innovation and Research program, for instance, supported evaluation of programs with promising
initial evidence of effectiveness when they were implemented under conditions more typical of
routine practice. Only 13 of the 44 programs evaluated generated overall positive impact findings
with seven others showing mixed results (Lester, 2017). A similar pattern appeared in the tiered
evidence results for teen pregnancy prevention programs (Farb & Margolis, 2016). Only 12 of the
41 studies found overall positive effects on the key behavioral outcomes. For Tier 1 programs with
the strongest initial evidence of effectiveness, only four of 19 demonstrated overall positive effects.
All of this raises questions about the generalizability of the research findings for the manualized
behavioral programs in the various registries. Without strong evidence of generalizability, there
is little assurance that use of these programs by practitioners will produce the expected positive
outcomes.
For practitioners, there are other issues that inhibit enthusiasm for manualized behavioral pro-
grams. Members of the provider community tend to value their own expertise and have not shown
great eagerness to practice manualized programs by the book, especially when they did not cre-
ate the book and are not convinced that it fits their specific context and clientele. Given little
direct evidence supporting the fidelity standards of most such programs, and limited evidence of
the generalizability of the results of the original research across different implementations and
participant samples, these are not unreasonable concerns. When such programs are adopted,
providers must deal with the widely recognized struggle between strict fidelity to the protocol and
adaptations believed to make the program more appropriate to the local context (Bopp, Saunders,
& Lattimore, 2013). The result of these various factors has been limited uptake of the evidence-
based behavioral programs listed in the various registries and correspondingly limited penetra-
tion into actual practice in the respective service domains. It matters little how effective those
programs might be if they are not, in fact, widely used by the practitioners who serve the intended
beneficiaries of those programs.
This situation has led to a rather active exploration of alternative ways to make effective use
of the growing evidence base on behavioral programs. Of particular interest is the likelihood that
families of similar programs have some common characteristics that can guide effective prac-
tice without requiring emulation with fidelity of any one specific program (e.g., Barth et al.,
2012; Feucht & Tyson, 2018). Weisz and Chorpita (2012), for instance, have suggested that psy-
chotherapy can be viewed as consisting of modules—freestanding procedures that address specific
clinical issues and are sequenced into the full treatment regimen; for example, for self-calming,
modifying negative cognitions, increasing compliance with instructions, and the like. In a some-
what similar vein, Chorpita and his colleagues (e.g., Chorpita & Daleiden, 2009) have extracted
what they call practice elements, discrete treatment techniques or strategies, from hundreds of
treatment protocols for children’s behavioral health programs that outperform comparison con-
ditions in randomized trials; for example, goal-setting, modeling, and therapist praise/rewards.
These practice elements are organized in an online system that supports their flexible clinical use
(www.practicewise.com/).
Other formulations of the active ingredients of effective programs that might be identified
and used to guide practice include, for example, core components (Blase & Fixsen, 2013; National
Academies of Sciences, 2020), core elements (Hogue et al., 2017), kernels (Embry & Biglan, 2008),
change mechanisms (Doss, 2004), behavior change techniques (Michie et al., 2013), and quality indi-
cators (Smith et al., 2012). A particularly important influence on the effort to identify characteristics distinctive
to effective programs has been the rise of meta-analysis, especially when used to systematically
study variation across programs in relationship to the observed effects (e.g., Dymnicki, Trivits,
Hoffman, & Osher, 2020; Tanner-Smith & Grant, 2018; Van Ryzin, Roseth, Fosco, Lee, & Chen,
2016; Wilson, Lipsey, Aloe, & Sahni, 2020).
These various conceptions and methods are generally presented as complements to the model
program approach rather than as competitors. The common theme, however, is to make use of
evidence about program effectiveness in a way that is more flexible than the model program
approach, more efficient in characterizing effective practice than multiple manuals, and more
responsive to the problems and limitations that have arisen for that approach.

2 MODEL PROGRAMS IN JUVENILE JUSTICE

2.1 Potential for broad impact

The broader themes described above about issues and developments in the EBP movement are
reflected in the current exchanges about evidence-based programs in juvenile justice. The Lipsey
(2018) critique of the manualized model program approach had two main themes. The first was
about the limited evidence that these programs are capable of having broad impact on the out-
comes for the juveniles served in a juvenile justice system. The points made were as follows:
∙ There are relatively few manualized programs that target recidivism outcomes for juvenile
offenders that have qualified for inclusion in the registries.
∙ The programs that are listed represent a limited repertoire relative to the range of criminogenic
needs juvenile justice systems attempt to address.
∙ The evidence for the listed programs often consists of few studies, and those are typically con-
ducted by the program developers under circumstances that may not be representative of actual
practice.
∙ Attempts to replicate the research supporting a program are relatively rare, especially with an
independent evaluator, and when done the results often fail to confirm the original positive
effects.

Elliott et al.’s response to these points acknowledges that there are relatively few programs
in the registries that target recidivism outcomes for juvenile offenders, with only four such pro-
grams that have qualified for model program status in the Blueprints registry. All four of those
programs address family issues with family therapy and/or parent training among other compo-
nents, but only two (Functional Family Therapy [FFT] and Multisystemic Therapy [MST]) are for
general use with juvenile offenders; the other two are specialized for sex offenders (MST-Problem
Sexual Behavior) or foster care (Treatment Foster Care Oregon). Despite this narrow repertoire,
Elliott et al. argue that these programs are sufficient to have large system level effects when imple-
mented by juvenile justice agencies. That claim assumes that these programs are universally effec-
tive treatments for any criminogenic need presented by juvenile offenders, including for example
mental health, educational, and vocational needs, though no evidence is cited that demonstrates
such scope.
Otherwise, Elliott et al. report that these programs are exceptions to the Lipsey critique. All
have multiple randomized studies, including some by independent evaluators, and all have been
implemented in numerous states and juvenile justice systems with evidence that adequate fidelity
can be attained at that scale. However, the authors do not directly address the evidence cited
in Lipsey (2018) indicating that FFT and MST do not always perform as well as expected when
implemented in practice. FFT has been shown to produce negative effects if not implemented
with sufficient fidelity, as may happen in practice (Barnoski, 2002, 2004; Sexton & Turner, 2010).
And a study of FFT in the United Kingdom found no effects on delinquency outcomes (Humayun
et al., 2017). Further, a recent meta-analysis of MST (van der Stouwe, Asscher, Stams, Dekovic,
& van der Laan, 2014) found a modest mean effect size for the 11 studies reporting delinquency
outcomes with juvenile offenders (0.16) and, over all study samples, a dramatically smaller mean
effect size on delinquency outcomes for the 12 studies conducted by independent researchers than
for the eight studies conducted by researchers associated with the program developers (0.08 vs.
0.42).
Elliott et al. provide a general characterization of this situation as follows: “. . . available evi-
dence indicates that when programs are continually monitored for fidelity with a high level of devel-
oper involvement, as is the case for the four Blueprint juvenile justice model programs, scale-up
fidelity can be maintained and positive program effectiveness achieved” (emphasis added). There
is little doubt that these few model programs, especially the more generally applicable FFT and
MST, have the strongest research base by far among programs for juvenile offenders. On balance,
however, the full body of available evidence does not make a convincing case that these programs
can be implemented and sustained in routine practice by juvenile justice personnel (i.e., without
a “high level of developer involvement”) in a way that can be confidently expected to have a large
overall impact on the recidivism outcomes of the juveniles served in a juvenile justice system.
A particularly revealing assessment of the system level impact of introducing model programs
into routine practice would be a comparison of the recidivism rates for the population of juve-
nile offenders before that introduction and afterwards. To be representative of actual practice,
the model programs would have to be implemented and sustained on a routine basis by juvenile
justice personnel without any atypical levels of external support. As with any time series analy-
sis, other coinciding and confounding factors would have to be taken into account. Elliott et al.
basically agree with this proposition: “The best evidence for the impact of evidence-based pro-
grams on juvenile justice systems is the population level reductions in recidivism and cost savings
achieved in statewide implementations of evidence-based programs.” They then cite three exam-
ples of such evidence from statewide implementations—the 1997 Community Juvenile Justice Act
in Washington state, Project Redirection in Florida, and the Research-Based Programs Initiative
in Pennsylvania.
None of these examples stands up to close scrutiny. The 1997 Washington state reform gave
judges EBP options and, in an initial assessment, quasi-experimental comparisons were made
between juveniles assigned to MST, FFT, or Aggression Replacement Training (ART) versus practice as usual. Only a relatively small
proportion of the total juvenile offender population was represented in those comparisons, but
for them overall positive effects on recidivism were found for FFT and ART, but there were neg-
ative effects for one of the two counties using MST and for the instances where service delivery
was deemed “not competent” for FFT and ART (Barnoski, 2004). Implementation with fidelity in
actual practice perhaps is a problem after all. The claim of a 10% statewide reduction in recidivism
cited by Elliott et al. is not found in the reports on this initiative. However, a later report (Knoth,
Wanner, & He, 2019) provides an analysis of statewide recidivism rates from 1995 to 2014. The
demographic and offense type mixes changed over that period, but disaggregated by the respec-
tive subgroups, there is a downward trend in recidivism across the full 1995–2014 period with no
evident discontinuity associated with the onset of the 1997 EBP initiative.
Project Redirection in Florida was a statewide initiative, but also involved only a small pro-
portion of the juvenile offender population. Judges were given the option at their discretion to
“redirect” any youth deemed appropriate for residential confinement to one of three community-
based EBPs instead (MST, FFT, and Brief Strategic Family Therapy). The evaluation of the effects
on recidivism did not involve randomization but, rather, attempted to match youth selected for
EBPs with similar residential confinement youth on prior history and demographic variables, with
risk assessment scores added to a later analysis (Justice Research Center, 2013; OPPAGA, 2010).
Given the judicial discretion involved in selecting youth for redirection, this comparison is highly
vulnerable to selection bias, the very kind of design Elliott et al. rail against in other contexts. The
youth redirected to EBP may or may not have actually been sent to residential confinement if the
EBP option was not available, and the youth in residential confinement, expected to generally
have higher recidivism rates, may or may not have been matched to the redirected cases on all
relevant variables. The redirected youth were indeed found to have lower recidivism rates, but
the extent to which that was because of the EBP effects or lower initial recidivism risk not being
fully accounted for by the matching is an open question.
The Research-Based Programs Initiative in Pennsylvania, begun in 1998, was a $60 million
investment in programs from the Blueprints registry. Most of those were prevention programs;
only two (MST and FFT) received referrals from the juvenile justice system, often mixed in with
referrals from other sources. The evaluation of the effects of these programs did not involve con-
trolled designs but, rather, relied on before-and-after ratings of improvement on various dimensions
or “success” indicators, such as not being arrested after treatment, with no comparison to com-
parable youth who did not participate in the program (Meyer-Chilenski, Bumbarger, Kyler, &
Greenberg, 2007). As with the other statewide initiatives cited by Elliott et al., this initiative does
not provide a convincing demonstration of statewide EBP impact on the recidivism of the popu-
lation of juvenile offenders.

2.2 Uptake of evidence-based programs

The second theme of the Lipsey (2018) critique was that, even if the certified model programs
are effective for reducing recidivism when well implemented in a juvenile justice system, there
are obstacles to their uptake that have resulted in limited penetration into those systems. The
nature of those obstacles is well known. Providers may be reluctant to abandon a program they
believe to be effective in order to adopt a certified EBP, and they may doubt that it fits the local
situation and clientele well enough to be more effective than their current program. The strict
fidelity requirements for EBPs may be seen as unwelcome restraints on clinical judgment and the
ability to make what they see as needed adaptations to the program. Local programs typically have
reputations, history, and political support that make decision makers wary about disrupting them,
and there are significant costs associated with licensing and implementing most of the certified
EBPs.
Whether for these reasons or others, the available evidence indicates a very small presence of
EBPs from any of the relevant registries in juvenile justice systems. Lipsey (2018) reported the
results of an inventory of all the therapeutically oriented programs provided to juvenile offenders
in 10 juvenile justice systems. Of a total of 1,087 programs, only 79 (7.3%) were listed in Blueprints,
the OJJDP Model Programs Guide, or CrimeSolutions. Elliott et al. counter that the four model
Blueprints programs have been implemented in numerous states (ranging from 13 to 34) and serve
a total of about 35,800 youth annually (a number that includes some international participants).
Those are impressive numbers, but according to the OJJDP Statistical Briefing Book the total num-
ber of juveniles who received a juvenile court sanction in 2018, the last year with available data,
was 505,400. The estimate of the proportion of those youth served annually by the Blueprint pro-
grams is thus about 7.1% of the total number supervised under juvenile justice auspices.
Blueprints was launched in 1996. Thus, over its 24-year history it has identified four model pro-
grams for reducing the recidivism of juvenile offenders, two of which are rather specialized, and
the current uptake of those programs by juvenile justice systems, whether measured by proportion
of programs or proportion of juveniles served, is remarkably modest. Even if every implementa-
tion of each of these programs had large positive effects on the youth served, the overall system
impact would necessarily be modest. Elliott et al. argue that it typically takes decades for innova-
tive programs to be widely adopted. One might be forgiven for wondering how many more years
it will take on top of the 24 already invested for Blueprint programs, and whether in the mean-
time there are other ways of using the body of evidence on effective programs that might produce
results at a faster pace.

3 SPEP

3.1 The meta-analysis

Elliott et al.’s misrepresentation of the SPEP and the meta-analysis on which it is based is so exten-
sive that it is more efficient to simply provide an accurate description than to respond to every
mischaracterization. The foundational meta-analysis built on previous work and has a scope that
includes all the research reports that could be located through a vigorous search and that described
studies using comparison group designs, rehabilitation-oriented interventions with juvenile offend-
ers, and recidivism outcomes (along with some other more specific eligibility requirements; see
Lipsey, 2009, and the references there to earlier reports on this meta-analysis). The objective of
this meta-analysis was not to estimate mean effect sizes for any intervention or for all of them
together—that has been done in earlier work by me and others (Lipsey & Cullen, 2007)—but,
rather, to investigate the diversity represented in the full body of eligible research, that is, vari-
ances rather than means. The first revelation that emerged from this effort was the sheer size of
this body of research. For the period from 1958 to 2002, 548 independent samples from 361 eligible
study reports were identified. Recent updates, not yet fully analyzed, have pushed the number
of comparisons with independent samples to over 700. This raises an immediate question about
what can be learned from the full body of intervention studies that might not be evident from the
proportionately few that appear in the relevant registries.
The methodological quality of these studies was, of course, a prime issue, and one that required
careful handling. The perspective represented in this meta-analysis is, first, that there needs to be
some balance between internal validity and external validity—the latter because the main interest
is application to real world situations (Loyka et al., 2020). A predominant focus on internal validity
would privilege low-attrition randomized studies. However, such studies are more likely to be
conducted by the program developer, with the program often also implemented by the developer,
and to apply atypical restrictions to the treatment sample to ensure that they are appropriate and
minimize attrition. Such studies generally yield larger effect sizes, but those effects are not likely
to represent what would actually be found in practice with the respective interventions (Lipsey,
2003; Petrosino & Soydan, 2005; Wolf, Morisson, Inns, Slavin, & Risman, 2020).
Second, this meta-analysis took an empirical approach to the question of methodological qual-
ity, something possible because of the large number of diverse studies. Nonrandomized compari-
son studies, for example, are vulnerable to selection bias but that does not necessarily mean such
bias occurs. For studies with juvenile offenders, prior delinquency history and key demographic
variables with relatively high correlations with recidivism outcomes are often available as statis-
tical control or matching variables. If such nonrandomized studies yield effect sizes comparable
to those from randomized studies after adjustment for any other differences (e.g., nature of the
samples, type of intervention), then as an empirical matter they are not showing any systematic
bias and there is no obvious reason to exclude them from the meta-analysis.
This empirical approach is also revealing with regard to the potential for bias in randomized
studies. Randomization of small samples (meaning less than many hundreds) has a nontrivial
probability of resulting in problematic baseline differences on key variables, a situation known
as “unhappy randomization.” Differential attrition from the outcome measures can also intro-
duce bias into even a well randomized comparison. Further, those outcome measures themselves
can be a source of variation that makes some interventions look erroneously less effective. There
are systematic differences in the effect sizes for arrest versus conviction versus institutionaliza-
tion recidivism measures and for shorter versus longer posttreatment recidivism intervals. In this
meta-analysis a technique was developed and applied to standardize the outcome measures to the
equivalent of 12-month rearrest recidivism in order to eliminate this source of variation.
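To make the general idea of such a standardization concrete, the sketch below shows one generic regression-based adjustment using invented data; it illustrates the logic rather than the specific technique developed for this meta-analysis, and all data values and variable names are hypothetical.

```python
# Illustrative sketch only: a generic regression-based way to express effect
# sizes on a common reference metric (12-month rearrest), not the specific
# technique developed for the meta-analysis. Data and names are hypothetical.
import pandas as pd
import statsmodels.api as sm

studies = pd.DataFrame({
    "effect_size": [0.10, 0.25, 0.05, 0.30, 0.18, 0.22, 0.08, 0.15],
    "measure": ["arrest", "conviction", "institutionalization", "arrest",
                "conviction", "arrest", "institutionalization", "arrest"],
    "followup_mo": [6, 12, 24, 12, 18, 12, 6, 24],
})

# Dummy-code the recidivism measure with arrest as the reference category and
# center follow-up length at 12 months, so the intercept corresponds to the
# reference outcome: 12-month rearrest.
X = pd.get_dummies(studies["measure"]).drop(columns=["arrest"])
X["followup_minus_12"] = studies["followup_mo"] - 12
X = sm.add_constant(X.astype(float))

fit = sm.OLS(studies["effect_size"], X).fit()

# Remove the estimated contribution of measure type and follow-up interval from
# each observed effect size, leaving a 12-month-rearrest-equivalent value.
adjustment = fit.fittedvalues - fit.params["const"]
studies["es_12mo_arrest_equiv"] = studies["effect_size"] - adjustment
print(studies[["effect_size", "es_12mo_arrest_equiv"]])
```

In the actual database such an adjustment would, of course, also need to weight studies by the precision of their effect size estimates; the point here is only the general mechanics of putting disparate outcome measures on a common reference metric.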
Given this empirical perspective on methodological quality, the eligibility criteria for this meta-
analysis were rather broad, requiring randomization (42% of the studies), statistical or procedu-
ral matching (28%), or comparisons between treatment and comparison groups on baseline vari-
ables that included known correlates of recidivism such as prior delinquency history or risk, age,
sex, and race (30%). Across all these designs a composite variable was created that combined the
baseline difference data into an overall index of the magnitude of the initial differences (even
for randomized studies) with missing values on the variables in that composite estimated with
maximum likelihood imputation. Not surprisingly, when used as a covariate in the analyses, this
variable turned out to have the largest independent relationship to effect size from among the
methodological variables. With that composite variable controlled, there was no systematic dif-
ference in the effect estimates for randomized and matched studies and only a small difference
between randomized and unmatched baseline-difference-only studies. This composite variable,
dummy codes for the three categorical design types, and several other potentially influential vari-
ables were included as covariate control variables in all analyses. The logic of this approach is
that if there are no empirical differences in the effect estimates once they are statistically controlled
in this way, there is little room for variation in methodological "quality" to be a consequential
influence on the substantive results of the analyses.
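A minimal sketch of that empirical check, using simulated data and far fewer covariates than the actual analyses, is shown below: effect sizes are regressed, with inverse-variance weights, on design-type dummies and the composite baseline-difference index, and design-type coefficients near zero are the empirical signature that the nonrandomized designs are not introducing systematic bias. All variables and values are hypothetical.

```python
# Minimal sketch of the empirical approach to methodological quality described
# in the text: does design type retain any independent relationship to effect
# size once the composite baseline-difference index is controlled? Data invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    # randomized is the reference design; dummies mark the other two design types
    "matched": rng.integers(0, 2, n),
    "baseline_only": 0,
    # composite index of baseline differences between groups (larger = worse)
    "baseline_diff_composite": rng.normal(0, 1, n),
    # inverse-variance weights reflecting each study's precision
    "weight": rng.uniform(5, 50, n),
})
df.loc[df["matched"] == 0, "baseline_only"] = rng.integers(0, 2, (df["matched"] == 0).sum())
# Simulated effect sizes: related to baseline differences, not to design per se
df["effect_size"] = 0.20 + 0.10 * df["baseline_diff_composite"] + rng.normal(0, 0.15, n)

X = sm.add_constant(df[["matched", "baseline_only", "baseline_diff_composite"]].astype(float))
fit = sm.WLS(df["effect_size"], X, weights=df["weight"]).fit()
# Design-type coefficients near zero, with the composite controlled, indicate
# that the nonrandomized designs are not biasing the effect estimates.
print(fit.params)
```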
By intent, this meta-analytic database included a wide variety of intervention modalities. These
were sorted into categories representing recognizable approaches in the world of practice such
as counseling variants, cognitive behavioral programs, social skill building, vocational training,
and the like. We refer to these categories as generic program types. The terminology is important
here—these are not programs or interventions per se, simply families of programs (“types”) that
are functionally similar. One important finding relates to the individual programs within these
categories. There we distinguish name-brand programs such as MST, FFT, ART, Reasoning and
Rehabilitation, and so on, which are disseminated beyond any one implementation and may be found
on one of the EBP registries, from functionally similar homegrown and one-off programs that we
refer to as generic programs, in the same sense as name-brand and generic store-brand goods at the
supermarket.
Two interesting findings emerge from these distinctions. First, there are some program cate-
gories (generic program types) in which there are virtually no name-brand programs (e.g., indi-
vidual counseling and group counseling) but for which there are nonetheless many evaluation
studies. The meta-analysis data, therefore, allows the evidence base and effect sizes for these pro-
grams to be represented and reported, with their implications for practice, although no programs
of that type appear in any EBP registry. Careful interpretation of that evidence base (more on this
later) justifies the use of programs of these types despite their absence from EBP registries if they
suit the purposes of a juvenile justice system, for example, to match needs or make good use of
local resources.
Second, within some of the program categories, we find both name brand and generic pro-
grams represented. As an example, within the family therapy category MST and FFT are included
along with generic family therapy programs clearly recognizable as family therapy but not a brand-
name version. The cognitive behavioral therapy program category provides a similar mix—various
name-brand programs like Thinking for a Change, Moral Reconation Therapy, ART, and Reason-
ing and Rehabilitation along with no-name generic cognitive behavioral therapy programs. In both
cases, the distribution of recidivism effect sizes across all the programs in the respective category
is broad, ranging from negative values to rather large positive values (with positive values coded to
indicate lower recidivism), but skewed in a strong positive direction. The striking feature of these distributions is that
there is little differentiation between the effect sizes for the name-brand programs and those for
the generic versions of that same program type, something that has been found by others as well
(e.g., Wampold et al., 2011; Weisz, Jensen-Doss, & Hawley, 2006). Most of the name-brand pro-
grams appear on one or another of the EBP registries as programs certified to be effective. That
rather implies that they have some distinctive characteristics captured in their treatment manuals
that generic versions of that same intervention modality do not have, and which thus make them
distinctively effective. The implication of the meta-analysis findings, however, is that generic ver-
sions of family therapy or CBT can be as effective as the name-brand EBP versions. This is good
news for juvenile justice agencies with local family therapy or CBT programs.
The wide distribution of effects, however, indicates that more is required for one of these
programs to be effective than simply using a family therapy or CBT program modality. For
name-brand programs, that something is fidelity. Poorly implemented family therapy or CBT can
have negative effects, as, for example, the Washington state evaluations of FFT and ART show
(Barnoski, 2004). For the generic programs, there are empirically identifiable program character-
istics associated with larger positive effects that can be used much like fidelity measures to guide
implementation toward effectiveness.
Indeed, the main thrust of the meta-analysis is to identify, across the great heterogeneity
deliberately represented in the database, the program characteristics that are associated
with larger positive outcomes on recidivism. To this end, a great variety and number of program
characteristics are coded from the respective study reports into variables that can be used analyti-
cally. These include the methodological control variables described earlier, but also details about
the nature of the program participant samples, the intervention modality, the amount of treat-
ment provided, the service delivery format, quality monitoring, the juvenile justice context, and
the like—approximately 150 items with which our coders interrogate the source research reports.
These variables are then used to build statistical models aimed at predicting recidivism effect
sizes, meta-analytic versions of multiple regression models that take into account the varying
sample sizes and the associated statistical precision of the effect estimates. Making a long story
short, a small set of robust predictors emerges from these analyses. Those predictors, some of
which combine multiple related variables, represent the treatment modality (generic program
type), the quality of service delivery, the amount of treatment (dose), and the risk level of the
participating juveniles.
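In schematic form, and with notation chosen here for illustration rather than taken from the actual model specifications, these prediction models have the general shape of an inverse-variance weighted meta-regression:

$$
ES_i = \beta_0 + \boldsymbol{\beta}_{T}^{\top}\mathbf{T}_i + \beta_{Q} Q_i + \beta_{D} D_i + \beta_{R} R_i + \boldsymbol{\gamma}^{\top}\mathbf{M}_i + \varepsilon_i,
\qquad w_i = 1/v_i,
$$

where $ES_i$ is the recidivism effect size for independent study sample $i$, $\mathbf{T}_i$ is a vector of dummy codes for the generic program type, $Q_i$ is the rated quality of service delivery, $D_i$ is the amount of treatment (dose), $R_i$ is the risk level of the participating juveniles, $\mathbf{M}_i$ collects the methodological control variables (the baseline-difference composite, design-type dummies, and so on), and each study is weighted by $w_i$, the inverse of the sampling variance $v_i$ of its effect size estimate.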
While these predictors collectively do not account for an especially high proportion of the very
large variation in the overall distribution of recidivism effect sizes, they are sufficient to identify
a set of program characteristics associated with better than average effects. That is, in an overall
distribution of recidivism effect sizes with a positive mean of nontrivial magnitude, there are pro-
files on this set of characteristics that identify programs with effects above that mean and, indeed,
some profiles that identify programs with effects well above that mean. This applies to the overall
effect size distribution, but also to the effect size distributions for any generic program type, for
example, the family therapy and CBT distributions described above.

3.2 The SPEP assessment scheme

So, what can you do with a statistical model that predicts reduced recidivism among juvenile
offenders from the characteristics of intervention programs, and how might it be use-
ful to juvenile justice practitioners? Consider an analogy. Longitudinal panel studies have found
characteristics of juvenile offenders that are predictive of later recidivism, for example, age of first
arrest, antisocial peers, substance abuse, and the like. None of these have been tested in random-
ized studies to establish their causal relationship to recidivism—it would hardly be practical or
ethical to randomly assign youth to be arrested earlier or later, to hang out with antisocial peers or
not, etc. Empirically, however, these characteristics are predictive, and whether they have direct
causal links to recidivism or are proxies for other variables with which they are correlated that
have such direct links, they can nonetheless be used to distinguish youth at greater or lesser risk of
recidivism. On that basis, they can be organized into a risk assessment instrument that will have
utility for identifying juvenile offenders with different risk of recidivism. The key index of the
quality of such an instrument is predictive validity—the extent to which it differentiates youth
who then go on to actually recidivate at different rates. Moreover, the predictive validity of a risk
assessment can be well short of 100% and still be useful for practical purposes. It helps juvenile
justice personnel improve the odds of identifying appropriate youth for more or less intensive
supervision and treatment, and typically does a better job of that than those personnel can do on
the basis of their clinical judgment.
The SPEP assessment builds on that same logic. The meta-analysis has identified a set of
program characteristics empirically associated with better recidivism outcomes for the juveniles
served. That information has then been integrated into a rating scheme designed to differenti-
ate programs that are more or less likely to be effective in reducing the recidivism of the youth
served. The predictions made by those ratings are far from perfect, but if they have sufficient pre-
dictive validity, they can help juvenile justice agencies distinguish programs expected to be more
effective from those expected to be less effective. Most important, the predictive characteristics
are malleable so the SPEP assessment can also be used diagnostically to identify ways programs
might improve their expected effectiveness.
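As a purely hypothetical illustration of how such a rating scheme can work, the sketch below sums points over the four predictive dimensions to produce a single program score; the point allocations, input scalings, and example values are invented placeholders, not the actual SPEP scoring rules.

```python
# Hypothetical illustration of how a SPEP-like rating could combine the four
# predictive dimensions into a single score. The weights and maximum points
# shown here are invented for illustration; they are NOT the actual SPEP values.
from dataclasses import dataclass

@dataclass
class ProgramProfile:
    type_effect_rank: float      # 0-1: relative mean effect of the generic program type
    quality_rating: float        # 0-1: rated quality of service delivery
    dose_fraction: float         # 0-1: proportion of the target amount/duration delivered
    mean_risk_percentile: float  # 0-1: average risk level of the youth served

def spep_like_score(p: ProgramProfile) -> float:
    """Sum points across dimensions on a hypothetical 100-point scale."""
    return round(
        35 * p.type_effect_rank         # program type associated with larger effects
        + 25 * p.quality_rating         # quality of service delivery
        + 20 * p.dose_fraction          # amount of treatment relative to target dose
        + 20 * p.mean_risk_percentile,  # higher-risk youth tend to show larger effects
        1,
    )

example = ProgramProfile(type_effect_rank=0.8, quality_rating=0.6,
                         dose_fraction=0.9, mean_risk_percentile=0.5)
print(spep_like_score(example))  # e.g., 71.0 on the hypothetical scale
```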
The critical question is whether the SPEP assessment, with its relatively simple and general set
of constituent program characteristics, has sufficient predictive validity to be useful at a practi-
cal level. Several validation studies have been conducted, but they are difficult to do. Because the
SPEP rates programs, a relatively large number of programs must be represented in the valida-
tion sample to yield statistically stable results. Also, the recidivism outcomes to be predicted are
confounded with the initial recidivism risk level of the clientele served by each program and mul-
tivariate statistical techniques must be used to disentangle that confound from the relationship
between SPEP scores and recidivism.
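A deliberately simplified, two-step picture of that adjustment problem is sketched below with invented data: program-level recidivism is first residualized on the risk level of each program's clientele, and the residuals are then correlated with SPEP scores. The published validation studies use more elaborate multivariate models, so this is only a schematic of the logic.

```python
# Simplified two-step sketch of "risk-adjusted recidivism": remove the part of
# each program's recidivism rate predicted by the risk level of its clientele,
# then ask whether SPEP scores relate to what is left over. Data are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_programs = 80
mean_risk = rng.normal(0, 1, n_programs)      # average risk score of youth served
spep_score = rng.uniform(30, 90, n_programs)  # overall SPEP rating per program
# Simulated recidivism: driven up by risk, driven down (weakly) by SPEP score
recidivism = 0.30 + 0.08 * mean_risk - 0.001 * spep_score + rng.normal(0, 0.05, n_programs)

# Step 1: regress recidivism on risk and keep the residuals (risk-adjusted recidivism)
risk_model = sm.OLS(recidivism, sm.add_constant(mean_risk)).fit()
risk_adjusted = risk_model.resid

# Step 2: correlate risk-adjusted recidivism with SPEP scores; a negative
# correlation indicates higher-rated programs have lower-than-expected recidivism
r = np.corrcoef(spep_score, risk_adjusted)[0, 1]
print(f"correlation between SPEP score and risk-adjusted recidivism: {r:.2f}")
```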
A predictive validity study of the initial pilot version of the SPEP was completed in North Car-
olina in 2007 with 50 community programs serving relatively low-risk juveniles and 113 programs
serving somewhat higher-risk court-supervised youth (Lipsey, Howell, & Tidd, 2007). That study
showed encouraging and statistically significant relationships between risk-adjusted recidivism
and SPEP scores for program type and amount of service. This was followed by two validation
studies in Arizona, the first of which I conducted (Lipsey, 2008) with a sample of 66 programs.
Overall SPEP scores showed statistically significant and relatively strong relationships with the
risk-adjusted 12-month rearrest recidivism outcomes for the juveniles served in those programs,
and the overall and component scores all showed relationships in the expected direction for both
6- and 12-month recidivism, but not all reached statistical significance. An independent valid-
ity study was then conducted in 2010 with a larger Arizona sample of 90 programs (Redpath &
Brandner, 2010). This study also showed correlations in the expected direction between the overall
and component SPEP scores and risk-adjusted recidivism for 6- and 12-month arrest recidivism,
though not all reached statistical significance.
The version of the SPEP tested in Arizona lacked the quality of service delivery component,
which had not yet been developed and incorporated into the scoring at the time of the validation
studies. As mentioned earlier, a separate analysis of the relationship of the ratings on only that
component with risk-adjusted recidivism was conducted with Florida data from 56 residential
programs (Baglivio et al., 2018). Statistically significant odds ratios were found for the relationships
with risk-adjusted rearrest, reconviction, and reincarceration with the respective odds ratios all
about 0.89.
The most recent validation study was conducted in Pennsylvania with 162 cohorts of juvenile
offenders served by SPEP-rated programs (Mulvey, Schubert, Jones, & Hawes, 2020). Recidivism
was indexed as 6- and 12-month adjudication or conviction, but the overall base rates on these
measures were quite low, 9% and 23%, respectively. This provided a restricted range with limited
variation for correlation with SPEP scores. The overall pattern of that relationship was generally in
the expected direction, but not strong or statistically significant. However, there were significant
relationships for some of the component scores with 12-month risk-adjusted recidivism.
The most distinctive feature of the Pennsylvania study was the availability of data on successive
SPEP scores for 38 programs that made possible an analysis of the effects of change over time. Of
those 38, 17 showed no or negative changes in their SPEP scores between the first rating and the
subsequent one, while the remaining 21 showed a positive shift. The difference scores for those
changes were significantly correlated with 6-month risk-adjusted recidivism, but not 12-month.
There were data problems acknowledged by the authors in this study, and the low recidivism base
rates restricted the latitude for improvement, but the overall pattern of relationships was gen-
erally in the right direction, though not always statistically significant. Perhaps most important,
this study provided the first bit of evidence that program changes that improve SPEP scores may
also improve the effects of those programs on recidivism.

3.3 Response to the Elliott et al. critique

Much of the misrepresentation of the SPEP and the meta-analysis on which it is based in
Elliott et al.'s characterization should be evident in comparison to the summary above. A few
points, however, warrant a separate response. First among these is the bias Elliott et al. build into
their critique with their a priori definition of a “program” as “a coherent package of activities
with defined delivery protocols, implementation manuals, training and technical assistance that
implement an identified logic model . . . ” In other words, only interventions that meet Blueprint
standards are true programs; all others are mere practices.
Consider, for example, family therapy delivered by social workers trained in FFT and guided
by the FFT instruction manual. In the same juvenile justice jurisdiction, a group of social workers
trained in family systems theory use a family therapy approach broadly similar to FFT in their
work with similar juvenile offenders and their families. Elliott et al. would have you believe that
only the FFT version of family therapy is a real program. That will come as quite a surprise to the
providers of the more eclectic family therapy. They might rightly view this as simply the arrogance
of academic researchers—if they didn’t create and bless the program, it’s not a real program. The
attempt by Elliott and his coauthors on the Blueprints advisory board to restrict the definition of
a “program” to those on the Blueprints registry and their kin, and denigrate everything else as a
mere practice, is not consistent with the breadth of the endeavors widely recognized as programs
for juvenile offenders throughout juvenile justice.
A related feature of the Elliott et al. claims about what constitutes a program is their insistence
that a logic model, or some analogous specification of the causal mechanism underlying the
intervention, is required. Omitted is any explanation of why a program fails to be a program if that
logic is not made explicit. A credible randomized experiment can establish the causal relationship
between participation in a program and the resulting outcome. And no program provider lacks
what is at least a tacit theory of action that guides program activities. Concocting an explicit logic
model that is very rarely subjected to a direct test does not confer any obvious status
on the associated program. Widespread rigorous testing of candidate causal mechanisms would
indeed advance understanding of how effective programs work, help design better programs, and
support more empirically based fidelity standards, but the logic models that accompany the typ-
ical researcher-developed program are generally little more than informed speculation. And, if
an intervention is demonstrated by credible research to reliably cause beneficial outcomes for the
participants, that is, if it “works,” it has practical value and utility even if that happens prior to
any empirically based explanation of how and why it works.
On another point, Elliott et al. assert that any program found in an evaluation to produce a nega-
tive effect should be viewed as iatrogenic and dropped from consideration as an effective program.
However, they exempt Blueprint programs in instances where, for example, poor implementation
of FFT yields negative effects. This perspective is also extended to studies of programs in a
related family, for example, variations on CBT, that show negative effect sizes, such that
CBT as a general treatment approach is discredited. In both instances, this is a rather simplistic
interpretation of negative effects. The critical question is why there were negative effects and the
associated implications for practice. If a manualized program, or multiple programs in a related
family, show consistent negative effects (as with the multiple studies of various Scared Straight
type prison visitation programs), it is appropriate to conclude that they are categorically harmful.
If the negative effects are shown to stem from some avoidable source, however, and the effects
are otherwise mainly positive, that has different implications. This is the case with the negative
FFT studies in which FFT has been implemented with poor fidelity. The appropriate response is
to avoid poor fidelity, not condemn FFT to the scrap pile. Similar considerations apply to a distri-
bution of effect sizes for variants on a program approach such as family therapy or CBT. Negative
effects can occur by chance even for truly effective programs, or can appear because of biased
research designs, poor implementation, inappropriate participant samples for the nature of the
program, and other such reasons. If the general pattern skews positive, and the negative effects
are associated with flukes or avoidable circumstances, it is not a correct conclusion to deem the
program or family of programs iatrogenic and thus categorically harmful.
Finally, there is a point Elliott et al. make that I don’t want to lose in this discussion. That is the
importance of assessing the system level effects on outcomes for the population of juveniles served
as reforms are implemented, initiatives taken, model programs adopted, SPEP implemented, and
so on. System level data are increasingly available and the analytic techniques for linking trends
and changes in outcomes with operational practices over time are relatively advanced. The ulti-
mate EBP is a system-level integration of multiple workable and effective practices that produce
system-level positive effects on the lives of the youth served and the communities in which they
reside. Research at that level, whenever possible, should be high on the agenda for juvenile justice
researchers.

4 CONCLUSION

The primary goals of juvenile justice systems are twofold. First is public safety, largely pursued
via some level of supervision or incapacitation to constrain the potential for further offending
in the short term, ideally without causing longer term harm. The second is facilitating change
in the behavioral and psychosocial factors presumed to underlie the delinquent behavior, largely
via intervention programs aimed at reducing recidivism and promoting positive youth outcomes.
There is a large body of research that investigates the effects of such interventions on the subse-
quent offense rates of juvenile offenders. The question at issue in this exchange is not whether
juvenile justice programs should be supported by credible evidence of effectiveness. There is good
reason to believe that much of the programming in juvenile justice is ineffective and possibly even
harmful, a situation that not only can produce poor outcomes, but is wasteful of scarce resources.
The question at hand is how the available research, and future research in whatever form will be
helpful, can be used to guide program practice in ways that increase the probability of positive out-
comes for the affected youth and the community. Over the approximately 20-year history of the
EBP registries, it is simply a fact that few robustly effective model programs for juvenile offenders
have been identified and those few have not been widely integrated into juvenile justice practice.
The model program approach does hold promise and plays an important role for agencies seek-
ing to implement new programs with a high likelihood of effectiveness. There is no need to inhibit
development of innovative and promising manualized programs or their identification and pro-
motion through evidence-based registries. But there are other reasonable ways to make practical
use of the large and growing body of juvenile justice intervention research that are worth explor-
ing, and which may turn out to have the potential for larger net effects on practice than model
programs. Where Elliott et al. and I most differ is that I don’t think those explorations should be
denounced or inhibited simply because they attempt to use evidence in a different way.

CONFLICT OF INTEREST
The author confirms that he has no conflict of interest to declare.

REFERENCES
Ardito, R. B., & Rabellino, D. (2011). Therapeutic alliance and outcome of psychotherapy: Historical excursus, mea-
surements, and prospects for research. Frontiers in Psychology, 2(270), 270. https://doi.org/10.3389/fpsyg.2011.
00270
Baglivio, M. T., Wolff, K. T., Jackowski, K., Chapman, G., Greenwald, M. A., & Gomez, K. (2018). Does treat-
ment quality matter? Multilevel examination of the effects of intervention quality on recidivism of adoles-
cents completing long-term juvenile justice residential placement. Criminology & Public Policy, 17(1), 147–180.
https://doi.org/10.1111/1745-9133.12338
Barnoski, R. (2002). Washington State’s implementation of functional family therapy for juvenile offenders:
Preliminary findings. Washington State Institute for Public Policy. Document No. 02-08-1201 Retrieved from
wsipp.wa.gov/ReportFile/803.
Barnoski, R. (2004). Outcome evaluation of Washington State’s research-based programs for juvenile
offenders. Washington State Institute for Public Policy. Document Number: 04-01-1201. Retrieved from
http://www.wsipp.wa.gov/ReportFile/852/Wsipp_Outcome-Evaluation-of-Washington-States-Research-
Based-Programs-for-Juvenile-Offenders_Full-Report.pdf
Barth, R. P., Lee, B. R., Lindsey, M. A., Collins, K. S., Strieder, F., Chorpita, B. F., . . . Sparks, J. A. (2012). Evidence-
based practice at a crossroads: The timely emergence of common elements and common factors. Research on
Social Work Practice, 22(1), 108–119. https://doi.org/10.1177/1049731511408440
Blase, K., & Fixsen, D. (2013). Core intervention components: Identifying and operationalizing what makes programs
work. ASPE research brief. Washington, DC: Office of the Assistant Secretary for Planning and Evaluation, US
Department of Health and Human Services. Retrieved from https://aspe.hhs.gov/pdf-report/core-intervention-
components-identifying-and-operationalizing-what-makes-programs-work.
Bopp, M., Saunders, R. P., & Lattimore, D. (2013). The tug-of-war: Fidelity versus adaptation throughout the health
promotion program life cycle. Journal of Primary Prevention, 34(3), 193–207. https://doi.org/10.1007/s10935-013-
0299-y.
Chorpita, B. F., & Daleiden, E. L. (2009). Mapping evidence-based treatments for children and adolescents: Appli-
cation of the distillation and matching model to 615 treatments from 322 randomized trials. Journal of Consulting
and Clinical Psychology, 77(3), 566–579. https://doi.org/10.1037/a0014565
Doss, B. D. (2004). Changing the way we study change in psychotherapy. Clinical Psychology: Science and Practice,
11(4), 368–386. https://doi.org/10.1093/clipsy/bph094
Dymnicki, A., Trivits, L., Hoffman, C., & Osher, D. (2020). Advancing the use of core components of effective pro-
grams: Suggestions for researchers publishing evaluation results. Issue brief. Washington, DC: Office of the Assis-
tant Secretary for Planning and Evaluation U.S. Department of Health and Human Services. Retrieved from
https://youth.gov/sites/default/files/ASPE-Brief_Core-Components.pdf
Elliott, D., Buckley, P., Gottfredson, D., Hawkins, D., & Tolan, P. (2020). Evidence-based juvenile justice programs
and practices: A critical review. Criminology and Public Policy.
Embry, D. D., & Biglan, A. (2008). Evidence-based Kernels: Fundamental units of behavioral influence. Clinical
Child and Family Psychology Review, 11(3), 75–113. https://doi.org/10.1007/s10567-008-0036-x
Farb, A. F., & Margolis, A. L. (2016). The teen pregnancy prevention program (2010–2015): Synthesis of impact
findings. American Journal of Public Health, 106, S9–S15. https://doi.org/10.2105/AJPH.2016.303367
Feucht, T. E., & Tyson, J. (2018). Advancing “what works” in justice: Past, present, and future work of federal justice
research agencies. Justice Evaluation Journal, 1(2), 151–187. https://doi.org/10.1080/24751979.2018.1552083
Hogue, A., Bobek, M., Dauber, S., Henderson, C. E., McLeod, B. D., & Southam-Gerow, M. A. (2017). Distilling the
core elements of family therapy for adolescent substance use: Conceptual and empirical solutions. Journal of
Child and Adolescent Substance Abuse, 26(6), 437–453. https://doi.org/10.1080/1067828X.2017.1322020.
Hogue, A., Dauber, S., Stambaugh, L. F., Cecero, J. J., & Liddle, H. A. (2006). Early therapeutic alliance and treat-
ment outcome in individual and family therapy for adolescent behavior problems. Journal of Consulting and
Clinical Psychology, 74(1), 121–129. https://doi.org/10.1037/0022-006X.74.1.121
Humayun, S., Herlitz, L., Chesnokov, M., Doolan, M., Landau, S., & Scott, S. (2017). Randomized controlled trial
of functional family therapy for offending and antisocial behavior in UK youth. Journal of Child Psychology and
Psychiatry, 58(9), 1023–1032. https://doi.org/10.1111/jcpp.12743
Ioannidis, J. P. (2008). Why most discovered true associations are inflated. Epidemiology, 19(5), 640–648. https:
//doi.org/10.1097/EDE.0b013e31818131e7.
Knoth, L., Wanner, P., & He, L. (2019). Washington State recidivism trends: FY 1995–FY 2014. Document
No. 19-03-1901. Olympia, WA: Washington State Institute for Public Policy. Retrieved from
http://www.wsipp.wa.gov/ReportFile/1703/Wsipp_Washington-State-Adult-and-Juvenile-Recidivism-Trends-
FY-1995-FY-2014_Report.pdf
Lester, P. (2017). Tiered evidence grants: An assessment of the education innovation and research program. Wash-
ington, DC: Social Innovation Research Center, IBM Center for the Business of Government. https://doi.sites/
default/files/Tiered%20Evidence%20Grants-Education%20Innovation%20and%20Research%20Program.pdf
Lipsey, M. W. (2003). Those confounded moderators in meta-analysis: Good, bad, and ugly. The Annals of the Amer-
ican Academy of Political and Social Science, 587(1), 69–81. https://doi.org/10.1177/0002716202250791
Lipsey, M. W. (2008). The Arizona standardized program evaluation protocol (SPEP) for assessing the effectiveness of
programs for juvenile probationers: SPEP ratings and relative recidivism reduction for the initial SPEP sample. A
Report to the Juvenile Justice Services Division, Administrative Office of the Courts, State of Arizona. Nashville,
TN: Center for Evaluation Research and Methodology, Vanderbilt Institute for Public Policy Studies. Retrieved
from http://www.my.vanderbilt.edu/spep/files/2013/04/AZ-Recidivism-Report_final-revised.pdf
Lipsey, M. W. (2009). The primary factors that characterize effective interventions with juvenile offenders: A meta-
analytic review. Victims & Offenders, 4(2), 124–147. https://doi.org/10.1080/15564880802612573
Lipsey, M. W. (2018). Effective use of the large body of research on the effectiveness of programs for juve-
nile offenders and the failure of the model programs approach. Criminology & Public Policy, 17(1), 189–198.
https://doi.org/10.1111/1745-9133.12345
Lipsey, M. W., & Cullen, F. T. (2007). The effectiveness of correctional rehabilitation: A review of systematic reviews.
Annual Review of Law and Social Science, 3, 297–320. https://doi.org/10.1146/annurev.lawsocsci.3.081806.112833
Lipsey, M. W., Howell, J. C., & Tidd, S. T. (2007). The standardized program evaluation protocol (SPEP): A practical
approach to evaluating and improving juvenile justice programs in North Carolina. Final evaluation report.
Nashville, TN: Vanderbilt University. Available from the author at mark.lipsey@vanderbilt.edu
Loyka, C. M., Ruscio, J., Edelblum, A. B., Hatch, L., Wetreich, B., & Zabel, A. (2020). Weighing people rather than
food: A framework for examining external validity. Perspectives on Psychological Science, 15(2), 483–496. https:
//doi.org/10.1177/1745691619876279
Meyer-Chilenski, S., Bumbarger, B. K., Kyler, S., & Greenberg, M. (2007). Reducing youth violence and delin-
quency in Pennsylvania: PCCD’s research-based programs initiative. State College, PA: Prevention Research
Center, Pennsylvania State University. Retrieved from http://www.episcenter.psu.edu/sites/default/files/resources/
PCCD_ReducingYouthViolence.pdf
Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., . . . Wood, C. E. (2013). The
behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international
consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46(1), 81–95. https:
//doi.org/10.1007/s12160-013-9486-6.
Minary, L., Trompette, J., Kivits, J., Cambon, L., Tarquinio, C., & Alla, F. (2019). Which design to evaluate com-
plex interventions? Toward a methodological framework through a systematic review. BMC Medical Research
Methodology, 19(92). https://doi.org/10.1186/s12874-019-0736-6
Moore, G. F., Evans, R. E., Hawkins, J., Littlecott, H. J., & Turley, R. (2017). All interventions are complex, but some
are more complex than others: Using iCAT SR to assess complexity. Cochrane Database of Systematic Reviews,
7, ED000122. https://doi.org/10.1002/14651858.ED000122
Mulvey, E. P., Schubert, C. A., Jones, B., & Hawes, S. (2020). A validation of SPEP™ in Pennsylvania. Harrisburg,
PA: Pennsylvania Commission on Crime and Delinquency. Retrieved from https://www.pccd.pa.gov/Juvenile-
Justice/Documents/A%20Validation%20of%20SPEP%20in%20PA%20Report.pdf
National Academies of Sciences, Engineering, and Medicine. (2020). Core components of programs focused on
optimal health. In N. F. Kahn & R. Graham (Eds.), Promoting positive adolescent health behaviors and outcomes:
Thriving in the 21st Century. Washington, DC: The National Academies Press. https://doi.org/10.17226/25552.
OPPAGA (2010). Redirection saves $51.2 million and continues to reduce recidivism. (Report No. 10–38). Tallahas-
see, FL: Office of Program Policy Analysis & Government Accountability. Retrieved from https://oppaga.fl.gov/
Documents/Reports/10-38.pdf
Petrosino, A., & Soydan, H. (2005). The impact of program developers as evaluators on criminal recidivism: Results
from meta-analyses of experimental and quasi-experimental research. Journal of Experimental Criminology, 1,
435–450. https://doi.org/10.1007/s11292-005-3540-8
Redpath, D. P., & Brandner, J. K. (2010). The Arizona standardized program evaluation protocol (SPEP) for assessing
the effectiveness of programs for juvenile probationers: SPEP rating and relative recidivism reduction: An update
to the January 2008 report by Dr. Mark Lipsey. Phoenix, AZ: Arizona Supreme Court, Administrative Office of
the Courts, Juvenile Justice Services Division. Retrieved from https://my.vanderbilt.edu/spep/files/2013/04/
AZ-SPEP-Recidivism-Analysis_Update_-2010.pdf
Sexton, T., & Turner, C. W. (2010). The effectiveness of functional family therapy for youth with behavioral problems
in a community practice setting. Journal of Family Psychology, 24(3), 339–348. https://doi.org/10.1037/a0019406
Smith, C., Akiva, T., Sugar, S. A., Lo, Y. J., Frank, K. A., Peck, S. C., . . . Devaney, T. (2012). Continuous quality
improvement in afterschool settings: Impact findings from the youth program quality intervention study. Washing-
ton, DC: Forum for Youth Investment.
Tanner-Smith, E., & Grant, S. (2018). Meta-analysis of complex interventions. Annual Review of Public Health, 39,
135–151. https://doi.org/10.1146/annurev-publhealth-040617-014112
The Justice Research Center (2013). Redirection: Reducing juvenile crime in Florida. Report. Tallahassee, FL:
The Justice Research Center. Retrieved from https://evidencebasedassociates.com/wp-content/uploads/2016/
09/2013_EBA-Report.pdf
van der Stouwe, T., Asscher, J. J., Stams, G. J. J. M., Dekovic, M., & van der Laan, P. H. (2014). The effectiveness
of multisystemic therapy (MST): A meta-analysis. Clinical Psychology Review, 34(6), 468–481. https://doi.org/10.
1016/j.cpr.2014.06.006
Van Ryzin, M. J., Roseth, C. J., Fosco, G. M., Lee, Y., & Chen, I. (2016). A component-centered meta-analysis of
family-based prevention programs for adolescent substance use. Clinical Psychology Review, 45, 72–80. https:
//doi.org/10.1016/j.cpr.2016.03.007.
Wampold, B. E., Budge, S. L., Laska, K. M., Del Re, A. C., Baardseth, T. P., Flückiger, C., . . . Gunn, W. (2011).
Evidence-based treatments for depression and anxiety versus treatment-as-usual: Meta-analysis of direct com-
parisons. Clinical Psychology Review, 31, 1304–1312. https://doi.org/10.1016/j.cpr.2011.07.012
Weisz, J. R., & Chorpita, B. F. (2012). “Mod squad” for youth psychotherapy: Restructuring evidence-based treat-
ment for clinical practice. In P. C. Kendall (Ed.), Child and adolescent therapy: Cognitive-behavioral procedures
(pp. 379–397). New York: Guilford Press.
Weisz, J. R., Jensen-Doss, A., & Hawley, K. M. (2006). Evidence-based youth psychotherapies versus usual clinical
care: A meta-analysis of direct comparisons. American Psychologist, 61(7), 671–689. https://doi.org/10.1037/0003-
066X.61.7.671
Wilson, S. J., Lipsey, M. W., Aloe, A., & Sahni, S. (2020). Developing evidence-based practice guidelines for
youth programs: Technical report on the core components of interventions that address externalizing behavior
problems. Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human
Services. Retrieved from https://aspe.hhs.gov/system/files/pdf/263931/Technical-Report-Externalizing.pdf
Wolf, R., Morrison, J., Inns, A., Slavin, R., & Risman, K. (2020). Average effect sizes in developer-commissioned
and independent evaluations. Journal of Research on Educational Effectiveness, 13(2). https://doi.org/10.1080/
19345747.2020.1726537

AUTHOR BIOGRAPHY

Mark W. Lipsey is a research professor in Human and Organizational Development at Vanderbilt University. His research specialties are evaluation research and meta-analysis investigating the effects of interventions with children, youth, and families. Recent research topics include interventions for juvenile delinquency and substance use, early childhood education programs, methodological quality in program evaluation, and ways to help practitioners and policy makers make better use of research to improve the outcomes of programs for children and youth. Prof. Lipsey’s published works include textbooks on program evaluation, meta-analysis, and statistical power as well as articles on applied methods and the effectiveness of school, community, and residential programs for youth.

How to cite this article: Lipsey MW. Revisited: Effective use of the large body of
research on the effectiveness of programs for juvenile offenders and the failure of the
model programs approach. Criminol Public Policy. 2020;19:1329–1345.
https://doi.org/10.1111/1745-9133.12530
