
The Oxford Handbook of Research Strategies for Clinical Psychology


Editor-in-Chief

Peter E. Nathan

Area Editors:

Clinical Psychology
David H. Barlow

Cognitive Neuroscience
Kevin N. Ochsner and Stephen M. Kosslyn

Cognitive Psychology
Daniel Reisberg

Counseling Psychology
Elizabeth M. Altmaier and Jo-Ida C. Hansen

Developmental Psychology
Philip David Zelazo

Health Psychology
Howard S. Friedman

History of Psychology
David B. Baker

Methods and Measurement
Todd D. Little

Neuropsychology
Kenneth M. Adams

Organizational Psychology
Steve W. J. Kozlowski

Personality and Social Psychology

Kay Deaux and Mark Snyder

Editor-in-Chief Peter E. Nathan

The Oxford Handbook

of Research Strategies
for Clinical Psychology
Edited by
Jonathan S. Comer
Philip C. Kendall

Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide.

Oxford New York

Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and
certain other countries.

Published in the United States of America by

Oxford University Press
198 Madison Avenue, New York, NY 10016

© Oxford University Press 2013

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, without the prior
permission in writing of Oxford University Press, or as expressly permitted by law,
by license, or under terms agreed with the appropriate reproduction rights organization.
Inquiries concerning reproduction outside the scope of the above should be sent to the
Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form
and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data

The Oxford handbook of research strategies for clinical psychology / edited by Jonathan S. Comer, Philip C. Kendall.
pages ; cm
Includes bibliographical references.
ISBN 978–0–19–979354–9
1. Clinical psychology—Research—Methodology—Handbooks, manuals, etc. I. Comer, Jonathan S.,
editor of compilation. II. Kendall, Philip C., editor of compilation. III. Title: Handbook of
research strategies for clinical psychology.
RC467.8.O94 2013

9 8 7 6 5 4 3 2 1
Printed in the United States of America
on acid-free paper

Oxford Library of Psychology

About the Editors

Contributors

Table of Contents

Chapters

Index


The Oxford Library of Psychology, a landmark series of handbooks, is published by Oxford University Press, one of the world’s oldest and most highly respected publishers, with a tradition of publishing significant books in psychology. The ambitious goal of the Oxford Library of Psychology is nothing less than to span a vibrant, wide-ranging field and, in so doing, to fill a clear market need.

Encompassing a comprehensive set of handbooks, organized hierarchically, the Library incorporates volumes at different levels, each designed to meet a distinct need. At one level is a set of handbooks designed broadly to survey the major subfields of psychology; at another are numerous handbooks that cover important current focal research and scholarly areas of psychology in depth and detail. Planned as a reflection of the dynamism of psychology, the Library will grow and expand as psychology itself develops, thereby highlighting significant new research that will impact on the field. Adding to its accessibility and ease of use, the Library will be published in print and, later on, electronically.

The Library surveys psychology’s principal subfields with a set of handbooks that capture the current status and future prospects of those major subdisciplines. This initial set includes handbooks of social and personality psychology, clinical psychology, counseling psychology, school psychology, educational psychology, industrial and organizational psychology, cognitive psychology, cognitive neuroscience, methods and measurements, history, neuropsychology, personality assessment, developmental psychology, and more. Each handbook undertakes to review one of psychology’s major subdisciplines with breadth, comprehensiveness, and exemplary scholarship. In addition to these broadly conceived volumes, the Library also includes a large number of handbooks designed to explore in depth more specialized areas of scholarship and research, such as stress, health and coping, anxiety and related disorders, cognitive development, or child and adolescent assessment. In contrast to the broad coverage of the subfield handbooks, each of these latter volumes focuses on an especially productive, more highly focused line of scholarship and research. Whether at the broadest or most specific level, however, all of the Library handbooks offer synthetic coverage that reviews and evaluates the relevant past and present research and anticipates research in the future. Each handbook in the Library includes introductory and concluding chapters written by its editor to provide a roadmap to the handbook’s table of contents and to offer informed anticipations of significant future developments in that field.

An undertaking of this scope calls for handbook editors and chapter authors who are established scholars in the areas about which they write. Many of the nation’s and world’s most productive and best-respected psychologists have agreed to edit Library handbooks or write authoritative chapters in their areas of expertise.

For whom has the Oxford Library of Psychology been written? Because of its breadth, depth, and accessibility, the Library serves a diverse audience, including graduate students in psychology and their faculty mentors, scholars, researchers, and practitioners in psychology and related fields. Each will find in the Library the information they seek on the subfield or focal area of psychology in which they work or are interested.

Befitting its commitment to accessibility, each handbook includes a comprehensive index, as well as extensive references to help guide research. And because the Library was designed from its inception as an online as well as a print resource, its structure and contents will be readily and rationally searchable online. Further, once the Library is released online, the handbooks will be regularly and thoroughly updated.

In summary, the Oxford Library of Psychology will grow organically to provide a thoroughly informed perspective on the field of psychology, one that reflects both psychology’s dynamism and its increasing interdisciplinarity. Once published electronically, the Library is also destined to become a uniquely valuable interactive tool, with extended search and browsing capabilities. As you begin to consult this handbook, we sincerely hope you will share our enthusiasm for the more than 500-year tradition of Oxford University Press for excellence, innovation, and quality, as exemplified by the Oxford Library of Psychology.

Peter E. Nathan
Oxford Library of Psychology



Jonathan S. Comer
Dr. Comer is Associate Professor of Psychology at Florida International University and the Center for Children and Families. Before this, he served as Director of the Early Childhood Interventions Program of Boston University, an interdisciplinary clinical research laboratory in the Center for Anxiety and Related Disorders devoted to expanding the quality and accessibility of mental health care for very young children. His program of research examines five areas of overlapping inquiry: (1) the assessment, phenomenology, and course of child anxiety disorders; (2) the development and evaluation of evidence-based treatments for childhood psychopathology, with particular focus on the development of innovative methods to reduce systematic barriers to effective mental health care in the community; (3) the psychological impact of disasters and terrorism on youth; (4) national patterns and trends in the utilization of mental health services and quality of care; and (5) psychosocial treatment options for mood, anxiety, and disruptive behavior problems presenting in early childhood.

Philip C. Kendall
Dr. Kendall’s doctorate in clinical psychology is from Virginia Commonwealth University. He has been honored with the Outstanding Alumnus Award from this institution. His Board Certification (ABPP) is in (a) Clinical Child and Adolescent Psychology and (b) Cognitive and Behavioral Therapy. Dr. Kendall’s CV lists over 450 publications. He has had over 25 years of uninterrupted research grant support from various agencies. Having received many thousands of citations per year, he placed among an elite handful of the most “Highly-Cited” individuals in all of the social and medical sciences. In a recent quantitative analysis of the publications by and citations to all members of the faculty in the 157 American Psychological Association-approved programs in clinical psychology, Dr. Kendall ranked 5th. Dr. Kendall has garnered prestigious awards: Fellow at the Center for Advanced Study in the Behavioral Sciences, inaugural Research Recognition Award from the Anxiety Disorders Association of America, “Great Teacher” award from Temple University, identified as a “top therapist” in the tristate area by Philadelphia Magazine, and a named chair and Distinguished University Professorship at Temple University. He has been president of the Society of Clinical Child and Adolescent Psychology (Division 53) of APA as well as President of the Association for the Advancement of Behavior Therapy (AABT, now ABCT). Recently, ABCT recognized and awarded him for his “Outstanding Contribution by an Individual for Educational/Training


Elizabeth W. Adams
University of Alabama
Tuscaloosa, AL

Leona S. Aiken
Department of Psychology
Arizona State University
Tempe, AZ

Marc Atkins
Department of Psychiatry
University of Illinois at Chicago
Chicago, IL

Amanda N. Baraldi
Department of Psychology
Arizona State University
Tempe, AZ

David H. Barlow
Center for Anxiety and Related Disorders
Boston University
Boston, MA

Rinad S. Beidas
Center for Mental Health Policy and Services Research
Perelman School of Medicine
University of Pennsylvania
Philadelphia, PA

Deborah C. Beidel
Psychology Department
University of Central Florida
Orlando, FL

Timothy A. Brown
Department of Psychology
Boston University
Boston, MA

Mathew M. Carper
Department of Psychology
Temple University
Philadelphia, PA

Heining Cham
Department of Psychology
Arizona State University
Tempe, AZ

Candice Chow
Department of Psychology
Wellesley College
Wellesley, MA

Jonathan S. Comer
Department of Psychology
Florida International University
Miami, FL

Mark R. Dadds
School of Psychology
The University of New South Wales
Sydney, Australia

Ulrich W. Ebner-Priemer
Karlsruhe Institute of Technology
Karlsruhe, Germany

Andy P. Field
School of Psychology
University of Sussex
Sussex, United Kingdom

John P. Forsyth
University at Albany, State University of New York
Department of Psychology
Albany, NY

Kaitlin P. Gallo
Center for Anxiety and Related Disorders
Boston University
Boston, MA

Lois A. Gelfand
Department of Psychology
University of Pennsylvania
Philadelphia, PA

Andrew J. Gerber
Division of Child and Adolescent Psychiatry
Columbia College of Physicians and Surgeons
New York State Psychiatric Institute
New York, NY

Marlen Z. Gonzalez
Department of Psychology
University of Virginia
Charlottesville, VA

David J. Hawes
School of Psychology
The University of Sydney
Sydney, Australia

Nadia Islam
Department of Psychology
Virginia Commonwealth University
Richmond, VA

Matthew A. Jarrett
Department of Psychology
University of Alabama
Tuscaloosa, AL

Kirsten Johnson
The University of Vermont
Burlington, VT

Zornitsa Kalibatseva
Department of Psychology
Michigan State University
East Lansing, MI

Philip C. Kendall
Department of Psychology
Temple University
Philadelphia, PA

Gerald P. Koocher
Department of Psychology
Simmons College
Boston, MA

Helena Chmura Kraemer
Stanford University
University of Pittsburgh
Pittsburgh, PA

Frederick T. L. Leong
Department of Psychology
Michigan State University
East Lansing, MI

Yu Liu
Arizona State University
Tempe, AZ

Ginger Lockhart
Department of Psychology
Arizona State University
Tempe, AZ

David P. MacKinnon
Department of Psychology
Arizona State University
Tempe, AZ

Katherine M. McKnight
Department of Psychology and Pearson Education
George Mason University
Fairfax, VA

Patrick E. McKnight
Department of Psychology and Pearson Education
George Mason University
Fairfax, VA

Bryce D. McLeod
Department of Psychology
Virginia Commonwealth University
Richmond, VA

Tara Mehta
Department of Psychiatry
University of Illinois at Chicago
Chicago, IL

Jenna Merz
Department of Psychiatry
University of Illinois at Chicago
Chicago, IL

Dave S. Pasalich
School of Psychology
The University of New South Wales
Sydney, Australia

Joseph R. Rausch
Department of Pediatrics
University of Cincinnati
Cincinnati, OH

Kendra L. Read
Department of Psychology
Temple University
Philadelphia, PA

Randall T. Salekin
Department of Psychology
University of Alabama
Tuscaloosa, AL

Philip S. Santangelo
Karlsruhe Institute of Technology
Karlsruhe, Germany

Bonnie Solomon
Department of Psychology
University of Illinois at Chicago
Chicago, IL

Lynne Steinberg
Department of Psychology
University of Houston
Houston, TX

David Thissen
Department of Psychology
The University of North Carolina at Chapel Hill
Chapel Hill, NC


Timothy J. Trull
Department of Psychological Sciences
University of Missouri-Columbia
Columbia, MO

Stephen G. West
Department of Psychology
Arizona State University
Tempe, AZ

Emily Wheat
Department of Psychology
Virginia Commonwealth University
Richmond, VA

Erika J. Wolf
National Center for PTSD
VA Boston Healthcare System
Department of Psychiatry
Boston University School of Medicine
Boston, MA

Nina Wong
Anxiety Disorders Clinic
University of Central Florida
Orlando, FL

Michael J. Zvolensky
Department of Psychology
The University of Vermont
Burlington, VT


1. A Place for Research Strategies in Clinical Psychology
Jonathan S. Comer and Philip C. Kendall

Part One • Design Strategies for Clinical Psychology

2. Laboratory Methods in Experimental Psychopathology
Michael J. Zvolensky, John P. Forsyth, and Kirsten Johnson
3. Single-Case Experimental Designs and Small Pilot Trial Designs
Kaitlin P. Gallo, Jonathan S. Comer, and David H. Barlow
4. The Randomized Controlled Trial: Basics and Beyond
Philip C. Kendall, Jonathan S. Comer, and Candice Chow
5. Dissemination and Implementation Science: Research Models and Methods
Rinad S. Beidas, Tara Mehta, Marc Atkins, Bonnie Solomon, and Jenna Merz
6. Virtual Environments in Clinical Psychology Research
Nina Wong and Deborah C. Beidel

Part Two • Measurement Strategies for Clinical Psychology

7. Assessment and Measurement of Change Considerations in Psychotherapy Research
Randall T. Salekin, Matthew A. Jarrett, and Elizabeth W. Adams
8. Observational Coding Strategies
David J. Hawes, Mark R. Dadds, and Dave S. Pasalich
9. Designing, Conducting, and Evaluating Therapy Process Research
Bryce D. McLeod, Nadia Islam, and Emily Wheat
10. Structural and Functional Brain Imaging in Clinical Psychology
Andrew J. Gerber and Marlen Z. Gonzalez
11. Experience Sampling Methods in Clinical Psychology
Philip S. Santangelo, Ulrich W. Ebner-Priemer, and Timothy J. Trull

Part Three • Analytic Strategies for Clinical Psychology

12. Statistical Power: Issues and Proper Applications
Helena Chmura Kraemer
13. Multiple Regression: The Basics and Beyond for Clinical Scientists
Stephen G. West, Leona S. Aiken, Heining Cham, and Yu Liu
14. Statistical Methods for Use in the Analysis of Randomized Clinical Trials Utilizing a Pretreatment, Posttreatment, Follow-up (PPF) Paradigm
Kendra L. Read, Philip C. Kendall, Mathew M. Carper, and Joseph R. Rausch
15. Evaluating Treatment Mediators and Moderators
David P. MacKinnon, Ginger Lockhart, Amanda N. Baraldi, and Lois A. Gelfand
16. Structural Equation Modeling: Applications in the Study of Psychopathology
Erika J. Wolf and Timothy A. Brown
17. Meta-analysis in Clinical Psychology Research
Andy P. Field
18. Item Response Theory
Lynne Steinberg and David Thissen
19. Missing Data in Psychological Science
Patrick E. McKnight and Katherine M. McKnight

Part Four • Matters of Responsible Research Conduct in Clinical Psychology

20. Ethical Considerations in Clinical Psychology Research
Gerald P. Koocher
21. Clinical Research with Culturally Diverse Populations
Frederick T. L. Leong and Zornitsa Kalibatseva

Part Five • Conclusion

22. Decades Not Days: The Research Enterprise in Clinical Psychology
Philip C. Kendall and Jonathan S. Comer

Index


1. A Place for Research Strategies in Clinical Psychology

Jonathan S. Comer and Philip C. Kendall

Despite daunting statistics portraying the staggering scope and costs of mental illness, recent
years have witnessed considerable advances in our understanding of psychopathology and optimal
methods for intervention. However, compared with other sciences, clinical psychology is still a
nascent field, and as such the majority of the work lies ahead of us. The prepared investigator must be
familiar with the full portfolio of modern research strategies for the science of clinical psychology.
The present Handbook has recruited some of the field’s foremost experts to explicate the essential
research strategies currently used across the modern clinical psychology landscape. Part I of the
Handbook addresses design strategies for clinical psychology and covers laboratory methods in
experimental psychopathology, single-case experimental designs, small pilot trials, the randomized
controlled trial, adaptive and modular treatment designs, and dissemination methods and models.
Part II addresses measurement strategies for clinical psychology and covers assessment, change
measurement, observational coding, measurement of process variables across treatment, structural
and functional brain imaging, and experience sampling data-collection methods. Part III addresses
analytic strategies for clinical psychology and includes chapters on statistical power, correlation and
regression, randomized clinical trial data analysis, conventions in mediation and moderation analysis,
structural equation modeling, meta-analytic techniques, item-response theory, and the appropriate
handling of missing data. In Part IV, matters of responsible conduct in clinical psychology research are
covered, including ethical considerations in clinical research and important issues in research with
culturally diverse populations. The book concludes with an integrative summary of research strategies
addressed across the volume, and guidelines for future directions in research methodology, design,
and analysis that will keep our young science moving forward in a manner that maximizes scientific
rigor and clinical relevance.
Key Words: Research methods, research strategies, methodology, design, measurement, data analysis

Mental health problems impose a staggering worldwide public health burden. In the United States, for example, roughly half of the population suffers at some point in their lives from a mental disorder (Kessler, Berglund, Demler, Jin, Merikangas, & Walters, 2005), and one in four has suffered from a mental disorder in the past year (Kessler, Chiu, Demler, & Walters, 2005). These estimates are particularly striking when considering the tremendous individual and societal costs associated with mental disorders. When left untreated these disorders are associated with frequent comorbid mental disorders (Costello, Mustillo, Erkanli, Keeler, & Angold, 2003; Kessler, Chiu, Demler, & Walters, 2005), elevated rates of medical problems (Goodwin, Davidson, & Keyes, 2009; Roy-Byrne, Davidson, Kessler et al., 2008), family dysfunction, disability in major life roles (Merikangas, Ames, Cui, Stang, Ustun, et al., 2007), poorer educational attainment (Breslau, Lane, Sampson, & Kessler, 2008), and overall reduced health-related quality of life (Comer, Blanco, Hasin, Liu, Grant, Turner, & Olfson, 2011; Daly, Trivedi, Wisniewski, et al., 2010). Furthermore, mental disorders confer an increased risk of suicide attempts (Nock & Kessler, 2006) and are prospective predictors of problematic substance use years later (Kendall & Kessler, 2002; Swendsen, Conway, Degenhardt, Glantz, Jin, Merikangas, Sampson, & Kessler, 2010).

The societal burden of mental disorders is portrayed in reports of losses in worker productivity and of high health care utilization and costs (e.g., Greenberg et al., 1999). For example, major depressive disorder (MDD) is associated with workforce impairments, with 20 to 30 percent of Americans with moderate or severe MDD collecting disability and/or unemployed (Birnbaum, Kessler, Kelley, Ben-Hamadi, Joish, & Greenberg, 2010). Depressed workers miss more workdays than nondepressed workers, collectively accounting for roughly 225 million missed annual workdays and corresponding to an estimated $36.6 billion in lost productivity each year (Kessler, Akiskal, Ames, Birnbaum, Greenberg, et al., 2006). Individuals with serious mental health problems earn on average roughly $16,000 less annually than their unaffected counterparts, resulting in estimated total lost annual earnings of $193.2 billion nationally.

Despite these daunting statistics, the past 40 years have witnessed considerable advances in our understanding of psychopathology and the expected trajectories of various disorders, and the field has identified evidence-based interventions with which to treat many of these debilitating conditions (Barlow, 2008; Kendall, 2012). However, much remains to be learned about mental disorders and their treatment, and this should not be surprising. After all, whereas many sciences have been progressing for centuries (e.g., biology, chemistry, physics), it is only recently, relatively speaking, that the scientific method and empiricism have been applied to the field of clinical psychology.

At this relatively early stage in the science of clinical psychology, the majority of work is ahead of us, and as such we must embrace a deep commitment to empiricism and appreciate the intricate interdependence of research and practice as we move forward. The National Institute of Mental Health Strategic Plan (2008) provides a strong guiding framework to focus and accelerate clinical research so that scientific breakthroughs can tangibly improve mental health care and the lives of affected individuals.

First, the past decade has witnessed extraordinary technological advances in our ability to image and analyze the living brain and to collect other biological (e.g., genes, proteins) and experiential data associated with key domains of functioning and dysfunction. Such innovations have the potential to apply noninvasive techniques to understand the development and function of brain networks and how various changes in functional connectivity may place individuals at risk for clinical syndromes and reduced treatment response. Such work can also inform our understanding of neurobiological mechanisms of adaptive and maladaptive change.

Second, despite advances in epidemiology identifying rates and patterns of mental health disorders, longitudinal work is needed to identify developmental patterns of mental disorders in order to determine when, where, and how to intervene optimally. Work in this area evaluating biomarkers may have the potential to identify biosignatures of clinical presentations and treatment response and may help to identify differentially indicated treatments for use at different stages of disorder and recovery. Such work can also help to better identify psychological risk and protective factors across the lifespan.

Third, despite the identification of evidence-based psychological treatment practices with the potential to improve outcomes for many of the mental health problems affecting the population, much remains to be learned to develop interventions for the difficult-to-treat and difficult-to-reach individuals, to improve supported interventions and their delivery, to incorporate the diverse needs and circumstances of affected individuals, and to expand treatment availability, accessibility, and acceptability. Regrettably, substantial problems in the broad availability and quality of psychological treatments in the community constrain effective care for the majority of affected individuals. A new generation of research in clinical psychology is needed to address the gaps that persist between treatment in experimental settings and services available in the community. The blossoming field of dissemination and implementation science (Kendall & Beidas, 2007; McHugh & Barlow, 2010) has begun to systematically address this critical area, but we are just at the very beginning of elucidating optimal methods for broad-based, sustainable training in evidence-based treatments.

Fourth, efforts are needed to expand the public health relevance of clinical research. Innovative research and research strategies are needed that can rapidly inform the delivery of quality treatment to maximally benefit the largest number of affected or at-risk individuals. Such work would entail comparative-effectiveness analyses and evaluations of supported treatments in nonspecialty settings and by nonspecialty providers across service sectors, while also addressing disparities in care and incorporating technological innovations.

In the face of such grand and laudable objectives for our field, the prepared investigator must be familiar with the full portfolio of modern research strategies for the science of clinical psychology: a set of “directions,” so to speak, for getting from “here” to “there.” Just as with any travel directions, where many acceptable ways to get to the same destination may exist (e.g., the quick way, the scenic way, the cheap way), for each empirical question there are many research strategies that can be used to reveal meaningful information, each with strengths and limitations. When conducting research, it is incumbent upon the investigator to explicitly know why he or she is taking a particular route, to be familiar with the tradeoffs inherent in taking such a route, and to travel that route correctly.

Importantly, evaluations into psychopathology and therapeutic efficacy and effectiveness have evolved from a historical reliance on simply professional introspection and retrospective case history exploration to the modern reliance on complex multimethod experimental investigations; prospective, longitudinal research; and well-controlled cross-sectional examinations across well-defined samples. The evolution is to be applauded. To continue to move the science of clinical psychology forward, investigators must systematically rely on research strategy “routes” that achieve favorable balances between scientific rigor and clinical relevance. This requires careful deliberations around matters of tradeoffs between internal validity (which is typically linked with rigor) and external validity (which is typically linked with relevance). It is with this in mind that we have recruited some of the field’s foremost experts for this Handbook to explicate the essential research strategies currently used across the modern clinical psychology landscape that maximize both rigor and relevance.

Part I of the book addresses design strategies for clinical psychology and covers laboratory methods in experimental psychopathology, single-case experimental designs, small pilot trials, the randomized controlled trial, adaptive and modular treatment designs, and dissemination methods and models. Part II addresses measurement strategies for clinical psychology and covers assessment, change measurement, observational coding, measurement of process variables across treatment, structural and functional brain imaging, and experience sampling data-collection methods.

Part III addresses analytic strategies for clinical psychology and includes chapters on statistical power, correlation and regression, randomized clinical trial data analysis, conventions in mediation and moderation analysis, structural equation modeling, meta-analytic techniques, item-response theory, and the appropriate handling of missing data. In Part IV, matters of responsible conduct in clinical psychology research are covered, including ethical considerations in clinical research and important issues in research with culturally diverse populations. The book concludes with an integrative summary of research strategies addressed across the volume, and guidelines for future directions in research methodology, design, and analysis that will keep our young science moving forward in a manner that maximizes scientific rigor and clinical relevance.

References

Barlow, D. H. (Ed.). (2008). Clinical handbook of psychological disorders (4th ed.). New York: Guilford Press.
Birnbaum, H. G., Kessler, R. C., Kelley, D., Ben-Hamadi, R., Joish, V. N., & Greenberg, P. E. (2010). Employer burden of mild, moderate, and severe major depressive disorder: Mental health services utilization and costs, and work performance. Depression and Anxiety, 27(1), 78–89. doi:10.1002/
Breslau, J., Lane, M., Sampson, N., & Kessler, R. C. (2008). Mental disorders and subsequent educational attainment in a US national sample. Journal of Psychiatric Research, 42(9), 708–716.
Comer, J. S., Blanco, C., Hasin, D. S., Liu, S. M., Grant, B. F., Turner, J. B., & Olfson, M. (2011). Health-related quality of life across the anxiety disorders: Results from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). Journal of Clinical Psychiatry, 72(1), 43–50.
Costello, E., Mustillo, S., Erkanli, A., Keeler, G., & Angold, A. (2003). Prevalence and development of psychiatric disorders in childhood and adolescence. Archives of General Psychiatry, 60(8), 837–844. doi:10.1001/archpsyc.60.8.837
Daly, E. J., Trivedi, M. H., Wisniewski, S. R., Nierenberg, A. A., Gaynes, B. N., Warden, D., . . . Rush, A. (2010). Health-related quality of life in depression: A STAR*D report. Annals of Clinical Psychiatry, 22(1), 43–55.
Goodwin, R. D., Davidson, K. W., & Keyes, K. (2009). Mental disorders and cardiovascular disease among adults in the United States. Journal of Psychiatric Research, 43(3), 239–246. doi:10.1016/j.jpsychires.2008.05.006
Greenberg, P. E., Sisitsky, T., Kessler, R. C., Finkelstein, S. N., Berndt, E. R., Davidson, J. R., et al. (1999). The economic burden of anxiety disorders in the 1990s. Journal of Clinical Psychiatry, 60, 427–435.
Kendall, P. C. (2012). Child and adolescent therapy: Cognitive-behavioral procedures (4th ed.). New York: Guilford.


Kendall, P. C., & Beidas, R. S. (2007). Smoothing the trail for dissemination of evidence-based practices for youth: Flexibility within fidelity. Professional Psychology: Research and Practice, 38, 13–20.
Kendall, P. C., & Kessler, R. C. (2002). The impact of childhood psychopathology interventions on subsequent substance abuse: Policy implications, comments, and recommendations. Journal of Consulting and Clinical Psychology, 70(6), 1303–1306.
Kessler, R. C., Akiskal, H. S., Ames, M., Birnbaum, H., Greenberg, P., Hirschfeld, R. A., & . . . Wang, P. S. (2006). Prevalence and effects of mood disorders on work performance in a nationally representative sample of U.S. workers. American Journal of Psychiatry, 163(9), 1561–1568. doi:10.1176/appi.ajp.163.9.1561
Kessler, R. C., Berglund, P., Demler, O., Jin, R., Merikangas, K. R., & Walters, E. E. (2005). Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey replication. Archives of General Psychiatry, 62(6), 593–602. doi:10.1001/archpsyc.62.6.59
Kessler, R. C., Chiu, W., Demler, O., & Walters, E. E. (2005). Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey replication. Archives of General Psychiatry, 62(6), 617–627. doi:10.1001/archpsyc.62.6.617
McHugh, R. K., & Barlow, D. H. (2010). The dissemination and implementation of evidence-based psychological treatments: A review of current efforts. American Psychologist, 65, 73–84.
Merikangas, K. R., Ames, M., Cui, L., Stang, P. E., Ustun, T., Von Korff, M., & Kessler, R. C. (2007). The impact of comorbidity of mental and physical conditions on role disability in the US adult household population. Archives of General Psychiatry, 64(10), 1180–1188. doi:10.1001/archpsyc.64.10.1180
National Institute of Mental Health (2008). National Institute of Mental Health Strategic Plan. Bethesda, MD: National Institute of Mental Health.
Nock, M. K., & Kessler, R. C. (2006). Prevalence of and risk factors for suicide attempts versus suicide gestures: Analysis of the National Comorbidity Survey. Journal of Abnormal Psychology, 115(3), 616–623.
Roy-Byrne, P. P., Davidson, K. W., Kessler, R. C., Asmundson, G. G., Goodwin, R. D., Kubzansky, L., & . . . Stein, M. B. (2008). Anxiety disorders and comorbid medical illness. General Hospital Psychiatry, 30(3), 208–225. doi:10.1016/j.genhosppsych.2007.12.006
Swendsen, J., Conway, K. P., Degenhardt, L., Glantz, M., Jin, R., Merikangas, K. R., & . . . Kessler, R. C. (2010). Mental disorders as risk factors for substance use, abuse and dependence: Results from the 10-year follow-up of the National Comorbidity Survey. Addiction, 105(6), 1117–1128. doi:10.1111/j.1360-0443.2010.02902.x


Design Strategies for
Clinical Psychology

2 Laboratory Methods in Experimental Psychopathology

Michael J. Zvolensky, John P. Forsyth, and Kirsten Johnson

Experimental psychopathology represents a subfield of psychological science aimed at elucidating the
processes underlying abnormal behavior. The present chapter provides a synopsis of key elements of
experimental psychopathology research and its methods. In the first section, we define experimental
psychopathology research and briefly articulate its origins. Second, we present the methodological
approaches employed in experimental psychopathology research. Third, we present some of the
molar conceptual considerations for the assessment approaches in experimental psychopathology
research. In the final section, we describe some key challenges to experimental psychopathology
research as well as potentially useful strategies recommended for overcoming such challenges.
Key Words: Experimental psychopathology, laboratory, mechanism, laboratory models, translational

Experimental psychopathology represents a subfield of psychological science aimed at elucidating the processes underlying abnormal behavior (Zvolensky, Lejuez, Stuart, & Curtin, 2001). Although originally restricted to “true experimental” laboratory-based tests (Kimmel, 1971), experimental psychopathology now reflects a larger, more diverse and multifaceted field of inquiry (Zvolensky et al., 2001). Topics of study include examinations of the phenomenology of psychological disorders; explication of the underlying processes governing the etiology, maintenance, and amelioration of psychopathology; and tests of intervention(s) with the express purpose of identifying explanatory mechanisms. This work typically involves a range of methodologies (e.g., laboratory and field studies) as well as populations (e.g., diagnosed cases and nonclinical). The subfield of experimental psychopathology represents one of the branches in psychological science upon which evidence-based practice is theoretically and empirically built (Forsyth & Zvolensky, 2002; McFall, 1991).

The present chapter provides a synopsis of the key elements of experimental psychopathology research and its methods. In the first section, we define experimental psychopathology research and briefly articulate its origins. Second, we present the methodological approaches employed in experimental psychopathology research. Third, we present some of the molar conceptual considerations for the assessment approaches in experimental psychopathology research. In the final section, we describe some key challenges to experimental psychopathology research as well as potentially useful strategies recommended for overcoming such challenges.

Experimental Psychopathology: Definition and Origins

Definition

Kimmel (1971) offered one of the earliest definitions of experimental psychopathology research: “the experimental study of pathological behavior (i.e., using the experimental method to study pre-existing pathological behavior), or the study of experimental

pathological behavior (i.e., the pathological behavior being studied is induced experimentally rather than developed naturally)” (p. 7, emphasis added). In the former sense, experimental psychopathology is the study of the behavior of individuals with known psychopathology in response to imposed experimental conditions (e.g., how persons with and without a diagnosis of a specific disorder respond under conditions of “imposed stress” or “no stress”), whereas the latter approach entails identification and manipulation of variables to induce psychopathology processes among individuals without a history of psychopathology (Forsyth & Zvolensky, 2002).

Others have defined experimental psychopathology more broadly as the application of methods and principles of psychological science to understand the nature and origins of abnormal behavior (Kihlstrom & McGlynn, 1991). This definition encompasses other quasi-experimental and correlational methodologies (Kihlstrom & McGlynn, 1991). Zvolensky and colleagues (2001) defined experimental psychopathology as laboratory-based research with human and/or nonhuman animals, directly aimed at discovering and/or explaining the etiology and maintenance of psychopathological processes; work that may potentially contribute to the amelioration of dysfunctional behavior in the future. These definitions of experimental psychopathology can be contrasted with that of clinical psychopathology research that involves study with humans, typically with a particular psychological disorder, to (a) address the treatment/prevention of psychopathology in settings primarily outside of the laboratory or (b) identify symptoms or deficits that characterize psychological disorders (Forsyth & Zvolensky, 2002). Moreover, experimental psychopathology can be distinguished from “basic psychological research.” Although basic research ultimately may have important clinical implications, the goals and overarching framework for such research are to elucidate broadly applicable principles, independent of any a priori clinical relevance or application (Osgood, 1953).

Across perspectives, the common thread that runs through each of the above definitions of experimental psychopathology is a focus on knowledge development for psychopathology by using experimental and related methodologies. The phenomenon of interest can be induced or it may consist of already occurring natural abnormal behavior. Please see below (“Underlying Conceptual Approach”) for an expanded discussion of the main methodological approaches employed in experimental psychopathology research.

Origin

The origin of experimental psychopathology can be discussed in relation to the scholarly work of Ivan Pavlov (1849–1936) and William James (1842–1910). Both Pavlov and James helped to establish two traditions within experimental psychopathology: (a) the experimental induction and subsequent modeling of abnormal behavior in laboratory animals (Pavlov) and (b) the experimental approach to the study of preexisting abnormal behavior in humans (James).

Pavlov first used the label “experimental psychopathology” in a 1903 lecture delivered at the International Medical Congress in Madrid titled Experimental Psychology and Psychopathology in Animals (Forsyth & Zvolensky, 2002). In that lecture, Pavlov presented for the first time his theory of conditioned reflexes, which revealed his tendency to cast psychopathology in physiological terms via experimental tests on animals. Yet Pavlov did not develop a coherent conceptualization of experimental psychopathology apart from use of an “experimental approach.” Later, the contributions of two different investigators in Pavlov’s laboratory facilitated advancements in the area of experimental psychopathology research (Kimmel, 1971; Popplestone & McPherson, 1984). Specifically, Yerofeeva (1912, 1916) and Shenger-Krestovnikova (1921) both observed persistent “abnormal behavior” in their experimental animals following the use of novel conditioning procedures. This work was the precursor to the phenomenon later known as “experimental neurosis.” This work on experimental neurosis led to a marked shift in Pavlov’s research agenda: he devoted the remainder of his scientific career to the experimental analysis of variables and processes that occur in abnormal behavior patterns among human and nonhuman animals.

Due to Pavlov’s contributions, other behavioral scientists began to view laboratory approaches to studying abnormal behavior processes as both meaningful and productive (Benjamin, 2000). From this work emerged a core conceptual principle of subsequent experimental psychopathology research. That is, otherwise adaptive behavioral processes provide the foundation upon which maladaptive patterns of behavior are established, and such behavioral processes can subsequently interfere with an organism’s ability to behave effectively (Kimmel, 1971). The core task, therefore, involved isolating


processes responsible for “moving” the organism from an adaptive to a maladaptive range of behavior. Notably, although not the present purpose, this core experimental psychopathology concept helped pave the way for comparative psychiatry (Lubin, 1943) and comparative psychopathology (Zubin & Hunt, 1967), approaches that emphasize cross-species comparisons of behavioral processes (e.g., human-to-animal comparisons).

This approach of focusing on identifying and manipulating variables that, either in whole or in part, cause and/or exacerbate psychopathology began to define experimental psychopathology research (Kimmel, 1971). Yet while Pavlov and his contemporaries confined themselves to the experimental production of psychopathological behavior in the laboratory (Anderson & Liddell, 1935; Krasnogorski, 1925; Liddell, 1938), William James had been working to develop an experimental psychopathology of abnormal behavior as it occurs naturally.

Although William James is best known for his functionalist philosophy, he also helped to pioneer work in experimental psychopathology. James was highly critical of trends in American psychology; indeed, he was particularly judgmental of the importing of the German ideal of science (i.e., Wundtian and Titchenerian psychology), with its emphasis on determinism, materialism, structuralism, and reductionism (Taylor, 1996). James believed the Wundtian experimental tradition had created a less-than-ideal instrument, a tradition characterized by laboratory activities focused on building apparatus to collect “trivial measurements.” Specifically, this work often lacked practical purpose or relevance (Taylor, 1996). James was chiefly concerned that psychology might lose sight of developing a comprehensive and integrated psychological science of the “whole person.”

William James established a laboratory at Harvard University in 1875. Between 1874 and 1889, James was involved in collaborative research in Bowditch’s laboratory at Harvard Medical School (Taylor, 1996). While at Harvard University, James maintained active collaborations with individuals at Harvard Medical School in an attempt to bridge areas of physiology, neurology, and psychology, an approach well ahead of its time (National Advisory Mental Health Council Behavioral Science Workgroup, 2000). As Taylor (1996) observed, these laboratory sciences became integrated at Harvard, culminating in experimental research on the problem of consciousness. James, in particular, addressed the problem of consciousness via experiments on hypnosis, automatic writing, phantom limb phenomenon, psychophysical manipulations with clinical samples (e.g., perception of space, balance), neurophysiology, and studies of dissociative phenomena. These topics of study and the methods used to examine them represent the precursors to a science of abnormal behavior, and experimental psychopathology specifically.

According to James, experimental psychopathology was the study of the variables and processes that influence both aberrant and exceptional human experience. James’s laboratory work was largely devoted to an experimental analysis of psychopathology as it occurs naturally. James also conducted his experimental psychopathology research with a focus on practical concerns, a direct precursor to topics that now carry the label “clinical relevance.” This approach also was heavily influenced by the clinical emphasis of the emerging French laboratory sciences in physiology, neurology, experimental physiology, and psychology. Notably, this approach can be contrasted with the Germanic tradition, where pure science was considered to be the father of clinical application (Taylor, 1996). The infusion of ideas emerging from rapid developments in experimental psychopathology gave way to the rise of scientific psychotherapy in America at a time when experimental psychology, psychiatry, and medicine also were beginning to question the practices of mental healers (Taylor, 1996); this move is remarkably similar to contemporary efforts to stymie “pseudoscience” activities (Lilienfeld, 1996).

Pavlovian- and Jamesian-style experimental psychopathology research gained momentum through the early to middle twentieth century. By 1904, the first world congress of experimental psychology was convened (Schumann, 1905). By 1910, several texts appeared outlining current experimental psychopathology research, most notably Gregor’s (1910) Leitfaden der Experimentellen Psychopathologie (“guide” or “how-to book” for experimental psychopathology). Two years later, Franz (1912) published the first review of experimental psychopathology research in Psychological Bulletin; a review that was followed 2 years later by another review article with the same title (Wells, 1914). Neither Franz nor Wells offered a definition of experimental psychopathology in their respective works, yet both papers are of historical interest in highlighting the nature of experimental psychopathology research during this period. Much of this research, in turn, was occurring in the context of various labels, such as


abnormal psychology, psychopathology, pathopsychology, clinical psychology, psycho-clinical, medical psychology, and medico-clinical. Moreover, this work was experimental in design and focused on questions pertaining to the understanding of normal and abnormal psychological processes.

Experimental psychopathology research proliferated in the ensuing decades and tended to follow either a Jamesian (i.e., experimental analysis of naturally occurring psychopathology) or Pavlovian (i.e., the experimental induction of psychopathology) approach. The behaviorists were following the lead of Pavlov, Hull, and Watson in pursuing basic and applied work in experimental neurosis (e.g., Estes & Skinner, 1941; Franks, 1964; Rachman, 1966; Skinner, 1953; Wolpe, 1952, 1958; Wolpe, Salter, & Reyna, 1964). This approach drew heavily upon the findings from experimental psychology and involved a conception of abnormality in terms of deficits in functioning of certain psychological systems rather than people suffering from mental diseases produced by biological causes (Eysenck, 1973). Notably, Eriksonians, Gestaltists, and Freudians also were embarking on experimental psychopathology and outlining a framework for how such work might proceed (e.g., Brown, 1937; Mackinnon & Henle, 1948). Additionally, numerous attempts were under way to extend findings of experimental psychopathology to the practice of psychotherapy (Landis, 1949; Masserman, 1943; Wolpe, 1958) and to use this work as a framework for a science of abnormal behavior, generally (Eysenck, 1961). By 1947, the American Psychological Association (APA) recognized experimental psychopathology research as a legitimate component of accredited training in clinical psychology (APA, 1947).

Experimental psychopathology research grew further in the early 1950s with the establishment of the Journal of Clinical and Experimental Psychopathology. This journal became an outlet for experimental psychopathology research. The 1955 APA meeting also was significant in its thematic focus on the experimental approach to psychopathology (see Hoch & Zubin, 1957). Experimental psychopathology grew and diversified during this period, and by the early to middle 1960s began to include laboratory observation of psychopathological processes. Indeed, the 1960s marked an important historical shift in focus and a more widespread usage of the word “experimental” to include research, often in a laboratory setting, but where the purpose was to identify psychopathological processes. Since this period, numerous professional organizations and journals have been developed that showcase and disseminate experimental psychopathology research. For example, the Journal of Abnormal Psychology is a flagship journal of the APA that has played a key role in experimental psychopathology dissemination.

Unlike “true experiments,” where the focus is to vary some variable deliberately after random assignment so as to elucidate causal relations, the new wave of experimental psychopathology often utilized correlational methodology (Kimmel, 1971). This approach now often falls under the label of “descriptive psychopathology research.” The global purpose of descriptive psychopathology research is to identify markers that are thought to characterize, or covary with, psychopathological processes of phenotypes. Although markers can be either broadband or specific to forms of psychopathology, the concept itself includes examination of individual difference variables thought to aid in the prediction, diagnosis, or understanding of the consequences of a disorder (Sher & Trull, 1996). Such markers are typically studied via use of sophisticated laboratory methods that may involve biochemical assays, pharmacological or psychological challenges, psychophysiological measures, neuropsychological assessments, or cognitive assessments (see Hunt & Cofer, 1944; Kihlstrom & McGlynn, 1991; Lenzenweger & Dworkin, 1998; Sher & Trull, 1996, for more comprehensive descriptions of this approach).

The move from a strict application of experimental to descriptive psychopathology research methodology appears to be greatly influenced by advances in cognitive and neuropsychological assessment instruments and research. Here, the focus often is to understand higher-order cognitive processes of relevance to various psychopathological states (e.g., executive functioning, language abilities, and attentional functions) via sophisticated instruments. This work also appears to have been driven by early refinements in the Diagnostic and Statistical Manual of Mental Disorders (e.g., DSM; American Psychiatric Association, 1994). This shift in focus not only contributed to understanding the nature of abnormal behavior (Chapman & Chapman, 1973; Ingram, 1986; McNally, 1998), but it also provided important insights into the role of cognitive functioning in the development, expression, and maintenance of psychopathology (Abramson & Seligman, 1977; Kihlstrom & McGlynn, 1991; Maser & Seligman, 1977).

Overall, experimental psychopathology research has grown from basic laboratory roots and in many respects represents a hybrid of other laboratory


and clinical research. It has been influenced by a number of philosophical, contextual, and methodological developments over the past 100-plus years. Currently, experimental psychopathology tends to reflect work that is concerned with underlying processes for psychopathology and often examines them via experimental or correlational methodology. With this background, we now turn to a more in-depth discussion of the main methodological approaches employed in experimental psychopathology research.

Underlying Conceptual Approach

Forsyth and Zvolensky (2002) derived a two-dimensional scheme for characterizing experimental psychopathology work. The first dimension spans research where an independent variable is manipulated or not manipulated. The second dimension includes the nature of the population under study. The resulting matrix yields four possible ways to characterize experimental psychopathology research (i.e., Type I, Type II, Type III, and Type IV). Please see Table 2.1 for a summary of these labels and their definitions.

Type I Experimental Psychopathology Research

Type I research involves the manipulation of independent variables and examination of their effects on behavior in nonclinical samples. Although both dimensions characterize experimental psychology research, they represent experimental psychopathology research when the a priori focus is on elucidating processes that contribute, either in whole or in part, to the genesis or maintenance of abnormal behavior (see Table 2.1). In short, these studies can address “if,” “how,” and “why” questions concerning pathogenic variables and processes. Included here would be studies that attempt to produce critical features of psychopathology in organisms with no known history of psychopathology (e.g., experimental neurosis; Pavlov, 1961; Wolpe, 1958). In such work, psychopathology processes represent the dependent variable of interest. These processes are often induced directly in mild but prototypical forms.

Given the focus on the induction of psychopathology, participants in Type I research often are those who have no preexisting history of psychopathology. Such participants, unlike those with preexisting psychopathology, provide experimental psychopathologists with a relatively “clean” biobehavioral history upon which to engage in theory-driven causal-oriented hypothesis testing. Moreover, bodies of work on a particular type of process (e.g., respondent learning during physical stress) theoretically offer a “normative context” upon which sophisticated evaluations of similar processes in clinical samples can be better understood. Although it is sometimes common to challenge the use of nonclinical samples in these studies, such arguments are often not theoretically justified from an experimental psychopathology research tradition. Indeed, experimental psychopathology seeks to determine, on an a priori basis, the nature of specific biobehavioral processes moving one from a normal psychological experience to a dysfunctional one. As indicated, the basic assumption guiding this work is that abnormal behavior is governed by the same principles and classes of variables that determine normal adaptive behavior. It is the combination of

Table 2.1. Classifications and Definitions of Psychopathology Research

Dimension | Definition
Type I: Experimental psychopathology research | The manipulation of independent variables to observe their effects on behavior in nonclinical samples. Here, the a priori focus is on elucidating variables that contribute to the genesis of abnormal behavior.
Type II: Quasi-experimental psychopathology research | The manipulation of independent variables to observe their effects on biobehavioral processes in samples with a well-established type of psychopathology, or among persons displaying well-established or subclinical features of psychopathology.
Type III: Nonpatient psychopathology research | No manipulation of independent variables; is limited to descriptive statements about behavioral and psychological processes in nonclinical samples.
Type IV: Descriptive psychopathology research | No manipulation of independent variables; is limited to descriptive statements about psychopathology in samples with well-established or subclinical features of psychopathology.
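
The two-dimensional scheme summarized in Table 2.1 can also be read as a simple decision rule. The following is a minimal sketch, not part of Forsyth and Zvolensky's (2002) account: it assumes that each dimension can be reduced to a yes/no judgment, and the names ResearchType, classify_study, manipulates_iv, and clinical_sample are hypothetical labels introduced here for illustration only.

```python
from enum import Enum


class ResearchType(Enum):
    """The four labels listed in Table 2.1."""
    TYPE_I = "Type I: Experimental psychopathology research"
    TYPE_II = "Type II: Quasi-experimental psychopathology research"
    TYPE_III = "Type III: Nonpatient psychopathology research"
    TYPE_IV = "Type IV: Descriptive psychopathology research"


def classify_study(manipulates_iv: bool, clinical_sample: bool) -> ResearchType:
    """Map the two dimensions of the scheme onto one of the four types.

    manipulates_iv: True if an independent variable is manipulated.
    clinical_sample: True if the sample shows well-established or
        subclinical features of psychopathology (an assumed binary
        simplification of the population dimension).
    """
    if manipulates_iv:
        # Manipulation present: Type I (nonclinical) or Type II (clinical).
        return ResearchType.TYPE_II if clinical_sample else ResearchType.TYPE_I
    # No manipulation: Type III (nonclinical) or Type IV (clinical).
    return ResearchType.TYPE_IV if clinical_sample else ResearchType.TYPE_III


if __name__ == "__main__":
    # A study that manipulates a variable in a nonclinical sample.
    print(classify_study(manipulates_iv=True, clinical_sample=False).value)
```

In practice the population dimension is often graded rather than binary (e.g., persons who vary along a risk dimension), so a sketch of this kind is best treated as a reading aid for the table rather than a classification tool.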


such variables that results in variations in behavior, some of which may be characterized as abnormal or maladaptive in a psychological sense (Sandler & Davidson, 1971). The task, therefore, is to examine how varied permutations of such variables result in psychopathological behavior. Thus, such questions are impossible to address among those already experiencing psychopathology. For example, Peters, Constans, and Mathews (2011) employed a Type I research paradigm to test the hypothesis that attributional style may be one causative factor of depression vulnerability. Here, 54 undergraduate students, without a history of depression, were randomly assigned to one of two experimental conditions: resilience condition, n = 28; vulnerability condition, n = 26. The resilience condition involved exposing participants to 60 descriptions that promoted a self-worthy, stable attribution of a positive event and 60 descriptions that promoted an unstable attribution unrelated to self-worth for a negative event. In contrast, the vulnerability condition involved exposing participants to 60 descriptions that promoted a self-deficient, stable attribution of a negative event and 60 descriptions that promoted an unstable attribution unrelated to self-worth for a positive event. Following exposure to the assigned descriptions, all participants subsequently completed a stressor task (i.e., Cognitive Ability Test). Through a series of assessments, Peters and colleagues measured the change in mood state from before to after manipulation. Results indicated that individuals in the resilience condition reported less depressed mood (compared to the vulnerability condition) in response to the academic stressor (please see Peters et al., 2011, for a complete description of study methods and results).

Notably, Type I models naturally do not yield a complete account of how psychopathology develops. The reason is that experimental psychopathology models tend to be highly local and specific, and for ethical, pragmatic, and strategic reasons also tend to focus on specific subsets of variables in relation to the induction of prototypical aspects of abnormal behavior. That is, the variables shown to produce key features of psychopathology in a specified population represent only a subset of a universe of possible causal variables that may be subjected to experimental scrutiny in the relatively closed system of the laboratory. Although it often is assumed that such variables will lead to similar behavioral effects in the open system of the natural world (Mook, 1983), this may not always be true. The open system is subject to many dynamic influences, and psychopathology within such a system is often a product of multiple controlling variables. Type I models, therefore, should not be viewed as exact replicas of psychological disorders, or as the model of a specific form of psychopathology based on correspondence alone.

The logic of Type I experimental psychopathology research is similar to that of basic research: to yield scientific understanding of the independent variables and relevant processes that cause or maintain forms of psychopathology. This approach is guided by the view that diagnostically similar and dissimilar forms of psychopathology (American Psychiatric Association, 1994) are complexly determined. Thus, two individuals who meet identical DSM diagnostic criteria for Disorder X may exhibit markedly different histories with respect to causal variables (i.e., equifinality), just as two individuals who meet criteria for different DSM diagnoses may exhibit fairly similar histories with respect to putative causal variables (i.e., multifinality). The task of Type I experimental psychopathology research, therefore, is to elucidate a universe of relevant causal processes and their relation to specific forms of psychopathology.

Such Type I research is not driven by, nor necessarily dependent on, a reliable and valid psychiatric nomenclature (but see Kihlstrom & McGlynn, 1991, for a different view). Indeed, Type I experimental psychopathology may be inspired by the psychiatric nomenclature, or more generally by questions about the nature of psychological suffering and behavior maladjustment (e.g., Gantt, 1971). The expectation over time is that this work will yield a clearer understanding of a subset of clinically relevant variables.

Type I research has a strength (and challenge) of being able to identify putative causal variables that are directly manipulable. This aspect is important for this research approach, as such variables, to the extent that they are subject to direct manipulation, also may serve as the “building blocks” of future intervention efforts. Despite the apparent analytic correspondence that is involved with the identification of “controlling variables” and the direct application of such variables to intervention strategies, it is indeed rare that experimental psychopathologists follow these processes fully through to the point of application (see Zvolensky et al., 2001, for a discussion of this issue). This is due, in part, to the analytic agenda of Type I experimental psychopathologists. This agenda is constrained by analytic goals of prediction and influence, and the more


general epistemic agenda of contributing to scientific understanding (i.e., knowledge for knowledge’s sake), and only secondary concern about whether such knowledge may be put to practical use.

Type II Experimental Psychopathology Research

Type II research involves the direct manipulation of independent variables and evaluation of their effects on biobehavioral processes in samples with a well-established type of psychopathology, or among persons who vary in some established psychopathology risk dimension or display subclinical (i.e., they do not reach a diagnostic threshold) features of psychopathology. For example, Felmingham and colleagues (2010) recorded functional magnetic resonance imaging data in both male and female participants with a diagnosis of posttraumatic stress disorder (PTSD), trauma-exposed controls, and non–trauma-exposed controls while they viewed masked facial expressions of fear. Specifically, fear and neutral gray-scale face stimuli were presented in a backward masking paradigm, with target stimuli (fear or neutral) presented for 16.7 ms, followed immediately by a neutral face mask (163.3 ms). By examining neural activation to threat, Felmingham and colleagues sought to elucidate one of the possible pathways through which women have a greater propensity than men to develop PTSD following trauma. Findings indicated that exposure to trauma was associated with enhanced brainstem activity to fear in women, regardless of the presence of PTSD; however, in men, brainstem activity was associated only with the development of PTSD. Moreover, men with PTSD (compared to women) displayed greater hippocampal activity to fear, possibly suggesting that men have an enhanced capacity for contextualizing fear-related stimuli (please see Felmingham et al., 2010, for a complete description of study methods and results). As illustrated here, unlike Type I experimental psychopathology research, Type II research is limited to quasi-experimental questions of the “what,” “if,” and “how” variety. Type II research cannot directly provide answers to “why” questions because the psychopathology is selected for, not produced directly. Although this type of research can attempt to address questions about variables and processes that “cause” psychopathology, it is unable to do so in a clear and unambiguous sense. The central reason for this analytic limitation boils down to this: because the variables responsible for a given psychopathology are unknown (at least in part), one cannot clearly demonstrate the effects of independent variables on the naturally occurring psychopathology; thus, it cannot be clearly shown that the independent variables are related to the psychopathology in a causative sense. Please refer to Table 2.1.

Behavior characterized at one point in time as “abnormal” is presumably the product of complex interactions of causative variables and biobehavioral processes associated with them. A psychiatric diagnosis is a summary label of that cumulative history but is not synonymous with it. That is, although one can assume that psychopathology is the result of a history of causative and controlling variables that is somehow different from the history of persons who do not meet diagnostic criteria, one cannot infer from the diagnosis the putative variables and processes responsible for it. Thus, when independent variables are varied among persons with Diagnosis X, any resulting changes in behavior may be due to the interaction of the independent variable and a host of unknown variables and processes in persons so diagnosed. The result leaves the experimenter hypothesizing about why the changed independent variable functioned differentially in one patient sample compared to another. For instance, research has shown that persons with a diagnosis of panic disorder are more likely to panic in response to biological challenges than persons with other anxiety disorder diagnoses and healthy nonpsychiatric controls (Zvolensky & Eifert, 2000). What remains entirely unclear from this research, however, is why persons with a diagnosis of panic disorder are more likely to panic in response to biological challenges. As the vast majority of psychological and pharmacological treatment strategies are geared toward implementing a treatment based almost exclusively upon psychiatric diagnosis, the “why” question has at first glance very little practical value. To be sure, there is a real and powerful temptation to attribute the cause of differential responses to biological challenge procedures to a psychiatric diagnosis. In doing so, however, the variables responsible for the differential responses are left unexplained.

From a scientific standpoint, the biobehavioral processes associated with a diagnosis of panic disorder or other psychological disorders, and particularly their interaction with a challenge procedure or other experimental manipulations, cannot be fully addressed with Type II research. Although Type II research is typically limited to descriptive (correlational) statements about psychopathology and its relation to other processes, this need not always be the case. For instance, Type I and Type II experimental psychopathology research

Zvolensky, Forsyth, Johnson 13

can be programmatically combined, such that the “psychopathology” is experimentally induced (i.e., Type I) and then subjected to other experimental conditions (i.e., Type II). In this way, one can move closer to addressing how variables responsible for producing the psychopathology interact with variables that may either exacerbate or attenuate a range of behavior associated with psychopathology.

Overall, Type II research can elucidate independent variables that (a) exacerbate, or modify the expression of, existing forms of abnormal behavior (e.g., pathoplastic effects; Clark, Watson, & Mineka, 1994) and (b) may be influenced directly to either prevent or ameliorate psychopathology (e.g., treatment intervention as an independent variable). This work occupies an important place in the broader scientific context. For instance, pressing clinical concerns often focus on mechanisms that may be prototypical “gateways” for other types of destructive or problematic behaviors. Similarly, psychopathologists may attempt to elucidate how the presence or absence of certain variables or conditions either increases or decreases the risk for a specific type of behavior, including how such variables may exacerbate the clinical severity of a preexisting psychological condition. Yet when Type II research focuses on testing the efficacy of manipulable treatment interventions on therapeutic outcomes, it more likely belongs within the realm of clinical psychopathology research (see Kihlstrom & McGlynn, 1991; Sher & Trull, 1996). Finally, Type II experimental psychopathology is dependent on the reliability and validity of the psychiatric nomenclature, including methods used to identify and discriminate persons with known or subclinical forms of psychopathology from “normals,” based largely on topographic or symptom features alone.

Type III “Nonpatient” Psychopathology Research

Unlike research of the Type I and Type II varieties, Type III research involves no manipulation of independent variables and is limited to descriptive statements (i.e., largely correlational) about behavioral and psychological processes in nonclinical samples. For example, Proctor, Maltby, and Linley (2011) recruited 135 nonclinical undergraduate students to complete self-reported measures of strengths use, subjective well-being, self-esteem, self-efficacy, health-related quality of life, and values-in-action. Here, Proctor and colleagues generated descriptive statements regarding the most- and least-commonly endorsed character strengths as well as the relation(s) observed between these strengths and overall life satisfaction. Specifically, results indicated that the values-in-action strengths of hope and zest were significant positive predictors of life satisfaction (please see Proctor et al., 2011, for a complete description of study methods and results). Although some Type III research could, in principle, contribute to understanding psychopathology (e.g., elucidating behavioral or individual difference risk factors associated with psychological problems), often the goals of such research are not specific to questions about abnormal behavior per se. Only when Type III research is embedded within the larger context of clinical science and practice may it become relevant to understanding psychopathology; this topic is beyond the scope of the present paper. Please refer to Table 2.1.

Type IV Descriptive Psychopathology Research

Type IV research involves no manipulation of independent variables and is thus limited to either descriptive or correlational statements about psychopathology in samples with known or subclinical features of psychopathology. Please refer to Table 2.1. As with Type II, the nature of the population under study (i.e., clinical or subclinical individuals) most clearly identifies Type IV research as belonging within the realm of psychopathology research. As such, Type IV research also is predicated on the reliability and validity of the DSM diagnostic system, including related methods of classification. Type IV research has become increasingly popular in recent years, owing much to the growing precision of psychiatric diagnosis and interest in delimiting characteristic features of different forms of abnormal behavior. This work, in turn, draws heavily on sophisticated assessment methodologies and tasks, many of which are drawn from experimental psychology and medical research (Kihlstrom & McGlynn, 1991). For example, Hawkins and Cougle (2011) examined the relation(s) between anger and a variety of clinically diagnosed anxiety disorders among participants in a large, nationally representative survey. Using a combination of self-report measures and structured clinical interviews, Hawkins and Cougle provided correlational statements about the possible link between anxiety-related psychopathology and the expression of anger. Specifically, results of this investigation suggest that there are unique relationships between multiple anxiety disorders (excluding panic disorder and PTSD) and various indices of anger experience and expression that are not better

accounted for by psychiatric comorbidity (please see Hawkins & Cougle, 2011, for a complete description of study methods and results).

Use of such “experimental” tasks in the context of Type IV research sometimes can give the impression that such research is experimental. Yet use of an experimental task does not ipso facto entail that the research is experimental and hence capable of addressing questions about variables and processes that maintain, exacerbate, or attenuate psychopathology. Type IV research usually includes the application (not manipulation) of experimental tasks in the context of elucidating biobehavioral differences between clinical and nonclinical samples (e.g., see Kihlstrom & McGlynn, 1991; McNally, 1998; Williams, Mathews, & MacLeod, 1997). Typically, any observed differences are then used to support inferences about the nature of the psychopathology in question, including presumed underlying dysfunctional processes thought to covary with known forms of psychopathology. Much of this work can be classified as descriptive or demonstration psychopathology studies. This is a direct acknowledgment that Type IV research can inform our understanding about what persons with different forms of abnormal behavior typically do in response to imposed tasks under controlled conditions, but not why they do what they do.

Summary

Four common types of experimental psychopathology research differ in their focus on manipulation of independent variables and sample type (Forsyth & Zvolensky, 2002). These types of research vary in their ability to identify processes governing the origins and maintenance of psychopathology processes. Yet across these types of research there are some overarching assessment issues that are routinely considered. We now turn to a discussion of these molar conceptual considerations in the context of experimental psychopathology research.

Assessment Approach in Experimental Psychopathology Research: Molar Conceptual Considerations

The assessment approach for experimental psychopathology research has no single “strategy” that will work for all types of scientific activities. There also is no standard model that can work for all types of experimental psychopathology research. Yet a number of basic issues, including level of analysis, method of assessment, nature of inferences drawn, and quality of the data obtained, provide a conceptual basis for understanding how and why certain assessment activities are employed in any given type of experimental psychopathology research.

Level of Analysis

In most instances, the procedures employed to execute assessment activities in experimental psychopathology research are highly influenced by the underlying conceptual framework for the psychopathology phenotype in question (Kazdin, 1982). For example, the level of analysis for the assessment of psychopathology processes is largely influenced by the conceptualization of the problem behavior in question. In most cases, assessment activities for experimental psychopathology focus on symptom presentation (e.g., number of panic attacks per observational period), psychopathology phenotype (e.g., alcohol abuse vs. dependence), or the operative system components (cognitive, behavioral, physical, and social context). The level of analysis employed in experimental psychopathology will directly affect the extent to which specific aspects of problematic behavior are assessed.

Assessment at the symptom level in experimental psychopathology focuses on individual behavior (e.g., number of drinks per drinking episode, number or intensity of catastrophic thoughts); it is a unidimensional approach. Assessment at the phenotypic level focuses on the symptoms that covary, and therefore it is multidimensional (e.g., facets of distinct elements of drinking behavior or thought processes); this approach encompasses more elements of the individual’s behavior (e.g., frequency, amount, consequences). Assessment at the system level tends to be more inclusive, assuming that the various systems involved affect one another in a direct fashion; for example, substance use behavior affects anxiety and related mood states and vice versa (Drasgow & Kanfer, 1985). Although more inclusive theoretically, the challenge to using the system-level approach historically has been in the titration of the accuracy of the operative conceptual model in terms of the pragmatic aspects of the assessment processes (e.g., isolating the appropriate level to assess problem behavior relative to existing scientific information about it).

Methods

All levels of analysis for the assessment of experimental psychopathology can theoretically involve the measurement of responses across cognitive, behavioral, and physiological systems. The measurement of specific systems varies both by content area

(e.g., depressive vs. anxiety disorder) and the particular systems theoretically involved with the problem behavior in question. Therefore, there is great variability across distinct types of psychopathology despite recognition of some of their overarching commonalities. The classic work by Cone (1978) provides a model for understanding the assessment process in experimental psychopathology research.

Cone (1978) identified that assessment tactics vary along three dimensions: content, directness, and generalizability. Content reflects the nature of the responses being assessed (cognitive, behavioral, and physiological). Directness pertains to the immediacy of the assessment of responses in the time and context in which they occur (e.g., measuring alcohol use during periods of actual use vs. retrospective reports of alcohol use behavior). Common forms of indirect methods of assessment include interviews, questionnaires, and ratings by self or others; common forms of direct assessment include monitoring behavior in real-world settings (e.g., time sampling approaches), role playing, and various forms of analogue behavior (e.g., measuring emotional responses to drug cues in the laboratory). Generalizability refers to the consistency of the responses being measured across a particular domain. There are distinct domains of generalizability often relevant to psychopathological processes (e.g., time, setting, method; Cone, 1978).

Contingent upon the goals of the assessment, there will be natural variation in the method and content targeted for measurement in experimental psychopathology. There also are likely differences in method and content during assessment as a function of the training and background of the assessor. There is naturally no “universal” or “correct” model that will be sufficient to meet the assessment objectives for all types of psychopathology. In short, the methods employed to assess the content areas of primary interest will vary directly as a function of the assessment goals themselves. Additionally, pragmatic considerations (e.g., time and resources) can greatly affect the choice of method employed in the assessment process.

Drawing Inferences

The data derived from the assessment process in experimental psychopathology can be interpreted in distinct ways; the challenge is isolating the best possible information for maximum explanatory value. There are three commonly employed forms of inference in experimental psychopathology research: person-referenced, criterion-referenced, and norm-referenced approaches (Kazdin, 1977).

Person-referenced approaches focus on the individual and compare measured responses to those of the same person (e.g., number of times of marijuana use per week). The referent is the person himself or herself and his or her own behavior in a particular epoch. Criterion-referenced approaches focus on responses of the individual in the context of a specified standard (e.g., endorsing a score of 10 or higher on a designated alcohol use measure is suggestive of alcohol abuse or dependence). Although criterion-referenced approaches often provide a specific benchmark upon which to evaluate a response, the challenge for most cases of psychopathology has often been in isolating objective indices of “adaptive” responding. Norm-referenced approaches compare the observed responses to a normative group. For example, a researcher may compare the degree and type of attentional bias for threat experienced by a person with generalized anxiety disorder, who is also currently depressed, to the typical attentional biases observed among nondepressed persons with this same anxiety disorder diagnosis.

Determining Assessment Value

With the consideration of the types of inference modalities described above, it is important to note that the quality of the data derived from any given assessment activity of experimental psychopathology research can be interpreted from distinct conceptual models. Just as the goals of the assessment often affect the types of content and methods used, the modes of evaluating the quality of data derived from any given assessment activity vary greatly. These approaches differ in the assumptions made about underlying psychopathology, measurement processes, and interpretation guidelines. Thus, the utilization of any given model for any given instance of experimental psychopathology research may depend on any number of factors (e.g., familiarity with a particular model; agreement and understanding of underlying assumptions).

Arguably the model most commonly employed in experimental psychopathology research is the “classic” psychometric model (Guion, 1980). The basic premise of the psychometric model is that there is measurement error; the goal, therefore, is to develop and utilize instruments that maximize accuracy and minimize error. This approach emphasizes the validity and reliability of a particular assessment tool in capturing the processes or variables of interest. The psychometric model has driven many of the assessment approaches used in better understanding psychopathology. The generalizability

model focuses on determining the nature of variability in regard to the setting or context in which it was obtained (Cone, 1978). In short, variability is understood in relation to the contextual conditions (e.g., time of assessment, setting). To the extent there are large differences in context for any given assessment (e.g., responding to drug cues when in a positive vs. negative mood state), interpretation of those data is made in concert with the context in which they were obtained. The accuracy model posits that the usefulness of a given assessment tool centers on how well it captures the process in question (Cone, 1978). Although seemingly simple, it is often not a pragmatically feasible approach for experimental psychopathology research, as there are not many instances wherein there exists a “standard” against which to evaluate “accuracy.”

Summary

Each type of experimental psychopathology research involves a consideration of a number of overarching conceptual considerations from an assessment standpoint. There is no single formula or set of standards that will uniformly be applied to all sets of questions being addressed. The assessment approach taken in experimental psychopathology, therefore, is theoretically driven and tied directly to the psychopathology process in question.

Key Challenges to Experimental Psychopathology Research

Although experimental psychopathology offers a unique and powerful lens through which to examine psychopathological processes (Zvolensky et al., 2001), there are numerous challenges, some theoretical and some practical, to the application and overall developmental sophistication of experimental psychopathology research. For example, there are difficulties inherent in the types of laboratory models that can be utilized (Abramson & Seligman, 1977). Specifically, there are notable challenges such as ethics and knowledge regarding (a) the relative ability to comprehensively understand the types of symptoms that characterize a phenotype of interest and (b) the ability to take the steps to produce these symptoms in humans when they are determined (Abramson & Seligman, 1977). Isolating ways to address these types of limitations will be beneficial, and perhaps central, in maximizing the impact of experimental psychopathology on the field as a whole. We now present some of these challenges and, where possible, potential strategies for overcoming them.

Interconnection with Practice

Scholars have frequently lamented the gaps between science and practice (e.g., Onken & Bootzin, 1998). Indeed, the field of behavioral science, as a whole, has made many efforts to call attention to the benefits of such endeavors and devised strategies for doing so (e.g., National Advisory Mental Health Council Behavioral Science Workgroup, 2000). The lack of integration of experimental psychopathology research into mainstream clinical science and practice is highly similar to the widely noted gap between clinical science and clinical practice. Although several factors are likely responsible for this gap, there are two that appear to be at the crux of the matter. We have already alluded to the different analytic agendas of the basic experimental psychopathology researcher and the applied practitioner. This difference is compounded by a second issue, namely a language gap. The scientist prefers a language that is precise, but not necessarily broad in scope, whereas the practitioner prefers concepts that are often broad, but technically imprecise. For instance, emotions are frequently discussed clinically and are often the focus of therapeutic attention, yet emotion and related terms used to describe feelings are not considered technical concepts within the science of psychology. As others have identified, information between diverse fields of psychology needs to be bridged to maximize the full range of possible growth and practical impact of psychological science (Onken & Bootzin, 1998).

Unfortunately, there has not been a systematic a priori research agenda focused on understanding how basic biobehavioral processes are altered in specific forms of psychopathology and the implications of these alterations for etiology, course, and successful treatment/prevention. In our view, a sophisticated “translational focus” will require that basic research on biobehavioral processes directly inform clinical questions, and conversely, that observations about naturally occurring psychopathology be used to guide efforts to understand operative biobehavioral processes that give rise to diagnostic labels or more generally “psychopathology.” This kind of translational integration will not likely come about via current common practices of providing only parenthetical references to basic research in clinical articles, and similarly of discussing clinical issues only tangentially in basic research articles. Such efforts do not embody the spirit of experimental psychopathology, for which the core strengths derive from an a priori focus on basic research on psychopathology processes with at least one eye on practical utility.

Serious concerns regarding experimental psychopathology research will rightly continue so long as the methods and language employed are not considered in terms of understanding salient and manipulable variables and processes that characterize psychopathology and human suffering more generally.

One solution might involve steps consistent with Hayes’ (1987) mutual interest model, whereby basic experimental psychopathologists and applied practitioners collaborate and communicate with one another when their interests intersect. Based upon similar logic, another solution might include the creation of translational research centers, where basic and applied psychopathologists are housed under one roof, devise integrative basic and applied programs of research, outline both technical and nontechnical analyses of the relation between basic processes and psychosocial treatment development and testing, and ultimately disseminate such work to practitioners and the general public. We envision that such dissemination efforts would include both technical and nontechnical descriptions of variables and processes shown to cause and exacerbate psychopathology, and similarly descriptions of how such variables and processes relate to intervention components contained in psychosocial treatments. Ideally, such work would link psychosocial interventions with manipulable variables and processes shown to cause or exacerbate psychopathology. It is our belief that such work would likely yield more powerful psychosocial interventions that focus less on psychiatric diagnosis (i.e., symptom topography) and more on shared vulnerabilities and core processes responsible for human suffering and its alleviation. The resulting treatments would, in principle, become more functional and process-driven.

Process-Oriented Experimental Psychopathology Across Response Systems

Contemporary mental health research and practice are predicated upon accurate classification of psychological disorders. In fact, the reliance on the DSM system is apparent from all standpoints of mental health research and practice. Building upon changes in health care policies and procedures, there has been a well-reasoned “push” to standardize and manualize psychosocial and pharmacological treatments for psychological disorders. Whereas the early study of psychopathology dealt with core biobehavioral processes, one could argue this movement embodies a “return to a medical model” type of thinking (Follette, Houts, & Hayes, 1992; Hayes & Follette, 1992).

Whereas many medical diagnoses implicate pathogenic disease processes that underlie medical syndromes, psychopathology research has not always elucidated or emphasized these core processes. Rather, diagnoses themselves have come to dominate contemporary mental health research and practice. For example, behavior problems often are defined by symptoms (i.e., topography) without reference to underlying biobehavioral processes (i.e., functions). Thus, a specific treatment for Disorder X may be “fitted” to a specific patient with Disorder X. Although this type of diagnosis-based clinical research and practice is certainly here to stay for political, pragmatic, and clinical reasons, greater attention to core processes will undoubtedly be important to move psychopathology research forward in a meaningful way (Eifert et al., 1998).

Research and practice that do not attend to core biobehavioral processes may lead to a number of problems. For example, as described by Kiesler (1966) and Persons (1991), there can be a “myth of uniformity,” such that all persons with Disorder X are presumed to be more alike than different. Clinicians largely operate from within an idiographic framework when working with their clients and rightly question work based exclusively on a DSM type of system because of the implicit nomothetic assumption of uniformity across persons. This may be particularly true for behavioral problems that do not fit within the current DSM system (e.g., Eifert, Zvolensky, & Lejuez, 2000). Moreover, as Wolpe (1989) noted, common problems (e.g., panic disorder) do not necessarily imply common histories, let alone common active clinical processes. In fact, there are often different dysfunctional processes that are operating for two persons with the same disorder (i.e., equifinality) and quite possibly similar behavioral processes that cut across dissimilar DSM categories (i.e., multifinality). Thus, any psychosocial treatment that involves targeting core processes linked to certain historical antecedents will optimally need to address the dysfunctional processes at work for a single case. Experimental psychopathology is well suited for clinical application because it often focuses on core processes that give rise to psychopathology. Thus, it can give clinicians a tangible strategy based upon a theoretical understanding of the dysfunctional processes at work for any one client.

Another, related way in which experimental psychopathology can have an impact on the process level is by contributing to future interdisciplinary movements within behavioral science. In current

practice, it is common for psychopathologists to to the understanding of the pathophysiology in
develop theories about behavior problems with little reference to whether observations are supported by theories at more basic levels of science (e.g., neurobiological). Unfortunately, this results in discontinuity between and within various scientific fields. Reference to this lower level of analysis can and should inform and constrain theory about observed psychopathology at higher levels. At the same time, some may suggest that the “Decade of the Brain” has fueled the viewpoint that all psychopathology outcomes can be reduced to biology. Yet psychopathology cannot be usefully considered strictly in biological terms as by definition it is not reducible to biological processes alone. For example, fear is a functional state characterized by collateral changes across systems and therefore is not reducible to biological activities alone. As Miller and Keller (2000, p. 213) have argued, “we advocate not that every study employ both psychological and biological methods, but that researchers not ignore or dismiss relevant literature, particularly in the conceptualization of their research.”

Cross-level of analysis theory development and evaluation of core processes requires broad assessment of pertinent constructs that integrates information from multiple response systems at these different levels of analysis. Utilization of the laboratory context for the examination of clinical psychopathology provides the experimental psychopathology researcher with the necessary flexibility in measurement. The multimethod assessment strategy may be particularly helpful when completed within the context of experimental elicitation of important clinical phenomena. Assessment of emotional response provides one such example. Without broad measurement, any one index may yield ambiguous, incomplete, or misleading information about the affective response (Cacioppo & Tassinary, 1990). Moreover, during clinically relevant cognitive-affective distress states, “differential” information from response domains may reliably manifest to inform theory about underlying mechanisms.

Importantly, such efforts to study psychopathology can be greatly aided by technological advancements, as reflected by those in human neuroimaging. Numerous functional brain imaging techniques are currently available to examine neural mechanisms in clinical psychopathology. Some examples include positron emission tomography, single photon emission computed tomography, and functional magnetic resonance imaging. For example, neuroimaging techniques have contributed significantly
obsessive-compulsive disorder (Saxena et al., 1998). Thus, experimental psychopathology researchers’ ability to advance understanding of the mechanisms responsible for psychopathology increases with the continued application of these brain imaging technologies.

Laboratory Preparations Are Not Equivalent to Clinical Processes

Experimental psychopathologists have devised numerous laboratory preparations to induce and elucidate clinically relevant processes. Examples of such work include preparations that use noncontingent aversive stimulation to study motivation and emotion dysregulation (Mineka & Hendersen, 1985); procedures to elucidate the role of onset and offset predictability and controllability in relation to anxious responding (Zvolensky, Eifert, & Lejuez, 2001); mood induction preparations in depressive responding (Martin, 1990); conditioning preparations to study the acquisition and transfer of aversive affective states to a variety of stimuli (Forsyth, Daleiden, & Chorpita, 2000); and preparations that give rise to verbal-regulatory (i.e., cognitive) processes involved in psychopathology (Hayes, Jacobson, Follette, & Dougher, 1994). Such preparations are often a reliable way to establish or evoke “clinical” processes. Yet experimental preparations are not equivalent to clinical processes, nor should they be treated as such. Indeed, the distinction between procedure and process is critical when considering the clinical relevance of findings from experimental psychopathology. Thus, the reporter of such work has to use cautious language when disseminating this type of scholarship.

There has been a great tendency to equate experimental preparations with clinical processes, leading to erroneous conclusions. Here, we consider Pavlovian or respondent conditioning in relation to understanding the origins and maintenance of anxiety-related disorders as one example of this problem. In their most basic form, such preparations involve pairing a previously neutral stimulus (NS) in a contingency with an unpleasant event or aversive unconditioned stimulus (UCS) that is capable of eliciting a strong negative emotional unconditioned response (UCR). With suitable controls, this preparation will result in a change in the emotion-eliciting functions of the NS, such that it becomes a conditional stimulus (CS) capable of eliciting a conditioned emotional response (CER or CR; i.e., fear and anxiety) more or less similar to the original

Zvolensky, Forsyth, Johnson 19

UCR (Mineka & Zinbarg, 1996). At the core, these developments share one common etiological thread: aversive stimulus functions are transferred via an association between some otherwise nonthreatening object or event and an abrupt and highly aversive neurobiological response (Forsyth & Eifert, 1998).

Associative learning is not predicated on identifying or manipulating a pain- or fear-inducing stimulus (i.e., a UCS) as is typical of laboratory preparations of associative fear conditioning. For example, operant preparations can yield respondent processes (e.g., punishing consequences can establish antecedents as conditioned suppressors that elicit a wide range of conditioned emotional responses). The critical process variable that enables transfer of aversive stimulus functions to otherwise innocuous stimuli is the response, not identifying contiguous NS–UCS pairings; this view has received recent experimental support (see Forsyth et al., 2000).

Unfortunately, laboratory conditioning preparations involving NS–UCS pairings have been taken as the definitive working model to explain associative processes in the etiology of phobias and other fear-related conditions seen clinically. Hence, if a sequence of events leading to clinical fear onset cannot be construed in terms of pairings between some traumatic/painful UCS in relation to some object or event in the environment, the phobia cannot be due to Pavlovian conditioning (e.g., see Menzies & Clarke, 1995; Rachman, 1991). Here, it is not disputed that phobias may be acquired by means other than direct traumatic conditioning, or, more importantly, exposure to an identifiable pain-producing aversive event. What is disputed, however, are the contentions that (a) finding an identifiable UCS is the only evidence for direct conditioning and (b) laboratory fear conditioning preparations involving CSs and UCSs are the way to define associative fear onset processes clinically.

As Eysenck (1987) correctly pointed out, from the experimenter’s or clinician’s perspective, evidence for direct conditioning typically involves either the manipulation or identification of neutral stimuli (NSs) in relation to identifiable pain-producing stimuli (UCSs). That is, experimenters tend to define conditioning processes in terms of conditioning preparations. Eysenck goes on to say, however, that from the individual’s perspective, direct conditioning involves the experience of abrupt and aversive interoceptive or bodily responses. That is, as far as research subjects and clients are concerned, conditioning involves the bodily effects of UCSs (not UCSs themselves) in relation to objects or events in the environment.

Experimental psychopathologists have emphasized clinically relevant processes and have devised powerful experimental preparations to elucidate such processes. However, Pavlovian conditioning itself was never considered a pathogenic process (Pavlov, 1903), but rather could become pathogenic when interacting with other variables. At some point, Pavlovian fear conditioning began to be viewed as pathogenic itself. Researchers then began to treat respondent conditioning preparations and associative processes as monotypic (Lazarus, 1971; Rachman, 1977). This view has arguably had the unfortunate effect of obscuring clinically relevant learning processes that are involved in the acquisition and maintenance of fearful and anxious behavior seen clinically (Davey, 2002).

What accounts for an otherwise adaptive conditioned emotional response leading to anxiety psychopathology in some individuals, but not others? In this example, a narrow focus on the preparations involved has led to the following spurious line of reasoning: (a) many persons are exposed to events that could be construed in terms of Pavlovian fear conditioning preparations, (b) yet they fail to develop clinically significant conditioned emotional responses as a result, and therefore (c) conditioning cannot account for fear onset in the majority of cases seen clinically. A focus on the formal and structural properties of the preparations overlooks factors that can either potentiate or depotentiate the probability of conditioning and the extent to which conditioning processes become problematic. Indeed, we know that prior history of control over aversive events, prior history of exposure to stimuli without aversive consequences, and contextual factors can, either alone or in combination, influence whether conditioned emotional responses are acquired and the extent to which they are evoked on subsequent occasions (Bouton, 2000). Consideration of such factors does not mean that conditioning processes are not involved, but rather illustrates that conditioning is complex and functionally determined.

Summary

Experimental psychopathology represents a subfield of psychological science aimed at elucidating the processes underlying abnormal behavior. The present chapter provided a synopsis of the historical perspectives and key elements of experimental psychopathology research. Further, the methodological approaches employed in experimental psychopathology were described in relation to conceptual

20 Laboratory Methods in Experimental Psychopathology

considerations. Although experimental psychopathology has made major contributions to the field of psychological science, there are numerous points of entry for it to maximize its integrative potential across basic and applied domains (translational function). Future experimental psychopathology work will likely need to continue to develop and expand in innovative ways to overcome key challenges facing it and the field as a whole.

References

Abramson, L. Y., & Seligman, M. E. P. (1977). Modeling psychopathology in the laboratory: History and rationale. In J. P. Maser & M. E. P. Seligman (Eds.), Psychopathology: Experimental models (pp. 1–26). San Francisco: W. H. Freeman.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
American Psychological Association (1947). Recommended graduate training program in clinical psychology. American Psychologist, 2, 539–558.
Anderson, O. D., & Liddell, H. S. (1935). Observations on experimental neurosis in sheep. Archives of Neurological Psychiatry, 34, 330–354.
Benjamin, L. Jr. (2000). The psychology laboratory at the turn of the 20th century. American Psychologist, 55, 318–321. DOI: 10.1037/0003-066X.55.3.318
Bouton, M. E. (2000). A learning theory perspective on lapse, relapse, and the maintenance of behavior change. Health Psychology, 19, 57–63. DOI: 10.1037/0278-6133.19.Suppl1.57
Brown, J. F. (1937). Psychoanalysis, topological psychology and experimental psychopathology. Psychoanalytic Quarterly, 6, 227–237.
Cacioppo, J. T., & Tassinary, L. G. (1990). Inferring psychological significance from physiological signals. American Psychologist, 45, 16–28. DOI: 10.1037/0003-066X.45.1.16
Chapman, L. J., & Chapman, J. P. (1973). Disordered thought in schizophrenia. Englewood Cliffs, NJ: Prentice-Hall.
Clark, L. A., Watson, D., & Mineka, S. (1994). Temperament, personality, and the mood and anxiety disorders. Journal of Abnormal Psychology, 103, 103–116. DOI: 10.1037/0021-843X.103.1.103
Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and a taxonomy. Behavior Therapy, 9(5), 882–888. DOI: 10.1016/S0005-7894(78)80020-3
Davey, G. C. L. (2002). “Nonspecific” rather than “nonassociative” pathways to phobias: A commentary on Poulton and Menzies. Behaviour Research and Therapy, 40, 151–158. DOI: 10.1016/S0005-7967(01)00046-8
Drasgow, F., & Kanfer, R. (1985). Equivalence of psychological measurement in heterogeneous populations. Journal of Applied Psychology, 70(4), 662–680. DOI: 10.1037/0021-9010.70.4.662
Eifert, G. H., Schulte, D., Zvolensky, M. J., Lejuez, C. W., & Lau, A. W. (1998). Manualized behavior therapy: Merits and challenges. Behavior Therapy, 28, 499–509. DOI: 0005-7894/97/0499-050951.0
Eifert, G. H., Zvolensky, M. J., & Lejuez, C. W. (2000). Heart-focused anxiety and chest pain: A conceptual and clinical review. Clinical Psychology: Science and Practice, 7(4), 403–417. DOI: 10.1093/clipsy/7.4.403
Estes, W. K., & Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology, 29, 390–400. DOI: 10.1037/h0062283
Eysenck, H. J. (Ed.) (1961). Handbook of abnormal psychology: An experimental approach. New York: Basic Books.
Eysenck, H. J. (Ed.) (1973). Handbook of abnormal psychology. San Diego, CA: EdITS Publishers.
Eysenck, H. J. (1987). Behavior therapy. In H. J. Eysenck & I. Martin (Eds.), Theoretical foundations of behavior therapy (pp. 3–34). New York: Plenum.
Felmingham, K., Williams, L. M., Kemp, A. H., Liddell, B., Falconer, E., Peduto, A., & Bryant, R. (2010). Neural responses to masked fear faces: Sex differences and trauma exposure in posttraumatic stress disorder. Journal of Abnormal Psychology, 119(1), 241–247. DOI: 10.1037/a0017551
Follette, W. C., Houts, A. C., & Hayes, S. C. (1992). Behavior therapy and the new medical model. Behavioral Assessment, 14, 323–343.
Forsyth, J. P., Daleiden, E. L., & Chorpita, B. F. (2000). Response primacy in fear conditioning: Disentangling the contributions of UCS vs. UCR intensity. Psychological Record, 50, 17–33.
Forsyth, J. P., & Eifert, G. H. (1998). Response intensity in content-specific fear conditioning comparing 20% versus 13% CO2-enriched air as unconditioned stimuli. Journal of Abnormal Psychology, 107(2), 291–304. DOI: 10.1037/0021-843X.107.2.291
Forsyth, J. P., & Zvolensky, M. J. (2002). Experimental psychopathology, clinical science, and practice: An irrelevant or indispensable alliance? Applied and Preventive Psychology: Current Scientific Perspectives, 10, 243–264. DOI: 10.1016/S0962-1849(01)80002-0
Franks, C. M. (Ed.) (1964). Conditioning techniques in clinical practice and research. Berlin: Springer.
Franz, S. I. (1912). Experimental psychopathology. Psychological Bulletin, 9, 145–154.
Gantt, W. H. (1971). Experimental basis for neurotic behavior. In H. D. Kimmel (Ed.), Experimental psychopathology: Recent research and theory (pp. 33–48). New York: Academic Press.
Gregor, A. A. (1910). Leitfaden der experimentellen psychopathologie. Berlin: Allzeit Voran.
Guion, R. M. (1980). On Trinitarian doctrines of validity. Professional Psychology, 11(3), 385–398. DOI: 10.1037/0735-7028.11.3.385
Hawkins, K. A., & Cougle, J. R. (2011). Anger problems across the anxiety disorders: Findings from a population-based study. Depression and Anxiety, 28, 145–152. DOI: 10.1002/da.20764
Hayes, S. C. (1987). The relation between “applied” and “basic” psychology. Behavior Analysis, 22, 91–100.
Hayes, S. C., & Follette, W. C. (1992). Can functional analysis provide a substitute for syndromal classification? Behavioral Assessment, 14, 345–365.
Hayes, S. C., Jacobson, N. S., Follette, V. M., & Dougher, M. J. (Eds.) (1994). Acceptance and change: Content and context in psychotherapy. Reno, NV: Context Press.
Hoch, P. H., & Zubin, J. (Eds.) (1957). Experimental psychopathology. New York: Grune & Stratton.
Hunt, J. M., & Cofer, C. N. (1944). Psychological deficit. In J. M. Hunt (Ed.), Personality and the behavior disorders (pp. 971–1032). Oxford: Ronald Press.


Ingram, R. E. (1986). Information processing approaches to clinical psychology. Orlando, FL: Academic Press.
Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 10(1), 141–150.
Kazdin, A. E. (1982). Single-case experimental designs in clinical research and practice. New Directions for Methodology of Social & Behavioral Science, 13, 33–47.
Kiesler, D. J. (1966). Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 65, 110–136. DOI: 10.1037/h0022911
Kihlstrom, J. F., & McGlynn, S. M. (1991). Experimental research in clinical psychology. In M. Hersen, A. Kazdin, & A. Bellack (Eds.), Clinical psychology handbook (pp. 239–257). New York: Pergamon Press.
Kimmel, H. D. (1971). Introduction. In H. D. Kimmel (Ed.), Experimental psychopathology: Recent research and theory (pp. 1–10). New York: Academic Press.
Krasnogorski, N. I. (1925). The conditioned reflexes and children’s neuroses. American Journal of Disorders of Children, 30, 753–768.
Landis, C. (1949). Experimental methods in psychopathology. Mental Hygiene, 33, 96–107.
Lazarus, A. A. (1971). Behavior therapy and beyond. New York: McGraw-Hill.
Lenzenweger, M. F., & Dworkin, R. H. (Eds.) (1998). Origins and development of schizophrenia: Advances in experimental psychopathology. Washington, DC: American Psychological Association.
Liddell, H. S. (1938). The experimental neurosis and the problem of mental disorder. American Journal of Psychiatry, 94, 1035–1041.
Lilienfeld, S. O. (1996, Jan/Feb). EMDR treatment: Less than meets the eye? Skeptical Inquirer, 25–31.
Lubin, A. J. (1943). The experimental neurosis in animal and man. American Journal of the Medical Sciences, 205, 269–277. DOI: 10.1097/00000441-194302000-00026
Mackinnon, D. W., & Henle, M. (1948). Experimental studies in psychodynamics; A laboratory manual. Cambridge, MA: Harvard University Press.
Martin, M. (1990). On the induction of mood. Clinical Psychology Review, 10, 669–697. DOI: 10.1016/0272-7358(90)90075-L
Maser, J. D., & Seligman, M. E. P. (1977). Psychopathology: Experimental models. San Francisco: W. H. Freeman.
Masserman, J. H. (1943). Experimental neuroses and psychotherapy. Archives of Neurology and Psychiatry, 49, 43–48.
McFall, R. M. (1991). Manifesto for a science of clinical psychology. Clinical Psychologist, 44(6), 75–88.
McNally, R. J. (1998). Information-processing abnormalities in anxiety disorders: Implications for cognitive neuroscience. Cognition and Emotion, 12, 479–495. DOI: 10.1080/026999398379682
Menzies, R. G., & Clarke, J. C. (1995). The etiology of phobias: A non-associative account. Clinical Psychology Review, 15, 23–48. DOI: 10.1016/0272-7358(94)00039-5
Miller, G., & Keller, J. (2000). Psychology and neuroscience: Making peace. Current Directions in Psychological Science, 9, 212–215. DOI: 10.1111/1467-8721.00097
Mineka, S., & Hendersen, R. W. (1985). Controllability and predictability in acquired motivation. Annual Review of Psychology, 36, 495–529. DOI: 10.1146/annurev.ps.36.020185.002431
Mineka, S., & Zinbarg, R. (1996). Conditioning and ethological models of anxiety disorders: Stress-in-dynamic context anxiety models. In D. A. Hope (Ed.), Perspectives on anxiety, panic, and fear: Volume 43 of the Nebraska Symposium on Motivation (pp. 135–210). Lincoln, NB: Nebraska University Press.
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379–387. DOI: 10.1037/0003-066X.38.4.379
National Advisory Mental Health Council Behavioral Science Workgroup (2000). Translating behavioral science into action. Washington, DC: National Institutes of Health.
Onken, L. S., & Bootzin, R. R. (1998). Behavioral therapy development and psychological science: If a tree falls in the forest and no one hears it. Behavior Therapy, 29, 539–544.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Pavlov, I. P. (1903). Experimental psychology and psychopathology in animals. Herald of the Military Medical Academy, 7(2), 109–121.
Pavlov, I. P. (1961). Psychopathology and psychiatry: I. P. Pavlov selected works. San Francisco: Foreign Languages Publishing House.
Persons, J. B. (1991). Psychotherapy outcome studies do not accurately represent current models of psychopathology. American Psychologist, 46, 99–106. DOI: 10.1037/0003-066X.46.2.99
Peters, K. D., Constans, J. I., & Mathews, A. (2011). Experimental modification of attribution processes. Journal of Abnormal Psychology, 120(1), 168–173. DOI: 10.1037/a0021899
Popplestone, J. A., & McPherson, M. W. (1984). Pioneer psychology laboratories in clinical settings. In J. Brozek (Ed.), Explorations in the history of psychology in the United States (pp. 196–272). Lewisburg, PA: Bucknell University Press.
Proctor, C., Maltby, J., & Linley, A. P. (2011). Strengths use as a predictor of well-being and health-related quality of life. Journal of Happiness Studies, 12(1), 153–169. DOI: 10.1007/s10902-009-9181-2
Rachman, S. (1966). Sexual fetishism: An experimental analogue. Psychological Record, 16, 293–296.
Rachman, S. (1977). The conditioning theory of fear acquisition: A critical examination. Behaviour Research and Therapy, 15, 375–387. DOI: 10.1016/0005-7967(77)90041-9
Rachman, S. (1991). Neo-conditioning and the classical theory of fear acquisition. Clinical Psychology Review, 11, 155–173. DOI: 10.1016/0272-7358(91)90093-A
Sandler, J., & Davidson, R. S. (1971). Psychopathology: An analysis of response consequences. In H. D. Kimmel (Ed.), Experimental psychopathology: Recent research and theory (pp. 71–93). New York: Academic Press.
Saxena, S., Brody, A., Schwartz, J., & Baxter, L. (1998). Neuroimaging and frontal-subcortical circuitry in obsessive-compulsive disorder. British Journal of Psychiatry, 173, 26–37.
Schumann, F. (1905). Proceedings of the First Congress of Experimental Psychology, at Giessen, April, 1904. Psychological Bulletin, 2, 81–86.
Shenger-Krestovnikova, N. R. (1921). Contributions to the question of differentiation of visual stimuli and the limits of differentiation by the visual analyzer of the dog. Bulletin of the Lesgaft Institute of Petrograd, 3, 1–43.


Sher, K. J., & Trull, T. J. (1996). Methodological issues in psychopathology research. Annual Review of Psychology, 47, 371–400. DOI: 10.1146/annurev.psych.47.1.371
Skinner, B. F. (1953). Science and human behavior. New York: The Free Press.
Taylor, E. (1996). William James on consciousness beyond the margin. Princeton, NJ: Princeton University Press.
Wells, F. L. (1914). Experimental psychopathology. Psychological Bulletin, 11, 202–212. DOI: 10.1037/h0073486
Williams, J. M. G., Mathews, A., & MacLeod, C. (1997). The emotional Stroop task and psychopathology. Psychological Bulletin, 120, 3–24.
Wolpe, J. (1952). Experimental neuroses as learned behavior. British Journal of Psychology, 43, 243–268.
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford, CA: Stanford University Press.
Wolpe, J. (1989). The derailment of behavior therapy: A tale of conceptual misdirection. Journal of Behavior Therapy and Experimental Psychiatry, 20, 3–15. DOI: 10.1016/0005-7916(89)90003-7
Wolpe, J., Salter, A., & Reyna, L. J. (Eds.) (1964). The conditioning therapies. New York: Holt, Rinehart.
Yerofeeva, M. N. (1912). Electrical stimulation of the skin of the dog as a conditioned salivary stimulus. Unpublished thesis.
Yerofeeva, M. N. (1916). Contribution to the study of destructive conditioned reflexes. Comptes Rendus de la Societé Biologique, 79, 239–240.
Zubin, J., & Hunt, H. F. (1967). Comparative psychopathology, animal and human. New York: Grune and Stratton.
Zvolensky, M. J., & Eifert, G. H. (2000). A review of psychological factors/processes affecting anxious responding during voluntary hyperventilation and inhalations of carbon dioxide-enriched air. Clinical Psychology Review, 21, 375–400. DOI: 10.1016/S0272-7358(99)00053-7
Zvolensky, M. J., Eifert, G. H., & Lejuez, C. W. (2001). Emotional control during recurrent 20% carbon dioxide-enriched air induction: Relation to individual difference variables. Emotion, 2, 148–165. DOI: 10.1037//1528-3542.1.2.148
Zvolensky, M. J., Lejuez, C. W., Stuart, G. L., & Curtin, J. J. (2001). Experimental psychopathology in psychological science. Review of General Psychology, 5, 371–381. DOI: 10.1037/1089-2680.5.4.371



3. Single-Case Experimental Designs and Small Pilot Trial Designs

Kaitlin P. Gallo, Jonathan S. Comer, and David H. Barlow

This chapter covers single-case experimental designs and small pilot trial designs, beginning with
a review of the history of single-case experimental designs. Such designs can play key roles in
each stage of treatment development and evaluation. During the earliest stages of treatment
development and testing, single-case experimental designs clarify functional relationships between
treatment and symptoms. After a treatment has been formalized, a series of replicating single-case
experiments in conjunction with randomized clinical trials can contribute meaningful information to
efficacy evaluations. After a treatment has demonstrated robust efficacy in large-scale clinical trials,
single-case designs can speak to the generalizability and transportability of treatment efficacy by
demonstrating the successful application of established treatments when flexibly applied to individuals,
or in settings, that may vary in important and meaningful ways. Specific designs covered include A-B
designs, basic withdrawal designs (i.e., A-B-A trials), extensions of the traditional withdrawal design
(e.g., A-B-A-B designs, B-A-B-A designs, and A-B-C-B designs), multiple-baseline trials, and small
pilot trial designs, all of which assess treatment effects in a systematic manner with a relatively small
number of participants. We conclude with a call for increased utilization of single-case experimental
designs in clinical psychology treatment outcomes research.
Key Words: Single-case experimental designs, multiple-baseline designs, withdrawal designs,
treatment outcomes research, idiographic and nomothetic group design evaluations
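
As a rough illustration of the design notation listed above, the sketch below encodes each single-case design as an ordered sequence of phases and summarizes repeated measurements phase by phase. It assumes the conventional reading in which A denotes a baseline (no-treatment) phase and B a treatment phase. The meaning given to C, the names `DESIGNS` and `phase_means`, and the symptom counts are hypothetical and are not drawn from any study.

```python
from statistics import mean

# Phase sequences for the designs named in the abstract.
# Assumption: "A" = baseline, "B" = treatment, "C" = some other condition
# (for example a comparison procedure); the labels are illustrative only.
DESIGNS = {
    "A-B": ["A", "B"],
    "A-B-A": ["A", "B", "A"],
    "A-B-A-B": ["A", "B", "A", "B"],
    "B-A-B-A": ["B", "A", "B", "A"],
    "A-B-C-B": ["A", "B", "C", "B"],
}


def phase_means(phases, observations):
    """Pair each phase label with the mean of its repeated measurements.

    `phases` is a list of labels; `observations` is a list of lists holding
    the measurements collected during the corresponding phase.
    """
    if len(phases) != len(observations):
        raise ValueError("one list of observations is needed per phase")
    return [(label, mean(values)) for label, values in zip(phases, observations)]


# Hypothetical weekly symptom counts for one participant in an A-B-A trial.
made_up_counts = [[8, 9, 7], [4, 3, 2], [7, 8, 9]]
summary = phase_means(DESIGNS["A-B-A"], made_up_counts)
for label, value in summary:
    print(label, value)
```

Phase means are only one coarse summary; the individual repeated measurements within each phase still carry the information about trend and variability that an average can hide.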

Evidence-based practice in clinical psychology entails an explicit and judicious integration of best available research with clinical expertise, in the context of client characteristics, preferences, and values. Such an endeavor necessitates a compelling body of evidence from which to draw. Systematic, carefully designed treatment evaluations (the cornerstone of applied psychology) are central to this pursuit and allow data to meaningfully influence individual clinical practice, mental health care debates, and public policy. Large controlled group comparison designs as well as experimental designs utilizing only a few participants each contribute valuable evidence in this regard. In this chapter, we address the latter set of designs: those designs that can provide systematic and rich evidence of treatment effects with a relatively small number of participants.

The National Institute on Drug Abuse commissioned a report broadly outlining a sequence of three progressive stages of treatment development and evaluation (Barlow, Nock, & Hersen, 2009; Kazdin, 2001). In Stage 1, the first phase of treatment development and testing, novel interventions eventuate from a scholarly integration of theory, previous research, and consultation with relevant experts (the formal process of treatment development is covered elsewhere; Rounsaville, Carroll, & Onken, 2001). To provide preliminary evidence that the intervention is associated with meaningful change, pilot testing is conducted on a relatively

small number of participants who are representative of the population of clients for whom the treatment is designed. Stage 1 research activities afford opportunities to refine treatment procedures as necessary prior to large-scale treatment evaluation in response to early data, and to focus on key preliminary issues related to treatment feasibility, tolerability, and credibility and consumer satisfaction.

Once an intervention has been formalized and feasibility and preliminary efficacy have been established, Stage 2 research entails larger-scale evaluations, typically group comparisons in tightly controlled trials, to firmly establish treatment efficacy and to evaluate potential mediators and moderators of treatment response. Stage 3 consists of research efforts to evaluate the broad effectiveness and transportability of outcomes demonstrated in Stage 2 laboratory studies to less controlled practice settings.

From a methodology and design perspective, Stage 1 research is typically the purview of idiographic single-case experimental designs and small pilot randomized controlled trials (RCTs), whereas Stage 2 activities are addressed with adequately powered RCTs, nomothetic group comparisons, and formal tests of mediation and moderation (see Kendall, Comer, & Chow, this volume; see also MacKinnon, Lockhart, & Gelfand, this volume). Stage 3 research activities utilize a diversity of designs, including single-case designs, RCTs, sequential multiple assignment randomized trial (SMART) designs (Landsverk, Brown, Rolls Reutz, Palinkas, & Horwitz, 2011), practical clinical trials (March et al., 2005), qualitative methods, and clinical epidemiology to address the transportability of treatment effects and the uptake of supported practices in community settings (see Beidas et al., this volume).

In this chapter we cover single-case experimental designs, including A-B designs, basic withdrawal designs (i.e., A-B-A trials), extensions of the traditional withdrawal design (e.g., A-B-A-B designs, B-A-B-A designs, and A-B-C-B designs), and multiple-baseline trials. As Barlow, Nock, and Hersen (2009) note, single-case experimental designs can play key roles in each of the three outlined stages of treatment evaluation. Such designs are essential for initial Stage 1 evaluations, clarifying functional relationships between treatment and symptoms. After a treatment has been formalized, a series of replicating single-case experiments can contribute meaningfully to Stage 2 efficacy evaluations. After a treatment has demonstrated robust efficacy in large-scale RCTs, single-case designs contribute to Stage 3 efforts by demonstrating the successful application of established treatments when flexibly applied to individuals, or in settings, that may vary in important and meaningful ways (e.g., Suveg, Comer, Furr, & Kendall, 2006). Accordingly, in many ways comprehensive treatment evaluation begins and ends with the study of change in the individual. In this chapter we also cover small pilot trial designs, which formally set the stage for Stage 2 research. Collectively, the designs covered in this chapter all share the advantage of providing systematic and compelling evidence of treatment effects with a relatively small number of participants.

We begin with a brief historical overview of the role of single-case designs in clinical psychology, followed by consideration of some general procedures, and then examine the prototypical single-case experimental designs including multiple-baseline designs. We then examine key methodological and design issues related to the small pilot RCT, which can serve as a bridge from idiographic single-case designs to nomothetic group comparison research, and conclude with a renewed sense of urgency for research utilizing the experimental designs considered in this chapter.

Single-Case Designs: A Brief Historical Overview

Until relatively recently, the field of clinical psychology lacked an adequate methodology for studying individual behavior change. Hersen and Barlow (1976) outlined procedures for studying changes in individual behavior, with foundations in laboratory methods in experimental physiology and psychology. Prior to this emergence of systematic procedures, less robust procedures dominated the field of applied clinical research, including the popular but less scientific case study method (Bolger, 1965) that dominated clinical psychology research for the first half of the twentieth century. These case studies tended to be relatively uncontrolled and researchers often drew expansive conclusions from their data, with some exceptions (e.g., Watson & Rayner, 1920).

In the middle of the twentieth century, an increased focus on more rigorously applied research and statistical methods fueled a split between those investigators who remained loyal to uncontrolled case studies (which, despite frequent exaggerated conclusions of a treatment’s efficacy, often contained useful information about individual behaviors) versus investigators who favored research that compared differences between groups. By the

Gallo, Comer, Barlow 25

late 1940s, some clinical researchers started using between-subjects group designs with operationalized dependent variables (Barlow et al., 2009). Although these early efforts (e.g., Barron & Leary, 1955; Powers & Witmer, 1951) were crude by today’s standards and the most usual result was “no differences” between therapy and comparison group, the idea that therapeutic efficacy must be established scientifically slowly took hold. This notion was reinforced by Eysenck’s (1952) controversial conclusion (based on limited studies and actuarial tables) that untreated patients tended to improve as much as those assumed to be receiving psychotherapy.

Despite the increase in the popularity of between-group comparisons in the latter part of the twentieth century, several factors impeded its utility and impact for the first few decades of its use (Barlow et al., 2009). For example, some clinicians worried that withholding treatment for those study participants assigned to a comparison group might be unethical. Practically, researchers found it difficult to recruit sufficient populations of people with low-base rate disorders for their studies (an issue that has improved with the advent of the multisite clinical trial). Results were typically presented in an averaged or aggregated format, obscuring within-subject variability and decreasing the generalizability of the findings.

Clinical investigators have begun to debate the merits of idiographic and nomothetic approaches to treatment evaluation (Barlow & Nock, 2009). Evaluation of dependent variables comparing averaged data from large groups of people (nomothetic approach) is an essential method with which to establish treatment efficacy and effectiveness, and with which to inform broad public policy. However, the generalizability of data obtained by these approaches may be limited in some cases, as the true effects of the independent variable for individual subjects may be blurred among reported averages. Research designs that examine individuals on a more intensive level (idiographic approach) allow for a more specific understanding of the mechanisms of an intervention and its effects on different presentations as they pertain to the individual, although such methods confer less generalizability relative to nomothetic approaches. The importance of single-case designs is prominently featured in the establishment of consensus clinical guidelines and best practice treatment algorithms (e.g., American Psychological Association, 2002). Following years of debate, we believe that in the modern evidence-based practice landscape both methodological traditions utilizing systematic and tightly controlled designs should play prominent, complementary roles.

General Procedures
In single-case experimental research, a repeated measures design, in which data are collected systematically throughout the baseline and treatment phases, is essential in order to comprehensively evaluate treatment-related change. Although two-point, pre–post measurement strategies can examine the broad impact of an intervention, systematic repeated measurements across an intervention phase allow for a nuanced examination of how, why, and when changes happen (Barlow et al., 2009).

Measurements must be specific, observable, and replicable (Kazdin, 2001; Nock & Kurtz, 2005) and are ideally obtained under the same conditions for each observation, with the measurement device and all environmental conditions remaining constant. Specificity of observations refers to measurement precision and the extent to which the boundaries of a target behavior are made clear. For example, a target behavior that calls for a child to demonstrate “appropriate classroom behavior” is less specific than one that calls for a child to “remain seated and not talk out of turn for a period of 60 minutes.”

Repeated assessments are critical, but the researcher must carefully balance the need for sufficient information with the need to avoid subject fatigue when determining the frequency of measurements. The researcher must also carefully consider whether to rely only on self-report measures, which can be influenced by social desirability (i.e., the inclination for participants to behave in a way that they think will be perceived well by the experimenter) (Crowne & Marlowe, 1960) and/or demand characteristics (i.e., the change in behavior that can occur when a research participant formulates beliefs about the purpose of the research) (Orne, 1962), or whether to include structured behavioral observations as well. Our view is that clinical researchers should always make attempts to incorporate behavioral observations into single-case experimental designs. Finally, given the small sample size associated with single-case experimental designs, the researcher must take care when interpreting data, especially in the case of extreme variability, so that outliers do not unnecessarily skew results and conclusions (Barlow et al., 2009). This is particularly challenging in the case of nonlinear changes in target behaviors. Experimental phases should be long enough to differentially identify random outliers from systematic cyclic variations

26 single-case experimental designs and small pilot trial designs

in target outcomes. When nonlinear variations can present challenges to interpretation, the clinical researcher is wise to extend measurement procedures to evaluate whether a steady and stable pattern emerges.

Procedurally, most single-case experimental designs begin with a baseline period, in which target behaviors are observed repeatedly for a period of time. The baseline phase, often called the “A” phase, demonstrates the stability of the target behavior prior to the intervention so that the effects of the intervention can be evaluated against the naturalistic occurrence of the target behavior (Risley & Wolf, 1972). Baseline (phase A) observations also provide data that predict future levels of the target behavior. Demonstrating decreasing symptoms after the initiation of treatment may be less compelling if symptoms already were shown to be systematically declining across the baseline period.

Although a stable baseline pattern is preferable, with no variability or slope in the target behavior(s) (Kazdin, 1982, 2003), this may be difficult in applied clinical research (Sidman, 1960). Accordingly, visual inspection and statistical techniques can be utilized to compare phases to each other (Barlow et al., 2009). For example, interrupted time-series analyses (ITSA) allow the researcher to evaluate changes in the slope and level of symptom patterns induced by treatment by first calculating omnibus tests (F statistic) of slope and level changes, with follow-up post hoc t tests to examine which specific aspect was affected by treatment initiation (slope, level, or both). The double bootstrap method (McKnight, McKean, & Huitema, 2000) entails iterative statistical resampling methods to achieve less biased estimates that are particularly well suited for small n single-case experiments.

When moving between phases in single-case experimental research, it is crucial to change only one variable at a time (Barlow et al., 2009). Otherwise, it is impossible to determine which manipulation was responsible for changes in a target behavior. Data stability on a target behavior is widely regarded as a necessary criterion that must be achieved prior to progressing to the next phase (Barlow et al., 2009).

Single-Case Experimental Designs
Having provided a general overview of the major considerations and procedures involved in single-case design research, we now turn our attention to the prototypical single-case experimental designs. We begin with an overview of the major types of single-case experimental designs, with the goal of familiarizing the reader with the merits and limitations of each and providing brief illustrations from published research. Specifically, we consider A-B designs (a bridge between the case study and experimental design) and then move on to the basic withdrawal designs: A-B-A designs, A-B-A-B designs, B-A-B-A designs, and A-B-C-B designs. We follow with a consideration of multiple-baseline designs. Whereas withdrawal designs are marked by the removal of an intervention after behavior change is accomplished, multiple-baseline designs are marked by different lengths of an initial baseline phase, followed by phase changes across people, time, or behaviors.

A-B Designs
Whereas case studies afford opportunities to study infrequently occurring disorders, to illustrate clinical techniques, and to inspire larger systematic clinical trials, such efforts do not afford causal conclusions. Additionally, it is difficult to remove clinical bias from the reported results. Even with repeated assessment (e.g., Nock, Goldman, Wang, & Albano, 2004), internal validity cannot be guaranteed. A-B designs use repeated measurements and as such represent a transition between case studies and experiments (where the independent variable is manipulated), allowing the researcher to systematically examine a variable of interest during an intervention phase of treatment against its value during a baseline period.

In the traditional A-B design, a target behavior is identified and then measured repeatedly in the A (baseline) and B (intervention) phases (Hayes, Barlow, & Nelson-Gray, 1999). In the baseline phase, data about the natural (pre-intervention) occurrence of the target behavior are collected. The researcher then introduces the intervention, continues collecting repeated measures, and examines changes in the target behavior.

The A-B with follow-up design includes the same components as the A-B design, with the addition of a period of repeated measurements following the B intervention phase. This design provides more evidence of the stability of an intervention’s effects than the traditional A-B design; however, it is still possible that improvements seen in the follow-up phase are not the result of the intervention but of some other factor. In instances where multiple behaviors or multiple measures are of interest, an A-B design with multiple target measures and follow-up can be utilized. For example, a researcher might collect measures of both anxiety and depression across an
A-B design, or might collect measures of a single behavior (such as alcohol use) across multiple settings. A-B designs can also include a follow-up period and booster treatment if it becomes clinically indicated during the follow-up period for the B phase, or the intervention, to be briefly reinstated. This is similar to the A-B-A-B design, which we discuss later in this section.

Cooper, Todd, Turner, and Wells (2007) used an A-B design with multiple target measures and follow-up in their examination of cognitive-behavioral treatment for bulimia nervosa. Baseline measurements were relatively stable, with decreases in symptomatology (bingeing, vomiting, and negative beliefs about eating) beginning during the treatment (B) phase and maintained at 3- and 6-month follow-up points. Results were similar for the other two participants. Figure 3.1 shows an example of an A-B design.

The A-B design allows the researcher to avoid some of the drawbacks of the case study approach when examining only one individual. While certainly not the most rigorous of the single-case strategies, the A-B design can be a helpful “transitory strategy” in cases where true experimental methods, such as an RCT or a repetition of the A-B phases, are not possible (Campbell & Stanley, 1966; Cook & Campbell, 1979). The major strength of the A-B design is that when the target behavior demonstrates stability during the baseline period and the behavior changes upon intervention, one can infer that the change may have been a result of the intervention (Barlow et al., 2009). However, conclusions from this design are vulnerable to multiple possible threats to internal and external validity; thus, the transitory strategy of the A-B design should be used only when other more systematic methods are not possible (Campbell, 1969).

Despite its clinical utility and improvements over the traditional case study, several limitations hinder the methodological rigor of the A-B design. The biggest problem with this design is that observed changes in the B phase may not be caused by the intervention but instead by some other factor (Wolf & Risley, 1971). As such, Campbell and Stanley (1966) advocate the use of the term “quasi-experimental design” to convey that correlative factors may be just as likely to account for observed change as the intervention itself. In the bulimia treatment study by Cooper and colleagues (2007), although it is certainly possible that treatment caused the improvements seen, it is impossible to confirm this hypothesis given that the design does not control for the possibility that some other variable was responsible for improvements. Additionally, withdrawal designs are meaningful only to the extent that the intervention can be withdrawn (e.g., one can withdraw reinforcement or a drug, but not surgery or a cognitive problem-solving skill).

Figure 3.1 Example of an A-B design. (Phases: Baseline, Intervention; y-axis: frequency of target behavior; x-axis: 1 to 17.)

A-B-A Design
The A-B-A design offers a more rigorous research design than the A-B design. With the A-B-A strategy,
repeated measurements are collected during a baseline period of measurement (A), which is followed by the intervention (B), followed by the withdrawal of the intervention (A). The A-B-A design allows for firmer conclusions than does the A-B design because the effects of the intervention (B phase) can be compared against the effects of the removal of that intervention (in the second phase A, an effective return to baseline). Here, the researcher systematically controls both the introduction and the removal of an independent variable, in this case an intervention. Manipulating an independent variable is the hallmark of a true experiment. If the baseline phase is stable and improvements are then observed in the B phase, followed by a return to baseline levels in the second A phase, the experimenter can conclude that the changes likely occurred as a result of the intervention. The certainty with which one can make such a conclusion increases with each replication in different subjects.

Moore and colleagues (Moore, Gilles, McComas, & Symons, 2010) used an A-B-A withdrawal design to evaluate the effects of functional communication training (FCT) on nonsuicidal self-injurious behavior in a male toddler with a traumatic brain injury. FCT involves teaching effective communication strategies that are meant to replace self-injurious or other undesirable behaviors (Moore et al., 2010). The boy in this examination was taught to use a button to communicate with his mother in order to tell her that he would like her to come in the room. In the first (A) phase, when the boy pressed the button, his mother was to give him 10 seconds of positive attention, and when he hurt himself, he was to receive no attention. In the B phase, the opposite contingencies occurred: attention for self-injury but none for the newly learned form of communication. Following the B phase, the A phase was repeated. In this intervention, the toddler’s functional communication was markedly higher, and self-injurious behavior markedly lower, in both training (A) phases as compared to the phase when the contingency was removed (B phase). Given the clear shift from A to B and then from B to A, one can conclude with some certainty that the intervention was responsible for the improvements in this case. Figure 3.2 shows an example of an A-B-A design.

Figure 3.2 Example of an A-B-A design. (Phases: Baseline, Intervention, Baseline; y-axis: frequency of target behavior; x-axis: 1 to 9.)

Despite the advantages of the A-B-A design over traditional A-B designs, this design concludes on the nontreatment phase, perhaps limiting the full clinical benefit that the subject can receive from the treatment (Barlow & Hersen, 1973). In addition, the A-B-A design has a sequential confound: specifically, the ordering of treatment introduction may affect the strength of response during the final phase A (Bandura, 1969; Cook & Campbell, 1979; Kazdin, 2003). Additionally, researchers should keep in mind that many withdrawal designs within clinical psychology may be of limited utility, given that “unlearning” clinical skills during the withdrawal phase may be difficult or impossible.

A-B-A-B Design
The A-B-A-B design is often the preferred strategy of the single-case researcher, given its rigorous design and clinical utility. In an A-B-A-B design, repeated measurements are collected through each of four phases: the baseline, the first intervention phase, the second baseline, and then an additional intervention phase. In most published A-B-A-B studies, only one behavior is targeted. However, in some cases, other behaviors that are not a target of the intervention can be measured so that the experimenter can monitor side effects (Kazdin, 1973). Note that this design differs from the A-B design with booster treatment in that the second round of treatment is identical to the first, rather than a simple scaled-down version of the B phase.

The A-B-A-B design improves upon the A-B and A-B-A designs by affording increased opportunities to systematically evaluate the link between intervention and target behavior, providing more support for causal conclusions. Essentially, the design affords a built-in replication of findings observed in the initial two study phases. Additionally, from an ethical standpoint, some may prefer to end the experiment on the intervention phase rather than on the withdrawal phase, so that the subject can continue to experience the greatest possible treatment benefits, which may not occur if the individual ends his or her participation in a nonintervention phase.
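
The replication logic of withdrawal designs can be illustrated with a small numerical sketch. The Python example below is not drawn from any of the studies cited here: the data are invented, and the function names (`phase_means`, `shows_replicated_effect`) are hypothetical. It simply compares mean levels across consecutive phases and checks whether each move into treatment improves the target behavior and each withdrawal does not. It is offered as a rough aid to the reasoning, assuming a behavior for which lower counts are better, and is not a substitute for visual inspection or formal time-series analysis.

```python
from statistics import mean


def phase_means(phases):
    """Return (label, mean) pairs for an ordered list of (label, observations)."""
    return [(label, mean(values)) for label, values in phases]


def shows_replicated_effect(phases, lower_is_better=True):
    """Check that each change into B improves the behavior and each return to A does not.

    `phases` is an ordered list such as
    [("A", [...]), ("B", [...]), ("A", [...]), ("B", [...])].
    """
    means = phase_means(phases)
    for (_, prev_mean), (label, cur_mean) in zip(means, means[1:]):
        change = cur_mean - prev_mean
        improved = change < 0 if lower_is_better else change > 0
        # Entering treatment (B) should improve the behavior;
        # withdrawing treatment (A) should not improve it further.
        if label == "B" and not improved:
            return False
        if label == "A" and improved:
            return False
    return True


# Hypothetical daily counts of a problem behavior (illustrative data only).
abab = [
    ("A", [12, 11, 13, 12]),
    ("B", [7, 5, 4, 4]),
    ("A", [9, 11, 12, 11]),
    ("B", [5, 4, 3, 3]),
]

print(phase_means(abab))
print(shows_replicated_effect(abab))
```

Comparing only phase means ignores trend and variability within phases, which is why the chapter's emphasis on stable baselines and repeated measurement still applies when reading output of this kind.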


Importantly, the experimenter cannot control every situation during data collection within an A-B-A-B design. For example, in some clinical circumstances, the phase change may occur at the request (or at the whim) of the subject (e.g., Wallenstein & Nock, 2007). Such an occurrence considerably limits the strength of conclusions due to potential confounding variables that may have led to both changes in the target behavior and the decision to change phases. However, if the A-B-A-B design is followed to fruition, and when study phase changes are controlled entirely by the researcher and not extraneous factors, one can maintain some confidence in the treatment effects. An additional limitation is that if improvements occur during a baseline period, conclusions about the efficacy of the intervention are significantly limited. In such a case, it would behoove the clinician to replicate the examination, either with the same person or additional people who have the same presenting problem.

One of the main limitations of the A-B-A-B design and other withdrawal designs is the experimenter’s knowledge of all phase changes and results, which may bias when study phases are changed and how behaviors are evaluated (Barlow et al., 2009). For example, if an experimenter hypothesizes that an intervention will reduce depression, she may change to the intervention phase if the depression starts to remit during the withdrawal phase, or she may wish to keep the intervention phase for a longer period of time than planned if results are not immediately observed. Determining phase lengths in advance eliminates this potential for bias. However, considering clinical response when determining when to switch phases may be important for some research questions, such as when phase changes are to be made after a symptom improvement, or after data have stabilized. For such cases, we recommend that the researcher develop clear clinical response criteria for phase change prior to initiating the study, and strictly adhere to those criteria to determine phase shifts.

Hunter, Ram, and Ryback (2008) attempted to use an A-B-A-B design to examine the effects of satiation therapy (Marshall, 1979) to curb a 19-year-old man’s sexual interest in prepubescent boys, work that illustrates how factors outside of an experimenter’s control can in practice affect the intended design. The goal of satiation therapy is to prescribe prolonged masturbation to specific paraphilic fantasies, which is meant to cause boredom and/or extinction of deviant sexual arousal to those paraphilic cues. Phase A in this study consisted of baseline measurement collection, and phase B consisted of satiation therapy. The initial plan called for the schedule to include 14 days of baseline, then 14 days of treatment, followed by an additional 14 days of baseline, and 14 additional days of treatment, with three daily recordings of the dependent variables. In this particular study, 8 days of treatment nonadherence occurred at the start of what would have been the first treatment phase, so the treatment phase was restarted and that 8-day period was treated as its own separate phase.

Results revealed a reduction in sexual interest in boys and a concurrent shift in sexual interest in same-age male peers, with a shift in predominant sexual interest from boys to same-age peers that began at the start of the B phase (following the unscheduled phase of treatment nonadherence) and continued throughout the second baseline and final treatment phase. Whereas the dependent variable did not shift back to baseline levels during the second iteration of phase A, the timing of the commencement of the man’s improvements provides evidence for the treatment as a probable cause for the shift, considering it would be difficult for the patient to “unlearn” the skills provided in the treatment phase. Figure 3.3 shows an example of an A-B-A-B design.

Figure 3.3 Example of an A-B-A-B design. (Phases: Baseline, Intervention, Baseline, Intervention; y-axis: frequency of target behavior; x-axis: 1 to 12.)

B-A-B Design
In the B-A-B design, the treatment (B phase) is applied before a baseline examination (A phase) and the examination ends with a final phase of treatment (B) (Barlow et al., 2009) (see an example of a B-A-B design in Figure 3.4). Many prefer the B-A-B design because active treatment is administered both at the start and at the end of examination. Thus, the B-A-B design offers a clinically indicated experimental strategy for individuals for whom waiting for treatment in order to collect baseline data is contraindicated. Additionally, similar to the A-B-A-B design, the last phase is treatment, a sequence that may increase the likelihood that the patient will
continue to benefit from the treatment even after the examination ends.

Figure 3.4 Example of a B-A-B design. (Phases: Intervention, Baseline, Intervention; y-axis: frequency of target behavior; x-axis: 1 to 9.)

However, the B-A-B design has limited experimental utility because it is not possible to examine the treatment effects against the natural frequency of the target behavior without a baseline phase occurring before treatment implementation. Although the A phase is marked by the withdrawal of the intervention, no previous baseline has been established in a B-A-B design, prohibiting measurement of the target behavior unaffected by treatment (either during the treatment phase itself or as a remnant of the treatment phase just prior). Thus, the A-B-A-B design is preferable in most cases. An illustration of the problems of a B-A-B design is an early study by Truax and Carkhuff (1965) examining the effects of the Rogerian techniques of empathy and unconditional positive regard on three psychiatric patients’ responses in a 1-hour interview. Three 20-minute phases made up the hour-long interview: B, in which the therapist utilized high levels of empathy and unconditional positive regard; A, when these techniques were decreased; and B, when the therapeutic techniques were again increased. Coders who were blind to phase rated and confirmed the relative presence and absence of empathy and unconditional positive regard. The dependent variable of interest was the patient’s “intrapersonal exploration.” The researchers did identify slightly higher intrapersonal exploration in the B phases relative to the withdrawal (A) phase. However, this investigation does not present a compelling indication of positive intervention effects, as we have no indication of where levels of the target behavior were prior to intervention.

A-B-C-B Design
The A-B-C-B design attempts to control for placebo effects that may affect the dependent variable. To take one example, rather than implementing a return to baseline or withdrawal phase (A), following the initial intervention phase consisting of contingent reinforcement, the amount of reinforcement in the C phase remains the same as in the B phase, but is not contingent on the behavior of the subject (Barlow et al., 2009). For example, if a child received a sticker in phase B for every time he raised his hand to speak in class instead of talking out of turn, in phase C the provision of stickers for the child would not be contingent upon his raising his hand. The C phase thus serves a similar purpose as the placebo phase common in evaluations of pharmaceutical agents.

The principal strength of this design over the traditional A-B-A-B design is that the improvements seen in the B (intervention) phase can be more reliably assigned to the effects of the intervention itself rather than to the effects of participating in an experimental condition. In an A-B-C-B design, the baseline phase cannot be compared against either the B or C phase, as the baseline phase occurs only once and is not repeated for comparison in a later phase of the examination.

One study from the child literature utilized an A-B-C-B design to evaluate a peer-mediated intervention meant to improve social interactions in children with autism (Goldstein, Kaczmarek, Pennington, & Shafer, 1992). Five groups of three children (one with autism, two peers without autism) engaged in coded interactions in which the peers were taught facilitation methods. The A phase consisted of the baseline, with conversation as usual. In the B phase, the two peers were instructed to use the social facilitation strategies they were taught. The C phase consisted of continued use of facilitation strategies, but peers were instructed to use them with the other child instead of with the child with autism. In this phase, they were praised only when they used the newly learned strategies with the peer who did not have autism. The B (peer intervention) phase saw an increase in communicative acts, which returned back to levels similar to baseline in the C phase, and rose again to higher levels during the second B phase, for four of the five children. These outcomes provide evidence for the efficacy of the intervention taught for use during the B phases, above and beyond the effects of simply participating in a study.

Multiple-Baseline Designs
Withdrawal designs are particularly well suited for the evaluation of interventions that would be
less likely to retain effects once they are removed, as is the case in the evaluation of a therapeutic medication with a very short half-life. Some procedures are, however, irreversible (e.g., various surgeries, or the learning of a skill in psychotherapy). How can the clinical researcher evaluate the intervention when it is not possible to completely remove the intervention? In such situations, reversal and withdrawal designs are misguided because withdrawing the intervention may have little effect. When withdrawal or reversal is impossible or unethical, multiple-baseline designs offer a valuable alternative.

Multiple-baseline designs entail applying an intervention to different behaviors, settings, or subjects, while systematically varying the length of the baseline phase for each behavior, setting, or subject (Baer, Wolf, & Risley, 1968). Whereas multiple-baseline designs do not include a withdrawal of treatment, the efficacy of the treatment is demonstrated by reproducing the treatment effects in different behaviors, people, or settings at different times. Accordingly, the multiple-baseline design consists of an A and a B phase, but the A phase is differentially extended for each target behavior, subject, and/or setting. For example, one individual’s baseline might last 3 days, another’s baseline might last 5 days, and a third individual’s baseline might last 7 days. If the intervention is effective, the behavior will not change until the intervention is actually initiated. Thus, analysis in a multiple-baseline design occurs both within subjects, settings, or behaviors (does the behavior change after treatment begins relative to baseline?) and among subjects, settings, or behaviors (do other baselines remain stable while one changes?). A strong multiple-baseline strategy has the baseline phase continue until stability of data is observed so that any effects of the intervention can be adequately measured against the stable baseline. Once stability is achieved, the clinical researcher may begin applying the treatment. Another multiple-baseline strategy determines the lengths of baseline intervals a priori and then randomly assigns these interval lengths to subjects.

Multiple-baseline designs can take one of three forms: (1) multiple-baseline design across behaviors (example in Fig. 3.5); (2) multiple-baseline design across subjects (example in Fig. 3.6); or (3) multiple-baseline design across settings (Hersen, 1982; Miltenberger, 2001). The multiple-baseline design across behaviors examines the effects of an intervention on different behaviors within the same individual. When examining the intervention across behaviors, the clinical researcher applies the intervention in temporal sequence to independent behaviors. Support for an intervention is demonstrated when outcome behaviors improve across the study upon the initiation of treatment targeting those specific behaviors, and not before.

Figure 3.5 Example of multiple-baseline design across behaviors, where behaviors 1 and 2 were specifically targeted by the intervention and behavior 3 was not. (Three panels: frequency of target behavior #1, #2, and #3, each with Baseline and Intervention phases; x-axis: 1 to 13.)

As an example of a multiple-baseline design across behaviors, Lane-Brown and Tate (2010) evaluated a novel treatment for apathy that included positive reinforcement and motivational interviewing in a man with a traumatic brain injury. Specific behaviors targeted were bedroom organization, increasing exercise, and improving social conversations. The first two goals were treated while the last remained untreated. Lane-Brown and Tate found an increase in goal-directed activity involving organization and exercise after each of these behaviors was targeted by treatment, but no improvement on the untargeted social conversations, providing evidence that it was the treatment that led to observed changes.
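
The strategy of fixing baseline lengths a priori and randomly assigning them can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the subject labels, the function names (`assign_baseline_lengths`, `phase_on_day`), and the 10-day window are hypothetical, while the 3-, 5-, and 7-day baselines echo the example given earlier in this section. It shuffles the pre-specified lengths across subjects and prints the resulting staggered A/B timeline for each.

```python
import random


def assign_baseline_lengths(subjects, lengths, seed=None):
    """Randomly pair each subject with one pre-specified baseline length."""
    if len(subjects) != len(lengths):
        raise ValueError("need exactly one baseline length per subject")
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(lengths)
    rng.shuffle(shuffled)
    return dict(zip(subjects, shuffled))


def phase_on_day(baseline_length, day):
    """Return "A" while still in baseline (days 1..baseline_length), else "B"."""
    return "A" if day <= baseline_length else "B"


# Hypothetical subjects; baseline lengths (in days) are set before the study.
subjects = ["S1", "S2", "S3"]
schedule = assign_baseline_lengths(subjects, [3, 5, 7], seed=2013)

for subject in subjects:
    timeline = "".join(phase_on_day(schedule[subject], d) for d in range(1, 11))
    print(subject, schedule[subject], timeline)
```

Because each subject's intervention begins on a different day, an improvement that appears in each timeline only once its "B" days start is harder to attribute to an outside event that affected everyone at the same moment.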


[Figure 3.6 Example of multiple-baseline design across settings or subjects. Three panels (Setting #1 or Subject #1, Setting #2 or Subject #2, Setting #3 or Subject #3); y-axis: frequency of target behavior; x-axis: 1 to 9; phases: Baseline, Intervention.]

The multiple-baseline design across subjects (or across individuals) examines the effects of intervention on different people with similar presentations, with the duration of the baseline interval varying across subjects. For example, in a study of six individuals, two may undergo a 2-week baseline interval prior to treatment, two may undergo a 4-week baseline interval prior to treatment, and two may undergo a 6-week baseline interval prior to treatment. The effect of an intervention is demonstrated when a change in each person’s functioning is obtained after the initiation of treatment, and not before.

Choate, Pincus, Eyberg, and Barlow (2005) utilized the multiple-baseline design across subjects to examine an adaptation of Parent-Child Interaction Therapy (PCIT) to treat early separation anxiety disorder. Treatment of three different children was implemented after 1, 2, and 4 weeks of baseline monitoring of anxiety symptoms. PCIT consists of two stages: the child directed interaction (CDI) phase, during which parents are taught to follow the lead of the child, and the parent directed interaction (PDI) phase, during which parents learn to effectively direct and lead the child (Herschell, Calzada, Eyberg, & McNeil, 2002). Anxiety symptoms remained stable during the baseline period for all three subjects and began decreasing only after the initiation of treatment, particularly during the PDI portion of treatment, showing preliminary support for the use of adapted PCIT to treat preschoolers with separation anxiety disorder.

In a multiple-baseline design across settings, treatment is applied in sequence across new and different settings (such as at home, at school, and with peers) (Freeman, 2003). These designs demonstrate treatment efficacy when changes occur in each setting when, and only when, the intervention is implemented in that setting. Kay, Harchik, and Luiselli (2006) used such a design to evaluate a multicomponent behavior intervention using compensatory responses and positive reinforcement to reduce drooling in a 17-year-old boy with autism. The intervention was introduced after varying numbers of days in three settings (the classroom, the community, and cooking class), with decreased drooling occurring in each setting only after the intervention was introduced in that setting.

Kazdin and Kopel (1975) provide recommendations for how to be sure that the treatment is affecting the target variable. Specifically, the baselines should be as different as possible from each other in length, at least four baselines should be used, and/or treatment should be withdrawn and reapplied if necessary to demonstrate that the treatment causes the change in the target variable. The number of baselines that are needed has been deliberated in the literature, with a consensus that three or four baselines are necessary to be sure that observed changes are the result of the treatment (Barlow et al., 2009; Kazdin & Kopel, 1975; Wolf & Risley, 1971).

The strength of the multiple-baseline design comes largely from the ability to demonstrate the efficacy of an intervention by showing that the desired change occurs only when the intervention is applied to the behavior, subject, or setting specifically targeted (Barlow et al., 2009). One of the biggest advantages of the multiple-baseline design is that it allows for multiple behaviors to be examined at one time, which is more similar to naturalistic situations, and allows the behaviors to be measured in the context of each other (Barlow et al., 2009), for example in the case of comorbid conditions.
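
The staggered baselines in the six-person example above can be illustrated with a short sketch. It assumes that the baseline lengths are fixed in advance and then paired with subjects at random. The function names `assign_baselines` and `phase_labels`, the subject labels, and the 12-week study length are hypothetical choices made only for illustration and do not come from any study cited here.

```python
import random


def assign_baselines(subjects, baseline_weeks, seed=None):
    """Randomly pair each subject with one pre-specified baseline length."""
    if len(subjects) != len(baseline_weeks):
        raise ValueError("need exactly one baseline length per subject")
    rng = random.Random(seed)          # seeded so the allocation can be reproduced
    shuffled = list(baseline_weeks)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return dict(zip(subjects, shuffled))


def phase_labels(baseline_len, total_weeks):
    """Label each week as baseline ('A') or treatment ('B')."""
    if not 0 <= baseline_len <= total_weeks:
        raise ValueError("baseline length must fit within the study length")
    return ["A"] * baseline_len + ["B"] * (total_weeks - baseline_len)


if __name__ == "__main__":
    # Six subjects; two each receive a 2-, 4-, or 6-week baseline (from the text).
    subjects = [f"S{i}" for i in range(1, 7)]   # hypothetical subject labels
    lengths = [2, 2, 4, 4, 6, 6]
    schedule = assign_baselines(subjects, lengths, seed=1)
    for subject, weeks in sorted(schedule.items()):
        # 12 weeks of total observation is an assumed value for display only.
        print(subject, weeks, "".join(phase_labels(weeks, 12)))
```

Printing the phase strings side by side shows the design logic: treatment begins at different weeks for different subjects, so a change that appears in each series only after its own "B" phase starts is harder to attribute to an outside event.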

Gallo, Comer, Barlow 33

However, unlike withdrawal designs, multiple-baseline designs control only the introduction, but not the removal, of treatment. Thus, when appropriate, withdrawal designs are able to yield more compelling evidence for causal conclusions. Additionally, the multiple-baseline design’s strength decreases if fewer than three or four settings, behaviors, or individuals are measured. Finally, there are limitations to the multiple-baseline design involving generalization, but the possibility for generalization can be further evaluated utilizing “generalization tests” (see Kendall, 1981).

Moving from Idiographic to Nomothetic Group Design Evaluations

Whereas single-case experimental designs and multiple-baseline series inform our understanding of individual behavior change and play key roles in treatment development, nomothetic group experimental designs are essential for establishing treatment efficacy and effectiveness, and for meaningfully influencing health care policy and practice. Specifically, adequately powered RCTs that maximize both scientific rigor and clinical relevance constitute the field’s “gold standard” research design for establishing broad empirical support for a treatment (Chambless & Hollon, 1998; Kendall & Comer, 2011). Such work entails a well-defined independent variable (i.e., manualized treatment protocols), appropriate control condition(s), a comprehensive multimodal/multi-informant assessment strategy, treatment fidelity checks, statistical and clinical significance testing, evaluation of response across time, and an adequately powered sample of clinically representative participants to enable statistical judgments that are both reliable and valid (see Kendall, Comer, & Chow, this volume, for a full consideration of RCT methods and design). Needless to say, such undertakings are enormously time- and resource-intensive, and so entering into a large-scale RCT first requires careful preparation to minimize the risk of a failed study, unfortunate wasting of time and resources, and unwarranted burden on study participants. Prior to conducting a large adequately powered RCT, a small pilot RCT is warranted.

Appropriate Use and Design of the Small Pilot RCT

The empirical preparation for a large-scale RCT is the purview of the small pilot RCT. Many erroneously perceive the sole function of the small pilot RCT as providing preliminary information on the feasibility and acceptability of the experimental treatment, or providing a preliminary indication of the effectiveness of the experimental treatment. Too often researchers fail to appreciate the small pilot RCT’s more fundamental role in providing preliminary information on the feasibility and acceptability of the study design to be used in the subsequent large-scale treatment evaluation. The pilot RCT serves as a check on the research team’s ability to recruit, treat, and retain participants across randomization and key study points (e.g., as a check on the availability of eligible and willing participants using the proposed recruitment methods, to test the feasibility of assessment and treatment protocols, to evaluate whether the study protocol sufficiently retains target participants across randomization or whether participants systematically drop out when assigned to a less-preferred treatment condition, to evaluate whether participant compensation is sufficient to recruit participants to complete assessments long after treatment has been completed, etc.). The small pilot RCT thus provides researchers an opportunity to identify and correct potential “glitches” in the research design prior to the funding and initiation of an adequately powered large-scale RCT (Kraemer, Mintz, Noda, Tinklenberg, & Yesavage, 2006), and accordingly the pilot RCT should ideally implement an identical design to that foreseen for the subsequent large-scale RCT.

Failure to appreciate this fundamental role of the small pilot trial as a check on the study design can have dramatic effects on the design of a pilot trial, which can in turn have unfortunate consequences for a program of research. Consider the following cautionary example of a researcher who misguidedly perceives the sole function of pilot work as providing preliminary information on the feasibility and acceptability of the experimental treatment:

A researcher spends considerable efforts conceptualizing and developing an intervention for a target clinical population based on theory, empirical research, and extensive consultation with noted experts in the area. The researcher appreciates the need for an adequately powered RCT in the establishment of empirical support for his treatment, appreciates that such an endeavor will require considerable funding, and also appreciates that a grant review committee will require pilot data before it would consider recommending funding for a proposed large-scale RCT. And so the researcher secures small internal funding to pilot test his treatment, and calculates that with this money
he is able to treat and evaluate 16 participants across pretreatment, posttreatment, and 6-month follow-up.

Given the researcher’s limited funds for the pilot work, and given his misguided sole focus on establishing the feasibility and acceptability of his novel treatment with the pilot data, he decides to run all of the pilot subjects through the experimental treatment. “After all,” the researcher thinks to himself, “since a pilot study is underpowered to statistically test outcomes against a control condition, I might as well run as many subjects as I can through my new treatment so that I can have all the more data on treatment credibility and consumer satisfaction.” The researcher further decides that since pilot work is by design underpowered to enable statistical judgments about treatment efficacy, to save costs he will rely solely on self-reports rather than on lengthy structured diagnostic interviews, although he does plan to include diagnostic interviews in the subsequent large-scale RCT design. Finally, the researcher calculates that if he cuts the 6-month follow-up assessments from the pilot design, he can run four more subjects through the experimental treatment. At the end of the pilot trial, he has treated 20 subjects with the experimental treatment (with only three dropouts) and has collected consumer satisfaction forms providing preliminary indication of treatment feasibility and acceptability.

The researcher includes these encouraging pilot data in a well-written grant submission to fund an impressively designed large-scale RCT comparing his experimental treatment to a credible education/support/attention control condition with a 6-month follow-up, but is surprised when the scientific review committee recommends against funding his work due to “inadequate pilot testing.” The summary statements from the review note that the researcher does not provide any evidence that he can recruit and retain subjects across a randomization, or that his team can deliver the education/support/attention control condition proposed, or that subjects randomly assigned to this control condition will not drop out when they learn of their treatment assignment. The committee also questions whether his team is sufficiently trained to conduct the diagnostic interviews proposed in the study protocol, as these were not included in the pilot trial. Because there were no 6-month follow-up evaluations included in the pilot work, the committee expresses uncertainty about the researcher’s ability to compel participants to return for assessments so long after treatment has been completed, and wonders whether his proposed $10 compensation for participating in assessments is sufficient to maximize participation.

In the above example, the researcher’s failure to appreciate the role of pilot work in gathering preliminary information on the feasibility and acceptability of proposed study procedures (and not solely the feasibility and acceptability of the treatment itself) interfered with his ability to secure funding for an adequately powered and controlled evaluation of his treatment.

This researcher would have been better off using the pilot funding to conduct a small pilot RCT implementing a design identical to that foreseen for the subsequent large-scale RCT. Specifically, a pilot design that randomized 16 participants across the two treatments that he intended to include in the large-scale RCT and employed diagnostic interviews and 6-month follow-up assessments would have provided more compelling evidence that his proposed large-scale RCT was worth the sizable requested investment. Given that small pilot samples are not sufficiently powered to enable stable efficacy judgments (Kraemer et al., 2006), the additional 12 subjects he gained by excluding a control group and abandoning diagnostic and follow-up assessments, in truth, provided no incremental support for his treatment. Instead, the inadequate pilot design left the review committee with too many questions about his team’s ability to implement and retain subjects across a randomization procedure, the team’s ability to implement the education control treatment faithfully without inadvertently including elements of the experimental treatment, the team’s ability to adequately conduct diagnostic interviews, and the team’s ability to retain subjects across the proposed long-term follow-up. Researchers who receive this type of feedback from grant review committees are undoubtedly disappointed, but not nearly as disappointed as those researchers who have invested several years and considerable resources into a controlled trial only to realize midway through the study that key glitches in their study design that could have been easily averted are systematically interfering with the ability to truly evaluate treatment efficacy or to meaningfully interpret the data.

Caution Concerning the Misuse of Pilot Data for the Purposes of Power Calculations

It is critical to caution researchers against the common misuse of data drawn from small pilot studies for the purposes of power calculations in the design of a subsequent large-scale RCT. As well articulated elsewhere (Cohen, 1988; Kraemer &
Thiemann, 1987; see the chapter by Kraemer in this volume), power refers to the probability of correctly rejecting a null hypothesis (e.g., the effect of an experimental treatment is comparable to the effect of a control treatment) when the null hypothesis is indeed untrue. Designing an adequately powered RCT entails recruiting a sample large enough to yield reliably different treatment response scores across conditions if true group response differences do indeed exist. Conventional calculations call for the researcher to determine the needed sample size via calculations that consider an expected effect size (in RCT data, typically the magnitude of difference in treatment response across groups) in the context of an acceptably low α level (i.e., the probability of rejecting the null hypothesis if it is indeed true; consensus typically stipulates α ≤ .05) and an acceptably high level of power (consensus typically stipulates power ≥ .80, which sets the probability of correctly rejecting the null hypothesis when there is a true effect at four in five tests).

Although conventions stipulate acceptable α and power levels to incorporate into sample size calculations, broad conventions do not stipulate an expected effect size magnitude to include because this will vary widely across diverse clinical populations and across varied treatments. Whereas an exposure-based treatment for specific phobias may expectedly yield a relatively large effect size, a bibliotherapy treatment for borderline personality disorder may expectedly yield a very small effect size. To estimate an expected effect size for the design of an adequately powered study, the researcher must rely on theory regarding the specific clinical population and the treatment being evaluated, as well as the magnitude of effects found in related studies. Indeed, expert guidelines argue that rationale and justification for a proposed hypothesis-testing study should be drawn “from previous research” (Wilkinson & Task Force on Statistical Inference, 1999).

Commonly, researchers will accordingly use data from their pilot RCT to estimate an expected effect size for a proposed large-scale RCT. For example, if a small pilot RCT (n = 15) identified a large treatment effect (e.g., d = 0.8), a researcher might use this effect size to guide power calculations for determining the necessary sample size for a proposed large-scale RCT. But as Kraemer and colleagues (2006) mathematically demonstrate, this misguided practice can lead to the design of underpowered studies positioned to retain the null hypothesis when in fact true treatment differences exist, or to a failure to pursue large-scale research that would identify meaningful treatment effects. Because a limited sample size can yield large variability in effects, effect sizes drawn from underpowered studies (such as small pilot studies) result in effect size estimates that are unstable. In the above example, although a large treatment effect was found in the pilot trial, the true treatment effect may in fact be moderate but meaningful (e.g., d = 0.5). As a larger sample size is required to reliably detect a moderate effect versus a large effect, a study designed to simply capture a large effect is at increased risk to retain the null hypothesis when in fact there are true treatment differences (i.e., a power analysis based on a predicted large effect would estimate the need for a smaller sample than would one based on a predicted moderate effect). In this scenario, after a thorough time- and resource-intensive RCT, the researcher would erroneously conclude that his treatment does not “work.” Accordingly, the researcher is better justified to rely on related work in the literature using adequately powered samples to evaluate the effect of similar treatment methods for neighboring clinical conditions than to rely on underpowered pilot work, even though the pilot work examined the very treatment for the very condition under question.

Discussion

The past 25 years have witnessed tremendous progress in the advancement of evidence-based practice. Many contemporary treatment guidelines (e.g., Chambless & Hollon, 1998; Silverman & Hinshaw, 2008) appropriately privilege the outcomes of large randomized group comparison trials over other research methodologies in the identification of empirically supported treatments. Large RCTs are undoubtedly the most rigorous and experimentally controlled methodology we have with which to inform broad public policy decisions and mental health care debates. However, key limitations in the generality of obtained results highlight the additional need for data drawn from complementary methods that add to the rigorous evidence yielded by the RCT. Indeed, consensus guidelines for evidence-based practice explicitly call for supporting evidence drawn from a broad portfolio of research methods and strategies, each with its own advantages and limitations.

The multiple strengths of single-case experimental designs and small pilot trials should place these designs firmly in the comprehensive portfolio of informative designs for evaluating evidence-based practices in mental health care. Regrettably, the
prominence of and appreciation for such designs has waned over the past several decades, as evidenced by their declining representation in leading clinical psychology journals and by the limited availability of funding granted to work utilizing these designs. It may be that for many, deliberation on the merits of single-case designs mistakenly lumps these designs with unrelated methods that also rely on small n samples but do not incorporate experimental manipulation of treatment initiation and discontinuation or systematic observation of target behaviors (e.g., case histories, case series, retrospective chart reviews). Importantly, low statistical power and low experimental control are distinct methodological constructs.

In many ways, comprehensive treatment evaluation must begin and end with the study of change in the individual. The large-scale RCT cannot be conducted in a vacuum. Systematic research activities focusing on individual treatment-related changes are needed in preparation for large clinical trials, and systematic research activities evaluating the successful application of RCT-supported treatments to individuals that may vary in important and meaningful ways are needed before attempting large-scale implementation of supported treatments in practice settings. Too often, researchers rush through the treatment development stage in order to focus their efforts on randomized controlled outcomes. Often the result is a very rigorous evaluation of a treatment package that could have been substantially improved had the developers invested the time in single-case design research activities to evaluate the “how,” “why,” and “when” of treatment-related change. In addition, too often time- and resource-intensive RCTs fail to recruit and retain participants across study procedures when a relatively inexpensive pilot trial could have corrected simple unforeseeable glitches in the research design. And too often treatments supported in tightly controlled RCTs are expected to be applied in broad mental health care settings without first evaluating the supported treatment’s effects in individual participants across practice settings. In such cases, the result can be a misguided attempt to shoehorn treatment strategies shown to be effective in controlled laboratory settings into practice settings that may differ in important and meaningful ways, an effort that can inadvertently increase the already regrettable wedge between research and practice communities.

We hope this chapter has provided a renewed sense of urgency for the role of single-case designs, including multiple-baseline and withdrawal designs, and small pilot trial designs in the evaluation of evidence-based practice for mental health conditions. Alongside RCTs, small experimental designs play a crucial role in developing interventions and testing their clinical utility among groups of people and among individuals. Moreover, small experimental designs are easily adaptable in clinical settings and laboratories alike, affording clinicians, many of whom have limited resources to conduct large-scale research studies, a greater opportunity to contribute to the growing literature on evidence-based psychological treatments. Increased utilization of single-case, multiple-baseline, and small pilot trial designs would significantly enhance our understanding of the effects of mental health treatments and would more efficiently elucidate what treatments work best for whom. Appropriately designed small experimental designs are an important and necessary component of the evaluation of any psychological treatment, and increasing their frequency in the research literature will significantly enhance the understanding about the benefits and weaknesses of evidence-based psychological treatments.

References

American Psychological Association. (2002). Criteria for practice guideline development and evaluation. American Psychologist, 57(12), 1048–1051. doi: 10.1037/0003-066x.57.12.1048
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1(1), 91–97. doi: 10.1901/jaba.1968.1-91
Bandura, A. (1969). Principles of behavior modification. New York: Holt, Rinehart and Winston.
Barlow, D. H., & Hersen, M. (1973). Single-case experimental designs: Uses in applied clinical research. Archives of General Psychiatry, 29(3), 319–325.
Barlow, D. H., & Nock, M. K. (2009). Why can’t we be more idiographic in our research? Perspectives on Psychological Science, 4(1), 19–21. doi: 10.1111/j.1745-6924.2009.01088.x
Barlow, D. H., Nock, M., & Hersen, M. (2009). Single case experimental designs: strategies for studying behavior change (3rd ed.). Boston: Pearson/Allyn and Bacon.
Barron, F., & Leary, T. F. (1955). Changes in psychoneurotic patients with and without psychotherapy. Journal of Consulting Psychology, 19(4), 239–245. doi: 10.1037/h0044784
Bolger, H. (1965). The case study method. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 28–39). New York: McGraw-Hill.
Campbell, D. T. (1969). Reforms as experiments. American Psychologist, 24(4), 409–429. doi: 10.1037/h0027982
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: R. McNally.
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66(1), 7–18. doi: 10.1037/0022-006x.66.1.7
Choate, M. L., Pincus, D. B., Eyberg, S. M., & Barlow, D. H. (2005). Parent-Child Interaction Therapy for treatment of separation anxiety disorder in young children: A pilot study. Cognitive and Behavioral Practice, 12(1), 126–135. doi: 10.1016/s1077-7229(05)80047-1
Cohen, J. (1988). Set correlation and contingency tables. Applied Psychological Measurement, 12(4), 425–434. doi: 10.1177/014662168801200410
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: design & analysis issues for field settings. Boston: Houghton Mifflin.
Cooper, M., Todd, G., Turner, H., & Wells, A. (2007). Cognitive therapy for bulimia nervosa: An A-B replication series. Clinical Psychology & Psychotherapy, 14(5), 402–411. doi: 10.1002/cpp.548
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24(4), 349–354. doi: 10.1037/h0047358
Eysenck, H. J. (1952). The effects of psychotherapy: an evaluation. Journal of Consulting Psychology, 16(5), 319–324. doi: 10.1037/h0063633
Freeman, K. A. (2003). Single subject designs. In J. C. Thomas & M. Hersen (Eds.), Understanding research in clinical and counseling psychology (pp. 181–208). Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Goldstein, H., Kaczmarek, L., Pennington, R., & Shafer, K. (1992). Peer-mediated intervention: Attending to, commenting on, and acknowledging the behavior of preschoolers with autism. Journal of Applied Behavior Analysis, 25(2), 289–305. doi: 10.1901/jaba.1992.25-289
Hayes, S. C., Barlow, D. H., & Nelson-Gray, R. O. (1999). The scientist practitioner: research and accountability in the age of managed care. Needham, MA: Allyn and Bacon.
Herschell, A. D., Calzada, E. J., Eyberg, S. M., & McNeil, C. B. (2002). Clinical issues in parent-child interaction therapy. Cognitive and Behavioral Practice, 9(1), 16–27. doi: 10.1016/s1077-7229(02)80035-9
Hersen, M. (1982). Single-case experimental designs. In A. S. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International handbook of behavior modification and therapy (pp. 167–203). New York: Plenum Press.
Hersen, M., & Barlow, D. H. (1976). Single case experimental designs: strategies for studying behavior change (1st ed., Vol. 56). New York: Pergamon Press.
Hunter, J. A., Ram, N., & Ryback, R. (2008). Use of satiation therapy in the treatment of adolescent-manifest sexual interest in male children: A single-case, repeated measures design. Clinical Case Studies, 7(1), 54–74. doi: 10.1177/1534650107304773
Kay, S., Harchik, A. F., & Luiselli, J. K. (2006). Elimination of drooling by an adolescent student with autism attending public high school. Journal of Positive Behavior Interventions, 8(1), 24–28. doi: 10.1177/10983007060080010401
Kazdin, A. E. (1973). Methodological and assessment considerations in evaluating reinforcement programs in applied settings. Journal of Applied Behavior Analysis, 6(3), 517–531. doi: 10.1901/jaba.1973.6-517
Kazdin, A. E. (1982). Single-case research designs: methods for clinical and applied settings. New York: Oxford University Press.
Kazdin, A. E. (2001). Behavior modification in applied settings (6th ed.). Belmont, CA: Wadsworth/Thomson Learning.
Kazdin, A. E. (2003). Research design in clinical psychology (4th ed.). Boston, MA: Allyn and Bacon.
Kazdin, A. E., & Kopel, S. A. (1975). On resolving ambiguities of the multiple-baseline design: Problems and recommendations. Behavior Therapy, 6(5), 601–608. doi: 10.1016/s0005-7894(75)80181-x
Kendall, P. C. (1981). Assessing generalization and the single-subject strategies. Behavior Modification, 5(3), 307–319. doi: 10.1177/014544558153001
Kendall, P. C., & Comer, J. S. (2011). Research methods in clinical psychology. In D. H. Barlow (Ed.), The Oxford handbook of clinical psychology (pp. 52–75). New York: Oxford University Press.
Kraemer, H. C., Mintz, J., Noda, A., Tinklenberg, J., & Yesavage, J. A. (2006). Caution regarding the use of pilot studies to guide power calculations for study proposals. Archives of General Psychiatry, 63(5), 484–489. doi: 10.1001/archpsyc.63.5.484
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Statistical power analysis in research. Thousand Oaks, CA: Sage Publications, Inc.
Landsverk, J., Brown, C., Rolls Reutz, J., Palinkas, L., & Horwitz, S. (2011). Design elements in implementation research: A structured review of child welfare and child mental health studies. Administration and Policy in Mental Health and Mental Health Services Research, 38(1), 54–63. doi: 10.1007/s10488-010-0315-y
Lane-Brown, A. P., & Tate, R. P. (2010). Evaluation of an intervention for apathy after traumatic brain injury: A multiple-baseline, single-case experimental design. Journal of Head Trauma Rehabilitation, 25(6), 459–469.
March, J. S., Silva, S. G., Compton, S., Shapiro, M., Califf, R., & Krishnan, R. (2005). The case for practical clinical trials in psychiatry. American Journal of Psychiatry, 162(5), 836–846. doi: 10.1176/appi.ajp.162.5.836
Marshall, W. L. (1979). Satiation therapy: A procedure for reducing deviant sexual arousal. Journal of Applied Behavior Analysis, 12(3), 377–389. doi: 10.1901/jaba.1979.12-377
McKnight, S. D., McKean, J. W., & Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87–101. doi: 10.1037/1082-989x.5.1.87
Miltenberger, R. G. (2001). Behavior modification: Principles and procedures (2nd ed.). Belmont, CA: Wadsworth/Thomson Learning.
Moore, T. R., Gilles, E., McComas, J. J., & Symons, F. J. (2010). Functional analysis and treatment of self-injurious behaviour in a young child with traumatic brain injury. Brain Injury, 24(12), 1511–1518. doi: 10.3109/02699052.2010.523043
Nock, M. K., Goldman, J. L., Wang, Y., & Albano, A. M. (2004). From science to practice: The flexible use of evidence-based treatments in clinical settings. Journal of the American Academy of Child & Adolescent Psychiatry, 43(6), 777–780. doi: 10.1097/01.chi.0000120023.14101.58
Nock, M. K., & Kurtz, S. M. S. (2005). Direct behavioral observation in school settings: Bringing science to practice. Cognitive and Behavioral Practice, 12(3), 359–370. doi: 10.1016/s1077-7229(05)80058-6
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17(11), 776–783. doi: 10.1037/h0043424
Powers, E., & Witmer, H. (1951). An experiment in the prevention of delinquency; the Cambridge-Somerville Youth Study. New York: Columbia University Press.
Risley, T. R., & Wolf, M. M. (1972). Strategies for analyzing behavioral change over time. In J. Nesselroade & H. Reese (Eds.), Life-span developmental psychology. Methodological issues (pp. 175–183). New York: Academic Press.
Rounsaville, B. J., Carroll, K. M., & Onken, L. S. (2001). A stage model of behavioral therapies research: Getting started and moving on from stage I. Clinical Psychology: Science and Practice, 8(2), 133–142. doi: 10.1093/clipsy/8.2.133
Sidman, M. (1960). Tactics of scientific research. Oxford, England: Basic Books.
Silverman, W. K., & Hinshaw, S. P. (2008). The second special issue on evidence-based psychosocial treatments for children and adolescents: A 10-year update. Journal of Clinical Child and Adolescent Psychology, 37(1), 1–7. doi: 10.1080/15374410701817725
Suveg, C., Comer, J. S., Furr, J. M., & Kendall, P. C. (2006). Adapting manualized CBT for a cognitively delayed child with multiple anxiety disorders. Clinical Case Studies, 5(6), 488–510. doi: 10.1177/1534650106290371
Truax, C. B., & Carkhuff, R. R. (1965). Experimental manipulation of therapeutic conditions. Journal of Consulting Psychology, 29(2), 119–124. doi: 10.1037/h0021927
Wallenstein, M. B., & Nock, M. K. (2007). Physical exercise as a treatment for non-suicidal self-injury: Evidence from a single case study. American Journal of Psychiatry, 164(2), 350–351. doi: 10.1176/appi.ajp.164.2.350-a
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3(1), 1–14. doi: 10.1037/h0069608
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. doi: 10.1037/0003-066x.54.8.594
Wolf, M. M., & Risley, T. R. (1971). Reinforcement: Applied research. In R. Glaser (Ed.), The nature of reinforcement (pp. 310–325). New York: Academic Press.


4 The Randomized Controlled Trial: Basics and Beyond

Philip C. Kendall, Jonathan S. Comer, and Candice Chow

This chapter describes methodological and design considerations central to the scientific evaluation
of clinical treatment methods via randomized clinical trials (RCTs). Matters of design, procedure,
measurement, data analysis, and reporting are each considered in turn. Specifically, the authors
examine different types of controlled comparisons, random assignment, the evaluation of treatment
response across time, participant selection, study setting, properly defining and checking the integrity
of the independent variable (i.e., treatment condition), dealing with participant attrition and missing
data, evaluating clinical significance and mechanisms of change, and consolidated standards for
communicating study findings to the scientific community. After addressing considerations related to
the design and implementation of the traditional RCT, the authors turn their attention to important
extensions and variations of the RCT. These treatment study designs include equivalency designs,
sequenced treatment designs, prescriptive designs, adaptive designs, and preferential treatment
designs. Examples from the recent clinical psychology literature are provided, and guidelines are
suggested for conducting treatment evaluations that maximize both scientific rigor and clinical relevance.

Key Words: Randomized clinical trial, RCT, normative comparisons, random assignment, treatment
integrity, equivalency designs, sequenced treatment designs

The randomized controlled trial (RCT), a group comparison design in which participants are randomly assigned to treatment conditions, constitutes the most rigorous and objective methodological design for evaluating therapeutic outcomes. In this chapter we focus on RCT research strategies that maximize both scientific rigor and clinical relevance (for consideration of single-case, multiple-baseline, and small pilot trial designs, see Chapter 3 in this volume). We organize the present chapter around (a) RCT design considerations, (b) RCT procedural considerations, (c) RCT measurement considerations, (d) RCT data analysis, and (e) RCT reporting. We then turn our attention to extensions and variations of the traditional RCT, which offer various adjustments for clinical generalizability, while at the same time sacrificing important elements of internal validity. Although all of the methodological and design ideals presented may not always be achieved within a single RCT, our discussions provide exemplars of the RCT.

Design Considerations

To adequately assess the causal impact of a therapeutic intervention, clinical researchers must use control procedures derived from experimental science. In the RCT, the intervention applied constitutes the experimental manipulation, and thus to have confidence that an intervention is responsible for observed changes, extraneous factors must be experimentally “controlled.” The objective is to distinguish intervention effects from any changes that
result from other factors, such as the passage of time, patient expectancies of change, therapist attention, repeated assessments, and simple regression to the mean. To maximize internal validity, the clinical researcher must carefully select control/comparison condition(s), randomly assign participants across treatment conditions, and systematically evaluate treatment response across time. We now consider each of these RCT research strategies in turn.

Selecting Control Condition(s)

Comparisons of participants randomly assigned to different treatment conditions are essential to control for factors other than treatment. In a “controlled” treatment evaluation, comparable participants are randomly placed into either the experimental condition, composed of those who receive the intervention, or a control condition, composed of those who do not receive the intervention. The efficacy of treatment over and above the outcome produced by extraneous factors (e.g., the passage of time) can be determined by comparing prospective changes shown by participants across conditions.

Importantly, not all control conditions are “created equal.” Deciding which form of control condition to select for a particular study (e.g., no treatment, waitlist, attention-placebo, standard treatment as usual) requires careful deliberation (see Table 4.1 for recent examples from the literature). In a no-treatment control condition, comparison participants are evaluated in repeated assessments, separated by an interval of time equal in duration to the treatment provided to those in the experimental treatment condition. Any changes seen in the treated participants are compared to changes seen in the nontreated participants. When, relative to nontreated participants, the treated participants show significantly greater improvements, the experimental treatment may be credited with producing the observed changes. Several important rival hypotheses are eliminated in a no-treatment design, including effects due to the passage of time, maturation,

Table 4.1 Types of Control Conditions in Treatment Outcome Research

Control condition: No-treatment control
Definition: Control participants are administered assessments on repeated occasions, separated by an interval of time equal to the length of treatment.
Recent example in literature (description): Adults with anxiety symptoms were randomly assigned to a standard self-help condition, an augmented self-help condition, or a control condition in which they did not receive any
Recent example in literature (reference): Varley et al. (2011)

Control condition: Waitlist control
Definition: Control participants are assessed before and after a designated duration of time but receive the treatment following the waiting period. They may anticipate change due to
Recent example in literature (description): Adolescents with anxiety disorders were randomly assigned to Internet-delivered CBT, face-to-face CBT, or to a waitlist control group.
Recent example in literature (reference): Spence et al. (2011)

Control condition: Attention-placebo/nonspecific control
Definition: Control participants receive a treatment that involves nonspecific factors (e.g., attention, contact with a therapist).
Recent example in literature (description): School-age children with anxiety symptoms were randomly assigned to either a cognitive-behavioral group intervention or an attention control in which students were read to in small
Recent example in literature (reference): Miller et al. (2011)

Control condition: Standard treatment/routine care control
Definition: Control participants receive an intervention that is the current practice for treatment of the problem under study.
Recent example in literature (description): Depressed veterans were randomly assigned to either telephone-administered cognitive-behavioral therapy or standard care through community-based outpatient clinics.
Recent example in literature (reference): Mohr et al. (2011)
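
As a rough illustration of the controlled comparison that the no-treatment row of Table 4.1 describes, the sketch below compares pre-to-post change in a treated group with change in a nontreated group. It is only a minimal sketch under stated assumptions: the symptom scores are invented for illustration, the function names are hypothetical, and Welch's t statistic is one assumed choice of test rather than a procedure prescribed in this chapter.

```python
from math import sqrt
from statistics import mean, variance


def change_scores(pre, post):
    """Return post-minus-pre change per participant (negative = symptom reduction)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def welch_t(x, y):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical symptom scores, invented purely for illustration.
treated_pre = [30, 28, 35, 32, 29, 31]
treated_post = [18, 20, 22, 19, 21, 17]
control_pre = [31, 29, 34, 30, 28, 33]
control_post = [29, 27, 33, 30, 26, 31]

treated_change = change_scores(treated_pre, treated_post)
control_change = change_scores(control_pre, control_post)

# Difference in mean change between conditions, and the associated test statistic.
diff = mean(treated_change) - mean(control_change)
t = welch_t(treated_change, control_change)
print(f"difference in mean change: {diff:.2f}, Welch t: {t:.2f}")
```

A negative difference here means that scores in the treated group dropped more than scores in the control group. An actual trial would also report a p value and an effect size, and would check baseline comparability before interpreting the difference.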

spontaneous remission, and regression to the mean. Importantly, however, other potentially important confounding factors not specific to the experimental treatment (such as patient expectancies to get better, or meeting with a caring and attentive clinician) are not ruled out in a no-treatment control design. Accordingly, no-treatment control conditions may be useful in earlier stages of treatment development, but to establish broad empirical support for an intervention, more informative control procedures are preferred.

A more revealing variant of the no-treatment condition is the waitlist condition. Here, participants in the waitlist condition expect that after a certain period of time they will receive treatment, and accordingly may anticipate upcoming changes (which may in turn affect their symptoms). Changes are evaluated at uniform intervals across the waitlist and experimental conditions, and if we assume the participants in the waitlist and treatment conditions are comparable (e.g., comparable baseline symptom severity and gender, age, and ethnicity distributions), we can then infer that changes in the treated participants relative to waitlist participants are likely due to the intervention rather than to expectations of impending change. However, as with no-treatment conditions, waitlist conditions are of limited value for evaluating treatments that have already been examined relative to “inactive” conditions.

No-treatment and waitlist conditions in study designs introduce important ethical considerations, particularly with vulnerable populations (see Kendall & Suveg, 2008). For ethical purposes, the functioning of waitlist participants must be carefully monitored to ensure that they are safely able to tolerate the treatment delay. If a waitlist participant experiences a clinical emergency requiring immediate professional attention during the waitlist interval, the provision of emergency professional services undoubtedly compromises the integrity of the waitlist condition. In addition, to maximize internal validity, the duration of the control condition should be equal to the duration of the experimental treatment condition to ensure that differences in response across conditions cannot be attributed simply to differential passages of time. Now suppose a 24-session treatment takes 6 months to provide: is it ethical to withhold treatment for such a long wait period (see Bersoff & Bersoff, 1999)? The ethical response to this question varies across clinical conditions. It may be ethical to incorporate a waitlist design when evaluating an experimental treatment for obesity, but a waitlist design may be unethical when evaluating an experimental treatment for suicidal participants. Moreover, with increasing waitlist durations, the problem of differential attrition arises, which compromises study interpretation. If attrition rates are higher in a waitlist condition, the sample in the control condition may be different from the sample in the treatment condition, and no longer representative of the larger group. For appropriate interpretation of study results, it is important to recognize that the smaller waitlist group at the end of the study now represents only patients who could tolerate and withstand a prolonged waitlist period.

An alternative to waitlist control condition is the attention-placebo control condition (or nonspecific treatment condition), which accounts for key effects that might be due simply to regularly meeting with and getting the attention of a warm and knowledgeable therapist. For example, in a recent RCT, Kendall and colleagues (2008) randomly assigned children with anxiety disorders to receive one of two forms of cognitive-behavioral treatment (CBT; either individual or family CBT) or to a manual-based family education, support, and attention (FESA) condition. Individual and family-based CBT showed superiority over FESA in reducing children’s principal anxiety diagnosis. Given the attentive and supportive nature of FESA, it could be inferred that gains associated with CBT were not likely attributable to “common therapy factors” such as learning about emotions, receiving support from an understanding therapist, and having opportunities to discuss the child’s difficulties.

Developing and implementing a successful attention-placebo control condition requires careful deliberation. Attention placebos must credibly instill positive expectations in participants and provide comparable professional contact, while at the same time they must be devoid of specific therapeutic techniques hypothesized to be effective. For ethical purposes, participants must be fully informed of and willing to take a chance on receiving a psychosocial placebo condition. Even then, a credible attention-placebo condition may be difficult for therapists to accomplish, particularly if they do not believe that the treatment will offer any benefit to the participant. Methodologically, it is difficult to ensure that study therapists share comparable positive expectancies for their attention-placebo participants as they do for their participants who are receiving more active treatment (O’Leary & Borkovec, 1978). “Demand characteristics” suggest that when study therapists predict a favorable treatment response, participants will tend to improve accordingly (Kazdin, 2003),
which in turn affects the interpretability of study findings. Similarly, whereas participants in an attention-placebo condition may have high baseline expectations, they may grow disenchanted when no meaningful changes are emerging. The clinical researcher is wise to assess participant expectations for change across conditions so that if an experimental treatment outperforms an attention-placebo control condition, the impact of differential participant expectations across conditions can be evaluated.

Inclusion of an attention-placebo control condition, when carefully designed, offers advantages from an internal validity standpoint. Treatment components across conditions are carefully specified and the clinical researcher maintains tight control over the differential experiences of participants across conditions. At the same time, such designs typically compare an experimental treatment to a treatment condition that has been developed for the purposes of the study and that does not exist in actual clinical practice. The use of a standard treatment comparison condition (or treatment-as-usual condition) affords evaluation of an experimental treatment relative to the intervention that is currently available and being applied. Including standard treatment as the control condition offers advantages over attention-placebo, waitlist, and no-treatment controls. Ethical concerns about no-treatment conditions are quelled, and, as all participants receive care, attrition is likely to be minimized, and nonspecific factors are likely to be equated (Kazdin, 2003). When the experimental treatment and the standard care intervention share comparable durations and participant and therapist expectancies, the researcher can evaluate the relative efficacy of the interventions.

In a recent example, Mufson and colleagues (2004) randomly assigned depressed adolescents to interpersonal psychotherapy (IPT-A) or to “treatment as usual” in school-based mental health clinics. Adolescents treated with IPT-A relative to treatment as usual showed greater symptom reduction and improved overall functioning. Given this design, the researchers were able to infer that IPT-A outperformed the existing standard of care for depressed adolescents in the school settings. Importantly, in standard treatment comparisons, it is critical that both the experimental treatment and the standard (routine) treatment are implemented in a high-quality fashion (Kendall & Hollon, 1983).

Random Assignment

To achieve baseline comparability between study conditions, random assignment is essential. Random assignment in the context of an RCT ensures that every participant has an equal chance of being assigned to the active treatment condition or to the control condition(s). Random assignment, however, does not guarantee comparability across conditions; simply as a result of chance, one resultant group may be different on some variables (e.g., household income, occupational impairment, comorbidity). Appropriate statistical tests can be used to evaluate the comparability of participants across treatment conditions.

Problems arise when random assignment is not incorporated into a group-comparison design of treatment response. Consider a situation in which participants do not have an equal chance of being assigned to the experimental and control conditions. For example, suppose a researcher were to allow depressed participants to elect for themselves whether to participate in the active treatment or in a waitlist treatment condition. If participants in the active treatment condition subsequently showed greater symptom reductions than waitlist participants, the researcher cannot rule out the possibility that posttreatment symptom differences could have resulted from prestudy differences between the participants (e.g., selection bias). Participants who choose not to receive treatment immediately may not be ready to work on their depression and may be meaningfully different from those depressed participants who are immediately ready to work on their symptoms.

Although random assignment does not ensure participant comparability across conditions on all measures, randomization procedures do rigorously maximize the likelihood of comparability. An alternative procedure, randomized blocks assignment, or assignment by stratified blocks, involves matching (arranging) prospective participants in subgroups that contain participants that are highly comparable on key dimensions (e.g., socioeconomic status indicators) and contain the same number of participants as the number of conditions. For example, if the study requires three conditions (a standard treatment, an experimental treatment, and a waitlist condition), participants can be arranged in matching groups of three so that each trio is highly comparable on preselected features. Members in each trio are then randomly assigned to one of the three conditions, in turn increasing the probability that each condition will contain comparable participants while at the same time retaining a critical randomization procedure.

Evaluating Treatment Response Across Time

In the RCT, it is essential to evaluate participant functioning on the dependent variables
(e.g., presenting symptoms) prior to treatment initiation. Such pretreatment (or “baseline”) assessments provide critical data to evaluate between-groups comparability at treatment outset, as well as within-groups treatment response. Posttreatment assessments of participants are essential to examine the comparative efficacy of treatment versus control conditions. Importantly, evidence of acute treatment efficacy (i.e., improvement immediately upon therapy completion) may not be indicative of long-term success (maintenance). At posttreatment, treatment effects may be appreciable but fail to exhibit maintenance at a follow-up assessment. Accordingly, we recommend that treatment outcome studies systematically include a follow-up assessment. Follow-up assessments (e.g., 6 months, 9 months, 18 months) are essential to demonstrations of treatment efficacy and are a signpost of methodological rigor. Maintenance is demonstrated when a treatment produces results at the follow-up assessment that are comparable to those found at posttreatment (i.e., improvements from pretreatment and an absence of detrimental change from posttreatment to follow-up).

Follow-up evaluations can help to identify differential treatment effects of considerable clinical utility. For example, two treatments may produce comparable effects at the end of treatment, but one may be more effective in the prevention of relapse (see Anderson & Lambert, 2001, for demonstration of survival analysis in clinical psychology). When two treatments show comparable response at posttreatment, yet one is associated with a higher relapse rate over time, follow-up evaluations provide critical data to support selection of one treatment over the other. For example, Brown and colleagues (1997) compared CBT and relaxation training for depression in alcoholism. When considering the average (mean) days abstinent and drinks per day as dependent variables, measured at pretreatment and at 3 and 6 months posttreatment, the authors found that although both treatments produced comparable acute gains, CBT was superior to relaxation training in maintaining the gains.

Follow-up evaluations can also be used to detect continued improvement: the benefits of some interventions may accumulate over time and possibly expand to other domains of functioning. Policymakers and researchers are increasingly interested in expanding intervention research to consider potential indirect effects on the prevention of secondary problems. In a long-term (7.4 years) follow-up of individuals treated with CBT for childhood anxiety (Kendall, Safford, Flannery-Schroeder & Webb, 2004), it was found that positive responders relative to less-positive responders had fewer problems with substance use at the long-term follow-up (see also Kendall & Kessler, 2002). In another example, participants in the Treatment for Adolescents with Depression Study (TADS) were followed for 5 years after study entry (Curry et al., 2011). TADS evaluated the relative effectiveness of fluoxetine, CBT, and their combination in the treatment of adolescents with major depressive disorder (see Treatment for Adolescents with Depression Study [TADS] Team, 2004). The Survey of Outcomes Following Treatment for Adolescent Depression (SOFTAD) was an open, 3.5-year follow-up period extending beyond the TADS 1-year follow-up period. Initial acute outcomes (measured directly after treatment) found combination treatment to be associated with significantly greater outcomes relative to fluoxetine or CBT alone, and CBT showed no incremental response over pill placebo immediately following treatment (TADS Team, 2004). However, by the 5-year follow-up, 96 percent of participants, regardless of treatment condition, experienced remission of their major depressive episode, and 88 percent recovered by 2 years (Curry et al., 2011). Importantly, gains identified at long-term follow-ups are fully attributable to the initial treatment only if one determines that participants did not seek or receive additional treatments during the follow-up interval. As an example, during the TADS 5-year follow-up interim, 42 percent of participants received psychotherapy and 44 percent received antidepressant medication (Curry et al., 2011). Appropriate statistical tests are needed to account for differences across conditions when services have been rendered during the long-term follow-up interval.

Multiple Treatment Comparisons

To evaluate the comparative (or relative) efficacy of therapeutic interventions, researchers use between-groups designs with more than one active treatment condition. Such between-groups designs offer direct comparisons of one treatment with one or more alternative active treatments. Importantly, whereas larger effect sizes can be reasonably expected in evaluations comparing an active treatment to an inactive condition, smaller differences are to be expected when distinguishing among multiple active treatments. Accordingly, sample size considerations are influenced by whether the comparison is between a treatment and a control condition or
one treatment versus another known-to-be-effective treatment (see Kazdin & Bass, 1989; see Chapter 12 in this volume for a full consideration of statistical power). Research aiming to identify reliable differences in response between two active treatments will need to evaluate a larger sample of participants than research comparing an active condition to an inactive treatment.

In a recent example utilizing multiple active treatment comparisons, Walkup and colleagues (2008) examined the efficacy of CBT, sertraline, and their combination in a placebo-controlled trial with children diagnosed with separation anxiety disorder, generalized anxiety disorder, and/or social phobia. Participants were assigned to CBT, sertraline, their combination, or a placebo pill for 12 weeks. Patients receiving the three active treatments all fared significantly better than those in the placebo group, with the combination of sertraline and CBT yielding the most favorable treatment outcomes. Specifically, analyses revealed a significant clinical response in roughly 81 percent of youth treated with a combination of CBT and sertraline, 60 percent of youth treated with CBT alone, 55 percent of youth treated with sertraline alone, and 24 percent receiving placebo alone.

As in the above-mentioned designs, it is wise to check the participant comparability across conditions on important variables (e.g., baseline functioning, prior therapy experience, socioeconomic indicators, treatment preferences/expectancies) before continuing with statistical evaluation of the intervention effects. Multiple treatment comparisons are optimal when each participant is randomly assigned to receive one and only one treatment. As previously noted, a randomized block procedure, with participants blocked on preselected variable(s) (e.g., baseline severity), can be used. Comparability across therapists who are administering the different treatments is also essential. Therapists conducting each type of treatment should be equivalent in training, experience, intervention expertise, treatment allegiance, and expectation that the intervention will be effective. To control for therapist variables, one method has each study therapist conduct each type of intervention in the study. This method is optimized when cases are randomly assigned to therapists who are equally expert and favorably disposed toward each treatment. For example, an intervention test would have reduced validity if a group of psychodynamic therapists were asked to conduct both a CBT (in which their expertise is low) and a psychodynamic therapy (in which their expertise is high). Stratified blocking offers a viable option to ensure that all treatments are conducted by comparable therapists. It is wise to gather data on therapist variables (e.g., expertise, experience, allegiance) and examine their relationships to participant outcomes.

For proper evaluation, intervention procedures across treatments must be equated for key variables such as (a) duration; (b) length, intensity, and frequency of contacts with participants; (c) credibility of treatment rationale; (d) treatment setting; and (e) degree of involvement of persons significant to the participant. These factors may be the basis for two alternative therapies (e.g., conjoint vs. individual marital therapy). In such cases, the nonequated feature constitutes an experimentally manipulated variable rather than a factor to control.

What is the best method of measuring change when two alternative treatments are being compared? Importantly, measures should cover the range of symptoms and functioning targeted for change, tap costs and potential negative side effects, and be unbiased with respect to the alternate interventions. Assessments should not be differentially sensitive to one treatment over another. Treatment comparisons will be misleading if measures are not equally sensitive to the types of changes that most likely result from each intervention type.

Special issues are presented in comparisons of psychological and psychopharmacological treatments (e.g., Beidel et al., 2007; Dobson et al., 2008; Marcus et al., 2007; MTA Cooperative Group, 1999; Pediatric OCD Treatment Study Team, 2004; Walkup et al., 2008). For example, when and how should placebo medications be used in comparison to or with psychological treatment? How should expectancy effects be addressed? How should differential attrition be handled statistically and/or conceptually? How should inherent differences in professional contact across psychological and pharmacological interventions be addressed? Follow-up evaluations become particularly important after acute treatment phases are discontinued. Psychological treatment effects may persist after treatment, whereas the effects of medications may not persist upon medication discontinuation. (Interested readers are referred to Hollon, 1996; Hollon & DeRubeis, 1981; Jacobson & Hollon, 1996a, 1996b, for thorough consideration of these issues.)

Procedural Considerations

We now address key RCT procedural considerations, including (a) sample selection, (b) study
setting, (c) defining the independent variable, and (d) checking the integrity of the independent variable.

Sample Selection

Selecting a sample to best represent the clinical population of interest requires careful deliberation. A selected sample refers to a sample of participants who may require treatment but who may otherwise only approximate clinically disordered persons. By contrast, RCTs optimize external validity when treatments are applied and evaluated with actual treatment-seeking patients. Consider a study investigating the effects of a treatment on social anxiety disorder. The researcher could use (a) a sample of patients diagnosed with social anxiety disorder via structured diagnostic interviews (genuine clinical sample), (b) a sample consisting of a group of individuals who self-report shyness (analogue sample), or (c) a sample of socially anxious persons after excluding cases with depressed mood and/or substance use (highly select sample). This last sample may meet full diagnostic criteria for social anxiety disorder but are nevertheless highly selected.

From a feasibility standpoint, clinical researchers may find it easier to recruit analogue samples relative to genuine clinical samples, and such samples may afford a greater ability to control various conditions and minimize threats to internal validity. At the same time, analogue and select samples compromise external validity: these individuals are not necessarily comparable to patients seen in typical clinical practice (and may not qualify as an RCT). With respect to social anxiety disorder, for instance, one could question whether social anxiety disorder in genuine clinical populations compares meaningfully to self-reported shyness (see Heiser, Turner, Beidel, & Roberson-Nay, 2009). When deciding whether to use clinical, analogue, or select samples, the researcher needs to consider how the study results will be interpreted and generalized. Regrettably, nationally representative data show that standard exclusion criteria set for clinical treatment studies exclude up to 75 percent of affected individuals in the general population who have major depression (Blanco, Olfson, Goodwin, et al., 2008).

Researchers must consider patient diversity when deciding which samples to study. Research supporting the efficacy of psychological treatments has historically been conducted with predominantly European-American samples, although this is rapidly changing (see Huey & Polo, 2008). Although racially and ethnically diverse samples may be similar in many ways to single-ethnicity samples, one can question the extent to which efficacy findings from predominantly European-American samples can be generalized to ethnic-minority samples (Bernal, Bonilla, & Bellido, 1995; Bernal & Scharron-Del-Rio, 2001; Hall, 2001; Olfson, Cherry, & Lewis-Fernandez, 2009; Sue, 1998). Investigations have also addressed the potential for bias in diagnoses and in the provision of mental health services to ethnic-minority patients (e.g., Flaherty & Meaer, 1980; Homma-True, Green, Lopez, & Trimble, 1993; Lopez, 1989; Snowden, 2003).

A simple rule is that the research sample should reflect the broad population to which the study results are to be generalized. To generalize to a single-ethnicity group, one must study a single-ethnicity sample. To generalize to a diverse population, one must study a diverse sample, as most RCTs strive to accomplish. Barriers to care must be reduced and outreach efforts employed to inform minorities of available services (see Sweeney, Robins, Ruberu, & Jones, 2005; Yeh, McCabe, Hough, Dupuis, & Hazen, 2003) and include them in the research. Walders and Drotar (2000) provide guidelines for recruiting and working with ethnically diverse samples.

After the fact, appropriate statistical analyses can examine potential differential outcomes (see Arnold et al., 2003; Treadwell, Flannery-Schroeder, & Kendall, 1994). Although grouping and analyzing research participants by racial or ethnic status is a common analytic approach, this approach is simplistic because it fails to address variations in each patient’s degree of ethnic identity. It is often the degree to which an individual identifies with an ethnocultural group or community, and not simply his or her ethnicity itself, that may moderate response to treatment. For further consideration of this important issue, the reader is referred to Chapter 21 in this volume.

Study Setting

Some have questioned whether outcomes found at select research centers can transport to clinical practice settings, and thus the question of whether an intervention can be transported to other service settings requires independent evaluation (Southam-Gerow, Ringeisen, & Sherrill, 2006). It is not sufficient to demonstrate treatment efficacy within a narrowly defined sample in a highly selective setting. One should study, rather than assume, that a treatment found to be efficacious within a research
46 the randomized controlled trial

clinical setting will be efficacious in a clinical service setting (see Hoagwood, 2002; Silverman, Kurtines, & Hoagwood, 2004; Southam-Gerow et al., 2006; Weisz, Donenberg, Han, & Weiss, 1995; Weisz, Weiss, & Donenberg, 1992). Closing the gap between RCTs and clinical practice requires transporting effective treatments (getting “what works” into practice) and conducting additional research into those factors that may be involved in successful transportation (e.g., patient, therapist, researcher, service delivery setting; see Kendall & Southam-Gerow, 1995; Silverman et al., 2004). Methodological issues relevant to the conduct of research evaluating the transportability of treatments to “real-world” settings can be found in Chapter 5 in this volume.

Defining the Independent Variable

Proper treatment evaluation requires that the treatment be adequately described and detailed in order to replicate the evaluation in another setting, or to be able to show and teach others how to conduct the treatment. Treatment manuals achieve the required description and detail of the treatment. Treatment manuals enhance internal validity and treatment integrity and allow for comparison of treatments across formats and contexts, while at the same time reducing potential confounds (e.g., differences in the amount of clinical contact, type and amount of training). Therapist manuals facilitate training and contribute meaningfully to replication (Dobson & Hamilton, 2002; Dobson & Shaw, 1988).

The merits of manual-based treatments are not universally agreed upon. Debate has ensued regarding the appropriate use of manual-based treatments versus a more variable approach typically found in clinical practice (see Addis, Cardemil, Duncan, & Miller, 2006; Addis & Krasnow, 2000; Westen, Novotny, & Thompson-Brenner, 2004). Some have argued that treatment manuals limit therapist creativity and place restrictions on the individualization that the clinicians use (see also Waltz, Addis, Koerner, & Jacobson, 1993; Wilson, 1995). Indeed, some therapy manuals may appear “cookbook-ish,” and some lack attention to the clinical sensitivities needed for implementation and individualization, but our experience and data suggest that this is not the norm. An empirical evaluation from our laboratory found that the use of a manual-based treatment for child anxiety disorders (Kendall & Hedtke, 2006) did not restrict therapist flexibility (Kendall & Chu, 1999). Although it is not the goal of manual-based treatments to have clinicians perform treatment in a rigid manner, this misperception has restricted some clinicians’ openness to manual-based interventions (Addis & Krasnow, 2000).

Effective use of manual-based treatments must be preceded by adequate training (Barlow, 1989). Clinical professionals cannot become proficient in the administration of therapy simply by reading a manual. Interactive training, flexible application, and ongoing clinical supervision are essential to ensure proper conduct of manual-based therapy: The goal has been referred to as “flexibility within fidelity” (Kendall & Beidas, 2007).

Several modern treatment manuals allow the therapist to attend to each patient’s specific circumstances, clinical needs, concerns, and comorbid diagnoses without deviating from the core treatment strategies detailed in the manual. The goal is to include provisions for standardized implementation of therapy while using a personalized case formulation (e.g., see Suveg, Comer, Furr, & Kendall, 2006). Importantly, use of manual-based treatments does not eliminate the potential for differential therapist effects. Researchers examine therapist variables within the context of manual-based treatments (e.g., therapeutic relationship-building behaviors, flexibility, warmth) that may relate to treatment outcome (Creed & Kendall, 2005; Karver et al., 2008; Shirk et al., 2008; see also Chapter 9 in this volume for a full consideration of designing, conducting, and evaluating therapy process research).

Checking the Integrity of the Independent Variable

Careful checking of the manipulated variable is required in any rigorous experimental research. In the RCT, the manipulated variable is typically treatment or a key characteristic of treatment. By experimental design, not all participants are treated the same. However, just because the study has been so designed does not guarantee that the independent variable (treatment) has been implemented as intended. In the course of a study (whether due to insufficient therapist training, therapist variables, lack of manual specification, inadequate therapist monitoring, participant demand characteristics, or simple error variance), the treatment that was assigned may not in fact be the treatment that was provided (see also Perepletchikova & Kazdin, 2005).

To help ensure that the treatments are indeed implemented as intended, it is wise to require that a treatment plan be followed, that therapists are

kendall, comer, chow 47

carefully trained, and that sufficient supervision is available throughout. The researcher is wise to conduct an independent check on the manipulation. For example, treatment sessions are recorded so that an independent rater can listen to and/or watch the recordings and provide quantifiable judgments regarding key characteristics of the treatment. Such a manipulation check provides the necessary assurance that the described treatment was indeed provided as intended. Digital audio and video recordings are inexpensive, can be used for subsequent training, and can be analyzed to answer key research questions. Therapy session recordings evaluated in RCTs not only provide a check on the treatment within each separate study but also allow for a check on the comparability of treatments provided across studies. That is, the therapy provided as CBT in one researcher’s RCT could be checked to assess its comparability to other teams’ CBT.

A recently completed clinical trial from our research program comparing two active-treatment conditions for childhood anxiety disorders against an active attention control condition (Kendall et al., 2008) illustrates a procedural plan for integrity checks. First, we developed a checklist of the strategies and content called for in each session by the respective treatment manuals. A panel of expert clinicians served as independent raters who used the checklists to rate randomly selected video segments from randomly selected cases. The panel of raters was trained on nonstudy cases until they reached an interrater reliability of Cohen’s κ ≥ .85. After ensuring reliability, the panel used the checklists to assess whether the appropriate content was covered for randomly selected segments that were representative of all sessions, conditions, and therapists. For each coded session, we computed an integrity ratio corresponding to the number of checklist items covered by the therapist divided by the total number of items that should have been included. Integrity check results indicated that across the conditions, 85 to 92 percent of intended content was in fact covered.

It is also wise for the RCT researcher to evaluate the quality of treatment provided. A therapist may strictly adhere to a treatment manual and yet fail to administer the treatment in an otherwise competent manner, or he or she may administer therapy while significantly deviating from the manual. In both cases, the operational definition of the independent variable (i.e., the treatment manual) has been violated, treatment integrity impaired, and replication rendered impossible (Dobson & Shaw, 1988). When a treatment fails to demonstrate expected gains, one can examine the adequacy with which the treatment was implemented (see Hollon, Garber, & Shelton, 2005). It is also of interest to investigate potential variations in treatment outcome that may be associated with differences in the quality of the treatment provided (Garfield, 1998; Kendall & Hollon, 1983). Expert judges are needed to make determinations of differential quality prior to the examination of differential outcomes for high- versus low-quality therapy implementation (see Waltz et al., 1993). McLeod and colleagues (in press) provide a description of procedural issues in the conduct of quality assurance and treatment integrity checks.

Measurement Considerations

Assessing the Dependent Variable(s)

No single measure can serve as the sole indicator of participants’ treatment-related gains. Rather, a variety of methods, measures, data sources, and sampling domains (e.g., symptoms, distress, functional impairment, quality of life) are used to assess outcomes. A rigorous treatment RCT will consider using assessments of participant self-report; participant test/task performance; therapist judgments and ratings; archival or documentary records (e.g., health care visits and costs, work and school records); observations by trained, unbiased, blinded observers; ratings by significant people in the participant’s life; and independent judgments by professionals. Outcomes are more compelling when observed by independent (blind) evaluators than when based solely on the therapist’s opinion or the participant’s self-reports.

Collecting data on variables of interest from multiple reporters (e.g., treatment participant, family members, peers) can be particularly important when assessing children and adolescents. Such a multi-informant strategy is critical as features of cognitive development may compromise youth self-reports, and children may simply offer what they believe to be the desired responses. And so in RCTs with youth, collecting additional data from important adults in children’s lives who observe them across different settings (e.g., parents, teachers) is essential. Importantly, however, because emotions and mood are partially internal phenomena, some symptoms may be less known to parents and teachers, and some observable symptoms may occur in situations outside the home or school. Accordingly, an inherent dilemma with a multi-informant assessment strategy is that discrepancies among informants are to be expected (Comer & Kendall, 2004). Research shows low to moderate concordance rates
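The integrity ratio and the interrater reliability criterion described in the integrity-check example earlier in this section can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited trial: the function names, the two-rater simplification (the trial used a panel of raters), and all ratings below are hypothetical.

```python
from collections import Counter


def integrity_ratio(items_covered, items_required):
    """Checklist items covered divided by items that should have been covered."""
    if items_required <= 0:
        raise ValueError("items_required must be positive")
    return items_covered / items_required


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same segments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of segments on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten segments (1 = content covered, 0 = not covered).
rater_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1]

kappa = cohens_kappa(rater_a, rater_b)
ratio = integrity_ratio(items_covered=11, items_required=12)

print(f"kappa = {kappa:.2f}; meets .85 criterion: {kappa >= 0.85}")
print(f"integrity ratio = {ratio:.0%}")
```

With these made-up ratings the raters agree on nine of ten segments, yet kappa falls short of .85 because chance agreement is high when most segments are coded as covered. Under the training rule described above, such raters would continue training on nonstudy cases.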


among informants in the assessment of youth (De Los Reyes & Kazdin, 2005), with particularly low agreement on child internalizing symptoms (Comer & Kendall, 2004).

A multimodal strategy relies on multiple inquiries to evaluate an underlying construct of interest. For example, assessing family functioning may include family members completing self-report forms on their perceptions of relationships in the family, as well as conducting structured behavioral observations of family members interacting to be coded by independent raters. Statistical packages can integrate data obtained from multimodal assessment strategies (see Chapter 16 in this volume). The increasing availability of handheld communication devices and personal digital assistants allows researchers to incorporate experience sampling methodology (ESM), in which people report on their emotions and behavior in real-world situations (in situ). ESM data provide naturalistic information on patterns in day-to-day functioning (see Chapter 11 in this volume).

In a well-designed RCT, multiple targets are assessed to evaluate treatment. For example, one can measure the presence of a diagnosis, overall well-being, interpersonal skills, self-reported mood, family functioning, occupational impairment, and health-related quality of life. No one target captures all, and using multiple targets facilitates an examination of therapeutic changes when changes occur, and the absence of change when interventions are less beneficial. However, inherent in a multiple-domain assessment strategy is the fact that it is rare that a treatment produces uniform effects across assessed domains. Suppose a treatment, relative to a control condition, improves participants’ severity of anxiety, but not their overall quality of life. In an RCT designed to evaluate improved anxiety symptoms and quality of life, should the treatment be deemed efficacious if only one of two measures showed gains? The Range of Possible Changes model (De Los Reyes & Kazdin, 2006) calls for a multidimensional conceptualization of intervention change. In this spirit, we recommend that RCT researchers be explicit about the domains of functioning expected to change and the relative magnitude of such expected changes. We also caution consumers of the treatment outcome literature against simplistic dichotomous appraisals of treatments as efficacious or not.

Data Analysis

Data analysis is an active process through which we extract useful information from the data we have collected in ways that allow us to make statistical inferences about the larger population that a given sample was selected to represent. Data do not “speak” for themselves. Although a comprehensive statistical discussion about RCT data analysis is beyond the present scope (the reader is referred to Jaccard & Guilamo-Ramos, 2002a, 2002b; Kraemer & Kupfer, 2006; Kraemer, Wilson, Fairburn, & Agras, 2002; and Chapters 14 and 16 in this volume), in this section we discuss three areas that merit consideration in the context of RCT data analysis: (a) addressing missing data and attrition, (b) assessing clinical significance, and (c) evaluating mechanisms of change (i.e., mediators and moderators).

Addressing Missing Data and Attrition

Not every participant assigned to treatment actually completes participation in an RCT. A loss of research participants (attrition) may occur just after randomization, during treatment, prior to posttreatment evaluation, or during the follow-up interval. Increasingly, researchers are evaluating predictors and correlates of attrition to elucidate the nature of treatment dropout, to understand treatment tolerability, and to enhance the sustainability of mental health services in the community (Kendall & Sugarman, 1997; Reis & Brown, 2006; Vanable, Carey, Carey, & Maisto, 2002). However, from a research methods standpoint, attrition can be problematic for data analysis, such as when there are large numbers of noncompleters or when attrition varies across conditions (Leon et al., 2006; Molenberghs et al., 2004).

Regardless of how diligently researchers work to prevent attrition, data will likely be lost. Although attrition rates vary across RCTs and treated clinical populations, Mason (1999) estimated that most researchers can expect roughly 20 percent of their sample to withdraw or be removed from a study prior to completion. To address this matter, researchers can conduct and report two sets of analyses: (a) analyses of outcomes for treatment completers and (b) analyses of outcomes for all participants who were included at the time of randomization (i.e., the intent-to-treat sample). Treatment-completer analyses involve the evaluation of only those who actually completed treatment and examine what the effects of treatment are when someone completes a full treatment course. Treatment refusers, treatment dropouts, and participants who fail to adhere to treatment schedules are not included in such analyses. Reports of such treatment outcomes may be somewhat elevated because they represent


the results for only those who adhered to and completed the treatment. A more conservative approach to addressing missing data, intent-to-treat analysis, entails the evaluation of outcomes for all participants involved at the point of randomization. As proponents of intent-to-treat analyses, we say, “once randomized, always analyzed.”

Careful consideration is required when selecting an appropriate analytic method to handle missing endpoint data because different methods can produce different outcomes (see Chapter 19 in this volume). Researchers address missing endpoint data in one of several ways: (a) last observation carried forward (LOCF), (b) substituting pretreatment scores for posttreatment scores, (c) multiple imputation methods, and (d) mixed-effects models. LOCF analysis assumes that participants who drop out remain constant on the outcome variable from their last assessed point through the posttreatment evaluation. For example, if a participant drops out at week 6, the data from the week 5 assessment would be substituted for his or her missing posttreatment assessment data. The LOCF approach can be problematic, however, as the last data collected may not be representative of the dropout participant’s ultimate progress or lack of progress at posttreatment, given that participants may change after dropping out of treatment. The use of pretreatment data as posttreatment data (a conservative and not recommended method) simply inserts pretreatment scores for cases of attrition as posttreatment scores, assuming that participants who drop out make no change from their initial baseline state. Critics of pretreatment substitution and LOCF argue that these crude methods introduce systematic bias and fail to take into account the uncertainty of posttreatment functioning (see Leon et al., 2006). More current missing data imputation methods are grounded in statistical theory and incorporate the uncertainty regarding the true value of the missing data. Multiple imputation methods impute a range of values for the missing data, incorporating the uncertainty of the true values of missing data and generating a number of nonidentical datasets (Little & Rubin, 2002). After the researcher conducts analyses on the nonidentical datasets, the results are pooled and the resulting variability addresses the uncertainty of the true value of the missing data.

Mixed-effects modeling, which relies on linear and/or logistic regression to address missing data in the context of random (e.g., participant) and fixed effects (e.g., treatment, age, sex) (see Hedeker & Gibbons, 1994, 1997; Laird & Ware, 1982), can be used (see Neuner et al., 2008, for an example). Mixed-effects modeling may be particularly useful in addressing missing data if numerous assessments are collected across a treatment trial (e.g., weekly symptom reports are collected).

Despite sophisticated data analytic approaches to accounting for missing data, we recommend that researchers attempt to contact noncompleting participants and re-evaluate them at the time when the treatment would have ended. This method accounts for the passage of time, as both dropouts and treatment completers are evaluated over time periods of the same duration, and minimizes any potential error introduced by statistical imputation and modeling approaches to missing data. If this method is used, however, it is important to determine whether dropouts sought and/or received alternative treatments in the interim.

Assessing the Persuasiveness of Therapeutic Outcomes

Data produced by RCTs are submitted to statistical tests of significance. Mean scores for participants in each condition are compared, within-group and between-group variability is considered, and the analysis produces a numerical figure, which is then checked against critical values. Statistical significance is achieved when the magnitude of the mean difference is beyond that which would be expected to result from chance alone (conventionally defined as p < .05). Tests of statistical significance are essential as they inform us that the degree of change was likely not due to chance.

Importantly, statistical tests alone do not provide evidence of clinical significance. Sole reliance on statistical significance can lead to perceiving treatment gains as potent when in fact they may be clinically insignificant. For example, imagine that the results of a treatment outcome study demonstrate that mean Beck Depression Inventory (BDI) scores are significantly lower at posttreatment than pretreatment. An examination of the means, however, reveals only a small but reliable shift from a mean of 29 to a mean of 26. With larger sample sizes, this difference may well achieve statistical significance at the conventional p < .05 level (i.e., a difference this large would be expected less than 5 percent of the time by chance alone if there were no true effect), yet perhaps be of limited practical significance. Both before and after treatment, the scores are within the range considered indicative of clinical levels of depressive distress (Kendall, Hollon, Beck, Hammen, & Ingram, 1987), and such a small magnitude of change may have little effect on a person’s
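The contrast between treatment-completer and intent-to-treat analyses, and the two simple substitution methods described above (LOCF and pretreatment substitution), can be illustrated with a small Python sketch. The participant IDs, scores, and function names are hypothetical, and lower scores are assumed to mean fewer symptoms. Multiple imputation and mixed-effects models are not shown because they require dedicated statistical software.

```python
from statistics import mean

# Hypothetical symptom scores at baseline and three later assessments.
# None marks an assessment that is missing because the participant dropped out.
scores = {
    "p1": [30, 27, 24, 20],
    "p2": [28, 26, None, None],    # dropped out after the second assessment
    "p3": [31, 29, 27, 25],
    "p4": [29, None, None, None],  # dropped out right after baseline
}


def locf_endpoint(series):
    """Last observation carried forward: use the most recent observed score."""
    observed = [s for s in series if s is not None]
    return observed[-1]


def baseline_endpoint(series):
    """Pretreatment substitution: a missing endpoint is replaced by baseline."""
    return series[-1] if series[-1] is not None else series[0]


# (a) Completer analysis: only participants with an observed endpoint.
completers = [s for s in scores.values() if s[-1] is not None]
completer_mean = mean(s[-1] for s in completers)

# (b) Intent-to-treat analyses: everyone randomized, endpoint filled in.
itt_locf_mean = mean(locf_endpoint(s) for s in scores.values())
itt_baseline_mean = mean(baseline_endpoint(s) for s in scores.values())

print(f"completers only:        {completer_mean}")
print(f"ITT with LOCF:          {itt_locf_mean}")
print(f"ITT with baseline fill: {itt_baseline_mean}")
```

In this made-up example the completer mean looks most favorable, and the two intent-to-treat estimates differ from each other, which is the point made above: the choice of method for handling missing endpoint data can change the result.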


life impairment (Gladis, Gosch, Dishuk, & Crits-Christoph, 1999). Conversely, statistically meager results may disguise meaningful changes in participant functioning. As Kazdin (1999) put it, sometimes a little can mean a lot, and vice versa.

Clinical significance refers to the persuasiveness or meaningfulness of the magnitude of change (Kendall, 1999). Whereas statistical significance tests address the question, “Were there treatment-related changes?” tests of clinical significance address the question, “Were treatment-related changes meaningful and convincing?” Specifically, this can be made operational as changes on a measure of the presenting problem (e.g., anxiety symptoms) that result in the participants being returned to within normal limits on that same measure. Several approaches for measuring clinically significant change have been developed, two of which are normative sample comparison and reliable change index.

Normative comparisons (Kendall & Grove, 1988; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999) are conducted in several steps. First, the researcher selects a normative group for posttreatment comparison. Given that several well-established measures provide normative data (e.g., the BDI, the Child Behavior Checklist), investigators may choose to rely on these preexisting normative samples. However, when normative data do not exist, or when the treatment sample is qualitatively different on key factors (e.g., socioeconomic status indicators, age), it may be necessary to collect one’s own normative data. In a typical RCT, when using statistical tests to compare groups, the investigator assumes equivalency across groups (null hypothesis) and aims to find that they are not (alternative hypothesis). However, when the goal is to show that treated individuals are equivalent to “normal” individuals on some factor (i.e., are indistinguishable from normative comparisons), traditional hypothesis-testing methods are inadequate. To circumvent this problem, one uses an equivalency testing method (Kendall, Marrs-Garcia, et al., 1999) that examines whether the difference between the treatment and normative groups is within some predetermined range. When used in conjunction with traditional hypothesis testing, this approach allows conclusions to be drawn about the equivalency of groups (see, e.g., Jarrett, Vittengl, Doyle, & Clark, 2007; Pelham et al., 2000; Westbrook & Kirk, 2007, for examples of normative comparisons), thus testing that posttreatment data are within a normative range on the measure of interest. For example, Weisz and colleagues (1997) utilized normative comparisons in a trial in which elementary school children with mild to moderate symptoms of depression were randomly assigned either to a Primary and Secondary Control Enhancement Training (PASCET) program or to a no-treatment control group. Normative comparisons were used to determine whether participants’ scores on two depression measures, the Children’s Depression Inventory and the Revised Children’s Depression Rating Scale, fell within one standard deviation above elementary school norm groups at pretreatment, posttreatment, and 9-month follow-up time points. Utilizing normative comparisons allowed the authors to conclude that children who had received the treatment intervention were more likely to fall within the normal range on depression measures than children in the no-treatment control condition.

The Reliable Change Index (RCI; Jacobson, Follette, & Revenstorf, 1984; Jacobson & Truax, 1991) is another popular method to examine clinically significant change. The RCI entails calculating the number of participants moving from a dysfunctional to a normative range. Specifically, the researcher calculates a difference score (posttreatment minus pretreatment) divided by the standard error of measurement (calculated based on the reliability of the measure). The RCI is influenced by the magnitude of change and the reliability of the measure. The RCI has been used in RCT research, although its originators point out that it has at times been misapplied (Jacobson, Roberts, Berns, & McGlinchey, 1999). When used in conjunction with reliable measures and appropriate cutoff scores, it can be a valuable tool for assessing clinical significance.

Evaluating Change Mechanisms

The RCT researcher is often interested in identifying (a) the conditions that dictate when a treatment is more or less effective and (b) the processes through which a treatment produces change. Addressing such issues necessitates the specification of moderator and mediator variables (Baron & Kenny, 1986; Holmbeck, 1997; Kraemer et al., 2002). A moderator is a variable that delineates the conditions under which a given treatment is related to an outcome. Conceptually, moderators identify on whom and under what circumstances treatments have different effects (Kraemer et al., 2002). A moderator is functionally a variable that influences either the strength or direction of a relationship between an independent variable (treatment) and a dependent variable (outcome). For example, if in an RCT the experimental treatment was found
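The Reliable Change Index described above can be sketched as follows. The text describes dividing the difference score by the standard error of measurement; the widely cited Jacobson and Truax (1991) formulation divides by the standard error of the difference between two scores, which is the standard error of measurement multiplied by the square root of 2. This sketch uses the latter and exposes both quantities. The standard deviation, the reliability value, and the function names are hypothetical, chosen only to make the arithmetic concrete.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability of the measure."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def reliable_change_index(pre, post, sd, reliability):
    """Difference score (post minus pre) over the standard error of the difference."""
    sem = standard_error_of_measurement(sd, reliability)
    s_diff = sqrt(2) * sem  # standard error of a difference between two scores
    return (post - pre) / s_diff


def is_reliable_change(rci, criterion=1.96):
    """Change beyond what measurement error alone would plausibly produce."""
    return abs(rci) > criterion


# Hypothetical measure properties: SD = 7.5, reliability = .80.
small_shift = reliable_change_index(pre=29, post=26, sd=7.5, reliability=0.80)
large_shift = reliable_change_index(pre=29, post=12, sd=7.5, reliability=0.80)

print(f"29 -> 26: RCI = {small_shift:.2f}, reliable: {is_reliable_change(small_shift)}")
print(f"29 -> 12: RCI = {large_shift:.2f}, reliable: {is_reliable_change(large_shift)}")
```

Under these assumed measure properties, a 3-point drop for an individual does not exceed measurement error, whereas a 17-point drop does. Whether a reliable change is also clinically significant still depends on where the posttreatment score falls relative to an appropriate cutoff or normative range.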


to be more effective with men than with women, but this gender effect was not found in response to the control treatment, then gender would be considered a moderator of the association between treatment and outcome. Treatment moderators help clarify for consumers of the treatment outcome literature which patients might be most responsive to which treatments, and for which patients alternative treatments might be sought. Importantly, when a variable broadly predicts outcome across all treatment conditions in an RCT, conceptually that variable is simply a predictor, and not a moderator (see Kraemer et al., 2002).

On the other hand, a mediator is a variable that serves to explain the process by which a treatment affects an outcome. Conceptually, mediators identify how and why treatments take effect (Kraemer et al., 2002). The mediator effect reveals the mechanism through which the independent variable (e.g., treatment) is related to outcome (e.g., treatment-related changes). Accordingly, mediational models are inherently causal models, and in the context of an RCT, significant mediational pathways inform us about causal relationships. If an effective treatment for child externalizing problems was found to have an impact on parenting behavior, which in turn was found to have a significant influence on child externalizing behavior, then parent behavior would be considered to mediate the treatment-to-outcome relationship (provided certain statistical criteria were met; see Holmbeck, 1997). Specific statistical methods used to evaluate the presence of treatment moderation and mediation can be found elsewhere (see Chapter 15 in this volume).

Reporting the Results

Communicating study findings to the scientific community constitutes the final stage of conducting an RCT. A quality report will present outcomes in the context of previous related work (e.g., discussing how the findings build on and support previous work; discussing the ways in which findings are discrepant from previous work and why this may be the case), as well as consider shortcomings and limitations that can direct future empirical efforts and theory in the area. To prepare a well-constructed report, the researcher must provide all of the relevant information for the reader to critically appraise, interpret, and/or replicate study findings. It has been suggested that there have been inadequacies in the reporting of RCTs (see Westen et al., 2004). Inadequacies in the reporting of RCTs can result in bias in estimating treatment effectiveness (Moher, Schulz, & Altman, 2001). An international group of epidemiologists, statisticians, and journal editors developed a set of consolidated standards for reporting trials (i.e., CONSORT; see Begg et al., 1996) in order to maximize transparency in RCT reporting. CONSORT guidelines consist of a 22-item checklist of study features that can bias estimates of treatment effects, or that are critical to judging the reliability or relevance of RCT findings, and consequently should be included in a comprehensive research report. A quality report will address each of these 22 items. Importantly, participant flow should be characterized at each research stage. The researcher reports the specific numbers of participants who were randomly assigned to each treatment condition, who received treatments as assigned, who participated in posttreatment evaluations, and who participated in follow-up evaluations. It has become standard practice for scientific journals to require a CONSORT flow diagram. See Figure 4.1 for an example of a flow diagram used in reporting to depict participant flow at each stage of an RCT.

Next, the researcher must decide where to submit the report. We recommend that researchers consider submitting RCT findings to peer-reviewed journals only. Publishing RCT outcomes in a refereed journal (i.e., one that employs the peer-review process) signals that the work has been accepted and approved for publication by a panel of impartial and qualified reviewers (i.e., independent researchers knowledgeable in the area but not involved with the RCT). Consumers should be highly cautious of RCTs published in journals that do not place manuscript submissions through a rigorous peer-review process. Although the peer-review process slows the speed with which one is able to communicate RCT results, much to the chagrin of the excited researcher who just completed an investigation, it is nonetheless one of the indispensable safeguards that we have to ensure that our collective knowledge base is drawn from studies meeting acceptable standards. Typically, the review process is “blind,” meaning that the authors of the article do not know the identities of the peer reviewers who are considering their manuscript. Many journals now employ a double-blind peer-review process in which the identities of study authors are also not known to the peer reviewers.

Extensions and Variations of the RCT

Thus far, we have addressed considerations related to the design and implementation of the standard RCT. We now turn our attention to important extensions and variations of the RCT. These

52 The Randomized Controlled Trial

Completed phone screen [n=242]
    Declined [n=44]
Completed diagnostic evaluation
    Did not meet study criteria [n=42]
Randomized [n=156]

                                    Behavior Therapy  Parent education/support  No Treatment
Randomized                          [n=52]            [n=53]                    [n=51]
Study Dropout                       [n=1]             [n=5]                     [n=11]
Completed posttreatment assessment  [n=51]            [n=48]                    [n=40]
Study Dropout                       [n=4]             [n=6]                     [n=8]
Completed 12-month follow-up        [n=47]            [n=42]                    [n=32]

Figure 4.1 Example of flow diagram used in reporting to depict participant flow at each study stage.
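
The counts in a flow diagram such as Figure 4.1 should reconcile from one stage to the next, and a short script can check this before a report is submitted. The following is a minimal sketch in Python, not part of the original study materials: the class name `Arm`, the constant names, and the function `check_flow` are hypothetical, and the number of participants who completed the diagnostic evaluation is derived here on the assumption that everyone who did not decline was evaluated (the figure does not print that count).

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One randomized condition, with counts transcribed from Figure 4.1."""
    name: str
    randomized: int
    dropout_before_post: int      # dropouts between randomization and posttreatment
    dropout_before_followup: int  # dropouts between posttreatment and 12-month follow-up

    @property
    def completed_post(self) -> int:
        return self.randomized - self.dropout_before_post

    @property
    def completed_followup(self) -> int:
        return self.completed_post - self.dropout_before_followup

    @property
    def retention(self) -> float:
        """Proportion of randomized participants assessed at 12-month follow-up."""
        return self.completed_followup / self.randomized


# Counts transcribed from Figure 4.1.
PHONE_SCREENED = 242
DECLINED = 44
NOT_ELIGIBLE = 42
RANDOMIZED = 156

ARMS = [
    Arm("Behavior Therapy", 52, 1, 4),
    Arm("Parent education/support", 53, 5, 6),
    Arm("No Treatment", 51, 11, 8),
]

# Derived value (assumption): everyone who did not decline was evaluated.
evaluated = PHONE_SCREENED - DECLINED


def check_flow() -> None:
    """Raise AssertionError if the stage-to-stage counts do not reconcile."""
    assert evaluated - NOT_ELIGIBLE == RANDOMIZED
    assert sum(arm.randomized for arm in ARMS) == RANDOMIZED


if __name__ == "__main__":
    check_flow()
    for arm in ARMS:
        print(
            f"{arm.name}: posttreatment n={arm.completed_post}, "
            f"follow-up n={arm.completed_followup}, "
            f"retention={arm.retention:.0%}"
        )
```

Storing only the randomized and dropout counts and deriving the completer counts keeps the diagram internally consistent by construction; the derived values can then be compared against the numbers printed in the figure.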

treatment study designs, which include equivalency designs and sequenced treatment designs, address key questions that cannot be adequately addressed with the traditional RCT. We discuss each of these designs in turn and note some of their strengths and limitations.

Equivalency Designs
As varied therapeutic treatment interventions are becoming readily available, research is needed to determine their relative efficacy. Sometimes, the researcher is not interested in evaluating the superiority of one treatment over another, but rather in showing that a treatment produces comparable results to another treatment that differs in key ways. For example, a researcher may be interested in determining whether an individual treatment protocol can yield comparable results when administered in a group format. The researcher may not hold a hypothesis that the group format would produce superior outcomes, but if it could be demonstrated that the two treatments produce equivalent outcomes, the group treatment may nonetheless be preferred due to the efficiency of treating multiple patients in the same amount of time. In another example, a researcher may be interested in comparing a cross-diagnostic treatment (e.g., one that flexibly addresses any of the common child anxiety disorders: separation anxiety disorder, social anxiety disorder, or generalized anxiety disorder) with single-disorder treatment protocols for those specific disorders. The researcher may not hold a hypothesis that the cross-diagnostic protocol produces superior outcomes over single-disorder treatment protocols, but if it could be demonstrated that it produces equivalent outcomes, parsimony would suggest that the cross-diagnostic protocol would be the most efficient to broadly disseminate.

In equivalency research designs, significance tests are utilized to determine the equivalence of treatment outcomes observed across multiple active treatments. While a standard significance test would be used in a comparative trial, such a test could not conclude equivalency between treatments because a nonsignificant difference does not necessarily signify equivalence (Altman & Bland, 1995). In an equivalency design, a confidence interval is established to define a range of points within which

Kendall, Comer, Chow 53

treatments may be deemed essentially equivalent (Jones, Jarvis, Lewis, & Ebbutt, 1996). To minimize bias, this confidence interval must be determined prior to data collection.

Barlow and colleagues, for example, are currently testing the efficacy of a transdiagnostic treatment (Unified Protocol for Emotional Disorders; Barlow, Farchione, Fairholme, Ellard, Boisseau, et al., 2010) for anxiety disorders. The proposed analyses include a rigorous comparison of the Unified Protocol (UP) against single-diagnosis psychological treatment protocols (SDPs). Statistical equivalence testing will be used to test the hypothesis that the UP is equivalent to SDPs. An a priori confidence interval around change in the clinical severity rating (CSR) will be utilized to evaluate statistical equivalence among treatments. The potential finding that the UP is indeed equivalent to SDPs in the treatment of anxiety disorders, regardless of specific diagnosis, would have important implications for treatment dissemination and transportability.

A variation of the equivalency research design is the benchmarking design, which involves a quantitative comparison between treatment outcomes collected in a current study and results from similar treatment outcome studies. Demonstrating equivalence in such a study design allows the researcher to determine whether results from a current treatment evaluation are equivalent to findings reported elsewhere in the literature. Results of a trial are evaluated, or benchmarked, against the findings from other comparable trials. Weersing and Weisz (2002) used a benchmarking design to assess differences in the effectiveness of community psychotherapy for depressed youth versus evidence-based CBT provided in RCTs. The authors aggregated data from all available clinical trials evaluating the effects of best-practice treatment, determined the pooled effect sizes associated with depressed youth treated in these clinical trials, and benchmarked these data against outcomes of depressed youth treated in community mental health clinics. They found that outcomes of youth treated in community care settings were more similar to those of youth in control conditions than to those of youth treated with CBT.

Benchmarking equivalency designs allow for meaningful comparison groups with which to gauge the progress of treated participants in a clinical trial. The comparison data are typically readily available, given that they may include samples that have been used to obtain normative data for specific measures, or research participants whose outcome data are included in reported results in published studies. In addition, as noted earlier, equivalency tests can be conducted to determine the clinical significance of treatment outcomes, that is, the extent to which posttreatment functioning identified in a treated group is comparable to functioning among normative comparisons (Kendall, Marrs-Garcia, et al., 1999).

Sequenced Treatment Designs
When interventions are applied, a treated participant’s symptoms may improve (treatment response), may get worse (deterioration), may neither improve nor deteriorate (treatment nonresponse), or may improve somewhat but not to a satisfactory extent (partial response). In clinical practice, over the course of treatment, important clinical decisions must be made regarding when to escalate treatment, augment treatment with another intervention, or switch to another supported intervention. The standard RCT design does not provide sufficient data with which to inform the optimal sequence of treatment for cases of nonresponse, partial response, or deterioration.

When the aim of a research study is to determine the most effective sequence of treatments for an identified patient population, a sequenced treatment design may be utilized. This design involves the assignment of study participants to a particular sequence of treatment and control/comparison conditions. The order in which conditions are assigned may be random, as in a randomized sequence design. In other sequenced treatment designs, factors such as participant characteristics, individual treatment outcomes, or participant preferences may influence the sequence of administered treatments. These variations on sequenced treatment designs (prescriptive, adaptive, and preferential treatment designs, respectively) are outlined in further detail below.

The prescriptive treatment design recognizes that individual patient characteristics play a key role in treatment outcomes and assigns treatment condition based on these patient characteristics. This treatment design aims to improve upon nomothetic data models by incorporating idiographic data into treatment assignments (see Barlow & Nock, 2009). Study participants who are matched to treatment conditions based on individual characteristics (e.g., psychiatric comorbidity, levels of distress and impairment, readiness to change) may experience greater gains than those who are not matched to interventions based on patient characteristics (Beutler & Harwood, 2000). In a prescriptive
treatment design, the clinical researcher studies the effectiveness of a treatment decision-making algorithm as opposed to a set treatment protocol. Unlike in the standard RCT, participants do not have an equal chance of receiving each study treatment. Instead, what remains consistent across participants is the application of the same decision-making algorithm, which can lead to a variety of sequenced treatment courses.

Although a prescriptive treatment design may enhance clinical generalizability (as practitioners will typically incorporate patient characteristics into treatment planning), this design introduces serious threats to internal validity. In a variation, the randomized prescriptive treatment design randomizes participants to either a blind randomization algorithm or an experimental treatment algorithm. For example, algorithm A may randomly assign participants to one of three treatment conditions (i.e., the blind randomization algorithm), and algorithm B may match participants to each of the three treatment conditions based on baseline data hypothesized to inform the optimal treatment assignment (i.e., the experimental treatment algorithm). Here, the researcher is interested in which algorithm condition is superior, rather than in the absolute effect of a specific treatment protocol.

One of the primary goals of prescriptive treatment research designs is to examine the effectiveness of treatments tailored to individuals for whom those treatments are thought to work best. Key patient dimensions that have been found in nomothetic evaluations to be effective mediators or moderators of treatment outcome can lay the groundwork for decision rules to assign participants to particular interventions, or alternatively, can lead to the administration or omission of a specific module within a larger intervention. Prescriptive treatment designs offer opportunities to better develop and tailor efficacious treatments to patients with varied characteristics.

In the adaptive treatment design, a participant’s course of treatment is determined by his or her clinical response across the trial. In the traditional RCT, a comparison is typically made between an innovative treatment and some sort of placebo or accepted standard. Some argue that a more clinically relevant design involves a comparison between an innovative treatment and an adaptive strategy in which a participant’s treatment condition is switched based on treatment outcome to date (Dawson & Lavori, 2004). With the adaptive treatment study design, clinical researchers can also switch participants from one experimental group to another if a particular intervention is lacking in effectiveness for an individual patient (see Chapter 5 in this volume). After a participant reaches a predetermined deterioration threshold, or if he or she fails to meet a response threshold before a given point during a trial, the participant may be switched from the innovative treatment to the accepted standard, or vice versa. In this way, the adaptive treatment option allows the clinical researcher to determine the relative efficacy of the innovative treatment if the adaptive strategy produces significantly better outcomes than the standard treatment (Dawson & Lavori, 2004).

Illustrating the utility of the adaptive treatment design, the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study assessed the effectiveness of depression treatments in patients with major depressive disorder (Rush et al., 2004). Participants advanced through four levels of treatment and were assigned a particular course of treatment depending on their response to treatment up until that point. In level 1, all participants were given citalopram for 12 to 14 weeks. Those who became symptom-free after this period could continue on citalopram for a 12-month follow-up period, while those who did not become symptom-free moved on to level 2. Levels 2 and 3 allowed participants to choose another medication or cognitive therapy (switch) or augment their current medication with another medication or cognitive therapy (add-on). In level 4, participants who were not yet symptom-free were taken off their medications and randomly assigned to either a monoamine oxidase inhibitor (MAOI) or the combination of venlafaxine extended release with mirtazapine. At each stage of the study, participants were assigned to treatments based on their previous treatment responses. Roughly half of the participants became symptom-free after two treatment levels, and roughly 70 percent of study completers became symptom-free over the course of all four treatment levels.

The Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD), a national, longitudinal public health initiative, was implemented in an effort to gauge the effectiveness of treatments for adults with bipolar disorder (Sachs et al., 2003). All participants were initially invited to enter Standard Care Pathways (SCPs), which involved clinical care delivered by a STEP-BD clinician. At any point during their participation in SCP treatment, participants could become eligible for one of the Randomized Care Pathways (RCPs) for acute depression, refractory depression, or relapse
prevention. Upon meeting nonresponse criteria (i.e., failure to respond to treatment within the first 12 weeks, or failure to respond to two or more antidepressants in the current depressive episode), participants were randomly assigned to one of the RCP treatment arms. After participating in one of these treatment arms, participants could return to SCP or opt to participate in another RCP. Some of the treatment arms also allowed the treating clinician to exclude a participant from an RCP based on his or her particular presentation. The treatment was therefore adaptive in nature, allowing for flexibility and some element of decision making to occur within the trial. Importantly, although such flexibility may enhance generalizability to clinical settings and when used appropriately can guide clinical practice, this flexibility introduces serious threats to internal validity. Accordingly, this design does not allow inferences to be made about the absolute benefit of various interventions.

The preferential treatment design allows study participants to choose the treatment condition(s) to which they are assigned. This approach considers patient preferences, which emulates the process that typically occurs in clinical practice. Taking into account patient preferences in a treatment study can result in a better understanding of which individuals will fare best when administered specific interventions under circumstances that incorporate their preferences in determining treatment selection. Proponents often argue that assigning treatments based on patient preference may increase other factors known to positively affect treatment outcomes, including patient motivation, attitudes toward treatment, and expectations of treatment success. Lin and colleagues (2005) utilized a preferential treatment design to explore the effects of matching patient preferences and interventions in a population of adults with major depression. Participants were offered antidepressant medication and/or counseling based on patient preference, where appropriate. Participants who were matched to their treatment preference exhibited more positive treatment outcomes at 3- and 9-month follow-up evaluations than participants who were not matched to their preferred treatment condition.

Importantly, outcomes identified in preferential treatment designs are intertwined with the confound of patient preferences. Accordingly, clinical researchers are wise to use preferential treatment designs only after treatment efficacy has first been established for the various treatment arms in a randomized design.

In a multiple-groups crossover design, participants are randomly assigned to receive a sequence of at least two treatments, one of which may be a control condition. In this design, participants act as their own controls, as at some point during the trial, they receive each of the experimental and control/comparison conditions. Because each participant is his or her own control, the risk of having comparison groups that are dissimilar on variables such as demographic characteristics, severity of presenting symptoms, and comorbidities is eliminated. Precautions should be taken to ensure that the effects of one treatment intervention have receded before starting participants on the next treatment intervention.

Illustrations of multiple-groups crossover designs can often be found in clinical trials testing the efficacy of various medications. Hood and colleagues (2010) utilized a double-blind crossover design in a study with untreated and selective serotonin reuptake inhibitor (SSRI)-remitted patients with social anxiety disorder. Participants were administered a single dose of either pramipexole (a dopamine agonist) or sulpiride (a dopamine antagonist). One week later, participants received a single dose of the medication they had not received the previous week. Following each medication administration, participants were asked to rate their anxiety and mood, and they were invited to engage in anxiety-provoking tasks. The authors concluded that untreated participants experienced significant increases in anxiety symptoms following anxiety-provoking tasks after both medications. In contrast, SSRI-remitted participants experienced elevated anxiety under the effects of sulpiride and decreased anxiety levels under the effects of pramipexole.

Multiple-groups crossover designs are best suited for the evaluation of interventions that would not be expected to retain effects once they are removed, as is the case in the evaluation of a therapeutic medication with a very short half-life. These designs are more difficult to implement in the evaluation of psychosocial interventions, which often produce effects that are somewhat irreversible (e.g., the learning of a skill, or the acquisition of important knowledge). How can the clinical researcher evaluate separate treatment phases when it is not possible to completely remove the intervention? In such situations, crossover designs are misguided.

Proponents of sequential designs argue that designs that are informed by patient characteristics, outcomes, and preferences provide patients with uniquely individualized care within a clinical trial. The argument suggests that an appropriate match
between patient characteristics and treatment type will optimize success in producing significant treatment effects and lead to a heightened understanding of interventions that are best suited to a variety of patients and circumstances in clinical practice (Luborsky et al., 2002). In this way, systematic evaluation is extended to the very decision-making algorithms that occur in real-world clinical practice, an important element not afforded by the standard RCT. However, whereas these approaches increase clinical relevance and may enhance the ability to generalize findings from research to clinical practice, they also decrease scientific rigor by eliminating the uniformity of randomization to experimental conditions.

Conclusion
The RCT offers the most rigorous method of examining the causal impact of therapeutic interventions. After reviewing the essential considerations relevant for matters of design, procedure, measurement, and data analysis, one recognizes that no single clinical trial, even with optimal design and procedures, can alone answer the relevant questions about the efficacy and effectiveness of therapy. Rather, a series and collection of studies, with varying designs and approaches, is necessary. The extensions and variations of the RCT discussed in this chapter address important features relevant to clinical practice that are not informed by the standard RCT; however, each modification to the standard RCT decreases the maximized internal validity achieved in the standard RCT. Criteria for determining evidence-based practice have been proposed (American Psychological Association, 2005; Chambless & Hollon, 1998), and the quest to identify such treatments continues. The goal is for the research to be rigorous, with the end goal being to optimize clinical decision making and practice for those affected by emotional and behavioral disorders and problems.

Controlled trials play a vital role in facilitating a dialogue between academic clinical psychology and the public and private sectors (e.g., insurance payers, Department of Health and Human Services, policymakers). The results of controlled clinical evaluations are increasingly being examined by both professional associations and managed care organizations with the intent of formulating clinical practice guidelines for cost-effective care that provides optimized service to those in need. In the absence of compelling data, there is the risk that psychological practice will be co-opted and exploited in the service of only profitability and/or cost containment. Clinical trials must retain scientific rigor to enhance the ability of practitioners to deliver effective treatment procedures to individuals in need.

References
Addis, M., Cardemil, E. V., Duncan, B., & Miller, S. (2006). Does manualization improve therapy outcomes? In J. C. Norcross, L. E. Beutler, & R. F. Levant (Eds.), Evidence-based practices in mental health (pp. 131–160). Washington, DC: American Psychological Association.
Addis, M., & Krasnow, A. (2000). A national survey of practicing psychologists’ attitudes toward psychotherapy treatment manuals. Journal of Consulting and Clinical Psychology, 68, 331–339.
Altman, D. G., & Bland, J. M. (1995). Absence of evidence is not evidence of absence. British Medical Journal, 311(7003), 485.
American Psychological Association. (2005). Policy statement on evidence-based practice in psychology. Retrieved August 27, 2011, from evidence-based-statement.pdf.
Anderson, E. M., & Lambert, M. J. (2001). A survival analysis of clinically significant change in outpatient psychotherapy. Journal of Clinical Psychology, 57, 875–888.
Arnold, L. E., Elliott, M., Sachs, L., Bird, H., Kraemer, H. C., Wells, K. C., et al. (2003). Effects of ethnicity on treatment attendance, stimulant response/dose, and 14-month outcome in ADHD. Journal of Consulting and Clinical Psychology, 71,
Barlow, D. H. (1989). Treatment outcome evaluation methodology with anxiety disorders: Strengths and key issues. Advances in Behavior Research and Therapy, 11, 121–132.
Barlow, D. H., Farchione, T. J., Fairholme, C. P., Ellard, K. K., Boisseau, C. L., Allen, L. B., & Ehrenreich May, J. T. (2010). Unified protocol for transdiagnostic treatment of emotional disorders: Therapist guide. New York: Oxford University Press.
Barlow, D. H., & Nock, M. K. (2009). Why can’t we be more idiographic in our research? Perspectives on Psychological Science, 4(1), 19–21.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Begg, C. B., Cho, M. K., Eastwood, S., Horton, R., Moher, D., Olkin, I., et al. (1996). Improving the quality of reporting of randomized controlled trials: The CONSORT statement. Journal of the American Medical Association, 276, 637–639.
Beidel, D. C., Turner, S. M., Sallee, F. R., Ammerman, R. T., Crosby, L. A., & Pathak, S. (2007). SET-C vs. Fluoxetine in the treatment of childhood social phobia. Journal of the American Academy of Child and Adolescent Psychiatry, 46,
Bernal, G., Bonilla, J., & Bellido, C. (1995). Ecological validity and cultural sensitivity for outcome research: Issues for the cultural adaptation and development of psychosocial treatments with Hispanics. Journal of Abnormal Child Psychology, 23, 67–82.
Bernal, G., & Scharron-Del-Rio, M. R. (2001). Are empirically supported treatments valid for ethnic minorities? Toward an alternative approach for treatment research. Cultural Diversity and Ethnic Minority Psychology, 7, 328–342.
Bersoff, D. M., & Bersoff, D. N. (1999). Ethical perspectives in clinical research. In P. C. Kendall, J. Butcher, & G. Holmbeck (Eds.), Handbook of research methods in clinical psychology (pp. 31–55). New York, NY: Wiley.
Beutler, L. E., & Harwood, M. T. (2000). Prescriptive therapy: A practical guide to systematic treatment selection. New York: Oxford University Press.
Blanco, C., Olfson, M., Goodwin, R. D., Ogburn, E., Liebowitz, M. R., Nunes, E. V., & Hasin, D. S. (2008). Generalizability of clinical trial results for major depression to community samples: Results from the National Epidemiologic Survey on Alcohol and Related Conditions. Journal of Clinical Psychiatry, 69, 1276–1280.
Brown, R. A., Evans, M., Miller, I., Burgess, E., & Mueller, T. (1997). Cognitive-behavioral treatment for depression in alcoholism. Journal of Consulting and Clinical Psychology, 65, 715–726.
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18.
Comer, J. S., & Kendall, P. C. (2004). A symptom-level examination of parent-child agreement in the diagnosis of anxious youths. Journal of the American Academy of Child and Adolescent Psychiatry, 43, 878–886.
Creed, T. A., & Kendall, P. C. (2005). Therapist alliance-building behavior within a cognitive-behavioral treatment for anxiety in youth. Journal of Consulting and Clinical Psychology, 73, 498–505.
Curry, J., Silva, S., Rohde, P., Ginsburg, G., Kratochvil, C., Simons, A., et al. (2011). Recovery and recurrence following treatment for adolescent major depression. Archives of General Psychiatry, 68(3), 263–269.
Dawson, R., & Lavori, P. W. (2004). Placebo-free designs for evaluating new mental health treatments: The use of adaptive treatment strategies. Statistics in Medicine, 23, 3249–3262.
De Los Reyes, A., & Kazdin, A. E. (2005). Informant discrepancies in the assessment of childhood psychopathology: A critical review, theoretical framework, and recommendations for further study. Psychological Bulletin, 131, 483–509.
De Los Reyes, A., & Kazdin, A. E. (2006). Conceptualizing changes in behavior in intervention research: The range of possible changes model. Psychological Review, 113, 554–583.
Dobson, K. S., & Hamilton, K. E. (2002). The stage model for psychotherapy manual development: A valuable tool for promoting evidence-based practice. Clinical Psychology: Science and Practice, 9, 407–409.
Dobson, K. S., Hollon, S. D., Dimidjian, S., Schmaling, K. B., Kohlenberg, R. J., Gallop, R. J., et al. (2008). Randomized trial of behavioral activation, cognitive therapy, and antidepressant medication in the prevention of relapse and recurrence in major depression. Journal of Consulting and Clinical Psychology, 76, 468–477.
Dobson, K. S., & Shaw, B. (1988). The use of treatment manuals in cognitive therapy. Experience and issues. Journal of Consulting and Clinical Psychology, 56, 673–682.
Flaherty, J. A., & Meagher, R. (1980). Measuring racial bias in inpatient treatment. American Journal of Psychiatry, 137, 679–682.
Garfield, S. (1998). Some comments on empirically supported psychological treatments. Journal of Consulting and Clinical Psychology, 66, 121–125.
Gladis, M. M., Gosch, E. A., Dishuk, N. M., & Crits-Christoph, P. (1999). Quality of life: Expanding the scope of clinical significance. Journal of Consulting and Clinical Psychology, 67, 320–331.
Hall, G. C. N. (2001). Psychotherapy research with ethnic minorities: Empirical, ethical, and conceptual issues. Journal of Consulting and Clinical Psychology, 69, 502–510.
Hedeker, D., & Gibbons, R. D. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933–944.
Hedeker, D., & Gibbons, R. D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2, 64–78.
Heiser, N. A., Turner, S. M., Beidel, D. C., & Roberson-Nay, R. (2009). Differentiating social phobia from shyness. Journal of Anxiety Disorders, 23, 469–476.
Hoagwood, K. (2002). Making the translation from research to its application: The je ne sais pas of evidence-based practices. Clinical Psychology: Science and Practice, 9, 210–213.
Hollon, S. D. (1996). The efficacy and effectiveness of psychotherapy relative to medications. American Psychologist, 51, 1025–1030.
Hollon, S. D., & DeRubeis, R. J. (1981). Placebo-psychotherapy combinations: Inappropriate representation of psychotherapy in drug-psychotherapy comparative trials. Psychological Bulletin, 90, 467–477.
Hollon, S. D., Garber, J., & Shelton, R. C. (2005). Treatment of depression in adolescents with cognitive behavior therapy and medications: A commentary on the TADS project. Cognitive and Behavioral Practice, 12, 149–155.
Holmbeck, G. N. (1997). Toward terminological, conceptual, and statistical clarity in the study of mediators and moderators: Examples from the child-clinical and pediatric psychology literatures. Journal of Consulting and Clinical Psychology, 65, 599–610.
Homma-True, R., Greene, B., Lopez, S. R., & Trimble, J. E. (1993). Ethnocultural diversity in clinical psychology. Clinical Psychologist, 46, 50–63.
Hood, S. D., Potokar, J. P., Davies, S. J., Hince, D. A., Morris, K., Seddon, K. M., et al. (2010). Dopaminergic challenges in social anxiety disorder: Evidence for dopamine D3 desensitization following successful treatment with serotonergic antidepressants. Journal of Psychopharmacology, 24(5), 709–716.
Huey, S. J., & Polo, A. J. (2008). Evidence-based psychosocial treatments for ethnic minority youth. Journal of Clinical Child and Adolescent Psychology, 37, 262–301.
Jaccard, J., & Guilamo-Ramos, V. (2002a). Analysis of variance frameworks in clinical child and adolescent psychology: Issues and recommendations. Journal of Clinical Child and Adolescent Psychology, 31, 130–146.
Jaccard, J., & Guilamo-Ramos, V. (2002b). Analysis of variance frameworks in clinical child and adolescent psychology: Advanced issues and recommendations. Journal of Clinical Child and Adolescent Psychology, 31, 278–294.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336–352.
Jacobson, N. S., & Hollon, S. D. (1996a). Cognitive-behavior therapy versus pharmacotherapy: Now that the jury’s returned its verdict, it’s time to present the rest of the evidence. Journal of Consulting and Clinical Psychology, 64, 74–80.
58 the r a n do mized co ntro l l ed trial

Jacobson, N. S., & Hollon, S. D. (1996b). Prospects for future child and family modalities. Journal of Consulting and Clinical
comparisons between drugs and psychotherapy: Lessons Psychology, 76, 282–297.
from the CBT-versus-pharmacotherapy exchange. Journal of Kendall, P. C., & Kessler, R. C. (2002). The impact of childhood
Consulting and Clinical Psychology, 64, 104–108. psychopathology interventions on subsequent substance
Jacobson, N. S., Roberts, L. J., Berns, S. B., & McGlinchey, J. abuse: Policy implications, comments, and recommenda-
B. (1999). Methods for defining and determining the clini- tions. Journal of Consulting and Clinical Psychology, 70,
cal significance of treatment effects. Description, application, 1303–1306.
and alternatives. Journal of Consulting and Clinical Psychology, Kendall, P. C., Marrs-Garcia, A., Nath, S. R., & Sheldrick, R. C.
67, 300–307. (1999). Normative comparisons for the evaluation of clini-
Jacobson, N. S., & Traux, P. (1991). Clinical significance: A sta- cal significance. Journal of Consulting and Clinical Psychology,
tistic approach to defining meaningful change in psychother- 67, 285–299.
apy research. Journal of Consulting and Clinical Psychology, Kendall, P. C., Safford, S., Flannery-Schroeder, E., & Webb, A.
59, 12–19. (2004). Child anxiety treatment: Outcomes in adolescence
Jarrett, R. B., Vittengl, J. R., Doyle, K., & Clark, L. A. (2007). and impact on substance use and depression at 7.4-year
Changes in cognitive content during and following cognitive follow-up. Journal of the Consulting and Clinical Psychology,
therapy for recurrent depression: Substantial and enduring, 72, 276–287.
but not predictive of change in depressive symptoms. Journal Kendall, P. C., & Southam-Gerow, M. A. (1995). Issues in the
of Consulting and Clinical Psychology, 75, 432–446. transportability of treatment: The case of anxiety disorders
Jones, B., Jarvis, P., Lewis, J. A., & Ebbutt, A. F. (1996). Trials in youth. Journal of Consulting and Clinical Psychology, 63,
to assess equivalence: the importance of rigorous methods. 702–708.
British Medical Journal, 313 (7048), 36–39. Kendall, P. C., & Sugarman, A. (1997). Attrition in the treat-
Karver, M., Shirk, S., Handelsman, J. B., Fields, S., Crisp, H., ment of childhood anxiety disorders. Journal of Consulting
Gudmundsen, G., & McMakin, D. (2008). Relationship and Clinical Psychology, 65, 883–888.
processes in youth psychotherapy: Measuring alliance, alli- Kendall, P. C., & Suveg, C. (2008). Treatment outcome stud-
ance-building behaviors, and client involvement. Journal of ies with children: Principles of proper practice. Ethics and
Emotional and Behavioral Disorders, 16, 15–28. Behavior, 18, 215–233.
Kazdin, A. E. (1999). The meanings and measurement of clini- Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment
cal significance. Journal of Consulting and Clinical Psychology, effects and their importance to clinical research and practice.
67, 332–339. Biological Psychiatry, 59, 990–996.
Kazdin, A. E. (2003). Research design in clinical psychology (4th Kraemer, H. C., Wilson, G. T., Fairburn, C. G., & Agras, W.
ed.). Boston, MA: Allyn and Bacon. S. (2002). Mediators and moderators of treatment effects in
Kazdin, A. E., & Bass, D. (1989). Power to detect differences randomized clinical trials. Archives of General Psychiatry, 59,
between alternative treatments in comparative psychother- 877–883.
apy outcome research. Journal of Consulting and Clinical Laird, N. M., & Ware, J. H. (1982). Random-effects models for
Psychology, 57, 138–147. longitudinal data. Biometrics, 38, 963–974.
Kendall, P. C. (1999). Introduction to the special section: Clinical Leon, A. C., Mallinckrodt, C. H., Chuang-Stein, C., Archibald,
Significance. Journal of Consulting and Clinical Psychology, D. G., Archer, G. E., & Chartier, K. (2006). Attrition in
67, 283–284. randomized controlled clinical trials: Methodological
Kendall, P. C., & Beidas, R. S. (2007). Smoothing the trail issues in psychopharmacology. Biological Psychiatry, 59,
for dissemination of evidence-based practices for youth: 1001–1005.
Flexibility within fidelity. Professional Psychology: Research Lin, P., Campbell, D. G., Chaney, E. F., Liu, C., Heagerty, P.,
and Practice, 38, 13–20. Felker, B. L., et al. (2005). The influence of patient pref-
Kendall, P. C., & Chu, B. (1999). Retrospective self-reports of erence on depression treatment in primary care. Annals of
therapist flexibility in a manual-based treatment for youths Behavioral Medicine, 30(2), 164–173.
with anxiety disorders. Journal of Clinical Child Psychology, Little, R. J. A., & Rubin, D. (2002). Statistical analysis with miss-
29, 209–220. ing data (2nd ed.). New York: Wiley.
Kendall, P. C., & Grove, W. (1988). Normative comparisons in Lopez, S. R. (1989). Patient variable biases in clinical judgment:
therapy outcome. Behavioral Assessment, 10, 147–158. Conceptual overview and methodological considerations.
Kendall, P. C., & Hedtke, K. A. (2006). Cognitive-behavioral ther- Psychological Bulletin, 106, 184–204.
apy for anxious children (3rd ed.). Ardmore, PA: Workbook Luborsky, L., Rosenthal, R., Diguer, L., Andrusyna, T. P.,
Publishing. Berman, J. S., Levitt, J. T., et al. (2002). The dodo bird ver-
Kendall, P. C., & Hollon, S. D. (1983). Calibrating ther- dict is alive and well—mostly. Clinical Psychology: Science and
apy: Collaborative archiving of tape samples from ther- Practice, 9(1), 2–12.
apy outcome trials. Cognitive Therapy and Research, 7, Marcus, S. M., Gorman, J., Shea, M. K., Lewin, D., Martinez,
199–204. J., Ray, S., et al. (2007). A comparison of medication side
Kendall, P. C., Hollon, S., Beck, A. T., Hammen, C., & Ingram, effect reports by panic disorder patients with and without
R. (1987). Issues and recommendations regarding use of the concomitant cognitive behavior therapy. American Journal of
Beck Depression Inventory. Cognitive Therapy and Research, Psychiatry, 164, 273–275.
11, 289–299. Mason, M. J. (1999). A review of procedural and statisti-
Kendall, P. C., Hudson, J. L., Gosch, E., Flannery-Schroeder, E., cal methods for handling attrition and missing data.
& Suveg, C. (2008). Cognitive-behavioral therapy for anxi- Measurement and Evaluation in Counseling and Development,
ety disordered youth: A randomized clinical trial evaluating 32, 111–118.

k end all, comer, c h ow 59

Moher, D., Schulz, K. F., & Altman, D. (2001). The CONSORT theory and research. Clinical Psychology: Science and Practice,
Statement: Revised recommendations for improving the 11, 295–299.
quality of reports of parallel-group randomized trials. Journal Snowden, L. R. (2003). Bias in mental health assessment and
of the American Medical Association, 285, 1987–1991. intervention: Theory and evidence. American Journal of
Molenberghs, G., Thijs, H., Jansen, I., Beunckens, C., Kenward, Public Health, 93, 239–243.
M. G., Mallinckrodt, C., & Carroll, R. (2004). Analyzing Southam-Gerow, M. A., Ringeisen, H. L., & Sherrill, J. T. (2006).
incomplete longitudinal clinical trial data. Biostatistics, 5, Integrating interventions and services research: Progress and
445–464. prospects. Clinical Psychology: Science and Practice, 13, 1–8.
MTA Cooperative Group. (1999). A 14-month randomized Sue, S. (1998). In search of cultural competence in psychother-
clinical trial of treatment strategies for attention-deficit/ apy and counseling. American Psychologist, 53, 440–448.
hyperactivity disorder. Archives of General Psychiatry, 56, Suveg, C., Comer, J. S., Furr, J. M., & Kendall, P. C. (2006).
1088–1096. Adapting manualized CBT for a cognitively delayed child
Mufson, L., Dorta, K. P., Wickramaratne, P., Nomura, Y., Olfson, with multiple anxiety disorders. Clinical Case Studies, 5,
M., & Weissman, M. M. (2004). A randomized effectiveness 488–510.
trial of interpersonal psychotherapy for depressed adoles- Sweeney, M., Robins, M., Ruberu, M., & Jones, J. (2005).
cents. Archives of General Psychiatry, 61, 577–584. African-American and Latino families in TADS: Recruitment
Neuner, F., Onyut, P. L., Ertl, V., Odenwald, M., Schauer, E., & and treatment considerations. Cognitive and Behavioral
Elbert, T. (2008). Treatment of posttraumatic stress disorder Practice, 12, 221–229.
by trained lay counselors in an African refugee settlement: Treadwell, K., Flannery-Schroeder, E. C., & Kendall, P. C.
A randomized controlled trial. Journal of Consulting and (1994). Ethnicity and gender in a sample of clinic-referred
Clinical Psychology, 76, 686–694. anxious children: Adaptive functioning, diagnostic status,
O’Leary, K. D., & Borkovec, T. D. (1978). Conceptual, method- and treatment outcome. Journal of Anxiety Disorders, 9,
ological, and ethical problems of placebo groups in psycho- 373–384.
therapy research. American Psychologist, 33, 821–830. Treatment for Adolescents with Depression Study (TADS) Team.
Olfson, M., Cherry, D., & Lewis-Fernandez, R. (2009). Racial (2004). Fluoxetine, cognitive-behavioral therapy, and their
differences in visit duration of outpatient psychiatric visits. combination for adolescents with depression: Treatment for
Archives of General Psychiatry, 66, 214–221. Adolescents with Depression Study (TADS) randomized
Pediatric OCD Treatment Study (POTS) Team. (2004). controlled trial. Journal of the American Medical Association,
Cognitive-behavior therapy, sertraline, and their combina- 292, 807–820.
tion for children and adolescents with obsessive-compulsive Vanable, P. A., Carey, M. P., Carey, K. B., & Maisto, S. A.
disorder: The Pediatric OCD Treatment Study (POTS) ran- (2002). Predictors of participation and attrition in a health
domized controlled trial. Journal of the American Medical promotion study involving psychiatric outpatients. Journal of
Association, 292, 1969–1976. Consulting and Clinical Psychology, 70, 362–368.
Pelham, W. E., Jr., Gnagy, E. M., Greiner, A. R., Hoza, B., Walders, N., & Drotar, D. (2000). Understanding cultural and
Hinshaw, S. P., Swanson, J. M., et al. (2000). Behavioral ethnic influences in research with child clinical and pediatric
versus behavioral and psychopharmacological treatment in psychology populations. In D. Drotar (Ed.), Handbook of
ADHD children attending a summer treatment program. research in pediatric and clinical child psychology (pp. 165–188).
Journal of Abnormal Child Psychology, 28, 507–525. New York: Springer.
Perepletchikova, F., & Kazdin, A. E. (2005). Treatment integ- Walkup, J. T., Albano, A. M., Piacentini, J., Birmaher, B.,
rity and therapeutic change: Issues and research recom- Compton, S. N., et al. (2008) Cognitive behavioral ther-
mendations. Clinical Psychology: Science and Practice, 12, apy, sertraline, or a combination in childhood anxiety. New
365–383. England Journal of Medicine, 359, 1–14.
Reis, B. F., & Brown, L. G. (2006). Preventing therapy dropout Waltz, J., Addis, M. E., Koerner, K., & Jacobson, N. S. (1993).
in the real world: The clinical utility of videotape prepara- Testing the integrity of a psychotherapy protocol: Assessment
tion and client estimate of treatment duration. Professional of adherence and competence. Journal of Consulting and
Psychology: Research and Practice, 37, 311–316. Clinical Psychology, 61, 620–630.
Rush, A. J., Fava, M., Wisniewski, S. R., Lavori, P. W., Trivedi, Weersing, R. V., & Weisz, J. R. (2002). Community clinic
M. H., Sackeim, H. A., et al. (2004). Sequenced treatment treatment of depressed youth: Benchmarking usual care
alternatives to relieve depression (STAR*D): rationale and against CBT clinical trials. Journal of Consulting and Clinical
design. Controlled Clinical Trials, 25(1), 119–142. Psychology, 70(2), 299–310.
Sachs, G. S., Thase, M. E., Otto, M. W., Bauer, M., Miklowitz, Weisz, J., Donenberg, G. R., Han, S. S., & Weiss, B. (1995).
D., Wisniewski, S. R., et al. (2003). Rationale, design, and Bridging the gap between laboratory and clinic in child and
methods of the systematic treatment enhancement pro- adolescent psychotherapy. Journal of Consulting and Clinical
gram for bipolar disorder (STEP-BD). Biological Psychiatry, Psychology, 63, 688–701.
53(11), 1028–1042. Weisz, J. R., Thurber, C. A., Sweeney, L., Proffitt, V. D., &
Shirk, S. R., Gudmundsen, G., Kaplinski, H., & McMakin, D. LeGagnoux, G. L. (1997). Brief treatment of mild-to-mod-
L. (2008). Alliance and outcome in cognitive-behavioral erate child depression using primary and secondary control
therapy for adolescent depression. Journal of Clinical Child enhancement training. Journal of Consulting and Clinical
and Adolescent Psychology, 37, 631–639. Psychology, 65(4), 703–707.
Silverman, W. K., Kurtines, W. M., & Hoagwood, K. (2004). Weisz, J. R., Weiss, B., & Donenberg, G. R. (1992). The lab ver-
Research progress on effectiveness, transportability, and dis- sus the clinic: Effects of child and adolescent psychotherapy.
semination of empirically supported treatments: Integrating American Psychologist, 47, 1578–1585.

60 t h e r a n do mized co ntro l l ed trial

Westbrook, D., & Kirk, J. (2007). The clinical effectiveness of V. M. Follette, R. D. Dawes & K. Grady (Eds.), Scientific
cognitive behaviour therapy: Outcome for a large sample standards of psychological practice: Issues and recommendations
of adults treated in routine practice. Behaviour Research and (pp. 163–196). Reno, NV: Context Press.
Therapy, 43, 1243–1261. Yeh, M., McCabe, K., Hough, R. L., Dupuis, D., & Hazen, A.
Westen, D., Novotny, C., & Thompson-Brenner, H. (2004). The (2003). Racial and ethnic differences in parental endorse-
empirical status of empirically supported psychotherapies: ment of barriers to mental health services in youth. Mental
Assumptions, findings, and reporting in controlled clinical Health Services Research, 5, 65–77.
trials. Psychological Bulletin, 130, 631–663.
Wilson, G. T. (1995). Empirically validated treatments as a basis
for clinical practice: Problems and prospects. In S. C. Hayes,

k end all, comer, c h ow 61


5 Dissemination and Implementation Science: Research Models and Methods

Rinad S. Beidas, Tara Mehta, Marc Atkins, Bonnie Solomon, and Jenna Merz

Dissemination and implementation (DI) science has grown exponentially in the past decade. This
chapter reviews and discusses the research methodology pertinent to empirical DI inquiry within
mental health services research. This chapter (a) reviews models of DI science, (b) presents and
discusses design, variables, and measures relevant to DI processes, and (c) offers recommendations
for future research.
Key Words: Dissemination, implementation, research methods, mental health services research

Introduction

Using the specific criteria for “empirically supported treatments” (Chambless & Hollon, 1998),
efficacious psychosocial treatments have been identified for mental health and substance abuse,
and national accrediting bodies (e.g., American Psychological Association [APA]) have
recommended the use of such treatments, a practice that is often referred to as evidence-based
practice (EBP; APA, 2005). However, uptake of EBP is a slow process, with some suggesting that
the translation of new research findings into clinical practice can take over a decade and a
half (Green, Ottoson, Garcia, & Hiatt, 2009). Given the emphasis on dissemination and
implementation of research innovation, a number of recent efforts have endeavored to ensure
that EBP is disseminated to and implemented within the community (McHugh & Barlow, 2010). For
example, the United States Veterans Administration Quality Enhancement Research Initiative and
the United Kingdom’s Improving Access to Psychological Therapies are international efforts to
enact large-scale systemic change in the provision of EBP.

Part of the impetus for the EBP movement in mental health services in the United States was a
1995 task force report initiated by the Division of Clinical Psychology (Division 12) of the
APA (Chambless & Hollon, 1998). The initial report identified empirically supported
psychosocial treatments for adults and also highlighted the lack of empirical support for many
interventions. Since the initial report, debate with regard to the provision of EBP in clinical
practice has ensued, but the movement has gained solid footing. Efforts to expand the use of
EBP have encouraged the rethinking of community mental health practice, akin to the movement
within evidence-based medicine (Torrey, Finnerty, Evans, & Wyzik, 2003).

Given the different terms used within the area of dissemination and implementation (DI)
research (Beidas & Kendall, 2010; Special Issue of the Journal of Consulting and Clinical
Psychology, Kendall & Chambless, 1998; Rakovshik & McManus, 2010), operational definitions are
provided. EBP refers here to the provision of psychosocial treatments supported by the best
scientific evidence while also taking into account clinical experience and client preference
(APA, 2005).

Empirically supported treatments refer here to specific psychological interventions that have
been evaluated scientifically (e.g., a randomized controlled trial [RCT]) and independently
replicated with a delineated population (Chambless & Hollon, 1998). DI science includes the
purposeful distribution of relevant information and materials to therapists (i.e.,
dissemination) and the adoption and integration of EBP into practice (i.e., implementation;
Lomas, 1993). Dissemination and implementation are best initiated together in that both need to
occur in order to influence systemic change (Proctor et al., 2009).

This relatively nascent field of study has yet to develop a “gold-standard” set of research
methods specific to DI processes. Nevertheless, this chapter reviews relevant research
methodology pertinent to research questions within this area. The chapter (a) reviews models of
DI science, (b) presents and discusses relevant research methods (i.e., design, variables, and
measures), and (c) offers recommendations for future research.

Research Methods

Models

A number of salient models¹ exist that are specific to DI science (e.g., Consolidated Framework
for Implementation Research; Damschroder et al., 2009) or have been applied from other areas
(e.g., Diffusion of Innovation; Rogers, 1995). When considering models, it is important to
consider model typology and the need for multiple models to explain DI processes (Damschroder,
2011). Impact models are explanatory in that they describe DI hypotheses and assumptions,
including causes, effects, and factors (i.e., the “what”), whereas process models emphasize the
actual implementation process (i.e., the “how to”; Grol, Bosch, Hulscher, Eccles, & Wensing,
2007). Below, relevant models are described. First, we present heuristic models that can guide
study conceptualization, and then we present models that are more specific to various DI
questions, including models that emphasize individual practitioners and social and
organizational processes. See Table 5.1 for a comparison of DI models and features.

Comprehensive Models

Models included within this section are comprehensive and ecological in nature in that they
include individual, organizational, and systemic processes. These models function largely as
guiding heuristics when designing DI studies and include Promoting Action on Research
Implementation in Health Services (PARiHS; Kitson, Harvey, & McCormack, 1998); Reach, Efficacy,
Adoption, Implementation, and Maintenance (RE-AIM; Glasgow, Vogt, & Boles, 1999); Stages of
Implementation and Core Implementation Components (Fixsen, Naoom, Blase, Friedman, & Wallace,
2005); the Consolidated Framework for Implementation Research (CFIR; Damschroder et al., 2009);
the Practical, Robust Implementation, and Sustainability Model (PRISM; Feldstein & Glasgow,
2008); and a Conceptual Model of Implementation Research (Proctor et al., 2009).

PARiHS

The PARiHS framework has been put forth as a practical heuristic to understand the process of
implementation (Kitson et al., 2008). The use of the PARiHS model is twofold, “as a diagnostic
and evaluative tool to successfully implement evidence into practice, and by practitioners and
researchers to evaluate such activity” (Kitson et al., 2008).

The framework posits three interactive components: evidence (E), context (C), and facilitation
(F). E refers to knowledge, C refers to the system within which implementation occurs, and F
refers to support of the implementation process. Successful implementation depends on the
interrelationship between E, C, and F (Kitson et al., 1998). The PARiHS model emphasizes that
(a) evidence is composed of “codified and non-codified source of knowledge,” which includes
research, clinical experience, patient preferences, and local information, (b) implementing
evidence in practice is a team effort that must balance a dialectic between new and old,
(c) certain settings are more conducive to implementation of new evidence than others, such as
those that have evaluation and feedback in place, and (d) facilitation is necessary for
implementation success (Kitson et al., 2008). Initial support exists around the model (e.g.,
facilitation; Kauth et al., 2010), although there is the need for prospective study (Helfrich
et al., 2010).

RE-AIM

The RE-AIM framework is another model that can aid in the planning and conducting of DI
studies. RE-AIM evaluates the public health impact of an intervention as a function of the
following five factors: reach, efficacy, adoption, implementation, and maintenance. This model
is consistent with a systems-based social ecological framework (Glasgow et al., 1999).


Table 5.1 A Comparison of Key Features Across Key Dissemination and Implementation (DI) Models

[Table body not recoverable from the source. Legible column headings (model features):
Emphasizes Process of Implementation; Emphasizes Social and Organizational (remainder
illegible); Emphasizes Key Opinion Leaders; Feedback Loops; Considers Role of Research–
(remainder illegible). In the original, a mark in a cell indicates that the feature
characterizes the model.]

Note: PARiHS = Promoting Action on Research Implementation in Health Services (Kitson et al.,
1998); RE-AIM = Reach, Efficacy, Adoption, Implementation, and Maintenance (Glasgow et al.,
1999); SI/CIC = Stages of Implementation and Core Implementation Components (Fixsen et al.,
2005); CFIR = Consolidated Framework for Implementation Research (Damschroder et al., 2009);
PRISM = Practical, Robust Implementation, and Sustainability Model (Feldstein & Glasgow, 2008);
CMIR = Conceptual Model of Implementation Research (Proctor et al., 2009); Stetler model
(Stetler, 2001); TPB = Theory of Planned Behavior (Ajzen, 1988, 1991); DOI = Diffusion of
Innovation (Rogers, 1995); ARC = Availability, Responsiveness, and Continuity model (Glisson &
Schoenwald, 2005); CID = Clinic/Community Intervention Development Model (Hoagwood et al.,
2002).
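
RE-AIM treats public health impact as a function of five factors, each of which can be scored
from 0 to 100, so that interventions can be compared on each dimension. The chapter does not
spell out how the five scores are combined into a total, so the following is only a minimal
sketch under stated assumptions: the class name ReAimProfile, the helper rank_by_total, the
unweighted sum used as the "total score," and the example scores are all hypothetical and are
not taken from Glasgow et al. (1999).

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReAimProfile:
    """Scores (0-100) on the five RE-AIM factors for one intervention."""
    reach: float
    efficacy: float
    adoption: float
    implementation: float
    maintenance: float

    def __post_init__(self) -> None:
        # Each factor is described as scorable from 0 to 100; reject anything else.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 100:
                raise ValueError(f"{f.name} must be between 0 and 100, got {value}")

    def total(self) -> float:
        # ASSUMPTION: an unweighted sum (range 0-500). The aggregation rule is
        # not specified in the text; other rules could be substituted here.
        return sum(getattr(self, f.name) for f in fields(self))


def rank_by_total(profiles: Dict[str, ReAimProfile]) -> List[Tuple[str, float]]:
    """Return (name, total) pairs ordered from highest to lowest total score."""
    totals = [(name, profile.total()) for name, profile in profiles.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for two illustrative interventions (not real data).
profiles = {
    "intervention_a": ReAimProfile(reach=40, efficacy=70, adoption=25,
                                   implementation=80, maintenance=50),
    "intervention_b": ReAimProfile(reach=60, efficacy=50, adoption=45,
                                   implementation=65, maintenance=55),
}
ranking = rank_by_total(profiles)
```

Keeping the five scores separate, rather than storing only a total, preserves the
dimension-by-dimension comparison that the framework emphasizes; the summary number is
secondary and depends entirely on the assumed aggregation rule.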

Reach refers to “the percentage and risk characteristics of persons who receive or are affected
by a policy or program,” whereas efficacy refers to positive and negative health outcomes
(i.e., biological, behavioral, and patient-centered) following implementation of an
intervention (Glasgow et al., 1999, p. 1323). Both reach and efficacy address individual-level
variables. Adoption refers to the number of settings that choose to implement a particular
intervention, whereas implementation refers to “the extent to which a program is delivered as
intended” (Glasgow et al., 1999, p. 1323). Both adoption and implementation are
organizational-level variables. Maintenance refers to the extent to which an intervention
becomes a routine part of the culture of a context (i.e., sustainability). Maintenance is both
an individual- and organizational-level variable. Each of the five factors can be scored from
0 to 100, with the total score representing the public health impact of a particular
intervention. Interventions, such as various EBPs, can be scored on each dimension and plotted
and compared to one another.

Over 100 studies have been completed using RE-AIM as an organizing heuristic since it was
published in 1999, but the authors state that it has not been validated because it is a guiding
framework rather than a model or theory (
org/about_re-aim/FAQ/index.html). No literature reviews of RE-AIM–guided studies exist to our
knowledge.

Stages of Implementation and Core Implementation Components

Fixsen and colleagues have provided two key conceptual models for understanding implementation
processes (Fixsen, Blase, Naoom, & Wallace, 2009; Fixsen et al., 2005). The recursive and
nonlinear stages of implementation include exploration, installation, initial implementation,
full implementation, innovation, and sustainability (Fixsen
et al., 2005). Fixsen and colleagues (2009) suggest that “the stages of implementation can be
thought of as components of a tight circle with two-headed arrows from each to every other
component” (Fixsen et al., 2009, p. 534).

Based upon a review of successful programs, a number of core components were proposed within
the stages of implementation. These core implementation components include staff selection,
preservice and in-service training, ongoing coaching and consultation, staff evaluation,
decision support data systems, facilitative administrative support, and systems interventions
(Fixsen et al., 2005). These components are both integrated and compensatory in that they work
together and compensate for strengths and weaknesses to result in optimal outcomes. Core
implementation components work in tandem with effective programs (Fixsen et al., 2005).

Given the integrated and compensatory nature of the core implementation components, an
adjustment of one necessarily influences the others. Importantly, feedback loops must be built
into implementation programs that allow for such natural corrections. These core implementation
components provide a blueprint for implementation research design (Fixsen et al., 2009).
Although this model focuses on clinician behavior as the emphasized outcome variable, systemic
variables and patient outcomes are also included, making it a comprehensive model of
implementation processes.

CFIR

The CFIR is a metatheoretical synthesis of the major models emerging from implementation
science (Damschroder et al., 2009). CFIR does not specify hypotheses, relationships, or levels
but rather distills models and theories into core components, creating an overarching
ecological framework that can be applied to various DI research studies. CFIR has five major
domains that reflect the structure of other widely cited implementation theories (e.g., Fixsen
et al., 2005; Kitson et al., 1998): intervention characteristics, outer setting, inner setting,
individual characteristics, and the implementation process.

Intervention characteristics are important in DI, particularly the core (i.e., essential
elements) and peripheral (i.e., adaptable elements) components. Other important intervention
characteristics include intervention source, stakeholder perception of the evidence for the
intervention, stakeholder perception of the advantage of implementing the intervention,
adaptability of the intervention for a particular setting, feasibility of implementing a pilot,
complexity of the intervention, design quality and packaging, and cost.

The outer setting refers to the “economic, political, and social context within which an
organization resides” (Damschroder et al., 2009, p. 57). Specifically, the outer setting
concerns patient needs for the intervention, cosmopolitanism (i.e., the social network of the
organization), peer pressure to implement the intervention, and external incentives to
implement. The inner setting refers to the “structural, political, and cultural contexts
through which the implementation process will proceed” (Damschroder et al., 2009, p. 57). These
include structural characteristics (e.g., social organization of the agency, age, maturity,
size), social networks and communication, culture, and implementation climate. The outer
setting can influence implementation, an influence that may be mediated through modifications
of the inner setting, and the two areas can be overlapping and dynamic (Damschroder et al.,
2009).

Individual characteristics refer to stakeholders involved with the process of implementation.
This framework views stakeholders as active seekers of innovation rather than passive vessels
of information (Greenhalgh, Robert, Macfarlane, Bate, & Kyriakidou, 2004). Constructs within
this domain include knowledge and beliefs about the intervention, self-efficacy with regard to
use of the intervention, individual stage of change, individual identification with the
organization, and other individual attributes. Finally, the implementation process here refers
to four activities: planning, engaging, executing, and reflecting and evaluating. Empirical
validation for the CFIR model is currently ongoing.

PRISM

The PRISM model (Feldstein & Glasgow, 2008) represents another comprehensive ecological model
that integrates across existing DI frameworks (e.g., PARiHS, RE-AIM) to offer a guiding
heuristic in DI study design. PRISM is comprehensive in that it “considers how the program or
intervention design, the external environment, the implementation and sustainability
infrastructure, and the recipients influence program adoption, implementation, and maintenance”
(Feldstein & Glasgow, 2008, p. 230).

The first element of the PRISM model considers the perspectives of the organization and
consumers with regard to the intervention. Organizational characteristics are investigated at
three levels (leadership, management, and front-line staff); the authors
recommend considering how the intervention will be perceived by the organization and staff members. For example, readiness for change, program usability, and alignment with organizational mission are a few issues to address. With regard to taking the consumer perspective, PRISM recommends considering how an intervention will be received by consumers, such as burden associated with the intervention and the provision of consumer feedback.
The second element of PRISM focuses on organizational and consumer characteristics. Important organizational characteristics include the financial and structural history of an organization as well as management support. Consumer characteristics to consider include demographics, disease burden, and knowledge and beliefs. Relatedly, the third element considers characteristics of the external environment relevant to DI efforts, which “may be some of the most powerful predictors of success” (Feldstein & Glasgow, 2008, p. 237). The external environment refers to motivating variables such as payer satisfaction, competition, regulatory environment, payment, and community resources.
The fourth element of the PRISM model refers to the infrastructure present to support implementation and sustainability. The authors recommend that for implementation to be successful, plans for sustainability must be integrated into DI efforts from the very beginning. Specific variables to consider within this element include adopter training and support, adaptable protocols and procedures, and facilitation of sharing best practices.
The unique contributions of the PRISM model lie in the integration of various DI models and focus on integrating concepts not included in previous models: (a) perspectives and characteristics of organizational workers at three levels (leadership, management, and staff), (b) partnerships between researchers and those doing the implementation, and (c) planning for sustainability from the beginning. Additionally, the authors provide a useful set of questions to ask at each level of the PRISM model when designing a research project (see Feldstein & Glasgow, 2008).

Conceptual Model of Implementation Research
Proctor and colleagues (2009) proposed a conceptual model of implementation research that integrates across relevant theories and underscores the types of outcomes to consider in DI research. Their model assumes nested levels (policy, organization, group, individual) that integrate quality improvement, implementation processes, and outcomes. The model posits two required components: evidence-based intervention strategies (i.e., EBP) and evidence-based implementation strategies (i.e., systems environment, organizational, group/learning, supervision, individual providers/consumers). Unique to this model, three interrelated outcomes are specified: implementation (e.g., feasibility, fidelity), service (e.g., effectiveness, safety), and client (e.g., symptoms) outcomes.

Models That Emphasize Individual Practitioners
Moving beyond heuristic models, we describe models that specify various components of DI processes. Models included within this section emphasize individual practitioners and include the Stetler model (Stetler, 2001) and Theory of Planned Behavior (Ajzen, 1988, 1991).

Stetler Model
The Stetler model (Stetler, 2001) emerges from the nursing literature and focuses on how the individual practitioner can use research information in the provision of EBP. The linear model is “a series of critical-thinking steps designed to buffer the potential barriers to objective, appropriate, and effective utilization of research findings” (Stetler, 2001). The unit of emphasis is the individual’s appropriate use of research findings.
The Stetler model has been updated and refined a number of times (Stetler, 2001) and comprises five main stages: (a) preparation, (b) validation, (c) comparative evaluation/decision making, (d) translation/application, and (e) evaluation. During preparation, the practitioner identifies a potential high-priority problem, considers the need to form a team or other internal and/or external factors, and seeks systematic reviews and empirical evidence relevant to the problem. During validation, the practitioner rates the quality of evidence and rejects noncredible sources. During comparative evaluation/decision making, the practitioner synthesizes findings across empirical sources, evaluates the feasibility and fit of current practices, and makes a decision about the use of evidence in the problem identified. During translation/application, the evidence is used with care to ensure that application does not go beyond the evidence. Additionally during this stage, a concerted effort to include dissemination and change strategies is necessary. During evaluation, outcomes from the implementation of the evidence are assessed, including both formal and informal evaluation and cost/benefit analyses. Both formative and


summative evaluations are to be included (Stetler, 2001).

Theory of Planned Behavior
Theory of Planned Behavior (TPB; Ajzen, 1988; 1991) can be used to understand the behavior of the individual practitioner within DI efforts. From the perspective of TPB, behavior is determined by an individual’s intention to perform a given behavior. Intentions are a function of attitudes toward the behavior, subjective norms, and perceived control. This theory has received great attention in other areas of psychology and is empirically supported (Armitage & Conner, 2001) but has only recently been applied to DI processes.
In one recent study, clinicians were randomly assigned to one of two continuing education workshops: a TPB-informed workshop and a standard continuing-education workshop. Outcomes included clinician intentions and behavior in the usage of an assessment tool. The key manipulation in the TPB-informed workshop was an elicitation exercise to gather participant attitudes, social norms, and perceived control. Findings were supportive in that participants demonstrated both higher intentions and higher implementation rates in the use of the assessment tool (Casper, 2007). This model can be used to guide the design of studies hoping to influence behavior change at the individual practitioner level.

Models That Emphasize Social and Organizational Processes
Models within this section emphasize the social nature of DI and the importance of organizational context and include Diffusion of Innovation (Rogers, 1995), the Availability, Responsiveness, and Continuity model (Glisson & Schoenwald, 2005), and the Clinic/Community Intervention Development Model (Hoagwood, Burns, & Weisz, 2002).

Diffusion of Innovation
The Diffusion of Innovation (DOI) framework (Rogers, 1995) has been widely used and cited within the field of DI science as an integral framework. DOI has been empirically applied across a number of fields, such as agriculture and health sciences (Green et al., 2009). The tenets of DOI are outlined in Rogers’ book, Diffusion of Innovations, which was revised to its fifth edition before Rogers’ death in 2004. Over 5,000 studies have been conducted on DOI, and a new one is published approximately daily (Rogers, 2004). Rogers defined diffusion as “the process through which an innovation, defined as an idea perceived as new, spreads via certain communication channels over time among the members of a social system” (Rogers, 2004, p. 13). Diffusion can be conceptualized as both a type of communication and of social change that occurs over time (Haider & Kreps, 2004). Adoption of innovation is contingent upon five characteristics: relative advantage, compatibility, complexity, trialability, and observability (Rogers, 1995). Relative advantage refers to whether or not use of an innovation will confer advantage to the individual (e.g., improve job performance, increase compensation). Compatibility is the extent to which an innovation is consistent with the individual’s set of values and needs. Complexity refers to how easily an innovation can be learned and used. Trialability is the extent to which an innovation can be tested on a small scale to evaluate efficacy. Observability describes the positive outcomes that are engendered by implementation of an innovation.
Irrespective of innovation characteristics, DOI theory suggests that innovations are adopted according to a five-step temporal process of Innovation-Decision: knowledge, persuasion, decision, implementation, and confirmation. Knowledge refers to an individual learning of an innovation, whereas persuasion refers to attitude formation about an innovation. Decision occurs when a person decides to adopt or reject an innovation. Implementation refers here to when an individual uses an innovation, whereas confirmation refers to an individual seeking reinforcement about the decision to implement an innovation. Decisions to adopt an innovation are recursive, meaning that an individual can reject an innovation at first while adopting it later (Lovejoy, Demireva, Grayson, & McNamara, 2009). Rogers (2004) describes the diffusion of innovation as following an S-shaped curve where innovation adoption begins at a slow rate (i.e., early adopters; first 16%) but reaches a tipping point when adoption accelerates rapidly (i.e., early and late majority; 68%) and then decreases again (i.e., laggards; last 16%). The tipping point, or threshold of program utilizers, occurs when approximately 25% of the social network become utilizers (Valente & Davis, 1999). A well-known and practical application of DOI includes key opinion leaders, a small group of influential early adopters who make it more likely that innovation will spread within a social network (Valente & Davis, 1999); this theory has been supported in mental health services research (Atkins et al., 2008).
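
The S-shaped adoption pattern described above can be illustrated numerically. What follows is a minimal sketch and is not drawn from Rogers or from the chapter: it assumes that cumulative adoption follows a logistic curve (a common stylization, adopted here only for illustration), and the function names, growth rate, and midpoint are hypothetical. Under those assumptions it shows new adoptions per period rising and then falling, and it locates the first period at which cumulative adoption passes the approximately 25% threshold cited from Valente and Davis (1999).

```python
import math


def cumulative_adoption(t, midpoint=10.0, rate=0.6):
    """Share of the social system that has adopted by time t.

    Assumed logistic form; midpoint and rate are illustrative values only.
    """
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def first_period_reaching(threshold, periods, midpoint=10.0, rate=0.6):
    """Return the first integer period with cumulative adoption >= threshold, else None."""
    for t in range(periods + 1):
        if cumulative_adoption(t, midpoint, rate) >= threshold:
            return t
    return None


def new_adopters_per_period(periods, midpoint=10.0, rate=0.6):
    """Per-period gain in cumulative adoption (slow start, rapid middle, slow end)."""
    shares = [cumulative_adoption(t, midpoint, rate) for t in range(periods + 1)]
    return [later - earlier for earlier, later in zip(shares, shares[1:])]


if __name__ == "__main__":
    periods = 20
    gains = new_adopters_per_period(periods)
    tipping = first_period_reaching(0.25, periods)
    print(f"cumulative adoption first reaches 25% in period {tipping}")
    for period, gain in enumerate(gains, start=1):
        # A crude text plot: one '#' per percentage point of new adopters.
        print(f"period {period:2d}: {'#' * round(gain * 100)}")
```

Changing the hypothetical `rate` parameter steepens or flattens the curve, which is one way to explore how quickly adoption would have to accelerate after the threshold for a given DI effort to reach the majority of a network within a study period.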


DOI has been influential in DI science. The field has taken into account characteristics of innovations and the innovation-decision process within a social context when designing DI research. DOI has been applied to understanding how to bridge the gap between research and clinical practice within various psychosocial interventions and treatment populations (e.g., autism; Dingfelder & Mandell, 2010).

Availability, Responsiveness, and Continuity Model
The Availability, Responsiveness, and Continuity (ARC) organizational and community model is specific to mental health services research and is based upon three key assumptions: (a) the implementation of EBP is both a social and technical process, (b) mental health services are embedded in layers of context, including practitioner, organization, and community, and (c) effectiveness is related to how well the social context can support the objectives of the EBP (Glisson & Schoenwald, 2005). ARC aims to improve the fit between the social context and EBP through intervening at the organizational and interorganizational domain levels. The organizational level refers to the needs of mental health practitioners, and ARC involves such providers in organizational processes and policies. The emphasis on interorganizational domain level within ARC allows for the formation of partnerships among practitioners, organizational opinion leaders, and community stakeholders with the shared goal of ameliorating identified problems in a community through a particular EBP (Glisson & Schoenwald, 2005).
Within the ARC model, a key component includes an ARC change agent who “works with an interorganizational domain (e.g., juvenile court, school system, law enforcement, business group, churches) at several levels (e.g., community, organization, individual) around a shared concern (e.g., reducing adolescent delinquent behavior)” (Glisson & Schoenwald, 2005, p. 248). This individual works at the community level by helping form a group to support an EBP for a selected population, at the organizational level by providing support in the delivery of EBP, and at the individual level to develop individual partnerships with key opinion leaders. Change agents provide technical information, empirical evidence, evaluation of outcomes, and support during times of conflict. In other words, the role of the change agent is to serve as a bridge between those disseminating and those implementing the EBP (Glisson & Schoenwald, 2005). An especially clear and relevant application of ARC is described in a recent study that improved DI efforts of multisystemic therapy into poor rural communities (Glisson et al., 2010).

Clinic/Community Intervention Development Model
Hoagwood, Burns, and Weisz (2002) proposed the Clinic/Community Intervention Development (CID) model for community deployment efforts of EBP for youth mental health. The CID model allows DI researchers to understand factors associated with sustainable services, including why and how services work in practice settings. The CID model comprises eight steps. Steps 1 through 6 involve efficacy to effectiveness with emphasis on single case applications in practice settings, a limited effectiveness study to pilot the intervention in real-world practice settings, followed by a full effectiveness study. Steps 7 and 8 are specific to DI processes. Step 7 calls for a series of studies to assess goodness of fit with practice settings, whereas Step 8 focuses on going to scale by engaging in dissemination research in multiple organizational settings.
CID is put forth as a model “for speeding up the process of developing scientifically valid and effective services within the crucible of practice settings” (Hoagwood et al., 2002, p. 337). A strength of the model is that it is externally valid given its emphasis on components, adaptations, and moderators and mediators. Additionally, the model calls for innovative thinking as well as new research models to assess goodness of fit and criteria to determine when a program is ready to go to scale.

Summary
As evident from this review, the sheer number of possible DI models to consider when designing a research question can be quite daunting. Each model presented has strengths and limitations, and none of the models offered covers all of the content areas relevant to DI science (see Table 5.1). One clear limitation is that many of these newly derived theoretical models have not yet been subjected to rigorous scientific evaluation. Despite these limitations, we recommend that all DI-related questions be theoretically driven. When designing a research question, first identify a relevant model that can guide the construction of research design in order to provide meaningful contributions to the field. Our bias and recommendation is toward comprehensive ecological models that take into account the contextual aspects of DI processes as the underlying framework. However, when examining certain processes (e.g., attitudes), it can be helpful to select specific models that can lead to testable hypotheses.


For example, one might select a heuristic model such as the CFIR (Damschroder et al., 2009) when considering which constructs to focus on in a DI study and then select a more specific model based on the study question (e.g., training and attitudes; TPB, Ajzen, 1988, 1991).
We concur with Damschroder’s (2011) suggestions of the following steps when selecting models: consider (a) the nature of the model (i.e., process vs. impact, context, discipline), (b) level of application (e.g., individual, organization), (c) available evidence, and (d) which model has the greatest potential for adding to the literature. Importantly, it is likely that more than one model will be needed when designing complex DI studies. Furthermore, after aggregating results, it is important to consider how the results fit back in with the original model(s) selected with regard to validation of the model and necessary refinements (Damschroder, 2011).

Research Design
The most relevant research designs for DI studies are provided and discussed. Although all of the research methods addressed within this book may be appropriate in the design of DI studies, given the size and complexity of such studies, we focus on designs that are particularly salient to DI: experimental designs, quasi-experimental designs, and qualitative methodology.

Experimental Designs
Randomized Controlled Trials
A full discussion of randomized controlled trials (RCTs) is beyond the scope of this chapter (see Kendall & Comer, 2011); however, RCT designs are often used in DI studies and merit mention (e.g., Miller, Yahne, Moyers, Martinez, & Pirritano, 2004; Sholomskas et al., 2005). The main strength of RCTs involves the use of random assignment to rule out selection bias, which allows for differences in outcomes between conditions to be explained by the experimental manipulation rather than group differences (Song & Herman, 2010). RCTs are often considered the gold-standard research design.
Much has been written about the use of RCTs in DI research. Some researchers have suggested that limitations exist to RCTs in their application to DI studies (e.g., Atkins, Frazier, & Cappella, 2006; Carroll & Rounsaville, 2003). Such limitations include tightly controlled settings, homogenous participants (although some research suggests this is overstated; see Stirman, DeRubeis, Crits-Christoph, & Brody, 2003), resource-intensiveness, and delay in application of findings to practice (Atkins et al., 2006; Carroll & Rounsaville, 2003). In addition, DI trials often operate at a larger system level, requiring that the unit of randomization be at the system level (e.g., agencies, schools, classrooms, work settings). Thus, the sample needed to have adequate power to detect differences beyond chance may be beyond the capacity of many DI trials.

Clinical Equipoise
One option for augmenting traditional RCT designs for DI research in a flexible manner comes from clinical equipoise. Freedman (1987) suggested the use of clinical equipoise in RCTs. The criterion for clinical equipoise is met if there is genuine uncertainty within the practice community about a particular intervention. Statistical procedures have been developed that allow for balancing the principle of clinical equipoise with randomization (i.e., equipoise-stratified randomized design; Lavori et al., 2001).
For example, in the case of the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) research trial (Rush, 2001), a patient and a service provider might agree that all treatments possible after a failed trial of citalopram are roughly equivalent (clinical equipoise). Using an equipoise-stratified randomized design allows the clinician and patient to judge what the best treatment option might be based on patient preferences, which can then be statistically controlled by using chosen treatment option as a prerandomization factor (see Lavori et al., 2001, for a detailed description). From a DI perspective, equipoise offers an advance over the constraints typically imposed on participants and settings in RCTs (e.g., West et al., 2008). The concept of equipoise has been integrated into Sequential Multiple Assignment Randomized Trial (SMART) designs, which allow for patient and provider preference while maintaining the use of randomization and the rigor of RCTs (Landsverk, Brown, Rolls Reutz, Palinkas, & Horwitz, 2010) (see also Chapter 4 in this volume). SMART designs make possible experimental investigation of the treatment choices made by patients and providers by using randomization strategies that account for choice.

Standardization
Another way to consider how to augment RCTs for DI research is to determine which components of the intervention require standardization (Hawe,


Shiell, & Riley, 2004). A complex intervention refers to an intervention that cannot be simply reduced into component parts to understand the whole (i.e., component analysis; Hawe et al., 2004). However, because an intervention is complex does not mean that an RCT is not appropriate; the question lies in what part of the intervention is standardized. Standardization as it is conceptualized within a traditional RCT suggests that components of the intervention are the same across different sites. Hawe and colleagues (2004) suggest an alternative perspective to standardization: “rather than defining the components of the intervention as standard—for example, the information kit, the counseling intervention, the workshops—what should be defined as standard are the steps in the change process that the elements are purporting to facilitate or the key functions that they are meant to have” (Hawe et al., 2004, p. 1562).
Pragmatically, this means that the form can be adapted while process and function remain standardized. What is varied becomes the form of the intervention in different contexts. For example, to train providers about treatment of anxiety, the traditional way to conduct an RCT would be to standardize training methods across sites. Live group instruction might be compared to computer-guided instruction. In each case, the information provided would be the same and therefore the results would relate to which type of training was superior for the majority of participants. Alternatively, one could standardize the function by providing supervisors in an organization with the materials necessary to create training programs that are tailored to the specific setting. In this case, intervention integrity would not relate to typical quality assurance efforts (i.e., did trainer follow specific protocol); rather, it would be related to whether the training developed within each context provided information consistent with the theory or principles underlying the change process. This effort could result in improved effectiveness of DI efforts (Hawe et al., 2004).

Practical Clinical Trials
Practical clinical trials (PCTs; also known as pragmatic clinical trials) have been recommended as an alternative to traditional RCTs (Tunis, Stryer, & Clancy, 2003) and are specifically relevant to effectiveness studies, the mainstay of DI research. PCTs are designed to provide the information necessary to make decisions about best-care practices in routine clinical settings. Tunis and colleagues (2003) describe the distinctive features of PCTs in that “they select clinically relevant interventions to compare, include a diverse population of study participants, recruit participants from a variety of practice settings, and collect data on a broad range of health outcomes” (p. 1626).
March and colleagues (2005) suggested that there are eight defining principles of PCTs: (a) research questions that are of public health interest and clinically relevant, (b) they are performed in usual care settings, (c) power is sufficient to identify small to medium effects, (d) randomization is included, (e) randomization depends on the principle of uncertainty/clinical equipoise, (f) outcomes are simple and clinically relevant, (g) interventions map onto best clinical practice, and (h) research burden is minimized. PCTs are well suited to answer questions related to intervention effectiveness as well as which treatment works best for which patients depending on their characteristics (March et al., 2005).
PCTs are similar to effectiveness designs in that they aim to provide information to decision makers about whether or not interventions work in routine clinical care settings. Questions best answered by this design include the overall effectiveness of a particular intervention in routine settings and include heterogeneous patient populations necessitating larger sample sizes. Outcomes must include evidence that is relevant to everyday policymakers, such as quality of life and the cost-effectiveness of interventions (Macpherson, 2004).
The main strength of PCTs and their relevance to DI research lies in their emphasis on understanding whether or not interventions can be effective in real-world settings. In other words, these designs are heavy on external validity and ecological evidence and provide evidence for decision makers regarding which interventions to recommend. Such trials have been used effectively in medicine and psychiatry (March et al., 2005). Limitations to PCTs include that they are very costly, need a resource-intensive infrastructure to succeed (March et al., 2005; Tunis et al., 2003), require close collaborations between the research team and practice sites, and may be more reflective of agency priorities than researcher priorities (e.g., symptom reduction may not be the primary outcome measure but rather improved functioning in setting). However, recent advances in electronic health records make it more feasible to realize the potential of such designs in the future (March, 2011).

Adaptive Clinical Trials
Adaptive clinical trials are another alternative to RCTs and are flexible in that they plan for the


possibility of reactive changes to study design and/or statistical procedures as the study progresses based upon review of interim data (Chow & Chang, 2008). That is, an adaptive design can be defined as “a design that allows adaptations to trial and/or statistical procedures of the trial after its initiation without undermining the validity and integrity of the trial” (Chow & Chang, 2008). A number of adaptive design strategies exist (see Chow & Chang, 2008, for review). One that stands out as being particularly salient to DI processes includes adaptive treatment switching. This design allows researchers to switch a participant from one group to another based on lack of efficacy. For example, a patient assigned to usual care could be switched to an EBP if usual care is not effective. Bayesian analytic approaches that rely on probability theory are especially appropriate statistical analyses for these designs (Luce et al., 2009).
Although these designs have the advantage of allowing for flexibility to accommodate policy-related questions, they provide challenges to fidelity assessment and there are as yet no clear guidelines for the appropriate use of adaptive clinical trial designs (Chow & Chang, 2008).

Hybrid Models
Hybrid models have been recommended to capitalize on the best of efficacy and effectiveness methodologies (Atkins et al., 2006; Carroll & Rounsaville, 2003). Carroll and Rounsaville (2003) proposed a hybrid model that retains the methodological rigor of RCTs but adds additional components of traditional effectiveness research. In addition to the typical features of an RCT meant to protect internal validity (e.g., random assignment, blind assessment of outcomes, fidelity monitoring), the authors suggest that the following components be integrated into the design to balance external validity and make RCTs more appropriate for DI research: enhanced diversity in patients and settings, attention to training issues, evaluation of cost effectiveness, and assessment of patient and provider satisfaction. These recommendations have been feasibly integrated into DI RCTs. For example, one study feasibly balanced features of efficacy research (e.g., randomization, rigorous assessment) and effectiveness research (e.g., few exclusion criteria, completed in naturalistic setting; Dimeff et al., 2009). Other important recommendations when considering how to adapt RCT methodology for DI research include understanding organizational context and including a “systematic and iterative approach to study development” (Atkins et al., 2006, p. 107). This allows for flexible research design and “ongoing interaction between researcher- and context-driven information at various information points in a project” (Atkins et al., 2006, p. 107).

Quasi-Experimental Designs
Single-Case Time-Series Intervention
Given the emphasis within the psychological literature on RCTs, single-case time-series designs have fallen somewhat out of favor (Borckardt et al., 2008). Once the mainstay of behavior therapists in the 1970s and early 1980s, single-case designs focus on the experimental analysis of behavior (Hersen & Barlow, 1976). Using single-case interventions may provide the establishment of a model of individualized EBP in which the goal would be less the use of scientifically established treatments and more the scientific use of treatment (Gambrill, 2006), thus returning to the roots of behavior therapy and also bridging the gap between research and practice. The APA Division 12 task force includes the use of systematic single-case intervention as one manner from which to glean scientific evidence (Chambless & Hollon, 1998).
Single-case designs allow for multiple observations before and after treatment to provide evidence of patient change and can be accomplished in both clinical settings and research settings (see Borckardt et al., 2008). Single-case studies have natural appeal to practitioners as they can provide information relevant to each client and allow for the comparison of interventions to determine which works best for this client under specific circumstances (Stewart & Chambless, 2010). Additionally, single-case time-series designs can include important manipulations (e.g., randomization) to help ensure a degree of methodological rigor from which researchers can generate causal inferences (Kratochwill & Levin, 2010; Lewin, Lall, & Kratochwill, 2011).

Qualitative Methods
Qualitative methods offer a window into the complex processes occurring within DI research studies in a manner that purely quantitative studies are unable to provide. Qualitative research “provides a vivid, dense, and full description in the natural language of the phenomenon under study” (Hill, Thompson, & Williams, 1997, p. 518). Rather than identifying a priori hypotheses, relationships between phenomena are identified as part of the process of qualitative research. A qualitative approach allows for the “change over time” investigation of DI efforts (Meyer, 2004).


Qualitative methods can be used to “explore and obtain depth of understanding as to the reasons for success or failure to implement evidence-based practice or to identify strategies for facilitating implementation while quantitative methods are used to test and confirm hypotheses based on an existing conceptual model and obtain breadth of understanding of predictors of successful implementation” (Palinkas et al., 2011, p. 44). In this way, qualitative methodology can be used to augment traditional quantitative methods by providing more nuanced contextual information on barriers and/or facilitators. Numerous examples of exemplary use of qualitative methodology exist within DI literature. For example, one study used an ethnographic approach to understand intentions of community clinicians to use EBP (Palinkas et al., 2008). In this study, participant observation and semistructured interviews were used to understand treatment implementation in an effectiveness trial of EBP for depression, anxiety, and conduct problems in youth. Three patterns emerged with regard to participant intention to use EBP: application of treatment with fidelity, abandonment of treatment, and selective application of treatment. Factors associated with these intentions were also explored.
Qualitative research methods, like all methodologies, are not without limitations. Despite increasing attention to the value of such methods, weaknesses include less scientific rigor than quantitative methods and concerns about reliability and validity, analytic techniques used, and quality of produced knowledge (Fitzpatrick & Bolton, 1996; Mays & Pope, 2000).

Summary of Designs
Each design can be useful when attempting to answer questions relevant to DI processes, and careful consideration of the research question and balancing the strengths and limitations of each design is necessary. A recent review describing elements in studies of EBP implementation in child welfare and mental health settings found RCTs to be the dominant paradigm, with some utilization of mixed methodology. Little use of emerging alternative designs (e.g., PCTs, SMART design) was identified (Landsverk et al., 2010), suggesting that future studies should consider these alternatives.
In a developing area such as DI, researchers might recognize the strengths of established methods but also consider the use of multiple-method research to produce converging results. For example, we agree with Dattilio, Edwards, and Fishman (2010) that each DI study should include (a) an RCT, (b) a qualitative evaluation of the implementation of the study with an emphasis on organizational characteristics, and (c) systematic case studies. Taking a mixed-method approach to DI processes moves the field toward a rapprochement between research and practice (Dattilio et al., 2010). A review of 22 studies utilizing mixed methods in child mental health services research found that mixed methods were used for one of five reasons: (a) to measure intervention and process, (b) to conduct exploratory and confirmatory research, (c) to examine intervention content and context, (d) to understand perspectives of consumers (i.e., practitioners and clients), and (e) to compensate for one set of methods with another (Palinkas et al., 2011). The authors state, “it is the combining of these methods through mixed method designs that is likely to hold the greatest promise for advancing our understanding of why evidence-based practices are not being used, what can be done to get them into routine use, and how to accelerate the improvement of systems of care and practice” (Palinkas et al., 2011).

Outcomes Relevant to Dissemination and Implementation
A number of variables have been examined as both predictors and outcomes within the DI literature and include individual provider (e.g., knowledge, attitudes), organizational (e.g., climate, support), and client variables (e.g., treatment outcome). However, given the present emphasis on DI methods, we focus on reviewing implementation outcomes.
Proctor and colleagues (2011) recommend that DI research focus on implementation outcomes that are conceptually different from service or client outcomes. Specifically, the authors “define implementation outcomes as the effects of deliberate and purposive actions to implement new treatments, practices, and services” (Proctor et al., 2011, p. 65). An emphasis on implementation outcomes is necessary given that such outcomes are indicators of implementation success, are proximal indicators of implementation processes, and are related to service and clinical outcomes (Proctor et al., 2011).
Distinguishing between implementation and intervention effectiveness is crucial in DI studies to understand what occurs following implementation (i.e., is failure due to a poorly designed or inappropriate intervention or to an effective practice implemented inadequately). Proctor and colleagues (2011) suggested that there are eight crucial

72 dissemination and implementation science

outcomes to understand the effects of DI studies: acceptability, adoption, appropriateness, feasibility, fidelity, implementation cost, penetration, and sustainability. We suggest adaptation of intervention as an additional outcome of interest.

Acceptability refers to the belief among stakeholders that a particular EBP is acceptable and up to standards. Proctor and colleagues (2011) distinguish acceptability from satisfaction, stating that acceptability is more specific to a particular set of practices. Additionally, acceptability is fluid in that it changes with experience (e.g., from before to after implementation). Acceptability can be measured at the individual provider, organizational, and client levels. One example of an instrument that measures this construct is the Evidence-Based Practice Attitude Scale (EBPAS; Aarons, 2004).

Adoption refers here to “the intention, initial decision, or action to try or employ an innovation or EBP” (Proctor et al., 2011). Adoption is measured at the individual or organizational level and refers here to the same construct as delineated in the RE-AIM model. Standardized measures of adoption have yet to be identified, and time criteria have not been specified (i.e., when does adoption become routine practice).

Appropriateness refers to the compatibility of an EBP for a given setting, provider, or consumer. The constructs of appropriateness and acceptability overlap but are also distinct given that an EBP can be appropriate but not acceptable (Proctor et al., 2011). Standardized measures of appropriateness have not been identified.

Feasibility refers to the extent to which an EBP can be used effectively within a service system (Proctor et al., 2011) and can be assessed at the individual and organizational levels. Measures of feasibility have not been identified.

Implementation cost refers to the cost of an implementation effort and varies with regard to delivery, complexity of the EBP, and particular service setting. The few studies that have reported on implementation cost have quantified cost by intervention component. However, direct measures of implementation cost are not currently widely used (Proctor et al., 2011). One possible strategy is the English cost calculator, a method used to calculate the cost of core work activities and administrative costs in order to inform administrators when making implementation decisions (Chamberlain et al., 2011). It is likely that the DI field can benefit from work in health economics to advance this area.

Penetration refers to “the integration of a practice within a service setting and its subsystems” (Proctor et al., 2011); in other words, how widely used a particular practice is within an organization, conceptually similar to the “reach” component in the RE-AIM framework. Direct measures of penetration have not been identified.

Any successful DI effort should result not only in the EBP being implemented within the community, but also in its sustainability over time if it is found to be effective. This construct is akin to the “maintenance” component in the RE-AIM model and is directly addressed in the PRISM model.

It is likely that sustained programs are better situated to yield sustained effects. Sustainability is also crucial because outcomes may not realistically be achieved or detected within the timeframe permitted by traditional research studies or the grants that typically support them, particularly if the intervention targets behavioral change or community-level mental health outcomes (Pluye, Potvin, & Denis, 2004). Moreover, the recurrent discontinuation of promising or effective programs can have deleterious consequences for a community, specifically with regard to willingness to support future projects (Pluye et al., 2004; Shediac-Rizkallah & Bone, 1998).

Sustainability has been operationalized in multiple ways, including the continuation of program activities, the maintenance of intended benefits for the target population, and the development of community capacity (Scheirer, 2005; Shediac-Rizkallah & Bone, 1998). Altman (1995, p. 527) has proposed an especially clear definition:

Sustainability is . . . defined as the infrastructure that remains in a community after a research project ends. Sustainability includes consideration of interventions that are maintained, organizations that modify their actions as a result of participating in research, and individuals who, through the research process, gain knowledge and skills that are used in other life

This conceptualization highlights the relationship between a program and the setting in which it is implemented and emphasizes that systemic change at multiple levels ought to be a goal of any intervention. Thus, thinking about sustainability ought to reflect enduring change at the community level, as should the ways in which sustainability is planned for and measured.

Too often, DI research is viewed as a linear process, culminating in the sustainability phase. More

beidas, mehta, atkins, solomon, merz 73

effective, however, is to view sustainability as a process that unfolds alongside the research effort (Pluye et al., 2004). From this perspective, planning for sustainability becomes part of planning for the DI process more generally (Adelman & Taylor, 2003; Altman, 1995; Pluye et al., 2004), and this planning is best informed by an understanding of factors believed to influence sustainability, among them (a) the presence of a program “champion” or change agent (Adelman & Taylor, 2003; Scheirer, 2005; Shediac-Rizkallah & Bone, 1998), (b) the extent to which the program is compatible with an organization’s values or mission (Scheirer, 2005), (c) the extent to which the program is integrated into the structures and routines of an organization or community (Adelman & Taylor, 2003; Shediac-Rizkallah & Bone, 1998), (d) the extent to which community members perceive the program as beneficial and support it (Altman, 1995; Scheirer, 2005), and (e) flexibility to modify the program over time (Scheirer, 2005).

All but the last of these factors can benefit from an ongoing collaboration between researcher and community: “The literature overwhelmingly shows a positive relationship between community participation and sustainability” (Shediac-Rizkallah & Bone, 1998, p. 103). Early involvement of community members in the research process can help researchers appreciate the needs of the community, thereby enabling them to study and develop interventions that better meet those needs (Altman, 1995; Durlak & DuPre, 2008). This, in turn, can increase the willingness among community members and groups to take ownership of the intervention and sustain it beyond the initial funding period (Altman, 1995). To date, measures of sustainability are not available.

Fidelity refers to the implementation of an EBP as specified by treatment developers. Measuring provider adherence and competence/skill has become standard procedure to determine treatment fidelity (Kendall & Comer, 2011; Perepletchikova & Kazdin, 2005). Adherence refers to the degree to which a clinician follows the procedures of an EBP, whereas competence refers to the level of skill demonstrated by the clinician in the delivery of treatment (Perepletchikova & Kazdin, 2005). Adherence and competence are typically measured by independent evaluators based on in-session clinician behavior. Illustrative examples of fidelity measures include the Cognitive Therapy Scale (Young & Beck, 1980) and the Motivational Interviewing Treatment Integrity scale (Moyers, Martin, Catley, Harris, & Ahluwalia, 2003). One difficulty in measuring fidelity is that fidelity measures vary across treatment modalities.

The emphasis on fidelity has come under criticism. A recent meta-analysis suggests that neither adherence nor competence is significantly related to patient outcomes (Webb, DeRubeis, & Barber, 2010). Possible explanations of this puzzling finding include limited variability on adherence and competence ratings within RCTs included in this meta-analysis (therapists are trained to criterion and monitored, resulting in a limited range) and the possibility of a curvilinear relationship between fidelity and outcomes. However, much is unknown about the causal role of specific treatment interventions on specific outcomes (Morgenstern & McKay, 2007), and more dismantling studies are needed to understand the relative contribution of various therapeutic procedures on outcomes. Given the current literature, it is premature to conclude that fidelity to EBP is unimportant in DI efforts, but further empirical study is necessary.

The question of adaptation of treatments to particular settings has been raised with regard to fidelity. Adaptation has been defined as intentional or unintentional additions, deletions, or modifications of a program (Center for Substance Abuse Prevention, 2002). The term “re-invention” has been used (Rogers, 1995), often interchangeably. Most researchers agree that adaptation is not inherently negative; it is often beneficial to make certain changes to better address the needs, culture, and context of the local environment (Bauman, Stein & Ireys, 1991; Castro, Barrera, & Martinez, 2004; Center for Substance Abuse Prevention, 2002; Ozer, Wanis & Bazell, 2010; Rogers, 1995). In fact, there is evidence to suggest that adaptation can serve to increase both the effectiveness of an intervention (e.g., McGraw, Sellers, Stone & Bebchuk, 1996) and the likelihood that an intervention is sustained over time (e.g., Scheirer, 2005), which may be a consequence of increasing the relevance of the intervention for the target population (Castro et al., 2004; Ozer et al., 2010).

When we shift our attention to the process by which individuals and organizations implement EBP, a key issue that arises is the extent to which the programs or practices being used in fact resemble those upon which the evidence was based. Despite findings from a recent meta-analysis (Webb et al., 2010), a number of studies have demonstrated that a high level of fidelity to an intervention’s design has been linked to improved outcomes (Battistich,


Schaps, Watson, & Solomon, 1996; Blakely et al., 1987; Botvin, Baker, Dusenbury, Tortu, & Botvin, 1990; Durlak & DuPre, 2008; Rohrbach, Graham, & Hansen, 1993), and there are those who insist that absolute fidelity must be maintained (O’Connor, Small, & Cooney, 2007). Many researchers acknowledge that what matters most is fidelity to an intervention’s core components or causal mechanism(s). In other words, testing interventions in real-world settings requires a balancing act, of sorts, between preserving an intervention’s core components and making needed adaptations given the local context (i.e., flexibility within fidelity; Bauman et al., 1991; Center for Substance Abuse Prevention, 2002; Green & Glasgow, 2006; Kendall & Beidas, 2007; Kendall, Gosch, Furr, & Sood, 2008).

Rogers (1995) noted that some amount of re-invention is inevitable among adopters of innovations; for example, several studies report that adaptations are the norm when implementing school-based interventions (Datnow & Castellano, 2000; Dusenbury, Brannigan, Hansen, Walsh, & Falco, 2005; Larsen & Samdal, 2007; Ozer et al., 2010; Ringwalt, Ennett, Vincus, & Simons-Rudolph, 2004). That said, the true prevalence of adaptations is unknown because they are not reported consistently. Durlak and DuPre (2008) found that only 3 of 59 studies (about 5%) assessing the impact of implementation on intervention outcomes reported on adaptation, whereas 37 (about 63%) reported on fidelity.

In light of this, those involved in DI research must take care to document the adaptation process. According to the Center for Substance Abuse Prevention (2002), the following steps have been proposed to guide the process of adapting programs to new settings: (a) identify the theory of change underlying the program, (b) identify the components that are essential to the program (i.e., its “core” components), (c) identify appropriate adaptations given the local circumstances, (d) consult with the program developer regarding the previous steps, (e) consult with local stakeholders, and (f) develop a plan for implementation, including a plan for assessing the fidelity/adaptation balance.

The task is not without challenges. First, few interventions adequately delineate which components are core (Durlak & DuPre, 2008), making it difficult to determine whether a proposed adaptation may threaten the very mechanism that makes the intervention work. Those involved in DI research are urged to work in tandem with program developers, requesting, if necessary, that they conduct some manner of core component analysis. Ideally, program developers would not only identify those elements central to the program’s theory of change that must remain intact, but also articulate the range of acceptable adaptations (Green & Glasgow, 2006). Second, the definition provided earlier, which encompasses additions, deletions, and modifications to the program model, may lead to some confusion regarding what actually counts as an adaptation. For instance, how should we distinguish between an addition to a program’s model and a separate but related practice taking place alongside the program, within the same organization?

These challenges demand a thoughtful and deliberate implementation process, in which researchers work closely with local stakeholders to plan for the implementation of EBP. During this process, consideration should be given both to the local conditions that make adaptations appropriate in practice and to the extent to which they may be permissible by the theory underlying the intervention (Green & Glasgow, 2006). Finally, descriptions and rationales for adaptations must be documented so that implementation can be more meaningfully evaluated and outcomes can be interpreted more accurately.

Measures

Variables of interest in DI research vary from those in other related areas. Accordingly, measures explicit to DI research have emerged and made it possible to measure constructs from an ecological perspective including provider, client, and organizational variables (Table 5.2). Measures specific to DI processes (as in Proctor et al., 2011) also exist. For further discussion of DI measures, see Lewis, Comtois, and Krimer (2011).2

measures at the provider level

Provider Attitudes

Measure of Disseminability (MOD; Trent, Buchanan, & Young, 2010), a 32-item self-report measure, assesses therapists’ attitudes toward the adoption of a particular EBP on a scale from 1 (not at all) to 7 (very much). The MOD is based upon a three-factor model (treatment evaluation, level of comfort, and negative expectations) that has been studied using exploratory and confirmatory factor analysis (Trent et al., 2010). Psychometric properties include strong retest reliability (.93) and internal consistency (.73 to .83; Trent et al., 2010).

Evidence-Based Practice Attitude Scale (EBPAS; Aarons, 2004), a 15-item self-report measure, assesses therapists’ attitudes toward the adoption and implementation of EBP on a scale from 0 (not at all)


Table 5.2 A Comparison of DI Measures
[Table body not recoverable from the source. Column headings under "Measure Features": Measure; Provider Level; Provider Attitudes; Provider Knowledge; Provider Fidelity; Organizational Level; Client Level; Freely Available; Website.]
Note: MOD = Measure of Disseminability (Trent, Buchanan, & Young, 2010); EBPAS = Evidence Based Practice Attitude Scale (Aarons,
2004); MPAS = Modified Practitioner Attitude Scale (Chorpita et al., 2004); ASA = Attitudes Toward Standardized Assessment Scales
(Jensen-Doss & Hawley, 2010); TX-CHAT = Texas Survey of Provider Characteristics and Attitudes (Jensen-Doss, Hawley, Lopez, &
Osterberg, 2009); KEBSQ = Knowledge of Evidence Based Services Questionnaire (Stumpf et al., 2009); CBT-KQ = Cognitive-Behavioral
Therapy Knowledge Quiz (Latham, Myles, & Ricketts, 2003; Myles, Latham, & Ricketts, 2003); TPOCS-S = Therapy Process Observational
Coding System for Child Psychotherapy Strategies Scale (McLeod, 2001); ORC = Organizational Readiness for Change (Institute for
Behavioral Research, 2002); OSC = Organizational Social Context (Glisson et al., 2008); ORCA = Organizational Readiness to Change
Assessment (Helfrich, Li, Sharp, & Sales, 2009); AII = Adopting Innovation Instrument (Moore & Benbasat, 1991); SHAY = State Health
Authority Yardstick (Finnerty et al., 2009); TCAT = Treatment Cost Analysis Tool (Flynn et al., 2009); OS = Ohio Scales (Ogles, Lunnen,
Gillespie, & Trout, 1996); CIS = Columbia Impairment Scale (Hurt, Arnold & Aman, 2003); PROMIS = Patient-Reported Outcomes
Measurement Information System (Cella et al., 2010).
= Feature characterizes model


to 4 (to a great extent). The EBPAS maps onto four subscales: appeal, requirements, openness, and divergence (Aarons, 2004). Appeal refers to the extent to which a therapist would adopt a new practice if it is intuitively appealing. Requirements refers to the extent to which a therapist would adopt a new practice if required by his or her organization or legally mandated. Openness is the extent to which a therapist is generally receptive to using new interventions. Divergence is the extent to which a therapist perceives research-based treatments as not useful clinically (Aarons, 2004). The EBPAS demonstrates good internal consistency (Aarons, 2004); subscale alphas range from .59 to .90 (Aarons & Sawitzky, 2006), and its validity is supported by its relationship with both therapist-level attributes and organizational characteristics (Aarons, 2004). Recently, a 50-item version of the EBPAS (EBPAS-50) has been developed and includes an additional eight factors: limitations, fit, monitoring, balance, burden, job security, organizational support, and feedback. Exploratory analyses demonstrated high internal consistency among factors (.77 to .92; Aarons, Cafri, Lugo, & Sawitzky, 2010).

Modified Practitioner Attitude Scale (MPAS; Chorpita et al., unpublished measure, 2004) is an eight-item measure created for administration to direct service providers to understand therapists’ attitudes toward EBP. Items are measured on a scale from 0 (not at all) to 4 (to a great extent). Items on the MPAS are similar to items on the EBPAS but are specifically worded to avoid references to treatment manuals (e.g., referring to treatments rather than treatment manuals). Psychometric properties for the MPAS suggest adequate internal consistency (.80) and a moderate relationship with the EBPAS (r = .36). The wording in the MPAS (i.e., not referring to treatment manuals but referring to EBP) may result in differential results in reported provider attitudes (Borntrager, Chorpita, Higa-McMillan, & Weisz, 2009).

Attitudes Toward Standardized Assessment Scales (ASA; Jensen-Doss & Hawley, 2010) is a 22-item measure created for administration to direct service providers to understand therapists’ attitudes toward standardized assessment measures often utilized in EBP. Items are measured on a scale from 1 (strongly agree) to 5 (strongly disagree). Items address three factors: benefit over clinical judgment, psychometric quality, and practicality. Benefit over clinical judgment refers to items assessing the extent to which standardized measures provide extra information above and beyond clinical judgment by itself. Psychometric quality refers to clinicians’ beliefs about the reliability and validity of standardized measures. Practicality refers to clinicians’ beliefs about the feasibility of using standardized measures in clinical practice. In an initial psychometric evaluation, internal consistency ranged from .72 to .75, and scale structure was corroborated by a confirmatory factor analysis suggesting adequate model fit (RMSEA = .045, CFI = .935). The measure was also found to be predictive of intentions to use evidence-based assessment (Jensen-Doss & Hawley, 2010).

Texas Survey of Provider Characteristics and Attitudes (TX-CHAT; Jensen-Doss, Hawley, Lopez, & Osterberg, 2009) is a 27-item measure created for administration to direct service providers to understand therapists’ attitudes toward EBPs that they are currently using in their clinical practice. Items are measured on a scale from 1 (not at all true for me) to 5 (very true for me). Items map onto five subscales: provider’s attitudes toward evidence-based treatments, colleagues’ attitudes toward evidence-based treatments, agency support for implementation, barriers to implementation, and quality of training. The measure has held up to initial psychometric investigation with adequate alphas at .69 or above (Jensen-Doss et al., 2009; Lopez, Osterberg, Jensen-Doss, & Rae, 2011).

Provider Knowledge

Knowledge of Evidence Based Services Questionnaire (KEBSQ; Stumpf, Higa-McMillan, & Chorpita, 2009) is a 40-item self-report measure administered to direct service providers to measure their knowledge of EBP. Items on the KEBSQ include practice elements of EBP and non-EBP used in the treatment of four childhood difficulties: (a) anxious/avoidant, (b) depressed/withdrawn, (c) disruptive behavior, and (d) attention/hyperactivity. In this measure, 40 practice elements are listed, and practitioners classify whether a particular practice element (e.g., relaxation) is used in EBP for each of the four difficulties. Each item is scored on a scale from 0 to 4 with a total possible score of 160; higher scores indicate more knowledge of EBP. The measure has acceptable temporal stability (.56), sensitivity to training, and discriminative validity (Stumpf et al., 2009). Overall internal consistency is low (.46; Okamura, Nakamura, McMillan, Mueller, & Hayashi, 2010), but the original authors caution against measuring internal consistency given that each item represents a unique and independent technique that is not necessarily related to other items (Stumpf et al., 2009).
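
The scoring arithmetic and the internal-consistency caveat for a KEBSQ-style instrument can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the published measure: the item rule (one point for each of the four problem areas classified correctly), the placeholder answer key, and the function names `score_item`, `total_score`, and `cronbach_alpha` are all hypothetical. Cronbach's alpha is computed with its standard formula, k/(k - 1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import variance

N_AREAS = 4  # the four childhood problem areas named in the text


def score_item(response, key):
    """Hypothetical item rule: one point per problem area classified correctly (0-4)."""
    if len(response) != N_AREAS or len(key) != N_AREAS:
        raise ValueError("each item needs one judgment per problem area")
    return sum(r == k for r, k in zip(response, key))


def total_score(sheet, answer_key):
    """Sum of item scores; 40 items scored 0-4 give a maximum of 160."""
    return sum(score_item(r, k) for r, k in zip(sheet, answer_key))


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Placeholder key for illustration only; this is NOT the real KEBSQ key.
answer_key = [(True, False, False, False)] * 40
perfect_sheet = list(answer_key)
print(total_score(perfect_sheet, answer_key))  # 160

# Items that rise and fall together yield a high alpha ...
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(consistent), 2))  # 1.0

# ... whereas weakly related items yield a low alpha even if each item is sound.
weakly_related = [[1, 3], [3, 1], [2, 2], [4, 4]]
print(round(cronbach_alpha(weakly_related), 2))  # 0.33
```

The second toy data set shows why a low alpha is not necessarily a flaw when items are designed to tap independent techniques: alpha rewards covariation among items, not the accuracy of any single item.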


Cognitive-Behavioral Therapy Knowledge Quiz (CBT-KQ; Latham, Myles, & Ricketts, 2003; Myles, Latham, & Ricketts, 2003) is a 26-item self-report multiple-choice measure administered to direct service providers to measure knowledge of CBT in adult patients. Items on the CBT-KQ map onto the following categories: (a) general CBT issues, (b) underpinnings of behavioral approaches, (c) underpinnings of cognitive approaches, (d) practice of behavioral psychotherapy, and (e) practice of cognitive therapy. Each item is scored as correct or incorrect with a total possible score of 26; higher scores indicate more knowledge of CBT. Psychometrics are not yet available.

Provider Intervention Fidelity

Several instruments exist to measure fidelity to specific treatment modalities. For example, for motivational interviewing, one can use the Motivational Interviewing Skill Coding (MISC; Moyers et al., 2003), whereas for cognitive therapy, one can use the Cognitive Therapy Scale (CTS; Young & Beck, 1980), the Cognitive Therapy Scale-Revised (CTS-R; James, Blackburn, & Reichelt, 2001), or the Collaborative Study Psychotherapy Ratings Scale (CSPRS; Hollon et al., 1988). Often, investigators create intervention-specific fidelity measures for the specific EBP they are researching and disseminating (Beidas & Kendall, 2010). Recommendations have been made for using standardized fidelity measures across EBPs; however, there is currently no measure that can be used across EBPs, and as can be seen, often multiple measures exist for the same treatment modality. However, one observational coding system that cuts across modalities for child psychotherapy strategies has been psychometrically explored and is described below.

Therapy Process Observational Coding System for Child Psychotherapy Strategies Scale (TPOCS-S; McLeod, 2001) is a 31-item coding measure intended to allow for description of provision of mental health treatment in practice settings. TPOCS-S subscales differentiate between intervention strategies and include cognitive, behavioral, psychodynamic, family, and client-centered techniques. The TPOCS-S scoring involves “extensiveness ratings of therapeutic interventions designed to measure the degree to which therapists use specific therapeutic interventions during a therapy session” (McLeod & Weisz, 2010, p. 438). Coders observe sessions and indicate the degree to which a therapist engages in each strategy during the whole session from 1 (not at all) to 7 (extensively). Extensiveness ratings include thoroughness and frequency. Thoroughness refers to depth of provision of intervention; frequency refers to how often a therapist provides the intervention during a session. The TPOCS-S has been psychometrically investigated. The measure has shown good interrater reliability (.66 to .95), internally consistent subscales (.74 to .86), and adequate construct validity (McLeod & Weisz, 2010). The TPOCS-S has been used successfully in studies characterizing usual care (Garland et al., 2010).

measures at the organizational level

Organizational Readiness for Change (ORC; Institute for Behavioral Research, 2002) is a 129-item instrument that measures organizational characteristics and is administered to various individuals in an organization. Responses are provided on a 5-point Likert rating scale ranging from 1 (strongly disagree) to 5 (strongly agree). The 18 scales represent three major domains: motivation, resources, and organizational factors. Motivational factors include program needs, training needs, and pressure for change. Resources include office facilities, staffing, training, equipment, and availability of Internet. Organizational factors include staff attributes and organizational climate. Staff attributes include growth, efficacy, influence, and adaptability; organizational climate includes mission, cohesion, autonomy, communication, stress, and flexibility for change.

Psychometrically speaking, the instrument has shown moderate to high coefficient alphas (range: .56 to .92), and support for the factors has been gleaned from principal component analysis (Lehman, Greener, & Simpson, 2002). This measure has multiple forms to be administered to various individuals within an organization, such as front-line staff and supervisors. Additionally, the measure has been modified for use in settings other than community mental health centers (e.g., criminal justice). Score profiles can be mapped onto norms, allowing for direct comparisons to other national organizations. Ideally, the measure is administered to at least five individuals in an organization (TCU IBR, 2002).

Organizational Social Context (OSC; Glisson et al., 2008) is a measurement system that quantitatively evaluates the social context of mental health and social services organizations through administration to direct service providers. Specifically, the OSC measures both individual-level (work attitudes, work behavior) and organizational-level (culture) variables, as well as individual and shared


perceptions (climate). Assessing the social context of ease of use, result demonstrability, image, visibility,
an organization makes it possible to capture features trialability, and voluntariness. Psychometrics are
that may influence service and treatment, clinician adequate with regard to reliability (.71 to .95) and
morale, and adoption and implementation of EBP. validity, with a principal component analysis identi-
The OSC has 105 items that form 16 first- fying seven factors (Moore & Benbasat, 1991).
order scales and 7 second-order scales. Factors are State Health Authority Yardstick (SHAY; Finnerty
grouped by structure, culture, psychological and et al., 2009) is a 15-item agency-specific behavior-
organizational climate, and work attitudes. Culture ally anchored instrument that assesses systems-level
refers to the norms and values of an organization; considerations that are relevant to the implementa-
climate refers to the impact of a work context on tion of EBP. Specifically, the SHAY assesses seven
an individual. Work attitudes refer to morale of an individual worker. The measurement of these
factors together allows for an understanding of an organization’s context and can be compared with
norms of national service settings. Confirmatory factor analysis supported these factors; alpha
coefficients for scales range from .71 to .94 (Glisson et al., 2008). It is preferable that four or
more individuals from an organization complete this assessment for adequate measurement of
organizational climate (P. Green, personal communication).

Measures Specific to DI Processes
The instruments below measure specific constructs relevant to DI processes and either map onto
relevant DI models (e.g., PARiHS, DOI) or provide information specific to Proctor and colleagues’
(2011) suggested implementation outcomes.

Organizational Readiness to Change Assessment (ORCA; Helfrich, Li, Sharp, & Sales, 2009)
operationalizes the core constructs of the PARiHS framework. The ORCA is a 77-item measure that
is administered to staff involved in quality improvement initiatives; responses range from 1 (very
weak) to 5 (very strong). Items map onto three scales that make up the core elements of the PARiHS
framework: (a) strength and extent of evidence, (b) organizational climate, and (c) capacity for
internal facilitation of QI program. A three-factor solution was identified via exploratory factor
analysis, and reliability (.74 to .95) was acceptable, but further validation is necessary (Helfrich
et al., 2009). A follow-up study found the preimplementation ORCA scores to be predictive of low
and high implementation rates across sites (Hagedorn & Heideman, 2010).

Adopting Innovation Instrument (Moore & Benbasat, 1991) is a 38-item self-report measure that
assesses perceptions a provider may have toward adopting an innovation. In the rigorous
development of this instrument, the authors specifically aimed to measure the constructs that
Rogers (2004) proposed. Specifically, this instrument contains eight factors: relative advantage,
compatibility,

domains: planning, financing, training, leadership, policies and regulations, quality improvement,
and stakeholders. Items are rated from 1 (little or no implementation) to 5 (full implementation).
The SHAY is intended to be administered by two independent raters who interview multiple
informants in an organization. The two raters make independent ratings and then create consensus
ratings. Initial evidence partially supports construct and criterion validity of the instrument in
assessing state-level facilitators of and/or barriers to EBP implementation (Finnerty et al., 2009).

Treatment Cost Analysis Tool (TCAT; Flynn et al., 2009) is a measure created to assist in cost
analysis of outpatient substance abuse treatment programs. To generate cost analysis, the TCAT
includes information about client volume, counseling, total program costs, overhead costs, and
personnel data. The measure is easy to use and is available through an Excel spreadsheet. This
measure provides information on cost effectiveness as suggested by Proctor and colleagues (2011).

Measures at the Client Level
Given the complexity of DI studies, client measures to address client characteristics and client
outcomes should be easy to implement and score, freely available so that their use may be sustained
following the research project, and specific to the research question. Several large systems have
adopted outcome measures that would be appropriate in DI research studies.

For example, Illinois requires the Ohio Scales (Ogles, Lunnen, Gillespie, & Trout, 1996) and
Columbia Impairment Scale (Hurt, Arnold & Aman, 2003) for all children funded by Medicaid. The
Ohio Scales (Ogles et al., 1996) focus on efficient administration, scoring, and interpretation.
There are three parallel forms of the Ohio Scales that can be completed by the youth, caregiver,
and service provider. All forms include questions relating to problem severity, functioning,
satisfaction, and hopefulness. These scales were developed not
to diagnose youth but to provide an efficient means of tracking outcomes in community agencies.
Psychometric properties are solid with adequate test–retest reliability (.65 to .97) and
preliminary validity (Ogles et al., 1996). The Columbia Impairment Scale (CIS; Hurt et al., 2003)
focuses on impairment of functioning and assesses how well an individual carries out
age-appropriate daily activities. The items are scored on a 4-point scale, with a greater score
indicating greater impairment. The CIS can be filled out by either a clinician or a caregiver and
demonstrates good internal consistency, test–retest reliability, and validity (Hurt et al., 2003).

An exciting initiative sponsored by the National Institutes of Health also has produced efficient
and easily accessible outcome measures that can be utilized in DI studies: the Patient-Reported
Outcomes Measurement Information System (PROMIS). The goal of this project is “to develop and
evaluate, for the clinical research community, a set of publicly available, efficient, and flexible
measurements of patient-reported outcomes, including health-related quality of life” (Cella et al.,
2010, p. 1180). Content areas for items include physical health (e.g., fatigue), mental health
(e.g., anxiety), and social health (e.g., social function). These items are available as
paper-and-pencil measures and computer adaptive tests.

Large-scale testing of PROMIS items suggests good reliability and validity. A larger discussion of
PROMIS is beyond the scope of this chapter, but these tools may be particularly well suited for DI
studies given their brevity, ability to be tailored to particular populations, and ease of use. For
example, if one is interested in studying the DI of CBT for youth anxiety disorders, one could use
the pediatric PROMIS anxiety and depressive symptoms scales to measure outcomes (Irwin et al.,
2010).

Conclusion and Future Directions
DI science is a relatively new area of inquiry within mental health services research that strives
to understand the key mechanisms and processes needed to expand the utilization of EBP in
community mental health settings. DI research aims to bridge the research-to-practice gap that
prevents knowledge and practices of effective treatments from reaching many people in need (Weisz,
Donenberg, Han, & Kauneckis, 1995). Researchers focus on the need to systematically study the
process of DI to increase the use of best practices in community mental health settings
(Schoenwald & Hoagwood, 2001; Schoenwald, Hoagwood, Atkins, Evans, & Ringeisen, 2010).

Several models relevant to DI research were described. Comprehensive models (e.g., CFIR;
Damschroder et al., 2009) provide heuristics as to areas and levels of study, while more specific
models (e.g., TPB; Ajzen, 1988, 1991) describe specific targets that might be vital to understand
DI mechanisms. The comprehensive models underscore the importance of considering multiple levels
of change and organization, leading to the need for complex studies that address not only whether
an implemented intervention has the desired effect, but also how the context affects the changes
(or lack thereof) that may be a result of the intervention. This necessitates the careful and
thoughtful assessment of fidelity and a thorough understanding of the issues that are inherent in
fidelity measurement (e.g., What are the core elements of the intervention? Is ongoing quality
assessment incorporated into the DI process? Can interventions be adapted with fidelity?). With
regard to adaptation, empirical questions to tackle include: What adaptations, for whom, result in
improved client outcomes? Do adaptations result in higher rates of implementation or
sustainability of EBP? When does flexibility in implementation of an EBP become infidelity
(Kendall & Beidas, 2007; Kendall et al., 2008)?

The specific models reviewed suggest possible targets of intervention to optimize DI efforts. For
example, models emphasizing the organizational context of DI efforts (e.g., ARC; Glisson &
Schoenwald, 2005) suggest that key components of the organizational context such as norms and
expectations within the setting may influence DI outcomes. An organizational perspective includes
how to influence and support an organization in the adoption and implementation of EBP.
Traditionally, this perspective has included an understanding of the structures needed to support
new models and learning and infrastructures that can support new models of mental health services.
For example, providing facilitation to an organization in the creation of the structures and
infrastructures needed to support a new intervention model increased the likelihood that a
particular intervention was adopted and more clients improved following implementation (Glisson et
al., 2010).

Other important organizational considerations include the role of social networks within DI
efforts. Given that adoption of EBP may be a slow process, program response needs to be understood
as unfolding over time, requiring longitudinal studies that account for the differential adoption
of interventions. In addition, if programs are not adopted
throughout a social system such as an agency or school, this may suggest that the program is not
seen by a sufficient number of members as appropriate to their needs. This could lead to a series
of questions as to how to adapt programs or how to activate key opinion leaders to influence mental
health program use to inform DI efforts throughout a social system (Atkins et al., 2008). Key
questions with regard to how organizational context may influence DI outcomes include: How do
organizational constructs (e.g., organizational support) operate to facilitate or impede DI? How
can knowledge of barriers/facilitators be used to coordinate and augment DI of EBP in community
mental health settings? How can organizational interventions effectively improve DI efforts? How
can social networks be used to augment DI?

A fundamental issue that arises when taking an organizational perspective is the natural tension
between the adaptability of a services setting and the adaptability of a new intervention. There
is often an implicit assumption that a service setting is ready to adopt a new intervention.
However, if one takes an ecological perspective, there is an active transactional interplay between
an organization and a new intervention, with the organization influencing the intervention and the
intervention influencing the organization. For example, the organization is likely to be
constrained by the structure of the agency, staffing, and budget issues, whereas intervention
delivery may be constrained by the common elements that are required to effect change. How and
what changes at each level is an empirical question that can enhance the understanding of DI
processes and mechanisms. Research that addresses and resolves this tension is paramount.

As stated earlier, the added complexity of including multiple levels of change (i.e., individual,
organizational) within a study calls for research methods and design that may stray from the
traditional models or “gold standard” of RCTs. Although it remains important to assess and evaluate
client outcomes, there are several methods to augment traditional RCT designs, as well as
alternative designs (e.g., PCTs). Research on the development of DI-specific methods is sorely
needed. Choosing a specific research design requires consideration of the most effective method
and design to answer the specific research questions, the strengths and weaknesses of each design,
the context of the research, and the available resources. Relying on mixed-method designs may be
optimal given that different levels of inquiry may address various questions within the same
research study. Finally, there are both proximal and distal outcomes that are relevant for DI
research. For example, measuring organizational change or therapist attitude change is a proximal
outcome, whereas improved client outcome is the distal outcome.

Despite the many challenges, DI research has an important place in the field of mental health
services research. The primary goal of DI research is to identify key processes and mechanisms for
change while spreading EBP into community mental health settings (i.e., dissemination), resulting
in uptake and adoption of such new technologies (i.e., implementation) that is sustainable (i.e.,
is maintained). The public policy implications of such empirical inquiry are substantial, given
the unmet mental health needs within the U.S. population. One study found that only 21% of in-need
children received mental health services within a year and that uninsured youth were especially
vulnerable (Kataoka, Zhang, & Wells, 2002).

The public policy implications of DI research suggest that policymakers can, and will, play a key
role in shaping the future of DI efforts. Recently, policymakers have been moving from passive
users of technology to active participants in a process of DI. For example, Illinois policymakers
are insisting that mental health providers implement EBP (e.g., Illinois requires that agencies
receiving grants to implement school-based mental health services utilize EBP). This results in
the formation of a critical relationship between policy and DI efforts and also provides an
opportunity for research and policy to inform one another. Future research can include key
questions with regard to public policy such as: How can researchers engage and capitalize on the
push policymakers are currently making for the use of EBP? How can researchers partner with
policymakers to ensure that efforts are indeed effective and sustainable?

New knowledge is a key feature of all research. DI research can contribute new knowledge both
through an understanding of the support and monitoring structures that are needed to support DI of
effective practices and the natural processes that support DI, such as social networks and key
opinion leaders. Mental health service settings can be transformed with potentially enormous impact
on the public health of the general population.

Notes
1. The term “model” encompasses theories, models, and frameworks in this chapter.
2. We thank Cara Lewis, Katherine Comtois, and Yekaterina Krimer for the guidance they provided in
the measures section of the chapter: they and the Seattle Implementation Resource
Conference are preparing a comprehensive repository of measures and were generous in discussing these measures with us.

References
Aarons, G. (2004). Mental health provider attitudes toward adoption of evidence-based practice: The Evidence-Based Practice Attitude Scale (EBPAS). Mental Health Services Research, 6, 61–74. doi:10.1023/B:MHSR.0000024351.12294.65
Aarons, G., Cafri, G., Lugo, L., & Sawitzky, A. (2010). Expanding the domains of attitudes towards evidence-based practice: The Evidence Based Practice Attitude Scale-50. Administration and Policy in Mental Health and Mental Health Services, 39, 331–340. doi:10.1007/s10488-010-0302-3
Aarons, G., & Sawitzky, A. C. (2006). Organizational culture and climate and mental health provider attitudes toward evidence-based practice. Psychological Services, 3, 61–72. doi:10.1037/1541-1559.3.1.61
Adelman, H. S., & Taylor, L. (2003). On sustainability of project innovations as systemic change. Journal of Educational & Psychological Consultation, 14, 1–25. doi:10.1207/S1532768XJEPC1401_01
Ajzen, I. (1988). Attitudes, personality and behavior. Milton Keynes, England, Open University Press: Chicago, Dorsey Press.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211. doi:10.1016/0749-5978(91)90020-T
Altman, D. G. (1995). Sustaining interventions in community systems: On the relationship between researchers and communities. Health Psychology, 14, 526–536. doi:10.1037/0278-6133.14.6.526
American Psychological Association. (2005). American Psychological Association policy statement on evidence-based practice in psychology. Retrieved from practice/resources/evidence/evidence-based-statement.pdf
Armitage, C. J., & Conner, M. (2001). Efficacy of the Theory of Planned Behaviour: a meta-analytic review. British Journal of Social Psychology, 40, 471–499. Retrieved from http://www.
Atkins, M. S., Frazier, S., & Cappella, E. (2006). Hybrid research models. Natural opportunities for examining mental health in context. Clinical Psychology: Science and Practice, 13, 105–107. doi:10.1111/j.1468-2850.2006.00012.x
Atkins, M. S., Frazier, S. L., Leathers, S. J., Graczyk, P. A., Talbott, E., Jakobsons, L., . . . Bell, C. (2008). Teacher key opinion leaders and mental health consultation in low-income urban schools. Journal of Consulting and Clinical Psychology, 76, 905–908. doi:10.1037/a0013036
Battistich, V., Schaps, E., Watson, M., & Solomon, D. (1996). Prevention effects of the child development project: Early findings from an ongoing multisite demonstration trial. Journal of Adolescent Research, 11, 12–35. doi:10.1177/0743554896111003
Bauman, L. J., Stein, R. E., & Ireys, H. T. (1991). Reinventing fidelity: The transfer of social technology among settings. American Journal of Community Psychology, 19, 619–639. doi:10.1007/BF00937995
Beidas, R. S., & Kendall, P. C. (2010). Training therapists in evidence-based practice: A critical review of studies from a systems-contextual perspective. Clinical Psychology: Science and Practice, 17, 1–30. doi:10.1111/j.1468-2850.2009.01187.x
Blakely, C. H., Mayer, J. P., Gottschalk, R. G., Schmitt, N., Davidson, W. S., Roitman, D. B., & Emshoff, J. G. (1987). The fidelity–adaptation debate: implications for the implementation of public sector social programs. American Journal of Community Psychology, 15, 253–268. doi:10.1007/BF00922697
Borckardt, J. J., Nash, M. R., Murphy, M. D., Moore, M., Shaw, D., & O’Neil, P. (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist, 63, 77–95. doi:10.1037/0003-066X.63.2.77
Borntrager, C. F., Chorpita, B. F., Higa-McMillan, C., & Weisz, J. R. (2009). Provider attitudes toward evidence-based practices: are the concerns with the evidence or with the manuals? Psychiatric Services, 60, 677–681. doi:10.1176/appi.ps.60.5.677
Botvin, G. J., Baker, E., Dusenbury, L., Tortu, S., & Botvin, E. M. (1990). Preventing adolescent drug abuse through a multimodal cognitive-behavioral approach: Results of a 3-year study. Journal of Consulting and Clinical Psychology, 58, 437–446. doi:10.1037//0022-006X.58.4.437
Carroll, K., & Rounsaville, B. (2003). Bridging the gap: A hybrid model to link efficacy and effectiveness research in substance abuse treatment. Psychiatric Services, 54, 333–339. doi:10.1176/
Casper, E. (2007). The theory of planned behavior applied to continuing education for mental health professionals. Psychiatric Services, 58, 1324–1329. doi:10.1176/
Castro, F. G., Barrera, M., Jr., & Martinez, C. R., Jr. (2004). The cultural adaptation of prevention interventions: Resolving tensions between fidelity and fit. Prevention Science, 5, 41–45. doi:10.1023/
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., . . . PROMIS Cooperative Group. (2010). The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194.
Center for Substance Abuse Prevention. (2002 Conference Edition). Finding the Balance: Program Fidelity and Adaptation in Substance Abuse Prevention. Executive Summary of a State-of-the-Art Review. Rockville, MD: U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Center for Substance Abuse Prevention.
Chamberlain, P., Snowden, L. R., Padgett, C., Saldana, L., Roles, J., Holmes, L., . . . Landsverk, J. (2011). A strategy for assessing costs of implementing new practices in the child welfare system: Adapting the English cost calculator in the United States. Administration and Policy in Mental Health and Mental Health Services Research, 38, 24–31. doi:10.1007/s10488-010-0318-8
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. Retrieved from http://www.ncbi.nlm.
Chow, S. C., & Chang, M. (2008). Adaptive design methods in clinical trials – a review. Orphanet Journal of Rare Diseases, 3, 11. doi:10.1186/1750-1172-3-11
Damschroder, L. J. (Feb. 10, 2011). The role and selection of theoretical frameworks in implementation research. Retrieved Feb. 10, 2011. from ers/cyber. . . /eis-021011.pdf
Damschroder, L. J., Aron, D. C., Keith, R. E., Kirsh, S. R., Alexander, J. A., & Lowery, J. C. (2009). Fostering implementation of health
services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science, 4. doi:10.1186/1748-5908-4-50
Datnow, A., & Castellano, M. (2000). Teachers’ responses to success for all: How beliefs, experiences, and adaptations shape implementation. American Educational Research Journal, 37, 775–799. doi:10.2307/1163489
Dattilio, F. M., Edwards, D. J., & Fishman, D. B. (2010). Case studies within a mixed methods paradigm: toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy, 47, 427–441. doi:10.1037/a0021181
Dimeff, L., Koerner, K., Woodcock, E., Beadnell, B., Brown, M., Skutch, J., . . . Harned, M. (2009). Which training method works best? A randomized controlled trial comparing three methods of training clinicians in dialectical behavior therapy skills. Behavior Research and Therapy, 47, 921–930. doi:10.1016/j.brat.2009.07.011
Dingfelder, H. E., & Mandell, D. S. (2010). Bridging the research-to-practice gap in autism intervention: An application of diffusion of innovation theory. Journal of Autism and Developmental Disorders, 41(5), 597–609. doi:10.1007/s10803-010-1081-0.
Durlak, J. A., & DuPre, E. P. (2008). Implementation matters: A review of research on the influence of implementation on program outcomes and the factors affecting implementation. American Journal of Community Psychology, 41, 327–350. doi:10.1007/s10464-008-9165-0
Dusenbury, L., Brannigan, R., Hansen, W. B., Walsh, J., & Falco, M. (2005). Quality of implementation: Developing measures crucial to understanding the diffusion of preventive interventions. Health Education Research, 20, 308–313. doi:10.1093/her/cyg134
Feldstein, A., & Glasgow, R. (2008). A practical, robust implementation and sustainability model (PRISM) for integrating research findings into practice. Joint Commission Journal on Quality and Patient Safety, 34, 228–242.
Finnerty, M., Rapp, C., Bond, G., Lynde, D., Ganju, V., & Goldman, H. (2009). The State Health Authority Yardstick (SHAY). Community Mental Health Journal, 45, 228–236. doi:10.1007/s10597-009-9181-z
Fitzpatrick, R., & Boulton, M. (1996). Qualitative research in healthcare: I. The scope and validity of methods. Journal of Evaluation in Clinical Practice, 2, 123–130.
Fixsen, D. L., Blase, K. A., Naoom, S. F., & Wallace, F. (2009). Core implementation components. Research on Social Work Practice, 19, 531–540. doi:10.1177/1049731509335549
Fixsen, D. L., Naoom, S. F., Blasé, K. A., Friedman, R. M., & Wallace, F. (2005). Implementation research: A synthesis of the literature. Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute. Department of Child & Family Studies. Retrieved from http://nirn.fpg.unc.
Flynn, P., Broome, K., Beaston-Blaakman, A., Knight, D., Horgan, C., & Shepard, D. (2009). Treatment Cost Analysis Tool (TCAT) for estimating costs of outpatient treatment services. Drug and Alcohol Dependence, 100, 47–53. doi:10.1016/j.drugalcdep.2008.08.015
Freedman, B. (1987). Equipoise and the ethics of clinical research. New England Journal of Medicine, 16, 141–145. Retrieved from
Gambrill, E. (2006). Social work practice: A critical thinker’s guide (2nd ed.) New York: Oxford University Press.
Garland, A., Brookman-Frazee, L., Hurlburt, M., Accurso, E., Zoffness, R., Haine-Schlagel, R., . . . Ganger, W. (2010). Mental health care for children with disruptive behavior problems: A view inside therapists’ offices. Psychiatric Services, 61, 788–795. doi:10.1176/
Glasgow, R. E., Vogt, T. M., & Boles, S. M. (1999). Evaluating the public health impact of health promotion interventions: the RE-AIM framework. American Journal of Public Health, 89, 1322–1327. Retrieved from http://www.pubmedcentral. &rendertype=abstract
Glisson, C., Landsverk, J., Schoenwald, S., Kelleher, K., Hoagwood, K., Mayberg, S., . . . The Research Network on Youth Mental Health (2008). Assessing the Organizational Social Context (OSC) of mental health services: Implications for research and practice. Administration and Policy in Mental Health and Mental Health Services Research, 35, 98–113. doi:10.1007/s10488-007-0148-5
Glisson, C., & Schoenwald, S. K. (2005). The ARC organizational and community intervention strategy for implementing evidence-based children’s mental health treatments. Mental Health Services Research, 7, 243–259. doi:10.1007/s11020-005-7456-1
Glisson, C., Schoenwald, S. K., Hemmelgarn, A., Green, P., Dukes, D., Armstrong, K. S., & Chapman, J. E. (2010). Randomized trial of MST and ARC in a two-level evidence-based treatment implementation strategy. Journal of Consulting and Clinical Psychology, 78, 537–550. doi:10.1037/a0019160
Green, L. W., & Glasgow, R. E. (2006). Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions, 29, 126–153. doi:10.1177/0163278705284445
Green, L. W., Ottoson, J. M., Garcia, C., & Hiatt, R. A. (2009). Diffusion theory, and knowledge dissemination, utilization, and integration in public health. Annual Review of Public Health, 30, 151–174. doi:10.1146/annurev.publhealth.031308.100049
Greenhalgh, T., Robert, G., Macfarlane, F., Bate, P., & Kyriakidou, O. (2004). Diffusion of innovations in service organizations: systematic review and recommendations. Milbank Quarterly, 82, 581–629. doi:10.1111/j.0887-378X.2004.00325.x
Grol, R., Bosch, M., Hulscher, M., Eccles, M., & Wensing, M. (2007). Planning and studying improvement in patient care: The use of theoretical perspectives. Milbank Quarterly, 85, 93–138.
Hagedorn, H., & Heideman, P. (2010). The relationship between baseline Organizational Readiness to Change Assessment subscale scores and implementation of hepatitis prevention services in substance use disorders treatment clinics: a case study. Implementation Science, 5. Retrieved from http://www.
Haider, M., & Kreps, G. (2004). Forty years of diffusion of innovations: Utility and value in public health. Journal of Health Communication, 9, 3–11. doi:10.1080/10810730490271430
Hawe, P., Shiell, A., & Riley, T. (2004). Complex interventions: how “out of control” can a randomised controlled trial be? British Medical Journal, 328, 1561–1563. doi:10.1136/bmj.328.7455.1561
Helfrich, C. D., Damschroder, L. J., Hagedorn, H. J., Daggett, G. S., Sahay, A., Ritchie, M., . . . Stetler, C. B. (2010). A critical synthesis of literature on the Promoting Action on
Research Implementation in Health Services (PARIHS) framework. Implementation Science, 82. doi:10.1186/1748-5908-5-82
Helfrich, C. D., Li, Y., Sharp, N., & Sales, A. (2009). Organizational readiness to change assessment: Development of an instrument based on the Promoting Action Research in Health Services (PARiHS) framework. Implementation Science, 4. doi:10.1186/1748-4-38
Hersen, M., & Barlow, D. (1976). Single-case experimental designs: Strategies for studying behavior change. New York: Pergamon Press.
Hill, C., Thompson, B., & Williams, E. (1997). A guide to conducting consensual qualitative research. Counseling Psychologist, 25, 517–572. doi:10.1177/0011000097254001
Hoagwood, K., Burns, B., & Weisz, J. (2002). A profitable conjunction: From science to service in children’s mental health. In B. Burns & K. Hoagwood (Eds.), Community treatment for youth: Evidence-based interventions for severe emotional and behavioral disorders (pp. 328–338). New York: Oxford University Press.
Hollon, S., Evans, D., Auerbach, A., DeRubeis, R., Elkin, I., Lowery, A., . . . Piasecki, J. (1988). Development of a system for rating therapies for depression: Differentiating cognitive therapy, interpersonal therapy, and clinical management pharmacotherapy. Unpublished manuscript, University of Minnesota, Twin Cities Campus.
Hurt, E., Arnold, L. E., & Aman, M. G. (2003). Clinical instruments and scales in pediatric psychopharmacology. In A. Martin, L. Scahill, & C. Kratochvil, (2nd ed.), Pediatric psychopharmacology: Principles and practice (pp. 389–406). New York: Oxford University Press.
Institute of Behavioral Research (2002). Organizational readiness for change. Texas Christian University, Fort Worth, Texas.
Irwin, D., Stucky, B., Langer, M., Thissen, D., DeWitt, E., Jin-Shei, L., . . . DeWalt, D. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19, 595–607. doi:10.1007/s11136-010-9619-3
James, I., Blackburn, I., & Reichelt, F. (2001). Manual for the Revised Cognitive Therapy Scale (CTS-R) (2nd ed.). Unpublished manuscript, available from Ian James, Centre for the Health of the Elderly, Newcastle General Hospital, Westgate Road, Newcastle NE4 6BE.
Jensen-Doss, A., & Hawley, K. (2010). Understanding barriers to evidence-based assessment: Clinician’s attitudes toward standardized assessment tools. Journal of Clinical Child & Adolescent Psychology, 39, 885–896. doi:10.1080/15374416.2010.517169
Jensen-Doss, A., Hawley, K., Lopez, M., & Osterberg, L. (2009). Using evidence-based treatments: The experiences of youth providers working under a mandate. Professional Psychology: Research and Practice, 40, 417–424. doi:10.1037/a0014690
Kataoka, S., Zhang, L., & Wells, K. (2002). Unmet need for mental health care among US children: Variation by ethnicity and insurance status. American Journal of Psychiatry, 159, 1548–1555. doi:10.1176/appi.ajp.159.9.1548
Kauth, M., Sullivan, G., Blevins, D., Cully, J., Landes, R., Said, Q., & Teasdale, T. (2010). Employing external facilitation to implement cognitive behavioral therapy in VA clinics: a pilot study. Implementation Science, 5. Retrieved from http://www.
Kendall, P. C., & Beidas, R. (2007). Smoothing the trail for dissemination of evidence-based practices for youth: Flexibility within fidelity. Professional Psychology: Research and Practice, 38, 13–20.
Kendall, P. C., & Chambless, D. (Eds.) (1998). Empirically supported psychological therapies, Journal of Consulting and Clinical Psychology, 66, entire issue.
Kendall, P. C., & Comer, J. S. (2011). Research methods in clinical psychology. In D. Barlow (Ed.), Oxford handbook of clinical psychology (pp. 52–75). New York: Oxford University Press.
Kendall, P. C., Gosch, E., Furr, J., & Sood, E. (2008). Flexibility within fidelity. Journal of the American Academy of Child and Adolescent Psychiatry, 47, 987–993.
Kitson, A., Harvey, G., & McCormack, B. (1998). Enabling the implementation of evidence based practice: a conceptual framework. Quality in Health Care, 7, 149–158. Retrieved from rtid=2483604&tool=pmcentrez&rendertype=abstract
Kitson, A. L., Rycroft-Malone, J., Harvey, G., McCormack, B., Seers, K., & Titchen, A. (2008). Evaluating the successful implementation of evidence into practice using the PARiHS framework: theoretical and practical challenges. Implementation Science, 3. doi:10.1186/1748-5908-3-1.
Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15, 124–144. doi:10.1037/a0017736
Landsverk, J., Brown, C. H., Rolls Reutz, J., Palinkas, L., & Horwitz, S. M. (2010). Design elements in implementation research: A structured review of child welfare and child mental health studies. Administration and Policy in Mental Health and Mental Health Services Research, 38, 54–63. doi:10.1007/s10488-010-0315-y
Larsen, T., & Samdal, O. (2007). Implementing second step: Balancing fidelity and program adaptation. Journal of Educational & Psychological Consultation, 17, 1–29. doi:10.1207/s1532768Xjepc1701_1
Latham, M., Myles, P., & Ricketts, T. (2003). Development of a multiple choice questionnaire to measure changes in knowledge during CBT training. Open paper. British Association of Behavioural and Cognitive Psychotherapy Annual Conference, York University.
Lavori, P. W., Rush, A. J., Wisniewski, S. R., Alpert, J., Fava, M., Kupfer, D. J., . . . Trivedi, M. (2001). Strengthening clinical effectiveness trials: equipoise-stratified randomization. Biological Psychiatry, 50, 792–801. Retrieved from http://
Lehman, W., Greener, J., & Simpson, D. (2002). Assessing organizational readiness for change. Journal of Substance Abuse Treatment, 22, 197–209. doi:10.1016/S0740-5472(02)00233-7
Lewin, J., Lall, V., & Kratochwill, T. (2011). Extensions of a versatile randomization test for assessing single-case intervention effects. Journal of School Psychology, 49, 55–79. doi:10.1016/j.jsp.2010.09.002
Lewis, C., Comtois, K., & Krimer, Y. (2011, March). A comprehensive review of dissemination and implementation science instruments. Poster presented at the annual meeting of the National Institutes of Health Dissemination and Implementation Conference, Bethesda, MD.
Lomas, J. (1993). Diffusion, dissemination and implementation: Who should do what? Annals of the New York Academy of Sciences, 703, 226–237. doi:10.1111/j.1749-6632.1993.tb26351.x

84 d i sse m i n atio n and imp l ementation sc i enc e

Lopez, M., Osterberg, L., Jensen-Doss, A., & Rae, W. (2011). Skills Code. Behavioral and Cognitive Psychotherapy, 31,
Effects of workshop training for providers under mandated 177–184. doi:10.1017/S1352465803002054
use of evidence-based treatment. Administration and Policy Myles, P. J., Latham, M., & Ricketts, T. (2003). The contribu-
in Mental Health and Mental Health Services Research. 38(4), tions of an expert panel in the development of a new measure
301–312. doi: 10.1007/s10488-010-0326-8 of knowledge for the evaluation of training in cognitive behav-
Lovejoy, T. I., Demireva, P. D., Grayson, J. L., & McNamara, J. ioural therapy. Open paper. British Association of Behavioural
R. (2009). Advancing the practice of online psychotherapy: and Cognitive Psychotherapies Annual Conference, York
An application of Rogers’ diffusion of innovations theory. University.
Psychotherapy: Theory, Research, Practice, Training, 46, 112– O’Connor, C., Small, S. A., & Cooney, S. M. (2007). Program
124. doi: 10.1037/a0015153 fidelity and adaptation: Meeting local needs without com-
Luce, B., Kramer, J., Gooman, S., Connor, J., Tunis, S., Whicher, promising program effectiveness. What Works, Wisconsin
D., & Schwartz, J. (2009). Rethinking randomized clinical Research to Practice Series, 4. Madison, WI: University of
trials for comparative effectiveness research: The need for Wisconsin–Madison/Extension.
transformational change. Annals of Internal Medicine, 151, Ogles, B., Lunnen, K., Gillespie, D., & Trout, C. (1996).
206–209. Conceptualization and initial development of the Ohio
Macpherson, H. (2004). Pragmatic clinical trials. Complementary Scales. In C. Liberton,, K. Kutash,, & R. Friendman (Eds.),
Therapies in Medicine, 12, 136–40. doi: 10.1016/j. The 8th Annual Research Conference Proceedings, A system of
ctim.2004.07.043 care for children’s mental health: Expanding the research base
March, J. (May 2011). Twenty years of comparative treatment tri- (pp. 33–37). Tampa, FL: University of South Florida, Florida
als in pediatric psychiatry. University of Illinois at Chicago Mental Health Institute, Research and Training Center for
Grand Rounds, Chicago, Illinois. Children’s Mental Health.
March, J., Silva, S., Compton, S., Shapiro, M., Califf, R., & Okamura, K., Nakamura, B., Higa McMillan, C., Mueller, C.,
Krishnan, R. (2005). The case for practical clinical trials in & Hayashi, K. (2010, November). Psychometric evaluation
psychiatry. American Journal of Psychiatry, 162, 836–846. of the Knowledge of Evidence-Based Questionnaire in a com-
doi:10.1176/appi.ajp.162.5.836 munity sample of mental health therapists. Poster presented
Mays, N., & Pope, C. (2000). Assessing quality in qualitative at the annual meeting of the Association for Behavioral and
research. BMJ, 320, 50–52. Retrieved from http://www.bmj. Cognitive Therapies, San Francisco, CA.
com/content/320/7226/50.1.extract Ozer, E. J., Wanis, M. G., & Bazell, N. (2010). Diffusion of
McGraw, S. A., Sellers, D. E., Stone, E. J., & Bebchuk, J. school-based prevention programs in two urban districts:
(1996). Using process data to explain outcomes: An illus- Adaptations, rationales, and suggestions for change. Prevention
tration from the child and adolescent trial for cardiovas- Science, 11, 42–55. doi:10.1007/s11121-009-0148-7
cular health (CATCH). Evaluation Review, 20, 291–312. Palinkas, L., Aarons, G., Horwitz, S., Chamberlain, P., Hurlburt,
doi:10.1177/0193841X9602000304 M., & Landsverk, J. (2011). Mixed method designs in imple-
McHugh, R. K., & Barlow, D. H. (2010). The dissemination mentation research. Administration and Policy in Mental
and implementation of evidence-based psychological treat- Health and Mental Health Services Research, 38(1), 44–53.
ments: A review of current efforts. American Psychologist, 65, doi: 10.1007/s10488-010-0314-z
73–84. doi: 10.1037/a0018121 Palinkas, L., Schoenwald, S. K., Hoagwood, K., Landsverk, J.,
McLeod, B. (2001). The therapy process observational coding system Chorpita, B. F., & Weisz, J. R. (2008). An ethnographic
for child psychotherapy. Unpublished manuscript, University study of implementation of evidence-based treatments in
of California, Los Angeles. child mental health: First steps. Psychiatric Services, 59, 738–
McLeod, B. D., & Weisz, J. R. (2010). The therapy process 746. doi: 10.1176/
observational coding system for child psychotherapy-strate- Perpepletchikova, F., & Kazdin, A. (2005). Treatment integrity
gies scale. Journal of Clinical Child and Adolescent Psychology and therapeutic change: Issues and research recommenda-
39, 436–443. doi: 10.1080/15374411003691750 tions. Clinical Psychology: Science and Practice, 12, 365–383.
Meyer, G. (2004). Diffusion methodology: Time to inno- doi:10.1093/clipsy/bpi045
vate? Journal of Health Communication, 9, 59–69. doi: Pluye, P., Potvin, L., & Denis, J-L. (2004). Making public health
10.1080/10810730490271539 programs last: Conceptualizing sustainability. Evaluation
Miller, W., Yahne, C., Moyers, T., Martinez, J., & Pirritano, M. and Program Planning, 27, 121–133. doi:10.1016/j.
(2004). A randomized trial of methods to help clinicians evalprogplan.2004.01.001
learn motivational interviewing. Journal of Consulting and Proctor, E., Silmere, H., Raghavan, R., Hovmand, P., Aarons, G.,
Clinical Psychology, 72, 1050–1062. doi:10.1037/0022- Bunger, A.,. . . Hensley, M. (2011). Outcomes for implemen-
006X.72.6.1050 tation research: Conceptual distinctions, measurement chal-
Moore, G., & Benbasat, I. (1991). Development of an instru- lenges, and research agenda. Administration and Policy in
ment to measure the perceptions of adopting an informa- Mental Health and Mental Health Services Research, 38(2),
tion technology innovation. Information Systems Research, 2, 65–76. doi: 10.1007/s10488-010-0319-7
192–222. doi:10.1287/isre.2.3.192 Proctor, E. K., Landsverk, J., Aarons, G., Chambers, D., Glisson,
Morgenstern, J., & McKay, J. R. (2007). Rethinking the C., & Mittman, B. (2009). Implementation research in
paradigms that inform behavioral treatment research for mental health services: an emerging science with conceptual,
substance use disorders. Addiction, 102, 1377−1389. methodological, and training challenges. Administration and
doi:10.1111/j.1360-0443.2007.01882.x Policy in Mental Health and Mental Health Services Research,
Moyers, T., Martin, T., Catley, D., Harris, K., & Ahluwalia, J. 36(1), 24–34. doi: 10.1007/s10488-008-0197-4
(2003). Assessing the integrity of motivational interviewing Rakovshik, S. G., & McManus, F. (2010). Establishing evidence-
interventions: Reliability of the Motivational Interviewing based training in cognitive behavioral therapy: A review

beid as, meh ta, at k i ns, solomon, mer z 85

of current empirical findings and theoretical guidance. Stetler, C. (2001). Updating the Stetler Model of research utiliza-
Clinical Psychology Review, 30, 496–516. doi: 10.1016/j. tion to facilitate evidence-based practice. Nursing Outlook,
cpr.2010.03.004 49, 272–279. doi:10.1067/mno.2001.120517
Ringwalt, C., Ennett, S. T., Vincus, A., & Simons-Rudolph, A. Stewart, R. E., & Chambless, D. L. (2010). Interesting practitio-
(2004). Students’ special needs and problems as reason for ners in training in empirically supported treatments: Research
the adaptation of substance abuse prevention curricula in reviews versus case studies. Journal of Clinical Psychology, 66,
the nation’s middle schools. Prevention Science, 5, 197–206. 73–95. doi: 10.1002/jclp
doi:10.1023/B:PREV.0000037642.40783.95 Stirman, S., DeRubeis, R., Crits-Christoph, P., & Brody, P.
Rogers, E. (1995). Diffusion of innovations. New York: The Free (2003). Are samples in randomized controlled trials of psy-
Press. chotherapy representative of community outpatients? A
Rogers, E. (2004). A prospective and retrospective look at the new methodology and initial findings. Journal of Consulting
diffusion model. Journal of Health Communication, 9, 13–19. and Clinical Psychology, 71, 963–972. doi: 10.1037/0022-
doi: 10.1080/10810730490271449 006X.71.6.963
Rohrbach, L. A., Graham, J. W., & Hansen, W. B. (1993). Stumpf, R. E., Higa-McMillan, C. K., & Chorpita, B. F. (2009).
Diffusion of a school-based substance abuse prevention Implementation of evidence-based services for youth: assess-
program: predictors of program implementation. Preventive ing provider knowledge. Behavior Modification, 33, 48–65.
Medicine, 22, 237–260. doi:10.1006/pmed.1993.1020 doi: 10.1177/0145445508322625.
Rush, A. (2001). Sequenced Treatment Alternatives to Relieve Torrey, W., Finnerty, M., Evans, A., & Wyzik, P. (2003).
Depression (STAR*D). In: Syllabus and Proceedings Strategies for leading the implementation of evidence-based
Summary, American Psychiatric Association 154th Annual practices. Psychiatric Clinics of North America, 26, 883–897.
Meeting, New Orleans, LA, May 5–10, p. 182. doi: 10.1016/S0193-953X(03)00067-4
Scheirer, M. A. (2005). Is sustainability possible? A review Trent, L., Buchanan, E., & Young, J. (2010, November).
and commentary on empirical studies of program sus- Development and initial psychometric evaluation of the Measure
tainability. American Journal of Evaluation, 26, 320–347. of Disseminability (MOD). Poster presented at the annual
doi:10.1177/1098214005278752 meeting of the Association for Behavioral and Cognitive
Schoenwald, S. K., & Hoagwood, K. (2001). Effectiveness, trans- Therapies, San Francisco, CA.
portability, and dissemination of interventions: What mat- Tunis, S. R., Stryer, D. B., & Clancy, C. M. (2003). Practical
ters when? Psychiatric Services, 52, 1190–1197. doi:10.1176/ clinical trials: increasing the value of clinical research for decision making in clinical and health policy. Journal of
Schoenwald, S. K., Hoagwood, K. E., Atkins, M. S., Evans, the American Medical Association, 290, 1624–1632. doi:
M. E., & Ringeisen, H. (2010). Workforce develop- 10.1001/jama.290.12.1624
ment and the organization of work: The science we need. Valente, T. W., & Davis, R. L. (1999). Accelerating the diffusion
Administration and Policy in Mental Health and Mental of innovations using opinion leaders. Annals of the American
Health Services Research, 37, 71–80. doi:10.1007/s10488- Academy of Political and Social Science, 566, 55–67. doi:
010-0278-z 10.1177/0002716299566001005
Shediac-Rizkallah, M. C., & Bone, L. R. (1998). Planning for Webb, C., DeRubeis, R., & Barber, J. (2010). Clinician adher-
the sustainability of community-based health programs: ence/competence and treatment outcome: A meta-analytic
Conceptual frameworks and future directions for research, review. Journal of Consulting and Clinical Psychology, 78,
practice and policy. Health Education Research, 13, 87–108. 200–211. doi:10.1037/a0018912
doi:10.1093/her/13.1.87 Weisz, J. R., Donenberg, G. R., Han, S. S., & Kauneckis, D.
Sholomskas, D., Syracuse-Siewert, G., Rounsaville, B., Ball, (1995). Child and adolescent psychotherapy outcomes
S., Nuro, K., & Carroll, K. (2005). We don’t train in vain: in experiments versus clinics: Why the disparity? Journal
A dissemination trial of three strategies of training clini- of Abnormal Child Psychology, 23, 83–106. doi:10.1007/
cians in cognitive-behavioral therapy. Journal of Consulting BF01447046
and Clinical Psychology, 73, 106–115. doi:10.1037/0022- West, S., Duan, N., Pequegnat, W., Gaist, P., Des Jarlais, D.,
006X.73.1.106 Holtgrave, D.,. . . Mullen, D. (2008). Alternatives to the ran-
Song, M., & Herman, R. (2010). Critical issues and common domized controlled trial. American Journal of Public Health,
pitfalls in designing and conducting impact studies in educa- 98, 1359–1366. doi:10.2105/AJPH.2007.124446
tion: Lessons learned from the What Works Clearinghouse Young, J., & Beck, A. (1980). The development of the Cognitive
(Phase I). Educational Evaluation and Policy Analysis, 32, Therapy Scale. Unpublished manuscript, University of
351–371. doi: 10.3102/0162373710373389 Pennsylvania, Philadelphia, PA.

86 d i sse m i natio n and imp l ementatio n sc i enc e


6 Virtual Environments in Clinical Psychology Research

Nina Wong and Deborah C. Beidel

Virtual environments (VEs) represent a new and powerful medium through which an individual can
become immersed in an “alternative reality.” Applications of these environments are increasingly
used to treat psychological disorders such as anxiety and autism spectrum disorders. We review the
available literature regarding the use of VEs to treat these and other clinical disorders, highlighting
both what we know and what we need to learn. We end with suggestions for integrating VE into
future research endeavors.
Key Words: Virtual environments, virtual reality, virtual exposure, treatment, clinical research

Since the introduction of computer-generated graphics in the early 1960s, the development and utilization of virtual environments (VEs) have flourished. Virtual reality technology was initially developed through investments by the federal government, military research, and NASA (Wiederhold & Wiederhold, 2005) and has been increasingly used for architecture and design, visualization of scientific models, education and training, entertainment, and medicine over the past two decades (Blade & Padgett, 2002).

Virtual reality (VR) is defined by the integration of computer graphics, body tracking devices, visual displays, and sensory input devices in real time to synthesize a three-dimensional computer-generated environment. The most commonly used approach to facilitate VR involves a head-mounted display (HMD) consisting of a visor with separate visual display screens for each eye. Perception of the actual surrounding environment is blocked by focusing on the visor’s VE display screens. A body tracking device is connected to the HMD and matches the patient’s VE to real-life head movements (i.e., if the patient turns to the right, the right side of the environment is displayed). In addition to the visual images, auditory, tactile, and olfactory stimuli are often included to increase patients’ immersion in the VE. In essence, a virtual environment is simulated and can be controlled or deliberately manipulated such that individuals immerse themselves into lifelike experiences with surprising authenticity.

Although psychotherapy and clinical psychology research have only recently taken advantage of such technology (see Glantz, Rizzo, & Graap, 2003; Wiederhold & Wiederhold, 2005), the use of VR and computer-based therapies ranked 3rd and 5th, respectively, of 38 interventions predicted to increase by 2012 (Norcross, Hedges, & Prochaska, 2002). The growing enthusiasm for VE applications in clinical psychology merits a review of the current literature and suggestions for future research. This chapter begins by highlighting findings of VE technology in clinical psychology, organized by diagnostically relevant categories, and concludes by discussing the limitations of VEs in clinical psychology research and directions for integrating VEs into future research paradigms.
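As a rough illustration of the head-tracking behavior described in this introduction (turning the head to the right brings the right side of the environment into view), the following minimal Python sketch may be helpful. It is not drawn from any particular VR system. The function names, the 90-degree horizontal field of view, and the use of compass-style headings in degrees are all assumptions made for illustration, and real HMD software tracks full three-dimensional head orientation rather than horizontal rotation alone.

```python
from typing import Tuple


def visible_heading_range(head_yaw_deg: float, fov_deg: float = 90.0) -> Tuple[float, float]:
    """Return the (left, right) edges, in degrees within [0, 360), of the
    horizontal slice of the scene shown for a given head yaw.
    The 90-degree default field of view is an assumed value."""
    half = fov_deg / 2.0
    return ((head_yaw_deg - half) % 360.0, (head_yaw_deg + half) % 360.0)


def is_visible(object_heading_deg: float, head_yaw_deg: float, fov_deg: float = 90.0) -> bool:
    """Report whether an object at the given heading falls inside the view."""
    # Signed angular offset of the object from the gaze direction, in [-180, 180).
    offset = (object_heading_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2.0


# Hypothetical scene: a virtual audience member seated 90 degrees to the patient's right.
audience_heading = 90.0
print(is_visible(audience_heading, head_yaw_deg=0.0))   # patient facing forward
print(is_visible(audience_heading, head_yaw_deg=90.0))  # after turning to the right
```

In this sketch the audience member is outside the displayed slice while the patient faces forward and enters it once the tracked head yaw reaches 90 degrees, which is the sense in which the display is matched to head movement.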

Virtual Environments for Anxiety Disorders

Efficacious behavioral treatment for anxiety disorders involves a systematic exposure to situations and stimuli that evoke fear. With repeated and prolonged exposure, the patient’s anxiety responses gradually diminish through a process of habituation (Wolpe, 1958). Exposure-based treatments for anxiety disorders are well established and certainly the gold standard (Barlow, 2002; Chambless & Ollendick, 2001; Deacon & Abramowitz, 2004). Typical exposure modalities include imaginal or in vivo presentation of the feared stimulus. When patients have difficulty with imaginal exposure or the feared situations cannot be recreated in vivo, VEs may serve as a clinical tool to enhance stimulus presentation.

There has been a rapidly growing interest in the application of VR to treat anxiety disorders, with the appearance of a number of recent qualitative literature reviews (i.e., Anderson, Jacobs, & Rothbaum, 2004; Bush, 2008; Coelho, Waters, Hine, & Wallis, 2009; Gerardi, Cukor, Difede, Rizzo, & Rothbaum, 2010; Krijn, Emmelkamp, Ólafsson, & Biemond, 2004; Meyerbröker & Emmelkamp, 2010; Pull, 2005; Rothbaum & Hodges, 1999). Although the majority of studies focused on the treatment of specific phobias, such as aviophobia (fear of flying) and acrophobia (fear of heights), other studies examined the use of virtual reality exposure therapy (VRET) for social phobia or social anxiety, posttraumatic stress disorder (PTSD), and panic disorder with or without agoraphobia. Collectively, VEs for the treatment of anxiety disorders have demonstrated promising results in case, comparative, and randomized controlled studies. Two meta-analytic reviews (Parsons & Rizzo, 2008; Powers & Emmelkamp, 2008) found large effect sizes for VRET, indicating that this intervention is highly effective in reducing anxiety and phobia symptoms. In the first meta-analysis (Parsons & Rizzo, 2008), the average reduction in overall anxiety was 0.95 standard deviations, with the smallest effect size for PTSD (0.87) and largest for panic disorder with agoraphobia (1.79). In the second meta-analysis (Powers & Emmelkamp, 2008), VRET demonstrated a large overall mean effect size (Cohen’s d = 1.11) relative to control conditions, and the effect was consistent across general measures of distress, as well as cognitive, behavioral, and psychophysiology measures. To inform future clinical research efforts incorporating VE technology in anxiety treatments, we next briefly review the utility of VR for the various anxiety disorders.

Specific Phobias

VEs have been used to successfully treat specific phobias, including aviophobia, acrophobia, and arachnophobia (fear of spiders). In particular, one of the most extensively utilized VEs was developed to treat fear of flying (Da Costa, Sardinha, & Nardi, 2008; Klein, 2000; Price, Anderson, & Rothbaum, 2008). Patients can be exposed to various flight experiences such as taxiing, takeoff, flight, and landing under calm and turbulent weather conditions through a virtual airplane environment. Speakers emitting low-frequency sound waves are built into a platform on which the patient sits and feels the vibrations associated with takeoff. With respect to efficacy, a randomized clinical trial (Rothbaum, Hodges, Smith, Lee, & Price, 2000) found that individuals in the VR exposure and standard in vivo exposure conditions reported a significant decrease in fear from pretreatment to posttreatment, whereas the waitlist control group did not show a significant change. Participants who received VRET or standard in vivo exposure maintained treatment gains at the 1-year follow-up (Rothbaum, Hodges, Anderson, Price, & Smith, 2002) and did not have a significant increase in fear of flying even after the September 11th attacks (Anderson et al., 2006). A second randomized controlled trial (Rothbaum et al., 2006) replicated and extended the study by adding a posttreatment flight as a behavioral outcome measure. In addition to decreases in self-reported measures of anxiety from pretreatment to posttreatment for both the standard in vivo and VRET groups, 76 percent of both treatment groups completed the posttreatment flight relative to 20 percent of the waitlist group, and both groups maintained treatment gains at the 6- and 12-month follow-ups (Rothbaum et al., 2006). Only one study directly compared VRET to computer-assisted psychotherapy for fear of flying (Tortella-Feliu et al., 2011). All three treatment conditions (VRET, computer-based treatment with therapist, and computer-based treatment without therapist) showed large within-group effect sizes and were equally effective at significantly reducing fear of flying at posttreatment and 1-year follow-up measures. Collectively, there appear to be substantial data to support the use of VE for fear of flying.

VE treatment shows preliminary promise for the treatment of other specific phobias. Two small studies used virtual environments to treat acrophobia. Glass elevators, footbridges, and outdoor balconies at varying heights are simulated in the VE. On average, participants treated with VR for height


phobia experienced decreases in symptoms at posttreatment relative to the waitlist group (Rothbaum et al., 1995). Another study (Coelho, Silva, Santos, Tichon, & Wallis, 2008) compared the effects of three VRET or in vivo exposure sessions for height phobia. Five patients received in vivo exposure to an eight-story hotel, and ten patients received exposure therapy to a virtual hotel. Both groups showed decreased anxiety and avoidance symptoms; however, patients appeared to habituate more quickly in the VE. In a treatment study for spider phobia, 83 percent of patients who received four 1-hour VR treatment sessions showed statistically and clinically significant improvement at posttreatment, relative to no changes in a waitlist control group (Garcia-Palacios, Hoffman, Carlin, Furness, & Botella, 2002). With regard to other specific phobias, a case study on the efficacy of VRET to treat driving phobia suggested that patients may no longer meet diagnostic criteria after treatment, and its authors recommended using VRET to lower anxiety to the point of beginning in vivo exposure therapy (Wald & Taylor, 2003). In one small study, six patients with claustrophobia were treated with a multicomponent therapy program that included four sessions of VR (Malbos, Mestre, Note, & Gellato, 2008). At follow-up, their treatment gains generalized to other settings (i.e., using the elevator alone). Based on these collective findings, VRET appears to be an effective tool for the treatment of specific phobias. Interestingly, 76 percent of people with specific phobia or social phobia reported a preference for VRET over in vivo exposure (Garcia-Palacios, Botella, Hoffman, & Fabregat, 2007).

Social Phobia/Fear of Public Speaking

To date, VR has been successfully used to treat specific phobias in particular situations with powerful physical cues (e.g., distance cues for heights or strong vibrations and loud noises for flying). However, much more so than for specific phobias, therapists face significant challenges finding appropriate in vivo exposure contexts for individuals with social phobia (SP), which is characterized by a pattern of excessive fear of social situations or performances in which an individual may be scrutinized by others (American Psychiatric Association, 2000). The third most common mental health disorder in the United States, SP affects 8 percent of all youth and its prevalence ranges from 5 to 8 percent in the general population (Beidel & Turner, 2007).

Common distressful situations for people with SP include public speaking, meeting new people, and speaking during meetings or class (Ruscio et al., 2008; Turner, Beidel, & Townsley, 1992). However, a significant barrier to the treatment of SP lies in the difficulty of recruiting audience members to create in vivo social or public-speaking situations (Olfson et al., 2000). Thus, VEs have been recently developed as a possible alternative context for exposure therapy for social phobia (Klinger et al., 2003; Roy et al., 2003). VEs for social phobia, such as a virtual auditorium or conference room, primarily target public-speaking fears. Given that the hallmark of social phobia is a fear of negative evaluation by other people, the ability of the VE to elicit that fear is necessary for VR to work. Thus, the environment and the people in the environment have to feel realistic; cartoonish-looking avatars might not provide an environment reminiscent of the individual’s actual fear. Fortunately, heightened physiological responses appear to be elicited in healthy controls in a VR speech task (Kotlyar et al., 2008). However, that same study observed similar increases in diastolic blood pressure, systolic blood pressure, and heart rate in participants completing an in vivo math task. These physiological findings should be interpreted as preliminary given that the VR speech task and in vivo math task may not be directly comparable, even though both may elicit substantial subjective and physiological distress.

A number of small studies investigated the efficacy of VRET for the nongeneralized subtype of social phobia, namely public-speaking fears. North and colleagues (1998) compared VRET to a control condition for participants with public-speaking phobia. The VRET condition was a large virtual audience, and the control condition was an unrelated neutral VE. The six participants who completed the VRET reported improvement in their attitude toward public speaking and subjective units of distress, while those in the control condition showed no change (North, North, & Coble, 1998). Improvement was also reported in a single-case study where VR public-speaking exposure therapy was part of a larger cognitive-behavioral therapy (CBT) program for two women diagnosed with social phobia (Anderson, Rothbaum, & Hodges, 2003). Findings indicate that at posttreatment, patients’ subjective rating of anxiety during the task and anxiety symptoms as measured by the Personal Report of Confidence as a Public Speaker Questionnaire (PRCS) decreased. Similar results were reported for an 8-week manualized treatment that included VRE with CBT (Anderson, Zimand, Hodges, & Rothbaum, 2005). For the ten

participants diagnosed with social phobia or panic disorder with a primary fear of public speaking, self-reported measures of anxiety and public-speaking fears decreased after treatment (Anderson et al., 2005). In addition, participants reported feeling satisfied by the treatment, and treatment gains were maintained at 3-month follow-up. However, since there were no control groups and VR was not tested in isolation in the two studies by Anderson and colleagues (2003, 2005), the efficacy of VR exposure alone cannot be confirmed at this time.

Several studies further examined public-speaking anxiety in students without a clinical diagnosis of SP (Harris, Kemmerling, & North, 2002; Lister, Piercey, & Joordens, 2010; Wallach, Safir, & Bar-Zvi, 2009). Relative to no treatment, VRET was more effective at reducing public-speaking fears among a small sample of undergraduate students who reported high levels of public-speaking fears (Harris et al., 2002). In that study, Harris and colleagues (2002) surveyed an introductory public-speaking class at a large university and recruited participants based on a PRCS cutoff score of more than 16. Eight participants in the VRET condition received exposure to a virtual auditorium, and six participants were in the waitlist control condition. At posttest, self-reported levels of confidence as a speaker were significantly different between the VRET condition and waitlist controls. Also using university students with public-speaking anxiety, a larger randomized clinical trial examined VR CBT as an alternative to CBT for public-speaking anxiety in an Israeli sample (Wallach et al., 2009). Eighty-eight participants with public-speaking anxiety were randomly assigned to VR CBT, CBT, or waitlist control. Relative to waitlist controls, both the VR CBT and CBT groups reported significantly lower anxiety on social anxiety and public-speaking questionnaires and lower anxiety during a 10-minute behavioral speech performance task (Wallach et al., 2009). Although that investigation was the only study to include a behavioral performance task, no significant differences were found on observer ratings of anxiety between the two treatment groups. While VR CBT was not superior to CBT in that study, twice as many participants dropped out from CBT as from VR CBT (Wallach et al., 2009).

Additional studies examined whether changes in the virtual environment influenced public-speaking anxiety in students without a clinical diagnosis of SP (Pertaub, Slater, & Barker, 2002; Slater, Pertaub, Barker, & Clark, 2006). Pertaub and colleagues (2002) recruited university students and staff to give a speech to either a neutral, positive, or negative virtual audience. In that study, all three conditions elicited anxiety in participants with elevated PRCS scores. All participants reported feeling anxious when giving a speech to the negative audience regardless of their PRCS scores (Pertaub et al., 2002). Slater and colleagues (2006) later replicated the study by Pertaub and colleagues (2002) by comparing participants’ physiological responses and subjective levels of distress when speaking to an empty VR auditorium or a VR audience. Participants were classified by PRCS scores as either confident speakers (PRCS ≤ 10) or phobic speakers (PRCS ≥ 20). Unlike confident speakers, phobic speakers reported higher levels of anxiety and somatic responses in both the empty VR auditorium and VR audience conditions. Physiologically, the phobic speakers showed a decreasing trend in heart rate when speaking to an empty VR room compared to those speaking to an audience, and increased arousal when the VR audience was present (Slater et al., 2006). These two studies suggest that participants respond to contextual changes in the VE.

Collectively, the above studies indicate that VEs elicit a response for participants with public-speaking fears. Even a three-dimensional video of a virtual audience presented on a standard CRT television elicits a performance fear response (Lister et al., 2010). In that study, Lister and colleagues (2010) similarly recruited undergraduate students from a general introductory psychology course, based on a PRCS cutoff score of more than 21. Nine participants in the VR condition stood behind a podium in front of the TV and were exposed to the virtual audience through the use of polarized shutter glasses. The speakers initially stood in front of the virtual audience when the audience paid no attention (2 minutes) and subsequently read a text while the audience paid attention (additional 2 minutes). Results showed that skin conductance and heart-rate measures increased after performing the public-speaking task to the VR audience, and subjective ratings of anxiety and negative self-beliefs about public-speaking ability decreased (Lister et al., 2010).

Despite the extant literature on VEs for public-speaking anxiety, VRET for the generalized subtype of social phobia remains much less studied. Currently, only one controlled study in France (Klinger et al., 2005) used VEs to target symptoms other than public-speaking fears. In that study, 36 participants clinically diagnosed with social phobia were assigned to either 12 sessions of VRET


or group CBT. Klinger and colleagues (2005) used exposure VEs that replicated a variety of social situations, including public speaking, short interpersonal interaction (e.g., small talk at a dinner conversation), assertiveness (e.g., having a viewpoint challenged), and evaluation (e.g., completing a task while being observed). After treatment, participants in both conditions showed increases on global measures of functioning and reported decreased symptoms of social phobia, and the efficacy of VRET was not statistically different from that of the control condition (traditional group CBT) based on effect size comparisons (Klinger et al., 2005). However, these findings should be interpreted cautiously because that study did not include a third condition such as placebo or waitlist control. Future research on VRET for the generalized subtype of social phobia is needed.

Posttraumatic Stress Disorder

U.S. military deployment to Operation Iraqi Freedom/Operation Enduring Freedom has been extensive, and up to 18.5 percent of returning veterans are diagnosed with posttraumatic stress disorder (PTSD) (Hoge, Auchterlonie, & Milliken, 2006; Tanielian & Jaycox, 2008; Smith et al., 2008). Although veterans with PTSD are often reluctant to seek mental health services (Hoge et al., 2004), they tend to be higher users of medical care services (Kessler, 2000; Solomon & Davidson, 1997), and a majority (>90 percent) also seek disability compensation for debilitating occupational impairment. Unlike individuals with other anxiety disorders, which are characterized by anticipatory fear, individuals with PTSD experience anxiety related to previous traumatic events that actually happened. Positive symptoms of PTSD include intrusive thoughts, re-experiencing, hyperarousal, and avoidance. Behavioral treatment for PTSD often relies on imaginal exposure because the traumatic events may be difficult or unethical to recreate. Recently, VEs have been developed to augment imaginal exposure for combat-related PTSD with sensory cues such as visual, auditory, olfactory, and haptic stimuli (Cukor, Spitalnick, Difede, Rizzo, & Rothbaum, 2009). For example, at least one VR technology can recreate 13 scents from Middle Eastern wartime settings, including burned rubber, gunpowder, and body odor (Rizzo et al., 2010).

A few case studies examined the efficacy of VRET for combat-related PTSD (Reger & Gahm, 2008; Wood et al., 2007). Reger and Gahm (2008) treated one active-duty U.S. military soldier diagnosed with combat-related PTSD. The treatment consisted of six 90-minute VRET sessions over the course of 4 weeks as part of a larger treatment protocol including psychoeducation, relaxation training, and in vivo exposure. Although his score on a self-report PTSD inventory decreased at posttreatment, and gains were maintained at 7-week follow-up, the specific effects of VRET cannot be isolated from the multicomponent treatment. Similar findings were reported by Wood and colleagues (2007): one veteran reported decreases in symptoms of combat-related PTSD and physiological arousal following VRET. Although the ability to generalize findings from these two case studies is limited, a pilot study found that 12 men with PTSD reported decreased levels of PTSD and depression after 20 sessions of VRET, and 75 percent of participants no longer met criteria for PTSD (Wood et al., 2009). Furthermore, a larger study (Reger et al., 2011) examined the effectiveness of VRET for 24 active-duty soldiers who sought treatment following a deployment to Iraq or Afghanistan. Patients reported a significant reduction of PTSD symptoms as measured by the PTSD Checklist (Military version) at posttreatment.

VR has also been used to treat civilian-related PTSD. Two small studies by one research group (Difede, Cukor, Patt, Giosan, & Hoffman, 2006; Difede et al., 2007) investigated the utility of VRET for civilian and disaster workers who were directly exposed to the World Trade Center attacks on September 11, 2001, and diagnosed with PTSD. Both studies reported that participants who received VR treatment showed a significant decrease in PTSD symptoms relative to the waitlist control group, and improvements were maintained at 6-month follow-up (Difede et al., 2007).

Panic Disorder With or Without Agoraphobia

Panic disorder with or without agoraphobia is associated with substantial severity and impairment, and lifetime prevalence estimates are 1.1 percent and 3.7 percent, respectively (Kessler et al., 2006). Individuals with panic disorder report significant distress over panic attacks. Panic attacks are characterized by the sudden and unexpected onset of a period of intense fear and discomfort with a cluster of physical and cognitive symptoms (American Psychiatric Association, 2000). Patients with panic disorder are often concerned about the implications of their panic attacks and report a persistent concern about, and avoidance of, future attacks. Panic disorder may occur with or without agoraphobia. Agoraphobia

Wong, Beidel 91
is characterized by severe anxiety and avoidance of situations in which a panic attack might occur and fear that it might be difficult or embarrassing to escape (e.g., crowds, public transportation, traveling alone). Individuals with panic disorder, with or without agoraphobia, often report intense fear and a need to escape, in addition to a number of physical sensations in their body such as heart palpitations, difficulty breathing, feeling unsteady or nauseated, and trembling.

Although interoceptive therapy remains undisputed as one of the gold-standard treatments for panic disorder, currently at least one VE exists to augment interoceptive exposure. The VE technology incorporates external audio/visual stimuli with interoceptive cues to trigger bodily sensations, such as blurred vision and audible rapid heartbeats, while the person is in a virtual bus or tunnel (Botella et al., 2004, 2007). Recently, the utility of VRET for panic disorder with or without agoraphobia has been examined in a few small controlled studies (Botella et al., 2007; Pérez-Ara et al., 2010). In one study, patients with panic disorder, with or without agoraphobia, who received either in vivo exposure or VRET as part of a multicomponent treatment, improved similarly and showed more improvement than waitlist controls at posttreatment (Botella et al., 2007). Both treatment groups reported decreased catastrophic thoughts and improvements on clinical global impression scores with treatment gains maintained at 12-month follow-up. Interestingly, that study reported that 100 percent of the in vivo condition participants no longer had panic or had a 50 percent reduction in panic frequency at posttreatment, while the rates were 90 percent in the VRET condition and 28.57 percent in the waitlist control group (Botella et al., 2007). Finally, both treatment groups also reported similar levels of satisfaction.

Another small study from the same research group compared treatment outcomes for patients with panic disorder with or without agoraphobia who received a multicomponent CBT program, including exposure to an agoraphobic VE (e.g., a crowded mall or narrow tunnel) in two conditions (Pérez-Ara et al., 2010). One condition received simultaneously presented audio and visual effects (e.g., audible rapid heartbeat or double vision). The other condition received the same VE for 25 minutes followed by traditional interoceptive exercises (e.g., head spinning and hyperventilation). Outcome did not differ by treatment condition; participants in both treatments reported decreased fear, avoidance, and panic disorder severity at posttreatment, and gains were maintained at 3-month follow-up. In another study (Vincelli et al., 2003), VR was integrated into a multicomponent CBT strategy (Experiential Cognitive Therapy [ECT]) for the treatment of panic disorders with agoraphobia. The small controlled study compared treatment outcome in three groups: an ECT condition in which participants received eight sessions of VR-assisted CBT, a CBT condition in which they received 12 sessions of a traditional cognitive-behavioral approach, and a waitlist control condition. At posttreatment, both treatment groups reported significantly less frequent panic attacks and decreased levels of anxiety and depression (Vincelli et al., 2003). Similarly, a larger study (Choi et al., 2005) reported that patients with panic disorder and agoraphobia improved at posttreatment, regardless of whether they received brief ECT (4 sessions) or 12 sessions of traditional panic treatment.

Virtual Environments for Developmental Disorders and Intellectual Disabilities

In addition to the anxiety disorders, VEs have been widely implemented for youth with developmental disorders, particularly autism spectrum disorders (ASDs). Given the neurodevelopmental challenges faced by children with these disorders, several potential concerns regarding the implementation of VE have been identified (Andersson, Josefsson, & Pareto, 2006; Parsons, 2005). Specifically, would adolescents with ASDs (a) be able to use the VEs appropriately, (b) understand the VEs as representational devices, and (c) learn new information from the VEs about social skills (Parsons, 2005)? Over the course of three investigations (Parsons, 2005), data indicated that individuals with ASDs can complete tasks, understand social situations, and learn social conventions through the VE, thus confirming the benefits of immersive VEs for children and adolescents with ASDs (Wallace et al., 2010).

For youth with ASDs, the use of VEs has primarily targeted social skill deficits (Cheng, Chiang, Ye, & Cheng, 2010; Mitchell, Parsons, & Leonard, 2007). Mitchell and colleagues (2007) created a virtual café wherein six adolescents with ASDs were able to practice their social skills anonymously (i.e., determining where to sit, either at an empty table or in an empty seat at a table where others were already sitting, or choosing to ask others whether a seat was available). After engaging in the VE experience, some adolescents showed greater levels of social understanding as measured by their ability to justify where they would sit and why they chose that seat. Interestingly,


participants were also able to generalize learned behavior to different VE contexts (i.e., from a virtual café to a live video of a bus). Another small study used a collaborative virtual learning environment to teach the use and comprehension of empathy (e.g., kindness, tolerance, and respect) to three children with ASDs (Cheng et al., 2010). Each participant showed improvement on the Empathy Rating Scale posttreatment and continued to maintain gains at follow-up relative to baseline. Results preliminarily suggest that the virtual learning environment system may be helpful in increasing empathic understanding among children with ASDs.

VEs have been used to target other skill deficits among youth with ASDs, such as crossing the street (Josman, Ben-Chaim, Friedrich, & Weiss, 2008). Additionally, a VR-tangible interaction, involving both virtual and physical environments, was used to address sensory integration and social skills treatment for 12 children with ASDs (Jung et al., 2006). Although they did not study it empirically, Wang and Reid (2009) also discussed the potential for VR technology in the cognitive rehabilitation of children with autism.

In addition to youth with ASDs, VEs have been used for people with intellectual disabilities (Standen & Brown, 2005; Stendal, Balandin, & Molka-Danielsen, 2011). VEs have the potential to teach independent living skills, including grocery shopping, food preparation, spatial orientation (Mengue-Topio, Courbois, Farran, & Sockeel, 2011), road safety, and vocational training (Standen & Brown, 2005). To enhance usability for a population with intellectual disabilities, Lannen and colleagues (2002) recommended the development of new devices based on existing data, and additional prototype testing of devices among individuals with learning disabilities.

VEs Used with Other Clinical Disorders

In addition to anxiety disorders and developmental disorders, VE applications are increasingly used in other clinical contexts. For example, VEs have been used to assess reactivity and cravings for alcohol (Bordnick et al., 2008) and other substances (Bordnick et al., 2009; Culbertson et al., 2010; Girard, Turcotte, Bouchard, & Girard, 2009; Traylor, Bordnick, & Carter, 2008). In these scenarios, participants walk through a virtual party where alcohol or other substance-related cues are available. As they encounter different cues (a bottle of beer, a bartender, other people drinking and smoking), their cravings for that substance are assessed. In this way, changes in cue reactivity as a result of an intervention may be measured. VEs for eating disorders are designed with food-related and body-image cues to elicit emotional responses, and have been used to assess body-image distortion and dissatisfaction. The scenarios incorporate cues such as a virtual kitchen with high-calorie foods and scales showing the patients’ real weight (Ferrer-García & Gutiérrez-Maldonado, 2005; Gutiérrez-Maldonado, Ferrer-García, Caqueo-Urízar, & Letosa-Porta, 2006; Gutiérrez-Maldonado, Ferrer-García, Caqueo-Urízar, & Moreno, 2010; Perpiñá, Botella, & Baños, 2003). However, it is not always necessary for VEs to present cue-related stimuli in order to assess psychological symptoms. For example, neutral VEs have been used to assess unfounded persecutory ideation among individuals on a continuum of paranoia (Freeman, Pugh, Vorontsova, Antley, & Slater, 2010). Unlike real life, where the behaviors of other people are never under complete experimental control, in VEs the behaviors of others are scripted by the designer. Thus, VEs can provide a high level of experimental control. The VE avatars do only what they are programmed to do, allowing a unique opportunity to observe how someone interprets their behavior. For example, an avatar that looks in the direction of the person immersed in the environment may be interpreted as a “curious glance” by someone with no evidence of psychosis or as a “menacing stare” by an individual with paranoia. Finally, cognitive-based assessment and training may be embedded into VEs. For example, cognitive training programs in a VE have been used for older adults with chronic schizophrenia (Chan, Ngai, Leung, & Wong, 2010), and continuous performance tests in a VE have been used for youth with attention-deficit/hyperactivity disorder (Pollak et al., 2009; Rizzo et al., 2000). Collectively, these studies suggest that VEs may be used broadly across a wide range of clinical disorders.

Limitations of VEs in Clinical Psychology Research

There are several challenges for the integration of VEs into research and clinical settings (Blade & Padgett, 2002). We will discuss issues that may affect the utility and efficacy of VE. First, for VEs to be effective, several basic conditions are necessary (Foa & Kozak, 1986), including active participation and immersion in the environment (Slater, Pertaub, & Steed, 1999), generalizability to real-life situations, and the ability to elicit physiological responses (North, North, & Coble, 1997/1998;

Schuemie, van der Straaten, Krijn, & van der Mast, 2000). Among these conditions, the user’s level of immersion or presence is usually described by the quality of the VE experience, and may be directly related to the efficacy of the experience (Wiederhold & Wiederhold, 2005). Overall, VEs do seem to provide the level of immersion needed for treatment efficacy (Alsina-Jurnet, Gutiérrez-Maldonado, & Rangel-Gómez, 2011; Gamito et al., 2010; IJsselsteijn, de Ridder, Freeman, & Avons, 2000; Krijn, Emmelkamp, Biemond, et al., 2004; Krijn, Emmelkamp, Ólafsson, Schuemie, & van der Mast, 2007; Price & Anderson, 2007; Price, Mehta, Tone, & Anderson, 2011; Villani, Riva, & Riva, 2007; Wallace et al., 2010; Witmer & Singer, 1998), thus making them a potentially valuable tool to augment exposure therapy and exposure research, particularly in instances where in vivo exposure is not feasible and the participant cannot produce a vivid imaginal scene.

A second issue that merits consideration is the potential for side effects due to being immersed in the VE, such as cyber sickness (Bruck & Watters, 2009). Although the increasing sophistication of the HMDs and tracking devices has dramatically decreased the likelihood of motion sickness in VEs, there is still a need to carefully assess patients both before and after the session for motion or simulator sickness symptoms such as nausea, dizziness, and headache. In our clinic, we advise patients to have a light meal about an hour prior to the treatment sessions to reduce the likelihood of side effects. Furthermore, we do not allow patients to leave the treatment clinic until any symptoms have dissipated. Although motion sickness is rare, it is necessary to evaluate for it.

Third, even though sophisticated environments exist, researchers and clinicians may be less likely to use this technology because of the time and effort required to learn proper equipment use. Although a degree in engineering or computer science is not necessary, some comfort with basic electronics such as dual-screen computer monitors, HMDs, audio amplifiers, and air compressors (in the case of olfactory stimulation) is beneficial. As with any type of equipment, technical difficulties are possible and require the ability to troubleshoot the problem (Segal, Bhatia, & Drapeau, 2011). The sophisticated VR units and specialized hardware required also remain costly (Gregg & Terrier, 2007) even though prices have dropped over the past decade (Rizzo, Reger, Gahm, Difede, & Rothbaum, 2009).

Despite these potential challenges, the benefits of VE in research and clinical contexts appear to outweigh the limitations (Glantz et al., 2003). As the efficacy data for VEs increase, additional research and demand for the intervention will increase, thereby making the purchase of VR-compatible equipment a better investment for researchers, therapists, and patients alike. For example, VE appears to be a cost-effective treatment strategy (Wood et al., 2009): the cost of VRET for 12 participants was estimated to be $114,490 less than the cost of treatment as usual ($439,000). An Internet survey on psychotherapists’ perceptions regarding the use of VR treatments found that one of the highest-rated benefits was the ability to expose patients to stimuli that would otherwise be impractical or difficult to access (Segal et al., 2011). Another highly rated benefit was that therapists believe they have increased control over the situation. Indeed, VR has appealing features such as the potential to deliberately control what is presented to the client, the ability to tailor treatment environments to the needs of each individual within the confines of the technology, the ability to expose the client to a range of conditions that would be impractical or unsafe in the real world, and the ability to provide an alternative to individuals who are concerned about confidentiality or being seen in treatment by others (Glantz et al., 2003).

Integrating VE into Research Paradigms

As indicated above, VEs are now an accepted tool in the clinical treatment of individuals with anxiety disorders, and are emerging as useful for other disorders as well. We now turn our attention to the utility of VEs in research endeavors.

One of the clear advantages of using VEs in the research setting is the ability to assess behavior under standardized conditions. For example, consider the assessment of public-speaking anxiety. Previously, investigators who wanted to conduct a behavioral assessment faced the choice of having the individual deliver a speech to a camera or to a very small audience, or spending extensive time trying to arrange for a more realistic-size audience (usually more than five people) in order to conduct the assessment. When working with youth, there are additional issues associated with potentially using the youth’s peers for a legitimate social-exposure task. With VE, the audience can be credibly simulated so there is no need to try to find a live audience. In addition to having a readily available virtual audience, the customizable nature of VEs allows the investigator to have full


control over the behavior of the “audience.” Avatars can be activated to utter the same phrase or make the same movement at the same time for each assessment and for each research participant. Audience feedback can be specifically programmed to occur at the same time for every participant. Similarly, for substance abuse/dependence research, craving and other behaviors related to addiction can be assessed using controlled environments. The ability to replicate the assessment before and after treatment, for example, allows for more calibrated and standardized behavioral assessments. In short, conducting behavioral assessments in vivo can be cumbersome and allows only limited controllability of the environment. Thus, the customizable and controllable features available through the application of VEs are potential solutions for researchers who want to fully control the environment in a standardized research protocol.

Additionally, as the protection of human subjects and patient confidentiality are top priorities for Institutional Review Board regulations, both researchers and participants can benefit from VE protocols in the laboratory, where patient confidentiality is easily maintained. Furthermore, although the use of audio/video recording and physiological assessments during the in vivo encounter is difficult to engineer, it is much more feasible through a VE. For example, when the distressful situation/event can be created in the clinic through the use of VE, there is the added advantage of being able to assess physiological responses (e.g., heart rate, electrodermal activity) and the ability to record the assessment, allowing for the coding of overt behaviors.

A disadvantage of VE in the assessment of psychopathology is that although the environments are customizable, they are unlikely to be able to exactly recreate every specific anxiety-provoking situation that might exist. In the cases of most anxiety disorders, this is not an issue, as most individuals with anxiety disorders fear something that might happen. For example, individuals with a specific phobia of flying are afraid that the plane might crash. Thus, the VE must encompass visuals of the terminal, the runway, and the inside of the plane, but because the event has not happened, the individual is able to accept the VE as is (color of the seats in the plane, number of people in the terminal). Similarly, for individuals with social phobia who are anxious in a job interview, the sex or race of the human resources director may not matter. In contrast, VE may actually limit the assessment of psychopathology among individuals with PTSD for whom the traumatic event has actually happened. In our experience, individuals with PTSD, when placed in a VE for assessment, will find any deviation from the actual traumatic event to be a distraction. Thus, if the actual event involved a collision with a dump truck, the individual with PTSD will resist immersion in a VE if the collision involves a tractor-trailer. They become difficult to engage in the VE because they are distracted by the “wrong elements.” In such cases, VEs serve as a distinct disadvantage, not an advantage.

There are two groups of researchers that engage in the use of VE. The first group consists of individuals who either design and develop software themselves or partner with software engineers/companies in the design and development of software to be used in clinical and research settings. In many instances, VEs are developed by researchers in partnership with small businesses that specialize in VR, with funding provided by, for example, the Department of Defense or the National Institutes of Health’s (NIH) Small Business Innovation Research (SBIR)/Small Business Technology Transfer (STTR) grant mechanisms. Whereas small businesses are eligible to submit grant applications through the SBIR mechanism, the STTR mechanism requires a formal partnership between a small business and a research university. Other grant mechanisms, such as the R (Research) mechanisms, may be used for research when existing VEs are used for the purpose of other research questions. Individuals involved in the development come from many different backgrounds, including psychology, engineering, computer programming and gaming, and even art.

The second group of individuals who use VE is not primarily involved in its development; the majority are psychologists who purchase/use software that companies have designed specifically for that purpose. The largest hurdle in establishing a VE laboratory is the initial outlay for the equipment and software. The prices for systems typically used in research settings vary, but the initial outlay ranges from $15,000 to $25,000 for hardware and software programs. Once past this initial investment, there are few additional costs (e.g., repair/replacement of broken equipment, maintenance contracts, software upgrades).

Although most of the interfaces are fairly intuitive, they are not turnkey. Thus, researchers face an initial learning curve in operating the system seamlessly in order to provide the most realistic experience. Individuals who are more familiar with videogames often find the system easier to use. Older adults, or those with less gaming experience, will require, and should be given,

practice so that they are familiar with the controls prior to participating in the research study.

The equipment required for VE is quite variable and depends on the type of media. Mixed-media environments can be sophisticated and may involve the use of “green screens” such as those used in the motion picture industry to allow the individual a more immersive experience. The setup requires a larger space so that the individual may walk around the green-screen environment, which enhances the experience. More commonly, VE systems use a desktop computer, dual monitors, an HMD, earphones, and perhaps a scent machine if olfactory cues are included. One distinct advantage is that the equipment is more compact, allowing it to be used in a typical “office”-size room. There is no opportunity for the individual to actually walk through the environment as he or she is tethered by wires to the computer that delivers the VE program. Thus, the individual is either seated or stands and moves through the environment via a game controller. Wearing the HMD can be somewhat cumbersome, and may not be feasible for children given its size and weight. However, as technology continues to improve, VEs are changing. HMDs may be replaced by newer systems that require only the use of a video monitor.

As noted above, designing and developing VEs require collaboration among different disciplines. Clinical psychologists often provide the vision of what the software must be able to do in order to be useful for research, clinical, and data-collection purposes. Human factors psychologists are integral for ensuring that the hardware/software interface is user-friendly. Computer programmers and software engineers are responsible for writing the software. Artists/graphic designers design the visual layout/feel of the VE and build any avatars that populate the environment. Constructing VEs can be quite costly. Currently, our laboratory collaborates with a VE company to design VEs for children with social phobia. The design of one environment (various areas within an elementary school) and eight speaking avatars, plus the related background, will cost approximately $250,000 and take over 18 months to develop. This includes the conceptual storyboard design and conversational content of all avatar characters as well as the actual visual design and engineering.

Future Directions

Despite the rapid technological progress of VEs for clinical psychology these past three decades, the extant literature remains limited by small sample sizes, reliance upon self-report data, measures that lack established psychometric properties, and inconsistent methodological procedures. Future research will benefit from larger sample sizes, the use of clinical measures with better psychometric properties for the diagnostic assessment, behavioral measures with independent and blinded raters, randomized treatment and control groups, standardized VR treatment protocols across investigations (at present, the number, duration, and frequency of sessions vary greatly), and the distinction between clinical and statistical significance (i.e., the number of patients who no longer meet diagnostic criteria versus patients who returned to normal levels of functioning) (see Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999). In addition, relative to the literature on VEs for adults with anxiety disorders, research on the utility of VEs for anxious youth is virtually nonexistent (Bouchard, 2011).

For researchers, VEs offer flexibility and the ability to conduct standardized behavioral assessments that are realistic yet under the control of the investigator. The initial adoption of VEs as a research tool can be daunting and expensive, yet the advantages are many and the obstacles are relatively easy to overcome. As technology improves and individuals’ familiarity/sophistication with virtual worlds becomes standard, research participants and clinical patients alike may be increasingly receptive to the use of VEs in both research and clinical settings.

Conclusion

VE systems have been most widely used for anxiety disorders, developmental disorders, and a number of other health-related disorders. To date, data are limited but VEs appear to be a viable alternative to other exposure modalities. Clearly, VEs can be a tool for clinical psychology treatment and research when appropriately used as an enhancement of, not a replacement for, empirically supported behavioral treatments (Glantz et al., 2003; Gorini & Riva, 2008).

References

Alsina-Jurnet, I., Gutiérrez-Maldonado, J., & Rangel-Gómez, M. (2011). The role of presence in the level of anxiety experienced in clinical virtual environments. Computers in Human Behavior, 27, 504–512.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington,
Anderson, P., Jacobs, C. H., Lindner, G. K., Edwards, S., Zimand, E., Hodges, L., et al. (2006). Cognitive behavior therapy for fear of flying: Sustainability of treatment gains after September 11. Behavior Therapy, 37, 91–97.


Anderson, P., Jacobs, C., & Rothbaum, B. O. (2004). Computer- Choi, Y., Vincelli, F., Riva, G., Wiederhold, B. K., Lee, J., &
supported cognitive behavioral treatment of anxiety disor- Park, K. (2005). Effects of group experiential cognitive ther-
ders. Journal of Clinical Psychology, 60, 253–267. apy for the treatment of panic disorder with agoraphobia.
Anderson, P., Rothbaum, B. O., & Hodges, L. F. (2003). Virtual CyberPsychology & Behavior, 8, 387–393.
reality exposure in the treatment of social anxiety: Two case Coelho, C. M., Silva, C. F., Santos, J. A., Tichon, J., & Wallis, G.
reports. Cognitive and Behavioral Practice, 10, 240–247. (2008). Contrasting the effectiveness and efficiency of virtual
Anderson, P. L., Zimand, E., Hodges, L. F., & Rothbaum, B. Coelho, C. M., Silva, C. F., Santos, J. A., Tichon, J., & Wallis, G.
O. (2005). Cognitive behavioral therapy for public-speaking PsychNology Journal, 6, 203–216.
anxiety using virtual reality for exposure. Depression and Coelho, C. M., Waters, A. M., Hine, T. J., & Wallis, G. (2009).
Anxiety, 22, 156–158. The use of virtual reality in acrophobia research and treat-
Andersson, U., Josefsson, P., & Pareto, L. (2006). Challenges ment. Journal of Anxiety Disorders, 23, 563–574.
in designing virtual environments training social skills for Cukor, J., Spitalnick, J., Difede, J., Rizzo, A., & Rothbaum,
children with autism. International Journal on Disability and B. O. (2009). Emerging treatments for PTSD. Clinical
Human Development, 5, 105–111. Psychology Review, 29, 715–726.
Barlow, D. H. (2002). Anxiety and its disorders: The nature and Culbertson, C., Nicolas, S., Zaharovits, I., London, E. D.,
treatment of anxiety and panic (2nd ed.). New York: Guilford De La Garza, R., Brody, A. L., & Newton, T. F. (2010).
Press. Methamphetamine craving induced in an online virtual real-
Beidel, D. C., & Turner, S. M. (2007). Shy children, phobic ity environment. Pharmacology, Biochemistry and Behavior,
adults: The nature and treatment of social anxiety disorder. 96, 454–460.
Washington, DC: American Psychological Association. Da Costa, R. T., Sardinha, A., & Nardi, A. E. (2008). Virtual
Blade, R. A., & Padgett, M. (2002). Virtual environments: reality exposure in the treatment of fear of flying. Aviation,
History and profession. In K. M. Stanney & K. M. Stanney Space, and Environmental Medicine, 79, 899–903.
(Eds.), Handbook of virtual environments: Design, implemen- Deacon, B. J., & Abramowitz, J. S. (2004). Cognitive and behav-
tation, and applications (pp. 1167–1177). Mahwah, NJ: ioral treatments for anxiety disorders: A review of meta-ana-
Lawrence Erlbaum Associates Publishers. lytic findings. Journal of Clinical Psychology, 60, 429–441.
Bordnick, P. S., Copp, H. L., Traylor, A., Graap, K. M., Carter, B. Difede, J., Cukor, J., Jayasinghe, N., Patt, I., Jedel, S., Spielman,
L., Walton, A., & Ferrer, M. (2009). Reactivity to cannabis L., Giosan, C., & Hoffman, H. G. (2007). Virtual reality
cues in virtual reality environments. Journal of Psychoactive exposure therapy for the treatment of posttraumatic stress
Drugs, 41, 105–112. disorder following September 11, 2001. Journal of Clinical
Bordnick, P. S., Traylor, A., Copp, H. L., Graap, K. M., Carter, B., Psychiatry, 68, 1639–1647.
Ferrer, M., & Walton, A. P. (2008). Assessing reactivity to Difede, J., Cukor, J., Patt, I., Giosan, C., & Hoffman, H. (2006).
virtual reality alcohol-based cues. Addictive Behaviors, 33, The application of virtual reality to the treatment of PTSD
743–756. following the WTC attack. Annals of the New York Academy
Botella, C., Villa, H., García-Palacios, A., Baños, R., Perpiñá, C., of Sciences, 1071, 500–501.
& Alcañiz, M. (2004). Clinically significant virtual environ- Ferrer-García, M. M., & Gutiérrez-Maldonado, J. J. (2005).
ments for the treatment of panic disorder and agoraphobia. Assessment of emotional reactivity produced by exposure to
CyberPsychology and Behavior, 7, 527–535. virtual environments in patients with eating disorders. Annual
Botella, C. C., García-Palacios, A. A., Villa, H. H., Baños, R. Review of CyberTherapy and Telemedicine, 3, 123–128.
M., Quero, S. S., Alcañiz, M. M., & Riva, G. G. (2007). Foa, E. B., & Kozak, M. J. (1986). Emotional processing of fear:
Virtual reality exposure in the treatment of panic disorder Exposure to corrective information. Psychological Bulletin,
and agoraphobia: A controlled study. Clinical Psychology & 99, 20–35.
Psychotherapy, 14, 164–175. Freeman, D., Pugh, K., Vorontsova, N., Antley, A., & Slater,
Bouchard, S. (2011). Could virtual reality be effective in treating M. (2010). Testing the continuum of delusional beliefs: An
children with phobias? Expert Review of Neurotherapeutics, experimental study using virtual reality. Journal of Abnormal
11, 207–213. Psychology, 119, 83–92.
Bruck, S., & Watters, P. (2009). Cybersickness and anxiety Gamito, P., Oliveira, J., Morais, D., Baptista, A., Santos, N.,
during simulated motion: Implications for VRET. Annual Soares, F., &. . . Rosa, P. (2010). Training presence: The
Review of CyberTherapy and Telemedicine, 7, 169–173. importance of virtual reality experience on the “sense of being
Bush, J. (2008). Viability of virtual reality exposure therapy as there.” Annual Review of CyberTherapy and Telemedicine,
a treatment alternative. Computers in Human Behavior, 24, 8,103–106.
1032–1040. Garcia-Palacios, A., Hoffman, H., Carlin, A., Furness, T. A., &
Chambless, D. L., & Ollendick, T. H. (2001). Empirically sup- Botella, C. (2002). Virtual reality in the treatment of spider
ported psychological interventions: Controversies and evi- phobia: A controlled study. Behaviour Research and Therapy,
dence. Annual Review of Psychology, 52, 685–716. 40, 983–993.
Chan, C. F., Ngai, E. Y., Leung, P. H., & Wong, S. (2010). Effect Garcia-Palacios, A. A., Botella, C. C., Hoffman, H. H., &
of the adapted virtual reality cognitive training program Fabregat, S. S. (2007). Comparing acceptance and refusal
among Chinese older adults with chronic schizophrenia: A rates of virtual reality exposure vs. in vivo exposure by
pilot study. International Journal of Geriatric Psychiatry, 25, patients with specific phobias. CyberPsychology & Behavior,
643–649. 10, 722–724.
Cheng, Y., Chiang, H., Ye, J., & Cheng, L. (2010). Enhancing Gerardi, M., Cukor, J., Difede, J., Rizzo, A., & Rothbaum, B.
empathy instruction using a collaborative virtual learning (2010). Virtual reality exposure therapy for post-traumatic
environment for children with autistic spectrum conditions. stress disorder and other anxiety disorders. Current Psychiatry
Computers & Education, 55, 1449–1458. Reports, 12, 298–305.

wong, bei d el 97
Girard, B., Turcotte, V., Bouchard, S., & Girard, B. (2009). therapy versus cognitive behavior therapy for social phobia:
Crushing virtual cigarettes reduces tobacco addiction and A preliminary controlled study. CyberPsychology & Behavior,
treatment discontinuation. CyberPsychology & Behavior, 12, 8, 76–88.
477–483. Klinger, E., Chemin, I., Légeron, P., et al. (2003). Designing vir-
Glantz, K., Rizzo, A., & Graap, K. (2003). Virtual reality for tual worlds to treat social phobia. In B. Wiederhold, G. Riva,
psychotherapy: Current reality and future possibilities. & M. D. Wiederhold (Eds.), Cybertherapy 2003 (pp. 113–121).
Psychotherapy: Theory, Research, Practice, Training, 40, 55–67. San Diego, CA: Interactive Media Institute.
Gorini, A., & Riva, G. (2008). Virtual reality in anxiety disorders: Kotlyar, M., Donahue, C., Thuras, P., Kushner, M. G.,
The past and the future. Expert Review of Neurotherapeutics, O’Gorman, N., Smith, E. A., & Adson, D. E. (2008).
8, 215–233. Physiological response to a speech stressor presented in a vir-
Gregg, L., & Tarrier, N. (2007). Virtual reality in mental health: tual reality environment. Psychophysiology, 45, 1034–1037.
A review of the literature. Social Psychiatry and Psychiatric Krijn, M., Emmelkamp, P. G., Biemond, R., de Wilde de Ligny,
Epidemiology, 42, 343–354. C., Schuemie, M. J., & van der Mast, C. G. (2004). Treatment
Gutiérrez-Maldonado, J., Ferrer-García, M., Caqueo-Urízar, A., of acrophobia in virtual reality: The role of immersion and
& Letosa-Porta, A. (2006). Assessment of emotional reactivity presence. Behaviour Research & Therapy, 42, 229–239.
produced by exposure to virtual environments in patients with Krijn, M. M., Emmelkamp, P. G., Ólafsson, R. P., & Biemond,
eating disorders. CyberPsychology & Behavior, 9, 507–513. R. R. (2004). Virtual reality exposure therapy of anxiety dis-
Gutiérrez-Maldonado, J., Ferrer-García, M., Caqueo-Urízar, orders: A review. Clinical Psychology Review, 24, 259–281.
A., & Moreno, E. (2010). Body image in eating disor- Krijn, M. M., Emmelkamp, P. G., Ólafsson, R. P., Schuemie,
ders: The influence of exposure to virtual-reality environ- M. J., & van der Mast, C. G. (2007). Do self-statements
ments. Cyberpsychology, Behavior, and Social Networking, 13, enhance the effectiveness of virtual reality exposure therapy?
521–531. A comparative evaluation in acrophobia. CyberPsychology &
Harris, S. R., Kemmerling, R. L., & North, M. M. (2002). Behavior, 10, 362–370.
Brief virtual reality therapy for public speaking anxiety. Lannen, T. T., Brown, D. D., & Powell, H. H. (2002). Control
CyberPsychology & Behavior, 5, 543–550. of virtual environments for young people with learning dif-
Hoge, C. W., Auchterlonie, J. L., & Milliken, C. S. (2006). ficulties. Disability and Rehabilitation: An International,
Mental health problems, use of mental health services and Multidisciplinary Journal, 24, 578–586.
attrition from military service after returning from deploy- Lister, H. A., Piercey, C., & Joordens, C. (2010). The effective-
ment to Iraq or Afghanistan. Journal of the American Medical ness of 3-D video virtual reality for the treatment of fear of
Association, 295, 1023–1032. public speaking. Journal of CyberTherapy and Rehabilitation,
Hoge, C. W., Castro, C. A., Messer, S. C., McGurk, D., Cotting, 3, 375–381.
D. I., & Koffman, R. L. (2004). Combat duty in Iraq and Malbos, E. E., Mestre, D. R., Note, I. D., & Gellato, C. C.
Afghanistan, mental health problems, and barriers to care. (2008). Virtual reality and claustrophobia: Multiple com-
New England Journal of Medicine, 351, 13–22. ponents therapy involving game editor virtual environments
IJsselsteijn, W. A., de Ridder, H., Freeman, J., & Avons, S. exposure. CyberPsychology & Behavior, 11, 695–697.
(2000). Presence: Concept, determinants and measurement. Mengue-Topio, H., Courbois, Y., Farran, E. K., & Sockeel, P.
Proceedings of the SPIE, Human Vision and Electronic Imaging (2011). Route learning and shortcut performance in adults
V, 3959, 520–529. with intellectual disability: A study with virtual environ-
Josman, N., Ben-Chaim, H., Friedrich, S., & Weiss, P. (2008). ments. Research in Developmental Disabilities, 32, 345–352.
Effectiveness of virtual reality for teaching street-crossing Meyerbröker, K., & Emmelkamp, P. G. (2010). Virtual reality
skills to children and adolescents with autism. International exposure therapy in anxiety disorders: A systematic review
Journal on Disability and Human Development, 7, 49–56. of process-and-outcome studies. Depression and Anxiety, 27,
Jung, K., Lee, H., Lee, Y., Cheong, S., Choi, M., Suh, D., Suh, 933–944.
D., Oah, S., Lee, S., & Lee, J. (2006). The application of a Mitchell, P., Parsons, S., & Leonard, A. (2007). Using virtual
sensory integration treatment based on virtual reality-tangi- environments for teaching social understanding to 6 adoles-
ble interaction for children with autistic spectrum disorder. cents with autistic spectrum disorders. Journal of Autism and
PsychNology Journal, 4, 145–159. Developmental Disorders, 37, 589–600.
Kendall, P. C., Marrs-Garcia, A., Nath, S. R., & Sheldrick, R. C. Norcross, J., Hedges, M., & Prochaska, J. (2002). The face
(1999). Normative comparisons for the evaluation of clini- of 2010: A Delphi poll on the future of psychotherapy.
cal significance. Journal of Consulting and Clinical Psychology, Professional Psychology: Research and Practice, 33, 316–322.
67, 285–299. North, M., North, S., & Coble, J. R. (1998). Virtual reality ther-
Kessler, R. C. (2000). Posttraumatic stress disorder: The burden apy: An effective treatment for the fear of public speaking.
to the individual and to society. Journal of Clinical Psychiatry, International Journal of Virtual Reality, 3, 2–6.
61 (Suppl 5), 4–14. North, M. M., North, S. M., & Coble, J. R. (1997/1998).
Kessler, R. C., Chiu, W., Jin, R., Ruscio, A., Shear, K., & Walters, Virtual reality therapy: An effective treatment for psychologi-
E. E. (2006). The epidemiology of panic attacks, panic dis- cal disorders. In G. Riva & G. Riva (Eds.), Virtual reality in
order, and agoraphobia in the National Comorbidity Survey neuro-psycho-physiology: Cognitive, clinical and methodological
Replication. Archives of General Psychiatry, 63, 415–424. issues in assessment and rehabilitation (pp. 59–70). Amsterdam
Klein, R. A. (2000). Virtual reality exposure therapy in the treat- Netherlands: IOS Press.
ment of fear of flying. Journal of Contemporary Psychotherapy, Olfson, M., Guardino, M., Struening, E., Schneier, F. R.,
30, 195–207. Hellman, F., & Klein, D. F. (2000). Barriers to the treat-
Klinger, E., Bouchard, S. S., Légeron, P. P., Roy, S. S., Lauer, ment of social anxiety. American Journal of Psychiatry, 157,
F. F., Chemin, I. I., & Nugues, P. P. (2005). Virtual reality 521–527.

98 v i rtua l enviro nments in cl inical psyc h ology researc h

Parsons, S. (2005). Use, understanding and learning in virtual (2000). The virtual classroom: A virtual reality environment
environments by adolescents with autistic spectrum disorders. for the assessment and rehabilitation of attention deficits.
Annual Review of CyberTherapy and Telemedicine, 3, 207–215. CyberPsychology & Behavior, 3, 483–499.
Parsons, T. D., & Rizzo, A. A. (2008). Affective outcomes of vir- Rothbaum, B., & Hodges, L. F. (1999). The use of virtual real-
tual reality exposure therapy for anxiety and specific phobias: ity exposure in the treatment of anxiety disorders. Behavior
A meta-analysis. Journal of Behavior Therapy and Experimental Modification, 23, 507.
Psychiatry, 39, 250–261. Rothbaum, B. O., Anderson, P., Zimand, E., Hodges, L., Lang,
Pérez-Ara, M. A., Quero, S. S., Botella, C. C., Baños, R. R., D., & Wilson, J. (2006). Virtual reality exposure therapy and
Andreu-Mateu, S. S., García-Palacios, A. A., & Bretón- standard (in vivo) exposure therapy in the treatment of fear
López, J. J. (2010). Virtual reality interoceptive exposure of flying. Behavior Therapy, 37, 80–90.
for the treatment of panic disorder and agoraphobia. Annual Rothbaum, B. O., Hodges, L., Anderson, P. L., Price, L., &
Review of CyberTherapy and Telemedicine, 8, 61–64. Smith, S. (2002). Twelve-month follow-up of virtual reality
Perpiñá, C. C., Botella, C. C., & Baños, R. M. (2003). Virtual and standard exposure therapies for the fear of flying. Journal
reality in eating disorders. European Eating Disorders Review, of Consulting and Clinical Psychology, 70, 428–432.
11, 261–278. Rothbaum, B. O., Hodges, L., Smith, S., Lee, J. H., & Price, L.
Pertaub, D.P., Slater, M., & Barker, C. (2002). An experiment (2000). A controlled study of virtual reality exposure ther-
on public speaking anxiety in response to three different apy for the fear of flying. Journal of Consulting and Clinical
types of virtual audience. Presence: Teleoperators and Virtual Psychology, 68, 1020–1026.
Environments, 11, 68–78. Rothbaum, B. O., Hodges, L. F., Kooper, R., Opdyke, D.,
Pollak, Y., Weiss, P. L., Rizzo, A. A., Weizer, M., Shriki, L., Williford, J. S., & North, M. (1995). Effectiveness of
Shalev, R. S., & Gross-Tsur, V. (2009). The utility of a con- computer-generated (virtual reality) graded exposure in the
tinuous performance test embedded in virtual reality in mea- treatment of acrophobia. American Journal of Psychiatry, 152,
suring ADHD-related deficits. Journal of Developmental and 626–628.
Behavioral Pediatrics, 30, 2–6. Roy, S. S., Klinger, E. E., Légeron, P. P., Lauer, F. F., Chemin,
Powers, M. B., & Emmelkamp, P. G. (2008). Review: Virtual I. I., & Nugues, P. P. (2003). Definition of a VR-based pro-
reality exposure therapy for anxiety disorders: A meta-analy- tocol to treat social phobia. CyberPsychology & Behavior, 6,
sis. Journal of Anxiety Disorders, 22, 561–569. 411–420.
Price, M., & Anderson, P. (2007). The role of presence in vir- Ruscio, A. M., Brown, T. A., Chiu, W. T., Sareen, J. J., Stein, M.
tual reality exposure therapy. Journal of Anxiety Disorders, 21, B., & Kessler, R. C. (2008). Social fears and social phobia
742–751. in the USA: Results from the National Comorbidity Survey
Price, M., Anderson, P., & Rothbaum, B. (2008). Virtual reality Replication. Psychological Medicine: A Journal of Research in
as treatment for fear of flying: A review of recent research. Psychiatry and the Allied Sciences, 38, 15–28.
International Journal of Behavioral Consultation & Therapy, Schuemie, M. J., van der Straaten, P., Krijn, M., & van der Mast,
4, 340–347. C. G. (2001). Research on presence in virtual reality: A sur-
Price, M., Mehta, N., Tone, E. B., & Anderson, P. L. (2011). vey. CyberPsychology & Behavior, 4, 183–201.
Does engagement with exposure yield better outcomes? Segal, R., Bhatia, M., & Drapeau, M. (2011). Therapists’ percep-
Components of presence as a predictor of treatment response tion of benefits and costs of using virtual reality treatments.
for virtual reality exposure therapy for social phobia. Journal CyberPsychology, Behavior & Social Networking, 14, 29–34.
of Anxiety Disorders, 25, 763–770. Slater, M., Pertaub, D., Barker, C., & Clark, D. M. (2006). An
Pull, C. B. (2005). Current status of virtual reality exposure experimental study on fear of public speaking using a virtual
therapy in anxiety disorders. Current Opinion in Psychiatry, environment. CyberPsychology & Behavior, 9, 627–633.
18, 7–14. Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in
Reger, G. M., & Gahm, G. A. (2008). Virtual reality exposure virtual reality: Facing an audience of avatars. IEEE Computer
therapy for active duty soldiers. Journal of Clinical Psychology, Graphics and Applications, 19, 6–9.
64, 940–946. Smith, T. C., Ryan, M. A. K., Wingard, D. L., et al. (2008). New
Reger, G. M., Holloway, K. M., Candy, C., Rothbaum, B. O., onset and persistent symptoms of post-traumatic stress dis-
Difede, J., Rizzo, A. A., & Gahm, G. A. (2011). Effectiveness order self reported after deployment and combat exposures:
of virtual reality exposure therapy for active duty soldiers in Prospective population based US military cohort study.
a military mental health clinic. Journal of Traumatic Stress, British Medical Journal, 336, 366–371.
24, 93–96. Solomon, S. D., & Davidson, J. R. T. (1997). Trauma:
Rizzo, A., Difede, J., Rothbaum, B. O., Reger, G., Spitalnick, J., Prevalence, impairment, service use, and cost. Journal of
Cukor, J., & Mclay, R. (2010). Development and early evalu- Clinical Psychiatry, 58, 5–11.
ation of the virtual Iraq/Afghanistan exposure therapy system Standen, P. J., & Brown, D. J. (2005). Virtual reality in the
for combat-related PTSD. Annals of the New York Academy of rehabilitation of people with intellectual disabilities: Review.
Sciences, 1208, 114–125. CyberPsychology & Behavior, 8, 272–282.
Rizzo, A., Reger, G., Gahm, G., Difede, J., & Rothbaum, B. O. Stendal, K., Balandin, S., & Molka-Danielsen, J. (2011). Virtual
(2009). Virtual reality exposure therapy for combat-related worlds: A new opportunity for people with lifelong disabil-
PTSD. In P. J. Shiromani, T. M. Keane, J. E. LeDoux, P. J. ity? Journal of Intellectual and Developmental Disability, 36,
Shiromani, T. M. Keane, & J. E. LeDoux (Eds.), Post-traumatic 80–83.
stress disorder: Basic science and clinical practice (pp. 375–399). Tanielian, T., & Jaycox, L. H. (2008). Invisible wounds of
Totowa, NJ: Humana Press. war: Psychosocial and cognitive injuries, their consequences,
Rizzo, A. A., Buckwalter, J. G., Bowerly, T. T., van der Zaag, C. C., and services to assist recovery. Santa Monica, CA: RAND
Humphrey, L. L., Neumann, U. U., &. . . Sisemore, D. D. Corporation.

wong, bei d el 99
Tortella-Feliu, M., Botella, C., Llabrés, J., Bretón-López, J., del Wallach, H. S., Safir, M. P., & Bar-Zvi, M. (2009). Virtual reality
Amo, A., Baños, R. M., & Gelabert, J. M. (2011). Virtual cognitive behavior therapy for public speaking anxiety: A ran-
reality versus computer-aided exposure treatments for fear of domized clinical trial. Behavior Modification, 33, 314–338.
flying. Behavior Modification, 35, 3–30. Wang, M., & Reid, D. (2009). The virtual reality-cognitive
Traylor, A. C., Bordnick, P. S., & Carter, B. L. (2008). Assessing rehabilitation (VR-CR) approach for children with autism.
craving in young adult smokers using virtual reality. American Journal of CyberTherapy and Rehabilitation, 2, 95–104.
Journal on Addictions, 17, 436–440. Wiederhold, B. K., & Wiederhold, M. D. (2005). A brief his-
Turner, S. M., Beidel, D. C., & Townsley, R. M. (1992). Social tory of virtual reality technology. In B. K. Wiederhold &
phobia: A comparison of specific and generalized subtypes M. D. Wiederhold (Eds.), Virtual reality therapy for anxiety
and avoidant personality disorder. Journal of Abnormal disorders: Advances in evaluation and treatment (pp. 11–27).
Psychology, 101, 326–331. Washington, DC: American Psychological Association.
Villani, D., Riva, F., & Riva, G. (2007). New technologies for Witmer, B. G., & Singer, M. J. (1998). Measuring presence in
relaxation: The role of presence. International Journal of Stress virtual environments: A presence questionnaire. Presence:
Management, 14, 260–274. Teleoperators & Virtual Environments, 7, 225–240.
Vincelli, F., Anolli, L., Bouchard, S., Wiederhold, B. K., Zurloni, Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford,
V., & Riva, G. (2003). Experiential cognitive therapy in the CA: Stanford University Press.
treatment of panic disorders with agoraphobia: A controlled Wood, D., Murphy, J., McLay, R., Koffman, R., Spira, J.,
study. CyberPsychology & Behavior, 6, 321–328. Obrecht, R. E., Pyne, J., & Wiederhold, B. K. (2009). Cost
Wald, J., & Taylor, S. (2003). Preliminary research on the effi- effectiveness of virtual reality graded exposure therapy with
cacy of virtual reality exposure therapy to treat driving phobia. physiological monitoring for the treatment of combat related
Cyberpsychology & Behavior: The Impact of the Internet, Multimedia post traumatic stress disorder. Annual Review of CyberTherapy
and Virtual Reality on Behavior and Society, 6, 459–465. and Telemedicine, 7, 223–229.
Wallace, S., Parsons, S., Westbury, A., White, K., White, K., Wood, P. D., Murphy, J., Center, K., McKay, R., Reeves, D.,
& Bailey, A. (2010). Sense of presence and atypical social Pyne, J., Shilling, R., & Wiederhold, B. K. (2007). Combat-
judgments in immersive virtual environments: Responses related post-traumatic stress disorder: A case report using vir-
of adolescents with autism spectrum disorders. Autism, 14, tual reality exposure therapy with physiological monitoring.
199–213. CyberPsychology & Behavior, 10, 309–315.

10 0 v i rt ua l enviro nments in cl inical psyc h ology researc h

Measurement Strategies
for Clinical Psychology

7 Assessment and Measurement of Change Considerations in Psychotherapy Research
Randall T. Salekin, Matthew A. Jarrett, and Elizabeth W. Adams

This chapter focuses on a range of measurement issues including traditional test development
procedures and model testing. Themes of the chapter include the importance of integrating research
disciplines and nosologies with regard to test development, the need for test development and
measurement in the context of idiographic and nomothetic treatment designs, the need for change-sensitive
measures, and the possible integration of both idiographic and nomothetic treatment
designs. Finally, the chapter emphasizes the importance of exploring novel areas of measurement
(e.g., biological functioning, contextual factors) using emerging technologies. It is thought that
innovative test development and use will lead to improved intervention model testing.
Key Words: assessment, measurement, psychotherapy, treatment, change

As scientists investigate the outcomes of various methods of psychological therapy, they consider a number of scientific issues, such as idiographic versus nomothetic measurement, objective versus subjective assessment, trait versus behavioral change, and the economics of large-scale randomized controlled trials (RCTs; see Chapter 4 in this volume) versus single-subject designs (Barlow & Nock, 2009; Borckardt et al., 2008; Meier, 2008; see Chapter 3 in this volume). Large-scale RCTs have helped to answer questions of aggregate change (e.g., an active treatment showing greater gains than a control condition), but RCTs have yet to fully deliver with regard to individual responses to treatment and the mechanisms of change in psychological therapy (Borckardt et al., 2008). Although moderation analyses in large RCTs address key questions related to individual differences in treatment response, iatrogenic effects of therapy at the individual level may go undetected in RCTs (Barlow, 2010; Bergin, 1966). While this debate ranges from methodological to economic concerns, one critical but often neglected issue is the precision of measurement in capturing psychotherapy change. Specifically, regardless of trial design, how should we measure patient change when interventions are implemented?

There are six primary goals for the current chapter. First, we discuss general issues in test development. Second, we discuss the need for future tests to be able to span current and future nosologies. This issue may become particularly important as we attempt to integrate information that has been garnered across sub-disciplines and fields of research. Third, we discuss special considerations for test development in the context of intervention research. Within this section, we discuss the need to measure not only static but also dynamic characteristics of personality and psychopathology. Fourth, we outline methods to garner more in-depth information on clients during the therapeutic process. Fifth, we provide an overview of new assessment technologies. Sixth, we discuss the important and controversial issues of arbitrary metrics and idiographic versus nomothetic research. It is our hope that a discussion
of these topics will lead to improved assessment technology and an enhanced consideration of measurement issues when evaluating change in psychological therapy.

Test Construction: The Traditional Approach

Many scholarly papers have focused on test construction and assessment (e.g., Clark & Watson, 1995; Salekin, 2009). Historically, test construction was initiated for placement purposes. For example, initial test development focused on selection problems in the early twentieth century (e.g., school and military personnel selection). Tests were designed to be highly efficient in obtaining information about large groups of people in a short amount of time and valuable in terms of predictive validity. In essence, a great emphasis was placed on uniformity and the stable, enduring nature of traits. For instance, Binet used intelligence tests to determine which children would need special education. The Army Alpha and Beta tests were developed during World War I to determine appropriate settings for soldiers. Test questions helped distinguish those who would be better or less suited for high-risk placements (e.g., fighter pilots, kitchen staff, marines, generals, strategists, mechanics).

This traditional approach to test construction and evaluation has served as the foundation for modern clinical assessment. For example, clinical assessment tools are often evaluated for their psychometric properties, such as scale homogeneity, inter-rater and test–retest reliability, item-response functioning (see Chapter 18 in this volume), and various aspects of validity, including predictive, construct, and ecological validity. The key to successful test development is that the instrument be methodologically sound and have practical implications and utility. Advice for test construction and validity comes from classic works by Cronbach and Meehl (1955), Loevinger (1957), and more recently Clark and Watson (1995). Cronbach and Meehl (1955) argued that investigating the construct validity of a measure entails, at minimum, the following steps: (a) articulating a set of theoretical concepts and their interrelations, (b) developing ways to index the hypothetical concepts proposed by a theory, and (c) empirically testing the hypothesized relations among constructs and their observable manifestations. This means that without a specified theory (the nomological net), there is no construct validity. It has been argued that such articulation of theory is chief in the development of tests. Moreover, a series of studies is needed to determine the construct validity of any measure. Rather than an isolated set of observations, studies must be conducted over time to examine the factor structure, links to other measures, differentiation between selected groups, and hypothesized changes over time or in response to a manipulation. It has been argued that these construct validity studies are critical, in that they guide us toward the most precise instruments.

Loevinger’s (1957) monograph continues to be one of the most thorough delineations of theoretically based psychological test construction. Clark and Watson (1995) elaborated on Loevinger’s monograph, suggesting that there are a number of important steps in test development and validation. First, test developers should consider whether a new test is needed before developing one. If new development is unnecessary, then researchers may want to devote time to other knowledge gaps in the literature. There are many reasons to develop new tests: existing ones may not work in the capacity for which they were designed or may not work as well as they should, a test may be required for a different population, there may be a need for speedier assessments of a construct (briefer indices), and so on. Should it be decided that a new test is, in fact, needed, Loevinger (1957) and Clark and Watson (1995) provide excellent suggestions for item pool creation (i.e., generate a broad set of items), distillation (i.e., reduce overlap by having experts rate similarity and prototypicality of items), and the subsequent validation process (i.e., linking the construct to meaningful external correlates). These topics have been covered at great length in past research, so we would point readers to these important papers before engaging in test development.

This traditional approach has proven valuable in driving test development. However, we see important next steps in test development, including the need to (a) develop measures that span sub-disciplines and classification systems, (b) include those measures that are appropriate for interdisciplinary research in studies that utilize traditional measures, and (c) design and evaluate measures that are ideal for measuring change in psychotherapy.

The Need for Integration Across Nosologies and Research

The fields of psychiatry and psychology are in the midst of many changes. With the advancement of the DSM-5 and the ICD-11, there are expected changes in the criteria that underpin disorders. The DSM-5 workgroups are also grappling with taxonomic versus dimensional models, as well as the possibility of reducing the overall number of
categories for mental illness. The potential changes to these diagnostic manuals could be vast, and measurement development may be needed with increasing urgency to address DSM and ICD revisions.

In addition, there is a clear need for integration of measures across disciplines. For example, with the emergence of fields such as developmental psychopathology (Cicchetti, 1984), a growing body of research linking normal development and abnormal development has emerged (Eisenberg et al., 2005; Rothbart, 2004). We speculate that common tests or constructs that can be used to unite different mental health languages or discipline-specific constructs may be highly valuable. Thus, a new measure might provide a number of labels for similar sets of symptoms (DSM-5 ADHD, ICD-11 Hyperactivity, and FFM Extreme Extraversion).1 See Table 7.1 for a variety of classification schemes and disorders that likely pertain to similar underlying problems in children.

Table 7.1 Connecting the Classification Schemes: Childhood Disorders as an Example

Externalizing (Approach) (Positive)      Internalizing (Withdrawal) (Negative)

Big 5 Language
High Extraversion                        High Conscientiousness
Low Conscientiousness                    High Neuroticism
Low or High Agreeableness

DSM-IV Classifications
ODD                                      Separation Anxiety
CD                                       Depression
ADHD                                     Generalized Anxiety
Substance Abuse                          Obsessive-Compulsive Disorder
                                         Eating Disorders

ICD-10
Hyperkinetic Disorders                   Separation Anxiety
Conduct Disorders                        Phobic Anxiety
Oppositional Defiant                     Social Anxiety

Temperament
Surgency                                 Shyness/Fear
Positive affect/Approach                 Irritability/Frustration
Low Control                              High Control

Integration may not be far off for the mental health fields. Nigg (2006), for example, delineated some of the similarities between classical and temperament theories, highlighting the large degree of overlap of symptoms across systems and the need for integration. At a basic level, scholars have recognized the need to bring together the fields of research (Frick, 2004; Frick & Morris, 2004; Nigg, 2006) to eventually further clinical practice. To illustrate this point, a recent example of cross-fertilization is the research on personality and its potential relation to psychopathology (see Caspi & Shiner, 2006; Nigg, 2006; Rutter, 1987; Salekin & Averett, 2008; Tackett, 2006, 2010; Widiger & Trull, 2007). This research shows connections between personality and psychopathology. Research in this area has underscored three basic human characteristics: approach (extraversion or positive affect), avoidance (fear, or anxiety, withdrawal), and constraint (control), with hierarchical characteristics that add to these three basic building blocks of human functioning and personality (Watson, Kotov, & Gamez, 2006). Some researchers have begun to suggest that a disruption (or an extreme level) in one or more of these areas can reflect psychopathology. For instance, primary deficits in emotion regulation may account for a host of disorders including generalized anxiety, depression, and antisocial conduct (Allen, McHugh, & Barlow, 2011; Brown & Barlow, 2009).

We mention cross-fertilization because we believe that the next step for assessment technology may be to work toward a common cross-discipline nosology that serves to advance knowledge across levels of functioning. One example of such a nosology currently in development within the disciplines of clinical psychology and psychiatry is the initiative by the National Institute of Mental Health (NIMH) to develop what have been called Research Domain Criteria (RDoC). This system is designed to reflect new ways of classifying psychopathology based on dimensions of observable behavior and neurobiological measures. In turn, such a system is working to define basic dimensions of functioning (e.g., fear) that can be studied across levels of analysis (e.g., genes, neural circuits, brain regions, brain functions, etc.). See Table 7.2 for a current draft of the RDoC matrix and examples of genes and brain

Table 7.2 RDoC Matrix Examples

Units of Analysis: Genes, Molecules, Cells, Circuits, Physiology, Behavior, Self-Reports, Paradigms

Domains/

Negative Valence
    Active threat (“fear”)                  Prefrontal-amygdala (Circuits); Heart rate (Physiology)
    Potential threat
    Sustained threat

Positive Valence
    Initial responsiveness to reward
    Sustained responsiveness to reward
    Reward learning

Cognitive Systems
    Attention                               DRD4 (Genes); Continuous
    Working memory                          Behavior
                                            of Executive
    Declarative memory
    Language behavior
    Cognitive (effortful)

Systems for Social
    Imitation, theory of mind
    Social dominance
    Facial expression
    tion fear

Arousal & regulation (multiple)
    Resting state

regions that might be implicated for certain conditions (e.g., fear).

Clearly, the integration of biology into psychology and psychological assessment is likely to be important in the upcoming decades. Although the RDoC system is still in development, such multilevel analyses are under way in many research areas. Because of the importance of biology in the assessment process, we mention two areas of study where assessment and measurement of change will be key in the future: genetics and neuroimaging. Because of their rising importance, increasing the precision of assessment will be a priority for further understanding psychopathology.

Measurement in Genetic Studies
Recent articles in Time magazine highlight the extent to which genes affect our lives (Cloud, 2010). Behavior and molecular genetics have greatly advanced our knowledge of psychopathology. The field of epigenetics has demonstrated that DNA is not one’s destiny; that is, the environment can help change the expression of one’s genetic code. Despite significant advances in molecular and behavior genetics, there continues to be a need to integrate genetic research with psychological assessment. Much work will be needed over the next few decades to advance knowledge, and psychology certainly has an important role in assessing psychopathology and its potential alterations over the course of treatment. Specifically, psychology can help by developing precise measures that get us closer to the underlying construct (e.g., disorders) so that genotype–phenotype relationships can be more easily detected. For example, emerging research on the predominantly inattentive type of attention-deficit/hyperactivity disorder (ADHD) has revealed that within-subtype differences may exist with respect to hyperactivity/impulsivity (H/I) symptom count (Carr, Henderson, & Nigg, 2010). Genetic studies and neuropsychological studies are showing support for differentiating between children with ADHD-I who have less than or greater than two H/I symptoms (Volk, Todorov, Hay, & Todd, 2009). In addition, DSM-5 is currently exploring an inattentive presentation (restrictive) subtype (i.e., less than two H/I symptoms). This example relates to the above-mentioned points regarding the integration of genetic research and precise measurement in better understanding genetically driven effects. In addition, better measurement


of environmental factors, including interventions, will be chief if we are to understand what parts of the environment allow for the activation and deactivation of genetic influences.

fMRI Research, Task Development, and Further Psychological Measurement
With the advent of functional magnetic resonance imaging (fMRI), we have gained considerable knowledge regarding the workings of the brain (see Chapter 10 in this volume). However, our knowledge can be furthered through the use of psychological assessment and intervention studies. Because neuroimaging findings are so heavily dependent on specified tasks, advancement in assessment measures is very much needed to shorten the gap between the task performed and the inference made about the participant. Thus, the psychological assessment enterprise could, in part, serve to integrate systems by further developing tasks relevant to imaging studies. Assessing state effects (e.g., affect, thoughts) during neuroimaging tasks could also be beneficial for future research. Overall, interconnecting psychological assessment with cognitive and biological paradigms will help to fasten the disciplines and make more substantial contributions to both basic and applied research.

The above underscores the need for multidisciplinary research but also the need for multimethod–multitrait matrices of behavioral and biological measures. This approach might be one important way to start to integrate information on psychological assessment and to examine progress across treatment. As mentioned, Nigg (2006) and others (Sanislow, Pine et al., 2010) contend that higher-order traits can be conceived as part of a hierarchical model rooted at an abstract level. In contrast, reactivity and basic approach and withdrawal systems may emerge early in life but differentiate into additional meaningful lower-order behavioral response systems during childhood. Nigg (2006) has argued that differentiations can be made at four levels: (a) a frontal limbic, dopaminergically modulated approach system anchored in the reactivity of nucleus accumbens and associated structures to potential reward; (b) a frontal limbic, withdrawal system anchored in reactivity to amygdala and associated structures but also including stress response systems and sympathetic autonomic response, with important distinctions between anxiety and fear response; (c) a constraint system that has diverse neurobiological conceptions but has been conceived as rooted in frontal-striatal neural loops that are dopaminergically modulated and reflect parasympathetic autonomic responsivity as well as the influence of serotonergically modulated reactive control systems; and (d) a related affiliation/empathy system linked to effortful control and the capacity for negative affect. This system subsequently leads to empathy and a desire for and tendency toward affiliation and cooperation (as opposed to social dominance or social interaction, which is driven by the reward/approach systems).

In sum, the uniting of nosologies as well as the integration of biology and psychology will not only serve to provide a common language for understanding “syndromes” or conditions but may also elucidate the processes and primary neural anchors that might be related to a particular condition. In turn, these processes could then become the target of intervention. Next, we turn our attention from single-time-point assessment to considering assessment over multiple time points, and in particular in response to treatment.

Measuring the Benefits of Psychotherapy
Smith and Glass (1977) sparked interest in scientists to quantitatively index change in patients with a standard metric. However, the study of human behavior involves significant sources of error and/or variability, a problem that has affected the precision of measurement as it relates to the prediction of behavior, the definition of psychological constructs (e.g., traits), and capturing change in response to psychological intervention. Measurement issues will always be a challenge when studying a phenomenon as complex as human behavior.

In a review of the history of the assessment process, Meier (2008) argues, much like Loevinger (1957) and Clark and Watson (1995), that when considering test usage, there are three main factors to consider: (1) test content, (2) test purpose, and (3) test context. Significant progress has been made in the area of test content, as mentioned above, but test purpose and test context have been relatively less studied. Given that one purpose of assessment is to measure response to intervention, it seems that measurement development in this area might also be needed. Specifically, in terms of Loevinger (1957) and Cronbach and Meehl’s (1955) notion, a theory is required to adequately develop psychological tests that are sensitive to change. A theory of dysfunction and a theory of alleviating the dysfunction may serve as a model that can guide test development in this regard. Also, unlike methods for developing measures that often focus on characteristics or concepts that exhibit stability (e.g., intelligence), a


primary concern for the intervention scientist might be the consideration of measures that adequately tap dynamic characteristics as well as mechanisms of change for symptoms and traits. We discuss these measurement issues next.

Measurement of Psychotherapy Change: Starting with a Theoretical Base
When researchers attempt to measure psychotherapeutic change, they require a theory about how the change occurs. For these purposes, test developers search existing theoretical and empirical literatures to develop items responsive to the interventions and empirical populations in question. The first step would be to conduct a literature review in preparation for creating items and tasks about problems thought to be affected by different interventions with the clinical population of interest. Problems with depression and/or anxiety might be guided by research on coping, courage, or hope. Like the research on general test construction, a thorough explication of the constructs expected to be influenced and those unaffected by the intervention should result in a large initial item pool, providing the capacity to investigate general and specific intervention effects. Test developers might consider including items from theories that conceptualize the process and outcomes of an intervention in different ways as well as items not expected to be affected by treatment. Demonstration of change in intervention-sensitive items and stability in items not expected to change would be another strong method for demonstrating construct validity for change-sensitive tests. It is possible that there are both static and dynamic items, some of which are more appropriate for intervention research (Cook & Campbell, 1979).

Like Loevinger (1957), we suggest that theory should guide the process. For example, Rogers’ Client-Centered Psychotherapy provides a nomothetic theory (all individuals are good in nature), so that aspect of human functioning would not be expected to change. However, facilitating one’s awareness regarding one’s innate good nature and elucidating one’s ability to make one’s own decisions more confidently could be related to healthy human outcomes and an awareness of one’s innate goodness. As such, measures of change might examine level of awareness gained and decision-making frequency and capacity across therapy. Assessment might also then focus on resultant increases in mental health, knowing, at the same time, that one’s level of innate goodness (as rated by other Rogerian clinicians) would be stable across that same period. One might also examine a broad network of changes, including phenotypic change on a measure of personality as well as biological changes that might be mapped through genetic expression, electroencephalography (EEG), or fMRI measurement.

Bandura’s (1977) self-efficacy theory provides an additional example of a nomothetic theoretical approach that affords specificity regarding what outcomes should change as a result of an intervention. With this theory, outcome expectations refer to knowledge about what specific behaviors will lead to desired outcomes. Self-efficacy expectations, however, refer to beliefs individuals hold about whether they can perform those specific tasks. Anxious individuals may know, for example, that the more they engage in social situations the better they will perform at completing this behavior even if they continue to feel anxious. Thus, they may come to realize that they have competency in completing the task. Bandura (1977) intended this theory to be applicable to all clients and problems and proposed a set of interventions (i.e., modeling, direct performance of behaviors) specifically designed to improve efficacy expectations (see Bandura, 1997). Thus, the model provides an efficient means of measuring mechanisms of change because of the strong relationship to both affect and behavior, and the requisite constructs can be measured through questionnaires or interviews. Biological changes might also be expected and indexed, including approach behavior, which could be detected on the left hemisphere through EEG or imaging work, and increase in heart rate and other biological indicators that show greater approach and motivation.

Although the vast majority of theories for psychotherapies are nomothetic, Mumma (2004) offered a manner in which to conceptualize cases idiographically and also test a case-specific theoretical model from the perspective of the client. He refers to this as “cognitive case formulation” (CCS), where thoughts and beliefs are elicited using a methodology focusing on clinically relevant issues and situations (Beck, 1995; Persons, 1989; Persons et al., 2001). Once cognitions are elicited, they are examined and selected for relevance to the problem. Following the case formulation, Mumma suggested that the intervention be geared toward the specific cognitions that mediate the problem. Convergent and discriminant validity analyses are then utilized to determine whether the intervention is effective. This approach entails learning about the client’s specific cognitions, which may not have any connection to efficacy expectation (as noted in the


nomothetic example above) or the awareness of the patient, but instead to the specific beliefs of the individual, which could then be altered presumably through the course of a tailored psychotherapy. In the section below, we discuss these issues in more detail with regard to nomothetic and idiographic assessment strategies that focus on tapping change.

Assessing Change on Relevant Outcome Constructs in Intervention
As noted throughout the chapter, many assessment measures have been evaluated using traditional psychometric approaches. For example, many of these approaches have come from a trait-based tradition that assumes that specific traits should be static over time. This point is critical, since if trait-based measures are not expected to change, use of such measures in treatment-outcome studies may be inappropriate (Meier, 2008). Traditionally, these measures have included lengthy assessment tools such as the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1941) and the Child Behavior Checklist (CBCL; Achenbach, 2001), measures that were developed primarily for selection or screening purposes rather than for assessing change in response to treatment. At the same time, these limitations have been recognized by test developers. For example, a new version of the CBCL was just released that is designed for detecting change (i.e., the Brief Problem Monitor). This brief version that utilizes CBCL items was developed for researchers and clinicians interested in examining response to intervention. Similarly, Butcher developed the Butcher Treatment Planning Inventory (Butcher, 1998) as a means of gathering information about a client before treatment as well as to monitor his or her response to treatment. Recently, the Risk-Sophistication-Treatment Inventory has been found to be change sensitive with adolescent offenders (Salekin, Tippey, & Allen, 2012).

In addition to the nature of the assessment tool, informant effects are also present in relation to assessing change. For example, Lambert (1994) found that (a) measures of specific therapy targets produced larger effects than more distal measures, (b) therapist and coder ratings produced larger effects than self-reports, and (c) global ratings produced larger effects than specific symptom ratings. Overall, these findings point toward the idea that there are “reliable differences in the sensitivity of instruments to change” (Lambert, 1994, p. 85). In addition to the issue of measurement, one must also consider timing and changes prior to intervention. For example, Kelly, Roberts, and Ciesla (2005) found that clients exhibited significant change in symptoms of depression prior to the start of intervention. In some instances, studies that conduct an initial assessment and then begin treatment weeks later may miss an important window of assessing change (i.e., changes between initial assessment and the start of treatment).

If change-sensitive measures are seen as a priority, then it will be necessary to discuss how such measures might be developed. Prior to discussing recommendations for assessing the psychometric properties of change-sensitive measures, we briefly discuss recommendations that have been identified for item selection for change-sensitive measures. The process for developing such measures is similar to what Cronbach and Meehl (1955) and Loevinger (1957) have suggested for trait measures. Meier (2008) recommends a series of intervention item-selection rules that align well with traditional theories regarding test development (see Table 7.3).

Table 7.3 Intervention Item Selection Rules
1. Ground scale items in theoretical and empirical literature relevant to applicable interventions, clinical populations, and target problems.
2. Aggregate items at appropriate levels, and assess range of item scores at pretest.
3. Make sure items evidence change in intervention conditions (in theoretically expected direction).
4. Examine whether differences in change exist between intervention and comparison groups.
5. Examine whether intake differences exist between comparison groups.
6. Examine relations between item scores and systematic error sources.
7. Cross-validate results to minimize chance effects.
Based on Cronbach and Meehl (1955), Loevinger (1957), and recently Meier (2008).

As with traditional tests, items created for change-sensitive tests should be theoretically based. Contemporary measurement theories hold that test validation is closely related to traditional hypothesis testing (Ellis & Blustein, 1991; Landy, 1986). Theoretical grounding also provides an important context for understanding the meaning of changing values of items in a change-sensitive assessment.

In relation to the assessment of reliability and validity, the general recommendations discussed in


the traditional test-development section apply with some special considerations. It is still important to assess for the stability of a construct in the absence of intervention. As Vermeersch and colleagues (2004, p. 38) noted, “the sensitivity to change of a measure is directly related to the construct validity of the instrument.” Once basic psychometric properties are established (e.g., internal consistency, test–retest reliability, inter-rater reliability, etc.), one can then evaluate how sensitive to change the measure is in response to treatment. Meier (2008) noted that once an outcome measure is established as a stable measure with strong psychometric properties, it should not change in response to repeated administration in the absence of intervention but should change in the presence of intervention. In turn, such an evaluation would require the examination of test–retest reliability in a control sample, and the test–retest range should approximate the periods used between outcome measures in the clinical sample

Table 7.4 Cross-Cutting Areas of Change Across Diverse Psychotherapies: Potential Change-Sensitive Items

Domain                Description

Listens               Pays attention to others and follows through with instructions
Shares thinking       Is able to talk about what he or she is thinking about with others
Rational              Is able to develop rational thoughts with a number of options for problems
Views of the self     Self-understanding

Bright affect         Becomes happier with self over the course of treatment
Emotion regulation    Is able to regulate emotions, control temper
Compensatory skills   Is able to engage in behaviors to alleviate or reduce negative feelings

Helping               Contributes in the home helping with chores and in the community
Sports/Hobbies        Engages in sports and/or hobbies
Well behaved          Behaved at home and at school/work
Ways of responding    Learns ways of responding

Close friend          New friends or a close friend attained. A close or confiding high-quality friendship. Is able to start conversations.
Social life           Number of friends, frequency of social activities, level of conflict with others; makes friends easily
Romantic              Quality and satisfaction with romantic relationship (adolescents and young adults)
Family                Quality of relationship with parents and siblings

Academic              Quality of school performance; level of effort and grades
Finance               Financial standing. How does the individual handle money? (adolescents and young adults)
Personal health       Health status and healthy lifestyle
Family health         Health status and healthy lifestyle in immediate family


(e.g., weekly, monthly, etc.). Table 7.4 provides an example of several potential change-sensitive items that could be included in test–retest designs.

Changes are to be expected in biology as well. For instance, as we chart out differences based on extraversion and introversion or approach and withdrawal, we may expect certain changes in biology, such as a set of neurons that becomes more active in the amygdala. However, more subtle differences may also be expected, and measurement would be needed to take into account such differences. Specifically, although we labeled our two broad categories as approach or withdrawal, biological distinctions are not always similarly demarcated and grouped. Recent physiological evidence is consistent with this perspective. For example, left frontal activation appears to be related to motivation (approach) more clearly than to valence. Those measuring change across intervention trials would need to be aware of this. Moreover, Nigg (2006) has argued that angry affect sometimes reflects a potential to overcome obstacles in anticipation of reward or reinforcement emanating from traits such as surgency and reward responsivity, whereas in other instances it reflects a propensity to resentful, angry, hostile affect that is responsive to fear or anticipated negative consequences. These may be related to panic, alarm, and/or stress response, as well as rage related to panic or fear. As mentioned, we expect that biological research in conjunction with psychological assessment will advance the field in this regard. However, for now, they are important factors to be cognizant of in designing measures and indices of change.

Specific Considerations for Treatment Outcome Evaluation Methodologies
It has been argued that current assessment and test-development practices may pose some problems in the area of treatment research or may not capture all the important variables when considering whether or not an individual has changed. In addition to assessment challenges, there are controversies in relation to experimental methodologies that are best used for evaluating treatment. This seems like a good point in the chapter to discuss concerns regarding methodology because these concerns are also linked to a general uneasiness about the types of assessments that are needed to detect change in clients. With the movement toward establishing empirically supported treatments (Kendall, 1998), there has been an increasing emphasis on the use of the RCT, a method that allows randomization to treatment or control conditions. Such a movement has coincided with the general trend within the field of medicine to establish guidelines for evidence-based medicine. Strengths of the RCT include enhanced causal clarity and the ability to utilize parametric statistics to examine treatment outcomes (see also Chapter 4 in this volume). At the same time, RCTs have limitations (e.g., Barlow & Nock, 2009; Borckardt et al., 2008; Westen, Novotny, & Thompson-Brenner, 2004). Although RCTs can identify significant differences at the group or aggregate level, clients often show substantial variability in response to treatment, and the methodology of the RCT is less well equipped to address this variability (Barlow, 2010). More importantly, although many trials have supported the efficacy of therapeutic approaches for a range of psychological problems, there is still a dearth of knowledge regarding the mechanisms of change in treatment. These issues have long challenged clinicians and researchers. For example, Kiesler (1966) first posed the question of personalized treatment approaches. Although this level of specificity remains a challenge, the question remains whether the current assessment tools and methodologies allow us to move closer to determining what treatments are more effective for certain individuals.

New methodologies may be needed. With any methodology, there will be a need for assessment that is capable of indexing change, a crucial aspect for determining the effects of treatment. Although earlier parts of the chapter have focused on traditional test-development procedures, model testing, and the development of change-sensitive tests, the final part of the chapter will focus on selecting measures in the context of treatment evaluation that might best capture change given the particular strengths of the treatment design. Prior to discussing test selection in the context of treatment, we briefly review the evidence-based treatment (EBT) movement and the treatment evaluation methodologies below that have been supported by this movement.

Empirically Supported Methodologies
Although the movement to evaluate the efficacy of psychosocial treatments occurred many years prior to 1995, the first report on evidence-based practice was issued at that time. This report, issued by the Society of Clinical Psychology Task Force on Promotion and Dissemination of Psychological Procedures, established guidelines for both methodologies and the level of evidence needed for various classifications of “empirical support.” Three categories were established for empirically supported treatments: (1) well-established treatments, (2) probably efficacious treatments, and (3) experimental treatments.
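
As a rough illustration, the three categories and the criteria this section attaches to them (the kind of comparison condition a treatment has outperformed, the number of independent investigatory teams, specified client characteristics, and a written manual) can be sketched as a simple decision rule. The Python sketch below is only a simplification under stated assumptions: the class `EvidenceSummary`, its field names, and the function `classify_support` are hypothetical, and two choices are ours rather than the task force's, namely treating a treatment that lacks specified client characteristics or a manual as experimental, and treating superiority to a placebo from a single team as probably efficacious.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSummary:
    """Hypothetical summary of the trial evidence for one treatment."""
    superior_to_placebo_or_other_treatment: bool
    superior_to_waitlist_or_no_treatment: bool
    independent_teams: int
    client_characteristics_specified: bool
    uses_treatment_manual: bool


def classify_support(e: EvidenceSummary) -> str:
    """Return one of the three category labels for a treatment."""
    # Assumption: both supported categories require well-specified client
    # characteristics and a written manual; otherwise fall back to experimental.
    if not (e.client_characteristics_specified and e.uses_treatment_manual):
        return "experimental"
    # Well-established: superior to a placebo, pill, or another treatment,
    # with evidence from at least two different investigatory teams.
    if e.superior_to_placebo_or_other_treatment and e.independent_teams >= 2:
        return "well-established"
    # Probably efficacious: superiority to a waitlist or no-treatment control
    # from one team suffices (assumption: placebo superiority from one team
    # also lands here).
    shown_superior = (
        e.superior_to_placebo_or_other_treatment
        or e.superior_to_waitlist_or_no_treatment
    )
    if shown_superior and e.independent_teams >= 1:
        return "probably efficacious"
    return "experimental"


if __name__ == "__main__":
    waitlist_only = EvidenceSummary(False, True, 1, True, True)
    print(classify_support(waitlist_only))
```

A rule of this kind says nothing about how large or durable the treatment effect is; it only encodes the comparison and replication thresholds.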


The primary distinction between well-established treatments and probably efficacious treatments is that the former must have been found to be superior to a psychological placebo, pill, or another treatment whereas the latter must prove to be superior only to a waitlist or no-treatment control condition. In addition, the former require evidence from at least two different investigatory teams, whereas the latter require evidence from only one investigatory team. For both types of empirically supported treatments, client characteristics (real clients) should be well specified (e.g., age, sex, ethnicity, diagnosis), and the clinical services should follow written treatment manuals. Experimental treatments are those treatments not yet shown to be at least probably efficacious. This category was intended to capture treatments frequently used in clinical practice but not yet fully evaluated or newly developed ones not yet put to the test of scientific scrutiny.

Most clinicians and researchers are aware of these guidelines as they relate to group comparisons, but single-case designs were also supported in this task force report. Single-case designs have a long history in psychology, and many single-case studies led to advances in clinical applications. Nevertheless, their use has decreased substantially as the National Institutes of Health funded large-scale RCTs. More recently, single-case designs have re-emerged (Barlow & Nock, 2009; Borckardt et al., 2008; Westen & Bradley, 2005). RCTs provide significant causal clarity through the use of randomization, but they tend to answer questions of aggregate change (e.g., do clients in the treatment condition improve more than those in a control condition?). Several large-scale RCTs have established that certain treatments are more effective than credible placebos or alternative treatments, but some of these studies, particularly those that fail to address treatment mediators and moderators, have struggled to describe (a) idiographic change and (b) processes of change in treatment. In comparison to RCTs, single-case designs may lack the causal clarity of RCTs given the lack of randomization, but at the same time, they are helpful in examining idiographic change (see Chapter 3 in this volume). In the following sections, we will outline measurement selection strategies for both nomothetic or group designs as well as idiographic or single-case designs. In addition, we discuss areas of integration that would allow for maximizing treatment evaluation.

possibly during the middle of treatment, at the end of treatment (posttreatment), and then again at a follow-up point. Increasingly, RCTs are incorporating weekly assessments of key outcomes, affording opportunities for more sophisticated evaluations of treatment-related changes (e.g., Kendall, Comer, Marker, Creed, Puliafico, et al., 2009). Such designs may incorporate both short-term and longer-term follow-up assessments. These models expect that individuals have the same traits or behaviors to varying degrees, given that all individuals often meet criteria for the same disorder or related disorders. While the nomothetic method is likely to always be useful as the most rigorous method with which to afford causal conclusions, the nomothetic approach has been criticized. First, RCTs often report aggregate data, but idiographic researchers might be more interested in individuals who responded particularly well versus those who did not change. Fortunately, formal evaluation of treatment moderators is becoming increasingly common in RCT analyses (see Chapter 15 in this volume). A second criticism is that in applied settings, clinicians question the applicability of findings from RCTs to the individuals seen in their clinical settings. In other words, despite RCTs using real clients who meet diagnostic criteria, there is a perception that problems exist in generalizing nomothetic results to an idiographic situation. Although we agree with some of the criticisms of RCTs, these designs nevertheless still have great value in treatment evaluation. There may be room to integrate some of the advantages of idiographic designs into future RCTs.

Idiographic Designs
The intensive study of the individual has netted some of psychology’s important discoveries. The founders of experimental psychology, including Fechner, Wundt, Ebbinghaus, and Pavlov, all studied individual organisms and produced field-changing findings. Allport (1937, 1938) was one of the first to discuss the importance of the idiographic approach with respect to personality. While nomothetic tests gained popularity because of their ease of use and economics, it has been argued that idiographic methods should climb in importance because of their potential for further improving validity, particularly in clinical settings. Although the previously discussed measures can be utilized in both nomothetic and idiographic treatment designs,
Nomothetic Designs special consideration is needed when utilizing an
Nomothetic designs typically assess functioning idiographic design. Idiographic designs might focus
prior to treatment (e.g., pretreatment, baseline), more on the specific problems of the individual,

salek i n, jar ret t, ad ams 113

and measures may be tailored to the intervention process. Testing procedures may also involve direct observation of the trait or behavior. Another consideration is the frequency of measurement. Many idiographic designs may include daily or weekly measurement of symptoms or traits and behavior. In turn, this frequency of measurement has implications for detecting change and the analysis of data.

Although in general frequently assessing a problem can be beneficial to understanding change that is occurring, one common problem when variables are frequently measured over the course of intervention is the problem of autocorrelation. Conventional parametric and nonparametric statistics assume that observations are independent, but data collected in single-case designs are highly dependent. Autocorrelation, sometimes referred to as serial dependence, reflects the fact that one value in time depends on one or more of the preceding values. For example, weather is a natural example of autocorrelation: the weather yesterday is a good predictor of the weather today. In turn, single-case design approaches need statistical analysis tools that take into account factors such as autocorrelation. Recently, statistical approaches have emerged to begin to address such issues (see Borckardt et al., 2008). One of the key assumptions, though, is equal intervals of measurement (e.g., weekly, daily, etc.). In turn, researchers seeking to use single-case design elements need to plan for having evenly spaced assessment intervals. Finally, approaches that use such frequent measurement have the capacity to capture the mechanisms of change in therapy. For example, an RCT that utilizes only a mid-treatment assessment may miss out on important changes in the therapeutic process. In addition, new single-case design approaches also allow for the examination of multivariate process change using cross-lagged correlations (e.g., does variable 1 change before variable 2? If so, what was the lag in change?).

Self-monitoring, natural-language, and word-use approaches can all be useful in gleaning further idiographic information about a client (Dimaggio & Semerari, 2004). See Table 7.5 for examples of idiographic methods.

Physiological Data in Natural Settings
Physiological data may be collected in natural settings and more frequently across the week. Such data have been collected with respect to cardiovascular, respiratory, and hormonal systems. In one study examining panic disorder, 12 subjects wore an apparatus to record ECG as well as signals from an event marker for 12 hours a day across a 2-day period (Freedman, Ianni, Ettedgui, & Puthezhath, 1985). A number of other naturalistic studies have used similar techniques with larger samples to examine the relation between cardiovascular functioning and anxiety (e.g., Hoehn-Saric, McLeod, Funderburk, & Kowalski, 2004). Other research has examined respiratory functioning (Klein, 1993; Ley, 1985) or hyperventilation/hypocapnia (i.e., reduced level of carbon dioxide in the blood). Endocrine measurement has also been examined for anxiety disorders since they are thought to be related to psychological stress. Psychological stress in humans leads to a cascade of hormonal changes regulated by the hypothalamic-pituitary-adrenal (HPA) axis, with an increase in cortisol being the most typically observed finding (Alpers, 2009).

Integration of Nomothetic and Idiographic Elements
We have discussed the pros and cons of nomothetic and idiographic designs. The debate has led many researchers to utilize only one of the approaches, yet we see substantial room for integration. For example, researchers conducting an RCT could easily integrate weekly assessment into the framework of the RCT assessment schedule (e.g., short weekly ratings of symptoms in addition to more comprehensive assessments at pretreatment, midtreatment, and posttreatment). As mentioned, some studies now integrate the two. In addition, researchers could continue with brief weekly assessments in the short period between the initial pretreatment assessment and the start of treatment. For example, in many RCTs, additional assessment sessions and/or other administrative processes (e.g., establishing diagnoses, randomizing to a treatment condition, etc.) may occur in the weeks following the pretreatment assessment. Lack of measurement during this period could result in a loss of information regarding whether change occurs in the weeks prior to the start of treatment. As mentioned earlier, weekly assessment during RCTs would also allow for evaluation of session-by-session change in addition to changes between key comprehensive assessment points (e.g., pretreatment, midtreatment, etc.). Finally, this weekly assessment also allows for an examination of individual change. For example, researchers utilizing both approaches would be able to examine aggregate change but also processes of change at the individual level. Although seen in only a handful of reported RCTs, the merits of more regular assessments and the feasibility of their

Table 7.5 Idiographic Outcome Measures

Progress Notes: A therapist’s notes provide the natural language of individuals.

Client Notes: Therapists may ask clients to keep a small notebook with them, jotting down notes that seem most applicable to them. They may do this to record incidents and what occurred just before and right after the incident.

Client Diaries: Clients may keep daily diaries of their life, and this may serve as an important key to cognitions that are related to a problem and thus a factor to monitor across the intervention.

Writing Assignments: Patients may participate in writing assignments to allow assessment of their personality. This might involve writing narratives about the way they might solve a current problem they have.

Storytelling: Clients may tell stories about family members and how they interact with those family

Self-monitoring: Self-monitoring is a combination of self-report and behavioral observation whereby individuals observe and record behaviors intended to be the targets of an intervention (Kazdin, 1974). Monitoring provides treatment-sensitive data in a wide variety of treatment domains. A client may be assigned to monitor problematic behaviors in preparation for the intervention. Traditionally, the procedure teaches clients to keep a notebook (or electronic device) with them at all times to record behaviors immediately after they occur; only a single response is recorded at a time (Cone, 1999; Barton, Blanchard, & Veazy, 1999).

Natural Language: Verbal and nonverbal communications are the primary media through which therapist and client transfer information and meaning. One method of understanding individuals well is to listen closely to what they say—that is, to study their use of language. A focus on natural language is the essence of numerous narrative approaches to studying, assessing, and intervening with individuals. Two such contemporary approaches to natural language that offer potential insights into improvement in the measurement of psychological constructs are narrative therapy and Pennebaker’s word analysis.
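The autocorrelation and cross-lagged correlation ideas raised in the discussion of idiographic designs can be made concrete with a small sketch. The Python code below is an illustration only, not the procedure of Borckardt et al. (2008): it estimates serial dependence as the Pearson correlation between an evenly spaced series and a lagged copy of itself (one of several common estimators), and it applies the same calculation to two series to ask whether one variable tends to lead the other. The function names and the daily ratings are hypothetical.

```python
from statistics import mean


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag].

    Assumes both series were recorded at the same evenly spaced
    occasions (e.g., daily self-monitoring ratings).
    """
    shortest = min(len(x), len(y))
    if lag < 0 or lag > shortest - 2:
        raise ValueError("lag must be non-negative and leave at least two pairs")
    n = shortest - lag
    xs = x[:n]              # earlier occasions of the first variable
    ys = y[lag:lag + n]     # later occasions of the second variable
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def autocorrelation(x, lag=1):
    """Serial dependence of one series: its correlation with itself `lag` steps later."""
    return lagged_correlation(x, x, lag)


if __name__ == "__main__":
    # Hypothetical daily ratings for a single client (0-10 scales).
    anxiety = [8, 7, 7, 6, 6, 5, 4, 4, 3, 3]
    avoidance = [9, 9, 8, 7, 7, 6, 6, 5, 4, 4]
    print("lag-1 autocorrelation of anxiety:", round(autocorrelation(anxiety), 2))
    for lag in range(0, 4):
        r = lagged_correlation(anxiety, avoidance, lag)
        print(f"anxiety leading avoidance by {lag} day(s): r = {r:.2f}")
```

A high lag-1 value signals that adjacent observations are not independent, which is the condition under which conventional tests can mislead. Comparing the cross-lagged values across lags gives a rough, descriptive answer to the question of which variable changes first; it is not a significance test.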

integration within an RCT would combine to make for a more potent, and informative, RCT.

Although we have discussed modifications to the RCT, single-case designs can also benefit from nomothetic approaches. For example, while the single-case researcher may be interested in individual-level change, standard pre–post analyses can also be conducted. Although such pre–post analyses often involve a small sample size, nonparametric equivalents to t tests and repeated-measures ANOVA, such as Wilcoxon and Friedman tests, can be utilized in such circumstances (see ter Kuile et al., 2009, for an example). Overall, we see substantial room for integrating the strengths of both methodologies.

We have focused on change-sensitive measures which may lead one to believe that we are focusing on easily changeable behaviors, but we do not believe that the field should move away from treating difficult conditions or even traits (see Tang, DeRubeis, Hollon, Amsterdam, Shelton, & Schalet, 2009). That is, when we talk about change-sensitive tests, many of these tests are designed to examine behavior, but we may also be interested in whether we can effectively change broader problems. Thus, it might be helpful to determine if we can alter, for example, depression, psychotic symptoms, or personality traits such as “openness to experience,” characteristics that are often thought to be stable. Examination of such changes might be looked at in terms of cognition and behavior. With respect to openness, one might measure the extent to which the client agrees to meet other people, try new restaurants, or see new movies; these activities are open and outside of the norm for the individual. Similarly, there may be changes that are possible with respect to difficult personalities in youth and adults such as interpersonal callousness (see Salekin, 2002; Salekin, Lester, & Sellers, 2012; Salekin et al., 2012; Salekin, Worley, & Grimes, 2010; see also Hawes & Dadds, 2007).

When seeking these change-sensitive measures, it will be important to keep in mind that we want our measures to accurately tap the constructs we intend for them to tap, as well as for the measures to have meaning along the dimension (at high levels of depression, the person is actually depressed) and that what we are calling the measure (e.g., Dep
Scale-Revised) is in fact what the measure is indexing (e.g., self-esteem). Correspondingly, we would want the change to be clinically significant (Jacobson & Truax, 1991; Kendall, 1998) and not reflect issues with measurement. In recent years, there has been some concern about arbitrary metrics. If measures are not sound, it can be very problematic for research on interventions. As Kazdin (2006) puts it, “Arbitrary metrics and related points about them. . . if applied, shake key pillars of the foundations of EBTs and greatly qualify the conclusions that can be made about these treatments” (p. 45).

A solution to this issue can be found in research advice previously published in the 1970s and 1980s. Specifically, researchers during this time were encouraged to evaluate whether changes in treatment were socially valid; this meant that researchers were asked to focus on domains that were important to individuals in everyday life. Questions were further asked as to whether changes following treatment actually made a difference in clients’ lives. Specifically, did the clients notice any difference following treatment, and did the intervention also make a difference to those with whom the clients interacted (e.g., relatives and friends; Wolf, 1978)? Although there are likely many ways to accomplish this clinical research goal, one method previously used involves naïve observers who rated the performance of clients prior to and after treatment (Kazdin, 1977). Technology may further assist with this needed component of research studies, where family and friends will be able to provide feedback on issues pertaining to clinical change.

Context and Assessment Technology: Needed Advancements to Keep Pace with Psychotherapy Innovations
Technology is changing the way that clinicians and researchers perceive assessment and treatment with their clients. The prevalent use of computers, personal digital assistants, and mobile phones could very well help researchers examine progress in therapy from nomothetic and idiographic perspectives. Given our need to more closely test the theoretical models of intervention, such devices offer unique methods of measuring behavior, assessing outcomes, and delivering treatments that are easily accessible, thereby increasing response frequency and accuracy in comparison to paper measures. To underscore this point, measuring individuals in natural settings at specific moments, often referred to as ecological momentary assessment (EMA; see Chapter 11 in this volume), or intensive repeated measures in naturalistic settings (IRM-NS), would allow for the assessment of mood, thoughts, and behaviors in the moment while offering a more comprehensive data collection than measures administered during a therapy session (Heron & Smyth, 2010; Moskowitz, Russell, Sadikaj, & Sutton, 2009; Palermo, 2008; Trull & Ebner-Priemer, 2009; Wenze & Miller, 2010). Also, reminders can be sent to patients about homework assignments, and other cues can be given. While paper-and-pencil tests are likely to continue to serve a purpose, critics have argued that in-therapy self-report measures are subject to recall biases, semantic memory, heuristics, and preexisting beliefs, which these technological advances could reduce (Moskowitz et al., 2009). Technology has advanced and offers hope for better measurement, yet several aspects of using technology will continue to require further investigation for appropriate use, including psychometric properties, method choice, and technological variables.

Overall, assessments that use computers, personal digital assistants, and mobile phones will likely be important as they are tangible and adapt clinical measures to the technological progression of society. Being familiar with these devices and their use in therapy offers clinicians a unique method of measurement. Further, research needs to be conducted to provide guidelines for implementing therapy via technology, but the potential for enhanced clinical utility is present (see Table 7.6). Such research may begin to address some of the issues raised by Mischel (1968), who ignited a controversy in the late 1960s about the extent to which the environment and context had to do with an individual’s personality (see also Kantor, 1924; Lewin, 1935; Murray, 1951; Peterson, 1968). When measuring change, a method that combines event-specific reactions and the frequencies of the events themselves could also prove most fruitful. Such integration may converge better with what is assessed by the overall test battery. This may provide additional information on person–situation mixtures requiring careful consideration (e.g., verbal aggression may be charged only

Table 7.6 Technological Advances
Computers
Smart phones
Personal digital assistants
Global positioning systems
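The requirement that change be clinically significant and not merely an artifact of measurement (Jacobson & Truax, 1991) can be illustrated with the reliable change index, one component of that approach. The Python sketch below is a minimal illustration under stated assumptions: the function name, the scores, the normative standard deviation, and the reliability coefficient are all hypothetical, and the conventional 1.96 cutoff is used to flag change unlikely to reflect measurement error alone.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Reliable change index: (post - pre) / S_diff.

    S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - reliability) is the
    standard error of measurement for the instrument.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    se_measurement = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * se_measurement ** 2)
    return (post - pre) / s_diff


if __name__ == "__main__":
    # Hypothetical client: symptom score drops from 30 to 18 on a measure
    # with a normative SD of 7.5 and a test-retest reliability of .80.
    rci = reliable_change_index(pre=30, post=18, sd=7.5, reliability=0.80)
    print(f"RCI = {rci:.2f}")
    print("reliable change" if abs(rci) > 1.96 else "within measurement error")
```

A reliable change by this criterion says only that the difference exceeds what unreliability would be expected to produce; whether the client has moved into a functional range, and whether the metric itself is meaningful, are separate questions.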

by one parent in a certain context) and could help with concerns regarding arbitrary metrics.

Concluding Comments: Integration and Future Directions
Improving the assessment of psychological variables will be an important future task for psychology. This chapter covered a number of important topics such as test construction, integration of nomological systems of classification, the importance of biology, measurement of change, development of new measures, the integration of nomothetic and idiographic designs, and technological advances, as well as a brief discussion on arbitrary metrics. We see value in being more idiographic in our research, and technology is offering us more opportunities to do so. Although most psychological researchers have been trained in group comparison designs and have relied primarily on them, exciting advances have been made in the use of idiographic methodologies, such as the single-case experimental design (see Barlow, Nock, & Hersen, 2009).

We see value in placing more emphasis on an integration of nomothetic and idiographic strategies that can be used in both clinical and basic science settings. As noted by Barlow and Kazdin, in clinical science, having established the effectiveness of a particular independent variable (e.g., an intervention for a specific form of psychopathology), one could then carry on with more idiographic efforts tracking down sources of inter-subject variability and isolating factors responsible for this variability (see also Kazdin & Nock, 2003; Kendall, 1998). This might allow us to better assess change in psychological therapy. Necessary alterations in the intervention protocols to effectively address client variability could be further tested, once again idiographically, and incorporated into the treatments. Researchers in basic science laboratories could undertake similar strategies and avoid tolerating the large error terms. By incorporating a number of the innovations in the assessment field, psychological science, both basic and applied, could make significant strides in psychological therapy research and practice in the years to come.

Note
1. Aside from integrating information across disciplines, integration of information across sources has also become an important consideration. Although beyond the focus of this chapter, we see the need to better understand and integrate information across sources as another important step for researchers to take in the assessment domain (see De Los Reyes & Kazdin, 2004, 2005).

References
Achenbach, T. M. (2001a). Manual for the Child Behavior Checklist 4–18 and 2001 Profile. Burlington: University of Vermont, Department of Psychiatry.
Allen, L. B., McHugh, R. K., & Barlow, D. H. (2008). Emotional disorders: A unified protocol. In D. H. Barlow (Ed.), Clinical handbook of psychological disorders: A step-by-step treatment manual (4th ed., pp. 216–249). New York: Guilford Press.
Allport, G. W. (1937). Personality: A psychological interpretation. New York: Henry Holt.
Allport, G. W. (1938). Editorial. Journal of Abnormal and Social Psychology, 33, 3–13.
Alpers, G. W. (2009). Ambulatory assessment in panic disorder and specific phobia. Psychological Assessment, 21, 476–485.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavior change. Psychological Review, 84, 191–215.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Barlow, D. H. (2010). Negative effects from psychological treatments: A perspective. American Psychologist, 65, 13–20.
Barlow, D. H., & Nock, M. K. (2009). Why can’t we be more idiographic in our research? Perspectives on Psychological Science, 4, 19–21.
Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single-case experimental designs: Strategies for studying behavior change. New York: Pearson.
Barton, K. A., Blanchard, E. B., & Veazy, C. (1999). Self-monitoring as an assessment strategy in behavioral medicine. Psychological Assessment, 11, 490–497.
Beck, A. T. (1995). Cognitive therapy: Past, present, and future. In M. J. Mahoney (Ed.), Cognitive and constructive psychotherapies: Theory, research, and practice (pp. 29–40). New York: Springer Publishing.
Bergin, A. E. (1966). Some implications of psychotherapy research for therapeutic practice. Journal of Abnormal Psychology, 71, 235–246.
Borckardt, J. J., Nash, M. R., Murphy, M. D., Moore, M., Shaw, D., & O’Neil, P. (2008). Clinical practice as natural laboratory for psychotherapy research. American Psychologist, 63,
Brown, T. A., & Barlow, D. H. (2009). A proposal for a dimensional classification system based on the shared features of the DSM-IV anxiety and mood disorders: Implications for assessment and treatment. Psychological Assessment, 21, 256–271.
Butcher, J. N. (1998). Butcher Treatment Planning Inventory. San Antonio, TX: The Psychological Corporation.
Carr, L., Henderson, J., & Nigg, J. T. (2010). Cognitive control and attentional selection in adolescents with ADHD versus ADD. Journal of Clinical Child & Adolescent Psychology, 39, 726–740.
Caspi, A., & Shiner, R. L. (2006). Personality development. In W. Damon & R. Lerner (Series Eds.) & N. Eisenberg (Vol. Ed.), Handbook of child psychology: Vol. 3. Social, emotional and personality development (6th ed., pp. 300–365). Hoboken, NJ: Wiley.
Cicchetti, D. (1984). The emergence of developmental psychopathology. Child Development, 55, 1–6.
Clark, L. A., & Watson, D. B. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319.
Cloud, J. (2010). Why genes aren’t our destiny: The new field of epigenetics is showing how the environment and your
choices can influence your genetic code—and that of your kids. Time, January, 48–53.
Cone, J. D. (1999). Introduction to the special section on self-monitoring: A major assessment method in clinical psychology. Psychological Assessment, 11, 411–414.
Cook, T., & Campbell, D. (1979). Quasi-experimentation. Chicago, IL: Rand McNally.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
De Los Reyes, A., & Kazdin, A. E. (2004). Measuring informant discrepancies in clinical child research. Psychological Assessment, 16, 330–334.
De Los Reyes, A., & Kazdin, A. E. (2005). Informant discrepancies in the assessment of childhood psychopathology: A critical review, theoretical framework, and recommendations for further study. Psychological Bulletin, 131, 483–509.
Dimaggio, G., & Semerari, A. (2004). Disorganized narratives. In L. E. Angus & J. McLeod (Eds.), Handbook of narrative and psychotherapy (pp. 263–282). Thousand Oaks, CA: Sage.
Eisenberg, N., Sadovsky, A., Spinrad, T. L., Fabes, R. A., Losoya, S. H., Valiente, C., . . . Shepard, S. A. (2005). The relations of problem behavior status to children’s negative emotionality, effortful control, and impulsivity: Concurrent relations and prediction of change. Developmental Psychology, 41, 193–211.
Ellis, M. V., & Blustein, D. L. (1991). Developing and using educational and psychological tests and measures: The unificationist perspective. Journal of Counseling & Development, 69, 550–555.
Freedman, R. R., Ianni, P., Ettedgui, E., & Puthezhath, N. (1985). Ambulatory monitoring of panic disorder. Archives of General Psychiatry, 42, 244–248.
Frick, P. J. (2004). Integrating research on temperament and childhood psychopathology: Its pitfalls and promise. Journal of Clinical Child and Adolescent Psychology, 33, 2–7.
Frick, P. J., & Morris, A. (2004). Temperament and developmental pathways to conduct problems. Journal of Clinical Child and Adolescent Psychology, 33(1), 54–68.
Hathaway, S. R., & McKinley, J. C. (1941). The Minnesota Multiphasic Personality Inventory manual. New York: Psychological Corporation.
Hawes, D. J., & Dadds, M. R. (2007). Stability and malleability of callous unemotional traits during treatment for childhood conduct problems. Journal of Clinical Child and Adolescent Psychology, 36, 347–355.
Heron, K. E., & Smyth, J. M. (2010). Ecological momentary interventions: Incorporating mobile technology into psychosocial and health behaviour treatments. British Journal of Health Psychology, 15, 1–39.
Hoehn-Saric, R., McLeod, D. R., Funderburk, F., & Kowalski, P. (2004). Somatic symptoms and physiologic responses in generalized anxiety disorder and panic disorder. Archives of General Psychiatry, 61, 913–921.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.
Kantor, J. R. (1924). Principles of psychology (Vol. 1). Bloomington, IN: Principia Press.
Kazdin, A. E. (1974). Self monitoring and behavior change. In M. J. Mahoney & C. E. Thoresen (Eds.), Self-control: Power to the person (pp. 218–246). Monterey, CA: Brooks-Cole.
Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427–452.
Kazdin, A. E. (2006). Arbitrary metrics: Implications for identifying evidence-based treatments. American Psychologist, 61, 42–49.
Kazdin, A. E., & Nock, M. K. (2003). Delineating mechanisms of change in child and adolescent therapy: Methodological issues and research recommendations. Journal of Child Psychology and Psychiatry, 44, 1116–1129.
Kelly, M. A. R., Roberts, J. E., & Ciesla, J. A. (2005). Sudden gains in cognitive behavioral treatment for depression: When do they occur and do they matter? Behavior Research and Therapy, 43, 703–714.
Kendall, P. C. (1998). Empirically supported psychological therapies. Journal of Consulting and Clinical Psychology, 66, 3–7.
Kendall, P. C., Comer, J. S., Marker, C. D., Creed, T. A., Puliafico, A. C., Hughes, A. A., Martin, E., Suveg, C., & Hudson, J. L. (2009). In-session exposure tasks and therapeutic alliance across the treatment of childhood anxiety disorders. Journal of Consulting and Clinical Psychology, 77, 517–525.
Kiesler, D. J. (1966). Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 65, 100–136.
Klein, D. F. (1993). False suffocation alarms, spontaneous panics, and related conditions: An integrative hypothesis. Archives of General Psychiatry, 50, 306–317.
Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75–97). Hillsdale, NJ: Lawrence Erlbaum Associates Publishers.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41, 1183–1192.
Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
Ley, R. (1985). Agoraphobia, the panic attack and the hyperventilation syndrome. Behaviour Research and Therapy, 23, 79–81.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694.
Meier, S. T. (2008). Measuring change in counseling and psychotherapy. New York: Guilford Press.
Mischel, W. (1968). Personality and assessment. Hoboken, NJ: John Wiley & Sons Inc.
Moskowitz, D. S., Russell, J. J., Sadikaj, G., & Sutton, R. (2009). Measuring people intensively. Canadian Psychology, 50, 131–140.
Mumma, G. H. (2004). Validation of idiosyncratic cognitive schema in cognitive case formulation: An intraindividual idiographic approach. Psychological Assessment, 16, 211–230.
Murray, H. A. (1951). Uses of the Thematic Apperception Test. The American Journal of Psychiatry, 107, 577–581.
Nigg, J. T. (2006). Temperament and developmental psychopathology. Journal of Child Psychology and Psychiatry, 47, 395–422.
Palermo, T. M. (2008). Editorial: Section on innovations in technology in measurement, assessment, and intervention. Journal of Pediatric Psychology, 33, 35–38.
Peterson, D. R. (1968). The clinical study of social behavior. East Norwalk, CT: Appleton-Century-Crofts.
Persons, J. B. (1989). Cognitive Therapy in Practice: A Case Formulation Approach. New York: W.W. Norton.
Persons, J. B., Davidson, J., & Tompkins, M. A. (2001). Essential Components of Cognitive-Behavior Therapy for Depression. Washington: American Psychological Association.
Rothbart, M. K. (2004). Commentary: Differentiated measures of temperament and multiple pathways to childhood disorders. Journal of Clinical Child and Adolescent Psychology, 33, 82–87.
Rutter, M. (1987). Temperament, personality and personality disorder. British Journal of Psychiatry, 150, 443–458.
Salekin, R. T. (2002). Psychopathy and therapeutic pessimism: Clinical lore or clinical reality? Clinical Psychology Review, 22, 79–112.
Salekin, R. T. (2009). Psychopathology and assessment: Contributing knowledge to science and practice. Journal of Psychopathology and Behavioral Assessment, 31, 1–6.
Salekin, R. T., & Averett, C. A. (2008). Personality in childhood and adolescence. In M. Hersen & A. M. Gross (Eds.), Handbook of clinical psychology (Vol. 2): Children and adolescents (pp. 351–385). Hoboken, NJ: John Wiley & Sons.
Salekin, R. T., Lester, W. S., & Sellers, M. K. (2012). Mental sets in conduct problem youth with psychopathic features: Entity versus incremental theories of intelligence. Law and Human Behavior, 36, 283–292.
Salekin, R. T., Tippey, J. G., & Allen, A. D. (2012). Treatment of conduct problem youth with interpersonal callous traits using mental models: Measurement of risk and change. Behavioral Sciences and the Law, 30, 470–486.
Salekin, R. T., Worley, C., & Grimes, R. D. (2010). Treatment of psychopathy: A review and brief introduction to the mental model approach for psychopathy. Behavioral Sciences and the Law, 28, 235–266.
Sanislow, C. A., Pine, D. S., Quinn, K. J., Kozak, M. J., Garvey, M. A., Heinssen, R. K., Wang, P. S., & Cuthbert, B. N. (2010). Developing constructs for psychopathology research: Research domain criteria. Journal of Abnormal Psychology, 119, 631–639.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752–760.
Tackett, J. L. (2006). Evaluating models of the personality-psychopathology relationship in children and adolescents. Clinical Psychology Review, 26, 584–599.
Tackett, J. L. (2010). Measurement and assessment of child and adolescent personality pathology: Introduction to the special issue. Journal of Psychopathology and Behavioral Assessment, 32, 463–466.
Tang, T. Z., DeRubeis, R. J., Hollon, S. D., Amsterdam, J., Shelton, R., & Schalet, B. (2009). Personality change during depression treatment. Archives of General Psychiatry, 66, 1322–1330.
ter Kuile, M. M., Bulte, I., Weijenborg, P. T., Beekman, A., Melles, R., & Onghena, P. (2009). Therapist-aided exposure for women with lifelong vaginismus: A replicated single-case design. Journal of Consulting and Clinical Psychology, 77, 149–159.
Trull, T. J., & Ebner-Priemer, U. W. (2009). Using experience sampling methods/ecological momentary assessment (ESM/EMA) in clinical assessment and clinical research: Introduction to the special section. Psychological Assessment, 21, 457–462.
Vermeersch, D. A., Whipple, J. L., Lambert, M. J., Hawkins, E. J., Burchfield, C. M., & Okiishi, J. C. (2004). Outcome Questionnaire: Is it sensitive to changes in counseling center clients? Journal of Counseling Psychology, 51, 38–49.
Volk, H. E., Todorov, A. A., Hay, D. A., & Todd, R. D. (2009). Simple identification of complex ADHD subtypes using current symptom counts. Journal of the American Academy of Child and Adolescent Psychiatry, 48, 441–450.
Watson, D., Kotov, R., & Gamez, W. (2006). Basic dimensions of temperament in relation to personality and psychopathology. In R. F. Krueger & J. L. Tackett (Eds.), Personality and psychopathology (pp. 7–38). New York: Guilford Press.
Wenze, S. J., & Miller, I. W. (2010). Use of ecological momentary assessment in mood disorders research. Clinical Psychology Review, 30, 794–804.
Westen, D., & Bradley, R. (2005). Empirically supported complexity: Rethinking evidence-based practice in psychotherapy. Current Directions in Psychological Science, 14, 266–271.
Westen, D., Novotny, C. M., & Thompson-Brenner, H. (2004). The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting in controlled clinical trials. Psychological Bulletin, 130, 631–663.
Widiger, T. A., & Trull, T. J. (2007). Plate tectonics in the classification of personality disorders. American Psychologist, 62, 71–83.
Wolf, M. M. (1978). Social validity: The case of subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214.


Observational Coding Strategies

David J. Hawes, Mark R. Dadds, and Dave S. Pasalich

Observational coding involves classifying and quantifying verbal and nonverbal behavioral events
or psychological states, irrespective of participants’ reports or perceptions. Such coding has been
widely used to index the dimensions of diagnostic symptoms associated with various disorders, the
contextual dynamics of functional importance to these disorders, and individual differences (e.g., child
temperament) and internal processes (e.g., cognitive biases) implicated in pathways to these disorders.
We provide an overview of the applications of observational coding strategies in clinical research, and
key principles in the design and implementation of observational strategies. Examples are drawn from
programs of research that demonstrate the theory-driven use of observation, often in the context of
multimethod measurement. We also focus specifically on observational measurement in intervention
designs, and make recommendations regarding the role of observation in addressing research
challenges associated with emerging models of psychopathology.
Key Words: Observational coding, direct observation, parent–child interaction, relationship

Introduction

The rich evidence base available to us as clinical practitioners and behavioral scientists owes much to direct observation. Observational coding involves classifying and quantifying verbal and nonverbal (e.g., motor actions, expressed affect) behavioral events or psychological states, irrespective of participants’ reports or perceptions (Dishion & Granic, 2004; Heyman & Slep, 2004). In clinical psychology and psychiatry, direct observation has long been an essential diagnostic and therapeutic tool. Although issues related to the use of observation in clinical practice versus clinical science overlap considerably, the primary focus of this chapter is on the latter.

Observational coding is commonly used to measure variables related to psychopathology and dysfunction. First, coding can provide valuable data on the dimensions of diagnostic symptoms associated with disorders. Behavioral avoidance tests, for example, have been used to assess forms of anxious avoidance associated with various anxiety disorders, including specific phobia, agoraphobia, and obsessive-compulsive disorder. In such a test, participants may be exposed to their feared stimuli under controlled settings and observed while they complete as many steps as possible in a graduated series. Ost and colleagues (2001), for example, asked youth with a snake phobia to enter a room where a live snake was enclosed in a glass container and to remove the lid and pick up the snake and hold it for 10 seconds. The percentage of steps the youth accomplished was then observed and recorded.

Second, observation is widely used to measure the contextual variables associated with mental health problems, and in turn to capture the qualities of social contexts that maintain and amplify these problems over time. In such a capacity, observation
allows for the systematic examination of functional relationships between problem behavior and the environment in which it occurs. It is for this reason that observational methods have played an integral role in the early development of behavioral interventions. Indeed, much of what we currently know about risk and protective processes within the relationship contexts of couples, parents and children, peers, and siblings has its origins in data coded from direct observation. As discussed later, observational coding of the features and contextual interactions continues to play a key role in the scientific investigation of problem development, treatment effects, and mechanisms of behavior change.

Third, observational coding is commonly used to assess participant characteristics or individual differences that—while not symptoms of psychopathology per se—may be implicated in pathways to psychopathology. A common example is the use of observation to code dimensions of child temperament, as in Caspi’s (2000) longitudinal study of personality continuities from childhood to adulthood. In this study the temperaments of 3-year-olds, coded from direct observation, were found to predict psychological disorders, including depression, antisocial personality disorder, and alcohol dependence, in those individuals at 21 years of age. Driven by emerging models of temperament, lab-based observational paradigms have been developed to index temperament dimensions (e.g., Goldsmith & Rothbart, 1996; Kochanska, Murray, & Harlan, 2000). Such paradigms have featured increasingly in clinical research as interest in developmental models of psychopathology has grown.

Fourth, observational coding has at various times been used as a means to indirectly index covert, internal events—typically cognitive processes—that are not amenable to measurement via direct observation or self-report. Ehrmantrout and colleagues (2011), for example, videotaped depressed adolescents and their parents in problem-solving interactions, and coded the affect expressed by parents. The adolescents also viewed these recordings in a video-mediated recall procedure in which they were required to identify their parents’ emotions in 20-second intervals. By analyzing discrepancies between adolescents’ subjective ratings of parent affect and the independently coded observations of this affect, researchers were able to identify the emotion recognition deficits that characterized these adolescents.

Direct observation has been described as the most effective method available for obtaining ecologically valid information about behavior (Barkley, 1997) and carries distinct advantages over other strategies in the empirical study of psychopathology and its treatment. These advantages have become increasingly apparent as evidence regarding the nature of psychopathology has emerged. In the past two decades, experimental research has focused increasingly on cognitive aspects of psychopathology, with findings emphasizing various biases in information processing across a range of clinical populations (e.g., Dobson & Kendall, 1993). For example, evidence shows that the patterns of overreactive parenting that are associated with childhood conduct problems are also associated with parental deficits in the encoding and appraisal of child behavior. Such parents appear more likely to notice negative, relative to positive, child behavior, and to view neutral or positive child behavior as problematic (e.g., Lorber, O’Leary, & Kendziora, 2003). At the same time, there is growing evidence that the behaviors of functional importance to many disorders occur in overlearned patterns (Dishion & Stormshak, 2007; Gottman, 1998). That is, as a result of frequent repetition they are often performed somewhat automatically, outside of awareness. For example, the reactive actions of distressed couples—often corroding relationship quality through escalating cycles of escape-conditioning—are likely to be enacted with the same automaticity as other overlearned behaviors such as driving a car or reading. The implication of such evidence for the topic at hand is that the cognitive and behavioral processes that underlie various clinical problems have the potential to confound self-reports of those problems and their components. By contrast, data coded from direct observation are neither limited by participants’ explicit awareness of behaviors nor contaminated by the perceptual biases that may color their reports. Such advantages may often offset the significant costs of adopting observational measures in clinical research, and have often been cited by researchers who have done so (e.g., Sheeber et al., 2007).

The importance of observational strategies to clinical research can also be appreciated in relation to the emphasis on the social environment in many research questions concerning the development and treatment of psychopathology. Causal models of psychopathology have long emphasized environmental risk factors in the prediction of dysfunction, and decades of research have shown that the most proximal of these often operate through social mechanisms. Historically, this research has focused primarily on family dynamics and has produced
models that have been translated into widely disseminated interventions. It is in the field of childhood conduct problems that family dynamics have probably been investigated most extensively (see Hawes & Dadds, 2005a, for a review), and the most significant examples of this research (e.g., Patterson, 1982) have often relied heavily on observational methods. More recently, observational data have informed increasingly sophisticated conceptualizations of family interactions in pathways to child anxiety (e.g., Dadds, Barrett, Rapee, & Ryan, 1996; Dubi et al., 2008; Hudson, Comer, & Kendall, 2008; Suveg, Sood, Barmish, Tiwari, Hudson, & Kendall, 2008).

Research into peer relationships has provided evidence that contexts outside of the family play a significant role in pathways to a range of disorders across childhood and adolescence. In a review of the emerging evidence, Dishion and Tipsord (2011) identified “peer contagion” as a mechanism through which environments shape and amplify not only problems such as aggression and drug use, but also depression and eating disorders. Importantly, progress in this area has demonstrated that the specific dynamics through which peers confer risk are often subtle and not detectable through methods other than the coding of the behavior stream (Dishion & Granic, 2004). It is clear that direct observation provides a unique window on individuals in the context of—and in relation to—their ecologies, and is a strategy that remains essential to answering many of the questions surrounding the roles that environments play in problem trajectories.

Preliminary Issues in the Use of Observation

Accessibility

In fields where dysfunction is associated with overt or “public” behavioral events, observational methods have proliferated. A noteworthy example is the field of childhood externalizing problems. Diagnostic features of presentations such as oppositional defiant disorder (e.g., often loses temper) can be readily operationalized in terms of observable behavior (e.g., crying, kicking), as can the contextual variables that often accompany them (e.g., aversive and ineffective parenting practices such as criticism and vague instructions). Established systems for observational coding in this area are many and have been subject to psychometric research (see Aspland & Gardner, 2003). Likewise, an extensive range of observational methods have been developed to investigate the relationship problems of distressed couples (see Kerig & Baucom, 2004). Conversely, there are numerous clinical problems—ranging from psychotic disorders to common mood and anxiety disorders—that are characterized by largely internal or “private” symptoms (e.g., hallucinations, feelings of hopelessness and fear). Research in such areas has relied far less on observational measurement, and relatively few observational coding systems have been developed specifically to investigate such problems.

Why are observational strategies used to study some disorders more than others? The difference can be understood in terms of accessibility—the notion that some events are more amenable to direct observation than others. Accessibility has been defined as the likelihood that an environmental or behavioral event exists, coupled with the difficulty of reliably observing it (Johnston & Pennypacker, 1993). It is self-evident why fields of research concerned with those more public events have typically favored observational methods more than those concerned with events and processes that are largely private. However, public behavior may nonetheless be associated with topographic characteristics that present challenges related to accessibility. Some behaviors may occur too infrequently (e.g., seizures) or too briefly (e.g., some motor tics) to be observed reliably. Alternatively, the contexts in which they occur may not be conducive to observation (e.g., stealing). This issue is seen in “setting events”—events that are functionally related to behaviors of interest, yet occur in contexts that are distally removed from those in which the behaviors occur. For example, incidents of bullying experienced by a child while traveling to school may function as an important setting event for that child’s aggression toward peers in the school playground later in the day. Accessibility is a crucial concern and should be considered carefully when deciding upon the use of direct observation as a measurement strategy. However, such a decision should also be informed by an awareness of the strategies by which issues of accessibility may be minimized—as addressed throughout the following sections.

Accessibility is not an inherent characteristic of specific events or behaviors, but rather is determined by multiple variables associated with the likelihood that an observer will be able to detect them. As such, accessibility may be an issue when established observational systems are not available to assess particular constructs, or simply when appropriate training in those systems is not available to coders. Likewise, observational procedures themselves
may compromise accessibility through participant reactivity. For example, family members may react to the physical presence of observers in the family home by inhibiting certain behaviors that may otherwise feature in typical interactions.

Representativeness

For observational data on a specific set of behaviors to inform empirical conclusions, a representative sample of that behavior must be collected. In other words, observational assessment aims to capture the typical behavior of participants in the setting of interest. The challenge can be likened to the process of conducting a clinical assessment interview. It may be easy to get a client talking about the issues he or she is presenting with; however, only by asking the right questions, and in the right way, will an interview elicit the specific information necessary to formulate a reliable diagnosis. Likewise, the potential for observational measurement to capture the typical behavior of an individual will depend on considerations such as how much of that behavior is sampled, on how many occasions it is sampled, and under what conditions the sampling occurs. For example, early observational research into marital problems found that the interactions of distressed couples were indistinguishable from those of nondistressed couples when using standardized analogue tasks (e.g., Birchler, Weiss, & Vincent, 1975). It was only when such couples were observed discussing sensitive issues in their own relationships that features uniquely associated with relationship quality could be detected (e.g., Gottman, 1979).

Settings for Observational Measurement

The conditions under which behavior is observed in clinical research typically span a continuum ranging from unconstrained naturalistic observation to tightly controlled analogue tasks in laboratory or clinic settings (Hartmann & Wood, 1990). Naturalistic observation typically refers to coding behavior outside of the laboratory, in the “real world” (see Dishion & Granic, 2004, for a review). Common locations for such observation include the family home, classrooms, and schoolyards; however, such observation may in principle be conducted anywhere. Given the presence of either an observer or camera, and the ethical requirement that participants are aware that their behavior is being recorded, consumers of such research have often queried the extent to which naturalistic observation does truly capture real-world behavior. Social desirability is a common concern, considering the range of behaviors that are likely to be of interest to clinical researchers (e.g., harsh parenting practices, couples’ hostility). Such issues have also been examined empirically in measurement research (see Gardner, 2000, for a review of parent–child interactions). Experimental studies have been conducted in which the intrusiveness of recording equipment has been manipulated (e.g., Jacob, Tennenbaum, Seilhamer, Bargiel, & Sharon, 1994) or participants have been instructed to intentionally fake particular types of behavior (e.g., Johnson & Bolstad, 1975; Patterson, 1982). Rather than suggesting that participants inhibit socially undesirable behavior during naturalistic observation, such studies have provided impressive support for the reliability of such methods. Data from these studies indicate that family interactions vary little based on the presence or absence of an observer, and that although participants are easily able to fake “bad” (e.g., critical and reactive couples communication), they are generally unable to fake “good” (e.g., mutually supportive couples communication).

What about the impact that location may have on observed participant behavior? This question has been the subject of measurement research, with studies comparing participant behavior in home versus laboratory/clinic settings. Mothers and children have been found to exhibit higher rates of various behaviors in the clinic setting compared to home (Zangwill & Kniskern, 1982). There are also some data to suggest that these respective settings may differ in the extent to which they bias participants toward positive versus negative behavior. Findings regarding the direction of such effects vary somewhat across different populations and paradigms. For example, when observed in the laboratory, mothers have been found to be more active and responsive to their infants and more interactive and helpful and less restrictive in parent–child observations (Jacob, Tennenbaum, & Krahn, 1987; Moustakas, Sigel, & Schalock, 1956) compared to in the home. Married couples have likewise been found to engage in more positive emotional interactions in the laboratory setting compared to the home (Gottman, 1979, 1980), whereas families have been found to exhibit more positive interactions during decision-making tasks conducted in the home setting (O’Rourke, 1963).

Naturalistic Observation

Clinical psychologists have often aimed to observe behavior in its natural context where possible, based on the assumption that in vivo observation potentially provides the highest-quality data
(Cone, 1999). Naturalistic observation has often been utilized in studies that test the therapeutic effects of modifying contextual variables on participant outcomes (e.g., Raver et al., 2009), and those concerned with the specific contexts in which behaviors occur. For example, Snyder and colleagues (2008) used a multimethod strategy to assess child antisocial behavior in each of three social ecologies (home, classroom, and school playground) to index its cross-context stability. School playground data were collected by observing the behavior of participating children in this setting on 10 separate occasions between the ages of 5.3 and 6.8 years. On each occasion, the behavior of each child was observed and coded for 5 minutes in relation to peer aggression and covert antisocial behavior.

The major advantage to naturalistic observation is the likelihood that data will generalize to the real world (Mash & Terdal, 1997). However, as noted by Dishion and Granic (2004), much of what occurs in the real world does not provide informative data on the functional dynamics of psychopathology and adjustment. The authors point out that observing discordant couples or families throughout the course of their day is likely to reveal little about the interpersonal process related to their conflict, as the interactions associated with conflict are often avoided (for that reason) by the individuals involved. As such, naturalistic observation often requires researchers to place some restrictions or structure on the behavior of those individuals being observed. The aim of this structure is generally to elicit the most meaningful behaviors. Such restrictions may be somewhat minimal, involving home visits during which a parent and child are asked to engage in unstructured play using age-appropriate toys for a set period of time. Alternatively, the observation may be scheduled around events in a family’s daily routine that are “high risk” for behaviors of interest, as is often the case with mealtimes for young oppositional children. In either case, at least minimal restrictions are likely to be imposed by the researcher, such as asking family members to remain in two adjacent rooms, and leaving televisions and telephones turned off (e.g., Maerov, Brummet, & Reid, 1978).

Research laboratories are often the preferred settings for scheduling carefully controlled observations, allowing for access to equipment such as digital recording facilities. However, naturalistic observation may also involve a high degree of structure. An example of this is the observational assessment used by Trentacosta and colleagues (2008) to examine relations among cumulative risk, nurturant and involved parenting, and behavior problems across early childhood. During a home visit, children and caregivers were videotaped in a series of highly structured tasks designed to sample common family scenarios and elicit a range of child and parent behaviors. These included a cleanup task (5 minutes), a delay of gratification task (5 minutes), teaching tasks (3 minutes each), the presentation of two inhibition-inducing toys (2 minutes each), and a meal preparation and lunch task (20 minutes). Another example is the structured family discussion paradigm used by Dishion and Bullock (2001). Once again, during a home visit parents and adolescents were videotaped engaging in discussions on a series of set topics. Not only were the topics of discussion structured (ranging from planning an activity for the following week, to parental monitoring of peer activities, and norms for substance use), but so too was the participation of various family members, with siblings included selectively in specific discussions. These structured discussions were coded for parent–adolescent relationship quality, and problem-solving and parenting practices related to the management and monitoring of adolescent behavior. Importantly, increasingly sophisticated and affordable technology for mobile digital recording may lead to increases in the conduct of naturalistic observations in clinical research in the coming years.

Analogue Observation

In contrast to the real-world contexts of naturalistic observation, analogue observations are conducted in artificial settings that are often far removed from those in which behaviors of interest are typically performed. Behavioral laboratories and psychology clinics are often the venues of choice for observation of this kind, which typically involves conditions that are designed, manipulated, or constrained by researchers (see Heyman & Slep, 2004). Although issues of cost and convenience often underlie the adoption of analogue observations over naturalistic ones, analogue methods of observation present a number of distinct advantages. First and foremost, analogue observation provides a high degree of control over the conditions under which behavior is observed, allowing researchers to standardize such conditions across participants. Like the structured observational tasks conducted in naturalistic settings, analogue observations are typically structured in order to elicit specific behaviors of interest. Again, these may be low-frequency behaviors or those that
are difficult to view in their natural context for other reasons. Importantly, however, the types of restrictions that can be placed on participant behavior in the laboratory are potentially more complex than those that are often possible in naturalistic settings. These controlled conditions may also be manipulated in experimental designs, allowing researchers to test predictions concerning the effects of specific stimuli or events on participant behavior.

Such an approach was used by Hudson, Doyle, and Gar (2009) to examine child and parent influences on dimensions of parenting associated with risk for child anxiety. Mothers of children with anxiety disorders and mothers of nonclinical children were videotaped interacting during a speech-preparation task with a child from the same diagnostic group as their child (i.e., anxious or nonanxious) and with a child from the alternative diagnostic group. Maternal behavior was then coded in terms of overinvolvement and negativity. It was found that when interacting with children other than their own, mothers were observed to be more involved with anxious children compared to nonclinical children. The use of analogue observation in this design allowed the researchers to identify potentially important bidirectional effects between child anxiety and parenting practices.

The importance of observational context has been emphasized in a number of studies. Dadds and Sanders (1992) compared observational data collected through home-based free parent–child interactions versus clinic-based structured mother–child problem-solving discussions, in samples of depressed and conduct-disordered children. Parent–child behaviors observed during unconstrained home interactions showed relatively little convergence with behavior observed in the clinic-based problem-solving tasks. No relationship was seen between children’s behavior across each of the respective settings. Maternal behavior was somewhat more consistent, with mothers’ depressed affect during clinic-based problem solving related to aversive behavior in the home for mothers of depressed children. Likewise, angry affect during problem solving was related to aversive behavior in the home for mothers of conduct-disordered and comparison children. In terms of predictive validity, observations of depressed children and their mothers in home-based interactions correctly predicted child diagnoses in 60 percent of cases. This was compared to only 25 percent accuracy based on behavior observed during clinic-based problem solving. Conversely, accuracy of classification for conduct-disordered children based on observations of problem solving and home-based free interaction was 72 percent for each. The design of this study precluded the authors from disentangling effects related to setting (home vs. clinic) from those related to the restrictions imposed on participants (relatively unrestricted naturalistic vs. structured analogue). However, such findings nonetheless demonstrate the potential for these respective observational contexts to provide unique diagnostic information related to different disorders. These findings also suggest that the inclusion of multiple observational contexts may produce the most comprehensive behavior data related to clinical risk.

Support for these assumptions can be found in subsequent studies, such as the recent investigation of pathways to adolescent depression reported by Allen and colleagues (2006). Using two separate analogue observations, adolescents’ behavior was coded on dimensions of autonomy and relatedness, first in the context of the mother–child relationship and then in the context of peer relationships. In the first observation adolescents and their mothers participated in a revealed-differences task in which they discussed a family issue (e.g., money, grades) that they had separately identified as an area of disagreement. In the second observation the adolescent was videotaped interacting with a close friend while problem solving a fictional dilemma. Using a longitudinal design, the authors showed that adolescents’ behavior with their mothers (associated with undermining autonomy and relatedness) and peers (associated with withdrawn-angry-dependent interactions) both explained unique variance in growth of depressive symptoms. The prediction of problem trajectories would therefore have been reduced had the behavior been observed in the context of only one of these relationships.

Interesting developments concerning the importance of context to observational measurement have come from studies adopting dynamic systems (DS) theory—a mathematical language used to describe the internal feedback processes of a system in adapting to new conditions. Granic and Lamey (2002) used a problem-solving paradigm to observe parent–child interactions in a sample of boys with clinic-referred conduct problems, with and without comorbid anxiety/depression. Guided by the assumptions of DS theory, the observational procedure incorporated a perturbation—an event intended to increase pressure on the parent–child dyad and trigger a reorganization of their behavioral system. A core premise of DS approaches is that perturbations expose the characteristics of a system as it moves away from and
back to a stable equilibrium. The specific perturbation employed was a knock on the laboratory door to signal that the allotted time for the problem-solving discussion was almost over, and that a resolution was needed. The rationale for this perturbation included the DS premise that only by perturbing a system can the full range of behavioral possibilities therein be identified. The authors found that parent–child interactions in the two groups differed only after the perturbation. Specifically, during the initial period of the problem-solving discussion, parents in both groups exhibited a permissive style in responding to aversive behavior in their children. However, following the perturbation, only those dyads involving children with comorbid internalizing problems shifted to a style of interaction characterized by mutual and escalating criticism and hostility. The finding that these parent–child dyads were more sensitive to the effects of the perturbation than those of pure externalizing children was interpreted as evidence of structural differences between these respective types of dyads at a systemic level. Importantly, these findings also demonstrate that important classes of behavior may at times be observable only by placing pressure on participants through the systematic manipulation of contextual cues.

Approaches to Coding Behavior

Observational coding strategies vary considerably in terms of the specificity or precision with which behavioral codes are operationalized. Molecular (or microsocial) coding systems are the most intensive, specifying discrete, fine-grained, behavioral units (e.g., eye contact, criticize, whine). At the other end of the spectrum are molar (or global/macrolevel) coding systems, based on more inclusive behavioral categories. For example, the code “whine” is much more specific (or molecular) than the more global code “oppositional behavior.” Researchers interested in short-term patterns of behavior per se, such as those testing predictions from operant theory, have typically favored coding behavior at the molecular level. Such coding is likely to be of particular value when these patterns of behavior are associated with potentially important variations across time and contexts, and can be accounted for by social influence. Conversely, when behavior is used merely as a “sign” of an underlying disposition or trait, or researchers are interested in events and extended processes that occur over longer time scales, the coding of more molar or global categories may be of greater value (Cone, 1999; Dishion & Granic, 2004).

Microsocial Coding Systems

Observational systems concerned with describing behavior at a molecular level typically do so by coding discrete behaviors into mutually exclusive categories. These categories are operationalized in concrete behavioral terms that allow them to be coded with minimal inference. The term “microsocial” has traditionally been applied to observational coding that is concerned with the order and pattern of behaviors in a stream of observed social interaction (Dishion & Snyder, 2004). The strength of such coding is its potential to describe behavior as it unfolds over time. By representing the moment-by-moment interactions between individuals in the contexts of relationships (e.g., parent–child, peer, spousal, sibling), microsocial coding can capture the relationship processes that underlie dysfunction and adjustment at both individual and systemic levels.

Microsocial coding was integral to the classic observational studies conducted by Patterson and colleagues at the Oregon Social Learning Center, beginning in the 1960s and 1970s (see Patterson, Reid, & Dishion, 1992). These influential studies examined the moment-to-moment interactions within families of aggressive and oppositional children, and the functional dynamics between these interactions and children’s antisocial behavior. Seminal coding systems were developed at this time to capture the microsocial interactions of families (e.g., the Family Process Code; Dishion et al., 1983) and administered live in naturalistic settings, most often the family home. Observational data from this research indicated that three moment-to-moment variables most robustly differentiated the interactions of families of clinic-referred conduct problem children from those of well-adjusted children. The first was “start-up”: the likelihood that a family member would initiate conflict when others were behaving in a neutral or positive manner. The second was “counterattack”: the likelihood that a family member would react immediately and aversively to an aversive behavior directed at him or her by another family member. The third was “continuance”: the likelihood that a family member would continue to act in an aversive manner following the first aversive initiation. Importantly, the moment-to-moment account of these family interactions provided by microsocial observation allowed Patterson (1982) and colleagues to apply social learning theory to family process. Their subsequent conceptualization of “reinforcement traps,” based on escape-avoidance conditioning, forms the basis for the most established interventions
currently available in this area (see Eyberg, Nelson, & Boggs, 2008).

Global Coding Systems

Global or molar coding systems assign codes based on summary ratings of behavior, and often across longer time scales. Codes tend to be few, representing behavioral classes (e.g., negativity, supportiveness, conflict/hostility). The speed and simplicity with which such systems can often be administered appeal to many of the researchers who adopt them. Furthermore, global ratings have often been reported to be correlated highly with microsocial data in studies comprising both. Hops, Davis, and Longoria (1995) found this to be the case for not only the global ratings of parent–child interactions made by trained observers, but also the global ratings made by parents themselves. Heyman and Slep (2004) suggested that global ratings may often be very appropriate in the collection of observational data, given that molecular systems comprising 30+ codes are often collapsed down into composite variables comprising positive, negative, and neutral dimensions for the purpose of statistical analysis. It is important to remember, however, that while the option to create such composite variables is available when raw observational data are captured by molecular codes, global ratings of such dimensions can never be disaggregated into the discrete behaviors that they summarize.

As global or molar systems are often better able than microsocial coding to take the broader context of behavior into account, such ratings have the potential to capture some constructs more appropriately than molecular codes. For example, marital interactions have been coded using global ratings of emotional intensity, conflict tactics, and degree of conflict resolution to investigate the relative effects of parental mood and conflict on child adjustment (Du Rocher Schudlich & Cummings, 2007) and therapeutic change following couples’ intervention (Merrilees, Goeke-Morey, & Cummings, 2008). Aksan, Kochanska, and Ortmann’s (2006) system for coding mutually responsive orientation (MRO) in the parent–child relationship is another such example. This system was developed to measure attachment-related dynamics in parent–child interactions, based on the aim of characterizing such dynamics using both parent and child data. MRO is coded using global ratings that emphasize the joint aspect of parent–child interaction (e.g., “Interaction flows smoothly, is harmonious; communication flows effortlessly and has a connected back-and-forth quality”) over the respective behaviors of either member of the dyad. Such ratings are formulated after observing parent–child dyads in a range of structured contexts, each approximately 10 minutes in duration. The authors contrast this method with traditional attachment paradigms in which both parent and child are typically involved yet only the behavior of the child is coded (Aksan et al., 2006).

One of the main limitations of global ratings is that they do not retain the sequential relations of events, and therefore provide less potential information on the functional dynamics of behavior than microsocial coding. However, there is also evidence that global ratings can capture unique information of functional importance. In particular, such ratings may provide unique information about events that unfold over longer time scales, and capture important outcomes of extended processes (Dishion & Granic, 2004). Driver and Gottman (2004), for example, examined the interactions of newlywed couples over the course of a day. Through the global coding of bids for intimacy and conflict discussions, the researchers were able to draw conclusions about the role of affection in couples’ conflict resolution, within the broader dynamics of daily interactions. Based on the potentially unique insights provided by microsocial and global coding approaches, Dishion and Snyder (2004) advised that such methods may complement each other in the same programs of research.

Units of Measurement

To quantify the degree to which an observed behavior is performed, a unit of measurement must be assigned to some aspect of that performance (Johnston & Pennypacker, 1993). In clinical research a number of parameters, or dimensions, of behavior are often indexed for this purpose. The most common of these are frequency, intensity, permanent products, and temporal properties. The aims of research are most likely to be met when the theory-driven conceptualization of variables determines the precise dimensions of behavior through which they are indexed by observation. At the same time, decisions related to such units of measurement must also take into account the topographic characteristics of the behavior of interest. Importantly, the dimensions through which behavior is indexed have implications for other aspects of the coding strategy, as different methods of recording observational data are better suited to capturing different dimensions of behavior.
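
The closing point above, that different recording methods capture different dimensions of behavior, can be illustrated with a minimal Python sketch. It takes one invented record of behavior bouts (onset and offset times in seconds) and compares the true rate and percentage of time with the percentage of intervals scored under the three interval-based time sampling rules described under Approaches to the Recording of Observational Codes. Everything here is an assumption made for illustration: the bout times, the 60-second session, the 10-second interval, the function names, and the choice to score momentary sampling at the end of each interval rather than the start.

```python
from typing import List, Tuple

Bout = Tuple[float, float]  # (onset, offset) of one behavior bout, in seconds

# Invented record for illustration only: three bouts in a 60-second session.
bouts: List[Bout] = [(2, 5), (12, 31), (44, 46)]


def rate_per_minute(bouts: List[Bout], session_s: float) -> float:
    """Frequency dimension: number of bouts divided by minutes observed."""
    return len(bouts) / (session_s / 60.0)


def percent_duration(bouts: List[Bout], session_s: float) -> float:
    """Temporal dimension: percentage of the session spent in the behavior."""
    return 100.0 * sum(off - on for on, off in bouts) / session_s


def interval_sample(bouts: List[Bout], session_s: float,
                    interval_s: float, method: str) -> float:
    """Percentage of intervals scored 'behavior present' under one rule."""
    n_intervals = int(session_s // interval_s)
    scored = 0
    for i in range(n_intervals):
        start, end = i * interval_s, (i + 1) * interval_s
        if method == "partial":      # behavior occurs at any time in the interval
            hit = any(on < end and off > start for on, off in bouts)
        elif method == "whole":      # behavior fills the entire interval
            hit = any(on <= start and off >= end for on, off in bouts)
        elif method == "momentary":  # behavior is occurring as the interval ends
            hit = any(on <= end <= off for on, off in bouts)
        else:
            raise ValueError(f"unknown method: {method}")
        scored += hit
    return 100.0 * scored / n_intervals


print("rate per minute:", rate_per_minute(bouts, 60))
print("percent of time:", percent_duration(bouts, 60))
for rule in ("partial", "whole", "momentary"):
    print(rule, "interval %:", round(interval_sample(bouts, 60, 10, rule), 1))
```

For this invented record the true values are 3 bouts per minute and 40 percent of session time, while partial interval scoring marks 5 of 6 intervals, whole interval scoring 1 of 6, and momentary scoring 2 of 6. The numbers carry no empirical weight; they show only that the same behavior stream can look quite different depending on the recording rule applied to it.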

The frequency with which a behavior occurs has traditionally been seen to reflect the strength of that behavior, and is often the simplest dimension to observe for discrete behavioral events, that is, those with an identifiable beginning and end. Frequency can also be one of the simplest to interpret, providing rate indices that can be standardized across various periods of time (e.g., rate per minute, rate per hour). The intensity of a behavior refers to the amplitude, force, or effort with which it is expressed. However, as intensity is related largely to the impact that a behavior has on the environment rather than the characteristics of the behavior itself, it can be more difficult to observe than frequency or temporal dimensions of behavior (Kazdin, 2001).

Permanent products are the tangible byproducts or “trace evidence” of behavior (e.g., number of wet bedsheets, number of windows broken, number of chores completed). Although not a measure of behavior per se, these are measures of the result or effect of behavior, and may be of value when a behavior itself is not readily observable but leaves some lasting product that can be obtained or observed (Bellack & Hersen, 1998). It is likely, however, that such data, which can be recorded simply by noting such occurrences, will often be more informative to clinicians than researchers. Unlike the other dimensions of behavior addressed here, permanent products do not provide any information about the form or function of the behaviors that produce them.

The temporal dynamics of behavior may also be related to clinically important processes and can be characterized in various ways. These include duration (the amount of time that elapses while a behavior is occurring), latency (the amount of time that elapses between the presentation of a stimulus and the onset of a response), and interresponse time (the amount of time that elapses between the offset of one response and the onset of another). Piehler and Dishion (2007), for example, observed the interactions of adolescents in a discussion task with a close friend. The simple index of duration of deviant talk bouts was found to differentiate youth with early-onset antisocial behavior, late-onset antisocial behavior, and normative behavioral development. The temporal dynamics of behavior have proven to be of particular value in the fine-grained analysis of relationship processes, as we shall soon discuss.

Approaches to the Recording of Observational Codes

A range of strategies can be used to record the occurrence, sequence, intensity, and duration of behaviors, and in turn quantify the behavior stream. The strategies we will focus on here are event records, interval-based time sampling methods (partial interval, whole interval, momentary), and those that represent the temporal dynamics of behavior. These strategies present unique pros and cons when applied to different dimensions of behavior. The appropriateness of a specific recording method may also depend on practical considerations related to the setting in which the behavior is recorded. Attempts to capture fine-grained data on behavioral interactions will be of little use if the complexity of the observation strategy prohibits the reliable sampling of that behavior. This may be a particular concern for those increasingly rare studies that rely on the live observation of participants. In such studies, the complexity of the recording strategy will in part determine the demands that are placed on the observer’s attention, or observer load. Where coding is completed from digital video/audio recordings, it is often possible to minimize the demands associated with such complexity through repeated viewings of footage, and in some cases the use of commercially available software designed for such purposes. However, coding from video footage may nonetheless become prohibitive if the time spent reviewing such recordings far exceeds the real time that they represent.

Event records involve the recording of each and every occurrence of a behavior in a given period of time. This recording strategy is most useful for discrete behaviors that are low in frequency (e.g., swearing, out of seat in classroom, throwing a tantrum). Such events may be those performed by an individual or a group (e.g., children in a classroom). Event records are relatively simple to design and implement, and carry the advantage of providing relatively complete coverage of the behavior of interest. Event-based data may be converted into rate indices by dividing the observed frequency of behaviors by the amount of time observed, thereby allowing for comparison across variable periods of time. However, for many studies it is not feasible to collect such comprehensive records for the large volumes of behavior that are of potential interest; nor is it necessary in order to examine the dynamics of fine-grained behavior.

Breaking a period of observation into smaller segments or intervals is a common practice that allows large volumes of behavior to be recorded and facilitates the formal evaluation of reliability by allowing for point-by-point comparison between observers. For example, a 30-minute observation period
may be divided into 30 1-minute intervals or 180 10-second intervals. Interval-based recording is most efficient when the length of the observation interval is related to the frequency of the behavior, with high-frequency behaviors recorded using shorter observation intervals and low-frequency behaviors longer intervals. Dishion and Granic (2004) advise that intervals in the region of 10 to 15 seconds typically retain the microsocial quality of behavioral interactions being observed. However, somewhat longer intervals are often adopted due to practical considerations, and various time sampling methods make use of these intervals in different ways. Such intervals serve the purpose of allowing observers to simply record whether or not a behavioral response has occurred, as opposed to recording every instance of that behavior. The raw data recorded in interval-based methods are typically converted into percentages of intervals observed. As such, they represent an estimate of behavior frequencies rather than the absolute frequencies of those behaviors.

Partial interval time sampling is used to record the presence or absence of a behavior if it occurs once or more at any time during a given interval. For example, if a behavior occurs twice in one interval and ten times in the next, both intervals will register the same unit of behavior (behavior present). This common method is useful for high-frequency, brief behaviors that do not have a clear beginning or end. For example, an observer coding a 20-minute parent–child interaction task may record “yes” to each consecutive, 15-second interval in which any designated parent or child behaviors occur. Once recorded, such data can be expressed in terms of the percentage of intervals observed in which any instance of the behavior occurred. Partial interval time sampling is well suited to observations concerned with the frequency of behavioral responses and has been widely used for this purpose. There is some evidence, however, to suggest that this strategy tends to overestimate behavior frequency (Green & Alverson, 1978; Powell et al., 1977). Conversely, the risk that partial interval recording may underestimate the frequency of high-rate behaviors increases with increases in the length of the recording interval. Partial interval systems may also be used to estimate response duration; however, this is less common and relies on interval length being brief relative to the mean duration of the behavior of interest in order to minimize overestimation (Hartmann & Wood, 1990).

Whole interval time sampling registers behavioral responses that occur throughout the entire length of a given interval. This strategy has traditionally been used most often in research concerned with the duration of behavior, but it can also be used to record estimates of frequency. Whole interval time sampling is most suited to the observation of behaviors that are of lengthy duration or may not have a clearly identifiable beginning or end. Dion and colleagues (2011), for example, used this method to collect observational measures of children’s classroom attention in a randomized clinical trial aiming to improve attention and prevent reading difficulties. Participating children were observed in the classroom for 12 consecutive 5-second intervals, and for an interval to be coded as “optimally attentive,” the child was required to be correctly seated and oriented toward the relevant teaching stimuli for its full duration. In terms of measurement error, there is some evidence that whole interval time sampling tends to underestimate both the absolute duration and frequency of behavior (Powell et al., 1977).

Momentary time sampling registers the occurrence of a behavior if it is occurring at the moment a given interval begins or ends. It is typically used to provide frequency and duration data. This strategy is suited to long-duration or high-frequency behaviors (e.g., rocking in an autistic child, on-task behavior in the classroom). For example, Brown and colleagues (2009) used momentary time sampling to code multiple dimensions of children’s physical activity in school settings, including level of physical activity (e.g., stationary with limb or trunk movement, vigorous activity), and primary topography (e.g., running, sitting, standing). Observers noted the child’s behavior every 30 seconds throughout a 30-minute observation period and assigned such codes based on the behavior occurring at that moment. There is some evidence to suggest that momentary time sampling tends to underestimate the frequency of behavior, particularly for behaviors that are of short duration (Powell et al., 1977). Gardenier, MacDonald, and Green (2004) compared momentary time sampling with partial interval sampling methods for estimating continuous duration of stereotypy among children with pervasive developmental disorders. While partial interval sampling consistently overestimated the duration of stereotypy, momentary sampling at times both overestimated and underestimated duration. Momentary sampling was found to produce more accurate estimates of absolute duration across low, moderate, and high levels of this behavior.

In contrast to the discontinuous account of behavior recorded by time interval methods,
behavior may also be recorded second by second, in a continuous stream. Interest in the “real-time” temporal properties of social dynamics has grown considerably in recent decades, supported by emerging methods and frameworks for recording and analyzing temporally laden interaction patterns. Recent innovations have allowed researchers to investigate dimensions of social interaction that are inaccessible to methods of behavioral observation that do not capture the temporal quality of real-time change in behavior as it responds to varying environmental demands. Some of the most important developments in this area have focused on the nonlinear dynamics of relationship patterns, often associated with sudden shifts, that are difficult to model using traditional analytic methods.

Such developments include Gottman’s (1991) framework for conceptualizing the nonlinear dynamics of close relationships, and the development of methods for the mathematical modeling of relationships based on DS theory (Ryan et al., 2000). Gottman’s approach uses a time series of coded observational data to create parameters for each member of a dyad. These parameters are used to identify key patterns of dyadic interaction, and the trajectories toward these patterns can be analyzed to reveal the underlying dynamics of the system. Gottman and colleagues used this approach to model the dynamics of marital communication, showing that these dynamics can predict those couples who divorce and those who will remain married (Gottman, Coan, Carrere, & Swanson, 1998). This method has also been applied to children’s interactions in peer dyads. For example, Gottman, Guralnick, Wilson, Swanson, and Murray (1997) modeled the observed peer interactions of children with developmental delays in this way to examine the role that peer ecology plays in the emotion regulation of such children.

DS principles have also formed the basis for other innovations in the recording and analysis of observed relationship interactions. A particularly noteworthy example is the state space grid (SSG). Developed by Lewis, Lamey, and Douglas (1999), the SSG is a graphical and quantitative tool that can be used to create a topographic map of the behavioral repertoire of a system (e.g., parent–child dyad). It works by plotting the trajectory (i.e., sequence of emotional/behavioral states) on a grid similar to a scatterplot, which is divided into a number of cells. The coded behavior of one member of the dyad (e.g., the child) is plotted on the x axis and the other member’s (e.g., parent) on the y axis. Each point (or cell) on the grid therefore represents a simultaneously coded parent–child event, or “stable state” of the dyad. Any time a behavior changes, a new point is plotted and a line is drawn connecting it to the previous point. Thus, the grid represents a series that moves from one dyadic state to another over the course of an interaction. Figure 8.1 shows a hypothetical trajectory representing 10 seconds of coded parent–child behavior on an SSG (Hollenstein et al., 2004). This behavioral sequence begins with 2 seconds in negative engagement/negative engagement

[Figure 8.1 image not reproduced. Axis categories: Negative Engagement, Negative Disengagement, Neutral, Positive Engagement.]

Figure 8.1 Example of a state space grid with a hypothetical trajectory representing 10 seconds of coded behavior, one arrowhead per second. Plotting begins in the lower left part of the cell and moves in a diagonal as each second is plotted, ending in the upper right (Reprinted with permission from Hollenstein et al., 2007).
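
As a companion to Figure 8.1, the following minimal Python sketch encodes the hypothetical 10-second trajectory described in the surrounding text and derives a few simple grid summaries from it: seconds spent in each cell, number of distinct cells visited, number of transitions, and mean seconds per visit. The function names, the run-length representation, and the choice of summaries are assumptions made for illustration, not the procedure of any SSG software. The text does not state which dyad member is listed first in each state pair, so treating the first state as the x-axis member is also an assumption.

```python
from collections import Counter
from itertools import groupby

# Hypothetical trajectory from the text around Figure 8.1, run-length coded as
# (x-axis member's state, y-axis member's state, seconds in that dyadic state).
# Which dyad member is listed first is an assumption.
trajectory = [
    ("negative engagement", "negative engagement", 2),
    ("negative engagement", "neutral", 2),
    ("neutral", "neutral", 3),
    ("neutral", "negative engagement", 1),
    ("negative engagement", "negative engagement", 2),
]


def expand(trajectory):
    """Turn run-length records into one (x, y) grid cell per coded second."""
    return [(x, y) for x, y, seconds in trajectory for _ in range(seconds)]


def grid_summary(per_second):
    """Summarize a second-by-second sequence of grid cells."""
    # Each run of consecutive identical cells counts as one visit.
    visits = [(cell, len(list(run))) for cell, run in groupby(per_second)]
    seconds_per_cell = Counter(per_second)
    return {
        "seconds_per_cell": dict(seconds_per_cell),
        "cells_visited": len(seconds_per_cell),
        "transitions": len(visits) - 1,
        "mean_visit_seconds": len(per_second) / len(visits),
    }


summary = grid_summary(expand(trajectory))
for name, value in summary.items():
    print(name, value)
```

Summaries of this kind speak to how many cells a dyad occupies, how long it stays in each, and how often it moves between them. The sketch assumes a non-empty trajectory and makes no claim about how such indices are computed or scaled in published SSG studies.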

and is followed by 2 seconds in negative engagement/neutral, 3 seconds in neutral/neutral, 1 second in neutral/negative engagement, and 2 seconds in negative engagement/negative engagement.

This method can be used to record and examine several coexisting interaction patterns and explore movement from one to the other in real time. In DS terms, the moment-to-moment interactions of the dyad are conceptualized as a trajectory that may be pulled toward certain attractors (recurrent behavioral patterns or habits) and freed from others. Using SSGs, attractors are identified in terms of cells to which behavior is drawn repeatedly, in which it rests over extended periods of time, or to which it returns quickly. This temporally sensitive method can be used to examine whether behavior changes in few or many states (i.e., cells) or regions (i.e., a subset of cells) of the state space. It is also possible to track how long a trajectory remains in some cells but not others, and how quickly it returns or stabilizes in particular cells (Dishion & Granic, 2004; Granic & Lamey, 2002).

Novel studies using SSGs have contributed significantly to the clinical literature in recent years. A major focus of such studies has been on the structure or the relative flexibility versus rigidity that characterizes the exchanges within dyadic relationships (e.g., parent–child, husband–wife, adolescent–peer). For example, Hollenstein, Granic, Stoolmiller, and Snyder (2004) applied SSG analysis to code observations of parent–child interactions in the families of kindergarten children to examine whether individual differences in dyadic rigidity were associated with longitudinal risk for externalizing and internalizing problems. Parent–child dyads were observed in 2 hours of structured play and discussion tasks and common components of family routines (e.g., working on age-appropriate numeracy and literacy, snack time). Videotaped observations were coded using the Specific Affect (SPAFF) coding system (Gottman, McCoy, Coan, & Collier, 1996). In this system codes are based on a combination of facial expression, gestures, posture, voice tone and volume, speech rate, and verbal/motor response content to capture integrated impressions of the affective tone of behavior. SSG data indicated that high levels of rigidity in parent–child interactions were associated primarily with risk for externalizing problems, predicting growth in such problems over time. The parent–child dyads of well-adjusted children were found to flexibly adapt to context change and display frequent change in affect, whereas dyads of children at risk for externalizing problems exhibited fewer affective states, a greater tendency to remain in each state, and fewer transitions among these states (Hollenstein et al., 2004).

In research examining the role of peer dynamics in the development of antisocial behavior, SSGs have been applied to observed peer interactions to derive measures of dyadic organization or predictability. Dishion, Nelson, Winter, and Bullock (2004) investigated the organization of peer interactions among antisocial and non-antisocial boys. The predictability of dyadic interactions was indexed by calculating logged conditional probabilities of verbal behavior between members of the respective dyads. Findings indicated that adolescent boys who engaged in the most highly organized patterns of deviant talk (i.e., breaking rules and norms) were the most likely to continue antisocial behavior into adulthood.

The potential flexibility with which SSGs can be applied in various research designs is a clear strength of the method, allowing researchers to derive continuous time series as well as categorical and ordinal data for analysis. Furthermore, unlike sequential analysis, this technique does not rely on base rates of behavior to identify important interactional patterns (Dishion & Granic, 2004). It is also evident that DS approaches have the potential to inform observational research based on numerous developmental and clinical theories. For example, given the capacity for DS methods to capture the structure of dyads, it has been suggested that they may be suited to the investigation of attachment dynamics. Specifically, SSGs could potentially represent secure family dynamics in terms of flexible, nonreactive, and synchronous interactive patterns that are “organized” as coordinated and mutual action–reaction patterns (Dishion & Snyder, 2004).

Observational Measurement in Intervention Research

Direct observation has long been a cornerstone of behavioral therapy and was associated with major developments in intervention science across the second half of the twentieth century. Early landmark stimulus control studies relied heavily on observation for the purpose of behavior analysis. Referrals such as aggressive children were observed in family interactions to identify antecedents and consequences of target behaviors, which researchers systematically modified to observe the effects this produced on behavioral responses (e.g., Patterson, 1974). Observational methods have since played a significant role in the development of the many evidence-based treatments that have grown out of
this tradition, and have informed empirical research content of such programs. For example, in the
concerned with numerous aspects of intervention. DPICS a directive is coded as a Command only if
Intervention research is generally concerned it is worded in such a way that it tells a child what
with questions related to both the efficacy of treat- to do; directives that tell the child what not to do
ments in producing clinically meaningful change are coded as Negative Talk—one of the main codes
and the mechanisms through which this change is operationalizing aversive/ineffective parenting
produced. Snyder and colleagues (2006) identified behaviors. In contrast, other widely used systems for
five core elements that need to be defined and mea- coding parent–child interactions based on closely
sured in intervention trials to clearly answer such related models retain the neutral Command code
questions. Importantly, each of these core elements for both kinds of directives (e.g., the Family Process
presents distinct implications for observational Code; Dishion et al., 1983).
measurement. The first core element in this model is the transfer of skills from a training specialist to a training agent. Training agents may include parents, as in the case of parent training programs, or teachers, as in the case of school-based interventions. The key issues in this element concern the agent’s acquisition of the skills that are needed to deliver the intervention to the client participant, who would be the respective children in both of these examples. Snyder and colleagues (2006) suggested that for interventions in which such training involves clearly specified sets of therapist behaviors (e.g., queries, supportive statements, instructions, modeling, role playing, feedback), observation is often advantageous over other methods in the measurement of this therapeutic process. Patterson and Chamberlain (1994), for example, sequentially coded the behaviors of therapists (e.g., “confront,” “reframe,” “teach”) and parents (e.g., “defend,” “blame”) observed during a parent training intervention and used these data to conduct a functional analysis of client resistance.

The second core element identified by Snyder and colleagues (2006) concerns the quality of the intervention agent’s implementation of the treatment with the client participant. In the many trials that have evaluated parent training interventions for conduct problems in young children, formal observations of parent–child interactions before and after skills training have been common. Numerous observational systems have been developed in association with specific parent training programs, allowing researchers to code parents’ implementation of these skills as prescribed by specific programs. In Parent–Child Interaction Therapy (McNeil & Hembree-Kigin, 2010) this is achieved using the Dyadic Parent–Child Interaction Coding System (DPICS; Eyberg, Nelson, Duke, & Boggs, 2004), an extensive coding system that was developed largely for this purpose. This means that the same behaviors may be classified as negative in one coding system and neutral/positive in another, depending on the

In our own research we have observed parents’ implementation of parent training skills for various purposes, including the investigation of child characteristics predicted to moderate the effects of parenting intervention on child outcomes. Hawes and Dadds (2005b) examined the association between childhood callous-unemotional (CU) traits (i.e., low levels of guilt and empathy) and treatment outcomes in young boys with clinic-referred oppositional defiant disorder whose parents participated in a parent training intervention. Parents’ implementation of the specific skills taught in this program, including positive reinforcement of desirable behavior, use of clear concrete commands, and contingent, nonreactive limit setting (Dadds & Hawes, 2006), was coded live in the family home using a structured play task and a dinner observation. In doing so, we were able to show that CU traits uniquely predicted poor diagnostic outcomes at 6-month follow-up, independently of any differences in implementation of the intervention by parents of children with high versus low levels of CU traits.

The third core element in Snyder and colleagues’ (2006) model concerns change in client behavior across the intervention sessions. Examining this change in relation to change in the actions of the training agent can provide evidence of the mechanisms through which the intervention is producing behavior change. As noted by Snyder and colleagues (2006), the value of using observation in measuring this element may be reduced when client change is associated largely with covert processes (e.g., the formation of explicit intentions). However, in many forms of psychosocial intervention, such change is accessible to observation. For example, Hawes and Dadds (2006) used observation as part of a multimethod measurement strategy to examine change in this way in the context of a parent training trial for child conduct problems. Child behavior was coded from naturalistic observations conducted at the commencement and conclusion of the intervention. A self-report measure of the parenting practices

132 Observational Coding Strategies

targeted in the intervention was also completed by parents at the same assessment points. These self-report data were used to assess dimensions of parenting such as inconsistent discipline and parental involvement, which can be difficult to capture in brief periods of observation. Change in these self-reported parenting domains was significantly associated with change in observations of oppositional behavior across the intervention, consistent with the theoretical mechanisms of the clinical model (Dadds & Hawes, 2006).

Research into mechanisms of change has traditionally relied on treatment trials in controlled settings, but researchers have begun to focus increasingly on such processes in real-world (community-based) settings. Gardner, Hutchings, Bywater, and Whitaker (2010) recently examined mediators of treatment outcome in a community-based trial of parent training for conduct problems, delivered to the families of socially disadvantaged preschoolers. Like Hawes and Dadds (2006), Gardner and colleagues (2010) used observation as part of a multimethod measurement strategy to overcome the problem of shared method variance. Parenting practices were measured through direct observation of parent–child interactions in the family home, with the DPICS used to code frequencies of parenting behaviors that were then collapsed into summary positive (e.g., physical positive, praise) and negative (e.g., negative commands, critical statements) variables for analysis. Mediator analyses showed that the effect of the intervention on change in child problem behavior was mediated primarily by improvement in positive parenting rather than reductions in harsh or negative parenting (Gardner et al., 2010).

Granic, O’Hara, Pepler, and Lewis (2007) examined an intervention for externalizing problems in a community-based setting, using observation to investigate mechanisms of change related to different parent–child interactions. The authors used the SSG method to examine the changes in parent–child emotional behavior patterns that characterized children who responded positively to the intervention versus those who failed to respond. Using a problem-solving analogue observation conducted pretreatment and posttreatment, SSGs were constructed to quantify previously unmeasured processes of change related to the flexibility of the parent–child dyad. The children showing the greatest response to treatment were those whose families exhibited the greatest increases in flexibility, as indexed by SSGs showing increases in the number of times they changed emotional states, the breadth of their behavioral repertoire, and decreases in the amount of time they spent “stuck” in any one emotional state. Conversely, the interactions of nonresponders became more rigid across treatment. The authors concluded that rigidity is amenable to change through family-based cognitive-behavioral intervention, and that change in flexibility may be one mechanism through which improvements in problem behavior are produced.

The fourth core element identified by Snyder and colleagues (2006) relates to short-term or proximal (e.g., posttreatment) outcomes of treatment. The possibility that participants in intervention research will report symptom reductions simply as a function of receiving an intervention is widely recognized and is generally addressed where possible using a randomized controlled trial design (see Chapter 4 in this volume). Reporter biases and method effects both have the potential to confound the measurement of treatment effects, and observational measurement has been shown to be a highly effective means of minimizing such error. For example, in a study by Dishion and Andrews (1995), parents were found to report large reductions in their adolescents’ problem behavior regardless of their random assignment to active intervention versus control conditions; analyses of the coded observations of parent–adolescent interactions, however, revealed that reductions in conflict behavior were specific to conditions that actively promoted behavior change.

Additionally, the measurement of some treatment outcome variables may be achieved more sensitively through direct observation than other forms of measurement. For example, Stoolmiller, Eddy, and Reid (2000) collected multimethod data to evaluate the effects of a school-based intervention for physical aggression in a randomized controlled trial design. Of all the extensive self-report measures collected, only the coded observations of children’s playground behavior were sensitive to the effects of the intervention. In another similar example, Raver and colleagues (2009) used observations and teacher ratings of preschoolers’ classroom behavior to evaluate the effects of a classroom-based intervention to reduce behavior problems in children from socioeconomically disadvantaged (Head Start) families. Children’s externalizing (disruptive) and internalizing (disconnected) behaviors were observed by coders in 20-minute blocks during the course of a school day. Socioeconomic risk was found to moderate the effects of the intervention on child outcomes, but only in the analyses using the observational data. Such findings reinforce the

Hawes, Dadds, Pasalich 133

value of including observation as part of a multimethod measurement strategy in intervention research (Flay et al., 2005).

In the example of parent training interventions for child conduct problems, short-term outcomes (e.g., reduced oppositional defiant disorder symptoms) have been shown to then contribute to reduced risk for delinquency, drug use, and school failure (Patterson, Forgatch, & DeGarmo, 2010). It is this distal change, represented by long-term outcomes such as these, that is the focus of the fifth and final core element in Snyder and colleagues’ (2006) model. Such outcomes are seen to reflect more global and enduring reductions in dysfunction, or increases in capacities and resilience. In contrast to the earlier elements in the model, Snyder and colleagues (2006) suggest that observational methods are often not appropriate to index such outcomes, recommending instead that approaches such as multi-informant self-report measures may provide superior data. As reviewed here, observational coding can serve multiple purposes in intervention and prevention designs. Importantly, observational data are most likely to inform innovations in intervention science when collected in the context of a theory-driven multimethod measurement strategy.

Reliability and Validity

The process of establishing adequate reliability in observational coding is one of the first essential steps in research with such methods, whether this involves the design and development of a novel observational strategy or the implementation of an established coding system. This typically requires that a team of observers is trained until conventional criteria for interobserver agreement are reached. The process for such training often involves intensive practice, feedback, and discussions focused on example recordings, and it may take days to months depending on the complexity of the coding system. Formal calculations of reliability are derived from the completed coding of a sample of recordings by multiple observers who are unaware of each other’s results. These calculations may range from a simple index of agreement such as percent agreement for occurrence and nonoccurrence of observed events, through to intraclass correlation coefficients, and an index of nominal agreement that corrects for chance agreement (i.e., kappa). Ongoing training beyond such a point is also advisable to reduce observer drift over time. According to Bakeman and Gottman (1997), focusing on interobserver agreement serves three critical goals. First, it ensures that observers are coding events according to the definitions formulated in a coding manual; second, it provides observers with feedback so as to improve their performance; and third, it assures others in the scientific community that observers are producing replicable data.

Considerably less emphasis is typically placed on issues of validity in observational research. This follows from the notion that observational measurement of behavior does not involve the measurement and interpretation of hypothetical constructs. As such, observational data have been viewed as axiomatically valid and their validity taken at face value (Johnston & Pennypacker, 1993). The validity of observational data has nonetheless been examined in measurement research, with evidence of various forms of validity available from a range of multimethod studies (see Cone, 1999). Hawes and Dadds (2006), for example, examined associations between observational measures of parent behaviors coded live in the home setting and parent self-reports of their typical parenting practices on the Alabama Parenting Questionnaire (APQ; Shelton et al., 1996). Evidence of convergent validity was found, with moderate correlations seen between conceptually related observational and self-report variables. For example, observed rates of aversive parenting correlated positively with parent-reported use of corporal punishment, and likewise, observed rates of praise with parent-reported use of positive parenting practices (Hawes & Dadds, 2006).

Future Directions

Before considering future directions for observational coding, it is worth reflecting on trends in the popularity of such strategies. Namely, it seems that the observation of behavior, once ubiquitous in clinical research, has begun to disappear in recent decades (Dishion & Granic, 2004). This trend is not unique to clinical psychology, with a marked decline in the use of observational measurement also noted in other fields such as social psychology (Baumeister, Vohs, & Funder, 2007). This decline can be seen to reflect a growing interest in models of psychopathology and intervention that offer perspectives beyond those afforded by behavioral analysis and learning theory. Broadly speaking, the focus in the literature has shifted from the environments that shape dysfunction to other forces that underpin it. Interestingly, however, the more that the neurosciences illuminate the role of biology in pathways to health and dysfunction, the more this research also highlights the very importance of the
environment. Some of the most compelling findings from such research relate to gene × environment interactions, in which genetic vulnerabilities confer risk for adverse outcomes only when combined with specific contexts (see Moffitt, Caspi, & Rutter, 2006). Prominent examples include Caspi and colleagues’ (2002) finding that a functional polymorphism of the gene that encodes the neurotransmitter-metabolizing enzyme monoamine oxidase A (MAOA) moderated the effects of child maltreatment on risk for antisocial behavior in a longitudinal cohort. Low levels of MAOA expression were associated with significant risk for conduct disorder, but only among children who had been exposed to maltreatment early in life. Research into epigenetic processes has also attracted much attention, suggesting that environmental conditions in early life can structurally alter DNA and in turn produce risk for psychopathology over the life of the individual (Meaney, 2010).

Evidence of this kind is increasingly informing conceptualizations of risk and protection in relation to contextual dynamics, and in turn presenting researchers with new methodological challenges. Here we focus on three such challenges and the potential for observational coding to address the issues they raise. The first challenge concerns the theory-driven measurement of contextual dynamics when testing predictions that emphasize the interaction of individual-level and environment-level factors. The second concerns the reliable measurement of theoretically important individual differences using methods that can be implemented across diverse settings and study designs. The third challenge concerns the translation of emerging models of psychopathology into clinical interventions.

Investigating Emerging Models of Psychopathology

There is growing evidence that individual differences associated with biologically based characteristics interact (and transact) with environmental factors to shape trajectories of risk and protection. Various forms of evidence (e.g., experimental, longitudinal, genetic) have informed models of such processes in relation to distinct forms of psychopathology, with scientific advances allowing for increasingly precise conceptualizations of critical child–environment dynamics. In the area of antisocial behavior, developmental models have been informed by particularly rapid progress of this kind (e.g., Hawes, Brennan, & Dadds, 2009). To test the emerging predictions from such models, researchers will be increasingly required to measure contextual variables and processes that may be difficult, if not impossible, to characterize through self-report methods. We believe that the use of observation in this capacity will play a major role in future research of this kind. As addressed throughout this chapter, observational methods often provide the most sensitive measures of contextual dynamics, and importantly, can be adapted with great flexibility for purposes of theory-driven measurement. In recent years we have seen a range of innovative studies conducted in this vein, some examples of which follow.

While adverse child-rearing environments characterized by severe maltreatment have been associated with atypical neurocognitive development in children, little is known about the effects of normative variations in the child-rearing environment. In a design that incorporated the observation of parent–adolescent interactions in laboratory-based tasks and magnetic resonance imaging of adolescent brain structure, Whittle and colleagues (2009) examined whether normative variations in maternal responses to adolescents’ positive affective behavior were associated with characteristics of adolescents’ affective neural circuitry. Parent and adolescent affect and verbal content were coded from a pleasant event-planning interaction and a conflictual problem-solving interaction. The extent to which mothers exhibited punishing responses to their adolescents’ affective behavior, as coded from these interactions, was associated with orbitofrontal cortex and anterior cingulate cortex volumes in these adolescents (Whittle et al., 2009). In this study direct observation was key to characterizing subtle variations in maternal socialization and to demonstrating the importance of these common relationship dynamics to the neuroanatomic architecture of children’s social, cognitive, and affective development.

Longitudinal research has shown that children of depressed mothers exhibit relatively poor cognitive, neuropsychological, social, and emotional skills across childhood and adolescence. Predictions regarding the mechanisms through which this risk is conferred have focused increasingly on the compromised capacities for depressed mothers to construct a growth-promoting environment for their infants, with the relational behavior of such mothers characterized by reduced sensitivity, restricted range of affective expression, and inconsistent support of the infant’s budding engagement (Goodman & Gotlib, 1999). However, empirical investigations of such mechanisms have relied largely on animal studies
that allow for the manipulation of environmental conditions. Rodent studies have shown that manipulating rearing conditions to simulate depression (e.g., preventing maternal licking and grooming of pups) disrupts the development of the hypothalamic-pituitary-adrenal (HPA) stress management system in offspring (see review by Champagne, 2008).

Researchers who have begun to test the predictions from these animal models in humans have relied heavily on direct observation, using methodologies that typically integrate observational coding with neurobiological measures. Feldman and colleagues (2009), for example, investigated such predictions in 9-month-old infants of mothers with postnatal anxiety and depression. Observational measurement served multiple purposes in this study, indexing aspects of socialization as well as infant temperament. Maternal sensitivity and infant social engagement were coded from mother–infant play interactions in the home environment. Infant fear regulation was also microcoded from a structured paradigm adapted from the Laboratory Temperament Assessment Battery (Goldsmith & Rothbart, 1996). Data from various self-report measures were also collected, and infant cortisol was assayed from salivary measures to index stress (HPA axis) reactivity. Echoing findings from the animal literature, maternal sensitivity was meaningfully related to infant social engagement and stress reactivity, while maternal withdrawal was associated with infant fear regulation. The integration of observational measurement into designs of this kind represents a promising means to investigate a range of physiological support systems that may be compromised by prenatal and postpartum exposure to adverse conditions associated with parental psychopathology.

Davies and colleagues (2008) used a highly structured analogue observation task to investigate a somewhat related research question concerning the association between children’s biologically based characteristics and their reactivity to interparental conflict. Children witnessed a live simulated conflict and resolution scenario between their parents. This involved mothers acting from a script involving a disagreement with fathers over the telephone. Mothers were instructed in advance to convey mild irritation, frustration, and anger toward their partner as they normally would at home. Salivary cortisol was collected from children at three points during the simulated conflict, and three dimensions of children’s behavioral reactivity to the conflict were coded from video recordings of the procedure: distress (e.g., freezing: tense, motionless, or fixed in place), hostility (e.g., anger: facial expressions or postures reflecting anger), and involvement (e.g., inquiries about parent feelings or relationships: questions about the emotional state of the parent or quality of the interparental relationship [e.g., “Mom, are you okay?” “Is Dad mad?”]). Relative to other forms of behavioral reactivity, children’s distress responses to interparental conflict were consistent and unique predictors of their cortisol reactivity to interparental conflict. Furthermore, observed distress was particularly predictive of greater cortisol reactivity when children’s observed levels of involvement in the conflict were also high.

Finally, in a novel study of emotion in young children, Locke and colleagues (2009) used observation to index affective responses that are inappropriate to the contexts in which they occur, and examined the association between this affect and salivary cortisol level. To measure context-inappropriate affect, children were administered a variety of episodes from the Laboratory Temperament Assessment Battery (Lab-TAB) designed to elicit negative affect (e.g., inhibition during conversation with a stranger, anger or sadness during a disappointment paradigm) or pleasure (e.g., anticipating surprising their parent). Observers then coded the presence and peak intensity of anger (e.g., bodily anger or frustration, anger vocalizations) in 5-second intervals. Displays of anger that were inappropriate to context were found to predict low levels of basal cortisol. Importantly, this prediction was distinct from that afforded by levels of anger that were context-appropriate. Such findings support the importance of examining contextual aspects of emotion when investigating its role in relation to broader processes of risk and protection, and the value of observation for this purpose.

Theory-Driven Measurement of Individual Differences

In addition to the theory-driven measurement of contextual variables, it is becoming increasingly important for researchers to be able to characterize participants on dimensions related to biologically based factors. It is now commonly accepted that most psychopathologies and many complex behaviors have genetic origins, and that there are multiple routes to the same behavioral phenotype (or behavioral symptoms). In between genes and behavior are endophenotypes, individual differences that form the causal links between genes and the overt expression of disorders (see Cannon & Keller, 2006).
For example, there is evidence to suggest that mentalizing (the intuitive ability to understand that other people have minds) may be an endophenotype of the social impairments in autism (Viding & Blakemore, 2007). Growing research is concerned with the identification of endophenotypes for various disorders, placing increasing emphasis on the reliable and practical measurement of such individual differences. In our own research we have found observation to be of particular value in measuring individual differences associated with putative subtypes of antisocial behavior differentially associated with callous-unemotional (CU) traits. There is now considerable evidence that conduct problems follow distinct developmental trajectories in children with high versus low levels of CU traits, involving somewhat distinct causal processes (see Frick & Viding, 2009).

Data from our initial experimental studies, using emotion-recognition and eye-tracking paradigms, suggested that children with high levels of CU traits exhibit deficits in the extent to which they attend to the eye regions of faces (Dadds et al., 2006, 2008). To move beyond these computer-based tasks and investigate whether this failure to attend to the eyes of other people occurs in real-world social interactions, we relied heavily on observational measurement. In our first such study (Dadds et al., 2011a), we observed the parent–child interactions of children with clinic-referred conduct problems, in analogue scenarios involving “free play,” a family-picture drawing task, and an “emotion talk” task in which parents and children discussed recent happy and sad events. Parent–child interactions were coded using global ratings of social engagement, talk, and warmth to contextualize rates of parent–child eye contact. Interval coding was then used to code rates of eye contact. As previous literature on coding eye contact in family interactions could not be located, intervals of various lengths were compared in pilot testing to achieve an acceptable balance between measurement sensitivity and observer demands. While levels of eye contact were found to be reciprocated in mother–son and father–son dyads, boys with high levels of CU traits showed consistent impairments in eye contact toward their parents. Interestingly, although CU traits were unrelated to the frequency of eye contacts initiated by mothers, fathers of high-CU boys exhibited reduced eye contact toward their sons (Dadds et al., 2011a).

Based on these findings, we postulated that a failure to attend to the eyes of attachment figures could drive cascading errors in the development of empathy and conscience in children with high levels of CU traits (Dadds et al., 2011a). As no established technology existed for investigating the processes by which these deficits may be shaped by parenting dynamics (and potentially shape such dynamics in return), we subsequently developed a novel paradigm for this purpose. The “love” task (Dadds et al., 2011b) was expressly designed to elucidate parent–child interactions that are sensitive to the emotion-processing deficits associated with CU traits. The task concentrates parent–child interactions into a short but emotionally intense encounter for which reciprocated eye gaze is fundamental. It was administered following approximately 30 minutes of parent–child play and conversation and was prompted by an experimenter with the following instructions: “I’m going to come back into the room to do one more game. Once I have gone, I’d like you to look [child’s name] in the eyes and show him/her, in the way that feels most natural for you, that you love him/her.”

Video recordings of the subsequent 90-second interaction were coded using global ratings of mother and child levels of comfort and genuineness during the interaction, verbal and physical expressions of affection, and eye contact (both initiated and rejected). Compared with controls, children with oppositional defiant disorder were found to reciprocate lower levels of affection from their mothers, and those with CU traits showed significantly lower levels of affection than the children lacking these traits. As predicted, the high-CU group showed uniquely low levels of eye contact toward their mothers (Dadds et al., 2011b). This paradigm appears to be a promising tool for characterizing children with high levels of CU traits. Importantly, as an observational paradigm it is able to provide data that are unaffected by report biases. With growing evidence that child CU traits and family environment are associated with interacting and bidirectional risk processes (e.g., Hawes, Brennan, & Dadds, 2009; Hawes et al., 2011), this method is likely to have broad applications in future research.

Observation in Translational Research

As a consequence of the growing impact of the neurosciences on models of psychopathology, the need for translational research is growing likewise. Findings from emerging intervention research have shown that contextual dynamics can be critical to understanding the interplay between behavioral and biological variables in therapeutic change. The potential for observational coding to capture such dynamics in translational research designs has also
been demonstrated. O’Neal and colleagues (2010), for example, examined child behavior and cortisol response as long-term (16-month) outcome variables in a randomized controlled trial of a parenting intervention for preschoolers at risk for conduct problems. Structured parent–child interactions were observed both in the laboratory and the family home, and multiple parent–child constructs were coded using various systems. The frequency of discrete aggressive behaviors was microcoded using the DPICS, and various global observation systems (e.g., Home Observation for the Measurement of the Environment–Early Childhood Version; Caldwell & Bradley, 1984) were used to code dimensions related to parental warmth and engagement. Not only was the intervention shown to prevent the development of aggressive behavior, but cortisol data demonstrated that it also resulted in normalized stress responses. The effect of the intervention on aggression was found to be largely mediated by the intervention effect on cortisol response, but only among families characterized by low levels of observed warmth (O’Neal et al., 2010). In addition to demonstrating the value of integrating observational measurement into such a design, such findings underscore the importance of parental warmth for the development of the HPA axis during early childhood, and suggest that HPA axis function is amenable to change through interventions that modify social environments in this period.

Large-scale randomized controlled trial designs, however, are not the only way to translate emerging models of psychopathology into clinical practice. In fact, small-scale designs that allow for the intensive examination of change processes may be more likely to inform the early stages of intervention development. Such an approach has recently been recommended for the purpose of developing new interventions in the field of autism (Smith et al., 2007). Just as early stimulus control studies used repeated observational measurement to translate operant theory into behavioral interventions for child conduct problems in single-case experimental designs (e.g., Patterson, 1974), we believe that such designs are now needed to translate conceptualizations of heterogeneous causal processes in emerging models of antisocial behavior (see Frick & Viding, 2009). Importantly, researchers testing the clinical application of novel theories are likely to be required to design novel, theory-driven observational strategies, as opposed to relying on “off-the-shelf” coding systems. In our own research we are currently conducting such investigations using the love task paradigm (Dadds et al., 2011b) to observe change in parent and child behaviors of theoretical importance to CU traits (e.g., eye contact) in response to novel interventions.

Summary

The potential for any research strategy to produce meaningful findings will be determined first and foremost by the meaningfulness of the research question, and as we have reviewed in this chapter, observational coding has proven to be well suited to a range of research questions in the clinical literature. Such coding has been widely used to index the dimensions of diagnostic symptoms associated with various disorders, the contextual dynamics of functional importance to these disorders, and individual differences (e.g., child temperament) and internal processes (e.g., cognitive biases) implicated in pathways to these disorders. In recent years considerable progress has been achieved in establishing high-quality coding systems for research with specific clinical populations, most notably discordant couples and the distressed families of children with conduct problems. At the same time, research involving the theory-driven adaptation of such systems, and the development of novel observational paradigms, has demonstrated that the flexibility associated with observational measurement remains one of its major strengths. The type of structure that is applied to elicit behavior in either analogue or naturalistic observation, as well as the methods by which this behavior is coded, recorded, and analyzed, can all be adapted for theoretical purposes. We believe that observational coding will be an important tool in emerging translational research, allowing researchers to operationalize various biologically based individual differences and capture critical information about the contexts in which they emerge.

References

Aksan, N., Kochanska, G., & Ortmann, M. R. (2006). Mutually responsive orientation between parents and their young children: Toward methodological advances in the science of relationships. Developmental Psychology, 42, 833–848.
Allen, J. P., Insabella, G., Porter, M. R., Smith, F. D., Land, D., & Phillips, N. (2006). A social-interactional model of the development of depressive symptoms in adolescence. Journal of Consulting and Clinical Psychology, 74, 55–65.
Aspland, H., & Gardner, F. (2003). Observational measures of parent child interaction. Child and Adolescent Mental Health, 8, 136–144.
Bakeman, R., & Gottman, J. M. (1987). Applying observational methods: A systematic view. In J. Osofsky (Ed.), Handbook of infant development (2nd ed., pp. 818–854). New York: Wiley.
Barkley, R. A. (1997). Attention-deficit/hyperactivity disorder. In E. J. Marsh & L. G. Terdal (Eds.), Assessment of childhood disorders (3rd ed., pp. 71–129). New York: Guilford Press.


Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396–403.
Bellack, A. S., & Hersen, M. (1998). Behavioral assessment: A practical handbook. Elmsford, NY: Pergamon Press.
Birchler, G. R., Weiss, R. L., & Vincent, J. P. (1975). Multimethod analysis of social reinforcement exchange between maritally distressed and nondistressed spouse and stranger dyads. Journal of Personality and Social Psychology, 31(2), 349–360.
Brown, W. H., Pfeiffer, K. A., McIver, K. L., Dowda, M., Addy, C. L., & Pate, R. R. (2009). Social and environmental factors associated with preschoolers’ nonsedentary physical activity. Child Development, 80, 45–58.
Caldwell, B. M., & Bradley, R. H. (1984). Home Observation for Measurement of the Environment–Revised Edition. Little Rock: University of Arkansas at Little Rock.
Cannon, T. D., & Keller, M. C. (2006). Endophenotypes in the genetic analyses of mental disorders. Annual Review of Clinical Psychology, 2, 267–290.
Caspi, A. (2000). The child is father of the man: Personality continuities from childhood to adulthood. Journal of Personality and Social Psychology, 78, 158–172.
Caspi, A., McClay, J., Moffitt, T. E., Mill, J., Martin, J., Craig, I. W., Taylor, A., & Poulton, R. (2002). Role of genotype in the cycle of violence in maltreated children. Science, 297, 851–853.
Champagne, F. A. (2008). Epigenetic mechanisms and the transgenerational effects of maternal care. Frontiers in Neuroendocrinology, 29, 386–397.
Cone, J. (1999). Observational assessment: Measure development and research issues. In P. C. Kendall, J. N. Butcher, & G. N. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 183–223). New York: Wiley.
Dadds, M. R., Allen, J. L., Oliver, B. R., Faulkner, N., Legge, K., Moul, C., Woolgar, M., & Scott, S. (2011b). Love, eye contact, and the developmental origins of empathy versus psychopathy. British Journal of Psychiatry, 198, 1–6.
Dadds, M. R., Barrett, P. M., Rapee, R. M., & Ryan, S. (1996). Family processes and child anxiety and aggression: An observational analysis. Journal of Abnormal Child Psychology, 24, 715–734.
Dadds, M. R., El Masry, Y., Wimalaweera, S., & Guastella, A. J. (2008). Reduced eye gaze explains “fear blindness” in childhood psychopathic traits. Journal of the American Academy of Child and Adolescent Psychiatry, 47, 455–463.
Dadds, M. R., Jambrak, J., Pasalich, D., Hawes, D. J., & Brennan, J. (2011a). Impaired attention to the eyes of attachment figures and the developmental origins of psychopathy. Journal of Child Psychology and Psychiatry, 52(3), 238–245.
Dadds, M. R., Perry, Y., Hawes, D. J., Merz, S., Riddell, A., Haines, D., Solak, E., & Dadds, M. R., & Hawes, D. J. (2006). Integrated family intervention for child conduct problems. Brisbane, Queensland: Australian Academic Press.
Dadds, M. R., & Sanders, M. R. (1992). Family interaction and child psychopathology: A comparison of two methods of assessing family interaction. Journal of Child and Family Studies, 1, 371–392.
Davies, P. T., et al. (2008). Adrenocortical underpinnings of children’s psychological reactivity to interparental conflict. Child Development, 79(6), 1693–1706.
Dion, E., Roux, C., Landry, D., Fuchs, D., Wehby, J., & Dupéré, V. (2011). Improving classroom attention and preventing reading difficulties among low-income first-graders: A randomized study. Prevention Science, 12, 70–79.
Dishion, T. J., & Andrews, D. W. (1995). Preventing escalation in problem behaviors with high-risk young adolescents: Immediate and 1-year outcomes. Journal of Consulting and Clinical Psychology, 63, 538–548.
Dishion, T. J., & Bullock, B. (2001). Parenting and adolescent problem behavior: An ecological analysis of the nurturance hypothesis. In J. G. Borkowski, S. Ramey, & M. Bristol-Power (Eds.), Parenting and the child’s world: Influences on intellectual, academic, and social-emotional development (pp. 231–249). Mahwah, NJ: Erlbaum.
Dishion, T. J., & Granic, I. (2004). Naturalistic observation of relationship processes. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment (Vol. 3): Behavioral assessment (pp. 143–161). New York: Wiley.
Dishion, T. J., Nelson, S. E., Winter, C., & Bullock, B. (2004). Adolescent friendship as a dynamic system: Entropy and deviance in the etiology and course of male antisocial behavior. Journal of Abnormal Child Psychology, 32, 651–663.
Dishion, T. J., & Snyder, J. (2004). An introduction to the “Special Issue on advances in process and dynamic system analysis of social interaction and the development of antisocial behavior.” Journal of Abnormal Child Psychology, 32(6), 575–578.
Dishion, T., & Stormshak, E. (2007). Intervening in children’s lives: An ecological, family-centered approach to mental health care. Washington, DC: American Psychological Association.
Dishion, T. J., & Tipsord, J. M. (2011). Peer contagion in child and adolescent social and emotional development. Annual Review of Psychology, 62, 189–214.
Dobson, K. S., & Kendall, P. C. (Eds.) (1993). Psychopathology and cognition. San Diego: Academic Press.
Driver, J. L., & Gottman, J. M. (2004). Daily marital interactions and positive affect during marital conflict among newlywed couples. Family Process, 43(3), 301–314.
Du Rocher Schudlich, T., & Cummings, E. M. (2007). Parental dysphoria and children’s adjustment: marital conflict styles, children’s emotional security, and parenting as mediators of risk. Journal of Abnormal Child Psychology, 35, 627–639.
Dubi, K., Emerton, J., Rapee, R., & Schniering, C. (2008). Maternal modelling and the acquisition of fear and avoidance in toddlers: Influence of stimulus preparedness and temperament. Journal of Abnormal Child Psychology, 36, 499–512.
Ehrmantrout, N., Allen, N. B., Leve, C., Davis, B., & Sheeber, L. (2011). Adolescent recognition of parental affect: Influence of depressive symptoms. Journal of Abnormal Psychology, 120(3), 628–634.
Eyberg, S. M., Nelson, M. M., & Boggs, S. R. (2008). Evidence-based treatments for child and adolescent disruptive behavior disorders. Journal of Clinical Child and Adolescent Psychology, 37, 213–235.
Eyberg, S., Nelson, M., Duke, M., & Boggs, S. (2004). Manual for the Dyadic Parent–Child Interaction Coding System (3rd ed.). Unpublished manuscript, University of Florida, Gainesville.
Feldman, R., Granat, A. D. I., Pariente, C., et al. (2009). Maternal depression and anxiety across the postpartum year and infant social engagement, fear regulation, and stress reactivity. Journal of the American Academy of Child and Adolescent Psychiatry, 48(9), 919–927.
Flay, B. R., Biglan, A., Boruch, R. F., Castro, F. G., Gottfredson, D., Kellam, S., et al. (2005). Standards of evidence: Criteria for efficacy, effectiveness, and dissemination. Prevention Science, 3, 151–175.
Frick, P. J., & Viding, E. (2009). Antisocial behavior from a developmental psychopathology perspective. Development and Psychopathology, 21, 1111–1131.
Gardenier, N. C., MacDonald, R., & Green, G. (2004). Comparison of direct observational methods for measuring stereotypic behavior in children with autism spectrum disorders. Research in Developmental Disabilities, 25, 99–118.
Gardner, F. (2000). Methodological issues in the direct observation of parent-child interaction: do observational findings reflect the natural behavior of participants? Clinical Child and Family Psychology Review, 3, 185.
Gardner, F., Hutchings, J., Bywater, T., & Whitaker, C. (2010). Who benefits and how does it work? Moderators and mediators of outcome in an effectiveness trial of a parenting intervention. Journal of Clinical Child & Adolescent Psychology, 39(4), 568–580.
Goldsmith, H. H., & Rothbart, M. K. (1996). The Laboratory Temperament Assessment Battery (LAB-TAB): Locomotor Version 3.0. Technical Manual. Madison, WI: Department of Psychology, University of Wisconsin.
Goodman, S. H., & Gotlib, I. H. (1999). Risk for psychopathology in the children of depressed mothers: a developmental model for understanding mechanisms of transmission. Psychological Review, 106, 458–490.
Gottman, J. (1979). Marital interaction: Experimental investigations. New York: Academic Press.
Gottman, J. (1991). Chaos and regulated change in families: A metaphor for the study of transitions. In P. A. Cowan, & M. Heatherington (Eds.), Family transitions (pp. 247–372). Hillsdale, NJ: Erlbaum.
Gottman, J. M. (1998). Psychology and the study of marital processes. Annual Review of Psychology, 49, 169–197.
Gottman, J. M., Coan, J., Carrere, S., & Swanson, C. (1998). Predicting marital happiness and stability from newlywed interactions. Journal of Marriage and the Family, 60, 5–22.
Gottman, J. M., Guralnick, M. J., Wilson, B., Swanson, C. C., & Murray, J. D. (1997). What should be the focus of emotion regulation in children? A nonlinear dynamic mathematical model of children’s peer interaction in groups. Development and Psychopathology, 9, 421–452.
Granic, I., & Lamey, A. V. (2002). Combining dynamic systems and multivariate analyses to compare the mother-child interactions of externalizing subtypes. Journal of Abnormal Child Psychology, 30(3), 265–283.
Granic, I., O’Hara, A., Pepler, D., & Lewis, M. D. (2007). A dynamic systems analysis of parent-child changes associated with successful “real-world” interventions with aggressive children. Journal of Abnormal Child Psychology, 35, 845–857.
Green, S. B., & Alverson, L. G. (1978). A comparison of indirect measures for long-duration behaviors. Journal of Applied Behavior Analysis, 11, 530.
Hartmann, D. P., & Wood, D. D. (1990). Observational methods. In A. S. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International handbook of behavior modification and therapy (2nd ed., pp. 109–138). New York: Plenum Press.
Hawes, D. J., Brennan, J., & Dadds, M. R. (2009). Cortisol, callous-unemotional traits, and antisocial behavior. Current Opinion in Psychiatry, 22, 357–362.
Hawes, D. J., & Dadds, M. R. (2005a). Oppositional and conduct problems. In J. Hudson & R. Rapee (Eds.), Current thinking on psychopathology and the family (pp. 73–91). New York: Elsevier.
Hawes, D. J., & Dadds, M. R. (2005b). The treatment of conduct problems in children with callous-unemotional traits. Journal of Consulting and Clinical Psychology, 73(4), 737–741.
Hawes, D. J., & Dadds, M. R. (2006). Assessing parenting practices through parent-report and direct observation during parent-training. Journal of Child and Family Studies, 15(5), 555–568.
Hawes, D. J., Dadds, M. R., Frost, A. D. J., & Hasking, P. A. (2011). Do childhood callous-unemotional traits drive change in parenting practices? Journal of Clinical Child and Adolescent Psychology, 52, 1308–1315.
Heyman, R. E., & Slep, A. M. S. (2004). Analogue behavioral observation. In M. Hersen (Ed.) & E. M. Heiby & S. N. Haynes (Vol. Eds.), Comprehensive handbook of psychological assessment: Vol. 3. Behavioral assessment (pp. 162–180). New York: Wiley.
Hollenstein, T., Granic, I., Stoolmiller, M., & Snyder, J. (2004). Rigidity in parent–child interactions and the development of externalizing and internalizing behavior in early childhood. Journal of Abnormal Child Psychology, 32, 595–607.
Hops, H., Davis, B., & Longoria, N. (1995). Methodological issues in direct observation: Illustrations with the LIFE coding system. Journal of Clinical Child Psychology, 24(2), 193–203.
Hudson, J., Comer, J. S., & Kendall, P. C. (2008). Parental responses to positive and negative emotions in anxious and non-anxious children. Journal of Clinical Child and Adolescent Psychology, 37, 1–11.
Hudson, J. L., Doyle, A., & Gar, N. S. (2009). Child and maternal influence on parenting behavior in clinically anxious children. Journal of Clinical Child and Adolescent Psychology, 38(2), 256–262.
Jacob, T., Tennenbaum, D. L., & Krahn, G. (1987). Factors influencing the reliability and validity of observation data. In T. Jacob (Ed.), Family interaction and psychopathology: Theories, methods, and findings (pp. 297–328). New York: Plenum Press.
Jacob, T., Tennenbaum, D., Seilhamer, R. A., Bargiel, K., & Sharon, T. (1994). Reactivity effects during naturalistic observations of distressed and nondistressed families. Journal of Family Psychology, 8, 354–363.
Johnson, S. M., & Bolstard, O. D. (1975). Reactivity to home observation: A comparison of audio recorded behavior with observers present or absent. Journal of Applied Behavior Analysis, 8, 181–185.
Johnston, J. J., & Pennypacker, H. S. (1993). Strategies and tactics of behavioral research. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kazdin, A. E. (2001). Behavior modification in applied settings (6th ed.). Belmont, CA: Wadsworth/Thomson Learning.
Kerig, P. K., & Baucom, D. H. (Eds.). (2004). Couple observational coding systems. Mahwah, NJ: Erlbaum.
Kochanska, G., Murray, K. T., & Harlan, E. T. (2000). Effortful control in early childhood: Continuity and change, antecedents, and implications for social development. Developmental Psychology, 36, 220–232.
Lewis, M. D., Lamey, A. V., & Douglas, L. (1999). A new dynamic systems method for the analysis of early socioemotional development. Developmental Science, 2, 458–476.
Locke, R. L., Davidson, R. J., Kalin, N. H., & Goldsmith, H. H. (2009). Children’s context inappropriate anger and salivary cortisol. Developmental Psychology, 45(5), 1284–1297.


Lorber, M. F., O’Leary, S. G., & Kendziora, K. T. (2003). Mothers’ overreactive discipline and their encoding and appraisals of toddler behavior. Journal of Abnormal Child Psychology, 31, 485–494.
Maerov, S. L., Brummet, B., & Reid, J. B. (1978). Procedures for training observers. In J. B. Reid (Ed.), A social learning approach to family intervention: Vol. 11. Observation in home settings (pp. 37–42). Eugene, OR: Castalia Press.
Mash, E. J., & Terdal, L. G. (1997). Assessment of childhood disorders (3rd ed.). New York: Guilford Press.
McNeil, C. B., & Hembree-Kigin, T. L. (2010). Parent–child interaction therapy (2nd ed.). New York: Springer.
Meaney, M. J. (2010). Epigenetics and the biological definition of gene × environment interactions. Child Development, 81, 41–79.
Merrilees, C. E., Goeke-Morey, M. C., & Cummings, E. M. (2008). Do event-contingent diaries about marital conflict change marital interactions? Behavior Research and Therapy, 46, 253–262.
Moffitt, T. E., Caspi, A., & Rutter, M. (2006). Measured gene-environment interactions in psychopathology: Concepts, research strategies, and implications for research, intervention, and public understanding of genetics. Perspectives on Psychological Science, 1, 5–27.
Moustakas, C. E., Sigel, I. E., & Schalock, M. D. (1956). An objective method for the measurement and analysis of child-adult interaction. Child Development, 27, 109–134.
O’Neal, C. R., Brotman, L. M., Huang, K., Gouley, K. K., Kamboukos, D., Calzada, E. J., et al. (2010). Understanding relations among early family environment, cortisol response, and child aggression via a prevention experiment. Child Development, 81, 290–305.
O’Rourke, J. F. (1963). Field and laboratory: The decision-making behavior of family groups in two experimental conditions. Sociometry, 26, 422–435.
Ost, L. G., Svensson, L., Hellstrom, K., & Lindwall, R. (2001). One-session treatment of specific phobias in youths: A randomized clinical trial. Journal of Consulting and Clinical Psychology, 69, 814–824.
Patterson, G. F. (1974). A basis for identifying stimuli which control behaviors in natural settings. Child Development, 45, 900–911.
Patterson, G. R. (1982). Coercive family processes. Eugene, OR: Castalia.
Patterson, G. R., & Chamberlain, P. (1994). A functional analysis of resistance during parent training therapy. Clinical Psychology: Science and Practice, 1, 53–70.
Patterson, G. R., Forgatch, M. S., & DeGarmo, D. S. (2010). Cascading effects following intervention. Development and Psychopathology, 22(4), 949–970.
Patterson, G. R., Reid, J. B., & Dishion, T. J. (1992). Antisocial boys. Eugene, OR: Castalia.
Piehler, T. F., & Dishion, T. J. (2007). Interpersonal dynamics within adolescent friendship: Dyadic mutuality, deviant talk, and patterns of antisocial behavior. Child Development, 78(5), 1611–1624.
Powell, J., Martindale, B., Kulp, S., Martindale, A., & Bauman, R. (1977). Taking a closer look: Time sampling and measurement error. Journal of Applied Behavior Analysis, 10, 325–332.
Raver, C. C., Jones, S. M., Li-Grining, C. P., Zhai, F., Metzger, M. W., & Solomon, B. (2009). Targeting children’s behavior problems in preschool classrooms: A cluster-randomized controlled trial. Journal of Consulting and Clinical Psychology, 77, 302–316.
Ryan, K. D., Gottman, J. M., Murray, J. D., Carrere, S., & Swanson, C. (2000). Theoretical and mathematical modeling of marriage. In M. D. Lewis & I. Granic (Eds.), Emotion, development and self-organization: Dynamic systems approaches to emotional development (pp. 349–372). New York: Cambridge University Press.
Schneider, B. H. (2009). An observational study of the interactions of socially withdrawn/anxious early adolescents and their friends. Journal of Child Psychology and Psychiatry, 50, 799–806.
Sheeber, L. B., Davis, B., Leve, C., Hops, H., & Tildesley, E. (2007). Adolescents’ relationships with their mothers and fathers: Associations with depressive disorder and subdiagnostic symptomatology. Journal of Abnormal Psychology, 116, 144–154.
Shelton, K. K., Frick, P. J., & Wootton, J. (1996). The assessment of parenting practices in families of elementary school-aged children. Journal of Clinical Child Psychology, 25, 317–327.
Smith, T., Scahill, L., Dawson, G., Guthrie, D., Lord, C., Odom, S., et al. (2007). Designing research studies on psychosocial interventions in autism. Journal of Autism and Developmental Disorders, 37, 354–366.
Snyder, J., Reid, J. B., Stoolmiller, M., Howe, G., Brown, H., Dagne, G., & Cross, W. (2006). The role of behavior observation in measurement systems for randomized prevention trials. Prevention Science, 7, 43–56.
Snyder, J., Schrepferman, L., McEachern, A., Barner, S., Provines, J., & Johnson, K. (2008). Peer deviancy training and peer coercion-rejection: Dual processes associated with early onset conduct problem. Child Development, 79, 252–268.
Stoolmiller, M., Eddy, J. M., & Reid, J. B. (2000). Detecting and describing preventative intervention effects in a universal school-based randomized trial targeting delinquent and violent behavior. Journal of Consulting and Clinical Psychology, 68, 296–305.
Trentacosta, C. J., Hyde, L. W., Shaw, D. S., Dishion, T. J., Gardner, F., & Wilson, M. (2008). The relations among cumulative risk, parenting, and behavior problems during early childhood. Journal of Child Psychology and Psychiatry, 49, 1211–1219.
Viding, E., & Blakemore, S-J. (2007). Endophenotype approach to the study of developmental disorders: implications for autism research. Behavior Genetics, 37, 51–60.
Whittle, S., Yap, M. B., Yucel, M., Sheeber, L., Simmons, J. G., & Pantelis, C., et al. (2009). Maternal responses to adolescent positive affect are associated with adolescents’ reward neuroanatomy. Social Cognitive & Affective Neuroscience, 4(3), 247–256.
Zangwill, W. M., & Kniskern, J. R. (1982). Comparison of problem families in the clinic and at home. Behavior Therapy, 13, 145–152.



9 Designing, Conducting, and Evaluating Therapy Process Research

Bryce D. McLeod, Nadia Islam, and Emily Wheat

Therapy process research investigates what happens in therapy sessions and how these interactions
influence outcomes. Therapy process research employs an array of methodologies but has recently
used clinical trials as a platform for investigating process–outcome relations. This chapter serves as
a resource for performing and interpreting therapy process research conducted within clinical trials.
Issues related to designing, conducting, and evaluating therapy process research are reviewed, with
examples drawn from the child therapy literature to illustrate key concepts. The chapter concludes
with suggested future research directions.
Key Words: Alliance, therapeutic interventions, treatment integrity, therapy process, outcome

Therapy process research investigates what happens in psychotherapy sessions and how these activities influence clinical outcomes (Hill & Lambert, 2004). Process research covers many topics and employs diverse methodologies, with current efforts using clinical trials as a platform for process research (e.g., to investigate process–outcome relations). Randomized clinical trials (RCTs) can be an ideal vehicle for process research (Weersing & Weisz, 2002a). Collecting process data during an RCT can greatly increase the scientific yield of a clinical trial. Indeed, secondary data analysis of clinical trial data has played a role in identifying how evidence-based treatments (EBTs) produce change (e.g., Crits-Christoph, Gibbons, Hamilton, Ring-Kurtz, & Gallop, 2011; Huey, Henggeler, Brondino, & Pickrel, 2000), the relation of client involvement and outcome (Chu & Kendall, 2004; Coady, 1991; Edelman & Chambless, 1993, 1994), whether or not therapeutic tasks affect the alliance (Kendall, Comer, Marker, Creed, Puliafico, et al., 2009), and the strength of the alliance–outcome association (Chiu, McLeod, Har, & Wood, 2009; Hogue, Dauber, Stambaugh, Cecero, & Liddle, 2006; Klein et al., 2003). The results of these studies can facilitate the dissemination and implementation of EBTs into community settings (Kendall & Beidas, 2007; McLeod & Islam, 2011; McLeod, Southam-Gerow, & Weisz, 2009).

The goal of this chapter is to serve as a resource for those conducting and interpreting therapy process data collected within an RCT, with examples drawn from the child therapy literature to illustrate key concepts. Issues related to designing, conducting, and analyzing therapy process studies will take the forefront. Therapy process can, for example, include client behavior (e.g., developing social skills), therapist behavior (e.g., therapeutic interventions such as cognitive restructuring), and facets of the relation between client and therapist (e.g., level of client involvement; quality of the client–therapist alliance). Outcome refers to the short- and long-term changes in the client brought about by therapy (Doss, 2004).

Overview of Therapy Process Research
Before focusing on process research methods, consider a conceptual framework. Figure 9.1 depicts a model that incorporates theory and findings from the process research tradition (Doss, 2004) and treatment integrity research that investigates the degree to which EBTs are delivered as specified in treatment manuals (Dane & Schneider, 1998; Hogue, 2002; Jones, Clarke, & Power, 2008; Waltz, Addis, Koerner, & Jacobson, 1993). The model details how the three components of therapy process (client, therapist, and relational factors) affect clinical outcomes. Although developed for youth psychotherapy, the model can be extended to apply to therapy with participants of any age. An in-depth review regarding each facet of the model is beyond the scope of this chapter, but the model provides a framework to understand how the components discussed may together, or in isolation, influence outcomes.

Figure 9.1 Theoretical Model of Therapeutic Change in Therapy. [Diagram panels, left to right: Therapy Inputs (client characteristics; parent/significant other characteristics; therapist characteristics), shown at pre-treatment; Treatment Delivery (relationship factors: alliance, involvement; therapeutic interventions: adherence, differentiation; therapist competence: skillfulness, responsiveness), shown during treatment; Change Mechanisms (behavior; skills; parenting, for child therapy); Outcomes (symptoms; functioning; client perspectives; environments; systems), shown at post-treatment.]

Psychotherapy Inputs
The left side of the model identifies therapy inputs that may influence or moderate the process and outcome of therapy. Therapy inputs include (a) client characteristics, such as symptom severity (Ruma, Burke, & Thompson, 1996); (b) parent/significant other characteristics, such as psychopathology (Cobham, Dadds, & Spence, 1998); (c) family characteristics, such as stress and family income level (Kazdin, 1995); (d) therapist characteristics, such as theoretical orientation (Weersing, 2000) or attitudes toward manual-based treatments (Aarons, 2005; Becker, Zayfert, & Anderson, 2004); and (e) service characteristics, such as organizational culture and climate (Schoenwald, Carter, Chapman, & Sheidow, 2008). These inputs represent factors present at the start of treatment that potentially influence process and outcome.

Process Factors
The middle section depicts the main focus of this chapter, the core components involved in treatment delivery: therapeutic interventions (e.g., changing cognitive distortions), therapist competence, and relational factors (e.g., alliance, client involvement). Each component is hypothesized to facilitate symptom reduction (Chu & Kendall, 2004; Kendall & Ollendick, 2004; Orlinsky, Ronnestad, & Willutzki, 2004).

The delivery of specific therapeutic interventions is hypothesized to promote symptom reduction (e.g., McLeod & Weisz, 2004; Silverman, Pina, & Viswesvaran, 2008). An emerging area of focus that can aid understanding of how therapeutic interventions affect outcomes is treatment integrity research (McLeod et al., 2009). Treatment integrity focuses upon the degree to which a treatment is delivered as intended (Perepletchikova & Kazdin, 2005; Waltz et al., 1993). Two components of treatment integrity, treatment adherence and differentiation, refer specifically to the type of therapeutic interventions delivered by the therapist. Treatment adherence refers to the extent to which the therapist delivers the treatment as designed (e.g., delivers the prescribed interventions contained within a treatment manual). Treatment differentiation refers to the extent to which a therapist delivers therapeutic interventions proscribed by a specific treatment manual (e.g., delivering psychodynamic interpretations in a cognitive-behavioral treatment [CBT] program). These two treatment integrity components therefore identify the prescribed (and proscribed) therapeutic interventions that together, and/or in isolation, are hypothesized to be responsible for change (Perepletchikova & Kazdin, 2005).

Therapist competence, a second component of treatment integrity (Perepletchikova & Kazdin, 2005), is key to treatment delivery (Kazdin & Kendall, 1998). Competence refers to the level of


skill and degree of responsiveness demonstrated by a therapist when delivering the technical and relational elements of therapy (Perepletchikova & Kazdin, 2005; Waltz et al., 1993). A therapist’s ability to deliver interventions with skill and responsiveness is said to maximize their effects. To date, research has revealed mixed findings regarding the strength of the relation between therapist competence and outcomes (Webb, DeRubeis, & Barber, 2010). Perhaps, in studies where most or all therapists meet a standard of implementation and are monitored, there is little variability and therefore limited association with outcome.

Relational factors, both the alliance and client involvement, have been found to be related to symptom reduction (Braswell, Kendall, Braith, Carey, & Vye, 1985; Chu & Kendall, 2004; Horvath & Bedi, 2002; Manne, Winkel, Zaider, Rubin, Hernandez, & Bergman, 2010; McLeod, 2011). A therapist’s abilities to (a) cultivate a relationship with the client (child, parent, adult, couple, family) marked by warmth and trust and (b) promote the client’s participation in therapeutic activities are considered instrumental in promoting positive outcomes (Chu et al., 2004; Chu & Kendall, 2004; Horvath & Bedi, 2002). It has been hypothesized that a strong alliance facilitates positive outcomes via increased client involvement in therapeutic tasks (Kendall & Ollendick, 2004; Manne et al., 2010), although support for this hypothesis has been mixed (Karver et al., 2008; Shirk, Gudmundsen, Kaplinski, & McMakin, 2008). Compared to the adult field, the relational elements of therapy have received relatively little empirical attention in the youth field (McLeod, 2011).

Psychotherapy Outcomes
The right portion of the diagram represents treatment outcomes. Hoagwood and colleagues (Hoagwood, Jensen, Petti, & Burns, 1996) suggested five outcome domains: (a) symptoms/diagnoses, a primary outcome in RCTs; (b) functioning, defined as the ability to meet the demands of home, work, peer group, or neighborhood; (c) consumer satisfaction, defined as the client’s experience and/or satisfaction with the mental health services; (d) environments, changes in a specific aspect of the client’s life (e.g., home, work) brought about by therapy (e.g., changes in family and/or couple communication); and (e) systems, assessment of service use patterns following treatment. For process research, although all domains are relevant, the reduction of symptoms and improvements in functioning are two key domains.

The model provides a framework for the factors that may influence the process and outcome of therapy. In addition, the model aids understanding of how process components may be studied in isolation (e.g., the alliance–outcome relation) and/or in combination (e.g., therapist competence and client involvement).

The Methods of Process Research
Broadly speaking, the research strategies can be divided into qualitative and quantitative methods. Qualitative approaches, such as having a therapist review a therapy session and comment upon specific processes, offer some desirable features. For example, a qualitative approach provides an opportunity to gather in-depth information from participants and hear their unique perspective and experience of

Change Mechanisms