Using Behavioral Skills Training To Teach Behavioral Interventions and Milieu Teaching: A Systematic Review of The Literature and Empirical Investigation

USING BEHAVIORAL SKILLS TRAINING TO TEACH BEHAVIORAL
INTERVENTIONS AND MILIEU TEACHING: A SYSTEMATIC REVIEW OF THE
LITERATURE AND EMPIRICAL INVESTIGATION
by
MYLISSA SLANE
(Under the Direction of Rebecca Lieberman-Betz)
ABSTRACT
Language impairments in children are associated with later impairments in
cognitive, language, and academic domains (Johnson et al., 1999). The prevalence rate
for language impairments is high among community samples (7% to 17%; King et al.,
2005), and speech and language disorders are often co-morbid with other
neurodevelopmental disorders (Rosenbaum & Simon, 2016). Thus, one way to ensure
greater access to services and increased intervention dosage is to train natural
implementers (those who are already part of the child’s typical environment; e.g.,
teachers in a classroom, parents/guardians in a home) to deliver evidence-based language
interventions. The purpose of the following two studies was to (a) systematically review
and synthesize the literature examining the use of behavioral skills training (BST) to train
natural implementers (i.e., teachers and other professionals) to implement various
interventions and (b) extend the current literature by utilizing BST to train teachers to
implement primary components of a language intervention, milieu teaching (MT), with
fidelity. Results of the systematic review showed that BST could be effectively used to
train teachers and staff to implement a variety of interventions (e.g., reading racetrack,
the picture exchange communication system [PECS], discrete trial teaching [DTT], the
natural language paradigm [NLP]) targeting a variety of skills and deficits. However,
only a handful of studies had sufficient rigor, quality, and interpretable outcomes to infer
a functional relation. The second study was an empirical investigation examining the
effects of BST on implementation of MT. Two teachers were taught to
implement MT using BST and both teachers learned to implement three core MT
techniques: following the child’s lead (FTCL), teaching social routines (TSR), and the
system of least prompts (SLP). A functional relation was demonstrated across each tier
for one teacher, with two out of three behaviors (FTCL and TSR) replicated across two
teachers. Results from the systematic literature review and the empirical investigation
have implications for future research in that both studies suggested natural implementers
(teachers and staff) can and should be taught to implement interventions with fidelity,
thereby increasing access to evidence-based interventions for children with disabilities.
INDEX WORDS: behavioral skills training, milieu teaching, teachers, intervention

by
MYLISSA MARY SLANE
Bachelor of Arts, Bloomsburg University of Pennsylvania, 2011
Master of Science, Bucknell University, 2013
A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2020
© 2020
Mylissa Mary Slane
All Rights Reserved

Major Professor: Rebecca Lieberman-Betz
Committee: A. Michele Lease
Amy Reschly
Joel Ringdahl
Electronic Version Approved:
Ron Walcott
Dean of the Graduate School
The University of Georgia
December 2020
ACKNOWLEDGEMENTS
I would like to thank my major professor, Dr. Rebecca Lieberman-Betz, for her
unwavering support and assistance. I never would have been able to complete this project
without her guidance and advice. I would also like to thank my academic
advisor and committee member, Dr. Michele Lease, for believing in me and for pushing
me to continue even when things seemed impossible or insurmountable. Thank you as
well to my committee members, Dr. Joel Ringdahl and Dr. Amy Reschly, who provided
me with helpful feedback and support throughout this process. They challenged me to
think critically and helped improve my project immensely. I cannot express my gratitude
enough to those who helped me with data and reliability coding, Maggie Molony, Ali
Zelan, and Kelsie Tyson. Their hard work and dedication were truly admirable and
without them, this project would not have been possible. I would also like to thank my
family, who have supported me through this process and throughout all of graduate
school. They have always been there for me, especially when times were trying or
difficult. They are my rock and inspiration and without them, I would never have had the
courage to push myself as far as I have. I would also like to thank the school where I
completed my project for their support and cooperation throughout the project. In
addition, I want to extend my sincerest gratitude to my participants, without whom this
project would not have been possible. Thank you to all of the friends I made throughout graduate
school and especially to my cohort for always being there for one another and supporting
one another. We truly made a great team! Thank you all!
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
    Naturalistic Developmental Behavioral Interventions
    Milieu Teaching
    Behavioral Skills Training
    Purpose of the Studies
2 STUDY 1: USING BEHAVIORAL SKILLS TRAINING TO TEACH BEHAVIORAL INTERVENTIONS: A SYSTEMATIC REVIEW
    Abstract
    Introduction
    Method
    Results
    Discussion
3 STUDY 2: TEACHING THE TEACHER: USING BEHAVIORAL SKILLS TRAINING TO TRAIN TEACHERS TO IMPLEMENT MILIEU TEACHING TECHNIQUES
    Abstract
    Introduction
    Method
    Results
    Discussion
4 GENERAL DISCUSSION
REFERENCES
APPENDICES
A OUTCOME CODING DESCRIPTIONS FOR SINGLE CASE ANALYSIS REVIEW AND FRAMEWORK (SCARF)
B OBSERVATION DATA COLLECTION SHEET
C INTERVENTION FIDELITY SHEETS
D DATA COLLECTION SHEETS
E TEACHER DEMOGRAPHICS FORM
F CHILD DEMOGRAPHICS FORM
G SOCIAL VALIDITY MEASURE
LIST OF TABLES
Table 2.1: Participant Demographics
Table 2.2: Study Outcomes
Table 2.3: Rigor Coding Questions from the Single Case Analysis Review and Framework (SCARF)
Table 2.4: Quality & Breadth of Measurement Coding Questions from the Single Case Analysis Review and Framework (SCARF)
Table 2.5: SCARF Quality, Rigor, and Outcome Scores
Table 3.1: Teacher Participant Demographics
Table 3.2: Child Participant Demographics
Table 3.3: IOA Agreement by Tier and Condition
LIST OF FIGURES
Figure 2.1: Preferred Reporting for Systematic Reviews and Meta Analyses (PRISMA) Flow Diagram
Figure 2.2: SCARF Quality and Rigor of Primary Outcomes
Figure 2.3: SCARF Quality and Rigor of Generalized Outcomes
Figure 2.4: SCARF Quality and Rigor of Maintained Outcomes
Figure 3.1: Accurate Use of the Prompting Hierarchy
Figure 3.2: Fidelity of Implementation Across Behaviors for Ms. Smith
Figure 3.3: Fidelity of Implementation Across Behaviors for Mr. Parker
CHAPTER 1
INTRODUCTION
Language delays in children can have serious negative implications for future
development, including educational and social development (Peterson, 2004). Prevalence
rates of speech and language disorders in children vary based on child age and diagnostic
criteria, but are estimated to affect between 3% and 16% of children in the US
(Rosenbaum & Simon, 2016). They are also found to frequently co-occur with other
neurodevelopmental disabilities, such as autism spectrum disorder (ASD; Rosenbaum &
Simon, 2016). Given the impact of language delays on daily functioning and social
interactions, and the importance of language development for growth and development in
other areas, the need for intervention for children with language delays cannot be
overstated.
Discrete trial teaching (DTT) became one of the most widely implemented
interventions for communication delays and many other developmental needs for
individuals with ASD (Schreibman et al., 2015). During the process of DTT, a targeted
skill is broken down into several components and the child and interventionist work on
learning one component at a time until the target behavior is mastered (Schreibman et al.,
2015). However, several limitations with the use of DTT in individuals with ASD were
identified, including lack of generalization, challenging behaviors, a lack of spontaneity,
and a heavy dependence on prompts for performance (Schreibman et al., 2015).
According to Schreibman et al. (2015), the identification of these shortcomings, along
with a surge in the literature examining developmental interventions for young children
with social-communication disorders such as ASD, led to increased interest in more
naturalistic interventions.
Naturalistic Developmental Behavioral Interventions
More naturalistic teaching strategies began to be used after a surge in
developmental research suggested that children with ASD learn on a developmental
trajectory that is more similar to, rather than different from, typical development
(Schreibman et al., 2015). This, in combination with the limitations of DTT, led to an
incorporation of more naturalistic intervention strategies for children with ASD
(Schreibman et al., 2015). Naturalistic intervention strategies use natural reinforcers
(rather than arbitrary ones), use materials that children prefer, reinforce child attempts at
communicating or approximations of a target response, and intervene in a more natural
context rather than one that is contrived. Naturalistic developmental behavioral
interventions (NDBIs) combine these naturalistic intervention strategies with
developmental interventions and principles of applied behavior analysis to create a class
of interventions all its own (Schreibman et al., 2015). According to Schreibman et al.
(2015), there are several common features of NDBIs, including (1) a three-part
contingency; (2) manualized practice; (3) individualized treatment goals; (4) ongoing
measurement of progress; (5) child-initiated teaching episodes; (6) environmental
arrangement; (7) use of prompting and prompt fading; (8) modeling; (9) adult imitation
of the child’s language, play, or body movements; and (10) broadening the attentional
focus of the child. Naturalistic interventions, including NDBIs, incorporate not only the
natural environments of children (e.g., classroom, home), but also intervention
implementers who are part of children’s natural environments (e.g., parents, teachers).
Training natural implementers to carry out NDBIs has the potential to substantially increase
dosage for children with disabilities receiving intervention within the
natural environment (Peterson, 2004). Parents and teachers have many more
opportunities throughout the day and the week to implement intervention techniques and
to help address children’s identified needs. Teachers have been identified as one of the
prime candidates in terms of natural implementers and several studies have demonstrated
their ability to implement various interventions with fidelity. For example, teachers have
been trained to implement prelinguistic milieu teaching (PMT; McCathren, 2000),
manualized interventions targeting joint attention (Kaale et al., 2012), enhanced milieu
teaching (Olive et al., 2007), naturalistic language teaching (Smith & Camarata, 1999),
and symbolic play and joint attention interventions (Wong, 2013). Indeed, several
intervention techniques including time delay, the mand-model procedure, and milieu
teaching have been identified as particularly suited to teacher implementation (Peterson,
2004).
Milieu Teaching
Milieu Teaching (MT) is an intervention technique that is conversation-based and
focuses on the child’s interests to encourage communication from the child (Kaiser et al.,
1993). MT has been successfully implemented with children with speech/language and
other communication delays. MT comprises three main mechanisms:
environmental arrangement, responsive interaction techniques, and milieu teaching
procedures (time delay, modeling, and mand-modeling; Peterson, 2004). As part of
environmental arrangement, the teacher or interventionist sets up the environment in such
a way as to encourage communicative acts on the part of the child (Peterson, 2004). For
example, an interventionist may place an object out of reach of the child or in a place
the child cannot easily reach without the teacher’s help. The hope is that such
object placement will occasion the child to communicate with the teacher in order to
receive help obtaining the desired item. Responsive interaction techniques include
following a child’s lead, turn-taking, providing descriptive statements, imitating the
child’s verbalizations, and expanding on the child’s statements (Peterson, 2004). Time
delay is a procedure in which one uses nonvocal cues to occasion vocal responding from
a child (Peterson, 2004). During this procedure, a teacher identifies something that the
child wants and looks at the child expectantly in the hope of occasioning a vocal
response. If this method does not work, the teacher then moves on to the mand-model
procedure. The mand-model procedure involves both manding (making a
request of the child) and modeling (demonstrating for the child what he/she is
expected to do). This is a teacher-initiated strategy in which the teacher directly asks the
child what he/she wants and then models the appropriate vocal response if no response is
given.
MT has a strong research base supporting its efficacy in teaching children with
language delays new language targets (Bolzani et al., 1990). Mand-model and incidental
teaching techniques improved communication in children with mild intellectual disability
(Warren & Gazdag, 1990), improved spontaneous production of multiple and single
words in children who experienced prenatal cocaine exposure (Bolzani et al., 2009),
increased language in children with developmental disabilities (Togram & Erbas, 2010),
and taught a photo exchange system to a child with ASD (Ogletree et al., 2012). In
addition to having trained implementers utilize MT to increase language gains in young
children with developmental disabilities, several studies have taught teachers to
implement MT with young children and have also demonstrated positive effects on children’s
language development (e.g., Kaiser et al., 1993). Therefore, research has shown that
children can benefit from teachers’ implementation of MT. However, the methods used
for training teachers to implement such interventions vary across studies.
Behavioral Skills Training
Behavioral Skills Training (BST) is a comprehensive training package that
incorporates four main elements: instruction, modeling, rehearsal, and
feedback (Kornacki et al., 2013; Ward-Horner & Sturmey, 2012). During the instruction
phase, the trainer provides information on the target intervention (usually in the form of
written directions or a slideshow). During modeling, the trainer demonstrates how to
perform the steps of the desired skill accurately for the learner. The rehearsal portion
gives the learner the opportunity to practice the desired skill with the trainer, so that
he/she may become more comfortable performing the target behaviors. Finally, the
trainer provides the trainee with corrective feedback as he/she implements the target
behaviors (Krumhus & Malott, 1980; Nuernberger et al., 2013).
BST has been used not only to train individuals to perform new behaviors or
engage in new tasks but also to train teachers and other professional
staff to conduct a variety of interventions. For example, BST has been used to train
teachers to implement the Picture Exchange Communication System (PECS) with their
students (Homlitas et al., 2014), to train staff to implement mand training (Nigro-Bruzzi
& Sturmey, 2010), to train teachers and staff to use discrete trial teaching (DTT; Jull &
Mirenda, 2016; Sarokoff & Sturmey, 2004), to train teachers to implement specific goals
from students’ behavioral intervention plans (BIP; Madzharova & Sturmey, 2018), and to
train teachers to implement response interruption and redirection (RIRD) with students
exhibiting self-stimulatory behaviors (Giles et al., 2018).
Additionally, BST has been used to teach several NDBIs in the literature,
including training teachers to implement the Natural Language Paradigm (NLP) and response
chaining (Seiverling et al., 2010), training teachers to effectively use NLP
(Gianoumis et al., 2012), and training paraprofessionals to implement the system of least
prompts using an embedded teaching procedure (Toelken & Miltenberger, 2012). The Toelken
and Miltenberger (2012) study is notable because its brief, embedded teaching procedure
allowed staff to perform the intervention effectively while still performing their regular
classroom duties, so the intervention did not interfere with their
primary teaching duties.
Thus, BST has been used to train a variety of NDBI techniques with a variety of
implementers, including teachers and other professional staff. Similarly, MT has been
shown to be effective in improving children’s language gains and outcomes. However,
studies examining MT report using a variety of training techniques and do not report
utilizing BST as a training package.
Purpose of the Studies
The purpose of the following studies is to (1) provide a systematic review of the
current literature regarding the use of BST to train teachers and other professional staff to
implement various behavioral interventions and (2) conduct an empirical investigation
examining the effects of BST on teachers’ fidelity of implementation of various MT
techniques. First, Study 1 sought to determine whether BST has been used effectively to
train other individuals (teachers and other professionals) to implement interventions with
children ages birth to 21 through a systematic review of the current literature. It also
sought to determine the quality of the literature base and whether any variables
impact the efficacy of BST on
fidelity of intervention implementation. Second, Study 2 sought to determine whether
BST was effective in training teachers to implement several MT techniques in the
classroom with fidelity. This study also examined whether the fidelity of implementation
of intervention techniques generalized to another set of toys and whether it maintained
over time. The combined results of these two studies have the potential to increase
confidence in the use of teachers and other professional staff as natural implementers
who can implement interventions with fidelity when properly trained. This has the
potential to increase dosage for students receiving interventions if they can be properly
implemented by teachers, who spend a great deal more time with students than most
other interventionists.
CHAPTER 2
STUDY 1: USING BEHAVIORAL SKILLS TRAINING TO TEACH
BEHAVIORAL INTERVENTIONS: A SYSTEMATIC REVIEW¹
¹ Slane, M. M., & Lieberman-Betz, R. To be submitted to Behavioral Interventions.
Abstract
Behavioral skills training (BST) is a well-researched, established set of principles that has
been used to train a variety of individuals to complete numerous behavioral tasks, skills,
strategies, and interventions (DiGennaro et al., 2018). It includes the four main
components of instruction, modeling, rehearsal, and feedback (DiGennaro et al., 2018)
and has been used to teach individuals from a variety of backgrounds (including natural
implementers) to implement various interventions (e.g., Madzharova & Sturmey, 2018;
Nigro-Bruzzi & Sturmey, 2010; Sarokoff & Sturmey, 2004). However, there has not been
a comprehensive review that has synthesized and analyzed the existing literature on the
use of BST to train staff and teachers working with children, adolescents, and young
adults to implement interventions. Therefore, the current review aimed to close this gap
in the literature by conducting a systematic review of studies utilizing BST to train
teachers and other professionals to implement various interventions with children ages
birth to 21. A total of 19 studies from 17 articles were included in the review. The
SCARF protocol (Ledford et al., 2016) was utilized to rate article quality/rigor and
outcomes of studies. All studies showed positive outcomes, suggesting that teachers and
other professional staff can be effectively taught using BST to implement a variety of
interventions with fidelity. However, only seven articles were found to have sufficient
quality/rigor scores in their primary outcomes to allow for interpretation of findings with
confidence. This indicates that additional high-quality studies are needed to examine the
efficacy of BST to teach others to implement interventions to support skill development in
individuals with disabilities. Implications for future research and intervention are
discussed.
INDEX WORDS: behavioral skills training, systematic literature review, natural
implementers, teachers
Using Behavioral Skills Training to Teach Behavioral Interventions: A Systematic
Review
Behavioral skills training (BST) is a well-researched, established set of principles
that has been used to train a variety of individuals to complete numerous behavioral
tasks, skills, strategies, and interventions (DiGennaro et al., 2018). In one of the earliest
studies to examine BST, Koegel et al. (1977) trained teachers to implement several
behavior modification procedures, including prompting, shaping, discrete trials, and
proper implementation of consequences. BST procedures involved reading a training
manual and watching video recordings of proper instructional techniques; implementing
the behavioral modification strategies and receiving live feedback from trained staff
regarding their performance; and receiving praise for correct performance, and corrective
feedback and modeling to rectify improper implementation. Using these procedures, the
authors were able to train teachers to implement behavioral modification techniques with
fidelity. Alden et al. (1978) further expanded the procedures that comprise BST, using
modeling and rehearsal as part of the initial training procedures rather than the corrective
feedback procedure. Around this time, recognition of the potential of BST to teach
important behaviors grew, and numerous studies examining its utility were published.
Several studies have utilized BST to increase knowledge and safety skills in
young children (e.g., Kolko et al., 1991; Miltenberger et al., 2009; Miltenberger &
Thiesse-Duffy, 1988; Wurtele & Owens, 1997; Wurtele, 1990), to teach children how to
find help when lost (Pan-Skadden et al., 2009), to teach safety skills aimed at preventing
sexual abuse (Wurtele et al., 1986), and to prevent child abduction (e.g., Bromberg &
Johnson, 1997; Johnson et al., 2005, 2006). BST has also been used to promote
knowledge about and reduce the risk of HIV/AIDS (e.g., Adams et al., 1992; Boyer &
Kegeles, 1991; Lawrence et al., 1995), to prevent gun play in young children (e.g., Himle
& Miltenberger, 2004; Miltenberger et al., 2004, 2005), and to encourage smoking
cessation (Glasgow & Lichtenstein, 1987). In recent years, the use of BST has further
expanded to include training teachers, staff, and parents to implement behavior analytic
principles as well as other interventions.
BST has been used to train teachers to implement complex behavior intervention
plans in the classroom (Madzharova & Sturmey, 2018); to train staff to use mand training
with children (Nigro-Bruzzi & Sturmey, 2010); to use discrete-trial teaching (Sarokoff &
Sturmey, 2004); and to improve the use of positive reinforcement and error correction and
increase opportunities for responding (Palmen et al., 2010). BST has also been used to
train intervention-naïve adults to implement the picture exchange communication system
(PECS; Rosales et al., 2009). Additionally, researchers have used BST to train parents to
implement a variety of behavior analytic techniques. Parents have been taught to address
food selectivity (Seiverling et al., 2012), promote social skills development (Hassan et
al., 2018), implement guided compliance (Miles & Wilder, 2009), and implement
discrete-trial teaching (Ward-Horner & Sturmey, 2008).
Because of the extensive use of BST to train individuals to use or implement new
skills, several studies have sought to identify the most potent components of the
intervention package. Krumhus and Malott (1980) independently analyzed three
components of BST: (1) instructions, (2) modeling, and (3) feedback. Although
use of instructions alone showed slight improvements in accuracy, use of modeling
drastically increased accuracy, and use of feedback led to further increases. In a follow-up
study, Ward-Horner and Sturmey (2012) found that while modeling was an important
component of BST, feedback was the most effective and necessary component of the
training package. However, Kornacki et al. (2013) found that the key component for
BST success varied across individual participants. Given the results of these studies, BST is
now generally considered a four-component training package consisting of (1)
instruction, (2) modeling, (3) rehearsal, and (4) feedback (DiGennaro et al., 2018).
The BST literature has grown tremendously over the last 40 years, with its uses,
trainees, and contexts increasingly expanding. The field has identified the critical
components of this training package and has established it as a well-supported, evidence-
based training package. The contexts in which BST can be applied are constantly
increasing, expanding both the utility and applicability of BST to numerous behavior
analytic principles and interventions. In addition, BST has evolved from a training
package to change specific target behaviors to one that can be used to train other
individuals to implement a variety of intervention procedures. However, there has not
been a study that has synthesized and analyzed the existing literature on the use of BST to
train staff and teachers working with children, adolescents, and young adults to
implement various interventions. Such an analysis could benefit the field in several ways.
First, it would provide a comprehensive overview of the current BST-intervention
literature, including the populations with which it has been implemented, the topics of
focus (i.e., interventions, behavioral principles), and the outcomes of BST
implementation. Second, it would bring to light variables or factors that may influence
the efficacy/effectiveness of BST to teach others to implement interventions. Third, such
a review could aid future researchers and clinicians in determining whether BST is an
appropriate training package for training others to implement a target intervention and
highlight any significant considerations. Fourth, a review would provide a sense of the
quality of the current literature and in turn provide directions for future research to help
increase the quality of future studies.
Purpose of the Review
The purpose of the current review is to systematically synthesize and analyze the
BST-intervention literature to help guide both research and practice. This review will
support the translation of research into practice and will help guide clinical decision
making for assessing and evaluating whether BST is the appropriate training package
based on type of intervention, outcome variables, population, context, and quality of
published studies. The following research questions will be addressed:
1. Is BST efficacious when used to train other individuals to implement
interventions with children, adolescents, and young adults (ages birth to 21)?
2. What is the quality of studies comprising the current literature base examining the
use of BST to teach interventions?
3. What, if any, are the pertinent intervention variables that influence the
efficacy/effectiveness of BST on acquisition of intervention skills?
Method
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
guidelines were used to guide decision making throughout the literature review. The
PRISMA guidelines were developed in an effort to ensure scientific rigor and structural
commonality among systematic reviews and meta-analyses (Moher et al., 2009).
Article Search
A keyword search of the following databases was conducted to identify studies
for review: (1) PsycINFO, (2) Psychology and Behavioral Sciences Collection, and (3)
Education Research Complete. Search terms were entered as follows: Line 1: “behavioral
skills training;” AND Line 2: “intervention.” If available, the following options were also
selected: publication type (peer-reviewed journals) and language (English). Studies were
published before or during December 2019, when the search was conducted. In addition
to the database search, the reference lists of all eligible studies were reviewed for relevant
articles to increase the scope of the search.
The following inclusion criteria were used to identify eligible studies: (1) the
study must have been published in a peer-reviewed journal (dissertations and theses were
excluded); (2) the study must either have been written in or translated to English; (3) the
study must have been quantitative in nature, utilizing either a group or single case design;
(4) the phrase “behavioral skills training” must have appeared somewhere in the article
(not the references section alone); and (5) the study must have used BST to train teachers,
paraprofessionals, or other staff to implement some form of intervention with children,
adolescents, or young adults (ages birth to 21 years).
A total of 172 studies were identified through the initial online search, with an
additional 21 studies identified through other sources, for a total of 193 studies to be
screened for full inclusion criteria. After duplicates were removed, a total of 146 studies
remained. The abstracts of these studies were reviewed using the full eligibility criteria
and a total of 124 were excluded. The remaining 22 full text articles were examined to
confirm eligibility and five were excluded due to failure to meet full inclusion criteria. In
the end, a total of 19 studies from 17 articles were included in the review (see Figure 2.1
for a complete diagram of study inclusion/exclusion).
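The screening counts above can be checked with simple arithmetic. The following is a minimal sketch that uses only the numbers reported in this section; the variable names are illustrative and are not part of the PRISMA guidelines. The final count is in articles, and the 19 included studies come from those articles because some articles reported more than one study.

```python
# Counts as reported in the text; variable names are illustrative only.
database_hits = 172          # identified through the online database search
other_sources = 21           # identified through reference lists and other sources
identified = database_hits + other_sources

after_dedup = 146            # reported number remaining after duplicates were removed
duplicates_removed = identified - after_dedup

excluded_at_abstract = 124   # excluded during abstract review
full_text_reviewed = after_dedup - excluded_at_abstract

excluded_at_full_text = 5    # excluded during full-text review
articles_included = full_text_reviewed - excluded_at_full_text

print(identified, duplicates_removed, full_text_reviewed, articles_included)
```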
Article Coding
Descriptive Information
Studies were reviewed and coded for descriptive characteristics. Data were
collected on trainees (e.g., age and gender), intervention recipients (e.g., age, diagnoses,
gender, demographic information), BST implementation (components used and fidelity),
intervention type, and intervention quality (setting, target behaviors, fidelity, and
effectiveness).
Evaluation Criteria for Study Quality
The methodology of all included studies, each of which used a single case design (SCD), was
evaluated using the Single-Case Analysis and Review Framework (SCARF; Ledford et
al., 2016). The SCARF was designed to assess study (1) rigor, (2) quality, and (3)
outcomes. Outcome scores on the SCARF of 3.0 - 4.0 or higher are consistent with
confidence in the demonstration of a functional relation. Similarly, quality/rigor scores of
2.0 or higher are considered to be strong enough for the study results to be interpreted.
The three areas assessed are discussed in more detail below.
Rigor. The three quality indicators for study rigor are reliability, fidelity, and
sufficiency of data. Evidence of the reliability of the dependent variable is determined
through examining the collection, reporting, and levels of interobserver agreement (IOA)
data. Evidence of fidelity of the independent variable is determined by looking at the
fidelity of implementation, sufficiency of the fidelity data, the frequency of fidelity data
collection, and the use of inter-observer agreement for fidelity data. Finally, evidence of
the sufficiency of the data is determined by examining the number of data points per
condition and the overall trend of the data when switching between conditions.
Quality of measurement. The seven indicators for study quality are social and
ecological validity, participant descriptions, condition descriptions, dependent variable
descriptions, two forms of generalization measurement, and measurement of
maintenance. Evidence of the social and ecological validity of the study is determined
through examining feasibility and acceptability data, psychometric properties of utilized
measures, normative comparisons for dependent variables, and the environment in which
the study is implemented. Evidence for participant description is determined by
examining quality of demographic data, reporting of formal test results, general
participant information, and study inclusion criteria. Condition descriptions are evaluated
by analyzing the description of condition procedures, the dosage, the setting, and the
demographics and training characteristics for the individuals implementing the
interventions.
The dependent variables are evaluated in terms of their operational definitions,
examples of target and non-target behaviors, and the description of the measurement
system and its use. Evidence for generalization is determined by evaluating the
description of generalization across contexts, materials, individuals, and settings in
addition to the programming of behavior generalization. Finally, evidence for
maintenance is evaluated by examining the reporting of continued data collection and
behavior change, the number of times maintenance is evaluated, and the time frame
during which maintenance is assessed.
Outcomes. The three quality indicators for study outcomes involve reporting for
primary outcomes, generalized outcomes, and the maintenance of outcomes. Evaluation
of these quality indicators requires the examination of the type of measurement used to
determine outcomes, the strength and evidence for treatment efficacy/effectiveness, and
the generalization and maintenance of reported study outcomes.
Interrater Reliability
First, interrater reliability (IRR) was examined for inclusion of studies.
Approximately 30% of all considered studies were reviewed by a second rater to
determine IRR (n = 43). Studies were judged to be included, excluded, or uncertain by
the second rater. Any study that was placed in the uncertain group and any disagreements
on inclusion were fully reviewed, discussed, and resolved by the two reviewers until a
consensus was reached. When screening a study for inclusion, raters first searched the
article for the phrase “behavioral skills training.” If the study did not include this phrase,
it was excluded from further eligibility review. If the study did include this phrase, the
raters then reviewed it to ensure it satisfied all five of the inclusion criteria described
above for inclusion in the review. IRR for inclusion/exclusion of studies was 88%.
For studies meeting criteria for full review, 30% were reviewed by a second rater
to determine IRR for article quality coding using the SCARF (n = 6). The two raters had
to demonstrate 80% reliability on two consecutive training studies before reliability
coding could begin. If an article’s IRR rating fell below 80%, discrepancies were
reviewed, discussed, and resolved by the two raters until a consensus was reached. IRR
was determined by dividing the number of agreements by the number of agreements plus
disagreements and multiplying by 100. Average IRR was 80%, with a range of 74%-85%.
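The agreement formula described above (agreements divided by agreements plus disagreements, multiplied by 100) can be sketched as follows. The function name and the example counts are hypothetical and serve only to illustrate the calculation; they are not data from the review.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Point-by-point agreement: agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coded item is required")
    return 100 * agreements / total


# Hypothetical example: two raters agree on 40 coded items and disagree on 10.
example_irr = percent_agreement(40, 10)
print(f"{example_irr:.1f}%")
```

Under this formula, an article falls below the 80% criterion whenever more than one in five coded items is a disagreement.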
Data Analysis
The following methods were used to address each of the proposed research
questions:
1. To determine whether BST was efficacious when used to train other individuals to
implement some form of intervention with children, adolescents, and young adults
(ages birth to 21), the primary, generalized, and maintenance outcome SCARF
ratings for each study were analyzed and synthesized. Outcome scores of 3.0 - 4.0
or higher are consistent with confidence in the demonstration of a functional
relation. This information allowed for the determination of BST
efficacy/effectiveness in improving immediate intervention outcomes as well as
generalized and maintained outcomes.
2. In order to evaluate the quality of studies in the current literature base, all SCARF
variables were used to assign overall quality/rigor ratings. Scores of 2.0 or higher
are considered to be strong enough for the study results to be interpreted.
3. To determine the pertinent intervention variables that influence the
efficacy/effectiveness of BST on acquisition of intervention skills, procedural
fidelity and use of BST components were examined.
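The decision rule implied by the first two analysis steps can be sketched as below. This is an illustration under stated assumptions, not the SCARF scoring tool itself: the lower bound of the "3.0 - 4.0" outcome range is treated as the cutoff, the scores in the example are hypothetical, and the label for studies below the outcome cutoff is invented for the sketch.

```python
from dataclasses import dataclass

QUALITY_RIGOR_MIN = 2.0  # threshold stated in the text
OUTCOME_MIN = 3.0        # assumption: lower bound of the "3.0 - 4.0" range


@dataclass
class StudyScores:
    label: str
    quality_rigor: float
    outcome: float


def classify(scores: StudyScores) -> str:
    """Combine overall quality/rigor and outcome scores into one judgment."""
    interpretable = scores.quality_rigor >= QUALITY_RIGOR_MIN
    positive = scores.outcome >= OUTCOME_MIN
    if interpretable and positive:
        return "high quality evidence of positive effects"
    if positive:
        return "low quality evidence of positive effects"
    # Hypothetical label; the text does not name this category.
    return "outcome below threshold"


# Hypothetical scores for illustration only.
for study in [StudyScores("Study A", 2.4, 3.5), StudyScores("Study B", 1.6, 3.2)]:
    print(study.label, "->", classify(study))
```

The point of the two-part rule is that a high outcome score alone is not sufficient; the quality/rigor score determines whether that outcome can be interpreted.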
Results
Research Design
All included studies used a multiple baseline or multiple probe experimental
design (see Table 2.1). Eleven studies utilized a multiple baseline across participants
design (one of which was nonconcurrent). One study used a multiple baseline across
behaviors design. Of the seven studies using multiple probe designs, all were across
participants.
Participant Characteristics
As shown in Table 2.1, 11 studies examined the use of BST to train teachers as
implementers of interventions, seven studies examined the use of BST to train clinic staff
to implement intervention, and one study examined the use of BST to train swimming
instructors to implement intervention with their students. Across all 19 studies, 74 natural
implementers were trained using BST. Natural implementers ranged in age from 19 to 50;
however, eight of the included studies did not report the age range for the participants
trained using BST. Implementers trained using BST had a variety of years of experience
ranging from no experience to multiple years of experience. Implementers' race/ethnicity
was only reported in one study (Chazin et al., 2018).
Across 17 studies, a total of 71 intervention recipients participated, with an age
range of 2 to 12 years; however, two studies did not report the number of intervention
recipients (Aherne & Beaulieu, 2019; Palmen et al., 2010) and an additional three studies
did not include their age ranges (Davenport et al., 2019; Hassan et al., 2017; Hogan et al.,
2015). Child intervention recipient race/ethnicity data were not reported in any studies
included in this review. The majority of child intervention recipients had a diagnosis of
autism spectrum disorder (ASD; n = 12 studies). Two studies listed developmental
disability as the primary diagnosis; one listed multiple disabilities, including global
developmental delay; one listed multiple physical disabilities; and diagnosis was not
reported in three studies (Aherne & Beaulieu, 2019; Davenport et al., 2019; Hogan et al.,
2015).
General Intervention Characteristics
The majority of the studies were conducted in a classroom/school environment (n
= 13). The remaining occurred at a home/training center (n = 1), a clinic/training center
(n = 1), a clinic (n = 1), a community pool (n = 1), a home (n = 1), and a treatment center
for individuals with ASD (n = 1). BST was used to target a variety of intervention skills,
including discrete trial teaching (DTT; n = 4), fidelity of behavior intervention plan (BIP)
implementation (n = 2), the natural language paradigm (NLP; n = 2), reading racetrack
intervention (n = 1), incidental teaching (n = 1), and others. Sixty-eight percent of studies
(n = 13) also reported outcomes for child recipients, such as unprompted functional
communications, number of sight words read correctly, percentage of correct responses,
vocalizations, and stereotypy (see Table 2.2). All 19 studies reported using all four BST
intervention components (instruction, modeling, rehearsal, and feedback).
All 19 studies reported positive effects of BST, with BST effectively improving
the teachers’ and staff members’ implementation of the trained intervention. Such effects included
percentage of correct responses for discrete trial teaching (DTT), percentage of Reading
Racetrack intervention steps implemented correctly, percentage of steps performed
correctly in a natural language paradigm (NLP) intervention, and number of errors in
implementing response interruption and redirection (RIRD) for stereotypy. See Table 2.2
for a complete list of the foci of BST and recipient outcomes measured. Interestingly,
Aherne and Beaulieu (2019) reported the use of a self-evaluation procedure to help with
maintenance of learned skills. This is similar to the self-recording of performance
described by Nabeyama and Sturmey (2010), which together suggest that BST can be
combined with self-monitoring procedures to improve outcomes. Seiverling et al. (2010)
combined general-case training with BST in order to improve NLP and response chaining
performance. Finally, Chazin et al. (2018) found that BST effectively improved staff
implementation of student behavior intervention plans when training was combined with
coaching but not with training alone. Thus, while the majority of studies demonstrated
that BST alone was effective at improving teacher and staff fidelity of implementation of
various interventions, several studies noted the inclusion of other training components as
well.
Study Rigor
All 19 studies reported using all four BST intervention components (instruction,
modeling, rehearsal, and feedback). Thus, all key components were included for BST as
it was implemented in the studies. Ledford et al. (2016) describe study rigor as reliability
of the dependent variable, procedural fidelity, and the data itself (see Table 2.3 for coding
details). Overall, 100% of studies reported dependent variable reliability data (n = 19);
84.2% reported collecting reliability data in both primary comparison conditions for at
least 20% of sessions overall and reported greater than 80% agreement. Thus, the studies
in this review had mostly strong reliability data. In contrast, only 42.1% (n = 8) of studies
reported collection of fidelity data for the independent variable and of that 42.1%, only
37.5% (n = 3) reported collecting fidelity data in both primary conditions. However,
100% (n = 8) of these studies reported fidelity data of 80% or higher. Thus, the studies
that did report procedural fidelity reported good procedural fidelity with high ratings.
Even so, fewer than half of the studies in this review included fidelity data, and only three
reported collecting such data in both primary conditions. This lack of fidelity data across
over half the studies calls into question whether the interventions were implemented as
intended, and thus impacts the confidence with which we can attribute positive findings
to the BST training. Regarding sufficiency of data, 68.4% (n = 13) of studies had at least
three data points per condition and 84.2% (n = 16) had enough data to infer a functional
relation. However, for the studies that did not satisfy these criteria, inferences of a
functional relation between the independent and dependent variables are limited. A
greater number of data points allows for the detection of patterns, variability, and
trends in the data, whereas two data points can suggest a trend in one direction or
another that may be highly misleading without additional data points.
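Because the rigor percentages above use two different denominators (all 19 studies, or only the 8 studies that reported fidelity data), a short calculation may help keep them apart. The sketch below uses only counts reported in this section; the helper name is illustrative.

```python
TOTAL_STUDIES = 19
FIDELITY_REPORTING = 8  # studies that reported independent variable fidelity data


def pct(n: int, denominator: int) -> float:
    """Percentage of n out of denominator, rounded to one decimal place."""
    return round(100 * n / denominator, 1)


fidelity_of_all = pct(FIDELITY_REPORTING, TOTAL_STUDIES)  # share of all studies
both_conditions_of_reporting = pct(3, FIDELITY_REPORTING)  # share of the 8 reporting studies
both_conditions_of_all = pct(3, TOTAL_STUDIES)             # same 3 studies, out of all 19
three_points_of_all = pct(13, TOTAL_STUDIES)               # at least three data points
sufficient_data_of_all = pct(16, TOTAL_STUDIES)            # enough data for a functional relation

print(fidelity_of_all, both_conditions_of_reporting, both_conditions_of_all)
print(three_points_of_all, sufficient_data_of_all)
```

Expressed against all 19 studies, the three studies that collected fidelity data in both primary conditions make up a much smaller share than the 37.5% figure, which is relative to the eight fidelity-reporting studies only.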
Study Quality
Study quality is composed of participant, condition, and dependent variable
descriptions; the examination of social/ecological validity; and maintenance and
generalization (see Table 2.4 for coding details). When describing participants, only
42.1% (n = 8) of studies gave complete descriptions of participants including the number
of participants, ages, gender, etc. In fact, only one study included race/ethnicity in its
description of participants, but only for implementers and not for intervention recipients.
This was a relative weakness for the studies included in this review, making it difficult to
determine for whom BST may be an appropriate and effective intervention. Conditions
for both primary comparison conditions were adequately described in 84.2% (n = 16) of
studies, demonstrating a relative strength for this group of studies. Similarly, authors
adequately described observable characteristics of dependent variables (i.e., operational
definitions) in 78.9% (n = 15) of studies, demonstrating yet another relative strength for
the studies included in this review. Of note, failure to include operational definitions is a
serious problem for reporting measurement of the dependent variable, and not only
makes the study difficult to replicate but makes it difficult to determine specific
behaviors that were measured as part of the study.
Overall, 73.7% (n = 14) of studies reported examining social/ecological validity.
Therefore, this is a relative strength of the studies included in this review. A total of
47.4% of studies (n = 9) assessed generalization in some form. One study assessed
generalization across contexts, four studies assessed generalization across individuals,
and five studies assessed generalization across responses. Of note, one study assessed
generalization across both responses and individuals (Sarokoff & Sturmey, 2008). Thus,
fewer than half of the studies in this review assessed generalization in any form, indicating a
weakness for this group of studies. Similarly, only 42.1% of studies (n = 8) assessed
maintenance of outcomes. With fewer than half of the studies having assessed maintenance,
this also represents a weakness for this group of studies.
Primary Outcomes
Study rigor, quality, and outcome scores are provided in Table 2.5. For a
complete description of study outcome coding, see Appendix A. Overall quality/rigor
scores of 2.0 or higher are considered to be strong enough for the study results to be
interpreted. In addition, outcome scores of 3.0 - 4.0 or higher are consistent with
confidence in the demonstration of a functional relation (Ledford et al., 2016). As shown
in Table 2.5, based on outcome data for primary outcomes, all but two studies
demonstrated an outcome score of 3.0 or higher. However, when combined with their
overall quality/rigor scores, only seven studies (36.8%) had sufficient overall
quality/rigor and outcome scores to infer a functional relation with confidence (Chazin et
al., 2018; Davenport et al., 2019; Fetherston & Sturmey, 2014 [Study 1]; Homlitas et al.,
2014; Nabeyama & Sturmey, 2010; Sarokoff & Sturmey, 2008; Seiverling et al., 2010),
while 63.2% of studies (n = 12) had scores indicating low quality evidence of positive
effects (see Figure 2.2). Studies with high enough overall quality/rigor and outcomes
scores on the SCARF to be considered rigorous and high-quality examined the use of
BST to train fidelity of implementation of behavior intervention plans (BIP; Chazin et al.,
2018), reading racetrack intervention (Davenport et al., 2019), the picture exchange
communication system (PECS; Homlitas et al., 2014), discrete trial teaching (DTT;
Fetherston & Sturmey, 2014 [Study 1]; Sarokoff & Sturmey, 2008), the natural language
paradigm (NLP) and response chaining (Seiverling et al., 2010), and guarding procedures
for patients with ambulatory difficulties (Nabeyama & Sturmey, 2010).
Generalization and Maintenance
Generalization was assessed in 47.4% (n = 9) of studies (outcome scores greater
than 0 in Table 2.5). Of those studies that assessed generalization, 88.9% (n = 8) had
sufficient overall quality/rigor and outcome ratings for their generalization findings to be
interpreted with confidence (Fetherston & Sturmey, 2014 [all three studies]; Gianoumis
et al., 2012; Nabeyama & Sturmey, 2010; Nigro-Bruzzi & Sturmey, 2010; Palmen et al.,
2010; Sarokoff & Sturmey, 2008). Generalization was assessed across responses (n = 4;
Fetherston & Sturmey, 2014 [all three experiments]; Palmen et al., 2010), individuals (n
= 3; Gianoumis et al., 2012; Nabeyama & Sturmey, 2010; Sarokoff & Sturmey, 2008),
and contexts (n = 1; Nigro-Bruzzi & Sturmey, 2010). For seven out of eight studies,
generalization was measured in the context of the study design, and for one study it was
measured pre- and post-intervention. Finally, all but one study received a rating indicating
that consistent, positive effects were shown via the context of the measurement design.
The remaining study received a rating indicating inconsistent or weak positive
generalization effects. According to Figure 2.3, which shows the overall
quality and rigor of generalization measurement and generalized outcomes, 88.9% (n = 8)
of studies that measured generalization demonstrated high quality evidence of positive
effects and 11.1% of studies (n = 1) that measured generalization had low quality
evidence of positive effects. Overall, these studies suggest that BST can generalize across
a variety of parameters.
Maintenance was assessed in 42.1% (n = 8) of studies (outcome scores greater
than 0 in Table 2.5). Of those studies that assessed maintenance, 87.5% (n = 7) had
sufficient overall quality/rigor and outcome ratings for their maintenance findings to be
interpreted with confidence (see Figure 2.4; Aherne & Beaulieu, 2019; Davenport et al.,
2019; Hassan et al., 2017; Homlitas et al., 2014; Jimenez-Gomez et al., 2019; Nabeyama
& Sturmey, 2010; Palmen et al., 2010). Four out of seven studies received scores
indicating that maintenance data were collected at least one week, but less than one month,
after intervention was completed. The remaining three studies collected maintenance data
one or more months after the completion of intervention. Finally, all seven studies
showed maintenance data similar to intervention or criterion, and five of the seven
measured maintenance on more than one occasion. Additionally, 12.5% of studies (n = 1)
measuring maintenance had scores indicating low quality evidence of positive maintained
effects (see Figure 2.4). Overall, these studies indicate that skills learned from BST can
be maintained up to and over a month after intervention.
In sum, findings indicate that seven studies had both strong measurement
characteristics and primary outcomes that can be interpreted with confidence. Eight
studies indicated quality generalization measurement and outcomes and
seven studies indicated quality maintenance measurement and outcomes.
Summative Quality
Although most studies that measured generalization and maintenance did so in a
manner that allowed for the interpretation of their results with confidence, only a few
studies demonstrated sufficient overall quality/rigor and outcomes in multiple areas. The
only study to demonstrate sufficient quality/rigor and outcomes across primary outcomes,
generalization, and maintenance was Nabeyama and Sturmey (2010). Therefore, this
study was the most rigorous and comprehensive in terms of SCARF ratings and protocols
indicating high levels of quality in all three areas assessed. Two studies demonstrated
high quality/rigor in the areas of primary outcomes and generalization (Fetherston &
Sturmey, 2014; Sarokoff & Sturmey, 2008). Of note, this was only demonstrated in Study
1 of Fetherston and Sturmey (2014). Maintenance was not assessed in either of these
studies, so quality and outcome indicators for that area could not be determined.
Similarly, two studies demonstrated high quality/rigor in the areas of primary outcomes
and maintenance (Davenport et al., 2019; Homlitas et al., 2014). However, generalization
was not assessed in either of these studies, so quality and outcome indicators for that area
could not be determined.
Discussion
The current review evaluated 19 studies from 17 articles examining the effects of
BST on intervention implementation by natural implementers (teachers and other
professional staff) with children ages birth to 21. These studies were coded and analyzed
using the SCARF coding procedure (Ledford et al., 2016). All studies utilized a multiple
baseline or multiple probe design. The majority of interventionist-recipient relationships
were teacher-child, and the majority of interventions took place in the classroom or
school environment. All studies reviewed utilized all of the BST training components, but
a few incorporated a self-evaluation or self-report component, which helped improve
performance. The reviewed studies had several strengths in their reporting, including
dependent variable reliability data, social/ecological validity, description of dependent
variable characteristics (i.e., operational definitions), description of both primary
comparison conditions, and sufficiency of data. However, there were also several
weaknesses in their data reporting, including BST fidelity reporting and participant
descriptive characteristics.
Regarding SCARF scoring, fewer than half of the studies produced high enough
overall quality/rigor and outcome ratings for their primary outcome data to be interpreted
and to infer a functional relation with confidence. Similarly, generalization and
maintenance were assessed in fewer than half of all studies. However, the majority of these
studies received sufficient overall quality/rigor and outcomes scores to interpret
generalization/maintenance findings with confidence. Only one study produced sufficient
ratings in overall quality/rigor and outcomes across all three areas assessed: primary
outcomes, generalization, and maintenance. Additionally, only four studies produced
sufficient ratings across two outcome areas assessed.
This review demonstrated that although previous research has shown that BST
can be used to train individuals to implement a large variety of interventions, only a
certain number of included studies were found to be of sufficient overall rigor/quality and
outcomes to infer a functional relation with confidence. These included studies that
taught natural implementers to implement behavior intervention plans (BIP), reading
racetrack intervention, the picture exchange communication system (PECS), discrete trial
teaching (DTT), the natural language paradigm (NLP), and guarding procedures for
patients with ambulatory difficulties. The studies that implemented these intervention
techniques were found to be of higher quality among the studies included in this review.
However, that is not to say that all included studies that implemented these interventions
were rated as having high quality.
Implications
This review has shown that although BST has an expansive literature base, not all
BST studies have the same level of quality. Consumers of the BST literature should therefore
be discerning in their review of existing and future BST studies.
Similarly, this is important for future researchers to help them ensure that their research
meets the highest quality standards to increase confidence in research findings. In
addition, although generalization and maintenance were included in several studies, less
than half of the studies reviewed incorporated these measures. These are important
components of BST research and future research should consider incorporating these
elements into the research design.
The identified studies had several strengths in their reporting. However, some
weaknesses in their data reporting were also noted, including reporting BST fidelity and
participant descriptive characteristics. This indicates a need for future research that
incorporates these elements into study design and reports on them more thoroughly in the
body of the text. Without intervention fidelity data, it is difficult to determine whether
BST was implemented as planned and thus whether it is responsible for changes observed
in the data. Similarly, without appropriate participant descriptions, it will be difficult for
future replication to occur, limiting an important part of scientific inquiry. In addition,
this limitation makes it difficult to collect information about the characteristics of
individuals for whom intervention may be effective. These two limitations highlight the
need for future research in which BST fidelity is measured and sufficient participant
characteristics are provided.
Every study included in this review reported some form of positive effects of BST
on interventions implemented by natural implementers (i.e., teachers or staff). Therefore,
this review has shown that several studies with well-established overall rigor/quality and
outcomes demonstrated that teachers and/or staff members can be effectively trained to
implement various interventions using the BST package. This is important because
teachers spend a great deal of time with children and thus, if taught to use various
interventions that target children’s needs, have the potential to increase the dosage of
intervention exponentially. Rather than seeing an interventionist once or twice a week for
an hour (or even daily for an hour), children could have the opportunity to receive
intervention daily for several hours a day while at school. This would likely lead to faster
gains and improvements in skills. However, it would also be important to make using the
intervention in the context of the classroom feasible for teachers. Therefore, future
research should continue to focus on incorporating intervention strategies into daily
classroom routines and structure.
This review has also identified particular interventions that are more likely to
work well with BST based on high quality SCARF coding. These include behavior
intervention plans (BIP), the reading racetrack intervention, the picture exchange
communication system (PECS), discrete trial teaching (DTT), the natural language
paradigm (NLP), and guarding procedures for patients with ambulatory difficulties. This
finding is important for two main reasons. One, it helps to strengthen the existing BST
literature by highlighting interventions that are likely to be successful due to high quality
research. Two, it encourages future research to expand on those intervention areas that
did not receive high SCARF overall quality/rigor and outcome scores, such as incidental
teaching, activity schedules, cognitive behavioral therapy (CBT), and parent-child
interaction therapy (PCIT)-like verbal behaviors. Of note, some of these interventions
alone may have a strong literature base, but this review focused only on those studies that
involved natural implementers (teachers and staff).
In addition to interventions, this review also helped identify some characteristics
of participants that have benefitted from this type of BST training. Teachers and staff
members ranging in age from 19 to 50 with a wide range of experience and backgrounds
were included in the studies. Thus, implementers between the ages of 19 and 50 and with
varying years of experience and backgrounds ranging from no experience to multiple
years of experience may be likely to benefit from BST coaching. Similarly, intervention
recipients ranged in age from 2 to 12 years, and the majority had a diagnosis of ASD.
Thus, children who are between the ages of 2 and 12 and who have a diagnosis of ASD
may be likely to benefit from intervention from natural implementers who have had BST
training. However, some studies did not report participant demographic information and
thus these data are missing from these ranges.
Limitations
The current review has several limitations worthy of note. First, the review is not
comprehensive of all studies involving training natural implementers with BST. Parents,
peers, and other natural implementers were excluded from the study due to a focus on
teachers and other professional staff members. Although important contributions to the
BST research, studies involving parents and other natural implementers are beyond the
scope of this review. Future reviews should focus on these other natural implementers.
Second, the method of score calculation for SCARF scores is not always inclusive
of all study information. For example, depending on whether some questions are
answered “yes” or “no,” the remaining questions in a section must be coded “NA.”
However, these questions are not always mutually exclusive, and questions that could
otherwise receive an answer of “yes” and earn more points must be coded as “NA,”
lowering the overall score for the study. Thus, a lower score may reflect the coding
rules rather than the absence of such an element in the study. Such limitations are not
unexpected, as no systematic analysis framework is without its limitations.
Conclusion
The current review demonstrated that BST can be successfully used to train
teachers and other professionals to implement a variety of interventions. However,
certain studies were rated by SCARF as having greater rigor and quality, which warrants more confidence in
their outcomes. These seven studies involved training the following
intervention techniques: behavior intervention plans (BIP), the reading racetrack
intervention, the picture exchange communication system (PECS), discrete trial teaching
(DTT), the natural language paradigm (NLP), and guarding procedures for patients with
ambulatory difficulties. However, all studies, even those that received lower SCARF
scores, showed some form of improvement in intervention implementation after BST
training. The current review has shown that teachers and other professional staff can be
effectively taught various intervention techniques using BST and can then use those
intervention techniques with fidelity, increasing the potential intervention dosage for
recipients.
Table 2.1
Participant Demographics
| Study | Implementer/Recipient Relationship | Implementer N (age range) | Implementer Gender (%M) | Recipient N (age range) | Recipient Gender (%M) | Recipient Diagnoses |
|---|---|---|---|---|---|---|
| Aherne (2019) | Staff/Client | 3 (22-29) | 33% | NR | NR | NR |
| Chazin (2018): Study 1 | Teacher/Student | 4 (23-30) | 0% | 1 (3) | 100% | Multiple disabilities including GDD |
| Davenport (2019) | Teacher/Student | 3 (NR) | 0% | 3 (NR) | NR | NR |
| Fetherston (2014) Study 1 | Teacher/Student | 4 (NR) | NR | 4 (3-12) | NR | Developmental Disability |
| Fetherston (2014) Study 2 | Teacher/Student | 4 (NR) | NR | 4 (5-10) | NR | Developmental Disability |
| Fetherston (2014) Study 3 | Teacher/Student | 3 (NR) | NR | 3 (10-12) | NR | ASD |
| Gianoumis (2012) | Teacher/Student | 3 (25-34) | 0% | 6 (3-4) | NR | ASD |
| Giles (2018) | Teacher/Student | 3 (26-33) | NR | 3 (6-12) | 100% | ASD - Motor Stereotypy |
| Hassan (2017) | Staff/Client | 7 (22-32) | 29% | 7 (NR) | NR | ASD |
| Hogan (2015) | Staff/Client | 4 (25-34) | 0% | 2 (NR) | NR | NR |
| Homlitas (2014) | Teacher/Student | 3 (NR) | NR | 9 (2-7) | NR | ASD |
| Jimenez-Gomez (2019) | Staff/Client | 5 (NR) | NR | 3 (2-4) | 100% | ASD |
| Jull (2016) | Swimming Instructor/Child | 6 (19-30) | 17% | 8 (5-8) | 88% | ASD |
| Nabeyama (2010) | Staff/Client | 3 (21-24) | NR | 3 (7-8) | 100% | Multiple physical disabilities |
| Nigro-Bruzzi (2010) | Staff/Client | 6 (NR) | NR | 6 (2-6) | NR | ASD |
| Palmen (2010) | Staff/Client | 4 (41-50) | 50% | NR | NR | ASD |
| Sarokoff (2004) | Teacher/Student | 3 (NR) | NR | 1 (3) | NR | ASD |
| Sarokoff (2008) | Teacher/Student | 3 (NR) | 0% | 5 (M = 5) | 100% | ASD |
| Seiverling (2010) | Teacher/Student | 3 (23-42) | NR | 3 (3-4) | NR | ASD |

Note. Studies identified by last name of first author and year. NR = not reported; ASD = autism spectrum disorder; GDD = global
developmental delay.
Table 2.2
Study Outcomes
| Study | Study Design | Study Setting | BST Focus | Trained Intervention Outcomes | Brief Results |
|---|---|---|---|---|---|
| Aherne (2019) | MB-P | Home/Training Center | DTT | NR | BST was effective in teaching DTT; outcomes were not maintained for 2/3 participants; a self-evaluation procedure helped with maintenance |
| Chazin (2018): Study 1 | MP-P | School | BIP implementation | Unprompted functional communication | BST improved staff implementation of student BIPs when training was combined with coaching but not with training alone |
| Davenport (2019) | MP-P | School | Reading Racetrack intervention | Number of sight words read correctly | BST taught teachers to implement the reading racetrack intervention |
| Fetherston (2014) Study 1 | MP-P | School | DTT | Percentage of correct responses by learners | BST was effectively used to train DTT |
| Fetherston (2014) Study 2 | MP-P | School | Incidental teaching | Percentage of correct responses by learners | BST was effectively used to train incidental teaching |
| Fetherston (2014) Study 3 | MP-P | School | Activity Schedules | Percentage of correct responses by learners | BST was effectively used to train staff to implement activity schedules |
| Gianoumis (2012) | MB-P | School | NLP | Child vocalizations and maladaptive behavior | BST was used to train teachers to implement NLP |
| Giles (2018) | MB-P | School | RIRD | Stereotypy | Teachers were able to implement RIRD after BST |
| Hassan (2017) | MP-P | Clinic/Training Center | CBT intervention | NR | Staff improved CBT intervention over self-study alone |
| Hogan (2015) | MB-P | School | BIPs | NR | BST improved implementation of student BIPs |
| Homlitas (2014) | MB-P | School | PECS | NR | BST was used to train teachers to implement Phases 1, 2, and 3A of PECS |
| Jimenez-Gomez (2019) | MP-P | Clinic | PCIT type verbal behaviors | NR | BST was used to train staff to implement PCIT like verbal behaviors |
| Jull (2016) | NC MB-P | Community Swimming Pool | Compliance and swimming skills | Swimming skills | 5/6 instructors showed improvement in implementing swimming behavior techniques with BST |
| Nabeyama (2010) | MB-P | School | Ambulation guarding procedures | Distance ambulated | BST combined with self-recording of performance was used to train staff to correctly implement guarding procedures for clients with ambulatory difficulties |
| Nigro-Bruzzi (2010) | MB-P | School | Mand training | Unprompted mands | BST was used to effectively teach staff to implement mand training |
| Palmen (2010) | MB-B | Treatment Center for Individuals with ASD | Error correction, positive reinforcement, and initiating opportunities | Response efficiency and correct target behavior | BST was used to improve implementation of all behaviors and significantly improved error correction |
| Sarokoff (2004) | MB-P | Home | DTT | NR | BST effectively trained teachers to accurately implement DTT |
| Sarokoff (2008) | MB-P | School | To use BST to train staff to implement DTT | Correctly identifying target sight words | BST was used to effectively train staff to implement DTT |
| Seiverling (2010) | MB-P | School | NLP and response chaining | Emission of vocal chains | BST and GCT were effectively used to train NLP and response chaining |

Note. Studies identified by last name of first author and year. MB-P = multiple baseline across participants; MB-B = multiple baseline
across behaviors; NC MB-P = nonconcurrent multiple baseline across participants; MP-P = multiple probe across participants; DTT =
discrete trial teaching; IMRF = instructions, modeling, rehearsal, and feedback; BST = behavioral skills training; NLP = natural
language paradigm; RIRD = response interruption/re-direction; CBT = cognitive behavioral therapy; BIP = behavior intervention plan;
PECS = picture exchange communication system; PCIT = parent child interaction therapy; ASD = autism spectrum disorder; GCT =
general-case training.
Table 2.3
Rigor Coding Questions from the Single Case Analysis Review and Framework (SCARF)
Criteria

Dependent Variable Reliability
1. Do authors report dependent variable reliability data?
2. Do authors report collection of agreement data in both primary comparison conditions and for at least 20% of sessions overall?
3. Are dependent variable reliability data (e.g., IOA data) calculated on a point-by-point basis, and is agreement higher than 80% (or higher than 0.60 Kappa) in each primary comparison condition?
4. Was agreement data collected by observers who were blind to study conditions and/or purpose?

Independent Variable Reliability (Fidelity)
1. Do authors report any data related to fidelity of implementation?
2. Do authors report the use of self-report fidelity only?
3. Do authors report fidelity data suggesting fidelity of more than 80% or evidence of differentiation between conditions?
4. Do authors report collecting fidelity in both primary comparison conditions? (e.g., baseline and intervention). If the measurement context is separate from the treatment context, PF should be collected in both.
5. Do authors report fidelity data collection in at least 20% of sessions?
6. Do authors report fidelity data separately for each primary comparison condition? Note: 100% fidelity and explicit collection in both conditions meets this criteria.
7. Do authors (a) collect agreement data on fidelity assessments (e.g., two observers assess fidelity and compare their assessments to get a percentage of agreement) or (b) are data collected by observers blind to study condition or purpose?

Sufficiency of Data
1. Do at least three data points exist in each primary comparison condition?
2. Is the design a multiple baseline or multiple probe design?
3. Did data collection begin simultaneously during initial baseline or probe conditions?
4. Are more data points needed in any primary comparison condition due to (a) within-condition variability, (b) within-condition changes in level or trend, or (c) potential covariation between tiers in a multi-tier design?
5. Do at least four data points exist in each primary comparison condition, or in conditions with only three data points is one of the following true: all points at baseline or ceiling levels, data reached a criterion level, or no overlap with adjacent conditions is present?
6. Do at least five data points exist in each primary comparison condition, or in conditions with only three data points is one of the following true: all points at baseline or ceiling levels, data reached a criterion level, or no overlap with adjacent conditions is present?

Note. IOA = interobserver agreement.
Table 2.4
Quality & Breadth of Measurement Coding Questions from the Single Case Analysis
Review and Framework (SCARF)
Social and Ecological Validity
1. Do authors report feasibility or acceptability ratings via interviews, questionnaires, or surveys?
2. Do authors report psychometric data for the interviews, questionnaires, or surveys; or do they provide a citation to another source that shows acceptable psychometric data?
3. Do authors report one or more of the following: (1) blind raters of importance of results, acceptability or feasibility of procedures, or acceptability of dependent variables, (2) normative comparisons?
4. Do authors report the use of typical environments and/or report the use of indigenous implementers or social partners?

Participant Descriptions
1. Do authors report demographic information, including age and diagnosis or eligibility category, for all participants?
2. Do authors report formal test results (e.g., IQ, language competence, achievement)?
3. Do authors report general information about participants (e.g., educational placement, problem behaviors, functional repertoire of behaviors, areas of strength and weakness)?
4. Do authors report inclusion criteria or pre-intervention behaviors for all participants?

Condition Descriptions
1. Are procedures for both primary comparison conditions adequately described?
2. Is dosage adequately described?
3. Is setting described for both primary comparison conditions general (i.e., if relevant: location, individuals in environment, physical characteristics)?
4. Are implementers adequately described in terms of training and demographic characteristics? If indigenous implementers are used, "yes" on this question requires authors to report (a) how implementers were trained, and (b) evidence that the training was completed as described (e.g., implementation fidelity).

Dependent Variable Descriptions
1. Do authors describe observable characteristics of dependent variables (e.g., operational definitions)?
2. Do authors provide examples and/or non-examples of target behaviors?
3. Do authors adequately describe measurement system? (e.g., counts, duration, 5-s partial interval system, 15-s momentary time sampling)
4. Do authors describe how system was used? (e.g., Were data collected by implementers or another individual? Were data collected in-vivo, via audio or video?)

Generalization Measurement 1
1. Do authors report assessment of a target behavior performed in a context that is different than training/primary outcome measurement?
2. Do authors report assessment of a target behavior performed with materials that are separate from those used in training/primary measurement context?
3. Do authors report assessment of a target behavior performed with a different social partner than those used in training/primary measurement context?
4. How do authors measure generalization across materials, social partners, or settings?

Generalization Measures 2
1. Do authors measure a behavior that is a generalized tendency, in addition to the primary outcome of interest?
2. Do authors teach one specific behavior or type of behavior, but measure a different specific (not generalized) behavior as a measure of generalization (response generalization)?
3. How do authors measure generalized behavior? (e.g., either by measuring a generalized tendency or generalization of an explicitly taught behavior).

Maintenance Measurement
1. Do authors report evidence of continued behavior change, during post-intervention sessions?
2. Is this maintenance measured on more than one occasion?
3. When is maintenance measured?
Table 2.5
SCARF Quality, Rigor, and Outcome Scores
| Study | Primary Outcome: Rigor | Primary Outcome: Quality | Primary Outcome: Overall Quality/Rigor | Primary Outcome: Outcome | Generalization Measures: Quality/Rigor | Generalization Measures: Outcome | Maintenance Measures: Quality/Rigor | Maintenance Measures: Outcome |
|---|---|---|---|---|---|---|---|---|
| Aherne (2019) | 1.0 | 1.6 | 1.2 | 5.0 | 0.0 | 1.0 | 3.0 | 5.0 |
| Chazin (2018): Study 1 | 3.0 | 2.0 | 2.7 | 4.0 | 0.0 | 1.0 | 1.0 | 5.0 |
| Davenport (2019) | 3.3 | 1.9 | 2.8 | 5.0 | 0.0 | 1.0 | 3.0 | 4.0 |
| Fetherston (2014) Study 1 | 2.7 | 1.6 | 2.3 | 5.0 | 4.0 | 5.0 | 0.0 | 1.0 |
| Fetherston (2014) Study 2 | 1.3 | 1.6 | 1.4 | 5.0 | 4.0 | 5.0 | 0.0 | 1.0 |
| Fetherston (2014) Study 3 | 1.3 | 1.6 | 1.4 | 4.2 | 4.0 | 4.2 | 0.0 | 0.2 |
| Gianoumis (2012) | 1.3 | 2.1 | 1.6 | 3.8 | 2.0 | 4.8 | 0.0 | 0.8 |
| Giles (2018) | 2.0 | 1.7 | 1.9 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| Hassan (2017) | 1.7 | 1.9 | 1.7 | 2.0 | 0.0 | 1.0 | 4.0 | 5.0 |
| Hogan (2015) | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| Homlitas (2014) | 2.7 | 1.9 | 2.4 | 5.0 | 0.0 | 1.0 | 4.0 | 5.0 |
| Jimenez-Gomez (2019) | 0.7 | 1.7 | 1.0 | 4.6 | 1.0 | 3.6 | 3.0 | 3.6 |
| Jull (2016) | 0.7 | 1.9 | 1.1 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| Nabeyama (2010) | 2.3 | 2.6 | 2.4 | 5.0 | 4.0 | 5.0 | 4.0 | 5.0 |
| Nigro-Bruzzi (2010) | 2.0 | 1.7 | 1.9 | 2.5 | 3.0 | 4.5 | 0.0 | 0.5 |
| Palmen (2010) | 0.7 | 2.6 | 1.3 | 2.0 | 3.0 | 3.0 | 3.0 | 5.0 |
| Sarokoff (2004) | 2.3 | 1.0 | 1.9 | 5.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| Sarokoff (2008) | 2.3 | 2.4 | 2.4 | 4.7 | 4.0 | 4.7 | 0.0 | 0.7 |
| Seiverling (2010) | 3.0 | 1.9 | 2.6 | 4.9 | 0.0 | 0.9 | 0.0 | 0.9 |

Note. Studies identified by last name of first author and year. SCARF = Single Case Analysis Review and Framework.
Figure 2.1.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Flow Diagram
[Flow diagram, summarized by stage.]
Identification: Records identified through database searching (n = 172); additional records identified through other sources (n = 21).
Screening: Records after duplicates removed (n = 146); records screened (n = 146); records excluded (n = 124).
Eligibility: Full-text articles assessed for eligibility (n = 22); full-text articles excluded, with reasons (n = 5).
Included: Studies included in qualitative synthesis (n = 19 in 17 articles).
Note. This figure shows the number of studies that were considered for eligibility in the
review and how they were reviewed and analyzed to obtain the final number of studies.
Figure 2.2
SCARF Quality and Rigor of Primary Outcomes
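[Scatterplot. Vertical axis: Primary Outcomes. Horizontal axis: Overall Study Quality & Rigor. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]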
Note. SCARF = Single Case Analysis Review and Framework. Filled-in circle data
points represent individual studies. The graph is interpreted as follows: Data points that
fall in quadrant one indicate low quality evidence of positive effects. Data points that fall
in quadrant two indicate high quality evidence of positive effects. Data points that fall in
quadrant three indicate low quality evidence of negative or minimal effects. Finally, data
points that fall in quadrant four indicate high quality evidence of negative or minimal
effects. In sum, the highest quality studies with the best outcomes fall in quadrant two.
Figure 2.3
SCARF Quality and Rigor of Generalized Outcomes
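[Scatterplot. Vertical axis: Generalized Outcomes. Horizontal axis: Quality & Rigor of Generalization Measurement. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]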
Note. SCARF = Single Case Analysis Review and Framework. Only studies that
measured generalization are included. Filled-in circle data points represent individual
studies. The graph is interpreted as follows. Data points that fall in quadrant one indicate
low quality evidence of positive effects. Data points that fall in quadrant two indicate
high quality evidence of positive effects. Data points that fall in quadrant three indicate
low quality evidence of negative or minimal effects. Finally, data points that fall in
quadrant four indicate high quality evidence of negative or minimal effects. In sum, the
highest quality studies with the best outcomes fall in quadrant two.
Figure 2.4
SCARF Quality and Rigor of Maintained Outcomes
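[Scatterplot. Vertical axis: Maintained Outcomes. Horizontal axis: Quality & Rigor of Maintenance Measurement. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]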
Note. SCARF = Single Case Analysis Review and Framework. Only studies that
measured maintenance are included. Filled-in circle data points represent individual
studies. The graph is interpreted as follows. Data points that fall in quadrant one indicate
low quality evidence of positive effects. Data points that fall in quadrant two indicate
high quality evidence of positive effects. Data points that fall in quadrant three indicate
low quality evidence of negative or minimal effects. Finally, data points that fall in
quadrant four indicate high quality evidence of negative or minimal effects. In sum, the
highest quality studies with the best outcomes fall in quadrant two.
CHAPTER 3
STUDY 2: TEACHING THE TEACHER: USING BEHAVIORAL SKILLS
TRAINING TO TRAIN TEACHERS TO IMPLEMENT MILIEU TEACHING
TECHNIQUES¹
¹ Slane, M. M., & Lieberman-Betz, R. To be submitted to Behavioral Interventions.
Abstract
Behavioral skills training (BST) is a common set of four core principles that are used to
train others to implement various skills, behaviors, and interventions; it is well-
researched and has a strong literature base (DiGennaro et al., 2018). However, BST has
only begun to be examined in the literature on Naturalistic Developmental Behavioral
Interventions (NDBIs). NDBIs are a class of interventions that combine principles
of applied behavior analysis and the developmental sciences to improve skills in
children with disabilities, ranging from basic developmental skills such as joint attention and
eye contact to more complex skills such as language and social interaction (Schreibman et
al., 2015). One NDBI that has led to improvements in children’s early language
development is Milieu Teaching (MT; e.g., Bolzani et al., 2009; Warren & Gazdag,
1990). The present study sought to investigate the utility of BST in the training of MT
techniques. Two teachers were trained to implement three MT techniques with children
who were minimally verbal: following the child’s lead, teaching social routines, and the
system of least prompts (Fey, 2008). A concurrent multiple baseline across behaviors
design, replicated across teacher participants, was used. One teacher showed an increase in
fidelity of implementation of all three techniques when BST was introduced, effectively
implementing each of the MT components. The second teacher showed an increase in
fidelity of implementation for the first two techniques with limited evidence of a
functional relation for the third technique. These improvements in skill also generalized
to a new set of toys and materials and maintained over time. Thus, the present study
showed that BST can be effectively used to train natural implementers to carry out MT
techniques with fidelity.
INDEX WORDS: milieu teaching, behavioral skills training, naturalistic interventions,
teachers
Teaching the Teacher: Using Behavioral Skills Training to Train Teachers to
Implement Milieu Teaching Techniques
Language impairments in children are fairly common, with community prevalence
estimates ranging from 7% to 17% (King et al., 2005). In addition, language impairments
are frequently co-morbid with neurodevelopmental disabilities such as autism
spectrum disorder (ASD; Rosenbaum & Simon, 2016). These delays have been
associated with lower performance in cognitive, academic, and language domains in later
development (Johnson et al., 1999). Thus, the importance of early intervention for
children demonstrating language delays cannot be overstated. However, most traditional
methods of intervention delivery, such as pull-out sessions with speech/language
pathologists (SLPs) in schools, allow sessions to occur only once or twice a week for
about an hour, which may not be enough for children with language delays to
close the gap with their typically developing peers. This has led to a surge in
research involving natural implementers of intervention, such as teachers. Peterson
(2004) points out that teachers spend a great deal more time with children than traditional
interventionists. Therefore, if teachers were taught to implement various language
interventions, the dosage for these interventions has the potential to increase dramatically
with sessions happening multiple times a week, for multiple hours a day. However, in
order for this to occur, teachers must be properly trained in the various intervention
techniques so that they are able to implement these techniques with fidelity.
Behavioral Skills Training
Behavioral skills training (BST) is a comprehensive training package designed to
teach new skills and techniques to a variety of individuals (Kornacki et al., 2013; Ward-
Horner & Sturmey, 2012). BST comprises four primary components: (1) instruction,
(2) modeling, (3) rehearsal, and (4) feedback. During the first step, learners are given
instructions regarding how to perform the desired skill or behavior. Next, the therapist or
researcher models accurate completion of the target behavior. Then, the trainee and the
trainer practice with one another via role play (rehearsal). Finally, the trainer watches the
trainee implement the learned skill in the target environment and provides corrective
feedback (Krumhus & Malott, 1980; Nuernberger et al., 2013). Although all components
of BST have been established as important for proper implementation, modeling and
feedback have been identified as the critical components that increase the fidelity of
implementation of new skills (Krumhus & Malott, 1980; Ward-Horner & Sturmey,
2012).
BST has been used across a variety of settings and with a wide variety of
individuals. Studies have included using BST to increase on-task behavior for high-
functioning young adults with autism spectrum disorder (ASD; Palmen & Didden, 2012),
to improve undergraduate students’ ability to correctly implement functional analyses
(Iwata et al., 2000), and to train staff to implement Phases 1-3 of the Picture Exchange
Communication System (PECS; Homlitas et al., 2014). BST has also been used to teach
parents and caregivers of children with ASD to implement social skills training (Dogan et
al., 2017; Hassan et al., 2018), improve staff implementation of mand training and
subsequent unprompted mands in children (Nigro-Bruzzi & Sturmey, 2010), train
community staff and teachers to implement discrete trial training (Jull & Mirenda, 2016;
Sarokoff & Sturmey, 2004), train teachers to implement specific behavioral intervention
plan goals (Madzharova & Sturmey, 2018), and train teachers to implement response
interruption and redirection (Giles et al., 2018). Thus, BST is a well-established training
package that has demonstrated efficacy in a variety of contexts. However, researchers
have only recently begun to examine the use of BST to train professionals to implement
more naturalistic developmental behavioral interventions (NDBIs) with young children
with social-communication delays and disorders.
Naturalistic Developmental Behavioral Interventions
NDBIs are a class of interventions that combine principles of applied
behavior analysis and the developmental sciences (Schreibman et al., 2015). These
interventions are diverse in form and target multiple developmental domains, including
social, language, play, motor, and cognition (Schreibman et al., 2015). They also
typically target young children with social-communication disorders, such as those with
ASD (Schreibman et al., 2015). Schreibman et al. (2015) identified several common
features of NDBIs, including (1) three-part contingency; (2) manualized practice; (3)
individualized treatment goals; (4) ongoing measurement of progress; (5) child-initiated
teaching episodes; (6) environmental arrangement; (7) use of prompting and prompt
fading; (8) modeling; (9) adult imitation of the child’s language, play, or body
movements; and (10) broadening the attentional focus of the child. In a review of NDBIs
targeting pre-linguistic communication skills, Dubin and Lieberman-Betz (2020)
identified seven components that were common across the majority of studies reviewed:
(1) following the child’s lead, (2) prompting, (3) natural consequences (i.e., outcomes
that logically result from a behavior, such as providing a child with a desired object
immediately after he/she requests it), (4) instruction embedded in routines, (5)
environmental arrangement, (6) time delay, and (7) linguistic mapping. Thus, although there
is a multitude of NDBIs, there is also a set of common core features of NDBIs that
target early developing communication behaviors.
Teachers have been trained to implement a variety of NDBIs using various
training techniques, such as verbal explanation, modeling, and coaching; didactic training
sessions; and presentations. However, the degree to which these training techniques are
used and the exact nature of their use are not always clear. Previous studies have focused
on training teachers to implement NDBIs such as prelinguistic milieu teaching (PMT;
McCathren, 2000), manualized interventions targeting joint attention (Kaale et al., 2012),
enhanced milieu teaching (Olive et al., 2007), naturalistic language teaching (Smith &
Camarata, 1999), and symbolic play and joint attention (Wong, 2013). Despite the
growing number of NDBIs that have been implemented in the classroom by trained
teachers, there is a continued need to examine well-defined procedures that can be used
to train teachers to implement interventions successfully in their classrooms. BST is a
promising approach because it is a well-researched, routinized training procedure that
clearly lays out the process for training individuals, creating a more uniform standard for
training.
In an early study integrating BST and NDBIs, Seiverling et al. (2010) used a
multiple baseline across participants design to examine the use of BST to train preschool
teachers to use the Natural Language Paradigm (NLP) in their classrooms. After BST,
teachers successfully implemented NLP and response chaining with children with ASD
between 40 and 49 months of age. In a follow-up study, Gianoumis et al. (2012)
replicated these results, using a multiple baseline across participants design to show that
BST could be used to train preschool teachers to effectively use NLP with 3- to 4-year-old
children diagnosed with ASD. In yet another study designed to carve out a role for BST
in training adults to use naturalistic behaviorally-based interventions, Toelken and
Miltenberger (2012) utilized a multiple-baseline across behaviors design to examine the
use of BST to train paraprofessionals to implement the system of least prompts using an
embedded teaching procedure with two children with ASD (ages 4 and 5 years old). The
use of a brief, embedded teaching procedure allowed the staff to implement the behaviors
without interfering with their regular duties or causing an undue burden. Because the
evidence for the use of BST to train teachers, caregivers, and other professionals to
implement naturalistic behaviorally-based interventions is only emerging, it is important
to continue to examine its use to train teachers to use well-established intervention
procedures, such as milieu teaching.
Milieu Teaching
Milieu teaching (MT) is one of three component interventions that comprise
milieu communication teaching (MCT; Fey, 2008). MCT is an intervention package
composed of three main intervention strategies: (1) prelinguistic milieu teaching, (2)
milieu teaching, and (3) focused stimulation. The most appropriate technique is chosen
based on a child’s current level of early social communication development. In addition
to these three intervention strategies, MCT utilizes several core techniques throughout all
three intervention strategies. These include following the child’s lead (FTCL), teaching
social routines (TSR), and setting up the environment (Fey, 2008). The MT intervention
technique is selected once a child is producing a minimum of 5 words, with the outcome
goal of furthering the development of the child’s language (Fey, 2008). In MT, FTCL,
TSR, and setting up the environment are combined with time delay and a mand-model
prompting hierarchy and several other techniques to develop and elicit communication
from the child. In the present study, the focus was on three core techniques believed to be
most amenable to the classroom environment and that fit participants’ current language
levels: FTCL, TSR, and the system of least prompts (SLP).
Following the Child’s Lead
FTCL is a technique that involves allowing the child to direct and lead the
interactions while the adult follows along with the child (Fey, 2008). In order to maintain
the children’s interest during intervention sessions, interventionists are taught to follow
children’s attentional leads and to focus on objects and routines of interest to the children
(Fey et al., 2017; Fey, 2008). Given that young children, especially those with social
communication delays, tend to attend to and focus more on objects they find interesting,
FTCL involves allowing the child to direct the interaction and to select the play materials
(Fey et al., 2017). This practice helps to ensure that children remain interested and
engaged during interactions with adults. Adults often try to direct children’s play by
making suggestions, asking questions, or issuing commands. However, if communicative
acts are to be encouraged, then the adult must learn to engage in play without utilizing
these behaviors and dominating the interaction. In essence, the adult must learn to be the
respondent rather than the director.
Teaching Social Routines
Social routines involve a collection of events that repeatedly occur in the same
pattern; they are created when a particular manner of playing or interacting occurs in the
same sequence repeatedly (Fey, 2008). Using identified routines, adults learn to create
opportunities for shared interaction. Rather than simply imitating the child’s play and
engaging in parallel play as in FTCL, adults serve as active participants by inserting turns
into the interaction. The goal is to create a back and forth routine in which the child and
adult take turns initiating and responding to one another. During the adult’s turn, he/she
can then pause, creating an opportunity for the child to communicate. If the adult does
not complete his/her turn, the child may look up, or engage in other forms of
communication to continue the routine. For example, when engaging in FTCL, the adult
and child may engage in parallel play, each running separate cars down separate tracks.
In contrast, when building a social routine, the interventionist and child might use the
same car and track, requiring the adult and child to take turns sending the car down the track. Thus,
the adult has increased the likelihood that the child will maintain interest in the
interaction by allowing him/her to select the toys and routine and has also created
opportunities for interaction by establishing a turn-based routine.
System of Least Prompts
The combined use of FTCL and TSR allows the adult to create an opening for a
learner to emit new words through the use of the system of least prompts (SLP), a third
component of MT. The SLP involves both time delay procedures and mand-model
procedures (Fey, 2008), which are followed in succession. First, target words are selected
based on the routines that an adult and a child have established, and the child’s current
language level (Fey, 2008). For example, if a child is using primarily single words, words
such as “car”, “track”, “up”, or “down” may be selected as the target during an activity
where an adult and child are taking turns running a car down a track. After following the
child’s interest in the car and track and establishing the social routine of taking turns with
the car, the adult occasions a response from the child by not giving the child the car
during his/her turn. The adult will then initiate the prompting hierarchy with time delay,
followed by a linguistic prompt, and then a linguistic mand-model prompt (Fey, 2008).
During SLP, the adult looks at the child expectantly when first withholding the object
(time delay), then asks the child for the correct response to be emitted (linguistic prompt)
and, if no response is given, provides a model while asking for the child to emit the
correct response (mand-model).
Milieu Teaching in the Literature
MT has a strong research base supporting its efficacy in teaching children with
language delays new language targets (e.g., Bolzani et al., 2009; Warren & Gazdag,
1990). In a study examining the effects of MT on communication development, Warren
and Gazdag (1990) found that MT, specifically mand-model and incidental teaching
techniques, successfully improved communication in children with mild intellectual
disability. In another study examining the effects of MT on communication development
in children with prenatal cocaine exposure, Bolzani et al. (2009) found that participants
benefitted from MT, improving their spontaneous production of single and multiple
words. Similarly, Togram and Erbas (2010) found the mand-model component of MT to
be effective in increasing language in children with developmental disabilities; these
effects were maintained 16 weeks after the conclusion of the study. Using four types of
MT language prompts, including models, commands, questions, and time delay, Ingersoll
et al. (2012) found that both MT and a combined condition (MT plus responsive
interaction) were superior to responsive interaction in increasing language targets in
children. Further demonstrating the effectiveness of MT, Christensen et al. (2013) found
that MT components (modeling, mand-model, time delay, and incidental teaching) were
effective for increasing language targets in preschool-aged children with ASD in an early
childhood special education classroom. In addition, MT has been used to teach and
promote a photo exchange system for a child with ASD (Ogletree et al., 2012). These
studies suggest that MT can be used to improve language abilities in children and that the
use of target components rather than the intervention as a whole can still be beneficial for
improving children’s language and early social communication.
In addition to having trained implementers utilize MT to increase language gains
in young children with developmental disabilities, several studies have also utilized
natural implementers and demonstrated similar positive language outcomes. Kaiser et al.
(1993) successfully taught teachers to implement environmental arrangement techniques
as well as MT in a classroom with nonvocal preschool children, leading to increases in
child communication. Similarly, Kaiser et al. (1995) were able to successfully train
parents to implement MT techniques with their children, demonstrating gains in child
communication abilities. In a more recent study, Aktas and Ciftcitekinarslan (2018)
reported a successful parent training program, where parents were trained to implement
the mand-model procedure of MT, resulting in language gains for their children
diagnosed with ASD.
These studies demonstrate that the mand-model procedure of MT can be used
effectively by parents and teachers if they are trained properly (i.e., with coaching and
feedback). Thus, it would be logical to conclude that a training package such as BST
would effectively train teachers in the classroom to implement other MT techniques (i.e.,
FTCL and TSR), in addition to SLP (including mand-model procedures). Although these
previous studies described using procedures similar to BST (e.g., coaching and
feedback), they do not explicitly state that BST was used to train parents or teachers in
learning to implement MT techniques. Even though there is a plethora of research
establishing the efficacy of BST and MT individually, there is a lack of research
examining whether BST can be used to effectively train teachers to implement MT
techniques in the classroom with fidelity.
The present study aimed to fill this gap in the literature by examining the efficacy
of BST to train teachers at a school for children with disabilities to implement several
core MT techniques, including: (1) FTCL, (2) TSR, and (3) SLP. Specifically, this study
sought to address the following questions:
1. Is BST effective in training teachers to implement MT techniques in the
classroom?
2. Will improvements in teachers’ abilities to implement MT techniques
generalize to novel materials and activities?
3. Will improvements in teachers’ abilities to implement MT techniques be
maintained over time?
Method
Participants
Two teacher-child dyads participated in this study. Teachers at a school for
children with disabilities in the southeastern United States were recruited to participate in
the current study. To be eligible for the study, teachers were required to (a) be willing to
participate in the study and (b) have at least one eligible child participant enrolled in their
classroom. Teacher participants were 20 and 47 years of age, with 1 week and 15 years of
teaching experience, respectively. For a complete description of teacher participant
demographics, see Table 3.1. One child per enrolled teacher was recruited to participate
in this study. To be eligible for participation, children were required to (a) be enrolled in
the classroom of an eligible teacher at a school for children with disabilities, (b) be
between 2 and 9 years of age, and (c) produce fewer than 5 referential words. Potential
participants were nominated by teachers and staff, and eligibility was confirmed via a
parent report measure and a brief, 20 min observation conducted by the first author (see
Appendix B for observation data sheet). During the observation, children’s language was
measured and documented and used to determine eligibility. It was initially thought
prelinguistic targets would be appropriate for intervention. However, once enrolled in the
study, it became clear that language targets were most appropriate based on the students’
communication skills and language levels. Therefore, MT was selected as the most
appropriate intervention for both child participants. For a complete description of child
participant demographics, see Table 3.2.
Dyad 1: Ms. Smith and Sarah
Ms. Smith. Ms. Smith had her associate’s degree and was certified as a
paraprofessional and a tutor, and served as the lead teacher in her classroom. She reported
experience working with children with a variety of disabilities, including ASD,
intellectual disability, Down syndrome, emotional disturbance, specific learning
disabilities (SLD), other health impairment (OHI), speech and/or language impairment,
visual and hearing impairments, traumatic brain injury, attention deficit hyperactivity
disorder (ADHD), and social communication deficits. She reported teaching in both
special and general education classrooms as well as gifted classrooms. Ms. Smith worked
at the school for a total of four years. She had nine students in her classroom, all of whom
had known diagnoses, including intellectual disability, speech and/or language
impairments, and several others. All nine children had an individualized Growth and
Performance plan (GPP), which was the school’s equivalent of an individualized
education program (IEP). She reported no experience with naturalistic social
communication interventions, but she had worked as a registered behavior technician
where she assisted lead teachers and board-certified behavior analysts (BCBAs) in
classroom and therapeutic settings. Ms. Smith reported minimal experience with
professional training, having participated in a previous study several years ago where she
was trained to perform various behavioral techniques, such as discrete trial teaching. Ms.
Smith reported using several strategies to support communication in the classroom,
including positive reinforcement, encouragement, motivation, differentiation, and peer
modeling.
Sarah. Sarah was 8 years, 0 months old and had diagnoses of Phelan-McDermid
syndrome, developmental delay, and sensory dysfunction. Her full-scale IQ according to
the Abbreviated battery of the Stanford-Binet Intelligence Scales, Fifth Edition (SB-5)
was 47. Her nonverbal IQ was below 42 and her verbal IQ was below 43. Previously,
Sarah received interventions including speech and language therapy, occupational
therapy, and applied behavior analysis. She had been at the school for four years and had
a GPP with social communication goals. As such, at the start of the study, Sarah was
receiving speech/language therapy and occupational therapy. According to parent report,
she was not using any words functionally or meaningfully. However, based on examiner
observation, she produced and used approximately 5 words meaningfully and
functionally.
Dyad 2: Mr. Parker and Josh
Mr. Parker. Mr. Parker was brand new to both teaching and working at the
school and did not have any previous experience working with children. He worked at the
school for a total of one week prior to enrolling in the study and worked as a
paraprofessional in the same classroom as Ms. Smith. He reported no experience with
naturalistic social communication interventions and no experience with professional
training. Mr. Parker did not report using any strategies to support communication in the
classroom at the beginning of the study.
Josh. Josh was 8 years, 7 months old and had a diagnosis of ASD. His full-scale
IQ according to the Abbreviated battery of the SB-5 was 47. His nonverbal IQ was below
42 and his verbal IQ was below 43. Josh was not receiving any interventions at the time
of the study. This was his first year attending the school and he had a GPP with social
communication goals, focusing on communication and language. According to parent
report, he was able to produce 166 words. However, based on examiner observation, he
produced and used approximately 3 words meaningfully and functionally. Much of his
language consisted of echolalia, and teachers reported that they had never heard him
use more than a few words functionally and meaningfully.
Materials
Training Materials and Data Collection
A PowerPoint presentation containing the targeted information and video
examples was developed for each intervention strategy. During training sessions, the
PowerPoint presentation and videos were displayed on the researcher’s laptop computer
or on a larger projector screen when available. Data collection sheets were developed and
used to collect data on fidelity of BST for all teacher training sessions (see Appendix C).
All observation sessions were video recorded to allow for primary and reliability
coding of teacher behavior. Data collection sheets were developed to record teacher
strategy use across baseline and intervention sessions (see Appendix D). At the beginning
of SLP data collection, teachers used their own watches or an iPad displaying
the time to remind them to prompt for language targets every 2 min. An interval timer
that the teachers could keep on their person and that vibrated every 2 min was introduced
partway through SLP intervention to make it easier and less cumbersome for teachers to
determine when the 2 min interval had expired. It was introduced during Session 31 for
Ms. Smith and Session 32 for Mr. Parker during SLP intervention.
Baseline, Intervention, and Maintenance Toys
Toys were selected by Ms. Smith and Mr. Parker based on their knowledge of
Sarah’s and Josh’s interests. Toys used during baseline and intervention included the
following for both Sarah and Josh: magnets, books, puzzles, counting blocks, monkey
string with shape cards (a sticky string-like substance that is pliable and can adhere to
surfaces), string blocks, color sorting bears, coloring, small balls, connecting people
(small people who created circles and chains by holding hands), and Colorino (a game
where you match colorful markers to the picture to make a 3D version of the picture). A
giant beachball also became available to Josh toward the end of baseline for FTCL and
the beginning of FTCL intervention, and a bike became available to Sarah during FTCL
intervention.
Generalization Toys
Generalization toys for Sarah included a trampoline, a racetrack with cars, train
tracks, a vacuum cleaner, taking a walk, and a medium sized ball. Toys for Josh included
a trampoline, a tricycle, a medium sized ball, and pipe cleaners.
Setting
All baseline and intervention sessions were conducted in a 1:1 format in
classrooms and other available spaces (i.e., library, hallway, outdoor recreational area) at
a school for children with disabilities in the southeastern United States. Classrooms were
divided into several areas, including a reading area, an arts and crafts area, a play area,
and a block area. Baseline and intervention data collection was conducted during
unstructured free play in the play area of the classrooms and other available school
spaces. Training sessions were conducted 1:1 with the teachers in classrooms and/or
conference rooms at the school. No children were present during initial BST sessions.
Formal Measures
Demographic Questionnaire
Both teachers and parents of eligible children were asked to complete a brief
researcher-developed demographic questionnaire to gather background information (see
Appendices E & F).
Stanford-Binet Intelligence Scales, Fifth Edition (SB-5; Roid, 2003)
The SB-5 is an individually administered assessment of intellectual functioning. It
is appropriate for individuals ranging in age from 2 to 85+ years. Each child was
administered the SB-5 at the beginning of the study to determine his/her level of
intellectual functioning. The SB-5 demonstrates good internal consistency with values
ranging from .84 to .98, as well as good test-retest reliability, with values ranging from
.74 to .97 (Janzen et al., 2004). Similarly, the SB-5 demonstrates good concurrent validity
with a variety of tests, including the Woodcock Johnson-III Tests of Cognitive Abilities
(Woodcock et al., 2001a), the SB-IV (Thorndike et al., 1986), and many others with
values ranging from .78 to .90 (Janzen et al., 2004). The SB-5 also demonstrated good
predictive validity with the Woodcock Johnson-III Tests of Achievement (Woodcock et
al., 2001b) and the Wechsler Individual Achievement Test-II (Wechsler, 2005) in
addition to good construct validity (Janzen et al., 2004).
The MacArthur-Bates Communicative Development Inventories Words and Gestures
Form (CDI; Fenson et al., 2007)
The CDI is a parent report measure that assesses a child’s early developing
language, including use and understanding of words and phrases as well as gestures.
Parents were asked to complete the Words and Gestures form of the CDI. This form is
normed for use with children ages 8 months to 30 months; however, it can be used for
children who are older than 30 months if their communication and development are
delayed. The CDI demonstrates good internal consistency (ranging from .62 to .76), and
test-retest reliability (ranging from .59 to .99; Law & Roy, 2008). In addition, it has
shown good convergent validity: .52 with the Preschool Language Scale-Revised
(Zimmerman et al., 1979), .67 with the Peabody Picture Vocabulary Test, Third Edition
(Dunn & Dunn, 1997), and .82 with the Reynell Developmental Language Scales
(Fenson et al., 2007; Reynell & Gruber, 1990).
Social Validity
At the conclusion of the training and intervention process, teachers were asked to
complete an acceptability rating scale to help determine whether they found the
intervention acceptable, helpful, and practical (adapted from Hendrickson et al., 1993).
This measure was used to help gather information regarding teacher acceptability of the
training process, perceived effectiveness of the intervention techniques, and overall
feasibility of the techniques for use in the classroom (see Appendix G).
Response Definitions and Measurement Procedures
Teachers were taught three early social communication intervention strategies
from the Milieu Teaching (MT) intervention that were implemented in the classroom
during 20 min, pull-out sessions in which the teacher and child worked together
separately from the larger classroom. Three core techniques were targeted: (1) following
the child’s lead (FTCL), (2) teaching social routines (TSR), and (3) the system of least
prompts (SLP). All teachers were taught the intervention techniques in the same order,
beginning with the most foundational skill and progressing toward more complex,
response interactive strategies (Hemmeter & Kaiser, 1994). All intervention techniques
were based on those described by Fey (2008) and Fey et al. (2017).
Following the child’s lead
FTCL was coded if the teacher was engaged in any of the following behaviors:
imitating the child’s play with objects while engaging in play alongside him/her (parallel
play), commenting on the child’s behavior as the child plays, or responding appropriately
to child interactions (e.g., if the child holds out a toy in the teacher's direction, he/she
should accept the toy). FTCL was not coded if the teacher was engaged in any of the
following behaviors: asking questions, issuing commands or directions, using directive
speech (e.g., “my turn,” “your turn”), doing nothing, doing something other than what the
child was doing, playing with a different toy or activity than the child, prompting, doing
the opposite of what the child requested, correcting a behavior or response, or
commenting on their own behavior or actions.
Measurement and data collection. Based on research regarding the most effective
form of time sampling methods (Lane & Ledford, 2014) and optimal interval length, each
20-min data collection session was divided into 80, 15-s intervals. Momentary time
sampling was used when scoring intervals for FTCL. Thus, at the end of each interval,
the interventionist scored whether the teacher was correctly engaging in FTCL. If the
child and adult could not be seen in the video frame together, they were not considered to
be within arm’s reach and FTCL was not coded. All codes were based on the exact end of
the interval and not what came after (e.g., at 45.999 the teacher moves her hand, but does
not place it onto the child’s hand until 46, the code was based on the teacher moving her
hand, not where it ended up going). These data were then used to calculate a percentage
of intervals during which the teacher correctly implemented FTCL by dividing the
number of intervals containing FTCL by the total number of intervals and multiplying by
100. An interval was scored as containing FTCL if the operational definition of FTCL
was met at the boundary of the interval.
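As a minimal illustrative sketch (not the coding tool used in this study), the interval arithmetic described above can be written in Python. The function names and the example scores are hypothetical; only the 20-min session length, the 15-s interval length, and the percentage formula are taken from the text.

```python
from typing import List, Sequence

SESSION_SECONDS = 20 * 60  # one 20-min data collection session
INTERVAL_SECONDS = 15      # length of each momentary time sampling interval


def interval_boundaries(session_seconds: int = SESSION_SECONDS,
                        interval_seconds: int = INTERVAL_SECONDS) -> List[int]:
    """Return the moments (in seconds) at which each interval ends.

    Momentary time sampling scores the behavior only at these moments.
    """
    return list(range(interval_seconds, session_seconds + 1, interval_seconds))


def percent_intervals(scores: Sequence[bool]) -> float:
    """Percentage of intervals in which the behavior was observed.

    scores[i] is True when the operational definition was met at the
    boundary of interval i, and False otherwise.
    """
    if not scores:
        raise ValueError("at least one scored interval is required")
    return 100 * sum(scores) / len(scores)


# Hypothetical session: the behavior is observed at 60 of the 80 boundaries.
example_scores = [True] * 60 + [False] * 20
example_percent = percent_intervals(example_scores)
```

In this hypothetical session, 60 of 80 scored boundaries yield 75% of intervals. Scoring only at the boundary means that behavior occurring within an interval but not at its end does not count toward the total.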
Teaching social routines
A social routine was operationally defined as: (1) successfully completing the steps in
a routine, (2) engaging in a routine, and (3) taking an imitative turn. TSR was coded as
correct if the teacher was successfully completing the steps in the identified routine,
engaging in the specified routine, taking an imitative turn, playing with the same toy as
the child and engaged in the same routine as the child, prompting the child to engage in
the routine (e.g., "my turn, your turn") or he/she was within the context of setting up or
maintaining the routine. TSR was still coded as correct if the routine was altered or
changed based on the child's interests as the teacher was still engaging in FTCL. TSR
was not coded if the teacher was engaged in parallel play (playing separately from the
child), engaged in an activity that did not allow for turn-taking (e.g., iPad or any game the
child played alone while the adult watched), engaged in a separate activity from the child,
was not within arm’s reach of the child, and/or was not visible in the same camera frame
as the child.
Measurement and data collection. Each 20-min data collection session was
divided into 80, 15-s intervals. Momentary time sampling was used when scoring
intervals. Thus, at the end of each interval, the interventionist scored whether the teacher
was correctly engaging in TSR. These data were then used to calculate a percentage of
intervals during which the teacher correctly implemented social routines by dividing the
number of intervals containing TSR by the total number of intervals and multiplying by
100. An interval was scored as containing TSR if the operational definition of TSR was
met at the boundary of the interval.
System of Least Prompts
The SLP is used to occasion a response from a child by moving through a
prompting hierarchy, starting with prompts that provide the least amount of support and
moving up to the most amount of support that is needed, or can be given, to occasion a
response. Teachers were told to start by giving an expectant look and waiting 3 s (time
delay: Step 1). If the children did not respond within 3 s, teachers then issued a linguistic
prompt, “What do you want?” (Step 2). If after another 3 s, children still did not respond,
teachers were instructed to give a linguistic model prompt and say, “X. You want X. Say
X.” encouraging the child to repeat the word after them (Step 3). See Figure 3.1 for a full
description of the prompting hierarchy. Note, this version of the prompting hierarchy was
adapted from Fey (2008). Several steps were removed from the hierarchy as described,
namely the cue step and the additional model step. This was done to ease the learning
burden for teachers and to make the SLP process more user-friendly. Seven steps were
defined and scored as correct or incorrect for the SLP variable. In order to have scored a
prompt, the teacher must have begun the prompting hierarchy with time delay. This
included withholding an object; pointing to or drawing the child’s attention to an object;
using gestures or vocalizations (e.g., “your turn”) and pausing for 3 s (range = 3 to 5 s);
and waiting in silence to see if the child vocalized. Additionally, the prompt must have
been designed to elicit a vocal response to be scored (e.g., conveyed the expectation that
the child responds vocally).
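The three-step hierarchy described above can be sketched as a simple escalation rule. The following Python fragment is an illustration only, under the assumption that a child response counts when it occurs within the 3-s wait that follows a step; the names Prompt and prompts_delivered and the latency representation are hypothetical and are not part of the study materials.

```python
from enum import Enum
from typing import List, Optional, Sequence

WAIT_SECONDS = 3.0  # wait after each step before moving up the hierarchy


class Prompt(Enum):
    TIME_DELAY = 1   # Step 1: expectant look, wait in silence
    LINGUISTIC = 2   # Step 2: "What do you want?"
    MODEL = 3        # Step 3: "X. You want X. Say X."


def prompts_delivered(latencies: Sequence[Optional[float]]) -> List[Prompt]:
    """Return the prompts a teacher delivers in one prompting episode.

    latencies[i] is the time in seconds the child took to respond after
    step i + 1, or None if the child did not respond to that step.
    The teacher stops at the first step that is followed by a response
    within WAIT_SECONDS; otherwise the hierarchy runs to its last step.
    """
    if len(latencies) != len(Prompt):
        raise ValueError("one latency (or None) is required per step")
    delivered = []
    for prompt, latency in zip(Prompt, latencies):
        delivered.append(prompt)
        if latency is not None and latency <= WAIT_SECONDS:
            break  # child responded, so prompting is discontinued here
    return delivered


# Hypothetical episode: no response to time delay, response 2 s after
# the linguistic prompt, so the model prompt is never needed.
episode = prompts_delivered([None, 2.0, None])
```

In this hypothetical episode the teacher delivers the time delay and the linguistic prompt and then stops, which corresponds to the least amount of support needed to occasion the response.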
SLP was implemented once every 2 min. First, a correct response was coded if the
teacher only gave one prompt within the 2 min time frame. In contrast, an incorrect
response was coded if the teacher gave more than one prompt or no prompts at all within
the 2 min time frame. Second, time delay was coded as correct if the teacher drew the
child’s attention to a specific item and then waited in silence for 3-5 s before issuing a
linguistic prompt (e.g., “What color?”). It was coded as incorrect if the adult waited less
than 3 s or more than 5 s before moving on to the next level of the hierarchy or if time
delay was implemented incorrectly. Third, a linguistic prompt was coded as correct if the
linguistic prompt was implemented correctly (i.e., saying a pure linguistic prompt [e.g.,
“What color is this?”] after time delay and before a model prompt). This was also coded
as correct if the child correctly responded after time delay and the linguistic prompt was
not necessary provided that the teacher appropriately stopped the prompting hierarchy
there. A linguistic prompt was coded as incorrect if the teacher skipped the linguistic
prompt and went right to a model prompt (e.g., “This is red.”). For example, if the teacher
went right to modeling (e.g., “What color? Purple”) then the linguistic aspect of this
prompt was considered part of the model prompt and the pure linguistic prompt was not
given.
Fourth, a model prompt was scored as correct if the teacher implemented the
model prompt correctly (i.e., saying the word he/she wanted the child to say and asking
the child to repeat it). A model prompt was also coded correct if the child correctly
responded after the linguistic prompt, making the model prompt unnecessary. A model
prompt was coded as incorrect if the model prompt was implemented incorrectly (e.g.,
the teacher modeled the wrong word; or issued the model prompt after the child had
already given the correct response; or the teacher failed to attempt to get the child to
repeat the word) or if the adult did not model the desired vocalization and moved on to
another task/activity. Fifth, praise was operationally defined as encouraging sayings (e.g.,
“Great job! Nice work! Awesome! Way to go!”) and providing feedback (e.g., “That’s
correct. You got it right.”). Non-vocal forms of praise (e.g., high fives, fist bumps) were
acceptable if they were accompanied by vocal praise. Praise was coded as correct if the
teacher provided praise when the child correctly completed the behavior or attempted to
correctly complete the behavior, or the teacher did not provide praise if the child did not
complete or attempt to complete the behavior. Praise was coded as incorrect if the teacher
did not provide praise for successful completion of the target response, or if he/she
praised unsuccessful completion (i.e., the child did not emit the vocal response).
Sixth, the discontinuation of the prompting hierarchy was coded as correct if the
teacher concluded the prompting sequence when and only when the child
performed/attempted the requested vocalization or the teacher reached the end of the
prompting hierarchy. This column was coded as incorrect if the teacher discontinued
prompting before the child completed/attempted the vocalization or if the teacher
persisted with prompting even after the child had successfully completed/attempted the
vocalization. Of note, all prompts including time delay were coded regardless of whether
the prompts issued by the teacher were issued according to the order of the prompting
hierarchy. Seventh, sequence of prompts was coded to determine whether the teacher
implemented the steps of the prompting hierarchy in the appropriate and accurate order.
Prompting sequence was scored as correct if the sequence in which the prompt levels
were performed was done correctly (e.g., starting with time delay, moving to linguistic,
moving to modeling [if necessary]) regardless of whether each individual step was done
correctly. This step was coded incorrect if the prompts were delivered in an incorrect
sequence (e.g., modeling prompt performed before the linguistic prompt) regardless of
whether each individual step was done correctly.
Measurement and data collection. Accuracy of use of the system of least prompts
was measured using an event recording system. Each use of the prompting hierarchy was
given a percentage accuracy score based on the following criteria, which were marked as
yes or no: (1) one prompt was given every 2 min (i.e., prompts should be separated by 2
min, plus or minus 15 s), (2) time delay was implemented correctly, (3) linguistic mand
prompts were implemented correctly, (4) linguistic mand-model prompts were
implemented correctly, (5) praise was provided when appropriate, (6) prompting was
discontinued at the appropriate step, and (7) the sequence of prompts was followed correctly
(see Figure 3.1). All instances of prompting that met the 80% mastery criterion
(i.e., at least 6 of 7 steps completed correctly) were scored as correct (for a similar procedure, see
Wright & Kaiser, 2017). These data were then used to determine the percentage of
prompts used correctly in a session, calculated by dividing the number of prompts used
correctly by the total number of prompts and multiplying by 100.
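The scoring rule above can be sketched in Python as follows. This is a hedged illustration rather than the scoring procedure actually used: the criterion labels, function names, and example data are hypothetical, and the sketch assumes that an instance of prompting meets the 80% criterion whenever at least 6 of the 7 steps are marked yes.

```python
from typing import Dict, Sequence

# Hypothetical labels for the seven yes/no criteria listed in the text.
CRITERIA = (
    "one_prompt_per_2_min",
    "time_delay",
    "linguistic_prompt",
    "model_prompt",
    "praise",
    "discontinuation",
    "sequence",
)
MASTERY = 0.80  # proportion of criteria required for a correct prompt


def spacing_ok(previous_start_s: float, start_s: float) -> bool:
    """Criterion 1: prompts are separated by 2 min, plus or minus 15 s."""
    return abs((start_s - previous_start_s) - 120) <= 15


def prompt_correct(scored: Dict[str, bool]) -> bool:
    """True if one instance of prompting meets the mastery criterion."""
    missing = set(CRITERIA) - set(scored)
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    proportion = sum(scored[c] for c in CRITERIA) / len(CRITERIA)
    return proportion >= MASTERY  # 6/7 meets the criterion, 5/7 does not


def percent_prompts_correct(prompts: Sequence[Dict[str, bool]]) -> float:
    """Percentage of prompts in a session that were scored as correct."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return 100 * sum(prompt_correct(p) for p in prompts) / len(prompts)


# Hypothetical session with four instances of prompting.
all_yes = {c: True for c in CRITERIA}
one_no = {**all_yes, "praise": False}
two_no = {**one_no, "time_delay": False}
session_percent = percent_prompts_correct([all_yes, one_no, two_no, two_no])
```

In this hypothetical session, two of the four prompts meet the criterion, giving 50% of prompts used correctly. Because 5 of 7 steps is about 71%, a prompt with two errors falls below the 80% threshold.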
Experimental Design
A single-case, concurrent multiple-baseline across behaviors design replicated
across teachers was implemented. This design allowed for the detection of a functional
relation between BST implementation and changes in teachers’ use of intervention
strategies in the classroom. Experimental control is demonstrated in a multiple baseline
across behaviors design when the data level or trend change upon introduction of the
intervention to the first tier while the data remain stable or unchanged in the remaining
tiers, and this change is repeated through the process of intra-participant replication (Gast
et al., 2018). Thus, with an increasing number of demonstrations of effect upon
introduction of the intervention, confidence that the intervention is responsible for the
change in data trend or level increases, experimental control is established, and a
functional relation between the intervention and change in the data can be inferred.
Procedures
Baseline Condition
A minimum of five, 20 min baseline sessions were conducted and video-recorded
for each teacher based on the recommendation in the What Works Clearinghouse (WWC)
Standards for Pilot Single-Case Designs, Version 4. During the baseline condition,
teachers were instructed to interact with children as they normally would and any
spontaneous use of programmed intervention techniques was recorded (for data collection
sheets, see Appendix D), but no instruction, training, or feedback was provided. The
teacher and child were observed during free play time. At the beginning of each session,
the teacher laid out all the toys and allowed the child to choose the toys with which to
play. Once baseline data were stable and without trend in the therapeutic direction across
tiers, and the minimum number of baseline data points had been collected for the first
behavior, the first intervention strategy was introduced. The introduction of intervention
was made based on teachers’ individual data; therefore, baseline length varied across
teachers.
Teacher Training
Teacher training occurred across two 1:1 sessions immediately prior to
introduction of intervention for each MT technique using BST. The BST procedure
included: 1) instruction, 2) modeling, 3) rehearsal, and 4) in-situ feedback. Upon
introduction of each intervention strategy, two training sessions were held - one session
that lasted approximately 45 min and one session that lasted approximately 20 min. Two
sessions were dedicated to each target intervention technique for a total of six training
sessions per teacher. Each individual teacher selected a training time that worked best for
his/her schedule and received training at the selected time. The two training sessions for
each technique occurred no more than two weekdays apart. For example, if the first
training session for FTCL occurred on a Tuesday, the second training was conducted by
Thursday of that same week.
During the initial training session for each intervention technique, the instructor
introduced and described the strategy, broke down and described the process for
implementing the technique, and answered any questions that teachers had. Next,
teachers were shown video models of the technique being implemented. During rehearsal and
feedback, the researcher and the teacher practiced the target intervention technique with
the researcher providing immediate feedback. Feedback included positive comments
regarding what the teacher was doing well in addition to comments designed to help
improve the accuracy of the teachers’ implementation of the techniques. During the
second training session for FTCL, TSR, and SLP, the teacher was asked to implement the
technique with the child in the classroom environment while the researcher provided in-
situ feedback. The researcher modeled the techniques for each teacher as necessary and
practice continued until each teacher felt confident that he/she could implement the
technique appropriately on his/her own. During the second training session, the
researcher observed the teachers engaging in each target technique with the child and
provided immediate, in-situ feedback on their performance.
Following the Child’s Lead. During training of FTCL, teachers were trained to
allow the child to lead and direct the interaction. First, they were trained how to imitate
the child’s actions and engage in parallel play, a process where they play alongside the
child, imitating the child’s actions but playing independently. For example, the teacher
and the child might run two different cars down two separate tracks to play with the cars
rather than taking turns with the same car or playing on the same track. Next, teachers
were trained how to comment on the child’s behavior as they played (e.g., “You have a
car. That’s a red block.”) rather than to ask questions or give commands (e.g., “What
color is that block? What do you have?”). Finally, teachers were trained how to respond
appropriately to child initiations. For example, if a child offered an object to the teacher,
he/she should accept the object; similarly, if the child made a request, the teacher should
attempt to fulfill the request (within reason).
Teaching Social Routines. During training for TSR, the researcher worked
closely with teachers to identify and develop several routines for each child participant
depending on the selected toys. Teachers establish routines with children in a variety of
ways, such as imitating a child’s play with the same or similar toys, imitating the child’s
actions, performing an action that is complementary to the child’s action to create a turn
for the teacher within the interaction, engaging the child by performing an action or
activity that he/she finds funny or interesting, or pairing actions with singing or counting
(Fey, 2008). The development of these routines is critical for future techniques which
require the teacher to interrupt the established routine in order to create opportunities for
communicative interaction (e.g., prompting). For Ms. Smith, trained routines included
taking turns riding a bike throughout the school, putting magnets up on a board, coloring
a picture, and building with blocks. For Mr. Parker, trained routines included having Josh
request a turn to bounce on a giant ball and take turns playing with monkey string (i.e., a
pliable, sticky string that adheres to surfaces).
System of Least Prompts. The prompting hierarchy for SLP consisted of the
following prompting techniques: (1) time delay, (2) linguistic prompts, and (3) linguistic
mand-model prompts (adapted from Fey, 2008). Teachers were trained how to prompt in
the following manner: beginning with time delay, the teacher removed a toy of interest or
interrupted the routine, gave an expectant look, and then waited 3 s before delivering any
kind of instruction, giving the child (portrayed by the researcher) the opportunity to
communicate independently. Once the 3 s had elapsed, and the “child” had not provided
the correct response, the teacher then moved on to the next step in the prompting
hierarchy, linguistic prompts. These prompts are vocally issued prompts that encourage a
child to communicate with adults. For example, if after 3 s the “child” did not respond to
the initial disruption in routine, the teacher would say, “What do you want?” and pause
for another 3 s. If the “child” still did not respond to the linguistic prompt, the teacher
then issued a linguistic mand-model prompt, saying, “X. You want X. Say X,” where X is the
name of the toy. In general, linguistic mand-model prompts are used to tell a child what is
being asked or what is expected (e.g., asking the child specifically what he/she wants).
Teachers were trained to prompt once every 2 min following the aforementioned
hierarchy.
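
The hierarchy described above can be read as a short decision sequence. The following is a minimal sketch for illustration only, not part of the study materials: the names `PromptStep`, `build_hierarchy`, and `run_prompt_sequence` are hypothetical, the 3 s wait is noted but not simulated, and the text does not specify what followed an unanswered mand-model prompt.

```python
from dataclasses import dataclass
from typing import Callable, List

WAIT_SECONDS = 3  # wait reported in the text for the time delay and linguistic prompt steps


@dataclass
class PromptStep:
    name: str
    teacher_action: str


def build_hierarchy(target_word: str) -> List[PromptStep]:
    """Return the three prompting steps in least-to-most order (hypothetical helper)."""
    return [
        PromptStep("time delay", "interrupt routine and give an expectant look"),
        PromptStep("linguistic prompt", "What do you want?"),
        PromptStep(
            "linguistic mand-model",
            f"{target_word}. You want {target_word}. Say {target_word}.",
        ),
    ]


def run_prompt_sequence(
    target_word: str, child_responds: Callable[[PromptStep], bool]
) -> List[str]:
    """Return the names of the steps delivered, stopping at the first correct response."""
    delivered = []
    for step in build_hierarchy(target_word):
        delivered.append(step.name)
        # In practice the teacher waits WAIT_SECONDS here before judging the response.
        if child_responds(step):
            break
    return delivered
```

For example, a "child" who responds only to the linguistic prompt would receive the time delay and the linguistic prompt, and the mand-model prompt would not be delivered.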
As part of the initial training session for the SLP intervention technique, the
researchers and the teachers selected specific verbal goals for the individual child
participants. Because the children in this study were minimally verbal, single word
linguistic social-communication goals were identified for each child. Goal selection was
based on each child’s current level of social communication and communication
objectives in their GPPs. For example, if a teacher and child were engaged in a routine in
which they were rolling a ball back and forth but the teacher did not return the ball upon
the child’s turn, the primary prompting goal may be for the child to say, “ball,”
requesting the desired item. It would also be appropriate in this scenario for the child to
say, “turn,” requesting his/her turn with the ball. To establish linguistic targets based on
the recommendations of Fey (2008), the interventionist sat with each teacher to examine
the routines that each child developed during that stage of the intervention and identified
several target words that the children could emit in order to request continuation of the
routines. For Sarah, the primary routine was riding on the bike and prompted words
included, “back”, “go”, “bike”, “turn”, and “push.” For Josh, the primary routines were
rolling on the beach ball and taking turns with the monkey string and prompted words
included, “push”, “ball”, “bounce”, and “monkey string.”
Intervention Condition
Procedures for teacher-child sessions during the intervention condition were
identical to those used in baseline (1:1 pull out sessions, 20 min in length) with the
exception that teachers used the trained techniques rather than playing as they normally
would. No training or help was provided by the researcher during intervention sessions.
Data on teacher use of MT strategies were collected via video-recordings of intervention
sessions. A minimum of five data points was required for the intervention condition
across MT techniques and teachers, and intervention continued until the teachers reached
a mastery criterion of three consecutive intervention sessions at 80% or higher.
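
The decision rule above can be expressed as a simple check. This is a sketch under stated assumptions rather than the procedure used by the researcher: the function name `met_mastery` is hypothetical, and it assumes the criterion is evaluated on the three most recent intervention sessions once at least five sessions have been collected.

```python
from typing import Sequence


def met_mastery(
    scores: Sequence[float],
    criterion: float = 80.0,   # percentage threshold stated in the text
    run_length: int = 3,       # consecutive sessions required
    min_sessions: int = 5,     # minimum data points per intervention condition
) -> bool:
    """Return True when the most recent sessions satisfy the mastery rule."""
    if len(scores) < min_sessions:
        return False
    recent = scores[-run_length:]
    return len(recent) == run_length and all(s >= criterion for s in recent)
```

With hypothetical session scores of 90, 70, 80, 85, and 92, the last three sessions are all at or above 80%, so the check returns True; with only three sessions on record it returns False regardless of the scores.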
Given the importance of practice and continuing feedback in ensuring
maintenance of intervention skills (Ward-Horner & Sturmey, 2012), once the intervention
began, a 1:1 check-in with teachers was conducted after each data collection session in
order to ensure the continuation of intervention knowledge and accuracy. During the
check-in, the teacher and the researcher reviewed what went well during the session and
noted areas for improvement. The researcher also answered any questions the teacher had
and provided any additional practice/training as necessary. The researcher reviewed the
teachers’ graphed data with them as necessary (e.g., during a period of continued
performance decline or upon reaching a described goal) and discussed and explained their
performance based on visual analysis. Booster training sessions were also utilized during
periods of performance decline or after long breaks in data collection. During these
sessions, the researcher and teacher would practice the intervention techniques with the
child while the researcher provided immediate feedback and modeling as necessary.
Generalization and Maintenance
Generalization across materials was assessed once during each intervention phase
and during baseline for the TSR and SLP techniques. Teachers were given a novel set of
toys and were asked to use the trained intervention techniques with the novel toys. The
same set of novel toys was used during all three sessions for each teacher/child pair.
Generalization sessions were identical to intervention sessions. Data were collected using
the same procedures as during the baseline and intervention conditions, with a different
set of toys, and took place in the same environments. After completion of the
generalization probes, teachers were provided with feedback regarding their performance.
Maintenance of intervention techniques was assessed at two- and four-week follow-up
observations. Data were recorded on teachers’ use of all trained intervention techniques.
All maintenance sessions were identical to intervention sessions. The same data
collection systems used during previous data collection sessions were used during
maintenance sessions and all trained techniques were evaluated.
The study lasted for six months. In the beginning of the study, data were collected
once a day, two to three times per week. However, as the study progressed, data were
collected less frequently. There was a two-week break in data collection between sessions
14 and 15 for both teachers due to the Thanksgiving holiday and teacher availability. In
addition, after Session 23 for Ms. Smith and Session 24 for Mr. Parker, there was a five-
week break in data collection due to the holidays and teacher and researcher travel. Upon
returning from the five-week break, Mr. Parker requested a booster session, which was
performed between Sessions 24 and 25. Thus, a booster session for FTCL and primarily
TSR was conducted during which the researcher provided in-situ feedback to Mr. Parker
as he interacted with Josh. Ms. Smith was also offered a booster session, but she
declined. There was one additional two-week break between Sessions 30 and 31 for Ms.
Smith and between Sessions 28 and 29 for Mr. Parker. On four occasions for both
teachers, two sessions were conducted in one day with 30 min to 1 hr in between
sessions. For Ms. Smith, the following session pairs were conducted on the same day: 26
and 27, 29 and 30, 31 and 32, and 33 and 34. For Mr. Parker, the following session pairs
were conducted on the same day: 25 and 26, 27 and 28, 30 and 31, and 32 and 33.
Procedural Fidelity
Fidelity checklists of BST were developed for each of the intervention training
techniques, detailing specific criteria that must be covered within the training sessions
(see Appendix C). These checklists were used to ensure that all teacher participants
received the same information and training. Average procedural fidelity for Ms. Smith
was 77% with a range of 60-96%. Similarly, average procedural fidelity for Mr. Parker
was 78% with a range of 65-96%.
Interobserver Agreement
In accordance with WWC (Version 4.1) standards, 20% of randomly selected
baseline, intervention, generalization, and maintenance sessions across MT techniques
and teacher participants were coded by a second observer to determine interobserver
agreement (IOA). All raters were required to demonstrate 80% accuracy or higher for 3
different training video sessions on all data collection instruments before coding video
recordings for IOA. Interval-by-interval interobserver agreement was used to determine
IOA for all data collected (i.e., FTCL, TSR, and SLP). This was calculated by dividing
the number of intervals for which the two observers agreed by the total number of
intervals for the session and multiplying by 100. The primary coder was blind to sessions
selected for IOA. If IOA was below 80% on a session coded for IOA, a discrepancy
discussion was held to re-calibrate, but the original agreement percentage was maintained
for calculation of mean IOA. For Ms. Smith, observers had an average agreement across
baseline and intervention conditions of 80% (range: 56-93%) for FTCL, 92% (83-100%)
for TSR, and 97% (81-100%) for SLP. For Mr. Parker, observers had an average
agreement across baseline and intervention conditions of 83% (range: 63-93%) for FTCL,
96% (86-100%) for TSR, and 96% (80-100%) for SLP. See Table 3.3 for full reporting of
IOA, broken down by tier and condition for each teacher.
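
The interval-by-interval calculation described above can be sketched as follows. The function name `interval_ioa` and the example codes are hypothetical; the sketch assumes each observer's record is a list with one code per interval.

```python
from typing import Sequence


def interval_ioa(primary: Sequence[bool], secondary: Sequence[bool]) -> float:
    """Interval-by-interval IOA: agreed intervals / total intervals * 100."""
    if len(primary) != len(secondary):
        raise ValueError("Both observers must code the same number of intervals.")
    if not primary:
        raise ValueError("At least one interval is required.")
    agreements = sum(a == b for a, b in zip(primary, secondary))
    return 100.0 * agreements / len(primary)


# Hypothetical five-interval session: the observers disagree on one interval.
primary_codes = [True, True, False, True, False]
secondary_codes = [True, False, False, True, False]
session_ioa = interval_ioa(primary_codes, secondary_codes)
```

In this made-up session the observers agree on four of five intervals, which gives an IOA of 80%, exactly at the threshold below which a discrepancy discussion was held.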
Visual Data Analysis
The dependent variables were graphed for each teacher. Both teachers had a
minimum of five data points per baseline condition. When training occurred on the first
MT strategy, baseline data continued to be collected for the remaining intervention tiers
(behaviors) and were analyzed for stability and absence of trends in the therapeutic
direction. With the introduction of training for each new intervention strategy, the
transition from baseline to intervention provided an opportunity for the replication of
effect, allowing for the detection of a functional relation (Gast et al., 2018). This process
was in accordance with standards issued by the What Works Clearinghouse (WWC) for
Pilot Single-Case Designs, Version 4.1, stating the need for a minimum of three
demonstrations of intervention effect at three different points in time. In addition, the use
of a multiple baseline design with a minimum of six intervention phases and a minimum
of five data points meets the standards put forth by WWC for Pilot Single-Case Designs
Without Reservations. Visual analysis of data trend, level, and variability was conducted
for each teacher to determine whether BST led to increases in implementation of MT
techniques.
Results
Milieu Teaching Strategies
Ms. Smith
Following the child’s lead. During baseline, Ms. Smith averaged 31% of
intervals demonstrating FTCL (range: 16-43%; see Figure 3.2). Baseline data were
somewhat variable and were trending in the non-therapeutic direction when intervention
began. Once intervention was introduced, there was an immediate increase in level of
percentage of intervals demonstrating FTCL, with a mean percentage during intervention
of 74% (55-90%). Of note, Ms. Smith’s performance during intervention was somewhat
variable; she began with a high percentage of intervals with FTCL, and then her
performance began to decline. A new toy (bike) was inadvertently introduced into the
interaction during Session 11 of the FTCL intervention condition. It is worth noting that
Ms. Smith’s performance increased initially to above the 80% criterion when this
happened; however, her performance went back to below the mastery criterion after this
session. This soon triggered the need for a booster session where the researcher met with
the teacher to review the intervention procedures and allowed her to practice these
procedures once again. After the booster session, Ms. Smith’s data began to increase in
the therapeutic direction, and she was able to achieve the mastery criterion of three
consecutive sessions at 80% of intervals or higher. Despite this variability in her data,
there was no overlap between data points in the baseline and intervention conditions. The
change in level of the data suggests that the intervention was responsible for the change
in the teacher’s percentage of intervals demonstrating FTCL.
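
The statement that no data points overlapped between conditions can be checked mechanically. The sketch below is one simple way to do so and is not the visual analysis procedure used in the study: the function name `overlapping_points` is hypothetical, it assumes the therapeutic direction is an increase, and the values are illustrative numbers chosen only to fall within the reported ranges, not the session-level data.

```python
from typing import List, Sequence


def overlapping_points(
    baseline: Sequence[float], intervention: Sequence[float]
) -> List[float]:
    """Return intervention points that do not exceed the highest baseline point."""
    ceiling = max(baseline)
    return [x for x in intervention if x <= ceiling]


# Illustrative values within the reported FTCL ranges (16-43% and 55-90%).
ftcl_overlap = overlapping_points([16, 31, 43], [55, 74, 90])

# Illustrative values within the reported SLP ranges (0-10% and 0-100%).
slp_overlap = overlapping_points([0, 10], [0, 80, 100])
```

With these illustrative values, the FTCL check returns an empty list, whereas the SLP check returns the single 0% session, mirroring the one overlapping data point described for SLP.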
Upon introduction of BST to TSR, the teacher’s performance of FTCL
demonstrated continued variability, with a mean accuracy score of 76% (60-90%) and
several dips below the 80% criterion. However, this is not unexpected based on
the nature of TSR. Once TSR is introduced as a technique, the teacher is instructed to
structure the interaction so that a routine is being used regularly. This can make it
difficult to maintain FTCL with levels commensurate with initial intervention
performance.
Teaching social routines. During baseline, Ms. Smith’s percentage of intervals
demonstrating TSR was largely consistent, with a mean score of 3% and a range of 0-8%.
Baseline data were low in level and were not trending in any direction after the first
few data points, which were initially trending in the therapeutic direction. Data were
largely stable around 0%. When intervention was introduced, percentage of intervals
demonstrating TSR showed an immediate increase in level from 0% to 88% and
remained above the 80% criterion for the remainder of the intervention condition. Her
mean percentage of intervals demonstrating TSR was 89% with a range of 86-94%. This
change in level suggests that the intervention was responsible for the teacher’s
improvement in percentage of implementation during sessions. Upon introduction of BST
to SLP, Ms. Smith’s performance became more variable with a mean percentage of
intervals demonstrating TSR of 71% with a range of 35-88%.
System of least prompts. During baseline, Ms. Smith’s performance was largely
consistent with an average percentage of prompts used correctly of 1% (range: 0-10%).
Baseline data were quite flat with a slight uptick in the therapeutic direction, which then
flattened once again. Data level remained low throughout baseline. Once intervention
was introduced, Ms. Smith’s performance actually decreased from 10% to 0% prompts
used correctly during the first session. This occurred because Ms. Smith forgot to
implement one component of the SLP process, resulting in all of her prompts being
scored as incorrect for this session. After meeting with the researcher to discuss this issue
and to re-practice prompts, her performance improved to 80% of prompts used correctly
during the session. Her average percentage of prompts used correctly during intervention
was 72% with a range of 0-100%. With the exception of one data point (the session with
0% accuracy), there was no overlap in the data between baseline and intervention
conditions, suggesting that the intervention was responsible for Ms. Smith’s increase in
accuracy of implementation.
Generalization. During the first generalization probe for FTCL during the
intervention condition, Ms. Smith demonstrated 85% of intervals coded for FTCL
suggesting her skills had generalized to a new set of materials. This score decreased to
50%, and then increased to 84%. However, because there were no generalization sessions
conducted during baseline, evidence that BST contributed to generalization of FTCL
across materials is limited. Regarding TSR, baseline generalization performance was
10%. This increased to 79%, just below the 80% criterion, during the TSR intervention
condition, but decreased to 15% after introduction of BST to SLP. Thus, Ms. Smith was
able to generalize her TSR skills initially, but this generalization did not maintain.
Finally, during the two SLP baseline generalization probes, she performed at 0%. In
contrast, during the SLP intervention generalization probe, her performance improved to
80%, indicating that she was successfully able to generalize the SLP skills to a new set of
materials.
Maintenance. Regarding FTCL, Ms. Smith’s performance remained above the
80% criterion with values of 88% and 94%, at the two- and four-week follow ups,
respectively. Similarly, she maintained her scores for TSR at the two- and four-week
follow up sessions, with values of 84% and 80%, respectively. Finally, her SLP skills
maintained at the two-week follow up with a score of 90% but did not maintain at the
four-week follow up, with a score of 60%, falling below the 80% criterion.
Mr. Parker
Following the child’s lead. Mr. Parker demonstrated an average percentage of
intervals coded for FTCL of 30% (range: 16-40%) during baseline (see Figure 3.2).
Baseline data were variable but remained somewhat low in level. However, there was a
trend in the therapeutic direction toward the end of baseline. Once intervention was
introduced, there was an immediate increase in level, with a mean performance during
intervention of 77% (54-93%). Of note, Mr. Parker’s performance during intervention
was highly consistent in the beginning; however, as time went on, his performance
became more variable. A new toy (beach ball) was inadvertently introduced into the
interaction during Session 7 of the FTCL baseline condition. The ball was absent during
Session 8 because it was unavailable, but it was again present during Session 9
when FTCL intervention began. During Session 15, the ball was again unavailable during
data collection, which may have resulted in Mr. Parker’s low accuracy of implementation
of 54%. This triggered the need for a booster session where the researcher met with the
teacher to review the intervention procedures and to allow him to practice these
procedures once again. Technically, Mr. Parker reached the mastery criterion of three
consecutive sessions at 80% or higher during Session 13; however, intervention was
continued due to the variability and therapeutic trend that were present in his TSR
baseline data. Thus, intervention was continued in the hopes that his TSR baseline data
would stabilize. After the booster session, Mr. Parker’s FTCL data demonstrated
increases in the therapeutic direction, and he achieved the mastery criterion of three
consecutive sessions at 80% accuracy or higher. Despite some variability in his data,
there was no overlap between data points in the baseline and intervention conditions. The
change in level of the data suggests that the intervention was responsible for the change
in the teacher’s percentage of implementation of FTCL. Upon introduction of BST to
TSR, the teacher’s performance dropped, with a mean percentage of intervals scored for
FTCL of 30% (8-63%). His performance remained below the 80% criterion level once
TSR was introduced. As previously mentioned, this drop in FTCL was expected based on
the structured nature of teaching social routines and the teacher’s increased involvement
in directing the interaction.
Teaching social routines. Mr. Parker’s performance during baseline was highly
variable, with an average percentage of intervals coded for TSR of 17%, ranging from
0% to 50%. The data began at a low, stable level around 0%; however, they began to
increase in the therapeutic direction, reaching a peak at about 50%. The data then began
trending in the non-therapeutic direction, reaching a level that was similar to the
beginning of baseline. However, they then began trending in the therapeutic direction again
before intervention began. When intervention was introduced, his percentage of intervals
coded for TSR increased from 36% to 91% and remained above the 80% criterion level
for all but one session. His average percentage of intervals coded for TSR during
intervention was 90% with a range of 83-99%. Of note, there was a large gap in data
collection between Sessions 24 and 25; this triggered the need for a booster session where
the researcher met with the teacher to review the intervention procedures and to allow
him to practice these procedures once again. This skill also remained above the
80% criterion when SLP was introduced, with an average accuracy score of 94% and a
range of 90-99%. This change in data level suggests that the intervention was responsible
for Mr. Parker’s increase in accuracy of implementation.
System of least prompts. Mr. Parker demonstrated an average percentage of
prompts used correctly for SLP of 0% during baseline. Baseline data were flat and stable
and remained at the 0% level throughout. When intervention was introduced, his score
increased from 0% to 80%. With the exception of one data point, he remained above the
80% criterion throughout intervention, with an average percentage of prompts used
correctly for SLP of 80% (range: 50-100%). However, because there was a
break in between the final baseline session and SLP training, these results must be
interpreted with caution. Although a generalization baseline point was conducted prior to
BST for SLP, inference of a functional relation between BST and MT implementation for
Mr. Parker is limited.
Generalization. During the generalization probe of FTCL during the intervention
condition, Mr. Parker demonstrated 81% of intervals coded for FTCL, suggesting that his
skills had generalized to a new set of materials. However, this decreased to 25%, and then
increased to 63%. Although his scores were somewhat variable, his FTCL skills
generalized to a new set of materials. However, without baseline generalization sessions,
interpretations about whether BST contributed to generalization of FTCL are limited.
Regarding TSR, his baseline generalization performance was 0%. This increased to 99%
during the intervention condition, and further increased to 100%. Thus, he was able to
generalize his TSR skills to a new set of materials. Finally, during the two SLP baseline
generalization probes, Mr. Parker performed at 10% and 0%. In contrast, during the SLP
intervention generalization probe, his performance improved to 100%, indicating that he
was able to generalize the SLP technique to a new set of materials.
Maintenance. Regarding FTCL, Mr. Parker’s performance did not remain above
the 80% criterion, with values of 46% and 65%, at the two- and four-week follow up
probes, respectively. In contrast, he maintained his scores for TSR at the two- and four-
week follow ups, with values of 100% at both maintenance probes. Finally, his SLP skills
also maintained at the two- and four-week follow-up maintenance probes, with scores of
100% at both probes.
Descriptive Analysis of Formal Measures
Communication. Although child outcomes were not targeted in this study and
therefore not measured directly, parents completed a CDI for each child both before and
after the study. Prior to the study, Sarah’s parents reported that she understood 28/28
phrases and 307/396 words on the checklist. She reportedly could not produce a single
word on the list. Following the intervention, parents reported that she could understand
27/28 phrases and 370/396 words, a decrease of one phrase and an increase of 63 words
understood. They also reported that she could now produce a total of 16/396 words, an
increase of 16 words from the beginning of intervention. Josh also showed gains on this
measure from pre to post intervention. Before intervention parents reported that he could
understand 18/28 phrases and 380/396 words on the checklist. He could produce a total
of 166/396 words. Following intervention, he reportedly understood 22/28 phrases and
392/396 words, an increase of 4 phrases and 12 words. Josh could also reportedly
produce 201/396 words, an increase of 37 words. Though interesting, these gains are
merely descriptive, as a causal relationship cannot be concluded based on the study
design.
Social Validity. At the conclusion of the study, teachers were asked to complete
an acceptability rating scale to help determine whether they found the intervention
acceptable, helpful, and practical (adapted from Hendrickson et al., 1993). The
questionnaire was broken down into four main parts: Research, Intervention Effects,
Social Validity, and Training. Ms. Smith and Mr. Parker agreed that research is important
in schools, can improve staff teaching, and is important for better teaching all children.
Teachers also agreed that the intervention was helpful for them as teachers as well as for
the students with whom they worked. In terms of the social validity section, both teachers
reported that their knowledge, skills, and confidence in implementing techniques
improved; they believed they could incorporate techniques into daily classroom routines;
and that the intervention techniques were feasible to implement in the classroom.
However, Mr. Parker disagreed that the intervention techniques could be easily
incorporated into the classroom whereas Ms. Smith agreed. Both teachers strongly agreed
that they would participate in a similar project in the future.
Finally, regarding training session two, which featured in-situ feedback with the
child participants, both teachers agreed that they were comfortable during their training
sessions, the sessions were tailored to their experience levels, they felt comfortable
implementing techniques after their training sessions, felt comfortable asking questions,
and would recommend the sessions to their colleagues. In addition to the questionnaire,
Mr. Parker informed the researcher that he was grateful for having learned the techniques
and began implementing them with a new student with social communication difficulties
with whom he was working.
Discussion
The results of the present study indicate that BST can be effectively used to train
teachers to implement several core techniques of MT: FTCL, TSR, and SLP. Overall,
for Ms. Smith, the data showed one demonstration of effect and two replications of
effect, indicating evidence of a functional relation between BST and MT implementation
fidelity. For Mr. Parker, the data showed one demonstration of effect and one replication
of effect, with limited evidence of a second replication. Ms. Smith maintained all skills at
the two- and four-week follow up probes, with the exception of SLP at the four-week
follow up. Mr. Parker maintained TSR and SLP at the two- and four-week follow up
probes but did not maintain FTCL. Ms. Smith initially generalized outcomes to a new set
of toys during intervention generalization sessions but struggled to maintain
generalization above the 80% criterion during additional generalization probes. Mr.
Parker struggled to maintain generalization criterion for FTCL but maintained
performance at the 80% criterion for TSR and SLP once these intervention techniques
were introduced. Similar to teacher gains in levels of implementation of the MT
techniques, post intervention gains were also observed in children’s word production and
comprehension according to parent-report.
These results are most similar to those reported by Kaiser et al. (1993) who found
that through implementing training techniques similar to those used in BST, they were
able to successfully teach educators environmental arrangement and MT techniques to be
implemented in the classroom. This study also expands upon similar studies which
trained parents to implement some form of MT techniques, using training techniques
similar to BST (Kaiser et al., 1995). In addition, similar to Aktas and Ciftcitekinarslan
(2018), this study demonstrated the importance of intensive training in order for teachers
to correctly implement the MT techniques. BST provided a solid framework from which
to draw information and planning for teacher training sessions.
Implications
This study has several implications for both practice and research. Data show
that teachers can be successfully taught to implement MT techniques when BST is used
as a training package. Teachers spend a great deal of time with children and if we can
train them to implement early social communication techniques designed to improve
language ability, then children have a chance to benefit from a much greater dosage of
intervention. Rather than seeing a therapist once a week for an hour, children could have
daily access to evidence-based interventions for multiple hours a day through trained
school staff. This would allow for an increase in dosage, and in theory, an increase in
language gains for each child exposed to the intervention in the classroom. Such practices
have the potential to make a big impact in the lives of children with social
communication deficits.
Based on the findings of the current study, future research should focus on
incorporating MT intervention techniques into regular classroom routines. The current
study allowed for an increase in dosage in intervention techniques, utilizing the teachers
as the interventionists. However, dosage could be increased even further and intervention
could reach more children if these techniques were integrated into regular classroom
routines and incorporated as part of normal classroom instruction. Similarly, given that
the present study has demonstrated that BST can be used to effectively train teachers to
implement MT in the classroom, other naturalistic interventions should be explored. BST
may be used to teach a variety of naturalistic interventions not only to teachers but also to
parents and other important figures in children’s lives. Finally, future research could
expand the results of the current study by directly measuring child outcomes to determine
whether the intervention examined has a direct effect on children’s language and other
targeted behaviors.
Limitations
Despite the positive outcomes of this study, there were several significant
limitations worth noting. First, regarding generalization, a generalization probe was not
conducted during FTCL baseline for Ms. Smith or Mr. Parker, making it impossible to
compare generalization after the introduction of the technique to baseline performance.
Thus, we cannot say for sure that generalization occurred for FTCL as performance may
have been equally high during baseline as during intervention. The remaining two tiers
had generalization probes conducted during baseline and during intervention so that this
comparison could be made. However, teacher performance was variable, with
generalization maintaining for some skills, but not for others.
Second, for Ms. Smith, a bike was introduced during Session 11 during FTCL
intervention. Ms. Smith’s data spiked during this session to above the 80% criterion line.
However, her data then began to fall below the 80% criterion and decreased for several
sessions, making it unlikely that the bike had a large impact on the data at that time. Yet
it is possible that the bike, introduced around the same time as intervention, rather than
the intervention itself, affected the intervention data, making it more
difficult to infer a causal relation between BST and improvement in FTCL data for Ms.
Smith. Similarly, a ball was introduced during Session 7 for Mr. Parker, was absent for
Session 8, and then present again for Session 9 when FTCL intervention began (it was
then present for the remainder of the sessions with the exception of one session).
However, given Mr. Parker’s performance during baseline and the continuing upward
trend from Session 7 to Session 8 where the ball was present then absent, it is unlikely
that this had a major impact on the data. Yet it is still possible given that when the ball
was missing during Session 15, Mr. Parker’s performance dropped dramatically and
began to climb once the ball was re-introduced. Therefore, similar to Ms. Smith, the
introduction of a new toy for Mr. Parker right around the introduction of FTCL
intervention calls into question whether the toy or the intervention was responsible for the
change in data. This again makes it more difficult to infer a causal effect of BST on
improvement in FTCL implementation.
Similarly, an interval timer that the teachers could keep on their person and that
vibrated every 2 min was introduced partway through SLP intervention to make it easier
and less cumbersome for teachers to determine when the 2 min interval had expired. It
was introduced during Session 31 for Ms. Smith and Session 32 for Mr. Parker during
SLP intervention. Upon introduction, Ms. Smith’s performance continued to decline until
Session 33. In contrast, Mr. Parker’s data increased to above the 80% criterion once the
timer was introduced and he immediately met criteria. Therefore, it is possible that the
interval timer had an impact on teacher performance in the SLP intervention condition.
Third, mean fidelity for BST of MT techniques was somewhat low with a wide
range for Ms. Smith’s and Mr. Parker’s training sessions. This was likely due to a number
of reasons. In order to remove redundancies from the training, several changes were
made to the coaching process during individual sessions, resulting in lower procedural
fidelity. Specifically, the second training session was made more informal to increase
teacher comfort and remove redundancies as coaching sessions always immediately
followed initial training sessions (both occurred on the same day). Rather than follow the
procedural fidelity sheet strictly, the researcher allowed the teachers to practice the skills
using a more open and informal style. The researcher provided training as the teacher
practiced the skills with the child and provided modeling when necessary as well as
immediate feedback that focused on both positive performance and areas that needed
improvement. This allowed the coaching session to flow more comfortably and naturally
and adhered more to naturalistic teaching strategies rather than following the
predetermined criteria strictly. Ultimately, this resulted in lower procedural fidelity, but all
principles and pieces of behavioral skills training (instruction, modeling, rehearsal, and
in-situ feedback) were still utilized and followed accordingly during both training
sessions.
Fourth, frequency for the data collection sessions was not consistent. In the
beginning of the study, data were collected two to three times per week. However, as the
study progressed, toward the end of TSR and throughout SLP data collection, data were
often collected once a week but no less than every other week. At times, two sessions
were conducted in one day. This variation was due to conflicts with teacher, child, and
researcher availability and school holidays. Adjustments were made as necessary in order
to accommodate everyone’s schedules.
Finally, each baseline tier ended with the collection of generalization data, with
the exception of FTCL. Therefore, there was no additional baseline session prior to
implementing intervention for each of the remaining tiers (TSR and SLP). This prevented
the researcher from determining whether baseline data continued in the same direction,
level, and trend after a brief interruption (generalization session). This is problematic
because, without knowing where baseline data are immediately prior to implementing
intervention, it is more difficult to determine whether there were changes in the
data when intervention was introduced. This in turn makes it more difficult to infer a
causal relation; an additional baseline data session following generalization
would have increased confidence in the presence of a functional relation. Although this is
a limitation, data were collected close enough together that it is not likely a drastic threat
to internal validity, with one exception. There was a two-week break in between
generalization and SLP training for Mr. Parker and no additional baseline probes were
obtained before SLP training. Given the addition of extra time in between the last
baseline session and the first intervention session for SLP, these results must be
interpreted with caution.
Conclusion
Despite several limitations, this study demonstrated a causal relation between
BST and teacher-implemented MT for one teacher and a possible causal relation for a
second teacher, indicating that BST can be effectively used to train teachers to implement
the following MT techniques: FTCL, TSR, and SLP. Through the processes of
instruction, modeling, rehearsal, and feedback, we demonstrated that teachers can
effectively learn MT techniques to be implemented in the school setting and that these
skills largely generalized to new sets of toys and maintained across time. This study also
showed that a teacher with no experience in training or instruction could learn the
techniques just as well as a teacher with years of experience teaching and some
experience with training. This is important to note because such a finding suggests the
intervention techniques targeted in this study are accessible to teachers of all experience
levels, which could in turn make the intervention more widely available to larger groups
of students. The current study also expanded the literature base by demonstrating
that BST can be effectively used to train natural implementers to use NDBI components.
Thus, BST as an intervention package has been extended into yet
another area of interventions, further increasing its utility and reach.
Table 3.1
Teacher Participant Demographics
Teacher                     Ms. Smith                            Mr. Parker
Age                         47                                   20
Race/Ethnicity              White                                White
Gender                      Female                               Male
Current Grade Teaching      3-4                                  3-4
Years of Experience         15                                   0
Highest Degree Earned       Associate’s Degree                   High School Diploma
Area(s) of Certification    Paraprofessional, Certified Tutor    None
Table 3.2
Child Participant Demographics
                                             Pre-Intervention       Post-Intervention
                                             Sarah      Josh        Sarah      Josh
Age                                          8          8
Race/Ethnicity                               White      Black
Gender                                       Female     Male
SB-5 AB Full Scale IQ (Percentile)           <0.1       <0.1        N/A        N/A
SB-5 AB Verbal IQ (Percentile)               <0.1       <0.1        N/A        N/A
SB-5 AB Nonverbal IQ (Percentile)            <0.1       <0.1        N/A        N/A
CDI: # of Phrases Understood (Raw Score)     28/28      18/28       27/28      22/28
CDI: # of Words Understood (Raw Score)       307/396    380/396     370/396    392/396
CDI: # of Words Produced (Raw Score)         0/396      166/396     16/396     201/396
CDI: # of Early Gestures (Raw Score)         16/18      6/18        17/18      6/18
CDI: # of Later Gestures (Raw Score)         43/45      27/45       42/45      27/45
CDI: Total # of Gestures (Raw Score)         59/63      33/63       59/63      33/63
Note. GPP = Growth & Performance Plan; SB-5 AB = Stanford-Binet Intelligence
Scales, Fifth Edition, Abbreviated Battery; CDI = MacArthur Bates Communicative
Development Inventory
Table 3.3
IOA Agreement by Tier and Condition
                                     FTCL            TSR              SLP
T1 Baseline Average (Range)          67% (56-78)     93% (83-100)     100% (97-100)
T1 Intervention Average (Range)      85% (79-93)     92% (88-95)      89% (81-96)
T1 Generalization Average (Range)    81% (66-90)     78% (64-86)      99% (97-100)
T2 Baseline Average (Range)          85% (85-85)     100% (100-100)   100% (97-100)
T2 Intervention Average (Range)      82% (63-93)     89% (86-91)      86% (80-91)
T2 Generalization Average (Range)    81% (73-86)     100% (99-100)    96% (91-100)
Note. T1 = Ms. Smith; T2 = Mr. Parker; FTCL = Following the Child’s Lead; TSR =
Teaching Social Routines; SLP = System of Least Prompts.
Figure 3.1
Accurate Use of the Prompting Hierarchy
[Flowchart]
Response Requested (Time Delay)
    Correct Response: Praise!
    No/Incorrect Response: Vocal Mand Prompt: "What do you want?"
        Correct Response: Praise!
        No/Incorrect Response: Vocal Mand Model Prompt: "X. You want X. Say X."
            Attempt Response: Praise!
            No/Incorrect Response: Move on (praise any attempt to say the target word)
Repeat
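The prompting hierarchy in Figure 3.1 can be read as a short decision procedure. The following is a minimal sketch in Python, offered only as an illustration and not as the procedure used in the study. The names `Response`, `PROMPTS`, `run_trial`, and `scripted` are hypothetical. The sketch assumes that praise follows a correct response at any level, that an attempt is praised at the final (model) level, and that any other response moves the teacher to the next prompt.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Response(Enum):
    """Hypothetical coding of the child's response to a prompt."""
    CORRECT = "correct"
    ATTEMPT = "attempt"
    NONE_OR_INCORRECT = "none_or_incorrect"


# Prompt levels in least-to-most order, as read from Figure 3.1.
PROMPTS = [
    "time delay (response requested)",
    'vocal mand prompt: "What do you want?"',
    'vocal mand model prompt: "X. You want X. Say X."',
]


def run_trial(observe: Callable[[str], Response]) -> List[str]:
    """Walk the hierarchy once and return a log of teacher actions."""
    log: List[str] = []
    for level, prompt in enumerate(PROMPTS):
        log.append(prompt)
        response = observe(prompt)
        is_last = level == len(PROMPTS) - 1
        # Assumption: an attempt earns praise only at the model-prompt level.
        if response is Response.CORRECT or (is_last and response is Response.ATTEMPT):
            log.append("praise")
            return log
    # No correct response or attempt after the most intrusive prompt.
    log.append("move on (praise any attempt)")
    return log


def scripted(responses: Sequence[Response]) -> Callable[[str], Response]:
    """Build an observer that replays a fixed list of child responses."""
    it = iter(responses)
    return lambda prompt: next(it)


if __name__ == "__main__":
    child = scripted([Response.NONE_OR_INCORRECT, Response.CORRECT])
    for action in run_trial(child):
        print(action)
```

In this sketch each call to `run_trial` represents one pass through the hierarchy; the "Repeat" step in the figure corresponds to calling it again for the next opportunity.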
Figure 3.2
Fidelity of Implementation Across Behaviors for Ms. Smith
[Graph: three stacked panels (FTCL, TSR, SLP) across the Baseline, Milieu Teaching, and Maintenance conditions. X-axis: Session (0 to 40). Y-axis (FTCL and TSR panels): percentage of intervals (0 to 100). Y-axis (SLP panel): Percentage of SLP Used Correctly (0 to 100). Annotations: "Bike Introduced," "Booster Session," "Timer introduced," "2-week break" (twice), and "5-week break."]
Note. Circles represent the percentage of intervals with FTCL and TSR implemented
correctly and the percentage of SLP prompts implemented correctly. Triangles represent
generalization data and filled in squares represent maintenance data. Slashes on the graph
represent breaks in data collection of more than one week. FTCL = following the child’s
lead; TSR = teaching social routines; SLP = system of least prompts.
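The data points in these graphs are percentages of intervals scored as correct, and the text refers to an 80% criterion line. A minimal sketch of that arithmetic in Python follows. It assumes each interval is scored as a simple true/false value and that "meeting criterion" means a percentage at or above 80; the names `percent_correct` and `meets_criterion` are hypothetical, and the study's actual scoring rules may differ.

```python
from typing import Sequence

CRITERION = 80.0  # percent; the criterion line referred to in the text


def percent_correct(intervals: Sequence[bool]) -> float:
    """Percentage of scored intervals in which the skill was implemented correctly."""
    if not intervals:
        raise ValueError("at least one scored interval is required")
    return 100.0 * sum(intervals) / len(intervals)


def meets_criterion(intervals: Sequence[bool], criterion: float = CRITERION) -> bool:
    """Assumption: the criterion is met at or above the stated percentage."""
    return percent_correct(intervals) >= criterion


if __name__ == "__main__":
    # Hypothetical session: 8 of 10 intervals scored as correct.
    session = [True] * 8 + [False] * 2
    print(percent_correct(session), meets_criterion(session))
```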
Figure 3.3
Fidelity of Implementation Across Behaviors for Mr. Parker
[Graph: three stacked panels (FTCL, TSR, SLP) across the Baseline, Milieu Teaching, and Maintenance conditions. X-axis: Session (0 to 40). Y-axis (FTCL and TSR panels): percentage of intervals (0 to 100). Y-axis (SLP panel): Percentage of SLP Used Correctly (0 to 100). Annotations: "Ball Introduced," "Ball Missing," "Booster Session" (twice), "Timer introduced," "2-week break" (twice), and "5-week break."]
Note. Circles represent the percentage of intervals with FTCL and TSR implemented
correctly and the percentage of SLP prompts implemented correctly. Triangles represent
generalization data and filled in squares represent maintenance data. Slashes on the graph
represent breaks in data collection of more than one week. FTCL = following the child’s
lead; TSR = teaching social routines; SLP = system of least prompts.
CHAPTER 4
GENERAL DISCUSSION
The prevalence of children demonstrating language delays is quite high, with
community prevalence estimates of 7% to 17% (King et al., 2005). However, prevalence has
been shown to be even higher among children with disabilities, with approximately 70%
of three- to five-year-olds with disabilities having a co-occurring language impairment
(Wetherby & Prizant, 1992). Furthermore, research indicates that speech and language
impairments often co-occur with neurodevelopmental disabilities such as autism
spectrum disorder (ASD; Rosenbaum & Simon, 2016). Language impairments have also
been shown to be associated with issues in cognitive, academic, and language
development as children grow (Johnson et al., 1999). Children with ASD are one group
of children who often exhibit language delays. Many children with ASD benefitted from
the use of discrete trial teaching (DTT) when it was first introduced by Lovaas (1987),
but Schreibman et al. (2015) identified several weaknesses of DTT that may make it
more difficult for some children, such as those with ASD, to benefit from DTT. Thus,
researchers began to look for other means to teach children with ASD language skills
and to help close the language gap between them and their peers. They
turned to a different, more naturalistic intervention type known as naturalistic
developmental behavioral interventions (NDBIs; Schreibman et al., 2015). These
interventions encouraged a more naturalistic approach to language intervention and also
encouraged the use of natural implementers in the natural environment, including
teachers (Peterson, 2004). Peterson (2004) indicated that teachers spend a great deal of
time with children throughout the school day and the school year. It stands to reason
that if they could be taught to implement various naturalistic language interventions, the
dosage of such interventions for children experiencing language delays could increase
tremendously, giving these children more opportunity to benefit from intervention. As
such, the overall goal of the two studies presented was to provide a systematic literature
review of the use of BST to train natural implementers including teachers and/or other
professional staff and to provide an empirical investigation into the use of BST to train
teachers to implement NDBIs.
The purpose of the systematic literature review was to analyze and synthesize the
current literature on the use of BST by teachers and other professional staff to implement
a variety of interventions. It sought to guide research and clinical decision making
regarding whether BST was the appropriate training package with respect to type of
intervention, outcome variables, population, context, and quality of published studies. A
total of 19 studies from 17 articles were reviewed; however, only seven studies showed
sufficient quality and rigor for their results to be interpreted with confidence. Similarly,
fewer than half of the studies measured generalization, maintenance, or BST fidelity, or
described participant characteristics sufficiently, demonstrating several weaknesses in the
reviewed literature. However, several strengths were also demonstrated in the reporting
of social/ecological validity, the description of dependent variables and conditions, and
strong reliability data. Similarly, all 19 studies reported using all components of BST,
including instructions, modeling, rehearsal, and feedback, indicating that although BST
fidelity was not consistently reported, all BST components were included in each of the reviewed
studies. In addition, very few studies examined BST and NDBIs. Overall, the systematic
review revealed a need for more BST research in which BST fidelity is measured,
participant characteristics are described, and the use of BST with NDBIs is examined.
The purpose of the empirical investigation was to help fill some of these gaps in
the BST literature by examining the use of BST to train natural implementers (teachers)
to implement the components of milieu teaching (MT). This was accomplished through
the use of a multiple baseline design across behaviors. In this study, two teachers working
in a language classroom in a school for children with disabilities were taught to
implement several MT techniques, including following the child’s lead (FTCL), teaching
social routines (TSR), and the system of least prompts (SLP). Each of these behaviors
was taught using the four components of BST. For the first teacher, a functional relation
was shown between BST and the improvement in performance in the fidelity of
implementation of MT techniques. These results were replicated with the second
teacher for two of the three behaviors (FTCL and TSR), but not for SLP. Thus, these
results showed that BST can be used to effectively train teachers to implement
components of MT with fidelity. In addition, these effects generalized to a new set of
toys and maintained across time at two- and four-week follow-ups.
Taken together, these studies show that BST can be used to train teachers and
other professionals to implement a variety of interventions, including NDBIs. However,
more research is needed in the area of training teachers to implement NDBIs using BST.
It is promising indeed that teachers can be taught to implement NDBIs in the classroom;
however, even in the current study, a pull-out system was used where the teacher and
child were separated from the general classroom, and teachers were taught to implement
the intervention in 1:1 settings. In order for the intervention to truly have increased reach
and dosage, more research is needed to determine how these interventions can be
incorporated into daily classroom routines. As a field, we must make them feasible
enough for teachers to use them without detracting from their regular classroom duties
and teaching. Only then will we truly have the chance to see the impacts of teachers as
natural implementers on a day-to-day basis.
References
Adams, R. A., Piercy, F. P., Jurich, J. A., & Lewis, R. A. (1992). Components of a model
adolescent AIDS/drug abuse prevention program: A delphi study. Family Relations,
41, 312–317.
Aherne, C. M., & Beaulieu, L. (2019). Assessing long-term maintenance of staff
performance following behavior skills training in a home-based setting. Behavioral
Interventions, 34(1), 79–88. https://doi.org/10.1002/bin.1642
Aktas, B., & Ciftcitekinarslan, I. (2018). The effectiveness of parent training a mothers of
children with Autism use of mand model techniques. International Journal of Early
Childhood Special Education, 10(2), 106–120.
https://doi.org/10.20489/INTJECSE.512378
Alden, L., Safran, J., & Weideman, R. (1978). Comparison of cognitive and skills
training strategies in the treatment of unassertive clients. Behavior Therapy, 9(5),
843–846. https://doi.org/10.1016/S0005-7894(78)80015-X
Bolzani Dinehart, L. H., Yale Kaiser, M., & Hughes, C. R. (2009). Language delay and
the effect of milieu teaching on children born cocaine exposed: A pilot study.
Journal of Developmental and Physical Disabilities, 21(1), 9–22.
https://doi.org/10.1007/s10882-008-9122-8
Boyer, C. B., & Kegeles, S. M. (1991). AIDS risk and prevention among adolescents.
Social Science & Medicine, 33(1), 11–23. https://doi.org/10.1016/0277-
9536(91)90446-J
Bromberg, D. S., & Johnson, B. T. (1997). Behavioral versus traditional approaches to
prevention of child abduction. School Psychology Review, 26(4), 1–13.
Chazin, K. T., Barton, E. E., Ledford, J. R., & Pokorski, E. A. (2018). Implementation
and Intervention Practices to Facilitate Communication Skills for a Child With
Complex Communication Needs. Journal of Early Intervention, 40(2), 138–157.
https://doi.org/10.1177/1053815118771397
Christensen-Sandfort, R. J., & Whinnery, S. B. (2013). Impact of milieu teaching on
communication skills of young children with autism spectrum disorder. Topics in
Early Childhood Special Education, 34(4), 211–222.
https://doi.org/10.1177/0271121411404930
Davenport, C. A., Alber-Morgan, S. R., & Konrad, M. (2019). Effects of behavioral skills
training on teacher implementation of a reading racetrack intervention. Education
and Treatment of Children, 42(3), 385–407. https://doi.org/10.1353/etc.2019.0018
DiGennaro Reed, F. D., Blackman, A. L., Erath, T. G., Brand, D., & Novak, M. D.
(2018). Guidelines for Using Behavioral Skills Training to Provide Teacher Support.
TEACHING Exceptional Children, 50(6), 373–380.
https://doi.org/10.1177/0040059918777241
Dogan, R. K., King, M. L., Fischetti, A. T., Lake, C. M., Mathews, T. L., & Warzak, W.
J. (2017). Parent-implemented behavioral skills training of social skills. Journal of
Applied Behavior Analysis, 50(4), 805–818. https://doi.org/10.1002/jaba.411
Dubin, A., & Lieberman-Betz, R. (2020). Naturalistic interventions to improve
prelinguistic communication for children with autism spectrum disorder: A
systematic review. Review Journal of Autism and Developmental Disorders, 7, 151–
167. https://doi.org/10.1007/s40489-019-00184-9
Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test--Third Edition
examiner’s manual (3rd ed.). Circle Pines: American Guidance Service.
Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., Reznick, J. S., & Bates, E. (2007).
MacArthur-Bates Communicative Development Inventories: User’s guide and
Technical Manual (2nd ed.). Baltimore, MD: Paul H. Brookes Publishing Co., Inc.
Fetherston, A. M., & Sturmey, P. (2014). The effects of behavioral skills training on
instructor and learner behavior across responses and skill sets. Research in
Developmental Disabilities, 35(2), 541–562.
https://doi.org/10.1016/j.ridd.2013.11.006
Fey, M. E. (2008). Milieu communication teaching intervention manual. Department of
Hearing and Speech.
Fey, M. E., Warren, S. F., Bredin-Oja, S. L., & Yoder, P. J. (2017). Responsivity
education/prelinguistic milieu teaching. In R. J. McCauley, M. E. Fey, & R. B.
Gillam (Eds.), Treatment of Language Disorders in Children (2nd ed., pp. 57–85).
Baltimore, MD: Paul H. Brookes Publishing Co., Inc.
Gianoumis, S., Seiverling, L., & Sturmey, P. (2012). The effects of behavior skills
training on correct teacher implementation of natural language paradigm teaching
skills and child behavior. Behavioral Interventions, 27, 57–74.
https://doi.org/10.1002/bin.1334
Giles, A., Swain, S., Quinn, L., & Weifenbach, B. (2018). Teacher-Implemented
Response Interruption and Redirection: Training, Evaluation, and Descriptive
Analysis of Treatment Integrity. Behavior Modification, 42(1), 148–169.
https://doi.org/10.1177/0145445517731061
Glasgow, R. E., & Lichtenstein, E. (1987). Long-term effects of behavioral smoking
cessation interventions. Behavior Therapy, 18(4), 297–324.
https://doi.org/10.1016/S0005-7894(87)80002-3
Hassan, M., Simpson, A., Danaher, K., Haesen, J., Makela, T., & Thomson, K. (2018).
An evaluation of behavioral skills training for teaching caregivers how to support
social skill development in their child with autism spectrum disorder. Journal of
Autism and Developmental Disorders, 48(6), 1957–1970.
https://doi.org/10.1007/s10803-017-3455-z
Hassan, M., Thomson, K. M., Khan, M., Burnham Riosa, P., & Weiss, J. A. (2017).
Behavioral skills training for graduate students providing cognitive behavior therapy
to children with autism spectrum disorder. Behavior Analysis: Research and
Practice, 17(2), 155–165. https://doi.org/10.1037/bar0000078
Hemmeter, M. L., & Kaiser, A. P. (1994). Enhanced milieu teaching: Effects of parent-
implemented language intervention. Journal of Early Intervention, 18(3), 269–289.
https://doi.org/10.1177/105381519401800303
Hendrickson, J. M., Gardner, N., Kaiser, A., & Riley, A. (1993). Evaluation of a social
interaction coaching program in an integrated day-care setting. Journal of Applied
Behavior Analysis, 26(2), 1297740. https://doi.org/10.1901/jaba.1993.26-213
Himle, M. B., & Miltenberger, R. G. (2004). Preventing unintentional firearm injury in
children: The need for behavioral skills training. Education and Treatment of
Children, 27(2), 161–177. Retrieved from https://www.jstor.org/stable/42899794
Hogan, A., Knez, N., & Kahng, S. W. (2015). Evaluating the Use of Behavioral Skills
Training to Improve School Staffs’ Implementation of Behavior Intervention Plans.
Journal of Behavioral Education, 24(2), 242–254. https://doi.org/10.1007/s10864-
014-9213-9
Homlitas, C., Rosales, R., & Candel, L. (2014). A further evaluation of behavioral skills
training for implementation of the picture exchange communication system. Journal
of Applied Behavior Analysis, 47(1), 198–203. https://doi.org/10.1002/jaba.99
Ingersoll, B., Meyer, K., Bonter, N., & Jelinek, S. (2012). A comparison of
developmental social-pragmatic and naturalistic behavioral interventions on
language use and social engagement in children with autism. Journal of Speech,
Language, and Hearing Research, 55, 1301–1313.
Iwata, B. A., Wallace, M. D., Kahng, S. W., Lindberg, J. S., Roscoe, E. M., Conners, J.,
… Worsdell, A. S. (2000a). Skill acquisition in the implementation of functional
analysis methodology. Journal of Applied Behavior Analysis, 33(2), 181–194.
Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/14702451
Iwata, B. A., Wallace, M. D., Kahng, S. W., Lindberg, J. S., Roscoe, E. M., Conners, J.,
… Worsdell, A. S. (2000b). Skill acquisition in the implementation of functional
analysis methodology. Journal of Applied Behavior Analysis, 33(2), 181–194.
https://doi.org/10.1901/jaba.2000.33-181
Janzen, H. L., Obrzut, J. E., & Marusiak, C. W. (2004). Test Review: Roid, G. H. (2003).
Stanford-Binet Intelligence Scales, Fifth Edition (SB:V). Itasca, IL: Riverside
Publishing. Canadian Journal of Psychology, 19(1), 235–244.
https://doi.org/10.1177/082957350401900113
Jimenez-Gomez, C., McGarry, K., Crochet, E., & Chong, I. M. (2019). Training
behavioral technicians to implement naturalistic behavioral interventions using
behavioral skills training. Behavioral Interventions, 34(3), 396–404.
Johnson, B. M., Miltenberger, R. G., Egemo-Helm, K., Jostad, C. M., Flessner, C., &
Gatheridge, B. (2005). Evaluation of behavioral skills training for teaching
abduction-prevention skills to young children. Journal of Applied Behavior
Analysis, 38(1), 67–78. https://doi.org/10.1901/jaba.2005.26-04
Johnson, B. M., Miltenberger, R. G., Knudson, P., Egemo-Helm, K., Kelso, P., Jostad,
C., & Langley, L. (2006). A preliminary evaluation of two behavioral skills training
procedures for teaching abduction-prevention skills to school children. Journal of
Applied Behavior Analysis, 39(1), 25–34. https://doi.org/10.1901/jaba.2006.167-04
Johnson, C. J., Beitchman, J. H., Young, A., Escobar, M., Atkinson, L., Wilson, B., …
Wang, M. (1999). Fourteen-year follow-up of children with and without
speech/language impairments: Speech/language stability and outcomes. Journal of
Speech, Language, and Hearing Research, 42(3), 744–760.
https://doi.org/10.1044/jslhr.4203.744
Jones, E. A., Carr, E. G., & Feeley, K. M. (2006). Multiple effects of joint attention
intervention for children with autism. Behavior Modification, 30(6), 782–834.
https://doi.org/10.1177/0145445506289392
Jull, S., & Mirenda, P. (2016). Effects of a staff training program on community
instructors’ ability to teach swimming skills to children with autism. Journal of
Positive Behavior Interventions, 18(1), 29–40.
https://doi.org/10.1177/1098300715576797
Kaale, A., Smith, L., & Sponheim, E. (2012). A randomized controlled trial of preschool-
based joint attention intervention for children with autism. Journal of Child
Psychology and Psychiatry and Allied Disciplines, 53(1), 97–105.
https://doi.org/10.1111/j.1469-7610.2011.02450
Kaiser, A. P., Hester, P. P., Alpert, C. L., & Whiteman, B. C. (1995). Preparing parent
trainers: An experimental analysis of effects on trainers, parents, and children.
Topics in Early Childhood Special Education, 15(4), 385–414.
https://doi.org/10.1177/027112149501500401
Kaiser, A. P., Ostrosky, M. M., & Alpert, C. L. (1993). Training Teachers to Use
Environmental Arrangement and Milieu Teaching with Nonvocal Preschool
Children. Research and Practice for Persons with Severe Disabilities, 18(3), 188–
199. https://doi.org/10.1177/154079699301800305
King, T. M., Rosenberg, L. A., Fuddy, L., McFarlane, E., Sia, C., & Duggan, A. K.
(2005). Prevalence and early identification of language delays among at-risk three
year olds. Journal of Developmental and Behavioral Pediatrics, 26(4), 293–303.
https://doi.org/10.1097/00004703-200508000-00006
Koegel, R. L., Russo, D. C., & Rincover, A. (1977). Assessing and training teachers in
the generalized use of behavior modification with autistic children. Journal of
Applied Behavior Analysis, 10(2), 197–205. https://doi.org/10.1901/jaba.1977.10-
197
Kolko, D., Watson, S., & Faust, J. (1991). Fire safety prevention skills training to reduce
involvement with fire in young psychiatric inpatients: Preliminary findings.
Behaviour Therapy, 22, 269–284. https://doi.org/10.1016/S0005-7894(05)80182-0
Kornacki, L. T., Ringdahl, J. E., Sjostrom, A., & Nuernberger, J. E. (2013). A component
analysis of a behavioral skills training package used to teach conversation skills to
young adults with autism spectrum and other developmental disorders. Research in
Autism Spectrum Disorders, 7(11), 1370–1376.
https://doi.org/10.1016/j.rasd.2013.07.012
Krumhus, K. M., & Malott, R. W. (1980). The effects of modeling and immediate and
delayed feedback in staff training. Journal of Organizational Behavior
Management, 2(4), 279–293. https://doi.org/10.1300/J075v02n04_05
Lane, J. D., & Ledford, J. R. (2014). Using interval-based systems to measure behavior in
early childhood special education and early intervention. Topics in Early Childhood
Special Education, 34(2), 83–93. https://doi.org/10.1177/0271121414524063
Law, J., & Roy, P. (2008). Parental report of infant language skills: A review of the
development and application of the communicative development inventories. Child
and Adolescent Mental Health, 13(4), 198–206. https://doi.org/10.1111/j.1475-
3588.2008.00503.x
Lawrence, J. S., Brasfield, T. L., Jefferson, K. W., Alleyne, E., O’Bannon, R. E., &
Shirley, A. (1995). Cognitive-behavioral intervention to reduce African American
adolescents’ risk for HIV infection. Journal of Consulting and Clinical Psychology,
63(2), 221–237. Retrieved from
http://onlinelibrary.wiley.com/o/cochrane/clcentral/articles/124/CN-
00114124/frame.html
Ledford, J. R., Lane, J. D., Zimmerman, K. N., Chazin, K. T., & Ayres, K. A. (2016).
Single case analysis and review framework (SCARF). Retrieved from
http://vkc.mc.vanderbilt.edu/ebip/scarf/
Lovaas, O. I. (1987). Behavioral Treatment and Normal Educational and Intellectual
Functioning in Young Autistic Children. Journal of Consulting and Clinical
Psychology, 55(1), 3–9. https://doi.org/10.1037/0022-006x.55.1.3
Madzharova, M. S., & Sturmey, P. (2018). Using in-vivo modeling and feedback to teach
classroom staff to implement a complex behavior intervention plan. Journal of
Developmental and Physical Disabilities, (30), 329–337.
https://doi.org/10.1007/s10882-018-9588-y
McCathren, R. B. (2000). Teacher-implemented prelinguistic communication
intervention. Focus on Autism and Other Developmental Disabilities, 15(1), 21–29.
https://doi.org/10.1177/108835760001500103
Miles, N. I., & Wilder, D. A. (2009). The effects of behavioral skills training on caregiver
implementation of guided compliance. Journal of Applied Behavior Analysis, 42(2),
405–410. https://doi.org/10.1901/jaba.2009.42-405
Miltenberger, R. G., & Thiesse-Duffy, E. (1988). Evaluation of home‐based programs for
teaching personal safety skills to children. Journal of Applied Behavior Analysis,
21(1), 81–87. https://doi.org/10.1901/jaba.1988.21-81
Miltenberger, R., Gross, A., Knudson, P., Bosch, A., Jostad, C., & Breitwieser, C. B.
(2009). Evaluating behavioral skills training with and without simulated in situ
training for teaching safety skills to children. Education and Treatment of Children,
32(1), 63–75. https://doi.org/10.1353/etc.0.0049
Miltenberger, R. G., Flessner, C., Gatheridge, B., Johnson, B., Satterlund, M., & Egemo,
K. (2004). Evaluation of behavioral skills training to prevent gun play in children.
Journal of Applied Behavior Analysis, 37(4), 513–516.
Miltenberger, R. G., Gatheridge, B. J., Satterlund, M., Egemo-Helm, K. R., Johnson, B.
M., Jostad, C., … Flessner, C. A. (2005). Teaching safety skills to children to
prevent gun play: An evaluation of in situ training. Journal of Applied Behavior
Analysis, 38(3), 395–398. https://doi.org/10.1901/jaba.2005.130-04
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items
for systematic reviews and meta-analyses: The PRISMA statement (reprinted from
Annals of Internal Medicine). Physical Therapy, 89(9), 873–880.
https://doi.org/10.1371/journal.pmed.1000097
Nabeyama, B., & Sturmey, P. (2010). Using behavioral skills training to promote safe
and correct staff guarding and ambulation distance of students with multiple
physical disabilities. Journal of Applied Behavior Analysis, 43(2), 341–345.
Nigro-Bruzzi, D., & Sturmey, P. (2010). The effects of behavioral skills training on mand
training by staff and unprompted vocal mands by children. Journal of Applied
Behavior Analysis, 43(4), 757–761. https://doi.org/10.1901/jaba.2010.43-757
Nuernberger, J. E., Ringdahl, J. E., Vargo, K. K., Crumpecker, A. C., & Gunnarsson, K.
F. (2013). Using a behavioral skills training package to teach conversation skills to
young adults with autism spectrum disorders. Research in Autism Spectrum
Disorders, 7(2), 411–417. https://doi.org/10.1016/j.rasd.2012.09.004
Ogletree, B. T., Davis, P., Hambrecht, G., & Phillips, E. W. (2012). Using milieu training
to promote photograph exchange for a young child with autism. Focus on Autism
and Other Developmental Disabilities, 27(2), 93–101.
https://doi.org/10.1177/1088357612441968
Olive, M. L., De La Cruz, B., Davis, T. N., Chan, J. M., Lang, R. B., O’Reilly, M. F., &
Dickson, S. M. (2007). The effects of enhanced milieu teaching and a voice output
communication aid on the requesting of three children with autism. Journal of
Autism and Developmental Disorders, 37(8), 1505–1513.
https://doi.org/10.1007/s10803-006-0243-6
Palmen, A., & Didden, R. (2012). Task engagement in young adults with high-
functioning autism spectrum disorders: Generalization effects of behavioral skills
training. Research in Autism Spectrum Disorders, 6, 1377–1388.
Palmen, A., Didden, R., & Korzilius, H. (2010). Effectiveness of behavioral skills
training on staff performance in a job setting for high-functioning adolescents with
autism spectrum disorders. Research in Autism Spectrum Disorders, 4, 731–740.
Pan-Skadden, J., Wilder, D. A., Sparling, J., Severtson, E., Donaldson, J., Postma, N., …
Neidert, P. (2009). The use of behavioral skills training and in-situ training to teach
children to solicit help when lost: A preliminary investigation. Education and
Treatment of Children, 32(3), 359–370. https://doi.org/10.1353/etc.0.0063
Peterson, P. (2004). Naturalistic language teaching procedures for children at risk for
language delays. The Behavior Analyst Today, 5(4), 404–424.
https://doi.org/10.1037/h0100047
Reynell, J. K., & Gruber, C. P. (1990). Reynell Developmental Language Scales. Los
Angeles: Western Psychological Services.
Roid, G. H. (2003). Stanford-Binet Intelligence Scales (5th ed.). Itasca, IL: Riverside
Publishing.
Rosales, R., Stone, K., & Rehfeldt, R. A. (2009). The effects of behavioral skills training
on implementation of the picture exchange communication system. Journal of
Applied Behavior Analysis, 42(3), 541–549.
https://doi.org/10.1901/jaba.2009.42-541
Rosenbaum, S., & Simon, P. (2016). Speech and Language Disorders in Children:
Implications for the Social Security Administration’s Supplemental Security Income
Program. Washington, DC: National Academies Press.
https://doi.org/10.17226/21872
Sarokoff, R. A., & Sturmey, P. (2008). The effects of instructions, rehearsal, modeling,
and feedback on acquisition and generalization of staff use of discrete trial teaching
and student correct responses. Research in Autism Spectrum Disorders, 2(1), 125–
136. https://doi.org/10.1016/j.rasd.2007.04.002
Sarokoff, R. A., & Sturmey, P. (2004). The effects of behavioral skills training on staff
implementation of discrete-trial teaching. Journal of Applied Behavior Analysis,
37(4), 535–538. https://doi.org/10.1901/jaba.2004.37-535
Schreibman, L., Dawson, G., Stahmer, A. C., Landa, R., Rogers, S. J., McGee, G. G., …
Halladay, A. (2015). Naturalistic developmental behavioral interventions:
Empirically validated treatments for autism spectrum disorder. Journal of Autism and
Developmental Disorders, 45, 2411–2428.
https://doi.org/10.1007/s10803-015-2407-8
Seiverling, L., Pantelides, M., Ruiz, H. H., & Sturmey, P. (2010). The effect of
behavioral skills training with general case training on staff chaining of child
vocalizations within natural language paradigm. Behavioral Interventions, 25, 53–
75. https://doi.org/10.1002/bin.293
Seiverling, L., Williams, K., Sturmey, P., & Hart, S. (2012). Effects of behavioral
skills training on parental treatment of children’s food selectivity. Journal of Applied
Behavior Analysis, 45(1), 197–203. https://doi.org/10.1901/jaba.2012.45-197
Smith, A. E., & Camarata, S. (1999). Using teacher-implemented instruction to increase
language intelligibility of children with autism. Journal of Positive Behavior
Interventions, 1(3), 141–151. https://doi.org/10.1177/109830079900100302
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). Stanford-Binet Intelligence Scale:
Fourth Edition (4th ed.). Chicago: Riverside Publishing.
Toelken, S., & Miltenberger, R. G. (2012). Increasing independence among children
diagnosed with autism using a brief embedded teaching strategy. Behavioral
Interventions, 27, 93–104. https://doi.org/10.1002/bin.337
Togram, B., & Erbas, D. (2010). The effectiveness of instruction on mand model - One of
the milieu teaching techniques. Egitim Arastirmalari - Eurasian Journal of
Educational Research, (38), 198–215.
Ward-Horner, J., & Sturmey, P. (2008). The effects of general-case training and
behavioral skills training on the generalization of parents’ use of discrete-trial
teaching, child correct responses, and child maladaptive behavior. Behavioral
Interventions, 23, 271–284. https://doi.org/10.1002/bin.268
Ward-Horner, J., & Sturmey, P. (2012). Component analysis of behavior skills training in
functional analysis. Behavioral Interventions, 27, 75–92.
Warren, S. F., & Gazdag, G. (1990). Facilitating early language development with milieu
intervention procedures. Journal of Early Intervention, 14(1), 62–86.
https://doi.org/10.1177/105381519001400106
Wechsler, D. (2005). Wechsler Individual Achievement Test 2nd Edition (WIAT II).
London: The Psychological Corp.
Wetherby, A. M., & Prizant, B. M. (1992). Profiling young children’s communicative
competence. In Causes and effects in communication and language intervention
(pp. 217–253). Baltimore, MD: Paul H. Brookes Publishing.
Wong, C. S. (2013). A play and joint attention intervention for teachers of young children
with autism: A randomized controlled pilot study. Autism, 17(3), 340–357.
https://doi.org/10.1177/1362361312474723
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of
Achievement (3rd ed.). Itasca, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of
Cognitive Abilities (3rd ed.). Itasca, IL: Riverside Publishing.
Wright, C. A., & Kaiser, A. P. (2017). Teaching parents enhanced milieu teaching with
words and signs using the teach-model-coach-review model. Topics in Early
Childhood Special Education, 36(4), 192–204.
https://doi.org/10.1177/0271121415621027
Wurtele, S. K., & Owens, J. S. (1997). Teaching personal safety skills to young children: An
investigation of age and gender across five studies. Child Abuse and Neglect, 21(8),
805–814. https://doi.org/10.1016/S0145-2134(97)00040-9
Wurtele, S. K. (1990). Teaching personal safety skills to four-year-old children: A
behavioral approach. Behavior Therapy, 21(1), 25–32.
https://doi.org/10.1016/S0005-7894(05)80186-8
Wurtele, S. K., Saslawsky, D. A., Miller, C. L., Marrs, S. R., & Britcher, J. C. (1986).
Teaching personal safety skills for potential prevention of sexual abuse: A
comparison of treatments. Journal of Consulting and Clinical Psychology, 54(5),
688–692. https://doi.org/10.1037/0022-006X.54.5.688
Zimmerman, I. L., Steiner, V. G., & Pond, R. (1979). The Preschool Language Scale-Revised.
Columbus: Charles Merrill.
Appendix A: Outcome Coding Descriptions for Single Case Analysis Review and
Framework (SCARF)
Primary Outcomes
1. Which best characterizes the study's effects? This framework is designed for
analysis of SINGLE STUDIES. Articles may include multiple studies; these
should be evaluated separately. A study is a stand-alone single case design with
a single dependent variable. Studies may include one or multiple
participants. For ATD studies, assess each condition in comparison to a single
other condition, if these comparisons match your research questions. Note:
Strong effects occur when consistent changes occur between conditions,
overlap is minimal and/or decreasing over time, and there is a clear change in
the expected direction in level, trend, and/or variability. Weak effects occur
when one or more of those characteristics is not present. Non-effects occur
when data do not reliably change when a condition change occurs, or when data
patterns preclude decision-making. Contratherapeutic effects occur when data
change in an unexpected direction.
a. ATD Designs: Enter 0: data paths undifferentiated, approximately half or
more of data paths are overlapping (approximately the same values or with
some higher values in one condition and some higher values in another
condition). Enter 1: approximately half or more data are overlapping as
described above, but overlap decreases over time. Enter 2: less than half of
data points are overlapping but there is a decreasing or variable
differentiation between conditions (e.g., difference in values between
conditions decreases over time or is not consistent). Enter 3: less than half
of data points are overlapping and there is increasing differentiation over
time (e.g., difference in values between conditions increases over time).
Enter 4: minimal/no overlap occurs, consistent differentiation between
conditions.
b. MB/MP Design: Enter 0: >1 non-effect or any contratherapeutic effect or if
vertical analysis suggests changes in data in one tier are associated with
condition change in another tier. Enter 1: <3 demonstrations of effect, 1
non-effect. Enter 2: >=3 demonstrations, >=1 non-effect. Enter 3: >=3
demonstrations, >=1 weak effects, 0 non-effect. Enter 4: >=3
demonstrations, 0 non-effects/weak effect.
c. Other Designs: Enter 0: >1 non-effect or any contratherapeutic effect.
Enter 1: <3 demonstrations of effect, 1 non-effect. Enter 2: >=3
demonstrations, >=1 non-effect. Enter 3: >=3 demonstrations, >=1 weak
effects, 0 non-effect. Enter 4: >=3 demonstrations, 0 non-effects/weak
effect.
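The MB/MP and Other Designs rubrics above amount to a small decision rule, and a short sketch can make the order of the checks explicit. The Python function below is illustrative only and is not part of the SCARF tool: the function name, its parameters, and the top-to-bottom precedence of the checks are assumptions. In particular, the printed rubric does not state how to score fewer than three demonstrations with no non-effects, and the criteria for scores of 0 and 2 overlap when more than one non-effect occurs; this sketch assigns 1 and 0, respectively.

```python
def score_primary_outcome(demonstrations, non_effects, weak_effects,
                          contratherapeutic=False, vertical_confound=False):
    """Return a 0-4 primary outcome score for an MB/MP or other design.

    Hypothetical helper: the argument names and the order in which the
    rubric rows are checked are assumptions made for this sketch.
    """
    # Enter 0: more than one non-effect, any contratherapeutic effect, or
    # (MB/MP only) vertical analysis suggesting change tied to another tier.
    if contratherapeutic or vertical_confound or non_effects > 1:
        return 0
    # Enter 1: fewer than three demonstrations of effect (assumed to apply
    # whether or not a single non-effect is present).
    if demonstrations < 3:
        return 1
    # Enter 2: at least three demonstrations alongside a non-effect.
    if non_effects >= 1:
        return 2
    # Enter 3: at least three demonstrations, weak effects, no non-effects.
    if weak_effects >= 1:
        return 3
    # Enter 4: at least three demonstrations, no non-effects or weak effects.
    return 4
```

Under these assumptions, a multiple baseline design with three demonstrations of effect, no non-effects, and one weak effect would receive a score of 3.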
Generalized Outcomes
1. Which best characterizes the generalization outcomes in the study?
a. Enter 0: no measurement of generalization outcomes. Enter 1: consistent
non-effects or contratherapeutic effects. Enter 2: inconsistent or weak
positive effects. Enter 3: consistent positive effects shown via post-test.
Enter 4: consistent positive effects shown via measurement in context of
design.
Maintained Outcomes
1. Which of the following best characterizes maintenance outcomes for the study?
a. Enter 0: maintenance was not assessed. Enter 1: maintenance data were
similar to pre-intervention/baseline data. Enter 2: maintenance data showed
outcomes that were deteriorating or less optimal than intervention or
criterion. Enter 3: maintenance data showed maintained outcomes similar to
intervention or criterion levels. Enter 4: maintenance data showed
maintained outcomes similar to intervention or criterion levels and on
multiple occasions (e.g., more than one data point).
Appendix B: Observation Data Collection Sheet
Appendix C: Intervention Fidelity Sheets
Date: ______ Workshop #: _______ Session #: _________
Teacher ID: __________ Observer: ___________
Workshop 1 Session 1: Following the Child’s Lead
Following the Child’s Lead

Overview /7
1. What does it mean to follow the child’s lead?
2. Choosing toys/materials
3. Imitating children’s actions
4. Parallel play
5. Commenting on the child’s play
6. Things to Avoid
a. Suggestions
b. Questions
c. Commands
7. Responding to child interactions
Video examples (at least 2) /2
How to structure the environment to create opportunities for following the child’s lead /1
Review /5
1. How to follow the child’s lead
2. Toy/Material Selection
3. Commenting
4. Parallel Play
5. Avoiding directive comments
How to structure the environment to create opportunities for following the child’s lead /1
Researcher Model Session
Researcher models allowing the child to select the toys of interest /1
Researcher models how to engage in parallel play /1
Researcher models how to comment on child’s play /1
Researcher models how to respond to child’s interactions /1
Teacher Practice Session
Teacher practices allowing the child to select the toys with the researcher /1
Teacher practices parallel play with the researcher /1
Teacher practices commenting on play with the researcher /1
Teacher practices responding to interactions with the researcher /1
Ending Workshop
At the end of the session the researcher asks the teacher how he/she felt the session went /1
Researcher summarizes how the teacher utilized following the child’s lead /1
Researcher asks the teacher whether he/she felt the session length was appropriate for learning /1
Total /27
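
A fidelity sheet of this kind reduces to a ratio of points earned to points possible. The following Python sketch shows one way an observer's scores for Workshop 1 Session 1 might be tallied. The section maximums are taken from the sheet above; the short section labels, the function name, and the choice to report a percentage are assumptions made for illustration and are not described in the study.

```python
# Maximum points per section of the Workshop 1 Session 1 sheet (from the form).
# The short section labels are hypothetical names chosen for this sketch.
MAX_POINTS = {
    "overview": 7,
    "video_examples": 2,
    "environment_overview": 1,
    "review": 5,
    "environment_review": 1,
    "researcher_model": 4,
    "teacher_practice": 4,
    "ending_workshop": 3,
}


def fidelity_percentage(earned, max_points=MAX_POINTS):
    """Return the percentage of possible points earned across all sections.

    Sections missing from `earned` are counted as zero points.
    """
    for section, points in earned.items():
        if section not in max_points:
            raise KeyError(f"Unknown section: {section}")
        if not 0 <= points <= max_points[section]:
            raise ValueError(f"Score out of range for {section}: {points}")
    total_possible = sum(max_points.values())
    total_earned = sum(earned.get(section, 0) for section in max_points)
    return 100.0 * total_earned / total_possible


if __name__ == "__main__":
    # Example: every item delivered except one Review item (26 of 27 points).
    scores = dict(MAX_POINTS)
    scores["review"] = 4
    print(round(fidelity_percentage(scores), 1))
```

The section maximums in this sketch sum to 27, matching the total printed on the sheet.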
Date: _______ Workshop #: ______ Session #: ________
Teacher ID: ______________ Observer: __________
Workshop 1 Session 2: Following the Child’s Lead
Researcher Model Session- Live in Classroom

Researcher models choosing toys/materials with child /1
Researcher models imitating child’s actions /1
Researcher models parallel play with child /1
Researcher models commenting on child’s play /1
Teacher Practice Session- Live in Classroom
Teacher practices choosing toys/materials with child while researcher provides feedback (at least 3 times) /3
Teacher practices imitating child’s actions while researcher provides feedback (at least 3 times) /3
Teacher practices parallel play with child while researcher provides feedback (at least 3 times) /3
Teacher practices commenting on child’s play while researcher provides feedback (at least 3 times) /3
Proficiency Assessment- Live in Classroom
Teacher demonstrates 80% proficiency with choosing toys/materials (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with imitating child’s actions (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with parallel play (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with commenting on child’s play (4/5 consecutive trials) /1
Total /20
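
The proficiency criterion on this sheet (80%, or 4 of 5 consecutive trials) can be checked mechanically once each trial is coded as correct or incorrect. The sketch below gives one possible reading, in which the criterion is met as soon as any run of five consecutive trials contains at least four correct trials. That sliding-window interpretation, the function name, and the boolean coding of trials are assumptions, not a procedure described in the study.

```python
def meets_proficiency(trials, window=5, required=4):
    """Return True if any `window` consecutive trials include at least
    `required` correct trials.

    `trials` is a sequence of booleans (True = correct), in the order the
    trials were observed. Hypothetical helper for illustration only.
    """
    # Slide a fixed-size window across the trial record; sequences shorter
    # than the window cannot meet the criterion.
    for start in range(len(trials) - window + 1):
        if sum(trials[start:start + window]) >= required:
            return True
    return False


if __name__ == "__main__":
    # Four of the five trials are correct, so the criterion is met.
    print(meets_proficiency([True, True, False, True, True]))
```

A stricter reading, such as requiring four correct trials in a row, would need a different check; the form itself does not specify which is intended.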
Date: _______ Workshop #: ______ Session #: _________
Teacher ID: _______________ Observer: ___________
Workshop 2 Session 1: Teaching Social Routines
Teaching Social Routines

Overview /6
1. What is a social routine?
a. How to identify appropriate routines
2. Choosing toys/materials the child finds interesting
a. Toy selection rules
3. How to establish a routine
a. Imitating play
b. Imitating actions
c. Engage the child
4. How to use the routine to insert a turn
5. What is a shared interaction?
a. How to create a shared interaction
b. Commands
Video examples (at least 2) /2
How to structure the environment to create opportunities for establishing social routines /1
Review
Overview /6
1. What is a social routine/how to identify routines?
2. Choosing toys/materials the child finds interesting
3. How to establish a routine
4. How to use the routine to insert a turn
5. What is a shared interaction/how to create one?
How to structure the environment to create opportunities for establishing social routines /1
Researcher models how to identify appropriate routines /1
Researcher models how to choose toys /1
Researcher models how to establish a routine /1
Researcher models how to use the routine to insert a turn /1
Researcher models how to respond to child interactions /1
Teacher practices how to identify appropriate routines with researcher /1
Teacher practices how to choose toys with researcher /1
Teacher practices how to establish a routine with researcher /1
Teacher practices how to use the routine to insert a turn with researcher /1
Teacher practices how to respond to child interactions with the researcher /1
Ending Workshop
At the end of the session the researcher asks the teacher how he/she felt the session went /1
Researcher summarizes how the teacher utilized establishing routines /1
Total /29
Date: ______ Workshop #: _____ Session #: ________
Teacher ID: ______________ Observer: __________
Workshop 2 Session 2: Teaching Social Routines

Researcher models how to identify appropriate routines with child /1
Researcher models how to choose toys with child /1
Researcher models how to establish a routine with child /1
Researcher models how to use the routine to insert a turn with child /1
Researcher models how to create a shared interaction with child /1
Researcher models how to respond to child interactions with child /1
Teacher practices how to identify appropriate routines with child while researcher provides feedback (at least 3 times) /3
Teacher practices how to choose toys with child while researcher provides feedback (at least 3 times) /3
Teacher practices how to establish a routine with child while researcher provides feedback (at least 3 times) /3
Teacher practices how to use the routine to insert a turn with child while researcher provides feedback (at least 3 times) /3
Teacher practices how to respond to child interactions with child while researcher provides feedback (at least 3 times) /3
Teacher demonstrates 80% proficiency with identifying appropriate routines (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with choosing appropriate toys (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with establishing routines (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with inserting a turn into the routine (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with creating a shared interaction (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with inserting a turn into the routine (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with responding to child interactions (4/5 consecutive trials) /1
Total /28
Date: _______ Workshop #: _____ Session #: _________
Teacher ID: _______________ Observer: _______
Workshop 3 Session 1: Systematic Use of Prompts
Systematic Use of Prompts

Time Delay /6
1. Overview of constant time delay
2. When to use it
3. At least 3 examples of time delay presented (3 points)
4. Video model
Linguistic Prompts /6
1. Overview of linguistic prompts
2. Variety of linguistic prompts presented
3. At least 3 examples of linguistic prompts (3 points)
4. Video model
Non-Linguistic Prompts /6
1. Overview of non-linguistic prompts
2. Variety of non-linguistic prompts presented (including model)
3. At least 3 examples of non-linguistic model prompts (3 points)
4. Video model
How to structure the environment to create opportunities for prompting /1
Review
Types of prompts: /3
1. Time Delay
2. Linguistic Prompts
3. Non-Linguistic Prompts
Structuring the environment /1
Video quiz /1
Researcher models time delay for teacher /1
Researcher models linguistic prompts for teacher /1
Researcher models non-linguistic model prompts for teacher /1
Teacher practices time delay with researcher (at least 3 times) /3
Teacher practices linguistic prompts with researcher (at least 3 times) /3
Teacher practices non-linguistic model prompts with researcher (at least 3 times) /3
Ending Workshop
At the end of the session the researcher asks the teacher how he/she felt the session went /1
Researcher summarizes how the teacher utilized each prompt in the hierarchy /1
Total /39
Date: _______ Workshop #: ______ Session #: ________
Teacher ID: ______________ Observer: ___________
Workshop 3 Session 2: Systematic Use of Prompts

Researcher models time delay with child /1
Researcher models linguistic prompts with child /1
Researcher models non-linguistic model prompts with child /1
Teacher practices time delay with child while researcher provides feedback (at least 3 times) /3
Teacher practices linguistic prompts with child while researcher provides feedback (at least 3 times) /3
Teacher practices non-linguistic prompts with child while researcher provides feedback (at least 3 times) /3
Teacher practices moving through the prompting hierarchy with child while researcher provides feedback (at least 3 times) /3
Teacher demonstrates 80% proficiency with time delay (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with linguistic prompts (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with non-linguistic model prompts (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with non-linguistic physical prompts (4/5 consecutive trials) /1
Teacher demonstrates 80% proficiency with regard to moving through the prompting hierarchy (4/5 consecutive trials) /1
Total /20
Appendix D: Data Collection Sheets
Appendix E: Teacher Demographics Form
General Information
Pseudonym:
Age:
Race/Ethnicity:
Gender:
Current Grade Taught and length of time teaching this grade:
Previous Grades Taught and length of time teaching each grade (Grade / Time Taught):
Length of Time at The Bridge:
Types of Classrooms Taught (Check all that apply):
☐ Special Education
☐ General Education
☐ Other: Please specify _______________________
Please check all disabilities with which you have instructional experience:
☐ Autism Spectrum Disorder (ASD)
☐ Intellectual Disability
☐ Down syndrome
☐ Emotional Disturbance (ED)
☐ Physical Handicap
☐ Cerebral Palsy
☐ Specific Learning Disabilities
☐ Other Health Impairment (OHI)
☐ Speech or Language Impairment
☐ Visually Impaired (including blindness)
☐ Hearing Impaired
☐ Deafness
☐ Deaf-Blindness
☐ Traumatic Brain Injury
☐ Attention Deficit/Hyperactivity Disorder (ADHD)
☐ Social Communication Deficits
How many students are in your current classroom?
Of the students in your current class, how many have a diagnosed disability?
Of the students in your current class, how many have a Growth & Performance Plan?
What is your highest degree earned?
☐ High School Diploma
☐ Bachelor’s Degree
☐ Master’s Degree
☐ Professional Degree (PhD, MD, JD)
What was your area of study for your highest degree earned?
Please list your area(s) of teacher certification:
What certificate type do you currently hold?
☐ Pre-Service Certificate
☐ Certificate of Eligibility Certificate
☐ Clearance Certificate
☐ Induction Certificate
☐ Standard Professional Certificate
☐ Standard Performance Based Certificate
☐ Performance-Based Professional Educational Leadership Certificate
☐ Standard Professional Educational Leadership
☐ Advanced Professional Certificate
☐ Lead Professional Certificate
☐ Not Applicable
Please describe any previous professional development experiences in naturalistic social communication interventions for children with ASD:
Please describe any previous experiences with receiving coaching from an expert to support you in the classroom:
Describe any strategies you currently use in your classroom to support children’s communication:
Appendix F: Child Demographics Form
General Information
Pseudonym:
Date of Birth:
Gender:
Race/Ethnicity:
Diagnoses:
Has he/she received any intervention/services prior to attending the Bridge?
☐ Babies Can’t Wait
☐ Private Therapy Services
☐ Other:
Grade:
Number of Years at the Bridge:
Does he/she have a Growth & Performance Plan?
Has he/she received any diagnoses? Please list each one.
Current services received and frequency:
☐ Speech/Language:
☐ Occupational Therapy:
☐ ABA:
☐ Other:
Where does he/she receive services?
☐ The Bridge
☐ Private
☐ Other:
Are there social communication goals in his/her Growth & Performance Plan? Or is social communication a current focus in the classroom?
What are his/her preferred toys, objects, activities, etc.?
Appendix G: Social Validity Measure
Teacher Feedback Questionnaire
Adapted from: Hendrickson, J. M., Gardner, N., Kaiser, A., & Riley, A. (1993). Evaluation
of a social interaction coaching program in an integrated day-care setting. Journal of
Applied Behavior Analysis, 26(2), 213–225.
Directions: Please answer the following questions based on your experiences with the
research study. Be sure to answer every question and circle only one response per
question.
Response options: 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree
Research
Research in schools is important to learn to better teach high-risk children/children with disabilities 1 2 3 4
Research in schools is important to learn to better teach all children 1 2 3 4
Research in schools can improve specific staff teaching skills 1 2 3 4
Research in schools can improve staff teaching in general 1 2 3 4
Intervention Effects
I believe the intervention techniques taught in this study can be used in most classrooms to help integrate high-risk children/children with disabilities 1 2 3 4
My participation in learning intervention techniques was worth my effort 1 2 3 4
I would share my intervention skills with other teachers 1 2 3 4
The social communication skills of the student(s) with whom I worked have improved after this intervention 1 2 3 4
This intervention was beneficial for the students with whom I worked 1 2 3 4
This intervention was beneficial for me as a teacher 1 2 3 4
Social Validity
Overall, I believe that my knowledge about social communication and intervention techniques has improved 1 2 3 4
I feel more knowledgeable and confident in my ability to help my students improve their social communication skills after participating in this intervention 1 2 3 4
I feel confident in my ability to set up my classroom and activities to help encourage opportunities for social communication 1 2 3 4
I feel confident in my ability to incorporate the intervention techniques I have learned into my daily classroom routines 1 2 3 4
I believe that the intervention techniques are feasible to implement in the classroom 1 2 3 4
I believe that the intervention techniques can be easily incorporated into the classroom 1 2 3 4
I feel confident in my ability to follow the child’s lead 1 2 3 4
I feel confident in my ability to establish and engage in social routines 1 2 3 4
I feel confident in my ability to use the system of prompts, including linguistic and non-linguistic prompts and time delay 1 2 3 4
I feel confident in my ability to identify opportunities to set up the environment to encourage social communication 1 2 3 4
I would participate in a similar project in the future 1 2 3 4
Training
I felt comfortable during the training sessions 1 2 3 4
The training sessions were tailored to my experience level 1 2 3 4
I felt comfortable implementing the intervention techniques after the training sessions were completed 1 2 3 4
I felt comfortable asking questions during the training sessions 1 2 3 4
I would recommend the training sessions to my colleagues 1 2 3 4

