by
MYLISSA SLANE
ABSTRACT
cognitive, language, and academic domains (Johnson et al., 1999). The prevalence rate
for language impairments is high among community samples (7% to 17%; King et al.,
2005), and speech and language disorders are often co-morbid with other
neurodevelopmental disorders (Rosenbaum & Simon, 2016). Thus, one way to ensure
implementers (those who are already part of the child’s typical environment; e.g.,
interventions. The purpose of the following two studies was to (a) systematically review
and synthesize the literature examining the use of behavioral skills training (BST) to train
interventions and (b) extend the current literature by utilizing BST to train teachers to
fidelity. Results of the systematic review showed that BST could be effectively used to
train teachers and staff to implement a variety of interventions (e.g., reading racetrack,
the picture exchange communication system [PECS], discrete trial teaching [DTT], the
natural language paradigm [NLP]) targeting a variety of skills and deficits. However,
only a handful of studies had sufficient rigor, quality, and interpretable outcomes to infer
a functional relation. The second study was an empirical investigation examining the
implement MT using BST and both teachers learned to implement three core MT
techniques: following the child’s lead (FTCL), teaching social routines (TSR), and the
system of least prompts (SLP). A functional relation was demonstrated across each tier
for one teacher, with two out of three behaviors (FTCL and TSR) replicated across two
teachers. Results from the systematic literature review and the empirical investigation
have implications for future research in that both studies suggested natural implementers
(teachers and staff) can and should be taught to implement interventions with fidelity,
by
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2020
© 2020
by
Ron Walcott
Dean of the Graduate School
The University of Georgia
December 2020
ACKNOWLEDGEMENTS
I would like to thank my major professor, Dr. Rebecca Lieberman-Betz, for her
unwavering support and assistance. I never would have been able to complete this project
without her guidance, advice, and assistance. I would also like to thank my academic
advisor and committee member, Dr. Michele Lease, for believing in me and for pushing
well to my committee members, Dr. Joel Ringdahl and Dr. Amy Reschly, who provided
me with helpful feedback and support throughout this process. They challenged me to
think critically and helped improve my project immensely. I cannot express my gratitude
enough to those who helped me with data and reliability coding, Maggie Molony, Ali
Zelan, and Kelsie Tyson. Their hard work and dedication were truly admirable and
without them, this project would not have been possible. I would also like to thank my
family, who have supported me through this process and throughout all of graduate
school. They have always been there for me, especially when times were trying or
difficult. They are my rock and inspiration and without them, I would never have had the
courage to push myself as far as I have. I would also like to thank the school where I
completed my project for their support and cooperation throughout the project. In
project would not be possible. Thank you to all of the friends I made throughout graduate
school and especially to my cohort for always being there for one another and supporting
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
CHAPTER
1 INTRODUCTION
Milieu Teaching
Abstract
Introduction
Method
Results
Discussion
TEACHING TECHNIQUES
Abstract
Introduction
Method
Results
Discussion
APPENDICES
LIST OF TABLES
Table 2.3: Rigor Coding Questions from the Single Case Analysis Review and
Table 2.4: Quality & Breadth of Measurement Coding Questions from the Single Case
LIST OF FIGURES
Figure 2.1: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
Figure 3.2: Fidelity of Implementation Across Behaviors for Ms. Smith
Figure 3.3: Fidelity of Implementation Across Behaviors for Mr. Parker
CHAPTER 1
INTRODUCTION
Language delays in children can have serious negative implications for future
rates of speech and language disorders in children vary based on child age and diagnostic
criteria, but are estimated to affect between 3% and 16% of children in the US
(Rosenbaum & Simon, 2016). They are also found to frequently co-occur with other
Simon, 2016). Given the impact of language delays on daily functioning and social
interactions, and the importance of language development for growth and development in
other areas, the need for intervention for children with language delays cannot be
overstated.
Discrete trial teaching (DTT) became one of the most widely implemented
interventions for communication delays and many other developmental needs for
individuals with ASD (Schreibman et al., 2015). During the process of DTT, a targeted
skill is broken down into several components and the child and interventionist work on
learning one component at a time until the target behavior is mastered (Schreibman et al.,
2015). However, several limitations with the use of DTT in individuals with ASD were
with a surge in the literature examining developmental interventions for young children
with social-communication disorders such as ASD, led to increased interest in more
naturalistic interventions.
trajectory that is more similar to, rather than different from, typical development
(Schreibman et al., 2015). This, in combination with the limitations of DTT, led to an
incorporation of more naturalistic, intervention-based strategies for children with ASD
(Schreibman et al., 2015). Naturalistic intervention strategies use natural reinforcers (as
compared to arbitrary), use materials that children prefer, reinforce child attempts at
of interventions all its own (Schreibman et al., 2015). According to Schreibman et al.
(2015), there are several common features of NDBIs, including (1) a three-part
contingency; (2) manualized practice; (3) individualized treatment goals; (4) ongoing
arrangement; (7) use of prompting and prompt fading; (8) modeling; (9) adult imitation
of the child’s language, play, or body movements; and (10) broadening the attentional
focus of the child. Naturalistic interventions, including NDBIs, incorporate not only the
natural environments of children (e.g., classroom, home), but also intervention
implementers who are part of children’s natural environments (e.g., parents, teachers).
Training natural implementers to carry out NDBIs has the potential to increase
dosage exponentially for children with disabilities receiving intervention within the
natural environment (Peterson, 2004). Parents and teachers have many more
opportunities throughout the day and the week to implement intervention techniques and
to help address children’s identified needs. Teachers have been identified as one of the
prime candidates in terms of natural implementers and several studies have demonstrated
their ability to implement various interventions with fidelity. For example, teachers have
manualized interventions targeting joint attention (Kaale et al., 2012), enhanced milieu
teaching (Olive et al., 2007), naturalistic language teaching (Smith & Camarata, 1999),
and symbolic play and joint attention interventions (Wong, 2013). Indeed, several
intervention techniques including time delay, the mand-model procedure, and milieu
2004).
Milieu Teaching
focuses on the child’s interests to encourage communication from the child (Kaiser et al.,
1993). MT has been successfully implemented with children with speech/language and
other communication delays. There are three main mechanisms to MT, which include
procedures (time delay, modeling, and mand-modeling; Peterson, 2004). As part of
a way as to encourage communicative acts on the part of the child (Peterson, 2004). For
example, an interventionist may place an object out of reach of the child or in a place
that the child cannot easily reach without the teacher’s help. It is the hope that such
object placement will occasion the child to communicate with the teacher in order to
receive help obtaining the desired item. Responsive interaction techniques include:
child’s verbalizations, and expanding on the child’s statements (Peterson, 2004). Time
delay is a procedure in which one uses nonvocal cues to occasion vocal responding from
a child (Peterson, 2004). During this procedure, a teacher identifies something that the
child wants and looks at the child expectantly in the hope of occasioning a vocal
response. If this method does not work, the teacher then moves on to the mand-model
procedure. The mand-model procedure involves both manding (making a
request of the child) and modeling (demonstrating for the child what he/she is
expected to do). This is a teacher-initiated strategy in which the teacher directly asks the
child what he/she wants and then models the appropriate vocal response if no response is
given.
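The order in which these procedures are delivered can be sketched in code. The following is a minimal illustration only, assuming a simplified model in which each prompt is delivered once and the sequence stops at the first vocal response. The function name `milieu_prompt_sequence` and the `child_responds` callback are hypothetical and are not drawn from Peterson (2004) or from any MT manual.

```python
from typing import Callable, List

# Hypothetical labels for the three prompts described in the text:
# an expectant look (time delay), a direct request (mand), and a
# demonstration of the expected vocal response (model).
PROMPT_ORDER = ("time_delay", "mand", "model")


def milieu_prompt_sequence(child_responds: Callable[[str], bool]) -> List[str]:
    """Return the prompts delivered, in order, until the child responds.

    `child_responds` is an assumed stand-in for observing the child: it
    receives the prompt label and returns True if a vocal response occurs.
    """
    delivered: List[str] = []
    for prompt in PROMPT_ORDER:
        delivered.append(prompt)
        if child_responds(prompt):
            break  # stop escalating once the child communicates
    return delivered


# Example: a child who responds only when asked directly.
print(milieu_prompt_sequence(lambda prompt: prompt == "mand"))
# ['time_delay', 'mand']
```

The sketch encodes only the ordering described above (time delay first, then the mand, then the model if no response is given). It does not represent wait intervals, reinforcement, or expansion of the child's response.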
MT has a strong research base supporting its efficacy in teaching children with
language delays new language targets (Bolzani et al., 1990). Mand-model and incidental
(Warren & Gazdag, 1990), improved spontaneous production of multiple and single
words in children who experience prenatal cocaine exposure (Bolzani et al., 2009),
increased language in children with developmental disabilities (Togram & Erbas, 2010),
and taught a photo exchange system to a child with ASD (Ogletree et al., 2012). In
language development (e.g., Kaiser et al., 1993). Therefore, research has shown that
children can benefit from teachers’ implementation of MT. However, the methods used
feedback (Kornacki et al., 2013; Ward-Horner & Sturmey, 2012). During the instruction
phase, the trainer provides information on the target intervention (usually in the form of
perform the steps of the desired skill accurately for the learner. The rehearsal portion
gives the learner the opportunity to practice the desired skill with the trainer, so that
he/she may become more comfortable performing the target behaviors. Finally, the
trainer provides the trainee with corrective feedback as he/she implements the target
BST has been used not only to train individuals to perform new behaviors or
engage in new tasks, but it has also been used to train teachers and other professional
staff to conduct a variety of interventions. For example, BST has been used to train
teachers to implement the Picture Exchange Communication System (PECS) with their
students (Homlitas et al., 2014), to train staff to implement mand training (Nigro-Bruzzi
& Sturmey, 2010), to train teachers and staff to use discrete trial teaching (DTT; Jull &
Mirenda, 2016; Sarokoff & Sturmey, 2004), to train teachers to implement specific goals
from students’ behavioral intervention plans (BIP; Madzharova & Sturmey, 2018), and to
train teachers to implement response interruption and redirection (RIRD) with students
Additionally, BST has been used to train several NDBIs in the literature,
including training teachers to implement the Natural Language Paradigm and response
chaining (NLP; Seiverling et al., 2010), training teachers to effectively use NLP
(Gianoumis et al., 2012), and training paraprofessionals to implement the system of least
prompts using an embedded teaching procedure (Toelken & Miltenberger, 2012). This is
a key study because the use of a brief, embedded teaching procedure ensured that the
staff could perform the intervention effectively while still performing their regular
classroom duties, thereby ensuring that the intervention did not interfere with their
Thus, BST has been used to train a variety of NDBI techniques with a variety of
implementers, including teachers and other professional staff. Similarly, MT has been
studies examining MT report using a variety of training techniques, but do not report
Purpose of the Studies
The purpose of the following studies is to: (1) provide a systematic review of the
current literature regarding the use of BST to train teachers and other professional staff to
techniques. First, Study 1 sought to determine whether BST has been used effectively to
train other individuals (teachers and other professionals) to implement interventions with
children ages birth to 21 through a systematic review of the current literature. It also
sought to determine the level of quality of the literature base. Finally, Study 1 sought to
determine whether there were any variables that impact the efficacy of BST training on
classroom with fidelity. This study also examined whether the fidelity of implementation
over time. The combined results of these two studies have the potential to increase the
confidence in the use of teachers and other professional staff as natural implementers
who can implement interventions with fidelity when properly trained. This has the
potential to increase dosage for students receiving interventions if they can be properly
implemented by teachers, who spend a great deal more time with students than most
other interventionists.
CHAPTER 2
1
Slane, M. M., & Lieberman-Betz, R. To be submitted to Behavioral Interventions
Abstract
Behavioral skills training (BST) is a well-researched, established set of principles that has
been used to train a variety of individuals to complete numerous behavioral tasks, skills,
strategies, and interventions (DiGennaro et al., 2018). It includes the four main
and has been used to teach individuals from a variety of backgrounds (including natural
Nigro-Bruzzi & Sturmey, 2010; Sarokoff & Sturmey, 2004). However, there has not been
a comprehensive review that has synthesized and analyzed the existing literature on the
use of BST to train staff and teachers working with children, adolescents, and young
adults to implement interventions. Therefore, the current review aimed to close this gap
teachers and other professionals to implement various interventions with children ages
birth to 21. A total of 19 studies from 17 articles were included in the review. The
SCARF protocol (Ledford et al., 2016) was utilized to rate article quality/rigor and
outcomes of studies. All studies showed positive outcomes, suggesting that teachers and
other professional staff can be effectively taught using BST to implement a variety of
interventions with fidelity. However, only seven articles were found to have sufficient
quality/rigor scores in their primary outcomes to allow for interpretation of findings with
confidence. This indicates that additional high-quality studies are needed to examine the
individuals with disabilities. Implications for future research and intervention are
discussed.
implementers, teachers
Using Behavioral Skills Training to Teach Behavioral Interventions: A Systematic
Review
that has been used to train a variety of individuals to complete numerous behavioral
tasks, skills, strategies, and interventions (DiGennaro et al., 2018). In one of the earliest
studies to examine BST, Koegel et al. (1977) trained teachers to implement several
the behavioral modification strategies and receiving live feedback from trained staff
regarding their performance; and receiving praise for correct performance, and corrective
feedback and modeling to rectify improper implementation. Using these procedures, the
authors were able to train teachers to implement behavioral modification techniques with
fidelity. Alden et al. (1978) further expanded the procedures that comprise BST, using
modeling and rehearsal as part of the initial training procedures rather than the corrective
feedback procedure. Around this time, recognition of the potential of BST to teach
important behaviors grew, and numerous studies examining its utility were published.
Several studies have utilized BST to increase knowledge and safety skills in
young children (e.g., Kolko et al., 1991; Miltenberger et al., 2009; Miltenberger &
Thiesse-Duffy, 1988; Wurtele & Owens, 1997; Wurtele, 1990), to teach children how to
find help when lost (Pan-Skadden et al., 2009), to teach safety skills aimed at preventing
sexual abuse (Wurtele et al., 1986), and to prevent child abduction (e.g., Bromberg &
Johnson, 1997; Johnson et al., 2005, 2006). BST has also been used to help promote
knowledge and prevent the risk of HIV/AIDS (e.g., Adams et al., 1992; Boyer &
Kegeles, 1991; Lawrence et al., 1995), to prevent gun play in young children (e.g., Himle
& Miltenberger, 2004; Miltenberger et al., 2004, 2005), and to encourage smoking
cessation (Glasgow & Lichtenstein, 1987). In recent years, the use of BST has further
expanded to include training teachers, staff, and parents to implement behavior analytic
BST has been used to train teachers to implement complex behavior intervention
plans in the classroom (Madzharova & Sturmey, 2018); to train staff to use mand training
with children (Nigro-Bruzzi & Sturmey, 2010); to use discrete-trial teaching (Sarokoff &
Sturmey, 2004); and to improve the use of positive reinforcement and error correction and
to increase opportunities for responding (Palmen et al., 2010). BST has also been used to
(PECS; Rosales et al., 2009). Additionally, researchers have used BST to train parents to
implement a variety of behavior analytic techniques. Parents have been taught to improve
food selectivity (Seiverling et al., 2012), promote social skills development (Hassan et
al., 2018), implement guided compliance (Miles & Wilder, 2009), and implement
Because of the extensive use of BST to train individuals to use or implement new
skills, several studies have sought to identify the most potent components of the
components of BST, including (1) instructions, (2) modeling, and (3) feedback. Although
use of instructions alone showed slight improvements in accuracy, use of modeling
drastically increased accuracy, and use of feedback led to further increases. In a follow-up
study, Ward-Horner and Sturmey (2012) found that while modeling was an important
component of BST, feedback was the most effective and necessary component of the
training package. However, Kornacki et al. (2013) found that the key component for
BST success varied by individual participants. Given the results of these studies, BST is
instruction, (2) modeling, (3) rehearsal, and (4) feedback (DiGennaro et al., 2018).
The BST literature has grown tremendously over the last 40 years, with its uses,
trainees, and contexts increasingly expanding. The field has identified the critical
based training package. The contexts in which BST can be applied are constantly
increasing, expanding both the utility and applicability of BST to numerous behavior
analytic principles and interventions. In addition, BST has evolved from a training
package to change specific target behaviors to one that can be used to train other
been a study that has synthesized and analyzed the existing literature on the use of BST to
train staff and teachers working with children, adolescents, and young adults to
implement various interventions. Such an analysis could benefit the field in several ways.
literature, including the populations with which it has been implemented, the topics of
implementation. Second, it would bring to light variables or factors that may influence
a review could aid future researchers and clinicians in determining whether BST is an
appropriate training package for training others to implement a target intervention and
highlight any significant considerations. Fourth, a review would provide a sense of the
quality of the current literature and in turn provide directions for future research to help
The purpose of the current review is to systematically synthesize and analyze the
BST-intervention literature to help guide both research and practice. This review will
support the translation of research into practice and will help guide clinical decision
making for assessing and evaluating whether BST is the appropriate training package
interventions with children, adolescents, and young adults (ages birth to 21)?
2. What is the quality of studies comprising the current literature base examining the
3. What, if any, are the pertinent intervention variables that influence the
Method
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
guidelines were used to guide decision making throughout the literature review. The
PRISMA guidelines were developed in an effort to ensure scientific rigor and structural
Article Search
for review: (1) PsycINFO, (2) Psychology and Behavioral Sciences Collection, and (3)
Education Research Complete. Search terms were entered as follows: Line 1: “behavioral
skills training;” AND Line 2: “intervention.” If available, the following options were also
published before or during December 2019, when the search was conducted. In addition
to the database search, the reference lists of all eligible studies were reviewed for relevant
The following inclusion criteria were used to identify eligible studies: (1) the
study must have been published in a peer-reviewed journal (dissertations and theses were
excluded); (2) the study must either have been written in or translated to English; (3) the
study must have been quantitative in nature, utilizing either a group or single case design;
(4) the phrase “behavioral skills training” must have appeared somewhere in the article
(not the references section alone); and (5) the study must have used BST to train teachers,
A total of 172 studies were identified through the initial online search, with an
additional 21 studies identified through other sources, for a total of 193 studies to be
screened for full inclusion criteria. After duplicates were removed, a total of 146 studies
remained. The abstracts of these studies were reviewed using the full eligibility criteria
and a total of 124 were excluded. The remaining 22 full text articles were examined to
confirm eligibility and five were excluded due to failure to meet full inclusion criteria. In
the end, a total of 19 studies from 17 articles were included in the review (see Figure 2.1
Article Coding
Descriptive Information
Studies were reviewed and coded for descriptive characteristics. Data were
collected on trainees (e.g., age and gender), intervention recipients (e.g., age, diagnoses,
intervention type, and intervention quality (setting, target behaviors, fidelity, and
effectiveness).
The methodology of all single case design (SCD) studies (all studies) was
evaluated using the Single-Case Analysis and Review Framework (SCARF; Ledford et
al., 2016). The SCARF was designed to assess study (1) rigor, (2) quality, and (3)
outcomes. Outcome scores on the SCARF of 3.0 - 4.0 or higher are consistent with
2.0 or higher are considered to be strong enough for the study results to be interpreted.
Rigor. The three quality indicators for study rigor are reliability, fidelity, and
through examining the collection, reporting, and levels of interobserver agreement (IOA)
fidelity of implementation, sufficiency of the fidelity data, the frequency of fidelity data
collection, and the use of inter-observer agreement for fidelity data. Finally, evidence of
the sufficiency of the data is determined by examining the number of data points per
condition and the overall trend of the data when switching between conditions.
Quality of measurement. The seven indicators for study quality are social and
maintenance. Evidence of the social and ecological validity of the study is determined
measures, normative comparisons for dependent variables, and the environment in which
participant information, and study inclusion criteria. Condition descriptions are evaluated
by analyzing the description of condition procedures, the dosage, the setting, and the
interventions.
The dependent variables are evaluated in terms of their operational definitions,
examples of target and non-target behaviors, and the description of the measurement
system and its use. Evidence for generalization is determined by evaluating the
behavior change, the number of times maintenance is evaluated, and the time frame
Outcomes. The three quality indicators for study outcomes involve reporting for
of these quality indicators requires the examination of the type of measurement used to
determine outcomes, the strength and evidence for treatment efficacy/effectiveness, and
Interrater Reliability
the second rater. Any study that was placed in the uncertain group and any disagreements
on inclusion were fully reviewed, discussed, and resolved by the two reviewers until a
consensus was reached. When screening a study for inclusion, raters first searched the
article for the phrase “behavioral skills training.” If the study did not include this phrase,
it was excluded from further eligibility review. If the study did include this phrase, the
raters then reviewed it to ensure it satisfied all five of the inclusion criteria described
above for inclusion in the review. IRR for inclusion/exclusion of studies was 88%.
For studies meeting criteria for full review, 30% were reviewed by a second rater
to determine IRR for article quality coding using the SCARF (n = 6). The two raters had
coding could begin. If an article’s IRR rating fell below 80%, discrepancies were
reviewed, discussed, and resolved by the two raters until a consensus was reached. IRR
was determined by dividing the number of agreements by the number of agreements plus
disagreements and multiplying by 100. Average IRR was 80%, with a range of 74% to
85%.
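The IRR calculation described above (agreements divided by agreements plus disagreements, multiplied by 100) can be illustrated with a short sketch. The function name `percent_agreement` and the item counts in the example are hypothetical and are used for illustration only; they are not data from this review.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Point-by-point agreement: agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("At least one coded item is required to compute IRR.")
    return 100 * agreements / total


# Hypothetical example: two raters agree on 40 of 50 coded items.
irr = percent_agreement(agreements=40, disagreements=10)
print(f"{irr:.0f}%")  # 80%

# Under the rule described in the text, an article with IRR below 80%
# would have its discrepancies reviewed until consensus was reached.
needs_consensus_review = irr < 80
```

In this hypothetical case the article sits exactly at the 80% criterion, so it would not be flagged for consensus review.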
Data Analysis
The following methods were used to address each of the proposed research
questions:
1. To determine whether BST was efficacious when used to train other individuals to
implement some form of intervention with children, adolescents, and young adults
(ages birth to 21), the primary, generalized, and maintenance outcome SCARF
ratings for each study were analyzed and synthesized. Outcome scores of 3.0 - 4.0
2. In order to evaluate the quality of studies in the current literature base, all SCARF
variables were used to assign overall quality/rigor ratings. Scores of 2.0 or higher
Results
Research Design
design (see Table 2.1). Eleven studies utilized a multiple baseline across participants
design (one of which was nonconcurrent). One study used a multiple baseline across
behaviors design. Of the seven studies using multiple probe designs, all were across
participants.
Participant Characteristics
As shown in Table 2.1, 11 studies examined the use of BST to train teachers as
implementers of interventions, seven studies examined the use of BST to train clinic staff
to implement intervention, and one study examined the use of BST to train a swimming
instructor to implement intervention with his/her student. Across all 19 studies, 74 natural
implementers were trained using BST. Natural implementers ranged in age from 19 to 50;
however, eight of the included studies did not report the age range for the participants
trained using BST. Implementers trained using BST had a variety of years of experience
ranging from no experience to multiple years of experience. Implementers' race/ethnicity
range of 2 to 12 years; however, two studies did not report the number of intervention
recipients (Aherne & Beaulieu, 2019; Palmen et al., 2010) and an additional three studies
did not include their age ranges (Davenport et al., 2019; Hassan et al., 2017; Hogan et al.,
2015). Child intervention recipient race/ethnicity data were not reported in any studies
included in this review. The majority of child intervention recipients had a diagnosis of
disability as the primary diagnosis; one listed multiple disabilities, including global
developmental delay; one listed multiple physical disabilities; and diagnosis was not
reported in three studies (Aherne & Beaulieu, 2019; Davenport et al., 2019; Hogan et al.,
2015).
(n = 1), a clinic (n = 1), a community pool (n = 1), a home (n = 1), and a treatment center
for individuals with ASD (n = 1). BST was used to target a variety of intervention skills,
including discrete trial teaching (DTT; n = 4), fidelity of behavior intervention plan (BIP)
implementation (n = 2), the natural language paradigm (NLP; n = 2), reading racetrack
intervention (n = 1), incidental teaching (n = 1), and others. Sixty-eight percent of studies
(n = 13) also reported outcomes for child recipients, such as unprompted functional
communications, number of sight words read correctly, percentage of correct responses,
vocalizations, and stereotypy (see Table 2.2). All 19 studies reported using all four BST
All 19 studies reported positive effects of BST, with BST effectively improving
teachers’ and staff members’ implementation of the trained intervention. Such effects included
percentage of correct responses for discrete trial teaching (DTT), percentage of Reading
implementing response interruption and redirection (RIRD) for stereotypy. See Table 2.2
for a complete list of the foci of BST and recipient outcomes measured. Interestingly,
Aherne and Beaulieu (2019) reported the use of a self-evaluation procedure to help with
described by Nabeyama and Sturmey (2010), which together suggest that BST can be
combined general-case training with BST in order to improve NLP and response chaining
performance. Finally, Chazin et al. (2018) found that BST effectively improved staff
implementation of student behavior intervention plans when training was combined with
coaching but not with training alone. Thus, while the majority of studies demonstrated
that BST alone was effective at improving teacher and staff fidelity of implementation of
various interventions, several studies noted the inclusion of other training components as
well.
Study Rigor
All 19 studies reported using all four BST intervention components (instruction,
modeling, rehearsal, and feedback). Thus, all key components were included for BST as
it was implemented in the studies. Ledford et al. (2016) describe study rigor as reliability
of the dependent variable, procedural fidelity, and the data itself (see Table 2.3 for coding
details). Overall, 100% of studies reported dependent variable reliability data (n = 19);
84.2% reported collecting reliability data in both primary comparison conditions for at
least 20% of sessions overall and reported greater than 80% agreement. Thus, the studies
in this review had mostly strong reliability data. In contrast, only 42.1% (n = 8) of studies
reported collecting fidelity data for the independent variable; all eight of these studies
(100%) reported fidelity of 80% or higher. Thus, the studies
that did report procedural fidelity reported good procedural fidelity with high ratings.
Even so, fewer than half the studies in this review included fidelity data and only three
reported collecting such data in both primary conditions. This lack of fidelity data across
over half the studies calls into question whether the interventions were implemented as
intended, and thus impacts the confidence with which we can attribute positive findings
to the BST training. Regarding sufficiency of data, 68.4% (n = 13) of studies had at least
three data points per condition and 84.2% (n = 16) had enough data to infer a functional
relation. However, studies that did not satisfy these criteria permit only limited
inferences about a functional relation between the independent and dependent variables. A
greater number of data points allows patterns, variability, and trends in the data to be
detected, whereas only two data points can, for example, suggest a trend in one direction
or another that may be highly misleading.
Study Quality
generalization (see Table 2.4 for coding details). When describing participants, only
of participants, ages, gender, etc. In fact, only one study included race/ethnicity in its
description of participants, but only for implementers and not for intervention recipients.
This was a relative weakness for the studies included in this review, making it difficult to
determine for whom BST may be an appropriate and effective intervention. Conditions
for both primary comparison conditions were adequately described in 84.2% (n = 16) of
studies, demonstrating a relative strength for this group of studies. Similarly, authors
definitions) in 78.9% (n = 15) of studies, demonstrating yet another relative strength for
the studies included in this review. Of note, failure to include operational definitions is a
serious problem for reporting measurement of the dependent variable; it not only
makes the study difficult to replicate but also makes it difficult to determine specific
Therefore, this is a relative strength of the studies included in this review. A total of
47.4% of studies (n = 9) assessed generalization in some form. One study assessed
and five studies assessed generalization across responses. Of note, one study assessed
generalization across both responses and individuals (Sarokoff & Sturmey, 2008). Thus,
fewer than half of the studies in this review assessed generalization in any form, indicating a
weakness for this group of studies. Similarly, only 42.1% of studies (n = 8) assessed
maintenance of outcomes. With fewer than half of the studies having assessed maintenance,
Primary Outcomes
Study rigor, quality, and outcome scores are provided in Table 2.5. For a
scores of 2.0 or higher are considered to be strong enough for the study results to be
interpreted. In addition, outcome scores of 3.0 - 4.0 or higher are consistent with
in Table 2.5, based on outcome data for primary outcomes, all but two studies
demonstrated an outcome score of 3.0 or higher. However, when combined with their
overall quality/rigor scores, only seven studies (36.8%) had sufficient overall
quality/rigor and outcome scores to infer a functional relation with confidence (Chazin et
al., 2018; Davenport et al., 2019; Fetherston & Sturmey, 2014 [Study 1]; Homlitas et al.,
2014; Nabeyama & Sturmey, 2010; Sarokoff & Sturmey, 2008; Seiverling et al., 2010),
while 63.2% of studies (n = 12) had scores indicating low quality evidence of positive
effects (see Figure 2.2). Studies with high enough overall quality/rigor and outcomes
scores on the SCARF to be considered rigorous and high-quality examined the use of
BST to train fidelity of implementation of behavior intervention plans (BIP; Chazin et al.,
2018), reading racetrack intervention (Davenport et al., 2019), the picture exchange
communication system (PECS; Homlitas et al., 2014), discrete trial teaching (DTT;
Fetherston & Sturmey, 2014 [Study 1]; Sarokoff & Sturmey, 2008), the natural language
paradigm (NLP) and response chaining (Seiverling et al., 2010), and guarding procedures
than 0 in Table 2.5). Of those studies that assessed generalization, 88.9% (n = 8) had
sufficient overall quality/rigor and outcome ratings for their generalization findings to be
interpreted with confidence (Fetherston & Sturmey, 2014 [all three studies]; Gianoumis
et al., 2012; Nabeyama & Sturmey, 2010; Nigro-Bruzzi & Sturmey, 2010; Palmen et al.,
2010; Sarokoff & Sturmey, 2008). Generalization was assessed across responses (n = 4;
Fetherston & Sturmey, 2014 [all three experiments]; Palmen et al., 2010), individuals (n
= 3; Gianoumis et al., 2012; Nabeyama & Sturmey, 2010; Sarokoff & Sturmey, 2008),
and contexts (n = 1; Nigro-Bruzzi & Sturmey, 2010). For seven out of eight studies,
generalization was measured in the context of the study design, and for one study it was
measured pre- and post-intervention. Finally, all but one study received a rating indicating
that consistent, positive effects were shown via the context of the measurement design.
The remaining study received a rating indicating inconsistent or weak positive
generalization effects. According to Figure 2.3, which shows overall
quality and rigor of generalization measurement and generalized outcomes 88.9% (n = 8)
effects and 11.1% of studies (n = 1) that measured generalization had low quality
evidence of positive effects. Overall, these studies suggest that BST can generalize across
a variety of parameters.
than 0 in Table 2.5). Of those studies that assessed maintenance, 87.5% (n = 7) had
sufficient overall quality/rigor and outcome ratings for their maintenance findings to be
interpreted with confidence (see Figure 2.4; Aherne & Beaulieu, 2019; Davenport et al.,
2019; Hassan et al., 2017; Homlitas et al., 2014; Jimenez-Gomez et al., 2019; Nabeyama
& Sturmey, 2010; Palmen et al., 2010). Four out of seven studies received scores
indicating that maintenance data were collected at least one week but less than one month
after intervention was completed. The remaining three studies collected maintenance data
one or more months after the completion of intervention. Finally, all seven studies
showed maintenance data similar to intervention or criterion, and five of the seven
measuring maintenance had scores indicating low quality evidence of positive maintained
effects (see Figure 2.4). Overall, these studies indicate that skills learned from BST can
In sum, findings indicate that seven studies had both strong measurement
characteristics and primary outcomes that can be interpreted with confidence. Eight
seven studies indicated quality maintenance measurement and outcomes.
Summative Quality
manner that allowed for the interpretation of their results with confidence, only a few
studies demonstrated sufficient overall quality/rigor and outcomes in multiple areas. The
only study to demonstrate sufficient quality/rigor and outcomes across primary outcomes,
generalization, and maintenance was Nabeyama and Sturmey (2010). Therefore, this
study was the most rigorous and comprehensive in terms of SCARF ratings and protocols
indicating high levels of quality in all three areas assessed. Two studies demonstrated
high quality/rigor in the areas of primary outcomes and generalization (Fetherston &
Sturmey, 2014; Sarokoff & Sturmey, 2008). Of note, this was only demonstrated in Study
1 of Fetherston and Sturmey (2014). Maintenance was not assessed in either of these
studies, so quality and outcome indicators for that area could not be determined.
Similarly, two studies demonstrated high quality/rigor in the areas of primary outcomes
and maintenance (Davenport et al., 2019; Homlitas et al., 2014). However, generalization
was not assessed in either of these studies, so quality and outcome indicators for that area
Discussion
The current review evaluated 19 studies from 17 articles examining the effects of
professional staff) with children ages birth to 21. These studies were coded and analyzed
using the SCARF coding procedure (Ledford et al., 2016). All studies utilized a multiple
baseline or multiple probe design. The majority of interventionist-recipient relationships
were teacher-child, and the majority of interventions took place in the classroom or
school environment. All studies reviewed utilized all of the BST training components, but
performance. The reviewed studies had several strengths in their reporting, including
comparison conditions, and sufficiency of data. However, there were also several
weaknesses in their data reporting, including BST fidelity reporting and participant
descriptive characteristics.
Regarding SCARF scoring, less than half of the studies produced high enough
overall quality/rigor and outcome ratings for their primary outcome data to be interpreted
maintenance were assessed in less than half of all studies. However, the majority of these
ratings in overall quality/rigor and outcomes across all three areas assessed: primary
This review demonstrated that although previous research has shown that BST
certain number of included studies were found to be of sufficient overall rigor/quality and
outcomes to infer a functional relation with confidence. These included studies that
racetrack intervention, the picture exchange communication system (PECS), discrete trial
teaching (DTT), the natural language paradigm (NLP), and guarding procedures for
patients with ambulatory difficulties. The studies that implemented these intervention
techniques were found to be of higher quality among the studies included in this review.
However, that is not to say that all included studies that implemented these interventions
Implications
This review has shown that although BST has an expansive literature base, not all
BST studies have the same level of quality. Consumers of the
BST literature should therefore be discerning in their review of existing and future BST studies.
Similarly, this is important for future researchers to help them ensure that their research
addition, although generalization and maintenance were included in several studies, less
than half of the studies reviewed incorporated these measures. These are important
components of BST research and future research should consider incorporating these
The identified studies had several strengths in their reporting. However, some
weaknesses in their data reporting were also noted, including reporting BST fidelity and
participant descriptive characteristics. This indicates a need for future research that
incorporates these elements into study design and reports on them more thoroughly in the
body of the text. Without intervention fidelity data, it is difficult to determine whether
BST was implemented as planned and thus whether it is responsible for changes observed
in the data. Similarly, without appropriate participant descriptions, it will be difficult for
individuals for whom intervention may be effective. These two limitations highlight the
need for future research in which BST fidelity is measured and sufficient participant
Every study included in this review reported some form of positive effects of BST
this review has shown that several studies with well-established overall rigor/quality and
outcomes demonstrated that teachers and/or staff members can be effectively trained to
implement various interventions using the BST package. This is important because
teachers spend a great deal of time with children and thus, if taught to use various
interventions that target children’s needs, have the potential to increase the dosage of
intervention substantially. Rather than seeing an interventionist once or twice a week for
an hour (or even daily for an hour), children could have the opportunity to receive
intervention daily for several hours a day while at school. This would likely lead to faster
gains and improvements in skills. However, it would also be important to make using the
intervention in the context of the classroom feasible for teachers. Therefore, future
This review has also identified particular interventions that are more likely to
work well with BST based on high quality SCARF coding. These include behavior
intervention plans (BIP), the reading racetrack intervention, the picture exchange
communication system (PECS), discrete trial teaching (DTT), the natural language
paradigm (NLP), and guarding procedures for patients with ambulatory difficulties. This
finding is important for two main reasons. One, it helps to strengthen the existing BST
literature by highlighting interventions that are likely to be successful due to high quality
research. Two, it encourages future research to expand on those intervention areas that
did not receive high SCARF overall quality/rigor and outcome scores such as incidental
teaching, activity schedules, cognitive behavioral therapy (CBT), and parent-child
interaction therapy (PCIT)-like verbal behaviors. Of note, some of these interventions
alone may have a strong literature base, but this review focused only on those studies that
of participants that have benefitted from this type of BST training. Teachers and staff
members ranging in age from 19 to 50 with a wide range of experience and backgrounds
were included in the studies. Thus, implementers between the ages of 19 and 50 with
varying years of experience may be likely to benefit from BST coaching. Similarly, intervention
recipients ranged in age from 2 to 12 years, and the majority had a diagnosis of ASD.
Thus, children who are between the ages of 2 and 12 and who have a diagnosis of ASD
may be likely to benefit from intervention from natural implementers who have had BST
training. However, some studies did not report participant demographic information and
Limitations
The current review has several limitations worthy of note. First, the review is not
comprehensive of all studies involving training natural implementers with BST. Parents,
peers, and other natural implementers were excluded from the study due to a focus on
teachers and other professional staff members. Although important contributions to the
BST research, studies involving parents and other natural implementers are beyond the
scope of this review. Future reviews should focus on these other natural implementers.
Second, the method of score calculation for SCARF scores is not always inclusive
of all study information. For example, depending on whether the coder answers “yes” or
“no” to some questions, the remaining questions in that section must be coded “NA.”
However, these questions are not always mutually exclusive, and questions that could
otherwise be answered “yes” and earn more points must be
coded “NA,” lowering the overall score for the study. Thus, the rules of coding
actually lead to a decrease in a study’s score rather than the lack of such an element in the
Conclusion
The current review demonstrated that BST can be successfully used to train
certain studies were ranked as having more rigor/quality and thus more confidence in
their outcomes according to SCARF. These seven studies involved training the following
intervention, the picture exchange communication system (PECS), discrete trial teaching
(DTT), the natural language paradigm (NLP), and guarding procedures for patients with
ambulatory difficulties. However, all studies, even those that received lower SCARF
training. The current review has shown that teachers and other professional staff can be
effectively taught various intervention techniques using BST and can then use those
intervention techniques with fidelity, increasing the potential intervention dosage for
recipients.
Table 2.1
Participant Demographics
Study | Implementer/Recipient Relationship | Implementer N (age range) | Implementer Gender (%M) | Recipient N (age range) | Recipient Gender (%M) | Recipient Diagnoses
Giles (2018) | Teacher/Student | 3 (26-33) | NR | 3 (6-12) | 100% | ASD - Motor Stereotypy
Hassan (2017) | Staff/Client | 7 (22-32) | 29% | 7 (NR) | NR | ASD
Hogan (2015) | Staff/Client | 4 (25-34) | 0% | 2 (NR) | NR | NR
Homlitas (2014) | Teacher/Student | 3 (NR) | NR | 9 (2-7) | NR | ASD
Jimenez-Gomez (2019) | Staff/Client | 5 (NR) | NR | 3 (2-4) | 100% | ASD
Jull (2016) | Swimming Instructor/Child | 6 (19-30) | 17% | 8 (5-8) | 88% | ASD
Nabeyama (2010) | Staff/Client | 3 (21-24) | NR | 3 (7-8) | 100% | Multiple physical disabilities
Nigro-Bruzzi (2010) | Staff/Client | 6 (NR) | NR | 6 (2-6) | NR | ASD
Palmen (2010) | Staff/Client | 4 (41-50) | 50% | NR | NR | ASD
Sarokoff (2004) | Teacher/Student | 3 (NR) | NR | 1 (3) | NR | ASD
Sarokoff (2008) | Teacher/Student | 3 (NR) | 0% | 5 (M = 5) | 100% | ASD
Seiverling (2010) | Teacher/Student | 3 (23-42) | NR | 3 (3-4) | NR | ASD
Note. Studies identified by last name of first author and year. NR = not reported; ASD = autism spectrum disorder; GDD = global
developmental delay.
Table 2.2
Study Outcomes
Study | Study Design | Study Setting | BST Focus | Trained Intervention Outcomes | Brief Results
Aherne (2019) | MB-P | Home/Training Center | DTT | NR | BST was effective in teaching DTT; outcomes were not maintained for 2/3 participants; a self-evaluation procedure helped with maintenance
Chazin (2018): Study 1 | MP-P | School | BIP implementation | Unprompted functional communication | BST improved staff implementation of student BIPs when training was combined with coaching but not with training alone
Davenport (2019) | MP-P | School | Reading Racetrack intervention | Number of sight words read correctly | BST taught teachers to implement the reading racetrack intervention
Fetherston (2014) Study 1 | MP-P | School | DTT | Percentage of correct responses by learners | BST was effectively used to train DTT
Fetherston (2014) Study 2 | MP-P | School | Incidental teaching | Percentage of correct responses by learners | BST was effectively used to train incidental teaching
Fetherston (2014) Study 3 | MP-P | School | Activity Schedules | Percentage of correct responses by learners | BST was effectively used to train staff to implement activity schedules
Gianoumis (2012) | MB-P | School | NLP | Child vocalizations and maladaptive behavior | BST was effectively used to train teachers to implement NLP
Giles (2018) | MB-P | School | RIRD | Stereotypy | Teachers were able to implement RIRD after BST
Hassan (2017) | MP-P | Clinic/Training Center | CBT intervention | NR | Staff improved CBT intervention over self-study alone
Hogan (2015) | MB-P | School | BIPs | NR | BST improved implementation of student BIPs
Homlitas (2014) | MB-P | School | PECS | NR | BST was used to train teachers to implement Phases 1, 2, and 3A of PECS
Jimenez-Gomez (2019) | MP-P | Clinic | PCIT type verbal behaviors | NR | BST was used to train staff to implement PCIT like verbal behaviors
Sarokoff (2008) | MB-P | School | To use BST to train staff to implement DTT | Correctly identifying target sight words | BST was used to effectively train staff to implement DTT
Seiverling (2010) | MB-P | School | NLP and response chaining | Emission of vocal chains | BST and GCT were effectively used to train NLP and response chaining
Note. Studies identified by last name of first author and year. MB-P = multiple baseline across participants; MB-B = multiple baseline
across behaviors; NC MB-P = nonconcurrent multiple baseline across participants; MP-P = multiple probe across participants; DTT =
discrete trial teaching; IMRF = instructions, modeling, rehearsal, and feedback; BST = behavioral skills training; NLP = natural
language paradigm; RIRD = response interruption/re-direction; CBT = cognitive behavioral therapy; BIP = behavior intervention plan;
PECS = picture exchange communication system; PCIT = parent child interaction therapy; ASD = autism spectrum disorder; GCT =
general-case training.
Table 2.3
Rigor Coding Questions from the Single Case Analysis Review and Framework (SCARF)
Criteria

Dependent Variable Reliability
1. Do authors report dependent variable reliability data?
2. Do authors report collection of agreement data in both primary comparison conditions and for at least 20% of sessions overall?
3. Are dependent variable reliability data (e.g., IOA data) calculated on a point-point basis, and is agreement higher than 80% (or higher than 0.60 Kappa) in each primary comparison condition?
4. Was agreement data collected by observers who were blind to study conditions and/or purpose?

Sufficiency of Data
1. Do at least three data points exist in each primary comparison condition?
2. Is the design a multiple baseline or multiple probe design?
3. Did data collection begin simultaneously during initial baseline or probe conditions?
4. Are more data points needed in any primary comparison condition due to (a) within-condition variability, (b) within-condition changes in level or trend, or (c) potential covariation between tiers in a multi-tier design?
5. Do at least four data points exist in each primary comparison condition, or in conditions with only three data points is one of the following true: all points at baseline or ceiling levels, data reached a criterion level, or no overlap with adjacent conditions is present?
6. Do at least five data points exist in each primary comparison condition, or in conditions with only three data points is one of the following true: all points at baseline or ceiling levels, data reached a criterion level, or no overlap with adjacent conditions is present?
Note. IOA = interobserver agreement.
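The reliability criterion in Table 2.3 (point-by-point agreement higher than 80%, or Kappa higher than 0.60) can be illustrated with a short calculation. The following is a minimal sketch, not part of the SCARF materials: the function names and the two observer records are hypothetical, and Kappa is computed with the standard Cohen's formula, (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def point_by_point_agreement(obs_a, obs_b):
    """Percentage of trials/intervals on which two observers recorded the same code."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observer records must be non-empty and equal in length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * agreements / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observer records must be non-empty and equal in length")
    n = len(obs_a)
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement: product of each observer's marginal proportion per code.
    p_expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both observers used a single identical code throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical records: 1 = step implemented correctly, 0 = incorrect.
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

agreement = point_by_point_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
meets_criterion = agreement > 80.0 or kappa > 0.60
print(f"agreement = {agreement:.1f}%, kappa = {kappa:.2f}, meets criterion: {meets_criterion}")
```

In this hypothetical record the observers agree on 8 of 10 trials, which is exactly 80% and therefore does not exceed the 80% threshold; the chance-corrected Kappa is lower still, which shows why the two indices can lead to different conclusions.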
Table 2.4
Quality & Breadth of Measurement Coding Questions from the Single Case Analysis
Review and Framework (SCARF)
Participant Descriptions
1. Do authors report demographic information, including age and diagnosis or eligibility category, for all participants?
2. Do authors report formal test results (e.g., IQ, language competence, achievement)?
3. Do authors report general information about participants (e.g., educational placement, problem behaviors, functional repertoire of behaviors, areas of strength and weakness)?
4. Do authors report inclusion criteria or pre-intervention behaviors for all participants?

Condition Descriptions
1. Are procedures for both primary comparison conditions adequately described?
2. Is dosage adequately described?
3. Is setting described for both primary comparison conditions general (i.e., if relevant: location, individuals in environment, physical characteristics)?
4. Are implementers adequately described in terms of training and demographic characteristics? If indigenous implementers are used, "yes" on this question requires authors to report (a) how implementers were trained, and (b) evidence that the training was completed as described (e.g., implementation fidelity).

Dependent Variable Descriptions
1. Do authors describe observable characteristics of dependent variables (e.g., operational definitions)?
2. Do authors provide examples and/or non-examples of target behaviors?
3. Do authors adequately describe measurement system? (e.g., counts, duration, 5-s partial interval system, 15-s momentary time sampling)
4. Do authors describe how system was used? (e.g., Were data collected by implementers or another individual? Were data collected in-vivo, via audio or video?)

Generalization Measurement 1
1. Do authors report assessment of a target behavior performed in a context that is different than training/primary outcome measurement?
2. Do authors report assessment of a target behavior performed with materials that are separate from those used in training/primary measurement context?
3. Do authors report assessment of a target behavior performed with a different social partner than those used in training/primary measurement context?
4. How do authors measure generalization across materials, social partners, or settings?

Generalization Measures 2
1. Do authors measure a behavior that is a generalized tendency, in addition to the primary outcome of interest?
2. Do authors teach one specific behavior or type of behavior, but measure a different specific (not generalized) behavior as a measure of generalization (response generalization)?
3. How do authors measure generalized behavior? (e.g., either by measuring a generalized tendency or generalization of an explicitly taught behavior).

Maintenance Measurement
1. Do authors report evidence of continued behavior change, during post-intervention sessions?
2. Is this maintenance measured on more than one occasion?
3. When is maintenance measured?
Table 2.5
Study | Rigor | Quality | Overall Rigor/Quality | Primary Outcome | Generalization Quality/Rigor | Generalization Outcome | Maintenance Quality/Rigor | Maintenance Outcome
Hogan (2015) | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0
Homlitas (2014) | 2.7 | 1.9 | 2.4 | 5.0 | 0.0 | 1.0 | 4.0 | 5.0
Jimenez-Gomez (2019) | 0.7 | 1.7 | 1.0 | 4.6 | 1.0 | 3.6 | 3.0 | 3.6
Jull (2016) | 0.7 | 1.9 | 1.1 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0
Nabeyama (2010) | 2.3 | 2.6 | 2.4 | 5.0 | 4.0 | 5.0 | 4.0 | 5.0
Nigro-Bruzzi (2010) | 2 | 1.7 | 1.9 | 2.5 | 3 | 4.5 | 0 | 0.5
Palmen (2010) | 0.7 | 2.6 | 1.3 | 2.0 | 3.0 | 3.0 | 3.0 | 5.0
Sarokoff (2004) | 2.3 | 1.0 | 1.9 | 5.0 | 0.0 | 1.0 | 0.0 | 1.0
Sarokoff (2008) | 2.3 | 2.4 | 2.4 | 4.7 | 4 | 4.7 | 0 | 0.7
Seiverling (2010) | 3.0 | 1.9 | 2.6 | 4.9 | 0.0 | 0.9 | 0.0 | 0.9
Note. Studies identified by last name of first author and year. SCARF = Single Case Analysis Review and Framework.
Figure 2.1.
Preferred Reporting for Systematic Reviews and Meta Analyses (PRISMA) Flow Diagram
[Flow diagram. Stage labels: Identification; Included. Box counts: (n = 172); (n = 21). Studies included in qualitative synthesis (n = 19 in 17 articles).]
Note. This figure shows the number of studies that were considered for eligibility in the
review and how they were reviewed and analyzed to obtain the final number of studies.
Figure 2.2
[Scatterplot. X-axis: Overall Study Quality & Rigor. Y-axis: Primary Outcomes. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]
Note. SCARF = Single Case Analysis Review and Framework. Filled-in circle data
points represent individual studies. The graph is interpreted as follows: Data points that
fall in quadrant one indicate low quality evidence of positive effects. Data points that fall
in quadrant two indicate high quality evidence of positive effects. Data points that fall in
quadrant three indicate low quality evidence of negative or minimal effects. Finally, data
points that fall in quadrant four indicate high quality evidence of negative or minimal
effects. In sum, the highest quality studies with the best outcomes fall in quadrant two.
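The quadrant interpretation described in this note can be sketched as a simple classification rule. This is an illustration only, under stated assumptions: the function name and both cutoffs are hypothetical. The text treats overall quality/rigor scores of 2.0 or higher as strong enough to interpret, so 2.0 is used on that axis; the outcome cutoff of 2.0 is an assumption, since the exact boundary used in the figure is not stated here. The score pairs are taken from Table 2.5.

```python
def scarf_quadrant(quality_rigor, outcome, quality_cutoff=2.0, outcome_cutoff=2.0):
    """Return the quadrant number (1-4) described in the figure note.

    Both cutoffs are assumptions for illustration, not values confirmed
    by the SCARF materials.
    """
    high_quality = quality_rigor >= quality_cutoff
    positive_effect = outcome >= outcome_cutoff
    if positive_effect:
        # Quadrant 2: high quality evidence of positive effects.
        # Quadrant 1: low quality evidence of positive effects.
        return 2 if high_quality else 1
    # Quadrant 4: high quality evidence of negative or minimal effects.
    # Quadrant 3: low quality evidence of negative or minimal effects.
    return 4 if high_quality else 3


# (overall rigor/quality, primary outcome) pairs copied from Table 2.5.
studies = {
    "Hogan (2015)": (1.0, 3.0),
    "Homlitas (2014)": (2.4, 5.0),
    "Nabeyama (2010)": (2.4, 5.0),
    "Palmen (2010)": (1.3, 2.0),
    "Seiverling (2010)": (2.6, 4.9),
}

for name, (quality_rigor, outcome) in studies.items():
    print(name, "-> quadrant", scarf_quadrant(quality_rigor, outcome))
```

Because the cutoffs are passed as parameters, the same rule can be rerun with a stricter outcome threshold (for example, 3.0) to see how sensitive a study's placement is to where the boundary is drawn.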
Figure 2.3
[Scatterplot. X-axis: Quality & Rigor of Generalization Measurement. Y-axis: Generalized Outcomes. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]
Note. SCARF = Single Case Analysis Review and Framework. Only studies that
measured generalization are included. Filled-in circle data points represent individual
studies. The graph is interpreted as follows. Data points that fall in quadrant one indicate
low quality evidence of positive effects. Data points that fall in quadrant two indicate
high quality evidence of positive effects. Data points that fall in quadrant three indicate
low quality evidence of negative or minimal effects. Finally, data points that fall in
quadrant four indicate high quality evidence of negative or minimal effects. In sum, the
highest quality studies with the best outcomes fall in quadrant two.
Figure 2.4
[Scatterplot. X-axis: Quality & Rigor of Maintenance Measurement. Y-axis: Maintained Outcomes. Quadrants are numbered 1 (upper left), 2 (upper right), 3 (lower left), and 4 (lower right).]
Note. SCARF = Single Case Analysis Review and Framework. Only studies that
measured maintenance are included. Filled-in circle data points represent individual
studies. The graph is interpreted as follows. Data points that fall in quadrant one indicate
low quality evidence of positive effects. Data points that fall in quadrant two indicate
high quality evidence of positive effects. Data points that fall in quadrant three indicate
low quality evidence of negative or minimal effects. Finally, data points that fall in
quadrant four indicate high quality evidence of negative or minimal effects. In sum, the
highest quality studies with the best outcomes fall in quadrant two.
CHAPTER 3
TECHNIQUES¹
¹ Slane, M. M., & Lieberman-Betz, R. To be submitted to Behavioral Interventions.
Abstract
Behavioral skills training (BST) is a common set of four core principles that are used to
researched and has a strong literature base (DiGennaro et al., 2018). However, BST has
Interventions (NDBIs). NDBIs are a class of interventions that combine both principles
children with disabilities from more basic developmental skills such as joint attention and
eye contact to more complex skills such as language and social interaction (Schreibman et
al., 2015). One NDBI that has led to improvements in children’s early language
development is Milieu Teaching (MT; e.g., Bolzani et al., 2009; Warren & Gazdag,
1990). The present study sought to investigate the utility of BST in the training of MT
techniques. Two teachers were trained to implement three MT techniques with children
who were minimally verbal: following the child’s lead, teaching social routines, and the
system of least prompts (Fey, 2008). A concurrent multiple baseline across behaviors
replicated across teacher participants was used. One teacher showed an increase in
fidelity of implementation of all three techniques when BST was introduced, effectively
fidelity of implementation for the first two techniques with limited evidence of a
functional relation for the third technique. These improvements in skill also generalized
to a new set of toys and materials and maintained over time. Thus, the present study
showed that BST can be effectively used to train natural implementers to carry out MT
teachers
Teaching the Teacher: Using Behavioral Skills Training to Train Teachers to
estimates ranging from 7% to 17% (King et al., 2005). In addition, language impairments
spectrum disorder (ASD; Rosenbaum & Simon, 2016). These delays have been
associated with lower performance in cognitive, academic, and language domains in later
development (Johnson et al., 1999). Thus, the importance of early intervention for
methods of intervention delivery, for example pull out sessions with speech/language
pathologists (SLPs) in schools, only allow for sessions to occur once or twice a week for
about an hour, which may not be enough to allow for children with language delays to
close the gap and catch up to their typically developing peers. This has led to a surge in
(2004) points out that teachers spend a great deal more time with children than traditional
interventions, the dosage for these interventions has the potential to increase dramatically
with sessions happening multiple times a week, for multiple hours a day. However, in
order for this to occur, teachers must be properly trained in the various intervention
techniques so that they are able to implement these techniques with fidelity.
Behavioral Skills Training
teach new skills and techniques to a variety of individuals (Kornacki et al., 2013;
Ward-Horner & Sturmey, 2012). BST comprises four primary components: (1) instruction,
(2) modeling, (3) rehearsal, and (4) feedback. During the first step, learners are given
instructions regarding how to perform the desired skill or behavior. Next, the therapist or
researcher models accurate completion of the target behavior. Then, the trainee and the
trainer practice with one another via role play (rehearsal). Finally, the trainer watches the
trainee implement the learned skill in the target environment and provides corrective
feedback (Krumhus & Malott, 1980; Nuernberger et al., 2013). Although all components
of BST have been established as important for proper implementation, modeling and
feedback have been identified as the critical components that increase the fidelity of
implementation of new skills (Krumhus & Malott, 1980; Ward-Horner & Sturmey,
2012).
BST has been used across a variety of settings and with a wide variety of
individuals. Studies have included using BST to increase on-task behavior for high-
functioning young adults with autism spectrum disorder (ASD; Palmen & Didden, 2012),
(Iwata et al., 2000), and to train staff to implement Phases 1-3 of the Picture Exchange
Communication System (PECS; Homlitas et al., 2014). BST has also been used to teach
parents and caregivers of children with ASD to implement social skills training (Dogan et
al., 2017; Hassan et al., 2018), improve staff implementation of mand training and
subsequent unprompted mands in children (Nigro-Bruzzi & Sturmey, 2010), train
community staff and teachers to implement discrete trial training (Jull & Mirenda, 2016;
Sarokoff & Sturmey, 2004), train teachers to implement specific behavioral intervention
plan goals (Madzharova & Sturmey, 2018), and train teachers to implement response
interruption and redirection (Giles et al., 2018). Thus, BST is a well-established training
have only recently begun to examine the use of BST to train professionals to implement
interventions are diverse in form and target multiple developmental domains, including
social, language, play, motor, and cognition (Schreibman et al., 2015). They also
typically target young children with social-communication disorders, such as those with
ASD (Schreibman et al., 2015). Schreibman et al. (2015) identified several common
features of NDBIs, including (1) three-part contingency; (2) manualized practice; (3)
teaching episodes; (6) environmental arrangement; (7) use of prompting and prompt
fading; (8) modeling; (9) adult imitation of the child’s language, play, or body
movements; and (10) broadening the attentional focus of the child. In a review of NDBIs
identified seven components that were common across the majority of studies reviewed:
(1) following the child’s lead, (2) prompting, (3) natural consequences (i.e., outcomes
that logically result from a behavior, such as providing a child with a desired object
immediately after he/she requests it), (4) instruction embedded in routines, (5)
environmental arrangement, (6) time delay, and (7) linguistic mapping. Thus, while there
are a multitude of NDBIs, there is also a set of common core features of NDBIs that
training techniques, such as verbal explanation, modeling, and coaching; didactic training
sessions; and presentations. However, the degree to which these training techniques are
used and the exact nature of their use is not always clear. Previous studies have focused
Mccathren, 2000), manualized interventions targeting joint attention (Kaale et al., 2012),
enhanced milieu teaching (Olive et al., 2007), naturalistic language teaching (Smith &
Camarata, 1999), and symbolic play and joint attention (Wong, 2013). Despite the
growing number of NDBIs that have been implemented in the classroom by trained
teachers, there is a continued need to examine well-defined procedures that can be used
clearly lays out the process for training individuals, creating a more uniform standard for
training.
In an early study integrating BST and NDBIs, Seiverling et al. (2010) used a
multiple baseline across participants design to examine the use of BST to train preschool
teachers to use the Natural Language Paradigm (NLP) in their classrooms. After BST,
teachers successfully implemented NLP and response chaining with children with ASD
replicated these results, using a multiple baseline across participants design to show that
BST could be used to train preschool teachers to effectively use NLP with 3- to 4-year-old
children diagnosed with ASD. In yet another study designed to carve out a role for BST
use of BST to train paraprofessionals to implement the system of least prompts using an
embedded teaching procedure with two children with ASD (ages 4 and 5 years old). The
use of a brief, embedded teaching procedure allowed the staff to implement the behaviors
without interfering with their regular duties or causing an undue burden. Because the
evidence for the use of BST to train teachers, caregivers, and other professionals to
to continue to examine its use to train teachers to use well-established intervention
Milieu Teaching
composed of three main intervention strategies: (1) prelinguistic milieu teaching, (2)
milieu teaching, and (3) focused stimulation. The most appropriate technique is chosen
to these three intervention strategies, MCT utilizes several core techniques throughout all
three intervention strategies. These include following the child’s lead (FTCL), teaching
social routines (TSR), and setting up the environment (Fey, 2008). The MT intervention
technique is selected once a child is producing a minimum of 5 words, with the outcome
goal of furthering the development of the child’s language (Fey, 2008). In MT, FTCL,
TSR, and setting up the environment are combined with time delay and a mand-model
prompting hierarchy and several other techniques to develop and elicit communication
from the child. In the present study, the focus was on three core techniques believed to be
most amenable to the classroom environment and that fit participants’ current language
FTCL is a technique that involves allowing the child to direct and lead the
interactions while the adult follows along with the child (Fey, 2008). In order to maintain
the children’s interest during intervention sessions, interventionists are taught to follow
children’s attentional leads and to focus on objects and routines of interest to the children
(Fey et al., 2017; Fey, 2008). Given that young children, especially those with social
communication delays, tend to attend to and focus more on objects they find interesting,
FTCL involves allowing the child to direct the interaction and to select the play materials
(Fey et al., 2017). This practice helps to ensure that children remain interested and
engaged during interactions with adults. Adults often try to direct children’s play by
making suggestions, asking questions, or issuing commands. However, if communicative
acts are to be encouraged, then the adult must learn to engage in play without utilizing
these behaviors and dominating the interaction. In essence, the adult must learn to be the
Social routines involve a collection of events that repeatedly occur in the same
pattern; they are created when a particular manner of playing or interacting occurs in the
same sequence repeatedly (Fey, 2008). Using identified routines, adults learn to create
opportunities for shared interaction. Rather than simply imitating the child’s play and
engaging in parallel play as in FTCL, adults serve as active participants by inserting turns
into the interaction. The goal is to create a back-and-forth routine in which the child and
adult take turns initiating and responding to one another. During the adult’s turn, he/she
can then pause, creating an opportunity for the child to communicate. If the adult does
not complete his/her turn, the child may look up, or engage in other forms of
communication to continue the routine. For example, when engaging in FTCL, the adult
and child may engage in parallel play, each running separate cars down separate tracks.
In contrast, when building a social routine, the interventionist and child might use the
same car and track, requiring the child to take turns sending the car down the track. Thus,
the adult has increased the likelihood that the child will maintain interest in the
interaction by allowing him/her to select the toys and routine and has also created
System of Least Prompts
The combined use of FTCL and TSR allows the adult to create an opening for a
learner to emit new words through the use of the system of least prompts (SLP), a third
component of MT. The SLP involves both time delay procedures and mand-model
procedures (Fey, 2008), which are followed in succession. First, target words are selected
based on the routines that an adult and a child have established, and the child’s current
language level (Fey, 2008). For example, if a child is using primarily single words, words
such as “car”, “track”, “up”, or “down” may be selected as the target during an activity
where an adult and child are taking turns running a car down a track. After following the
child’s interest in the car and track and establishing the social routine of taking turns with
the car, the adult occasions a response from the child by not giving the child the car
during his/her turn. The adult will then initiate the prompting hierarchy with time delay,
followed by a linguistic prompt, and then a linguistic mand-model prompt (Fey, 2008).
During SLP, the adult looks at the child expectantly when first withholding the object
(time delay), then asks the child for the correct response to be emitted (linguistic prompt)
and, if no response is given, provides a model while asking for the child to emit the
MT has a strong research base supporting its efficacy in teaching children with
language delays new language targets (e.g., Bolzani et al., 2009; Warren & Gazdag,
and Gazdag (1990) found that MT, specifically mand-model and incidental teaching
techniques, successfully improved communication in children with mild intellectual
in children with prenatal cocaine exposure, Bolzani et al. (2009) found that participants
benefitted from MT, improving their spontaneous production of single and multiple
words. Similarly, Togram and Erbas (2010) found the mand-model component of MT to
effects were maintained 16 weeks after the conclusion of the study. Using four types of
MT language prompts, including models, commands, questions, and time delay, Ingersoll
et al. (2012) found that both MT and a combined condition (MT plus responsive
children. Further demonstrating the effectiveness of MT, Christensen et al. (2013) found
that MT components (modeling, mand-model, time delay, and incidental teaching) were
effective for increasing language targets in preschool-aged children with ASD in an early
childhood special education classroom. In addition, MT has been used to teach and
promote a photo exchange system for a child with ASD (Ogletree et al., 2012). These
studies suggest that MT can be used to improve language abilities in children and that the
use of target components rather than the intervention as a whole can still be beneficial for
in young children with developmental disabilities, several studies have also utilized
natural implementers and demonstrated similar positive language outcomes. Kaiser et al.
as well as MT in a classroom with nonvocal preschool children, leading to increases in
child communication. Similarly, Kaiser et al. (1995) were able to successfully train
reported a successful parent training program, where parents were trained to implement
the mand-model procedure of MT, resulting in language gains for their children
effectively by parents and teachers if they are trained properly (i.e., with coaching and
feedback). Thus, it would be logical to conclude that a training package such as BST
would effectively train teachers in the classroom to implement other MT techniques (i.e.,
FTCL and TSR), in addition to SLP (including mand-model procedures). Although these
previous studies described using procedures similar to BST (e.g., coaching and
feedback), they do not explicitly state that BST was used to train parents or teachers in
The present study aimed to fill this gap in the literature by examining the efficacy
of BST to train teachers at a school for children with disabilities to implement several
core MT techniques, including: (1) FTCL, (2) TSR, and (3) SLP. Specifically, this study
1. Is BST effective in training teachers to implement MT techniques in the
classroom?
Method
Participants
children with disabilities in the southeastern United States were recruited to participate in
the current study. To be eligible for the study, teachers must have been (a) willing to
participate in the study and (b) have at least one eligible child participant enrolled in their
classroom. Teacher participants were 20 and 47 years of age, with 1 week and 15 years of
demographics, see Table 3.1. One child per enrolled teacher was recruited to participate
in this study. To be eligible for participation, children were required to (a) be enrolled in
the classroom of an eligible teacher at a school for children with disabilities, (b) be
between 2 and 9 years of age, and (c) produce fewer than 5 referential words. Potential
participants were nominated by teachers and staff, and eligibility was confirmed via a
parent report measure and a brief, 20-min observation conducted by the first author (see
Appendix B for observation data sheet). During the observation, children’s language was
measured and documented and used to determine eligibility. It was initially thought
prelinguistic targets would be appropriate for intervention. However, once enrolled in the
study, it became clear that language targets were most appropriate based on the students’
communication skills and language levels. Therefore, MT was selected as the most
appropriate intervention for both child participants. For a complete description of child
Ms. Smith. Ms. Smith had her associate’s degree and was certified as a
paraprofessional and a tutor, and served as the lead teacher in her classroom. She reported
disabilities (SLD), other health impairment (OHI), speech and/or language impairment,
visual and hearing impairments, traumatic brain injury, attention deficit hyperactivity
disorder (ADHD), and social communication deficits. She reported teaching in both
special and general education classrooms as well as gifted classrooms. Ms. Smith worked
at the school for a total of four years. She had nine students in her classroom, all of whom
impairments, and several others. All nine children had an individualized Growth and
where she assisted lead teachers and board-certified behavior analysts (BCBAs) in
classroom and therapeutic settings. Ms. Smith reported minimal experience with
professional training, having participated in a previous study several years ago where she
was trained to perform various behavioral techniques, such as discrete trial teaching. Ms.
modeling.
Sarah. Sarah was 8 years, 0 months old and had a diagnosis of Phelan-McDermid
the Abbreviated battery of the Stanford-Binet Intelligence Scales, Fifth Edition (SB-5)
was 47. Her nonverbal IQ was below 42 and her verbal IQ was below 43. Previously,
therapy, and applied behavior analysis. She had been at the school for four years and had
a GPP with social communication goals. As such, at the start of the study, Sarah was
she was not using any words functionally or meaningfully. However, based on examiner
functionally.
Mr. Parker. Mr. Parker was brand new to both teaching and working at the
school and did not have any previous experience working with children. He worked at the
school for a total of one week prior to enrolling in the study and worked as a
training. Mr. Parker did not report using any strategies to support communication in the
Josh. Josh was 8 years, 7 months old and had a diagnosis of ASD. His full-scale
IQ according to the Abbreviated battery of the SB-5 was 47. His nonverbal IQ was below
42 and his verbal IQ was below 43. Josh was not receiving any interventions at the time
of the study. This was his first year attending the school and he had a GPP with social
report, he was able to produce 166 words. However, based on examiner observation, he
produced and used approximately 3 words meaningfully and functionally. Much of his
language was comprised of echolalia, and teachers reported that they had never heard him
Materials
examples was developed for each intervention strategy. During training sessions, the
PowerPoint presentation and videos were displayed on the researcher’s laptop computer
or on a larger projector screen when available. Data collection sheets were developed and
used to collect data on fidelity of BST for all teacher training sessions (see Appendix C).
All observation sessions were video recorded to allow for primary and reliability
coding of teacher behavior. Data collection sheets were developed to record teacher
strategy use across baseline and intervention sessions (see Appendix D). At the beginning
of SLP data collection, teachers used their own watches or an iPad displaying
the time to remind them to prompt for language targets every 2 min. An interval timer
that the teachers could keep on their person and that vibrated every 2 min was introduced
partway through SLP intervention to make it easier and less cumbersome for teachers to
determine when the 2 min interval had expired. It was introduced during Session 31 for
Ms. Smith and Session 32 for Mr. Parker during SLP intervention.
Toys were selected by Ms. Smith and Mr. Parker based on their knowledge of
Sarah’s and Josh’s interests. Toys used during baseline and intervention included the
following for both Sarah and Josh: magnets, books, puzzles, counting blocks, monkey
string with shape cards (a sticky string-like substance that is pliable and can adhere to
surfaces), string blocks, color sorting bears, coloring, small balls, connecting people
(small people who created circles and chains by holding hands), and Colorino (a game
where you match colorful markers to the picture to make a 3D version of the picture). A
giant beachball also became available to Josh toward the end of baseline for FTCL and
the beginning of FTCL intervention, and a bike became available to Sarah during FTCL
intervention.
Generalization Toys
Generalization toys for Sarah included a trampoline, a racetrack with cars, train
tracks, a vacuum cleaner, taking a walk, and a medium-sized ball. Toys for Josh included
Setting
classrooms and other available spaces (i.e., library, hallway, outdoor recreational area) at
a school for children with disabilities in the southeastern United States. Classrooms were
divided into several areas, including a reading area, an arts and crafts area, a play area,
and a block area. Baseline and intervention data collection was conducted during
unstructured free play in the play area of the classrooms and other available school
spaces. Training sessions were conducted 1:1 with the teachers in classrooms and/or
conference rooms at the school. No children were present during initial BST sessions.
Formal Measures
Demographic Questionnaire
Both teachers and parents of eligible children were asked to complete a brief
is appropriate for individuals ranging in age from 2 to 85+ years. Each child was
administered the SB-5 at the beginning of the study to determine his/her level of
intellectual functioning. The SB-5 demonstrates good internal consistency with values
ranging from .84 to .98, as well as good test-retest reliability, with values ranging from
.74 to .97 (Janzen et al., 2004). Similarly, the SB-5 demonstrates good concurrent validity
with a variety of tests, including the Woodcock Johnson-III Tests of Cognitive Abilities
(Woodcock et al., 2001a), the SB-IV (Thorndike et al., 1986), and many others with
values ranging from .78 to .90 (Janzen et al., 2004). The SB-5 also demonstrated good
al., 2001b) and the Wechsler Individual Achievement Test-II (Wechsler, 2005) in
The CDI is a parent report measure that assesses a child’s early developing
language, including use and understanding of words and phrases as well as gestures.
Parents were asked to complete the Words and Gestures form of the CDI. This form is
normed for use with children ages 8 months to 30 months; however, it can be used for
children who are older than 30 months if their communication and development are
delayed. The CDI demonstrates good internal consistency (ranging from .62 to .76), and
test-retest reliability (ranging from .59 to .99; Law & Roy, 2008). In addition, it has
shown good convergent validity: .52 with the Preschool Language Scale-Revised
(Zimmerman et al., 1979), .67 with the Peabody Picture Vocabulary Test-Third Edition
(Dunn & Dunn, 1997), and .82 with the Reynell Developmental Language Scales
Social Validity
At the conclusion of the training and intervention process, teachers were asked to
complete an acceptability rating scale to help determine whether they found the
intervention acceptable, helpful, and practical (adapted from Hendrickson et al., 1993).
This measure was used to help gather information regarding teacher acceptability of the
feasibility of the techniques for use in the classroom (see Appendix G).
from the Milieu Teaching (MT) intervention that were implemented in the classroom
during 20-min pull-out sessions in which the teacher and child worked together
separately from the larger classroom. Three core techniques were targeted: (1) following
the child’s lead (FTCL), (2) teaching social routines (TSR), and (3) the system of least
prompts (SLP). All teachers were taught the intervention techniques in the same order,
beginning with the most foundational skill and progressing toward more complex,
response interactive strategies (Hemmeter & Kaiser, 1994). All intervention techniques
were based on those described by Fey (2008) and Fey et al. (2017).
FTCL was coded if the teacher was engaged in any of the following behaviors:
imitating the child’s play with objects while engaging in play alongside him/her (parallel
play), commenting on the child’s behavior as the child plays, or responding appropriately
to child interactions (e.g., if the child holds out a toy in the teacher's direction, he/she
should accept the toy). FTCL was not coded if the teacher was engaged in any of the
speech (e.g., “my turn,” “your turn”), doing nothing, doing something other than what the
child was doing, playing with a different toy or activity than the child, prompting, doing
the opposite of what the child requested, correcting a behavior or response, or
Measurement and data collection. Based on research regarding the most effective
form of time sampling methods (Lane & Ledford, 2014) and optimal interval length, each
20-min data collection session was divided into eighty 15-s intervals. Momentary time
sampling was used when scoring intervals for FTCL. Thus, at the end of each interval,
the interventionist scored whether the teacher was correctly engaging in FTCL. If the
child and adult could not be seen in the video frame together, they were not considered to
be within arm’s reach and FTCL was not coded. All codes were based on the exact end of
the interval and not on what came after (e.g., if at 45.999 the teacher moved her hand but did
not place it onto the child’s hand until 46, the code was based on the teacher moving her
hand, not on where the hand ended up). These data were then used to calculate a percentage
of intervals during which the teacher correctly implemented FTCL by dividing the
number of intervals containing FTCL by the total number of intervals and multiplying by
100. An interval was scored as containing FTCL if the operational definition of FTCL was met.
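The momentary time sampling calculation described above can be illustrated with a minimal Python sketch. It lists the end-of-interval moments for a 20-min session split into 15-s intervals and converts a set of yes/no interval codes into a percentage. The function names (`sample_times`, `percent_intervals`) and the use of booleans to represent interval codes are hypothetical conveniences for illustration; they are not taken from the study's data collection sheets.

```python
SESSION_SECONDS = 20 * 60   # one 20-min data collection session
INTERVAL_SECONDS = 15       # length of each scoring interval


def sample_times(session_seconds=SESSION_SECONDS, interval_seconds=INTERVAL_SECONDS):
    """Return the moments (in seconds) at which behavior is scored,
    i.e., the exact end of each interval."""
    return list(range(interval_seconds, session_seconds + 1, interval_seconds))


def percent_intervals(codes):
    """Percentage of intervals coded as containing the target behavior.

    codes: one boolean per interval (True = operational definition met
    at the end of the interval). This representation is an assumption.
    """
    if not codes:
        raise ValueError("no intervals were scored")
    return 100 * sum(codes) / len(codes)


if __name__ == "__main__":
    times = sample_times()
    print(len(times), times[0], times[-1])
    # Hypothetical session: behavior observed at 60 of the 80 sampling moments.
    print(percent_intervals([True] * 60 + [False] * 20))
```

Under these assumptions, a session yields 80 sampling moments, and a teacher observed engaging in the behavior at 60 of them would receive a score of 75%.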
A social routine was operationally defined as: (1) successfully completing the steps in
a routine, (2) engaging in a routine, and (3) taking an imitative turn. TSR was coded as
correct if the teacher was successfully completing the steps in the identified routine,
engaging in the specified routine, taking an imitative turn, playing with the same toy as
the child and engaged in the same routine as the child, prompting the child to engage in
the routine (e.g., "my turn, your turn") or he/she was within the context of setting up or
maintaining the routine. TSR was still coded as correct if the routine was altered or
changed based on the child's interests as the teacher was still engaging in FTCL. TSR
was not coded if the teacher was engaged in parallel play (playing separately from the
child), engaged in an activity that did not allow for turn-taking (e.g., iPad or any game the
child played alone while the adult watched), engaged in a separate activity from the child,
was not within arm’s reach of the child, and/or was not visible in the same camera frame
as the child.
Measurement and data collection. Each 20-min data collection session was
divided into eighty 15-s intervals. Momentary time sampling was used when scoring
intervals. Thus, at the end of each interval, the interventionist scored whether the teacher
was correctly engaging in TSR. These data were then used to calculate a percentage of
intervals during which the teacher correctly implemented social routines by dividing the
number of intervals containing TSR by the total number of intervals and multiplying by
100. An interval was scored as containing TSR if the operational definition of TSR was met.
prompting hierarchy, starting with prompts that provide the least amount of support and
moving up to the most amount of support that is needed, or can be given, to occasion a
response. Teachers were told to start by giving an expectant look and waiting 3 s (time
delay: Step 1). If the children did not respond within 3 s, teachers then issued a linguistic
prompt, “What do you want?” (Step 2). If after another 3 s, children still did not respond,
teachers were instructed to give a linguistic model prompt and say, “X. You want X. Say
X.” encouraging the child to repeat the word after them (Step 3). See Figure 3.1 for a full
description of the prompting hierarchy. Note, this version of the prompting hierarchy was
adapted from Fey (2008). Several steps were removed from the hierarchy as described,
namely the cue step and the additional model step. This was done to ease the learning
burden for teachers and to make the SLP process more user-friendly. Seven steps were
defined and scored as correct or incorrect for the SLP variable. In order to have scored a
prompt, the teacher must have begun the prompting hierarchy with time delay. This
using gestures or vocalizations (e.g., “your turn”) and pausing for 3 s (range = 3 to 5 s);
and waiting in silence to see if the child vocalized. Additionally, the prompt must have
been designed to elicit a vocal response to be scored (e.g., conveyed the expectation that
SLP was implemented once every 2 min. First, a correct response was coded if the
teacher only gave one prompt within the 2 min time frame. In contrast, an incorrect
response was coded if the teacher gave more than one prompt or no prompts at all within
the 2 min time frame. Second, time delay was coded as correct if the teacher drew the
child’s attention to a specific item and then waited in silence for 3-5 s before issuing a
linguistic prompt (e.g., “What color?”). It was coded as incorrect if the adult waited less
than 3 s or more than 5 s before moving on to the next level of the hierarchy or if time
delay was implemented incorrectly. Third, a linguistic prompt was coded as correct if the
linguistic prompt was implemented correctly (i.e., saying a pure linguistic prompt [e.g.,
“What color is this?”] after time delay and before a model prompt). This was also coded
as correct if the child correctly responded after time delay and the linguistic prompt was
not necessary provided that the teacher appropriately stopped the prompting hierarchy
there. A linguistic prompt was coded as incorrect if the teacher skipped the linguistic
prompt and went right to a model prompt (e.g., “This is red.”). For example, if the teacher
went right to modeling (e.g., “What color? Purple”) then the linguistic aspect of this
prompt was considered part of the model prompt and the pure linguistic prompt was not
given.
Fourth, a model prompt was scored as correct if the teacher implemented the
model prompt correctly (i.e., saying the word he/she wanted the child to say and asking
the child to repeat it). A model prompt was also coded correct if the child correctly
responded after the linguistic prompt, making the model prompt unnecessary. A model
prompt was coded as incorrect if the model prompt was implemented incorrectly (e.g.,
the teacher modeled the wrong word; or issued the model prompt after the child had
already given the correct response; or the teacher failed to attempt to get the child to
repeat the word) and the adult did not model the desired vocalization and moved on to
another task/activity. Fifth, praise was operationally defined as encouraging sayings (e.g.,
“Great job! Nice work! Awesome! Way to go!”) and providing feedback (e.g., “That’s
correct. You got it right.”). Non-vocal forms of praise (e.g., high fives, fist bumps) were
acceptable if they were accompanied by vocal praise. Praise was coded as correct if the
teacher provided praise when the child correctly completed the behavior or attempted to
correctly complete the behavior, or the teacher did not provide praise if the child did not
complete or attempt to complete the behavior. Praise was coded as incorrect if the teacher
did not provide praise for successful completion of the target response, or if he/she
praised unsuccessful completion (i.e., the child did not emit the vocal response).
Sixth, the discontinuation of the prompting hierarchy was coded as correct if the
teacher concluded the prompting sequence when and only when the child
performed/attempted the requested vocalization or the teacher reached the end of the
prompting hierarchy. This column was coded as incorrect if the teacher discontinued
persisted with prompting even after the child had successfully completed/attempted the
vocalization. Of note, all prompts including time delay were coded regardless of whether
the prompts issued by the teacher were issued according to the order of the prompting
hierarchy. Seventh, sequence of prompts was coded to determine whether the teacher
implemented the steps of the prompting hierarchy in the appropriate and accurate order.
Prompting sequence was scored as correct if the sequence in which the prompt levels
were performed was done correctly (e.g., starting with time delay, moving to linguistic,
moving to modeling [if necessary]) regardless of whether each individual step was done
correctly. This step was coded incorrect if the prompts were delivered in an incorrect
sequence (e.g., modeling prompt performed before the linguistic prompt) regardless of
whether each individual step was done correctly.
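The sequence and time delay criteria described above can be expressed as two small checks, sketched here in Python. The sketch assumes that each use of the hierarchy is recorded as an ordered list of prompt labels, and that stopping early (e.g., after the linguistic prompt) still counts as a correct sequence because the model prompt is delivered only if necessary. The labels and function names are hypothetical and are not drawn from the study's coding manual.

```python
# Order of the three prompt levels in the adapted hierarchy.
HIERARCHY = ["time_delay", "linguistic", "model"]


def sequence_correct(delivered):
    """True if the prompts delivered follow the hierarchy order.

    delivered: ordered list of prompt labels for one use of the hierarchy.
    Assumption: the sequence must begin with time delay and may stop at
    any level, so any non-empty initial run of HIERARCHY is correct.
    """
    return len(delivered) >= 1 and delivered == HIERARCHY[:len(delivered)]


def time_delay_correct(wait_seconds):
    """True if the silent wait before the next prompt lasted 3 to 5 s."""
    return 3 <= wait_seconds <= 5


if __name__ == "__main__":
    print(sequence_correct(["time_delay", "linguistic"]))   # model not needed
    print(sequence_correct(["time_delay", "model"]))        # linguistic skipped
    print(time_delay_correct(2.5))                          # waited too briefly
```

Keeping the sequence check separate from the per-step checks mirrors the coding rule that sequence was scored regardless of whether each individual step was done correctly.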
Measurement and data collection. Accuracy of use of the system of least prompts
was measured using an event recording system. Each use of the prompting hierarchy was
given a percentage accuracy score based on the following criteria, which were marked as
yes or no: (1) one prompt was given every 2 min (i.e., prompts should be separated by 2
min, plus or minus 15 s), (2) time delay was implemented correctly, (3) the linguistic mand
prompt was implemented correctly, (4) the model prompt was
implemented correctly, (5) praise was provided when appropriate, (6) prompting was
discontinued at the appropriate step, (7) the sequence of prompts was followed correctly
(see Figure 3.1). All instances of prompting that met the 80% mastery criterion
(i.e., at least 6 of the 7 steps completed correctly) were scored as correct (for a similar procedure, see
Wright & Kaiser, 2017). These data were then used to determine the percentage of
prompts used correctly in a session, calculated by dividing the number of prompts used
correctly by the total number of prompts and multiplying by 100.
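The event-recording score described above can be sketched in Python as follows. Each use of the prompting hierarchy is represented as a set of seven yes/no criteria, a prompt counts as correct when at least 80% of the criteria are met (6 of 7), and the session score is the percentage of prompts counted as correct. The criterion keys, the function names, and the choice of the total number of prompts in the session as the denominator are assumptions made for illustration.

```python
# The seven yes/no criteria scored for each use of the prompting hierarchy.
CRITERIA = (
    "one_prompt_per_2_min",
    "time_delay",
    "linguistic_prompt",
    "model_prompt",
    "praise",
    "discontinuation",
    "sequence",
)
MASTERY = 0.80  # proportion of criteria required for a prompt to count as correct


def spacing_ok(previous_s, current_s, target_s=120, tolerance_s=15):
    """True if two prompts are separated by 2 min, plus or minus 15 s."""
    return abs((current_s - previous_s) - target_s) <= tolerance_s


def prompt_accuracy(scores):
    """Proportion of the seven criteria met for one prompt (dict of booleans)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(bool(scores[c]) for c in CRITERIA) / len(CRITERIA)


def prompt_is_correct(scores):
    """True if the prompt meets the mastery criterion (6 or 7 of 7 steps)."""
    return prompt_accuracy(scores) >= MASTERY


def percent_prompts_correct(session):
    """Percentage of prompts in a session that met the mastery criterion.

    Assumption: the denominator is every prompt recorded in the session.
    """
    if not session:
        raise ValueError("no prompts were recorded")
    return 100 * sum(prompt_is_correct(p) for p in session) / len(session)


if __name__ == "__main__":
    all_yes = {c: True for c in CRITERIA}
    one_miss = {**all_yes, "praise": False}                      # 6/7 steps
    two_miss = {**all_yes, "praise": False, "sequence": False}   # 5/7 steps
    print(percent_prompts_correct([all_yes, one_miss, two_miss, one_miss]))
```

In this hypothetical session, three of the four prompts meet the criterion, so the session score is 75%.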
Experimental Design
across teachers was implemented. This design allowed for the detection of a functional
across behaviors design when the data level or trend change upon introduction of the
intervention to the first tier while the data remain stable or unchanged in the remaining
tiers, and this change is repeated through the process of intra-participant replication (Gast
introduction of the intervention, confidence that the intervention is responsible for the
change in data trend or level increases, experimental control is established, and a
functional relation between the intervention and change in the data can be inferred.
Procedures
Baseline Condition
for each teacher based on the recommendation in the What Works Clearinghouse (WWC)
Standards for Pilot Single-Case Designs, Version 4. During the baseline condition,
teachers were instructed to interact with children as they normally would and any
spontaneous use of programmed intervention techniques was recorded (for data collection
sheets, see Appendix D), but no instruction, training, or feedback was provided. The
teacher and child were observed during free play time. At the beginning of each session,
the teacher laid out all the toys and allowed the child to choose the toys with which to
play. Once baseline data were stable and without trend in the therapeutic direction across
tiers, and the minimum number of baseline data points had been collected for the first
behavior, the first intervention strategy was introduced. The introduction of intervention
was made based on teachers’ individual data; therefore, baseline length varied across
teachers.
Teacher Training
introduction of intervention for each MT technique using BST. The BST procedure
introduction of each intervention strategy, two training sessions were held: one session
that lasted approximately 45 min and one session that lasted approximately 20 min. Two
sessions were dedicated to each target intervention technique for a total of six training
sessions per teacher. Each individual teacher selected a training time that worked best for
his/her schedule and received training at the selected time. The two training sessions for
each technique occurred no more than two weekdays apart. For example, if the first
training session for FTCL occurred on a Tuesday, the second training was conducted by
During the initial training session for each intervention technique, the instructor
introduced and described the strategy, broke down and described the process for
implementing the technique, and answered any questions that teachers had. Next,
teachers were shown video models implementing the technique. During rehearsal and
feedback, the researcher and the teacher practiced the target intervention technique with
regarding what the teacher was doing well in addition to comments designed to help
improve the accuracy of the teachers’ implementation of the techniques. During the
second training session for FTCL, TSR, and SLP, the teacher was asked to implement the
technique with the child in the classroom environment while the researcher provided in-
situ feedback. The researcher modeled the techniques for each teacher as necessary and
practice continued until each teacher felt confident that he/she could implement the
technique appropriately on his/her own. During the second training session, the
researcher observed the teachers engaging in each target technique with the child and
Following the Child’s Lead. During training of FTCL, teachers were trained to
allow the child to lead and direct the interaction. First, they were trained how to imitate
the child’s actions and engage in parallel play, a process where they play alongside the
child, imitating the child’s actions but playing independently. For example, the teacher
and the child might run two different cars down two separate tracks to play with the cars
rather than taking turns with the same car or playing on the same track. Next, teachers
were trained how to comment on the child’s behavior as they played (e.g., “You have a
car. That’s a red block.”) rather than to ask questions or give commands (e.g., “What
color is that block? What do you have?”). Finally, teachers were trained how to respond
appropriately to child initiations. For example, if a child offered an object to the teacher,
he/she should accept the object; similarly, if the child made a request, the teacher should
Teaching Social Routines. During training for TSR, the researcher worked
closely with teachers to identify and develop several routines for each child participant
depending on the selected toys. Teachers establish routines with children in a variety of
ways, such as imitating a child’s play with the same or similar toys, imitating the child’s
actions, performing an action that is complementary to the child’s action to create a turn
for the teacher within the interaction, engaging the child by performing an action or
activity that he/she finds funny or interesting, or pairing actions with singing or counting
(Fey, 2008). The development of these routines is critical for future techniques which
require the teacher to interrupt the established routine in order to create opportunities for
communicative interaction (e.g., prompting). For Ms. Smith, trained routines included
taking turns riding a bike throughout the school, putting magnets up on a board, coloring
a picture, and building with blocks. For Mr. Parker, trained routines included having Josh
request a turn to bounce on a giant ball and take turns playing with monkey string (i.e., a
System of Least Prompts. The prompting hierarchy for SLP consisted of the
following prompting techniques: (1) time delay, (2) linguistic prompts, and (3) linguistic
mand-model prompts (adapted from Fey, 2008). Teachers were trained how to prompt in
the following manner: beginning with time delay, the teacher removed a toy of interest or
interrupted the routine, gave an expectant look, and then waited 3 s before delivering any
kind of instruction, giving the child (portrayed by the researcher) the opportunity to
communicate independently. Once the 3 s had elapsed, and the “child” had not provided
the correct response, the teacher then moved on to the next step in the prompting
hierarchy, linguistic prompts. These prompts are vocally issued prompts that encourage a
child to communicate with adults. For example, if after 3 s the “child” did not respond to
the initial disruption in routine, the teacher would say, “What do you want?” and pause
for another 3 s. If the “child” still did not respond to the linguistic prompt, the teacher
then issued a linguistic mand-model prompt, saying, “X. You want X. Say X,” where X was the
name of the toy. In general, linguistic mand-model prompts are used to tell a child what is
being asked or what is expected (e.g., asking the child specifically what he/she wants).
Teachers were trained to prompt once every 2 min following the aforementioned
hierarchy.
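The order of the prompting hierarchy can be sketched as a short routine. This is an illustrative sketch only: the level names follow the text, the 3 s waits are noted in comments rather than timed, and the function names and the `child_responds_at` parameter are hypothetical.

```python
from typing import List, Optional

# Prompt levels in the order described in the text (least to most support).
HIERARCHY = ["time delay", "linguistic prompt", "linguistic mand-model"]


def prompts_delivered(child_responds_at: Optional[int]) -> List[str]:
    """List the prompt levels delivered, in order.

    child_responds_at is the 0-based index of the level after which the child
    responds correctly, or None if the child never responds. In practice the
    teacher waits 3 s after the time delay and after the linguistic prompt
    before moving to the next level.
    """
    delivered = []
    for level, name in enumerate(HIERARCHY):
        delivered.append(name)
        if child_responds_at == level:
            break  # prompting is discontinued once the child responds
    return delivered


def sequence_is_correct(observed: List[str]) -> bool:
    """True when the observed prompts start at time delay and skip no level."""
    return len(observed) > 0 and observed == HIERARCHY[: len(observed)]


print(prompts_delivered(1))
print(sequence_is_correct(["time delay", "linguistic mand-model"]))
```

The second call returns False because the mand-model prompt was delivered before the linguistic prompt, which matches the incorrect-sequence example given for the scoring criteria.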
As part of the initial training session for the SLP intervention technique, the
researchers and the teachers selected specific verbal goals for the individual child
participants. Because the children in this study were minimally verbal, single word
linguistic social-communication goals were identified for each child. Goal selection was
objectives in their GPPs. For example, if a teacher and child were engaged in a routine in
which they were rolling a ball back and forth but the teacher did not return the ball upon
the child’s turn, the primary prompting goal may be for the child to say, “ball,”
requesting the desired item. It would also be appropriate in this scenario for the child to
say, “turn” requesting his/her turn with the ball. To establish linguistic targets, based on
the recommendations of Fey (2008) the interventionist sat with each teacher to examine
the routines that each child developed during that stage of the intervention and identified
several target words that the children could emit in order to request continuation of the
routines. For Sarah, the primary routine was riding on the bike and prompted words
included, “back”, “go”, “bike”, “turn”, and “push.” For Josh, the primary routines were
rolling on the beach ball and taking turns with the monkey string and prompted words
Intervention Condition
identical to those used in baseline (1:1 pull out sessions, 20 min in length) with the
exception that teachers used the trained techniques rather than playing as they normally
would. No training or help was provided by the researcher during intervention sessions.
Data on teacher use of MT strategies were collected via video-recordings of intervention
sessions. A minimum of five data points was required for the intervention condition
across MT techniques and teachers, and intervention continued until the teachers reached
maintenance of intervention skills (Ward-Horner & Sturmey, 2012), once the intervention
began, a 1:1 check-in with teachers was conducted after each data collection session in
order to ensure the continuation of intervention knowledge and accuracy. During the
check-in, the teacher and the researcher reviewed what went well during the session and
noted areas for improvement. The researcher also answered any questions the teacher had
and provided any additional practice/training as necessary. The researcher reviewed the
teachers’ graphed data with them as necessary (e.g., during a period of continued
performance decline or upon reaching a described goal) and discussed and explained their
performance based on visual analysis. Booster training sessions were also utilized during
periods of performance decline or after long breaks in data collection. During these
sessions, the researcher and teacher would practice the intervention techniques with the
child while the researcher provided immediate feedback and modeling as necessary.
Generalization across materials was assessed once during each intervention phase
and during baseline for the TSR and SLP techniques. Teachers were given a novel set of
toys and were asked to use the trained intervention techniques with the novel toys. The
same set of novel toys was used during all three sessions for each teacher/child pair.
Generalization sessions were identical to intervention sessions. Data were collected using
the same procedures as during the baseline and intervention conditions, with a different
set of toys, and took place in the same environments. After completion of the
generalization probes, teachers were provided with feedback regarding their performance.
observations. Data were recorded on teachers’ use of all trained intervention techniques.
All maintenance sessions were identical to intervention sessions. The same data
collection systems used during previous data collection sessions were used during
The study lasted for six months. In the beginning of the study, data were collected
once a day, two to three times per week. However, as the study progressed, data were
collected less frequently. There was a two-week break in data collection between sessions
14 and 15 for both teachers due to the Thanksgiving holiday and teacher availability. In
addition, after Session 23 for Ms. Smith and Session 24 for Mr. Parker, there was a five-
week break in data collection due to the holidays and teacher and researcher travel. Upon
returning from the five-week break, Mr. Parker requested a booster session, which was
performed between Sessions 24 and 25. Thus, a booster session for FTCL and primarily
TSR was conducted during which the researcher provided in-situ feedback to Mr. Parker
as he interacted with Josh. Ms. Smith was also offered a booster session, but she
declined. There was one additional two-week break between Sessions 30 and 31 for Ms.
Smith and between Sessions 28 and 29 for Mr. Parker. On four occasions for both
teachers, two sessions were conducted in one day with 30 min to 1 hr in between
sessions. For Ms. Smith, the following session pairs were conducted on the same day: 26
and 27, 29 and 30, 31 and 32, and 33 and 34. For Mr. Parker, the following session pairs
were conducted on the same day: 25 and 26, 27 and 28, 30 and 31, and 32 and 33.
Procedural Fidelity
Fidelity checklists of BST were developed for each of the intervention training
techniques, detailing specific criteria that must be covered within the training sessions
(see Appendix C). These checklists were used to ensure that all teacher participants
received the same information and training. Average procedural fidelity for Ms. Smith
was 77% with a range of 60-96%. Similarly, average procedural fidelity for Mr. Parker
Interobserver Agreement
agreement (IOA). All raters were required to demonstrate 80% accuracy or higher for 3
different training video sessions on all data collection instruments before coding video
IOA for all data collected (i.e., FTCL, TSR, and SLP). This was calculated by dividing
the number of intervals for which the two observers agreed by the total number of
intervals for the session and multiplying by 100. The primary coder was blind to sessions
selected for IOA. If IOA was below 80% on a session coded for IOA, a discrepancy
discussion was held to re-calibrate, but the original agreement percentage was maintained
for calculation of mean IOA. For Ms. Smith, observers had an average agreement across
baseline and intervention conditions of 80% (range: 56-93%) for FTCL, 92% (83-100%)
for TSR, and 97% (81-100%) for SLP. For Mr. Parker, observers had an average
agreement across baseline and intervention conditions of 83% (range: 63-93%) for FTCL,
96% (86-100%) for TSR, and 96% (80-100%) for SLP. See Table 3.3 for full reporting of
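The interval-by-interval agreement calculation described above (agreements divided by total intervals, multiplied by 100) can be sketched as follows. The observer records below are hypothetical, and the helper names are assumptions made for illustration.

```python
from typing import Sequence


def interval_ioa(primary: Sequence[bool], secondary: Sequence[bool]) -> float:
    """Percentage of intervals on which two observers recorded the same code."""
    if len(primary) != len(secondary) or len(primary) == 0:
        raise ValueError("both observers must code the same nonzero number of intervals")
    agreements = sum(a == b for a, b in zip(primary, secondary))
    return 100.0 * agreements / len(primary)


def needs_discrepancy_discussion(ioa: float, threshold: float = 80.0) -> bool:
    """Flag sessions below the 80% threshold; the original score is still kept."""
    return ioa < threshold


# Hypothetical 10-interval session: the observers disagree on one interval.
primary = [True, True, False, True, False, True, True, False, True, True]
secondary = [True, True, False, True, True, True, True, False, True, True]

score = interval_ioa(primary, secondary)
print(score)                                # 90.0
print(needs_discrepancy_discussion(score))  # False
```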
The dependent variables were graphed for each teacher. Both teachers had a
minimum of five data points per baseline condition. When training occurred on the first
MT strategy, baseline data continued to be collected for the remaining intervention tiers
(behaviors) and were analyzed for stability and absence of trends in the therapeutic
direction. With the introduction of training for each new intervention strategy, the
effect, allowing for the detection of a functional relation (Gast et al., 2018). This process
was in accordance with standards issued by the What Works Clearinghouse (WWC) for
Pilot Single-Case Designs, Version 4.1, stating the need for a minimum of three
demonstrations of intervention effect at three different points in time. In addition, the use
of a multiple baseline design with a minimum of six intervention phases and a minimum
of five data points meets the standards put forth by WWC for Pilot Single-Case Designs
Without Reservations. Visual analysis of data trend, level, and variability was conducted
techniques.
Results
Ms. Smith
Following the child’s lead. During baseline, Ms. Smith averaged 31% of
intervals demonstrating FTCL (range: 16-43%; see Figure 3.2). Baseline data were
somewhat variable and were trending in the non-therapeutic direction when intervention
began. Once intervention was introduced, there was an immediate increase in level of
of 74% (55-90%). Of note, Ms. Smith’s performance during intervention was somewhat
variable; she began with a high percentage of intervals with FTCL, and then her
performance began to decline. A new toy (bike) was inadvertently introduced into the
interaction during Session 11 of the FTCL intervention condition. It is worth noting that
Ms. Smith’s performance increased initially to above the 80% criterion when this
happened; however, her performance went back to below the mastery criterion after this
session. This soon triggered the need for a booster session where the researcher met with
the teacher to review the intervention procedures and allowed her to practice these
procedures once again. After the booster session, Ms. Smith’s data began to increase in
the therapeutic direction, and she was able to achieve the mastery criterion of three
consecutive sessions at 80% of intervals or higher. Despite this variability in her data,
there was no overlap between data points in the baseline and intervention conditions. The
change in level of the data suggests that the intervention was responsible for the change
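The mastery criterion used throughout (three consecutive sessions at 80% or higher) can be checked with a small helper. This is a sketch with hypothetical session scores, not the study's data; the function name and its parameters are assumptions.

```python
from typing import Optional, Sequence


def first_mastery_session(
    scores: Sequence[float], criterion: float = 80.0, run_length: int = 3
) -> Optional[int]:
    """Return the 1-based session index at which the criterion is first met.

    The criterion is met when `run_length` consecutive sessions score at or
    above `criterion`. Returns None if that never happens.
    """
    streak = 0
    for index, score in enumerate(scores, start=1):
        streak = streak + 1 if score >= criterion else 0
        if streak >= run_length:
            return index
    return None


# Hypothetical percentages of intervals per session (illustrative only).
print(first_mastery_session([85, 78, 82, 90, 81]))  # 5
print(first_mastery_session([85, 90, 70]))          # None
```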
Upon introduction of BST to TSR, the teacher’s performance of FTCL
demonstrated continued variability, with a mean accuracy score of 76% (60-90%) and
dipping below the 80% criterion several times. However, this is not unexpected based on
the nature of TSR. Once TSR is introduced as a technique, the teacher is instructed to
structure the interaction so that a routine is being used regularly. This can make it
performance.
demonstrating TSR were largely consistent, with a mean score of 3% and a range of 0-8%.
Baseline data were low in level and were not trending in any direction after the first
few data points, which were initially trending in the therapeutic direction. Data were
largely stable around 0%. When intervention was introduced, percentage of intervals
remained above the 80% criterion for the remainder of the intervention condition. Her
mean percentage of intervals demonstrating TSR was 89% with a range of 86-94%. This
change in level suggests that the intervention was responsible for the teacher’s
to SLP, Ms. Smith’s performance became more variable with a mean percentage of
System of least prompts. During baseline, Ms. Smith’s performance was largely
Baseline data were quite flat with a slight uptick in the therapeutic direction, which then
flattened once again. Data level remained low throughout baseline. Once intervention
was introduced, Ms. Smith’s performance actually decreased from 10% to 0% prompts
used correctly during the first session. This occurred because Ms. Smith forgot to
implement one component of the SLP process, resulting in all of her prompts being
scored as incorrect for this session. After meeting with the researcher to discuss this issue
and to re-practice prompts, her performance improved to 80% of prompts used correctly
during the session. Her average percentage of prompts used correctly during intervention
was 72% with a range of 0-100%. With the exception of one data point (the session with
0% accuracy), there was no overlap in the data between baseline and intervention
conditions, suggesting that the intervention was responsible for Ms. Smith’s increase in
accuracy of implementation.
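The study judged overlap through visual analysis. As a rough numeric companion, overlap between conditions can be counted as the number of intervention points that do not exceed the highest baseline point. This is an assumption made for illustration (it resembles a percentage of non-overlapping data calculation) and is not the author's method. The data values below are hypothetical.

```python
from typing import Sequence


def overlapping_points(baseline: Sequence[float], intervention: Sequence[float]) -> int:
    """Count intervention points at or below the highest baseline point.

    Assumes the therapeutic direction is an increase.
    """
    if not baseline:
        raise ValueError("baseline must contain at least one data point")
    ceiling = max(baseline)
    return sum(point <= ceiling for point in intervention)


def percent_non_overlapping(baseline: Sequence[float], intervention: Sequence[float]) -> float:
    """Percentage of intervention points that exceed every baseline point."""
    if not intervention:
        raise ValueError("intervention must contain at least one data point")
    overlap = overlapping_points(baseline, intervention)
    return 100.0 * (len(intervention) - overlap) / len(intervention)


# Hypothetical series: one intervention session at 0% overlaps with baseline.
baseline = [0, 0, 5, 10, 10]
intervention = [0, 80, 70, 100, 90]

print(overlapping_points(baseline, intervention))       # 1
print(percent_non_overlapping(baseline, intervention))  # 80.0
```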
Generalization. During the first generalization probe for FTCL during the
intervention condition, Ms. Smith demonstrated 85% of intervals coded for FTCL
suggesting her skills had generalized to a new set of materials. This score decreased to
50%, and then increased to 84%. However, because there were no generalization sessions
10%. This increased to 79%, just below the 80% criterion, during the TSR intervention
condition, but decreased to 15% after introduction of BST to SLP. Thus, Ms. Smith was
able to generalize her TSR skills initially, but this generalization did not maintain.
Finally, during the two SLP baseline generalization probes, she performed at 0%. In
contrast, during the SLP intervention generalization probe, her performance improved to
80%, indicating that she was successfully able to generalize the SLP skills to a new set of
materials.
80% criterion with values of 88% and 94%, at the two- and four-week follow ups,
respectively. Similarly, she maintained her scores for TSR at the two- and four-week
follow up sessions, with values of 84% and 80%, respectively. Finally, her SLP skills
maintained at the two-week follow up with a score of 90% but did not maintain at the
four-week follow up, with a score of 60%, falling below the 80% criterion.
Mr. Parker
intervals coded for FTCL of 30% (range: 16-40%) during baseline (see Figure 3.2).
Baseline data were variable but remained somewhat low in level. However, there was a
trend in the therapeutic direction toward the end of baseline. Once intervention was
introduced, there was an immediate increase in level, with a mean performance during
was highly consistent in the beginning; however, as time went on, his performance
became more variable. A new toy (beach ball) was inadvertently introduced into the
interaction during Session 7 of the FTCL baseline condition. The ball was absent during
Session 8 because it was unavailable, but it was again present during Session 9
when FTCL intervention began. During Session 15, the ball was again unavailable during
data collection, which may have resulted in Mr. Parker’s low accuracy of implementation
of 54%. This triggered the need for a booster session where the researcher met with the
teacher to review the intervention procedures and to allow him to practice these
procedures once again. Technically, Mr. Parker reached the mastery criterion of three
consecutive sessions at 80% or higher during Session 13; however, intervention was
continued due to the variability and therapeutic trend that were present in his TSR
baseline data. Thus, intervention was continued in the hopes that his TSR baseline data
would stabilize. After the booster session, Mr. Parker’s FTCL data demonstrated
increases in the therapeutic direction, and he achieved the mastery criterion of three
consecutive sessions at 80% accuracy or higher. Despite some variability in his data,
there was no overlap between data points in the baseline and intervention conditions. The
change in level of the data suggests that the intervention was responsible for the change
TSR, the teacher’s performance dropped, with a mean percentage of intervals scored for
FTCL of 30% (8-63%). His performance remained below the 80% criterion level once
TSR was introduced. As previously mentioned, this drop in FTCL was expected based on
the structured nature of teaching social routines and the teacher’s increased involvement
Teaching social routines. Mr. Parker’s performance during baseline was highly
variable, with an average percentage of intervals coded for TSR of 17%, ranging from
0% to 50%. The data began at a low, stable level around 0%; however, they began to
increase in the therapeutic direction, reaching a peak at about 50%. The data then began
trending in the non-therapeutic direction, reaching a level that was similar to the
beginning of baseline. However, they then began trending in the therapeutic direction again
before intervention began. When intervention was introduced, his percentage of intervals
coded for TSR increased from 36% to 91% and remained above the 80% criterion level
for all but one session. His average percentage of intervals coded for TSR during
intervention was 90% with a range of 83-99%. Of note, there was a large gap in data
collection between Sessions 24 and 25; this triggered the need for a booster session where
the researcher met with the teacher to review the intervention procedures and to allow
him to practice these procedures once again. Similarly, this skill remained above the
80% criterion when SLP was introduced with an average accuracy score of 94% and a
range of 90-99%. This change in data level suggests that the intervention was responsible
prompts used correctly for SLP of 0% during baseline. Baseline data were flat and stable
and remained at the 0% level throughout. When intervention was introduced, his score
increased from 0% to 80%. With the exception of one data point, he remained above the
correctly for SLP of 80% (range: 50-100%). However, because there was a
break in between the final baseline session and SLP training, these results must be
interpreted with caution. Although a generalization baseline point was conducted prior to
BST for SLP, inference of a functional relation between BST and MT implementation for
condition, Mr. Parker demonstrated 81% of intervals coded for FTCL, suggesting that his
skills had generalized to a new set of materials. However, this decreased to 25%, and then
increased to 63%. Although his scores were somewhat variable, his FTCL skills
Regarding TSR, his baseline generalization performance was 0%. This increased to 99%
during the intervention condition, and further increased to 100%. Thus, he was able to
generalize his TSR skills to a new set of materials. Finally, during the two SLP baseline
generalization probes, Mr. Parker performed at 10% and 0%. In contrast, during the SLP
Maintenance. Regarding FTCL, Mr. Parker’s performance did not remain above
the 80% criterion, with values of 46% and 65%, at the two- and four-week follow up
probes, respectively. In contrast, he maintained his scores for TSR at the two- and four-
week follow ups, with values of 100% at both maintenance probes. Finally, his SLP skills
also maintained at the two- and four-week follow-up maintenance probes, with scores of
Communication. Although child outcomes were not targeted in this study and
therefore not measured directly, parents completed a CDI for each child both before and
after the study. Prior to the study, Sarah’s parents reported that she understood 28/28
phrases and 307/396 words on the checklist. She reportedly could not produce a single
word on the list. Following the intervention, parents reported that she could understand
27/28 phrases and 370/396 words, a decrease of one phrase and an increase of 63 words
understood. They also reported that she could now produce a total of 16/396 words, an
increase of 16 words from the beginning of intervention. Josh also showed gains on this
measure from pre to post intervention. Before intervention parents reported that he could
understand 18/28 phrases and 380/396 words on the checklist. He could produce a total
392/396 words, an increase of 4 phrases and 12 words. Josh could also reportedly
produce 201/396 words, an increase of 37 words. Though interesting, these gains are
design.
Social Validity. At the conclusion of the study, teachers were asked to complete
an acceptability rating scale to help determine whether they found the intervention
acceptable, helpful, and practical (adapted from Hendrickson et al., 1993). The
questionnaire was broken down into four main parts: Research, Intervention Effects,
Social Validity, and Training. Ms. Smith and Mr. Parker agreed that research is important
in schools, can improve staff teaching, and is important for better teaching all children.
Teachers also agreed that the intervention was helpful for them as teachers as well as for
the students with whom they worked. In terms of the social validity section, both teachers
improved; they believed they could incorporate techniques into daily classroom routines;
and that the intervention techniques were feasible to implement in the classroom.
However, Mr. Parker disagreed that the intervention techniques could be easily
incorporated into the classroom whereas Ms. Smith agreed. Both teachers strongly agreed
Finally, regarding training session two, which featured in-situ feedback with the
child participants, both teachers agreed that they were comfortable during their training
sessions, the sessions were tailored to their experience levels, they felt comfortable
implementing techniques after their training sessions, felt comfortable asking questions,
and would recommend the sessions to their colleagues. In addition to the questionnaire,
Mr. Parker informed the researcher that he was grateful for having learned the techniques
and began implementing them with a new student with social communication difficulties
Discussion
The results of the present study indicate that BST can be effectively used to train
teachers to implement several core techniques of MT: FTCL, TSR, and SLP. Overall,
for Ms. Smith, the data showed one demonstration of effect and two replications of
fidelity. For Mr. Parker, the data showed one demonstration of effect and one replication
of effect, with limited evidence of a second replication. Ms. Smith maintained all skills at
the two- and four-week follow up probes, with the exception of SLP at the four-week
follow up. Mr. Parker maintained TSR and SLP at the two- and four-week follow up
probes but did not maintain FTCL. Ms. Smith initially generalized outcomes to a new set
generalization above the 80% criterion during additional generalization probes. Mr.
Parker struggled to maintain generalization criterion for FTCL but maintained
performance at the 80% criterion for TSR and SLP once these intervention techniques
techniques, post intervention gains were also observed in children’s word production and
These results are most similar to those reported by Kaiser et al. (1993) who found
that through implementing training techniques similar to those used in BST, they were
implemented in the classroom. This study also expands upon similar studies which
similar to BST (Kaiser et al., 1995). In addition, similar to Aktas and Ciftcitekinarslan
(2018), this study demonstrated the importance of intensive training in order for teachers
to correctly implement the MT techniques. BST provided a solid framework from which
Implications
This study has several implications for both practice and research. Data show
that teachers can be successfully taught to implement MT techniques when BST is used
as a training package. Teachers spend a great deal of time with children and if we can
language ability, then children have a chance to benefit from a much greater dosage of
intervention. Rather than seeing a therapist once a week for an hour, children could have
daily access to evidence-based interventions for multiple hours a day through trained
school staff. This would allow for an increase in dosage, and in theory, an increase in
language gains for each child exposed to the intervention in the classroom. Such practices
have the potential to make a substantial impact in the lives of children with social
communication deficits.
Based on the findings of the current study, future research should focus on
study allowed for an increase in dosage in intervention techniques, utilizing the teachers
as the interventionists. However, dosage could be increased even further and intervention
could reach more children if these techniques were integrated into regular classroom
routines and incorporated as part of normal classroom instruction. Similarly, given that
the present study has demonstrated that BST can be used to effectively train teachers to
may be used to teach a variety of naturalistic interventions not only to teachers but also to
parents and other important figures in children’s lives. Finally, future research could
expand the results of the current study by directly measuring child outcomes to determine
whether the intervention examined has a direct effect on children’s language and other
targeted behaviors.
Limitations
Despite the positive outcomes of this study, there were several significant
limitations worth noting. First, regarding generalization, a generalization probe was not
conducted during FTCL baseline for Ms. Smith or Mr. Parker, making it impossible to
Thus, we cannot say for sure that generalization occurred for FTCL as performance may
have been equally high during baseline as during intervention. The remaining two tiers
had generalization probes conducted during baseline and during intervention so that this
Second, for Ms. Smith, a bike was introduced during Session 11 during FTCL
intervention. Ms. Smith’s data spiked during this session to above the 80% criterion line.
However, her data then began to fall below the 80% criterion and decreased for several
sessions, making it unlikely that the bike had a large impact on the data at that time. Yet
it is possible that the bike, introduced around the same time as intervention, could have
had an impact on the intervention data rather than the intervention itself, making it more
difficult to infer a causal relation between BST and improvement in FTCL data for Ms.
Smith. Similarly, a ball was introduced during Session 7 for Mr. Parker, was absent for
Session 8, and then present again for Session 9 when FTCL intervention began (it was
then present for the remainder of the sessions with the exception of one session).
However, given Mr. Parker’s performance during baseline and the continuing upward
trend from Session 7 to Session 8 where the ball was present then absent, it is unlikely
that this had a major impact on the data. Yet it is still possible given that when the ball
was missing during Session 15, Mr. Parker’s performance dropped dramatically and
began to climb once the ball was re-introduced. Therefore, similar to Ms. Smith, the
introduction of a new toy for Mr. Parker right around the introduction of FTCL
intervention calls into question whether the toy or the intervention was responsible for the
change in data. This again makes it more difficult to infer a causal effect of BST on
Similarly, an interval timer that the teachers could keep on their person and that
vibrated every 2 min was introduced partway through SLP intervention to make it easier
and less cumbersome for teachers to determine when the 2 min interval had expired. It
was introduced during Session 31 for Ms. Smith and Session 32 for Mr. Parker during
SLP intervention. Upon introduction, Ms. Smith’s performance continued to decline until
Session 33. In contrast, Mr. Parker’s data increased to above the 80% criterion once the
timer was introduced and he immediately met criteria. Therefore, it is possible that the
interval timer had an impact on teacher performance in the SLP intervention condition.
Third, mean fidelity for BST of MT techniques was somewhat low with a wide
range for Ms. Smith’s and Mr. Parker’s training sessions. This was likely due to a number
of reasons. In order to remove redundancies from the training, several changes were
made to the coaching process during individual sessions, resulting in lower procedural
fidelity. Specifically, the second training session was made more informal to increase
followed initial training sessions (both occurred on the same day). Rather than follow the
procedural fidelity sheet strictly, the researcher allowed the teachers to practice the skills
using a more open and informal style. The researcher provided training as the teacher
practiced the skills with the child, provided modeling when necessary as well as
immediate feedback that focused on both positive performance and areas that needed
improvement. This allowed the coaching session to flow more comfortably and naturally
and adhered more to naturalistic teaching strategies rather than following the predetermined
criteria strictly. Ultimately, this resulted in lower procedural fidelity, but all
principles and pieces of behavioral skills training (instruction, modeling, rehearsal, and
in-situ feedback) were still utilized and followed accordingly during both training
sessions.
Fourth, the frequency of data collection sessions was not consistent. At the
beginning of the study, data were collected two to three times per week. However, as the
study progressed, toward the end of TSR and throughout SLP data collection, data were
often collected once a week but no less than every other week. At times, two sessions
were conducted in one day. This variation was due to conflicts with teacher, child, and
researcher availability and school holidays. Adjustments were made as necessary in order
Finally, each baseline tier ended with the collection of generalization data, with
the exception of FTCL. Therefore, there was no additional baseline session prior to
implementing intervention for each of the remaining tiers (TSR and SLP). This prevented
the researcher from determining whether baseline data continued in the same direction,
level, and trend after a brief interruption (generalization session). This is problematic
because without knowing where baseline data are immediately prior to implementing
intervention, it is more difficult to determine whether there were changes in the
data when intervention was introduced. This in turn makes it more difficult to infer a
would have increased confidence in the presence of a functional relation. Although this is
a limitation, data were collected close enough together that it is not likely a drastic threat
to internal validity, with one exception. There was a two-week break in between
generalization and SLP training for Mr. Parker and no additional baseline probes were
obtained before SLP training. Given the addition of extra time in between the last
baseline session and the first intervention session for SLP, these results must be
Conclusion
BST and teacher-implemented MT for one teacher and a possible causal relation for a
second teacher, indicating that BST can be effectively used to train teachers to implement
the following MT techniques: FTCL, TSR, and SLP. Through the processes of
effectively learn MT techniques to be implemented in the school setting and that these
skills largely generalized to new sets of toys and maintained across time. This study also
showed that a teacher with no experience in training or instruction could learn the
techniques just as well as a teacher with years of experience teaching and some
experience with training. This is important to note because such a finding suggests the
intervention techniques targeted in this study are accessible to teachers of all experience
levels, which could in turn make the intervention more widely available to larger groups
of students. The current study also expanded the literature base by demonstrating
that BST can be effectively used to train natural implementers to use NDBI components.
Thus, the utility of BST as an intervention package has been further expanded into yet
Table 3.1
Table 3.2
                                            Pre-Intervention    Post-Intervention
Measure                                     Sarah     Josh      Sarah     Josh
Age                                         8         8
Race/Ethnicity                              White     Black
Gender                                      Female    Male
SB-5 AB Full Scale IQ (Percentile)          <0.1      <0.1      N/A       N/A
SB-5 AB Verbal IQ (Percentile)              <0.1      <0.1      N/A       N/A
SB-5 AB Nonverbal IQ (Percentile)           <0.1      <0.1      N/A       N/A
CDI: # of Phrases Understood (Raw Score)    28/28     18/28     27/28     22/28
CDI: # of Words Understood (Raw Score)      307/396   380/396   370/396   392/396
CDI: # of Words Produced (Raw Score)        0/396     166/396   16/396    201/396
CDI: # of Early Gestures (Raw Score)        16/18     6/18      17/18     6/18
CDI: # of Later Gestures (Raw Score)        43/45     27/45     42/45     27/45
CDI: Total # of Gestures (Raw Score)        59/63     33/63     59/63     33/63
Note. GPP = Growth & Performance Plan; SB-5 AB = Stanford-Binet Intelligence
Development Inventory
Table 3.3
Figure 3.1
[Flowchart of the prompting sequence:
1. Response requested (time delay). Correct response: praise, then repeat. No/incorrect
   response: go to step 2.
2. Vocal mand prompt: "What do you want?" Correct response: praise, then repeat.
   No/incorrect response: go to step 3.
3. Vocal mand model prompt: "X. You want X. Say X." Attempted response: praise, then
   repeat. No/incorrect response: move on (praise any attempt to say the target word).]
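The prompting sequence in Figure 3.1 can be read as a simple least-to-most decision procedure. The following is a minimal Python sketch of that reading, not the coding scheme used in the study. The names `Response`, `PROMPT_LEVELS`, and `run_trial` are hypothetical, and the sketch assumes that an attempt short of a correct response is praised only at the final (mand model) level, which is how the flowchart appears to read.

```python
from enum import Enum


class Response(Enum):
    CORRECT = "correct"
    ATTEMPT = "attempt"
    NONE_OR_INCORRECT = "none_or_incorrect"


# Prompt levels in least-to-most order, as read from the Figure 3.1 flowchart.
PROMPT_LEVELS = [
    "time delay",
    'vocal mand prompt: "What do you want?"',
    'vocal mand model prompt: "X. You want X. Say X."',
]


def run_trial(responses):
    """Walk one trial through the prompt hierarchy.

    `responses` holds the child's response at each prompt level, in order.
    Returns a list of (prompt, teacher_action) pairs.
    Assumption: an ATTEMPT earns praise only at the last prompt level.
    """
    steps = []
    for level, (prompt, response) in enumerate(zip(PROMPT_LEVELS, responses)):
        last_level = level == len(PROMPT_LEVELS) - 1
        if response is Response.CORRECT or (last_level and response is Response.ATTEMPT):
            steps.append((prompt, "praise"))
            return steps
        if last_level:
            # Hierarchy exhausted: move on and start a new trial later.
            steps.append((prompt, "move on"))
            return steps
        steps.append((prompt, "next prompt"))
    return steps
```

For example, a child who does not respond to the time delay or the mand prompt but attempts the word after the model would pass through two "next prompt" steps and end with "praise".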
Figure 3.2
[Three-panel graph. Panels (top to bottom): FTCL, TSR, SLP. Y-axis labels: "Percentage of
Intervals with" (FTCL and TSR panels; remainder of label not recoverable) and "Percentage
of SLP Used Correctly" (SLP panel), each 0 to 100. X-axis: Session, 0 to 40. Annotations:
"Introduced" (item name not recoverable), "Booster Session", "Timer introduced", two
"2-week break" markers, and one "5-week break" marker.]
Note. Circles represent the percentage of intervals with FTCL and TSR implemented
correctly and the percentage of SLP prompts implemented correctly. Triangles represent
generalization data and filled in squares represent maintenance data. Slashes on the graph
represent breaks in data collection of more than one week. FTCL = following the child’s
Figure 3.3
[Three-panel graph. Panels (top to bottom): FTCL, TSR, SLP. Y-axis labels: "Percentage of
Intervals with" (FTCL and TSR panels; remainder of label not recoverable) and "Percentage
of SLP Used Correctly" (SLP panel), each 0 to 100. X-axis: Session, 0 to 40. Annotations:
"Ball Introduced", "Ball Missing", "Booster Session" (FTCL and TSR panels), "2-week break",
"5-week break", "2-week break", and "Timer introduced".]
Note. Circles represent the percentage of intervals with FTCL and TSR implemented
correctly and the percentage of SLP prompts implemented correctly. Triangles represent
generalization data and filled in squares represent maintenance data. Slashes on the graph
represent breaks in data collection of more than one week. FTCL = following the child’s
CHAPTER 4
GENERAL DISCUSSION
community prevalence estimates of 7% to 17% (King et al., 2005). However, this rate has
been shown to be even higher among children with disabilities, with approximately 70%
of three- to five-year-olds with co-occurring disabilities and language impairment
(Wetherby & Prizant, 1992). Furthermore, research indicates that speech and language
spectrum disorder (ASD; Rosenbaum & Simon, 2016). Language impairments have also
development as children grow (Johnson et al., 1999). Children with ASD are one group
of children who often exhibit language delays. Many children with ASD benefitted from
the use of discrete trial teaching (DTT) when it was first introduced by Lovaas (1987),
but Schreibman et al. (2015) identified several weaknesses of DTT that may make it
more difficult for some children, such as those with ASD, to benefit from DTT. Thus,
researchers began to look for other means to teach children with ASD language skills,
and to help close the language gap between them and their peers. Therefore, researchers
teachers (Peterson, 2004). Peterson (2004) indicated that teachers spend a great deal of
time with children throughout the school day and the school year. It stands to reason
that if they could be taught to implement various naturalistic language interventions, the
dosage of such interventions for children experiencing language delays could increase
such, the overall goal of the two studies presented was to provide a systematic literature
review of the use of BST to train natural implementers including teachers and/or other
professional staff and to provide an empirical investigation into the use of BST to train
The purpose of the systematic literature review was to analyze and synthesize the
current literature on the use of BST by teachers and other professional staff to implement
regarding whether BST was the appropriate training package regarding type of
total of 19 studies from 17 articles were reviewed; however, only seven studies showed
sufficient quality and rigor for their results to be interpreted with confidence. Similarly,
fewer than half of the studies measured generalization, maintenance, or BST fidelity, or
reviewed literature. However, several strengths were also demonstrated in the reporting
strong reliability data. Similarly, all 19 studies reported using all components of BST,
including instructions, modeling, rehearsal, and feedback, indicating that although BST
fidelity was not always reported, all BST components were included in each of the reviewed
studies. In addition, very few studies examined BST and NDBIs. Overall, the systematic
review revealed a need for more BST research in which BST fidelity is measured,
participant characteristics are described, and the use of BST with NDBIs is examined.
The purpose of the empirical investigation was to help fill some of these gaps in
the BST literature by examining the use of BST to train natural implementers (teachers)
to implement the components of milieu teaching (MT). This was accomplished through
the use of a multiple baseline design across behaviors. In this study, two teachers working
implement several MT techniques, including following the child’s lead (FTCL), teaching
social routines (TSR), and the system of least prompts (SLP). Each of these behaviors
was taught using the four components of BST. For the first teacher, a functional relation
was shown between BST and the improvement in performance in the fidelity of
teacher for two of the three behaviors (FTCL and TSR), but not for SLP. Thus, these
results showed that BST can be used to effectively train teachers to implement
Taken together, these studies show that BST can be used to train teachers and
more research is needed in the area of training teachers to implement NDBIs using BST.
It is promising indeed that teachers can be taught to implement NDBIs in the classroom;
however, even in the current study, a pull-out system was used where the teacher and
child were separated from the general classroom, and teachers were taught to implement
the intervention in 1:1 settings. In order for the intervention to truly have increased reach
and dosage, more research is needed to determine how these interventions can be
incorporated into daily classroom routines. As a field, we must make them feasible
enough for teachers to use them without detracting from their regular classroom duties
and teaching. Only then will we truly have the chance to see the impacts of teachers as
References
Adams, R. A., Piercy, F. P., Jurich, J. A., & Lewis, R. A. (1992). Components of a model
41, 312–317.
Aktas, B., & Ciftcitekinarslan, I. (2018). The effectiveness of parent training a mothers of
children with Autism use of mand model techniques. International Journal of Early
https://doi.org/10.20489/INTJECSE.512378
Alden, L., Safran, J., & Weideman, R. (1978). Comparison of cognitive and skills
843–846. https://doi.org/10.1016/S0005-7894(78)80015-X
Bolzani Dinehart, L. H., Yale Kaiser, M., & Hughes, C. R. (2009). Language delay and
the effect of milieu teaching on children born cocaine exposed: A pilot study.
https://doi.org/10.1007/s10882-008-9122-8
Boyer, C. B., & Kegeles, S. M. (1991). AIDS risk and prevention among adolescents.
9536(91)90446-J
prevention of child abduction. School Psychology Review, 26(4), 1–13.
Chazin, K. T., Barton, E. E., Ledford, J. R., & Pokorski, E. A. (2018). Implementation
https://doi.org/10.1177/1053815118771397
https://doi.org/10.1177/0271121411404930
Davenport, C. A., Alber-Morgan, S. R., & Konrad, M. (2019). Effects of behavioral skills
DiGennaro Reed, F. D., Blackman, A. L., Erath, T. G., Brand, D., & Novak, M. D.
(2018). Guidelines for Using Behavioral Skills Training to Provide Teacher Support.
https://doi.org/10.1177/0040059918777241
Dogan, R. K., King, M. L., Fischetti, A. T., Lake, C. M., Mathews, T. L., & Warzak, W.
167. https://doi.org/10.1007/s40489-019-00184-9
Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test--Third Edition
Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., Reznick, J. S., & Bates, E. (2007).
Technical Manual (2nd ed.). Baltimore, MD: Paul H. Brookes Publishing Co., Inc.
Fetherston, A. M., & Sturmey, P. (2014). The effects of behavioral skills training on
instructor and learner behavior across responses and skill sets. Research in
https://doi.org/10.1016/j.ridd.2013.11.006
Fey, M. E., Warren, S. F., Bredin-Oja, S. L., & Yoder, P. J. (2017). Responsivity
Gillam (Ed.), Treatment of Language Disorders in Children (2nd ed., pp. 57–85).
Gianoumis, S., Seiverling, L., & Sturmey, P. (2012). The effects of behavior skills
https://doi.org/10.1002/bin.1334
Giles, A., Swain, S., Quinn, L., & Weifenbach, B. (2018). Teacher-Implemented
Analysis of Treatment Integrity. Behavior Modification, 42(1), 148–169.
https://doi.org/10.1177/0145445517731061
https://doi.org/10.1016/S0005-7894(87)80002-3
Hassan, M., Simpson, A., Danaher, K., Haesen, J., Makela, T., & Thomson, K. (2018).
social skill development in their child with autism spectrum disorder. Journal of
https://doi.org/10.1007/s10803-017-3455-z
Hassan, M., Thomson, K. M., Khan, M., Burnham Riosa, P., & Weiss, J. A. (2017).
Behavioral skills training for graduate students providing cognitive behavior therapy
Hemmeter, M. L., & Kaiser, A. P. (1994). Enhanced milieu teaching: Effects of parent-
https://doi.org/10.1177/105381519401800303
Hendrickson, J. M., Gardner, N., Kaiser, A., & Riley, A. (1993). Evaluation of a social
children: The need for behavioral skills training. Education and Treatment of
Children, 27(2), 161–177. Retrieved from https://www.jstor.org/stable/42899794
Hogan, A., Knez, N., & Kahng, S. W. (2015). Evaluating the Use of Behavioral Skills
014-9213-9
Homlitas, C., Rosales, R., & Candel, L. (2014). A further evaluation of behavioral skills
Ingersoll, B., Meyer, K., Bonter, N., & Jelinek, S. (2012). A comparison of
language use and social engagement in children with autism. Journal of Speech,
Iwata, B. A., Wallace, M. D., Kahng, S. W., Lindberg, J. S., Roscoe, E. M., Conners, J.,
Iwata, B. A., Wallace, M. D., Kahng, S. W., Lindberg, J. S., Roscoe, E. M., Conners, J.,
https://doi.org/10.1901/jaba.2000.33-181
Janzen, H. L., Obrzut, J. E., & Marusiak, C. W. (2004). Test Review: Roid, G. H. (2003).
Publishing. Canadian Journal of Psychology, 19(1), 235–244.
https://doi.org/10.1177/082957350401900113
Jimenez-Gomez, C., McGarry, K., Crochet, E., & Chong, I. M. (2019). Training
https://doi.org/10.1002/bin.1666
Johnson, B. M., Miltenberger, R. G., Egemo-Helm, K., Jostad, C. M., Flessner, C., &
Johnson, B. M., Miltenberger, R. G., Knudson, P., Egemo-Helm, K., Kelso, P., Jostad,
C., & Langley, L. (2006). A preliminary evaluation of two behavioral skills training
Johnson, C. J., Beitchman, J. H., Young, A., Escobar, M., Atkinson, L., Wilson, B., …
https://doi.org/10.1044/jslhr.4203.744
Jones, E. A., Carr, E. G., & Feeley, K. M. (2006). Multiple effects of joint attention
https://doi.org/10.1177/0145445506289392
Jull, S., & Mirenda, P. (2016). Effects of a staff training program on community
https://doi.org/10.1177/1098300715576797
Kaale, A., Smith, L., & Sponheim, E. (2012). A randomized controlled trial of preschool-
based joint attention intervention for children with autism. Journal of Child
https://doi.org/10.1111/j.1469-7610.2011.02450
Kaiser, A. P., Hester, P. P., Alpert, C. L., & Whiteman, B. C. (1995). Preparing parent
https://doi.org/10.1177/027112149501500401
Kaiser, A. P., Ostrosky, M. M., & Alpert, C. L. (1993). Training Teachers to Use
Children. Research and Practice for Persons with Severe Disabilities, 18(3), 188–
199. https://doi.org/10.1177/154079699301800305
King, T. M., Rosenberg, L. A., Fuddy, L., McFarlane, E., Sia, C., & Duggan, A. K.
(2005). Prevalence and early identification of language delays among at-risk three
https://doi.org/10.1097/00004703-200508000-00006
Koegel, R. L., Russo, D. C., & Rincover, A. (1977). Assessing and training teachers in
Applied Behavior Analysis, 10(2), 197–205. https://doi.org/10.1901/jaba.1977.10-
197
Kolko, D., Watson, S., & Faust, J. (1991). Fire safety prevention skills training to reduce
Kornacki, L. T., Ringdahl, J. E., Sjostrom, A., & Nuernberger, J. E. (2013). A component
young adults with autism spectrum and other developmental disorders. Research in
https://doi.org/10.1016/j.rasd.2013.07.012
Krumhus, K. M., & Malott, R. W. (1980). The effects of modeling and immediate and
Lane, J. D., & Ledford, J. R. (2014). Using interval-based systems to measure behavior in
early childhood special education and early intervention. Topics in Early Childhood
Law, J., & Roy, P. (2008). Parental report of infant language skills: A review of the
3588.2008.00503.x
Lawrence, J. S., Brasfield, T. L., Jefferson, K. W., Alleyne, E., O’Bannon, R. E., &
adolescents’ risk for HIV infection. Journal of Consulting and Clinical Psychology,
http://onlinelibrary.wiley.com/o/cochrane/clcentral/articles/124/CN-
00114124/frame.html
Ledford, J. R., Lane, J. D., Zimmerman, K. N., Chazin, K. T., & Ayres, K. A. (2016).
http://vkc.mc.vanderbilt.edu/ebip/scarf/
Madzharova, M. S., & Sturmey, P. (2018). Using in-vivo modeling and feedback to teach
https://doi.org/10.1007/s10882-018-9588-y
https://doi.org/10.1177/108835760001500103
Miles, N. I., & Wilder, D. A. (2009). The effects of behavioral skills training on caregiver
405–410. https://doi.org/10.1901/jaba.2009.42-405
21(1), 81–87. https://doi.org/10.1901/jaba.1988.21-81
Miltenberger, R., Gross, A., Knudson, P., Bosch, A., Jostad, C., & Breitwieser, C. B.
(2009). Evaluating behavioral skills training with and without simulated in situ
training for teaching safety skills to children. Education and Treatment of Children,
Miltenberger, R. G., Flessner, C., Gatheridge, B., Johnson, B., Satterlund, M., & Egemo,
https://doi.org/10.1901/jaba.2004.37-513
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items
for systematic reviews and meta-analyses: The PRISMA statement (reprinted from
https://doi.org/10.1371/journal.pmed.1000097
Nabeyama, B., & Sturmey, P. (2010). Using Behavioral Skills Training To Promote Safe
and Correct Staff Guarding and Ambulation Distance of Students With Multiple
https://doi.org/10.1901/jaba.2010.43-341
Nigro-Bruzzi, D., & Sturmey, P. (2010). The effects of behavioral skills training on mand
training by staff and unprompted vocal mands by children. Journal of Applied
Nuernberger, J. E., Ringdahl, J. E., Vargo, K. K., Crumpecker, A. C., & Gunnarsson, K.
Ogletree, B. T., Davis, P., Hambrecht, G., & Phillips, E. W. (2012). Using milieu training
to promote photograph exchange for a young child with autism. Focus on Autism
https://doi.org/10.1177/1088357612441968
Olive, M. L., De La Cruz, B., Davis, T. N., Chan, J. M., Lang, R. B., O’Reilly, M. F., &
Dickson, S. M. (2007). The effects of enhanced milieu teaching and a voice output
https://doi.org/10.1007/s10803-006-0243-6
Palmen, A., & Didden, R. (2012). Task engagement in young adults with high-
https://doi.org/10.1016/j.rasd.2012.05.010
Palmen, A., Didden, R., & Korzilius, H. (2010). Effectiveness of behavioral skills
https://doi.org/10.1016/j.rasd.2010.01.012
Pan-Skadden, J., Wilder, D. A., Sparling, J., Severtson, E., Donaldson, J., Postma, N., …
Neidert, P. (2009). The use of behavioral skills training and in-situ training to teach
Peterson, P. (2004). Naturalistic language teaching procedures for children at risk for
https://doi.org/10.1037/h0100047
Reynell, J. K., & Gruber, C. P. (1990). Reynell Developmental Language Scales. Los
Roid, G. H. (2003). Stanford-Binet Intelligence Scales (5th ed.). Itasca, IL: Riverside
Publishing.
Rosales, R., Stone, K., & Rehfeldt, R. A. (2009). The effects of behavioral skills training
541
Rosenbaum, S., & Simon, P. (2016). Speech and Language Disorders in Children:
Program. Speech and Language Disorders in Children: Implications for the Social
https://doi.org/10.17226/21872
Sarokoff, R. A., & Sturmey, P. (2008). The effects of instructions, rehearsal, modeling,
and feedback on acquisition and generalization of staff use of discrete trial teaching
and student correct responses. Research in Autism Spectrum Disorders, 2(1), 125–
136. https://doi.org/10.1016/j.rasd.2007.04.002
Sarokoff, R. A., & Sturmey, P. (2004). The effects of behavioral skills training on staff
Schreibman, L., Dawson, G., Stahmer, A. C., Landa, R., Rogers, S. J., McGee, G. G., …
Empirically validated treatments for autism spectrum disorder. Journal of Autism &
2407-8
Seiverling, L., Pantelides, M., Ruiz, H. H., & Sturmey, P. (2010). The effect of
behavioral skills training with general case training on staff chaining of child
75. https://doi.org/10.1002/bin.293
Seiverling, L., Williams, K., Sturmey, P., & Hart, S. (2012). Effects of behavioral
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). Stanford-Binet Intelligence Scale:
Fourth Edition (4th ed.). Chicago: Riverside Publishing.
Togram, B., & Erbas, D. (2010). The effectiveness of instruction on mand model - One of
Ward-Horner, J., & Sturmey, P. (2008). The effects of general-case training and
Ward-Horner, J., & Sturmey, P. (2012). Component analysis of behavior skills training in
https://doi.org/10.1002/bin.1339
Warren, S. F., & Gazdag, G. (1990). Facilitating early language development with milieu
https://doi.org/10.1177/105381519001400106
Wechsler, D. (2005). Wechsler Individual Achievement Test 2nd Edition (WIAT II).
Wong, C. S. (2013). A play and joint attention intervention for teachers of young children
https://doi.org/10.1177/1362361312474723
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of
Wright, C. A., & Kaiser, A. P. (2017). Teaching parents enhanced milieu teaching with
https://doi.org/10.1177/0271121415621027
Wurtele, S. K., & Owens, J. S. (1997). Teaching personal safety skills to young children: An
investigation of age and gender across five studies. Child Abuse and Neglect, 21(8),
805–814. https://doi.org/10.1016/S0145-2134(97)00040-9
https://doi.org/10.1016/S0005-7894(05)80186-8
Wurtele, S. K., Saslawsky, D. A., Miller, C. L., Marrs, S. R., & Britcher, J. C. (1986).
688–692. https://doi.org/10.1037/0022-006X.54.5.688
Zimmerman, I. L., Steiner, V. G., & Pond, R. (1979). The Preschool Language Scale--
Revised. Columbus: Charles Merrill.
Appendix A: Outcome Coding Descriptions for Single Case Analysis Review and
Framework (SCARF)
Primary Outcomes
1. Which best characterizes the study's effects? This framework is designed for
analysis of SINGLE STUDIES. Articles may include multiple studies; these
should be evaluated separately. A study is a stand-alone single case design with
a single dependent variable. Studies may include a single or multiple
participants. For ATD studies, assess each condition in comparison to a single
other condition, if these comparisons match your research questions. Note:
Strong effects occur when consistent changes occur between conditions,
overlap is minimal and/or decreasing over time, and there is a clear change in
the expected direction in level, trend, and/or variability. Weak effects occur
when one or more of those characteristics is not present. Non effects occur
when data do not reliably change when condition change occurs, or when data
patterns preclude decision-making. Contratherapeutic effects occur when data
change in an unexpected direction.
a. ATD Designs: Enter 0: data paths undifferentiated, approximately half or
more of data paths are overlapping (approximately the same values or with
some higher values in one condition and some higher values in another
condition). Enter 1: approximately half or more data are overlapping as
described above, but overlap decreases over time. Enter 2: less than half of
data points are overlapping but there is a decreasing or variable
differentiation between conditions (e.g., difference in values between
conditions decreases over time or is not consistent). Enter 3: less than half
of data points are overlapping and there is increasing differentiation over
time (e.g., difference in values between conditions increases over time).
Enter 4: minimal/no overlap occurs, consistent differentiation between
conditions.
b. MB/MP Design: Enter 0: >1 non-effect or any contratherapeutic effect or if
vertical analysis suggests changes in data in one tier is associated with
condition change in another tier. Enter 1: <3 demonstrations of effect, 1
non-effect. Enter 2: >=3 demonstrations, >=1 non-effect. Enter 3: >=3
demonstrations, >=1 weak effects, 0 non-effect. Enter 4: >=3
demonstrations, 0 non-effects/weak effect.
c. Other Designs: Enter 0: >1 non-effect or any contratherapeutic effect.
Enter 1: <3 demonstrations of effect, 1 non-effect. Enter 2: >=3
demonstrations, >=1 non-effect. Enter 3: >=3 demonstrations, >=1 weak
effects, 0 non-effect. Enter 4: >=3 demonstrations, 0 non-effects/weak
effect.
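The MB/MP scoring rules in item b can be sketched as a small function. This is an illustrative reading only, not the official SCARF scoring tool. The function name `score_mb_mp` and its parameters are hypothetical. Two assumptions are made because the rules as quoted overlap and leave some cases uncovered: the rules are checked in order from 0 to 4 (so more than one non-effect always yields 0), and combinations that no rule covers (for example, fewer than three demonstrations with no non-effects) return `None` rather than a guessed score. Demonstrations, weak effects, and non-effects are assumed to be counted separately by the coder.

```python
def score_mb_mp(demonstrations, weak_effects, non_effects,
                contratherapeutic=False, vertical_concern=False):
    """Return a 0-4 primary-outcome score for an MB/MP design, or None.

    Assumption: rules are applied in order 0..4; None means the quoted
    rules do not cover the given combination of counts.
    """
    # Enter 0: >1 non-effect, any contratherapeutic effect, or vertical
    # analysis suggests change in one tier tied to another tier's condition change.
    if non_effects > 1 or contratherapeutic or vertical_concern:
        return 0
    # Enter 1: <3 demonstrations of effect, 1 non-effect.
    if demonstrations < 3 and non_effects == 1:
        return 1
    # Enter 2: >=3 demonstrations, >=1 non-effect (exactly 1 at this point).
    if demonstrations >= 3 and non_effects >= 1:
        return 2
    # Enter 3: >=3 demonstrations, >=1 weak effect, 0 non-effects.
    if demonstrations >= 3 and weak_effects >= 1:
        return 3
    # Enter 4: >=3 demonstrations, 0 non-effects and 0 weak effects.
    if demonstrations >= 3:
        return 4
    return None
```

Under these assumptions, a three-tier design with three clear demonstrations and no weak or non-effects scores 4, while the same design with one weak effect scores 3.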
Generalized Outcomes
1. Which best characterizes the generalization outcomes in the study?
a. Enter 0: no measurement of generalization outcomes. Enter 1: consistent
non-effects or contratherapeutic effects. Enter 2: inconsistent or weak
positive effects. Enter 3: consistent positive effects shown via post-test.
Enter 4: consistent positive effects shown via measurement in context of
design.
Maintained Outcomes
1. Which of the following best characterizes maintenance outcomes for the study?
a. Enter 0: maintenance was not assessed. Enter 1: maintenance data were
similar to pre-intervention/baseline data. Enter 2: maintenance data showed
outcomes that were deteriorating or less optimal than intervention or
criterion. Enter 3: maintenance data showed maintained outcomes similar to
intervention or criterion levels. Enter 4: maintenance data showed
maintained outcomes similar to intervention or criterion levels and on
multiple occasions (e.g., more than one data point).
Appendix B: Observation Data Collection Sheet
Appendix C: Intervention Fidelity Sheets
Date: ______ Workshop #: _______ Session #: _________
Researcher summarizes how the teacher utilized following the child’s lead /1
Researcher asks the teacher whether he/she felt the session length was /1
appropriate for learning
Total /27
Date: _______ Workshop #: ______ Session #: ________
Date: _______ Workshop #: ______ Session #: _________
Date: ______ Workshop #: _____ Session #: ________
Date: _______ Workshop #: _____ Session #: _________
Researcher asks the teacher whether he/she felt the session length was /1
appropriate for learning
Total /39
Date: _______ Workshop #: ______ Session #: ________
Appendix D: Data Collection Sheets
Appendix E: Teacher Demographics Form
General Information
Pseudonym:
Age:
Race/Ethnicity:
Gender:
Current Grade Taught and length of time teaching this grade:
Previous Grades Taught and length of time teaching each grade:
    Grade        Time Taught
Please check all disabilities with which you have instructional experience:
[ ] Autism Spectrum Disorder (ASD)
[ ] Intellectual Disability
[ ] Down syndrome
[ ] Emotional Disturbance (ED)
[ ] Physical Handicap
[ ] Cerebral Palsy
[ ] Specific Learning Disabilities
[ ] Other Health Impairment (OHI)
[ ] Speech or Language Impairment
[ ] Visually Impaired (including blindness)
[ ] Hearing Impaired
[ ] Deafness
[ ] Deaf-Blindness
[ ] Traumatic Brain Injury
[ ] Attention Deficit/Hyperactivity Disorder (ADHD)
[ ] Social Communication Deficits
How many students are in
your current classroom?
What was your area of study
for your highest degree
earned?
coaching from an expert to
support you in the classroom:
Appendix F: Child Demographics Form
General Information
Pseudonym:
Date of Birth:
Gender:
Race/Ethnicity:
Diagnoses:
Grade:
Number of Years at the
Bridge:
Does he/she have a Growth &
Performance Plan?
Has he/she received any
diagnoses? Please list each
one.
[ ] Occupational Therapy:
[ ] ABA:
[ ] Other:
Appendix G: Social Validity Measure
Teacher Feedback Questionnaire
Adapted from: Hendrickson, J. M., Gardner, N., Kaiser, A., & Riley, A. (1993). Evaluation
of a social interaction coaching program in an integrated day-care setting. Journal of
Applied Behavior Analysis, 26(2), 213–225.
Directions: Please answer the following questions based on your experiences with the
research study. Be sure to answer every question and to only circle one response per
question:
Intervention Effects
I believe the intervention techniques taught in this study can be used in most
classrooms to help integrate high-risk children/children with disabilities    1  2  3  4
My participation in learning intervention techniques was worth my effort    1  2  3  4
I would share my intervention skills with other teachers    1  2  3  4
The social communication skills of the student(s) with whom I worked have
improved after this intervention    1  2  3  4
This intervention was beneficial for the students with whom I worked    1  2  3  4
This intervention was beneficial for me as a teacher    1  2  3  4
Social Validity
Overall, I believe that my knowledge about social communication and intervention
techniques has improved    1  2  3  4
I feel more knowledgeable and confident in my ability to help my students improve
their social communication skills after participating in this intervention    1  2  3  4
I feel confident in my ability to set up my classroom and activities to help
encourage opportunities for social communication    1  2  3  4
I feel confident in my ability to incorporate the intervention techniques I have
learned into my daily classroom routines    1  2  3  4
I believe that the intervention techniques are feasible to implement in the
classroom    1  2  3  4
I believe that the intervention techniques can be easily incorporated into the
classroom    1  2  3  4
Strongly Disagree    Disagree    Agree    Strongly Agree
I feel confident in my ability to follow the child’s lead    1  2  3  4
I feel confident in my ability to establish and engage in social routines    1  2  3  4
I feel confident in my ability to use the system of prompts, including linguistic
and non-linguistic prompts and time delay    1  2  3  4
I feel confident in my ability to identify opportunities to set up the environment
to encourage social communication    1  2  3  4
I would participate in a similar project in the future    1  2  3  4
Training
I felt comfortable during the training sessions    1  2  3  4
The training sessions were tailored to my experience level    1  2  3  4
I felt comfortable implementing the intervention techniques after the training
sessions were completed    1  2  3  4
I felt comfortable asking questions during the training sessions    1  2  3  4
I would recommend the training sessions to my colleagues    1  2  3  4