

JOURNAL OF APPLIED BEHAVIOR ANALYSIS 2019, 52, 701–717 NUMBER 3 (SUMMER)

The effects of different mastery criteria on the skill maintenance of children with developmental disabilities

SARAH M. RICHLING AND W. LARRY WILLIAMS
UNIVERSITY OF NEVADA, RENO

JAMES E. CARR
BEHAVIOR ANALYST CERTIFICATION BOARD

The acquisition of skills by individuals with developmental disabilities typically includes the
attainment of a certain mastery criterion. We conducted a survey of practitioners who indicated
the most commonly used mastery criterion as 80% accuracy across three consecutive sessions.
Based on these results, we conducted a series of three experiments to evaluate the relation
between mastery criterion and subsequent skill maintenance with 4 individuals with various
developmental disabilities. Results suggest that 80% accuracy across three consecutive sessions
may be insufficient for producing maintenance in some cases.
Key words: accuracy, developmental disabilities, discrete-trial teaching, maintenance, mastery
criteria, skill acquisition

A number of recent applied behavior-analytic studies have evaluated the effects of different instructional variables during discrete-trial teaching (DTT) on subsequent skill maintenance. These variables have included intertrial intervals (e.g., Cariveau, Kodak, & Campbell, 2016), measurement intensity (e.g., Cummings & Carr, 2009), and task interspersal (e.g., Majdalany, Wilder, Greif, Mathisen, & Saini, 2014), among others. Another important variable that is clearly related to skill maintenance is the specific acquisition criterion, often termed the "mastery criterion" (Luiselli, Russo, Christian, & Wilczynski, 2008). Luiselli et al. (2008) indicated that practitioners generally have used percentage correct or accuracy as the predominant measure of skill acquisition. Several clinical manuals recommend 80% or 90% accuracy for skill mastery (e.g., Anderson, Taras, & O'Malley Cannon, 1996; Leaf & McEachin, 1999). Luiselli et al. recommended that clinicians require learners to meet these accuracy percentages across multiple sessions, days, or technicians. A survey of clinical practices by Love, Carr, Almason, and Petursdottir (2009) reported that 62% of respondents serving as program supervisors of early and intensive behavioral intervention programs indicated using percentage correct measures across multiple sessions to determine the achievement of mastery. The authors did not provide detailed information about common mastery-criterion practices, such as the specific accuracy percentages used or the application of these percentages across a certain number of sessions.

The literature includes reports of a number of mastery criteria, such as 80% correct for a single session (McCormack, Arnold-Saritepe, & Elliffe, 2017), 80% correct across two or three sessions (Najdowski et al., 2009), 90% correct across two sessions (Toussaint, Kodak, & Vladescu, 2016), 90% correct across three sessions (Wunderlich & Vollmer, 2017), 100% for a single session (Grow, Kodak, & Clements, 2017), and 100% across three sessions (Belisle, Dixon, Stanley, Munoz, & Daar, 2016). Mastery criteria vary across multiple dimensions, including percentage correct, number of sessions or trials, and additional variables such as applying those criteria across multiple therapists. Given the ubiquity of mastery criteria in skill-acquisition programs, there is a surprising lack of research empirically validating different mastery criteria against training outcomes. To date, only one published study has explicitly evaluated the impact of mastery criteria on skill maintenance for individuals with developmental disabilities (Fuller & Fienup, 2018).

Fuller and Fienup (2018) conducted a preliminary analysis of the effects of the accuracy-level dimension of mastery on response maintenance with three individuals diagnosed with autism spectrum disorder. The experiment involved teaching sight and spelling words until various mastery-criterion levels were met (i.e., 50%, 80%, and 90% accuracy during a single 20-trial session). Thereafter, the authors assessed response maintenance at weekly intervals across one month. The 90% accuracy criterion resulted in the highest levels of maintained responding for all three participants, whereas the 80% accuracy criterion resulted in maintenance performance below 70% for two of the three participants. The 50% accuracy criterion resulted in the lowest levels of response maintenance. This preliminary research suggests that an 80% mastery criterion across 20 trials may be insufficient for producing acceptable levels of maintained responding. However, given the large number of variations in the dimensions of mastery criteria, additional research is warranted to determine which mastery criteria practitioners utilize most frequently.

The mastery criterion a practitioner establishes for a skill-acquisition program may be considered an independent variable, as various criteria (e.g., 80% vs. 100% correct) might result in differential maintenance outcomes. An experimental evaluation of different mastery-criterion arrangements is warranted to provide behavior analysts with empirical evidence to guide their selection of mastery criteria. Thus, the purpose of the current investigation was to determine commonly used mastery criteria via a survey of practitioners and subsequently to evaluate the effects of different mastery criteria on the skill maintenance of individuals with developmental disabilities.

Author note: This study was conducted by the first author, under the supervision of the second author, in partial fulfillment of the requirements for the PhD degree at the University of Nevada, Reno. We thank Amy Gonzales, Terralyn Tiffer, and Kendall Temple for their assistance with data collection and Matthew Locey, Linda Hayes, MaryAnn Demchak, and Stephen Rock for their helpful comments on the dissertation. Correspondence should be addressed to the first author at Department of Psychology, 226 Thach Hall, Auburn University, AL 36849-5214; email: smr0043@auburn.edu. doi: 10.1002/jaba.580. © 2019 Society for the Experimental Analysis of Behavior.

EXPERIMENT 1: A SURVEY OF CLINICAL PRACTICES

To acquire more specific information about mastery criteria used by practitioners in advance of experimental research on the topic, the first author conducted a survey of Board Certified Behavior Analysts® (BCBAs®) who supervise behavior-analytic services for individuals with developmental disabilities. The mastery-criteria dimensions evaluated in Experiments 2 through 4 were obtained from the results of this survey.

Method

Participants and materials. Participants for this phase of the experiment were recruited by emailing invitations to 15,677 individuals through the Behavior Analyst Certification Board® (BACB®) mass email service who: (a) resided within the United States; (b) were certified at the doctoral (BCBA-D®) or master's level (BCBA®); (c) indicated a primary emphasis of work in behavior analysis, behavior therapy, education, or positive behavior support; (d) indicated a primary area of work in intellectual disabilities, autism, special education, or college education; and (e) indicated their primary age group of clients as children or adolescents. The initial email invitation (see Supporting Information) indicated that individuals who are involved in
service delivery in the area of intellectual disabilities and autism should complete the survey. The survey (see Supporting Information) included 23 multiple-choice and fill-in-the-blank questions about respondents' certification status and practice area, as well as their clinical practices related to mastery criteria (e.g., dependent variable used, number of sessions applied, source of procedures). One hundred ninety-nine individuals completed the survey. The sections below describe the results for each of the areas addressed by the survey. Note that the percentages reported are based on the number of responses for a specific question, which did not necessarily equal the total number of survey respondents.

Results and Discussion

Five questions addressed general characteristics of the respondents and the clients served. Eighty-four percent (n = 168) of respondents indicated that they were currently certified as a BCBA®, and 16% (n = 31) indicated that they were currently certified as a BCBA-D®. Seventy-nine percent (n = 158) of respondents indicated that the highest degree held in behavior analysis was a master's degree, and 21% (n = 41) indicated that the highest degree held was a doctoral degree. With respect to the year in which the highest degree was earned, responses ranged from 1985 to 2016 (n = 149). Sixty-six percent (n = 98) of respondents received their degree between 2007 and 2016, 18% (n = 37) between 1997 and 2006, and 9% (n = 14) between 1985 and 1994. The highest percentage of respondents (74%, n = 147) indicated their primary role in behavior analysis was that of a practitioner, 16% (n = 32) an administrator, 9% (n = 18) a faculty member, and 1% (n = 1) a student. Regarding the age distribution of the clients served, results showed that across settings the average proportion of clients served was 32% for ages 0-6 years, 28% for ages 7-12 years, 12% for ages 13-17 years, and 5% for ages 18 years or older.

Sixteen questions addressed clinical practices related to the mastery criteria used within respondents' clinical practice. Survey results indicated that 68% of respondents (n = 132 of 194) use mastery criteria based on a certain percentage of correct trials (i.e., session based), 28% (n = 55 of 194) use a certain number of correct trials in a row, and 4% (n = 7 of 194) use a number of responses per unit of time (rate). Of the individuals who use a certain percentage of correct trials, 57% (n = 75 of 131) require the percentage correct to be met across multiple sessions in combination with other variables (e.g., in the presence of two or more technicians), and 35% (n = 46 of 131) require the percentage correct to be met across multiple sessions without other variables. Only 8% (n = 10 of 131) indicated they apply a percentage of correct trials during one session (i.e., a single trial block). Additional results from respondents who use a certain number of correct trials in a row or a rate of response per unit of time have not been included in this report.

Most notably, 54% of individuals (n = 70 of 130 respondents) reported using an 80% criterion across one or more sessions, 28% (n = 36 of 130) reported using a 90% criterion, and 7% (n = 9 of 130) reported using a 100% criterion. In addition, 6% (n = 8 of 130) reported using between an 81% and 89% criterion across one or more sessions, 5% (n = 6 of 130) reported using between a 91% and 99% criterion, and 1% (n = 1 of 130) reported using below an 80%
criterion. Respondents reported applying these criteria across three (50%, n = 65 of 130 respondents), two (22%, n = 28 of 130), more than four (13%, n = 17 of 130), or four (8%, n = 10 of 130) consecutive sessions. As previously mentioned, the remaining 8% of respondents (n = 10 of 130) reported applying these criteria during a single session. The survey also asked participants to indicate the primary information source contributing to their use of the specific mastery criteria. The highest percentage (44%, n = 83 of 188 respondents) specified their previous supervised experience as the source of their mastery criteria. Twenty percent (n = 37 of 188) identified employer policies or requirements, 16% (n = 30 of 188) indicated their graduate school, 10% (n = 19 of 188) indicated continuing-education experiences (e.g., workshops), 9% (n = 16 of 188) indicated regulatory requirements (e.g., Individualized Education Plans), and 2% (n = 3 of 188) indicated a funding source.

Results from this phase of the experiment confirm the aforementioned conjectures of various authors (i.e., Anderson et al., 1996; Leaf & McEachin, 1999; Luiselli et al., 2008) that clinicians often use a percentage of correct responding as a measure of skill mastery. The findings also extend the previous survey findings of Love et al. (2009) by demonstrating that the highest percentage of BCBAs® providing services to individuals diagnosed with autism spectrum disorder and intellectual disabilities report using 80% correct trials across three sessions as a measure of mastery. In addition, results indicate that the largest proportion of respondents adopted this mastery-criterion level from their supervised experience. Finally, results show the largest number of clients served is between 0 and 12 years of age. Given the common use of the 80%-for-three-consecutive-sessions mastery criterion with this pool of clients and the lack of research evaluating the impact of mastery criteria on skill maintenance, the following experiments were designed to empirically evaluate the effects of the most commonly reported mastery-criterion dimensions on skill maintenance.

EXPERIMENT 2

We designed Experiment 2 to evaluate the effects of an 80% mastery criterion across three sessions on skill maintenance. We selected this criterion because the data from Experiment 1 indicated it was the most commonly reported mastery criterion. We compared this criterion to both a 60% and a 100% mastery criterion across three sessions, which we included to identify whether a parametric relation between mastery criteria and skill maintenance existed.

Method

Participants and setting. Four children with developmental disabilities who had normal vision and hearing and were between 6 and 9 years old participated in this experiment. We selected participants from a comprehensive life-skills classroom for students with mild to moderate support needs within a public elementary school. All participants had passed Level 6 on the Assessment of Basic Learning Abilities-Revised (DeWiele, Martin, Martin, Yu, & Thomson, 2011), indicating they demonstrated discriminated responding toward a combination of auditory and visual stimuli. Evan was a 6-year-old boy diagnosed with autism spectrum disorder who could respond to multiple-step instructions, speak in short sentences, exhibit all verbal operants, and engage in simple conversational verbal exchanges. He received a standard score of 51 (average range = 85-115) on the Peabody Picture Vocabulary Test (PPVT; Dunn &
Dunn, 1997). Sandy was a 7-year-old girl diagnosed with health impairment and intellectual disability who could respond to multiple-step instructions, speak in short sentences, exhibit all verbal operants, and engage in simple conversational verbal exchanges. She received a standard score of 52 on the PPVT. Cyril was a 9-year-old boy diagnosed with health impairment and intellectual disability who could respond to multiple-step instructions, speak in short sentences, exhibit all verbal operants, and engage in conversational verbal exchanges. He received a standard score of 66 on the PPVT. Adam was a 6-year-old boy diagnosed with health impairment, Williams syndrome, and intellectual disability who could respond to multiple-step instructions, speak in complex sentences, exhibit all verbal operants, and engage in conversational verbal exchanges. He received a standard score of 89 on the PPVT.

An experimenter conducted sessions in a 3.5 x 4.6 m therapy room at the school during normal school hours. The room contained a table, chairs for the participant and the experimenter, and materials relevant to the experiment. The experimenter placed a video camera in the room in an unobtrusive manner and videotaped most sessions for subsequent data scoring. A secondary observer (one of two trained assistants) was present for some sessions to collect interobserver agreement and treatment integrity data.

Stimulus preference assessments. The experimenter provided contingent access to preferred food or tangible items as a consequence for correct responses during training sessions and provided noncontingent access to these items during baseline and weekly follow-up probes. Prior to the experiment, the first author asked the classroom teacher to identify potential preferred food or tangible items. The experimenter then conducted a single-array multiple-stimulus-without-replacement preference assessment (DeLeon & Iwata, 1996) prior to each 10-trial session. Each assessment included either seven edible items or seven tangible items (this varied across participants). The experimenter then used the first three stimuli selected by each participant during these assessments for that session. Prior to the onset of the first trial, the experimenter asked the participant which item they would like to work for from the array of the top three items. After each consequence delivery following correct responding, the therapist again asked the participant to choose an item from the array of three. The first author instructed the teacher to restrict access, as much as possible, to all items included in the assessment on session days for the duration of the experiment.

Response definitions and measurement. We collected data on the percentage of discrete trials in which the participant made a correct response during each 10-trial session. Correct responses included independent responses that occurred within 3 s following the presentation of the relevant discriminative stimulus. We only recorded responses for trials during which a participant made a response to the instruction within the relevant response class. For example, we only recorded a correct or incorrect response if the child pointed to one of the three comparison cards in the receptive identification program and did so without prompting. If a response was not made within 3 s, the trial would be repeated until a response was made; the purpose of this protocol was to ensure that nonresponses due to noncompliance were not included in the dependent measure. The re-presentation of trials due to nonresponding was never required during the experiment.

Target acquisition program. The acquisition program taught during this experiment was receptive identification, an auditory-visual conditional discrimination. We taught participants three sets of three target stimuli. The stimuli used for this program were nine 15.2 x 15.2 cm cards with color-printed pictures of animals, plants, and food items (see Table 1) on a white background. The first author and
Table 1
Target Stimuli, Experiments 2-4

Stimulus Set                     Experiment 2                   Experiment 3                     Experiment 4
Set 1                            Willow, Magnolia, Succulent    Kiwi (animal), Ginger, Octagon   Alaska, Geyser, %
Set 2                            Lemur, Walleye, Spookfish      Blowfish, Star Fruit, Rhombus    Florida, Canyon, &
Set 3                            Armadillo, Gibbon, Wallaby     Sloth, Dragon Fruit, Spiral      Texas, Rainforest, @
Non-Experimental Target Set 1    N/A                            N/A                              Raccoon, Cockatoo, Fiddleheads
Non-Experimental Target Set 2    N/A                            N/A                              Daffodil, Paw Paw, Wombat

the classroom teacher collaboratively selected target stimuli for Studies 2-4. The teacher aided in identifying target words that included sound blends and numbers of syllables that all participants had previously demonstrated the ability to pronounce and respond to differentially. In addition, the stimuli depicted things and included words that the classroom teacher indicated she was unlikely to teach the participant outside of the experimental sessions. For each participant, each stimulus set was assigned to one of the three mastery criteria (i.e., 100%, 80%, or 60% correct for three consecutive sessions). We counterbalanced mastery criteria across stimulus sets and across participants, and we used the same stimuli across all participants to allow for counterbalancing of stimulus sets and criterion levels. We assigned stimulus set 1 to the 100%, 60%, 80%, and 80% criteria for participants 1, 2, 3, and 4, respectively. We assigned stimulus set 2 to the 60%, 80%, 100%, and 60% criteria for participants 1, 2, 3, and 4, respectively. We assigned stimulus set 3 to the 80%, 100%, 60%, and 100% criteria for participants 1, 2, 3, and 4, respectively. During a given session, the experimenter placed three stimulus cards from one of the sets on the table in front of the child. The experimenter then provided an instruction (e.g., "Point to lemur") and recorded a correct response if the participant touched the card as instructed. The experimenter randomly rotated the position of target stimuli and corresponding instruction throughout each 10-trial session. Training among the three sets rotated randomly across sessions for each participant.

Experimental design. We used a nonconcurrent multiple-baseline design across participants with an embedded modified alternating treatments design to permit within-subject treatment comparisons and between-subject replications. We evaluated the maintenance of skills following teaching until participants met a mastery criterion of 80% correct responding for three consecutive sessions, as compared to both a mastery criterion of 100% and of 60% correct responding for three consecutive sessions. The experiment involved three conditions: baseline, teaching, and weekly follow-up probes. For each participant, we conducted three to five training sessions per day, 3 to 5 days per week, until the onset of weekly follow-up probes.

Baseline. The experimenter did not provide reinforcement for correct responses or prompting for incorrect responses. The experimenter provided access to preferred stimuli for 15 s (food items) or 30 s (tangible items) noncontingently on a 1-min schedule. The purpose of this protocol was to decrease the probability of noncompliant behavior associated with escape from demands.

Teaching. Teaching sessions were similar to baseline sessions except that the experimenter delivered praise on a continuous schedule for correct responses (i.e., a fixed-ratio [FR] 1 schedule) and 15 s (food items) or 30 s (tangible items) of access to the preferred stimuli on a variable-ratio (VR) 3 schedule. All participants used a
token system for their normal classroom acquisition tasks, which required five tokens to exchange for preferred items/activities (i.e., an FR-5 schedule). Given this history, we selected a VR-3 schedule for tangible/food items and an FR-1 schedule of praise because they would likely be sufficiently dense, protect against satiation, and not disrupt compliance with demands in the classroom setting. In addition, incorrect responses resulted in the use of a least-to-most prompting procedure that included gestural, partial-physical, and full-physical prompts. No participant required prompts beyond the gestural level during the experiment. Prompted responses did not result in reinforcer delivery; correct responses after a prompt resulted in the experimenter saying "okay" in a neutral tone. We conducted training sessions for a given set of stimuli until responding met the designated mastery criterion (i.e., 60%, 80%, or 100% for three consecutive sessions). After the participant met the designated mastery criterion for each set, the weekly follow-up probe phase began. Thus, the experimenter conducted weekly follow-up probes with some of the sets of stimuli while some of the other sets were still in the teaching phase.

Weekly follow-up probes. Weekly follow-up probes were identical to baseline. These maintenance probes occurred at approximately one, two, three, and four weeks following acquisition for each stimulus set. Each maintenance probe consisted of a single 10-trial session.

Interobserver agreement and procedural integrity. A second observer independently collected data in situ or from video recordings of sessions. We used these data to calculate interobserver agreement (IOA) and procedural integrity scores. We assessed IOA on a trial-by-trial basis by dividing the number of trials in agreement by the total number of trials and then multiplying by 100%. We assessed IOA for at least 33% of baseline sessions (M = 42%), 33% of teaching sessions (M = 37%), and 33% of follow-up sessions (M = 42%) for each participant. IOA was 100% across all conditions for each participant.

Procedural integrity and IOA on procedural integrity were assessed for at least 33% of teaching sessions (M = 37%) for each participant. To assess procedural integrity, a second observer measured the experimenter's behavior during each trial to assess whether (a) the instruction was delivered correctly, (b) the experimenter accurately recorded correct or incorrect responses, and (c) the experimenter followed the designated reinforcement schedule or prompting hierarchy (if needed) immediately following the participant's response. The first author assessed the procedural integrity score by dividing the number of correct experimenter responses observed by the total number of responses. Procedural integrity was 100% for each participant. The first author determined IOA on procedural integrity by comparing the overall procedural integrity scores between two independent observers for each session. IOA on procedural integrity was 100% for each participant.

Results and Discussion

Figure 1 displays the results of Experiment 2 for each participant. All participants achieved the designated mastery criterion for all three receptive identification task sets during the teaching condition. For the 100% criterion, response accuracy was at or above 80% correct for all four participants across all weekly follow-up sessions. For the 80% criterion, response accuracy maintained near or slightly below mastery performance levels for two of the four participants (Sandy, third panel, and Adam, bottom panel) during the weekly follow-up condition. For one of the four participants (Cyril, second panel), the 80% criterion resulted in an immediate decrease in response accuracy to an average of 47.5% correct across weekly follow-up sessions. For one of the four
[Figure 1 appears here: four panels (Evan, Cyril, Sandy, Adam) plotting percentage correct across baseline, teaching, and weekly follow-up sessions, with separate series for the 100%, 80%, and 60% criterion sets.]

Figure 1. Percentage of correct responses across participants during the 100%-, 80%-, and 60%-for-three-sessions
mastery criteria series in Experiment 2.

participants (Evan, top panel), the 80% criterion resulted in an immediate drop to baseline levels during the weekly follow-up condition. This was similar for the 60% criterion, except for one participant (Adam, bottom panel), for whom the 60% criterion resulted in higher levels of responding in follow-up than did the 80% criterion. In summary, although there was a parametric effect of mastery criterion level on maintenance, only skills taught using a 100% mastery criterion maintained at greater than 80% accuracy.
The findings with respect to the 80% mastery criterion do not support the consistent use of this criterion across all individuals when teaching a receptive identification task to produce maintained responding. However, these results are preliminary and warrant further investigation to determine the extent to which the observed patterns are replicable. For example, it is possible the observed patterns are unique to specific individuals' performance on a specific acquisition task. In addition, the forced-choice selection arrangement in the current experiment allows for chance responding on individual trials. A nonchoice-based task would allow for a purer detection of differences in maintained responding across mastery criteria. As such, we conducted the following experiment, which involved teaching a different type of acquisition task in a nonforced-choice selection format.

EXPERIMENT 3

We designed Experiment 3 to systematically replicate Experiment 2 and to extend the investigation to a second acquisition task. All other procedures were identical to Experiment 2 (i.e., participants, settings, programmed consequences, session structure, instructional procedures, and experimental design).

Target Acquisition Program

The acquisition program taught during this experiment was vocal tacting. Three sets of three pictures were included in this curricular program, with each set randomly assigned to one of the three mastery criteria (i.e., 60%, 80%, or 100% for three consecutive sessions). As in Experiment 2, we counterbalanced these assignments across participants. The stimuli used for this program were nine 15.2 x 15.2 cm cards with color-printed pictures of animals, plants, food items, and geometric or abstract shapes (see Table 1) on a white background. These stimuli were different from those taught in Experiment 2 and depicted things the teacher would not likely target outside of the experimental sessions, as confirmed by the classroom teacher. During a given session, the experimenter presented one of the three cards from one of the sets to the child and said, "What is this?" The experimenter recorded a correct response if the participant provided the appropriate vocal response assigned to the presented picture. Incorrect responses (including the participant saying, "I don't know") resulted in the presentation of prompts in a least-to-most procedure that included a partial vocal prompt (i.e., the first letter sound of each target word) and a full vocal prompt.

Interobserver Agreement and Treatment Integrity

The first author assessed IOA for at least 33% of baseline sessions (M = 42%), 40% of acquisition sessions (M = 47%), and 42% of follow-up sessions (M = 48%) for each participant. IOA was 100% across all conditions for each participant. The first author assessed procedural integrity for at least 40% of acquisition sessions (M = 47%) for each participant. Across all participants, procedural integrity was 100% and IOA for these data was 100%.

Results and Discussion

Figure 2 displays the results of Experiment 3 for each participant. All participants achieved the designated mastery criterion for all three vocal tacting task sets during the teaching condition. With respect to the 100% criterion, response accuracy was at or above 70% for all four participants across all weekly follow-up sessions. With respect to the 80% and 60% criteria, results were fairly idiosyncratic across participants. For Sandy (top panel), the 80% criterion resulted in an immediate decrease to zero-level responding across all weekly follow-up sessions,
710 SARAH M. RICHLING et al.

Baseline Teaching Weekly Follow-up


100 100

80 80
100% Criterion
60 60
80% Criterion
40 40 60% Criterion

20 20
Sa ndy
0 0
0 5 10 15 20 25 30

100 100

80 80

60 60
Percentage Correct

40 40

20 20
Evan
0 0
0

100 100

80 80

60 60

40 40

20 20
Cyril
0 0
10

15

20

25

30
0

100 100

80 80

60 60

40 40

20 20
Adam
0 0
0 5 10 15 20 25 30 1 2 3 4
Sessions Week

Figure 2. Percentage of correct responses across participants during the 100%-, 80%- and 60%- for-three-sessions
mastery criteria series in Experiment 3.

criterion results. For Evan (second panel), the 80% criterion resulted in an immediate decrease to 30% during the weekly follow-up condition and continued decreasing to zero for the final two sessions. These levels fell below those observed for the 60% criterion. For Cyril (third panel), the 80% criterion resulted in response accuracy that maintained near mastery levels, on average, and above the levels observed for the 60% criterion during the weekly follow-up
condition. In addition, on average, follow-up responding was similar across the 100% and 80% criterion sets. Finally, for Adam (bottom panel), the 80% criterion resulted in maintained response accuracy of 80% at the 1-week follow-up session, which subsequently decreased to 60% across the final three weekly follow-up sessions. These levels were only slightly higher than those observed for the 60% criterion.

The results of Experiment 3 replicate and extend the findings of Experiment 2. These data provide further evidence against the consistent use of an 80% mastery criterion. Skill maintenance followed similar patterns across a second type of acquisition skill, and response patterns did not appear to be unique to specific participants across the two experiments. Again, the results of the experiment are preliminary, and additional replications are necessary to determine recommended mastery criteria. Moreover, the question remains whether the mastery criterion needs to be 100% to produce acceptable levels of maintained responding, or whether the second-most commonly used mastery criterion (i.e., 90%) is sufficient. To provide further evaluation of the 80% criterion and a preliminary analysis of the 90% criterion, we designed the following experiment.

EXPERIMENT 4

Experiment 4 was identical to Experiment 3 (i.e., participants, settings, programmed consequences, session structure, experimental design, and target acquisition program) except that we evaluated a new mastery criterion (i.e., 90%) in lieu of the 60% mastery criterion, as well as four other procedural variations. These variations (described in more detail below) included: a) the inclusion of stimulus targets identified by the classroom teacher as targets the student would likely learn in future school curricula, b) the inclusion of nonexperimentally targeted stimulus sets, c) an increased number of sessions conducted per day, and d) a single maintenance probe session.

Procedures

We evaluated the maintenance of skills following teaching until participants met a mastery criterion of 80% accuracy for three consecutive sessions, compared to mastery criteria of 90% and 100% accuracy for three consecutive sessions. Again, vocal tacting was taught; however, novel stimuli and target responses were utilized. We included three target sets of three pictures in this curricular program. We randomly assigned each set to the three mastery criteria (i.e., 80%, 90%, or 100%). These stimuli differed from those used in Experiments 2 and 3 in that they included colored pictures of states and natural land formations, and black and white symbols (see Table 1). We included these stimuli because the teacher identified them as more representative of stimuli commonly taught in a classroom setting. The teacher reported these were not stimuli currently being taught in any other context but would likely be taught at some point in future years.

In addition, Experiment 4 differed from the previous experiments in that after a given set of stimuli was mastered (per the designated criterion), a new set of nonexperimental target stimuli (see Table 1) was introduced and taught such that the same number of target stimuli were always in acquisition simultaneously. This procedural variation was included to control for possible confounding effects of teaching fewer total synchronous stimuli as participants mastered previous stimulus sets. We did not apply any mastery criteria to these two sets, and training continued regardless of performance. Graphical displays do not include data from these sets. Two participants exhibited response accuracy levels that would have met the 80% criterion across three sessions for one of the two nonexperimental target sets each. However,
each of those participants met this criterion on the final teaching session. For the other participants and nonexperimental target sets, responding remained below the lowest criterion level.

This experiment also differed from the previous two experiments in that it included a greater number of sessions per day to accommodate a short timeline prior to an extended academic break. As such, each participant completed six to nine training sessions per day, 5 days per week. Finally, we only assessed maintenance 1 week following acquisition for each set, to accommodate the extended academic break. However, because the first follow-up assessment after teaching is the purest representation of maintenance before a history of extinction can affect responding, these data maintain interpretive validity.

Interobserver Agreement and Procedural Integrity

We assessed IOA for at least 33% of baseline sessions (M = 53%), 33% of acquisition sessions (M = 38%), and 40% of follow-up sessions (M = 50%) for each participant. IOA was 100% across all conditions for each participant. We assessed procedural integrity for at least 33% of acquisition sessions (M = 38%) for each participant. Across all participants, procedural integrity was 100% and IOA for these data was 100%.

Results and Discussion

Figure 3 displays the results of Experiment 4 for each participant. All participants achieved the designated mastery criterion for all three vocal tacting sets during the teaching condition. For the 100% criterion, response accuracy was at or above 70% for all four participants during the follow-up probe. For the 90% criterion, response accuracy immediately fell to 0% for three of the four participants (Cyril, second panel; Adam, third panel; Sandy, bottom panel) and decreased to 40% for the remaining participant (Evan, top panel). Finally, the 80% criterion resulted in an immediate decrease to 0% response accuracy for two participants (Evan, top panel; Sandy, bottom panel), an immediate decrease to 20% during follow-up for one participant (Cyril, second panel), and a decrease to 60% for one participant (Adam, third panel).

Figure 4 displays performance on the first follow-up probe session (i.e., 1 week after the mastery criterion was met) for all participants across Experiments 2 (top panel), 3 (middle panel), and 4 (bottom panel). Across all 12 evaluations, the 100% criterion resulted in the highest maintenance levels in all but one instance (Sandy, top panel) and was at or above 70% accuracy (M = 86%). Across four comparisons (Experiment 4), the 90% criterion resulted in maintenance at 10% accuracy, on average. The 80% criterion resulted in variable maintenance that ranged from 0% to 80% accuracy (M = 37%). Across eight evaluations (Experiments 2 and 3), the 60% criterion resulted in maintenance that averaged 52.5% accuracy.

These results replicate and extend the results of Experiments 2 and 3, and provide further evidence against the use of the 80% mastery criterion and preliminary evidence against the use of a 90% mastery criterion. The results of Experiment 4 are particularly surprising because, for half of the participants, the 80% mastery criterion resulted in better maintenance than did the more stringent 90% criterion. However, in the current preparation, the difference between obtaining 80% accuracy and 90% accuracy was only one trial in a given session. As such, the difference was marginal, and the specific patterns for two of only four participants might be due to chance, although the same could be said for the difference between the 90% criterion and the 100% criterion.
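The session-level decision rule compared throughout these experiments (N% accuracy across three consecutive sessions) can be sketched in code. The sketch below is illustrative only: the function names are ours, and the 10-trial session length is an assumption consistent with the one-trial difference between 80% and 90% accuracy noted above, not a detail reported in the method.

```python
def session_accuracy(trial_results):
    """Percentage of correct responses in one session (True = correct trial)."""
    return 100.0 * sum(trial_results) / len(trial_results)

def met_mastery(session_accuracies, threshold, consecutive=3):
    """True once `threshold`% accuracy occurs on `consecutive` sessions in a row."""
    run = 0
    for accuracy in session_accuracies:
        run = run + 1 if accuracy >= threshold else 0
        if run >= consecutive:
            return True
    return False

# With an assumed 10-trial session, 80% vs. 90% accuracy is a one-trial difference:
print(session_accuracy([True] * 8 + [False] * 2))  # 80.0
print(met_mastery([60, 80, 80, 90, 100, 100], threshold=80))   # True
print(met_mastery([60, 80, 80, 90, 100, 100], threshold=100))  # False
```

Under this rule, a learner whose accuracy hovers just above the threshold can be declared "mastered" while still emitting errors in every session, which is the mechanism the discussion above suggests may undermine maintenance for the sub-100% criteria.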

[Figure 3: line graphs of percentage correct across Baseline, Teaching, and 1-Week Follow-up conditions for Evan, Cyril, Adam, and Sandy (top to bottom), under the 100%, 90%, and 80% criteria.]

Figure 3. Percentage of correct responses across participants during the 100%-, 90%-, and 80%-for-three-sessions mastery criteria series in Experiment 4.
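The summary in Figure 4 reduces each criterion to the mean of the participants' first follow-up probes. A minimal sketch of that aggregation follows; the per-participant accuracies here are entirely hypothetical placeholders (the study's values appear only graphically in Figure 4), and only the mean computation itself reflects the article's analysis.

```python
# Hypothetical first follow-up probe accuracies (%), one value per participant.
# These numbers are illustrative only; see Figure 4 for the study's data.
first_probe = {
    "100%": [70, 80, 90, 100],
    "80%": [0, 20, 60, 80],
    "60%": [0, 40, 60, 70],
}

for criterion, accuracies in first_probe.items():
    mean = sum(accuracies) / len(accuracies)
    print(f"{criterion} criterion: M = {mean:.1f}%")  # e.g., "100% criterion: M = 85.0%"
```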

GENERAL DISCUSSION

We designed the current study to evaluate the extent to which the most commonly reported mastery criterion (i.e., 80% accuracy for three sessions) resulted in skill maintenance for children with developmental disabilities. This study involved a comparison of this mastery criterion with three other mastery criteria

(i.e., 60%, 90%, and 100% accuracy for three sessions). We made comparisons across two curricular programs and four children for a total of 12 data sets. Maintenance following teaching to 80% accuracy for three consecutive sessions only occurred in approximately half of the 12 evaluations.

[Figure 4: bar graphs of percentage correct on the first follow-up probe for Evan, Cyril, Sandy, and Adam; top panel: Study 2 (100%, 80%, and 60% criteria); middle panel: Study 3; bottom panel: Study 4 (100%, 90%, and 80% criteria).]

Figure 4. Percentage correct on the first follow-up probe session (i.e., 1 week following meeting the mastery criterion) for all participants across Experiments 2-4.

These findings may suggest that the additional acquisition sessions required to meet a 100% mastery criterion produced higher levels of maintenance than did the 60%, 80%, and 90% mastery criteria. That is, for 9 of the 12 evaluations, the 100% mastery criterion resulted in the greatest number of teaching sessions and provided a greater number of reinforcer presentations during acquisition. Across all experiments and participants, the 100% mastery criterion required an average of 14 sessions (range, 4 to 33), the 90% criterion required an average of 13 sessions (range, 4 to 22), the 80% criterion required an average of seven sessions (range, 3 to 16), and the 60% criterion required an average of four sessions (range, 3 to 4). However, although the 80% mastery criterion produced a greater number of reinforcer presentations than the 60% criterion (as did the 90% criterion relative to the 80% criterion), it did not always result in superior performance during follow-up probes.

The most obvious explanation for why the 100% mastery criterion resulted in the best maintenance is the absence of errors at the end of the instructional period, which meant that the proper stimuli exerted discriminative control over the target responses. For the remaining criteria, it would be possible for the participant to make an error on the first presentation of a given stimulus, receive error correction, and then perform accurately on subsequent presentations of that particular stimulus. This could result in the individual never responding accurately to that stimulus in the absence of at least one prompt for that trial block. This possibility may provide support for the practice of requiring that the response to the first presentation of a stimulus be unprompted. In the present study, the stimulus sets consisted of three target stimuli, and as such, the first-trial data may be insufficient for capturing this possibility. Rather, it might be necessary to require an independent response to the first presentation of each stimulus. We did not design the current investigation to evaluate the effects of the number of reinforcer presentations, independent first-trial/stimulus presentation requirements, or the number of sessions across which participants must demonstrate accuracy. Future researchers should evaluate the extent to which mastery criteria that require components such as accuracy on the first trial or accurate responding across a greater number of sessions affect the subsequent maintenance of responding.

Several procedural arrangements varied across the experiments, and the potential limitations of

these variations warrant further discussion. A potential limitation of Experiment 4 was the inclusion of only a single follow-up probe. Although the first maintenance probe is arguably the purest demonstration of maintenance prior to the onset of extended extinction effects, there were several instances in Experiments 2 and 3 in which performance levels increased on subsequent probes. This pattern may be indicative of an extinction-related phenomenon (i.e., an increase in the previously reinforced accurate response in the absence of the contingent delivery of reinforcers). These points warrant future research and discussion regarding the most socially valid test of maintenance within the constraints of empirical research. In addition, in Experiment 4, investigators conducted more sessions per day than in Experiments 2 and 3. It is possible that error correction in early sessions affected responding in subsequent sessions conducted on the same day. As such, participants would be more likely to meet the mastery criteria on a single day than across multiple days. Future researchers should evaluate the extent to which adding a time dimension to the mastery criteria, beyond sessions alone, affects the maintenance of response accuracy for longer durations of time.

On a similar note, all of the mastery criteria examined in Experiment 4, including the 100% criterion, resulted in lower levels of maintained responding than in Experiments 2 and 3. One potential explanation for this outcome may be the inclusion of the nonexperimental target stimulus sets. The inclusion of these additional sets resulted in teaching considerably more targets overall. Future researchers should evaluate the extent to which the number of targets included in a massed-trial teaching format affects skill acquisition and subsequent maintenance of response accuracy. In addition, it is possible that the inclusion of these additional sets resulted in increased probabilities of errors in the form of previously reinforced responses. That is, the entire experimental context may serve as a discriminative stimulus in the presence of which investigators previously reinforced a larger number of responses. In the current study, there were several occasions in which a participant erred by emitting a response that was a correct tact for a stimulus in another set or emitted a blend of tact words across various sets. Interestingly, there were several instances in which a participant emitted a tact response in Experiments 3 and 4 identical to a verbal discriminative stimulus provided by the trainer in the previous receptive identification program in Experiment 2. The inclusion of the nonexperimental target stimuli resembles how individuals may teach skills in a natural environment. Once an individual masters a particular skill, the teacher likely adds a new target skill to the overall programming rather than waiting until the individual has mastered all skills before introducing new targets. Future empirical researchers utilizing similar comparative preparations should consider the adoption of nonexperimental target sets as a way to more closely emulate typical teaching situations.

It is worth noting several aspects of the current study that might function as barriers to the generality of findings to other contexts. First, many of the survey respondents indicated they often used an 80% mastery criterion in combination with other variables, including training across two or more technicians, training across two or more environments, task interspersal, and reinforcer schedule thinning. The current study did not include these variables. Although respondents identified these variables as strategies utilized to promote maintenance, several of those listed only address generalization. For example, training across two or more therapists or environments promotes stimulus generalization rather than maintenance of responding. Further, the use of task interspersal for the promotion of maintenance has equivocal empirical support (Rapp &

Gunby, 2016). However, there is conceptual and empirical support for reinforcer schedule thinning as a means for promoting maintenance (e.g., Heinicke et al., 2016). Future researchers should compare various mastery criteria involving combinations of accuracy levels, numbers of sessions, training days, and other variables in order to determine the most efficient and effective strategies for producing optimal training outcomes.

Second, the current study only evaluated mastery criteria based on a percentage of correct responses across sessions. Some practitioners use a trial-based teaching approach, which requires a trial-based mastery criterion (e.g., 10 consecutive correct trials). Future researchers should evaluate mastery criteria in these arrangements as well.

The current study is one of the first to experimentally evaluate commonly used mastery criteria for skill acquisition. The evaluation included four children with varying diagnoses, two curricular program areas, and four mastery criterion levels. Additional research is necessary to extend this line with other practice-relevant variables, such as populations, settings, skill types, supplemental maintenance interventions, and so on. Such activity would enable behavior analysts to make programming decisions based on evidence rather than on lore. In the meantime, however, results from the current investigation, albeit preliminary, suggest that the 80% and 90% mastery criteria utilized by a majority of behavior analysts may not be sufficient to promote skill maintenance for some individuals.

REFERENCES

Anderson, S., Taras, M., & O'Malley Cannon, B. (1996). Teaching new skills to children with autism. In C. Maurice, G. Green, & S. C. Luce (Eds.), Behavioral intervention for young children with autism: A manual for parents and professionals (pp. 181-194). Austin, TX: PRO-ED.

Belisle, J., Dixon, M. R., Stanley, C. R., Munoz, B., & Daar, J. H. (2016). Teaching foundational perspective-taking skills to children with autism using the PEAK-T curriculum: Single-reversal "I-you" deictic frames. Journal of Applied Behavior Analysis, 49, 965-969. https://doi.org/10.1002/jaba.324

Cariveau, T., Kodak, T., & Campbell, V. (2016). The effects of intertrial interval and instructional format on skill acquisition and maintenance for children with autism spectrum disorders. Journal of Applied Behavior Analysis, 49, 809-825. https://doi.org/10.1002/jaba.322

Cummings, A. R., & Carr, J. E. (2009). Evaluating progress in behavioral programs for children with autism spectrum disorders via continuous and discontinuous measurement. Journal of Applied Behavior Analysis, 42, 57-71. https://doi.org/10.1901/jaba.2009.42-57

DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of a multiple-stimulus presentation format for assessing reinforcer preferences. Journal of Applied Behavior Analysis, 29, 519-533. https://doi.org/10.1901/jaba.1996.29-519

DeWiele, L., Martin, G., Martin, T. L., Yu, C. T., & Thomson, K. (2011). The Kerr-Meyerson assessment of basic learning abilities revised: A self-instructional manual (2nd ed.). St. Amant Research Centre, Winnipeg, MB, Canada. Retrieved from http://stamant.ca/research/abla

Dunn, L. M., & Dunn, L. M. (1997). Peabody picture vocabulary test, third edition. Circle Pines, MN: American Guidance Service.

Fuller, J. L., & Fienup, D. M. (2018). A preliminary analysis of mastery criterion level: Effects on response maintenance. Behavior Analysis in Practice, 11, 1-8. https://doi.org/10.1007/s40617-0201-0

Grow, L. L., Kodak, T., & Clements, A. (2017). An evaluation of instructive feedback to teach play behavior to a child with autism spectrum disorder. Behavior Analysis in Practice, 10, 313-317. https://doi.org/10.1007/s40617-016-0153-9

Heinicke, M. R., Carr, J. E., Pence, S. T., Zias, D. R., Valentino, A. L., & Falligant, J. M. (2016). Assessing the efficacy of pictorial preference assessments for children with developmental disabilities. Journal of Applied Behavior Analysis, 49, 848-868. https://doi.org/10.1002/jaba.342

Leaf, R., & McEachin, J. (Eds.) (1999). A work in progress: Behavior management strategies and a curriculum for intensive behavioral treatment of autism. New York: DRL Books.

Love, J. R., Carr, J. E., Almason, S. M., & Petursdottir, A. I. (2009). Early and intensive behavioral intervention for autism: A survey of clinical practices. Research in Autism Spectrum Disorders, 3, 421-428. https://doi.org/10.1016/j.rasd.2008.08.008

Luiselli, J. K., Russo, D. C., Christian, W. P., & Wilczynski, S. P. (2008). Skill acquisition, direct instruction, and educational curricula. In J. K. Luiselli (Ed.), Effective practices for children with autism (p. 196). New York: Oxford University Press.
Majdalany, L. M., Wilder, D. A., Greif, A., Mathisen, D., & Saini, V. (2014). Comparing massed-trial instruction, distributed-trial instruction, and task interspersal to teach tacts to children with autism spectrum disorders. Journal of Applied Behavior Analysis, 47, 657-662. https://doi.org/10.1002/jaba.149

McCormack, J., Arnold-Saritepe, A., & Elliffe, D. (2017). The differential outcomes effect in children with autism. Behavioral Interventions, 32, 357-369. https://doi.org/10.1002/bin.1489

Najdowski, A. C., Chlingaryan, V., Bergstrom, R., Granpeesheh, D., Balasanyan, S., Aguilar, B., & Tarbox, J. (2009). Comparison of data-collection methods in a behavioral intervention program for children with pervasive developmental disorders: A replication. Journal of Applied Behavior Analysis, 42, 827-832. https://doi.org/10.1901/jaba.2009.41-827

Rapp, J. T., & Gunby, K. (2016). Task interspersal for individuals with autism and other neurodevelopmental disorders. Journal of Applied Behavior Analysis, 49, 730-734. https://doi.org/10.1002/jaba.319

Toussaint, K. A., Kodak, T., & Vladescu, J. C. (2016). An evaluation of choice on instructional efficacy and individual preferences among children with autism. Journal of Applied Behavior Analysis, 49, 170-175. https://doi.org/10.1002/jaba.263

Wunderlich, K. L., & Vollmer, T. R. (2017). Effects of serial and concurrent training on receptive identification tasks: A systemic replication. Journal of Applied Behavior Analysis, 50, 641-652. https://doi.org/10.1002/jaba.401

Received October 26, 2017
Final acceptance April 3, 2019
Action Editor, Jeanne Donaldson

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article at the publisher's website.
