www.bhcoe.org • 1
TABLE OF CONTENTS
Part 1: Overview
Instruments in ABA
Section 1: Considerations
Summary
References
Appendices
Acknowledgments
Part 1: Overview
Section 1: Executive Summary
This document is meant for applied behavior analysis (ABA) practitioners, researchers,
insurance providers, and other relevant stakeholders. The purpose of this document is
twofold: first, provide a systematic approach for selecting instruments to assess and
plan treatment for individuals with autism spectrum disorder (ASD); and second,
inform data collection and reporting on treatment outcomes.
The guidelines we have provided in this document are based on the best available
research evidence and subject matter expertise regarding instrument¹ selection for the
assessment and planning of treatment for individuals diagnosed with ASD. We intend
for these guidelines to be practical and digestible for diverse audiences ranging from
researchers and practitioners to insurance providers. Because all treatment is
individualized, these guidelines are intended to inform and are not meant as a
substitute for the expertise of the practitioner who has observed a patient directly.
Where stakeholder opinions diverge, significant weight should be given to the
recommendations of the qualified practitioner who has observed the patient in person.
¹ Throughout this document we have used the terms assessment instruments, assessment tools, and
measurement instruments interchangeably.
Section 2: About Autism Spectrum Disorder
ASD is a lifelong, pervasive developmental disability characterized by deficits in social
and communication skills, as well as restricted and repetitive behaviors (American
Psychiatric Association, 2013). There is no genetic test or single known cause for ASD.
Furthermore, the autism spectrum describes a broad range of behavioral patterns and
levels of functioning. This variability in skill deficits and behavior excesses continues to
challenge efforts to standardize treatment of the condition. Today, ASD is diagnosed by
experienced licensed professionals through standardized behavior observation tools in
conjunction with parents’ reports about their child’s behavior and development. Over
the last few decades, the number of children diagnosed has steadily increased as
awareness and acceptance of ASD have increased along with better screening and
diagnostic tools (Autism Speaks, 2021). According to data from the Centers for Disease
Control and Prevention (CDC), one in 54 children had a diagnosis of ASD by age 8 in
2016 (CDC, 2020).
Section 4: Societal and Economic
Considerations
In addition to the significant emotional and social impact of ASD on individuals and
their loved ones, the financial cost of supporting an individual with ASD is high.
Researchers have found the cost of raising an individual with ASD throughout their
lifespan averages between $1.4 million and $2.4 million in the U.S., depending on
whether the person also has an intellectual disability (Chasson et al., 2007; Buescher
et al., 2014). In contrast, the cost of raising a child without any disabilities in the
U.S. in 2015 was estimated at $233,610 (USDA, 2017). In addition, the medical
expenditures for children and adolescents with autism are on average between
4.1 and 6.2 times greater than for those without autism (Shimabukuro et al., 2007).
The reported cost of ABA therapy services for individuals with ASD ranges
between $40,000 and $60,000 per year (Rogge & Janssen, 2019). Globally, this
translates to an overall cost for providing care and services to children with ASD
estimated at between $61 and $66 billion a year (Autism Speaks, 2021). For adults
with ASD, the overall costs are estimated at $175 to $196 billion a year for
accommodation, direct medical costs, and individual productivity loss.
gained the support of organizations such as the American Academy of Pediatrics
(Hyman et al., 2020) and the Centers for Disease Control and Prevention (CDC, 2019).
The goal of ABA for individuals with ASD is to improve the individual’s quality of life
by changing the environment around the individual to teach them useful skills. For
example, ABA practitioners may target increasing communication skills to help
individuals with ASD state their needs and wants, use their voices effectively,
and make choices based on their preferences.
The effects of EIBI also tend to persist over time. For example, a recently published
study followed up with individuals after they had received EIBI services and found
that the effects of EIBI remained 10 years later (Smith et al., 2019). Another important
finding in this study was that none of the participants developed additional diagnoses
such as anxiety, attention deficit hyperactivity disorder, or depression. These co-
occurring disorders are otherwise common for adolescents and adults with ASD
(Gjevik et al., 2011). In sum, research clearly suggests that EIBI is, at present, one
of the best treatment options for children with ASD.
Several assessment instruments have been used to measure the outcomes of EIBI. Most
studies have reported the effects of EIBI using standardized measures of adaptive
behavior in everyday life (e.g., Vineland Adaptive Behavior Scales), intellectual
functioning (e.g., Bayley Scales of Infant Development, Wechsler Preschool and Primary
Scales, Mullen, Stanford-Binet-5), and various measures of autism severity (e.g., ADOS,
ADI-R, Childhood Autism Rating Scale, Social Responsiveness Scale; Ridout & Eldevik,
2021). When measuring across patients, past research has consistently demonstrated a
dose–response relationship between the number
of hours per week of ABA (dose) and the amount
of change in an outcome measure (response),
such as IQ scores or adaptive behavior in
everyday life. Additionally, at the group level,
parent engagement and participation in
treatment are associated with better outcomes.
However, at the individual patient level, response
to intervention can vary for currently unknown
reasons. Thus far, systematic reviews have been
unable to identify consistent predictors or risk
factors for individual patient outcomes following ABA.
allows ABA practitioners to set themselves apart from providers who provide treatments
for which there is little to no research evidence. Yet, despite widespread agreement that
practitioners and researchers must evaluate treatment outcomes, there is little
agreement about how this should be done in ABA (Smith, 2013).
For example, consider a child who is screaming and throwing himself on the floor
when asked to use the restroom. Based on the patient’s stated definition of a quality
life, an ABA treatment provider may be asked to develop an intervention to teach the
child how to approach and use the toilet effectively. Here, a standardized measure of
problem behavior or adaptive skills would have limited value for designing the
intervention and capturing improvement. Instead, the results of a functional behavior
assessment allow the treatment team to develop a function-based intervention plan
where behavior data before, during, and after the intervention demonstrate whether
a change in behavior occurs that improves the patient’s quality of life. The data
resulting from this more nuanced and individualized approach are likely to be more
meaningful to the treatment team for monitoring the patient’s improvement.
Nevertheless, equipped only with individualized data, one finds it difficult to compare
outcomes of idiosyncratic approaches across patients and across providers. To do
this, practitioners must agree on common data definitions and methods of data
collection to measure the overall treatment gains across patients and to predict
rate of improvement for patients with similar behavioral presentation. Standardized
measurement tools provide common definitions and methods of data collection that
allow for large-scale, cross-patient comparison of adaptive skills changes and socially
significant gains. Succinctly, standardized instruments offer the field a unified
approach for a more global analysis of the effectiveness of the general approaches
that ABA practitioners take. A critical first step to create fieldwide systems for
documenting treatment outcomes is to develop a systematic and objective process
for selecting appropriate measurement instruments to evaluate treatment
effectiveness (Romanczyk & Gillis, 2008).
Section 8: How We Selected the Assessment
Instruments in This Guide
We considered a few important factors when selecting the measurement instruments
outlined in this document. First, we aimed to select scientifically robust instruments
that are also practical to administer, considering time and cost. To do this, the team
of subject matter experts considered the following:
Next, we used Consensus-based Standards
for the Selection of Health Measurement
Instruments (COSMIN). COSMIN is a
framework that assists with improving the
selection of outcome measure instruments in
both research and clinical practice. It is an
initiative of an international multidisciplinary
team of researchers who aim to improve the
selection of outcome measurement
instruments by developing tools for selecting
the most appropriate available instrument
(Mokkink et al., 2010). COSMIN calls for standardization of outcomes and outcome
measurement instruments by developing Core Outcome Sets (COS) and COS
methodology. A COS is a consensus-based minimum set of outcomes that should be
measured and reported in all clinical trials of a specific disease or trial population
(COSMIN, n.d.). We followed the recommended four steps to assess whether a study
met the standard for good methodological quality (Mokkink et al., 2010):
We then used the following four steps to select the measurement instruments
(Mokkink et al., 2016):
Though not a perfect measure, widespread adoption of an instrument typically
suggests that a tool is feasible to administer and useful to clinical practice. If a large
proportion of practitioners already use a particular measurement instrument, they are
much more likely to continue using the instrument. However, practitioner adoption of
instruments is sometimes influenced by third-party funding sources. Therefore, we
also reviewed instruments that large insurance providers currently require their
provider network to administer and found those to be the same instruments
practitioners reported using most frequently in Padilla (2020). At a broader level, we
found that the measurement tools most frequently utilized by practitioners were
already on the list of instruments that we selected based on the COSMIN method.
Lastly, we considered several other practical factors that should affect decisions
regarding which measurement instruments to utilize. For example, we only included
instruments that practitioners of ABA are qualified to administer. Also, when possible,
we selected instruments that are comparatively time- and cost-efficient and can be
administered to a broad patient population (e.g., larger age range, availability of
instrument in different languages).
Section 1: Considerations
In this part, we provide recommendations that reflect established research findings and
best clinical practices for selecting measurement instruments to assess the outcome of
ABA therapy. However, assessment decisions should be made based on the individual
needs of each patient. These guidelines should not be used to diminish access, quality,
or frequency of currently available behavioral treatment services. Coverage of behavioral
treatments for ASD by healthcare funders should not supplant responsibilities of
educational or governmental entities, except where required by state or federal law.
In this document, guidance is limited to assessments for ABA treatment only.
Assessor Must First Consider the Reason(s) for Assessment
The initial referral concern(s) is usually the first thing that should influence the
assessor’s selection of specific assessment tools. For example, the assessor may use
an instrument that specifically addresses problem behaviors for a patient referred for
aggression. Furthermore, an assessor may wish to evaluate a patient’s performance
compared with an objective pre-determined criterion (criterion-referenced
interpretation) or compare the patient’s performance with the performance of same-
age peers (norm-referenced interpretation).
All assessments produce raw scores for assessed skills. When norm-referenced
assessment measures are used, the raw scores alone are an uninterpretable measure of
performance because they do not provide any way of contextualizing the score (Sattler,
2014). For example, a 4-year-old child’s raw score on the Expressive Vocabulary Test-3
(EVT-3) might be 25. This score alone does not help the evaluator determine if the child
shows deficits or age-level skills as a speaker because the evaluator has no information
on how other 4-year-olds performed on the test. To make a comparison, the evaluator
must first convert the raw score to a derived score using the mean and standard
deviation of the standardization sample of same-aged peers (i.e., 4-year-olds). After
converting the raw score of 25 to a derived score (e.g., a standard score) and comparing
the derived score to scores obtained from the standardization sample, the evaluator can
determine if the child’s performance is within, below, or above the average range
compared with the performance of all 4-year-olds in the standardization sample
(Reynolds & Livingston, 2012).
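
The conversion described above can be sketched in a few lines of Python. This is a minimal illustration only. It assumes the common standard-score scale with a mean of 100 and a standard deviation of 15, and it treats scores within one standard deviation of the mean as the average range. The norm-group mean (30) and standard deviation (5) are hypothetical values chosen for the example and are not EVT-3 norms. In practice, evaluators should use the conversion tables in the instrument's manual.

```python
def to_standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score by way of a z-score."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd          # distance from the norm-group mean, in SDs
    return scale_mean + scale_sd * z


def describe(standard_score, scale_mean=100.0, scale_sd=15.0):
    """Label a score relative to an assumed average range of mean +/- 1 SD."""
    if standard_score < scale_mean - scale_sd:
        return "below average"
    if standard_score > scale_mean + scale_sd:
        return "above average"
    return "within the average range"


# Hypothetical norms for 4-year-olds: mean raw score 30, SD 5 (illustration only).
score = to_standard_score(raw=25, norm_mean=30, norm_sd=5)
print(score, describe(score))
```

With these assumed norms, a raw score of 25 falls one standard deviation below the mean of same-aged peers, which is the kind of contextual information the raw score alone cannot provide.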
because it is more representative of the patient’s background. For all norm-referenced
measures, the standardization sample data is included in the manual for that measure.
Assessor Must Consider the Following Criteria for
Instrument Selection
When selecting assessment instruments, practitioners must consider the reason for
referral, the patient’s age, the primary language spoken by the patient, and the
validity and reliability of the assessment instruments. It is important that the
practitioner selects instruments that are valid and reliable to give confidence that the
assessment results accurately reflect the patient’s abilities.
² For more information on reliability and validity, interested readers are referred to Sattler (2014) and
Reynolds & Livingston (2012).
consideration. For example, diagnostic assessments require a reliability coefficient
of .90 or above. But tests for screening purposes or for purposes of skill acquisition
programming require reliability coefficients of .80 or above (Sattler, 2014).
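
An organization that tracks candidate instruments in a spreadsheet or script could encode these thresholds as a simple lookup. The sketch below uses only the two cutoffs stated above. The purpose labels and the function name are hypothetical and chosen for illustration.

```python
# Minimum reliability coefficients by assessment purpose, as stated in the text.
MIN_RELIABILITY = {
    "diagnostic": 0.90,
    "screening": 0.80,
    "skill_acquisition": 0.80,
}


def meets_reliability(purpose, coefficient):
    """Return True if the coefficient meets the minimum for the stated purpose."""
    if purpose not in MIN_RELIABILITY:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return coefficient >= MIN_RELIABILITY[purpose]


# The same coefficient can be adequate for screening but not for diagnosis.
print(meets_reliability("screening", 0.85))
print(meets_reliability("diagnostic", 0.85))
```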
Primary Language of the Patient. The patient’s primary language must also be
considered when choosing an assessment and, to the best ability of the assessor, the
assessment should be conducted in the child’s primary language (Sattler, 2018). When
using tests that rely on norm-referenced interpretation, translation of the assessment
questions can affect the reliability of the test and increase score errors (Sattler, 2018).
It is strongly recommended to use assessment instruments that have been norm
referenced in the patient’s primary language (Sattler, 2018). For example, the
Receptive and Expressive One-Word Picture Vocabulary Tests–Fourth Edition uses
a norm-referenced interpretation of the scores and has both English and Spanish
versions. If one needed a standardized language test for a child whose primary
language is Spanish, the Receptive and Expressive One-Word Picture Vocabulary
Tests–Fourth Edition could be an option. If there are no assessment instruments in
the patient’s primary language, use of interpreters who are fluent in English and the
patient’s primary language is the best option (Sattler, 2018). When reporting the
results, you should mention that an interpreter was used during the assessment.
Age of the Patient. When using assessment measures, the patient’s age must also be
considered. For instruments that assess skills, age of the patient provides information
about what is developmentally appropriate for the patient to have in their repertoire.
When using tests that use norm-referenced interpretation, it is important to use
assessment instruments that have been norm referenced with individuals who are in
the patient’s age group. If a specific measurement instrument is the only option to
use and it is outside of the patient’s age range, then the assessor can use a qualitative
interpretation of the test scores instead of reporting standard scores.
defined by the patient. The areas of need include, but are not limited to, severity of
ASD, communication skills, readiness to learn, daily living skills, social and play skills,
problem behaviors, and executive functioning. Additionally, at a higher level, socially
important outcomes that matter when providing care include the individual’s quality
of life, stress, and overall happiness and satisfaction.
Guidelines from the APA include the following: (a) Psychologists provide services, teach,
and conduct research with populations and in areas only within the boundaries of their
competence, based on their education, training, supervised experience, consultation,
study, or professional experience; and (b) When psychologists are asked to provide
services to individuals for whom appropriate mental health services are not available and
for which psychologists have not obtained the competence necessary, psychologists
with closely related prior training or experience may provide such services to ensure
that services are not denied if they make a reasonable effort to obtain the competence
required by using relevant research, training, consultation, or study.
Standards for ABA organizations from the BHCOE are that an ABA organization must
act honestly and responsibly to (a) promote ethical practices of its employees and (b)
support certified employees in complying with ethical and professional requirements
of their certifying and/or licensing body. The organization never directs employees to
act in violation of those requirements and resolves any conflicts between the
company policy and those requirements.
A Unified, Systematic Approach to Selecting Measurement
Instruments To Measure ABA-Treatment Outcomes:
The BHCOE ABA Outcomes Framework™
In Figure 2 we depict a decision-making model that includes the measurement
instruments selected in this guide. For each instrument, we provide information in
Tables 1–6 in the Appendices, such as the age range, duration of assessment, relative
cost, qualifications required to administer the instrument, and general pros and cons.
If more than one instrument met criteria for selection during review by subject matter
experts, we included them all in Figure 2 and left room for the practitioner/assessor
to choose between assessments based on their training and experience. These
instruments are depicted in alphabetical order and separated by the word “or.”
to select specific verbal behaviors to teach. Criterion-referenced instruments in
this guide allow expressive and receptive language to be separated into specific
components such as requesting, labeling, answering “Wh” questions (i.e., Who,
What, Where, When, Why), identifying nouns, and following directions.
For practicality, some instruments minimize overall assessment time and facilitate
analyses across patients because they measure different skill domains (e.g.,
communication, daily living, social) and broad age ranges. For example, a patient in
comprehensive EIBI may benefit from the CARS-2 for measuring severity of ASD; the
Vineland-3 for measuring communication, daily living, and social skills; and the VB-
MAPP for measuring basic communication, learning readiness, skills barriers, social
skills, and play.
Step 4. In addition to measuring the impact of ASD and specific behaviors targeted
for treatment, assessors should measure the social significance of their treatment. At
a higher level, meaningful intervention for ASD should improve the quality of life of
individuals and their families and reduce their emotional distress. Research evidence suggests that
the quality of life of families with a child with ASD is more impaired than the quality of
life of families with children with other developmental disabilities. The same trends
have been found regarding parent stress. Along the same lines, adolescents and
adults with ASD report lower quality of life. Therefore, evaluating change in quality of
life and stress is important for guiding the course of a patient’s progress during
treatment and as outcome measures, allowing for evaluation of more global family-
system effects.
Another socially relevant and important factor to measure is the satisfaction of the
patient and their parents/guardians with treatment. Engaging parents in treatment is
important (see BHCOE Accreditation, 2021) as it is related to better treatment
outcomes. Parental satisfaction with treatment and treatment acceptability likely relate to
their engagement with the treatment team, continuity with treatment, and
involvement in treatment sessions. Furthermore, a patient’s satisfaction with their
own treatment may be related to their self-observed improvements or likelihood to
adhere to treatment protocols. Along the same lines, there is some evidence that
parents’ scores on acceptability of a treatment are related to their likelihood to adhere
to the treatment protocol. Currently, however, the literature is mixed regarding the
relationship between measures of satisfaction and acceptability and treatment
effectiveness or parental adherence to treatment. The mixed findings may be
because unlike skills assessments, measures of social validity are self-reports and
capture the informants’ perceptions of treatment, which may be affected by factors
other than skills improvement or reductions of problem behaviors. Further research is
needed in this area.
At this time, compared with instruments that measure skills, problem behaviors,
and severity of ASD, very few instruments that measure social significance following
ABA treatment have been developed and examined for reliability and validity in
the literature. For example, in the 14 studies reviewed by McNaughton (1994),
14 different instruments were used, each individually developed for a specific
research project, to measure parent satisfaction.
The few instruments that have been administered with individuals with ASD relate to the
quality of life of the individual or their parents; their level of stress; general satisfaction with
treatment; and treatment acceptability. Unfortunately, few instruments include both
self- and proxy-report (e.g., PedsQL™). Most instruments have been developed either
for the patient or their parents/guardians. Therefore, at this time, practitioners/assessors
are encouraged to interpret results with caution as much more research is needed in
this area.
FIGURE 2: THE BHCOE ABA OUTCOMES FRAMEWORK™
FIGURE 3: CASE EXAMPLE
A 3-year-old patient is referred for ABA therapy based on communication, daily living, and
social & play.
Assessment Re-administration. Some instruments use rating scales to measure mood
or symptomatology and are more sensitive to short-term changes. These instruments
do not provide a norm group for reference or specific criterion for treatment
planning, but they can be re-administered as often as every few months. Generally,
however, to observe progress, it is best to readminister instruments that contain a
norm group or criterion for reference every 6 months to 1 year. For standardized
measurement instruments, meaningful changes are unlikely to be detected until at
least 1 year of treatment has occurred.
Summary Regarding Instrument Selection
Choosing assessments to measure treatment outcomes is a multi-faceted process
that requires careful consideration of many factors. Measuring all patient progress
with a single instrument fails to adequately capture the many behavioral and skill
changes a patient is likely to experience following ABA treatment. In the first part
of this guideline, we provided practitioners and other stakeholders with a decision-
making model that outlines the measurement instruments that can be used with
individuals with ASD depending on the reason for referral and why they choose to
receive ABA therapy. To do this, we used the COSMIN method to select reliable and
valid measurement instruments that are comparatively efficient in terms of cost and
time to administer and that are applicable to broad age ranges. We have provided a
roadmap for selecting measurement instruments based on the referral problem.
We have also outlined factors that should affect the practitioner’s decision to use
one instrument rather than another, including the patient’s age, primary language,
and other personal characteristics. Lastly, we provided guidelines regarding
re-administration of assessment instruments for detecting change. In the next part
of this guideline, we discuss how to record, store, analyze, and interpret the data
obtained from administering these measurement instruments.
Having read Part II, we hope the reader can differentiate between different types
of assessments (e.g., norm-referenced, criterion-referenced) and determine which
assessment(s) is (are) most appropriate for the types of skills that patients of their
organization target through ABA. Hopefully, the reader also has identified how
they want to analyze those data at the organizational level. Such analyses allow the
organization to understand how well different practitioners lead programs that
positively affect patient progress and how the organization compares with other
organizations on treatment outcomes. But it is one thing to collect assessment
data and know what you want to do; it is another thing entirely to make use of
those data efficiently. In Part II, we discuss basic principles for creating and
maintaining tidy data sets and relational databases and how these translate to
best practices when practitioners store assessment data in Excel documents or
the latest cloud-based big data database.
One Row per Observation. In Figure 5 we have shown one example of a well-
organized VB-MAPP dataframe (bottom panel) as well as a dataframe that would
require cleaning and preprocessing to be used (top panel). The first characteristic
of a tidy dataframe is that a single row is used for each observation. For example,
if the purpose of this dataframe is to compare the progress a patient made between
two assessments, and if we have data from more than two assessments for one
patient, then each assessment comparison would be placed in a separate row.
Row 2, Column H of the top panel in Figure 5 shows an example where multiple
assessment scores are stored in a single row. These should be separated into multiple
rows (e.g., rows 2–3; bottom panel of Figure 5).
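
As a rough sketch of this separation, the Python snippet below expands one untidy row that holds two assessments into one row per assessment. The column names, the semicolon separator, and the patient record are hypothetical and are not taken from Figure 5.

```python
def split_into_rows(untidy_row, sep=";"):
    """Expand one row holding several assessments into one row per assessment."""
    dates = [d.strip() for d in untidy_row["assessment_dates"].split(sep)]
    scores = [s.strip() for s in untidy_row["total_scores"].split(sep)]
    if len(dates) != len(scores):
        raise ValueError("each assessment date needs exactly one score")
    return [
        {
            "patient_id": untidy_row["patient_id"],
            "assessment_date": date,
            "total_score": int(score),
        }
        for date, score in zip(dates, scores)
    ]


# Hypothetical untidy row: two assessments squeezed into single cells.
untidy = {
    "patient_id": "P001",
    "assessment_dates": "02/22/2016; 03/01/2017",
    "total_scores": "11; 27",
}
tidy_rows = split_into_rows(untidy)
for row in tidy_rows:
    print(row)
```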
One Variable per Column. The second characteristic of the tidy dataframe is one
variable per column. Again, using the example untidy dataframe in Figure 5, column D
contains the data from multiple VB-MAPP assessments for the patient in row 2. Similarly,
Column I contains the data from multiple subdomains for a single assessment for the
patients in rows 2 and 4. Storing the data in this structure makes it challenging to easily
ask questions of these data without a lot of manual work. In contrast, by keeping one
variable per column and separating out the subdomains into their own column (one for
each subdomain), the resulting tidy dataframe can be analyzed more efficiently.
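
One way to picture the separation of subdomains into their own columns is sketched below. The cell format ("name: score" pairs separated by commas), the subdomain labels, and the scores are assumptions made for this example only.

```python
def split_subdomains(cell):
    """Turn 'name: score, name: score' text into one key (column) per subdomain."""
    columns = {}
    for part in cell.split(","):
        name, _, value = part.partition(":")
        name = name.strip().lower()
        if not name or not value.strip():
            raise ValueError(f"cannot parse subdomain entry: {part!r}")
        columns[name] = float(value)
    return columns


# Hypothetical untidy row: several subdomain scores stored in one cell.
row = {"patient_id": "P001", "subdomains": "mand: 5, tact: 4.5, listener: 6"}
tidy_row = {"patient_id": row["patient_id"], **split_subdomains(row["subdomains"])}
print(tidy_row)
```

Once each subdomain has its own column, questions such as "what is the average score on one subdomain across patients?" no longer require manual parsing of each cell.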
One Data Type per Column. A third characteristic of a tidy dataframe is that all data
entered in a column is of an identical data type. Common data types that readers are
likely to use include numbers (whole or decimal), text, and dates. Importantly, to make
use of all rows of a column, all rows must have an identical data type rather than mixed
data types. For example, Column C in the untidy dataframe contains both dates (e.g.,
02/22/2016, Sep-14) and text (e.g., “Sometime in July of 2017”). Note that another
common challenge encountered will occur when dates are entered in different formats.
Thus, in the above example, “Sep-14” may or may not have the correct year attached to
it, and the data analyst would have to do more work to verify the accuracy of that datum.
As additional examples of mixed data types, Columns D and E contain numeric data
(e.g., 11, 27, 22), text data (e.g., “13/170,” “28/170,” “16, maybe 17”), and date data
(e.g., 24-Dec³). Though humans can easily parse what data is useful when looking at
these data, computers cannot without being explicitly directed what data to extract and
how to extract it for each individual cell. This can become tedious and time consuming
and will decrease the overall utility of the dataframe.
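
A simple check at data entry or import time can surface mixed data types before they reach an analysis. The sketch below accepts a single date format (MM/DD/YYYY, an assumption for this example) and flags every other entry for manual review rather than guessing at its meaning.

```python
from datetime import datetime


def parse_assessment_date(text, fmt="%m/%d/%Y"):
    """Return a date if the text matches the one accepted format, else None."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


# Entries like those described for Column C of the untidy dataframe.
entries = ["02/22/2016", "Sep-14", "Sometime in July of 2017"]
parsed = [parse_assessment_date(e) for e in entries]
needs_review = [e for e, d in zip(entries, parsed) if d is None]

print(parsed[0])       # the one entry that is a valid date
print(needs_review)    # entries a person must verify and re-enter
```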
³ Readers familiar with Excel are likely familiar with Excel’s autoformatting when entering data as
fractions (e.g., 12/24, which gets converted to 24-Dec). Without careful inspection upon data entry,
you may end up with a dataframe containing inaccurate or invalid data.
FIGURE 5. Demonstration of the exact same data saved in an untidy format (top panel) and tidy format (bottom panel).
[Figure panels: untidy dataframe (top); tidy dataframe (bottom).]
One Datum per Cell. A fourth characteristic of a tidy dataframe is that a single
datum is entered per cell and as close to the raw data as possible. If two data
elements are needed to capture an observation (e.g., how many points were earned
on an assessment and how many points were possible), then best practice is to create
two columns, one for each data element, and to enter data accordingly. The top
panel in Figure 5 shows examples of multiple data entered within a single cell (e.g.,
rows 2 or 4 of columns D, H, I, and K), and the bottom panel shows how these data
would ideally be entered (e.g., rows 2, 3, and 5). For example, to use patient age at
the time of the assessment, it would be more helpful to store the patient’s date of
birth and the date of the assessment (two columns) rather than only the patient’s age
in years (one column). Collecting and storing the dates of birth and assessment allows
us to derive the patient’s age as needed or, if age is often used for analyses, age
could be derived and stored as a third column in the dataframe. In contrast, if only
the patient’s age is stored, organizations would be unable to analyze assessment
results based on age cohorts (e.g., millennials, generation Z, generation alpha),
historical periods in the organization (e.g., assessments conducted between 2018
and 2020), or other interesting scenarios where the raw dates would be needed.
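
Both ideas in this paragraph can be sketched briefly: splitting an "earned/possible" entry into two values, and deriving age from the two stored dates. The function names and the dates below are hypothetical.

```python
from datetime import date


def split_score(cell):
    """Split an 'earned/possible' entry such as '13/170' into two integers."""
    earned, possible = cell.split("/")
    return int(earned), int(possible)


def age_in_years(date_of_birth, assessment_date):
    """Whole years between the date of birth and the date of the assessment."""
    years = assessment_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the assessment year.
    if (assessment_date.month, assessment_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


earned, possible = split_score("13/170")
# Hypothetical dates of birth and assessment, for illustration only.
age = age_in_years(date(2012, 6, 15), date(2016, 2, 22))
print(earned, possible, age)
```

Because the raw dates are kept, the derived age can be recomputed or replaced by another grouping (for example, year of assessment) at any time without re-entering data.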
Section 2: Some Considerations of Tidy Data
What Data to Store. The previous section highlights an inherent tradeoff that occurs
during data entry. The more columns we include, the more data entry must occur
across separate columns, which is more work. As a result, when entering data, people
may be inclined to either add multiple variables to the same column or to aggregate
data into a single score and store only the aggregated data. For example, rather than
entering the raw data for each subdomain score for the VB-MAPP in a separate
column (bottom panel, Figure 5), people may be tempted to enter only a single,
total score column as this takes less work when initially entering the data. However,
by entering only a single, aggregate score per assessment, the organization would be
unable to use any subdomain scores for analyses in the future, which may significantly
inhibit the types of analyses that can be conducted.
Another consideration for behavior analysts is the temporal nature of much of the data
we collect. For assessment data, this might be the scores on specific assessments
conducted at intake and every year thereafter. At least two options exist to store these
kinds of time series data. One option is to capture different assessments as different
variables in the dataframe (i.e., as different columns). Here, the level of observation
(i.e., what defines a row) would be at the patient level, as each patient would take up one
row. The benefit to this approach is that analyzing trends for individual patients would be
straightforward for most individuals, as all observations are on the same row. You would
simply create another column with the analysis you wanted to conduct. This dataframe
structure is sometimes referred to as a wide format.
A downside to creating wide dataframes is that they grow indefinitely to the right.
For each new assessment for a patient, you would need to add a new set of columns
for all data elements. An assessment with 20–25 elements collected per administration
would expand to 100 or more columns per row by a patient’s fifth assessment, which might be difficult to keep track of
or scan efficiently. Further, if you were interested in looking at only the most recent
assessment for all patients, these might be contained across different columns for
different patients (e.g., the second set of assessment scores for one patient,
the fourth set of assessment scores for a different patient). This would involve
significant restructuring of your data to complete any analyses.
A second option is to capture different assessments as different observations (i.e., as
different rows). Here, the level of observation would be at the assessment level, as
each assessment would take up one row. This method of structuring a dataframe is
sometimes referred to as a long format. The benefit to creating long dataframes is
that the number of columns in the dataframe remains unchanged and consists only of
those elements that are collected with each assessment. This makes managing and
visually inspecting the dataframe very easy. The downside to this approach is that
within-patient analyses require a patient ID or other unique identifier to
map the rows to one another, as well as slightly more advanced data querying skills. Though
this sounds like it might be challenging, it is important to note that most analyses
require restructuring of the data in some manner (e.g., removing observations with
missing data, limiting the analysis to specific groups of patients).
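To make the wide and long formats concrete, here is a small Python sketch that reshapes a wide dataframe into a long one and then pulls the most recent assessment for each patient. The column names (date_1, score_1, and so on), the patient IDs, and the scores are hypothetical and are not the columns shown in Figure 5.

```python
from datetime import date

# Hypothetical wide format: one row per patient, one set of columns per assessment.
wide_rows = [
    {"patient_id": "P001",
     "date_1": date(2019, 1, 10), "score_1": 40,
     "date_2": date(2020, 1, 12), "score_2": 65},
    {"patient_id": "P002",
     "date_1": date(2020, 3, 2), "score_1": 30,
     "date_2": None, "score_2": None},
]


def wide_to_long(rows, max_assessments=2):
    """Return one row per assessment instead of one row per patient."""
    long_rows = []
    for row in rows:
        for n in range(1, max_assessments + 1):
            assessed_on = row.get(f"date_{n}")
            if assessed_on is None:
                continue  # this patient has no nth assessment yet
            long_rows.append({
                "patient_id": row["patient_id"],
                "assessment_date": assessed_on,
                "score": row[f"score_{n}"],
            })
    return long_rows


def most_recent_per_patient(long_rows):
    """Use the patient ID to map rows to one another and keep the latest one."""
    latest = {}
    for row in long_rows:
        current = latest.get(row["patient_id"])
        if current is None or row["assessment_date"] > current["assessment_date"]:
            latest[row["patient_id"]] = row
    return latest


long_rows = wide_to_long(wide_rows)
latest = most_recent_per_patient(long_rows)
```

In the long format the column set never changes, and the "most recent assessment" question becomes a single pass over the rows rather than a search across differently numbered column sets.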
4
Though tempting, including important information in a “Notes” column often limits the usefulness of
that information for two reasons. First, unless similar and consistent notes are entered for every row, the
amount of missing data makes that information unlikely to be useful for analytic purposes. Entering notes
takes time and resources. You should always ask whether the time and resources needed to transcribe
those notes into the database are worth the effort. Second, analyzing open text data objectively and
consistently requires skills in an area known as natural language processing (e.g., Bird et al., 2009; Vajjala
et al., 2020). Most behavior analysts are unlikely to have received training in these analyses. Thus, again,
you may end up with a column filled with data that go unused. If the information contained in a “Notes”
or open-text column is believed to be valuable, a better approach is to create a formal column in the
dataframe that captures that data element and to ensure that people consistently collect those data.
that we know the precise degree to which time of day influences assessment scores for
one patient compared with all other patients.
The second way that metadata can be captured is at the dataframe level. These
metadata are often stored as a separate tidy dataframe to describe the contents of
the original dataframe. Figure 6 shows an example of a data dictionary that contains
metadata about the assessment dataframe in the bottom panel of Figure 5. Like the
examples above, each row has a single observation, each column has data of a single
type, and each cell has a single data element. This specific kind of metadata
dataframe can be referred to as a data dictionary because it provides (ideally) all
information someone might need to understand what is contained in the dataframe it
references and how they might then use the referenced dataframe. In addition to the
name, definition, and data type for each variable, data dictionaries often also contain
information such as whether missing values are allowed (i.e., the “mandatory”
column), how categorical data are stored or transformed into numbers so they can
be analyzed, whether the variable is a primary or secondary key (more on this below),
and when and what changes have been made to variable definitions over time.
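One practical use of a data dictionary is checking incoming rows against it. The Python sketch below assumes a hypothetical three-variable dictionary with name, definition, data type, and mandatory fields; the variable names and definitions are illustrative and are not copied from Figure 6.

```python
from datetime import date

# Hypothetical data dictionary: one row per variable in the assessment dataframe.
data_dictionary = [
    {"name": "patient_id", "definition": "Unique patient identifier",
     "data_type": str, "mandatory": True},
    {"name": "assessment_date", "definition": "Date the assessment was conducted",
     "data_type": date, "mandatory": True},
    {"name": "milestones_total", "definition": "Overall milestones score",
     "data_type": int, "mandatory": False},
]


def validate_row(row, dictionary):
    """Return a list of problems found when checking a row against the dictionary."""
    problems = []
    for variable in dictionary:
        name = variable["name"]
        value = row.get(name)
        if value is None:
            # Missing values are acceptable only for non-mandatory variables.
            if variable["mandatory"]:
                problems.append(f"{name}: missing but mandatory")
            continue
        if not isinstance(value, variable["data_type"]):
            problems.append(f"{name}: expected {variable['data_type'].__name__}")
    return problems


good_row = {"patient_id": "P001", "assessment_date": date(2020, 5, 4),
            "milestones_total": 85}
# Two elements in one cell ("85/170") and a missing mandatory date.
bad_row = {"patient_id": "P002", "assessment_date": None,
           "milestones_total": "85/170"}
```

Calling validate_row(bad_row, data_dictionary) flags both the missing mandatory date and the score cell that holds text instead of a single number.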
FIGURE 6. Example of a data dictionary with information about the variables stored in a dataframe. Many data
dictionaries include much more information. This is an example of the minimum information you would likely store.
Handling Missing Data. A final set of proactive decisions that are worth noting relative
to working with related datasets is how to manage missing values. As practitioners begin
to combine multiple datasets for more advanced data analysis, it is likely that some rows
of one dataset do not have all the corresponding information in the second (or third, or
fourth) datasets. For example, the untidy dataset shown in Figure 5 shows common
patterns of missing data. As a result, most analytic datasets that are the combination of
several related datasets will contain missing values. Although a full treatment of ways to
manage missing data is well beyond the scope of this white paper, two broad strategies
are commonly used.
The first strategy is to drop the observations with missing data. The benefit of dropping
rows with missing data is that the results of your analyses reflect only recorded values because
they use only observations containing all necessary information. The downside is that the
rows dropped from the final analysis may contain important information or ranges of the
data that are relevant to what you are analyzing. For example, dropping patients with
any missing information from the dataframe in Figure 5 would reduce the dataset from
30 patients to five patients. One way to mitigate this challenge is to drop only the rows
with missing data from the subset of columns specific to your analysis. For example,
perhaps we are only interested in looking at differences in overall milestone scores from
patients in the top panel of Figure 5. Dropping only patients without an overall score
would reduce the total number of rows from 31 to 21.
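The difference between dropping any incomplete row and dropping only rows that are incomplete for the analysis at hand can be sketched in a few lines of Python. The four rows and the column names below are hypothetical and much smaller than the dataset in Figure 5.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"patient_id": "P001", "assessor_id": "A7", "milestones_total": 85},
    {"patient_id": "P002", "assessor_id": None, "milestones_total": 60},
    {"patient_id": "P003", "assessor_id": "A2", "milestones_total": None},
    {"patient_id": "P004", "assessor_id": None, "milestones_total": None},
]


def drop_missing(rows, columns=None):
    """Keep rows with no missing values in `columns` (all columns if None)."""
    kept = []
    for row in rows:
        columns_to_check = columns if columns is not None else list(row.keys())
        if all(row[column] is not None for column in columns_to_check):
            kept.append(row)
    return kept


# Strategy 1: drop any row with any missing value.
complete_cases = drop_missing(rows)
# Mitigation: drop only rows missing the column needed for this analysis.
score_only = drop_missing(rows, columns=["milestones_total"])
```

Here the column-specific approach keeps a row that the all-columns approach would discard, because the missing assessor ID is irrelevant to an analysis of overall scores.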
A second strategy is to fill in the missing data (referred to as data imputation) using
dummy coding, logical values from domain expertise, or mathematical modeling (e.g.,
Molenberghs et al., 2020; van Buuren, 2018). Filling missing data with dummy coding is
essentially creating a specific data value that represents “missing” and that matches the
data type for that column. Using the top panel of Figure 5 as an example, you could use
“missing” for the “1st Assessor ID” column, 01/01/1900 for missing data in the date
columns, and -10 for data in the assessment score columns. With this approach the idea
is to assign values that would produce outliers or non-logical values to make it readily
identifiable that you have handled those missing values but that they are, in fact, missing
data and are not a true representation of that variable for that observation.
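A short Python sketch of dummy coding follows, using the sentinel values named above ("missing", 01/01/1900, and -10). The column names and the helper functions are hypothetical; the key design point is that any later analysis must filter the sentinels out again.

```python
from datetime import date

# Sentinel values from the text; the column names are hypothetical.
SENTINELS = {
    "assessor_id": "missing",
    "assessment_date": date(1900, 1, 1),
    "milestones_total": -10,
}


def fill_with_sentinels(row, sentinels=SENTINELS):
    """Return a copy of the row with each missing value replaced by its sentinel."""
    filled = dict(row)
    for column, sentinel in sentinels.items():
        if filled.get(column) is None:
            filled[column] = sentinel
    return filled


def is_sentinel(row, column, sentinels=SENTINELS):
    """True when the value is the dummy code rather than a real observation."""
    return row[column] == sentinels[column]


row = {"patient_id": "P003", "assessor_id": None,
       "assessment_date": date(2020, 5, 4), "milestones_total": None}
filled = fill_with_sentinels(row)

# Exclude dummy-coded scores before computing any statistic.
valid_scores = [r["milestones_total"] for r in [filled]
                if not is_sentinel(r, "milestones_total")]
```

Each sentinel matches the data type of its column, so the column stays tidy while the value remains obviously non-logical.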
Filling in missing data using logical values involves looking at each missing value and
filling it in based on what you know about that observation and the variable/column with
missing data. For example, common practice at an organization may be to conduct the
VBMAPP sometime within 30 days of intake. Thus, if those data were missing and the
intake date was known, you could fill in the VBMAPP assessment date with a best guess
of a calendar date within 30 days of the known intake date. Filling in missing data using
mathematical modeling is a more advanced topic. Succinctly, however, these techniques
involve looking for patterns in the available data, predicting the likely value of the
missing data, and inserting that predicted value into the dataframe.
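The logical-value approach can be sketched as follows in Python. Choosing the midpoint of the 30-day window as the best guess, and adding a flag column that records which dates were imputed, are our assumptions for this illustration; the column names are hypothetical.

```python
from datetime import date, timedelta


def impute_assessment_date(row, window_days=30):
    """Fill a missing assessment date with a best guess inside the intake window."""
    filled = dict(row)
    filled["assessment_date_imputed"] = False
    if filled["assessment_date"] is None and filled["intake_date"] is not None:
        # Assumption: use the midpoint of the window as the best guess.
        filled["assessment_date"] = filled["intake_date"] + timedelta(days=window_days // 2)
        # Record that this value is a guess, not an observation.
        filled["assessment_date_imputed"] = True
    return filled


known = impute_assessment_date(
    {"patient_id": "P001", "intake_date": date(2020, 3, 1),
     "assessment_date": date(2020, 3, 20)})
guessed = impute_assessment_date(
    {"patient_id": "P002", "intake_date": date(2020, 3, 1),
     "assessment_date": None})
```

Keeping the flag column lets an analyst rerun any analysis with and without the imputed values to see how much the guesses matter.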
Filling in missing data using logical values or mathematical modeling has benefits and
drawbacks. The benefit of these approaches is that they improve the usability of those
observations for analyses as they contain less missing information. The downside to this
approach is that it can reduce the overall accuracy of analyses because the data being
used for analyses are of unknown accuracy. For dataframes with a lot of missing data,
imputing missing values may alter the results of subsequent analyses.
Data Markup. A final consideration pertains to datasets stored using programs with data
markup options. For example, when storing data in Excel, the user can provide
information about the data by changing the font color or font style or by using
conditional highlighting. These methods of data markup can be valuable sources of
information when a user is visually looking at a dataframe in Excel or a similar software
program. However, when data are moved between database systems or storage methods,
the information contained within data markup is often lost5. For organizations interested
in conducting analytics across larger stores of data, tools like Excel are inefficient, and
data are often read into alternative analytic environments (e.g., R, Python, SPSS).
Thus, if there is important information about the data that you currently capture with
data markup, it is better to add this information as a new column so the information is
maintained regardless of who analyzes the data and the environment they use.
5
The easiest way to see what information is retained is to save your data as a plaintext file (e.g., .txt, .csv),
close the document, then reopen the plaintext file. Best practice is to save data only as plaintext files, which
forces data restrictions at the dataframe level and allows for reliable use of data across analytic platforms.
Section 3: Working with Related Datasets
When maintaining tidy datasets, practitioners will likely find that different datasets
define unique observations (rows) at different levels. For example, we likely need one
observation per patient for a dataset storing patient demographic information (e.g., date
of birth, cultural variables relevant to intervention, gender identity). But we would need
multiple observations per patient for a dataset containing quarterly change in the
number of programs mastered or annual Vineland scores for patients. If we wanted to
know whether annual change in assessment scores differed based on patient age,
primary language, or household size, the patient demographics and annual assessment
scores datasets would need to be combined.
As another example, the patient assessment dataset may contain a column about who
conducted the assessment or the patient’s program supervisor. If another dataset
contained information about each employee (e.g., education, training, years of
experience in ABA, number of cases with different patient profiles), we might want to
combine these datasets to ask questions about how an employee’s background, training,
and success with different patient profiles (i.e., the employee’s competence) might relate
to annual change in assessment scores. To do this, the employee competence dataset
would need to be combined with the annual assessment scores dataset.
Data Models. The relationship between different datasets is often stored graphically in
data models (Mosley et al., 2009). Figure 7 shows one example of a very simple data
model relative to patient assessment data. The purpose of a data model is to show what
information is contained in each dataset and which columns are used to relate one
dataset to another. Data models are helpful for at least two reasons. First, creating data
models helps an organization efficiently develop and execute data strategies to learn
more from their data than any single dataset can tell them. Second, data models are
critical for helping employees who work as data analysts or database managers understand
where data live and what can be accomplished with those data6.
6
Many books exist on the topic of data modeling and how different database designs are more or less
useful depending on the ways that data are used for an organization. Curious readers may want to begin
with Silverston and Agnew (2009), Simsion and Witt (2004), or Umaneth and Scamell (2014).
Primary and Foreign Keys. To combine two datasets, one must contain a primary key and
the other must contain a matching foreign key. A primary key is a column in a dataset where
every value is unique (Mosley et al., 2009). For example, column A
in the bottom panel of Figure 5 is the primary key. Each patient can exist in only one row,
and the patient identifier number is unique to each patient. If we want to combine this
dataset with the data in Figure 8, then Figure 8 must contain one or more columns holding the
primary key values from the bottom panel in Figure 5. These columns in Figure 8 provide a link for
combining the data between the two tables and are thus referred to as foreign keys
(Mosley et al., 2009). When designing and storing datasets, practitioners should consider
what additional datasets might be combined with an assessment dataset so that they can
better understand variables related to patient outcomes. Once additional datasets are
identified, developing primary keys and embedding foreign keys across datasets
improves the efficiency with which larger scale analyses can be conducted.
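Before combining datasets, it is worth checking that the keys behave as intended. The Python sketch below tests whether a column qualifies as a primary key and whether every foreign key value has a match. The two small tables, their column names, and the IDs are hypothetical and only loosely mirror Figures 5 and 8.

```python
# Hypothetical patient table: patient_id should be the primary key.
patients = [
    {"patient_id": "P001", "milestones_total": 85},
    {"patient_id": "P002", "milestones_total": 60},
]

# Hypothetical caseload table: patient_id here is a foreign key.
caseload = [
    {"supervisor_id": "S01", "patient_id": "P001"},
    {"supervisor_id": "S01", "patient_id": "P002"},
    {"supervisor_id": "S02", "patient_id": "P009"},
]


def is_primary_key(rows, column):
    """A primary key has a value in every row, and no value repeats."""
    values = [row[column] for row in rows]
    return None not in values and len(values) == len(set(values))


def orphaned_foreign_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return foreign key values that have no matching primary key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    child_keys = {row[fk_column] for row in child_rows}
    return sorted(child_keys - parent_keys)


patient_id_is_key = is_primary_key(patients, "patient_id")
supervisor_id_is_key = is_primary_key(caseload, "supervisor_id")
orphans = orphaned_foreign_keys(caseload, "patient_id", patients, "patient_id")
```

An orphaned value such as the unmatched patient ID above would silently drop out of, or create missing values in, any combined analytic dataframe, so it is better caught before the join.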
FIGURE 7. Example data model showing how the data from different tables relate to one another. Data models are
helpful as they show you how you might combine data from multiple sources into one analytic dataframe.
FIGURE 8. Example dataframe with simple supervisor characteristics and the patients currently in their caseload.
Section 4: Examples of Basic Analytics with
Assessment Data
On some regular cadence, organizations are likely interested in analyzing patient
assessment scores. Perhaps they are interested in patients’ rates of improvement.
Maybe the organization is interested in the types of patients (e.g., age, reason for
referral) that they are better or worse at treating. Or maybe they are interested in
understanding the supervisors who are performing better or worse in treating different
patients. Answering all these basic questions will likely require tidy datasets and the
ability to combine data from multiple dataframes.
Rate of Patient Improvement. The top panel in Figure 9 shows how the structure of
the tidy dataframe allows us to easily analyze the rate of change in VBMAPP scores per
month for each patient in our dataframe7. The top-left panel shows the average overall
change in VBMAPP Milestones scores per month for each patient as a jitter plot where
each marker represents a single patient’s average change score. The top-right panel
shows the same data but as a box-and-whisker plot. For these box-and-whisker plots,
the ‘X’ corresponds to the average change score for all patients; the line across the
middle of the box represents the median (i.e., 50th percentile); the top and bottom
edges of the box correspond to the 75th and 25th percentiles, respectively; the lines
extending out from the box (aka the whiskers) show the maximum and minimum values
excluding outliers; and individual circle markers correspond to those outliers.
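The elements of a box-and-whisker plot can be computed directly, as in the Python sketch below. Two choices here are assumptions rather than details confirmed by the figure: outliers are defined with the common 1.5 × interquartile range rule, and quartiles use the "inclusive" method of the standard library. Other software may use different conventions, and the change scores are made up.

```python
import statistics


def box_summary(values):
    """Compute the values a box-and-whisker plot displays."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box are treated as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "mean": statistics.mean(ordered),   # the 'X' marker
        "q1": q1,                           # bottom edge of the box
        "median": median,                   # line across the box
        "q3": q3,                           # top edge of the box
        "whisker_low": inside[0],           # minimum excluding outliers
        "whisker_high": inside[-1],         # maximum excluding outliers
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Hypothetical average change scores per month, one per patient.
change_scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_summary(change_scores)
```

Note how the single extreme patient pulls the mean well above the median while leaving the box and whiskers unchanged, which is why the plot shows both.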
To plot the top panel in Figure 9 we needed a minimum of four data elements. Two are overall
milestone scores for at least two assessments; the other two are the calendar dates
when those assessments were conducted. The structure of the tidy dataframe makes it
easy because we can simply call the four columns needed to plot these data. However,
if the columns contained different data types or multiple data per cell, we would have
been unable to plot these data as easily and would have had to spend time cleaning
and organizing the data, creating inefficiencies.
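The change score per month can be computed from those four columns as in the Python sketch below. Converting days to months with an average month length of 365.25 / 12 days is our assumption, as are the column names and values; the accompanying Excel document may calculate elapsed months differently.

```python
from datetime import date

# Assumption: convert elapsed days to months using the average month length.
AVERAGE_DAYS_PER_MONTH = 365.25 / 12


def change_per_month(first_score, first_date, last_score, last_date):
    """Score change divided by the number of months between two assessments."""
    elapsed_months = (last_date - first_date).days / AVERAGE_DAYS_PER_MONTH
    if elapsed_months <= 0:
        raise ValueError("last_date must be after first_date")
    return (last_score - first_score) / elapsed_months


# Hypothetical wide-format row: the four columns needed for the calculation.
row = {"patient_id": "P001",
       "score_1": 40, "date_1": date(2019, 1, 1),
       "score_2": 64, "date_2": date(2020, 1, 1)}

rate = change_per_month(row["score_1"], row["date_1"],
                        row["score_2"], row["date_2"])
```

Because each cell holds one element of one type, the function can be applied to every row without any cleaning step first.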
7
These data correspond to the sheet titled, “Demo-Px Improve – Wide” in the accompanying
Excel document.
FIGURE 9. Demonstration of basic assessment analyses using combined dataframes.
Focused vs. Comprehensive. The middle panel in Figure 9 shows the same change
scores per month as the top panel but with patients stratified based on the type of
intervention they receive – comprehensive or focused. Unlike the top panel, creating
these plots required joining data from two different dataframes. First, we needed the
calculations for obtaining the change score per month from a tidy assessment
dataframe. Second, we needed each patient’s authorized number of treatment hours
per week from a “Patient Information” dataframe to get labels of comprehensive or
focused intervention8. To join these dataframes we used a function in Excel called
“VLOOKUP” that allows you to use keys as described above to find the value in one
sheet that is associated with a value in a second sheet. Similar simple functions for
joining multiple dataframes exist for many analytic software programs or user
interfaces for databases. The trick is making sure the data are in the right format
and are complete so that joining dataframes is possible.
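For readers working outside Excel, the same lookup-style join can be sketched in plain Python. The tables, column names, and values are hypothetical; the 10-hour cutoff follows the arbitrary demonstration rule used for Figure 9 and is not a definition of service type.

```python
# Hypothetical tidy assessment results: one row per patient.
change_scores = [
    {"patient_id": "P001", "change_per_month": 2.5},
    {"patient_id": "P002", "change_per_month": 1.0},
    {"patient_id": "P003", "change_per_month": 1.8},
]

# Hypothetical "Patient Information" table keyed by patient_id.
patient_info = [
    {"patient_id": "P001", "authorized_hours_per_week": 25},
    {"patient_id": "P002", "authorized_hours_per_week": 8},
]


def left_join(left_rows, right_rows, key):
    """For each left row, look up the matching right row by key (like VLOOKUP)."""
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        joined.append({**match, **row})  # left values win on any name clash
    return joined


def service_type(hours_per_week):
    """Demonstration rule only: 10 or fewer hours is labeled focused."""
    if hours_per_week is None:
        return None  # no match in the lookup table, so no label
    return "focused" if hours_per_week <= 10 else "comprehensive"


joined = left_join(change_scores, patient_info, "patient_id")
for row in joined:
    row["service_type"] = service_type(row.get("authorized_hours_per_week"))
```

The third patient has no row in the lookup table and so receives no label, which illustrates why incomplete keys limit what a join can deliver.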
Teasing Out the Details. The data in the middle panel of Figure 9 suggest that patients
receiving comprehensive services show more change in assessment scores per month
than patients receiving focused intervention. A natural follow-up question is whether
there is something specific to receiving comprehensive vs. focused services or if the
differences are the result of simply receiving more hours of ABA. The bottom panel in
Figure 9 aims to answer this question for these fake patients and hypothetical data.
Specifically, these scatterplots show the average change scores per month based on the
number of ABA therapy hours each patient receives per week (bottom left panel) and
based on the percentage of authorized hours used by each patient per week (bottom
right panel). No trends in either plot are noticeable, indicating that for these fake
patients at this fake organization there is something specific to the different types of
services they provide that contributes to the average change in assessment scores per
month beyond just the raw number of hours of ABA being received. This insight might
be something that the Clinical Director can follow up on.
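Visual inspection of a scatterplot can be supplemented with a simple statistic. The Python sketch below computes a Pearson correlation between weekly hours and change scores; using this particular statistic is our suggestion rather than something the figure reports, and the numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data: weekly ABA hours and average change score per month.
hours_per_week = [10, 20, 30, 40]
change_per_month = [2, 1, 1, 2]

r = pearson_r(hours_per_week, change_per_month)
```

A value near zero is consistent with the absence of a linear trend, although it does not rule out other kinds of relationships.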
8
We arbitrarily classified any patient authorized for 10 or fewer hours per week as receiving focused
intervention and any patient authorized for more than 10 hours per week as receiving comprehensive
services. This was done as a simple demonstration of the methods practitioners can use to join
dataframes for analysis – not to define service type solely based on this criterion.
Getting Fancy. At this point, readers are likely thinking of many ways they might slice
and dice their data to understand the variables that contribute to differing rates of
changes in assessment scores across all the patients in their organization. For
example, Figure 10 shows how we might analyze average change in assessment
scores based on the supervising practitioners’ years of experience in ABA or their
level of education. These analyses required joining data from a patient assessment
sheet and an employee information sheet. The takeaway is that analyzing the
variables that contribute to patient change in assessment scores across many different
datasets becomes much easier and more efficient when the data are stored in a tidy
format using a basic relational schema between the datasets. If this is done well,
practitioners need not spend their time entering and wrangling data into workable
structures and formats. Instead, behavior analysts could move to more advanced
analyses of change in assessment scores such as controlling for patient risk factors,
improvement as a function of patient cohorts, or improvement as a function of other
therapist characteristics.
SUMMARY
The field of ABA has a long history of collecting, analyzing, and using assessment data to
drive the delivery of evidence-based treatment for the individual patients they serve.
Increasingly, patients and other stakeholders (e.g., third-party payors) are asking ABA
providers to demonstrate more broadly that their treatments work, to quantify the cost
of treatment relative to improvement in patient behavioral health, and to allow for
patients and stakeholders to compare different ABA organizations against one another:
patients have a right to choose the provider best suited to their needs. ABA providers
are also increasingly interested in objectively measuring the effectiveness of the services
they provide compared with other organizations. Systematically measuring patient
outcomes in a manner similar to that of other providers allows organizations to better
understand their strengths and weaknesses. In turn, they can take actionable steps to
improve their services and better treat their patients.
Efficiently analyzing and reporting on treatment outcomes requires that the assessment
data being collected meet at least two criteria: first, that the data are stored in a tidy
format; and second, that each dataframe containing potentially important information
includes the necessary data to join multiple dataframes with related information. When
assessment data are stored in ways that meet these two criteria, ABA organizations can
begin to leverage more advanced data analytics to self-assess their employees’ skills and
abilities thoroughly. Most importantly, these analyses will result in improved patient care
and more efficient use of clinical resources.
REFERENCES
Abidin, R. R. (2012). Parenting Stress Index, 4th Edition | PSI-4. Parinc.
https://www.parinc.com/Products/Pkey/333
Behavior Analyst Certification Board. (2020). Ethics code for behavior analysts.
Littleton, CO: Author. Retrieved from
https://www.bacb.com/wp-content/uploads/2020/11/Ethics-Code-for-Behavior-Analysts-2102010.pdf
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python.
O’Reilly.
Bruni, T. P. (2014). Test review: Social Responsiveness Scale–Second Edition (SRS-2).
Journal of Psychoeducational Assessment, 32(4), 365–369.
https://doi.org/10.1177/0734282913517525
Buescher, A. V. S., Cidav, Z., Knapp, M., & Mandell, D. S. (2014). Costs of autism spectrum
disorders in the United Kingdom and the United States. JAMA Pediatrics, 168(8),
721–728. https://doi.org/10.1001/jamapediatrics.2014.210
Centers for Disease Control and Prevention (2019). Treatment and intervention services
for autism spectrum disorder. https://www.cdc.gov/ncbddd/autism/treatment.html
Centers for Disease Control and Prevention (2020). Autism and Developmental
Disabilities Monitoring (ADDM) Network.
https://www.cdc.gov/mmwr/volumes/69/ss/ss6904a1.htm?s_cid=ss6904a1_w
Chasson, G. S., Harris, G. E., & Neely, W. J. (2007). Cost comparison of early intensive
behavioral intervention and special education for children with autism. Journal of
Child and Family Studies, 16(3), 401–413.
https://doi.org/10.1007/s10826-006-9094-1
Chezan, L. C., Liu, J., Cholewicki, J. M., Drasgow, E., Ding, R., & Warman, A. (2021). A
psychometric evaluation of the Quality of Life for Children with Autism Spectrum
Disorder Scale. Journal of Autism and Developmental Disorders.
https://doi.org/10.1007/s10803-021-05048-y
Cohen, H., Amerine-Dickens, M., & Smith, T. (2006). Early intensive behavioral
treatment: Replication of the UCLA model in a community setting. Journal of
Developmental & Behavioral Pediatrics, 27(2), S145–S155.
https://doi.org/10.1097/00004703-200604002-00013
Cohen, I. L., & Sudhalter, V. (1999). PDD Behavior Inventory. Parinc. Available from
https://www.parinc.com/Products/Pkey/318
Cooper, J. O., Heron, T. E., & Heward, W. L. (2020). Applied behavior analysis (3rd ed.).
Pearson.
Constantino, J. N. (2012). (SRS™-2) Social Responsiveness Scale (2nd ed.). Available
from https://www.wpspublish.com/srs-2-social-responsiveness-scale-second-edition
COSMIN. (n.d.). About the initiative. Retrieved June 16, 2021, from
https://www.cosmin.nl/about/
Dasu, T., & Johnson, T. (2003). Exploratory data mining and data cleaning. Wiley.
Dunn, D. (2019). Peabody Picture Vocabulary Test, Fifth Edition (PPVT-5). Pearson.
Available from
https://www.pearsonassessments.com/content/dam/school/global/clinical/us/assets/ppvt-5/ppvt-5-sample-score-summary-report.pdf
Eikeseth, S., Klintwall, L., Jahr, E., & Karlsson, P. (2012). Outcome for children with
autism receiving early and intensive behavioral intervention in mainstream
preschool and kindergarten settings. Research in Autism Spectrum Disorders, 6(2),
829–835. https://doi.org/10.1016/j.rasd.2011.09.002
Eldevik, S., Hastings, R. P., Hughes, J. C., Jahr, E., Eikeseth, S., & Cross, S. (2009). Meta-analysis
of early intensive behavioral intervention for children with autism. Journal of
Clinical Child & Adolescent Psychology, 38, 439–450.
https://doi.org/10.1080/15374410902851739
Gjevik, E., Eldevik, S., Fjæran-Granum, T., & Sponheim, E. (2010). Kiddie-SADS reveals
high rates of DSM-IV disorders in children and adolescents with autism spectrum
disorders. Journal of Autism and Developmental Disorders, 41(6), 761–769.
https://doi.org/10.1007/s10803-010-1095-7
Goldstein, S., & Naglieri, J. A. (2012). Autism Spectrum Rating Scales (ASRS) [Technical
Report #1]. https://www.acer.org/files/ASRS-Tech-Supp.pdf
Gresham, F., & Elliott, S. (2008). Social Skills Improvement System SSIS rating scales.
https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Social-Skills-Improvement-System-SSIS-Rating-Scales/p/100000322.html
Grey, I., Coughlan, B., Lydon, H., Healy, O., & Thomas, J. (2017). Parental satisfaction
with early intensive behavioral intervention. Journal of Intellectual Disabilities, 23,
174462951774281. https://doi.org/10.1177/1744629517742813
Howard, J. S., Stanislaw, H., Green, G., Sparkman, C. R., & Cohen, H. G. (2014).
Comparison of behavior analytic and eclectic early interventions for young children
with autism after three years. Research in Developmental Disabilities, 35(12), 3326–3344.
https://doi.org/10.1016/j.ridd.2014.08.021
Hyman, S. L., Levy, S. E., & Myers, S. M. (2020). Identification, evaluation, and management of
children with autism spectrum disorder. American Academy of Pediatrics, 145(1), 1–63.
https://doi.org/10.1542/peds.2019-3447
Rogge, N., & Janssen, J. (2019). The economic costs of autism spectrum disorder: A
literature review. Journal of Autism and Developmental Disorders, 49, 2873–2900.
https://doi.org/10.1007/s10803-019-04014-z
Klintwall, L., Eldevik, S., & Eikeseth, S. (2015). Narrowing the gap: Effects of
intervention on developmental trajectories in autism. Autism, 19, 53–63.
https://doi.org/10.1177/1362361313510067
Lino, M., Kuczynski, K., Rodriguez, N., & Schap, T. (2017). Expenditures on children by
families, 2015 (No. 1528–2015). U.S. Department of Agriculture, Center for
Nutrition Policy and Promotion.
https://fns-prod.azureedge.net/sites/default/files/crc2015_March2017.pdf
Markowitz, L. A., Reyes, C., Embacher, R. A., Speer, L. L., Roizen, N., & Frazier, T. W.
(2016). Development and psychometric evaluation of a psychosocial quality of life
questionnaire for individuals with autism and related developmental disorders.
Autism: The International Journal of Research and Practice, 20(7), 832–844.
https://doi.org/10.1177/1362361315611382
Makrygianni, M. K., Gena, A., Katoudi, S., & Galanis, P. (2018). The effectiveness of
applied behavior analytic interventions for children with autism spectrum disorder:
A meta-analytic study. Research in Autism Spectrum Disorders, 18–31.
https://doi.org/10.1016/j.rasd.2018.03.006
Martens, B. K., Witt, J. C., Elliott, S. N., & Darveaux, D. X. (1985). Teacher judgments
concerning the acceptability of school-based interventions. Professional
Psychology: Research and Practice, 16(2), 191–198.
https://doi.org/10.1037/0735-7028.16.2.191
Mokkink, L. B., Prinsen, C. A. C., Bouter, L. M., de Vet, H. C. W., & Terwee, C. B.
(2016). The Consensus-based Standards for the selection of health Measurement
INstruments (COSMIN) and how to select an outcome measurement instrument.
Brazilian Journal of Physical Therapy, 20(2), 105.
https://doi.org/10.1590/bjpt-rbf.2014.0143
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L.,
Bouter, L. M., & de Vet, H. C. W. (2010). The COSMIN checklist for assessing the
methodological quality of studies on measurement properties of health status
measurement instruments: an international Delphi study. Quality of Life Research,
19(4), 539. https://doi.org/10.1007/s11136-010-9606-8
Molenberghs, G., Fitzmaurice, G., Kenward, M.G., Tsiatis, A., & Verbeke, G. (Eds.)
(2020). Handbook of missing data methodology. CRC Press.
Mosley, M., Brackett, M., & Earley, S. (Eds.) (2009). The DAMA guide to the data
management body of knowledge enterprise server version. Technics Publications.
Peters-Scheffer, N., Didden, R., Korzilius, H., & Sturmey, P. (2011). A meta-analytic
study on the effectiveness of comprehensive ABA-based early intervention
programs for children with autism spectrum disorders. Research in Autism Spectrum
Disorders, 5, 60–69. https://doi.org/10.1016/j.rasd.2010.03.011
Reichow, B., Hume, K., Barton, E. E., & Boyd, B. A. (2018). Early intensive behavioral
intervention (EIBI) for young children with autism spectrum disorders (ASD).
Cochrane Database of Systematic Reviews.
https://doi.org/10.1002/14651858.CD009260.pub3
Reynolds, C. R., & Kamphaus, R. W. (2015). Behavior Assessment System for Children
(3rd ed.). Pearson.
https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Comprehensive/Behavior-Assessment-System-for-Children-%7C-Third-Edition-/p/100001402.html
Ridout, S., & Eldevik, S. (2021). Measures used to assess treatment outcomes in
children with autism receiving early and intensive behavioral interventions: A
Review. Manuscript submitted for publication.
Rodgers, M., Simmonds, M., Marshall, D., Hodgson, R., Stewart, L. A., Rai, D., Wright,
K., Ben-Itzchak, E., Eikeseth, S., Eldevik, S., Kovshoff, H., Magiati, I., Osborne, L. A.,
Reed, P., Vivanti, G., Zachor, D., & Couteur, A. L. (2021). Intensive behavioural
interventions based on applied behaviour analysis for young children with autism:
An international collaborative individual participant data meta-analysis. Autism,
25(4), 1137–1153. https://doi.org/10.1177/1362361320985680
Romanczyk, R. G., & Gillis, J. M. (2008). Practice guidelines for autism education and
intervention: historical perspective and recent developments. In J. Luiselli, D. C.
Russo, & W. P. Christian (Eds.), Effective practices for children with autism:
educational and behavior support interventions that work. Oxford University Press.
Sallows, G. O., & Graupner, T. D. (2005). Intensive behavioral treatment for children
with autism: Four-year outcome and predictors. American Journal on Mental
Retardation, 110(6), 417–438.
Waters, C. F., Amerine Dickens, M., Thurston, S. W., Lu, X., & Smith, T.
(2018). Sustainability of early intensive behavioral intervention for children with
autism spectrum disorder in a community setting. Behavior Modification, 00(0), 1–24.
https://doi.org/10.1177/0145445518786463
Williamson, E., Sathe, N. A., Andrews, J. C., Krishnaswami, S., McPheeters, M. L.,
Fonnesbeck, C., Sanders, K., Weitlauf, A., & Warren, Z. (2017). Medical therapies for
children with autism spectrum disorder—An update. Agency for Healthcare
Research and Quality (U.S.).
Schopler, E., Van Bourgondien, M. E., Wellman, G. J., & Love, S. R. (2010). Childhood
Autism Rating Scale™ (2nd ed.; CARS™-2). https://www.wpspublish.com/cars-2-
childhood-autism-rating-scale-second-edition
Silverston, L., & Agnew, P. (2009). The Data Model Resource Book. Wiley.
Simsion, G. C., & Witt, G. C. (2004). Data Modeling Essentials (3rd ed.). Morgan
Kaufmann.
Shimabukuro, T. T., Grosse, S. D., & Rice, C. (2007). Medical expenditures for children
with an autism spectrum disorder in a privately insured population. Journal of
Autism and Developmental Disorders, 38(3), 546–552.
https://doi.org/10.1007/s10803-007-0424-y
Smith, T. (2013). What is evidence-based behavior analysis? The Behavior Analyst, 36(1),
7–33. https://doi.org/10.1007/BF03392290
Smith, D. P., Hayward, D. W., Gale, C. M., Eikeseth, S., & Klintwall, L. (2019). Treatment
gains from early and intensive behavioral intervention (EIBI) are maintained 10 years
later. Behavior Modification, 1–21. https://doi.org/10.1177/0145445519882895
Sparrow, S. S., Cicchetti, D. V., & Saulnier, C. A. (2016). Vineland Adaptive Behavior
Scales (3rd ed.).
https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-
Assessments/Behavior/Adaptive/Vineland-Adaptive-Behavior-Scales-%7C-Third-
Edition/p/100001622.html
Umanath, N. S., & Scamell, R. W. (2014). Data modeling and database design (2nd ed.).
Cengage Learning.
Vajjala, S., Majumder, B., Gupta, A., & Surana, H. (2020). Practical natural language
processing: A comprehensive guide to building real-world NLP systems. O’Reilly.
TABLE 1: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Autism
Autism Spectrum Rating Scales (ASRS), Goldstein and Naglieri (2009)
ASRS™ is a multi-informant norm-referenced measure using a 5-point Likert scale that can be used to identify severity of symptoms and behaviors associated with
ASDs completed by caregivers and teachers.
General Strengths:
Items in the rating scales are based on DSM-V diagnostic criteria for autism
Allows comparisons of performance within age groups
General Weaknesses:
The obtained scores are based on caregiver reports rather than direct observation of the client
TABLE 1: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Autism
Childhood Autism Rating Scale (CARS™-2), Schopler et al. (2010)
Age Range: 2–57 years
Total Items: 15 items scored on a 4-point Likert scale. Three forms: Questionnaire for parents/caregivers, High functioning (6 to 57), Standard Version (2 to 36)
Administration Time & Scoring: 5–10 minutes
Cost: $237.00 (Starter and complete kit, print and digital); $41.00 (Test forms and reports); $41.00 (All products: tests and materials for CARS2)
Qualification of the Assessor: A master's degree in psychology, education, speech-language pathology, occupational therapy, social work, counseling, or in a field closely related to the intended use of the assessment, and formal training in the ethical administration, scoring, and interpretation of clinical assessments
General Strengths:
Items are based on DSM-IV diagnostic criteria for autism
General Weaknesses:
None identified in the literature
TABLE 1: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Autism
Gilliam Autism Rating Scale, Third Edition (GARS-3), Gilliam (2013)
GARS-3 is a multi-informant norm-referenced measure that can be used to identify severity of symptoms and behaviors associated with ASDs completed by
caregivers, teachers, and clinicians.
TABLE 1: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Autism
PDD Behavior Inventory (PDDBI), Cohen and Sudhalter (1999)
PDDBI™ is a rating scale filled out by caregivers or teachers designed to assess children having a pervasive developmental disorder.
Age Range: 5 months–18 years
Total Items: Standard form: 124 items; Extended forms: 180–188 items; scored on a 3-point Likert scale
Administration Time & Scoring: Standard form: 20–30 minutes; Extended form: 30–45 minutes
Cost: $464.00 (PDDBI Teacher and Parent Comprehensive Kit); $111.00 (PDDBI Teacher Rating Form, Pack of 25); $33.00 (PDDBI Parent Score Summary Sheets, Pack of 25); $33.00 (PDDBI Teacher Score Summary Sheets)
Qualification of the Assessor: A master's degree in psychology, education, speech-language pathology, occupational therapy, social work, counseling, or in a field closely related to the intended use of the assessment, and formal training in the ethical administration, scoring, and interpretation of clinical assessments
General Strengths:
Gives consistent measurements of progress over time when compared against treatment plan goal progress
General Weaknesses:
The obtained scores are based on caregiver and teacher reports rather than direct observation of the client
TABLE 1: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Autism
Social Responsiveness Scale- Second Edition (SRS-2), Constantino (2012)
SRS™-2 identifies social impairment associated with ASD and quantifies its severity. Completed by multiple raters who have at least 1 month of experience with
the rated individual.
TABLE 2: Tests Using Norm Referenced Interpretation of Scores Measuring Communication Skills (Reliability Coefficient of 0.8 or Above)
Receptive and Expressive One-Word Picture Vocabulary Tests ⎯ Fourth Edition (ROWPVT-4, EOWPVT-4),
Edited by Brownell (2010), Spanish-Bilingual (2012)
EOWPVT-4 and ROWPVT-4 are individually administered, co-normed tests that measure receptive and expressive vocabulary skills
General Strengths:
Allows comparisons of performance within age groups
Allows comparison between receptive and expressive vocabulary
Has a Spanish-language version
General Weaknesses:
Provides one overall standard score for receptive and one for expressive language
Online scoring and report generation not available
TABLE 2: Tests Using Norm Referenced Interpretation of Scores Measuring Communication Skills (Reliability Coefficient of 0.8 or Above)
Expressive Vocabulary Test ⎯ Third Edition (EVT-3), Williams (2019)
EVT-3 is a norm-referenced and individually administered test of expressive vocabulary that measures use of nouns, verbs, adjectives, and adverbs.
TABLE 2: Tests Using Norm Referenced Interpretation of Scores Measuring Communication Skills (Reliability Coefficient of 0.8 or Above)
Peabody Picture Vocabulary Test ⎯ Fifth Edition (PPVT-5), Dunn (2018)
PPVT™-5 is a norm-referenced and individually administered measure of receptive vocabulary based on words in Standard American English. Assesses use of
nouns, verbs, adjectives, and adverbs of a speaker (expressive language).
Growth Scale Values (GSVs): an objective score for measuring changes in performance over time
PPVT-5 Complete Kit (Form A and B)
TABLE 2: Tests Using Norm Referenced Interpretation of Scores Measuring Communication Skills (Reliability Coefficient of 0.8 or Above)
Test of Pragmatic Language ⎯ Second Edition (TOPL-2), Terasaki and Phelps-Gunn (2007)
Uses norm-referenced interpretation of scores to evaluate pragmatic language skills that involve social communication in context, selecting appropriate content,
expressing feelings, manding, and handling other aspects of pragmatic language.
Age Range: 6–18 yrs 11 mo
Total Items: 43 items
Administration Time & Scoring: 45–50 minutes
Cost: $270.00 (TOPL-2 Kit)
Qualification of the Assessor: A master’s degree in psychology, school counseling, occupational therapy, speech-language pathology, social work, education, special education, or related field
TABLE 2: Tests Using Norm Referenced Interpretation of Scores Measuring Communication Skills (Reliability Coefficient of 0.8 or Above)
Vineland Adaptive Behavior Scales ⎯ Third Edition (Vineland-3), Sparrow, Cicchetti and Saulnier (2016)
Vineland-3 uses norm-referenced interpretation to measure communication, daily living skills, socialization, and motor skills.
General Strengths:
Allows comparisons of performance within age groups using normal curve
Allows comparison between communication, daily living and social skills
Has Spanish forms for parent/caregiver forms
General Weaknesses:
Based on indirect assessment using interviews with caregivers and teacher or rating scales completed by the caregivers and teacher
The communication domain uses receptive, expressive, and written interpretation
TABLE 3: Tests Using Criterion Referenced Interpretation of Scores Measuring Communication Skills
Assessment of Basic Language and Learning Skills, Revised (ABLLS-R), Partington (2010)
The ABLLS-R® is an assessment tool that uses criterion-referenced interpretation of scores to help identify deficiencies in language, academic, self-help, and motor
skills and to monitor progress.
General Strengths:
Measures other skills such as imitation, matching, and basic academic skills
Provides the practitioner with options for selecting goals for intervention
Can be used to track progress over time
Can be administered in any language
General Weaknesses:
Does not include standardized assessment procedures
Age level comparisons cannot be made
Limited published studies that have assessed the psychometric properties of assessment protocols or the efficacy of treatments based on them
TABLE 3: Tests Using Criterion Referenced Interpretation of Scores Measuring Communication Skills
PEAK Comprehensive Assessment (PCA), Dixon (2019)
PCA is designed as an assessment instrument and treatment protocol for addressing language and cognitive deficits in children with autism.
General Strengths:
Addresses foundational learning skills, foundational speaker and listener skills
General Weaknesses:
Does not provide a standardized score and as a result age level comparisons cannot be made
TABLE 3: Tests Using Criterion Referenced Interpretation of Scores Measuring Communication Skills
Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP), Sundberg (2008)
VB-MAPP is an assessment tool, curriculum guide, and skill-tracking system that uses criterion-referenced interpretation of scores.
TABLE 4: Tests Measuring Daily Living and Social Skills
Assessment of Functional Living Skills (AFLS), Partington and Mueller (2012)
The AFLS uses criterion-referenced interpretation of scores. It provides a systematic way to evaluate, track, and teach functional, adaptive, and self-help skills so
that individuals with autism or developmental delays can become more independent.
TABLE 4: Tests Measuring Daily Living and Social Skills
Social Skills Improvement System (SSIS) Rating Scales, Gresham and Elliott (2008)
Offers a targeted and comprehensive assessment of an individual’s social skills (conversations, cooperation, assertion, responsibility, empathy, engagement, and
self-control), problem behaviors and academic competence.
Age Range: 3–18 years
Total Items: 140 items
Administration Time & Scoring: 10–25 minutes
Cost: $139.00 (SSIS Rating Scales); $57.75 (Handscoring package of 25); $71.75 (Computer scoring package of 25)
Qualification of the Assessor: A master’s degree in psychology, school counseling, occupational therapy, speech-language pathology, social work, education, special education, or related field
General Strengths:
Rating scales for parents, clients, and teachers
English and Spanish forms
General Weaknesses:
The SSIS uses caregiver, teacher and client self-reports which could result in over or underestimation of client’s actual social skills
TABLE 4: Tests Measuring Daily Living and Social Skills
Vineland Adaptive Behavior Scales ⎯ Third Edition (Vineland-3), Sparrow, Cicchetti and Saulnier (2016)
Vineland-3 uses norm-referenced interpretation to measure communication, daily living skills, socialization, and motor skills.
TABLE 5: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Problem Behaviors and Executive Functioning Skills
Aberrant Behavior Checklist - Second Edition (ABC), Aman and Singh (1994)
ABC is a symptom checklist for assessing problem behaviors of children and adults with developmental disabilities.
TABLE 5: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Problem Behaviors and Executive Functioning Skills
Behavior Assessment System for Children - Third Edition (BASC-3), Reynolds and Kamphaus (2015)
BASC-3 is a multi-informant norm-referenced measure that can be used to measure severity of problem behaviors in the community, school, and home settings.
TABLE 5: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Problem Behaviors and Executive Functioning Skills
Behavior Rating Inventory of Executive Function - Second Edition (BRIEF-2), Gioia, Isquith, Guy and Kenworthy (2015)
BRIEF-2 is a multi-informant norm-referenced rating scale that measures executive functioning skills in home and school environments.
Age Range: 5–18 years
Total Items: Parent, Teacher, and Self-Report forms: 12 items
Administration Time & Scoring: 5–10 minutes
Cost: $457.00 (BRIEF-2 Parent/Teacher/Self-Report Hand-Scored Kit); $350.00 (BRIEF-2 Parent/Teacher Hand-Scored Kit); $278.00 (BRIEF-2 Screening Parent/Teacher/Self-Report Hand-Scored Kit); $83.00 (BRIEF-2 forms, package of 25)
Qualification of the Assessor: A master's degree in psychology, education, speech-language pathology, occupational therapy, social work, or counseling, or in a field closely related to the intended use of the assessment, and formal training in the ethical administration, scoring, and interpretation of clinical assessments
TABLE 5: Tests Using Norm Referenced Interpretation of Scores Measuring Severity of Problem Behaviors and Executive Functioning Skills
Conners - Third Edition (Conners-3), Conners (2012)
Conners-3 is a multi-informant norm-referenced measure that can be used to measure deficits in executive functioning, attention and levels of
hyperactivity/impulsivity.
TABLE 6: Measuring Social Significance
Child and Family Quality of Life Scale - Second Edition (CFQL-2), Frazier et al. (2020)
CFQL evaluates clinically relevant aspects of psychosocial quality of life in individuals at risk for or with an existing developmental disorder diagnosis and is
completed by caregivers.
TABLE 6: Measuring Social Significance
Family Empowerment Scale (FES), Koren et al. (1992)
FES is designed to measure empowerment in families with children who have emotional, behavioral, or mental disorders.
Age Range | Total Items | Administration Time & Scoring | Cost | Qualification of the Assessor
TABLE 6: Measuring Social Significance
Intervention Rating Profile (IRP-15), Witt and Elliott (1985)
IRP-15 is a single factor scale that has been demonstrated to assess treatment acceptability of various interventions. It can be completed by teachers,
parents/caregivers, and interventionists.
TABLE 6: Measuring Social Significance
Pediatric Quality of Life Inventory (PedsQL), Varni et al. (2001)
PedsQL is a generic health status instrument with parent and child forms that assesses four domains of health (physical functioning, emotional functioning, social
functioning, and school functioning) in children and adolescents.
General Strengths:
It has good validity, reliability, and internal consistency
Translated into multiple languages
Responsive to clinical change over time
General Weaknesses:
Reliance on the caregiver version of the scale may show imperfect agreement between children and parents, as well as parents and professionals.
TABLE 6: Measuring Social Significance
Parenting Stress Index (PSI-4), Abidin (2012)
PSI-4 is a screening and triage measure for evaluating the parenting system and identifying issues that may lead to problems in the child’s or parent’s behavior.
TABLE 6: Measuring Social Significance
Parental Satisfaction Scale-EIBI (PSS-EIBI), Gray et al. (2019)
TABLE 6: Measuring Social Significance
Quality of Life for Children with Autism Spectrum Disorder (QOLASD-C), Chezan et al. (2021)
QOLASD-C assesses quality of life (QOL) as a treatment outcome for children with ASD. Parents rate their child's satisfaction level across three domains:
interpersonal relationships, self-determination, and emotional well-being.
General Strengths:
Consists of simple structure with three domains
Short length of the scale
Decent psychometric properties
General Weaknesses:
Small sample size included in the analysis
One of the three factors (i.e., emotional well-being) had marginal reliability compared with the other two factors
Demographic data related to children’s age, gender, and school attendance were available only for a subsample of children
ACKNOWLEDGMENTS
BHCOE thanks the volunteers and subject matter experts for their assistance
in developing this publication and the resources associated with it, as well as
additional support.