APPLIED NEUROPSYCHOLOGY: CHILD
https://doi.org/10.1080/21622965.2020.1809414

Rethinking “gold standards” and “best practices” in the assessment of autism


Noah K. Kaufman
Center for Neuropsychological Studies, Texas Tech University Health Sciences Center, Las Cruces, New Mexico, USA

ABSTRACT
Failure to correctly diagnose autism is problematic in both the false-positive and false-negative directions. Diagnosing autism when it is not truly present can direct limited resources away from those who actually need the services, while also creating stress and confusion for individuals and families. In contrast, failure to correctly identify autism when it is indeed present can prevent individuals and families from receiving needed support, including early intervention services. Those familiar with current trends in autism assessment are likely aware of "gold standards" involving specific autism tests and "best practices" involving multi-disciplinary autism teams. Curiously, these "gold standard" and "best practice" proclamations have not been adequately scrutinized. The present article aims to address this gap in the literature by (a) discussing the value of autism tests/tools; (b) drawing attention to biasing influences in autism assessment; (c) identifying methodological flaws in "gold standard" autism assessment research; and (d) proposing that more assessment, not less, might be better in autism assessment. It is concluded that it is time to rethink "gold standards" and "best practices" in the assessment of autism.

KEYWORDS
ADI-R; ADOS; assessment; autism; autism spectrum disorder; "best practices"; "gold standard"; neuropsychological; psychological

Introduction

Current "best practices" in autism assessment equates to multi-disciplinary autism teams using "gold standard" tests to make a diagnosis of autism. While this sounds reassuring, the scientific support underlying these proclamations is characterized by conflicts of interest that are easily missed by non-experts who, understandably, do not meticulously pore through the unexciting nooks of the supporting scientific manuscripts. For example, many of the same researchers studying the autism tools also benefit from the widespread acceptance, and sale, of the tools. Likewise, the overly optimistic claims of diagnostic accuracy are riddled with methodological flaws. For example, diagnostic accuracy statistics are calculated using samples with extremely high autism base rates and then generalized to clinical settings, where the base rate for autism is much lower. And, in the case of multi-disciplinary autism teams being "best practice," data collection and treatment planning is, arguably, enhanced through inclusion of professionals from varied backgrounds. However, no research has been done to support a blanket "best practices" assertion. Moreover, if the social psychological literature on group decision-making is of any value, there are powerful reasons to question a group decision-making process when diagnosing autism. While there are defensible reasons to believe that tests and tools can improve diagnostic decision-making in the assessment of autism, tests are no substitute for an active awareness of biasing influences and other general principles of assessment, which are at the core of fields like psychology and neuropsychology. For these and other reasons explored below, it is time to rethink "gold standards" and "best practices" in the assessment of autism.

Clinical judgment and autism tests/tools

Autism is a diagnosis that is defined in the Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5; American Psychiatric Association, 2013) based on the presence of certain behaviors. Although genetic testing for those with autism is recommended, because tuberous sclerosis and fragile X syndrome both overlap with autism at a rate that exceeds chance (Bailey et al., 1996; Patzer & Volkmar, 1999), there are no genetic tests for autism. In fact, there are no "autism tests," meaning that a diagnosis of autism ultimately comes down to the diagnostic criteria found in DSM-5 and clinical judgment. Even the assessment tools described by many as "gold standards"—viz., the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1994) and the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2; Lord et al., 2012)—are to be used in conjunction with clinical judgment:


The ADI-R and ADOS provide unique and critical overlapping information that informs clinical judgments [emphasis added] in making an ASD diagnosis. When results from these instruments are consistent and correspond with clinical impression [emphasis added], diagnostic decision-making is straightforward. Additional testing and alternative hypotheses must be considered when there is inconsistency in results between the instruments and when the instrument results deviate from clinical impression [emphasis added] (Risi et al., 2006, p. 1102).

It is reasonable to ask why autism assessment tools are needed in the diagnostic process, given that there is no mention of testing in the DSM-5 diagnostic criteria for autism spectrum disorder. This is not to imply that testing is not beneficial in the assessment of autism; indeed, as will be explained below, testing frequently improves diagnostic decision-making. Rather, it is noteworthy that a diagnosis of autism spectrum disorder in DSM-5, in comparison to some other diagnoses, does not require a certain score on any autism assessment test or tool. For a comparative perspective, the diagnosis of intellectual disability, another neurodevelopmental disorder described in DSM-5, does involve actual testing, which is explained in DSM-5 as follows: "The diagnosis of intellectual disability is based on both clinical assessment and standardized testing of intellectual and adaptive functions … Individual cognitive profiles based on neuropsychological testing are more useful for understanding intellectual abilities than a single IQ score." (p. 37).

Whereas a DSM-5 diagnosis of intellectual disability does hinge, to some extent, on IQ, adaptive functioning, and sometimes even neuropsychological testing results, a DSM-5 diagnosis of autism spectrum disorder does not overtly rely on any testing whatsoever. And yet, the trend in the field is to use "gold standard" assessment tools (viz., some version of the ADOS and the ADI-R).

To illustrate, it is not unheard of for insurance company representatives to authorize more units if the ADOS is used. Consider also that it is very common for a practicing neuropsychologist or psychologist to be asked by a referral source, or the parent of a referred child, whether or not the ADOS will be utilized. This theme was recently addressed by Gwynette et al. (2019) as follows:

A concerning trend has emerged in the diagnosis and treatment of autism spectrum disorder (ASD) that has a negative impact on care. Quite often, a clinician's diagnosis of ASD using DSM-5 criteria is no longer sufficient for individuals with ASD to access services. Insurance companies, school districts, and developmental disability agencies commonly require an Autism Diagnostic Observation Schedule (ADOS) to be eligible for services. (p. 1222)

Even though the DSM-5 criteria for autism spectrum disorder do not call for, let alone require, the use of any assessment tools, clinicians serving those with neurodevelopmental disorders know that there is now tremendous pressure to make the diagnosis of autism using the ADOS, if not also the ADI-R.

Taking into consideration the scientific literature on the limits of subjective clinical judgment, as compared with clinical judgment based on structured data collection (e.g. a structured clinical interview) and empirical weighting of data (Faust & Ahern, 2012; Meehl, 1954), it would seem to make sense to incorporate certain tests and tools into the diagnosis of autism, as has been done with other diagnoses in DSM-5 (e.g. intellectual disability, specific learning disorders, language disorder, neurocognitive disorders). While a psychometric test is not an actuarial model, per se, it is a highly structured way of collecting clinical data, making it much more similar to actuarial decision-making methods than subjective clinical judgment based on unstructured interview. Therefore, it stands to reason that a diagnosis of autism should require data from assessment tools with acceptable psychometric properties. Hence, use of the ADOS and ADI-R, assuming acceptable psychometric properties, also stands to reason.

Beyond the limits of unstructured clinical judgment, there are other implied reasons to include assessment tools as a part of the diagnostic process when autism is under consideration. One obvious example relates to ascertaining whether or not the symptoms are exclusively due to intellectual disability, or if autism and intellectual disability are both present. To accomplish this, an IQ test and an assessment of adaptive functioning is required. As conveyed by the following from DSM-5, the diagnostic process terminates with a diagnosis of intellectual disability, not autism spectrum disorder, if social and/or communication skills are approximately as low as the low IQ:

A diagnosis of autism spectrum disorder in an individual with intellectual disability is appropriate when social communication and interaction are significantly impaired relative to the developmental level of the individual's nonverbal skills (e.g. fine motor skills, nonverbal problem solving). In contrast, intellectual disability is the appropriate diagnosis when there is no apparent discrepancy between the level of social-communicative skills and other intellectual skills. (p. 58)

Similarly, DSM-5 now defines autism spectrum disorder in a way that requires consideration of "intellectual impairment" and "language impairment" (p. 51), both of which imply a need for psychometric testing. Even though DSM-5 does not specifically mention assessment tools in making a diagnosis of autism spectrum disorder, it implies the need for psychometric assessment of IQ, adaptive functioning, and speech-language functioning.

Other reasons to include assessment tools in the diagnostic process of autism are supported by the fundamentals of psychological and neuropsychological assessment. Much of the impetus for these two fields came from difficulty clinicians experienced when trying to correctly identify those with actual neurodevelopmental deficits. Alfred Binet and Theodore Simon of France, early developers of the IQ test, wrote about how doctors were unable to correctly identify children with learning difficulties (Binet & Simon, 1908). As a result of the unsystematic, random clinical decisions doctors were making about which children were educable and which were not, Binet and Simon developed the Binet–Simon Scale, considered to be the first IQ test (Kaufman, 2009). More recently, Dahlstrom (1993) emphasized the limitations of human judgment when it is unaided by methods to collect and interpret clinical data in a more structured manner:

People are poor judges of other people. In our dealings with one another we are subject to a variety of systematic and random errors … Positive and negative halos … serve to cloud our judgments and evaluations … this fallibility in the judgments made by humans about fellow humans is one of the primary reasons that psychological tests have been developed and applied in ever-increasing numbers over the past century. (p. 393)

In short, psychometric assessment tests and tools can improve clinical decision-making, above and beyond what can be accomplished only using unstructured clinical interview, semi-structured or structured interviews, structured behavioral observations, and so forth. Importantly, however, the degree of improvement in clinical decision-making depends on the specific test or tool and how it is utilized by the clinician. Therefore, psychometric assessment in the context of autism would certainly seem to have its place, even though the DSM-5 does not explicitly call for it. So, again, use of the ADOS and ADI-R in the assessment of autism makes sense, insofar as these tools truly improve the quality of data upon which diagnoses are based.

Biasing influences in autism assessment

All assessment is influenced by bias, which is important irrespective of what particular assessment test or tool is used to improve diagnostic decision-making. Accordingly, it is essential for mental health professionals conducting autism assessments to minimize bias, no matter what test or tool they do, or do not, use. One way to conceptualize bias during assessment is to separate biasing influences that operate during data collection and those that operate during data interpretation (Kaufman & Bush, 2020a, 2020b).

Bias during data collection

Of the many possible biasing influences at work during data collection, the following are particularly salient in the assessment of autism: medication; age; emotions; dissimulation (Rogers & Bender, 2013); response expectancy (Suhr & Wei, 2013); and jurisogenic effects (Weissman, 1990). More explanation of these biasing influences is provided next, beginning with the biasing influence of medications.

Some psychotropic medications can cause individuals to seem as if they meet diagnostic criteria for autism. While many clinicians are skilled at factoring in medication effects, thereby minimizing a confounding influence, this point is still worth emphasizing, especially since some individuals have been heavily medicated for years, making it harder for even close family members to appreciate the impact of the medication. To illustrate, antipsychotic medication, especially when combined with a stimulant, can make an otherwise animated, erratic youngster appear socially detached and robotic. This needs to be considered during assessment.

Age is important because, for example, it is known that "the process of brain maturation is long, lasting at least into early adulthood." (Kolb & Fantie, 2012, p. 41). Simply put, it is sometimes very difficult to know if a young child has autism, as opposed to another neurodevelopmental disorder, making it important to proceed cautiously when trying to understand and characterize the workings of an immature nervous system.

Next, an individual in a major depressive episode might present with deficits in social communication and social interaction. But these deficits might be exclusively due to emotional functioning at the time of the assessment—not autism. Alternatively, depression may interact with neurodevelopmental deficits, blurring the clinical picture.

Rogers and Bender (2013) defined dissimulation as "a general term to describe an inaccurate portrayal of symptoms and associated features. It is typically used when more precise terms (e.g. malingering and defensiveness) are inapplicable." (p. 518). Dissimulation can erode the quality of clinical data when, for example, a parent overreports symptoms in their child for reasons that may not be definitively known. Alternatively, some parents may not openly share the full magnitude of their child's challenges and difficulties with the clinician, thereby inaccurately representing their child's symptoms. Similarly, if it were established through validity testing and other methods that an individual, or a parent, were intentionally reporting fake symptoms of autism, the data could lead the clinician down a diagnostic rabbit hole, possibly resulting in the delivery of costly services that are needed elsewhere.

In contrast, response expectancies are "expectations for automatic emotional, physical, or behavioral responses as reactions to situational cues" (Suhr & Wei, 2013, p. 183). If, for example, a parent came to believe that their child has autism, depending on the child's age and level of development, it is conceivable that the child would present with symptoms of autism due to parental influence. Similarly, it is conceivable that the parent would describe their child in accord with beliefs that are influenced by inaccurate information the parent has come to rely on. Suhr and Wei (2013) described this phenomenon as follows:

As in other response expectancies, disease-specific beliefs may be learned through not only personal experience but also the suggestions of others (news media, public health announcements, physician suggestion) and can even be learned through observation (e.g. epidemic hysteria; Hahn, 1999). In other words, when one is given diagnosis X and then reads about diagnosis X, hears about it on television, attends support groups for diagnosis X, and meets others with diagnosis X, etc., this can create response expectancy templates that include how X might affect cognitive abilities and performance on tasks within a neuropsychological assessment. (p. 189)

The jurisogenic effect is perhaps a close cousin to response expectancy, but it only arises in the context of litigation. In essence, involvement in protracted litigation, along with the potential for some sort of monetary compensation if a case is won, can exacerbate a litigant's personality in such a way that they seem to meet diagnostic criteria for autism. To illustrate, consider a scenario where a family sues a school for not diagnosing their child with autism, which occurs with noteworthy frequency (Hill et al., 2011; Zirkel, 2011).

Bias during data interpretation

Of the many possible biasing influences at work during data interpretation, the following stand out with regard to autism assessment: ignoring the base rate; dilution effects; illusory correlation; pressure from the family, an individual being assessed, an institution (e.g. a school), or retaining legal counsel; diagnosis momentum; hindsight bias; confirmation bias; regression toward the mean; and clinician overconfidence (Croskerry, 2002). More in-depth explanation of these biasing influences follows, beginning with ignoring the base rate.

Ignoring the base rate (Croskerry, 2002) is relevant to autism diagnosis because autism is extremely rare. An estimated 1 in 59 children have autism at age eight years (Baio et al., 2018; Centers for Disease Control, 2014), which corresponds with a base rate of 1.69%. This is very low. The base rate according to the DSM-5 is even lower, at 1%. Therefore, it is always important to appreciate that the odds of any particular individual having autism are also low. In fairness, it is necessary to recognize that the base rate of those who truly have autism is higher in certain settings (e.g. all children referred for outpatient pediatric neuropsychological assessment). Nonetheless, one must remain mindful that autism is still not a common disorder, which means correctly identifying it is more challenging, especially when other neurodevelopmental disorders might explain the symptoms.

Dilution effects (Tetlock & Boettger, 1989) can bias autism assessment when clinical information with high relevance is underweighted during the assessment process, in favor of clinically irrelevant information. For example, some parents may not be able to remember important historical details about their child's development that are very important (e.g. severe lack of oxygen during delivery), while simultaneously becoming very invested in information with low clinical relevance (e.g. vaccines as a cause of symptoms, or the perception that the individual has savant-like abilities, when such may not be the case). If this occurs, the highly relevant information will get diluted. As it turns out, accuracy in decision-making, the underlying purpose for assessment (Sattler, 2001), is optimized when clinicians properly weight a relatively small number of highly relevant variables:

A ceiling on predictive accuracy is commonly approached or reached once the three to five most valid, and least redundant, variables have been identified. Beyond this point, additional information tends to increase confidence, not accuracy. More problematically, at times continuing to add information may decrease accuracy because weaker predictors may dilute the stronger predictors. (Faust et al., 2009, p. 14)

This is why "more is better" does not exactly apply to assessment. In making this point, Frederick (2012) stated, "Mental health treatment records are potentially misleading because diagnoses are often arbitrarily assigned, without rigorous application of diagnostic guidelines." (p. 536); Frederick also pointed out that "Reports of friends, family members, and neighbors also have variable consistency and reliability. Such reports should not be casually integrated into reports." (p. 537). The key word in the foregoing is "casually." Report from those who know the individual can be extremely valuable and should be sought; however, automatically weighting it strongly, in the absence of support from validity testing of raters, may dilute other more valid data points. Stated plainly, if a parent provides invalid data and the clinician does not assess for the validity of the parent report, the clinician can end up down a diagnostic rabbit hole, as noted above.

The symptoms of autism, and many other mental disorders, occur in the general population more often than might be realized. For example, most teens and adults have, at some point in their life, displayed "inflexible adherence to routines," "rigid thinking patterns," and/or "extreme distress at small changes" (p. 50). Because of this, it would be a clinical mistake to conclude, based only on the presence of these symptoms, that an individual satisfies DSM-5 diagnostic criteria for autism spectrum disorder. Making this erroneous associative connection between symptoms and the diagnosis of autism is called an illusory correlation (Chapman & Chapman, 1969). Clinicians conducting assessments can reduce the likelihood of bias from an illusory correlation by considering how often symptoms are present, and absent, in relation to those who truly have the disorder, versus those who do not have the disorder. If large numbers of people have the symptoms, but not the disorder, an illusory correlation may be present.
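
A brief illustration of this base-rate logic may help (the symptom frequencies below are invented purely for illustration; only the roughly 1.69% base rate comes from the estimate discussed above). Even a behavior shown by most people who truly have autism will, at such a low base rate, be seen far more often in people who do not have it:

```python
# Illustrative numbers only: assume a behavior occurs in 90% of people with
# autism and 20% of people without it; the base rate is the ~1.69% estimate above.
base_rate = 0.0169
p_symptom_given_autism = 0.90     # assumed for illustration
p_symptom_given_no_autism = 0.20  # assumed for illustration

p_symptom = (p_symptom_given_autism * base_rate
             + p_symptom_given_no_autism * (1.0 - base_rate))
p_autism_given_symptom = (p_symptom_given_autism * base_rate) / p_symptom
print(f"P(autism | behavior present) = {p_autism_given_symptom:.2f}")  # roughly 0.07
```

Under these assumed frequencies, more than nine out of ten people showing the behavior would not have autism, which is the arithmetic behind both base-rate neglect and illusory correlation.
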
Pressure from retaining legal counsel is known to bias the opinions of legal experts (Murrie & Boccaccini, 2015). Similarly, it is not uncommon for individuals or family members to present for an assessment with preexisting ideas about the nature of the individual's difficulties. This might be a result of response expectancies, as explained above. Similarly, a clinician working within a school or agency—or as a part of a multi-disciplinary team—might be influenced by pressure to make certain clinical conclusions. Under any of these circumstances, the quality of the assessment may suffer as a result of this outside pressure.

Diagnosis momentum (Croskerry, 2002) and hindsight bias (Fischhoff, 1975) are similar concepts. Diagnosis momentum is when a clinician automatically re-applies a diagnosis to an individual, without engaging in their own diagnostic decision-making process. Hindsight bias is when a clinician engages in their own diagnostic decision-making process, but is heavily influenced by a diagnosis that was already made. With hindsight bias, there is an "I knew it all along" quality to the decision-making, which can be particularly influential if the original diagnosis was made by a prominent clinician or group of clinicians. In studies seeking to establish diagnostic accuracy statistics for autism tests, best-estimate diagnoses of autism are frequently made by clinicians who have access to the results of autism tests. With this unfortunate research design, hindsight bias can quickly eat away at the methodological core of the study, artificially boosting the diagnostic accuracy statistics for the autism test.

Confirmation bias (Wason, 1960) has relevance to autism assessment. It is known that decision-makers frequently do not manage competing information particularly well. To reduce the cognitive dissonance (Festinger, 1957) created by competing information, decision-makers often seek out information that accords with their beliefs or hypotheses. This is termed confirmation bias. Diagnostic decisions about whether or not an individual has autism are sometimes very straightforward, making the decision relatively easy.

For example, consider the following individual characteristics: male; age 7 years; positive for Fragile X Syndrome; macrocephalic (i.e. very large head); full-scale IQ = 53; extremely poor verbal and non-verbal pragmatics; extremely limited speech; echolalia; intolerant to schedule changes; fixated on trains; and hypersensitive to loud noises. In actual practice, however, a high percentage of cases are much more complex, a topic nicely addressed by Wolff et al. (2018). For example, consider the following individual characteristics: male; age 44; negative for Fragile X Syndrome and Tuberous Sclerosis; macrocephalic; always lived with mother; was placed in special education due to "borderline retardation;" never held a job; cannot manage money without help; often gives money to those who ask for it; has a driver's license; full-scale IQ = 69; very limited adaptive functioning based on brother report; never married; no children; has had three girlfriends in his adult life; and primary interest is singing. In contrast to the first case described, where autism is very probable, the second case is much more challenging because intellectual disability might better explain the symptoms.

When diagnostic decision-making is complex, diagnostic decisions are more vulnerable to confirmation bias, which can creep in to help resolve the cognitive dissonance created by competing information. Notably, this biasing effect may be particularly powerful when multi-disciplinary teams collaborate to make the diagnostic decision, given what is known about group decision-making (Myers, 2008):

In groups, we become more aroused, more stressed, more tense, more error-prone on complex tasks. Submerged in a group that gives us anonymity, we have a tendency to loaf or have our worst impulses unleashed by deindividuation … Discussion in groups often polarizes our views … it may also suppress dissent, creating a homogenized groupthink that produces disastrous decisions. (p. 297)

Of note, zero-sum thinking about individual versus team-based assessment of autism is not the point of emphasizing confirmation bias in groups, given that professionals from different educational backgrounds can make independent, and therefore valuable, clinical contributions (Goldstein & Ozonoff, 2018).

Regression toward the mean, an important term from Sir Francis Galton, was described by Campbell and Kenny (1999) as follows: "scores that are extreme in standard deviation units on one measure are not likely to be as extreme when measured on another measure" (p. 170). In their discussion of the term, Faust and Ahern (2012) described it as follows: "There is a well-known tendency for extreme occurrences to be followed by less extreme occurrences merely due to the operation of chance." (p. 203). Insofar as chance contributes to the occurrence of an event, extreme events are more likely to be followed up by less extreme events. Given the low base rate for autism, having autism is an uncommon, and therefore, extreme event. Similarly, obtaining a test score on a psychometric test that supports a diagnosis of autism is also an uncommon and extreme event. And since psychometric tests lack perfect reliability, chance will always play a role in obtained psychometric test scores. It follows, therefore, that diagnostic decision-makers are well advised to keep regression toward the mean in mind when interpreting information that supports a diagnosis of autism. It might be that what is observed during one assessment is not observed during a subsequent assessment, especially to the extent that what is observed is extreme and measured using methods with imperfect reliability.
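
To see how much of this is pure chance, consider a short simulation (a minimal sketch written for this discussion; the reliability of 0.80, the 2 SD selection cutoff, and the sample size are illustrative assumptions, not values from the sources cited above):

```python
import numpy as np

# Illustrative simulation of regression toward the mean with an imperfectly
# reliable measure. Reliability of 0.80 is assumed purely for demonstration.
rng = np.random.default_rng(0)
n = 100_000
reliability = 0.80

# True scores plus independent measurement error on two occasions, scaled so
# each observed score has unit variance and the expected correlation between
# the two occasions equals the assumed reliability.
true = rng.normal(0.0, np.sqrt(reliability), n)
obs1 = true + rng.normal(0.0, np.sqrt(1.0 - reliability), n)
obs2 = true + rng.normal(0.0, np.sqrt(1.0 - reliability), n)

# Select cases that looked extreme (>= 2 SD) at the first assessment.
extreme = obs1 >= 2.0
print(f"Mean z at time 1 (selected cases): {obs1[extreme].mean():.2f}")
print(f"Mean z at time 2 (same cases):     {obs2[extreme].mean():.2f}")
# The time-2 mean is pulled back toward 0, illustrating why an extreme,
# diagnosis-consistent score may not reappear at a later assessment.
```

Cases selected for an extreme score at the first assessment average noticeably closer to the mean at the second, simply because measurement error does not repeat itself.
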
Clinician overconfidence (Croskerry, 2002) can be particularly problematic in autism assessment. Clinicians involved in diagnosing autism are likely familiar with other clinicians who document bold, highly confident diagnostic impressions in their reports about the presence, or absence, of autism. Or, clinicians involved in diagnosing autism may themselves boldly propound diagnostic statements about the presence, or absence, of autism. The research shows that overconfidence about diagnostic decision-making can be very problematic: "confidence accounts for 2% of variance in judgment accuracy." (Miller et al., 2015, p. 9). Stated plainly, confidence about one's diagnostic decision does not equate to accuracy about one's diagnostic decision, which has particular relevance to autism, given that many cases are very complex. Reframed differently, a high degree of cognitive dissonance (Festinger, 1957) can accompany autism assessments, because cases can be complex (Wolff et al., 2018). Some clinicians will resolve this cognitive dissonance in the direction of "being right." But the research shows that such confidence about "being right" does not correlate with accuracy—a counterintuitive research finding that those trying to correctly diagnose autism are encouraged to consider.

In summary, bias during data collection and data interpretation can erode the quality of mental health assessments. This includes assessments directed at the correct identification of autism. No matter what method a clinician chooses to utilize, assessments will suffer if biasing influences are not actively taken into consideration. Hence, even if a clinician uses "gold standard" assessment tools, conclusions can be incorrect if biasing influences are not dispassionately built into the clinical decision-making process. The same holds true for "best practices" involving multi-disciplinary autism teams, in light of the compelling social psychological research on group decision-making (e.g. higher stress among team members, more error on complex tasks, social loafing, deindividuation, polarization of views, suppression of dissent, and homogenized groupthink). In other words, groups of individuals attempting to make an important decision—i.e. whether or not an individual has autism—are not beyond the influence of bias. In fact, because of the social psychology of group decision-making, multi-disciplinary autism teams may introduce more sources of bias into the diagnostic process.

Problems with research supporting current "gold standard" autism assessment tools

In their recent discussion of assessment tools for autism, Pennington et al. (2019) stated, "The two main diagnostic tools for ASD are the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2; Lord et al., 2012) and the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1994). Both are standardized measures that have become the "gold standard" in the ASD field. Both the ADI-R and ADOS-2 have rigorous training procedures to reach reliability standards." (p. 292). Notably, however, this "gold standard" proclamation is not supported by a specific reference with scientific evidence. In their defense, Pennington, McGrath, and Peterson are not alone in making this bold proclamation. For years now, many others involved in autism assessment have described these assessment tools as "gold standards," raising an important question: Why?

It is known, for example, that the term "gold standard" is to be avoided in the social sciences. In making this point, Lilienfeld et al. (2015) wrote, "In the domains of psychological and psychiatric assessment, there are precious few, if any, genuine 'gold standards.' Essentially all measures, even those with high levels of validity for their intended purposes, are necessarily fallible indicators of their respective constructs … " (p. 4).

A review of the autism literature over the past couple of decades points to a study by Risi et al. (2006), in which evidence of diagnostic accuracy is provided for both the ADI-R and the first version of the ADOS (Lord et al., 1999). The authors of this study, some of whom are also authors of the ADI-R and/or ADOS, conclude that use of the ADI-R and ADOS in the assessment of autism improves diagnostic decision-making: "The Autism Diagnostic Interview-Revised and Autism Diagnostic Observation Schedule make independent, additive contributions to the judgment of clinicians that result in a more consistent and rigorous application of diagnostic criteria." (p. 1094). Largely as a result of this conclusion, this study has subsequently been heavily relied upon to justify use of the "gold standard" terminology when describing the ADI-R and ADOS/ADOS-2. However, a closer inspection of the data in the Risi et al. (2006) study leads to more questions about why these autism assessment tools are automatically embraced by so many as "gold standards."

To illustrate, Risi et al. (2006) reported that combining the ADI-R and ADOS in their U.S. sample (age 3+ years) resulted in sensitivity of 82.0% (95% CI = 78–85) and specificity of 86.0% (95% CI = 83–89) in the prediction of autism. This means that the combined, overall accuracy rate, based on Youden's J Index (Biggerstaff, 2000; Youden, 1950), is 0.68 (Youden's J Index = [0.82 + 0.86] - 1 = 0.68). This value is the likelihood of making a correct diagnostic decision about autism, or a 68% overall likelihood of being correct. While this is better than flipping a coin to see if someone has autism, it is not high enough to instill much confidence about the diagnostic decision, let alone to justify a "gold standard" declaration. Moreover, these diagnostic accuracy scores are based on a combined mathematical algorithm of best fit for using the ADI-R and ADOS in combination—something practitioners seldom, if ever, actually do. More frequently, because "time and cost concerns are often a barrier" (Pennington et al., 2019, p. 293), clinicians use one or the other of these assessment tools, but not both.

This is very important because when sensitivity (i.e. the percentage of those correctly identified by the test as having the condition) and specificity (i.e. the percentage of those correctly identified by the test as not having the condition) for the ADI-R and ADOS are considered separately, for each of the two assessment tools, the Youden's J Index for each tool drops further. To illustrate using the Risi et al. (2006) data, Youden's J Index equals 0.32 for the ADI-R among those in the U.S. sample with intellectual disability. Similarly, Youden's J Index equals 0.182 for the ADOS in this same sample.

To further appreciate why the ADI-R and ADOS may not be "gold standards," consider the Canadian data (age 3+ years) from the Risi et al. (2006) study: sensitivity = 77.2% (95% CI = 70–83); specificity = 75.0% (60–86); and Youden's J Index = 0.522. This means the likelihood of making a correct diagnostic decision about the presence of autism among Canadians age 3+ years, using the ADI-R and ADOS in combination, is at about a chance level. Now consider the likelihood of making a correct diagnostic decision about the presence of autism among those in the U.S. with a diagnosis of intellectual disability. Given sensitivity of 91.1% (95% CI = 83–98) and specificity of 50.0% (95% CI = 31–75), Youden's J Index is .411, which falls below a chance level.

Youden's J Index is a valuable overall index of diagnostic accuracy, in part because it is independent of the sample base rate (i.e. the number of those who truly have the condition [autism]). But it has been criticized as hard to interpret (Glas et al., 2003). An alternative to Youden's J Index is the Hit Rate, another overall index of diagnostic accuracy computed as follows: (True Positives + True Negatives)/(True Positives + True Negatives + False Positives + False Negatives). Simply put, the Hit Rate is the percentage of those correctly identified by the test, both in terms of true positives and true negatives. However, the Hit Rate is affected by the sample base rate; the higher the sample base rate, the higher the Hit Rate, and vice versa. Therefore, if research is conducted on a sample where those with the condition of interest (i.e. autism) is high, the Hit Rate will be higher. But if the research is performed on a sample where those with the condition of interest (i.e. autism) is low, the Hit Rate will be lower. Hence, the Hit Rate is higher when the condition of interest is abundant and easy to identify, but lower when the condition of interest is rare and hard to spot.

Using data in Risi et al. (2006), the base rate for autism in each of the four samples is uncommonly high: 540/960 = 0.563 for participants 3+ years old in the U.S.; 162/270 = 0.600 for participants <3 years old in the U.S.; 45/66 = 0.620 for participants with profound intellectual disability in the U.S.; and 184/232 = 0.793 for participants 3+ years old in Canada. As expected, using these high sample base rates for autism, the Hit Rates are also higher (Table 1). This is notable because, as pointed out above, the population base rate for autism is currently estimated to be between 1 and 1.69%, which is much lower than the base rates in the Risi et al. (2006) study.

Therefore, correctly identifying autism with the ADI-R-ADOS combination is more challenging than Risi et al. (2006) would suggest, especially in real-world clinical settings.
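
To make the arithmetic concrete, the following short Python sketch (written for this discussion; it simply re-applies the two definitions above to the sensitivity, specificity, and sample sizes reported by Risi et al., 2006) computes Youden's J Index and the Hit Rate for two of the samples and closely reproduces the corresponding Table 1 entries:

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J Index = sensitivity + specificity - 1; independent of the base rate."""
    return sensitivity + specificity - 1.0

def hit_rate(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Overall proportion of correct decisions (true positives plus true negatives,
    as a share of everyone tested). Unlike Youden's J, this depends on how common
    the condition is in the sample."""
    return sensitivity * base_rate + specificity * (1.0 - base_rate)

# Values reported by Risi et al. (2006) for the ADI-R + ADOS combination.
samples = {
    "U.S.: 3+ yrs.":   (0.820, 0.860, 540 / 960),
    "Canada: 3+ yrs.": (0.772, 0.750, 184 / 232),
}
for label, (sens, spec, base) in samples.items():
    print(f"{label}  J = {youden_j(sens, spec):.3f}  hit rate = {hit_rate(sens, spec, base):.3f}")
# Output is approximately J = 0.680 / 0.522 and hit rates of ~0.84 / ~0.77,
# in line with the Table 1 entries below.
```
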

Table 1. Indices of diagnostic accuracy for autism using the ADI-R + ADOS.

Sample             Sensitivity (%)   Specificity (%)   Youden's J Index   Hit rate
U.S.: <36 mo.      80.9              87.0              0.679              0.833
U.S.: ID           91.1              50.0              0.411              0.776
U.S.: 3+ yrs.      82.0              86.0              0.680              0.838
Canada: 3+ yrs.    77.2              75.0              0.522              0.767

Note. ADI-R = Autism Diagnostic Interview-Revised; ADOS = Autism Diagnostic Observation Schedule; ID = intellectual disability.

For this reason, the Hit Rate as an overall index of diagnostic accuracy has limitations that do not apply to Youden's J Index, which is independent of the sample base rate. When Youden's J Index is computed on the Risi et al. (2006) data, the combination of the ADI-R and the ADOS often adds little to the diagnostic decision-making process, thereby challenging the widely adopted view that these assessment tools are "gold standards." Moreover, further consideration of all the data in the Risi et al. (2006) paper reveals an overall pattern demonstrating that the ADI-R and ADOS, especially used separately, do not improve overall diagnostic decision-making as much as one might think.

There are limitations with indices of overall diagnostic accuracy that attempt to combine sensitivity and specificity (Grimes & Shulz, 2002), which extends to both Hit Rate and Youden's J Index, so it is common to focus on interpreting sensitivity and specificity, as well as positive predictive value and negative predictive value. Briefly, sensitivity is the ratio of true positives divided by the true positives plus the false negatives (TP/(TP + FN)); in other words, sensitivity is the percentage of those correctly identified by the test as having the condition. In comparison, specificity is the true negatives divided by the true negatives plus the false positives (TN/(TN + FP)); in other words, specificity is the percentage of those correctly identified by the test as not having the condition.

Notably, sensitivity and specificity describe how a certain assessment tool performed, in the past, with a particular sample and under specific circumstances. As explained by Grimes and Shulz (2002), sensitivity and specificity "look backward" (p. 882) at results gathered on a particular group of individuals and under particular circumstances, making these indices of less value to clinicians, who want to make predictions about what will happen in the future with those they assess, who may differ dramatically from the sample originally studied. Because they are forward-looking, the positive and negative predictive values are frequently relied upon, instead of sensitivity and specificity, which describe historical information. Briefly, the positive predictive value is the ratio of true positives divided by true positives plus false positives (TP/(TP + FP)); this is the chance of truly having the condition (i.e. autism) if the autism test is positive. In contrast, the negative predictive value is the ratio of true negatives divided by true negatives plus false negatives (TN/(TN + FN)); this is the chance of not having the condition (i.e. autism) if the autism test is negative.

In Risi et al. (2006), what appear to be very high positive and negative predictive values are reported. But this is misleading because the reported base rates of autism in this particular study are uncommonly high: .563 for participants 3+ years old in the U.S.; .600 for participants <3 years old in the U.S.; .620 for participants with profound intellectual disability in the U.S.; and .793 for participants 3+ years old in Canada. Therefore, while the positive and negative predictive values reported in Risi et al. (2006) are reasonably high (e.g. PPV = 88.3% and NPV = 78.8% for participants 3+ years old in the U.S. using both ADI-R and ADOS to predict autism), it is not realistic to generalize these predictive values to settings where the base rate for autism is significantly lower.

The crucial message here is that positive and negative predictive values are a function of the base rate of the condition of interest. Therefore, assessment tools with impressive predictive values in settings where the base rate (of what is being predicted) is high will cease to have impressive predictive values in a setting where the base rate (of what is being predicted) is low. To fully appreciate why positive and negative predictive values are a function of the base rate, the following formulas are helpful:

PPV = (sensitivity × base rate) / [(sensitivity × base rate) + ((1 − specificity) × (1 − base rate))]

NPV = (specificity × (1 − base rate)) / [(specificity × (1 − base rate)) + ((1 − sensitivity) × base rate)]
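
As a minimal sketch of these two formulas (Python is used here purely for illustration; the sensitivity and specificity are the Risi et al., 2006, U.S. age 3+ figures used above, and the 1.69% base rate is the population estimate discussed earlier), the following shows how the same test yields the study's strong predictive values at a .563 base rate but a very different picture at a population-level base rate:

```python
def ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(condition present | test positive), via the PPV formula above."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(condition absent | test negative), via the NPV formula above."""
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    return true_neg / (true_neg + false_neg)

sens, spec = 0.82, 0.86  # ADI-R + ADOS combination, U.S. sample age 3+ years

# At the study's base rate (.563), the predictive values look strong ...
print(f"PPV at base rate .563:  {ppv(sens, spec, 0.563):.3f}")   # ~0.883
print(f"NPV at base rate .563:  {npv(sens, spec, 0.563):.3f}")   # ~0.788

# ... but at a 1.69% base rate, most positives would be false positives.
print(f"PPV at base rate .0169: {ppv(sens, spec, 0.0169):.3f}")  # ~0.09
print(f"NPV at base rate .0169: {npv(sens, spec, 0.0169):.3f}")  # ~0.996
```

The first pair of values reproduces the PPV of 88.3% and NPV of 78.8% noted above; with the same sensitivity and specificity at a 1.69% base rate, fewer than one in ten positive results would reflect true autism, which is the same point made by the Grimes and Shulz (2002) passage quoted later in this section.
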
An additional, perhaps more serious, concern with the Risi et al. (2006) study pertains to how sensitivity and specificity were computed. Namely, scores on the ADI-R and ADOS were compared to a best-estimate diagnosis, which differed between their U.S. and Canadian samples. The best-estimate diagnoses in the U.S. samples were made using the ADI-R and ADOS, meaning that the consensus best-estimate diagnosis was confounded with the ADI-R and ADOS scores, as opposed to being independent of these scores. Stated another way, scores on the ADI-R and ADOS were not compared to some other independent real "gold standard," but were, instead, compared to scores on the ADI-R and ADOS. This means that the diagnostic accuracy of the ADI-R and ADOS, in the U.S. samples, was derived by checking to see how much scores on the ADI-R and ADOS compared to scores on the ADI-R and ADOS. As might be expected, diagnostic accuracy studies with this sort of research design have been heavily criticized (Wilczynski, 2008).

In the Canadian samples, the best-estimate diagnoses were made by clinicians who had not themselves administered the ADI-R and ADOS, but who had access to the ADI-R and ADOS scores, raising concerning questions about hindsight bias (Fischhoff, 1975). In other words, clinicians used clinical judgment to form an opinion about which participants had autism, but these clinicians knew whether or not the ADI-R and ADOS pointed toward, or away from, an autism diagnosis, setting the stage for a hindsight bias scenario. This too is a methodological flaw with significant implications, which Risi et al. (2006) acknowledge:

"In an ideal design, different examiners would administer the ADI-R and ADOS, and examiners blind to ADI-R and ADOS scores would make the BE [best-estimate] diagnoses … Circumstances were closer to the ideal in study 2 [i.e. the Canadian sample]." (pp. 1101–1102)

If one looks past this flaw, the Canadian research design has better external validity than the U.S. research design; however, even with the Canadian study, the external validity is specious. And, as expected since the research design was slightly more realistic, this is why the diagnostic accuracy of the ADI-R used in addition to the ADOS drops substantially with the Canadian participants, down to chance-level accuracy (Youden's J Index = .522 for the Canadian sample; see Table 1).

Recently, Randall et al. (2018) set out to "find out which of the commonly used tools is most accurate for diagnosing ASD in preschool children." (p. 3). Despite the availability of many more than six "commonly used tools" (p. 3) to help diagnose autism, these researchers began with six diagnostic tools. They then proceeded to study three of these six tools—viz., the ADOS, the ADI-R, and the Childhood Autism Rating Scale (CARS; Schopler et al., 2010)—on the basis that the other three tools did not meet inclusion criteria for the study. Notably, however, because developers of the three included diagnostic tools were also the authors of the research studies, developers of the index tools being studied, as well as trainers for use of the instruments, there were conflicts of interest in nearly 40% of the 21 studies included, causing Randall et al. (2018) to state that "the presence of conflicts of interest in some publications may result in ADOS, CARS, and ADI-R appearing more accurate than they really are." (p. 3).

Another major concern with the Randall et al. (2018) study is that the base rate for autism was 74%, using a best-estimate approach, similar to that described in the Risi et al. (2006) study. As a result, diagnostic accuracy statistics reported by Randall et al. (2018) cannot be generalized to real-world clinical settings where the base rate for autism is lower, which is a crucial consideration, given that the population base rate for autism is quite low (i.e. between 1% and 1.69%). This important message was explained by Grimes and Shulz (2002) as follows:

Clinicians must know the approximate prevalence of the condition of interest in the population being tested; if not, reasonable interpretation is impossible. Consider a new PCR test for chlamydia, with a sensitivity of 0.98 and specificity of 0.97 (a superb test) … a doctor uses the test in a municipal sexually transmitted disease clinic, where the prevalence of Chlamydia trachomatis is 30%. In this high-prevalence setting … 93% of those with a positive test actually have the infection. Impressed with the new test, the doctor now takes it to her private practice in the suburbs, which has a clientele that is mostly older than age 35 years … Here, the prevalence of chlamydial infection is only 3%. Now the same excellent test has a predictive positive value of only 0.50 … Here, flipping a coin has the same predictive positive value (and is considerably cheaper and simpler than searching for bits of DNA). This message is important, yet not widely understood: when used in low-prevalence settings, even excellent tests have poor predictive positive value. (p. 883)
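
These quoted figures can be checked directly with the positive predictive value formula given earlier; a small sketch (illustrative code written for this article, not part of the cited studies):

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    # Positive predictive value: P(condition present | positive test).
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Figures from the Grimes and Shulz (2002) PCR example quoted above.
print(f"STD clinic, 30% prevalence:       PPV = {ppv(0.98, 0.97, 0.30):.2f}")  # about 0.93
print(f"Suburban practice, 3% prevalence: PPV = {ppv(0.98, 0.97, 0.03):.2f}")  # about 0.50
```

The same arithmetic, applied to autism base rates of roughly 1–2%, is what drives the generalizability concern raised above.
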
Also of note, Randall et al. (2018) acknowledged (a) that the results of best-estimate diagnosis of autism "does not always replicate the multi-disciplinary assessment recommended for clinical diagnosis" (p. 3) and (b) that "Emerging evidence suggests that there is low agreement between individual clinician and transdisciplinary team diagnoses (Stewart et al. 2014), with both underdiagnosis and overdiagnosis of ASD." (p. 7). In other words, the best-estimate method used by the researchers to establish the diagnostic accuracy rates of the ADOS and ADI-R does not mesh with the results of multi-disciplinary autism teams; at the same time, Stewart et al. (2014) report that the results of multi-disciplinary teams do not agree with the conclusions made by individual clinicians. Can accuracy rates not established using a multi-disciplinary team approach be generalized to multi-disciplinary team approach settings? Similarly, how do we know that the individual-clinician approach is not more similar to the best-estimate approach used in the research, given that the best-estimate approach does not involve a multi-disciplinary team decision?

At present, researchers use a best-estimate approach to establish which individuals have autism; however, as Randall et al. (2018) have acknowledged, what the researchers establish is not always replicated by the multi-disciplinary teams. Likewise, Randall et al. (2018) note low diagnostic agreement between individual clinicians and multi-disciplinary teams. It would seem, therefore, that the multi-disciplinary teams are making diagnostic conclusions that differ from both the researchers and the individual clinicians. And yet, Randall et al. (2018) describe the multi-disciplinary team approach involving a pediatrician, a speech pathologist, and a psychologist as "best practice for this preschool age group" (p. 23). Is this "best practice" proclamation supported by the research?

As stated above, zero-sum thinking about individual versus team-based assessment of autism is not the point of emphasizing potential problems with group decision-making about autism. There is no reason to doubt the valuable contributions of professionals from varying educational backgrounds—with regard to data collection during autism assessment. The doubt arises from low diagnostic agreement between researchers and multi-disciplinary teams and low diagnostic agreement between individual clinicians and multi-disciplinary teams. In other words, the possible problem might be with the final diagnostic decision-making process utilized by a team of professionals.

Rather than asserting that multi-disciplinary assessment is "best practice," it makes sense to test the hypothesis that multi-disciplinary teams make less accurate diagnostic conclusions than the researchers and/or the individual clinicians. This hypothesis seems especially worth exploring, given the social psychological research on worse decision-making by groups. Put simply, it is reasonable to want replicated scientific support for bold assertions about "best practices" that, unfortunately, have the effect of (a) excluding individual clinicians from the autism diagnostic decision-making process and (b) forcing individuals and families to wait until a multi-disciplinary autism team is available, which can greatly delay needed mental health services.

If one does not blindly accept that the ADOS and ADI-R are "gold standard" assessment tools in the assessment of autism, and instead seeks scientific support for these instruments, not only will reasonable questions emerge about overall diagnostic accuracy, but other limitations will become evident. For example, interscorer reliability for ADOS-2 items is frequently poor, challenging the need for lengthy and expensive ADOS-2 trainings. In other words, if the costly ADOS-2 trainings are not resulting in adequate interscorer reliability, it is reasonable to question Pennington, McGrath, and Peterson's assertion that "the ADI-R and ADOS-2 have rigorous training procedures to reach reliability standards" (2019, p. 292).

More, not less, might be better in autism assessment

Overall, when conducting autism assessments, it is important to consider the purpose of assessment in a general sense (Sattler, 2001), because in so doing, clinicians are less hamstrung by the known limitations of the categorical, diagnostic nosology used in DSM-5 (Frances, 2009):

It has long been realised that the mental disorders described in our diagnostic system are fuzzy sets that lack clear boundaries between themselves and with normality. The traditional DSM (and ICD) categorical approach is necessarily forced to carve nature at awkward joints and loses much information in the process. (p. 391)

Insofar as Frances (2009) and others (Galatzer-Levy & Bryant, 2013; Hyman, 2007, 2010; Mirowsky & Ross, 1989) are correct about the limitations of DSM-5, consideration of diagnostic criteria for disorders can be viewed as but one part of an overall assessment process, which includes the collection of other valuable data. For example, symptom and performance validity (Larrabee, 2012) data can make or break an assessment. Similarly, test scores that rule out other diagnoses (e.g. intellectual disability) or more precisely characterize an autism diagnosis (e.g. "with intellectual impairment" or "with language impairment") can tremendously improve and enrich an autism assessment. Likewise, data on the quality of the pregnancy and delivery (Kaufman et al., 2018) can establish much-needed context for test scores and other data. This is precisely why other assessment tools can be so valuable to the assessment process. Therefore, in contrast to the "gold standard" perspective on autism assessment (Huerta & Lord, 2012; Pennington et al., 2019; Perry et al., 2002; Randall et al., 2018; Risi et al., 2006), it is necessary to emphasize that there are more than two autism tests available to the clinician wishing to enrich and improve the autism assessment process.

Moreover, it is important to disabuse consumers of autism assessment services of the scientifically unsupported belief that "best practices" dictate that autism is only properly diagnosed by a multi-disciplinary autism team. This is not to say that professionals from different backgrounds (e.g. pediatricians, speech-language pathologists, psychologists, medical geneticists) do not have much to offer with regard to data collection and treatment planning. Nor is it to say that this work is easy, straightforward, or uncomplicated. Indeed, it can become extremely complex, as emphasized above and written about by Wolff et al. (2018). Rather, it simply means that general principles of assessment remain vital (e.g. dispassionate consideration of bias), which summons into action fields like psychology and neuropsychology that have long been devoted to reliably measuring abnormal human behavior using a wide range of methods. And it remains to be proven that decision-making by an autism team is more accurate than decision-making by an individual clinician, especially one equipped with fundamental knowledge about the general principles of assessment.

Only two autism assessment tools, arbitrarily described as "gold standards," should not be used to block the clinical judgments of individual clinicians using other assessment tools with equal, if not better, scientific support. For example, Zhou et al. (2020) published a study on the diagnostic utility of the Behavior Assessment System for Children-Third Edition (BASC-3; Reynolds & Kamphaus, 2015) in discriminating between children with autism versus attention-deficit/hyperactivity disorder (ADHD). Zhou et al. (2020) also reported diagnostic accuracy rates for autism that are slightly better than a recent large-scale study (N = 1,080) of the ADOS-2 involving infants, children, and young adults (Kamp-Becker et al., 2017). To the extent that these BASC-3 findings are generalizable to clinical settings, it suggests that lengthy and expensive assessment tools like the ADI-R and ADOS-2 may not be justified in the assessment of autism. Or, alternatively, it might be possible to obtain very similar results using a more cost-effective measure like the BASC-3, which has the added benefit of testing symptom validity (Larrabee, 2012) and is more accessible than the ADOS-2 and ADI-R, both of which require lengthy and expensive training that many clinicians would rather avoid.

Applying this same reasoning, multi-disciplinary autism teams, arbitrarily described as "best practice," should not unquestioningly be viewed as any more accurate than individual clinicians, especially in light of the social psychology of group decision-making. This is especially relevant, insofar as individual clinicians are, through their professional backgrounds and training (e.g. neuropsychological board-certification), equipped with a deep awareness of general assessment principles, including the various ways bias can skew diagnostic accuracy. Similarly, it is known that neuropsychological methods lead to high accuracy rates in the detection of brain dysfunction (Fargo et al., 2008; Garb & Schramke, 1996), which is common among many with autism.

Concluding thoughts

It is time to rethink "gold standards" and "best practices" in the assessment of autism, for the reasons articulated above and because other methods may yield as good or better results. There are many other autism assessment tools that are more cost-effective, more accessible, and frequently less time-consuming. Among these other measures are the following:
Among these other measures are the following: Childhood Autism Rating Scale-Second Edition (CARS-2; Schopler et al., 2010); Diagnostic Interview for Social and Communication Disorders-Tenth Revision (DISCO-10; Wing, 2006); Gilliam Autism Rating Scale-Third Edition (GARS-3; Gilliam, 2013); Developmental, Dimensional, and Diagnostic Interview (3di; Skuse et al., 2004); Social Responsiveness Scale-Second Edition (SRS-2; Constantino & Gruber, 2012); the Social Communication Questionnaire (SCQ; Rutter et al., 2003); the Modified Checklist for Autism in Toddlers (M-CHAT; Robins et al., 2014); the Social Language Development Test (SLDT; Bowers et al., 2008, 2010); various subtests from the NEPSY-II (Korkman et al., 2007); Social Cognition subtests from the Advanced Clinical Solutions (Pearson, 2009); the Benton Facial Recognition Test (Benton et al., 1994); and the Autism Spectrum Rating Scales (ASRS; Goldstein & Naglieri, 2009).

Additionally, when other more traditional psychological and neuropsychological methods are utilized to characterize the functioning of those suspected of autism—that is, tests of personality, executive functioning, memory, attention, processing speed, sensory-motor functioning, and so on—the clinical picture frequently comes into focus in a way that informs not only the DSM-5 diagnostic criteria of autism spectrum disorder, but also other disorders in DSM-5. This is important because such an approach can reveal, or help rule out, deficits that support an autism diagnosis, as well as multiple other disorders. Moreover, using neuropsychological methods in the context of a possible autism spectrum disorder is in keeping with the research demonstrating that neuropsychological methods lead to high accuracy rates in the detection of brain dysfunction (Fargo et al., 2008; Garb & Schramke, 1996). Hence, even in scenarios where categorical diagnosis produces unsatisfying answers, the clinician using psychological and neuropsychological methods will have rich clinical data on how the examinee is functioning relative to others in their respective comparison group. By using a “gold standard” and “best practices” rationale to limit which assessment tests and tools can be used in the diagnosis of autism, it is likely that methods with scientific support will not be recognized as valid, which increases the likelihood that the assessment results will not be accepted. This is harmful because it prevents individuals and families from accessing needed services.

Whenever autism is suspected, individuals and family members deserve high-quality, thoughtful, and scientifically supported assessments. There are self-evident problems with rushed, incomplete assessments that generate assailable autism diagnoses that will be questioned at every turn, by teachers, by future clinicians, and by others invested in understanding the individual. These false-positive conclusions are particularly problematic because, as touched on above, they “may cause family stress, lead to unnecessary investigations and treatments, and place greater strain on already limited service resources” (Randall et al., 2018, p. 3). Arguably, this is more likely to occur with an individual clinician, as opposed to a multi-disciplinary team. That said, there are also problems with inflexible thinking about how autism must be diagnosed, particularly when this thinking emphasizes only two “gold standard” tests, as opposed to the thought processes of those interpreting the tests. For this and the other reasons articulated above, it is time to rethink “gold standards” and “best practices” in the assessment of autism.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (DSM-5). American Psychiatric Publishing.
Bailey, A., Phillips, W., & Rutter, M. (1996). Autism: Towards an integration of clinical, genetic, neuropsychological, and neurobiological perspectives. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 37(1), 89–126. https://doi.org/10.1111/j.1469-7610.1996.tb01381.x
Baio, J., Wiggins, L., Christensen, D. L., Maenner, M. J., Daniels, J., Warren, Z., & White, T. (2018). Prevalence of autism spectrum disorder among children aged 8 years—Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2014. Morbidity and Mortality Weekly Report Surveillance Summaries, 67(6), 1. https://doi.org/10.15585/mmwr.ss6706a1
Benton, A. L., Sivan, A. B., deS. Hamsher, K., Varney, N. R., & Spreen, O. (1994). Contributions to neuropsychological assessment (2nd ed.). Oxford University Press.
Biggerstaff, B. J. (2000). Comparing diagnostic tests: A simple graphic using likelihood ratios. Statistics in Medicine, 19(5), 649–663. https://doi.org/10.1002/(SICI)1097-0258(20000315)19:5<649::AID-SIM371>3.0.CO;2-H
Binet, A., & Simon, T. (1908). Les enfants anormaux. Colin.
Bowers, L., Huisingh, R., & LoGiudice, C. (2008). Social language development test: Elementary. Pro-Ed.
Bowers, L., Huisingh, R., & LoGiudice, C. (2010). Social language development test: Adolescent. Pro-Ed.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. Guilford Publications.
Centers for Disease Control and Prevention. (2014). Prevalence of autism spectrum disorder among children aged 8 years—Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2010. Morbidity and Mortality Weekly Report Surveillance Summaries, 63(2), 1–21.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74(3), 271–280. https://doi.org/10.1037/h0027592
Constantino, J. N., & Gruber, C. P. (2012). Social Responsiveness Scale-Second Edition (SRS-2) manual. Western Psychological Services.
Croskerry, P. (2002). Achieving quality in clinical decision making: Cognitive strategies and detection of bias. Academic Emergency Medicine, 9(11), 1184–1204. https://doi.org/10.1197/aemj.9.11.1184
Dahlstrom, W. G. (1993). Tests: Small samples, large consequences. American Psychologist, 48(4), 393–399. https://doi.org/10.1037/0003-066X.48.4.393
Fargo, J. D., Schefft, B. K., Szaflarski, J. P., Howe, S. R., Yeh, H.-S., & Privitera, M. D. (2008). Accuracy of clinical neuropsychological versus statistical prediction in the classification of seizure types. The Clinical Neuropsychologist, 22(2), 181–194. https://doi.org/10.1080/13854040701220093
Faust, D., & Ahern, D. C. (2012). Clinical judgment and prediction. In D. Faust (Ed.), Coping with psychiatric and psychological testimony (6th ed., pp. 147–208). Oxford University Press.
Faust, D., Bridges, A. J., & Ahern, D. C. (2009). Methods for the identification of sexually abused children: Issues and needed features for abuse indicators. In K. Kuehnle & M. Connell (Eds.), The evaluation of child sexual abuse allegations: A comprehensive guide to assessment and testimony (pp. 3–19). John Wiley & Sons Inc.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford University Press.
Fischhoff, B. (1975). Hindsight is not equal to foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1(3), 288–299. https://doi.org/10.1037/0096-1523.1.3.288
Frances, A. (2009). Whither DSM–V? The British Journal of Psychiatry, 195(5), 391–392. https://doi.org/10.1192/bjp.bp.109.073932
Frederick, R. I. (2012). Malingering/cooperation/effort. In D. Faust (Ed.), Coping with psychiatric and psychological testimony (6th ed., pp. 229–247). Oxford University Press.
Galatzer-Levy, I. R., & Bryant, R. A. (2013). 636,120 ways to have posttraumatic stress disorder. Perspectives on Psychological Science, 8(6), 651–662. https://doi.org/10.1177/1745691613504115
Garb, H. N., & Schramke, C. J. (1996). Judgment research and neuropsychological assessment: A narrative review and meta-analyses. Psychological Bulletin, 120(1), 140–153. https://doi.org/10.1037/0033-2909.120.1.140
Gilliam, J. E. (2013). Gilliam Autism Rating Scale (3rd ed.). Pro-Ed.
Glas, A. S., Lijmer, J. G., Prins, M. H., Bonsel, G. J., & Bossuyt, P. M. (2003). The diagnostic odds ratio: A single indicator of test performance. Journal of Clinical Epidemiology, 56(11), 1129–1135. https://doi.org/10.1016/S0895-4356(03)00177-X
Goldstein, S., & Naglieri, J. A. (2009). Autism Spectrum Rating Scales manual. MHS Assessment.
Goldstein, S., & Ozonoff, S. (2018). Assessment of autism spectrum disorders (2nd ed.). The Guilford Press.
Grimes, D. A., & Schulz, K. F. (2002). Uses and abuses of screening tests. Lancet, 359(9309), 881–884. https://doi.org/10.1016/S0140-6736(02)07948-5
Gwynette, M., McGuire, K., Fadus, M. C., Feder, J. D., Koth, K. A., & King, B. H. (2019). Overemphasis of the Autism Diagnostic Observation Schedule (ADOS) evaluation subverts a clinician’s ability to provide access to autism services. Journal of the American Academy of Child & Adolescent Psychiatry, 58(12), 1222–1223. https://doi.org/10.1016/j.jaac.2019.07.933
Hahn, R. A. (1999). Expectations of sickness: Concept and evidence of the nocebo phenomenon. In I. Kirsh (Ed.), How expectancies shape experience (pp. 333–356). American Psychological Association.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23(7), 1093–1129. https://doi.org/10.1080/13854040903155063
Hill, D. A., Martin, E. D., Jr., & Nelson-Head, C. (2011). Examination of case law (2007–2008) regarding autism spectrum disorder and violations of the Individuals With Disabilities Education Act. Preventing School Failure: Alternative Education for Children and Youth, 55(4), 214–225. https://doi.org/10.1080/1045988X.2010.542784
Huerta, M., & Lord, C. (2012). Diagnostic evaluation of autism spectrum disorders. Pediatric Clinics of North America, 59(1), 103–111, xi. https://doi.org/10.1016/j.pcl.2011.10.018
Hyman, S. E. (2007). Can neuroscience be integrated into the DSM-V? Nature Reviews Neuroscience, 8(9), 725–732. https://doi.org/10.1038/nrn2218
Hyman, S. E. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6(1), 155–179. https://doi.org/10.1146/annurev.clinpsy.3.022806.091532
Kamp-Becker, I., Langmann, A., Stehr, T., Custodis, K., Poustka, L., & Becker, K. (2017). Diagnostic accuracy of the ADOS-2 taking account of gender effects. Zeitschrift für Kinder- und Jugendpsychiatrie und Psychotherapie, 45(3), 193–207. https://doi.org/10.1024/1422-4917/a000492
Kaufman, A. S. (2009). IQ testing 101. Springer Publishing Company.
Kaufman, N. K., & Bush, S. S. (2020a). Validity assessment in military psychology. In U. Kumar (Ed.), The Routledge international handbook of military psychology and mental health (pp. 211–223). Routledge/Taylor & Francis Group.
Kaufman, N. K., & Bush, S. S. (2020b). Concussion in pediatric neuropsychology. Journal of Pediatric Neuropsychology, 6(1), 14–26. https://doi.org/10.1007/s40817-020-00078-3
Kaufman, N. K., Davis, A. S., Sandoval, H., Tonarelli, S., Mullins, & Aguilar, Y. (2018). Perinatal complications: Pregnancy, delivery problems and 2D:4D ratios. Journal of Pediatric Neuropsychology, 4, 63.
Kolb, B., & Fantie, B. D. (2009). Development of the child’s brain and behavior. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (pp. 19–46). Springer.
Korkman, M., Kirk, U., & Kemp, S. (2007). NEPSY-II: Administration manual. The Psychological Corporation.
Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18(4), 625–630. https://doi.org/10.1017/S1355617712000240
Lilienfeld, S. O., Sauvigné, K. C., Lynn, S. J., Cautin, R. L., Latzman, R. D., & Waldman, I. D. (2015). Fifty psychological and psychiatric terms to avoid: A list of inaccurate, misleading, misused, ambiguous, and logically confused words and phrases. Frontiers in Psychology, 6, 1–15. https://doi.org/10.3389/fpsyg.2015.01100
Lord, C., Rutter, M., & Le Couteur, A. (1994). Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24(5), 659–685. https://doi.org/10.1007/BF02172145
Lord, C., Rutter, M., DiLavore, P., & Risi, S. (1999). Manual: Autism diagnostic observation schedule. Western Psychological Services.
Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop, S. (2012). Autism diagnostic observation schedule (2nd ed., ADOS-2). Western Psychological Services.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press.
Miller, D. J., Spengler, E. S., & Spengler, P. M. (2015). A meta-analysis of confidence and judgment accuracy in clinical decision making. Journal of Counseling Psychology, 62(4), 553–567. https://doi.org/10.1037/cou0000105
Mirowsky, J., & Ross, C. E. (1989). Psychiatric diagnosis as reified measurement. Journal of Health and Social Behavior, 30(1), 11–25. https://doi.org/10.2307/2136907
Murrie, D. C., & Boccaccini, M. T. (2015). Adversarial allegiance among expert witnesses. Annual Review of Law and Social Science, 11(1), 37–55.
Myers, D. G. (2008). Group influence. In Social psychology (9th ed., pp. 261–297). McGraw Hill.
Patzer, D., & Volkmar, F. (1999). The neurobiology of autism and the pervasive developmental disorders. In D. S. Charney, E. J. Nestler, & B. S. Bunney (Eds.), Neurobiology of mental illness (pp. 761–778). Oxford University Press.
Pearson. (2009). Advanced clinical solutions for WAIS-IV and WMS-IV: Administration and scoring manual. Pearson.
Pennington, B. F., McGrath, L., & Peterson, R. (2019). Autism spectrum disorder. In Diagnosing learning disorders (3rd ed., pp. 258–311). The Guilford Press.
Perry, A., Condillac, R. A., & Freeman, N. L. (2002). Best practices and practical strategies for assessment and diagnosis of autism. Journal on Developmental Disabilities, 9(2), 61–75.
Randall, M., Egberts, K. J., Samtani, A., Scholten, R. J., Hooft, L., Livingstone, N., & Williams, K. (2018). Diagnostic tests for autism spectrum disorder (ASD) in preschool children. Cochrane Database of Systematic Reviews, (7), 1–103.
Reynolds, C. R., & Kamphaus, R. W. (2015). Behavior assessment system for children (3rd ed.). Pearson.
Risi, S., Lord, C., Gotham, K., Corsello, C., Chrysler, C., Szatmari, P., & Pickles, A. (2006). Combining information from multiple sources in the diagnosis of autism spectrum disorders. Journal of the American Academy of Child & Adolescent Psychiatry, 45(9), 1094–1103.
Robins, D. L., Casagrande, K., Barton, M., Chen, C.-M. A., Dumont-Mathieu, T., & Fein, D. (2014). Validation of the modified checklist for autism in toddlers. Pediatrics, 133(1), 37–45. https://doi.org/10.1542/peds.2013-1813
Rogers, R., & Bender, S. (2013). Evaluation of malingering and related response styles. In A. M. Goldstein (Ed.), Handbook of psychology, Vol. 11: Forensic psychology (pp. 517–540). John Wiley & Sons, Inc.
Rutter, M., Bailey, A., Berument, S. K., Lord, C., & Pickles, A. (2003). Social Communication Questionnaire (SCQ). Western Psychological Services.
Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). Jerome M. Sattler, Publisher, Inc.
Schopler, E., Van Bourgondien, M. E., Wellman, G. J., & Love, S. R. (2010). The Childhood Autism Rating Scale, second edition (CARS-2). Western Psychological Services.
Skuse, D., Warrington, R., Bishop, D., Chowdhury, U., Lau, J., Mandy, W., & Place, M. (2004). The Developmental, Dimensional and Diagnostic Interview (3di): A novel computerized assessment for autism spectrum disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 43(5), 548–558. https://doi.org/10.1097/00004583-200405000-00008
Stewart, J. R., Vigil, D. C., Ryst, E., & Yang, W. (2014). Refining best practices for the diagnosis of autism: A comparison between individual healthcare practitioner diagnosis and transdisciplinary assessment. Nevada Journal of Public Health, 11(1), 1–13.
Suhr, J., & Wei, C. (2013). Response expectancies and their potential influence in neuropsychological evaluation. In P. Arnett (Ed.), Secondary influences on neuropsychological test performance (pp. 182–200). Oxford University Press.
Tetlock, P. E., & Boettger, R. (1989). Accountability: A social magnifier of the dilution effect. Journal of Personality and Social Psychology, 57(3), 388–398. https://doi.org/10.1037/0022-3514.57.3.388
Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12(3), 129–140. https://doi.org/10.1080/17470216008416717
Weissman, H. N. (1990). Distortions and deceptions in self presentation: Effects of protracted litigation in personal injury cases. Behavioral Sciences and the Law, 8(1), 67–74. https://doi.org/10.1002/bsl.2370080108
Wilczynski, N. L. (2008). Quality of reporting of diagnostic accuracy studies: No change since STARD statement publication—before-and-after study. Radiology, 248(3), 817–823. https://doi.org/10.1148/radiol.2483072067
Wing, L. (2006). Diagnostic Interview for Social and Communication Disorders (11th ed.). Centre for Social and Communication Disorders.
Wolff, M., Bridges, B., & Denczek, T. (2018). The complexity of autism spectrum disorders. Routledge.
Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3(1), 32–35. https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
Zhou, X., Reynolds, C., Zhu, J., & Kamphaus, R. W. (2020). Differentiating autism from ADHD in children and adolescents using BASC-3. Journal of Pediatric Neuropsychology, 6(2), 61–65. https://doi.org/10.1007/s40817-020-00082-7
Zirkel, P. A. (2011). Autism litigation under the IDEA: A new meaning of “disproportionality”. Journal of Special Education Leadership, 24(2), 92–103.