
2017 IEEE International Conference on Big Data (BIGDATA)

Machine learning for early detection of autism (and other conditions) using a
parental questionnaire and home video screening

Halim Abbas and Ford Garberson
Cognoa Inc
Palo Alto, CA, USA
Email: halim@cognoa.com

Eric Glover
Email: eric.g@ericglover.com

Dennis P Wall
Stanford University
Stanford, CA, USA

978-1-5386-2715-0/17/$31.00 ©2017 IEEE

Abstract: Existing screening tools for early detection of autism are expensive, cumbersome, time-intensive, and sometimes fall short in predictive value. In this work, we apply machine learning to gold standard clinical data obtained across thousands of children at risk for autism spectrum disorders to create a low-cost, quick, and easy to apply autism screening tool that performs better than most widely used standardized instruments. This new tool combines two screening methods into a single assessment, one based on short, structured parent-reported questionnaires and the other based on tagging key behaviors from short, semi-structured home videos of children. We demonstrate a significant accuracy improvement over standard screening tools in a clinical study sample of 162 children. We further discuss the challenge of extending machine learning algorithms to conditions beyond autism, and we propose a generalized framework for using machine learning algorithms to simultaneously search for the presence of many different conditions.

1. Introduction

Diagnosis within the first few years of life dramatically improves the outlook of children with autism, as it allows for treatment while the child’s brain is still rapidly developing [1], [2]. Unfortunately, in the United States autism is typically not diagnosed earlier than age 4, with approximately 27% of cases remaining undiagnosed at age 8 [3]. This delay in diagnosis is driven primarily by a lack of effective screening tools and a shortage of specialists to evaluate at-risk children.

Most autism screeners in use today are based on score-sheets with questions for the parent or the medical practitioner that produce results by comparing summation scores to predetermined thresholds. Notable examples are the Modified Checklist for Autism in Toddlers, Revised (M-CHAT) [4], a checklist-based screening tool for autism that is intended to be administered during developmental screenings with children between the ages of 16 and 30 months, and the Child Behavior Checklist (CBCL) [5], which is a parent-completed screening tool.

We have developed two new machine learning autism screeners that are reliable, cost-effective, easy enough to complete in minutes, and achieve higher accuracy than existing screeners. We have deployed these screeners using the Cognoa [6] App. To date, Cognoa has been used by over 250,000 parents in the US and internationally using web and native smartphone mobile App tools. One screener is based on a short questionnaire about the child which is answered by the parent. The other screener is based on identification of specific behaviors by trained analysts after watching two or three short videos of the child in their home environment, captured by parents using a mobile device.

The screening algorithms were trained using behavioral patterns keyed off of features that are probed by established clinical instruments. These instruments, the ADI-R [7] and ADOS [8], are considered gold standard instruments that have high accuracy and high consistency across practitioners and are frequently used for at-risk children. Both, however, are costly and can require hours to administer.

We will briefly discuss our screening algorithms and show a snapshot of their performance in a clinical trial that was conducted last year (more details are available in a draft that is currently under review [9]).

We further investigate the problem of applying a classifier that can simultaneously classify multiple conditions using chained machine learning algorithms. We discuss the major limitations inherent in training individual algorithms to classify the presence or absence of all conditions in all children, and we propose a more practical approach.

2. Autism screener data and methodology

The autism machine learning based screeners were trained using data compiled from multiple clinical repositories of ADOS and ADI-R score-sheets of nearly seven thousand children between 18 and 84 months of age, supplemented by data collected from the parents of children answering screening questions on Cognoa’s website. Both screeners were applied using Cognoa’s App in a clinical trial to a sample of 162 at-risk children between the ages of 18
and 84 months who have undergone full clinical examination and received a clinical diagnosis.

2.1. Parental questionnaire screener methodology

The parental questionnaire screener keys on behavioral patterns typically probed in a standard autism diagnostic instrument, the Autism Diagnostic Interview Revised (ADI-R) [7]. This clinical tool consists of a parent interview of 93 multi-part questions with multiple choice and numeric responses which are delivered by a trained professional in a clinical setting. While this instrument is considered a gold standard and gives consistent results across examiners, the cost and time to administer it can be prohibitive. These questions are therefore keyed on by machine learning algorithms when building the parental questionnaire screener for the Cognoa application.

A variety of machine learning approaches were studied before settling on an optimal approach for the parental questionnaire screener. Each of the instrument’s 155 data columns was encoded in a manner similar to one-hot encoding but which preserves information about the severity of the response. Two machine learning algorithms using random forests were designed, one to screen children between 18 months and three years of age, and the other to screen children between four and six years of age. To make the screening questionnaire easier for parents, the number of questions asked was reduced. For each age group, detailed studies were performed to identify an optimal subset of questions to include for the final machine learning algorithm (17 were chosen for the model applied to younger children, and 21 for the model applied to older children). Questions similar to these, but simplified and rephrased in order to be easily understood by parents, were chosen to be presented in the Cognoa application. Aggregations of the average, extremes, and most common responses were found to significantly increase accuracy. Further details are discussed in [9].

2.2. Video screener methodology

The video screener keys on behavioral patterns typically probed in another diagnostic tool, the Autism Diagnostic Observation Schedule (ADOS) [8]. ADOS consists of an interactive, highly standardized examination of the child by trained clinicians in a tightly controlled setting. ADOS is widely considered a gold standard and is one of the most common behavioral instruments used to aid in diagnosis of autism [10]; however, the cost and time to administer it can be prohibitive. For the Cognoa video screening algorithm, a subset of ADOS questions was identified as probing features that can realistically be observed in home videos. Separate algorithms were trained for use with pre-verbal children or verbal children, and for each an optimal subset of ten questions was identified as the most effective for identification of autism in that age group. Questions based on these features were answered by a minimally trained analyst after watching two or three one-minute home videos of the child’s behavior taken by their parent.

A variety of machine learning approaches were studied before settling on an optimal approach for the video screener. Random forests were then trained to determine the screener output. In order to reduce the impact of bias from non-observable features, missing observations were randomly injected into the training data at a rate that was calculated to minimize the decision-making impact of a missing feature in the trees of the forests. Further details are discussed in [9].

3. Autism screening inconclusive results

Patients with more complex symptom presentation are known to pose challenges to developmental screening. These children often screen as false positives or false negatives, resulting in an overall degradation of screening accuracy that is observed with all standard methods and has become acceptable in the industry. Given that our low-cost instruments do not rely on sophisticated observations to differentiate complex symptom cases, our approach was to avoid assessing them altogether, and to try instead to spot and label them as “inconclusive”.

For both the video and the parental questionnaire based algorithms, multiple methods of varying sophistication were devised to implement this. In the end the simplest method performed as well as the others and was chosen to report inconclusive results for the clinical study. Instead of using the output score of the random forest to choose a single optimal threshold to separate autism from non-autism cases, the score was passed to a grid search to determine an optimal cutoff range, subject to a constraint on the maximum number of subjects which could be determined inconclusive. Subjects with a score above the lowest threshold but below the highest were considered inconclusive.

4. Autism screening combination

The combination of the questionnaire and video screeners is made more challenging by the fact that no training samples are available for children that are known to have both ADI-R and ADOS results. As a consequence, the clinical data itself must be used to build the model to perform the combination.

The numerical responses of the parent questionnaire and video classifiers were combined using L2-regularized logistic regression. While some overfitting is expected, this is minimal due to the fact that the logistic regression is highly constrained, with only three free parameters. Since screening models were trained separately for younger and older children, separate combination algorithms were trained per age group. For each combination algorithm, optimal inconclusive output criteria were chosen using the logistic regression response.
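As an illustration of the combination step in Section 4, the sketch below fits an L2-regularized logistic regression over two classifier scores. It is a minimal sketch and not the authors' implementation: the function names, the toy scores and labels, the regularization strength, and the plain gradient-descent optimizer are all assumptions. Reading "three free parameters" as two weights plus one intercept is likewise an interpretation of the text.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_combiner(q_scores, v_scores, labels, l2=1.0, lr=0.5, steps=2000):
    """Fit p = sigmoid(w_q*q + w_v*v + b) by batch gradient descent.

    The L2 penalty is applied to the two weights only, not the intercept
    (an assumption of this sketch).
    """
    w_q = w_v = b = 0.0
    n = len(labels)
    for _ in range(steps):
        g_q = g_v = g_b = 0.0
        for q, v, y in zip(q_scores, v_scores, labels):
            err = sigmoid(w_q * q + w_v * v + b) - y
            g_q += err * q
            g_v += err * v
            g_b += err
        w_q -= lr * (g_q + l2 * w_q) / n
        w_v -= lr * (g_v + l2 * w_v) / n
        b -= lr * g_b / n
    return w_q, w_v, b


def combined_score(params, q, v):
    """Combined probability-like score from the two screener outputs."""
    w_q, w_v, b = params
    return sigmoid(w_q * q + w_v * v + b)


# Made-up questionnaire (q) and video (v) scores with toy diagnoses (1 = autism).
q_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
v_scores = [0.2, 0.1, 0.4, 0.6, 0.9, 0.8]
labels = [0, 0, 0, 1, 1, 1]

params = fit_combiner(q_scores, v_scores, labels)
high = combined_score(params, 0.9, 0.9)
low = combined_score(params, 0.1, 0.1)
print(params, high, low)
```

With so few parameters the fitted surface is a single smooth boundary in the plane of the two scores, which is consistent with the paper's argument that overfitting should be limited. An inconclusive band can then be chosen on the combined score in the same way as for the individual screeners.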

5. Autism screening results in the clinical trial

ROC curves in Figure 1 show the performances of the parental questionnaire-based screener, the video-based screener and the combined screener. The performances of the industry-standard M-CHAT and CBCL autism screeners are also compared. Operating near the commonly used threshold of 80% sensitivity, the combined and video-based screeners presented here have a better specificity than both the M-CHAT and the CBCL at a 95% confidence level, while the questionnaire alone has a better specificity than the CBCL screener at a 95% confidence level but is not quite better in specificity than the M-CHAT screener at a 95% confidence level.

6. Simultaneous screening for multiple conditions

In addition to autism there are many other important childhood conditions that could potentially be screened for using machine learning algorithms, including ADHD, language disorders, and intellectual disabilities, among others. Building one or more high performing machine learning algorithms to screen multiple conditions for all children is a daunting task for several reasons. First, each of these conditions has unique signatures that depend upon very different types of features from one another. Asking a sufficiently thorough set of questions to be able to effectively classify all of these conditions would place a prohibitive demand on any parent’s time. Second, a large sample of training data would be needed from children with every one of these conditions, containing features that are relevant not just to their particular condition, but also to all other conditions that the classifier will attempt to identify. Such data is typically not available. For cost considerations, children in clinical settings are typically not evaluated using instruments unless they are deemed to be at risk of a particular condition that is probed by the instrument.

Therefore, instead of building a single master machine learning classifier that classifies all conditions simultaneously, we propose a tree of condition classification algorithms. Figure 2 shows what such a screening tree might look like. This tree would start with an algorithm to determine whether the child is at risk of any of a number of broad categories of conditions, and if so, would present them with a new machine learning algorithm that is targeted to determine more precisely which kind of condition they have. Further classifiers can then be trained to identify a more specific condition if possible. The machine learning algorithms that are run at each node can be individual algorithms, or composite algorithms such as discussed in Section 4 of this paper.

The algorithms at each node of the tree can further come to an inconclusive determination in the manner discussed in Section 3 of this paper. The functionality to return inconclusive results is especially useful in a tree of algorithms due to the fact that it allows for much more flexible tuning of the algorithms. For example, if there is an algorithm that is unable to accurately classify 80% of children but which can determine a very accurate classification on the remaining 20%, such an algorithm would be infeasible to use in a standalone screening application. However, in a tree of algorithms the inconclusive parameter of the algorithm at that node can be tuned to reach an inconclusive determination in the vast majority of subjects without losing relevance (such subjects can still be informed of the results of the upstream classifier determinations) while giving a more precise determination for the minority of subjects for which an accurate conclusive determination is possible.

The chained tree of machine learning algorithms has further advantages. Parents would only have to spend time answering questions that are relevant to their particular child. Further, this approach more closely mirrors what happens in an actual clinical setting. Thus the data that is available in real clinical settings for at-risk children would become all that is needed to train a classifier to discriminate between the relevant conditions at a particular node of the algorithm tree.

7. Conclusion

We have briefly presented two new machine learning algorithms, one of which operates based on a parental questionnaire and the other of which operates on home cell phone videos, as well as their combination. We have also shown that their performance exceeds that of the industry standard screening tools M-CHAT and CBCL in a clinical trial of 162 at-risk children. Further details on these algorithms and this clinical trial are available elsewhere [9]. We have also proposed an engine to handle machine learning classification of many conditions simultaneously with clear advantages over individual algorithms that are designed to classify all conditions for all children. Work is in progress to apply this tool to conditions beyond autism.

References

[1] MS Durkin, MJ Maenner, FJ Meaney, SE Levy, C DiGuiseppi, JS Nicholas, et al. Socioeconomic inequality in the prevalence of autism spectrum disorder: evidence from a U.S. cross-sectional study. PloS One, 5, 2010.
[2] DL Christensen, J Baio, KV Braun, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years autism and developmental disabilities monitoring network, 11 sites, United States, 2012. MMWR Surveill Summ, 65(No. SS-3):1–23, 2016.
[3] L Zwaigenbaum, S Bryson, C Lord, S Rogers, A Carter, et al. Clinical assessment and management of toddlers with suspected autism spectrum disorder: insights from studies of high-risk infants. Pediatrics, 123:1383–1391, 2009.
[4] R Bernier, A Mao, and J Yen. Diagnosing autism spectrum disorders in primary care. Practitioner, 255(1745):27–30, 2011.
[5] TM Achenbach and LA Rescorla. Manual for the ASEBA preschool forms & profiles. 2000.

Figure 1. ROC curves on the clinical sample for the questionnaire-based and the video-based algorithms, separately and in combination. Inconclusive determination is allowed for up to 25% of the cases. The established screening tools M-CHAT and CBCL are included as baselines.
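The inconclusive band described in Section 3, with the 25% cap on inconclusive cases mentioned in the Figure 1 caption, can be sketched as a small grid search over a lower and an upper cutoff. This is a minimal sketch under stated assumptions: the paper does not state the objective being optimized, so balanced accuracy on the conclusive cases is used here as a stand-in, and the function names, grid resolution, and toy data are hypothetical.

```python
def classify(score, lo, hi):
    """Three-way decision: scores strictly between lo and hi are inconclusive."""
    if score >= hi:
        return "positive"
    if score <= lo:
        return "negative"
    return "inconclusive"


def balanced_accuracy(scores, labels, lo, hi):
    """Mean of sensitivity and specificity over conclusive cases only."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        outcome = classify(s, lo, hi)
        if outcome == "inconclusive":
            continue
        if y == 1:
            if outcome == "positive":
                tp += 1
            else:
                fn += 1
        else:
            if outcome == "negative":
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        return None  # one class has no conclusive cases; skip this band
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def search_band(scores, labels, max_inconclusive=0.25, grid_steps=20):
    """Grid search for (lo, hi), capped on the inconclusive fraction."""
    grid = [i / grid_steps for i in range(grid_steps + 1)]
    n = len(scores)
    best = None
    for i, lo in enumerate(grid):
        for hi in grid[i:]:
            n_inc = sum(1 for s in scores if lo < s < hi)
            if n_inc / n > max_inconclusive:
                continue
            acc = balanced_accuracy(scores, labels, lo, hi)
            if acc is None:
                continue
            key = (acc, -n_inc)  # on ties, prefer fewer inconclusive cases
            if best is None or key > best[0]:
                best = (key, lo, hi)
    return best[1], best[2], best[0][0]


# Made-up classifier scores and diagnoses (1 = autism); two cases overlap.
scores = [0.05, 0.1, 0.2, 0.45, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

lo, hi, acc = search_band(scores, labels)
inconclusive_fraction = sum(1 for s in scores if lo < s < hi) / len(scores)
print(lo, hi, acc, inconclusive_fraction)
```

In this toy sample the two overlapping cases fall inside the band, so the remaining conclusive cases are separated cleanly while the inconclusive fraction stays at the cap.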

Figure 2. A hypothetical illustration of a chained tree of machine learning algorithms. At each node in the tree a three-way classifier (such as the one discussed in this paper) is run to determine whether a condition is present or absent, or whether the determination is inconclusive given the available data.
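The chained tree of three-way classifiers proposed in Section 6 can be sketched as a recursive traversal that descends only below nodes whose determination is "present". This is a hypothetical illustration: the node names, the cutoffs, and the lambda scorers that stand in for trained models are assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Answers = Dict[str, float]


@dataclass
class Node:
    condition: str
    score: Callable[[Answers], float]  # stand-in for a trained classifier
    lo: float                          # at or below: condition absent
    hi: float                          # at or above: condition present
    children: List["Node"] = field(default_factory=list)

    def decide(self, answers: Answers) -> str:
        s = self.score(answers)
        if s >= self.hi:
            return "present"
        if s <= self.lo:
            return "absent"
        return "inconclusive"


def screen(node: Node, answers: Answers) -> List[Tuple[str, str]]:
    """Collect determinations from the root down.

    Child classifiers run only when the parent determination is "present",
    so upstream results remain available when a child is inconclusive.
    """
    outcome = node.decide(answers)
    findings = [(node.condition, outcome)]
    if outcome == "present":
        for child in node.children:
            findings.extend(screen(child, answers))
    return findings


# Hypothetical two-level tree; each lambda reads a precomputed toy score.
tree = Node(
    "developmental concern", lambda a: a["global"], 0.3, 0.7,
    children=[
        Node("autism", lambda a: a["autism"], 0.3, 0.7),
        Node("ADHD", lambda a: a["adhd"], 0.3, 0.7),
    ],
)

at_risk = screen(tree, {"global": 0.9, "autism": 0.5, "adhd": 0.1})
not_at_risk = screen(tree, {"global": 0.1, "autism": 0.5, "adhd": 0.1})
print(at_risk)
print(not_at_risk)
```

Because each node carries its own cutoffs, a node can be tuned to return "inconclusive" for most subjects without discarding the conclusive result already obtained upstream.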

[6] Cognoa, Inc. 2390 El Camino Real St 220, Palo Alto, CA 94306, https://www.cognoa.com/.
[7] C Lord, M Rutter, and A Le Couteur. Autism diagnostic interview-revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24:659–685, 1994.
[8] C Lord, M Rutter, S Goode, J Heemsbergen, H Jordan, et al. Autism diagnostic observation schedule: a standardized observation of communicative and social behavior. J Autism Dev Disord, 19:185–212, 1989.
[9] Halim Abbas, Ford Garberson, Eric Glover, and Dennis Wall. Machine learning approach for early detection of autism by combining questionnaire and home video screening. Journal of the American Medical Informatics Association (submitted, under review), 2017.
[10] C Lord, E Petkova, V Hus, W Gan, F Lu, et al. A multisite study of the clinical diagnosis of different autism spectrum disorders. Archives of General Psychiatry, 69:306–313, 2012.
