
Original Article

Structure, longitudinal invariance, and stability of the Child Behavior Checklist 1½–5’s Diagnostic and Statistical Manual of Mental Disorders–Autism Spectrum Disorder scale: Findings from Generation R (Rotterdam)

Autism
2019, Vol. 23(1) 223–235
© The Author(s) 2017
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/1362361317736201
https://doi.org/10.1177/1362361317736201
journals.sagepub.com/home/aut

Leslie A Rescorla1, Akhgar Ghassabian2,3, Masha Y Ivanova4, Vincent WV Jaddoe2, Frank C Verhulst5 and Henning Tiemeier5

Abstract
Although the Child Behavior Checklist 1½–5’s 12-item Diagnostic and Statistical Manual of Mental Disorders-Autism
Spectrum Problems Scale (formerly called Pervasive Developmental Problems scale) has been used in several studies as
an autism spectrum disorder screener, the base rate and stability of its items and its measurement model have not been
previously studied. We therefore examined the structure, longitudinal invariance, and stability of the Child Behavior
Checklist 1½–5’s Diagnostic and Statistical Manual of Mental Disorders-Autism Spectrum Problems Scale in the diverse
Generation R (Rotterdam) sample based on mothers’ ratings at 18 months (n = 4695), 3 years (n = 4571), and 5 years (n
= 5752). Five items that seemed especially characteristic of autism spectrum disorder had low base rates at all three ages.
The rank order of base rates for the 12 items was highly correlated over time (Qs ⩾ 0.86), but the longitudinal stability
of individual items was modest (phi coefficients = 0.15–0.34). Confirmatory factor analyses indicated that the autism
spectrum disorder scale model manifested configural, metric, and scalar longitudinal invariance over the time period
from 18 months to 5 years, with large factor loadings. Correlations over time for observed autism spectrum disorder
scale scores (0.25–0.50) were generally lower than the correlations across time of the latent factors (0.45–0.68). Results
indicated significant associations of the autism spectrum disorder scale with later autism spectrum disorder diagnoses.

Keywords
autism spectrum disorder symptoms, Child Behavior Checklist 1½–5, longitudinal stability, measurement invariance,
preschoolers

1Bryn Mawr College, USA
2Erasmus MC, The Netherlands
3New York University, USA
4University of Vermont, USA
5Erasmus MC-Sophia Children’s Hospital, The Netherlands

Corresponding author:
Leslie A Rescorla, Department of Psychology, Bryn Mawr College, 101 N Merion Avenue, Bryn Mawr, PA 19010, USA.
Email: lrescorl@brynmawr.edu

The diagnostic criteria for autism spectrum disorder (ASD) in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; APA, 2013) include three symptoms reflecting social communication and interaction problems (SCI) and four symptoms reflecting restricted interests and repetitive behaviors (RRB). As the DSM-5 indicates, each of the seven symptoms can present with a wide range of severity, suggesting that the symptoms as well as ASD itself should be considered as dimensional constructs.

Evidence for the dimensionality of ASD comes from research indicating that ASD-like behaviors manifest not only in people with an ASD diagnosis but also in the general population. For example, Constantino and Todd (2003), who used the Social Responsiveness Scale (SRS) with 788 pairs of twins aged 7–15 years, demonstrated that ASD-like symptoms were continuously distributed and common in the general population. When Robinson et al. (2011) used the Social and Communication Disorders Checklist (SCDC) at the ages of 7, 10, and 13 years with 6539 children in the Avon Longitudinal Study of Parents and Children (ALSPAC), they found that ASD-like traits were common and stable over time, both in the full sample and in sub-groups of high-scoring children. Whitehouse et al. (2011) reported longitudinal findings for ASD-like traits measured in 760 Australian children at the age of 2 years using the Pervasive Developmental Problems (PDP) scale of the Child Behavior Checklist 1½–5 (CBCL/1½–5; Achenbach and Rescorla, 2000) and then followed up almost 20 years later with the Autism Spectrum Quotient. Longitudinal correlations were significant but modest for males for both the total scales of each instrument and various subscales.

Constantino and Todd (2003) and Robinson et al. (2011) have shown that ASD-like traits occur in school-age children in the general population and show some stability over time. However, to our knowledge, less research has explored what percentage of preschool children in the general population manifest specific ASD-like behaviors and which symptoms are more common than others among non-ASD preschoolers. That is, we know relatively little about the base rates for individual ASD-like symptoms in preschool population samples or the longitudinal stability of these individual ASD-like symptoms over the preschool period. This is important to know because the preschool period is when most children with ASD are diagnosed and hence when clinicians must decide whether a child’s presenting symptoms warrant a diagnosis of ASD. Knowing that the ASD-like symptoms that a child manifests are quite common in the general population or are rather unstable over time may affect one’s threshold for diagnosing ASD.

To address this gap in the literature, we used 12 items from the CBCL/1½–5 rated by an international group of experts as being very consistent with the DSM-5 diagnostic category of ASD (Achenbach, 2014). As was done in creating the earlier Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; APA, 1994) version of this scale (Achenbach and Rescorla, 2000), a group of psychiatrists and psychologists rated each CBCL/1½–5 item as being not consistent, somewhat consistent, or very consistent with a set of DSM diagnoses commonly used for young children. Items rated as very consistent with a given diagnosis by >60% of the expert judges were assigned to the respective DSM-oriented scale. The ratings for the 2000 DSM-IV PDP scale, used by Whitehouse et al. (2011), yielded 13 very consistent items. However, when the experts rated the items with respect to the DSM-5 diagnostic category of ASD, one of the PDP scale items, Afraid to try new things, did not meet the >60% threshold and was therefore removed from the scale. Additionally, the scale was renamed DSM-Autism Problems Scale, abbreviated as DSM-ASD to be consistent with DSM-nomenclature (Achenbach, 2014).

Seven items on the DSM-ASD scale appear to tap SCI, namely, item 4. Avoids looking others in the eye; 23. Doesn’t answer when people talk to him/her; 25. Doesn’t get along with other children; 67. Seems unresponsive to affection; 70. Shows little affection toward people; 76. Speech problem; and 98. Withdrawn, doesn’t get involved with others. Additionally, five items appear to tap RRB, namely, item 7. Can’t stand things out of place; 21. Disturbed by any change in routine; 63. Repeatedly rocks head or body; 80. Strange behavior; and 92. Upset by new people or situations. To our knowledge, the composition, structure, longitudinal invariance, and stability of this measurement model of the 12-item ASD scale have not been tested using confirmatory factor analysis (CFA).

Although the CBCL/1½–5 is not a diagnostic instrument, it is widely used in assessment of preschool children because it provides a profile of the child’s behavioral and emotional problems using normed scale scores. Furthermore, several studies have demonstrated significant associations between the DSM-PDP scale and autism diagnoses for preschool children, suggesting its potential as an ASD screening instrument (e.g. Sikora et al., 2008). Sensitivity (SENS) and specificity (SPEC) of the PDP scale were 85% and 90%, respectively, when Muratori et al. (2011) compared Italian preschoolers with ASD with typically developing (TD) children, but SPEC was only 60% when children with other psychiatric disorders (OPD) comprised the comparison group. Narzisi et al. (2013), using subsets of the same Italian samples in a case-control design, reported high SENS and SPEC for the PDP scale both with the TD comparison group (0.98 and 0.91) and the OPD comparison group (0.85 and 0.83). Myers et al. (2014) reported SENS of 79% and SPEC of 48% when the PDP scale was used to differentiate children with ASD from those with other developmental disabilities. Rescorla et al. (2015) reported 80% SENS for the PDP scale, with SPEC varying by comparison group (87% for non-referred, 55% for other psychiatric diagnoses, and 60% for developmental delay). Rescorla et al. (in press) reported SENS of 0.77 and SPEC of 0.99 in a family-risk ASD study. In contrast, Havdahl et al. (2016) reported that the CBCL/1½–5 PDP scale did not provide strong discrimination between 104 preschoolers with ASD and 57 preschoolers with non-ASD developmental and behavioral/emotional problems.

Goal of the present study

The overarching goal of our study was to conduct a comprehensive examination of the structure, invariance, and longitudinal stability of the CBCL/1½–5’s DSM-ASD scale (Achenbach, 2014) in a large population sample. Mothers completed CBCLs at 18 months, 3 years, and 5 years for a diverse sample of children participating in the longitudinal Generation R Study in Rotterdam (Jaddoe et al., 2010; Kooijman et al., 2016; Tiemeier et al., 2012).

Table 1.  Demographic characteristics of the sample at each time point.

18 months (n = 4695) 3 years (n = 4571) 5 years (n = 5752)


Gender, boys; n (%) 2305 (49.6) 2275 (49.8) 2885 (50.2)
Child ethnicity; n (%)
 Dutch 3115 (66.3) 3056 (66.9) 3669 (63.8)
  Other Western 404 (8.0) 417 (9.1) 510 (8.9)
 Non-Western 1124 (23.9) 1064 (23.3) 1571 (27.3)
Gestational age, weeks; mean (SD) 39.8 (1.8) 39.8 (1.8) 39.8 (1.9)
Birth weight, grams; mean (SD) 3436 (569) 3444 (568) 3429 (577)
Apgar score at 1 and 5 min; mean (SD) 9.1 (0.9) 9.1 (0.9) 9.1 (0.9)
Marital status; n (%)
 Married/cohabiting 4107 (87.5) 4024 (88.0) 4890 (85.0)
 Single 373 (7.9) 334 (7.3) 486 (8.4)
 Missing 215 (4.6) 213 (4.7) 376 (6.5)
Maternal age at intake; mean (SD) 31.4 (4.7) 31.6 (4.6) 31.2 (4.8)
Paternal age at intake; mean (SD) 33.8 (5.5) 34.1 (5.4) 33.8 (5.5)
Maternal education; n (%)
  <Secondary 232 (4.9) 227 (5.0) 331 (5.8)
 Secondary 1688 (36.0) 1571 (34.4) 2163 (37.6)
  Higher education 2576 (55.0) 2578 (56.4) 2912 (50.6)
 Missing 195 (4.2) 195 (4.3) 346 (6.0)
Income, euros/month; n (%)
  <1200 226 (4.8) 198 (4.3) 300 (5.2)
 1200–2000 805 (17.1) 751 (16.4) 1003 (17.4)
  >2000 2896 (61.7) 2836 (62.0) 3300 (57.4)
 Missing 768 (16.4) 786 (17.2) 1149 (20.0)

Specific goals of the study included the following: (a) to determine the base rate of each ASD-like symptom in the full sample at each age; (b) to test the longitudinal stability of the 12 ASD scale items from 18 months to 5 years; (c) to test the longitudinal stability of the aggregated 12-item ASD scale as well as the SCI and RRB subscales; (d) to test the measurement model of the ASD scale and its SCI and RRB subscales at each age as well as the longitudinal invariance of the model over time; and (e) to test the associations between the CBCL/1½–5’s ASD, SCI, and RRB scales and scores on the SRS administered at the age of 6 years as well as their associations with confirmed ASD diagnoses. Although not a major goal, we also looked at the effects of various demographic factors on ASD scores because our sample was larger and more diverse than many population samples in which ASD-like problems have been studied (Constantino and Todd, 2003; Robinson et al., 2011).

Methods

Participants

The present research was embedded within the Generation R Study, a multi-ethnic population-based birth cohort in Rotterdam, the Netherlands (Jaddoe et al., 2010; Tiemeier et al., 2012). The study was approved by the Medical Ethics Committee of the Erasmus Medical Center, Rotterdam. Written informed consent was obtained from parents/caregivers, with anonymity guaranteed. To be included in our study, a child needed to have CBCL/1½–5 data for at least one of the three time points. Because CBCL protocols with >8 missing items are considered invalid, all such children were excluded from the relevant age group sample. Because there was often a wide time window for Generation R visits at each data point, we required a visit between 17 and 21 months for the 18-month group, between 34 and 39 months for the age 3 group, and between 59 and 83 months for the age 5 group (because almost half the children seen for the age 5 follow-up had already turned 6), yielding 5065, 4976, and 5917 possible cases at the three ages. For 18 months, 3 years, and 5 years, respectively, missing values across the 12 ASD items ranged from 0.3% of cases (items 67 and 80) to 2.1% of cases (for item 76); from 0.1% of cases (item 70) to 1.1% of cases (item 14); and from 0.2% of cases (item 98) to 0.5% of cases (item 14). We excluded cases with any missing ASD data (370 at 18 months, 225 at 3 years, and 165 at 5 years), resulting in final sample sizes of 4695 at 18 months, 4571 at 3 years, and 5752 at 5 years. Table 1 presents the baseline characteristics of the study participants at the three ages.

Measures

CBCL/1½–5. We used the CBCL/1½–5 to measure ASD-like symptoms at all three ages. Even though 42% of the

age 5 sample had turned 6, the CBCL/1½–5 was used both for reasons of continuity and because many children were not yet in formal schooling. The CBCL/1½–5 contains 99 problem items that parents can score as 0 = not true, 1 = somewhat or sometimes true, and 2 = very true or often true of the child based on the preceding 2 months. Good reliability and validity have been reported for the Dutch translation of the CBCL/1½–5 (Tick et al., 2007).

SRS. The Generation R protocol involved administration of an 18-item short-form of the SRS at age 6 (n = 4778 for our age 5 sample). Autistic traits (i.e. social, language, and repetitive behaviors) were measured by parental ratings using the scale 0 (never true) to 3 (almost always true), which we summed to yield a total SRS score. Serdarevic et al. (2017) note that the SRS short-form correlated 0.95 with scores from the full SRS in a previous Dutch study.

ASD diagnosis. As reported by Serdarevic et al. (2017), Generation R employed a multiple-gating procedure whereby children who (a) had a high score on the SRS short-form, (b) had a high score on the Social Communication Questionnaire (SCQ; Berument et al., 1999), or (c) had been assessed for ASD according to parental report were selected for a search of central medical records maintained by general practitioners. Only children for whom a diagnosis of ASD was confirmed by these medical records were considered ASD cases in this analysis. The specialist diagnoses of ASD were generally based on clinical consensus by a multidisciplinary team. The standard diagnostic work-up involves an extensive developmental case history obtained from parents, as well as school information and repeated observations of the child. This medical record information was available for 127 of our age 5 sample (n = 5752), 60 of whom received an ASD diagnosis and 67 of whom did not following the multidisciplinary assessment.

Child and demographic variables. Selected Generation R measures with known impact on behavioral outcomes (Tiemeier et al., 2012) were included in some analyses. Child variables included gender, birth weight, mean of Apgar scores at 1 and 5 min, and gestational age at birth, obtained from medical records. Maternal and paternal age, marital status (married/cohabiting vs single), and maternal education were obtained from enrollment questionnaires. Maternal education levels were classified as <secondary, secondary, and higher education (higher vocational education or university). Consistent with other Generation R publications (e.g. Ghassabian et al., 2013), child’s ethnic background was defined based on the country of birth of the parents, with three major categories: Dutch, “Other Western” (Europe plus the United States, Canada, Australia, etc.) and “Other non-Western” (Cape Verde, Morocco, the Dutch Antilles, Surinam, Turkey, Ethiopia, Bolivia, etc.).

Data analyses

We calculated total ASD scale scores, as well as SCI and RRB subscale scores, by summing the child’s 0-1-2 ratings for the relevant items. The SCI and RRB subscales were moderately correlated, with rs of 0.40 at 18 months, 0.44 at 3 years, and 0.53 at 5 years.

In a preliminary analysis, we tested the effects of various demographic and health factors on ASD scores using multiple regression. We then determined the base rate for each ASD item using our full samples at each time point. We operationalized broad base rate as the percentage of children with a rating of 1 (somewhat or sometimes true) or a rating of 2 (very true or often true) and narrow base rate as the percentage of children with a rating of 2. We also calculated mean item ratings for the final sample at each age and examined gender differences in these ratings.

To examine stability over time, we cross-tabulated broad base rates for each item from 18 months to 3 years and from 3 to 5 years. We also computed Q correlations between the broad base rates over time, as well as between the mean item ratings over time. Additionally, we examined the longitudinal stability of the observed ASD scale and its SCI and RRB subscales over time using correlations. We next used a mixed-model analysis of variance (ANOVA) to test the effects of age and gender on ASD, SCI, and RRB scores for the final sample of children with ASD item data at 18 months, 3 years, and 5 years (n = 3306). In light of the very large sample size and the number of analyses conducted, we only report findings with p < 0.001 as significant for all analyses.

To test the measurement model of the DSM-ASD scale at each age as well as its measurement invariance (MI) over time, we conducted CFAs. These CFAs used the 12 dichotomized (0 vs 1/2) ASD items as categorical variables in the weighted least squares with standard errors and mean- and variance-adjusted (WLSMV) chi-square estimator using Mplus 7.1 (Muthén and Muthén, 2015). MI is typically conceptualized at three levels of a hierarchy of stringency. Configural MI means that particular items load on the same factors across groups. Metric MI means that items have similar loadings (item–factor associations) across groups. Finally, scalar MI means that item intercepts (or thresholds for categorical data) are equivalent, meaning that additive influences on item ratings not associated with the underlying factors (e.g. procedural differences in data collection) are the same across groups. The root mean square error of approximation (RMSEA) served as the primary fit index for our CFAs, with an RMSEA cutoff of 0.05 indicating good fit (Yu and Muthén, 2002). The Comparative Fit Index (CFI) and the Tucker–Lewis

Index (TLI) served as the secondary fit indices, with values greater than 0.95 considered to indicate good fit (Hu and Bentler, 1999). We performed CFAs with both the single-factor ASD model (12 items) and a two-factor SCI (7 items) and RRB (5 items) model.

In our final analyses, we examined associations between ASD, SCI, and RRB scores at ages 18 months, 3 years, and 5 years and SRS scores at age 6. We also compared scores on the ASD, SCI, and RRB scales as well as the SRS for the 60 children with confirmed ASD diagnoses, the 67 children evaluated but not diagnosed with ASD, and the rest of the age 5 sample not suspected of having ASD.

Results

Demographic characteristics

As can be seen in Table 1, the sample had roughly equal gender percentages at all three time points. On average, the participants were born close to term, had mothers in their early 30s, had normal birth weights and gestational age, and had good Apgar scores. Child ethnicity was approximately 66% Dutch, 9% “Other Western,” and 25% “non-Western.” Relative to the Generation R participants excluded due to lack of CBCL data, the children we studied had a slightly greater birth weight and older gestational age at birth, and their mothers were older and had more education. Similarly, relative to the children we studied, the children excluded for missing ASD items were significantly less likely to be Dutch, had more mothers with less than secondary education, and had lower family incomes. However, the excluded and included groups did not differ significantly in gestational age, birth weight, mean Apgar score, or maternal age.

Demographic predictors of ASD scale scores

We used simultaneous multiple regression to predict ASD scale scores (sum of 0-1-2 ratings on the 12 items) at each age from child gender, gestational age at birth, birth weight, mean Apgar score, marital status, age of mother and father at child’s birth, maternal education, ethnicity, and income. Although all three regressions were significant, these perinatal and demographic factors accounted for small and declining percentages of the variance as the children aged: R² values of 8% (18 months), 5% (3 years), and 3% (5–6 years). The standardized betas for significant predictors (p ⩽ 0.001) were maternal education (β = −0.07), income (β = −0.16), and ethnicity (β = 0.10) at 18 months; income (β = −0.10), ethnicity (β = 0.11), and gender (boys higher, β = −0.06) at 3 years; and gender (β = −0.12) at 5 years.

Base rates and mean item scores for ASD-like problems

In Table 2, the first number in each cell (the broad base rate) indicates the percentage of children for whom the problem was rated as either 1 or 2, whereas the second number (the narrow base rate) indicates the percentages with ratings of 2 only. For the seven SCI items, broad base rates at 18 months ranged from 2.5% (Shows little affection toward people) to 30.5% (Doesn’t answer when people talk to him/her). As would be expected, narrow base rates (a rating of 2) were much lower (from 0.2% to 3.2%). For the five RRB items, broad base rate ranged from 3.4% (Strange behavior) to 31.6% (Disturbed by any change in routine), whereas narrow base rates ranged from 0.2% to 2.3%. Similarly, the age 3 and age 5 data showed a wide range in broad base rates across the items on both subsets of the ASD scale, as well as much lower percentages if only a rating of 2 is used to operationalize base rate. Problems 67, 70, 80, and 98 had low base rates at all three ages, item 25 became slightly more common with age, item 63 became much less common with age, and items 23 and 21 had the highest base rates at all three ages. Table 2 also presents the mean ratings for the 12 items at each age. At 18 months, five items had mean ratings of ⩽0.06 (25, 67, 70, 80, and 98), whereas two had mean ratings of 0.34 (21 and 23), parallel to the base rate findings. The same pattern emerged at ages 3 and 5 years.

We tested gender effects on our 12 ASD-like items via t-tests of mean item ratings. Boys had significantly greater variance for four items at 18 months (23, 63, 70, and 76), eight items at age 3 (7, 21, 25, 63, 70, 76, 80, and 98), and all 12 items at age 5. Boys also had significantly higher mean item ratings than girls on two items at 18 months (63 and 76), two items at age 3 (21 and 76), and seven items at age 5 (4, 7, 21, 23, 63, 76, and 80).

Longitudinal stability of ASD scale items

The longitudinal stability of the 12 ASD scale items was first examined using cross-tabulations of broad base rates across age points, with each item dichotomized as absent (rating of 0) or present (ratings of 1 or 2). As shown in Table 3, the phi coefficients were all significant at p < 0.001 but generally small according to Cohen’s (1988) benchmarks (range = 0.15–0.34). This means that many but not all children were consistent over time with respect to whether a given problem was reported as present or absent for them, with degree of consistency varying widely across the 12 items. This is shown in Table 3, which displays the percentage of children with each of four temporal stability patterns for each item: N/N = problem absent at both ages, Y/Y = problem present at both ages, Y/N = “remission” over time, and N/Y = “increase” over time. More than 89% of children had the N/N pattern for six items from 18 months to 3 years (25, 63, 67, 70, 80, and 98) and for five of these same items from 3 to 5 years (item 25 = 89%), meaning that the behavior was not rated as true for the child at either age. The items showing most change over time were item 23. Doesn’t answer when people talk to him/her and item 21. Disturbed by any change

Table 2.  CBCL/1½ –5 DSM-ASD scale item base rates and mean item ratings at 18 months, 3 years, and 5 years.

Columns: ASD scale item; then, for each of 18 months (N = 4695), 3 years (N = 4571), and 5 years (N = 5752): % 1 + 2 (% 2)a and Mean rating (SD)b


Social communication/social interaction
4. Avoids looking others in the eye 13.4% (0.6%) 0.14 (0.36) 19.6% (0.7%) 0.20 (0.42) 30.0% (2.4%) 0.32 (0.52)
23. Doesn’t answer when people talk to him/her 30.5% (3.2%) 0.34 (0.54) 35.2% (2.1%) 0.37 (0.53) 40.5% (3.3%) 0.44 (0.56)
25. Doesn’t get along with other children 4.4% (0.5%) 0.06 (0.25) 6.6% (0.4%) 0.07 (0.27) 7.6% (0.6%) 0.08 (0.30)
67. Seems unresponsive to affection 2.7% (0.6%) 0.04 (0.21) 1.5% (0.2%) 0.02 (0.15) 2.1% (0.2%) 0.02 (0.16)
70. Shows little affection toward people 2.5% (0.2%) 0.03 (0.18) 2.5% (0.2%) 0.03 (0.18) 3.7% (0.5%) 0.04 (0.23)
76. Speech problem 6.8% (1.6%) 0.10 (0.35) 9.5% (2.1%) 0.12 (0.38) 9.3% (1.0%) 0.11 (0.37)
98. Withdrawn, doesn’t get involved with others 2.8% (0.4%) 0.04 (0.21) 5.6% (0.4%) 0.06 (0.25) 4.6% (0.4%) 0.05 (0.23)
Restricted interests/repetitive behaviors
7. Can’t stand things out of place 11.2% (1.4%) 0.13 (0.37) 21.0% (2.3%) 0.23 (0.47) 15.3% (1.8%) 0.17 (0.42)
21. Disturbed by any change in routine 31.6% (2.3%) 0.34 (0.23) 33.0% (2.3%) 0.35 (0.52) 30.4% (3.2%) 0.34 (0.54)
63. Repeatedly rocks head or body 9.9% (1.3%) 0.11 (0.35) 2.5% (0.4%) 0.03 (0.19) 1.5% (0.3%) 0.02 (0.15)
80. Strange behavior 3.4% (0.2%) 0.04 (0.19) 3.6% (0.2%) 0.04 (0.20) 5.6% (0.6%) 0.06 (0.27)
92. Upset by new people or situations 17.6% (0.4%) 0.18 (0.40) 13.0% (1.1%) 0.14 (0.38) 10.0% (1.0%) 0.11 (0.34)

CBCL: Child Behavior Checklist; DSM: Diagnostic and Statistical Manual of Mental Disorders; ASD: autism spectrum disorder.
aEach cell contains the percentage of children with ratings of either 1 or 2 on the item, followed by the breakdown of percentage rated 2.
bEach cell contains the mean of the 0-1-2 item ratings and the standard deviation of that mean rating for the full sample.
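
As a rough check on how consistently the items are rank-ordered across ages, the broad base rates in Table 2 can be correlated as item profiles. The sketch below assumes that a Q correlation here is the Pearson correlation between two 12-item profiles; the function name `profile_correlation` and the list names are illustrative and do not come from the study. Because the inputs are the rounded percentages printed in the table, the results may differ slightly from the published values.

```python
from math import sqrt

# Broad base rates (% rated 1 or 2), copied from Table 2 in item order:
# 4, 23, 25, 67, 70, 76, 98, 7, 21, 63, 80, 92.
BROAD_18M = [13.4, 30.5, 4.4, 2.7, 2.5, 6.8, 2.8, 11.2, 31.6, 9.9, 3.4, 17.6]
BROAD_3Y = [19.6, 35.2, 6.6, 1.5, 2.5, 9.5, 5.6, 21.0, 33.0, 2.5, 3.6, 13.0]
BROAD_5Y = [30.0, 40.5, 7.6, 2.1, 3.7, 9.3, 4.6, 15.3, 30.4, 1.5, 5.6, 10.0]


def profile_correlation(x, y):
    """Pearson correlation between two equal-length item profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Correlate the item profiles for each pair of ages.
q_18m_3y = profile_correlation(BROAD_18M, BROAD_3Y)
q_18m_5y = profile_correlation(BROAD_18M, BROAD_5Y)
q_3y_5y = profile_correlation(BROAD_3Y, BROAD_5Y)
print(f"{q_18m_3y:.2f} {q_18m_5y:.2f} {q_3y_5y:.2f}")
```

With these inputs the three values come out close to the Qs of 0.93, 0.86, and 0.95 reported in the text, which suggests the rank-ordering of items is driven by the same few high-base-rate and low-base-rate problems at every age.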

Table 3.  CBCL/1½ –5 DSM-ASD scale items: longitudinal stability from 18 months to 3 years and from 3 to 5 years.

18 months–3 years (N = 3727) 3–5 years (N = 3920)

  N/N N/Y Y/N Y/Y phi N/N N/Y Y/N Y/Y phi
Social communication/social interaction
4. Avoids looking others in the eye 73% 14% 8% 5% 0.22*** 61% 20% 9% 10% 0.26***
23. Doesn’t answer when people talk to him/her 50% 20% 16% 15% 0.20*** 45% 20% 14% 21% 0.30***
25. Doesn’t get along with other children 91% 5% 3% 1% 0.18*** 89% 5% 5% 2% 0.18***
67. Seems unresponsive to affection 97% 0.8% 2% 0.3% 0.16*** 98% 1% 0.9% 0.3% 0.25***
70. Shows little affection toward people 96% 1% 2% 0.5% 0.22*** 95% 3% 2% 0.7% 0.22***
76. Speech problem 86% 7% 5% 2% 0.21*** 86% 5% 6% 3% 0.34***
98. Withdrawn, doesn’t get involved with others 92% 5% 2% 0.8% 0.19*** 92% 3% 5% 1% 0.20***
Restricted interests/repetitive behaviors
7. Can’t stand things out of place 74% 16% 6% 4% 0.18*** 72% 8% 14% 7% 0.27***
21. Disturbed by any change in routine 53% 17% 16% 15% 0.24*** 53% 14% 17% 16% 0.28***
63. Repeatedly rocks head or body 89% 1% 8% 1% 0.23*** 98% 0.6% 2% 0.5% 0.32***
80. Strange behavior 94% 3% 3% 0.6% 0.15*** 93% 4% 2% 1% 0.23***
92. Upset by new people or situations 75% 8% 13% 5% 0.21*** 81% 6% 9% 3% 0.23***

CBCL: Child Behavior Checklist; DSM: Diagnostic and Statistical Manual of Mental Disorders; ASD: autism spectrum disorder.
***p < 0.001.
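
The phi coefficients in Table 3 can be approximated from the four stability-pattern percentages. The following is a minimal sketch using the standard phi formula for a 2 × 2 table; the function name `phi_from_patterns` is hypothetical, and because the table reports rounded percentages rather than counts, the result only approximates the published coefficient.

```python
from math import sqrt


def phi_from_patterns(nn, ny, yn, yy):
    """Phi coefficient for a 2x2 stability table.

    nn, ny, yn, yy are counts or percentages of children with each
    pattern (first letter = earlier age, second letter = later age).
    """
    first_yes, first_no = yn + yy, nn + ny      # marginals at the earlier age
    second_yes, second_no = ny + yy, nn + yn    # marginals at the later age
    denom = sqrt(first_yes * first_no * second_yes * second_no)
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (yy * nn - yn * ny) / denom


# Item 21 (Disturbed by any change in routine), 3-5 years, from Table 3.
phi_item21 = phi_from_patterns(nn=53, ny=14, yn=17, yy=16)
print(round(phi_item21, 2))
```

For this item the sketch gives a value of about 0.28, in line with the tabled phi. For items whose percentages are very small or do not sum to exactly 100 after rounding, larger discrepancies from the published values should be expected.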

in routine, with about one-third of the sample inconsistent (N/Y or Y/N) across time, compared to <3% for item 67.

We also examined longitudinal stability of ASD items by calculating Q correlations (Qs) between the broad base rates for the 12 items across ages. These correlations were very large (18 months × 3 years = 0.93, 18 months × 5 years = 0.86, and 3 years × 5 years = 0.95; all p < 0.001), indicating highly consistent rank-ordering of problem base rates over time. The same strong longitudinal consistency was found when mean item ratings were used. These results further demonstrate that the problems rarely reported at one age were also rarely reported at the other two ages (especially items 67, 70, 80, and 98).

Longitudinal stability of ASD, SCI, and RRB scales

The correlations among total ASD scale scores over time indicated moderate stability: 18 months to 3 years = 0.44, 18 months to 5 years = 0.32, and 3 to 5 years = 0.50, all p < 0.001 and medium-to-large sized correlations based on Cohen’s (1988) benchmarks. As with the full ASD scale, the rs across age periods for the SCI and RRB subscales were largest from 3 to 5 years (0.40, 0.43), smallest from 18 months to 5 years (0.26, 0.25), and intermediate from 18 months to 3 years (0.35, 0.36). These findings indicate that the SCI and RRB items did not differ appreciably in their longitudinal stability. Additionally, whether we used the full ASD scale or the two subscales, rank-ordering of children over time in ASD scale scores was moderate, with high-scoring children quite likely to maintain this status over time.

The mixed-model ANOVA for the 12-item ASD scale yielded significant but very small effects of age, F(2, 3304) = 25.3, p < 0.001, η² < 0.01; gender, F(1, 3304) = 47.1, p < 0.001, η² = 0.01; and the age × gender interaction, F(2, 3304) = 21.4, p < 0.001, η² < 0.01, with a significant linear but not quadratic contrast. Figure 1 shows the raw scores by gender for the ASD, SCI, and RRB scales for the “three-wave” sample with data at all three ages (n = 3306). As shown in Figure 1, ASD scores increased with age for boys but not for girls. The SCI subscale yielded a slightly larger effect for age, F(2, 3304) = 102.3, p < 0.001, η² = 0.03; a small effect for gender, F(1, 3304) = 41.7, p < 0.001, η² = 0.01; and a small age × gender interaction, F(2, 3304) = 13.4, p < 0.001, η² < 0.01, again with a significant linear contrast. Both boys and girls showed an increase in scores across the three age points, but the boys had a steeper slope. For RRB, all three effects were small: age, F(2, 3304) = 16.2, p < 0.001, η² < 0.01; gender, F(1, 3304) = 30.7, p < 0.001, η² < 0.01; and the age × gender interaction, F(2, 3304) = 13.5, p < 0.001, η² < 0.01, with a significant linear contrast. Boys’ scores were flat across age, whereas girls showed no change in score from 18 months to 3 years and then a decline between age 3 and 5.

Figure 1. ASD, SCI, and RRB raw scores by gender and age in three-wave sample (N = 3306).

Measurement model of the CBCL ASD scale

Because not all participants were assessed at all three ages, we performed CFAs of the ASD scale’s measurement model in a sequential fashion. In our first preliminary analyses, we tested the one-factor and two-factor models cross-sectionally at each age. Both the one-factor models and two-factor models had good fit at all three ages (RMSEAs = 0.029–0.044, CFIs = 0.924–0.951, and TLIs = 0.908–0.939). Mean factor loadings, which indicate the strength of the items’ relations to the hypothesized construct (factor), were 0.59 to 0.71 in these cross-sectional models (see Table 4).

In our second set of preliminary analyses, we tested both the one-factor and the two-factor models over time for our three age pairings (18 months–3 years, N = 3727; 18 months–5 years, N = 3834; and 3–5 years, N = 3920). RMSEAs (0.023–0.028) were very good, and CFIs (0.926–0.954) and TLIs (0.915–0.946) were adequate, with the one-factor and two-factor models showing similar fit. Mean item loadings were similar to those obtained in the cross-sectional models (0.58–0.71), indicating that on average items related to their hypothesized constructs, albeit with some items showing consistently higher loadings than others.

To systematically examine variations in factor loadings across the 12 items, we calculated the mean loading by item across the 15 CFA data points obtained (e.g. 18-month loadings for 18-month sample, 18–3 and 18–5 age-pair samples, and 18–3–5 three-wave samples = 5). Five items

within-time latent factor correlations (ranging from 0.78 to 0.85) were much higher than the within-factor, cross-time correlations.

Our main CFA was to test the model’s structure and stability for the full age range, using the 3306 children with CBCL data at 18 months, 3 years, and 5 years. We did this first for the one-factor model (treating the ASD scale as a unified set of 12 items) and then for the two-factor model (treating the ASD scale as comprised of SCI and RRB subscales). For the one-factor model, RMSEA = 0.020, CFI = 0.938, and TLI = 0.930, all in the same range as the age-pair models. Mean factor loadings were also comparable to those in earlier models, with a range of 0.59–0.65 for the 12-item scale. As seen in Table 4, fit indices were comparable for the two-factor model across three ages (RMSEA = 0.018, CFI = 0.924, and TLI = 0.917, mean factor loadings from 0.58 to 0.70).

All the preceding analyses tested for configural MI only, namely, that the same items loaded on the same fac-
had a mean loading ⩾0.68 (25, 67, 70, 80, and 98), two tors for all groups (e.g. ages and age pairs). For our final
others had mean loadings ⩾0.60 (21 and 92), four had three-wave analysis, following procedures for testing MI
mean loadings from 0.52 to 0.57 (4, 7, 23, and 63), and of categorical variables suggested by Muthén and Muthén
item 76 had a mean loading of 0.40 (range = 0.35–0.49). (2015), we also tested metric and scalar longitudinal invar-
It is noteworthy that the five items with the most consist- iance by constraining equality across time for factor load-
ently high loadings are the same five items with the lowest ings and thresholds (i.e. possible systematic influences on
base rates (25, 67, 70, 80, and 98). observed scores that are not the hypothesized construct).
As shown in Table 4, the 12-item ASD scale latent fac- These additional equality constraints for loadings and
tors were highly correlated across time: 0.64 (18 thresholds had virtually no effect on our CFA results (one-
months–3 years), 0.50 (18 months–5 years), and 0.67 factor model: RMSEA = 0.022, CFI = 0.924, and TLI =
(3–5 years). For the SCI and RRB latent factors, cross- 0.942; two-factor model: RMSEA = 0.020, CFI = 0.938,
time loadings were 0.59 and 0.64 (18 months–3 years), and TLI = 0.931). These results, taken in aggregate, indi-
0.48 and 0.49 (18 months–5 years), and 0.66 and 0.65 cated that the ASD scale model comprised of the SCI and
(3–5 years). Interestingly, the cross-factor (SCI × RRB) RRB subscales manifested configural, metric, and scalar

Table 4.  Cross-sectional and longitudinal CFA results for ASD and SCI/RRB models.

Model                   RMSEA   CFI     TLI     Mean loading (range)    Latent factor rs

12-item ASD model
  18 months             0.032   0.940   0.927   0.60 (0.45–0.80)
  3 years               0.041   0.924   0.908   0.63 (0.38–0.78)
  5 years               0.044   0.934   0.919   0.66 (0.39–0.81)
  18 months–3 years     0.025   0.926   0.915   ASD-18: 0.59 (0.42–0.78); ASD-3: 0.62 (0.40–0.77)    0.64
  18 months–5 years     0.025   0.933   0.923   ASD-18: 0.59 (0.43–0.79); ASD-5: 0.65 (0.36–0.78)    0.50
  3–5 years             0.028   0.939   0.930   ASD-3: 0.63 (0.41–0.79); ASD-5: 0.66 (0.37–0.79)    0.67
  Three waves           0.020   0.938   0.930   ASD-18: 0.59 (0.41–0.76); ASD-3: 0.62 (0.39–0.80); ASD-5: 0.65 (0.35–0.76)    0.68, 0.50, 0.66
  Three waves, full MI  0.022   0.924   0.942   ASD-18: 0.59 (0.41–0.75); ASD-3: 0.63 (0.36–0.75); ASD-5: 0.66 (0.38–0.78)    0.66, 0.49, 0.66

SCI/RRB model
  18 months             0.029   0.951   0.939   SCI: 0.64 (0.47–0.82); RRB: 0.59 (0.49–0.73)
  3 years               0.037   0.941   0.927   SCI: 0.67 (0.40–0.80); RRB: 0.64 (0.59–0.69)
  5 years               0.040   0.947   0.934   SCI: 0.68 (0.40–0.83); RRB: 0.71 (0.59–0.82)
  18 months–3 years     0.024   0.937   0.925   SCI-18: 0.63 (0.44–0.81); SCI-3: 0.66 (0.42–0.80)    SCI: 0.58
                                                RRB-18: 0.58 (0.51–0.68); RRB-3: 0.64 (0.60–0.69)    RRB: 0.64
                                                                                                     SCI × RRB: 0.85, 0.80
  18 months–5 years     0.023   0.944   0.935   SCI-18: 0.64 (0.46–0.81); SCI-5: 0.67 (0.37–0.81)    SCI: 0.48
                                                RRB-18: 0.59 (0.50–0.69); RRB-5: 0.71 (0.60–0.79)    RRB: 0.49
                                                                                                     SCI × RRB: 0.83, 0.81
  3–5 years             0.024   0.954   0.946   SCI-3: 0.67 (0.43–0.82); SCI-5: 0.67 (0.38–0.82)    SCI: 0.66
                                                RRB-3: 0.65 (0.59–0.72); RRB-5: 0.70 (0.55–0.80)    RRB: 0.65
                                                                                                     SCI × RRB: 0.78, 0.84
  Three waves           0.018   0.924   0.917   SCI-18: 0.64 (0.44–0.81); SCI-3: 0.66 (0.41–0.84); SCI-5: 0.66 (0.37–0.80)    SCI: 0.59, 0.45, 0.63
                                                RRB-18: 0.58 (0.51–0.68); RRB-3: 0.65 (0.59–0.70); RRB-5: 0.70 (0.55–0.78)    RRB: 0.64, 0.51, 0.68
                                                                                                     SCI × RRB: 0.85, 0.79, 0.81
  Three waves, full MI  0.020   0.938   0.931   SCI-18: 0.64 (0.40–0.78); SCI-3: 0.67 (0.38–0.79); SCI-5: 0.67 (0.39–0.83)    SCI: 0.50, 0.45, 0.63
                                                RRB-18: 0.59 (0.51–0.72); RRB-3: 0.66 (0.55–0.70); RRB-5: 0.73 (0.68–0.78)    RRB: 0.64, 0.51, 0.68
                                                                                                     SCI × RRB: 0.85, 0.79, 0.81

CFA: confirmatory factor analysis; ASD: autism spectrum disorder; SCI: social communication and interaction; RRB: restricted interests and repetitive behaviors; RMSEA: root mean square error of approximation; CFI: Comparative Fit Index; TLI: Tucker–Lewis Index; MI: measurement invariance.
The same sequence of CFAs was performed for the 12-item scale model and for the 2-subscale model. The first three CFAs tested the configural model at each age; the second three CFAs tested longitudinal structural invariance using the pairwise age groupings; the seventh analysis tested longitudinal structural invariance using the children with data at all three time points, whereas the eighth included constraints to test configural, metric, and scalar longitudinal invariance simultaneously (i.e. the most rigorous test of the model).
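
As a rough illustration of why the latent factor rs in Table 4 exceed the observed-score rs reported in the text, the two sets of values can be related through the classical correction for attenuation. This is only a sketch under stated assumptions and not the CFA that produced Table 4: it assumes classical test theory, in which r_observed = r_latent × √(rel₁ × rel₂), and the function names are hypothetical.

```python
import math

# Observed-score correlations for the 12-item ASD scale (reported in the text).
observed = {"18m-3y": 0.44, "18m-5y": 0.32, "3y-5y": 0.50}
# Latent factor correlations from the age-pair rows of Table 4.
latent = {"18m-3y": 0.64, "18m-5y": 0.50, "3y-5y": 0.67}


def implied_reliability(r_observed, r_latent):
    """Geometric-mean reliability implied by r_obs = r_latent * sqrt(rel_1 * rel_2).

    Assumption: classical test theory; this is an illustration only.
    """
    return r_observed / r_latent


def disattenuate(r_observed, rel_1, rel_2):
    """Classical correction for attenuation of an observed correlation."""
    return r_observed / math.sqrt(rel_1 * rel_2)


implied = {pair: implied_reliability(observed[pair], latent[pair]) for pair in observed}

for pair, rel in implied.items():
    print(f"{pair}: implied geometric-mean reliability = {rel:.2f}")
```

Under these assumptions the implied reliabilities fall between roughly 0.64 and 0.75, which is one way to read the difference between observed and latent stability; the categorical-indicator CFA reported here does not reduce exactly to this formula.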

longitudinal invariance over the time period from 18 months to 5 years in this large population sample.

SRS and ASD diagnoses results

For the 3442 children who had CBCL/1½–5 ASD data at 18 months and SRS data at age 6 (mean age = 72.5 months, SD = 3.8), the Pearson rs between SRS scores and CBCL scores were 0.26 for ASD, 0.24 for SCI, and 0.20 for RRB, all p < 0.001. The rs for age 3 CBCL scores and SRS scores (n = 3601) were 0.38 (ASD), 0.37 (SCI), and 0.27 (RRB), all p < 0.001. For the 4778 children in the age 5 sample with age 6 SRS data, rs were 0.61 (ASD), 0.54 (SCI), and 0.52 (RRB), all p < 0.001. As would be expected, rs were larger with shorter time intervals and roughly comparable for the SCI and RRB scales.

In our final analysis, we used one-way ANOVAs to compare differences in age 5 ASD, SCI, and RRB scores as well as age 6 SRS scores for the 60 children diagnosed with ASD, the 67 assessed for ASD but not diagnosed, and the 4651 children not flagged for ASD assessment. All four ANOVAs yielded large group differences (all p < 0.001), with η² values of 0.19 for the SRS (means = 16.8, 13.4, and 3.7, respectively), 0.12 for the 12-item ASD scale (means = 6.9, 5.3, and 1.6), 0.09 for the SCI (means = 3.7, 3.1, and 1.0), and 0.09 for RRB (means = 3.2, 2.2, and 0.6).

Discussion

Although Robinson et al. (2011) examined the prevalence and stability of ASD-like traits in a large population
sample, their study did not involve the preschool period, which is when many children first receive an ASD diagnosis. Additionally, they did not examine the base rate of specific ASD-like problems. Our study therefore provides important information on the base rate and longitudinal stability of ASD-like behaviors in the general preschool population, as well as results of tests of the measurement model of the CBCL/1½–5 ASD scale and its SCI and RRB subscales.

Item-level analyses

The 12 items of the CBCL/1½–5 ASD scale that we used to operationalize the DSM-5 ASD criteria varied greatly in base rate at all three ages, with broad base rates (ratings of 1 or 2) naturally much higher than narrow base rates (ratings of 2). Rank-ordering of the 12 items, whether operationalized using base rates or mean item ratings, was very consistent over time (Qs ⩾ 0.86), indicating that some behaviors on the ASD scale were very common at all three age points, whereas others were quite uncommon. That the DSM-ASD scale contains both kinds of items may be the reason it discriminates better between children with ASD and TD children (i.e. is a good Level 1 screener) than between children with ASD and those with other diagnoses (i.e. is less good as a Level 2 screener) (Havdahl et al., 2016; Muratori et al., 2011; Rescorla et al., 2015).

Five of the 12 ASD problem items (25. Doesn’t get along with other children, 67. Seems unresponsive to affection, 70. Shows little affection toward people, 80. Strange behavior, and 98. Withdrawn) were rarely endorsed at any age. The relative rarity of these problems coupled with their centrality to the construct of ASD may make them especially important for identifying ASD in young children. Item 63. Repeatedly rocks head or body was somewhat common at 18 months (9.9% rated 1 or 2) but rare at 3 and 5 years (2.5% and 1.5%), suggesting that its link to ASD may increase with age, as more TD children discontinue this behavior. In contrast, problems such as 21. Disturbed by any change in routine and 23. Doesn’t answer when people talk to him/her, while certainly manifested by many children with ASD, seem to be common problems in young children more generally and hence less specific to ASD.

Measurement model and scale findings

Our CFAs revealed that the CBCL/1½–5 ASD scale and its SCI and RRB subscales manifested configural, metric, and scalar longitudinal invariance from 18 months to 5 years, as indicated by low RMSEAs and moderately high CFIs and TLIs. In addition, mean factor loadings were large in all models, indicating that at all three ages, the 12 ASD items in aggregate were good indicators of the latent construct of ASD, as well as of the SCI and RRB subscales. Additionally, the items with the highest factor loadings on the ASD factor across 15 data points were the same items with the consistently lowest base rates, namely, items 25, 67, 70, 80, and 98.

Our rs between the SCI and RRB subscales of 0.40 at 18 months, 0.44 at 3 years, and 0.53 at 5 years are consistent with previous reports that the social communication problems and repetitive interests, behaviors, and activities aspects of ASD are only moderately correlated (Happé et al., 2006; Mandy and Skuse, 2008). However, the correlations between the SCI and RRB latent factors were much higher than the correlations between the observed raw scores for these subscales (i.e. 0.78–0.85 vs 0.40–0.53). Because latent factors in CFA are considered to be free of measurement error, they may be more robust indicators of ASD problems than observed factor scores at each assessment period.

ASD scale scores were moderately stable over time, with rs ranging from 0.31 (18 months–5 years) to 0.49 (3–5 years). However, the cross-time correlations for the latent factors were considerably higher, notably 0.64 (18 months–3 years), 0.50 (18 months–5 years), and 0.67 (3–5 years) for the ASD scale latent factor and 0.58 and 0.64 (18 months–3 years), 0.48 and 0.49 (18 months–5 years), and 0.66 and 0.65 (3–5 years) for the SCI and RRB latent factors. Again, this may be due to the fact that CFA latent factors are presumed to be free of measurement error and thus are more robust indicators than observed scores, leading to higher correlations across time. These results suggest that using latent factors derived from the CBCL/1½–5 ASD scale provides a more robust measure of ASD-like behaviors across the early childhood period than using observed scores.

Associations with other ASD measures

As would be predicted, rs between the three CBCL/1½–5 ASD scales and age 6 SRS scores were largest (i.e. 0.52–0.61) when the measures were roughly concurrent (age 5–6) and smallest for the longest time interval (0.20–0.26 for 18 months to age 6). Given that the SRS has more items, different content, and more rating options than the CBCL/1½–5 ASD scale, the concurrent correlations are quite impressive. Because the SRS has some research support as an ASD screener (Norris and Lecavalier, 2010), its solid association with the CBCL/1½–5 ASD scale in our study adds to the research evidence supporting the CBCL/1½–5 ASD scale as a screener for ASD. It should also be noted that correlations with the SRS were roughly comparable for the SCI and RRB scales.

Because we had ASD diagnosis information for only a small fraction of our age 5 sample (about 1%), very incomplete diagnostic information about the 67 children assessed as non-ASD, and no diagnostic information about the rest of the sample, our ANOVA results must be interpreted
cautiously. However, the three groups showed impressive differences on the SRS, ASD, SCI, and RRB scales, with the ASD group having scores 4–5 times higher than the non-assessed group. Consistent with the notion of the broader autism phenotype (BAP), the 67 children assessed for ASD but not so diagnosed also had elevated scores on the SRS, ASD, SCI, and RRB scales but not as high as those with an ASD diagnosis.

Limitations and strengths

An important limitation of the study is that we only used mothers’ ratings on the 12 ASD items. Thus, we cannot determine the degree to which consistency over time in our results (e.g. large Q correlations for mean item ratings) is a rater effect or a child effect. We did not analyze fathers’ ratings, nor did we analyze ratings by teachers/examiners or any observational measures. Use of such alternative measures might have yielded different results. We also note that missing data resulted in slight reductions in our sample sizes, although even our smallest sample (children with complete data at 18 months, 3 years, and 5 years) still exceeded 3000. Children excluded for missing data were more likely to be from non-Dutch families with lower levels of maternal education and family income. Nevertheless, our final samples were still highly diverse. Our follow-up information about ASD status was also somewhat incomplete, although the multiple-gating procedure used by Generation R probably resulted in identifying most of the children with ASD in the sample.

Despite these limitations, our study had important strengths, including use of a large population-based diverse sample, modest attrition from 18 months to 5 years, and repeated measurements of ASD-like problems with the same instrument. An additional strength is that we used multiple approaches to examine prevalence and stability of ASD-like symptoms, conducting analyses at the item, subscale (SCI/RRB), and scale (12-item ASD) levels. Our design allowed us to obtain a detailed picture of the base rate of DSM-ASD scale items at three ages, as well as important information about the structural and longitudinal invariance of the measurement model.

Clinical implications

The CBCL/1½–5 is an extensively researched instrument that is widely used in clinical settings serving young children because it does not require a professional to administer, can be completed and scored quickly (including online), has a parallel form that can be completed by teachers or caregivers, spans the entire preschool age range, and provides information about a wide range of behavioral/emotional problems (e.g. attention problems, aggressive/oppositional behavior, anxiety, and sleep problems). Although the CBCL/1½–5 was not designed as an ASD screener, several studies have shown moderate-to-strong associations between the CBCL’s PDP/ASD scale and ASD diagnoses (Muratori et al., 2011; Myers et al., 2014; Narzisi et al., 2013; Rescorla et al., 2015, 2017; Sikora et al., 2008). All of these factors support use of the CBCL/1½–5 as a screening tool for ASD in general population settings, such as pediatric practices.

Consistent with the greater prevalence of ASD in boys, we found that male gender was one of the few significant demographic predictors of ASD score in our study. Interestingly, we found significantly larger variance in mean item ratings in boys than in girls, a trend that increased with age (i.e. in 4, 7, and 12 items across ages). Gender differences in mean item ratings also increased with age, with boys having higher ratings than girls on 7 of the 12 items by age 5. Although gender effects were consistently small in our repeated-measures ANOVAs, all three analyses showed that boys had significantly higher ASD, SCI, and RRB scale scores than girls. For practitioners using the CBCL/1½–5 in clinical settings, the means and SDs in Table 5 for the ASD scale and its SCI and RRB subscales, based on the complete sample at each age point, can serve as useful supplements to the electronic scoring program that generates T scores, percentiles, and clinical-borderline-normal range cutpoints for all scales, including the DSM-ASD scale.

Our base rate analyses showed that items 25. Doesn’t get along with other children, 67. Seems unresponsive to affection, 70. Shows little affection toward people, 80. Strange behavior, and 98. Withdrawn were endorsed as very true or often true for very few children at each age. These same items also had the highest loadings on the ASD latent factor and seem to capture essential features of ASD. High scores on these items may therefore be particularly important for identifying ASD. Item 63. Repeatedly rocks head or body was somewhat common at 18 months and rare at 3 and 5 years, suggesting that its link to ASD may increase with age, as more TD children discontinue this behavior. The items with consistently high base rates (e.g. 21. Disturbed by any change in routine and 23. Doesn’t answer when people talk to him/her) may be less specific to ASD than these rarer items and more characteristic of maladjusted young children in general.

Our CFAs showed that the 12-item scale, as well as its two subscales, has strong measurement properties, namely, configural, metric, and scalar longitudinal MI. These findings support use of the ASD scale and its SCI and RRB subscales for operationalizing the extent to which children assessed with the CBCL/1½–5 have elevated scores on two sets of problems reflecting the two major symptom groupings for ASD. Additionally, our base rate results suggest that a child whose elevated score on the DSM-ASD scale derives mainly from the items with relatively higher base rates in population samples may be less likely to meet ASD diagnostic criteria than a child who also has high ratings on most of the items with lower base rates.
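
The point about where an elevated score comes from can be made concrete with a small sketch that splits a child's 12-item score into the part contributed by the five low-base-rate items (25, 67, 70, 80, and 98) and the part contributed by the remaining items. This is purely illustrative and is not a validated decision rule or part of the ASEBA scoring software; the function name, the dictionary format, and the two example children are hypothetical, and the code assumes the 0/1/2 item ratings with unrated items simply omitted.

```python
# Item numbers of the 12-item ASD scale, as listed in the Results.
ASD_ITEMS = frozenset({4, 7, 21, 23, 25, 63, 67, 70, 76, 80, 92, 98})
# The five items with the lowest base rates and highest mean loadings.
LOW_BASE_RATE_ITEMS = frozenset({25, 67, 70, 80, 98})


def split_asd_score(ratings):
    """Split a 12-item ASD scale score by item type.

    ratings: dict mapping item number -> rating (0, 1, or 2).
    Hypothetical helper for illustration; not a clinical cutoff rule.
    """
    unknown = set(ratings) - ASD_ITEMS
    if unknown:
        raise ValueError(f"not ASD scale items: {sorted(unknown)}")
    if any(value not in (0, 1, 2) for value in ratings.values()):
        raise ValueError("ratings must be 0, 1, or 2")
    total = sum(ratings.values())
    low = sum(v for item, v in ratings.items() if item in LOW_BASE_RATE_ITEMS)
    return {"total": total, "low_base_rate": low, "higher_base_rate": total - low}


# Two hypothetical children with the same total score of 6.
child_a = split_asd_score({21: 2, 23: 2, 4: 1, 7: 1})
child_b = split_asd_score({25: 2, 67: 1, 70: 1, 98: 2})
print(child_a)
print(child_b)
```

Both hypothetical children have the same total, but only the second child's score comes from the rarer items, which is the distinction the paragraph above draws.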

Table 5.  CBCL/1½–5 DSM-ASD mean (SD) scale scores by gender at 18 months, 3 years, and 5 years.

18 months 3 years 5 years

  Boys (n = 2305) Girls (n = 2390) Boys (n = 2275) Girls (n = 2296) Boys (n = 2885) Girls (n = 2867)
ASD scale scores 1.62 (1.93) 1.44 (1.78) 1.79 (2.10) 1.53 (1.82) 2.04 (2.47) 1.49 (1.87)
SCI scores 0.79 (1.20) 0.67 (1.07) 0.94 (1.31) 0.79 (1.12) 1.22 (1.49) 0.92 (1.20)
RRB scores 0.83 (1.11) 0.77 (1.05) 0.85 (1.16) 0.74 (1.04) 0.82 (1.29) 0.58 (0.99)

CBCL: Child Behavior Checklist; DSM: Diagnostic and Statistical Manual of Mental Disorders; ASD: autism spectrum disorder; SCI: social
communication and interaction; RRB: restricted interests and repetitive behaviors.
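
For readers who want to express the gender differences in Table 5 as standardized effect sizes, the following sketch computes Cohen's d for the 12-item ASD scale at each age from the tabled means, SDs, and ns. It is an illustrative calculation, not an analysis reported by the authors; it assumes the pooled-SD form of Cohen's d, and the function and variable names are hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# ASD scale scores from Table 5: (mean, SD, n) for boys, then girls.
asd_scale = {
    "18 months": ((1.62, 1.93, 2305), (1.44, 1.78, 2390)),
    "3 years": ((1.79, 2.10, 2275), (1.53, 1.82, 2296)),
    "5 years": ((2.04, 2.47, 2885), (1.49, 1.87, 2867)),
}

effect_sizes = {age: cohens_d(*boys, *girls) for age, (boys, girls) in asd_scale.items()}

for age, d in effect_sizes.items():
    print(f"{age}: d = {d:.2f}")
```

With these inputs d is roughly 0.10, 0.13, and 0.25 at the three ages, that is, small by Cohen's (1988) benchmarks and increasing with age, in line with the small gender effects described in the text.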

The CBCL/1½–5 is normed up to age 6, and therefore, clinicians can use its ASD scale as a screener for children up to their sixth birthday. The CBCL/6–18 does not have a DSM-oriented ASD scale. A few articles (Ooi et al., 2011; So et al., 2013) have created ad hoc scales to capture ASD that include a few of the same items (e.g. Withdrawn and Strange behavior) as the CBCL/1½–5 ASD scale, but these ad hoc scales are not currently included in the CBCL/6–18 scoring software and so are less practical for clinical use.

Future directions

Because information about the base rates and longitudinal stability of the ASD scale’s 12 items has been hitherto lacking, as has evidence about the scale’s measurement model, our data, obtained from a very large and diverse population sample, begin to address these issues. However, our findings on CBCL/1½–5 ASD item base rates and the measurement model associated with these items need replication in other samples. For example, it would be good to see if the findings replicate in other large population samples, such as the 24 international samples for which Rescorla et al. (2011) reported CBCL/1½–5 results. It will also be important to test replicability of the findings in large samples with well-defined groups of young children diagnosed with ASD, other developmental or psychiatric disorders, or typical development, in order to determine whether the five items highlighted in our results in fact are the best discriminators of diagnosed ASD.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The general design of the Generation R Study is supported by the Erasmus Medical Center Rotterdam, the Erasmus University Rotterdam, the Netherlands Organization for Health Research and Development (ZonMw “Geestkracht” programme 10.000.1003), the Netherlands Organization for Scientific Research, and the Ministry of Health, Welfare and Sport. The Generation R Study is conducted by the Erasmus Medical Center in close collaboration with the School of Law and the Faculty of Social Sciences of the Erasmus University Rotterdam, the Municipal Health Service Rotterdam area, the Rotterdam Homecare Foundation, and the Stichting Trombosedienst and Artsenlaboratorium Rijnmond. The second author was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD).

References

Achenbach TM (2014) DSM-Oriented Guide for the Achenbach System of Empirically Based Assessment (ASEBA). Burlington, VT: Vermont Center for Children, Youth and Families, The University of Vermont.
Achenbach TM and Rescorla LA (2000) Manual for ASEBA Preschool Forms & Profiles. Burlington, VT: Vermont Center for Children, Youth and Families, The University of Vermont.
American Psychiatric Association (APA) (1994) Diagnostic and Statistical Manual of Mental Disorders. 4th ed. Washington, DC: APA.
American Psychiatric Association (APA) (2013) Diagnostic and Statistical Manual of Mental Disorders. 5th ed. Washington, DC: APA.
Berument SK, Rutter M, Lord C, et al. (1999) Autism screening questionnaire: diagnostic validity. The British Journal of Psychiatry 175: 444–451.
Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Academic Press.
Constantino JN and Todd RD (2003) Autistic traits in the general population: a twin study. Archives of General Psychiatry 60: 524–530.
Ghassabian A, Rescorla L, Henrichs J, et al. (2013) Early lexical development and risk of verbal and nonverbal cognitive delay at school age. Acta Paediatrica 103: 70–80.
Happé F, Ronald A and Plomin R (2006) Time to give up on a single explanation for autism. Nature Neuroscience 9: 1218–1220.
Havdahl KA, von Tetzchner S, Huerta M, et al. (2016) Utility of the child behavior checklist as a screener for autism spectrum disorder. Autism Research 9(1): 33–42.
Hu L and Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling 6: 1–55.
Jaddoe VW, van Duijn CM, van der Heijden AJ, et al. (2010) The Generation R Study: design and cohort update 2010. European Journal of Epidemiology 25: 823–841.
Kooijman MN, Kruithof C, van Duijn CM, et al. (2016) The Generation R Study: design and cohort update 2017. European Journal of Epidemiology 31(12): 1243–1264.
Mandy WP and Skuse DH (2008) Research review: what is the association between the social-communication element of autism and repetitive interests, behaviours and activities? Journal of Child Psychology and Psychiatry 49: 795–808.
Muratori F, Narzisi A, Tancredi R, et al. (2011) The CBCL/1.5–5 in a sample of children with autism spectrum disorders. Epidemiology and Psychiatric Sciences 20: 329–338.
Muthén LK and Muthén BO (2015) Mplus User’s Guide. 7th ed. Los Angeles, CA: Muthén & Muthén.
Myers CL, Gross AD and McReynolds BM (2014) Broadband behavior rating scales as screeners for autism? Journal of Autism and Developmental Disorders 44: 1403–1413.
Narzisi A, Calderoni S, Maestro S, et al. (2013) Child Behavior Checklist 1½–5 as a tool to identify toddlers with autism spectrum disorders: a case-control study. Research in Developmental Disabilities 34: 1179–1189.
Norris M and Lecavalier L (2010) Screening accuracy of Level 2 autism spectrum disorder rating scales: a review of selected instruments. Autism 14(4): 263–284.
Ooi YP, Rescorla L, Ang RP, et al. (2011) Identification of autistic spectrum disorders using the Child Behavior Checklist in Singapore. Journal of Autism and Developmental Disorders 41: 1147–1156.
Rescorla LA, Achenbach TM, Ivanova MY, et al. (2011) International comparisons of behavioral and emotional problems in preschool children: parents’ reports from 24 societies. Journal of Clinical Child and Adolescent Psychology 40: 456–467.
Rescorla LA, Kim YA and Oh KJ (2015) Screening for ASD with the Korean CBCL/1½–5. Journal of Autism and Developmental Disorders 45: 4039–4050.
Rescorla LA, Winder-Patel BM, Paterson SJ, et al. (2017) ASD screening with the CBCL/1½–5: findings for young children at high risk for autism spectrum disorder. Autism. Epub ahead of print 20 September 2017. DOI: 10.1177/1362361317718482.
Robinson EB, Munir K, Munafo MR, et al. (2011) Stability of autistic traits in the general population: further evidence for a continuum of impairment. Journal of the American Academy of Child and Adolescent Psychiatry 50: 376–384.
Serdarevic F, Ghassabian A, van Batenburg-Eddes T, et al. (2017) Infant muscle tone and childhood autistic traits: a longitudinal study in the general population. Autism Research 10: 757–768.
Sikora DM, Hall TA, Hartley SL, et al. (2008) Does parent report of behavior differ across ADOS-G classifications: analysis of scores from the CBCL and GARS. Journal of Autism and Developmental Disorders 38: 440–448.
So P, Greaves-Lord K, Van der Ende J, et al. (2013) Using the Child Behavior Checklist and the Teacher’s Report Form for identification of children with autism spectrum disorders. Autism 17: 595–607.
Tick N, van der Ende J, Koot H, et al. (2007) 14-year changes in emotional and behavioral problems of very young Dutch children. Journal of the American Academy of Child and Adolescent Psychiatry 46: 1333–1340.
Tiemeier H, Velders FP, Szekely E, et al. (2012) The Generation R Study: a review of design, findings to date, and a study of the 5-HTTLPR by environmental interaction from fetal life onward. Journal of the American Academy of Child and Adolescent Psychiatry 51(11): 1119–1135.
Whitehouse AJ, Hickey M and Ronald A (2011) Are autistic traits in the general population stable across development? PLoS ONE 6: e23029.
Yu CY and Muthén BO (2002) Evaluation of model fit indices for latent variable models with categorical and continuous outcomes (technical report). Los Angeles, CA: Graduate School of Education and Information Studies, UCLA.
