Assessing Developmental Delay in Early Childhood - Concerns With The Bayley-III Scales
To cite this article: Peter J. Anderson & Alice Burnett (2016): Assessing developmental delay
in early childhood — concerns with the Bayley-III scales, The Clinical Neuropsychologist, DOI:
10.1080/13854046.2016.1216518
The Clinical Neuropsychologist, 2016
http://dx.doi.org/10.1080/13854046.2016.1216518
Early detection of children with developmental delay is of utmost importance because inter-
vening early can prevent or reduce later cognitive, behavioral, educational, and social prob-
lems (Doyle, Harmon, Heckman, & Tremblay, 2009; Spittle, Orton, Anderson, Boyd, & Doyle,
2015). However, enormous variability is a feature of early cognitive, language, motor and
behavioral development, making it challenging for clinicians to diagnose developmental
delay. Screening questionnaires such as the Ages and Stages Questionnaire (ASQ-3)
(Squires & Bricker, 2009) are sometimes used, but standardized clinician-administered
instruments remain the gold standard approach for assessing developmental status.
Developmental surveillance programs are now common in clinical and research settings
to determine: (1) an individual’s eligibility for intervention services; (2) type of support
required for individual children; (3) need for ongoing surveillance; (4) an individual’s progress
while in an intervention program; (5) the outcome of clinical trials; and (6) the safety of
pharmacological interventions. These surveillance programs tend to focus on high-risk
infants, such as those with congenital abnormalities (e.g. neurofibromatosis, Prader–Willi
syndrome, congenital diaphragmatic hernia), those who experienced neonatal complications (e.g.
very preterm birth, hypoxic–ischemic encephalopathy, stroke), those exposed to neurotoxins
(e.g. fetal alcohol syndrome, neonatal abstinence syndrome), and those from socially disadvantaged
environments. Given the increased rates of developmental problems and long-term
neurobehavioral problems in these clinical populations, close monitoring of their development
is justified so that issues can be detected and managed early before they become severe
and entrenched.
In this article, we review the Bayley Scales of Infant and Toddler Development, Third
Edition (Bayley-III), which is the most widely used standardized measure of early develop-
ment for both clinical and research purposes. We begin with a description of the Bayley-III,
followed by a critical narrative review of an emerging literature that suggests that the Bayley-
III overestimates development, and as such under-identifies developmental delay (Aylward,
2013). Strategies that have been proposed for dealing with inflated Bayley-III scores will be
presented, and the capacity of the Bayley-III to predict later cognitive and motor problems
will be discussed.
lacked clinical specificity (Moore, Johnson, Haider, Hennessy, & Marlow, 2012). For example,
the MDI scale did not differentiate children with selective cognitive delay from those with
language delay, while the PDI did not differentiate children with selective fine motor delay
from those with gross motor delay.
The most substantial update in test structure came with the next revision and restand-
ardization: the Bayley Scales of Infant and Toddler Development, Third Edition, or Bayley-III
(Bayley, 2006b). This edition saw the creation of five distinct scales to better align with gov-
ernment guidelines regarding early childhood assessment, with the Bayley-III including
Cognitive (91 items), Language (97 items), and Motor scales (138 items), and caregiver ratings
of Social-Emotional (35 items) and Adaptive Behavior (241 items). A crucial change from the
BSID-II to Bayley-III was the separation of assessment of cognitive, expressive language (48
items), and receptive language (49 items) skills, as well as the separation of fine (66 items)
and gross (72 items) motor tasks, into subtests with explicit normative data. This structural
change is thought to enhance the clinical utility of the Bayley Scales, as a more detailed
assessment of strengths and weaknesses is possible, enabling more targeted interventions
to be prescribed. The Social-Emotional scale is based on the Greenspan Social-Emotional
Growth Chart (Greenspan, 2004) and is completed by the child’s caregiver to assess emotional
development and behavior. The Adaptive Behavior scale, also to be completed by caregivers,
is the second edition of the Adaptive Behavior Assessment System (ABAS-II) (Harrison &
Oakland, 2003). This scale estimates the child’s functioning in a wide range of adaptive skills.
The downsides of these structural changes to the Bayley-III are that it takes longer to
administer (up to 90 min) and that it compromises the capacity for clinicians and researchers to compare
Bayley-III scores with those from earlier editions. In most countries, the US norms are used
to interpret performance, although some local standardizations have been conducted (e.g.
in the Netherlands and Germany) (Steenis, Verhoeven, Hessen, & van Baar, 2015).
average interval between testing of 6 days (2–15 days). Average stability coefficients were
acceptable (Cognitive – .79, Language – .78, Motor – .81), but higher in older children. In
terms of validity, the Cognitive and Language scales correlated moderately to strongly with
the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III; .79–.82) for 57 children
aged 28–42 months, while the Motor scale correlated moderately with the Peabody
Developmental Motor Scales (PDMS-2; .49–.57). For 102 children, the BSID-II and Bayley-III
were administered in a counterbalanced order with a mean interval of 6 days. The BSID-II
MDI correlated more strongly with the Bayley-III Language scale (.71) than the Cognitive
scale (.60), while the BSID-II PDI and Bayley-III Motor composite correlated moderately (.60).
Of importance, the Bayley-III composite scores were approximately 7 points higher than the
respective BSID-II scales (Bayley, 2006a). This is contrary to expectations, as scale scores
usually decline when tests are revised and re-standardized because of the gradual upward drift in
performance (the Flynn effect) commonly observed on developmental/intelligence tests (Aylward & Aylward, 2011; Flynn, 1999).
As development is a dynamic process with children maturing specific skills at different
rates, only moderate long-term stability of developmental status might be expected. To our
knowledge, this issue has been addressed only in clinical and mixed clinical/non-clinical
samples among English-speaking children. A recent report examined performance on the
Bayley-III at 8 and 20 months of age in a sample of 131 preterm children (Greene, Patra,
Silvestri, & Nelson, 2013). The mean Cognitive and Language composite scores were approx-
imately 6 points lower at the 20-month assessment than at the 8-month assessment, while
the Motor composite remained stable. The correlation coefficients for the Cognitive and
Motor composites at the two time points were moderately strong (.57 and .53, respectively),
while the association of the Language composite at the 8- and 20-month assessments was
only fair (.36). The rate of impairment (> 2 SDs below the normative mean) was low at 8 and
20 months for all scales, although cognitive delay increased from 1.5 to 6.9%, language delay
increased from 3.6 to 21.4%, while the rate of motor delay remained stable. A more ambitious
study explored the stability of developmental delay at 7 time points (3, 4, 6, 9, 12, 18, and
24 months of age) in a mixed sample of preterm and term children (n = 54), in which delay
was classified as a scale score more than 1.5 SDs below the normative mean (Lobo, Paul,
Mackley, Maher, & Galloway, 2014). Children were categorized as exhibiting stable develop-
ment if they remained in the same classification (delayed or not) at all 7 time points, cate-
gorized as relatively stable if they changed classification once across the time points, or
categorized as unstable if they changed classification more than once. The proportion of
children with a stable developmental pattern was relatively low, ranging from 17% for recep-
tive language to 65% for fine motor. Only a small number of children were categorized as
exhibiting a relatively stable developmental pattern, but the proportion of children with
unstable profiles was considerable, ranging from 28% for expressive language to 67% for
receptive language.
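The three-way classification used by Lobo et al. can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that "changing classification" means a flip between delayed and not delayed at consecutive assessments, and the function name and example trajectories are hypothetical rather than taken from the study.

```python
from typing import Sequence


def classify_stability(delayed: Sequence[bool]) -> str:
    """Classify a trajectory of delayed / not-delayed flags, one per assessment.

    Assumption: a "change" is a flip in classification between two
    consecutive assessments. Zero flips is stable, one flip is relatively
    stable, and more than one flip is unstable.
    """
    changes = sum(1 for prev, curr in zip(delayed, delayed[1:]) if prev != curr)
    if changes == 0:
        return "stable"
    if changes == 1:
        return "relatively stable"
    return "unstable"


# Hypothetical trajectories across the 7 assessment ages (3 to 24 months).
never_delayed = [False] * 7
late_onset = [False, False, False, False, True, True, True]
fluctuating = [False, True, False, False, True, False, False]

print(classify_stability(never_delayed))  # stable
print(classify_stability(late_onset))     # relatively stable
print(classify_stability(fluctuating))    # unstable
```

Under this reading, a child who is delayed at a single visit and not delayed before and after it already counts as unstable, because the classification flips twice.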
the normative mean. While these higher-than-expected scores could reflect geographic
variability or sampling characteristics, anecdotal reports from clinicians using the Bayley-III
supported these findings. Since this initial publication, there have been numerous reports
supporting this concern with the Bayley-III. For example, some studies have reported marked
differences between BSID-II and Bayley-III scores (Moore et al., 2012; Silveira, Filipouski,
Goldstein, O’Shea, & Procianoy, 2012; Vohr et al., 2012), with a significantly lower rate of
developmental delay on the more recent edition.
The National Institute of Child Health and Human Development’s Neonatal Research
Network administered the BSID-II to 1012 extremely low birth-weight (ELBW) children aged
18–22 months born between 2006 and 2007 (period 1) and the Bayley-III to 1616 ELBW
children born between 2008 and 2011 (period 2) (Vohr et al., 2012). The mean Cognitive and
Language composite scores for the children in period 2 were 11 and 7 points higher, respec-
tively, than the mean MDI score for children in period 1, while the mean Motor score was 6
points higher in period 2. The rate of impairment (standard scores less than 70) was signifi-
cantly lower in period 2 (Cognitive – 10%, Language – 19%, Motor – 14%) than in period 1
(MDI – 37%, PDI – 27%). This difference in Bayley scores between periods 1 and 2 may be
explained by demographic differences or improved outcome between these two cohorts
rather than the test itself. However, other studies have found similar results when the BSID-II
and Bayley-III have been administered to the same sample (Moore et al., 2012; Silveira et al.,
2012). For example, the EPICure-2 cohort (children born less than 27 weeks of gestation in
England in 2006) were assessed on the same day using the Cognitive and Language items
from the Bayley-III and MDI items from the BSID-II at a median age of 33 months corrected
for prematurity (Moore et al., 2012). While the mean Bayley-III Cognitive scale was only 3
points higher than the mean BSID-II MDI score, the mean Bayley-III Language scale was 10
points higher than the mean MDI and the rate of impairment dropped from 25% with the
BSID-II to 14% with the Bayley-III. This study also found that the mean MDI score was less
correlated with an averaged Cognitive-Language score from the Bayley-III at lower scores
than at higher scores, a concerning finding given it is children scoring lower who most require
intervention. The limitations of the EPICure-2 study were that the BSID-II was not adminis-
tered according to standardized procedures, with the BSID-II-only items administered at the
end when attention and compliance are most challenged, and the Motor scales were not
administered. In contrast, a smaller Brazilian study administered the BSID-II and Bayley-III
on separate occasions (within 2 months) to 60 very preterm children, with mean Cognitive
and Language scales of the Bayley-III being 8–9 points higher than the MDI and the Motor
scale of the Bayley-III being 14 points higher than the PDI (Silveira et al., 2012). As the Bayley-
III was always administered after the BSID-II, it is possible that at least some of this difference
reflects learning effects given the marked overlap of items of the two editions. Similar con-
cerns with the Bayley-III have been reported in other clinical populations including infants
following neonatal encephalopathy and therapeutic hypothermia (Jary, Whitelaw, Walløe,
& Thoresen, 2013) and infants requiring complex cardiac surgery (Acton et al., 2011).
There are two explanations for the Bayley-III scores being significantly higher than BSID-II
scores. Most have assumed that the Bayley-III overestimates development (i.e. scores are
inflated), but it is equally possible that the BSID-II under-estimates development (i.e. scores
are decreased). Supporting the former explanation, the reported rate of impairment on
BSID-II assessments has generally been in line with expectations based on the normal dis-
tribution and previous literature, while the reported rate of impairment on Bayley-III
assessments has been well below expectations (Anderson et al., 2010; Moore et al., 2012;
Silveira et al., 2012; Vohr et al., 2012). Even so, the studies contrasting the Bayley-III and BSID-II
have been predominantly with preterm and other high-risk populations, and the true rate
of developmental delay in these clinical disorders is not known. To confirm speculation that
the Bayley-III scores are inflated, evidence with representative samples of typically develop-
ing children is needed. In a large cohort (n = 202) of healthy full-term / normal birth-weight
children at 24 months of age with social demographic characteristics approximately repre-
sentative of the Australian community, the Bayley-III composite scores were well above the
normative mean (Cognitive – 109, Language – 108, Motor – 118) (Anderson et al., 2010). In
a representative cohort, it would be expected that the rate of mild-to-severe delay (com-
posite scores < 85) would be approximately 16–17%; however in this Australian cohort, the
rate of even mild delay was only 1, 4 and 2% for the Cognitive, Language and Motor com-
posites, respectively. Similarly, in a large Swedish cohort of term children randomly selected
from the Swedish Medical Registry (n = 366), the mean Cognitive (104), Language (109), and
Motor (107) composites were markedly higher than the normative mean. Furthermore, in a
large Dutch sample, the difference between test scores using local and US norms fluctuated
across domains and age (Steenis et al., 2015).
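The expected rates referred to above follow from the normal distribution. The sketch below assumes that composite scores are normally distributed with a mean of 100 and an SD of 15, so that 85 and 70 sit 1 and 2 SDs below the mean; the function name is illustrative.

```python
from statistics import NormalDist


def expected_rate_below(cutoff: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Proportion of a normal population expected to score below `cutoff`.

    Assumption: scores are normally distributed with the given mean and SD.
    """
    return NormalDist(mu=mean, sigma=sd).cdf(cutoff)


print(f"{expected_rate_below(85):.1%}")  # 15.9%
print(f"{expected_rate_below(70):.1%}")  # 2.3%
```

About 16% of a representative sample would therefore be expected to score below 85, compared with the 1, 4 and 2% observed in the Australian cohort.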
Based on this body of research, we believe there is sufficient evidence to argue that the
Bayley-III overestimates development, and as a consequence, under-reports developmental
delay. The degree to which the Bayley-III overestimates developmental status may vary across
different age bands; however, there is at least some evidence that it occurs in the first (Reuner,
Fields, Wittke, Löpprich, & Pietz, 2013), second (Acton et al., 2011; Anderson et al., 2010; Jary
et al., 2013; Silveira et al., 2012; Vohr et al., 2012), and third (Moore et al., 2012) years of life.
Further, the degree of overestimation may also vary across different levels of performance
(Moore et al., 2012). The mixed sampling procedure used for the standardization of the
Bayley-III is the most likely reason for the overestimation, as this approach tends to lower
group means, increase SD, and as a result, decrease the capacity to detect developmental
delay (Pena et al., 2006).
with the best prediction for MDI < 70 being Bayley-III Cognitive and Language composite
scores < 85 or the average of Cognitive and Language composite scores < 80 in their
33-month-old sample (Johnson et al., 2014; Moore et al., 2012). While this practical approach
has merit for some research and clinical purposes, it (1) is based on the outdated BSID-II,
which may also under-identify delay, (2) reverts to a composite score that has limited
clinical utility, (3) is not applicable for classifying mild delay, which also has significant clinical
implications, and (4) is based on extremely preterm children within the age range of
27–48 months, and so needs replication with a large cohort of typically developing children.
Adjustment algorithms have also been proposed, which is a more direct approach to
correct for the inflated scores than raising the cut-offs for classifying delay. The EPICure-2
team generated an algorithm to convert Bayley-III Cognitive and Language composite scores
to an equivalent BSID-II MDI (Moore et al., 2012): predicted MDI = 88.8 − (61.6 × (Language
composite/100)^(−1)) + (0.67 × Cognitive composite). Other authors have also generated
regression algorithms for converting Bayley-III scores to BSID-II for preterm and term children
(Lowe, Erickson, Schrader, & Duncan, 2012) and survivors of neonatal encephalopathy (Jary
et al., 2013). Such conversion algorithms are useful if the goal is to convert Bayley-III scores
back to an outdated BSID-II score, but as noted previously, the clinical utility of the MDI has
been criticized. In addition, the published algorithms differ for specific clinical populations,
indicating that population-specific algorithms are likely required. Also, conversion algorithms
to date have applied data from rather narrow age bands and it is likely that age-specific
algorithms are also necessary.
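As an illustration, the EPICure-2 conversion can be written as a short function. The sketch assumes that the −1 in the published formula is an exponent, so that the Language term enters as a reciprocal; the function name and the example scores are hypothetical.

```python
def predicted_mdi(cognitive: float, language: float) -> float:
    """Predicted BSID-II MDI from Bayley-III Cognitive and Language composites.

    Assumption: the Language term is (language / 100) raised to the power -1,
    i.e. predicted MDI = 88.8 - 61.6 * (language / 100)**-1 + 0.67 * cognitive.
    """
    if language <= 0:
        raise ValueError("language composite must be positive")
    return 88.8 - 61.6 / (language / 100) + 0.67 * cognitive


# Hypothetical child scoring exactly 100 on both Bayley-III composites.
print(round(predicted_mdi(cognitive=100, language=100), 1))  # 94.2
```

On this reading, average Bayley-III composites of 100 correspond to a predicted MDI of about 94. The coefficients come from extremely preterm children assessed at around 33 months, so the function should not be assumed to hold for other populations or ages.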
Another approach is to use developmental age equivalents to generate a developmental
quotient (DQ) rather than the standardized composite/scale scores (Milne, McDonald, &
Comino, 2012). The formula for generating DQ is: (developmental age / actual age) × 100.
Milne et al. argue that this provides an estimate of the rate of development for individual
children relative to the standardization sample. In a sample of children referred for evaluation
for developmental delay/disability (n = 122, average age 35 months), DQ scores were signif-
icantly lower than standardized Cognitive, Language and Motor composite scores, with a
corresponding increase in the proportion of children classified as delayed, especially
moderately to profoundly delayed (DQ: 18.1%; Composite: 7.4%). A criticism of this approach is
that it is less precise as it is based on the premise that SDs are equivalent for all ages, but
this is not the case (Aylward, 2013).
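The DQ calculation is simple enough to show directly. In this minimal sketch both ages are assumed to be expressed in months, and the function name and example values are illustrative only.

```python
def developmental_quotient(developmental_age: float, actual_age: float) -> float:
    """DQ = (developmental age / actual age) x 100.

    Assumption: both ages are given in the same unit (months here).
    """
    if actual_age <= 0:
        raise ValueError("actual age must be positive")
    return developmental_age * 100 / actual_age


# Hypothetical child aged 30 months with an age equivalent of 24 months.
print(developmental_quotient(24, 30))  # 80.0
```

A DQ of 80 means that the child is functioning at 80% of the level expected for his or her age. Unlike a standardized score, the DQ does not indicate how unusual that gap is, because the spread of age equivalents differs across ages.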
In some regions, large cohorts of typically developing children have been assessed on
the Bayley-III for screening purposes, or as participants in research projects. As long as the
cohort is approximately representative of the general population, the distribution charac-
teristics of this group could be used as a guide for interpreting test performance. For example,
some Australian clinicians and researchers are using published data of a Melbourne control
group, in which mean composite scores were between 8 and 18 points higher than the
normative mean of 100 (Anderson et al., 2010), to determine developmental status, and
when appropriate, the severity of delay. For those who can access these data, caution is
needed as these cohorts have usually been assessed within a narrow age band and may not
be translatable to children in younger and older age bands.
As the distribution of the normative sample appears to have shifted to the right (i.e.
upwards) by approximately 7 points, a simplistic approach may be to subtract 7 points from
the composite scores, or alternatively adjust the classification cut-offs by 7 points. However,
the magnitude of the inflation differs across studies and across developmental domains
with the overestimation highest in the Motor and Language domains. Further, while the
mean inflation rate may be approximately 7 points, this is likely to vary across the distribution
of the composite scores. For example, the scores at the ends of the distribution may be more
inflated, and therefore require greater adjustment, than scores in the middle of the
distribution.
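The uniform adjustment described above can be sketched as follows. The cut-offs of 85 and 70 are the conventional thresholds for mild and moderate-to-severe delay; the function name, the band labels and the example scores are illustrative, and the flat 7-point correction is exactly the simplification that the preceding caveats warn against.

```python
MILD_CUTOFF = 85      # composite scores below this indicate at least mild delay
MODERATE_CUTOFF = 70  # composite scores below this indicate moderate-to-severe delay


def classify_delay(composite: float, inflation: float = 0.0) -> str:
    """Classify a composite score after subtracting an assumed inflation.

    Assumption: the inflation is the same at every score level and in every
    domain, which the studies reviewed here suggest is not the case.
    """
    adjusted = composite - inflation
    if adjusted < MODERATE_CUTOFF:
        return "moderate-to-severe delay"
    if adjusted < MILD_CUTOFF:
        return "mild delay"
    return "no delay"


print(classify_delay(90))               # no delay
print(classify_delay(90, inflation=7))  # mild delay
print(classify_delay(75, inflation=7))  # moderate-to-severe delay
```

Subtracting 7 points from each score is equivalent to leaving the scores alone and raising the cut-offs to 92 and 77.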
Ultimately, new or revised standardization data for the Bayley Scales are required. The
easiest solution would be to republish the norms excluding the 10% of children from high-
risk clinical populations, as this is the likely cause for the inflated scores, and is consistent
with the approach used for the BSID-II for which the norms were considered relatively accu-
rate. New standardization is unlikely given the Bayley-III was first published in 2006, although
a new edition is likely in the near future as tests are now generally revised and re-standardized
every 10–15 years. Clinicians and researchers can also consider alternative measures for
assessing early developmental status. While there are a few options available, none have
been reviewed or scrutinized as closely as the Bayley Scales, and they may have similar or
different measurement problems.
Based on pooled data from 14 studies (n = 1330 children), the correlation between MDI and
later cognitive functioning was .61 (95% confidence interval (CI): .57–.64), explaining 37%
of the variance. The correlation between PDI and later motor functioning was lower (r = .34,
95%CI: .26–.42), and explained only 12% of the variance.
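The proportions of variance explained are the squares of the pooled correlations, as this short check shows (the function name is illustrative).

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in the outcome accounted for by a correlation r."""
    return r ** 2


print(f"{variance_explained(0.61):.0%}")  # 37%
print(f"{variance_explained(0.34):.0%}")  # 12%
```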
Fewer studies have examined the sensitivity and specificity of the Bayley-III in relation to
later functioning. Bode and colleagues administered the Cognitive and Language scales of
the Bayley-III at 2 years of age and the WPPSI-III at 4 years of age in children born very preterm
as well as matched controls (Bode, D’Eugenio, Mettelman, & Gross, 2014). The Bayley-III scales
correlated highly with Full-Scale IQ at 4 years (Cognitive – .81, Language – .78), although this
varied according to gestational age at birth with the correlation being highest for those
born earlier. High sensitivity and specificity were reported, indicating that the Bayley-III is
predictive of preschool IQ; however, outcome classifications were determined according to
the distribution of the control group (M, SD) rather than the test norms of the Bayley-III and
WPPSI-III (Bode et al., 2014). To truly evaluate the predictive validity of the Bayley-III, delay/
impairment needs to be judged according to the test norms as this is how the test is designed
to be used.
Using a cohort of very preterm infants (n = 105), the Victorian Infant Brain Studies team
has reported a series of papers examining the capacity of the Bayley-III to predict later cog-
nitive and motor functioning using the test norms to classify impaired/delayed performance
(Spencer-Smith, Spittle, Lee, Doyle, & Anderson, 2015; Spittle et al., 2013). The Bayley-III was
administered at 2 years of age while the Movement Assessment Battery for Children (MABC-2)
(Henderson, Sugden, & Barnett, 2007) and the Differential Ability Scales (DAS-II) (Elliott, 2007)
were administered at 4 years of age to assess motor functioning and general intelligence.
While there was a strong correlation between the Bayley-III Motor composite and the MABC-2
percentile rank, the rate of impaired motor performance was significantly higher at 4 years
of age (Spittle et al., 2013). Based on the Bayley-III, 9% of the cohort were classified as mildly
to severely motor delayed and 4% as moderately to severely delayed, while on the MABC-2,
the rate of mild-to-severe impairment was 22%, with 19% exhibiting moderate-to-severe
motor impairment. The sensitivity of the Bayley-III to predict later motor impairment was
very low, but specificity was excellent. In other words, the children who the Bayley-III
identified as having motor delay at 2 years were very likely to have motor impairment at
4 years, but most of the children with a later motor impairment were not delayed on the
Bayley-III. Likewise, the Bayley-III Cognitive and Language composites correlated highly with
the General Conceptual Ability (GCA) scale of the DAS-II and achieved high specificity but
low levels of sensitivity for predicting later cognitive impairment (Spencer-Smith et al., 2015).
Thus, the Bayley-III tends to under-identify later cognitive and motor impairment.
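To make the pattern of low sensitivity and high specificity concrete, the sketch below computes both from a 2 × 2 table. The counts are invented purely for illustration and are not data from the studies cited above; the function name is likewise hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table.

    tp: delayed at 2 years and impaired at 4 years
    fn: not delayed at 2 years but impaired at 4 years
    fp: delayed at 2 years but not impaired at 4 years
    tn: not delayed at 2 years and not impaired at 4 years
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for 100 children (NOT taken from any cited study).
sens, spec = sensitivity_specificity(tp=5, fn=15, fp=4, tn=76)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 25%, specificity 95%
```

In this made-up example, three quarters of the children with a later impairment were not flagged at 2 years, even though very few children without a later impairment were flagged.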
Conclusion
The Bayley-III is the most widely used measure for assessing early developmental status,
specifically cognitive, language and motor delay. However, there is now considerable evi-
dence demonstrating that the Bayley-III overestimates development, resulting in a misclas-
sification of developmental delay. Concerningly, a significant proportion of children who
are performing age appropriately on the Bayley-III are actually delayed, and some children
may not be receiving important services, including early intervention, because they are
being misclassified by the Bayley-III. In terms of predicting later functioning, evidence to
date suggests that the Bayley-III tends to under-identify later cognitive and motor impair-
ments. While this is to be expected given the dynamic nature of development, we expect
the sensitivity of the Bayley-III to be poorer than earlier versions of the Bayley Scales. A
number of strategies have been proposed for dealing with the inflated scores, and while
each approach has merit for specific applications, none are ideal. We believe the best solu-
tions are (1) to publish new norms excluding the high-risk children added to the initial
standardization sample, (2) re-standardize the Bayley-III, or (3) devise a new edition of the
Bayley Scales with improved norms. In the meantime, in order to assist with interpreting
Bayley-III test scores, clinicians and researchers are encouraged to utilize Bayley-III data col-
lected on cohorts of typically developing children representative of their region, if available.
We do not recommend reverting to the BSID-II, or using algorithms to convert Bayley-III
scores to BSID-II scores, except when it is necessary to compare with earlier studies.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by Australia’s National Health and Medical Research Council [grant number
1081288].
References
Acton, B. V., Biggs, W. S., Creighton, D. E., Penner, K. A., Switzer, H. N., Thomas, J. H., … Robertson, C. M.
(2011). Overestimating neurodevelopment using the Bayley-III after early complex cardiac surgery.
Pediatrics, 128, e794–e800.
Anderson, P. J., De Luca, C. R., Hutchinson, E., Roberts, G., Doyle, L. W., & Victorian Infant Collaborative
Group. (2010). Underestimation of developmental delay by the new Bayley-III Scale. Archives of Pediatrics
& Adolescent Medicine, 164, 352–356.
Aylward, G. P. (2013). Continuing issues with the Bayley-III. Journal of Developmental and Behavioral
Pediatrics, 34, 697–701.
Aylward, G. P., & Aylward, B. S. (2011). The changing yardstick in measurement of cognitive abilities in
infancy. Journal of Developmental and Behavioral Pediatrics, 32, 465–468.
Bayley, N. (1969). Manual for the Bayley Scales of infant development. San Antonio, TX: The Psychological
Corporation.
Bayley, N. (1993). Bayley Scales of infant development, second edition: Manual. San Antonio, TX: The
Psychological Corporation.
Bayley, N. (2006a). Bayley Scales of infant and toddler development, third edition technical manual.
San Antonio, TX: Pearson PsychCorp.
Bayley, N. (2006b). Bayley Scales of infant and toddler development (3rd ed.). San Antonio, TX: Pearson
PsychCorp.
Bode, M. M., D’Eugenio, D. B., Mettelman, B. B., & Gross, S. J. (2014). Predictive validity of the Bayley, Third
Edition at 2 years for intelligence quotient at 4 years in preterm infants. Journal of Developmental
and Behavioral Pediatrics, 35, 570–575.
Doyle, O., Harmon, C. P., Heckman, J. J., & Tremblay, R. E. (2009). Investing in early human development:
Timing and economic efficiency. Economics & Human Biology, 7(1), 1–6.
Elliott, C. (2007). Differential ability scales-II (DAS-II). San Antonio, TX: Harcourt Assessment.
Flynn, J. R. (1999). Searching for justice – The discovery of IQ gains over time. American Psychologist,
54, 5–20.
Greene, M. M., Patra, K., Silvestri, J. M., & Nelson, M. N. (2013). Re-evaluating preterm infants with the
Bayley-III: Patterns and predictors of change. Research in Developmental Disabilities, 34, 2107–2117.
Greenspan, S. I. (2004). Greenspan social–emotional growth chart: A screening questionnaire for infants
and young children. San Antonio, TX: Harcourt Assessment.
Hack, M., Taylor, H. G., Drotar, D., Schluchter, M., Cartar, L., Wilson-Costello, D., & Morrow, M. (2005). Poor
predictive validity of the Bayley Scales of infant development for cognitive function of extremely
low birth weight children at school age. Pediatrics, 116, 333–341.
Harrison, P. L., & Oakland, T. (2003). Adaptive behavior assessment system–Second Edition. San Antonio,
TX: The Psychological Corporation.
Henderson, S., Sugden, D., & Barnett, A. (2007). The movement assessment battery for children (2nd ed.).
London: The Psychological Corporation.
Jary, S., Whitelaw, A., Walløe, L., & Thoresen, M. (2013). Comparison of Bayley-2 and Bayley-3 scores
at 18 months in term infants following neonatal encephalopathy and therapeutic hypothermia.
Developmental Medicine & Child Neurology, 55, 1053–1059.
Johnson, S., Moore, T., & Marlow, N. (2014). Using the Bayley-III to assess neurodevelopmental delay:
Which cut-off should be used? Pediatric Research, 75, 670–674.
Lobo, M. A., Paul, D. A., Mackley, A., Maher, J., & Galloway, J. C. (2014). Instability of delay classification and
determination of early intervention eligibility in the first two years of life. Research in Developmental
Disabilities, 35, 117–126.
Lowe, J. R., Erickson, S. J., Schrader, R., & Duncan, A. F. (2012). Comparison of the Bayley II mental
developmental index and the Bayley III cognitive scale: Are we measuring the same thing? Acta
Paediatrica, 101, e55–e58.
Luttikhuizen dos Santos, E. S., de Kieviet, J. F., Königs, M., van Elburg, R. M., & Oosterlaan, J. (2013).
Predictive value of the Bayley Scales of infant development on development of very preterm/very
low birth weight children: A meta-analysis. Early Human Development, 89, 487–496.
Milne, S., McDonald, J., & Comino, E. J. (2012). The use of the Bayley Scales of infant and toddler
development III with clinical populations: A preliminary exploration. Physical & Occupational Therapy
in Pediatrics, 32, 24–33.
Moore, T., Johnson, S., Haider, S., Hennessy, E., & Marlow, N. (2012). Relationship between test scores
using the second and third editions of the Bayley Scales in extremely preterm children. The Journal
of Pediatrics, 160, 553–558.
Pena, E. D., Spaulding, T. J., & Plante, E. (2006). The composition of normative groups and diagnostic
decision making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology,
15, 247–254.
Reuner, G., Fields, A. C., Wittke, A., Löpprich, M., & Pietz, J. (2013). Comparison of the developmental
tests Bayley-III and Bayley-II in 7-month-old infants born preterm. European Journal of Pediatrics,
172, 393–400.
Silveira, R. C., Filipouski, G. R., Goldstein, D. J., O’Shea, T. M., & Procianoy, R. S. (2012). Agreement between
Bayley Scales second and third edition assessments of very low-birth-weight infants. Archives of
Pediatrics & Adolescent Medicine, 166, 1075–1076.
Spencer-Smith, M. M., Spittle, A. J., Lee, K. J., Doyle, L. W., & Anderson, P. J. (2015). Bayley-III cognitive
and language scales in preterm children. Pediatrics, 135, e1258–e1265.
Spittle, A., Orton, J., Anderson, P. J., Boyd, R., & Doyle, L. W. (2015). Early developmental intervention
programmes provided post hospital discharge to prevent motor and cognitive impairment in preterm
infants. Cochrane Database of Systematic Reviews, 11, Art. No. CD005495. doi:10.1002/14651858.
CD005495.pub4
Spittle, A. J., Spencer-Smith, M. M., Eeles, A. L., Lee, K. J., Lorefice, L. E., Anderson, P. J., & Doyle, L. W.
(2013). Does the Bayley-III Motor Scale at 2 years predict motor outcome at 4 years in very preterm
children? Developmental Medicine and Child Neurology, 55, 448–452.
Squires, J., & Bricker, D. (2009). Ages & stages questionnaires, third edition (ASQ-3): A parent-completed
child-monitoring system. Baltimore, MD: Paul Brookes.
Steenis, L. J. P., Verhoeven, M., Hessen, D. J., & van Baar, A. L. (2015). Performance of Dutch Children on
the Bayley III: A Comparison Study of US and Dutch Norms. PLoS ONE, 10, e0132871. doi:10.1371/
journal.pone.0132871
Vohr, B. R., Stephens, B. E., Higgins, R. D., Bann, C. M., Hintz, S. R., Das, A., … Fuller, J. (2012). Are outcomes
of extremely preterm infants improving? Impact of Bayley assessment on outcomes. The Journal of
Pediatrics, 161, 222–228.