Bayley III Dutch Adaptation Valid for Assessing Development

Publisher: Routledge
European Journal of
Developmental Psychology
First steps in developing the Dutch version of the Bayley III:
Is the original Bayley III and its item sequence also adequate for
Dutch children?
Leonie J. P. Steenis (a), Marjolein Verhoeven (a), Dave J. Hessen (b)
& Anneloes L. van Baar (a)
(a) Child and Adolescent Studies, Utrecht University,
P.O. Box 80140, 3508 TC Utrecht, The Netherlands
(b) Department of Methodology & Statistics, Utrecht
University, P.O. Box 80140, 3508 TC Utrecht, The
Netherlands
Published online: 30 Jan 2014.
To cite this article: Leonie J. P. Steenis, Marjolein Verhoeven, Dave J. Hessen
& Anneloes L. van Baar (2014) First steps in developing the Dutch version of the
Bayley III: Is the original Bayley III and its item sequence also adequate for Dutch
children?, European Journal of Developmental Psychology, 11:4, 494-511, DOI:
10.1080/17405629.2013.869207
To link to this article: http://dx.doi.org/10.1080/17405629.2013.869207
The Bayley Scales of Infant and Toddler Development [Bayley III, Bayley, N.
(2006). The Bayley scales of infant and toddler development. San Antonio, TX: The
Psychological Corporation] is currently developed and exclusively normed for
American children. In this study, the appropriateness of the Bayley III item content
and item sequence for Dutch children was evaluated. The translated version, the
Bayley-III-NL, was evaluated in two phases. In phase 1 (N = 100), analyses showed
that overall the item content seemed to be appropriate for Dutch children.
In addition, the item sequence was found to be suitable for Dutch children. After
phase 1, small adaptations were made to the instructions of a few items based on the
experiences of the examiners to improve standardization in performance. In phase 2
(N = 400), the findings of phase 1 were confirmed in a larger sample. It is concluded
that Dutch norms can be based upon the current version of the Bayley-III-NL.
Keywords: Bayley III; Item sequence; Infant and toddler development;
Developmental assessment; Test adaptation.
Correspondence should be addressed to Leonie J. P. Steenis, Child and Adolescent Studies,
Utrecht University, P.O. Box 80140, 3508 TC Utrecht, The Netherlands. Email: l.j.p.steenis@uu.nl
© 2014 Taylor & Francis
The Bayley III (Bayley, 2006) is an internationally extensively used norm-referenced
assessment tool to measure the developmental abilities of young
children. It has been used for both research and clinical purposes in a growing
number of countries. However, there are indications that the Bayley III, which
was developed and normed for American children, may not be appropriate for
use in other, non-American, samples. In a study with a representative group of
healthy, term born one-year-old Australian children, different developmental
scores were found compared to standardized American norms (Walker, Badawi,
Halliday, & Laing, 2010). Children scored significantly higher on Cognition and
Receptive Communication, and significantly lower on Expressive Communication
and Gross Motor skills. No differences were found for Fine Motor skills.
In another Australian study, Bayley III scores for two-year-old extremely preterm
born children with low birth weight (EP/ELBW) and healthy term children were
compared using American norms (Anderson, De Luca, Hutchinson, Roberts, &
Doyle, 2010). Although all EP/ELBW children's Bayley III mean scores were
significantly lower than those of healthy children, the proportion of children with
a developmental delay in the EP/ELBW group was lower than expected. Also in a
Danish study, different mean scores were found for healthy children in
comparison to the American normative mean for all scales (Krogh, Væver,
Harder, & Køppe, 2012).
A limitation these studies have in common is that they use a test developed and
normed in the USA to interpret scores of their own national samples. Other
countries are likely to differ from the USA in the socio-economic, cultural and
ethnic background of their population (Walker et al., 2010). Therefore, American
test material and norms might be unsuitable for other countries. For instance,
within the UK, different developmental trajectories concerning motor
development were found for Black Caribbean, Black African, Pakistani,
Caucasian and Indian infants (Kelly, Sacker, Schoon, & Nazroo, 2006). More
specifically, the pace at which these groups acquired Gross Motor skills differed
between the ethnic groups. Similar results were reported for language
development when Bayley III language scores were compared between extremely
premature born toddlers of White, Hispanic White and Black origin: Black and
Hispanic White toddlers scored lower on language than White toddlers
(Freeman Duncan et al., 2012). These ethnic differences in developmental
trajectories within one country indicate that developmental trajectories also
depend on the ethnic composition of populations. Therefore, children should be
assessed with tests whose norms are based on samples that are representative of
their background population.
A key issue is whether it is feasible to adapt and norm a developmental test for
another population, while retaining international comparability. Before
representative norms can be developed for the Bayley III in non-English
speaking countries, the test first needs to be translated and specific cultural
circumstances need to be considered: Not all items and test material might be
applicable and suitable for children from other cultural backgrounds.
For example, not all language milestones in English are present in other
languages (Wu et al., 2008). Furthermore, pictures in the American Bayley III
may represent unusual objects and situations in other countries, therefore making
it more difficult to successfully perform tasks. In order to measure skills as
originally intended in the American Bayley III, adaptations to items and material
might be necessary. Although this may slightly alter the test material, such
changes might be necessary to warrant the validity of the test.
A challenging issue that arises when norming a developmental test for a
specific population concerns the item sequence. The Bayley III is a performance
test: test administration starts with age appropriate items and when a child fails on
one of the first three items, easier items for younger children should be
completed before more difficult items are administered. The administration stops
when a child fails on a consecutive series of five items (discontinue rule).
As such, both the starting and the ending point, and therewith the test scores,
are determined by the capacities of the child. Items for younger ages that are not
administered are scored positively, as these items are considered too easy for the
child; items remaining after reaching the endpoint based on the discontinue rule
are scored negatively, as these are considered too difficult for the child. Due to
this test construction, the item sequence is of great importance for the total score
of scales. Children from other cultural backgrounds may master skills in a
slightly different order or at a different pace. If so, the American norms and item
sequence may overestimate or underestimate the developmental level of the child
and therefore may influence the test score on the Bayley Scales. A study
comparing the results of healthy Cameroonian and healthy German infants at 3, 6
and 9 months old showed that these two groups differed meaningfully in the
acquisition sequence of Bayley III items for gross motor and language
development (Vierhaus et al., 2011). When comparing the optimal item sequence
(i.e., the items arranged according to level of difficulty) to the original American
item sequence, the item sequence for the German children was similar to that of
the American children. For Cameroonian children, however, the item order
deviated substantially from both the German and the American item order. These
findings underline the necessity to examine whether the item sequence of the
American Bayley III is correct in other cultures.
Currently, a Dutch version of the Bayley III, the Bayley-III-NL, is being
developed aiming at a good assessment of the developmental level of Dutch
children, while retaining international comparability. Therefore, the American
version needed to be (1) translated, (2) checked concerning the appropriateness of
the items regarding Dutch culture and (3) evaluated regarding the appropriate-
ness of the item sequences per scale (Bartram, 2001; Geisinger, 1994). Steps 1
and 2 have already been executed (Steenis, Verhoeven, & Van Baar, 2012).
Therefore, step 3 is the main focus of this study.
In the current study, we evaluate whether the items, translation and item
sequence based on the American population are appropriate for Dutch children,
using a two-phase approach. The aim of phase 1 is to evaluate the suitability of
the items for the Dutch population and the translation and adaptation of the items
in 100 children. In addition, difficulties encountered by the examiners during
administration will be evaluated per item. Based on their feedback, item
instructions will be adjusted when appropriate. In phase 2, the adapted version
will be evaluated in another 400 children to examine the changes made in phase 1
and the item sequence for Dutch children in a larger sample. Finally, based on the
proportions of all items, an optimal item sequence for all Bayley III scales will be
determined for Dutch children and then compared to the American item
sequence.
METHOD
Participants
Participants were healthy term Dutch children between 16 days and 42 months and
15 days old. In Table 1, participant characteristics for both phases of the study are
presented for 10 age groups. In phase 1, 100 children participated (mean
age = 19.28 months, SD = 12.95, 49% boys). In phase 2, 400 children
participated (mean age = 18.36 months, SD = 11.95, 51% boys).
TABLE 1
Characteristics of the participants in phases 1 and 2

                                          Phase 1            Phase 2
Total N                                   100                400
Mean age in months (SD)                   19.28 (12.95)      18.36 (11.95)
Boys (%)                                  49                 51
Mean gestational age in weeks (SD)        39 (1.56)          40 (1.23)
Mean birth weight in g (SD)               3491.84 (560.60)   3556.8 (523.60)
Educational level mother (a)
  Low (%)                                 11                 8
  Middle (%)                              11                 30
  High (%)                                77                 62
Ethnicity mother: Dutch (%)               98                 96
n per age group (b)
  0 months 16 days to 4 months 15 days    11                 51

(a) Low educational level refers to special education, primary school or pre-vocational secondary
education (<12 years); middle educational level refers to senior general secondary education, pre-university
education or secondary vocational education (13-16 years); high educational level refers
to higher professional education or university (17 years or more).
(b) The 10 age groups are based upon the 10 starting points used for the language scales.
Measurements
Background questionnaire. Mothers completed a background questionnaire
containing 26 questions about family background and child characteristics, such
as date of birth and nationality of children and parents, family composition and
socio-economic status.
Bayley-III-NL. The Bayley Scales of Infant and Toddler Development, third
version (Bayley III) is an individually administered instrument that measures the
developmental level of children between 16 days and 42 months and 15 days old
(Bayley, 2006). It consists of five scales: Cognition Scale (91 items), Fine Motor
Scale (66 items), Gross Motor Scale (72 items), Receptive Communication Scale
(49 items) and Expressive Communication Scale (48 items). The Bayley III was
found to be a reliable and valid instrument with good psychometric
characteristics for American children.
In every scale, item difficulty increases with predetermined starting points for
different age groups. An item was chosen as starting item when at least 95% of the
children in the relevant age group were able to pass that particular item. When the
child correctly performs the first three items after the starting point, all items before
the starting point are attributed a positive score. If one of the first three items is
scored negatively, the items for a younger age group need to be administered until
the child is able to score positively on three items consecutively (reversal rule).
Next, after consecutively failing five items on a scale, all following items are
considered too difficult for the child and are scored negatively (discontinue rule).
Therefore, the results of the items are based on: (1) actual observation and (2)
attribution of a positive or negative value. Because of the construction of the
Bayley III, there is overlap between the administered items of adjacent age groups.
Consequently when using the Bayley III in groups of children, the number of actual
observations per item depends on the number of children belonging to several age
groups and the number of administered items is a result of the developmental levels
of the children and the reversal and discontinue rule.
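The administration and scoring rules described above can be illustrated with a minimal sketch. It is not part of the Bayley III materials or of the analyses reported here: the function name `score_scale`, the `ability` callback that stands in for a child's true pass/fail response to an item, and the example start points are all hypothetical. The sketch also assumes a simplified reversal rule in which the examiner moves back one start point at a time until the first three items after a start point are all passed.

```python
from typing import Callable, Dict, List, Tuple


def score_scale(
    ability: Callable[[int], bool],
    n_items: int,
    start_points: List[int],
    age_group: int,
    basal_run: int = 3,
    stop_run: int = 5,
) -> Tuple[List[int], List[int]]:
    """Return (scores for all items, indices of items actually observed)."""
    observed: Dict[int, int] = {}

    def administer(i: int) -> int:
        # Each item is observed at most once; later lookups reuse the result.
        if i not in observed:
            observed[i] = 1 if ability(i) else 0
        return observed[i]

    # Reversal rule (simplified assumption): step back one start point at a
    # time until the first `basal_run` items after a start point are passed.
    k = age_group
    while True:
        start = start_points[k]
        first = [administer(i) for i in range(start, min(start + basal_run, n_items))]
        if all(first) or k == 0:
            break
        k -= 1
    basal = start_points[k]

    scores = [0] * n_items
    for i in range(basal):
        scores[i] = 1  # attributed positive score, item not administered

    # Discontinue rule: stop after `stop_run` consecutive failures;
    # all remaining items keep their attributed score of 0.
    fails = 0
    for i in range(basal, n_items):
        scores[i] = administer(i)
        fails = fails + 1 if scores[i] == 0 else 0
        if fails == stop_run:
            break
    return scores, sorted(observed)


# Hypothetical child who passes items 0-11, tested from the third start point.
scores, seen = score_scale(lambda i: i < 12, 30, [0, 5, 10, 15], age_group=2)
print(sum(scores), seen[0], seen[-1])  # 12 5 16

# Hypothetical child who fails items 7-9 but passes the three start items 10-12:
# no reversal is triggered, so items 0-9 are all credited.
scores2, _ = score_scale(lambda i: i <= 6 or 10 <= i <= 12, 30, [0, 5, 10, 15], age_group=2)
print(sum(scores2))  # 13, although only 10 items would truly be passed
```

The second call shows why the position of easy items matters: when the items directly after a start point are passed, earlier items are credited without being observed. Setting `basal_run=5` would correspond to a reversal rule that requires five consecutive passes.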
Recently, the Bayley III was translated and a few small adaptations were made
in view of the Dutch culture, in order to develop a Dutch version of the Bayley
III: The Bayley-III-NL (Steenis et al., 2012).
Procedure
In phases 1 and 2, children between 2 weeks and 3.5 years old were recruited via
day-care centres, advertisements in newspapers, through personal connections
and via child health centres in the Netherlands. To determine if the child met the
selection criteria for healthy children, parents were asked questions about the
general health, birth weight and gestational age of the child, when they applied to
participate.
Examiners were trained to obtain an inter-rater reliability level with a minimal
consensus rate of 80% per scale. All 27 examiners scored a Bayley-III-NL
assessment on film and their scores were compared to the scores of the trainer.
The average kappa for all items over all scales was .94 (SD = .05). For the
actually observed items, the overall average kappa was .77 (SD = .17), for the
Cognitive Scale .73 (SD = .13), for Receptive Communication .73 (SD = .17),
for Expressive Communication .77 (SD = .19), for the Fine Motor Scale .83
(SD = .11) and for the Gross Motor Scale .79 (SD = .20).
The test was administered at locations free from distracting stimuli. Because
young infants are only awake and alert for small periods of time and traveling to a
lab can be too fatiguing, children up to 3 months of age were tested at home.
To administer as many potentially age-appropriate items as possible, the starting point
for all scales was set two age groups below the American starting point in phase
1. The discontinue rule was not extended, as this would lengthen the test
and, together with a larger number of overly difficult items, might frustrate children.
In phase 2, the original American starting point was used for the Cognitive,
Receptive Communication, Expressive Communication and Fine Motor Scales.
Regarding the Gross Motor Scale, the starting point was set one age group below
the American starting point and the reversal rule was extended to five items, in
order to fit the skills of Dutch children sufficiently.
Data analysis
The item sequence concerns all Bayley III items applicable for the whole age
range from 2 weeks to 42 months. The proportion of positive scores, which can
vary between .00 and 1.00, was calculated for each item. As the items in the item
sequence are expected to increase in difficulty with older age, the proportion of
positively scored items is expected to decrease when progressing through the test.
Items diverging more than .05 in proportion of positive scores (either higher or
lower) from the preceding or following items in the sequence were considered to
deviate from the general expected pattern. This cut-off point of .05 is based on the
criterion for the determination of age appropriate starting points that was used in
the American version of the Bayley III, which holds that at least 95% of the
relevant age group was able to pass each of the first three items after the age-
related starting point (Bayley, 2006).
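The .05 criterion can be sketched as follows. This is a literal reading of the rule (an item is flagged when its proportion differs by more than .05 from either neighbour), written for illustration only; the function name `flag_deviating` and the example proportions are hypothetical. Under this literal reading the neighbours of an outlying item are flagged as well, so the output is best treated as a candidate list for further inspection rather than as a final set of deviating items.

```python
from typing import List, Tuple


def flag_deviating(props: List[float], cutoff: float = 0.05) -> List[Tuple[int, float, float]]:
    """Return (position, deviation from previous item, deviation from next item)."""
    eps = 1e-9  # guards against floating-point noise exactly at the cut-off
    flagged = []
    for i, p in enumerate(props):
        prev_dev = abs(p - props[i - 1]) if i > 0 else 0.0
        next_dev = abs(p - props[i + 1]) if i < len(props) - 1 else 0.0
        if prev_dev > cutoff + eps or next_dev > cutoff + eps:
            flagged.append((i, round(prev_dev, 2), round(next_dev, 2)))
    return flagged


# Hypothetical proportions of positive scores for five consecutive items.
for position, prev_dev, next_dev in flag_deviating([0.95, 0.93, 0.84, 0.90, 0.88]):
    print(position, prev_dev, next_dev)
```

In this example the item at position 2 (proportion .84) is the dip in the sequence; positions 1 and 3 are flagged only because they border it.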
In phase 1, first the deviating items were identified. Next, for all deviating
items, information of the examiners was taken into account to evaluate whether
the deviation might be caused by ambiguities in the instruction-manual or
difficulties in item administration. Additionally, the mean standardized scores of
the scales for the Dutch children, based on the American norms, were evaluated.
In phase 2, the same procedure was repeated for a larger group of 400 children
resulting in 400 scores per item. Next as the optimal item sequence may deviate
from the American sequence, it was also evaluated whether the deviating items
were placed in a position that could have a negative influence on the total score of
a child. Important in this regard is the location of the item in the sequence in
relation to the starting points for different age groups. If an item is placed around
a starting point, a negative deviation from the general pattern of the item
sequence (a proportion of positive scores more than .05 lower than the
proportions of previous or next items, indicating a difficult item) does not
influence the adequacy of test results, as the reversal rule requires that an
examiner return to easier items of a younger age group. However, if
the item deviates positively from the pattern (a proportion of positive scores more
than .05 higher than the proportions of previous or next items, indicating an
easy item) and is placed among the three starting items, the examiner does not
need to apply the reversal rule and all previous items are scored positive. In that
case, the capacities of a child could be overestimated. In addition, pass rates for
the deviating items were also studied in relation to the number of actually
observed scores of children that were expected to score positively based on the
age group they belong to (i.e., the age appropriate pass rate). Next, χ² analyses
were used to examine changes in the proportions of positive scores for those
items of which the instructions were adapted in phase 1.
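A comparison of two proportions of this kind can be sketched with a Pearson χ² test on a 2 × 2 table (phase by pass/fail). The sketch assumes independent samples, no continuity correction and at least one pass and one fail overall; the function name `chi_square_2x2` and the counts in the example are hypothetical and do not reproduce the values reported in this study.

```python
import math
from typing import Tuple


def chi_square_2x2(pass1: int, n1: int, pass2: int, n2: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) comparing pass proportions in two samples."""
    table = [[pass1, n1 - pass1], [pass2, n2 - pass2]]
    row_totals = [n1, n2]
    col_totals = [pass1 + pass2, (n1 - pass1) + (n2 - pass2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 10 of 100 children pass in one phase, 80 of 400 in the other.
chi2, p = chi_square_2x2(10, 100, 80, 400)
print(round(chi2, 2), round(p, 3))
```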
Furthermore, the item order across the 10 age groups presented in Table 1 was
studied in phase 2 by means of Kendall's W, which is a measure of the
correspondence between the order of the item probabilities in the different age
groups (Tijmstra, Hessen, Van der Heijden, & Sijtsma, 2011). In this procedure,
ranks per age group are assigned to all items based on the proportion of correct
response per age group, and the concordance of the rank orders of the age groups
is investigated. Kendall's W can take on a value between 0 (no correspondence)
and 1 (perfect correspondence). Finally, the items of the Bayley-III-NL were
rearranged in the optimal item sequence based on the difficulty level
(proportions), and then compared with the American item sequence by means of
rank correlations (Spearman's rho). SPSS 20.0 was used for all analyses.
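Both rank statistics can be sketched without statistical software. The code below is an illustration under stated assumptions, not the SPSS procedure used here: tied values receive average ranks, Kendall's W uses the usual correction for ties, and Spearman's rho is computed as the Pearson correlation of the ranks. The function names and the small example matrix (rows are age groups, columns are items) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(rows: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m rows (age groups) ranking the same n items."""
    m, n = len(rows), len(rows[0])
    ranked = [average_ranks(r) for r in rows]
    totals = [sum(r[i] for r in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every group of tied values per row.
    ties = sum(t ** 3 - t for r in rows for t in Counter(r).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation between average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical proportions of positive scores: three age groups, four items.
groups = [[0.9, 0.7, 0.4, 0.1], [0.8, 0.5, 0.3, 0.0], [1.0, 0.6, 0.2, 0.1]]
print(kendalls_w(groups))  # identical item order in every group

# Hypothetical comparison of an original item order with a re-ranked order.
print(spearman_rho([1, 2, 3, 4], [1, 2, 4, 3]))
```

In the first example every age group orders the items identically, so W reaches its maximum of 1; in the second, swapping two adjacent items out of four lowers rho to .8.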
RESULTS
Phase 1
In phase 1, the item sequence was examined for 100 children. Because of the
construction of the Bayley III, a score is assigned to all items, including items that
are considered too easy (intended for younger children) and too difficult
(intended for older children) for the child. Therefore, 100 scores per item were
available for the analyses.
In phase 1, proportions of positively scored items in the predetermined
American sequence in general decreased for all scales and therefore the items
indeed seemed to increase in difficulty (see Figure 1). However, each scale
contained items that deviated from the overall pattern. In total, 41 of the 326
items (13%) deviated more than .05 from the previous or next item in the general
pattern. On the Cognition Scale, proportions of 12 out of 91 (13%) items deviated
more than .05 from the pattern; on the Receptive Communication Scale 10 out of
49 items (20%); on the Expressive Communication Scale 5 out of 46 (11%) items;
on the Fine Motor Scale 8 out of 66 (12%) items; and on the Gross Motor Scale 6
out of 72 items (8%).
Figure 1. Proportions of the positive scores for all items in all scales in phase 1.
Table 2 displays information about these deviating items. For six of these
items (four items from the Cognitive Scale and two items from the Gross Motor
Scale), examiners recommended small modifications to the item administration
and scoring, which were implemented (see Appendix A). Two items on the Expressive
Communication Scale were found to concern milestones in the English language
(the use of the verb form -ing, as in "playing") that are not present in the Dutch
language according to linguistic experts; these items were therefore deleted from the test.
In addition, mean scaled scores of the current Dutch sample were compared
with the American normative mean scaled score of 10. Results on the Gross
Motor Scale showed a considerably lower mean scaled score of 7.74 (SD = 3.11)
compared with the American average of 10. Mean scaled scores of the other
scales were comparable (see Table 3), indicating that the developmental
trajectory of Dutch and American children may not be comparable regarding the
Gross Motor Scale, whereas for other scales the developmental trajectory might
be similar. Therefore, we decided to use the original American age appropriate
starting point for all scales in phase 2, with the exception of the Gross Motor
Scale. The starting point for the Gross Motor Scale was set one age group below
the American starting point, as this would better fit the Gross Motor Skills of
Dutch children at a given age. In addition, as the developmental trajectory
seemed to differ from that of the American sample, the reversal rule for this
developmental domain was extended to five items that needed to be positive after
the starting point, instead of three items. Making the reversal rule stricter
for this scale ensured that skills applicable to the developmental level of
Dutch children at a certain age were administered.
Phase 2
Figure 2 presents proportions of positive scores per scale for all items for the next
400 children in phase 2. As in phase 1, because of the construction of the Bayley
III, positive (too easy items) and negative scores (too difficult items) are assigned.
Therefore, 400 scores per item were available for the analyses. Consistent with
phase 1, each scale contained items with a proportion deviating more than .05
from the preceding and following item, but the number of items was significantly
decreased to 17 out of the 324 items (5%). On the Cognitive Scale, 6 out of 91
(7%) items deviated from the pattern including three items that aim to stimulate
children to show symbolic play. For Receptive Communication, 2 out of 49 (4%)
items deviated from the pattern, on the Fine Motor Scale 6 out of 66 (9%) items
and on the Gross Motor Scale, 3 out of 72 (4%) items (see Table 4). For
Expressive Communication, no items deviated more than .05 in proportion of
positive scores. All deviating items were found in older age groups from 16
months plus 15 days onwards.

TABLE 2
Characteristics of the deviating items for the first phase

Item                                            Start   Prior (dev.) Item    Later (dev.)
                                                point
Cognition
18. Inspects own hand                           E       .93 (.09)    .84     .90 (.06)
32. Looks at pictures                           I       .73 (.12)    .85     .65 (.20)
33. Picks up blocks series: retains             I       .85 (.20)    .65     .71 (.06)
    two of three blocks
37. Picks up block series: three blocks         J       .70 (.15)    .55     .65 (.10)
40. Finds hidden object                         K       .61 (.07)    .68     .60 (.08)
46. Removes lid from bottle                     L, M    .52 (.08)    .44     .55 (.11)
54. Block series: nine blocks                   L, M    .46 (.04)    .42     .51 (.09)
57. Uses pencil to obtain object                N       .48 (.08)    .40     .48 (.08)
65. Representational play                       P       .42 (.10)    .32     .41 (.09)
69. Imaginary play                              Q       .37 (.21)    .11     .33 (.22)
71. Multischeme combination play                Q       .33 (.26)    .07     .23 (.16)
77. Simple pattern with plastic ducks           Q       .21 (.10)    .11     .24 (.13)
Receptive Communication
11. Recognizes two familiar words               J       .77 (.22)    .58     .64 (.06)
12. Responds to no-no                           J       .58 (.06)    .64     .80 (.16)
14. Responds to request for social routines     K, L    .80 (.22)    .58     .67 (.09)
25. Follows two-part directions                 P       .40 (.07)    .33     .39 (.06)
30. Understands pronouns                        Q       .36 (.10)    .26     .32 (.06)
    (him, me, my, you, your)
36. Understands label one                       Q       .24 (.06)    .30     .11 (.29)
37. Understands pronouns (they, she, he)        Q       .30 (.19)    .11     .07 (.04)
38. Understands pronouns (his, her)             Q       .11 (.04)    .07     .13 (.06)
39. Understands plurals                         Q       .07 (.06)    .13     .22 (.09)
44. Understands past tense                      Q       .16 (.12)    .04     .12 (.08)
Expressive Communication
16. Imitates word                               K       .54 (.07)    .61     .53 (.08)
25. Imitates a two-word utterance               O       .41 (.15)    .26     .39 (.13)
34. Uses verb -ing                              Q       .28 (.24)    .04     .39 (.35)
37. Names action picture series: five pictures  Q       .22 (.16)    .38     .23 (.15)
45. Uses present progressive form               Q       .07 (.03)    .04     .12 (.08)
Fine Motor
24. Food pellet series: partial thumb           I       .69 (.08)    .77     .67 (.10)
    opposition
27. Turns pages of book                         J       .67 (.06)    .73     .62 (.11)
34. Grasp series: transitional grasp            M, N    .53 (.14)    .39     .55 (.16)
42. Connecting blocks: together                 P       .32 (.07)    .39     .29 (.10)
48. Grasp series: dynamic grasp                 Q       .25 (.15)    .10     .37 (.27)
49. Tactilely discriminates shapes              Q       .10 (.27)    .37     .27 (.10)
51. Cuts paper                                  Q       .27 (.07)    .15     .22 (.09)
54. Block stacking series: eight blocks         Q       .13 (.15)    .28     .10 (.18)
Gross Motor
24. Grasps foot with hands                      H       .79 (.16)    .63     .83 (.20)
55. Kicks ball                                  P       .31 (.07)    .38     .38 (.00)
56. Walks forward on path                       P       .38 (.00)    .38     .21 (.17)
57. Walks up stairs series: both                Q       .38 (.17)    .21     .17 (.04)
    feet on each step, alone
58. Walks down stairs series: both              Q       .21 (.04)    .17     .31 (.14)
    feet on each step, alone
66. Stops from a full run                       Q       .06 (.11)    .17     .03 (.14)

Note: Prior = proportion of the item prior in the sequence; Item = proportion of the deviating item;
Later = proportion of the item later in the sequence; (dev.) = deviation from the deviating item.
Table 4 displays characteristics of the 17 deviating items in more detail. For
Cognition, three out of six deviating items belonged to one of the first three items
after the starting point. As these items were all considered difficult, the reversal
rule needed to be applied, which involved a check of the child's skills using items
of the younger age group and therefore did not affect the total score incorrectly.
On the Fine Motor Scale, two difficult items out of six deviating items
belonged to the starting items and the same reasoning held as described above for
Cognition. For Receptive Communication, one of two deviating items belonged
to starting items and this was an easy item with a pass rate of 96.6%. This is
consistent with the American criterion of a pass rate of at least 95% that was used
for determining starting items. Because the pass rate of this item was slightly
TABLE 3
Mean scaled scores of the participants in phases 1 and 2 and the USA
Phase 1 Phase 2 USA

Scale M (SD) M (SD) M (SD)
Cognition 10.41 (2.53) 10.68 (2.34) 10.00 (3.00)

Receptive Communication 10.16 (2.58) 10.23 (2.63) 10.00 (3.00)
Expressive Communication 10.08 (2.48) 10.19 (2.50) 10.00 (3.00)
Fine Motor 10.79 (2.25) 11.29 (2.73) 10.00 (3.00)
Gross Motor 07.74 (3.11) 9.08 (3.17) 10.00 (3.00)
Figure 2. Proportions of the positive scores for all items in all scales in phase 2.
BAYLEY-III-NL
505
TABLE 4
506
Characteristics of the deviating items of the next 400 children
Items Start In start Easy or Proportion of Proportion of Proportion of Age adequate

point point difficult item prior in deviating item item later in pass rate (%)
sequence sequence
(deviation) (deviation)
Cognition
STEENIS ET AL.
46. Removes lid from bottle L, M Yes Difficult .52 (.04) .48 .58 (.10) 78.6
65. Representational play P Yes Difficult .39 (.08) .31 .41 (.10) 93.8
69. Imaginary play Q Yes Difficult .28 (.10) .18 .28 (.10) 65.0
71. Multischeme combination play Q No Difficult .28 (.14) .14 .24 (.10) 60.0
75. Matches size (big and small plastic ducks) Q No Difficult .20 (.07) .13 .18 (.05) 60.0
77. Simple pattern (big and small plastic ducks) Q No Difficult .21 (.12) .09 .16 (.06) 50.0
Receptive Communication
13. Attends to others play routine K, L Yes Easy .70 (.08) .78 .67 (.11) 96.6
39. Understands plurals Q No Difficult .06 (.01) .05 .12 (.07) 25.0
Fine Motor
32. Drawing: imitates stroke series: random M, N Yes Difficult .58 (.11) .47 .53 (.06) 76.1
34. Grasp series: transitional grasp M, N No Difficult .53 (.05) .48 .58 (.10) 73.2
37. Grasp series: intermediate (tripod) grasp O Yes Difficult .50 (.14) .36 .40 (.04) 73.5
42. Connecting blocks: together P No Easy .30 (.12) .42 .27 (.15) 97.9
48. Grasp series: dynamic grasp Q No Difficult .22 (.10) .12 .18 (.06) 70.0
54. Block stacking series: eight blocks Q No Easy .10 (.10) .20 .07 (.13) 100.0
Gross Motor
55. Kicks ball P Yes Easy .30 (.15) .45 .43 (.02) 100.0
56. Walks forward on path P Yes Easy .45 (.02) .43 .21 (.22) 95.9
59. Jumps forward: four inches Q Yes Easy .17 (.11) .28 .21 (.07) 100.0
Notes: This table displays the deviating items with their corresponding start point and whether the item is one of the first three items after the starting point (In start
point). Items that deviate positively from the sequence are considered too easy for their position in the sequence, and items that deviate negatively are considered
too difficult (Easy or difficult). The proportions of the deviating items and of the surrounding items, with their deviation from the deviating item in parentheses, are presented per
scale, as well as the pass rate on the deviating items among children in the age group corresponding to the starting points the items belong to (Age adequate pass rate).
higher than this 95%, the position around the starting point was justified. On the
Gross Motor Scale, all three deviating items belonged to starting items and these
were all easy items with pass rates of 95.9% and higher.
In phase 2, the mean score for the Gross Motor Scale was still lower than the
normative American mean, but the deviation was smaller compared to phase 1
(see Table 2). For the Fine Motor Scale, the mean scaled score was considerably
higher than in phase 1 and with a mean of 11.29 (SD 2.73) also higher than the
American normative mean of 10. No large differences were found for the other
scales.
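To make the size of such a difference concrete, the following minimal sketch expresses a sample mean in normative standard deviation units. It uses the Fine Motor mean of 11.29 and the normative mean of 10 reported above; the normative SD of 3 is assumed here as the conventional value for scaled scores, and the function name is hypothetical.

```python
def standardized_difference(sample_mean: float,
                            norm_mean: float = 10.0,
                            norm_sd: float = 3.0) -> float:
    """Difference between a sample mean and the normative mean, in normative SD units.

    norm_sd = 3.0 is an assumption (the usual SD of scaled scores),
    not a value taken from this section of the text.
    """
    return (sample_mean - norm_mean) / norm_sd


# Phase 2 Fine Motor mean scaled score as reported in the text.
fine_motor_phase2 = standardized_difference(11.29)
print(round(fine_motor_phase2, 2))
```

Under these assumptions the phase 2 Fine Motor mean lies a little under half a normative standard deviation above the American mean.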
The proportions of positive scores for the six items that were modified in phase
1 were compared for the results in the first and second phase using χ² analyses.
Items 69 Imaginary play (χ²(1) = 5.08, p < .001) and 71 Multischeme
combination play (χ²(1) = 8.18, p < .001) of the Cognitive Scale showed a
significant difference, indicating that compared to phase 1 the items deviated less.
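A comparison of this kind can be sketched as a Pearson χ² test on a 2 × 2 table (phase by pass/fail). The sketch below is illustrative only: it assumes the two phases are independent samples and applies no continuity correction, the counts are hypothetical, and the function name is not taken from the study.

```python
import math


def chi_square_2x2(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Pearson chi-square (df = 1, no continuity correction) for two independent proportions.

    Returns (chi2, p_value). Requires at least one pass and one fail overall,
    otherwise an expected count is zero and the statistic is undefined.
    """
    observed = [[pass_a, n_a - pass_a], [pass_b, n_b - pass_b]]
    row_totals = [n_a, n_b]
    col_totals = [pass_a + pass_b, (n_a - pass_a) + (n_b - pass_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 10 of 50 children pass in phase 1, 25 of 50 in phase 2.
chi2, p = chi_square_2x2(10, 50, 25, 50)
print(f"chi2(1) = {chi2:.2f}, p = {p:.4f}")
```

The p-value is computed with the standard library only, which keeps the sketch free of dependencies; a statistics package would normally be used in practice.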
Kendall's W was used to study the order of the item sequence, and it showed
high concordance between the rank orders of the items for the 10 age groups per
scale (see Table 5). Finally, ranks were assigned to the items such that the proportions
of positive scores decreased perfectly, that is, difficulty increased with every item, over all age groups. This optimal
item sequence for Dutch children was compared to the original American item
sequence using Spearman's rho rank order correlations. These analyses showed
almost perfect correlations, with ρ values of .99 for all scales, indicating that the Dutch
item sequence is comparable to the American item sequence.
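The two rank-based statistics used above can be sketched as follows. This is a minimal illustration under stated assumptions: the textbook formulas are used without tie corrections, the proportions are invented rather than taken from the study, and all function names are hypothetical.

```python
def rank_descending(values):
    """Rank items so that rank 1 is the highest proportion (easiest item); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendalls_w(rank_rows):
    """Kendall's W for m rankings (rows) of the same n items, without tie correction."""
    m = len(rank_rows)
    n = len(rank_rows[0])
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two rank vectors, without tie correction."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical proportions of positive scores for five items in three age groups.
proportions = [
    [0.90, 0.80, 0.60, 0.65, 0.20],
    [0.95, 0.85, 0.75, 0.60, 0.30],
    [0.99, 0.90, 0.80, 0.70, 0.50],
]
rank_rows = [rank_descending(row) for row in proportions]
print("Kendall's W:", round(kendalls_w(rank_rows), 3))

# Compare a hypothetical data-driven order with the original item order 1..5.
original_order = [1, 2, 3, 4, 5]
print("Spearman's rho:", round(spearman_rho(rank_rows[0], original_order), 3))
```

In this sketch each age group plays the role of a "rater" ranking the same items, which matches the description in the note to Table 5 that ranks are based on the proportion of correct responses per item per age group.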
TABLE 5
Kendall's W values per scale when comparing the rank order of the items between the ten age groups
Scale | Kendall's W
Cognition | 0.81
Receptive Communication | 0.84
Expressive Communication | 0.82
Fine Motor | 0.85
Gross Motor | 0.81
Note: Ranks are assigned based on the proportion of correct responses per item per age group.

DISCUSSION
This study described part of the process of the design and development of the
Dutch version of the Bayley-III: The Bayley-III-NL. During its development,
several decisions regarding translation, adaptation and item-sequence were made,
balancing between test improvement and preservation of international
comparability. Overall, both qualitative and quantitative analyses indicated that
the content of the translated items is appropriate for Dutch children. As expected
for developmental tests, the item sequence was found to increase in difficulty for
all scales. After phase 1, based on the differences in the development of language
milestones between the Dutch and English languages, two items were deleted from
the Expressive Communication Scale. Furthermore, based on the experiences
of our examiners, item instructions were slightly modified for six items
(see Appendix A). Despite these changes, there were still deviating items in phase
2. However, because of the position of these items in relation to the starting
points and the pass rates of the items, it was concluded that these items did not
have to be repositioned in the final item sequence.
Our decision to keep the original American item sequence was supported by
the high values of Kendall's W, which indicated very strong concordance of
the item order across age groups (Tijmstra et al., 2011). Furthermore, the rank
order correlations between the optimal item sequence according to the results
of phase 2 and the original American item sequence were high. Thus, the
sequence in which Dutch children acquire skills seems comparable to that of
American children. However, differences in the mean scaled scores of Dutch and
American children (Dutch children scored lower for Gross Motor and higher for
Fine Motor than American children) indicate that the pace at which children
acquire skills is not completely comparable across countries. This is consistent
with results of earlier studies that showed difficulties using American norms in
non-American samples (Anderson et al., 2010; Krogh et al., 2012; Walker et al.,
2010). Population-specific norms are necessary when using the Bayley III.
As analyses of phase 1 showed that for the Gross Motor Scale the mean scaled
score is lower compared to the American mean scaled score, the reversal rule for
phase 2 was extended and the starting point was set one age group earlier. The
original reversal rule and starting point were adapted in this way to ensure that all items
applicable to Dutch children at a certain age were administered. Adaptation was
necessary as the mean standardized scores of this sample indicated that Dutch
children have a different developmental trajectory than American children. This
might be a limitation of this study, because these adaptations may have resulted
in longer test duration. However, although perhaps not optimal, this adjustment
minimized the chance of underestimation or overestimation of the child's
abilities; as more items were administered, children's Gross Motor skills were
investigated more accurately.
Unfortunately, no information was available on the proportions of positive
scores for the American norm sample. We chose 5% as a cut-off point to assess
the general pattern of the item sequence. However, developers of the American
Bayley III may also have had qualitative or theoretical reasons to keep particular
items in the scales. For example, in the free play items, specific levels of playing
are elicited such as symbolic play. Play development is an important aspect of
cognitive development during toddlerhood (Piaget, 1962). In the Bayley III,
items to elicit free play have a different design than other items of the test: these
items are less structured and children are given hardly any instructions. In our
experience, many children find it difficult to switch from the strongly structured
items to these free play items. This difficulty is underlined by the results in this
study: three out of five play items deviated from the expected pattern of
decreasing proportions and only two of them deviated less after clarification of
the item instructions. It is possible that, because of the relevance of play for cognitive child
development, these items were retained in the Bayley III at the age level when
children were expected to show the different levels of play, even though
proportions of the items deviated from the general pattern of increasing difficulty.
A limitation of this study is that the samples in phases 1 and 2 were not yet
representative of the Dutch population regarding region and mothers' ethnicity
and educational level. Furthermore, in the American sample, children with risk
factors for developmental delay were included, whereas such children were not
included in the current study's sample. However, because of the high agreement
between the optimal Dutch item sequence and the original American item
sequence, it is not expected that the optimal item sequence will be influenced
strongly by the composition of the sample. To investigate this expectation, a
replication of this study is needed with a representative sample including at-risk
children.
In addition, further study will be done to evaluate the reliability and validity of
the Dutch Bayley-III-NL. The norms for the Bayley-III-NL will be based upon
data from a larger sample of children representative of the Dutch population. This
procedure will enable good assessment of early child development in the
Netherlands with the Bayley-III-NL.
In conclusion, more data need to be collected before Dutch norms can be
constructed, but the first steps in the investigation of the appropriateness of the
American item sequence for Dutch children indicate that Dutch norms for the
Bayley-III-NL can be based upon the current version.
Manuscript received 10 January 2013

Revised manuscript accepted 18 November 2013
First published online 29 January 2014
REFERENCES
Anderson, P. J., De Luca, C. R., Hutchinson, E., Roberts, G., & Doyle, L. W. (2010). Underestimation
of developmental delay by the new Bayley-III scale. Archives of Pediatrics and Adolescent
Medicine, 164, 352–356. doi:10.1001/archpediatrics.2010.20.
Bartram, D. (2001). Guidelines for test users: A review of national and international initiatives.
European Journal of Psychological Assessment, 17, 173–186. doi:10.1027//1015-5759.17.3.173.
Bayley, N. (2006). The Bayley scales of infant and toddler development. San Antonio, TX: The
Psychological Corporation.
Freeman Duncan, A., Watterberg, K. L., Nolen, T. L., Vohr, B. R., Adams-Chapman, I., Das, A., &
Lowe, J. (2012). Effect of ethnicity and race on cognitive and language testing at age 18–22
months in extremely preterm infants. The Journal of Pediatrics, 160, 966–971.e2.
doi:10.1016/j.jpeds.2011.12.009.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues
influencing the normative interpretation of assessment instruments. Psychological
Assessment, 6, 304–312. doi:10.1037/1040-3590.6.4.304.
Kelly, Y., Sacker, A., Schoon, I., & Nazroo, J. (2006). Ethnic differences in achievement of
developmental milestones by 9 months of age: The millennium cohort study. Developmental
Medicine and Child Neurology, 48, 825–830. doi:10.1111/j.1469-8749.2006.tb01230.x.
Krogh, M. T., Væver, M. S., Harder, S., & Køppe, S. (2012). Cultural differences in infant
development during the first year: A study of Danish infants assessed by the Bayley-III and
compared to the American norms. European Journal of Developmental Psychology, 9, 730–736.
doi:10.1080/17405629.2012.688101.
Piaget, J. (1962). Play, dreams, and imitation in childhood. New York, NY: W.W. Norton.
Steenis, L. J. P., Verhoeven, M., & Van Baar, A. L. (2012). The Bayley III: The instrument for early
detection of developmental delay. In A. M. Columbus (Ed.), Advances in psychology research
(Vol. 92, pp. 133–141). Hauppauge, NY: Nova Science.
Tijmstra, J., Hessen, D. J., Van der Heijden, P. G. M., & Sijtsma, K. (2011). Invariant ordering of item-
total regressions. Psychometrika, 76, 217–227.
Vierhaus, M., Lohaus, A., Kolling, T., Teubert, M., Keller, H., Fassbender, I., . . . Schwartzer, G.
(2011). The development of 3- to 9-month-old infants in two cultural contexts: Bayley
longitudinal results for Cameroonian and German infants. European Journal of Developmental
Psychology, 8, 349–366. doi:10.1080/17405629.2010.505392.
Walker, K., Badawi, N., Halliday, R., & Laing, S. (2010). Brief report: Performance of Australian
children at one year of age on the Bayley Scales of Infant and Toddler Development (version III).
Australian Educational and Developmental Psychologist, 27, 54–58. doi:10.1375/aedp.27.1.54.
Wu, Y.-T., Tsou, K.-I., Hsu, C.-H., Fang, L.-J., Yao, G., & Jeng, S.-F. (2008). Brief report: Taiwanese
infants' mental and motor development 6–24 months. Journal of Pediatric Psychology, 33,
102–108. doi:10.1093/jpepsy/jsm067.
APPENDIX A: MODIFIED ITEMS AFTER PHASE 1
Item | Modification
Cognition
46. Removes lid from bottle | It was added to the item instructions that the examiner needs to check that the lid is screwed on tightly enough that it can only be removed by unscrewing, not by pulling.
65, 69, 71. Free play items | It was added to the item instructions that when a child does not start to play, possibly because of shyness, the item should be administered later. If the child does start to play, the examiner should try to elicit the more difficult free play items.
Gross Motor Scale
57, 58. Walks up and down the stairs alone | It was added that when the child continues to use the wall for support or keeps requesting support when walking up or down the stairs, the examiner should move the stairs away from the wall or give him/her a toy in both hands to see whether (s)he is able to walk the stairs without support.
