

NEPSY-II: A Developmental Neuropsychological Assessment, Second Edition

Article  in  Child Neuropsychology · January 2010


DOI: 10.1080/09297040903146966 · Source: PubMed



This article was downloaded by: [Brooks, Brian L.]
On: 14 December 2009
Access details: Access Details: [subscription number 917799133]
Publisher Psychology Press
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-
41 Mortimer Street, London W1T 3JH, UK

Child Neuropsychology
Publication details, including instructions for authors and subscription information:
http://www.informaworld.com/smpp/title~content=t713657840

NEPSY-II: A Developmental Neuropsychological Assessment, Second Edition
Brian L. Brooks (a, b); Elisabeth M. S. Sherman (a, b); Esther Strauss (c)
(a) Alberta Children's Hospital, Calgary, Alberta, Canada; (b) University of Calgary, Calgary, Alberta,
Canada; (c) University of Victoria, Victoria, British Columbia, Canada

First published on: 10 August 2009

To cite this Article: Brooks, Brian L., Sherman, Elisabeth M. S. and Strauss, Esther (2010) 'NEPSY-II: A Developmental
Neuropsychological Assessment, Second Edition', Child Neuropsychology, 16: 1, 80–101, First published on: 10 August
2009 (iFirst)
To link to this Article: DOI: 10.1080/09297040903146966
URL: http://dx.doi.org/10.1080/09297040903146966

Child Neuropsychology, 16: 80–101, 2010
http://www.psypress.com/childneuropsych
ISSN: 0929-7049 print / 1744-4136 online
DOI: 10.1080/09297040903146966

TEST REVIEW: NEPSY-II: A DEVELOPMENTAL NEUROPSYCHOLOGICAL ASSESSMENT, SECOND EDITION

Brian L. Brooks,1,2 Elisabeth M. S. Sherman,1,2 and Esther Strauss3

1 Alberta Children’s Hospital, Calgary, Alberta, Canada; 2 University of Calgary,
Calgary, Alberta, Canada; and 3 University of Victoria, Victoria, British Columbia,
Canada

The NEPSY-II consists of 32 subtests for use in a neuropsychological assessment with
preschoolers, children, and adolescents. This test review provides an overview of the NEPSY-II
for clinicians and researchers, including descriptions of the subtests, changes from the
original NEPSY, reliability and validity evidence, strengths, and limitations.

Keywords: NEPSY-II; Test review; Children; Neuropsychology; Adolescents.

INTRODUCTION
The NEPSY-II (Korkman, Kirk, & Kemp, 2007a) is a comprehensive, co-normed, and
multidomain neuropsychological battery designed for assessing neurocognitive abilities in
preschoolers, children, and adolescents. The NEPSY-II is a flexible battery of subtests that is
designed to allow the administration of specific subtests, groups of subtests, or the entire battery.
The NEPSY-II has its origin with the NEPS, a Finnish instrument that first appeared
nearly 30 years ago (Korkman, 1980). As noted by Korkman (1999), the test originally
consisted of only two to five tasks for 5- and 6-year-olds, designed along traditional
Lurian approaches, scored in a simple pass/fail manner and calibrated so that items were
passed by the vast majority of children. The NEPS was revised and expanded in 1988 and
1990 to include more tasks (including the Visual Motor Integration [VMI] and Token Test
as complements) and a wider age range (NEPS-U; Korkman, 1988a, 1988b). A Swedish
version was also developed (NEPSY; Korkman, 1990). The NEPSY was then revised and
expanded to an even broader age range and was standardized in Finland (Korkman, Kirk, &
Kemp, 1997) and in the United States (Korkman, Kirk, & Kemp, 1998). The North
American version of the NEPSY was revised, built upon, and expanded into the NEPSY-II
(Korkman et al., 2007a).

The authors wish to thank Helen Carlson, PhD, for her assistance with manuscript preparation and Pearson
Assessment (The Psychological Corporation) for providing us with the materials necessary to independently
review the NEPSY-II.
Address correspondence to Brian L. Brooks, PhD, Neurosciences Program, Alberta Children’s Hospital,
2888 Shaganappi Trail NW, Calgary, AB, Canada T3B 6A8. E-mail: brian.brooks@albertahealthservices.ca

© 2009 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business

STRUCTURE OF THE NEPSY-II: DOMAINS AND SUBTESTS


The NEPSY-II contains 32 subtests, which are divided into six theoretically derived
domains of cognitive functioning: Attention and Executive Functioning; Language; Memory
and Learning; Sensorimotor; Social Perception; and Visuospatial Processing. Table 1
highlights the six domains, the subtests that are grouped within each domain, the task
demands, and possible interpretive considerations.

FROM NEPSY TO NEPSY-II: CHANGES, DELETIONS, AND ADDITIONS


Korkman et al. (2007b) note that the revision of the battery was based upon several
factors, including: research in neuropsychology, child development, and child psychology;
feedback from experts and customers; experience of the authors; and early pilot studies of
revisions and newly developed subtests. Those clinicians familiar with the NEPSY will
readily notice the additions and omissions of subtests, the absence of domain or composite
scores, and the expansion of normative data from 12 to 16 years of age for many subtests
(see Table 1 for summary information on these changes).


The authors indicate that they had four primary goals for revising the NEPSY into
the NEPSY-II. First, they wanted to improve and expand the cognitive domains covered
across the age span. This was achieved by adding new subtests to the existing domains.
Several subtests, including Knock and Tap, Tower, Visual Attention, and Finger Discrimination,
were removed because of low clinical sensitivity. New subtests were added (Affect
Recognition and Theory of Mind) in the new domain of Social Perception. Most NEPSY-II
subtests were also modified to some degree, in terms of changes to the administration,
recording or scoring procedures, addition of new items, and/or changes to the age range.
Table 1 includes summaries of these subtest modifications.
Second, the authors wanted to enhance clinical and diagnostic utility of the NEPSY-II.
This was done by removing the five domain scores “in favour of the more clinically sensitive
subtest-level scores” (Korkman et al., 2007b, p. 26). Special group studies were also
expanded to include 10 patient groups: Attention deficit/hyperactivity disorder (ADHD);
Asperger’s Disorder; Autistic Disorder; Deaf and Hard of Hearing; Emotionally Disturbed;
Language Disorder; Mild Intellectual Disability; Mathematics Disorder; Reading Disorder;
and Traumatic Brain Injury (TBI). Detailed discussion of these clinical groups can be
found in the Validity Evidence section.
The third goal of the revision involved improving the psychometric properties of the
NEPSY-II compared to the NEPSY. This included expansion of concurrent validity studies
with various measures (e.g., academic achievement, intelligence, cognition, and adaptive functioning).
New normative data were obtained for nearly all subtests, although some subtests on
the NEPSY-II were not re-normed. The authors also sought to improve upon the floor and ceiling
effects noted on some NEPSY subtests. Floor effects tend to arise when a test is too hard
and most of the normative sample performs poorly, whereas ceiling effects arise when a test is
too easy for the normative group (see Limitations for further discussions of these revisions).
The fourth goal for the revision was to enhance the usability and ease of administration
of the NEPSY-II subtests. The authors sought to make the NEPSY-II more
flexible by encouraging examiners to choose the specific subtests that they felt were
most appropriate for a clinical assessment. To help facilitate this, the authors placed
the subtests in alphabetical order within the protocol and the subtests were administered
to the normative sample in four different orders to reduce potential order

Table 1 Descriptions of NEPSY-II Subtests, Interpretive Suggestions, Administration Times, Inclusion in Referral Batteries, and Changes from NEPSY

Subtest | Age Range | Description of Subtest | Interpretation of Low Score(s)¹ | Admin. Time (mins) | Referral Battery | Changes from NEPSY

Attention and Executive Functioning

Animal Sorting | 7–16 | A card-sorting task assessing concept formation. | Poor initiation, cognitive flexibility, and self-monitoring; poor conceptual reasoning or semantic knowledge. | 8–10 | 5, 9 | New
Auditory Attention and Response Set | 5–16 (Auditory Attention); 7–16 (Response Set) | An auditory selective and sustained attention task that involves pointing to stimuli according to consistent and inconsistent examiner cues; assesses shifting, inhibition, and maintaining a new and complex set. | Reduced selective and sustained attention, slow responding. Poor sustained attention, inhibition, or working memory. | 7–11 | 1–7, 9 | A, B, D
Clocks | 7–16 | A clock-drawing task. | Poor planning and organization; poor visual-spatial/drawing or reading ability; poor time concept. | 6–10 | 4, 5, 7 | New
Design Fluency | 5–12 | A nonverbal fluency task during which the child must draw as many unique designs in a given time limit from both structured and unstructured dot arrays. | Difficulty with initiation and productivity; poor cognitive flexibility. | 4 | 4, 7, 9 | None
Inhibition | 5–16 | Visual task involving three components: Naming involves rapid naming of shapes (squares and circles) or the direction of arrows (up or down); Inhibition involves rapid opposite naming of shapes (saying square when circle, saying circle when square) or arrows (say up when pointing down, say down when pointing up); and Switching involves rapidly saying the correct shape or arrow direction if the object is colored white, or saying the opposite shape or arrow direction if the object is colored black. | Slow processing speed (IN-N), or trouble with inhibitory control (IN-I) or inhibitory control and cognitive flexibility (IN-S). | 8–11 | 1–6, 9 | New
Statue | 3–6 | Assesses response inhibition and motor persistence by having child maintain a body position while ignoring examiner’s sound distractors. | Difficulty with overall inhibitory control. | 3 | 2–9 | B, D

Language

Body Part Naming and Identification | 3–4 | Naming task for younger children that involves naming body parts on a picture of a child or on the child’s own body, as well as recognition of body part names. | Trouble with word finding, expressive language, vocabulary, or semantic knowledge. | 4–5 | 6 | A, B, C
Comprehension of Instructions | 3–16 | Auditory comprehension task that requires the child to point to the correct picture in response to examiner commands of increasing syntactic complexity. | Poor receptive language, linguistic or semantic knowledge, or trouble following multi-step commands. | 6–8 | 1, 2, 3, 5, 6, 8, 9 | A, B, C, D
Oromotor Sequences | 3–12 | Involves the repetition of “articulatory sequences” (i.e., tongue twisters) to assess oromotor coordination. | Difficulty with motor programming for speech production. | 5 | 2, 6, 7 | None
Phonological Processing | 3–16 | A two-part test that requires the child to identify words from word segments and then to create a new word by omitting or substituting word segments or phonemes. | Reduced phonological awareness and processing. | 5–8 | 2, 8 | A, B, C, D
Repetition of Nonsense Words | 5–12 | Assesses the child’s phonological encoding and decoding skills by requiring the child to repeat nonsense words presented on audiotape. | Poor ability to analyze or produce words phonologically or to articulate novel words. | 4 | 6 | None
Speeded Naming | 3–16 | Assesses rapid semantic production by requiring the child to name the size, shape, and color of familiar stimuli as quickly as possible. | Trouble with expressive language, lexical access, processing speed, or naming. | 2–7 | 1–6, 8, 9 | A, B, C, D
Word Generation | 3–16 | Test of verbal productivity by having child generate words within specific semantic and initial letter categories. | Difficulty with expressive language, processing speed, executive control, initiation, or ideation. | 4–6 | 4, 8, 9 | B, D

Memory and Learning

List Memory; List Memory Delayed | 7–12 | Test of verbal learning and memory that involves learning over five trials, an interference list, immediate recall, and delayed recall for a list of 15 words. | Trouble with learning skills for verbal material, rote memory, or span of verbal memory. | 8 | 4 | B
Memory for Designs; Memory for Designs Delayed | 3–16 (Memory for Designs); 5–16 (Memory for Designs Delayed) | Test of visuospatial memory for novel visual designs. Involves learning, immediate, and delayed recall of the position of designs on two-dimensional grids. | Difficulty with learning for visuospatial information, decay of learned visuospatial information. | 10–15 (3–4) | 3, 7, 8 | New
Memory for Faces; Memory for Faces Delayed | 5–16 | A face recall task involving recalling a series of photographs of children’s faces. | Poor face discrimination or recognition. | 4–5 (2–3) | 1, 3, 5, 9 | A, B, C, D
Memory for Names; Memory for Names Delayed | 5–16 | Involves repeated exposure trials to a set of cards on which are children’s faces; the child is required to learn and recall the name associated with each face. | Reduced capacity to learn and remember visual information with verbal labels. | 6 (2) | 2, 6 | D
Narrative Memory | 3–16 | A story recall task that involves the examiner reading a story to the child, followed by immediate free recall, immediate cued recall, and immediate recognition (for ages 3–10 only). | Trouble with verbal learning for contextual information, comprehension, or immediate memory for larger amounts of verbal information. | 6–11 | 1, 6, 9 | A, B, C, D
Sentence Repetition | 3–6 | Sentences are aurally presented to the child. The child recites each sentence to the examiner immediately after it is presented. | Difficulty with verbal immediate (working) memory. | 4 | 4, 5, 6, 8 | D
Word List Interference | 7–16 | Involves two aurally presented series of words that are each repeated after presentation. The child is then asked to recall both series of words in order. | Poor verbal working memory and difficulty with verbal interference. | 6–8 | 1, 2, 3, 4, 6, 9 | New

Sensorimotor

Fingertip Tapping | 5–16 | Tapping test that assesses motor speed and finger dexterity. | Poor fine motor control and motor programming. | 3–4 | 5, 7, 9 | A, B, D
Imitating Hand Positions | 3–12 | Involves the child copying complex hand/finger positions demonstrated by the examiner. | Trouble with fine motor programming, differentiation, or visuospatial abilities. | 5 | 6, 7, 9 | None
Manual Motor Sequences | 3–12 | Involves the imitation of rhythmic hand movement sequences. | Difficulty with manual motor programming. | 7 | 2, 4, 7 | None
Visuomotor Precision | 3–12 | A paper-and-pencil task involving timed eye-hand coordination in which the child is asked to rapidly trace a path on paper without crossing any lines. | Poor psychomotor processing speed, visual attention, motor control, and coordination. | 3–4 | 3, 5, 7, 8, 9 | A, B, C

Social Perception

Affect Recognition | 3–16 | Child is asked to recognize affect (e.g., happy, sad, anger) from photographs of children’s faces in various tasks. The tasks progress from affect identification to recognition memory for affect. | Trouble with recognition and discrimination of facial affect. | 5–7 | 5, 9 | New
Theory of Mind | 3–16 | Involves showing a child pictures and asking questions, reading brief passages about people’s experiences and asking questions, comprehension of abstract phrases, and matching of facial expression (feeling) to a person’s experience. | Difficulty with comprehending perspectives, experiences, and beliefs of others. | 10–13 | 4, 9 | New

Visuospatial Processing

Arrows | 5–16 | Judgment of line orientation involving selection of the arrow(s), from a number of arrows with different orientations, that point(s) to the center of a target. | Reduced visuospatial abilities, trouble judging line orientation and angles. | 5–7 | – | A, B, C, D
Block Construction | 3–16 | Requires the child to reproduce three-dimensional block constructions using models or two-dimensional pictures, using unicolored blocks. | Poor visuoconstructional abilities, difficulty with three-dimensional tasks. | 7–11 | 3, 7, 8, 9 | B, C, D
Design Copying | 3–16 | Motor and visual-perceptual test that involves the child copying two-dimensional geometric designs of increasing difficulty on paper. | Poor visuoconstructional abilities, difficulty with two-dimensional drawing tasks. | 7–10 | 1–9 | A, B, C, D
Geometric Puzzles | 3–16 | Assesses mental rotation, visuospatial analysis, and attention to detail. Child matches two shapes outside of a grid to two of several shapes inside grid. | Reduced visuospatial abilities, trouble with perception, or mental rotation. | 8–12 | 1, 3, 4, 7, 9 | New
Picture Puzzles | 7–16 | Test of visual discrimination, spatial localization, and visual scanning that involves identifying the location of smaller parts of a picture on a grid for the larger picture. | Problems with visual perception, visual attention, and scanning. | 9–14 | 2, 3 | New
Route Finding | 5–12 | Visual-spatial task involving finding the correct route leading to a target on a map. | Difficulty with visuospatial relations, poor orientation. | 4 | – | None

¹ This is only a suggestion for clinicians. There are several reasons for a child to obtain a low score on a test. These potential reasons must be carefully considered when interpreting test performance on the NEPSY-II.
Times in parentheses are for delayed portions of subtests.
Referral batteries: 1 = General battery; 2 = Learning differences-reading; 3 = Learning differences-math; 4 = Attention/concentration; 5 = Behavior management; 6 = Language delays/disorders; 7 = Perceptual/motor delays/disorders; 8 = School readiness; 9 = Social/interpersonal.
Changes from NEPSY to NEPSY-II: New = new subtest for NEPSY-II; None = no changes from NEPSY to NEPSY-II; A = changes to administration; B = changes to recording and scoring; C = new items added; D = change to age range.

effects¹ (see page 42 in Korkman et al., 2007b). However, if a clinician wants to have a
specific battery of subtests for testing children with specific disorders, then the authors suggest
nine referral batteries that can be used for planning an assessment (new to NEPSY-II).
The NEPSY-II referral batteries include: General Referral (i.e., recommended for
most assessments and consists of the most “clinically sensitive” subtests from all but the
social perception domain); Learning Differences – Reading (i.e., recommended when the
child is referred for poor reading skills, poor reading achievement, or difficulty learning to
read); Learning Differences – Mathematics (i.e., recommended when the child is referred
for poor math skills, poor achievement in math, or difficulty learning mathematical
concepts); Attention/Concentration (i.e., recommended when a child’s problems are associated
with poor attention or distractibility); Behavior Management (i.e., recommended when
a child exhibits severe problems with behavioral control); Language Delays/Disorders (i.e.,
recommended when a child is referred because of delayed language development or problems
with language use); Perceptual/Motor Delays/Disorders (i.e., recommended when
there is delayed motor development or visuospatial impairment); School Readiness (i.e.,
recommended for 3- to 6-year-olds to help identify cognitive delay that might interfere
with entry into school); and Social/Interpersonal (i.e., recommended when a child exhibits
significant abnormal behavior, delayed or impaired social skills, or social isolation). The
specific subtests included in these nine referral batteries are listed in Table 1 of this review
and are readily identified when using the NEPSY-II computerized scoring assistant and
assessment planner software. Two Visuospatial Processing subtests, Arrows and Route
Finding, are not included in the author-suggested batteries.
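
To illustrate how the referral-battery column of Table 1 can be used when planning an assessment, the following is a minimal sketch that looks up which subtests belong to a given battery. It is not the NEPSY-II assessment planner software; the names `SUBTEST_BATTERIES`, `subtests_for`, and `unassigned` are hypothetical, and only a small subset of the Table 1 rows is entered.

```python
# Referral battery numbers and names, as given in the note to Table 1.
REFERRAL_BATTERIES = {
    1: "General battery",
    2: "Learning differences-reading",
    3: "Learning differences-math",
    4: "Attention/concentration",
    5: "Behavior management",
    6: "Language delays/disorders",
    7: "Perceptual/motor delays/disorders",
    8: "School readiness",
    9: "Social/interpersonal",
}

# A SUBSET of Table 1 rows (subtest -> referral batteries). Illustrative only;
# a complete lookup would need all 32 subtests entered from Table 1.
SUBTEST_BATTERIES = {
    "Animal Sorting": {5, 9},
    "Clocks": {4, 5, 7},
    "Statue": set(range(2, 10)),          # listed as 2-9
    "Theory of Mind": {4, 9},
    "Design Copying": set(range(1, 10)),  # listed as 1 thru 9
    "Arrows": set(),                      # not in any suggested battery
    "Route Finding": set(),               # not in any suggested battery
}


def subtests_for(battery: int) -> list[str]:
    """Return the (subset of) subtests listed under one referral battery."""
    if battery not in REFERRAL_BATTERIES:
        raise ValueError(f"unknown referral battery: {battery}")
    return sorted(name for name, b in SUBTEST_BATTERIES.items() if battery in b)


def unassigned() -> list[str]:
    """Return subtests in the lookup that appear in no referral battery."""
    return sorted(name for name, b in SUBTEST_BATTERIES.items() if not b)


if __name__ == "__main__":
    print(REFERRAL_BATTERIES[9], "->", subtests_for(9))
    print("Not in any battery ->", unassigned())
```

Storing each subtest's batteries as a set keeps the lookup symmetric: the same table answers both "which subtests are in this battery" and "which batteries include this subtest".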
In addition to the outlined revision goals, the clinical and interpretive manual introduces
a new role for the NEPSY-II, namely, that of a tool designed to assist in assessing
school-based problems by school psychologists (see page 2 of the manual). The proposed
referral batteries and the clinical group studies are reflective of this new role. Specifically,
the manual states that neuropsychological tests greatly enhance standard assessment practices
used by school psychologists in the context of psychoeducational assessments. The
authors stress that examiners should have training in administering and scoring neuropsychological
tests and that interpretative inferences be restricted to a level consistent with
background and training. The authors suggest that in cases of identifying brain injury or
determining cognitive consequences of neurological conditions, the examiner should have
experience in performing such evaluations or should refer the child to a neuropsychologist.

NORMATIVE SAMPLE
The NEPSY-II normative sample is a national, stratified random sample consisting
of 1,200 preschoolers, children, and adolescents between the ages of 3 and 16 years,
collected between 2005 and 2006. There were 100 children (50 boys, 50 girls) in each of
12 age groups: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13–14, and 15–16 years of age. For ages 3 to
12 years, each age group contained 50 children in the first six months and 50 children in
the second six months of the year. For adolescents between 13 and 16 years, there was n =
50 for each year. Stratification by age, race/ethnicity, geographic location, and parental
education was based on the October 2003 United States census survey. Exclusion criteria
included diagnosis of a number of conditions that could potentially affect scores,

¹ Unfortunately, psychometric evidence to support this position is not presented in the NEPSY-II Clinical
and Interpretive Manual.

including neurological, learning, sensory/motor and psychiatric disorders, recent history
of previous testing, and medication usage that might potentially impact performance.

SCORES AND CLASSIFICATION OF PERFORMANCE


The NEPSY-II provides four different types of scores. Primary scores are presented as
age-adjusted scaled scores (mean = 10, standard deviation = 3) and represent the central clinical
aspect of the subtest. For example, the primary score for the Animal Sorting test is the
Total Correct Sorts scaled score, which represents a child’s ability to formulate concepts, to
sort cards based on those concepts, and to shift set from one concept to another. In addition,
there are combined scaled scores for subtests, which are also considered primary scores. Combined
scores are created by combining two measures within a subtest, for example Animal
Sorting Total Correct and Animal Sorting Total Errors, to produce a combined age-adjusted
scaled score. The second type of score is the process score. Process scores assess more
specific abilities, skills, or error rates from a subtest and can be presented as scaled scores, percentile
ranks, or cumulative percentages. Examples include the Animal Sorting Total
Repeated Sort Errors and the Total Novel Sort Errors. The third type of score is a contrast
score. Contrast scores are presented as scaled scores and allow for a statistical comparison
between high and low abilities. The fourth type of score is behavioral observations, which
quantify behaviors occurring during the assessment that are common in clinical populations
but uncommon in healthy children. Behavioral observations are presented as percentile ranks
or cumulative percentages due to non-normal distributions (i.e., positively skewed).
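
Because scaled scores are defined with a mean of 10 and a standard deviation of 3, the relationship between a scaled score and a percentile rank can be sketched as follows. This is a minimal illustration that assumes a normal distribution; the function names are hypothetical, and published conversion tables may round or smooth values differently from the theoretical figures computed here.

```python
from statistics import NormalDist

# Scaled-score metric stated in the text: mean = 10, standard deviation = 3.
SCALED_MEAN = 10.0
SCALED_SD = 3.0


def scaled_to_z(scaled_score: float) -> float:
    """Convert a scaled score to a z score (distance from the mean in SD units)."""
    return (scaled_score - SCALED_MEAN) / SCALED_SD


def scaled_to_percentile(scaled_score: float) -> float:
    """Theoretical percentile rank, ASSUMING scores are normally distributed."""
    dist = NormalDist(mu=SCALED_MEAN, sigma=SCALED_SD)
    return 100.0 * dist.cdf(scaled_score)


if __name__ == "__main__":
    for score in (4, 7, 10, 13, 16):
        z = scaled_to_z(score)
        pct = scaled_to_percentile(score)
        print(f"scaled {score:>2}: z = {z:+.2f}, percentile = {pct:.1f}")
```

This conversion applies only to scores reported on the scaled-score metric; process scores and behavioral observations that are reported as percentile ranks or cumulative percentages have skewed distributions, so the normal-curve assumption does not hold for them.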
Suggested interpretive classifications for the NEPSY-II standard scores differ somewhat
from other interpretive methods (for comparison to the Wechsler system, see Table 2).
For example, performance above the 75th percentile is referred to as “above expected

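Table 2 Classification Descriptors for Scaled Score Performance on the NEPSY-II Compared to Wechsler Classifications

Scaled Score | Percentile Rank | NEPSY-II Classification | Wechsler Classification
19 | 99.9 | Above expected level | Very superior
18 | 99.6 | Above expected level | Very superior
17 | 98.6 | Above expected level | Very superior
16 | 97.7 | Above expected level | Very superior
15 | 95 | Above expected level | Superior
14 | 91 | Above expected level | Superior
13 | 84 | Above expected level | High average
12 | 75 | At expected level | Average
11 | 63 | At expected level | Average
10 | 50 | At expected level | Average
9 | 37 | At expected level | Average
8 | 25 | At expected level | Average
7 | 16 | Borderline | Low average
6 | 9 | Borderline | Borderline
5 | 5 | Below expected level | Borderline
4 | 2.3 | Below expected level | Extremely low
3 | 1.4 | Well below expected level | Extremely low
2 | 0.4 | Well below expected level | Extremely low
1 | 0.1 | Well below expected level | Extremely low

Note. Scaled scores have a mean = 10 and standard deviation = 3. Percentile ranks corresponding to the scaled
scores are based on the Wechsler classification.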

level” and there is no descriptive differentiation of any scores above that level. This
absence of different classifications for higher scores may limit the qualitative reporting of
relative strengths in cognitive abilities and might cause clinicians to focus more on impairment,
rather than both strengths and weaknesses. There are also differences between the
NEPSY-II and other classification methods in the suggested descriptions of scores below
the 25th percentile. For example, a scaled score of 7 (16th percentile) is called “borderline”
by the NEPSY-II but low average by the Wechsler system.
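
The difference between the two labelling systems can be made concrete with a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the band boundaries are an assumption based on our reading of Table 2 in this review, so they should be checked against the respective test manuals before any clinical use.

```python
def nepsy2_classification(scaled: int) -> str:
    """NEPSY-II descriptor for a scaled score (bands as read from Table 2)."""
    if not 1 <= scaled <= 19:
        raise ValueError("scaled scores range from 1 to 19")
    if scaled >= 13:
        return "Above expected level"   # no finer labels above this band
    if scaled >= 8:
        return "At expected level"
    if scaled >= 6:
        return "Borderline"
    if scaled >= 4:
        return "Below expected level"
    return "Well below expected level"


def wechsler_classification(scaled: int) -> str:
    """Wechsler-style descriptor for a scaled score (bands as read from Table 2)."""
    if not 1 <= scaled <= 19:
        raise ValueError("scaled scores range from 1 to 19")
    if scaled >= 16:
        return "Very superior"
    if scaled >= 14:
        return "Superior"
    if scaled == 13:
        return "High average"
    if scaled >= 8:
        return "Average"
    if scaled == 7:
        return "Low average"
    if scaled >= 5:
        return "Borderline"
    return "Extremely low"


if __name__ == "__main__":
    for score in (17, 13, 7, 5):
        print(score, "|", nepsy2_classification(score), "|", wechsler_classification(score))
```

Running the example for scaled scores of 13 and 17 shows the point made in the text: the NEPSY-II label is identical for both, whereas the Wechsler labels differ.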

RELIABILITY EVIDENCE
The NEPSY-II manual presents a voluminous amount of information on subtest reliabilities.
Although this is a clear asset for understanding the psychometric properties of the subtests,
the sheer amount of information presented can make it challenging to obtain an overall
impression of the reliability evidence for all of the scores, particularly for the different age
groups. Below, we present some summary information about these reliability estimates in order
to guide users as to which subtests and scores may have better reliability evidence over others.
Evidence for internal reliability refers to the consistency of measurement of a given
score. Overall, internal reliability evidence for the NEPSY-II is impressive. Table 4.1 in
the NEPSY-II Clinical and Interpretive Manual (Korkman et al., 2007b) shows that across
the age groups, the internal reliability coefficients are for the most part adequate to very
high. In the youngest children (3–4 years), only the Speeded Naming Combined Scale
Score has very high internal reliability (r = .93). In the older age groups, the number of
measures with very high internal reliability increases to six in 5- to 6-year-olds and to
eight in 7- to 12-year-olds, but only four subtests in the 13- to 16-year-olds have very high
internal reliability. There are, however, a few measures with marginal or low internal consistency
within each age group (e.g., Word Generation Total Score in 3- to 4-year-olds,
Memory for Faces Total score in 5- to 6-year-olds, Design Fluency in 7- to 12-year-olds,
and Narrative Memory Free Recall in 13- to 16-year-olds).
Internal reliability coefficients are also presented in the manual for a mixed clinical
sample (see Figure 1). In the 5- to 6-year-olds with clinical diagnoses, internal reliability
coefficients are r = .80 or higher for all of the subtest primary and process scores, and 8 of
the 11 scores have coefficients greater than r = .90. In the 7- to 12-year-olds, 14 out of 15
scores have internal reliability coefficients that are r = .80 or greater, although the
reliability coefficient for Word List Interference Recall in 7- to 12-year-olds is only marginal
(r = .67). Across those primary and process scores with low internal reliability in the
healthy standardization sample, all but one score (i.e., Word List Interference Recall) had
high (r ≥ .80) or very high (r ≥ .90) internal reliability in the clinical sample.
Test-retest reliability refers to the stability of test performance over time. As with
internal reliability, the NEPSY-II manual presents a large amount of test-retest reliability
information. As an example, for ages 9:0 to 10:11, there are reliability estimates for 76
different scores, either in the form of test-retest correlations (37 scores) or decision consistency
estimates (39 scores). The test-retest sample consists of 165 healthy children
who were re-administered the battery after a mean retest interval of 21 days (range = 12–51 days).
Test-retest reliability is presented in different ways in the manual, depending on the
distribution of scores. Correlations are used for subtests that have normal distributions and
decision consistency (percent agreement) is used for subtests that have skewed distribu-
tions. Decision consistency estimates are based on percentage agreement for two categories
(i.e., was the child in the same classification range at test and retest, defined as ≤10th, or

Figure 1 Internal reliability of NEPSY-II primary and process scaled scores in the clinical samples.
Figure notes: Subtest abbreviations include: CL = Clocks; INN = Inhibition-Naming; INI = Inhibition-
Inhibition; INS = Inhibition-Switching; CI = Comprehension of Instructions; PH = Phonological Processing;
MD = Memory for Designs; SR = Sentence Repetition; WIRP = Word List Interference-Repetition; WIRC =
Word List Interference-Recall; AR = Affect Recognition; TM = Theory of Mind; AW = Arrows; BC = Block
Construction; DCP = Design Copying Process Total Score; GP = Geometric Puzzles; PP = Picture Puzzles.

>10th percentile, or ≤ or > a scaled score of 6). These percentages can be conceptualized as
providing information on whether deficits, as detected by the NEPSY-II at Time 1, are
reproducible at Time 2. Decision consistency estimates are considered a less rigorous test
of reliability than correlation coefficients, particularly when only two clinical classifications
(impaired versus not impaired) are being considered. For this reason, scores that only
have decision consistency reliability estimates as evidence of stability (see Table 4.5 in
Korkman et al., 2007b) should probably be interpreted with caution in diagnostic situations.
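As a rough illustration of the two-category percent agreement described above, the sketch below counts how many children fall on the same side of a cutoff at both sessions. The function name, the example scores, and the default cutoff (a scaled score of 6) are assumptions made for illustration; the manual's exact procedure may differ.

```python
def decision_consistency(time1, time2, cutoff=6):
    """Percent of examinees classified the same way at Time 1 and Time 2.

    A score counts as "low" when it is <= cutoff (assumed here: scaled score of 6).
    """
    if len(time1) != len(time2) or len(time1) == 0:
        raise ValueError("time1 and time2 must be non-empty and the same length")
    # Compare the low/not-low classification at each session, child by child.
    same = sum((a <= cutoff) == (b <= cutoff) for a, b in zip(time1, time2))
    return 100.0 * same / len(time1)


# Hypothetical scaled scores for five children (not data from the manual).
time1_scores = [4, 7, 10, 6, 12]
time2_scores = [5, 6, 11, 8, 13]
agreement = decision_consistency(time1_scores, time2_scores)
print(f"Decision consistency: {agreement:.0f}%")
```

Because only the side of the cutoff matters, a child who moves from a scaled score of 1 to a scaled score of 6 counts as fully consistent, which is one reason these estimates are a weaker test of stability than a correlation.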
Overall, test-retest reliability correlations for many NEPSY-II subtests are
adequate to high. Table 3 presents the retest reliability correlations, with light gray shading to
indicate marginal reliability and dark gray shading to indicate adequate or better reliability.
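The shading rule can be expressed as a simple lookup. The sketch below applies the cutoffs given in the Table 3 note (r ≥ .70 acceptable or better, r = .60 to .69 marginal) to one row of Table 3. The function name and the label for values below .60 are assumptions, since the note does not name that range.

```python
def classify_retest(r):
    """Label a retest correlation using the cutoffs stated in the Table 3 note."""
    if r >= 0.70:
        return "acceptable or better"
    if r >= 0.60:
        return "marginal"
    return "below marginal"  # assumed label; the note leaves this range unnamed


# Imitating Hand Positions Total, ages 3-4 through 11-12 (values from Table 3).
imitating_hand_positions = {
    "3-4": 0.72, "5-6": 0.71, "7-8": 0.21, "9-10": 0.52, "11-12": 0.33,
}
labels = {age: classify_retest(r) for age, r in imitating_hand_positions.items()}
for age, label in labels.items():
    print(age, label)
```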

PRACTICE EFFECTS
Practice effects can be calculated for the test-retest sample for those subtests with nonskewed
distributions, based on the differences between mean performance at Time 1 and
Time 2 presented in the manual. The largest practice effects on the NEPSY-II are found in the
Memory and Learning domain. In the Attention and Executive Functioning domain, the largest
practice effects are for the Inhibition Switching Combined Scaled score; other Inhibition scores
appear to have relatively minor practice effects. Other domains, such as Language, Sensorimotor,
Social Perception, and Visuospatial Processing, have only small to negligible practice effects. For
scores with skewed distributions, the relative size of practice effects across subtests cannot be
directly compared because these scores are not in a standardized score format.
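A minimal sketch of this calculation follows. It assumes scores on the conventional scaled-score metric (mean of 10, standard deviation of 3) so that the gain can also be expressed in standard deviation units. The function name and the two means are hypothetical and are not taken from the manual.

```python
def practice_effect(mean_time1, mean_time2, sd=3.0):
    """Return (raw gain, gain in SD units) between two session means.

    sd defaults to 3.0, the assumed standard deviation of scaled scores.
    """
    gain = mean_time2 - mean_time1
    return gain, gain / sd


# Hypothetical Time 1 and Time 2 means for one subtest.
gain, gain_sd = practice_effect(10.0, 11.5)
print(f"Gain: {gain} scaled-score points ({gain_sd} SD)")
```

Expressing the gain in standard deviation units is what allows practice effects to be compared across subtests, and it is the step that is unavailable for scores reported only as percentile ranges.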

VALIDITY EVIDENCE
Test validity may be defined at its most basic level as the degree to which a test
measures what it is intended to measure. A test cannot be said to have one single level of

Table 3 Test-Retest Reliability for NEPSY-II Primary and Process Scaled Scores

Age Groups
Domains and Subtest Scores 3–4 5–6 7–8 9–10 11–12 13–16
Attention and Executive Functioning
Animal Sorting Total Correct Sorts – .59 .63 .73 .71 .59
Auditory Attention Total Correct – – .42 .62 .58 –
Response Set Total Correct – – .84 .53 .58 –
Clocks Total – – .73 .70 .78 .64
Design Fluency Total – .59 .57 .68 .57 –
Inhibition-Naming Total Completion Time – .81 .82 .74 .79 .87
Inhibition-Inhibition Total Completion Time – .79 .81 .66 .80 .82
Inhibition-Switching Total Completion Time – – .82 .78 .75 .93
Inhibition Total Errors – .77 .66 .57 .33 .76
Statue Total .81 .79 – – – –
Language
Body Part Naming Total Score .70 – – – – –
Body Part Identification Total Score .77 – – – – –
Comprehension of Instructions Total .82 .80 .79 .71 .84 .75
Phonological Processing Total .60 .88 .82 .78 .80 .87
Repetition of Nonsense Words Total – .87 .70 .72 .65 –
Speeded Naming Total Completion Time .82 .72 .91 .85 .79 .89
Word Generation Semantic Total – – – – – .84
Word Generation Initial Letter – – – – – .54
Memory and Learning
List Memory and List Memory Delayed Total Correct – – .60 .71 .64 –
Memory for Designs Content .44 .78 .81 .65 .75 .69
Memory for Designs Spatial .64 .64 .69 .62 .63 .48
Memory for Designs Total .62 .71 .83 .56 .69 .65
Memory for Designs Delayed Content – .64 .61 .60 .62 .82
Memory for Designs Delayed Spatial – .74 .73 .58 .65 .63
Memory for Designs Delayed Total – .76 .72 .51 .72 .60
Memory for Faces Total – .46 .53 .75 .57 .73
Memory for Faces Delayed Total – .59 .47 .69 .69 .82
Memory for Names Total – – – – – .67
Memory for Names Delayed Total – – – – – .53
Memory for Names and Memory for Names Delayed Total – – – – – .70
Narrative Memory Free & Cued Recall Total .75 .72 .79 .65 .78 .83
Narrative Memory Free Recall Total – .61 .76 .61 .64 .76
Sentence Repetition Total .74 .77 – – – –
Word List Interference Repetition Total – – .62 .74 .73 .87
Word List Interference Recall Total – – .89 .63 .62 .60
Sensorimotor
Imitating Hand Positions Total .72 .71 .21 .52 .33 –
Visuomotor Precision Total Completion Time .68 .60 .65 .81 .72 –
Social Perception
Affect Recognition Total Score .61 .58 .50 .52 .55 .58
Theory of Mind Total Score .70 .77 – – – –
Visuospatial Processing
Arrows Total Score – .60 .62 .51 .65 .83
Block Construction Total Score .62 .70 .77 .80 .76 .79
Design Copying Process Motor Score .70 .51 .67 .69 .60 .69
Design Copying Process Global Score .64 .67 .71 .52 .64 .74
Design Copying Process Local Score .66 .52 .67 .57 .59 .62
Design Copying Process Total Score .81 .74 .78 .62 .69 .75
Geometric Puzzles Total Score – – .65 .63 .66 .89
Picture Puzzles Total Score – – .79 .71 .76 .91

Note. Values are uncorrected Pearson correlations for time 1 and time 2 scores. Bold values in dark grey boxes
indicate that the retest correlation is “acceptable” or better (i.e., r ≥ .70). Retest correlations with light grey
shading are marginal (r = .60 – .69).

validity, but rather it can be said to possess various types and levels of validity across a spec-
trum of usage and populations. The NEPSY-II manual (Korkman et al., 2007b) provides
several lines of validity evidence (i.e., content, construct, concurrent, and clinical validity).

NEPSY-II Domains and Subtest Intercorrelations


Clinicians are often most interested in whether the subtests that are theoretically
grouped together within a domain actually correlate with each other. This is also a basic
component of validity. According to the authors, the subtests comprising each of the six
domains were selected a priori based on theoretical grounds, previous research with the
1998 NEPSY, clinical experience, and a review of the literature. It was also determined a
priori that the subtests within each domain would have low correlations because they are
measuring varied (yet also theoretically related) cognitive abilities.
The NEPSY-II manual presents a daunting intercorrelation matrix (Table 5.1; Korkman
et al., 2007b). On the Attention and Executive Functioning subtests, medium to large
intercorrelations (see Footnote 2) were present for some scores, particularly for different components of a
subtest. For example, medium to large correlations were reported for the various components
of the Inhibition subtests (e.g., Naming, Inhibition, and Switching). Other than the medium
correlations between Auditory Attention and Response Set, and Clocks and Inhibition Total
Errors, all other intercorrelations for the Attention and Executive Functioning subtests were
negligible to small. On the Language domain, Body Part Naming and Body Part Identifica-
tion had a large intercorrelation (r = .72) and Word Generation Semantic and Initial Letter
had a medium correlation (r = .46). Comprehension of Instructions had medium correlations
with Body Part Naming, Body Part Identification, and Phonological Processing. Phonologi-
cal Processing had a medium correlation with Body Part Identification (r = .34).
Memory and Learning subtests had medium to large correlations between the imme-
diate and delayed portions of subtests (e.g., Memory for Designs and Memory for Designs
Delayed). Word List Interference Recall and Word List Interference Repetition had a
medium correlation (r = .44). Narrative Memory also had medium correlations with Sentence
Repetition (r = .38) and Word List Interference Repetition (r = .30). The remaining
intercorrelations for the Memory and Learning subtests were negligible to small (e.g.,
the correlation between delayed components for Memory for Designs and Memory for
Faces, both involving memory for visual information, was r = .18). On the Sensorimotor
domain, the various components of the Fingertip Tapping subtest (i.e., Dominant Hand, Nondominant
Hand, Repetitions, and Sequences) had medium to large correlations. Visuomotor
Precision did not correlate with the various components of the Fingertip Tapping subtest
(rs < .06). Affect Recognition and Theory of Mind, the two subtests included in the Social
Perception domain, only had a small intercorrelation (r = .21). Most of the Visuospatial
Processing subtests had medium-sized intercorrelations.
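The size labels used in this section follow the classification adopted for this review (negligible, r < .10; small, r = .10 to .29; medium, r = .30 to .49; large, r = .50 to 1.0). The sketch below applies those bins to a few of the intercorrelations reported above. The function name is an assumption, as is the choice to treat the bins as continuous and to classify on the absolute value of r.

```python
def classify_correlation(r):
    """Label a correlation using this review's size categories."""
    size = abs(r)  # assumption: direction is ignored when judging size
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "small"
    if size < 0.50:
        return "medium"
    return "large"


# Intercorrelations reported in the text above.
reported = {
    "Body Part Naming / Body Part Identification": 0.72,
    "Word Generation Semantic / Initial Letter": 0.46,
    "Affect Recognition / Theory of Mind": 0.21,
    "Memory for Designs Delayed / Memory for Faces Delayed": 0.18,
}
sizes = {pair: classify_correlation(r) for pair, r in reported.items()}
for pair, size in sizes.items():
    print(f"{pair}: {size}")
```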

NEPSY-II and NEPSY


A sample of 109 children (mean age = 8.04 years, range = 3–12 years) were administered
the NEPSY and NEPSY-II with a mean test interval of 20 days (range = 11–53 days).
Most of the correlations between the NEPSY and NEPSY-II Attention and Executive
Functioning subtests were small, although there was a medium correlation for Response Set
Combined Scaled score (r = .38) and a large correlation for Statue (r = .60). Correlations for
the Language subtests across the two versions were large (r = .60 – .78). On the Memory and
Learning domain, the correlation for Memory for Faces was medium to large (r = .49, immediate;
r = .55, delayed). Narrative Memory and Sentence Repetition from the NEPSY had
large correlations with their NEPSY-II counterparts. Correlations for Fingertip Tapping were
large and Visuomotor Precision correlations were medium (r = .32 for Completion Time) to
large (r = .60 for Combined Scaled score). On the Visuospatial Processing domain, correlations
between NEPSY and NEPSY-II were large for Arrows (r = .61) and Block Construction
(r = .59) and ranged from medium to large for the various Design Copying elements.

Footnote 2. For the purpose of this review, correlations are classified as negligible (r < .10),
small (r = .10 – .29), medium (r = .30 – .49), and large (r = .50 – 1.0).

Correlations with IQ Tests


A sample of 51 children received both the NEPSY-II and the Wechsler Intelligence
Scale for Children – Fourth Edition (WISC-IV; Wechsler, 2003) (mean age = 11.65,
range = 6–16; mean test interval = 11 days, range = 1–63). The WISC-IV Verbal Compre-
hension Index (VCI) had medium to large correlations with Language subtests from the
NEPSY-II. There were also medium to large correlations between VCI and Animal Sorting (r =
.41), Narrative Memory (r = .57 – .58), Word List Interference Repetition (r = .49), and
Picture Puzzles (r = .31), demonstrating good construct overlap between NEPSY-II sub-
tests measuring verbal skills and the VCI.
The WISC-IV Perceptual Reasoning Index (PRI) had medium to large correlations
with the Visuospatial Processing subtests. However, the PRI also had substantial correlations
with nearly all of the NEPSY-II subtests, including some subtests with minimal visual-
spatial components. The WISC-IV Working Memory Index (WMI) had medium-sized
correlations with Inhibition Naming, Inhibition Switching, as well as a large correlation with
Word List Interference Repetition (r = .63). WMI also had medium correlations with
Phonological Processing and Speeded Naming subtests, supporting the validation of these
NEPSY-II subtests as measuring aspects of attention and executive functioning. The WISC-IV
Processing Speed Index had medium-sized correlations with Clocks, the Inhibition subtests,
Language subtests, Narrative Memory, Word List Interference Recall, Nondominant Hand
Fingertip Tapping, Visuomotor Precision Combined Scaled score, and Block Construction.
A sample of 62 children between 4 and 16 years (mean age = 10.19 years) were also
administered the Wechsler Nonverbal Scale of Ability (WNV; Wechsler & Naglieri, 2006)
within a mean time of 148 days (range = 1–375 days) of the NEPSY-II. The WNV four-
subtest full-scale score correlated most strongly with Visuospatial Processing (rs ranged
from .54 to .63). Most of the other NEPSY-II subtests had at least medium correlations
with the WNV, with Phonological Processing having a large correlation (r = .53).
The Differential Ability Scales - Second Edition (DAS-II; Elliott, 2007) was co-
administered with the NEPSY-II to a sample of 242 children between 3 and 16 years
(mean age = 8.40; mean test interval = 250 days, range = 13–468 days). The DAS-II
General Conceptual Ability composite score correlated most strongly with Comprehen-
sion of Instructions and Sentence Repetition. Medium-sized correlations were reported
between the General Conceptual Ability score and Animal Sorting, Clocks, Inhibition
Switching, Phonological Processing, Speeded Naming, Memory for Designs, Narrative
Memory, Word List Interference Recall, Social Perception subtests, and Visuospatial
Processing subtests. The correlations between the NEPSY-II and the DAS-II Nonverbal
Composite score were largely similar to the correlations with the General Conceptual
Ability score. The Nonverbal Composite score had medium correlations with measures of
receptive language, expressive language, and verbal memory. Neither the composite
scores nor the cluster scores had strong correlations with Sensorimotor measures.

Correlations with Achievement


In a sample of 81 children (mean age = 8.83, range = 5–12; mean test interval = 8
days, range = 1–49 days) who were given the NEPSY-II and the Wechsler Individual
Achievement Test-Second Edition (WIAT-II; Wechsler, 2002), correlations were strongest
between WIAT-II composite scores and Sentence Repetition. Strong correlations were
also present between the WIAT-II composite scores and Clocks, Comprehension of
Instructions, Phonological Processing, and Narrative Memory.

Correlations with Other Cognitive and Neuropsychological Tests


The NEPSY-II and the Children’s Memory Scale (CMS; Cohen, 1997) were admin-
istered to a sample of 43 children between 5 and 16 years of age (mean age = 11.0; mean
test interval = 8 days, range = 1–28 days). Correlations between the analogous CMS and
NEPSY-II memory subtests were medium to high, with the strongest correlations between
the story tests (r = .50– .61).
A sample of 49 children between 9 and 16 years (mean age = 12.84; test interval =
1–57 days) were administered the NEPSY-II and four subtests from the Delis-Kaplan
Executive Functions System (D-KEFS; Delis, Kaplan, & Kramer, 2001). D-KEFS Trail
Making Test and Design Fluency correlated most strongly with Visuomotor Precision.
There were also notable correlations for both D-KEFS tests with Animal Sorting, Inhibi-
tion, Block Construction, and Picture Puzzles. D-KEFS Color-Word Interference corre-
lated most strongly with Inhibition but also with Response Set, Clocks, Comprehension of
Instructions, Memory for Designs, Word List Interference, and Visuomotor Precision.
D-KEFS Verbal Fluency correlated highest with Word Generation and had medium to
high correlations with the other Language subtests, Animal Sorting (but not other tests in
the Attention and Executive Functioning domain), Memory for Names Delayed, Word
List Interference Recall, Visuomotor Precision, Arrows, and Block Construction.

Clinical Studies
Clinical studies from the NEPSY-II manual involved 10 groups, including: ADHD;
reading disorder; mathematics disorder; language disorder; intellectual disability; autistic
disorder; Asperger’s disorder; deaf and hard of hearing; emotionally disturbed; and TBI.
These studies involved comparing each clinical sample to a control sample, matched on
age, sex, race/ethnicity, and parent-education level.
The clinical subgroups were administered up to 22 of the 32 subtests (see Footnote 3) from the
NEPSY-II. Clinical group performance is presented in Table 4 using Cohen’s d effect

Footnote 3. Subtests administered included: Animal Sorting; Auditory Attention and Response Set; Clocks;
Inhibition; Statue; Comprehension of Instructions; Phonological Processing; Speeded Naming; Memory for Designs;
Memory for Faces; Narrative Memory; Sentence Repetition; Word List Interference; Fingertip Tapping; Visuo-
motor Precision; Affect Recognition; Theory of Mind; Arrows; Block Construction; Design Copying; Geometric
Puzzles; and Picture Puzzles. Some subtests were not administered to all clinical groups. For example, Statue
was only administered to the Language Disorder group.

Table 4 Performance on the NEPSY-II in Clinical Groups Compared to Matched Controls: Effect Sizes

Domains/Subtest Scores (columns, left to right): ADHD; Reading Disorder; Mathematics Disorder; Language Disorder; Intellectual Disability; Autistic Disorder; Asperger’s Disorder; Deaf & Hard of Hearing; Emotionally Disturbed
Sample size 55 36 20 29 20 23 19 18 30
Mean age 9.9 10.2 11.2 7.5 9.5 8.7 12.1 8.1 10.0
SD 1.8 1.4 1.4 2.0 2.2 2.8 2.2 2.3 1.7
Attention and Executive Functioning
Animal Sorting Total Correct Sorts .25 .28 .19 .59 1.82 2.18 .61 −.13 .74
Animal Sorting Combined Scaled Score .32 .42 .20 .74 1.84 2.47 .51 −.22 .85
Auditory Attention Total Correct .46 .10 .30 .75 1.15 .73 .56 .83 .37
Auditory Attention Combined Scaled Score .54 .14 .27 .70 .94 .79 .81 .86 .32
Response Set Total Correct .73 .27 .86 1.27 1.53 .97 .52 – .63
Response Set Combined Scaled Score .70 .06 .99 .93 1.32 1.02 .82 – .65
Clocks Total Score .43 .51 .69 .81 2.02 .77 .62 −.47 .57
Inhibition: Naming Total Completion Time .54 .55 .31 .99 1.26 .38 .78 .43 .40
Inhibition: Naming Combined Scaled Score .42 .32 .29 1.04 1.23 .17 .27 −.48 .53

Inhibition: Inhibition Total Completion Time .64 .08 −.07 .92 1.09 1.55 .86 .21 .52
Inhibition: Inhibition Combined Scaled Score .58 .27 1.14 1.31 1.26 .47 .49 .00 .33
Inhibition: Switching Total Completion Time .46 .01 −.02 .43 – .51 .58 – 1.12
Inhibition: Switching Combined Scaled Score .55 .01 .69 1.03 – .60 .35 – .64
Inhibition Total Errors .46 −.07 .99 1.32 1.31 .27 .10 −.44 .13
Statue Total Score – – – .62 – – – – –
Language
Comprehension of Instructions Total Score .29 .66 .55 .89 2.16 1.78 .40 .73 .99
Phonological Processing Total Score .44 .94 .35 .56 1.68 1.10 .48 1.13 .36
Speeded Naming Total Completion Time .52 1.10 −.08 .28 1.85 1.11 .41 −.46 1.05
Speeded Naming Combined Scaled Score .62 1.10 .02 .27 1.69 .93 .14 −.11 .77
Memory and Learning
Memory for Designs Content Score −.03 −.05 1.05 .60 2.19 .76 .59 .75 .07
Memory for Designs Spatial Score −.20 −.28 .96 .41 .60 .88 .66 .67 .31
Memory for Designs Total Score −.24 −.15 1.27 .53 1.25 .70 .57 .89 .21
Memory for Designs Delayed Content Score −.07 .01 1.04 −.11 1.09 1.00 .82 .55 .06
Memory for Designs Delayed Spatial Score −.04 −.31 .88 −.02 .74 .69 .25 .64 .52
Memory for Designs Delayed Total Score .02 −.29 1.18 .14 1.09 1.03 .64 .52 .20
Memory for Faces Total Score .44 −.09 .34 .71 1.32 1.03 1.34 .22 .36
Memory for Faces Delayed Total Score .25 −.29 .88 .57 1.45 .64 .59 .11 .68
Narrative Memory Free & Cued Recall Total Score .35 −.05 .75 .87 2.05 1.28 .22 .68 .38
Narrative Memory Free Recall Total Score .41 .04 .61 .91 1.97 1.21 .34 .62 .44
Sentence Repetition Total Score – – – .64 – – – – –
Word List Interference Repetition Total Score .29 .32 .00 1.55 1.64 1.68 .14 – .20
Word List Interference Recall Total Score .45 .21 .71 .81 1.63 1.71 .33 – .46
Sensorimotor
Finger Tapping Dominant Hand Combined Scaled Score .02 .04 .44 .47 .87 .75 1.12 .16 .55
Finger Tapping Nondominant Hand Combined Scaled Score −.02 .00 .04 .33 1.08 .56 1.30 .24 .34
Finger Tapping Repetitions Combined Scaled Score −.06 .03 .35 .33 .70 .33 1.08 .37 .33
Finger Tapping Sequences Combined Scaled Score .12 .06 .00 .32 1.00 .88 1.07 .13 .55
Visuomotor Precision Total Completion Time .49 .17 −.33 −.24 .00 .35 .84 .13 .35
Visuomotor Precision Combined Scaled Score .31 −.01 .16 .70 .90 .86 .36 −.16 .84
Social Perception
Affect Recognition Total Score .37 −.25 .30 .54 1.44 1.19 .19 −.32 −.12
Theory of Mind Total Score – – – .55 – – – – –

Visuospatial Processing
Arrows Total Score .46 .19 .69 .61 1.91 1.15 .13 .31 .80
Block Construction Total Score .17 .14 .88 .38 1.69 .17 .80 .32 .55
Design Copying Process Motor Score .18 −.22 .13 .87 1.29 .82 1.06 −.18 .90
Design Copying Process Global Score .24 .25 .63 .98 1.51 .88 .74 −.40 .59
Design Copying Process Local Score .18 .31 .50 .92 1.56 .80 1.00 .08 .74
Design Copying Process Total Score .25 .13 .43 1.08 1.75 .89 1.15 −.22 .85
Geometric Puzzles Total Score .46 −.04 1.23 .83 1.54 .42 .38 .44 −.06
Picture Puzzles Total Score .17 .52 1.00 .97 2.67 1.03 .04 −.34 .11

Notes: ADHD = Attention deficit/hyperactivity disorder. SD = standard deviation for age. Values presented are Cohen’s d effect sizes, which can be interpreted as small (d = .20–
.49), medium (d = .50–.79), and large (d = .80+). Effect sizes of d ≥ .30 are bolded.

sizes. The effect sizes presented in Table 4 suggest that many of the selected NEPSY-II
subtests are able to identify cognitive problems in these clinical groups, at least at a group
level. The Attention and Executive Functioning subtests and the Language subtests appear
to be the most consistent across the clinical groups.
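For readers who want to reproduce this kind of comparison with their own data, the sketch below computes Cohen's d from group summary statistics using a pooled standard deviation and labels the result with the ranges given in the Table 4 note. The function names, the sign convention (control minus clinical, so that positive values mean poorer clinical performance), the label for values below .20, and the example means are assumptions; the manual may compute its effect sizes differently.

```python
import math


def cohens_d(mean_control, sd_control, n_control,
             mean_clinical, sd_clinical, n_clinical):
    """Cohen's d with a pooled standard deviation (control minus clinical)."""
    pooled_variance = (
        (n_control - 1) * sd_control ** 2 + (n_clinical - 1) * sd_clinical ** 2
    ) / (n_control + n_clinical - 2)
    return (mean_control - mean_clinical) / math.sqrt(pooled_variance)


def label_effect(d):
    """Label an effect size using the ranges in the Table 4 note."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # assumed label; the note does not name this range


# Hypothetical summary statistics for a clinical group and matched controls.
d = cohens_d(10.0, 3.0, 20, 7.0, 3.0, 20)
print(f"d = {d:.2f} ({label_effect(d)})")

# Animal Sorting Total Correct Sorts, three columns of Table 4.
table4_examples = {"ADHD": 0.25, "Language Disorder": 0.59, "Intellectual Disability": 1.82}
table4_labels = {group: label_effect(value) for group, value in table4_examples.items()}
```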

STRENGTHS OF THE NEPSY-II


The NEPSY-II remains one of a very small number of tests developed specifically
and primarily as a neuropsychological battery for children. The battery itself makes neu-
ropsychological sense — it includes many classic paradigms for testing neuropsychologi-
cal function, and as such, its conceptual underpinnings can be easily understood by users
with a neuropsychological background. It also has excellent coverage of the main cognitive
domains that most users need clinically, including the new domain of
Social Perception, which many will find particularly appealing for assessing
children with difficulties in this area (e.g., Autism Spectrum Disorders).
The NEPSY-II is one of the few pediatric neuropsychological tools available for
comparing performance across subtests using contemporary data on co-normed subtests. This
feature cannot be overemphasized. It is also the only battery for children conceptualized as a
true flexible battery with normative data collected in a manner to reduce order effects. The
creation of a number of different batteries for assessing different presenting problems and
conditions is an interesting feature that many users will likely appreciate. The age range
coverage is clearly an asset and a definite improvement over the original version of the test;
it facilitates retest situations in the clinical arena as well as longitudinal designs in research. The
NEPSY-II will be welcomed by clinicians working with children ages 3 to 5 years because
there are so few neuropsychological tests normed for preschoolers. Moreover, the adolescent
coverage up to 16 years of age will help fill the gap that occurs for some neuropsychological
tests, which often have missing or incomplete norms coverage for this age group. Some of the
new subtests, such as Inhibition, are very well conceived and stand out psychometrically.
For those clinicians who are familiar with both the previous and the newer versions of
the battery, there are clear and obvious changes that improve the usability of this pediatric
battery (see Ahmad & Warriner, 2001; Strauss, Sherman, & Spreen, 2006 for reviews of the
NEPSY). There are improvements in usability in terms of the test materials and the test man-
ual, which itself is very well written and exceptionally comprehensive, and the computerized
scoring program is comprehensive and easy to use. Most subtests are also quite brief and
therefore show promise for busy clinical settings or for use in research protocols. There are
many subtests with solid to excellent psychometric properties. Moreover, the psychometrics
of several subtests improve when studied in clinical samples (see Figure 1). Overall, the
battery has generally high internal reliabilities and respectable test-retest reliabilities for most
subtests, and strong sample sizes for most of the reliability studies. There is a large amount
of concurrent validity evidence in the form of correlations with several other tests described
in the manual. The technical manual includes a vast and comprehensive array of valuable
psychometric information. Finally, the computerized scoring assistant and assessment planner
software is well conceived, user friendly, and a welcome asset to this battery.

LIMITATIONS OF THE NEPSY-II


Like all neuropsychological tests and batteries, the NEPSY-II has some limitations.
Identification and discussion of these limitations is important for clinicians who use or are
considering using the NEPSY-II. It is hoped that identification of the limitations
will lead to additional (independent) research with the battery and can potentially direct
future revisions or the development of other tests. Some of these limitations were also
recently highlighted by Titley and D’Amato (2008).
Unlike many other neuropsychological and cognitive batteries and unlike the origi-
nal NEPSY, there are no index scores for the NEPSY-II. Although the absence of index
scores is not necessarily by definition a limitation, more explanation of the rationale and
empirical validation of this approach would have benefited the user. The manual does not
include the results of a factor analysis, which would have been helpful in determining
whether the test should be seen as a scale containing multiple separate domains, as it is
conceptualized and presented in the manual. Ruling in or ruling out the presence of sepa-
rate domains through factor analysis seems particularly important, given that prior
research identified only a single unitary factor for the original NEPSY (Stinnett, Oehler-
Stinnett, Fuqua, & Palmer, 2002) but other research suggested that different models were
needed for the different age groups (Mosconi, Nelson, & Hooper, 2008). Index scores also
tend to have higher reliabilities than subtests by virtue of being based on more test items.
Including reliable clusters of subtests in the form of index scores may therefore have
increased the test’s clinical utility by yielding higher reliability scores. Factor analysis
could have helped validate the domains presented in the manual, which are theoretically
derived rather than being based on a psychometric method for grouping.
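The link between test length and reliability can be illustrated with the Spearman-Brown formula, which is a general psychometric result and is not taken from the NEPSY-II manual. The sketch assumes that a composite is built from parallel components of equal reliability, which real subtests only approximate; the function name and the example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a measure is lengthened by length_factor.

    Assumes the added components are parallel to the original one.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical case: three parallel subtests, each with reliability .60.
composite = spearman_brown(0.60, 3)
print(f"Predicted composite reliability: {composite:.2f}")
```

In practice the reliability of an index also depends on how strongly its subtests intercorrelate, so this figure is best read as an upper-end illustration and not as a prediction for any particular NEPSY-II domain.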
Similar to the prior version of the NEPSY, the NEPSY-II memory subtests do not
provide differentiation in standard score performance between delayed free recall and
delayed recognition, which may limit usability in some clinical or research situations.
Several memory subtests have the potential to include delayed recognition of learned
elements, including List Learning, Memory for Designs, and Memory for Names.
Narrative Memory includes a separate immediate recognition memory score for 3- to
10-year-olds, but otherwise only has a standard score for the combination of immediate
free and cued recall.
There are some aspects of the NEPSY-II that are complex and require time to master
(even for the seasoned clinician). First, the large number of subtests can make it challeng-
ing to comprehend, to interpret, and to digest the vast amount of psychometric information
presented in the technical manual. For example, Table 5.1 in Korkman et al. (2007b) pre-
sents over 600 subtest intercorrelations. As a result of the vast amount of psychometric
information presented, users might find it challenging to determine the psychometric
strengths and weaknesses of individual subtests and scores, particularly as these properties
change across different age groups. Second, it can be challenging to know which tests to
select for which age groups, especially for those neuropsychologists who assess a wide
range of ages. Consider, for example, subtests from the Attention and Executive Function-
ing domain and their applicability to different age groups. Statue is the only subtest from
this domain that can be administered to 3- and 4-year-olds, but it cannot be administered
to children between 7 and 16 years. Auditory Attention, Design Fluency, and Inhibition
can be administered to children as young as 5, whereas Animal Sorting, Response Set, and
Clocks can only be administered to children starting at 7 years; Design Fluency can only
be administered to children between 5 and 12 years, but not ages 13–16 years. Fortunately,
the computerized scoring assistant and assessment planner software is an excellent
resource when selecting subtests or specific batteries for the various ages.
As noted by Korkman et al. (2007b), “the [referral] batteries are designed as guide-
lines to assist a new user of the NEPSY-II in selecting subtests and should not replace
clinical experience and judgment” (p. 23). The referral batteries are also suggested for
those wanting information to help differentiate specific diagnostic questions. It is impor-
tant to note that there has yet to be a publication of validation evidence for the sensitivity/
specificity of these batteries for identifying specific problems or differentiating between
conditions. The utility of these batteries as sensitive tools for detecting particular neu-
rocognitive problems or conditions awaits empirical validation.
Scoring can be complex for some subtests, with numerous primary, process, and
contrast scores, as well as behavioral observations. We recommend using the computer-
ized scoring software, which helps to significantly reduce the complexity of some aspects
of scoring. However, it is also important to appreciate that the computer-scored printout
provides a large number of scores, which can make it challenging to read and to select
which scores to interpret. For example, when the general battery is administered to a
10-year-old, the user is provided with an eight-page printout that contains 55 Primary,
Process, and Contrast scores; the Attention and Executive Functioning domain alone
contains 29 scores. Similar to any other neuropsychological battery, clinicians should be
cautious about the risk of overinterpreting isolated low scores because of the high number
of scores generated by the battery (for reviews of the issues on base rates of low scores,
see Binder, Iverson, & Brooks, 2009; Iverson & Brooks, in press).
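A back-of-the-envelope calculation shows why isolated low scores become likely as the number of scores grows. The sketch below assumes, purely for illustration, that the 55 scores are statistically independent and that "low" means at or below the 10th percentile. Neither assumption comes from the manual, and because NEPSY-II scores are correlated, the independence figure likely overstates the true rate; empirical multivariate base rates are the appropriate reference.

```python
def prob_at_least_one_low(n_scores, base_rate=0.10):
    """Chance that a healthy child obtains one or more low scores.

    Assumes independent scores, each with the same chance (base_rate) of being low.
    """
    return 1.0 - (1.0 - base_rate) ** n_scores


# 55 scores, as in the printout for a 10-year-old described above.
p_55 = prob_at_least_one_low(55)
print(f"P(at least one low score among 55): {p_55:.3f}")
```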
Some of the clinical validity evidence presented in the NEPSY-II manual has limita-
tions. Although the authors provided information on a large number of clinical groups
with respectable sample sizes, they also indicated that the clinical groups collected were
obtained from different settings, might have been based on varying criteria (e.g., emotion-
ally disturbed group) and therefore might not actually be representative of the diagnostic
category as a whole. Another limitation is that, even though the test is designed to be used
by neuropsychologists, the clinical samples for the validity studies do not include children
with known neurological disorders (e.g., strokes, tumors, hydrocephalus, and epilepsy), apart
from a sample of children with traumatic brain injuries that is too small and too heterogeneous
to support stable conclusions. Unfortunately, clinical sensitivity
on the 10 NEPSY-II subtests that were not included in the clinical studies remains
unknown, and other ways of conceptualizing clinical sensitivity (e.g., percent of group
with clinical-level impairments on specific subtests, positive predictive power) are not
provided. As well, like many other commonly used test batteries, reliability data and
validation evidence of the test’s utility in ethnic minority groups or other groups that differ
from the normative sample are needed.
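The alternative indices mentioned above can be computed from a simple two-by-two classification table once a cutoff for impairment has been chosen. The sketch below shows the arithmetic. The function name and the counts are hypothetical and do not come from the NEPSY-II clinical studies.

```python
def diagnostic_accuracy(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, and positive predictive power from 2x2 counts."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "positive_predictive_power": true_pos / (true_pos + false_pos),
    }


# Hypothetical counts: 40 children with a condition and 60 matched controls.
metrics = diagnostic_accuracy(true_pos=30, false_neg=10, false_pos=5, true_neg=55)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Positive predictive power depends on how common the condition is in the sample, so a value obtained from equally sized matched groups would not carry over directly to a clinic with a different base rate.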
There have been few independent studies conducted with the NEPSY-II. This is
likely the result of the relatively brief duration of time from publication of the test battery
to the time of writing this review. Boneh et al. (2008) included the NEPSY-II Statue,
Memory for Designs, and Narrative Memory subtests as part of their assessments of four
young children with glutaric aciduria Type I, an autosomal recessive disorder that can lead
to neurological damage and cognitive problems. Isaac and Oates (2008) and Davidson,
McCann, Morton, and Myles (2008) included the NEPSY-II in their discussions of potential
measures of cognitive abilities in young children, children, and adolescents. Obviously,
there is a need for independent studies with the NEPSY-II. We anticipate that these
studies will emerge in the literature over the next several years.
The authors are commended for improving floor and ceiling effects on the NEPSY-II.
Unfortunately, they were not eliminated and they still need to be considered when
interpreting performance. For example, consider the performance by a 3-year-old on the
Narrative Memory subtest. A child who is unable to freely recall any story details had a
score at the 16th percentile on the original NEPSY. On the NEPSY-II, a raw score of zero
for a 3-year-old would be at the 2nd percentile — representing an improvement in the dis-
tribution of the normative score range and a better floor for the subtest compared to the
NEPSY. On the NEPSY-II Narrative Memory for 3-year-olds, recall of only one story
detail brings the score up to the 9th percentile, and recalling only three elements
(out of 20) places the performance at the 50th percentile. Norm-referenced performance
levels on the NEPSY-II can therefore be substantially affected by small changes in raw
scores, which place considerable psychometric weight on each raw-score point. Like
many tests designed to capture performance in very young children that are also designed
for use with older children, some NEPSY-II subtests have floor effects (e.g., Statue, Imi-
tating Hand Positions, Affect Recognition, Theory of Mind, Word Generation, Narrative
Memory, Sentence Repetition, and Design Copying in 3-year-olds). Regarding ceiling
effects, consider the performance by a 16-year-old on the Auditory Attention test. A per-
fect performance (30/30) will yield an average standard score (51st–75th percentile). One
error places the score at the 11th–25th percentile, two errors result in a score at the 2nd–5th
percentile, and three errors will place the performance below the 2nd percentile. There are several
NEPSY-II subtests that have notable ceiling effects in adolescents (e.g., Auditory Atten-
tion and Response Set, Clocks, Affect Recognition, Language subtests, Memory for
Designs, Word List Interference, and Visuospatial Processing in 16-year-olds). Of course,
floor and ceiling effects occur in many neuropsychological tests, and so this limitation is
not restricted to the NEPSY-II.
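The weight carried by each raw-score point at the floor can be made concrete by converting the percentiles reported above for 3-year-olds on Narrative Memory into z scores. The sketch below is only an illustration: it assumes the percentiles map onto a normal distribution, and the variable and function names are hypothetical rather than taken from any scoring software.

```python
from statistics import NormalDist

# Raw score -> percentile for 3-year-olds on NEPSY-II Narrative Memory,
# as reported in this review (0 details, 1 detail, 3 details recalled).
reported_percentiles = {0: 2, 1: 9, 3: 50}


def percentile_to_z(percentile):
    """Convert a percentile rank to a z score, assuming normally distributed norms."""
    return NormalDist().inv_cdf(percentile / 100)


z_by_raw_score = {
    raw: percentile_to_z(pct) for raw, pct in reported_percentiles.items()
}

# Change in standard-deviation units produced by small raw-score changes.
gain_first_detail = z_by_raw_score[1] - z_by_raw_score[0]
gain_three_details = z_by_raw_score[3] - z_by_raw_score[0]
```

Under the normality assumption, recalling a single story detail moves the score by roughly 0.7 standard deviations, and recalling three details moves it by roughly 2 standard deviations.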
Although the NEPSY-II represents a substantial update to the normative data, sev-
eral subtests were not re-normed and two subtests were only re-normed for children
13–16 years. Unfortunately, it is not clearly indicated in the psychometric manual that
this issue with the normative data exists (i.e., see “no modifications” in Table 2.3 from
the NEPSY-II Clinical and Interpretive Manual; Korkman et al., 2007b). No new nor-
mative data for the Design Fluency, Imitating Hand Positions, List Memory, Manual
Motor Sequences, Oromotor Sequences, Repetition of Nonsense Words, and Route
Finding subtests were collected for the NEPSY-II. Memory for Names and Word
Generation (previously called Verbal Fluency) had new normative data collected only
for 13- to 16-year-olds. This means that the normative data for children 12 years and
younger on these nine subtests: (a) were collected approximately 10 years earlier than
the norms on the other NEPSY-II subtests; (b) are based on the performance of different
children than the current standardization; (c) are stratified according to a previous U.S.
census; and (d) are not co-normed with the other NEPSY-II tests. These discontinuous
norms may complicate subtest interpretation in retest situations of individual children
being tracked over time.

CONCLUSIONS
The revision of the NEPSY into the NEPSY-II was clearly a monumental task
for the authors and others involved, all of whom are accomplished and well-
respected leaders and pioneers in child neuropsychology. The resulting product
represents one of only a handful of co-normed, comprehensive, multidomain neuro-
psychological batteries. The authors should be commended for this impressive
achievement.
The NEPSY-II has several strengths, and despite the potential shortcomings
raised in this and other reviews (i.e., Titley & D’Amato, 2008), many aspects of this
pediatric neuropsychological battery raise the bar for future test development.
The NEPSY-II is a welcome addition to a clinician’s and researcher’s repertoire of
neuropsychological tests for children and adolescents.

Original manuscript received June 24, 2009
Manuscript accepted June 26, 2009
First published online August 10, 2009

REFERENCES
Ahmad, S. A., & Warriner, E. M. (2001). Review of the NEPSY: A Developmental neuropsychological
assessment. Clinical Neuropsychologist, 15(2), 240–249.
Binder, L. M., Iverson, G. L., & Brooks, B. L. (2009). To err is human: “Abnormal” neuropsycholog-
ical scores and variability are common in healthy adults. Archives of Clinical Neuropsychology,
24, 31–46.
Boneh, A., Beauchamp, M., Humphrey, M., Watkins, J., Peters, H., & Yaplito-Lee, J. (2008).
Newborn screening for glutaric aciduria type I in Victoria: Treatment and outcome. Molecular
Genetics and Metabolism, 94, 287–291.
Cohen, M. J. (1997). Children’s Memory Scale. San Antonio, TX: The Psychological Corporation.
Davidson, A. J., McCann, M. E., Morton, N. S., & Myles, P. S. (2008). Anesthesia and outcome
after neonatal surgery. Anesthesiology, 109, 941–944.
Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). Delis Kaplan executive function system. San
Antonio, TX: The Psychological Corporation.
Elliott, C. D. (2007). Differential ability scales (2nd ed.). San Antonio, TX: Harcourt Assessment.
Isaacs, E., & Oates, J. (2008). Nutrition and cognition: Assessing cognitive abilities in children and
young people. European Journal of Nutrition, 47(Suppl. 3), 4–24.
Iverson, G. L., & Brooks, B. L. (in press). Improving accuracy for identifying cognitive impairment.
In M. R. Schoenberg & J. G. Scott (Eds.), The black book of neuropsychology: A syndrome-
based approach. New York: Springer.
Korkman, M. (1980). NEPS. Lasten neuropsykologinen tutkimus. Käsikirja [NEPS. Neuropsycho-
logical Assessment of Children. Manual]. Helsinki, Finland: Psykologien kustannus.
Korkman, M. (1988a). NEPS-U. Lasten neuropsykologinen tutkimus. Uudistettu laitos [NEPSY.
Neuropsychological Assessment of Children. Revised Edition]. Helsinki, Finland: Psykologien
kustannus.
Korkman, M. (1988b). NEPSY: An application of Luria’s investigation for young children. Clinical
Neuropsychologist, 2(4), 375–392.
Korkman, M. (1990). NEPSY. Neuropsykologisk undersökning: 4-7 år. Svensk version [NEPSY.
Neuropsychological assessment: 4-7 years. Swedish version]. Stockholm:
Psykologiförlaget.
Korkman, M. (1999). Applying Luria’s diagnostic principles in the neuropsychological assessment
of children. Neuropsychology Review, 9(2), 89–105.
Korkman, M., Kirk, U., & Kemp, S. (1997). NEPSY. Lasten neuropsykologinen tutkimus [NEPSY.
A Developmental Neuropsychological Assessment. In Finnish]. San Antonio, TX: The
Psychological Corporation.
Korkman, M., Kirk, U., & Kemp, S. (1998). NEPSY: A developmental neuropsychological assess-
ment manual. San Antonio, TX: The Psychological Corporation.
Korkman, M., Kirk, U., & Kemp, S. (2007a). NEPSY-II: A developmental neuropsychological
assessment. San Antonio, TX: The Psychological Corporation.
Korkman, M., Kirk, U., & Kemp, S. (2007b). NEPSY-II: Clinical and interpretive manual.
San Antonio, TX: The Psychological Corporation.
Mosconi, M., Nelson, L., & Hooper, S. R. (2008). Confirmatory factor analysis of the NEPSY for
younger and older school-age children. Psychological Reports, 102(3), 861–866.
Stinnett, T. A., Oehler-Stinnett, J., Fuqua, D. R., & Palmer, L. S. (2002). Examination of the under-
lying factor structure of the NEPSY: A developmental neuropsychological assessment. Journal
of Psychoeducational Assessment, 20, 66–82.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests:
Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.
Titley, J. E., & D’Amato, R. C. (2008). Understanding and using the NEPSY-II with young children,
children, and adolescents. In R. C. D’Amato & L. C. Hartlage (Eds.), Essentials of neuropsy-
chological assessment: Treatment planning for rehabilitation (pp. 149–172). New York:
Springer Publishing.
Wechsler, D. (2002). Wechsler Individual Achievement Test (2nd ed.). San Antonio, TX: Harcourt
Assessment.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: The
Psychological Corporation.
Wechsler, D., & Naglieri, J. A. (2006). Wechsler Nonverbal Scale of Ability. San Antonio, TX:
Harcourt Assessment.