Towards reporting standards for neuropsychological study results

Clinical Neurology and Neurosurgery 162 (2017) 72–79
Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores

Mike R. Schoenberg⁎, Ruba S. Rum
Department of Neurosurgery and Brain Repair, University of South Florida Morsani College of Medicine, Tampa, FL, USA
ARTICLE INFO

Keywords: Neuropsychological assessment; Ethics; Qualitative descriptors; Reporting errors; Practice standards

ABSTRACT

Objective: Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology.
Patients and methods: A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores.
Results: In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: ‘very superior’, ‘superior’, ‘high average’, ‘average’, ‘low average’, ‘borderline’ and ‘abnormal/impaired’. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning.
Conclusions: The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice.
1. Introduction

Neuropsychological evaluations provide unique information to the referring clinician and patient (and caregivers as appropriate), offering answers to diagnostic and treatment-related questions in terms of brain-behavior correlates, capacity, rehabilitation, and prognosis [1–5]. It is essential that the assessment results be communicated accurately, clearly and efficiently to promote patient health, reduce costs, and minimize risks for communication errors [6–9]. Ultimately, practice patterns that diminish the efficient, understandable, and timely communication of results adversely affect patient safety, reduce the quality of patient care and increase costs [8,10].

1.1. Types of neuropsychological data to convey

A central component of a neuropsychological evaluation consists of standardized, reliable, and norm-based assessments to measure brain function that can include attention/executive, language, memory/learning, and/or visuospatial/constructional functions as well as processing speed, academic skill development and fine motor speed/dexterity [2,11,12]. The tests are standardized, using normative data and regimented administration procedures to allow for generalization of measured behavior to established brain-behavior relationships for the purposes of health care provision, rehabilitation, prognosis or capacity decisions. In an ideal case, appropriate norms are used to increase the specificity of test findings while simultaneously reducing the test variance associated with other demographic variables such as age, gender, and education level. A consequence of norm-referenced tests is that cutoff values and the normative set used can dramatically influence: (1) test sensitivity, (2) test specificity, and (3) diagnostic validity [13]. The clinical interpretation is made particularly complex given that a single neuropsychological assessment can include multiple (10 or more)
⁎
Corresponding author.
E-mail address: mschoenb@health.usf.edu (M.R. Schoenberg).
http://dx.doi.org/10.1016/j.clineuro.2017.07.010
Received 24 February 2015; Received in revised form 5 June 2017; Accepted 11 July 2017
Available online 25 July 2017
0303-8467/ © 2017 Elsevier B.V. All rights reserved.
neuropsychological tests, each having an independent normative data set from which standardized test scores are derived that can also differ in the extent to which demographic factors (age, education, gender, ethnicity, etc.) are incorporated. Further, the same neuropsychological test may have more than one (sometimes multiple) independent normative data sets a clinician may use (see for example Trail Making Test normative data sets [14,12,15]). In addition to normative test data, the neuropsychological study also includes history of symptoms or problems, medical/psychiatric history, social/occupational/development history, mental status, behavioral observations, and observations about study validity/reliability. To provide holistic, personalized, and reliable medical care, a neuropsychologist must be able to interpret and convey both qualitative aspects of the patient’s behavior and quantitative test data into meaningful clinical judgment to answer referral question(s).

1.2. Conveying neuropsychological results

There is no agreed upon standard for a neuropsychological report format, and neuropsychologists and users of the report may have differing perspectives of the ideal format and style [2,16–19,3–5,9]. The existing recommendations for communicating neuropsychological results provide general guidance on what sections a written report should have and call for writers to clearly convey results/diagnosis and recommendations [2,9]. Furthermore, testing standards [6] and professional ethics [7] highlight the need to produce reports that: (1) support the role of the neuropsychologist as a consultant by encouraging communication of results, (2) are tailored to satisfy the need for timeliness in communicating results, and (3) minimize the risk for communication errors [6,7,2,8,3–5,9]. The proposed suggestion for a neuropsychology reporting guideline mirrors the recommendations established by radiology [20–22] for reporting results, such that the format for communicating study results is less important than that the report: (a) is timely and (b) efficiently conveys results while minimizing the potential for communication errors. Unfortunately, there has been no guidance or consistency in how standardized test scores are communicated.

2. Methods and materials

2.1. The need for consensus in communicating neuropsychological test results

It is well understood across medicine that communication errors adversely affect patient care, contribute to medical errors, and increase costs [8,10]. A PubMed search from 1 Jan 2007 to 1 Aug 2016 for terms related to ‘neuropsycholog*’, ‘reporting standards’, or ‘consensus statements’ did not identify any practice guidelines to standardize the communication of qualitative descriptors for neuropsychological normative test scores (see [2]). At a minimum, the neuropsychological report should clearly communicate if the study is: (1) abnormal and related to known or suspected neurological (neurophysiological) dysfunction, (2) equivocal, in that the study could reflect a normal variant or mild abnormality but is indeterminate in the neuropsychologist’s opinion, or (3) normal (no brain dysfunction). Unfortunately, the qualitative descriptors used to describe standardized test scores (i.e., the term(s) used to describe a standardized test score deviating from ‘normal’ or ‘abnormal’) are highly variable, which can obfuscate the results and increase the potential for communication errors (e.g. [23]).

2.2. Test score qualitative descriptors

The term ‘test score qualitative descriptor’ refers to the terms authors use to communicate how a patient performed on a norm-referenced neuropsychological test and not to descriptors of mental status. There are at least two inter-related problems in neuropsychological practice related to the use of test score qualitative descriptors. First, there is marked variability in the test score qualitative descriptors used to describe neuropsychological standardized scores [23]. Second, there is a lack of consensus in the range of standard scores/percentiles that correspond to a particular test score qualitative descriptor (i.e., what scores are “average”) [24,25,19,26,23,27,14,28,29]. Despite repeated calls for more uniform use of the test score qualitative descriptors that delineate the relative uniqueness or statistical probability of test scores (e.g., ‘below average’, ‘mildly impaired’, ‘borderline’, ‘extremely low’, etc.), there remains excessive variability and no consensus among neuropsychologists (e.g. [2,17,23]). Further adding to the confusion, test score qualitative descriptors that sound similar (e.g., ‘low average’, ‘low normal’, ‘borderline’, ‘below average’) do not overlap in terms of how rare or unusual a score is in its interpretation. Indeed, a survey of 110 neuropsychologists [23] found the index score of 70 (2nd percentile) was described using 22 different test score qualitative terms, with 6 different terms using the word ‘impaired’ (e.g., ‘impaired’, ‘borderline impaired’, ‘mildly impaired’, ‘moderately impaired’, ‘severely impaired’, and ‘significantly impaired’).

Currently, there are at least three different test score qualitative classification schemas (with multiple permutations) that are generally recognized [14,28,29]:

1. Clinical Classification system advocated by Heaton et al. [14]
2. Clinical classification advocated by Schretlen et al. [28]
3. The David Wechsler/Intelligence classification system [29]

The three commonly used qualitative classification systems above, along with a fourth based on the Wechsler classification system that uses different test score qualitative terms [26], are displayed in the first four columns of Fig. 1. Surprisingly, there is a lack of agreement for most terms, including the most commonly used test score qualitative descriptor, ‘average’. The Wechsler system [29] suggests scores falling in the 25–74th (or 75th) percentiles are ‘average’ while the Heaton et al. [14] classification system identifies scores falling between the 30–67th percentiles as ‘average’. The Schretlen et al. [28] classification system generally mirrors the Wechsler system and describes scores between the 24–74th percentiles as ‘average’. There is even less consensus for percentile scores that fall outside of the ‘average’ range. A score falling at the 9th percentile may be ‘below average’ using the Wechsler classification system [29], ‘mildly impaired’ using the Heaton et al. [14] classification system, or ‘low average’ using the Schretlen et al. [28] rating system. Even more troubling is the use of similar test score qualitative descriptors for different ranges of percentile scores between the classification schemes (i.e., ‘low average’ versus ‘below average’). For example, ‘low average’ describes scores falling between the 9–24th percentiles using the Wechsler classification scheme [29], while ‘below average’ is used to describe scores ranging from the 16–27th percentiles using the Heaton et al. [14] system. The term ‘borderline’ equates to scores ranging from the 2–8th percentiles using the Wechsler [29] and Schretlen et al. [28] systems, but Heaton et al. [14] describes scores from the 6th–15th percentiles as ‘mildly impaired’. Thus, Heaton et al.’s [14] ‘mildly impaired’ scores reflect scores that are delineated as ‘borderline’ or ‘low average’ by Wechsler [29] and Schretlen et al. [28]. The confusion of terms and disagreement in score ranges lacks precision and will contribute to errors in communicating results to health providers and patients [23]. Indeed, the lack of precision has contributed to recommendations by neuropsychologists to include test scores in the neuropsychological report itself [16,19].

To address the lack of consistency in test score qualitative descriptors when communicating the results of neuropsychological assessment, a simplified qualitative reporting system is recommended, and is delineated in Table 1. This classification is also presented in reference to the four commonly used qualitative classification systems in Fig. 1.
Fig. 1. Commonly Used Qualitative Descriptors with Corresponding Standard Scores and Percentiles.
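
As a hedged illustration, the percentile bands of the proposed Q-Simple system (Table 1) can be expressed as a simple lookup. The sketch below is not part of the article: the function names `q_simple_descriptor` and `index_to_percentile` are hypothetical, and the conversion from index score to percentile assumes a normal distribution with mean 100 and SD 15, which the article itself cautions may not hold for many neuropsychological measures.

```python
from statistics import NormalDist

# Lowest whole-number percentile (inclusive) of each Q-Simple band, per Table 1.
Q_SIMPLE_BANDS = [
    (98, "very superior"),
    (92, "superior"),
    (75, "high average"),
    (25, "average"),
    (16, "low average"),
    (6, "borderline"),
]


def q_simple_descriptor(percentile: float) -> str:
    """Return the Q-Simple descriptor for a percentile score (0-100)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    for lower_bound, label in Q_SIMPLE_BANDS:
        if percentile >= lower_bound:
            return label
    # Anything below the 6th percentile falls in the statistically rare band.
    return "abnormal/impaired"


def index_to_percentile(index_score: float) -> float:
    """Approximate percentile for an index score.

    Assumption (not from the article): index scores are normally
    distributed with mean 100 and SD 15 in the normative sample.
    """
    return NormalDist(mu=100, sigma=15).cdf(index_score) * 100


# Index scores taken from the article's case example (Full Scale, VCI, PRI).
for name, score in [("Full Scale", 84), ("VCI", 81), ("PRI", 96)]:
    pct = index_to_percentile(score)
    print(f"{name} = {score}: ~{pct:.0f}th percentile, {q_simple_descriptor(pct)}")
```

The lookup returns a single band per score; it does not model the standard error of measurement, which is why a clinician may still report a range such as ‘borderline to low average’ for a score near a band boundary.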
Table 1
Proposed Simplified Clinical Test Score Qualitative Classification (Q-Simple) System. Qualitative anchors for neuropsychological standard scores, T-scores and Percentile equivalents.

| Suggested Qualitative Descriptor | Reference Index Scores | Reference T-Scores | Reference Percentile Scores |
|---|---|---|---|
| Normal (scores are normal and/or above average): | | | |
| Very Superior | 130+ | 70+ | 98+ |
| Superior | 120–129 | 64–69 | 92–97 |
| High average | 110–119 | 57–63 | 75–91 |
| Average | 90–109 | 44–56 | 25–74 |
| Low average | 85–89 | 40–43 | 16–24 |
| Borderline/equivocal (scores may be normal or abnormal): | | | |
| Borderline | 77–84 | 35–39 | 6–15 |
| Abnormal (scores are statistically rare and can reflect brain dysfunction): | | | |
| Abnormal/Impaired | < 77 | < 35 | < 6 |

3. Results

3.1. Proposal for a simplified clinical test score qualitative descriptor classification system

Development of the proposed Simplified Clinical Test Score Qualitative Classification (Q-Simple) system was based on the increasing demand to reduce potential for communication errors, reduce health care costs, and promote evidence-based care, coupled with an ongoing lack of consensus among clinicians on how to describe standardized test scores. The Q-Simple development integrated several factors thought to be important to its utilization: (a) adhere to historically used psychometric principles, (b) have face validity for consumers of neuropsychological results, and (c) reduce the potential for communication errors in the written description of neuropsychological test scores to patients and their health care providers. The test score qualitative descriptors for the upper end of the distribution of percentile ranges are consistent with those of the Wechsler [29] and Schretlen et al. [28] systems. The term ‘borderline’ is advocated for standardized test scores from the 6th to 15th percentiles. These scores are often equivocal with respect to clinical importance, and occur frequently as low scores in a test battery; however, these scores can also reflect early or mild deficits in the context of neuropathological patterns and appropriate clinical history [24,25,30,31,27,32,11,33,34]. That is, test scores in this range reflect performances below the 16th percentile that may be interpreted as unusual or low by patients and clinicians alike, but are also commonly observed in studies evaluating the base rates of test scores among healthy individuals [24,25,30,31,27,32–34]. Thus, these test scores may reflect a mild abnormality, a premorbid weakness or measurement error.

The next test score qualitative descriptor, ‘abnormal’, describes statistically rare scores (lowest 5th percentile of performances compared to a normative peer group) that are possibly associated with brain dysfunction. While intuitively appealing, data are lacking to indicate that the communication of an “impaired” or “abnormal” neuropsychological test result is improved by providing subclassifications (e.g., ‘mildly’ impaired, ‘mild-to-moderately’ impaired, ‘moderate-to-severely’ impaired, etc.) [23]. Further, there is a lack of consensus on where “mildly abnormal” standardized test scores end and more severe (“moderately abnormal/impaired”, “profoundly abnormal/impaired”) standardized test score ranges begin across the current multiple qualitative classification systems. Another factor that complicates the communication of standardized test scores with qualitative descriptors is that the normative data used to derive standardized scores vary across neuropsychological tests, with some scores being age-corrected, while other tests have normative data that are age-, education- and other demographic (e.g., ethnicity and/or gender) based [14]. To avoid confusion, the proposed Q-Simple system advocates describing all percentile scores at or below the 5th percentile as ‘abnormal’, regardless of the test/normative sample used. Taking the limitations and complicating factors together, the Q-Simple system proposes to use a single descriptor (e.g., ‘abnormal’) to describe a specific range of scores (at or below the 5th percentile) that are generally agreed upon as reflecting statistically rare scores that can denote neuropathology. The suggestion for a general descriptor such as ‘abnormal’ has intrinsic interpretive value, and can cut across the multitude of confusing qualitative descriptors to promote clear and concise communication that an abnormality exists ([44]; [7,2,8,10]).

4. Discussion

There has been much debate regarding what cut-off value should be interpreted as ‘abnormal’ to best distinguish between brain pathology and normal variability (e.g., scores < 15th percentile vs < 7th
percentile vs. < 5th percentile or < 2nd percentile) (e.g. [24,25,31,30,27,14,32,11,33,34]). Lezak [11] has advocated for distinguishing between clinically impaired (scores between the 2nd and 7th percentiles or z-scores between −1.5 and −2.0) versus pathologically impaired (scores < 2nd percentile or z-scores < −2.0). However, Lezak’s [11] distinction between clinically and pathologically impaired is not utilized by the four common qualitative classification systems [26,14,28,29]. Further, the psychometric distinction of clinically impaired versus pathologically impaired based on a cut-off z-score of −2.0 is not stable, since the standardized score will be determined by the normative sample used, the extent that a test’s normative data are demographically corrected, and how normally distributed the neuropsychological function (attention, verbal memory, constructional praxis) is in a healthy population (e.g. [31,30,11,12,15,35]). Indeed, neuropsychological performance is frequently not normally distributed, with normative distributions frequently being negatively skewed and kurtotic (e.g., apraxias, recognition memory, confrontation naming) [11,12,15]. Thus, the use of clinically impaired versus pathologically impaired based on a z-score cut-off of −2.0 (< 2nd percentile) may deter the neuropsychological study from identifying early or mild neurological disease or dysfunction [24,25,31,30,27,14,32,33,34].

The Q-Simple system distills the complexities of the evolving diagnostic science to a pragmatic clinical approach to rapidly communicate health data. Score description and interpretation must incorporate knowledge about the difference between statistically rare scores within a normal distribution, the statistical differences between two or more scores as they occur within a test battery, and the actual base rate of observed differences in scores within a test battery [24,25,30,31,27,32–34,15,36–38]. The Q-Simple system takes into account the extensive data establishing the low cognitive scores that occur commonly in a battery of tests given to healthy individuals [24,25,30,31,27,32–34]. As an example, 78 percent of the healthy normative sample of the Neuropsychological Assessment Battery (NAB; [38]) were observed to have 2 of the 36 scores below one standard deviation (SD) of the mean (< 16th percentile or z-score = −1.0), and 21.8 percent of the healthy sample had two or more scores at or below −2 SD below the mean (< 2nd percentile or z-scores < −2.0) [25]. Within the proposed system, scores at or below the 5th percentile (scores < −1.55 SD of the mean) are considered to be statistically rare and labeled ‘abnormal’, but not necessarily reflective of brain dysfunction. Whether an abnormal score is interpreted as reflecting brain dysfunction versus some other factor (e.g., poor task engagement, effects of fatigue, pain, etc.) is necessarily a clinical interpretation. Thus, interpretation of performance on a neuropsychological test battery does not rely solely on scores or standard deviations below the mean; rather, it incorporates other clinical information obtained from the assessment. The proposed qualitative classification system’s cut-off scores for ‘abnormal’ attempt to balance the relative frequency of low scores with efforts to have adequate sensitivity and, most importantly, to have face validity for consumers.

It is recommended that the neuropsychological report specify whether scores were interpreted based on age-matched or demographic-adjusted (age-, education-, and gender-matched) normative data [2], because of the clear impact various normative data corrections have on a raw test score’s interpretation. The problem of variations in normative data samples and the extent to which test scores may be affected by age-, education- and/or ethnicity/gender effects leads to the next suggestion to improve transparency of neuropsychological reporting: including test scores in the study report.

4.1. Test scores should be reported

The lack of consensus in test score qualitative descriptors and the emphasis on evidence-based neuropsychology practice [16,39] both serve as bases to extend previous calls for practice patterns to include neuropsychological test scores in the report [7,14–16]. Furthermore, reporting test scores serves as a transparent anchor for the interpretation of neuropsychological data and the qualitative descriptors used for these data [16]. In cases when a neuropsychological test has multiple normative data sets that may be utilized, the raw test score is desirable to report [15,12]. Including raw scores in a report also allows for improved comparison over time, particularly if other normative data sets are developed [16]. Of note, the Q-Simple system (and the other qualitative score descriptor schemas detailed in Fig. 1) is focused on describing standardized test scores and not raw scores. Generally, performance on tests reported in raw scores may be best described when normative samples are markedly skewed, with metrics such as cumulative percentage frequencies and/or bases to describe absolute variables that are remarkable (e.g., raw scores of zero or all correct).

4.2. Test score qualitative descriptor versus interpretation of brain dysfunction or rehabilitation outcome

The issue of test score qualitative descriptors vs clinical interpretation can be confused (e.g., report of ‘abnormal test score’ versus interpretation of an abnormal/impaired score that reflects neuropathology), and the Q-Simple classification system emphasizes that the test score qualitative descriptors do not dictate the presence of disease or brain dysfunction. Abnormal (impaired) scores can be obtained for reasons unrelated to brain dysfunction such as measurement error/statistical variability or insufficient task engagement/attention, among others [24,25,31,30,40,32–34]. The likelihood that impaired score(s) is (are) due to brain dysfunction and how the obtained score(s) answers the referral question(s) must be determined by the clinician.

The qualitative description of test scores is similarly applicable to treatment/rehabilitative applications and placement decisions, as the Q-Simple test score qualitative descriptor system delineates standardized test scores and not the clinical interpretation of any change in test score or functional skills as a result of treatment. Adopting the Q-Simple qualitative descriptor system can provide a standardized reference point that other clinicians can use to allow for rapid determination of any marked deterioration or improvement without having to wait for test scores if these are not included or appended to the neuropsychological report.

The importance of answering the referral question and/or concisely describing rehabilitation programming with clear and concise language cannot be overstated. An increasing problem in communicating results in neuropsychological consultations is the use of confusing terms that have no empirical anchor (e.g. “cognitive inefficiency”). Not only does ‘cognitive inefficiency’ not have a psychometric anchor in any qualitative classification scheme, it is also unclear whether this term is used to describe brain dysfunction, refers to a psychiatric condition, or refers to a normally occurring process for the individual. A list of selected problematic terms is displayed in Table 2. We provide a case example of a neuropsychological study to illustrate reporting using the Q-Simple qualitative reporting system.

4.3. Case example

The patient is a 48-year-old right-handed Caucasian female with 11 years of education and a history of pharmacoresistant epilepsy with cognitive complaints referred as part of a pre-surgical work-up. Her MRI study of the brain is illustrated in Fig. 2 and shows left mesial temporal sclerosis.

Neuropsychological Results:

Premorbid: intellectual function was estimated to be in the average to high average range.

General Cognitive/Intelligence: Index of her general cognitive function was borderline to low average compared to age-matched peers (Full Scale = 84, 14th percentile). Index of her verbal general cognitive function was borderline to low average (VCI = 81, 10th percentile) [note that both qualitative score descriptors ‘borderline’ and ‘low average’ were used since the VCI score has a standard error of measure
Table 2
Summary of ineffective and confusing qualitative descriptors to avoid in neuropsychological reports.

| Term or phrase | Setback | Recommendation |
|---|---|---|
| Low average to superior range | This is not a range; this signifies variability | Scores in function x were variable, ranging from low average to the superior range. This should be followed up with a possible explanation for the variability (e.g., timed vs. untimed tasks) |
| Cognitive inefficiency | Vague term. No anchor | Use a more precise term: slowness, inaccuracy, etc. |
| Error-free | Awkward | Accurate |
| Adequate | Vague, subjective; relative term with varied implication across patients. No anchor | Recall was in the average range, consistent with XX’s general level of intellectual functioning. |
| Recall was somewhat compromised | Vague description. No anchor | Recall declined from 10 to 6 items (percentile) |
| Immediate recall (9/12, 20%ile) improved after a 30-min delay (9/12, 40%ile) | Misleading; implies the patient recalled more material after the delay. It is the neuropsychologist’s job to interpret normative data and communicate appropriately | e.g., Recall remained stable following a 30-min delay (9/12; average range or 40th percentile). Or: 30-min delayed recall revealed no forgetting (9/12, 40th percentile) |
| Below Average | Term describing scores from the 16th to 27th percentile. Technically, may also be used to describe scores below the average of the normative group (i.e., scores at the 49th percentile or less) | See Table 1 for proposed Q-Simple qualitative descriptors and/or reference percentile ranges and/or provide test percentile score. |
| Borderline Low Average | Typically describing scores from the 6th to 15th percentile | See Table 1 for proposed Q-Simple qualitative descriptors and/or provide test percentile score. |
| Low Normal | Typically, scores between the 16th and 24th percentile | See Table 1 for proposed Q-Simple qualitative descriptors and provide test score percentile. |
| Unusually Low | Scores between the 3rd and 9th percentile | See Table 1 for proposed Q-Simple qualitative descriptors and/or provide test percentile score. |
| WNL or Within Normal Limits | Can be scores falling within expectations for the individual or equal to or above the 16th percentile (e.g., 16th percentile or greater). | Can be appropriate in limited situations. Also see Table 1 for proposed Q-Simple qualitative descriptors and/or provide reference percentile ranges and/or provide test(s) percentile score(s) |
that could place it in either percentile score range]. She obtained an index of nonverbal general cognitive function that was average (PRI = 96, 39th percentile).

Attention/Executive: Basic attention intact. She repeated 6 digits forward and 4 digits backward, which is low average. She scored abnormal to low average on tasks involving selective and/or divided attention. No gross behavioral apathy or disinhibition. Compared to demographically matched peers, verbal reasoning score was abnormal while a visual reasoning score was average. Letter fluency was borderline. Figural fluency scores were average. Rapid (timed) sequencing and set-shifting score was average (Trail Making Test Part B = 67, 53rd percentile). She scored low average on a complex problem solving, sequencing and set-shifting task (WCST-64).

Memory: Scores varied from abnormal to average compared to age-matched peers.

Immediate memory: Verbal immediate memory was abnormal for stories and a word list. Visual immediate memory was low average to average [note: this highlights a score at the 25th percentile, which falls between low average and average].

Delayed memory: Verbal delayed memory was abnormal for stories and a word list. Visual delayed memory was average for figures and a complex figure. Verbal recognition cues did not improve recall. Visual recognition memory was within normal limits.

Language: Receptive and expressive speech was functional with adequate articulation and speech rate. Comprehension and repetition were intact. Confrontation naming was borderline (BNT = 49/60) with 7 additional given phonemic cues. Semantic verbal fluency was abnormal. Letter fluency was borderline. No gross alexia or agraphia.

Visuoperceptual skills were intact. Visuoconstructional skills were borderline. Complex figure copy was borderline with retained gestalt but missing one detail. No gross spatial imprecision or distortion. Block constructions were borderline. No constructional apraxia.

Fine Motor: manual dexterity was borderline with her right hand, but average with her left hand.

Mood: She endorsed clinical symptoms of depression.

Fig. 2. MRI coronal image demonstrating left mesial temporal sclerosis for 48 YO female with left temporal lobe epilepsy.

Neuropsychological Study Conclusions:
[Note. diagnoses and recommendations are not detailed below for purposes of brevity]

Neuropsychological study was abnormal due to deficits in attention/executive, verbal memory, and language (confrontation naming and semantic fluency) functions. The deficits in attention/executive and language functions were mild to moderate while the deficits in verbal memory were severe and indicate almost complete loss of declarative verbal memory functions. There were clinical symptoms of depression.

Neuropsychological study results are consistent with left temporal lobe dysfunction and known mesial temporal sclerosis assuming normal functional neuroanatomical organization. Further clinical correlation recommended for surgical planning.

The patient’s cognitive complaints are associated with brain dysfunction that is frequently associated with localization related epilepsy.
Her learning and memory of verbal material was impaired, and accounts for her difficulty at work and forgetfulness at home with lapses in completing instrumental activities of daily living. She was observed to close her eyes when given verbal material to learn/remember, which she described as an effort to visually picture verbal material given to her. However, she found the rate that information was given to her was too fast to compensate by this adaptation to verbal memory deficit.
Depressive disorder likely due to neurophysiological dysfunction with noted exacerbation after a seizure for several hours to a day. Pharmacological treatment is indicated.
Surgical Candidacy. From a neuropsychological standpoint, the patient is a good surgical candidate. She is at low risk for decline in memory or language following a selective left temporal lobectomy.
Neuropsychological Data Summary Report
Note: Grooved Pegboard Test = G. Peg; Test of Premorbid Functioning = ToPF; Wechsler Adult Intelligence Scale – 4th Ed. = WAIS-IV; Trail Making Test, Parts A and B = TMT Part A and TMT Part B; Rey Auditory Verbal Learning Test = RAVLT; Wechsler Memory Scale – 4th Ed. = WMS-IV; Boston Naming Test = BNT; Verbal Fluency Tests (Semantic and Letter); Rey-Osterrieth Complex Figure Test = ROCFT; Ruff Figural Fluency Test = RFFT; Wisconsin Card Sorting Test-64 card version = WCST-64; Beck Depression Inventory – 2nd Ed. = BDI-II.

4.4. Summary and limitations

Neuropsychological studies provide unique information for the medical management of patients (whether diagnostic or rehabilitative in purpose), and require that study results and recommendations/treatments are conveyed in a manner that is concise and avoids communication errors. Unfortunately, the test score qualitative descriptors in neuropsychological reports are a source of confusion in communicating neuropsychological results due to: (a) lack of consensus on the test score qualitative descriptors that describe standardized test scores, (b) using similar test score qualitative descriptor terms (e.g., ‘below average’ and ‘low average’) that refer to different ranges of standardized scores, (c) the use of test score qualitative terms that have no psychometric or commonly understood clinical foundation, and (d) lack of uniformity in psychometric characterization of neuropsychological assessments for different populations and conditions. The proposed Q-Simple system aims to improve the communication of neuropsychological study results, and minimize the potential for errors in information transfer. The Q-Simple system uses test score qualitative descriptors that are face valid, eliminates the use of overlapping and potentially
confusing terms, and attempts to incorporate current research findings and historical parametric statistical classifications.
A potential limitation of the Q-Simple system, shared by other qualitative systems, is a lack of empirical support for the classifications. Why adopt the Q-Simple system? The identified cut-off scores for each qualitative descriptor of the Q-Simple system attempt to integrate historical parametric statistics in describing rare scores within a distribution along with Bayesian statistical analyses of the base rate frequency of low scores in cognitive test batteries obtained by healthy individuals [24,25,30,31,27,32–34]. Further, in lieu of a consensus practice parameter for a qualitative descriptor reporting system, the Q-Simple system can reduce the likelihood of communication errors by minimizing overlapping and difficult to distinguish terms that have been variously used to describe neuropsychological test scores (e.g. [23]). The interpretation of test scores whose confidence interval falls between two test score qualitative descriptors (e.g., ‘low average’ and ‘borderline’) requires the clinician to interpret these scores within the context of the patient and his/her presenting problem to answer the referral question, and likely no qualitative score descriptor system can avoid the confound of scores whose standard error of measurement spans two qualitative descriptor domains. While the use of subcategory test score qualitative descriptors (e.g., ‘mild-to-moderately impaired’) has intuitive appeal, these distinctions have shown little reliability among clinicians [23], and there are no data showing that subclassification of qualitative descriptors reduces communication errors. The communication of subtleties or profound level of deficits must be provided by the clinician in the Impressions/Conclusions section, where the normative test scores are interpreted and integrated with patient context, symptoms, medical history and cultural factors to answer the referral question and make diagnostic and treatment/placement decisions. The Q-Simple system advocates that clinicians use the same test score qualitative descriptor to communicate results regardless of whether the score was derived from age-matched normative data or more detailed demographically matched normative data. While conceptual and parametric limitations exist in comparing normative scores derived from different samples and normative corrections, the Q-Simple system approach is proposed to be basic: using the same qualitative descriptors for all standardized test scores, with the psychometric limitations of this approach then offset by including raw test scores in the report. Lastly, although the assessment instructions are standardized, differences in scores and interpretations can occur when using different test versions that use different procedures (e.g., the Stroop Color-Word test) and when variations in administration are required for unique patient/cultural factors. Generally, it is likely desirable to have the test version and/or modifications to procedures detailed when communicating results, along with how they affected the study quality and interpretation. The essential point is to avoid complicated language to describe standardized test scores to assure assessment results are accurately conveyed and reduce the risk for communication errors.
Clearly, a generally accepted test score qualitative classification system is needed; however, any qualitative classification system proposed that is not data driven by consumers of the evaluations themselves or based on a clinical outcome anchor point will lack relevance beyond what is proposed here. The issue of including test scores in reports has prompted controversy (e.g. [17,18,16,19,41,42]). Arguments against including test scores in reports include the incorrect use of scores by consumers and breach of confidentiality; however, including test scores that are the basis for interpretation of a study is advocated from an evidence-based practice standpoint to promote transparency of interpretation and enhance communication of results [2,17,18,16,19,41]. Combining the proposed Q-Simple system and reporting test scores can improve the communication of neuropsychological results to patients and health providers, and promote transparency of neuropsychological decision-making [43,16,39,5]. Consistent adherence to improved practice standards stands to benefit patients, referring health care providers and the field of neuropsychology.

Disclosure statement

No conflicts of interest to report.

Acknowledgements

The authors would like to thank Marla Hamberger, PhD for her thoughtful insights and comments on the development of the proposed reporting system.

References

[1] American Academy of Neurology, Assessment: neuropsychological testing of adults. Considerations for neurologists, Neurology 47 (2) (1996) 592–599.
[2] Board of Directors, American Academy of Clinical Neuropsychology (AACN) practice guidelines for neuropsychological assessment and consultation, Clin. Neuropsychol. 21 (2007) 209–231, http://dx.doi.org/10.1080/13825580601025932.
[3] R.C. Hilsabeck, T.L. Hietpas, K.J.M. McCoy, Satisfaction of referring providers with neuropsychological services within a Veterans Administration medical center, Arch. Clin. Neuropsychol. 29 (2014) 131–140.
[4] R.O. Temple, J. Carvalho, G. Tremont, A national survey of physicians’ use and satisfaction with neuropsychological services, Arch. Clin. Neuropsychol. 21 (2006) 371–382.
[5] G. Tremont, H.J. Westervelt, D.J. Javorsky, A. Podolanczuk, R.A. Stern, Referring physicians’ perceptions of the neuropsychological evaluation: how are we doing? Clin. Neuropsychol. 16 (2002) 551–554.
[6] American Educational Research Association, American Psychological Association, National Council on Measurement in Education, Standards for Educational and Psychological Testing Revised, 2014 edition, Author, Washington DC, 2014.
[7] American Psychological Association, Ethical principles of psychologists and code of conduct, Am. Psychol. 57 (2002) 1060–1073.
[8] Committee on Quality of Health Care in America, To Err is Human: Building a Safer Health System vol. 627, National Academies Press, Washington DC, 2000.
[9] P. Tzotzoli, A guide to neuropsychological report writing, Health (N. Y.) 4 (2012) 821–823.
[10] S. Kripalani, F. LeFevre, C.O. Phillips, M.V. Williams, P. Basaviah, D.W. Baker, Deficits in communication and information transfer between hospital-based and primary care physicians: implications for patient safety and continuity of care, JAMA 297 (2007) 831–841.
[11] M.D. Lezak, Neuropsychological Assessment, 4th ed., Oxford University Press, New York, 2004.
[12] M. Mitrushina, K.B. Boone, J. Razani, L.F. D’Elia, Handbook of Normative Data for Neuropsychological Assessment, Oxford University Press, New York, 2005.
[13] O. Pedraza, D. Mungas, Measurement in cross-cultural neuropsychology, Neuropsychol. Rev. 18 (3) (2008) 184–193.
[14] R.K. Heaton, S.W. Miller, M.J. Taylor, I. Grant, Revised Comprehensive Norms for an Expanded Halstead-Reitan Battery: Demographically Adjusted Neuropsychological Norms for African American and Caucasian Adults Professional Manual, Psychological Assessment Resources, Lutz, 2004.
[15] E. Strauss, E.M.S. Sherman, O. Spreen, A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary, 3rd ed., Oxford University Press, New York, 2006.
[16] G. Chelune, Evidence-based research and practice in clinical neuropsychology, Clin. Neuropsychol. 24 (3) (2010) 454–467.
[17] J. Donders, A survey of report writing by neuropsychologists: I: general characteristics and consent, Clin. Neuropsychol. 15 (2001) 137–149.
[18] J. Donders, A survey of report writing by neuropsychologists, II: test data, report format: and document length, Clin. Neuropsychol. 15 (2001) 150–161.
[19] D. Freides, Proposed standard of professional practice: neuropsychological reports display all quantitative data, Clin. Neuropsychol. 7 (1993) 234–235.
[20] American College of Radiology, ACR Practice Guideline for Communication of Diagnostic Imaging Findings, American College of Radiology, Reston, VA, 2005.
[21] Board of the Faculty of Clinical Radiology: The Royal College of Radiologists, Standards for the Reporting and Interpretation of Imaging Investigations, Royal College of Radiologists, London, UK, 2006.
[22] D.C. Kushner, L.L. Lucey, American College of Radiology, Diagnostic radiology reporting and communication: the ACR guideline, J. Am. Coll. Radiol. 2 (2005) 15–21.
[23] T.J. Guilmette, L.D. Hagan, A.J. Giuliano, Assigning qualitative descriptions to test scores in neuropsychology: forensic implications, Clin. Neuropsychol. 22 (2008) 122–139.
[24] B.N. Axelrod, J.R. Wall, Expectancy of impaired neuropsychological test scores in a non-clinical sample, Int. J. Neurosci. 117 (11) (2007) 1591–1602.
[25] L.M. Binder, G.L. Iverson, B.L. Brooks, To err is human: abnormal neuropsychological scores and variability are common in healthy adults, Arch. Clin. Neuropsychol. 24 (2009) 31–46.
[26] G. Groth-Marnat, Handbook of Psychological Assessment, 5th ed., John Wiley & Sons, Inc, Hoboken, New Jersey, 2009.
[27] R.K. Heaton, I. Grant, C.G. Matthews, Comprehensive Norms for an Extended Halstead-Reitan Battery: Demographic Corrections, Research Findings, and Clinical Applications, Psychological Assessment Resources, Inc, Odessa, 1991.
[28] D.J. Schretlen, S.M. Testa, G.D. Pearlson, Hopkins Neuropsychological Normative System Professional Manual, Psychological Assessment Resources, Inc, Lutz, FL, 2010.
[29] D. Wechsler, Wechsler Adult Intelligence Scale-revised, The Psychological Corporation, New York, 1981.
[30] B.L. Brooks, G.L. Iverson, J.A. Holdnack, H.H. Feldman, The potential for misclassification of mild cognitive impairment: a study of memory scores on the Wechsler Memory Scale-III in healthy older adults, J. Int. Neuropsychol. Soc. 14 (3) (2008) 463–478.
[31] B.L. Brooks, G.L. Iverson, T. White, Substantial risk of accidental MCI in healthy older adults: base rates of low memory scores in neuropsychological assessment, J. Int. Neuropsychol. Soc. 13 (3) (2007) 490–500.
[32] L.J. Ingraham, C.B. Aiken, An empirical approach to determining criteria for abnormality in test batteries with multiple measures, Neuropsychology 10 (1996) 120–124.
[33] B.W. Palmer, K.B. Boone, I.M. Lesser, M.A. Wohl, Base rates of impaired neuropsychological test performance among healthy older adults, Arch. Clin. Neuropsychol. 13 (6) (1998) 503–511.
[34] D.J. Schretlen, S.M. Testa, J.M. Winicki, G.D. Pearlson, B. Gordon, Frequency and bases of abnormal performance by healthy adults on neuropsychological testing, J. Int. Neuropsychol. Soc. 14 (3) (2008) 436–445.
[35] D. Wechsler, Wechsler Adult Intelligence Scale, 4th ed., Psychological Corporation, San Antonio, TX, 2008.
[36] D. Wechsler, Wechsler Adult Intelligence Scale, 3rd ed., Psychological Corporation, San Antonio, 1997.
[37] D. Wechsler, Wechsler Memory Scale, 3rd ed., Psychological Corporation, San Antonio, TX, 1997.
[38] T. White, R.A. Stern, Neuropsychological Assessment Battery: Psychometric and Technical Manual, Psychological Assessment Resources, Lutz, 2003.
[39] D.L. Sackett, S.E. Straus, W.S. Richardson, W. Rosenberg, R.B. Haynes, Evidence-based Medicine: How to Practice and Teach EBM, 2nd ed., Churchill Livingstone, New York City, 2000.
[40] P. Green, M.L. Rohling, P.R. Lees-Haley, L.M. Allen, Effort has a greater effect on test scores than severe brain injury in compensation claimants, Brain Inj. 15 (2001) 1045–1060, http://dx.doi.org/10.1080/02699050110088254.
[41] R.G. Matarazzo, Psychological report standards in neuropsychology, Clin. Neuropsychol. 9 (1995) 249–250.
[42] R.I. Naugle, A.J. McSweeney, On the practice of routinely appending neuropsychological data to reports, Clin. Neuropsychol. 9 (1995) 245–247.
[43] S.C. Bowden, E.J. Harrison, D.W. Loring, Evaluating research for clinical significance: using critically appraised topics to enhance evidence-based neuropsychology, Clin. Neuropsychol. 28 (4) (2014) 653–668.
[44] American Psychological Association, National Council on Measurement in Education, American Educational Research Association, Standards for educational and psychological testing, 1st ed., American Educational Research Association, 1999.
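
Illustrative sketch (not part of the original article). Section 4.4 notes that a score whose confidence interval spans two qualitative descriptors must be interpreted in clinical context, and the case example reports standard scores alongside percentiles. The Python sketch below shows both ideas under stated assumptions: scores are on a normally distributed standard-score metric (mean 100, SD 15); the band floors are the conventional Wechsler-style ranges and serve only as placeholders, not as the Q-Simple cut-offs; the SEM of 3 points and all function names are hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION: placeholder band floors on a mean-100, SD-15 metric
# (conventional Wechsler-style ranges). These are NOT claimed to be the
# Q-Simple cut-offs; only the descriptor labels come from the article.
BANDS = [
    (130, "very superior"),
    (120, "superior"),
    (110, "high average"),
    (90, "average"),
    (80, "low average"),
    (70, "borderline"),
]
LOWEST_LABEL = "abnormal/impaired"


def percentile(score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score under a normal distribution."""
    return 100.0 * NormalDist(mean, sd).cdf(score)


def descriptor(score):
    """Qualitative descriptor for a standard score, using the placeholder bands."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return LOWEST_LABEL


def interval_descriptors(score, sem, z=1.96):
    """Descriptors at the lower and upper ends of the score's confidence interval.

    sem is the standard error of measurement (hypothetical value supplied by
    the caller); z = 1.96 gives an approximate 95% interval.
    """
    lower = score - z * sem
    upper = score + z * sem
    return descriptor(lower), descriptor(upper)


if __name__ == "__main__":
    print(round(percentile(96)), descriptor(96))
    print(interval_descriptors(81, sem=3.0))
```

Under these assumptions, a standard score of 96 falls at about the 39th percentile, in line with the PRI reported in the case example. A score of 81 with the hypothetical SEM of 3 yields an interval whose ends carry different descriptors (‘borderline’ and ‘low average’), which is the situation the text leaves to clinical judgment.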