
Archives of Clinical Neuropsychology 31 (2016) 378–388

Downloaded from https://academic.oup.com/acn/article/31/4/378/1694398 by BIBLIOTECA.MADRID@URJC.ES user on 12 September 2023


One-Year Reference Norms of Cognitive Change in Spanish Old Adults:
Data from the NEURONORMA Sample
Gonzalo Sánchez-Benavides1,2, Jordi Peña-Casanova1,3,*, Marta Casals-Coll1, Nina Gramunt2,
Rosa Maria Manero1,3, Albert Puig-Pijoan1,3, Miguel Aguilar4, Alfredo Robles5, Carmen Antúnez6,
Anna Frank-García7, Manuel Fernández-Martínez8, Rafael Blesa9 for the NEURONORMA Study Team†
1 Neurofunctionality and Language Group, Neurosciences Program, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
2 Clinical Research Program, Barcelona beta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain
3 Section of Behavioral Neurology, Neurology Department, Hospital del Mar, Parc Salut Mar, Barcelona, Spain
4 Neurology Department, Hospital Mútua de Terrassa, Terrassa, Spain
5 Neurology Department, Hospital La Rosaleda, Santiago de Compostela, Spain
6 Neurology Department, Hospital Virgen Arrixaca, Murcia, Spain
7 Neurology Department, Hospital Universitario La Paz, Madrid, Spain
8 Neurology Department, Hospital de Cruces, Barakaldo, Spain
9 Neurology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain
*Corresponding author at: Behavioral Neurology Section, Neurology Department, Hospital del Mar, Passeig Maritim, 25-29, 08003 Barcelona, Spain.
Tel.: +34-93-3160778; fax: +34-93-3160723.
E-mail address: jpcasanova@parcdesalutmar.cat (J. Peña-Casanova).
Accepted 14 March 2016

Abstract
Objective. Serial cognitive assessments are useful for many purposes, such as monitoring cognitive decline or evaluating the result
of an intervention. In order to determine whether an observed change is reliable and meaningful, longitudinal reference data from
non-clinical samples are needed. Since neuropsychological outcomes are affected by language and cultural background, cognitive
tests should be adapted, and country-based norms collected. The lack of cross-sectional normative data for the Spanish population
has been partially remediated, but there is still a need for reliable change norms. This paper aims to give an initial response to this
need by providing several reliable change indices (RCI) for 1-year follow-up in a Spanish sample.
Method. A longitudinal observational study was designed. A total of 122 healthy subjects over age 50 were evaluated twice
(interval M = 369.5, SD = 10.7 days) with the NEURONORMA battery. Score changes were analyzed, and simple discrepancy scores,
standard deviation indices, RCI, and standardized regression-based scores were calculated.
Results. Significant improvements were observed in variables related to memory, both verbal and visual, visuospatial function,
and the completion time of complex problems. Reference tables for several RCI are provided for use in clinical settings.
Conclusions. Our results confirm the existence of heterogeneous practice effects after 1 year, and support the recommendation of
using reliable change norms to avoid misdiagnosis in repeated assessments. This study provides initial, preliminary norms of
cognitive change for use in Spanish elders. Further studies on larger samples and different inter-visit intervals are still needed.

Keywords: Reliable change; Practice effects; Reference values; Assessment; Geriatrics

Introduction

Performing serial assessments is a common practice in neuropsychology. Repetitive evaluations allow clinicians to accomplish
several objectives, such as following the natural progression of a disease or determining the effects of interventions. In geriatric


See appendix for Members of the NEURONORMA Study Team.

© The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
doi:10.1093/arclin/acw018 Advance Access publication on 24 April 2016
G. Sánchez-Benavides et al. / Archives of Clinical Neuropsychology 31 (2016); 378–388 379

populations, cognitive follow-up permits, for example, the tracking of cognitive decline in patients suffering from dementia or the
detection of its initial symptoms by following subjects at risk. However, interpreting longitudinal data poses some challenges,
such as dealing with practice effects or with cognitive changes associated with aging (Duff, 2012; McCaffrey, Duff, &
Westervelt, 2000). The cross-sectional normative approach, which is the most common way of evaluating whether a given score can
be considered indicative of impairment, is not suitable for deciding whether a cognitive change is normal. To illustrate
this idea, imagine a highly educated 70-year-old male who begins to complain about his memory. After a neuropsychological
evaluation, his age- and education-adjusted performance in a list-learning memory test is 1 SD above his reference group. As a
result, no evidence of memory impairment, rather the contrary, is found at this moment. Imagine also that the subject is reevaluated
1 year later, with his performance then being 0.5 SD above his reference group. Although the score has declined by one-half SD, it is far
from the usual cut-off score that defines memory impairment (e.g., 1 or 1.5 SD below the reference mean). Therefore, by using the
cross-sectional normative data, no evidence of psychometric memory impairment can be concluded at either visit. This common
situation raises some clinically relevant questions: Is this one-half SD decline within the normal range of change after 1 year?
Should we adjust such expected change for age, education, or sex effects? How can we empirically support a conclusion of
memory impairment from intra-subject cognitive decline when scores are within the cross-sectional psychometric normal
range? In order to give adequate answers to these questions, we need reference data on longitudinal change in cognitive
scores from healthy people.
Several authors have provided statistical methods to determine whether a meaningful cognitive change has occurred, such as the
reliable change indices (RCI) (Chelune, Naugle, Lüders, Sedlak, & Awad, 1993; Iverson, 2001; Jacobson & Truax, 1991) and the
standardized regression-based (SRB) formulas (McSweeny, Naugle, Chelune, & Lüders, 1993). A review of methods can be found
in Duff (2012) and Hinton-Bayre (2010). After the 2010 position paper of the American Academy of Clinical Neuropsychology
recommending more research on the topic (Heilbronner et al., 2010), a wealth of papers on “reliable change” has been published
(e.g., Calamia, Markon, & Tranel, 2012; Gross et al., 2015; Sachs et al., 2012). However, reliable change reference data for
non-English-speaking populations remain scarce and, to our knowledge, currently non-existent in Spanish samples. It is well
established that norms should be valid and appropriate for the subject under evaluation. Sociodemographics, language, and
cultural background do influence cognitive measurements; therefore, translations, adaptations, and country-based collection of
normative data are needed in order to promote valid use of neuropsychological tests (Ardila, 2005; Fletcher-Janzen, Strickland, &
Reynolds, 2000). In Spain, normative data for several tests have been published in recent years, with the NEURONORMA study
being a noteworthy contribution. Until now, norms for the NEURONORMA battery have been provided for subjects over 50 (see
Peña-Casanova, Blesa, et al., 2009) and under 50 years (see Peña-Casanova et al., 2012), and norms for additional tests (under the
name NEURONORMA-Plus) will be published in the future. Such data, along with those provided by other initiatives (e.g., del
Pino, Peña, Schretlen, Ibarretxe-Bilbao, & Ojeda, 2015), are currently remediating the lack of valid neuropsychological norms in
Spain. Nevertheless, all of these normative data are cross-sectional in nature, and there is still a need for reliable change norms in
our country. This paper aims to respond to this need by providing several RCI for 1-year follow-up calculated from a sample of
healthy Spanish subjects over 50 years of age. Recommendations of use and examples for those clinicians interested in applying
these norms are also provided.

Materials and Methods

Participants

The data used to develop the reliable change norms presented in this study come from the 1-year follow-up visit performed in a
subgroup of healthy individuals from the NEURONORMA study normative sample. The NEURONORMA battery, which is a
comprehensive neuropsychological battery composed of 14 tests (cf. infra), was initially administered to 356 cognitively
normal subjects aged between 50 and 85 years old to obtain Spanish age- and education-adjusted normative data. Nine hospitals
from different regions of Spain were involved in data collection to ensure the maximum representativeness of the population.
Subjects were recruited between 2004 and 2007. The recruitment methods and sample characteristics have been extensively described
elsewhere (Peña-Casanova, Blesa, et al., 2009; Sánchez-Benavides et al., 2014). One hundred and twenty-four individuals were
reassessed 1 year after baseline (mean interval 369.5 days, SD = 10.7). Of those 124 subjects, two were diagnosed with mild
cognitive impairment (MCI) at follow-up by a clinician blinded to the study findings and were excluded from the analyses. Although
the final analyzed sample includes 122 individuals, not all subjects completed all the tests and the number of subjects providing
data for each test varies from 103 to 119. Sociodemographic characteristics and screening outcomes at baseline can be seen in
Table 1. The study was approved by the Research Ethics Committees of the involved centers and was conducted in accordance
with the Declaration of Helsinki and its subsequent amendments. All subjects signed an informed consent before performing
any study procedure.

Cognitive Measures

The NEURONORMA battery is composed of the following tests: Digit Span Forward and Backward; Visuospatial Span from
the WAIS-R-NI (Corsi’s Test); Trail Making Test (TMT); Symbol Digit Modalities Test (SDMT); Boston Naming Test (BNT);
Token Test; selected subtests of the Visual Object and Space Perception Battery (VOSP): Object Decision, Progressive
Silhouettes, Position Discrimination, and Number Localization; Judgment of Line Orientation (JLO); Rey–Osterrieth
Complex Figure (ROCF), copy and memory at 3 and 30 min; Free and Cued Selective Reminding Test (FCSRT); verbal
fluency tasks: semantic (animals) and formal phonemic (words beginning with P); Stroop Color-Word Interference Test; and
Tower of London Drexel University version (TOL-Dx). A total of 33 variables were analyzed (see Table 2 for a detailed list).
All the tests were administered and scored according to the standardized procedures published in each test’s manual. Details
on the testing procedures can be found in the NEURONORMA methods paper (Peña-Casanova, Blesa, et al., 2009). No alternative
versions, regardless of availability, were used at the retest visit. Depression symptoms were assessed at both visits by means of the
Hamilton Depression Rating Scale (HDRS).

Statistical Analyses

Initially, a descriptive analysis of the sociodemographic and cognitive outcomes at baseline (T1) and follow-up (T2) was
performed. Paired t-tests were used to test for significant changes in scores between visits. Pearson correlations and Cohen’s d effect
sizes were also computed. Five reliable change scores were calculated following previous works on the topic (Duff, 2012, 2014):
simple discrepancy scores, standard deviation indices (SDI), classical RCI, RCI correcting for practice effects, and complex SRB
scores. For SRB, baseline scores, age, education, and sex were used to predict the 1-year score. No simple SRB scores (i.e., using only
the baseline score) were computed because sociodemographic data are usually accessible to clinicians and can be easily included in
complex SRB calculations.
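
The text does not spell out how the standard error of the difference (SED) used by the RCI was obtained. The following is a minimal sketch, assuming the Jacobson and Truax (1991) definition, in which the SED is derived from the baseline SD and the test-retest correlation. The function name `sed_from_baseline` is hypothetical, and the formula is an assumption adopted here because it reproduces the RCI denominators reported in Table 4 from the SD and r values in Table 2.

```python
import math


def sed_from_baseline(sd_t1: float, r12: float) -> float:
    """Standard error of the difference, Jacobson & Truax (1991) form.

    Assumption (not stated in the paper): SEM = SD_T1 * sqrt(1 - r)
    and SED = sqrt(2 * SEM**2).
    """
    sem = sd_t1 * math.sqrt(1.0 - r12)
    return math.sqrt(2.0 * sem ** 2)


# Baseline SD and test-retest correlation taken from Table 2.
examples = [
    ("Digit Span Forward", 1.10, 0.58),
    ("TMT Part A", 27.79, 0.68),
    ("SDMT Total Correct", 14.29, 0.92),
]

for name, sd_t1, r12 in examples:
    print(f"{name}: SED = {sed_from_baseline(sd_t1, r12):.2f}")
```

Under this assumption, the three printed values can be compared with the RCI denominators listed for the same tests in Table 4 (1.01, 22.23, and 5.72).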

Results

Table 1 shows the baseline characteristics of the sample. Subjects’ mean age was 65 years and the mean duration of formal
education was 11 years, which approximately corresponds to ISCED-UNESCO level 2. Men were underrepresented in this follow-up
sample (31%). This percentage is lower than that observed in the global baseline sample (n = 356), which comprised 40.4%
men. There were no significant differences between visits regarding the presence of depression symptoms, as measured by the
HDRS score [T1, M = 2.7, SD = 2.8; T2, M = 3.0, SD = 3.2; t(117) = −0.14, p = .298], showing a minimal mean variation
(T2 − T1, M = 0.3, SD = 3.2). Table 2 summarizes the baseline and follow-up cognitive scores in the NEURONORMA
battery. Correlation values between T1 and T2 scores, along with effect sizes (Cohen’s d), are also shown. Global performance
tended to be better at follow-up. Such improvements were statistically significant in 11 out of the 33 studied variables:
Judgment of Line Orientation, t(109) = 2.02, p = .046; FCSRT Free recall trial 1, t(106) = 4.04, p < .001; FCSRT Total
free recall, t(106) = 4.27, p < .001; FCSRT Total recall, t(106) = 3.45, p < .001; FCSRT Free delayed recall, t(104) =
2.49, p = .014; FCSRT Total delayed recall, t(104) = 2.35, p = .020; ROCF 30 min recall, t(106) = 2.16, p = .032; TOL-Dx
Total execution time, t(111) = −3.19, p = .002; TOL-Dx Total solving time, t(111) = −3.13, p = .002; VOSP Object decision,
t(110) = 5.56, p < .001; VOSP Progressive silhouettes, t(110) = −9.11, p < .001.
Relevant percentiles (2%, 5%, 16%, 50%, 84%, 95%, 98%) obtained from the calculations of simple discrepancy scores (T2 − T1)
are displayed in Table 3. Medians are in most cases around 0 (i.e., no score change), except for memory measures and some timed
tasks, in which percentile 50 was associated with score improvements. These data can be easily used in clinical settings by locating
the position in the distribution of percentiles at which the patient's discrepancy score falls. As usually interpreted, the percentile, or the

Table 1. Sociodemographic characteristics and screening outcomes at baseline (n = 122)

                        Mean (SD)      Min–Max
Age, years              64.5 (8.7)     50–85
Education, years        10.6 (5.4)     0–20
Sex, women, n (%)       84 (69%)
MMSE                    29.0 (1.5)     24–30
MIS                     7.4 (1.1)      4–8
Retest interval, days   369.5 (10.7)   349–401
Notes: MMSE = Mini-Mental State Examination; MIS = Memory Impairment Screen.

Table 2. Test and retest cognitive scores, practice effects, and test-retest correlations

                              n    Baseline         1 year            Practice effects  r     d
Digit Span Forward            119  5.46 (1.10)      5.58 (1.09)       0.12 (1.01)       0.58  0.11
Digit Span Backward           119  3.92 (1.04)      4.01 (1.09)       0.09 (0.92)       0.62  0.09
Corsi’s Test Forward          113  4.05 (1.02)      4.12 (0.92)       0.06 (1.02)       0.50  0.06
Corsi’s Test Backward         113  3.41 (1.04)      3.50 (1.01)       0.09 (0.94)       0.50  0.09
TMT Part A                    112  57.62 (27.79)    58.51 (26.49)     0.88 (21.81)      0.68  0.03
TMT Part B                    103  138.40 (84.12)   145.40 (88.39)    7.07 (54.71)      0.80  0.08
SDMT Total Correct            111  34.59 (14.29)    34.97 (14.03)     0.38 (5.65)       0.92  0.03
Boston Naming Test            108  48.40 (7.06)     48.89 (6.53)      0.49 (3.33)       0.88  0.07
Token Test                    110  33.04 (2.50)     33.09 (2.35)      0.05 (1.93)       0.69  0.02
Judgment of Line Orientation  110  22.88 (4.06)     23.52 (3.94)*     0.64 (3.31)       0.66  0.16
ROCF Copy-Adequacy            107  30.62 (5.16)     30.82 (4.54)      0.20 (3.95)       0.67  0.04
ROCF Copy-Time                107  219.20 (96.54)   210.60 (111.78)   −8.59 (80.43)     0.71  −0.08
FCSRT Free Recall Trial 1     107  5.83 (2.35)      6.77 (2.36)*      0.93 (2.39)       0.48  0.40
FCSRT Total Free Recall       107  23.5 (7.37)      25.67 (6.86)*     2.18 (5.27)       0.73  0.31
FCSRT Total Recall            107  40.06 (5.49)     41.31 (5.16)*     1.25 (3.76)       0.75  0.24
FCSRT Free Delayed Recall     105  9.07 (3.43)      9.72 (3.35)*      0.65 (2.66)       0.68  0.20
FCSRT Total Delayed Recall    105  13.94 (2.24)     14.37 (2.28)*     2.39 (3.28)       0.66  0.19
ROCF 3 min Recall             107  17.21 (6.78)     18.13 (7.17)      0.91 (5.24)       0.74  0.15
ROCF 30 min Recall            107  16.58 (6.48)     17.63 (7.14)*     1.05 (4.99)       0.72  0.13
Verbal Animal Fluency         109  19.72 (5.69)     20.02 (5.65)      0.30 (3.53)       0.81  0.05
Verbal “P” Fluency            109  14.08 (5.36)     14.30 (5.53)      0.22 (3.81)       0.76  0.04
Stroop Word                   110  87.28 (19.60)    88.13 (19.47)     0.85 (18.16)      0.57  0.04
Stroop Color                  110  59.54 (14.48)    59.44 (14.86)     −0.1 (10.74)      0.73  −0.01
Stroop Color-Word             110  32.04 (11.06)    32.16 (10.89)     0.13 (7.85)       0.74  0.01
TOL-Dx Total Correct          112  4.26 (2.08)      4.57 (2.03)       0.31 (1.98)       0.54  0.15
TOL-Dx Total Moves            112  33.32 (17.59)    30.33 (16.78)     −1.99 (17.25)     0.50  −0.12
TOL-Dx Total Initiation Time  112  64.96 (51.25)    62.85 (36.62)     −2.10 (40.88)     0.61  −0.05
TOL-Dx Total Execution Time   112  300.10 (134.33)  269.40 (131.22)*  −30.65 (101.73)   0.71  −0.23
TOL-Dx Total Solving Time     112  365.00 (149.47)  332.30 (142.73)*  −32.76 (110.75)   0.71  −0.22
VOSP Object Decision          111  15.87 (2.50)     16.97 (2.29)*     1.10 (2.08)       0.63  0.46
VOSP Progressive Silhouettes  111  11.69 (2.84)     9.24 (3.16)*      −2.45 (2.83)      0.56  −0.82
VOSP Position Discrimination  111  19.59 (0.86)     19.60 (0.81)      −0.01 (1.32)      0.47  0.01
VOSP Number Localization      111  8.88 (1.66)      8.99 (1.49)       0.11 (1.32)       0.65  0.07
Notes: Means and (standard deviations) are given. Practice effects are calculated by subtracting the score at T1 from the score at T2
(1-year score minus baseline score). n = number of subjects with complete data for the variable; r = Pearson correlation; d = Cohen’s d.
All scores are raw scores and, therefore, time scores (TMT, ROCF Copy-Time, TOL-Dx Times), TOL-Dx Total Moves, and VOSP
Progressive Silhouettes appear reversed (i.e., negative practice effects mean improvement). TMT = Trail Making Test; SDMT = Symbol
Digit Modalities Test; ROCF = Rey–Osterrieth Complex Figure; FCSRT = Free and Cued Selective Reminding Test; TOL-Dx = Tower of
London Drexel version; VOSP = Visual Object and Spatial Perception Battery. *p < .05.

range of percentiles, associated with the raw score gives an idea of the likelihood of observing this degree of discrepancy in healthy
subjects. For example, for TOL-Dx Total Moves, half of the sample (percentile 50) needed at least three fewer moves to solve the
problems at the 1-year follow-up. Needing 25 extra moves at follow-up would fall within the 2–5 percentile range (Table 3), which can
support a clinical interpretation regarding the abnormality of such a decline in healthy people over 50 years old.
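
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name `percentile_band` and its return convention are hypothetical, and the cut-off values are those of the TOL-Dx Total Moves row of Table 3. Rows for inverse variables run from positive to negative across percentiles, so the sketch handles both directions.

```python
# Percentile columns of Table 3 and the TOL-Dx Total Moves row (T2 - T1).
PERCENTILES = [2, 5, 16, 50, 84, 95, 98]
TOL_MOVES = [34, 24, 14, -3, -19, -29, -33]


def percentile_band(discrepancy, cutoffs, percentiles=PERCENTILES):
    """Return (lower, upper) percentiles bracketing a discrepancy score.

    None marks an open end (below the 2nd or above the 98th percentile).
    Rows listed in descending order (inverse variables) are sign-flipped
    so that the comparison always runs on ascending values.
    """
    sign = -1 if cutoffs[0] > cutoffs[-1] else 1
    value = sign * discrepancy
    lower = upper = None
    for pct, cutoff in zip(percentiles, cutoffs):
        key = sign * cutoff
        if key <= value:
            lower = pct          # last percentile at or below the score
        if key >= value and upper is None:
            upper = pct          # first percentile at or above the score
    # Tied cut-offs can make the two bounds cross; report them in order.
    if lower is not None and upper is not None and lower > upper:
        lower, upper = upper, lower
    return lower, upper


# Hypothetical patient needing 25 extra moves at follow-up.
print(percentile_band(25, TOL_MOVES))
# A patient needing three fewer moves, i.e., the sample median.
print(percentile_band(-3, TOL_MOVES))
```

For the 25-extra-moves case the function returns the 2–5 band mentioned in the text, and a score equal to a tabled cut-off returns that percentile for both bounds.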
Table 4 shows SDI and RCI indices. Although these indices are calculated using different formulae, their interpretation is
homogeneous. Change is in all cases divided by some measure of standard deviation and produces z-scores that can be easily
interpreted and compared. SDI standardizes the discrepancy score (T2 − T1) by dividing it by the SD at T1. For its part, RCI uses the
standard error of the difference (SED) in the denominator, which is an estimate of the standard deviation of the difference score. In the
RCI correcting for practice effects, the group mean of change is subtracted from the individual discrepancy score before dividing it
by the SED. This procedure “centers” the individual change at the mean normal variation. Finally, Table 5 shows the results of the
SRB. In the SRB method, a predicted T2 score is calculated accounting for the subject’s baseline score and the sociodemographic
predictors that were significant in the regression analyses. Then, the predicted score is subtracted from the subject’s actual score and
divided by the standard error of the estimate (SEE) of the regression equation. The result can then be interpreted just as the previous
indices, by treating it as a regular z-score. The reader should notice that for inverse variables (i.e., those in which the higher the
score, the higher the impairment: TMT, ROCF Copy-Time, TOL-Dx Total Moves, TOL-Dx Initiation, Execution and Solving
Times, and VOSP Progressive Silhouettes), the interpretation of the obtained z-score should be reversed (i.e., positive z-scores
mean decline at follow-up).
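
As an illustration of how the four indices relate, the sketch below applies them to FCSRT Total Free Recall. The constants are the values tabled for that test (SD at T1 and SED from Table 4, mean practice effect from Table 2, regression formula and SEE from Table 5). The function names and the patient are hypothetical and are not taken from the study.

```python
def sdi(t1, t2, sd_t1):
    """Standard deviation index: change divided by the baseline SD."""
    return (t2 - t1) / sd_t1


def rci(t1, t2, sed):
    """Classical reliable change index: change divided by the SED."""
    return (t2 - t1) / sed


def rci_pe(t1, t2, sed, practice_effect):
    """RCI corrected for the mean practice effect of the reference group."""
    return ((t2 - t1) - practice_effect) / sed


def srb(t2, predicted_t2, see):
    """Standardized regression-based score: residual divided by the SEE."""
    return (t2 - predicted_t2) / see


# FCSRT Total Free Recall constants as tabled.
SD_T1, SED, PE, SEE = 7.37, 5.42, 2.18, 4.32


def predicted_fcsrt_tfr(t1, age):
    """Predicted T2 from Table 5: 29.5 + T1 x 0.54 - age x 0.26."""
    return 29.5 + t1 * 0.54 - age * 0.26


# Hypothetical patient: 70 years old, 24 words at baseline, 18 at follow-up.
t1, t2, age = 24, 18, 70

print(f"SDI      = {sdi(t1, t2, SD_T1):.2f}")
print(f"RCI      = {rci(t1, t2, SED):.2f}")
print(f"RCI + PE = {rci_pe(t1, t2, SED, PE):.2f}")
print(f"SRB      = {srb(t2, predicted_fcsrt_tfr(t1, age), SEE):.2f}")
```

For this hypothetical case, the indices that take the expected practice effect or the demographic prediction into account yield a more negative z-score than the SDI or the classical RCI, because a healthy subject would be expected to improve slightly rather than stay the same.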

Table 3. Simple discrepancy scores

                              2%     5%     16%    50%    84%    95%    98%
Digit Span Forward            −2     −1     −1     0      1      2      2
Digit Span Backward           −2     −1     −1     0      1      2      2
Corsi’s Test Forward          −2     −1     −1     0      1      2      2
Corsi’s Test Backward         −2     −1     −1     0      1      2      2
TMT Part A                    53     26     16     1      −12    −32    −57
TMT Part B                    121    86     51     3      −28    −69    −111
SDMT Total Correct            −10    −8     −6     0      6      10     11
Boston Naming Test            −9     −5     −2     1      4      5      6
Token Test                    −5     −3.5   −1.5   0      2      3      4
Judgment of Line Orientation  −6     −5     −2     1      3      6      7
ROCF Copy-Adequacy            −6     −5     −3     0      3      6      10
ROCF Copy-Time                190    130    53     −8     −74    −127   −164
FCSRT Free Recall Trial 1     −5     −4     −1     1      3      5      6
FCSRT Total Free Recall       −10    −7     −2     2      8      10     11
FCSRT Total Recall            −6     −5     −2     1      4      8      9
FCSRT Free Delayed Recall     −4     −3     −2     1      4      5      5
FCSRT Total Delayed Recall    −3     −2     −1     0      2      3      4
ROCF 3 min Recall             −9     −8     −5     2      6.5    9      10
ROCF 30 min Recall            −8.5   −7.5   −3.5   1      6      9.5    10
Verbal Animal Fluency         −7     −5     −3     0      3      6      7
Verbal “P” Fluency            −7     −5     −3     0      4      6      9
Stroop Word                   −32    −20    −8     0      12     32     35
Stroop Color                  −25    −13    −5     0      7      12     21
Stroop Color-Word             −18    −15    −7     1      7      13     17
TOL-Dx Total Correct          −3     −3     −2     0      2      3      4
TOL-Dx Total Moves            34     24     14     −3     −19    −29    −33
TOL-Dx Total Initiation Time  78     42     24     1      −26    −56    −73
TOL-Dx Total Execution Time   153    129    55     −29    −138   −186   −213
TOL-Dx Total Solving Time     161    137    54     −24    −143   −223   −253
VOSP Object Decision          −3     −2     −1     1      3      5      5
VOSP Progressive Silhouettes  4      3      0      −2     −5     −7     −8
VOSP Position Discrimination  −2     −1     0      0      0      1      2
VOSP Number Localization      −3     −2     −1     0      1      2      3
Notes: Scores associated with selected percentiles. Simple discrepancy scores are 1 year minus baseline (T2 − T1). All scores are raw
scores and, therefore, time scores (TMT, ROCF Copy-Time, TOL-Dx Times), TOL-Dx Total Moves, and VOSP Progressive Silhouettes
appear reversed (i.e., negative discrepancy scores mean improvement). TMT = Trail Making Test; SDMT = Symbol Digit Modalities
Test; ROCF = Rey–Osterrieth Complex Figure; FCSRT = Free and Cued Selective Reminding Test; TOL-Dx = Tower of London Drexel
version; VOSP = Visual Object and Spatial Perception Battery.

Discussion

This paper provides reliable change reference data for several measures from a sample of healthy Spanish subjects between 50
and 85 years old. These data can be used to determine if a subject’s 1-year cognitive change is meaningful or not. Percentiles for
simple discrepancy scores, SDI, RCI, and SRB indices for widely used neuropsychological tests are given.
We have found a global trend toward improvement in cognitive performance at follow-up, indicating the presence of practice
effects in most tests after 1 year (Table 2). This finding is not surprising, since 1-year practice effects are commonly reported in
healthy samples (e.g., Calamia et al., 2012; Jonaitis et al., 2015; Levine, Miller, Becker, Selnes, & Cohen, 2004). Despite this
global positive trend, only a few variables reached significance in paired t-tests. Significant improvements were observed in
variables related to memory, both verbal (FCSRT) and visual (ROCF), visuospatial function (JLO, VOSP Object Decision and
Progressive Silhouettes), and the completion time of complex problems (TOL-Dx Total Execution Time and Total Solving
Time). Regarding memory tasks, when no alternative versions are used, the examinee learns both the procedure of the
task and the specific materials to be remembered, and consequently larger practice effects are expected. In our sample, practice
effects in memory tasks showed heterogeneous effect size values, ranging from 0.13 in the delayed recall of the ROCF to
0.40 in the Free Recall Trial 1 of the FCSRT. Previous reports found a high degree of heterogeneity in practice effects for tasks
within the same cognitive domain (Calamia et al., 2012). Such heterogeneity is likely related to specific test-associated
traits that modulate learning at baseline. Heterogeneity can also be observed in the visuospatial data. While the
spatial perception subtests of the VOSP (i.e., Position Discrimination and Number Localization) showed minimal improvement

Table 4. Standard deviation and reliable change indices

                              SDI                RCI                RCI + PE
Digit Span Forward            (T2 − T1)/1.10     (T2 − T1)/1.01     [(T2 − T1) − 0.12]/1.01
Digit Span Backward           (T2 − T1)/1.04     (T2 − T1)/0.91     [(T2 − T1) − 0.09]/0.91
Corsi’s Test Forward          (T2 − T1)/1.02     (T2 − T1)/1.02     [(T2 − T1) − 0.06]/1.02
Corsi’s Test Backward         (T2 − T1)/1.04     (T2 − T1)/1.04     [(T2 − T1) − 0.09]/1.04
TMT Part A                    (T2 − T1)/27.79    (T2 − T1)/22.23    [(T2 − T1) + 0.88]/22.23
TMT Part B                    (T2 − T1)/84.12    (T2 − T1)/53.20    [(T2 − T1) + 7.07]/53.20
SDMT Total Correct            (T2 − T1)/14.29    (T2 − T1)/5.72     [(T2 − T1) − 0.38]/5.72
Boston Naming Test            (T2 − T1)/7.06     (T2 − T1)/3.46     [(T2 − T1) − 0.49]/3.46
Token Test                    (T2 − T1)/2.50     (T2 − T1)/1.97     [(T2 − T1) − 0.05]/1.97
Judgment of Line Orientation  (T2 − T1)/4.06     (T2 − T1)/3.35     [(T2 − T1) − 0.64]/3.35
ROCF Copy-Adequacy            (T2 − T1)/5.16     (T2 − T1)/4.19     [(T2 − T1) − 0.20]/4.19
ROCF Copy-Time                (T2 − T1)/96.54    (T2 − T1)/73.52    [(T2 − T1) + 8.59]/73.52
FCSRT Free Recall Trial 1     (T2 − T1)/2.35     (T2 − T1)/2.40     [(T2 − T1) − 0.93]/2.40
FCSRT Total Free Recall       (T2 − T1)/7.37     (T2 − T1)/5.42     [(T2 − T1) − 2.18]/5.42
FCSRT Total Recall            (T2 − T1)/5.49     (T2 − T1)/3.88     [(T2 − T1) − 1.25]/3.88
FCSRT Free Delayed Recall     (T2 − T1)/3.43     (T2 − T1)/2.62     [(T2 − T1) − 0.65]/2.62
FCSRT Total Delayed Recall    (T2 − T1)/2.24     (T2 − T1)/1.85     [(T2 − T1) − 2.39]/1.85
ROCF 3 min Recall             (T2 − T1)/6.78     (T2 − T1)/4.67     [(T2 − T1) − 0.91]/4.67
ROCF 30 min Recall            (T2 − T1)/6.48     (T2 − T1)/5.01     [(T2 − T1) − 1.05]/5.01
Verbal Animal Fluency         (T2 − T1)/5.69     (T2 − T1)/3.51     [(T2 − T1) − 0.30]/3.51
Verbal “P” Fluency            (T2 − T1)/5.36     (T2 − T1)/3.71     [(T2 − T1) − 0.22]/3.71
Stroop Word                   (T2 − T1)/19.60    (T2 − T1)/18.18    [(T2 − T1) − 0.85]/18.18
Stroop Color                  (T2 − T1)/14.48    (T2 − T1)/10.64    [(T2 − T1) + 0.10]/10.64
Stroop Color-Word             (T2 − T1)/11.06    (T2 − T1)/7.98     [(T2 − T1) − 0.13]/7.98
TOL-Dx Total Correct          (T2 − T1)/2.08     (T2 − T1)/2.00     [(T2 − T1) − 0.31]/2.00
TOL-Dx Total Moves            (T2 − T1)/17.59    (T2 − T1)/17.59    [(T2 − T1) + 1.99]/17.59
TOL-Dx Total Initiation Time  (T2 − T1)/51.25    (T2 − T1)/44.26    [(T2 − T1) + 2.10]/44.26
TOL-Dx Total Execution Time   (T2 − T1)/134.33   (T2 − T1)/102.30   [(T2 − T1) + 30.65]/102.30
TOL-Dx Total Solving Time     (T2 − T1)/149.47   (T2 − T1)/113.83   [(T2 − T1) + 32.76]/113.83
VOSP Object Decision          (T2 − T1)/2.50     (T2 − T1)/2.15     [(T2 − T1) − 1.10]/2.15
VOSP Progressive Silhouettes  (T2 − T1)/2.84     (T2 − T1)/2.66     [(T2 − T1) + 2.45]/2.66
VOSP Position Discrimination  (T2 − T1)/0.86     (T2 − T1)/0.89     [(T2 − T1) + 0.01]/0.89
VOSP Number Localization      (T2 − T1)/1.66     (T2 − T1)/1.39     [(T2 − T1) − 1.11]/1.39
Notes: SDI = Standard Deviation Index; RCI = Reliable Change Index; PE = Practice Effects; TMT = Trail Making Test; SDMT = Symbol
Digit Modalities Test; ROCF = Rey–Osterrieth Complex Figure; FCSRT = Free and Cued Selective Reminding Test; TOL-Dx = Tower of
London Drexel version; VOSP = Visual Object and Spatial Perception Battery.

after 1 year, the JLO test, which is thought to tap the same underlying cognitive domain, displayed a significant improvement. The
other two VOSP subtests, which assess object recognition (i.e., Object Decision and Progressive Silhouettes), showed larger
practice effects, with Progressive Silhouettes showing the largest effect size in the battery (−0.85). In Progressive Silhouettes, the
examinee should recognize as early as possible an object that is initially presented from a non-prototypical point of view. In
subsequent drawings (10 per item), the silhouette progressively reveals more details of the object by presenting it in a more
informative view, rotated toward its elongated axis. There are two objects to be recognized. Because of its nature, this recognition task is
clearly influenced by previous exposure to the test, and individuals can recognize the objects with less visual information in a
second assessment because they remember the object. There are other neuropsychological tests, not studied here, that are even
more prone to show large practice effects, such as the Wisconsin Card Sorting Test, which has been labeled a “one-shot test”
(Calamia et al., 2012) because the main measures of the test are compromised by previous exposure. In fact, the novelty of the
task seems to be one of the most relevant factors in the magnitude of practice effects. The greater the novelty, the greater the practice
effect (Cysique et al., 2011; Dikmen, Heaton, Grant, & Temkin, 1999). In the current study, the TOL-Dx could be labeled as
the most “novel” task of the battery, and the significant decrements in the time needed to solve the problems can be interpreted
under this novelty effect assumption.
Negligible practice effects (operationalized as an arbitrary absolute Cohen’s d value below 0.1) were found for many variables.
The absence of significant improvements in these variables can be explained in terms of stability in memory-free variables [e.g.,
accuracy in constructional praxis (copy of the ROCF)], lack of novelty of the task [e.g., recalling vocabulary (BNT) or following
commands (Token Test)], or minimal learning of the specific items in “performance time-based” executive tests (e.g., the TMT,
SDMT, and Stroop test). However, these results can also be explained in part from an aging perspective. Previous studies reported

Table 5. Complex standardized regression-based formulas

                              F (df)          R²    SEE    Predicted T2
Digit Span Forward            36.38 (2,116)   0.38  0.86   2.52 + T1 × 0.46 + edu × 0.05
Digit Span Backward           38.97 (3,115)   0.49  0.77   3.29 + T1 × 0.47 − age × 0.03 + edu × 0.05
Corsi’s Test Forward          24.23 (2,110)   0.29  0.77   4.10 + T1 × 0.42 − age × 0.03
Corsi’s Test Backward         15.95 (3,109)   0.29  0.85   3.00 + T1 × 0.36 − age × 0.02 + edu × 0.03
TMT Part A                    50.96 (2,109)   0.47  19.22  35.1 + T1 × 0.56 − edu × 0.84
TMT Part B                    179.5 (1,101)   0.63  53.30  29.1 + T1 × 0.84
SDMT Total Correct            316.1 (2,108)   0.85  5.41   14.3 + T1 × 0.87 − age × 0.15
Boston Naming Test            219.0 (2,105)   0.80  2.90   18.9 + T1 × 0.79 − age × 0.13
Token Test                    58.66 (2,107)   0.51  1.64   14.2 + T1 × 0.53 + edu × 0.11
Judgment of Line Orientation  82.90 (1,108)   0.43  2.98   8.88 + T1 × 0.64
ROCF Copy-Adequacy            54.53 (2,104)   0.50  3.20   12.1 + T1 × 0.54 + edu × 0.20
ROCF Copy-Time                107.3 (1,105)   0.50  78.98  30.2 + T1 × 0.82
FCSRT Free Recall Trial 1     24.00 (2,104)   0.30  1.97   9.85 + T1 × 0.38 − age × 0.08
FCSRT Total Free Recall       81.73 (2,104)   0.60  4.32   29.5 + T1 × 0.54 − age × 0.26
FCSRT Total Recall            76.65 (2,104)   0.59  3.31   22.7 + T1 × 0.64 − age × 0.11
FCSRT Free Delayed Recall     50.47 (2,102)   0.49  2.40   10.1 + T1 × 0.58 − age × 0.09
FCSRT Total Delayed Recall    45.11 (2,102)   0.46  1.68   9.32 + T1 × 0.60 − age × 0.05
ROCF 3 min Recall             62.88 (2,104)   0.54  4.87   3.19 + T1 × 0.72 + edu × 0.23
ROCF 30 min Recall            123.7 (1,105)   0.54  4.86   4.18 + T1 × 0.81
Verbal Animal Fluency         74.89 (3,105)   0.67  3.23   9.99 + T1 × 0.69 − age × 0.08 + edu × 0.16
Verbal “P” Fluency            142.1 (1,107)   0.57  3.64   3.32 + T1 × 0.78
Stroop Word                   28.24 (2,107)   0.33  15.9   63.8 + T1 × 0.54 − age × 0.36
Stroop Color                  66.80 (2,107)   0.55  9.99   35.4 + T1 × 0.69 − age × 0.27
Stroop Color-Word             91.49 (2,107)   0.62  6.67   37.9 + T1 × 0.61 − age × 0.39
TOL-Dx Total Correct          44.54 (1,109)   0.28  1.72   2.33 + T1 × 0.52
TOL-Dx Total Moves            23.22 (2,108)   0.29  14.2   −13.6 + T1 × 0.45 + age × 0.46
TOL-Dx Total Initiation Time  30.05 (3,108)   0.44  27.4   5.00 + T1 × 0.42 + age × 0.66 + sex × 17.04
TOL-Dx Total Execution Time   62.87 (2,109)   0.53  90.2   −118 + T1 × 0.61 + age × 3.19
TOL-Dx Total Solving Time     68.37 (3,108)   0.55  95.9   −138 + T1 × 0.59 + age × 3.98
VOSP Object Decision          31.86 (3,107)   0.46  1.69   10.4 + T1 × 0.52 − age × 0.04 + edu × 0.09
VOSP Progressive Silhouettes  49.39 (1,109)   0.31  2.64   1.96 + T1 × 0.62
VOSP Position Discrimination  31.31 (1,109)   0.22  0.72   10.8 + T1 × 0.45
VOSP Number Localization      48.40 (2,108)   0.46  1.09   6.06 + T1 × 0.59 − age × 0.04
Notes: Sex is coded as Female = 1, Male = 0. SEE = Standard Error of the Estimate of the regression model; TMT = Trail Making Test;
SDMT = Symbol Digit Modalities Test; ROCF = Rey–Osterrieth Complex Figure; FCSRT = Free and Cued Selective Reminding Test;
TOL-Dx = Tower of London Drexel version; VOSP = Visual Object and Spatial Perception Battery.

controversial findings regarding practice effects on the studied tests, which seem to be highly influenced by participants’ age.
While young, well-educated samples obtain a consistent benefit from previous exposure (e.g., Attix et al., 2009; Estevis, Basso,
& Combs, 2012; Levine et al., 2004; Salinsky, Storzbach, Dodrill, & Binder, 2001), older samples show much smaller practice
effects at retest (Calamia et al., 2012; Gavett, Ashendorf, & Gurnani, 2015). A relevant question that needs further research
is whether reliable changes show heteroscedasticity across age groups. If so, norms for change should be developed separately
for different age bands to account for such behavior, and larger samples would be needed. In addition, the relationship between
reliable change and age largely depends on the specific traits of the task and the inter-visit interval. Our results on the TMT,
which showed a mean decrease at retest rather than an improvement (1 s slower in Part A and 7 s slower in Part B), suggest that
some tests could be relatively more influenced by aging-related changes than by learning from previous exposure. In any case, the
lack of statistical significance of the change in the TMT between visits prevents us from drawing strong conclusions, and such an
aging-related interpretation is merely speculative. Further research on the topic could clarify whether reliable change differs
across ages.
It has been suggested that age-related influences in practice effects operate mainly at the initial assessment and are driven by the
age-associated differences in learning ability at baseline (Salthouse, 2011). As displayed in our SRB calculations (Table 5), the
strongest predictor of score at retest is in fact the score at first visit, which is assumed to be influenced by age and education
factors. Relevance of initial performance in the prediction of performance at subsequent assessments has been previously reported
in studies using SRB indices (e.g., Attix et al., 2009; Duff, 2014) and supports the idea that the best predictor of future behavior
is past behavior. In our SRB models, age and education also significantly predicted retest scores in most of the variables
under study.
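As a minimal sketch of how the Table 5 equations are applied, the following Python fragment evaluates a predicted retest score from the baseline score, age, and education. The container `SRBModel`, the dictionary `MODELS`, and the function `predicted_t2` are illustrative names that do not appear in the original study. Only three rows of Table 5 are transcribed, and the sex term used by one TOL-Dx equation is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRBModel:
    """Coefficients of one Table 5 equation (hypothetical container name)."""
    intercept: float
    b_t1: float          # weight of the baseline (T1) raw score
    b_age: float = 0.0   # weight of age in years (0 if not in the equation)
    b_edu: float = 0.0   # weight of education (0 if not in the equation)


# Three rows transcribed from Table 5; age enters with a negative sign.
MODELS = {
    "FCSRT Total Recall": SRBModel(22.7, 0.64, b_age=-0.11),
    "SDMT Total Correct": SRBModel(14.3, 0.87, b_age=-0.15),
    "Token Test": SRBModel(14.2, 0.53, b_edu=0.11),
}


def predicted_t2(model: SRBModel, t1: float, age: float, edu: float) -> float:
    """Predicted retest score: intercept + T1*b_t1 + age*b_age + edu*b_edu."""
    return (model.intercept + model.b_t1 * t1
            + model.b_age * age + model.b_edu * edu)


if __name__ == "__main__":
    fcsrt = MODELS["FCSRT Total Recall"]
    # 70-year-old with 15 years of education and a baseline score of 47.
    print(round(predicted_t2(fcsrt, t1=47, age=70, edu=15), 2))
```

Under these assumptions, a 70-year-old with a baseline FCSRT Total Recall of 47 obtains a predicted retest score of about 45.08. Because the baseline weight is below 1, a high baseline is pulled downward, whereas a low one (e.g., 30, predicted about 34.2) is pulled upward.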
With the aim of illustrating the application and usefulness of the indices presented here, an example is provided below.
We continue with our hypothetical case of a 70-year-old man with 15 years of education who has memory complaints. Let us
imagine that he is administered the FCSRT twice with a test-retest interval of 1 year. His Total Recall raw score at baseline is
47 out of 48. This raw score corresponds to an age- and education-adjusted scaled score of 14, with an associated percentile
range of 90–94, according to published Spanish norms (see Peña-Casanova, Gramunt-Fombuena et al., 2009). At follow-up, his
score decreases to 43, which corresponds to a scaled score of 11 (associated percentile range 60–71), still above his
reference group mean and far from the usual MCI cut-off score of −1.5 SD (< percentile 7). A simple discrepancy score analysis
(Table 3) indicated that a loss of 4 points after 1 year occurs in only 5%–16% of the healthy population (z = −1.645 to
z = −0.994). The calculation of the SDI index (Table 4) yields a z-score of −0.73 (−4/5.49), under which the change is interpreted
as less infrequent than under the z-score derived from the simple discrepancy approach. Similar disparities are found for the more
complex indices, also calculated from the data shown in Table 4. The regular RCI index results in a z-score of −1.03 (−4/3.88),
and the RCI controlling for practice effects yields a z-score of −1.35 [(−4 − 1.25)/3.88]. Finally, the complex SRB on the
Total Recall of the FCSRT (Table 5) shows that change on this score is positively influenced by baseline performance and
negatively affected by age. Thus, the predicted score for our subject is 45 (22.7 + 47 × 0.64 − 70 × 0.11), and the SRB-associated
z-score is −1.21, obtained by subtracting the predicted score from the follow-up score and dividing the result by the standard
error of the estimate [(41 − 45)/3.31]. Notice that in spite of the trend toward improvement at follow-up observed in the sample
for the FCSRT Total Recall, the predicted SRB score for our subject is lower (45) than his actual score at baseline (47). Since 47
is very close to the maximum test score (48), the SRB prediction could be interpreted in terms of the regression to the mean
phenomenon, in which extreme scores statistically tend to be closer to the group mean at retest.
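The arithmetic of the worked example can be retraced with a short Python sketch. The function names (`sdi_z`, `rci_z`, `srb_z`) and parameter names are hypothetical. The denominators 5.49 and 3.88 and the practice effect of 1.25 are used exactly as quoted in the text; their definitions are given in Table 4, which is not reproduced here. The bracketed SRB expression in the text uses an observed score of 41, which reproduces −1.21; with the stated follow-up score of 43, the same predicted score and SEE would give about −0.60, so both are shown.

```python
def sdi_z(change: float, denominator: float) -> float:
    """SDI z-score: observed change over the Table 4 SDI denominator."""
    return change / denominator


def rci_z(change: float, se_diff: float, practice: float = 0.0) -> float:
    """RCI z-score; a non-zero `practice` subtracts the mean practice effect."""
    return (change - practice) / se_diff


def srb_z(observed_t2: float, predicted_t2: float, see: float) -> float:
    """SRB z-score: (observed retest - predicted retest) / SEE."""
    return (observed_t2 - predicted_t2) / see


if __name__ == "__main__":
    change = 43 - 47  # follow-up minus baseline, i.e. a 4-point loss
    print(f"SDI               {sdi_z(change, 5.49):.2f}")
    print(f"RCI               {rci_z(change, 3.88):.2f}")
    print(f"RCI + practice    {rci_z(change, 3.88, practice=1.25):.2f}")
    print(f"SRB (observed 41) {srb_z(41, 45, 3.31):.2f}")
    print(f"SRB (observed 43) {srb_z(43, 45, 3.31):.2f}")
```

Keeping the observed score as an explicit argument makes it easy to check which input a reported z-score corresponds to.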
After performing all the reliable change calculations, we have obtained several estimates of the likelihood of observing a
4-point loss in elderly subjects in 1 year. Such estimates range from a z-score of −1.645 at the lower bound of the simple
discrepancy percentile interval to −0.73 for the SDI calculation; therefore, their interpretation is not straightforward. There is
no consensus regarding the best index to estimate reliable change. According to previous literature, a general recommendation
would be to rely on the RCI controlling for practice effects and on SRB indices, since they have shown good and similar results in
comparison studies (see Duff, 2012, and Hinton-Bayre, 2010, for a discussion on the topic). In our example, RCI and SRB results
fell between the more extreme z-scores calculated from simple discrepancy and SDI, and are more likely to reflect the actual change
of the subject when compared with the normative sample, since they account for more of the individual’s information. In any case,
it would seem clear to any clinician that our hypothetical subject, despite obtaining scores within the normal cross-sectional
psychometric range, presents with a rather uncommon decline in verbal episodic memory. It is the clinician’s responsibility to
choose the most appropriate approach and norms for evaluating change, and to conclude whether a meaningful and reliable change
occurred by establishing a relevant z-score cut-off (e.g., −1 or −1.645). Beyond the reliable change calculations presented here,
a clinical interview searching for possible functional changes during the test-retest interval should be mandatory, and it is the
most accurate approach to evaluating the actual relevance of an observed cognitive change.
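To make the role of the cut-off concrete, the sketch below converts a z-score into the share of the reference distribution expected at or below it and flags the indices that reach a chosen cut-off. It assumes that change scores are approximately normally distributed, which is an assumption of this illustration rather than a result of the study. The names `normal_cdf` and `flag_decline` are hypothetical, and the z-scores are those reported in the worked example.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def flag_decline(z_scores: dict[str, float], cutoff: float) -> dict[str, bool]:
    """Mark each index whose z-score is at or below the chosen cut-off."""
    return {name: z <= cutoff for name, z in z_scores.items()}


if __name__ == "__main__":
    # z-scores as reported in the worked example above.
    example = {"SDI": -0.73, "RCI": -1.03, "RCI+PE": -1.35, "SRB": -1.21}
    for name, z in example.items():
        print(f"{name:7s} z={z:+.2f}  lower-tail share={normal_cdf(z):.1%}")
    print(flag_decline(example, cutoff=-1.0))
    print(flag_decline(example, cutoff=-1.645))
```

With a cut-off of −1, three of the four indices flag the change; with −1.645, none does, which illustrates why the cut-off has to be chosen deliberately.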
The data presented in this paper, collected from healthy Spanish subjects over 50 years of age, could be of special interest in the
detection of subtle cognitive decline in elderly subjects. Studying cognitive change in elderly people deserves particular attention
because the longitudinal normative approach of RCI could be especially useful in prevalent and burdensome pathologies such as
Alzheimer’s disease (AD). Norms for cognitive change can be used to define the transitional stage between healthy aging and
dementia, the MCI stage (Bläsi et al., 2009), and to provide additional evidence that an individual has MCI due to AD (Albert et al.,
2011). In preclinical AD, in which subjects’ scores could be within the normal psychometric range for years before meeting a
diagnosis of MCI, cognitive decline is one of the operationalizations of the “subtle cognitive decline” proposed for Stage III of
preclinical AD (Sperling et al., 2011). Thus, norms for change seem especially useful in the characterization and definition of the
predementia phases of AD. However, it is important to highlight that once a clinical diagnosis is made, cognitive change over
time should be compared with the change observed in relevant clinical samples, rather than with that observed in healthy samples. In
fact, as stressed by several authors, applying change norms derived from healthy subjects to assess change in clinical samples
represents a clear misuse of such normative data (Attix et al., 2009; Gavett et al., 2015). Clinical-based norms have been successfully
applied in conditions such as Parkinson disease (Rinehardt et al., 2010; Schoenberg et al., 2012) or epilepsy surgery (Martin,
Griffith, Sawrie, Knowlton, & Faught, 2006) to assess the effect of interventions.
This study has several limitations. The sample used to develop the norms presented here is small and, therefore, the provided
data should be considered as an initial contribution to the reliable change literature in our population. However, the use of a large
battery and the 1-year test-retest interval, which is clinically meaningful, are some of the strengths of the study. Further research on
reliable change in larger healthy and cognitively impaired Spanish samples, along with longer follow-ups (e.g., after 2 or 3 years),
would contribute important tools to improve the clinical practice of neuropsychologists in Spain. There is an additional limitation
for the use of these data in the field of preclinical AD. Since we do not have information on the AD biomarker status of the sample,
the sample may include subjects with undetected preclinical AD. Therefore, these norms should be used cautiously to determine the
presence of “subtle cognitive decline” when defining Stage III preclinical AD. Robust change norms obtained from individuals who
either do not develop AD pathology in further follow-ups or are negative for AD biomarkers at baseline would overcome
this limitation.
Finally, a comment on the generalizability of these data should be made. These norms are derived from a sample of
Spanish-speaking subjects from Spain. Therefore, they should be used cautiously if applied to a different population.
We recommend that clinicians from other Spanish-speaking countries, namely Mexico and the countries of South and
Central America, be aware of cultural and linguistic differences when making clinical interpretations derived from these data,
since there are currently no studies exploring culture-related differences in reliable change norms. In any
case, as with the selection of cross-sectional norms, it is the clinician’s responsibility to choose the most appropriate references
for each patient and, in the absence of better suited norms, these could also be of interest for Latin American neuropsychologists.

Funding

This study was mainly supported by a grant from the Pfizer Foundation, and by the Medical Department of Pfizer, SA. Spain. It
was also supported by the Behavioral Neurology group of the Program of Neuroscience, Hospital del Mar Research Institute,
Barcelona, Spain. JP-C has received an intensification research grant from the CIBERNED (Centro de Investigación
Biomédica en Red sobre Enfermedades Neurodegenerativas), Instituto Carlos III (Ministry of Health & Consumer Affairs of
Spain).

Conflict of Interest

None declared.

Acknowledgements

We would like to thank the participants in this study for their time and collaboration. The authors would also like to thank the
reviewers of this paper for their valuable comments.

Appendix: Members of the NEURONORMA Study Team

Steering committee: JP-C, Hospital del Mar, Barcelona, Spain; RB, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain; MA,
Hospital Mútua de Terrassa, Terrassa, Spain. Principal investigators: JP-C, Hospital del Mar, Barcelona, Spain; RB, Hospital de la
Santa Creu i Sant Pau, Barcelona, Spain; MA, Hospital Mútua de Terrassa, Terrassa, Spain; Jose Luis Molinuevo, Hospital Clínic,
Barcelona, Spain; AR, Hospital Clínico Universitario, Santiago de Compostela, Spain; MSB (deceased), Hospital Clínico San
Carlos, Madrid, Spain; CA, Hospital Virgen Arrixaca, Murcia, Spain; Carlos Martínez-Parra (deceased), Hospital Virgen
Macarena, Sevilla, Spain; AF-G, Hospital Universitario La Paz, Madrid, Spain; MF-M, Hospital de Cruces, Bilbao, Spain.
Genetics substudy: Rafael Oliva, Service of Genetics, Hospital Clínic, Barcelona, Spain. Neuroimaging substudy: Beatriz
Gómez-Ansón, Radiology Department and IDIBAPS, Hospital Clínic, Barcelona, Spain. Research Fellows: Gemma Monte,
Elena Alayrach, Aitor Sainz, and Claudia Caprile, Fundació Clínic, Hospital Clínic, Barcelona, Spain; Gonzalo
Sánchez-Benavides, Behavioral Neurology Group, Institut Municipal d’Investigació Mèdica, Barcelona, Spain. Clinicians,
psychologists, and neuropsychologists: Nina Gramunt (Coordinator), Peter Böhm, Sonia González, Yolanda Buriel, María Quintana,
Sonia Quiñones, Gonzalo Sánchez-Benavides, Rosa M. Manero, Gracia Cucurella, Institut Municipal d’Investigació Mèdica,
Barcelona, Spain; Eva Ruiz, Mónica Serradell, Laura Torner, Hospital Clínic, Barcelona, Spain; Dolors Badenes, Laura Casas,
Noemí Cerulla, Silvia Ramos, Loli Cabello, Hospital Mútua de Terrassa, Terrassa, Spain; Dolores Rodríguez, Clinical
Psychology and Psychobiology Department, University of Santiago de Compostela, Spain; María Payno, Clara Villanueva,
Hospital Clínico San Carlos, Madrid, Spain; Rafael Carles, Judit Jiménez, Martirio Antequera, Hospital Virgen Arrixaca,
Murcia, Spain; Jose Manuel Gata, Pablo Duque, Laura Jiménez, Hospital Virgen Macarena, Sevilla, Spain; Azucena Sanz,
María Dolores Aguilar, Hospital Universitario La Paz, Madrid, Spain; Ana Molano, Maitena Lasa, Hospital de Cruces, Bilbao,
Spain. Data management and biometrics: Josep Maria Sol, Francisco Hernández, Irune Quevedo, Anna Salvà, Verónica
Alfonso, European Biometrics Institute, Barcelona, Spain. Administrative management: Carme Pla (deceased), Romina Ribas,
Department of Psychiatry and Forensic Medicine, Universitat Autònoma de Barcelona, and Behavioral Neurology Group,
Institut Municipal d’Investigació Mèdica, Barcelona, Spain.

References

Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., et al. (2011). The diagnosis of mild cognitive impairment due to Alzheimer’s
disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease.
Alzheimer’s and Dementia, 7 (3), 270–279.
Ardila, A. (2005). Cultural values underlying psychometric cognitive testing. Neuropsychology Review, 15 (4), 185–195.
Attix, D. K., Story, T. J., Chelune, G. J., Ball, J. D., Stutts, M. L., Hart, R. P., et al. (2009). The prediction of change: Normative neuropsychological trajectories. The
Clinical Neuropsychologist, 23 (1), 21–38.
Bläsi, S., Zehnder, A. E., Berres, M., Taylor, K. I., Spiegel, R., & Monsch, A. U. (2009). Norms for change in episodic memory as a prerequisite for the diagnosis of
mild cognitive impairment (MCI). Neuropsychology, 23 (2), 189–200.
Calamia, M., Markon, K., & Tranel, D. (2012). Scoring higher the second time around: Meta-analyses of practice effects in neuropsychological assessment. The
Clinical Neuropsychologist, 26 (4), 543–570.
Chelune, G. J., Naugle, R. I., Lüders, H., Sedlak, J., & Awad, I. A. (1993). Individual change after epilepsy surgery: Practice effects and base-rate information.
Neuropsychology, 7, 41–52.
Cysique, L. A., Franklin, D., Abramson, I., Ellis, R. J., Letendre, S., Collier, A., et al. (2011). Normative data and validation of a regression based summary score for
assessing meaningful neuropsychological change. Journal of Clinical and Experimental Neuropsychology, 33 (5), 505–522.
del Pino, R., Peña, J., Schretlen, D. J., Ibarretxe-Bilbao, N., & Ojeda, N. (2015). Multisite study for norming and standardizing neuropsychological instruments in
healthy people for Spanish population: Methods and characteristics of Normacog project. Revista de Neurología, 61 (2), 57–65.
Dikmen, S. S., Heaton, R. K., Grant, I., & Temkin, N. R. (1999). Test-retest reliability and practice effects of expanded Halstead–Reitan Neuropsychological Test
Battery. Journal of the International Neuropsychological Society, 5 (4), 346–356.
Duff, K. (2012). Evidence-based indicators of neuropsychological change in the individual patient: Relevant concepts and methods. Archives of Clinical
Neuropsychology, 27 (3), 248–261.
Duff, K. (2014). One-week practice effects in older adults: Tools for assessing cognitive change. The Clinical Neuropsychologist, 28 (5), 714–725.
Estevis, E., Basso, M. R., & Combs, D. (2012). Effects of practice on the Wechsler Adult Intelligence Scale-IV across 3- and 6-month intervals. The Clinical
Neuropsychologist, 26 (2), 239–254.
Fletcher-Janzen, E., Strickland, T. L., & Reynolds, C. (2000). Handbook of cross-cultural neuropsychology. New York: Kluwer Academic / Plenum Publishers.
Gavett, B. E., Ashendorf, L., & Gurnani, A. S. (2015). Reliable change on neuropsychological tests in the uniform data set. Journal of the International
Neuropsychological Society, 21 (07), 558–567.
Gross, A. L., Benitez, A., Shih, R., Bangen, K. J., Glymour, M. M. M., Sachs, B., et al. (2015). Predictors of retest effects in a longitudinal study of cognitive
aging in a diverse community-based sample. Journal of the International Neuropsychological Society, 21 (07), 506–518.
Heilbronner, R. L., Sweet, J. J., Attix, D. K., Krull, K. R., Henry, G. K., & Hart, R. P. (2010). Official position of the American Academy of Clinical
Neuropsychology on serial neuropsychological assessments: The utility and challenges of repeat test administrations in clinical and forensic contexts. The
Clinical Neuropsychologist, 24 (8), 1267–1278.
Hinton-Bayre, A. D. (2010). Deriving reliable change statistics from test-retest normative data: Comparison of models and mathematical expressions. Archives of
Clinical Neuropsychology, 25 (3), 244–256.
Iverson, G. (2001). Interpreting change on the WAIS-III/WMS-III in clinical samples. Archives of Clinical Neuropsychology, 16 (2), 183–191.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting
and Clinical Psychology, 59 (1), 12–19.
Jonaitis, E. M., Koscik, R. L., La Rue, A., Johnson, S. C., Hermann, B. P., & Sager, M. A. (2015). Aging, practice effects, and genetic risk in the Wisconsin registry
for Alzheimer’s prevention. The Clinical Neuropsychologist, 29 (4), 426–441.
Levine, A. J., Miller, E. N., Becker, J. T., Selnes, O. A., & Cohen, B. A. (2004). Normative data for determining significance of test-retest differences on eight
common neuropsychological instruments. The Clinical Neuropsychologist, 18 (3), 373–384.
Martin, R., Griffith, H. R., Sawrie, S., Knowlton, R., & Faught, E. (2006). Determining empirically based self-reported cognitive change: Development of reliable
change indices and standardized regression-based change norms for the multiple abilities self-report questionnaire in an epilepsy sample. Epilepsy and
Behavior, 8 (1), 239–245.
McCaffrey, R., Duff, K., & Westervelt, H. J. (2000). Practitioner’s guide to evaluating change with neuropsychological assessment instruments. New York:
Kluwer Academic / Plenum Publishers.
McSweeny, A. J., Naugle, R. I., Chelune, G. J., & Lüders, H. (1993). “T Scores for Change”: An illustration of a regression approach to depicting change in clinical
neuropsychology. Clinical Neuropsychologist, 7 (3), 300–312.
Peña-Casanova, J., Casals-Coll, M., Quintana, M., Sanchez-Benavides, G., Rognoni, T., Calvo, L., et al. (2012). Spanish normative studies in a young adult popu-
lation (NEURONORMA young adults Project): Methods and characteristics of the sample. Neurologia, 27 (5), 253–260.
Peña-Casanova, J., Blesa, R., Aguilar, M., Gramunt-Fombuena, N., Gómez-Ansón, B., Oliva, R., et al. (2009). Spanish Multicenter Normative Studies
(NEURONORMA Project): Methods and sample characteristics. Archives of Clinical Neuropsychology, 24 (4), 307–319.
Peña-Casanova, J., Gramunt-Fombuena, N., Quinones-Ubeda, S., Sanchez-Benavides, G., Aguilar, M., Badenes, D., et al. (2009). Spanish Multicenter Normative
Studies (NEURONORMA Project): Norms for the Rey–Osterrieth complex figure (copy and memory), and free and cued selective reminding test. Archives of
Clinical Neuropsychology, 24 (4), 371–393.
Rinehardt, E., Duff, K., Schoenberg, M., Mattingly, M., Bharucha, K., & Scott, J. (2010). Cognitive change on the repeatable battery of neuropsychological status
(RBANS) in Parkinson’s disease with and without bilateral subthalamic nucleus deep brain stimulation surgery. The Clinical Neuropsychologist, 24 (8), 1339–1354.
Sachs, B. C., Lucas, J. A., Smith, G. E., Ivnik, R. J., Petersen, R. C., Graff-Radford, N. R., et al. (2012). Reliable change on the Boston naming test. Journal of the
International Neuropsychological Society, 18 (2), 375–378.
Salinsky, M. C., Storzbach, D., Dodrill, C. B., & Binder, L. M. (2001). Test-retest bias, reliability, and regression equations for neuropsychological measures
repeated over a 12–16-week period. Journal of the International Neuropsychological Society, 7, 597–605.
Salthouse, T. A. (2011). Effects of age on time-dependent cognitive change. Psychological Science, 22 (5), 682–688.
Sánchez-Benavides, G., Peña-Casanova, J., Casals-Coll, M., Gramunt, N., Molinuevo, J. L., Gómez-Ansón, B., et al. (2014). Cognitive and neuroimaging profiles
in mild cognitive impairment and Alzheimer’s disease: Data from the Spanish Multicenter Normative Studies (NEURONORMA Project). Journal of
Alzheimer’s Disease, 41 (3), 887–901.
Schoenberg, M. R., Rinehardt, E., Duff, K., Mattingly, M., Bharucha, K. J., & Scott, J. G. (2012). Assessing reliable change using the repeatable battery for the
assessment of neuropsychological status (RBANS) for patients with Parkinson’s Disease undergoing deep brain stimulation (DBS) surgery. The Clinical
Neuropsychologist, 26 (2), 255–270.
Sperling, R. A., Aisen, P. S., Beckett, L. A., Bennett, D. A., Craft, S., Fagan, A. M., et al. (2011). Toward defining the preclinical stages of Alzheimer’s disease:
Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s
and Dementia, 7 (3), 280–292.
