
Archives of Clinical Neuropsychology, Vol. 15, No. 5, pp. 425–431, 2000
Copyright © 2000 National Academy of Neuropsychology
Printed in the USA. All rights reserved
0887-6177/00 $–see front matter

PII S0887-6177(99)00034-7

Equivalence of the Color Trails Test and Trail Making Test in Nonnative English-Speakers

Downloaded from by guest on 28 October 2021

Anthony T. Dugbartey

University of Washington

Brenda D. Townes

University of Washington and Bogaziçi University, Istanbul, Turkey

Roderick K. Mahurin

University of Washington and Battelle Memorial Research Institute

The Color Trails Test (CTT) has been described as a culture-fair test of visual attention, graph-
omotor sequencing, and effortful executive processing abilities relative to the Trail Making Test
(TMT). In this study, the equivalence of the TMT and the CTT among a group of 64 bilingual
Turkish university students was examined. No difference in performance on the CTT-1 and
TMT Part A was found, suggesting functionally equivalent performance across both tasks. In
contrast, the statistically significant differences in performance on CTT-2 and TMT Part B, as
well as the interference indices for both tests, were interpreted as providing evidence for task
nonequivalence of the CTT-2 and TMT Part B. Results have implications for both psychometric
test development and clinical cultural neuropsychology. © 2000 National Academy of Neuro-
psychology. Published by Elsevier Science Ltd

Keywords: test equivalence, cultural neuropsychology, cross-cultural, bilinguals, Color Trails Test

The Trail Making Test (TMT) is one of the most frequently used tests in forensic (Lees-
Haley, Smith, Williams, & Dunn, 1996) and clinical (Butler, Retzlaff, & Vanderploeg,
1991) neuropsychological evaluations, and has well-demonstrated efficacy in distinguish-
ing individuals with brain lesions from normal controls (Berg, Franzen, & Wedding,
1987; Eson, Yen, & Bourke, 1978; Reitan, 1958, 1971). Unfortunately, it is plagued by a
number of difficulties including susceptibility to practice effects (Durvasula et al., 1996;
Dye, 1979; Lezak, 1982; Matarazzo, Wiens, Matarazzo, & Goldstein, 1974), lack of spec-
ificity for indicating laterality (Heilbronner, Henry, Buck, Adams, & Fogel, 1991) or
caudality (Anderson, Bigler, & Blatter, 1995; Reitan & Wolfson, 1995; but see Segalowitz,
Unsal, & Dywan, 1992) of cerebral lesions, and sensitivity to length of formal education
(Bornstein & Suga, 1988; Ernst, 1987; Stuss, Stethem, Hugenholtz, & Richard, 1989).

The authors thank Galm Jahic, Hande Eslen, O. Akin, and B. Yuhay for their help in data collection.
Address correspondence to: Anthony T. Dugbartey, Forensic Psychiatric Services Commission, 2840 Nanaimo
Street, Victoria, British Columbia, Canada V8T 4W9.

A number of alternate forms of the TMT (desRosiers & Kavanaugh, 1987; Lewis &
Rennick, 1979) and other modifications (Ricker & Axelrod, 1994; Ricker, Axelrod, &
Houtler, 1996; Trippel, Cutlan, Long, & Alsworth, 1997) have been introduced in an
attempt to improve its scope and diagnostic utility in the evaluation of neurobehavioral
disorders. These variations notwithstanding, and consistent with the views of many
proponents (e.g., Ardila, 1995; Geisinger, 1994; Greenfield, 1997; Report of the
Therapeutics and Technology Assessment Subcommittee of the American Academy of
Neurology, 1996), most of the extant neuropsychological tests, including the TMT, may have
rather limited utility in cross-cultural settings.
Analysis of the TMT is predicated on the assumption that although both Parts A and
B entail the use of overlearned serial stimuli, the added complexity of dual parallel pro-
cessing requirements on Part B uniquely taps ease of switching cognitive set from one
idea to the other. However, the obvious verbal or linguistic component in the alphabetic
sequencing aspect of Part B places illiterate and nonnative English-speakers at a distinct
disadvantage (D’Elia, Satz, Uchiyama, & White, 1996). Although it has been suggested
that acculturation may exert minimal effects on TMT performance (Arnold, Montgom-
ery, Castaneda, & Longoria, 1994), the Color Trails for Children (Williams et al., 1995)
and Color Trails Test (CTT; D’Elia et al., 1996) were developed on the premise that the
TMT has limited utility across cultures. The primary purpose of the present study,
therefore, was to compare the CTT and TMT in a relatively well-educated nonnative
English-speaking sample. The psychometric properties of the CTT time variables are fairly
modest, with 2-week test-retest reliabilities ranging from .64 to .79 and convergent
validity (in comparison to the TMT) ranging from .41 to .49 in healthy individuals
(D’Elia et al., 1996). As regards clinical validity and utility, international multicenter
studies have found Part 2 of the CTT to be the single most sensitive cognitive test among
a selected neuropsychological battery in symptomatic, but not asymptomatic, HIV-1
seropositive individuals (Maj et al., 1994).
Perhaps due in part to the rise in numbers of nonnative English-speaking individuals
presenting for neuropsychological services, global migration rates, and the growing
awareness of cultural value systems as vital moderators in neuropsychological
evaluations, culturally fair neuropsychometric tools are likely to come into higher
demand. It is imperative, therefore, that greater research effort be devoted toward
understanding the sensitivity, utility, and limitations of neuropsychological assessment
devices in culturally diverse settings.


Method

Participants

A total of 64 healthy, full-time students of Bogaziçi University, Istanbul, Turkey (51
undergraduate and 13 graduate) with a mean age of 22.67 years (SD = 1.98; range,
18–29) were included in the sample. The participants (47% male; 86% right-handed) were
native Turkish-speaking students who also spoke English as a second language, and
were among the top 10% of university students in Turkey as determined by their scores
on the national university entrance examination. All participants had earned a minimum
score of 550 on the Test of English as a Foreign Language®. Each individual received
extra course credit for participation in the study. None of the participants had a known
history of neurological or psychiatric illness.

Materials and Procedure

The following tests were administered: Trail Making Test Parts A and B (TMT-A &
TMT-B; Army Individual Test Battery, 1944); Stroop Color Word Test (Golden, 1978);
Symbol Search subtest from the Wechsler Intelligence Scale for Children-III (WISC-III;
Wechsler, 1989); Controlled Oral Word Association Test (COWAT; Benton, 1969); Digit
Symbol subtest from the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler,
1981); and Color Trails Form A, Parts 1 (CTT-1) and 2 (CTT-2) (D’Elia et al., 1996).
All participants were individually administered the neuropsychological test battery
by trained psychometrists using standard instructions and procedures. The language of
assessment was English. The COWAT, however, was administered in Turkish in order
to explore the nature of the relationship between generative naming in the participants’
primary language versus performance on the TMT-B using the Latin alphabet.

Statistical Analysis
Spearman rank order correlations were computed for selected variables on account
of their nonnormal distributions and the highly selective nature of the sample.
Wilcoxon matched-pairs signed-rank tests were performed to determine whether
there were any significant differences between the CTT-1 and TMT-A, and between
CTT-2 and TMT-B. The two-tailed Wilcoxon test was chosen to correct for violations of
the normality assumption given the restricted, positive skew of sample composition from
which the data were derived. In order to reduce the risk of Type I error from the multiple
pairwise comparisons, the alpha level decision rule for statistical significance was set at .01.
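
As a rough illustration of the two nonparametric procedures named above, the following Python sketch computes a Spearman rank-order correlation and a two-tailed Wilcoxon matched-pairs signed-rank test using the large-sample normal approximation. The function names, the dropping of zero differences, the absence of a tie correction in the variance, and the sign convention for Z (based on the smaller rank sum) are assumptions made for illustration; the article does not report which software or options were used. The example times are hypothetical and are not the study's data.

```python
import math

ALPHA = 0.01  # decision rule for significance stated in the text


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def wilcoxon_signed_rank(x, y):
    """Return (W, Z, two-tailed p) for paired samples via the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)  # smaller rank sum, so Z is never positive
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction (assumption)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the standard normal
    return w, z, p


# Hypothetical completion times in seconds (not the study's data).
ctt2 = [30, 28, 35, 40, 25, 33]
tmtb = [28, 31, 30, 33, 26, 27]
w, z, p = wilcoxon_signed_rank(ctt2, tmtb)
significant = p < ALPHA
```

With only six hypothetical pairs the normal approximation is crude; it is more defensible at the study's sample size of 64.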


Results

Table 1 presents the mean scores for each of the neuropsychological tests administered
in this study. There was no significant difference between the mean scores for
CTT-1 and TMT-A (Z = −.83, p = .40). Not surprisingly, the CTT-1 was significantly
correlated with TMT-A (r = .35, p < .01). A significant difference emerged between the
mean scores of the CTT-2 and TMT-B (Z = −4.38, p < .0001), which were moderately
correlated at .45 (p < .01). The interference index (a procedure utilized in the CTT for
partialling out the effects of undivided attention and simple perceptual tracking on the
alternating demands of CTT-2) was calculated for both the CTT and TMT. The Wilcoxon
test revealed a significant difference between the means of the obtained interference
indices (Z = −3.42, p < .001).
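
The article does not spell out the formula for the interference index. The sketch below assumes the ratio form given in the CTT professional manual, (Part 2 time − Part 1 time) / Part 1 time, and applies the same form to TMT Parts A and B. The function name and the example times are hypothetical.

```python
def interference_index(simple_seconds, alternating_seconds):
    """Proportional slowing on the alternating part relative to the simple part.

    Assumed formula: (alternating - simple) / simple.
    """
    if simple_seconds <= 0:
        raise ValueError("simple-part completion time must be positive")
    return (alternating_seconds - simple_seconds) / simple_seconds


# Hypothetical participant (times in seconds, not taken from the study).
ctt_index = interference_index(25.0, 60.0)  # CTT-1 vs. CTT-2
tmt_index = interference_index(25.0, 45.0)  # TMT-A vs. TMT-B
```

Under this definition a value of 0 means no slowing on the alternating part, and a negative value arises when the alternating part is completed faster than the simple part.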
Interestingly, there was a significant difference between the COWAT and TMT-B (Z =
−3.61, p < .001), with essentially negligible variance shared between these tests
(r = .06). The numbers of errors made on both forms of the CTT and TMT were
subjected to the Friedman test, with results indicating no significant difference among
the errors made on these tests (Fr = 5.06, df = 3, p = .17).
A standard simultaneous multiple-regression analysis was computed to evaluate
whether the CTT-2 (treated as the dependent variable) was related to other
time-dependent attention and psychomotor tasks. These latter independent variables included the
Stroop, Digit Symbol, Symbol Search tests, and time to completion on the CTT-1, TMT-A,
and TMT-B. Results indicated a significant relationship, F(8, 55) = 10.50, p < .0001,
with the proposed model accounting for 55% of the variance on CTT-2. Both the CTT-1
(t = 4.12, p < .001) and TMT-A (t = 3.15, p < .01) emerged as significant predictors.
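
As a consistency check on the reported regression, R² can be recovered from the overall F statistic and its degrees of freedom. The sketch below does that arithmetic. The helper names are hypothetical, and reading the reported 55% as an adjusted R² is an inference from the numbers, not something the article states.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """R^2 implied by an overall regression F test: F*k / (F*k + df_resid)."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared_from_f(10.50, 8, 55)     # F and df as reported in the text
adj_r2 = adjusted_r_squared(r2, 64, 8)  # n = 64 participants, 8 predictors
```

With these inputs the unadjusted value comes out near .60 and the adjusted value near .55, so the reported figure is at least compatible with an adjusted R². The eight model degrees of freedom would also fit three Stroop scores plus Digit Symbol, Symbol Search, CTT-1, TMT-A, and TMT-B, although the article does not list the predictors that way.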

Table 1. Means and Standard Deviations for Neuropsychological Test Scores

Test                           M        SD      Range

Trail Making Test
  Part A (seconds)           27.88     9.12    12–50
  Part A (errors)              .13      .38    0–2
  Part B (seconds)           50.03    14.51    21–100
  Part B (errors)              .20      .44    0–2
  Interference Index           .91      .59    −.14–2.75
Color Trails Test
  Part 1 (seconds)           26.89     8.59    10–52
  Part 1 (errors)              .11      .31    0–1
  Part 2 (seconds)           58.30    14.10    30–97
  Part 2 (errors)              .31      .64    0–2
  Interference Index          1.28      .56    .17–2.97
WISC-III Symbol Search       35.91     6.08    21–45
WAIS-R Digit Symbol          69.77    10.72    44–93
COWAT                        41.61    10.32    24–66
Stroop Color Word Test
  Word                      111.94    10.73    82–140
  Color                      70.42    10.50    52–100
  Color-Word                 48.37     9.52    17–76

WISC-III = Wechsler Intelligence Scale for Children-III; WAIS-R = Wechsler Adult
Intelligence Scale-Revised; COWAT = Controlled Oral Word Association Test.


Discussion

The purpose of the present study was to determine the parity of the CTT and the
TMT in a group of high-functioning nonnative English-speaking individuals. At the
outset, a major shortcoming of this investigation that warrants acknowledgement is the fact
that order of CTT and TMT administration was invariant for the entire sample (with the
latter preceding the former). Future studies may direct attention to the role of practice
effects and the extent to which the performance on the TMT exerts a transfer of training
influence upon the CTT, or vice versa.
The present findings support previous contentions (D’Elia et al., 1996) that the TMT-A
and CTT-1 are essentially equivalent tasks. This is not surprising given the fact that,
although differing in total path length (by about 21.9 cm), both measures rely almost
exclusively on overlearned numerical sequencing abilities. Multiple regression analysis of
the time-dependent cognitive tasks revealed that the CTT-1 and TMT-A were the best
predictors of performance on the CTT-2 time to completion, suggesting that they both are
heavily influenced by similar cognitive abilities.
The difference in performance between TMT-B and CTT-2 (when time to comple-
tion and the interference index were analyzed) provides evidence for a robust distinction
in task requirements of these measures. In other words, this observed difference implies
that the CTT-2 may not necessarily be the ‘culture reduced’ equivalent of the TMT-B as
posited by D’Elia et al. (1996), but that they may measure different underlying cognitive
skills. A parsimonious explanation for this difference may lie in the number of visual
stimuli involved. The CTT-2 has almost twice as many stimuli as the TMT-B (i.e., 49
vs. 25 circles), which effectively makes it just as much a measure of simple
visuoperceptual scanning and susceptibility to visual distractors as it may be a measure of
higher order cognitive flexibility. A similar explanation has been invoked to account for
the difference in performance between TMT-A and TMT-B (Gaudino, Geisler, & Squires, 1995).
The TMT-B is often described as culture-loaded because of its reliance on the Latin
alphabet. Our finding of virtually no association between phonemic verbal fluency (as
assessed in the participants’ native language) and TMT-B performance may be due to
the possibility that processing with verbal automatisms in normal, and perhaps, highly
educated individuals may not depend exclusively on the conventional verbal-linguistic
pathways used in propositional speech. Indeed, support for this hypothesis can be found
in the clinical literature wherein some aphasic patients have demonstrated relatively
preserved verbal sequential speech abilities (Blunk, De Bleser, Willmes, & Zeumer,
1981). Also, there is evidence of impaired production of serial automatic speech follow-
ing nondominant right hemisphere lesions (Speedie, Wertman, Ta’ir, & Heilman, 1993).
Clearly, the issue of bilingualism further complicates the picture (see e.g., Breier et al.,
1996). Besides, the possibility of a Stroop effect on CTT-2 performance, as suggested by
Spreen and Strauss (1998), may influence test performance in a way that makes the
CTT-2 a qualitatively different task in comparison to TMT-B. Interestingly, our study
found only a modest association (r = −.29) between TMT-B and the Stroop Color-Word
subtest. Further empirical studies on the relationship between the Stroop interference
effect and CTT-2 are warranted.
Perhaps equally important in determining the psychometric equivalence of the CTT
and TMT is the question of how to ensure whether cultural equivalence has been at-
tained. Simply substituting the obviously verbally mediated alphabetical elements with
colors may not necessarily be sufficient to ensure the universality of a test. Without un-
dertaking a detailed cultural investigation of what an ability test means in a given cul-
ture, it often is difficult to demonstrate its cultural relevance (Boesch, 1996), and hence
performance equivalence vis-à-vis the respective test takers from different cultural back-
grounds. This is not an unattainable task (see e.g., Beach, 1984; Colby, Jessor, &
Shweder, 1996), but an approach that is, regrettably, rarely utilized in contemporary


References

Anderson, C. V., Bigler, E. D., & Blatter, D. D. (1995). Frontal lobe lesions, diffuse damage, and
neuropsychological functioning in traumatic brain-injured patients. Journal of Clinical and
Experimental Neuropsychology, 17, 900–908.
Ardila, A. (1995). Directions of research in cross-cultural neuropsychology. Journal of Clinical and Experi-
mental Neuropsychology, 17, 143–150.
Army Individual Test Battery. (1944). Manual of directions and scoring. Washington, DC: War Department,
Adjutant General’s Office.
Arnold, B. R., Montgomery, G. T., Castaneda, I., & Longoria, R. (1994). Acculturation and performance of
Hispanics on selected Halstead-Reitan neuropsychological tests. Assessment, 1, 239–248.
Beach, K. (1984). The role of external memory cues in learning to become a bartender. Quarterly Newsletter
of the Laboratory of Comparative Human Cognition, 61, 42–43.
Benton, A. L. (1969). Development of a multilingual aphasia battery: Progress and problems. Journal of the
Neurological Sciences, 9, 39–48.
Berg, R. A., Franzen, M. D., & Wedding, D. (1987). Screening for brain impairment. New York: Springer.
Blunk, R., De Bleser, R., Willmes, K., & Zeumer, H. A. (1981). A refined method to relate morphological
and functional aspects of aphasia. European Neurology, 30, 68–79.
Boesch, E. E. (1996). The seven flaws of cross-cultural psychology. The story of a conversion. Mind, Culture,
and Activity, 3, 2–10.
Bornstein, R. A., & Suga, L. J. (1988). Educational level and neuropsychological performance in healthy
elderly subjects. Developmental Neuropsychology, 4, 17–22.
Breier, J. I., Dede, D., Fiano, K., Fennell, E. B., Leach, L., Uthman, B., & Gilmore, R. (1996). Differential
effects of right hemisphere injection during the Wada procedure on the primary and secondary lan-
guages in a bilingual speaker. Neurocase, 2, 341–345.

Butler, M., Retzlaff, P., & Vanderploeg, R. (1991). Neuropsychological test usage. Professional Psychology:
Research and Practice, 22, 510–512.
Colby, A., Jessor, R., & Shweder, R. (Eds.). (1996). Ethnography and human development. Chicago: Univer-
sity of Chicago Press.
D’Elia, L. F., Satz, P., Uchiyama, C. L., & White, T. (1996). Color Trails Test. Professional manual. Odessa,
FL: Psychological Assessment Resources.
desRosiers, G., & Kavanaugh, D. (1987). Cognitive assessment in closed head injury: Stability, validity and
parallel forms for two neuropsychological measures of recovery. International Journal of Clinical Neu-
ropsychology, 9, 162–173.
Durvasula, R. S., Satz, P., Hinkin, C. H., Uchiyama, C., Morgenstern, H., Miller, E. N., Jin, S., Sorenson, D.,
van Gorp, W., Visscher, B. R., & Mitchell, M. (1996). Does practice make perfect? Results of a six-year
longitudinal study with semi-annual testings [Abstract]. Archives of Clinical Neuropsychology, 11, 386.
Dye, O. A. (1979). Effects of practice on Trail Making Test performance. Perceptual and Motor Skills, 48, 206.
Ernst, J. (1987). Neuropsychological problem solving in the elderly. Psychology and Aging, 2, 363–365.
Eson, M. E., Yen, J. K., & Bourke, R. S. (1978). Assessment of recovery from serious head injury. Journal of
Neurology, Neurosurgery, and Psychiatry, 41, 1036–1042.
Gaudino, E. A., Geisler, M. W., & Squires, N. K. (1995). Construct validity in the Trail Making Test: What
makes Part B harder? Journal of Clinical and Experimental Neuropsychology, 17, 529–535.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing
the normative interpretation of assessment instruments. Psychological Assessment, 6, 304–312.
Golden, J. C. (1978). Stroop Color and Word Test. Chicago, IL: Stoelting.
Greenfield, P. M. (1997). You can’t take it with you: Why ability assessments don’t cross cultures. American
Psychologist, 52, 1115–1124.
Heilbronner, R. L., Henry, G. K., Buck, P., Adams, R. L., & Fogel, T. (1991). Lateralized brain damage and
performance on Trail Making Test A and B, Digit Span Forward and Backward, and TPT memory and
location. Archives of Clinical Neuropsychology, 6, 251–258.
Lees-Haley, P. R., Smith, H. H., Williams, C. W., & Dunn, J. T. (1996). Forensic neuropsychological test
usage: An empirical survey. Archives of Clinical Neuropsychology, 11, 45–51.
Lewis, R. F., & Rennick, P. M. (1979). Manual for the Repeatable Cognitive-Perceptual-Motor Battery.
Grosse Pointe Park, MI: Axon.
Lezak, M. D. (1982). The test-retest stability and reliability of some tests commonly used in neuropsychology.
Paper presented at the Fifth European Conference of the International Neuropsychological Society,
Deauville, France.
Maj, M., Satz, P., Janssen, R., Zaudig, M., Starace, F., D’Elia, L. F., Sughondabirom, B., Mussa, M., Naber,
D., Ndetei, D., Schulte, G., & Sartorius, N. (1994). WHO neuropsychiatric AIDS study, cross-sectional
phase II: Neuropsychological and neurological findings. Archives of General Psychiatry, 51, 51–61.
Matarazzo, J. D., Wiens, A. N., Matarazzo, R. G., & Goldstein, S. G. (1974). Psychometric and clinical test-
retest reliability of the Halstead-Reitan Impairment Index in a sample of healthy, young, normal men.
Journal of Nervous and Mental Disease, 158, 37–49.
Reitan, R. M. (1958). The validity of the Trail Making Test as an indicator of organic brain damage. Percep-
tual and Motor Skills, 9, 127–130.
Reitan, R. M. (1971). Trail Making Test results for normal and brain-damaged children. Perceptual and
Motor Skills, 33, 575–581.
Reitan, R. M., & Wolfson, D. (1995). Category Test and Trail Making Test as measures of frontal lobe func-
tions. The Clinical Neuropsychologist, 9, 50–56.
Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neu-
rology. (1996). Assessment: Neuropsychological testing of adults. Considerations for neurologists. Neu-
rology, 47, 592–599.
Ricker, J. H., & Axelrod, B. N. (1994). Analysis of an oral paradigm for the Trail Making Test. Assessment, 1, 47–51.
Ricker, J. H., Axelrod, B. N., & Houtler, B. D. (1996). Clinical validation of the Oral Trail Making Test. Neu-
ropsychiatry, Neuropsychology, and Behavioral Neurology, 9, 50–53.
Segalowitz, S. J., Unsal, A., & Dywan, J. (1992). CNV evidence for the distinctiveness of frontal and poste-
rior neural processes in a traumatic brain-injured population. Journal of Clinical and Experimental Neu-
ropsychology, 14, 545–565.
Speedie, L. J., Wertman, E., Ta’ir, J., & Heilman, K. M. (1993). Disruption of automatic speech following
right basal ganglia lesion. Neurology, 43, 1768–1774.
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests. Administration, norms, and
commentary (2nd ed.). New York: Oxford University Press.
Stuss, D. T., Stethem, L. L., Hugenholtz, H., & Richard, M. T. (1989). Traumatic brain injury. The Clinical
Neuropsychologist, 3, 145–156.

Trippel, A. J., Cutlan, S. L., Long, C. J., & Alsworth, M. (1997). Trails C: Potential of an expanded Trail Mak-
ing Test to detect verbal ability deficiency. Poster presented at the 17th Annual Meeting of the National
Academy of Neuropsychology, Las Vegas, NV.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised manual. San Antonio, TX: The Psychological
Wechsler, D. (1989). Wechsler Intelligence Scale for Children-III manual. New York: The Psychological Cor-
Williams, J., Rickert, V., Hogan, J., Zolten, A. J., Satz, P., D’Elia, L. F., Asarnow, R. F., Zaucha, K., & Light,
R. (1995). Children’s Color Trails. Archives of Clinical Neuropsychology, 10, 211–223.
