
Disability and Rehabilitation

ISSN: 0963-8288 (Print) 1464-5165 (Online) Journal homepage: https://www.tandfonline.com/loi/idre20

Test–retest reliability of the Wisconsin Card Sorting Test in people with schizophrenia

En-Chi Chiu & Shu-Chun Lee

To cite this article: En-Chi Chiu & Shu-Chun Lee (2019): Test–retest reliability of the
Wisconsin Card Sorting Test in people with schizophrenia, Disability and Rehabilitation, DOI:
10.1080/09638288.2019.1647295

To link to this article: https://doi.org/10.1080/09638288.2019.1647295


Published online: 30 Jul 2019.


ORIGINAL ARTICLE

Test–retest reliability of the Wisconsin Card Sorting Test in people with schizophrenia

En-Chi Chiu (a) and Shu-Chun Lee (b, c)

(a) Department of Long-Term Care, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan, ROC; (b) Department of Occupational Therapy, Taipei City Psychiatric Center, Taipei City Hospital, Taipei, Taiwan, ROC; (c) School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, Taiwan, ROC

ABSTRACT

Purpose: The aim of this study was to examine the test–retest reliability of the Wisconsin Card Sorting Test (WCST) in people with schizophrenia. In this study, minimal detectable change (MDC) was calculated and systematic measurement errors were evaluated.
Method: Sixty-three people with schizophrenia underwent the WCST twice with a two-week interval. Test–retest reliability was evaluated using the intraclass correlation coefficient. Systematic measurement error was examined using the paired t-test and effect size (Cohen’s d).
Results: The values of the intraclass correlation coefficient were >0.70, except for two indices (“nonperseverative errors” and “failure to maintain set”, with intraclass correlation coefficients of 0.56 and 0.30, respectively). Seven indices showed nonsignificant differences between the two assessments (t(62) = 0.84 to 1.38, p > 0.05) and negligible effect sizes (d = 0.03–0.13). The values of MDC with 95% certainty were 32.3, 42.0, 31.2, 36.9, 40.1, 3.3, and 3.8 for the “total number correct,” “perseverative responses,” “perseverative errors,” “nonperseverative errors,” “conceptual level responses,” “number of categories completed,” and “failure to maintain set” indices, respectively.
Conclusions: The WCST has acceptable test–retest reliability. Two indices (“nonperseverative errors” and “failure to maintain set”) revealed lower levels of consistency in scores over repeated assessments. Clinicians and researchers should be cautious when using these two indices to interpret the re-assessment results in people with schizophrenia.

ARTICLE HISTORY
Received 4 May 2018
Revised 19 July 2019
Accepted 19 July 2019

KEYWORDS
Executive functions; reproducibility of results; random measurement error; systematic measurement error; minimal detectable change; percentage of minimal detectable change

IMPLICATIONS FOR REHABILITATION

• The Wisconsin Card Sorting Test showed acceptable test–retest reliability in people with schizophrenia.
• Six indices of the Wisconsin Card Sorting Test revealed substantial random measurement errors, so these indices should be used cautiously when interpreting changes in executive functions over repeated assessments.

Introduction

Impaired executive function is one of the prominent cognitive issues in people with schizophrenia. Approximately 68–85% of people with schizophrenia have impaired executive functions [1]. Executive functions indicate higher-level cognitive functions that enable people to form goals, plan how to achieve those goals, and then execute the plans effectively [2]. Based on the Lezak model, people with schizophrenia show impaired executive functions in four patterns: (1) volition: individuals could not set up goals; (2) planning: individuals have difficulty in recognising and organising steps or materials to achieve goals; (3) purposive action: individuals display impairments in maintaining, switching, and stopping sequences of planned movements; (4) effective performance: individuals are unable to detect and correct errors [3].

In people with schizophrenia, the clinical features resulting from impaired executive functions could lead to difficulty in performing daily activities [4]. Independence in daily activities is one of the essential rehabilitation goals in people with schizophrenia [5]. Therefore, improving executive functions to enhance the performance of daily activities is important for people with schizophrenia. Clinicians and researchers need to conduct assessments periodically to develop appropriate treatment plans and track intervention progress.

The Wisconsin Card Sorting Test (WCST) is one of the common measures used to assess executive functions in people with schizophrenia [6]. It comprises 16 indices, namely the “number of trials administered,” “total number correct,” “total number of errors,” “percent errors,” “perseverative responses,” “percent perseverative responses,” “perseverative errors,” “percent perseverative errors,” “nonperseverative errors,” “percent nonperseverative errors,” “conceptual level responses,” “percent conceptual level responses,” “number of categories completed,” “trials to complete first category,” “failure to maintain set,” and “learning to learn” indices [7]. The description of the WCST indices is shown in Supplementary Table S1. According to the factor structure of the WCST used in psychiatric settings, the WCST indices evaluate three factors: ability to shift, problem solving, and response maintenance [8].

CONTACT Shu-Chun Lee, A1057@tpech.gov.tw, Department of Occupational Therapy, Taipei City Psychiatric Center, Taipei City Hospital, No. 309, Songde Rd., Xinyi Dist., Taipei 110, Taiwan, ROC
Supplemental data for this article can be accessed online.
© 2019 Informa UK Limited, trading as Taylor & Francis Group

A cognitive measure has to be consistent during repeated assessments in order to obtain meaningful and explanatory test results. Test–retest reliability is the degree to which a measure is consistent and free from measurement errors over time [9]. Random measurement error (i.e., unpredictable factors influencing scores) is an important aspect to be considered when explaining score changes. Minimal detectable change (MDC) is described as the minimal change beyond random measurement error at certain confidence levels between two assessments [10]. The MDC helps to explain whether or not a score change for an individual is a true change (improvement or deterioration). Systematic measurement error is a predictable error of measurement, such as practice effect. Examinees may improve in test results over repeated assessments, due to the influence of previous experience in taking the same test [11]. For interpreting test–retest results in clinical and research settings, examining test–retest reliability is essential.

Test–retest reliability of the WCST has been examined in other populations (e.g., normal population and sleep apnoea population) [12,13]. A few studies have examined test–retest reliability in people with schizophrenia [14,15]. However, in these studies, the MDC values were not calculated and systematic measurement errors were not investigated. Therefore, the purpose of this study was to examine the test–retest reliability of the WCST in people with schizophrenia. In our study, the MDC values were computed and systematic measurement errors were evaluated for the WCST.

Table 1. Demographics of participants (n = 63).

Characteristic                                Value
Age (years), mean (SD)                        43.9 (9.8)
Gender, n (%)
  Male                                        30 (47.6)
  Female                                      33 (52.4)
Education, n (%)
  Junior high school and lower                11 (17.5)
  Senior high school                          31 (49.2)
  College and higher                          21 (33.3)
Onset age (years), mean (SD)                  22.3 (7.4)
Duration of illness (years), mean (SD)        21.2 (9.3)
Schizophrenia subtypes, n (%)
  Simple type                                 14 (22.2)
  Disorganised type                           6 (9.5)
  Paranoid type                               14 (22.2)
  Residual type                               4 (6.3)
  Schizoaffective disorder                    5 (7.9)
  Undifferentiated type                       20 (31.7)
Type of antipsychotics, n (%)
  First generation                            14 (22.2)
  Second generation                           47 (74.6)
  Third generation                            4 (6.3)
  Taking two types of antipsychotics          7 (11.1)
Antipsychotic loading, mean (SD)              1.1 (0.8)
Mini Mental State Examination, mean (SD)      25.8 (3.6)

SD: standard deviation.

Methods

Participants

A convenience sample of people with schizophrenia was recruited from three community rehabilitation facilities in northern Taiwan from April to June in 2014. The inclusion criteria for selecting participants were (1) diagnosis of schizophrenia based on the Diagnostic and Statistical Manual of Mental Disorders, fifth edition; (2) age > 20 years old; (3) onset time > 2 years; (4) a Mini Mental State Examination score [symbol illegible] 24; and (5) stable psychiatric symptoms (i.e., receiving antipsychotic medication regularly for three months). The exclusion criteria were (1) history of brain injury and (2) diagnoses of substance abuse and intellectual developmental disorder. Table 1 shows the participants’ demographic characteristics.

All participants signed an informed consent form before they were recruited for the study. The study was approved by the Institutional Review Board of a local hospital in the psychiatric centre.

Procedure

All participants were assessed two times by one examiner using the WCST. The second test was conducted two weeks after the first one. The participants were assessed in a quiet room to avoid any interferences that could influence their performance. During the two-week interval between the tests, the participants did not undergo any cognitive treatments. The demographic data of participants were obtained from medical records.

Instrument

The WCST comprises 128 response cards with three types of geometric figures which differ based on form, number, and colour. In the WCST, participants are required to apply sorting principles through trial and error. In this test, participants match stimulus cards to four response cards every time without instructions from the test examiner. Participants must deduce sorting principles according to feedback of correct or wrong matches. As participants discover the rules for making correct matches, they have to follow the sorting principle despite the changing conditions and neglect other unrelated geometric figures. After ten consecutive correct matches (counted as one category), the sorting principle changes without any prior notification. Participants have to be able to adapt to the changing sorting principles [7]. In this study, participants were considered to have finished the WCST only after completing all 128 cards, which means that every participant underwent the same test [8]. We chose seven indices with raw scores (i.e., “total number correct,” “perseverative responses,” “perseverative errors,” “nonperseverative errors,” “conceptual level responses,” “number of categories completed,” and “failure to maintain set” indices) to represent executive functions based on the factor structure of the WCST for people in psychiatric settings [8]. The five percent indices were not chosen because they are linear transformations of their corresponding raw indices. The “total number of errors” was not chosen because it is a linear combination of two indices (“perseverative errors” and “nonperseverative errors”). The rule for discontinuing administration of the WCST is completion of all 128 cards and thus the “number of trials administered” index was not chosen. The “trials to complete the first category” and “learning to learn” indices were not used, because their scores could not be calculated unless every participant completed one and three categories, respectively.

The Mini Mental State Examination was applied to assess the participants’ global cognitive function (i.e., orientation, attention and calculation, registration, recall, and language) [16]. The score of the Mini Mental State Examination ranges from 0 to 30. A higher score demonstrates better global cognitive function. Test–retest reliability of the Mini Mental State Examination has been evaluated in people with schizophrenia [17].

Data analysis

We examined the test–retest reliability using the intraclass correlation coefficient (ICC(3,1)), based on a two-way mixed model with an
absolute agreement type. The test–retest consistency was evaluated as follows: ICC < 0.39, poor reliability; ICC = 0.40–0.59, moderate reliability; ICC = 0.60–0.79, good reliability; and ICC = 0.80–1.00, excellent reliability [18]. We computed the MDC of the WCST to identify any deterioration or improvement in an individual with schizophrenia. The formula for calculating the MDC is shown below [10]:

$\mathrm{SEM} = \mathrm{SD}_1 \times \sqrt{1 - \mathrm{ICC}}$   (1)

$\mathrm{MDC} = z\text{-score} \times \mathrm{SEM} \times \sqrt{2}$   (2)

The z-score is the value from the normal distribution corresponding to the chosen confidence level (e.g., 1.96 for the 95% confidence level used in MDC95). The SEM is the standard error of measurement. The SD1 is the standard deviation for the first assessment. In this study, we calculated MDC values with 95, 90, and 80% confidence levels. We also computed the percentage of MDC (i.e., MDC%) using the MDC95, which was obtained by dividing the MDC95 by the maximum score and multiplying the result by 100. The MDC% can be used to compare the relative quantities of random measurement errors between different measures [19]. An MDC% of less than 30% is considered as an acceptable random measurement error [20].

For determining systematic measurement error, we used the paired t-test (two-tailed, α = 0.05) to assess statistically significant differences between the two assessments. We used the effect size (Cohen’s d) to calculate the size of systematic measurement error. The criteria for assessing systematic measurement error were as follows: 0.20 ≤ d < 0.50, small; 0.50 ≤ d < 0.80, moderate; and d ≥ 0.80, large [21].

Results

A total of 63 participants completed two assessments of the WCST. The mean age of participants was 43.9 years and 47.6% were males (Table 1). The average duration of illness was 21.2 years. About 33.3% of the participants had a college-level education or higher.

For test–retest consistency, the ICC values of three indices (i.e., “total number correct,” “conceptual level responses,” and “number of categories completed”) were >0.80 (Table 2). The ICC value of the “failure to maintain set” index was <0.39. The ICC values for the three remaining indices ranged from 0.56 to 0.72. Table 3 displays the values of the MDC at 95, 90, and 80% confidence levels (i.e., MDC95, MDC90, and MDC80) for the seven indices. The MDC% of the “total number correct” index was <30%. The MDC% of the other indices ranged from 30.2 to 63.7%.

Regarding systematic measurement error, paired t-test for these indices revealed no statistically significant differences between the two assessments (p > 0.05). Cohen’s d of these indices ranged from 0.03 to 0.13 (Table 2).

Table 2. Results of test–retest reliability.

Index                             Mean1 (SD1)   Mean2 (SD2)   ICC (95% CI)   Paired t-test (df)   p Value   Cohen’s d
Total number correct              65.9 (25.9)   67.7 (27.6)   0.80           0.84 (62)            0.403     0.07
Perseverative responses           39.8 (28.6)   36.1 (28.1)   0.72           1.36 (62)            0.178     0.13
Perseverative errors              33.8 (21.0)   31.1 (20.8)   0.71           1.38 (62)            0.174     0.13
Nonperseverative errors           28.3 (20.1)   29.2 (19.9)   0.56           0.39 (62)            0.698     0.05
Conceptual level responses        48.3 (33.4)   47.7 (36.1)   0.81           0.24 (62)            0.810     0.02
Number of categories completed    2.9 (3.0)     2.8 (3.6)     0.84           0.40 (62)            0.688     0.03
Failure to maintain set           1.3 (1.7)     1.2 (1.5)     0.30           0.20 (62)            0.841     0.03

SD: standard deviation; df: degree of freedom; ICC: intraclass correlation coefficient; CI: confidence interval.

Table 3. Minimal detectable change at 95, 90, and 80% confidence levels.

Index                             SEM    MDC95 (MDC%)   MDC90   MDC80
Total number correct              11.6   32.3 (27.4)    24.8    19.4
Perseverative responses           15.2   42.0 (33.3)    35.2    27.4
Perseverative errors              11.2   31.2 (33.2)    26.1    20.4
Nonperseverative errors           13.3   36.9 (45.5)    30.8    24.1
Conceptual level responses        14.5   40.1 (34.0)    33.5    26.2
Number of categories completed    1.2    3.3 (30.2)     2.8     2.2
Failure to maintain set           1.4    3.8 (63.7)     3.2     2.5

SEM: standard error of measurement; MDC: minimal detectable change.

Discussion

In this study for people with schizophrenia, the results revealed that three indices of the WCST have excellent test–retest reliability (i.e., “total number correct”, “conceptual level responses”, and “number of categories completed”). Two indices (“perseverative responses” and “perseverative errors”) showed good test–retest reliability. These five indices have sufficient levels of consistency in scores over two repeated assessments. Two indices (“failure to maintain set” and “nonperseverative errors”) demonstrated poor to moderate test–retest reliability. The “failure to maintain set” index refers to the number of wrong trials which occurred after five or more consecutively correct trials. People with schizophrenia who have cognitive deficits of working memory and attention may have difficulty in correctly choosing the response/answer consecutively [22,23]. The “nonperseverative errors” index was used to calculate the number of wrong trials which do not involve perseverative behaviour. People with schizophrenia may not be able to find and maintain the rules referring simultaneously to three parameters (i.e., form, number, and colour) and may guess the answers, which increases instability over two assessments. With regard to the seven indices examined in this study, the WCST has acceptable test–retest reliability in people with schizophrenia.

For the seven indices of the WCST, no statistically significant differences were noticed and effect sizes (d < 0.20) were negligible, indicating that the WCST has no systematic measurement error (e.g., practice effect) in people with schizophrenia. In a previous study, the WCST showed practice effect in college students [24]. The practice effect can be traced to two sources: test-specific practice and item-specific practice. Test-specific practice is gained when individuals develop test strategies through repeated assessments. Item-specific practice is gained by remembering test items from previous tests [25]. The WCST has no practice effect in people with schizophrenia, but noticeable practice effect in college students. A possible reason is that college students might develop test strategies and remember the test items better than people with schizophrenia and therefore perform better in the second assessment. Further studies are needed to investigate and compare the practice effect of the cognitive measures in people with schizophrenia and college students. In this study, the systematic measurement error (e.g., practice effect) can be ignored in the re-assessment with a two-week interval for people with schizophrenia.
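The SEM and MDC formulas from the Data analysis section (Equations 1 and 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the inputs are the rounded SD1 and ICC values from Table 2, and the z-values for the 90% and 80% levels are the conventional two-sided normal quantiles (the article states only 1.96). Because the inputs are rounded, the results can differ slightly from the values reported in Table 3.

```python
import math

# Two-sided standard-normal quantiles. Only 1.96 is stated in the article;
# the 90% and 80% values are conventional figures assumed for illustration.
Z_BY_LEVEL = {95: 1.96, 90: 1.645, 80: 1.282}


def sem(sd1: float, icc: float) -> float:
    """Equation (1): SEM = SD1 * sqrt(1 - ICC)."""
    return sd1 * math.sqrt(1.0 - icc)


def mdc(sd1: float, icc: float, z: float = Z_BY_LEVEL[95]) -> float:
    """Equation (2): MDC = z-score * SEM * sqrt(2)."""
    return z * sem(sd1, icc) * math.sqrt(2.0)


def mdc_percent(mdc95: float, max_score: float) -> float:
    """MDC%: MDC95 divided by the maximum score, times 100."""
    return mdc95 / max_score * 100.0


def exceeds_mdc(score1: float, score2: float, mdc_value: float) -> bool:
    """True when an individual's change is larger than the MDC,
    i.e. beyond random measurement error at the chosen confidence level."""
    return abs(score2 - score1) > mdc_value


if __name__ == "__main__":
    # Rounded Table 2 values for the "total number correct" index.
    sd1, icc = 25.9, 0.80
    print("SEM:", round(sem(sd1, icc), 1))
    for level, z in Z_BY_LEVEL.items():
        print(f"MDC{level}:", round(mdc(sd1, icc, z), 1))
```

With these rounded inputs the SEM comes out at about 11.6, in line with Table 3. The `max_score` argument of `mdc_percent` is left to the caller because the article does not list the maximum score used for each index.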

Regarding the clinical implications of this study for people with schizophrenia, we provided the MDC values of seven indices in the WCST to interpret score changes in executive functions. For an individual, a score change over two assessments higher than the MDC value can be recognised as a true change (beyond random measurement error) at a certain confidence level. For instance, the MDC95 value (i.e., 95% confidence level) of the “total number correct” was 32.3. An individual with a score change > 32.3 over two assessments demonstrates a true improvement with 95% certainty after treatment. However, six indices showed high MDC% (>30%), which implies that these indices have substantial random measurement errors. One possible reason for the substantial random measurement errors is that participants may have felt frustrated from the negative feedback of incorrect responses, which impacted the test results [26]. We computed the MDC% using the MDC95 (the most stringent of the confidence levels examined). People with schizophrenia may not obtain a true score change at 95% certainty in these indices. In this study, we calculated the MDC values at different confidence levels (i.e., 95, 90, and 80%). The different confidence levels are useful for clinicians and researchers to decide how much confidence they can claim for an individual score change after two assessments.

This study has two limitations that should be noted. First, a convenience sample in northern Taiwan was used, restricting the generalisability of our findings. In future studies, people with schizophrenia could be recruited from different regions in Taiwan or different countries to cross-validate our findings. Second, the minimal important difference for the WCST was not investigated. The minimal important difference is used to determine whether score changes of clients are meaningful to clinical users or clients. Future studies estimating the minimal important difference are warranted to improve the applicability of the WCST in people with schizophrenia.

Conclusions

The results of this study showed that the WCST has acceptable test–retest reliability. However, two indices (i.e., “nonperseverative errors” and “failure to maintain set” indices) showed lower levels of consistency in scores over repeated assessments, which suggests that caution is required when using these two indices to interpret the re-assessment results for people with schizophrenia.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by grants from Taipei City Government.

ORCID

Shu-Chun Lee http://orcid.org/0000-0002-8007-0409

References

[1] Domingo SZ, Julio B, García-Portilla MP, et al. Cognitive performance associated to functional outcomes in stable outpatients with schizophrenia. Schizophr Res Cogn. 2015;2:146–158.
[2] Lezak MD. Neuropsychological assessment. New York (NY): Oxford University Press Inc.; 2012.
[3] Chiu EC, Lee SC, Kuo CJ, et al. Development of a performance-based measure of executive functions in patients with schizophrenia. PLoS One. 2015;10:e0142790.
[4] O’Grada C, Dinan T. Executive function in schizophrenia: what impact do antipsychotics have? Hum Psychopharmacol. 2007;22:397–406.
[5] Morin L, Franck N. Rehabilitation interventions to promote recovery from schizophrenia: a systematic review. Front Psychiatry. 2017;8:100.
[6] Prentice KJ, Gold JM, Buchanan RW. The Wisconsin Card Sorting impairment in schizophrenia is evident in the first four trials. Schizophr Res. 2008;106:81–87.
[7] Heaton RK, Chelune GJ, Talley JL, et al. Wisconsin card sorting test manual revised and expanded. Lutz (FL): Psychological Assessment Resources Inc.; 1993.
[8] Greve KW, Stickle TR, Love JM, et al. Latent structure of the Wisconsin card sorting test: a confirmatory factor analytic study. Arch Clin Neuropsychol. 2005;20:355–364.
[9] Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River (NJ): Pearson Education; 2009.
[10] Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735–743.
[11] Chiu EC, Koh CL, Tsai CY, et al. Practice effects and test-retest reliability of the Five Digit Test in patients with stroke over four serial assessments. Brain Inj. 2014;28:1726–1733.
[12] Paolo AM, Axelrod BN, Tröster AI. Test-retest stability of the Wisconsin card sorting test. Assessment. 1996;3:137–143.
[13] Ingram F, Greve KW, Ingram PT, et al. Temporal stability of the Wisconsin card sorting test in an untreated patient sample. Br J Clin Psychol. 1999;38:209–211.
[14] Seidman LJ, Pepple JR, Faraone SV, et al. Wisconsin card sorting test performance over time in schizophrenia. Preliminary evidence from clinical follow-up and neuroleptic reduction studies. Schizophr Res. 1991;5:233–242.
[15] Harvey PD, Palmer BW, Heaton RK, et al. Stability of cognitive performance in older patients with schizophrenia: an 8-week test-retest study. Am J Psychiatry. 2005;162:110–117.
[16] Guo NW, Liu HC, Wong PF, et al. Chinese version and norms of the Mini-Mental State Examination. J Rehabil Med Assoc. 1988;16:52–59.
[17] de Leon J, Ellis G, Rosen P, et al. The test-retest reliability of the Mini-Mental State Examination in chronic schizophrenic patients. Acta Psychiatr Scand. 1993;88:188–192.
[18] Bushnell CD, Johnston DC, Goldstein LB. Retrospective assessment of initial stroke severity: comparison of the NIH Stroke Scale and the Canadian Neurological Scale. Stroke. 2001;32:656–660.
[19] Chiu EC, Wu WC, Chou CX, et al. Test-retest reliability and minimal detectable change of the Test of Visual Perceptual Skills-Third Edition in patients with stroke. Arch Phys Med Rehabil. 2016;97:1917–1923.
[20] Huang SL, Hsieh CL, Wu RM, et al. Minimal detectable change of the timed “up & go” test and the dynamic gait index in people with Parkinson disease. Phys Ther. 2011;91:114–121.
[21] Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale (NJ): Lawrence Erlbaum Associates; 1988.

[22] Carter JD, Bizzell J, Kim C, et al. Attention deficits in schizophrenia-preliminary evidence of dissociable transient and sustained deficits. Schizophr Res. 2010;122:104–112.
[23] Kraguljac NV, Srivastava A, Lahti AC. Memory deficits in schizophrenia: a selective review of functional magnetic resonance imaging (fMRI) studies. Behav Sci. 2013;3:330–347.
[24] Piper BJ, Mueller ST, Geerken AR, et al. Reliability and validity of neurobehavioral function on the Psychology Experimental Building Language test battery in young adults. PeerJ. 2015;3:e1460.
[25] Silva PHR, Spedo CT, Barreira AA, et al. Symbol Digit Modalities Test adaptation for Magnetic Resonance Imaging environment: a systematic review and meta-analysis. Mult Scler Relat Disord. 2018;20:136–143.
[26] Younan B. Cognitive functioning differences between physically active and sedentary older adults. J Alzheimers Dis Rep. 2018;2:93–101.
