Journal of Clinical and Experimental Neuropsychology
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/ncen20

Published online: 03 Mar 2011.

To cite this article: Lindsey J. Jasinski, David T. R. Berry, Anni L. Shandera, & Jessica A. Clark (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33(3), 300–314.

To link to this article: http://dx.doi.org/10.1080/13803395.2010.516743

JOURNAL OF CLINICAL AND EXPERIMENTAL NEUROPSYCHOLOGY
2011, 33 (3), 300–314

Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review

Lindsey J. Jasinski, David T. R. Berry, Anni L. Shandera, and Jessica A. Clark
University of Kentucky, Lexington, KY, USA

Address correspondence to David T. R. Berry, Department of Psychology, University of Kentucky, 012-D Kastle Hall, Lexington, Kentucky, 40504–0044, USA (E-mail: dtrb85@gmail.com).
© 2011 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business
http://www.psypress.com/jcen  DOI: 10.1080/13803395.2010.516743

Twenty-four studies utilizing the Wechsler Adult Intelligence Scale (WAIS) Digit Span subtest—either the Reliable Digit Span (RDS) or Age-Corrected Scaled Score (DS-ACSS) variant—for malingering detection were meta-analytically reviewed to evaluate their effectiveness in detecting malingered neurocognitive dysfunction. RDS and DS-ACSS effectively discriminated between honest responders and dissimulators, with average weighted effect sizes of 1.34 and 1.08, respectively. No significant differences were found between RDS and DS-ACSS. Similarly, no differences were found between the Digit Span subtest from the WAIS or Wechsler Memory Scale (WMS). Strong specificity and moderate sensitivity were observed, and optimal cutting scores are recommended.

Keywords: Digit span; Malingering; Feigning; Wechsler Adult Intelligence Scale–Third Edition; Wechsler Memory Scale–Third Edition; Wechsler; Meta-analysis.

Neuropsychological assessments provide information vital to diagnosis, prognosis, and treatment following an injury or illness; however, test results may be rendered useless when feigning or suboptimal effort occurs. Thus, it has been strongly recommended that an examination of malingering should be included in every neuropsychological test battery (Bush et al., 2005). According to the Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition, Text Revision (DSM–IV–TR; American Psychiatric Association, 2000), malingering is the "intentional production of false or exaggerated physical or psychological symptoms, motivated by external incentives" (p. 739). Malingering is thought to be a multidimensional construct, with distinctions made between psychiatric, neurocognitive, and somatic feigning (Larrabee, 2007). In general, estimates of the base rate of neurocognitive feigning have demonstrated substantial rates of malingering, particularly in forensic settings. For instance, Frederick, Crosby, and Wynkoop (2000) estimated 44% of criminal defendants had an invalid test-taking approach, while Mittenberg, Patton, Canyock, and Condit (2002) showed rates of malingering around 41% in litigants after a traumatic brain injury (TBI). Thus, it seems that 40% is a reasonable approximation of the base rates of neurocognitive feigning in forensic assessments. In nonforensic contexts, estimates of feigning are lower, around 7.3%, but are significant enough to warrant attention (Berry & Schipper, 2007). These estimates point to the need for well-validated measures of malingering and their routine utilization.

Current malingering assessment practices

Dedicated malingering tests

Tests developed specifically to assess feigning, also known as dedicated malingering tests, have received significant attention in recent years. Several such tests have been well supported in the literature (Grote & Hook, 2007). The most common type of dedicated malingering test utilizes a forced-choice memory paradigm, where patients are shown a stimulus to remember and then are asked to choose the correct response from several options after a brief delay. In this model, feigning is suspected in the presence of either worse-than-chance performance or scores falling below an a priori cutting score. Examples of well-validated forced-choice measures for detecting malingered neurocognitive dysfunction include the Test of Memory Malingering
(Tombaugh, 1996), Word Memory Test (Green, Allen, & Astner, 1997), Letter Memory Test (Inman et al., 1998), Validity Indicator Profile (Frederick & Crosby, 2000), Portland Digit Recognition Test (Binder, 2002), and Victoria Symptom Validity Test (Slick, Hopp, Strauss, & Thompson, 1997). See Vickery, Berry, Inman, Harris, and Orey (2001), and Grote and Hook (2007) for more comprehensive reviews.

While dedicated malingering tests are by far the most widely used in clinical practice and boast the most empirical support, they suffer from some limitations. One of the strongest criticisms of dedicated malingering tests is their vulnerability to coaching, or the provision of information to examinees prior to an evaluation regarding the presence and/or nature of malingering indices. For example, attorneys might discuss specific tests with clients or warn them not to fail tests where they have to choose answers from two options. The impact of attorney coaching has been well documented in the malingering literature to date (Gutheil, 2003; Lees-Haley, 1997; Suhr & Gunstad, 2007; Wetter & Corrigan, 1995). Additionally, increasing amounts of information available via the Internet make self-coaching an easy task when preparing for a neuropsychological evaluation (Bauer & McCaffrey, 2006; Ruiz, Drake, Glass, Marcotte, & van Gorp, 2002). For example, Allen and Green (2001) documented a steady decline in failure rates of the Computerized Assessment of Response Bias (CARB) over a six-year period corresponding closely to increasing Internet availability, and Berry and Schipper (2007) describe a recent search of the Psychological Assessment Resources (PAR, Inc.) website that provided extensive information on the names, acronyms, and purpose of several malingering measures. This, coupled with the finding that nearly 20% of Americans surveyed felt that intentional distortion or misrepresentation within the context of the legal compensation system was acceptable (Public Attitude Monitor, 1992), points to the importance of developing accurate tests for detecting malingering.

In addition to vulnerability to coaching, another concern with dedicated malingering tests is the administration time required. For many well-validated procedures, upwards of 20 minutes of additional time is needed for a single dedicated malingering test. Given Rogers's (1997) recommendation that multiple indicators of feigning be employed, the added testing time could be significant. Further, dedicated tests of malingering provide data relevant only to effort, but no information on other cognitive functions, making them time consuming with very little relevant diagnostic information obtained (Mathias, Greve, Bianchini, Houston, & Crouch, 2002).

Embedded indices

In response to the notable limitations of dedicated malingering tests, interest in indices embedded within standard neuropsychological tests has grown in recent years. Malingering indices developed within large-scale batteries have included subtests from the Wechsler Adult Intelligence Scale–Revised (WAIS–R; Wechsler, 1981) and Third Edition (WAIS–III; Wechsler, 1997a; Digit Span: Trueblood & Schmidt, 1993; Vocabulary–Digit Span, WAIS Discriminant Function: Mittenberg, Theroux-Fichera, Zielinski, & Heilbronner, 1995; Reliable Digit Span, RDS: Greiffenstein, Baker, & Gola, 1994), from the Wechsler Memory Scale–Revised (WMS–R; Wechsler, 1987) and Third Edition (WMS–III; Wechsler, 1997b; Logical Memory recognition: Killgore & DellaPietra, 2000; Discriminant Function: Mittenberg, Patton, & Legler, 2003), and from the Halstead–Reitan Battery (Discriminant Function: Mittenberg, Rotholc, Russell, & Heilbronner, 1996). Additionally, several individual tests have been examined for their ability to detect malingering, including the Wisconsin Card Sorting Test (Bernard, McGrath, & Houston, 1996), Auditory Verbal Learning Test (Binder, Villaneuva, Howieson, & Moore, 1993), California Verbal Learning Test (Millis, Putnam, Adams, & Ricker, 1995), Finger Tapping Test (Arnold et al., 2005), and Rey Complex Figure Test (Meyers & Volbrecht, 1999), among others. See Larrabee (2007) for a more comprehensive review of the malingering research on several of these instruments.

Methodological considerations

Research methodology

Methodological rigor has been a topic of considerable attention with regard to malingering research. While case studies and differential prevalence design studies have been utilized in the past, they are not currently accepted as standard practice for malingering research. In contrast, the most frequently utilized methodologies include simulation and known-groups design studies. Simulation studies involve a group of participants asked to feign symptoms of a disorder while completing a series of tests, who are compared to a group of "normal" participants asked to respond honestly or to perform to the best of their ability on the same test battery. The honest group often has no prior psychiatric or neurological diagnosis. However, Rogers (2008) points out that such simulation designs have only "marginal relevance to malingering because they do not differentiate feigned from genuine disorders" (p. 413). Rather, the stronger version of the simulation design study, the enhanced simulation design, employs a clinically relevant honest comparison group, thus providing direct information on the differences between the feigned and authentic disorder. For example, in traumatic brain injury research, the clinically relevant group would be participants who had sustained a brain injury but were not seeking compensation and had verified honesty (Rogers, 2008).

Simulation studies are often utilized due to their strong internal validity, control over group assignment, and convenience. However, concerns are often raised regarding the generalizability of simulation studies. Several methodological considerations have been recommended to strengthen the external validity of a study. For instance, in addition to the malingering group being clinically relevant or having experienced the condition
of interest (e.g., brain injury), honest and malingering groups should be matched demographically. Further, Rogers (2008) recommends that malingering instructions be clear and comprehensible to the participants, as well as specific. A warning for participants to feign believably and relevant proportional incentives, both positive for following instructions and negative for failing to do so, should be included to increase motivation in the malingering group. Instructions should include information on relevant symptoms of interest, as well as detection strategies and avoidance techniques. Additionally, allowing preparation time for participants after they are given symptom information greatly enhances similarity to real-world evaluations, as does the inclusion of both malingering and nonmalingering tests within the research battery. Finally, a postexperimental check for understanding and cooperation with instructions is necessary to exclude data from subjects who were not compliant. If utilized, each of these methodological considerations can help maintain strong internal validity while improving external validity (Rogers, 2008).

Known-groups methodology, in contrast, is often utilized for its strong external validity, at the expense of some internal validity and study control. Known-groups designs employ a fully clinically relevant malingering group thought to be feigning based on failure of one or more well-validated malingering tests. This "known" malingering group is then compared to a clinically relevant honest group of similar background and injury severity. The generalizability of such results is very high, given that both groups have experienced the necessary antecedent events and motivation to malinger similar to those seen in clinical practice. However, known-groups methodology is limited based upon the quality of the malingering indicators used to determine group status, and imperfect measures result in imperfect groups and subsequently lower internal validity (Rogers, 2008).

Research methodology is an important factor to consider when conducting meta-analytic studies, because differing methodologies greatly impact study outcomes, resultant operating characteristics, and diagnostic classification parameters of any malingering detection tool. Because each has strengths and weaknesses, well-validated malingering measures should have supportive evidence from both types of methodologies, per Rogers's (1997) recommendations. Further, specific methodological characteristics should be coded and explored as confounds and contributing factors to the findings, as is the case in the current meta-analytic review.

Statistical methods

Diagnostic validity refers to a "test's ability to differentiate persons with and without a specified disorder" (Larrabee & Berry, 2007, p. 14), in this case malingering. A cutting score is a score at or below which, based on prior research, an individual is thought to be malingering. Sensitivity and specificity are the most commonly reported statistics in malingering detection studies and provide information on the test's ability to classify malingering. Sensitivity (SN) refers to the probability of a positive test sign (i.e., failure of a malingering test) in individuals who are malingering. Specificity (SP), in contrast, refers to the probability of a negative test sign (i.e., passing a malingering test) in individuals who are responding honestly. The hit rate is the overall efficiency of the test, taking into account both sensitivity and specificity rates. It is important to note that the distributions from which these scores are derived usually overlap, and thus there is no score that perfectly separates malingering and honest groups, resulting in the need to choose the most effective cutting score based on minimization of relevant errors (Greve & Bianchini, 2004).

While sensitivity and specificity rates are the most frequently cited statistics in malingering detection, they do not represent the entire picture. When derived from malingering studies, these statistics are typically based on equal sample sizes with known base rates of malingering approximating 50%. Positive and negative predictive powers, in contrast, take into account the relevant base rate and therefore provide a more pertinent statistic for clinicians. Additionally, they are more relevant for clinicians in that they predict the probability that an individual is either honest or malingering based on test scores. Specifically, positive predictive power (PPP) refers to the probability of malingering given a positive test score, while negative predictive power (NPP) is the probability of the absence of malingering, or honest responding, given a negative test score. However, given that these values vary with the base rate of each malingering study, they are not explored in the present meta-analysis, but can be easily calculated using the relevant sensitivity and specificity values provided subsequently. Finally, an effect size describes a test's ability to differentiate between two groups, with larger effect sizes indicative of larger average differences between the groups. Greater average differences, then, translate into more accurate diagnostic classification statistics. Each of these statistics is examined in further detail in the present study.
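Since predictive powers are deferred to the reader, the Bayes conversion alluded to above can be sketched as follows. This is a minimal illustration; the function name and the example base rate are ours, with SN/SP taken from the meta-analytic means reported later in this paper:

    def predictive_powers(sn, sp, base_rate):
        """Convert sensitivity/specificity to predictive powers at a given base rate.
        SN = P(positive sign | malingering); SP = P(negative sign | honest)."""
        # PPP: P(malingering | positive sign) = true positives / all positives
        ppp = (base_rate * sn) / (base_rate * sn + (1 - base_rate) * (1 - sp))
        # NPP: P(honest | negative sign) = true negatives / all negatives
        npp = ((1 - base_rate) * sp) / ((1 - base_rate) * sp + base_rate * (1 - sn))
        return ppp, npp

    # With the overall means reported below (SN = .614, SP = .863) at the ~40%
    # forensic base rate cited in the introduction:
    ppp, npp = predictive_powers(0.614, 0.863, 0.40)  # ppp ~= .75, npp ~= .77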
Goals of present study

The goal of the present study is to provide a comprehensive, meta-analytic evaluation of the ability of the Digit Span subtest in detecting malingered neurocognitive dysfunction. The Digit Span subtest is embedded within the revised, third, and fourth editions of the Wechsler Adult Intelligence Scale (WAIS–R, WAIS–III, WAIS–IV), as well as the revised, third, and fourth editions of the Wechsler Memory Scale (WMS–R, WMS–III, WMS–IV). It requires an examinee to repeat increasing strings of digits in forward and reverse order, until two consecutive strings of the same length are missed. A variation, Reliable Digit Span (RDS; Greiffenstein et al., 1994), has been developed using the sum of the longest strings of digits forward and backward where both trials were passed. RDS was developed specifically as a malingering detection tool and has been extensively studied.
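To make the RDS scoring rule concrete, the following minimal sketch applies the Greiffenstein et al. (1994) definition; the per-trial data layout (string length mapped to a pair of pass/fail flags) is our assumption for illustration, not part of the original description:

    def reliable_digit_span(forward, backward):
        """RDS = longest forward length with BOTH trials passed
               + longest backward length with BOTH trials passed.
        `forward`/`backward` map string length -> (trial1_passed, trial2_passed)."""
        def longest_both_passed(trials):
            passed = [length for length, (t1, t2) in trials.items() if t1 and t2]
            return max(passed, default=0)
        return longest_both_passed(forward) + longest_both_passed(backward)

    # Example: both 5-digit forward trials passed, both 3-digit backward trials passed
    fwd = {4: (True, True), 5: (True, True), 6: (True, False)}
    bwd = {3: (True, True), 4: (False, True)}
    print(reliable_digit_span(fwd, bwd))  # 5 + 3 = 8, above the ~7 mean cutting
                                          # score reported below (not flagged)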
The Digit Span subtest was chosen as the focus for the present review for several reasons. Primarily, it is embedded within the Wechsler Adult Intelligence Scale (WAIS–R, WAIS–III, WAIS–IV) and Wechsler Memory Scale (WMS–R, WMS–III, WMS–IV), which are among the most frequently utilized neuropsychological instruments in clinical practice today (Camara, Nathan, & Puente, 2000; Rabin, Barr, & Burton, 2005). As such, retrospective malingering analyses can be conducted if a clinician administered either of these instruments and is in need of information regarding malingering status. Such retrospective analyses can also be particularly useful in attaining research goals. Further, it is virtually identical across several revisions of the WAIS and WMS, thereby increasing the number of studies available for the present meta-analysis, subsequently improving the reliability of the findings. Finally, the Digit Span subtest is easily and quickly administered either individually or as part of a battery, so is particularly user friendly in a clinical setting. Thus, the focus of the present study is on a meta-analytic review of the Digit Span subtest and its ability to detect malingering.

METHOD

A priori inclusion criteria

Malingering studies utilizing the Digit Span subtest of the revised or third editions of the WAIS and WMS were surveyed through March 2009. Studies investigating the usefulness of either the total age-corrected scaled score (DS-ACSS) or Reliable Digit Span (RDS) were included. To be considered for inclusion, manuscripts were required to come from a peer-reviewed journal, peer-reviewed article in press, or English-language dissertation. Conference presentations and papers were not included in the present study, as they do not typically undergo as thorough a review process. In cases where dissertation results were later published, only the published paper was included for evaluation.

Publications meeting these criteria were examined and were included if they utilized known-groups or enhanced simulation methodology (e.g., with a clinically relevant honest comparison group). Thus, studies using student or community volunteers instructed to malinger were included provided they were compared to a group with a history of traumatic brain injury or other neurological disorder. Studies were excluded if comparisons were made only with normal controls, as they do not provide information on a test's ability to discriminate feigned from true injury (Rogers, 1997). In the present study, clinical samples primarily included traumatic brain injury and chronic pain patients, although several mixed neurological groups were also included. Chronic pain patients were included because they often feign neurocognitive symptoms in addition to somatic complaints (Iverson & McCracken, 1997). Groups feigning psychiatric symptoms or mental retardation were not included in the present review to reduce the heterogeneity among groups and the possibility of comparing "apples to oranges," a common criticism of meta-analysis (Lipsey & Wilson, 2001). Thus, the primary focus of this paper was on feigned neurocognitive symptoms. The use of the Digit Span test in detecting other types of feigning should be explored in future studies.

Data collection

Keyword-guided searches of the PsycINFO and MedLine databases were undertaken to identify studies of any version of the WAIS or WMS used to detect malingering. Combinations of the keywords "malinger*," "feign*," "dissimulat*," "fak*," and "pattern analys*" were crossed with "WAIS," "WAIS-R," "WAIS-III," "WMS," "WMS-R," and "WMS-III," both in acronym and written form, as well as "Wechsler." The asterisks are used in search databases to allow for variable suffixes (such as -ing, -ed, -s, -ion). Additionally, several of the prominent pattern analysis techniques used in the malingering literature were searched for with either the WAIS or the WMS, including "performance curve," "magnitude of error," "violation of learning principle," and "atypical pattern analysis." Reference sections of review papers, book chapters, test manuals, and empirical studies were scrutinized to identify additional manuscripts. Lastly, email contact with highly published individuals using the WAIS or WMS for malingering detection was attempted in order to solicit any papers in press or under review.

As a result of these keyword-guided searches, 74 papers were identified, 43 of them utilizing a version of the WAIS, 24 reviewing a version of the WMS, and 4 studies using both the WAIS and the WMS. It was necessary to exclude studies on the following basis: lack of a clinically relevant control group (n = 10), lack of any feigning group (n = 5), the use of a correlational or differential prevalence design (n = 6), or because the Digit Span subtest was used to determine malingering status for a known-groups design (n = 2). Further, 21 studies were excluded because they examined subtests of the WAIS or WMS other than Digit Span for malingering detection. Finally, 6 papers were also excluded on the basis of being a conference presentation. A total of 24 papers (WAIS: n = 23; WMS: n = 1) were considered appropriate based on the a priori inclusion criteria and were coded. These studies are listed in the Appendix.

Selection of data for review

In some cases, multiple contrasts were available from a single study. Here, only the independent contrasts were selected because using effect sizes from overlapping groups increases statistical dependence and inflates the mean effect sizes for the study (Lipsey & Wilson, 2001). One source of nonindependence involved comparing a single feigning group with multiple honest groups. For example, a few studies included multiple honest groups of mild and severe TBI patients, all compared to a single malingering group. In this case, the contrast using a group of mild TBI patients was chosen because it was
thought to be most clinically relevant, given that mild TBI patients are most likely to present for neuropsychological testing and have the greatest motivation to exaggerate symptoms, compared to those with a severe TBI where compensation could theoretically be based upon physical findings and notable functional decline. Another source of nonindependence of contrasts was the use of several malingering groups compared to a single honest group. In this situation, the more stringent classification criteria for the malingering group were used (e.g., choosing definite over possible malingered neurocognitive dysfunction, MND, based on the Slick, Sherman, & Iverson, 1999, criteria).

A final source of nonindependent contrasts arose from the overlapping nature of the Digit Span age-corrected scaled score (DS-ACSS) and Reliable Digit Span (RDS) variants, where the same trial scores are used to compute each scale. Four of the studies included provided enough information to calculate effect sizes for both variants. In order to create independent contrasts, two studies were randomly assigned to contribute to the DS-ACSS analyses, while the other two were assigned to the RDS analyses. While this method slightly reduced the number of studies available for each of the analyses, the benefits of independent effects outweighed the desire for additional studies.

Data extraction

Demographic variables such as age, gender, and mean education level were extracted from each of the studies and are presented in the Appendix. Additionally, the type of group was coded according to the following system: traumatic brain injury, mixed neurological, and chronic pain. The type of participants in the malingering group was also coded, including patients, students, or community volunteers. Means and standard deviations for RDS and DS-ACSS were extracted when available, as were sensitivity and specificity, hit rate, and methodological characteristics of each study, specific to the research design type. Methodological characteristics deemed relevant in this study were used to rate the methodological strength of each study and are based on recommendations from prior literature reviews (Berry, Baer, Rinaldo, & Wetter, 2002; Berry & Schipper, 2007). A total methodological strength score for each contrast was calculated and was divided by the number of possible points for each design type (e.g., Sim, enhanced simulation; KG, known groups). This was done because 16 variables were coded for simulation design studies, while known-groups studies had only 7. Thus, a proportion score was used to directly compare the designs. Higher scores represent methodologically stronger studies.

In order to ascertain coding accuracy, 42% (10) of publications were cross-coded by a second rater. Reliabilities were found to be adequate for demographic variables (r = .97), methodological characteristics (r = .92), means and standard deviations (r = 1.0), and diagnostic accuracy statistics (sensitivity r = 1.0, specificity r = 1.0). Discrepancies were discussed until 100% consensus was reached.

RESULTS

A total of 24 studies were selected and were coded according to the inclusion criteria. Of these, two studies did not provide enough information (i.e., standard deviations) to calculate an effect size, while an additional two studies provided enough information to compute two independent effect sizes. In other words, each of these two studies included at least two feigning and two honest groups for independent comparison. Thus, the analysis of effect sizes included 24 independent contrasts. Further, of the 24 studies examined, only 19 studies provided information on the relevant diagnostic accuracy statistics. See the Appendix for a summary of each study, including effect sizes and diagnostic accuracy statistics.

Calculation of effect sizes was completed utilizing the Hedges's d statistic from the meta-analytic software program DSTAT (Johnson, 1989). Further analyses were conducted using the Statistical Package for the Social Sciences (SPSS) software. The Hedges's d value represents the distance between two sample means in terms of pooled standard deviation units. Although the alternative effect size Cohen's d is often used, it has been shown to be biased with small sample sizes, such as those often utilized in neuropsychology research (Hedges & Olkin, 1985). Hedges's d is a standardized metric thought to be an unbiased estimate of effect size by correcting for sample size with the inverse variance, thus giving more weight to studies with larger sample sizes (Lipsey & Wilson, 2001). Thus, in the text below, all d values are Hedges's d.
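As a concrete reference for the computations just described, here is a minimal sketch of Hedges's small-sample correction and of inverse-variance weighting (formulas per Hedges & Olkin, 1985, and Lipsey & Wilson, 2001; the function names are ours, and this is an illustration rather than the DSTAT routine itself):

    import math

    def hedges_d(m1, m2, sd1, sd2, n1, n2):
        """Standardized mean difference with Hedges's small-sample bias correction."""
        pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
        g = (m1 - m2) / pooled_sd                 # uncorrected standardized difference
        return g * (1 - 3 / (4 * (n1 + n2) - 9))  # small-sample correction factor

    def weighted_mean_d(ds, variances):
        """Inverse-variance weighted mean: larger studies (smaller sampling
        variance) receive proportionally more weight."""
        weights = [1 / v for v in variances]
        return sum(w * d for w, d in zip(weights, ds)) / sum(weights)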
The next step in data analysis was screening the effect sizes for outliers, or unusually high or low values that might skew subsequent analyses. An important consideration when conducting outlier analysis is the balance between retaining all possible comparisons in order to investigate any methodologically interesting determinants of variability in the dataset, and excluding extreme values unlikely to be representative of observations within the population of interest. In the present study, adequate variability was seen in the effect sizes, so the desire for a representative set of findings from the population was a priority, and outliers were excluded. Further, such extreme effect sizes have disproportionate influence on meta-analytical statistics (e.g., means, variances) and may distort them (Lipsey & Wilson, 2001). Thus, a step-and-fences procedure (Tukey, 1977) was completed for the set of 24 independent contrasts. This procedure uses the median d value to determine the likelihood of a value belonging to the same distribution. Values outside of the inner fence, located approximately 2.67 standard deviations away from the median d value, are considered outliers. Application of this process resulted in two outliers being identified from the set of independent contrasts (d values: 2.97 and 3.46), both of which contributed to a positively skewed distribution. It is noteworthy that one outlier was from a study with a very small sample size (n = 7 for the feigning group) while the other
came from the only study utilizing inmates as part of the feigning group. Removal of these outliers brought greater normality to the distribution.
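A minimal sketch of a median-centered fence rule of the kind described follows; the 2.67 multiplier comes from the text above, but the authors' exact implementation of the Tukey (1977) procedure is not spelled out, so treat this as illustrative:

    import statistics

    def fence_outliers(ds, k=2.67):
        """Flag effect sizes outside the inner fences, here taken as the
        median +/- k standard deviations (k = 2.67 per the description above)."""
        med = statistics.median(ds)
        sd = statistics.stdev(ds)
        return [d for d in ds if abs(d - med) > k * sd]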
Following outlier removal, 22 independent contrasts using either the DS-ACSS (n = 12) or RDS (n = 10) variant remained for analysis. The distribution of the sample was roughly normal. The d values for the independent contrasts ranged from 0.92 to 2.46, with a weighted mean effect size of 1.25 (95% confidence interval, CI = 1.13–1.37) and a median of 1.23. Overall, these mean values indicate that the two variants of the Digit Span test separate malingering and honest groups by just over one pooled standard deviation, a significant effect size that is considered large (Cohen, 1988). Thus, honest participants had higher scores on the DS-ACSS and RDS subtests than did those thought to be malingering, indicating that malingerers suppressed their scores beyond what would be observed in a clinical population responding honestly.

Representativeness of study findings

While these effect sizes are substantial, several factors should be considered before having confidence in the results of a meta-analysis, including the possibility that the findings are not characteristic of all studies conducted in an area. Specifically, meta-analysis has often been challenged on the basis of the "file drawer problem," whereby only statistically significant results are accepted for publication in peer-reviewed journals, leaving open the possibility of small effects or negative findings going unpublished. The result would be effect sizes that overestimate the actual effect in the population. In response to such criticism, Rosenthal (1991) developed the fail-safe N statistic, using it to estimate the number of studies with a null effect that would have to exist in researchers' "file drawers" in order to reduce the observed mean effect size from the sample studies to a nonsignificant level. Using this procedure, the fail-safe N based on the set of 22 independent contrasts in this study resulted in an estimated 244 unpublished null findings necessary to render the overall effect nonsignificant (p > .05). Based on this analysis, it appears highly improbable that this number of studies exists, thereby increasing confidence in the findings of this study as representative of the population at large.
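One common form of Rosenthal's (1991) computation combines per-study Z values by Stouffer's method and solves for the number of null results that would drag the combined test below the one-tailed .05 criterion. A minimal sketch, illustrative rather than necessarily the exact routine used here:

    def fail_safe_n(z_values, z_crit=1.645):
        """File-drawer estimate: adding N zero-effect studies changes the combined
        Stouffer Z to sum(z) / sqrt(k + N); solving for significance gives N."""
        k = len(z_values)
        return (sum(z_values) / z_crit) ** 2 - k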
An additional consideration is the extent to which methodological rigor influenced the overall effects derived from the studies. Intuitively, one would assume that studies with increased methodological rigor would result in less variable, and thus more reliable, effect sizes. In order to ascertain the relationship between methodological quality and the effect sizes, a correlation coefficient was calculated using the methodological strength score assigned to each study and resulted in r = −.24 (p = .28). This indicates that, overall, methodological quality was not significantly associated with the magnitude of effect size.

Thus, based on the sizable mean effect sizes, the large value of the fail-safe N, and the lack of a significant relationship between methodological rigor and effect size, the results of this study can be interpreted as representative of the population of studies in the area of malingering and the Digit Span test, including both DS-ACSS and RDS. This subtest is able to differentiate honest and malingering groups by more than one pooled standard deviation on average. However, what is not yet known is the extent of variability in the effect sizes and the diagnostic accuracy of DS-ACSS and RDS. These issues will be explored subsequently.

Variability of effect sizes

When summarizing data across studies, a further consideration is the homogeneity of the effect sizes derived, with significant heterogeneity suggesting that a single mean effect size should not be used to represent an overall effect. In other words, the effect sizes may differ from the population mean by more than just sampling error, rather by systematic differences that can be further examined. The Q statistic is commonly used in meta-analysis, with a significant result indicating heterogeneity. It represents an asymptotic chi-square distribution with k − 1 degrees of freedom (Hedges & Olkin, 1985). Results from the 22 independent contrasts indicate that a significant amount of variability in the effect sizes was not observed, Q(21) = 30.58, p < .08, although the results approached significance. Hence, the overall mean effect size does appear to adequately represent the overall ability of the Digit Span test to detect malingering.
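For reference, a minimal sketch of the Q computation (Hedges & Olkin, 1985), reusing the inverse-variance weights shown earlier; Q is referred to a chi-square distribution with k − 1 degrees of freedom:

    def q_statistic(ds, variances):
        """Homogeneity Q: weighted squared deviations of the study effects
        around their inverse-variance weighted mean."""
        weights = [1 / v for v in variances]
        d_bar = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
        return sum(w * (d - d_bar) ** 2 for w, d in zip(weights, ds))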
Within- and between-measure differences

Although the overall effect size showed no statistically significant heterogeneity, theoretically derived moderators were explored to examine their potential impact on the data, as the idea of predetermined focused contrasts is potentially unrelated to overall variability and is therefore useful to examine (Rosenthal, 1991). Further, since the Q statistic noted above approached significance, moderators may be useful in explaining the observed variability, despite the lack of formal statistical significance. This exploration included the variant utilized (RDS or DS-ACSS) and the test battery from which Digit Span scores were derived (versions of WAIS or WMS), in addition to other methodological moderators discussed below.

A major source of hypothesized variability is between-variant differences, given that the Digit Span test can be used in the form of DS-ACSS or RDS. To examine this, separate effect sizes were calculated for each variant after outlier extraction, and the results are presented in Table 1. Overall, both RDS and DS-ACSS showed robust effect sizes, with RDS having a slightly larger effect size (d = 1.34, 95% CI = 1.18–1.50) than DS-ACSS (d = 1.08, 95% CI = 1.01–1.50). Further analysis of these effect sizes showed within-measure homogeneity tests to be significant for RDS (Q = 15.89, p < .05), indicating that the overall effect size for RDS is not able to describe the dataset accurately, and further moderator analyses could help explain the variability seen within the
RDS data. Homogeneity tests were not significant, however, for the DS-ACSS data (Q = 12.25, p > .05).

TABLE 1
Effect size data by subscale

              d      95% CI       k
RDS           1.34   1.18–1.50   10
DS-ACSS       1.08   1.01–1.50   12
p value       >.05

Note. Data presented are from independent contrasts, outlier-parsed dataset. RDS = reliable digit span, DS-ACSS = digit span age-corrected scaled score, d = weighted mean corrected Hedges's g effect size, 95% CI = 95% confidence interval around the weighted mean effect size, k = number of studies included in analysis.

In addition to examining the homogeneity of each variant, the QB statistic was used to test for significant differences between the ability of the RDS and DS-ACSS to separate honest and feigning groups. The QB statistic is analogous to an F test and uses the Scheffé method to protect against Type I error. In essence, this analog to analysis of variance (ANOVA) partitions the overall Q statistic for the study into the part that can be explained by sampling error within the two groups (QW; for RDS and DS-ACSS) and the portion that can be explained by the independent variable (QB; Lipsey & Wilson, 2001). Essentially, this is the within- and between-groups variance used in ANOVA, here applied to the differences between the RDS and DS-ACSS. Using this method, no significant differences were found in the ability of the RDS or DS-ACSS to separate honest and feigning groups (QB = 2.44, p > .05). In general, it appears that both RDS and DS-ACSS show strength in malingering detection, and either could be used with confidence.
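The partition described above can be sketched by subtraction, building on the q_statistic function shown earlier (Lipsey & Wilson, 2001); the Scheffé-style protection mentioned in the text is omitted here for brevity:

    def q_between(groups):
        """QB = Q_total - Q_within, where `groups` is a list of (ds, variances)
        pairs, one per moderator level (e.g., RDS studies vs. DS-ACSS studies).
        QB is referred to chi-square with (number of groups - 1) df."""
        all_ds = [d for ds, vs in groups for d in ds]
        all_vs = [v for ds, vs in groups for v in vs]
        q_within = sum(q_statistic(ds, vs) for ds, vs in groups)
        return q_statistic(all_ds, all_vs) - q_within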
Another potential moderator was the test battery in which Digit Span was administered. Table 2 shows the results from this analysis, and as can be seen, nearly all versions of the WAIS and WMS resulted in strong effect sizes, with the lowest seen in the WMS–R. However, it is important to note that only one study was available for both the WMS–III and WMS–R; thus, these data should be interpreted with caution.

TABLE 2
Effect size data by test version

              d      95% CI       k
WAIS–III      1.28   1.06–1.50    7
WAIS–R        1.26   1.10–1.42   10
WMS–III       1.27   0.74–1.80    1
WMS–R         1.02   0.55–1.49    1
Mixed         1.21   0.92–1.50    3
p value       >.05

Note. Data presented are from independent contrasts, outlier-parsed dataset. WAIS–III = Wechsler Adult Intelligence Scale–Third Edition, WAIS–R = Wechsler Adult Intelligence Scale–Revised edition, WMS–III = Wechsler Memory Scale–Third Edition, WMS–R = Wechsler Memory Scale–Revised edition, Mixed = combination of WAIS and WMS Digit Span subtests, d = weighted mean corrected Hedges's g effect size, 95% CI = 95% confidence interval around the weighted mean effect size, k = number of studies included in analysis.

When examining tests of homogeneity across each test version, results were found to be homogeneous for the WAIS–III (Q = 6.11, p > .05) and mixed WAIS/WMS studies (Q = 4.95, p > .05). In contrast, significant heterogeneity was found for the WAIS–R (Q = 18.48, p < .05), again suggesting that the overall effect size is not an accurate descriptor, and further moderator analyses would be helpful. It is important to note that the other scales may not have shown significant variability in part because of small sample size. Homogeneity testing was not completed for the WMS–III or WMS–R given that only one effect size was available from these studies. Again, the QB statistic was used, and no significant differences were found in the ability of the WAIS test versions or WMS test versions in differentiating malingering and honest groups (QB = 1.03, p > .05). This finding of similar effect sizes across test versions is crucial given the new revisions of both tests that have been recently developed. The results of this study suggest that the Digit Span subtest could be used effectively as a malingering detection tool regardless of the test version administered and likely can be used in newer or future versions of the test. Further, retrospective data analysis that relies on older test versions can be conducted with confidence.

Methodological moderators

Overall, very little variability was found in the effect sizes across both the type of variant and test version. However, as mentioned previously, additional theoretically derived methodological moderators may be an interesting and important area of consideration. Further, the heterogeneity found in the RDS effect sizes, as well as those derived from the WAIS–R, warrants further analyses to explain variability that extends beyond sampling error. A major hypothesized source of additional variability may be the methodological design with which the study was conducted, given differences in reliability and validity found between these methods (Rogers, 1997). Further, the exploration of simulation versus known-groups design has been a topic of interest in much of the literature, and examination of these variables may shed light onto the quality and applicability of each design.

As such, a focused contrast of the effect of study design on the effect sizes was conducted to see whether the real-world setting, and hence increased external validity, of those in known-groups studies (n = 15) moderated the effect size outcome relative to a simulation or laboratory design (n = 7). Results of these and other methodological moderator analyses appear in Table 3. Again using the QB statistic, results showed no significant differences between study design types (QB = 2.17, p > .05). Interestingly, this suggests that results from both simulation (d = 1.27, 95% CI = 1.05–1.23) and known-groups studies (d = 1.29, 95% CI = 1.22–1.36) provided similar effect sizes, and that both can be considered valuable sources of information on malingering and the Digit Span test.

Turning to other methodological moderators, those hypothesized in advance from enhanced simulation studies included the presence of incentive offered for effort,
preparation time, warning participants to feign believably, and the use of coaching, while those from known-groups studies included having malingering status based on objective criteria. Although each of these moderators contributed to the methodological strength score discussed previously, each was hypothesized as particularly important and warrants individual exploration as a possible contributing moderator. Interestingly, the effect of preparation time and coaching on malingering indices or symptoms could not be examined due to the lack of any enhanced simulation design studies utilizing those methodological characteristics.

TABLE 3
Results of categorical methodological effect size moderator analyses

Manipulation          Condition    k   Mean d value      QB
Study design          KG          14   1.29            1.04
                      Sim          8   1.17
Financial incentive   Yes          3   1.34           19.58***
                      No           5   1.08
Warning believable    Yes          6   1.16           19.71***
                      No           2   1.41
KG obj. criteria      Yes         11   1.22           15.91***
                      No           3   1.51

Note. Analyses conducted with outlier-parsed dataset. k = number of studies. KG = known-groups design, Sim = simulation design.
***p significant at <.001.

The next methodological moderator that was examined was the presence of financial incentive in enhanced simulation design studies. Because the incentive was not consistently awarded to all participants (e.g., use of a drawing) and due to the very small number of studies utilizing simulation design with financial incentive (n = 3), this variable was coded as either being present or not. Results of this analysis are presented in Table 3 and indicate that the presence of financial incentive resulted in larger effect sizes than when no such incentive was used (QB = 19.58, p < .001). Given that the RDS variant and WAIS–R battery showed significant heterogeneity, a specific contrast examining the moderating effect of incentive was undertaken for those groups. For the WAIS–R, the presence of financial incentive resulted in larger effect sizes when using an incentive (QB = 9.00, p < .01). For RDS, the analysis could not be completed due to having only one study using the RDS and financial incentive and one study without incentive. Intuitively, one would assume that increased effort via the use of an incentive would result in less blatant malingering and thus smaller effect sizes. However, in this case incentives resulted in more obvious feigning.

Turning to the moderating effect of warning participants to feign believably, results in Table 3 showed that the presence of a warning significantly moderated effect size outcome (QB = 19.71, p < .001). The use of a warning significantly reduced the effect size. The results should be interpreted with caution given the very small sample of those not provided a warning (n = 2); however, this may indicate that the use of a warning results in less blatant malingering. Again turning to the RDS and WAIS–R, results for the WAIS–R were similar, with smaller effect sizes and less blatant malingering observed in studies utilizing a warning (QB = 8.07, p < .01). Analysis for the RDS could not be completed because all RDS studies used a warning.

Finally, turning to the known-groups design studies, a hypothesized moderator was whether the malingering classification was based on objective criteria (e.g., well-validated dedicated malingering tests). Results in Table 3 show that the presence of objective criteria significantly moderated the effect size outcome (QB = 15.91, p < .001). As might be expected, those studies employing objective criteria for establishing the known malingering groups had smaller effect sizes. However, this finding may be an artifact of having very few studies that did not employ objective criteria and thus should be interpreted with caution. When examining the RDS and WAIS–R, the use of objective criteria significantly moderated the effect for both groups (RDS, QB = 4.49, p < .05; WAIS–R, QB = 16.82, p < .001), with smaller effect sizes seen in studies utilizing this methodological quality.

Diagnostic accuracy

In addition to examining the ability of the Digit Span subtest in separating groups based on the effect size, another important consideration for clinical practice is the diagnostic accuracy of a test. As such, the sensitivity (proportion of feigners correctly identified by a test) and specificity (proportion of honest individuals correctly identified as such by a test) were examined across studies. As noted previously, a few studies provided information relevant for effect size coding, but sensitivity and specificity values were not available, and vice versa. Table 4 provides the mean diagnostic accuracy statistics for the 19 independent contrasts from 18 studies (1 study used separate malingering and honest groups for cross-validation and thus allowed for 2 independent contrasts). For the overall sample of contrasts, mean sensitivity values were moderate (M = 61.4%, SD = 20.5) while mean specificity
values were moderately high (M = 86.3%, SD = 10.9). This resulted in an overall hit rate of 76.3% (SD = 7.0).

TABLE 4
Operating characteristics for overall sample and by subscale

               Cut score M    k   Sensitivity (%)   Specificity (%)   Hit rate (%)
Total sample       6.9       19     61.4 (20.5)       86.3 (10.9)      76.3 (7.0)
RDS                7.1        9     63.3 (17.6)       86.1 (11.0)      75.7 (6.6)
DS-ACSS            6.8       10     59.7 (23.7)       86.5 (11.5)      76.8 (7.6)
p value                             .712              .941             .734

Note. Analyses conducted on independent contrasts, outlier-parsed dataset. k = number of studies. RDS = reliable digit span. DS-ACSS = digit span age-corrected scaled score. Standard deviations in parentheses.

A closer examination of the diagnostic accuracy separately for RDS and DS-ACSS showed a mean sensitivity of 63.3% (SD = 17.6) for RDS (N = 9), while specificity was 86.1% (SD = 11.0). The overall hit rate for RDS was high average (M = 75.7%, SD = 6.6) at a mean cutting score of 7.1 (SD = 0.3). Similarly, the mean sensitivity for the DS-ACSS (N = 10) was adequate (M = 59.7%, SD = 23.7), with high average specificity (M = 86.5%, SD = 11.5) and hit rate (M = 76.8%, SD = 7.6) at a mean cutting score of 6.8 (SD = 1.9). Overall, there was no significant difference between the measures on sensitivity (t = 0.376, p = .712), specificity (t = −0.075, p = .941), or hit rate (t = −0.345, p = .734). These findings again suggest that either variant can be used effectively in classifying malingering. However, it is noteworthy that the cutting score for RDS was much less variable, as indicated by a small standard deviation, suggesting more consistency across studies in recommended cutting scores.

DISCUSSION

The purpose of the present study was to provide a comprehensive meta-analytic review of the ability of the Digit Span subtest of the Wechsler Adult Intelligence Scale (WAIS) and Wechsler Memory Scale (WMS) in detecting malingering, particularly in forensic settings where simply assuming honesty is not acceptable. More specifically, the ability of the Reliable Digit Span (RDS) variant (Greiffenstein et al., 1994) and the Digit Span age-corrected scaled score variant (DS-ACSS) in separating and accurately classifying malingering and honest groups was examined across all studies in the area through March 2009. The WAIS and WMS are two of the most frequently utilized neuropsychological tests, and the examination of embedded malingering indices within these measures has the potential to allow clinicians to verify honesty in patients without adding unnecessary testing time to a battery. Further, retrospective analyses of effort can be conducted when no malingering tests were given, and embedded indices on standard neuropsychological tests have been hypothesized to be less susceptible to coaching (Mathias et al., 2002).

Summary of results: Effect size analysis

Overall, results of the present study indicate that the Digit Span test is sensitive to malingering and on average differentiates malingering and honest groups by more than one pooled standard deviation (d = 1.25, 95% CI = 1.13–1.37). Similarly, both the RDS and DS-ACSS variants produced large effect sizes (RDS: d = 1.34, 95% CI = 1.18–1.50; DS-ACSS: d = 1.08, 95% CI = 1.01–1.50). These effect sizes are similar to the means reported in a recent meta-analysis of several dedicated malingering tests, suggesting that the Digit Span may be a viable alternative to those more time-consuming tests (Vickery et al., 2001). A file drawer analysis suggested that 244 null studies would need to exist to render the effect insignificant, an unlikely prospect. Homogeneity testing revealed consistency among the effect sizes and variability consistent with what is expected given sampling error. Results such as these can be considered positive, with the observed consistency in the effect sizes resulting in increased confidence in the use of the scale, particularly in forensic settings. These results aid the Digit Span test in meeting Daubert standards of admissibility in court (Daubert v. Merrell Dow Pharmaceuticals, 1993).

Further, the effect sizes derived from studies on the Digit Span subtest were found to be similar in magnitude across the variants (RDS vs. DS-ACSS) and test batteries (WAIS vs. WMS) utilized. Overall, although the effect sizes for the RDS and WAIS–R showed significant variability, there were no significant differences between the two variants or between any of the test versions in their ability to separate malingering and honest groups. Effect sizes from both variants and all test versions were large. These results are promising for clinicians who may choose to use either the WAIS or the WMS, depending on the referral question, indicating that the Digit Span subtest from either battery could be used interchangeably to adequately detect malingering. Further, should a legal issue arise after an evaluation has been completed, a statement regarding the presence or absence of malingering can be made using the Digit Span results in conjunction with other malingering indices, given the high likelihood that either a WAIS or a WMS was administered during the original testing. For researchers, retrospective data analyses are possible, and results from various test versions can be combined to produce larger sample sizes and hence
DIGIT SPAN AND MALINGERING DETECTION 309

more reliable outcomes. Further, with the recent release of the newest versions, the WAIS–IV (Wechsler, 2008) and WMS–IV (Wechsler, 2009), Digit Span research and its application to malingering detection remain supported, and the present results should generalize to the nearly identical forms of the subtest in these new test versions.

Moreover, the impact of study methodology on the effect sizes was explored, again with promising results. For instance, homogeneity was found among effect sizes derived from both enhanced simulation and known-groups studies, implying that both methodologies are justifiable approaches to determining a test's ability to classify malingering. This is consistent with Rogers's (1997) recommendation that both designs are vital to test validation, although it is in contrast with the theoretical notion that known-groups designs are typically the stronger methodology. It is also in contrast with Rogers and Cavanaugh's (1983) criticism of simulation studies, specifically that they involve generalizing results from individuals asked to comply with instructions to fake, to patients who fake when asked to comply. The findings here are promising for research using enhanced simulation studies conducted with the recommended methodological controls (Rogers, 2008), indicating that they are ecologically valid and may provide a reliable estimate of a test's ability to detect malingering in clinical settings.

Additional moderators were hypothesized and explored, as they can provide some insight into methodological characteristics that may be important despite overall consistency in the derived effect sizes. Further, significant heterogeneity was found for the subset of RDS effect sizes as well as for those derived from the WAIS–R, suggesting that moderator variables could aid in explaining such variability. Hypothesized moderators included the use of incentives for effort, the use of a warning to feign believably in enhanced simulation studies, and the use of objective criteria in classifying feigning groups within the known-groups methodology. Each significantly moderated the overall effect sizes, as well as the RDS and WAIS–R effect sizes where they could be analyzed, indicating that such methodological controls should be routinely applied in malingering studies. The use of coaching on symptoms or feigning indices, as well as the effect of preparation time, could not be analyzed due to a lack of relevant studies.

With regard to the use of incentives, the present results are limited in that the use of incentives was coded as a dichotomy, due to inconsistency in how the incentives were awarded and the small number of studies included in these analyses (n = 3). In these studies, the incentive ranged from $20 per study participant to a $100 drawing for a single participant. The moderator analyses showed that effect size increased for those offered an incentive, and this pattern held when the amount of the incentive was examined informally. The study providing the largest incentive (Iverson & Franzen, 1996) also showed the largest effect size (d = 2.46). However, intuition suggests that a larger incentive should be associated with a more ecologically valid study and less blatant malingering, thus reducing the differences between the feigning and honest groups. Prior research in this area has been mixed, with Orey, Cragar, and Berry (2000) finding no effect of incentive, while Rogers and Cruise (1998) reported a main effect of incentive on an inventory for psychiatric malingering. It may be the case that incentive has a differential effect based on the type of feigning (neurocognitive vs. psychiatric) or on the type of test administered (e.g., self-report vs. oral administration); however, this has yet to be examined in the literature. Given the very small sample size, the current results should be interpreted with extreme caution, but they raise an interesting point for further exploration.

Another methodological characteristic thought to have an impact on the ecological validity of a study is the use of a warning to feign believably. While the practice of providing a warning regarding malingering instruments is not consistent in clinical practice (Slick, Tan, Strauss, & Hultsch, 2004), Rogers (1997) strongly recommends using a warning in simulation research. Again, little research has focused on this specific aspect of simulation methodology, but intuition suggests that inclusion of a warning might result in smaller group differences, or more credible feigning. This was the case in the present study, although again the results should be interpreted with extreme caution given that the majority of studies utilized such a warning.

Finally, the use of objective, valid measures for classifying the malingerers in a known-groups study significantly moderated the effect sizes in the present meta-analysis. Specifically, studies employing well-validated dedicated tests of malingering, as well as the Slick et al. (1999) criteria, to aid in the determination of group status showed smaller effect sizes, while those not utilizing such criteria had larger and more variable differences. The use of objective criteria is now considered standard practice in known-groups studies, as it increases the internal validity of the study. These results are not surprising, but they are in need of further exploration with a larger number of studies.

Summary of results: Diagnostic accuracy statistics

Overall, the Digit Span test showed strong specificity (M = 86.3%), with moderate sensitivity (M = 61.4%) and overall hit rate (M = 76.3%). More specifically, the RDS and DS-ACSS variants also showed strong specificity (M = 86.1% and 86.5%, respectively) with good sensitivity (M = 63.3% and 59.7%, respectively). There were no significant differences in the diagnostic accuracy of the RDS and DS-ACSS, indicating that clinicians may choose the one best suited to their needs. Although RDS showed slightly better consistency across recommended cutting scores, the DS-ACSS is calculated automatically when scoring the WAIS and is conceptually easier to understand, making it the likely choice for clinicians. Further, DS-ACSS utilizes a standardized score and is not calculated as a simple sum of digits forward and backward, as RDS is, so it may be more difficult for the sophisticated malingerer to gauge the amount of feigning required to produce a given standard score.
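To make these statistics concrete, the sketch below shows how such values arise. It assumes the common scoring rule for RDS, the longest forward span plus the longest backward span on which both trials are passed (Greiffenstein, Baker, & Gola, 1994), and uses fabricated scores throughout; it is an illustration, not the analysis code behind this review.

```python
# Illustrative sketch only (fabricated data, not the authors' code).
# RDS scoring under the common rule: longest forward span plus longest
# backward span on which BOTH trials were recalled correctly.

def reliable_digit_span(forward_trials, backward_trials):
    """Each argument maps span length -> (trial1_correct, trial2_correct)."""
    def longest_both_passed(trials):
        passed = [span for span, (t1, t2) in trials.items() if t1 and t2]
        return max(passed, default=0)
    return longest_both_passed(forward_trials) + longest_both_passed(backward_trials)

def accuracy_stats(feigning_scores, honest_scores, cutoff=7):
    """Scores at or below the cutoff are classified as suspect effort."""
    true_pos = sum(s <= cutoff for s in feigning_scores)  # feigners detected
    true_neg = sum(s > cutoff for s in honest_scores)     # honest examinees passed
    sensitivity = true_pos / len(feigning_scores)
    specificity = true_neg / len(honest_scores)
    hit_rate = (true_pos + true_neg) / (len(feigning_scores) + len(honest_scores))
    return sensitivity, specificity, hit_rate

# Hypothetical protocol: both trials passed at forward spans 3-4 and backward span 2.
forward = {3: (True, True), 4: (True, True), 5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, False), 4: (False, False)}
print(reliable_digit_span(forward, backward))  # 4 + 2 = 6, below the RDS <= 7 cutoff

sn, sp, hr = accuracy_stats([5, 6, 7, 9], [8, 9, 10, 7], cutoff=7)
print(f"SN = {sn:.2f}, SP = {sp:.2f}, HR = {hr:.2f}")  # each 0.75 here
```

Note that the hit rate computed this way weights the two groups by their sizes, which is why studies with large honest samples (e.g., Duncan, 2002, in the Appendix) show hit rates closer to specificity than to sensitivity.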
Limitations

The primary shortcoming of the present review is the small number of studies available for analysis. More specifically, the limited number of studies across versions of the WAIS and WMS batteries precluded in-depth exploration among these tests. Thus, the results of those analyses should be considered tentative and an area in need of further research, particularly in light of the new test versions being released. Further, the effects of coaching and preparation time are potentially important moderators that could not be explored here.

Additionally, although the Digit Span subtest may hold promise as a screening instrument, the present study was not able to explore this issue thoroughly, given the limited range of cutting scores presented in the literature; cutting scores that maximize sensitivity are a requirement for using either RDS or DS-ACSS as a screening tool for malingering. Future research should focus on publishing the full range of cutting scores and associated sensitivity and specificity values, to allow clinicians and researchers to further examine this potentially time-saving diagnostic approach.
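Publishing that full range is straightforward to operationalize. The sketch below, again with fabricated scores rather than data from any study reviewed here, tabulates sensitivity and specificity at every possible cutting score, which is exactly the reporting the preceding paragraph calls for:

```python
# Hypothetical illustration: sensitivity/specificity across the full range
# of cutting scores, rather than at a single published cutoff.

def cutoff_table(feigning_scores, honest_scores):
    all_scores = feigning_scores + honest_scores
    rows = []
    for cutoff in range(min(all_scores), max(all_scores) + 1):
        sn = sum(s <= cutoff for s in feigning_scores) / len(feigning_scores)
        sp = sum(s > cutoff for s in honest_scores) / len(honest_scores)
        rows.append((cutoff, sn, sp))
    return rows

# Fabricated example scores, for demonstration only.
feigners = [4, 5, 5, 6, 7, 8, 9]
honest = [7, 8, 9, 9, 10, 11, 12]
for cutoff, sn, sp in cutoff_table(feigners, honest):
    print(f"RDS <= {cutoff:2d}: SN = {sn:.2f}, SP = {sp:.2f}")
```

A screening application would then select a cutting score high enough to reach the needed sensitivity, accepting the accompanying loss of specificity.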
In summary, the present review provided evidence to support the use of the Digit Span subtest, including RDS and DS-ACSS, in adequately detecting malingering. The Digit Span subtest performs similarly to many well-validated dedicated malingering tests. The advantages of embedded indices, in addition to the large effect sizes and strong diagnostic accuracy statistics, make the Digit Span test an essential tool for clinicians and researchers alike.

Original manuscript received 11 May 2010
Revised manuscript accepted 28 July 2010

REFERENCES

Note: References marked with an asterisk designate studies included in the meta-analysis.

Allen, L. M., & Green, P. (2001). Declining CARB failure rates over 6 years of testing: What's wrong with this picture? Archives of Clinical Neuropsychology, 16, 846.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of finger-tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120.
∗ Axelrod, B. N., Fichtenberg, N. L., & Millis, S. R. (2006). Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale–Third Edition. The Clinical Neuropsychologist, 20, 513–523.
∗ Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20, 145–159.
Bauer, L., & McCaffrey, R. J. (2006). Coverage of the Test of Memory Malingering, Victoria Symptom Validity Test, and Word Memory Test on the Internet: Is test security threatened? Archives of Clinical Neuropsychology, 21, 121–126.
Bernard, L. C., McGrath, M. J., & Houston, W. (1993). Discriminating between simulated malingering and closed head injury on the Wechsler Memory Scale–Revised. Archives of Clinical Neuropsychology, 8, 539–551.
Bernard, L. C., McGrath, M. J., & Houston, W. (1996). The differential effects of simulating malingering, closed head injury, and other CNS pathology on the Wisconsin Card Sorting Test: Support for the "pattern of performance hypothesis." Archives of Clinical Neuropsychology, 11, 231–245.
Berry, D. T. R., Baer, R. A., Rinaldo, J. C., & Wetter, M. W. (2002). Assessment of malingering. In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches. New York, NY: Oxford University Press.
Berry, D. T. R., & Schipper, L. J. (2007). Detection of feigned psychiatric symptoms during forensic neuropsychological evaluations. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 226–263). New York, NY: Oxford University Press.
Binder, L. M. (2002). The Portland Digit Recognition Test: A review of validation and clinical use. Journal of Forensic Neuropsychology, 2, 27–41.
Binder, L. M., Villanueva, M. R., Howieson, D., & Moore, R. T. (1993). The Rey AVLT recognition memory task measures motivational impairment after mild head trauma. Archives of Clinical Neuropsychology, 8, 137–147.
Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practical issues and medical necessity. NAN position paper. Archives of Clinical Neuropsychology, 20, 419–426.
Camara, W., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.
∗ Chafetz, M. D. (2008). Malingering on the social security disability consultative exam: Predictors and base rates. The Clinical Neuropsychologist, 22, 529–546.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Daubert v. Merrell Dow Pharmaceuticals, 113 S. Ct. 2876 (1993).
∗ Duncan, S. A., & Ausborn, D. L. (2002). The use of Reliable Digits to detect malingering in a criminal forensic pretrial population. Assessment, 9, 56–61.
∗ Etherton, J. L., Bianchini, K. J., Ciota, M. A., Heinly, M. T., & Greve, K. W. (2006). Pain, malingering and the WAIS–III Working Memory Index. The Spine Journal, 6, 61–71.
∗ Etherton, J. L., Bianchini, K. J., Greve, K. W., & Heinly, M. T. (2005). Sensitivity and specificity of Reliable Digit Span in malingered pain-related disability. Assessment, 12, 130–136.
Frederick, R. I., & Crosby, R. D. (2000). Development and validation of the Validity Indicator Profile. Law and Human Behavior, 24, 59–82.
Frederick, R. I., Crosby, R. D., & Wynkoop, T. F. (2000). Performance curve classification of invalid responding on the Validity Indicator Profile. Archives of Clinical Neuropsychology, 15, 281–300.
Green, P., Allen, L. M., & Astner, I. K. (1997). The Word Memory Test: A user's guide to the oral and computer-administered forms. Durham, NC: CogniSyst.
∗ Greiffenstein, M., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224.
∗ Greiffenstein, M., Gola, T., & Baker, W. J. (1995). MMPI-2 validity scales versus domain specific measures in detection of factitious traumatic brain injury. The Clinical Neuropsychologist, 9, 230–240.
Grote, C. L., & Hook, J. N. (2007). Forced-choice recognition tests of malingering. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 44–79). New York, NY: Oxford University Press.
Gutheil, T. G. (2003). Reflections on coaching by attorneys. Journal of American Academy of Psychiatry and Law, 31, 6–9.
∗ Heaton, R. K., Smith, H. H., Lehman, R. A. W., & Vogt, A. T. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46, 892–900.
Hedges, L., & Olkin, I. (1985). Statistical methods for meta-analysis. New York, NY: Academic Press.
∗ Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS Digit Span-based indicators of malingered neurocognitive dysfunction. Assessment, 12, 429–444.
∗ Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23.
Inman, T. H., Vickery, C. D., Berry, D. T. R., Lamb, D., Edwards, C., & Smith, G. T. (1998). Development and initial validation of a new procedure for evaluating adequacy of effort given during neuropsychological testing: The Letter Memory Test. Psychological Assessment, 10, 128–139.
∗ Iverson, G. L., & Franzen, M. D. (1994). The Recognition Memory Test, Digit Span, and Knox Cube Test as markers of malingered memory impairment. Assessment, 1, 323–334.
∗ Iverson, G. L., & Franzen, M. D. (1996). Using multiple objective memory procedures to detect simulated malingering. Journal of Clinical and Experimental Neuropsychology, 18, 38–51.
Iverson, G. L., & McCracken, L. M. (1997). "Postconcussive" symptoms in persons with chronic pain. Brain Injury, 11, 783–790.
Johnson, B. T. (1989). DSTAT: Software for the meta-analytic review of research literatures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Killgore, W. D., & DellaPietra, L. (2000). Using the WMS–III to detect malingering: Empirical validation of the rarely missed index (RMI). Journal of Clinical and Experimental Neuropsychology, 22, 761–771.
∗ Langeluddecke, P. M., & Lucas, S. K. (2003). Quantitative measures of memory malingering on the Wechsler Memory Scale–Third Edition in mild head injury litigants. Archives of Clinical Neuropsychology, 18, 181–197.
∗ Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G. J. (2007). Assessment of malingered neuropsychological deficits. New York, NY: Oxford University Press.
Larrabee, G. J., & Berry, D. T. R. (2007). Diagnostic classification statistics and diagnostic validity of malingering assessment. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 14–26). New York, NY: Oxford University Press.
Lees-Haley, P. R. (1997). Attorneys influence expert evidence in forensic psychology and neuropsychology cases. Assessment, 4, 321–324.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
∗ Martin, R. C., Hayes, J. S., & Gouvier, W. D. (1996). Differential vulnerability between postconcussion self-report and objective malingering tests in identifying simulated mild head injury. Journal of Clinical and Experimental Neuropsychology, 18, 265–275.
∗ Mathias, C. W., Greve, K. W., Bianchini, K. J., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction using the Reliable Digit Span in traumatic brain injury. Assessment, 9, 301–308.
Meyers, J. E., & Volbrecht, M. (1999). Detection of malingerers using the Rey Complex Figure and Recognition Trial. Applied Neuropsychology, 6, 201–207.
Millis, S. R., Putnam, S. H., Adams, K. M., & Ricker, J. H. (1995). The California Verbal Learning Test in the detection of incomplete effort in neuropsychological evaluation. Psychological Assessment, 7, 463–471.
∗ Mittenberg, W., Azrin, R., Millsaps, C., & Heilbronner, R. (1993). Identification of malingered head injury on the Wechsler Memory Scale–Revised. Psychological Assessment, 5, 34–40.
Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.
Mittenberg, W., Patton, C., & Legler, W. (2003, October). Identification of malingered head injury on the Wechsler Memory Scale–Third Edition. Paper presented at the annual conference of the National Academy of Neuropsychology, Dallas, TX.
Mittenberg, W., Rotholc, A., Russell, E., & Heilbronner, R. (1996). Identification of malingered head injury on the Halstead–Reitan Battery. Archives of Clinical Neuropsychology, 11, 271–281.
∗ Mittenberg, W., Theroux, S., Aguila-Puentes, G., Bianchini, K., Greve, K., & Rayls, K. (2001). Identification of malingered head injury on the Wechsler Adult Intelligence Scale–Third Edition. The Clinical Neuropsychologist, 15, 440–445.
∗ Mittenberg, W., Theroux-Fichera, S., Zielinski, R. E., & Heilbronner, R. L. (1995). Identification of malingered head injury on the Wechsler Adult Intelligence Scale–Revised. Professional Psychology: Research and Practice, 26, 491–498.
Orey, C., Cragar, D. E., & Berry, D. T. R. (2000). The effects of two motivational manipulations on the neuropsychological performance of mildly head-injured college students. Archives of Clinical Neuropsychology, 15, 335–348.
Public attitude monitor [Survey]. (1992). Oak Brook, IL: Insurance Research Council.
Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20, 33–65.
Rogers, R. (1997). Clinical assessment of malingering and deception (2nd ed.). New York, NY: Guilford Press.
Rogers, R. (2008). Researching response styles. In R. Rogers (Ed.), Clinical assessment of malingering and deception (3rd ed., pp. 411–434). New York, NY: Guilford Press.
Rogers, R., & Cavanaugh, J. L. (1983). "Nothing but the truth" . . . A reexamination of malingering. Journal of Psychiatry and Law, 11, 443–459.
Rogers, R., & Cruise, K. R. (1998). Assessment of malingering with simulation designs: Threats to external validity. Law and Human Behavior, 22, 273–285.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Newbury Park, CA: Sage.
Ruiz, M. A., Drake, E. B., Glass, A., Marcotte, D., & van Gorp, W. G. (2002). Trying to beat the system: Misuse of the Internet to assist in avoiding the detection of psychological symptom dissimulation. Professional Psychology: Research and Practice, 33, 294–299.
Slick, D. J., Hopp, G., Strauss, E., & Thompson, G. (1997). Victoria Symptom Validity Test: Professional manual. Odessa, FL: Psychological Assessment Resources.
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: A survey of experts' practices. Archives of Clinical Neuropsychology, 19, 465–473.
∗ Strauss, E., Slick, D. J., Levy-Bencheton, J., Hunter, M., MacDonald, S. W. S., & Hultsch, D. F. (2002). Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 17, 423–444.
Suhr, J. A., & Gunstad, J. (2007). Coaching and malingering: A review. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 287–311). New York, NY: Oxford University Press.
∗ Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19, 500–514.
Tombaugh, T. N. (1996). Test of Memory Malingering (TOMM). New York, NY: Multi-Health Systems.
∗ Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS–R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16, 597–607.
∗ Trueblood, W., & Schmidt, M. (1993). Malingering and other validity considerations in the neuropsychological evaluation of mild head injury. Journal of Clinical and Experimental Neuropsychology, 15, 578–590.
Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Vickery, C. D., Berry, D. T. R., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale–Revised manual. New York, NY: The Psychological Corporation.
Wechsler, D. (1987). Wechsler Memory Scale–Revised manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1997a). WAIS–III: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1997b). Wechsler Memory Scale–III: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2008). WAIS–IV: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2009). WMS–IV: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wetter, M. W., & Corrigan, S. K. (1995). Providing information to clients about psychological tests: A survey of attorneys' and law students' attitudes. Professional Psychology: Research and Practice, 26, 474–477.
APPENDIX
Demographic and methodological characteristics, effect sizes, and classification accuracy of studies included in independent contrasts analyses

| Study | Comp type | Grp | N | Subj. | Age (years) M (SD) | Ed. (years) M (SD) | Coa/Prep/Inc/Bel. | MS | Subtest | Test M (SD) | CS | SN | SP | HR | d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Axelrod (2006)a | KG | M | 36 | TBI | 43.1 (11.9) | 12.5 (1.6) | n/a | .86 | DS | 6.3 (2.1) | ≤7 | .75 | .69 | .72 | 1.05 |
| | | H | 29 | TBI | 38.2 (17.8) | 12.8 (2.1) | | | | 8.5 (2.1) | | | | | |
| Babikian (2006)a | KG | M | 66 | MixedN | 42.5 (12.3) | 13.0 (2.4) | n/a | .86 | RDS | 6.7 (2.4) | ≤7 | .62 | .77 | .69 | 0.99 |
| | | H | 56 | MixedN | 48.4 (16.4) | 13.3 (2.4) | | | | 8.9 (2.0) | | | | | |
| Chafetz (2008)a | KG | M | 16 | Dis. | 24.1 (8.5) | 10.2 (1.5) | n/a | .43 | RDS | 4.2 (2.4) | NA | NA | NA | NA | 2.18 |
| | | H | 21 | Dis. | 30.2 (8.9) | 9.8 (1.6) | | | | 8.0 (1.0) | | | | | |
| Chafetz (2008)a | KG | M | 7 | Dis. | 25.4 (11.0) | 7.1 (3.2) | n/a | .43 | RDS | 2.9 (1.9) | NA | NA | NA | NA | 3.46c |
| | | H | 12 | Dis. | 40.9 (12.7) | 9.8 (1.4) | | | | 7.7 (1.0) | | | | | |
| Duncan (2002)a | KG | M | 54 | Crim. | 34.7 (10.1) | 8.9 (2.2) | n/a | .57 | RDS | 5.8 (3.4) | ≤7 | .68 | .72 | .71 | 1.19 |
| | | H | 124 | Crim. | 38.1 (9.5) | 10.5 (2.7) | | | | 8.9 (2.1) | | | | | |
| Etherton (2006)a | KG | M | 32 | Pain | 43.0 (8.5) | 11.5 (3.0) | n/a | .86 | DS | 6.4 (3.0) | ≤7 | .75 | .75 | .75 | 1.12 |
| | | H | 36 | TBI | 32.2 (14.1) | NA | | | | 9.8 (3.1) | | | | | |
| Etherton (2006)a | Sim | M | 20 | Stud. | NA | NA | N/N/N/Y | .38 | DS | 5.9 (2.7) | ≤7 | .55 | .78 | .71 | 1.43 |
| | | H | 49 | Pain | 43.5 (11.9) | 12.2 (2.2) | | | | 10.0 (2.9) | | | | | |
| Etherton (2005)a | KG | M | 35 | Pain | 43.3 (11.6) | NA | n/a | 1.0 | RDS | 7.2 (3.0) | ≤7 | .60 | .92 | .79 | 1.29 |
| | | H | 53 | Pain | 43.0 (8.4) | NA | | | | 10.5 (2.3) | | | | | |
| Greiffenstein (1994)a | KG | M | 43 | TBI | 39.3 (10.0) | 11.1 (2.2) | n/a | .57 | RDS | 6.7 (1.2) | ≤7 | .68 | .89 | .77 | 1.90 |
| | | H | 30 | PPCS | 37.6 (13.9) | 11.7 (0.8) | | | | 8.9 (1.1) | | | | | |
| Greiffenstein (1995)a | KG | M | 68 | TBI | 38.3 (11.7) | 12.2 (2.9) | n/a | .57 | RDS | 6.6 (1.8) | ≤7 | .89 | .68 | .80 | 1.49 |
| | | H | 53 | TBI | 24.6 (12.5) | 12.5 (2.6) | | | | 9.2 (1.7) | | | | | |
| Heaton (1978)a | Sim | M | 16 | Stud/CV | 24.4 (7.5) | 12.9 (2.4) | N/N/Y/N | .50 | DS | 7.0 (2.9) | NA | NA | NA | NA | 0.92 |
| | | H | 16 | TBI | 26.7 (6.5) | 11.9 (1.6) | | | | 9.5 (2.5) | | | | | |
| Heinly (2005)a | KG | M | 18 | TBI | NA | NA | n/a | .86 | DS | 6.1 (2.8) | ≤5 | .36 | .92 | .75 | 1.24 |
| | | H | 45 | TBI | NA | NA | | | | 10.5 (3.8) | | | | | |
| Inman (2002)a | Sim | M | 21 | TBI | 18.7 (0.9) | 12.5 (1.0) | N/N/Y/Y | .75 | RDS | 9.2 (2.3) | <8 | .27 | 1.0 | .65 | 1.02 |
| | | H | 24 | TBI | 18.7 (1.7) | 12.3 (0.6) | | | | 11.6 (2.4) | | | | | |
| Iverson (1994)a | Sim | M | 40 | Stud/Inm | 28 (6.9) | 14 (2.0) | N/N/N/Y | .50 | DS | 2.7 (1.7) | <5 | .90 | .90 | .90 | 2.97c |
| | | H | 20 | TBI | 28 (7.5) | 13 (1.9) | | | | 8.3 (2.3) | | | | | |
| Iverson (1996)a | Sim | M | 20 | Psych. | 20.0 (1.7) | 13.7 (0.7) | N/N/Y/Y | .50 | DS | 2.6 (1.4) | <4 | .78 | 1.0 | .91 | 2.46 |
| | | H | 20 | MemImp | 34.0 (13.2) | 13.6 (2.3) | | | | 8.2 (2.8) | | | | | |
| Langeluddecke (2003)b | KG | M | 25 | TBI | 33.9 (12.0) | 10.8 (2.1) | n/a | 1.0 | DS | 11.6 (3.6) | ≤8 | .16 | .98 | .76 | 1.27 |
| | | H | 50 | TBI | 38.4 (11.3) | 11.1 (3.3) | | | | 16.3 (3.9) | | | | | |
| Larrabee (2003)a | KG | M | 24 | TBI | 39.3 (11.8) | 12.5 (2.3) | n/a | .71 | RDS | 7.4 (1.7) | ≤7 | .50 | .94 | .73 | 1.27 |
| | | H | 12 | TBI | 34.8 (16.8) | 12.6 (2.6) | | | | 10.0 (2.2) | | | | | |
| Martin (1996)a | Sim | M | 51 | Stud./TBI | 22.1 (5.5) | 14.0 (1.4) | Y/N/N/N | .44 | DS | 11.3 (6.3) | ≤11 | .35 | 1.0 | .68 | 1.15 |
| | | H | 70 | TBI | 21.6 (5.5) | 14.6 (1.3) | | | | 16.4 (3.4) | | | | | |
| Mathias (2002)a | KG | M | 24 | TBI | 38.9 (12.3) | 12.5 (2.8) | n/a | 1.0 | RDS | 6.8 (1.5) | ≤7 | .67 | .93 | .82 | 1.83 |
| | | H | 30 | TBI | 33.6 (13.5) | 12.5 (2.4) | | | | 10.1 (2.0) | | | | | |
| Mittenberg (1993)a | Sim | M | 39 | CV | 32.8 (12.4) | 15.5 (2.3) | N/N/N/Y | .56 | DS | 11.2 (3.9) | NA | NA | NA | NA | 1.02 |
| | | H | 39 | TBI | 32.9 (11.9) | 13.9 (3.0) | | | | 15.3 (4.1) | | | | | |
| Mittenberg (1995)a | Sim | M | 67 | CV | 33.7 (12.2) | 15.7 (1.9) | N/N/N/Y | .56 | DS | 6.9 (3.1) | NA | NA | NA | NA | 1.02 |
| | | H | 67 | TBI | 33.2 (11.3) | 14.0 (3.0) | | | | 9.7 (2.3) | | | | | |
| Mittenberg (2001)a | KG | M | 36 | TBI | 42.9 (16.0) | 11.0 (3.4) | n/a | .43 | DS | 6.1 (2.2) | NA | NA | NA | NA | 1.21 |
| | | H | 36 | TBI | 30.3 (12.9) | 11.9 (1.5) | | | | 9.5 (3.3) | | | | | |
| Strauss (2002)a | Sim | M | 11 | TBI | NA | NA | N/N/N/Y | .56 | RDS | 5.9 (4.1) | <7 | .79 | .90 | .85 | 1.27 |
| | | H | 16 | TBI | NA | NA | | | | 10.3 (2.9) | | | | | |
| Suhr (1997)a | KG | M | 31 | TBI | 37.6 (8.7) | 12.7 (2.2) | n/a | 1.0 | DS | 6.3 (2.4) | NA | NA | NA | NA | 1.02 |
| | | H | 20 | TBI | 35.4 (9.5) | 13.1 (2.9) | | | | 9.0 (3.0) | | | | | |
| Trueblood (1994)a | KG | M | 12 | TBI | 34.9 (NA) | 11.6 (NA) | n/a | .86 | DS | 4.8 (NA) | <7 | .75 | .75 | .75 | NA |
| | | H | 12 | TBI | 35.0 (NA) | 11.3 (NA) | | | | 8.3 (NA) | | | | | |
| Trueblood (1993)a | KG | M | 8 | TBI | 37.4 (NA) | 11.6 (NA) | n/a | .86 | DS | 5.9 (NA) | ≤7 | .62 | .88 | .75 | NA |
| | | H | 8 | TBI | 37.4 (NA) | 12.0 (NA) | | | | 8.9 (NA) | | | | | |

Note. For studies, only the first author is listed. n/a = not applicable; NA = information not available. Comp type = comparison type (KG = known groups; Sim =
enhanced simulation). Grp = group (M = malingering; H = honest). N = number of subjects. Subj. = subject sample (Crim. = criminal pretrial defendants, CV =
community volunteers, Dis. = disability claimants, Inm = Inmates, MemImp = memory impaired, MixedN = mixed neurological, Pain = chronic pain, PPCS = persistent
postconcussive syndrome, Psych. = psychiatric patients, Stud. = students, TBI = traumatic brain injury). Age (M = mean, SD = standard deviation). Ed. = education
level. Method (Coa = coaching provided on either symptoms or avoiding detection, Prep = preparation time given, Inc = incentive given, Bel. = warning to be believable;
N = not present, Y = included in methodology). MS = relative method score. Test scale (RDS = reliable digit span, DS = digit span age-corrected scaled score; M = mean
test score, SD = standard deviation, CS = cutting score, SN = sensitivity, SP = specificity, HR = hit rate, d = corrected Hedges’s g effect size).
a Study using a version of the Wechsler Adult Intelligence Scale (WAIS).
b Study using a version of the Wechsler Memory Scale (WMS).
c Study excluded as an outlier.
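The d column above reports corrected Hedges's g values. For readers checking entries, a minimal sketch of the standard computation (Hedges & Olkin, 1985) follows; it is a generic implementation, not the authors' analysis code, with the Axelrod (2006) row as a worked example.

```python
from math import sqrt

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference with a small-sample bias correction."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # Hedges & Olkin (1985)
    return d * correction

# Axelrod (2006): honest group M = 8.5 (SD = 2.1, n = 29) vs. malingering
# group M = 6.3 (SD = 2.1, n = 36). Prints 1.04, i.e., the tabled 1.05
# within the rounding of the tabled means and standard deviations.
print(round(hedges_g(8.5, 2.1, 29, 6.3, 2.1, 36), 2))
```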
