
The Clinical Neuropsychologist

ISSN: 1385-4046 (Print) 1744-4144 (Online) Journal homepage: http://www.tandfonline.com/loi/ntcn20


To cite this article: Paul B. Ingram & Michael S. Ternes (2016): The detection of content-based
invalid responding: a meta-analysis of the MMPI-2-Restructured Form’s (MMPI-2-RF) over-
reporting validity scales, The Clinical Neuropsychologist, DOI: 10.1080/13854046.2016.1187769

To link to this article: http://dx.doi.org/10.1080/13854046.2016.1187769

Published online: 23 May 2016.



Download by: [University of Nebraska, Lincoln] Date: 31 May 2016, At: 00:38
The Clinical Neuropsychologist, 2016
http://dx.doi.org/10.1080/13854046.2016.1187769

The detection of content-based invalid responding: a meta-analysis of the MMPI-2-Restructured Form’s (MMPI-2-RF) over-reporting validity scales
Paul B. Ingram  and Michael S. Ternes
Educational Psychology, University of Kansas, Lawrence, KS, USA
ABSTRACT
Objective: This study synthesized research evaluating the effectiveness of the over-reporting validity scales of the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) for detecting intentionally feigned over-endorsements of symptoms using a moderated meta-analysis. Method: After identifying experimental and quasi-experimental studies for inclusion (k = 25) in which the validity scales of the MMPI-2-RF were compared between groups of respondents, moderated meta-analyses were conducted for each of its five over-reporting scales. These meta-analyses explored the general effectiveness of each scale across studies, as well as the impact that several moderators had on scale performance, including comparison group, study type (i.e. real versus simulation), age, education, sex, and diagnosis. Results: The over-reporting scales of the MMPI-2-RF act as effective general measures for the detection of malingering and over-endorsement of symptoms, with individual scales ranging in effectiveness from an effect size of 1.08 (Symptom Validity; FBS-r) to 1.43 (Infrequent Pathology; Fp-r), each with different patterns of moderating influence. Conclusions: The MMPI-2-RF validity scales effectively discriminate between groups of respondents presenting in either an honest manner or with patterned exaggeration and over-endorsement of symptoms. The magnitude of difference observed between honest and malingering groups was substantially narrower than might be expected using traditional cut-scores for the validity scales, making interpretation within the evaluation context particularly important. While all over-reporting scales are effective, the FBS-r and RBS scales are those least influenced by common and context-specific moderating influences, such as respondent or comparison grouping.

ARTICLE HISTORY
Received 24 November 2015; Accepted 5 May 2016

KEYWORDS
Meta-analysis; validity scales; over-reporting; MMPI-2-RF; malingering

Content-based invalid responding (e.g. a respondent’s purposeful manipulation of response patterns according to their interpretation of, or reaction to, an item’s content through either over- or under-reporting) is a major concern for self-report inventories (Ben-Porath, 2003). By
including validity scales, clinical assessments increase their capacity to effectively delineate
response patterns associated with invalid responding.

CONTACT Paul B. Ingram, pbingram@ku.edu
© 2016 Informa UK Limited, trading as Taylor & Francis Group

This inclusion ensures a more accurate assessment of symptom presentation. The need for accurate interpretation is particularly
pronounced in instruments such as the Minnesota Multiphasic Personality Inventory-2-
Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) given that high risk deci-
sion-making is both an intended purpose and the strength of the instrument (Ben-Porath,
2012a). Such a need is well recognized as one of the major criticisms of the MMPI-2-RF in
that there are not enough evaluative precautions taken to ensure effective and appropriate
interpretations are rendered (Butcher, 2010; Butcher & Williams, 2010).
The establishment of interpretive precision is well underway with numerous studies
exploring symptom measurement and interpretation across both clinical (Arbisi, Polusny,
Erbes, Thuras, & Reddy, 2011; Mason et al., 2013) and nonclinical populations (Forbey, Lee,
& Handel, 2010; Ingram, Kelso, & McCord, 2011). There have also been efforts to calculate
index scores to improve determination of validity scale response patterns (Meyers et al.,
Downloaded by [University of Nebraska, Lincoln] at 00:38 31 May 2016

2014). Such rapid growth in research on various aspects of the MMPI-2-RF is likely brought on by the
clear potential of the Restructured Clinical (RC; Tellegen et al., 2003) scales, the core clinical
scales of the MMPI-2-RF. These scales offer an opportunity to resolve difficulties noted in
earlier versions of the MMPI (Archer, 2006). Unsurprisingly, given the MMPI-2-RF’s impressive
psychometric properties (Arbisi, Sellbom, & Ben-Porath, 2008/2011; Ayearst, Sellbom, Trobst,
& Bagby, 2013; Ben-Porath & Tellegen, 2008/2011; Sellbom, Ben-Porath, Graham, Arbisi, &
Bagby, 2003; Simms, Casillas, Clark, Watson, & Doebbeling, 2005; Wolf et al., 2008) and diag-
nostically predictive capacity (Scholte et al., 2012; Tarescavage, Wygant, Gervais, & Ben-
Porath, 2013), the MMPI-2-RF is often described as an effective instrument for the detection of symptoms, including detecting respondents who exaggerate, minimize, or otherwise misreport their symptoms (Ben-Porath, 2012a; Hoelzle, Nelson, & Arbisi,
2012). In conjunction with the growing base of literature supporting its clinical scales, the
MMPI-2-RF’s validity scales are also frequently described as effective tools, helping to monitor
for responses suggestive of invalid responding patterns (Sleep, Petty, & Wygant, 2015).
Studies have examined MMPI-2-RF detection of response bias across a variety of contexts.
Research has explored the effectiveness of the validity scales in both known group (e.g.
Anderson, 2011; Goodwin, Sellbom, & Arbisi, 2013; Green, 2013; Harp, Jasinski, Shandera-
Ochsner, Mason, & Berry, 2011) and simulation studies (Marion, Sellbom, & Bagby, 2011;
Sellbom & Bagby, 2010; Sellbom, Wygant, & Bagby, 2012) across a variety of common pre-
senting clinical concerns (Harp et al., 2011; Marion et al., 2011; Youngjohn, Wershba,
Stevenson, Sturgeon, & Thomas, 2011) using different comparison groups (Crisanti, 2014;
Sellbom, Wygant, Toomey, Kucharski, & Duncan, 2010; Tarescavage et al., 2013). Consistently,
the validity scales are interpreted as appropriate for detecting content-based invalid respond-
ing (Ben-Porath, 2012b).
The validity scales of the MMPI-2-RF include the Infrequent Responses (F-r), Infrequent
Psychopathology Responses (Fp-r), Symptom Validity (FBS-r), Response Bias (RBS; Gervais,
Ben-Porath, Wygant, & Green, 2007; Gervais, Ben-Porath, Wygant, & Sellbom, 2010), and
Infrequent Somatic Complaints (Fs), each measuring over-reporting. The validity scales also
include Uncommon Virtues (L-r) and Adjustment Validity (K-r), which assess under-reporting
using two nonoverlapping constructs derived by factor analysis. While most of these scales
are revised forms of validity scales with a long history of inclusion on the MMPI-2 (Ben-Porath,
2012b), the RBS was developed later. RBS is not based, as the infrequency scales (F-r, Fs, and Fp-r) are, on items infrequently answered in a given direction by a small portion of the population, nor on rational approaches like the FBS-r. Instead, Gervais et al. developed
RBS to detect response bias within high-risk assessment settings by using items that dis-
criminated between individuals who passed and scored below chance on external measures
of cognitive effort. Similarly, the Fs scale is a relatively new addition to the interpretive
approach of the MMPI, having only recently been developed prior to inclusion in the MMPI-
2-RF (Wygant, Ben-Porath, & Arbisi, 2004).
The literature is growing in support of the validity scales for the MMPI-2-RF as effective
screening instruments for clinical concerns (e.g. Mason et al., 2013). This growth is particularly
true of indices measuring over-reporting; however, there have been no published meta-anal-
yses about the efficacy of the MMPI-2-RF validity scales. This lack of meta-analytic evaluation
leaves room for some interpretive confusion. Although studies have repeatedly suggested
that validity scales are able to differentiate feigners from honest respondents, there are a
myriad of interactions which may complicate, exacerbate, or otherwise alter the discriminative capacity of these scales. It is for this reason that differences observed across various
malingering studies are as wide as they are for many of the scales (cf. Lehr, 2014; Marion
et al., 2011). Differences between the effect sizes, comparison and criterion groups, and
demographic characteristics in the various studies are yet to be reconciled in a manner that
enables a complete understanding of each validity scale’s general effectiveness.
Meta-analysis explores relationships in a way not available within individual studies by
creating an analytical amalgam drawn from numerous research teams and methodological
approaches. Beyond offering a statistical summary of the relationships each study was designed to describe, meta-analysis provides a means to examine potentially moderating factors that may be, unbeknownst to researchers, influencing the strength of their
observations (Borenstein, Hedges, Higgins, & Rothstein, 2011). Given this capacity for syn-
thesis, it is unsurprising that the MMPI-2, with its rich research history (Colligan, 1985), has
been the subject of repeated meta-analyses exploring its effectiveness at the detection of
content-based invalid responding (Baer & Miller, 2002; Berry, Baer, & Harris, 1991; Nelson,
Hoelzle, Sweet, Arbisi, & Demakis, 2010; Rogers, Sewell, Martin, & Vitacco, 2003; Rogers,
Sewell, & Salekin, 1994).
A weakness facing the MMPI-2-RF is that such a meta-analysis has not been conducted
to explore its validity scales measuring over-reporting. This weakness is an important area
to address given the MMPI-2-RF’s growing research base and use in high stakes decisions
(e.g. Archer, Hagan, Mason, Handel, & Archer, 2012; Detrick & Chibnall, 2014). While this is
likely due to the relative novelty of the instrument, the MMPI-2-RF could still benefit from
such an exploration in an effort to not just summarize the research findings but to offer
potential directions for further study. Thus, this study has two goals: to meta-analytically
explore the relationship between MMPI-2-RF validity scales and detection efforts of willful
manipulation in the form of over-reporting on measures of content-based invalid responding
and to explore moderating influences on these relationships.

Method
General approach
Meta-analysis is a strategy for synthesizing the results of multiple studies into a single esti-
mate of effect. In order to do so, this study performed a literature review identifying studies
reporting mean T-scores for any validity scales of the MMPI-2-RF across at least two distinct
groups, at least one of which was identified as a known or simulated malingering group. In
this way, the targeted studies examined the capacity of individuals to intentionally mislead
interpretation efforts through over-endorsement of symptoms. All studies were then coded
(validity scale mean differences and moderator coding) by two separate reviewers.
Discrepancies in reviewer coding were tallied (20 of 1536 possible discrepancies; 98.7%
inter-rater reliability). When a difference emerged, primary reference material was reviewed
and discussion occurred until coding agreement was reached.
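The reported agreement rate follows directly from these counts; as a minimal illustration (the counts are taken from the text above):

```python
# Inter-rater agreement: 20 discrepancies across 1536 coded values.
discrepancies = 20
total_codings = 1536
agreement_pct = (1 - discrepancies / total_codings) * 100
print(round(agreement_pct, 1))  # 98.7
```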
This study utilized a weighted mean effect size approach, ensuring that studies with larger
sample sizes exerted greater influence than those with smaller sample sizes. In this way,
differences between valid and invalid responding groups for the over-reporting validity
indexes native to the MMPI-2-RF were evaluated. The standardized mean differences were
transformed into effect sizes, and the effect sizes were then converted into weighted effect
sizes. These effect sizes (based upon Hedges’ g), along with subsequent moderator analyses
using an analysis of variance (ANOVA) analog and tests of homogeneity, were calculated
pursuant to the procedures provided by Borenstein et al. (2009). This review utilized the
recommended guidelines for meta-analysis described by Moher, Liberati, Tetzlaff, Altman,
and the PRISMA Group (2009).
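The computations described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the group statistics in the usage example are hypothetical T-scores, and the variance formula is the standard large-sample approximation:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between two groups, with
    Hedges' small-sample correction applied to Cohen's d."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * d

def g_variance(g, n1, n2):
    """Approximate sampling variance of Hedges' g."""
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))

def weighted_mean_effect(effects):
    """Inverse-variance weighted mean of (g, variance) pairs, so that
    larger (lower-variance) studies exert greater influence."""
    weights = [1 / v for _, v in effects]
    return sum(w * g for (g, _), w in zip(effects, weights)) / sum(weights)

# Hypothetical validity scale T-scores: feigning vs. honest groups in two studies.
g1 = hedges_g(85, 15, 40, 60, 12, 50)
g2 = hedges_g(90, 20, 30, 62, 14, 30)
pooled = weighted_mean_effect([(g1, g_variance(g1, 40, 50)),
                               (g2, g_variance(g2, 30, 30))])
```

The pooled estimate necessarily falls between the individual study effects, pulled toward whichever study carries the larger weight.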

Literature search, selection criteria and coding


To identify studies for the meta-analyses, we searched the Social Sciences databases via ProQuest on 15 May 2015 using the keyword MMPI-2-RF malinger*. This search involved the
review of 18 separate databases and produced a total of 40 results (including peer reviewed
articles, dissertations, and books). Next, the titles and abstracts of each article were reviewed
to identify studies which met the research goal of assessing validity scale performance for
the detection of malingering. A forward and backward search (i.e. reviewing all articles cited
by or citing an identified article) for each identified article was then conducted in order to
expand the potential pool of included studies. A request was made to the first authors who
had published an identified malingering study for further unpublished manuscripts either
electronically or at the 2015 MMPI-2-RF/MMPI-2/MMPI-A annual symposium. No additional
sources were gained through this method. Finally, the University of Minnesota Press’s Test
Division (2011) website, which catalogs all publications on the MMPI-2-RF, was reviewed.
The headings of ‘Validity Scales,’ ‘Forensic,’ ‘Medical,’ and ‘Correctional’ were examined, as they
were most topically related to areas in which malingering studies might be found. From
those heading groups, a total of 25 studies (13 new) were identified for review. Screening
of potential studies concluded in January of 2016.
Together, the aforementioned methods and a review of titles and abstracts reduced the pool of potential studies to 37. Each study was then examined for inclusion as part of a comprehensive review. A list of included studies is provided in Table 1. During this comprehensive
review, 13 total studies were excluded for reporting raw scores instead of T-scores (Green,
2013; Jones & Ingram, 2011; Nelson, Sweet, & Heilbronner, 2007; Sullivan & Elliot, 2012), use
of dependent samples (Lange, Brickell, & French, 2015; Shiels, 2015), not describing groups
sufficiently (Tolin, Steenkamp, Marx, & Litz, 2010), or not reporting information necessary for
the calculation of meta-analytic effect sizes (Dionysus, Denney, & Halfaker, 2011; Gervais et
al., 2007; Sobhanian, 2014; Whitney, Davis, Shepard, & Herman, 2008; Whitney, 2013; Young
& Gross, 2011).
Table 1. Identified studies examining content based invalid responding.


Included Excluded
Anderson (2011)* Dionysus et al. (2011)
Crisanti (2014)* Gervais et al. (2007)
Goodwin et al. (2013) Jones and Ingram (2011)
Green (2013)* Nelson et al. (2007)
Harp et al. (2011) Lange et al. (2015)
Jones, Ingram, & Ben-Porath (2012) Shiels (2015)*
Kolinsky (2013)* Sobhanian (2014)*
Lehr (2014)* Sullivan and Elliot (2012)
Marion et al. (2011) Sullivan, Elliot, Lange, & Anderson (2013)
Mason et al. (2013) Tolin et al. (2010)
Nguyen, Green, & Barr (2015) Whitney (2013)
Rogers, Gillard, Berry, and Granacher (2011) Whitney et al. (2008)
Saiz and Dura (2013) Young and Gross (2011)
Schroeder et al. (2012)
Sellbom and Bagby (2011)


Sellbom et al. (2010)
Sellbom et al. (2012)
Tarescavage et al. (2013)
Wall, Wygant, & Gallagher (2015)
Wygant et al. (2009)
Wygant et al. (2010)
Wygant et al. (2011)
Young, Kearns, & Roper (2011)
Youngjohn et al. (2011)
*Denotes unpublished doctoral dissertation.

There was no inclusion criterion regarding whether a study had screened for random (VRIN-r > 80), acquiescent (TRIN-r > 80), or nonresponding (Cannot Say; CNS < 18) responses using recommended cut-scores (Ben-Porath & Tellegen, 2008/2011). Only approximately 60% of included articles performed such screening, so requiring it would have greatly diminished the available sample for meta-analysis. However, given the impact that these patterns can have
on content-based scale elevations (e.g. Burchett et al., 2015), a moderator was coded to
evaluate whether traditional interpretive guidelines for noncontent-based validity indexes
(e.g. Ben-Porath & Tellegen, 2008/2011 for those studies which utilized the MMPI-2-RF) were
included within each malingering study. Screening guidelines recommend examining VRIN-r and TRIN-r for significant elevations before considering other validity scales, as chance responding alone may produce content-based invalid responding elevations on many scales.
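The non-content-based screening described above can be expressed as a simple filter. The cut-scores follow the text; the dictionary format for respondent scores is an assumption made for illustration:

```python
def passes_noncontent_screen(scores):
    """Return True when a protocol clears the non-content-based screens
    described in the text: VRIN-r <= 80 (not random), TRIN-r <= 80
    (not acquiescent), and fewer than 18 unscorable (Cannot Say) items."""
    return (scores["VRIN-r"] <= 80
            and scores["TRIN-r"] <= 80
            and scores["CNS"] < 18)

respondents = [
    {"VRIN-r": 55, "TRIN-r": 60, "CNS": 2},    # interpretable protocol
    {"VRIN-r": 92, "TRIN-r": 58, "CNS": 0},    # random responding
    {"VRIN-r": 60, "TRIN-r": 62, "CNS": 25},   # excessive nonresponding
]
screened = [r for r in respondents if passes_noncontent_screen(r)]
```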
In order to ensure that effects were likely to generalize, when studies reported more than
one comparison group (e.g. they included a group deemed probable malingerers but could make no definitive decision about the certainty of respondents’ intentions), only the group known to be feigning its presentation was selected. In the case of studies
reporting groups of respondents who had failed multiple symptom validity tests (SVT) not
included on the MMPI, this study compared the group with no SVT failures to the group
with the greatest number of SVT failures. This approach is consistent with the comparisons
made within those studies and is presumed to capture malingering versus nonmalingering
groups (cf. Wygant et al., 2009). These decisions strengthen the goal of the study to examine
the capacity of the validity scales on the MMPI-2-RF at the detection of malingered respond-
ing instead of secondary issues (e.g. lower than required native reading level or medical
complications) that might otherwise lead to increased variability in validity scale
performance.
Moderator analysis
Moderators were selected for being previously noted as impactful on the function of validity
scales on the MMPI, making them important to consider during meta-analysis. The six moderators identified for inclusion in this study were the classification of malingerers as members
of a known group versus a laboratory simulation (Baer & Miller, 2002), participant age (Baer
& Miller, 2002), diagnosis of invalid responding group (Nelson et al., 2010; Rogers et al., 2003),
sample from which the comparison group was drawn (Baer & Miller, 2002; Nelson et al.,
2010), education (Baer & Miller, 2002), and if the study used non-content based invalid
responding scales (e.g. VRIN-r, TRIN-r, and CNS) to screen respondents prior to study inclusion.
Publication status was also examined. This inclusion of publication status as a moderator is
typical within meta-analyses in order to examine the potential of publication bias (Borenstein
et al., 2009). Some studies provided multiple types of effect sizes resulting in moderator-dependent study totals (e.g. one published study might contain both a simulation and real
effect size). For reference, Table 2 provides a complete listing of moderators, as well as their
coded frequency within each study.
While conceptual areas (e.g. age, diagnosis) were identified as areas of interest prior to
comprehensive review of the studies, specific subcategories for each moderator were left
up to the determination of each of the two reviewers. Following separate review of the
articles, coding categories were finalized by inter-rater review. Final coding categories were
established through negotiation of the two reviewers. Final coding decisions were made
pursuant to an effort to provide appropriate descriptions of groups within studies and ensure
subgroups contained sufficient numbers of effect sizes to enable stable and meaningful
comparative evaluation. Such a balance provided a means to ensure both clinical utility and
meaningful statistical measurement. While most moderators did not require it, a few cases called for collapsing moderator subcategory groups. For instance, respondent diagnosis required the
collapse of several diagnoses (attention-deficit hyperactivity disorder, depression, schizophrenia, etc.) into a single category. Within each validity scale’s moderator analyses, each effect size contributed to only one subgroup.
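The ANOVA-analog moderator test used in this study partitions total heterogeneity (QT) into within-subgroup (QW) and between-subgroup (QB) components. A sketch with fixed inverse-variance weights; the two-level moderator and its effect sizes are invented for illustration, not data from the included studies:

```python
def q_statistic(pairs):
    """Cochran's Q: weighted squared deviations of effect sizes
    about their inverse-variance weighted mean."""
    weights = [1 / v for _, v in pairs]
    mean = sum(w * g for (g, _), w in zip(pairs, weights)) / sum(weights)
    return sum(w * (g - mean) ** 2 for (g, _), w in zip(pairs, weights))

def moderator_partition(subgroups):
    """subgroups maps each moderator level to a list of (g, variance).
    Returns (Q_total, Q_within, Q_between), where Q_T = Q_W + Q_B."""
    all_pairs = [p for pairs in subgroups.values() for p in pairs]
    q_total = q_statistic(all_pairs)
    q_within = sum(q_statistic(pairs) for pairs in subgroups.values())
    return q_total, q_within, q_total - q_within

# Hypothetical two-level moderator (e.g. simulation vs. real designs).
levels = {
    "simulation": [(1.8, 0.05), (1.6, 0.08), (1.9, 0.06)],
    "real":       [(1.0, 0.04), (1.2, 0.07), (0.9, 0.05)],
}
q_t, q_w, q_b = moderator_partition(levels)
```

A large QB relative to its degrees of freedom indicates that mean effects differ across moderator levels, while a large QW within a level signals that the level's definition leaves variability unexplained.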

Results
Separate meta-analyses were conducted on each of the over-reporting validity scales of the
MMPI-2-RF. These meta-analyses examined the mean score difference observed between individuals classified as exhibiting genuine effort and those feigning, exaggerating, or otherwise intentionally misrepresenting their symptom levels. Thus, effect
sizes represent estimates of magnitude by which these groups may be differentiated. Effect
sizes are grouped according to strength with .20 to .49 indicating ‘small’ effects, .50 to .79
showing ‘medium’ effects, and greater than .80 demonstrating a ‘large’ effect (Cohen, 1992).
Using these descriptions, the composite effect sizes (e.g. not taking into account the mod-
erator analyses conducted) reveal very large effects for all of the validity scales. However,
variability within validity scales was greater than would be expected as a result of sampling
error. This variability suggests a need to consider moderating influences, consistent with the
planned use of a random effects model. A random effects approach was utilized because of
the study’s goal to make inferences about the general population from which the observed
studies are sampled (Hedges & Vevea, 1998). Descriptive characteristics for moderators are located in Table 2. Results from each validity scale meta-analysis are presented below in order of their overall effect sizes, from largest to smallest.

Table 2. Characteristics of studies included in the meta-analyses.

Fp-r F-r Fs RBS FBS-r
Characteristics k % k % k % k % k %
Publication 22 23 21 14 22
 No 5 22.7 5 21.7 3 14.2 1 7.1 4 18.2
 Yes 17 77.2 18 78.2 18 85.7 13 92.9 18 81.8
Condition 24 25 23 14 24
 Simulation 10 41.7 10 40.0 9 39.1 3 21.4 9 37.5
 Real 14 58.3 15 60.0 14 60.8 11 78.6 15 62.5
Diagnosis 27 30 28 17 28
 Combined 10 37.0 11 36.7 10 35.7 7 41.2 11 39.3
 Singular 2 7.4 2 6.7 2 7.1 0 0 2 7.1
 PTSD 4 14.8 4 13.3 4 14.3 2 11.8 4 14.3
 TBI 4 14.8 5 16.6 5 17.9 4 23.5 5 17.9
 Somatic 3 11.1 4 13.3 4 14.3 3 17.6 4 14.3
 None 4 14.8 4 13.3 3 10.7 1 5.8 3 10.7
Malingerer 25 26 24 15 25
 Litigant 2 8.0 2 7.7 2 8.3 1 6.7 2 8.0
 Disability 7 28.0 7 26.9 7 29.2 7 46.7 7 28.0
 Criminal 3 12.0 3 11.5 3 12.5 2 13.3 3 12.0
 Veteran 3 12.0 3 11.5 3 12.5 3 20.0 3 12.0
 Patient 3 12.0 4 15.4 3 12.5 1 6.7 4 16.0
 Student 7 28.0 7 26.9 6 25.0 1 6.7 6 24.0
Comparison 25 26 24 15 25
 Litigant 2 8.0 2 7.7 2 8.3 1 6.7 2 8.0
 Disability 7 28.0 7 26.9 7 29.2 7 46.7 7 28.0
 Criminal 3 12.0 3 11.5 3 12.5 2 13.3 3 12.0
 Veteran 4 16.0 4 13.8 4 16.7 4 20.0 4 16.0
 Patient 5 20.0 6 23.1 5 20.8 1 6.7 6 24.0
 Student 4 16.0 4 13.8 3 12.5 0 0 3 12.0
Non-Content 22 23 21 14 22
 No 10 45.5 10 43.5 9 42.9 5 35.7 9 40.9
 Yes 12 54.5 13 56.5 12 57.1 9 64.3 13 59.1
Age 21 23 21 14 22
 Teen 1 4.8 1 4.3 1 4.8 0 0 1 4.5
 20s 4 19.0 4 17.4 3 14.3 2 14.3 3 13.6
 30s 4 19.0 4 17.4 4 19.0 2 14.3 4 18.2
 40s 10 47.6 11 47.8 10 47.6 8 57.1 11 50.0
 50+ 2 9.5 3 13.0 3 14.3 2 14.3 3 13.6
Education 16 16 14 11 15
 No HS 6 37.5 6 37.5 6 42.9 6 54.5 6 40.0
 HS Grad 10 62.5 10 62.5 8 57.1 5 45.5 9 60.0
Note: k represents the total number of effect sizes associated with a moderator subgroup and % is the portion of the moderator’s overall sample size represented by the specific moderator subgroup. Percentages may contain rounding error.
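The random effects weighting mentioned above requires an estimate of between-study variance (tau²). One common choice is the DerSimonian–Laird estimator; the text does not specify which estimator was used, so this sketch is an assumption for illustration, with invented inputs:

```python
def dersimonian_laird(effects):
    """effects: list of (g, within-study variance) pairs.
    Returns (tau2, pooled), where tau2 is the DerSimonian-Laird estimate
    of between-study variance and pooled is the random-effects mean."""
    w = [1 / v for _, v in effects]
    fixed_mean = sum(wi * g for (g, _), wi in zip(effects, w)) / sum(w)
    q = sum(wi * (g - fixed_mean) ** 2 for (g, _), wi in zip(effects, w))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncate negative estimates at zero
    w_star = [1 / (v + tau2) for _, v in effects]  # random-effects weights
    pooled = sum(wi * g for (g, _), wi in zip(effects, w_star)) / sum(w_star)
    return tau2, pooled

tau2, pooled = dersimonian_laird([(1.9, 0.05), (1.1, 0.04), (1.5, 0.06)])
```

Adding tau² to every study's variance flattens the weights, so heterogeneous collections of studies pull the pooled estimate toward an unweighted average.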

Validity scale analyses


Fp-r
Seventeen published studies and five unpublished sources furnished a total of 29 effect sizes
(n = 3872). An I2 estimate indicated that 90.9% of observed variance was accounted for by
something other than chance (I2 = .909). Random effects meta-analysis of recorded effect
sizes resulted in a weighted mean effect size estimate for Fp-r of 1.43. The 95% confidence
interval (CI) ranged from 1.17 to 1.67, suggesting that effects observed on the Fp-r scale are significant and qualify as large. As a result of the homogeneity statistic
indicating unexplained variability between studies, moderator analyses demonstrated significant influence across multiple domains: publication status (Q = 30.75, p < .05), study
condition (Q = 36.51, p < .05), participant diagnosis (Q = 26.90, p < .05), type of malingerer
group (Q = 26.40, p < .05), noncontent validity screening (Q = 37.66, p < .05), type of com-
parison group (Q = 30.38, p < .05), and education level (Q = 22.08, p < .05). The only moder-
ator that failed to display heterogeneity was age (QW = 15.85, ns); the age moderator does
not differ in effect size between the identified age group classifications. Subsequent analysis
of each moderator’s subgroups indicated that, after Bonferroni correction, all moderator models offered adequate explanations of variability, with three exceptions. The malingerer group and comparison group showed heterogeneity, with the student subgroup
demonstrating problems (Q = 20.10, p < .01; Q = 16.56, p < .01). Likewise, the noncontent
validity screening moderator subgroup displayed heterogeneity (Q = 22.29, p < .05). This
difficulty with heterogeneity means that while there is a pattern emerging in the moderated
impact of both malingerer group and comparison group on the effectiveness of malingering
detection, the definitions used by this study to define the student group are not compre-
hensive enough to sufficiently explain the observed variance. More detailed information
about the Fp-r meta-analysis is located in Table 3.
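The I² values reported in these analyses can be recovered directly from the Q statistic and its degrees of freedom, I² = (Q − df)/Q; a quick check reproduces the Fp-r figure from the numbers above:

```python
def i_squared(q_total, k):
    """Higgins' I^2: proportion of observed variance in effect sizes
    attributable to heterogeneity rather than sampling error."""
    df = k - 1
    return max(0.0, (q_total - df) / q_total)

# Fp-r: Q_T = 307.91 across k = 29 effect sizes (Table 3).
print(round(i_squared(307.91, 29), 3))  # 0.909
```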

F-r
A total of 18 published and five unpublished studies provided a total of 32 effect sizes
(n = 4060). The weighted mean effect size estimate calculated from this sample was 1.32.
With a 95% confidence interval ranging from 1.13 to 1.51, the effects observed on the F-r scale are significant and qualify as large, and most of the variability in F-r is accounted for by something other than chance, I2 = .839. The moderators of publication
status (Q = 36.57, p < .05), study condition (Q = 38.84, p < .05), participant diagnosis
(Q = 32.93, p < .05), type of malingerer group (Q = 29.88, p < .05), type of comparison group
(Q = 30.81, p < .05), noncontent validity screening (Q = 40.76, p < .05), and age (Q = 25.21,
p < .05) were heterogeneous. Conversely, education was homogeneous, suggesting that this moderator does not influence F-r’s effectiveness. The noncontent validity screening moderator subgroup displayed heterogeneity
(Q = 26.95, p < .05), suggesting between study variability in a manner that makes model
interpretation inappropriate. Additional details on moderator analyses are provided in
Table 4.

Fs
A total of 18 published and three unpublished sources resulted in 30 effect sizes (n = 3880)
with a weighted mean effect size estimate of 1.24. With a 95% confidence interval ranging
from 1.06 to 1.42, Fs’s effect size suggests the scale significantly distinguishes honest from dishonest responding groups, with effects that are large in size. Variability in Fs is largely systematic rather than due to chance, I2 = .810. As
before, seven moderators were explored with publication status (Q = 33.16, p < .05), study
condition (Q = 36.13, p < .05), participant diagnosis (Q = 26.98, p < .05), type of malingerer
group (Q = 28.54, p < .05), type of comparison group (Q = 31.41, p < .05), noncontent validity
screening (Q = 36.20, p < .05), and age (Q = 23.33, p < .05) showing heterogeneity and each
offering significant between-class effects. With the exception of malingerer group, all moderator models were sufficient in explaining observed variance. The patient subgroup of the malingerer group moderator was not adequately defined within this study (Q = 11.79, p < .05). Education did not offer significant within-class effects (Q = 14.02, ns), suggesting that this variable, as defined within this study, does not impact Fs functioning. Table 5 reports moderator-specific information, including effect sizes, confidence intervals, and information about homogeneity.

Table 3. Infrequent pathology (Fp-r) effect size estimates and moderator analyses.
Sample/class m n g 95% CI QW QB QT
Total sample 29 3872 1.43 1.17–1.69 307.91***
Publicationa 30.75* 277.16***
 No 5 395 1.91 .83–3.00 5.57
 Yes 24 3477 1.35 1.08–1.61 25.19
Conditiona 36.51* 271.39***
 Simulation 14 1381 1.85 1.47–2.23 17.19
 Real 15 2491 1.04 .76–1.32 19.32
Diagnosisa 26.90* 281.01***
 Combined 11 2150 1.38 .97–1.80 9.40
 Singular 3 266 1.61 .79–2.43 8.07
 PTSD 4 338 1.48 .76–2.19 .77
 TBI 4 456 .67 −.03–1.36 .27
 Somatic 3 394 1.21 .42–2.00 .73
 None 4 268 2.39 1.65–3.12 7.66
Malingerera 26.40* 281.51***
 Litigant 2 130 .50 −.49–1.49 .02
 Disability 7 1767 1.06 .55–1.57 2.70
 Criminal 4 430 1.78 1.08–2.47 .96
 Veteran 3 440 1.40 .60–2.19 .77
 Patient 3 182 .93 .09–1.77 1.85
 Student 10 923 1.92 1.46–2.37 20.10b
Comparisona 30.38* 277.53***
 Litigant 2 130 .50 −.49–1.49 .02
 Disability 7 1767 1.06 .55–1.57 2.70
 Criminal 4 430 1.78 1.08–2.47 .96
 Veteran 4 497 1.51 .81–2.20 1.08
 Patient 7 810 1.58 1.04–2.11 9.06
 Student 5 238 1.84 1.17–2.51 16.56b
Non-Contenta 37.66* 270.25***
 No 14 1596 1.55 1.19–1.92 15.37
 Yes 15 2276 1.31 .94–1.68 22.29b
Age 22 2958 15.85 200.31*** 216.16***
 Teen 2 88 .58 −.40–1.57 1.59
 20s 4 341 1.83 1.13–2.53 4.70
 30s 4 519 1.71 1.04–2.38 3.75
 40s 10 1850 .95 .53–1.37 5.80
 50+ 2 160 1.66 .70–2.62 .01
Education 19 2614 22.08* 132.13*** 154.21***
 No HS 6 1568 1.36 .86–1.86 5.52
 HS Grad 13 1046 1.08 .71–1.44 16.56
Notes: m = number of effect sizes for each moderator, n = total number of participants, and g = effect size calculated using Hedges’ g (.20–.49 is small, .50–.79 is medium, and >.80 is large). Q within, between, and total scores are represented by QW, QB, and QT. The symbol a denotes that a moderator uses the total number of effect sizes, number of participants, and QT for analysis, while b signifies that a moderator subgroup is significant after a Bonferroni correction. *p < .05. **p < .01. ***p < .001.

RBS
Thirteen published and one unpublished source furnished 18 effect sizes (n = 2867) with a
weighted mean effect size estimate of 1.18 and 95% confidence intervals ranging from .96
to 1.41. As a significant predictor of feigning, the weighted mean effect size observed for

Table 4. Infrequent responding (F-r) effect size estimates and moderator analyses.
Sample/class m n g 95% CI QW QB QT
Total sample 32 4060 1.32 1.13–1.51 191.94***
Publicationa 36.57* 155.36***
 No 5 395 1.96 1.03–2.89 3.73
 Yes 27 3665 1.23 1.05–1.41 32.84
Conditiona 38.84* 153.10***
 Simulation 14 1381 1.55 1.25–1.85 17.77
 Real 18 2679 1.15 .90–1.40 21.07
Diagnosisa 32.93* 159.00***
 Combined 12 2216 1.37 1.06–1.67 10.14
 Singular 3 266 1.20 .57–1.82 3.58
 PTSD 4 338 1.46 .90–2.02 4.86
 TBI 5 530 .82 .34–1.29 .98
 Somatic 4 442 1.16 .63–1.69 7.89
 None 4 268 2.01 1.44–2.58 5.48
 Malingerera 29.88* 162.06***
 Litigant 2 130 .86 .09–1.63 .25
 Disability 7 1767 1.41 1.03–1.80 3.77
 Criminal 4 430 1.63 1.10–2.17 2.58
 Veteran 3 440 1.40 .80–2.01 1.68
 Patient 6 370 .63 .17–1.09 4.24
 Student 10 923 1.58 1.23–1.93 17.36
Comparisona 30.81* 161.13***
 Litigant 2 130 .86 .09–1.63 .25
 Disability 7 1767 1.41 1.03–1.80 3.77
 Criminal 4 430 1.63 1.10–2.17 2.58
 Veteran 4 497 1.50 .96–2.03 2.09
 Patient 10 998 .91 .57–1.25 10.57
 Student 5 238 1.87 1.33–2.40 11.56
Non-contenta 151.18***
 No 14 1596 1.52 1.21–1.83 13.81
 Yes 18 2464 1.17 .93–1.41 26.95b
Age 25 3146 25.21* 126.35*** 151.56***
 Teen 2 88 .87 .05–1.69 .54
 20s 4 341 1.85 1.27–2.43 4.34
 30s 4 519 1.47 .93–2.02 3.88
 40s 12 1990 1.17 .86–1.49 9.79
 50+ 3 208 1.05 .40–1.70 6.66
Education 19 2614 18.73 99.19*** 117.92***
 No HS 6 1568 1.65 1.22–2.08 4.01
 HS Grad 13 1046 1.27 .94–1.60 14.72
Notes: m = number of effect sizes for each moderator, n = total number of participants, and g = effect size calculated using Hedges' g (.20–.49 is small, .50–.79 is medium, and >.80 is large). Q within, between, and total scores are represented by QW, QB, and QT. The symbol a denotes that a moderator uses the total number of effect sizes, number of participants, and QT for analysis, while b signifies that a moderator subgroup is significant after a Bonferroni correction. *p < .05, **p < .01, ***p < .001.

RBS was large, with most of the observed variability attributable to between-study heterogeneity rather than chance, I2 = .825. RBS demonstrated heterogeneity, and moderator analyses revealed significant effects only for diagnosis (Q = 16.69, p < .05). No other moderators significantly affected the functioning of the RBS validity scale. Analyses of moderator subgroups are presented in Table 6.
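The weighted mean estimates and I2 values reported for each scale can be illustrated with a minimal DerSimonian-Laird random-effects sketch. The per-study effects and variances below are invented for illustration and are not the coded effect sizes from this meta-analysis:

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled effect, tau^2, and I^2 from
    per-study effect sizes and their sampling variances."""
    w = [1 / v for v in variances]               # fixed-effect weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of non-chance variance
    w_star = [1 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, i2

# Hypothetical Hedges' g values and variances from three studies
pooled, tau2, i2 = random_effects_pool([1.0, 1.2, 1.4], [0.04, 0.04, 0.04])
print(round(pooled, 2), round(tau2, 3), round(i2, 3))
```

With homogeneous inputs like these, tau^2 and I^2 fall to zero and the pooled estimate reduces to the fixed-effect mean; large I2 values such as those reported here indicate that most observed variability reflects true between-study differences.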

FBS-r
Eighteen published and four unpublished sources furnished 31 effect sizes (n = 3990). The
weighted mean effect size estimate for FBS-r was 1.04 and a majority of the variance in the
scale is predicted by something other than chance, I2 = .811. Along with a 95% confidence
intervals ranging from .87 to 1.22, FBS-r demonstrates a large effect size. Examining all seven

Table 5. Infrequent somatic (Fs) effect size estimates and moderator analyses.
Sample/class m n g 95% CI QW QB QT
Total sample 30 3880 1.24 1.06–1.42 152.90***
Publicationa 33.16* 119.74***
 No 3 215 1.80 1.19–2.41 3.21
 Yes 27 3665 1.19 1.00–1.38 29.95
Conditiona 36.13* 116.77***
 Simulation 13 1311 1.49 1.29–1.69 15.34
 Real 17 2569 1.04 .80–1.28 20.79
Diagnosisa 26.98* 125.92***
 Combined 11 2106 1.29 1.00–1.58 8.31
 Singular 3 266 1.13 .56–1.70 4.03
 PTSD 4 338 1.43 .92–1.93 1.35
 TBI 5 530 .77 .33–1.21 2.57
 Somatic 4 442 1.20 .72–1.69 8.21
 None 3 198 1.89 1.28–2.50 2.51
 Malingerera 28.54* 124.36***
 Litigant 2 130 .88 .17–1.59 .30
 Disability 7 1767 1.27 .92–1.62 3.44
 Criminal 4 430 1.40 .91–1.88 .29
 Veteran 3 440 1.41 .85–1.96 1.71
 Patient 5 260 .69 .21–1.17 11.79b
 Student 9 853 1.44 1.10–1.78 11.01
Comparisona 31.41* 121.50***
 Litigant 2 130 .88 .17–1.59 .30
 Disability 7 1767 1.27 .92–1.62 3.44
 Criminal 4 430 1.40 .91–1.88 .29
 Veteran 4 497 1.51 1.02–2.00 2.31
 Patient 9 888 1.01 .68–1.34 16.80
 Student 4 168 1.50 .95–2.05 8.28
Non-contenta 116.71***
 No 13 1526 1.44 1.20–1.67 14.42
 Yes 17 2354 1.09 .85–1.33 21.78
Age 23 2966 23.33* 99.18*** 122.51***
 Teen 2 88 .96 .19–1.74 2.14
 20s 3 271 1.62 1.00–2.23 .35
 30s 4 519 1.21 .71–1.71 .45
 40s 11 1880 1.11 .80–1.41 13.17
 50s 3 208 1.08 .48–1.68 7.23
Education 17 2434 14.02 62.38*** 76.39***
 No HS 6 1568 1.25 .86–1.65 2.95
 HS Grad 11 866 1.26 1.00–1.53 11.06
Notes: m = number of effect sizes for each moderator, n = total number of participants, and g = effect size calculated using Hedges' g (.20–.49 is small, .50–.79 is medium, and >.80 is large). Q within, between, and total scores are represented by QW, QB, and QT. The symbol a denotes that a moderator uses the total number of effect sizes, number of participants, and QT for analysis, while b signifies that a moderator subgroup is significant after a Bonferroni correction. *p < .05, **p < .01, ***p < .001.

moderator variables identified for this study, FBS-r demonstrated heterogeneity for publication status (Q = 34.09, p < .05), study condition (Q = 33.17, p < .05), respondent diagnosis (Q = 30.23, p < .05), non-content validity screening (Q = 33.07, p < .05), and age (Q = 23.29, p < .05). However, the somatic subgroup of the diagnosis moderator demonstrated heterogeneity, indicating significant within-class differences, which makes interpretation of the diagnosis moderator inappropriate. The same heterogeneity occurred for the age moderator. Other moderators (i.e. malingerer group, comparison group, and education) were observed to have no effect. Information about the FBS-r moderator analyses is contained in Table 7. As a result, only publication status and study condition are significantly impactful moderators that can be interpreted.

Table 6. Response Bias Scale (RBS) effect size estimates and moderator analyses.
Sample/class m n g 95% CI QW QB QT
Total sample 18 2867 1.18 .96–1.41 97.07***
Publicationa 17.53 79.54***
 No 1 135 1.66 .74–2.58 .0
 Yes 17 2732 1.15 .93–1.31 17.53
Conditiona 17.50 79.58***
 Simulation 4 431 1.41 .94–1.89 1.18
 Real 14 2436 1.12 .87–1.37 16.32
Diagnosisa 16.69* 80.38***
 Combined 8 1776 1.19 .86–1.51 6.62
 Singular 0 0 – – –
 PTSD 2 141 .98 .29–1.67 .29
 TBI 4 485 1.11 .62–1.49 4.76
 Somatic 3 366 1.20 .65–1.74 5.02
 None 1 99 1.79 .83–2.75 .0
 Malingerera 9.14 87.93***
 Litigant 1 48 1.57 .52–2.63 .0
 Disability 7 1731 1.38 1.03–1.72 4.03
 Criminal 3 305 1.46 .91–2.01 .90
 Veteran 3 538 .96 .44–1.49 4.21
 Patient 3 188 .48 −.10–1.06 .41
 Student 1 57 1.18 .18–2.18 .0
Comparisona 9.28 87.80***
 Litigant 1 48 1.57 .52–2.63 .0
 Disability 7 1731 1.38 1.03–1.72 4.03
 Criminal 3 305 1.46 .91–2.01 .90
 Veteran 4 595 1.01 .55–1.48 4.44
 Patient 3 188 .48 −.10–1.06 .41
 Student 0 0 – – –
Non-contenta 79.05***
 No 5 835 1.37 .64–2.09 1.33
 Yes 13 2032 1.11 .87–1.34 16.70
Age 15 2369 11.87 85.20*** 81.36***
 Teen 0 0 – – –
 20s 2 240 1.37 .69–2.04 .24
 30s 2 197 1.06 .37–1.75 1.48
 40s 9 1827 1.18 .87–1.49 9.50
 50s 2 132 .53 −.17–1.23 .65
Education 11 2124 4.07 53.12*** 57.19***
 No HS 6 1526 1.39 .91–1.86 2.48
 HS Grad 5 598 1.22 .57–1.86 1.59
Notes: m = number of effect sizes for each moderator, n = total number of participants, and g = effect size calculated using Hedges' g (.20–.49 is small, .50–.79 is medium, and >.80 is large). Q within, between, and total scores are represented by QW, QB, and QT. The symbol a denotes that a moderator uses the total number of effect sizes, number of participants, and QT for analysis. No moderator subgroups were significant after a Bonferroni correction. *p < .05, **p < .01, ***p < .001.

Discussion
This study is the first published meta-analysis of the validity scales of the MMPI-2-RF in their detection of malingered over-reporting. The major goal of this study was to synthesize research on the effectiveness of the MMPI-2-RF validity scales at the detection of malingering while also evaluating contextual influences that might account for variability in those effect sizes. Using a random-effects model to examine moderating factors, this study offers four broad implications about the MMPI-2-RF's efficacy with respect to its over-reporting validity indexes: (a) they are valid and useful tools in the assessment of feigned over-reporting; (b) there is some inconsistency in the effectiveness of the different over-reporting scales, as some function more effectively than others and are influenced less by moderating factors; (c) researchers and clinicians need to consider the influence of the

Table 7. Symptom validity (FBS-r) effect size estimates and moderator analyses.
Sample/class m n g 95% CI QW QB QT
Total sample 31 3990 1.04 .87–1.22 158.85***
Publicationa 34.09* 124.76***
 No 4 325 1.31 .81–1.82 3.56
 Yes 27 3665 1.01 .82–1.19 30.53
Conditiona 33.17* 125.67***
 Simulation 13 1311 1.12 .76–1.48 12.44
 Real 18 2679 1.00 .83–1.18 20.73
Diagnosisa 30.23* 128.62***
 Combined 12 2216 1.07 .80–1.34 4.83
 Singular 3 266 .58 .02–1.14 2.58
 PTSD 4 338 .96 .46–1.46 4.84
 TBI 5 530 .93 .50–1.37 3.16
 Somatic 4 442 1.32 .84–1.80 13.25b
 None 3 198 1.31 .73–1.89 1.57
 Malingerera 26.26 132.59***
 Litigant 2 130 1.12 .41–1.82 .63
 Disability 7 1767 1.23 .89–1.58 3.40
 Criminal 4 430 1.29 .82–1.77 2.06
 Veteran 3 440 1.41 .86–1.96 4.78
 Patient 6 370 .62 .20–1.04 4.79
 Student 9 853 .86 .53–1.19 10.59
Comparisona 24.36 134.48***
 Litigant 2 130 1.12 .41–1.82 .63
 Disability 7 1767 1.23 .89–1.58 3.40
 Criminal 4 430 1.29 .82–1.77 2.06
 Veteran 4 497 1.32 .83–1.80 5.31
 Patient 10 998 .62 .31–.93 7.34
 Student 4 168 1.17 .63–1.71 5.62
Non-contenta 125.77***
 No 13 1526 1.17 .82–1.52 11.70
 Yes 18 2464 .97 .78–1.15 21.38
 Age 24 3076 23.29* 86.19*** 109.49***
 Teen 2 88 .60 −.10–1.31 .25
 20s 3 271 1.64 1.07–2.21 2.70
 30s 4 519 .98 .52–1.44 4.24
 40s 12 1990 1.05 .78–1.31 3.98
 50+ 3 208 .94 .38–1.50 12.11b
Education 18 2544 16.48 62.32*** 78.79***
 No HS 6 1568 1.14 .89–1.39 4.44
 HS Grad 12 976 1.19 .86–1.53 12.04
Notes: m = number of effect sizes for each moderator, n = total number of participants, and g = effect size calculated using Hedges' g (.20–.49 is small, .50–.79 is medium, and >.80 is large). Q within, between, and total scores are represented by QW, QB, and QT. The symbol a denotes that a moderator uses the total number of effect sizes, number of participants, and QT for analysis, while b signifies that a moderator subgroup is significant after a Bonferroni correction. *p < .05, **p < .01, ***p < .001.

situation and respondent characteristics on their interpretation for most of the validity scales;
and (d) numerous groups and individual characteristics remain understudied, and additional
efforts to thoroughly evaluate the MMPI-2-RF at the detection of content-based invalid
responding are needed. Historically noted differences between simulation and real study
conditions, with simulation effect sizes being substantially larger than actual evaluation
differences, were also observed in this study. This trend is likely a product of motivation
towards substantial secondary gains (Nelson et al., 2010).
Clinicians using the MMPI-2-RF should be familiar with our results and incorporate them into interpretive practice. To do so effectively, clinicians should be cognizant that the moderators analyzed within this study were evaluated in isolation from one another, offering estimates of effect relative only to a given moderator's influence on the performance of a given validity scale.

We did not attempt to establish guidelines for interactions between variables (e.g. a respondent who is compared to a norm group with strong and consistent discriminative capacity but who also belongs to a diagnostic category that produces lower estimates of discrimination). This means that although effects from two moderators may differ (and perhaps even interact), research has yet to explore these interactions. Instead, this paper has focused on the contribution of such moderators in the belief that the findings here will spur further research in this area. Until such a point, clinicians are advised to use their judgment as they interpret validity profiles, with the points below offering some guidance on how best to do so.
Even after accounting for differences in comparison groups and study design, the over-re-
porting scales show a consensus of very large effect sizes. These effect sizes represent sub-
stantial portions (>80%) of patterned variance in response style not attributable to chance.
Thus, the validity scales of the MMPI-2-RF are useful in their efforts to discern between groups
with true and feigned responding patterns. However, some scales appear more useful to
consider during specific types of evaluations as a function of respondent, evaluation setting,
or comparison group characteristics. Moreover, that these effect sizes also have generally small confidence intervals suggests not only a high degree of power but also consistency. Taken together, this consistency suggests that the intent of the MMPI-2-RF to provide a means of screening dishonest responding through its revised and added validity scales has been largely successful.
From these effect sizes, an interesting pattern emerges. This study demonstrates that malingered responding, without regard to moderating factors, will produce a profile of over-reporting validity scales that is elevated at least one standard deviation beyond that of the comparison group to which most studies compared malingerers (e.g. student to student, litigant to litigant, etc.). This stable capacity to detect differences of over one standard deviation (on average) is likely to generalize across contexts because of the a priori research design decision to select for analysis only those groups known to be malingering when multiple comparison groups were provided within a study (e.g. we excluded probable malingerers in favor of known malingerers, as seen in Marion et al., 2011). Doing so provides greater accuracy in ensuring groups are appropriately classified and is consistent with previous meta-analyses conducted on the MMPI (Nelson et al., 2010).
However, the mean difference observed between malingered and honest responding
groups is also smaller than desirable. The differences between honest and malingering
groups suggest that lower T-scores on the validity scales should cause clinicians to invest increased effort in evaluating malingering as a possibility. For instance, F-r is described as indicating possible over-reporting at two standard deviations above normative levels, with scores below that threshold offering no evidence of over-reporting (Ben-Porath, 2012b).
Nevertheless, across both simulation and real groups with an applicable comparison group,
the observed mean differences between honest and malingering groups do not appear this
exaggerated. This trend of narrower than expected differences between groups is evident
for each of the over-reporting scales of the MMPI-2-RF in comparative samples not drawn
from the normative sample. It may be due to context-specific comparison groups having
generally inflated T-scores. This means that a non-malingerer with an already elevated T-score
runs a greater risk of being identified as a malingerer than someone drawn from the nor-
mative sample.
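A hedged sketch of combining the traditional normative cut with a context-specific reference group check, per the pattern described above, might look as follows. The rule, threshold, and reference-group statistics are illustrative assumptions loosely based on this paper's one-standard-deviation finding, not a validated decision algorithm:

```python
def flag_possible_overreporting(t_score, normative_cut, ref_mean, ref_sd):
    """Flag a validity-scale T-score as possible over-reporting only if it
    exceeds the traditional normative cut AND sits at least one standard
    deviation above a context-specific reference group's mean."""
    exceeds_cut = t_score >= normative_cut
    exceeds_reference = t_score >= ref_mean + ref_sd
    return exceeds_cut and exceeds_reference

# Hypothetical disability-claimant reference group with elevated baseline scores
print(flag_possible_overreporting(92, 80, 68, 12))  # clears both thresholds
print(flag_possible_overreporting(82, 80, 75, 12))  # clears the cut but not the reference check
```

The second call illustrates the false-positive concern in the text: a respondent just over the normative cut but within one standard deviation of an already elevated comparison group would not be flagged under this combined rule.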
With patterns of at least one standard deviation (up to just under 1.5) difference between
honest and malingered respondents, clinicians can use context specific reference groups to

help ensure accurate decision-making. For instance, an individual exceeding traditional thresholds for non-content scale invalidity (Ben-Porath, 2012b; Ben-Porath & Tellegen, 2008/2011) who also exceeds one standard deviation compared to a context-specific reference group is more likely to be accurately identified as a malingerer. However, this narrower than expected difference between malingerers and honest respondents within applicable reference groups poses some difficulty for the MMPI-2-RF. Without incorporating context-specific normative groups in addition to traditional cut-score use, interpretation of validity indexes in some clinical populations may lead to higher false positive rates, labeling as malingering what is actually honest responding. For instance, honest respondents scored only about one standard deviation below malingerers on RBS, whereas normative data would suggest a three standard deviation difference (e.g. a T-score cut value of 80; Ben-Porath & Tellegen, 2008/2011).
This risk of a higher false-positive rate would be more pronounced in nonsimulation malingering evaluations. However, the ability to differentiate an individual's response veracity even across a narrow mean score difference somewhat offsets this concern. Further, score differences between honest and malingering respondents appear regularly, which supports MMPI-2-RF validity scale use; however, providing an additional point of reference (e.g. comparison group norms specific to the evaluation context) might further bolster true-positive and true-negative identification rates for the scales.
Interpretation of differences exceeding the one standard deviation recommendation is
tentatively based on the overall effect sizes observed in this meta-analysis and is recom-
mended in combination with traditional approaches. This one standard deviation difference
will likely differ between comparison groups and may not meet needed clinical efficacy on
its own given a lack of evaluation for specificity and sensitivity. The narrow mean difference
between honest and malingering groups and a lack of specificity analyses to differentiate
many of these groups could be problematic in some evaluations. Clinicians may wish to
include external SVTs to support their interpretations when the evaluation is being con-
ducted on a comparison group with validity scale elevations above those seen in the nor-
mative sample. Use of higher cut scores may also help ensure feigning is effectively detected
(e.g. T-score cut value of 100 instead of 80 for RBS; Tarescavage, Wygant, Gervais, & Ben-
Porath, 2013). Further research on cut score efficacy in various populations would aid inter-
pretation of the MMPI-2-RF validity scales, as optimized cut-scores among context-specific
comparison groups are needed. Accordingly, it would be useful if malingering studies would
include sensitivity analyses to aid in the evaluation of discriminative capacity amongst
groups of suspected malingerers. Although some of the malingering studies have done so
(e.g. Sellbom et al., 2010; Sellbom & Bagby, 2011), most have not.
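The sensitivity analyses called for above can be sketched simply: given validity-scale scores for known malingering and honest groups, classification rates at a candidate cut score follow directly. The scores below are fabricated for illustration only:

```python
def sens_spec_at_cut(malingering_scores, honest_scores, cut):
    """Sensitivity: proportion of known malingerers at or above the cut.
    Specificity: proportion of honest respondents below the cut."""
    sens = sum(s >= cut for s in malingering_scores) / len(malingering_scores)
    spec = sum(s < cut for s in honest_scores) / len(honest_scores)
    return sens, spec

# Hypothetical RBS T-scores for known groups, evaluated at a cut of 80
sens, spec = sens_spec_at_cut([95, 88, 79, 102], [60, 72, 81, 55], 80)
print(sens, spec)
```

Reporting such rates across a range of candidate cuts, per population, is what the optimized cut-score research suggested above would require.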
The FBS-r and RBS scales appear the most robust scales in that they are the least influenced
by some important contextual factors associated with an evaluation (e.g. diagnosis being
evaluated, respondent referral group). Although the effect size for FBS-r was smaller than
other validity scales traditionally associated with greater effectiveness in the detection of
over-reporting (e.g. F-r; Rogers et al., 2003), this smaller effect size is offset by its stability in
discrimination. This stability is similarly true of RBS, as was predicted by Nelson et al. (2010)
who suggested that it would likely be a highly valuable tool during neuropsychological
evaluations. It is important to note, however, that despite being moderated by a number of
factors, the other validity scales are all well suited for criminal evaluations, with those groups consistently producing larger and more discernible differences as malingering respondents generate even more elevated validity profiles. This supports the contention that the

MMPI-2-RF is a grounded instrument for forensic use (Ben-Porath, 2012a). The effectiveness
of validity scales, other than FBS-r and RBS, within neuropsychological evaluations is not as
consistent. Below, some important interpretive findings are summarized within the context
of each separate over-reporting scale. Scales are arranged according to the size of the mean
effect size, with largest overall effect first.

Fp-r
The Fp-r scale has the strongest overall capacity to detect dishonest responding as evidenced
by a large mean effect size and higher range of values for its confidence interval. Fp-r offers
the largest estimates of differences between malingering and honest respondents; however,
there are numerous moderating factors that play a role in Fp-r's evaluative capacity.
Beyond respondent diagnosis and referral group, the comparison group also indicated a
substantially moderating influence; however, due to the heterogeneous nature of the stu-
dent subgroup (i.e. students did not all perform in a consistent manner), the effectiveness
of the comparison group moderator cannot be determined for Fp-r, and the cause of the
discrepancy within that subgroup cannot be determined. This was similarly true of the non-content validity screening moderator: because only some studies utilized exclusion criteria for excessive random, fixed, or acquiescent responding, the influence of such screenings is difficult to interpret. These moderating factors make it difficult to determine a single cut-score that will be effective at differentiating between honest and malingered responses when evaluation-specific information is taken into account.

F-r
Possessing a consistently large effect size in the assessment of over-reporting, F-r's effectiveness at the detection of malingering is bolstered by a reasonably small margin of error but is also limited by numerous moderating factors (i.e. diagnosis of respondent, respondent population, comparison group population, etc.). As with Fp-r, these moderating factors make it difficult to determine how and when to interpret moderate F-r scale elevations as indicative of invalid responding, with the caveat that certain groups (e.g. criminals or those without any form of true diagnosis) are likely to be easily detectable and still appropriate for use. The non-content validity screening moderator was also not interpretable for this scale, suggesting that studies utilizing exclusion criteria for excessive random, fixed, or acquiescent responding contain substantial within-class variability.

Fs
With a large effect size, Fs produces a pattern similar to that seen in the other validity scales. It is effectively able to discriminate between honest and malingered responding but does so with a degree of moderating influence. It offers a more context-specific tool that can aid in the
evaluation of faked somatic issues. Relatedly, although Fs was designed to assess somatic
complaints and is a well-reasoned scale for use in medically related evaluations, the respond-
ent malingering group of patient was heterogeneous between studies. This heterogeneity
means that the coding definition used within this study did not consistently explain observed
patterns between studies and that Fs is likely to vary between various types of evaluations.

Thus, as with Fp-r and F-r, Fs has difficulties with consistent performance in a variety of
settings, with some influence not yet fully understood. Screening for excessive non-content
validity indicators did influence Fs interpretation, making it likely that not conducting such
screenings will lead to inflated T-scores on Fs.

RBS
In addition to producing a large effect size and effectively differentiating malingering groups,
RBS was moderated only by diagnosis. These diagnostic moderators may increase the capac-
ity of RBS to discriminate honest responding and malingered response patterns with even
larger effects within issues commonly evaluated within neuropsychological contexts (i.e.
TBI). As with FBS-r, the impact of different respondent diagnoses remains an area of consideration for RBS as it continues to develop a base of research. Although it is emerging as a strong index, many specific areas need further study to more narrowly define some of the wide confidence intervals observed. Unlike Fs, screening for excessive non-content validity indicators did not influence RBS performance.

FBS-r
An effective measure of malingering, FBS-r offers a large effect with minimal moderating
influence compared to other validity scales. Moderation is only noted for respondent diag-
nosis, content screening, and age; however, neither diagnosis nor age is interpretable due
to issues of heterogeneity within subgroups. This heterogeneity means that although diag-
nosis and age are likely to impact the utility of FBS-r, current operationalized definitions do
not provide effective estimates of those impacts. Content screening influences the interpre-
tation of FBS-r, making it likely that not conducting such screenings will lead to inflated
T-scores. Thus, while FBS-r offers a stable and strong capacity to discriminate malingerers from
honest respondents, more work is needed on the scale to define the impact of individual
respondent clinical characteristics. Otherwise, analysis of moderators for FBS-r suggests that
it is a highly portable validity scale that is applicable across a variety of contexts with a
minimal amount of moderating influence.

Study critique and potential limitations


There are two main limitations to this study. The first shortcoming of this meta-analysis is
that the accuracy of malingering diagnosis was not assessed. Thus, while we are able to
conclude that the different validity scales of the MMPI-2-RF are able to differentiate reported
groups, it is possible that studies had varying degrees of accuracy with group membership
assignment, and this could lead to some additional variability between studies. The second
limitation is the number of studies viable for inclusion. As a result of a low number of effect
sizes, some moderators had to be collapsed. This collapsing of effect sizes into a single
moderator impacts both how moderators were able to be defined as well as the means
through which meta-analytic analyses could be conducted. For instance, some diagnoses
(ADHD, Schizophrenia, depression, etc.) are understudied and had to be condensed into a
single moderator. Congruent with this low number of effect sizes, it should be noted that
moderators for RBS were not generally calculated with an individual estimate of τ2. Instead,

those moderators used a pooled τ2 because of low numbers of effect sizes within each
subgroup. Finally, the contribution to effect size is influenced both by overall scale effective-
ness and the number of studies in meta-analysis working in concert with one another. It is
important to remember that the effect sizes observed will likely evolve some as additional
studies emerge. However, these limitations were not thought to have invalidated this study.
The lack of evaluation for diagnostic accuracy is due, in part, to the grouping approach
used by the studies themselves. While it would have been possible to code the strength of
studies in assigning group membership, another approach of calculating effects only for
groups with the greatest likelihood of malingered presentation was used. This approach has
been undertaken within meta-analytic evaluations on the MMPI before (Nelson et al., 2010).
Simulation studies represent known (or proximally known, unless issues such as seizures arise during test administration and go unreported) rates of actual group membership, while
studies using samples that identify groups via SVT failure as a criterion imply accuracy based
on external criteria. These SVT-based group designs are prone to greater difficulties with
determining absolute conviction of invalidity. However, the approach of removing ‘probable
malingering’ groups from this study helps ensure that this study compared groups with
memberships that are accurate reflections of individual respondent effort. By not considering
cases which left researchers unsure of actual effort, this approach compares only groups in
which researchers had utmost confidence in their group membership assignment and
attempts to assuage the difficulties associated with unknown diagnostic accuracy.
The fact that differences were observed within the combined sample of diagnosis sug-
gests that there will likely be support for use of the validity scales to discern malingering
efforts, even within less studied diagnoses. Likewise, the use of a pooled τ2 for some mod-
erator subgroup estimates may be somewhat problematic, as it does not reflect as accurate
a measurement as an independent estimate might; however, pooled estimates used in moderators for other validity scales still showed effectiveness, suggesting that use of the pooled estimate is not likely to have biased the results. In general, these limitations speak to shortcomings of research on malingering assessment using the MMPI-2-RF in a number of specific ways. There is a general paucity of research, and there is also a need for future studies to
target specific areas so far limited in the literature. Thus, although the number of studies meeting inclusion criteria was small and somewhat limited the capacity to conduct analyses as specific as might be desired in some areas, the results seen here are likely to be interpretable as reflective of general patterns of the over-reporting scales of the MMPI-2-RF. This conclusion is bolstered by the consistency of the number of effect sizes used in this study with those of other meta-analyses on earlier versions of the MMPI (e.g. Nelson et al., 2010).
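The pooled τ2 discussed here refers to estimating a single between-study variance from the within-subgroup Q statistics rather than a separate τ2 per subgroup. A minimal sketch of that pooling, assuming the DerSimonian-Laird framework (subgroup data invented for illustration):

```python
def pooled_tau2(subgroups):
    """Pool tau^2 across moderator subgroups by summing each subgroup's
    Q statistic, degrees of freedom, and scaling constant C.
    Each subgroup is a (effects, variances) pair."""
    total_q = total_df = total_c = 0.0
    for effects, variances in subgroups:
        w = [1 / v for v in variances]
        mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
        total_q += sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
        total_df += len(effects) - 1
        total_c += sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (total_q - total_df) / total_c)

# Two hypothetical moderator subgroups with identical effects: no heterogeneity
subgroups = [([1.0, 1.0], [0.1, 0.1]), ([0.5, 0.5], [0.2, 0.2])]
print(pooled_tau2(subgroups))
```

Pooling in this way stabilizes the variance estimate when individual subgroups contain too few effect sizes for a reliable independent τ2, which is the trade-off the limitation above describes.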

Implications for future research


While this study has demonstrated the effectiveness of the MMPI-2-RF over-reporting validity
scales, it has also helped to identify some areas of needed further study. For instance, factors
including specific diagnoses and certain evaluation contexts are under-evaluated in most
scales. This means that the MMPI-2-RF validity scales have not been assessed in many areas
of clinical practice that see frequent referrals (e.g. traumatic brain injury within veterans or
evaluations for dementia with geriatric populations). This is likely due to the relative youth
of the MMPI-2-RF given the substantial rate at which research is being produced on its
effectiveness across contexts. Still, further research is needed in many areas. Researchers

should also increase the consistency of screening for patterns of random responding while
conducting over-reporting research and future meta-analyses may wish to expand on coding
approaches to non-content based validity screenings. With only about half of the studies
identified for this meta-analysis conducting these screenings, not consistently screening
out participants for excessive random and acquiescent responding adds unnecessary noise
to interpretation. It may also be useful to thoroughly examine the impact of various coaching conditions on the effectiveness of malingering attempts. Finally, given the consistent discrepancy between real and simulation studies, as the MMPI-2-RF literature continues to grow, it may be useful for future meta-analyses of the over-reporting scales to evaluate moderators for real and simulation designs separately.

Conclusion
This study adds to the literature on validity scale interpretation with the MMPI-2-RF
by meta-analytically exploring the effectiveness of the validity scales and the factors that
moderate it. The MMPI-2-RF validity scales show a great deal of promise for detecting
content-based invalid responding in a variety of evaluative contexts. However, to ensure
that interpretations are as accurate as possible, clinicians must consider a number of factors
related to the context of the evaluation and the characteristics of the respondent. Because
these moderating factors differ across the validity scales, validity scale results should be
incorporated carefully into assessment conceptualizations.
There are also some areas that remain substantially limited and need further study. For
example, studies should explore the role of education in patterns of responding, given the
persistent finding that scale elevations are lower for more educated respondents. This
apparent trend may reflect the formal reasoning often associated with higher levels of
education, which may enable more controlled and effective efforts at feigned presentation.
Although the observed effects distinguish only between those with a high school degree and
those without, we believe this difference says less about those discrete groups (which were
created by the researchers because of constraints within the studies sampled) than about a
broader trend across levels of education. Another area needing exploration is how moderating
variables (e.g. education, respondent diagnosis) may interact to influence scale performance.
For instance, education may exert even greater influence in evaluative contexts when
individuals are feigning sets of diagnostic criteria whose symptoms are often misrepresented
or misunderstood by the lay public (e.g. schizophrenia). Such growth is needed not only for
the MMPI-2-RF but for malingering research generally, as the systematic evaluation of specific
factors within malingering groups is not often thoroughly tested.

Disclosure statement
No potential conflict of interest was reported by the authors.

ORCID
Paul B. Ingram   http://orcid.org/0000-0002-5409-4896

References
Anderson, J. L. (2011). A multi-method assessment approach to the detection of malingered pain:
Association with the MMPI-2 Restructured Form (Doctoral dissertation). Retrieved from ProQuest
Dissertation and Thesis database. (UMI No. 1497182).
Arbisi, P. A., Polusny, M. A., Erbes, C. R., Thuras, P., & Reddy, M. K. (2011). The Minnesota Multiphasic
Personality Inventory-2 Restructured Form in National Guard soldiers screening positive for
posttraumatic stress disorder and mild traumatic brain injury. Psychological Assessment, 23, 203–214.
doi:http://dx.doi.org/10.1037/a0021339
Arbisi, P. A., Sellbom, M., & Ben-Porath, Y. S. (2008). Empirical correlates of the MMPI-2 Restructured
Clinical (RC) Scales in psychiatric inpatients. Journal of Personality Assessment, 90, 122–128. doi:http://
dx.doi.org/10.1080/00223890701845146
Archer, R. P. (2006). A perspective on the Restructured Clinical (RC) Scale Project. Journal of Personality
Assessment, 87, 179–185. doi:http://dx.doi.org/10.1207/s15327752jpa8702_07
Archer, E. M., Hagan, L. D., Mason, J., Handel, R., & Archer, R. P. (2012). MMPI-2-RF characteristics of
custody evaluation litigants. Assessment, 19, 14–20. doi:http://dx.doi.org/10.1177/1073191110397469
Ayearst, L. E., Sellbom, M., Trobst, K. K., & Bagby, R. M. (2013). Evaluating the interpersonal content of the
MMPI-2-RF interpersonal scales. Journal of Personality Assessment, 95, 187–196. doi:http://dx.doi.org/10.1080/00223891.2012.730085
Baer, R. A., & Miller, J. (2002). Underreporting of psychopathology on the MMPI-2: A meta-analytic
review. Psychological Assessment, 14, 16–26.
Ben-Porath, Y. S. (2003). Assessing personality and psychopathology with self-report inventories. In
J. R. Graham & J. A. Naglieri (Eds.), Handbook of psychology: Assessment psychology (Vol. 10, pp.
553–577). Hoboken, NJ: Wiley.
Ben-Porath, Y. S., & Tellegen, A. (2008/2011). Minnesota Multiphasic Personality Inventory-2 Restructured Form:
Manual for administration, scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.
Ben-Porath, Y. S. (2012a). Addressing challenges to MMPI-2-RF based testimony: Questions and answers.
Archives of Clinical Neuropsychology, 27, 691–705. doi:http://dx.doi.org/10.1093/arclin/acs083
Ben-Porath, Y. S. (2012b). Interpreting the MMPI-2-RF. Minneapolis: University of Minnesota Press.
Berry, D. T. R., Baer, R. A., & Harris, M. J. (1991). Detection of malingering on the MMPI: A meta-analysis.
Clinical Psychology Review, 11, 585–598.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis.
Hoboken, NJ: Wiley.
Burchett, D., Dragon, W. R., Smith Holbert, A. M., Tarescavage, A. M., Mattson, C. A., Handel, R. W.,
& Ben-Porath, Y. S. (2015). “False Feigners”: Examining the impact of non-content-based invalid
responding on the Minnesota Multiphasic Personality Inventory-2 Restructured Form content-based
invalid responding indicators. Psychological Assessment, 28, 458–470. doi:http://dx.doi.org/10.1037/
pas0000205
Butcher, J. N. (2010). Personality assessment from the nineteenth to the early twenty-first century: Past
achievements and contemporary challenges. Annual Review of Clinical Psychology, 6, 1–20. doi:http://
dx.doi.org/10.1146/annurev.clinpsy.121208.131420
Butcher, J. N., & Williams, C. L. (2010). Personality assessment with the MMPI-2: Historical roots,
international adaptations, and current challenges. Applied Psychology: Health and Well-Being, 1,
105–135.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Colligan, R. C. (1985). History and development of the MMPI. Psychiatric Annals, 15, 524–533.
Crisanti, L. (2014). The ability of the MMPI-2-RF Validity scales to detect feigning of cognitive and
posttraumatic stress disorder (PTSD) symptoms (Doctoral dissertation). Retrieved from ProQuest
Dissertation and Thesis database. (UMI No. 3630144).
Detrick, P., & Chibnall, J. T. (2014). Underreporting on the MMPI-2-RF in a high-demand police officer
selection context: An illustration. Psychological Assessment, 26, 1044–1049. doi:http://dx.doi.
org/10.1037/pas0000013
Dionysus, K. E., Denney, R. L., & Halfaker, D. A. (2011). Detecting negative response bias with the Fake Bad
Scale, Response Bias Scale, and Henry-Heilbronner Index of the Minnesota Multiphasic Personality
Inventory-2. Archives of Clinical Neuropsychology, 26, 81–88. doi:http://dx.doi.org/10.1093/arclin/
acq096
Forbey, J. D., Lee, T. T. C., & Handel, R. W. (2010). Correlates of the MMPI-2-RF in a college setting.
Psychological Assessment, 22, 737–744. doi:http://dx.doi.org/10.1037/a0020645
Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Green, P. (2007). Development and validation of a Response
Bias Scale for the MMPI-2. Assessment, 14, 196–208. doi:http://dx.doi.org/10.1177/1073191106295861
Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Sellbom, M. (2010). Incremental validity of the MMPI-2-
RF over-reporting scales and RBS in assessing the veracity of memory complaints. Archives of Clinical
Neuropsychology, 25, 274–284. doi:http://dx.doi.org/10.1093/arclin/acq018
Goodwin, B. E., Sellbom, M., & Arbisi, P. A. (2013). Posttraumatic stress disorder in Veterans: The utility
of the MMPI-2-RF validity scales in detecting overreported symptoms. Psychological Assessment, 25,
671–678. doi:http://dx.doi.org/10.1037/a0032214
Green, J. E. (2013). Predictability of MMPI-2-RF scales in identifying probable malingering of depressive
symptomatology (Doctoral dissertation). Retrieved from ProQuest Dissertation and Thesis database.
(UMI No. 3553290).
Harp, J. P., Jasinski, L. J., Shandera-Ochsner, A. L., Mason, L. H., & Berry, D. T. R. (2011). Detection of
malingered ADHD using the MMPI-2-RF. Psychological Injury and Law, 4, 32–43. doi:http://dx.doi.
org/10.1007/s12207-011-9100-9
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological
Methods, 3, 486–504.
Hoelzle, J. B., Nelson, N. W., & Arbisi, P. A. (2012). MMPI-2 and MMPI-2-Restructured Form validity scales:
Complementary approaches to evaluate response validity. Psychological Injury and Law, 5, 174–191.
doi:http://dx.doi.org/10.1007/s12207-012-9139-2
Ingram, P. B., Kelso, K., & McCord, D. M. (2011). Empirical correlates and expanded interpretation of
the MMPI-2-RF Restructured Clinical Scale 3 (Cynicism). Assessment, 18, 95–101. doi:http://dx.doi.
org/10.1177/1073191110388147
Jones, A., & Ingram, M. V. (2011). A comparison of selected MMPI-2 and MMPI-2-RF validity scales in
assessing effort on cognitive tests in a military sample. The Clinical Neuropsychologist, 25, 1207–1227.
doi:http://dx.doi.org/10.1080/13854046.2011.600726
Jones, A., Ingram, M. V., & Ben-Porath, Y. S. (2012). Scores on the MMPI-2-RF scales as a function of
increasing levels of failure on cognitive symptom validity tests in a military sample. The Clinical
Neuropsychologist, 26, 790–815. doi:http://dx.doi.org/10.1080/13854046.2012.693202
Kolinsky, M. (2013). Utility of the MMPI-2-RF to identify overreporting of psychopathology symptoms
in a simulated sample of an inmate population (Doctoral dissertation). Retrieved from ProQuest
Dissertation and Thesis database. (UMI No. 3568491).
Lange, R. T., Brickell, T. A., & French, L. M. (2015). Examination of the Mild Brain Injury Atypical Symptom
Scale and the Validity-10 Scale to detect symptom exaggeration in US military service members.
Journal of Clinical and Experimental Neuropsychology, 37, 325–337. doi:http://dx.doi.org/10.1080/13803395.2015.1013021
Lehr, E. Y. C. (2014). A comparison of the MMPI-2-RF and PAI as predictors of naïve and informed faking
(Doctoral dissertation). Retrieved from ProQuest Dissertation and Thesis database. (UMI No. 3683389).
Marion, B. E., Sellbom, M., & Bagby, R. M. (2011). The detection of feigned psychiatric disorders using
the MMPI-2-RF overreporting validity scales: An analog investigation. Psychological Injury and Law,
4, 1–12. doi:http://dx.doi.org/10.1007/s12207-011-9097-0
Mason, L. H., Shandera-Ochsner, A. L., Williamson, K. D., Harp, J. P., Edmundson, M., Berry, D. T. R., & High,
W. M., Jr (2013). Accuracy of the MMPI-2-RF validity scales for identifying feigned PTSD symptoms,
random responding, and genuine PTSD. Journal of Personality Assessment, 95, 585–593. doi:http://
dx.doi.org/10.1080/00223891.2013.819512
Meyers, J. E., Miller, R. M., Haws, N. A., Murphy-Tafiti, J. L., Curtis, T. D., Rupp, Z. W., ... Thompson, L. M.
(2014). An adaptation of the MMPI-2 Meyers Index for the MMPI-2-RF. Applied Neuropsychology: Adult,
21, 148–154. doi:http://dx.doi.org/10.1080/09084282.2013.780173
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & the PRISMA Group. (2009). Preferred reporting items
for systematic reviews and meta-analyses: The PRISMA statement. Annals of Internal Medicine, 151,
264–269. doi:http://dx.doi.org/10.1371/journal.pmed.1000097
Nelson, N. W., Sweet, J. J., & Heilbronner, R. L. (2007). Examination of the new MMPI-2 Response Bias
Scale (Gervais): Relationship with the MMPI-2 Validity Scales. Journal of Clinical and Experimental
Neuropsychology, 29, 67–72.
Nelson, N. W., Hoelzle, J. B., Sweet, J. J., Arbisi, P. A., & Demakis, G. J. (2010). Updated meta-analysis
of the MMPI-2 symptom validity scale (FBS): Verified utility in forensic practice. The Clinical
Neuropsychologist, 24, 701–724. doi:http://dx.doi.org/10.1080/13854040903482863
Nguyen, C. T., Green, D., & Barr, W. B. (2015). Evaluation of the MMPI-2-RF for detecting over-reported
symptoms in a civil forensic and disability setting. The Clinical Neuropsychologist, 29, 255–271.
doi:http://dx.doi.org/10.1080/13854046.2015.1033020
Rogers, R., Gillard, N. D., Berry, D. T. R., & Granacher, R. P., Jr (2011). Effectiveness of the MMPI-2-RF
Validity Scales for feigned mental disorders and cognitive impairment: A known-groups study.
Journal of Psychopathology and Behavioral Assessment, 33, 355–367. doi:http://dx.doi.org/10.1007/
s10862-011-9222-0
Rogers, R., Sewell, K. W., Martin, M. A., & Vitacco, M. J. (2003). Detection of feigned mental disorders:
A meta-analysis of the MMPI-2 and malingering. Assessment, 10, 160–177. doi:http://dx.doi.
org/10.1177/1073191103252349
Rogers, R., Sewell, K. W., & Salekin, R. T. (1994). A meta-analysis of malingering on the MMPI-2. Assessment,
1, 227–237. doi:http://dx.doi.org/10.1177/107319119400100302
Saiz, J. L. B., & Dura, L. P. (2013). Detección de exageración de síntomas mediante el SIMS y el MMPI-2-
RF en pacientes diagnosticados de trastorno mixto ansioso-depresivo y adaptativo en el contexto
medicolegal: un estudio preliminar [Detection of symptom exaggeration by the SIMS and the MMPI-
2-RF in patients diagnosed with mixed anxiety-depressive and adjustment disorder in a medico-legal
context: A preliminary study]. Clínica y Salud, 24, 177–183. doi:http://dx.doi.org/10.5093/cl2013a19
Scholte, W., Tiemens, B., Verheul, R., Meerman, A., Egger, J., & Hutschemaekers, G. (2012). The RC scales
predict psychotherapy outcomes: The predictive validity of the MMPI-2’s restructured clinical scales
for psychotherapeutic outcomes. Personality and Mental Health, 6, 292–302. doi:http://dx.doi.
org/10.1002/pmh.1190
Schroeder, R. W., Baade, L. E., Peck, C. P., VonDran, E. J., Brockman, C. J., Webster, B. K., & Heinrichs, R. J.
(2012). Validation of the MMPI-2-RF Validity Scales in criterion group neuropsychological samples.
The Clinical Neuropsychologist, 26, 129–146. doi:http://dx.doi.org/10.1080/13854046.2011.639314
Sellbom, M., & Bagby, R. M. (2010). Detection of overreported psychopathology with the MMPI-2
Restructured Form validity scales. Psychological Assessment, 22, 757–767. doi:http://dx.doi.org/10.1037/a0020825
Sellbom, M., Ben-Porath, Y. S., Graham, J. R., Arbisi, P. A., & Bagby, R. M. (2003). Susceptibility of the MMPI-
2 Clinical, Restructured Clinical (RC), and Content Scales to overreporting and underreporting.
Assessment, 12, 79–85. doi:http://dx.doi.org/10.1177/1073191104273515
Sellbom, M., Wygant, D. B., & Bagby, R. M. (2012). Utility of the MMPI-2-RF in detecting non-credible somatic
complaints. Psychiatry Research, 197, 295–301. doi:http://dx.doi.org/10.1016/j.psychres.2011.12.043
Sellbom, M., Wygant, D. B., Toomey, J. A., Kucharski, L. T., & Duncan, S. (2010). Utility of the MMPI-2-RF
(Restructured Form) Validity Scales in detecting malingering in a criminal forensic setting: A known-
groups design. Psychological Assessment, 22, 22–31. doi:http://dx.doi.org/10.1037/a0018222
Shiels, J. (2015). Detecting feigned depression on psychology graduate students using the MMPI-2-RF
(Doctoral dissertation). Retrieved from ProQuest Dissertation and Thesis database. (UMI No. 3700677).
Simms, L. J., Casillas, A., Clark, L. A., Watson, D., & Doebbeling, B. N. (2005). Psychometric evaluation
of the restructured clinical scales of the MMPI-2. Psychological Assessment, 17, 345–358. doi:http://
dx.doi.org/10.1037/1040-3590.17.3.345
Sleep, C., Petty, J. A., & Wygant, D. B. (2015). Framing the results: Assessment of response bias through
select self-report measures in psychological injury evaluations. Psychological Injury and Law, 8,
27–39. doi:http://dx.doi.org/10.1007/s12207-015-9219-1
Sobhanian, S. (2014). Comparing possible malingering of post traumatic stress disorder symptoms using
the Minnesota Multiphasic Personality Inventory-2 and Multiphasic Personality Inventory-2-Restructured
Form (Doctoral dissertation). Retrieved from ProQuest Dissertation and Thesis database. (UMI No.
3613822).
Sullivan, K. A., & Elliott, C. (2012). An investigation of the validity of the MMPI-2 Response Bias Scale
using an analog simulation design. The Clinical Neuropsychologist, 26, 160–176. doi:http://dx.doi.org/10.1080/13854046.2011.647084
Sullivan, K. A., Elliott, C. D., Lange, R. T., & Anderson, D. S. (2013). A known-groups evaluation of the
Response Bias Scale in a neuropsychological setting. Applied Neuropsychology: Adult, 20, 20–32.
doi:http://dx.doi.org/10.1080/09084282.2012.670149
Tarescavage, A. M., Wygant, D. B., Gervais, R. O., & Ben-Porath, Y. S. (2013). Association between the
MMPI-2 Restructured Form (MMPI-2-RF) and malingered neurocognitive dysfunction among non-
head injury disability claimants. The Clinical Neuropsychologist, 27, 313–335. doi:http://dx.doi.
org/10.1080/13854046.2012.744099
Tellegen, A., Ben-Porath, Y. S., McNulty, J. L., Arbisi, P. A., Graham, J. R., & Kaemmer, B. (2003). The MMPI-2
Restructured Clinical (RC) Scales: Development, validation, and interpretation. Minneapolis: University
of Minnesota Press.
Tolin, D. F., Steenkamp, M. M., Marx, B. P., & Litz, B. T. (2010). Detecting symptom exaggeration in
combat veterans using the MMPI-2 symptom validity scales: A mixed group validation. Psychological
Assessment, 22, 729–736.
University of Minnesota Press, Test Division. (2011). MMPI-2-RF reference list. Retrieved from http://www.upress.umn.edu/test-division/MMPI-2-RF/mmpi-2-rf-references
Wall, T. D., Wygant, D. B., & Gallagher, R. W. (2015). Identifying overreporting in a correctional setting:
Utility of the MMPI-2 Restructured Form validity scales. Criminal Justice and Behavior, 42, 610–622.
doi:http://dx.doi.org/10.1177/0093854814556881
Whitney, K. A. (2013). Predicting Test of Memory Malingering and Medical Symptom Validity Test failure
within a Veterans Affairs medical center: Use of the Response Bias Scale and the Henry-Heilbronner
Index. Archives of Clinical Neuropsychology, 28, 222–235. doi:http://dx.doi.org/10.1093/arclin/act012
Whitney, K. A., Davis, J. J., Shepard, P. H., & Herman, S. M. (2008). Utility of the Response Bias Scale
(RBS) and other MMPI-2 validity scales in predicting TOMM performance. Archives of Clinical
Neuropsychology, 23, 777–786. doi:http://dx.doi.org/10.1016/j.acn.2008.09.001
Wolf, E. J., Miller, M. W., Orazem, R. J., Weierich, M. R., Castillo, D. T., Milford, J., … Keane, T. M. (2008).
The MMPI-2 Restructured Clinical Scales in the assessment of post-traumatic stress disorder with
comorbid disorders. Psychological Assessment, 20, 327–340. doi:http://dx.doi.org/10.1037/a0012948
Wygant, D. B., Anderson, J. L., Sellbom, M., Rapier, J. L., Allgeier, L. M., & Granacher, R. P. (2011). Association
of the MMPI-2 Restructured Form (MMPI-2-RF) Validity scales with structured malingering criteria.
Psychological Injury and Law, 4, 13–23. doi:http://dx.doi.org/10.1007/s12207-011-9098-z
Wygant, D. B., Ben-Porath, Y. S., & Arbisi, P. A. (2004). Development and initial validation of a scale to
detect infrequent somatic complaints. Poster presented at the 39th annual symposium on Recent
Developments of the MMPI-2/MMPI-A, Minneapolis, MN.
Wygant, D. B., Ben-Porath, Y. S., Arbisi, P. A., Berry, D. T. R., Freeman, D. B., & Heilbronner, R. L. (2009).
Examination of the MMPI-2 restructured form (MMPI-2-RF) validity scales in civil forensic settings:
Findings from simulation and known group samples. Archives of Clinical Neuropsychology, 24,
671–680. doi:http://dx.doi.org/10.1093/arclin/acp073
Wygant, D. B., Sellbom, M., Gervais, R. O., Ben-Porath, Y. S., Stafford, K. P., Freeman, D. B., & Heilbronner,
R. L. (2010). Further validation of the MMPI-2 and MMPI-2-RF Response Bias Scale (RBS): Findings
from disability and criminal forensic settings. Psychological Assessment, 22, 745–756.
Young, J. C. & Gross, A. M. (2011). Detection of response bias and noncredible performance in adult
attention-deficit/hyperactivity disorder. Archives of Clinical Neuropsychology, 26, 165–175. doi:http://
dx.doi.org/10.1093/arclin/acr01
Young, J. C., Kearns, L. A., & Roper, B. L. (2011). Validation of the MMPI-2 Response Bias Scale and Henry-
Heilbronner Index in a U.S. veteran population. Archives of Clinical Neuropsychology, 26, 194–204.
doi:http://dx.doi.org/10.1093/arclin/acr015
Youngjohn, J. R., Wershba, R., Stevenson, M., Sturgeon, J., & Thomas, M. L. (2011). Independent validation
of the MMPI-2-RF somatic/cognitive and Validity Scales in TBI litigants tested for effort. The Clinical
Neuropsychologist, 25, 463–476.