
This article was downloaded by: [University of California Santa Cruz]

On: 09 October 2014, At: 15:40


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH,
UK

Journal of Personality Assessment
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/hjpa20

MMPI-A Validity Scale Uses and Limitations in Detecting Varying Levels of Random Responding
Robert P. Archer, Richard W. Handel, Kathleen D. Lynch & David E. Elkins
Published online: 10 Jun 2010.

To cite this article: Robert P. Archer, Richard W. Handel, Kathleen D. Lynch &
David E. Elkins (2002) MMPI-A Validity Scale Uses and Limitations in Detecting Varying
Levels of Random Responding, Journal of Personality Assessment, 78:3, 417-431,
DOI: 10.1207/S15327752JPA7803_03

To link to this article: http://dx.doi.org/10.1207/S15327752JPA7803_03

JOURNAL OF PERSONALITY ASSESSMENT, 78(3), 417–431
Copyright © 2002, Lawrence Erlbaum Associates, Inc.

MMPI–A Validity Scale Uses and Limitations in Detecting Varying Levels of Random Responding

Robert P. Archer and Richard W. Handel


Department of Psychiatry
Eastern Virginia Medical School

Kathleen D. Lynch
Virginia Consortium Program in Clinical Psychology

David E. Elkins
Department of Psychiatry
Eastern Virginia Medical School

Although there is a substantial research literature on the effects of random responding
on the MMPI–2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), there
are very few studies available on this topic with the MMPI–A (Butcher et al., 1992).
Archer and Elkins (1999) found that MMPI–A validity scales F and VRIN were
particularly useful in distinguishing entirely random profiles from those obtained under
standard administration in clinical settings but noted that “all random” protocols could
not be used to evaluate the usefulness of the T-score difference between the first half (F1)
and the second half (F2) of the MMPI–A test booklet. Following up on this issue, this study
extended the methodology of previous research by examining the hit rate, positive predictive
power, negative predictive power, sensitivity, and specificity of VRIN, F, F1, F2, and
the absolute value of the T-score difference between F1 and F2 (denoted as |F1 – F2|) in
5 samples varying in the degree of protocol randomness. One of the samples consisted
of 100 adolescent inpatients administered the MMPI–A under standard instructions,
and another sample consisted of 100 protocols randomly generated by computer. The
additional 3 samples of 100 protocols each contained varying degrees of
computer-generated randomness introduced in the latter half of the MMPI–A item pool.
Overall, the results indicate that several MMPI–A validity scales are useful in
detecting protocols that are largely random, but all of these validity scales are more
limited in detecting partially random responding that involves less than half of the total
item pool and is located in the second half of the test booklet. Clinicians should be
particularly cautious concerning validity inferences based on the observed T-score
difference that occurs for the F1 and F2 subscales, and current findings do not support the
clinical usefulness of this index.

The Minnesota Multiphasic Personality Inventory–Adolescent (MMPI–A;
Butcher et al., 1992) includes a number of validity scales, and among these the
VRIN scale and the F scale have demonstrated utility in the detection of random
responding (e.g., Archer & Elkins, 1999). Furthermore, the MMPI–A contains
two subscales of the F scale that may be useful in the detection of random
responding, particularly when random responses are differentially more common
in either the first (F1) or second half (F2) of the MMPI–A test booklet.
Although numerous studies (e.g., Berry, Wetter, Baer, Larsen, et al.,
1992; Berry, Wetter, Baer, Widiger, et al., 1991; Cramer, 1995) have investigated
the detection of random responding on the MMPI–2 (Butcher, Dahlstrom,
Graham, Tellegen, & Kaemmer, 1989), few studies have explored this issue for
the MMPI–A.
In one of the first MMPI–A studies of random responding, Baer, Ballenger,
Berry, and Wetter (1997) examined the MMPI–A validity scale patterns of a sample
of 106 normal adolescents. These adolescents completed the MMPI–A using
standard instructions, and each participant was then randomly assigned to one of
five conditions representing varying degrees of random responding. Baer et al.
(1997) concluded that several MMPI–A validity scales were effective in identifying
random responding, with increasing scores on F1, F2, F, and VRIN all associated
with increased amounts of random responding. Baer et al. (1997) also
reported that greater degrees of random responding were associated with larger effect
sizes for the effectiveness of validity scales in discriminating among groups.
The authors recommended replication of their findings in clinical samples of
adolescents.
Baer, Kroll, Rinaldo, and Ballenger (1999) further evaluated the utility of
MMPI–A validity scales in the detection of random responding using adolescent
samples from normal and clinical settings. Participants comprised three groups: a
random response group of 20 nonclinical adolescents who completed the MMPI–A without
access to the test booklet, thereby producing entirely random protocols; 24 nonclinical
adolescents instructed to overreport symptoms of psychological disturbance;
and 25 adolescents in clinical treatment programs who completed the
MMPI–A under standard instructions. The results of this study indicated that the F
scale was sensitive to both random responding and overreporting, whereas the
VRIN scale was exclusively sensitive to random responding.

Archer and Elkins (1999) explored the utility of validity scales F, F1, F2, and
VRIN in distinguishing the protocols of 354 adolescents assessed under standard
instructions in clinical settings from a sample of 354 entirely random MMPI–A
profiles generated by computer. In addition to identifying optimal cutting scores for
a 50% base rate of random responding, Archer and Elkins also adjusted the ratio of
clinical to random participants to 10:1 to represent a more likely base rate of
substantial random responding in most clinical settings. Archer and Elkins reported
that the optimal validity scale cutting scores for the 10:1 ratio (e.g., VRIN ≥ 80)
were generally consistent with prior empirical findings and recommendations in the
MMPI–A manual (Butcher et al., 1992) for the detection of random responding on the
MMPI–A.
In addition to evaluating hit rate, sensitivity, specificity, positive predictive
power (PPP), and negative predictive power (NPP) at various cutting scores for each
validity scale as used in isolation, Archer and Elkins (1999) also examined the relative
contribution of each of the validity scales when used in linear combination in the
identification of random profiles. This was accomplished by using a stepwise
discriminant function analysis (DFA) involving six MMPI–A validity scales (i.e.,
VRIN, TRIN, F, F1, F2, and F1 – F2) as potential predictors of individual protocol
membership in either the clinical or random groups. Results from the DFA supported
the utility of VRIN, F, F1, and F2 in the detection of entirely random responding,
particularly the role of the VRIN and F scales. In contrast, the TRIN scale and the
F1 – F2 index were not found to be useful predictors. Archer and Elkins’s (1999)
findings concerning TRIN and F1 – F2 are not surprising because the TRIN scale was
developed to be most useful in detecting acquiescence or nonacquiescence rather than
random responding. Furthermore, F1 – F2 is unlikely to provide effective discrimination
because both subscales would be expected to be elevated to a similar degree in
an entirely random protocol, thereby producing low values on the F1 – F2 index. The
authors suggested that future researchers focus on the usefulness of the T-score
difference between the F1 and F2 subscales in detecting protocols that shift from
consistent to random responding in the latter portions of the test booklet.
This study extends the work of Archer and Elkins (1999) to explore the utility
of MMPI–A validity scales in the detection of varying degrees of random responding,
including the examination of the usefulness of an index based on the absolute
T-score differences between the two F subscales, denoted as |F1 – F2|. Prior research
studies on the MMPI–A (e.g., Baer et al., 1997) and MMPI–2 (e.g., Cramer,
1995) have addressed the issue of partial random responding by randomizing all
items after a certain item number. For example, Baer et al. (1997) created a 25%
random condition by having selected participants continue to complete an
MMPI–A answer sheet after access to the test booklet ceased at Item Number 361
(i.e., responses for Items 361 to 478 were random). Although adolescents appear
more likely to introduce random responding in the latter stages of the test booklet
due to increased fatigue or boredom (Baer et al., 1997), it appears less likely that
they would answer every item randomly after a certain item number, as the
Baer et al. (1997) methodology assumes. Therefore, the approach used in this study
included random selection of subsets of items in the second half of the MMPI–A
booklet that were replaced with computer-generated random responses. Hypotheses
for this study included the prediction that, consistent with prior research by
Baer et al. (1997, 1999) and Archer and Elkins (1999), the F and VRIN scales would be
most effective in detecting entirely random protocols. However, it was further
hypothesized that the |F1 – F2| index would be found effective in the detection of
protocols with varying degrees of randomness introduced in the latter half of the
test booklet.

METHOD

Participants

The data set used in this study included 446 inpatient adolescents administered the
MMPI–A under standard instructions. Profiles with a Cannot Say (?) score > 25 were
eliminated from further analyses, resulting in a final sample size of 430 (288
boys and 142 girls). These 430 adolescents had a mean age of 15.5 years (SD = 1.1);
285 were White, 50 were Black, 65 were Hispanic, and 30 were other. The largest
diagnostic groupings for these adolescents were conduct disorders (n = 192),
dysthymic disorders (n = 70), diagnosis missing or unknown (n = 43), major
depressive disorders (n = 36), depressive disorders not otherwise specified (n = 16),
and bipolar disorder (n = 10).

Instruments

The primary instrument in the current research was the 478-item MMPI–A. A full
description of the development, reliability, and validity of this instrument is
provided in Archer (1997) and in the MMPI–A manual (Butcher et al., 1992). The
central focus of this study was on MMPI–A validity scales F, F1, F2, and VRIN. The F
scale is 66 items in length, and items were selected for scale membership that were
endorsed in the deviant direction by no more than 20% of the normative sample
(Butcher et al., 1992). The MMPI–A F scale is divided into a 33-item F1 scale
(scored in the first 236 items in the test booklet) and a 33-item F2 scale that occurs
exclusively in the latter half of the test booklet. The MMPI–A manual suggests that
a comparison of T-score values from the F1 and F2 subscales may be useful in
identifying adolescents who change to a random response style in the latter stages of the
test booklet. To examine this contention, a |F1 – F2| index was used in this study to
calculate the absolute value of the difference between the F1 and F2 T scores for
each participant. The absolute T-score difference was used so that positive and
negative differences would not cancel in summation and pull mean differences toward
zero. Finally, the VRIN scale consists of 50 item–response pairs with similar or
opposite content. Each time an adolescent responds to an item pair in an inconsistent
manner, a raw score point is added to the VRIN scale. Elevated VRIN scale scores
indicate the adolescent has frequently responded in an inconsistent manner, possibly
as a result of carelessness, reading limitations, or intentional random responding
(Archer, 1997).
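
The rationale for using the absolute value in the |F1 – F2| index can be illustrated with a minimal sketch. The T scores below are hypothetical values chosen for illustration, not data from this study, and the function name is likewise an assumption.

```python
from statistics import mean


def f1_f2_index(f1_t, f2_t):
    """Absolute T-score difference between the F1 and F2 subscales."""
    return abs(f1_t - f2_t)


# Hypothetical (F1, F2) T scores for four protocols; NOT data from the study.
protocols = [(60, 75), (72, 58), (55, 70), (68, 52)]

signed = [f1 - f2 for f1, f2 in protocols]
absolute = [f1_f2_index(f1, f2) for f1, f2 in protocols]

mean_signed = mean(signed)      # positive and negative differences cancel
mean_absolute = mean(absolute)  # the magnitude of each difference is preserved
```

In this toy example the signed differences average to zero even though every protocol shows a sizable F1 versus F2 discrepancy, whereas the mean absolute difference retains that discrepancy.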

Procedure

Four samples of 100 participants each were randomly selected from the larger sample
of 430 MMPI–A protocols. The first sample of 100 was randomly selected from
the broader clinical sample derived under standard administration and served as the
comparison group for all analyses. This clinical sample had a mean age of 15.5 years
(SD = 1.0) and contained 73 boys and 27 girls. The remaining three samples included
the introduction of varying degrees of computer-generated randomness. For the second
sample (Random Group 1), 100 MMPI–A protocols were randomly selected
from the remaining pool of 330 profiles. For each individual case in Random Group 1,
one third of the items in the latter half of the test booklet were randomly selected,
and these items were in turn replaced with computer-generated random responses.
Therefore, no two cases in Random Group 1 included random responses to the same set
of items. Random Group 1 contained 69 boys and 31 girls with a mean age of 15.5 years
(SD = 0.9). Random Group 2 was constructed similarly: 100 protocols were
randomly selected from the remaining pool of 230 profiles, and random responses
were generated for two thirds of the items in the second half of the MMPI–A. No two
participants in Random Group 2 had random responses for the same set of items.
Random Group 2 contained 67 boys and 33 girls with a mean age of 15.4 years (SD =
1.0). Random Group 3 consisted of 100 protocols randomly selected from the
remaining 130 participants. The entire second half of the MMPI–A item pool for this
group was replaced with random responses. A different set of random responses was
generated for each participant in Group 3 for the latter half of the item pool. Random
Group 3 consisted of 63 boys and 37 girls and had a mean age of 15.6 years (SD = 1.0).
Finally, Random Group 4 consisted of 100 entirely random protocols generated by
computer. As with the Group 3 participants, no two Group 4 protocols had the same
pattern of random responses. Sixty-six of the random protocols in Random Group 4
were scored on norms for boys and 34 protocols were scored on norms for girls.
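
The item-replacement procedure described above can be sketched as follows. This is an illustrative reconstruction, not the program used in the study: the text does not state the item number at which the second half of the booklet begins, so `SECOND_HALF_START` is an assumption, as are the function names, the seed, and the placeholder protocol.

```python
import random

N_ITEMS = 478            # MMPI-A booklet length, as stated in the text
SECOND_HALF_START = 240  # ASSUMPTION: item number where the "second half" begins


def randomize_second_half(responses, proportion, rng):
    """Replace a random subset of second-half items with random True/False answers.

    responses  : list of 478 booleans (index 0 is Item 1)
    proportion : share of second-half items to replace (e.g., 1/3, 2/3, 1.0)
    rng        : random.Random instance, so each protocol gets its own draw
    Returns the modified copy and the sorted indices that were replaced.
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected one response per MMPI-A item")
    second_half = range(SECOND_HALF_START - 1, N_ITEMS)
    n_replace = round(proportion * len(second_half))
    chosen = rng.sample(second_half, n_replace)
    modified = list(responses)
    for i in chosen:
        modified[i] = rng.random() < 0.5  # coin flip; may match the original answer
    return modified, sorted(chosen)


def fully_random_protocol(rng):
    """Entirely random protocol, analogous to Random Group 4."""
    return [rng.random() < 0.5 for _ in range(N_ITEMS)]


rng = random.Random(2002)     # arbitrary seed, for reproducibility only
original = [False] * N_ITEMS  # placeholder protocol, not real data
modified, replaced = randomize_second_half(original, 1 / 3, rng)
```

Because the subset of items is drawn separately for each protocol, two cases are very unlikely to share the same set of randomized items, which mirrors the property noted for Random Groups 1 and 2. A replaced item can coincide with the original answer by chance, so the number of answers that actually change is smaller than the number of items replaced.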

Data Analyses

Overall classification accuracy was calculated based on identification of adolescents
from the clinical group in comparison to each of the random groups and presented
in terms of hit rate, PPP, NPP, sensitivity, and specificity for each of
the T-score cutoff criteria. For purposes of these analyses, a true positive was
defined as a test score at or above the criterion level produced by a profile containing
random responses, and a true negative was a test score below the criterion level from a
protocol produced by a clinical participant under standard administration conditions.
PPP was the probability that the elevated score was produced by a partially
random or random protocol, and NPP was the probability that the validity scale
score below criterion was produced by a nonrandom protocol (i.e., a clinical
participant administered the MMPI–A under standard conditions).
Hit rate, PPP, NPP, specificity, and sensitivity were calculated for F1, F2, F,
VRIN, and |F1 – F2| at various cutting scores. In addition to investigating these
data with a 50% base rate of random responding, each case in the unmodified
clinical comparison group was copied nine times to allow for analyses of hit
rate, PPP, NPP, specificity, and sensitivity for a ratio of 10:1 unmodified versus
random protocols.
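
The five classification indexes defined above can be computed from raw counts, as in the following minimal sketch. The function name, the `clinical_copies` parameter, and the toy scores are assumptions introduced for illustration. Setting `clinical_copies=10` mimics copying each clinical case nine additional times to obtain the 10:1 ratio. Returning NaN when an index is undefined (e.g., PPP when no score reaches the cutoff) is a choice made for this sketch.

```python
def classification_stats(clinical_scores, random_scores, cutoff, clinical_copies=1):
    """Hit rate, PPP, NPP, sensitivity, and specificity at a T-score cutoff (T >= cutoff)."""
    tp = sum(s >= cutoff for s in random_scores)    # random protocol flagged
    fn = len(random_scores) - tp                    # random protocol missed
    fp = clinical_copies * sum(s >= cutoff for s in clinical_scores)  # clinical flagged
    tn = clinical_copies * len(clinical_scores) - fp                  # clinical passed

    def ratio(num, den):
        return num / den if den else float("nan")   # undefined index -> NaN

    return {
        "hit_rate": ratio(tp + tn, tp + tn + fp + fn),
        "ppp": ratio(tp, tp + fp),
        "npp": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy T scores for illustration only; NOT data from the study.
clinical = [50, 55, 62, 85]
randomized = [70, 82, 90, 95]

equal_base = classification_stats(clinical, randomized, cutoff=80)
ten_to_one = classification_stats(clinical, randomized, cutoff=80, clinical_copies=10)
```

Sensitivity and specificity are unchanged by the copying step because each is computed within a single group; only hit rate, PPP, and NPP depend on the ratio of clinical to random protocols.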

RESULTS

The mean MMPI–A T scores generated by the five groups used in this research are
presented for the F1, F2, F, and VRIN scales and the |F1 – F2| index in Table 1. This
table also provides the F value for univariate analyses of variance examining the
effects of group membership on mean T-score values for each validity scale or index.
As can be seen in Table 1, the mean T-score values for F1 were, as expected, highly
similar under conditions in which participants took the test under standard clinical
conditions (clinical group) and under the three conditions in which varying degrees
of randomness were introduced into the latter half of the item pool. Only for Random
Group 4, in which the protocol was entirely random, did a substantial elevation
of the mean T-score value occur for the F1 subscale. In contrast, F2 mean T-score
values systematically increased as the level of randomness was increased in the second
half of the test booklet. As expected, the mean T-score value for F2 for a group
of protocols that were entirely random during the second half was very similar to
the mean T score for F2 for a group of protocols that were entirely random across the
length of the item pool. The mean T-score values for the F scale and the VRIN scale
both increased systematically as the amount of randomness in protocols was increased
across the five groups. Finally, and somewhat surprisingly, the absolute
T-score value difference between the F1 and F2 subscales was reasonably similar under
standard administration conditions with the clinical group and under conditions
in which one third of the latter half of the item pool was randomly generated. The
mean T-score difference value was somewhat higher when two thirds of the latter
half of the item pool was random, but mean scores only appeared markedly different
under conditions in which the entire latter half of the item pool was randomly
generated (i.e., Random Group 3).

TABLE 1
Means and Standard Deviations of F1, F2, F, VRIN, and |F1 – F2| Index for Protocols
in Varying Degrees of Randomness

                            F1              F2              F               VRIN            |F1 – F2|
Group                       M      SD       M      SD       M      SD       M      SD       M      SD
Clinical comparison group   58.00  10.80    52.00  10.00    54.83  10.31    50.84  8.60     7.71   5.49
Random Group 1              58.20  12.45    57.54  7.29     58.16  8.99     54.64  8.02     6.62   7.12
Random Group 2              59.93  11.77    65.62  6.61     63.93  8.32     59.40  7.87     9.73   6.20
Random Group 3              59.58  10.86    74.66  5.87     69.28  6.77     64.74  8.62     17.20  10.02
Random Group 4              83.89  10.43    74.51  5.84     80.14  6.31     74.27  7.65     11.84  8.10
F value                     94.5***         191.73***       145.59***       126.05***       30.68***

***p < .001.
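
As a rough check on the univariate analyses of variance, a one-way F ratio can be recovered from group means and standard deviations when all groups are the same size (here, n = 100 per group). The sketch below applies the standard equal-n formula to the F2 column of Table 1. The function name is an assumption, and because the tabled means and standard deviations are rounded, the result should land near, not exactly on, the reported F value.

```python
def f_from_summary(means, sds, n):
    """One-way ANOVA F ratio from group means and SDs, assuming equal group size n."""
    k = len(means)
    grand_mean = sum(means) / k
    # Between-groups mean square: variability of group means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # Within-groups mean square: with equal n, the average of the group variances.
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within


# F2 means and SDs for the five groups, taken from Table 1.
f2_means = [52.00, 57.54, 65.62, 74.66, 74.51]
f2_sds = [10.00, 7.29, 6.61, 5.87, 5.84]

f2_f_ratio = f_from_summary(f2_means, f2_sds, n=100)
```

With five groups of 100 protocols, the corresponding degrees of freedom are 4 and 495.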

Classification Accuracy Data

Table 2 presents the data concerning the classification accuracy achieved by the use
of the F1 scale at varying T-score cutoff levels in discriminating protocols produced
by adolescents in the clinical sample versus groups containing varying degrees of
randomly generated protocols. In addition, Table 2 presents the hit rate, PPP, and
NPP that result if classification accuracy remained consistent (in terms of proportions
correctly classified in each group), but the ratio of clinical to random participants
was modified from 1 clinical participant for every random participant (a 1:1
ratio) to 10 clinical participants per 1 random participant. The 10:1 ratio was
selected as the primary reference point for data analyses because it was felt to
represent a more realistic expectation regarding the occurrence of substantially random
protocols in most clinical settings. In addition, findings generated by varying cutting
scores are presented in Table 2 to allow the reader to better evaluate the relative
efficacy of each of the MMPI–A validity scales across a broad range of classification
criteria.
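
The effect of changing the clinical-to-random ratio while holding within-group classification proportions constant can be sketched directly from sensitivity and specificity. The function and parameter names below are assumptions. The input values (sensitivity = .65, specificity = .97) are those listed in Table 2 for the F1 scale at T ≥ 80 with Random Group 4.

```python
def predictive_power(sensitivity, specificity, clinical_per_random):
    """Hit rate, PPP, and NPP for a given ratio of clinical to random protocols."""
    base_rate = 1 / (1 + clinical_per_random)   # proportion of random protocols
    tp = sensitivity * base_rate                # random protocols flagged
    fn = (1 - sensitivity) * base_rate          # random protocols missed
    tn = specificity * (1 - base_rate)          # clinical protocols passed
    fp = (1 - specificity) * (1 - base_rate)    # clinical protocols flagged
    nan = float("nan")
    return {
        "hit_rate": tp + tn,
        "ppp": tp / (tp + fp) if tp + fp else nan,
        "npp": tn / (tn + fn) if tn + fn else nan,
    }


equal = predictive_power(0.65, 0.97, clinical_per_random=1)
ten_to_one = predictive_power(0.65, 0.97, clinical_per_random=10)
```

Rounded to two decimals, these inputs give hit rate, PPP, and NPP of .81, .96, and .73 at the 1:1 ratio and .94, .68, and .97 at the 10:1 ratio, in agreement with the corresponding row of Table 2. The drop in PPP at the lower base rate shows why a cutoff that looks strong at 1:1 can produce many false positives in settings where random protocols are uncommon.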
Table 3 presents comparable classification accuracy for the F2 scale in terms of
the application of five separate levels of T-score cutoffs, and Table 4 provides
comparable classification accuracy information for the MMPI–A F scale.
Table 5 provides the classification accuracy findings for the VRIN scale at five
cutoff score levels of T ≥ 60, T ≥ 70, T ≥ 75, T ≥ 80, and T ≥ 90.
Table 6 presents information concerning the classification accuracy derived by
the use of the absolute T-score difference between the F1 and F2 scales at five
cutting score levels.

TABLE 2
Hit Rate, PPP, NPP, Sensitivity, and Specificity for the F1 Scale at Five Cutting Scores

Comparison Group   Cut Score (T ≥)   Hit Rate       PPP            NPP            SEN    SPE
                                     10:1   1:1     10:1   1:1     10:1   1:1
Random Group 1     60                .58    .49     .09    .49     .91    .49     .38    .60
                   65                .66    .46     .07    .42     .90    .47     .21    .71
                   70                .75    .47     .06    .41     .90    .48     .13    .81
                   80                .89    .52     .17    .67     .91    .51     .06    .97
                   90                .91    .52     1.00   1.00    .91    .51     .03    1.00
Random Group 2     60                .59    .52     .10    .52     .91    .52     .44    .60
                   65                .67    .51     .10    .52     .91    .51     .31    .71
                   70                .76    .53     .11    .56     .91    .52     .24    .81
                   80                .89    .53     .21    .73     .91    .51     .08    .97
                   90                .91    .51     1.00   1.00    .91    .50     .01    1.00
Random Group 3     60                .59    .53     .10    .53     .92    .52     .45    .60
                   65                .67    .50     .09    .50     .91    .50     .29    .71
                   70                .75    .49     .08    .47     .91    .49     .17    .81
                   80                .89    .53     .21    .73     .91    .51     .08    .97
                   90                .91    .51     1.00   1.00    .91    .50     .01    1.00
Random Group 4     60                .64    .80     .20    .71     .99    .98     .99    .60
                   65                .73    .85     .25    .77     .99    .97     .98    .71
                   70                .82    .88     .33    .83     .99    .93     .94    .81
                   80                .94    .81     .68    .96     .97    .73     .65    .97
                   90                .94    .65     1.00   1.00    .93    .58     .29    1.00

Note. PPP = positive predictive power; NPP = negative predictive power; SEN = Sensitivity; SPE = Specificity.

DISCUSSION

Current results provide data on the effectiveness of a variety of MMPI–A scales and
indexes in accurately detecting the occurrence of protocols varying in the degree of
random responding. Although little research has focused on the frequency of random
responses, Berry, Wetter, Baer, Larsen, et al. (1992) and Baer et al. (1997)
reported that between 29% and 76% of respondents acknowledged one or more
random responses during administration of the MMPI–2 and MMPI–A, respectively.
However, the typical number of random MMPI–A responses in the Baer et
al. (1997) study was 13.64 for their sample of 14- through 17-year-old adolescents,
substantially fewer than the number of random responses found in the random groups
used in our study. In the absence of clear research findings on this topic, it would
seem reasonable to postulate that random responding and omitted responses may
be similar in that most respondents provide a few of both types of responses, but
relatively few adolescents or adults produce many omitted or random responses.
Thus, classification accuracy data provided by evaluations in which the ratio of
clinical sample protocols to random protocols was 10 to 1 are likely to be more
reflective of typical base rates for substantial random responding than the 1:1 ratio
also reported in this study.
Overall, the results indicate that several MMPI–A validity scales are
useful in detecting protocols that have a large number of random responses. All of
these MMPI–A indexes show more limited effectiveness, however, in detecting
partially random responding in latter stages of the test that involves less than half
the total MMPI–A protocol. As expected, F1 subscale T-score cutoffs were not
effective in identifying protocols that introduced random responding in the latter
half of the test booklet (i.e., Random Groups 1, 2, or 3). Because the F1 item pool
only appears in the first 236 items of the MMPI–A, random responding introduced
in the latter half of the test booklet could not influence F1 results. The data
presented in Table 2 indicate that the optimal F1 T-score cutoff was T ≥ 80 for Random
Group 4 (entirely random responses) in terms of producing the best balance
among the five classification accuracy indexes presented in this table.

TABLE 3
Hit Rate, PPP, NPP, Sensitivity, and Specificity for the F2 Scale at Five Cutting Scores

Comparison Group   Cut Score (T ≥)   Hit Rate       PPP            NPP            SEN    SPE
                                     10:1   1:1     10:1   1:1     10:1   1:1
Random Group 1     60                .74    .58     .14    .63     .93    .55     .37    .78
                   65                .80    .51     .10    .52     .91    .50     .14    .87
                   70                .84    .50     .10    .50     .91    .50     .08    .92
                   80                .90    .49     .00    .00     .91    .50     .00    .99
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 2     60                .79    .82     .27    .79     .98    .84     .85    .78
                   65                .84    .70     .29    .80     .95    .64     .52    .87
                   70                .86    .82     .26    .78     .93    .56     .28    .92
                   80                .90    .50     .09    .50     .90    .50     .01    .99
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 3     60                .80    .89     .31    .82     1.00   1.00    1.00   .78
                   65                .88    .92     .43    .88     1.00   .97     .97    .87
                   70                .91    .88     .51    .91     .98    .84     .83    .92
                   80                .92    .59     .66    .95     .92    .55     .19    .99
                   90                .91    .50     1.00   1.00    .91    .50     .01    1.00
Random Group 4     60                .80    .89     .31    .82     1.00   1.00    1.00   .78
                   65                .88    .90     .42    .88     .99    .94     .94    .87
                   70                .91    .87     .51    .91     .98    .84     .82    .92
                   80                .92    .62     .71    .96     .93    .57     .25    .99
                   90                .91    .50     .00    .00     .91    .50     .00    1.00

Note. PPP = positive predictive power; NPP = negative predictive power; SEN = Sensitivity; SPE = Specificity.

TABLE 4
Hit Rate, PPP, NPP, Sensitivity, and Specificity for the F Scale at Five Cutting Scores

Comparison Group   Cut Score (T ≥)   Hit Rate       PPP            NPP            SEN    SPE
                                     10:1   1:1     10:1   1:1     10:1   1:1
Random Group 1     60                .67    .53     .11    .55     .92    .52     .36    .70
                   65                .75    .51     .10    .51     .91    .50     .20    .81
                   70                .81    .50     .08    .48     .91    .50     .11    .88
                   80                .90    .51     .17    .67     .91    .50     .02    .99
                   90                .91    .51     1.00   1.00    .91    .50     .01    1.00
Random Group 2     60                .69    .65     .16    .66     .94    .63     .59    .70
                   65                .78    .62     .18    .69     .93    .59     .43    .81
                   70                .82    .57     .17    .68     .92    .54     .25    .88
                   80                .90    .52     .33    .83     .91    .51     .05    .99
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 3     60                .72    .80     .23    .75     .99    .88     .91    .70
                   65                .81    .78     .28    .80     .97    .76     .75    .81
                   70                .85    .70     .30    .81     .95    .64     .51    .88
                   80                .91    .52     .38    .86     .91    .51     .06    .99
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 4     60                .73    .85     .25    .77     1.00   1.00    1.00   .70
                   65                .83    .90     .34    .84     .99    .99     .99    .81
                   70                .89    .92     .44    .89     .99    .96     .96    .88
                   80                .94    .74     .83    .98     .95    .66     .48    .99
                   90                .91    .53     1.00   1.00    .91    .51     .06    1.00

Note. PPP = positive predictive power; NPP = negative predictive power; SEN = Sensitivity; SPE = Specificity.

T-score cutoffs for the F2 subscale, F, and VRIN all demonstrated limited or
mixed effectiveness in identifying randomness in the latter half of the
test booklet and appeared to be effective in this discrimination task only when
randomness involved the entire latter half of the test booklet (Random
Group 3) or the entire protocol (Random Group 4). For these classification
tasks, the optimal T-score cutoff for the F and F2 scales appeared to be around
T ≥ 80 when standard and random protocols were configured in a ratio of 10:1.
Similarly, results show that the VRIN scale was not able
to achieve an impressive level of classification accuracy in the task of separating
standardly administered protocols from protocols that were partially (one third or
two thirds) random in the latter half of the test booklet (i.e., Random Group 1 or
Random Group 2). Although adequate levels of overall hit rate classification were
produced for these groups, sensitivity and PPP findings reflected dramatic limitations.
In contrast, the VRIN scale was more effective in distinguishing standardly
administered protocols from protocols that were entirely random in the latter half
of the booklet (T ≥ 60 appeared to be the best balanced cutoff score) and most
effective in distinguishing standardly administered protocols from the completely
random protocols in Random Group 4 at a T score ≥ 75 for the 10:1 ratio of clinical
to random protocols. Although review of the classification accuracy for the F, F2, and
VRIN scales suggests that these scales are relatively equivalent in terms of yielding
a high overall hit rate, Baer et al. (1999) noted that the VRIN scale may have
the advantage of being sensitive only to random responding, whereas F scale
elevations reflect both random patterns and overreporting response sets.
Current findings for the F scale, the F subscales, and the VRIN scale can be
compared to prior MMPI–A literature as reported by Archer and Elkins (1999),
Baer et al. (1997), and Baer et al. (1999). In general, the overall classification
accuracy found for these scales in the current research is very similar to the findings
reported by Archer and Elkins in their comparison of protocols from 354
adolescents under standard administration in a clinical sample in contrast to 354
entirely random protocols. Additionally, these results were consistent with
findings reported by Baer et al. (1997) in their MMPI–A comparisons of 106
adolescents who completed the MMPI–A using standard instructions in contrast to
MMPI–A protocols that were systematically varied in the extent of randomness

TABLE 5
Hit Rate, PPP, NPP, Sensitivity, and Specificity for the VRIN Scale at Five Cutting Scores

Comparison Group   Cut Score         Hit Rate       PPP            NPP            SEN    SPE
                                     10:1   1:1     10:1   1:1     10:1   1:1
Random Group 1     60                .80    .59     .18    .68     .93    .56     .32    .85
                   70                .89    .51     .14    .63     .91    .51     .05    .97
                   75                .90    .50     .00    .00     .91    .50     .00    .99
                   80                .91    .50     .00    .00     .91    .50     .00    1.00
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 2     60                .82    .67     .25    .77     .94    .63     .49    .85
                   70                .89    .53     .25    .77     .92    .52     .10    .97
                   75                .90    .52     .33    .83     .91    .51     .05    .99
                   80                .91    .50     .00    .00     .91    .50     .00    1.00
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 3     60                .84    .78     .32    .82     .97    .74     .70    .85
                   70                .91    .63     .49    .91     .93    .58     .29    .97
                   75                .92    .59     .64    .95     .92    .55     .18    .99
                   80                .91    .52     1.00   1.00    .91    .51     .03    1.00
                   90                .91    .50     .00    .00     .91    .50     .00    1.00
Random Group 4     60                .86    .92     .40    .87     .99    .99     .99    .85
                   70                .94    .82     .69    .96     .97    .74     .66    .97
                   75                .95    .78     .85    .98     .96    .70     .57    .99
                   80                .93    .62     1.00   1.00    .93    .57     .23    1.00
                   90                .91    .51     1.00   1.00    .91    .50     .01    1.00

Note. PPP = positive predictive power; NPP = negative predictive power; SEN = Sensitivity; SPE = Specificity.

TABLE 6
Hit Rate, PPP, NPP, Sensitivity, and Specificity for the |F1 – F2| Index at Five Cutting Scores

                               Hit Rate        PPP           NPP
Comparison Group  Cut Score   10:1   1:1    10:1   1:1    10:1   1:1    SEN    SPE

Random Group 1       10       .64    .45    0.06   0.41   .90    .47    .22    0.68
                     15       .81    .49    0.07   0.43   .91    .49    .09    0.88
                     20       .87    .51    0.11   0.55   .91    .50    .06    0.95
                     25       .91    .52    1.00   1.00   .91    .51    .04    1.00
                     30       .91    .52    1.00   1.00   .91    .51    .03    1.00
Random Group 2       10       .66    .59    0.14   0.61   .93    .58    .50    0.68
                     15       .82    .56    0.17   0.67   .92    .54    .24    0.88
                     20       .87    .50    0.09   0.50   .91    .50    .05    0.95
                     25       .91    .51    1.00   1.00   .91    .50    .01    1.00
                     30       .91    .50    0.00   0.00   .91    .50    .00    1.00
Random Group 3       10       .69    .72    0.19   0.70   .96    .73    .75    0.68
                     15       .85    .73    0.32   0.83   .95    .67    .57    0.88
                     20       .90    .67    0.44   0.89   .94    .60    .39    0.95
                     25       .93    .60    1.00   1.00   .93    .56    .20    1.00
                     30       .92    .57    1.00   1.00   .92    .53    .13    1.00
Random Group 4       10       .67    .60    0.14   0.62   .93    .59    .52    0.68
                     15       .83    .62    0.23   0.75   .93    .58    .35    0.88
                     20       .88    .58    0.29   0.80   .92    .54    .20    0.95
                     25       .92    .55    1.00   1.00   .92    .52    .09    1.00
                     30       .91    .51    1.00   1.00   .91    .50    .01    1.00

Note. 10:1 and 1:1 = ratio of clinical to random protocols; PPP = positive predictive power; NPP = negative predictive power; SEN = Sensitivity; SPE = Specificity.

such that groups contained 25% random, 50% random, 75% random, or 100%
random responses. Although Baer and her colleagues (1997) concluded that the
F1, F2, F, and VRIN scales were sensitive in detecting protocols that were par-
tially random, they reported mean T scores for these validity indexes that were
not clinically elevated when the randomness was restricted to the last quarter of
the test booklet. Thus, their findings appear to parallel ours in indicating that re-
spondents who have “gone random” during only part of the latter half of the test
booklet may be quite difficult to detect through the use of standard MMPI–A va-
lidity measures.
A major focus of this study involved the evaluation of the |F1 – F2| index in de-
tection of randomness in MMPI–A protocols. Butcher et al. (1992) in the
MMPI–A manual noted that a comparison of the T-score values generated by an
adolescent on the F1 and F2 subscales may be useful in identifying adolescents who
begin random responding during the latter half of the test booklet. Specifically,
Butcher et al. (1992) observed that an “acceptable T-score elevation on F1 in com-
bination with an elevated value on the F2 scale” (p. 40) might indicate randomness
introduced during the latter stages of the testing session. Archer (1997) speculated
that if the F1 and F2 subscales are used in this manner, a minimum of 20 T-score
points difference would be needed before inferring that a change in response style
was manifested in the latter half of the test. Surprisingly, the findings from this
study indicate that the absolute T-score difference found between the F1 and F2
subscales is not effective in identifying random response protocols. Specifically,
the |F1 – F2| index was consistently unable to generate both high PPP and sensitiv-
ity at any of the cutoff levels used in this study when comparing the standardly ad-
ministered protocols produced by the clinical group with any of the partially or
fully random protocol groups. In general, the overall pattern suggests that when
the T-score difference cutoffs became large enough to generate acceptable levels
of PPP (probability an elevated score reflects a random protocol), so few random
protocols were found above that criterion score that test sensitivity became
markedly low. The obvious question is why the T-score difference between
F1 and F2, promising in theory, is of so little practical value in identifying
random responses in the current investigation.
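
The trade-off between PPP and sensitivity described above follows from how the classification statistics are derived. As a minimal sketch (not the authors' analysis code), the following Python function recomputes hit rate, PPP, and NPP from a sensitivity, a specificity, and a ratio of standard to random protocols. The function name, its parameters, and the convention of returning 0.0 when no protocol falls on one side of the cutoff are assumptions made for illustration. The inputs in the usage lines are the rounded SEN and SPE values reported in Table 6 for Random Group 4, so the results only approximate the tabled values.

```python
def classification_stats(sensitivity, specificity, valid_per_random=1.0):
    """Return (hit_rate, ppp, npp) for a given ratio of standard to random protocols.

    valid_per_random is the number of standardly administered protocols
    per random protocol (10 for the 10:1 condition, 1 for the 1:1 condition).
    """
    n_valid = float(valid_per_random)
    n_random = 1.0

    true_pos = sensitivity * n_random          # random protocols flagged as random
    false_neg = (1.0 - sensitivity) * n_random  # random protocols missed
    true_neg = specificity * n_valid           # standard protocols correctly cleared
    false_pos = (1.0 - specificity) * n_valid  # standard protocols wrongly flagged

    hit_rate = (true_pos + true_neg) / (n_valid + n_random)

    flagged = true_pos + false_pos
    cleared = true_neg + false_neg
    # Assumed convention for this sketch: report 0.0 when the denominator is empty.
    ppp = true_pos / flagged if flagged > 0 else 0.0
    npp = true_neg / cleared if cleared > 0 else 0.0
    return hit_rate, ppp, npp


# Usage: |F1 - F2| index, Random Group 4, 10:1 ratio, cutoffs of 20 and 25.
for cut, sen, spe in [(20, 0.20, 0.95), (25, 0.09, 1.00)]:
    hit, ppp, npp = classification_stats(sen, spe, valid_per_random=10)
    print(cut, round(hit, 2), round(ppp, 2), round(npp, 2))
```

With these inputs, raising the cutoff from 20 to 25 moves PPP from roughly .29 to 1.00 while sensitivity drops from .20 to .09, which is the pattern the paragraph describes.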
A possible, or partial, explanation for this limitation of the |F1 – F2| index may
be seen in the tendency of the F1 subscale to produce higher T-score values than
the F2 subscale under all-random conditions as well as under standard
administration conditions, as found for clinical samples in several studies. To
illustrate, completely random protocols have been shown to produce mean F1
values that range from a T score of 84 in our study to values as high as 91 in
the prior research by Archer and Elkins (1999), Baer et al. (1997), and Baer et al.
(1999). In contrast, the mean F2 T-score values for these random samples are
much lower, varying narrowly from 74 to 75. Furthermore,
in our current research and in the research by Archer and Elkins (1999) and Baer et
al. (1999), the mean T-score values for F1 under standard administration condi-
tions to a clinical sample were between 58 and 60, whereas the F2 mean T score un-
der standard conditions was between 52 and 55. The net effect of these phenomena
is that limited randomness introduced solely into the second half of the test booklet
appears to have little effect on the mean T-score difference between F1 and F2, and
it requires a marked degree of randomness isolated in the second half of the test
booklet before these difference scores become substantially elevated. It is interest-
ing to note that Cramer (1995) also found marked limitations for a |F – Fb| index in
distinguishing valid MMPI–2 profiles from MMPI–2 protocols containing vary-
ing degrees of randomness. This latter limitation may be related to the recent ob-
servation by Friedman, Lewak, Nichols, and Webb (2001) that the MMPI–2 F
scale may emphasize more items related to psychoticism, whereas the Fb scale has
a greater emphasis on items reflecting acute distress. Although no similar item
content evaluation has been conducted on the MMPI–A F subscales, our results re-
flect the effects created by the mean and standard deviation differences between F1
and F2. The MMPI–A manual (Butcher et al., 1992) shows that F2 has higher
mean raw scores and larger standard deviations than the F1 subscale for both boys
and girls in the normative sample.
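
To make the arithmetic concrete, the sketch below computes the |F1 – F2| index from the ranges of mean T scores quoted in the preceding paragraph. It is only a rough illustration: it takes differences between group means rather than averaging individual absolute differences, and the third scenario (F1 at its standard-administration level, F2 at its all-random level) is a hypothetical assumption meant to approximate complete randomness confined to the second half of the booklet, not a condition reported in the study. The function and variable names are likewise illustrative.

```python
def abs_f1_f2_index(f1_t, f2_t):
    """Absolute T-score difference between the F1 and F2 subscales."""
    return abs(f1_t - f2_t)


def index_range(f1_range, f2_range):
    """Smallest and largest |F1 - F2| over the end points of two T-score ranges.

    Checking only the end points is sufficient here because the F1 and F2
    ranges used below do not overlap.
    """
    values = [abs_f1_f2_index(f1, f2) for f1 in f1_range for f2 in f2_range]
    return min(values), max(values)


# Ranges of mean T scores as quoted in the text.
STANDARD = {"F1": (58, 60), "F2": (52, 55)}   # standard administration, clinical samples
ALL_RANDOM = {"F1": (84, 91), "F2": (74, 75)}  # completely random protocols

scenarios = {
    "standard administration": (STANDARD["F1"], STANDARD["F2"]),
    "completely random": (ALL_RANDOM["F1"], ALL_RANDOM["F2"]),
    # Hypothetical assumption, not a reported condition:
    "second half random (assumed)": (STANDARD["F1"], ALL_RANDOM["F2"]),
}

for label, (f1_range, f2_range) in scenarios.items():
    low, high = index_range(f1_range, f2_range)
    print(f"{label}: |F1 - F2| between {low} and {high}")
```

Under these assumptions, every scenario stays below the 20-point difference suggested by Archer (1997), which is consistent with the low sensitivity observed at the higher cutoffs.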
In summary, current results indicate that the F, F1, F2, and the VRIN scales ap-
pear individually useful in the detection of protocols that have a high percentage of
random responding but are less effective in the detection of partially random pro-
tocols. Clinicians should be cautious in concluding that partially random protocols
can be ruled out based on any of the validity scales utilized in this study. In particu-
lar, these findings do not support the use of the |F1 – F2| index in the identification
of random responding, but our research design cannot rule out the possibility that
such an index might be useful if calculated or configured in a different manner. For
example, although Cramer (1995) found that a |F – Fb| index was not useful in
identifying randomness on the MMPI–2, the author did find that an index composed
of F + Fb + |F – Fb| could reliably distinguish several levels of randomness.
However, even if future investigations of F1 and F2 identify a mathematical
configuration of these two subscales that yields useful levels of accurate detection
of random profiles, the practical utility of this type of relatively complicated measure
would require a reliable demonstration of classification accuracy characteristics
that exceed those currently achievable through standardly used MMPI–A validity
scales such as the F and VRIN scales.

REFERENCES

Archer, R. P. (1997). MMPI–A: Assessing adolescent psychopathology (2nd ed.). Mahwah, NJ: Law-
rence Erlbaum Associates, Inc.
Archer, R. P., & Elkins, D. E. (1999). Identification of random responding on the MMPI–A. Journal of
Personality Assessment, 73, 407–421.
Baer, R. A., Ballenger, J., Berry, D. T. R., & Wetter, M. W. (1997). Detection of random responding on
the MMPI–A. Journal of Personality Assessment, 68, 139–151.
Baer, R. A., Kroll, L. S., Rinaldo, J., & Ballenger, J. (1999). Detecting and discriminating between ran-
dom responding and overreporting on the MMPI–A. Journal of Personality Assessment, 72,
308–320.
Berry, D. T. R., Wetter, M. W., Baer, R. A., Larsen, L. H., Clark, C., & Monroe, K. (1992). MMPI–2 ran-
dom responding indices: Validation using a self-report methodology. Psychological Assessment, 4,
340–345.
Berry, D. T. R., Wetter, M. W., Baer, R. A., Widiger, T. A., Sumpter, J. C., Reynolds, S. K., et al. (1991).
Detection of random responding on the MMPI–2: Utility of the F, back F, and VRIN scales. Psycho-
logical Assessment, 3, 418–423.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). MMPI–2:
Minnesota Multiphasic Personality Inventory–2: Manual for administration and scoring. Minneapolis:
University of Minnesota Press.
Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R. P., Tellegen, A., Ben-Porath, Y. S., et al.
(1992). MMPI–A: Minnesota Multiphasic Personality Inventory–Adolescent: Manual for adminis-
tration, scoring, and interpretation. Minneapolis: University of Minnesota Press.
Cramer, K. M. (1995). Comparing three new MMPI–2 randomness indices in a novel procedure for ran-
dom profile derivation. Journal of Personality Assessment, 65, 514–520.
Friedman, A. F., Lewak, R., Nichols, D. S., & Webb, J. T. (2001). Psychological assessment with the
MMPI–2. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Robert P. Archer
Department of Psychiatry
Eastern Virginia Medical School
825 Fairfax Avenue
Norfolk, VA 23507–1914

Received May 4, 2001


Revised October 30, 2001