Psychological
Assessment
VOLUME 8
A Continuation Order Plan is available for this series. A continuation order will
bring delivery of each new volume immediately upon publication. Volumes are billed
only upon actual shipment. For further information please contact the publisher.
Advances in
Psychological
Assessment
VOLUME 8
Edited by
James C. Rosen
University of Vermont
Burlington, Vermont
and
Paul McReynolds
University of Nevada-Reno
Reno, Nevada
The present volume, like the earlier ones in this series, is designed to
help keep the assessment psychologist abreast of significant new
developments in the field. The series is addressed both to the practitioner and
to the researcher; it has also proved helpful to graduate students in
psychology, and this volume can usefully serve as a supplementary text
for graduate classes in assessment. The chapters in the present volume,
as in the preceding ones, are most directly relevant to workers in the
areas of measurement, clinical psychology, and personality psychology,
but some chapters will also be of use to specialists in other areas.
It has been the policy, in determining chapter topics and in soliciting
authors for volumes in this series, to first survey the field of psychological
assessment as a whole, and then to focus on those trends and techniques
that are particularly innovative and are at the forefront of current
development. Our basic aim is to highlight and articulate advances in the
field. It is, of course, not sufficient, in order for a theme to merit inclusion
in this series, that it be essentially new; equally important, the
contribution to the field of assessment must be substantial. This criterion means
that topics selected for inclusion in this series are rarely completely
novel; rather, they represent relatively new developments, which,
however, have been sufficiently tested and utilized to make it evident that
they truly are advances.
We feel that the eight chapters comprising the present volume, the
eighth in the ongoing series, meet these criteria particularly well, and it
is with considerable pleasure that we present them to the professional
assessment community.
We take this opportunity to thank Eliot Werner at Plenum Press for his
assistance and support in producing this book. Most of all, we express our
gratitude to the 14 authors whose scholarly contributions have resulted
in the excellence of this volume.
James C. Rosen
Paul McReynolds
Contents
CHAPTERS
Assessment of Creative Potential in Psychology and the
Development of a Creative Temperament Scale for the CPI
Harrison G. Gough
remaining three are directed toward adult assessment. Also, the first two
of the four (chapters 3 and 4) represent techniques in which the assessor
systematically evaluates the individual being assessed, whereas the
latter two chapters of this group (i.e., chapters 5 and 6) deal with
self-report inventories. Chapter 7, the penultimate contribution, approaches
assessment from a different perspective; rather than examining a specific
method of assessment, it focuses on a particular object of assessment, in
this case couples. The final chapter, number 8, has a still different
rationale, in that it is concerned with the assessment of an important
human potential, the capacity for creativity.
This, then, is the overall plan of the volume. We turn now to a more
detailed introduction to each chapter, beginning with Chapter 1, on the
assessment of intelligence.
Intelligence testing continues as one of the most basic areas of
assessment. Among the relatively recent technological developments in
this area are revisions of the WAIS, WPPSI, and Stanford-Binet,
publication of the Kaufman Assessment Battery for Children (K-ABC), and the
third edition of the WISC. Chapter 1, however, is not concerned with
particular intelligence tests, but rather takes a broader, more conceptual
approach to the assessment of intelligence. Its author, Robert J. Sternberg,
is an internationally renowned authority on the nature of intelligence, and
a sharp critic of traditional intelligence assessment. In his contribution
here he first provides a brief historical introduction to the measurement
of intelligence, with an emphasis on Binet's work, and then offers a broad,
overall schematization, built around the guiding concept of metaphor,
of seven different approaches to conceptualizing and assessing
intelligence. This scholarly and thought-provoking chapter not only
effectively broadens the scope within which intelligence is typically
conceived, but also provides an informed glimpse of what the intelligence
tests of the future (and perhaps the not too distant future) may well be like.
High-quality assessment, particularly when utilized in research, is
dependent not only on the availability of sound data-gathering
instruments and procedures, but also on adequate methods of data analysis.
This truism takes one into the realm of the statistical bases of
assessment. Though perhaps less familiar to most assessment
psychologists than the development of new instruments, major advances
have indeed been made recently in data analysis. One of the most
striking of these is the systematic procedure known as structural
equation modeling. This modern, highly sophisticated method for dealing
meaningfully with the combined effects of an assortment of variables is
the topic of Chapter 2. The authors, Lew Bank and G. R. Patterson, are
expert not only in the logic and mathematics of structural equation
modeling, but in addition have meaningfully utilized the technique in
their own influential studies on children and parents at the Oregon Social
Learning Center. Their contribution here includes illustrative examples
from that research. These examples demonstrate that structural
equation modeling, by its very nature, is closely tied to substantive
psychological theory, and can assist in deciding which of several
theoretical formulations is to be preferred.
Chapter 3, as noted earlier, is on the assessment of children and
adolescents. The author, Thomas M. Achenbach, is one of the most highly
respected and cited authorities in this area. He is responsible for the
development of several widely employed instruments, including the Child
Behavior Checklist (for parents) and the Teacher's Report Form, for
reporting specific child behaviors based on direct observation. In
addition, he has contributed in a major way to the theoretical understanding
of child psychopathology, especially with respect to taxonomical issues.
In his contribution to this volume, Achenbach, rather than concentrating
solely on particular instruments, draws a broader picture. He proposes
an orientation that he terms "multiaxial empirically based assessment".
This approach involves integrating data from a variety of sources,
including behavior ratings, interviews, ability tests, and physical evaluations.
Further, a computer program is available for the integration of certain
data on individual children. Both in terms of the individual measures
discussed and the overall assessment model described, this chapter
offers a wealth of insights to the child assessor.
With Chapter 4 we turn to assessment in a major area of
psychopathology: psychopathy. For many years the understanding of this disorder,
sometimes also referred to as sociopathy or antisocial personality
disorder, has constituted a major research problem. This situation was long
exacerbated by the lack of an adequate instrument for identifying and
describing the psychopath. In recent years, however, this lack has been
largely alleviated by Robert D. Hare and his associates in their
development of the Psychopathy Checklist. This instrument, now in a revised and
updated form, is without question a major advance, both for researchers
and clinicians. In their contribution to this volume Stephen D. Hart, Hare,
and Timothy J. Harpur provide an up-to-date overview of the Checklist,
beginning with an analysis of the concept of psychopathy and then
reviewing the psychometric properties, factor structure, and range of
application of the instrument. Their chapter is properly viewed both as
a basic contribution to the understanding of psychopathy and as an
examiner's guide for psychologists wishing to employ the Psychopathy
Checklist-Revised.
The MMPI has been with us for a half century; it was developed in the
early 1940s and first published in 1943. In the decades that followed it gained
an international reputation as the foremost broad-based psychological
Robert J. Sternberg
broaden them to take into account metaphors of mind other than the
geographic one.
The chapter is divided into three major parts. In the first, I will
consider two main historical traditions in the testing of intelligence, both
in terms of the theories of intelligence upon which much later work was
based, and in terms of the tests that emanated from these theories.
In the second part of the chapter, I will consider alternative
metaphors of mind and how they have influenced the testing of intelligence. In
the third part of the chapter, I will briefly summarize main points and
draw conclusions.
Binet's tests were not as closely allied to his theory as were Galton's.
Although the items do measure judgmental abilities, the judgments that
need to be made are somewhat artificial, and are generally removed from
real-world judgmental tasks. At best, they mimic simple school tasks, but
they capture little of the richness of the kinds of judgments we need to
make when we make important business or life decisions, such as
whether to buy a house or a car. Binet's theory was much closer in its
conception to real-world intelligence than was Galton's. His test items,
distant as they were from real-world problem solving and decision
making, were better predictors of these things than were Galton's test
items. But whereas Galton started a tradition of a close correspondence
between theory and test, Binet started a tradition of a much more modest
correspondence.
could argue that the present tests are quite narrow in what they measure
(e.g., Gardner, 1983; Sternberg, 1985), so that if a new test did correlate
highly with the old ones, it would indicate that the new test is just as
narrow as the old ones. In effect, it is a historical accident that the
intelligence-testing business got its main start when Binet was asked to
distinguish groups of students in school. Had he or someone else been
asked to do the same for performance at work, or in some other domain,
the tests that resulted might have been quite different, and possibly
different in kind.
Second, new tests might provide an operational basis for expanding
our concept of intelligence. To the extent that the tests do not correlate
highly with conventional ones, we need at least to be open to the
possibility that the new tests are measuring an aspect of intelligence that
the old ones do not measure.
Third, new tests can give us kinds of information that are not yielded
by conventional psychometric tests, regardless of the correlation of the
new tests with the conventional ones. They can not only give us new
information but also help us conceive of individual people's intelligence
in new ways. Let us therefore consider other metaphors, and both the
theories and tests that they have generated.
processes heavily, and in some cases have even been built around these
processes (e.g., Evans, 1968).
Historically, the greatest impetus for the computational approach to
understanding intelligence can be traced to the pioneering work of
Newell, Shaw, and Simon (1958) and others who constructed computer
programs that could perform "intelligently." These AI programs are
summarized in books such as Boden (1977) and Stillings et al. (1987), and
will not be reviewed here. Rather, I will concentrate on more recent
human-experimental work that has had more direct implications for the
testing of intelligence.
Hunt, Frost, and Lunneborg (1973) suggested that one way to understand
intelligence would be to test subjects in their ability to perform
tasks that contemporary cognitive psychologists believe measure basic
human information-processing ability. The proximal goal in this research
would be to estimate parameters (characteristic quantities) representing
the durations of performance for information-processing components
constituting each task, and then to investigate the extent to which these
components correlate across subjects with each other and with scores
on measures commonly believed to assess intelligence. Sternberg (1977)
later expanded upon this logic in his approach called "componential
analysis." The overall purpose of componential analysis is to identify the
low IQs will have noisy channels of information processing. When evoked
potentials are averaged out, the potentials will have a smoother
appearance (because of averaging over the noise) than those produced by
individuals with more consistent and less noisy channels.
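The averaging argument can be made concrete with a small Python simulation, offered only as a sketch under stated assumptions. It treats a "noisy channel" as trial-to-trial variability in the latency of an otherwise identical response (an assumption; the chapter does not specify a noise model), averages the trials, and measures smoothness as the summed absolute point-to-point change of the averaged trace, an index of the kind sometimes called "string length". The waveform shape, the 10 Hz frequency, and every parameter value are hypothetical.

```python
import math


def evoked_waveform(t, latency_shift=0.0, freq=10.0, center=0.4, width=0.15):
    """Idealized single-trial response: a 10 Hz oscillation under a Gaussian
    envelope, delayed by latency_shift seconds (all values hypothetical)."""
    x = t - latency_shift
    return math.sin(2 * math.pi * freq * x) * math.exp(-(((x - center) / width) ** 2))


def averaged_potential(jitter, n_trials=41, n_samples=500):
    """Average n_trials responses whose latencies are spread evenly over
    [-jitter, +jitter] seconds, sampled across a 1-second window."""
    times = [i / n_samples for i in range(n_samples)]
    if n_trials == 1:
        shifts = [0.0]
    else:
        shifts = [-jitter + 2 * jitter * k / (n_trials - 1) for k in range(n_trials)]
    return [sum(evoked_waveform(t, s) for s in shifts) / len(shifts) for t in times]


def string_length(signal):
    """Summed absolute point-to-point change; a smoother trace scores lower."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


consistent = string_length(averaged_potential(jitter=0.0))  # stable latencies
noisy = string_length(averaged_potential(jitter=0.05))      # latencies spread over one 10 Hz cycle
print(f"consistent: {consistent:.2f}  noisy: {noisy:.2f}")
```

With stable latencies the average simply reproduces the single-trial waveform; when latencies are spread over a full cycle, the peaks of some trials fall on the troughs of others, so the averaged record is much flatter and its string length much shorter. Purely additive random noise would not produce this effect, which is why the sketch models noise as latency variability.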
The third of the biological approaches relates cerebral blood flow to
intelligence. The idea is that blood goes to portions of the brain that are
being used in the processing of a task. It is possible to use radioactive
tracers that are inhaled in order to monitor flow of blood during
information processing. Using this approach, one could monitor blood flow as a
function of the task being performed and who is performing it.
Biological tests. Biologically-based tests provide information that is
different in kind from either geographically-based or
computationally-based ones. Biological tests may indicate specific neuropsychological
deficits, patterns of hemispheric specialization, performances of
different regions of the brain, or in the case of evoked-potential measurement,
patterns of brain waves. The interpretability of this information, as in the
case of any test information, will depend upon the quality of the theory
upon which the test is based. But biologically-based tests are, for the most
part, the only ones that really map onto brain functioning, whether
directly or indirectly.
Halstead constructed a test of functioning based upon his theory, and
more recently, J.P. Das and Jack Naglieri have been working on a test
based on Luria's theory. Das and his colleagues have constructed an
impressive array of tests that measure all three aspects of functioning in
Luria's theory, namely, attention-arousal, planning, and mode of
processing. With respect to the last, Das's tests measure both simultaneous and
successive processing. Simultaneous processing refers to the parallel
processing of multiple chunks of information at a time. Successive or
sequential processing refers to serial processing of chunks of
information, one following the other. Tests such as Raven Matrices and Gestalt
Closure would measure simultaneous processing, whereas serial-recall
tests would measure successive processing.
Das and Naglieri are not the first to construct a test based on Luria's
theory. Two other such tests are the Luria-Nebraska Neuropsychological
Battery (LNNB) (Golden, 1981) and the Kaufman Assessment Battery for
Children (K-ABC) (Kaufman & Kaufman, 1983). This latter test does not
measure the attention-arousal and planning functions separately, but it
does measure simultaneous and successive processing and provides
separate scores for each. I have reviewed this test elsewhere in some
detail (Sternberg, 1985). The K-ABC also has a separate achievement
section, which is similar to what one would find on other tests of verbal
intelligence, such as the Stanford-Binet.
set were most alike. The child was then asked why they were most alike.
Subjects were selected from children in three populations: Bush children
who had not attended school, children in school from the same town as
the Bush children, and school children living in Dakar, the capital of
Senegal. Greenfield found that children who had attended school,
regardless of where, performed much as American children did. Preference for
color decreased sharply with grade, whereas preference for form and
function increased. Moreover, an increasing proportion of older children
justified their classifications in terms of subordinate categories. Children
who had not attended school and lived in the Bush responded quite
differently. They showed a greater preference for color with increasing
age, and rarely justified responses in terms of subordinate language
structure.
Even when the objects to be dealt with are familiar, the way they are
typically used or thought about may shape how people perform
with them. When Cole and his colleagues (Cole, Gay, Glick, & Sharp, 1971)
asked adult Kpelle tribesmen to sort twenty familiar objects into "groups
of things that belong together," the subjects separated the objects into
functional groups (a knife with an orange, for example), as children in
Western societies do. The researchers had expected to see taxonomic
groupings (tools and foods, for example) from these adults, because
Western adults typically sort taxonomically. The Kpelle proved to be
perfectly capable of taxonomic sorting: When the subjects were asked to
sort the objects the way a fool would do it, they immediately arranged
them into neat piles of tools, foods, clothing, and utensils. Taxonomic
sorting of these objects seemed stupid to the Kpelle because it was
inconsistent with the way they deal with these objects in everyday life,
that is, functionally. In another classification task, the Kpelle sorted
leaves taxonomically (as either "tree" leaves or "vine" leaves) with
ease. In this case, the taxonomic approach seemed completely
appropriate. As farmers, the Kpelle are frequently called upon to make such
discriminations, and hence were comfortable adopting the taxonomic
sorting strategy.
In sum, tests based upon the anthropological metaphor need to be
tailored, not just translated or adjusted, to the culture in which the testing
is taking place. Tests based upon the internally-oriented metaphors have
been used with almost no modification across cultures, under the
assumption that what they measure should be universal. But this is a big
assumption. Changing the content vehicle or the format of a test, or even
the location in which the test is given, can have a major effect upon test
scores. Thus, children who might look quite stupid on tests based upon
metaphors that view intelligence as inside the head, might look quite
smart on tests based on metaphors that are oriented toward the outside.
children rather difficult tasks to solve. Initially, he or she looks at how the
children solve the tasks without any intervention on the part of the
examiner. Then, children receive carefully graded, sequential hints, and
the examiner observes the children's ability to profit from these hints. In
this way, it becomes possible to observe the children's zone of proximal
development.
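The graded-hint procedure can be sketched in Python to show what such a session records. The sketch is hypothetical throughout: it assumes a fixed sequence of hints ordered from most general to most explicit, counts how many hints a child needs before solving each task, and summarizes the contrast between unaided and assisted success. The functions `hints_needed` and `summarize`, the five-task example, and the summary fields are illustrations, not the scoring of any published procedure.

```python
from typing import Callable, List, Optional


def hints_needed(solves_with: Callable[[int], bool], max_hints: int) -> Optional[int]:
    """Try the task unaided, then after each successive graded hint; return the
    number of hints given before success, or None if it is never solved."""
    for hints_given in range(max_hints + 1):  # 0 = unaided attempt
        if solves_with(hints_given):
            return hints_given
    return None


def summarize(results: List[Optional[int]]) -> dict:
    """Contrast unaided solutions with solutions reached only with help."""
    solved = [r for r in results if r is not None]
    return {
        "unaided": sum(1 for r in solved if r == 0),
        "with_hints": sum(1 for r in solved if r > 0),
        "unsolved": len(results) - len(solved),
        "mean_hints_when_solved": sum(solved) / len(solved) if solved else None,
    }


# Hypothetical child: hints each of five tasks would require
# (None = not solved even after every hint the examiner has).
required = [0, 2, 1, None, 3]
results = [
    hints_needed(lambda h, need=need: need is not None and h >= need, max_hints=4)
    for need in required
]
print(results)  # [0, 2, 1, None, 3]
print(summarize(results))
```

A count like this is only as fair as the hints themselves: the same child could need fewer hints with a form of instruction better suited to him or her.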
Although I initially had some doubts as to whether the tests of the
zone of proximal development measure what they are supposed to
measure, the results of Brown and her colleagues and of Feuerstein are very
encouraging. I remain concerned, however, that the operationalization of
the zone of proximal development may not sufficiently take into account
individual differences in abilities and styles of learning. The instruction
that works well for one child might work only poorly for another child,
with the result that the first child might appear to have a larger zone of
proximal development than the second. In order for the measure to be
fair, we would have to make sure that the form of instruction used was
equally suitable for all children receiving that instruction, and it is
unlikely that any form of instruction will be equally suitable for all. Hence,
I believe that we do have to be careful in our interpretation of results of
tests that measure the zone of proximal development. Moreover, we need
to recognize that there may be zones of proximal development that are
domain-specific rather than domain-general, and that may differ not only
as a function of domain but as a function of how learning takes place. For
example, some children might learn quite well with the kind of direct
instruction given in tests of the zone of proximal development, whereas
other students might learn better on their own.
These concerns notwithstanding, the zone of proximal development
is one of the more exciting concepts in the psychology of intelligence,
because it gives us a way of addressing the question of what will happen
in the future, not based just upon retrospective measurement, but based
upon simulations of prospective processing of information. The dynamic
form of testing is quite different from the static form used under the
geographic metaphor. Dynamic testing may well be the wave of the future
in terms of understanding not only to what point people have arrived, but
also to what point they are going.
Intelligence as Viewed
Internally and Externally to the Individual
The Systems Metaphor
The systems metaphor is an attempt to bring together various other
metaphors by viewing intelligence in terms of a complex interaction of
various cognitive and other systems. I will describe here two attempts to
understand intelligence in terms of interacting systems: Gardner's
(1983) theory of multiple intelligences and Sternberg's (1985, 1988)
triarchic theory of human intelligence.
Systems theories. The two systems theories to be considered here
are similar in viewing conventional theories of intelligence as too narrow,
but are different in the way they propose to expand our conception of
intelligence. Consider each theory in turn.
Howard Gardner's (1983) theory of multiple intelligences may be
viewed as having three fundamental principles. First, intelligence is not a
single thing, whether viewed unitarily or as comprising multiple abilities.
Rather, there are multiple intelligences, each distinct from the others.
The multiple intelligences Gardner proposes in his 1983 book are
linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic,
interpersonal, and intrapersonal. In some ways, the distinction between
positing one intelligence comprising multiple abilities and positing
multiple intelligences, each distinct from the others, is subtle. But the
positing of multiple intelligences emphasizes the separateness of each
set of skills, and also emphasizes Gardner's view that each intelligence is
a system in its own right, rather than merely one aspect of a larger system,
namely, what we traditionally call "intelligence." The second fundamental
principle is that these intelligences are independent of each other. In
other words, a person's abilities as assessed under one intelligence
should, in theory, be unpredictive of that person's abilities as assessed
under another intelligence. Obviously, the claim of independence is a
strong one, but Gardner believes that it is justified by what we know about
the mind. The third fundamental principle is that the intelligences
interact. Although they are distinct from each other, no one could ever get
anything done if their distinctness and independence meant that they
could not work together. In such an instance, a mathematical word
problem requiring, say, the application of both linguistic and logical-
mathematical intelligences would be insoluble.
Gardner defines an intelligence as "an ability or set of abilities that
permits an individual to solve problems or fashion products that are of
consequence in a particular cultural setting" (Walters & Gardner, 1986, p.
165). How do we know what constitutes an intelligence? In other words,
what criteria can one use to identify the multiple intelligences in Gardner's
theory, or other possible intelligences that have not yet been identified?
Gardner proposes eight criteria for distinguishing an independent intel-
ligence: potential isolation by brain damage; the existence of idiots
savants, prodigies, and other exceptional individuals; an identifiable core
operation or set of operations; a distinctive developmental history, along
with a definable set of expert "end-state" performances; an evolutionary
history and evolutionary plausibility; support from
experimental-psychological investigations; support from psychometric findings; and
susceptibility to encoding in a symbol system.
Sternberg's (1985, 1988) triarchic theory, as its name implies, has
three parts. The first, "componential subtheory," relates intelligence to
the internal world of the individual, or the mental mechanisms that
underlie intelligent behavior. It specifies three kinds of information-
processing components: metacomponents, which are higher order
executive processes used to plan what one is going to do, to monitor it
while one is doing it, and to evaluate it after it is done; performance
components, which are lower order processes that execute the
instructions of the metacomponents in order to perform tasks; and
knowledge-acquisition components, which are used to learn how to do
what the metacomponents and performance components eventually do.
Components of information processing are always applied to tasks
with which one has some level of prior experience (including the null
level) and in situations with which one has some level of prior experience
(including the null level). Hence, these internal mechanisms are closely
tied to one's experience. The second, "experiential subtheory," specifies
that the components are not equally good measures of intelligence at all
levels of experience. Assessing intelligence requires one to consider not
only the components, but the level of experience with which they are
applied.
According to the experiential subtheory, intelligence is best
measured at those regions of the experiential continuum that involve
application of information-processing components to tasks or situations
that are either relatively novel, on the one hand, or in the process of
becoming automatized, on the other. If a task is too unfamiliar, such as a
trigonometry problem presented to a first-grader, it will not measure
intelligence because the individual will have virtually no mental
resources to bring to bear on the problem. If the task is already automatized,
one will have no sense of the history of how efficaciously that
automatization was accomplished, whether it took one week or one year. The
ability to deal with novelty and the ability to automatize information
processing are interrelated. If one is well able to automatize, one has more
resources left over for dealing with novelty. Similarly, if one is well able
to deal with novelty, one has more resources left over for automatization.
Thus, performances at the various levels of the experiential continuum are
related to one another.
These abilities should not be viewed in a vacuum with respect to the
componential subtheory. The components of intelligence are applied to
tasks and situations at various levels of experience: coping with novelty
is via the components, and what is automatized is a set of components of
information processing.
According to the third, "contextual subtheory," intelligent thought is
directed toward one or more of three behavioral goals: adaptation to an
environment, shaping of an environment, or selection of an environment.
These three goals may be viewed as the functions toward which
intelligence is directed: Intelligence is not aimless or random mental activity
that happens to involve certain components of information processing at
certain levels of experience. Rather, these components are purposefully
directed toward the pursuit of these three global goals, regardless of the
level of experience at which the components are executed. The nub of the
triarchic theory of intelligence is that intelligence involves recognizing
and capitalizing on one's strengths, and recognizing and either
compensating for or remediating one's weaknesses. Thus, people may differ
widely in how they are intelligent, but they find some way in which they
excel, and then make the most of it.
Systems tests. In the systems approaches, the actual testing that is
done will depend on the way the system of the mind is conceived. Howard
Gardner and David Feldman, in their project Spectrum, are developing
tests based on Gardner's (1983) theory of multiple intelligences. These
tests, unlike the conventional ones, are not paper-and-pencil, but rather
measure children's thinking skills in an enriched classroom environment
where children are performing criterion activities. Thus, linguistic
intelligence might be measured by having children write a poem, or
bodily-kinesthetic intelligence by having them dance or play a sport.
Sternberg is currently developing a test based on his triarchic theory
of intelligence, which measures each of componential skills, coping-with-novelty
skills, automatization skills, and practical-intellectual skills in
verbal, quantitative, and figural domains. The test, for kindergarten
through adulthood, is at multiple levels, and is a group test. The
componential items are most similar to those on a standard intelligence test,
including things like learning meanings of words from context, number
series, and figural analogies. The coping-with-novelty tests require
subjects to solve problems that are based on a novel premise. For
example, novel verbal analogies have students solve analogies that are
preceded by a premise that may be either factual (e.g., canaries sing
songs) or counterfactual (e.g., canaries play hopscotch). In each case, the
subjects would have to solve the analogies as though the premise were
true. The novel number matrices, used to measure coping with novelty in
the quantitative domain, require subjects to complete number-matrix
problems in which numerals and symbols that have been set equal to
various numbers are freely interchanged. The figural coping-with-novelty
task requires solving figural series in which the series show a
discontinuity in the middle, and the subject must infer how to extrapolate from the
first domain in the series to the second. All three automatization
subtests (verbal, quantitative, and figural) require subjects rapidly to indicate
whether two different symbols are of the same class or not, for example,
whether two numbers are both even or both odd. Practical verbal
problems require everyday inferences, practical math problems require
everyday math, and practical figural problems require planning of routes
in a way that is time-efficient and effective. In the triarchic approach, the
testing of intelligence is closely linked to the teaching of intelligence, and
there exists a program at the high school-college level, Intelligence
Applied (Sternberg, 1986), which can be used in conjunction with testing
in order to enhance intellectual skills.
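For the numerical case, the answer key of a same-class automatization item, together with one possible way of scoring a speeded session, can be sketched in Python. The chapter does not describe how items are generated or scored, so the 1 to 99 number range, the `make_items` and `score` helpers, and the choice of accuracy plus median correct-response time are all hypothetical assumptions made for illustration.

```python
import random
import statistics
from typing import List, Tuple

Item = Tuple[int, int, bool]  # (first number, second number, keyed answer)


def same_class(a: int, b: int) -> bool:
    """Keyed answer: True if both numbers are even or both are odd."""
    return a % 2 == b % 2


def make_items(n: int, seed: int = 0) -> List[Item]:
    """Generate n hypothetical two-number items (values 1-99) with their key."""
    rng = random.Random(seed)
    items = []
    for _ in range(n):
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        items.append((a, b, same_class(a, b)))
    return items


def score(items: List[Item], responses: List[bool], seconds: List[float]) -> Tuple[float, float]:
    """Return (proportion correct, median response time on correct trials)."""
    correct_times = [
        t for (_, _, key), answer, t in zip(items, responses, seconds) if answer == key
    ]
    accuracy = len(correct_times) / len(items)
    median_rt = statistics.median(correct_times) if correct_times else float("nan")
    return accuracy, median_rt


items = [(4, 10, True), (3, 8, False), (7, 5, True)]
print(score(items, responses=[True, True, True], seconds=[0.5, 1.2, 1.0]))
# (0.6666666666666666, 0.75)
```

Timing only correct responses is one common convention for speeded tasks; whether the triarchic test scores speed, accuracy, or both is not stated here.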
Conclusion
The testing of intelligence can be as diverse and multifaceted as is
intelligence itself. One cannot beg the question of "What is intelligence?"
by saying, as did Boring, that intelligence is what the tests test, because
there are as many different kinds of tests as there are metaphors for
understanding intelligence. There can be different sorts of tests within
each metaphor, depending upon the particular theory within the
metaphor that generates the test. The conventional individual and group tests
we use to measure intelligence represent only a small sampling of the
ways in which intelligence might be tested. Almost all of the conventional
tests are based upon the geographic metaphor, but there is no reason in
principle why we need to test intelligence on this basis. Other metaphors
could help us assess aspects of performance that heretofore have been
neglected.
It is worth emphasizing again that metaphors are not right or wrong,
but more or less useful for particular purposes. Similarly, the theories
within the metaphors can be more or less useful for particular purposes.
Theories, unlike metaphors, can be proven to be wrong, although of
course, we cannot prove them to be right, but can only gather evidence
that is consistent with them. In comparing theories, we need to keep in
mind whether or not they were generated under the same metaphor,
because theories generated under different metaphors do not readily
34 ROBERT J. STERNBERG
lend themselves to comparison, any more than apples and oranges do.
They deal with different aspects of intelligence and accomplish different
goals, and hence are not, strictly speaking, comparable. Even within the
same metaphor, noncomparabilities can exist. For example, within the
biological metaphor, three distinct approaches were reviewed
(neuropsychological, electrophysiological, and blood flow), and
it would not be possible to compare directly across these procedures of
measurement. Each deals with a different class of phenomena.
The direction that much applied research has taken over the past
decade or so has been toward successively more refined psychometric
theories of measurement, and successively more refined delivery sys-
tems for measurement. Thus, we now have tailored tests, which typically
use computer technology to present existing tests in more efficient ways.
I am all in favor of psychometric and technological development. But I
personally believe that we need to apply more resources to questions of
what we want to measure before we apply resources to how we are going
to measure it. Historically, the link between theories and tests of intelli-
gence has not been as strong as I believe it should be. The link is always
there, to some extent, but even when it is there, we have often not been
conscious of it. We need more explicitly to state what we are assuming
about intelligence when we measure it through a given vehicle, and more
seriously to consider the alternatives to the vehicles we are using. The
metaphorical approach to understanding intelligence helps chart the
universe of possibilities for more informed and self-conscious testing of
intelligence.
References
Baltes, P. B., Dittmann-Kohli, F., & Dixon, R. A. (1984). New perspectives on the
development of intelligence in adulthood: Toward a dual-process conception
and a model of selective optimization with compensation. In P. B. Baltes & O.
G. Brim, Jr. (Eds.), Life-span development and behavior (Vol. 6, pp. 33-76). New
York: Academic Press.
Baron, J. (1985). Rationality and intelligence. New York: Cambridge University
Press.
Berry, J. W. (1974). Radical cultural relativism and the concept of intelligence. In
J. W. Berry & P. R. Dasen (Eds.), Culture and cognition: Readings in cross-cultural
psychology (pp. 225-229). London: Methuen.
Berry, J. W., & Irvine, S. H. (1986). Bricolage: Savages do it daily. In R. J. Sternberg
& R. K. Wagner (Eds.), Practical intelligence: Nature and origins of competence
in the everyday world (pp. 271-306). New York: Cambridge University Press.
METAPHORS OF MIND 35
Binet, A., & Simon, T. (1916). The intelligence of the feeble-minded (E. S. Kite,
Trans.). Baltimore, MD: Williams & Wilkins.
Blinkhorn, S. F., & Hendrickson, D. E. (1982). Averaged evoked responses and
psychometric intelligence. Nature, 295, 596-597.
Boden, M. A. (1977). Artificial intelligence and natural man. Sussex, England:
Harvester Press.
Boring, E. G. (1923). Intelligence as the tests test it. New Republic, June 6, 35-37.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of
metacognition. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 1,
pp. 77-165). Hillsdale, NJ: Erlbaum.
Brown, A. L., & Campione, J. C. (1978). Permissible inferences from cognitive
training studies in developmental research. In W. S. Hall & M. Cole (Eds.),
Quarterly newsletter of the Institute for Comparative Human Behavior, 2, 46-53.
Brown, A. L., & French, A. L. (1979). The zone of potential development: Implications
for intelligence testing in the year 2000. In R. J. Sternberg & D. K. Detterman
(Eds.), Human intelligence: Perspectives on its theory and measurement (pp. 217-235).
Norwood, NJ: Ablex.
Bruner, J. S., Olver, R. R., & Greenfield, P. M. (1966). Studies in cognitive growth. New
York: Wiley.
Case, R. (1984). The process of stage transition: A neo-Piagetian view. In R. J.
Sternberg (Ed.), Mechanisms of cognitive development (pp. 20-44). New York:
Freeman.
Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic
Press.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373-380.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston, MA:
Houghton Mifflin.
Cattell, R. B., & Cattell, A. K. (1963). Test of g: Culture fair, Scale 3. Champaign, IL:
Institute for Personality and Ability Testing.
Charlesworth, W. R. (1979). An ethological approach to studying intelligence.
Human Development, 22, 212-216.
Cole, M., Gay, J., Glick, J., & Sharp, D. W. (1971). The cultural context of learning and
thinking. New York: Basic Books.
Dixon, R. A., & Baltes, P. B. (1986). Toward life-span research on the functions and
pragmatics of intelligence. In R. J. Sternberg & R. K. Wagner (Eds.), Practical
intelligence: Nature and origins of competence in the everyday world (pp. 203-
235). New York: Cambridge University Press.
Evans, T. G. (1968). A program for the solution of geometric analogy intelligence
test questions. In M. Minsky (Ed.), Semantic information processing. Cambridge,
MA: MIT Press.
Ferguson, G. A. (1954). On learning and human ability. Canadian Journal of
Psychology, 8, 95-112.
Ferrara, R. A., Brown, A. L., & Campione, J. C. (1986). Children's learning and
transfer of inductive reasoning rules: Studies of proximal development. Child
Development, 57, 1087-1099.
Feuerstein, R. (1979). The dynamic assessment of retarded performers: The learning
potential assessment device, theory, instruments, and techniques. Baltimore, MD:
University Park.
Feuerstein, R. (1980). Instrumental enrichment: An intervention program for cognitive
modifiability. Baltimore, MD: University Park.
Fischer, K. W. (1980). A theory of cognitive development: The control and
construction of hierarchies of skills. Psychological Review, 87, 477-531.
Fischer, K. W., & Pipp, S. L. (1984). Processes of cognitive development: Optimal
level and skill acquisition. In R. J. Sternberg (Ed.), Mechanisms of cognitive
development (pp. 45-75). New York: Freeman.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Galin, D., & Ornstein, R. (1972). Lateral specialization of cognitive mode: An EEG
study. Psychophysiology, 9, 412-418.
Galton, F. (1883). Inquiry into human faculty and its development. London: Macmillan
Press.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York:
Basic Books.
Gazzaniga, M. S. (1985). The social brain: Discovering the networks of the mind. New
York: Basic Books.
Golden, C. J. (1981). A standardized version of Luria's neuropsychological tests:
A quantitative and qualitative approach to neuropsychological evaluation. In
S. B. Filskov & T. J. Boll (Eds.), Handbook of clinical neuropsychology. New York:
Wiley.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P. (1982). Cognitive psychology's ambiguities: Some suggested
remedies. Psychological Review, 89, 48-59.
Guilford, J. P., & Hoepfner, R. (1971). The analysis of intelligence. New York:
McGraw-Hill.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities.
Intelligence, 8, 179-203.
Guttman, L. (1954). A new approach to factor analysis: The radex. In P. F.
Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 258-348). New
York: Free Press.
Halstead, W. C. (1951). Biological intelligence. Journal of Personality, 20, 118-30.
Heath, S. B. (1983). Ways with words. New York: Cambridge University Press.
Hendrickson, A. E., & Hendrickson, D. E. (1980). The biological basis for individual
differences in intelligence. Personality and Individual Differences, 1, 3-33.
Horn, J. L. (1986). Intellectual ability concepts. In R. J. Sternberg (Ed.), Advances
in the psychology of human intelligence (Vol. 3, pp. 35-77). Hillsdale, NJ:
Erlbaum.
Sternberg, R. J. (1988). The triarchic mind: A new theory of human intelligence. New
York: Viking.
Sternberg, R. J. (1990). Metaphors of mind. New York: Cambridge University
Press.
Sternberg, R. J., & Gardner, M. K. (1982). A componential interpretation of the
general factor in human intelligence. In H. J. Eysenck (Ed.), A model for
intelligence (pp. 231-254). Berlin: Springer.
Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction-
time experiments. American Scientist, 57, 421-457.
Stillings, N. A., Feinstein, M. H., Garfield, J. L., Rissland, E. L., Rosenbaum, D. A.,
Weisler, S. E., & Baker-Ward, L. (1987). Cognitive science: An introduction.
Cambridge, MA: MIT Press.
Super, C. M. (1976). Environmental effects on motor development: The case of
African infant precocity. Developmental Medicine and Child Neurology, 18, 561-
567.
Thomson, G. H. (1939). The factorial analysis of human ability. London: University
of London Press.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of Chicago
Press.
Thurstone, L. L., & Thurstone, T. G. (1962). Tests of Primary Mental Abilities
(Revised). Chicago, IL: Science Research Associates.
Tuddenham, R. D. (1970). A "Piagetian" test of cognitive development. In W. B.
Dockrell (Ed.), On intelligence (pp. 49-70). Toronto: Ontario Institute for
Studies in Education.
Vernon, P. E. (1971). The structure of human abilities. London: Methuen.
Vygotsky, L. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Walters, J. M., & Gardner, H. (1986). The theory of multiple intelligences: Some
issues and answers. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence:
Nature and origins of competence in the everyday world (pp. 163-182). New York:
Cambridge University Press.
Wissler, C. (1901). The correlation of mental and physical tests. Psychological
Review, Monograph Supplement, 3, No. 6.
Author Note
Preparation of this chapter was supported by a contract from the Army Research
Institute, and by grants from the Spencer and McDonnell Foundations. The ideas
in the chapter represent a condensation of ideas presented in Sternberg (1990).
CHAPTER 2
42 LEW BANK AND GERALD R. PATTERSON
collecting tremendously more complex data sets than Pastorelli's when the
same results apparently could be obtained simply by asking mothers for
their perspectives? Were we likely to gain anything for all our effort?
Using the warp and woof fabric lines of the seat back in front of us, we
graphed our imaginary results in tweed. Well before landing, we reached
agreement. The model including only data from mothers was flawed, we
hypothesized, because it would not be predictive of important outcome
variables measured with other agents and methods. On the other hand,
a more generalizable model, using multiple indicators from different
informants and based on different methods, could be used to form latent
variables (i.e., unobserved, but hypothesized, underlying variables in a
theoretical model). Latent variables based on multiple operational definitions
gathered in a variety of contexts by several different agents ought
to predict theoretically important outcomes far more successfully,
regardless of how each outcome was measured.
For nearly a decade, we have been working with these multiple
indicator, latent variable models at the Oregon Social Learning Center
(OSLC). With an expectation of enhanced prediction of critical processes
and outcomes across several settings, we have moved toward a compre-
hensive theory of the development of antisocial behavior in children
(Patterson & Bank, 1986, 1987, 1989; Patterson, Bank, & Stoolmiller, 1990;
Patterson, Capaldi, & Bank, 1991; Patterson, Dishion, & Bank, 1984). The
context of these investigations includes the following: (a) Patterson's
(1977, 1979, 1982, 1986) performance theory conception, which demands
a consistent effort to account for increasingly large chunks of criterion
variance in theoretical outcomes of interest; (b) a scientist-as-practitioner
model with all investigators also working in clinical roles (see Bank,
Patterson, & Reid, 1987; Forgatch, 1991); and (c) a history of research with
social interactional variables based on direct observation data (e.g.,
Patterson, 1982; Patterson & Reid, 1970; Reid, 1978; Reid & Patterson, 1989).
The social interactional perspective posits that it is experience and
practice through moment-to-moment interactions with others that build
over time (and many repetitions) our individual styles and personalities.
We have called these molecular episodes microsocial interactions, and we
are careful to differentiate them from molar or macrosocial variables. For
example, a trained observer will code into a portable computer actual
behaviors as they occur in a family in their living room; a child's total
number of negative behaviors divided by the total of all his or her
behaviors during a given observation period will yield a microsocial
variable providing a base rate of his or her negative behavior. Microsocial
variables may be compared within and across families. On another day,
we will individually interview this same child's mother and father, as well
as the child, and ask each of them to tell us how often the child does not
STRUCTURAL EQUATION MODELING 43
In general, EQS is more user friendly for those getting started with SEM;
LISREL, however, is more readily available through SPSS-X.
Rationale of SEM
The SEM approach has two major benefits: (a) it is enormously
helpful as a heuristic device, and (b) it is a powerful analytic tool in that
it allows the researcher to test and to reject a wide variety of alternative
models. As a heuristic device, one can posit different sets of structural
relationships that best sum up several theoretical approaches to the
same problem, construct models of competing explanations of the data
that are plausible within the same theoretical framework, and provide
through the generation and testing of these models a fertile ground for
other researchers to test replicability and develop and test competing
models. In addition, other investigators may succeed in extending the
usefulness of properly specified models by seeking the antecedents and
products of important processes. As a statistical method, modeling
allows a simultaneous test of (a) indicators loading on the factors they
theoretically describe, (b) structural paths existing reliably only where
hypothesized, and (c) a satisfactory fit of the model to the data. The
preparation alone to conduct such a test requires substantial under-
standing of the literature and the subject at hand in order to generate a
plausible model. Once a model is tested and found consistent with the
data, the process continues. It is now the responsibility of the investigator
to test competing models; this includes known spurious cases. The more
generalizable the competing models that can be rejected, the more
confident the researcher can be of the utility of the model(s) that cannot
be rejected.
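The statistical test of overall fit mentioned above is most often a likelihood-ratio χ² from maximum likelihood estimation. As a minimal sketch, assuming NumPy and assuming the model-implied covariance matrix Σ has already been estimated by an SEM program, the ML discrepancy between the sample covariance matrix S and Σ, and the resulting χ² with its degrees of freedom, can be computed as follows. The (N - 1) multiplier is a common convention (some programs use N), and the function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def ml_discrepancy(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML fit function F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p; zero when Sigma equals S."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # tr(Sigma^-1 S) equals tr(S Sigma^-1); solve() avoids forming the inverse explicitly.
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return float(logdet_sigma + trace_term - logdet_s - p)


def model_chi_square(S: np.ndarray, Sigma: np.ndarray, n_obs: int, n_free: int) -> tuple[float, int]:
    """Likelihood-ratio chi-square (N - 1) * F and df = p(p + 1)/2 minus free parameters."""
    p = S.shape[0]
    chi2 = (n_obs - 1) * ml_discrepancy(S, Sigma)
    df = p * (p + 1) // 2 - n_free
    return chi2, df


if __name__ == "__main__":
    S = np.array([[1.0, 0.3], [0.3, 1.0]])
    # A model forcing the two variables to be uncorrelated (two free variances), N = 101:
    print(model_chi_square(S, np.eye(2), n_obs=101, n_free=2))  # about (9.43, 1)
```

The df count is what makes rejection informative: each fixed parameter (such as an omitted path) is a restriction the data are free to contradict.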
Questions of causality. The question of making causal inferences
from these longitudinal data bears further discussion. We are in strong
agreement with Dwyer (1983) and Baumrind (1991); one cannot demon-
strate causality from correlational data, not even when using structural
[Figure: diagram of the theoretical model. Recoverable labels: Marital Adjustment; Parent Psychopathology (including substance abuse); Positive Reinforcement; Supervision; Discipline; PROXIMAL FACTORS; CHILD ADJUSTMENT/CONDUCT DISORDER.]
150,000 persons. Schools were randomly selected and recruited from the
ten identified until two cohorts, each with approximately 100 Grade 4
boys, were filled. Twenty-one families were ineligible for participation
because they moved out of state, moved before contact, or their first
language was not English. Seventy-seven percent of the remaining fami-
lies agreed to participate in the study; the most common reasons stated
for refusals were lack of interest and no available time. Refusers did not
differ significantly from participants on teacher Child Behavior Checklist
(Achenbach & Edelbrock, 1983) ratings of externalizing and internalizing
behaviors. Details of the recruitment procedures for the boys (N = 206)
and their families are provided in Capaldi and Patterson (1987). The
sample was almost entirely caucasoid and of lower socioeconomic status,
with 75% working class or unemployed. Through the fifth year (Wave
5) of data collection, well over 90% of the Oregon Youth Study (OYS)
sample were still participating in the study. Families were paid up to $300
for completion of the full 23-hour assessment battery.
Measures
For Example 1, the first set of models (Figure 2) used only mother self-
report indicators to define the Mother Antisocial and Mother Discipline
constructs. For mothers' antisocial histories, the Minnesota Multiphasic
Personality Inventory (MMPI) 49/94 profiles (Pd and Ma) and self-reports
of alcohol and illegal substance use were the three indicators. The MMPI
score was the sum of the Pd and Ma T-scores. For alcohol consumption,
we used the Michigan Alcohol Screening Test (MAST) score (Zucker,
1987), and illegal substance use was an OSLC interview scale. For mothers'
disciplinary styles, three interview-derived scales (efficacy ratings,
tendency to rationalize, and self-confidence ratings) were the indicators.
Child Delinquency was defined by actual police-recorded law violations
and the boys' responses to a 32-item General Delinquency Scale devel-
oped from the Elliott delinquency measure item pool (Elliott, Ageton,
Huizinga, Knowles, & Canter, 1983).
For the second round of models in Example 1, observer impressions
(based on home observations) and Department of Motor Vehicles
driving violation records were added to the MMPI 49 scale sum and
self-reported use of illegal substances. To measure mothers' discipline
styles, observer impressions and mother nattering during the in-home
observations were added to the mothers' self-confidence and self-efficacy
ratings. It should be noted that observer impressions referred
specifically to mothers' antisocial behavior and to discipline styles, in separate
sets of items on the observer impressions questionnaire. Nattering
was the observed base rate of low-amplitude coercive behavior of mothers
directed to the targeted children. We have also labelled this behavior
"irritable discipline." The definition for the criterion construct Child
Delinquency was not changed.
[Figure residue: BBN = .91, BBNN = 1.0; χ² = 3.348, p = .067; no residuals covaried.]
Two indicators for SES were based on Hollingshead's (1975) mea-
sures: the average of the parents' educational levels and the average of
their occupational prestige. In addition, family income was used as a third
SES indicator. For single-mother families, the data were based solely on
mothers' socioeconomic levels.
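The three SES indicators can be written as a small scoring routine. This is a sketch under stated assumptions: the field names are hypothetical, the Hollingshead (1975) education and occupation codes are taken as already scored, and income is entered as reported. Averaging over a one-element list reproduces the rule that single-mother families are scored on the mother alone.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ParentSES:
    education: float              # Hollingshead education code (assumed already scored)
    occupational_prestige: float  # Hollingshead occupation code (assumed already scored)


def ses_indicators(mother: ParentSES, father: ParentSES | None, family_income: float) -> dict[str, float]:
    """Three SES indicators: mean parental education, mean occupational prestige, family income.

    For a single-mother family (father is None) the averages reduce to the mother's own scores.
    """
    parents = [mother] if father is None else [mother, father]
    return {
        "education": mean(p.education for p in parents),
        "occupation": mean(p.occupational_prestige for p in parents),
        "income": family_income,
    }


if __name__ == "__main__":
    print(ses_indicators(ParentSES(5.0, 4.0), ParentSES(3.0, 6.0), 20000.0))
    print(ses_indicators(ParentSES(5.0, 4.0), None, 12000.0))
```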
some serious flaws in the structural model that the reader should note.
Following Anderson and Gerbing (1988), our strategy in testing structural
models was to compare the hypothesized model, as well as other alterna-
tive models, to the fit of the measurement model. (See Patterson et al.,
1990, for further discussion of the strategy.) The hypothesized structural
model shown in Figure 2b was a simplex, with paths depicting the effect
of Mother Antisocial mediated through Mother Discipline. The simplex
model was a statement of expected relations among theoretical con-
structs: each construct was predictive of the next one, but no other direct
effects on constructs later in the sequence were expected. Thus, direct,
nonmediated effects other than the simplex sequence were useful alter-
native models to test.
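Under the simplex, constructs two or more steps apart are related only through the intervening links. For standardized constructs, the implied correlation between any two of them is then the product of the standardized path coefficients along the chain, so a direct, nonmediated effect would show up as an observed correlation that exceeds that product. The sketch below computes the implied correlations; the path values in the example are hypothetical, not estimates from this chapter.

```python
from math import prod


def simplex_implied_correlations(paths: list[float]) -> list[list[float]]:
    """Implied correlation matrix for a standardized chain X1 -> X2 -> ... -> Xk.

    paths[i] is the standardized coefficient from construct i to construct i + 1.
    With no direct effects off the chain, corr(Xi, Xj) for i < j is the product
    of the coefficients from i up to j.
    """
    k = len(paths) + 1
    r = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            r[i][j] = r[j][i] = prod(paths[i:j])
    return r


if __name__ == "__main__":
    # Hypothetical four-construct chain: SES -> Mother Antisocial -> Mother Discipline -> Child Delinquency
    for row in simplex_implied_correlations([0.5, 0.4, 0.3]):
        print([round(v, 3) for v in row])
```

In this hypothetical example the implied SES to Child Delinquency correlation is only .06; an observed value well above that would favor an alternative model with a direct path.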
The factor loadings, though not included in Figure 2b, were essen-
tially unchanged as compared to the measurement model, though the
loading for the police contact indicator was only marginal (p less than
.10). Note that the path from Mother Antisocial to Mother Discipline was
marginally significant, while the path from Mother Discipline to Child
Delinquency failed to reach significance at all. In this example, the factor
loading for police contacts was almost identical to its loading in the
measurement model, and the path coefficients were actually slightly
larger than in the measurement model. What changed was the standard
error estimate for police contacts. In SEM, widely varying estimates of the
same standard errors may be indicative of a spurious solution, regardless
of goodness of fit.
The fit of the simplex was adequate, and was not significantly poorer
statistically than that of the measurement model, χ²(1) = 3.57, p greater
than .05. Therefore, one should conclude that the simplex structural
model provided a good solution that did not differ significantly from the
measurement model. Nonetheless, it did not lend much support to our
hypothesis. That is, of the two hypothesized paths, only one approached
significance, and very little criterion variance in Child Delinquency was
accounted for; in addition, increases in some standard error estimates
caused us some concern. Therefore, this model was not acceptable to us.
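The comparison above is a chi-square difference test: when one model is nested within another, the difference between their χ² values is referred to a χ² distribution whose df is the difference in df. The sketch below uses only the Python standard library; its closed-form tail probability holds for integer df, which is all a nested comparison needs. In practice one would use a statistics package (for example, SciPy's `scipy.stats.chi2.sf`). The function names are illustrative.

```python
from __future__ import annotations

import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Closed forms of the regularized upper incomplete gamma function Q(df/2, x/2):
      even df: Q(m, y)       = exp(-y) * sum_{i=0}^{m-1} y**i / i!
      odd df:  Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y**(i - 1/2) / Gamma(i + 1/2)
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        m = df // 2
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(m))
    m = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chi2_difference_test(chi2_restricted: float, df_restricted: int,
                         chi2_general: float, df_general: int) -> tuple[float, int, float]:
    """Nested-model comparison: (delta chi-square, delta df, p-value)."""
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


if __name__ == "__main__":
    print(round(chi2_sf(3.57, 1), 3))  # simplex vs. measurement model, df = 1
    print(round(chi2_sf(4.52, 1), 3))  # second alternative model vs. measurement model, df = 1
```

With the values reported in the text, a difference of 3.57 on 1 df gives p of about .06 (not significant), while 4.52 on 1 df gives about .03 (significant), matching the conclusions drawn for the two comparisons.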
Alternative models using mother-report prediction. Several alternatives
to the hypothesized simplex model were also tested. A saturated
solution with all three possible paths present was used. By definition, the
fit of the saturated model must equal that of the measurement model, but
none of the three paths reached statistical significance. Thus, this model
was discarded. In the second alternative model, Mother Antisocial was
tested as the predictor construct for both Mother Discipline and Child
Delinquency, and the two paths were, in fact, statistically significant. This
model also fit the data satisfactorily, χ²(18) = 22.08, p = .20, though the fit
was significantly poorer than that of the measurement model, χ²(1) = 4.52,
p less than .05. Furthermore, the covariance between the Mother Discipline
and Child Delinquency construct residuals was highly significant,
which suggested that the path between them must be returned to the
model for the model to be correctly specified. In summary, then, these
analyses with mother-report data as predictor variables shed little light
on the correctness of the hypothesized simplex or either of the alterna-
tive models tested.
[Figure 3. Panel 3.a, Measurement model: BBN = .904, BBNN = .991. A second panel reports BBN = .90, BBNN = .985.]
The simplex model using multimethod and -agent prediction. We
believed a more adequate test of the simplex hypothesis could be
obtained through the use of multiple agents and methods in defining each
of the latent variable constructs. To this end, a second set of models is
presented in Figure 3. Note that the Department of Motor Vehicles'
records and observers' impressions from home visits to each of the
participating families were added to the mothers' self-reports of drug use
and MMPI 49/94 profile sums in defining Mother Antisocial. Thus, mother
report was not rejected as indicative, but was used together with additional
informant-method sources. It was the convergence of these several sources
that we felt would provide increased predictive power.
Similarly, nattering and observer impressions were added to the
mothers' self-reports of their own discipline styles and effectiveness in
defining Mother Discipline. The operational definition for the latent
criterion variable Child Delinquency remained unchanged. The measure-
ment model depicted in Figure 3a looks much like the measurement
model in Figure 2a. The obvious difference is the addition of the indicators
just noted above and the higher magnitudes of correlation among all
constructs in the multiple agent/method model as compared to the
mother-report-only model. Another difference, which is less obvious, was
the need to covary two pairs of residuals to arrive at a satisfactory fit
for the model in Figure 3a. The residuals of the two observer impressions
indicators were covaried, as were the residuals of the two mothers' reports of discipline.
The decision to allow these particular residuals to covary was made at the
measurement model level (i.e., not during the hypothesis testing process),
and is consistent with our own approach for statistically handling
method effects (Bank et al., 1990). Covarying residuals is a way of parceling
unique chunks of variance that would otherwise contribute noise in
evaluating a model. Covariances of this type should be in a direction
consistent with expectation; in the current problem, both covariances
should be, and were, positive.
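In covariance-structure terms, the model-implied covariance matrix of the indicators is Σ = ΛΦΛᵀ + Θ, where Λ holds the factor loadings, Φ the construct variances and covariances, and Θ the residual variances and covariances. Letting two residuals covary frees one off-diagonal element of Θ, which raises the implied covariance of that one pair of indicators and leaves every other element unchanged. The NumPy sketch below shows this; the two-construct, four-indicator layout and all values are hypothetical.

```python
import numpy as np


def implied_covariance(Lambda: np.ndarray, Phi: np.ndarray, Theta: np.ndarray) -> np.ndarray:
    """Model-implied indicator covariance: Sigma = Lambda Phi Lambda^T + Theta."""
    return Lambda @ Phi @ Lambda.T + Theta


# Hypothetical layout: indicators 0-1 load on construct A, indicators 2-3 on construct B.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.9]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])
Theta = np.diag([0.36, 0.51, 0.64, 0.19])  # uniquenesses chosen so each indicator has unit variance

# Suppose indicators 1 and 2 share a method (e.g., both are observer impressions):
Theta_cov = Theta.copy()
Theta_cov[1, 2] = Theta_cov[2, 1] = 0.10  # positive, as expected for a shared method

Sigma_diag = implied_covariance(Lambda, Phi, Theta)
Sigma_cov = implied_covariance(Lambda, Phi, Theta_cov)
print(np.round(Sigma_cov - Sigma_diag, 3))  # nonzero only at (1, 2) and (2, 1)
```

Without the freed element, the shared-method variance between indicators 1 and 2 would have to be absorbed by the construct correlation or left as misfit.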
All factor loadings for the measurement model were statistically
significant, as were the three latent variable construct intercorrelations
and the two covaried pairs of residuals. The model fit satisfactorily, χ²(30) = 31.61,
p = .39. In Figure 3b, the test of the hypothesized simplex is
presented. All loadings were statistically significant and essentially the
same as in the measurement model. The hypothesized paths were also
significant, and the fit was adequate, χ²(31) = 33.861, p = .33. The fit of the
model was not significantly poorer than that of the measurement model,
with the chi-square difference test, χ²(1) = 2.251, p greater than .10. In
addition, the correlation between Mother Antisocial and the Child Delinquency
residual was nonsignificant, χ²(1) = 2.22, p = .14. This nonsignificant
correlation is consistent with the hypothesis that the effects of Mother
Antisocial on Child Delinquency are mediated through Mother Discipline,
and that there is no significant direct effect of Mother Antisocial on Child
Delinquency.
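The figures in this chapter report BBN and BBNN alongside χ²; these appear to be the Bentler-Bonett normed and nonnormed fit indices printed by EQS. Both compare the fitted model with a null (independence) model. A minimal sketch of the standard formulas follows. The null-model χ² and df in the example are hypothetical, because the chapter does not report them.

```python
from __future__ import annotations


def bentler_bonett(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> tuple[float, float]:
    """Return (normed, nonnormed) Bentler-Bonett fit indices relative to a null model."""
    normed = (chi2_null - chi2_model) / chi2_null
    null_ratio = chi2_null / df_null
    nonnormed = (null_ratio - chi2_model / df_model) / (null_ratio - 1.0)
    return normed, nonnormed


if __name__ == "__main__":
    # Model chi-square from the text; null chi-square(45) = 400 is hypothetical.
    print(bentler_bonett(31.61, 30, 400.0, 45))  # about (0.921, 0.993)
```

Unlike the normed index, the nonnormed index is not bounded above by 1: it exceeds 1 whenever the model χ² is smaller than its df.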
(1975) Index (see Larzelere and Patterson, 1990, for a discussion of this
point), while others have included income level (e.g., Bank, Forgatch et
al., 1991). For the current example, all three SES variables have been used
as indicators (see Figure 4a). The SES data were collected at the same time
as the other Grade 4 (first year) data, but it is assumed that income,
occupation, and education for both parents reflect a relatively stable set
of circumstances for at least some extended period prior to the collection
of the Wave 1 data.
[Figure 4. Parent SES and multimethod and -agent mother predictors of boys' delinquency. Grades 7-8; R = .26; χ²(58) = 71.602, p = .108. Simplex model vs. measurement model: χ²(3) = 6.89, p > .05. Residuals covaried: Observer Impressions (Mother Antisocial) + Observer Impressions (Maternal Discipline); Effectiveness (Maternal Discipline) + Self-Confidence (Maternal Discipline); Education (Parent SES) + Drugs (Mother Antisocial); Income (Parent SES) + Effectiveness (Maternal Discipline).]
Thus, it is reasonable to hypothesize the simplex model once again.
It is certainly the most parsimonious model given four latent constructs
(SES to Mother Antisocial to Mother Discipline to Child Delinquency), and,
as has been illustrated thus far, it is an easy task to compare competing
models to it. Most important to us, however, the simplex model closely
approximates our theoretical framework for explaining the development
of antisocial behavior, and therefore represents the strongest test of the
theoretical model. It should also be emphasized, however, that paths
from SES to Mother Discipline and Child Delinquency could still be
acceptable within our framework as long as the paths from Mother
Antisocial to Mother Discipline and Mother Discipline to Child Delin-
quency remain statistically significant.
Referring now to Figure 4a, and again following the Anderson and
Gerbing (1988) strategy outlined above, it is clear that the fit of the
measurement model is adequate, χ²(55) = 64.712, p = .174. Note that it was
necessary to covary four pairs of residuals in fitting the measurement
model: the two pairs already covaried in Figure 3a (the two observer impressions
indicator residuals and the two mothers' discipline self-report residuals); in addition,
the residuals from parent education and mother self-report of drug and alcohol use,
and from parent income and mothers' self-ratings of discipline effectiveness, were also covaried.
[Figure 4.c. Best alternative model.]
simplex model provides specific points for intervention as, for example,
with mother discipline techniques. Furthermore, a number of studies
have already demonstrated the efficacy of parent training interventions
(e.g., Bank, Marlowe, Reid, Patterson, & Weinrott, 1991; Chamberlain,
1990; Patterson, Chamberlain, & Reid, 1982). Thus, although the SES
direct-impact model may be a viable theoretical alternative model, the
social interactional mediational model represented by the simplex in the
example appears to us to be of greatest practical utility for truly imple-
menting change in the desired areas.
Some Assumptions
We assume that the adequate definition for a trait construct requires
a representative sampling of indicators. This Brunswikian perspective
The Data
The present discussion is focused on the child's antisocial trait.
Presumably, however, what we have to say about this trait would apply
to other traits as well. The problem of interest lies in the typical finding
Sample Description
For this second example, a subsample of 80 boys and their families,
teachers, and peers was used from the full (N = 206) OYS sample. The
characteristics of the full sample were described in detail in the first
example for this chapter. Classroom and playground data were collected
on the subsample, of which 40 of the boys had the highest scores on the
antisocial behavior construct in the full sample, and a second 40 were
randomly selected from the remaining 166. Issues of comparability of the
subsample and full sample are discussed in Ramsey, Patterson, and
Walker (1989). There were no statistically significant differences between
the full sample and the subsample on a variety of demographic and
construct scores, and intercorrelations among variables and constructs
were highly similar (within .05) in virtually all cases of interest.
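The subsampling design (the 40 highest-scoring boys plus 40 drawn at random from the remaining 166) can be stated precisely as code. This is an illustrative sketch only; the identifiers, the use of a single antisocial composite score per boy, and the tie handling (stable sort order) are assumptions, not details reported by the study.

```python
from __future__ import annotations

import random


def draw_subsample(scores: dict[str, float], n_top: int = 40, n_random: int = 40,
                   seed: int | None = None) -> tuple[list[str], list[str]]:
    """Select the n_top highest scorers, then a simple random sample of size n_random from the rest."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest antisocial scores first
    top, remaining = ranked[:n_top], ranked[n_top:]
    rng = random.Random(seed)
    return top, rng.sample(remaining, n_random)


if __name__ == "__main__":
    # Hypothetical scores for a full sample of 206 boys
    demo = {f"boy{i}": float(i) for i in range(206)}
    high, comparison = draw_subsample(demo, seed=1)
    print(len(high), len(comparison))  # 40 40
```

Because the high-risk half is selected on the construct itself, comparability checks like those cited above are what justify generalizing from the subsample to the full sample.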
Measures
For this example, the delinquency criterion construct was defined in
precisely the same way as in the first example. The Macro Child Antisocial
construct used teacher and parent reports and peer sociometric results.
The teacher indicator used only CBCL items, while the parent indicator
was an average of CBCL, parent interview and telephone interview
responses. The peer scale was scored with boys nominated by their
classroom peers on nine items relating to overt and covert antisocial
behaviors. These measures were all collected during the first year of the
study.
The Micro Child Antisocial construct used two measures collected during
Wave 1 (total negative process [TNP] during a problem-solving task and
negative interactions with siblings during home observation) and two
measures collected in Wave 2 (Grade 5): academic engaged time
(AET) in the classroom and negative playground behavior (Playneg) with
peers. AET and Playneg are both school-based variables, and these data
were gathered only during Grade 5 and only for the subsample of 80, as
explained above. The Wave 1 (Grade 4) measures are described in detail
STRUCTURAL EQUATION MODELING 63
                         Elliott  Police   Teacher  Parent   Peer     AET      Playneg  Total     Negative
                         self-    report                                                negative  w/sibs
                         report                                                         process
Micro
AET                      -.441    -.433    -.386    -.450    -.397    1.00
Total negative process    .495     .294     .152     .345     .017    -.224     .090    1.00
Negative w/sibs           .210     .056     .009     .335     .035    -.242     .077     .234     1.00
Figures 7a and 7b (Grades 4-5). Fit statistics shown in the figures: BBN = .76, BBNN = .68; χ²(22) = 30.79, p = .10, BBN = .90, BBNN = .95.
Residuals covaried: Parent Report (Macro) with Negative Play w/Siblings (Micro Home Observation); Negative During Problem Solving with Elliott Self-Report of Delinquency.
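BBN and BBNN appear to denote the Bentler-Bonett normed and nonnormed fit indices (Bentler & Bonett, 1980). Both compare the fitted model's chi-square with that of a null (independence) model: normed = (χ²null - χ²model) / χ²null, and nonnormed = (χ²null/dfnull - χ²model/dfmodel) / (χ²null/dfnull - 1). The sketch below computes both; the null-model values in the example are hypothetical, since they are not reported here.

```python
def bentler_bonett(chi2_model, df_model, chi2_null, df_null):
    """Return (normed, nonnormed) Bentler-Bonett fit indices.

    chi2_null and df_null come from the independence (null) model, in which
    all observed variables are assumed to be uncorrelated.
    """
    nfi = (chi2_null - chi2_model) / chi2_null
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    nnfi = (ratio_null - ratio_model) / (ratio_null - 1.0)
    return nfi, nnfi


if __name__ == "__main__":
    # Hypothetical values, not those of the models in this chapter.
    print(bentler_bonett(30.0, 22, 250.0, 36))
```

Unlike the normed index, the nonnormed index adjusts for degrees of freedom, so it can exceed 1.0 for a model whose chi-square is smaller than its degrees of freedom.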
Table 1
Micro versus Macro Prediction of Arrests and Self-Reported
Law Violations with Stepwise Multiple Regression Analyses*
                                        Outcomes
                      Self-Report        Arrests          Mean Delinquency
Predictors            Sig    R²          Sig    R²        Sig    R²
Macro  Teacher        n.s.               a      .26       a      .28
       Parent         a      .34         n.s.             b      (.34)d
       Peer           n.s.               n.s.             n.s.
Micro  Sib/TC Neg     n.s.               n.s.             n.s.
       Playneg        b      .44         n.s.             n.s.
       Total Neg      a                  n.s.             n.s.
       AET            n.s.               a      .32       a      .35
Both   Sib/TC Neg     n.s.               n.s.             n.s.
       Playneg        c      .52         n.s.             n.s.
       Total Neg      b                  n.s.             n.s.
       AET            n.s.               a                a
       Teacher        n.s.               b      .40       b      .43
       Parent         a                  n.s.             n.s.
       Peer           n.s.               n.s.             n.s.
* a, b, and c refer to the stepwise order of insertion of each variable into the
multiple regression equation. All significant variables are at the .05 level unless
otherwise indicated. Each R² appears after the last variable entered into each
equation.
d p = .06
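The entry order (a, b, c) and the R² after the last entered variable in Table 1 come from stepwise regression. A forward-selection sketch in pure Python is shown below. It is an illustration, not the authors' program: it assumes an F-to-enter threshold of 3.84 (a conventional value, used here as an assumption) and omits the removal step of full stepwise procedures; the data in the example are simulated.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS regression of y on the given predictor columns plus an intercept."""
    if not columns:
        return 0.0
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    xc = []
    for col in columns:  # centering the predictors absorbs the intercept
        m = sum(col) / n
        xc.append([v - m for v in col])
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in xc] for ci in xc]
    xty = [sum(a * b for a, b in zip(ci, yc)) for ci in xc]
    beta = solve(xtx, xty)
    ss_reg = sum(b * v for b, v in zip(beta, xty))
    ss_tot = sum(v * v for v in yc)
    return ss_reg / ss_tot


def forward_stepwise(predictors, y, f_to_enter=3.84):
    """At each step add the predictor that most increases R^2; stop when its
    partial F falls below f_to_enter. Assumes an imperfect fit (R^2 < 1).

    Returns [(name, R^2 after entry), ...] in order of entry.
    """
    n = len(y)
    chosen, history, r2_old = [], [], 0.0
    while len(chosen) < len(predictors):
        candidates = [name for name in predictors if name not in chosen]
        scored = [(r_squared([predictors[c] for c in chosen + [name]], y), name)
                  for name in candidates]
        r2_new, best = max(scored)
        df_resid = n - (len(chosen) + 1) - 1  # n - k - 1, k = predictors after entry
        partial_f = (r2_new - r2_old) / ((1.0 - r2_new) / df_resid)
        if partial_f < f_to_enter:
            break
        chosen.append(best)
        history.append((best, r2_new))
        r2_old = r2_new
    return history


if __name__ == "__main__":
    rng = random.Random(1)
    n = 80
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.5 * a + 0.8 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    print(forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y))
```

Because entry depends on which correlated predictors are already in the equation, the same variable can enter in one model and be nonsignificant in another, as happens with several predictors across the Micro, Macro, and Both rows of Table 1.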
The structural models may be more elegant and more easily inter-
preted than the multiple regression analyses, but most importantly, the
SEM models provide estimates based on converging indicator validities.
It is this combining of indicators that we believe results in more general-
izable prediction models. Multiple regression techniques certainly have
a place among social scientists' statistical tools, but it is our position that,
given adequate data sets as described in this chapter, SEM will almost
always be the preferred analytic approach.
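The argument that combining converging indicators yields more generalizable prediction can be illustrated with a small simulation. This sketch shows only the aggregation principle under simple assumptions (one latent trait, four indicators with independent errors, simulated data); it is not an SEM estimation, and in real data shared method variance would violate the independence assumption.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, n_indicators=4, noise_sd=1.0, seed=42):
    """Return (single-indicator correlations, composite correlation) with a criterion.

    Each indicator is the latent trait plus independent error; the criterion is
    the same latent trait plus its own independent error.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    indicators = [[t + rng.gauss(0, noise_sd) for t in latent]
                  for _ in range(n_indicators)]
    criterion = [t + rng.gauss(0, 1) for t in latent]
    single = [pearson(ind, criterion) for ind in indicators]
    composite = [sum(vals) / n_indicators for vals in zip(*indicators)]
    return single, pearson(composite, criterion)


if __name__ == "__main__":
    single, combined = simulate()
    print([round(r, 2) for r in single], round(combined, 2))
```

Under these assumptions the population correlation is .50 for each single indicator and about .63 for the four-indicator composite, because averaging shrinks the error variance while leaving the shared trait variance intact.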
References
Achenbach, T. M., & Edelbrock, C. S. (1983). Manual for the Child Behavior Checklist and Revised Child Behavior Profile. Burlington, VT: University Associates in Psychiatry.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411-423.
Arrington, R. (1943). Time sampling: A review. Psychological Bulletin, 40, 81-124.
Bank, L., Dishion, T. J., Skinner, M. L., & Patterson, G. R. (1990). Method variance in structural equation modeling: Living with "glop." In G. R. Patterson (Ed.), Aggression and depression in family interactions (pp. 247-279). Hillsdale, NJ: Lawrence Erlbaum Assoc.
Bank, L., Forgatch, M. S., Patterson, G. R., & Fetrow, R. A. (1991). Parenting practices: Mediators of negative contextual factors in divorce. Unpublished manuscript. (Available from the first author, OSLC, 207 E. 5th, Suite 202, Eugene, OR 97401.)
Bank, L., Marlowe, J. H., Reid, J. B., Patterson, G. R., & Weinrott, M. R. (1991). A comparative evaluation of parent training interventions for families of chronic delinquents. Journal of Abnormal Child Psychology, 19(1), 15-33.
Bank, L., Patterson, G. R., & Reid, J. B. (1987). Delinquency prevention through training parents in family management. Behavior Analyst, 10, 75-82.
Baumrind, D. (1991). Effective parenting during the early adolescent transition. In P. A. Cowan & M. Hetherington (Eds.), Family transitions (pp. 111-164). Hillsdale, NJ: Lawrence Erlbaum Assoc.
Bentler, P. M. (1980). Multivariate analysis with latent variables. Annual Review of Psychology, 31, 419-456.
Bentler, P. M. (1989). EQS: Structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Capaldi, D. M., & Patterson, G. R. (1987). An approach to the problem of recruitment and retention rates for longitudinal research. Behavioral Assessment, 9, 169-177.
Capaldi, D. M., & Patterson, G. R. (1988). Psychometric properties of fourteen latent constructs from the Oregon Youth Study. NY: Springer-Verlag.
Capaldi, D. M., & Patterson, G. R. (in press). The relation of parental transitions to boys' adjustment problems: I - A linear hypothesis; II - Mothers at risk for transitions and unskilled parenting. Developmental Psychology.
Chamberlain, P. (1990). Comparative evaluation of specialized foster care for seriously delinquent youths: A first step. Community Alternatives: International Journal of Family Care, 2(2), 21-36.
Author Note
Support for this chapter was provided by Grant No. MH 37940 from the Center for
Studies of Antisocial and Violent Behavior, NIMH, U.S. PHS, and Grant No. MH
46690, Prevention Research Branch, NIMH, U.S. PHS.
Note
1. Patterson and Capaldi found only a weak relation between Mother Antisocial
and Mother Discipline. When these authors partialed SES out of both sets of
indicators, no significant correlation remained. The comparison with the present
data is complex, however, because we used largely different sets of indicators
for both of these constructs. Furthermore, Patterson and Capaldi used only
education and occupation in defining SES, whereas we also used income.
CHAPTERS
New Developments in
Multiaxial Empirically Based
Assessment of Child and
Adolescent Psychopathology
Thomas M. Achenbach
What is Multiaxial
Empirically Based Assessment?
In considering the notion of multiaxial empirically based assessment,
it is helpful to appreciate the conceptual context as well as the intended
meaning of the terms.
76 THOMAS M. ACHENBACH
Assessment
For our purposes, the word assessment refers to identifying the
distinguishing features of individual cases. There are many reasons for
doing so. They include clinical objectives, such as deciding whether a particular child
needs special help and what sort of help is needed. The reasons also
include systems objectives, such as documenting the number of each
kind of case seen in a particular system and using the distribution of cases
to plan for future services. And the reasons include research objectives,
such as identifying differential etiologies or outcomes for cases differing
in their distinguishing features.
If "assessment" refers to "identifying the distinguishing features,"
how do we know which "distinguishing features" to select? This question
is what prompted our program of research in the first place. Each
individual is distinguishable from other individuals in many ways. To
identify the particular features that are important with respect to psycho-
pathology, we need ways of linking cases to other cases who share similar
features. By identifying groups who share similar features, we provide a
basis for finding answers to such important questions as whether particu-
lar problems should be treated or left alone, the most efficacious treatments
for particular problems, and differences between the etiologies of differ-
ent kinds of problems.
Taxonomy. The process of linking cases to other cases according to
their distinguishing features involves taxonomy: the systematic grouping
of cases according to features that reflect intrinsic similarities among cases
assigned to the same class and differences between cases assigned to
different classes. (The words "classification" and "diagnosis" also pertain
to the grouping of individuals into classes. However, these words have
additional meanings that tend to obscure the taxonomic goal of detecting
features that reflect intrinsic similarities and differences between types
of cases.)
The conceptual context of our work encompasses both assessment
and taxonomy as two aspects of a single process: that of identifying
the important similarities and differences between cases. When our
research program began in the 1960s (Achenbach, 1965, 1966), the
prevailing theories placed the origins of most psychopathology in child-
hood. Despite the emphasis on childhood origins, however, there was
little systematic research on the actual form taken by childhood disor-
ders. In fact, until 1968, the official psychiatric nosology provided only
two categories specifically for childhood disorders. These were Adjust-
ment Reaction of Childhood and Schizophrenic Reaction, Childhood
Type (American Psychiatric Association, 1952). Consistent with the
MULTIAXIAL ASSESSMENT OF CHILDREN 77
of items, methods, or cases, nor that the methods had no influence on the
results. Different decisions, cases, and methods might have produced
different results. All human endeavors are structured by the minds of the
humans who carry them out, and empirical research is no exception to
this fundamental fact. However, the effort to empirically link the assess-
ment and taxonomy of child psychopathology was a first step in a
direction that led toward further challenges.
The syndromes derived from case histories revealed far more differ-
entiation among childhood disorders than was evident in the official
nosology. The study also provided methodology for grouping children
according to syndromes whose correlates were then tested in subsequent
studies (Achenbach & Lewis, 1971; Anderson, 1969; Hafner, Quast,
& Shea, 1975; Katz, Zigler, & Zalk, 1975; Roff, Knight, & Wertheim, 1976;
Rolf, 1972; Weintraub, 1973). Consistent with the initial derivation of
syndromes, the subsequent studies employed data from case records.
The informants had not responded to standardized protocols and had not
directly supplied data that were scored for analysis. The problems
reported in the records could therefore have been influenced by what the
compilers of the records had chosen to ask and record.
The word "empirical" is derived from the Greek word empeiria,
meaning "experience." The initial effort at empirical derivation of syn-
dromes was successful in providing a more differentiated picture of
childhood disorders and a methodology that was applicable in a variety
of studies. To base our empirical approach more firmly on the way in
which children's problems are actually experienced, however, the next
step was to obtain assessment data directly from those who actually
observe the children's behavior, rather than from case records. Such
assessment data could then be used to derive taxa that would reflect the
patterns detectable in reports by particular kinds of informants.
Parent reports. Parents are the adults who typically have the most
involvement with their children's behavior over the longest periods and
the most situations. In most cases, parents' perceptions are also crucial
in determining what will be done about children's problems, and parents
are involved in most efforts to obtain help for children. In extending the
empirical approach toward more direct utilization of experiential data,
we therefore started with parents.
To tap parents' perceptions of their children's functioning, we devel-
oped the Child Behavior Checklist (CBCL; Achenbach, 1978), which
includes many of the same problem items identified in the initial case
record research, plus additional problem items developed through nine
pilot editions tested with parents in a variety of clinical settings. To
provide more differentiated scoring than is typically possible from case
records, we changed the present-versus-absent format to a 0-1-2 scale in
which 0 = Not True (as far as you know), 1 = Somewhat or Sometimes True,
and 2 = Very True or Often True. Furthermore, because children's compe-
tencies may be as important as their problems in determining who needs
special help and the likely outcomes, we included items tapping children's
involvement in activities, social relations, and school.
In obtaining data directly from parents, we also extended empirically
based assessment across many more caseloads than the one used in the
study of case records. This was an important innovation, because the
caseloads of individual clinical settings may reflect idiosyncrasies such
as the composition of the local catchment area, socioeconomic and
ethnic factors in client selection, funding mechanisms, dominant clinical
philosophy, and the effects of competing clinical services. By obtaining
data from the parents of children seen in many different types of setting,
we reduced the risk of obtaining syndromes and prevalence rates that
were not typical of children seen for mental health services.
Because the patterning and prevalence of problems may vary with
the children's sex and age, we derived syndromes separately for each sex
at ages 4 to 5, 6 to 11, and 12 to 16. These age ranges were chosen because
they reflect important differences in cognitive and physical development,
educational level, and social status. The syndromes were derived from
samples of 250 referred children of each sex at ages 4 to 5 and 450 referred
children of each sex at ages 6 to 11 and 12 to 16, for a total of 2,300 referred
children (Achenbach & Edelbrock, 1983).
To derive syndromes for a particular sex/age group, we first identified
problems that were reported (i.e., scored 1 or 2) for at least 5% of the
clinically referred children in that group. We then performed principal
components analyses with orthogonal (varimax) and oblique (direct
quartimin) rotations of the first 7 to 15 components for each group.
Syndromes of problems that remained intact through multiple rotations
were retained as the bases for problem scales. Either eight or nine
syndromes were retained for each sex/age group. Some syndromes were
quite similar for all groups, whereas others showed considerable varia-
tion or were restricted to particular sex/age groups. Second-order analyses
produced two broad-band groupings of syndromes that were designated
as "internalizing" and "externalizing." The internalizing grouping was
characterized by problems within the self, such as depression and
somatic complaints. The externalizing grouping was characterized more
by conflicts with other people and with social mores, such as aggressive
and delinquent behavior. This distinction resembles those that have
been designated by others as "Personality Problem versus Conduct
Problem" (Peterson, 1961); "Inhibition versus Aggression" (Miller, 1967);
and "Overcontrolled versus Undercontrolled" (Achenbach & Edelbrock,
1978).
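The first derivation step, keeping only problems reported (scored 1 or 2) for at least 5% of the referred children in a sex/age group, can be sketched as below. The principal components analyses and rotations that followed are not reproduced. The ratings format (one dict of item to 0-1-2 rating per child) and the treatment of unrated items as 0 are assumptions.

```python
def screen_items(ratings, min_rate=0.05):
    """Return {item: endorsement rate} for items scored 1 or 2 by at least
    min_rate of the children in one sex/age group.

    ratings: one dict per child mapping a problem item to its 0-1-2 rating;
    items missing from a child's dict are treated as 0 (an assumption).
    """
    n = len(ratings)
    items = {item for child in ratings for item in child}
    rates = {
        item: sum(1 for child in ratings if child.get(item, 0) >= 1) / n
        for item in items
    }
    return {item: rate for item, rate in rates.items() if rate >= min_rate}
```

Only the items that survive this screen would then be entered into the components analyses for that group.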
and mental health workers, the mean r was .22. There were significant
differences between correlations for types of problems (better agree-
ment for externalizing than internalizing problems) and for different age
groups (better agreement for ages 6 to 11 than for adolescents). However,
these differences were small and did not affect the overall picture of
moderate correlations between informants seeing children in similar
(though not necessarily identical) contexts, and low correlations be-
tween informants seeing children in different contexts and between
self-reports and reports by others.
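Mean cross-informant correlations like those summarized above are averages over many studies. A common way to average correlations is through Fisher's z transformation, optionally weighting each study by n - 3. The sketch below shows that approach as an assumption; it does not claim to reproduce the computations in Achenbach et al. (1987), and the example values are hypothetical.

```python
import math


def mean_correlation(rs, ns=None):
    """Average correlations via Fisher's z = atanh(r), back-transformed with tanh.

    If sample sizes are given, each z is weighted by n - 3 (the inverse of its
    sampling variance); otherwise the z values are averaged unweighted.
    """
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns] if ns is not None else [1.0] * len(rs)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)


if __name__ == "__main__":
    # Hypothetical correlations and sample sizes from three studies.
    print(round(mean_correlation([0.15, 0.22, 0.30], [60, 120, 90]), 3))
```

Averaging in the z metric avoids the downward bias that arises from averaging raw correlations, whose sampling distribution is skewed away from zero.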
The modest cross-informant correlations do not mean that the infor-
mants' reports are either unreliable or invalid. High test-retest reliabilities
have been obtained for ratings by most kinds of informants, and numer-
ous significant associations with other variables support the validity of
ratings (see Achenbach & Brown, 1991; Achenbach & Edelbrock, 1983,
1986, 1987). Rather than indicating a lack of reliability or validity, the low
inter-informant correlations indicate that no single informant can substi-
tute for all others. Instead, to obtain a reasonably complete picture of a
child's functioning, we need data from multiple sources. When available,
such sources would include each parent, the child's teacher(s), and
direct assessment of the child, such as observations, interviews, and,
for older children, structured self-report forms.
In order to extend assessment beyond parent reports, we have
developed standardized rating instruments analogous to the CBCL but
designed to obtain reports from teachers (the Teacher's Report Form,
"TRF"), self-reports from adolescents (Youth Self-Report, "YSR"), direct
observations in group settings (Direct Observation Form, "DOF"), and
clinical interviews (Semistructured Clinical Interview for Children, "SCIC")
(Achenbach & Edelbrock, 1986, 1987; McConaughy & Achenbach, 1990).
We have also developed a downward extension of the CBCL for ages 2 to
3 (Achenbach, Edelbrock, & Howell, 1987), and upward extensions of the
CBCL (Young Adult Behavior Checklist, "YABCL") and of the YSR (Young
Adult Self-Report, "YASR") (Achenbach, 1990a, 1990b).
For clinical assessment, reports by informants are not the only
important sources of data. For most cases, standardized tests of ability
and achievement are also relevant, as is medical assessment. To highlight
the importance of viewing assessment in terms of multiple sources of
data, we call this approach multiaxial empirically based assessment. Table
1 summarizes procedures for obtaining assessment data in terms of five
axes relevant to the assessment of most children. Not all procedures may
be feasible or desirable in all cases, but comprehensive assessment
should take account of all these aspects of functioning. Practical applica-
tions of multiaxial empirically based assessment have been illustrated by
Achenbach and McConaughy (1987) for diverse child and adolescent cases.
Table 1
Examples of Multiaxial Assessment
Cross-Informant Discrepancies
Despite the differences between the DSM and the empirically based
approach, empirically based assessment procedures have shown significant
associations with some DSM categories (e.g., Edelbrock & Costello,
1988; Edelbrock, Costello, & Kessler, 1984; Weinstein, Noam, Grimes,
Stone, & Schwab-Stone, 1990). Furthermore, the use of empirically based
procedures to obtain data from multiple informants about the same child
has posed a major challenge that must be confronted by any approach to
assessment, taxonomy, or diagnosis, including the DSM approach. This
challenge is the problem of dealing with the often disparate pictures of
children's behavior obtained from different sources, each of which may
be reliable, valid, and important in its own right.
Our meta-analyses demonstrated that low to moderate agreement
among informants is not restricted to any particular instruments, infor-
mants, samples of children, or contexts (Achenbach et al., 1987). In fact,
the findings were quite consistent across many studies published over a
long period. The consistency and generality of the findings indicate that
they are not likely to be altered much by changes in instrumentation. Nor
can the problem be escaped by the DSM approach, whereby the integra-
tion of disparate data presumably occurs in the clinician's head. Even
when clinicians have been exposed to the same data, the inter-clinician
reliability of DSM child diagnoses has generally been mediocre (American
Psychiatric Association, 1980, pp. 470-472; Mattison, Cantwell, Russell, &
Will, 1979; Mezzich, Mezzich, & Coffman, 1985; Strober, Green, & Carlson,
1981; Werry, Methven, Fitzpatrick, & Dixon, 1983). Furthermore, when
DSM diagnoses were operationally defined by administering the Diagnos-
tic Interview Schedule for Children (DISC), little agreement was found
between diagnoses made from interviews with children, interviews with
their parents, and clinical evaluations (Costello, Edelbrock, Dulcan,
Kalas, & Klaric, 1984). Like the correlations between ratings by different
informants, the correlations between symptoms scored from the parent
and child DISCs were low, with an overall r = .27 (Edelbrock, Costello,
Dulcan, Conover, & Kalas, 1986).
Rather than being an artifact of any particular method, the modest
correlations among informants reflect the essential realities of children's
behavioral and emotional problems. Many such problems are not likely
to be manifested uniformly across all contexts. Furthermore, what is
noticed, judged to be present as a problem, and reported by a particular
informant is affected by characteristics of that informant and his or her
relations with the child. The problem, then, is not to abolish differences
between informants' reports or to choose between right and wrong
reports in each case. Instead, the problem is how best to utilize the
important information from each source.
The sex, age, and informant variations among syndromes raise the
following question: To what extent are particular empirically obtained
patterns linked to an underlying core syndrome that is similar across
groups and informants? The answer to this question has practical impli-
cations for making comparisons across sex, age, and informants. It also
has theoretical implications for advancing from purely empirical correla-
tions among items toward taxonomic constructs on which to target
multiple assessment procedures and the testing of hypotheses as to
etiology, appropriate interventions, and outcomes.
The challenges posed by variations in syndromes arose from the
discovery of these variations through empirical research. The problem of
sex, age, and informant variations in syndromal patterns is not so obvious
in the DSM approach. This is because the DSM approach started with
assumed disorders rather than with empirical data on the patterning of
problems in relation to sex, age, or the source of data. Yet, if a taxonomy
is to accurately reflect children's actual problems, it must take account
of such variations in its definitions of disorders and in its decision rules
for determining whether children of each sex and age have a particular
disorder.
between scores for each core syndrome and the corresponding central
core syndromes. Moderate to high mean correlations between the rel-
evant core syndromes and their respective central core syndromes
indicated that the central core syndromes accurately represented the
variance accounted for by most of the core syndromes.
CBCL, YSR, and TRF core syndromes. The foregoing analyses dem-
onstrated the feasibility of identifying core sets of items that co-occur in
parents' ratings across multiple sex/age groups, despite variations in the
other items that are associated with the syndromes in particular sex/age
groups. The next step was to extend this approach to the identification of
core syndromes in ratings by different informants. Because the ACQ
analyses had not revealed any syndromes that were not also found in the
CBCL, and because the CBCL has parallel forms for teacher- and self-
reports, we started with core syndromes identified on the CBCL. However,
we had by now accumulated larger clinical samples than those from
which the original CBCL syndromes were derived. In addition, we wished
to include syndromes for 4- and 5-year-olds and we wished to extend the
age range to 18. We therefore performed new principal components/
varimax analyses of the CBCL. These new analyses of the CBCL were done
separately for each sex at ages 4 to 5, 6 to 11, and 12 to 18. We also
performed new principal components/varimax analyses of the TRF for
each sex separately at ages 5 to 11 and 12 to 18 and of the YSR for each sex
separately at ages 11 to 18. To maximize comparability among the
analyses, we used the 89 problem items that appear on all three instru-
ments, excluding any items that were reported for a very small percent of
the sample being analyzed.
Within each instrument, we identified syndromes that were similar
for multiple sex/age groups. We then identified items that were common
to the versions of a syndrome in the majority of the sex/age groups for
which the syndrome was found. For example, versions of an Aggressive
syndrome were found for all six sex/age groups on the CBCL, all four on
the TRF, and both groups on the YSR. The core CBCL Aggressive syndrome
was constructed from items that were included in the Aggressive
syndromes for at least four of the six CBCL sex/age groups. The core TRF
Aggressive syndrome was constructed from items that were included in
the Aggressive syndromes for at least three of the four TRF sex/age
groups. And the core YSR Aggressive syndrome was constructed from
items that were included in the Aggressive syndromes for both sexes on
the YSR. Some additional syndromes were found for particular sex/age
groups on one instrument, but I will focus here on the syndromes that
were found for most sex/age groups in ratings by multiple informants.
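The core-syndrome rule (keep items that appear in the syndrome versions of at least k of the n sex/age groups) amounts to counting item occurrences across sets. The sketch below illustrates it; the item sets in the example are hypothetical, not the actual Aggressive syndrome versions.

```python
from collections import Counter


def core_items(syndrome_versions, min_groups):
    """Items that appear in at least `min_groups` of the syndrome versions.

    syndrome_versions: one collection of item numbers per sex/age group
    (or, at the cross-informant step, one core syndrome per informant type).
    """
    counts = Counter(item for version in syndrome_versions for item in set(version))
    return sorted(item for item, c in counts.items() if c >= min_groups)


if __name__ == "__main__":
    # Hypothetical Aggressive-syndrome versions for six CBCL sex/age groups.
    versions = [{3, 7, 16}, {3, 16, 37}, {3, 7, 37}, {3, 16, 57}, {3, 7, 16}, {3, 95}]
    print(core_items(versions, min_groups=4))  # -> [3, 16]
```

With min_groups set to 4 of 6 (CBCL), 3 of 4 (TRF), or 2 of 2 (YSR), the same function expresses each instrument's rule, and with 2 of 3 informant types it expresses the cross-informant rule described in the Summary.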
type model, because these constructs comprise sets of features that are
not perfectly correlated with each other. That is, individuals may
manifest different numbers of the prototypical features of a particular
construct. Furthermore, each feature may be manifested in degrees, as
scored on the 0-1-2 scales of the CBCL, TRF, and YSR. By summing the 0-
1-2 scores for all features of a syndrome, we can assess the degree to which
a child resembles the prototype for that syndrome, according to ratings
by a particular informant. In addition, we can compare the syndrome
scores obtained by the child in ratings by multiple informants to deter-
mine the degree to which the child matches one or more prototypes in the
ratings by different informants.
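Summing the 0-1-2 ratings of a syndrome's items, separately for each informant, can be sketched as follows. The Withdrawn item numbers are taken from Table 2; the ratings are hypothetical, unrated items are treated as 0 (an assumption), and the sums are raw scores rather than the normed scale scores used in practice. Because items 80 and 88 are not on the YSR, a YSR scale would use fewer items than the one shown.

```python
WITHDRAWN = (42, 65, 69, 75, 80, 88, 102, 103, 111)  # item numbers from Table 2


def syndrome_score(ratings, items):
    """Sum the 0-1-2 ratings for a syndrome's items; unrated items count as 0."""
    for item, value in ratings.items():
        if value not in (0, 1, 2):
            raise ValueError(f"item {item}: rating must be 0, 1, or 2")
    return sum(ratings.get(item, 0) for item in items)


def scores_by_informant(ratings_by_informant, items):
    """Score the same syndrome separately for each informant's ratings."""
    return {who: syndrome_score(r, items) for who, r in ratings_by_informant.items()}


if __name__ == "__main__":
    parent = {42: 2, 65: 1, 75: 2}  # hypothetical parent ratings
    teacher = {42: 1, 111: 2}       # hypothetical teacher ratings
    print(scores_by_informant({"parent": parent, "teacher": teacher}, WITHDRAWN))
```

A higher sum indicates closer resemblance to the syndrome prototype in that informant's ratings; comparing the sums across informants shows where the pictures converge or diverge.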
The concept of syndromes as prototypical sets of correlated features
corresponds to the "polythetic" concept of some DSM-III-R diagnostic
categories (American Psychiatric Association, 1987, p. xxiv). According
to the DSM's polythetic concept, the symptoms listed as criteria for a
disorder are fully interchangeable with each other. Because no one of the
symptoms is required for making a diagnosis, any combination of the
required number of symptoms can justify the diagnosis. This is analogous
to the use of cutpoints on the distributions of syndrome scores for
distinguishing between children who are in the normal versus clinical
range.
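The analogy can be made concrete: a polythetic rule is a cutpoint applied to an unweighted count of present symptoms, whereas a syndrome scale applies a cutpoint to a sum of 0-1-2 ratings. In the sketch, the symptom labels, the required count, and the convention that a score at the cutpoint counts as clinical are hypothetical.

```python
def meets_polythetic(present, criteria, required):
    """True if at least `required` of the listed criteria are present.

    Any combination qualifies, so no single criterion is necessary.
    """
    return len(set(present) & set(criteria)) >= required


def in_clinical_range(score, cutpoint):
    """Cutpoint rule on a syndrome score (score at or above cutpoint = clinical)."""
    return score >= cutpoint


if __name__ == "__main__":
    criteria = ["A", "B", "C", "D", "E", "F", "G", "H"]  # hypothetical symptom list
    print(meets_polythetic({"A", "B", "C", "D"}, criteria, 4))  # True
    print(meets_polythetic({"E", "F", "G", "H"}, criteria, 4))  # True: a disjoint set also qualifies
```

The two rules differ mainly in that the syndrome score lets each feature contribute in degrees (0, 1, or 2) rather than as simply present or absent.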
Table 2
Items Defining the Cross-Informant Syndrome Constructs Derived from the
Child Behavior Checklist (CBCL), Youth Self-Report (YSR), and Teacher's
Report Form (TRF) (a)
Internalizing Scales
Withdrawn
42. Would rather be alone
65. Refuses to talk
69. Secretive
75. Shy, timid
80. Stares blankly (d)
88. Sulks (d)
102. Underactive
103. Unhappy, sad, depressed
111. Withdrawn
Somatic Complaints
51. Feels dizzy
54. Overtired
56a. Aches, pains
56b. Headaches
56c. Nausea
56d. Eye problems
56e. Rashes, skin problems
56f. Stomachaches
56g. Vomiting
Anxious/Depressed
12. Lonely
14. Cries a lot
31. Fears impulses
32. Needs to be perfect
33. Feels unloved
34. Feels persecuted
35. Feels worthless
45. Nervous, tense
50. Fearful, anxious
52. Feels too guilty
71. Self-conscious
89. Suspicious
103. Unhappy, sad, depressed
112. Worries
Neither Internalizing nor Externalizing
Social Problems
1. Acts too young
11. Too dependent
25. Doesn't get along w. peers
38. Gets teased
48. Not liked by peers
62. Clumsy
64. Prefers younger kids
Thought Problems
9. Can't get mind off thoughts
40. Hears things
66. Repeats acts
70. Sees things
84. Strange behavior
85. Strange ideas
Attention Problems
1. Acts too young
8. Can't concentrate
10. Can't sit still
13. Confused
17. Daydreams
41. Impulsive
45. Nervous, tense
61. Poor school work
62. Clumsy
80. Stares blankly (d)
Externalizing Scales
Delinquent Behavior
26. Lacks guilt
39. Bad companions
43. Lies
63. Prefers older kids
67. Runs away from home (b)
72. Sets fires (b)
81. Steals at home
82. Steals outside home
90. Swearing, obscenity
101. Truancy
105. Alcohol, drugs
Aggressive Behavior
3. Argues
7. Brags
16. Mean to others
19. Demands attention
20. Destroys own things
21. Destroys others' things
23. Disobedient at school
27. Jealous
37. Fights
57. Attacks people
68. Screams
74. Shows off
86. Stubborn, irritable
87. Sudden mood changes
93. Talks too much
94. Teases
95. Temper tantrums
97. Threatens
104. Loud
a) Items are designated by the numbers they bear on the CBCL, YSR, and TRF and
summaries of their content. b) Not on TRF. c) Not on CBCL. d) Not on YSR.
(From Achenbach, 1991a.) © Copyright T. M. Achenbach
Figure 2. Taxonomic decision tree for using empirically based assessment procedures. (From Achenbach, 1991a.) © Copyright T. M. Achenbach
and TRF profiles and by other data that focus on similar syndromes and
have explicit criteria for distinguishing between the normal and clinical
range. Standardized observational procedures and interviews, for ex-
ample, can be used for comparison with the parent-, self-, and
teacher-ratings. (McConaughy & Achenbach, 1988, 1990, provide details
of the Direct Observation Form and Semistructured Clinical Interview
that can be used in this way.)
As shown in Figure 2, we start at the top with data from any combina-
tion of the five potential sources, including parents, self-reports, teachers,
interviews, and observations. After syndrome scales are scored from
each source, the initial screening question is whether any scales are in the
clinical range. A global screen for deviance would include total problem,
Internalizing, and Externalizing scores, as well as syndrome scales from
each source. If no scores are in the clinical range, the data indicate that
the child is not clinically deviant. Nevertheless, individual items should
be examined for evidence of problems that are important in their own
right, such as suicidal behavior and firesetting, whether or not any scales
are in the clinical range.
If any scales are in the clinical range, we ask whether deviant scores
occur on the same syndromes in all sources of data that show any
deviance. This is a question of differential diagnosis. If deviance is
confined to the same syndromes in all data, this indicates focalized
problems in the area of that syndrome.
If deviant scores are not confined to a single syndrome, we then ask
whether the same combination of syndromes is deviant in all data. If the
answer is yes, this indicates that the child's problems comprise multiple
syndromes or a complex profile pattern that might correspond to profile
types identifiable through cluster analysis.
On the other hand, if the data sources differ in the syndromes they
show to be deviant, we need to determine whether the child's behavior
actually differs much among contexts. If the answer is yes, we conclude
that different behaviors may have to be targeted for change in different
contexts. If the answer is no, however, this suggests that some of the
informants' perceptions of the child may need changing.
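The sequence of questions in the decision tree can be expressed as a small function. This is a hypothetical sketch, not a published scoring program: it assumes each source's deviant syndromes have already been determined from its normed scales, excludes sources that show no deviance from the comparison (as the text specifies), and takes the answer to the final question about contexts from the clinician.

```python
def taxonomic_decision(deviant_by_source, behavior_differs_across_contexts=None):
    """Walk the decision tree in Figure 2 for one child.

    deviant_by_source: dict mapping a data source (e.g., "parent", "teacher",
    "self") to the set of syndrome scales scored in the clinical range.
    behavior_differs_across_contexts: answer to the final question, if known.
    """
    # Only sources that show some deviance enter the differential-diagnosis steps.
    deviant = [set(s) for s in deviant_by_source.values() if s]
    if not deviant:
        return ("No evidence of clinical deviance; still check individual items "
                "for important problems, e.g., suicidal behavior, firesetting")
    same_in_all = all(s == deviant[0] for s in deviant)
    if same_in_all and len(deviant[0]) == 1:
        return "Focalized problems in the area of a single syndrome"
    if same_in_all:
        return "Multiple syndromes or a complex profile pattern"
    if behavior_differs_across_contexts is None:
        return "Sources disagree: determine whether behavior differs across contexts"
    if behavior_differs_across_contexts:
        return "Different behaviors may have to be targeted for change in different contexts"
    return "Some informants' perceptions may have to be targeted for change"


if __name__ == "__main__":
    child = {"parent": {"Aggressive"}, "teacher": {"Attention Problems"}, "self": set()}
    print(taxonomic_decision(child, behavior_differs_across_contexts=True))
```

Keeping the final answer as an explicit argument reflects that the last branch depends on information about the child's contexts, not on the scale scores themselves.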
Additional choices besides those represented in the decision tree are
possible, as are additional sources of data, such as medical examinations,
interviews with parents and teachers, family assessment, and psycho-
logical tests. However, the taxonomic decision tree and cross-informant
program are especially valuable for focusing research, training, and
clinical decision-making on taxonomic distinctions that can be made from
multiple sources relevant to the assessment of most children.
Summary
Multiaxial empirically based assessment refers to the use of assess-
ment data and taxonomic constructs that are empirically derived from
multiple sources.
Assessment involves identifying the distinguishing features of each case.
Taxonomy involves grouping cases according to their distinguishing features.
Empirically based assessment links assessment and taxonomy by deriving
taxonomic constructs from specific assessment data and by operationalizing
the taxonomic constructs via specific assessment procedures.
Meta-analyses of many studies have revealed significant but modest
correlations between assessment data from different kinds of informants
seeing children in different contexts. Furthermore, multivariate analyses
have shown variations in the patterning and prevalence of problems
reported for children of each sex at different ages.
To deal with the variations among informants and sex/age groups, we
derived core syndromes consisting of sets of problem items found to co-
occur in ratings by a particular type of informant for multiple sex/age
groups. We then compared the core syndromes derived from parent-,
self-, and teacher-ratings to identify sets of items that co-occurred in
ratings by all three types of informants. The items that were common to
the core syndromes for at least two of the three types of informants were
used to form cross-informant syndrome constructs. These constructs rep-
resent inferred or "latent" variables that may not be exhaustively measured
by any single source of data. The items defining a syndrome construct
provide a prototype for cases considered to manifest the syndrome. The
degree to which cases manifest the prototypical features of the syndrome
can be judged from parent-, self-, and teacher-ratings on the CBCL, YSR,
and TRF, respectively.
Because some items are associated with a syndrome construct only
in ratings by a particular informant, each informant's ratings are scored
on instrument-specific syndrome scales. These scales are normed on
ratings by each type of informant for national samples of 4- to 18-year-
olds. The syndrome scales and scales for scoring competencies are
displayed on the 1991 profiles for the CBCL, YSR, and TRF.
To facilitate the integration of data from parent-, self-, and teacher-
reports, we have developed a cross-informant computer program for
scoring and comparing the CBCL, YSR, and TRF profiles for individual
children. We have also developed a taxonomic decision tree for compar-
ing deviance on scales scored from parents, self-reports, teachers,
interviews, and direct observations. This body of work is designed to aid
in focusing research, training, and clinical decision-making on taxonomic
100 THOMAS M. ACHENBACH
References
Achenbach, T.M. (1965). A factor-analytic study of juvenile psychiatric symptoms.
Presented at the Society for Research in Child Development, Minneapolis, MN.
Achenbach, T.M. (1966). The classification of children's psychiatric symptoms: A
factor-analytic study. Psychological Monographs, 80 (No. 615).
Achenbach, T.M. (1978). The Child Behavior Profile: I. Boys aged 6-11. Journal of
Consulting and Clinical Psychology, 46, 478-488.
Achenbach, T.M. (1985). Assessment and taxonomy of child and adolescent
psychopathology. Newbury Park, CA: Sage.
Achenbach, T.M. (1990a). Young Adult Behavior Checklist. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T.M. (1990b). Young Adult Self-Report. Burlington, VT: University of
Vermont, Department of Psychiatry.
Achenbach, T.M. (1991a). Integrative guide for the 1991 CBCL/4-18, YSR, and TRF
Profiles. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991b). Manual for the Child Behavior Checklist and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991c). Manual for the Teacher's Report Form and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991d). Manual for the Youth Self-Report and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., & Brown, J.S. (1991). Bibliography of published studies using the
Child Behavior Checklist and related materials: 1991 edition. Burlington, VT:
University of Vermont, Department of Psychiatry.
Achenbach, T.M., Conners, C.K., Quay, H.C., Verhulst, F.C., & Howell, C.T. (1989).
Replication of empirically derived syndromes as a basis for taxonomy of child/
adolescent psychopathology. Journal of Abnormal Child Psychology, 17, 299-323.
Achenbach, T.M., & Edelbrock, C. (1981). Behavioral problems and competencies
reported by parents of normal and disturbed children aged four to sixteen.
Monographs of the Society for Research in Child Development, 46 (Serial No. 188).
Achenbach, T.M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist
and Revised Child Behavior Profile. Burlington, VT: University of Vermont,
Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1986). Manual for the Teacher's Report Form and
Teacher Version of the Child Behavior Profile. Burlington, VT: University of
Vermont, Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1987). Manual for the Youth Self-Report and
Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., Edelbrock, C., & Howell, C.T. (1987). Empirically based
assessment of the behavioral/emotional problems of 2-3-year-old children.
Journal of Abnormal Child Psychology, 15, 629-650.
Achenbach, T.M., & Lewis, M. (1971). A proposed model for clinical research and
its application to encopresis and enuresis. Journal of the American Academy of
Child Psychiatry, 10, 535-554.
Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent
behavioral and emotional problems: Implications of cross-informant
correlations for situational specificity. Psychological Bulletin, 101, 213-232.
American Psychiatric Association (1952, 1968, 1980, 1987). Diagnostic and statistical
manual of mental disorders (1st, 2nd, 3rd, 3rd ed. rev.). Washington, DC: Author.
Anderson, L.M. (1969). Personality characteristics of parents of neurotic,
aggressive, and normal preadolescent boys. Journal of Consulting and Clinical
Psychology, 33, 575-581.
Cantor, N., Smith, E.E., French, R.deS., & Mezzich, J. (1980). Psychiatric diagnosis
as prototype categorization. Journal of Abnormal Psychology, 89, 181-193.
Costello, A.J., Edelbrock, C., Dulcan, M.K., Kalas, R., & Klaric, S.H. (1984). Report
on the Diagnostic Interview Schedule for Children (DISC). Pittsburgh, PA:
University of Pittsburgh, Department of Psychiatry.
Edelbrock, C., & Costello, A.J. (1988). Convergence between statistically derived
behavior problem syndromes and child psychiatric diagnoses. Journal of
Abnormal Child Psychology, 16, 219-231.
Edelbrock, C., Costello, A.J., Dulcan, M.K., Kalas, R., & Conover, N.C. (1985). Age
differences in the reliability of the psychiatric interview of the child. Child
Development, 56, 265-275.
Edelbrock, C., Costello, A.J., & Kessler, M.D. (1984). Empirical corroboration of
attention deficit disorder. Journal of the American Academy of Child Psychiatry,
23, 285-290.
Hafner, A.J., Quast, W., & Shea, M.J. (1975). The adult adjustment of one thousand
psychiatric and pediatric patients: Initial findings from a twenty-five year follow-
up. In R.D. Wirt, G. Winokur, & M. Roff (Eds.), Life history research in
psychopathology (Vol. 4). Minneapolis, MN: University of Minnesota Press.
Horowitz, L.M., Post, D.L., French, R.deS., Wallis, K.D., & Siegelman, E.Y. (1981).
The prototype as a construct in abnormal psychology: 2. Clarifying disagreement
in psychiatric judgments. Journal of Abnormal Psychology, 90, 575-585.
Horowitz, L.M., Wright, J.C., Lowenstein, E., & Parad, H.W. (1981). The prototype
as a construct in abnormal psychology: 1. A method for deriving prototypes.
Journal of Abnormal Psychology, 90, 568-574.
Katz, P.A., Zigler, E., & Zalk, S.R. (1975). Children's self-image disparity: The effects
of age, maladjustment and action-thought orientation. Developmental
Psychology, 11, 546-550.
Mattison, R., Cantwell, D.P., Russell, A.T., & Will, L. (1979). A comparison of
DSM-II and DSM-III in the diagnosis of childhood psychiatric disorders. Archives of
General Psychiatry, 36, 1217-1222.
McConaughy, S.H., & Achenbach, T.M. (1988). Practical guide for the Child Behavior
Checklist and related materials. Burlington, VT: University of Vermont,
Department of Psychiatry.
McConaughy, S.H., & Achenbach, T.M. (1990). Guide for the Semistructured Clinical
Interview for Children Aged 6-11. Burlington, VT: University of Vermont,
Department of Psychiatry.
Mezzich, A.C., Mezzich, J.E., & Coffman, G.A. (1985). Reliability of DSM-III vs.
DSM-II in child psychopathology. Journal of the American Academy of Child Psychiatry,
24, 273-280.
Roff, J.D., Knight, R., & Wertheim, E. (1976). Disturbed preschizophrenics:
Childhood symptoms in relation to adult outcome. Journal of Nervous and
Mental Disease, 162, 274-281.
Rolf, J.E. (1972). The social and academic competence of children vulnerable to
schizophrenia and other behavior pathologies. Journal of Abnormal Psychology,
80, 225-243.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B.B. Lloyd (Eds.),
Cognition and categorization. Hillsdale, NJ: Erlbaum.
Strober, M., Green, J., & Carlson, G. (1981). The reliability of psychiatric diagnosis
in hospitalized adolescents: Interrater agreement using the DSM-III. Archives of
General Psychiatry, 38, 141-145.
Weinstein, S.R., Noam, G.G., Grimes, K., Stone, K., & Schwab-Stone, M. (1990).
Convergence of DSM-III diagnoses and self-reported symptoms in child and
adolescent inpatients. Journal of the American Academy of Child and Adolescent
Psychiatry, 29, 627-634.
Weintraub, S.A. (1973). Self-control as a correlate of an internalizing-externalizing
symptom dimension. Journal of Abnormal Child Psychology, 1, 292-307.
Werry, J.S., Methven, R.J., Fitzpatrick, J., & Dixon, H. (1983). The interrater
reliability of DSM-III in children. Journal of Abnormal Child Psychology, 11, 341-354.
Author Note
Much of the recent work reported here has been supported by NIMH Grant 40305
and the Spencer Foundation, for which the author is most grateful.
CHAPTER 4
The Psychopathy
Checklist-Revised (PCL-R)
An Overview for Researchers
and Clinicians
The Psychopathy Checklist (PCL; Hare, 1980) and its revision (PCL-R;
Hare, 1985a, in press) are clinical rating scales that provide researchers
and clinicians with reliable and valid assessments of psychopathy. Their
development was spurred largely by dissatisfactions with the ways in
which other assessment procedures defined and measured psychopathy
(Hare, 1980, 1985b).
The PCL was originally intended only for research with forensic
populations. However, over the past ten years both the construct of
psychopathy and the PCL itself have become topics of intense interest to
Definition
Psychopathy, as defined in the PCL-R, is a personality disorder. Like
all personality disorders, it has an early onset and characterizes the
individual's long-term functioning, resulting in social and interpersonal
dysfunction (e.g., American Psychiatric Association, 1980, 1987; Millon,
1981). Symptoms of psychopathy are usually evident by middle to late
childhood, and can be assessed reliably in adolescence (Forth, Hart, &
Hare, 1990; Robins, 1966). The disorder is chronic and persists well into
adulthood, although there may be some changes in its symptom pattern
PSYCHOPATHY CHECKLIST 105
with respect to interpersonal and affective traits (Hart & Hare, 1989).
In general, the empirical association between diagnoses of
psychopathy and APD in forensic populations is asymmetric: Most crimi-
nal psychopaths are also APD, but the reverse is not true. For example,
about 90% of criminal psychopaths, diagnosed according to PCL-R criteria,
meet DSM-III/DSM-III-R APD criteria, but only about 20% to 30% of
those diagnosed as APD meet the PCL-R criteria for psychopathy (Hare,
1983, in press; Hart & Hare, 1989). One reason for this asymmetry is that
in forensic populations the base rate for psychopathy, as defined by the
PCL/PCL-R, is much lower than is the base rate for APD. An additional
reason is that most criminal psychopaths engage in the sort of antisocial
behaviors that define APD, whereas the majority of prisoners and foren-
sic patients with APD show little evidence of the affective and interpersonal
features measured by the PCL/PCL-R.
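Once the two conditional rates are fixed, the asymmetry follows directly from Bayes' rule. Both rates describe the same joint event, so P(APD | psychopathy) × P(psychopathy) = P(psychopathy | APD) × P(APD). The sketch below uses the approximate percentages quoted above; the function name is illustrative. It shows that those figures imply a base rate of psychopathy only about 0.22 to 0.33 times the base rate of APD in these populations.

```python
def base_rate_ratio(p_apd_given_psychopathy: float,
                    p_psychopathy_given_apd: float) -> float:
    """Return P(psychopathy) / P(APD) implied by two conditional rates.

    Both conditionals describe the same joint event, so
    P(APD | psych) * P(psych) == P(psych | APD) * P(APD), which rearranges to
    P(psych) / P(APD) == P(psych | APD) / P(APD | psych).
    """
    return p_psychopathy_given_apd / p_apd_given_psychopathy


# Approximate figures from the text: about 90% of criminal psychopaths meet
# APD criteria, while only about 20% to 30% of APD cases meet PCL-R criteria.
for p_psych_given_apd in (0.20, 0.30):
    ratio = base_rate_ratio(0.90, p_psych_given_apd)
    print(f"{p_psych_given_apd:.0%} of APD cases psychopathic -> "
          f"psychopathy base rate = {ratio:.2f} x APD base rate")
```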
The situation may change when the next version of DSM is published.
The Axis II Work Group of the DSM-IV Task Force has identified APD
as the personality disorder most likely to undergo major changes in DSM-IV
(American Psychiatric Association, 1990). The primary goals of the
Work Group are to simplify the criteria for this disorder and at the same
time include more traditional items typical of psychopathy. Four criteria
sets are being evaluated and compared in field trials, under the overall
direction of Thomas Widiger (see Widiger et al., in press; Hare, Hart, &
Harpur, in press). Briefly, the four criteria sets are as follows: the existing
DSM-III-R criteria for APD; a shortened list of the DSM-III-R criteria; the ICD-10
criteria for dyssocial personality disorder (see below); and a list of 10
items derived from the PCL-R, half measuring interpersonal and affective
behaviors and half measuring criminal and antisocial behaviors.
Assessment Issues
The problems with the diagnostic criteria for APD also apply to self-
report scales designed to assess psychopathy, such as Scale 4 (Pd) of the
MMPI/MMPI-2 (Hathaway & McKinley, 1943; Butcher, Dahlstrom, Graham,
Tellegen, & Kaemmer, 1989), or the Socialization (So) scale of the CPI
(Gough, 1969); they also seem to measure only the social deviance
components of psychopathy (Hare, 1985b; Harpur et al., 1989).2 Self-
report scales are also problematic in that they require the cooperation of
the inmate or patient, and are susceptible to attempts at malingering or
socially desirable responding, a special concern when trying to assess
psychopathy, since lying and deceitfulness are commonly included as
diagnostic criteria for the disorder (Hare, 1985b; Hare, Forth, & Hart, 1989;
Hart, Forth, & Hare, in press). It was concerns such as these that
motivated the development of the PCL.
The PCL
The first step in construction of the PCL was to generate a list of
characteristics, derived from experience with criminals and from a
survey of the literature, that might differentiate between psychopathic
and nonpsychopathic inmates. Over 100 such characteristics were
identified. The list was shortened by deleting partially overlapping items.
Next, we formed two criterion groups of subjects, psychopaths and
nonpsychopaths, on the basis of global ratings. Independent raters then
reassessed these subjects and rated them on each of the preliminary
items. Items were dropped from the list if they correlated too highly with
other items, or if they had low correlations with the global ratings, low
interrater reliability, or extreme base rates. After these steps, a list of 22
characteristics remained; these became the original version of the PCL
(Hare, 1980; Hare & Frazelle, 1980).
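The screening rules above amount to a filter over item-level statistics. The sketch below is a hypothetical illustration only. The chapter does not report the cutoffs that were actually used, so every threshold, the `ItemStats` record, and both function names are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cutoffs for illustration; the chapter does not report the
# values actually used when the PCL was constructed.
MAX_R_WITH_OTHER_ITEM = 0.80
MIN_R_WITH_GLOBAL = 0.30
MIN_INTERRATER_R = 0.60
BASE_RATE_RANGE: Tuple[float, float] = (0.10, 0.90)


@dataclass
class ItemStats:
    name: str
    max_r_with_other_item: float  # highest correlation with any other item
    r_with_global: float          # correlation with the global rating
    interrater_r: float           # agreement between independent raters
    base_rate: float              # proportion of subjects rated as showing it


def keep_item(item: ItemStats) -> bool:
    low, high = BASE_RATE_RANGE
    return (item.max_r_with_other_item <= MAX_R_WITH_OTHER_ITEM
            and item.r_with_global >= MIN_R_WITH_GLOBAL
            and item.interrater_r >= MIN_INTERRATER_R
            and low <= item.base_rate <= high)


def screen_items(items: List[ItemStats]) -> List[str]:
    # This simple filter drops *both* members of a highly correlated pair;
    # a real screening would more plausibly keep the better of the two.
    return [item.name for item in items if keep_item(item)]
```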
The PCL-R
Subsequent revisions of the PCL consisted of (a) dropping two items
with unsatisfactory psychometric characteristics and problematic scor-
ing criteria, (b) modifying item descriptions to clarify their scoring, and
(c) changing the wording of several items. The result was the 20-item
PCL-R (Hare, 1985a, in press). The items in the PCL-R are presented in Table 1.
108 HART ET AL.
Table 1
The PCL-R Items: Mean Interrater Reliability, Corrected Item-Total
Correlations, and Descriptive Statistics
Interrater r    Item-total r    M    SD
assessed by two independent raters, one using the PCL and the other
using the PCL-R. The correlation between the PCL and PCL-R Total scores
was .88, a value that is similar to the interrater reliability of the individual
scales. Indeed, when the correlation between the scales is disattenuated
for rater unreliability, it approaches unity.
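"Disattenuated" here refers to Spearman's classical correction for attenuation. It divides an observed correlation by the square root of the product of the two measures' reliabilities. As a hedged illustration, assume both versions have an interrater reliability of about .88. The chapter says only that the reliabilities are similar to the .88 correlation. Under that assumption, the corrected value is essentially 1:

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Assumed reliabilities of .88 for both scales (illustrative only).
print(round(disattenuate(0.88, 0.88, 0.88), 2))
```

The reliabilities are themselves estimates. In practice, the corrected value can therefore come out slightly above 1.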
Scale format
The items listed in Table 1 are merely summary labels; the PCL-R
manual (Hare, in press) contains a detailed description of each item, as
well as a section on the sources of information typically used to score the
item. Each item is scored on a 3-point scale, where 0 indicates that the
symptom definitely does not apply to the individual; 1, that the item
applies somewhat or only in a limited sense; and 2, that the item definitely
applies. The ratings for most items involve some degree of judgment and
inference, guided by the item description in the manual. For several items,
however, fixed and explicit criteria are provided. The PCL-R allows raters
to omit items that they feel cannot be scored properly due to missing or
incomplete information.
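A scored protocol can be represented as a list of item scores. The minimal sketch below is a hypothetical helper, not part of the PCL-R materials. It uses `None` for an omitted item, checks that every other entry uses the 0/1/2 scale described above, and counts omissions.

```python
from typing import List, Optional

# 0 = definitely does not apply, 1 = applies somewhat or in a limited sense,
# 2 = definitely applies; None marks an item the rater chose to omit.
VALID_SCORES = {0, 1, 2}


def count_omissions(ratings: List[Optional[int]], n_items: int = 20) -> int:
    """Validate one set of item ratings and return how many were omitted."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    for number, score in enumerate(ratings, start=1):
        if score is not None and score not in VALID_SCORES:
            raise ValueError(f"item {number}: {score!r} is not 0, 1, or 2")
    return sum(score is None for score in ratings)
```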
Administration
PCL-R assessments are based on an interview and a review of collat-
eral information. In our research we use a semi-structured interview that
allows the interviewer (a) to obtain the requisite historical information,
and (b) to observe the individual's interactional style. The interview
takes about 90 to 120 minutes to complete, and covers educational,
occupational, family, marital, and criminal history. Although a more
structured interview would perhaps increase the reliability of some
information collected, it would also tend to obscure or suppress the
individual's natural interactional style.
The type of collateral information available varies according to the
setting in which the assessment is made. In correctional settings there is
usually ample material available: a criminal record, intake or classification
reports, presentence reports, institutional progress logs, past parole
or probation records, and so forth. In forensic psychiatric and pretrial
settings, police reports concerning the individual's current offenses,
interviews with family members, and results of medical and psychologi-
cal assessments may also be available. The purposes of this collateral
information are (a) to help evaluate the credibility of information ob-
tained during the interview, (b) to help determine if the interactional style
exhibited by the individual during the interview was representative of his
or her usual behavior, and (c) to provide the primary data for scoring
several of the items. A collateral review typically takes about one hour. In
the absence of adequate collateral information, the PCL-R cannot be
scored.
Occasionally, there are large discrepancies between the interview
and collateral information. If it is possible to determine that one source
of information is more credible than the other, then greatest weight is
given to information from the more credible source. Otherwise, preference
is given to the source most suggestive of psychopathology, on the
assumption that most individuals tend to underreport pathological
behavior.
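The tie-breaking rule for conflicting information can be sketched for a single item. The chapter describes giving the more credible source "greatest weight", which is a clinical judgment. This sketch simplifies that judgment to using the credible source's score outright. The function and parameter names are hypothetical.

```python
from typing import Optional


def resolve_item_score(interview: int, collateral: int,
                       more_credible: Optional[str] = None) -> int:
    """Pick an item score when interview and collateral information disagree.

    more_credible: "interview" or "collateral" if the rater can tell which
    source is more credible, otherwise None.
    """
    if more_credible == "interview":
        return interview
    if more_credible == "collateral":
        return collateral
    if more_credible is not None:
        raise ValueError(f"unknown source: {more_credible!r}")
    # No basis for judging credibility: favor the more pathological account,
    # on the assumption that most people underreport pathological behavior.
    return max(interview, collateral)
```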
It may prove impossible to interview the individual in some situations
(e.g., research using archival information). Acceptable ratings may be
made without an interview only if there is extensive, high-quality file
information available (e.g., Hart & Hare, 1989; Harris, Rice, & Cormier,
1990; Wong, 1988); where possible, behavioral observations and informal
interactions with the client should be used to supplement collateral
information.
Total Scores
Once the interview and collateral review have been completed, each
PCL-R item is scored. The individual items are then summed to yield a
Total score. If five or fewer items are omitted the Total score should be
prorated to a 20-item scale; if more than five items are omitted the
assessment should be considered invalid. Total scores on the PCL-R
range from 0 to 40.
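One way to prorate is to rescale the raw sum by the ratio of total items to scored items. The sketch below assumes that simple linear rule; the manual may present proration differently, for example as a lookup table. The function name is hypothetical.

```python
from typing import List, Optional


def pcl_r_total(ratings: List[Optional[int]], n_items: int = 20,
                max_omitted: int = 5) -> Optional[float]:
    """Sum item scores, prorating to n_items when some items are omitted.

    ratings: one entry per item, scored 0-2, with None for an omitted item.
    Returns None when too many items are omitted to give a valid total.
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    scored = [r for r in ratings if r is not None]
    if n_items - len(scored) > max_omitted:
        return None  # assessment considered invalid
    # Scale the raw sum up to what it would be on the full item set.
    return sum(scored) * n_items / len(scored)


print(pcl_r_total([2] * 9 + [1] * 9 + [None] * 2))
```

In the usage line, 18 items are scored with a raw sum of 27. This prorates to 27 * 20 / 18 = 30.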
Cutoff scores may be used to classify individuals if a diagnosis is
desired. For the PCL, we used a score of 33 or more as indicative of
psychopathy; this cutoff appeared to provide the best balance between
sensitivity and specificity with respect to global ratings of psychopathy
in the PCL derivation samples. The corresponding cutoff for the PCL-R is
30. Subsequent research has confirmed the utility of these cutoffs (Wong
& Templeman, 1988).
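A diagnostic cutoff is a threshold on the Total score. The sketch below applies the two cutoffs named in the text, 33 or more for the PCL and 30 or more for the PCL-R. The dictionary and function names are illustrative.

```python
CUTOFFS = {"PCL": 33, "PCL-R": 30}


def meets_cutoff(total: float, version: str = "PCL-R") -> bool:
    """Return True if a Total score reaches the research cutoff for a version."""
    try:
        cutoff = CUTOFFS[version]
    except KeyError:
        raise ValueError(f"unknown version: {version!r}") from None
    return total >= cutoff
```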
Factor Scores
The PCL-R (as well as the PCL) consists of two stable, oblique factors
(Hare et al., 1990; Harpur, Hakstian, & Hare, 1988). The correlation
between the factors is about the same in samples of prison inmates (.56
on average) as it is in samples of forensic patients (.53 on average). The
factors can be viewed as psychologically meaningful facets of the "higher-
order" construct of psychopathy.
Descriptive Statistics
Researchers generally report very similar mean PCL-R Total scores
for samples of criminals, regardless of the country the research is
conducted in, the security level of the institution, whether or not subjects
are volunteers, and the racial composition of the sample (Hare, in press;
Harpur et al., 1988, 1989; Kosson et al., 1990). The mean score in samples
drawn from forensic psychiatric facilities generally is lower than the
mean for samples of prison inmates.
Table 2 presents descriptive statistics for the PCL-R, aggregated
across seven samples of male prison inmates (N = 1192) and across four
samples of male forensic psychiatric patients (N = 440). The data are from
studies conducted by a number of investigators in Canada, the United
States, and England for which individual item scores were available to us.
Raters varied in their professional qualifications and degree of experi-
ence with the PCL-R; samples differed in mean age, racial composition,
security level, and so forth. In each case, the distribution of Total scores
is approximately normal, with a slight negative skew.
Table 2 also presents means for the Factor scores in prison and
forensic psychiatric populations.
Table 2
Mean PCL-R Total, Factor 1, and Factor 2 Scores for Prison Inmates and
Forensic Psychiatric Patients
Note. Based on pooled data from seven samples of prison inmates (N = 1192) and
four samples of forensic psychiatric patients (N = 440). From Hare (in
press).
Demographic Variables
Age. PCL-R Total scores do not vary appreciably as a function of the
age of subjects. This is not true for Factor 2 scores, however. Cross-
sectional analyses (of the PCL and PCL-R) indicate that Factor 1 scores are
stable across age groups ranging from 15 to 55, whereas Factor 2 scores
show a significant linear decline across the same groups (Harpur & Hare,
1990a).
Race. Mean Total scores for Black and Native North American Indian
males are within 1 or 2 points of the mean for White males (Hare, in press;
Kosson et al., 1990; Peterson, 1984; Wong, 1984). There is no consistent
evidence that individual items are racially biased (Hare, in press); never-
theless, the issue needs further investigation. A limitation of current data
is that the raters typically have been White. As a result we do not know
to what extent the evaluation of items dealing with interpersonal and
affective characteristics is influenced by racial and cultural differences
between the rater and the inmate or patient.
Gender. In the few samples of female offenders studied to date, Total scores are
reliable and distributed much as they are for male offenders (see Hare, in
press; Neary, 1990).
Socioeconomic level. PCL-R (and PCL) Total and Factor 2 scores are
negatively correlated with the occupational achievement of subjects, but
uncorrelated with the occupational achievement of their parents (Harpur
et al., 1989). With respect to educational achievement, criminal
psychopaths typically complete fewer years of formal schooling, but take part in
more educational and vocational upgrading in prison, than do other
criminals; thus, Total scores are uncorrelated with overall educational
achievement. Factor 2 scores, however, correlate negatively with educa-
tion (Harpur et al., 1989).
Reliability
Table 3
Reliability of PCL-R Total and Factor Scores

                              Prison inmates              Forensic patients
                          Total  Factor 1  Factor 2   Total  Factor 1  Factor 2
Internal consistency (a)
  Alpha                    .87     .84       .77       .85     .80       .77
  MIC                      .26     .40       .28       .22     .34       .28
ICC (b)
  One rating               .83     .72       .83       .86     .77       .83
  Average of two ratings   .91     .86       .91       .93     .88       .92
Note. Adapted from Hare (in press). Alpha = Cronbach's alpha; MIC = mean
inter-item correlation; ICC = intraclass correlation.
a) Based on data from 1192 inmates and 440 patients.
b) Based on data from 385 inmates and 90 patients.
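Alpha and mean inter-item correlation are closely linked. For k items with mean inter-item correlation r, standardized alpha is kr / (1 + (k - 1)r). This gives a rough consistency check on Table 3, under two assumptions not stated in this section. First, the Total score has 20 items, Factor 1 has 8, and Factor 2 has 9. Second, each group's three columns run Total, Factor 1, Factor 2. Under these assumptions, the MIC values reproduce the tabled alphas to within about .01. Small gaps are expected because the MICs are rounded, and Cronbach's alpha equals the standardized form only when item variances are equal.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized alpha from the number of items and the mean inter-item r."""
    k, r = n_items, mean_inter_item_r
    return k * r / (1 + (k - 1) * r)


# Assumed item counts: 20 (Total), 8 (Factor 1), 9 (Factor 2); MIC values
# are the prison-inmate figures from Table 3.
for label, k, mic in [("Total", 20, 0.26), ("Factor 1", 8, 0.40),
                      ("Factor 2", 9, 0.28)]:
    print(f"{label}: {standardized_alpha(k, mic):.2f}")
```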
using a one-way random effects model (Bartko, 1976; Shrout & Fleiss,
1979). Data from two sets of judges were used to calculate the expected
reliability of a single judge's ratings (ICC1) and the expected reliability of
the mean of two ratings (ICC2).
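The sketch below computes the two one-way random-effects coefficients from a subjects-by-ratings table, using the standard between- and within-subject mean squares. In Shrout and Fleiss's notation, these are ICC(1,1) and ICC(1,k). The function name and the toy data are illustrative. For this model, the second coefficient equals what the Spearman-Brown formula predicts from the first. A single-rating ICC of .83 therefore implies about 2(.83) / 1.83, or roughly .91, for the mean of two ratings.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_iccs(ratings: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (single-rating ICC, ICC for the mean of k ratings).

    ratings[i][j] is the j-th rating of subject i; every subject must have
    the same number k >= 2 of ratings, and there must be at least 2 subjects.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subjects and within-subjects sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_mean = (ms_between - ms_within) / ms_between
    return icc_single, icc_mean


# Toy Total scores from two raters for four subjects (invented data).
single, average = one_way_iccs([[30, 32], [20, 22], [10, 9], [25, 25]])
print(f"ICC1 = {single:.2f}, ICC2 = {average:.2f}")
```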
The ICCs indicate that PCL-R Total and Factor scores are highly
reliable, especially when averaged across two raters, despite the subjec-
tive nature of many of the items.
Generalizability
Generalizability (G) theory (Cronbach, Gleser, Nanda, & Rajaratnam,
1972) has several advantages over classical test score theory. For one, it
provides a single index of the adequacy of measurement, namely the
generalizability coefficient (GC). Applying G theory to the PCL-R, the main
concern is to reliably rank order individuals (the object of measurement).
Variance due to individuals is considered universe score variance (true
score variance in the classical sense), and error variance arises from the
interaction of individuals with all other sources (test items, time, raters,
institutions, etc.). The ratio of universe score variance to universe score
plus error variance reflects the generalizability (reliability) of measure-
ment. This ratio is the GC; it is an intraclass correlation that is interpreted
in the same way as traditional reliability coefficients. (See Wiggins, 1973,
pp. 284-295, for a succinct discussion of G theory.)
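The ratio can be made concrete for the simplest design, in which every person is scored by the same raters (a crossed persons-by-raters G study). The sketch below illustrates that assumption only. It is not the multi-facet analysis Schroeder and Hare reported. It estimates the person (universe-score) variance component from two-way ANOVA mean squares. It then returns the coefficient for rank-ordering decisions. In that coefficient, only the person-by-rater interaction, which is confounded with residual error, counts as error. A constant difference between raters does not. The optional `n_raters` parameter is a hypothetical convenience for asking how the coefficient would change if more or fewer raters were averaged.

```python
from statistics import mean
from typing import Optional, Sequence


def relative_gc(scores: Sequence[Sequence[float]],
                n_raters: Optional[int] = None) -> float:
    """Generalizability coefficient for a crossed persons x raters design.

    scores[p][r] is rater r's score for person p. n_raters is the number of
    raters whose mean will be used in practice (defaults to the number in
    the data).
    """
    n_p, n_r = len(scores), len(scores[0])
    n_use = n_raters or n_r
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(row[r] for row in scores) for r in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r  # person x rater interaction + error
    ms_p = ss_p / (n_p - 1)
    ms_res = max(ss_res / ((n_p - 1) * (n_r - 1)), 0.0)
    var_person = max((ms_p - ms_res) / n_r, 0.0)  # universe-score variance
    var_error = ms_res / n_use                     # relative error variance
    return var_person / (var_person + var_error)
```

If the raters differ only by a constant offset, the relative coefficient is 1. The rater main effect does not affect how persons are rank ordered.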
Schroeder and Hare (1990) performed a G analysis on data from 475
inmates and patients for whom double ratings were available. The GC was
.82 for Total scores, .76 for Factor 1 scores, and .83 for Factor 2 scores.4
Validity
Because the PCL and PCL-R are so highly correlated, evidence con-
cerning the validity of one has direct implications for the validity of the
other. In several studies both PCL and PCL-R ratings were collected, and
in each case the scales had identical patterns of correlations with external
variables (e.g., other measures of psychopathy, parole outcome, vio-
lence, etc.). For these reasons, separate sections are not devoted to the
two versions in the following discussion.
Content-related evidence
Although they owe much to Cleckley (1976), the PCL-R items are
consistent in content with the conceptualizations of psychopathy dis-
cussed by many other authors (e.g., Buss, 1966; Karpman, 1961; McCord
& McCord, 1964; Millon, 1981). They are also consistent with the views of
psychopathy held by practicing clinicians (Gray & Hutchison, 1964;
Davies & Feldman, 1981) and researchers (Fotheringham, 1957; Albert,
Brigante, & Chase, 1959).
As noted above, the PCL-R criteria for psychopathy overlap consid-
erably with the DSM-III/DSM-III-R, Research Diagnostic Criteria (RDC;
Spitzer, Endicott, & Robins, 1975), and Feighner criteria (Feighner et al.,
1972) for APD, although the DSM-III/DSM-III-R and RDC criteria pay little
attention to interpersonal and affective characteristics. The fact that the
DSM-IV Task Force decided to use items based on the PCL-R in its field
trials may be seen as support for the content-related validity of the PCL-R.
The PCL-R criteria are also very similar to the current International
Classification of Diseases (ICD-9; World Health Organization, 1978) cat-
egory 301.7, "Personality disorder with predominantly sociopathic or
asocial manifestations" (also referred to as APD in clinical modifications
of ICD-9), and to the proposed ICD-10 revision of this category (F60.2,
"dyssocial personality disorder"; see Sartorius, Jablensky, Cooper, &
Burke, 1988). Like the PCL-R, the ICD-10 category makes use of inferences
about personality traits.
In sum, the PCL-R items appear to provide good coverage of the
domain of psychopathic traits as defined in clinical practice, research,
and standard diagnostic criteria.
Hart et al. (1988) found that the PCL predicted conditional release
violations in a sample of 231 federal offenders, even after they had
controlled for variables such as type of release granted (parole versus
mandatory supervision), criminal history, previous conditional release
violations, and demographic characteristics. Psychopaths recidivated
faster than did nonpsychopaths, and over twice as often. Following their
release, psychopaths had poorer social and occupational functioning than
did nonpsychopaths, regardless of the eventual outcome of their release.
Similar results were obtained by Serin, Peters, and Barbaree (1990).
They reported that the PCL was considerably better at predicting the
performance of 93 male inmates on unescorted temporary absence or
parole than were several standard actuarial instruments, including the
Base Expectancy Scale (BES; Gottfredson & Bonds, 1961), the Salient
Factor Score (SFS; Hoffman & Beck, 1974), and the Recidivism Prediction
Scale (RPS; Nuffield, 1982).
In addition to general recidivism, the PCL-R appears to be useful in the
prediction of violent recidivism. Serin (in press) found that PCL-R Total
scores, but not scores on actuarial instruments (the BES, SFS, and RPS),
were significantly correlated with violent outcome following the release
of 81 male inmates. Harris et al. (1990) studied the post-release behavior
of 169 male forensic psychiatric patients. The violent recidivism rate for
psychopaths was almost four times that of nonpsychopaths. The PCL-R
significantly improved the prediction of outcome over and above the use
of criminal history variables. In another study, Rice, Harris, and Quinsey
(1990) studied 54 rapists released from a maximum security psychiatric
hospital. PCL-R Total scores were significantly correlated with recidivism
for violent offenses in general, and with recidivism for sexual offenses in
particular. A combination of PCL-R scores and a phallometric index of
arousal (based on penile plethysmography) predicted recidivism as well
as did a large battery of criminal-history and demographic variables.
Ogloff, Wong, and Greenwood (1990) performed an outcome study of
80 male forensic patients enrolled in a therapeutic community program
designed to treat personality disordered criminals. Data were prospec-
tive for some patients and retrospective for others. The outcome variables
included (1) the number of days that the patient remained in the program;
(2) ratings (4-point scale) of degree of motivation/effort put into the
program; and (3) ratings (4-point scale) of degree of improvement shown
during treatment. Results indicated that PCL-R psychopaths remained in
the program for a shorter period of time, put in less effort, and showed
less improvement, than did other inmates.
Construct-related evidence
Other clinical measures. Evidence of the PCL/PCL-R's convergent
validity comes mainly from studies looking at the performance of psycho-
paths on psychological tests that theoretically should be related to
psychopathy. For example, PCL/PCL-R Total scores correlate positively
with scores on measures of impulsivity, Machiavellianism, narcissism,
and sensation-seeking (Hare, in press; Harpur et al., 1989). With respect
to interpersonal style, Foreman (1988) found that PCL-R Total scores
were positively correlated with ratings of dominance and negatively
correlated with ratings of nurturance on the Interpersonal Adjective
Scales (Wiggins, Trapnell, & Phillips, 1988), regardless of whether these
ratings were made by inmates themselves or by institutional staff. A
series of analyses of responses on the Rorschach test revealed that PCL-R
scores were positively correlated with psychodynamic measures related
to narcissism, egocentricity, low anxiety, and emotional detachment
(Gacono, 1990; Gacono, Meloy, & Heaven, 1990). With respect to other
mental disorders, Hart and Hare (1989) found that PCL-R Total scores
were positively correlated with diagnoses of substance use disorder,
histrionic personality disorder, and APD; they were also correlated with
prototypicality ratings of histrionic personality disorder, narcissistic
personality disorder, and APD. Positive correlations between the PCL-R
and substance use have also been reported by Smith and Newman (1990).
Although positively associated with some personality and substance
use disorders, psychopathy tends to be negatively associated with most
forms of mental disorder. Hart and Hare (1989) found that patients
diagnosed as psychopathic, using the PCL-R, were less likely than other
patients to receive DSM-III Axis I diagnoses. Hart and Hare (1989) also
found that PCL-R Total and Factor scores were either uncorrelated or
negatively correlated with prototypicality ratings of schizophrenia and
with prototypicality ratings of all personality disorders except histrionic,
narcissistic, and antisocial personality disorder.
Further evidence of the discriminant validity of the PCL/PCL-R comes
from studies using standardized psychological tests. The PCL and PCL-R
are uncorrelated or negatively correlated with self-report measures of
anxiety, depression, and general distress or neuroticism (Hare, in press;
Harpur et al., 1989; Hart, Forth, & Hare, 1990, in press). The results of over
a dozen studies indicate that there is no association between PCL/PCL-R
Total scores and performance on various intelligence tests (Hare, in
press). Also, two studies have reported that PCL/PCL-R scores are not
Criminal behavior and violence. The PCL and PCL-R have strong and
stable associations with various indices of criminality. Kosson et al.
(1990) examined the association between the PCL-R and criminal behav-
ior in samples of Black and White inmates. Group analyses indicated that
psychopaths of both races were charged with a significantly greater
number and variety of criminal offenses than were nonpsychopaths. The
same pattern of results has been found in a sample of forensic psychiatric
patients (Hart & Hare, 1989), and in a random sample of 293 White and
Native Indian offenders incarcerated in Canadian federal prisons (Wong,
1984).
In addition to their general criminal activities, psychopaths commit
violent and aggressive offenses at a particularly high rate. In a sample of
244 inmates, Hare and McPherson (1984a) found that PCL-defined psy-
chopaths were significantly more likely than other criminals to engage in
physical violence and other forms of aggressive behavior, including
verbal abuse, threats, and intimidation. Serin (in press) replicated these
findings in a sample of 87 male prison inmates assessed using the PCL-R.
For example, compared with other criminals, psychopaths were more
likely to have a conviction for a violent offense, to use weapons, threats,
and instrumental aggression, and to attribute hostile intent to others.
Hare and McPherson (1984a) also found that, while in prison, psychopaths
were more violent and aggressive than were other inmates.
Williamson, Hare, and Wong (1987) examined the nature of the violent
offenses committed by psychopaths. Their sample consisted of prison
inmates assessed with the PCL. Official police reports were used to
analyze the circumstances surrounding the most serious of inmates'
instant offenses. Most of the murders and serious assaults committed by
the nonpsychopaths occurred during a domestic dispute or during a
period of extreme emotional arousal, whereas this was seldom true of the
psychopaths. The victims of the nonpsychopaths were likely to be female
and known to them, but the victims of the psychopaths were likely to be
male and unknown to them. The violence of the psychopaths frequently
had revenge or retribution as the motive or occurred during a drinking
bout. In general, it appeared that most of the psychopaths' violence was
callous and cold-blooded or part of an aggressive, macho display, without
the affective coloring that accompanied the violence of nonpsychopaths.
These results were replicated in a study that analyzed both police reports
and in-depth interviews with offenders (Wright & Wong, 1988).
Laboratory studies. The PCL and PCL-R have been used to investigate
language processes in psychopaths, with the following results: (a) psy-
Table 4
PCL/PCL-R and Antisocial Personality Disorder (APD): Effect Size
(Pearson's r with Dependent Variable) in Studies of Criminal Behavior
Study | Dependent variable | N | PCL/PCL-R Total | PCL/PCL-R Diag. | APD
Note. Cell entries are product-moment correlations (r). Diag. = categorical diagnosis
of psychopathy (Total score > 33 on PCL or > 30 on PCL-R).
a) Number of charges for violent and aggressive behaviors in prison.
b) Global ratings of violent behavior (5-point scale).
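Table 4 reports product-moment correlations. When the predictor is the categorical diagnosis (Diag.) rather than the Total score, the same formula is applied to a 0/1 variable, which yields a point-biserial r. The sketch below shows both computations. The scores are invented for illustration and do not come from any study cited here. The cutoff of > 30 follows the table note for the PCL-R. The function names `pearson_r` and `diagnose` are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    # Population covariance; the 1/n factor cancels against pstdev below.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sx * sy)


def diagnose(total, cutoff=30):
    """1 if the PCL-R Total exceeds the cutoff (per the Table 4 note), else 0."""
    return 1 if total > cutoff else 0


# Invented illustrative data: PCL-R Totals and counts of violent charges.
totals = [12, 18, 22, 25, 31, 33, 36, 38]
violent_charges = [0, 1, 1, 2, 3, 2, 5, 4]

r_total = pearson_r(totals, violent_charges)
# Correlating the 0/1 diagnosis with the outcome gives a point-biserial r.
r_diag = pearson_r([diagnose(t) for t in totals], violent_charges)
print(f"Total: r = {r_total:.2f}; Diag.: r = {r_diag:.2f}")
```

Dichotomizing the Total score discards information, so in data like these the Diag. correlation will often be somewhat smaller than the Total correlation.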
The items that define APD and that are found in the So and Pd scales
are from much the same domain as, and therefore should be associated
with, criterion variables related to criminality, violence, and recidivism.
The finding that the PCL/PCL-R generally is more strongly related to these
criterion variables than are these other measures attests to the value of
including inferences about interpersonal/affective traits in the assess-
ment of psychopathy.
Equally important are the results from laboratory tests of hypotheses
about the nature of psychopathy in which the basis for assessment was
the PCL/PCL-R (briefly discussed above). Although there is no similar
body of systematic laboratory research involving APD, several of the
laboratory studies of psychopathy discussed above also obtained DSM-
III or DSM-III-R diagnoses of APD (Patrick, personal communication,
October, 1990; Williamson et al., 1990, in press). In each case, the effects
that were significant with the PCL/PCL-R were not significant when the
presence or absence of APD was the basis for group selection.
Conclusions
The PCL-R is a 20-item clinical rating scale for the assessment of
psychopathy. It makes use of interview and file information to assign a
Total score (0 to 40) that represents the degree to which an individual
matches the prototypical psychopath, perhaps most vividly described
by Cleckley (1976). The PCL-R consists of two stable, correlated factors:
Factor 1 measures the affective/interpersonal components of psychopathy,
whereas Factor 2 reflects the impulsive, unstable, and antisocial lifestyle
aspects of the disorder. There is extensive evidence that PCL-R Total and
Factor scores are reliable and valid when used with male forensic
populations. There are early indications that the PCL-R will also be useful
with female forensic populations and with noncriminals.
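The 0-to-40 range of the Total score follows from the scale's structure: each of the 20 items is rated 0, 1, or 2, and the ratings are summed. The scoring sketch below includes basic input checks. The helper name `pcl_r_total` is hypothetical. It does not handle omitted items or compute Factor scores, because the item assignments for the two factors are not listed in this section.

```python
N_ITEMS = 20
VALID_RATINGS = (0, 1, 2)  # three-point rating for each item


def pcl_r_total(ratings):
    """Sum 20 item ratings into a Total score in the range 0-40.

    Hypothetical helper: no prorating of omitted items, no Factor scores.
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for item_number, rating in enumerate(ratings, start=1):
        if rating not in VALID_RATINGS:
            raise ValueError(f"item {item_number}: invalid rating {rating!r}")
    return sum(ratings)


# Example: ten items rated 2, five rated 1, five rated 0.
example = [2] * 10 + [1] * 5 + [0] * 5
print(pcl_r_total(example))  # 25
```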
References
Albert, R.S., Brigante, T.R., & Chase, M. (1959). The psychopathic personality: A
content analysis of the concept. Journal of General Psychology, 60, 17-28.
American Psychiatric Association. (1980). Diagnostic and Statistical Manual of
Mental Disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1987). Diagnostic and Statistical Manual of
Mental Disorders (3rd ed., revised). Washington, DC: Author.
American Psychiatric Association. (1990). DSM-IV Update (January/February
1990). Washington, DC: Author.
Bartko, J.J. (1976). On various intraclass correlation reliability coefficients.
Psychological Bulletin, 83, 762-765.
Buss, A. H. (1966). Psychopathology. New York: Wiley.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A., & Kaemmer, B. (1989).
Manual for the restandardized Minnesota Multiphasic Personality Inventory: The
MMPI-2. Minneapolis: University of Minnesota Press.
Cleckley, H. (1976). The Mask of Sanity (5th ed.). St. Louis, MO: Mosby.
Correctional Service of Canada. (1989). Forum on Corrections Research, 1, No. 2.
Ottawa, Canada: Author.
Correctional Service of Canada. (1990). Forum on Corrections Research, 2, No. 1.
Ottawa, Canada: Author.
Cotton, D.J. (1989). Forensic assessment survey results and model forensic assessment
protocol recommendations. Conditional Release Program, Forensic Services
Branch, California Department of Mental Health. Pinole, CA: Author.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability
of behavioral measurements. New York: Wiley.
Davies, W., & Feldman, P. (1981). The diagnosis of psychopathy by forensic
specialists. British Journal of Psychiatry, 138, 329-331.
Hare, R.D. (1980). A research scale for the assessment of psychopathy in criminal
populations. Personality and Individual Differences, 1, 111-119.
Hare, R. D. (1982). Psychopathy and physiological activity during anticipation of
an aversive stimulus in a distraction paradigm. Psychophysiology, 19, 266-271.
Hare, R.D. (1983). Diagnosis of antisocial personality disorder in two prison
populations. American Journal of Psychiatry, 140, 887-890.
Hare, R.D. (1984). Performance of psychopaths on cognitive tasks related to
frontal lobe function. Journal of Abnormal Psychology, 93, 133-140.
Hare, R.D. (1985a). The Psychopathy Checklist. Unpublished manuscript, Department
of Psychology, University of British Columbia, Vancouver, Canada.
Hare, R.D. (1985b). Comparison of procedures for the assessment of psychopathy.
Journal of Consulting and Clinical Psychology, 53, 7-16.
Hare, R.D. (in press). The Hare Psychopathy Checklist-Revised (PCL-R). Toronto,
Ontario: Multi-Health Systems.
Hare, R.D., & Cox, D.N. (1978). Clinical and empirical conceptions of psychopathy,
and the selection of subjects for research. In R.D. Hare & D. Schalling (Eds.),
Psychopathic behavior: Approaches to research. Chichester, England: Wiley.
Hare, R.D., Cox, D.N., & Hart, S.D. (1990). Preliminary manual for the Psychopathy
Checklist: Screening Version (PCL:SV). Unpublished manuscript, University of
British Columbia, Vancouver, B.C., Canada.
Hare, R.D., & Craigen, D. (1974). Psychopathy and physiological activity in a mixed-
motive game situation. Psychophysiology, 11, 197-206.
Hare, R.D., Forth, A.E., & Hart, S.D. (1989). The psychopath as prototype for
pathological lying and deception. In J.C. Yuille (Ed.), Credibility assessment (pp.
25-49). Dordrecht, The Netherlands: Kluwer.
Hare, R.D., & Frazelle, J. (1980). Some preliminary notes on the use of a research
scale for the assessment of psychopathy in criminal populations. Unpublished
manuscript, Department of Psychology, University of British Columbia,
Vancouver, Canada.
Hare, R.D., Frazelle, J., & Cox, D.N. (1978). Psychopathy and physiological response
to threat of an aversive stimulus. Psychophysiology, 15, 165-172.
Hare, R.D., Harpur, T.J., Hakstian, A.R., Forth, A.E., Hart, S.D., & Newman, J.P.
(1990). The revised Psychopathy Checklist: Reliability and factor structure.
Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2,
338-341.
Hare, R.D., Hart, S.D., & Harpur, T.J. (in press). Psychopathy and the proposed
DSM-IV criteria for antisocial personality disorder. Journal of Abnormal
Psychology: Special Issue.
Hare, R.D., & Jutai, J.W. (1988). Psychopathy and cerebral asymmetry in semantic
processing. Personality and Individual Differences, 9, 329-337.
Hare, R.D., & McPherson, L.M. (1984a). Violent and aggressive behavior by
criminal psychopaths. International Journal of Law and Psychiatry, 7, 35-50.
Hare, R.D., & McPherson, L.M. (1984b). Psychopathy and perceptual asymmetry
during verbal dichotic listening. Journal of Abnormal Psychology, 93, 141-149.
Hare, R.D., McPherson, L.M., & Forth, A.E. (1988). Male psychopaths and their
criminal careers. Journal of Consulting and Clinical Psychology, 56, 710-714.
Hare, R.D., & Schalling, D. (Eds.). (1978). Psychopathic behavior: Approaches to
research. Chichester, England: Wiley.
Hare, R.D., Williamson, S.E., & Harpur, T.J. (1988). Psychopathy and language. In
T.E. Moffitt & S.A. Mednick (Eds.), Biological contributions to crime causation
(pp. 68-92). Dordrecht, Netherlands: Martinus Nijhoff.
Harpur, T.J., & Hare, R.D. (1989, June). Facilitation and inhibition of visual attention
in psychopaths. Paper presented at the Fourth Meeting of the International
Society for the Study of Individual Differences, Heidelberg, West Germany.
Harpur, T.J., & Hare, R.D. (1990a). The assessment of psychopathy as a function of
age. Manuscript submitted for publication.
Harpur, T.J., & Hare, R.D. (1990b). Psychopathy and attention. In J. Enns (Ed.), The
development of attention: Research and theory (pp. 429-444). New York: North
Holland.
Harpur, T.J., Hakstian, A.R., & Hare, R.D. (1988). Factor structure of the Psychopathy
Checklist. Journal of Consulting and Clinical Psychology, 56, 741-747.
Harpur, T.J., Hare, R.D., & Hakstian, A.R. (1989). Two-factor conceptualization of
psychopathy: Construct validation and assessment implications. Psychological
Assessment: A Journal of Consulting and Clinical Psychology, 1, 6-17.
Harris, G. T., Rice, M. E., & Cormier, C. A. (1990). Psychopathy and violent
recidivism. Manuscript submitted for publication.
Hart, S.D., Forth, A. E., & Hare, R.D. (1990). Performance of male psychopaths on
selected neuropsychological tests. Journal of Abnormal Psychology, 99, 374-379.
Hart, S.D., Forth, A.E., & Hare, R.D. (in press). Assessing psychopathy in male
criminals using the MCMI-II. Journal of Personality Disorders.
Hart, S.D., Kropp, P.R., & Hare, R.D. (1988). Performance of male psychopaths
following conditional release from prison. Journal of Consulting and Clinical
Psychology, 56, 227-232.
Hart, S.D., & Hare, R.D. (1989). Discriminant validity of the Psychopathy Checklist
in a forensic psychiatric population. Psychological Assessment: A Journal of
Consulting and Clinical Psychology, 1, 211-218.
Hathaway, S.R., & McKinley, J.C. (1943). Manual for the Minnesota Multiphasic
Personality Inventory. New York: Psychological Corporation.
Hoffman, P., & Beck, J.L. (1974). Parole decision-making: A salient factor score.
Journal of Criminal Justice, 2, 195-206.
Jutai, J., & Hare, R.D. (1983). Psychopathy and selective attention during
performance of a complex perceptual-motor task. Psychophysiology, 20, 146-151.
Jutai, J., Hare, R.D., & Connolly, J.F. (1987). Psychopathy and event-related brain
potentials (ERPs) associated with attention to speech. Personality and Individual
Differences, 8, 175-184.
Widiger, T.A., Frances, A.J., Pincus, H.A., Davis, W.W., & First, M. (in press).
Toward an empirical classification for DSM-IV. Journal of Abnormal Psychology:
Special Issue.
Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment.
Reading, MA: Addison-Wesley.
Wiggins, J.S. (1982). Circumplex models of interpersonal behavior in clinical
psychology. In P.C. Kendall & J.N. Butcher (Eds.), Handbook of research
methods in clinical psychology (pp. 183-221). New York: Wiley.
Wiggins, J.S., Trapnell, P., & Phillips, N. (1988). Psychometric and geometric
characteristics of the revised Interpersonal Adjective Scales (IAS-R). Multivariate
Behavioral Research, 23, 517-530.
Williamson, S., Hare, R.D., & Wong, S. (1987). Violence: Criminal psychopaths and
their victims. Canadian Journal of Behavioral Science, 19, 454-462.
Williamson, S., Harpur, T.J., & Hare, R.D. (1990, August). Sensitivity to emotional
polarity in psychopaths. Paper presented at the meeting of the American
Psychological Association, Boston, MA.
Williamson, S., Harpur, T.J., & Hare, R.D. (in press). Abnormal processing of
affective words by psychopaths. Psychophysiology.
Woodruff, R.A., Guze, S.B., & Clayton, P.J. (1980). The medical and psychiatric
implications of antisocial personality (sociopathy). In H.J. Vetter & R.W. Rieber
(Eds.), The psychological foundations of criminal justice, Vol. II (pp. 307-312).
New York: John Jay Press.
Wong, S. (1984). Criminal and institutional behaviors of psychopaths. Programs
Branch Users Report. Ottawa, Ontario, Canada: Ministry of the Solicitor-General
of Canada.
Wong, S. (1988). Is Hare's Psychopathy Checklist reliable without the interview?
Psychological Reports, 62, 931-934.
Wong, S., & Templeman, R. (1988, June). High and low psychopathy groups derived
by cluster-analysing Psychopathy Checklist data. Paper presented at the Annual
Meeting of the Canadian Psychological Association, Montreal, Canada.
World Health Organization (1978). Mental disorders: Glossary and guide to their
classification in accordance with the ninth revision of the International Classification
of Diseases. Geneva: Author.
Wright, S., & Wong, S. (1988). Criminal psychopaths and their victims. Unpublished
manuscript, Department of Psychology, University of Saskatchewan, Saskatoon,
Saskatchewan.
Author Note
Preparation of this chapter was supported by grant MT-4511 from the Medical
Research Council of Canada, and by the Program of Research on Mental Health
and the Law of the John D. and Catherine T. MacArthur Foundation (MacArthur
Risk Study: John Monahan, Director). T.J. Harpur is now at the University of
Illinois.
Notes
1. The MacArthur Risk Study uses a brief, 12-item screening version of the
PCL-R, called the PCL:SV, intended for both forensic and nonforensic settings
(Hare, Cox, & Hart, 1990); the DSM-IV APD field trials use a similar, 10-item
modification. Further information concerning these modifications is available
upon request.
2. Information about an experimental Self-Report Psychopathy (SRP-II) scale
intended to measure the affective, interpersonal, and lifestyle components of
psychopathy is available on request.
3. To help train researchers and clinicians in the proper use of the PCL-R, we
have developed workshops and a set of mock assessment materials (videotaped
interviews with file information). Other users have set up their own training
programs.
4. Schroeder et al. (1983) performed a generalizability (G) theory analysis on PCL
data obtained from five samples of male prison inmates (N = 301) assessed between
1977 and 1981. Generalizability coefficients (GCs) for PCL Total scores in the
individual samples ranged from .85 to .90; the overall GC was .90. Harpur et al.
(1989) reanalyzed these data and obtained GCs of .79 for Factor 1 and .84 for Factor 2.
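Note 4 reports generalizability coefficients from G theory (Cronbach et al., 1972). The sketch below computes a relative GC for the simplest case: a fully crossed persons x raters design with one rating per cell. It assumes variance components are estimated from two-way ANOVA mean squares and that the GC for the mean of k raters is var_p / (var_p + var_res / k). Note 4 does not describe the design Schroeder et al. used, so this is not a reconstruction of their analysis. The function name and the ratings are hypothetical.

```python
def relative_g_coefficient(scores, n_raters=None):
    """Relative G coefficient for a crossed persons x raters design.

    scores: one row per person, one column per rater (one rating per cell).
    n_raters: number of raters averaged in the decision study; defaults to
              the number of raters in the data.
    """
    n_p = len(scores)
    n_r = len(scores[0]) if scores else 0
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete matrix with >= 2 persons and >= 2 raters")

    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Two-way ANOVA sums of squares without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = max(ss_total - ss_p - ss_r, 0.0)  # guard against rounding below 0

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Variance components: persons, and person-x-rater interaction plus error.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_res = ms_res

    k = n_raters or n_r
    denominator = var_p + var_res / k
    if denominator == 0:
        raise ValueError("no variance in the ratings")
    return var_p / denominator


# Invented PCL Totals for five inmates, each scored by two raters.
ratings = [[30, 28], [12, 15], [22, 20], [35, 36], [18, 21]]
print(round(relative_g_coefficient(ratings), 2))              # mean of 2 raters
print(round(relative_g_coefficient(ratings, n_raters=1), 2))  # single rater
```

Averaging over more raters shrinks the error term, so the coefficient for two raters is never lower than the single-rater value.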
CHAPTERS
The MMPI-2:
Development
and Research Issues
MMPI-2 Development
The charge of the MMPI Restandardization Committee, composed of
James N. Butcher from the University of Minnesota, W. Grant Dahlstrom
from the University of North Carolina, John R. Graham from Kent State
University, Auke Tellegen from the University of Minnesota, and Beverly
Kaemmer from the University of Minnesota Press, was to effect some
changes, the needs for which had become apparent over the years, while
at the same time maintaining interpretive continuity with the MMPI
(Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). The two
major targets of this committee were the inappropriateness of some item
content and the need for contemporary norms. The results of the revision
included some modifications and deletions at the item level, the introduc-
tion of some new scales, and changes in the psychometric scaling, which
involved a renorming of the instrument and the development of a new
transformation of the raw scale scores. The following reviews these
changes and the additions found in the MMPI-2.
Item Changes
The MMPI item pool has been many things to different people: to its
developers it was simply an empirical collection of the most discriminat-
ing items available, to grammarians it was a nightmare of complex and
awkward statements, to some feminists it was evidence of institutional-
ized sexism, to comedians such as Art Buchwald, it was the source of a
good amount of material (such as his "North Dakota Null-Hypothesis
Brain Inventory"), to some job applicants it was a group of offensive
New Scales
Except for the minor changes at the item level mentioned above, the
MMPI clinical and validity scales were left relatively intact in the MMPI-2
(see section below on form comparability). Perhaps the biggest change
from the MMPI to the MMPI-2 at the scale level is the introduction of fifteen
new content scales, designed to better represent the major dimensions of
the MMPI-2 item pool, and three new validity scales to complement the
standard validity scales available on the MMPI.
Content Scales
In his review of strategies of test construction, Burisch (1984) notes
that while empirical construction can yield instruments that are valid
with respect to external criteria, it often compares unfavorably with
strategies which consider item content (e.g., deductive or factor-ana-
lytic) in terms of other characteristics, such as communicability and
discriminant validity. Although MMPI content interpretation was dis-
couraged in its early years, practitioners in more recent years have found
content interpretation, such as with the Koss and Butcher critical items
(Koss & Butcher, 1973) and the Wiggins Content Scales (Wiggins, Goldberg,
& Appelbaum, 1971), useful both by themselves and as an adjunct to
MMPI clinical scale interpretation. Because of the popularity and utility
of content interpretive approaches to the MMPI, the Restandardization
& Butcher, 1988). Convergent validity data in the form of correlations with
other MMPI scales are also available (Butcher et al., 1989b). It is clear,
though, that much more research is needed involving these content
scales, especially studies providing correlations with external criteria.
Other potentially interesting studies would include: an evaluation of the
utility of the Negative Treatment Indicators and Negative Work Attitudes
scales; studies regarding the applicability of these scales for various
types of computer adaptive testing (see section below on the use of
computers with the MMPI-2); and comparisons between the deductively-
based, homogeneous MMPI-2 Content Scales and the atheoretical,
empirically-based clinical scales, in terms of their relative validity, utility,
susceptibility to faking, etc.
Back F
Like scale F, FB (Back F) was designed to help identify individuals who
complete the inventory in an invalid manner. It was developed, like F, by
identifying items that were infrequently endorsed in the normative
sample. But whereas the items on F appear in the first 370 items, FB items
are found near the end of the test booklet. Thus, a normal score on F
coupled with a high score on FB might indicate that the test taker stopped
paying attention to the test items at some point and shifted to a random
or unusual pattern of responding, possibly due to fatigue or loss of
interest in the task. This scale is thought to be potentially useful espe-
cially with adolescent samples in which test taking attitudes may shift.
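Both F and Back F rest on the same construction idea: key the responses that are rare in the normative sample, then count how many rare responses a test taker gives. The sketch below applies that idea to a toy response matrix. The 10% "infrequent" threshold, the function names, and the data are hypothetical. They are not the criteria or items actually used for the MMPI-2.

```python
def build_infrequency_key(normative, item_ids, threshold=0.10):
    """Key items whose True or False response is rare in a normative sample.

    normative: list of dicts mapping item id -> True/False response.
    threshold: hypothetical cutoff for "infrequent"; not the MMPI-2 criterion.
    Returns a dict mapping each selected item to its rare (scored) response.
    """
    if not normative:
        raise ValueError("normative sample is empty")
    key = {}
    for item in item_ids:
        p_true = sum(1 for resp in normative if resp[item]) / len(normative)
        if p_true < threshold:
            key[item] = True    # answering True is rare
        elif p_true > 1 - threshold:
            key[item] = False   # answering False is rare
    return key


def infrequency_score(protocol, key):
    """Count the rare responses a test taker gave; unanswered items score 0."""
    return sum(1 for item, rare in key.items() if protocol.get(item) == rare)


# Toy normative sample of 20 respondents on four items.
normative = [{1: False, 2: True, 3: i % 2 == 0, 4: False} for i in range(20)]
key = build_infrequency_key(normative, item_ids=[1, 2, 3, 4])
protocol = {1: True, 2: False, 3: True, 4: False}
print(key, infrequency_score(protocol, key))
```

One key could be built from items early in the booklet and another from items near the end, giving an F-like and a Back F-like pair. By the reasoning above, an unremarkable early score combined with a high late score would suggest that attention drifted partway through the test.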
VRIN
The VRIN (Variable Response Inconsistency) Scale was developed to
complement the original MMPI validity scales by providing information
about the consistency with which an individual responds to item content
within the MMPI-2. VRIN is scored not from answers to single items but
from responses to item pairs: it consists of 67 pairs of items that are
either very similar or opposite in content, and one point is added to the
VRIN score for every pair answered inconsistently. In certain cases, VRIN
scores can help explain elevations on scale F. Elevations on F may occur
for a number of reasons, including severe psychopathology, faking bad,
extreme confusion, or random responding. If, for example, a profile shows
elevations on both F and VRIN, the possible explanations for the F
elevation narrow: the clinician should suspect that the test taker was
confused or responding randomly, since high VRIN scores suggest
inconsistent responding. In cases where high F
today's test takers. First, there are differences in test administration. For
example, in the original normative sample, individuals were encouraged
to respond with "Cannot Say" if they were not sure about whether an item
applied to them. Today it is standard practice to ask that test takers
complete each item, leaving answers blank only if they must. Second,
there are many demographic characteristics of today's society of which
the original sample is not representative. Although the original "Minne-
sota normals" may have been representative of the state of Minnesota in
the 1930s in terms of age range, educational level and socioeconomic
background (Hathaway & McKinley, 1940), they are certainly not repre-
sentative of the United States of the 1990s on these variables and others,
such as ethnic group membership. Third, there have been other changes
in society over the last fifty years which surely create differences in the
way people respond to the MMPI items now. Society has undergone
attitudinal changes, lifestyle changes, and gender role changes in the last
fifty years, and each of these is likely to have exerted some influence on
the way people respond to this set of items.
Unlike scores on the Scholastic Aptitude Test, for which normative drift
is not a problem because the aim is to compare performance from year
to year, MMPI protocols are interpreted with reference to the mean of a
normative population. If a protocol is drawn from a population whose
characteristics differ from those of the comparison group, scores based
on relative standing are difficult to interpret. Therefore, the differences itemized
above between the old MMPI normative sample and today's test takers
justify (perhaps demand) that a new set of comparison scores be con-
structed. One of the main goals of the Restandardization Committee was
to provide an appropriate normative comparison group for today's test
takers. (A separate project involved developing contemporary norms for
adolescents. An Adolescent Form of the MMPI-2 employing these norms
and including new items with age-specific content will be available from
the University of Minnesota Press sometime in 1991.)
The MMPI-2 normative sample is made up of 2600 test protocols (from
1138 men and 1462 women) gathered from seven geographic regions of
the country (California, Minnesota, North Carolina, Ohio, Pennsylvania,
Virginia, and Washington). Data from the 1980 census were used as a
comparison for many demographic variables to ensure that the sample
collected was representative. Sample data match the census data well on
most variables, including age, ethnic background, marital status, and
income level (Butcher et al., 1989a).
One variable on which the MMPI-2 normative sample does not closely
match the 1980 census is educational level. The new normative sample
has a higher mean education than the census data. Part of this problem
is mitigated by national changes in mean education level since 1980. Also,
since the MMPI-2 is administered only to those with adequate reading
skills (a minimum of an eighth-grade reading level is required; Butcher et
al., 1989a), the more appropriate comparison group (as opposed to the
census) is those to whom the MMPI-2 can be administered, a group which
is, at the least, closer to the MMPI-2 normative sample in terms of
education. Finally, the effect of education on MMPI-2 profiles appears to
be minimal (Butcher, 1990a). Table 1 shows correlations between years
of education and the standard MMPI-2 validity and clinical scales in the
normative sample. As Figures 1 and 2 also show, the only scales
substantially related to educational level are scales Mf (for males) and K,
both of which have traditionally been interpreted within the
context of education with the original MMPI. To summarize, then, with the
exception of education, which has minimal impact on MMPI-2 scale
scores, the new normative sample is quite comparable to the 1980 census
Table 1
Correlations Between MMPI-2 Clinical and Validity Scales and Years of
Education for the MMPI-2 Normative Sample
[Figures 1 and 2. MMPI-2 profiles of the normative sample by educational level. Panel labels include Part High School (n = 61), a panel with n = 242, Part College (men, n = 272), Post Graduate (men, n = 253), and Part High School (women, n = 611).]
Uniform T-Scores
On the MMPI, raw scores (and K-corrected raw scores) are standard-
ized by a linear transformation which sets the mean of each scale at 50 and
the standard deviation at ten. These "T-scores" provide a neutral metric
wherein differing scales can be compared in terms of their deviation from
the mean. They cannot, however, be compared directly in terms of their
percentile ranks, since each scale distribution has its own skewness and
kurtosis. This state of affairs can yield some interesting conditions. For
example, it is possible to find that a T-score of 80 (three standard
deviations above a particular scale mean) is more common (has a lower
population percentile) than a T-score of 70 (two standard deviations
above a different scale mean).
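The linear transformation is T = 50 + 10(X - M)/SD, where M and SD are the normative mean and standard deviation of the scale. The sketch below applies it to two invented distributions of 100 cases each. These are illustrative numbers, not MMPI data. In this example a T of 80 on a strongly skewed scale is reached by more of the sample than a T of 70 on a symmetric one. The function names are hypothetical, and the population SD is assumed.

```python
from statistics import mean, pstdev


def linear_t(raw, norms):
    """Linear T-score: mean 50 and SD 10 in the normative sample."""
    sd = pstdev(norms)
    if sd == 0:
        raise ValueError("normative scores have no variance")
    return 50 + 10 * (raw - mean(norms)) / sd


def percent_at_or_above(raw, norms):
    """Percentage of the normative sample scoring at or above raw."""
    return 100 * sum(1 for s in norms if s >= raw) / len(norms)


# Invented distributions (not MMPI data), 100 cases each.
scale_a = [10] * 5 + [15] * 30 + [20] * 30 + [25] * 30 + [30] * 5  # symmetric
scale_b = [0] * 90 + [10] * 10                                     # skewed

for name, norms, raw in [("A", scale_a, 30), ("B", scale_b, 10)]:
    print(name, linear_t(raw, norms), percent_at_or_above(raw, norms))
# A: T = 70, reached by 5% of cases; B: T = 80, reached by 10% of cases.
```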
This information can be quite disturbing to some, especially those
test users who learned about distributions in their graduate Measure-
ment or Statistics courses which emphasized the prototypical normal
distribution. In the normal distribution, of course, the mean always
equals the median, each standard deviation unit corresponds to a precise
percentile, and skewness and kurtosis are fixed. It is thus a distribution
rarely reproduced in real-life psychological inquiry, and it is especially
unrepresentative of the distributions of variables relevant to clinical
psychology. Despite its
unsuitability, there have been attempts in the past to transform MMPI
scale distributions into a normal distribution (Colligan, Osborne, Swenson,
& Offord, 1983). However, these "normalized" T-scores failed to gain
popularity, as clinicians had become used to the discrimination at the
upper ends of the positively skewed distributions, and normalized T-
scores remain little used in practice today.
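For comparison, a normalized T-score is commonly computed by converting a raw score to its percentile rank and then to the corresponding standard normal deviate. The minimal sketch below assumes a mid-point percentile rank; it is not necessarily the exact procedure Colligan et al. (1983) used, and the skewed sample is invented. It shows how normalization compresses the upper tail of a positively skewed scale.

```python
from statistics import NormalDist


def midpoint_percentile_rank(value, sample):
    """Proportion scoring below `value` plus half the proportion scoring exactly at it."""
    below = sum(x < value for x in sample)
    at = sum(x == value for x in sample)
    return (below + 0.5 * at) / len(sample)


def normalized_t(value, sample):
    """Normalized T-score: 50 + 10 * z, with z the normal deviate of the percentile rank."""
    z = NormalDist().inv_cdf(midpoint_percentile_rank(value, sample))
    return round(50 + 10 * z)


# Hypothetical positively skewed scale: 90 people score 0 and 10 people score 1.
skewed_scale = [0] * 90 + [1] * 10
print(normalized_t(1, skewed_scale))  # 66
print(normalized_t(0, skewed_scale))  # 49
```

A linear T-score would place a raw score of 1 on this scale at 80, three standard deviations above the mean; normalization pulls it down to 66, which is the loss of upper-end discrimination clinicians objected to.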
The MMPI Restandardization Committee desired continuity between the
MMPI and MMPI-2, but was, at the same time, concerned about the
differences in percentile rank across the clinical scales. The result was
a compromise of sorts. A composite distribution, consisting of the
distributions of the eight standard clinical scales, was formed, and each
of the eight scales was transformed to this composite distribution
(Tellegen, 1988, 1989). In this way, the positive skewness inherent in
each of these scales was maintained (though modified slightly), while
uniformity in percentile rank was imposed across the scales. With
MMPI-2 "Uniform T-scores," then, one can say for the first time that if
one clinical scale is more deviant from the mean in standard deviation
units than another, it must also be less common.
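The core idea of mapping each scale onto a shared composite can be sketched as an equipercentile conversion. This is a simplified illustration under stated assumptions, not Tellegen's (1988, 1989) actual derivation: it pools the linear T-scores of all scales into one composite distribution and gives each score the composite value at the same cumulative proportion. The two raw-score samples are invented; a real application would involve the eight clinical scales and the normative sample. This crude version hands every scale the composite's shape outright, whereas the published procedure, as described above, kept each scale's own skew in slightly modified form.

```python
from statistics import mean, pstdev


def linear_t(raw_scores):
    """Linear T-scores for one scale (mean 50, SD 10 in this sample)."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [50 + 10 * (x - m) / sd for x in raw_scores]


def uniform_t(scales):
    """Simplified equipercentile mapping of each scale onto a pooled composite.

    scales: dict mapping scale name -> list of raw scores from one sample.
    Returns a dict mapping scale name -> list of rounded "uniform" T-scores.
    """
    linear = {name: linear_t(raw) for name, raw in scales.items()}
    # Composite distribution: every scale's linear T-scores pooled and sorted.
    composite = sorted(t for ts in linear.values() for t in ts)
    big_n = len(composite)
    uniform = {}
    for name, ts in linear.items():
        n = len(ts)
        mapped = []
        for t in ts:
            at_or_below = sum(other <= t for other in ts)  # cumulative count on this scale
            # Composite score at the same cumulative proportion (integer ceiling, 0-based).
            index = -(-at_or_below * big_n // n) - 1
            mapped.append(round(composite[index]))
        uniform[name] = mapped
    return uniform


# Invented raw scores: scale A is positively skewed, scale B is symmetric.
scales = {"A": [0, 1, 2, 3, 4, 5, 6, 8, 12, 20], "B": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
uniform = uniform_t(scales)
print(uniform["A"])
print(uniform["B"])
```

Because each invented scale has ten distinct scores, the k-th lowest score on A and on B receive the same converted T-score, so equal T-scores now imply equal percentile ranks across the two scales.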
MMPI-2 145
Validation of MMPI-2
The goal of much of the past fifty years of MMPI research has been to
discover what the MMPI does well. Rather than investigating the hows and
whys of clinical description and prediction, MMPI researchers have been
notorious for ignoring the black box of process in favor of eagerly
pursuing the bottom-line payoff of validity. This approach has proven
fruitful, and bootstrapping has yielded great returns. With the
publication of the MMPI-2, there is little doubt that clinicians and
researchers will remain interested in the "payoffs," wanting to ensure
that they are as big as before and perhaps anticipating that they will be
even bigger.
Form Comparability
Despite the conservative nature of the revision of the MMPI, the
overarching concern among loyal practitioners is that the MMPI-2 be able
to "do everything" that the MMPI did (Adler, 1990; Ben-Porath, 1990).
There is much wisdom in maintaining a cautious stance. As Cronbach
(1975) has pointed out, any difference in stimulus conditions should be
considered capable of moderating results until demonstrated otherwise.
It is for this reason that caution is always advised when, for example,
MMPI scales are taken out of the context of the test booklet, when the
MMPI is administered by computer, or when the test administration is not
supervised, unless studies have clearly demonstrated equivalence.
A primary objective of the MMPI Restandardization Committee was
to maintain interpretive comparability between the MMPI and MMPI-2,
not merely because of a wish to mollify loyal test users, but because of a
great unwillingness to see decades of MMPI research become suddenly
obsolete. Accordingly, a major research issue, perhaps the first MMPI-2
research issue, centers on how much confidence we should have in
applying our clinical and research knowledge about the MMPI to the new
MMPI-2. There are to date three kinds of evidence available which bear on
the issue of comparability between the two forms: analyses performed at
the item level, studies focussing on relative scale standing, and studies
examining absolute scale level.
Item equivalence. During the item revision stage of the development
of the MMPI-2 (see above), analyses were performed to assess the
equivalency of the old items with the revised versions of the same items.
On the experimental form AX (from which the MMPI-2 ultimately emerged),
82 items were revised versions of old MMPI items. Ben-Porath and
Butcher (1989b) examined the percentage of agreement between old and
new versions for these 82 items and compared this level of agreement
with agreement between two administrations of the old versions. Of the
82 items, only nine had agreement levels significantly different from the
agreement obtained by administering the old versions twice. None of
these differences held up for both genders, and one of these nine showed
greater agreement between old and new versions. Nevertheless, the
investigators then examined these items with respect to their correla-
tions with the MMPI scales of which they are members. None of the nine
revised items had item-scale correlations which differed from the item-
scale correlations for the corresponding old versions. This study suggests
that MMPI item revision did not result in changes which would impair
comparability between forms.
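The comparison described above can be illustrated for a single item with a two-proportion z-test: agreement between the old and revised wording in one group is compared with agreement between two administrations of the old wording in another. The choice of test and the numbers below are assumptions for illustration; the source does not state which significance test Ben-Porath and Butcher (1989b) used.

```python
from math import sqrt
from statistics import NormalDist


def percent_agreement(first, second):
    """Proportion of respondents giving the same True/False answer on both occasions."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


print(percent_agreement([True, True, False, False], [True, False, False, False]))  # 0.75

# Invented figures: 85% old/revised agreement in one group of 200 respondents,
# 90% old/old (test-retest) agreement in another group of 200.
z, p = two_proportion_z(0.85, 200, 0.90, 200)
print(f"z = {z:.2f}, p = {p:.2f}")
```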
Relative scale level. There has been some concern expressed in the
popular psychological press (Adler, 1990) about the differences between
MMPI and MMPI-2 in the relative level of the clinical scales. In the MMPI-
2 Manual (Butcher et al., 1989a) data from a psychiatric sample are
presented which indicate that the two-point codetypes from administra-
tion of both the MMPI and the MMPI-2 agree roughly two-thirds of the
time. This figure has been cited as evidence (Adler, 1990) that the MMPI-
2 produces results which are not comparable with those produced by the
MMPI. However, to evaluate whether lack of complete agreement from
MMPI to MMPI-2 stems from differences between the versions, one needs
to compare form comparability data with test-retest data from the
original MMPI.
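When the cross-form correlation and the test-retest correlation come from two independently sampled groups, one conventional way to compare them for a single scale is Fisher's r-to-z test. A minimal sketch; the correlations and sample sizes are invented, and the source does not say which test the studies used.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test of the difference between correlations from two independent samples.

    Returns (z statistic, two-sided p-value).
    """
    z1, z2 = atanh(r1), atanh(r2)  # Fisher transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented values: a test-retest correlation of .80 in one group of 103 and a
# cross-form (old versus new version) correlation of .75 in another group of 103.
z, p = compare_independent_correlations(0.80, 103, 0.75, 103)
print(f"z = {z:.2f}, p = {p:.2f}")
```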
A study by Ben-Porath and Butcher (1989a) made this type of com-
parison. College students were randomly assigned to either of two
groups: one which completed both the MMPI and the developmental
version of the MMPI-2 in a counterbalanced order, and one which com-
pleted the MMPI twice. First, test-retest correlations on individual MMPI
scales were compared with correlations between MMPI and MMPI-2
scales. Of the 42 comparisons made (21 scales each for females and
males), only two differed at the .01 level of significance. For one of these
scales (scale F), the magnitude of difference was not shared across
slightly lower and some slightly higher than they did with the MMPI.
Second, the original MMPI instructions encouraged the use of the "Cannot
Say" response for items about which test takers were uncertain. Today, it
is standard practice to discourage the use of Cannot Say. As a result,
more items on the MMPI-2 clinical scales are answered and, thus, more
items are endorsed in the keyed direction. Third, as society changes, some
items simply are answered differently by test takers today than they were
by the normative sample some fifty years ago.
Because of these changes, but primarily because of the change in
practice regarding Cannot Say responses, MMPI-2 profiles, normed on a
contemporary sample, run lower than MMPI profiles. The "clinically
interpretable" MMPI cutoff of a T-score of 70 now corresponds roughly to a
cutoff of 65 on the MMPI-2 (the 92nd percentile). This new cutoff is also
supported by studies which indicate that an MMPI-2 T-score of 65 appears
to be optimal for separating the normative samples from a depressed
sample (Butcher, 1989) and a chronic pain sample (Keller & Butcher, 1990).
For these reasons, the MMPI Restandardization Committee now recommends a
T-score of 65 as the new "clinically interpretable" cutoff, and the bold
line which ran across MMPI profile sheets now rests comfortably at 65 on
the MMPI-2 profile sheet. For practitioners used to interpreting absolute
levels of MMPI clinical scale elevations, there will undoubtedly be some
adjustment. However, the contemporary norms and Uniform T-scores should
eliminate the need for the rather unwieldy (and often faulty) "internal
norms" which were at times necessary for users of the original MMPI.
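How an "optimal" cutoff might be located can be sketched by scanning candidate T-scores and keeping the one that classifies the most cases correctly. Overall hit rate is an assumed criterion here (the cited studies may have used another), and the two samples are invented.

```python
def hit_rate(cutoff, normals, patients):
    """Proportion classified correctly when a T-score at or above `cutoff` is called clinical."""
    correct = sum(t < cutoff for t in normals) + sum(t >= cutoff for t in patients)
    return correct / (len(normals) + len(patients))


def best_cutoff(normals, patients, candidates=range(50, 91)):
    """Candidate cutoff with the highest hit rate; ties go to the lowest candidate."""
    return max(candidates, key=lambda c: hit_rate(c, normals, patients))


# Invented T-scores for a normative and a clinical comparison sample.
normals = [45, 48, 50, 52, 55, 58, 60, 62, 63, 64]
patients = [58, 62, 65, 67, 70, 72, 75, 78, 80, 85]
cut = best_cutoff(normals, patients)
print(cut, hit_rate(cut, normals, patients))  # 65 0.9
```

Python's `max` returns the first of several equally good candidates, so scanning in ascending order resolves ties toward the lower cutoff.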
To summarize, early research suggests that the MMPI-2 functions in a
manner very similar to the MMPI both at the item level and at the level of
relative scale standing. In fact, with regard to relative scale standing
and codetypes, the MMPI-2 matches the MMPI closely enough to be considered
an alternate form of the same test. Absolute differences between the MMPI
and MMPI-2 in clinical scale elevations are present, but for practitioners
used to interpreting absolute levels of MMPI scales, they can be
summarized simply: MMPI-2 scale elevations run approximately five T-score
points lower than those on the corresponding MMPI scales.
MMPI Descriptors
It appears that for most purposes, the MMPI-2 clinical scales function
in the same ways that the MMPI clinical scales functioned. It should follow
logically, then, that the more direct, bottom-line question of whether
scale correlates remain the same would be answered in the affirmative.
However, as the applicability of norms changes, so does the applicability
of MMPI scale and configuration descriptors. Unfortunately, the most
commonly cited "validation" studies are over twenty years old. The simple
validation study, once the bread-and-butter of MMPI research, has taken a
back seat to other pursuits. Clinicians appear to be satisfied with the
early decades of empirical research, and time makes these studies less
and less current.
To be sure, there are recent validity studies which support the
standard correlates that have been used in MMPI interpretation for years,
as well as MMPI-2 validation studies (Butcher et al., 1989a).
However, as time goes on, the need for additional studies to complement
and update the work done in the past increases. The publication of the
MMPI-2 should generate many new such studies. Especially valuable will
be those which provide validity descriptors from multiple sources and
data from different kinds of samples.
It has long been recognized that the MMPI clinical scales possess
different correlates depending upon the source from which the external
criterion statements are obtained, and the sample utilized in a given
study. For example, one would certainly not expect psychologist, inpa-
tient unit nurse, spouse, friend, and client all to agree that a client steals
things, or is perfectionistic, or has suicidal thoughts (e.g., see McCrea &
Costa, 1987). Neither should one be surprised, when conducting a
validation study, to find that strong external correlates are few and far
between in a homogeneous sample, or that scale correlates are
overabundant in a diverse sample.
These are not trivial points, since the MMPI is used in many contexts,
from assessment in inpatient settings to screening in the selection of
graduate students in clinical psychology. It is questionable for a practitio-
ner to apply MMPI scale descriptors obtained in one setting to a test-taker
from an entirely different population. It is important, then, for studies
examining the validity of MMPI-2 scales to specify sample
characteristics, the source of criterion descriptors, and, if possible, the
base rates of the criterion statements within the samples being studied. This
is necessary not only for standard clinical interpretation, but also for
computerized interpretive systems (discussed below). It is hoped that
the innovative features present in the MMPI-2 will be matched by a new
level of sophistication in its supporting research.
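One concrete reason base rates matter is that they cap the size of any observed relationship. For two yes/no variables, such as scale elevation and a criterion statement like "steals things," the largest attainable phi coefficient depends on the two base rates. A minimal sketch, with invented base rates in the example:

```python
from math import sqrt


def max_phi(p_x, p_y):
    """Largest positive phi coefficient possible for two binary variables with base rates p_x and p_y."""
    low, high = sorted((p_x, p_y))
    return sqrt(low * (1 - high) / (high * (1 - low)))


# Hypothetical: 50% of a sample is elevated on a scale, but only 10% of the
# sample shows the criterion behavior.
print(round(max_phi(0.50, 0.10), 2))  # 0.33
```

Even a perfectly valid scale could not correlate above about .33 with that criterion in such a sample, so a modest phi means little unless the base rates are reported alongside it.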
Item Subtlety
In the early years following the development of the MMPI by McKinley
and Hathaway, there was a great deal of enthusiasm among psychologists
about the empirical method of test construction. Test takers' responses
to MMPI items were treated as behavioral units rather than as self-report
statements to be interpreted at face value (Meehl, 1945). The overuse of
face-valid items was actually discouraged, because: 1) face validity
itself was not viewed as being as important as the empirical relationship
between an item and some behavioral criterion; 2) face-valid items were
considered to be more susceptible to faking; and 3) it was hoped that
serendipitous findings involving seemingly "neutral" (or even
counterintuitive) items might lead to a greater understanding of
psychopathology.
Out of this tradition emerged the so-called "subtle" scales, the most
popular and most researched of which are the Wiener and Harmon Subtle
and Obvious scales (Wiener, 1948). These scales were derived by dividing
the items on the MMPI clinical scales into two groups: one made up of
items which appear on the surface to be related to psychological distur-
bance (obvious), and one made up of items for which the relationship to
disturbance is not clear (subtle). Of the clinical scales, five were success-
fully partitioned in this manner (D, Hy, Pd, Pa, Ma) and scoring keys were
developed. Clinicians who employ the Wiener and Harmon subtle scales
in practice have used them in two ways: 1) as clinical scales which are less
susceptible to faking; or 2) in conjunction with the obvious scales to
assess profile validity.
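Mechanically, a subtle/obvious split means scoring two disjoint subsets of a scale's items separately. The sketch below uses invented item numbers and keyed directions (they are not the Wiener-Harmon keys). It shows one simple way the two subscales could be combined as a validity check: a raw subtle-minus-obvious difference. In practice such comparisons are often made in T-score units instead.

```python
# Hypothetical scoring keys (item number -> keyed response). These are invented
# for illustration and are NOT the actual Wiener-Harmon item assignments.
SUBTLE_KEY = {3: False, 17: True, 42: False}
OBVIOUS_KEY = {5: True, 9: True, 23: False, 31: True}


def raw_score(responses, key):
    """Count items answered in the keyed direction; omitted items score nothing."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def subtle_minus_obvious(responses):
    """Raw-score difference between the subtle and obvious subscales."""
    return raw_score(responses, SUBTLE_KEY) - raw_score(responses, OBVIOUS_KEY)


answers = {3: False, 17: True, 42: True, 5: True, 9: False, 23: False, 31: True}
print(raw_score(answers, SUBTLE_KEY), raw_score(answers, OBVIOUS_KEY))  # 2 3
print(subtle_minus_obvious(answers))  # -1
```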
The first use of the subtle scales (i.e., as "unfakeable" indicators of
psychopathology) is not well supported in the literature. Despite their
popular use, the bulk of the evidence fails to find that the Wiener and
Harmon subtle items contribute much validity to the MMPI clinical scales
might only see the scale briefly mentioned within a larger context. As a
rule of thumb, empirical scales should be named according to their
function, and inductive or rational scales, according to their content.
Although introduction of new scales will ideally address all of these issues
and more, these guidelines may serve as a beginning for investigators
interested in scale development based on the MMPI-2 item pool.
Adaptive testing
tion. They concluded that the results they obtained warranted further
study of the countdown method, preferably using actual computer ad-
ministration, rather than simulated administration. They felt, however,
that recommendation of this procedure for practice is premature since
their data do not bear on the issue of test equivalence across
administration conditions. Warranting this kind of recommendation, they
reasoned, would require evidence not only that computer administration
yields results comparable to the standard administration (see section
above), but also that the reordering of the items produces no substantive
changes in MMPI results.
A recent study by Slutske, Ben-Porath, Roper, Nguyen, and Butcher
(1990) was conducted to address these questions using computer adap-
tive administration of the MMPI-2 to college students. Subjects were given
both the booklet version of the MMPI-2 and the computer adaptive
version (with items ordered from least frequently endorsed to most
frequently endorsed in the MMPI-2 normative sample) in a counterbal-
anced design. Following the (shortened) adaptive administration, the
remainder of the items not needed for classification were administered by
computer to obtain full scale scores which could be compared to the
analogous scores from the booklet form.
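The countdown method's stopping rule can be sketched for a single scale. Items are presented in a fixed order (in this study, least to most frequently endorsed), and testing on the scale stops as soon as the raw-score cutoff is reached (elevated) or can no longer be reached with the items remaining (not elevated). This minimal sketch works in the style of a real-data simulation, replaying answers already on file; the function and parameter names are hypothetical, and the raw-score cutoff corresponding to a given T-score is assumed to come from the norms.

```python
def countdown_classify(answers, keyed, cutoff_raw):
    """Countdown classification for one scale (hypothetical helper).

    answers: the respondent's True/False answers to this scale's items, in the
        order the items are administered.
    keyed: the keyed (scored) response for each item, in the same order.
    cutoff_raw: raw score corresponding to the classification cutoff.
    Returns (classification, number of items administered).
    """
    if len(answers) != len(keyed):
        raise ValueError("answers and keyed must be the same length")
    n_items = len(keyed)
    score = 0
    for administered, (answer, key) in enumerate(zip(answers, keyed), start=1):
        if answer == key:
            score += 1
        remaining = n_items - administered
        if score >= cutoff_raw:
            return "elevated", administered  # cutoff reached: stop early
        if score + remaining < cutoff_raw:
            return "not elevated", administered  # cutoff now out of reach
    # Reached only when the scale has no items.
    return ("elevated" if score >= cutoff_raw else "not elevated"), n_items


print(countdown_classify([True, True, True, False, False], [True] * 5, cutoff_raw=3))
print(countdown_classify([False] * 10, [True] * 10, cutoff_raw=8))
```

In both calls only three items are needed: the first respondent reaches the cutoff immediately, and the second can no longer reach it once seven items remain.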
Results comparing the standard booklet form and the reordered
computerized version of the MMPI-2 suggest rather strongly that the
forms of administration are comparable. First, the mean profiles for the
two versions are very similar. Of the thirteen basic scales and fifteen
content scales for both men and women, approximately two-thirds of the
mean scale scores from one version were within one T-score point of the
mean of the same scale from the other version. Second, for these 28
MMPI-2 scales, the correlations between the two versions administered
compare quite favorably with the test-retest correlations of the same
scales using the booklet form. Third, the item endorsement differences
between forms of administration are not dramatic and do not suggest any
obvious pattern of response set.
Having addressed the issue of form comparability, Slutske et al.
(1990) then reported results on item and time savings. Of the 498 MMPI-2
items required to obtain full scale scores for the 28 scales, a mean of
only 357 items was administered to achieve perfect classification at a
T-score of 65 according to the countdown method, a savings of 28%. A
mean of 33 minutes was required for administration of the adaptive
version, 36% less than the mean of 52 minutes needed for the
administration of all 498 items. Although these results may be directly
relevant only to assessment questions in which a simple classification is
appropriate, they underscore the potential applicability of this method to situations
where time and client attention are at a premium. A study by the same
Summary
The revision of the MMPI has been compared to the renovation of an
old historic house. Improvements are made to see that it is structurally
sound and to ensure its safety of use. Modern conveniences, taking
advantage of technologies not available at the time of construction, are
added for the comfort and luxury of the user. But foremost is the
preservation of the aesthetics, function, and character of the old building.
The research to date on the MMPI-2 suggests that the conservative
nature of the test revision has been successful in maintaining interpretive
continuity. Research examining the comparability of the MMPI and MMPI-2
will undoubtedly continue, adding to the half-century research base
documenting and supporting their use. New features, such as contemporary
norms, uniform T-scores, new indicators of profile validity, and the
MMPI-2 content scales should prove useful for practitioners and generate
interest among researchers. Finally, research programs which were
initiated using the MMPI, such as understanding the role of subtle items,
evaluating the efficacy of validity indicators, and examining the utility of
computer adaptive testing, will continue with the MMPI-2. Time and
empirical research will reveal which parts of the renovation become little-
used and which become as useful and popular as the original structure.
References
Adler, T. (1990, April). Does the 'new' MMPI beat the 'classic'? APA Monitor,
pp. 18-19.
Ben-Porath, Y. S. (1990, August). MMPI-2 Items. MMPI-2 News and Profiles, pp. 4-5.
Ben-Porath, Y. S., & Butcher, J. N. (1989a). The comparability of MMPI and
MMPI-2 scales and profiles. Psychological Assessment: A Journal of Consulting and
Clinical Psychology, 1, 345-347.
Ben-Porath, Y. S., & Butcher, J. N. (1989b). Psychometric stability of rewritten
MMPI items. Journal of Personality Assessment, 53, 645-653.
Ben-Porath, Y. S., Slutske, W. S., & Butcher, J. N. (1989). A real-data simulation of
computerized adaptive administration of the MMPI. Psychological Assessment: A
Journal of Consulting and Clinical Psychology, 1, 18-22.
Ben-Porath, Y. S., Waller, N. G., Slutske, W. S., & Butcher, J. N. (1988). A comparison
of two methods for adaptive administration of MMPI-2 content scales. Paper
presented at the 96th Annual Meeting of the American Psychological Association,
Atlanta.
Bishkin, B. H., & Kolotkin, R. C. (1977). Effects of computerized administration on
scores on the Minnesota Multiphasic Personality Inventory. Applied
Psychological Measurement, 1, 543-549.
Burisch, M. (1984). Approaches to personality inventory construction. American
Psychologist, 39, 214-227.
Butcher, J. N. (1987). The use of computers in psychological assessment: An
overview of practices and issues. In J. N. Butcher (Ed.), Computerized
psychological assessment (pp. 3-14). New York: Basic.
Butcher, J. N. (1989, August). MMPI-2: Issues of continuity and change. Paper
presented at the 97th Annual Convention of the American Psychological
Association, New Orleans.
Butcher, J. N. (1990a, August). Educational level and MMPI-2 measured
psychopathology: A case of negligible influence. MMPI-2 News and Profiles, p. 3.
Butcher, J. N. (1990b). Use of the MMPI-2 in treatment planning. New York: Oxford
University Press.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989).
Manual for the restandardized Minnesota Multiphasic Personality Inventory:
MMPI-2. An administrative and interpretive guide. Minneapolis: University of
Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1989). Development
and use of the MMPI-2 Content Scales. Minneapolis: University of Minnesota
Press.
Butcher, J. N., Keller, L. S., & Bacon, S. F. (1985). Current developments and future
directions in computerized personality assessment. Journal of Consulting and
Clinical Psychology, 53, 803-815.
Chapman, L. J., & Chapman, J.P. (1969). Illusory correlation as an obstacle to the
use of valid diagnostic signs. Journal of Abnormal Psychology, 4, 44-49.
Christian, W. L., Burkhart, B. R., & Gynther, M. D. (1978). Subtle-obvious ratings of
MMPI items: New interest in an old concept. Journal of Consulting and Clinical
Psychology, 46, 1178-1186.
Cohen, J. (1978). Partialled products are interactions; partialled powers are curve
components. Psychological Bulletin, 85, 858-866.
Colligan, R. C., Osborne, D., Swenson, W. M., & Offord, K. P. (1983). The MMPI: A
contemporary normative study. New York: Praeger.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology.
American Psychologist, 30, 116-127.
Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses
recently proposed. Psychological Bulletin, 102, 414-417.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1986, August). The validity of
computer-based test interpretations of the MMPI. In A. D. Mangelsdorff (Chair),
Computer-based clinical assessment for children, adults, and neuropsychological
cases. Symposium conducted at the meeting of the American Psychological
Association, Washington, D. C.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1987, August). Clinical implications
of validity research on computer-based interpretations of the MMPI. In A. D.
Mangelsdorff (Chair), Practical test user problems facing psychologists in private
practice. Symposium conducted at the meeting of the American Psychological
Association, New York.
Fishburne, F. J., Jr., Eyde, L. D., & Kowal, D. M. (1988, August). Computer-based test
interpretations of the Minnesota Multiphasic Personality Inventory with
neurologically impaired patients. Paper presented at the meeting of the American
Psychological Association, Atlanta.
Gough, H. G. (1947). Simulated patterns on the MMPI. Journal of Abnormal and
Social Psychology, 42, 215-225.
Graham, J. R. (1990). MMPI-2: Assessing personality and psychopathology. New
York: Oxford University Press.
Graham, J. R., & Ben-Porath, Y. S. (1990, June). Congruence between the MMPI and
MMPI-2 code types: Empirical data and theoretical issues. Paper presented at the
25th Annual Symposium on Recent Developments of the MMPI (MMPI-2).
Minneapolis, MN.
Graham, J. R., & Butcher, J. N. (1988). Differentiating schizophrenic and major
affective disordered inpatients with the revised form of the MMPI. Paper presented
at the 23rd Annual Symposium on Recent Developments in the Use of the MMPI,
St. Petersburg, FL.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune &
Stratton.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule
(Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Holden, R. R., & Jackson, D. N. (1979). Item subtlety and face validity in personality
assessment. Journal of Consulting and Clinical Psychology, 47, 459-468.
Keller, L. S., & Butcher, J. N. (1990). Use of the MMPI-2 with chronic pain patients.
Minneapolis: University of Minnesota Press.
Koss, M.P., & Butcher, J. N. (1973). A comparison of psychiatric patients' self
report with other sources of clinical information. Journal of Research in
Personality, 7, 225-236.
Lubinski, D., & Humphreys, L. G. (1990). Assessing spurious "moderator effects":
Illustrated substantively with the hypothesized ("synergistic") relation between
spatial and mathematical ability. Psychological Bulletin, 107, 385-393.
McCrea, R. R., & Costa, P. T., Jr. (1987). Validation of the five-factor model of
personality across instruments and observers. Journal of Personality and
Social Psychology, 52, 81-90.
Meehl, P. E. (1945). The dynamics of "structured" personality tests. Journal of
Clinical Psychology, 1, 296-303.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and
a review of the evidence. Minneapolis: University of Minnesota Press.
Moreland, K. L. (1987). Computerized psychological assessment: What's available.
In J. N. Butcher (Ed.), Computerized psychological assessment (pp. 26-49). New
York: Basic.
Saunders, D. R. (1956). Moderator variables in prediction. Educational and
Psychological Measurement, 16, 209-227.
Slutske, W. S., Ben-Porath, Y. S., Roper, B., Nguyen, P., & Butcher, J. N. (1990, June). An
empirical study of the computer adaptive MMPI-2. Paper presented at the 25th Annual
Symposium on Recent Developments of the MMPI (MMPI-2). Minneapolis, MN.
Tellegen, A. M. (1988, August). Derivation of Uniform T-scores for the restandardized
MMPI. Symposium presentation at the 96th Annual Convention of the American
Psychological Association. Atlanta, GA.
Tellegen, A. M. (1989, August). New Uniform T-scores for the MMPI-2: Methodological
issues. Paper presented at the 97th Annual Convention of the American
Psychological Association. New Orleans, LA.
Weed, N. C., Ben-Porath, Y. S., & Butcher, J. N. (1990). Failure of the Wiener-
Harmon MMPI subtle scales as predictors of psychopathology and as validity
indicators. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 2, 281-285.
Weed, N. C., & Han, K. (1991). Evaluating indicators of MMPI-2 profile validity.
Manuscript in preparation.
Weiss, D., & Vale, C. D. (1987). Computerized adaptive testing for measuring
abilities and other psychological variables. In J. N. Butcher (Ed.), Computerized
psychological assessment (pp. 325-343). New York: Basic.
Wiener, D. N. (1948). Subtle and obvious keys for the MMPI. Journal of Consulting
Psychology, 12, 164-170.
Wiggins, J. S., Goldberg, L., & Appelbaum, M. (1971). MMPI Content Scales:
Interpretive norms and correlations with other scales. Journal of Consulting
and Clinical Psychology, 37, 403-410.
CHAPTER 6
Norms
Various sets of norms are available for the BPI. Norms used for the
adult profiles described in the BPI manual were collected using mail
surveys or interviews of 709 men and 710 women randomly selected from
telephone directories and voters' records from the United States and
Canada. Age and marital status for this sample matched closely the
distributions associated with the 1980 U.S. Census. An underrepresentation
of nonwhites, however, suggests that additional nonwhite norms are still
required for the BPI. Adolescent norms are based on 880 male and 1380
female high school students representing pooled samples from two
Canadian provinces. The BPI manual also describes norms for college
students (187 men; 192 women), correctional officer job applicants (260
BASIC PERSONALITY INVENTORY 167
men; 109 women), and psychiatric patients (66 men; 46 women). Norma-
tive data are also available for other samples of psychiatric patients and
high school students (Holden, Reddon, Jackson, & Helmes, 1983), and for
psychiatric patients who completed a microcomputer-administered ver-
sion of the BPI (Holden, Fekken, & Cotton, 1990).
Model of Measurement
In considering the theoretical foundations of the BPI, it is useful to
review alternative measurement models for the assessment of psychopa-
thology, because these are intimately linked to alternative conceptions of
psychopathology.
The class model. A variety of measurement models have been
employed implicitly in the development of scales designed to assess
psychopathology. One of the important contributions of Loevinger (1957)
was her recognition that one aspect of construct validity was the require-
ment that a measurement model bear a close relationship to the structure
of the processes thought to underlie what was being measured. In the
measurement of psychopathology one can conceptualize these pro-
cesses as representing discrete states or psychopathological conditions
or as continuous dimensions. The choice of a model of psychopathology
should logically determine the choice of a measurement model, which in
turn affects the approach taken to scale development and interpretation.
A generation of personality and psychopathology test specialists has
been influenced by the approach employed by Hathaway and McKinley
(1940) in the construction of the original Minnesota Multiphasic Person-
ality Inventory (MMPI). These authors considered psychopathology in a
manner consistent with the classical model of medical diagnosis. Patients
suffering from severe depression or from a schizophrenic reaction were
conceptualized as being in a class distinct from other people much in the
same way as a patient diagnosed as having a tuberculosis infection is
thought of as different from non-infected people. The task of assessing
psychopathology from this vantage point was to distinguish members of
a class from those who were not members. It thus followed that the
approach to measurement taken was to seek to distinguish reliably
persons falling within the class or diagnosis from others falling outside of
the class. The class was believed to have an independent existence, one
that could be approximated by measurement. The method employed to
accomplish this was the contrasted groups or empirical method of scale
construction. A group of schizophrenic patients and a group of non-
patients were administered a heterogeneous set of personality items and
those items showing different endorsement proportions between the two
groups were assembled into a scale. This scale could then be administered
168 RONALD R. HOLDEN AND DOUGLAS N. JACKSON
to persons whose diagnosis was not known. In its simplest form this
model involved a single decision. If a person endorsed the items on the
scale above a certain cutting score, that person would be classified as a
schizophrenic; otherwise not. In Hathaway's conceptualization, the level
of the score was important only insofar as it affected the classification
decision. If a person's score exceeded the cutting score that person was
classified as probably a member of the diagnostic group; how much he or
she exceeded the cutting score was of little or no importance. Similarly,
a score below the cutting score was not further interpreted in terms of its
magnitude. Neither was much attention directed at the nature of the item
content comprising the scale, nor at the particular items that a particular
respondent endorsed. It should be noted, however, that despite the
underlying measurement model associated with instruments such as the
MMPI, in practice (and in contradiction to the class model), users do
interpret scales on such tests as dimensions. Nevertheless, it is useful to
contrast the class model in its "pure" form with the dimensional model,
even though many individuals have expanded upon and departed from
Hathaway's original conceptualization of the measurement model implicit
in the contrasted groups approach, in a shift of paradigms (Messick,
1989). This shift takes the form of interpreting a person's score as
representing the magnitude of the trait for a particular respondent. That is, a person
with a high depression score is thought to manifest a high degree of
depression, rather than simply being a member of a depressed diagnostic
group.
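The contrasted-groups procedure and the single cutting-score decision of the pure class model can be sketched as follows. The z-test item-selection criterion, the tiny invented samples, and the function names are all assumptions for illustration; they are not Hathaway and McKinley's actual procedure or data.

```python
from math import sqrt


def endorsement_rate(group, item):
    """Proportion of a group answering True to `item`; each person is a dict item -> bool."""
    return sum(person[item] for person in group) / len(group)


def select_items(criterion, comparison, min_z=1.96):
    """Contrasted-groups keying: keep items whose endorsement rates differ between groups.

    Returns a scoring key mapping item -> keyed response (True if the criterion
    group endorses the item more often, False if less often).
    """
    n1, n2 = len(criterion), len(comparison)
    key = {}
    for item in criterion[0]:
        p1 = endorsement_rate(criterion, item)
        p2 = endorsement_rate(comparison, item)
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se > 0 and abs(p1 - p2) / se >= min_z:
            key[item] = p1 > p2
    return key


def classify(person, key, cutting_score):
    """Pure class model: only whether the raw score reaches the cutting score matters."""
    raw = sum(person[item] == keyed for item, keyed in key.items())
    return raw >= cutting_score


# Invented samples of 20: item "a" separates the groups, "b" does not,
# and "c" is endorsed less often by the criterion group.
criterion = [{"a": i < 16, "b": i < 10, "c": i < 3} for i in range(20)]
comparison = [{"a": i < 4, "b": i < 10, "c": i < 15} for i in range(20)]
key = select_items(criterion, comparison)
print(key)  # {'a': True, 'c': False}
print(classify({"a": True, "b": False, "c": False}, key, cutting_score=2))  # True
```

`classify` returns only a yes/no decision, mirroring the pure class model, in which how far a score exceeds the cutting score carries no further meaning.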
The dimensional model. A dimensional model of psychopathology
recognizes that its manifestations can be conceptualized as falling along
a continuum. In such a model, the analogy is not with medical diagnosis,
but with the measurement of intellectual ability, which similarly is viewed
as falling along a continuum. With a dimensional model, the task of
measurement is to identify exemplars characteristic or prototypical of
the domain represented by the construct. If, for example, this domain is
depression, exemplars or items highly prototypical of depression are
sought, items that are not only prototypical but representative of all
important facets of depression. For example, Beck (1967) identified such
features as despair over the future, suicidal tendencies, loss of libido, and
psychomotor retardation, among others, as characteristic of depression.
To be representative of the domain, an item pool should contain content
representing each of these components.
Our choice of a dimensional model over alternatives was based on a
number of considerations. First, there is much controversy surrounding
the question of the viability of psychiatric diagnostic categories. The
reliability of psychiatric diagnoses, even when standard nomenclatures
and diagnostic systems have been employed, has been far from
satisfactory. There has been much controversy also about their number and
nature, as witnessed by the changing standards for diagnosis over the
years. Another problem is that this evolution of a standard nomenclature
has remained largely free of influence from empirical research on psychi-
atric classification (Blashfield, 1984). Where such research has been
undertaken, there is evidence of important incongruence between em-
pirical findings and, for example, DSM-III-R (Livesley, Schroeder, &
Jackson, 1989).
A third problem in the use of a class model is that there is a lack of
consensus on where to draw the line regarding membership in a class. In
World War II the psychiatric screening of inductees into the U.S. Army was
undertaken by physicians employing a diagnostic interview. Rejection
rates at different induction centers ranged from 2 to 60 percent (Office of
Strategic Services, Assessment Staff, 1948), a problem that was alleviated
through the use of standardized questionnaires. Better levels of
consensus have been achieved using DSM-III-R criteria, but the criteria
employed (e.g., "has two or fewer friends") are arbitrary both in their lack
of explicit rationale and in their lack of empirical support.
A fourth problem in adopting a medical diagnostic model for assess-
ment is the problem of base rates. Here we refer to the fact that the level
of item and scale validity coefficients can be affected markedly by the
proportion of persons in a diagnostic group and in the "normal" group
(Loevinger, 1957; Meehl & Rosen, 1955). This problem is not alleviated by
specifying an arbitrary proportion (e.g., 50%) for each contrasted group,
because the proportions used for scale construction will differ from those
encountered in a given diagnostic setting. Furthermore, the base rate for
a given diagnostic category will
vary from one setting to another due to a number of causes, e.g.,
diagnostic and administrative traditions in the setting, selection factors
such as the policy to restrict certain types of patients in a clinical setting,
and the relation of certain types of psychopathology to ethnicity and
social class, which will vary across settings. To base diagnostic decisions
on measurement that is markedly and arbitrarily affected by base rates is
a dangerous venture.
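The base-rate problem can be made concrete with a fourfold table. The sketch below assumes a hypothetical sign with fixed sensitivity and specificity of .80 and shows how both its validity coefficient (phi, the correlation between the sign and actual group membership) and the proportion of flagged persons who truly belong to the diagnostic group change with the base rate alone. It illustrates the general point made by Meehl and Rosen (1955); the numbers are not drawn from any BPI or MMPI study.

```python
import math


def fourfold_table(sensitivity: float, specificity: float, base_rate: float):
    """Joint proportions (TP, FN, FP, TN) for a sign with fixed accuracy at a given base rate."""
    tp = sensitivity * base_rate
    fn = (1.0 - sensitivity) * base_rate
    fp = (1.0 - specificity) * (1.0 - base_rate)
    tn = specificity * (1.0 - base_rate)
    return tp, fn, fp, tn


def phi_coefficient(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Correlation between the dichotomous sign and actual group membership."""
    tp, fn, fp, tn = fourfold_table(sensitivity, specificity, base_rate)
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return numerator / denominator


def positive_predictive_value(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Proportion of persons flagged by the sign who actually belong to the diagnostic group."""
    tp, _, fp, _ = fourfold_table(sensitivity, specificity, base_rate)
    return tp / (tp + fp)


# Hypothetical sign: 80% sensitivity and 80% specificity at a fixed cutting score.
SENS, SPEC = 0.80, 0.80
for base_rate in (0.50, 0.20, 0.05):
    print(f"base rate {base_rate:.0%}: phi = {phi_coefficient(SENS, SPEC, base_rate):.2f}, "
          f"PPV = {positive_predictive_value(SENS, SPEC, base_rate):.2f}")
```

With these hypothetical values, phi falls from .60 at a 50% base rate to about .31 at 5%, and the proportion of correct positive classifications falls from .80 to about .17, although the sign itself has not changed.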
A fifth concern with a class model and the method of contrasted
groups is the problem of matching normal and pathological groups on all
relevant variables except the particular type of psychopathology of
interest, a problem that is compounded by the fact that many variables
are correlated with psychopathology. When contrasted groups differ in
important respects (e.g., age, sex, or social class) or even on seemingly
incidental dimensions such as employment status, religion, occupation, or
social attitudes, there is a distinct possibility that the observed
differences are due to the extraneous variables and not to differences in
170 RONALD R. HOLDEN AND DOUGLAS N. JACKSON
Imaginary symptoms, whose scale name was later changed on the DPI to
Hypochondriasis, contained item content representing complaints that
were vague, ill defined, or lacking a link to a known medical condition.
Health concern involved a preoccupation with health as distinguished
from health complaints. Psychometrically, we found these three facets to
be correlated but distinguishable. They are sufficiently correlated to
comprise a single broader scale, and constitute behaviors ordinarily
associated with hypochondriasis. One advantage of broadening the
definition of hypochondriasis beyond imaginary symptoms is that item
content describing imaginary symptoms generally yields extreme
endorsement proportions and borders on somatic delusions. Such content
does not differentiate well within the normal range. It is important to
recognize here, as elsewhere in assessment based on self-report
inventories, that great weight should not be placed on the response to a
single item; the emphasis is instead on the aggregation of responses. We
are more likely to suspect hypochondriasis if a person reports a great
many diverse complaints than if he or she reports only one. Of course, it
is important to interpret elevations on this scale in the light of the
respondent's entire medical history.
There may be good reason for diverse somatic symptoms. One of us
recalls a patient who had suffered from semi-starvation while a prisoner
of war, an experience that resulted in a number of chronic health
problems. In this case the symptoms had an existence apart from a desire
to achieve secondary psychological gains from somatic complaints.
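Why aggregation matters can be illustrated with the Spearman-Brown formula from classical test theory, which gives the reliability of a composite of k parallel items from the reliability of a single item. The single-item reliability of .15 below is a hypothetical value, and real items are only approximately parallel, so the sketch shows the direction of the effect rather than BPI-specific figures.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Reliability of a composite of n parallel items (Spearman-Brown prophecy formula)."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical single-item reliability for a somatic-complaint item.
R_ITEM = 0.15
for k in (1, 5, 20):
    print(f"{k:>2} items: composite reliability = {spearman_brown(R_ITEM, k):.2f}")
```

Under these assumptions a single complaint carries little dependable information, while twenty complaints taken together yield a composite reliability of about .78.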
Depression. (Depression, Insomnia). My future is hopeless. The DPI
Depression and Insomnia scales do not exhaust the facets of this scale,
nor do they reflect adequately the background for this scale. When the
DPI Depression scale was developed, a total of 214 items was prepared,
reflecting many of the well-known facets of depression, such as despair
over the future, suicidal tendencies, feeling "blue," loss of confidence in
abilities, and lack of libido. Facets were scored separately and
intercorrelated. Item correlations were obtained with each facet, as well
as with total scores for a general depression scale (based on the sum of
the facet scores), with related psychopathological scales, and with a
social desirability scale. These procedures permitted an analysis of the
type of item that best appraised general depression, as well as providing
a basis for representing the domain of depression in a comprehensive
manner. This approach provided a good foundation for developing the
BPI scale. Because the negative affect represented in depressive content
is often highly evaluative, it is also important, for the sake of fostering
discriminant validity, to differentiate depression from simply responding
undesirably.
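The logic of comparing each item's correlation with its own scale against its correlation with a desirability scale can be sketched with simulated data. Everything here is hypothetical: the latent traits, the item weights, the use of continuous scores in place of true/false responses, and the simple retention rule (keep an item only if it correlates more highly with the rest of the depression items than with the desirability scale). It is not the actual DPI or BPI item-selection procedure.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
N = 500
# Simulated latent standing on depression and on a general tendency to respond undesirably.
depression = [random.gauss(0, 1) for _ in range(N)]
undesirability = [random.gauss(0, 1) for _ in range(N)]


def simulate_item(dep_weight, und_weight):
    """Continuous item score: weighted latent traits plus unique error."""
    return [dep_weight * d + und_weight * u + random.gauss(0, 1)
            for d, u in zip(depression, undesirability)]


# Four items driven by depressive content, two driven mainly by evaluative undesirability.
items = {f"content_{i}": simulate_item(1.0, 0.0) for i in range(1, 5)}
items.update({f"evaluative_{i}": simulate_item(0.2, 1.0) for i in range(1, 3)})
# A separate four-item desirability scale, scored here in the undesirable direction.
desirability_scale = [sum(v) for v in zip(*(simulate_item(0.0, 1.0) for _ in range(4)))]

retained = []
for name, responses in items.items():
    # Corrected item-total: correlate the item with the sum of the other depression items.
    rest_total = [sum(v) for v in zip(*(r for other, r in items.items() if other != name))]
    r_total = pearson(responses, rest_total)
    r_desirability = pearson(responses, desirability_scale)
    if r_total > r_desirability:  # hypothetical retention rule
        retained.append(name)
    print(f"{name}: r(total) = {r_total:.2f}, r(desirability) = {r_desirability:.2f}")

print("retained:", retained)
```

Items written around depressive content are retained, while items whose variance comes mainly from evaluative undesirability correlate more with the desirability scale and are flagged, which is the discriminant-validity concern described above.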
Denial. (Defensiveness, Repression, Shallow Affect). I cannot recall
ever having been embarrassed by something I did. It is interesting that the
BASIC PERSONALITY INVENTORY 173
psychotic processes per se, but rather from temporary states like severe
panic, which sometimes results in the elevation of a variety of scales.
The Thinking Disorder scale will also show elevations for persons who
have abused drugs or alcohol. In addition, it will reflect unusual
cognitive processes in people who are not diagnosable as having a
psychiatric disorder. It remains for further research to determine
whether such people are susceptible to psychotic disorders of cognition.
Impulse Expression. (Impulsivity, Hostility). When I am bored I
sometimes do reckless or foolish things just to stir up excitement. In the
absence of evidence for psychopathology, persons high in Impulse
Expression appear to have above-average levels of energy and to be
regarded by others as lively and entertaining. The scale's pathological
implications arise from the tendency of such persons to experience fits of
temper and uncontrolled, sometimes hostile behavior, as well as a
tendency to engage in risky behavior, often involving physical, social,
monetary, or ethical risks. Persons with extremely low scores are
generally regarded as stolid, cautious individuals who have a
slower-than-average pace of responding and acting, are reserved and
even-tempered, and are not subject to unpredictable changes in behavior.
Social Introversion. (Desocialization). I am happier alone than when
I am with other people. At times psychopathology causes a person to
withdraw from what is generally regarded as normal human contact.
Sometimes this is the result of extremely low self-esteem, sometimes of
severe depression, and sometimes of delusional beliefs about the harm
that might occur at the hands of strangers or loved ones. The desire to
affiliate varies, of course, within the normal population, without any
particular psychopathological implications. But the content of the Social
Introversion scale of the BPI reflects more extreme behavior than one
would be likely to encounter in the normal range. Hence, elevations in
Social Introversion can be regarded as significant, particularly when they
are accompanied by other evidence of dysfunction.
Self Depreciation. (Self Depreciation). I rarely have anything intelligent
to contribute to a conversation. There is an extensive literature regarding
the role of social desirability in responses to personality questionnaires.
It has been argued that responses indicating negative social desirability
define the largest component in self-report questionnaires. This finding
can be interpreted in a variety of ways, but one obvious interpretation is
that the tendency to depreciate the self is a very important aspect of
psychopathology, one whose pervasiveness makes the measurement of
specific aspects of psychopathology difficult. In the BPI an effort was
made to contain this
Note: Decimals omitted. IEI refers to Item Efficiency Index (see text); Var refers to item variance.
Abbreviations for scale names are as follows: Hyp - Hypochondriasis; Dep - Depression; Den - Denial;
InP - Interpersonal Problems; Aln - Alienation; PsI - Persecutory Ideas; Anx - Anxiety; ThD - Thinking Disorder;
Imp - Impulse Expression; SoI - Social Introversion; and SDp - Self Depreciation. The Deviation scale, a scale
composed of heterogeneous critical items, was not used for this stage of test development.
Table adapted from Jackson et al. (1989) with the permission of Sigma Assessment Systems, Inc.
Factor 1
Impulse Expression .62
Interpersonal Problems .52
Pd .51
Anxiety .49
Alienation .46
Ma .46
Pt .40
D .38
K -.35
L -.67
Denial -.82
The scales showing the highest loadings on this factor are those that
identify persons who are likely to have trouble adapting to social or legal
codes of behavior, who are likely to show unstable employment histories,
and who are more likely than average to engage in interpersonal conflict.
This is the pattern of behavior that is commonly associated with the Pd-
Ma configuration in MMPI clinical lore. This configuration was found by
Loper, Kammeier, and Hoffmann (1973) to characterize the MMPI profiles
of University of Minnesota undergraduates who were later hospitalized
for alcoholism. BPI scales loading this factor support this interpretation.
Impulse Expression, implying undercontrol of impulses, Interpersonal
Problems, suggesting a propensity for conflict, and Alienation, with its
implication of deviant socialization with respect to dishonesty, theft, and
thrill seeking, are consistent with an interpretation of the positive pole of
this factor as reflecting sociopathic or antisocial behavioral tendencies.
Its negative pole, defined by the BPI Denial scale and the MMPI L and K
scales, suggests overcontrol of impulses, defensiveness, and some im-
pression management. Social undercontrol is an apt label.
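Labeling a factor such as this one usually starts by separating its salient positive and negative loadings and noting how much of each scale's variance the factor accounts for. The sketch below does this for the Factor 1 loadings listed above. The salience threshold of .35 and the reading of a squared loading as a proportion of variance (which holds for an orthogonal solution) are assumptions; the rotation used in the original analysis is not stated here.

```python
# Factor 1 loadings as listed above (BPI and MMPI scales).
FACTOR_1 = {
    "Impulse Expression": 0.62, "Interpersonal Problems": 0.52, "Pd": 0.51,
    "Anxiety": 0.49, "Alienation": 0.46, "Ma": 0.46, "Pt": 0.40, "D": 0.38,
    "K": -0.35, "L": -0.67, "Denial": -0.82,
}


def poles(loadings, salient=0.35):
    """Split salient loadings into positive and negative poles, strongest first."""
    positive = sorted((s for s, v in loadings.items() if v >= salient),
                      key=lambda s: -loadings[s])
    negative = sorted((s for s, v in loadings.items() if v <= -salient),
                      key=lambda s: loadings[s])
    return positive, negative


positive_pole, negative_pole = poles(FACTOR_1)
for scale in positive_pole + negative_pole:
    loading = FACTOR_1[scale]
    # Squared loading: share of the scale's variance tied to this factor (orthogonal case).
    print(f"{scale:<24} {loading:+.2f}  squared = {loading ** 2:.2f}")
```

For example, the Denial loading of -.82 corresponds to about two thirds (.67) of that scale's variance, which is why Denial anchors the overcontrolled pole.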
Factor 2
Si .60
Pt .56
Anxiety .49
K -.69
Factor 3
Hy .82
D .66
Hs .61
Hypochondriasis .45
Depression .39
Cognate scales from the BPI and MMPI load this factor. The Hs, D, Hy
combination is the familiar "neurotic triad" in MMPI lore. Persons high on
this factor tend to experience psychopathology in terms of negative
affective states and other manifestations of depression coupled with
somatic complaints, justifying the label of depression and somatization.
Factor 4
Sc .77
F .75
Persecutory Ideas .69
Ma .67
Thinking Disorder .63
Alienation .53
Deviation .51
Pt .51
Pa .49
Hypochondriasis .48
Hs .45
Interpersonal Problems .43
Pd .39
Impulse Expression .38
K -.40
The two highest loadings on this factor from the BPI, Persecutory
Ideas and Thinking Disorder, are associated with psychosis scales from
the MMPI. But a number of other scales also load highly, suggesting that
psychotic processes are accompanied by diverse other forms of
psychopathology in this population. It is noteworthy that the BPI
Deviation scale is loaded on this factor. The Deviation scale contains
diverse deviant content, rather than content pertaining to a unitary
dimension of psychopathology. Its appearance here suggests that hospi-
talized psychiatric patients experience or report many of these diverse
deviant behaviors. Jackson and Hoffmann labelled this dimension gener-
alized psychotic processes.
Factor 5
Social Introversion .75
Self Depreciation .72
Depression .62
Si .53
D .40
Alienation .38
According to Beck (1967), social withdrawal and low self-esteem are
often associated with high levels of depression. It is not surprising to find
these characteristics co-occurring in a sample of psychiatric patients.
The label depressed withdrawal seems appropriate.
It should be noted that the above factor analytic results were ob-
tained from a particular population, male psychiatric patients, most of
whom were suffering from severe alcoholism. Although the five factors
accounted for a major proportion of the variance, and the factors were
relatively easy to interpret plausibly, it is not certain whether the same
factors would emerge with different populations. But given that BPI and
MMPI scales are relatively highly correlated and share substantial
common variance (Jackson & Hoffmann, 1987), it is clear that the BPI and
the MMPI address the same domain of psychopathology.
Typologies
In addition to interpretations based upon individual scale scores, the
configuration of scores in an individual's test profile may also be of
interest and relevance. More specifically, the similarity of a
respondent's profile to a particular prototypical configuration or "modal
type" may provide the test user with additional pertinent information.
Using a sample of 352 provincial psychiatric hospital inpatients, Holden,
Fekken, and Jackson (1983) employed cluster analytic methods to identify
six reliable profile "types" of psychiatric patients based on BPI scale
scores (see Table 2). This typology successfully classified over 75% of the
patients. The following information describes each type and provides
information about an exemplary patient. It should be noted, however,
that any one patient will only approximate his or her "type" of profile.
TYPE IA represents a cluster of patients associated with a modal
profile elevated on BPI scales of Denial and Alienation. Patient #155
(Figure 1) was a high scorer on this type. This patient was a divorced
woman in her early twenties with a grade 8 education. She had a prelimi-
nary diagnosis of an Immature Personality Disorder. Casebook symptoms
Table 2
                                      Type
BPI Scale                 IA    IB    IIA   IIB   IIIA  IIIB
Hypochondriasis           44    56    34    66    51    49
Depression                41    59    55    45    61    39
Denial                    77    23    43    57    62    38
Interpersonal Problems    52    48    64    36    41    59
Alienation                58    42    57    43    33    67
Persecutory Ideas         50    50    41    59    41    59
Anxiety                   38    62    43    57    58    42
Thinking Disorder         49    51    39    61    44    56
Impulse Expression        45    55    52    48    41    59
Social Introversion       51    49    64    36    60    40
Self Depreciation         46    54    58    42    58    42
Note: Modal profiles are scaled to have a mean of 50 and a standard deviation of
10. The Deviation scale, a scale composed of heterogeneous critical items,
was not used in the derivation of the types.
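Assigning a new respondent to one of these modal types requires some index of profile similarity. The sketch below uses the Table 2 modal profiles and the Pearson correlation between profiles, which compares profile shape. The similarity index, the .50 cutoff for leaving a profile unclassified, and the example respondent are assumptions for illustration, not the classification rule used by Holden, Fekken, and Jackson (1983).

```python
import math

SCALES = [
    "Hypochondriasis", "Depression", "Denial", "Interpersonal Problems",
    "Alienation", "Persecutory Ideas", "Anxiety", "Thinking Disorder",
    "Impulse Expression", "Social Introversion", "Self Depreciation",
]

# Modal profiles from Table 2, in the scale order above.
MODAL_PROFILES = {
    "IA":   [44, 41, 77, 52, 58, 50, 38, 49, 45, 51, 46],
    "IB":   [56, 59, 23, 48, 42, 50, 62, 51, 55, 49, 54],
    "IIA":  [34, 55, 43, 64, 57, 41, 43, 39, 52, 64, 58],
    "IIB":  [66, 45, 57, 36, 43, 59, 57, 61, 48, 36, 42],
    "IIIA": [51, 61, 62, 41, 33, 41, 58, 44, 41, 60, 58],
    "IIIB": [49, 39, 38, 59, 67, 59, 42, 56, 59, 40, 42],
}

MIN_SIMILARITY = 0.50  # hypothetical cutoff below which a profile is left unclassified


def profile_correlation(x, y):
    """Pearson correlation between two profiles across scales (shape similarity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no shape to compare
    return sxy / math.sqrt(sxx * syy)


def classify(profile):
    """Return the best-matching modal type and its correlation, or 'unclassified'."""
    best_type, best_r = max(
        ((t, profile_correlation(profile, m)) for t, m in MODAL_PROFILES.items()),
        key=lambda pair: pair[1],
    )
    return (best_type if best_r >= MIN_SIMILARITY else "unclassified"), best_r


# Hypothetical respondent with elevated Denial and Alienation.
respondent = [45, 40, 75, 55, 60, 50, 40, 50, 45, 50, 45]
print(classify(respondent))
```

Because a correlation ignores overall elevation, a distance measure would be needed if the level of a profile, and not only its shape, should affect the assignment.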
[Figure 1A: BPI profile of a patient representing Modal Type IA; standard score (10 to 110) plotted for each BPI scale]
Note. Casebook symptoms of assaultive behavior, alcohol abuse, and attention-seeking. Preliminary diagnosis of Immature Personality Disorder.
[Figure 1B: BPI profile of a patient representing Modal Type IB; standard score (10 to 110) plotted for each BPI scale]
Note. Casebook symptoms of depression, somatic complaints, and suicidality. Preliminary diagnosis of Neurotic Depression.
Figure 1A & B
BPI Profiles of Patients Representing Modal Type IA and Modal Type IB
Research Applications
The Basic Personality Inventory has been used in a variety of research
programs and is being incorporated into a growing number of empirical
investigations. Some of the more extensive research applications are
described below.
Health Psychology. The influence of psychological factors on physi-
cal well-being represents a rapidly developing area in the health
[Figure 3A: BPI profile of a patient representing Modal Type IIIA; standard score (10 to 110) plotted for each BPI scale]
Note. Casebook symptoms of depression and insomnia. Preliminary diagnosis of Schizophrenia in Remission.
[Figure 3B: BPI profile of a patient representing Modal Type IIIB; standard score (10 to 110) plotted for each BPI scale]
Note. Casebook symptoms of delusions, mania, depression, and visual hallucinations. Preliminary diagnosis of Schizoaffective Disorder.
Figure 3A & B
BPI Profiles of Patients Representing Modal Type IIIA and Modal Type IIIB
[Figure 4: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of case-recorded hallucinatory symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 10% of patients classified as Modal Type IIA to 46% of patients classified as Modal Type IIB. Types differ significantly (χ² = 17.55, p < .01).
Figure 4
[Figure 5: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded delusional symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 19% of patients classified as Modal Type IIIA to 59% of patients classified as Modal Type IIIB. Types differ significantly (χ² = 21.23, p < .01).
Figure 5
[Figure 6: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded depressive symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 41% of patients classified as Modal Type IIIB to 76% of patients classified as Modal Type IIIA. Types differ significantly (χ² = 19.83, p < .01).
Figure 6
[Figure 7: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded anxiety symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 9% of patients classified as Modal Type IIIB to 35% of patients classified as Modal Type IB. Types differ significantly (χ² = 16.15, p < .01).
Figure 7
[Figure 8: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded insomnia symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 7% of patients classified as Modal Type IIIB to 30% of patients classified as Modal Type IIIA. Types differ significantly (χ² = 11.13, p < .05).
Figure 8
[Figure 9: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded assaultive behavior as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 8% of patients classified as Modal Type IIIA to 35% of patients classified as Modal Type IIA. Types differ significantly (χ² = 13.60, p < .05).
Figure 9
Skinner & Allen, 1982, 1983). Morey et al. (1984) identified three distinct
types of alcohol abusers (early-stage problem drinkers; affiliative, mod-
erate alcohol dependents; schizoid, severe alcohol dependents) who
could subsequently be differentiated on the basis of BPI scale scores.
From this, a speculative model of alcohol abuse has been proposed with
differential treatment hypothesized as a function of the type of alcohol
abuse.
Juvenile Delinquency. Jaffe and his associates (Austin, Leschied,
Jaffe, & Sas, 1986; Jaffe, Leschied, Sas, & Austin, 1985; Jaffe, Leschied, Sas,
Austin, & Smiley, 1985; Leschied, Austin, & Jaffe, 1988; Sas & Jaffe, 1986;
Sas, Jaffe, & Reddon, 1985) have explored the utility of using the BPI with
young offenders. Jaffe, Leschied, Sas, Austin, and Smiley (1985)
demonstrated that the BPI could indicate the presence of previous
delinquency, the nature of the offending charge (person/property vs.
status), the presence of school misbehavior, and the type of previous
residence referral (home vs. detention), and that it could predict
subsequent court reappearance. Jaffe, Leschied, Sas, and Austin (1985)
suggest that clinical services to juvenile court systems might appropriately
incorporate personality testing, such as that provided by the BPI, in their
assessment programs.
[Figure 10: percentage of patients (y-axis) for each Modal Type, IA through IIIB (x-axis)]
Note. Presence of casebook-recorded suicidal symptoms as a function of Modal Type. Frequency is displayed as a percentage of patients within each Modal Type. Frequency ranged from 24% of patients classified as Modal Type IIB to 57% of patients classified as Modal Type IIA or Modal Type IIIA. Types differ significantly (χ² = 16.11, p < .01).
Figure 10
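The significance levels in the notes to Figures 4 through 10 can be checked directly. Assuming each test was computed on a 6 × 2 contingency table (six modal types by symptom recorded or not), each has (6 - 1)(2 - 1) = 5 degrees of freedom, and the chi-square tail probability for an odd number of degrees of freedom has a closed form using only the complementary error function. The sketch below is a reader's consistency check under that assumption, not part of the original analyses.

```python
import math


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 5 degrees of freedom.

    Closed form for odd degrees of freedom:
    Q(x; 5) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3).
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


# Chi-square values and significance levels reported in the notes to Figures 4 to 10.
REPORTED = {
    "Figure 4 (hallucinations)": (17.55, 0.01),
    "Figure 5 (delusions)": (21.23, 0.01),
    "Figure 6 (depression)": (19.83, 0.01),
    "Figure 7 (anxiety)": (16.15, 0.01),
    "Figure 8 (insomnia)": (11.13, 0.05),
    "Figure 9 (assaultive behavior)": (13.60, 0.05),
    "Figure 10 (suicidality)": (16.11, 0.01),
}

for label, (chi2, alpha) in REPORTED.items():
    p = chi2_sf_df5(chi2)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.4f}, below {alpha}: {p < alpha}")
```

Under this assumption all seven reported values clear their stated thresholds, and the two tests reported at p < .05 (Figures 8 and 9) fall between .01 and .05.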
[Figure 11: mean standardized response latency (y-axis, -0.09 to 0.30) for endorsed items and rejected items, grouped by score on a clinical rating of depression (x-axis: 12-33, 34-58, 59-83, 83-108)]
Figure 11
Mean Standardized Response Latencies as a Function of Clinical Rating of Depression
Conclusions
The BPI, focusing on the measurement of constructs underlying the
MMPI, assesses a traditional domain of psychopathology. However,
unlike the MMPI, which was originally constructed on the basis of a class
model of all-or-none psychopathological categorizations, the BPI
represents a modern, dimensional model of assessment for quantification
along various continua of psychological dysfunction. Such an approach
is advantageous in that it is consistent with classical test theory.
Theoretically, the BPI incorporates an emphasis on important constructs
of psychopathology, fostering the development of both convergent and
discriminant validity (Campbell & Fiske, 1959). Rarely, if ever, have
multiphasic inventories of psychopathology been shown to provide
References
Austin, G.W., Leschied, A.W., Jaffe, P.G., & Sas, L. (1986). Factor structure and
construct validity of the Basic Personality Inventory with juvenile offenders.
Canadian Journal of Behavioural Science, 18, 238-247.
Bagby, R.M., Gillis, J.R., & Dickens, S. (1990). Detection of dissimulation with the
new generation of objective personality measures. Behavioral Sciences and the
Law, 8, 93-102.
Bagby, R.M., Taylor, G.J., & Ryan, D. (1986). Toronto Alexithymia Scale: Relationship
with personality and psychopathology measures. Psychotherapy and
Psychosomatics, 45, 207-215.
Holden, R.R., & Fekken, G.C. (1987, August). Reaction time and self-report
psychopathological assessment: Convergent and discriminant validity. Paper
presented at the American Psychological Association Annual Convention, New
York.
Holden, R.R., & Fekken, G.C. (1988, June). Using reaction time to detect faking on
a computerized inventory of psychopathology. Paper presented at the Canadian
Psychological Association Annual Convention, Montreal, Canada.
Holden, R.R., & Fekken, G.C. (1990). Structured psychopathological test item
characteristics and validity. Psychological Assessment: A Journal of Consulting
and Clinical Psychology, 2, 35-40.
Holden, R.R., Fekken, G.C., & Cotton, D.H.G. (1990). Clinical reliabilities and
validities of the microcomputerized Basic Personality Inventory. Journal of
Clinical Psychology, 46, 845-849.
Holden, R.R., Fekken, G.C., & Cotton, D.H.G. (1991). Assessing psychopathology
using structured test item response latencies. Psychological Assessment: A
Journal of Consulting and Clinical Psychology, 3, 000-000.
Holden, R.R., Fekken, G.C., & Jackson, D.N. (1983, June). Diagnostic efficiency of
the Basic Personality Inventory. Paper presented at the Canadian Psychological
Association Annual Convention, Winnipeg, Canada.
Holden, R.R., Fekken, G.C., Reddon, J.R., Helmes, E., & Jackson, D.N. (1988).
Clinical reliabilities and validities of the Basic Personality Inventory. Journal of
Consulting and Clinical Psychology, 56, 766-768.
Holden, R.R., Helmes, E., Fekken, G.C., & Jackson, D.N. (1985). The
multidimensionality of person reliability: Implications for interpreting individual
test item responses. Educational and Psychological Measurement, 45, 119-130.
Holden, R.R., & Jackson, D.N. (1985). Disguise and the structured self-report
assessment of psychopathology: I. An analogue investigation. Journal of
Consulting and Clinical Psychology, 53, 211-222.
Holden, R.R., Reddon, J.R., Jackson, D.N., & Helmes, E. (1983). The construct
heuristic applied to the measurement of psychopathology. Multivariate
Behavioral Research, 18, 37-46.
Jackson, D.N. (1970). A sequential system for personality scale development. In
C.D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol.
2, pp. 61-96). New York: Academic Press.
Jackson, D.N. (1971). The dynamics of structured personality tests: 1971.
Psychological Review, 78, 229-248. (Reprinted as a Warner Modular Publication,
1973, 320, 1-20.)
Jackson, D.N. (1976). The Basic Personality Inventory. Port Huron, MI: Sigma
Assessment Systems.
Jackson, D.N., Helmes, E., Hoffmann, H., Holden, R.R., Jaffe, P.G., Reddon, J.R., &
Smiley, W.C. (1989). The Basic Personality Inventory Manual. Port Huron, MI:
Sigma Assessment Systems.
Neill, J.A., & Jackson, D.N. (1976). Minimum redundancy item analysis. Educational
and Psychological Measurement, 36, 123-134.
Office of Strategic Services, Assessment Staff. (1948). Assessment of men: Selection
of personnel for the Office of Strategic Services. New York: Holt, Rinehart &
Winston.
Ogborne, A.C. (1987). A note on the characteristics of alcohol abusers with
controlled drinking aspirations. Drug and Alcohol Dependence, 19, 159-164.
Reddon, J.R., & Jackson, D.N. (1989). Readability of three adult personality tests:
Basic Personality Inventory, Jackson Personality Inventory, and Personality
Research Form-E. Journal of Personality Assessment, 53, 180-183.
Reddon, J.R., Marceau, R., & Jackson, D.N. (1982). An application of singular value
decomposition to the factor analysis of MMPI items. Applied Psychological
Measurement, 6, 275-283.
Richmond, J.M., Lindsay, R.M., Burton, H.J., Conley, J., & Wai, L. (1982).
Psychological and physiological factors predicting the outcome of home
dialysis. Clinical Nephrology, 17, 109-113.
Sas, L., & Jaffe, P.G. (1986). Understanding depression in juvenile delinquency:
Implications for institutional admission policies and admission programs.
Juvenile and Family Court Journal, 37, 49-58.
Sas, L., Jaffe, P.G., & Reddon, J.R. (1985). Unravelling the needs of dangerous young
offenders: A clinical-rational and empirical approach to classification. Canadian
Journal of Criminology, 27, 83-96.
Scudds, R.A., Rollman, G.B., Hart, M., & McCain, G.A. (1987). Pain perception and
personality measures as discriminators in the classification of fibrositis.
Journal of Rheumatology, 14, 563-569.
Skinner, H.A. (1979). A multivariate evaluation of the MAST. Journal of Studies on
Alcohol, 40, 831-843.
Skinner, H.A. (1981). Primary syndromes of alcohol abuse: Their measurement
and correlates. British Journal of Addiction, 76, 63-76.
Skinner, H.A. (1982). The Drug Abuse Screening Test. Addictive Behaviors, 7,
363-371.
Skinner, H.A., & Allen, B.A. (1982). Alcohol dependence syndrome: Measurement
and validation. Journal of Abnormal Psychology, 91, 199-209.
Skinner, H.A., & Allen, B.A. (1983). Differential assessment of alcoholism: Evaluation
of the Alcohol Use Inventory. Journal of Studies on Alcohol, 44, 852-862.
Spielberger, C.D. (1983). State-Trait Anxiety Inventory (Form Y) Manual. Palo Alto:
Consulting Psychologists Press.
Strauss, J.S., & Harder, D.W. (1981). The Case Record Rating Scale: A method for
rating symptom and social function data from case records. Psychiatry Research,
4, 333-345.
CHAPTER 7
Assessment of Couples
assessment tools: other sources provide such resources with far greater
completeness than this space allows (e.g., Touliatos, Perlmutter, &
Straus, 1990). Rather, our purpose is to present a number of the common-
alities of the various assessment approaches currently utilized, and to
put particular, but not exclusive, emphasis on clinically relevant
assessment procedures. We augment this discussion with some suggestions for
increasing convergence in the field among researchers and practitioners
alike.
Our title, "Assessment of Couples," reflects the fact that married,
heterosexual couples are by no means the only couples who seek
professional help, nor are they the only ones who serve as the subjects of
couples' research. The assessment approach we describe is quite
applicable to other types of couples, such as gay, unmarried, and/or
remarried couples. Every couple has its own particular history and
problems, which need to be examined, as we shall describe. However, the
commonalities among various types of couples far exceed their
differences in terms of assessment needs.
We begin by examining the context for assessment (clinical, research,
or both), then explore the various levels of assessment (individual,
couple's presenting problems, micro-interactional, patterns of interaction,
etc.). Throughout the chapter we present assessment techniques (both
tried-and-true and the more novel) that may be useful in many settings
and for different applications. Finally, we discuss newer methodological
techniques that may further increase the compatibility of assessment
strategies, not only among advocates of particular theories but also
between what researchers and clinicians do.
Levels of Assessment
Many different factors influence couple functioning and thus are
important dimensions for assessment. These include individual function-
ing, historical factors, relationship issues, larger systemic or contextual
factors such as socio-economic status, employment, extended family
issues, and interactional processes. We explore different levels of as-
sessment relevant to a couple's relationship, when they are most
important, and the various means available for assessing within these
dimensions. The distinctions between levels are necessarily arbitrary:
individual factors influence couple interaction patterns, which in turn
affect which relationship issues are salient, and so on. Thus, what is
now to be presented is more a general and practical set of levels or
dimensions of assessment than a rigorously defined theoretical or hier-
archical system. Practitioners from different theoretical perspectives
will likely emphasize different levels of assessment among those presented
below.
Individual factors
A host of personal or individual factors that each partner brings to a
relationship can affect the quality and process of their relationship. Here
we identify several of the factors that we believe are essential to explore
as part of any assessment of a couple.
Assessing individual psychopathology of either (or both) partners
provides a first step. Although the presence of severe individual dys-
function may not preclude couples therapy, knowledge of any individual
problems and how they affect or are affected by the relationship is
essential. Depression is perhaps the most common individual problem
that coexists with marital distress, and is a problem that often remits
along with a reduction in marital discord (Jacobson, Dobson, Fruzzetti,
Schmaling, & Salusky, in press; O'Leary & Beach, 1990). In addition to
individual structured or unstructured interviewing, depression may be
economically assessed using the Beck Depression Inventory (Beck, Rush,
Shaw, & Emery, 1979), or some other appropriate screening device. With
moderate or severe depression, of course, careful attention must be given
to suicidality. With a client who is severely depressed and has high levels
of suicidal ideation (or has a suicide plan with few deterrents), individual
treatment may be indicated either prior to or as an adjunct to marital
therapy.
Similarly, any substance abuse or dependence on the part of either or
both partners must be identified. This may best be accomplished by a
combination of a written screening tool (e.g., the MacAndrew scale of the
ASSESSMENT OF COUPLES 205
measure of the context in which couples live their lives and are therefore
an essential part of any assessment.
Issues associated with partners' own parents provide another av-
enue into understanding a couple's world. Are parents near or far? What
kind of relationship does each partner have with his or her own and the
partner's parents? If the couple has children, what kind of relationship do
they have with their grandparents? Are the parents ill, requiring physical
care or financial or emotional support? What do the partners expect from
themselves and from each other concerning parents and "in-laws"?
Another important contextual factor concerns employment. This
may, depending on circumstances, involve issues of unemployment or
underemployment, job dissatisfaction or satisfaction, or limited time
together attributed to long working hours.
Employment factors often are associated as well with financial factors,
another important contextual variable. It is important not to underesti-
mate the potential impact that financial difficulties can have on a
relationship, especially in the context of raising children, or of medical
problems of either partner, a child, or aging parents, or of other costly
family responsibilities. Of course, relationship and individual problems
can create or complicate financial problems: the relationship between
financial and couple factors may not be one-directional, but instead may
be a reciprocal one.
Unfortunately, the cost of therapy itself can easily exacerbate financial
hardship for many couples. Therefore, it is important to view financial
difficulties as contextual factors relating to relationship difficulties, not
simply as a sign of resistance to therapy.
Finally, an often-overlooked contextual factor is the physical envi-
ronment in which the couple lives. What is their housing and neighborhood
like? Is it a place they both enjoy and in which they feel comfortable? Or,
is it physically close and cramped, constantly in need of attention
(assuming they do not enjoy the process of repair), and a place they
prefer not to spend time?
Although at first glance the contextual factors mentioned might seem
to present intractable problems, it is precisely this perception that is
essential to discern. For example, having a child with developmental
disabilities or facing inadequate housing may seem to be virtually
impossible problems to solve, yet they bear in central ways on the
couple's experience. Moreover, these contextual factors often become
recurrent conflict issues in relationships and the themes around which
destructive patterns of interaction revolve. They should not be overlooked.
nondistressed groups using the DAS has poor reliability (Eddy, Heyman,
& Weiss, 1990). In addition, the unit of analysis to be employed (partners
considered separately or averaged together) continues to be widely
debated (e.g., Baucom, 1983; Whisman, Jacobson, Fruzzetti, & Waltz,
1989). Averaging the partners' scores could mask disagreement: for
example, if a score of 97 is considered the cutoff between satisfied and
dissatisfied, what of a couple who averaged 102, with one partner scoring
72 and the other 132? Similar problems arise if difference scores are
employed. Thus, both individual and combined scores should be exam-
ined in some fashion. Some potential ways to solve these problems (for
research purposes, at least) are: a) consider both couple scores and
separate individual scores in two sets of analyses; b) employ each
partner's score as a univariate measure in analyses with multiple
dependent measures; and c) conservatively employ the scores of the
partner who shows the smallest change in repeated-measures analyses
(Baucom, 1983; Jacobson, Follette, & Elwood, 1984). The use of any of
these approaches would enhance the validity of the DAS as a measure of
treatment outcome.
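The masking problem, and option (c), can be illustrated with a minimal sketch. It assumes, as in the example above, that 97 is the cutoff and that scores at or above it count as satisfied; the function names and the follow-up scores in the last line are hypothetical, and the code shows only the arithmetic, not a published scoring rule.

```python
# Minimal sketch, assuming a DAS cutoff of 97 with scores at or above it
# counted as "satisfied". Function names are hypothetical.
DAS_CUTOFF = 97


def classify(score: float, cutoff: float = DAS_CUTOFF) -> str:
    """Label one DAS score relative to the cutoff."""
    return "satisfied" if score >= cutoff else "dissatisfied"


def couple_report(score_a: float, score_b: float) -> dict:
    """Report the averaged score next to each partner's own classification."""
    mean = (score_a + score_b) / 2
    return {
        "mean": mean,
        "mean_class": classify(mean),
        "partner_classes": (classify(score_a), classify(score_b)),
        "discrepancy": abs(score_a - score_b),
    }


def conservative_change(pre: tuple, post: tuple) -> float:
    """Option (c): use the change of the partner who improved least."""
    return min(after - before for before, after in zip(pre, post))


print(couple_report(72, 132))
print(conservative_change((72, 132), (90, 135)))  # 3 (gains were 18 and 3)
```

Here the couple's mean of 102.0 is classified as satisfied even though one partner falls well below the cutoff and the two differ by 60 points. Taking the minimum gain is conservative in the sense that a couple counts as improved only as much as its least-improved member.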
The Marital Satisfaction Inventory (MSI; Snyder, 1979) contains 280
self-report items, which generate eleven normalized T-scales, including
an overall validity scale and a global distress scale. The other scales measure
affective communication, problem-solving communication, time together,
finances, sexual satisfaction, concerns with childrearing, relationship
with children, role-orientation within the dyad, and distress in the family
of origin.
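Normalized T-scores are commonly derived in two steps: a raw score is converted to a percentile rank within a norm group, and that percentile is then mapped onto a normal curve with a mean of 50 and a standard deviation of 10. The sketch below shows that general technique with a made-up norm sample; it does not reproduce the MSI's actual norms or scoring procedure, and the function names are hypothetical.

```python
from statistics import NormalDist


def percentile_rank(raw: float, norms: list) -> float:
    """Mid-rank percentile: share of norm scores below `raw` plus half the ties."""
    below = sum(1 for x in norms if x < raw)
    tied = sum(1 for x in norms if x == raw)
    return (below + 0.5 * tied) / len(norms)


def normalized_t(raw: float, norms: list) -> float:
    """Convert a raw score to a normalized T-score (mean 50, SD 10)."""
    pr = percentile_rank(raw, norms)
    # inv_cdf is undefined at exactly 0 or 1, so keep the percentile inside (0, 1).
    pr = min(max(pr, 0.001), 0.999)
    return 50 + 10 * NormalDist().inv_cdf(pr)


norms = list(range(1, 10))  # hypothetical norm-group raw scores
print(round(normalized_t(5, norms), 1))  # 50.0
print(round(normalized_t(8, norms), 1))
```

Because the mapping goes through percentiles rather than a linear z-score, a skewed raw-score distribution is reshaped toward normality, which is what distinguishes normalized T-scores from linear ones.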
For clinical purposes, at least, the MSI shows promise in discriminating
among presenting problems (e.g., Berg & Snyder, 1981; Snyder, Wills,
& Keiser, 1981). Whether the separate scales are in fact sensitive to
specific changes in the relationship has not, however, been demon-
strated. In addition, the MSI seems less reactive to relationship changes
than the DAS, especially among women; thus it may be a more conserva-
tive measure of treatment success or other changes over time (Whisman
& Jacobson, 1991).
Another self-report measure of partners' presenting complaints and
targets for treatment is the Areas of Change Questionnaire (AOC),
developed by Weiss and associates (Margolin, Talovic, & Weinstein, 1983;
Patterson, 1976; Weiss, Hops, & Patterson, 1973). As its name implies, this
34-item questionnaire assesses a) desired behavior changes to be made
by the other partner, and b) partners' perception of what changes the
other one wants from the respondent. Both sets of ratings are made on a
seven-point scale, with anchor points ranging from "much less" to "much
more." By comparing the two parts, each partner's perceptual accuracy
and couple concordance can also be assessed.
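The comparison of the two parts can be sketched as follows. The coding of the seven-point scale as -3 ("much less") through 0 (no change) to +3 ("much more"), and the two indices used (proportion of exact matches and mean absolute difference), are illustrative assumptions rather than the published AOC scoring rules; the function name is hypothetical, and a real administration would cover all 34 items.

```python
def compare_ratings(wanted: list, perceived: list) -> tuple:
    """Compare one partner's requested changes (part a) with the other
    partner's perception of those requests (part b).

    Ratings are assumed to be coded -3 ("much less") to +3 ("much more").
    Returns (proportion of items with exact agreement, mean absolute difference).
    """
    if len(wanted) != len(perceived) or not wanted:
        raise ValueError("both parts must rate the same, non-empty item list")
    diffs = [abs(w - p) for w, p in zip(wanted, perceived)]
    exact = sum(1 for d in diffs if d == 0) / len(diffs)
    return exact, sum(diffs) / len(diffs)


# Hypothetical five-item example (the actual questionnaire has 34 items).
a_wants_from_b = [2, 0, -1, 3, 0]    # part a, completed by partner A
b_thinks_a_wants = [2, 1, -1, 0, 0]  # part b, completed by partner B
print(compare_ratings(a_wants_from_b, b_thinks_a_wants))  # (0.6, 0.8)
```

This scores partner B's perceptual accuracy; swapping the roles (B's part a against A's part b) scores partner A.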
References
Arthur, J., Hops, H., & Biglan, A. (1982). Living in Family Environments (LIFE)
coding system. Eugene, OR: Oregon Research Institute.
Barlow, D. H. (1981a). A role for clinicians in the research process. Behavioral
Assessment, 3, 227-233.
Barlow, D. H. (1981b). On the relation of clinical research to clinical practice:
Current issues, new directions. Journal of Consulting and Clinical Psychology,
49, 147-155.
Barlow, D. H., Hayes, S. C., & Nelson, R. (1984). The scientist-practitioner. Elmsford,
New York: Pergamon.
Baucom, D. H. (1982). A comparison of behavioral contracting and
problem-solving/communication training in behavioral marital therapy. Behavior
Therapy, 13, 162-174.
Baucom, D. H. (1983). Conceptual and psychometric issues in evaluating the
effectiveness of behavioral marital therapy. Advances in family intervention,
assessment and theory (Vol. 3, pp. 91-117). Greenwich, CT: JAI Press.
Baucom, D. H., Burnett, C. K., Rankin, L., & Sher, T. G. (1990). Cognitive/behavioral
marital therapy outcome research: What is success? In R. L. Weiss (Chair), Why
9 out of 10 marital therapists should not prefer satisfaction. Symposium presented
at the 24th Annual Convention of the Association for the Advancement of
Behavior Therapy, San Francisco.
Baucom, D. H., Epstein, N., Sayers, S., & Sher, T. G. (1989). The role of cognitions
in marital relationships: Definitional, methodological, and conceptual issues.
Journal of Consulting and Clinical Psychology, 57, 31-38.
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of
depression. New York: Guilford.
Berg, P., & Snyder, D. K. (1981). Differential diagnosis of marital and sexual
distress: A multidimensional approach. Journal of Sex and Marital Therapy, 7,
290-295.
220 ALAN E. FRUZZETTI AND NEIL S. JACOBSON
Biglan, A., & Thorsen, C. (1986). The interactive behavior of women with chronic
pain. Unpublished manuscript, Oregon Research Institute, Eugene, OR.
Birchler, G. R., & Webb, L. J. (1977). Discriminating interaction in behavior in
happy and unhappy marriages. Journal of Consulting and Clinical Psychology,
45, 494-495.
Broderick, J. E., & O'Leary, K. D. (1986). Contributions of affect, attitudes, and
behavior to marital satisfaction. Journal of Consulting and Clinical Psychology,
54, 514-517.
Christensen, A. (1988). Dysfunctional interaction patterns in couples. In P. Noller
& M. A. Fitzpatrick (Eds.), Perspectives on marital interaction (pp. 31-52).
Clevedon, Avon, England: Multilingual Matters.
Christensen, A., & Heavey, C. L. (1990). Gender and social structure in the
demand/withdraw pattern of marital conflict. Journal of Personality and Social
Psychology, 59, 73-81.
Christensen, A. C., & Mendoza, J. (1986). A method for assessing change in single
subject designs: An alteration of the RC index. Behavior Therapy, 17, 305-308.
Christensen, A., & Nies, D. C. (1980). The Spouse Observation Checklist: Empirical
analysis and critique. The American Journal of Family Therapy, 8, 69-79.
Cohen, L. H., Sargent, M. M., & Sechrest, L. B. (1986). Use of psychotherapy
research by professional psychologists. American Psychologist, 41, 198-206.
Eddy, J. M., Heyman, R. E., & Weiss, R. L. (1990). Satisfaction by the numbers: Is the
DAS a cause for concern? In R. L. Weiss (Chair), Why 9 out of 10 marital therapists
should not prefer satisfaction. Symposium presented at the 24th Annual
Convention of the Association for the Advancement of Behavior Therapy, San
Francisco.
Filsinger, E. E. (1981). The Dyadic Interaction Scoring Code. In E. E.
Filsinger & R. A. Lewis (Eds.), Assessing marriage: New behavioral approaches.
Beverly Hills, CA: Sage.
Finkelhor, D., Gelles, R. J., Hotaling, G. T., & Straus, M. A. (1983). The dark side of
families: Current family violence research. Newbury Park, CA: Sage.
Fitzpatrick, M. A. (1988). A typological approach to marital interaction. In P. Noller
& M. A. Fitzpatrick (Eds.), Perspectives on marital interaction (pp. 98-120).
Clevedon, Avon, England: Multilingual Matters.
Floyd, F. J., & Markman, H. J. (1984). An economical observational measure of
couples' communication skill. Journal of Consulting and Clinical Psychology, 52,
97-103.
Fruzzetti, A. E., & Jacobson, N. S. (1990). Toward a behavioral conceptualization
of adult intimacy: Implications for marital therapy. In E. A. Blechman (Ed.),
Emotions and the family: For better or for worse (pp. 117-135). Hillsdale, NJ:
Erlbaum.
Fruzzetti, A. E., & Jacobson, N. S. (1991). Depressive response to relationship
dissolution: A comparison of cognitive and contextual factors. Manuscript
submitted for publication. Seattle, WA: University of Washington.
Margolin, G., & Weinstein, C. D. (1983). The role of affect in behavioral marital
therapy. In M. L. Aronson & L. R. Wolberg (Eds.), Group and family therapy 1982:
An overview (pp. 334-355). New York: Brunner/Mazel.
Margolin, G., & Weiss, R. L. (1978). A comparative evaluation of therapeutic
components associated with behavioral marital treatment. Journal of Consulting
and Clinical Psychology, 46, 1476-1486.
Markman, H. J., & Notarius, C. I. (1987). Coding marital and family interaction:
Current status. In T. Jacob (Ed.), Family interaction and psychopathology (pp.
329-390). New York: Plenum.
Morrow-Bradley, C., & Elliott, R. (1986). Utilization of psychotherapy research by
practicing psychotherapists. American Psychologist, 41, 188-197.
Napier, A. Y. (1978). The rejection-intrusion pattern: A central family dynamic.
Journal of Marriage and Family Counseling, 4, 5-12.
Notarius, C. I., & Markman, H. J. (1981). The Couples Interaction Scoring System.
In E. E. Filsinger & R. A. Lewis (Eds.), Assessing marriage (pp. 112-127). Beverly
Hills, CA: Sage.
Notarius, C. I., & Vanzetti, N. A. (1983). The marital agendas protocol. In E. Filsinger
(Ed.), Marriage and family assessment. Beverly Hills, CA: Sage.
O'Leary, K. D., & Beach, S. R. H. (1990). Marital therapy: A viable treatment for
depression and marital discord. American Journal of Psychiatry, 147, 183-186.
O'Leary, K. D., & Murphy, C. (in press). Clinical issues in the assessment of spouse
abuse. In R. T. Ammerman & M. Hersen (Eds.), Assessment of family violence:
A clinical and legal sourcebook. New York: John Wiley.
O'Leary, K. D., & Vivian, D. (1990). Physical aggression in marriage. In F. D.
Fincham & T. N. Bradbury (Eds.), The psychology of marriage: Basic issues and
applications (pp. 232-348). New York: Guilford.
Patterson, G. R. (1976). Some procedures for assessing changes in marital
interaction patterns. Oregon Research Institute Bulletin, 16 (7).
Sharpley, C. F., & Cross, D. G. (1982). A psychometric evaluation of the Spanier
Dyadic Adjustment Scale. Journal of Marriage and the Family, 44, 739-741.
Snyder, D. K. (1979). Multidimensional assessment of marital satisfaction. Journal
of Marriage and the Family, 41, 121-131.
Snyder, D. K., Wills, R. M., & Keiser, T. W. (1981). Empirical validation of the Marital
Satisfaction Inventory: An actuarial approach. Journal of Consulting and Clinical
Psychology, 49, 262-268.
Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the
quality of marriage and similar dyads. Journal of Marriage and the Family, 38,
15-28.
Stanton, M. D., & Todd, T. C. (1982). The family therapy of drug abuse and addiction.
New York: Guilford.
Straus, M. A. (1979). Measuring intrafamily conflict and violence: The Conflict
Tactics (CT) Scales. Journal of Marriage and the Family, 41, 75-88.
Straus, M. A., Gelles, R. J., & Steinmetz, S. K. (1980). Behind closed doors: Violence
in the American family. Garden City, New York: Doubleday/Anchor.
Sullaway, M., & Christensen, A. (1983). Assessment of dysfunctional interaction
patterns in couples. Journal of Marriage and the Family, 45, 653-660.
Touliatos, J., Perlmutter, B. F., & Straus, M. A. (1990). Handbook of family
measurement techniques. Newbury Park, CA: Sage.
Vincent, J. P., Cook, N. I., & Messerly, L. (1980). A social learning analysis of
couples during the second postnatal month. The American Journal of Family
Therapy, 8, 49-68.
Weiss, R. L., & Aved, B. M. (1978). Marital satisfaction and depression as predictors
of physical health status. Journal of Consulting and Clinical Psychology, 46,
1379-1384.
Weiss, R. L., & Frohman, P. E. (1985). Behavioral observation as outcome measures:
Not through a glass darkly. Behavioral Assessment, 7, 309-316.
Weiss, R. L., Hops, H., & Patterson, G. R. (1973). A framework for conceptualizing
marital conflict, a technology for altering it, some data for evaluating it. In L. A.
Hamerlynck, L. C. Handy, & E. J. Mash (Eds.), Behavior change: Methodology,
concepts, and practice. Champaign, IL: Research Press.
Weiss, R. L., & Perry, B. A. (1983). The Spouse Observation Checklist: Development
and clinical applications. In E. E. Filsinger (Ed.), Marriage and family assessment
(pp. 65-84). Beverly Hills, CA: Sage.
Weiss, R. L., & Margolin, G. (1986). Assessment of conflict and accord: A second
look. In A. Ciminero (Ed.), Handbook of behavioral assessment (2nd ed., pp.
561-600). New York: Wiley.
Whisman, M. A., & Jacobson, N. S. (1991). Changes in marital adjustment following
marital therapy: Comparisons between outcome measures. Unpublished
manuscript. Seattle, WA: University of Washington.
Whisman, M. A., Jacobson, N. S., Fruzzetti, A. E., & Waltz, J. A. (1989). Methodological
issues in marital therapy. Advances in Behaviour Research and Therapy, 11,
175-189.
Wile, D. (1981). Couples therapy: A non-traditional approach. New York: John Wiley.
Yllo, K., & Bograd, M. (1988). Feminist perspectives on wife abuse. Newbury Park,
CA: Sage.
CHAPTER 8
Assessment of Creative
Potential in Psychology and
the Development of a Creative
Temperament Scale for the CPI
Harrison G. Gough
This chapter will deal with two topics. The first is the identification of
creative potential among graduate students in psychology, and predic-
tion of this potential from biographical and test measures available at the
time of entry into training. The second is the development of a scale for
creative temperament on the California Psychological Inventory (CPI)
(Gough, 1987), capable of forecasting creative attainment in fields other
than psychology, as well as within the psychological domain. Although
any discipline can be expected to demand certain specific skills and
attributes for creative achievement, it also appears that there is a more
general constellation of personal qualities cutting across disciplinary
boundaries (Amabile, 1983; Barron, 1965; Helson, 1988; Isaaksen, 1988;
MacKinnon, 1978). Attention will also be paid to similarities and differences
in characteristics associated with creativity for men and women (Bachtold
& Werner, 1970; Helson, 1978).
HARRISON G. GOUGH, Professor of Psychology, Emeritus, University of California,
Berkeley, California.
225
226 HARRISON G. GOUGH
Assessment Procedures
The battery of tests included the Strong Vocational Interest Blank for
Men (SVIB-M) (Strong, 1943) to assess interests, and the ACL, CPI, and
Minnesota Multiphasic Personality Inventory (MMPI) (Hathaway &
McKinley, 1940) in the personality sphere. In the 1960s, the revised
version of the SVIB (Campbell, 1966, 1977) was introduced, and later the
325-item current version of the Strong Interest Inventory (SII) (Campbell
& Hansen, 1981; Hansen & Campbell, 1985) was adopted.
For cognitive assessment, a preliminary form of a new Psychological
Vocabulary and Information Test (PVIT) was administered. Harold
Sampson, a 1953 Ph.D. in psychology from Berkeley, was working with me
on this test, and by 1952 we had completed two parallel forms, each
containing 150 items of psychological information, and furnishing scores
on nine areas of psychology (applied, comparative, developmental, ex-
perimental, general, physiological, personality, social, and statistics)
plus a total or overall score for each form. In the analyses to be reported
below, sums of the scores on Forms A and B were used for all 10 measures.
Levine's (1950) Minnesota Psycho-Analogies Test, Forms A and B, was
also included in this first session, and one form or the other was used until
the mid-1970s. In the 1950s, the College Vocabulary Test (CVT) (Gough &
Sampson, 1954) that Sampson and I had constructed was put into the
battery. Each form of the CVT (A or B) had 75 items, calibrated so as to
give an average difficulty level of .50 for college students.
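In classical test theory an item's difficulty is the proportion of examinees who answer it correctly, so an average difficulty of .50 means that a typical college student would pass about half of the 75 items. The sketch below computes item difficulties and their mean from a small, made-up response matrix (1 = correct, 0 = incorrect); it uses no actual CVT data, and the function names are hypothetical.

```python
def item_difficulties(responses: list) -> list:
    """Proportion of examinees answering each item correctly (1 = correct, 0 = wrong)."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def mean_difficulty(responses: list) -> float:
    """Average item difficulty across all items of the test."""
    p = item_difficulties(responses)
    return sum(p) / len(p)


# Hypothetical 4-examinee x 4-item response matrix (not actual CVT data).
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
print(item_difficulties(responses))  # [1.0, 0.5, 0.5, 0.0]
print(mean_difficulty(responses))    # 0.5
```

The toy matrix averages .50 while containing items everyone passes or fails; such items have no variance and cannot discriminate among examinees, so a test calibrated toward a mean difficulty near .50 would normally keep most items away from those extremes.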
In addition, a brief biographical data blank was employed to obtain
information about high school and college activities, and undergraduate
GPA and Miller Analogies Test (MAT) scores were obtained from the application file.
Later, after Dawes (1971) had introduced his rating scale for undergradu-
ate colleges, his scoring method was applied to the schools from which
our students had come. Dawes' ratings, relying on data presented by Cass
and Birnbaum (1968), gave higher values to schools that were more
Creativity Criterion
About once every three years, faculty ratings were gathered for the
students who had entered during that period. One rating was for future
promise, defined as "Potentiality, performance, and promise: An overall
evaluation of the student's performance as a psychologist, potentiality
for significant work in the field, and general promise as a member of the
profession." The other was for creativity, defined as "The creative quality
of the student's thinking and research in psychology." For both ratings,
a seven-point scale was used, with levels defined as follows: 7 = one of our
best graduate students, 6 = clearly above average, 5 = somewhat above
ASSESSMENT OF CREATIVE POTENTIAL 229
Description of Sample
Figure 1 presents the mean MMPI-1 profiles for males and females
separately on the three validity and 10 clinical scales of the inventory.
Both profiles show the combination of moderate elevation on both F and
K often associated with personal effectiveness and progressive temperament
(Dahlstrom, Welsh, & Dahlstrom, 1972; Greene, 1980).
On the clinical scales, males scored highest on Scale 5, whereas
females scored lowest on this same measure. Because high standard
scores on Scale 5 for men indicate femininity, and low standard scores on
Scale 5 have the same meaning for women, this finding carries similar
implications for both sexes. In particular, studies of highly educated and
intellectually talented men usually report elevations on Scale 5, probably
associated with esthetic and culturally sophisticated attitudes more than
any feminization of personality (Friedman, Webb, & Lewak, 1989). Scales
4 (psychopathy) and 8 (schizophrenia) are also moderately elevated on
both profiles, suggesting a certain degree of unconventionality and
willfulness (Graham, 1987).
Mean CPI profiles are presented in Figure 2. Both profiles have
standard scores above 60 on the scales for Capacity for Status (Cs),
Achievement via Independence (Ai), Psychological-mindedness (Py),
and Flexibility (Fx). These elevations are suggestive of resourceful inde-
pendence, ambition, a talent for psychological thinking, and openness to
new experience. The impression one gets from both inventories, taken
together, is of well-integrated, adaptable, intellectually adept, and expe-
rience-seeking individuals.
The CPI may also be interpreted in reference to its internal structure.
The 1987 revision of the inventory (Gough, 1987, 1989) introduced a three-
vector model of personality structure, based on measures of interpersonal
and normative orientations and self-realization or ego integration. These
three dimensions correspond to the psychometric fundamentals of the
CPI as delineated by smallest-space analysis (Karni & Levin, 1972). A v.1
(vector) scale for interpersonal orientation assesses an axis going from
involvement and interpersonal responsiveness at one pole to detachment
and privacy-seeking at the other. The second vector scale (v.2) assesses
an axis going from norm-favoring and rule-accepting dispositions at one
[Figure 1 chart not reproduced: MMPI profile form (Starke R. Hathaway and J. Charnley McKinley) with separate FEMALE and MALE panels]
Figure 1. MMPI mean profiles for 405 female and 623 male entering graduate
students in psychology.
[Figure 2 chart not reproduced: CPI profile sheet (female norms) with scales Do, Cs, Sy, Sp, Sa, In, Em, Re, So, Sc, Gi, Cm, Wb, To, Ac, Ai, Ie, Py, Fx, and F/M]
Figure 2. CPI mean profiles for 405 female and 623 male entering graduate
students in psychology.
Findings
Table 1
Intercorrelations Among the Variables Listed for 623 Male and 405 Female
Entering Graduate Students in Psychology at Berkeley, 1946-1981
                                Intercorrelations(a)
Variables                         2       3       4       5       6
1. Prestige rating of           -.10**   .24**  -.17**   .07     .19**
   undergraduate college        -.09     .21**  -.12*   -.04     .20**
2. College grade point                   .06    -.05     .06     .09*
   average                               .04    -.05     .21**   .14**
3. Miller Analogies Test                         .05     .01     .25**
                                                 .11*    .02     .25**
4. Age at entry                                         -.29**  -.15**
                                                        -.09    -.07
5. Year of entry                                                 .05
                                                                 .13**
6. Creativity rating
   by faculty
a. Males in row 1, females in row 2.  * p ≤ .05.  ** p ≤ .01.
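The asterisks can be related to the sample sizes. For a correlation r based on N cases, t = r√(N − 2)/√(1 − r²) with N − 2 degrees of freedom, so the smallest significant |r| is t_crit/√(t_crit² + N − 2). The minimal sketch below assumes two-tailed tests and substitutes normal critical values for Student's t, a close approximation at these sample sizes; it is an illustrative check, not necessarily how the original significance levels were computed, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| significant at `alpha` (two-tailed) with n cases.

    Exact test: t = r * sqrt(n - 2) / sqrt(1 - r**2) with n - 2 df. Solving for r
    gives r = t / sqrt(t**2 + n - 2). Here the normal critical value stands in
    for t, which is close when n is in the hundreds.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(z * z + (n - 2))


for n in (623, 405):  # male and female sample sizes in Table 1
    print(n, round(critical_r(n, 0.05), 3), round(critical_r(n, 0.01), 3))
# 623 0.078 0.103
# 405 0.097 0.127
```

With these thresholds the males' .07 between college prestige and year of entry falls short of the .05 level, while the females' -.12 between college prestige and age at entry passes the .05 level (about .097) but not the .01 level (about .127), matching its single asterisk.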
Table 2
Correlations between Intellective-Cognitive Measures Administered at
Admission to Graduate School and Faculty Ratings of Creativity
                                         Males      Females
Measures                                  N   r         N   r
Psychological Vocabulary
and Information Test:
  Applied                               530   .02     366  -.02
  Comparative                                 .15**         .07
  Developmental                              -.01           .11*
  Experimental                                .17**         .22**
  General                                     .18**         .16**
  Personality                                 .08           .11*
  Physiological                               .10*          .08
  Social                                      .16**         .14**
  Statistics                                  .18**         .10*
  Total Score                                 .18**         .17**
Chapin Social Insight Test              258   .07     178   .28**
College Vocabulary Test, Form A         215   .16**   131   .24**
College Vocabulary Test, Form B         221   .16**   125   .23**
Minnesota Psycho-Analogies, Form A      273   .20**   143   .21**
Minnesota Psycho-Analogies, Form B      464   .27**   264   .24**
* p ≤ .05  ** p ≤ .01
Table 3
Correlations between Scales of the Adjective Checklist at Entry to Graduate
School and Faculty Ratings of Creativity
ACL Scales                          Males(a)  Females(b)
Number of adjectives checked        -.08*      .02
Number of favorable adjectives       .01       .15**
Number of unfavorable adjectives     .03      -.03
Communality                         -.03       .13**
Achievement                          .00       .09
Dominance                            .00       .12*
Endurance                           -.05       .04
Order                               -.04      -.02
Intraception                         .06       .13**
Nurturance                          -.04       .03
Affiliation                         -.02       .10*
Heterosexuality                     -.08*      .09
Exhibition                           .00       .07
Autonomy                             .07       .05
Aggression                           .02       .02
Change                               .04       .07
Succorance                          -.08*     -.11*
Abasement                           -.06      -.08
Deference                           -.04      -.05
Counseling Readiness                -.03       .02
Self-Control                        -.03      -.06
Self-confidence                      .02       .13**
Personal Adjustment                 -.02       .09
Ideal Self                           .04       .14**
Creative Personality                 .17**     .26**
Military Leadership                  .00       .10*
Masculinity                          .02       .05
Femininity                          -.07       .03
Critical Parent                     -.02      -.01
Nurturing Parent                    -.02       .05
Adult                                .02       .08
Free Child                           .04       .09
Adapted Child                        .00      -.11*
High Origence/Low Intellectence     -.03      -.05
High Origence/High Intellectence     .09*      .07
Low Origence/Low Intellectence      -.07       .03
Low Origence/High Intellectence      .02       .06
a. N = 623.  b. N = 405.  * p ≤ .05.  ** p ≤ .01.
the Lie scale was negative (r = -.12) and that for the Mf scale was positive
(r = .13). It should be mentioned that all of the correlations in Table 5 were
computed from raw scores on the MMPI. This means that for the Mf scale
higher scores are associated with stronger femininity for both sexes.
Although four MMPI scales produced correlations at the .01 level for
women, neither L nor Mf was among these. The Hypochondriasis, Depression,
and Welsh Anxiety scales had negative coefficients, whereas
the Barron Ego Strength scale was positively related to the ratings.
Table 6 presents findings for a number of measures directed toward
esthetic, independent, and nonconformist attributes. Three of Barron's
self-report inventory scales were significantly (p ≤ .01) related to the
ratings for both sexes: personal complexity, independence of judgment,
and disposition toward originality. Barron's scale for soundness, on the
other hand, was essentially uncorrelated. Both versions of the Art Scale
were correlated with the ratings at the .01 level. Within the unpublished
Differential Reaction Schedule, three of the subscales pertaining to
originality showed positive and significant (p ≤ .01) values, and the total
score for originality did the same. The "P-4" scale for motivation to
succeed also revealed significant relationships. All of the measures in
Table 5
Correlations between Scales of the MMPI at Entry to Graduate School and
Faculty Ratings of Creativity
MMPI Scales                         Males(a)  Females(b)
L (Lie)                             -.12**    -.09
F (Frequency)                        .02      -.09
K (Ego Functioning)                 -.01       .09
Hs + .5K (Hypochondriasis)          -.01      -.18**
D (Depression)                      -.01      -.19**
Hy (Hysteria)                        .03      -.03
Pd + .4K (Psychopathic Deviate)     -.05      -.04
Mf (Femininity)                      .13**     .05
Pa (Paranoia)                       -.03      -.03
Pt + K (Psychasthenia)               .02      -.11*
Sc + K (Schizophrenia)              -.03       .00
Ma + .2K (Hypomania)                -.04       .04
Si (Social Introversion)             .04      -.12*
A (Welsh Anxiety Scale)              .01      -.17**
R (Welsh Repression Scale)           .04      -.06
ES (Barron Ego Strength Scale)       .07       .21**
Table 6
Correlations between Selected Research Measures at Entry to Graduate
School and Faculty Ratings of Creativity
                                         Males      Females
Measures                                  N   r         N   r
Barron Complexity/Simplicity Scale      623   .28**   405   .20**
Barron Independence Scale                     .27**         .26**
Barron Originality Scale                      .18**         .24**
Barron Personal Soundness Scale               .02           .09
Barron-Welsh Art Scale                  259   .22**   162   .25**
Welsh Revised Art Scale                       .20**         .21**
Differential Reaction Schedule
  Intellectual Competence               623   .10**   405   .19**
  Inquiringness                               .18**         .17**
  Cognitive Flexibility                       .13**         .24**
  Esthetic Sensitivity                        .19**         .08
  Sense of Destiny                            .03           .19**
  Sum of the above five scales                .20**         .29**
P4 (Motivation for Success)                   .15**         .27**
* p ≤ .05  ** p ≤ .01
Table 7
Correlations between Selected Scales of the Strong Interest Inventory at
Entry to Graduate School and Faculty Ratings of Creativity
Scales                              Males(a)  Females(b)
Table 8
Analysis of Variance by Type and Level for Creativity Ratings of 1,028
Graduate Students in Psychology
         Levels 1-4     Level 5        Levels 6-7     Total
Type     N     M        N     M        N     M        N      M      SD
Alpha    30    47.20    43    50.86    108   50.43    181    49.99  10.21
Beta     22    44.45    6     41.00    49    49.73    77     47.55  11.36
Gamma    59    49.25    134   51.99    294   51.46    487    51.33   9.52
Delta    45    47.89    80    49.27    158   50.75    283    49.88   9.66
Total    156   47.79    263   50.73    609   50.95    1,028  50.41   9.87
Analysis    F      df    p
Type        3.22   3     .02
Level       5.44   2     <.01
T x L       1.03   6     .41
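The Total row and Total column of Table 8 are N-weighted averages of the cell means. As an arithmetic check, the short Python sketch below recomputes those marginals from the published cells; the data layout and function names are illustrative and are not part of the original study. Because the cell means are printed to two decimals, recomputed marginals can differ from the table by about .01. The F tests themselves cannot be reproduced this way, since within-cell standard deviations are not reported.

```python
# Cell sizes (N) and mean creativity ratings (M) copied from Table 8,
# in the order: Levels 1-4, Level 5, Levels 6-7.
CELLS = {
    "Alpha": [(30, 47.20), (43, 50.86), (108, 50.43)],
    "Beta": [(22, 44.45), (6, 41.00), (49, 49.73)],
    "Gamma": [(59, 49.25), (134, 51.99), (294, 51.46)],
    "Delta": [(45, 47.89), (80, 49.27), (158, 50.75)],
}


def pooled_mean(cells):
    """N-weighted mean of several subgroup means given as (n, mean) pairs."""
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n


def type_totals(table):
    """Row marginals: pool each CPI type across its three level groups."""
    return {t: (sum(n for n, _ in c), pooled_mean(c)) for t, c in table.items()}


def level_totals(table):
    """Column marginals: pool each level group across the four types."""
    n_levels = len(next(iter(table.values())))
    return [pooled_mean([c[i] for c in table.values()]) for i in range(n_levels)]


if __name__ == "__main__":
    for t, (n, m) in type_totals(CELLS).items():
        print(f"{t:6s} N={n:4d} M={m:.2f}")
    print("Level means:", [round(m, 2) for m in level_totals(CELLS)])
    grand = pooled_mean([cell for c in CELLS.values() for cell in c])
    print(f"Grand mean: {grand:.2f}")
```

The recomputed row marginals keep the ordering reported in the chapter: Gammas have the highest pooled mean and Betas the lowest.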
[Figure: Mean creativity ratings (vertical axis, about 44 to 51) of the four CPI types, Alpha, Beta, Gamma, and Delta, plotted at CPI levels 1+2+3+4, 5, and 6+7 (horizontal axis).]
in the CPI was assigned a dummy weight of "1" if answered true, and a
dummy weight of "0" if answered false. Then these dummy weights were
correlated with the criterion ratings of creativity. Items were selected for
the CT scale if their correlations were at the .01 level of confidence in the
total sample, and at or beyond the .10 level for each sex considered
separately. Thirty-five items met these requirements. Seven additional
items whose content was congruent with psychological notions about
creativity and whose correlations with the criterion were at least at the
.10 level of significance were then added, making a total of 42 items in the
aptitude college seniors studied by Helson (1967) was ranked next. Half
of this sample had been nominated by faculty members as exceptional or
outstanding in regard to creativity, whereas the others, matched on
aptitude scores, were not nominated. From the strength of the nominations,
a differentiated criterion was formulated; its correlation with CT
was .52.
A sample of 41 women in mathematics (Helson, 1971) comes next. A
panel of eminent mathematicians rated the work of all these women for
creativity, and these ratings served as the criterion index; its correlation
with CT was .46. Sixty-six honors students in engineering came next
(Gough, 1976). They were rated by from four to twelve faculty members
with whom they had taken courses and seminars. These ratings for
creativity when pooled yielded a correlation of .53 with CT. The 124
architects studied by MacKinnon (1964) were ranked next on the CT scale.
One subsample was composed of eminent, world-famous architects,
another subsample was made up of men of similar age who had worked
with these eminent practitioners, or in the same firm with them, and a
third subsample was composed of a cross-section of members of the
architectural association, matched for age with the first subsample. The
names of all 124 men were then submitted to a panel of university teachers
of architecture, editors of major journals, and leading practitioners not
included in the total. Their ratings were pooled, leading to the criterion
evaluation for creativity. This index correlated .44 with the CT scale.
Finally, the 37 business executives in Ireland studied by Barron and Egan
(1968) were rated by members of the assessment staff and also by a panel
drawn from the Irish Management Institute. Because the former ratings
were based more on style and personality than on the actual work of the
Table 9
Means and Standard Deviations on the CT (Creativity) Scale and
Correlations with External Ratings of Creativity in the Samples Indicated
Samples                                       N      M       SD     r
Psychology graduate students, males         623    30.23   4.54   .25*
Psychology graduate students, females       405    29.92   4.96   .33*
Research scientists, males                   45    28.09   4.90   .33*
Mathematicians, males                        57    27.79   4.80   .47*
High-aptitude college seniors, females       51    27.45   4.21   .52*
Mathematicians, females                      41    25.12   6.12   .46*
Honors students in engineering, males        66    25.02   4.91   .53*
Architects, males                           124    24.62   5.49   .44*
Irish business executives, males             37    22.73   4.47   .34*
* p ≤ .01
men, only the evaluations by the IMI experts were used. These ratings
correlated .34 with the CT scores.
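The summary figures for the cross-validating samples in Table 9 (seven samples, coefficients ranging from .33 to .53, median .46) can be recomputed directly from the r column. The Python snippet below does only that; the split into derivation and cross-validation samples follows the text, and the variable names are illustrative.

```python
from statistics import median

# CT validity coefficients (r column of Table 9)
DERIVATION = {
    "Psychology graduate students, males": 0.25,
    "Psychology graduate students, females": 0.33,
}
CROSS_VALIDATION = {
    "Research scientists, males": 0.33,
    "Mathematicians, males": 0.47,
    "High-aptitude college seniors, females": 0.52,
    "Mathematicians, females": 0.46,
    "Honors students in engineering, males": 0.53,
    "Architects, males": 0.44,
    "Irish business executives, males": 0.34,
}

rs = list(CROSS_VALIDATION.values())
print(f"k = {len(rs)}, range {min(rs):.2f} to {max(rs):.2f}, median {median(rs):.2f}")
# Is the weakest cross-validation r at least as large as the strongest derivation r?
print(min(rs) >= max(DERIVATION.values()))
```

The comparison uses `>=` because the weakest cross-validating coefficient (.33, research scientists) ties, rather than exceeds, the derivation coefficient for women.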
Two comments are in order on the findings in Table 9. The first is that
the CT scale correlated positively and significantly with criteria of creativity
in all cross-validating samples, with coefficients ranging from a low of .33
to a high of .53 and a median of .46. It is apparent that the CT scale, even
though developed on a sample of graduate students in psychology, assesses
qualities that are related to creative attainment in other fields as well, and
perhaps to creativity in general. The second is that all of the cross-validating
coefficients equal or exceed those found in the initial sample. This is
unusual indeed. At least part of the explanation must lie in the carefully
evolved and highly valid criteria available for these cross-validating
samples. It is an axiom in criterion-linked research that as the precision
and validity of the criterion go up, the relationships with measures
diagnostic of that criterion grow stronger.
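One standard way to make this axiom concrete, not discussed in the chapter itself, is the classical correction-for-attenuation relation from test theory: the observed correlation between a predictor and a criterion equals the correlation between their true scores multiplied by the square root of the product of the two reliabilities. The Python sketch below uses purely hypothetical reliabilities (not values from any sample in Table 9) to show how a more reliable criterion raises the observed validity coefficient even when the underlying relationship is unchanged.

```python
import math


def observed_r(true_r, rel_predictor, rel_criterion):
    """Expected observed r under classical-test-theory attenuation."""
    return true_r * math.sqrt(rel_predictor * rel_criterion)


# Hypothetical values: one true relationship, two criteria of differing reliability
TRUE_R = 0.50
PREDICTOR_REL = 0.80
for crit_rel in (0.40, 0.90):
    r = observed_r(TRUE_R, PREDICTOR_REL, crit_rel)
    print(f"criterion reliability {crit_rel:.2f}: observed r = {r:.2f}")
```

With these assumed inputs the observed coefficient rises from about .28 to about .42 purely because the criterion is measured more precisely.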
Personological Implications of CT
What are the attributes associated with higher and lower scores on
CT, beyond the basic goal of identifying creative potential? A good way to
discover these more general implications is to examine descriptions that
observers give of persons who have taken the CT scale, in order to
determine the specific implications of higher and lower scores. The
observers, of course, must not have any knowledge of the CT scores, and
it is even better if they are not focused on creativity or any particular
notions relevant to this criterion. In the archival files at IPAR, samples of
530 men and 293 women were available, all of whom had taken the CPI and
all of whom had been described on the ACL by panels of 10 staff observers.
By summing the number of observers checking each adjective, a
descriptive score ranging from a possible minimum of 0 to a possible
maximum of 10 can be generated, and these sums can then be correlated
with the CT scale. When this was done in the combined sample of 823
assessees, far too many adjectives had coefficients significant at the .01
level to warrant citing them all. For this reason, only adjectives with
positive correlations of .30 or above, or negative correlations of -.23 or
below, were selected for review. A second requirement was that any
adjective cited be significantly (p ≤ .01) related to CT for each
sex considered alone.
Application of these rules led to the designation of the eight adjectives
most strongly descriptive of persons with high scores on CT, and the
eight adjectives most strongly descriptive of persons with low scores.
The eight adjectives associated with high scores and their correlations
for men and women, respectively, were: imaginative (.41, .39), curious
248 HARRISON G. GOUGH
(.40, .37), interests wide (.40, .37), original (.37, .34), resourceful (.34, .33),
versatile (.33, .36), clever (.32, .33), and complicated (.32, .30).
The eight adjectives most strongly descriptive of persons with lower
CT scores were: conservative (-.33, -.48), conventional (-.31, -.48), interests
narrow (-.29, -.40), simple (-.27, -.33), commonplace (-.28, -.27), dull (-.25,
-.29), stolid (-.23, -.31), and rigid (-.23, -.24).
A similar analysis was carried out in a sample of 236 couples, in which
each person (N = 472) was described on the ACL by a spouse or partner.
These couples had been studied in projects on population psychology
(Gough, 1973) and interpersonal dependency (Hirschfeld, Klerman, Gough,
Barrett, Korchin, & Chodoff, 1977). There were 201 couples from the
former project, and 35 from the latter. Dummy weights of 1-0 on each
adjective were correlated with CT scores for all 472 persons, and for the
236 men and 236 women separately. The reduced range on the descriptive
side (from 0-10 down to 0-1), plus the probable loss in validity in going
from a panel of 10 observers to a single observer, led to much lower
correlations in this sample. For indicative items, the rules for selection
were (1) a coefficient of .20 or greater in the full sample, and (2)
correlations of .17 or more for both men and women considered separately. For
contraindicative items, cutting points were set at -.13 or beyond for the
total sample, and at the same value for each sex considered separately.
The six most descriptive terms under these rules were: unconventional
(.20, .29), individualistic (.24, .17), imaginative (.30, .25), insightful (.21, .21),
adventurous (.19, .22), and reflective (.17, .26). The six adjectives most
strongly associated with low scores on CT were: conservative (-.25, -.30),
interests narrow (-.14, -.30), prejudiced (-.27, -.14), conventional (-.18, -.17),
silent (-.15, -.16), and organized (-.14, -.13).
Another kind of information from which inferences about the meaning
of the CT scale can be drawn is the trend in mean scores for various
samples. The current CT manual (Gough, 1987) presents such data for 39
male and 30 female samples. The male normative sample had a mean of
22.33 and a standard deviation of 5.92 on CT. Among the samples with
distinctly higher scores were psychologists (M = 30.23), research scien-
tists (M = 28.09), mathematicians (M = 27.79), social work graduate
students (M = 26.67), and medical students (M = 26.09). Among those with
distinctly lower means were correctional officers (M = 18.59), prison
inmates (M = 18.76), sales managers (M = 19.49), and police officers (M =
21.11).
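One way to express how distinct these group means are is to place each on a standard-score scale defined by the male normative sample (M = 22.33, SD = 5.92). The Python sketch below assumes a simple linear conversion with mean 50 and standard deviation 10; the chapter does not state which conversion the CT manual uses, and the function name is hypothetical.

```python
def standard_score(raw, norm_mean, norm_sd):
    """Linear standard score (mean 50, SD 10) relative to a normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Male CT norms quoted in the text
MALE_MEAN, MALE_SD = 22.33, 5.92
for label, m in [("psychologists", 30.23), ("correctional officers", 18.59)]:
    print(f"{label}: {standard_score(m, MALE_MEAN, MALE_SD):.1f}")
```

Under this assumption the male psychologists' mean falls about 1.3 SD above the norm (about 63) and the correctional officers' mean about 0.6 SD below it (about 44).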
The female normative sample had a mean of 22.02 and a standard
deviation of 5.72. Among the female samples with distinctly higher scores
were psychologists (M = 29.92), high-aptitude college students (M = 27.45),
social work graduate students (M = 27.00), medical students (M = 26.16),
and students of law (M = 30.15). Among the groups with distinctly lower
A Conceptual Synthesis
Many specific variables predictive of creativity in psychology have
been cited in Tables 1 through 9 above, but without any attempt to group
them into logical or functional categories. In Table 10, nine such func-
tional clusters are proposed, on the basis of rational analysis, and key
measures for each category are cited.
The first category refers to the ability to "educe relationships,"
employing Spearman's (1904) language for the essential capacity in
general intelligence. The analogy item, in particular if applied to field-
relevant content, seems to tap cognitive processes that are important for
creative work in psychology.
The second category is breadth of pertinent information. Those who
know more about the broad range of a field, including esoterica in its
nooks and crannies, are better equipped to do things that are new and
consequential.
A third cluster refers to independence of mind, and to independence
in interpersonal behavior. A fourth comprises esthetic interests and
orientation. A fifth cluster incorporates measures of openness to experi-
ence, liking for complexity, and ego differentiation. A sixth involves
psychological-mindedness and psychological interests. Adaptive flexibil-
ity constitutes a seventh cluster, and minimal interest in work stressing
details and record keeping or the enforcement of social norms defines an
eighth category. Finally, scales developed specifically to identify creative
potential, whether in psychology itself or in other fields of endeavor,
appear to assess qualities that can predict creative performance. It
should be pointed out that the clusters proposed in Table 10 for at-
tributes of creativity important in psychology are quite similar to clusters
proposed by others (for example, Barron, 1965) for creativity in general.
Table 10
Conceptual Grouping of Measures Showing Promise as Predictors of
Creative Potential in Psychology
1. Ability to educe relationships
a. Miller Analogies Test (.25, .25)
b. Minnesota Psycho-Analogies, Form A (.20, .21)
c. Minnesota Psycho-Analogies, Form B (.27, .24)
2. Range of psychological information
a. PVIT Experimental Psychology Subscale (.17, .22)
b. PVIT General Psychology Subscale (.18, .16)
c. PVIT Total Score (.18, .17)
3. Independence of judgment
a. Barron Independence Scale (.27, .26)
b. CPI Achievement via Independence Scale (.20, .34)
4. Esthetic propensity
a. Barron-Welsh Art Scale (.22, .25)
b. Welsh Revised Art Scale (.20, .21)
c. Strong Interest Inventory Artist Scale (.23, .13)
5. Personal complexity and differentiation
a. Barron Complexity/Simplicity Scale (.28, .20)
b. DRS Inquiringness Scale (.18, .17)
6. Psychological orientation
a. CPI Psychological-mindedness Scale (.15, .25)
b. Strong Interest Inventory Psychologist Scale (.22, .23)
7. Adaptive flexibility
a. CPI Flexibility Scale (.16, .21)
b. DRS Cognitive Flexibility Scale (.13, .24)
8. Minimal interest in occupations with strong norm-enforcing or detail-centered requirements
a. Strong Interest Inventory Accountant Scale (-.16, -.21)
b. Strong Interest Inventory Banker Scale (-.25, -.27)
c. Strong Interest Inventory Pharmacist Scale (-.17, -.20)
d. Strong Interest Inventory Police Officer Scale (-.22, -.18)
e. Strong Interest Inventory Purchasing Agent Scale (-.21, -.23)
9. Above average scores on scales developed to assess creative potential
a. ACL Creative Personality Scale (.17, .26)
b. Barron Originality Scale (.18, .24)
c. CPI Creative Temperament Scale (.25, .33)
d. DRS Total score (.20, .29)
Note: Correlations of each measure with criteria of creativity for males and
females, respectively, are given within the parentheses.
[Figure: CPI profiles of two cases plotted across the scales Do, Cs, Sy, Sp, Sa, In, Em, Re, So, Sc, Gi, Cm, Wb, To, Ac, Ai, Ie, Py, Fx, and F/M.]
Figure 4. Case 1 (solid line): Type and Level = Alpha 6, Creativity rating = 70, CT score = 70.
Case 2 (dashed line): Type and Level = Delta 6, Creativity rating = 70, CT score = 73.
Summary
A prospective study of 1,028 entering graduate students in psychol-
ogy (623 men, 405 women) was carried out, relating biographical data and
test measures gathered at the time of beginning the graduate program to
faculty ratings of creativity obtained from one to three years later. The
project began in the fall of 1950, and testing was continued each year up
through the fall of 1981.
In the biographical realm, the only variable related to the criterion of
creativity for both sexes was a prestige rating of the undergraduate
college attended, derived from the selectivity of each college's admissions
practices, with correlations of .19 for men and .20 for women. Graduate
students coming from more prestigious or selective colleges tended to
receive higher ratings on creativity from their graduate instructors than
did students coming from less prestigious schools.
In the realm of cognitive and aptitude tests, correlations with creativ-
ity were generally in the range from .20 to .25. Students scoring higher on
tests such as the Miller Analogies, Minnesota Psycho-Analogies, and the
College Vocabulary Test tended to receive higher ratings.
In the personality sphere, measures of psychopathology and malad-
justment, as assessed by the MMPI, were generally unrelated to the
creativity criterion, but measures directed explicitly to the assessment of
creativity or to hypothesized elements such as personal complexity and
esthetic awareness had correlations between .22 and .28. In the area of
vocational interests, the Psychologist scale on the Strong Interest Inventory
had correlations with creativity of .22 for men and .23 for women, whereas
the scale for Banker had corresponding correlations of -.25 and -.27. A new
scale, called CT for Creative Temperament, was derived by item analyses
of the CPI for these 1,028 students. It had correlations of .25 for males and
.33 for females with the creativity criterion, as would be anticipated given
the mode of its development. In seven cross-validating samples drawn
from other disciplines and occupational settings, CT had a median
coefficient with the creativity criteria of .46.
Certain lifestyles, as defined within a structural model of personality
generated from the CPI, were related to the ratings. Gammas, whose way
of living combines interactive involvement with others along with skep-
ticism concerning normative conventions, were most creative among the
four CPI types. Betas, whose way of living reflects a need for privacy and
distance from others with positive valuation of societal norms, ranked
lowest.
The analysis ended with two illustrative case vignettes of students
whose early promise was great, as indicated by test scores and biographical
data, and who also received very high ratings from faculty members. One
of these two students went on to achieve all that had been expected of him
in his professional career, but the other, encountering traumatic events
and ego-wounding experiences, more or less fell by the wayside. These
case illustrations serve as critical reminders that in prediction the
context and circumstances of the life situation must always be considered,
and that they set strict limits on what can be forecast from personological
data alone.
References
Amabile, T. M. (1983). The social psychology of creativity: A componential
conceptualization. Journal of Personality and Social Psychology, 45, 357-376.
Bachtold, L. K., & Werner, E. E. (1970). Creative psychologist: Gifted women.
American Psychologist, 25, 234-243.
Barron, F. (1953a). Complexity-simplicity as a personality dimension. Journal of
Abnormal and Social Psychology, 48, 163-172.
Barron, F. (1953b). Some personality correlates of independence of judgment.
Journal of Personality, 21, 287-297.
Barron, F. (1954). Personal soundness in university graduate students: An
experimental study of young men in the sciences and professions. University
of California Publications in Personality Assessment and Research, No. 1.
Barron, F. (1965). The psychology of creativity. In New directions in psychology
(Vol. 2, pp. 3-134). New York: Holt, Rinehart & Winston.
Barron, F., & Egan, D. (1968). Leaders and innovators in Irish management. Journal
of Management Studies, 5, 41-60.
Barron, F., & Welsh, G. S. (1952). Artistic perception as a factor in personality
style: Its measurement by a figure-preference test. Journal of Psychology, 33, 199-203.
Gough, H. G., & Sampson, H. (1954). The College Vocabulary Test, Forms A and B.
Berkeley, CA: University of California Institute of Personality Assessment and
Research.
Gough, H. G., & Woodworth, D. G. (1960). Stylistic variations among professional
research scientists. Journal of Psychology, 49, 87-98.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford Press.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune and
Stratton.
Hall, W. B., & MacKinnon, D. W. (1969). Personality inventory correlates of
creativity among architects. Journal of Applied Psychology, 53, 322-326.
Hansen, J. C., & Campbell, D. P. (1985). Manual for the SVIB-SCII (4th ed.). Stanford,
CA: Stanford University Press.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule
(Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Helson, R. (1967). Personality characteristics and developmental history of
creative college women. Genetic Psychology Monographs, 76, 205-256.
Helson, R. (1971). Women mathematicians and the creative personality. Journal
of Consulting and Clinical Psychology, 36, 210-220.
Helson, R. (1978). Creativity in women. In J. Sherman & F. Denmark (Eds.), The
psychology of women: Future directions in research (pp. 533-604). New York:
Psychological Dimensions.
Helson, R. (1988). The creative personality. In K. Gronhaug & G. Kaufmann (Eds.),
Innovation: A cross-disciplinary perspective (pp. 29-64). Oslo, Norway: Norwegian
University Press.
Helson, R., & Crutchfield, R. S. (1970). Mathematicians: The creative researcher
and the average Ph.D. Journal of Consulting and Clinical Psychology, 34, 250-257.
Hirschfeld, R. M. A., Klerman, G. L., Gough, H. G., Barrett, J., Korchin, S. J., &
Chodoff, P. (1977). A measure of interpersonal dependency. Journal of Personality
Assessment, 41, 610-618.
Isaaksen, S. G. (1988). Educational implications of creativity research: An updated
rationale for creative learning. In K. Gronhaug & G. Kaufmann (Eds.), Innovation:
A cross-disciplinary perspective (pp. 167-203). Oslo, Norway: Norwegian
University Press.
Karni, E. S., & Levin, J. (1972). The use of smallest space analysis in studying scale
structure: An application to the California Psychological Inventory. Journal of
Applied Psychology, 56, 341-346.
Kelly, E. L., & Fiske, D. W. (1951). The prediction of performance in clinical psychology.
Ann Arbor, MI: University of Michigan Press.
Kelly, E. L., Goldberg, L. R., Fiske, D. W., & Kilkowski, J. M. (1978). Twenty-five years
later: A follow-up study of the graduate students in clinical psychology assessed
in the VA Selection Research Project. American Psychologist, 33, 746-754.
Levine, A. S. (1950). Construction and use of verbal analogy items. Journal of
Applied Psychology, 24, 105-107.
Author Note
At its inception, the project reported here was supported from a general research
grant to the Institute of Personality Assessment and Research by the Rockefeller
Foundation. In the late 1950s and 1960s, support was given by two career research
grants that I received from the Ford Foundation. In the 1970s, aid came from gifts
from the Consulting Psychologists Press, and from intramural faculty research
grants. In the 1980s, a grant from the Spencer Foundation supported work on a
follow-up inquiry, and on a consolidation of the computer archival files. All of this
financial aid is gratefully acknowledged.
Many individuals gave invaluable and much appreciated assistance to the study
during the years of its existence since 1950. Not every person can be named, but
specific acknowledgment must be made of those key persons who designed and
took responsibility for the computer analyses and archival files, beginning with
Quintin Welch and continuing with Susan Hopkin, Daniel S. Weiss, Peter B. Lifton,
Kevin Lanning, and Pamela Bradley. Significant help in the choice and construction
of tests was furnished by Frank Barron, Ravenna Helson, Donald MacKinnon,
Harold Sampson, and George Welsh. At every stage of the program, including
management of the project during years when I was absent on sabbatical leave,
Wallace B. Hall played a vital role. I want to thank each of these individuals, as well
as all others who contributed to the study.