
Advances in Psychological Assessment
VOLUME 8
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.
Advances in Psychological Assessment
VOLUME 8
Edited by
James C. Rosen
University of Vermont
Burlington, Vermont

and
Paul McReynolds
University of Nevada-Reno
Reno, Nevada

SPRINGER SCIENCE+BUSINESS MEDIA, LLC


ISBN 978-1-4757-9103-7 ISBN 978-1-4757-9101-3 (eBook)
DOI 10.1007/978-1-4757-9101-3
© 1992 Springer Science+Business Media New York
Originally published by Plenum Press, New York in 1992
Softcover reprint of the hardcover 1st edition 1992

All rights reserved

No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher.
Contributors

THOMAS M. ACHENBACH, Professor, Department of Psychiatry, University of Vermont, Burlington, Vermont.

LEW BANK, Oregon Social Learning Center, Eugene, Oregon.

JAMES N. BUTCHER, Professor, Department of Psychology, University of Minnesota, Minneapolis, Minnesota.

ALAN E. FRUZZETTI, Doctoral Candidate, Department of Psychology, University of Washington, Seattle, Washington.

HARRISON G. GOUGH, Professor of Psychology, Emeritus, University of California, Berkeley, California.

ROBERT D. HARE, Professor, Department of Psychology, University of British Columbia, Vancouver, British Columbia, Canada.

TIMOTHY J. HARPUR, Assistant Professor, Department of Psychology, University of Illinois, Champaign, Illinois.

STEPHEN D. HART, Assistant Professor, Department of Psychology, Simon Fraser University, Burnaby, British Columbia, Canada.

RONALD R. HOLDEN, Department of Psychology, Queen's University, Kingston, Ontario, Canada.

DOUGLAS N. JACKSON, Senior Professor, Department of Psychology, The University of Western Ontario, London, Ontario, Canada.

NEIL S. JACOBSON, Professor, Department of Psychology, University of Washington, Seattle, Washington.


PAUL McREYNOLDS, Emeritus Professor of Psychology, University of Nevada-Reno, Reno, Nevada.

GERALD R. PATTERSON, Oregon Social Learning Center, Eugene, Oregon.

JAMES C. ROSEN, Professor and Director of the Clinical Psychology Program, Department of Psychology, University of Vermont, Burlington, Vermont.

ROBERT J. STERNBERG, Professor, Department of Psychology, Yale University, New Haven, Connecticut.

NATHAN C. WEED, Instructor, Department of Psychology, University of Minnesota, Minneapolis, Minnesota.

Preface
Preface

The present volume, like the earlier ones in this series, is designed to help keep the assessment psychologist abreast of significant new developments in the field. The series is addressed both to the practitioner and to the researcher, and it has also proved helpful to graduate students in psychology; this volume can usefully serve as a supplementary text for graduate classes in assessment. The chapters in the present volume, as in the preceding ones, are most directly relevant to workers in the areas of measurement, clinical psychology, and personality psychology, but some chapters will also be of use to specialists in other areas.
It has been the policy, in determining chapter topics and in soliciting authors for volumes in this series, to first survey the field of psychological assessment as a whole, and then to focus on those trends and techniques that are particularly innovative and are at the forefront of current development. Our basic aim is to highlight and articulate advances in the field. It is, of course, not sufficient, in order for a theme to merit inclusion in this series, that it be essentially new; equally important, the contribution to the field of assessment must be substantial. This criterion means that topics selected for inclusion in this series are rarely completely novel; rather, they represent relatively new developments which, however, have been sufficiently tested and utilized to make it evident that they truly are advances.
We feel that the eight chapters comprising the present volume, the eighth in the ongoing series, meet these criteria particularly well, and it is with considerable pleasure that we present them to the professional assessment community.
We take this opportunity to thank Eliot Werner at Plenum Press for his assistance and support in producing this book. Most of all, we express our gratitude to the 14 authors whose scholarly contributions have resulted in the excellence of this volume.
James C. Rosen
Paul McReynolds

Contents

Introduction
Paul McReynolds and James C. Rosen

CHAPTER 1
Metaphors of Mind Underlying the Testing of Intelligence
Robert J. Sternberg

CHAPTER 2
The Use of Structural Equation Modeling in Combining Data from Different Types of Assessment
Lew Bank and Gerald R. Patterson

CHAPTER 3
New Developments in Multiaxial Empirically Based Assessment of Child and Adolescent Psychopathology
Thomas M. Achenbach

CHAPTER 4
The Psychopathy Checklist-Revised (PCL-R): An Overview for Researchers and Clinicians
Stephen D. Hart, Robert D. Hare, and Timothy J. Harpur

CHAPTER 5
The MMPI-2: Development and Research Issues
Nathan C. Weed and James N. Butcher

CHAPTER 6
Assessing Psychopathology Using the Basic Personality Inventory: Rationale and Applications
Ronald R. Holden and Douglas N. Jackson

CHAPTER 7
Assessment of Couples
Alan E. Fruzzetti and Neil S. Jacobson


CHAPTER 8
Assessment of Creative Potential in Psychology and the Development of a Creative Temperament Scale for the CPI
Harrison G. Gough

Index


Introduction

Paul McReynolds and James C. Rosen

Our purpose in this Introduction is to offer some brief background
comments on the substantive chapters to follow. The general object of
these remarks is to place each chapter in its proper context within the
overall assessment scene, and in so doing to enhance the meaningfulness
of the chapters, both individually and as a group. In addition we will
provide a brief overview of the entire volume, focusing on the order of the
chapters, their relations to each other, and their import as a whole.
Though each chapter stands alone and focuses on a particular area
of advance in assessment, when taken as a group they constitute, we
believe, a representative, albeit incomplete, picture of contemporary
psychological assessment. And though the contributions can of course
be read in any order, the order in which we have arranged them is not
random, but rather reflects an underlying plan.
The first two chapters concern the conceptual bases of assessment. Of these, the opening chapter addresses the conceptual foundations of intelligence testing (historically the first area of assessment to gain prominence), and the second reviews and delineates a new mathematical approach to data analysis. The following four chapters, numbers 3 through 6, are all concerned with methods of assessment. The first of these four focuses on the evaluation of children and adolescents, and the remaining three are directed toward adult assessment. Also, the first two of the four, chapters 3 and 4, represent techniques in which the assessor systematically evaluates the individual being assessed, whereas the latter two chapters of this group, chapters 5 and 6, deal with self-report inventories. Chapter 7, the penultimate contribution, approaches assessment from a different perspective; rather than examining a specific method of assessment, it focuses on a particular object of assessment, in this case couples. The final chapter, number 8, has a still different rationale, in that it is concerned with the assessment of an important human potential, the capacity for creativity.

PAUL McREYNOLDS, Emeritus Professor of Psychology, University of Nevada-Reno, Reno, Nevada.
JAMES C. ROSEN, Professor and Director of the Clinical Psychology Program, Department of Psychology, University of Vermont, Burlington, Vermont.
This, then, is the overall plan of the volume. We turn now to a more
detailed introduction to each chapter, beginning with Chapter 1, on the
assessment of intelligence.
Intelligence testing continues as one of the most basic areas of assessment. Among the relatively recent technological developments in this area are revisions of the WAIS, WPPSI, and Stanford-Binet, publication of the Kaufman Assessment Battery for Children (K-ABC), and the third edition of the WISC. Chapter 1, however, is not concerned with particular intelligence tests, but rather takes a broader, more conceptual approach to the assessment of intelligence. Its author, Robert J. Sternberg, is an internationally renowned authority on the nature of intelligence, and a sharp critic of traditional intelligence assessment. In his contribution here he first provides a brief historical introduction to the measurement of intelligence, with an emphasis on Binet's work, and then offers a broad, overall schematization, built around the guiding concept of metaphor, of seven different approaches to conceptualizing and assessing intelligence. This scholarly and thought-provoking chapter not only effectively broadens the terms in which intelligence is typically conceived, but also provides an informed glimpse of what the intelligence tests of the future, perhaps the not too distant future, may well be like.
High quality assessment, particularly when utilized in research, is dependent not only on the availability of sound data-gathering instruments and procedures, but also on adequate methods of data analysis. This truism takes one into the realm of the statistical bases of assessment. Though such developments are perhaps less familiar to most assessment psychologists than new instruments are, there have indeed been major recent advances in data analysis. One of the most striking of these is the systematic procedure known as structural equation modeling. This modern, highly sophisticated method for dealing meaningfully with the combined effects of an assortment of variables is the topic of Chapter 2. The authors, Lew Bank and G. R. Patterson, are expert not only in the logic and mathematics of structural equation modeling, but have also applied the technique meaningfully in their own influential studies on children and parents at the Oregon Social Learning Center. Their contribution here includes illustrative examples from that research. These examples demonstrate that structural equation modeling, by its very nature, is closely tied to substantive psychological theory, and can assist in deciding which of several theoretical formulations is to be preferred.
Chapter 3, as noted earlier, is on the assessment of children and
adolescents. The author, Thomas M. Achenbach, is one of the most highly
respected and cited authorities in this area. He is responsible for the
development of several widely employed instruments, including the Child
Behavior Checklist (for parents) and the Teacher's Report Form, for
reporting specific child behaviors based on direct observation. In addi-
tion, he has contributed in a major way to the theoretical understanding
of child psychopathology, especially with respect to taxonomical issues.
In his contribution to this volume, Achenbach, rather than concentrating
solely on particular instruments, draws a broader picture. He proposes
an orientation that he terms "multiaxial empirically based assessment".
This approach involves integrating data from a variety of sources, includ-
ing behavior ratings, interviews, ability tests, and physical evaluations.
Further, a computer program is available for the integration of certain
data on individual children. Both in terms of the individual measures
discussed and the overall assessment model described, this chapter
offers a wealth of insights to the child assessor.
With Chapter 4 we turn to assessment in a major area of psychopathology: psychopathy. For many years the understanding of this disorder, sometimes also referred to as sociopathy or antisocial personality disorder, has constituted a major research problem. This situation was long exacerbated by the lack of an adequate instrument for identifying and describing the psychopath. In recent years, however, this lack has been largely remedied by Robert D. Hare and his associates through their development of the Psychopathy Checklist. This instrument, now in a revised and updated form, is without question a major advance, both for researchers and clinicians. In their contribution to this volume Stephen D. Hart, Hare, and Timothy J. Harpur provide an up-to-date overview of the Checklist,
beginning with an analysis of the concept of psychopathy and then
reviewing the psychometric properties, factor structure and range of
application of the instrument. Their chapter is properly viewed as both
a basic contribution to the understanding of psychopathy, and as an
examiner's guide for psychologists wishing to employ the Psychopathy
Checklist-Revised.
The MMPI has been with us for a half century; it was developed in the early 1940s and first published in 1943. In the decades that followed it gained an international reputation as the foremost broad-based psychological instrument for psychopathological assessment. Then in the latter 1980s the inventory, which in some respects was showing signs of age, was revised and restandardized under the leadership of a committee composed of James N. Butcher, W. Grant Dahlstrom, John R. Graham, Auke Tellegen, and Beverly Kaemmer. In 1989 the new version was published, as most readers of this Introduction are already aware, as the MMPI-2. Clearly, this was an event of the utmost importance in the field
of assessment. In the present volume Chapter 5, by Nathan C. Weed and
Butcher, is devoted to the new MMPI-2. The chapter summarizes the
rationale and development of the revised inventory, and reviews major
research trends. As such, the chapter will prove useful both to those
readers who systematically utilize the MMPI and those who merely wish
to keep themselves informed on a major advance in assessment.
Whereas Chapter 5 concerned a classic instrument in the assessment
of psychopathology, Chapter 6 describes a highly promising newer
instrument designed for the same general role. This is the Basic Personal-
ity Inventory (BPI), by Douglas N. Jackson. Readers not otherwise familiar
with Jackson's work will recognize him as the author of the well-known
Personality Research Form and the Jackson Personality Inventory, both
widely employed in the evaluation of normal functioning. In contrast, the Basic Personality Inventory, as already indicated, is explicitly addressed to the assessment of psychopathology. Shorter than the MMPI, and assessing somewhat different dimensions of psychopathology, it clearly has a number of attractive features. Though the test was constructed some time ago, the manual was published only in 1989, after considerable research had accumulated. The present chapter is by Ronald R. Holden, who has done extensive research on the inventory, and Jackson. It provides an excellent description of the BPI, sets forth the rationale underlying the test's construction and development, and reviews recent research on the test. All psychologists working in the area of psychopathology will find this chapter a useful resource in bringing them up to date on a valuable new addition to their wares.
In Chapter 7 the focus turns from concentration on a single instru-
ment to concern with a particular object of assessment, specifically, the
assessment of couples. Treatment of distressed couples is a major
clinical area, both for the practitioner and the researcher, and sound
assessment plays a crucial role in successful treatment. In this chapter
Alan E. Fruzzetti and Neil S. Jacobson, who are well-known authorities in this field, bring the reader up to date on recent developments in couples assessment. They succeed in incorporating a wealth of information in their wide-ranging chapter, including a conceptual analysis of the purposes and levels of couples assessment, and a survey of the major available assessment techniques that have direct clinical relevance. The authors strongly recommend a multi-level, multi-method approach to assessment when dealing with couples, rather than focusing on only one data source. They make this point very concretely by providing a model protocol for couples assessment that contains a variety of evaluative procedures. In closing, Fruzzetti and Jacobson draw on their own wide experience to offer a number of helpful suggestions for further research in couples assessment.
The eighth and final chapter differs notably from the preceding seven in one major respect: whereas the first seven provide detailed perspectives and reviews of given procedures or of particular areas of assessment, Chapter 8 presents extensive new empirical data. The topic addressed is the assessment of an important human dimension: creativity. Several decades ago research on the nature and assessment of creativity was one of the most active areas in personology; though the area has recently received less attention, it is no less important. The author of Chapter 8, Harrison G. Gough, is the creator of the now classic California Psychological Inventory (CPI). His chapter here consists of two highly significant research contributions. The first reports on a monumental ongoing study, begun in 1950 and involving over a thousand subjects and an extensive test battery, aimed at the identification of creative potential in psychology graduate students. The second part of Gough's chapter describes the development of a Creative Temperament (CT) scale for the CPI, and presents extensive normative and validity data. Both studies add greatly to the understanding of creativity, and the new CT scale provides an important new tool for investigators on this topic.
It is evident from these introductory remarks that the eight chapters to follow cover a wide range of recent advances in psychological assessment. Because of their diversity, quality, and topicality, the various contributions provide, we believe, important resources for the advanced student, the researcher, and the practitioner.
CHAPTER 1

Metaphors of Mind Underlying the Testing of Intelligence

Robert J. Sternberg

Theory and practice have maintained an uneasy alliance throughout
the history of intelligence testing. Contrary to popular belief, most tests
of intelligence have been at least loosely based on theories of intelligence.
The quality of the theories has varied, of course, as has the degree of
correspondence between the tests and the theories on which they are
based. Many of the problems of intelligence testing have been attributed
to the lack of theoretical basis in conventional intelligence testing (see,
e.g., Hunt, Frost, & Lunneborg, 1973). But I believe that the biggest
problem in the testing of intelligence has not been the weakness of the
linkage between theory and practice, but rather, the almost exclusive use
of a single metaphor of mind underlying the testing of intelligence, a
geographic metaphor that views intelligence tests as providing maps of
a part of the mind. In this chapter, I will argue that our testing of
intelligence has been and continues to be inadequate, in part because
tests have been only partially adequate operationalizations of the theo-
ries upon which they are based, but in greater part because the theories
upon which they are based have been derived from just one of the many
possible metaphors of mind. If we want to improve our tests, we need to
broaden them to take into account metaphors of mind other than the
geographic one.

ROBERT J. STERNBERG, Professor, Department of Psychology, Yale University, New Haven, Connecticut.
The chapter is divided into three major parts. In the first, I will
consider two main historical traditions in the testing of intelligence, both
in terms of the theories of intelligence upon which much later work was
based, and in terms of the tests that emanated from these theories.
In the second part of the chapter, I will consider alternative meta-
phors of mind and how they have influenced the testing of intelligence. In
the third part of the chapter, I will briefly summarize main points and
draw conclusions.

Historical Views of Intelligence


If current thinking about the nature of intelligence owes a debt to any
scholars, the debt is to Sir Francis Galton and to Alfred Binet. These two
investigators-Galton at the end of the nineteenth century and Binet at the
beginning of the twentieth century-have had a profound impact on our
thinking about intelligence, an impact that carries down to the present
day. Many present conflicts regarding the nature of intelligence can be
traced to conflicts between Galton and Binet. To understand contempo-
rary thinking about and measurement of intelligence, one needs to know
how Galton and Binet conceived of intelligence.

Sir Francis Galton's View of Intelligence


Galton's theory. Galton (1883) believed two general qualities distin-
guished the more from the less intellectually able. The first was energy, or
the capacity for labor. Galton believed that intellectually gifted individu-
als in a variety of fields were characterized by remarkable levels of
energy. His second general quality was sensitivity. Galton observed that
the more sensitive the senses are with respect to differences in lumines-
cence, pitch, odor, or whatever, the larger is the range of information on
which intelligence can act.
Galton's tests of intelligence. For seven years (1884-1890), Galton maintained an anthropometric laboratory at the South Kensington Museum in London where, for a small fee, visitors could have themselves measured on a variety of psychophysical tests. One such test was weight discrimination. The apparatus consisted of a number of cases of cartridges, filled with alternate layers of shot, wool, and wadding. The cases were all identical in appearance, and differed only in their weights. Subjects were tested by a sequencing task. They were given three cases and, with their eyes closed, had to arrange them in proper order of weight. The weights formed a geometric series of heaviness, and the examiner recorded the finest interval that an examinee could discriminate. Galton suggested that similar geometric sequences could be used for testing other senses, such as touch and taste. With touch, Galton proposed the use of wire-work of various degrees of fineness, whereas for taste, he proposed the use of stock bottles of salt solutions of various strengths. For olfaction, he suggested the use of bottles of attar of rose in various degrees of dilution.
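The logic of a geometric weight series, and of recording "the finest interval," can be illustrated with a short Python sketch. The base weight, the step ratios tried, and the notion of an examinee with a fixed discrimination threshold are hypothetical assumptions for illustration, not details of Galton's apparatus or procedure. In a geometric series each case is heavier than the one before by a constant ratio, so the finest interval can be expressed as the smallest ratio at which the examinee still orders three cases correctly.

```python
def weight_series(lightest, ratio, n):
    """Weights of n cases, each `ratio` times the previous one (a geometric series)."""
    return [lightest * ratio ** k for k in range(n)]


def can_order(cases, threshold):
    """Hypothetical examinee (an assumption, not Galton's model): arranges the
    cases correctly only if every adjacent pair, in true order, differs by a
    weight ratio greater than `threshold`."""
    ordered = sorted(cases)
    return all(heavier / lighter > threshold
               for lighter, heavier in zip(ordered, ordered[1:]))


def finest_interval(ratios, threshold, lightest=100.0):
    """Smallest step ratio at which three cases are still ordered correctly,
    or None if the examinee fails at every ratio tried."""
    passed = [r for r in ratios
              if can_order(weight_series(lightest, r, 3), threshold)]
    return min(passed) if passed else None


# Example: an examinee whose (assumed) threshold is a 3% weight difference.
print(finest_interval([1.20, 1.10, 1.05, 1.02, 1.01], threshold=1.03))  # prints 1.05
```

One plausible reason for preferring a geometric over an arithmetic series is that, as Weber's law suggests, weight discrimination depends roughly on the relative rather than the absolute difference, so equal ratios give steps of roughly equal difficulty; the text does not state Galton's own rationale.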
James McKeen Cattell brought many of Galton's ideas across the Atlantic to the United States. Cattell (1890) proposed a series of fifty psychophysical tests, such as the greatest possible pressure one could achieve by squeezing one's hand and the distance by which two points on the skin must be separated in order to be felt as separate points. Thus, Cattell took Galton's theory and refined the measures that could be used to assess intelligence based on it.
Psychophysical tests find little or no place in modern-day tests of intelligence as administered in schools and in industry. The coup de grâce was administered by a student in Cattell's own laboratory. Clark Wissler (1901) reasoned that, if they measured intelligence, Cattell's tests should correlate both with each other and with external criteria of academic success, such as grades in the undergraduate program at Columbia University. Wissler found a sample of Cattell's tests to correlate with each other only at the level that would be expected by chance. Moreover, he found that whereas students' grades in college correlated with each other at a very high level, they showed only trivial correlations with Cattell's mental tests, which were based on the work of Galton. Wissler concluded that the tests told us nothing about the general ability of college students, and indeed, he interpreted his results as casting doubt on the notion that there even exists such a thing as general intelligence.

Alfred Binet's View of Intelligence


Binet's theory. Binet and Simon (1916) theorized that intelligent
thought is composed of three distinct elements: direction, adaptation,
and criticism. Direction consists of knowing what has to be done and how
it is to be accomplished. When we are required to add two numbers, for
example, we give ourselves a series of instructions on how to proceed,
and these instructions form the direction of thought. The direction need
not be conscious. Adaptation refers to one's selection and monitoring of
one's strategy during the course of task performance. Retarded children,
according to Binet and Simon, show a lack of adaptive ability, which
manifests itself in part in terms of what these authors call n'importequisme
(no-matter-whatism). No-matter-whatism derives from a lack of critical
sense, a lack of differentiation in thinking, and an absence of persistence
of intellectual effort. Criticism, or control, is the ability to criticize one's
own thoughts and actions. Binet and Simon (1916) believed much of this
ability to be exercised beneath the conscious level. Mental defectives
show a lack of control. Their actions are frequently inappropriate to the
task at hand. For example, Binet and Simon reported on a retarded
individual who, when told to copy an "a," scribbled a formless mass and
smiled at it in a self-satisfied way.
The above formulation should make clear that, contrary to contemporary conventional wisdom, Binet was not atheoretical in his approach to intelligence and its development. On the contrary, he and Simon conceived of intelligence in ways that were theoretically sophisticated (more so than much of the work that followed theirs) and that resembled in content much of the most recent thinking regarding metacognitive information processing. Whatever may be the distinction
between the thinking of Binet and that of Galton, it is not that Galton was
theoretically motivated and that Binet was not. If anything, Binet had a
more well-developed theory of the nature of intelligence than did Galton.
Instead, the distinction was in the way these scientists selected items for
the tests that they proposed to measure intelligence, and in the degree of
correspondence between theory and test.
Binet's tests of intelligence. The Binet tests, as revised in the United States by Lewis Terman, consisted of a variety of exercises that measure higher order thinking abilities. For example, at the two-year-old level, children might be required to put circular, square, and triangular pieces into holes of appropriate shape on a board, or to identify parts of the body. At age eight, tests include defining words, recognizing why each of a set of statements is foolish, and saying how each of two objects is the same as, and different from, the other. By age fourteen, tests have become more complicated and include, besides vocabulary, tests such as reasoning, requiring solution of arithmetic word problems, and ingenuity, requiring individuals to indicate the series of steps that could be used to pour a given amount of water from one container to another.
Galton chose test items so that they would correspond quite closely
to his theory of intelligence in terms of psychophysical ability. Binet
chose items to measure the judgmental abilities upon which his theory
focused, but these items were also chosen so that they would differenti-
ate between the performance of children of different ages or mental
capacities as well as be intercorrelated with each other at a reasonable
level.

Binet's tests were not as closely allied to his theory as were Galton's.
Although the items do measure judgmental abilities, the judgments that
need to be made are somewhat artificial, and are generally removed from
real-world judgmental tasks. At best, they mimic simple school tasks, but
they capture little of the richness of the kinds of judgments we need to
make when we make important business or life decisions, such as
whether to buy a house or a car. Binet's theory was much closer in its
conception to real-world intelligence than was Galton's. His test items,
distant as they were from real-world problem solving and decision
making, were better predictors of these things than were Galton's test
items. But whereas Galton started a tradition of a close correspondence
between theory and test, Binet started a tradition of a much more modest
correspondence.

Modern Metaphors and Their Implications for Testing
In this section of the chapter, I consider various alternative modern
metaphors of mind, and their implications for the testing of intelligence.
I will consider seven metaphors: the geographic, the computational, the
biological, the epistemological, the anthropological, the sociological, and
the systems metaphors. Each views the mind in a different way, and has
somewhat different implications for how intelligence should be tested.
The important message in the consideration of these metaphors is
that intelligence tests are no more a unitary phenomenon than is intelli-
gence itself. Boring's (1923) view that intelligence is what the tests test,
an operational view that has been oft-repeated since Boring suggested it
(e.g., Jensen, 1969), is meaningless. The way in which intelligence is
tested will depend upon a theory of intelligence, and this theory will in
turn depend upon a metaphorical metatheory of mind. Boring's view of
intelligence being what the tests test was probably based on theories
generated by the geographic metaphor (i.e., a metaphor of intelligence as
a map of the mind). But there are alternative metaphors that can generate
theories and tests of intelligence, and none of these metaphors is privi-
leged with respect to the others in terms of any kind of validity for the
tests so generated. Most tests have been based upon the geographic
metaphor, but this is an historical accident rather than a logical or
psychological necessity. Any metaphor is capable of generating theories
and tests, and indeed, most have done so. Thus, the message of the
metaphorical approach is that our conceptualization of intelligence
needs to be broader than it has been, and more receptive to alternative
viewpoints. The alternative metaphors are not mutually exclusive, but
rather, in large part, complementary. They deal with different aspects of
intelligence, as seen from different points of view. Metaphors are not right
or wrong, but rather more or less useful for a particular purpose. A full
understanding of intelligence needs to embrace multiple metaphors.

Metaphors Viewing Intelligence as Internal to the Individual
The Geographic Metaphor
The geographic metaphor views a theory of intelligence as providing
a map of the mind. This view extends back at least to Gall, perhaps the
most famous of phrenologists. Gall implemented the metaphor of a map
in a literal way, investigating the topography of the head. The measure of
intelligence, according to Gall, resides in a person's pattern of cranial
bumps.
During the first half of the twentieth century, the metaphor of intelli-
gence as something to be mapped dominated theory and research.
However, the metaphor of the map became more abstract, and less literal,
than it had been for Gall. The psychologist studying intelligence was both
an explorer and a cartographer, seeking to chart the innermost regions of
the mind. Visual inspection and touching just would not do. The psy-
chologists needed tools such as factor analysis to understand the mind's
structure.
Geographic theories. Geographic theories of the mind are, for the
most part, factor based. They view intelligence in terms of factors, which
Vernon (1971) likened to lines of longitude and latitude. The factors
produce a space that comprises the various mental abilities proposed to
constitute intelligence.
The earliest of the major geographic theories, proposed by
Charles Spearman (1927), was a "two-factor" theory of intelligence.
The theory posits a general factor, common to all tasks requiring intelli-
gence, and one specific factor unique to each type of task. Godfrey
Thomson (1939) later criticized Spearman's theory, arguing that it was
possible to have a general factor in the absence of a general ability in the
head. To Thomson, g was a statistical reality but a psychological artifact
resulting from the workings of an extremely large number of what
Thomson called "bonds," all of which are sampled simultaneously in
intellectual tasks.
Later, Thurstone (1938) proposed the theory of primary mental
abilities, according to which intelligence comprises seven such abilities:
verbal comprehension, verbal fluency, number, memory, perceptual
METAPHORS OF MIND 7
speed, inductive reasoning, and spatial visualization. Thurstone was
antagonistic to Spearman's theory of g, because he believed that
Spearman's general factor was obtained only because Spearman failed to
rotate his factorial axes upon obtaining an initial solution. Thurstone
suggested that the general factor was merely an epiphenomenon of the
correlations among the primary mental abilities.
Some of the more recent theories of intelligence based on the geo-
graphic metaphor have been hierarchical. For example, Vernon (1971)
proposed a hierarchical theory, with g at the top of the hierarchy and two
major group factors, verbal-educational ability and spatial-mechanical
ability, below general ability. Cattell (1971) proposed a theory similar to
Vernon's, although his theory is much more detailed. The two abilities of
greatest interest for present purposes are what Cattell referred to as
crystallized and fluid abilities. Crystallized ability is essentially the accu-
mulation of knowledge and skills throughout the life course, whereas fluid
ability is flexibility of thought and ability to reason abstractly. Gustafsson
(1984) has used confirmatory factor analysis to study the model of Cattell,
and has come to the conclusion that the general factor of Spearman is
essentially identical to Cattell's fluid ability factor.
Not all recent theorists have taken the hierarchical approach. Guilford
(1967; Guilford & Hoepfner, 1971) posited 120 distinct abilities (increased
to 150 by Guilford, 1982), organized along three dimensions. These
dimensions are operations, products, and contents. As there are five
operations, six products, and four contents (at least in an earlier version
of the theory), there are 5 × 6 × 4 = 120 abilities in all.
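The count follows from treating every combination of one operation, one product, and one content as a separate ability. The sketch below enumerates those cells. The facet labels are the ones usually given for Guilford's model and do not appear in this chapter; the 150-cell variant assumes, as is commonly reported, that the increase came from splitting figural content into visual and auditory.

```python
from itertools import product

# Facet labels usually given for Guilford's earlier (5 x 6 x 4) model.
OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]
CONTENTS = ["figural", "symbolic", "semantic", "behavioral"]


def structure_of_intellect(operations, products, contents):
    """Return every (operation, product, content) cell; each is one ability."""
    return list(product(operations, products, contents))


abilities = structure_of_intellect(OPERATIONS, PRODUCTS, CONTENTS)
print(len(abilities))  # 120
print(abilities[0])    # ('cognition', 'units', 'figural')

# Assumed revision: figural content split into visual and auditory.
revised = ["visual", "auditory", "symbolic", "semantic", "behavioral"]
print(len(structure_of_intellect(OPERATIONS, PRODUCTS, revised)))  # 150
```

Because the model is a full crossing of its three dimensions, adding one category to any dimension multiplies rather than adds to the number of abilities.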
Yet a different organization of abilities was proposed by Guttman
(1954), whose radex, or radial representation of complexity model,
consists of two parts. The first part is what Guttman refers to as a simplex.
If one imagines a circle, then the simplex refers to the distance of a given
point (ability) from the center of the circle. The closer an ability is to the
center of the circle, the more central it is to human intelligence. Thus, g
could be viewed as being at the center of the circle, whereas less central
abilities, such as perceptual speed, would lie nearer to the
periphery of the circle. The second part of the radex is called a circumplex.
It refers to the angular orientation of a given ability with respect to the
circle. Thus, abilities are viewed as being arranged around the circle, with
abilities that are more highly related (correlated) nearer to each other in
the circle. Snow, Kyllonen, and Marshalek (1984) have used nonmetric
multidimensional scaling in order to demonstrate that the Thurstonian
primary mental abilities can actually be mapped into a radex.
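One way to make the two parts of the radex concrete is to give each ability polar coordinates: a radius for its simplex position and an angle for its circumplex position. The layout below is entirely hypothetical (no coordinates are reported here), and the names `RADEX`, `radex_distance`, and `by_centrality` are illustrative only.

```python
import math

# Hypothetical radex layout: (radius, angle in degrees).
# Radius (simplex): 0 is the center of the circle, 1 the periphery.
# Angle (circumplex): abilities at similar angles are highly correlated.
RADEX = {
    "general reasoning": (0.05, 0.0),
    "verbal analogies": (0.40, 30.0),
    "vocabulary": (0.80, 40.0),
    "spatial relations": (0.45, 200.0),
    "perceptual speed": (0.90, 120.0),
}


def to_cartesian(radius, angle_deg):
    theta = math.radians(angle_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


def radex_distance(a, b, layout=RADEX):
    """Straight-line distance; shorter distances stand for higher correlations."""
    xa, ya = to_cartesian(*layout[a])
    xb, yb = to_cartesian(*layout[b])
    return math.hypot(xa - xb, ya - yb)


def by_centrality(layout=RADEX):
    """Ability names ordered from the center of the circle outward."""
    return sorted(layout, key=lambda name: layout[name][0])


print(by_centrality()[0])  # general reasoning
print(radex_distance("verbal analogies", "vocabulary")
      < radex_distance("verbal analogies", "spatial relations"))  # True
```

An ability near the center is close to every other point, which is the geometric counterpart of a general ability that correlates with everything.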
Geographic tests. Intelligence testing has been dominated almost
exclusively by the geographic metaphor. Indeed, intelligence testing and
factor analysis, the main methodology through which the geographic
metaphor has been operationalized, grew up hand in hand. Not only is it
difficult, historically, to separate testing from the geographic metaphor,
but for many if not most of the people who do testing, there is no other way
with which they are familiar. The goal of testing, for these people, is to
obtain one or more scores corresponding to levels of ability with respect
to each of the regions of the mind posited by a given theory, whether it is
just a single region or multiple regions. Even those who do not subscribe
to the geographic metaphor often end up using tests generated under the
geographic metaphor as the criterion by which the validity of their tests
will be judged.
Although geographically-based tests bear some resemblance to each
other, the content of a given test will depend on the theory from which it is
derived. For example, tests based upon Spearman's (1927) notion of g
tend to measure what Spearman (1923) called "apprehension of experi-
ence," "eduction of relations," and "eduction of correlates." In more
modern terms, these might be called encoding of stimuli, inference of
relations, and application of relations. They tend to be measured by test
items such as analogies, number series, classifications, and matrix prob-
lems. Frequently, figural (abstract) content is used in order to minimize
effects of knowledge. Tests based on Thurstone's (1938) theory, including
Thurstone and Thurstone's (1962) own test of primary mental abilities,
measure skills relevant to each of the seven posited abilities, such as
vocabulary (verbal comprehension), mathematical computation in prob-
lem solving (number), and mental rotation (spatial visualization).
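To illustrate how a number-series item draws on Spearman's three principles, the sketch below splits solving into encoding the terms, inferring the relation between successive terms, and applying that relation to produce the next term. It is a toy solver under a strong assumption: only constant-difference and constant-ratio rules are considered, and the function names are hypothetical.

```python
from fractions import Fraction


def encode(item):
    """Apprehension of experience (encoding): read the series terms."""
    return [Fraction(t) for t in item.split()]


def infer_relation(terms):
    """Eduction of relations (inference): find the rule linking each term
    to the next. Only constant differences and constant ratios are tried."""
    diffs = {b - a for a, b in zip(terms, terms[1:])}
    if len(diffs) == 1:
        return ("add", diffs.pop())
    if all(t != 0 for t in terms):
        ratios = {b / a for a, b in zip(terms, terms[1:])}
        if len(ratios) == 1:
            return ("multiply", ratios.pop())
    raise ValueError("no simple rule found")


def apply_relation(last, relation):
    """Eduction of correlates (application): extend the rule one step."""
    kind, value = relation
    return last + value if kind == "add" else last * value


def solve_series(item):
    terms = encode(item)
    return apply_relation(terms[-1], infer_relation(terms))


print(solve_series("3 7 11 15"))  # 19
print(solve_series("2 6 18 54"))  # 162
```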
Consistent with the notions of Gustafsson (1984), tests based upon
Cattell's (1971) notion of fluid ability tend to look very similar to tests of
Spearman's g, including Cattell and Cattell's (1963) own test. Tests that
measure crystallized ability as well, such as the Stanford-Binet or Wechsler,
are likely to be more heavily loaded with items measuring vocabulary and
general information.
I would argue that there are good reasons for having tests based on
metaphors other than the geographic one.
First, it is not clear that the tests generated under the geographic
metaphor really should serve as the standard against which other tests
are measured. The fact that such tests have been in existence for a
number of years does not in itself make them a (or the) viable standard.
To the extent that external criteria have been used in the development of
conventional tests, the validation criteria have generally been perfor-
mance in school as measured by indices such as school grades and
teachers' evaluations. But clearly, these indices provide only one kind of
criterion against which to evaluate tests or theories. Certainly, school
grades are not an adequate measure of success in life. And after people
graduate from school, grades are of essentially no importance at all. One
METAPHORS OF MIND 9

could argue that the present tests are quite narrow in what they measure
(e.g., Gardner, 1983; Sternberg, 1985), so that if a new test did correlate
highly with the old ones, it would indicate that the new test is just as
narrow as the old ones. In effect, it is an historical accident that the
intelligence-testing business got its main start when Binet was asked to
distinguish groups of students in school. Had he or someone else been
asked to do the same for performance at work, or in some other domain,
the tests that resulted might have been quite different, and possibly
different in kind.
Second, new tests might provide an operational basis for expanding
our concept of intelligence. To the extent that the tests do not correlate
highly with conventional ones, we need at least to be open to the
possibility that the new tests are measuring an aspect of intelligence that
the old ones do not measure.
Third, new tests can give us kinds of information that are not yielded
by conventional psychometric tests, regardless of the correlation of the
new tests with the conventional ones. They can not only give us new
information but also help us conceive of individual people's intelligence
in new ways. Let us therefore consider other metaphors, and both the
theories and tests that they have generated.

The Computational Metaphor


During the last decade, the predominant metaphor for studying
intelligence has probably been that of the computer program. Research-
ers have sought to understand intelligence in terms of the information
processing that people do when they think intelligently. Information-
processing investigators have varied primarily in terms of the complexity
of the processes they have sought to study. They have taken rather widely
differing approaches.
Computational theorists tend to be highly critical of geographic
approaches and especially of the individual who is seen as the originator
of the geographic approach, Charles Spearman. It is therefore ironic, as
few computational theorists realize, that the computational approach to
intelligence, like the geographic one, dates back to none other than
Charles Spearman.
Computational theories. Spearman (1923) proposed what he be-
lieved to be three fundamental qualitative principles of cognition
(mentioned earlier). These principles (apprehension of experience,
eduction of relations, and eduction of correlates) are mental processes
that Spearman believed to be components of general intelligence. Programs
devised under the banner of artificial intelligence have used these
processes heavily, and in some cases have even been built around these
processes (e.g., Evans, 1968).
Historically, the greatest impetus for the computational approach to
understanding intelligence can be traced to the pioneering work of
Newell, Shaw, and Simon (1958) and others who constructed computer
programs that could perform "intelligently." These AI programs are
summarized in books such as Boden (1977) and Stillings et al. (1987), and
will not be reviewed here. Rather, I will concentrate on more recent
human-experimental work that has had more direct implications for the
testing of intelligence.
Hunt, Frost, and Lunneborg (1973) suggested that one way to under-
stand intelligence would be to test subjects in their ability to perform
tasks that contemporary cognitive psychologists believe measure basic
human information-processing ability. The proximal goal in this research
would be to estimate parameters (characteristic quantities) representing
the durations of performance for information-processing components
constituting each task, and then to investigate the extent to which these
components correlate across subjects with each other and with scores
on measures commonly believed to assess intelligence. Sternberg (1977)
later expanded upon this logic in his approach called "componential
analysis." The overall purpose of componential analysis is to identify the
component mental operations underlying a series of related
information-processing tasks and to discover the organization of these component
operations in terms of their relationships both to each other and to higher
order constellations of mental abilities. The fundamental unit of analysis
in componential analysis is the component, which is an elementary
information process that operates upon internal representations of ob-
jects or symbols (see also Newell & Simon, 1972). The components may
translate a sensory input into a conceptual representation, transform one
conceptual representation into another, or translate a conceptual repre-
sentation into a motor output. Componential analysis can be used to
decompose task performance into (a) the component processes that
combine to produce it and (b) the strategies into which these component
processes integrate themselves. Using componential analysis, the inves-
tigator can determine the length of time a given person spends on each
mental process and the susceptibility of the process to error.
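A minimal sketch of how component durations can be estimated: assume each item's response time is a constant plus the sum, over components, of how often the component runs on that item times its duration, and fit the durations by ordinary least squares. This is in the spirit of componential analysis rather than a reproduction of any published model; the item counts, the three component labels, and the "true" durations are hypothetical, and the data are generated without noise so the fit should recover them.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def estimate_component_times(counts, rts):
    """Ordinary least squares for rt = constant + sum_k count_k * duration_k.

    counts[i][k] is how many times component k is assumed to run on item i.
    Returns [constant, duration_1, duration_2, ...] in the units of rts.
    """
    design = [[1.0] + [float(c) for c in row] for row in counts]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, rts)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical analogy items: (encodings, inferences, applications) per item.
COUNTS = [(3, 1, 1), (4, 1, 1), (4, 2, 1), (5, 2, 2), (6, 3, 1), (3, 2, 2)]
TRUE_TIMES = (400.0, 150.0, 250.0, 100.0)  # constant, encode, infer, apply (ms)
RTS = [TRUE_TIMES[0] + sum(n * d for n, d in zip(row, TRUE_TIMES[1:])) for row in COUNTS]

estimates = estimate_component_times(COUNTS, RTS)
print([round(e, 1) for e in estimates])  # [400.0, 150.0, 250.0, 100.0]
```

With real latencies the fit would not be exact, and how well the model accounts for the data becomes evidence for or against the assumed components.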
Various information-processing theories of intelligence have been
proposed, based on the computational metaphor. Brown (1978; Brown &
Campione, 1978) has divided processes of cognition into two kinds:
metacognitive processes, which are executive skills used to control one's
information processing; and cognitive processes, which are nonexecutive
skills used to implement task strategies. She suggests that five
metacognitive processes are of particular importance: planning one's
next move and executing a strategy, monitoring the effectiveness of
individual steps in a strategy, testing one's strategy as one performs it,
revising one's strategy as the need arises, and evaluating one's strategy
in order to determine its effectiveness.
Sternberg (1980) distinguishes among three kinds of components:
metacomponents, which are similar to Brown's metacognitive processes;
performance components; and knowledge-acquisition components. Thus,
Brown's cognitive processes are subdivided into two categories in
Sternberg's theory. Whereas
performance components are lower order processes used in the execu-
tion of various strategies for task performance, knowledge-acquisition
components are processes involved in learning new information and
storing it in memory.
Hunt (1980) has distinguished between mechanical processes that
are relatively content-free and processes that are more
knowledge-based. Hunt has emphasized that mechanistic processes are, in
many cases, rather general across information-processing tasks.
Baron (1985) has proposed a theory of intelligence whereby rational
thinking is the cornerstone of intelligence. Baron defines intelligence as
the set of properties that make for effectiveness, regardless of the
environment a person is in. Although intelligence is based on rational
thinking, it goes beyond rational thinking in including personal endow-
ments such as capacities of knowledge that can lead to success.
Cognitive tests. Computationally-based tests, regardless of their
particular content, give information that is quite different in kind from the
information obtained from geographically-based tests. Typical informa-
tion yielded by such computational tests would be the processes used to
solve problems; the strategy or strategies into which these processes are
combined; the amount of time spent on each process, and, in some cases,
the susceptibility of each process to errors; and also, in some cases, the
form of representation of information in the mind (see, e.g., Hunt,
Lunneborg, & Lewis, 1975; Sternberg & Gardner, 1982). Geographically-
based tests do not give us these kinds of information, except perhaps in
a confounded fashion. Indeed, psychometric factors are often a combination
of process and content that makes it difficult to separate the effects
of the two.
The tasks that are used in the computational approach depend upon
the particular computational theory under consideration. Jensen (1982),
for example, emphasizes very low levels of information processing in
attempting to understand the computational bases of intelligence. He will
typically present subjects with a choice reaction-time task, which re-
quires subjects to select among various buttons upon the presentation of
the choice stimulus. Jensen is a believer in the notion that intelligence
largely derives from sheer mental speed, and his use of relatively simple
tasks reflects this theoretical disposition. Hunt (1978) uses somewhat
more complex tasks, believing that intelligence involves more complex
operations. For example, he relates verbal ability to the speed of retrieval
of lexical information from long-term memory. In order to measure this
speed, he uses tasks such as the Posner and Mitchell (1967) Letter-
Matching task, in which subjects are asked to state as quickly as possible
whether the letters in a pair such as "A a" constitute a physical match
(which they don't) or (in another condition) a name match (which they
do). As a measure of speed of memory scanning, Hunt uses the S.
Sternberg (1969) memory-scanning task, in which subjects are asked to
state as quickly as possible whether a target digit or letter, such as 5,
appeared in a previously memorized set of digits or letters, such as 3 6 5
2. Individuals are usually tested in these situations either via a tachisto-
scope (a machine that provides rapid stimulus exposures) or a computer
terminal, and the principal dependent measure of interest is response
time.
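Both tasks are usually scored by derived quantities rather than by raw response time. A common summary of letter matching is the name-identity minus physical-identity difference (NI - PI), taken as the extra time needed to retrieve a letter's name; in memory scanning, the slope of response time against memory-set size estimates the time to compare the probe with one more item. The sketch below computes both from hypothetical data for a single subject; the function names are illustrative.

```python
from statistics import mean


def name_minus_physical(ni_rts, pi_rts):
    """Mean name-identity RT (e.g. 'A a') minus mean physical-identity RT
    (e.g. 'A A'), in ms: an index of time to access a letter's name."""
    return mean(ni_rts) - mean(pi_rts)


def scanning_slope(set_sizes, rts):
    """Least-squares line RT = intercept + slope * set_size.

    The slope estimates the time (ms) to compare the probe with one more
    item in memory; the intercept absorbs encoding and response time."""
    mx, my = mean(set_sizes), mean(rts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, rts))
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data for one subject.
print(name_minus_physical([520, 540, 530], [450, 460, 455]))  # 75
slope, intercept = scanning_slope([1, 2, 4, 6], [438, 476, 552, 628])
print(round(slope, 1), round(intercept, 1))  # 38.0 400.0
```

Separating slope from intercept is what lets the approach attribute time to a specific process instead of to the task as a whole.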
Sternberg (1977; see also Sternberg, 1983) has used more complex
tasks to measure intelligence (see also Simon, 1976). Indeed, the tasks are
generally those that are found on conventional psychometric intelligence
tests, such as analogies, series completions, classifications, and syllo-
gisms of various kinds. In his information-processing work, Sternberg
emphasized the difference from the geographic approach not in the tests
used, but in the way in which test performance was analyzed. Thus, the
computational approaches have in common not any one particular
content that might appear on the tests, but rather their allegiance to
understanding the computational processes involved in intelligence:
how a person processes a problem requiring intelligence, from when she
first sees that problem to the time that she reaches a solution.

The Biological Metaphor


The biological metaphor provides a basis for understanding intelli-
gence by studying the brain and the operation of the central nervous
system. It is the most reductionist of the various metaphors, in that one
seeks understanding of intelligence directly in terms of biological func-
tion, rather than indirectly through molar levels of processing. However,
I will argue that the inferences that can be made from the biological
approach are often as indirect and even, at times, more indirect than are
the inferences that are made from alternative approaches. Although
adherents to the biological metaphor have in common their interest in
studying intelligence in terms of the brain and central nervous system,
they differ fairly widely in the approaches they take to studying intelligence.
Biological theories. Biological approaches are of three main kinds:
neuropsychological approaches, which seek to understand intelligence
in terms of the size and structure of the brain; electrophysiological
approaches, which seek understanding of intelligence in terms of
electroencephalographic measurement; and blood-flow approaches,
which seek to understand intelligence in terms of the flow of blood to
various portions of the brain during thinking. I will consider each of these
approaches briefly in turn.
Neuropsychological approaches to intelligence are often traced back
to Hippocrates, who in the 5th century B. C. suggested that the brain
might be the basis of human intelligence. In more recent times, one of the
earlier general theories of brain function was proposed by Halstead
(1951). He suggested four biologically-based abilities: the integrative field
factor (C), the abstraction factor (A), the power factor (P), and the
directional factor (D). More influential in recent times has been Hebb's
(1949) theory. Hebb proposed the concept of the cell assembly. Repeated
stimulation of specific receptors slowly leads to the formation of an
assembly of cells in the association area of the brain. These cells can act
briefly as a closed system after stimulation is stopped. Hebb assumes that
the process accompanying synaptic activity makes the synapse more
readily traversed. Any two cells or systems of cells repeatedly active at
the same time tend to become associated. If collections of cells become
associated, they form cell assemblies. An individual cell or other unit of
transmission may enter into more than one assembly at different times.
Moreover, over time, it may enter into new cell assemblies or drop out of
old ones. The cell assembly acts on an all-or-none basis. In other words,
it fires or it does not. Hebb uses the concept of a cell assembly to account
for many different psychological phenomena, among which is intelligence.
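Hebb stated his principle verbally; it is now commonly formalized as a weight change proportional to the product of two units' activities. The sketch below uses that later textbook formalization (not Hebb's own notation) together with all-or-none units to show how repeated co-activation builds an assembly that can be completed from a partial cue. The learning rate, threshold, and four-unit network are hypothetical.

```python
def hebbian_update(weights, activity, rate=0.1):
    """Increase the weight between every pair of distinct units that are
    active together: w[i][j] += rate * x[i] * x[j]."""
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activity[i] * activity[j]
    return weights


def fires(total_input, threshold):
    """All-or-none unit: fires fully (1) or not at all (0)."""
    return 1 if total_input >= threshold else 0


def complete_pattern(weights, cue, threshold=0.8):
    """Each unit stays on if cued, or switches on if input from cued units
    reaches threshold."""
    n = len(cue)
    return [max(cue[j], fires(sum(weights[i][j] * cue[i] for i in range(n)), threshold))
            for j in range(n)]


# Four hypothetical units; units 0-2 are repeatedly active together.
w = [[0.0] * 4 for _ in range(4)]
for _ in range(5):
    hebbian_update(w, [1, 1, 1, 0])
print(complete_pattern(w, [1, 1, 0, 0]))  # [1, 1, 1, 0]
```

Unit 3 never fired with the others, so it gains no connections and stays out of the assembly.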
Another theory that has had great impact on the field of intelligence
research and testing has been that of a Russian psychologist, Alexander
Luria (1973, 1980). Luria believed that the brain is a highly differentiated
system whose parts are responsible for different aspects of a unified
whole. In other words, separate cortical regions act together to produce
thought and action of various kinds. Luria (1980) suggested that the brain
comprises three main units. The first, a unit of arousal, includes the brain
stem and midbrain structures. Included within this first unit are the
medulla, reticular activating system, pons, thalamus, and hypothalamus.
The second unit of the brain is a sensory-input unit, which includes the
temporal, parietal, and occipital lobes. The third unit includes the frontal
cortex, which is involved in organization and planning.
Some of the most interesting theorizing within the neuropsychological
approach has been done by those who study hemispheric specialization.
This work goes back to Marc Dax, an obscure country doctor in France,
who in 1836 presented a little-noticed paper to a medical society meeting
in Montpellier. Dax had treated a number of patients suffering from a loss
of speech as a result of brain damage. He noticed the connection between
loss of speech (aphasia) and the side of the brain in which damage had
occurred. Indeed, having studied more than forty patients with aphasia,
Dax noticed that in every case there had been damage to the left
hemisphere of the brain. The paper aroused no interest. In more recent
times, the father of split-brain research has been Roger Sperry, who has
argued that each hemisphere behaves in many respects like a separate
brain (e.g., Sperry, 1961). It would be hard to overstate the contribution
of Sperry to modern split-brain research, especially because so many of
the people working in the area have been graduate students of Sperry or
have worked at one time or another in his laboratory.
Although there are arguments as to the exact functioning of each
hemisphere, it is generally agreed that visual and spatial functions are
primarily localized in the right hemisphere, and language functions in the
left. One of the most interesting experiments demonstrating this localiza-
tion was done by Jerre Levy and her colleagues (Levy, Trevarthen, &
Sperry, 1972). They showed that when split-brain patients (those with
their corpus callosum severed) are shown so-called chimeric faces (faces
that have one appearance in the left half and another appearance in the
right half), they are unaware that the information in the two halves of
the picture conflicts. When asked to respond vocally to what they see,
they choose the picture from the right-field half of the chimeric stimulus,
because the left hemisphere, which controls language processing, receives
input from the right visual field. But when asked to point, subjects choose
pictures from the left-field half, indicating right-hemispheric control of
their movement. In other words, the task that the subjects are asked to
perform is crucial in determining what face the subject will believe
himself or herself to have seen.
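The chain of inference in this experiment can be traced as two lookups: which hemisphere controls the requested response, and which visual half-field feeds that hemisphere. The sketch below encodes that reasoning under the simplifying assumptions stated in the text (speech controlled by the left hemisphere, pointing by the right); the table and function names are hypothetical.

```python
# Each visual half-field projects to the opposite (contralateral) hemisphere.
FIELD_TO_HEMISPHERE = {"left": "right", "right": "left"}

# Hemisphere assumed to control each response mode in the split-brain patient.
RESPONSE_CONTROL = {"speak": "left", "point": "right"}


def reported_half(response_mode):
    """Half of a chimeric face that the controlling hemisphere actually saw."""
    hemisphere = RESPONSE_CONTROL[response_mode]
    for field, receiving_hemisphere in FIELD_TO_HEMISPHERE.items():
        if receiving_hemisphere == hemisphere:
            return field
    raise ValueError(f"no visual field feeds the {hemisphere} hemisphere")


print(reported_half("speak"))  # right
print(reported_half("point"))  # left
```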
Gazzaniga (1985) has argued that the brain is organized modularly,
into relatively independently functioning units that work in parallel. His
view is in the spirit of present connectionist models of human
performance (McClelland & Rumelhart, 1986). There exist many discrete units
of the mind, each operating relatively independently of the others. Moreover,
many of these modules operate at a level that is not conscious. They
operate in parallel to our conscious thought, and contribute to conscious
processing in identifiable ways. In particular, the left hemisphere assigns
interpretations to the processing of these modules. Thus, the left hemi-
sphere may perceive the individual operating in a way that doesn't make
any particular sense or that is not particularly understandable, and its job
is to assign some meaning to that behavior. This view is also largely
consistent with that of Fodor (1983), who also believes that information
processing is largely determined by independent modules operating in
parallel.
A very different biological approach is the electrophysiological one.
Early studies tended to use electroencephalogram (EEG) measurements.
The idea was to relate patterns of EEG activity to intelligence or other
cognitive functions. For example, Galin and Ornstein (1972) showed a link
between amount of EEG activity in each of the hemispheres and the type
of tasks performed by a subject. In particular, they found that the ratio of
right- to left-hemisphere EEG activity was greater in verbal than in
spatial tasks. This pattern of results might seem to be the opposite of what
would be expected. However, Galin and Ornstein were measuring alpha
activity, which tends to be associated with the brain at rest. Therefore,
more alpha indicates less active processing by the hemisphere being
measured, so a higher right-to-left ratio signals relatively greater
left-hemisphere involvement, as one would expect in verbal tasks.
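The inversion is easy to miss, so a small sketch may help: compute the right-to-left alpha ratio for two tasks and read a higher ratio as relatively less right-hemisphere activity. The alpha values below are hypothetical and chosen only to reproduce the direction of the reported result.

```python
def alpha_ratio(right_alpha, left_alpha):
    """Right-to-left ratio of EEG alpha power."""
    return right_alpha / left_alpha


def relatively_more_left(task_a, task_b):
    """True if task_a has the higher right/left alpha ratio.

    Alpha marks a resting hemisphere, so a higher ratio means relatively
    more rest on the right and relatively more work on the left."""
    return alpha_ratio(*task_a) > alpha_ratio(*task_b)


# Hypothetical (right, left) alpha power for one subject.
verbal = (12.0, 8.0)   # ratio 1.5
spatial = (9.0, 10.0)  # ratio 0.9
print(relatively_more_left(verbal, spatial))  # True
```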
Some of the most interesting electrophysiological work has been
done by Emanuel Donchin and his colleagues at the University of Illinois.
Much of this work has utilized the P-300 wave form. The label P-300 refers
to a positive component of evoked potentials that has a latency of
anywhere from 300 to 900 milliseconds after the presentation of a
stimulus. P-300 has been linked to processes of stimulus identification
and classification (McCarthy & Donchin, 1981). The amplitude of P-300
seems to reflect the allocation of cognitive resources to a given task. P-300
seems to be stronger, the greater the amount of surprise that a subject
experiences as a result of the presentation of the stimulus.
Schafer (1982) has suggested that a tendency to show a large P-300
response to surprising stimuli may be an individual-differences variable.
Schafer believes that a functionally efficient brain commits fewer neurons
to process a stimulus that is familiar and more to process a stimulus that
is novel or unfamiliar. In other words, according to Schafer, more
intelligent individuals should show a greater P-300 response to novel
stimuli than would less intelligent ones. At the same time, more
intelligent individuals should show smaller P-300 responses to expected
stimuli than should less intelligent ones.
Hendrickson and Hendrickson (1980) have conducted a program of
theory and research attempting to link electrophysiological responses to
observed intelligence. Their measurements are obtained while the sub-
ject is at rest. Their basic theory suggests that errors can occur in the
passage of information through the cerebral cortex. These errors are
alleged to be responsible for variability in evoked potentials. Thus, it
would follow that individuals with normal circuitry that conveys informa-
tion accurately will form correct and accessible memories more quickly
than individuals with "noisy circuits," who will make errors in transmis-
sion. Moreover, the Hendricksons have suggested that individuals with
low IQs will have noisy channels of information processing. When their
evoked potentials are averaged across trials, the resulting waveforms will
have a smoother appearance (because of averaging over the noise) than
those produced by individuals with more consistent and less noisy channels.
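The smoothing argument can be illustrated by averaging simulated trials. The sketch assumes, purely for illustration, that "noise" takes the form of trial-to-trial latency jitter, and it measures complexity as the length of a string laid along the averaged trace, the kind of measure the Hendricksons are described as using. Waveform shape, amplitude, and lags are hypothetical.

```python
import math


def waveform(n_samples=100, cycles=4, amplitude=10.0, lag=0):
    """One hypothetical evoked-potential trial: a sinusoid delayed by `lag` samples."""
    return [amplitude * math.sin(2 * math.pi * cycles * (t - lag) / n_samples)
            for t in range(n_samples)]


def average(trials):
    """Point-by-point mean across trials."""
    return [sum(values) / len(values) for values in zip(*trials)]


def string_length(signal, dt=1.0):
    """Length of the curve traced by the signal (a 'string' laid along it)."""
    return sum(math.hypot(dt, b - a) for a, b in zip(signal, signal[1:]))


consistent = average([waveform(lag=0) for _ in range(4)])       # no jitter
noisy = average([waveform(lag=lag) for lag in (0, 3, 6, 9)])    # jittered
print(string_length(consistent) > string_length(noisy))  # True
```

Out-of-step peaks partly cancel in the average, so the jittered average has a smaller amplitude and a shorter string.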
The third of the biological approaches relates cerebral blood flow to
intelligence. The idea is that blood goes to portions of the brain that are
being used in the processing of a task. It is possible to use radioactive
tracers that are inhaled in order to monitor flow of blood during
information processing. Using this approach, one could monitor blood flow as a
function of the task being performed and who is performing it.
Biological tests. Biologically-based tests provide information that is
different in kind from either geographically-based or computationally-
based ones. Biological tests may indicate specific neuropsychological
deficits, patterns of hemispheric specialization, performances of differ-
ent regions of the brain, or in the case of evoked-potential measurement,
patterns of brain waves. The interpretability of this information, as in the
case of any test information, will depend upon the quality of the theory
upon which the test is based. But biologically-based tests are, for the most
part, the only ones that really map onto brain functioning, whether
directly or indirectly.
Halstead constructed a test of functioning based upon his theory, and
more recently, J. P. Das and Jack Naglieri have been working on a test
based on Luria's theory. Das and his colleagues have constructed an
impressive array of tests that measure all three aspects of functioning in
Luria's theory, namely, attention-arousal, planning, and mode of process-
ing. With respect to the last, Das's tests measure both simultaneous and
successive processing. Simultaneous processing refers to the parallel
processing of multiple chunks of information at a time. Successive or
sequential processing refers to serial processing of chunks of informa-
tion, one following the other. Tests such as Raven Matrices and Gestalt
Closure would measure simultaneous processing, whereas serial-recall
tests would measure successive processing.
Das and Naglieri are not the first to construct a test based on Luria's
theory. Two other such tests are the Luria-Nebraska Neuropsychological
Battery (LNNB) (Golden, 1981) and the Kaufman Assessment Battery for
Children (K-ABC) (Kaufman & Kaufman, 1983). This latter test does not
measure the attention-arousal and planning functions separately, but it
does measure simultaneous and successive processing and provides
separate scores for each. I have reviewed this test elsewhere in some
detail (Sternberg, 1985). The K-ABC also has a separate achievement
section, which is similar to what one would find on other tests of verbal
intelligence, such as the Stanford-Binet.
Measurement of evoked potentials has been especially popular among
adherents to the biological metaphor who are interested in testing. For
example, Schafer reported a correlation of .82 between an individual-
differences measure of evoked potential and IQ. In line with his theory, the
higher the IQ, the greater the difference in evoked potential amplitude
between expected and unexpected stimuli. This result suggests that more
intelligent individuals are more flexible in responding to novel stimuli
than are less intelligent individuals. The Hendricksons (1980) have pub-
lished pictures of what they reported to be typical wave forms for
subjects with high and low IQ. The wave forms for subjects with high IQs
are more complex than the wave forms for subjects with low IQs,
consistent with their theory. The Hendricksons used a string to measure
the length of the wave forms over a given period of time, on the view that
greater string length would reflect greater complexity in the wave form
and hence higher IQ. Somewhat oddly, given that the reliabilities of the
measures limit how high a correlation between them can be, they obtained
a correlation of .83 between an evoked-potential measure and scores on the
Wechsler Adult Intelligence Scale. A replication of this study has been
reported by Blinkhorn and Hendrickson
(1982), using Raven's Advanced Progressive Matrices as well as a variety
of verbal intelligence tests. In this study the correlation was .84, corrected
for restriction of range. These correlations are so high as to be troubling
to some, myself included.
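Three quantitative ideas in this paragraph (a "string length" for a wave form, the ceiling that reliability places on an observed correlation, and a correction for restriction of range) can be sketched as follows. This is a minimal illustration under stated assumptions: treating string length as the arc length of a sampled trace is an assumed reading of the Hendricksons' procedure, the correction shown is Thorndike's Case II formula (the text does not say which correction Blinkhorn and Hendrickson applied), and all numbers are hypothetical.

```python
import math


def string_length(samples, dt=1.0):
    """Arc length of a sampled wave form: the summed lengths of the straight
    segments joining successive points (an assumed stand-in for laying a
    string along the trace). dt is a hypothetical time step between samples."""
    return sum(math.hypot(dt, b - a) for a, b in zip(samples, samples[1:]))


def reliability_ceiling(rel_x, rel_y):
    """Classical test theory: the observed correlation between two measures
    cannot exceed the square root of the product of their reliabilities."""
    return math.sqrt(rel_x * rel_y)


def correct_range_restriction(r, sd_full, sd_restricted):
    """Thorndike's Case II correction for direct restriction of range:
    estimates the full-population correlation from r observed in a sample
    whose spread on the selection variable is only sd_restricted."""
    k = sd_full / sd_restricted
    return r * k / math.sqrt(1 - r ** 2 + (r * k) ** 2)


# Hypothetical values, chosen only to illustrate the arithmetic.
flat = [0, 0, 0, 0, 0]
jagged = [0, 1, 0, 1, 0]
print(string_length(flat), string_length(jagged))  # the jagged trace is longer
print(reliability_ceiling(0.7, 0.9))               # about .79, below .83
print(correct_range_restriction(0.5, 15, 10))      # about .65
```

With reliabilities of .7 and .9 (invented for the example), no observed correlation could reach .83, which is the sense in which such a figure is troubling.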
Blood flow has also been used in the measurement of intelligence (see
Horn, 1986). It is actually possible to relate blood flow in different regions
of the brain to tasks requiring crystallized versus fluid abilities. This
approach is more direct than the evoked-potentials approach, because it
is possible to determine in just what portion of the brain processing is
taking place.
The biological metaphors have generated a substantial amount of
research relating functioning of the brain to cognition, in general, and to
intelligence, in particular. The relation between biological measures and
psychometrically assessed intelligence is by no means straightforward.
For example, what does it mean to show a relationship between average
evoked potentials and psychometrically-derived intelligence test scores?
Do the evoked potentials somehow cause intelligent cognition? Or equally
plausibly, does intelligent cognition lead to certain patterns of evoked
potentials? Or do both intelligent information processing and evoked
potentials depend on some aspect of the brain, whether conceived
biologically or cognitively? As one reflects on the correlation, one is
reminded of the rule one learns in school that one cannot infer causation
from correlation.
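The last possibility, a shared underlying cause, is easy to demonstrate with a small simulation. The sketch below is purely hypothetical: a made-up latent variable z stands in for "some aspect of the brain," and the evoked-potential measure and the test score are each z plus independent noise. Neither causes the other, yet they correlate at about .5.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_common_cause(n=10_000, seed=0):
    """Both measures depend on a hypothetical latent z; neither affects the other."""
    rng = random.Random(seed)
    evoked, test_score = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # shared underlying property
        evoked.append(z + rng.gauss(0, 1))       # measure 1 = z + its own noise
        test_score.append(z + rng.gauss(0, 1))   # measure 2 = z + its own noise
    return pearson(evoked, test_score)


print(round(simulate_common_cause(), 2))  # close to 0.5 (= 1 / (1 + 1))
```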
Despite the striking magnitude of some of the correlations, we are a
long way from understanding the neural mechanisms that are actually
responsible for them. Indeed, it is possible that when given instructions
simply to sit and not perform any particular task, the more intelligent
subjects may be busy thinking about issues that are on their minds,
whereas the less intelligent subjects may be less likely to be thinking in
this way, or at all! This difference might generate a correlation. Indeed, the
evidence suggests that the correlation between evoked-potential mea-
sures and conventional intellectual measures goes down as subjects
perform more and more complex tasks. In other words, when subjects are
actually doing a complex task, one no longer so easily obtains a correlation
between average evoked potentials and conventional test scores. I believe
that biologically based measures have a bright future, but we need to
avoid jumping to premature, and often reductionist, conclusions. At the
present time, biological approaches provide another means for obtaining
dependent measures. They do not provide a means that is somehow
privileged with respect to every other kind of measure.

The Epistemological Metaphor
The epistemological metaphor draws very heavily upon philosophy,
and especially the philosophy of knowledge, for its conceptualization of
intelligence. Epistemological theorists, influenced heavily by Jean Piaget,
tend to focus on developmental issues.
Epistemological theory. Piaget (1952) thought that there were two
interrelated aspects of intelligence: its function and its structure. Piaget,
a biologist by training, saw the function of intelligence to be no different
from the function of other biological activities. That function is adapta-
tion, which includes assimilating the environment to one's own structures
(be they physiological or cognitive) and accommodating one's structures
to encompass new aspects of the environment.
Piaget further proposed that the internal organizational structures of
intelligence and how intelligence will be manifested differ with age. It is
obvious that an adult does not deal with the world in the same way as does
a neonate. For example, the infant typically acts on his or her environment
via sensory-motor structures and, thus, is limited to the apparent, physi-
cal world. The adult, on the other hand, is capable of abstract thought
and, thus, is free to explore the world of possibility. Much of Piaget's
research was a logical and philosophical exploration of how knowledge
structures might develop from primitive to sophisticated forms. Guided
by his interest in epistemology and his observations of children's behav-
ior, Piaget divided the intellectual development of the individual into
discrete, qualitative stages. As the child progresses from one stage to the
next, the cognitive structures of the preceding stage are reorganized and
extended to form the underlying structures of the equilibrium characterizing
the next stage (Piaget & Inhelder, 1969).
In Piaget's theory, there are four main stages. The first stage is the
sensory-motor one, which occupies birth through roughly two years of
age. The newborn baby exhibits only innate, preprogrammed reflexes,
such as grasping and sucking. Intelligence begins to exhibit itself as the
innate reflexes are refined and elaborated. The second stage is the
preoperational one, which takes place roughly from ages two through seven.
The child is now beginning to represent the world through symbols and
images, but the symbols and images are directly dependent upon the
immediate perception of the child. In the third stage, that of concrete
operations, the child becomes able to perform concrete mental operations.
In this stage, lasting approximately from age seven through age eleven,
the child can now think through sequences of actions or
events that previously had to be enacted physically. It is now possible for
the child to reverse the direction of thought. The child comes to under-
stand subtraction, for example, as the reverse of addition, and division as
the reverse of multiplication. The period is labeled one of "concrete
operations," because operations are performed on objects that are
physically present. In the last stage, that of formal operations, the child
learns to think abstractly and hypothetically, not just concretely. In this
stage, which begins to evolve at around the age of eleven, the individual
can view a problem from multiple points of view, and can think much
more systematically than in the past.
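The approximate age ranges given above amount to a simple lookup table. The sketch below is only illustrative: the half-open cut-offs at 2, 7, and 11 years are a simplifying assumption, and in Piaget's theory a stage is defined by the structure of a child's thinking rather than by age, so the function returns only the stage typical of a given age.

```python
from bisect import bisect_right

# Lower age bounds (in years) of Piaget's four stages, as approximated in the
# text. Treating them as sharp, half-open cut-offs is a simplification.
STAGE_STARTS = [0, 2, 7, 11]
STAGES = [
    "sensory-motor",
    "preoperational",
    "concrete operations",
    "formal operations",
]


def typical_stage(age_years: float) -> str:
    """Return the stage typical of a given age (not a diagnosis of any child)."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many stage start ages are <= age_years.
    return STAGES[bisect_right(STAGE_STARTS, age_years) - 1]


for age in (1, 4, 9, 15):
    print(age, typical_stage(age))
```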
Other, neo-Piagetian theorists have built on the ideas of Piaget while
at the same time not accepting all of his assumptions and contentions. For
example, Case (1984, 1985), like Piaget, believes that cognitive develop-
ment proceeds through four general stages that take place between age
one month and adulthood. Case's stages are not exactly the same as
Piaget's, however. In his recent work, Case has been particularly con-
cerned with the control structures for thought, that is, how problem
situations are represented and how these representations are acted
upon. In his earlier work, Case drew heavily on Pascual-Leone's (1970)
concept of the M-space for understanding the amount of mental processing
space that can be allocated to various cognitive tasks at different ages.
Another of the neo-Piagetian theories is that of Fischer (1980; Fischer
& Pipp, 1984). Fischer assumes that development can be understood
primarily in terms of two key concepts, which provide instantiations for
the notions of competence and performance as they have been discussed
by others. The first concept is that of optimal level, which specifies the
upper limit on the complexity of skill that an individual can bring to bear
upon a problem. The second concept is that of skill, which appears to be
a set of processes that can be brought to bear upon problems. Skills differ
in complexity, and indeed, the complexity of skills that can be brought to
bear upon a problem is a key source of development in Fischer's theory.
Other neo-Piagetian theories include those of Siegler (1981, 1984,
1987) and Pascual-Leone (1970, 1987).
Epistemological tests. The epistemological metaphor has lent itself
to a number of assessment devices, most of them based upon the thinking
of Piaget. Indeed, Tuddenham (1970) constructed a Piagetian test of
cognitive development that is roughly analogous to an intelligence test,
except that it is based upon the kinds of tasks Piaget used in his research.
What are some examples of such tasks?
Some of the best-known tasks are those used to measure
conservation, a hallmark of concrete-operational functioning. In one task,
children are shown a jar of liquid. The liquid is poured from this first jar into
another jar, which is taller and narrower. The child is then asked
whether the second jar contains more liquid than the first jar, less liquid,
or the same amount. A conserver would recognize that the amount of
liquid has not changed. A non-conserver would typically label the second
jar as containing more liquid, because the liquid in the second jar reaches
a higher level. Conservation of liquid is only one of several kinds of
conservations. For example, clay might be molded into different shapes,
and children asked whether the amounts of clay in each shape are the
same. Children who do not conserve view the different shapes as contain-
ing different amounts of the clay substance.
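The physical fact that a conserver grasps can be checked with a little arithmetic. Assuming, for illustration, that both jars are cylinders (the text does not specify their shape), the same volume stands higher in a narrower jar: halving the radius quadruples the height, although the amount of liquid is unchanged.

```python
import math


def liquid_height(volume_cm3: float, radius_cm: float) -> float:
    """Height of a given volume of liquid in a cylindrical jar: V / (pi * r^2)."""
    return volume_cm3 / (math.pi * radius_cm ** 2)


volume = 500.0                          # hypothetical amount of liquid, cm^3
wide = liquid_height(volume, 5.0)       # first jar, radius 5 cm
narrow = liquid_height(volume, 2.5)     # taller, narrower jar, radius 2.5 cm
print(f"wide jar: {wide:.1f} cm, narrow jar: {narrow:.1f} cm")
print(round(narrow / wide, 6))          # 4.0: the level rises, the volume does not
```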
Another task that has been extensively used, especially by Siegler
(1978), is the balance-beam task. In this task, children are presented with
a balance beam that contains differing numbers of weights at differing
distances from the center of the beam. The children must indicate
whether the beam will balance, or whether instead one side or the other
will be higher. Siegler has done an elegant and elaborate analysis of
different stages in the development of information processing in the
performance of the balance-beam task. Yet another related task is that of
class inclusion. Children might be shown three green marbles and eight
blue marbles, and then be asked whether there are more blue marbles or
more marbles. The idea is to assess whether the children are concrete-
operational in the sense of being able to recognize that blue marbles are
a subset of marbles as a whole.
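The balance-beam and class-inclusion tasks both have simple normative answers, which the sketch below computes. The physically correct rule compares the products of weight and distance (torques) on the two sides. The weight-only rule is included as a hypothetical simplification, in the spirit of the rule sequences Siegler described, to show how a child who ignores distance can err on "conflict" problems. Function names and example problems are illustrative, not taken from Siegler's materials.

```python
def torque_rule(left_w, left_d, right_w, right_d):
    """Normative answer: the side with the larger weight x distance goes down."""
    left, right = left_w * left_d, right_w * right_d
    if left == right:
        return "balance"
    return "left down" if left > right else "right down"


def weight_only_rule(left_w, left_d, right_w, right_d):
    """Hypothetical simpler rule: compare numbers of weights, ignore distance."""
    if left_w == right_w:
        return "balance"
    return "left down" if left_w > right_w else "right down"


def class_inclusion(blue, green):
    """Are there more blue marbles or more marbles? Blue is a subset of all."""
    total = blue + green
    return "more marbles" if total > blue else "same number"


# A conflict problem: more weights on the left, but farther out on the right.
print(torque_rule(3, 2, 2, 3))        # balance (3*2 == 2*3)
print(weight_only_rule(3, 2, 2, 3))   # left down: the weight-only rule errs
print(class_inclusion(8, 3))          # more marbles (11 > 8)
```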
This last task highlights one of the difficulties with Piagetian measure-
ments. At least in some cases, children may not understand the task in the
same way that examiners do. For example, it may seem odd to be asked
whether there are more blue marbles or more marbles, and children may
reinterpret the question to mean: Are there more blue marbles or green
marbles? They would then answer this second question appropriately,
but be marked as incorrect. Of course, this problem of misinterpretation
applies to other approaches as well.
An example of a task used to measure formal operations would be the
permutations task. Examinees might be given four or more objects or sets
of numbers, and be asked to generate all possible permutations of these
objects (or numbers). The question here is whether they go about
performing this task in a systematic way. Formal-operational children will
generate all possible permutations by systematically varying the posi-
tions of items in the array.
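A systematic strategy of the kind the task looks for can be written as a short recursive procedure: place each object in the first position in turn, and for each choice, systematically arrange the remaining objects. The sketch below is one such strategy, offered for illustration rather than as a scoring procedure from the Piagetian literature; Python's standard library provides the same enumeration as itertools.permutations.

```python
def systematic_permutations(items):
    """Generate every ordering of items exactly once by fixing each item in the
    first position in turn and recursively permuting the rest."""
    if len(items) <= 1:
        return [list(items)]
    orderings = []
    for i, first in enumerate(items):
        rest = items[:i] + items[i + 1:]
        for tail in systematic_permutations(rest):
            orderings.append([first] + tail)
    return orderings


perms = systematic_permutations(["A", "B", "C", "D"])
print(len(perms))   # 24 = 4 x 3 x 2 x 1
print(perms[:3])    # [['A', 'B', 'C', 'D'], ['A', 'B', 'D', 'C'], ['A', 'C', 'B', 'D']]
```

Because every first position is tried in order and the rest is handled the same way, no arrangement is skipped or repeated, which is what distinguishes a systematic from a haphazard search.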
The epistemological approach provides a fourth kind of theoretical
framework for assessing intelligence. The scoring of epistemologically-
based tests is theory-based, and relates to developmental stages in a
child's mental growth. Note that one is asking different kinds of questions
about mental functioning than would be asked in typical tests. Here one
is attempting to assess stages of the development of schemas for organiz-
ing the world, rather than seeking to assess how one child compares to
another of the same age. As has been true with each approach, the
epistemological approach provides a unique perspective on the
individual's intellect, with this perspective determining both the kinds of
questions one would ask and the way in which responses to the questions
would be scored.

Metaphors Viewing Intelligence as External to the Individual
The Anthropological Metaphor
The basic idea of the anthropological metaphor is what Irvine and
Berry (1988) refer to as the law of cultural differentiation. It is based on a
statement by Ferguson (1954): "Cultural factors prescribe what shall be
learned and at what age; consequently different cultural environments
lead to the development of different patterns of ability" (p. 121).
Anthropological theories. Anthropologically-oriented psychologists
differ in the extent to which they believe that culture affects the nature of
intelligence. There are four positions of varying degrees of extremity,
each of which will be considered in turn.
The most extreme position, radical cultural relativism (Berry, 1974),
entails the rejection of assumed psychological universals across cultural
systems, and requires the generation from within each cultural system of
any behavior concept that is to be applied to it. Specifically, for the
concept of intelligence, this position requires that indigenous notions of
cognitive competence be the sole basis for the generation of cross-culturally
valid descriptions and assessments of cognitive capacity.
In this approach, it is essential to understand how context shapes
intelligence. Berry and Irvine (1986) have described four levels of context
that can affect intelligence and the way it is evaluated. At the highest level
is ecological context. This kind of context comprises all of the permanent
or almost permanent characteristics that provide the backdrop of human
action. It is the natural cultural habitat in which a person lives. A
second kind of context is the experiential context, or the pattern of
recurrent experiences within the ecological context that provides the
basis for learning and development. When cross-cultural psychologists
try to determine independent variables that affect behavior in a particu-
lar habitat, they are usually dealing with the level of experiential context.
A third kind of context is the performance context, which is itself nested
under the two kinds described above. This context comprises the limited
set of environmental circumstances that account for particular behaviors
at specific points in space and time. Finally, nested under the above three
levels of context is the experimental context. This context comprises
environmental characteristics manipulated by psychologists and others
to elicit particular responses or test scores. Although this context should
be nested within the three described above, it often is not, in which case
the experimental context will not represent appropriately the conditions
under which a given set of people lives.
A less extreme form of anthropological theorizing is what might be
called conditional comparativism. Those who adhere to this point of view
believe it is possible to do some kind of conditional comparison in which
an investigator sees how different cultures organize experience to deal
with a single activity such as writing, reading, or computing. This com-
parison is possible, however, only if the investigator is in a position to
assert that performance of a task or tasks under investigation is an
achievement that is attained in every culture being compared.
This is the view taken by, among others, Michael Cole and his
colleagues in the Laboratory of Comparative Human Cognition (1982).
Cole and his colleagues assert that the radical cultural-relativists' posi-
tion does not take into account the fact that cultures interact. They
assume that learning is context-specific, and that context-specific intellectual
achievements are the primary basis for intellectual development.
They state specifically that they do not deny the existence of any
intercontextual generality of behavior. But they further state that such
intercontextual generality is a secondary phenomenon, and one in which
the cultural organization of experience plays a major role. The idea in this
view is that each experience within a cultural context can be linked to a
specific task performance. There is no central process or general ability
intervening between experiences and behavior. Learning is viewed as
primarily event- or context-specific.
A still less extreme position is what might be referred to as a dualistic
one. Dualistic positions generally do not rely exclusively on the anthropological
metaphor. Theorists such as Keating (1984), Jenkins (1979),
Baltes, Dittmann-Kohli, and Dixon (1984), and Charlesworth (1979) have
in one way or another attempted to incorporate both cognitive and
contextual elements into their models of intelligence.
In his ethological approach to studying intelligence, Charlesworth
(1979) has focused on what he refers to as the "other part" of intelligence
(intelligent behavior as it occurs in everyday life rather than in test
situations) and how these situations may be related to developmental
changes. Keating (1984), in his research, has investigated how intelli-
gence can be studied through a number of cognitive-psychological
paradigms, but more recently has suggested that these paradigms are
more or less vacuous when it comes to understanding how cognition
interacts with culture. Baltes (e.g., Dixon & Baltes, 1986) argues that it is
necessary to look at both the mechanics and the pragmatics of intelli-
gence. For example, his studies of wisdom suggest that wisdom cannot be
understood outside the context of the environment in which one
develops.
At the opposite extreme from radical cultural relativism is universal-
ism. Its primary tenet is that there are significant commonalities in the
nature of intelligence and of the mind in general across cultures. For
example, Levi-Strauss (1966) argued that there are no differences be-
tween how the mind works in one culture and how it works in another
culture, or even in how it works from one time to another. Primitive and
Western systems of thinking merely represent different ways by which
people try to understand nature and make it susceptible to rational
inquiry. According to Levi-Strauss, cultures do not differ in their levels of
mental development. All seek knowledge about the universe, and seek to
order and systematize. What differs across cultures is the content of
thought. Primitive systems of classification are more likely to be based on
attributes that are readily perceived or otherwise experienced. Modern
scientific classification systems rely on attributes that are inferred from
relations in the structures of objects. Thus, structurally there are no
differences among cultures. What differs is content.
Anthropological tests. Adherents to the anthropological metaphor
tend to eschew tests in the traditional sense. Rather, they try to devise
cognitive tasks that are culturally relevant. Their attempts at cultural
relevance go beyond the naive attempts of many psychometricians and
their so-called "culture-fair tests" (Cattell & Cattell, 1963), which are often
more culturally loaded than the tests they are designed to replace. An
example of an investigator using the anthropological approach is Berry
(1974). Berry conducted a study of ten subsistence-level groups to test
the hypothesis that people in a hunting culture should possess good
visual discrimination and spatial skills. Their cultures are expected to
support the development of such skills through the presence of many
geometric and spatial concepts. To test this hypothesis, Berry ranked
cultural groups according to the importance of hunting to their existence,
and compared these rankings with test scores for perceptual discrimina-
tion and other related skills. For example, he used the embedded-figures
test, often used as a measure of psychological differentiation. He found,
as predicted, that the more central the role of hunting to a culture, the
better the psychological test scores.
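Berry's comparison of rankings with test scores is, in effect, a rank-order correlation. The sketch below computes Spearman's rho with the textbook formula, which assumes no tied values; the five groups and their scores are invented for illustration and are not Berry's data.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical: five groups ranked by importance of hunting (1 = least),
# and their hypothetical mean scores on a perceptual test.
hunting_rank = [1, 2, 3, 4, 5]
mean_score = [10.0, 12.0, 11.0, 15.0, 18.0]
print(spearman_rho(hunting_rank, mean_score))  # 0.9: strong positive relation
```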
Anthropologically oriented psychologists often suggest that the dif-
ferences in performance across cultures are based on enculturation
practices rather than on any internally-based "intelligence." For example,
Super (1976) found evidence that African infants sit and walk earlier than
do their counterparts in the United States and Europe. But Super also
found that mothers in the cultures he studied made a self-conscious effort
to teach babies to sit and walk as early as possible. Other motor behaviors
were not more advanced. For example, infants found to sit and walk early
were actually found to crawl later than did infants in the United States.
In another application of this kind of logic, Serpell (1979) tested a notion
of McFie (1961) that lack of toys and construction games encouraging
accurate standards of orientation and imitation might lead to inferior
perceptual abilities on the part of African infants. Serpell designed a study
to distinguish between a generalized perceptual-deficit hypothesis and a
more context-specific hypothesis. He selected four perceptual tasks that,
by a general-process interpretation, should result in lower performance
for Zambian than for English children. But he suggested that performance
would depend on enculturation practices. Serpell hypothesized that
whereas English children would have more experience with two-dimen-
sional representations of pen-and-paper tasks, Zambian children would
have more practice molding wire into two-dimensional objects. Serpell
therefore predicted that English children would score higher on a pen-
and-paper task, but not as high on a wire-shaping task as Zambian
children. Serpell's data supported his prediction.
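The logic of Serpell's design is that the two hypotheses predict different patterns across tasks: a generalized perceptual deficit predicts lower Zambian scores on every task, whereas the context-specific hypothesis predicts a crossover. The hypothetical helper below simply classifies the direction of group differences in a table of means; the scores are invented, and this is an illustration of the reasoning, not Serpell's analysis.

```python
def classify_pattern(means):
    """means maps (group, task) -> mean score. Returns which hypothesis the
    direction of the group differences fits (hypothetical helper)."""
    paper_gap = means[("English", "pen-and-paper")] - means[("Zambian", "pen-and-paper")]
    wire_gap = means[("English", "wire")] - means[("Zambian", "wire")]
    if paper_gap > 0 and wire_gap < 0:
        return "crossover: fits the context-specific hypothesis"
    if paper_gap > 0 and wire_gap > 0:
        return "English higher on both: fits a general deficit"
    return "pattern fits neither prediction"


# Invented means for illustration only.
example = {
    ("English", "pen-and-paper"): 8.0, ("Zambian", "pen-and-paper"): 5.0,
    ("English", "wire"): 4.0, ("Zambian", "wire"): 7.0,
}
print(classify_pattern(example))
```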
Another study showing the effects of the kind of training one receives
on how one performs was done by Greenfield (Bruner, Olver, & Greenfield,
1966). Greenfield and her colleagues studied children of the Wolof Tribe
in rural Senegal. Children received sets of pictures mounted on cards. The
cards were designed so that within each of the sets, a child could form
pairs based on various attributes-color, form, or function. The child was
first asked to show the investigator which two pictures in a given
set were most alike. The child was then asked why they were most alike.
Subjects were selected from children in three populations: Bush children
who had not attended school, children in school from the same town as
the Bush children, and school children living in Dakar, the capital of
Senegal. Greenfield found that children who had attended school, regard-
less of where, performed much as American children did. Preference for
color decreased sharply with grade, whereas preference for form and
function increased. Moreover, an increasing proportion of older children
justified their classifications in terms of superordinate categories. Children
who had not attended school and lived in the Bush responded quite
differently. They showed a greater preference for color with increasing
age, and rarely justified responses in terms of superordinate language
structure.
Even when the objects to be dealt with are familiar, the way they are
typically used or thought about may affect how people perform
with them. When Cole and his colleagues (Cole, Gay, Glick, & Sharp, 1971)
asked adult Kpelle tribesmen to sort twenty familiar objects into "groups
of things that belong together," the subjects separated the objects into
functional groups (a knife with an orange, for example), as children in
Western societies do. The researchers had expected to see taxonomic
groupings (tools and foods, for example) from these adults, because
Western adults typically sort taxonomically. The Kpelle proved to be
perfectly capable of taxonomic sorting: When the subjects were asked to
sort the objects the way a fool would do it, they immediately arranged
them into neat piles of tools, foods, clothing, and utensils. Taxonomic
sorting of these objects seemed stupid to the Kpelle because it was
inconsistent with the way they deal with these objects in everyday life,
that is, functionally. In another classification task, the Kpelle sorted
leaves taxonomically (as either "tree" leaves or "vine" leaves) with
ease. In this case, the taxonomic approach seemed completely appropri-
ate. As farmers, the Kpelle are frequently called upon to make such
discriminations, and hence were comfortable adopting the taxonomic
sorting strategy.
In sum, tests based upon the anthropological metaphor need to be
tailored, not just translated or adjusted, to the culture in which the testing
is taking place. Tests based upon the internally-oriented metaphors have
been used with almost no modification across cultures, under the as-
sumption that what they measure should be universal. But this is a big
assumption. Changing the content vehicle or the format of a test, or even
the location in which the test is given, can have a major effect upon test
scores. Thus, children who might look quite stupid on tests based upon
metaphors that view intelligence as inside the head, might look quite
smart on tests based on metaphors that are oriented toward the outside.
What is of greatest interest is that adherents of the anthropological
metaphor, who make a living studying intelligence across cultures,
almost all believe that mere translation or minor modifications do not
adequately control for cultural differences in intelligence testing, whereas
adherents to the geographic metaphor, most of whom do not specialize
in cross-cultural work, are happy just to translate or make minor modifi-
cations in their tests. So the metaphor under which one works can have
profound implications for how intelligent a person will appear when
tested, because the metaphor determines what will be tested and how
testing will be done.

The Sociological Metaphor
The sociological metaphor differs only subtly from the anthropologi-
cal one. Whereas the anthropological metaphor deals with the effects of
enculturation, the sociological metaphor deals with the effects of social-
ization. Adherents to the anthropological metaphor concern themselves
primarily with how culture affects intelligence. They deal with the ques-
tion of whether intelligence is the same thing across cultures, and if it is
not, how it is different. Adherents to the sociological metaphor care about
cultural effects, but tend to be less interested in the question of how
intelligence differs from one culture to another than in the question of
how socialization within any culture affects the development of intelli-
gence. Although they may look at socialization across cultures, their main
interest is in the socialization process itself, and especially how it is
similar across cultures even though the content of the socialization may
vary quite substantially.
Sociological theories. The sociological approach is due largely to Lev
Vygotsky. Vygotsky (1978) made several important contributions to the
theory of intelligence, the two most important of which are probably his
theory of internalization and his concept of the zone of proximal
development.
In his theory of internalization, Vygotsky turned the views of Piaget on
their head. Although Piaget and Vygotsky were both interactionists, they
were interactionists who believed that individual intelligence started at
essentially opposite points. Piaget believed that intelligence matured
from the inside, and directed itself outward. Vygotsky, in contrast,
believed that intelligence begins in the social environment, and directs
itself inward. The process of the direction of intelligence from the outside
to the inside is what Vygotsky refers to as internalization.
Internalization is the internal reconstruction of an external opera-
tion. The basic notion is that we observe those in the social environment
around us acting in certain ways, and we internalize their actions so they
become a part of ourselves. For example, we might learn how to teach
young children by watching how our parents teach us; or we might learn
how to speak or ride a bicycle or even read a book by watching how others
do it. Internalization does not occur, for the most part, simply as the result
of mimicry of a single action. Rather, it is a process that continues over
time. At first, the action may be imperfectly imitated, or its meaning not quite
understood. Moreover, even after an action is internalized, its linkage to
other internal acts may take quite some time. Some functions are never
internalized: They remain forever as external signs. According to Vygotsky,
the internalization of socially based and historically developed activities
is what distinguishes humans from animals.
Perhaps Vygotsky's most exciting contribution to the psychology of
intelligence is his notion of the zone of proximal (or potential) develop-
ment. Consider a situation posed by Vygotsky.
Take two children whose chronological age is ten years and whose
mental age is eight years. Ask whether one can characterize them as being
of the same age mentally. On the face of it, of course, one can. But this
means that both children can deal with tasks up to the degree of difficulty
characterized by what eight-year-olds can typically do. One could say
that the actual developmental level of the two children is the same. But,
Vygotsky asked, can one thereby ascertain that the subsequent course of
their mental development and their school learning will be the same,
because both depend on their intellect? Naturally, there are nonintellectual
factors that may influence their school learning or their mental develop-
ment. But for the time being, consider these nonintellectual factors as
being comparable for the two children. Most people would assume that
one could make comparable predictions about each of the children, and
indeed, the whole predictive use of intelligence tests in the United States
is based on this assumption. Vygotsky argues that this view is incorrect.
Suppose that a teacher-examiner provides guided assistance to each
of the two children in order to help them solve a given problem. It turns
out that, with this guided assistance, the first child can deal with
problems up to the level of a twelve-year-old, whereas the second child
can only deal with problems up to the level of a nine-year-old. Would we
still want to conclude that the two children are mentally the same?
Vygotsky suggests that we would not, for the first child has been shown
to be better able to profit from instruction than the second child. Hence,
it is reasonable to suppose that with regard to future as opposed to past
development, the first child is superior to the second child and has a
better prognosis. The difference between mental age twelve and mental
age eight, for the first child, and between mental age nine and mental age
eight, for the second child, is what Vygotsky refers to as the zone of
proximal development. It is the distance between the actual developmental
level as determined by independent problem solving and the level of
potential development as determined through problem solving under
adult guidance or in collaboration with more capable peers.
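Vygotsky's example reduces to a subtraction of mental ages. The sketch below expresses the zone of proximal development simply as assisted minus unaided mental age, which is a simplification for illustration; the class and field names are hypothetical, and the numbers are those of the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Child:
    label: str
    unaided_mental_age: float    # level reached by independent problem solving
    assisted_mental_age: float   # level reached with guided assistance

    @property
    def zone_of_proximal_development(self) -> float:
        """Distance between potential and actual developmental level, in years."""
        return self.assisted_mental_age - self.unaided_mental_age


first = Child("first child", unaided_mental_age=8, assisted_mental_age=12)
second = Child("second child", unaided_mental_age=8, assisted_mental_age=9)
for child in (first, second):
    print(child.label, child.zone_of_proximal_development)  # 4 and 1
```

The two children share the same unaided level, so a conventional test would not separate them; only the assisted level does.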
Another sociological theorist is Reuven Feuerstein, whose basic
premise is that intelligence is modifiable. The key concept in Feuerstein's
conception of intelligence and its development is mediated learning
experience (Feuerstein, 1980).

Mediated learning experience is the way in which stimuli
emitted by the environment are transformed by a "medi-
ating" agent, usually a parent, sibling, or other caregiver.
This mediating agent, guided by his intentions, culture,
and emotional investment, selects and organizes the
world of stimuli for the child. The mediator selects stimuli
that are most appropriate and then frames, filters, and
schedules them; he determines the appearance or disap-
pearance of certain stimuli and ignores others. Through
this process of mediation, the cognitive structure of the
child is affected. The child acquires behavior patterns
and learning sets, which in turn become important ingre-
dients of his capacity to become modified through direct
exposure to stimuli. (Feuerstein, 1980, p. 16)

Feuerstein believes that mediated learning experience can occur either through the intervention of a particular individual, such as the parent, or through general cultural transmission. Children who are culturally deprived, in the sense of having inadequate exposure to their own culture, will tend to receive inadequate mediated learning experience. Feuerstein defines cultural deprivation not in terms of a mainstream or host culture, but in terms of the culture of the child and his family.
Sociological tests. The concept of the zone of proximal development is clearly related to mediated learning experience, as mediated learning experience seems to be what helps children achieve their level of potential development. Brown and French (1979) give a fairly detailed example of how the zone of proximal development can be measured, and Ferrara, Brown, and Campione (1986) illustrate the use of the zone of proximal development in considerable detail. Brown and her colleagues have devised tests that use the Vygotskian concept in order to measure the zone of proximal development.
Feuerstein (1979) has developed a test called the Learning Potential Assessment Device (LPAD), which follows directly from his ideas about mediated learning experience. The test also fits very well with Vygotsky's notion of the zone of proximal development. In the test, an examiner gives children rather difficult tasks to solve. Initially, he or she looks at how the children solve the tasks without any intervention on the part of the examiner. Then, children receive carefully graded, sequential hints, and the examiner observes the children's ability to profit from these hints. In this way, it becomes possible to observe the children's zone of proximal development.
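The pretest-then-hints procedure can be sketched as a loop. This is a minimal illustration under stated assumptions, not the LPAD's actual scoring scheme:

- Each item carries a hypothetical ordered list of hints.
- The child is modeled as a function that either solves the item or not, given the hints received so far.
- The score is the number of hints needed, where 0 means the item was solved without intervention.

```python
from typing import Callable, List, Optional, Sequence

# A hypothetical responder: given the hints shown so far, does the child solve the item?
Responder = Callable[[Sequence[str]], bool]

def hints_needed(responder: Responder, hints: Sequence[str]) -> Optional[int]:
    """Return how many graded hints were needed before the item was solved.

    0 means solved with no intervention (the unassisted phase);
    None means the item was still unsolved after every hint was given.
    """
    shown: List[str] = []
    if responder(shown):
        return 0
    for hint in hints:  # hints are ordered from least to most explicit
        shown.append(hint)
        if responder(shown):
            return len(shown)
    return None

# Hypothetical item with three graded hints.
item_hints = [
    "Look at the first row.",
    "What changes from left to right?",
    "The shape rotates 90 degrees.",
]

# Neither child solves the item unaided; child A needs one hint, child B needs all three.
child_a: Responder = lambda shown: len(shown) >= 1
child_b: Responder = lambda shown: len(shown) >= 3

print(hints_needed(child_a, item_hints), hints_needed(child_b, item_hints))  # 1 3
```

A static test would score both children identically, since both fail the item unaided. Counting hints separates them, which is the kind of information the dynamic procedure is meant to expose.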
Although I initially had some doubts as to whether the tests of the
zone of proximal development measure what they are supposed to measure,
the results of Brown and her colleagues and of Feuerstein are very
encouraging. I remain concerned, however, that the operationalization of
the zone of proximal development may not sufficiently take into account
individual differences in abilities and styles of learning. The instruction
that works well for one child might work only poorly for another child,
with the result that the first child might appear to have a larger zone of
proximal development than the second. In order for the measure to be
fair, we would have to make sure that the form of instruction used was
equally suitable for all children receiving that instruction, and it is
unlikely that any form of instruction will be equally suitable for all. Hence,
I believe that we do have to be careful in our interpretation of results of
tests that measure the zone of proximal development. Moreover, we need
to recognize that there may be zones of proximal development that are
domain-specific rather than domain-general, and that may differ not only
as a function of domain but as a function of how learning takes place. For
example, some children might learn quite well with the kind of direct
instruction given in tests of the zone of proximal development, whereas
other students might learn better on their own.
These concerns notwithstanding, the zone of proximal development
is one of the more exciting concepts in the psychology of intelligence,
because it gives us a way of addressing the question of what will happen
in the future, not based just upon retrospective measurement, but based
upon simulations of prospective processing of information. The dynamic
form of testing is quite different from the static form used under the
geographic metaphor. Dynamic testing may well be the wave of the future
for understanding not only where people have arrived, but also where
they are going.
Intelligence as Viewed Internally and Externally to the Individual
The Systems Metaphor
The systems metaphor is an attempt to bring together various other
metaphors by viewing intelligence in terms of a complex interaction of
various cognitive and other systems. I will describe here two attempts to
understand intelligence in terms of interacting systems: Gardner's
(1983) theory of multiple intelligences and Sternberg's (1985, 1988)
triarchic theory of human intelligence.
Systems theories. The two systems theories to be considered here
are similar in viewing conventional theories of intelligence as too narrow,
but are different in the way they propose to expand our conception of
intelligence. Consider each theory in turn.
Howard Gardner's (1983) theory of multiple intelligences may be viewed as having three fundamental principles. First, intelligence is not a single thing, whether viewed unitarily or as comprising multiple abilities. Rather, there are multiple intelligences, each distinct from the others. The multiple intelligences Gardner proposes in his 1983 book are linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal. In some ways, the distinction between positing one intelligence comprising multiple abilities and positing multiple intelligences, each distinct from the others, is subtle. But positing multiple intelligences emphasizes the separateness of each set of skills, and also emphasizes Gardner's view that each intelligence is a system in its own right, rather than merely one aspect of a larger system, namely, what we traditionally call "intelligence." The second fundamental principle is that these intelligences are independent of each other. In other words, a person's abilities as assessed under one intelligence should, in theory, be unpredictive of that person's abilities as assessed under another intelligence. Obviously, the claim of independence is a strong one, but Gardner believes that it is justified by what we know about the mind. The third fundamental principle is that the intelligences interact. Although they are distinct from each other, no one could ever get anything done if their distinctness and independence meant that they could not work together. If they could not, a mathematical word problem requiring, say, the application of both linguistic and logical-mathematical intelligences would be insoluble.
Gardner defines an intelligence as "an ability or set of abilities that permits an individual to solve problems or fashion products that are of consequence in a particular cultural setting" (Walters & Gardner, 1986, p. 165). How do we know what constitutes an intelligence? In other words, what criteria can one use to identify the multiple intelligences in Gardner's theory, or other possible intelligences that have not yet been identified? Gardner proposes eight criteria for distinguishing an independent intelligence: potential isolation by brain damage; the existence of idiots savants, prodigies, and other exceptional individuals; an identifiable core operation or set of operations; a distinctive developmental history, along with a definable set of expert "end-state" performances; an evolutionary history and evolutionary plausibility; support from experimental-psychological investigations; support from psychometric findings; and susceptibility to encoding in a symbol system.
Sternberg's (1985, 1988) triarchic theory, as its name implies, has three parts. The first, the "componential subtheory," relates intelligence to the internal world of the individual, or the mental mechanisms that underlie intelligent behavior. It specifies three kinds of information-processing components: metacomponents, which are higher order executive processes used to plan what one is going to do, to monitor it while one is doing it, and to evaluate it after it is done; performance components, which are lower order processes that execute the instructions of the metacomponents in order to perform tasks; and knowledge-acquisition components, which are used to learn how to do what the metacomponents and performance components eventually do.
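The division of labor among the three kinds of components can be illustrated with a toy solver for number-series items, one of the item types mentioned for the componential subtests. This is a hypothetical sketch, not an implementation of Sternberg's theory. The following are all assumptions made for illustration:

- the class name;
- the two extrapolation rules;
- reducing "planning" and "monitoring" to trying rules in order and logging the outcome.

```python
from typing import Callable, Dict, List, Optional

Rule = Callable[[List[int]], Optional[int]]

def extend_arithmetic(series: List[int]) -> Optional[int]:
    """Performance component: continue a series with a constant difference."""
    diffs = {b - a for a, b in zip(series, series[1:])}
    return series[-1] + diffs.pop() if len(diffs) == 1 else None

def extend_geometric(series: List[int]) -> Optional[int]:
    """Performance component: continue a series with a constant integer ratio."""
    if 0 in series:
        return None
    ratios = {b / a for a, b in zip(series, series[1:])}
    if len(ratios) != 1:
        return None
    ratio = ratios.pop()
    return int(series[-1] * ratio) if ratio.is_integer() else None

class ComponentialSolver:
    def __init__(self) -> None:
        # Start with only one performance component available.
        self.rules: Dict[str, Rule] = {"arithmetic": extend_arithmetic}
        self.log: List[str] = []

    def acquire(self, name: str, rule: Rule) -> None:
        """Knowledge-acquisition component: add a newly learned procedure."""
        self.rules[name] = rule

    def solve(self, series: List[int]) -> Optional[int]:
        """Metacomponents: plan an order of attack, monitor each attempt, evaluate the outcome."""
        plan = list(self.rules)  # plan: try known rules in the order they were learned
        for name in plan:
            answer = self.rules[name](series)
            self.log.append(f"{name}: {'ok' if answer is not None else 'failed'}")  # monitor
            if answer is not None:
                return answer  # evaluate: accept the first rule that fits every given term
        return None

solver = ComponentialSolver()
print(solver.solve([2, 4, 6, 8]))    # 10
print(solver.solve([3, 6, 12, 24]))  # None: no suitable component yet
solver.acquire("geometric", extend_geometric)
print(solver.solve([3, 6, 12, 24]))  # 48
```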
Components of information processing are always applied to tasks with which one has some level of prior experience (including the null level) and in situations with which one has some level of prior experience (including the null level). Hence, these internal mechanisms are closely tied to one's experience. The second, the "experiential subtheory," specifies that the components are not equally good measures of intelligence at all levels of experience. Assessing intelligence requires one to consider not only the components, but the level of experience at which they are applied.
According to the experiential subtheory, intelligence is best measured at those regions of the experiential continuum that involve application of information-processing components to tasks or situations that are either relatively novel, on the one hand, or in the process of becoming automatized, on the other. If a task is too unfamiliar, such as a trigonometry problem presented to a first-grader, it will not measure intelligence because the individual will have virtually no mental resources to bring to bear on the problem. If the task is already automatized, one will have no sense of the history of how efficaciously that automatization was accomplished, whether it took one week or one year. The ability to deal with novelty and the ability to automatize information processing are interrelated. If one is well able to automatize, one has more resources left over for dealing with novelty. Similarly, if one is well able to deal with novelty, one has more resources left over for automatization. Thus, performances at the various levels of the experiential continuum are related to one another.
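The claim that automatization and coping with novelty draw on a common pool of resources can be made concrete with a toy calculation. Everything here is an assumption for illustration:

- a fixed capacity of 1.0;
- a routine demand of 0.6;
- an "automatization" parameter that scales down how much of that demand still requires controlled processing.

```python
def capacity_left_for_novelty(
    routine_demand: float, automatization: float, capacity: float = 1.0
) -> float:
    """Toy model: resources remaining after routine processing.

    automatization ranges from 0.0 (nothing automatized) to 1.0 (fully automatic);
    only the non-automatized share of the routine demand draws on capacity.
    """
    if not 0.0 <= automatization <= 1.0:
        raise ValueError("automatization must lie in [0, 1]")
    used = routine_demand * (1.0 - automatization)
    return max(capacity - used, 0.0)

# Two hypothetical people facing the same routine load.
low = capacity_left_for_novelty(routine_demand=0.6, automatization=0.2)
high = capacity_left_for_novelty(routine_demand=0.6, automatization=0.8)
print(round(low, 2), round(high, 2))  # 0.52 0.88
```

Running the same bookkeeping in reverse captures the converse claim: efficient handling of novelty leaves more capacity for automatization.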
These abilities should not be viewed in a vacuum with respect to the
componential subtheory. The components of intelligence are applied to
tasks and situations at various levels of experience: coping with novelty
is via the components, and what is automatized is a set of components of
information processing.
According to the third, the "contextual subtheory," intelligent thought is directed toward one or more of three behavioral goals: adaptation to an environment, shaping of an environment, or selection of an environment. These three goals may be viewed as the functions toward which intelligence is directed: Intelligence is not aimless or random mental activity that happens to involve certain components of information processing at certain levels of experience. Rather, these components are purposefully directed toward the pursuit of these three global goals, regardless of the level of experience at which the components are executed. The nub of the triarchic theory of intelligence is that intelligence involves recognizing and capitalizing on one's strengths, and recognizing and either compensating for or remediating one's weaknesses. Thus, people may differ widely in how they are intelligent, but they find some way in which they excel, and then make the most of it.
Systems tests. In the systems approaches, the actual testing that is done will depend on the way the system of the mind is conceived. Howard Gardner and David Feldman, in their Project Spectrum, are developing tests based on Gardner's (1983) theory of multiple intelligences. These tests, unlike the conventional ones, are not paper-and-pencil; rather, they measure children's thinking skills in an enriched classroom environment where children are performing criterion activities. Thus, linguistic intelligence might be measured by having children write a poem, or bodily-kinesthetic intelligence by having them dance or play a sport.
Sternberg is currently developing a test based on his triarchic theory of intelligence, which measures componential skills, skills in coping with novelty, automatization skills, and practical-intellectual skills, each in the verbal, quantitative, and figural domains. The test, for kindergarten through adulthood, is a multilevel group test. The componential items are most similar to those on a standard intelligence test, including things like learning meanings of words from context, number series, and figural analogies.

The coping-with-novelty tests require subjects to solve problems that are based on a novel premise. For example, novel verbal analogies have students solve analogies that are preceded by a premise that may be either factual (e.g., canaries sing songs) or counterfactual (e.g., canaries play hopscotch). In each case, the subjects would have to solve the analogies as though the premise were true. The novel number matrices, used to measure coping with novelty in the quantitative domain, require subjects to complete number-matrix problems in which numerals and symbols that have been set equal to various numbers are freely interchanged. The figural coping-with-novelty task requires solving figural series in which the series show a discontinuity in the middle, and the subject must infer how to extrapolate from the first domain in the series to the second.

All three automatization subtests (verbal, quantitative, and figural) require subjects to indicate rapidly whether two different symbols are of the same class or not, for example, whether two numbers are both even or both odd. Practical verbal problems require everyday inferences, practical math problems require everyday math, and practical figural problems require planning of routes in a way that is time-efficient and effective. In the triarchic approach, the testing of intelligence is closely linked to the teaching of intelligence, and there exists a program at the high school-college level, Intelligence Applied (Sternberg, 1986), which can be used in conjunction with testing in order to enhance intellectual skills.
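The automatization subtests ask for rapid same-class judgments, for example whether two numbers are both even or both odd. The sketch below scores such items by accuracy and by median response time on correct trials. It is a hypothetical illustration: the trial format, the millisecond timings, and the choice of the median are assumptions, not details of Sternberg's test.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple

@dataclass
class Trial:
    a: int
    b: int
    said_same: bool  # the subject's answer: "same class" or not
    rt_ms: float     # response time in milliseconds

def same_parity(a: int, b: int) -> bool:
    """Correct answer for a parity item: both even or both odd."""
    return a % 2 == b % 2

def score(trials: List[Trial]) -> Tuple[float, float]:
    """Return (proportion correct, median RT of correct trials)."""
    correct = [t for t in trials if t.said_same == same_parity(t.a, t.b)]
    accuracy = len(correct) / len(trials)
    speed = median(t.rt_ms for t in correct) if correct else float("nan")
    return accuracy, speed

# Invented responses from one subject.
trials = [
    Trial(4, 8, True, 610),   # both even: correct
    Trial(3, 7, True, 580),   # both odd: correct
    Trial(2, 5, True, 900),   # mixed parity: incorrect
    Trial(6, 9, False, 640),  # mixed parity: correct
]
print(score(trials))  # (0.75, 610)
```

Reporting speed only on correct trials keeps fast guessing from looking like automatized processing.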

Conclusion
The testing of intelligence can be as diverse and multifaceted as is
intelligence itself. One cannot beg the question of "What is intelligence?"
by saying, as did Boring, that intelligence is what the tests test, because
there are as many different kinds of tests as there are metaphors for
understanding intelligence. There can be different sorts of tests within
each metaphor, depending upon the particular theory within the metaphor
that generates the test. The conventional individual and group tests
we use to measure intelligence represent only a small sampling of the
ways in which intelligence might be tested. Almost all of the conventional
tests are based upon the geographic metaphor, but there is no reason in
principle why we need to test intelligence on this basis. Other metaphors
could help us assess aspects of performance that heretofore have been
neglected.
It is worth emphasizing again that metaphors are not right or wrong,
but more or less useful for particular purposes. Similarly, the theories
within the metaphors can be more or less useful for particular purposes.
Theories, unlike metaphors, can be proven to be wrong, although, of course, we cannot prove them to be right, but can only gather evidence that is consistent with them. In comparing theories, we need to keep in mind whether or not they were generated under the same metaphor, because theories generated under different metaphors do not readily lend themselves to comparison, any more than apples and oranges do. They deal with different aspects of intelligence and accomplish different goals, and hence are not, strictly speaking, comparable. Even within the same metaphor, noncomparabilities can exist. For example, within the biological metaphor, there were three distinct approaches that were reviewed (neuropsychological, electrophysiological, and blood flow), and it would not be possible to compare directly across these procedures of measurement. Each deals with a different class of phenomena.
The direction that much applied research has taken over the past
decade or so has been toward successively more refined psychometric
theories of measurement, and successively more refined delivery systems
for measurement. Thus, we now have tailored tests, which typically
use computer technology to present existing tests in more efficient ways.
I am all in favor of psychometric and technological development. But I
personally believe that we need to apply more resources to questions of
what we want to measure before we apply resources to how we are going
to measure it. Historically, the link between theories and tests of intelligence
has not been as strong as I believe it should be. The link is always
there, to some extent, but even when it is there, we have often not been
conscious of it. We need more explicitly to state what we are assuming
about intelligence when we measure it through a given vehicle, and more
seriously to consider the alternatives to the vehicles we are using. The
metaphorical approach to understanding intelligence helps chart the
universe of possibilities for more informed and self-conscious testing of
intelligence.

References
Baltes, P. B., Dittmann-Kohli, F., & Dixon, R. A. (1984). New perspectives on the
development of intelligence in adulthood: Toward a dual-process conception
and a model of selective optimization with compensation. In P. B. Baltes & O.
G. Brim, Jr. (Eds.), Life-span development and behavior (Vol. 6, pp. 33-76). New
York: Academic Press.
Baron, J. (1985). Rationality and intelligence. New York: Cambridge University
Press.
Berry, J. W. (1974). Radical cultural relativism and the concept of intelligence. In
J. W. Berry & P. R. Dasen (Eds.), Culture and cognition: Readings in cross-cultural
psychology (pp. 225-229). London: Methuen.
Berry, J. W., & Irvine, S. H. (1986). Bricolage: Savages do it daily. In R. J. Sternberg
& R. K. Wagner (Eds.), Practical intelligence: Nature and origins of competence
in the everyday world (pp. 271-306). New York: Cambridge University Press.
Binet, A., & Simon, T. (1916). The intelligence of the feeble-minded (E. S. Kite,
Trans.). Baltimore, MD: Williams & Wilkins.
Blinkhorn, S. F., & Hendrickson, D. E. (1982). Averaged evoked responses and
psychometric intelligence. Nature, 295, 596-597.
Boden, M. A. (1977). Artificial intelligence and natural man. Sussex, England:
Harvester Press.
Boring, E. G. (1923). Intelligence as the tests test it. New Republic, June 6, 35-37.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of
metacognition. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 1,
pp. 77-165). Hillsdale, NJ: Erlbaum.
Brown, A. L., & Campione, J. C. (1978). Permissible inferences from cognitive
training studies in developmental research. In W. S. Hall & M. Cole (Eds.),
Quarterly newsletter of the Institute for Comparative Human Behavior, 2, 46-53.
Brown, A. L., & French, A. L. (1979). The zone of potential development: Implications
for intelligence testing in the year 2000. In R. J. Sternberg & D. K. Detterman
(Eds.), Human intelligence: Perspectives on its theory and measurement (pp. 217-
235). Norwood, NJ: Ablex.
Bruner, J. S., Olver, R. R., & Greenfield, P. M. (1966). Studies in cognitive growth. New
York: Wiley.
Case, R. (1984). The process of stage transition: A neo-Piagetian view. In R. J.
Sternberg (Ed.), Mechanisms of cognitive development (pp. 20-44). New York:
Freeman.
Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic
Press.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373-380.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston, MA:
Houghton Mifflin.
Cattell, R. B., & Cattell, A. K. (1963). Test of g: Culture fair, Scale 3. Champaign, IL:
Institute for Personality and Ability Testing.
Charlesworth, W. R. (1979). An ethological approach to studying intelligence.
Human Development, 22, 212-216.
Cole, M., Gay, J., Glick, J., & Sharp, D. W. (1971). The cultural context of learning and
thinking. New York: Basic Books.
Dixon, R. A., & Baltes, P. B. (1986). Toward life-span research on the functions and
pragmatics of intelligence. In R. J. Sternberg & R. K. Wagner (Eds.), Practical
intelligence: Nature and origins of competence in the everyday world (pp. 203-
235). New York: Cambridge University Press.
Evans, T. G. (1968). A program for the solution of geometric analogy intelligence
test questions. In M. Minsky (Ed.), Semantic information processing. Cambridge,
MA: MIT Press.
Ferguson, G. A. (1954). On learning and human ability. Canadian Journal of
Psychology, 8, 95-112.
Ferrara, R. A., Brown, A. L., & Campione, J. C. (1986). Children's learning and
transfer of inductive reasoning rules: Studies of proximal development. Child
Development, 57, 1087-1099.
Feuerstein, R. (1979). The dynamic assessment of retarded performers: The learning
potential assessment device, theory, instruments, and techniques. Baltimore, MD:
University Park.
Feuerstein, R. (1980). Instrumental enrichment: An intervention program for cognitive
modifiability. Baltimore, MD: University Park.
Fischer, K. W. (1980). A theory of cognitive development: The control and
construction of hierarchies of skills. Psychological Review, 87, 477-531.
Fischer, K. W., & Pipp, S. L. (1984). Processes of cognitive development: Optimal
level and skill acquisition. In R. J. Sternberg (Ed.), Mechanisms of cognitive
development (pp. 45-75). New York: Freeman.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Galin, D., & Ornstein, R. (1972). Lateral specialization of cognitive mode: An EEG
study. Psychophysiology, 9, 412-418.
Galton, F. (1883). Inquiry into human faculty and its development. London: Macmillan
Press.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York:
Basic Books.
Gazzaniga, M. S. (1985). The social brain: Discovering the networks of the mind. New
York: Basic Books.
Golden, C. J. (1981). A standardized version of Luria's neuropsychological tests:
A quantitative and qualitative approach to neuropsychological evaluation. In
S. B. Filskov & T. J. Boll (Eds.), Handbook of clinical neuropsychology. New York:
Wiley.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P. (1982). Cognitive psychology's ambiguities: Some suggested
remedies. Psychological Review, 89, 48-59.
Guilford, J. P., & Hoepfner, R. (1971). The analysis of intelligence. New York:
McGraw-Hill.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities.
Intelligence, 8, 179-203.
Guttman, L. (1954). A new approach to factor analysis: The radex. In P. F.
Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 258-348). New
York: Free Press.
Halstead, W. C. (1951). Biological intelligence. Journal of Personality, 20, 118-130.
Heath, S. B. (1983). Ways with words. New York: Cambridge University Press.
Hendrickson, A. E., & Hendrickson, D. E. (1980). The biological basis for individual
differences in intelligence. Personality and Individual Differences, 1, 3-33.
Horn, J. L. (1986). Intellectual ability concepts. In R. J. Sternberg (Ed.), Advances
in the psychology of human intelligence (Vol. 3, pp. 35-77). Hillsdale, NJ:
Erlbaum.
Hunt, E. B. (1978). Mechanics of verbal ability. Psychological Review, 85, 109-130.
Hunt, E. B. (1980). Intelligence as an information-processing concept. British
Journal of Psychology, 71, 449-474.
Hunt, E. B., Frost, N., & Lunneborg, C. (1973). Individual differences in cognition:
A new approach to intelligence. In G. Bower (Ed.), The psychology of learning
and motivation (Vol. 7, pp. 87-122). New York: Academic Press.
Hunt, E. B., Lunneborg, C., & Lewis, J. (1975). What does it mean to be high verbal?
Cognitive Psychology, 7, 194-227.
Irvine, S. H., & Berry, J. W. (1988). The abilities of mankind: A revaluation. In S. H.
Irvine & J. W. Berry (Eds.), Human abilities in cultural context (pp. 3-59). New
York: Cambridge University Press.
Jenkins, J. J. (1979). Four points to remember: A tetrahedral model of memory
experiments. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human
memory (pp. 429-446). Hillsdale, NJ: Erlbaum.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement?
Harvard Educational Review, 39, 1-123.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children
(K-ABC). Circle Pines, MN: American Guidance Service.
Keating, D. P. (1984). The emperor's new clothes: The "new look" in intelligence
research. In R. J. Sternberg (Ed.), Advances in the psychology of human
intelligence (Vol. 2, pp. 1-45). Hillsdale, NJ: Erlbaum.
Laboratory of Comparative Human Cognition (1982). Culture and intelligence. In
R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 642-719). New York:
Cambridge University Press.
Levi-Strauss, C. (1966). The savage mind. Chicago, IL: University of Chicago Press.
Levy, J., Trevarthen, C., & Sperry, R. W. (1972). Perception of bilateral chimeric
figures following hemispheric disconnection. Brain, 95, 61-78.
Luria, A. R. (1973). The working brain. New York: Basic Books.
Luria, A. R. (1980). Higher cortical functions in man (2nd ed., rev. & expanded). New
York: Basic Books.
McCarthy, G., & Donchin, E. (1981). A metric for thought: A comparison of P300
latency and reaction time. Science, 211, 77-79.
McClelland, J. L., & Rumelhart, D. E. (1986). A distributed model of human learning
and memory. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group,
Parallel distributed processing: Explorations in the microstructure of cognition
(Vol. 2, pp. 170-215). Cambridge, MA: MIT Press.
McFie, J. (1961). The effects of education on African performance on a group of
intellectual tests. British Journal of Educational Psychology, 31, 232-240.
Newell, A., Shaw, J. C., & Simon, H. A. (1958). Elements of a theory of human
problem solving. Psychological Review, 65, 151-166.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ:
Prentice-Hall.
Pascual-Leone, J. (1970). A mathematical model for the transition rule in Piaget's
developmental stages. Acta Psychologica, 32, 301-345.
Pascual-Leone, J. (1987). Organismic processes for neo-Piagetian theories: A
dialectical causal account of cognitive development. International Journal of
Psychology, 22, 531-570.
Piaget, J. (1952). The origins of intelligence in children. New York: International
Universities Press.
Piaget, J., & Inhelder, B. (1969). The psychology of the child. New York: Basic Books.
Posner, M. I., & Mitchell, R. F. (1967). Chronometric analysis of classification.
Psychological Review, 74, 392-409.
Schafer, E. W. P. (1982). Neural adaptability: A biological determinant of behavioral
intelligence. International Journal of Neuroscience, 17, 183-191.
Serpell, R. (1979). How specific are perceptual skills? A cross-cultural study of
pattern reproduction. British Journal of Psychology, 70, 365-380.
Siegler, R. S. (1978). The origins of scientific reasoning. In R. S. Siegler (Ed.),
Children's thinking: What develops? (pp. 109-149). Hillsdale, NJ: Erlbaum.
Siegler, R. S. (1981). Developmental sequences within and between concepts.
Monographs of the Society for Research in Child Development, 46 (Serial No.
189).
Siegler, R. S. (1984). Mechanisms of cognitive growth: Variation and selection. In
R. J. Sternberg (Ed.), Mechanisms of cognitive development (pp. 141-162). New
York: Freeman.
Siegler, R. S. (1987). The perils of averaging data over strategies: An example from
children's addition. Journal of Experimental Psychology: General, 116, 250-264.
Simon, H. A. (1976). Identifying basic abilities underlying intelligent performance
of complex tasks. In L. B. Resnick (Ed.), The nature of intelligence. Hillsdale, NJ:
Erlbaum.
Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and
learning correlations. In R. J. Sternberg (Ed.), Advances in the psychology of
human intelligence (Vol. 2, pp. 47-103). Hillsdale, NJ: Erlbaum.
Spearman, C. (1923). The nature of 'intelligence' and the principles of cognition.
London: Macmillan Press.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Sperry, R. W. (1961). Cerebral organization and behavior. Science, 133, 1749-1757.
Sternberg, R. J. (1977). Intelligence, information processing, and analogical reasoning:
The componential analysis of human abilities. Hillsdale, NJ: Erlbaum.
Sternberg, R. J. (1980). Sketch of a componential subtheory of human intelligence.
Behavioral and Brain Sciences, 3, 573-584.
Sternberg, R. J. (1983). Components of human intelligence. Cognition, 15, 1-48.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. New
York: Cambridge University Press.
Sternberg, R. J. (1986). Intelligence applied: Understanding and increasing your
intellectual skills. San Diego, CA: Harcourt Brace Jovanovich.
Sternberg, R. J. (1988). The triarchic mind: A new theory of human intelligence. New
York: Viking.
Sternberg, R. J. (1990). Metaphors of mind. New York: Cambridge University
Press.
Sternberg, R. J., & Gardner, M. K. (1982). A componential interpretation of the
general factor in human intelligence. In H. J. Eysenck (Ed.), A model for
intelligence (pp. 231-254). Berlin: Springer.
Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction-
time experiments. American Scientist, 57, 421-457.
Stillings, N. A., Feinstein, M. H., Garfield, J. L., Rissland, E. L., Rosenbaum, D. A.,
Weisler, S. E., & Baker-Ward, L. (1987). Cognitive science: An introduction.
Cambridge, MA: MIT Press.
Super, C. M. (1976). Environmental effects on motor development: The case of
African infant precocity. Developmental Medicine and Child Neurology, 18, 561-
567.
Thomson, G. H. (1939). The factorial analysis of human ability. London: University
of London Press.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of Chicago
Press.
Thurstone, L. L., & Thurstone, T. G. (1962). Tests of Primary Mental Abilities
(Revised). Chicago, IL: Science Research Associates.
Tuddenham, R. D. (1970). A "Piagetian" test of cognitive development. In W. B.
Dockrell (Ed.), On intelligence (pp. 49-70). Toronto: Ontario Institute for
Studies in Education.
Vernon, P. E. (1971). The structure of human abilities. London: Methuen.
Vygotsky, L. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Walters, J. M., & Gardner, H. (1986). The theory of multiple intelligences: Some
issues and answers. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence:
Nature and origins of competence in the everyday world (pp. 163-182). New York:
Cambridge University Press.
Wissler, C. (1901). The correlation of mental and physical tests. Psychological
Review, Monograph Supplement, 3, No. 6.

Author Note
Preparation of this chapter was supported by a contract from the Army Research
Institute, and by grants from the Spencer and McDonnell Foundations. The ideas
in the chapter represent a condensation of ideas presented in Sternberg (1990).
CHAPTER 2

The Use of Structural Equation Modeling in Combining Data from Different Types of Assessment

Lew Bank and G. R. Patterson

LEW BANK, Oregon Social Learning Center, Eugene, Oregon.
GERALD R. PATTERSON, Oregon Social Learning Center, Eugene, Oregon.

Some years ago, the authors of this chapter sat in a Chicago-bound jetliner watching squall lines and immense thunderheads across a dark sky and bantering about the use of multiple versus single assessment devices. We were already committed, through a major NIMH grant for an ongoing longitudinal study, to the time-consuming and expensive Campbell and Fiske (1959) strategy of collecting data from a variety of agents using a diversity of methods. Our colleague, Tina Pastorelli, had just completed a series of analyses in Rome that clearly demonstrated the direct impact of parenting practices on children's antisocial behavior (Pastorelli & Dishion, 1991). These results pleased us because they were consistent with our theoretical framework and prior investigations. The problem was that only data taken from the mothers of subjects had been collected for the Rome study. And since that study had yielded results in accord with our position, why were we spending our time and grant money collecting tremendously more complex data sets than Pastorelli's when the same results apparently could be obtained simply by asking mothers for their perspectives? Were we likely to gain anything for all our effort?
Using the warp and woof fabric lines of the seat back in front of us, we
graphed our imaginary results in tweed. Well before landing, we reached
agreement. The model including only data from mothers was flawed, we
hypothesized, because it would not be predictive of important outcome
variables measured with other agents and methods. On the other hand,
a more generalizable model, using multiple indicators from different
informants and based on different methods, could be used to form latent
variables (i.e., unobserved, but hypothesized, underlying variables in a
theoretical model). Latent variables based on multiple operational defi-
nitions gathered in a variety of contexts by several different agents ought
to far more successfully predict theoretically important outcomes re-
gardless of how each outcome was measured.
For nearly a decade, we have been working with these multiple-indicator, latent variable models at the Oregon Social Learning Center (OSLC). With an expectation of enhanced prediction of critical processes and outcomes across several settings, we have moved toward a comprehensive theory of the development of antisocial behavior in children (Patterson & Bank, 1986, 1987, 1989; Patterson, Bank, & Stoolmiller, 1990; Patterson, Capaldi, & Bank, 1991; Patterson, Dishion, & Bank, 1984). The context of these investigations includes the following: (a) Patterson's (1977, 1979, 1982, 1986) performance theory conception, which demands a consistent effort to account for increasingly large chunks of criterion variance in theoretical outcomes of interest; (b) a scientist-as-practitioner model with all investigators also working in clinical roles (see Bank, Patterson, & Reid, 1987; Forgatch, 1991); and (c) a history of research with social interactional variables based on direct observation data (e.g., Patterson, 1982; Patterson & Reid, 1970; Reid, 1978; Reid & Patterson, 1989).
The social interactional perspective posits that it is experience and practice through moment-to-moment interactions with others that build, over time (and many repetitions), our individual styles and personalities. We have called these molecular episodes microsocial interactions, and we are careful to differentiate them from molar, or macrosocial, variables. For example, a trained observer codes a child's behaviors into a portable computer as they occur in the family's living room; the child's total number of negative behaviors divided by the total of all his or her behaviors during a given observation period yields a microsocial variable, the base rate of his or her negative behavior. Microsocial variables may be compared within and across families. On another day, we individually interview this same child's mother and father, as well as the child, and ask each of them how often the child does not mind, starts arguments, and so on. The responses to interview questions of this sort are then used to calculate macrosocial variables.
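As a minimal sketch of the microsocial computation just described, the function below turns the sequence of codes an observer entered for one child during one session into a base rate of negative behavior. The code labels and the set of codes treated as negative are hypothetical placeholders, not the OSLC coding system.

```python
# Hypothetical code labels; a real observation system defines its own categories.
NEGATIVE_CODES = {"noncomply", "argue", "yell", "tease"}


def negative_base_rate(codes):
    """Proportion of a child's coded behaviors that are negative.

    `codes` is the sequence of behavior codes entered for the target child
    during a single observation session.
    """
    if not codes:
        raise ValueError("no behaviors were coded for this session")
    negative = sum(1 for code in codes if code in NEGATIVE_CODES)
    return negative / len(codes)


session = ["talk", "comply", "argue", "talk", "yell", "talk", "comply", "noncomply"]
print(negative_base_rate(session))
```

Because the result is a proportion rather than a raw count, sessions of different lengths, and different families, can be compared on the same scale.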
Clearly, such discriminations have actually been made for a long time.
In fact, in a review of the then-existing literature on molar and molecular
types of variables, Littman and Rosen (1950) found the literature con-
tained a multiplicity of definitions and "an affective attachment to the
terms which serves to cloud the problem and dominate fresh approaches"
(p. 64). We have viewed the distinction as critical, but strictly method-
ological, in nature. That is, we have asked, are there particular phenomena
that are best described with either micro- or macrosocial operational
definitions? In addition, if both methods provide plausible operational
definitions of any particular theoretical concept, then should we not
sample across both domains?
Over the past 25 years, Patterson and his colleagues have conducted
a series of investigations concerning child behavior using direct observa-
tion methods (for reviews, see Jones, Reid, & Patterson, 1975; Patterson,
1982; Reid, 1978). Those investigations emphasized the importance and
richness of microsocial data collected from home observation. By the
early 1980s, however, the Patterson group at OSLC had become con-
vinced that macrosocial data types must also be included in the definition
of many key constructs needed to understand the development of antiso-
cial behavior. Currently, many family researchers espouse direct
observation/microsocial operational definitions as the only accurate
method of gaining critical information on family interaction styles (see
the Journal of Family Psychology special issue for December, 1989, for
examples). Although we are clearly advocates of this methodology, we believe it would be a serious error to view macrosocial definitions as being of only secondary importance. A more in-depth discussion of this point of view was presented by Chamberlain and Bank (1989).
The purpose of this chapter is to describe the general methodological
approach that we have developed for assessing factors involved in the
development of antisocial behavior in children. We will make clear some
of the assumptions and concerns that underlie our approach, and will
present two theoretical/empirical examples from our research. Particu-
lar attention will be given to comparing the relative utilities for different
levels (micro- and macro-) of data in making long-term predictions about
child adjustment. As is apparent in the chapter title, structural equation modeling (SEM) is a statistical tool that we have found most useful in testing our hypotheses. Given the focus of this chapter, analyses will be presented for illustration only; more technical discussions of SEM can be found elsewhere (e.g., Bank, Dishion, Skinner, & Patterson, 1990; Bentler, 1989; Dwyer, 1983; Jöreskog & Sörbom, 1988). All SEM analyses reported in this chapter were conducted using Bentler's EQS program. In general, EQS is more user friendly for those getting started with SEM; LISREL, however, is more readily available through SPSS-X.

Nature and Utility of Structural Equation Modeling
Problems in Complex Data Analysis
In many studies, only a single method (e.g., self-report data, observer ratings, or demographic variables) is employed in obtaining predictive data; such a practice can be termed a "monomethod" approach. Furthermore, many studies depend on only a single person (e.g., child, parent, or teacher) to provide data. We refer to such data providers as "agents." Unfortunately, too many studies utilize single methods and single agents to operationally define all or many concepts in a theoretical framework. In our view, as soon as even two concepts share a method or an agent, there is reason to suspect that a significant portion of their shared variance is accounted for by monomethod bias.
We (Bank et al., 1990) have demonstrated, for example, that if we use mother-reported data as the only source of multiple indicators for constructs of Maternal Stress, Maternal Depression, and Maternal Support, then the best-fitting confirmatory factor analysis includes a fourth factor, mother's frame of reference, with most indicators loading significantly on both the method factor and their respective content factors (Bank et al., 1990, pp. 268-270). In fact, the one-, two-, and three-factor alternative solutions all provided statistically inadequate fits to the data; by contrast, the introduction of the method factor resulted in both an overall acceptable fit and a significantly better fit than found with any of the alternatives. Given such results, an investigator with monomethod data may find it difficult or impossible to have confidence in further statistical analyses, because the findings must be shown to go beyond the monomethod artifact.
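The arithmetic behind this concern can be sketched with standardized loadings. Assume, hypothetically, two indicators that each load on their own content factor and also on a shared method factor (for instance, a single informant's frame of reference), with the method factor uncorrelated with the content factors. The model-implied correlation between the two indicators is then the content contribution plus the product of the method loadings, so indicators of two unrelated constructs can still correlate. The loadings below are illustrative values, not estimates from Bank et al. (1990).

```python
def implied_indicator_correlation(content_1, content_2, content_corr, method_1, method_2):
    """Model-implied correlation between two standardized indicators.

    Each indicator loads on its own content factor (content_1, content_2) and on
    one shared method factor (method_1, method_2). All factors are standardized,
    the content factors correlate `content_corr`, and the method factor is
    assumed to be uncorrelated with both content factors.
    """
    return content_1 * content_2 * content_corr + method_1 * method_2


# Two unrelated constructs (content_corr = 0) both reported by the same informant.
spurious = implied_indicator_correlation(0.6, 0.6, 0.0, 0.5, 0.5)
print(round(spurious, 2))  # 0.25
```

Under these assumptions, replacing one informant with a different agent drives one of the method loadings toward zero, and the spurious term vanishes; that is the statistical rationale for multiagent, multimethod batteries.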
It is important to note, however, that a method factor may also
provide substantive information. In the mother data example just cited,
the method factor is labeled Mother's Frame of Reference to indicate that
this method factor may have significant predictive powers for phenom-
ena within the context in which mothers typically act (i.e., the home)
(Reid, Patterson, Bank, & Dishion, 1987). It also must be noted that confirmatory factor analytic solutions with both content and method factors may be "empirically underidentified." Empirical underidentification may be defined as a set of conditions under which the parameters in question (if estimated simultaneously) result in an indeterminate solution (i.e., a factor analytic solution that cannot converge). Therefore, investigators must embark upon such work with care (see Bank et al., 1990; Wothke, 1987).
In order to be able to detect and avoid method bias in scientific
investigations, researchers must first design studies with multiple method/
agent assessment batteries. Without the inclusion of sufficiently different
types of data, any method biases are likely to go unobserved. When
adequate data are available, SEM has proven to be quite useful as an
analytic technique for combining different types of data, detecting and
isolating method variance components, and testing central hypotheses.
Clearly, other statistical techniques can be used, but it is our purpose
here to provide examples of the utility of SEM in this area.

Rationale of SEM
The SEM approach has two major benefits: (a) it is enormously
helpful as a heuristic device, and (b) it is a powerful analytic tool in that
it allows the researcher to test and to reject a wide variety of alternative
models. As a heuristic device, one can posit different sets of structural
relationships that best sum up several theoretical approaches to the
same problem, construct models of competing explanations of the data
that are plausible within the same theoretical framework, and provide
through the generation and testing of these models a fertile ground for
other researchers to test replicability and develop and test competing
models. In addition, other investigators may succeed in extending the
usefulness of properly specified models by seeking the antecedents and
products of important processes. As a statistical method, modeling
allows a simultaneous test of (a) indicators loading on the factors they
theoretically describe, (b) structural paths existing reliably only where
hypothesized, and (c) a satisfactory fit of the model to the data. The
preparation alone to conduct such a test requires substantial under-
standing of the literature and the subject at hand in order to generate a
plausible model. Once a model is tested and found consistent with the
data, the process continues. It is now the responsibility of the investigator
to test competing models; this includes known spurious cases. The more
generalizable the competing models that can be rejected, the more
confident the researcher can be of the utility of the model(s) that cannot
be rejected.
Questions of causality. The question of making causal inferences from these longitudinal data bears further discussion. We are in strong agreement with Dwyer (1983) and Baumrind (1991): one cannot demonstrate causality from correlational data, not even when using structural equation models. Patterson et al. (1984) have suggested a next step, however, which calls for experimental manipulation based on findings with structural models. This sequence of passive longitudinal (i.e., nonexperimental) modeling followed by true experimental studies has indeed been our strategy, with SEM tests of process in longitudinal data sets pointing the way to key areas for experimental interventions in succeeding studies (see also Forgatch, 1991). In our work, we prefer to discuss structural models as suggestive of processes rather than of causality. Bentler (1980) has a helpful presentation of this issue.
Hypothesis testing. It is important to understand the strategies for
testing hypotheses, as well as the statistics, when using SEM techniques.
Ideally, hypothesis testing is carried out in essentially the same fashion
with a process model as with any of the inferential statistics familiar to
social scientists. The major differences are: (a) instead of positing a
difference between groups, as measured by one or more dependent
variables, a specific set of covariances or correlations among latent
variables is hypothesized (i.e., the model); (b) because latent variables
are not themselves measurable, each one is estimated via a network of
operational definitions or indicators (see Patterson & Bank, 1987); and (c)
a reconstructed covariance or correlation matrix based on the specified
model parameters is compared to the obtained raw data covariance/
correlation matrix.
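Point (c) can be made concrete. In a confirmatory factor model, a standard form of the reconstructed (model-implied) covariance matrix is Sigma = Lambda Phi Lambda' + Theta, where Lambda holds the factor loadings, Phi the variances and covariances of the latent variables, and Theta the residual variances plus any covaried residuals. The sketch below computes Sigma in plain Python for a hypothetical two-factor, four-indicator model and compares it with an invented "obtained" matrix S; none of the numbers come from the chapter. With maximum likelihood estimation, programs such as EQS summarize the overall S versus Sigma discrepancy in a chi-square statistic.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def implied_covariance(loadings, factor_cov, residual_cov):
    """Model-implied covariance matrix: Sigma = Lambda * Phi * Lambda' + Theta."""
    common = matmul(matmul(loadings, factor_cov), transpose(loadings))
    return matadd(common, residual_cov)


# Hypothetical model: indicators 1-2 load on factor A, indicators 3-4 on factor B.
Lam = [[0.8, 0.0],
       [0.7, 0.0],
       [0.0, 0.6],
       [0.0, 0.9]]
Phi = [[1.0, 0.4],   # factor variances fixed at 1; A-B correlation of 0.4
       [0.4, 1.0]]
Theta = [[0.36, 0.0, 0.0, 0.0],  # residual variances on the diagonal; covaried
         [0.0, 0.51, 0.0, 0.0],  # residuals would appear off the diagonal
         [0.0, 0.0, 0.64, 0.0],
         [0.0, 0.0, 0.0, 0.19]]

Sigma = implied_covariance(Lam, Phi, Theta)

# Hypothetical obtained correlation matrix S for the same four indicators.
S = [[1.00, 0.58, 0.20, 0.30],
     [0.58, 1.00, 0.15, 0.25],
     [0.20, 0.15, 1.00, 0.52],
     [0.30, 0.25, 0.52, 1.00]]

residuals = [[s - sig for s, sig in zip(row_s, row_sig)] for row_s, row_sig in zip(S, Sigma)]
max_abs_residual = max(abs(v) for row in residuals for v in row)
print(f"largest |S - Sigma| = {max_abs_residual:.3f}")
```

Small residuals like these mean the hypothesized structure reproduces the obtained matrix closely; the chi-square test turns that judgment into a formal decision.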
One additional twist occurs in evaluating the chi-square statistic, which is typically used to test for differences between the obtained and the reconstructed covariance matrices. Unlike most significance tests, here the investigator hopes for a nonsignificant result: A statistically significant chi-square indicates that the two covariance matrices differ, and the fit of the model to the data is, therefore, inadequate. A nonsignificant chi-square indicates that the reconstructed and obtained covariance matrices are similar and, thus, that the fit is adequate.
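The decision rule can be checked with a short, standard-library-only function for the upper-tail chi-square probability, using the closed-form series that exists for integer degrees of freedom (in practice a statistics package would supply this). Applied to the measurement model reported in Figure 2a, chi-square(17) = 16.356, it gives p of about .50: nonsignificant, and therefore an adequate fit by the reasoning above. The conventional .05 threshold is an assumption of the sketch.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = math.exp(-x / 2)
        total = term
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def fit_is_adequate(chi_square, df, alpha=0.05):
    """True when the model chi-square is nonsignificant at level alpha."""
    return chi2_sf(chi_square, df) > alpha


p = chi2_sf(16.356, 17)
print(f"p = {p:.3f}, adequate fit: {fit_is_adequate(16.356, 17)}")
```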
The proper interpretation of an adequate fit may be elusive. It does
not imply that the process model is "true," but that the hypothesized
theoretical interrelations, as estimated with a particular set of indicators,
are consistent with the actual data. There may well be other process
models using these same constructs that also provide adequate fits, and
possibly statistically significantly better fits, than the hypothesized
process model. Hence, it is imperative that researchers employing this
statistical technique make every effort to test those alternative models
suggested by other theorists as well as reasonable variations within their
own theory-driven models.
As we have already indicated, we view SEM with nonexperimental data sets as an extremely useful prelude to experimental studies. True experiments do, of course, allow causal attributions, but prior to that step, causal interpretations of an adequate fit should be avoided.
Some limitations of SEM. There are a variety of well-known limitations to the SEM methodology. Specification error, or not correctly positing the structure of the model to begin with, leaves us with the potential problem of analyses (some of which may appear compelling) that simply do not shed light on the actual research question. Path coefficients, for example, which may be statistically significant and interpretable in an incorrectly specified model may drop to zero when the correct specification of the model is made. Arriving at correctly specified models can only be achieved through the strategies outlined above (i.e., a careful and complete theoretical formulation based on a thorough review of the literature, and tests of alternative models, including other theoretical formulations).
Another common problem is that of underidentification. When the number of parameters to be estimated exceeds the number of distinct variances and covariances among the measured variables, in the model as a whole or in any particular part of it, the entire model or that portion of it cannot be estimated; when the two numbers are equal, the model can be estimated but leaves no degrees of freedom for testing its fit. Thus, there will be some correctly specified models that cannot be tested without making restricting assumptions to properly identify the model. A related and often limiting issue is that of adequate sample size. For SEM, sample sizes approaching 200 or greater are clearly sufficient (e.g., see Tanaka, 1987). Smaller samples may also be adequate given a straightforward hypothesis testing procedure conducted without exploratory analyses of the same data set. Smaller sample sizes may also be workable when the number of parameters to be estimated is small relative to the number of observations in a sample (a ratio of approximately 1:5 or lower, i.e., at least five observations per parameter).
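A necessary (though not sufficient) condition for identification, often called the t-rule, can be checked by counting: with p measured variables there are p(p + 1)/2 distinct variances and covariances, and the model's degrees of freedom equal that number minus the number of free parameters. The sketch below applies the count to the models in Example 1, using the indicators listed under Measures (3 + 3 + 2 = 8 for the mother-report models; 4 + 4 + 2 = 10 for the multimethod models). It assumes each indicator loads on one factor and that latent variances are fixed for scaling, and it treats a simplex as replacing three free construct correlations with two free paths. Under those assumptions it reproduces the 17 and 30 degrees of freedom reported for the measurement models and adds one degree of freedom for each simplex, consistent with the 1-df comparisons in the text.

```python
def t_rule_df(n_indicators, n_latent_params, n_covaried_residuals=0):
    """Model degrees of freedom under the counting (t-) rule.

    Assumes every indicator loads on exactly one factor and latent variances
    are fixed for scaling, so each indicator contributes one free loading and
    one free residual variance. `n_latent_params` counts free parameters among
    the latent variables (correlations or structural paths), and
    `n_covaried_residuals` counts residual pairs allowed to covary.
    """
    moments = n_indicators * (n_indicators + 1) // 2  # distinct variances and covariances
    free = 2 * n_indicators + n_latent_params + n_covaried_residuals
    df = moments - free
    if df < 0:
        raise ValueError(f"underidentified: {free} free parameters, only {moments} moments")
    return df


print("Figure 2a measurement:", t_rule_df(8, n_latent_params=3))
print("Figure 2b simplex:    ", t_rule_df(8, n_latent_params=2))
print("Figure 3a measurement:", t_rule_df(10, n_latent_params=3, n_covaried_residuals=2))
print("Figure 3b simplex:    ", t_rule_df(10, n_latent_params=2, n_covaried_residuals=2))
```

A result of zero signals a just-identified model, which can be estimated but not tested. Because the count is only a necessary condition, a model that passes it can still be empirically underidentified in the sense described earlier.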
In order to more fully explain the nature and uses of SEM, we will
present two examples from our own research illustrating the application
of this methodology. In both examples, the prediction of later delinquent
behavior in a longitudinal study of at-risk boys was undertaken. In the first
example, it was hypothesized that the effects of antisocial behavior of
mothers would be mediated through the disciplinary behavior of moth-
ers in significantly predicting their boys' delinquency. This hypothesis
can be seen as a small chunk of the theoretical framework presented in
Figure 1. An entire theoretical framework frequently cannot be tested in
a single SEM analysis; it is extremely useful, however, to be able to
scrutinize all the expected interrelationships implied by a theoretical
position as, for example, shown in Figure 1. Thus, in using a contextual
approach to explain behavior disorders in children, antisocial behavior
by the parent was one possible distal variable, and the mothers' disciplin-
ary behavior was one possible proximal variable. Based on prior studies
(e.g., Bank, Forgatch, Patterson, & Fetrow, 1991; Lahey et al., 1988), it was
anticipated that mothers' antisocial behavior and mothers' discipline
would be the most powerful predictors of their children's delinquency.
[Figure 1 graphic not recoverable. Labels: DISTAL FACTORS (Marital Adjustment; Parent Psychopathology); PROXIMAL FACTORS (Discipline; Supervision; Positive Reinforcement); CHILD ADJUSTMENT/CONDUCT DISORDER.]
Figure 1: Contextual approach to family process.

It was also hypothesized that SEM would produce the expected results only when indicators from a variety of methods and agents were combined to form the latent predictor constructs, but not when indicators based on a single agent and method (mother self-report) were used. The flexibility of SEM was further demonstrated in this example with the addition of socioeconomic status (SES) as a predictor of both parent characteristics and child outcome. Given the finding of a significant association between parenting and child outcome variables, it was necessary to ask whether SES was accounting for this relationship. This secondary hypothesis and alternatives to it were tested straightforwardly with SEM, as will be evident later in this chapter.
For the second example, we used a combination of micro and macro
indicators to predict later delinquency; the hypothesis was that through
combining microsocial and macrosocial assessment modes, better pre-
diction would be achieved. It was also of particular interest to us to
examine microsocial variables as predictors of delinquency in a 4- to 5-
year span. A parallel set of multiple regression analyses (MRA) was
performed with the second example so that we might easily compare the
SEM and MRA results.

Example 1: Mother Antisocial Behavior and Maternal Discipline as Predictors of Delinquent Behavior
Sample Description
Based on prior police contact data with juveniles, the ten highest-risk elementary schools were identified in a metropolitan area of about 150,000 persons. Schools were randomly selected and recruited from the ten identified until two cohorts, each with approximately 100 Grade 4 boys, were filled. Twenty-one families were ineligible for participation because they had moved out of state or moved before contact, or because their first language was not English. Seventy-seven percent of the remaining families agreed to participate in the study; the most common reasons stated for refusals were lack of interest and no available time. Refusers did not differ significantly from participants on teacher Child Behavior Checklist (Achenbach & Edelbrock, 1983) ratings of externalizing and internalizing behaviors. Details of the recruitment procedures for the boys (N = 206) and their families are provided in Capaldi and Patterson (1987). The sample was almost entirely caucasoid and of lower socioeconomic status, with 75% working class or unemployed. Through the fifth year (Wave 5) of data collection, well over 90% of the Oregon Youth Study (OYS) sample were still participating in the study. Families were paid up to $300 for completion of the full 23-hour assessment battery.

Measures
For Example 1, the first set of models (Figure 2) used only mother self-report indicators to define the Mother Antisocial and Mother Discipline constructs. For mothers' antisocial histories, the three indicators were Minnesota Multiphasic Personality Inventory (MMPI) 49/94 profiles (Pd and Ma) and self-reports of alcohol and illegal substance use. The MMPI score was the sum of the Pd and Ma T-scores. For alcohol consumption, we used the Michigan Alcohol Screening Test (MAST) score (Zucker, 1987), and illegal substance use was an OSLC interview scale. For mothers' disciplinary styles, the indicators were three interview-derived scales: efficacy ratings, tendency to rationalize, and self-confidence ratings. Child Delinquency was defined by actual police-recorded law violations and the boys' responses to a 32-item General Delinquency Scale developed from the Elliott delinquency measure item pool (Elliott, Ageton, Huizinga, Knowles, & Canter, 1983).
For the second round of models in Example 1, observer impressions (based on home observations) and Department of Motor Vehicles driving violation records were added to the MMPI 49 scale sum and self-reported use of illegal substances to measure mothers' antisocial behavior. To measure mothers' discipline styles, observer impressions and mother nattering during the in-home observations were added to the mothers' self-confidence and self-efficacy ratings. It should be noted that observer impressions referred specifically to mothers' antisocial behavior and discipline styles in separate sets of items on the observer impressions questionnaire. Nattering was the observed base rate of low-amplitude coercive behavior of mothers directed to the targeted children; we have also labeled this behavior "irritable discipline." The definition for the criterion construct Child Delinquency was not changed.

[Figure 2 panels. 2a, Measurement model: χ²(17) = 16.356, p = .50; BBN (Bentler-Bonett normed fit index) = .91; BBNN (Bentler-Bonett nonnormed fit index) = 1.0; no residuals covaried. 2b, Simplex model: χ² = 3.348, p = .067; χ² = 19.925, p = .34; no residuals covaried.]
Figure 2: Mother report as a predictor of boys' delinquency.
Two indicators for SES were based on Hollingshead's (1975) mea-
sures: the average of the parents' educational levels and the average of
their occupational prestige. In addition, family income was used as a third
SES indicator. For single-mother families, the data were based solely on
mothers' socioeconomic levels.
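A minimal sketch of the two Hollingshead-based SES indicators as described above: average the parents' education scores and their occupational-prestige scores, falling back to the mother's scores alone in single-mother families. The dictionary keys and score values are hypothetical, and the Hollingshead (1975) scoring rules themselves are not reproduced here.

```python
from statistics import mean


def ses_indicators(mother, father=None):
    """Return (education, occupational prestige) SES indicators for one family.

    `mother` and `father` are dicts with hypothetical keys "education" and
    "occupation" holding already-scored values. `father` is None for
    single-mother families, in which case only the mother's scores are used.
    """
    parents = [mother] if father is None else [mother, father]
    education = mean(p["education"] for p in parents)
    occupation = mean(p["occupation"] for p in parents)
    return education, occupation


two_parent = ses_indicators({"education": 4, "occupation": 5}, {"education": 6, "occupation": 3})
single_mother = ses_indicators({"education": 4, "occupation": 5})
print(two_parent, single_mother)
```

Family income, the third SES indicator, would enter the model as its own measured variable rather than being folded into these averages.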
Mother-Report Prediction Models
Figure 2a, the measurement model for the latent variable constructs Mother Antisocial, Mother Discipline, and Child Delinquency, includes all indicators, factor loadings, interconstruct correlations, and the overall fit of the model. The measurement model is a factor structure in which all measured variables are required to load on their respective factors (or theoretical constructs) and all possible correlations among constructs in the model are calculated, but no path weights are estimated. In addition, if the investigator wishes to allow any measured variable residuals to covary, this should be undertaken within the measurement model before any testing of the hypothesized structural model is attempted. Thus, the measurement model calculated in this way becomes an approximate "ceiling" for a best-fitting model given a particular set of theoretical constructs and measured variables. The measurement model is therefore, by definition, saturated at the level of the constructs; that is, all possible relations among constructs have been estimated.
In ensuing analytic steps the investigator would, of course, test the
hypothesized structural (path) model and alternatives to it, but no
additional factor loadings or covaried residuals would be included in the
structural models that were not already part of the measurement model.
Any models whose chi-square differences from the measurement model were statistically nonsignificant may be thought of as accounting for the covariation among the measures as well as the measurement model does. Significant increases in a model's goodness-of-fit chi-square, as compared to the measurement model chi-square, suggest that nontrivial aspects of the data accounted for in the measurement model were no longer accounted for. This line of reasoning was introduced by Bentler and Bonett (1980), and has become more commonly used since Anderson and Gerbing's (1988) restatement of it.
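The comparison just described is a chi-square difference test: subtract the measurement model's chi-square and degrees of freedom from those of the more restricted structural model, and evaluate the difference as a chi-square in its own right. Every comparison reported in this section involves a 1-df difference, for which the upper-tail probability reduces to erfc(sqrt(difference / 2)); the sketch therefore handles only that case, which keeps it to the standard library. The inputs are the model chi-squares reported for Figures 2 and 3; the Figure 2 simplex df of 18 is inferred from the 1-df comparison given in the text.

```python
import math


def chi_square_difference(restricted, full):
    """Chi-square difference test for nested models, 1-df case only.

    `restricted` and `full` are (chi_square, df) pairs. The restricted model
    (e.g., a structural simplex) must have exactly one more degree of freedom
    than the full model (e.g., the measurement model).
    """
    delta_chi2 = restricted[0] - full[0]
    delta_df = restricted[1] - full[1]
    if delta_df != 1:
        raise NotImplementedError("this sketch only handles 1-df differences")
    p = math.erfc(math.sqrt(max(delta_chi2, 0.0) / 2))  # upper tail of chi-square(1)
    return delta_chi2, p


comparisons = [
    ("Figure 2: simplex vs. measurement", (19.925, 18), (16.356, 17)),
    ("Figure 3: simplex vs. measurement", (33.861, 31), (31.61, 30)),
]
for label, restricted, full in comparisons:
    delta, p = chi_square_difference(restricted, full)
    verdict = "significantly worse fit" if p < 0.05 else "no significant loss of fit"
    print(f"{label}: chi2(1) = {delta:.3f}, p = {p:.3f} -> {verdict}")
```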
Returning to Figure 2a, note that only mother reports were used to define the predictor constructs; the criterion construct Child Delinquency was defined without any information from mothers. Specifically, police contacts and the boys' reports of their own delinquent activities were the indicators of delinquency. In such a model, it was our expectation that there would be no completely satisfactory solution. Given good convergence of indicators for each construct, the model might appear to fit the data, but even so, it was unlikely to account for much criterion variance. That is, mother-report data alone were not expected to provide good predictive power for later child delinquency.
All loadings were statistically significant, and all three interconstruct correlations were in the expected directions and were also significant. Indeed, the measurement model looks quite acceptable, and yet there are some serious flaws in the structural model that the reader should note.
Following Anderson and Gerbing (1988), our strategy in testing structural
models was to compare the hypothesized model, as well as other alterna-
tive models, to the fit of the measurement model. (See Patterson et al.,
1990, for further discussion of the strategy.) The hypothesized structural
model shown in Figure 2b was a simplex, with paths depicting the effect
of Mother Antisocial mediated through Mother Discipline. The simplex
model was a statement of expected relations among theoretical con-
structs: each construct was predictive of the next one, but no other direct
effects on constructs later in the sequence were expected. Thus, direct,
nonmediated effects other than the simplex sequence were useful alter-
native models to test.
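The restriction that a simplex imposes can be written out for standardized latent variables. If Mother Antisocial (A) affects Mother Discipline (D) and D affects Child Delinquency (C), with no direct A to C path, the model-implied A-C correlation is the product of the two paths. A saturated model instead estimates a direct A to C coefficient, which is zero exactly when the A-C correlation equals that product; this single constraint is why the simplex has one more degree of freedom than the measurement model. The path and correlation values below are hypothetical.

```python
def simplex_implied_correlations(b_ad, b_dc):
    """Latent correlations implied by a standardized A -> D -> C simplex.

    All latent variances are fixed at 1 and there is no direct A -> C path,
    so the only route from A to C runs through D.
    """
    return {"A-D": b_ad, "D-C": b_dc, "A-C": b_ad * b_dc}


def direct_effect_a_to_c(r_ad, r_dc, r_ac):
    """Standardized A -> C coefficient when C is regressed on both A and D.

    This is the direct path a saturated model would estimate; it is zero
    exactly when r_ac equals r_ad * r_dc, the simplex restriction.
    """
    return (r_ac - r_ad * r_dc) / (1 - r_ad ** 2)


implied = simplex_implied_correlations(b_ad=0.5, b_dc=0.4)
print(implied)
# A-C correlation matching the simplex: no direct path is needed.
print(direct_effect_a_to_c(0.5, 0.4, implied["A-C"]))
# A-C correlation above the product: a direct effect remains to be explained.
print(direct_effect_a_to_c(0.5, 0.4, 0.35))
```

If the freely estimated A-C correlation in the measurement model lies close to the product of the simplex paths, imposing the restriction costs little fit, which is what a nonsignificant chi-square difference between the two models indicates.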
The factor loadings, though not included in Figure 2b, were essentially unchanged as compared to the measurement model, though the loading for the police contact indicator was only marginally significant (p less than .10). Note that the path from Mother Antisocial to Mother Discipline was marginally significant, while the path from Mother Discipline to Child Delinquency failed to reach significance at all. In this example, the factor loading for police contacts was almost identical to its loading in the measurement model, and the path coefficients were actually slightly larger than the corresponding correlations in the measurement model. What changed was the standard error estimate for police contacts. In SEM, standard error estimates for the same parameter that vary widely across closely related models may indicate a spurious solution, regardless of goodness of fit.
The fit of the simplex was adequate, and was not statistically significantly poorer than that of the measurement model, χ²(1) = 3.57, p greater than .05. Therefore, judged on fit alone, one would conclude that the simplex structural model provided a good solution that did not differ significantly from the measurement model. Nonetheless, it did not lend much support to our hypothesis. That is, of the two hypothesized paths, only one approached significance, and very little criterion variance in Child Delinquency was accounted for; in addition, increases in some standard error estimates caused us some concern. Therefore, this model was not acceptable to us.
Alternative models using mother-report prediction. Several alternatives to the hypothesized simplex model were also tested. The first was a saturated solution with all three possible paths present. By definition, the fit of the saturated model must equal that of the measurement model, but none of the three paths reached statistical significance. Thus, this model was discarded. In the second alternative model, Mother Antisocial was tested as the predictor construct for both Mother Discipline and Child Delinquency, and the two paths were, in fact, statistically significant. This model also fit the data satisfactorily, χ²(18) = 22.08, p = .20, though the fit was significantly poorer than that of the measurement model, χ²(1) = 4.52, p less than .05. Furthermore, the covariance between the Mother Discipline and Child Delinquency construct residuals was highly significant, which suggested that the path between them must be returned to the model for the model to be correctly specified. In summary, then, these analyses with mother-report data as predictor variables shed little light on the correctness of the hypothesized simplex or of either of the alternative models tested.
The simplex model using multimethod and -agent prediction. We believed a more adequate test of the simplex hypothesis could be obtained through the use of multiple agents and methods in defining each of the latent variable constructs. To this end, a second set of models is presented in Figure 3. Note that the Department of Motor Vehicles' records and observers' impressions from home visits to each of the participating families were added to the mothers' self-reports of drug use and MMPI 49/94 profile sums in defining Mother Antisocial. Thus, mother report was not rejected as indicative, but was used with additional informant-method sources. It was the convergence of these several sources that we felt would provide increased predictive power.

[Figure 3 panels. 3a, Measurement model: χ²(30) = 31.61, p = .39; BBN = .904; BBNN = .991; residuals covaried: Observer Impressions (Antisocial) with Observer Impressions (Discipline), and Discipline Effectiveness with Discipline Self-Confidence ratings. 3b, Simplex model: χ²(31) = 33.861, p = .33; simplex model vs. measurement model, χ²(1) = 2.251, p > .05; BBN = .90; BBNN = .985; residuals covaried as in Figure 3a.]
Figure 3. Multimethod and -agent mother predictors of boys' delinquency.
Similarly, nattering and observer impressions were added to the mothers' self-reports of their own discipline styles and effectiveness in defining Mother Discipline. The operational definition for the latent criterion variable Child Delinquency remained unchanged. The measurement model depicted in Figure 3a looks much like the measurement model in Figure 2a. The obvious differences are the addition of the indicators just noted and the higher correlations among all constructs in the multiple agent/method model as compared to the mother-report-only model. A less obvious difference was the need to covary two pairs of residuals to arrive at a satisfactory fit for the model in Figure 3a: the two observers' impressions residuals were covaried, as were the residuals of the two mothers' reports of discipline. The decision to allow these particular residuals to covary was made at the measurement model level (i.e., not during the hypothesis testing process) and is consistent with our own approach for statistically handling method effects (Bank et al., 1990). Covarying residuals is a way of parceling out unique chunks of variance that would otherwise contribute noise in evaluating a model. Covariances of this type should be in a direction consistent with expectation; in the current problem, both covariances should be, and were, positive.
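To see why a covaried residual matters, one can compute the standard confirmatory factor covariance structure, Σ = ΛΦΛ′ + Θ, directly. The minimal Python sketch below uses hypothetical values: the loadings (.6 and .7), the factor correlation (.5), and the residual covariance (.10) are invented for illustration and are not estimates from Figure 3. It shows how a shared method effect between two observer-impression indicators raises their implied correlation even though the construct correlation stays fixed.

```python
def implied_covariance(loadings, phi, theta):
    """Model-implied covariance matrix Sigma = Lambda * Phi * Lambda' + Theta.

    loadings: p x m nested list of indicator-by-factor loadings (Lambda)
    phi:      m x m factor covariance matrix (Phi)
    theta:    p x p residual covariance matrix (Theta); off-diagonal
              entries are the covaried residuals
    """
    p, m = len(loadings), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            common = sum(
                loadings[i][f] * phi[f][g] * loadings[j][g]
                for f in range(m)
                for g in range(m)
            )
            sigma[i][j] = common + theta[i][j]
    return sigma


# Hypothetical standardized example: indicator 0 is an observer-impression
# measure of an antisocial construct, indicator 1 an observer-impression
# measure of a discipline construct.
LAMBDA = [[0.6, 0.0],
          [0.0, 0.7]]
PHI = [[1.0, 0.5],
       [0.5, 1.0]]

# Residual variances chosen so each indicator has unit total variance.
THETA_UNCORRELATED = [[1 - 0.6 ** 2, 0.0],
                      [0.0, 1 - 0.7 ** 2]]
THETA_METHOD = [[1 - 0.6 ** 2, 0.10],
                [0.10, 1 - 0.7 ** 2]]

without_method = implied_covariance(LAMBDA, PHI, THETA_UNCORRELATED)
with_method = implied_covariance(LAMBDA, PHI, THETA_METHOD)
print(round(without_method[0][1], 3), round(with_method[0][1], 3))  # 0.21 0.31
```

With the construct correlation held at .5, the implied indicator correlation rises from .21 to .31 once the residuals covary. A model that omitted that residual covariance would have to absorb the shared method variance somewhere else, for example in an inflated construct correlation.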
All factor loadings for the measurement model were statistically significant, as were the three latent variable construct intercorrelations and the two covaried pairs of residuals. The model fit satisfactorily, χ²(30) = 31.61, p = .39. The test of the hypothesized simplex is presented in Figure 3b. All loadings were statistically significant and essentially the same as in the measurement model. The hypothesized paths were also significant, and the fit was adequate, χ²(31) = 33.861, p = .33. The fit of the model was not significantly poorer than that of the measurement model: the chi-square difference test gave χ²(1) = 2.251, p greater than .10. In addition, the correlation between Mother Antisocial and the Child Delinquency residual was nonsignificant, χ²(1) = 2.22, p = .14. This nonsignificant correlation is consistent with the hypothesis that the effects of Mother Antisocial on Child Delinquency are mediated through Mother Discipline and that there is no significant direct effect of Mother Antisocial on Child Delinquency.
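The nested-model comparisons in this section are chi-square difference tests. The difference between two nested models' χ² values is referred to a χ² distribution whose df equals the difference in their degrees of freedom. Below is a minimal, dependency-free Python sketch. The closed-form survival function for integer df is standard, but the function names are ours. The inputs are the χ² statistics reported in this chapter, so a computed difference can differ from a reported one in the last digit because of rounding.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the closed-form series for integer degrees of freedom, so no
    external statistics library is needed.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test for two nested models.

    The restricted model (e.g., a simplex) has more degrees of freedom than
    the fuller model it is nested in (e.g., the measurement model).
    Returns (delta_chi2, delta_df, p).
    """
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df < 1:
        raise ValueError("restricted model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Model comparisons using fit statistics reported in this chapter.
comparisons = {
    "Fig. 3 simplex vs. measurement": (33.861, 31, 31.61, 30),
    "Fig. 3 alternative vs. measurement": (40.04, 31, 31.61, 30),
    "Fig. 4 simplex vs. measurement": (71.602, 58, 64.712, 55),
    "Fig. 4 alternative vs. measurement": (68.82, 58, 64.712, 55),
}
for label, args in comparisons.items():
    d_chi2, d_df, p = chi2_difference(*args)
    print(f"{label}: chi2({d_df}) = {d_chi2:.2f}, p = {p:.3f}")
```

For the Figure 3 simplex comparison, χ²(1) = 2.251 corresponds to p of roughly .13. That is nonsignificant at .05, so the more parsimonious simplex is retained.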
STRUCTURAL EQUATION MODELING 55

Alternative models using multimethod and -agent prediction. Several alternative models were again fitted to the data set. The saturated model included the Mother Antisocial to Child Delinquency path; the model fit was, of course, equal to that of the measurement model, but also, as expected, the path from Mother Antisocial to Child Delinquency was statistically nonsignificant (beta = .16, t = 1.33). Thus, this alternative model reduced to the simplex. The second alternative again tested Mother Antisocial as the predictor of both Mother Discipline and Child Delinquency, and the Mother Discipline to Child Delinquency path was dropped. That is, the mother's antisocial behavior was posited as the direct cause of both her own inept discipline techniques and her son's antisocial behavior; any effect of poor discipline on the boy's behavior was, however, denied. This model fit satisfactorily, χ²(31) = 40.04, p = .13, and the two structural paths were statistically significant. The model was, however, statistically poorer in fit than the measurement model, χ²(1) = 8.40, p less than .01. In addition, the covariance between the Mother Discipline and Child Delinquency residuals was highly significant, suggesting that a properly specified model must also include the path from Mother Discipline to Child Delinquency. Thus, this model also reduced to the hypothesized simplex.
These results are quite clear: the data analyses supported the hy-
pothesized simplex model (i.e., that mothers' antisocial behavior would
produce poor disciplinary techniques, which would lead, in turn, to sons'
delinquent activities), and the alternative models can be discarded.
Obviously, an investigator who had collected only mother-report data for
the predictor constructs would have failed to confirm the hypothesis. We
believe that other definitions for the criterion variable could also be used
(e.g., at school by discipline contacts, truancy, poor grades, and school
dropout) and parallel findings would emerge.
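The simplex hypothesis can be stated as a constraint on correlations. In a standardized chain A → B → C with no direct A → C path, the implied A–C correlation is the product of the two paths, and the partial correlation of A and C controlling for B is zero. The Python sketch below illustrates this with hypothetical path values, not the estimates in Figure 3. Here A, B, and C stand in for Mother Antisocial, Mother Discipline, and Child Delinquency, and the negative B → C path assumes B is scored so that higher means more effective discipline. This is a simplified observed-variable illustration; the SEM analyses test the same constraint among latent constructs.

```python
import math


def implied_correlations(a, b, c=0.0):
    """Implied correlations for standardized constructs A -> B -> C.

    a: path A -> B; b: path B -> C; c: optional direct path A -> C
    (c = 0 gives the pure simplex). Returns (r_ab, r_bc, r_ac).
    """
    r_ab = a
    r_bc = b + c * a
    r_ac = c + a * b
    return r_ab, r_bc, r_ac


def partial_r(r_ac, r_ab, r_bc):
    """Partial correlation of A and C controlling for B."""
    return (r_ac - r_ab * r_bc) / math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))


# Hypothetical paths; compare the pure simplex with a model that adds a direct path.
for direct in (0.0, 0.2):
    r_ab, r_bc, r_ac = implied_correlations(0.5, -0.4, c=direct)
    print(f"direct path {direct}: r_AC = {r_ac:.2f}, "
          f"partial r_AC.B = {partial_r(r_ac, r_ab, r_bc):.2f}")
```

Under the pure simplex the partial correlation is zero. Adding a direct path makes it nonzero, which is the kind of departure the alternative models in this section were designed to detect.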

One More Step...


The flexibility and elegance of SEM in hypothesis testing can be seen in a small digression in which we add SES to the problem we have just solved. Several studies have already demonstrated that the impact of SES on child behavior outcomes is mediated through parenting variables (Bank, Forgatch, et al., 1991; Larzelere & Patterson, 1990). Patterson and Capaldi (1991) used a different strategy: they partialed out the effects of SES from mothers' antisocial behavior and discipline techniques and then subjected the residual correlation matrix to SEM.¹
On the other hand, it is very reasonable to hypothesize that SES may have a direct impact on any or all of the three constructs of interest in this example. Some investigators have chosen to operationally define SES based on occupational status and education, as with the Hollingshead
(1975) Index (see Larzelere and Patterson, 1990, for a discussion of this point), while others have included income level (e.g., Bank, Forgatch, et al., 1991). For the current example, all three SES variables have been used as indicators (see Figure 4a). The SES data were collected at the same time as the other Grade 4 (first year) data, but it is assumed that income, occupation, and education for both parents reflect a relatively stable status for at least some extended period prior to the collection of the Wave 1 data.

Figure 4. Parent SES and multimethod and -agent mother predictors of boys' delinquency (criterion: Grades 7-8). (a) Measurement model: χ²(55) = 64.712, p = .174. Residuals covaried: Observer Impressions (Mother Antisocial) with Observer Impressions (Maternal Discipline); Self-Rate (Maternal Discipline) with Self-Confidence (Maternal Discipline); Education (Parent SES) with Drugs (Mother Antisocial); Income (Parent SES) with Self-Rate (Maternal Discipline). (b) Simplex model: R² = .26; χ²(58) = 71.602, p = .108; simplex model vs. measurement model, χ²(3) = 6.89, p > .05. Residuals covaried: Observer Impressions (Mother Antisocial) with Observer Impressions (Maternal Discipline); Effectiveness (Maternal Discipline) with Self-Confidence (Maternal Discipline); Education (Parent SES) with Drugs (Mother Antisocial); Income (Parent SES) with Effectiveness (Maternal Discipline).

Thus, it is reasonable to hypothesize the simplex model once again. With four latent constructs (SES to Mother Antisocial to Mother Discipline to Child Delinquency), it is certainly the most parsimonious model, and, as the preceding analyses have illustrated, competing models are easily compared to it. Most important to us, however, the simplex model closely approximates our theoretical framework for explaining the development of antisocial behavior and therefore represents the strongest test of the theoretical model. It should also be emphasized that paths from SES to Mother Discipline and Child Delinquency could still be acceptable within our framework, as long as the paths from Mother Antisocial to Mother Discipline and from Mother Discipline to Child Delinquency remain statistically significant.
Referring now to Figure 4a, and again following the Anderson and Gerbing (1988) strategy outlined above, it is clear that the fit of the measurement model is adequate, χ²(55) = 64.712, p = .174. Note that it was necessary to covary four pairs of residuals in fitting the measurement model. Two of these pairs had already been covaried in Figure 3a: the two observers' impressions residuals and the two mothers' discipline self-report residuals. In addition, the residuals for parent education and mothers' self-reports of drug and alcohol use were covaried, as were those for parent income and mothers' self-ratings of discipline effectiveness. All of these covariances were statistically significant. The results of fitting the data to the hypothesized simplex model are presented in Figure 4b. Note that all factor loadings were statistically significant and very close in magnitude to their respective loadings in the measurement model. The fit was acceptable, χ²(58) = 71.602, p = .108, and all three simplex paths were also statistically significant. The chi-square difference between the measurement and simplex models was not significant, χ²(3) = 6.89, p greater than .05. These findings are consistent with the hypothesis. They also agree with prior findings suggesting that, in predicting child outcomes, the effect of SES is entirely mediated through parent variables.

Figure 4c. Best alternative model (criterion: Grades 7-8): χ²(58) = 68.82, p = .156; R² = .32; alternative model vs. measurement model, χ²(3) = 4.11, p > .05. Residuals covaried: Observer Impressions (Mother Antisocial) with Observer Impressions (Maternal Discipline); Effectiveness (Maternal Discipline) with Self-Confidence (Maternal Discipline); Education (Parent SES) with Drugs (Mother Antisocial); Income (Parent SES) with Effectiveness (Maternal Discipline).

A series of alternative models was tested, including direct paths from SES to Mother Discipline and Child Delinquency; these models all reduced to the hypothesized simplex. A second series of alternative models again used direct paths from SES to the other constructs, but this time the hypothesized paths from Mother Antisocial to Mother Discipline and from Mother Discipline to Child Delinquency were dropped systematically (one at a time, then both simultaneously). Again, none of these models worked as well as the hypothesized simplex. Finally, an alternative model with SES accounting for all other effects was tested (Figure 4c). That is, it was assumed that SES directly accounted for all other effects in the model and that no other latent variable predicted unique variance beyond the predictive power of SES. This alternative model fit very well, χ²(58) = 68.82, p = .156, and the chi-square difference between the measurement and alternative models was not significant, χ²(3) = 4.11, p greater than .05. In this alternative model, the covaried latent construct residuals for Mother Discipline and Child Delinquency approached statistical significance, χ²(1) = 3.67, p = .055; these residuals were, of course, expected to relate significantly once the construct-to-construct path was dropped. If this relation is taken as evidence that the Mother Discipline to Child Delinquency path must be reinserted to correctly specify the model, then the alternative model in Figure 4c also reduces to the hypothesized simplex. The SES to Child Delinquency path is then no longer significant. Like dominoes, the covaried construct residuals for Mother Antisocial and Mother Discipline then reach significance, and reinserting that path results in statistical significance for the simplex paths only.
It is our position, however, that the SES direct-impact alternative should not be considered a viable alternative to the hypothesized simplex model. A detailed discussion of the SES alternative model will not be attempted here, but two important points must be made. First, using the Larzelere and Patterson (1990) education/occupation SES definition, the SES direct-impact model is no longer viable; obviously, income is playing a significant role (see also Capaldi & Patterson, in press). Second, the simplex model provides specific points for intervention, for example, with mother discipline techniques. Furthermore, a number of studies have already demonstrated the efficacy of parent training interventions (e.g., Bank, Marlowe, Reid, Patterson, & Weinrott, 1991; Chamberlain, 1990; Patterson, Chamberlain, & Reid, 1982). Thus, although the SES direct-impact model may be a viable theoretical alternative, the social interactional mediational model represented by the simplex in this example appears to us to be of greatest practical utility for implementing change in the desired areas.

Example 2: On the Generalizability of Micro and Macro Indicators
Our second example to illustrate an application of SEM concerns the
combination of data from the micro and macro assessment modes. Before
considering the specifics of this example, however, it will be helpful to
examine certain underlying assumptions and perspectives.

Some Assumptions
We assume that the adequate definition of a trait construct requires a representative sampling of indicators. This Brunswikian perspective could be satisfied by taking a stratified sample from the three-dimensional block presented in Figure 5. The three dimensions that define the space are agents (e.g., teachers, observers, parents, child self-report, peers), settings (e.g., home, school, community), and methods (e.g., micro, macro). Within micro methods there would be, of course, a variety of assessment techniques that one might also wish to consider, such as videotaping, event sampling, or coding in real time. Within macro assessment, one might consider sampling from questionnaires, interviews, telephone reports of recent occurrences, or responses to vignettes. In passing, we note that there are some recent ingenious developments in assessment that cut across the micro-macro distinction. For example, Larson, Richards, Rafaelli, Ham, and Jewell (1990) use a radio signal device to indicate to the child that s/he is to briefly describe where s/he is and what s/he is doing.

Figure 5. Possible respondents/raters using a variety of micro- and macroanalytic measurement modalities. (Settings shown: Home, School, Community, Laboratory.)
What is a representative sampling of micro indicators? As an arbitrary rule of thumb, we might specify observation of interaction sequences
that sample at least two settings, and the children's interactions with at
least two agents within each setting. For example, in the present report we
had observation data collected on the children's reactions to peers and
to teachers. Ideally, we might wish to sample interactions both within the
classroom and on the playground; but of these two, we might select the
playground because of the richer opportunity for the performance of the
trait in question. Within the home, the hour just prior to dinner time
seems to lend itself to the performance of coercive behaviors, so we might
sample the children's interactions with siblings and parents in that
setting. It can be seen that there are some cells within the block that would
probably not be included in our representative sampling (e.g., teachers
in the home or siblings and parents in the classrooms).
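The three-dimensional sampling block and the rule of thumb just described can be made concrete as a small enumeration. The Python sketch below is hypothetical. The category lists, the excluded cells, and the function names are our own encoding of the examples in the text. For simplicity, it treats the "agents" in the rule of thumb as the child's interaction partners within a setting.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding of the agent x setting x method block.
AGENTS = ("teacher", "parent", "sibling", "peer")
SETTINGS = ("home", "school", "community", "laboratory")
METHODS = ("micro", "macro")

# Cells the text suggests would rarely be sampled.
EXCLUDED = {("teacher", "home"), ("sibling", "school"), ("parent", "school")}


def candidate_cells():
    """All (agent, setting, method) cells except the excluded agent-setting pairs."""
    return [
        (agent, setting, method)
        for agent, setting, method in product(AGENTS, SETTINGS, METHODS)
        if (agent, setting) not in EXCLUDED
    ]


def meets_micro_rule(cells, min_settings=2, min_agents=2):
    """Rule of thumb: micro sampling covers at least `min_settings` settings,
    with at least `min_agents` distinct agents observed in each sampled setting."""
    agents_by_setting = defaultdict(set)
    for agent, setting, method in cells:
        if method == "micro":
            agents_by_setting[setting].add(agent)
    if len(agents_by_setting) < min_settings:
        return False
    return all(len(agents) >= min_agents for agents in agents_by_setting.values())


# Plan modeled on the text: peers and teachers at school, siblings and parents at home.
plan = [
    ("peer", "school", "micro"),
    ("teacher", "school", "micro"),
    ("sibling", "home", "micro"),
    ("parent", "home", "micro"),
]
print(meets_micro_rule(plan))       # True
print(meets_micro_rule(plan[:3]))   # False: only one agent observed at home
```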
It is assumed that each indicator has been shown to have acceptable
psychometric properties. In the case of macro measures, the alpha
measure of internal consistency or test-retest reliability would suffice. In
the case of micro measures, this problem becomes a little more complex.
What is needed is some demonstration that the amount of interaction
sampled provides an adequate estimate for the trait. While this problem
received a good deal of attention in the early 1940s (e.g., Arrington, 1943),
it has, with some exceptions (e.g., Patterson & Cobb, 1973), been ignored
in recent years. In the Patterson and Cobb chapter, it was shown that 20
minutes of sampling target child behavior in each of three home observa-
tion sessions was barely adequate as an estimate for a summary score for
aggression. Because most studies using observational techniques fail to demonstrate the adequacy of micro sampling, this is a serious problem: observers may be reliable coders yet fail to collect an adequate sample of what the child is doing.
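For macro measures, the internal consistency criterion mentioned above is usually Cronbach's alpha. For micro measures, one common way to ask how much observation is enough is the Spearman-Brown prophecy formula. It projects the reliability of a score aggregated over k comparable sessions from the reliability of a single session. The Python sketch below implements both in their textbook forms. Applying Spearman-Brown to observation sessions assumes the sessions are parallel measurements, and the numbers in the example are hypothetical rather than values from Patterson and Cobb (1973).

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores for the same n respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


def spearman_brown(r_single, k):
    """Projected reliability of a score aggregated over k parallel sessions."""
    return k * r_single / (1 + (k - 1) * r_single)


def sessions_needed(r_single, target, max_sessions=1000):
    """Smallest number of parallel sessions whose aggregate reaches `target` reliability."""
    for k in range(1, max_sessions + 1):
        if spearman_brown(r_single, k) >= target - 1e-12:  # tolerance for float rounding
            return k
    raise ValueError("target not reachable within max_sessions")


# Hypothetical example: a single observation session with reliability .50.
print(round(spearman_brown(0.50, 3), 3))   # 0.75 for three sessions
print(sessions_needed(0.50, 0.80))         # 4 sessions to reach .80
```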

Macro Continuities Across Time and Settings

The working hypothesis is that, on the average, what people say about a child (macro variables) is isomorphic with what a child is actually observed to do (micro variables) in a variety of settings. To the lay person, this is probably a truism; but to the investigator of children's social behavior, the assertion is counterintuitive on several points. First, it runs counter to what was once accepted dogma, to wit, that personality traits could not be shown to be stable across either time or settings (Mischel, 1968). However, Olweus's (1979) well-known review of findings from longitudinal studies laid to rest the notion that there is no evidence for trait stability across time. More recently, the other constraint, concerning different settings, has also been laid to rest by Wright's (1983) beautifully designed study. He demonstrated conclusively that aggregating across raters and across time within settings results in across-setting stability for traits such as children's aggression. The present writers view Mischel's early conclusions as resting on inadequate assessment. Given the data at hand in 1968, his conclusions were accurate: there was no solid basis for assertions about either kind of stability. The situation is quite different now.
But even if one grants that there are now data demonstrating generalizability across time and settings for children's antisocial behavior, the correlations between teachers' and parents' ratings of the antisocial behavior trait are still typically in the .2 to .4 range. The correlations between home observations and
laboratory observations of family interaction are not much better. The
average correlation between micro and macro measures of antisocial
behavior is probably a good deal lower than either of these, that is, in the
.1 to .3 range. These weak to moderate correlations are hardly sufficient
grounds for the belief in a single underlying latent construct that provides
a royal road to the prediction of future delinquency.
It seems clear that, when working with measured variables at the bivariate correlational level for both micro and macro measures, our
typical approaches include inordinate amounts of error in our estimates
of a trait. One advantage of the use of SEM in combining different data
types is the ability to estimate disattenuated relations among latent
variable constructs defined by converging indicator sets.
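The classical correction for attenuation gives a rough preview of what latent variables gain. It divides an observed correlation by the square root of the product of the two measures' reliabilities. SEM estimates disattenuated relations differently, from the pattern of covariation among several indicators per construct. The Python sketch below is therefore only the simpler classical analog, and the correlation and reliabilities in the example are hypothetical.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated correlation between true scores."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical: an observed micro-macro correlation of .30 between measures
# with reliabilities .60 and .70.
print(round(disattenuate(0.30, 0.60, 0.70), 2))  # 0.46
```

If the reliabilities are underestimated, the corrected value can exceed 1. This is one reason multi-indicator latent constructs are preferable to a single correction formula.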

The Data
The present discussion is focused on the child's antisocial trait.
Presumably, however, what we have to say about this trait would apply
to other traits as well. The problem of interest lies in the typical finding of only low-order covariations between two kinds of measures of antisocial behavior: those based on observation (micro variables) and those based on global ratings by resident adult observers (macro variables) or on child self-report. In this example, we take the position that micro- and macro-based measures of the antisocial trait should be solidly intercorrelated. Presumably, the usual methods of assessment and analysis contain built-in biases and sampling errors that contribute to underestimates of the covariations between methods.

Sample Description
For this second example, a subsample of 80 boys and their families,
teachers, and peers was used from the full (N = 206) OYS sample. The
characteristics of the full sample were described in detail in the first
example for this chapter. Classroom and playground data were collected on the subsample, which comprised the 40 boys with the highest scores on the antisocial behavior construct in the full sample and a second group of 40 randomly selected from the remaining 166. Issues of comparability of the
subsample and full sample are discussed in Ramsey, Patterson, and
Walker (1989). There were no statistically significant differences between
the full sample and the subsample on a variety of demographic and
construct scores, and intercorrelations among variables and constructs
were highly similar (within .05) in virtually all cases of interest.
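The subsample design (the 40 highest-scoring boys plus 40 drawn at random from the remaining 166) can be sketched as follows. This Python version is illustrative only. The scores are invented, the seed is arbitrary, and ties at the cutoff are broken by sort order rather than by any rule reported in the source.

```python
import random


def extreme_plus_random(scores, n_top=40, n_random=40, seed=0):
    """Select the n_top highest scorers plus n_random drawn from everyone else.

    scores: dict mapping participant id -> antisocial construct score.
    Returns (top_group, random_group) as lists of ids.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_group = ranked[:n_top]
    remainder = ranked[n_top:]
    rng = random.Random(seed)  # seeded so the draw is reproducible
    random_group = rng.sample(remainder, n_random)
    return top_group, random_group


# Hypothetical full sample of 206 boys with made-up construct scores.
rng = random.Random(1)
full_sample = {boy_id: rng.gauss(0.0, 1.0) for boy_id in range(206)}
top, rest = extreme_plus_random(full_sample)
print(len(top), len(rest), len(full_sample) - len(top))  # 40 40 166
```

Because the top group is an extreme group, analyses pooling the 80 boys are not a simple random sample. This is why the comparability checks reported above matter.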

Measures
For this example, the delinquency criterion construct was defined in
precisely the same way as in the first example. The Macro Child Antisocial
construct used teacher and parent reports and peer sociometric results.
The teacher indicator used only CBCL items, while the parent indicator
was an average of CBCL, parent interview and telephone interview
responses. The peer scale was scored from classroom peers' nominations of the boys on nine items relating to overt and covert antisocial
behaviors. These measures were all collected during the first year of the
study.
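The parent indicator is described as an average of CBCL, parent interview, and telephone interview responses. Because those sources are on different scales, one common way to form such an average is to standardize each source first, and the Python sketch below does that. The text does not state whether the original scores were standardized before averaging, so treat the z-scoring step and the example data as assumptions.

```python
import statistics


def zscores(values):
    """Standardize a list of scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def composite_indicator(*sources):
    """Average several same-length score lists after z-scoring each one."""
    n = len(sources[0])
    if any(len(s) != n for s in sources):
        raise ValueError("all sources must score the same respondents")
    standardized = [zscores(s) for s in sources]
    return [statistics.mean(column) for column in zip(*standardized)]


# Hypothetical scores for four boys from three parent-report sources.
cbcl = [12, 30, 5, 22]
interview = [2.0, 4.5, 1.0, 3.0]
telephone = [0, 3, 1, 2]
print([round(x, 2) for x in composite_indicator(cbcl, interview, telephone)])
```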
The Micro Child Antisocial construct used two measures collected during Wave 1 and two collected during Wave 2 (Grade 5). The Wave 1 measures were total negative process (TNP) during a problem-solving task and negative interactions with siblings during home observation. The Wave 2 measures were academic engaged time (AET) in the classroom and negative playground behavior (Playneg) with peers. AET and Playneg are both school-based variables, and these data were gathered only during Grade 5 and only for the subsample of 80, as explained above. The Wave 1 (Grade 4) measures are described in detail in Capaldi and Patterson (1988). Detailed descriptions of the delinquency variables may be found in Larzelere and Patterson (1990). Details of the school-based measures, AET and Playneg, are included in Ramsey et al. (1989).

Predicting Delinquency from Micro and Macro Constructs


In this example we present macro data with indicators from parents,
teachers, and peers using questionnaire, interview, and peer sociometric
methods to define the latent antisocial trait. A group of indicators taken from across multiple settings forms a micro-based latent construct for the same antisocial trait; these indicators consisted of coded observations in the home, on the playground, in the classroom, and in the laboratory. Confirmatory factor analysis was used to evaluate whether these two latent constructs might most parsimoniously be considered a single construct instead of two separate constructs (see Patterson and Bank, 1987, for details of this procedure). SEM was then used to test the hypothesis that both the micro and macro latent constructs contributed equally to predicting delinquent behavior assessed 4 years later.
The findings suggested that, given proper measurement, micro and macro data sets can be thought of as highly overlapping and nearly interchangeable in prediction. Figure 6 includes the intercorrelations of the macro and micro variables along with the delinquency variables. Note that our ability to predict the two delinquency indicators was quite good. In fact, every zero-order correlation with a delinquency indicator was statistically significant, with a single exception involving negative synchronous interchanges between the target child and sibling (from home observations). That variable predicted the target children's later self-report of delinquent acts but not their official records of police contacts.

Figure 6. Delinquency, macro, and micro intercorrelations.

                                        1      2      3      4      5      6      7      8      9
Delinquency
  1. Elliott Self-Report             1.00
  2. Police contacts                 .526   1.00
Macro
  3. Teacher CBCL                    .427   .403   1.00
  4. Parent report                   .550   .489   .687   1.00
  5. Peer nomination                 .320   .445   .720   .588   1.00
Micro
  6. Academic engaged time (AET)    -.441  -.433  -.386  -.450  -.397   1.00
  7. Play negative                   .365   .442   .329   .324   .376  -.641   1.00
  8. Total negative process          .495   .294   .152   .345   .017  -.224   .090   1.00
  9. Negative w/sibs                 .210   .056   .009   .335   .035  -.242   .077   .234   1.00

Figure 7. Micro/Macro confirmatory factor analyses. (a) Two-factor model: χ²(12) = 19.1, p = .086; BBN = .91; BBNN = .94. (b) Single-factor model: χ²(13) = 51.04, p < .001; BBN = .76; BBNN = .68. Residuals covaried: Macro Parent with Micro Home Observations.

Confirmatory Factor Analyses

The two confirmatory factor analyses are illustrated in Figure 7. The fit of the 2-factor model was adequate, while that of the single-factor model was not. Note that the 2-factor model fits statistically significantly better than the single, collapsed micro/macro factor, χ²(1) = 8.3, p less than .01. Thus, though it is apparent that the Micro and Macro Child Antisocial constructs were intercorrelated (r = .52, p less than .01), it is also clear that they must be treated as separate constructs. It should also be noted that the lower convergent validities for the micro indicators in Figure 6 may be due to the timing of data collection: the home observation and laboratory task data were collected during Grade 4, while the classroom and playground observations were conducted when the boys were in Grade 5. Negative playground behaviors, in particular, fail to relate significantly to sibling synchronicity at home and total negative process in the laboratory task. In addition, the relatively large intercorrelation between classroom and playground observations is likely due in part to the shared setting (school) and interactants (peers, teachers). Also, note that the residuals for target child-sibling synchronicity and parent report were allowed to covary; this was done in both confirmatory analyses and represents the unique variance of the home setting.

Measurement and Structural Models


As in Example 1, the Anderson and Gerbing (1988) strategy of testing first a measurement model, then comparing it to the fit of the hypothesized structural model, was used. The fit of the measurement model is acceptable, χ²(22) = 30.79, p = .10, with two pairs of residuals covarying:
(1) the boy-sibling synchronicity with parent report as noted above, and
(2) total negative process of the boy and his mother (and father if present)
during the problem-solving laboratory task with the target child's later
self-report of delinquent behavior. Note that the measurement model
correlation estimates of both Micro and Macro Antisocial with later
Delinquency were r = .44. As in the confirmatory analyses, the negative
synchronicity variable only marginally loaded on the micro factor (p less
than .10), while all other loadings and paths were statistically reliable.
The fit of the structural model is depicted in Figure 8; all loadings were
essentially identical to those of the measurement model. Both the Micro
and Macro Antisocial paths to Delinquency are statistically significant.
The fit was identical to that of the measurement model (i.e., the hypoth-
esized model is saturated).
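The figures report BBN and BBNN, which we read as the Bentler-Bonett normed and nonnormed fit indices. Both compare the fitted model's χ² with that of a null (independence) model. The Python sketch below uses the standard formulas. The null-model χ² is not reported in this section, so the value in the example is hypothetical, and the results are not meant to reproduce the BBN and BBNN values in Figure 8. The null df assumes nine measured indicators, as in Figure 6.

```python
def bentler_bonett_nfi(chi2_model, chi2_null):
    """Normed fit index (BBN): proportional reduction in chi-square relative to the null model."""
    return (chi2_null - chi2_model) / chi2_null


def bentler_bonett_nnfi(chi2_model, df_model, chi2_null, df_null):
    """Nonnormed fit index (BBNN, also known as the Tucker-Lewis index); compares chi2/df ratios."""
    null_ratio = chi2_null / df_null
    return (null_ratio - chi2_model / df_model) / (null_ratio - 1.0)


# Model chi-square from the text (22 df); the null-model value is hypothetical.
# Nine indicators give 9 * 8 / 2 = 36 df for an independence model.
chi2_model, df_model = 30.79, 22
chi2_null, df_null = 400.0, 36
print(round(bentler_bonett_nfi(chi2_model, chi2_null), 3))
print(round(bentler_bonett_nnfi(chi2_model, df_model, chi2_null, df_null), 3))
```

Unlike the normed index, the nonnormed index can exceed 1 when the model's χ²/df ratio falls below 1. It also penalizes models that spend many parameters to achieve their fit.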
An interesting alternative to the hypothesized model is to construe the total negative process residual as predictive of later delinquency rather than simply allowing it to covary with the residual of the Child Delinquency self-report measured variable. This alternative is particularly interesting because total negative process loads poorly on the micro factor. Estimating the specific path (see Patterson et al., 1990) from the residual to the delinquency construct allows us to observe whether any unique criterion variance is accounted for beyond that already predicted by the Micro and Macro Antisocial constructs. This solution is illustrated in Figure 9. It is quite similar to the solution of the hypothesized model in Figure 8 and accounts for about 4% additional criterion variance. There is no significant chi-square difference in the two models' comparative fit.

Figure 8. A comparison of micro and macro child antisocial latent variable predictors of later child delinquency (Grades 4-5): χ²(22) = 30.79, p = .10; BBN = .90; BBNN = .95. Note: all p < .05 unless otherwise noted. Residuals covaried: Parent Report (Macro) with Negative Play w/Siblings (Micro Home Observation); Negative During Problem Solving with Elliott Self-Report of Delinquency.

Structural Equation Modeling Versus Multiple Regression

SEM has sometimes been described as a combination of factor analysis and multiple regression. That is, there is a data reduction component and a prediction component. While this is true, such a definition omits many of the advantages of SEM. These include the ability to parcel out measurement error and relations among residuals, and the heuristic value that comes from having to state a theoretical framework clearly and specifically. Nonetheless, it is of some interest to compare the SEM findings in this second example to the results obtained using multiple regression analysis with the same data set. A series of multiple regression analyses was conducted to test each component of the model presented in Figure 8. Table 1 contains the results of those analyses. A scan of Table 1 reveals that significant criterion variance was accounted for in each of the multiple regression equations. As is typically the case with multiple regression, only two or three measured variables predicted significantly in each equation. Clearly, there is a great deal of valuable information in Table 1, but what must the investigator conclude?

Figure 9. An alternative model: The addition of a specific path from negative microsocial interchanges during the lab task to later delinquency. χ² = 31.1, p = .094; BBN = .90; BBNN = .94. Note: all p < .05 unless otherwise noted; a p < .10. Residuals covaried: Parent Report (Macro) with Negative Play w/Siblings (Micro Home Observation).

Among macro variables, the peer nomination variable yielded no addi-
tional predictive power in the context of the teacher and parent reports;
yet, we know that peer relations were becoming critical for these preado-
lescent boys, and indeed, in Figures 8 and 9, it is clear that the peer
nominations measure converged extremely well with the other indica-
tors. Though an investigator might well conclude that, based on these
multiple regression analyses, both micro and macro variables contribute
significantly to the prediction of adolescent boys' delinquency, one might
be tempted to drop the macro measures since they appear to enhance the
micro prediction effects only slightly. The SEM analyses presented in
Figures 8 and 9 lend strong support to the idea that micro and macro
variables are equally important in the prediction of later delinquency,
and that both types of assessment should be included in future research.
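Computing standardized regression weights directly from a correlation matrix, β = R_xx⁻¹ r_xy with R² = β′r_xy, shows why regression can make a well-converging indicator look useless. The Python sketch below does this for the self-reported delinquency criterion with the three macro predictors, using the rounded correlations from Figure 6. Table 1 was computed stepwise from raw data, so these numbers need not match its entries. The small linear solver is included only to keep the example dependency-free.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def standardized_regression(r_xx, r_xy):
    """Standardized betas and R^2 from predictor intercorrelations and predictor-criterion correlations."""
    betas = solve_linear(r_xx, r_xy)
    r_squared = sum(beta * r for beta, r in zip(betas, r_xy))
    return betas, r_squared


# Rounded correlations from Figure 6: Teacher CBCL, Parent report, Peer nomination.
R_XX = [[1.000, 0.687, 0.720],
        [0.687, 1.000, 0.588],
        [0.720, 0.588, 1.000]]
# Correlations of the same predictors with the Elliott self-report criterion.
R_XY = [0.427, 0.550, 0.320]

betas, r_squared = standardized_regression(R_XX, R_XY)
for name, beta in zip(("Teacher", "Parent", "Peer"), betas):
    print(f"{name}: beta = {beta:.3f}")
print(f"R^2 = {r_squared:.3f} (Parent alone: {R_XY[1] ** 2:.3f})")
```

With these inputs, the peer weight comes out slightly negative and R² is only about .31, barely above the roughly .30 for parent report alone. Yet peer nominations correlate .32 with the criterion and .72 with teacher reports. Regression credits the shared variance to whichever predictor carries it first, which is the pattern described above for Table 1.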

Table 1
Micro versus Macro Prediction of Arrests and Self-Reported Law Violations with Stepwise Multiple Regression Analyses

                  OUTCOMES
                  Self-Report       Arrests           Mean Delinquency
Predictors        Sig    R²         Sig    R²         Sig    R²
Macro
  Teacher         n.s.              a      .26        a      .28
  Parent          a      .34        n.s.              b      (.34)d
  Peer            n.s.              n.s.              n.s.
Micro
  Sib/TC Neg      n.s.              n.s.              n.s.
  Playneg         b      .44        n.s.              n.s.
  Total Neg       a                 n.s.              n.s.
  AET             n.s.              a      .32        a      .35
Both
  Sib/TC Neg      n.s.              n.s.              n.s.
  Playneg         c      .52        n.s.              n.s.
  Total Neg       b                 n.s.              n.s.
  AET             n.s.              a                 a
  Teacher         n.s.              b      .40        b      .43
  Parent          a                 n.s.              n.s.
  Peer            n.s.              n.s.              n.s.

Note. a, b, and c refer to the stepwise order of insertion of each variable into the multiple regression equation. All significant variables are at the .05 level unless otherwise indicated. Each R² appears after the last variable entered into each equation.
d p = .06.

The structural models may be more elegant and more easily inter-
preted than the multiple regression analyses, but most importantly, the
SEM models provide estimates based on converging indicator validities.
It is this combining of indicators that we believe results in more general-
izable prediction models. Multiple regression techniques certainly have
a place among social scientists' statistical tools, but it is our position that,
given adequate data sets as described in this chapter, SEM will almost
always be the preferred analytic approach.

Discussion and Conclusions


We believe that the analyses provided in the two examples presented
in this chapter have several implications. First, there is the question of
monomethod/mono-agent data: can it be appropriate to use data from a single agent and method? And if so, when? Particular methods and/or agents may be useful (or all that can be obtained) in specific contexts, but if this is not clearly the case, then multiple agents and methods appear to be the best approach. As is clear from our first example, the use of a single method and agent can easily lead an investigator to erroneous conclusions. Perhaps even worse, there is no way to verify the validity of those findings without multimethod and multiagent data. Replication of results using the same single method and agent is simply not much help.
In the second example, we selected a problem that is of particular
interest to us: predictive comparability and complementarity of micro
and macro assessments. By following our strategy of combining a variety
of operational definitions in latent variable constructs and then testing a
theoretical model, several interesting outcomes may be observed. (1) It
is clear that the micro indicators did not converge as well as we would
have liked. Even so, it is also clear that (2) the micro and macro latent
constructs were moderately well correlated (about .50), and (3) they both
contributed significantly and with similar magnitude in the prediction of
later delinquent behavior. Finally, (4) as can be seen in the alternative
model presented in Figure 9, there was additional Delinquency criterion
variance accounted for by the variable describing the total negative
process; this contribution was above and beyond that of the micro and
macro predictor constructs.
We believe that these two examples support our contention that SEM
is a very handy technique for carrying out the multimethod-multiagent
strategy. Many of the problems that may emerge in this analytic mode
have already been addressed (e.g., Bank et al., 1990), and it is possible to
engage in more precise hypothesis testing than with traditional factor
analysis and multiple regression strategies.

70 LEW BANK AND GERALD R. PATTERSON
The present findings also suggest that, at least for the antisocial
behavior trait, micro and macro variables can serve equally well in
making certain kinds of long-range predictions. It is, of course, too early
to say that this is a general state of affairs that would hold for a wide range
of criterion variables. For example, we still do not understand which
settings should be sampled for micro data, how many data are required
in each setting, and how many settings one should sample. The present
findings only demonstrate that, in principle, micro variables can be used
to make significant predictions about future events. However, the expense
involved in collecting micro data implies that such data would
seldom be the data of choice for prediction studies. This, in turn, raises
the question of what the unique utilities of micro variables might be.
Within Patterson's coercion model (1982), discussed in the introduc-
tion of this chapter, the micro variables are thought to be necessary
descriptors for any construct that defines a mechanism for change
(Patterson, Reid, & Dishion, in press). First, most long-term
changes in preadolescents' social behaviors are thought to come about
as a result of microsocial exchanges within the family or with peers. In this
instance, the microsocial exchanges define contingencies that produce
the change. It is also assumed that microsocial exchanges among family
members serve as a sensitive amplifier for stressors that impinge from
outside the family. The data from studies in Tennessee (Wahler & Dumas,
1983), Oregon (Patterson, 1983), and Kansas (Snyder & Huntley, 1989)
produced similar outcomes. Day-by-day variations in maternal reports of
stressors covary significantly with observed rates of irritable mother
behavior (e.g., stress days are related to higher rates of irritability).
Snyder and Huntley used sophisticated modeling analyses of repeated
measures data to show that the effect of stress on the child was mediated
by its impact on parental discipline and monitoring practices. The
implication is that the effect of family stressors on the child is amplified
when it produces disruptions in parenting practices.
Thus, it is essential that each investigator carefully consider the
advantages and disadvantages of specific methods and agents to be used
for operationally defining theoretical constructs. In many instances, it
may well be that the multiple method and agent approach put forth in this
paper will be most useful. Budget issues may sometimes appear to be
obstacles, but we believe these potential problems can be met without
seriously compromising the assessment and analysis strategies (e.g.,
Chamberlain & Bank, 1989). Therefore, it remains our position that the
most generalizable findings are likely to emerge with the use of multiple
agents and methods.

References
Achenbach, T. M., & Edelbrock, C. S. (1983). Manual for the child behavior checklist
and revised child behavior profile. Burlington, VT: University Associates in
Psychiatry.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice:
A review and recommended two-step approach. Psychological Bulletin, 103,
411-423.
Arrington, R. (1943). Time sampling: A review. Psychological Bulletin, 40, 81-124.
Bank, L., Dishion, T. J., Skinner, M. L., & Patterson, G. R. (1990). Method variance
in structural equation modeling: Living with "glop." In G. R. Patterson (Ed.),
Aggression and depression in family interactions (pp. 247-279). Hillsdale, NJ:
Lawrence Erlbaum Assoc.
Bank, L., Forgatch, M. S., Patterson, G. R., & Fetrow, R. A. (1991). Parenting
practices: Mediators of negative contextual factors in divorce. Unpublished
manuscript. (Available from the first author, OSLC, 207 E. 5th, Suite 202,
Eugene, OR 97401.)
Bank, L., Marlowe, J. H., Reid, J. B., Patterson, G. R., & Weinrott, M. R. (1991). A
comparative evaluation of parent training interventions for families of chronic
delinquents. Journal of Abnormal Child Psychology, 19(1), 15-33.
Bank, L., Patterson, G. R., & Reid, J. B. (1987). Delinquency prevention through
training parents in family management. Behavior Analyst, 10, 75-82.
Baumrind, D. (1991). Effective parenting during the early adolescent transition. In P. A.
Cowan and M. Hetherington (Eds.), Family transitions (pp. 111-164). Hillsdale,
NJ: Lawrence Erlbaum Assoc.
Bentler, P. M. (1980). Multivariate analysis with latent variables. Annual Review of
Psychology, 31, 419-456.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the
analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M. (1989). EQS: Structural equations program manual. Los Angeles:
BMDP Statistical Software.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by
the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Capaldi, D. M., & Patterson, G. R. (1987). An approach to the problem of recruitment
and retention rates for longitudinal research. Behavioral Assessment, 9, 169-
177.
Capaldi, D. M., & Patterson, G. R. (1988). Psychometric properties of fourteen latent
constructs from the Oregon Youth Study. NY: Springer-Verlag.
Capaldi, D. M., & Patterson, G. R. (in press). The relation of parental transitions to
boys' adjustment problems: I - A linear hypothesis; II - Mothers at risk for
transitions and unskilled parenting. Developmental Psychology.
Chamberlain, P. (1990). Comparative evaluation of specialized foster care for seriously
delinquent youths: A first step. Community Alternatives: International Journal of
Family Care, 2(2), 21-36.
Chamberlain, P., & Bank, L. (1989). Toward an integration of macro and micro
measurement systems for the researcher and the clinician. Journal of Family
Psychology, 3, 199-205.
Dwyer, J. H. (1983). Statistical models for the social and behavioral sciences. NY:
Oxford University Press.
Elliott, D. S., Ageton, S., Huizinga, D., Knowles, B. A., & Canter, R. J. (1983). The
prevalence and incidence of delinquent behavior: 1976-1980. Technical report
#26 of the National Youth Survey. Boulder, CO: Behavioral Research Institute.
Forgatch, M. S. (1991). The clinical science vortex: A developing theory of
antisocial behavior. In D. Pepler and K. H. Rubin (Eds.), The development and
treatment of childhood aggression (pp. 291-316). Hillsdale, NJ: Lawrence Erlbaum
Assoc.
Hollingshead, A. B. (1975). Four factor index of social status. Unpublished manuscript.
(Available from Yale University, New Haven, CT.)
Jones, R. R., Reid, J. B., & Patterson, G. R. (1975). Naturalistic observation in
clinical assessment. In P. McReynolds (Ed.), Advances in psychological
assessment (Vol. 3) (pp. 241-291). San Francisco: Jossey-Bass, Inc.
Joreskog, K. G., & Sorbom, D. (1988). LISREL 7: A guide to the program and
applications. Chicago: SPSS Inc.
Lahey, B. B., Hartdagen, S. E., Frick, P. J., McBurnett, K., Connor, R., &
Hynd, G. R. (1988). Conduct disorder: Parsing the confounded relation to parent
divorce and antisocial personality. Journal of Abnormal Child Psychology, 97(3),
334-337.
Larson, R. W., Richards, M. H., Raffaelli, M., Ham, M., & Jewell, L. (1990). Ecology
of depression in later childhood and early adolescence: A profile of daily states
and activities. Journal of Abnormal Psychology, 99, 92-102.
Larzelere, R. E., & Patterson, G. R. (1990). Family management practices as a
mediator of the longitudinal effects of socioeconomic status on early
delinquency. Criminology, 28, 301-324.
Littman, R. A., & Rosen, E. (1950). Molar and molecular. Psychological Review, 57,
58-65.
Mischel, W. (1968). Personality and assessment. NY: John Wiley & Sons.
Olweus, D. (1979). Stability of aggressive reaction patterns in males: A review.
Psychological Bulletin, 86, 852-875.
Pastorelli, C., & Dishion, T. J. (1991). Modeling parenting practices and child
adjustment: A cross-cultural study. Unpublished manuscript. (Available from T.
J. Dishion, OSLC, 207 E. 5th, Suite 202, Eugene, OR 97401.)
Patterson, G. R. (1977). A three stage functional analysis for children's coercive
behaviors: A tactic for developing a performance theory. In B. Etzel, J. M.
LeBlanc, and D. M. Baer (Eds.), New developments in behavioral research:
Theory, method, and applications (pp. 59-79). Hillsdale, NJ: Lawrence Erlbaum
Assoc.
Patterson, G. R. (1979). A performance theory for coercive family interaction. In
R. Cairns (Ed.), The analysis of social interactions: Methods, issues, and illustrations
(pp. 119-162). Hillsdale, NJ: Lawrence Erlbaum Assoc.
Patterson, G. R. (1983). Stress: A change agent for family process. In N. Garmezy
& M. Rutter (Eds.), Stress, coping, and development in children (pp. 235-264).
New York: McGraw-Hill.
Patterson, G. R. (1982). Coercive family process. Eugene, OR: Castalia Press.
Patterson, G. R. (1986). Performance models for antisocial boys. American
Psychologist, 41, 432-444.
Patterson, G. R., & Bank, L. (1986). Bootstrapping your way in the nomological
thicket. Behavioral Assessment, 8, 49-73.
Patterson, G. R., & Bank, L. (1987). When is a nomological network a construct? In
D. R. Peterson and C. B. Fishman (Eds.), Assessment for decision (pp. 249-279).
New Brunswick, NJ: Rutgers University Press.
Patterson, G. R., & Bank, L. (1989). Some amplifying mechanisms for pathological
process in families. In M. Gunnar and E. Thelen (Eds.), The Minnesota Symposia
on Child Psychology: Vol. 22. Systems and development (pp. 167-210). Hillsdale,
NJ: Lawrence Erlbaum Assoc.
Patterson, G. R., Bank, L., & Stoolmiller, M. L. (1990). The preadolescent's
contributions to disrupted family process. In R. Montemayor (Ed.), Advances
in adolescent development: 2. The transition from childhood to adolescence (pp.
107-133). Newbury Park, CA: Sage.
Patterson, G. R., & Capaldi, D. M. (1991). Antisocial parents: Unskilled and
vulnerable. In P. Cowan and E. M. Hetherington (Eds.), Family transitions
(pp. 195-218). Hillsdale, NJ: Lawrence Erlbaum Assoc.
Patterson, G. R., Capaldi, D. M., & Bank, L. (1991). An early starter model for
predicting delinquency. In D. Pepler and K. H. Rubin (Eds.), The development
and treatment of childhood aggression (pp. 139-168). Hillsdale, NJ: Lawrence
Erlbaum Assoc.
Patterson, G. R., Chamberlain, P. C., & Reid, J. B. (1982). A comparative evaluation
of a parent training program. Behavior Therapy, 13, 638-650.
Patterson, G. R., & Cobb, J. A. (1973). Stimulus control for classes of noxious
behaviors. In J. F. Knutson (Ed.), The control of aggression: Implications from
basic research (pp. 145-200). Chicago: Aldine.
Patterson, G. R., Dishion, T. J., & Bank, L. (1984). Family interaction: A process
model of deviancy training. Aggressive Behavior, 10, 253-267.
Patterson, G. R., & Reid, J. B. (1970). Reciprocity and coercion: Two facets of social
systems. In J. Michaels and C. Neuringer (Eds.), Behavior modification and
clinical psychology (pp. 133-177). NY: Appleton-Century-Crofts.
Patterson, G. R., Reid, J. B., & Dishion, T. J. (in press). A social learning approach:
4. Antisocial boys. Eugene, OR: Castalia Press.
Ramsey, E., Patterson, G. R., & Walker, H. M. (1989). Generalization of the
antisocial trait from home to school settings. Journal of Applied Developmental
Psychology, 11, 209-223.
Reid, J. B. (1978). A social learning approach: 2. Observation in home settings.
Eugene, OR: Castalia Press.
Reid, J. B., & Patterson, G. R. (1989). The development of antisocial behavior
patterns in childhood and adolescence. European Journal of Personality, 3, 107-120.
Reid, J. B., Patterson, G. R., Bank, L., & Dishion, T. J. (April, 1987). The generalizability
of single versus multiple methods in structural equation models of child
development. Paper presented at the annual meeting of the Society for Research
in Child Development, Baltimore, MD.
Snyder, J., & Huntley, D. (1989). Troubled families and troubled youth. In P. Leone
(Ed.), Understanding troubled and troubling youth: A multidisciplinary perspective
(pp. 194-225). Newbury Park, CA: Sage.
Tanaka, J. S. (1987). "How big is big enough?": Sample size and goodness of fit in
structural equation modeling with latent variables. Child Development, 58, 134-
146.
Wahler, R., & Dumas, J. (June, 1983). Stimulus class determinants of mother child
coercive exchanges in multidistressed families: Assessment and intervention.
Paper presented at the Vermont Conference on Primary Prevention of
Psychopathology, Bolton Valley, VT.
Wothke, W. (1987). Multivariate linear models of the multitrait-multimethod matrix.
Paper presented at the American Educational Research Association,
Washington, DC.
Wright, J. C. (1983). The structure and perception of behavioral consistency.
Unpublished doctoral dissertation, Department of Psychology, Stanford
University, Palo Alto, CA.
Zucker, R. A. (1987). Before alcoholism: A developmental account of the etiological
process. Nebraska Symposium on Motivation, 1986: Alcoholism and addictive
behavior (pp. 27-83). Lincoln: University of Nebraska Press.

Author Note
Support for this chapter was provided by Grant No. MH 37940 from the Center for
Studies of Antisocial and Violent Behavior, NIMH, U.S. PHS, and Grant No. MH
46690, Prevention Research Branch, NIMH, U.S. PHS.

Note
1. Patterson and Capaldi found only a weak relation between Mother Antisocial
and Mother Discipline. When these authors partialed SES out of both sets of
indicators, no significant correlation remained. The comparison with the present
data is complex, however, because for both of these constructs we have used
largely different sets of indicators. Furthermore, Patterson and Capaldi used
only education and occupation in defining SES, but we have used income as well.
CHAPTERS

New Developments in
Multiaxial Empirically Based
Assessment of Child and
Adolescent Psychopathology

Thomas M. Achenbach

In this chapter, I will present a body of work that I call "multiaxial
empirically based assessment." In explaining this notion, I will first
summarize previous work that has posed new challenges that were not so
evident before. Thereafter, I will present recent innovations designed to
meet these challenges. For brevity, I will use the term "children" to
include adolescents.

What is Multiaxial
Empirically Based Assessment?
In considering the notion of multiaxial empirically based assessment,
it is helpful to appreciate the conceptual context as well as the intended
meaning of the terms.

THOMAS M. ACHENBACH, Professor, Department of Psychiatry, University of
Vermont, Burlington, Vermont.

76 THOMAS M. ACHENBACH

Assessment
For our purposes, the word assessment refers to identifying the
distinguishing features of individual cases. There are many reasons for
doing so. They include clinical objectives, such as deciding whether a
particular child needs special help and what sort of help is needed. The reasons also
include systems objectives, such as documenting the number of each
kind of case seen in a particular system and using the distribution of cases
to plan for future services. And the reasons include research objectives,
such as identifying differential etiologies or outcomes for cases differing
in their distinguishing features.
If "assessment" refers to "identifying the distinguishing features,"
how do we know which "distinguishing features" to select? This question
is what prompted our program of research in the first place. Each
individual is distinguishable from other individuals in many ways. To
identify the particular features that are important with respect to psycho-
pathology, we need ways of linking cases to other cases who share similar
features. By identifying groups who share similar features, we provide a
basis for finding answers to such important questions as whether particu-
lar problems should be treated or left alone, the most efficacious treatments
for particular problems, and differences between the etiologies of differ-
ent kinds of problems.
Taxonomy. The process of linking cases to other cases according to
their distinguishing features involves taxonomy: the systematic grouping
of cases according to features that reflect intrinsic similarities among cases
assigned to the same class and differences between cases assigned to
different classes. (The words "classification" and "diagnosis" also pertain
to the grouping of individuals into classes. However, these words have
additional meanings that tend to obscure the taxonomic goal of detecting
features that reflect intrinsic similarities and differences between types
of cases.)
The conceptual context of our work encompasses both assessment
and taxonomy as two aspects of a single process: identifying the
important similarities and differences between cases. When our
research program began in the 1960s (Achenbach, 1965, 1966), the
prevailing theories placed the origins of most psychopathology in child-
hood. Despite the emphasis on childhood origins, however, there was
little systematic research on the actual form taken by childhood disor-
ders. In fact, until 1968, the official psychiatric nosology provided only
two categories specifically for childhood disorders. These were Adjust-
ment Reaction of Childhood and Schizophrenic Reaction, Childhood
Type (American Psychiatric Association, 1952). Consistent with the
practices of the day, these diagnostic categories were defined in terms of
narrative descriptions and general inferences, with no explicit criteria for
discriminating among disorders. Even though psychological tests and
clinical interviews were widely used to assess children, these procedures
were not designed to assess the criterial features of diagnostic categories.
Furthermore, the diagnostic categories themselves had no taxonomic
basis in the sense of being based on evidence for intrinsic differences
among cases.

MULTIAXIAL ASSESSMENT OF CHILDREN 77
To obtain a picture of the actual patterns of problems occurring in
children, I constructed a checklist of problems that could be scored for
children aged 4 to 16. The problems were based on a review of the child
clinical literature, plus consultations with clinicians. I then used the
checklist to score the problems reported in the case records of 300 boys
and 300 girls seen at the University of Minnesota Child Psychiatry Service.
Additional problems were added on the basis of findings from the case
records. Factor analyses revealed considerably more syndromes than
suggested by the two categories of childhood disorders included in the
psychiatric nosology. Some of the factor-analytically derived syndromes
had counterparts in the existing clinical literature, whereas others did
not. Some of these syndromes were also specific to one sex or were
limited to certain ages.
Linking assessment and taxonomy. My early efforts linked assess-
ment and taxonomy by deriving taxa (in this case, syndromes of
co-occurring problems) directly from a particular assessment proce-
dure, as applied to actual clinical samples of children. The correspondence
of a case to a syndrome was operationally defined in terms of the
proportion of the case's problems that were included in the syndrome.
Assessment and taxonomy were thus linked in two fundamental ways:
(1) The taxonomy was derived from a particular assessment procedure;
and (2) this assessment procedure could be applied to new cases to
operationally define their correspondence to the syndromes of the
taxonomy.
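The correspondence rule described above (the proportion of a case's problems that are included in a syndrome) can be stated directly in code. This is a minimal sketch, assuming problems and syndromes are represented as sets of item labels. The labels and the `best_syndrome` helper are hypothetical, and assigning a case to its highest-proportion syndrome is an illustrative choice rather than a procedure described here.

```python
def syndrome_correspondence(case_problems, syndrome_items):
    """Proportion of a case's reported problems that are items of the syndrome."""
    case = set(case_problems)
    if not case:
        return 0.0  # a case with no reported problems matches no syndrome
    return len(case & set(syndrome_items)) / len(case)


def best_syndrome(case_problems, syndromes):
    """Hypothetical helper: (name, proportion) of the best-matching syndrome."""
    scores = {name: syndrome_correspondence(case_problems, items)
              for name, items in syndromes.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    syndromes = {  # illustrative labels, not the published item sets
        "Aggressive": {"fights", "argues", "temper"},
        "Depressed": {"sad", "withdrawn", "cries"},
    }
    print(best_syndrome({"fights", "temper", "sad"}, syndromes))
```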

Empirically Based Assessment


The research outlined in the preceding section was "empirical" in
several ways. First, it obtained data on actual samples of cases. Second,
correlational analyses were used to identify the actual associations
among the variables scored from the sample of cases. Third, the taxa were
derived by mathematically determining which problems formed syn-
dromes in the statistical sense of features that tend to co-occur.
Calling the initial effort "empirical" does not mean that no decisions
were made about the data. Nor does it mean that there was no selection
of items, methods, or cases, nor that the methods had no influence on the
results. Different decisions, cases, and methods might have produced
different results. All human endeavors are structured by the minds of the
humans who carry them out, and empirical research is no exception to
this fundamental fact. However, the effort to empirically link the assess-
ment and taxonomy of child psychopathology was a first step in a
direction that led toward further challenges.
The syndromes derived from case histories revealed far more differ-
entiation among childhood disorders than was evident in the official
nosology. The study also provided methodology for grouping children
according to syndromes whose correlates were then tested in subse-
quent studies (Achenbach & Lewis, 1971; Anderson, 1969; Hafner, Quast,
& Shea, 1975; Katz, Zigler, & Zalk, 1975; Roff, Knight, & Wertheim, 1976;
Rolf, 1972; Weintraub, 1973). Consistent with the initial derivation of
syndromes, the subsequent studies employed data from case records.
The informants had not responded to standardized protocols and had not
directly supplied data that were scored for analysis. The problems
reported in the records could therefore have been influenced by what the
compilers of the records had chosen to ask and record.
The word "empirical" is derived from the Greek word empeiria,
meaning "experience." The initial effort at empirical derivation of syn-
dromes was successful in providing a more differentiated picture of
childhood disorders and a methodology that was applicable in a variety
of studies. To base our empirical approach more firmly on the way in
which children's problems are actually experienced, however, the next
step was to obtain assessment data directly from those who actually
observe the children's behavior, rather than from case records. Such
assessment data could then be used to derive taxa that would reflect the
patterns detectable in reports by particular kinds of informants.
Parent reports. Parents are the adults who typically have the most
involvement with their children's behavior over the longest periods and
the most situations. In most cases, parents' perceptions are also crucial
in determining what will be done about children's problems, and parents
are involved in most efforts to obtain help for children. In extending the
empirical approach toward more direct utilization of experiential data,
we therefore started with parents.
To tap parents' perceptions of their children's functioning, we devel-
oped the Child Behavior Checklist (CBCL; Achenbach, 1978), which
includes many of the same problem items identified in the initial case
record research, plus additional problem items developed through nine
pilot editions tested with parents in a variety of clinical settings. To
provide more differentiated scoring than is typically possible from case
records, we changed the present-versus-absent format to a 0-1-2 scale in
which 0 = Not True (as far as you know), 1 = Somewhat or Sometimes True,
and 2 = Very True or Often True. Furthermore, because children's compe-
tencies may be as important as their problems in determining who needs
special help and the likely outcomes, we included items tapping children's
involvement in activities, social relations, and school.
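The 0-1-2 rating format can be sketched as follows. This assumes, for illustration only, that a scale score is the sum of its items' ratings, and that an item counts as "reported" when scored 1 or 2 (the criterion the chapter uses when selecting items for factor analysis). Item identifiers and function names are hypothetical.

```python
# Rating codes for each problem item on the 0-1-2 format
NOT_TRUE, SOMETIMES_TRUE, OFTEN_TRUE = 0, 1, 2
VALID = {NOT_TRUE, SOMETIMES_TRUE, OFTEN_TRUE}


def is_reported(rating):
    """An item counts as reported when scored 1 or 2."""
    return rating in (SOMETIMES_TRUE, OFTEN_TRUE)


def scale_score(ratings, scale_items):
    """Assumed scoring rule: sum the 0-1-2 ratings of the items on one scale."""
    total = 0
    for item in scale_items:
        r = ratings[item]  # KeyError if an item was left unrated
        if r not in VALID:
            raise ValueError(f"rating for {item!r} must be 0, 1, or 2, got {r!r}")
        total += r
    return total
```

Compared with a present/absent checklist, the summed 0-1-2 ratings distinguish a child rated "Often True" on several items from one rated "Sometimes True" on the same items.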
In obtaining data directly from parents, we also extended empirically
based assessment across many more caseloads than the one used in the
study of case records. This was an important innovation, because the
caseloads of individual clinical settings may reflect idiosyncrasies such
as the composition of the local catchment area, socioeconomic and
ethnic factors in client selection, funding mechanisms, dominant clinical
philosophy, and the effects of competing clinical services. By obtaining
data from the parents of children seen in many different types of setting,
we reduced the risk of obtaining syndromes and prevalence rates that
were not typical of children seen for mental health services.
Because the patterning and prevalence of problems may vary with
the children's sex and age, we derived syndromes separately for each sex
at ages 4 to 5, 6 to 11, and 12 to 16. These age ranges were chosen because
they reflect important differences in cognitive and physical development,
educational level, and social status. The syndromes were derived from
samples of 250 referred children of each sex at ages 4 to 5 and 450 referred
children of each sex at ages 6 to 11 and 12 to 16, for a total of 2,300 referred
children (Achenbach & Edelbrock, 1983).
To derive syndromes for a particular sex/age group, we first identified
problems that were reported (i.e., scored 1 or 2) for at least 5% of the
clinically referred children in that group. We then performed principal
components analyses with orthogonal (varimax) and oblique (direct
quartimin) rotations of the first 7 to 15 components for each group.
Syndromes of problems that remained intact through multiple rotations
were retained as the bases for problem scales. Either eight or nine
syndromes were retained for each sex/age group. Some syndromes were
quite similar for all groups, whereas others showed considerable varia-
tion or were restricted to particular sex/age groups. Second-order analyses
produced two broad-band groupings of syndromes that were designated
as "internalizing" and "externalizing." The internalizing grouping was
characterized by problems within the self, such as depression and
somatic complaints. The externalizing grouping was characterized more
by conflicts with other people and with social mores, such as aggressive
and delinquent behavior. This distinction resembles those that have
been designated by others as "Personality Problem versus Conduct
Problem" (Peterson, 1961); "Inhibition versus Aggression" (Miller, 1967);
and "Overcontrolled versus Undercontrolled" (Achenbach & Edelbrock,
1978).
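The item-screening and rotation steps can be sketched with NumPy. This is a minimal sketch, assuming ratings are stored as a cases-by-items array of 0/1/2 scores. It keeps items reported for at least 5% of cases, extracts principal components from the item correlation matrix, and applies Kaiser's varimax rotation using the common SVD-based algorithm. The oblique (direct quartimin) rotations, the choice of 7 to 15 components, and the check that syndromes stay intact across rotations are not shown; all names are illustrative.

```python
import numpy as np


def filter_items(ratings, min_prop=0.05):
    """Indices of items scored 1 or 2 for at least min_prop of the cases."""
    reported = (ratings >= 1).mean(axis=0)
    return np.flatnonzero(reported >= min_prop)


def pca_loadings(data, n_components):
    """Unrotated principal-component loadings from the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (Kaiser), standard SVD-based iteration."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break  # criterion has stopped improving
    return loadings @ R


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f = rng.normal(size=(1000, 2))  # two simulated latent syndromes
    cont = np.column_stack([f[:, i // 3] + 0.5 * rng.normal(size=1000) for i in range(6)])
    ratings = np.digitize(cont, [0.0, 1.0])  # map to 0/1/2 ratings
    keep = filter_items(ratings)
    print(np.round(varimax(pca_loadings(ratings[:, keep], 2)), 2))
```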

In order to provide normative baselines with which to compare
children's reported problems and competencies, we conducted a home
interview survey in which parents filled out the CBCL for 1,300 children
who had not received mental health services during the previous 12
months (Achenbach & Edelbrock, 1981). We then constructed profiles for
comparing the scores of individual children with those of normative
samples of the same age and sex on all the competence and problem
scales.
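Comparing a child's score with a same-sex, same-age normative sample can be illustrated with a percentile rank. The exact scaling used on the published profiles is not described in this passage, so percentile rank is an assumed stand-in. The function names, the dictionary layout, and the normative numbers below are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the normative sample scoring below `score`, with ties counted as half."""
    s = sorted(norm_scores)
    below = bisect_left(s, score)
    ties = bisect_right(s, score) - below
    return 100.0 * (below + 0.5 * ties) / len(s)


def profile(child_scores, norms):
    """Percentile rank on each scale against the normative sample for the child's
    sex/age group. `norms` maps scale name -> list of normative scores (hypothetical layout)."""
    return {scale: percentile_rank(score, norms[scale])
            for scale, score in child_scores.items()}


if __name__ == "__main__":
    norms_boys_6_11 = {"Aggressive": [2, 4, 5, 5, 7, 8, 9, 11, 12, 15]}  # made-up numbers
    print(profile({"Aggressive": 12}, norms_boys_6_11))
```

Keeping a separate normative list per sex/age group reflects the text's point that the same raw score can mean different things for, say, a 6-year-old girl and a 15-year-old boy.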
The CBCL and its scoring profiles provided standardized means for
obtaining, displaying, and analyzing parent-reported problems and com-
petencies for diverse children assessed under diverse conditions for
diverse purposes. This approach offers a common data language for
describing children's problems and competencies that can be used by
workers of different persuasions and professional backgrounds in many
contexts.

Multiaxial Empirically Based Assessment


The CBCL illustrates the possibilities for empirically based assess-
ment and taxonomy using parents' reports. However, parents are not in
a position to observe all important aspects of their children's behavior.
Furthermore, parents' perceptions and standards for reporting their
children's behavior and their own effects on that behavior are apt to differ
from those of other potential informants. The same can be said for any
informant, including teachers, clinicians, trained observers, and peers;
each kind of informant has access to a child's behavior in only certain
contexts. What is perceived and reported is also affected by characteris-
tics of the informant as well as the contexts in which the child's behavior
is seen. Although self-reports are not limited by the restriction of behav-
ior samples to certain contexts, they are limited by children's cognitive
levels, ability to appraise and report their own behavior and feelings, and
willingness to disclose problems.
Considering the many factors that affect informants' reports of
children's functioning, it should not be surprising that agreement among
informants is far from perfect. To ascertain the typical levels of agreement,
we performed meta-analyses of correlations between informants'
ratings of 269 samples in 119 published studies (Achenbach, McConaughy,
& Howell, 1987). The meta-analyses yielded the following mean correla-
tions for different combinations of informants: Between informants seeing
children in generally similar contexts (pairs of parents, teachers, mental
health workers, observers), the mean r was .60; between informants
seeing children in different contexts (e.g., parents versus teachers), the
mean r was .28; and between self-ratings and ratings by parents, teachers,
and mental health workers, the mean r was .22. There were significant
differences between correlations for types of problems (better agree-
ment for externalizing than internalizing problems) and for different age
groups (better agreement for ages 6 to 11 than for adolescents). However,
these differences were small and did not affect the overall picture of
moderate correlations between informants seeing children in similar
(though not necessarily identical) contexts, and low correlations be-
tween informants seeing children in different contexts and between
self-reports and reports by others.
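The passage does not say how the mean correlations were computed. One common meta-analytic convention, shown here as an assumption rather than the authors' procedure, averages Fisher z-transformed correlations, optionally weighting each sample by n - 3, and back-transforms the mean. The correlations and sample sizes in the usage lines are hypothetical.

```python
import math


def mean_correlation(rs, ns=None):
    """Mean of correlations via Fisher's r-to-z transform.
    If sample sizes ns are given, each z is weighted by n - 3; otherwise equally."""
    weights = [1.0] * len(rs) if ns is None else [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)


if __name__ == "__main__":
    # Hypothetical parent-teacher correlations from three samples and their sizes
    print(round(mean_correlation([0.20, 0.35, 0.28], [120, 80, 200]), 2))
```

Averaging in the z metric avoids the slight downward bias of averaging raw r values, because the sampling distribution of r is skewed as r moves away from zero.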
The modest cross-informant correlations do not mean that the infor-
mants' reports are either unreliable or invalid. High test-retest reliabilities
have been obtained for ratings by most kinds of informants, and numer-
ous significant associations with other variables support the validity of
ratings (see Achenbach & Brown, 1991; Achenbach & Edelbrock, 1983,
1986, 1987). Rather than indicating a lack of reliability or validity, the low
inter-informant correlations indicate that no single informant can substi-
tute for all others. Instead, to obtain a reasonably complete picture of a
child's functioning, we need data from multiple sources. When available,
such sources would include each parent, the child's teacher(s), and
direct assessment of the child, such as observations, interviews, and,
for older children, structured self-report forms.
In order to extend assessment beyond parent reports, we have
developed standardized rating instruments analogous to the CBCL but
designed to obtain reports from teachers (the Teacher's Report Form,
"TRF"), self-reports from adolescents (Youth Self-Report, "YSR"), direct
observations in group settings (Direct Observation Form, "DOF"), and
clinical interviews (Semistructured Clinical Interview for Children, "SCIC")
(Achenbach & Edelbrock, 1986, 1987; McConaughy & Achenbach, 1990).
We have also developed a downward extension of the CBCL for ages 2 to
3 (Achenbach, Edelbrock, & Howell, 1987), and upward extensions of the
CBCL (Young Adult Behavior Checklist, "YABCL") and of the YSR (Young
Adult Self-Report, "YASR") (Achenbach, 1990a, 1990b).
For clinical assessment, reports by informants are not the only
important sources of data. For most cases, standardized tests of ability
and achievement are also relevant, as is medical assessment. To highlight
the importance of viewing assessment in terms of multiple sources of
data, we call this approach multiaxial empirically based assessment. Table
1 summarizes procedures for obtaining assessment data in terms of five
axes relevant to the assessment of most children. Not all procedures may
be feasible or desirable in all cases, but comprehensive assessment
should take account of all these aspects of functioning. Practical applica-
tions of multiaxial empirically based assessment have been illustrated by
Achenbach and McConaughy (1987) for diverse child and adolescent cases.
Table 1
Examples of Multiaxial Assessment

Ages 2-5
  Axis I, Parent Reports: CBCL/2-3; CBCL/4-16; History; Parent interview
  Axis II, Teacher Reports: Preschool records; Teacher interview
  Axis III, Cognitive Assessment: Ability tests; Perceptual-motor tests; Language tests
  Axis IV, Physical Assessment: Height, weight; Medical exam; Neurological exam
  Axis V, Direct Assessment of Child: Observations during play; Interview

Ages 6-11
  Axis I, Parent Reports: CBCL/4-16; History; Parent interview
  Axis II, Teacher Reports: TRF; School records; Teacher interview
  Axis III, Cognitive Assessment: Ability tests; Achievement tests; Perceptual-motor tests; Language tests
  Axis IV, Physical Assessment: Height, weight; Medical exam; Neurological exam
  Axis V, Direct Assessment of Child: DOF; SCIC

Ages 12-18
  Axis I, Parent Reports: CBCL/4-16; History; Parent interview
  Axis II, Teacher Reports: TRF; School records; Teacher interview
  Axis III, Cognitive Assessment: Ability tests; Achievement tests; Language tests
  Axis IV, Physical Assessment: Height, weight; Medical exam; Neurological exam
  Axis V, Direct Assessment of Child: DOF; YSR; Clinical interview; Self-concept measures; Personality tests

From McConaughy, S. H., & Achenbach, T. M. (1988).

Challenges Raised by Multiaxial Empirically Based Assessment
Multiaxial empirically based assessment provides a common data
language for describing and quantifying children's problems and compe-
tencies in a standardized fashion. The standardized descriptions include
reports of specific items and aggregations of the items into narrow-band,
broad-band, and total score scales. Profiles for scoring children's prob-
lems and competencies make it possible to compare descriptions of
individual children at the levels of items, scales, broad-band groupings,
and total scores with descriptions obtained in a similar fashion for
normative samples of peers.
The empirically based approach to assessment differs from the
approach to child diagnostic categories employed in DSM-III and DSM-III-R
(American Psychiatric Association, 1980, 1987). The DSM diagnostic
categories for childhood disorders were based on assumptions about
what disorders exist and what descriptive criteria should be used to
determine their presence in a child. Although the DSM-III descriptive
criteria and decision rules are more explicit than in previous editions of
the DSM, neither the diagnostic categories nor the criteria were derived
from data on actual samples of children. The DSM also lacks operational
definitions of its diagnostic categories, because it does not specify
assessment operations for determining the presence of each criterial
attribute. Instead, the clinician is to decide what assessment data to
obtain, by what means, and from what sources. From whatever data are
obtained, the clinician must then decide whether each criterial feature is
present or absent and whether the child does or does not have a
particular disorder.
The main differences between the DSM-III diagnostic categories for
children and the empirically based approach described here can be
summarized as follows:
1. The DSM diagnostic categories were based on assumptions about
what disorders exist, whereas the empirically based approach
derives syndromes from data on large clinical samples of children.
2. The DSM descriptive criteria for each disorder were chosen via
committee judgments about how to define disorders, whereas the
empirically based approach selects criterial features on the basis
of their discriminative power and the obtained associations with
other features defining a syndrome.
3. The DSM criteria do not operationally define diagnostic catego-
ries, because no operations are specified for assessing criterial
features, whereas the empirically based approach operationally
defines syndromes in terms of scores obtained via particular
assessment procedures.
4. In using the DSM, the clinician must decide what data to obtain,
how to obtain the data, and from what sources. The clinician must
then judge whether each criterial feature is present or absent. The
multiaxial empirically based approach, by contrast, uses stan-
dardized procedures to obtain quantitative assessments from
multiple sources.
5. The DSM requires disorders to be diagnosed as present versus
absent, whereas multiaxial empirically based assessment allows
for quantitative gradations of syndromes within and between
situations.

Cross-Informant Discrepancies
Despite the differences between the DSM and the empirically based
approach, empirically based assessment procedures have shown signifi-
cant associations with some DSM categories (e.g., Edelbrock & Costello,
1988; Edelbrock, Costello, & Kessler, 1984; Weinstein, Noam, Grimes,
Stone, & Schwab-Stone, 1990). Furthermore, the use of empirically based
procedures to obtain data from multiple informants about the same child
has posed a major challenge that must be confronted by any approach to
assessment, taxonomy, or diagnosis, including the DSM approach. This
challenge is the problem of dealing with the often disparate pictures of
children's behavior obtained from different sources, each of which may
be reliable, valid, and important in its own right.
Our meta-analyses demonstrated that low to moderate agreement
among informants is not restricted to any particular instruments, infor-
mants, samples of children, or contexts (Achenbach et al., 1987). In fact,
the findings were quite consistent across many studies published over a
long period. The consistency and generality of the findings indicate that
they are not likely to be altered much by changes in instrumentation. Nor
can the problem be escaped by the DSM approach, whereby the integra-
tion of disparate data presumably occurs in the clinician's head. Even
when clinicians have been exposed to the same data, the inter-clinician
reliability of DSM child diagnoses has generally been mediocre (American
Psychiatric Association, 1980, pp. 470-472; Mattison, Cantwell, Russell, &
Will, 1979; Mezzich, Mezzich, & Coffman, 1985; Strober, Green, & Carlson,
1981; Werry, Methven, Fitzpatrick, & Dixon, 1983). Furthermore, when
DSM diagnoses were operationally defined by administering the Diagnos-
tic Interview Schedule for Children (DISC), little agreement was found
between diagnoses made from interviews with children, interviews with
their parents, and clinical evaluations (Costello, Edelbrock, Dulcan,
Kalas, & Klaric, 1984). Like the correlations between ratings by different
informants, the correlations between symptoms scored from the parent
and child DISCs were low, with an overall r = .27 (Edelbrock, Costello,
Dulcan, Conover, & Kalas, 1986).
Rather than being an artifact of any particular method, the modest
correlations among informants reflect the essential realities of children's
behavioral and emotional problems. Many such problems are not likely
to be manifested uniformly across all contexts. Furthermore, what is
noticed, judged to be present as a problem, and reported by a particular
informant is affected by characteristics of that informant and his or her
relations with the child. The problem, then, is not to abolish differences
between informants' reports or to choose between right and wrong
reports in each case. Instead, the problem is how best to utilize the
important information from each source.

Variations in Syndromal Patterns


As outlined earlier, our empirically based approach captured varia-
tions in the patterning and prevalence of problems by deriving syndromes
from ratings by different informants, separately for children of each sex
in different age ranges. Some syndromes comprised similar sets of items
in ratings by different informants across most sex/age groups. A syn-
drome designated as Aggressive, for example, included items such as
Argues a lot, Physically attacks people, and Temper tantrums in nearly all
analyses (Achenbach, Conners, Quay, Verhulst, & Howell, 1989; Achenbach
& Edelbrock, 1983, 1986, 1987). Other syndromes were somewhat similar
in most groups, while still others were found only in certain sex/age
groups.
Because our initial aim was to reflect the variations in syndromal
patterns actually found, we constructed and normed separate syndrome
scales and profiles for ratings of each sex/age group by each type of
informant. As we completed our analyses of successive groups, the
variations in syndromal patterns proliferated.
The variations in syndromes may reflect real differences among
groups. However, they make it difficult to directly compare ratings of the
same children by different informants and even by the same informants
at different points in the children's development, as required for longitu-
dinal analyses. They also make it difficult to directly compare children of
different sex/age groups. Such comparisons can be made by using sex/
age-based T scores for competencies, total problems, internalizing, exter-
nalizing, and the narrow-band syndromes that are most similar across
groups, such as the Aggressive syndrome. The less consistent syn-
dromes, however, cannot be readily compared across sex/age groups.

The sex, age, and informant variations among syndromes raise the
following question: To what extent are particular empirically obtained
patterns linked to an underlying core syndrome that is similar across
groups and informants? The answer to this question has practical impli-
cations for making comparisons across sex, age, and informants. It also
has theoretical implications for advancing from purely empirical correla-
tions among items toward taxonomic constructs on which to target
multiple assessment procedures and the testing of hypotheses as to
etiology, appropriate interventions, and outcomes.
The challenges posed by variations in syndromes arose from the
discovery of these variations through empirical research. The problem of
sex, age, and informant variations in syndromal patterns is not so obvious
in the DSM approach. This is because the DSM approach started with
assumed disorders rather than with empirical data on the patterning of
problems in relation to sex, age, or the source of data. Yet, if a taxonomy
is to accurately reflect children's actual problems, it must take account
of such variations in its definitions of disorders and in its decision rules
for determining whether children of each sex and age have a particular
disorder.

Meeting the Challenges of Cross-Informant Discrepancies and Syndromal Variations
The cross-informant discrepancies and the syndromal variations
have stimulated a new stage of research in multiaxial empirically based
assessment. Our previous efforts to derive syndromes empirically were
"exploratory." That is, we used principal components analysis to deter-
mine which problems tended to co-occur in reports by each type of
informant, rather than making a priori assumptions about syndromes for
which we felt there was insufficient theoretical or empirical basis.
To advance beyond these exploratory efforts, we have focused on
relations between syndromes derived from reports by different infor-
mants for boys and girls of different ages. We first determined the degree
of similarity among the versions of a particular syndrome obtained from
ratings of each sex/age group by a particular type of informant. We did
this by scoring checklists for clinically referred subjects according to the
several versions of a particular syndrome. We then computed correla-
tions between the different versions of a particular syndrome. For example,
we computed correlations among the versions of the Aggressive syn-
drome obtained from separate analyses of parents' ratings of boys and
girls in different age ranges.

Like scores on tests of different abilities, scores for different syn-
dromes tend to correlate positively with each other (Achenbach &
Edelbrock, 1983, 1986, 1987). In effect, there is a general psychopathology
(g) factor among syndromes like that found in the positive correlations
among ability tests. The positive correlations among syndromes reflect
the tendency for individuals who are very deviant in one area to be at least
somewhat deviant in other areas as well. We did not want the correlations
between versions of a syndrome to be inflated by their common associa-
tions with a nonspecific psychopathology factor. We therefore partialled
out of the correlation between versions of a syndrome a score that was
the sum of all checklist items that were not on any version of the
syndrome being tested. This score was used to represent the problems
that were not specific to the syndrome being tested. For example, the
correlations that we computed between the different versions of the
Aggressive syndrome were partial correlations in which we partialled out
all items not loading on any version of the Aggressive syndrome.
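The partialling step can be sketched as a first-order partial correlation: scores X and Y on two versions of a syndrome are correlated after removing the linear effect of Z, the summed ratings on all items outside every version of that syndrome. The formula is the standard one, r_XY.Z = (r_XY - r_XZ r_YZ) / sqrt((1 - r_XZ^2)(1 - r_YZ^2)). The item labels, the 0-1-2 ratings, and the helper names below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def syndrome_score(ratings, items):
    """Sum of one child's 0-1-2 ratings over a set of items."""
    return sum(ratings[i] for i in items)


# Hypothetical ratings of five referred children on six items.
children = [
    {"a1": 2, "a2": 2, "a3": 1, "o1": 1, "o2": 1, "o3": 0},
    {"a1": 0, "a2": 1, "a3": 0, "o1": 0, "o2": 0, "o3": 1},
    {"a1": 1, "a2": 1, "a3": 2, "o1": 2, "o2": 1, "o3": 1},
    {"a1": 0, "a2": 0, "a3": 0, "o1": 0, "o2": 0, "o3": 0},
    {"a1": 2, "a2": 1, "a3": 1, "o1": 1, "o2": 2, "o3": 2},
]
version_a = {"a1", "a2"}            # one version of the syndrome
version_b = {"a2", "a3"}            # another version of the same syndrome
all_items = set().union(*children)  # keys of every child's ratings
other_items = all_items - version_a - version_b  # nonspecific problems

a_scores = [syndrome_score(c, version_a) for c in children]
b_scores = [syndrome_score(c, version_b) for c in children]
o_scores = [syndrome_score(c, other_items) for c in children]
r_partial = partial_corr(a_scores, b_scores, o_scores)
print(f"partial r = {r_partial:.2f}")
```

Here the single summed score Z stands in for the general psychopathology factor; the correlation between the two versions is judged only after Z's shared variance has been removed.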

Identification of Core Syndromes in Ratings by Particular Informants
Cross-sample core syndromes in parent ratings. To test the degree
of similarity among different versions of syndromes scored from parents'
ratings, Keith Conners, Herbert Quay, Frank Verhulst, Catherine Howell,
and I compared syndromes derived from principal components analyses
of four sets of parents' ratings of 8,194 clinically referred 6- to 16-year-olds
(Achenbach et al., 1989). The data sets included the original CBCL
syndromes, syndromes derived from CBCL ratings of Dutch children,
syndromes derived from the 215-item ACQ Behavior Checklist, and
syndromes derived from the counterparts of 115 CBCL items that are on
the ACQ. Conners, Quay, and I constructed the ACQ to tap 12 syndromes
hypothesized on the basis of previous multivariate analyses of children's
behavioral/emotional problems (Achenbach et al., 1989).
Separately for each sex in the age ranges 6 to 11 and 12 to 16, we
identified syndromes that remained consistent in several varimax rota-
tions of each data set. We then made comparisons between the different
versions, derived from the different data sets. Items that were found on
at least three of the four versions of a syndrome for a particular sex/age
group were selected to represent a core version of that syndrome for that
group. To test the similarity among the different versions of a syndrome,
we computed correlations among the different versions including the
core version. For each syndrome, a syndrome score was computed that
consisted of the sum of scores for each item comprising that syndrome.
Each version of a syndrome was scored for the ACQ subjects. We then
computed partial correlations between these subjects' scores on the
different versions of the syndromes. As explained earlier, the partial
correlations entailed partialling out scores for all items not included in
the syndromes in order to prevent the general psychopathology factor
from inflating the correlations between syndromes.
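The core-version rule (keep items found on at least three of the four versions) is a simple counting step. The sketch below assumes each syndrome version is represented as a set of item labels; the four versions shown are hypothetical, not the published ones, and `core_items` is an illustrative name. When no threshold is given it defaults to a strict majority (more than half of the versions), which yields the thresholds used elsewhere in this chapter: three of four, four of six, two of two, and two of three.

```python
from collections import Counter


def core_items(versions, min_count=None):
    """Return items that appear in at least `min_count` syndrome versions.

    versions: iterable of sets of item labels, one set per version.
    min_count: threshold; defaults to a strict majority of the versions.
    """
    versions = [set(v) for v in versions]
    if min_count is None:
        min_count = len(versions) // 2 + 1
    counts = Counter(item for v in versions for item in v)
    return {item for item, c in counts.items() if c >= min_count}


# Hypothetical versions of a Somatic Complaints syndrome for one sex/age
# group, one per data set (American CBCL, Dutch CBCL, ACQ, ACQ items
# with CBCL counterparts).
somatic_versions = [
    {"headaches", "nausea", "stomachaches", "vomiting", "dizzy"},
    {"headaches", "nausea", "stomachaches", "overtired"},
    {"headaches", "stomachaches", "vomiting", "dizzy"},
    {"headaches", "nausea", "vomiting", "aches"},
]
print(sorted(core_items(somatic_versions)))
# -> ['headaches', 'nausea', 'stomachaches', 'vomiting']
```

Applying the same function to the core versions for the different sex/age groups, rather than to the data-set versions, gives a central core in the sense the chapter uses for items shared by a majority of sex/age groups.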
As an example, a syndrome designated as Somatic Complaints was
found in all groups. It consisted of items such as headaches, nausea,
stomachaches, and vomiting without known medical cause. The core
version of the Somatic Complaints syndrome for each sex/age group consisted
of items found to co-occur in the principal components/varimax analyses of
at least three of the four data sets. The syndrome scores for Somatic Complaints
consisted of the sum of ratings on the items of a particular version of the
syndrome. Partial correlations were computed between these syndrome
scores, with all items that did not load highly on any version of the
Somatic Complaints syndrome partialled out of the correlations.
High correlations between core versions of a syndrome and the
versions derived from the American CBCL, the Dutch CBCL, and the ACQ
supported seven syndromes in parents' ratings of each sex within the age
ranges 6 to 11 and 12 to 16. Three other syndromes were supported for
one sex. These findings indicated that the core versions could serve as
good "prototypes" (Achenbach, 1985; Rosch, 1978) of the different ver-
sions of syndromes obtained in parents' ratings of different clinical
samples on the American CBCL, the Dutch CBCL, the entire ACQ, and the
ACQ items that had counterparts on the CBCL.
To determine the degree of similarity between the core versions of
syndromes for the different sex/age groups, we then computed partial
correlations among the core versions for all the sex/age groups in which
they were found. (The sum of scores for items not included in any core
syndrome was partialled out.) To take Somatic Complaints as an illustra-
tion, we computed partial correlations among the four core versions of
the Somatic Complaints syndrome for each combination of sex/age
groups, such as boys 6-11 versus girls 6-11, boys 6-11 versus boys 12-16,
and so on. Because the correlations were high between most combina-
tions of core syndromes, we constructed a central core version of each
syndrome. The central core syndrome consisted of items that occurred
together in the core syndromes for a majority of the sex/age groups for
which core syndromes had been established. Thus, for Somatic Com-
plaints, the central core syndrome consisted of items that were included
in the core Somatic Complaints syndromes of at least three of the four sex/age groups.
To test the degree to which the central core syndromes could
represent multiple sex/age groups, we computed the partial correlations
between scores for each core syndrome and the corresponding central
core syndromes. Moderate to high mean correlations between the rel-
evant core syndromes and their respective central core syndromes
indicated that the central core syndromes accurately represented the
variance accounted for by most of the core syndromes.

CBCL, YSR, and TRF core syndromes. The foregoing analyses dem-
onstrated the feasibility of identifying core sets of items that co-occur in
parents' ratings across multiple sex/age groups, despite variations in the
other items that are associated with the syndromes in particular sex/age
groups. The next step was to extend this approach to the identification of
core syndromes in ratings by different informants. Because the ACQ
analyses had not revealed any syndromes that were not also found in the
CBCL, and because the CBCL has parallel forms for teacher- and self-
reports, we started with core syndromes identified on the CBCL. However,
we had by now accumulated larger clinical samples than those from
which the original CBCL syndromes were derived. In addition, we wished
to include syndromes for 4- and 5-year-olds and we wished to extend the
age range to 18. We therefore performed new principal components/
varimax analyses of the CBCL. These new analyses of the CBCL were done
separately for each sex at ages 4 to 5, 6 to 11, and 12 to 18. We also
performed new principal components/varimax analyses of the TRF for
each sex separately at ages 5 to 11 and 12 to 18 and of the YSR for each sex
separately at ages 11 to 18. To maximize comparability among the
analyses, we used the 89 problem items that appear on all three instru-
ments, excluding any items that were reported for a very small percent of
the sample being analyzed.
Within each instrument, we identified syndromes that were similar
for multiple sex/age groups. We then identified items that were common
to the versions of a syndrome in the majority of the sex/age groups for
which the syndrome was found. For example, versions of an Aggressive
syndrome were found for all six sex/age groups on the CBCL, all four on
the TRF, and both groups on the YSR. The core CBCL Aggressive syn-
drome was constructed from items that were included in the Aggressive
syndromes for at least four of the six CBCL sex/age groups. The core TRF
Aggressive syndrome was constructed from items that were included in
the Aggressive syndromes for at least three of the four TRF sex/age
groups. And the core YSR Aggressive syndrome was constructed from
items that were included in the Aggressive syndromes for both sexes on
the YSR. Some additional syndromes were found for particular sex/age
groups on one instrument, but I will focus here on the syndromes that
were found for most sex/age groups in ratings by multiple informants.

Cross-Informant Syndrome Constructs


In order to advance the integration of data from multiple informants,
we compared the core syndromes derived from the CBCL, YSR, and TRF.
For those core syndromes that had counterparts in multiple instruments,
we identified items that were common to the core syndromes of at least
two of the three instruments. These common items were used to form
cross-informant constructs that represent what is common to syndromes
scorable from different informants. Although the items comprising the
constructs were empirically derived from the three instruments, the term
"construct" signifies that these common items represent a hypothetical
variable that may not be exhaustively measured by any of the three
instruments alone. In statistical language, these constructs constitute
"latent variables." Ratings by a particular type of informant on a syn-
drome scale of one of the three instruments provide one operational
definition for a construct. Ratings by the other types of informants on
their respective instruments provide other operational definitions for the
same construct. Ratings by the different informants can all be used to
assess the same construct, but the ratings are not necessarily expected
to correlate highly with each other.

Prototype concepts. In addition to thinking in terms of "hypothetical
constructs" and "latent variables," it is also helpful to think of
cross-informant syndromes in terms of prototype concepts. According to findings
from cognitive research, people's concept of a particular category con-
sists of a set of features constituting a mental model or prototype for
entities that are judged to belong to the category (Rosch, 1978). Because
the defining features of a prototype are not all perfectly correlated with
each other, members of a category can vary in the number and particular
subset of features that they display. Category membership is thus not
all-or-none. Instead, it is a matter of degree that involves the number of
prototypical features manifested. Cases that manifest many prototypical
features of Category A and few prototypical features of Categories B, C,
and D are easily assigned to Category A. Cases that do not manifest many
more features of one category than of other categories, by contrast, are
more difficult to categorize. Research has shown that the prototype view
characterizes psychodiagnostic thinking better than do traditional con-
cepts of categories as being defined in all-or-none fashion by certain
necessary and sufficient criteria (Cantor, Smith, French, & Mezzich, 1980;
Horowitz, Post, French, Wallis, & Siegelman, 1981; Horowitz, Wright,
Lowenstein, & Parad, 1981).
The cross-informant syndrome constructs correspond to the prototype
model, because these constructs comprise sets of features that are
not perfectly correlated with each other. That is, individuals may
manifest different numbers of the prototypical features of a particular
construct. Furthermore, each feature may be manifested in degrees, as
scored on the 0-1-2 scales of the CBCL, TRF, and YSR. By summing the
0-1-2 scores for all features of a syndrome, we can assess the degree to which
a child resembles the prototype for that syndrome, according to ratings
by a particular informant. In addition, we can compare the syndrome
scores obtained by the child in ratings by multiple informants to deter-
mine the degree to which the child matches one or more prototypes in the
ratings by different informants.
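Scoring a child's resemblance to a prototype is a sum over the syndrome's items. The sketch below uses the Withdrawn items listed in Table 2 (items 80 and 88 are not on the YSR) and, for simplicity, ignores the instrument-specific items that the actual scales also include; the ratings and function name are hypothetical.

```python
VALID_RATINGS = {0, 1, 2}


def raw_syndrome_score(ratings, scale_items):
    """Sum one informant's 0-1-2 ratings over the items of a syndrome scale."""
    total = 0
    for item in scale_items:
        rating = ratings[item]
        if rating not in VALID_RATINGS:
            raise ValueError(f"item {item}: rating {rating} is not 0, 1, or 2")
        total += rating
    return total


# Withdrawn items from Table 2; items 80 and 88 are not on the YSR.
withdrawn_cbcl = [42, 65, 69, 75, 80, 88, 102, 103, 111]
withdrawn_ysr = [42, 65, 69, 75, 102, 103, 111]

# Hypothetical ratings of the same child by a parent and by the child.
parent = {42: 1, 65: 0, 69: 2, 75: 1, 80: 1, 88: 2, 102: 0, 103: 1, 111: 1}
youth = {42: 2, 65: 0, 69: 1, 75: 0, 102: 0, 103: 2, 111: 1}

scores = {
    "parent (CBCL)": raw_syndrome_score(parent, withdrawn_cbcl),
    "self (YSR)": raw_syndrome_score(youth, withdrawn_ysr),
}
print(scores)  # {'parent (CBCL)': 9, 'self (YSR)': 6}
```

Because the two scales have different numbers of items, these raw sums are not directly comparable across informants; the profiles make that comparison through normed scores for each instrument.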
The concept of syndromes as prototypical sets of correlated features
corresponds to the "polythetic" concept of some DSM-III-R diagnostic
categories (American Psychiatric Association, 1987, p. xxiv). According
to the DSM's polythetic concept, the symptoms listed as criteria for a
disorder are fully interchangeable with each other. Because no one of the
symptoms is required for making a diagnosis, any combination of the
required number of symptoms can justify the diagnosis. This is analogous
to the use of cutpoints on the distributions of syndrome scores for
distinguishing between children who are in the normal versus clinical
range.
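The analogy can be made concrete. Under a polythetic rule, any subset of at least k listed criteria qualifies; under a cutpoint, any combination of item ratings whose sum reaches the cutpoint qualifies. The criteria list, k, the cutpoint, and the function names below are hypothetical and do not correspond to any particular DSM disorder or profile scale.

```python
def meets_polythetic_rule(present, criteria, required):
    """True if at least `required` listed criteria are present, in any combination."""
    return len(set(present) & set(criteria)) >= required


def in_clinical_range(raw_score, cutpoint):
    """True if a summed syndrome score reaches a (hypothetical) clinical cutpoint."""
    return raw_score >= cutpoint


criteria = [f"c{i}" for i in range(1, 9)]  # eight hypothetical criteria

# Two children with no criteria in common both meet a 4-of-8 rule.
child_1 = {"c1", "c2", "c3", "c4"}
child_2 = {"c5", "c6", "c7", "c8"}
print(meets_polythetic_rule(child_1, criteria, 4),
      meets_polythetic_rule(child_2, criteria, 4))  # True True

# Likewise, different 0-1-2 rating patterns can reach the same summed score.
pattern_1 = [2, 2, 2, 0, 0, 0]
pattern_2 = [1, 1, 1, 1, 1, 1]
print(in_clinical_range(sum(pattern_1), 6),
      in_clinical_range(sum(pattern_2), 6))  # True True
```

One difference remains: each DSM criterion is judged present or absent, whereas each syndrome item contributes 0, 1, or 2, so the summed score also preserves gradations below and above the cutpoint.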

Instrument-Specific Syndrome Scales


The cross-informant constructs serve both to reflect the common
elements underlying syndromes identified in ratings by different infor-
mants and to provide a common focus for the assessment of children
according to data from the different informants. Yet, one type of infor-
mant may be able to contribute data on some items and even on some
syndromes that may not be evident to other types of informants. The
differences between the problem items listed on the CBCL and TRF, for
example, reflect differences in the potential access of parents and teach-
ers to different kinds of problem behaviors. The CBCL includes items
such as Disobedient at home and Runs away from home, which are unlikely
to be observed by teachers. The TRF, on the other hand, includes items
such as Disturbs other pupils and Sleeps in class, which are unlikely to be
observed by parents.
To identify items that might be associated with a syndrome in ratings
by a particular type of informant and also to identify syndromes that
might not be evident in the items common to all informants, we performed
separate principal components analyses of all items that were on a
particular instrument. Thus, for example, unlike the previously described
analyses of only the 89 problem items common to all three instruments,
we analyzed all 118 problem items of the CBCL and TRF and all 102 of the
YSR, excluding any that were reported for a very small percent of the
sample being analyzed. These analyses identified some syndromes that
were not evident in the analyses of the 89 common items, as well as some
items that were associated with the cross-informant syndromes in ratings
by a particular informant. Because our focus is on the cross-informant
syndromes, we will not present here the syndromes that are limited to
one instrument, but they are presented in the 1991 Manuals for the CBCL
and YSR (Achenbach, 1991b, 1991c; no syndromes were limited to the
TRF).
Table 2 lists the items defining the eight cross-informant syndrome
constructs that can be scored from the CBCL, YSR, and TRF. These are
items that were included in the core versions of the syndromes for at least
two of the three instruments. The eight core syndromes were identified
in the principal components/varimax analyses of the 89 items that are
common to the three instruments. However, in our principal compo-
nents/varimax analyses of the complete set of items for each instrument,
we identified some additional items that were consistently associated
with a particular syndrome in ratings by one or two types of informant.
These items may capture aspects of a syndrome that are reportable by
only one or two types of informants. We therefore included these addi-
tional items on only the scale(s) for scoring the syndrome from the
informant(s) for whom the items were included in the core syndrome.
Item 80. Stares blankly, for example, was found in the core Attention
Problems syndrome derived from the CBCL and TRF. This item is not on
the YSR, however, because it is not likely to be reported by youths about
themselves. Because it was associated with the core syndrome for two of
the three types of informants, it is included in the cross-informant
syndrome construct, as shown in Table 2.
Other items that were associated with a core syndrome in ratings by
only one type of informant are included in the syndrome scale for only
that type of informant. These instrument-specific items are not shown in
Table 2, but are specified in the Integrative Guide and the Manuals for the
1991 editions of the CBCL, YSR, and TRF Profiles (Achenbach, 1991a,
1991b, 1991c, 1991d). An example is item 24. Disturbs other pupils, which
consistently loaded on the Aggressive Behavior syndrome in teacher
ratings but is not on the CBCL or YSR. Because this item captures an
aspect of the Aggressive Behavior syndrome that is scorable only from
the TRF, it is included in the version of the Aggressive Behavior syndrome
scored from the TRF.
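The rule for building each instrument's scale can be sketched with set operations: take the construct items that the instrument contains, then add any items tied to the syndrome only in that informant's ratings. The item pools below are toy assumptions containing just what the example needs, and `instrument_scale` is an illustrative name; the only facts taken from the text are the construct items in Table 2, that item 80 is not on the YSR, and that item 24 (Disturbs other pupils) belongs to the TRF version of Aggressive Behavior.

```python
def instrument_scale(construct_items, instrument_items, specific_items=()):
    """Items used to score one syndrome from one instrument.

    construct_items: items of the cross-informant construct (Table 2).
    instrument_items: items that appear on the instrument.
    specific_items: items associated with the syndrome only for this instrument.
    """
    return (set(construct_items) & set(instrument_items)) | set(specific_items)


attention_construct = {1, 8, 10, 13, 17, 41, 45, 61, 62, 80}
aggressive_construct = {3, 7, 16, 19, 20, 21, 23, 27, 37, 57, 68, 74,
                        86, 87, 93, 94, 95, 97, 104}

# Toy item pools (assumptions, not the real forms).
ysr_pool = attention_construct - {80}   # item 80 is not on the YSR
trf_pool = aggressive_construct | {24}  # assume all construct items are on the TRF

ysr_attention = instrument_scale(attention_construct, ysr_pool)
trf_aggressive = instrument_scale(aggressive_construct, trf_pool,
                                  specific_items={24})
print(len(ysr_attention), 24 in trf_aggressive)  # 9 True
```

Item 80 stays in the construct because two of the three informants can report it, yet it drops out of the YSR scale automatically because the YSR cannot score it.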

Table 2
Items Defining the Cross-Informant Syndrome Constructs Derived from the
Child Behavior Checklist (CBCL), Youth Self-Report (YSR), and Teacher's
Report Form (TRF) (a)

Internalizing Scales

Withdrawn
42. Would rather be alone
65. Refuses to talk
69. Secretive
75. Shy, timid
80. Stares blankly (d)
88. Sulks (d)
102. Underactive
103. Unhappy, sad, depressed
111. Withdrawn

Somatic Complaints
51. Feels dizzy
54. Overtired
56a. Aches, pains
56b. Headaches
56c. Nausea
56d. Eye problems
56e. Rashes, skin problems
56f. Stomachaches
56g. Vomiting

Anxious/Depressed
12. Lonely
14. Cries a lot
31. Fears impulses
32. Needs to be perfect
33. Feels unloved
34. Feels persecuted
35. Feels worthless
45. Nervous, tense
50. Fearful, anxious
52. Feels too guilty
71. Self-conscious
89. Suspicious
103. Unhappy, sad, depressed
112. Worries

Neither Internalizing nor Externalizing

Social Problems
1. Acts too young
11. Too dependent
25. Doesn't get along w. peers
38. Gets teased
48. Not liked by peers
62. Clumsy
64. Prefers younger kids

Thought Problems
9. Can't get mind off thoughts
40. Hears things
66. Repeats acts
70. Sees things
84. Strange behavior
85. Strange ideas

Attention Problems
1. Acts too young
8. Can't concentrate
10. Can't sit still
13. Confused
17. Daydreams
41. Impulsive
45. Nervous, tense
61. Poor school work
62. Clumsy
80. Stares blankly (d)

Externalizing Scales

Delinquent Behavior
26. Lacks guilt
39. Bad companions
43. Lies
63. Prefers older kids
67. Runs away from home (b)
72. Sets fires (b)
81. Steals at home
82. Steals outside home
90. Swearing, obscenity
101. Truancy
105. Alcohol, drugs

Aggressive Behavior
3. Argues
7. Brags
16. Mean to others
19. Demands attention
20. Destroys own things
21. Destroys others' things
23. Disobedient at school
27. Jealous
37. Fights
57. Attacks people
68. Screams
74. Shows off
86. Stubborn, irritable
87. Sudden mood changes
93. Talks too much
94. Teases
95. Temper tantrums
97. Threatens
104. Loud

(a) Items are designated by the numbers they bear on the CBCL, YSR, and TRF and
summaries of their content. (b) Not on TRF. (c) Not on CBCL. (d) Not on YSR.
(From Achenbach, 1991a.) © Copyright T. M. Achenbach

Profiles for Scoring the Syndromes


To assess children on the eight syndromes, separate profiles
have been constructed for scoring the CBCL, YSR, and TRF. Each profile
displays the scales of items used to score the syndromes from that
particular instrument. The scores (0, 1, 2) for each item of a syndrome
scale are summed to obtain a total raw score for that syndrome as
reported by a particular informant. To provide a normative basis for
comparison, the raw scale scores are converted to T scores derived from
national normative samples. The profile indicates how a child's scale
score compares with those of the normative samples of children of the
same sex and age range. The profile also indicates a normal, borderline,
and clinical range for each syndrome scale. Figure 1 summarizes relations
between the derivation of syndromes, formulation of cross-informant
syndrome constructs, and construction of scales for scoring the
syndromes on the 1991 CBCL, YSR, and TRF profiles.
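The scoring steps just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the 1991 scoring software. The function names are hypothetical. The linear T-score formula and the normative mean and SD are stand-ins, because the published profiles derive T scores from the actual normative distributions. The default borderline and clinical cutoffs (T 67 to 70 borderline, above 70 clinical) are assumed values that should be checked against the profile manuals.

```python
def raw_scale_score(item_scores):
    """Sum the 0/1/2 ratings for the items on one syndrome scale."""
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each problem item must be scored 0, 1, or 2")
    return sum(item_scores)


def linear_t_score(raw, norm_mean, norm_sd):
    """Illustrative T score (mean 50, SD 10) from hypothetical normative values.

    The published profiles do not use this simple linear formula; it only shows
    the idea of locating a raw score relative to a normative sample.
    """
    return 50 + 10 * (raw - norm_mean) / norm_sd


def syndrome_range(t_score, borderline_min=67, clinical_above=70):
    """Classify a syndrome T score; the default cutoffs are assumptions."""
    if t_score > clinical_above:
        return "clinical"
    if t_score >= borderline_min:
        return "borderline"
    return "normal"


# Hypothetical ratings of the nine Withdrawn items by one informant.
ratings = [2, 1, 0, 1, 2, 0, 1, 2, 1]
raw = raw_scale_score(ratings)                       # 10
t = linear_t_score(raw, norm_mean=3.0, norm_sd=2.5)  # about 78 with these made-up norms
print(raw, round(t), syndrome_range(t))
```

Keeping raw scoring, T conversion, and range classification in separate functions mirrors the profile's three steps, so the stand-in T conversion can be replaced with a table lookup from real norms without touching the rest.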

As noted previously, broad-band groupings of syndromes designated
as Internalizing and Externalizing were identified through second-order
analyses of our pre-1991 syndromes (e.g., Achenbach & Edelbrock, 1983).
To determine whether similar groupings could be identified in the 1991
syndromes, we performed second-order principal factor/varimax
analyses of the eight syndromes scored by each type of informant for our
clinical samples of each sex/age group.
Averaged across all groups, the Withdrawn, Somatic Complaints, and
Anxious/Depressed syndromes comprised the Internalizing grouping.
The Aggressive Behavior and Delinquent Behavior syndromes comprised
the Externalizing grouping. The 1991 CBCL, YSR, and TRF profiles for all
sex/age groups are arranged in a uniform format to reflect these
groupings, with the three Internalizing scales on the left, the two Externalizing
scales on the right, and the three remaining scales (Social Problems,
Thought Problems, and Attention Problems) in the middle.
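A second-order analysis of this kind factors the correlations among syndrome scores and rotates the resulting loadings. The NumPy sketch below is illustrative only. It substitutes a principal-components extraction for the principal-factor method named in the text (which would place communality estimates on the diagonal), uses the standard Kaiser varimax criterion, and runs on a small hypothetical correlation matrix rather than on any real syndrome data.

```python
import numpy as np


def component_loadings(corr, n_factors):
    """Unrotated loadings from the largest eigenvalues of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]      # keep the largest n_factors
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (Kaiser criterion, gamma = 1)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                               # stays orthogonal
        new_criterion = np.sum(s)
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical correlations among five scales: scales 0-2 and 3-4 form two clusters.
corr = np.full((5, 5), 0.1)
corr[:3, :3] = 0.6
corr[3:, 3:] = 0.6
np.fill_diagonal(corr, 1.0)

rotated = varimax(component_loadings(corr, n_factors=2))
print(np.round(rotated, 2))
print(np.argmax(np.abs(rotated), axis=1))  # factor on which each scale loads most
```

Because the rotation is orthogonal, each scale's communality (its sum of squared loadings) is unchanged. Only the distribution of loadings across factors changes, which is what makes the two clusters stand out as separate groupings.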
To assess children in terms of the broad-band Internalizing-Externalizing
distinction, raw scores and T scores are computed for the sum of
Internalizing items and the sum of Externalizing items, respectively.
Normal, borderline, and clinical ranges have been established via
comparisons of the distributions of Internalizing, Externalizing, and total
problem scores for normative versus clinical samples. Children whose
total problem scores are in the clinical range can be divided into those
who manifest mainly Internalizing problems versus those who manifest
mainly Externalizing problems, as judged from the differences between
their Internalizing and Externalizing scores.

[Figure 1 labels: Cross-Informant Syndrome Construct; Taxonomic ("latent
variable," "prototype") Construct, e.g., Aggressive Behavior Syndrome;
Derivation Operations; Assessment Operations; data on large clinical
samples of each sex/age group.]

Figure 1. Relations between derivation of syndromes, formulation of cross-
informant syndrome constructs, and construction of profile scales. (From
Achenbach, 1991a.) ©Copyright T. M. Achenbach
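The division of clinically deviant children into mainly Internalizing and mainly Externalizing cases can be sketched as follows. The text does not state the decision rule, so the total-problems cutoff (T above 63) and the minimum Internalizing-Externalizing difference (10 T-score points) are assumed, illustrative parameters. The third "no clear predominance" outcome is an addition that becomes necessary once any difference threshold is used.

```python
def broadband_pattern(total_t, internalizing_t, externalizing_t,
                      total_clinical_above=63, min_difference=10):
    """Label a profile by comparing its Internalizing and Externalizing T scores.

    Both default thresholds are assumptions for illustration only.
    """
    if total_t <= total_clinical_above:
        return "total problems not in clinical range"
    difference = internalizing_t - externalizing_t
    if difference >= min_difference:
        return "mainly Internalizing"
    if difference <= -min_difference:
        return "mainly Externalizing"
    return "no clear predominance"


print(broadband_pattern(total_t=70, internalizing_t=72, externalizing_t=58))
# mainly Internalizing
```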

Integrating Data from the CBCL, YSR, and TRF


Cross-Informant Computer Program
In using multiaxial assessment, each source of data should be checked
against other sources to detect both similarities and differences in how
the child appears in various contexts. To facilitate comparisons between
father-, mother-, self-, and teacher-reports, we have developed a cross-
informant computer program (described by Achenbach, 1991a). Data
from the CBCL, YSR, and TRF can be entered into the program. For each
informant, the program then scores a separate profile that compares the
informant's ratings with those for national normative samples scored by
the same type of informant.
Besides displaying the separate profiles, the program lists side-by-side
the raw scores for each of the 89 common items and the T scores for the
eight syndrome scales, Internalizing, Externalizing, and total problems,
as obtained from each informant. The program also indicates whether
each scale score is in the normal, borderline, or clinical range, enabling
the user to identify agreements and disagreements in terms of these
categories.
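Comparing informants by range category, as the program does, amounts to classifying each informant's T score on every shared scale and checking whether the categories match. The sketch below is hypothetical and is not the cross-informant program itself. The function names and example scores are illustrative, and the cutoffs are the same assumed syndrome-scale values used earlier.

```python
def score_range(t_score, borderline_min=67, clinical_above=70):
    """Assumed syndrome-scale cutoffs; check them against the profile manuals."""
    if t_score > clinical_above:
        return "clinical"
    if t_score >= borderline_min:
        return "borderline"
    return "normal"


def compare_ranges(t_scores):
    """Compare range categories across informants for every scale they share.

    t_scores maps an informant name to a dict of {scale name: T score}.
    Returns {scale: (ranges by informant, True if all informants agree)}.
    """
    shared = set.intersection(*(set(scales) for scales in t_scores.values()))
    report = {}
    for scale in sorted(shared):
        ranges = {who: score_range(scores[scale]) for who, scores in t_scores.items()}
        report[scale] = (ranges, len(set(ranges.values())) == 1)
    return report


# Hypothetical T scores from two informants.
example = {
    "mother": {"Aggressive Behavior": 72, "Withdrawn": 55},
    "teacher": {"Aggressive Behavior": 68, "Withdrawn": 60},
}
for scale, (ranges, agree) in compare_ranges(example).items():
    print(scale, ranges, "agree" if agree else "disagree")
```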
To provide quantitative indices of the degree of agreement between
informants, the program computes Q correlations between the 89 item
scores for each combination of informants and also between the 8
syndrome scale scores. (A Q correlation is computed across a set of items
scored from two sources of data, in contrast to the more familiar R
correlation, which is computed between two items scored across a
sample of subjects. Our cross-informant Q correlations are computed by
applying the Pearson product-moment formula to the relations between
the sets of scores obtained from a pair of informants. For example, the
scores assigned to the 89 common items by the child's mother are
correlated with the scores assigned to the 89 items by the child's
teacher.) To evaluate the magnitude of the correlations between particu-
lar raters, the cross-informant program also prints out the 25th percentile
and the 75th percentile correlations obtained in normative samples.
Thus, if the correlation between a particular mother and teacher is below
the 25th percentile, their agreement is considered to be relatively low. If
the correlation is above the 75th percentile, on the other hand, their
agreement is relatively high. All of these data can be used in multi-
informant assessment for research, training, and clinical purposes.
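The Q correlation described above is an ordinary Pearson r computed across the items rated by one pair of informants, then read against normative percentile values. A minimal sketch follows. The function names, item scores, and percentile values are hypothetical: only 10 items are shown instead of the 89 common items, and the 25th and 75th percentile values are made up, not taken from the normative samples.

```python
from math import sqrt


def q_correlation(scores_a, scores_b):
    """Pearson r computed across items for one pair of informants."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists of item scores")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("undefined when one informant gives every item the same score")
    return cov / sqrt(ss_a * ss_b)


def rate_agreement(q, p25, p75):
    """Compare q with the 25th and 75th percentile values from normative samples."""
    if q < p25:
        return "relatively low"
    if q > p75:
        return "relatively high"
    return "between the 25th and 75th percentiles"


mother = [0, 1, 2, 0, 0, 1, 2, 1, 0, 0]   # hypothetical item scores
teacher = [0, 1, 1, 0, 1, 2, 2, 0, 0, 0]
q = q_correlation(mother, teacher)
print(round(q, 2), rate_agreement(q, p25=0.20, p75=0.50))  # made-up percentile values
```

Note the direction of computation: the sum runs over items within one child, not over children within one item, which is what distinguishes a Q correlation from the familiar R correlation.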

Taxonomic Decision Tree


Another aid to integrating data from multiple sources is the
taxonomic decision tree shown in Figure 2. The sequence of decisions
depicted by the tree can make use of the data yielded by the CBCL, YSR,
and TRF profiles and by other data that focus on similar syndromes and
have explicit criteria for distinguishing between the normal and clinical
range. Standardized observational procedures and interviews, for
example, can be used for comparison with the parent-, self-, and
teacher-ratings. (McConaughy & Achenbach, 1988, 1990, provide details
of the Direct Observation Form and Semistructured Clinical Interview
that can be used in this way.)

[Figure 2: decision tree. Potential sources of data -> initial screen. If no
scale is deviant, conclusion: no evidence of clinical deviance; check
individual items for important problems, e.g., suicidal behavior,
firesetting. Otherwise: Is deviance confined to the same syndrome in all
sources? Yes, conclusion (differential diagnosis): child's problems
correspond to a single syndrome, e.g., Aggressive, Depressed. No: Is the
same combination of syndromes deviant in all sources? Yes, conclusion:
child's problems comprise multiple syndromes or a higher-order profile
pattern. No: Does evidence indicate that child's behavior actually differs
among contexts? Yes, conclusion: different behaviors may have to be
targeted for change in different contexts. No, conclusion: some
informants' perceptions may have to be targeted for change.]

Figure 2. Taxonomic decision tree for using empirically based assessment
procedures. (From Achenbach, 1991a.) ©Copyright T. M. Achenbach
As shown in Figure 2, we start at the top with data from any combina-
tion of the five potential sources, including parents, self-reports, teachers,
interviews, and observations. After syndrome scales are scored from
each source, the initial screening question is whether any scales are in the
clinical range. A global screen for deviance would include total problem,
Internalizing, and Externalizing scores, as well as syndrome scales from
each source. If no scores are in the clinical range, the data indicate that
the child is not clinically deviant. Nevertheless, individual items should
be examined for evidence of problems that are important in their own
right, such as suicidal behavior and firesetting, whether or not any scales
are in the clinical range.
If any scales are in the clinical range, we ask whether deviant scores
occur on the same syndrome in all sources of data that show any
deviance. This is a question of differential diagnosis. If deviance is
confined to the same syndrome in all data, this indicates focalized
problems in the area of that syndrome.
If deviant scores are not confined to a single syndrome, we then ask
whether the same combination of syndromes is deviant in all data. If the
answer is yes, this indicates that the child's problems comprise multiple
syndromes or a complex profile pattern that might correspond to profile
types identifiable through cluster analysis.
On the other hand, if the data sources differ in the syndromes they
show to be deviant, we need to determine whether the child's behavior
actually differs much among contexts. If the answer is yes, we conclude
that different behaviors may have to be targeted for change in different
contexts. If the answer is no, however, this suggests that some of the
informants' perceptions of the child may need changing.
Additional choices besides those represented in the decision tree are
possible, as are additional sources of data, such as medical examinations,
interviews with parents and teachers, family assessment, and psycho-
logical tests. However, the taxonomic decision tree and cross-informant
program are especially valuable for focusing research, training, and
clinical decision-making on taxonomic distinctions that can be made from
multiple sources relevant to the assessment of most children.
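The branching logic of the decision tree can be written as a short function. This is a hypothetical sketch, not a published procedure. It considers only syndrome scales (not total, Internalizing, or Externalizing scores). Following the wording above, it compares only the sources that show some deviance, and it assumes the "same combination" question applies to those same sources. The context question cannot be answered from scores, so it is passed in as the assessor's judgment.

```python
def taxonomic_decision(deviant_scales, behavior_differs_among_contexts=None):
    """Walk the decision tree for one child.

    deviant_scales maps each data source (e.g., "CBCL", "TRF", "observation")
    to the set of syndrome scales it places in the clinical range.
    behavior_differs_among_contexts is the assessor's judgment, needed only
    when the sources disagree about which syndromes are deviant.
    """
    deviant = {source: frozenset(scales)
               for source, scales in deviant_scales.items() if scales}
    if not deviant:
        return "no evidence of clinical deviance; still check individual items"
    combinations = set(deviant.values())
    if len(combinations) == 1:
        (combination,) = combinations
        if len(combination) == 1:
            return "problems correspond to a single syndrome"
        return "multiple syndromes or higher-order profile pattern"
    if behavior_differs_among_contexts is None:
        raise ValueError("sources disagree: judge whether behavior differs among contexts")
    if behavior_differs_among_contexts:
        return "target different behaviors in different contexts"
    return "target some informants' perceptions"


print(taxonomic_decision({"CBCL": {"Aggressive Behavior"},
                          "TRF": {"Aggressive Behavior"},
                          "YSR": set()}))
# problems correspond to a single syndrome
```

Raising an error when the sources disagree and no context judgment is supplied keeps the function from silently choosing between the last two conclusions, which the tree reserves for evidence outside the rating scales.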

Summary
Multiaxial empirically based assessment refers to the use of assess-
ment data and taxonomic constructs that are empirically derived from
multiple sources.
Assessment involves identifying the distinguishing features of each case.
Taxonomy involves grouping cases according to their distinguishing features.
Empirically based assessment links assessment and taxonomy by deriving
taxonomic constructs from specific assessment data and by operationalizing
the taxonomic constructs via specific assessment procedures.
Meta-analyses of many studies have revealed significant but modest
correlations between assessment data from different kinds of informants
seeing children in different contexts. Furthermore, multivariate analyses
have shown variations in the patterning and prevalence of problems
reported for children of each sex at different ages.
To deal with the variations among informants and sex/age groups, we
derived core syndromes consisting of sets of problem items found to co-
occur in ratings by a particular type of informant for multiple sex/age
groups. We then compared the core syndromes derived from parent-,
self-, and teacher-ratings to identify sets of items that co-occurred in
ratings by all three types of informants. The items that were common to
the core syndromes for at least two of the three types of informants were
used to form cross-informant syndrome constructs. These constructs rep-
resent inferred or "latent" variables that may not be exhaustively measured
by any single source of data. The items defining a syndrome construct
provide a prototype for cases considered to manifest the syndrome. The
degree to which cases manifest the prototypical features of the syndrome
can be judged from parent-, self-, and teacher-ratings on the CBCL, YSR,
and TRF, respectively.
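The rule for forming a cross-informant construct (keep items that appear in the core syndrome for at least two of the three informant types) is a simple counting step. In the sketch below the function name is hypothetical and the item sets are invented for illustration. They are not the published core syndromes.

```python
from collections import Counter


def cross_informant_items(core_syndromes, min_informants=2):
    """Return items found in the core syndrome of at least min_informants informant types.

    core_syndromes maps an informant type (e.g., "CBCL") to the item numbers in
    its version of one core syndrome.
    """
    counts = Counter(item for items in core_syndromes.values() for item in set(items))
    return sorted(item for item, n in counts.items() if n >= min_informants)


core_aggressive = {          # hypothetical item sets, not the published ones
    "CBCL": {3, 7, 16, 19, 68},
    "YSR": {3, 7, 19, 94},
    "TRF": {3, 16, 68, 95},
}
print(cross_informant_items(core_aggressive))  # [3, 7, 16, 19, 68]
```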
Because some items are associated with a syndrome construct only
in ratings by a particular informant, each informant's ratings are scored
on instrument-specific syndrome scales. These scales are normed on
ratings by each type of informant for national samples of 4- to 18-year-
olds. The syndrome scales and scales for scoring competencies are
displayed on the 1991 profiles for the CBCL, YSR, and TRF.
To facilitate the integration of data from parent-, self-, and teacher-
reports, we have developed a cross-informant computer program for
scoring and comparing the CBCL, YSR, and TRF profiles for individual
children. We have also developed a taxonomic decision tree for compar-
ing deviance on scales scored from parents, self-reports, teachers,
interviews, and direct observations. This body of work is designed to aid
in focusing research, training, and clinical decision-making on taxonomic
distinctions that can be made from multiple sources relevant to the
assessment of most children.

References
Achenbach, T.M. (1965). A factor-analytic study of juvenile psychiatric symptoms.
Presented at the Society for Research in Child Development, Minneapolis, MN.
Achenbach, T.M. (1966). The classification of children's psychiatric symptoms: A
factor-analytic study. Psychological Monographs, 80 (No. 615).
Achenbach, T.M. (1978). The Child Behavior Profile: I. Boys aged 6-11. Journal of
Consulting and Clinical Psychology, 46, 478-488.
Achenbach, T.M. (1985). Assessment and taxonomy of child and adolescent
psychopathology. Newbury Park, CA: Sage.
Achenbach, T.M. (1990a). Young Adult Behavior Checklist. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T.M. (1990b). Young Adult Self-Report. Burlington, VT: University of
Vermont, Department of Psychiatry.
Achenbach, T.M. (1991a). Integrative guide for the 1991 CBCL/4-18, YSR, and TRF
Profiles. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991b). Manual for the Child Behavior Checklist and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991c). Manual for the Teacher's Report Form and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991d). Manual for the Youth Self-Report and 1991 Profile.
Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., & Brown, J.S. (1991). Bibliography of published studies using the
Child Behavior Checklist and related materials: 1991 edition. Burlington, VT:
University of Vermont, Department of Psychiatry.
Achenbach, T.M., Conners, C.K., Quay, H.C., Verhulst, F.C., & Howell, C.T. (1989).
Replication of empirically derived syndromes as a basis for taxonomy of child/
adolescent psychopathology. Journal of Abnormal Child Psychology, 17, 299-323.
Achenbach, T.M., & Edelbrock, C. (1981). Behavioral problems and competencies
reported by parents of normal and disturbed children aged four to sixteen.
Monographs of the Society for Research in Child Development, 46, Serial No. 188.
Achenbach, T.M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist
and Revised Child Behavior Profile. Burlington, VT: University of Vermont,
Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1986). Manual for the Teacher's Report Form and
Teacher Version of the Child Behavior Profile. Burlington, VT: University of
Vermont, Department of Psychiatry.

Achenbach, T.M., & Edelbrock, C. (1987). Manual for the Youth Self-Report and
Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., Edelbrock, C., & Howell, C.T. (1987). Empirically based
assessment of the behavioral/emotional problems of 2-3-year-old children.
Journal of Abnormal Child Psychology, 15, 629-650.
Achenbach, T.M., & Lewis, M. (1971). A proposed model for clinical research and
its application to encopresis and enuresis. Journal of the American Academy of
Child Psychiatry, 10, 535-554.
Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent
behavioral and emotional problems: Implications of cross-informant
correlations for situational specificity. Psychological Bulletin, 101, 213-232.
American Psychiatric Association (1952, 1968, 1980, 1987). Diagnostic and statistical
manual of mental disorders (1st, 2nd, 3rd, 3rd ed. rev.). Washington, DC: Author.
Anderson, L.M. (1969). Personality characteristics of parents of neurotic,
aggressive, and normal preadolescent boys. Journal of Consulting and Clinical
Psychology, 33, 575-581.
Cantor, N., Smith, E.E., French, R.deS., & Mezzich, J. (1980). Psychiatric diagnosis
as prototype categorization. Journal of Abnormal Psychology, 89, 181-193.
Costello, A.J., Edelbrock, C., Dulcan, M.K., Kalas, R., & Klaric, S.H. (1984). Report
on the Diagnostic Interview Schedule for Children (DISC). Pittsburgh, PA:
University of Pittsburgh, Department of Psychiatry.
Edelbrock, C., & Costello, A.J. (1988). Convergence between statistically derived
behavior problem syndromes and child psychiatric diagnoses. Journal of
Abnormal Child Psychology, 16, 219-231.
Edelbrock, C., Costello, A.J., Dulcan, M.K., Kalas, R., & Conover, N.C. (1985). Age
differences in the reliability of the psychiatric interview of the child. Child
Development, 56, 265-275.
Edelbrock, C., Costello, A.J., & Kessler, M.D. (1984). Empirical corroboration of
attention deficit disorder. Journal of the American Academy of Child Psychiatry,
23, 285-290.
Hafner, A.J., Quast, W., & Shea, M.J. (1975). The adult adjustment of one thousand
psychiatric and pediatric patients: Initial findings from a twenty-five year follow-
up. In R.D. Wirt, G. Winokur & M. Rolf. (Eds.), Life history research in
psychopathology (Vol. 4). Minneapolis, MN: University of Minnesota Press.
Horowitz, L.M., Post, D.L., French, R.deS., Wallis, K.D., & Siegelman, E.Y. (1981).
The prototype as a construct in abnormal psychology: 2. Clarifying disagreement
in psychiatric judgments. Journal of Abnormal Psychology, 90, 575-585.
Horowitz, L.M., Wright, J.C., Lowenstein, E., & Parad, H.W. (1981). The prototype
as a construct in abnormal psychology: 1. A method for deriving prototypes.
Journal of Abnormal Psychology, 90, 568-574.
Katz, P.A., Zigler, E., & Zalk, S.R. (1975). Children's self-image disparity: The effects
of age, maladjustment and action-thought orientation. Developmental
Psychology, 11, 546-550.

Mattison, R., Cantwell, D.P., Russell, A.T., & Will, L. (1979). A comparison of
DSM-II and DSM-III in the diagnosis of childhood psychiatric disorders. Archives of
General Psychiatry, 36, 1217-1222.
McConaughy, S.H., & Achenbach, T.M. (1988). Practical guide for the Child Behavior
Checklist and related materials. Burlington, VT: University of Vermont,
Department of Psychiatry.
McConaughy, S.H., & Achenbach, T.M. (1990). Guide for the Semistructured Clinical
Interview for Children Aged 6-11. Burlington, VT: University of Vermont,
Department of Psychiatry.
Mezzich, A.C., Mezzich, J.E., & Coffman, G.A. (1985). Reliability of DSM-III vs.
DSM-II in child psychopathology. Journal of the American Academy of Child Psychiatry,
24, 273-280.
Roff, J.D., Knight, R., & Wertheim, E. (1976). Disturbed preschizophrenics:
Childhood symptoms in relation to adult outcome. Journal of Nervous and
Mental Disease, 162, 274-281.
Rolf, J.E. (1972). The social and academic competence of children vulnerable to
schizophrenia and other behavior pathologies. Journal of Abnormal Psychology,
80, 225-243.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B.B. Lloyd (Eds.),
Cognition and categorization. Hillsdale, NJ: Erlbaum.
Strober, M., Green, J., & Carlson, G. (1981). The reliability of psychiatric diagnosis
in hospitalized adolescents: Interrater agreement using the DSM-III. Archives of
General Psychiatry, 38, 141-145.
Weinstein, S.R., Noam, G.G., Grimes, K., Stone, K., & Schwab-Stone, M. (1990).
Convergence of DSM-III diagnoses and self-reported symptoms in child and
adolescent inpatients. Journal of the American Academy of Child and Adolescent
Psychiatry, 29, 627-634.
Weintraub, S.A. (1973). Self-control as a correlate of an internalizing-externalizing
symptom dimension. Journal of Abnormal Child Psychology, 1, 292-307.
Werry, J.S., Methven, R.J., Fitzpatrick, J., & Dixon, H. (1983). The interrater
reliability of DSM-III in children. Journal of Abnormal Child Psychology, 11, 341-354.

Author Note
Much of the recent work reported here has been supported by NIMH Grant 40305
and the Spencer Foundation, for which the author is most grateful.
CHAPTER 4

The Psychopathy
Checklist-Revised (PCL-R)
An Overview for Researchers
and Clinicians

Stephen D. Hart, Robert D. Hare, and Timothy J. Harpur

The Psychopathy Checklist (PCL; Hare, 1980) and its revision (PCL-R;
Hare, 1985a, in press) are clinical rating scales that provide researchers
and clinicians with reliable and valid assessments of psychopathy. Their
development was spurred largely by dissatisfaction with the ways in
which other assessment procedures defined and measured psychopathy
(Hare, 1980, 1985b).
The PCL was originally intended only for research with forensic
populations. However, over the past ten years both the construct of
psychopathy and the PCL itself have become topics of intense interest to
members of the criminal justice system and to psychologists and
psychiatrists working in more traditional mental health settings. For example,
American and Canadian correctional and forensic psychiatric services
have started to use the PCL-R to assist in making decisions related to the
placement, treatment, and conditional release of offenders and patients
(e.g., Cotton, 1989; Correctional Service of Canada, 1989). An increasing
number of substance-abuse programs use the PCL-R to differentiate
between, and to tailor treatment programs for, psychopaths and other
patients (e.g., Gerstley, Alterman, McLellan, & Woody, 1990). PCL-
based assessments are also being used in large-scale studies of risk for
violence in the mentally disordered (The MacArthur Risk Study; Monahan,
1990), and in the DSM-IV field trials for evaluating different sets of criteria
for antisocial personality disorder (American Psychiatric Association,
1990; Gunderson, 1990; Hare, Hart, & Harpur, in press; Widiger, Frances,
Pincus, Davis, & First, in press).

STEPHEN D. HART, Assistant Professor, Department of Psychology, Simon
Fraser University, Burnaby, British Columbia, Canada.
ROBERT D. HARE, Professor, Department of Psychology, University of British
Columbia, Vancouver, British Columbia, Canada.
TIMOTHY J. HARPUR, Assistant Professor, Department of Psychology,
University of Illinois, Champaign, Illinois.
The primary purpose of this chapter is to provide an introduction to
the PCL-R. We begin by discussing the construct of psychopathy. Next, we
provide basic information concerning the PCL-R, followed by a review of
normative data and evidence concerning reliability and validity. Those
interested in obtaining more detailed or further technical information
should consult the manual for the PCL-R (Hare, in press).

The Construct of Psychopathy


In this section, we outline the construct of psychopathy and some
important research findings. A thorough review is beyond the scope of
this chapter; interested readers should consult Cleckley (1976) as well as
some of the many other references available, such as Doren (1987), Grant
(1977), Hare (1970), Hare and Schalling (1978), McCord (1982), McCord
and McCord (1964), Meloy (1988), Millon (1981), Reid, Dorr, Walker, and
Bonner (1986), Smith (1978), and Weiss (1987).

Definition
Psychopathy, as defined in the PCL-R, is a personality disorder. Like
all personality disorders, it has an early onset and characterizes the
individual's long-term functioning, resulting in social and interpersonal
dysfunction (e.g., American Psychiatric Association, 1980, 1987; Millon,
1981). Symptoms of psychopathy are usually evident by middle to late
childhood, and can be assessed reliably in adolescence (Forth, Hart, &
Hare, 1990; Robins, 1966). The disorder is chronic and persists well into
adulthood, although there may be some changes in its symptom pattern
after age 45 or so (Cleckley, 1976; Hare, McPherson, & Forth, 1988; Harpur
& Hare, 1990a; Robins, 1966). Psychopathy is associated with unstable
interpersonal relations, poor occupational functioning, and increased
risk of involvement in criminal activity (Cleckley, 1976; Hare, McPherson,
& Forth, 1988; Hart, Kropp, & Hare, 1988; Kosson, Smith, & Newman, 1990;
Woodruff, Guze, & Clayton, 1980). It is the disproportionate involvement
of psychopaths in crime that makes them of particular concern to the
criminal justice system.
Psychopathy can be differentiated from other personality disorders
on the basis of its characteristic pattern of interpersonal, affective, and
behavioral symptoms. Interpersonally, psychopaths are grandiose, ego-
centric, manipulative, dominant, forceful, and cold-hearted. Affectively,
they display shallow and labile emotions, are unable to form long-lasting
bonds to people, principles, or goals, and are lacking in empathy, anxiety,
and genuine guilt or remorse. Behaviorally, psychopaths are impulsive
and sensation-seeking, and tend to violate social norms; the most obvious
expressions of these predispositions involve criminality, substance abuse,
and a failure to fulfill social obligations and responsibilities.

Psychopathy, Sociopathy, and Antisocial Personality Disorder


The terms psychopathy, sociopathy, and antisocial personality dis-
order (APD) are often used to refer to the same underlying construct (Dix,
1980). However, as diagnostic labels they sometimes refer to criteria sets
that differ in important ways. Over the years the criteria for APD and
sociopathy, terms that we will use interchangeably here, have increas-
ingly emphasized antisocial and criminal behaviors, whereas definitions
of psychopathy typically include explicit reference to affective and
interpersonal characteristics.
A focus on behavior has increased the reliability of APD diagnoses,
but at the expense of validity (Frances & Widiger, 1986; Hare, 1983; Millon,
1981). Current criteria for APD do not include the affective and interper-
sonal characteristics that traditionally have been considered central to
psychopathy. As a result, the APD criteria do not distinguish the callous,
remorseless, and manipulative psychopath from other antisocial indi-
viduals (Harpur, Hare, & Hakstian, 1989). It is not surprising, therefore,
that APD is more or less synonymous with persistent antisocial behavior,
that as many as 75% or 80% of convicted felons warrant the diagnosis
(Correctional Service of Canada, 1990; Guze, 1976; Hare, 1983), or that
individuals diagnosed as APD are heterogeneous with respect to the
personality traits that define psychopathy. PCL-R criteria for psychopathy,
on the other hand, yield much lower base rates in criminal populations
(about 15 to 25%), and define individuals who are relatively homogeneous
with respect to interpersonal and affective traits (Hart & Hare, 1989).
In general, the empirical association between diagnoses of
psychopathy and APD in forensic populations is asymmetric: Most crimi-
nal psychopaths are also APD, but the reverse is not true. For example,
about 90% of criminal psychopaths, diagnosed according to PCL-R
criteria, meet DSM-III/DSM-III-R APD criteria, but only about 20% to 30% of
those diagnosed as APD meet the PCL-R criteria for psychopathy (Hare,
1983, in press; Hart & Hare, 1989). One reason for this asymmetry is that
in forensic populations the base rate for psychopathy, as defined by the
PCL/PCL-R, is much lower than is the base rate for APD. An additional
reason is that most criminal psychopaths engage in the sort of antisocial
behaviors that define APD, whereas the majority of prisoners and
forensic patients with APD show little evidence of the affective and interpersonal
features measured by the PCL/PCL-R.
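Bayes' theorem makes the base-rate argument concrete. As a rough worked example, assume that the rates quoted above all describe the same forensic population. Take P(APD | psychopathy) = .90, a psychopathy base rate of .20 (the middle of the 15% to 25% range), and an APD base rate of .80:

```latex
P(\text{psychopathy} \mid \text{APD})
  = \frac{P(\text{APD} \mid \text{psychopathy})\, P(\text{psychopathy})}{P(\text{APD})}
  \approx \frac{0.90 \times 0.20}{0.80} = 0.225
```

Letting the psychopathy base rate range from .15 to .25 and the APD base rate from .80 to .75 gives values from roughly .17 (0.90 × 0.15 / 0.80) to .30 (0.90 × 0.25 / 0.75). This is in line with the 20% to 30% figure reported above, and it shows how a low base rate alone can produce the asymmetry.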
The situation may change when the next version of DSM is published.
Indeed, the Axis II Work Group of the DSM-IV Task Force has identified APD
as the personality disorder most likely to undergo major changes in
DSM-IV (American Psychiatric Association, 1990). The primary goals of the
Work Group are to simplify the criteria for this disorder and at the same
time include more traditional items typical of psychopathy. Four criteria
sets are being evaluated and compared in field trials, under the overall
direction of Thomas Widiger (see Widiger et al., in press; Hare, Hart, &
Harpur, in press). Briefly, the four criteria sets are as follows: the existing
DSM-III-R criteria for APD; a shortened list of the DSM-III-R criteria; the
ICD-10 criteria for dyssocial personality disorder (see below); and a list of 10
items derived from the PCL-R, half measuring interpersonal and affective
behaviors and half measuring criminal and antisocial behaviors.

Assessment Issues
The problems with the diagnostic criteria for APD also apply to
self-report scales designed to assess psychopathy, such as Scale 4 (Pd) of the
MMPI/MMPI-2 (Hathaway & McKinley, 1943; Butcher, Dahlstrom,
Graham, Tellegen, & Kaemmer, 1989), or the Socialization (So) scale of the CPI
(Gough, 1969); they also seem to measure only the social deviance
components of psychopathy (Hare, 1985b; Harpur et al., 1989).²
Self-report scales are also problematic in that they require the cooperation of
the inmate or patient, and are susceptible to attempts at malingering or
socially desirable responding. This is a special concern when trying to assess
psychopathy, since lying and deceitfulness are commonly included as
diagnostic criteria for the disorder (Hare, 1985b; Hare, Forth, & Hart, 1989;
Hart, Forth, & Hare, in press). It was concerns such as these that
motivated the development of the PCL.

Development of the PCL and PCL-R


Global Ratings
In their early research Hare and his colleagues used global ratings to
identify psychopaths. After interviewing inmates and reviewing their
institutional files, raters made a subjective decision as to how closely the
inmate matched Cleckley's (1976) description of the prototypic
psychopath. These global ratings were made on a 7-point scale (1 = low
psychopathy, 7 = high psychopathy). The global ratings had good interrater
reliability, typically around r = .90 (Hare & Cox, 1978; Hare, 1985b). Global
ratings of 6 or 7 were considered diagnostic of psychopathy.
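The global-rating procedure reduces to two simple computations: an interrater Pearson r across the inmates both raters assessed, and a cutoff on the 7-point scale. A minimal sketch follows. The function names are hypothetical and the ratings are made up; only the 1 to 7 scale and the "6 or 7" rule come from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def diagnostic_by_global_rating(rating):
    """Ratings of 6 or 7 on the 1-7 global scale were treated as diagnostic."""
    if rating not in range(1, 8):
        raise ValueError("global ratings run from 1 (low) to 7 (high)")
    return rating >= 6


rater_1 = [7, 2, 5, 6, 1, 3, 7, 4]   # hypothetical ratings of eight inmates
rater_2 = [6, 2, 5, 7, 1, 3, 7, 3]
print(round(pearson_r(rater_1, rater_2), 2))
print([diagnostic_by_global_rating(r) for r in rater_1])
```

A high r alone does not reveal why two raters disagreed on a given inmate. The sketch makes that visible: the only inputs are two numbers per inmate, which is the limitation the next paragraph describes.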
Although reliable, the global ratings had some undesirable character-
istics. For example, it was difficult to determine how raters actually
arrived at their decisions, to identify the source of disagreements be-
tween raters, or to specify how raters should be trained. There were
similar difficulties with attempts to rate subjects on Cleckley's (1976) list
of 16 diagnostic criteria (Hare, 1980). For these reasons, a new scale was
needed to structure and formalize the assessment of psychopathy.

The PCL
The first step in construction of the PCL was to generate a list of
characteristics, derived from experience with criminals and from a
survey of the literature, that might differentiate between psychopathic
and nonpsychopathic inmates. Over 100 such characteristics were
identified. The list was shortened by deleting partially overlapping items.
Next, we formed two criterion subject groups (psychopaths and
nonpsychopaths) on the basis of global ratings. Independent raters then
reassessed these subjects and rated them on each of the preliminary
items. Items were dropped from the list if they correlated too highly with
other items, or if they had low correlations with the global ratings, low
interrater reliability, or extreme base rates. After these steps, a list of 22
characteristics remained; these became the original version of the PCL
(Hare, 1980; Hare & Frazelle, 1980).
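The four screening rules above can be expressed as a single filter over candidate items. This is a sketch under loudly stated assumptions. The class, field names, and every numeric threshold are hypothetical, since the text gives no cutoffs. The overlap rule is also simplified: a real screen would usually drop only one member of a highly correlated pair, not test each item in isolation.

```python
from dataclasses import dataclass


@dataclass
class CandidateItem:
    name: str
    max_r_with_other_items: float  # highest correlation with any other candidate item
    r_with_global_rating: float    # correlation with the global psychopathy rating
    interrater_r: float            # agreement between independent raters
    base_rate: float               # proportion of subjects showing the characteristic


def keep_item(item, max_overlap=0.80, min_validity=0.30,
              min_reliability=0.50, base_rate_bounds=(0.10, 0.90)):
    """Apply the four screening rules; every threshold here is hypothetical."""
    low, high = base_rate_bounds
    return (item.max_r_with_other_items <= max_overlap
            and item.r_with_global_rating >= min_validity
            and item.interrater_r >= min_reliability
            and low <= item.base_rate <= high)


candidates = [
    CandidateItem("item A", 0.45, 0.55, 0.70, 0.40),
    CandidateItem("item B", 0.50, 0.15, 0.75, 0.35),  # weak link to global ratings
    CandidateItem("item C", 0.40, 0.50, 0.65, 0.97),  # extreme base rate
]
print([c.name for c in candidates if keep_item(c)])  # ['item A']
```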

The PCL-R
Subsequent revisions of the PCL consisted of (a) dropping two items
with unsatisfactory psychometric characteristics and problematic scor-
ing criteria, (b) modifying item descriptions to clarify their scoring, and
(c) changing the wording of several items. The result was the 20-item PCL-
R (Hare, 1985a, in press). The items in the PCL-R are presented in Table
108 HART ET AL.

Table 1
The PCL-R Items: Mean Interrater Reliability, Corrected Item-Total
Correlations, and Descriptive Statistics

                                            Interrater  Item-total
Item                                                 r           r      M    SD
 1. Glibness/superficial charm                     .55         .48    .79   .75
 2. Grandiose sense of self-worth                  .54         .52    .85   .76
 3. Need for stimulation/
    proneness to boredom                           .61         .57   1.39   .72
 4. Pathological lying                             .46         .54    .96   .76
 5. Conning/manipulative                           .61         .57   1.02   .79
 6. Lack of remorse or guilt                       .60         .51   1.45   .70
 7. Shallow affect                                 .54         .53   1.15   .75
 8. Callous/lack of empathy                        .52         .61   1.25   .72
 9. Parasitic lifestyle                            .58         .39   1.11   .70
10. Poor behavioral controls                       .62         .42   1.23   .78
11. Promiscuous sexual behavior                    .62         .38   1.12   .85
12. Early behavior problems                        .65         .43    .99   .85
13. Lack of realistic, long-term goals             .57         .46   1.28   .74
14. Impulsivity                                    .56         .51   1.52   .66
15. Irresponsibility                               .51         .51   1.41   .68
16. Failure to accept responsibility
    for actions                                    .42         .39   1.17   .78
17. Many short-term marital relationships          .66         .30    .67   .79
18. Juvenile delinquency                           .79         .36   1.12   .89
19. Revocation of conditional release              .73         .35   1.31   .80
20. Criminal versatility                           .86         .42    .92   .82

Note. Pooled across samples of prison inmates and forensic patients (N = 1632).
The item-total correlations differ slightly from those presented in Hare,
Hart, and Harpur (in press) because of the inclusion of additional samples
and subjects. Each item is scored on a 3-point scale (0, 1, 2).

1. The original PCL items and their psychometric characteristics are
described in an appendix to the PCL-R manual (Hare, in press).

Correlation Between PCL and PCL-R
Not surprisingly, the PCL and PCL-R are highly correlated and measure
the same construct. Hare et al. (1990) examined the comparability of
the PCL and PCL-R in a sample of 122 male offenders, each of whom was
PSYCHOPATHY CHECKLIST 109

assessed by two independent raters, one using the PCL and the other
using the PCL-R. The correlation between the PCL and PCL-R Total scores
was .88, a value that is similar to the interrater reliability of the individual
scales. Indeed, when the correlation between the scales is disattenuated
for rater unreliability, it approaches unity.
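The disattenuation mentioned above is the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two measures' reliabilities. The short Python sketch below is illustrative only. The chapter does not report which reliability estimates were used, so it assumes a reliability of .88 for each scale, in line with the statement that the observed correlation is similar to the scales' interrater reliability. The function name `disattenuate` is hypothetical.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for unreliability in both measures.

    Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y).
    Hypothetical helper for illustration.
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Observed PCL/PCL-R correlation of .88; the reliabilities of .88 are assumed.
corrected = disattenuate(0.88, 0.88, 0.88)
print(f"disattenuated r = {corrected:.2f}")
```

Because the assumed reliabilities equal the observed correlation, the corrected value is 1.00. With somewhat different reliability estimates the corrected value would simply lie close to unity.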

Scale format
The items listed in Table 1 are merely summary labels; the PCL-R
manual (Hare, in press) contains a detailed description of each item, as
well as a section on the sources of information typically used to score the
item. Each item is scored on a 3-point scale, where 0 indicates that the
symptom definitely does not apply to the individual; 1, that the item
applies somewhat or only in a limited sense; and 2, that the item definitely
applies. The ratings for most items involve some degree of judgment and
inference, guided by the item description in the manual. For several items,
however, fixed and explicit criteria are provided. The PCL-R allows raters
to omit items that they feel cannot be scored properly due to missing or
incomplete information.

Administration
PCL-R assessments are based on an interview and a review of collat-
eral information. In our research we use a semi-structured interview that
allows the interviewer (a) to obtain the requisite historical information,
and (b) to observe the individual's interactional style. The interview
takes about 90 to 120 minutes to complete, and covers educational,
occupational, family, marital, and criminal history. Although a more
structured interview would perhaps increase the reliability of some
information collected, it would also tend to obscure or suppress the
individual's natural interactional style.
The type of collateral information available varies according to the
setting in which the assessment is made. In correctional settings there is
usually ample material available: a criminal record, intake or classification
reports, presentence reports, institutional progress logs, past parole
or probation records, and so forth. In forensic psychiatric and pretrial
settings, police reports concerning the individual's current offenses,
interviews with family members, and results of medical and psychologi-
cal assessments may also be available. The purposes of this collateral
information are (a) to help evaluate the credibility of information ob-
tained during the interview, (b) to help determine if the interactional style
exhibited by the individual during the interview was representative of his
or her usual behavior, and (c) to provide the primary data for scoring
several of the items. A collateral review typically takes about one hour. In
the absence of adequate collateral information, the PCL-R cannot be
scored.
Occasionally, there are large discrepancies between the interview
and collateral information. If it is possible to determine that one source
of information is more credible than the other, then greatest weight is
given to information from the more credible source. Otherwise,
preference is given to the source more suggestive of psychopathology, on the
assumption that most individuals tend to underreport pathological
behavior.
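The rule for resolving discrepancies between interview and collateral information can be written as a small decision procedure. The Python sketch below is only an illustration under stated assumptions: the function name `reconcile_item` and its arguments are hypothetical, it assumes the rater has already judged (or failed to judge) which source is more credible, and it treats "more suggestive of psychopathology" simply as the higher of the two candidate item scores, which is a simplification of the clinical judgment involved.

```python
from typing import Optional

VALID_SCORES = (0, 1, 2)


def reconcile_item(interview: int, collateral: int,
                   more_credible: Optional[str] = None) -> int:
    """Choose an item score when interview and collateral information disagree.

    Hypothetical helper. `more_credible` is "interview" or "collateral" when the
    rater can judge one source more credible, and None otherwise.
    """
    if interview not in VALID_SCORES or collateral not in VALID_SCORES:
        raise ValueError("item scores must be 0, 1, or 2")
    if more_credible == "interview":
        return interview
    if more_credible == "collateral":
        return collateral
    if more_credible is not None:
        raise ValueError("more_credible must be 'interview', 'collateral', or None")
    # Credibility undetermined: prefer the source more suggestive of
    # psychopathology, simplified here as the higher candidate score.
    return max(interview, collateral)


print(reconcile_item(interview=0, collateral=2))  # falls back to the higher score
```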
It may prove impossible to interview the individual in some situations
(e.g., research using archival information). Acceptable ratings may be
made without an interview only if there is extensive, high-quality file
information available (e.g., Hart & Hare, 1989; Harris, Rice, & Cormier,
1990; Wong, 1988); where possible, behavioral observations and informal
interactions with the client should be used to supplement collateral
information.

Total Scores
Once the interview and collateral review have been completed, each
PCL-R item is scored. The individual items are then summed to yield a
Total score. If five or fewer items are omitted, the Total score should be
prorated to a 20-item scale; if more than five items are omitted, the
assessment should be considered invalid. Total scores on the PCL-R
range from 0 to 40.
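The summing and prorating rule can be sketched in a few lines of Python. This sketch makes several assumptions: omitted items are recorded as `None`, prorating means multiplying the sum of the scored items by 20 divided by the number of items scored, and the prorated value is left unrounded. The PCL-R manual should be consulted for the exact procedure. The function name `pcl_r_total` is hypothetical.

```python
from typing import Optional, Sequence

N_ITEMS = 20
MAX_OMITTED = 5


def pcl_r_total(items: Sequence[Optional[int]]) -> float:
    """Sum 20 item scores (0, 1, 2; None = omitted), prorating for omissions.

    Hypothetical helper; proration is assumed to be linear and unrounded.
    """
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    scored = [s for s in items if s is not None]
    if any(s not in (0, 1, 2) for s in scored):
        raise ValueError("item scores must be 0, 1, or 2")
    omitted = N_ITEMS - len(scored)
    if omitted > MAX_OMITTED:
        raise ValueError(f"{omitted} items omitted: the assessment is invalid")
    # Scale the obtained sum up to the full 20-item length.
    return sum(scored) * N_ITEMS / len(scored)


# Hypothetical case: 18 items scored (summing to 27), the last two omitted.
example = [2] * 9 + [1] * 9 + [None, None]
print(pcl_r_total(example))
```

In the example the 18 scored items sum to 27, which prorates to 27 × 20/18 = 30. With six or more omissions the function refuses to produce a score, mirroring the invalidity rule above.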
Cutoff scores may be used to classify individuals if a diagnosis is
desired. For the PCL, we used a score of 33 or more as indicative of
psychopathy; this cutoff appeared to provide the best balance between
sensitivity and specificity with respect to global ratings of psychopathy
in the PCL derivation samples. The corresponding cutoff for the PCL-R is
30. Subsequent research has confirmed the utility of these cutoffs (Wong
& Templeman, 1988).
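The cutoff rule, and the sensitivity/specificity trade-off used to choose it, can be illustrated with a short Python sketch. It assumes the PCL-R cutoff of 30 is inclusive (30 or more), as stated for the PCL's 33. The function names are hypothetical, and the six Total scores and global-rating diagnoses in the usage example are invented. They are not drawn from the derivation samples.

```python
from typing import Sequence

PCL_CUTOFF = 33    # original PCL: 33 or more
PCL_R_CUTOFF = 30  # PCL-R: 30 (assumed to mean 30 or more)


def meets_cutoff(total: float, cutoff: float = PCL_R_CUTOFF) -> bool:
    """Classify a Total score against a diagnostic cutoff."""
    return total >= cutoff


def sensitivity_specificity(totals: Sequence[float], criterion: Sequence[bool],
                            cutoff: float = PCL_R_CUTOFF) -> tuple[float, float]:
    """Agreement of cutoff-based diagnoses with a criterion such as global ratings."""
    if len(totals) != len(criterion):
        raise ValueError("totals and criterion must have the same length")
    tp = fn = tn = fp = 0
    for score, is_case in zip(totals, criterion):
        flagged = meets_cutoff(score, cutoff)
        if is_case:
            tp += flagged        # criterion case correctly flagged
            fn += not flagged    # criterion case missed
        else:
            fp += flagged        # non-case wrongly flagged
            tn += not flagged    # non-case correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("criterion must include both cases and non-cases")
    return tp / (tp + fn), tn / (tn + fp)


# Invented Total scores and global-rating diagnoses for six subjects.
totals = [34, 31, 28, 22, 30, 15]
criterion = [True, True, True, False, False, False]
for cut in (28, PCL_R_CUTOFF, 32):
    sens, spec = sensitivity_specificity(totals, criterion, cut)
    print(f"cutoff {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff in the invented data lowers sensitivity and raises specificity. Choosing a cutoff amounts to picking the point on this trade-off that best matches the criterion.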

Factor Scores
The PCL-R (as well as the PCL) consists of two stable, oblique factors
(Hare et al., 1990; Harpur, Hakstian, & Hare, 1988). The correlation
between the factors is about the same in samples of prison inmates (.56
on average) as it is in samples of forensic patients (.53 on average). The
factors can be viewed as psychologically meaningful facets of the "higher-order"
construct of psychopathy.

Factor 1 is defined by items 1, 2, 4, 5, 6, 7, 8, and 16. It clearly reflects
interpersonal and affective characteristics, such as egocentricity, lack of
remorse, callousness, and so forth, considered fundamental to clinical
conceptions of psychopathy. Evidence presented elsewhere (Hare, in
press; Harpur et al., 1989; Hart & Hare, 1989) indicates that Factor 1
scores, obtained by summing the scores on individual items, are most
closely correlated with classic clinical descriptions of psychopathy,
prototypicality ratings of narcissistic personality disorder, and with self-
report measures of machiavellianism, narcissism, empathy, and anxiety
(negatively in the latter two cases). Factor 1 also projects onto the octant
of the interpersonal circumplex defined by ratings of Arrogant/Calculat-
ing, an octant that has also been labelled Narcissistic/Exploitative (Wiggins,
1982).
Factor 2, defined by items 3, 9, 10, 12, 13, 14, 15, 18, and 19, reflects
those aspects of psychopathy related to an impulsive, antisocial, and
unstable lifestyle. Factor 2 scores are most strongly correlated with
diagnoses of APD, criminal behaviors, socioeconomic background, and
self-report measures of psychopathy, including the So scale of the CPI, the
Pd and Ma scales of the MMPI/MMPI-2, and Scale 6a (antisocial) of the
Millon Clinical Multiaxial Inventory-II (MCMI-II; Millon, 1987). In addition,
recent evidence indicates that Factor 2 is much more strongly related to
substance abuse than is Factor 1 (Hart & Hare, 1989; Smith & Newman,
1990), a finding that is consistent with the argument (Gerstley et al., 1990)
that it is important in the treatment of substance abusers to differentiate
between patients who are psychopaths (e.g., high on Factors 1 and 2) and
those whose abuse is part of a more general pattern of antisocial behavior
(e.g., high on only Factor 2).
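Factor scores are simple sums over the item lists given above. The Python sketch below writes them out. It assumes that all 20 items were scored, since the handling of omitted items for Factor scores is not described here, and the function name `factor_scores` is hypothetical. One consequence of the item lists becomes visible: items 11, 17, and 20 belong to neither factor and contribute only to the Total score.

```python
from typing import Sequence

FACTOR_1_ITEMS = (1, 2, 4, 5, 6, 7, 8, 16)           # interpersonal/affective
FACTOR_2_ITEMS = (3, 9, 10, 12, 13, 14, 15, 18, 19)  # impulsive, antisocial lifestyle


def factor_scores(items: Sequence[int]) -> tuple[int, int]:
    """Return (Factor 1, Factor 2) sums; items[0] holds the score for item 1.

    Hypothetical helper; assumes all 20 items are scored 0, 1, or 2.
    """
    if len(items) != 20:
        raise ValueError("expected scores for all 20 items")
    if any(s not in (0, 1, 2) for s in items):
        raise ValueError("item scores must be 0, 1, or 2")
    f1 = sum(items[i - 1] for i in FACTOR_1_ITEMS)
    f2 = sum(items[i - 1] for i in FACTOR_2_ITEMS)
    return f1, f2


# Items that load on neither factor (they count toward the Total score only).
UNASSIGNED = sorted(set(range(1, 21)) - set(FACTOR_1_ITEMS) - set(FACTOR_2_ITEMS))
print(UNASSIGNED)
print(factor_scores([2] * 20))  # maximum possible Factor 1 and Factor 2 scores
```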

Uses and Users

Most of the data on the psychometric properties of the PCL-R are
derived from adult male forensic populations (e.g., institutional or com-
munity correctional facilities, forensic psychiatric hospitals, and pretrial
evaluation or detention facilities). However, early indications are that the
PCL-R will also be useful with other populations, including substance
abusers (Alterman, personal communication, November, 1990), young
offenders (Forth et al., 1990), female offenders (Hare, in press; Neary,
1990), and noncriminals (Hare, in press). In general, the utility of the
PCL-R appears to depend more on the extent and quality of information
available about individuals than it does on the nature of the population
from which they come.
In research settings the PCL-R typically is used to create criterion
groups or to perform correlational analyses with other variables. In such
cases only the researchers have access to an individual's scores or
diagnoses, and the information has no implications for his treatment or
management. Under these conditions the user qualifications are not as
stringent as they are if the assessments have direct or indirect implica-
tions for inmates or patients. Researchers (or, if the researcher is currently
enrolled in a graduate training program or medical school, his or her
supervisor) should possess an MA/MEd, PhD/DEd, or MD degree, with
graduate level courses in psychopathology, statistics, and psychometric
theory. The actual ratings can be made, under supervision, by individuals
with lesser qualifications; a degree in the social or behavioral sciences,
counselling psychology, or social work, plus some experience in inter-
viewing, would suffice in most cases. In any case, it is important that
potential users of the PCL-R become familiar with its use and that they
obtain estimates of the reliability of their assessments.
User qualifications are more demanding when the PCL-R is used for
clinical purposes. The term psychopathy is misinterpreted or misunder-
stood by many people, professionals and nonprofessionals alike, and its
application in clinical settings can have profound and lasting effects on
the ways in which an inmate, defendant, or patient is viewed and treated.
Even those familiar with the empirical research on psychopathy may
draw unwarranted conclusions from the use of the label. The problem
would be compounded if a PCL-R score is taken as firm evidence that a
given individual is, or is not, a psychopath, rather than as an imperfect
estimate of the extent to which he matches the prototypical psychopath.
We must emphasize that the PCL-R is a clinical tool that requires
professional expertise and judgment and, for clinical purposes, should be
used only by those legally entitled to administer psychological tests and
to diagnose mental disorder.³

Descriptive Statistics
Researchers generally report very similar mean PCL-R Total scores
for samples of criminals, regardless of the country the research is
conducted in, the security level of the institution, whether or not subjects
are volunteers, and the racial composition of the sample (Hare, in press;
Harpur et al., 1988, 1989; Kosson et al., 1990). The mean score in samples
drawn from forensic psychiatric facilities generally is lower than the
mean for samples of prison inmates.
Table 2 presents descriptive statistics for the PCL-R, aggregated
across seven samples of male prison inmates (N = 1192) and across four
samples of male forensic psychiatric patients (N = 440). The data are from
studies conducted by a number of investigators in Canada, the United
States, and England for which individual item scores were available to us.
Raters varied in their professional qualifications and degree of experi-
ence with the PCL-R; samples differed in mean age, racial composition,
security level, and so forth. In each case, the distribution of Total scores
is approximately normal, with a slight negative skew.
Table 2 also presents means for the Factor scores in prison and
forensic psychiatric populations.

Table 2
Mean PCL-R Total, Factor 1, and Factor 2 Scores for Prison Inmates and
Forensic Psychiatric Patients

              Prison Inmates  Forensic Patients
PCL-R score        M     SD         M     SD
Total           23.6    7.9      20.6    7.8
Factor 1         8.7    3.9       7.9    3.9
Factor 2        11.7    3.9      11.4    4.0

Note. Based on pooled data from seven samples of prison inmates (N = 1192) and
four samples of forensic psychiatric patients (N = 440). From Hare (in
press).
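One way to read an individual Total score against Table 2 is to express it as a standard score relative to the pooled sample mean and SD. The Python sketch below is an illustrative calculation, not a published norming procedure. The percentile assumes a normal distribution, which the text describes as only approximate because the observed distributions show a slight negative skew. The names `TABLE_2_TOTALS` and `relative_standing` are hypothetical.

```python
from statistics import NormalDist

# Pooled PCL-R Total score means and SDs from Table 2.
TABLE_2_TOTALS = {
    "prison": (23.6, 7.9),    # seven samples of male prison inmates, N = 1192
    "forensic": (20.6, 7.8),  # four samples of male forensic patients, N = 440
}


def relative_standing(total: float, sample: str = "prison") -> tuple[float, float]:
    """Return (z score, approximate percentile) of a Total score within a sample.

    Hypothetical helper for illustration only.
    """
    mean, sd = TABLE_2_TOTALS[sample]
    z = (total - mean) / sd
    # Normal approximation only; the observed distributions are slightly
    # negatively skewed, so the percentile is rough.
    percentile = 100.0 * NormalDist().cdf(z)
    return z, percentile


z, pct = relative_standing(30, "prison")
print(f"z = {z:.2f}, approx. percentile = {pct:.0f}")
```

Because the inmate and patient means differ by 3 points, the same Total score will sit somewhat higher relative to forensic patients than relative to prison inmates.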

Demographic Variables
Age. PCL-R Total scores do not vary appreciably as a function of the
age of subjects. This is not true for Factor 2 scores, however.
Cross-sectional analyses (of the PCL and PCL-R) indicate that Factor 1 scores are
stable across age groups ranging from 15 to 55, whereas Factor 2 scores
show a significant linear decline across the same groups (Harpur & Hare,
1990a).
Race. Mean Total scores for Black and Native North American Indian
males are within 1 or 2 points of the mean for White males (Hare, in press;
Kosson et al., 1990; Peterson, 1984; Wong, 1984). There is no consistent
evidence that individual items are racially biased (Hare, in press); never-
theless, the issue needs further investigation. A limitation of current data
is that the raters typically have been White. As a result we do not know
to what extent the evaluation of items dealing with interpersonal and
affective characteristics is influenced by racial and cultural differences
between the rater and the inmate or patient.
Gender. In the few samples studied to date the Total scores are
reliable and distributed much as they are for male offenders (see Hare, in
press; Neary, 1990).
Socioeconomic level. PCL-R (and PCL) Total and Factor 2 scores are
negatively correlated with the occupational achievement of subjects, but
uncorrelated with the occupational achievement of their parents (Harpur
et al., 1989). With respect to educational achievement, criminal
psychopaths typically complete fewer years of formal schooling, but take part in
more educational and vocational upgrading in prison, than do other
criminals; thus, Total scores are uncorrelated with overall educational
achievement. Factor 2 scores, however, correlate negatively with educa-
tion (Harpur et al., 1989).

Reliability

Internal consistency and interrater reliability
Table 3 presents summary data on the internal consistency and
interrater reliability of PCL-R Total and Factor scores. Internal
consistency was estimated by Cronbach's alpha and by the mean inter-item
correlation (MIC), the latter being independent of scale length. Interrater
reliability was measured by the intraclass correlation coefficient (ICC),

Table 3
Reliability of PCL-R Total and Factor Scores

                                 Prison Inmates            Forensic Patients
Reliability index          Total  Factor 1  Factor 2   Total  Factor 1  Factor 2
Internal consistency (a)
  Alpha                      .87       .84       .77     .85       .80       .77
  MIC                        .26       .40       .28     .22       .34       .28
ICC (b)
  One rating                 .83       .72       .83     .86       .77       .83
  Average of two ratings     .91       .86       .91     .93       .88       .92

Note. Adapted from Hare (in press). Alpha = Cronbach's alpha; MIC = mean
inter-item correlation; ICC = intraclass correlation.
(a) Based on data from 1192 inmates and 440 patients.
(b) Based on data from 385 inmates and 90 patients.
using a one-way random effects model (Bartko, 1976; Shrout & Fleiss,
1979). Data from two sets of judges were used to calculate the expected
reliability of a single judge's ratings (ICC1) and the expected reliability of
the mean of two ratings (ICC2).
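The two internal-consistency indices in Table 3 can be computed from a subjects × items matrix of item scores, as in the Python sketch below. The function names are hypothetical and the sketch assumes no omitted items. The last function uses the standard relation between a mean inter-item correlation r̄ and standardized alpha for k items, kr̄/[1 + (k - 1)r̄]. Standardized alpha is not identical to the variance-based alpha reported in the table. Even so, for the prison sample it gives about .88, .84, and .78 for the Total, Factor 1, and Factor 2 scales, close to the tabled .87, .84, and .77.

```python
from itertools import combinations
from statistics import mean, pvariance
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined if either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Variance-based alpha; rows holds one list of item scores per subject."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance / total_variance)


def mean_inter_item_r(rows: Sequence[Sequence[float]]) -> float:
    """Average Pearson r over all pairs of items (the MIC)."""
    columns = list(zip(*rows))
    return mean(pearson_r(a, b) for a, b in combinations(columns, 2))


def alpha_from_mic(mic: float, k: int) -> float:
    """Standardized alpha implied by a mean inter-item correlation for k items."""
    return k * mic / (1 + (k - 1) * mic)


# Compare with Table 3 (prison inmates): Total, Factor 1, Factor 2.
for label, mic, k in [("Total", 0.26, 20), ("Factor 1", 0.40, 8), ("Factor 2", 0.28, 9)]:
    print(f"{label}: implied standardized alpha = {alpha_from_mic(mic, k):.2f}")
```

Because the MIC does not depend on the number of items, it allows the 8-item and 9-item Factor scales to be compared with the 20-item Total scale on an equal footing, which alpha alone does not.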
The ICCs indicate that PCL-R Total and Factor scores are highly
reliable, especially when averaged across two raters, despite the subjec-
tive nature of many of the items.
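Under the one-way random effects model used for Table 3, both ICCs are computed from the between-subjects and within-subjects mean squares of a subjects × ratings table. The Python sketch below computes them. The function names are hypothetical and the example ratings are invented. In this model the ICC for the mean of k ratings equals the Spearman-Brown projection of the single-rating ICC, which is why averaging across two raters raises reliability. For instance, projecting the single-rating Total ICC of .83 for prison inmates to two raters gives about .91, the value shown in the table.

```python
from statistics import mean
from typing import Sequence


def one_way_icc(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """One-way random effects ICCs from an n-subjects x k-ratings table.

    Returns (reliability of a single rating, reliability of the mean of k ratings).
    Hypothetical helper for illustration.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects, each with the same k >= 2 ratings")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


def spearman_brown(r: float, m: float) -> float:
    """Projected reliability when the number of ratings is multiplied by m."""
    return m * r / (1 + (m - 1) * r)


# Invented Total scores from two raters for four subjects.
ratings = [[30, 32], [20, 18], [25, 25], [10, 12]]
single, average = one_way_icc(ratings)
print(f"ICC1 = {single:.3f}, ICC2 = {average:.3f}")
print(f"Spearman-Brown projection of .83 to two raters: {spearman_brown(0.83, 2):.2f}")
```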

Generalizability
Generalizability (G) theory (Cronbach, Gleser, Nanda, & Rajaratnam,
1972) has several advantages over classical test score theory. For one, it
provides a single index of the adequacy of measurement, namely the
generalizability coefficient (GC). Applying G theory to the PCL-R, the main
concern is to reliably rank order individuals (the object of measurement).
Variance due to individuals is considered universe score variance (true
score variance in the classical sense), and error variance arises from the
interaction of individuals with all other sources (test items, time, raters,
institutions, etc.). The ratio of universe score variance to universe score
plus error variance reflects the generalizability (reliability) of measure-
ment. This ratio is the GC; it is an intraclass correlation that is interpreted
in the same way as traditional reliability coefficients. (See Wiggins, 1973,
pp. 284-295, for a succinct discussion of G theory.)
Schroeder and Hare (1990) performed a G analysis on data from 475
inmates and patients for whom double ratings were available. The GC was
.82 for Total scores, .76 for Factor 1 scores, and .83 for Factor 2 scores.⁴
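When every subject is rated by the same raters, the GC described above can be estimated from the variance components of a two-way persons × raters analysis of variance. Because the aim is to rank order individuals, the relevant error is the person × rater interaction (confounded with residual error), averaged over the number of raters used. The Python sketch below assumes this simple crossed, single-facet design. The chapter does not report the design or facets used by Schroeder and Hare (1990), so this is not a reconstruction of their analysis. The function name and the example scores are hypothetical.

```python
from statistics import mean
from typing import Sequence


def g_coefficient(scores: Sequence[Sequence[float]], n_raters: int = 1) -> float:
    """Relative G coefficient for a crossed persons x raters design.

    scores: one row per person, one column per rater (every rater rates everyone).
    n_raters: number of raters whose mean score would be used in practice.
    Hypothetical helper for illustration.
    """
    n_p, n_r = len(scores), len(scores[0])
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete table with at least 2 persons and 2 raters")
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(col) for col in zip(*scores)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_res = ss_total - ss_p - ss_r              # person x rater interaction + error
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_res) / n_r, 0.0)      # universe-score (person) variance
    denominator = var_p + ms_res / n_raters      # universe + relative error variance
    if denominator == 0:
        raise ValueError("no variance in the scores")
    return var_p / denominator


# Invented Total scores from two raters for four subjects.
scores = [[30, 32], [20, 18], [25, 25], [10, 12]]
print(f"GC, one rater: {g_coefficient(scores):.3f}")
print(f"GC, mean of two raters: {g_coefficient(scores, n_raters=2):.3f}")
```

Constant differences between raters (a rater who scores everyone 2 points higher) do not lower this relative coefficient, because they do not change the rank order of subjects.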

Validity
Because the PCL and PCL-R are so highly correlated, evidence con-
cerning the validity of one has direct implications for the validity of the
other. In several studies both PCL and PCL-R ratings were collected, and
in each case the scales had identical patterns of correlations with external
variables (e.g., other measures of psychopathy, parole outcome, vio-
lence, etc.). For these reasons, separate sections are not devoted to the
two versions in the following discussion.

Content-related evidence
Although they owe much to Cleckley (1976), the PCL-R items are
consistent in content with the conceptualizations of psychopathy dis-
cussed by many other authors (e.g., Buss, 1966; Karpman, 1961; McCord
& McCord, 1964; Millon, 1981). They are also consistent with the views of
psychopathy held by practicing clinicians (Gray & Hutchison, 1964;
Davies & Feldman, 1981) and researchers (Fotheringham, 1957; Albert,
Brigante, & Chase, 1959).
As noted above, the PCL-R criteria for psychopathy overlap consid-
erably with the DSM-III/DSM-III-R, Research Diagnostic Criteria (RDC;
Spitzer, Endicott, & Robins, 1975), and Feighner criteria (Feighner et al.,
1972) for APD, although the DSM-III/DSM-III-R and RDC criteria pay little
attention to interpersonal and affective characteristics. The fact that the
DSM-IV Task Force decided to use items based on the PCL-R in its field
trials may be seen as support for the content-related validity of the PCL-R.
The PCL-R criteria are also very similar to the current International
Classification of Diseases (ICD-9; World Health Organization, 1978) cat-
egory 301.7, "Personality disorder with predominantly sociopathic or
asocial manifestations" (also referred to as APD in clinical modifications
of ICD-9), and to the proposed ICD-10 revision of this category (F60.2,
"dyssocial personality disorder"; see Sartorius, Jablensky, Cooper, &
Burke, 1988). Like the PCL-R, the ICD-10 category makes use of inferences
about personality traits.
In sum, the PCL-R items appear to provide good coverage of the
domain of psychopathic traits as defined in clinical practice, research,
and standard diagnostic criteria.

Criterion-related evidence: Concurrent
PCL/PCL-R Total scores are strongly related to other clinical-behavioral
measures of psychopathy (e.g., Hare, 1985b, in press; Harpur et al.,
1989; Kosson et al., 1990; Newman & Kosson, 1986). Point-biserial corre-
lations between Total scores and DSM-III/DSM-III-R diagnoses of APD
average about .55 (except in samples where the base rate of APD is very
high). Similarly, Pearson correlations between Total scores and global
ratings of psychopathy average about .85. Factor 2 scores typically
correlate higher with APD diagnoses than do Factor 1 scores (about .60
versus .40); the opposite pattern is true for correlations with global
ratings (r = .85 for Factor 1 versus .70 for Factor 2).
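A point-biserial correlation is simply a Pearson correlation in which one variable (here, presence or absence of an APD diagnosis) is coded 0/1. The Python sketch below computes it from group means. The function name and example data are hypothetical. The √(pq) factor, where p is the base rate of the diagnosis, is one reason such correlations can be attenuated in samples where nearly everyone meets the criterion.

```python
from statistics import mean, pstdev
from typing import Sequence


def point_biserial(scores: Sequence[float], diagnosed: Sequence[bool]) -> float:
    """Point-biserial r between continuous scores and a binary diagnosis.

    Hypothetical helper. Equivalent to Pearson r with the diagnosis coded 0/1.
    """
    if len(scores) != len(diagnosed):
        raise ValueError("scores and diagnosed must have the same length")
    with_dx = [s for s, d in zip(scores, diagnosed) if d]
    without_dx = [s for s, d in zip(scores, diagnosed) if not d]
    if not with_dx or not without_dx:
        raise ValueError("both groups must be non-empty")
    p = len(with_dx) / len(scores)  # base rate of the diagnosis
    q = 1.0 - p
    # (M1 - M0) / SD of all scores (population form), scaled by sqrt(pq)
    return (mean(with_dx) - mean(without_dx)) / pstdev(scores) * (p * q) ** 0.5


# Invented Total scores and APD diagnoses for six subjects.
totals = [12, 18, 22, 25, 31, 35]
apd = [False, False, True, False, True, True]
print(f"r_pb = {point_biserial(totals, apd):.2f}")
```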
PCL-R (and PCL) Total scores are correlated with self-report measures
related to psychopathy, including the Pd and Ma scales of the MMPI,
the CPI So scale, and the MCMI-II antisocial scale (Hare, 1985b, in press;
Harpur et al., 1989; Hart et al., in press). However, the magnitude of these
correlations typically is small (about .20 to .35). In general, these self-
report scales are more strongly correlated with Factor 2 than with Factor 1.

Criterion-related evidence: Predictive

Hart et al. (1988) found that the PCL predicted conditional release
violations in a sample of 231 federal offenders, even after they had
controlled for variables such as type of release granted (parole versus
mandatory supervision), criminal history, previous conditional release
violations, and demographic characteristics. Psychopaths recidivated
faster than did nonpsychopaths, and over twice as often. Following
release, psychopaths had poorer social and occupational functioning than
did nonpsychopaths, regardless of the eventual outcome of their release.
Similar results were obtained by Serin, Peters, and Barbaree (1990).
They reported that the PCL was considerably better at predicting the
performance of 93 male inmates on unescorted temporary absence or
parole than were several standard actuarial instruments, including the
Base Expectancy Scale (BES; Gottfredson & Bonds, 1961), the Salient
Factor Score (SFS; Hoffman & Beck, 1974), and the Recidivism Prediction
Scale (RPS; Nuffield, 1982).
In addition to general recidivism, the PCL-R appears to be useful in the
prediction of violent recidivism. Serin (in press) found that PCL-R Total
scores, but not scores on actuarial instruments (the BES, SFS, and RPS),
were significantly correlated with violent outcome following the release
of 81 male inmates. Harris et al. (1990) studied the post-release behavior
of 169 male forensic psychiatric patients. The violent recidivism rate for
psychopaths was almost four times that of nonpsychopaths. The PCL-R
significantly improved the prediction of outcome over and above the use
of criminal history variables. In another study, Rice, Harris, and Quinsey
(1990) studied 54 rapists released from a maximum security psychiatric
hospital. PCL-R Total scores were significantly correlated with recidivism
for violent offenses in general, and with recidivism for sexual offenses in
particular. A combination of PCL-R scores and a phallometric index of
arousal (based on penile plethysmography) predicted recidivism as well
as did a large battery of criminal-history and demographic variables.
Ogloff, Wong, and Greenwood (1990) performed an outcome study of
80 male forensic patients enrolled in a therapeutic community program
designed to treat personality disordered criminals. Data were prospec-
tive for some patients and retrospective for others. The outcome variables
included (1) the number of days that the patient remained in the program;
(2) ratings (4-point scale) of degree of motivation/effort put into the
program; and (3) ratings (4-point scale) of degree of improvement shown
during treatment. Results indicated that PCL-R psychopaths remained in
the program for a shorter period of time, put in less effort, and showed
less improvement, than did other inmates.

Construct-related evidence
Other clinical measures. Evidence of the PCL/PCL-R's convergent
validity comes mainly from studies looking at the performance of psycho-
paths on psychological tests that theoretically should be related to
psychopathy. For example, PCL/PCL-R Total scores correlate positively
with scores on measures of impulsivity, machiavellianism, narcissism,
and sensation-seeking (Hare, in press; Harpur et al., 1989). With respect
to interpersonal style, Foreman (1988) found that PCL-R Total scores
were positively correlated with ratings of dominance and negatively
correlated with ratings of nurturance on the Interpersonal Adjective
Scales (Wiggins, Trapnell, & Phillips, 1988), regardless of whether these
ratings were made by inmates themselves or by institutional staff. A
series of analyses of responses on the Rorschach test revealed that
PCL-R scores were positively correlated with psychodynamic measures related
to narcissism, egocentricity, low anxiety, and emotional detachment
(Gacono, 1990; Gacono, Meloy, & Heaven, 1990). With respect to other
mental disorders, Hart and Hare (1989) found that PCL-R Total scores
were positively correlated with diagnoses of substance use disorder,
histrionic personality disorder, and APD; they were also correlated with
prototypicality ratings of histrionic personality disorder, narcissistic
personality disorder, and APD. Positive correlations between the PCL-R
and substance use have also been reported by Smith and Newman (1990).
Although positively associated with some personality and substance
use disorders, psychopathy tends to be negatively associated with most
forms of mental disorder. Hart and Hare (1989) found that patients
diagnosed as psychopathic, using the PCL-R, were less likely than other
patients to receive DSM-III Axis I diagnoses. Hart and Hare (1989) also
found that PCL-R Total and Factor scores were either uncorrelated or
negatively correlated with prototypicality ratings of schizophrenia and
with prototypicality ratings of all personality disorders except histrionic,
narcissistic, and antisocial personality disorder.
Further evidence of the discriminant validity of the PCL/PCL-R comes
from studies using standardized psychological tests. The PCL and PCL-R
are uncorrelated or negatively correlated with self-report measures of
anxiety, depression, and general distress or neuroticism (Hare, in press;
Harpur et al., 1989; Hart, Forth, & Hare, 1990, in press). The results of over
a dozen studies indicate that there is no association between PCL/PCL-R
Total scores and performance on various intelligence tests (Hare, in
press). Also, two studies have reported that PCL/PCL-R scores are not
associated with any impairment in performance on standard
neuropsychological tests (Hare, 1984; Hart et al., 1990).

Criminal behavior and violence. The PCL and PCL-R have strong and
stable associations with various indices of criminality. Kosson et al.
(1990) examined the association between the PCL-R and criminal behav-
ior in samples of Black and White inmates. Group analyses indicated that
psychopaths of both races were charged with a significantly greater
number and variety of criminal offenses than were nonpsychopaths. The
same pattern of results has been found in a sample of forensic psychiatric
patients (Hart & Hare, 1989), and in a random sample of 293 White and
Native Indian offenders incarcerated in Canadian federal prisons (Wong,
1984).
In addition to their general criminal activities, psychopaths commit
violent and aggressive offenses at a particularly high rate. In a sample of
244 inmates, Hare and McPherson (1984a) found that PCL-defined psy-
chopaths were significantly more likely than other criminals to engage in
physical violence and other forms of aggressive behavior, including
verbal abuse, threats, and intimidation. Serin (in press) replicated these
findings in a sample of 87 male prison inmates assessed using the PCL-R.
For example, compared with other criminals, psychopaths were more
likely to have a conviction for a violent offense, to use weapons, threats,
and instrumental aggression, and to attribute hostile intent to others.
Hare and McPherson (1984a) also found that, while in prison, psychopaths
were more violent and aggressive than were other inmates.
Williamson, Hare, and Wong (1987) examined the nature of the violent
offenses committed by psychopaths. Their sample consisted of prison
inmates assessed with the PCL. Official police reports were used to
analyze the circumstances surrounding the most serious of inmates'
instant offenses. Most of the murders and serious assaults committed by
the nonpsychopaths occurred during a domestic dispute or during a
period of extreme emotional arousal, whereas this was seldom true of the
psychopaths. The victims of the nonpsychopaths were likely to be female
and known to them, but the victims of the psychopaths were likely to be
male and unknown to them. The violence of the psychopaths frequently
had revenge or retribution as the motive or occurred during a drinking
bout. In general, it appeared that most of the psychopaths' violence was
callous and cold-blooded or part of an aggressive, macho display, without
the affective coloring that accompanied the violence of nonpsychopaths.
These results were replicated in a study that analyzed both police reports
and in-depth interviews with offenders (Wright & Wong, 1988).
Laboratory studies. The PCL and PCL-R have been used to investigate
language processes in psychopaths, with the following results: (a)
psychopaths demonstrate reduced cerebral asymmetry in the processing of
language-related aural and visual stimuli (Hare & McPherson, 1984b; Hare
& Jutai, 1988), suggesting that psychopathy may be associated with weak
or unusual lateralization of language function; (b) psychopaths appear to
exhibit a partial dissociation between the affective and semantic compo-
nents of language and to have some difficulties with connotation (Hare,
Williamson, & Harpur, 1988; Williamson, Harpur, & Hare, 1990, in press);
and (c) psychopaths make unusual use of certain language-related hand
gestures, suggesting that they may have difficulty in encoding linguistic
material (Gillstrom & Hare, 1988).
A study by Patrick and Lang (1989) found that psychopathic sex
offenders, defined by the PCL-R, failed to show normal modulation of the
blink reflex to a startle stimulus while viewing slides with affective
content. They also found that the psychopaths gave much smaller
autonomic responses during imagery of fearful material than did other
offenders.
There is increasing interest in the role that attentional processes play
in the development and maintenance of psychopathy (see review by
Harpur & Hare, 1990b). First, several studies have examined over-focus-
sing in psychopaths. Hare (1982) hypothesized that psychopaths may be
unusually proficient at selectively attending to some stimuli and events
and at ignoring others. Several subsequent studies have addressed this
issue using the PCL/PCL-R, with results that are generally in agreement
with the over-focussing hypothesis (e.g., Forth & Hare, 1989; Harpur &
Hare, 1989; Jutai & Hare, 1983; Jutai, Hare, & Connolly, 1987; Raine &
Venables, 1988). Second, Newman and his colleagues have published a
series of studies of passive avoidance learning, disinhibition, and dominant
response set in psychopaths defined by the PCL or PCL-R (Kosson
& Newman, 1986; Newman, 1987; Newman & Kosson, 1986; Newman,
Widom, & Nathan, 1985). A common theme in this research is the
hypothesis that the disinhibited behavior of psychopaths is related to a
dominant response set for reward. More recently, Newman (1989) has
suggested that the psychopath's poor passive avoidance behavior may
also result from an inability to switch attentional focus when faced with
competing signals for reward and punishment. Finally, with respect to
active coping, Ogloff and Wong (1990) reported reduced electrodermal
and increased cardiac activity prior to an unavoidable aversive stimulus
in PCL-defined psychopaths (consistent with the results of previous
research conducted using global ratings of psychopathy, e.g., Hare &
Craigen, 1978; Hare, Frazelle, & Cox, 1978). Ogloff and Wong (1990) also
found that when the psychopaths could avoid the tone by pressing a
button, heart rate acceleration was greatly reduced, presumably because
they had no need for an active coping response.
PSYCHOPAlliY CHECKUST 121

Comparative Validities of the PCL/PCL-R and Other Assessment Procedures
We noted above that DSM-III-R diagnoses of APD and scores on
traditional self-report scales, such as the So scale of the CPI and the Pd
scale of the MMPI, are more closely aligned with the deviant lifestyle
components of psychopathy (as measured by PCL-R Factor 2) than with
the interpersonal/affective components of the disorder (as measured by
PCL-R Factor 1). There is evidence that including inferences about
interpersonal/affective components in the diagnosis of psychopathy can
add substantial incremental validity to the diagnosis, even when the
criterion involves antisocial/criminal behaviors.
For example, Simourd, Bonta, Andrews, and Hoge (1990) compared the
postdictive, concurrent, and predictive validity of the So scale, the Pd
scale, and the PCL/PCL-R with respect to criminal behavior. A computer
search of Psychological Abstracts for the years 1974 to 1989 identified
studies that met four criteria for inclusion in the analysis: (a) the study
was published; (b) psychopathy was assessed with the MMPI, CPI, or
PCL/PCL-R; (c) the outcomes were officially measured institutional
infractions and recidivism; and (d) sufficient data were reported for
effect-size estimates (Pearson's r) to be calculated. Forty-two studies met
the inclusion criteria, yielding a total of 62 effect-size estimates: 15 for the
PCL/PCL-R, 17 for the So scale, and 35 for the Pd scale. The mean effect size
was .35 for the PCL/PCL-R, .31 for the So scale of the CPI, and .19 for the
Pd scale of the MMPI. The difference between psychopathy measures was
particularly large where predictions of criminal behavior were involved;
the mean effect size was .31 for the PCL/PCL-R (7 studies), .15 for the So
scale (5 studies), and .13 for the Pd scale (11 studies).
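The meta-analysis summarizes each instrument by a mean effect size (Pearson's r), but the text does not say how the study-level correlations were pooled. The sketch below assumes one common approach: convert each r to Fisher's z, weight it by n - 3, average, and transform the mean back to r. The function name and the example inputs are hypothetical; they are not the data analyzed by Simourd et al. (1990).

```python
import math
from typing import Sequence


def mean_correlation(rs: Sequence[float], ns: Sequence[int]) -> float:
    """Pool Pearson correlations by averaging Fisher z values weighted by n - 3.

    Hypothetical helper: the weighting scheme is an assumption, not
    necessarily the one used in the meta-analysis described in the text.
    """
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in zip(rs, ns):
        if not -1.0 < r < 1.0 or n <= 3:
            raise ValueError("each r must lie strictly between -1 and 1, and each n must exceed 3")
        weight = n - 3                          # inverse of the sampling variance of z
        weighted_sum += weight * math.atanh(r)  # Fisher r-to-z transform
        total_weight += weight
    return math.tanh(weighted_sum / total_weight)  # back-transform the mean z to r


# Hypothetical effect sizes and sample sizes, for illustration only.
pooled = mean_correlation([0.30, 0.40], [100, 100])
```

Because the z transform stretches larger correlations, an equally weighted Fisher-z mean of positive correlations is never lower than their simple arithmetic mean; studies with larger samples pull the pooled value toward their own r.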
Hare, Hart, and Harpur (1990) compared the effect size (Pearson's r
with the dependent variable) of the PCL/PCL-R and diagnoses of APD in
several studies of criminal behavior in which information about both
assessment procedures was available. The PCL/PCL-R and APD diag-
noses in each study were made independently; interview and file
information were used in the studies by Hare and McPherson (1984a) and
Hart et al. (1988), while only file information was used in the study by
Harris et al. (1990). The results are summarized in Table 4; the PCL/PCL-
R data are for Total scores and for categorical diagnoses of psychopathy,
using the cutoff scores described above. The mean effect size was
considerably larger for PCL/PCL-R Total scores (r = .46, r² = .19) and
diagnoses of psychopathy (r = .44, r² = .17) than it was for diagnoses of
APD (r = .28, r² = .08).
122 HARTETAL

Table 4
PCL/PCL-R and Antisocial Personality Disorder (APD): Effect Size (Pearson's r
with Dependent Variable) in Studies of Criminal Behavior

                                                            PCL/PCL-R
Study                      Dependent variable            N  Total  Diag.   APD
Hare & McPherson (1984a)   Institutional violence(a)   319    .49    .40   .25
                           Violence(b)                 227    .46    .45   .39
Hart et al. (1988)         Parole outcome              231    .33    .25   .20
Harris et al. (1990)       Violent recidivism          173    .42    .56   .26

Note. Cell entries are product-moment correlations (r). Diag. = categorical diagnosis
of psychopathy (Total score > 33 on PCL or > 30 on PCL-R).
a) Number of charges for violent and aggressive behaviors in prison.
b) Global ratings of violent behavior (5-point scale).

The items that define APD and that are found in the So and Pd scales
are from much the same domain as, and therefore should be associated
with, criterion variables related to criminality, violence, and recidivism.
The finding that the PCL/PCL-R generally is more strongly related to these
criterion variables than are these other measures attests to the value of
including inferences about interpersonal/affective traits in the
assessment of psychopathy.
Equally important are the results from laboratory tests of hypotheses
about the nature of psychopathy in which the basis for assessment was
the PCL/PCL-R (briefly discussed above). Although there is no similar
body of systematic laboratory research involving APD, several of the
laboratory studies of psychopathy discussed above also obtained DSM-III
or DSM-III-R diagnoses of APD (Patrick, personal communication,
October, 1990; Williamson et al., 1990, in press). In each case, the effects
that were significant with the PCL/PCL-R were not significant when the
presence or absence of APD was the basis for group selection.

Conclusions
The PCL-R is a 20-item clinical rating scale for the assessment of
psychopathy. It makes use of interview and file information to assign a
Total score (0 to 40) that represents the degree to which an individual
matches the prototypical psychopath, perhaps most vividly described
by Cleckley (1976). The PCL-R consists of two stable, correlated factors:
Factor 1 measures the affective/interpersonal components of psychopathy,
whereas Factor 2 reflects the impulsive, unstable, and antisocial lifestyle
aspects of the disorder. There is extensive evidence that PCL-R Total and
Factor scores are reliable and valid when used with male forensic
populations. There are early indications that the PCL-R will also be useful
with female forensic populations and with noncriminals.
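The scoring arithmetic summarized above can be sketched in a few lines. This is an illustration only, assuming that each of the 20 items is rated 0, 1, or 2 (which yields the stated 0 to 40 range) and that no items are omitted. The function names are hypothetical, and the cutoff is supplied by the caller rather than hard-coded, since the categorical rule (for example, the PCL-R cutoff given in the note to Table 4) should be taken from the manual.

```python
from typing import Sequence

N_ITEMS = 20             # the PCL-R has 20 items
ITEM_SCORES = {0, 1, 2}  # assumption: each item is rated 0, 1, or 2 (giving a 0-40 range)


def pcl_r_total(ratings: Sequence[int]) -> int:
    """Sum 20 item ratings into a Total score (0-40).

    Hypothetical helper: omitted items are not handled (no prorating).
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r not in ITEM_SCORES for r in ratings):
        raise ValueError("each rating must be 0, 1, or 2")
    return sum(ratings)


def meets_cutoff(total: int, cutoff: int) -> bool:
    """Categorical classification against a caller-supplied cutoff.

    Assumption: a Total at or above the cutoff counts as meeting it; change
    the comparison if the rule in use is strictly greater-than.
    """
    return total >= cutoff
```

For example, a protocol with ten items rated 1 and ten items rated 2 has a Total of 30; whether that meets a given cutoff depends on the rule passed to `meets_cutoff`.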

References
Albert, R.S., Brigante, T.R., & Chase, M. (1959). The psychopathic personality: A
content analysis of the concept. Journal of General Psychology, 60, 17-28.
American Psychiatric Association. (1980). Diagnostic and Statistical Manual of
Mental Disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1987). Diagnostic and Statistical Manual of
Mental Disorders (3rd ed., revised). Washington, DC: Author.
American Psychiatric Association. (1990). DSM-IV Update (January/February
1990). Washington, DC: Author.
Bartko, J.J. (1976). On various intraclass correlation reliability coefficients.
Psychological Bulletin, 83, 762-765.
Buss, A. H. (1966). Psychopathology. New York: Wiley.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A., & Kaemmer, B. (1989).
Manual for the restandardized Minnesota Multiphasic Personality Inventory: The
MMPI-2. Minneapolis: University of Minnesota Press.
Cleckley, H. (1976). The Mask of Sanity (5th ed.). St. Louis, MO: Mosby.
Correctional Service of Canada (1989). Forum on Corrections Research, 1, No. 2.
Ottawa, Canada: Author.
Correctional Service of Canada (1990). Forum on Corrections Research, 2, No. 1.
Ottawa, Canada: Author.
Cotton, D.J. (1989). Forensic assessment survey results and model forensic assessment
protocol recommendations. Conditional Release Program, Forensic Services
Branch, California Department of Mental Health. Pinole, CA: Author.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability
of behavioral measurements. New York: Wiley.
Davies, W., & Feldman, P. (1981). The diagnosis of psychopathy by forensic
specialists. British Journal of Psychiatry, 138, 329-331.
Dix, G.E. (1980). Clinical evaluation of the "dangerousness" of "normal" criminal
defendants. Virginia Law Review, 66, 523-581.
Doren, D. M. (1987). Understanding and treating the psychopath. New York: Wiley.
Feighner, J.P., Robins, E., Guze, S.B., Woodruff, R.A., Winokur, G., & Munoz, R.
(1972). Diagnostic criteria for use in psychiatric research. Archives of General
Psychiatry, 26, 57-63.
Fotheringham, J.B. (1957). Psychopathic personality: A review. Canadian Psychiatric
Association Journal, 2, 52-74.
Foreman, M. (1988). Psychopathy and interpersonal behavior. Unpublished doctoral
dissertation, University of British Columbia, Vancouver, Canada.
Forth, A. E., & Hare, R.D. (1989). The contingent negative variation in psychopaths.
Psychophysiology, 26, 676-682.
Forth, A.E., Hart, S.D., & Hare, R.D. (1990). Assessment of psychopathy in male
young offenders. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 2, 342-344.
Frances, A.J., & Widiger, T. (1986). The classification of personality disorders: An
overview of problems and solutions. In A.J. Frances & R.E. Hales (Eds.),
American Psychiatric Association annual review (Vol. 5, Psychiatry update, pp.
24-257). Washington, DC: American Psychiatric Press.
Gacono, C.B. (1990). An empirical study of object relations and defensive structure
in antisocial personality disorder. Journal of Personality Assessment, 54, 589-600.
Gacono, C. B., Meloy, J.R., & Heaven, T.R. (1990). A Rorschach investigation of
narcissism and hysteria in antisocial personality disorder. Journal of Personality
Assessment, 55, 270-279.
Gerstley, L.J., Alterman, A.I., McLellan, A.T., & Woody, G.E. (1990). Antisocial
personality disorder in substance abusers: A problematic diagnosis? American
Journal of Psychiatry, 147, 173-178.
Gillstrom, B.J., & Hare, R.D. (1988). Language-related hand gestures in psychopaths.
Journal of Personality Disorders, 2, 21-27.
Gottfredson, D.M., & Bonds, J.A. (1961). A manual for intake based expectancy
scoring. San Francisco: California Department of Corrections, Research Division.
Gough, H. G. (1969). Manual for the California Psychological Inventory. Palo Alto, CA:
Consulting Psychologists Press.
Grant, V. (1977). The menacing stranger. New York: Dover.
Gray, K.C., & Hutchinson, H. C. (1964). The psychopathic personality: A survey of
Canadian psychiatrists' opinions. Canadian Psychiatric Association Journal, 9,
452-461.
Gunderson, J.G. (1990). Minutes of DSM-IV Axis II Work Group Meeting (May 2).
Washington, DC: American Psychiatric Association.
Guze, S.B. (1976). Criminality and psychiatric disorders. New York: Oxford University
Press.
Hare, R.D. (1970). Psychopathy: Theory and research. New York: Wiley.

Hare, R.D. (1980). A research scale for the assessment of psychopathy in criminal
populations. Personality and Individual Differences, 1, 111-119.
Hare, R. D. (1982). Psychopathy and physiological activity during anticipation of
an aversive stimulus in a distraction paradigm. Psychophysiology, 19, 266-271.
Hare, R.D. (1983). Diagnosis of antisocial personality disorder in two prison
populations. American Journal of Psychiatry, 140, 887-890.
Hare, R.D. (1984). Performance of psychopaths on cognitive tasks related to
frontal lobe function. Journal of Abnormal Psychology, 93, 133-140.
Hare, R.D. (1985a). The Psychopathy Checklist. Unpublished manuscript, Department
of Psychology, University of British Columbia, Vancouver, Canada.
Hare, R.D. (1985b). Comparison of procedures for the assessment of psychopathy.
Journal of Consulting and Clinical Psychology, 53, 7-16.
Hare, R.D. (in press). The Hare Psychopathy Checklist-Revised (PCL-R). Toronto,
Ontario: Multi-Health Systems.
Hare, R.D., & Cox, D.N. (1978). Clinical and empirical conceptions of psychopathy,
and the selection of subjects for research. In R.D. Hare & D. Schalling (Eds.),
Psychopathic behavior: Approaches to research. Chichester, England: Wiley.
Hare, R.D., Cox, D.N., & Hart, S.D. (1990). Preliminary manual for the Psychopathy
Checklist: Screening Version (PCL:SV). Unpublished manuscript, University of
British Columbia, Vancouver, B.C., Canada.
Hare, R.D., & Craigen, D. (1974). Psychopathy and physiological activity in a mixed-motive
game situation. Psychophysiology, 11, 197-206.
Hare, R.D., Forth, A.E., & Hart, S.D. (1989). The psychopath as prototype for
pathological lying and deception. In J.C. Yuille (Ed.), Credibility assessment (pp.
25-49). Dordrecht, The Netherlands: Kluwer.
Hare, R.D., & Frazelle, J. (1980). Some preliminary notes on the use of a research
scale for the assessment of psychopathy in criminal populations. Unpublished
manuscript, Department of Psychology, University of British Columbia,
Vancouver, Canada.
Hare, R.D., Frazelle, J., & Cox, D.N. (1978). Psychopathy and physiological response
to threat of an aversive stimulus. Psychophysiology, 15, 165-172.
Hare, R.D., Harpur, T.J., Hakstian, A.R., Forth, A.E., Hart, S.D., & Newman, J.P.
(1990). The revised Psychopathy Checklist: Reliability and factor structure.
Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2,
338-341.
Hare, R.D., Hart, S.D., & Harpur, T.J. (in press). Psychopathy and the proposed
DSM-IV criteria for antisocial personality disorder. Journal of Abnormal
Psychology: Special Issue.
Hare, R.D., & Jutai, J.W. (1988). Psychopathy and cerebral asymmetry in semantic
processing. Personality and Individual Differences, 9, 329-337.
Hare, R.D., & McPherson, L.M. (1984a). Violent and aggressive behavior by
criminal psychopaths. International Journal of Law and Psychiatry, 7, 35-50.

Hare, R.D., & McPherson, L.M. (1984b). Psychopathy and perceptual asymmetry
during verbal dichotic listening. Journal of Abnormal Psychology, 93, 141-149.
Hare, R.D., McPherson, L.M., & Forth, A.E. (1988). Male psychopaths and their
criminal careers. Journal of Consulting and Clinical Psychology, 56, 710-714.
Hare, R.D., & Schalling, D. (Eds.). (1978). Psychopathic behavior: Approaches to
research. Chichester, England: Wiley.
Hare, R.D., Williamson, S.E., & Harpur, T.J. (1988). Psychopathy and language. In
T.E. Moffitt & S.A. Mednick (Eds.), Biological contributions to crime causation
(pp. 68-92). Dordrecht, Netherlands: Martinus Nijhoff.
Harpur, T.J., & Hare, R.D. (1989, June). Facilitation and inhibition of visual attention
in psychopaths. Paper presented at the Fourth Meeting of the International
Society for the Study of Individual Differences, Heidelberg, West Germany.
Harpur, T.J., & Hare, R.D. (1990a). The assessment of psychopathy as a function of
age. Manuscript submitted for publication.
Harpur, T.J., & Hare, R.D. (1990b). Psychopathy and attention. In J. Enns (Ed.), The
development of attention: Research and theory (pp. 429-444). New York: North-Holland.
Harpur, T.J., Hakstian, A.R., & Hare, R.D. (1988). Factor structure of the Psychopathy
Checklist. Journal of Consulting and Clinical Psychology, 56, 741-747.
Harpur, T.J., Hare, R.D., & Hakstian, A.R. (1989). Two-factor conceptualization of
psychopathy: Construct validation and assessment implications. Psychological
Assessment: A Journal of Consulting and Clinical Psychology, 1, 6-17.
Harris, G. T., Rice, M. E., & Cormier, C. A. (1990). Psychopathy and violent
recidivism. Manuscript submitted for publication.
Hart, S.D., Forth, A.E., & Hare, R.D. (1990). Performance of male psychopaths on
selected neuropsychological tests. Journal of Abnormal Psychology, 99, 374-379.
Hart, S.D., Forth, A.E., & Hare, R.D. (in press). Assessing psychopathy in male
criminals using the MCMI-II. Journal of Personality Disorders.
Hart, S.D., Kropp, P.R., & Hare, R.D. (1988). Performance of male psychopaths
following conditional release from prison. Journal of Consulting and Clinical
Psychology, 56, 227-232.
Hart, S.D., & Hare, R.D. (1989). Discriminant validity of the Psychopathy Checklist
in a forensic psychiatric population. Psychological Assessment: A Journal of
Consulting and Clinical Psychology, 1, 211-218.
Hathaway, S.R., & McKinley, J.C. (1943). Manual for the Minnesota Multiphasic
Personality Inventory. New York: Psychological Corporation.
Hoffman, P., & Beck, J.L. (1974). Parole decision-making: A salient factor score.
Journal of Criminal Justice, 2, 195-206.
Jutai, J., & Hare, R.D. (1983). Psychopathy and selective attention during
performance of a complex perceptual-motor task. Psychophysiology, 20, 146-151.
Jutai, J., Hare, R.D., & Connolly, J.F. (1987). Psychopathy and event-related brain
potentials (ERPs) associated with attention to speech. Personality and Individual
Differences, 8, 175-184.
Karpman, B. (1961). The structure of neurosis: With special differentials between
neurosis, psychosis, homosexuality, alcoholism, psychopathy, and criminality.
Archives of Criminal Psychodynamics, 4, 599-646.
Kosson, D.S., & Newman, J.P. (1986). Psychopathy and allocation of attentional
capacity in a divided-attention situation. Journal of Abnormal Psychology, 95,
257-263.
Kosson, D.S., Smith, S.S., & Newman, J.P. (1990). Evaluating the construct validity
of psychopathy in Black and White male inmates: Three preliminary studies.
Journal of Abnormal Psychology, 99, 250-259.
McCord, W.M. (1982). The psychopath and milieu therapy. New York: Academic
Press.
McCord, W.M., & McCord, J. (1964). The psychopath: An essay on the criminal mind.
Princeton, NJ: Van Nostrand.
Meloy, J.R. (1988). The psychopathic mind: Origins, dynamics, and treatment.
Northvale, NJ: Jason Aronson.
Millon, T. (1981). Disorders of Personality: DSM-III Axis II. New York: Wiley.
Millon, T. (1987). Millon Clinical Multiaxial Inventory-II manual. Minneapolis, MN:
National Computer Systems.
Monahan, J. (1990). Risk of violence among the mentally disordered: A description
of the MacArthur Risk Study. Unpublished manuscript, University of Virginia
School of Law, Charlottesville, VA.
Neary, A. (1990). DSM-III and Psychopathy Checklist assessment of antisocial
personality disorder in Black and White female felons. Unpublished doctoral
dissertation, University of Missouri-St. Louis, MO.
Newman, J.P. (1987). Reaction to punishment in extraverts and psychopaths:
Implications for the impulsive behavior of disinhibited individuals. Journal of
Research in Personality, 21, 464-480.
Newman, J.P. (1989, July). Response modulation deficits in psychopaths: A matter
of perspective. In R.D. Hare (Chair), Assessment of psychopathy in forensic
populations. Symposium conducted at the meeting of the International Society
for the Study of Individual Differences, Heidelberg, FRG.
Newman, J.P., & Kosson, D.S. (1986). Passive avoidance learning in psychopathic
and nonpsychopathic offenders. Journal of Abnormal Psychology, 95, 252-256.
Newman, J.P., Widom, C., & Nathan, S. (1985). Passive avoidance in syndromes of
disinhibition: Psychopathy and extraversion. Journal of Personality and Social
Psychology, 48, 1316-1327.
Nuffield, J. (1982). Parole decision-making in Canada: Research towards decision
guidelines. Ottawa, Ontario, Canada: Ministry of the Solicitor General.
Ogloff, J.R.P., & Wong, S. (1990). Electrodermal and cardiovascular evidence of a
coping response in psychopaths. Criminal Justice and Behavior, 17, 231-245.
Ogloff, J., Wong, S., & Greenwood, A. (1990). Treating criminal psychopaths in a
therapeutic community program. Behavioral Sciences and the Law, 8, 81-90.
Patrick, C. J., & Lang, P. J. (1989, October). Psychopathy and emotion in a forensic
population. Paper presented at the meeting of the Society for
Psychophysiological Research, New Orleans.
Peterson, B. (1984). Cross-validation of the checklist for the assessment of psychopathy
in a prison sample. Unpublished doctoral dissertation, University of Missouri,
Saint Louis, MO.
Raine, A., & Venables, P.H. (1988). Enhanced P3 evoked potentials and longer
recovery time in psychopaths. Psychophysiology, 25, 30-38.
Reid, W.H., Dorr, D., Walker, J.I., & Bonner, J.W. (Eds.). (1986). Unmasking the
psychopath: Antisocial personality and related syndromes. New York: W.W.
Norton.
Rice, M.E., Harris, G.T., & Quinsey, V.L. (1990). A follow-up of rapists assessed in a
maximum security psychiatric facility. Manuscript submitted for publication.
Robins, L.N. (1966). Deviant children grown up. Baltimore, MD: Williams and
Wilkins.
Sartorius, N., Jablensky, A., Cooper, J.E., & Burke, J.D. (Eds.). (1988). Psychiatric
classification in an international perspective. The British Journal of Psychiatry,
Supplement No. 1. 152.
Schroeder, M.L., & Hare, R.D. (1990). Generalizability of the Revised Psychopathy
Checklist. Manuscript in preparation.
Schroeder, M.L., Schroeder, K.G., & Hare, R.D. (1983). Generalizability of a
checklist for the assessment of psychopathy. Journal of Consulting and Clinical
Psychology, 51, 511-516.
Serin, R. C. (in press). Psychopathy and violence in criminals. Journal of Interpersonal
Violence.
Serin, R. C., Peters, R. D., & Barbaree, H. E. (1990). Predictors of psychopathy and
release outcome in a criminal population. Psychological Assessment: A Journal
of Consulting and Clinical Psychology, 2, 419-422.
Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlation: Uses in assessing rater
reliability. Psychological Bulletin, 86, 420-428.
Simourd, D. J., Bonta, J., Andrews, D. A., & Hoge, R. D. (1990, May). Psychopathy and
criminal behavior: A meta-analysis. Paper presented at the annual meeting of
the Canadian Psychological Association, Ottawa, Canada.
Smith, R.J. (1978). The psychopath in society. New York: Academic Press.
Smith, S.S., & Newman, J.P. (1990). Alcohol and drug abuse/dependence disorders
in psychopathic and nonpsychopathic criminal offenders. Journal of Abnormal
Psychology, 99, 430-439.
Spitzer, R.L., Endicott, J., & Robins, E. (1975). Research diagnostic criteria:
Rationale and reliability. Archives of General Psychiatry, 35, 773-782.
Weiss, J. (1987). The nature of psychopathy. Directions in psychiatry. New York:
Hatherleigh Company Ltd.

Widiger, T.A., Frances, A.J., Pincus, H.A., Davis, W.W., & First, M. (in press).
Toward an empirical classification for DSM-IV. Journal of Abnormal Psychology:
Special Issue.
Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment.
Reading, MA: Addison-Wesley.
Wiggins, J.S. (1982). Circumplex models of interpersonal behavior in clinical
psychology. In P.C. Kendall & J.N. Butcher (Eds.), Handbook of research
methods in clinical psychology (pp. 183-221). New York: Wiley.
Wiggins, J.S., Trapnell, P., & Phillips, N. (1988). Psychometric and geometric
characteristics of the revised Interpersonal Adjective Scales (IAS-R). Multivariate
Behavioral Research, 23, 517-530.
Williamson, S., Hare, R.D., & Wong, S. (1987). Violence: Criminal psychopaths and
their victims. Canadian Journal of Behavioral Science, 19, 454-462.
Williamson, S., Harpur, T.J., & Hare, R.D. (1990, August). Sensitivity to emotional
polarity in psychopaths. Paper presented at the meeting of the American
Psychological Association, Boston, MA.
Williamson, S., Harpur, T.J., & Hare, R.D. (in press). Abnormal processing of
affective words by psychopaths. Psychophysiology.
Woodruff, R.A., Guze, S.B., & Clayton, P.J. (1980). The medical and psychiatric
implications of antisocial personality (sociopathy). In H.J. Vetter & R.W. Rieber
(Eds.), The psychological foundations of criminal justice, Vol. II (pp. 307-312).
New York: John Jay Press.
Wong, S. (1984). Criminal and institutional behaviors of psychopaths. Programs
Branch Users Report. Ottawa, Ontario, Canada: Ministry of the Solicitor-General
of Canada.
Wong, S. (1988). Is Hare's Psychopathy Checklist reliable without the interview?
Psychological Reports, 62, 931-934.
Wong, S., & Templeman, R. (1988, June). High and low psychopathy groups derived
by cluster-analysing Psychopathy Checklist data. Paper presented at the Annual
Meeting of the Canadian Psychological Association, Montreal, Canada.
World Health Organization (1978). Mental disorders: Glossary and guide to their
classification in accordance with the ninth revision of the International Classification
of Diseases. Geneva: Author.
Wright, S., & Wong, S. (1988). Criminal psychopaths and their victims. Unpublished
manuscript, Department of Psychology, University of Saskatchewan, Saskatoon,
Saskatchewan.

Author Note
Preparation of this chapter was supported by grant MT-4511 from the Medical
Research Council of Canada, and by the Program of Research on Mental Health
and the Law of the John D. and Catherine T. MacArthur Foundation (MacArthur
Risk Study: John Monahan, Director). T.J. Harpur is now at the University of
Illinois.

Notes
1. The MacArthur Risk Study uses a brief, 12-item screening version of the
PCL-R, called the PCL:SV, intended for both forensic and nonforensic settings
(Hare, Cox, & Hart, 1990); the DSM-IV APD field trials use a similar, 10-item
modification. Further information concerning these modifications is available
upon request.
2. Information about an experimental Self-Report Psychopathy (SRP-II) scale
intended to measure the affective, interpersonal, and lifestyle components of
psychopathy is available on request.
3. To help train researchers and clinicians in the proper use of the PCL-R, we
have developed workshops and a set of mock assessment materials (videotaped
interviews with file information). Other users have set up their own training
programs.
4. Schroeder et al. (1983) performed a G theory analysis on PCL data obtained
from five samples of male prison inmates (N = 301) assessed between 1977 and
1981. GCs for PCL Total scores in the individual samples ranged from .85 to .90; the
overall GC was .90. Harpur et al. (1989) reanalyzed these data and obtained GCs
of .79 for Factor 1 and .84 for Factor 2.
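Note 4 reports generalizability coefficients (GCs) without describing the design. As a minimal sketch, assume the simplest G-study design: every person is rated once by each of the same raters (persons crossed with raters). Variance components are then estimated from ANOVA mean squares, and the relative GC for the mean of n_r raters is σ²_p / (σ²_p + σ²_res / n_r), which reduces to (MS_p - MS_res) / MS_p. The function name and data layout are hypothetical, and this is not necessarily the analysis used by Schroeder et al. (1983).

```python
from typing import Sequence


def relative_g_coefficient(scores: Sequence[Sequence[float]]) -> float:
    """Relative G coefficient for a crossed persons x raters design.

    scores[i][j] is the rating of person i by rater j (one rating per cell).
    The coefficient refers to the mean over all raters present in the data.
    """
    n_p = len(scores)
    n_r = len(scores[0]) if scores else 0
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a rectangular table with >= 2 persons and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r  # person x rater interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_p = max((ms_p - ms_res) / n_r, 0.0)  # person variance; negative estimates set to 0
    var_res = ms_res                         # residual variance component
    denom = var_p + var_res / n_r
    if denom == 0:
        raise ValueError("scores have no variance")
    return var_p / denom
```

Because the relative coefficient ignores rater main effects, a rater who is uniformly more severe than another does not lower it; only disagreement about the rank order of persons does.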
CHAPTERS

The MMPI-2: Development and Research Issues

Nathan C. Weed and James N. Butcher

Despite the volumes of supporting research its popularity promotes,
the MMPI's prominence quickly converts into disadvantage when discussion
turns to the revision of this historic personality inventory. First,
there appears to be some inertia where clinical practice and research
tradition are concerned. This is true whether or not a particular method of
assessment or mode of therapy has a research literature to support it, but
is especially pronounced when generations of users fear that an instrument
which has served them well will be destroyed in the process of
"improvement." Second, among those who agreed that revision of the
MMPI was necessary to some degree, there existed nearly as many
proposed types of revisions as there were proponents of revisions.
Suggestions ranged from the conservative (a simple renorming) to the
radical (a complete abandonment of its empirical roots in favor of a more
"modern" approach), and vehemence was equally distributed along this
continuum. Finally, and perhaps most importantly, the decades of
"bootstrapping" promoted by the MMPI's popularity have tied the test's
NATHAN C. WEED, Instructor, Department of Psychology, University of Minnesota,
Minneapolis, Minnesota.
JAMES N. BUTCHER, Professor, Department of Psychology, University of
Minnesota, Minneapolis, Minnesota.
132 NATIIAN C. WEED AND JAMFS N. BUTCHER

utility to a particular set of items. There is no a priori construct explication
underlying the MMPI scales which permits painless item substitution or
revision. Presumably, the further a new instrument strays from the
stimulus materials indicated in empirical research, the weaker our confi-
dence in the application of empirical findings to the revised instrument.
The MMPI, then (with due credit to McKinley and Hathaway), is
largely a product of its past and present popularity, and despite a general
consensus that there was a great need for some revision, the MMPI-2 will
have to contend with the many problems its predecessor's popularity
poses. This chapter describes the nature of the differences
between the two inventories and the implications of these differences by
first reviewing the process of the development of the MMPI-2, and then
examining some of the important issues which it must face in the years
to come.

MMPI-2 Development
The charge of the MMPI Restandardization Committee, composed of
James N. Butcher from the University of Minnesota, W. Grant Dahlstrom
from the University of North Carolina, John R. Graham from Kent State
University, Auke Tellegen from the University of Minnesota, and Beverly
Kaemmer from the University of Minnesota Press, was to make changes whose
need had become apparent over the years, while
at the same time maintaining interpretive continuity with the MMPI
(Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). The two
major targets of this committee were the inappropriateness of some item
content and the need for contemporary norms. The results of the revision
included some modifications and deletions at the item level, the introduc-
tion of some new scales, and changes in the psychometric scaling, which
involved a renorming of the instrument and the development of a new
transformation of the raw scale scores. The following is a review of the
nature of these changes and additions which are found in the MMPI-2.

Item Changes
The MMPI item pool has been many things to different people: to its
developers it was simply an empirical collection of the most discriminat-
ing items available, to grammarians it was a nightmare of complex and
awkward statements, to some feminists it was evidence of institutional-
ized sexism, to comedians such as Art Buchwald, it was the source of a
good amount of material (such as his "North Dakota Null-Hypothesis
Brain Inventory"), to some job applicants it was a group of offensive
MMPI-2 133

personal questions, and to some test-takers it was at times simply
confusing. One of the goals of the MMPI Restandardization Committee
was to eliminate the sources of many of these objections by rewriting
items where necessary and possible, and by deleting them where rewriting
was not possible.
In all, sixty-eight items on the MMPI were modified in some way. The
MMPI-2 Manual divides these revisions into four types: 1) elimination of
possibly sexist or male-oriented wording; 2) modernization of idioms and
usage; 3) grammatical clarification (such as tense and voice); and 4)
simplifications (Butcher et al., 1989a). A small number of items which
could not be altered satisfactorily were deleted, including four on scale F,
one on Hs, three on D, four on Mf, and one on Si. The sixteen repeated items
in the booklet form of the MMPI were dropped, as were items that did
not belong to any of the commonly used MMPI scales; the dropped items were
replaced with new items addressing contemporary clinical problems and areas
of concern that had not received adequate attention in the original MMPI pool. The
MMPI-2 contains 567 items. The clinical and validity scale scores can be
obtained by administering the first 370 items.

New Scales
Except for the minor changes at the item level mentioned above, the
MMPI clinical and validity scales were left relatively intact in the MMPI-2
(see section below on form comparability). Perhaps the biggest change
from the MMPI to the MMPI-2 at the scale level is the introduction of fifteen
new content scales, designed to better represent the major dimensions of
the MMPI-2 item pool, and three new validity scales to complement the
standard validity scales available on the MMPI.

Content Scales
In his review (1984) of strategies of test construction, Burisch notes
that while empirical construction can yield instruments that are valid
with respect to external criteria, it often compares unfavorably with
strategies which consider item content (e.g., deductive or factor-ana-
lytic) in terms of other characteristics, such as communicability and
discriminant validity. Although MMPI content interpretation was dis-
couraged in its early years, practitioners in more recent years have found
content interpretation, such as with the Koss and Butcher critical items
(Koss & Butcher, 1973) and the Wiggins Content Scales (Wiggins, Goldberg,
& Appelbaum, 1971), useful both by themselves and as an adjunct to
MMPI clinical scale interpretation. Because of the popularity and utility
of content interpretive approaches to the MMPI, the Restandardization
Committee agreed that content scales should be retained in the MMPI-2. One
possibility towards this end was to revise and update the popular Wiggins
Content Scales. However, with the changes in the MMPI-2 item pool, these
scales are no longer representative of the MMPI-2 item pool content
dimensions. An ambitious project was thus begun with the goal of
producing a set of homogeneous scales covering the major content
dimensions in the MMPI-2.
The MMPI-2 content scale construction process was multi-stage and
iterative, alternating between deductive, rational steps, and statistical
refinement (Butcher, Graham, Williams, & Ben-Porath, 1989). The first
stage in the process involved the rational identification of the content
dimensions existing in the MMPI-2 item pool, and the subsequent classi-
fication of items into these categories. After item categorization was done
independently by the four authors, meetings were held to achieve con-
sensus about item membership. In the second stage, the appropriateness
of the consensus item membership was evaluated with item-scale corre-
lations and internal consistency statistics obtained from samples of
college students and psychiatric inpatients. At this point, items contrib-
uting to scale inconsistency were dropped from scales, and in some cases
substituted with more appropriate items. Nonviable scales were dropped,
and one new scale was created. The third stage was a rational review of the scales. Construct-irrelevant items were deleted, scales with excessive content overlap were dropped, and some scales were renamed. The fourth stage involved further statistical refinement. In this stage, the discriminant validity of the items was examined, and items were deleted or transferred to other scales when appropriate. Uniform T-scores were also derived. In the fifth and final stage, scale descriptors were written on the basis of the content of the member items. The final product of these five stages was the fifteen MMPI-2 Content Scales: Anxiety, Fear, Obsessiveness, Depression, Health Concerns, Bizarre Mentation, Anger, Cynicism, Antisocial Practices, Type A, Low Self-Esteem, Social Discomfort, Family Problems, Work Interference, and Negative Treatment Indicators.
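The item-level statistics used in the second stage can be illustrated with a short sketch. The Python below computes coefficient alpha and a corrected item-scale correlation (each item correlated with the sum of the remaining items) for a small, hypothetical 0/1 response matrix. The data and function names are invented for illustration; this is not a reproduction of the analyses reported by Butcher, Graham, Williams, and Ben-Porath (1989).

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(matrix):
    """Coefficient alpha for a respondents-by-items matrix of keyed (1) / unkeyed (0) answers."""
    k = len(matrix[0])
    item_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def corrected_item_scale_r(matrix, j):
    """Correlate item j with the sum of the other items, so the item is not correlated with itself."""
    item = [row[j] for row in matrix]
    rest = [sum(row) - row[j] for row in matrix]
    return pearson_r(item, rest)


# Hypothetical responses: 5 respondents x 4 candidate items for one content scale.
data = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]

for j in range(4):
    print(f"item {j}: corrected item-scale r = {corrected_item_scale_r(data, j):.2f}")
print(f"alpha, all items: {cronbach_alpha(data):.3f}")
print(f"alpha, item 3 dropped: {cronbach_alpha([row[:3] for row in data]):.3f}")
```

In this toy matrix, item 3 correlates slightly negatively with the rest of the scale, and dropping it raises alpha; that is the kind of pattern that would mark an item as contributing to scale inconsistency.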
Reliability data on the MMPI-2 Content Scales indicate that they are
generally more internally consistent than the Wiggins Content Scales and
a good deal more so than the MMPI-2 clinical scales (Butcher et al., 1989b).
Test-retest correlations are similar to those of the Wiggins Content
Scales, and slightly higher than those of the clinical scales. Spouse rating
data from the normative sample have been reported to provide evidence
of criterion-related validity for the new content scales. Validity for some
of the scales is also suggested by comparing mean differences between
the normative sample and special clinical populations (e.g., Health Con-
cerns with a chronic pain sample, Keller & Butcher, 1990; and Bizarre
Mentation and Depression with an inpatient psychiatric sample, Graham
MMPI-2 135

& Butcher, 1988). Convergent validity data in the form of correlations with other MMPI scales are also available (Butcher et al., 1989b). It is clear, though, that much more research is needed on these content scales, especially studies providing correlations with external criteria. Other potentially interesting studies would include an evaluation of the utility of the Negative Treatment Indicators and Work Interference scales; studies of the applicability of these scales to various types of computer adaptive testing (see the section below on the use of computers with the MMPI-2); and comparisons between the deductively based, homogeneous MMPI-2 Content Scales and the atheoretical, empirically based clinical scales in terms of their relative validity, utility, susceptibility to faking, and so on.

Back F
Like scale F, FB (Back F) was designed to help identify individuals who complete the inventory in an invalid manner. Like F, it was developed by identifying items that were infrequently endorsed in the normative sample. But whereas the F items appear in the first 370 items, the FB items are found near the end of the test booklet. Thus, a normal score on F coupled with a high score on FB might indicate that the test taker stopped attending to the test items at some point and shifted to a random or unusual pattern of responding, possibly because of fatigue or loss of interest in the task. This scale is thought to be especially useful with adolescent samples, in which test-taking attitudes may shift.

VRIN
The VRIN (Variable Response Inconsistency) Scale was developed to
complement the original MMPI validity scales by providing information
about the consistency with which an individual responds to item content
within the MMPI-2. VRIN is scored not from answers to single items but from responses to item pairs. It consists of 67 pairs of items that are either very similar or opposite in content. For every instance of inconsistent responding within an item pair, a point is added to the VRIN score. In certain cases, VRIN scores may help explain elevations on scale F. Elevations on F may occur for a number of reasons, including severe psychopathology, faking bad, extreme confusion, or random responding. But if, for example, a profile shows elevations on both F and VRIN, one can narrow down the possible explanations for the F elevation. In this case, a clinician should suspect that the test taker was confused or responding randomly, since high VRIN scores suggest response inconsistency. Where high F scores are accompanied by normal VRIN scores (indicating consistent responding combined with endorsement of infrequent items), one should consider the possibility of serious disturbance or faking bad.
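The pair-based scoring and the F/VRIN reasoning described above can be sketched as follows. The item numbers and pairs are hypothetical placeholders, not the 67 published VRIN pairs. The sketch also assumes that, for similar-content pairs, any differing answers count as inconsistent and, for opposite-content pairs, any matching answers do; the published scoring key specifies the inconsistent combination for each pair and may differ in detail. Because no cutoffs are given in the text, whether F or VRIN is "elevated" is left to the caller.

```python
# Response combinations treated as inconsistent (a simplifying assumption).
SIMILAR = {(True, False), (False, True)}   # similar content: different answers conflict
OPPOSITE = {(True, True), (False, False)}  # opposite content: identical answers conflict

# Hypothetical item pairs (item_a, item_b, inconsistent combinations); not the real VRIN key.
HYPOTHETICAL_VRIN_PAIRS = [
    (1, 2, SIMILAR),
    (3, 4, OPPOSITE),
    (5, 6, SIMILAR),
]


def vrin_raw_score(responses, pairs):
    """Add one point for each pair answered in an inconsistent combination.

    responses maps item number -> True/False; unanswered items are simply absent.
    """
    score = 0
    for item_a, item_b, inconsistent in pairs:
        if item_a in responses and item_b in responses:
            if (responses[item_a], responses[item_b]) in inconsistent:
                score += 1
    return score


def interpret_f_elevation(f_elevated, vrin_elevated):
    """Narrow down explanations for an F elevation using VRIN, following the text."""
    if not f_elevated:
        return "F not elevated"
    if vrin_elevated:
        return "consider confusion or random responding"
    return "consider serious disturbance or faking bad"


answers = {1: True, 2: True, 3: True, 4: True, 5: False, 6: True}
print(vrin_raw_score(answers, HYPOTHETICAL_VRIN_PAIRS))
print(interpret_f_elevation(f_elevated=True, vrin_elevated=False))
```

In the usage example, the first pair is answered consistently while the second and third are not, giving a raw VRIN score of 2.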
TRIN
TRIN, or True Response Inconsistency, was designed to detect response sets such as "yea-saying" or "nay-saying." The scale was developed similarly to VRIN, with scores derived from responses to item pairs. In contrast to VRIN, however, TRIN contains only pairs of items with opposite content. If an individual agrees with both statements in a pair, one point is added to the TRIN score; if an individual disagrees with both, one point is subtracted. Extremely high scores suggest an acquiescent response bias, while extremely low scores suggest a bias toward disagreeing with item content. These three new scales (FB, VRIN, and TRIN), taken together with the standard validity indicators from the MMPI, allow a more detailed examination of protocol validity on the MMPI-2.
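TRIN scoring differs from VRIN in that identical "true" answers and identical "false" answers push the score in opposite directions. A minimal sketch, again using hypothetical opposite-content pairs rather than the published key (which may also differ in details such as the specific pairs and any constant added to the raw score):

```python
# Hypothetical opposite-content item pairs; not the published TRIN key.
HYPOTHETICAL_TRIN_PAIRS = [(1, 2), (3, 4), (5, 6)]


def trin_raw_score(responses, pairs):
    """+1 when both items in a pair are answered True, -1 when both are answered False."""
    score = 0
    for item_a, item_b in pairs:
        a, b = responses.get(item_a), responses.get(item_b)
        if a is True and b is True:
            score += 1  # agreeing with two opposite statements: yea-saying
        elif a is False and b is False:
            score -= 1  # disagreeing with both: nay-saying
    return score


yea_sayer = {item: True for item in range(1, 7)}
nay_sayer = {item: False for item in range(1, 7)}
consistent = {1: True, 2: False, 3: False, 4: True, 5: True, 6: False}
for label, answers in [("yea", yea_sayer), ("nay", nay_sayer), ("consistent", consistent)]:
    print(label, trin_raw_score(answers, HYPOTHETICAL_TRIN_PAIRS))
```

Because the two kinds of "same answer" cancel, a consistent responder stays near zero, while a yea-sayer or nay-sayer drifts toward one extreme.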

Scaling of the MMPI-2


A clinician with a good deal of experience interpreting old MMPI profiles may develop some "internal norms" for the MMPI clinical scales. The clinician may know, for example, that submerged profiles (i.e., all clinical scales around or below a T-score of 50) are extremely rare. The practitioner knows this not because she or he possesses any special knowledge about normative drift or about a change in the frequency of "Cannot Say" responses, but because such profiles are so rarely observed in practice. In a like manner, a practitioner may know that certain MMPI clinical scales allow for much higher elevations than others. The clinician knows this not from an understanding of how skewness varies across the scales but, again, from careful observation.
While it is certainly possible for test users to develop these sorts of
"internal norms" over years of test use, the MMPI Restandardization
Committee attempted to eliminate the need for them. The issue of mean
scores was addressed by constructing new norms based upon a contem-
porary community sample. The issue of distributions about the mean
varying from scale to scale was dealt with by developing Uniform T-
scores.
MMPI-2 Normative Sample
There is abundant evidence indicating that the original MMPI norma-
tive data collected by McKinley and Hathaway are not representative of
today's test takers. First, there are differences in test administration. For
example, in the original normative sample, individuals were encouraged
to respond with "Cannot Say" if they were not sure about whether an item
applied to them. Today it is standard practice to ask that test takers
complete each item, leaving answers blank only if they must. Second,
there are many demographic characteristics of today's society of which
the original sample is not representative. Although the original "Minne-
sota normals" may have been representative of the state of Minnesota in
the 1930s in terms of age range, educational level and socioeconomic
background (Hathaway & McKinley, 1940), they are certainly not repre-
sentative of the United States of the 1990s on these variables and others,
such as ethnic group membership. Third, there have been other changes
in society over the last fifty years which surely create differences in the
way people respond to the MMPI items now. Society has undergone
attitudinal changes, lifestyle changes, and gender role changes in the last
fifty years, and each of these is likely to have exerted some influence on
the way people respond to this set of items.
Unlike scores on, for example, the Scholastic Aptitude Test, for which normative drift is not problematic because year-to-year differences in performance are themselves of interest, MMPI protocols are interpreted with reference to the mean of the population from which they came. If a protocol is drawn from a population whose characteristics differ from those of the comparison group, scores based on relative standing are difficult to interpret. The differences itemized above between the old MMPI normative sample and today's test takers therefore justify (perhaps demand) the construction of a new set of comparison scores. One of the main goals of the Restandardization Committee was to provide an appropriate normative comparison group for today's test takers. (A separate project involved developing contemporary norms for adolescents. An Adolescent Form of the MMPI-2 employing these norms and including new items with age-specific content will be available from the University of Minnesota Press sometime in 1991.)
The MMPI-2 normative sample is made up of 2600 test protocols (from
1138 men and 1462 women) gathered from seven geographic regions of
the country (California, Minnesota, North Carolina, Ohio, Pennsylvania,
Virginia, and Washington). Data from the 1980 census were used as a
comparison for many demographic variables to ensure that the sample
collected was representative. Sample data match the census data well on
most variables, including age, ethnic background, marital status, and
income level (Butcher et al., 1989a).
One variable on which the MMPI-2 normative sample does not closely
match the 1980 census is educational level. The new normative sample
has a higher mean education than the census data. Part of this problem
is mitigated by national changes in mean education level since 1980. Also,
since the MMPI-2 is administered only to those with adequate reading
skills (a minimum of an eighth-grade reading level is required, Butcher et
al., 1989a), the more appropriate comparison group (as opposed to the
census) is those to whom the MMPI-2 can be administered, a group which
is, at the least, closer to the MMPI-2 normative sample in terms of
education. Finally, the effect of education on MMPI-2 profiles appears to
be minimal (Butcher, 1990a). Table 1 shows correlations between years
of education and the standard MMPI-2 validity and clinical scales in the
normative sample. As Figures 1 and 2 also show, the only scales substantially related to educational level are scales Mf (for males) and K, both of which have traditionally been interpreted within the
context of education with the original MMPI. To summarize, then, with the exception of education, which has minimal impact on MMPI-2 scale scores, the new normative sample is quite comparable to the 1980 census data, and a certain improvement in representation over the "Minnesota normals" of the 1930s.

Table 1
Correlations Between MMPI-2 Clinical and Validity Scales and Years of Education for the MMPI-2 Normative Sample

Scale    Men (N = 1138)    Women (N = 1462)
L        -.133             -.150
F        -.220             -.143
K         .238              .234
Hs       -.137             -.104
D        -.103             -.109
Hy        .102              .039
Pd       -.009             -.028
Mf        .347             -.152
Pa        .032              .021
Pt       -.001             -.054
Sc       -.027             -.061
Ma       -.052             -.072
Si       -.128             -.149

Note. From "Educational level and MMPI-2 measured psychopathology: A case of negligible influence" by J. N. Butcher, 1990, August, MMPI-2 News and Profiles, p. 3. Reproduced by permission.

[Figure 1: mean MMPI-2 profiles for males, one panel per education level: Part High School (n = 61), High School Graduate (n = 242), Part College (n = 272), College Graduate (n = 310), Post Graduate (n = 253).]
Figure 1. Mean profiles for males at different levels of education in the MMPI-2 Restandardization sample.
Note. From "Educational level and MMPI-2 measured psychopathology: A case of negligible influence" by J. N. Butcher, 1990, August, MMPI-2 News and Profiles, p. 3. Reproduced by permission.

[Figure 2: mean MMPI-2 profiles for females, one panel per education level: Part High School, High School Graduate, Part College (n = 379), College Graduate (n = 390), Post Graduate (n = 227).]
Figure 2. Mean profiles for females at different levels of education in the MMPI-2 Restandardization sample.
Note. From "Educational level and MMPI-2 measured psychopathology: A case of negligible influence" by J. N. Butcher, 1990, August, MMPI-2 News and Profiles, p. 3. Reproduced by permission.

Uniform T-Scores
On the MMPI, raw scores (and K-corrected raw scores) are standardized by a linear transformation that sets the mean of each scale at 50 and the standard deviation at 10. These "T-scores" provide a neutral metric in which different scales can be compared in terms of their deviation from the mean. They cannot, however, be compared directly in terms of their percentile ranks, since each scale distribution has its own skewness and kurtosis. This state of affairs can produce some counterintuitive results. For example, a T-score of 80 (three standard deviations above one scale's mean) may be more common, that is, have a lower population percentile rank, than a T-score of 70 (two standard deviations above another scale's mean).
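The linear T-score is T = 50 + 10(X - M)/SD, where X is the raw score and M and SD are the normative mean and standard deviation for the scale. The sketch below applies this formula to two small, hypothetical samples to show how a T-score of 80 on a strongly skewed scale can be more common than a lower T-score on a roughly symmetric scale. The data are invented for illustration, and "percent at or above" is used here as the measure of how common a score is.

```python
from statistics import mean, pstdev


def linear_t(raw, scores):
    """Linear T-score: mean 50 and standard deviation 10 in the reference sample."""
    return 50 + 10 * (raw - mean(scores)) / pstdev(scores)


def percent_at_or_above(raw, scores):
    """Percentage of the reference sample scoring at or above a raw score."""
    return 100 * sum(s >= raw for s in scores) / len(scores)


# Hypothetical, strongly skewed scale: 90% of the sample score 0, 10% score 1.
scale_a = [0] * 18 + [1] * 2
# Hypothetical, symmetric scale shaped like a binomial(6, 0.5) distribution.
scale_b = [v for v, count in enumerate([1, 6, 15, 20, 15, 6, 1]) for _ in range(count)]

for name, scores, raw in [("A", scale_a, 1), ("B", scale_b, 6)]:
    t = linear_t(raw, scores)
    pct = percent_at_or_above(raw, scores)
    print(f"scale {name}: raw {raw} -> T = {t:.1f}, {pct:.1f}% at or above")
```

Here a raw score of 1 on scale A converts to T = 80 yet is reached by 10% of the sample, whereas a raw score of 6 on scale B converts to only about T = 74.5 and is reached by fewer than 2%.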
This information can be quite disturbing to some, especially test users whose graduate measurement or statistics courses emphasized the prototypical normal distribution. In the normal distribution, of course, the mean always equals the median, each standard deviation unit corresponds to a fixed percentile, and skewness and kurtosis are fixed. It is thus a distribution that is rarely reproduced in real-life psychological inquiry, and it is especially unrepresentative of the distributions of variables relevant to clinical psychology. Despite its unsuitability, there have been past attempts to transform MMPI scale distributions into a normal distribution (Colligan, Osborne, Swenson, & Offord, 1983). These "normalized" T-scores failed to gain popularity, however, because clinicians had become accustomed to the discrimination available at the upper ends of the positively skewed distributions, and normalized T-scores remain little used in practice today.
The MMPI Restandardization Committee desired continuity between the MMPI and MMPI-2 but was, at the same time, concerned about the differences in percentile rank across the clinical scales. The result was a compromise of sorts. A composite distribution, consisting of the distributions of the eight standard clinical scales, was formed, and each of the eight scales was transformed to this composite distribution (Tellegen, 1988, 1989). In this way, the positive skewness inherent in each of these scales was maintained (though modified slightly), while uniformity among the scales with regard to percentile rank was enforced. With MMPI-2 "Uniform T-scores," then, one can say for the first time that if one clinical scale score deviates further from the mean than another, it must also be less common.
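The logic of Uniform T-scores can be approximated with an equipercentile mapping: pool the linear T-scores of all scales into one composite distribution, then give each raw score the composite T-score found at the same percentile rank. This is a deliberately simplified sketch with two hypothetical scales; Tellegen's actual procedure for the MMPI-2 (1988, 1989) differs in its details and should not be inferred from this code.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def linear_t_scores(scores):
    """Linear T-scores (mean 50, SD 10) for every score in a sample."""
    m, sd = mean(scores), pstdev(scores)
    return [50 + 10 * (s - m) / sd for s in scores]


def percentile_rank(value, sorted_scores):
    """Proportion of the sample scoring strictly below value."""
    return bisect_left(sorted_scores, value) / len(sorted_scores)


def uniform_t_table(scales):
    """Map each raw score of each scale to the composite T at the same percentile rank.

    scales: dict of scale name -> list of raw scores from a (hypothetical) normative sample.
    """
    composite = sorted(t for raw in scales.values() for t in linear_t_scores(raw))
    table = {}
    for name, raw in scales.items():
        ordered = sorted(raw)
        table[name] = {}
        for value in sorted(set(raw)):
            p = percentile_rank(value, ordered)
            index = min(int(p * len(composite)), len(composite) - 1)
            table[name][value] = composite[index]
    return table


# Two hypothetical scales with different distribution shapes.
scales = {"A": [0, 0, 0, 1, 2], "B": [10, 11, 12, 13, 14]}
for name, lookup in uniform_t_table(scales).items():
    print(name, {raw: round(t, 1) for raw, t in lookup.items()})
```

Raw scores that share a percentile rank on the two scales receive the same T-score, which is the property described in the text: within the limits of this crude lookup, equal T-scores imply equal rarity across scales.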
MMPI-2 Research Issues


The MMPI has never had trouble generating research interest. With the recent publication of the MMPI-2, MMPI studies, some centered on the new instrument and some continuing research programs begun with the old instrument, are very likely to flourish. The following is a sampling of the issues that merit scientific inquiry in the years to come. (The reader is also referred to books by Butcher [1990b] and Graham [1990] that focus on clinically oriented issues in assessment with the MMPI-2.)

Validation of MMPI-2
The goal of much of the fifty years of MMPI research has been to discover what the MMPI does well. Rather than investigating the hows and whys of clinical description and prediction, MMPI researchers have been notorious for ignoring the black box of process in favor of eagerly pursuing the bottom-line payoff of validity. This approach has proven fruitful, and bootstrapping has yielded great returns. With the publication of the MMPI-2, there is little doubt that clinicians and researchers will remain interested in the "payoffs," making sure that they are as large as before and anticipating, perhaps, that they will be even larger.

Form Comparability
Despite the conservative nature of the revision of the MMPI, the
overarching concern among loyal practitioners is that the MMPI-2 be able
to "do everything" that the MMPI did (Adler, 1990; Ben-Porath, 1990).
There is much wisdom in maintaining a cautious stance. As Cronbach
(1975) has pointed out, any difference in stimulus conditions should be
considered capable of moderating results until demonstrated otherwise.
It is for this reason that caution is always advised when, for example, MMPI scales are taken out of the context of the test booklet, when the MMPI is administered by computer, or when the test administration is not supervised, unless studies have clearly demonstrated equivalence.
A primary objective of the MMPI Restandardization Committee was
to maintain interpretive comparability between the MMPI and MMPI-2,
not merely because of a wish to mollify loyal test users, but because of a
great unwillingness to see decades of MMPI research become suddenly
obsolete. Accordingly, a major research issue, perhaps the first MMPI-2
research issue, centers on how much confidence we should have in
applying our clinical and research knowledge about the MMPI to the new
MMPI-2. There are to date three kinds of evidence bearing on the comparability of the two forms: analyses performed at the item level, studies focusing on relative scale standing, and studies examining absolute scale level.
Item equivalence. During the item revision stage of the development
of the MMPI-2 (see above), analyses were performed to assess the
equivalency of the old items with the revised versions of the same items.
On the experimental form AX (from which the MMPI-2 ultimately emerged),
82 items were revised versions of old MMPI items. Ben-Porath and
Butcher (1989b) examined the percentage of agreement between old and
new versions for these 82 items and compared this level of agreement
with agreement between two administrations of the old versions. Of the
82 items, only nine had agreement levels significantly different from the
agreement obtained by administering the old versions twice. None of
these differences held up for both genders, and one of these nine showed
greater agreement between old and new versions. Nevertheless, the
investigators then examined these items with respect to their correla-
tions with the MMPI scales of which they are members. None of the nine
revised items had item-scale correlations which differed from the item-
scale correlations for the corresponding old versions. This study suggests
that MMPI item revision did not result in changes which would impair
comparability between forms.
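The comparison in this study, agreement between old and revised item versions set against agreement between two administrations of the old version, can be illustrated with a percentage-agreement function and a two-proportion z-test. The text does not say which significance test Ben-Porath and Butcher (1989b) used; the z-test (which assumes independent groups) and all numbers below are assumptions for illustration only.

```python
from math import erf, sqrt


def percent_agreement(first, second):
    """Proportion of respondents giving the same answer to an item on two administrations."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|))."""
    return 1 - erf(abs(z) / sqrt(2))


# Hypothetical: old-vs-old agreement of 0.90 (n = 100), old-vs-revised agreement of 0.80 (n = 100).
z = two_proportion_z(0.90, 100, 0.80, 100)
print(f"z = {z:.2f}, p = {two_sided_p(z):.3f}")
print(percent_agreement([True, True, False, False], [True, False, False, False]))
```

An item would be flagged only if the old-versus-revised agreement fell reliably below (or above) the agreement obtained by simply readministering the old version.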
Relative scale level. There has been some concern expressed in the
popular psychological press (Adler, 1990) about the differences between
MMPI and MMPI-2 in the relative level of the clinical scales. In the MMPI-
2 Manual (Butcher et al., 1989a) data from a psychiatric sample are
presented which indicate that the two-point codetypes from administra-
tion of both the MMPI and the MMPI-2 agree roughly two-thirds of the
time. This figure has been cited as evidence (Adler, 1990) that the MMPI-
2 produces results which are not comparable with those produced by the
MMPI. However, to evaluate whether lack of complete agreement from
MMPI to MMPI-2 stems from differences between the versions, one needs
to compare form comparability data with test-retest data from the
original MMPI.
A study by Ben-Porath and Butcher (1989a) made this type of com-
parison. College students were randomly assigned to either of two
groups: one which completed both the MMPI and the developmental
version of the MMPI-2 in a counterbalanced order, and one which com-
pleted the MMPI twice. First, test-retest correlations on individual MMPI
scales were compared with correlations between MMPI and MMPI-2
scales. Of the 42 comparisons made (21 scales each for females and
males), only two differed at the .01 level of significance. For one of these scales (scale F), the difference was not consistent across genders, and it appears to be accounted for by an unexpectedly high test-retest correlation. The other scale with a
significant difference between test-retest and cross-form correlations is
the scale that lost more items in the revision than any other (Ego
Strength). The authors conclude that this scale alone has changed
substantively with the revision of the MMPI.
Second, comparisons were made for other profile characteristics.
Each of the variables for which comparisons were made between test-
retest agreement and form agreement showed a good deal of similarity.
For example, 54% of the test-retest group had the same high-point scale on both administrations, while 59% of the form agreement group had the same high point. Forty percent of the test-retest group had the same two highest scales, while 33% of the form agreement group did.
Regarding agreement between forms, another issue that merits discussion is codetype "purity." Consider a case in which a two-point codetype is determined merely by selecting the two highest scales, without regard to elevation or to separation from the other clinical scales. If other scales have elevations near those of the scales in the codetype, scale unreliability may well produce a different two-point code when the MMPI-2 is readministered. The same is true when we compare codetypes between the two forms. Data presented by Graham and Ben-Porath (1990) indicate that the level of definition of high-point, two-point, or three-point codes is strongly related to the level of agreement across forms. For example, in the community normative sample, when they defined a codetype merely as the two highest scores among the clinical scales, only 63% of the two-point codetypes agreed between the MMPI and MMPI-2; but when they required a five-point difference between the second and third highest scales to purify the codetypes, agreement of 93% was achieved. In the 7% of the sample with differing codetypes on the MMPI and MMPI-2, they found no significant difference between the accuracy of the MMPI codetypes and that of the MMPI-2 codetypes, using spouse ratings as validity criteria. (A trend, however, was reported for the MMPI-2 codetypes to be more accurate.) Thus, when we consider agreement between the two forms in the context of test-retest data and codetype purity, we must conclude that the MMPI-2 closely resembles the MMPI in terms of relative scale level.
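The effect of requiring a well-defined codetype can be sketched as follows. A two-point code is taken as the two highest clinical scales and is treated as undefined when the second-highest scale does not exceed the third by at least min_separation T-score points. Treating codes as unordered, counting a profile pair only when both profiles yield a defined code, and the profiles themselves are assumptions for illustration; Graham and Ben-Porath's (1990) exact rules may differ.

```python
def two_point_code(profile, min_separation=0):
    """Return the two highest scales as an unordered pair, or None if the code is not defined.

    profile maps scale name -> T-score. The code is treated as undefined when the
    second-highest scale exceeds the third-highest by less than min_separation points.
    """
    ranked = sorted(profile, key=profile.get, reverse=True)
    first, second, third = ranked[0], ranked[1], ranked[2]
    if profile[second] - profile[third] < min_separation:
        return None
    return frozenset((first, second))


def codetype_agreement(profile_pairs, min_separation=0):
    """Agreement rate and number of pairs counted (a pair counts only if both codes are defined)."""
    counted = agreed = 0
    for form_1, form_2 in profile_pairs:
        code_1 = two_point_code(form_1, min_separation)
        code_2 = two_point_code(form_2, min_separation)
        if code_1 is None or code_2 is None:
            continue
        counted += 1
        agreed += code_1 == code_2
    rate = agreed / counted if counted else float("nan")
    return rate, counted


# Hypothetical (MMPI, MMPI-2) profile pairs, with T-scores for a few clinical scales only.
pairs = [
    ({"D": 72, "Pt": 70, "Sc": 60, "Hs": 55}, {"D": 70, "Pt": 71, "Sc": 58, "Hs": 54}),
    ({"Pd": 68, "Ma": 66, "Sc": 65, "D": 50}, {"Pd": 67, "Sc": 66, "Ma": 64, "D": 52}),
]
print(codetype_agreement(pairs))                    # two highest scales, no purity requirement
print(codetype_agreement(pairs, min_separation=5))  # five-point gap required
```

Without a separation requirement, the two hypothetical pairs agree half the time; requiring a five-point gap drops the poorly defined second pair, the same direction of change as the rise from 63% to 93% reported above.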
Absolute scale level. Although users of the MMPI-2 should not expect
changes from the MMPI with regard to relative scale level, there may be
some changes in absolute level, for three possible reasons. First, as
discussed earlier, the introduction of Uniform T-scores provides a scaling
metric which permits the direct comparison of percentile ranks from
scale to scale. As a result of equating the scale distributions, some scales will range slightly lower and some slightly higher than they did on the MMPI. Second, the original MMPI instructions encouraged the use of the "Cannot Say" response for items about which test takers were uncertain. Today, it is standard practice to discourage the use of Cannot Say. As a result, more items on the MMPI-2 clinical scales are answered, and thus more items are endorsed in the keyed direction. Third, as society changes, some items are simply answered differently by test takers today than they were by the normative sample some fifty years ago.
Because of these changes, but primarily because of the change in practice regarding Cannot Say responses, MMPI-2 profiles, normed on a contemporary sample, run lower than MMPI profiles. The "clinically interpretable" MMPI cutoff, a T-score of 70, now corresponds roughly to a T-score of 65 on the MMPI-2 (the 92nd percentile). This new cutoff is also supported by studies indicating that an MMPI-2 T-score of 65 appears to be optimal for separating the normative sample from a depressed sample (Butcher, 1989) and from a chronic pain sample (Keller & Butcher, 1990). For these reasons, the MMPI Restandardization Committee now recommends a T-score of 65 as the new "clinically interpretable" cutoff, and the bold line that ran across MMPI profile sheets now rests comfortably at 65 on the MMPI-2 profile sheet. For practitioners used to interpreting absolute levels of MMPI clinical scale elevations, there will undoubtedly be some adjustment. However, the contemporary norms and Uniform T-scores should eliminate the need for the rather unwieldy (and often faulty) "internal norms" that were at times necessary for users of the original MMPI.
To summarize, early research suggests that the MMPI-2 functions in
a manner very similar to the MMPI both at the item level and at the relative
scale level. In fact, with regard to relative scale standing and codetypes,
the MMPI-2 matches closely enough to be considered an alternate form of
the same test. Absolute differences between the MMPI and MMPI-2 in clinical scale elevations are present, but for practitioners used to interpreting absolute levels of MMPI scales, they can be summarized simply by considering MMPI-2 scale elevations to be approximately five T-score points lower than the corresponding MMPI scale elevations.

MMPI Descriptors
It appears that for most purposes, the MMPI-2 clinical scales function in the same ways that the MMPI clinical scales functioned. It should follow logically, then, that the more direct, bottom-line question of whether scale correlates remain the same is answered in the affirmative. However, as the applicability of norms changes, so does the applicability of MMPI scale and configuration descriptors. Unfortunately, the most commonly cited "validation" studies are over twenty years old. The simple validation study, once the bread-and-butter of MMPI research, has taken a back seat to other pursuits. Clinicians appear to be satisfied with the early decades of empirical research, and time makes these studies less and less current.
To be sure, there are recent validity studies which provide support
for the use of the standard correlates used in MMPI interpretation for
years (as well as MMPI-2 validation studies; Butcher et al., 1989a).
However, as time goes on, the need for additional studies to complement
and update the work done in the past increases. The publication of the MMPI-2 should generate many such new studies. Especially valuable will be those that provide validity descriptors from multiple sources and data from different kinds of samples.
It has long been recognized that the MMPI clinical scales possess
different correlates depending upon the source from which the external
criterion statements are obtained, and the sample utilized in a given
study. For example, one would certainly not expect a psychologist, an inpatient unit nurse, a spouse, a friend, and the client all to agree that the client steals things, is perfectionistic, or has suicidal thoughts (e.g., see McCrae & Costa, 1987). Nor should one be surprised, when conducting a validation study, to find that in a homogeneous sample strong external correlates are few and far between, or that in a diverse sample there is an overabundance of scale correlates.
These are not trivial points, since the MMPI is used in many contexts,
from assessment in inpatient settings to screening in the selection of
graduate students in clinical psychology. It is questionable for a practitio-
ner to apply MMPI scale descriptors obtained in one setting to a test-taker
from an entirely different population. It is important, then, for studies
examining the validity of MMPI-2 scales, to take care to specify sample
characteristics, source of criterion descriptors, and if possible, base
rates of the criterion statements within the samples being studied. This
is necessary not only for standard clinical interpretation, but also for
computerized interpretive systems (discussed below). It is hoped that
the innovative features present in the MMPI-2 will be matched by a new
level of sophistication in its supporting research.

Indicators of Profile Validity


One of the features of the MMPI which has contributed to its popularity
over the years is the group of "validity scales" (scales L, F, and K),
which are scored routinely along with the standard clinical scales in an
attempt to detect "invalid" response sets such as faking, random responding,
or exaggerating. The MMPI-2 adds to this list of validity scales with
the inclusion of VRIN, TRIN, and FB (described above). In addition to
obtaining data bearing on the efficacy of these new scales, it seems
important at this point to reevaluate the ability of the traditional validity
scales to detect response sets. This is needed not only to recommend new
validity cutoffs for the new norms associated with the MMPI-2, but also to
update and complement the research which has been conducted to date.
One of the standard approaches to evaluating indicators of profile
validity, and the approach which is responsible for the bulk of the
evidence for the utility of the validity scales, has been the "instructional
set" paradigm (see Gough, 1947). In this approach, a group of individuals
(usually college students) are asked to take the MMPI or MMPI-2 twice. On
one of the administrations, the test is given according to normal proce-
dures. During the other administration, the subjects are asked to take the
test under some response set (e.g., as if they were feigning mental illness,
or as if they were their ideal selves). Cutoffs optimizing discrimination
between the instructional sets are then obtained, and a table is con-
structed which indicates the hit rate for the optimal cutoffs on the
proposed validity indicator.
In the better examples from this literature, instructional set is coun-
terbalanced, with special care taken to obtain valid profiles during the
normal administration. Also, recommended cutoffs are cross-validated.
Depending upon the scope and purpose of the project, a third group (e.g.,
psychiatric patients) is sometimes used to obtain cutoffs.
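The cutoff-search step of the instructional set paradigm can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure of any study cited here: the function names, the toy raw scores, and the rule that scores at or above the cutoff are classified as "faked" are all hypothetical. The cutoff is chosen to maximize the hit rate in a derivation sample and is then re-scored in an independent cross-validation sample.

```python
from typing import Sequence


def hit_rate(cutoff: float, standard: Sequence[float], faked: Sequence[float]) -> float:
    """Proportion correctly classified when scores >= cutoff are called "faked"
    and scores below the cutoff are called "standard"."""
    correct = sum(s < cutoff for s in standard) + sum(f >= cutoff for f in faked)
    return correct / (len(standard) + len(faked))


def optimal_cutoff(standard: Sequence[float], faked: Sequence[float]) -> float:
    """Try every observed score as a cutoff and keep the highest hit rate
    (ties are broken in favor of the lower cutoff)."""
    candidates = sorted(set(standard) | set(faked))
    return max(candidates, key=lambda c: (hit_rate(c, standard, faked), -c))


# Hypothetical raw scores on a candidate validity indicator.
derivation_standard = [2, 3, 4, 5, 6, 8]
derivation_faked = [10, 12, 15, 18, 20, 25]
holdout_standard = [3, 5, 7, 11]
holdout_faked = [9, 14, 16, 22]

cut = optimal_cutoff(derivation_standard, derivation_faked)
print(cut, hit_rate(cut, derivation_standard, derivation_faked))  # 10 1.0
print(hit_rate(cut, holdout_standard, holdout_faked))             # 0.75
```

In this toy case, perfect classification in the derivation sample falls to .75 in the holdout sample, the kind of shrinkage that makes cross-validation of recommended cutoffs important.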
Despite the cleverness of this approach, however, there are some
inherent difficulties and assumptions which limit its utility for evaluating
potential indicators of profile validity. First, the primary assumption of
this design is the operationalization of profile validity: one group (e.g.,
psychiatric patients) is assumed to produce (perfectly) valid profiles,
while the second group (e.g., faking-bad normals) is assumed to produce
(completely) invalid profiles. The further these operational definitions
deviate from their theoretical validities (i.e., 1.0 and .0, respectively),
the more doubtful the results become, and certainly the more we must
question the precision of any cutoffs.
Second, granting the assumption above, it is difficult to conceive of a
control group analogous to a psychiatric group which would allow us to
discriminate those who are faking good from those who are truly good.
Although it may be possible to identify a group of ascetics who have
renounced many human frailties, recent events involving high-profile
religious leaders underscore the difficulty of such a task.
Finally, and perhaps most importantly, the instructional set approach
is remarkably indirect, which is ironic, since indirectness is a
characteristic very rarely associated with MMPI research. "Profile
validity" is a concept which has meaning only insofar as external criteria
are concerned. That is, a test protocol should be considered potentially
invalid only if it can be demonstrated that protocols with similar
features do not predict relevant external criteria well. Any
operationalization of profile validity, such as instructional set, that
strays from a direct examination of external correlates is bound to lose
information at the least, and to yield misleading results at the worst. In
fact, since the central question concerns the degree to which scores on
proposed indicators of profile validity suggest differential validity of
the clinical scales, there is no need to operationalize clinical scale
validity in this manner: it can be measured quite simply by the
correlations between MMPI scales and well-established criteria in the
form of normal-range personality correlates or psychiatric symptoms. A
scale or indicator that affects or moderates the relationship between an
MMPI scale and its associated criteria is, by definition, an indicator of
validity. It is unnecessary to employ an "invalid" instructional set to
demonstrate the efficacy of a validity indicator.
An approach involving multiple regression first described by Saunders
(1956; see also Cronbach, 1987; Lubinski & Humphreys, 1990; and Weed
& Han, 1991) may be useful in addressing the utility of indicators of profile
validity while carrying fewer procedural assumptions. Using a multiple
regression approach, one first needs to select appropriate and
well-validated MMPI scale correlates as criterion variables. The first step
in the procedure is to enter the appropriate MMPI-2 scale as a predictor
variable in the regression equation. The resulting r² will indicate the
zero-order validity of that particular MMPI-2 scale when a particular
correlate or set of correlates is used as the criterion. The second step is
to add the proposed validity indicator as a second linear predictor. The
resulting R², when compared to the value obtained at the first step, will
indicate the increment due to the linear contribution of the validity
indicator (including any suppressor effects). The final step in the
procedure is to add a third variable to the equation which represents the
nonlinear interaction (moderator variable) effect of predictor by validity
indicator. It is computed by simply multiplying the validity scale score
by the predictor (MMPI) score (Cohen, 1978). The resulting incremental
change in R² will indicate precisely how well the validity indicator
functions as a moderator of scale validity.
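The three-step procedure can be sketched with ordinary least squares. The sketch below is a minimal illustration, assuming NumPy, a single continuous criterion, and simulated scores (no MMPI-2 data are used); the function names are hypothetical. Each step adds one predictor, and the increments in R² correspond to the linear contribution of the validity indicator and to its moderator (product-term) effect.

```python
import numpy as np


def r_squared(predictors: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an OLS regression of y on the predictor columns plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - float(residuals @ residuals) / float(((y - y.mean()) ** 2).sum())


def moderated_regression(scale, validity, criterion) -> dict:
    """Hierarchical steps: scale; + validity indicator; + scale x validity product."""
    r2_step1 = r_squared(scale[:, None], criterion)
    r2_step2 = r_squared(np.column_stack([scale, validity]), criterion)
    r2_step3 = r_squared(np.column_stack([scale, validity, scale * validity]), criterion)
    return {
        "r2_scale": r2_step1,                    # zero-order validity of the scale
        "delta_linear": r2_step2 - r2_step1,     # linear (incl. suppressor) contribution
        "delta_moderator": r2_step3 - r2_step2,  # moderator (interaction) effect
    }


# Simulated example: the scale predicts the criterion less well as the
# (hypothetical) validity indicator rises.
rng = np.random.default_rng(0)
n = 500
scale = rng.normal(size=n)
validity = rng.normal(size=n)
criterion = scale * (1.0 - 0.5 * validity) + rng.normal(size=n)
result = moderated_regression(scale, validity, criterion)
print(result)
```

Because the models are nested, neither increment can be negative; a sizable `delta_moderator` is the signature of an indicator that moderates scale validity.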
It should be noted that by employing this procedure with several
samples, it is possible to estimate the differential efficacy of validity
indicators across varying samples. It has long been suggested, for
example, that scale K is more effective as a validity indicator in higher-SES
samples, while L is more effective in low-SES samples. With a large enough
sample, a simple median split on SES using this methodology could prove
very informative for this type of question.
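The median-split comparison can be sketched the same way: compute the moderator increment in R² separately in the lower and upper halves of the SES distribution. This is a hypothetical illustration with simulated data in which the indicator moderates validity only in the upper half; it assumes NumPy, and the function names are invented for the example.

```python
import numpy as np


def moderator_increment(scale, validity, criterion) -> float:
    """Increase in R^2 when the scale x validity product is added to a model
    that already contains the scale and the validity indicator."""
    def r2(*columns):
        design = np.column_stack([np.ones(len(criterion)), *columns])
        coef, *_ = np.linalg.lstsq(design, criterion, rcond=None)
        resid = criterion - design @ coef
        return 1.0 - float(resid @ resid) / float(((criterion - criterion.mean()) ** 2).sum())
    return r2(scale, validity, scale * validity) - r2(scale, validity)


def median_split(ses, scale, validity, criterion) -> dict:
    """Moderator increment below versus at-or-above the SES median."""
    high = ses >= np.median(ses)
    low = ~high
    return {
        "low_ses": moderator_increment(scale[low], validity[low], criterion[low]),
        "high_ses": moderator_increment(scale[high], validity[high], criterion[high]),
    }


# Simulated data: the indicator moderates validity only in the high-SES half.
rng = np.random.default_rng(1)
n = 1000
ses = rng.normal(size=n)
scale = rng.normal(size=n)
validity = rng.normal(size=n)
slope_shift = np.where(ses >= np.median(ses), -0.5, 0.0)
criterion = scale * (1.0 + slope_shift * validity) + rng.normal(size=n)
print(median_split(ses, scale, validity, criterion))
```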
The issue of the efficacy of validity scale configurations is somewhat
more complex, but every bit as important, since many clinicians evaluate
profile validity by comparing several validity scales with one another
(e.g., LFK "checkmark" fake-good patterns). These configurations may
also be evaluated using multiple regression as described above. The
difficulty in these cases is in operationalizing the configuration so it can
be entered into the regression equation as a predictor. Classification
rules must be constructed, either for dichotomizing into the presence or
absence of a particular validity sign or for assessing the degree of fit to
some prototypical validity configuration. Once this has been accom-
plished, however, evaluation of the hypothesized validity configuration
may proceed as with the single variable predictor.

Item Subtlety
In the early years following the development of the MMPI by McKinley
and Hathaway, there was a great deal of enthusiasm among psychologists
about the empirical method of test construction. Test takers' responses
to MMPI items were treated as behavioral units rather than as self-report
statements to be interpreted at face value (Meehl, 1945). The overuse of
face-valid items was actually discouraged because: 1) face validity itself
was not viewed as being as important as the empirical relationship between
an item and some behavioral criterion; 2) face-valid items were considered
to be more susceptible to faking; and 3) it was hoped that serendipitous
findings involving seemingly "neutral" (or even counterintuitive) items
might lead to a greater understanding of psychopathology.
Out of this tradition emerged the so-called "subtle" scales, the most
popular and most researched of which are the Wiener and Harmon Subtle
and Obvious scales (Wiener, 1948). These scales were derived by dividing
the items on the MMPI clinical scales into two groups: one made up of
items which appear on the surface to be related to psychological distur-
bance (obvious), and one made up of items for which the relationship to
disturbance is not clear (subtle). Of the clinical scales, five were success-
fully partitioned in this manner (D, Hy, Pd, Pa, Ma) and scoring keys were
developed. Clinicians who employ the Wiener and Harmon subtle scales
in practice have used them in two ways: 1) as clinical scales which are less
susceptible to faking; or 2) in conjunction with the obvious scales to
assess profile validity.
The first use of the subtle scales (i.e., as "unfakeable" indicators of
psychopathology) is not well supported in the literature. Despite their
popular use, the bulk of the evidence fails to show that the Wiener and
Harmon subtle items contribute much validity to the MMPI clinical scales
over and above the contribution of their "obvious" counterparts. A recent
study by Weed, Butcher, and Ben-Porath (1990), using two large samples
of MMPI-2 protocols, found that the Wiener and Harmon subtle scales by
themselves correlate very little with prototypical external MMPI-2
correlates (mean correlation for both samples was .01). When combined
with the obvious items to produce full scale scores, they actually
attenuated the validity of the obvious scales to the same degree that a
random variable would. This finding would certainly contraindicate their
use as alternate versions of the MMPI-2 clinical scales. (In fact, it may even
suggest that they be used as suppressor variables and subtracted out of
the full scale scores to obtain purer predictors of external criteria!)
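The attenuation described above is what simple algebra predicts if the subtle items behave like noise. As an illustrative assumption (not a model fitted in the study cited), write a full scale score as T = O + S, where the obvious component O correlates r_OY with a criterion Y and the subtle component S is uncorrelated with both O and Y:

```latex
\begin{aligned}
\operatorname{Cov}(T, Y) &= \operatorname{Cov}(O, Y) + \operatorname{Cov}(S, Y)
                          = \operatorname{Cov}(O, Y) = r_{OY}\,\sigma_O\,\sigma_Y, \\
\sigma_T^2 &= \sigma_O^2 + \sigma_S^2, \\
r_{TY} &= \frac{\operatorname{Cov}(T, Y)}{\sigma_T\,\sigma_Y}
        = r_{OY}\,\frac{\sigma_O}{\sqrt{\sigma_O^2 + \sigma_S^2}} .
\end{aligned}
```

Because the fraction is below 1 whenever the subtle component has any variance, |r_TY| < |r_OY|; with equal component variances the full scale retains only about 1/√2 ≈ .71 of the obvious scale's validity.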
The second proposed use of the Wiener and Harmon subtle scales is
in conjunction with the obvious scales as indicators of profile validity
(Greene, 1980). It is thought that if the obvious scales are elevated far
above the subtle scales, it is an indication that the test taker is overreporting
psychopathology. If the converse is true, it is taken as an indication of
underreporting psychopathology. Of course, this proposed use of the
subtle scales rests on the assumption that the subtle scales are as
valid as (and perhaps more valid than) the obvious scales at predicting
psychopathology, an assumption which, as stated above, the bulk of the
research evidence fails to confirm. Nevertheless, this use was also
examined empirically by Weed, Butcher, and Ben-Porath (1990). They
found that the most valid test protocols were not those with little
disparity between subtle and obvious scale scores, but those in which the
obvious scale scores exceeded the subtle scale scores. This, again, is the
validity pattern one would expect by comparing scores on a random
variable to those of a valid predictor.
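For concreteness, the comparison of obvious and subtle elevations can be summarized as a single summed difference score. The sketch below computes such an index over the five Wiener and Harmon scale pairs; the cutoffs are hypothetical placeholders rather than published values, and, given the findings just described, the sketch shows how such an index is formed, not evidence that it works.

```python
SUBTLE_OBVIOUS_SCALES = ("D", "Hy", "Pd", "Pa", "Ma")  # the five partitioned scales


def obvious_minus_subtle(obvious_t: dict, subtle_t: dict) -> float:
    """Sum of obvious-minus-subtle T-score differences over the five scales."""
    return sum(obvious_t[s] - subtle_t[s] for s in SUBTLE_OBVIOUS_SCALES)


def flag(total: float, over_cutoff: float = 100.0, under_cutoff: float = -50.0) -> str:
    """Classify the summed difference; both cutoffs are hypothetical."""
    if total >= over_cutoff:
        return "possible overreporting"
    if total <= under_cutoff:
        return "possible underreporting"
    return "no flag"


obvious = {s: 70.0 for s in SUBTLE_OBVIOUS_SCALES}
subtle = {s: 50.0 for s in SUBTLE_OBVIOUS_SCALES}
print(flag(obvious_minus_subtle(obvious, subtle)))  # possible overreporting
```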
There is still much to be learned about subtle scales. For example,
there may be differences in how these scales function in different popu-
lations, or it may be that certain of the subtle scales are more effective
than others. Also, there have been different formulations of the concept
of item subtlety which may bear on the effectiveness of MMPI-2 items
(e.g., Holden & Jackson, 1979; Christian, Burkhart, & Gynther, 1978).
However, regarding the clinical application of subtle scales, the challenge
to demonstrate their value remains.

New Scale Development


It is likely that a great deal of research attention will be paid in the next
decade to the development of new scales for the MMPI-2. This, along with
the improvement of the original scales, is warranted and needed because the
item pool has changed. Careful and thoughtful scale construction can
result in scales that complement assessment with the traditional clinical
scales and the new content scales. Below are some guidelines an
investigator may wish to follow (at a minimum) when constructing MMPI-2
scales. Likewise, a test user may want to consider these issues before
implementing new scales along with or in place of established and
well-understood scales.
First, unless there are compelling research reasons, a new scale
should be developed or used only if it could potentially perform a task
better or more efficiently than other methods. For example, it is a waste
of time to develop a scale which predicts demographic group member-
ship (e.g., age, sex, or race), if there are more direct (and presumably,
more accurate) ways of obtaining the same information.
Second, once a scale has been constructed, validity information
should be presented from as many sources as possible (e.g., other self-
report measures, peer ratings, counselor ratings, behavioral ratings,
etc.). This is especially important for scales which are assembled rationally
or inductively, because it is tempting, but fallacious, to let face validity
or factor naming substitute for external correlates. Correlations
should also be obtained between the new scale and other scales with
similar construct explication or clinical purpose (especially other MMPI-
2 scales!) in order to address convergent and discriminant validity.
Third, validity evidence should be cross-validated. This is critical for
empirically constructed scales which may not have well-understood
internal structures. In most cases, evidence from developmental samples
should be discounted completely as biased due to sample-specific char-
acteristics. It can be argued that the larger the developmental sample, the
less the need for cross-validation. Even with huge samples, however,
evidence of validity is most compelling when found in two samples from
even slightly different populations of test-takers.
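Why developmental-sample evidence should be discounted can be illustrated with a small simulation. The sketch below assumes NumPy and uses pure noise (random true/false items and a random criterion), so any validity found is sample-specific by construction; the function names and sample sizes are hypothetical. Keying the items that happen to correlate best with the criterion in one sample yields a scale that looks valid there and typically shows near-zero validity in a second sample.

```python
import numpy as np


def key_items(items: np.ndarray, criterion: np.ndarray, n_keyed: int) -> np.ndarray:
    """Empirical keying: weight +1/-1 the n_keyed items whose item-criterion
    correlations are largest in absolute value; all other items get 0."""
    r = np.array([np.corrcoef(items[:, j], criterion)[0, 1] for j in range(items.shape[1])])
    chosen = np.argsort(-np.abs(r))[:n_keyed]
    weights = np.zeros(items.shape[1])
    # A -1 weight keys the "false" answer; versus scoring it as 1 - x, this only
    # shifts the total by a constant, which leaves correlations unchanged.
    weights[chosen] = np.sign(r[chosen])
    return weights


def scale_validity(items: np.ndarray, criterion: np.ndarray, weights: np.ndarray) -> float:
    """Correlation between the keyed scale score and the criterion."""
    return float(np.corrcoef(items @ weights, criterion)[0, 1])


# Pure noise: 300 true/false items unrelated to a random criterion.
rng = np.random.default_rng(42)
dev_items = rng.integers(0, 2, size=(200, 300)).astype(float)
dev_crit = rng.normal(size=200)
new_items = rng.integers(0, 2, size=(200, 300)).astype(float)
new_crit = rng.normal(size=200)

w = key_items(dev_items, dev_crit, n_keyed=15)
print("developmental:", round(scale_validity(dev_items, dev_crit, w), 2))
print("cross-validation:", round(scale_validity(new_items, new_crit, w), 2))
```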
Fourth, an examination of the internal characteristics of a scale is
desirable, no matter how empirical the predictive task. A simple item-
level principal components analysis applied to a useful empirical,
multidimensional scale may help in understanding the phenomenon
being predicted.
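An item-level principal components analysis can be sketched in a few lines of NumPy. This is an illustration under assumptions: items are scored 0/1, components are extracted from the ordinary (phi) correlation matrix without rotation, and the simulated two-cluster item set stands in for a multidimensional empirical scale.

```python
import numpy as np


def item_components(items: np.ndarray, n_components: int = 2):
    """Eigenvalues and unrotated loadings of the item correlation matrix,
    largest component first."""
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return eigvals[order], loadings


# Simulated 10-item scale: items 0-4 reflect one latent trait, items 5-9 another.
rng = np.random.default_rng(7)
n = 1000
trait_a, trait_b = rng.normal(size=n), rng.normal(size=n)
noise = rng.normal(scale=0.8, size=(n, 10))
latent = np.column_stack([trait_a] * 5 + [trait_b] * 5) + noise
items = (latent > 0).astype(float)

eigenvalues, loadings = item_components(items, n_components=3)
print(np.round(eigenvalues, 2))
```

Two eigenvalues well above 1 followed by a small third one would suggest that the "single" scale is really predicting through two distinct item clusters.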
Fifth, the utility of new scales should be explored, whether a scale was
devised for research or clinical purposes. As a beginning, an investigator
should at least speculate on how and in what situations the new scale
might be used. Next, the scale developer should actually implement
usage of the scale and report on its success. In many instances, this latter
step can also serve as further validation of the new scale.
Finally, but not trivially, the scale should be given an appropriate
name. "Catchy" or "gimmicky" names may be entertaining, but will almost
certainly obscure the purpose of the scale, especially to test users who
might only see the scale briefly mentioned within a larger context. As a
rule of thumb, empirical scales should be named according to their
function, and inductive or rational scales according to their content.
Although introduction of new scales will ideally address all of these issues
and more, these guidelines may serve as a beginning for investigators
interested in scale development based on the MMPI-2 item pool.

Computer Applications of the MMPI-2


Practitioners have been quick to apply the technology of the elec-
tronic computer to psychological test administration and interpretation.
Potential advantages have, at times, seemed attractive enough for clini-
cians to allow the use of computer developments to outstrip the pace of
research supporting the use of these developments. So far, different
applications of computers in assessment have received widely varying
amounts of research attention. For example, a good deal of published
research tests the equivalence of computer administration of the MMPI
to the standard booklet form. Many studies find form
equivalence, but there are some notable exceptions (e.g., Bishkin &
Kolotkin, 1977; see Moreland, 1987 for a review). Two examples of
underresearched topics which stand to gain quite a bit of research
attention in the coming years are the use of computerized MMPI-2
interpretive reports and the possibilities for computer adaptive testing.

Computerized Interpretive Reports


Many see the widespread use of computers in interpretation of
psychological tests as a giant step towards improving the quality of
psychological assessment. Besides the low cost and the rapid turnaround
possible, the use of the computer may help to minimize the impact of
many problems, fallacies, and frailties which are inevitable when humans
attempt to interpret (let alone gather) psychological data (e.g., see Meehl,
1954; Chapman & Chapman, 1969). However, using computer test inter-
pretation is not without its own difficulties. The potential for misuse and
the problem of ensuring the validity of interpretive reports perhaps rank
highest among these difficulties (Butcher, 1987).
The potential for misuse of computerized interpretive systems is in
some ways greater than that of clinical interpretation. When the "magic
squiggles" that only the trained clinician knows how to interpret become
instantly transformed into words, often terrible words which everyone
can understand, the danger is very great that a nonprofessional, a
professional untrained in psychological test interpretation, or even a test
taker might read it with an uncritical eye, with potentially unfortunate
consequences. Even when used by trained professionals, however, there
can be difficulties. Though a practitioner might be well aware that the
correlation between some criterion and an indicator on the MMPI-2 is
somewhat less than .9, the clinician may forget this important information
in the context of an interpretive report. Practitioners need, therefore, to
guard against the tendency to read computer reports uncritically. Writers
of the interpretive systems should use cautious phrasing, to remind
consumers of what they should know about the meaning of any given
scale or profile (Butcher, 1987).
Determining the validity of a computerized interpretive system can
be extremely challenging for the consumer. Especially for those familiar
with the MMPI or MMPI-2, the phenomenon of illusory correlation
(Chapman & Chapman, 1969) is likely to make a clinician see more
interpretive "hits" than misses. Compounding the situation is the possi-
bility of Barnum statements fooling the consumer. It is partly for these
reasons that research investigating the validity of interpretive systems
based entirely on user ratings is quite unsatisfactory. Evaluative research
of this type should include "interpretive reports" which control for the
presence of the Barnum effect, and actuarial tables with objective criteria
should replace subjective ratings to circumvent illusory correlations.
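The point about Barnum statements and objective criteria can be made concrete with a simple actuarial check: compare a report statement's accuracy against the accuracy obtained by always predicting the more common outcome. The sketch below is a hypothetical illustration with toy data, not the method used in the studies cited; it shows how a Barnum statement can look 90% accurate while adding nothing over the base rate.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Proportion of cases where the report statement matched the criterion."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def gain_over_base_rate(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Accuracy minus the accuracy of always predicting the more common outcome."""
    base_rate = sum(actual) / len(actual)
    return accuracy(predicted, actual) - max(base_rate, 1.0 - base_rate)


# Toy criterion data for ten clients (True = criterion present).
barnum_actual = [True] * 9 + [False]          # true of nearly everyone
barnum_pred = [True] * 10                     # statement printed for every client
specific_actual = [True] * 4 + [False] * 6
specific_pred = [True] * 4 + [False] * 5 + [True]

print(accuracy(barnum_pred, barnum_actual), gain_over_base_rate(barnum_pred, barnum_actual))  # 0.9 0.0
print(accuracy(specific_pred, specific_actual),
      round(gain_over_base_rate(specific_pred, specific_actual), 2))  # 0.9 0.3
```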
An ambitious series of studies by Eyde, Kowal, & Fishburne (1986,
1987; Fishburne, Eyde, & Kowal, 1988) was conducted in an effort to
compare the relative merits of some of the most popular commercially
available MMPI interpretive reports. With multiple raters using standard-
ized case histories as criteria, they have examined the relative accuracy
of these different reports for different types of assessment questions,
client characteristics (e.g., race), and specific MMPI patterns. This kind
of study, in combination with studies which tease out the effect of Barnum
statements, should prove to be extremely useful in aiding both the
development of reports and the selection of products by the consumer.
Since it is unlikely that the ordinary practitioner will have the re-
sources and inclination to conduct studies of this scope, realistically, the
lion's share of the burden of validation must fall on those who write and
market the interpretive systems. Basing one's interpretive system on
results from published research goes a long way towards ensuring
validity. However, the amount and kind of validation research available
places limits on this approach. It seems, then, that in the coming years
both MMPI-2 validation studies and interpretive report evaluation
studies will be necessary to ensure the validity of these systems and may
help the consumer differentiate among the many computer systems
available.

Adaptive Testing

An area which has begun to receive some research attention in recent
years is the application of computerized adaptive testing methods to the
MMPI. Adaptive or "tailored" testing involves efforts to fit the test
administration to the individual by administering only those items which
are appropriate to a given individual and contribute information relevant
to answering the assessment question. The application of IRT (Item
Response Theory) to test construction and administration is one such
strategy (see Weiss & Vale, 1987). However, since IRT requires the
assumption of unidimensionality for its application, the MMPI and MMPI-2
clinical scales are not appropriate for this approach. Another adaptive
technique, known as the "countdown method," appears to be more
appropriate for use with the MMPI and MMPI-2 clinical scales and has shown
considerable promise in reducing the number of items and time required
in their administration.
First described by Butcher, Keller, & Bacon (1985), the countdown
method determines whether an individual has exceeded a predetermined
cutoff on a scale without administering all items. For example, on the
MMPI-2, a cutoff on scale 2 might be a raw score of 28, which corresponds
to a T-score of 65. Since there are 57 items on the scale, items would be
administered by computer until 28 are endorsed in the keyed direction
(the individual has reached the cutoff) or until 30 are endorsed in the
nonkeyed direction (with at most 57 - 30 = 27 keyed responses still
possible, the individual cannot reach the cutoff).
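The stopping rule can be written out directly. The sketch below is a minimal illustration, not the implementation used in the studies cited; it assumes responses arrive one at a time and are already coded as keyed (True) or nonkeyed (False), so a "false" answer to a false-keyed item counts as keyed. With the scale 2 example (57 items, raw cutoff 28), testing stops after 28 keyed or 30 nonkeyed responses.

```python
from typing import Iterable, Tuple


def countdown(responses: Iterable[bool], n_items: int, cutoff: int) -> Tuple[bool, int]:
    """Count keyed and nonkeyed responses until classification is certain.
    Returns (reached_cutoff, items_administered)."""
    max_nonkeyed = n_items - cutoff  # one more than this makes the cutoff unreachable
    keyed = nonkeyed = administered = 0
    for is_keyed in responses:
        administered += 1
        if is_keyed:
            keyed += 1
        else:
            nonkeyed += 1
        if keyed >= cutoff:
            return True, administered
        if nonkeyed > max_nonkeyed:
            return False, administered
    raise ValueError("response sequence ended before classification was reached")


print(countdown([True] * 57, n_items=57, cutoff=28))   # (True, 28)
print(countdown([False] * 57, n_items=57, cutoff=28))  # (False, 30)
```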
This adaptive method was tested by Ben-Porath, Slutske, & Butcher
(1989) in a preliminary examination of the potential benefits of the
countdown method with the MMPI in a real-data simulation study. That
is, items were not actually administered by computer, but computer
administration was simulated by selecting responses from pencil-and-
paper answer sheets which had already been completed. Two
modifications were made to the method as originally described. First, in
an attempt to maximize item savings, they "administered" the items in
several orders, including from least to most frequently endorsed in the
normative sample and from most to least frequently endorsed. It was
thought that in normal samples, the former order would maximize
savings by allowing the cutoff criteria to be "unmet" more quickly. In
samples with much psychopathology, the latter order might allow the
criteria to be met sooner.
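The real-data simulation idea, replaying already completed answer sheets under different item orders, can be sketched as follows. The item ids, endorsement rates, six-item scale, and cutoff are toy values invented for illustration; the sketch shows why least-frequent-first ordering can end testing sooner for a low-scoring protocol and most-frequent-first ordering sooner for an elevated one.

```python
from typing import Dict, List, Sequence


def order_items(endorse_rate: Dict[int, float], least_first: bool = True) -> List[int]:
    """Item ids sorted by their keyed-endorsement rate in a normative sample."""
    return sorted(endorse_rate, key=endorse_rate.get, reverse=not least_first)


def items_needed(answers: Dict[int, bool], order: Sequence[int], cutoff: int) -> int:
    """Replay a completed answer sheet (True = keyed response) in the given order
    and count the items the countdown method would have administered."""
    keyed = nonkeyed = 0
    for count, item in enumerate(order, start=1):
        if answers[item]:
            keyed += 1
        else:
            nonkeyed += 1
        if keyed >= cutoff or nonkeyed > len(order) - cutoff:
            return count
    return len(order)  # not reached: after the last item one condition must hold


rates = {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.9, 5: 0.3, 6: 0.7}
low_scorer = {1: False, 2: False, 3: False, 4: True, 5: False, 6: False}
elevated = {1: False, 2: True, 3: False, 4: True, 5: True, 6: True}

for least_first in (True, False):
    order = order_items(rates, least_first)
    print(least_first, items_needed(low_scorer, order, 3), items_needed(elevated, order, 3))
# True 4 5
# False 5 3
```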
In all four samples used (two personnel selection samples, a psychi-
atric sample, and a chemical dependency sample), regardless of item
order, and whether or not full scores were obtained on elevated scales,
they found considerable item savings (up to 38%) when compared to the
number of items required in the standard pencil-and-paper administration.
They concluded that the results they obtained warranted further
study of the countdown method, preferably using actual computer ad-
ministration, rather than simulated administration. They felt, however,
that recommendation of this procedure for practice is premature since
their data do not bear on the issue of test equivalence across administra-
tion conditions. To warrant this kind of recommendation, they reasoned,
would require evidence not only that computer administration yields
results comparable to the standard administration (see section above),
but also that the reordering of the items produces no substantive changes
in MMPI results.
A recent study by Slutske, Ben-Porath, Roper, Nguyen, and Butcher
(1990) was conducted to address these questions using computer adap-
tive administration of the MMPI-2 to college students. Subjects were given
both the booklet version of the MMPI-2 and the computer adaptive
version (with items ordered from least frequently endorsed to most
frequently endorsed in the MMPI-2 normative sample) in a counterbal-
anced design. Following the (shortened) adaptive administration, the
remainder of the items not needed for classification were administered by
computer to obtain full scale scores which could be compared to the
analogous scores from the booklet form.
Results comparing the standard booklet form and the reordered
computerized version of the MMPI-2 suggest rather strongly that the
forms of administration are comparable. First, the mean profiles for the
two versions are very similar. Of the thirteen basic scales and fifteen
content scales for both men and women, approximately two-thirds of the
mean scale scores from one version were within one T-score point of the
mean of the same scale from the other version. Second, for these 28 MMPI-2
scales, the correlations between the two versions administered
compare quite favorably with the test-retest correlations of the same
scales using the booklet form. Third, the item endorsement differences
between forms of administration are not dramatic and do not suggest any
obvious pattern of response set.
Having addressed the issue of form comparability, Slutske et al.
(1990) then reported results of item and time savings. Of the 498 MMPI-
2 items required to obtain full scale scores for the 28 basic scales, a mean
of only 357 items were administered to achieve perfect classification at a
T-score of 65 according to the countdown method, a savings of 28%. A
mean of 33 minutes was required for administration of the adaptive
version, a reduction of about 36% from the mean of 52 minutes needed for
the administration of all 498 items. Although these results may be directly
relevant only to assessment questions in which a simple classification is
appropriate, they underscore the potential applicability of this method to
situations where time and client attention are at a premium. A study by the
same research group is underway to test both the comparability of this item
reordering with the booklet form and the efficiency of the countdown
method with a psychiatric inpatient population.
As mentioned earlier, the assumption of scale unidimensionality
required for the application of IRT makes the MMPI-2 clinical scales
inappropriate for this type of adaptive testing. The MMPI-2 content
scales, however, were constructed with scale homogeneity in mind.
Preliminary research by Ben-Porath, Waller, Slutske, and Butcher (1988)
using real data simulation with responses from a psychiatric sample
indicates that these new content scales do not violate the assumptions of
IRT. They found that by the application of IRT methods, considerable item
savings are possible, without losing much information. With the DEP
(Depression content scale), for example, a mean of 16 of the 33 items on
the scale were required to obtain scores which correlate .99 with the full
scale score. In fact, the flexibility of IRT test construction allows for the
level of error to be preset by the examiner. Presumably, clinical applica-
tion of IRT with the MMPI-2 content scales could allow practitioners to
vary the precision of the measurement according to the needs of the
particular assessment situation, allowing more error when only a general
idea of relative standing is necessary but time is at a premium, and
allowing very little measurement error when precision of measurement is
essential.
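A fixed-precision adaptive administration of the kind described can be sketched with a two-parameter logistic (2PL) model, expected a posteriori (EAP) scoring on a grid, and maximum-information item selection. These are common textbook choices assumed here for illustration, not necessarily the methods of Ben-Porath et al. (1988), and the item parameters are hypothetical rather than calibrated DEP values. The examiner's preset error level is the `target_se` argument: testing stops once the posterior standard deviation falls to that value or the items run out.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)  # quadrature points for the trait (theta)
PRIOR = np.exp(-0.5 * GRID ** 2)    # standard normal prior, unnormalized


def p_keyed(theta, a, b):
    """2PL probability of a response in the keyed direction."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_keyed(theta, a, b)
    return a ** 2 * p * (1.0 - p)


def eap(responses, a, b):
    """Posterior mean and SD of theta given (item, keyed?) pairs."""
    post = PRIOR.copy()
    for j, keyed in responses:
        p = p_keyed(GRID, a[j], b[j])
        post *= p if keyed else 1.0 - p
    post /= post.sum()
    mean = float((GRID * post).sum())
    sd = float(np.sqrt(((GRID - mean) ** 2 * post).sum()))
    return mean, sd


def adaptive_test(answer, a, b, target_se):
    """Give the most informative remaining item until the posterior SD is at
    most target_se. answer(j) returns True for a keyed response to item j."""
    remaining = set(range(len(a)))
    responses = []
    theta, se = eap(responses, a, b)
    while remaining and se > target_se:
        j = max(remaining, key=lambda k: information(theta, a[k], b[k]))
        remaining.remove(j)
        responses.append((j, answer(j)))
        theta, se = eap(responses, a, b)
    return theta, se, len(responses)


# Hypothetical 33-item scale and a deterministic simulated respondent.
a = np.full(33, 1.5)
b = np.linspace(-2.0, 2.0, 33)


def respondent(j):
    return bool(b[j] < 0.5)


for target in (0.5, 0.3, 0.0):
    print(target, adaptive_test(respondent, a, b, target))
```

Lowering `target_se` buys precision at the cost of more items, which is the trade-off between time and measurement error described in the text.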

Summary
The revision of the MMPI has been compared to the renovation of an
old historic house. Improvements are made to ensure that it is structurally
sound and safe to use. Modern conveniences, taking
advantage of technologies not available at the time of construction, are
added for the comfort and luxury of the user. But foremost is the
preservation of the aesthetics, function, and character of the old building.
The research to date on the MMPI-2 suggests that the conservative
nature of the test revision has been successful in maintaining interpretive
continuity. Research examining the comparability of MMPI and MMPI-2
will undoubtedly continue and add to the half-century research base
documenting and supporting their use. New features, such as contempo-
rary norms, uniform T-scores, new indicators of profile validity, and the
MMPI-2 content scales should prove useful for practitioners and generate
interest among researchers. Finally, research programs which were
initiated using the MMPI, such as understanding the role of subtle items,
evaluating the efficacy of validity indicators, and examining the utility of
computer adaptive testing, will continue with the MMPI-2. Time and
empirical research will reveal which parts of the renovation become
little-used and which become as useful and popular as the original structure.

References
Adler, T. (1990, April). Does the 'new' MMPI beat the 'classic'? APA Monitor,
pp. 18-19.
Ben-Porath, Y. S. (1990, August). MMPI-2 Items. MMPI-2 News and Profiles, pp. 4-5.
Ben-Porath, Y. S., & Butcher, J. N. (1989a). The comparability of MMPI and MMPI-
2 scales and profiles. Psychological Assessment: A Journal of Consulting and
Clinical Psychology, 1, 345-347.
Ben-Porath, Y. S., & Butcher, J. N. (1989b). Psychometric stability of rewritten
MMPI items. Journal of Personality Assessment, 53, 645-653.
Ben-Porath, Y. S., Slutske, W. S., & Butcher, J. N. (1989). A real-data simulation of
computerized adaptive administration of the MMPI. Psychological Assessment: A
Journal of Consulting and Clinical Psychology, 1, 18-22.
Ben-Porath, Y. S., Waller, N. G., Slutske, W. S., & Butcher, J. N. (1988). A comparison
of two methods for adaptive administration of MMPI-2 content scales. Paper
presented at the 96th Annual Meeting of the American Psychological Association,
Atlanta.
Bishkin, B. H., & Kolotkin, R. C. (1977). Effects of computerized administration on
scores on the Minnesota Multiphasic Personality Inventory. Applied
Psychological Measurement, 1, 543-549.
Burisch, M. (1984). Approaches to personality inventory construction. American
Psychologist, 39, 214-227.
Butcher, J. N. (1987). The use of computers in psychological assessment: An
overview of practices and issues. In J. N. Butcher (Ed.), Computerized
psychological assessment (pp. 3-14). New York: Basic.
Butcher, J. N. (1989, August). MMPI-2: Issues of continuity and change. Paper
presented at the 97th Annual Convention of the American Psychological
Association, New Orleans.
Butcher, J. N. (1990a, August). Educational level and MMPI-2 measured
psychopathology: A case of negligible influence. MMPI-2 News and Profiles, p. 3.
Butcher, J. N. (1990b). Use of the MMPI-2 in treatment planning. New York: Oxford
University Press.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989).
Manual for the restandardized Minnesota Multiphasic Personality Inventory:
MMPI-2. An administrative and interpretive guide. Minneapolis: University of
Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1989). Development
and use of the MMPI-2 Content Scales. Minneapolis: University of Minnesota
Press.
Butcher, J. N., Keller, L. S., & Bacon, S. F. (1985). Current developments and future
directions in computerized personality assessment. Journal of Consulting and
Clinical Psychology, 53, 803-815.
Chapman, L. J., & Chapman, J.P. (1969). Illusory correlation as an obstacle to the
use of valid diagnostic signs. Journal of Abnormal Psychology, 4, 44-49.
Christian, W. L., Burkhart, B. R., & Gynther, M. D. (1978). Subtle-obvious ratings of
MMPI items: New interest in an old concept. Journal of Consulting and Clinical
Psychology, 46, 1178-1186.
Cohen, J. (1978). Partialled products are interactions; partialled powers are curve
components. Psychological Bulletin, 85, 858-866.
Colligan, R. C., Osborne, D., Swenson, W. M., & Offord, K. P. (1983). The MMPI: A
contemporary normative study. New York: Praeger.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology.
American Psychologist, 30, 116-127.
Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses
recently proposed. Psychological Bulletin, 102, 414-417.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1986, August). The validity of
computer-based test interpretations of the MMPI. In A. D. Mangelsdorff (Chair),
Computer-based clinical assessment for children, adults, and neuropsychological
cases. Symposium conducted at the meeting of the American Psychological
Association, Washington, D. C.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1987, August). Clinical implications
of validity research on computer-based interpretations of the MMPI. In A. D.
Mangelsdorff (Chair), Practical test user problems facing psychologists in private
practice. Symposium conducted at the meeting of the American Psychological
Association, New York.
Fishburne, F. J., Jr., Eyde, L. D., & Kowal, D. M. (1988, August). Computer-based test
interpretations of the Minnesota Multiphasic Personality Inventory with
neurologically impaired patients. Paper presented at the meeting of the American
Psychological Association, Atlanta.
Gough, H. G. (1947). Simulated patterns on the MMPI. Journal of Abnormal and
Social Psychology, 42, 215-225.
Graham, J. R. (1990). MMPI-2: Assessing personality and psychopathology. New
York: Oxford University Press.
Graham, J. R., & Ben-Porath, Y. S. (1990, June). Congruence between the MMPI and
MMPI-2 code types: Empirical data and theoretical issues. Paper presented at the
25th Annual Symposium on Recent Developments of the MMPI (MMPI-2).
Minneapolis, MN.
Graham, J. R., & Butcher, J. N. (1988). Differentiating schizophrenic and major
affective disordered inpatients with the revised form of the MMPI. Paper presented
at the 23rd Annual Symposium on Recent Developments in the Use of the MMPI,
St. Petersburg, FL.
162 NATHAN C. WEED AND JAMES N. BUTCHER

Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune &
Stratton.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule
(Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Holden, R. R., & Jackson, D. N. (1979). Item subtlety and face validity in personality
assessment. Journal of Consulting and Clinical Psychology, 47, 459-468.
Keller, L. S., & Butcher, J. N. (1990). Use of the MMPI-2 with chronic pain patients.
Minneapolis: University of Minnesota Press.
Koss, M. P., & Butcher, J. N. (1973). A comparison of psychiatric patients' self-report
with other sources of clinical information. Journal of Research in
Personality, 7, 225-236.
Lubinski, D., & Humphreys, L. G. (1990). Assessing spurious "moderator effects":
Illustrated substantively with the hypothesized ("synergistic") relation between
spatial and mathematical ability. Psychological Bulletin, 107, 385-393.
McCrae, R. R., & Costa, P. T., Jr. (1987). Validation of the five-factor model of
personality across instruments and observers. Journal of Personality and
Social Psychology, 52, 81-90.
Meehl, P. E. (1945). The dynamics of "structured" personality tests. Journal of
Clinical Psychology, 1, 296-303.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and
a review of the evidence. Minneapolis: University of Minnesota Press.
Moreland, K. L. (1987). Computerized psychological assessment: What's available.
In J. N. Butcher (Ed.), Computerized psychological assessment (pp. 26-49). New
York: Basic.
Saunders, D. R. (1956). Moderator variables in prediction. Educational and
Psychological Measurement, 16, 209-227.
Slutske, W. S., Ben-Porath, Y. S., Roper, B., Nguyen, P., & Butcher, J. N. (1990, June). An
empirical study of the computer adaptive MMPI-2. Paper presented at the 25th Annual
Symposium on Recent Developments of the MMPI (MMPI-2). Minneapolis, MN.
Tellegen, A. M. (1988, August). Derivation of Uniform T-scores for the restandardized
MMPI. Symposium presentation at the 96th Annual Convention of the American
Psychological Association. Atlanta, GA.
Tellegen, A. M. (1989, August). New Uniform T-scores for the MMPI-2: Methodological
issues. Paper presented at the 97th Annual Convention of the American
Psychological Association. New Orleans, LA.
Weed, N. C., Ben-Porath, Y. S., & Butcher, J. N. (1990). Failure of the Wiener-
Harmon MMPI subtle scales as predictors of psychopathology and as validity
indicators. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 2, 281-285.
Weed, N. C., & Han, K. (1991). Evaluating indicators of MMPI-2 profile validity.
Manuscript in preparation.
Weiss, D., & Vale, C. D. (1987). Computerized adaptive testing for measuring
abilities and other psychological variables. In J. N. Butcher (Ed.), Computerized
psychological assessment (pp. 325-343). New York: Basic.
Wiener, D. N. (1948). Subtle and obvious keys for the MMPI. Journal of Consulting
Psychology, 12, 164-170.
Wiggins, J. S., Goldberg, L., & Appelbaum, M. (1971). MMPI Content Scales:
Interpretive norms and correlations with other scales. Journal of Consulting
and Clinical Psychology, 37, 403-410.
CHAPTER 6

Assessing Psychopathology Using the Basic Personality Inventory: Rationale and Applications

Ronald R. Holden and Douglas N. Jackson

This chapter provides a guide to the use and interpretation of the
Basic Personality Inventory (BPI; Jackson, 1976). In doing so, it emphasizes
the applied aspects of the test rather than the instrument's intricate
psychometric development. Readers interested in a more extensive discussion
of the multivariate statistical details are referred to the Basic Personality
Inventory Manual (Jackson, Helmes, Hoffmann, Holden, Jaffe, Reddon, & Smiley, 1989).

Purpose of the Basic Personality Inventory

The BPI was developed as a self-report measure of personality and
psychopathology that would be useful to mental health professionals in
a wide variety of settings. The self-report of psychopathology represents
an important, integral part of the identification and diagnosis of
dysfunctional behavior. The most widely used measure of psychopathology, the
Minnesota Multiphasic Personality Inventory (MMPI), although having a
noble history of over 50 years, has received a substantial amount of
criticism on both psychometric and practical grounds (Faschingbauer,
1979; Levitt & Duckworth, 1984), and has recently been revised (Butcher,
Dahlstrom, Graham, Tellegen, & Kaemmer, 1989; Chapter 5, this volume).
The BPI, designed to measure the same traditional domain of psychopathology
as the MMPI, has fewer items, focuses on construct measurement,
attempts to suppress response biases, does not permit item overlap
between clinical scales, minimizes the required reading ability, and avoids
objectionable item content (Jackson et al., 1989; Jackson & Hoffmann,
1987). The result is a psychometrically sophisticated inventory of traditional
psychopathology, developed through modern multivariate techniques, that is
relatively short (240 items; 20-30 minutes) and easy to administer.

RONALD R. HOLDEN, Department of Psychology, Queen's University,
Kingston, Ontario, Canada.
DOUGLAS N. JACKSON, Senior Professor, Department of Psychology, The
University of Western Ontario, London, Ontario, Canada.

Format and Administration

The paper-and-pencil version of the BPI consists of 240 items in a
self-administered, reusable question booklet. Responses may be recorded
either on answer sheets designed for hand scoring or on response forms
designed for computer scoring. A computer-administered version of the BPI
is also available. The reading level required for completion of the BPI
is approximately Grades 5 to 7 (Reddon & Jackson, 1989), depending on
the method used to estimate readability. The BPI may be administered
either individually or in groups, but completion should take place under
the supervision of a responsible proctor. Appropriate proctoring includes
informing respondents of the aims and purposes of the testing and of the
use that is to be made of the test results.

Norms
Various sets of norms are available for the BPI. Norms used for the
adult profiles described in the BPI manual were collected through mail
surveys or interviews of 709 men and 710 women randomly selected from
telephone directories and voters' records in the United States and
Canada. The age and marital status distributions of this sample closely
matched those of the 1980 U.S. Census. An underrepresentation of
nonwhites, however, suggests that additional norms for nonwhite
respondents are still required for the BPI. Adolescent norms are based on
880 male and 1380 female high school students, pooled from samples in two
Canadian provinces. The BPI manual also describes norms for college
students (187 men; 192 women), correctional officer job applicants (260
men; 109 women), and psychiatric patients (66 men; 46 women). Normative
data are also available for other samples of psychiatric patients and
high school students (Holden, Reddon, Jackson, & Helmes, 1983), and for
psychiatric patients who completed a microcomputer-administered version
of the BPI (Holden, Fekken, & Cotton, 1990).
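Norms are typically applied by expressing a respondent's raw scale score relative to a chosen reference group. The minimal sketch below assumes a linear T-score transformation (mean 50, standard deviation 10); the normative means and standard deviations are hypothetical placeholders, not values from the BPI manual, whose own conversion tables should be used in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Mean and standard deviation of a raw scale score in one normative sample."""
    mean: float
    sd: float


def linear_t(raw: float, norm: Norm) -> float:
    """Linear T-score: 50 + 10 * z, where z is the raw score's distance from the norm mean in SD units."""
    if norm.sd <= 0:
        raise ValueError("normative SD must be positive")
    return 50.0 + 10.0 * (raw - norm.mean) / norm.sd


# Hypothetical normative values (NOT taken from the BPI manual).
NORMS = {
    ("adult", "female"): Norm(mean=6.0, sd=3.0),
    ("adolescent", "female"): Norm(mean=8.0, sd=4.0),
}

raw = 12
for group, norm in NORMS.items():
    print(group, round(linear_t(raw, norm), 1))
```

With these made-up values, the same raw score of 12 falls at T = 70 against the adult norms but only T = 60 against the adolescent norms, which is why the choice of reference group matters when interpreting a profile.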

Model of Measurement
In considering the theoretical foundations of the BPI, it is useful to
review alternative measurement models for the assessment of psychopa-
thology, because these are intimately linked to alternative conceptions of
psychopathology.
The class model. A variety of measurement models have been
employed implicitly in the development of scales designed to assess
psychopathology. One of the important contributions of Loevinger (1957)
was her recognition that one aspect of construct validity was the require-
ment that a measurement model bear a close relationship to the structure
of the processes thought to underlie what was being measured. In the
measurement of psychopathology one can conceptualize these pro-
cesses as representing discrete states or psychopathological conditions
or as continuous dimensions. The choice of a model of psychopathology
should logically determine the choice of a measurement model, which in
turn affects the approach taken to scale development and interpretation.
A generation of personality and psychopathology test specialists has
been influenced by the approach employed by Hathaway and McKinley
(1940) in the construction of the original Minnesota Multiphasic Person-
ality Inventory (MMPI). These authors considered psychopathology in a
manner consistent with the classical model of medical diagnosis. Patients
suffering from severe depression or from a schizophrenic reaction were
conceptualized as being in a class distinct from other people much in the
same way as a patient diagnosed as having a tuberculosis infection is
thought of as different from non-infected people. The task of assessing
psychopathology from this vantage point was to distinguish members of
a class from those who were not members. It thus followed that the
approach to measurement taken was to seek to distinguish reliably
persons falling within the class or diagnosis from others falling outside of
the class. The class was believed to have an independent existence, one
that could be approximated by measurement. The method employed to
accomplish this was the contrasted groups or empirical method of scale
construction. A group of schizophrenic patients and a group of non-
patients were administered a heterogeneous set of personality items and
those items showing different endorsement proportions between the two
groups were assembled into a scale. This scale could then be administered
to persons whose diagnosis was not known. In its simplest form this
model involved a single decision. If a person endorsed the items on the
scale above a certain cutting score, that person would be classified as a
schizophrenic; otherwise not. In Hathaway's conceptualization, the level
of the score was important only insofar as it affected the classification
decision. If a person's score exceeded the cutting score that person was
classified as probably a member of the diagnostic group; how much he or
she exceeded the cutting score was of little or no importance. Similarly,
a score below the cutting score was not further interpreted in terms of its
magnitude. Neither was much attention directed at the nature of the item
content comprising the scale, nor at the particular items that a particular
respondent endorsed. It should be noted, however, that in practice users
of instruments such as the MMPI do interpret their scales as dimensions,
in contradiction to the class model underlying them. Many have expanded
upon and departed from Hathaway's original conceptualization of the
measurement model implicit in the contrasted groups approach, in a shift
of paradigms (Messick, 1989) that takes the form of interpreting a
person's score as representing the magnitude of the trait for that
respondent. That is, a person with a high depression score is thought to
manifest a high degree of depression, rather than simply being a member
of a depressed diagnostic group. Nevertheless, it is useful to contrast
the class model in its "pure" form with the dimensional model.
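The contrasted-groups logic of the pure class model can be sketched in a few lines. In this minimal illustration, items are retained when their endorsement proportions in a criterion group and a comparison group differ by at least a threshold, each retained item is keyed toward the criterion group, and a respondent is classified only by whether the keyed-item count reaches a cutting score. The threshold, cutting score, and toy responses are hypothetical; they are not Hathaway and McKinley's actual selection criteria.

```python
from typing import Dict, List, Sequence

# Rows are respondents, columns are items; 1 = "true", 0 = "false".
Responses = Sequence[Sequence[int]]


def endorsement_rates(group: Responses) -> List[float]:
    """Proportion of the group answering "true" to each item."""
    n = len(group)
    return [sum(person[j] for person in group) / n for j in range(len(group[0]))]


def select_items(criterion: Responses, comparison: Responses,
                 min_diff: float) -> Dict[int, int]:
    """Keep items whose endorsement rates differ by at least min_diff.

    Each retained item is keyed in the direction more typical of the
    criterion group (1 = keyed "true", 0 = keyed "false").
    """
    keyed = {}
    rates = zip(endorsement_rates(criterion), endorsement_rates(comparison))
    for j, (p_crit, p_comp) in enumerate(rates):
        if abs(p_crit - p_comp) >= min_diff:
            keyed[j] = 1 if p_crit > p_comp else 0
    return keyed


def raw_score(person: Sequence[int], keyed: Dict[int, int]) -> int:
    """Number of keyed responses the person gave."""
    return sum(1 for j, key in keyed.items() if person[j] == key)


def classify(person: Sequence[int], keyed: Dict[int, int], cutting_score: int) -> bool:
    """Pure class model: only which side of the cutting score matters."""
    return raw_score(person, keyed) >= cutting_score


# Hypothetical toy data: 3 criterion-group and 3 comparison-group respondents, 4 items.
criterion = [[1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]]
comparison = [[0, 1, 0, 0], [0, 1, 1, 0], [1, 1, 0, 0]]

keyed = select_items(criterion, comparison, min_diff=0.3)
for person in ([1, 0, 0, 1], [1, 1, 0, 1], [0, 1, 0, 0]):
    print(person, raw_score(person, keyed), classify(person, keyed, cutting_score=2))
```

Here the respondents scoring 3 and 2 receive the same classification, which is exactly the point of the pure class model: how far a score exceeds the cutting score carries no further information.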
The dimensional model. A dimensional model of psychopathology
recognizes that its manifestations can be conceptualized as falling along
a continuum. In such a model, the analogy is not with medical diagnosis,
but with the measurement of intellectual ability, which similarly is viewed
as falling along a continuum. With a dimensional model, the task of
measurement is to identify exemplars characteristic or prototypical of
the domain represented by the construct. If, for example, this domain is
depression, exemplars or items highly prototypical of depression are
sought, items that are not only prototypical but representative of all
important facets of depression. For example, Beck (1967) identified such
features as despair over the future, suicidal tendencies, loss of libido, and
psychomotor retardation, among others, as characteristic of depression.
To be representative of the domain, an item pool should contain content
representing each of these components.
Our choice of a dimensional model over alternatives was based on a
number of considerations. First, there is much controversy surrounding
the question of the viability of psychiatric diagnostic categories. The
reliability of psychiatric diagnoses, even when standard nomenclatures
and diagnostic systems have been employed, has been far from satisfactory.
There has also been much controversy about their number and
nature, as witnessed by the changing standards for diagnosis over the
years. Another problem is that this evolution of a standard nomenclature
has remained largely free of influence from empirical research on psychi-
atric classification (Blashfield, 1984). Where such research has been
undertaken, there is evidence of important incongruence between em-
pirical findings and, for example, DSM-III-R (Livesley, Schroeder, &
Jackson, 1989).
A third problem in the use of a class model is that there is a lack of
consensus on where to draw the line regarding membership in a class. In
World War II the psychiatric screening of inductees into the U.S. Army was
undertaken by physicians employing a diagnostic interview. Rejection
rates at different induction centers ranged from 2 to 60 percent (Office of
Strategic Services, Assessment Staff, 1948), a problem that was alleviated
through the use of standardized questionnaires. Better levels of consensus
have been achieved using DSM-III-R criteria, but the criteria employed
(e.g., "has two or fewer friends") are arbitrary both in terms of their lack
of explicit rationale and their lack of empirical support.
A fourth problem in adopting a medical diagnostic model for assess-
ment is the problem of base rates. Here we refer to the fact that the level
of item and scale validity coefficients can be affected markedly by the
proportion of persons in a diagnostic group and in the "normal" group
(Loevinger, 1957; Meehl & Rosen, 1955). This problem is not alleviated by
specifying an arbitrary proportion (e.g., 50%) for each contrasted group
because these arbitrary proportions will vary between the groups used
for scale construction and those encountered in a given diagnostic
setting. Furthermore, the base rate for a given diagnostic category will
vary from one setting to another due to a number of causes, e.g.,
diagnostic and administrative traditions in the setting, selection factors
such as the policy to restrict certain types of patients in a clinical setting,
and the relation of certain types of psychopathology to ethnicity and
social class, which will vary across settings. To base diagnostic decisions
on measurement that is markedly and arbitrarily affected by base rates is
a dangerous venture.
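The base-rate problem can be made concrete with Bayes' theorem. The minimal sketch below holds a hypothetical scale's sensitivity and specificity fixed and shows how the positive predictive value (the proportion of respondents classified into the diagnostic group who actually belong to it) changes as the base rate varies from one setting to another. The accuracy figures are illustrative assumptions, not properties of any published scale.

```python
def positive_predictive_value(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(member | scale classifies as member), by Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.80  # hypothetical
SPECIFICITY = 0.80  # hypothetical

for base_rate in (0.50, 0.20, 0.05):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, base_rate)
    print(f"base rate {base_rate:.2f}: PPV {ppv:.2f}")
```

The same scale yields a positive predictive value of 0.80 when half the people tested belong to the diagnostic group, 0.50 at a 20% base rate, and only about 0.17 at a 5% base rate, so most positive classifications in the last setting would be errors.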
A fifth concern with a class model and the method of contrasted
groups is the problem of matching normal and pathological groups on all
relevant variables except the particular type of psychopathology of
interest, a problem that is compounded by the fact that many variables
are correlated with psychopathology. When contrasted groups differ in
important respects, e.g., age, sex, social class, or even on perhaps more
incidental dimensions like employment status, religion, occupation, or
social attitudes, there is a distinct possibility that the differences ob-
served are due to the extraneous variables and not to differences in
psychopathology. The problem of matching is usually not alleviated by
random selection within groups, because the pathological group and the
normal group ordinarily represent distinct populations. Since there is
such a large number of variables on which contrasted groups can be
matched, and since the investigator does not know precisely which
variables should be used as relevant controls (such a choice requires
precise knowledge of the relationship between every variable and every
item in the item pool), effective matching rarely, if ever, occurs in
practice.
Given the questions regarding the viability of extant diagnostic
categories, coupled with the statistical, conceptual, and psychometric
problems associated with identifying items that reliably distinguish
contrasted groups in the population, it is not surprising that many
psychologists who develop personality scales, ourselves included, favor
a dimensional approach. But there is a more important set of reasons than
the difficulties associated with class models. We
believe that a dimensional model is a more accurate way of conceptual-
izing psychopathology. Consider one example. Beck (1967) reported that
a systematic investigation of the reliability of psychiatric diagnoses
revealed that psychiatrists failed to agree about the precise diagnostic
category in which to place depressed patients, but when asked to rate the
depth of depression, they demonstrated very high levels of agreement.
This suggests that a dimensional model might be more congruent with the
way in which mental health professionals think about psychopathology.
Other constructs of psychopathology, in addition to depression, are
intuitively consistent with a dimensional model. Characteristics such as
anxiety or hypochondriasis appear to vary in degree or magnitude, rather
than as step functions.
Another source of support for a dimensional approach is the freedom
that such an approach provides to evaluate whether or not one's
conceptualization of a dimension is congruent with empirical data. If
behavioral exemplars or personality items fail to be mutually related
statistically in a manner required by theoretical expectations, scales or
items may be rejected, or the theory revised, or both. For example, if
depression is conceptualized as being represented by a set of facets, such
as loss of self-esteem and lack of interest in engaging in exciting activities,
but it is found that some facets are not associated as expected by theory,
the conceptualization can be revised. Furthermore, the process of assess-
ing psychopathology is not only a matter of setting a boundary between
"normal" and "abnormal" classes, but it is concerned as well with differ-
entiating individuals within the normal range and within the pathological
range. A dimensional model is more appropriate for such differentiation
along the entire continuum of a dimension of psychopathology. What we
sought in BPI scale development was the incorporation of a dimensional
model of psychopathology into a comprehensive construct approach.

The Construct Approach as Realized in the Development of the BPI
Although it is not our aim here to provide extensive details concern-
ing the development of the BPI, we consider it appropriate to provide an
overview of the test construction rationale, together with some illustrative
material. Briefly, BPI construction can be summarized in terms of five
steps: (a) identifying relevant constructs of psychopathology, specifying
their domains, and defining them; (b) identifying items within the domain
of each construct; (c) conducting multivariate item analyses designed to
optimize content saturation while minimizing sources of irrelevant content
and response bias variance; (d) reviewing the items selected for each scale
for content and substantive validity; and (e) assembling the scales in final
form and undertaking validity studies. We discuss each of these steps under
appropriate headings.

Identifying Constructs of Psychopathology: Facets and Concepts of BPI Scales
The BPI's development had its foundation in a multivariate analysis
of 28 scales contained in the as yet unpublished Differential Personality
Inventory (DPI; Jackson & Messick, 1986), an elaborate instrument
comprising 432 items. These scales were theoretically based and
spanned a broad range of psychopathology. In what follows, we describe
each of the 11 substantive components that arose from this analysis,
organized in terms of the corresponding BPI scales, plus the
Deviation scale, developed to identify clinically critical behaviors. The
Differential Personality Inventory scales defining each principal compo-
nent factor were helpful in defining the facets of each scale. The headings
are organized in terms of BPI scales, followed in parentheses by the
defining DPI scales representing the facets identified in the principal
components factor analysis. We also provide an illustrative item, but, to
preserve the security of the BPI, not precisely an item that appears in
the BPI.
Hypochondriasis. (Somatic Complaints, Health Concerns,
Hypochondriasis). I am frequently troubled by pains in my joints. The DPI
distinguished between somatic complaints and what we originally termed
imaginary symptoms. Somatic complaints represented health problems
having a well-defined organic basis, usually linked to a known diagnosis.
Imaginary symptoms, whose scale name was later changed on the DPI to
Hypochondriasis, contained item content representing complaints that
were vague, ill defined, or lacking a link to a known medical condition.
Health concern involved a preoccupation with health as distinguished
from health complaints. Psychometrically, we found these three facets to
be correlated but distinguishable. They are sufficiently correlated to
comprise a single broader scale, and constitute behaviors ordinarily
associated with hypochondriasis. One advantage of broadening the
definition of hypochondriasis beyond imaginary symptoms is that
imaginary-symptom items generally yield extreme endorsement proportions,
and their content borders on somatic delusions. Such content does not
differentiate well within the normal range. It is important to recognize,
here as elsewhere in assessment based on self-report inventories, that
little weight should be placed on responses to a single item.
Rather, it is the aggregate of responses that is emphasized. We are more
likely to suspect hypochondriasis if a person has a great many diverse
complaints than only one. Of course, it is important to interpret eleva-
tions on this scale in the light of the respondent's entire medical history.
There may be good reason for diverse somatic symptoms. One of us
recalls a patient who had suffered from semi-starvation while a prisoner
of war, an experience that resulted in a number of chronic health
problems. In this case the symptoms had an existence apart from a desire
to achieve secondary psychological gains from somatic complaints.
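Two psychometric points in this discussion can be illustrated numerically. A true/false item endorsed by a proportion p of respondents has variance p(1 - p), so an item with an extreme endorsement proportion contributes little variance and little power to separate people within the normal range. The benefit of aggregation is shown here with the Spearman-Brown formula, a standard result used for illustration rather than a procedure stated in this chapter. The endorsement proportions, the mean inter-item correlation, and the 20-item length are all hypothetical.

```python
def item_variance(p: float) -> float:
    """Variance of a true/false item endorsed by a proportion p of respondents."""
    return p * (1.0 - p)


def spearman_brown(mean_r: float, k: int) -> float:
    """Reliability of a k-item sum, given the mean inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# Hypothetical endorsement proportions: a moderate item vs. a near-delusional one.
for p in (0.50, 0.20, 0.03):
    print(f"p = {p:.2f}: item variance = {item_variance(p):.4f}")

# Hypothetical mean inter-item correlation of 0.20: one item vs. a 20-item aggregate.
for k in (1, 20):
    print(f"k = {k:2d}: reliability = {spearman_brown(0.20, k):.2f}")
```

An item endorsed by 3% of respondents carries only a little over a tenth of the variance of an item endorsed by half of them, and under these assumed values a single item has a reliability of 0.20 while a 20-item aggregate reaches about 0.83.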
Depression. (Depression, Insomnia). My future is hopeless. The DPI
Depression and Insomnia scales do not exhaust the facets of this scale,
nor do they reflect adequately the background for this scale. When the
DPI Depression scale was developed, a total of 214 items was prepared,
reflecting many of the well known facets of depression, such as despair
over the future, suicidal tendencies, feeling "blue," loss of confidence in
abilities, and lack of libido. Facets were scored separately and
intercorrelated. Item correlations were obtained with each facet, as well as
with total scores for a general depression scale (based on the sum of the
facet scores) and with related psychopathological scales, plus a social
desirability scale. These procedures permitted an analysis of the type of
item that best appraised general depression, as well as providing a basis
for representing the domain of depression in a comprehensive manner.
This approach provided a good foundation for developing the BPI scale.
Because the negative affect represented in depressive content is often
highly evaluative, it is also important, for the sake of fostering
discriminant validity, to differentiate depression from simply
responding undesirably.
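The item analyses described above compare each item's correlation with its own scale against its correlation with a social desirability scale. One index in this spirit is the differential reliability index associated with Jackson, sqrt(r_is^2 - r_id^2), where r_is is the item-scale correlation and r_id the item-desirability correlation. It is used below as an assumption, since this chapter does not spell out the exact statistic applied, and the item correlations are hypothetical.

```python
import math


def differential_index(r_is: float, r_id: float) -> float:
    """sqrt(r_is**2 - r_id**2), floored at 0.

    r_is: item correlation with its own (content) scale.
    r_id: item correlation with a social desirability scale.
    """
    diff = r_is ** 2 - r_id ** 2
    return math.sqrt(diff) if diff > 0 else 0.0


# Hypothetical (r_is, r_id) values for three candidate depression items.
candidates = {
    "item_a": (0.60, 0.20),
    "item_b": (0.55, 0.50),
    "item_c": (0.40, 0.05),
}

ranked = sorted(candidates, key=lambda name: differential_index(*candidates[name]), reverse=True)
for name in ranked:
    print(name, round(differential_index(*candidates[name]), 3))
```

In this example item_b correlates more highly with the depression scale than item_c does, yet it ranks last because nearly all of that correlation is matched by its correlation with desirability, which is the kind of item that would blur depression with simply responding undesirably.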
Denial. (Defensiveness, Repression, Shallow Affect). I cannot recall
ever having been embarrassed by something I did. It is interesting that the
three facets comprising the Denial scale were conceptualized as distinct
scales, although a review of their definitions reveals why they are
sufficiently correlated to define a factor. Defensiveness shows a kinship
to the MMPI L and K scales, and in particular to findings by Jackson and
Messick (1969) bearing on the psychometric bases for defensiveness or
"lie" items. These findings refer to contrasts between judged properties
of personality items. Ordinarily there is a very high correlation between
the judged frequency of endorsement of an item and the judged frequency
of the occurrence of the behavior reflected in the item content. Similarly,
there is a high correlation between the judged desirability of a true
response to item content and the actual frequency of endorsement of the
item. Defensiveness items typically show a discrepancy between these
attributes. Thus, the item "I never resented being punished as a child"
describes behavior that would be judged desirable but that is rather
infrequent. Such items tend to be sensitive to instructions to try
to fake positive mental health or desirable behavior. Repression item
content typically involves a denial of willingness to confront unpleasant
experiences in one's past or, more generally, to probe into the
psychological determinants of behavior or the psychological aspects of
artistic expression. Shallow affect is measured by item content describing
situations that are emotionally arousing for normal people. Persons with
elevated scores tend to deny being aroused by such situations. Thus, an
item like "Accidents, even if serious, never bother me" would reflect shallow affect.
Together, these three facets all reflect a tendency to deny psychological
arousal or content that is frequent in the general population. Unlike other
tests of psychopathology, scores on this denial scale are negatively
correlated with other BPI scales. Because these denial tendencies are
associated with faking (Helmes & Holden, 1986), the Denial scale can be
viewed as a measure of response style and as a kind of suppressor
variable (Conger & Jackson, 1972), a variable that is associated with other
BPI measures of psychopathology but not with criterion measures of
psychopathology (Holden, Fekken, Reddon, Helmes, & Jackson, 1988). As
such, it would be expected that taking the elevation of the Denial scale
into account might increase the validity of other BPI scales, even though
Denial, unlike all other BPI clinical scales, does not by itself correlate
substantially with relevant clinical ratings (Holden et al., 1988).
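Why a variable that does not itself correlate with the criterion can still raise validity follows from the standard two-predictor regression formula: for a criterion y and predictors x1 and x2, R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). Taking a clinical scale as x1 and Denial as x2, r_y2 = 0 reduces this to r_y1^2 / (1 - r_12^2), which exceeds r_y1^2 whenever r_12 is nonzero. The minimal sketch below computes the standardized weights and multiple R; the correlations are hypothetical, chosen only to illustrate the suppressor effect.

```python
import math


def two_predictor_validity(r_y1: float, r_y2: float, r_12: float):
    """Standardized regression weights and multiple R for predicting y from x1 and x2."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, math.sqrt(r_squared)


# Hypothetical correlations: x1 = a BPI clinical scale, x2 = Denial, y = clinical ratings.
r_y1 = 0.40   # clinical scale with ratings
r_y2 = 0.00   # Denial with ratings (the suppressor case)
r_12 = -0.50  # Denial with the clinical scale (negative, as described above)

beta1, beta2, multiple_r = two_predictor_validity(r_y1, r_y2, r_12)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}, R = {multiple_r:.3f}")
```

Under these assumed values, validity rises from 0.40 for the clinical scale alone to about 0.46 with Denial included. Denial receives a positive weight (about 0.27), so a respondent's predicted rating is adjusted upward when Denial is elevated, one way of taking denial into account when interpreting the other scales.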
Interpersonal Problems. (Familial Discord, Hostility, Rebellious-
ness, Sadism). People who try to boss me around are in for a lot of trouble.
The common theme running through the four facets and the item content
of the Interpersonal Problems scale is a low threshold for expressing
negative responses toward other people. When persons high on the
interpersonal problems dimension have their freedom of action
restricted or are otherwise frustrated by other people, they respond
aggressively. This low threshold for aggressive responses causes them to
be seen by others as high on a dimension of hostility. It is interesting also
that empirical multivariate clustering procedures led items reflecting
sadism to be associated with the other facets. Persons high on
Interpersonal Problems were rated as affected, aggressive, high-strung,
idealistic, imaginative, indifferent, and shy by persons who knew them
(Jackson et al., 1989).
Alienation. (Cynicism, Socially Deviant Attitudes). I believe in trying
to cheat other people before they can cheat me. The two facets of Alienation
both have a strong attitudinal component. People high on the alienation
dimension expect dishonesty and harm from other people; such cynicism
serves to justify the belief that it is all right to take advantage of others.
If it were not for the pejorative connotations of such a title, we might have
named this scale "Antisocial Attitudes."
Persecutory Ideas. (Broodiness, Ideas of Persecution). I spend a great
deal of time thinking about how those close to me might try to harm me.
Persecutory ideas are a rather common form of psychopathology, but
need not take the form of full blown delusions of persecution. Persons
who brood over minor incidents or slights by others are not necessarily
delusional but may instead be showing preconditions for the "misinter-
pretations, unwarranted inferences, or unjustified conclusions" (Cameron
& Magaret, 1951, p. 392) that form some of the psychological bases for
delusions. Cameron and Magaret describe in a male patient the often
prolonged process of solitary brooding over events that seem incomprehensible
in the face of anxiety, cerebral incompetence, isolation, and
inability or unwillingness to seek consensual validation of interpretations
of perceived events.

By no means [do] all persecutory delusions come to so
clear and dramatic a climax. But whether or not they do,
there is nearly always a prolonged phase during which
the patient gathers and mulls over his spurious evidence.
He develops progressive reaction-sensitization in terms
of his growing convictions. Everything around him seems
to refer in some way to him (self-reference), even though
he may for a long time be unable to comprehend what is
apparently brewing. Scraps of conversation, the looks
others give him, their gestures, smiles, frowns and laugh-
ter find their place in his pseudocommunity, whether or
not they can at the time be interpreted with certainty.
. . .To this growing calamity most patients react with fear,
anger and indignation. Some, however, view it with pas-
sive resignation; they may fear it but they do not resist it.
BASIC PERSONAUTY INVENTORY 175

Some seem, indeed, to welcome and invite persecution, as punishment they deserve or as martyrdom they accept (Cameron & Magaret, 1951, pp. 395-396).

Anxiety. (Neurotic Disorganization, Mood Fluctuation, Irritability, Panic Reaction). If something unexpected happens, I become extremely upset. An understanding of the meaning of high scores on the BPI Anxiety scale may be obtained not only from considering common conceptions of anxiety, but also from an appraisal of the four DPI scales defining the facets of Anxiety. Persons high on Anxiety would be expected to have a variety of fears; to worry excessively about the possibility of danger or harm, either physical or interpersonal; to become disorganized under stress to the point where some cognitive functions, such as the ability to find personal effects or to remember obligations, might be impaired; to be upset or irritated by frustration; and to be susceptible to alterations in mood or in their sense of well-being. Anxiety can be expressed in a variety of ways, for example, through muscular tension, as well as through other physiological signs such as perspiration, a fluctuating heart rate, and hyperalertness to danger. Persons high on the Anxiety scale are not expected to be in a constant state of panic, but acute anxiety may be triggered by a variety of situational influences. When the BPI Anxiety scale is correlated with other measures, such as the State-Trait Anxiety Inventory (Spielberger, 1983), it should be recognized that the correlations will reflect the respective methods of scale construction and construct definitions of the scales. The BPI Anxiety scale tends to be more focussed on cognitive aspects and symptoms of anxiety, rather than on the more generalized forms of psychological distress and psychopathology reflected in other anxiety scales.
Thinking Disorder. (Disorganization of Thinking, Feelings of Unreality, Perceptual Distortion). Sometimes I feel as if I am living in a dream world. The constituent facets of Thinking Disorder, as reflected in multivariate analyses, are all linked to psychotic-like manifestations of cognitive dysfunction. High scores on Disorganization of Thinking represent gross departures from normal thinking patterns, for example, manifestations of disorientation, extreme forgetfulness, and distortions of time. Elevated scores on Feelings of Unreality reflect the dream-like states that frequently accompany schizophrenic reactions (or substance abuse). Scores on Perceptual Distortion reflect a continuum which, at the extreme, would involve auditory and visual hallucinations, but at less extreme levels might reflect a tendency to misinterpret or distort perceptual signals. At a simple level, Thinking Disorder might be interpreted as a "psychotic tendencies" scale, but at times the scale may be elevated because these symptoms arise not from psychotic processes per se, but rather from temporary states like severe panic, which sometimes results in the elevation of a variety of scales. Thinking Disorder will also show elevations for persons who have abused drugs or alcohol. In addition, the scale will reflect unusual cognitive processes in people who are not diagnosable as having a psychiatric disorder. It remains for further research to determine whether such people are susceptible to psychotic disorders of cognition.
Impulse Expression. (Impulsivity, Hostility). When I am bored I
sometimes do reckless or foolish things just to stir up excitement. In the
absence of evidence for psychopathology, persons high in Impulse
Expression appear to have above average levels of energy and to be
regarded by others as lively and entertaining. The scale's pathological implications arise from the tendency of such persons to experience fits of temper and uncontrolled, sometimes hostile behavior, as well as a tendency to engage in risky behavior, often involving physical, social, monetary, or ethical risks. Persons with extremely low scores are generally regarded as stolid, cautious individuals who have a slower-than-average pace of responding and acting, are reserved and even-tempered, and are not subject to unpredictable changes in behavior.
Social Introversion. (Desocialization). I am happier alone than when I am with other people. At times psychopathology has the effect of causing a person to withdraw from what is generally regarded as normal human contact. Sometimes this is the result of extremely low self-esteem, sometimes of severe depression, and sometimes of delusional beliefs about the harm that might occur at the hands of strangers or loved ones. The desire to affiliate varies, of course, within the normal population, and such variation need not have any particular psychopathological implications. But the content of the Social Introversion scale of the BPI reflects more extreme behavior than one would be likely to encounter in the normal range. Hence, elevations in Social Introversion can be regarded as significant, particularly when they are accompanied by other evidence of dysfunction.
Self Depreciation. (Self Depreciation). I rarely have anything intelligent to contribute to a conversation. There is an extensive literature regarding the role of social desirability in responses to personality questionnaires. It has been argued that responses indicating negative social desirability define the largest component in self-report questionnaires. This sort of finding can be interpreted in a variety of ways, but one obvious interpretation is that the tendency to depreciate the self is a very important aspect of psychopathology, showing a pervasiveness that makes the measurement of specific aspects of psychopathology difficult. In the BPI an effort was made to contain this pervasive response tendency and to define it substantively. Rather than define it as a source of correlated error, a response bias, or a response style to be eliminated from any consideration, the tendency to respond undesirably was considered an important dimension of psychopathology. Thus the Self Depreciation scale was designed to measure the low self-esteem that accompanies, and serves as a cognitive manifestation of, certain types of psychopathology. Psychometrically, we distinguish item content specifically associated with tendencies to depreciate the self, including beliefs in lack of physical attractiveness, lack of ability, feelings of worthlessness and undeservedness of love and affection, and, at the extreme, the idea of total worthlessness and incompetence. Such feelings can arise from a number of sources: extreme guilt, depression, a recognition of the effects of psychopathology on functioning in a variety of spheres such as interpersonal relationships and work, and the perceived gap between one's aspirations and one's ability to achieve them.
Deviation. Unlike other BPI scales, Deviation cannot be considered a homogeneous dimension, nor even a congeries of correlated dimensions. Hence, we do not propose a prototypical item, nor do we list facets. Rather, Deviation represents a series of critical items bearing on behaviors of some clinical importance, for example, suicidal ideation, reports of substance abuse, and other forms of extreme behavior. Interestingly, the scale does show a degree of internal consistency, even though the content is not designed to be homogeneous. What interpretations are appropriate when Deviation is elevated? First of all, it is appropriate to examine the item content endorsed, in order to explore hypotheses concerning the nature of the content and the possible sources of psychopathology associated with positive responses. Taken as a whole, an elevated Deviation score might be regarded as a measure of a person's statistical deviation from common behavior patterns. There is some tendency for persons high on this scale to show a variety of unusual behaviors, not only in the areas covered by the scale but in other areas as well. The Deviation scale will also be elevated when a respondent has such a poor knowledge of the English language, or is otherwise so confused, that responses are not purposeful. Elevations will also occur if the respondent has failed to record each response in the answer-sheet box numbered for the corresponding item. A further cause of an elevated Deviation score is an attempt to "fake bad." The skilled clinician will seek to gather the evidence necessary to distinguish valid from invalid elevations on the Deviation scale.
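The text notes that Deviation shows some internal consistency despite its heterogeneous content, but it does not say which coefficient was computed. For true/false items, a standard choice is the Kuder-Richardson Formula 20 (KR-20). The minimal sketch below assumes that coefficient and uses a made-up response matrix (4 respondents by 3 items), not BPI data. The function name `kr20` is illustrative only.

```python
"""KR-20 internal consistency for dichotomously scored (0/1) items.

Assumption: KR-20 is used here for illustration; the chapter does not
state which coefficient was computed for the BPI Deviation scale.
"""
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Return KR-20 for a persons-by-items matrix of 0/1 responses.

    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var(total)),
    where p_i is the proportion endorsing item i, q_i = 1 - p_i, and
    var(total) is the population variance of respondents' total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    # Item endorsement proportions and their binomial variances p * q.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)

    # Population variance of total scores (same n denominator as p * q).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum_pq / var_total)


# Toy data: 4 respondents x 3 items (not actual BPI responses).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(kr20(toy), 3))  # 0.75
```

Because KR-20 compares the summed item variances with the variance of total scores, it rises when items covary. A heterogeneous critical-item scale can therefore still reach a moderate value if endorsements tend to cluster within the same respondents.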
Scale Construction and Multivariate Item Analyses


As we have already indicated, scale construction proceeded using a
multivariate strategy. Our approach was to undertake a two-stage
canonical principal components analysis, one that resulted in rotating
components based on 28 constructs of psychopathology in a space
within which 11 orthogonal dimensions based on MMPI scales were also
represented. The 11 orthogonal dimensions provided a basis for conduct-
ing an item analysis for each BPI scale, one that permitted the
evaluation of the relationship between each item and every one of the 11
components.
For example, the BPI Hypochondriasis principal component factor
was based on three previously developed scales: Somatic Complaints,
Health Concern, and Hypochondriasis or Imaginary Symptoms.
An illustration of the method of item analysis may be helpful. Table 1 provides item statistics for four items similar to items appearing on the Alienation scale of the BPI. (Although the item statistics are genuine, the items have been modified somewhat to avoid reproduction of actual BPI items.) The columns of Table 1 present statistical data relevant to decisions about the acceptability of an item. The first column (IEI) refers to an Item Efficiency Index. The IEI is a value that gives a positive weight to an item's association with the factor underlying its own scale and a negative weight to the item's redundancy with other scales, as measured by the variance that the item shares with irrelevant factors (Neill & Jackson, 1976). The first item, "getting the better of a card cheat ...," is substantially associated with the Alienation factor, has an acceptable mean and variance, has low correlations with other factors, and thus has a high IEI. The decision was therefore to retain this item. The second item, telling others "that I do not like a person," has a moderate correlation with the factor score on which the item is keyed, one that is higher than that for any of the irrelevant factors. But its IEI was ranked lower than those for acceptable items, and the item was rejected. Item three, "I would be much more successful if certain people were not against me," also shows a moderately high correlation with its own factor. But this item correlates .60 with the factor score for Persecutory Ideas, a result that is not surprising given the content of the item. The fact that this item correlated more highly with an irrelevant factor, coupled with its relatively low IEI, was grounds for rejecting it. Finally, item four, "frightening someone," was rejected for multiple reasons, notably its higher correlation with the Self Depreciation factor (.63, versus .56 with Alienation) and its very low item variance, which was "rounded up" in the table from .0025. Items with low variance have a number of liabilities, including unstable statistical properties and a minimal contribution to scale reliability and validity.
Table 1
Illustration of Criteria for Selecting and Rejecting BPI Items: Alienation Scale

                     Correlations with BPI Factor Scores
Item  IEI Mean  Var  Hyp  Dep  Den  InP  Aln  PsI  Anx  ThD  Imp  SoI  SDp   Decision/Reason
1      74   19   15   03  -17  -13   11   75  -08  -03   12   12   04   15   Retain/high IEI
2      39   32   22   28  -06  -29   08   42   00   16   13  -03   07   03   Reject/IEI relatively low
3      37   21   17   27   05   30   06   46   60   11   17  -20  -06   32   Reject/higher correlation with irrelevant factor
4      51   05   01   13   02   20   07   56  -02   12   11  -23  -07   63   Reject/multiple reasons

Items:
1. I would enjoy getting the better of a card cheat at his own game.
2. I often tell others that I do not like a person.
3. I would be much more successful if certain people were not against me.
4. The way I see it, frightening someone who cannot fight back is a good joke.

Note: Decimals omitted. IEI refers to Item Efficiency Index (see text); Var, to item variance. Columns Hyp through SDp give correlations with BPI factor scores. Abbreviations for scale names are as follows: Hyp - Hypochondriasis; Dep - Depression; Den - Denial; InP - Interpersonal Problems; Aln - Alienation; PsI - Persecutory Ideas; Anx - Anxiety; ThD - Thinking Disorder; Imp - Impulse Expression; SoI - Social Introversion; and SDp - Self Depreciation. The Deviation scale, a scale composed of heterogeneous critical items, was not used for this stage of test development.
Table adapted from Jackson et al. (1989) with the permission of Sigma Assessment Systems, Inc.
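The retain/reject reasoning described for Table 1 can be sketched as a few explicit checks. The correlations, means, variances, and IEI values below are copied from Table 1, with decimals restored. The IEI values are taken as given rather than recomputed, because the Neill and Jackson (1976) formula is not reproduced here. The two cutoffs, `MIN_IEI` and `MIN_VARIANCE`, are hypothetical: the chapter compares IEI values by rank against acceptable items, not against a fixed threshold.

```python
"""Sketch of the Table 1 item-review logic for the Alienation scale.

Values are copied from Table 1 (decimals restored). MIN_IEI and
MIN_VARIANCE are hypothetical cutoffs chosen for illustration only.
"""
from dataclasses import dataclass
from typing import Dict, List, Tuple

FACTORS = ["Hyp", "Dep", "Den", "InP", "Aln", "PsI",
           "Anx", "ThD", "Imp", "SoI", "SDp"]

MIN_IEI = 0.50       # hypothetical cutoff, not from the BPI manual
MIN_VARIANCE = 0.05  # hypothetical cutoff, not from the BPI manual


@dataclass
class ItemStats:
    number: int
    iei: float           # Item Efficiency Index, taken as given
    mean: float          # item mean
    var: float           # item variance
    r: Dict[str, float]  # correlation with each BPI factor score


def make_item(number: int, iei: float, mean: float, var: float,
              correlations: List[float]) -> ItemStats:
    return ItemStats(number, iei, mean, var, dict(zip(FACTORS, correlations)))


TABLE_1 = [
    make_item(1, .74, .19, .15, [.03, -.17, -.13, .11, .75, -.08, -.03, .12, .12, .04, .15]),
    make_item(2, .39, .32, .22, [.28, -.06, -.29, .08, .42, .00, .16, .13, -.03, .07, .03]),
    make_item(3, .37, .21, .17, [.27, .05, .30, .06, .46, .60, .11, .17, -.20, -.06, .32]),
    make_item(4, .51, .05, .01, [.13, .02, .20, .07, .56, -.02, .12, .11, -.23, -.07, .63]),
]


def review(item: ItemStats, keyed: str = "Aln") -> Tuple[str, List[str]]:
    """Return ("Retain" or "Reject", reasons) for one item."""
    reasons: List[str] = []
    own = item.r[keyed]
    # Largest absolute correlation with any factor other than the keyed one.
    best_other = max(abs(v) for k, v in item.r.items() if k != keyed)
    if best_other >= own:
        reasons.append("higher correlation with irrelevant factor")
    if item.iei < MIN_IEI:
        reasons.append("IEI relatively low")
    if item.var < MIN_VARIANCE:
        reasons.append("low item variance")
    return ("Reject" if reasons else "Retain"), reasons


for item in TABLE_1:
    decision, reasons = review(item)
    detail = f" ({', '.join(reasons)})" if reasons else ""
    print(f"Item {item.number}: {decision}{detail}")
# Item 1: Retain
# Item 2: Reject (IEI relatively low)
# Item 3: Reject (higher correlation with irrelevant factor, IEI relatively low)
# Item 4: Reject (higher correlation with irrelevant factor, low item variance)
```

With these illustrative cutoffs the sketch reproduces the decisions in Table 1. In practice, as the text explains, such screening was followed by editorial review rather than applied mechanically.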

Even though Table 1 provides an illustration in which a majority of the
items were rejected because of serious statistical defects, it would be
wrong to conclude that the BPI item pool contained a large number of
defective items. The majority of the items were acceptable, but some
were better than others. When an item was otherwise acceptable in terms
of its statistical properties, it was subjected to further editorial review.
Editorial review was undertaken with at least four considerations in mind:
(a) to eliminate redundant content (e.g., two hypochondriasis items both
concerned with back pains); (b) to avoid ambiguous phrasing or content;
(c) to improve readability and brevity; and (d) to evaluate item relevance
to all subgroups within the broad population of potential respondents.
Additional item analyses were reviewed to evaluate the effects of the
changes introduced, to check on the reliability of previous item analyses,
and to appraise the stability and generalizability of item properties
across different populations.
Holden et al. (1983) reported separate BPI item factor analyses based on three samples: normal adults, psychiatric patients, and high school students. In this study BPI items showed excellent convergence with the scales on which they were keyed. In the three samples, 96, 99, and 98 percent of items loaded appropriately on their respective scales. No item loaded inappropriately across all three samples, and keyed items showed much higher loadings than non-keyed items or than would be expected by chance. The data from the study thus strongly supported the theoretical structure of the BPI and what Loevinger (1957) termed the structural component of validity. The BPI item structure replicated across samples differing in level of psychopathological involvement, age, and demographic distribution. These results thus support both the modern construct approach employed in scale construction and the psychometric properties of the BPI. A perspective on these findings may be gained from a comparison with findings from a similar item factor analysis of the MMPI (Reddon, Marceau, & Jackson, 1982). In the MMPI study six orthogonal item factors were identified, but for five of the six factors, items were more strongly associated with a general social desirability scale than with the factor with which they were purportedly identified.

Relationship of the BPI to the MMPI


Jackson and Hoffmann (1987) investigated the relationship between the BPI and the MMPI clinical scales by factor analyzing the scales of both instruments in a hospitalized psychiatric population. Five factors emerged, each with loadings from both BPI and MMPI scales. The factors make a good deal of clinical sense, as is evident from the following listing of salient loadings.

Factor 1
Impulse Expression .62
Interpersonal Problems .52
Pd .51
Anxiety .49
Alienation .46
Ma .46
Pt .40
D .38
K -.35
L -.67
Denial -.82

The scales showing the highest loadings on this factor are those that
identify persons who are likely to have trouble adapting to social or legal
codes of behavior, who are likely to show unstable employment histories,
and who are more likely than average to engage in interpersonal conflict.
This is the pattern of behavior that is commonly associated with the Pd-
Ma configuration in MMPI clinical lore. This configuration was found by
Loper, Kammeier, and Hoffmann (1973) to characterize the MMPI profiles
of University of Minnesota undergraduates who were later hospitalized
for alcoholism. BPI scales loading this factor support this interpretation.
Impulse Expression, implying undercontrol of impulses, Interpersonal
Problems, suggesting a propensity for conflict, and Alienation, with its
implication of deviant socialization with respect to dishonesty, theft, and
thrill-seeking, are consistent with an interpretation of the positive pole of
this factor as reflecting sociopathic or antisocial behavioral tendencies.
Its negative pole, defined by the BPI Denial scale and the MMPI L and K
scales, suggests overcontrol of impulses, defensiveness, and some im-
pression management. Social undercontrol is an apt label.

Factor 2
Si .60
Pt .56
Anxiety .49
K -.69

This factor highlights social withdrawal and anxiety, as opposed to a denial of anxiety and related disorders. Jackson and Hoffmann identified this factor as reflecting generalized social anxiety.
Factor 3
Hy .82
D .66
Hs .61
Hypochondriasis .45
Depression .39

Cognate scales from the BPI and MMPI load this factor. The Hs, D, Hy combination is the familiar "neurotic triad" of MMPI lore. Persons high on this factor tend to experience psychopathology in terms of negative affective states and other manifestations of depression coupled with somatic complaints, justifying the label of depression and somatization.

Factor 4
Sc .77
F .75
Persecutory Ideas .69
Ma .67
Thinking Disorder .63
Alienation .53
Deviation .51
Pt .51
Pa .49
Hypochondriasis .48
Hs .45
Interpersonal Problems .43
Pd .39
Impulse Expression .38
K -.40

The two highest loadings on this factor from the BPI, Persecutory
Ideas and Thinking Disorder, are associated with psychosis scales from
the MMPI. But there are also a number of other scales highly loaded,
suggesting that psychotic processes are accompanied by diverse other
forms of psychopathology in this population. It is noteworthy that the BPI
Deviation scale is loaded on this factor. The Deviation scale contains
diverse deviant content, rather than content pertaining to a unitary
dimension of psychopathology. Its appearance here suggests that hospi-
talized psychiatric patients experience or report many of these diverse
deviant behaviors. Jackson and Hoffmann labelled this dimension gener-
alized psychotic processes.
Factor 5
Social Introversion .75
Self Depreciation .72
Depression .62
Si .53
D .40
Alienation .38

According to Beck (1967), social withdrawal and low self-esteem are often associated with high levels of depression. It is not surprising to find these characteristics co-occurring in a sample of psychiatric patients. The label depressed withdrawal seems appropriate.
It should be noted that the above factor analytic results were obtained from a particular population, male psychiatric patients, most of whom were suffering from severe alcoholism. Although the five factors accounted for a major proportion of the variance, and the factors were relatively easy to interpret plausibly, it is not certain whether the same factors would emerge in different populations. Nevertheless, because BPI and MMPI scales are relatively highly correlated and share substantial common variance (Jackson & Hoffmann, 1987), it is clear that the BPI and the MMPI address the same domain of psychopathology.

Typologies
In addition to interpretations based upon individual scale scores, the
configurations of scores associated with an individual's test profile may
also be of interest and relevance. More specifically, the similarity of a
respondent's profile to a particular prototypical configuration or "modal
type" may provide the test user with additional pertinent information.
Using a sample of 352 provincial psychiatric hospital inpatients, Holden,
Fekken, and Jackson (1983) employed cluster analytic methods to iden-
tify six reliable profile "types" of psychiatric patients based on BPI scale
scores (see Table 2). This typology successfully classified over 75% of the patients. The descriptions that follow characterize each type and an exemplary patient. It should be noted, however, that any one patient will only approximate his or her "type" of profile.
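The chapter does not state the rule used to assign a patient to a type. As a hedged sketch, one common approach for profile typologies correlates a respondent's 11 scale scores with each modal profile in Table 2 and assigns the best-matching type. A case is left unclassified if no correlation reaches a minimum. The Pearson similarity measure and the `MIN_R` threshold of 0.50 are assumptions, and the function names are illustrative. The modal profiles are copied from Table 2, where each B type equals 100 minus its A type. Because of this, a profile's correlation with a B type is exactly the negative of its correlation with the matching A type, which is the "reflection" described for Types IA and IB.

```python
"""Assign a BPI profile to the nearest Table 2 modal type by correlation.

Assumptions (not from the chapter): profile similarity is a Pearson
correlation across the 11 scales, and profiles whose best correlation
falls below MIN_R are left unclassified.
"""
from math import sqrt
from typing import Dict, List, Optional, Sequence, Tuple

# Scale order: Hyp, Dep, Den, InP, Aln, PsI, Anx, ThD, Imp, SoI, SDp.
MODAL_PROFILES: Dict[str, List[float]] = {
    "IA":   [44, 41, 77, 52, 58, 50, 38, 49, 45, 51, 46],
    "IB":   [56, 59, 23, 48, 42, 50, 62, 51, 55, 49, 54],
    "IIA":  [34, 55, 43, 64, 57, 41, 43, 39, 52, 64, 58],
    "IIB":  [66, 45, 57, 36, 43, 59, 57, 61, 48, 36, 42],
    "IIIA": [51, 61, 62, 41, 33, 41, 58, 44, 41, 60, 58],
    "IIIB": [49, 39, 38, 59, 67, 59, 42, 56, 59, 40, 42],
}

MIN_R = 0.50  # hypothetical minimum similarity for classification


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a flat profile has no shape to correlate")
    return sxy / sqrt(sxx * syy)


def classify(profile: Sequence[float],
             min_r: float = MIN_R) -> Tuple[Optional[str], float]:
    """Return (best-matching type or None, its correlation)."""
    scores = {t: pearson(profile, m) for t, m in MODAL_PROFILES.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_r:
        return None, scores[best]
    return best, scores[best]


# Hypothetical patient profile (same scale order as above).
patient = [46, 40, 75, 55, 60, 48, 36, 50, 44, 52, 45]
best_type, r = classify(patient)
print(best_type, round(r, 3))
```

Correlation ignores a profile's overall level and spread. For that reason, the within-profile scaling to a mean of 50 and a standard deviation of 10 noted under Table 2 does not change which type a profile matches most closely.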
TYPE IA represents a cluster of patients associated with a modal profile elevated on the BPI Denial and Alienation scales. Patient #155 (Figure 1) was a high scorer on this type. This patient was a divorced woman in her early twenties with a grade 8 education. She had a preliminary diagnosis of Immature Personality Disorder. Casebook symptoms indicated a history of assaultive behavior, alcohol abuse, and attention-seeking but ineffective suicidal attempts.

Table 2
Basic Personality Inventory Modal Profile Types

                                    Type
BPI Scale                 IA   IB   IIA  IIB  IIIA IIIB
Hypochondriasis           44   56   34   66   51   49
Depression                41   59   55   45   61   39
Denial                    77   23   43   57   62   38
Interpersonal Problems    52   48   64   36   41   59
Alienation                58   42   57   43   33   67
Persecutory Ideas         50   50   41   59   41   59
Anxiety                   38   62   43   57   58   42
Thinking Disorder         49   51   39   61   44   56
Impulse Expression        45   55   52   48   41   59
Social Introversion       51   49   64   36   60   40
Self Depreciation         46   54   58   42   58   42

Note: Modal profiles are scaled to have a mean of 50 and a standard deviation of 10. The Deviation scale, a scale composed of heterogeneous critical items, was not used in the derivation of the types.
TYPE IB is characterized by a BPI profile that is the reflection of TYPE IA. TYPE IB is a cluster of patients with BPI scale elevations on Anxiety, Depression, Hypochondriasis, and Impulse Expression. Patient #61 (Figure 1), a single woman in her early twenties with a grade 10 education, was a high scorer on this type. She was diagnosed as a Neurotic Depressive and had a casebook history of symptoms of depression, somatic complaints, insomnia, and suicidality.
Elevations on the BPI scales of Interpersonal Problems, Social Introversion, Self Depreciation, and Depression mark the configuration associated with modal profile TYPE IIA. Patient #62 (Figure 2) scored high on this type. This patient was a married man in his early fifties who had completed high school. Preliminarily diagnosed with Schizoaffective Disorder, this individual had recorded symptoms of solitary brooding, mania, depression, insomnia, and assaultive behavior.
The profile associated with TYPE IIB is marked by high scores on the BPI scales of Hypochondriasis, Thinking Disorder, Persecutory Ideas, Anxiety, and Denial. Patient #8 (Figure 2), a divorced woman in her late thirties with a grade 12 education, represented a high scorer on this type. This patient was diagnosed with Schizophreniform Disorder, and her casebook reported symptoms of delusions, mania, assaultive behavior, nonmedical drug use, and suicidality.

[Figure 1A: BPI profile of Patient #155 (Modal Type IA); standard scores (axis 10 to 110) plotted for each BPI scale.]
Note. Casebook symptoms of assaultive behavior, alcohol abuse, and attention-seeking. Preliminary diagnosis of Immature Personality Disorder.

[Figure 1B: BPI profile of Patient #61 (Modal Type IB); standard scores (axis 10 to 110) plotted for each BPI scale.]
Note. Casebook symptoms of depression, somatic complaints, and suicidality. Preliminary diagnosis of Neurotic Depression.

Figure 1A & B
BPI Profiles of Patients Representing Modal Type IA and Modal Type IB

[Figure 2A: BPI profile of Patient #62 (Modal Type IIA); standard scores (axis 10 to 110) plotted for each BPI scale.]
Note. Casebook symptoms of solitary brooding, mania, depression, insomnia, and assaultive behavior. Preliminary diagnosis of Schizoaffective Disorder.

[Figure 2B: BPI profile of Patient #8 (Modal Type IIB); standard scores (axis 10 to 110) plotted for each BPI scale.]
Note. Casebook symptoms of delusions, mania, assaultive behavior, nonmedical drug use, and suicidality. Preliminary diagnosis of Schizophreniform Disorder.

Figure 2A & B
BPI Profiles of Patients Representing Modal Type IIA and Modal Type IIB
TYPE IIIA is characterized by high scores on the BPI scales of Denial, Depression, Social Introversion, Self Depreciation, and Anxiety. A married woman in her late forties with a grade 9 education (Patient #88; Figure 3) was most similar to this modal type. Her casebook indicated symptoms of depression and insomnia, and the patient was diagnosed with Schizophrenia in Remission.
BPI scale elevations for Alienation, Interpersonal Problems, Persecutory Ideas, Impulse Expression, and Thinking Disorder characterize TYPE IIIB. Patient #108 (Figure 3), a single woman in her early twenties with a grade 10 education, scored very high in similarity to this type. Her preliminary diagnosis was Schizoaffective Disorder, and her history indicated the presence of delusions, mania, depression, and reports of visual hallucinations.
Morey, Blashfield, and Skinner (1983) have suggested that any replicable typology or clustering of persons should be subjected to an external validation phase in which variables not used in developing the typology are examined. We now report such an external validation for the sample of 352 psychiatric patients, using patient symptom data collected from case files, admission and discharge summaries, social work assessments, and psychological assessment reports. Although the limitations of case records are recognized (Strauss & Harder, 1981), the presence or absence of each of the following symptoms was recorded for each patient and used in our external validation: hallucinations, delusions, mania, depression, anxiety, somatic complaints, insomnia, anorexia, assaultive behavior, alcoholism, nonmedical drug use, and suicidality. The six identified types were then compared with regard to the frequency of occurrence of these psychiatric symptoms. Significant type differences in the frequency of hallucinations, delusions, depression, anxiety, insomnia, assaultive behavior, and suicidality (see Figures 4 to 10) support the validity of differentiating among these BPI types.
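The figure notes below report chi-square statistics without degrees of freedom. Assuming each comparison is a presence/absence by modal type contingency table (2 rows by 6 types), df = (2 - 1)(6 - 1) = 5. The sketch below computes a Pearson chi-square from a table of counts and the upper-tail p value of a statistic. It uses only the Python standard library, via the series expansion of the regularized incomplete gamma function. This lets a reader check the reported significance levels under that df assumption. The function names are illustrative, and no patient counts are reproduced here.

```python
"""Upper-tail chi-square p values using only the standard library.

Assumption: each symptom comparison in Figures 4 to 9 is a 2 x 6
(presence/absence by modal type) contingency table, so df = 5.
"""
import math
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square for an r x c table of observed counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma P(a, z), a = df/2, z = x/2.
    Adequate for the moderate statistics used here."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)...(a+n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


# Statistics and significance levels as reported in Figures 4 to 9.
REPORTED = {
    "Hallucinations (Figure 4)": (17.55, 0.01),
    "Delusions (Figure 5)": (21.23, 0.01),
    "Depression (Figure 6)": (19.83, 0.01),
    "Anxiety (Figure 7)": (16.15, 0.01),
    "Insomnia (Figure 8)": (11.13, 0.05),
    "Assaultive behavior (Figure 9)": (13.60, 0.05),
}

for label, (stat, alpha) in REPORTED.items():
    p = chi_square_sf(stat, df=5)
    print(f"{label}: chi2(5) = {stat:.2f}, p = {p:.4f}, p < {alpha}: {p < alpha}")
```

Under the df = 5 assumption, all six reported statistics fall below their stated significance levels. The two results reported as p < .05 land between .01 and .05, so the figure notes are internally consistent.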

Research Applications
The Basic Personality Inventory has been used in a variety of research programs and is being incorporated into a growing number of empirical investigations. Some of the more extensive research applications are described below.
Health Psychology. The influence of psychological factors on physi-
cal well-being represents a rapidly developing area in the health
188 RONALD R. HOlDEN AND DOUGlAS N. JACKSON

110

100

9U
,...Q,l 80
0
u
rn 70
,...
"0
60
ell
so
....=
"0
ell
40
rn
3Q

20
10
z z ..,z z
so
.:I t:>e>: ...:IZ "-'Z
~~

~
~~
~~ 5 uo"'
II
0 0 .JO 0
0~
<C
;;; ~ ~!:! I=
.. ~~~
l:lr::
t;B o..:
~ "'Q
::!"'
..,o ~... "'~ 0 <C
>
=
0 1<1 ~g; "'E 0
"'.....: "'
Q

g ~
1<1
...
Q <C
u
0 "'
Q

~
=
Note. Casebook symptoms of depression and insomnia. Preliminary diagnosis of
Schizophrenia in Remission.
110

100

90
Q,l
r... 80
0
u
rn 70
,...
"0
60
ell

-
El
"0
c: so
ell
40
rn
30

20

10
1!1 z .:I z >"' t:>e>: wz ...:IZ: "'Z z

~ ~~
-'~"'
-<::!: ..:< -"0
..,5l ~
0 0

..
~~
zw ~0
t:l!=
~ "'
0...:1 ~z ~8 151Q
0 ~
~... "'"'
..:o ~
..
Q <nl<l
~ u >
lSI'<
~ ...
~ .."' l:l
~<~..: 1<1 0 1<1
"'
0
:z: ::! Q
u
Q
"' ~ 1<1
0
~ ~ Q

=
Note. Ca sebook symptoms of delusions, mania, depression, and visual
hallucinations. Preliminary diagnosis of Schizoaffective Disorder.

Agure3A&B
BPI Profiles of PaUents RepresenUng Modal Type IliA and Modal Type mB
BASIC PERSONAU1Y INVENTORY 189

so
45

40

35

-
Q)
bO 30
~
c:
Q) 25
u
s..
Q) 20
~
15

10

0
IA In IIA un IliA nm
Note. Presence of case-recorded hallucinatory symptoms as a function of Modal
type. Frequency is displayed as a percentage of patients within each Modal
Type. Frequency ranged from 10% of patients classified as Modal Type IIA
to 46% of patients classified as Modal Type liB. Types differ significantly
(x2 = 17.55, p < .01).
Figure 4

60

50

-
Q) 40
bJl
~
c:
Q) 30
u
s..
Q)
~ 20

10

0
lA ID llA 118 lilA lUll
Note. Presence of casebook-recorded delusional symptoms as a function of
Modal Type. Frequency is displayed as a percentage of patients within each
Modal Type. Frequency ranged from 19% of patients classified as Modal
Type IliA to 59% of patients classified as Modal Type IIIB. Types differ
significantly (x2 = 21.23, p < .01).
Figure 5
190 RONAlD R. HOlDEN AND DOUGlAS N. JACKSON

70

60
Q,)
ell so
....
~

=
Q,)
(,J
s...
40

Q,)
30
~

20

10

0
lA IB IIA IIIJ lilA IIIB
Note. Presence of casebook-recorded depressive symptoms as a function of
Modal Type. Frequency Is displayed as a percentage of patients within each
Modal Type. Frequency ranged from 41% of patients classified as Modal
Type 1118 to 76% of patients classified as Modal Type lilA Types differ
significantly (XZ = 19.83, p < .01).
F1gure 6
40

JS

30

Q,)
OD 2S
....
~

=
Q,)
u
s...
2.0

Q,)
~ IS

10

0
IA lB IIA liD IDA IIIJ
Note. Presence of casebook-recorded anxiety symptoms as a function of Modal
Type. Frequency Is displayed as a percentage of patients within each Modal
Type. Frequency ranged from 9% of patients classified as Modal Type lllB to
35% of patients classified as Modal Type 18. Types differ significantly
(XZ = 16.15, 2p < .01).
flgure 7
BASIC PERSONAUTY INVENTORY 191

40

35

JO

Q) 25
t:l.O
.....
(1:1
r=
Q) 20
v
....
Q)
~ 15

10

0
lA lU UA liD IUA lllll
Note. Presence of casebook-recorded Insomnia symptoms as a function of Modal
Type. Frequency Is displayed as a percentage of patients within each Modal
Type. Frequency ranged from 7% of patients classified as Modal Type IIIB to
30% of patients classified as Modal Type IliA. Types differ significantly
(x2 = 11.13, p < .05).
FigureS

professions. Burton and his colleagues (Burton, Canzona, Wai, Holden,
Conley, & Lindsay, 1983; Burton & Holden, 1981; Burton, Kline, Lindsay,
& Heidenheim, 1986; Holden, Burton, & Conley, 1981; Richmond, Lindsay,
Burton, Conley, & Wai, 1982) have used the BPI to examine psychological
variables related to patient adjustment to continuous ambulatory perito-
neal dialysis. Using separate outcome measures based upon patients'
self-report, nephrologists' evaluations, and medical team ratings, the BPI
was found capable of discriminating between patients who adjusted well
and those who adjusted poorly to the dialysis procedure (Burton et al.,
1983). Furthermore, the BPI could significantly discriminate between
survivors and nonsurvivors among chronic renal failure patients undergoing
home dialysis (Burton et al., 1986). Burton et al. (1983) suggest that
psychological assessment such as is provided by the BPI may assist
health professionals in selecting and monitoring patients for their adjust-
ment to this form of kidney dialysis.
Alcoholism and Substance Abuse. The BPI has been employed in
numerous investigations examining the use of alcohol and drugs (Morey,
Skinner, & Blashfield, 1984; Ogborne, 1987; Skinner, 1979, 1981, 1982;

[Figure 9 graphic: percentage of patients (0 to 40) by Modal Type (IA, IB, IIA, IIB, IIIA, IIIB)]
Note. Presence of casebook-recorded assaultive behavior as a function of Modal
Type. Frequency is displayed as a percentage of patients within each Modal
Type. Frequency ranged from 8% of patients classified as Modal Type IIIA to
35% of patients classified as Modal Type IIA. Types differ significantly
(χ² = 13.60, p < .05).
Figure 9

Skinner & Allen, 1982, 1983). Morey et al. (1984) identified three distinct
types of alcohol abusers (early-stage problem drinkers; affiliative, mod-
erate alcohol dependents; schizoid, severe alcohol dependents) who
could subsequently be differentiated on the basis of BPI scale scores.
From this, a speculative model of alcohol abuse has been proposed with
differential treatment hypothesized as a function of the type of alcohol
abuse.
Juvenile Delinquency. Jaffe and his associates (Austin, Leschied,
Jaffe, & Sas, 1986; Jaffe, Leschied, Sas, & Austin, 1985; Jaffe, Leschied, Sas,
Austin, & Smiley, 1985; Leschied, Austin, & Jaffe, 1988; Sas & Jaffe, 1986;
Sas, Jaffe, & Reddon, 1985) have explored the utility of using the BPI with
young offenders. Jaffe, Leschied, Sas, Austin, and Smiley (1985) demon-
strated that the BPI could indicate the presence of previous delinquency,
the nature of the offending charge (person/property vs status), the
presence of school misbehavior, the type of previous residence referral
(home vs detention), and could predict subsequent court reappearance.
Jaffe, Leschied, Sas, and Austin (1985) suggest that clinical services to
juvenile court systems might appropriately incorporate personality test-
ing, such as provided by the BPI, in their assessment programs.

[Figure 10 graphic: percentage of patients (0 to 50) by Modal Type (IA, IB, IIA, IIB, IIIA, IIIB)]
Note. Presence of casebook-recorded suicidal symptoms as a function of Modal
Type. Frequency is displayed as a percentage of patients within each Modal
Type. Frequency ranged from 24% of patients classified as Modal Type IIB
to 57% of patients classified as Modal Type IIA or Modal Type IIIA. Types
differ significantly (χ² = 16.11, p < .01).
Figure 10
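The χ² values in the notes to Figures 4 through 10 can be checked against their stated significance levels. The notes do not give degrees of freedom, so the sketch below assumes each test is a Pearson χ² on a 2 × 6 table (symptom recorded or not, by the six Modal Types), which gives (2 - 1)(6 - 1) = 5 degrees of freedom. It computes upper-tail probabilities from the standard closed-form series for odd degrees of freedom using only Python's standard library; the function name `chi2_sf_odd_df` is hypothetical.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with odd df.

    Uses the closed form for odd degrees of freedom:
        Q = erfc(sqrt(x/2)) + 2*phi(c) * sum_{r=1}^{(df-1)/2} c**(2r-1) / (1*3*...*(2r-1)),
    where c = sqrt(x) and phi is the standard normal density.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("df must be a positive odd integer")
    if x <= 0:
        return 1.0
    c = math.sqrt(x)
    tail = math.erfc(c / math.sqrt(2.0))  # equals 2 * P(Z > c) for standard normal Z
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(c)
    term, total = c, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # c**(2r-1) / (1*3*...*(2r-1))
        total += term
    return tail + 2.0 * density * total


# Assumed design: 2 (recorded / not recorded) x 6 Modal Types -> (2-1)*(6-1) = 5 df.
DF = 5

# (figure, reported chi-square, stated significance level), from the figure notes.
REPORTED = [
    ("Figure 4", 17.55, 0.01),
    ("Figure 5", 21.23, 0.01),
    ("Figure 6", 19.83, 0.01),
    ("Figure 7", 16.15, 0.01),
    ("Figure 8", 11.13, 0.05),
    ("Figure 9", 13.60, 0.05),
    ("Figure 10", 16.11, 0.01),
]

if __name__ == "__main__":
    for figure, chi2, alpha in REPORTED:
        p = chi2_sf_odd_df(chi2, DF)
        print(f"{figure}: chi2 = {chi2:.2f}, p = {p:.4f}, p < {alpha}: {p < alpha}")
```

Under the 5-df assumption, every reported value clears its stated threshold. Insomnia (χ² = 11.13) only just exceeds the .05 critical value of about 11.07, and assaultive behavior (χ² = 13.60) falls between the .05 and .01 critical values, consistent with the p < .05 given in those notes.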

Microcomputerization. The BPI has also been employed in examining
the area of microcomputer-administered testing. Holden et al. (1990),
using a sample of psychiatric inpatients, demonstrated that BPI means,
standard deviations, internal consistencies, and validities were not af-
fected by the microcomputerization of the test. Other research with an
automated BPI (Holden & Fekken, 1987, 1988; Holden, Fekken, & Cotton,
1991) has shown promising results for the psychometric properties of BPI
test item decision times. Specifically, the differential BPI test item re-
sponse latencies of psychiatric patients have been shown to relate to
clinical criteria. For example, Figure 11 demonstrates that patients who
are clinically rated as being more depressed endorse BPI Depression
items more rapidly (relative to the patient and the item) than less
depressed patients (Holden et al., 1991). Complementarily, depressed
patients reject BPI Depression items more slowly than do patients showing
less clinically rated depression.
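Latencies expressed "relative to the patient and the item" can be produced by double standardization. The sketch below is one such procedure under stated assumptions: each respondent's latencies are first z-scored across items (removing general slowness), and each item is then z-scored across respondents (removing item effects such as length). The order of the two steps, the use of the population SD, the data, and the function names `double_standardize` and `mean_latency_by_response` are all hypothetical illustrations, not the procedure published by Holden et al.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a list of numbers (population SD); all zeros if constant."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def double_standardize(latencies):
    """latencies[p][i] is respondent p's decision time on item i.

    Step 1 (assumed order): z-score within each respondent, removing overall speed.
    Step 2: z-score within each item, removing item effects such as item length.
    """
    within_person = [standardize(row) for row in latencies]
    n_people, n_items = len(within_person), len(within_person[0])
    columns = [standardize([row[i] for row in within_person]) for i in range(n_items)]
    # Transpose the item columns back into respondent-by-item rows.
    return [[columns[i][p] for i in range(n_items)] for p in range(n_people)]


def mean_latency_by_response(z, responses):
    """Mean standardized latency for endorsed (True) and rejected (False) responses."""
    endorsed = [z[p][i] for p, row in enumerate(responses) for i, r in enumerate(row) if r]
    rejected = [z[p][i] for p, row in enumerate(responses) for i, r in enumerate(row) if not r]
    return mean(endorsed), mean(rejected)


# Hypothetical data: 4 respondents x 3 Depression items (seconds; True = endorsed).
LATENCIES = [
    [2.0, 3.5, 1.8],
    [4.1, 5.0, 3.9],
    [1.2, 2.4, 1.0],
    [3.0, 3.1, 2.2],
]
RESPONSES = [
    [True, False, True],
    [False, False, True],
    [True, True, False],
    [False, True, True],
]

if __name__ == "__main__":
    z = double_standardize(LATENCIES)
    print(mean_latency_by_response(z, RESPONSES))
```

Grouping respondents by their clinical depression rating and computing the endorsed and rejected means within each group would give the kind of comparison plotted in Figure 11.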
Other Research. Other research applications employing the BPI
include studies of alexithymia (Bagby, Taylor, & Ryan, 1986), eating
disorders (Chandarana, Helmes, & Benson, 1988), certain symptoms of
psychopathology (Helmes & Barilko, 1988), test construction strategies

[Figure 11 graphic: mean standardized response latency (-0.09 to 0.30) for Endorsed Items and Rejected Items, by score on clinical rating of depression (12-33, 34-58, 59-83, 83-108)]

Figure 11
Mean Standardized Response Latencies as a Function of Clinical Rating of Depression

(Holden, 1989; Holden & Fekken, 1990; Holden & Jackson, 1985; Holden et
al., 1983), the detection of dissimulation (Bagby, Gillis, & Dickens, 1990;
Helmes & Holden, 1986), person consistency (Holden, Helmes, Fekken, &
Jackson, 1985), clinical judgment (Jackson, MacLennan, Erdle, Holden,
LaLonde, & Thompson, 1986), stressful life events (Lei & Skinner, 1980),
and pain perception (Scudds, Rollman, Harth, & McCain, 1987).

Conclusions
The BPI, focusing on the measurement of constructs underlying the
MMPI, assesses a traditional domain of psychopathology. However,
unlike the MMPI, which was originally constructed on the basis of a class
model of all-or-none psychopathological categorizations, the BPI repre-
sents a modern, dimensional model of assessment for quantification
along various continua of psychological dysfunctioning. Such an ap-
proach is advantageous in that it is consistent with classical test theory.
Theoretically, the BPI incorporates an emphasis on important constructs
of psychopathology, fostering the development of both convergent and
discriminant validity (Campbell & Fiske, 1959). Rarely, if ever, have
multiphasic inventories of psychopathology been shown to provide

discriminantly valid scale scores. Indeed, many assessment questionnaires
developed prior to the explication of construct approaches to test
construction (e.g., Jackson, 1970, 1971) have paid little heed to issues of
response styles and independence of scales and their interpretations.
The field of assessment, however, may benefit as much from evidence
indicating what a scale does not measure as from data demonstrating
what it does measure.
State-of-the-art technology was used in the BPI's development. Ini-
tially, multimethod factor analytic techniques identified
psychopathological core content common to both the MMPI and the
Differential Personality Inventory (Jackson & Messick, 1986). At a later
stage, final BPI test item selection was guided by consideration of an Item
Efficiency Index (Neill & Jackson, 1976) that differentially weighted factor
loadings for an item on its own keyed dimension compared with those for
nonkeyed dimensions in order to produce minimally intercorrelated
scales.
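The chapter does not give the Item Efficiency Index formula, so the sketch below uses a hypothetical stand-in that captures the stated idea: reward an item's loading on its own keyed factor and penalize its loadings on nonkeyed factors. The score used here (squared keyed loading minus the mean squared nonkeyed loading), the example loadings, and the names `efficiency_index` and `select_items` are assumptions for illustration, not Neill and Jackson's (1976) index.

```python
def efficiency_index(loadings, keyed):
    """Hypothetical item-efficiency score (NOT the published Neill & Jackson formula).

    loadings: factor loadings for one item, one entry per factor.
    keyed: index of the factor the item is keyed to.
    Returns the squared keyed loading minus the mean squared nonkeyed loading,
    so items that load on their own factor and not on others score highest.
    """
    own = loadings[keyed] ** 2
    others = [l ** 2 for j, l in enumerate(loadings) if j != keyed]
    penalty = sum(others) / len(others) if others else 0.0
    return own - penalty


def select_items(items, keyed, k):
    """items: dict mapping item name -> loadings; keep the k best items for `keyed`."""
    ranked = sorted(items, key=lambda name: efficiency_index(items[name], keyed), reverse=True)
    return ranked[:k]


# Hypothetical loadings on three factors; all items are keyed to factor 0.
ITEMS = {
    "a": [0.60, 0.10, 0.00],
    "b": [0.65, 0.60, 0.50],  # largest keyed loading, but strong cross-loadings
    "c": [0.50, 0.00, 0.10],
}

if __name__ == "__main__":
    print(select_items(ITEMS, keyed=0, k=2))
```

In this example item b has the largest keyed loading but is dropped because it also loads on the other two factors; screening out such cross-loading items is what keeps the resulting scales minimally intercorrelated.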
To date, empirical evidence provides considerable support for the
success of the BPI's method of construction. The test's scales are
internally consistent and temporally stable for both nonclinical and
clinical samples (Holden et al., 1990; Jackson et al., 1989), the factor
structure of the instrument supports its scoring key (Holden et al., 1983),
and scores provided by the test are valid predictors of external criteria
in student (Holden & Jackson, 1985) and psychiatric patient samples
(Holden et al., 1988, 1990).
The BPI is a relatively new instrument for the assessment of psycho-
pathology. To date, research results with the inventory have been highly
encouraging. In particular, the BPI has demonstrated much promise for
clinical use in the fields of health psychology and juvenile delinquency.
There is a need now for additional research to delineate further the utility
of the inventory in various other research and clinical settings.

References
Austin, G.W., Leschied, A.W., Jaffe, P.G., & Sas, L. (1986). Factor structure and
construct validity of the Basic Personality Inventory with juvenile offenders.
Canadian Journal of Behavioural Science, 18, 238-247.
Bagby, R.M., Gillis, J.R., & Dickens, S. (1990). Detection of dissimulation with the
new generation of objective personality measures. Behavioral Sciences and the
Law, 8, 93-102.
Bagby, R.M., Taylor, G.J., & Ryan, D. (1986). Toronto Alexithymia Scale: Relationship
with personality and psychopathology measures. Psychotherapy and
Psychosomatics, 45, 207-215.

Beck, A.T. (1967). Depression: Causes and treatment. Philadelphia: University of
Pennsylvania Press.
Blashfield, R.K. (1984). The classification of psychopathology. New York: Plenum.
Burton, H.J., Canzona, L., Wai, L., Holden, R.R., Conley, J., & Lindsay, R.H. (1983).
Life without the machine: A look at psychological determinants for successful
adaptation of patients on CAPD. In N. Levy (Ed.), Psychonephrology 2:
Psychological problems in kidney failure and their treatment. (pp. 159-172) New
York, NY: Plenum.
Burton, H.J., & Holden, R.R. (1981, August). Psychological factors related to
successful adaptation for chronic renal patients. Paper presented at the
American Psychological Association Annual Convention, Los Angeles.
Burton, H.J., Kline, S.A., Lindsay, R.M., & Heidenheim, A.P. (1986). The relationship
of depression to survival in chronic renal failure. Psychosomatic Medicine, 48,
261-269.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A., & Kaemmer, B. (1989).
Manual for the restandardized Minnesota Multiphasic Personality Inventory:
MMPI-2. An administrative and interpretive guide. Minneapolis: University of
Minnesota Press.
Cameron, N., & Magaret, A. (1951). Behavior pathology. Boston: Houghton-Mifflin.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by
the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Chandarana, P., Helmes, E., & Benson, N. (1988). Eating attitudes as related to
demographic and personality characteristics: A high school survey. Canadian
Journal of Psychiatry, 33, 834-837.
Conger, A.J., & Jackson, D.N. (1972). Suppressor variables, prediction, and the
interpretation of psychological relationships. Educational and Psychological
Measurement, 32, 579-599.
Faschingbauer, T.R. (1979). The future of the MMPI. In C.S. Newmark (Ed.), MMPI
clinical and research trends. (pp. 373-398) New York, NY: Praeger.
Hathaway, S.R., & McKinley, J.C. (1940). A multiphasic personality schedule:
I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Helmes, E., & Barilko, O. (1988). Comparison of three multiscale inventories in
identifying the presence of psychopathological symptoms. Journal of Personality
Assessment, 52, 74-80.
Helmes, E., & Holden, R.R. (1986). Response styles and faking on the Basic
Personality Inventory. Journal of Consulting and Clinical Psychology, 54, 853-859.
Holden, R.R. (1989). Disguise and the structured self-report assessment of
psychopathology: II. A clinical replication. Journal of Clinical Psychology, 45,
583-586.
Holden, R.R., Burton, H.J., & Conley, J. (1981). Psychological correlates of adjustment
to continuous ambulatory peritoneal dialysis: Normative and clinical data.
Unpublished manuscript.

Holden, R.R., & Fekken, G.C. (1987, August). Reaction time and self-report
psychopathological assessment: Convergent and discriminant validity. Paper
presented at the American Psychological Association Annual Convention, New
York.
Holden, R.R., & Fekken, G.C. (1988, June). Using reaction time to detect faking on
a computerized inventory of psychopathology. Paper presented at the Canadian
Psychological Association Annual Convention, Montreal, Canada.
Holden, R.R., & Fekken, G.C. (1990). Structured psychopathological test item
characteristics and validity. Psychological Assessment: A Journal of Consulting
and Clinical Psychology, 2, 35-40.
Holden, R.R., Fekken, G.C., & Cotton, D.H.G. (1990). Clinical reliabilities and
validities of the microcomputerized Basic Personality Inventory. Journal of
Clinical Psychology, 46, 845-849.
Holden, R.R., Fekken, G.C., & Cotton, D.H.G. (1991). Assessing psychopathology
using structured test item response latencies. Psychological Assessment: A
Journal of Consulting and Clinical Psychology, 3, 000-000.
Holden, R.R., Fekken, G.C., & Jackson, D.N. (1983, June). Diagnostic efficiency of
the Basic Personality Inventory. Paper presented at the Canadian Psychological
Association Annual Convention, Winnipeg, Canada.
Holden, R.R., Fekken, G.C., Reddon, J.R., Helmes, E., & Jackson, D.N. (1988).
Clinical reliabilities and validities of the Basic Personality Inventory. Journal of
Consulting and Clinical Psychology, 56, 766-768.
Holden, R.R., Helmes, E., Fekken, G.C., & Jackson, D.N. (1985). The
multidimensionality of person reliability: Implications for interpreting individual
test item responses. Educational and Psychological Measurement, 45, 119-130.
Holden, R.R., & Jackson, D.N. (1985). Disguise and the structured self-report
assessment of psychopathology: I. An analogue investigation. Journal of
Consulting and Clinical Psychology, 53, 211-222.
Holden, R.R., Reddon, J.R., Jackson, D.N., & Helmes, E. (1983). The construct
heuristic applied to the measurement of psychopathology. Multivariate
Behavioral Research, 18, 37-46.
Jackson, D.N. (1970). A sequential system for personality scale development. In
C.D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol.
2, pp. 61-96). New York: Academic Press.
Jackson, D.N. (1971). The dynamics of structured personality tests: 1971.
Psychological Review, 78, 229-248. (Reprinted as a Warner Modular Publication,
1973, 320, 1-20.)
Jackson, D.N. (1976). The Basic Personality Inventory. Port Huron, MI: Sigma
Assessment Systems.
Jackson, D.N., Helmes, E., Hoffmann, H., Holden, R.R., Jaffe, P.G., Reddon, J.R., &
Smiley, W.C. (1989). The Basic Personality Inventory Manual. Port Huron, MI:
Sigma Assessment Systems.

Jackson, D.N., & Hoffmann, H. (1987). Common dimensions of psychopathology
from the MMPI and the Basic Personality Inventory. Journal of Clinical Psychology,
43, 661-669.
Jackson, D.N., MacLennan, R.N., Erdle, S.W.P., Holden, R.R., LaLonde, R.N., &
Thompson, G.R. (1986). Clinical judgments of depression. Journal of Clinical
Psychology, 42, 136-145.
Jackson, D.N., & Messick, S. (1969). A distinction between judgments of frequency
and of desirability as determinants of response. Educational and Psychological
Measurement, 29, 273-294.
Jackson, D.N., & Messick, S. (1986). The Differential Personality Inventory. London,
Canada: Authors.
Jaffe, P.G., Leschied, A.W., Sas, L., & Austin, G.W. (1985). A model for the provision
of clinical assessments and service brokerage for young offenders: The London
Family Court Clinic. Canadian Psychology, 26, 54-Sl.
Jaffe, P.G., Leschied, A.W., Sas, L., Austin, G.W., & Smiley, W.C. (1985). The utility
of the Basic Personality Inventory in the assessment of young offenders.
Ontario Psychologist, 17, 4-11.
Lei, H., & Skinner, H.A. (1980). A psychometric study of life events and social
readjustment. Journal of Psychosomatic Research, 24, 57-S5.
Leschied, A.W., Austin, G.W., & Jaffe, P.G. (1988). Impact of the young offenders act
on recidivism rates of special needs youth: Clinical and policy implications.
Canadian Journal of Behavioural Science, 20, 322-331.
Levitt, E.E., & Duckworth, J.C. (1984). Minnesota Multiphasic Personality Inventory.
In D.J. Keyser & R.C. Sweetland (Eds.), Test critiques. (Vol. 1, pp. 466-472) Kansas
City: Test Corporation of America.
Livesley, W.J., Schroeder, M.L., & Jackson, D.N. (1989). A study of the factorial
structure of personality pathology. Journal of Personality Disorders, 3, 292-306.
Loevinger, J. (1957). Objective tests as instruments of psychological theory.
Psychological Reports, 3, 635-694.
Loper, R.G., Kammeier, M.L., & Hoffmann, H. (1973). MMPI characteristics of
college freshmen males who later became alcoholics. Journal of Abnormal
Psychology, 82, 159-162.
Meehl, P.E., & Rosen, A. (1955). Antecedent probability and the efficiency of
psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52,
194-216.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed.)
(pp. 13-103). New York: American Council on Education.
Morey, L.C., Blashfield, R.K., & Skinner, H.A. (1983). A comparison of cluster
analysis techniques within a sequential validation framework. Multivariate
Behavioral Research, 18, 309-329.
Morey, L.C., Skinner, H.A., & Blashfield, R.K. (1984). A typology of alcohol abusers:
Correlates and implications. Journal of Abnormal Psychology, 93, 408-417.

Neill, J.A., & Jackson, D.N. (1976). Minimum redundancy item analysis. Educational
and Psychological Measurement, 36, 123-134.
Office of Strategic Services, Assessment Staff. (1948). Assessment of men: Selection
of personnel for the Office of Strategic Services. New York: Holt, Rinehart &
Winston.
Ogborne, A.C. (1987). A note on the characteristics of alcohol abusers with
controlled drinking aspirations. Drug and Alcohol Dependence, 19, 159-164.
Reddon, J.R., & Jackson, D.N. (1989). Readability of three adult personality tests:
Basic Personality Inventory, Jackson Personality Inventory, and Personality
Research Form-E. Journal of Personality Assessment, 53, 180-183.
Reddon, J.R., Marceau, R., & Jackson, D.N. (1982). An application of singular value
decomposition to the factor analysis of MMPI items. Applied Psychological
Measurement, 6, 275-283.
Richmond, J.M., Lindsay, R.M., Burton, H.J., Conley, J., & Wai, L. (1982).
Psychological and physiological factors predicting the outcome of home
dialysis. Clinical Nephrology, 17, 109-113.
Sas, L., & Jaffe, P.G. (1986). Understanding depression in juvenile delinquency:
Implications for institutional admission policies and admission programs.
Juvenile and Family Court Journal, 37, 49-58.
Sas, L., Jaffe, P.G., & Reddon, J.R. (1985). Unravelling the needs of dangerous young
offenders: A clinical-rational and empirical approach to classification. Canadian
Journal of Criminology, 27, 83-96.
Scudds, R.A., Rollman, G.B., Harth, M., & McCain, G.A. (1987). Pain perception and
personality measures as discriminators in the classification of fibrositis.
Journal of Rheumatology, 14, 563-569.
Skinner, H.A. (1979). A multivariate evaluation of the MAST. Journal of Studies on
Alcohol, 40, 831-843.
Skinner, H.A. (1981). Primary syndromes of alcohol abuse: Their measurement
and correlates. British Journal of Addiction, 76, 63-76.
Skinner, H.A. (1982). The Drug Abuse Screening Test. Addictive Behaviors, 7,
363-371.
Skinner, H.A., & Allen, B.A. (1982). Alcohol dependence syndrome: Measurement
and validation. Journal of Abnormal Psychology, 91, 199-209.
Skinner, H.A., & Allen, B.A. (1983). Differential assessment of alcoholism: Evaluation
of the Alcohol Use Inventory. Journal of Studies on Alcohol, 44, 852-862.
Spielberger, C.D. (1983). State-Trait Anxiety Inventory (Form Y) Manual. Palo Alto:
Consulting Psychologists Press.
Strauss, J.S., & Harder, D.W. (1981). The Case Record Rating Scale: A method for
rating symptom and social function data from case records. Psychiatry Research,
4, 333-345.
CHAPTER 7

Assessment of Couples

Alan E. Fruzzetti and Neil S. Jacobson

The field of couple assessment, like so many other areas of clinical
assessment, is in the midst of substantial theoretical and practical
self-analysis and revision. Different theoretical orientations in this field have
co-existed mostly in isolation from one another, and advocates of each
orientation have maintained their own more-or-less standard approach
to assessment. These standard approaches generally have involved
following a rather detailed recipe of procedures. Thus, each theoretical
approach to couple assessment has been quite specific in terms of goals
and methods, albeit of a different flavor.
In recent years, however, not only new combinations of assessments
and new tools, but entirely new approaches to assessment have been
developed. Practitioners, researchers, and practitioner-researchers have
become interested in a more multi-level approach. Although this may
present some difficulty for the novice (standard assessment protocols
are fewer, options greater), the prospects for advancement in the field are
many: A conceptual and methodological shift may lead to even greater
convergence of couple assessment methods.
The purpose of this chapter is not to present yet another, specialized
four-star recipe for couple assessment, or some hybrid assessment
technique. Nor do we simply provide a compendium of psychometric

ALAN E. FRUZZETTI, Doctoral Candidate, Department of Psychology, University


of Washington, Seattle, Washington.
NEIL S. JACOBSON, Professor, Department of Psychology, University of
Washington, Seattle, Washington.
202 ALAN E. FRUZZETTI AND NEIL S. JACOBSON

assessment tools: other sources provide such resources with far greater
completeness than this space allows (e.g., Touliatos, Perlmutter, &
Straus, 1990). Rather, our purpose is to present a number of the common-
alities of the various assessment approaches currently utilized, and to
put particular, but not exclusive, emphasis on clinically relevant assess-
ment procedures. We augment this discussion with some suggestions for
increasing convergence in the field among researchers and practitioners
alike.
Our title, "Assessment of Couples," reflects the fact that married,
heterosexual couples are by no means the only couples who seek profes-
sional help nor are they the only ones who serve as the subjects of
couples' research. The assessment approach we describe is quite ap-
plicable to other types of couples, such as gay, unmarried, and/or
remarried couples. Every couple has its own particular history and
problems, which need to be examined, as we shall describe. However, the
commonalities among various types of couples far exceed their differ-
ences in terms of assessment needs.
We begin by examining the context for assessment (clinical, research,
or both), then explore the various levels of assessment (individual,
couple's presenting problems, micro-interactional, patterns of interaction,
etc.). Throughout the chapter we present assessment techniques (both
tried-and-true and the more novel) that may be useful in many settings
and for different applications. Finally, we conclude with a discussion of
newer methodological techniques that may help to increase further the
compatibility of assessment strategies not only among advocates of
particular theories, but also between what researchers and clinicians do.

Three Purposes of Assessment


Couple assessment methods, as in other areas of assessment, cannot
be divorced from the purpose of the assessment. Clearly the most
common reason for conducting a couple assessment is to inform clinical
interventions when a couple presents for therapy. Although therapists of
different orientations approach this assessment somewhat idiosyncratically,
most agree that (a) assessment is critical and (b) it needs to be quite
thorough, and therefore may take several individual or conjoint sessions
and may be augmented by self-report questionnaires or other procedures
that address individual or couple functioning.
Of course, couples may be part of a research assessment in a study
investigating basic relational processes and issues. Often, research as-
ASSESSMENT OF COUPLES 203

sessments include many labor- and data-intensive procedures, such as


videotaping couple interactions and coding them on a number of di-
mensions, and multiple paper and pencil tests of individual and couple
functioning across a wide spectrum of areas. These protocols may occur
before any face-to-face interviews are conducted, but in some instances
have even supplanted direct interviewing. In longitudinal studies these
same protocols are repeated, often several times, depending on the
design of the study.
Finally, there are assessment batteries that are determined by both
clinical and research needs. In such cases the individual practitioner may
seek some kind of independent measure of relationship change over the
course of therapy. This may be for quality assurance purposes and/or as
a method of investigating change (or a lack of it) with a single couple
(Barlow, Hayes, & Nelson, 1984). Or, it may be to evaluate treatment
effectiveness (and/or the process of change) in a large outcome study
(Jacobson, Follette, & Elwood, 1984). Either way, these situations present
opportunities for collaboration and convergence between clinical and
research needs.
Superficially it may seem that clinical and research assessments have
much in common. Unfortunately, this is not necessarily the case. Just as
a wide gap exists between clinical practice and clinical research (Barlow,
1981a, 1981b; Cohen, Sargent, & Sechrest, 1986; Morrow-Bradley & Elliott,
1986), there also has been quite a chasm between the kinds of assessments
utilized by researchers and practitioners. For example, clinical assessment
typically is concerned with understanding the couple in the overall
context of their lives: what problems they might have, and what factors
and relational processes seem to be causing or maintaining difficulties. In
contrast, research assessments have been more interested in classifying
couples according to typologies determined by a particular theory, and
then investigating differences between types of couples.
Fortunately, however, new developments in the approach to assess-
ment, if not the specific techniques themselves, suggest some
rapprochement between basic and applied researchers and clinicians.
More and more researchers are conceptualizing their assessments of
couples according to particular levels of analysis (e.g., examining general
patterns of interaction) that coincide with the approach that couple-
oriented clinicians are moving toward as well. And clinicians and
researchers alike are employing more multi-level assessment procedures.
There seems to be great potential for convergence, especially between
general clinical assessments and assessments conducted as part of
treatment outcome and therapy process research.

Levels of Assessment
Many different factors influence couple functioning and thus are
important dimensions for assessment. These include individual function-
ing, historical factors, relationship issues, larger systemic or contextual
factors such as socio-economic status, employment, extended family
issues, and interactional processes. We explore different levels of as-
sessment relevant to a couple's relationship, when they are most
important, and the various means available for assessing within these
dimensions. The distinctions between levels are necessarily arbitrary:
individual factors influence couple interaction patterns, which in turn
affect which relationship issues are salient, and so on. Thus, what is
now to be presented is more a general and practical set of levels or
dimensions of assessment than a rigorously defined theoretical or hier-
archical system. Practitioners from different theoretical perspectives
will likely emphasize different levels of assessment among those presented
below.

Individual factors
A host of personal or individual factors that each partner brings to a
relationship can affect the quality and process of their relationship. Here
we identify several of the factors that we believe are essential to explore
as part of any assessment of a couple.
Assessing individual psychopathology of either (or both) partners
provides a first step. Although the presence of severe individual dys-
function may not preclude couples therapy, knowledge of any individual
problems and how they affect or are affected by the relationship is
essential. Depression is perhaps the most common individual problem
that coexists with marital distress, and is a problem that often remits
along with a reduction in marital discord (Jacobson, Dobson, Fruzzetti,
Schmaling, & Salusky, in press; O'Leary & Beach, 1990). In addition to
individual structured or unstructured interviewing, depression may be
economically assessed using the Beck Depression Inventory (Beck, Rush,
Shaw, & Emery, 1979), or some other appropriate screening device. With
moderate or severe depression, of course, careful attention must be given
to suicidality. With a client who is severely depressed and has high levels
of suicidal ideation (or has a suicide plan with few deterrents), individual
treatment may be indicated either prior to or as an adjunct to marital
therapy.
Similarly, any substance abuse or dependence on the part of either or
both partners must be identified. This may best be accomplished by a
combination of a written screening tool (e.g., the MacAndrew scale of the

MMPI-2) and follow-up questions in an individual interview. As with any
significant problem of individual functioning, the effect of the partners'
relationship difficulties on the individual's substance abuse and vice
versa should be explored. In some cases (depending on a therapist's
training, comfort, and/or theoretical preferences), remediation of the
individual problem prior to any couple-focused intervention may be
indicated. However, the treatment of such individual problems is often
integrated as a part of any intervention with the couple (e.g., Jacobson,
Holtzworth-Munroe, & Schmaling, 1989; Stanton & Todd, 1982).
Any history of other severe psychological problems also must be
ascertained: psychiatric hospitalizations, current or prior use of psycho-
tropic medications, criminal history, or any evidence of psychosis. Again,
while the current or historical presence of any of these factors may not
necessarily preclude couple interventions, they may indicate the need to
initiate adjunctive treatments or to postpone work with the couple until
individual problems are stabilized or resolved. At the very least this
knowledge affects treatment and subsequent assessment planning. It is
important to note, however, that severe emotional or behavioral prob-
lems in one or both partners are related to poorer outcome in marital
therapy (Jacobson, Berley, Melman, Elwood, & Phelps, 1985).
Many clinicians and researchers alike rely on the MMPI (Hathaway &
McKinley, 1983) or the MMPI-2 (Weed & Butcher, this volume) to screen
for a wide array of individual problems, especially when some of the test's
secondary scales are employed. It should be noted, however, that marital
distress may lead to false-positive elevations on the Pd scale for both
husbands and wives (Fruzzetti, Whisman, & Jacobson, 1990), so elevations
on this scale should be interpreted with caution.
Additionally, it is important to assess competence in certain individual
skill areas. For example, basic verbal skills are essential to communica-
tion and problem-solving (e.g., Jacobson & Margolin, 1979), both of which
are essential components of therapy for couples. Similarly, the ability to
express emotion may be an important prerequisite for many types of
couple interventions (e.g., Greenberg & Johnson, 1988).
Patterns of blaming the partner for couple difficulties, and the attribu-
tions individual partners make about the other's behavior are also of
interest (e.g., Baucom, Epstein, Sayers, & Sher, 1989). In addition, partners'
beliefs about their viability as a couple, and the confidence they have in
their ability as a couple to work through their problems, may be impor-
tant determinants of their commitment to (and success in) couples
therapy. Notarious and Vanzetti (1983) have called the phenomenon in
which partners believe and act as though they will resolve their difficulties
"relational efficacy," which may be a good predictor of outcome in
therapy (Fruzzetti, King, Whisman, & Jacobson, 1989). Of course, this
206 ALAN E. FRUZZETn AND NEILS. JACOBSON

variable is as much interactional (each partner's beliefs and behaviors
depending on the other's) as it is individual in nature.
It is also important to be aware of any medical problems of either
partner. For example, several studies have demonstrated a relationship
between chronic pain (e.g., Biglan & Thorsen, 1986), general somatic
complaints (Weiss & Aved, 1978), and couple difficulties. Thus, it may be
important to investigate any potential reciprocal influence between
medical and marital problems.
Finally, it is important to assess areas of individual strengths as well as
problems, whether for research or clinical purposes. Not only does an
exclusive focus on dysfunction blur the broader context of partners' lives
and make descriptions of them incomplete, but positive or neutral factors
may significantly influence and predict relationship processes and therapy
outcome.
Historical factors
The assessment of historical factors related to current relationship
problems generally involves broad screening followed up by more in-
tensive exploration of identified problem areas. Because historical factors
may operate on the individual, couple, or systemic level, this may seem
to be an enormous task in itself. However, only a handful of problem
areas are common, so screening can be fairly
straightforward. And again, attention should be paid to areas of strength
as well as to problem areas.
Historical factors include, but are not limited to: previous marriages,
relationships, or affairs of each partner; the kinds of relationships the
couple's parents had (including how the relevant partner experienced it
and how both partners think it affects their marriage); sibling and
extended family relationships; the couple's early relationship history
(including how they met, what initially attracted them to each other, and
some examples of shared positive experiences); what the couple has
tried to do to solve problems in the past (including what has worked, what
has not, and any previous couple therapy); each partner's sexual
history (including abuse in the family of origin or by previous partners);
and any developmental problems or particular achievements of either
partner. General relationship topics can be discussed conjointly, while
more sensitive topics should be followed up or explored individually.
Contextual factors
The context for a couple is their environment: partners are immersed
in these factors constantly and thus may not notice or pay attention to
ASSESSMENT OF COUPLFS 207

them. Couples may or may not identify contextual factors as related to
their relationship (or individual) problems, although often some of these
factors are also conflict topics. The basic question is: "What is their life
as a couple like?" This includes questions about how they spend their
time and how much of it is spent together; the quality of that time (what
they do and how they feel when they are together); from whom they get
support, and in what form; to whom they give support (and how); what
their sources of pleasure and recreation are as a couple and individually,
and how often these are available; and questions about instrumental
activities in their daily lives.
An exploration of contextual factors not only permits the assessor to
begin to understand what this couple's life is like (the moment-to-moment
pleasures and stressors), it also affords an opportunity to identify key
conflict issues and conflict processes (described below). Thus,
identifying and understanding contextual factors helps the assessor
determine how these various factors fit together to influence a couple's
presenting problems.
Certain contextual factors are quite common, and we present some of
the more typical ones below. However, the thrust of any assessment
should be to understand the particular couple in their world; thus the
following factors constitute only a beginning checklist in this process, not
an exhaustive one.
Children, whether living in the home or not, are perhaps the most
common contextual factor for couples (e.g., Margolin, 1981b). Children
may have been born to both partners, or just to one partner in a previous
relationship (or some mix). The couple may be considering (and disagreeing
about) whether to have children at all, or whether to have another
child, or they may be expecting a child. There may be issues associated
with the transition to parenthood. Or, there may be part-time custody
issues from previous relationships and divorced spouses to deal with.
In addition, there may be medical, behavioral, developmental, or
school problems with children. With older couples, their children may
have moved away, be currently experiencing relationship difficulties of
their own, be getting married, divorced, or having children of their own.
These developmental milestones may present couples with life-span
developmental crises of their own.
Thus, a couple may be influenced by parenting issues (who does what
with respect to childrearing), time spent with school or medical authorities,
or issues associated with childcare or their children's activities outside
the home. Or, a couple may face "empty-nest" issues. Children/parenting
factors may be a problem or a source of joy and comfort for a couple (or,
occasionally, not a factor at all). However, they usually provide a large
measure of the context in which couples live their lives and are therefore
an essential part of any assessment.
Issues associated with partners' own parents provide another av-
enue into understanding a couple's world. Are parents near or far? What
kind of relationship does each partner have with his or her own and the
partner's parents? If the couple has children, what kind of relationship do
the children have with their grandparents? Are the parents ill, requiring
physical care or financial or emotional support? What do the partners expect
from themselves and from each other concerning parents and "in-laws"?
Another important contextual factor concerns employment. This
may, depending on circumstances, involve issues of unemployment or
underemployment, job dissatisfaction or satisfaction, or limited time
together attributed to long working hours.
Employment factors often are associated as well with financial factors,
another important contextual variable. It is important not to underesti-
mate the potential impact that financial difficulties can have on a
relationship, especially in the context of raising children, or of medical
problems of either partner, a child, or aging parents, or of other costly
family responsibilities. Of course, relationship and individual problems
can create or complicate financial problems: the relationship between
financial and couple factors may not be one-directional, but instead may
be a reciprocal one.
Unfortunately, the cost of therapy itself can easily exacerbate financial
hardship for many couples. Therefore, it is important to view financial
difficulties as contextual factors relating to relationship difficulties, not
simply as a sign of resistance to therapy.
Finally, an often-overlooked contextual factor is the physical envi-
ronment in which the couple lives. What is their housing and neighborhood
like? Is it a place they both enjoy and in which they feel comfortable? Or,
is it physically close and cramped, constantly in need of attention
(assuming they do not enjoy the process of repair), and a place they
prefer not to spend time?
Although at first glance some of these contextual factors might seem to
present intractable problems, what is essential to discern is how they
bear on the relationship. For example, although having a child with
developmental disabilities or being faced with inadequate housing might
seem virtually impossible problems to solve, they relate in central ways
to the couple's experience. Moreover, these contextual factors often become
recurrent conflict issues in relationships and the themes around which
destructive patterns of interaction revolve. They should not be overlooked.
Relationship satisfaction and conflict topics

Identifying relationship conflict issues and general relationship
satisfaction is perhaps the simplest and most common form of couple
assessment in both clinical and research settings. Furthermore, relation-
ship dissatisfaction is either a primary or accompanying complaint of
most couples seeking therapy: how partners think and feel about their
relationship is of considerable importance and is an appropriate focus for
both clinicians and researchers (Baucom, 1983).
A list of conflict topics related to relationship satisfaction that should
be assessed includes (but should not be limited to): time together; levels
of expressed affection; sexual activity, satisfaction, and function/dys-
function; children and childrearing; parents and in-laws; lifestyle issues
and leisure time activities; finances; gender roles and who does what with
respect to household tasks; power and decision-making; and issues
related to each other's employment.
As with other dimensions, it is very important to note areas in which
the couple does not experience conflict. Identifying these areas of (at
least relative) strength helps to create a more complete view of the
couple.
Several useful self-report instruments have been developed to iden-
tify areas of conflict, dysfunction, and dissatisfaction in a fairly thorough
and efficient (for the assessor, at least) manner. Each questionnaire
measuring satisfaction and the content of relationship issues has its own
adherents. There are, indeed, many such instruments available. We
mention four specific ones below, chosen because of their high levels of
utilization in both clinical and research settings and because of their
documented psychometric properties. See Touliatos, Perlmutter, & Straus
(1990) for a more complete description of available instruments.
The "Dyadic Adjustment Scale" (DAS: Spanier, 1976) contains 32
items concerning areas of agreement/disagreement, affectional and
companionship behaviors, and general satisfaction. Both partners complete
the questionnaire, which takes from five to twenty minutes. Although the
DAS was initially considered to generate several subscales (cohesion,
satisfaction, consensus, and affectional expression), some recent efforts
have not consistently confirmed their separate validity (e.g., Eddy,
Heyman, & Weiss, 1990; Kazak, Jarmans, & Snitzer, 1988; Sharpley & Cross,
1982). At the very least, however, a primary factor of relationship
distress, with high convergent validity, has been found consistently.
The use of the DAS for research purposes, although extremely
common, is not without certain problems. For example, there is some
evidence that any classification of couples into distressed versus
nondistressed groups using the DAS has poor reliability (Eddy, Heyman,
& Weiss, 1990). In addition, the appropriate unit of analysis (partners
considered separately or averaged together) continues to be widely
debated (e.g., Baucom, 1983; Whisman, Jacobson, Fruzzetti, & Waltz,
1989). Averaging the partners' scores could mask disagreement: for
example, if a score of 97 is considered the cutoff between satisfied and
dissatisfied, what of a couple who averaged 102, with one partner scoring
72 and the other 132? Similar problems arise if difference scores are
employed. Thus, both individual and combined scores should be examined.
Some potential ways to solve these problems (for research purposes, at
least) are: a) consider both couple scores and separate individual scores
in two sets of analyses; b) employ each partner's score as a separate
univariate measure in analyses with multiple dependent measures; and
c) conservatively employ the scores of the partner who shows the smallest
change in repeated-measures analyses (Baucom, 1983; Jacobson, Follette,
& Elwood, 1984). The use of any of these approaches would enhance the
validity of the DAS as a measure of treatment outcome.
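The unit-of-analysis problem can be made concrete in a few lines of code. The sketch below is a hypothetical illustration, not a published DAS scoring procedure: it reports a couple's averaged score alongside each partner's own classification against the illustrative cutoff of 97 used above, and it reads approach c) as "use the change score of the partner whose score improved least." Treating a score strictly below the cutoff as distressed, and the function names, are our assumptions.

```python
# Hypothetical helpers contrasting couple-level and individual-level DAS scores.
# The cutoff of 97 is the illustrative value from the text, not a fixed norm.
DAS_CUTOFF = 97


def describe_couple(score_a: float, score_b: float, cutoff: float = DAS_CUTOFF) -> dict:
    """Summarize one couple at both the couple and the individual level."""
    couple_mean = (score_a + score_b) / 2
    return {
        "couple_mean": couple_mean,
        # Assumption: a score strictly below the cutoff counts as distressed.
        "couple_distressed": couple_mean < cutoff,
        "a_distressed": score_a < cutoff,
        "b_distressed": score_b < cutoff,
        # The absolute difference flags disagreement that the mean hides.
        "difference": abs(score_a - score_b),
    }


def smallest_improvement(pre: tuple, post: tuple) -> float:
    """Conservative outcome (approach c): change of the partner who improved least."""
    changes = [after - before for before, after in zip(pre, post)]
    return min(changes)


if __name__ == "__main__":
    # The couple from the text: partners scoring 72 and 132 average 102.
    print(describe_couple(72, 132))
    # Partner A rises from 80 to 100, partner B from 90 to 95.
    print(smallest_improvement((80, 90), (100, 95)))
```

For the couple in the text, the averaged score of 102 falls above the cutoff even though one partner, at 72, falls well below it; reporting both levels keeps that disagreement visible.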
The "Marital Satisfaction Inventory" (MSI: Snyder, 1979) contains 280
self-report items which generate eleven normalized T-scales, including
overall validity and a global distress scale. The other scales measure
affective communication, problem solving communication, time together,
finances, sexual satisfaction, concerns with childrearing, relationship
with children, role-orientation within the dyad, and distress in the family
of origin.
For clinical purposes, at least, the MSI shows promise in discriminating
among presenting problems (e.g., Berg & Snyder, 1981; Snyder, Wills,
& Keiser, 1981). Whether the separate scales are in fact sensitive to
specific changes in the relationship has not, however, been demon-
strated. In addition, the MSI seems less reactive to relationship changes
than the DAS, especially among women; thus it may be a more conserva-
tive measure of treatment success or other changes over time (Whisman
& Jacobson, 1991).
Another self-report measure of partners' presenting complaints and
targets for treatment is the "Areas of Change" questionnaire (AOC),
developed by Weiss and associates (Margolin, Talovic, & Weinstein, 1983;
Patterson, 1976; Weiss, Hops, & Patterson, 1973). As its name implies, this
34-item questionnaire assesses a) desired behavior changes to be made
by the other partner, and b) each partner's perception of what changes the
other wants from the respondent. Both sets of ratings are made on a
seven-point scale, with anchor points ranging from "much less" to "much
more." By comparing the two parts, each partner's perceptual
accuracy and couple concordance can also be assessed.
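One way to picture the comparison of the two parts is sketched below. It is a hypothetical scoring, not the published AOC procedure: we assume the seven-point ratings are coded from -3 ("much less") to +3 ("much more"), with 0 meaning no change, and we summarize a respondent's perceptual accuracy as the proportion of items on which his or her perception matches the partner's actual request, plus the mean absolute discrepancy. The function name is ours.

```python
def perceptual_accuracy(requested: list[int], perceived: list[int]) -> dict:
    """Compare what one partner wants changed (part a) with what the other
    partner believes is wanted of him or her (part b), item by item.

    Ratings are assumed to be coded -3 ("much less") .. +3 ("much more").
    """
    if len(requested) != len(perceived):
        raise ValueError("both parts must rate the same items")
    if not requested:
        raise ValueError("no items to compare")
    for rating in requested + perceived:
        if not -3 <= rating <= 3:
            raise ValueError(f"rating {rating} is outside the assumed -3..+3 coding")

    gaps = [abs(r - p) for r, p in zip(requested, perceived)]
    return {
        "exact_agreement": sum(gap == 0 for gap in gaps) / len(gaps),
        "mean_abs_discrepancy": sum(gaps) / len(gaps),
    }


# Hypothetical four-item example: the wife's requests of her husband (part a)
# versus the husband's perception of what she wants from him (part b).
wife_requests = [2, 0, -1, 3]
husband_perceives = [2, 1, -1, 0]
print(perceptual_accuracy(wife_requests, husband_perceives))
```

Running the same comparison in the other direction gives the second partner's accuracy; how to combine the two into a single concordance index is a further choice that this sketch leaves open.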
The AOC reliably discriminates between distressed and nondistressed
couples (e.g., Birchler & Webb, 1977), and is sensitive to treatment effects
(e.g., Baucom, 1982; Margolin & Weiss, 1978). Also, there is considerable
overlap between the AOC and general measures of relationship satisfaction
(e.g., Margolin & Wampold, 1981; Weiss, Hops, & Patterson, 1973). Beyond
measuring relationship satisfaction, however, the AOC provides an
efficient means to assess and identify specific targets for treatment,
and subsequently, to assess treatment outcome specific to those target
behaviors.
Another type of self-report device is represented by the "Spouse
Observation Checklist" (SOC; Patterson, 1976; Weiss & Perry, 1983),
which employs spouses as observers of their partners' behavior in their
day to day interactions. The SOC consists of 408 items divided into 12
general categories of behavior (e.g., affection, communication,
companionship, consideration). Each behavior is rated for its occurrence or
frequency and its pleasing/displeasing effect. Although different versions
have been employed, SOC items usually describe either behaviors of the
other partner or behaviors in which both partners participate. Some
versions (e.g., Christensen & Nies, 1980) use the same item content, but
have each partner report on their own behavior as well as the behavior
of the other.
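Because each SOC entry records whether a behavior occurred, how often, and whether it pleased or displeased the observing partner, daily entries lend themselves to simple tallies. The sketch below assumes a hypothetical record layout (day, item label, count, pleasing flag); actual SOC versions differ in item content and format, and all names here are ours.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    """One hypothetical SOC entry made by the observing partner."""
    day: int
    item: str        # illustrative label, not an actual SOC item
    count: int       # how many times the behavior occurred that day
    pleasing: bool   # True = rated pleasing, False = rated displeasing


def daily_totals(observations: list[Observation]) -> dict[int, dict[str, int]]:
    """Sum pleasing and displeasing behavior counts for each day."""
    totals = defaultdict(lambda: {"pleasing": 0, "displeasing": 0})
    for obs in observations:
        key = "pleasing" if obs.pleasing else "displeasing"
        totals[obs.day][key] += obs.count
    return dict(totals)


log = [
    Observation(day=1, item="asked about my day", count=2, pleasing=True),
    Observation(day=1, item="interrupted me", count=1, pleasing=False),
    Observation(day=2, item="asked about my day", count=3, pleasing=True),
]
print(daily_totals(log))
```

Day-by-day totals like these could be followed across the assessment period, for example to see whether displeasing behaviors cluster on particular days.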
The SOC has been employed primarily a) to assess relationship
distress and treatment response (e.g., Margolin, 1981a); b) as an assess-
ment device to help increase the accuracy of conflict reports and
relationship events (e.g., Jacobson & Margolin, 1979); and c) as a thera-
peutic intervention tool (e.g., Jacobson & Margolin, 1979). Because the
SOC provides more behaviorally specific and detailed records, and is
based on one person's observation and monitoring of another, it provides
assessment data quite distinct from typical self-report measures. As
such, it has been widely and successfully employed for clinical assessment,
intervention, and research purposes (e.g., Jacobson & Margolin,
1979; Vincent, Cook, & Messerly, 1980; Weiss, Hops, & Patterson, 1973).
Despite its established utility, there have been several criticisms of
the methodology employed by the SOC in which partners observe their
own interactions. There is disagreement regarding the validity of obser-
vations couples make concerning their own interactions (Christensen &
Nies, 1980; Weiss & Margolin, 1986). The general criticism of this
assessment methodology is that partners are not "objective" raters of each
other's behavior. Rather, they bring their relationship history,
expectations, current emotions, etc. to bear on their ratings. These other factors
are assumed to influence the ratings partners make, at least relative to a
"blind" or uninformed observer. Despite these criticisms, the SOC affords
a unique and comprehensive alternative means to assess common relationship
behaviors and how they function in a particular couple.

Relationship processes and interaction patterns
Although identifying partners' overall satisfaction with their relation-
ship and their areas of disagreement and conflict is essential, it does not
provide a sufficient assessment of couples. It remains necessary to
identify the form or process in which this conflict (or satisfaction) is
played out. Here we discuss many different processes and interaction
patterns germane to both clinicians and researchers, and some of the
various approaches to assess them.
Violence. One problem that spans the content and process levels of
couples is violence in the relationship. Given the general prevalence of
violence, especially against women by their partners (e.g., Straus, Gelles,
& Steinmetz, 1980) and the high incidence found in couples presenting for
marital therapy (e.g., O'Leary & Vivian, 1990), it is essential to assess
every couple, regardless of setting or presenting factors, for violence.
Relationship violence has been described as having four components
(Ganley, 1981): a) physical violence; b) sexual violence; c) psychological
abuse; and d) property destruction and harm to pets. We present some
basic procedures for identifying the occurrence of these different forms of
violence. More complete descriptions of the assessment, treatment,
social, and research implications of relationship violence can be found in
other sources (e.g., Finkelhor, Gelles, Hotaling, & Straus, 1983; O'Leary &
Murphy, in press; O'Leary & Vivian, 1990; Yllo & Bograd, 1988). Steps
subsequent to the identification of violence of course depend on the
setting and purpose of the assessment and level of risk present.
Assessing violence is perhaps the most sensitive aspect of a couple
assessment. Therefore, it is important to maximize the security of the
abused partner. First, self-report screening can be utilized with separate
envelopes provided for partners. The most common such instrument is
the Conflict Tactics Scale (CTS: Straus, 1979), which has recently been
revised (Gelles & Straus, 1988). The CTS asks 19 questions about the
frequency and recency of conflict behaviors. The level of violence as-
sessed ranges from low/none (verbal arguments) to severe (e.g., shooting
or stabbing).
However, the CTS does not measure severity of the behaviors re-
ferred to by its items nor the consequences of violent behavior, which can
vary considerably. For example, "pushing" may be relatively minor (e.g.,
incidental contact in order to leave a room) or quite serious (e.g., pushed
to the ground, into a wall, or down stairs). Still, the CTS affords an efficient
introduction to an assessment of relationship violence.
Most importantly, any self-report assessment must be complemented
by a face-to-face interview with individual partners, especially the woman.
Again, the prevalence of violence makes it imperative to ask directly
about the specific physical behaviors indicated on the CTS and about any
threat of violence (not simply, "Is your relationship ever characterized
by violence?"), and whether she is ever afraid of her partner. Even if there has
never been a violent incident, or much time has passed since such an
incident, the threat of future violence may be a powerful factor controlling
the woman's behavior and limiting her choices.
Of course, whenever violence is found, the situations, predictors,
warning signs, and so on, must be identified in order to determine the
level of risk and safety present. What happens next depends on the
purpose of the assessment, resources and/or referrals available,
willingness of the partners to participate in treatments targeting physical
aggression, and apparent safety of the woman involved. The ultimate
disposition of the case to individual treatment (e.g., anger management),
couples treatment, women's shelter, or some combination remains
controversial (e.g., Yllo & Bograd, 1988). But engagement in any clinical
decision-making is predicated on first identifying the presence or threat
of violence through careful and thorough assessment.
It is important to consider that when serious safety issues are
involved the assessor's role may be expanded. That is, if a person faces
imminent and serious risk, the assessor (or associate) may need to
facilitate her movement to some refuge (e.g., women's shelter), and
perhaps even arrange protection from the office to that destination via
police escort or other means. Thus, assessment of violence involves a
considerable commitment to the client on the part of the assessor.
Communication. Tremendous theoretical, statistical, and techno-
logical advances over the past twenty years have led to very sophisticated
methodologies for studying communication interactions (including ver-
bal and affective behaviors) in couples. All of these systems are very time-
and labor-intensive, and therefore impractical for purely clinical pur-
poses. Yet, significant advances in our understanding of dyadic processes
have resulted from their use (e.g., Markman & Notarius, 1987).
Several excellent microanalytic systems are now available, each with
its own merits. They vary in many ways, including the emphasis they
place on the content of the conversation versus partners' affect, the
relative importance of nonverbal communication, and the types of tasks
they are designed to code. The most common microanalytic systems
include the Marital Interaction Coding System (MICS; Hops, Wills,
Patterson, & Weiss, 1972), now in its fourth version (Heyman, Weiss, &
Eddy, 1990); the Couples Interaction Scoring System (CISS; Gottman,
1979; Notarius & Markman, 1981); the Dyadic Interaction Scoring Code
(DISC; Filsinger, 1981); the Living In Family Environments coding system
(LIFE; Arthur, Hops, & Biglan, 1982); the Communication Skills Test (CST;
Floyd & Markman, 1984); the Kategoriensystem für Partnerschaftliche
Interaktion (KPI; Hahlweg, Reisner, Kohli, Vollmer, Schindler, &
Revenstorf, 1984); and the Specific Affect Coding System (SPAFF; Gottman,
1988). Detailed comparisons among most of these systems may be found
in several sources (e.g., Margolin, Michelli, & Jacobson, 1988; Markman
& Notarius, 1987).
Hundreds of studies have employed microanalytic coding method-
ologies, and provide a strong empirical foundation for our understanding
of couple interactions. For example, we now take for granted the phenomena
of negative reciprocity (among distressed couples) and positive reciprocity
(among happy couples) in both behavior and affect. However, the use of such
coding systems as criteria for assessing treatment outcome has been the
subject of much debate.
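Negative reciprocity is usually described in sequential terms: a negative behavior by one partner makes a negative response by the other more likely than it otherwise would be. The sketch below illustrates that idea with a lag-1 conditional probability computed on a hypothetical sequence of coded speaking turns. The three codes ("neg", "neu", "pos") are placeholders, not categories from MICS, CISS, or any other system named above, and published sequential analyses add a statistical test that this sketch omits.

```python
def lag1_reciprocity(turns: list[tuple[str, str]], code: str = "neg") -> tuple[float, float]:
    """Return (conditional, base) probabilities for one code.

    turns: (speaker, code) pairs in the order they occurred.
    conditional: P(response carries `code` | the other partner's preceding turn carried `code`)
    base: P(response carries `code`) across all changes of speaker.
    Reciprocity is suggested when conditional exceeds base.
    """
    # Only consider transitions where the floor passes to the other partner.
    pairs = [(prev, nxt) for prev, nxt in zip(turns, turns[1:]) if prev[0] != nxt[0]]
    if not pairs:
        raise ValueError("need at least one change of speaker")

    base = sum(nxt[1] == code for _, nxt in pairs) / len(pairs)
    responses = [nxt for prev, nxt in pairs if prev[1] == code]
    if not responses:
        # The code never occurred as an antecedent, so the conditional is undefined.
        return float("nan"), base
    conditional = sum(nxt[1] == code for nxt in responses) / len(responses)
    return conditional, base


# Hypothetical coded session: H = husband, W = wife.
session = [("H", "neg"), ("W", "neg"), ("H", "neg"), ("W", "pos"), ("H", "neu"), ("W", "neu")]
conditional, base = lag1_reciprocity(session)
print(f"P(neg | partner neg) = {conditional:.2f}, base rate = {base:.2f}")
```

In this made-up session, two of the three responses to a negative turn are themselves negative, well above the overall rate of negative responses; with real data, many more turns would be needed before such a difference meant anything.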
Critics of microanalytic coding systems point to the fact that they are
based on "artificial," time-limited, laboratory conversations. Jacobson
(1985a, 1985b) has also questioned whether the most important treat-
ment targets are, in fact, observable to personally uninvolved coders, and
therefore whether such systems are really valid indices of treatment
outcome. In response, Weiss and Frohman (1985) suggest that the con-
tent and predictive validity of such systems are sufficient evidence of
their worthiness, and that the full utility of microanalytic coding systems
has yet to be realized. While conceding that the coding systems have not
been perfected, they note that the systems are continually evolving to
include more relevant data. Gottman (1985) suggests that treatment innovations should be more
reflective of the results of studies employing microanalytic coding tech-
niques, not the other way around.
Perhaps due in part to questions raised in this debate, coding systems
continue to be developed, refined, and expanded. Moreover, there has
been a renewed focus on interaction patterns (more molar, as opposed to
smaller, molecular interaction units) to complement the microanalytic
approach. Such systems are more economical and have the potential to
become clinical assessment tools as well, not limited to the research
laboratory. Much more research is necessary before the clinical validity
of such systems can be established.
Despite disagreements about the ultimate validity of coding systems
as measures of treatment outcome, there is considerable clinical utility to
having familiarity with any coding system and employing observational
techniques whenever possible. For example, in our clinic, couples rou-
tinely participate in 30 minutes of videotaped interaction whether they
are part of a research project or not. Such interactions are generally set
up by technicians, which we hope frees couples to interact more naturally.
The videotape may be viewed by the clinician casually or at any level of
coding (depending on her or his level of familiarity or expertise). The
specific behaviors targeted will depend on the theoretical approach of
the therapist (e.g., conflict style, underlying affect, indirect communica-
tion, power differentials, conflict triggers). Whatever approach is
employed, the techniques of observing couples interacting provide an
invaluable assessment tool.
Psychophysiology and Affect. One relatively new and exciting area
of study in couple interactions involves measuring heart rate, skin
conductance, pulse transmission time (to the finger), and somatic activ-
ity in couples engaged in videotaped laboratory tasks (Gottman &
Levenson, 1985; Levenson & Gottman, 1983, 1985). This research has
begun to demonstrate "linkage" (reciprocity) between indices of part-
ners' physiological arousal in conflict tasks. In addition, physiological
indicators have been shown to be related to other measures of couple
interactions, such as relationship satisfaction. For example, rates of
physiological arousal (such as heart rate) during a conflictual interaction
predicted decrements in relationship satisfaction over a three year
period (Levenson & Gottman, 1985).
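As a rough illustration of what "linkage" means, the sketch below correlates one partner's heart-rate series with the other's at several lags. It is a simplified stand-in, not the time-series procedure used by Levenson and Gottman; the series, the sampling scheme (one value per fixed time block), and the function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_linkage(lead: list[float], follow: list[float], max_lag: int = 3) -> dict[int, float]:
    """Correlate lead[t] with follow[t + lag] for lag = 0..max_lag."""
    if len(lead) != len(follow):
        raise ValueError("series must cover the same time blocks")
    if max_lag > len(lead) - 3:
        raise ValueError("max_lag leaves too few overlapping points")
    return {
        lag: pearson(lead[: len(lead) - lag], follow[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical heart rates (beats per minute) per time block during a conflict task;
# the wife's series echoes the husband's one block later.
husband = [70, 72, 75, 80, 78, 74, 71, 73]
wife = [71, 70, 72, 75, 80, 78, 74, 71]
for lag, r in lagged_linkage(husband, wife).items():
    print(f"lag {lag}: r = {r:.2f}")
```

Here the correlation peaks at lag 1, the pattern one would expect if one partner's arousal tracks the other's after a short delay. With real data, autocorrelation within each series can inflate such coefficients, which is one reason formal time-series methods are preferred for research.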
The recent efforts to assess couples' psychophysiology parallel an
increased research focus on affect in couple interactions over the past
decade (e.g., Broderick & O'Leary, 1986; Gottman & Krokoff, 1989; Margolin
& Weinstein, 1983). For clinical assessment purposes, of course,
objective physiological measurement is quite impractical. However,
self-report of affect related to arousal can be quite valid and informative
(e.g., Gottman & Levenson, 1985, 1986). Furthermore, high arousal seems to
be associated with withdrawal from conflict (Gottman & Krokoff, 1989) and
predictive of subsequent changes in relationship satisfaction. Thus, it
may be very important to inquire about arousal in a couple assessment,
especially during conflict (best observed at the time, not
retrospectively).
Characteristic Interaction Themes and Patterns. Although not
necessarily new, a focus on particular themes of couple interactions and
broader patterns of interaction provides a common ground for clinicians
and researchers alike (Gottman, 1982; Margolin, 1983). The demand/withdraw
interaction pattern, in which one partner increasingly withdraws in
conflict situations while the other intensifies his/her engagement,
provides one example of this convergence.
The demand/withdraw interaction pattern has been identified by
therapists from systemic (e.g., Napier, 1978), behavioral (e.g., Jacobson
& Margolin, 1979), and ego-analytic perspectives (e.g., Wile, 1981),
although each employs somewhat different terminology. And researchers
have begun to examine the prevalence of this pattern, its role in the dyad,
and its consequences for couple and individual functioning (e.g., Christensen
& Heavey, 1990; Fruzzetti & Jacobson, 1991).
Considerable empirical evidence supports continued emphasis at
the thematic, or general pattern, level of interaction. For example, Sullaway
& Christensen (1983) examined 22 asymmetric interaction patterns that
were found in clinical literature. They found significant agreement be-
tween partners about the presence of particular patterns. Moreover, four
particular patterns were related to overall relationship satisfaction: a)
introvert vs. extrovert; b) relationship vs. work orientation; c) emotional
vs. rational, and d) demand/withdraw. All four patterns were more
common among less satisfied couples.
Importantly, support was found for the use of self-report instruments
in the identification of characteristic interaction themes and patterns.
Further work by Christensen and his colleagues (cf. Christensen, 1988, for
examples of valid self-report tools) has continued to validate this assess-
ment approach.
Themes and patterns related to closeness and intimacy are of particu-
lar relevance in couple assessment. Looking at a combination of more
standard conflict interactions (e.g., a discussion about an area of dis-
agreement) and intimate interactions (in which there may also be conflict)
can provide a fairly thorough assessment of interaction themes and
patterns. Intimacy themes are important to explore because they are
often related to common presenting problems (e.g., "I'm just not sure we
love each other any more" or, "I want to spend more time together but it
seems that he can't wait to get away from me"). However, intimacy
conflicts are most frequently manifested at the thematic or interactional
level (Fruzzetti & Jacobson, 1990; Jacobson, 1989; Christensen & Heavey,
1990), and are not simply areas of disagreement. Thus, exploring intimacy
interactions with a couple can often provide a sample of potential
interaction patterns salient to those partners.
Assessment of interaction patterns in general might include preliminary
self-reports followed up by careful observation of interaction themes
in conflict situations and in more affectively-positive situations. For
example, in addition to observing couples in conflict, Jacobson and
Margolin (1979) recommend asking couples to talk about their courtship
period (how they met, what attracted them to each other, and so on),
which they suggest can foster closeness as partners remember happier
days.
The way partners interact when discussing more affectively-positive
topics can provide a window into their themes and patterns around
closeness and intimacy: for example, are there differences in the comfort
level (including physiological arousal) each partner experiences, and
how is this demonstrated? How, if at all, do themes such as rational vs.
emotional, work-orientation vs. relationship-orientation, or demand vs.
withdraw characterize this couple? Does this pattern characterize other
interactions they have? These same questions can be applied to typical
conflict interactions as well.

A Suggested Protocol for Clinical Assessment


There are many means by which to collect assessment data. Because
we approach couple assessment multi-dimensionally, we recommend a
combination of conjoint and individual interviews, interaction tasks, and
self-report questionnaires in order to maximize the potential accuracy
and usefulness of any assessment. It must be emphasized that this may
take several sessions in addition to the time the couple spends on self-
report measures: A single fifty-minute interview is simply inadequate in
most cases, as is sole reliance on self-reports.
A suggested clinical assessment protocol would begin with the
administration of carefully-chosen self-report questionnaires, perhaps
done at home by the couple and mailed in for careful examination prior
to face-to-face interviews. If possible, the couple should complete a
videotaped sample of their interaction and communication style, includ-
ing both conflict and non-conflict discussions. One or two conjoint
sessions would then be scheduled to allow both general discussion and
follow-up of critical items from the self-report questionnaires. The
complexity of this task should not be underestimated. Every questionnaire
contains valuable information pertinent to the assessment process. Not
only is each partner's answer to a particular item important, but
partner-to-partner comparisons must be made, couple patterns must be
discerned, and a preliminary hierarchy of the relative importance of the
assessment data (conflict topics, conflict process, closeness, power
differences, etc.) must be established. Which assessment data are most
important is influenced, of course, by the theoretical preferences of the
clinician and type of therapy to be employed. Nevertheless, from most
perspectives the questionnaires and communication sample provide a
first step toward understanding the salient features of couples' difficul-
ties, and hence, intervention targets.
During the conjoint sessions each of the levels of assessment de-
scribed above would be explored, and tentative hypotheses generated
from self-report and observed communication interactions (if available)
would be tested. Moreover, the process of having couples discuss their
history, their problems, and their strengths together (conjointly) pro-
vides another important window into the interactional processes and
themes specific to the particular couple.
Then, individual sessions would be held to discuss further any
sensitive topics that arise and/or to determine if there might be some
essential issues that have been avoided by one or both partners (e.g.,
violence in their relationship). Finally, another conjoint session allows
the assessor to clarify uncertainties and get feedback from the couple
concerning the accuracy of her or his integration of all the various levels
of assessment of the couple.
Of course, in clinical practice the assessment should not end at this
point. Rather, this might be regarded as the preliminary assessment, or
first phase of an ongoing assessment process that will continue through
treatment and into post-therapy evaluations. These first stages merely
provide organizing principles and first glances at couple issues, pro-
cesses, and interaction patterns, and a guide to initial intervention
strategies. An evaluation of partners' responses to interventions, life
changes, stressors, and each other's changes will further inform and
refine this first assessment step.

Future Directions: Multidimensional Assessment


The use of a multidimensional assessment approach holds great
promise for creating common ground for clinicians and researchers alike.
Clearly, not all research procedures are necessary for clinical assess-
ment; not all dimensions have to be quantified. And no plausible research
investigation can include all potentially relevant dimensions, if for no
other reason than the very large sample size that would be required. So,
choices must be made. Yet there is considerable potential convergence in
the approaches that researchers and clinicians can take. Here we discuss several of
the areas of potential agreement, or at least complementarity.
First of all, we must go beyond the measurement of relationship
satisfaction as the sole criterion of distress or dysfunction (and hence
improvement). One way is to include individual partners' functioning as
measures and indicators of outcome (e.g., Baucom et al., 1990). Another
is to continue to assess arousal and affective indicators that partners
experience during high- and low- conflict tasks. And the identification of
global themes and patterns of interaction provides tremendous opportu-
nity for exploration in both clinical and research settings.
For their part, clinicians can use systematic, multilevel assessment to
provide researchers with carefully defined areas of difficulty that are in
need of study. Clinically relevant challenges are usually welcome topics
for ambitious researchers. Researchers, in turn, can expend more effort
examining the process of change in couples therapy, not just looking at
comparative outcomes (Jacobson, Follette, & Elwood, 1984). Therapy
process research, especially, holds promise as a shared endeavor for
clinicians and researchers, regardless of theoretical orientation. Also,
investigations that match types of couples to treatment strategies
(Fitzpatrick, 1988; Margolin et al., 1988), an implicit procedure that most
clinicians follow routinely, would be clinically relevant. And finally,
researchers need to report results in ways that are meaningful in clinical
practice. Fortunately, the means to do this are now available (e.g.,
measures of clinically significant change: Christensen & Mendoza, 1986;
Jacobson, Follette, & Revenstorf, 1984). Overall, however, there is a
tremendous opportunity afoot for cooperation and collaboration among
groups that have traditionally worked independently.
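The sketch below illustrates what a "clinically significant change" measure computes. It derives a reliable change (RC) index for one partner's score before and after therapy.

It assumes the following:
- the standard-error-of-the-difference form usually attributed to Christensen and Mendoza's (1986) alteration of the RC index;
- RC = (x2 - x1) / S_diff, where S_diff = sqrt(2 * SE^2) and SE = s1 * sqrt(1 - r_xx);
- the conventional 1.96 cutoff.

The function names and the example numbers are hypothetical. They are not drawn from any study cited here.

```python
import math


def reliable_change_index(pre: float, post: float, sd_pre: float, reliability: float) -> float:
    """Reliable change (RC) index based on the standard error of the difference.

    sd_pre: standard deviation of the measure in a reference (e.g., pretreatment) sample.
    reliability: test-retest reliability of the measure, strictly between 0 and 1.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if sd_pre <= 0:
        raise ValueError("sd_pre must be positive")
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se_measurement ** 2)           # standard error of the difference
    return (post - pre) / s_diff


def is_reliable_change(rc: float, critical: float = 1.96) -> bool:
    """True if |RC| exceeds the critical value (about p < .05, two-tailed)."""
    return abs(rc) > critical


# Hypothetical partner whose satisfaction score rises from 85 to 105
rc = reliable_change_index(pre=85, post=105, sd_pre=17.0, reliability=0.90)
print(f"RC = {rc:.2f}, reliable: {is_reliable_change(rc)}")  # RC = 2.63, reliable: True
```

Under these assumptions, a change counts as reliable only when it exceeds what measurement error alone would plausibly produce. A fuller clinical-significance analysis would also ask whether the post-therapy score falls in the nondistressed range.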

References
Arthur, J., Hops, H., & Biglan, A. (1982). Living in Family Environments (LIFE)
coding system. Eugene, OR: Oregon Research Institute.
Barlow, D. H. (1981a). A role for clinicians in the research process. Behavioral
Assessment, 3, 227-233.
Barlow, D. H. (1981b). On the relation of clinical research to clinical practice:
Current issues, new directions. Journal of Consulting and Clinical Psychology,
49, 147-155.
Barlow, D. H., Hayes, S. C., & Nelson, R. (1984). The scientist-practitioner. Elmsford,
New York: Pergamon.
Baucom, D. H. (1982). A comparison of behavioral contracting and problem-
solving/communication training in behavioral marital therapy. Behavior
Therapy, 13, 162-174.
Baucom, D. H. (1983). Conceptual and psychometric issues in evaluating the
effectiveness of behavioral marital therapy. Advances in family intervention,
assessment and theory (Vol. 3, pp. 91-117). Greenwich, CT: JAI Press.
Baucom, D. H., Burnett, C. K., Rankin, L., & Sher, T. G. (1990). Cognitive/behavioral
marital therapy outcome research: What is success? In R. L. Weiss (Chair), Why
9 out of 10 marital therapists should not prefer satisfaction. Symposium presented
at the 24th Annual Convention of the Association for the Advancement of
Behavior Therapy, San Francisco.
Baucom, D. H., Epstein, N., Sayers, S., & Sher, T. G. (1989). The role of cognitions
in marital relationships: Definitional, methodological, and conceptual issues.
Journal of Consulting and Clinical Psychology, 57, 31-38.
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of
depression. New York: Guilford.
Berg, P., & Snyder, D. K. (1981). Differential diagnosis of marital and sexual
distress: A multidimensional approach. Journal of Sex and Marital Therapy, 7,
290-295.
Biglan, A., & Thorsen, C. (1986). The interactive behavior of women with chronic
pain. Unpublished manuscript, Oregon Research Institute, Eugene, OR.
Birchler, G. R., & Webb, L. J. (1977). Discriminating interaction behaviors in
happy and unhappy marriages. Journal of Consulting and Clinical Psychology,
45, 494-495.
Broderick, J. E., & O'Leary, K. D. (1986). Contributions of affect, attitudes, and
behavior to marital satisfaction. Journal of Consulting and Clinical Psychology,
54, 514-517.
Christensen, A. (1988). Dysfunctional interaction patterns in couples. In P. Noller
& M. A. Fitzpatrick (Eds.), Perspectives on marital interaction (pp. 31-52).
Clevedon, Avon, England: Multilingual Matters.
Christensen, A., & Heavey, C. L. (1990). Gender and social structure in the
demand/withdraw pattern of marital conflict. Journal of Personality and Social
Psychology, 59, 73-81.
Christensen, A. C., & Mendoza, J. (1986). A method for assessing change in single
subject designs: An alteration of the RC index. Behavior Therapy, 17, 305-308.
Christensen, A., & Nies, D. C. (1980). The Spouse Observation Checklist: Empirical
analysis and critique. The American Journal of Family Therapy, 8, 69-79.
Cohen, L. H., Sargent, M. M., & Sechrest, L. B. (1986). Use of psychotherapy
research by professional psychologists. American Psychologist, 41, 198-206.
Eddy, J. M., Heyman, R. E., & Weiss, R. L. (1990). Satisfaction by the numbers: Is the
DAS a cause for concern? In R. L. Weiss (Chair), Why 9 out of 10 marital therapists
should not prefer satisfaction. Symposium presented at the 24th Annual
Convention of the Association for the Advancement of Behavior Therapy, San
Francisco.
Filsinger, E. E. (1981). The Dyadic Interaction Scoring Code. In E. E.
Filsinger & R. A. Lewis (Eds.), Assessing marriage: New behavioral approaches.
Beverly Hills, CA: Sage.
Finkelhor, D., Gelles, R. J., Hotaling, G. T., & Straus, M. A. (1983). The dark side of
families: Current family violence research. Newbury Park, CA: Sage.
Fitzpatrick, M. A. (1988). A typological approach to marital interaction. In P. Noller
& M. A. Fitzpatrick (Eds.), Perspectives on marital interaction (pp. 98-120).
Clevedon, Avon, England: Multilingual Matters.
Floyd, F. J., & Markman, H. J. (1984). An economical observational measure of
couples' communication skill. Journal of Consulting and Clinical Psychology, 52,
97-103.
Fruzzetti, A. E., & Jacobson, N. S. (1990). Toward a behavioral conceptualization
of adult intimacy: Implications for marital therapy. In E. A. Blechman (Ed.),
Emotions and the family: For better or for worse (pp. 117-135). Hillsdale, NJ:
Erlbaum.
Fruzzetti, A. E., & Jacobson, N. S. (1991). Depressive response to relationship
dissolution: A comparison of cognitive and contextual factors. Manuscript
submitted for publication. Seattle, WA: University of Washington.
Fruzzetti, A. E., King, M. R., Whisman, M. A., & Jacobson, N. S. (1989, November).
Homework compliance and generalization in behavioral marital therapy: Therapist-
client interactions and outcome. Paper presented at the 23rd annual meeting of
the Association for Advancement of Behavior Therapy, Washington, D. C.
Fruzzetti, A. E., Whisman, M. A., & Jacobson, N. S. (1990). MMPI Scale 4 false
positives: Beware the effects of marital discord. Unpublished manuscript.
Seattle, WA: University of Washington.
Ganley, A. (1981). Counseling program for men who batter: Elements of effective
programs. Response, 4, 34.
Gelles, R. J., & Straus, M. A. (1988). Intimate violence. New York: Simon & Schuster.
Gottman, J. M. (1979). Marital interaction: Experimental investigations. New York:
Academic Press.
Gottman, J. M. (1982). Temporal form: Toward a new language for describing
relationships. Journal of Marriage and the Family, 44, 943-962.
Gottman, J. M. (1985). Observational measures of behavior therapy outcome: A
reply to Jacobson. Behavioral Assessment, 7, 317-322.
Gottman, J. M. (1988). Specific Affect Coding System (SPAFF): Observing emotional
communication in marital and family interaction. Seattle, WA: University of
Washington.
Gottman, J. M., & Krokoff, L. J. (1989). Marital interaction and satisfaction: A
longitudinal view. Journal of Consulting and Clinical Psychology, 57, 47-52.
Gottman, J. M., & Levenson, R. W. (1985). A valid procedure for obtaining self-
report of affect in marital interaction. Journal of Consulting and Clinical
Psychology, 53, 151-160.
Gottman, J. M., & Levenson, R. W. (1986). Assessing the role of emotion in
marriage. Behavioral Assessment, 8, 31-48.
Greenberg, L. A., & Johnson, S. (1988). Emotion focused couples therapy. New York:
Guilford.
Hahlweg, K., Reisner, L., Kohli, G., Vollmer, M., Schindler, L., & Revenstorf, D.
(1984). Development and validity of a new system to analyze interpersonal
communication (KPI). In K. Hahlweg & N. S. Jacobson (Eds.), Marital interaction:
Analysis and modification. New York: Guilford Press.
Hathaway, & McKinley, (1983). The Minnesota Multiphasic Personality Inventory
manual. New York: Psychological Corporation.
Heyman, R. E., Weiss, R. L., & Eddy, J. M. (1990). Marital interaction coding system-IV
(MICS-IV). Eugene, OR: Oregon Marital Studies Program, University of
Oregon.
Hops, H., Wills, T. A., Patterson, G. R., & Weiss, R. L. (1972). Marital interaction
coding system. Eugene, OR: University of Oregon and Oregon Research Institute.
Jacobson, N. S. (1985a). The role of observational measures in behavior therapy
outcome research. Behavioral Assessment, 7, 297-308.
Jacobson, N. S. (1985b). The uses versus abuses of observational measures.
Behavioral Assessment, 7, 323-330.
Jacobson, N. S. (1989). The politics of intimacy. Behavior Therapist, 12, 29-32.
Jacobson, N. S., Berley, R., Elwood, R., Melman, K., & Phelps, C. (1985). Failure in
behavioral marital therapy. In S. Coleman (Ed.), Failure in family therapy (pp.
91-134). New York: Guilford.
Jacobson, N. S., Dobson, K., Fruzzetti, A. E., Schmaling, K. B., & Salusky, S. (in
press). Marital therapy as a treatment for depression. Journal of Consulting and
Clinical Psychology.
Jacobson, N. S., Follette, W. C., & Elwood, R. W. (1984). Outcome research on
behavioral marital therapy: A methodological and conceptual reappraisal. In
K. Hahlweg & N. S. Jacobson (Eds.), Marital interaction: Analysis and modification
(pp. 113-132). New York: Guilford Press.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome
research: Methods for reporting variability and evaluating clinical significance.
Behavior Therapy, 15, 336-352.
Jacobson, N. S., Holtzworth-Munroe, A., & Schmaling, K. B. (1989). Marital therapy
and spouse involvement in the treatment of depression, agoraphobia, and
alcoholism. Journal of Consulting and Clinical Psychology, 57, 5-10.
Jacobson, N. S., & Margolin, G. (1979). Marital therapy: Strategies based on social
learning and behavior exchange principles. New York: Brunner/Mazel.
Kazak, A. E., Jarmas, A., & Snitzer, L. (1988). The assessment of marital satisfaction:
An evaluation of the Dyadic Adjustment Scale. Journal of Family Psychology, 2,
82-91.
Levenson, R. W., & Gottman, J. M. (1983). Marital interaction: Physiological
linkage and affective exchange. Journal of Personality and Social Psychology, 45,
587-597.
Levenson, R. W., & Gottman, J. M. (1985). Physiological and affective predictors of
change in relationship satisfaction. Journal of Personality and Social Psychology,
49, 85-94.
Margolin, G. (1981a). Behavior exchange in distressed and nondistressed marriages:
A family cycle perspective. Behavior Therapy, 12, 329-343.
Margolin, G. (1981b). The reciprocal relationship between marital and child
problems. In J.P. Vincent (Ed.), Advances in family intervention assessment and
theory. Greenwich, CT: JAI Press.
Margolin, G. (1983). An interactional model for the behavioral assessment of
marital relationships. Behavioral Assessment, 5, 103-127.
Margolin, G., Michelli, J., & Jacobson, N. (1988). Assessment of marital dysfunction.
In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook
(3rd edition) (pp. 441-489). Elmsford, New York: Pergamon.
Margolin, G., Talovic, S., & Weinstein, C. D. (1983). Areas of Change questionnaire:
A practical approach to marital assessment. Journal of Consulting and Clinical
Psychology, 51, 920-931.
Margolin, G., & Wampold, B. E. (1981). A sequential analysis of conflict and accord
in distressed and nondistressed marital pairs. Journal of Consulting and Clinical
Psychology, 49, 554-567.
Margolin, G., & Weinstein, C. D. (1983). The role of affect in behavioral marital
therapy. In M. L. Aronson & L. R. Wolberg (Eds.), Group and family therapy 1982:
An overview (pp. 334-355). New York: Brunner/Mazel.
Margolin, G., & Weiss, R. L. (1978). A comparative evaluation of therapeutic
components associated with behavioral marital treatment. Journal of Consulting
and Clinical Psychology, 46, 1476-1486.
Markman, H. J., & Notarius, C. I. (1987). Coding marital and family interaction:
Current status. In T. Jacob (Ed.), Family interaction and psychopathology (pp.
329-390). New York: Plenum.
Morrow-Bradley, C., & Elliott, R. (1986). Utilization of psychotherapy research by
practicing psychotherapists. American Psychologist, 41, 188-197.
Napier, A. Y. (1978). The rejection-intrusion pattern: A central family dynamic.
Journal of Marriage and Family Counseling, 4, 5-12.
Notarius, C. I., & Markman, H. J. (1981). The Couples Interaction Scoring System.
In E. E. Filsinger & R. A. Lewis (Eds.), Assessing Marriage (pp. 112-127). Beverly
Hills, CA: Sage.
Notarius, C. I., & Vanzetti, N. A. (1983). The marital agendas protocol. In E. Filsinger
(Ed.), Marriage and family assessment. Beverly Hills, CA: Sage.
O'Leary, K. D., & Beach, S. R. H. (1990). Marital therapy: A viable treatment for
depression and marital discord. American Journal of Psychiatry, 147, 183-186.
O'Leary, K. D., & Murphy, C. (in press). Clinical issues in the assessment of spouse
abuse. In R. T. Ammerman & M. Hersen (Eds.), Assessment of family violence:
A clinical and legal sourcebook. New York: John Wiley.
O'Leary, K. D., & Vivian, D. (1990). Physical aggression in marriage. In F. D.
Fincham & T. N. Bradbury (Eds.), The psychology of marriage: Basic issues and
applications (pp. 232-348). New York: Guilford.
Patterson, G. R. (1976). Some procedures for assessing changes in marital
interaction patterns. Oregon Research Institute Bulletin, 16 (7).
Sharpley, C. F., & Cross, D. G. (1982). A psychometric evaluation of the Spanier
Dyadic Adjustment Scale. Journal of Marriage and the Family, 44, 739-741.
Snyder, D. K. (1979). Multidimensional assessment of marital satisfaction. Journal
of Marriage and the Family, 41, 121-131.
Snyder, D. K., Wills, R. M., & Keiser, T. W. (1981). Empirical validation of the Marital
Satisfaction Inventory: An actuarial approach. Journal of Consulting and Clinical
Psychology, 49, 262-268.
Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the
quality of marriage and similar dyads. Journal of Marriage and the Family, 38, 15-28.
Stanton, M. D., & Todd, T. C. (1982). The family therapy of drug abuse and addiction.
New York: Guilford.
Straus, M. A. (1979). Measuring intrafamily conflict and violence: The Conflict
Tactics (CT) Scales. Journal of Marriage and the Family, 41, 75-88.
Straus, M. A., Gelles, R. J., & Steinmetz, S. K. (1980). Behind closed doors: Violence
in the American family. Garden City, New York: Doubleday/Anchor.
Sullaway, M., & Christensen, A. (1983). Assessment of dysfunctional interaction
patterns in couples. Journal of Marriage and the Family, 45, 653-660.
Touliatos, J., Perlmutter, B. F., & Straus, M. A. (1990). Handbook of family
measurement techniques. Newbury Park, CA: Sage.
Vincent, J. P., Cook, N. 1., & Messerly, L. (1980). A social learning analysis of
couples during the second postnatal month. The American Journal of Family
Therapy, 8, 49-68.
Weiss, R. L., & Aved, B. M. (1978). Marital satisfaction and depression as predictors
of physical health status. Journal of Consulting and Clinical Psychology, 46, 1379-1384.
Weiss, R. L., & Frohman, P. E. (1985). Behavioral observation as outcome measures:
Not through a glass darkly. Behavioral Assessment, 7, 309-316.
Weiss, R. L., Hops, H., & Patterson, G. R. (1973). A framework for conceptualizing
marital conflict, a technology for altering it, some data for evaluating it. In L. A.
Hamerlynck, L. C. Hardy, & E. J. Mash (Eds.), Behavior change: Methodology,
concepts, and practice. Champaign, IL: Research Press.
Weiss, R. L., & Perry, B. A. (1983). The Spouse Observation Checklist: Development
and clinical applications. In E. E. Filsinger (Ed.), Marriage and family assessment
(pp. 65-84). Beverly Hills, CA: Sage.
Weiss, R. L., & Margolin, G. (1986). Assessment of conflict and accord: A second
look. In A. Ciminero (Ed.), Handbook of behavioral assessment (2nd edition) (pp.
561-600). New York: Wiley.
Whisman, M. A., & Jacobson, N. S. (1991). Changes in marital adjustment following
marital therapy: Comparisons between outcome measures. Unpublished
manuscript. Seattle, WA: University of Washington.
Whisman, M. A., Jacobson, N. S., Fruzzetti, A. E., & Waltz, J. A. (1989). Methodological
issues in marital therapy. Advances in Behaviour Research and Therapy, 11, 175-189.
Wile, D. (1981). Couples therapy: A non-traditional approach. New York: John Wiley.
Yllo, K., & Bograd, M. (1988). Feminist perspectives on wife abuse. Newbury Park,
CA: Sage.
CHAPTERS

Assessment of Creative
Potential in Psychology and
the Development of a Creative
Temperament Scale for the CPI

Harrison G. Gough

This chapter will deal with two topics. The first is the identification of
creative potential among graduate students in psychology, and predic-
tion of this potential from biographical and test measures available at the
time of entry into training. The second is the development of a scale for
creative temperament on the California Psychological Inventory (CPI)
(Gough, 1987), capable of forecasting creative attainment in fields other
than psychology, as well as within the psychological domain. Although
any discipline can be expected to demand certain specific skills and
attributes for creative achievement, it also appears that there is a more
general constellation of personal qualities cutting across disciplinary
boundaries (Amabile, 1983; Barron, 1965; Helson, 1988; Isaksen, 1988;
MacKinnon, 1978). Attention will also be paid to similarities and differences
in characteristics associated with creativity for men and women (Bachtold
& Werner, 1970; Helson, 1978).
HARRISON G. GOUGH, Professor of Psychology, Emeritus, University of California,
Berkeley, California.
226 HARRISON G. GOUGH

The California Study of Graduate Students in Psychology
Background
Because most of the analyses to be reported in this chapter are based
on a sample of 1,028 graduate students in psychology (623 men, 405
women), something should be said at the outset about the sample and
how it came into being. Let us begin with its origin. In the spring of 1950,
C.W. Brown, Chair of the Department of Psychology, approached Donald
W. MacKinnon, Director of the Institute of Personality Assessment and
Research (IPAR), with the suggestion that IPAR might undertake a study
of graduate students in psychology at Berkeley. The Institute had just
been established in 1949, and was in the midst of planning its research
agenda. Also, the well-known Michigan program of research on beginning
graduate students in clinical psychology (Kelly & Fiske, 1951) had been
underway for several years, and reports were circulating that the perfor-
mance of these students could be forecast from information available at
the time of entry.
Most departments, including that at Berkeley, were already making
use of background data for selecting students, such as undergraduate
grade point averages (GPA) and cognitive ability as assessed by instru-
ments such as the Miller Analogies Test (MAT) (Miller, 1970). One project
already in progress at IPAR involved the study of advanced doctoral
candidates from some 16 departments at Berkeley (see Barron, 1954),
inquiring into the antecedents of their professional promise, originality,
and personal soundness. Brown's proposal was that something along the
line of the Michigan studies and IPAR's own work could be done with
beginning graduate students in psychology at Berkeley.
MacKinnon was favorable to the idea, and presented it to the staff at
IPAR for discussion. Because of the closeness of the staff to the students,
and because a good many of these students would be taking seminars
with us and writing theses under the supervision of IPAR personnel, it was
decided that a full-scale assessment, including intimate life history in-
terviews, probes of background, and observational procedures involving
stress, could not be carried out. Rather, a testing program to include
cognitive, interest, and personality measures should be initiated with
incoming students in the fall of 1950. It was also stipulated that only one
person should have access to the identified data for each student, and
that all other persons, including Institute Director MacKinnon and
Department Chair Brown, should be denied any such access. Because of
my interest in psychological testing, and because both the Adjective
Check List (ACL) (Gough & Heilbrun, 1983) and the CPI were just then
ASSESSMENT OF CREATIVE POTENTIAL 227

being developed, I was asked by the staff to take on the management of
this project, and I agreed to do so.
It was in this way that the California Study of Graduate Students in
Psychology (CSGSP) had its inception. In the fall of 1950, all of the new
students in psychology were requested to take part in the one-day battery
of tests assembled for the project. Twelve men and four women from the
entering class of 1950 showed up for the session, along with 24 men and
six women from earlier classes, all of whom had been told of the project
and invited to take part if they wished. In this latter group of 30 students
there were four from as far back as the entering class of 1947, and even one
from the entering class of 1946.

Assessment Procedures
The battery of tests included the Strong Vocational Interest Blank for
Men (SVIB-M) (Strong, 1943) to assess interests, and the ACL, CPI, and
Minnesota Multiphasic Personality Inventory (MMPI) (Hathaway &
McKinley, 1940) in the personality sphere. In the 1960s, the revised
version of the SVIB (Campbell, 1966, 1977) was introduced, and later the
325-item current version of the Strong Interest Inventory (SII) (Campbell
& Hansen, 1981; Hansen & Campbell, 1985) was adopted.
For cognitive assessment, a preliminary form of a new Psychological
Vocabulary and Information Test (PVIT) was administered. Harold
Sampson, a 1953 Ph.D. in psychology from Berkeley, was working with me
on this test, and by 1952 we had completed two parallel forms, each
containing 150 items of psychological information, and furnishing scores
on nine areas of psychology (applied, comparative, developmental, ex-
perimental, general, physiological, personality, social, and statistics)
plus a total or overall score for each form. In the analyses to be reported
below, sums of the scores on Forms A and B were used for all 10 measures.
Levine's (1950) Minnesota Psycho-Analogies Test, Forms A and B, was
also included in this first session, and one form or the other was used until
the mid-1970s. In the 1950s, the College Vocabulary Test (CVT) (Gough &
Sampson, 1954) that Sampson and I had constructed was put into the
battery. Each form of the CVT (A or B) had 75 items, calibrated so as to
give an average difficulty level of .50 for college students.
In addition, a brief biographical data blank was employed, to obtain
information about high school and college activities, and from the appli-
cation file undergraduate GPA and scores on the MAT were secured.
Later, after Dawes (1971) had introduced his rating scale for undergradu-
ate colleges, his scoring method was applied to the schools from which
our students had come. Dawes' ratings, relying on data presented by Cass
and Birnbaum (1968), gave higher values to schools that were more
selective in their admissions requirements, and lower ratings to those
that had less stringent requirements for entry.
Also, other tests were tried for varying periods of time. Among the
tests in this category were the Barron-Welsh Art Scale (Barron & Welsh,
1952), the Revised Art Scale from the Welsh Figure Preference Test
(Welsh, 1969, 1975, 1980), Barron's self-report scales for complexity
(Barron, 1953a), independence (Barron, 1953b), originality (Barron, 1965),
and personal soundness (Barron, 1954), and the Chapin Social Insight
Test (Gough, 1968). An experimental test for creativity that I developed
(Gough, 1962) called the Differential Reaction Schedule (DRS) was also
administered. The DRS includes subscales for intellectual competence,
inquiringness, cognitive flexibility, esthetic sensitivity, and sense of
destiny, plus a sum of these five for an indicator of creativity. It also
includes a scale called "P-4" whose purpose is to assess motivation to
succeed in one's field of work.
Each year from 1950 through 1981, incoming students were asked to
take part in these one-day sessions, and up until 1960 all except six men
and four women did so. In the 1960s a trend toward unwillingness to
participate appeared, with 24 men and 15 women saying no. In the 1970s
the numbers saying no increased to 41 men and 45 women. In 1980 and
1981 the numbers refusing to take part (10 men and 13 women) began to
approach parity with those agreeing (eight men and 19 women). This
phenomenon was one of the factors influencing the decision to terminate
the testing program in 1981. Another factor was the belief that a follow-up
for outcomes should be conducted in the 1980s, and that this could best
be done with a completed sample. Altogether, 623 men and 405 women
did participate in the testing program over the 31 years, as compared with
81 men and 77 women who did not. From the sampling standpoint, this
means that 88.5% of the entering male students were included, and 84.0%
of the females.
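The participation figures can be checked against the counts given above. This sketch re-adds the refusals reported for each period and recomputes the percentage of entering students who took part. The period labels are our own shorthand, and no data beyond the text are used.

```python
# Counts taken from the text: refusals by period, and total participants by sex.
refusals = {
    "1950-1959": {"men": 6, "women": 4},
    "1960s": {"men": 24, "women": 15},
    "1970s": {"men": 41, "women": 45},
    "1980-1981": {"men": 10, "women": 13},
}
participants = {"men": 623, "women": 405}


def participation_rates(participants, refusals):
    """Return {sex: (number who declined, percent of entrants who participated)}."""
    rates = {}
    for sex, n_in in participants.items():
        n_out = sum(period[sex] for period in refusals.values())
        rates[sex] = (n_out, 100.0 * n_in / (n_in + n_out))
    return rates


for sex, (n_out, pct) in participation_rates(participants, refusals).items():
    print(f"{sex}: {n_out} declined, {pct:.1f}% participated")
# men: 81 declined, 88.5% participated
# women: 77 declined, 84.0% participated
```

The period-by-period refusals sum to the 81 men and 77 women reported. The resulting percentages match the 88.5% and 84.0% given in the text.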

Creativity Criterion
About once every three years, faculty ratings were gathered of the
students who had entered during that period. One rating was for future
promise, defined as "Potentiality, performance, and promise: An overall
evaluation of the student's performance as a psychologist, potentiality
for significant work in the field, and general promise as a member of the
profession." The other was for creativity, defined as "The creative quality
of the student's thinking and research in psychology." For both ratings,
a seven-point scale was used, with levels defined as follows: 7 = one of our
best graduate students, 6 = clearly above average, 5 = somewhat above
average, 4 = average, 3 = somewhat below average, 2 = clearly below
average, and 1 = one of our poorest graduate students.
The number of faculty members rating each student ranged from a
rare minimum of two to a maximum of 15; the median number was five.
The raters drew on their personal knowledge of the students (that is, they
did not examine the admissions files, nor did they have any access to the
test scores), and did their ratings independently of each other. Over all
occasions of rating, the median interjudge reliability coefficients were .85
for Potentiality (P), and .74 for Creativity (C). Mean ratings for each
student were standardized within each period of rating so that the "P" and
"C" criteria would be comparable over the full time range of the project.
In the total sample of 1,028 students, the correlation between the P
and C ratings was .84. In spite of this substantial relationship, many
individuals had differences of 10 standard score points or more, and the
personality and other implications of the two ratings were clearly dis-
tinguishable. In general, the correlates of the P ratings suggested qualities
of prudence, good judgment, and long-range perspectives, whereas those
for the C ratings stressed attributes of independence, esthetic orientations,
and a liking for change. In this chapter, only the predictors of the
creativity ratings will be examined; at a later time, a similar study of the
performance criterion will be reported.
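The chapter does not give the formula used to standardize mean ratings within each period. The reference to differences of "10 standard score points," together with the overall mean of 50.41 (SD = 9.87) in Table 8, is consistent with T-scores (mean 50, SD 10), so the minimal sketch below assumes that metric and uses the population SD within each rating period. The record layout, the function name `standardize_within_period`, and the ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_period(records):
    """Convert mean faculty ratings to T-scores (mean 50, SD 10) within each period.

    records: list of (student_id, period, mean_rating) tuples (hypothetical layout).
    Returns a dict mapping student_id -> standardized score.
    """
    by_period = defaultdict(list)
    for student, period, rating in records:
        by_period[period].append((student, rating))

    scores = {}
    for rows in by_period.values():
        ratings = [rating for _, rating in rows]
        m, sd = mean(ratings), pstdev(ratings)
        for student, rating in rows:
            # z-score within the period, rescaled to mean 50 and SD 10;
            # a period in which every student got the same rating maps to 50.
            z = (rating - m) / sd if sd > 0 else 0.0
            scores[student] = 50 + 10 * z
    return scores


# Hypothetical ratings from two periods with different rating "climates".
records = [
    ("s1", "period A", 5.0), ("s2", "period A", 4.0), ("s3", "period A", 3.0),
    ("s4", "period B", 6.0), ("s5", "period B", 5.5), ("s6", "period B", 5.0),
]
for student, t_score in standardize_within_period(records).items():
    print(student, round(t_score, 2))
```

In the toy data, a mean rating of 5.0 in one period and 6.0 in a more lenient period receive the same standardized score, which is the kind of comparability across periods that within-period standardization is meant to provide.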
From the Departmental records, information was obtained as to
whether students did or did not take the Ph.D. degree at Berkeley. For
those who transferred to other universities, American Psychological
Association registers and other sources were consulted to identify those
who took degrees elsewhere. A log was also maintained of the number of
years required to receive the Ph.D. degree, for those who did so. The
criterion of "time to take the Ph.D. degree" has already been examined in
a prior report (Gough, 1983). It is of interest to note that in the Michigan
study (Kelly, Goldberg, Fiske, & Kilkowski, 1978), the range in number of
years to finish was from three to 24, with almost 10% of the sample taking
10 years or more. For the Berkeley doctoral recipients, the range was from
two to 26 years, with 10.3% of the men and 14% of the women taking 10
years or more. The mean number of years at Berkeley was 6.24 for men
and 6.87 for women.
In the late 1970s, a nationwide panel of raters was asked to evaluate
the Berkeley students who had embarked on their training at least five
years earlier. Specifically, the raters were asked to indicate "visibility" (I
have heard of this person, and of his or her work in psychology), and
"quality of work," to be rated only on those whom the rater felt competent
to judge. A new round of similar ratings is contemplated for the future, so
as to gather appraisals for the students who entered the program after the
cut-off date for the first outside ratings.
Finally, in the 1980s, a follow-up questionnaire was sent to all of the
students who could be located, asking for information concerning life
satisfaction and career progress, as well as for a self-rating of professional
attainment. Once these follow-up data are in hand and fully analyzed,
findings will be reported. However, none of this follow-up information is
included in the present chapter.

Description of Sample
Figure 1 presents the mean MMPI-1 profiles for males and females
separately on the three validating and 10 clinical scales of the inventory.
Both profiles show the combination of moderate elevation on both F and
K often associated with personal effectiveness and progressive
temperament (Dahlstrom, Welsh, & Dahlstrom, 1972; Greene, 1980).
On the clinical scales, males scored highest on Scale 5, whereas
females scored lowest on this same measure. Because high standard
scores on Scale 5 for men indicate femininity, and low standard scores on
Scale 5 have the same meaning for women, this finding carries similar
implications for both sexes. In particular, studies of highly educated and
intellectually talented men usually report elevations on Scale 5, probably
associated with esthetic and culturally sophisticated attitudes more than
any feminization of personality (Friedman, Webb & Lewak, 1989). Scales
4 (psychopathy) and 8 (schizophrenia) are also moderately elevated on
both profiles, suggesting a certain degree of unconventionality and
willfulness (Graham, 1987).
Mean CPI profiles are presented in Figure 2. Both profiles have
standard scores above 60 on the scales for Capacity for Status (Cs),
Achievement via Independence (Ai), Psychological-mindedness (Py),
and Flexibility (Fx). These elevations are suggestive of resourceful inde-
pendence, ambition, a talent for psychological thinking, and openness to
new experience. The impression one gets from both inventories, taken
together, is of well-integrated, adaptable, intellectually adept, and expe-
rience-seeking individuals.
The CPI may also be interpreted in reference to its internal structure.
The 1987 revision of the inventory (Gough, 1987, 1989) introduced a three-
vector model of personality structure, based on measures of interpersonal
and normative orientations and self-realization or ego integration. These
three dimensions correspond to the psychometric fundamentals of the
CPI as delineated by smallest-space analysis (Karni & Levin, 1972). A v.1
(vector) scale for interpersonal orientation assesses an axis going from
involvement and interpersonal responsiveness at one pole to detachment
and privacy-seeking at the other. The second vector scale (v.2) assesses
an axis going from norm-favoring and rule-accepting dispositions at one
pole to norm-doubting and rule-testing proclivities at the other. These
two scales, considered conjointly, define four ways of living or lifestyles
called Alpha, Beta, Gamma, and Delta.

[Figure 1: MMPI profile sheet (Starke R. Hathaway and J. Charnley McKinley), with FEMALE and MALE panels]
Figure 1. MMPI mean profiles for 405 female and 623 male entering graduate
students in psychology.
The Alpha lifestyle derives from an involved, interactive approach to
others combined with positive cathexis of social norms. The Beta lifestyle
incorporates a detached, internally-oriented behavioral mode with firm
commitment to shared value systems. The Gamma lifestyle combines an
interactive drive with dubiety concerning social conventions and the so-
called proprieties. The Delta way of living seeks detachment and distance
from others, in combination with skepticism about most normative
sanctions.
For each of these four lifestyles there are quite specific pathways
either to self-actualization or to self-defeating and problem-inducing
behavior. Ego integration, as assessed by the third vector scale (v.3), is
the key to whether the individual realizes or fails to realize the potential
of his or her type. Raw scores on the v.3 scale are coded into seven levels,
beginning at Level 1 (the lowest level of ego integration) and culminating
in Level 7 (the highest level).
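The two-vector typology can be sketched as a simple lookup. This is an illustration only: the cut points `V1_CUT` and `V2_CUT` below are hypothetical placeholders, not the published CPI boundaries, and the sketch assumes that high v.1 scores mark detachment and high v.2 scores mark norm-favoring dispositions, in line with the v.1 (Introversive orientation) and v.2 (Pro-normative orientation) labels used in Table 4. The v.3 level is taken as already coded 1 through 7; the names `classify` and `TypeLevel` are illustrative.

```python
from dataclasses import dataclass

# HYPOTHETICAL cut points for illustration only; the real CPI type
# boundaries are given in the CPI manual and are not reproduced here.
V1_CUT = 15  # v.1 raw score: at or below = involved, above = detached
V2_CUT = 18  # v.2 raw score: at or below = norm-doubting, above = norm-favoring

# (detached?, norm_favoring?) -> lifestyle, following the text's definitions
TYPES = {
    (False, True): "Alpha",   # involved, rule-accepting
    (True, True): "Beta",     # detached, rule-accepting
    (False, False): "Gamma",  # involved, rule-testing
    (True, False): "Delta",   # detached, rule-testing
}


@dataclass(frozen=True)
class TypeLevel:
    type_name: str
    level: int


def classify(v1_raw: int, v2_raw: int, v3_level: int) -> TypeLevel:
    """Combine the two orientation vectors into a lifestyle and attach the
    v.3 ego-integration level (already coded 1-7)."""
    if not 1 <= v3_level <= 7:
        raise ValueError("v.3 level must be coded 1 through 7")
    detached = v1_raw > V1_CUT
    norm_favoring = v2_raw > V2_CUT
    return TypeLevel(TYPES[(detached, norm_favoring)], v3_level)


print(classify(10, 12, 6))  # TypeLevel(type_name='Gamma', level=6)
```

Keeping type and level as separate fields mirrors the text's point that the same lifestyle can be realized well or poorly depending on the v.3 level.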
Alphas at this highest level (v.3 at 7) are natural leaders (see Gough,
1990), capable of decisive actions that truly reflect the consensual
preferences of the group. At their worst (v.3 at 1), Alphas are invasive,
judgmental, and receptive to the appeals of hate or fringe groups. Betas
at Levels 6 and 7 on the ego integration vector can be inspirational models
of goodness and virtue, but at Levels 1 and 2 they tend toward repressive
internal control, depression, and the hysterical pathologies. Gammas, at
their best, are creative and innovative, effective in bringing about needed
change, but at their worst they tend to act out, and to manifest exhibi-
tionistic, histrionic, and narcissistic disorders. Deltas at the highest
levels of integration tend to express their private visions in artistic,
literary, and musical achievements, and in certain kinds of scientific work
such as mathematics where solitary effort is productive. At the lowest
levels on the v.3 scale, Deltas tend toward decompensation, fractionation
of personality, the psychoses, and violence, whether directed against self
or others.
The modal CPI type for both sexes was Gamma, with 45.7% of the 623
men in this category, and 49.9% of the 405 women. Among the men, 26.2%
were Deltas, 20.4% were Alphas, and 7.7% were Betas. Among the women,
29.6% were Deltas, 13.3% were Alphas, and 6.8% were Betas. In the general
population, the incidence of each type is 25% for both sexes. Thus, among
these psychologists, Gammas are over-represented and Betas are under-
represented.
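The text reports the type percentages separately by sex. As a rough cross-check of the over- and under-representation claim, the sketch below divides each type's share of the combined sample (the type totals in Table 8) by the stated 25% base rate; a ratio above 1 means the type is over-represented. The function name `incidence_ratios` is illustrative.

```python
# Combined-sample type counts from Table 8 (623 men + 405 women = 1,028).
counts = {"Alpha": 181, "Beta": 77, "Gamma": 487, "Delta": 283}
BASE_RATE = 0.25  # general-population incidence of each type, as stated in the text


def incidence_ratios(counts, base_rate=BASE_RATE):
    """Observed proportion of each type divided by the base rate (>1 = over-represented)."""
    total = sum(counts.values())
    return {name: (n / total) / base_rate for name, n in counts.items()}


total = sum(counts.values())
for name, ratio in incidence_ratios(counts).items():
    print(f"{name:<6}{counts[name] / total:6.1%}  ratio to base rate {ratio:.2f}")
```

On the combined counts, Gammas come out at nearly twice their base rate and Betas at under a third of it.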
In regard to level, 57.3% of the men and 62.2% of the women were
classified at 6 or 7, indicating an excellent degree of self-realization or ego
integration. Only 3.4% of the men were classified at Levels 1, 2, or 3, and
the corresponding figure for women was 2.4%. In the general population,
20% of both men and women receive Level 6 and 7 classifications, and at
Levels 1, 2, and 3 the incidence is 39%.

[Figure 2: Profile Sheet for the California Psychological Inventory (female norms), mean profiles on scales Do, Cs, Sy, Sp, Sa, In, Em, Re, So, Sc, Gi, Cm, Wb, To, Ac, Ai, Ie, Py, Fx, F/M]
Figure 2. CPI mean profiles for 405 female and 623 male entering graduate
students in psychology.
Turning to background and admissions data, mean ages at entry were
25.50 (SD = 4.68) for males, and 25.18 (SD = 5.11) for females. Mean
undergraduate GPAs were 3.49 (SD = .29) for men and 3.55 (SD = .28) for
women. On the Miller Analogies Test (MAT), for the 608 men having this
score the mean percentile rank (based on general population norms) was
71.35 (SD = 13.15), and for the 379 women with MAT scores the mean was
69.58 (SD = 12.14). These means are quite similar to the MAT median of 71
reported by Kelly and Fiske (1951) for graduate students in the Michigan
study.
These biodata are not compatible with the social clock expectation of
graduation from college at age 21 or 22, followed by immediate entry into
graduate school. The Berkeley students were either somewhat older than
average at the time of graduating from college, or had delayed three or
four years after leaving college before initiating their graduate studies. If
mean number of years to take the doctorate is added to mean age at entry,
one gets average ages of 31.74 for men and 32.05 for women at the time of
receiving the Ph.D. degree.
The average scores reported for the MAT may also be lower than
most readers would anticipate. Many departments use cutting scores on
tests of this kind, restricting admissions to those with percentile ranks of,
say, 80 to 85. At Berkeley, 45.7% of the students had MAT scores of 70 or
below. One reason for this large number is the strong skepticism con-
cerning the predictive value of aptitude tests that prevailed among many
members of the Department in post-World War II years. One renowned
member of the faculty, in fact, often declared (and not entirely in jest) that
the best way to use the MAT was to accept those with low scores and
reject those with high. One consequence of these attitudes is the relatively
low MAT mean, along with a rather large standard deviation. Another,
more subtle, consequence is that the quadrant defined by below-average
scores on both GPA and MAT is not empty, or even deficient in numbers.
A common contention is that an applicant with a low GPA will be accepted
if and only if his or her aptitude test scores are high; or conversely, if the
aptitude measure is low, then GPA must be high. In fact, for the CSGSP
sample, 249 or 25.2% were in the low-low quadrant defined by MAT scores
of 70 or less and GPAs of 3.50 or below. Thus, any trivial or disappointing
correlations of GPA or the MAT with outcome variables cannot be
explained away by an alleged interaction between these two measures.
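The quadrant argument amounts to a 2 x 2 cross-tabulation of MAT and GPA at the cut points named in the text (MAT percentile of 70 or less, GPA of 3.50 or below). A minimal sketch follows. The applicant records are hypothetical, the function name `quadrant_counts` is illustrative, and the denominator of 987 (the 608 men plus 379 women with MAT scores) is an inference, since the chapter gives only the count of 249 and the percentage.

```python
from collections import Counter


def quadrant_counts(students, mat_cut=70, gpa_cut=3.50):
    """Cross-tabulate students into MAT x GPA quadrants.

    Scores at or below a cut point count as "low", matching the chapter's
    definition of the low-low cell (MAT of 70 or less, GPA of 3.50 or below).
    students: iterable of (mat_percentile, gpa) pairs for students with an MAT score.
    """
    cells = Counter()
    for mat, gpa in students:
        mat_band = "MAT low" if mat <= mat_cut else "MAT high"
        gpa_band = "GPA low" if gpa <= gpa_cut else "GPA high"
        cells[(mat_band, gpa_band)] += 1
    return cells


# Hypothetical applicants, not data from the study.
toy = [(65, 3.40), (85, 3.80), (70, 3.50), (90, 3.20)]
cells = quadrant_counts(toy)
print(cells[("MAT low", "GPA low")], "of", sum(cells.values()), "in the low-low cell")

# With 249 low-low students out of an assumed base of 987 students with MAT scores:
print(f"{249 / 987:.1%}")
```

Taking 987 as the base reproduces the reported 25.2%, whereas dividing by all 1,028 students would give about 24.2%.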

Findings
Table 1 presents correlations of the creativity criterion with college
prestige ratings, undergraduate GPA, MAT, age at entry, and year of entry,
and also the intercorrelations among these five possible predictors. For
both sexes, the prestige ratings and the MAT scores were significantly
(p ≤ .01) and positively correlated with the ratings. Also, MAT scores and
prestige ratings were significantly (p < .01) related to each other.
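The asterisks in the tables mark p ≤ .05 and p ≤ .01. For orientation, the sketch below approximates the smallest correlation reaching each level for the two full-sample sizes, using r = t / sqrt(t^2 + df) with df = N - 2 and, as a large-sample simplification, the standard normal quantile in place of Student's t. Measures given to smaller subsamples (for example, the Chapin test in Table 2) have correspondingly larger thresholds. The function name `critical_r` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.01) -> float:
    """Approximate two-tailed critical Pearson r for sample size n.

    Uses r = t / sqrt(t**2 + df) with df = n - 2, substituting the standard
    normal quantile for Student's t (a large-sample approximation).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    df = n - 2
    return z / sqrt(z * z + df)


for n in (623, 405):
    print(n, "p<=.05:", round(critical_r(n, 0.05), 3), "p<=.01:", round(critical_r(n, 0.01), 3))
```

The thresholds at p ≤ .01 come out near .10 for N = 623 and .13 for N = 405, which fits the tables, where for example a female coefficient of .11 carries one asterisk and .13 carries two.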
Table 2 reports correlations of the 10 scores from the PVIT (based on
the sum of scores on Forms A and B), the Chapin Social Insight Test, Forms
A and B of the College Vocabulary Test, and Forms A and B of the Levine
Minnesota Psycho-Analogies. Five PVIT variables were correlated at or
beyond the .01 level of confidence for both sexes with the creativity ratings:
Experimental, General, Social, Statistics, and the Total score. Experimental
psychology was the strongest of these, with coefficients of .18
for men and .22 for women. Both forms of the CVT had correlations for
both sexes at the .01 level of confidence, as did both forms of the
Minnesota Psycho-Analogies. The strongest single predictor in Table 2
was Form B of the Psycho-Analogies, with coefficients of .27 for men and
.24 for women. Because both the PVIT and Psycho-Analogies are based on
information about psychology, it seems that the analogy form of item
adds a small increment in predictive power.
Findings for the Adjective Check List scales are found in Table 3. Only
one of the 37 scales (Creative Personality) registered a statistically
significant (p ≤ .01) coefficient for both sexes, with values of .17 for men
and .26 for women. This scale was developed on IPAR assessees in various
studies (Gough, 1979), for whom either staff or external ratings of creativity
were available, and as its name implies is intended to identify persons of
creative temperament. The psychology sample was not used in the
development of this ACL scale, which means that the findings are cross-
validational and unconfounded.
Table 4 presents the findings for the CPI. Six of the folk measures
(scales that assess everyday concepts about personality and appear
on the profile sheet) had significant (p ≤ .01) and positive relationships to
the creativity criterion: Em (Empathy), To (Tolerance), Ai (Achievement
via Independence), Ie (Intellectual Efficiency), Py (Psychological-
mindedness), and Fx (Flexibility). The vector scale v.3, for self-realization,
also had correlations at the .01 level of confidence for both sexes. There
were no CPI scales at this level with negative coefficients, although the v.2
scale for pro-normative orientation came close. It is worth pointing out
that the scales with strong positive relationships to the ratings were also
those on which both the male and female samples scored high, at
standard scores of 58 or above in every instance. This cluster of scales
carries implications for perceptiveness about people, fair-mindedness,
good use of intellectual resources, and a strong need for achievement, in
particular in situations allowing for self-definition of ways to proceed.

Table 1
Intercorrelations Among the Variables Listed for 623 Male and 405 Female
Entering Graduate Students in Psychology at Berkeley, 1946-1981

                                          Intercorrelations(a)
Variables                         2        3        4        5        6
1. Prestige rating of          -.10**    .24**   -.17**    .07      .19**
   undergraduate college       -.09      .21**   -.12*    -.04      .20**
2. College grade point                   .06     -.05      .06      .09*
   average                               .04     -.05      .21**    .14**
3. Miller Analogies Test                          .05      .01      .25**
                                                  .11*     .02      .25**
4. Age at entry                                           -.29**   -.15**
                                                          -.09     -.07
5. Year of entry                                                    .05
                                                                    .13**
6. Creativity rating by faculty

a. Males in row 1, females in row 2.   * p ≤ .05   ** p ≤ .01

Table 2
Correlations between Intellective-Cognitive Measures Administered at
Admission to Graduate School and Faculty Ratings of Creativity

                                           Males           Females
Measures                                 N      r         N      r
Psychological Vocabulary and Information Test:
  Applied                               530    .02       366   -.02
  Comparative                                  .15**            .07
  Developmental                               -.01              .11*
  Experimental                                 .17**            .22**
  General                                      .18**            .16**
  Personality                                  .08              .11*
  Physiological                                .10*             .08
  Social                                       .16**            .14**
  Statistics                                   .18**            .10*
  Total Score                                  .18**            .17**
Chapin Social Insight Test              258    .07       178    .28**
College Vocabulary Test, Form A         215    .16**     131    .24**
College Vocabulary Test, Form B         221    .16**     125    .23**
Minnesota Psycho-Analogies, Form A      273    .20**     143    .21**
Minnesota Psycho-Analogies, Form B      464    .27**     264    .24**

* p ≤ .05   ** p ≤ .01

Table 3
Correlations between Scales of the Adjective Check List at Entry to Graduate
School and Faculty Ratings of Creativity

ACL Scales                                 Males(a)   Females(b)
Number of adjectives checked                -.08*        .02
Number of favorable adjectives               .01         .15**
Number of unfavorable adjectives             .03        -.03
Communality                                 -.03         .13**
Achievement                                  .00         .09
Dominance                                    .00         .12*
Endurance                                   -.05         .04
Order                                       -.04        -.02
Intraception                                 .06         .13**
Nurturance                                  -.04         .03
Affiliation                                 -.02         .10*
Heterosexuality                             -.08*        .09
Exhibition                                   .00         .07
Autonomy                                     .07         .05
Aggression                                   .02         .02
Change                                       .04         .07
Succorance                                  -.08*       -.11*
Abasement                                   -.06        -.08
Deference                                   -.04        -.05
Counseling Readiness                        -.03         .02
Self-Control                                -.03        -.06
Self-confidence                              .02         .13**
Personal Adjustment                         -.02         .09
Ideal Self                                   .04         .14**
Creative Personality                         .17**       .26**
Military Leadership                          .00         .10*
Masculinity                                  .02         .05
Femininity                                  -.07         .03
Critical Parent                             -.02        -.01
Nurturing Parent                            -.02         .05
Adult                                        .02         .08
Free Child                                   .04         .09
Adapted Child                                .00        -.11*
High Origence/Low Intellectence             -.03        -.05
High Origence/High Intellectence             .09*        .07
Low Origence/Low Intellectence              -.07         .03
Low Origence/High Intellectence              .02         .06

a. N = 623   b. N = 405   * p ≤ .05   ** p ≤ .01
Table 4 also cites correlations of .25 and .33, for men and women,
respectively, for the new scale for creative temperament. This scale, to be
described in detail below, was developed by item analysis within the
sample of 1,028 students, which of course means that positive relation-
ships are to be expected between the scale and the creativity ratings.
Table 4
Correlations between Scales of the CPI at Entry to Graduate School and
Faculty Ratings of Creativity

CPI Scales                                 Males(a)   Females(b)
Do (Dominance)                              -.01         .18**
Cs (Capacity for Status)                     .09*        .22**
Sy (Sociability)                            -.02         .11*
Sp (Social Presence)                         .07         .19**
Sa (Self-acceptance)                         .01         .22**
In (Independence)                            .08*        .27**
Em (Empathy)                                 .11**       .15**
Re (Responsibility)                          .01         .09
So (Socialization)                           .03         .02
Sc (Self-control)                           -.06        -.03
Gi (Good Impression)                        -.09*       -.06
Cm (Communality)                             .00         .07
Wb (Well-being)                              .01         .07
To (Tolerance)                               .20**       .24**
Ac (Achievement via Conformance)            -.03         .06
Ai (Achievement via Independence)            .20**       .34**
Ie (Intellectual Efficiency)                 .21**       .30**
Py (Psychological-mindedness)                .15**       .25**
Fx (Flexibility)                             .16**       .21**
F/M (Femininity/Masculinity)                 .01        -.11*
v.1 (Introversive orientation)              -.03        -.16**
v.2 (Pro-normative orientation)             -.10**      -.11*
v.3 (Self-realization)                       .12**       .21**
MS (Baucom Masculinity Scale)               -.01         .20**
FM (Baucom Femininity Scale)                 .02         .01
CT (Creative Temperament Scale)              .25**       .33**

a. N = 623   b. N = 405   * p ≤ .05   ** p ≤ .01

Findings for the MMPI are given in Table 5. For the men, only two
scales (L and Mf) had coefficients at the .01 level of confidence. That for
the Lie scale was negative (r = -.12) and that for the Mf scale was positive
(r = .13). It should be mentioned that all of the correlations in Table 5 were
computed from raw scores on the MMPI. This means that for the Mf scale
higher scores are associated with stronger femininity for both sexes.
Although four MMPI scales produced correlations at the .01 level for
women, neither L nor Mf was among these. The Hypochondriasis, De-
pression, and Welsh Anxiety scales had negative coefficients, whereas
the Barron Ego Strength scale was positively related to the ratings.
Table 6 presents findings for a number of measures directed toward
esthetic, independent, and non-conformist attributes. Three of Barron's
self-report inventory scales were significantly (p ≤ .01) related to the
ratings for both sexes: personal complexity, independence of judgment,
and disposition toward originality. Barron's scale for soundness, on the
other hand, was essentially uncorrelated. Both versions of the Art Scale
were correlated with the ratings at the .01 level. Within the unpublished
Differential Reaction Schedule, three of the subscales pertaining to
originality showed positive and significant (p ≤ .01) values, and the total
score for originality did the same. The "P-4" scale for motivation to
succeed also revealed significant relationships. All of the measures in
Table 6 were developed in other studies, on different samples, and thus
indicate that attributes associated with creativity in other fields of
endeavor are also, in general, associated with creativity within the field
of psychology.

Table 5
Correlations between Scales of the MMPI at Entry to Graduate School and
Faculty Ratings of Creativity

MMPI Scales                                Males(a)   Females(b)
L (Lie)                                     -.12**      -.09
F (Frequency)                                .02        -.09
K (Ego Functioning)                         -.01         .09
Hs + .5K (Hypochondriasis)                  -.01        -.18**
D (Depression)                              -.01        -.19**
Hy (Hysteria)                                .03        -.03
Pd + .4K (Psychopathic Deviate)             -.05        -.04
Mf (Femininity)                              .13**       .05
Pa (Paranoia)                               -.03        -.03
Pt + K (Psychasthenia)                       .02        -.11*
Sc + K (Schizophrenia)                      -.03         .00
Ma + .2K (Hypomania)                        -.04         .04
Si (Social Introversion)                     .04        -.12*
A (Welsh Anxiety Scale)                      .01        -.17**
R (Welsh Repression Scale)                   .04        -.06
ES (Barron Ego Strength Scale)               .07         .21**

a. N = 623   b. N = 405   * p ≤ .05   ** p ≤ .01
Table 7 gives the findings for the Strong Interest Inventory. Because
three different versions of this test were used during the course of the
study, and because of the dropping of some scales and the adding of new
ones, a choice had to be made of what measures to include. The 31 scales
mentioned in Table 7 were found in all three forms. Most of the students
were tested with the original SVIB Form M. For the revised forms, the male
occupational keys were used for both sexes, to preserve equivalence. The
six Occupational Theme and 23 Basic Interest scales that play a pre-
dominant role in the structuring of the current version of the Strong test
could not be included, because these scales are not scorable on the SVIB
Form M.
Two scales (Artist and Psychologist) yielded significantly (p < .01)
positive correlations for both sexes. In the negative direction, six scales
met this requirement: Police Officer, Accountant, Purchasing Agent,
Banker, Mortician, and Pharmacist. The pattern of relationships is quite
similar to that reported by Hall and MacKinnon (1969) in their analysis of
the correlates of creativity among architects. However, their three-scale
regression equation for creativity could not be applied because two of
those scales could not be scored on the sample of graduate students. The
scales in Table 7 with the strongest links to creativity suggest themes of
interest in esthetic and investigative matters, and little interest in
activities requiring attention to detail or to the enforcement of norms.

Table 6
Correlations between Selected Research Measures at Entry to Graduate
School and Faculty Ratings of Creativity

                                           Males           Females
Measures                                 N      r         N      r
Barron Complexity/Simplicity Scale      623    .28**     405    .20**
Barron Independence Scale                      .27**            .26**
Barron Originality Scale                       .18**            .24**
Barron Personal Soundness Scale                .02              .09
Barron-Welsh Art Scale                  259    .22**     162    .25**
Welsh Revised Art Scale                        .20**            .21**
Differential Reaction Schedule:
  Intellectual Competence               623    .10**     405    .19**
  Inquiringness                                .18**            .17**
  Cognitive Flexibility                        .13**            .24**
  Esthetic Sensitivity                         .19**            .08
  Sense of Destiny                             .03              .19**
  Sum of the above five scales                 .20**            .29**
P4 (Motivation for Success)                    .15**            .27**

* p ≤ .05   ** p ≤ .01

Table 7
Correlations between Selected Scales of the Strong Interest Inventory at
Entry to Graduate School and Faculty Ratings of Creativity

Scales                                     Males(a)   Females(b)
Artist                                       .23**       .13**
Psychologist                                 .22**       .23**
Architect                                    .15**       .05
Physician                                    .11*        .07
Dentist                                      .03        -.03
Veterinarian                                -.13**      -.07
Mathematician                                .22**       .08
Physicist                                    .21**       .10*
Engineer                                     .09*       -.01
Chemist                                      .17**       .05
Farmer                                      -.05        -.08
Mathematics Teacher                         -.09*       -.05
Police Officer                              -.22**      -.18**
Forester                                    -.09*       -.01
Personnel Director                          -.14**      -.01
Public Administrator                        -.05         .06
Social Worker                               -.02         .13**
Social Science Teacher                      -.17**      -.08
City School Superintendent                  -.05         .04
Minister                                     .04         .11*
Musician                                     .08         .10*
Accountant                                  -.16**      -.21**
Purchasing Agent                            -.21**      -.23**
Banker                                      -.25**      -.27**
Mortician                                   -.28**      -.17**
Pharmacist                                  -.17**      -.20**
Realtor                                     -.14**      -.12*
Life Insurance Salesperson                  -.22**      -.22*
Advertising Executive                        .05         .02
Lawyer                                       .15**       .10*
Reporter                                     .22**       .10*

a. N = 593   b. N = 387   * p ≤ .05   ** p ≤ .01

A CPI Type/Level Analysis

As indicated above, it is possible to conceptualize CPI findings on the
basis of the three fundamental vectors discernible in the interscale
matrix. When the four types or lifestyles are examined in conjunction with
the seven possible levels of ego integration, an interactive grid can be
generated. This examination is called a Type/Level analysis. The findings
from such an analysis are given in Table 8.
Ideally, one would make use of all 28 cells specified by the model (4
Types times 7 Levels for each Type). Because of the relatively small
number of students in the Beta category, it was necessary to collapse
Level into three groupings: Levels 1 through 4, Level 5 alone, and Levels 6
and 7. The contingency table and ANOVAs were first reviewed for males and
females separately. Because trends in each case were almost identical, only
the findings for all 1,028 students are reported in Table 8. For this sample,
the modal classification on Type was Gamma, with 47.4% of the students in
this category. Next came Delta, with an incidence of 27.5%, then Alpha with
an incidence of 17.6%, and finally Beta with an incidence of 7.5%. The F
ratio for the Type main effect was significant at the .02 level of confidence. Gammas had the
highest mean rating on creativity, followed by Alphas, then Deltas, and
then Betas. This finding is consonant with expectations of change-seeking
and revisionist inclinations among Gammas.
On Level, the students at Levels 1 through 4 had a mean creativity
criterion rating of 47.79, those at Level 5 had a mean of 50.73, and those
at the highest two levels (6 and 7) had a mean of 50.95. This progression
was significant beyond the .01 level. The interaction of Type and Level
was not significant in regard to the creativity criterion, as indicated by the
F ratio of 1.03 (p = .41).

Table 8
Analysis of Variance by Type and Level for Creativity Ratings of 1,028
Graduate Students in Psychology

          Levels 1-4      Level 5        Levels 6-7           Total
Type      N      M       N      M        N      M        N      M      SD
Alpha    30   47.20     43   50.86     108   50.43     181   49.99  10.21
Beta     22   44.45      6   41.00      49   49.73      77   47.55  11.36
Gamma    59   49.25    134   51.99     294   51.46     487   51.33   9.52
Delta    45   47.89     80   49.27     158   50.75     283   49.88   9.66
Total   156   47.79    263   50.73     609   50.95   1,028   50.41   9.87

Analysis       F     df      p
Type         3.22     3     .02
Level        5.44     2    <.01
T x L        1.03     6     .41
Although the full story of the analysis is contained in Table 8, it is not
easy to visualize the trends from this mode of presentation. For this
reason, Figure 3 is presented, showing how the four lifestyles vary around
the overall average rating of 50.41 as Level increases.
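Figure 3 could not be recovered from the scan, but the quantities it plots are in Table 8. As a sketch, the code below recomputes the Type and Level marginal means as N-weighted averages of the cell means and prints each cell's deviation from the grand mean, which is what the figure displays. Because the printed cell means are rounded to two decimals, the recomputed marginals can differ from Table 8 in the second decimal. The names `cells` and `weighted_mean` are illustrative.

```python
# (n, mean creativity rating) per cell of Table 8,
# ordered Levels 1-4, Level 5, Levels 6-7.
cells = {
    "Alpha": [(30, 47.20), (43, 50.86), (108, 50.43)],
    "Beta": [(22, 44.45), (6, 41.00), (49, 49.73)],
    "Gamma": [(59, 49.25), (134, 51.99), (294, 51.46)],
    "Delta": [(45, 47.89), (80, 49.27), (158, 50.75)],
}
LEVELS = ("1-4", "5", "6-7")


def weighted_mean(pairs):
    """N-weighted mean of (n, mean) pairs, i.e. the mean of the pooled students."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total_n


all_cells = [pair for row in cells.values() for pair in row]
grand_mean = weighted_mean(all_cells)

for type_name, row in cells.items():
    deviations = {lvl: round(m - grand_mean, 2) for lvl, (_, m) in zip(LEVELS, row)}
    print(f"{type_name:<6}marginal {weighted_mean(row):.2f}  deviations {deviations}")

# zip(*cells.values()) regroups the rows into one column of four cells per Level.
for lvl, column in zip(LEVELS, zip(*cells.values())):
    print(f"Level {lvl}: marginal {weighted_mean(column):.2f}")
print(f"grand mean {grand_mean:.2f}")
```

The recomputed marginals agree with the Total row and column of Table 8 to within the rounding of the cell means.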
For each grouping on Level, Gammas rank highest and Betas rank
lowest. Alphas and Deltas do not differ from each other by much, but are
clearly lower than the Gammas and higher than the Betas. Alphas,
Gammas, and Deltas all increase monotonically on the criterion as Level
rises, but among the Betas those at Levels 6 and 7 show a slight decline
from the mean for Betas at Level 5. In view of the strong upward trend for
Level, this finding for Betas may well be sample-specific.
Beyond the classificatory rubrics, what does it mean to be a Gamma
or a Beta? In the CPI manual (Gough, 1987), California Q-set (Block, 1961)
descriptions by observers are reported for each of the CPI Types. In a
contrast of 198 Gammas versus 595 others, the six Q-sort items most
strongly descriptive of those in the Gamma category were: Characteris-
tically pushes and tries to stretch limits; sees what he or she can get away
with (r = .26); Various needs tend toward relatively direct and uncon-
trolled expression; unable to delay gratification (r =.23); Is self-dramatizing,
histrionic (r = .22); Tends to be rebellious and nonconforming (r = .21); Is
self-indulgent (r = .21); and, Is verbally fluent; can express ideas well (r =
.18). Gammas tend to be seen as rule-testers, colorful, and expressive.
The six largest positive correlations for Q-sort items when 154 Betas
were pitted against the complement of 793 others were: Tends toward
over-control of needs and impulses; binds tensions excessively; delays
gratification unnecessarily (r = .26); Behaves in an ethically consistent
manner; is consistent with own personal standards (r = .21); Genuinely
submissive; accepts domination comfortably (r = .20); Is fastidious (r =
.20); Favors conservative values in a variety of areas (r = .18); and, Is a
genuinely dependable and responsible person (r = .16). Betas appear to
others as inhibited, conservative, and unassertive, but also as depend-
able and responsible. It seems reasonable, from these descriptions of
Gammas and Betas, to expect more creative work in psychology from
Gammas than from Betas.

[Figure 3: plot of mean creativity ratings (vertical axis, about 44 to 51) for Alphas, Betas, Gammas, and Deltas at Levels 1+2+3+4, 5, and 6+7]
Figure 3. A CPI Type/Level analysis of ratings of creativity for 1,028 graduate
students in psychology.

The CT Scale for Creative Temperament

The CT (Creative Temperament) scale was briefly mentioned above,
in the discussion of Table 4. The development of this scale can now be
described. For the 623 men, 405 women, and all 1,028 students, each item
in the CPI was assigned a dummy weight of "1" if answered true, and a
dummy weight of "0" if answered false. Then these dummy weights were
correlated with the criterion ratings of creativity. Items were selected for
the CT scale if their correlations were at the .01 level of confidence in the
total sample, and at or beyond the .10 level for each sex considered
separately. Thirty-five items met these requirements. Seven additional
items whose content was congruent with psychological notions about
creativity and whose correlations with the criterion were at least at the
.10 level of significance were then added, making a total of 42 items in the
CT scale. Twelve items had positive correlations and were therefore
keyed for "true" responses. Thirty items had negative coefficients and
were therefore keyed for "false" responses. When the scale was scored on
the male students, a mean of 30.23 was obtained (SD = 4.54); for women,
the mean was 29.92 (SD = 4.96). For the 1,000 males in the CPI norm
sample, CT yielded a mean of 22.33 (SD = 5.92), and an alpha reliability
coefficient of .72. For the 1,000 females in the norm sample the mean on
CT was 22.02 (SD = 5.75), and the alpha reliability coefficient was .70.
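The item-analytic procedure behind the CT scale can be sketched as follows. With 0/1 dummy weights, the Pearson correlation of an item with the criterion is the point-biserial r. The sketch applies the two significance screens from the text using a large-sample normal approximation for the critical r, and it assumes that an item must correlate in the same direction within each sex, which the chapter does not state explicitly. It then keys items by the sign of r, scores a protocol as the number of keyed responses, and computes coefficient alpha. The seven items added on content grounds are not modeled; all function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance


def pearson_r(xs, ys):
    """Pearson r; with 0/1 item weights this is the point-biserial correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # an item everyone answers the same way cannot correlate
    return sxy / sqrt(sxx * syy)


def critical_r(n, alpha):
    """Large-sample two-tailed critical r (normal quantile in place of t)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(z * z + n - 2)


def select_items(responses, ratings, sex):
    """Keep items significant at .01 in the total sample and, with the same
    sign, at .10 within each sex; key each kept item by the sign of r."""
    keys = {}
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        r_total = pearson_r(item, ratings)
        if abs(r_total) < critical_r(len(item), 0.01):
            continue
        passes = True
        for group in ("M", "F"):
            idx = [i for i, g in enumerate(sex) if g == group]
            r_group = pearson_r([item[i] for i in idx], [ratings[i] for i in idx])
            if r_group * r_total <= 0 or abs(r_group) < critical_r(len(idx), 0.10):
                passes = False
        if passes:
            keys[j] = "true" if r_total > 0 else "false"
    return keys


def keyed_item_scores(row, keys):
    """1 for each answer in the keyed direction, 0 otherwise."""
    return [int(row[j] == (1 if k == "true" else 0)) for j, k in sorted(keys.items())]


def score(row, keys):
    return sum(keyed_item_scores(row, keys))


def cronbach_alpha(responses, keys):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(keys)
    if k < 2:
        raise ValueError("alpha needs at least two keyed items")
    matrix = [keyed_item_scores(row, keys) for row in responses]
    item_vars = sum(pvariance(col) for col in zip(*matrix))
    total_var = pvariance([sum(r) for r in matrix])
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical toy data: 20 men and 20 women answering three true/false items.
ratings = [40 + (i % 20) for i in range(40)]
sex = ["M"] * 20 + ["F"] * 20
responses = [[int(r >= 50), i % 2, int(r < 50)] for i, r in enumerate(ratings)]

keys = select_items(responses, ratings, sex)
print(keys)  # {0: 'true', 2: 'false'}
print(score(responses[15], keys), round(cronbach_alpha(responses, keys), 2))
```

In the toy data the two retained items are redundant by construction, so alpha is 1.0; with real items, the norm-sample values of .72 and .70 reported above are the relevant benchmarks.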
Inspection of the 42 items suggested the existence of four major
themes. The first pertains to poise and self-confidence, with items such
as "I am a better talker than listener" (true), and "Clever, sarcastic people
make me very uncomfortable" (false). The second cluster embodies
individualized and nonconventional personal values. Illustrative items
are "If the pay was right I would like to travel with a circus or carnival"
(true) and "I always looked up to my father as an ideal man" (false).
Cluster three expresses liking for the unpredicted and improbable.
Representative items are "I much prefer symmetry to asymmetry" (false)
and "I like to plan out my activities in advance" (false). The fourth cluster
involves progressive social attitudes, such as "It makes me angry when I
hear of someone who has been wrongly prevented from voting" (true), and
"Only a fool would try to change our American way of life" (false). The
numbers of all 42 items and their keyed direction for scoring may be found
in the CPI Administrator's Guide (Gough, 1987), page 89.
As reported earlier (Table 4), the CT scale correlated .25 with creativ-
ity ratings for males and .33 with the ratings for females. But these are the
samples on which the scale was developed, in which positive relation-
ships are to be expected. How would CT fare if applied to cross-validating
samples, such as those included in the IPAR studies of creativity?
Table 9 gives this information, with samples ranked in order of their
mean raw scores on the CT scale. After the psychology samples, the
highest mean was found for the 45 research scientists (Gough &
Woodworth, 1960), composed principally of electrical engineers, physi-
cists, and applied mathematicians. Criterion ratings of creativity for these
men were a composite of the ratings of laboratory supervisors and peer
ratings among the 45 scientists themselves. All of the men were in space
technology centers in the same region of the country, and all of them were
at least somewhat familiar with the work of the others. The CT scale
correlated .33 with the overall composite for creativity.
The 51 theoretical mathematicians studied by Helson and Crutchfield
(1970) ranked next highest. For them, detailed examination was carried
out of the biographical accounts in rosters of American scientists, and
evaluations were also made of published work. The overall rating of
creativity for this sample correlated .47 with scores on CT. The high-
aptitude college seniors studied by Helson (1967) were ranked next. Half
of this sample had been nominated by faculty members as exceptional or
outstanding in regard to creativity, whereas the others, matched on
aptitude scores, were not nominated. From the strength of the nominations
a differentiated criterion was formulated; its correlation with CT
was .52.
A sample of 41 women in mathematics (Helson, 1971) comes next. A
panel of eminent mathematicians rated the work of all these women for
creativity, and these ratings served as the criterion index; its correlation
with CT was .46. Sixty-six honors students in engineering came next
(Gough, 1976). They were rated by from four to twelve faculty members
with whom they had taken courses and seminars. These ratings for
creativity when pooled yielded a correlation of .53 with CT. The 124
architects studied by MacKinnon (1964) were ranked next on the CT scale.
One subsample was composed of eminent, world-famous architects,
another subsample was made up of men of similar age who had worked
with these eminent practitioners, or in the same firm with them, and a
third subsample was composed of a cross-section of members of the
architectural association, matched for age with the first subsample. The
names of all 124 men were then submitted to a panel of university teachers
of architecture, editors of major journals, and leading practitioners not
included in the total. Their ratings were pooled, leading to the criterion
evaluation for creativity. This index correlated .44 with the CT scale.
Finally, the 37 business executives in Ireland studied by Barron and Egan
(1968) were rated by members of the assessment staff and also by a panel
drawn from the Irish Management Institute. Because the former ratings
were based more on style and personality than on the actual work of the
men, only the evaluations by the IMI experts were used. These ratings
correlated .34 with the CT scores.

Table 9
Means and Standard Deviations on the CT (Creativity) Scale and
Correlations with External Ratings of Creativity in the Samples Indicated

Sample                                      N       M     SD     r
Psychology graduate students, males       623   30.23   4.54   .25*
Psychology graduate students, females     405   29.92   4.96   .33*
Research scientists, males                 45   28.09   4.90   .33*
Mathematicians, males                      57   27.79   4.80   .47*
High-aptitude college seniors, females     51   27.45   4.21   .52*
Mathematicians, females                    41   25.12   6.12   .46*
Honors students in engineering, males      66   25.02   4.91   .53*
Architects, males                         124   24.62   5.49   .44*
Irish business executives, males           37   22.73   4.47   .34*

* p ≤ .01
Two comments are in order on the findings in Table 9. The first is that
the CT scale correlated positively and significantly with criteria of creativity
in all cross-validating samples, going from a low of .33 to a high of .53, and
with a median of .46. It is apparent that the CT scale, even though
developed on a sample of graduate students in psychology, assesses
qualities that are related to creative attainment in other fields as well, and
perhaps to creativity in general. The second is that all of the cross-
validating coefficients surpass those found in the initial sample. This is
unusual indeed. At least part of the explanation must lie in the carefully
evolved and highly valid criteria available for these cross-validating
samples. It is an axiom in criterion-linked research that as the precision
and validity of the criterion go up, the stronger will be the relationships
to measures diagnostic of this criterion.
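The summary values in the first comment can be recomputed from Table 9. The Python sketch below (variable names are illustrative) takes the seven cross-validating coefficients, reports their range and median, and compares each with the development-sample coefficient for the same sex (.25 for men, .33 for women). Making that comparison within sex is an assumption about how "surpass those found in the initial sample" is meant; compared against the female coefficient of .33, the research scientists' .33 would only tie it.

```python
import statistics

# Cross-validating samples from Table 9: (correlation with creativity criterion, sex)
cross_validation = {
    "Research scientists": (0.33, "M"),
    "Mathematicians, males": (0.47, "M"),
    "High-aptitude college seniors": (0.52, "F"),
    "Mathematicians, females": (0.46, "F"),
    "Honors students in engineering": (0.53, "M"),
    "Architects": (0.44, "M"),
    "Irish business executives": (0.34, "M"),
}
# Development samples (psychology graduate students), by sex
development = {"M": 0.25, "F": 0.33}

rs = [r for r, _ in cross_validation.values()]
print(min(rs), max(rs), statistics.median(rs))  # 0.33 0.53 0.46

# Does every cross-validating r exceed the development r for the same sex?
exceeds = all(r > development[sex] for r, sex in cross_validation.values())
print(exceeds)  # True
```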

Personological Implications of CT
What are the attributes associated with higher and lower scores on
CT, beyond the basic goal of identifying creative potential? A good way to
discover these more general implications is to examine descriptions that
observers give of persons who have taken the CT scale, in order to
determine specific implications of higher and lower scores. The
observers, of course, must not have any knowledge of the CT scores, and
it is even better if they are not focused on creativity or any particular
notions relevant to this criterion. In the archival files at IPAR, samples of
530 men and 293 women were available, all of whom had taken the CPI and
all of whom had been described on the ACL by panels of 10 staff observers.
By summing the number of observers checking each adjective, a de-
scriptive score ranging from a possible minimum of 0 to a possible
maximum of 10 can be generated, and then these sums can be correlated
with the CT scale. When this was done in the sample of 823 assessees, far
too many adjectives had coefficients significant at the .01 level of
probability to warrant citing them all. For this reason, only those adjectives
with positive correlations of .30 or above, and negative correlations of
-.23 or below, were selected for review. A second consideration was that
any adjective cited should be significantly (p ≤ .01) related to CT for each
sex considered alone.
Application of these rules led to the designation of the eight adjectives
most strongly descriptive of persons with high scores on CT, and the
eight adjectives most strongly descriptive of persons with low scores.
The eight adjectives associated with high scores and their correlations
for men and women, respectively, were: imaginative (.41, .39), curious
(.40, .37), interests wide (.40, .37), original (.37, .34), resourceful (.34, .33),
versatile (.33, .36), clever (.32, .33), and complicated (.32, .30).
The eight adjectives most strongly descriptive of persons with lower
CT scores were: conservative (-.33, -.48), conventional (-.31, -.48), interests
narrow (-.29, -.40), simple (-.27, -.33), commonplace (-.28, -.27), dull (-.25,
-.29), stolid (-.23, -.31), and rigid (-.23, -.24).
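Two parts of this adjective analysis lend themselves to a worked example: the panel descriptive score, which is simply the count of observers who checked an adjective, and the requirement that each cited adjective reach p ≤ .01 within each sex (530 men, 293 women). The Python sketch below converts that requirement into an approximate critical correlation, r = z/sqrt(n - 2 + z^2), substituting the normal quantile z for t; this is a large-sample approximation and not necessarily the test used in the original study. It then checks the sixteen sex-specific coefficients reported above against those values. The total-sample cutoffs of .30 and -.23 cannot be checked this way, because the total-sample coefficients are not reported. The function names panel_score and critical_r are hypothetical.

```python
from statistics import NormalDist
from typing import Dict, List, Tuple


def panel_score(checks: List[bool]) -> int:
    """Number of observers (0-10 for a 10-person panel) who checked the adjective."""
    return sum(checks)


def critical_r(n: int, alpha: float = 0.01) -> float:
    """Approximate smallest |r| significant at two-tailed alpha for sample size n,
    from r = t / sqrt(df + t**2) with df = n - 2 and the normal quantile used for t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / (n - 2 + z * z) ** 0.5


# (r for men, r for women) as reported in the text
reported: Dict[str, Tuple[float, float]] = {
    "imaginative": (0.41, 0.39), "curious": (0.40, 0.37),
    "interests wide": (0.40, 0.37), "original": (0.37, 0.34),
    "resourceful": (0.34, 0.33), "versatile": (0.33, 0.36),
    "clever": (0.32, 0.33), "complicated": (0.32, 0.30),
    "conservative": (-0.33, -0.48), "conventional": (-0.31, -0.48),
    "interests narrow": (-0.29, -0.40), "simple": (-0.27, -0.33),
    "commonplace": (-0.28, -0.27), "dull": (-0.25, -0.29),
    "stolid": (-0.23, -0.31), "rigid": (-0.23, -0.24),
}

r_crit_men, r_crit_women = critical_r(530), critical_r(293)
print(round(r_crit_men, 3), round(r_crit_women, 3))  # 0.111 0.149
print(all(abs(m) >= r_crit_men and abs(w) >= r_crit_women
          for m, w in reported.values()))  # True
print(panel_score([True, True, False, True, False, False, True, False, False, False]))  # 4
```

Every reported coefficient is well beyond these approximate thresholds, so for the adjectives listed the binding constraint is the total-sample cutoff rather than the within-sex significance rule.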
A similar analysis was carried out in a sample of 236 couples, in which
each person (N = 472) was described on the ACL by a spouse or partner.
These couples had been studied in projects on population psychology
(Gough, 1973) and interpersonal dependency (Hirschfeld, Klerman, Gough,
Barrett, Korchin, & Chodoff, 1977). There were 201 couples from the
former project, and 35 from the latter. Dummy weights of 1-0 on each
adjective were correlated with CT scores for all 472 persons, and for the
236 men and 236 women separately. The reduced range on the descriptive
side (from 0-10 down to 0-1), plus the probable loss in validity in going
from a panel of 10 observers to a single observer, led to much lower
correlations in this sample. For indicative items, the rules for selection
were (1) a coefficient of .20 or greater in the full sample, and (2) correlations
of .17 or more for both men and women considered separately. For
contraindicative items, cutting points were set at -.13 or beyond for the
total sample, and at the same value for each sex considered separately.
The six most descriptive terms under these rules were: unconventional
(.20, .29), individualistic (.24, .17), imaginative (.30, .25), insightful (.21, .21),
adventurous (.19, .22), and reflective (.17, .26). The six adjectives most
strongly associated with low scores on CT were: conservative (-.25, -.30),
interests narrow (-.14, -.30), prejudiced (-.27, -.14), conventional (-.18, -.17),
silent (-.15, -.16), and organized (-.14, -.13).
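The sex-specific cutoffs stated for the spouse/partner analysis can be checked against the coefficients just listed. In this minimal Python sketch (names hypothetical), each indicative adjective must reach .17 or more for both sexes, and each contraindicative adjective -.13 or beyond for both sexes; the full-sample rule cannot be verified from the text because the full-sample coefficients are not given.

```python
from typing import Dict, Tuple

Pair = Tuple[float, float]  # (r for men, r for women), as reported in the text

indicative: Dict[str, Pair] = {
    "unconventional": (0.20, 0.29), "individualistic": (0.24, 0.17),
    "imaginative": (0.30, 0.25), "insightful": (0.21, 0.21),
    "adventurous": (0.19, 0.22), "reflective": (0.17, 0.26),
}
contraindicative: Dict[str, Pair] = {
    "conservative": (-0.25, -0.30), "interests narrow": (-0.14, -0.30),
    "prejudiced": (-0.27, -0.14), "conventional": (-0.18, -0.17),
    "silent": (-0.15, -0.16), "organized": (-0.14, -0.13),
}


def meets_cutoff(pairs: Dict[str, Pair], cut: float) -> bool:
    """True if every adjective reaches the cutoff for both sexes:
    r >= cut when cut is positive, r <= cut when cut is negative."""
    if cut > 0:
        return all(m >= cut and w >= cut for m, w in pairs.values())
    return all(m <= cut and w <= cut for m, w in pairs.values())


print(meets_cutoff(indicative, 0.17))         # True
print(meets_cutoff(contraindicative, -0.13))  # True
```

Several terms (individualistic, reflective, organized) sit exactly at a cutoff for one sex, which shows how closely the listed adjectives track the stated rules.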
Another kind of information from which inferences about the meaning
of the CT scale can be drawn is the trend in mean scores for various
samples. The current CT manual (Gough, 1987) presents such data for 39
male and 30 female samples. The male normative sample had a mean of
22.33 and a standard deviation of 5.92 on CT. Among the samples with
distinctly higher scores were psychologists (M = 30.23), research scien-
tists (M = 28.09), mathematicians (M = 27.79), social work graduate
students (M =26.67), and medical students (M =26.09). Among those with
distinctly lower means were correctional officers (M = 18.59), prison
inmates (M = 18.76), sales managers (M = 19.49), and police officers (M =
21.11).
The female normative sample had a mean of 22.02 and a standard
deviation of 5.72. Among the female samples with distinctly higher scores
were psychologists (M = 29.92), high-aptitude college students (M = 27.45),
social work graduate students (M = 27.00), medical students (M = 26.16),
and students of law (M = 30.15). Among the groups with distinctly lower
scores were prison inmates (M = 17.76), psychiatric clinic patients (M =
18.32), and high school students (M = 18.32). The 34 women in the Oregon
colony of followers of Bhagwan Shree Rajneesh studied in the early 1980s
by Sundberg, Latkin, Littman, and Hagan (1990) had a mean score of 28.35
on CT. The 33 men in this same colony tested on the CPI had a mean score
of 29.94.
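One way to quantify "distinctly higher" and "distinctly lower" is to express each sample mean as a standard score in the male normative distribution. The Python sketch below assumes the linear conversion T = 50 + 10(M - 22.33)/5.92, that is, a mean of 50 and an SD of 10 in the norm sample, as is conventional for CPI profile scores. Only male samples are used because the female normative SD appears in this section as both 5.72 and 5.75. The function name t_score is hypothetical.

```python
MALE_NORM_MEAN, MALE_NORM_SD = 22.33, 5.92  # CPI male normative sample


def t_score(raw_mean: float, norm_mean: float = MALE_NORM_MEAN,
            norm_sd: float = MALE_NORM_SD) -> float:
    """Linear standard score: 50 at the norm mean, 10 points per normative SD."""
    return 50 + 10 * (raw_mean - norm_mean) / norm_sd


male_sample_means = {
    "psychologists": 30.23,
    "research scientists": 28.09,
    "mathematicians": 27.79,
    "social work graduate students": 26.67,
    "medical students": 26.09,
    "police officers": 21.11,
    "sales managers": 19.49,
    "prison inmates": 18.76,
    "correctional officers": 18.59,
}
for group, mean in male_sample_means.items():
    print(f"{group:32s} {t_score(mean):5.1f}")
```

On this metric the male psychology sample averages about 63, roughly 1.3 normative SDs above the mean, while correctional officers average about 44.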
From all of the descriptive and classificatory evidence just presented,
it appears that high scores on CT carry implications for unconventionality,
personal complexity, imaginativeness, and breadth of interests, whereas
low scores suggest narrowness of interests, preference for routine,
acceptance of tradition, and inflexibility.

A Conceptual Synthesis
Many specific variables predictive of creativity in psychology have
been cited in Tables 1 through 9 above, but without any attempt to group
them into logical or functional categories. In Table 10, nine such func-
tional clusters are proposed, on the basis of rational analysis, and key
measures for each category are cited.
The first category refers to the ability to "educe relationships,"
employing Spearman's (1904) language for the essential capacity in
general intelligence. The analogy item, in particular if applied to field-
relevant content, seems to tap cognitive processes that are important for
creative work in psychology.
The second category is breadth of pertinent information. Those who
know more about the broad range of a field, including esoterica in its
nooks and crannies, are better equipped to do things that are new and
consequential.
A third cluster refers to independence of mind, and to independence
in interpersonal behavior. A fourth comprises esthetic interests and
orientation. A fifth cluster incorporates measures of openness to experi-
ence, liking for complexity, and ego differentiation. A sixth involves
psychological-mindedness and psychological interests. Adaptive flexibil-
ity constitutes a seventh cluster, and minimal interest in work stressing
details and record keeping or the enforcement of social norms defines an
eighth category. Finally, scales developed specifically to identify creative
potential, whether in psychology itself or in other fields of endeavor,
appear to assess qualities that can predict creative performance. It
should be pointed out that the clusters proposed in Table 10 for at-
tributes of creativity important in psychology are quite similar to clusters
proposed by others (for example, Barron, 1965) for creativity in general.
Table 10
Conceptual Grouping of Measures Showing Promise as Predictors of
Creative Potential in Psychology
1. Ability to educe relationships
a. Miller Analogies Test (.25, .25)
b. Minnesota Psycho-Analogies, Form A (.20, .21)
c. Minnesota Psycho-Analogies, Form B (.27, .24)
2. Range of psychological information
a. PVIT Experimental Psychology Subscale (.17, .22)
b. PVIT General Psychology Subscale (.18, .16)
c. PVIT Total Score (.18, .17)
3. Independence of judgment
a. Barron Independence Scale (.27, .26)
b. CPI Achievement via Independence Scale (.20, .34)
4. Esthetic propensity
a. Barron-Welsh Art Scale (.22, .25)
b. Welsh Revised Art Scale (.20, .21)
c. Strong Interest Inventory Artist Scale (.23, .13)
5. Personal complexity and differentiation
a. Barron Complexity/Simplicity Scale (.28, .20)
b. DRS Inquiringness Scale (.18, .17)
6. Psychological orientation
a. CPI Psychological-mindedness Scale (.15, .25)
b. Strong Interest Inventory Psychologist Scale (.22, .23)
7. Adaptive flexibility
a. CPI Flexibility Scale (.16, .21)
b. DRS Cognitive Flexibility Scale (.13, .24)
8. Minimal interest in occupations with strong norm-enforcing or detail-
centered requirements
a. Strong Interest Inventory Accountant Scale (-.16, -.21)
b. Strong Interest Inventory Banker Scale (-.25, -.27)
c. Strong Interest Inventory Pharmacist Scale (-.17, -.20)
d. Strong Interest Inventory Police Officer Scale (-.22, -.18)
e. Strong Interest Inventory Purchasing Agent Scale (-.21, -.23)
9. Above average scores on scales developed to assess creative potential
a. ACL Creative Personality Scale (.17, .26)
b. Barron Originality Scale (.18, .24)
c. CPI Creative Temperament Scale (.25, .33)
d. DRS Total score (.20, .29)

Note: Correlations of each measure with criteria of creativity for males and
females, respectively, are given within the parentheses.
Two Case Illustrations


The nine categories in Table 10, including the measures listed within
each, represent the main findings from this analysis. However, there is
still the crucial matter of individual cases, and the degree to which the
abstract, generalized notions embodied in Table 10 will apply to specific
individuals. Another consideration is that we live in a changing world,
whose contingencies are not always favorable to the realization of
potential. In psychological assessment, therefore, the ultimate value of
any point of view or system of measurement must be judged against its
veridicality when applied to individuals.
In Figure 4, CPI profiles are presented for two male graduate students
from the sample. Both entered the graduate program at about the same
time, both had impressive undergraduate records, both took their Ph.D.
degrees in less than the Berkeley norm of 6.24 years for men, and both
were rated very high on creative potential, with a standard score of 70 for
Case 1 and the same rating of 70 for Case 2. From the standpoint of creative
promise, these were two exceptionally talented and gifted young men.
Inspection of the CPI profiles shows that although both had high
scores on the CT scale (70 for Case 1, 73 for Case 2), and also high scores
on scales related to creativity, such as Tolerance, Achievement via
Independence, Psychological-mindedness, and Flexibility, they were quite
different on certain other scales such as Dominance, Sociability, Social
Presence, Self-acceptance, Well-being, and Commonality. On these latter
scales, Case 1 scored considerably above average whereas Case 2 ranked
distinctly lower.
In terms of the CPI theoretical model of personality structure, Case 1
is an Alpha, a person who can be expected to have strong pro-normative
inclinations along with a comfortable sense of efficacy in dealing with
others. Case 2 is a Delta, a person with unresolved and even unresolvable
doubts about society's conventions, and with strong needs for distance
from others and for inner privacy. At the time of testing, both were at
Level 6, indicating well-integrated and effective functioning.
Case 1, after receiving the Ph.D. degree, went on to a professional
career marked by one commendable achievement after another, and
within 10 years had achieved both national and international recognition.
However, his personal life was not conflict-free, and in fact he has faced
severe interpersonal and familial problems; but all of these problems
have been dealt with without any visible loss of career momentum or
diminishment of his capacity for creative work.
Case 2 began his professional career with a position in a highly ranked
university, and seemed to be off to a good start. However, personal and
familial problems soon arose, which for this man were debilitating and
eventually disastrous insofar as his career in psychology was concerned.
Within 10 years, Case 2 had left the field of psychology, lowered his
occupational goals, and more or less resigned himself to work that made
little use of his training or of his creative talent.

Figure 4. CPI profiles (Profile Sheet for the California Psychological Inventory, male norms) for two male graduate students. Case 1 (solid line): Type and Level = Alpha 6, Creativity rating = 70, CT score = 70. Case 2 (dashed line): Type and Level = Delta 6, Creativity rating = 70, CT score = 73. [Profile plot across the scales Do, Cs, Sy, Sp, Sa, In, Em, Re, So, Sc, Gi, Cm, Wb, To, Ac, Ai, Ie, Py, Fx, and F/M not recoverable from the scan.]

What these two cases demonstrate is that the possession of promise
is not the same thing as the fulfillment of promise. Psychology is fortunate
in having a contributor with the ability and fortitude of Case 1. Psychology
is unfortunate in that the equally remarkable creative talent of Case 2 has
been kept from expression by the vicissitudes of life, and by his vulner-
ability to these stresses. Creativity is not some disembodied entity,
moving ineluctably to its own destiny. On the contrary, creative promise
is an attribute of individuals whose realization will depend on many
factors that lie outside of their own control.

Summary
A prospective study of 1,028 entering graduate students in psychol-
ogy (623 men, 405 women) was carried out, relating biographical data and
test measures gathered at the time of beginning the graduate program to
faculty ratings of creativity obtained from one to three years later. The
project began in the fall of 1950, and testing was continued each year up
through the fall of 1981.
In the biographical realm, the only variable related to the criterion of
creativity for both sexes was a prestige rating of the undergraduate
college attended, derived from the selectivity of each college's admissions
practices, with correlations of .19 for men and .20 for women. Graduate
students coming from more prestigious or selective colleges tended to
receive higher ratings on creativity from their graduate instructors than
did students coming from less prestigious schools.
In the realm of cognitive and aptitude tests, correlations with creativ-
ity were generally in the range from .20 to .25. Students scoring higher on
tests such as the Miller Analogies, Minnesota Psycho-Analogies, and the
College Vocabulary Test tended to receive higher ratings.
In the personality sphere, measures of psychopathology and malad-
justment, as assessed by the MMPI, were generally unrelated to the
creativity criterion, but measures directed explicitly to the assessment of
creativity or to hypothesized elements such as personal complexity and
esthetic awareness had correlations between .22 and .28. In the area of
vocational interests, the Psychologist scale on the Strong Interest Inventory
had correlations with creativity of .22 for men and .23 for women, whereas
the scale for Banker had corresponding correlations of -.25 and -.27. A new
scale, called CT for Creative Temperament, was derived by item analyses
of the CPI for these 1,028 students. It had correlations of .25 for males and
.33 for females with the creativity criterion, as would be anticipated given
the mode of its development. In seven cross-validating samples drawn
from other disciplines and occupational settings, CT had a median
coefficient with the creativity criteria of .46.
Certain lifestyles, as defined within a structural model of personality
generated from the CPI, were related to the ratings. Gammas, whose way
of living combines interactive involvement with others along with skep-
ticism concerning normative conventions, were most creative among the
four CPI types. Betas, whose way of living reflects a need for privacy and
distance from others with positive valuation of societal norms, ranked
lowest.
The analysis ended with two illustrative case vignettes, students
whose early promise was great, as indicated by test scores and biographical
data, and who also received very high ratings from faculty members. One
of these two students went on to achieve all that had been expected of him
in his professional career, but the other, encountering traumatic events
and ego-wounding experiences, more or less fell by the wayside. These
case illustrations serve as critical reminders that in prediction the
context and circumstances of the life situation must always be considered,
and that they set strict limits on what can be forecast from personological
data alone.

References
Amabile, T. M. (1983). The social psychology of creativity: A componential
conceptualization. Journal of Personality and Social Psychology, 45, 357-376.
Bachtold, L. K., & Werner, E. E. (1970). Creative psychologist: Gifted women.
American Psychologist, 25, 234-243.
Barron, F. (1953a). Complexity-simplicity as a personality dimension. Journal of
Abnormal and Social Psychology, 48, 163-172.
Barron, F. (1953b). Some personality correlates of independence of judgment.
Journal of Personality, 21, 287-297.
Barron, F. (1954). Personal soundness in university graduate students: An
experimental study of young men in the sciences and professions. University
of California Publications in Personality Assessment and Research, No. 1.
Barron, F. (1965). The psychology of creativity. In New directions in psychology
(Vol. 2, pp. 3-134). New York: Holt, Rinehart & Winston.
Barron, F., & Egan, D. (1968). Leaders and innovators in Irish management. Journal
of Management Studies, 5, 41-60.
Barron, F., & Welsh, G. S. (1952). Artistic perception as a factor in personality
style: Its measurement by a figure-preference test. Journal of Psychology, 33,
199-203.
Block, J. (1961). The Q-sort method in personality assessment and psychiatric
research. Palo Alto, CA: Consulting Psychologists Press. (Originally published
by Charles C. Thomas, Springfield, IL).
Campbell, D. P. (1966). The 1966 revision of the Strong Vocational Interest Blank.
Personnel and Guidance Journal, 44, 854-858.
Campbell, D. P. (1977). Manual for the Strong-Campbell Interest Inventory (rev. ed.).
Stanford, CA: Stanford University Press.
Campbell, D. P., & Hansen, J. C. (1981). Manual for the SVIB-SCII (3rd ed.). Stanford,
CA: Stanford University Press.
Cass, J., & Birnbaum, M. (1968). Comparative guide to American colleges. New York:
Harper and Row.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972). An MMPI handbook: Vol.
1, Clinical interpretation. Minneapolis: University of Minnesota Press.
Dawes, R. M. (1971). A case study of graduate admissions: Applications of three
principles of human decision-making. American Psychologist, 26, 180-188.
Friedman, A. F., Webb, J. T., & Lewak, R. (1989). Psychological assessment with the
MMPI. Hillsdale, NJ: Erlbaum.
Gough, H. G. (1962). Imagination-undeveloped resource. In S. J. Parnes & H. F.
Harding (Eds.), A source book for creative thinking (pp. 217-226). New York:
Scribners.
Gough, H. G. (1968). Manual for the Chapin Social Insight Test. Palo Alto, CA:
Consulting Psychologists Press.
Gough, H. G. (1973). Personality assessment in the study of population. In J. T.
Fawcett (Ed.), Psychological perspectives on populations (pp. 329-353). New
York: Basic Books.
Gough, H. G. (1976). Studying creativity by means of word association tests.
Journal of Applied Psychology, 61, 348-353.
Gough, H. G. (1979). A creative personality scale for the Adjective Check List.
Journal of Personality and Social Psychology, 37, 1398-1405.
Gough, H. G. (1983). Personality correlates of time required to complete work for
the Ph.D. degree in psychology. In C. D. Spielberger & J. N. Butcher (Eds.),
Advances in personality assessment (Vol. 3, pp. 105-128). Hillsdale, NJ: Erlbaum.
Gough, H. G. (1987). California Psychological Inventory administrator's guide. Palo
Alto, CA: Consulting Psychologists Press.
Gough, H. G. (1989). The California Psychological Inventory. In C. S. Newmark
(Ed.), Major psychological assessment instruments (Vol. 2, pp. 67-98). Boston:
Allyn and Bacon.
Gough, H. G. (1990). Testing for leadership with the California Psychological
Inventory. In K. E. Clark & M. B. Clark (Eds.), Measures of leadership (pp. 355-
379). West Orange, NJ: Leadership Library of America.
Gough, H. G., & Heilbrun, A. B., Jr. (1983). The Adjective Check List manual-1983
edition. Palo Alto, CA: Consulting Psychologists Press.
Gough, H. G., & Sampson, H. (1954). The College Vocabulary Test, Forms A and B.
Berkeley, CA: University of California Institute of Personality Assessment and
Research.
Gough, H. G., & Woodworth, D. G. (1960). Stylistic variations among professional
research scientists. Journal of Psychology, 49, 87-98.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford Press.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune and
Stratton.
Hall, W. B., & MacKinnon, D. W. (1969). Personality inventory correlates of
creativity among architects. Journal of Applied Psychology, 53, 322-326.
Hansen, J. C., & Campbell, D. P. (1985). Manual for the SVIB-SCII (4th ed.). Stanford,
CA: Stanford University Press.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule
(Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Helson, R. (1967). Personality characteristics and developmental history of
creative college women. Genetic Psychology Monographs, 76, 205-256.
Helson, R. (1971). Women mathematicians and the creative personality. Journal
of Consulting and Clinical Psychology, 36, 210-220.
Helson, R. (1978). Creativity in women. In J. Sherman & F. Denmark (Eds.), The
psychology of women: Future directions in research (pp. 533-604). New York:
Psychological Dimensions.
Helson, R. (1988). The creative personality. In K. Gronhaug & G. Kaufmann (Eds.),
Innovation: A cross-disciplinary perspective (pp. 29-64). Oslo, Norway: Norwegian
University Press.
Helson, R., & Crutchfield, R. S. (1970). Mathematicians: The creative researcher
and the average Ph.D. Journal of Consulting and Clinical Psychology, 34, 250-257.
Hirschfeld, R. M. A., Klerman, G. L., Gough, H. G., Barrett, J., Korchin, S. J., &
Chodoff, P. (1977). A measure of interpersonal dependency. Journal of Personality
Assessment, 41, 610-618.
Isaaksen, S. G. (1988). Educational implications of creativity research: An updated
rationale for creative learning. In K. Gronhaug & G. Kaufmann (Eds.), Innovation:
A cross-disciplinary perspective (pp. 167-203). Oslo, Norway: Norwegian
University Press.
Karni, E. S., & Levin, J. (1972). The use of smallest space analysis in studying scale
structure: An application to the California Psychological Inventory. Journal of
Applied Psychology, 56, 341-346.
Kelly, E. L., & Fiske, D. W. (1951). The prediction of performance in clinical psychology.
Ann Arbor, MI: University of Michigan Press.
Kelly, E. L., Goldberg, L. R., Fiske, D. W., & Kilkowski, J. M. (1978). Twenty-five years
later: A follow-up study of the graduate students in clinical psychology assessed
in the VA Selection Research Project. American Psychologist, 33, 746-754.
Levine, A. S. (1950). Construction and use of verbal analogy items. Journal of
Applied Psychology, 24, 105-107.
MacKinnon, D. W. (1964). The creativity of architects. In C. W. Taylor (Ed.),
Widening horizons in creativity (pp. 359-378). Reading, MA: Addison-Wesley.
MacKinnon, D. W. (1978). In search of human effectiveness: Identifying and developing
creativity. Buffalo, NY: Creative Education Foundation.
Miller, W. S. (1970). Manual for the Miller Analogies Test. New York: Psychological
Corporation.
Spearman, C. (1904). "General intelligence" objectively determined and measured.
American Journal of Psychology, 15, 72-293.
Strong, E. K., Jr. (1943). The vocational interests of men and women. Stanford, CA:
Stanford University Press.
Sundberg, N. D., Latkin, C. A., Littman, R. A., & Hagan, R. A. (1990). Personality in
a religious commune: CPIs in Rajneeshpuram. Journal of Personality Assessment,
55, 7-17.
Welsh, G. S. (1969). Gifted adolescents: A handbook of test results. Greensboro, NC:
Prediction Press.
Welsh, G. S. (1975). Creativity and intelligence: A personality approach. Chapel Hill,
NC: Institute for Research in Social Science, University of North Carolina.
Welsh, G. S. (1980). Manual for the Welsh Figure Preference Test. Palo Alto, CA:
Consulting Psychologists Press.

Author Note
At its inception, the project reported here was supported from a general research
grant to the Institute of Personality Assessment and Research by the Rockefeller
Foundation. In the late 1950s and 1960s, support was given by two career research
grants that I received from the Ford Foundation. In the 1970s, aid came from gifts
from the Consulting Psychologists Press, and from intramural faculty research
grants. In the 1980s, a grant from the Spencer Foundation supported work on a
follow-up inquiry, and on a consolidation of the computer archival files. All of this
financial aid is gratefully acknowledged.
Many individuals gave invaluable and much appreciated assistance to the study
during the years of its existence since 1950. Not every person can be named, but
specific acknowledgment must be made of those key persons who designed and
took responsibility for the computer analyses and archival files, beginning with
Quintin Welch and continuing with Susan Hopkin, Daniel S. Weiss, Peter B. Lifton,
Kevin Lanning, and Pamela Bradley. Significant help in the choice and construction
of tests was furnished by Frank Barron, Ravenna Helson, Donald MacKinnon,
Harold Sampson, and George Welsh. At every stage of the program, including
management of the project during years when I was absent on sabbatical leave,
Wallace B. Hall played a vital role. I want to thank each of these individuals, as well
as all others who contributed to the study.
Index
Academic engaged time (AET), in SEM, 62, 63
ACL, see Adjective Check List
ACQ Behavior Checklist, 87, 88, 89
Adaptation, in intelligence, 3-4
Adaptive testing, with MMPI-2, 157-159
Adjective Check List (ACL), 226, 227, 235, 237, 247, 248, 250
Adjustment Reaction of Childhood, 76
Adolescent assessment, see Multiaxial empirically based assessment
AET, see Academic engaged time
Age
  multiaxial empirically based assessment and, 79, 81, 85-86, 87, 88-89, 95, 99
  PCL-R and, 113
Alcohol abuse, 176, 183, 187, 191-192, see also Substance abuse
Alexithymia, 193
Alienation scale, BPI, 174, 178, 179, 183, 187
Anorexia, BPI assessment of, 187
Anthropological metaphors, 5, 21-26
Antisocial behavior, see Antisocial personality disorder; Structural equation modeling
Antisocial personality disorder (APD), 105-106, 111, 116, 118, 121-122, see also Psychopathy
Anxiety scale, BPI, 175, 184, 187
AOC, see Areas of Change questionnaire
APD, see Antisocial personality disorder
Aphasia, 14
Areas of Change questionnaire (AOC), 210-211
Army, psychiatric screening in, 169
Assessment, 76-77, 99
  taxonomy linked with, 77, 78
Back F scale, of MMPI-2, 135, 149
Balance-beam task, 20
Barnum effect, 156
Barron's self-report inventory, 228, 239-240, 250
Barron-Welsh Art Scale, 228, 250
Base Expectancy Scale (BES), 117
Basic Personality Inventory (BPI), xiv, 165-195
  class model compared with, 167-168, 169-170
  dimensional model compared with, 168-171
  format and administration of, 166
  MMPI relationship to, 166, 167, 173, 178, 180-183, 194, 195
  norms used in, 166-167
  purpose of, 165-166
  research applications of, 187-194
  scale construction and multivariate item analysis in, 178-180
  scale facets and concepts in, 171-177
  typologies in, 183-187
Beck Depression Inventory, 204
BES, see Base Expectancy Scale
Binet, Alfred, xii, 2, 3-5, 9
Biological metaphors, 5, 12-18, 34
Blacks, PCL-R and, 113, 119
Blood-flow approach, 13, 16, 17, 34
Boring, E. G., 5
BPI, see Basic Personality Inventory
Buchwald, Art, 132
California Psychological Inventory (CPI)
  creativity assessment and, xv, 225, 226-227, 230-234, 235, 238, 242-254
  CT scale of, see Creative Temperament scale, of CPI
  PCL-R compared with, 106, 111, 116, 121
Cattell, James McKeen, 3
CBCL, see Child Behavior Checklist
Cell assembly concept, 13
Chapin Social Insight Test, 228, 235
Child Behavior Checklist (CBCL)
  multiaxial empirically based assessment and, xiii, 78-79, 80, 81, 82, 87, 88, 89, 90, 91, 92, 93-98, 99
  SEM and, 49, 62
Children
  couple assessment and, 207-208, 209, 210
  multiaxial empirically based assessment of, see Multiaxial empirically based assessment
  Piaget's theories on, 18-19, 20-21
  structural equation modeling and, see Structural equation modeling
Chi-square statistics, in SEM, 46, 51, 54, 58, 66
CISS, see Couples Interaction Scoring System
Class models, 167-168, 169-170
Cleckley, H., 107, 115, 123
Cluster analytic methods, in BPI, 183
Coding systems, in couple assessment, 213-215
Cognitive processes, 10, 11
Cognitive tests, 11-12
College Vocabulary Test (CVT), 227, 235, 253
Communication Skills Test (CST), 214
Componential analysis, 10
Computational metaphors, 5, 9-12, 16
Computer programs
  intelligent, 10
  MMPI-2 and, 155-159
  multiaxial empirically based assessment and, 96, 99
Concurrent validity, of PCL-R, 116
Conditional comparativism, 22
Conflict Tactics Scale (CTS), 212-213
Conservation measurement, 20
Construct validity, of BPI, 167
Content validity, of coding systems, 214
Contrasted groups method of scale construction, 167-168, 169-170
Convergent validity
  of DAS, 209
  of MMPI-2, 135, 154
Countdown method, 158, 159
Couple assessment, xiv, 201-219
  conflict identification in, 207, 209-212
  contextual factors in, 206-208
  historical factors in, 206
  individual factors in, 204-206
  multidimensional, 218-219
  purposes of, 202-203
  relationship processes and patterns in, 212-217
  relationship satisfaction and, 209-212
Couples Interaction Scoring System (CISS), 213
CPI, see California Psychological Inventory
Creative Temperament scale (CT), of CPI, 243-254
  conceptual synthesis with, 249
  personological implications of, 247-249
Creativity assessment, xv, 225-254
  background of, 226-227
  CPI and, see under California Psychological Inventory
  criterion used in, 228-230
  findings in, 235-241
  procedures used in, 227-228
  sample in, 230-234
Criminality, PCL-R assessment of, 105, 106, 111, 114, 117-118, 119, 121, 122
Criticism, in intelligence, 3
Crystallized abilities, 7, 8, 17
CST, see Communication Skills Test
CTS, see Conflict Tactics Scale
CT scale, see Creative Temperament scale, of CPI
Cultural differentiation, law of, 21
Cultural relativism, 21-22, 23
Culture-fair tests, 23-24
CVT, see College Vocabulary Test
DAS, see Dyadic Adjustment Scale
Dax, Marc, 13-14
Delusions, BPI assessment of, 187
Denial scale, BPI, 172-173, 181, 183, 184, 187
Depression
  BPI assessment of, 167, 168, 170, 172, 182, 183, 184, 187, 193
  couple assessment and, 204
Depression scale, BPI, 172, 184, 187, 193
Deviation scale, BPI, 171, 177, 182
Diagnostic and Statistical Manual (DSM)
  BPI and, 169
  multiaxial empirically based assessment and, 83-84, 86, 91
  PCL-R and, 104, 106, 116, 118, 121, 122
Diagnostic Interview Schedule for Children (DISC), 84-85
Dialysis, BPI assessment and, 191
Differential Personality Inventory (DPI), 171, 172, 175, 195
Differential Reaction Schedule (DRS), 228, 239, 240, 250
Dimensional models, 168-171
Direction, in intelligence, 3
Direct Observation Form (DOF), 81, 82
DISC, see Diagnostic Interview Schedule for Children; Dyadic Interaction Scoring Code
DOF, see Direct Observation Form
DPI, see Differential Personality Inventory
DRS, see Differential Reaction Schedule
Drug abuse, see Substance abuse
DSM, see Diagnostic and Statistical Manual
Dyadic Adjustment Scale (DAS), 209-210
Dyadic Interaction Scoring Code (DISC), 213-214
Dyssocial personality disorder, 106, 116
Eating disorders, BPI assessment of, 193
Educational level
  MMPI-2 and, 138
  PCL-R and, 114
Electroencephalographic measurement, 13, 15
Electrophysiological approaches, 13, 15-16, 34
Embedded-figures test, 24
Empirically based assessment, 77-80
Employment
  couple assessment and, 208
  PCL-R and, 114
Epistemological metaphors, 5, 18-21
EQS program, 43-44
Evoked potentials, 15-16, 17, 18
Externalizing, in multiaxial empirically based assessment, 79, 81, 85, 95-96, 98
Face validity, of MMPI-2, 152, 154
Factor analysis
  in BPI, 180
  in geographic tests, 8
  in SEM, 44-45, 65, 67, 70
Factor scores, of PCL-R, 110-111, 113, 114, 115, 121, 123
Feighner criteria, 116
Firesetting, multiaxial empirically based assessment of, 98
Fluid abilities, 7, 8, 17
Gall, Franz Joseph, 6
Galton, Sir Francis, 2-3, 4, 5
Gender
  MMPI-2 and, 147
  multiaxial empirically based assessment and, 79, 85-86, 87, 88-89, 95, 99
  PCL-R and, 113
General Delinquency Scale, in SEM, 49
General factor in intelligence, 6, 7, 8, 9
Generalizability theory, 115
Generalized psychotic processes, 182
Generalized social anxiety, 181
General psychopathology factor, 87, 88
Geographic metaphors, 5, 6-9, 11, 16
Gestalt Closure test, 16
g factor, see General factor in intelligence
Hallucinations, BPI assessment of, 187
Health psychology, BPI in, 187, 191
Hemispheric specialization, 13-15
Hippocrates, 13
Histrionic personality disorder, 118
Hypochondriasis scale, BPI, 171-172, 178, 184
ICD-10 criteria, 106, 116
IEI, see Item Efficiency Index
Impulse Expression scale, BPI, 176, 181, 184, 187
Insomnia, BPI assessment of, 187
Instructional set paradigm, 150, 151
Intelligence
  historical views of, 2-5
  multiple, 30-31
  triarchic theory of, 30, 31-32, 33
  two-factor theory of, 6
Intelligence Applied, 33
Intelligence tests
  metaphors in, see Metaphors
  PCL-R and, 118
Internal consistency, of PCL-R, 114-115
Internalization, 26-27
Internalizing, in multiaxial empirically based assessment, 79, 81, 85, 95-96, 98
Interpersonal Problems scale, BPI, 173-174, 181, 184, 187
Interrater reliability, of PCL-R, 114-115
IPAR studies, 235, 245, 247
Irritable discipline, 50, 70
IRT, see Item Response Theory
Item Efficiency Index (IEI), in BPI, 178, 195
Item Response Theory (IRT), 157, 159
Jackson Personality Inventory, xiv
Juvenile delinquency, BPI assessment of, 192
K-ABC, see Kaufman Assessment Battery for Children
Kategoriensystem für Partnerschaftliche Interaktion (KPI), 214
Kaufman Assessment Battery for Children (K-ABC), xii, 16
Kpelle tribesmen, 25
KPI, see Kategoriensystem für Partnerschaftliche Interaktion
Latent variables
  in multiaxial empirically based assessment, 90, 99
  in SEM, 42, 46, 54, 69
Learning Potential Assessment Device (LPAD), 28-29
Letter-matching task, 12
Levi-Strauss, C., 23
LIFE, see Living in Family Environments coding system
LISREL, 44
Living in Family Environments coding system (LIFE), 214
LNNB, see Luria-Nebraska Neuropsychology Battery
LPAD, see Learning Potential Assessment Device
Luria, Alexander, 13, 16
Luria-Nebraska Neuropsychology Battery (LNNB), 16
Macrosocial variables, in SEM, 42-43, 48, 59-69, 70
Mania, BPI assessment of, 187
Marital Interaction Coding System (MICS), 213
Marital Satisfaction Inventory (MSI), 210
MAST, see Michigan Alcohol Screening Test
MAT, see Miller Analogies Test
Maternal reports, 51-55, see also Parent reports
MCMI-II, see Millon Clinical Multiaxial Inventory-II
Measurement models, in SEM, 51, 54, 57, 58, 65-66
Mediated learning experience, 28
Memory-scanning task, 12
Mental retardation, 3-4
Metacognitive processes, 10-11
Metaphors, xii, 1-34
  anthropological, 5, 21-26
  biological, 5, 12-18, 34
  computational, 5, 9-12, 16
  epistemological, 5, 18-21
  geographic, 5, 6-9, 11, 16
  sociological, 5, 26-29
  systems, 5, 30-33
Michigan Alcohol Screening Test (MAST), 49
Microcomputerization, BPI and, 193
Microsocial variables, in SEM, 42-43, 48, 59-69, 70
MICS, see Marital Interaction Coding System
Miller Analogies Test (MAT), 226, 227, 234, 235, 250, 253
Millon Clinical Multiaxial Inventory-II (MCMI-II), 111
Minnesota Multiphasic Personality Inventory (MMPI), xiii-xiv, 131-132, 135, 136, 144, 152, 156, 157, see also Minnesota Multiphasic Personality Inventory-2
  BPI relationship to, 166, 167, 173, 178, 180-183, 194, 195
  couple assessment and, 205
  creativity assessment and, 227, 230, 231, 238-239, 253
  descriptors of, 148-149
  MMPI-2 compared with, 145-148, 159
  PCL-R compared with, 106, 111, 116, 121
  SEM and, 49, 54
Minnesota Multiphasic Personality Inventory-2 (MMPI-2), xiv, 131-160, see also Minnesota Multiphasic Personality Inventory
  Back F scale of, 135, 149
  computer applications of, 155-159
  content scales of, 133-135, 159
  couple assessment and, 205
  item changes in, 132-133
  MMPI compared with, 145-148, 159
  new scale development in, 132, 133, 153-155
  normative sample in, 136-137, 158
  PCL-R compared with, 106, 111, 116
  research issues in, 145
  scaling of, 136-144
  subtle scales of, 152-153, 159
  TRIN Scale of, 136, 149
  T-scores of, see under T-scores
  validity of, 134-135, 145-152, 154, 155, 159
  VRIN Scale of, 135-136, 149
Minnesota Psycho-Analogies Test, 227, 235, 250, 253
MMPI, see Minnesota Multiphasic Personality Inventory
Mother's Frame of Reference, 44
MSI, see Marital Satisfaction Inventory
M-space, 19
Multiaxial empirically based assessment, xiii, 75-100
  core syndrome identification in, 87-89, 90, 92, 99
  cross-informant computer program in, 96, 99
  cross-informant discrepancies in, 84-85
  cross-informant syndrome constructs in, 90-91, 99
  instrument-specific syndrome scales in, 91-92, 99
  profiles for syndrome scoring in, 94-96
  syndrome pattern variation and, 85-86
  taxonomic decision tree in, 97-98, 99-100
Multiple intelligences, 30-31
Multiple regression analysis, SEM vs., 48, 67-69, 70
Multivariate item analysis, in BPI, 178-181
Narcissistic personality disorder, 111, 118
Native American Indians, PCL-R and, 113, 119
Negative playground behavior, see Playneg
Neuropsychological approaches, 13-14, 34
Neurotic triad, in MMPI, 182
Occupation, see Employment
Operational definitions
  in multiaxial empirically based assessment, 83-84, 90
  in SEM, 69
Oregon Youth Study (OYS), 49, 62
OSLC interview scale, 49
OYS, see Oregon Youth Study
Parent reports, 78-80, see also Maternal reports
PCL, see Psychopathy Checklist
Pearson product-moment formula, 96
Performance theory, 42
Permutations task, 21
Persecutory Ideas scale, BPI, 174-175, 178, 182, 184, 187
Personality Research Form, xiv
Piaget, Jean, 18-19, 20-21, 26
Playneg, in SEM, 62, 63, 65
Plethysmography, 117
Polythetic concept, 91
Predictive validity
  of coding systems, 214
  of PCL-R, 117-118
Primary mental abilities theory, 6-7
Principal components analysis, in multiaxial empirically based assessment, 79, 92, 95
Profile validity, of MMPI-2, 149-152, 159
Prototype concepts, in multiaxial empirically based assessment, 90-91, 99
Psychological Vocabulary and Information Test (PVIT), 227, 235, 250
Psychopathology
  BPI assessment of, see Basic Personality Inventory
  MMPI/MMPI-2 assessment of, 135, 152, 153, 15~, 166
  multiaxial empirically based assessment of, see Multiaxial empirically based assessment
Psychopathy, 104-106
  CPI assessment of, 106, 111, 116, 121
  defined, 104-105
  higher-order construct of, 110
  MMPI/MMPI-2 assessment of, 106, 111, 116, 121
  PCL-R assessment of, see Psychopathy Checklist-Revised
  sociopathy and APD compared with, 105-106
Psychopathy Checklist (PCL), 103-104, 106, 107, 110-111, 113, 114, 115, 116, 118, 119, 120, see also Psychopathy Checklist-Revised
  PCL-R correlated with, 108-109
Psychopathy Checklist-Revised (PCL-R), xiii, 103-123
  administration of, 109-110
  assessment issues in, 106
  demographic variables and, 113-114
  factor scores of, 110-111, 113, 114, 115, 121, 123
  global ratings in, 107, 110, 116
  in laboratory studies, 119-120
  PCL correlated with, 108-109
  reliability of, 114-115
  scale format in, 109
  total scores of, see Total scores, of PCL-R
  uses and users of, 111-112
  validity of, 115-122
Psychophysiological assessment, 214
PVIT, see Psychological Vocabulary and Information Test
P-300 wave form, 15
Race, PCL-R and, 113, 119
Rajneesh, Bhagwan Shree, 249
Raven's Advanced Progressive Matrices, 16, 17
RDC, see Research Diagnostic Criteria
Recidivism Prediction Scale (RPS), 117
Relational efficacy, 205-206
Research Diagnostic Criteria (RDC), 116
RPS, see Recidivism Prediction Scale
Salient Factor Score (SFS), 117
Saturated models, in SEM, 51, 52
Schizophrenia
  BPI and, 167-168, 175
  PCL-R and, 118
Schizophrenic Reaction, Childhood Type, 76
Scholastic Aptitude Test, 137
SCIC, see Semistructured Clinical Interview for Children
Self Depreciation scale, BPI, 176-177, 178, 184, 187
SEM, see Structural equation modeling
Semistructured Clinical Interview for Children (SCIC), 81, 82
Sex, see Gender
Sexism, in MMPI, 132, 133
Sexual offenses, 117, 120
SFS, see Salient Factor Score
SII, see Strong Interest Inventory
Simon, T., 3, 4
Simplex model, in SEM, 52, 53-54, 55, 58
SOC, see Spouse Observation Checklist
Social Introversion scale, BPI, 176, 184, 187
Social undercontrol, 181
Socioeconomic status
  MMPI-2 and, 151
  PCL-R and, 111, 114
  SEM and, 48, 49, 50, 55-59
Sociological metaphors, 5, 26-29
Sociopathy, 105-106, see also Psychopathy
SPAFF, see Specific Affect Coding System
Spearman, Charles, 6, 7, 8, 9, 249
Specific Affect Coding System (SPAFF), 214
Spectrum project, 32
Sperry, Roger, 14
Split-brain patients, 14
Spouse Observation Checklist (SOC), 211-212
Stanford-Binet intelligence tests, xii, 8, 16
State-Trait Anxiety scale, 175
Strong Interest Inventory (SII), 227, 240-241, 250, 253
Strong Vocational Interest Blank for Men (SVIB-M), 227, 240
Structural equation modeling (SEM), xii-xiii, 41-70
  causality in, 45-46
  data analysis in, 44-45, 61-62
  factor analysis in, 44-45, 65, 67, 70
  hypothesis testing in, 46
  limitations of, 47-48
  macrosocial variables in, see Macrosocial variables, in SEM
  maternal antisocial behavior/discipline in, 47-59
  measures used in, 49-50, 62-63
  microsocial variables in, see Microsocial variables, in SEM
  mother-report prediction models in, 51-55
  multiple regression analysis vs., 48, 67-69, 70
  prediction in, 63-64
  sample in, 48-49, 62
Substance abuse, see also Alcohol abuse
  BPI and, 175, 177, 187, 191-192
  couple assessment and, 204-205
  PCL-R and, 105, 111, 118
Subtle scales, of MMPI-2, 152-153, 159
Suicidal ideation
  BPI assessment of, 177, 187
  multiaxial empirically based assessment of, 98
SVIB-M, see Strong Vocational Interest Blank for Men
Systems metaphors, 5, 30-33
Taxonomic decision tree, 97-98, 99-100
Taxonomic sorting, 25
Taxonomy, 76-77, 78, 86, 99
  defined, 76
Teacher's Report Form (TRF), xiii, 81, 82, 89, 90, 91, 92, 93-98, 99
Terman, Lewis, 4
Test reliability
  of DAS, 210
  in multiaxial empirically based assessment, 81
  of PCL-R, 114-115
Test validity
  of BPI, 167, 180
  of coding systems, 214
  of CPI, 247
  of DAS, 209
  of MMPI-2, 134-135, 145-152, 154, 155, 159
  in multiaxial empirically based assessment, 81
  of PCL-R, 115-122
Thinking Disorder scale, BPI, 175-176, 182, 184, 187
Thomson, Godfrey, 6
Thurstone, L. L., 6-7, 8
TNP, see Total negative process
Total negative process (TNP), in SEM, 62, 65-66
Total scores, of PCL-R, 110, 112-113, 114, 115, 116, 117, 118, 123
TRF, see Teacher's Report Form
Triarchic theory of intelligence, 30, 31-32, 33
TRIN Scale, see True Response Inconsistency Scale
True Response Inconsistency Scale (TRIN), of MMPI-2, 136, 149
T-scores
  of MMPI, 49
  of MMPI-2, 134, 136, 144, 147-148, 157, 158, 159
  of MSI, 210
  in multiaxial empirically based assessment, 95, 96
Two-factor theory of intelligence, 6
Universalism, 23
Variable Response Inconsistency Scale (VRIN), of MMPI-2, 135-136, 149
Violence
  couple assessment and, 212-213
  PCL-R assessment of, 117, 119, 122
VRIN Scale, see Variable Response Inconsistency Scale
Vygotsky, Lev, 26, 27
Wechsler intelligence tests, xii, 8, 17
Welsh Anxiety scales, 239
Welsh Revised Art Scale, 228, 250
Wiener and Harmon Subtle and Obvious scales, 152-153
Wissler, Clark, 3
Wolof Tribe, 24-25
YABCL, see Young Adult Behavior Checklist
YASR, see Young Adult Self-Report
Young Adult Behavior Checklist (YABCL), 81
Young Adult Self-Report (YASR), 81
Youth Self-Report (YSR), 81, 82, 89, 90, 91, 92, 93-98, 99
YSR, see Youth Self-Report
Zone of proximal development, 27-29

You might also like