Psychological Testing - An Introduction - 2nd Ed - Domino & Domino
0521861810pre
CB1038/Domino
0 521 86181 0
March 6, 2006
ii
13:31
Second Edition
This book is an introductory text to the field of psychological testing, primarily suitable
for undergraduate students in psychology, education, business, and related fields. This
book will also be of interest to graduate students who have not had prior exposure
to psychological testing and to professionals such as lawyers who need to consult a
useful source. Psychological Testing is clearly written, well organized, comprehensive,
and replete with illustrative materials. In addition to the basic topics, the text covers
in detail topics that are often neglected by other texts such as cross-cultural testing,
the issue of faking tests, the impact of computers, and the use of tests to assess positive
behaviors such as creativity.
George Domino is the former Director of Clinical Psychology and Professor of Psychology at the University of Arizona. He was previously Director of the Counseling
Center and Professor of Psychology at Fordham University.
Marla L. Domino has a BA in Psychology, an MA in Criminal Law, and a PhD in
Clinical Psychology specializing in Psychology and Law. She also completed a postdoctoral fellowship in Clinical-Forensic Psychology at the University of Massachusetts
Medical School, Law and Psychiatry Program. She is currently the Chief Psychologist
in the South Carolina Department of Mental Health's Forensic Evaluation Service and
an assistant professor in the Department of Neuropsychiatry and Behavioral Sciences
at the University of South Carolina. She was recently named Outstanding Employee of the Year in Forensics (2004) by the South Carolina Department of Mental Health.
SECOND EDITION
Psychological Testing
An Introduction
George Domino
University of Arizona
Marla L. Domino
Department of Mental Health, State of South Carolina
isbn-13: 978-0-521-86181-6 hardback
isbn-10: 0-521-86181-0 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
Preface
page ix
Acknowledgments
xi
Personality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Cognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Psychopathology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Aim, 161 • Introduction, 161 • Measures, 163 • The Minnesota
Multiphasic Personality Inventory (MMPI) and MMPI-2, 170 • The
Millon Clinical Multiaxial Inventory (MCMI), 179 • Other Measures,
185 • Summary, 196 • Suggested Readings, 196 • Discussion
Questions, 196
Aim, 257 • Some Overall Issues, 257 • Attitudes Toward the Elderly,
260 • Anxiety About Aging, 261 • Life Satisfaction, 261 • Marital
Satisfaction, 263 • Morale, 264 • Coping or Adaptation, 265 • Death
and Dying, 265 • Neuropsychological Assessment, 266 • Depression,
269 • Summary, 270 • Suggested Readings, 270 • Discussion
Questions, 271
Aim, 356 • Some Basic Issues, 356 • Some Basic Findings, 356 •
Ratings, 359 • The Role of Personality, 360 • Biographical Data
(Biodata), 363 • Assessment Centers, 365 • Illustrative Industrial
Concerns, 371 • Testing in the Military, 373 • Prediction of Police
Test Index
623
Index of Acronyms
627
Subject Index
629
Preface
but do they really belong in a textbook on psychological testing? If they do, then that means
that some other topics more directly relevant to
testing are omitted or given short shrift. In this
textbook, we have chosen to focus on testing and
to minimize the theoretical issues associated with
intelligence, personality, etc., except where they
may be needed to have a better understanding of
testing approaches.
It is no surprise that computers have had (and
continue to have) a major impact on psychological testing, and so an entire chapter of this book
(Chapter 17) is devoted to this topic. There is
also a vast body of literature and great student
interest on the topic of faking, and here too an
entire chapter (Chapter 16) has been devoted to
Acknowledgments
AIM In this chapter we cover four basic issues. First, we focus on what a test is, not just
a formal definition, but ways of thinking about tests. Second, we try to develop a
taxonomy of tests; that is, we look at various ways in which tests can be categorized.
Third, we look at the ethical aspects of psychological testing. Finally, we explore how
we can obtain information about a specific test.
INTRODUCTION
Most likely you would have no difficulty identifying a psychological test, even if you met one in
a dark alley. So the intent here is not to give you
one more definition to memorize and repeat, but
rather to spark your thinking.
What is a test? Anastasi (1988), one of the
of experiment, the experimenter studies a phenomenon and observes the results, while at the
same time keeping in check all extraneous variables so that the results can be ascribed to a particular antecedent cause. In psychological testing,
however, it is usually not possible to control all
the extraneous variables, but the metaphor here
is a useful one that forces us to focus on the standardized procedures, on the elimination of conflicting causes, on experimental control, and on
the generation of hypotheses that can be further
investigated. So if I administer a test of achievement to little Sandra, I want to make sure that
her score reflects what she has achieved, rather
than her ability to follow instructions, her degree
of hunger before lunch, her uneasiness at being
tested, or some other influence.
A second way to consider a test is to think of a
test as an interview. When you are administered
an examination in your class, you are essentially
being interviewed by the instructor to determine
how well you know the material. We discuss interviews in Chapter 18, but for now consider the
following: in most situations we need to talk
to each other. If I am the instructor, I need to
know how much you have learned. If I am hiring
an architect to design a house or a contractor to
build one, I need to evaluate their competency,
and so on. Thus interviews are necessary, but
a test offers many advantages over the standard
whether a test is commercially published (sometimes called a proprietary test) or not. Major
tests like the Stanford-Binet and the Minnesota
Multiphasic Personality Inventory are available
for purchase by qualified users through commercial companies. The commercial publisher advertises primarily through its catalog, and for many
tests makes available, for a fee, a specimen set, usually the test booklet and answer sheet, a scoring
key to score the test, and a test manual that contains information about the test. If a test is not
commercially published, then a copy is ordinarily
available from the test author, and there may be
some accompanying information, or perhaps just
the journal article where the test was first introduced. Sometimes journal articles include the
original test, particularly if it is quite short, but
often they will not. (Examples of articles that contain test items are R. L. Baker, Mednick, & Hocevar, 1991; L. R. Good & K. C. Good, 1974; McLain,
1993; Rehfisch, 1958a; Snell, 1989; Vodanovich
& Kass, 1990.) Keep in mind that the contents of
journal articles are copyrighted, and permission to
use a test must be obtained from both the author
and the publisher.
If you are interested in learning more about
a specific test, first you must determine if the
test is commercially published. If it is, then you
will want to consult the Mental Measurements
Yearbook (MMY), available in most university
libraries. Despite its name, the MMY is published
at irregular intervals rather than yearly. However,
it is an invaluable guide. For many commercially
Tests can also be distinguished by various aspects of their administration. For example, there are group vs. individual
tests; group tests can be administered to a group
of subjects at the same time, and individual tests
to one person at a time. The Stanford-Binet
test of intelligence is an individual test, whereas
the SAT is a group test. Clinicians who deal with
one client at a time generally prefer individual
tests because these often yield observational data
in addition to a test score; researchers often need
to test large groups of subjects in minimum time
and may prefer group tests (there are, of course,
many exceptions to this statement). A group test
can be administered to one individual; sometimes, an individual test can be modied so it
can be administered to a group.
Tests can also be classified as speed vs. power
tests. Speed tests have a time limit that affects
performance; for example, you might be given a
page of printed text and asked to cross out all the
e's in 25 seconds. How many you cross out will
which overlaps with the approaches already mentioned, is through their item structure. Test items
can be placed on a continuum from objective to
subjective. At the objective end, we have multiple-choice items; at the subjective end, we have the
type of open-ended questions that clinical psychologists and psychiatrists ask, such as "tell me
5. Tests that assess normal and positive functioning, such as creativity, competence, and self-esteem.
Test function. Tests can also be categorized
depending upon their function. Some tests are
used to diagnose present conditions. (Does the
client have a character disorder? Is the client
depressed?) Other tests are used to make predictions. (Will this person do well in college? Is
this client likely to attempt suicide?) Other tests
are used in selection procedures, which basically
involve accepting or not accepting a candidate, as
in admission to graduate school. Some tests are
used for placement purposes: candidates who
have been accepted are placed in a particular
treatment. For example, entering students at
a university may be placed in different-level writing courses depending upon their performance
on a writing exam. A battery of tests may be used
to make such a placement decision or to assess
which of several alternatives is most appropriate
for the particular client; here the term typically
used is classification (note that this term has both
a broader meaning and a narrower meaning).
Some tests are used for screening purposes; the
term screening implies a rapid and rough procedure. Some tests are used for certification, usually related to some legal standard; thus, passing
a driving test certifies that the person has, at the
very least, a minimum proficiency and is allowed
to drive an automobile.
Score interpretation. Yet another classification refers
not to the test itself but to how the score or performance is interpreted. The same test could yield
either or both score interpretations.
Another distinction that can be made is
whether the measurement provided by the test
is normative or ipsative, that is, whether the standard of comparison reflects the behavior of others
or of the client. Consider a 100-item vocabulary
test that we administer to Marisa, and she obtains
a score of 82. To make sense of that score, we
compare her score with some normative data,
for example, the average score of similar-aged college students. Now consider a questionnaire that
asks Marisa to decide which of two values is more
important to her: Is it more important for you
to have (1) a good-paying job, or (2) freedom to
do what you wish? We could compare her choice
with that of others, but in effect we have simply
asked her to rank two items in terms of her own
preferences or her own behavior; in most cases it
would not be legitimate to compare her ranking
with those of others. She may prefer choice number 2, but not by much, whereas for me choice
number 2 may be a very strong preference.
One way of defining ipsative is that the scores
on the scale must sum to a constant. For example, if you are presented with a set of six
ice cream flavors to rank order as to preference, no matter whether your first preference is
"crunchy caramel" or "Bohemian tutti-frutti,"
the sum of your six preferences will be 21
(1 + 2 + 3 + 4 + 5 + 6). On the other hand, if you
were asked to rate each flavor independently on
a 6-point scale, you could rate all of them high or
all of them low; this would be a normative scale.
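The constant-sum property of ipsative scales, and its contrast with normative ratings, can be illustrated with a short sketch (Python; the flavor names and the particular ranks and ratings are invented for illustration):

```python
# Ipsative measurement: a full ranking of six flavors always sums to 21,
# so scores reflect each person's internal ordering, not absolute level.
ranks_ann = {"vanilla": 1, "chocolate": 2, "mint": 3,
             "mocha": 4, "crunchy caramel": 5, "tutti-frutti": 6}
ranks_bob = {"tutti-frutti": 1, "crunchy caramel": 2, "vanilla": 3,
             "chocolate": 4, "mint": 5, "mocha": 6}
print(sum(ranks_ann.values()), sum(ranks_bob.values()))  # 21 21

# Normative measurement: independent 6-point ratings are free to vary,
# so totals differ across people and can be compared against group norms.
ratings_ann = {"vanilla": 6, "chocolate": 6, "mint": 5,
               "mocha": 5, "crunchy caramel": 6, "tutti-frutti": 4}
ratings_bob = {"vanilla": 2, "chocolate": 1, "mint": 2,
               "mocha": 1, "crunchy caramel": 2, "tutti-frutti": 1}
print(sum(ratings_ann.values()), sum(ratings_bob.values()))  # 32 9
```

Note how the two rankings carry information only about order within each person, while the two rating totals can legitimately be compared with each other.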
Another way to define ipsative is to focus on the
idea that in ipsative measurement, the mean is
that of the individual, whereas in normative measurement the mean is that of the group. Ipsative
measurement is found in personality assessment;
we look at a technique called Q sort in Chapter 18.
Block (1957) found that ipsative and normative
ratings of personality were quite equivalent.
Another classificatory approach involves
whether the responses made to the test are interpreted psychometrically or impressionistically. If
the responses are scored and the scores interpreted on the basis of available norms and/or
research data, then the process is a psychometric
one. If instead the tester looks at the responses
carefully on the basis of his/her expertise and
distinction is whether a test assesses maximal performance (how well a person can do) or typical
performance (how well the person typically does)
(Cronbach, 1970). Tests of maximal performance
usually include achievement and aptitude tests
and are typically based on items that have a correct
answer. Typical performance tests include personality inventories, attitude scales, and opinion
questionnaires, for which there are no correct
answers.
Age range. We can classify tests according to
the age range for which they are most appropriate. The Stanford-Binet, for example, is appropriate for children but less so for adults; the SAT
is appropriate for adolescents and young adults
but not for children. Tests are used with a wide
variety of clients and we focus particularly on
children (Chapter 9), the elderly (Chapter 10),
minorities and individuals in different cultures
(Chapter 11), and the handicapped (Chapter 12).
Type of setting. Finally, we can classify tests
according to the setting in which they are primarily used. Tests are used in a wide variety of settings, but the most prevalent are school settings
(Chapter 13), occupational and military settings
(Chapter 14), and mental health settings such
as clinics, courts of law, and prisons (Chapter 15).
ETHICAL STANDARDS
example, ordered by the courts) may be detrimental to a particular client. How can these situations be resolved so that both the needs of society
and the welfare of the individual are protected?
6. Social responsibility: Psychologists have professional and scientific responsibilities to the community and society. With regard to psychological
testing, this might cover counseling against the
misuse of tests by the local school.
In addition to these six principles, there are
specific ethical standards that cover eight categories, ranging from "General standards" to
"Resolving ethical issues." The second category is titled "Evaluation, assessment, or intervention" and is thus the area most explicitly
related to testing; this category covers 10 specific
standards:
1. Psychological procedures such as testing, evaluation, and diagnosis should occur only within
the context of a defined professional relationship.
2. Psychologists only use tests in appropriate
ways.
3. Tests are to be developed using acceptable scientific procedures.
4. When tests are used, there should be familiarity with and awareness of the limitations imposed
by psychometric issues, such as those discussed
in this textbook.
5. Assessment results are to be interpreted in light
of the limitations inherent in such procedures.
6. Unqualified persons should not use psychological assessment techniques.
7. Tests that are obsolete and outdated should
not be used.
8. The purpose, norms, and other aspects of a
test should be described accurately.
9. Appropriate explanations of test results should
be given.
10. The integrity and security of tests should be
maintained.
Standards for educational and psychological
tests. In addition to the more general ethical
standards discussed above, there are also specific standards for educational and psychological
tests (American Educational Research Association, 1999), first published in 1954, and subsequently revised a number of times.
The right to privacy basically concerns the willingness of a person to share with others personal
information, whether that information be factual or involve feelings and attitudes. In many
tests, especially personality tests, the subject is
asked to share what may be very personal information, occasionally without realizing that such
sharing is taking place. At the same time, the subject cannot be instructed that "if you answer true
to item #17, I will take that as evidence that you are
introverted."
What is or is not an invasion of privacy may be a
function of a number of factors. A person seeking the help of a sex therapist may well expect
and understand the need for some very personal questions about his or her sex life, while
a student seeking career counseling would not
expect to be questioned about such behavior (for
a detailed analysis of privacy as it relates to psychological testing see Ruebhausen & Brim, 1966;
for some interesting views on privacy, including Congressional hearings, see the November
1965 and May 1966 issues of the American
Psychologist).
Mention might also be made of feedback: providing and explaining test results to the client.
Pope (1992) suggests that feedback may be
the most neglected aspect of assessment, and
describes feedback as a dynamic, interactive process, rather than a passive, information-giving
process.
The concern for ethical behavior is a pervasive aspect of the psychological profession, but
one that laypeople often are not aware of. Students, for example, at times do not realize that
their requests ("can I have a copy of the XYZ
intelligence test to assess my little brother") could
involve unethical behavior.
In addition to the two major sets of ethical
standards discussed above, there are other pertinent documents. For example, there are guidelines for providers of psychological services to
members of populations whose ethnic, linguistic, or cultural background are diverse (APA,
1993), which include at least one explicit statement about the application of tests to such individuals, and there are guidelines for the disclosure of test data (APA, 1996). All of these
documents are the result of hard and continuing work on the part of many professional
organizations.
Test levels. If one considers tests as tools to be
used by professionals trained in their use, then it
becomes quite understandable why tests should
not be readily available to unqualified users. In
fact, the APA proposed many years ago a rating
system of three categories of tests: level A tests
require minimal training, level B tests require
some advanced training, and level C tests require
substantial professional expertise. These guidelines are followed by many test publishers, who
often require that prospective customers fill out
a registration form indicating their level of expertise to purchase specific tests.
There is an additional reason why the availability of tests needs to be controlled, and that is
security. A test score should reflect the dimension being measured, for example, knowledge of
elementary geography, rather than some other
process, such as knowledge of the right answers.
As indicated earlier, some tests are highly secured
and their use is tightly controlled; for example,
tests like the SAT or the GRE are available only
to those involved in their administration, and a
strict accounting of each test booklet is required.
Other tests are readily available, and their item
content can sometimes be found in professional
journals or other library documents.
SUMMARY
Chun, K. T. et al. (1975). Measures for psychological assessment: A guide to 3000 original sources
and their applications. Ann Arbor: University of
Michigan.
An old but still useful source for measures of mental health.
Johnson, O. G. (1970; 1976). Tests and measurements in child development. San Francisco: Jossey-Bass.
The two volumes cover unpublished tests for use with
children.
McReynolds, P. (Ed.) (1968). Advances in psychological assessment. Palo Alto: Science and Behavior Books.
This is an excellent series of books, the first one published in 1968, each book consisting of a series of chapters on assessment topics, ranging from reviews of specific tests like the Rorschach and the California Psychological Inventory (CPI) to topic areas like the assessment
of anxiety, panic disorder, and adolescent suicide.
Newmark, C. S. (Ed.) (1985; 1989), Major psychological assessment instruments, volumes I and II.
Boston: Allyn & Bacon.
A nice review of the most widely used tests in current
psychological assessment, the volumes give detailed
information about the construction, administration,
interpretation, and status of these tests.
This is the first edition of what has become a continuing series. In this particular volume, over 3,000 tests,
both commercially available and unpublished, are given
brief thumbnail sketches.
A review of 143 measures covering such areas as personality, self-concept, attitudes, and social skills.
SUGGESTED READINGS
Dailey, C. A. (1953). The practical utility of the clinical report. Journal of Consulting Psychology, 17, 297–302.
An interesting study that tried to quantify how clinical procedures, based on tests, contribute to the decisions made about
patients.
Lorge, I. (1951). The fundamental nature of measurement. In E. F. Lindquist (Ed.), Educational Measurement, pp. 533–559. Washington, D.C.: American
Council on Education.
An excellent overview of measurement, including the NOIR
system.
DISCUSSION QUESTIONS
AIM This chapter looks at three basic questions: (1) How are tests constructed?
(2) What are the basic principles involved in administering a test? and (3) How can
we make sense of a test score?
CONSTRUCTING A TEST
eight life stages), perhaps their relative importance (are they all of equal importance?), and
how many items each subtopic will contribute
to the overall test (I might decide, for example,
that each of the eight stages should be assessed by
15 items, thus yielding a total test of 120 items).
This table of specifications may reflect not only
my own thinking, but the theoretical notions
present in the literature, other tests that are available on this topic, and the thinking of colleagues
and experts. Test companies that develop educational tests such as achievement batteries often
go to great lengths in developing such a table of
specifications by consulting experts, either individually or in group conferences; the construction of these tests often represents a major effort of
many individuals, at a high cost beyond the reach
of any one person.
The table of specifications may be very formal
or very informal, or sometimes absent, but it leads
to the writing or assembling of potential items.
These items may be the result of the test constructor's own creativity; they may be obtained
from experts, from other measures already available, from a reading of the pertinent literature,
from observations and interviews with clients,
and from many other sources. Writing good test items
is both an art and a science and is not easily
achieved. I suspect you have taken many instructor-made tests where the items were not clear,
the correct answers were quite obvious, or the
items focused on some insignificant aspects of
your coursework. Usually, the classroom instructor writes items and uses most of them. The professional test constructor knows that the initial
pool of items needs to be at a minimum four or
five times as large as the number of items actually
needed.
5. Tryouts and refinement. The initial pool of
items will probably be large and rather unrefined.
Items may be near duplications of each other, or
perhaps not clearly written or understood. The
intent of this step is to refine the pool of items to
a smaller but usable pool. To do this, we might
ask colleagues (and/or enemies) to criticize the
items, or we might administer them to a captive
class of psychology majors to review and identify
items that may not be clearly written. Sometimes
pilot testing is used, where a preliminary form is
administered to a sample of subjects to determine
whether there are any glitches, etc. Such pilot testing might involve asking the subjects to think
aloud as they answer each item or to provide
feedback as to whether the instructions are clear,
the items interesting, and so on. We may also do
some preliminary statistical work and assemble
the test for a trial run called a pretest. For example,
if I were developing a scale to measure depression, I might administer my pool of items (say
250) to groups of depressed and nondepressed
people and then carry out item analyses to see
which items in fact differentiate the two groups.
For example, to the item "I am feeling blue" I
might expect significantly more depressed people to answer true than nondepressed people.
I might then retain the 100 items that seem to
work best statistically, write each item on a 3 × 5
card, and sort these cards into categories according to their content: all the items dealing
with sleep disturbances in one pile, all the items
dealing with feelings in a separate pile, and so on.
This sorting might indicate that we have too many
items of one kind and not enough of another,
so I might remove some of the excess items and
write some new ones for the underrepresented
category. Incidentally, this process is known as
content analysis (see Gottschalk & Gleser, 1969).
This step, then, consists of a series of procedures,
some requiring logical analysis, others statistical analysis, that are often repeated several times,
until the initial pool of items has been reduced
to a manageable size and all the evidence indicates that our test is working the way we wish
it to.
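The item-analysis logic of this step can be sketched as follows (Python; the response data are invented, and a real analysis would use large groups and a statistical test of the difference rather than an eyeball comparison):

```python
# For each item, compare the proportion answering "true" in the depressed
# and nondepressed groups; items showing a large difference differentiate
# the groups and are candidates for retention.
depressed = [      # each row: one person's true (1) / false (0) answers
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
nondepressed = [
    [0, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
]

def endorsement(group, item):
    """Proportion of the group answering 'true' to the given item."""
    return sum(person[item] for person in group) / len(group)

for item in range(3):
    diff = endorsement(depressed, item) - endorsement(nondepressed, item)
    print(f"item {item}: endorsement difference = {diff:+.2f}")
```

An item like "I am feeling blue" would be expected to show a large positive difference; an item endorsed equally often by both groups would be dropped.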
6. Reliability and validity. Once we have
refined our pool of items to a manageable size,
and have done the preliminary work of the above
steps, we need to establish that our measuring
instrument is reliable, that is, consistent, and
measures what we set out to measure, that is,
the test is valid. These two concepts are so basic
and important that we devote an entire chapter
to them (see Chapter 3). If we do not have reliability and validity, then our pool of items is not a
measuring instrument, and it is precisely this that
distinguishes the instruments psychologists use
from those questionnaires that are published in
popular magazines to determine whether a person is a "good lover," "financially responsible,"
or a "born leader."
on fifth graders. It is not unusual for achievement tests used in school systems to have normative samples in the tens of thousands, chosen
to be representative on the basis of census data
or other guiding principles, but for most tests the
sample size is often in the hundreds or smaller.
The sample should also be clearly defined, so that
the test user can assess its adequacy: was the
sample a captive group of introductory psychology students, or a random sample representative of many majors? Was the sample selected on
specic characteristics such as income and age,
to be representative of the national population?
How were the subjects selected?
8. Further refinements. Once a test is made
available, either commercially or to other
researchers, it often undergoes refinements and
revisions. Well-known tests such as the Stanford-Binet have undergone several revisions, sometimes quite major and sometimes minor. Sometimes the changes reflect additional scientific
knowledge, and sometimes societal changes, as
in our greater awareness of gender bias in
language.
One type of revision that often occurs is the
development of a short form of the original test.
Typically, a different author takes the original test,
administers it to a group of subjects, and shows
by various statistical procedures that the test can
be shortened without any substantial loss in reliability and validity. Psychologists and others are
always on the lookout for brief instruments, and
so short forms often become popular, although as
a general rule, the shorter the test the less reliable
and valid it is. (For some examples of short forms
see Burger, 1975; Fischer & Fick, 1993; Kaufman,
1972; Silverstein, 1967.)
Still another type of revision that occurs fairly
frequently comes about through factor analysis. Let's
say I develop a questionnaire on depression
say I develop a questionnaire on depression
that assesses what I consider are four aspects of
depression. A factor analysis might indeed indicate that there are four basic dimensions to my
test, and so perhaps each should be scored separately, in effect, yielding four scales. Or perhaps,
the results of the factor analysis indicate that there
is only one factor and that the four subscales I
thought were separate are not. Therefore, only
one score should be generated. Or the factor analysis might indicate that of the 31 items on the test,
46 is to 24 as 19 is to
(a) 9, (b) 13, (c) 38, (d) 106
States
1. California
2. Michigan
3. North Carolina
4. Ohio
5. Montana
6. Arizona
7. South Dakota
8. Idaho
Matching items can be useful in assessing specic factual knowledge such as names of authors
and their novels, dates and historical events, and
so on. One problem with matching items is that
mismatching one component can result in mismatching other components; thus the components are not independent.
7. Completion items. These provide a stem and
require the subject to supply an answer. If potential answers are given, this becomes a multiple-choice item. Examples of completion items
are:
Wundt established his laboratory in the year ____.
I am always ____.
(Here the answer is 35 because the series of numbers increases alternately by 7 points and 4 points:
6 + 7 = 13; 13 + 4 = 17; 17 + 7 = 24; etc.)
Presumably, choice (a) would reflect introversion, while choice (b) would reflect extraversion;
whether the item works as intended would need
to be determined empirically.
10. Vignettes. A vignette is a brief scenario, like
the synopsis of a play or novel. The subject is
asked to react in some way to the vignette, perhaps by providing a story completion, choosing
from a set of alternatives, or making some type
of judgment. Examples of studies that have used
vignettes are those of G. Domino and Hannah
(1987), who asked American and Chinese children to complete brief stories; of DeLuty (1988,
1989), who had students assess the acceptability of suicide; of Wagner and Sternberg (1986),
who used vignettes to assess what they called
"tacit knowledge"; and of Iwao and Triandis
(1993), who assessed Japanese and American
stereotypes.
11. Rearrangement or continuity items. This
is one type of item that is relatively rare but has
potential. These items measure a person's knowledge about the order of a series of items. For
example, we might list a set of names, such as
Wilhelm Wundt, Lewis Terman, Arthur Jensen,
etc., and ask the test taker to rank these in chronological order. The difficulty with this type of item
is the scoring, but Cureton (1960) has provided a
table that can be used in a relatively easy scoring
procedure that reflects the difference between the
person's answers and the scoring key.
Objective-subjective continuum. Different
kinds of test items can be thought of as
occupying a continuum along a dimension
ranging from objective to subjective:

objective . . . . . . . . . . subjective
number so a person can select the middle, neutral option, or should the responses be an even
number, so that the subject is forced to choose?
An example of the data available comes from a
study by Bendig (1959), who administered a personality inventory to two samples, one receiving
the standard form with a trichotomous response
(true, ?, false), the other a form that omitted the ?
response. The results were essentially equivalent, and
Bendig (1959) concluded that using a dichotomous response was more economical in terms
of scoring cost (now it probably does not make
any difference). For another example, see Tzeng,
Ware, and Bharadwaj (1991).
Sequencing of items. Items in a test are usually
the results indicated one main factor, presumably rigidity, or six factors, presumably the above
dimensions. We discuss this method next.
Instead he chose to use data that had already been
collected by researchers at the Institute of Personality Assessment and Research (IPAR) of the University
of California at Berkeley. At this institute, a number of different samples, ranging from graduate
students to Air Force captains, had been administered batteries of tests, including the CPI and the
MMPI, and had been rated by IPAR staff on a number of dimensions, including rigidity. Rehfisch
simply analyzed statistically the responses to the
combined CPI-MMPI item pool (some 957 true-false statements) of the subjects rated highest
and lowest 25% on rigidity. He cross-validated,
that is, replicated the analysis, on additional samples. The result was a 39-item scale that correlated significantly with a variety of ratings, and
which was substantially congruent with the theoretical framework. High scorers on this scale
tend to be seen as anxious, overcontrolled, inflexible in their social roles, orderly, and uncomfortable with uncertainty. Low scorers tend to
be seen as fluent in their thinking and in their
speech, outgoing in social situations, impulsive,
and original. Interestingly enough, scores on the
scale correlated only .19 with ratings of rigidity in a sample of medical school applicants. It
is clear that the resulting scale is a complex
rather than a pure measure of rigidity. In fact,
a content analysis of the 39 items suggested that
they can be sorted into eight categories, ranging
from anxiety and constriction in social situations to conservatism and conventionality. A
subsequent study by Rehfisch (1959) presented
some additional evidence for the validity of this
scale.
Factor analysis as a way of test construction.
This approach assumes that scales should be univariate and independent; that is, scales should
measure only one variable and should not correlate with scales that measure a different variable.
Thus, all the items retained for a scale should be
homogeneous; they should all be interrelated.
As in the criterion-keying method, we begin
with a pool of items that are administered to a
sample of subjects. The sample may be one of
convenience (e.g., college sophomores) or one
of theoretical interest (patients with the diagno-
ADMINISTERING A TEST
FIGURE 2.2. Equivalency of raw scores to z scores. (Test A: mean = 62, SD = 10; a raw score of 78 on Test A equals a z score of +1.60. Test B: mean = 106; a raw score of 117 on Test B equals a z score of +1.22.)
stanine:      1    2    3    4    5    6    7    8    9
percentage:   4    7   12   17   20   17   12    7    4

stanine:      1      2 & 3           4, 5, 6    7 & 8           9
defined as:   poor   below average   average    above average   superior
percentage:   4      19              54         19              4

or a tripartite classification:

stanine:      1, 2, 3    4, 5, 6    7, 8, 9
defined as:   low        average    high
percentage:   23         54         23
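The percentage bands above can be turned into a lookup. Here is a minimal sketch, not from the text (the function name and the cumulative band boundaries are mine, derived from the 4-7-12-17-20-17-12-7-4 split), that assigns a stanine to a percentile rank:

```python
def stanine(percentile):
    """Return the stanine (1-9) for a percentile rank between 0 and 100."""
    # Cumulative upper bounds of the nine bands: 4, 4+7, 4+7+12, ...
    bounds = [4, 11, 23, 40, 60, 77, 89, 96, 100]
    for s, upper in enumerate(bounds, start=1):
        if percentile <= upper:
            return s
    return 9

print(stanine(50))   # middle of the distribution -> 5
print(stanine(2))    # bottom 4% -> 1
print(stanine(98))   # top 4% -> 9
```

Note that cases exactly on a boundary are assigned to the lower stanine here; a real scoring manual would specify the tie-breaking rule.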
The difficulty of an item is simply the percentage of persons who answer the item correctly.
Note that the higher the percentage, the easier
the item; an item that is answered correctly by
60% of the respondents has a p (for percentage)
value of .60. A difficult item that is answered correctly by only 10% has a p = .10, and an easy item
answered correctly by 90% has a p = .90. Not
all test items have correct answers. For example, tests of attitudes, of personality, of political opinions, etc., may present the subject with
items that require agreement-disagreement, but
for which there is no correct answer. Most items,
however, have a keyed response, a response that
if endorsed is given points. On a scale of anxiety,
a "yes" response to the item, "Are you nervous
most of the time?" might be counted as reflecting anxiety and would be the keyed response.
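The computation of a p value is simple enough to sketch in code (the function name and the data are mine, purely for illustration): responses are coded 1 for the correct or keyed answer and 0 otherwise.

```python
def item_difficulty(responses):
    """p value: proportion of respondents giving the correct/keyed response."""
    return sum(responses) / len(responses)

# Hypothetical data: 10 respondents, 7 of whom gave the keyed response.
answers = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
print(item_difficulty(answers))  # 0.7 -> a fairly easy item
```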
FIGURE 2.3. Relationships of different types of scores, based on the normal distribution. (Percent of cases under portions of the normal curve: 34.13% between the mean and 1 SD on either side, 13.59% between 1 and 2 SDs, 2.14% between 2 and 3 SDs. The figure aligns percentiles (16, 50, 84, 97, 99 at -1 through +3 SDs), z scores (-3 to +3), T scores (20 to 80), stanines (1 to 9), and deviation IQs (55, 70, 85, 100, 115, 130, 145).)
(Figure: an area of 92% corresponds to a z score of 1.41.)
Let's say I develop a test to identify the brightest 10% of entering college freshmen for possible
placement in an honors program. In that case, the
test items should have an average p = .10; that is,
the test should be quite difficult, with the average
p value reflecting the percentage of scores to be
selected, in this example, 10%. Tests such as the
SAT or GRE are quite demanding because their
difficulty level is quite high.
Measurement of item difficulty. Item difficulty
(Figure: an area of 21% corresponds to a z score of +0.81.)
Table 2.1

Test item    Upper 27    Lower 27    Index of discrimination
1            23 (85%)     6 (22%)     63%
2            24 (89%)    22 (81%)      8%
3             6 (22%)     4 (15%)      7%
4             9 (33%)    19 (70%)    -37%
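The index in the last column of Table 2.1 can be recomputed as the difference between the upper- and lower-group passing rates. A sketch (the function name and layout are mine):

```python
def discrimination(upper_pass, lower_pass, group_size):
    """Index of discrimination, in percent: p(upper) - p(lower)."""
    return (upper_pass - lower_pass) / group_size * 100

# Item 1 of Table 2.1: 23 of 27 pass in the upper group, 6 of 27 in the lower.
print(round(discrimination(23, 6, 27)))   # 63
# Item 4 discriminates in the wrong direction:
print(round(discrimination(9, 19, 27)))   # -37
```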
Item response theory (IRT). The classical the-
to which a test item discriminates between high- and low-scoring groups, (3) the difficulty of the
item, and (4) the probability that a person of
low ability on that variable makes the correct
response.
NORMS
Table 2.2. Equivalent percentiles

Raw score    Male    Female
47           99      97
46           98      95
45           98      93
44           97      90
43           96      86
42           94      81
etc.
Table 2.3. Supervisors' ratings

Mechanical
aptitude scores   Excellent   Above average   Average   Below average   Poor
150-159              51            38            16           0            1
140-149              42            23             8           3            0
130-139              20            14             7           2            1
120-129              16             9             3           0            0
110-119               0             2             4           7            8
100-109               1             0             3          12           16
90-99                 1             0             0          14           19
80-89                 2             1             2          23           23
70-79                 0             1             0          19           26
60-69                 1             0             0          30           31
Totals:             134            88            43         110          125
approach, on the other hand, focuses on the measurement of gain or growth of individuals, and
item selection, reliability, and validity all center
on the notion of gain or growth.
COMBINING TEST SCORES
had administered ten different tests of knowledge of Spanish to Sharon. One test measured
vocabulary; another, knowledge of verbs; still a
third, familiarity with Spanish idioms; and so
on. We are not only interested in each of these
ten components, but we would like to combine
Sharon's ten different scores into one index that
reflects knowledge of Spanish. If the ten tests
were made up of one item each, we could of
course simply sum up how many of the ten items
were answered correctly by Sharon. With tests
that are made up of differing numbers of items,
we cannot calculate such a sum, since each test
may have a different mean and standard deviation, that is, represent different scales of measurement. This would be very much like adding
a person's weight in pounds to their height in
inches and their blood pressure in millimeters to
obtain an index of physical functioning. Statistically, we must equate each separate measurement before we add them up. One easy way to
do this is to change the raw scores into z scores
or T scores. This would make all of Sharon's ten
scores equivalent psychometrically, with each z
score reflecting her performance on that variable
(e.g., higher on vocabulary but lower on idioms).
The ten z scores could then be added together,
and perhaps divided by ten.
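The equating step can be sketched as follows (all numbers are hypothetical and the helper name is mine): convert each raw score to a z score using that test's own mean and SD, then average the z scores.

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a z score for a test with the given mean and SD."""
    return (raw - mean) / sd

# (raw score, test mean, test SD) for three hypothetical subtests
scores = [(78, 62, 10), (117, 106, 9), (40, 50, 5)]
zs = [z_score(raw, m, sd) for raw, m, sd in scores]
composite = sum(zs) / len(zs)

print([round(z, 2) for z in zs])  # [1.6, 1.22, -2.0]
print(round(composite, 2))        # 0.27
```

Differential weighting, as discussed next, would simply multiply each z score by its weight before averaging.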
Note that we might well wish to argue, either on
theoretical or on empirical grounds, that the
ten tests should not be given equal weight, that, for
example, the vocabulary test is most important
and should therefore be weighted twice as much.
Or if we were dealing with a scale of depression,
we might argue that an item dealing with suicide
ideation reflects more depression than an item
dealing with feeling sad, and therefore should be
counted more heavily in the total score. There
are a number of techniques, both statistical and
logical, by which differential weighting can be
used, as opposed to unit weighting, where every
component is given the same scoring weight (see
Wang & Stanley, 1970). Under most conditions,
unit weighting seems to be as valid as methods
that attempt differential weighting (F. G. Brown,
1976).
Combining scores using clinical intuition. In
scores statistically is through the use of multiple regression, which essentially expresses the
relationship between a set of variables and a particular outcome that is being predicted. If we had
only one variable, for example IQ, and are predicting GPA, we could express the relationship
with a correlation coefficient, or with the equation of a straight line, namely:

Y = a + bX

where Y is the variable being predicted, in this case GPA;
X is the variable we have measured, in this case IQ;
b is the slope of the regression line (which tells us, as X increases, by how much Y increases);
a is the intercept (that is, it reflects the difference in scores between the two scales of measurement; in this case GPA is measured on a 4-point scale while IQ has a mean of 100).
When we have a number of variables, all related
statistically to the outcome, then the equation
expands to:

Y = a + b1X1 + b2X2 + b3X3 + . . .
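The one-predictor case can be sketched as a least-squares fit (the IQ and GPA pairs below are invented for illustration; nothing here comes from an actual study):

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

iq  = [95, 100, 110, 120, 130]     # hypothetical data
gpa = [2.1, 2.4, 2.9, 3.3, 3.8]
a, b = fit_line(iq, gpa)
print(round(a + b * 115, 2))  # predicted GPA for an IQ of 115 -> 3.09
```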
A nice example of a regression equation can
be found in the work of Gough (1968) on a
widely used personality test called the California
Psychological Inventory (CPI). Gough administered the CPI to 60 airline stewardesses who had
undergone flight training and had received ratings of in-flight performance (something like a
final-exam grade). None of the 18 CPI scales individually correlated highly with such a rating, but
a four-variable multiple regression not only correlated +.40 with the ratings of in-flight performance, but also yielded an interesting psychological portrait of the stewardesses. The equation
was:

In-flight rating = 64.293 + .227(So) - 1.903(Cm) + 1.226(Ac) - .398(Ai)

where 64.293 is a weight that allows the two
sides of the equation to be equated numerically, and
So is the person's score on the Socialization scale
SUMMARY
Henderson, M., & Freeman, C. P. L. (1987). A self-rating scale for bulimia: The BITE. British Journal of
Psychiatry, 150, 18-24.
There is a lot of interest in eating disorders, and these authors
report on the development of a 36-item scale composed of
two subscales, the Symptom Subscale and the Severity Scale,
designed to measure binge eating. Like the study by Zimmerman and Coryell (1987) listed next, this study uses fairly
typical procedures and reflects at least some of the steps mentioned in this chapter.
DISCUSSION QUESTIONS
1. Locate a journal article that presents the development of a new scale (e.g., Leichsenring, 1999).
How does the procedure compare and contrast
with that discussed in the text?
2. Select a psychological variable that is of interest to you (e.g., intelligence, depression, computer anxiety, altruism, etc.). How might you
develop a direct assessment of such a variable?
3. When your instructor administers an examination in this class, the results will most likely be
reported as raw scores. Would derived scores be
better?
4. What are the practical implications of changing item difficulty?
5. What kind of norms would be useful for a
classroom test? For a test of intelligence? For a
college entrance exam?
P1: JZP
0521861810c03
CB1038/Domino
0 521 86181 0
March 4, 2006
14:16
AIM This chapter introduces the concepts of reliability and of validity as the two
basic properties that every measuring instrument must have. These two properties are
defined and the various subtypes of each discussed. The major focus is on a logical
understanding of the concepts, as well as an applied understanding through the use
of various statistical approaches.
INTRODUCTION
RELIABILITY
scores and the second. By convention, a correlation coefficient that reflects reliability should
reach the value of .70 or above for the test to be
considered reliable.
The determination of test-retest reliability
appears quite simple and straightforward, but
there are many problems associated with it. The
first has to do with the suitable interval before
retesting. If the interval is too short, for example a couple of hours, we may obtain substantial consistency of scores, but that may be more
reflective of the relative consistency of people's
memories over a short interval than of the actual
measurement device. If the interval is quite long,
for example a couple of years, then people may
have actually changed from the first testing to
the second testing. If everyone in our sample
had changed by the same amount, for example
had grown 3 inches, that would be no problem,
since the consistency (John is still taller than Bill)
would remain. But of course, people don't change
in just about anything by the same amount, so
there would be inconsistency between the first
and second set of scores, and our instrument
would appear to be unreliable, whereas in fact
it might be keeping track of such changes. Typically, changes over a relatively longer period of
time are not considered in the context of reliability, but are seen as true changes.
Usually then, test-retest reliability is assessed
over a short period of time (a few days to a few
weeks or a few months), and the obtained correlation coefficient is accompanied by a description of what the time period was. In effect, test-retest reliability can be considered a measure of
the stability of scores over time. Different periods of time may yield different estimates of stability. Note also that some variables, by their very
nature, are more stable than others. We would not
expect the heights of college students to change
over a two-week period, but we would expect
changes in mood, even within an hour!
Another problem is related to motivation. Taking a personality inventory might be interesting
to most people, but taking it later a second time
might not be so exciting. Some people in fact
might become so bored or resentful as to perhaps answer randomly or carelessly the second
time around. Again, since not everyone would
become careless to the same degree, retest scores
would change differently for different people,
number of items we can generate for an alternate form, but if we are developing a test to assess
depression, the number of available items related
to depression is substantially smaller.
Let's assume you have developed a 100-item,
multiple-choice vocabulary test composed of
items such as:
donkey = (a) feline, (b) canine, (c) aquiline, (d) asinine
estimated r = k(obtained r) / [1 + (k - 1)(obtained r)]

= 2(.68) / [1 + (1)(.68)] = 1.36 / 1.68 = .81
EXAMPLE 2 Given a 60-item test whose reliability is .61, how long must the test be for its reliability to be .70? (Notice we need to solve for k.)

.70 = k(.61) / [1 + (k - 1)(.61)]

Cross-multiplying we obtain:

k(.61) = .70 + .70(k - 1)(.61)
k(.61) = .70 + (.427)(k - 1)
k(.61) = .70 + .427k - .427
k(.183) = .273
k = 1.49

The test needs to be about 1.5 times as long, or
about 90 items (60 x 1.5).
EXAMPLE 3 Given a 300-item test whose reliability is .96, how short can the test be to have its
reliability be at least .70? (Again, we are solving
for k.)

.70 = k(.96) / [1 + (k - 1)(.96)]
k(.96) = .70 + .70(.96)(k - 1)
k(.96) = .70 + .672k - .672
k(.288) = .028
k = .097

The test could be shortened to about a tenth of
its length, or about 29 items (300 x .097).
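The three worked examples follow directly from the Spearman-Brown formula, which is easy to sketch in code (the function names are mine); length_factor is simply the formula solved algebraically for k:

```python
def spearman_brown(r, k):
    """Estimated reliability of a test lengthened (k > 1) or shortened (k < 1)
    by a factor k, given its current reliability r."""
    return (k * r) / (1 + (k - 1) * r)

def length_factor(r, target):
    """Length factor k needed to move reliability from r to target."""
    return (target * (1 - r)) / (r * (1 - target))

print(round(spearman_brown(0.68, 2), 2))    # 0.81 (doubled test, Example 1)
print(round(length_factor(0.61, 0.70), 2))  # 1.49 (Example 2)
print(round(length_factor(0.96, 0.70), 3))  # 0.097 (Example 3)
```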
(variance of differences) / (variance of total scores)

The variability of scores among individuals, that is, individual differences, makes statistical calculations such
as the correlation coefficient possible. The item,
"Are you alive as you read this?" is not a good test
item because it would yield no variability; everyone presumably would give the same answer.
Similarly, gender as defined by male or female
yields relatively little variability, and from a psychometric point of view, gender thus defined is
not a very useful measure. All other things being
equal, the greater the variability in test scores,
the better off we are. One way to obtain such
variability is to increase the range of responses.
Sources of error. The four types of reliability just
discussed all stem from the notion that a test score
is composed of a true score plus an error component, and that reliability reflects the relative
ratio of true score variance to total or observed
score variance; if reliability were perfect, the error
component would be zero.
A second approach to reliability is based on
generalizability theory, which does not assume
that a person has a true score on intelligence, or
that error is basically of one kind, but argues that
different conditions may result in different scores,
and that error may reflect a variety of sources
(Brennan, 1983; Cronbach, Gleser, Rajaratnam,
& Nanda, 1972; see Lane, Ankenmann, & Stone,
1996, for an example of generalizability theory
as applied to a mathematics test). The interest
here is not only in obtaining information about
the sources of error, but in systematically varying those sources and studying error experimentally. Lyman (1978) suggested five major sources
of error for test scores:
1. The individual taking the test. Some individuals are more motivated than others, some are less
attentive, some are more anxious, etc.
2. The influence of the examiner, especially on
tests that are administered to one individual at
a time. Some of these aspects might be whether
the examiner is of the same race, gender, etc., as
the client, and whether the examiner is (or is seen as)
caring, authoritarian, etc.
3. The test items themselves. Different items
elicit different responses.
4. Temporal consistency. For example, intelligence is fairly stable over time, but mood may
not be.
5. Situational aspects. For example, noise in the
hallway might distract a person taking a test.
We can experimentally study these sources of
variation and statistically measure their impact,
through such procedures as analysis of variance,
to determine which variables and conditions lessen reliability. For example, whether the
retest is 2 weeks later or 2 months later might
result in substantial score differences on test X,
but whether the administrator is male or female
might result in significant variation in test scores
for male subjects but not for female subjects. (See
Brennan, 1983, or Shavelson, Webb, & Rowley,
(Figure: a 2 x 2 table of the yes/no judgments of Observer 1 against those of Observer 2; A and D are the cells in which the two observers agree.)

percentage agreement = [(A + D) / (A + B + C + D)] x 100
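The percentage-agreement computation can be sketched as follows (the cell labels follow the 2 x 2 table; the counts are hypothetical):

```python
def percent_agreement(a, b, c, d):
    """Percentage agreement between two observers from a 2 x 2 table,
    where a and d are the two cells in which the observers agree."""
    return (a + d) / (a + b + c + d) * 100

# Hypothetical counts: both said yes 40 times, both said no 35 times,
# and they disagreed on the remaining 25 occasions.
print(percent_agreement(40, 15, 10, 35))  # 75.0
```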
estimated r = r12 / sqrt(r11 x r22)

where r12 is the obtained correlation between the two measures, and r11 and r22 are their respective reliability coefficients.
The standard error of measurement. Knowing
the reliability coefficients for a particular test
gives us a picture of the stability of that test.
Knowing, for example, that the test-retest reliability of our 100-item vocabulary test is .92 over a
6-month period tells us that our measure is fairly
stable over a medium period of time; knowing
that in a sample of adults the test-retest reliability is .89 over a 6-year period would also tell
us that vocabulary is not easily altered by differing circumstances over a rather long period of
time. Notice, however, that to a certain degree this
approach does not focus on the individual subject. To compute reliability, the test constructor
simply administers the test to a group of subjects,
chosen because of their appropriateness (e.g.,
depressed patients) or quite often because of their
availability (e.g., college sophomores). Although
the obtained correlation coefficient does reflect
the sample upon which it is based, the psychometrician is more interested in the test than in the
subjects who took the test. The professional who
uses a test, however, a clinical psychologist, a personnel manager, or a teacher, is very interested in
the individual, and needs therefore to assess reliability from the individual point of view. This is
done by computing the standard error of measurement (SEM).
Imagine the following situation. I give Susan,
a 10-year-old, an intelligence test and I calculate
her IQ, which turns out to be 110. I then give
her a magic pill that causes amnesia for the testing, and I retest her. Because the test is not perfectly reliable, because Susan's attention might
wander a bit more this second time, and because
she might make one more lucky guess this time,
and so on, her IQ this second time turns out to
be 112. I again give her the magic pill and test
her a third time, and continue doing this about
5,000 times. The distribution of 5,000 IQs that
belong to Susan will differ, not by very much,
but perhaps they can go from a low of 106 to a
high of 118. I can compute the mean of all of
these IQs, and it will turn out that the mean will
in fact be her true IQ, because error deviations
are assumed to cancel each other out: for every
lucky guess there will be an unlucky guess. I can
also calculate the variation of these 5,000 IQs by
computing the standard deviation. Because this
is a very special standard deviation (for one thing,
it is a theoretical notion based on an impossible
(Figure 3.1: a normal curve centered on Susan's obtained IQ of 110, with points at plus and minus 1 and 2 SEM, namely 100.6, 105.3, 110, 114.7, and 119.4, and areas of approximately 34%, 13%, and 3% per band.)

SEM = SD sqrt(1 - r11)

where SD is the standard deviation of scores on
the test, and r11 is the reliability coefficient.
Let's say that for the test I am using with Susan,
the test manual indicates that the SD = 15 and the
reliability coefficient is .90. The SEM is therefore
equal to:

15 x sqrt(1 - .90), or 4.7

How do we use this information? Remember that
a basic assumption of statistics is that scores, at
least theoretically, take on a normal curve distribution. We can then imagine Susan's score distribution (the 5,000 IQs if we had them) to look
like the graph in Figure 3.1.
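The SEM computation for Susan's test can be sketched as (the values are those of the example; the function name is mine):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.90)
print(round(s, 1))                           # 4.7
print(round(110 - s, 1), round(110 + s, 1))  # 105.3 114.7 (score +/- 1 SEM)
```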
We only have one score, her IQ of 110, and
we calculated that her scores would on the average deviate by 4.7 (the size of the SEM). There-
(Figure 3.2: normal-curve areas, 34%, 13%, and 3%, used to evaluate the difference between two scores.)
SED = 10 x sqrt(2 - .95 - .88), or 4.1.

We would accept Alicia's two scores as being different if the probability of getting such a difference by chance alone is 5 or fewer times out of
100, i.e., p < .05. You will recall that such a probability can be mapped out on the normal curve
to yield a z score of +1.96. We would therefore
take the SED of 4.1 and multiply it by 1.96 to yield
approximately 8, and would conclude that Alicia's
two scores are different only if they differ by at
least 8 points; in the example above they do not,
and therefore we cannot conclude that she did
better on one test than the other (see Figure 3.2).
SED = SD sqrt(2 - r11 - r22)

where SD is the standard deviation of the test scores, and r11 and r22 are the reliability coefficients of the two tests.
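The same computation in code (the values are those of the Alicia example; the function name is mine):

```python
import math

def sed(sd, r11, r22):
    """Standard error of the difference between two scores on tests
    with a common SD and reliabilities r11 and r22."""
    return sd * math.sqrt(2 - r11 - r22)

s = sed(10, 0.95, 0.88)
print(round(s, 1))      # 4.1
print(round(1.96 * s))  # 8 -> minimum difference needed at p < .05
```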
egories of tests where the determination of reliability requires somewhat more careful thinking.
The first of these are speeded tests, where different scores reflect different rates of responding.
Consider, for example, a page of text where the
task is to cross out all the letters "e" with a time
limit of 40 seconds. A person's score will simply
reflect how fast that person responded to the task.
Both test-retest and equivalent-forms reliability
are applicable to speeded tests, but split-half and
internal consistency are not, unless the split is
based on time rather than number of items.
not; validity is best thought of as a unitary process with somewhat different but related facets
(Cronbach, 1988). Messick (1989) defines validity as an integrated evaluative judgment of the
adequacy and appropriateness of interpretations
and actions based on the assessment measure.
Content Validity
Other Aspects
Face validity. Sometimes we speak of face validity, which is not validity in the technical sense, but
refers to whether a test looks like it is measuring
the pertinent variable. We expect, for example, a
test of intelligence to have us define words and
solve problems, rather than to ask us questions
about our musical and food preferences. A test
sqrt(r11 x r22),

where r11 again represents the reliability coefficient of the first variable (for example, a test) and
r22 the reliability coefficient of the second variable (for example, a criterion). If a test we are
trying to validate has, for example, a reliability of
.70 and the criterion has a reliability of .50, then
the maximum validity coefficient we can obtain is
.59. (Note, of course, that this is the same formula
we used for the correction for attenuation.)
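A one-line check of the example (the function name is mine):

```python
import math

def max_validity(r_test, r_criterion):
    """Ceiling on a validity coefficient: sqrt of the product of the
    reliabilities of the test and the criterion."""
    return math.sqrt(r_test * r_criterion)

print(round(max_validity(0.70, 0.50), 2))  # 0.59
```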
Interpreting a validity coefficient. Much of the
(Table residue: an expectancy table relating SAT score ranges, e.g., 1000 to 1399, to cumulative GPA categories, e.g., 2.5 to 3.49, with row totals of 25, 45, and 30 students; most of the table did not survive.)
(Figure: a 2 x 2 decision table for a TB test. A positive or negative test result is compared with the person's actual status; correct decisions are hits, true positives and true negatives, and incorrect decisions are errors, false positives and false negatives.)

Sensitivity = [true positives / (true positives + false negatives)] x 100
(Figure: a 2 x 2 decision table for the SAT predicting college outcome. A positive prediction, the student will fail: cell A, student fails, a hit; cell C, student passes, an error, a false positive. A negative prediction, the student will pass: cell D, student fails, an error, a false negative; cell B, student passes, a hit.)
The specificity of a test is the proportion of correctly identified negatives (i.e., how accurately
does a test classify those who do NOT have the
particular condition?), that is, true negatives, and
is defined as:

Specificity = [true negatives / (true negatives + false positives)] x 100

or B/(C + B).
The predictive value (also called efficiency) of
a test is the ratio of true positives to all positives,
and is defined as:

Predictive value = [true positives / (true positives + false positives)] x 100

or A/(A + C).
An ideal test would have a high degree of sensitivity and specificity, as well as high predictive
value, with a low number of false positive and
false negative decisions. (See Klee & Garfinkel
[1983] for an example of a study that uses the
concepts of sensitivity and specificity; see also
Baldessarini, Finkelstein, & Arana, 1983; Gerardi,
Keane, & Penk, 1989.)
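The three indices can be sketched in code (the function names are mine); the counts used below are those from the Pokorny study described next: 35 true positives, 1,206 false positives, 28 false negatives, and 3,435 true negatives.

```python
def sensitivity(tp, fn):
    """Percent of actual positives correctly identified."""
    return tp / (tp + fn) * 100

def specificity(tn, fp):
    """Percent of actual negatives correctly identified."""
    return tn / (tn + fp) * 100

def predictive_value(tp, fp):
    """Percent of positive predictions that are correct."""
    return tp / (tp + fp) * 100

print(round(sensitivity(35, 28), 1))         # 55.6
print(round(specificity(3435, 1206), 1))     # 74.0
print(round(predictive_value(35, 1206), 1))  # 2.8
```

The low predictive value, despite respectable sensitivity and specificity, is exactly the base-rate problem the text turns to next: with so few actual suicides, most positive predictions are false positives.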
An example from suicide. Maris (1992) gives an
interesting example of the application of decision theory to the data of a study by Pokorny
of 4,704 psychiatric patients who were tested
and followed up for 5 years. In this group of
(2 x 2 table of predicted versus actual outcomes:)

                          Committed suicide     Did not commit suicide
Predicted as suicide      35 true positives     1206 false positives     = 1241 cases
Predicted as nonsuicide   28 false negatives    3435 true negatives      = 3463 cases
(Figure: the same 2 x 2 SAT decision table, now with a specific cutoff. Positive, the student will fail if the SAT is below 400: cell A, student fails, a hit; cell C, student passes, a false positive. Negative, the student will pass if the SAT is above 400: cell B, student passes, a hit; cell D, student fails, a false negative.)
consideration is the base rate, that is, the naturally occurring frequency of a particular behavior. Assume, for example, that I am a psychologist working at the local jail and over the years
have observed that about one out of 100 prisoners attempts suicide (the actual suicide rate seems
to be about 1 in 2,500 inmates, a rather low base
rate from a statistical point of view; see Salive,
Smith, & Brewer, 1989). As prisoners come into
the jail, I am interested in identifying those who
will attempt suicide, to provide them with the necessary psychological assistance and/or take the
necessary preventive action, such as removal of
belts and bed sheets and 24-hour surveillance. If
I were to institute an entrance interview or testing of new inmates, what would happen? Let's
say that I would identify 10 inmates out of 100 as
probable suicide attempters; those 10 might not
include the one individual who really will commit suicide. Notice then that I would be correct 89
out of 100 times (the 89 for whom I would predict no suicide attempt and who would behave
accordingly). I would be incorrect 11 out of 100
times, for the 10 false positive individuals whom I
would identify as potential suicides, and the 1 false
negative whom I would not detect as a potential
suicide. But if I were to do nothing and simply
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281-302.
A basic and classic paper that focused on construct validity.
DISCUSSION QUESTIONS
1. I have developed a test designed to assess creativity in adults. The test consists of 50 true-false questions such as, "Do you consider yourself creative?" and "As a child, were you extremely curious?" How might the reliability and validity of such a test be determined?
2. For the above test, assume that it is based on psychoanalytic theory that sees creativity as the result of displaced sexual and aggressive drives. How might the construct validity of such a test be determined?
3. Why is reliability so important?
4. Locate a meta-analytical study of a psychological test. What are the conclusions arrived at by the author(s)? Is the evidence compelling?
5. In your own words, define the concepts of sensitivity and specificity.
4 Personality
AIM This chapter focuses on the assessment of normal personality. The question of how many basic personality dimensions exist, and other basic issues, are discussed. Nine instruments illustrative of personality assessment are considered; some are well known and commercially available, while others are not. Finally, the Big-Five model, currently a popular one in the field of personality assessment, is discussed.
INTRODUCTION
Personality
Internal or External?
responsible" to "not at all responsible." Two interesting questions can now be asked: (1) how do these two methods relate to each other: does the person who scores high on the scale of responsibility also score high on the self-rating of responsibility? and (2) which of these two methods is more valid: which scores, the personality inventory or the self-ratings, will correlate more highly with an external, objective criterion? (Note that basically we are asking the question: Given two methods of eliciting information, which is better?)
There is some evidence suggesting that, at least in some situations, self-ratings tend to be the better method, that self-ratings turn out to be slightly more valid than corresponding questionnaire scales. The difference between the two methods is not particularly large, but it has been found in a number of studies (e.g., M. D. Beck & C. K. Beck, 1980; Burisch, 1984; Carroll, 1952; Shrauger & Osberg, 1981). Why then use a test? In part, because the test parallels a hypothesized dimension, and allows us to locate individuals on that dimension. In essence, it's like asking people how tall they are. The actual measurement (5 feet 8 inches) is more informative than the rating "above average."
Self-report measures. One of the most common ways of assessing personality is to have individuals provide a report of their own behavior. The report may be a response to an open-ended question ("tell me about yourself"), may require selecting self-descriptive adjectives from a list, or answering true-false to a series of items. Such self-report measures assume, on the one hand, that individuals are probably in the best position to report on their own behavior. On the other hand, most personality assessors do not blindly assume that if the individual answers "true" to the item "I am a sociable person," the person is in fact sociable. It is the pattern of responses related to empirical criteria that is important. In fact, some psychologists (e.g., Berg, 1955; 1959) have argued that the content of the self-report is irrelevant; what is important is whether the response deviates from the norm.
Whether such reporting is biased or unbiased is a key issue that in part involves a philosophical question: Are most people basically honest and objective when it comes to describing themselves?
Two terms are often used in discussing personality, particularly in psychological testing. When we assess an individual with a personality test, that test will presumably measure some variable or combination of variables: perhaps sociability, introversion-extraversion, self-control, assertiveness, nurturance, responsibility, and so on. Ordinarily, we assume that individuals occupy different positions on the variable, that some people are more responsible than others, and that our measurement procedure is intended to identify with some degree of accuracy a person's position on that variable. The variable, assumed to be a continuum, is usually called a trait. (For an excellent discussion of trait, see Buss, 1989.)
As you might expect, there is a lot of argument about whether such traits do or do not exist.
Importance of language. Why select a particular variable to measure? Although, as we said, measurement typically arises out of need, one may well argue that some variables are more important than others, and that there is a greater argument for scaling them. At least two psychologists, Raymond Cattell and Harrison Gough, well known for their personality inventories, have argued that important variables that reflect significant individual differences become encoded in daily language. If, for example, responsibility is of importance, then we ought to hear lay people describe themselves and others in terms of responsibility, dependability, punctuality, and related aspects. To understand what the basic dimensions of importance are, we need to pay attention to language, because language encodes important experiences.
Psychopathology. Many theories of personality
and ways of measuring personality were originally developed in clinical work with patients.
For example, individuals such as Freud, Jung,
and Adler contributed to much of our understanding about basic aspects of personality, but
their focus was primarily on psychopathology
or psychological disturbances. Thus, there is a
substantial area of personality assessment that
focuses on the negative or disturbed aspects of
personality; the MMPI is the most evident example of a personality inventory that focuses on psychopathology. These instruments are discussed
in Chapter 15.
Self-actualization. Other theorists have focused on motivation, on what motivates people, and on how these motives can be assessed.
assessed. Henry A. Murray (1938) was one individual who both theoretically and empirically
focused on needs, those aspects of motivation that
result in one action rather than another (skipping lunch to study for an exam). Murray realized
that the physical environment also impinges on
Table 4.1. The factors of the 16PF

Factor   Factor name                           Brief explanation
A        Schizothymia-Affectothymia            Reserved-outgoing
B        Intelligence
C        Ego strength                          Emotional stability
E        Submissiveness-Dominance
F        Desurgency-Surgency                   Sober-enthusiastic
G        Superego strength                     Expedient-conscientious
H        Threctia-Parmia                       Shy-uninhibited
I        Harria-Premsia                        Tough minded vs. tender minded
L        Alaxia-Protension                     Trusting-suspicious
M        Praxernia-Autia                       Practical-imaginative
N        Artlessness-Shrewdness                Unpretentious-astute
O        Untroubled adequacy-Guilt proneness   Self-assured vs. worrying
Q1       Conservative-Radical
Q2       Group adherence                       Joiner-self sufficient
Q3       Self-sentiment integration            Undisciplined-controlled
Q4       Ergic tension                         Relaxed-tense
The 16PF is designed for ages 16 and older, and yields scores for the 16 dimensions listed in Table 4.1.
As shown in Table 4.1, each of the factors is identified by a letter, and then by a factor name. These names may seem quite strange, but they are the words that R. B. Cattell chose. For those names that are not self-evident, there is also a brief explanation in more familiar terms.
Six forms of the test are available, two of which are designed for individuals with limited education. The forms of the 16PF contain 187 items and require about 45 to 60 minutes for administration (25 to 35 minutes for the shorter forms of 105 items). Since its original publication, the 16PF has undergone five revisions. Some forms of the 16PF contain validity scales, scales that are designed to assess whether the respondent is producing a valid protocol, i.e., not faking. These scales include a fake-bad scale, a fake-good scale, and a random-responses scale.
The 16 dimensions are said to be independent, and each item contributes to the score of only one scale. Each of the scales is made up of from 6 to 13 items, depending on the scale and the test form. The items are 3-choice multiple-choice items, or perhaps more correctly, forced-choice options. An example of such an item might be: "If I had some free time I would probably (a) read a good book, (b) go visit some friends, (c) not sure." R. B. Cattell, Eber, and Tatsuoka (1972) recommend that at least two forms of the test be administered to a person to get a more valid measurement, but in practice, this is seldom done.
Administration. The 16PF is basically a self-administered test, and requires minimal skills on the part of the examiner to administer; interpretation of the results is, of course, a different matter.
Scoring. Scoring of the 16PF can be done by hand or by machine, and is quite straightforward: each endorsed keyed response counts 1 point. As with most other tests that are hand-scored, templates are available that are placed on top of the answer sheet to facilitate such scoring.
Raw scores on the 16PF are then converted to "stens," a contraction of "standard ten," where scores can range from 1 to 10, and the mean is fixed at 5.5; such conversions are done by using tables provided in the test manual, rather than by doing the actual calculations. Despite the strange names given to the 16 factors, for each of the scales the test manual gives a good description of what a low scorer or a high scorer might be like as a person. A number of computer scoring services are available. These can provide not only a scoring of the scales and a profile of results, but also a narrative report, with some geared for specific purposes (for example, selection of law-enforcement candidates).
Reliability. An almost overwhelming amount of information on the 16PF can be found in the Handbook for the 16PF (R. B. Cattell, Eber, & Tatsuoka, 1970), in the test manual, in the professional literature, and in a variety of publications from the test publisher. Internal consistency of the scales is on the low side, despite the focus on factor analysis, and scales are not very reliable across different forms of the test, i.e., alternate-form reliability (Zuckerman, 1985). Information about test-retest reliability, both with short intervals (2 to 7 days) and longer intervals (2 to 48 months), is available and appears adequate. The correlation coefficients range from the .70s and .80s for the brief interval, to the .40s and .50s for a 4-year interval. This is to be expected because test-retest reliability becomes lower as the interval increases, and in fact a 4-year interval may be inappropriate to assess test stability, but more appropriate to assess amount of change.
Validity. The test manual gives only what may be
some time and has found extensive applications in a wide variety of areas. Sometimes, however, there has been little by way of replication of results. For example, the 16PF Handbook presents a number of regression equations designed to predict specific behaviors, but most of these regression equations have not been tested to see if they hold up in different samples.
The short forms are of concern because each scale is made up of few items, and short scales tend to be less reliable and less valid. In fact, the data presented by the test authors substantiate this concern, but a new test user may not perceive the difference between short forms and long forms in reliability and validity.
The four dichotomies are: Extraversion-Introversion, abbreviated as E-I; Sensing-Intuition, abbreviated as S-N; Thinking-Feeling, abbreviated as T-F; and Judgment-Perception, abbreviated as J-P.
the work of Isabel Briggs Myers, and its own journal, named Research in Psychological Type.
The MBTI manual (I. B. Myers & McCaulley, 1985) gives considerable information for the psychometrically oriented user, but it is clear that the focus of the manual is on the applied use of the MBTI with individual clients in situations such as personal and/or career counseling. Thus, there are detailed descriptions of the 16 "pure" types in terms of what each type is like, and there are presumed employment aspects for each type; for example, introverts are said to be more careful with details, to have trouble remembering names and faces, and to like to think before they act, as opposed to extroverts, who are faster, good at greeting people, and usually act quickly.
Nevertheless, we can still ask some psychometric questions, and one of these is: How independent are the four sets of scales? Intercorrelations of the four scales indicate that three of the scales are virtually independent, but that J-P correlates significantly with S-N, with typical correlation coefficients ranging from about .26 to .47; one way to interpret this is that intuitive types are more common among perceptive types; the two tend to go together.
Criticisms. One basic issue is how well the test
captures the essence of the theory. Jungian theory is complex and convoluted, the work of a
genius whose insights into human behavior were
not expressed as easily understood theorems. The
MBTI has been criticized because it does not mirror Jungian theory faithfully; it has also been criticized because it does, and therefore is of interest
only if one accepts the underlying theory (see
McCaulley, 1981, and J. B. Murray, 1990, for
reviews).
The Edwards Personal Preference
Schedule (EPPS)
Introduction. There are two theoretical influences
Table 4.2. The 15 needs assessed by the EPPS

Need                  Brief definition
1. Achievement        To achieve, to be successful
2. Deference          To follow, to do what is expected
3. Order              To be orderly and organized
4. Exhibition         To be at the center of attention
5. Autonomy           To be independent
6. Affiliation        To have friends
7. Intraception       To analyze one's self and others
8. Succorance         To be helped by others
9. Dominance          To be a leader
10. Abasement         To accept blame
11. Nurturance        To show affection and support
12. Change            To need variety and novelty
13. Endurance         To have persistence
14. Heterosexuality   To seek out members of the opposite sex
15. Aggression        To be aggressive, verbally and/or physically
The development of the PRF shows an unusual degree of technical sophistication and encompasses a number of steps implemented only because of the availability of high-speed computers. D. N. Jackson (1967) indicates that there were four basic principles that guided the construction of the PRF:
1. Explicit and theoretically based definitions of each of the traits;
2. Selection of items from a large item pool, with more than 100 items per scale, with selection based on homogeneity of items;
3. The use of procedures designed to eliminate or control for such response biases as social desirability;
4. Both convergent and discriminant validity were considered at every stage of scale development, rather than after the scale was developed.
In constructing the PRF, D. N. Jackson (1967)
used a series of steps quite similar to the ones
outlined in Chapter 2:
1. Each of the traits (needs) was carefully studied in terms of available theory, research, etc.;
2. A large pool of items was developed, with each item theoretically related to the trait;
3. These items were critically reviewed by two or more professionals;
4. Items were administered to more than a thousand subjects, primarily college students;
5. A series of computer programs were written and used in conducting a series of item analyses;
6. Biserial correlations were computed between each item, the scale to which the item presumably belonged, scales to which the item did not belong, and a set of items that comprised a tentative social-desirability scale;
7. Items were retained only if they showed a higher correlation with the scale they belonged to than with any of the other scales;
8. Finally, items were retained for the final scales that showed minimal relation to social desirability, and items were balanced for "true" or "false" as the keyed response.
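Steps 6 through 8 can be illustrated with a small sketch. The data, scale names, and social-desirability cutoff below are invented for illustration; note that plain Pearson r computed on a dichotomous (0/1) item against a scale score is the point-biserial equivalent of the biserial correlations Jackson used:

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson r; with a dichotomous (0/1) item this is the point-biserial r."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

def keep_item(item, own_scale, other_scales, sd_scale, sd_cutoff=0.30):
    """Retention rule in the spirit of steps 6-8: the item must correlate
    more highly with its own scale than with any other scale, and only
    weakly with the social-desirability scale (the cutoff is hypothetical)."""
    r_own = pearson(item, own_scale)
    r_others = [pearson(item, s) for s in other_scales]
    r_sd = pearson(item, sd_scale)
    return r_own > max(r_others) and abs(r_sd) < sd_cutoff

# Invented responses from six subjects: item answers (1 = true), plus
# totals on the item's own scale, one other scale, and a tentative
# social-desirability scale.
item  = [1, 1, 1, 0, 0, 0]
own   = [5, 4, 4, 2, 1, 1]
other = [2, 1, 2, 3, 2, 3]
sd    = [1, 2, 3, 3, 2, 1]
print(keep_item(item, own, [other], sd))  # True: convergent exceeds discriminant
```

With real data the same filter would run over hundreds of items and all 22 scales at once, which is why the original work leaned on computer programs.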
are available.
Reliability. Because the development of the PRF consisted of some steps designed to select items that correlated highly with total scale scores, one would expect the reliability of the PRF, at least as measured by internal-consistency methods, to be high. D. N. Jackson (1967) does list the Kuder-Richardson coefficients for the 22 scales, but the coefficients are inflated because they are based on the best 40 items for each scale, but each scale is made up of 20 items. To be really correct, the reliability coefficients should have either been computed on 20-item scales, or should have been corrected by the Spearman-Brown formula. Despite this, the coefficients are quite acceptable, with the exception of the Infrequency scale.
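The Spearman-Brown correction mentioned above is simple to apply; a sketch (the .90 coefficient is an invented example value):

```python
def spearman_brown(r, k):
    """Predicted reliability when a test's length is changed by factor k
    (k = new length / old length); r is the current reliability."""
    return (k * r) / (1 + (k - 1) * r)

# Stepping a coefficient computed on 40-item scales down to the
# 20-item scales actually used (k = 20/40 = 0.5):
print(round(spearman_brown(0.90, 0.5), 2))  # 0.82
```

The same formula run with k greater than 1 predicts the gain from lengthening a test, which is its more common use.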
Test-retest reliabilities are also presented for a sample of 135 individuals retested with a 1-week interval. Coefficients range from a low of +.46 (again for the Infrequency scale) to a high of +.90, with more than half of the coefficients in the .80s range. Odd-even reliabilities are also presented, with slightly lower coefficients.
Validity. D. N. Jackson (1967; 1984) presents the PRF as a personality inventory that is very sophisticated in its development. Although it has been available for some time, it is not really a popular test, especially among practitioners. For example, Piotrowski and Keller (1984) inquired of all graduate programs that train doctoral students in clinical psychology as to which tests a clinical PhD candidate should be familiar with. The PRF was mentioned by only 8% of those responding.
The test manual does not make it clear why
both short forms and long forms of the PRF
were developed. Strictly speaking, these are not
short forms but abbreviated forms that assess
only 15 of the 22 scales. The parallel forms
In the survey of clinical psychology programs (Piotrowski & Keller, 1984) mentioned before, the most popular personality inventory mentioned was the MMPI, which was listed by 94% of the respondents. The second most popular was the CPI, which was mentioned by 49%. Thus, despite its focus on normality, the CPI is considered an important instrument by clinicians, and indeed it is. Surveys done with other professional groups similarly place the CPI in a very high rank of usefulness, typically second after the MMPI.
The author of the CPI, Harrison Gough, indicates (personal communication, August 3, 1993) that to understand the CPI there are five axioms or basic notions that need attention:
1. The first is the question, what should be measured? We have seen that for Edwards and for Jackson the answer lies in Murray's list of needs. For Gough the answer is "folk concepts." Gough argues that across the ages, in all
Personality
persons income be sufcient, or would knowing their educational level, their occupation, their
address, their involvement in community activities, and so on, enrich our understanding? Gough
argues for the latter approach.
Development. The CPI, first published in 1956,
Dominance
Capacity for status
Sociability
Social presence
Self-acceptance
Independence
Empathy
Responsibility
Socialization
Self-control
Good impression
Communality
Well-being
Tolerance
Achievement via conformance
Achievement via independence
Intellectual efficiency
Psychological mindedness
Flexibility
Femininity/Masculinity
FIGURE 4-1. The CPI vectors 1 and 2. Vector 1 ranges from extraverted (involvement and participation) to introverted (detachment and privacy); vector 2 ranges from rule-favoring to rule-questioning. The four quadrants define four types:
Alpha: ambitious, productive, high aspiration level, a leader, has social poise, talkative, a doer, able to deal with frustration, can be self-centered.
Beta: ethical, submissive, dependable and responsible, can be conformist, methodical, reserved, able to delay gratification.
Gamma: doubter and skeptic, innovative, self-indulgent, rebellious and nonconforming, verbally fluent.
Delta: tends to avoid action, feels lack of personal meaning, shy and quiet, reflective, focused on internal world.
Many studies have used the CPI and thus are relevant to the question of its validity. Megargee (1972) attempted to summarize most of the studies that appeared before 1972, but an even larger number of studies have appeared since then. Although Gough and his students have been quite prolific in their contributions to the literature, the CPI has found wide usage, as well as a few vociferous critics.
Because of space limitations, we cannot even begin to address the issue of validity, but perhaps one small example will suffice. Over the years, the CPI has been applied with outstanding success to a wide variety of questions of psychological import, including that of college entrance. Nationwide, only about 50% of high-school graduates enter college. Can we predict who will enter college? Intellectual aptitude is certainly one variable, and indeed it correlates significantly with college entrance, but not overwhelmingly so; typical correlations between scores on tests of intellectual aptitude and entering/not entering college are in the range of .30 to .40. Socioeconomic status is another obvious variable, but here the correlations are even lower.
In the CPI test manual, Gough (1987) reports on a nationwide normative study in which 2,620 students took the CPI while in high school and were surveyed 5 to 10 years later as to their college-going. Overall, 40% of the sample attended college, but the rates were different for each of the four types, as defined by vectors 1 and 2. Alphas had the highest rate (62%), while deltas had the lowest rate (23%); both betas and gammas had rates of 37%. High-potential alphas (those scoring at levels 5, 6, or 7 on the v3 scale) tended to major in business, engineering, medicine, and education, while high-potential deltas tended to major in art, literature, and music; note that because fewer deltas were entering college, there are fewer such talented persons in a college environment. Within each type, going to college was also significantly related to level of self-realization. For example, for the alphas only 28% of those in level 1 went to college, but a full 78% of those in levels 5, 6, and 7 did. As Gough (1989) points out, the CPI has been applied to
Erikson's stages and the challenge to be met at each:

Stage              Challenge to be met
Early infancy      Trust
Later infancy      Autonomy
Early childhood    Initiative
Middle childhood   Industry
Adolescence        Identity
Early adulthood    Intimacy
Middle adulthood   Generativity
Late adulthood     Ego integrity
[Table: the eight IPB factors (trust, autonomy, initiative, industry, identity, intimacy, generativity, and ego integrity), each with a representative item; the items are not reproduced here.]
hand.
Reliability. The authors assessed three samples for reliability purposes: 102 college students; 68 community adults who were administered the IPB twice, with a test-retest period from 28 to 35 days; and a third sample of 73 adults living in a retirement community. The alpha coefficients for the first and third samples ranged from .48 to .79, acceptable but low. The authors interpreted these results as reflecting heterogeneity of item content. The test-retest coefficients for the second sample ranged from .79 to .90, quite high and indicative of substantial temporal stability, at least over a 1-month period.
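Alpha coefficients like those reported here come from the standard internal-consistency formula; a minimal sketch with invented item data:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists (one list per item):
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_vars = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Three invented items answered by four respondents. Perfectly parallel
# items give alpha = 1; heterogeneous items pull alpha down, which is
# the interpretation the IPB authors offered for their low coefficients.
parallel = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(round(cronbach_alpha(parallel), 2))  # 1.0
```
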
Validity. The validity of a multivariate instru-
In one study (G. Domino & Hannah, 1989), the IPB was administered to 143 elderly persons who were participating in a college program. They were also assessed with the CPI self-realization scale (vector 3) as a global self-report of perceived effective functioning. For men, higher effective functioning was related to a greater sense of trust and industry and lower scores on generativity and intimacy. For women, higher effective functioning was related most to a sense of identity and to lower scores on trust and industry. These results suggest that for people who grew up in the 1920s and 1930s, there were different pathways to success. For men, success was facilitated by having basic trust, working hard, and not getting very close to others. For women, it meant developing a strong sense of identity, not trusting others, and not being as concerned with actual work output (see also Hannah, G. Domino, Figueredo, & Hendrickson, 1996).
Note that in developing the IPB the authors
attempted to develop scales on the basis of both
internal consistency and external validity.
Criticisms. The IPB is a new instrument, and like hundreds of other instruments that are published each year, it may not survive rigorous analysis, or may simply languish on the library shelves.
The Self-Consciousness Inventory (SCI)
Introduction. Getting in touch with oneself, or self-insight, would seem to be an important variable, not just from the viewpoint of the psychologist interested in the arena of psychotherapy, for example, but also for the lay person involved in everyday transactions with the world. We all know individuals who almost seem obsessed with analyzing their own thoughts and those of others, as well as individuals who seem to be blessedly ignorant of their own motivation and the impact, or lack of it, they have on others.
Development. Fenigstein, Scheier, and Buss (1975) developed a scale of 23 items, with 10 items for factor 1, labeled private self-consciousness; 7 items for factor 2, labeled public self-consciousness; and 6 items for factor 3, labeled social anxiety. The actual items and their factor loadings are presented in the article by Fenigstein, Scheier, and Buss (1975). Examples of similar items are, for factor 1: "I am very aware of my mood swings"; for factor 2: "I like to impress others"; for factor 3: "I am uneasy in large groups."
Administration. This is a brief instrument easily
self-administered, and probably taking no longer
than 15 minutes for the average person.
Scoring. Four scores are obtained, one for each
Reliability. The test-retest reliability for a sample of 84 subjects over a two-week interval ranges from +.73 for Social Anxiety (the shortest scale) to +.84 for Public Self-Consciousness. The reliability of the total score is +.80. Note here that the total scale, which is longer than any of its subscales, is not necessarily the most reliable.
Validity. No direct validity evidence was pre-
Farmer and Sundberg (1986) argue that boredom is a common emotion and one that is important not only in the overall field of psychology but also in more specialized fields such as industrial psychology, education, and drug abuse, yet few scales exist to measure this important variable.
Development. The authors began with a review of the relevant literature, as well as with interviews with various persons; this led to a pool of 200 true-false items similar to, "I am always busy with different projects." Items that were duplicates and items for which three out of four judges could not agree on the direction of scoring were eliminated. Preliminary scales were then assessed in various pilot studies, and items were revised a number of times.
Description. The current version of the scale contains 28 items (listed in Farmer & Sundberg, 1986), retained on the basis of the following criteria: (1) responses on the item correlated with the total score at least +.20; (2) at least 10% of the sample answered an item in the "bored" direction; (3) a minimal test-retest correlation of +.20 (no time interval specified); and (4) a larger correlation with the total score than with either of two depression scales; depression was chosen
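The four retention criteria can be expressed directly. A sketch, assuming each item's statistics have already been computed (all numbers below are invented; the thresholds come from the criteria in the text):

```python
def retain_item(item_total_r, pct_bored_direction, retest_r, depression_rs):
    """Apply the four retention criteria summarized in the text:
    (1) item-total r of at least +.20, (2) at least 10% of the sample
    answering in the "bored" direction, (3) test-retest r of at least
    +.20, and (4) a larger correlation with the total score than with
    either depression scale."""
    return (item_total_r >= 0.20
            and pct_bored_direction >= 0.10
            and retest_r >= 0.20
            and all(item_total_r > r for r in depression_rs))

print(retain_item(0.35, 0.25, 0.40, [0.10, 0.15]))  # True
print(retain_item(0.35, 0.05, 0.40, [0.10, 0.15]))  # False: too rarely endorsed
```

Criterion (4) is a discriminant check: an item that tracks depression better than it tracks the boredom total is measuring the wrong construct.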
The five factors (with alternative labels):
1. Neuroticism (emotional stability; adjustment)
2. Extraversion-Introversion (surgency)
3. Openness to experience (intellect; culture)
4. Agreeableness (likability; friendliness)
5. Conscientiousness (dependability; conformity)
The NEO inventory originally was designed to measure three personality dimensions: neuroticism, extraversion, and openness to experience (Costa & McCrae, 1980). Eventually two additional scales, agreeableness and conscientiousness, were added to bring the inventory into line with the Big-Five model (Costa & McCrae, 1985). Finally, in 1990 the current revised edition was published (Costa & McCrae, 1992).
Development. The original NEO inventory,
The test manual indicates that domain scores give a good approximation of the factor scores, and so it is not worth calculating factor scores by hand for individual cases.
Reliability. Internal consistency and 6-month test-retest reliability coefficients for the first three (NEO) scales are reported to be from +.85 to +.93 (McCrae & Costa, 1987). The test manual (Costa & McCrae, 1992) reports both alpha coefficients and test-retest reliability coefficients, and these seem quite satisfactory. Caruso (2000) reported a meta-analysis of 51 studies dealing with the reliability of the NEO personality scales, and found that reliability was dependent on the specific NEO dimension: specifically, Agreeableness scores were the weakest, particularly in clinical samples, in male-only samples, and with test-retest reliability.
Validity. Much of the research using the NEO-PI and leading to the development of the NEO-PI-R is based on two major longitudinal studies of large samples, one of over 2,000 white male veterans, and the other based on a variable-number sample of volunteers participating in a study of aging. Both the test manual and the literature are replete with studies that in one way or another address the validity of the NEO-PI and the NEO-PI-R, including content, criterion, and construct validity. Because we are considering 35 scales, it is impossible to meaningfully summarize such results, but in general the results support the validity of the NEO-PI-R, especially its domain scales.
Norms. The test manual (Costa & McCrae, 1992)
gives a table of means and SDs for men and
women separately, based on samples of 500 men
and 500 women. There is a similar table for
college-aged individuals, based on a sample of
148 men and 241 women aged 17 through 20
years. Tables are also available to change raw
scores into percentiles.
Interesting aspects. The literature seems to con-
We can once again ask whether the five dimensions, as assessed by the NEO-PI-R, are independent. The test manual gives a table of intercorrelations among the 35 scales that indicates the five domain scales are not all that independent; for example, scores on the Neuroticism scale correlate -.53 with scores on the Conscientiousness scale, and scores on the Extraversion scale correlate +.40 with scores on the Openness scale. In addition, the facet scales under each domain scale intercorrelate substantially. For example, Anxiety and Depression, which are facets of the Neuroticism scale, correlate +.64 with each other. Although one would expect the components of a scale to intercorrelate significantly, a substantial correlation brings into question whether the components are really different from each other.
Criticisms. Hogan (1989b), in reviewing the NEO-PI, commends it highly because it was developed and validated on adult subjects rather than college students or mentally ill patients, because it represents an attempt to measure the Big-Five dimensions, and because there is good discriminant and convergent validity. Clearly, the NEO-PI has made an impact on the research literature and is beginning to be used in a cross-cultural context (e.g., Yang et al., 1999). Whether it can be useful in understanding the individual client in counseling and therapeutic settings remains to be seen.
SUMMARY
In this chapter, we have looked at a variety of measures of personality. Most have been personality
inventories, made up of a number of scales, that
are widely used and commercially available. A few
are not widely known but still are useful teaching devices, and they illustrate the wide range of
instruments available and the variables that have
been scaled.
SUGGESTED READINGS
Broughton, R. (1984). A prototype strategy for construction of personality scales. Journal of Personality
and Social Psychology, 47, 1334–1346.
In this study, Broughton examines six strategies by which
personality scales can be constructed, including a prototype
strategy not commonly used.
Burisch, M. (1984). Approaches to personality inventory construction. American Psychologist, 39, 214–227.
A very readable article in which the author discusses three
major approaches to personality scale construction, which he
labels external, inductive, and deductive. The author argues
that although no one method appears to be better, the
deductive approach is recommended.
Kelly, E. J. (1985). The personality of chessplayers. Journal of Personality Assessment, 49, 282–284.
A brief but interesting study of the MBTI responses of chessplayers. As you might predict, chessplayers are more introverted, intuitive, and thinking types than the general population.
DISCUSSION QUESTIONS
1. Do you think that most people answer honestly when they take a personality test?
2. Compare and contrast the Cattell 16 PF and
the California Psychological Inventory.
3. The EPPS covers 15 needs that are listed in
Table 4.2. Are there any other needs important enough that they should be included in this
inventory?
4. How might you go about generating some evidence for the validity of the Self-Consciousness
Inventory?
5. How can the criterion validity of a personality
measure of ego strength (or other dimension)
be established?
Cognition
AIM In this chapter we focus on the assessment of cognitive abilities, primarily intelligence. We take a brief look at various basic issues, some theories, and some representative instruments. We see that the assessment of intelligence is in a state of flux,
partly because of, and partly parallel to, the changes that are taking place in the field
of cognitive psychology.
INTRODUCTION
So the focus here is on how people go about solving problems, on processing information, rather
than on why Johnny does better than Billy. Representative theories are those of Baron (1985), A. L.
Brown (1978), and Sternberg (1985). Many of the
tests that have evolved from this approach assess
very specific processes, such as letter matching. Although some of these tests are components of typical intelligence tests, most are used
for research purposes rather than for individual
assessment.
3. The biological metaphor. Here intelligence is
defined in terms of brain functions. Sternberg
(1990) suggests that these theories are based on or
supported by three types of data: (1) studies of the
localization of specific abilities in specific brain
sites, often with patients who have sustained
some type of brain injury; (2) electrophysiological studies, where the electrical activity of the
brain is assessed and related to various intellectual activities such as test scores on an intelligence
test; and (3) the measurement of blood flow in the
brain during cognitive processing, especially to
localize in what part of the brain different processes take place. Representative theories here are
those of Das, Kirby, and Jarman (1979) and Luria
(1973). This approach is reflected in some tests of
intelligence, specifically the Kaufman Assessment
Battery for Children, and in neuropsychological
batteries designed to assess brain functioning (see
Chapter 15).
4. The epistemological metaphor. The word epistemology refers to the philosophical study of
knowledge, so this model is one that looks primarily at philosophical conceptions for its underpinnings. This model is best represented by
the work of the Swiss psychologist Jean Piaget
(1952). His theory is that intellectual development proceeds through four discrete periods:
(1) a sensorimotor period, from birth to 2 years,
whose focus is on direct perception; (2) a preoperational period, ages 2 to 7, where the child
begins to represent the world through symbols
and images; (3) a concrete operations period, ages
7 to 11, where the child can now perform operations on objects that are physically present and
therefore concrete; and (4) formal operations,
which begins at around age 11, where the child
can think abstractly. A number of tests have been
developed to assess these intellectual stages, such
the three faces of intellect, which sees intellectual functions as composed of processes that are
applied to contents and result in products. In this
model, there are five types of processes: memory, cognition, divergent thinking, convergent
production, and evaluation. These processes are
applied to materials that can have one of four
types of contents: figural, symbolic, semantic, or
behavioral. The result of a process applied to a
content is a product, which can involve units,
classes, relations, systems, transformations, and
implications.
These three facets (processes, contents, and
products) can interact to produce 120 separate
abilities (5 × 4 × 6), and for many years Guilford
and his colleagues sought to develop factor-pure
tests for each of these 120 cells. Although the tests
themselves have not had that great an impact, the
theoretical structure has become embedded in
mainstream psychology, particularly educational
psychology. We look at one test that emanates
directly from Guilford's model (the Structure
of Intellect Learning Abilities Test) and at some
other tests based on this model when we discuss
creativity in Chapter 8.
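The 120 cells of Guilford's model are simply the Cartesian product of the three facets, which can be enumerated directly:

```python
from itertools import product

processes = ["memory", "cognition", "divergent thinking",
             "convergent production", "evaluation"]
contents  = ["figural", "symbolic", "semantic", "behavioral"]
products  = ["units", "classes", "relations", "systems",
             "transformations", "implications"]

# Each cell is one process applied to one type of content,
# yielding one type of product: 5 x 4 x 6 = 120 abilities.
cells = list(product(processes, contents, products))
print(len(cells))  # 120
```

Guilford's research program amounted to seeking a factor-pure test for each of these tuples.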
A second theory is that of H. Gardner (1983),
who postulates multiple intelligences, each distinct from the others. Note that this is unlike the
approach of factor analysts, who view intelligence
as composed of multiple abilities. H. Gardner
believes that there are seven intelligences, which he
labels linguistic, logical-mathematical, spatial
(having to do with orientation), musical, bodily-kinesthetic (the ability to use one's body, as in
athletics or dancing), interpersonal intelligence
(understanding others), and intrapersonal intelligence (understanding oneself). For now, this
theoretical model has had little influence on psychological testing, although it seems to have the
potential for such an impact in the future.
Cognitive approaches. Cognitive psychology
are higher-order processes, such as recognizing the existence of a problem, defining what the
problem is, and selecting strategies to solve the
problem. There are also performance components
that are used in various problem-solving strategies, for example, inferring that A and B are similar in some ways but different in others. Finally,
there are knowledge-acquisition components, processes involved in learning new information and storing that information in memory.
Sternberg's theory has resulted in an intelligence
test, the Sternberg Triarchic Abilities Test, but
it is too new to evaluate.
Much of the criticism of standard intelligence
tests, such as the Stanford-Binet, can be summarized by saying that these efforts focus on the
test rather than the theory behind the test. In
many ways, these tests were practical measures
devised in applied contexts, with a focus on criterion rather than construct validity. Primarily
as part of the revolution of cognitive psychology, there has been a strong emphasis on different approaches to the study of intelligence,
approaches that are more theoretical, that focus
more on process (how the child thinks) rather
than product (what the right answer is), and that
attempt to define intelligence in terms of basic
and essential capacities (Horn, 1986; Keating &
MacLean, 1987; K. Richardson, 1991).
The basic model for cognitive theories of intelligence has been the computer, which represents
a model of how the brain works. It is thus no
surprise that the focus has been on information
processing and specically on two major aspects
of such processing: the knowledge base and the
processing routines that operate on this knowledge base (K. Richardson, 1991).
From a psychometric perspective. The various theories of intelligence can also be classified into three categories (with a great deal of
oversimplification): (1) those that see intelligence
as a global, unitary ability; (2) those that see intelligence as composed of multiple abilities; and (3)
those that attempt to unite the two views into
a hierarchical (i.e., composed of several levels)
approach.
The first approach is well exemplified by the
work of Spearman (1904; 1927), who developed
the two-factor theory. This theory hypothesizes
that intellectual activities share a common basis,
called the general factor, or g. Thus, if we administer several tests of intelligence to a group of people, we will find that those individuals who tend
to score high on test A also tend to score high
on the other tests, and those who score low tend
to score low on all tests. If we correlate the data
and do a factor analysis, we would obtain high
correlations between test scores that would indicate the presence of a single, global factor. But
the world isn't perfect, and thus we find variation. Marla may obtain the highest score on test
A but may be number 11 on test B. For Spearman, the variation could be accounted for by specific factors, called s, which were specific to particular tests or intellectual functions. There may
also be group factors that occupy an intermediate position between g and s, but clearly what
is important is g, which is typically interpreted
as general ability to perform mental processing,
or a mental complexity factor, or agility of symbol manipulation. A number of tests, such as the
Raven's Progressive Matrices and the D-48, were
designed as measures of g, and are discussed in
Chapter 11 because they are considered culture-fair tests. Spearman was British, and this single-factor approach has remained popular in Great
Britain, and to some extent in Europe. It is less
accepted in the United States, despite the fact
that there is substantial evidence to support this
view; for example, A. R. Jensen (1987) analyzed
20 different data sets that contained more than
70 cognitive subtests and found a general factor
in each of the correlation matrices. The disagreement, then, seems to be not so much a function
of empirical data as of usefulness: how useful
is a particular conceptualization?
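Spearman's claim that all tests share g implies what is often called a "positive manifold": every pair of tests should correlate positively. A minimal simulation sketch (the data are randomly generated, not from any actual test battery) illustrates the pattern:

```python
import random

random.seed(1)

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Spearman's model: each person's score on every test is a shared
# general ability g plus a test-specific factor s.
n_people, n_tests = 200, 4
g = [random.gauss(0, 1) for _ in range(n_people)]
tests = [[g[i] + random.gauss(0, 0.7) for i in range(n_people)]
         for _ in range(n_tests)]

# Every pair of tests correlates positively: the positive manifold.
rs = [pearson_r(tests[a], tests[b])
      for a in range(n_tests) for b in range(a + 1, n_tests)]
print(all(r > 0 for r in rs))
```

The larger the specific (s) variance relative to the g variance, the lower, but still positive, these intercorrelations become.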
The second approach, that of multiple factors,
is a popular one in the United States, promulgated quite strongly by early investigators such as
T. L. Kelley (1928) and Thurstone (1938). This
approach sees intelligence as composed of broad
multiple factors, such as a verbal factor, memory,
facility with numbers, spatial ability, perceptual
speed, and so on. How many such multiple factors are there? This is the same question we asked
in the area of personality, and just as in personality, there is no generally agreed-upon number.
Thurstone originally proposed 12 primary mental abilities, while more current investigators such
as Guilford have proposed as many as 120. In
fact, there is no generally agreed naming of such
[Figure: a hierarchical model of intelligence, with the general factor g at the top, major group factors below it, then minor group factors, and specific factors at the bottom.]
OTHER ASPECTS
Intelligence and job performance. Correlations
be hired because they met some minimal criteria of performance. Because each of these conditions places restrictions on the variability of
the sample, and because of restrictions on the
ratings of job performance, the true correlation between general intelligence and job proficiency is substantially higher, particularly as a job
requires greater complexity (J. E. Hunter, 1986;
J. E. Hunter & R. F. Hunter, 1984). J. E. Hunter
(1986) pointed out that in well-executed studies, where job proficiency is defined by objective criteria rather than supervisors' ratings, the
relationship between intelligence and job performance could correlate as high as the mid .70s.
Yet, we should not expect a very high correlation
between any type of test and the amazing variety
of skills, accomplishments, etc., to be found in
different occupational activities (Baird, 1985).
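Corrections for restriction of range of the kind discussed above are commonly made with the standard formula often called Thorndike's Case 2. A sketch with hypothetical numbers (these are illustrative values, not Hunter's data):

```python
from math import sqrt

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike's Case 2 correction: estimate what an observed
    validity coefficient r would be if the predictor had its full
    (unrestricted) variability."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / sqrt(1 + r * r * (u * u - 1))

# Hypothetical: observed r of .30 among hired applicants whose test
# SD (8) is half that of the full applicant pool (16).
print(round(correct_for_range_restriction(0.30, 16, 8), 2))
```

When the two SDs are equal (no restriction), the formula returns the observed r unchanged.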
Intelligence and academic achievement. Most
psychologists would agree that standard intelligence tests are good measures or predictors of
academic achievement. The literature confirms
that there is a relationship between intelligence
test scores and academic achievement of about
.50 (Matarazzo, 1972). Such a relationship is
somewhat higher in primary grades and somewhat lower in college (Brody, 1985). Keep in
mind that in college the grading scale is severely
restricted; in theory it is a 5-point scale (A to F),
but in practice it may be even more restricted. In
addition, college grades are a function of many
more nonintellectual variables, such as degree of
motivation, good study habits, and outside interests,
than is true of primary grades. Intellectual abilities are also more homogeneous among college
students than children in primary grades, and
faculty are not particularly highly reliable in their
grading habits.
Academic vs. practical intelligence. Neisser
(1976) and others have argued that typical intelligence tests measure academic intelligence, which
is different from practical intelligence. Neisser
(1976) suggests that the assessment of academic
intelligence involves solving tasks that are not
particularly interesting, that are presented by others, and that are disconnected from everyday
experience. Practical intelligence involves solving tasks as they occur in natural settings, tasks that are
interesting because they involve the well-being
word would be identified correctly for its meaning by more white children than minority children, that word would not likely be included in
the final test.
The need for revisions. At first glance, it may
seem highly desirable to have tests that are frequently revised, so that the items are current and
so that they are revised or abandoned on the basis
of accumulated data obtained in the field. On
the other hand, it takes time not only to develop
a test, but also to master the intricacies of administration, scoring, and interpretation, so that too-frequent revisions may result in unhappy consumers. Each revision, particularly if it is substantial, essentially results in a new instrument
for which the accumulated data may no longer
be pertinent.
Understanding vs. prediction. Recall that tests
can be used for two major purposes. If I am interested in predicting whether Susan will do well in
college, I can use the test score as a predictor.
Whether there is a relationship or not between
test score and behavior, such as performance in
class, is a matter of empirical validity. The focus
here is on the test score, on the product of the performance. If, however, I am interested in understanding how and why Susan goes about solving
problems, then the matter becomes more complicated. Knowing that Susans raw score is 81 or
that her IQ is 123 does not answer my needs. Here
the focus would be more on the process, on how
Susan goes about solving problems, rather than
just on the score.
One advantage of individual tests of intelligence, such as the Stanford-Binet or the Wechsler
scales, is that they allow for observation of the
processes, or at least part of them, involved in
engaging in intellectual activities, in addition to
yielding a summary score or scores.
Correlation vs. assignment. For most tests, the
The 1937 Stanford-Binet. This revision consisted of two parallel forms, forms L and M, and a
complete restandardization on a new sample of
more than 3,000 children, including about 100
children at each half-year interval from ages 1
to 5, 200 children at each age from 6 to 14, and
100 children at each age from 15 to 18. The test
manual gave specific scoring examples (Terman
& Merrill, 1937). The sample was not truly representative, however, and the test was criticized
for this. Nevertheless, the test became very popular and in some ways represented the science of
psychology: quantifying and measuring a major
aspect of life.
The 1960 Stanford-Binet. This revision com-
Table: organization of the 1986 Stanford-Binet (Fourth Edition)
Crystallized abilities (major area)
  Verbal Reasoning scale: Vocabulary, Comprehension, Absurdities, Verbal Relations
  Quantitative Reasoning scale: Quantitative, Number Series, Equation Building
Fluid and analytical abilities (major area)
  Abstract/Visual Reasoning scale: Pattern Analysis, Copying, Matrices, Paper Folding
Short-term memory (major area)
  Subtests: Bead Memory, Memory for Sentences, Memory for Digits, Memory for Objects
revision of the Stanford-Binet and its most extensive to date (Hagen, Delaney, & Hopkins, 1987;
Thorndike, Hagen, & Sattler, 1986a; 1986b). So
many changes were made on this version, referred
to in the literature as the Stanford-Binet IV, that
it might as well be considered a new test. The
earlier forms were age scales, while this revision
was a point scale. The earlier forms were predominantly verbal in focus, while the 1986 form
contained spatial, quantitative, and short-term
memory items as well. This revision was designed
for use from the age of 2 to adult. The standardization sample consisted of more than 5,000 persons, from ages 2 to 23, stratified according to
the 1980 U.S. Census on such demographic variables as gender, ethnicity, and area of residence.
Despite such efforts, the standardization sample
had an overrepresentation of blacks, an underrepresentation of whites, and an overrepresentation of children from higher socioeconomic-level
homes.
The theory that subsumes the 1986 Stanford-Binet is a hierarchical theory, with g at the top of
the hierarchy, defined as including information-processing abilities, planning and organizing
abilities, and reasoning and adaptation skills.
Incorporated into this theory are also the concepts of crystallized abilities, which are basically academic skills, and fluid-analytic abilities,
administered at different ages, the factor structure of the test varies according to age. For example, the test manual indicates that there are two
factors at the preschool level, but four factors
at the adolescent/adult level. At the same time,
somewhat different factor structures have been
reported by different investigators (e.g., Keith,
et al., 1988; R. B. Kline 1989; Sattler, 1988). Other
investigators have looked at the factor structure of
the Stanford-Binet in a variety of samples, from
elementary school children to gifted students (e.g., Boyle,
1989; Gridley, 1991; T. Z. Keith et al., 1988; R. B.
Kline, 1989; McCallum, 1990; McCallum, Karnes
& Crowell, 1988; Ownby & Carmin, 1988). At the
same time, it can be argued that the results of
the factor analyses do not fully support the theoretical model that gave birth to the Stanford-Binet IV. All subtests do load significantly on g.
Some of the subtests do load significantly on the
appropriate major areas, but there are exceptions.
For example, the Matrices subtest, which falls
under the Abstract/Visual Reasoning area, actually loads more highly on the Quantitative Reasoning area. Whether these exceptions are strong
enough to suggest that the theoretical model
is incorrect is debatable (Delaney & Hopkins,
1987).
A second source of validity information is
the correlations obtained between scores on
the Stanford-Binet and scores on other intelligence tests, primarily the Wechsler tests and
earlier versions of the Stanford-Binet. The
obtained correlation coefficients are too numerous to report here but, in general, show substantial correlations between same-type subtests.
Correlations between total Stanford-Binet
scores and similar scores on other tests are
in the .80 to .91 range. Other investigators have
compared the Stanford-Binet with the WISC-R
in gifted children (e.g., Phelps, 1989; Robinson
& Nagle, 1992) and in learning-disabled children
(e.g., T. L. Brown, 1991; Phelps & Bell, 1988),
with the WAIS-R (e.g., Carvajal, 1987a; Spruill,
1991), with the WPPSI (Carvajal, 1991), with the
K-ABC (e.g., Hayden, Furlong, & Linnemeyer,
1988; Hendershott et al., 1990; Knight, Baker,
& Minder, 1990; Krohn & Lamp, 1989; Lamp &
Krohn, 1990), and with other tests (e.g., Atkinson, 1992; Carvajal, 1987c; 1988; Karr, Carvajal &
Palmer, 1992). The Wechsler tests and the K-ABC
are discussed below.
THE WECHSLER TESTS
The WAIS
Introduction. The WAIS had its beginnings in
[Table: WAIS subtests. Verbal scale: Information, Digit Span, Vocabulary, Arithmetic (T), Comprehension, Similarities. Performance scale: Picture Completion, and additional subtests; the accompanying descriptions were not recovered.]
[Table: WAIS IQ classifications: Very Superior; Superior; High Average or Bright Normal; Average; Low Average or Dull Normal; Borderline; Mentally Retarded or Mentally Defective. The corresponding IQ ranges were not recovered.]
the subtests. Subtests like Picture Arrangement and Object Assembly seem, however, to
be marginal, with coefficients in the .60s. Interestingly, average Full Scale IQ seems to increase
about 6 to 7 points upon retest, probably reflecting a practice effect.
Validity. Wechsler has argued that his scales have
[Table: WISC-R Performance scale subtests, with descriptions identical to those of the WAIS: Picture Completion, Picture Arrangement, Block Design, Object Assembly, Coding (like the Digit Symbol of the WAIS), and Mazes (mazes of increasing difficulty).]
presented in the test manual as to what is considered a correct response, and how points are to
be distributed if the item is not simply scored as
correct or incorrect. Raw scores are then changed
into normalized standard scores with mean of
10 and SD of 3, as compared to a child's own
age group. These subtest scores are then added
and converted to a deviation IQ with mean of
100 and SD of 15. Three total scores are thus
obtained: a Verbal IQ, a Performance IQ, and a
Full Scale IQ. As with both the Stanford-Binet
and the WAIS, there are a number of sources
available to provide additional guidance for the
user of the WISC-R (e.g., Groth-Marnat, 1984;
A. S. Kaufman, 1979a; Sattler, 1982; Truch, 1989).
A. S. Kaufman (1979a), in particular, gives some
interesting and illustrative case reports.
Computer programs to score the WISC-R and
provide a psychological report on the client are
available, but apparently differ in their usefulness
(Das, 1989; Sibley, 1989).
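The scoring sequence just described (raw score to age-normed scaled score with mean 10 and SD 3, then sum of scaled scores to a deviation IQ with mean 100 and SD 15) can be sketched as follows. The real WISC-R uses normalized lookup tables; the linear z-score conversion and all norm values here are hypothetical, for illustration only.

```python
def raw_to_scaled(raw, age_group_mean, age_group_sd):
    """Convert a raw subtest score to a standard score with mean 10
    and SD 3, relative to the child's own age group. (The actual
    test uses normalized lookup tables; a linear z-score
    transformation stands in for them here.)"""
    z = (raw - age_group_mean) / age_group_sd
    return round(10 + 3 * z)

def deviation_iq(scaled_scores, sum_mean, sum_sd):
    """Convert a sum of scaled scores to a deviation IQ with
    mean 100 and SD 15."""
    z = (sum(scaled_scores) - sum_mean) / sum_sd
    return round(100 + 15 * z)

# Hypothetical norms: five subtests, each with raw mean 30 (SD 6)
# for this age group; sums of scaled scores average 50 with SD 9.
raws = [30, 36, 24, 42, 30]
scaled = [raw_to_scaled(r, 30, 6) for r in raws]
print(scaled, deviation_iq(scaled, 50, 9))
```

A deviation IQ defined this way expresses standing relative to age peers, rather than the older mental-age/chronological-age ratio.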
Reliability. Both split-half (odd-even) and test-retest (1-month interval) reliabilities are reported
in the test manual. For the total scores, they are all
in the .90s, suggesting substantial reliability of
both the internal-consistency and stability-over-time
types. As one might expect, the reliabilities of the
individual subtests are not as high, but typically
range in the .70s and .80s.
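An odd-even split-half coefficient of the kind reported in the manual is typically stepped up with the Spearman-Brown formula to estimate full-length reliability. A sketch with invented item scores (not WISC-R data):

```python
def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_scores):
    """Odd-even split-half reliability, stepped up with the
    Spearman-Brown formula; item_scores holds one list of item
    scores per examinee."""
    odd = [sum(items[0::2]) for items in item_scores]
    even = [sum(items[1::2]) for items in item_scores]
    r_half = pearson_r(odd, even)
    return (2 * r_half) / (1 + r_half)

# Invented 0/1 item scores for five examinees on a six-item test.
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
]
print(round(split_half_reliability(scores), 2))
```

The step-up is needed because correlating two half-tests understates the reliability of the full-length test.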
The test manual also gives information on the
standard error of measurement and the standard
error of the difference between means (which we
Cognition
between the two areas, but also enough independence to justify the use of the two summary
scores.
Other factor analyses of the WISC have suggested a factor structure similar to that of the
WAIS, including a general factor, and at least
three other substantial factors of verbal comprehension, perceptual-spatial functioning, and
a memory or freedom from distractibility factor
(Gutkin & Reynolds, 1980; 1981; Littell, 1960;
Van Hagan & Kaufman, 1975; Zimmerman &
Woo-Sam, 1972). These three factors have also
been obtained in studies of learning-disabled
children and mentally retarded children (e.g.,
Cummins & Das, 1980; Naglieri, 1981). The third
factor is a rather small factor, and a number of
alternative labels for it have been proposed.
Bannatyne (1971) proposed a recategorization
of the WISC subtests into four major categories:
verbal-conceptual ability (Vocabulary + Comprehension + Similarities subtests)
acquired information (Information + Arithmetic + Vocabulary subtests)
visual-spatial ability (Block Design + Object
Assembly + Picture Completion subtests)
sequencing (Coding + Digit Span + Arithmetic
subtests)
The argument for such a recategorization is that
the analysis is more meaningful with learning-disabled children, and the results are more easily
interpretable to teachers and parents. The focus
would not be so much on the IQ, but on the
measurement of abilities.
Abbreviated scales. As with the WAIS, a number
of efforts have been made to identify combinations of WISC subtests that correlate highly with
the Full Scale IQ of the entire WISC (Silverstein,
1968; 1970), or to administer every other item, or
every third item, but include all subtests (Silverstein, 1967). One such subtest combination consists of the Vocabulary and Block Design subtests;
scores here correlate in the .80s with the Full Scale
IQ of the entire WISC-R (Ryan, 1981). Typically,
quite high correlations (in the .80s and .90s) are
obtained between abbreviated forms and the full
test, and these abbreviated forms can be useful as
screening devices, or for research purposes where
only a summary IQ number is needed.
WPPSI have produced a wide range of findings (Sattler, 1982; 1988). Scores on the WPPSI
have been correlated with a variety of scores on
other tests, with typical correlation coefficients
between the Full Scale IQ and other test measures in the .50 to .70 range (e.g., Baum & Kelly,
1979; Gerken, 1978; B. L. Phillips, Pasewark, &
Tindall, 1978). Keep in mind that for many of
these samples, the children tested were homogeneous (for example, retarded) and, as you recall,
homogeneity limits the size of the correlation.
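The attenuating effect of homogeneity (restriction of range) on a correlation can be demonstrated with a small simulation; the data below are randomly generated, not WPPSI scores:

```python
import random

random.seed(7)

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Full-range sample: ability predicts an outcome score with noise.
ability = [random.gauss(100, 15) for _ in range(500)]
outcome = [a + random.gauss(0, 10) for a in ability]
r_full = pearson_r(ability, outcome)

# Restrict the sample to a homogeneous subgroup (ability >= 110).
pairs = [(a, o) for a, o in zip(ability, outcome) if a >= 110]
r_restricted = pearson_r([a for a, _ in pairs],
                         [o for _, o in pairs])
print(r_full > r_restricted)
```

The relationship between ability and outcome is unchanged in the subgroup; only the reduced variability shrinks the observed coefficient.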
As might be expected, scores on the WPPSI
correlate substantially with scores on the WISC-R, on the order of .80 (Wechsler, 1974), and
with the Stanford-Binet. Sattler (1974) reviewed
a number of such studies and reported that the
median correlations between the WPPSI Verbal, Performance, and Full Scale IQs and the
Stanford-Binet IQ were .81, .67, and .82, respectively. Despite the fact that the tests correlate substantially, it should be noted that the IQs obtained
from the WPPSI and from the Stanford-Binet are
not interchangeable. For example, Sewell (1977)
found that the mean WPPSI IQ was higher than
that of the Stanford-Binet, while earlier studies
found just the opposite (Sattler, 1974).
Fewer studies are available on the predictive validity of the WPPSI; these typically
attempt to predict subsequent academic achievement, especially in the first grade, or later IQ. In
the first instance, typical correlations between
WPPSI scores and subsequent achievement are
in the .40 to .60 range (e.g., Crockett, Rardin, &
Pasewark, 1975), and in the second they are higher,
typically in the .60 to .70 range (e.g., Bishop &
Butterworth, 1979). A number of studies have
looked at the ability of WPPSI scores to predict
reading achievement in the first grade. Typical
findings are that with middle-class children there
is such a relationship, with modal correlation
coefficients in the .50s. With minority and disadvantaged children no such relationship is found,
but again, one must keep in mind the restriction
of range both on the WPPSI scores and on the
criterion of reading achievement (e.g., Crockett,
Rardin, & Pasewark, 1976; Serwer, B. J. Shapiro,
& P. P. Shapiro, 1972; D. R. White & Jacobs,
1979).
Because the term construct validity is an
umbrella term subsuming other types of validity,
all of the studies mentioned so far can be considered as supportive of the construct validity of
the WPPSI. A number of other findings might
be mentioned here. R. S. Wilson (1975) studied
monozygotic (identical) twins and dizygotic (fraternal) twins and found that monozygotic twins
were closer in intelligence to each other than were
dizygotic twins, a finding that is in line with the
view that intelligence has a substantial hereditary/genetic component, and one that supports
the construct validity of the WPPSI. Other studies
have focused on the relationship of IQ to socioeconomic status (e.g., A. S. Kaufman, 1973), and
on language spoken at home (e.g., Gerken, 1978).
Norms. The WPPSI was standardized on a
national sample of 1,200 children, with 200 children (100 boys and 100 girls) at each of six half-year age levels, from ages 4 to 6½. The sample was
stratified using census data.
Factor structure. As with the other Wechsler
tests, the subtests of the WPPSI and the Verbal
and Performance scales intercorrelate with each
other significantly. Subtests typically correlate .40
to .60, and Verbal and Performance IQs correlate in the mid .60s. The results of factor-analytic studies suggest a general factor, as well as
two broad factors, a verbal factor and a performance factor, although the verbal component is
a much more important one and may be interpreted as a general factor (Coates & Bromberg,
1973; Heil, Barclay, & Endres, 1978; Hollenbeck &
Kaufman, 1973; Ramanaiah & Adams, 1979). In
the younger-aged children, the two broad factors
are less distinct from each other, a finding that is
in line with developmental theories that hypothesize that intellectual functioning evolves, as the child
grows older, into more specialized and distinct
categories. Similar results have been obtained
with black children (Kaufman & Hollenbeck,
1974).
Abbreviated scales. A. S. Kaufman (1972)
developed a short form of the WPPSI composed
of four subtests: Arithmetic and Comprehension from the Verbal Scale, and Block Design
and Picture Completion from the Performance
Scale. The reliability of this short form was in
the low .90s, and scores correlated with the Full
Scale IQ of the entire WPPSI in the .89 to .92
range. Other investigators have also developed
short forms for both the WPPSI and the WPPSI-R (e.g., Tsushima, 1994).
The WPPSI-R. As with other major tests, while
gence test designed for ages 2½ to 17½, and contains 23 scales that cover 6 areas and yield 3 IQ
scores. The six areas are: (1) speed of information
processing, (2) reasoning, (3) spatial imagery,
(4) perceptual matching, (5) short-term memory, and (6) retrieval and application of knowledge. The three IQ scores are General, Visual, and
Verbal.
Each of the six areas is composed of a number of subscales; for example, the Reasoning area is made up of four subscales, while the Retrieval and Application of Knowledge area is made up of seven subscales. All of the subscales are appropriate for multiple age levels. For example, the Block Design subscale is appropriate for ages 4 to 17, while the Visual Recognition subscale is appropriate for ages 2½ to 8. Thus, which subscales are used depends on the age of the child being tested.
The BAS is unusual in at least two aspects: it was developed using very sophisticated psychometric strategies, and it incorporates various theories in its subscales. Specifically, the BAS subscales were developed according to the Rasch latent trait model; this is a very sophisticated psychometric theory whose procedures are beyond the scope of this book (Rasch, 1966). Two of the subscales on the BAS are based on the developmental theories of Piaget, and one subscale, that of Social Reasoning, is based on Kohlberg's (1979) theory of moral reasoning.
Finally, two subtests, Word Reading and Basic
Arithmetic, both for ages 5 to 14 and both from
the Retrieval and Application of Knowledge area,
can be used to estimate school achievement.
Administration and scoring. Administration
Reliability and validity. Unfortunately, little
BAS possesses excellent potential and great psychometric sophistication, but the 1985 data on reliability and validity were judged inadequate. For a review of the BAS see Buckhalt (1986).
The Differential Ability Scales. The BAS was introduced in the United States as the DAS and seems well on its way toward becoming a popular test. The DAS is very similar to the BAS; some BAS subtests were eliminated or modified, so that the DAS consists of 20 subtests, 17 of which are
The DAS manual (Elliott, 1990a) reports several validity studies, including correlations with the WISC-R (mid .80s) and with the Stanford-Binet IV (high .70s to high .80s). The literature also contains a number of studies supportive of the validity of the DAS (e.g., McIntosh, 1999). Recent reviews of the DAS are quite favorable and point to its technical excellence and potential use with minority children (Braden, 1992).
The Kaufman Assessment Battery for
Children (K-ABC)
Kaufman (1983) describes intelligence as the ability to process information effectively to solve unfamiliar problems. In addition, he distinguished between sequential and simultaneous processing. A number of other theorists have analyzed intellectual functioning into two modes of mental organization. Freud, for example, spoke of primary and secondary processes. More recently, Guilford (1967b) focused on convergent and divergent thinking, while R. B. Cattell (1963) used the terms fluid and crystallized intelligence, and Wechsler (1958) used verbal and nonverbal intelligence. One dichotomy that has found its way into a number of tests and test interpretations is the notion of sequential (or successive) and simultaneous processing (Luria, 1966).
Sequential processing requires the organization of stimuli into some temporally organized series, where the specific order of the stimuli is more important than the overall relationship of these stimuli. For example, as you read these words it is their sequencing that is important for the words to have meaning. Sequential processing is typically based on verbal processes and depends on language for thinking and remembering; it is serial in its nature. Simultaneous processing involves stimuli that are primarily spatial and focuses on the relationship between elements. To understand the sentence "this box is longer than this pencil," we must not only have an understanding of the sequence of the words, we must also understand the comparative spatial relationship of "longer than." Simultaneous processing searches for patterns and configurations; it is holistic. A. S. Kaufman (1979b) suggested that the WISC-R subtests could be organized along the lines of sequential vs. simultaneous processing. For example, Coding and Arithmetic require
administered intelligence and achievement measure that assesses styles of problem solving and information processing in children ages 2½ to 12½. It is composed of five global scales: (1) Sequential Processing scale; (2) Simultaneous Processing scale; (3) Mental Processing Composite scale, which is a combination of the first two; (4) Achievement scale; and (5) Nonverbal scale. The actual battery consists of 16 subtests, including 10 that assess a child's sequential and simultaneous processing and 6 that evaluate a child's achievement in academic areas such as reading and arithmetic; because not all subtests cover all ages, any individual child would at most be administered 13 subtests. The 10 subtests that assess the child's processing include practice items so that the examiner can communicate to the child the nature of the task and can observe whether the child understands what to do. Of these 10 subtests, 7 are designed to assess simultaneous processing and 3 sequential processing; the 3 sequential processing subtests all involve short-term memory.
All the items in the first three global scales minimize the role of language and acquired facts and skills. The Achievement scale assesses what a child has learned in school; this scale uses items that are more traditionally found on tests of verbal intelligence and tests of school achievement. The Nonverbal scale is an abbreviated version of the Mental Processing Composite scale and is intended to assess the intelligence of children with speech or language disorders, with hearing impairments, or those who do not speak English; all tasks on
number of issues, including its validity for minority groups (e.g., Sternberg, 1984), its appropriateness for preschool children (e.g., Bracken, 1985), its theoretical basis (e.g., Jensen, 1984), and its lack of instructional utility (Good, Vollmer, Creek, et al., 1993).
tunately, only 3 of the 26 subtests achieve adequate alternate-form reliability (J. A. Cummings, 1989). These three subtests, incidentally, are among those that have the highest test-retest reliabilities.
For the two subtests that require subjective scoring, interscorer reliability becomes important. Such interscorer reliability coefficients range from .75 to .85 for the DPFU subtest, and from .92 to 1.00 for the DPSU subtest (M. Meeker, R. J. Meeker, & Roid, 1985). These are rather high coefficients, not usually found this high in tests where subjective scoring is a major aspect.
Standardization. The normative sample consisted of 349 to 474 school children in each of five grade levels, from grades 2 to 6, with roughly equivalent representation of boys and girls. Approximately half of the children came from California, and the other half from school districts in three states. For the intermediate levels, samples of children in grades 7 to 12 were assessed, while adult norms are based on various groups aged 18 to 55. Little information is given on these various samples.
Diagnostic and prescriptive aspects. The subtests are timed so that the raw scores can be compared with norms in a meaningful way. However, subjects may be given additional time for uncompleted items, although they need to indicate where they stopped when time was called so the raw score can be calculated. Thus, two sets of scores can be computed: one set is obtained under standard administrative procedures and is therefore comparable to norms, and another set reflects ability with no time limit and is potentially useful for diagnostic purposes or for planning remedial action. Whether this procedure is indeed valid remains to be proven, but the distinction between actual performance and potential is an intriguing one, used by a number of psychologists.
There is available a teacher's guide that goes along with the SOI-LA, whose instructional focus is on the remediation of deficits as identified by the test (M. Meeker, 1985). This represents a somewhat novel and potentially useful approach, although evidence needs to be generated that such remedial changes are possible.
Criticisms. The SOI-LA represents an interesting
100 multiple-choice test items, 50 verbal in content and 50 quantitative. The verbal items consist of analogies given in a multiple-choice format.
For example:
arm is to hand as:
(a) head is to shoulder
Quantitative items are given as two quantitative expressions, and the student needs to decide whether the two expressions are equal, whether one is greater, or whether insufficient information is given. Thus, two circles of differing size might be given, and the student needs to determine whether the radius of one is larger than that of the other.
Administration. The SCAT III is a group-administered test and thus requires no special clinical skills or training. Clear instructions are given in the test manual, and the examiner needs to follow these.
Scoring. When the SCAT III is administered, it
pupils in 70 different school systems, including both public and private schools. The sample was stratified using census data on several variables, including geographic region of residence and socioeconomic status. The racial-ethnic distribution of the sample also closely paralleled the census data and included 74% white, 20% black, 4% Hispanic, and 2% other.
Normative data are reported by age and grade using deviation IQs, which in this test are called School Ability Indexes, as well as percentiles and stanines. The School Ability Index is normed with a mean of 100 and an SD of 16.
The Slosson Intelligence Test (SIT)
screening instrument to evaluate a person's intellectual ability, although it is also presented by its author as a parallel form for the Stanford-Binet and was in fact developed as an abbreviated version of the Stanford-Binet (Slosson, 1963). The SIT was first published in 1961 and revised in 1981, although no substantive changes seem to have been made from one version to the other. It was revised again in 1991, a revision in which items were added that were more similar to the Wechsler tests than to the Stanford-Binet. This latest version was called the SIT-R. The test contains 194 untimed items and is said to extend from age 2 years to 27 years. No theoretical rationale is presented for this test, but because it originally was based on the Stanford-Binet, presumably it is designed to assess abstract reason-
So far we have looked at measures that are multivariate, that assess intelligence in a very complex way, either globally or as explicitly composed of various dimensions. There are, however, literally hundreds of measures that assess specific cognitive skills or dimensions. The STT is illustrative.
Carver (1992) presented the STT as a test to measure cognitive speed. The STT is designed to measure how fast individuals can choose the correct answers to simple mental problems. In this case, the problems consist of pairs of letters, one in upper case and one in lower case. The respondent needs to decide whether the two letters are the same or different, e.g., "Aa" vs. "aB". Similar tasks have been used in the literature, both at the
14:17
125
SUMMARY
In this chapter, we briefly looked at various theories of cognitive assessment and a variety of issues. We only scratched the surface in terms of the variety of points of view that exist and in terms of the controversies about the nature and nurture of intelligence. We looked, in some detail, at the various forms of the Binet tests because, in some ways, they nicely illustrate the historical progression of classical testing. We also looked at the Wechsler series of tests because they are quite popular and also illustrate some basic principles of testing. The other tests were chosen as illustrations, some because of their potential utility (e.g., the BAS), or because they embody an interesting theoretical perspective (e.g., the SOI-LA), or because they seem to be growing in usefulness and popularity (e.g., the K-ABC). Some, such as the SIT, leave much to be desired, and others, such as the Otis-Lennon, seem to be less used than in the past. Tests, like other market products, achieve varying degrees of popularity and commercial success, but hopefully the lessons they teach us will outlast their use.
SUGGESTED READINGS
Byrd, P. D., & Buckhalt, J. A. (1991). A multitrait-multimethod construct validity study of the Differential Ability Scales. Journal of Psychoeducational Assessment, 9, 121-129.
You will recall that we discussed the multitrait-multimethod design as a way of assessing construct validity, and more specifically, as a way of obtaining convergent and discriminant validity information. This study of 46 rural Alabama children analyzes scores from the DAS, the WISC-R, and the Stanford Achievement Test. The authors conclude that one must be careful in comparing subtests from the DAS and the WISC-R, even though they may have similar content. One may well ask whether this article represents an accurate utilization of the multitrait-multimethod approach: are the methods assessed really different?
Keating, D. P. (1990). Charting pathways to the development of expertise. Educational Psychologist, 25, 243-267.
A very theoretical article that first briefly reviews the history of the conception of intelligence and then engages in some speculative thinking. The article introduces Alfreda Binet, the mythical twin sister of Alfred Binet, who might have done things quite differently from her famous brother.
DISCUSSION QUESTIONS
AIM This chapter looks at the measurement of attitudes, values, and interests. These three areas share much in common from a psychometric as well as a theoretical point of view; in fact, some psychologists argue that the three areas, and especially attitudes and values, are not so different from each other. Some authors regard them as subsets of personality, while others point out that it is difficult, if not impossible, to define these three areas so that they are mutually exclusive.
Some writers seem to emphasize one component more than the others. For example, Thurstone (1946) defined attitude as "the degree of positive or negative affect associated with some psychological object." But most social scientists do perceive attitudes as learned predispositions to respond to a specific target, in either a positive or negative manner. As in other areas of assessment, there are a number of theoretical models available (e.g., Ajzen & Fishbein, 1980; Bentler & Speckart, 1979; Dohmen, Doll, & Feger, 1989; Fishbein, 1980; Jaccard, 1981; Triandis, 1980; G. Wiechmann & L. A. Wiechmann, 1973).
Centrality of attitudes. The study of attitudes and attitude change has occupied a central position in the social sciences, and particularly in social psychology, for a long time. Even today, the topic is one of the most active areas of study (Eagly & Chaiken, 1992; Oskamp, 1991; Rajecki, 1990). Part of the reason the study of attitudes has been so central is the assumption that attitudes will reveal behavior; because behavior seems so difficult to assess directly, attitudes are assumed to provide a way of understanding behavior (Kahle, 1984). Thus the relationship between attitudes and behavior is a major question, with some writers questioning such a relationship (e.g., Wicker, 1969) and others proposing that such a relationship is moderated by situational or personality factors (e.g., Ajzen & Fishbein, 1973; Zanna, Olson, & Fazio, 1980).
study of attitudes is to observe a person's behavior and to infer from that behavior the person's attitudes. Thus, we might observe shoppers in a grocery store to determine their attitudes toward a particular product. The problem, of course, is that a specific behavior may not be related to a particular attitude (for a brief theoretical discussion of the relationship between attitudes and observable behavior, see J. R. Eiser, 1987). You might buy chicken not because you love chicken but because you cannot afford filet mignon, or because you might want to try out a new recipe, or because your physician has suggested
Median    Q value
 10.5       .68
  8.3      3.19
  6.7       .88
  4.8       .52
  2.1       .86
  1.3       .68

Note that item 14 would probably be eliminated because of its larger Q value. If the other items were retained and administered to a subject who endorses items 1, 19, and 23, then that person's score would be the median of 10.5, 6.7, and 4.8, which would be 6.7.
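The scoring step just described can be sketched in a few lines of Python. This is a minimal illustration using only the values given in the example above; the scale values of the remaining retained items are not shown in the text and are omitted here:

```python
from statistics import median

# Scale values (the medians of the judges' sorts) for three of the
# retained items, as in the example above. Item 14 (Q = 3.19) has
# been eliminated because of poor agreement among the judges.
scale_values = {1: 10.5, 19: 6.7, 23: 4.8}

def thurstone_score(endorsed, scale_values):
    """A respondent's attitude score is the median of the scale
    values of the statements he or she endorses."""
    return median(scale_values[item] for item in endorsed)

# The subject in the example endorses items 1, 19, and 23:
print(thurstone_score([1, 19, 23], scale_values))  # 6.7
```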
The intent of this method was to develop an interval scale, or possibly a ratio scale, but it is clear that the zero point (in this case the center of the distribution of items) is not a true zero. The title "method of equal-appearing intervals" suggests that the procedure results in an interval scale, but whether this is so has been questioned (e.g., Hevner, 1930; Petrie, 1969). Unidimensionality, hopefully, results from the writing of the initial pool of items, in that all of the items should be relevant to the target being assessed, and from selecting items with small Q values.
There are a number of interesting questions that can be asked about the Thurstone procedure. For example, why use 11 categories? Why use the median rather than the mean? Could the judges rate each item rather than sort the items? In general, variations from the procedures originally
3. The items are administered to a sample of subjects who indicate for each item whether they strongly agree, agree, are undecided, disagree, or strongly disagree (sometimes a word like "approve" is used instead of "agree"). Note that these subjects are not experts as in the Thurstone method; they are typically selected because they are available (introductory psychology students) or because they represent the population that eventually will be assessed (e.g., registered Democrats).
4. A total score for each subject can be generated by assigning scores of 5, 4, 3, 2, and 1 to the above categories, and reversing the scoring for unfavorably worded items; the intent here is to be consistent, so that ordinarily higher scores represent a more favorable attitude.
5. An item analysis is then carried out by computing for each item a correlation between responses on that item and total scores on all the items (to be statistically correct, the total score should be for all the other items, so that the same item is not correlated with itself, but given a large number of items such overlap has minimal impact).
6. Individual items that correlate the highest with the total score are then retained for the final version of the scale. Note therefore that items could be retained that are heterogeneous in content but correlate significantly with the total. Conversely, we could also carry out an item analysis using the method of item discrimination we discussed. Here we could identify the top 27% high scorers and the bottom 27% low scorers and analyze how these two groups responded to each item. Those items that show good discrimination between high and low scorers would be retained.
7. The final scale can then be administered to samples of subjects and their scores computed. Such scores will be highly relative in meaning: what is favorable or unfavorable depends upon the underlying distribution of scores.
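The item analysis in steps 5 and 6 can be sketched as follows. The response data here are made up for illustration, and the statistic computed is the corrected item-total correlation described in step 5 (each item against the sum of the other items):

```python
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation between two lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical data: rows are subjects, columns are items, already
# coded 5..1 (with unfavorably worded items reverse-scored, step 4).
responses = [
    [5, 4, 2, 5],
    [4, 4, 1, 3],
    [2, 1, 3, 2],
    [1, 2, 2, 1],
    [3, 3, 5, 4],
]

for item in range(len(responses[0])):
    item_scores = [row[item] for row in responses]
    # Total over the *other* items, so the item is not correlated
    # with itself (step 5).
    rest = [sum(row) - row[item] for row in responses]
    print(f"item {item + 1}: r = {pearson_r(item_scores, rest):.2f}")
```

Items with the highest correlations would be retained for the final scale (step 6); an item that correlates near zero or negatively with the rest is presumably measuring something else.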
Note should be made that some scales are called Likert scales simply because they use a 5-point response format but may have been developed without using the Likert procedure, i.e., simply by the author putting together a set of items.
Are five response categories the best? To some degree psychological testing is affected by inertia
FIGURE 6-1. Example of a Bogardus scale using multiple targets. Each target group (Hispanics, American Indians, Blacks, Italians, Russians, etc.) is rated on a scale of 1 to 7, with each of the seven ratings defined by one of the seven statements.
Response patterns:

Item 1     Item 2     Item 3
Agree      Disagree   Disagree   *
Agree      Agree      Disagree   *
Agree      Agree      Agree      *
Disagree   Agree      Agree
Disagree   Disagree   Agree
Disagree   Disagree   Disagree   *
Agree      Disagree   Agree
Disagree   Agree      Disagree
In fact, the number of possible response patterns is 2^N, where N is the number of items; in this case 2^3 equals 2 × 2 × 2, or 8. If, however, the items form a Guttman scale, there should be few if any reversals, and only the four response patterns marked by an asterisk should occur. The ideal number of response patterns then becomes N + 1, or 4 in this example. We can then compute what is called the coefficient of reproducibility, which is defined as:

Reproducibility = 1 - (number of errors / total number of responses)
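As a sketch, the coefficient of reproducibility can be computed as follows. The response patterns are hypothetical, and errors are counted here as deviations from the ideal cumulative pattern implied by each person's total score; other error-counting conventions exist in the literature:

```python
# 1 = agree, 0 = disagree; items ordered from least to most extreme.

def guttman_errors(pattern):
    """Count deviations from the ideal cumulative pattern: a person
    agreeing with k items should agree with exactly the k least
    extreme ones."""
    k = sum(pattern)
    ideal = [1] * k + [0] * (len(pattern) - k)
    return sum(a != b for a, b in zip(pattern, ideal))

# Hypothetical responses from four subjects to a 3-item scale:
patterns = [
    [1, 1, 1],  # ideal pattern, no errors
    [1, 1, 0],  # ideal pattern, no errors
    [1, 0, 0],  # ideal pattern, no errors
    [0, 1, 0],  # a reversal: two responses deviate from [1, 0, 0]
]

total_errors = sum(guttman_errors(p) for p in patterns)
total_responses = len(patterns) * len(patterns[0])
print(1 - total_errors / total_responses)  # the coefficient of reproducibility
```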
was developed as a way of assessing word meaning but because this technique has been used
quite frequently in the assessment of attitudes it
My ideal self

good      ___ ___ ___ ___ ___ ___ ___  bad
small     ___ ___ ___ ___ ___ ___ ___  large
beautiful ___ ___ ___ ___ ___ ___ ___  ugly
passive   ___ ___ ___ ___ ___ ___ ___  active
sharp     ___ ___ ___ ___ ___ ___ ___  dull
slow      ___ ___ ___ ___ ___ ___ ___  fast
dirty     ___ ___ ___ ___ ___ ___ ___  clean
etc.
Table 6-1. SemD ratings from one subject for five brands of beer

SemD Scales            Brand A   Brand B   Brand C   Brand D   Brand E
Pleasant-unpleasant       6         2         6         5         3
Ugly-beautiful            5         2         5         5         2
Sharp-flat                6         1         4         6         2
Salty-sweet               7         1         5         6         3
Happy-sad                 5         3         5         7         1
Expensive-cheap           6         2         7         7         2
Mean                    5.83      1.83      5.33      6.00      2.17
D = √(Σ d²), where d is the difference between the ratings given to two concepts on each scale.
           Brand A   Brand B   Brand C   Brand D
Brand B     10.30
Brand C      3.00      8.89
Brand D      2.65     10.44      3.16
Brand E      9.06      3.16      8.19      9.95
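The distances between brands can be reproduced from Table 6-1 with a short computation; the D statistic is simply the square root of the summed squared differences between two columns of ratings:

```python
import math

# Ratings from Table 6-1, one list per brand, in scale order
# (pleasant-unpleasant through expensive-cheap).
ratings = {
    "A": [6, 5, 6, 7, 5, 6],
    "B": [2, 2, 1, 1, 3, 2],
    "C": [6, 5, 4, 5, 5, 7],
    "D": [5, 5, 6, 6, 7, 7],
    "E": [3, 2, 2, 3, 1, 2],
}

def d_statistic(x, y):
    """D = sqrt(sum of squared rating differences across scales)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(round(d_statistic(ratings["A"], ratings["C"]), 2))  # 3.0
print(round(d_statistic(ratings["A"], ratings["B"]), 2))  # 10.3
```

Brands A and C are close together in semantic space, while brands A and B are far apart, consistent with the subject's mean ratings in Table 6-1.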
Checklists. One way to assess attitudes, particularly toward a large number of issues, is the checklist approach. As its name implies, this approach consists of a list of items (people, objects, issues, etc.) to which the respondents are asked to indicate their attitude in some way: by checking those items they endorse, by selecting "favorable" or "unfavorable" for each item, by indicating approval-disapproval, etc.
This is a simple and direct approach, and because all subjects are asked to respond to the same items, there is comparability of measurement. On the other hand, some argue that the presentation of a large number of items can result in careless responding and hence lowered reliability and validity. In addition, the response categories typically used do not allow for degree of preference. (I may favor the death penalty and check that item in the list, but my convictions may not be very strong and I might be easily dissuaded.)
An example of the checklist approach in the assessment of attitudes can be found in the work
Another example:
Where would you locate President Clinton on the
following scale?
An excellent leader.
Better than most prior presidents.
Average in leadership.
Less capable than most other presidents.
Totally lacking in leadership capabilities.
In general, such use of numbers makes life more complicated for both the respondent and the examiner. Mention should be made here that there seems to be a general tendency on the part of some respondents to avoid extreme categories. Thus the 5-point scale illustrated above may turn out to be a 3-point scale for at least some subjects. The extension of this argument is that a 7-point scale is really preferable because in practice it will yield a 5-point scale.
Another type of rating scale is the graphic scale, where the response options follow a straight line or some variation. For example:

strongly agree     agree     not sure     disagree     strongly disagree
      +2            +1          0            -1               -2

Note that a scale could combine both numerical and graphic properties; essentially what distinguishes a graphic scale is the presentation of some device, such as a line, on which the respondent can place their answer. Note also that, from a psychometric point of view, it is easier to force the respondent to place their mark in a particular segment than to allow free rein. In the capital punishment example above, we could place little vertical lines to distinguish and separate the five response options. Or we could allow the respondent to check anywhere on the scale, even between responses, and generate a score by actually measuring the distance of their mark from the extreme left-hand beginning of the line. Guilford (1954) discusses these scales at length, as well as other less common types.

Self-anchoring scales. Kilpatrick and Cantril (1960) presented an approach that they called self-anchoring scaling, where the respondent is asked to describe the top and bottom anchoring points in terms of his or her own perceptions, values, attitudes, etc. This scaling method grew out of transactional theory, which assumes that we live and operate in the world through the self, both as personally perceived. That is, there is a
[Figure: a ladder-like self-anchoring scale, with steps numbered from 10 at the top down to 0 at the bottom.]
There are a number of situations where the assessment of attitudes might be helpful, but available scales may not quite fit the demands of the situation. For example, a city council may wish to determine how citizens feel toward the potential construction of a new park, or the regents of a university might wish to assess whether a new academic degree should be offered. The same steps we discussed in Chapter 2 might well be used here (or the steps offered by Oppenheim [1992] above). Perhaps it might not be necessary to have a theory about the proposed issue, but it certainly would be important to identify the objectives that are to be assessed and to produce items that follow the canons of good writing.
VALUES
Instrumental values
Note: These values were later replaced by health and loyal, respectively.
Adapted with the permission of The Free Press, a Division of Simon & Schuster, from The Nature of Human Values by Milton Rokeach. Copyright © 1973 by The Free Press.
each student. Students classified as high in identity achievement ranked the RVS instrumental values as follows:

value            mean ranking
honest               5.50
responsible          5.68
loving               5.96
broadminded          6.56
independent          7.48
capable              7.88
etc.
ing the temporal stability (i.e., test-retest reliability) of the RVS. One way is to administer the RVS to a group of individuals and retest them later. For each person, we can correlate the two sets of ranks and then compute the median of such rank-order correlation coefficients for our sample of subjects. Rokeach (1973) reports such medians as ranging from .76 to .80 for terminal values and .65 to .72 for instrumental values, with samples of college students retested after 3 weeks to 4 months.
Another way is also to administer the RVS twice, but to focus on each value separately. We may, for example, start out with "a comfortable life." For each subject in our sample, we have the two ranks assigned to this value. We can then compute a correlation coefficient across subjects for that specific value. When this is done, separately for each of the 36 values, we find that the reliabilities are quite low; for the terminal values the average reliability is about .65 (Rokeach, 1973) and for the instrumental values it is about .56 (Feather, 1975). This is of course not surprising because each scale is made up of only one item. One important implication of such low reliability is that the RVS should not be used for individual counseling and assessment.
One problem, then, is that the reliability of the RVS is marginal at best. Rokeach (1973) presents the results of various studies, primarily with college students, and with various test-retest intervals ranging from 3 weeks to 16 months; of the 29 coefficients given, 14 are below .70, and all range from .53 to .87, with a median of .70. Inglehart (1985), on the other hand, looked at the results of a national sample, one assessed in 1968 and again in 1981. Because there were different subjects, it is not possible to compute correlation coefficients, but Inglehart (1985) reported that the stability of rankings over the 13-year period was phenomenal. The six highest- and six lowest-ranked values in 1968 were also the six highest- and six lowest-ranked values in 1981.
various analyses and comparisons of RVS rankings, including cross-cultural comparisons and analyses of such variables as race, socioeconomic status, educational level, and occupation. The RVS has also been used in hundreds of studies across a wide spectrum of topics, with most studies showing encouraging results that support the construct validity of this instrument. These studies range from comparisons of women who prefer Ivory as a washing machine detergent to studies of hippies (Rokeach, 1973). One area where the study of values has found substantial application is that of psychotherapy, where the values of patients and of therapists, and their concomitant changes, have been studied (e.g., Beutler, Arizmendi, Crago, et al., 1983; Beutler, Crago, & Arizmendi, 1986; Jensen & Bergin, 1988; Kelly, 1990).
Cross-cultural aspects. Rokeach (1973) believed
Table 63. Factor structure of the RVS Based on a sample of 1,409 respondents (Rokeach,
1973)
Example of item with
Factor
Positive loading
A comfortable life
Logical
Obedient
A world at peace
A world of beauty
Social recognition
Polite
Wisdom
Forgiving
Broadminded
True friendship
Family security
Mature Love
Courageous
to support the terminal-instrumental differentiation, although not everyone agrees (e.g., Crosby,
Bitner, & Gill, 1990; Feather & Peay, 1975; Heath
& Fogel, 1978; Vinson et al., 1977). Factor analyses suggest that the 36 values are not independent
of each other and that certain values do cluster
together. Rokeach (1973) suggests that there are
seven basic factors that cut across the terminalinstrumental distinction. These factors are indicated in Table 6.3. One question that can be asked
of the results of a factor analysis is how important each factor is. Different respondents give
different answers (ranks) to different values. This
variation of response can be called total variance. When we identify a factor, we can ask
how much of the total variance does that factor account for? For the RVS data reported in
Table 6.3, factor 1 accounts for only 8.2% of
the total variation, and in fact all seven factors
together account for only 40.8% of the total variation, leaving 59.2% of the variation unaccounted
for. This suggests that the factors are probably
not very powerful, either in predicting behavior
or in helping us to conceptualize values. Heath
and Fogel (1978) had subjects rate rather than
rank the importance of each of the 36 values. (In the factor analysis summarized in Table 6.3, the seven factors accounted for 8.2%, 7.8%, 5.5%, 5.4%, 5.0%, 4.9%, and 4.0% of the total variance, respectively.)
Table 6.4. Value medians and composite rank orders for American men and women (Rokeach, 1973)

Terminal value              Men (n = 665)    Women (n = 744)
A comfortable life          7.8  (4)         10.0 (13)
An exciting life            14.6 (18)        15.8 (18)
A sense of accomplishment   8.3  (7)         9.4  (10)
A world at peace            3.8  (1)         3.0  (1)
A world of beauty           13.6 (15)        13.5 (15)
Equality                    8.9  (9)         8.3  (8)
Family security             3.8  (2)         3.8  (2)
Freedom                     4.9  (3)         6.1  (3)
Happiness                   7.9  (5)         7.4  (5)
Inner harmony               11.1 (13)        9.8  (12)
Mature love                 12.6 (14)        12.3 (14)
National security           9.2  (10)        9.8  (11)
Pleasure                    14.1 (17)        15.0 (16)
Salvation                   9.9  (12)        7.3  (4)
Self-respect                8.2  (6)         7.4  (6)
Social recognition          13.8 (16)        15.0 (17)
True friendship             9.6  (11)        9.1  (9)
Wisdom                      8.5  (8)         7.7  (7)

Instrumental value          Men (n = 665)    Women (n = 744)
Ambitious                   5.6  (2)         7.4  (4)
Broadminded                 7.2  (4)         7.7  (5)
Capable                     8.9  (8)         10.1 (12)
Cheerful                    10.4 (12)        9.4  (10)
Clean                       9.4  (9)         8.1  (8)
Courageous                  7.5  (5)         8.1  (6)
Forgiving                   8.2  (6)         6.4  (2)
Helpful                     8.3  (7)         8.1  (7)
Honest                      3.4  (1)         3.2  (1)
Imaginative                 14.3 (18)        16.1 (18)
Independent                 10.2 (11)        10.7 (14)
Intellectual                12.8 (15)        13.2 (16)
Logical                     13.5 (16)        14.7 (17)
Loving                      10.9 (14)        8.6  (9)
Obedient                    13.5 (17)        13.1 (15)
Polite                      10.9 (13)        10.7 (13)
Responsible                 6.6  (3)         6.8  (3)
Self-controlled             9.7  (10)        9.5  (11)

Note: The figures shown are median rankings and, in parentheses, composite rank orders. The gender differences are based on median rankings; significant gender differences were indicated in the original table for 12 of the terminal and 7 of the instrumental values.
Adapted with the permission of The Free Press, a Division of Simon & Schuster, from The Nature of Human Values by Milton Rokeach. Copyright (c) 1973 by The Free Press.
For the two sets of rankings shown in the following table, the rank-order correlation is:

rho = 1 - 6(sum of D squared)/(N cubed - N) = 1 - 6(746)/5814 = 1 - 746/969 = 1 - .77 = +.23
Value             Your rank   Your fiancé's rank   D    D squared
Ambitious          2           1                    1      1
Broadminded        8           9                    1      1
Capable            4           2                    2      4
Cheerful          12           7                    5     25
Clean             15          10                    5     25
Courageous         5          16                   11    121
Forgiving          6          12                    6     36
Helpful            7          17                   10    100
Honest             1          11                   10    100
Imaginative       18          14                    4     16
Independent       11           3                    8     64
Intellectual      10          15                    5     25
Logical            9           4                    5     25
Loving            13           8                    5     25
Obedient          14          18                    4     16
Polite            16          13                    3      9
Responsible        3           6                    3      9
Self-controlled   17           5                   12    144

Sum of D squared = 746
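The computation in this worked example can be sketched in a few lines of Python; the two rank lists are those shown in the table:

```python
# Spearman rank-order correlation for the two sets of rankings:
# rho = 1 - (6 * sum of D^2) / (N^3 - N).

your_ranks   = [2, 8, 4, 12, 15, 5, 6, 7, 1, 18, 11, 10, 9, 13, 14, 16, 3, 17]
fiance_ranks = [1, 9, 2, 7, 10, 16, 12, 17, 11, 14, 3, 15, 4, 8, 18, 13, 6, 5]

# D^2 for each of the 18 instrumental values.
d_squared = [(a - b) ** 2 for a, b in zip(your_ranks, fiance_ranks)]
n = len(your_ranks)
rho = 1 - (6 * sum(d_squared)) / (n ** 3 - n)

print(sum(d_squared))    # 746
print(round(rho, 2))     # 0.23
```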
Strong come from? Originally, they were generated by Strong and others, and were basically the result of clinical insight. Subsequently, the items contained in the current Strong came from earlier editions and were selected on the basis of their psychometric properties (i.e., reliability and validity), as well as on their public relations aspects; that is, they would not offend, irritate, or embarrass a respondent. As in other forms of testing, items that yield variability of response, or response range, are the most useful. D. P. Campbell and J. C. Hansen (1981), for example, indicate that items such as "funeral director" and "geography" were eliminated because almost everyone indicates dislike for the former and like for the latter. An item such as "college professor," on the other hand, yields like responses ranging from about 5% in samples of farmers to 99% in samples of behavioral scientists.
Other criteria were also used in judging whether an item would be retained or eliminated. Both predictive and concurrent validity are important, and items showing these aspects were retained. The Strong should also have content validity, and so the items should cover a wide range of occupational content. Because sex-role bias was of particular concern, items were modified (policeman became police officer) or otherwise changed. Items that showed a significant gender difference in response were not necessarily eliminated, as the task is to understand such differences rather than to ignore them. Because the United States is such a conglomeration of minorities, and because the Strong might be useful in other cultures, items were retained if they were not culture bound, although the actual operational definition of this criterion might be a bit difficult to give. Other criteria, such as reading level, lack of ambiguity, and current terminology, were also used.
1. You need to collect an occupational sample, in this case, golf instructors. Perhaps you might identify potential respondents through some major sports organization, labor union, or other societies that might provide such a roster. Your potential respondents must, however, satisfy several criteria (in addition to filling out the Strong): they must be satisfied with their occupation, be between the ages of 25 and 60, have at least 3 years of experience in that occupation, and perform work that is typical of that occupation; for example, a golf instructor who spends his or her time primarily designing golf courses would be eliminated.
2. You also need a reference group, although ordinarily you would use the available data based on 300 men in general and 300 women in general. This sample has an average age of 38 years and represents a wide variety of occupations, half professional and half nonprofessional.
3. Once you've collected your data, you'll need to compare, for each of the 325 Strong items, the percent of like, indifferent, or dislike responses. The aim here is to identify 60 to 70 items that show a response difference of 16% or greater.
4. Now you can assign scoring weights to each of the 60 to 70 items. If the golf instructors endorsed like more often than the general sample, that item is scored +1; if the golf instructors endorsed dislike more often, then the item is scored -1 (for like). If there are substantial differences between the two samples on the indifferent response, then that response is also scored.
5. Now you can obtain the raw scores for each of your golf instructors, and compute your normative data, changing the raw scores to T scores.
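Steps 3 and 4 of this procedure can be sketched in code. The items and endorsement percentages below are invented for illustration; only the logic (select items with a 16% or greater difference, weight them +1 or -1, sum the weights for a respondent's like answers) follows the text.

```python
# Sketch of occupational-scale item selection and scoring (hypothetical data).
# Each entry: item -> (percent "like" among golf instructors,
#                      percent "like" in the general reference sample).
pct_like = {
    "outdoor work":    (80, 40),
    "teaching others": (70, 50),
    "office routine":  (20, 55),
    "reading novels":  (48, 44),   # difference below 16% -> item not used
}

# Step 3-4: keep items differing by 16% or more and assign scoring weights.
weights = {}
for item, (occ, ref) in pct_like.items():
    if abs(occ - ref) >= 16:
        weights[item] = 1 if occ > ref else -1   # +1 if "like" is more
                                                 # common among instructors

# Step 5 (start): a respondent's raw score is the sum of weights for the
# items he or she marks "like"; raw scores are then converted to T scores
# against the occupational sample's norms.
likes = {"outdoor work", "office routine"}
raw = sum(w for item, w in weights.items() if item in likes)
print(raw)    # 0  (+1 for "outdoor work", -1 for "office routine")
```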
Development. In more general terms then, the
Administrative Indices
General Occupational Themes
Basic Interest Scales
Occupational Scales
Special Scales
introverted with those of extroverted individuals, as defined by their scores on the MMPI scale of the same name. High scorers (introverts) prefer working with things or ideas, while low scorers (extroverts) prefer working with people.
Scores on the Strong are for the most part presented as T scores, with a mean of 50 and SD of 10.
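The T-score metric (mean 50, SD 10) is a simple linear transformation of raw scores; as a minimal sketch, with arbitrary raw-score norms:

```python
# T scores place raw scores on a scale with mean 50 and SD 10, so a raw
# score one SD above the raw-score mean becomes T = 60.

def t_score(raw, raw_mean, raw_sd):
    return 50 + 10 * (raw - raw_mean) / raw_sd

print(t_score(30, 25, 5))   # 60.0
```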
Interpretation of the profile. The resulting
not only contained separate scoring for occupations based on the respondent's gender, but the separate gender booklets were printed in blue for males and pink for females! Thus, women's career interests were compared with those of nurses, school teachers, secretaries, and other traditionally feminine occupations. Fortunately, current versions of the Strong have done away with such sexism, have in fact pioneered gender equality in various aspects of the test, and provide substantial career information for both genders, and one test booklet.
Holland's theory. The earlier versions of the Strong were guided primarily by empirical considerations, and occupational scales were developed because there was a need for such scales. As these scales proliferated, it became apparent that some organizing framework was needed to group subsets of scales together. Strong and others developed a number of such classifying schemas based on the intercorrelations of the occupational scales, on factor analysis, and on the identification of homogeneous clusters of items. In 1974, however, a number of changes were made, including the incorporation of Holland's (1966; 1973; 1985a) theoretical framework as a way of organizing the test results.
Holland believes that individuals find specific careers attractive because of their personalities and background variables; he postulated that all occupations could be conceptualized as representing one of six general occupational themes, labeled realistic, investigative, artistic, social, enterprising, and conventional.
Individuals whose career interests are high in
the realistic area are typically aggressive persons
who prefer concrete activities to abstract work.
They prefer occupations that involve working
outdoors and working with tools and objects
rather than with ideas or people. These individu-
SVIB was published, he administered the inventory to the senior class at Stanford University,
and 5 years later contacted them to determine
which occupations they had entered, and how
these occupations related to their scores on the
inventory.
The criterion then for studying the predictive validity of the Strong becomes the occupation that the person eventually enters. If someone becomes a physician and their Strong profile indicates a high score on the Physician scale, we then have a hit. The problem, however, is that the world is complex and individuals do not necessarily end up in the occupation for which they are best suited, or which they desire. As Strong (1935) argued, if final occupational choice is an imperfect criterion, then a test that is validated against such a criterion must also be imperfect. This of course is precisely the problem we discussed in Chapter 3; a test cannot be more valid than the criterion against which it is matched, and in the real world there are few, if any, such criteria. Nevertheless, a number of studies both by Strong (1955) and others (e.g., D. P. Campbell, 1971; Dolliver, Irwin, & Bigley, 1972) show substantial predictive validity for the Strong, with a typical hit rate (agreement between high score on an Occupational Scale and entrance into that occupation) of at least 50% for both men and women. There is of course something reassuring in the fact that the hit rates are not higher; for one thing, it means that specific occupations do attract people with different ideas and interests, and such variability keeps occupations vibrant and growing.
Faking. In most situations where the Strong is administered, there is little if any motivation to fake the results because the client is usually taking the inventory for his or her own enhancement. There may be occasions, however, when the Strong is administered as part of an application process; there may be potential for faking in the application for a specific occupation or perhaps entrance into a professional school.
Over the years, a number of investigators have looked at this topic, primarily by administering the Strong twice to a sample of subjects, first under standard instructions, and second with instructions to fake in a specific way, for example, to fake good to get higher scores on engineering (e.g., Garry, 1953; Wallace, 1950). The
FIGURE 6.3. Two distributions separated from each other by two standard deviations.
percent overlap, which is simply the percentage of scores in one sample that are matched by scores in the second sample. If the two distributions of scores are totally different and therefore don't overlap, the statistic is zero. If the two distributions are identical and completely overlap, then the statistic is 100%. If the intent of a scale is to distinguish between two groups, then clearly the lower the percentage overlap, the more efficient (valid) is the scale. Tilton called this statistic the Q index, and it is calculated as follows:
Q = (M1 - M2) / [(SD1 + SD2) / 2]
Once Q is computed, the percent overlap can be determined using Tilton's (1937) table. Essentially, the Q index is a measure of the number of standard deviation units that separate the two distributions. For example, a Q value of 2 represents two distributions that are separated from each other by two standard deviations, and have an overlap of about 32%. Figure 6.3 illustrates this. Note that if our occupational scale were an IQ test, the means of the two groups would differ by about 30 points, a rather substantial difference. The median percent overlap for the Strong occupational scales is in fact about 34%. This is of course a way of expressing concurrent validity. Scales that reflect well-defined occupations, such as physicist or chemist, have the lowest overlap, or highest validity. Scales that assess less well-defined occupations, such as that of college professor, have a higher degree of overlap and, therefore, lower concurrent validity.
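The Q index, and an approximate percent overlap for normally distributed scores, can be sketched as follows. The exact overlap values come from Tilton's table; the normal-curve approximation below assumes roughly equal SDs in the two groups.

```python
from statistics import NormalDist

def q_index(m1, sd1, m2, sd2):
    """Tilton's Q: separation of two distributions in SD units."""
    return (m1 - m2) / ((sd1 + sd2) / 2)

def percent_overlap(q):
    """Approximate overlap (%) of two normal distributions q SDs apart."""
    return 2 * NormalDist().cdf(-abs(q) / 2) * 100

# Two groups two standard deviations apart, e.g. IQ-style means of 115 and 85:
q = q_index(115, 15, 85, 15)
print(q)                           # 2.0
print(round(percent_overlap(q)))   # 32, i.e., about one third of the scores overlap
```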
D. P. Campbell and J. C. Hansen (1981) indicate that interest measurement is based on two empirical findings: (1) different people give different responses to the individual items; and (2) people who are satisfied with their particular occupation tend to respond to particular items in a characteristic way. Given these two statements, the item response distribution for a particular item charts the value of that item and its potential usefulness in the inventory. At the Center for Interest Measurement Research of the University of Minnesota, extensive data on the Strong are stored in computer archives, going back to the original samples tested by Strong. For example, D. P. Campbell and J. C. Hansen (1981) show the item response distribution for the item artist given by some 438 samples, each sample typically ranging from less than 100 to more than 1,000 individuals in a specific occupation. Both male and female artist samples tend to show near unanimity in their endorsement of like; at the other extreme, male farmers show an 11% like response, and females in life insurance sales a 32% like response.
for use with junior and senior high-school students in grades 6 through 12; and (3) the Kuder Occupational Interest Survey (KOIS), designed for grades 10 through adulthood. The first two yield scores in 10 general areas, namely: artistic, clerical, computational, literary, mechanical, musical, outdoor, persuasive, scientific, and social service. The third, the KOIS, yields substantially more information, and our discussion will focus primarily on this instrument. Once again, we use the term Kuder as a more generic designation, except where this would violate the meaning.
Development. Initially, the Strong and the
in the other scales. Items were then placed in triads, each triad reflecting three different scales. The respondent indicates which item is most preferred and which item is least preferred. Note that this results in an ipsative instrument: one cannot obtain all high scores. To the degree that one scale score is high, the other scale scores must be lower.
Let's assume we are developing a new occupational scale on the Kuder for limousine driver, and find that our sample of limousine drivers endorses the first triad as follows:
Item #   Most preferred   Least preferred
1        20%              70%
2        60%              15%
3        20%              15%
activities. Some of these inventories use a forced-choice format, while others ask the respondent to indicate how much they like each activity. Most of these have adequate reliability but leave much to be desired in the area of validity.
Interest inventories for nonprofessional occupations. Both the Strong and the Kuder have found their primary application with college-bound students and adults whose expectations are to enter professions. In fact, most career-interest inventories are designed for occupations that are entered by middle-class individuals. In large part, this reflects a reality of our culture that perhaps is changing. At least in the past, individuals from lower socioeconomic classes did not have much choice, and job selection was often a matter of availability and financial need. For upper socioeconomic class individuals, choice was similarly limited by family expectations and traditions, such as continuing a family business or family involvement in government service. A number of other interest inventories have been developed that are geared more for individuals entering nonprofessional occupations.
One example of such inventories is the Career Assessment Inventory (CAI; Johansson, 1986), first introduced in 1975 and subsequently revised several times, recently to include both nonprofessional and professional occupations. The CAI currently contains some 370 items similar in content to those of the Strong and takes about 40 to 45 minutes to complete. For each item, the client responds on a 5-point scale ranging from like very much to dislike very much. Like the Strong, the CAI contains the six general-theme scales that reflect Holland's typology, 25 Basic Interest scales (e.g., electronics, food service, athletics-sports), and 111 occupational scales (such as accountant, barber/hairstylist, carpenter, firefighter, interior designer, medical assistant, police officer, and truck driver). Although the CAI seems promising, in that it was well-designed psychometrically and shows adequate reliability, it too has been criticized, primarily for lack of evidence of validity (McCabe, 1985; Rounds, 1989).
Other career aspects. In addition to career interests, there are a number of questionnaires designed to assess a person's attitudes, competencies, and decision-making skills, all in relation to career choice. For example, the Career Decision Scale (Osipow, 1987) is an 18-item scale designed to assess career indecision in college students, and the Career Maturity Inventory (Crites, 1978) is designed to assess career-choice competencies (such as self-knowledge and awareness of one's interests) and attitudes (such as degree of independence and involvement in making career decisions).
Lack of theory. One criticism of the entire field
SUGGESTED READINGS
DISCUSSION QUESTIONS
Psychopathology
some issues of definition and nosology, and then we look at 11 different instruments, each selected for specific purposes. First, we look at two screening inventories, the SCL-90R and the PSI. Then we look at three multivariate instruments: two of these, the MMPI and the MCMI, are major instruments well known to most clinicians, and the third is new and unknown. Next, we look at an example of a test that focuses on a specific aspect of psychopathology, the schizoid personality disorder. Finally, we look at a measure of anxiety and three measures of depression used quite frequently by clinicians and researchers. More important than each specific test are the kinds of issues and construction methodologies they represent, especially in relation to the basic issues covered in Chapter 3.
INTRODUCTION
The Diagnostic and Statistical Manual (DSM).
Mental disorder. There are many phrases used for the area under consideration, such as mental illness, psychopathology, and so on. For our purposes, we will follow the DSM approach and define mental disorder as a psychological pattern associated with distress and/or impairment of functioning that reflects dysfunction in the person.
Psychiatric diagnosis. In the assessment of psy-
used for a variety of purposes in the area of psychopathology, their use often falls into one of two
164
each of the 90 items, the degree of distress experienced during the past 7 days, using a 5-point
scale (0 to 4) that ranges from not at all to
extremely. The SCL-90R can be scored for nine
symptom dimensions, and these are presented in
Table 7.1. The SCL-90R is primarily applicable to
adults, but can be used with adolescents, perhaps
even with 13- and 14-year-olds.
Administration. The SCL-90R contains clear directions and is relatively easy to administer, often by a nurse, technician, or research assistant. Most patients can complete the SCL-90R in 10 to 15 minutes. The items on the scale can also be read to the patient, in cases where trauma or other conditions do not permit standard administration. A 3 × 5 card with the response options is given to the patient, and the patient can indicate a response by pointing or raising an appropriate number of fingers.
Table 7.1. The nine symptom dimensions of the SCL-90R (number of items in parentheses)

Somatization (12)             e.g., Headaches; A lump in your throat
Obsessive-compulsive (10)
Interpersonal sensitivity (9)
Depression (13)
Anxiety (10)
Hostility (6)
Phobic anxiety (7)
Paranoid ideation (6)
Psychoticism (10)

Note: If you yourself are a bit obsessive-compulsive, you will note that the above items only add up to 83. There are in fact 7 additional items on the SCL-90 that are considered to reflect clinically important symptoms, such as poor appetite and trouble falling asleep, that are not subsumed under any of the 9 primary dimensions.
divided by 90; thus, this is a measure of intensity corrected for the number of symptoms.
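This sort of computation can be sketched as follows, using the commonly reported definitions of the SCL-90R global indices (GSI = total distress divided by 90; a positive-symptom count; PSDI = total distress divided by the number of endorsed items). The 90-item response vector below is invented for illustration.

```python
# Sketch of the SCL-90R global indices (commonly reported definitions).
# 'responses' is an invented vector of 90 ratings on the 0-4 scale.
responses = [0] * 60 + [1] * 20 + [2] * 8 + [4] * 2

assert len(responses) == 90
total = sum(responses)

gsi = total / 90                           # Global Severity Index
pst = sum(1 for r in responses if r > 0)   # positive-symptom count
psdi = total / pst                         # distress intensity corrected for
                                           # the number of symptoms endorsed

print(round(gsi, 2), pst, round(psdi, 2))
```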
Reliability. The test manual (Derogatis, 1977) provides internal consistency (coefficient alpha) and test-retest (1 week) information. Internal consistency coefficients range from .77 to .90, with most in the mid .80s. Test-retest coefficients range from .78 to .90, again with most coefficients in the mid .80s. Thus, from both aspects of reliability, the SCL-90R seems quite adequate.
Validity. The test manual discusses validity findings under three headings: concurrent, discriminative, and construct validity. Under concurrent validity are studies that compare SCL-90R scores with those obtained on other multivariate measures of psychopathology such as the MMPI. These results indicate that the SCL-90R scales correlate .40 to .60 with their counterparts on the MMPI, but the results are somewhat complex. For example, the Psychoticism Scale of the SCL-90R does correlate significantly with the MMPI Schizophrenia Scale (.64), but so do the SCL-90 Depression Scale (.55), the Obsessive-Compulsive Scale (.57), the Anxiety Scale (.51), and the Interpersonal Sensitivity Scale (.53)!
Under discriminative validity are studies where the SCL-90R is said to provide clinical discrimination or usefulness in dealing with different diagnostic groups. Thus, studies are included in this section that deal with oncology patients, drug addicts, dropouts from West Point Military Academy, and others. Finally, there is a section on construct validity that presents evidence from factor-analytic studies showing a relatively good match between the original dimensions of the SCL-90R and the results of various factor-analytic procedures. The SCL-90R has been particularly useful in studies of depression (e.g., Weissman, Sholomskas, Pottenger, et al., 1977).
viewed with respect to both reliability and validity. Couched in somewhat different language, it is also the question of factorial invariance: does the factor pattern remain constant across different groups? Such constancy, or lack of it, can be viewed as good or bad, depending on one's point of view. Having factorial invariance might indicate stability of dimensions across different groups; not having such invariance may in fact reflect the real world. For example, in psychiatric patients we might expect the various SCL-90R dimensions to be separate symptom entities, but in normal subjects it might not be surprising to have these dimensions collapse into a general adjustment type of factor. The SCL-90R manual suggests that the SCL-90R has such factorial invariance for gender, social class, and psychiatric diagnosis (see Cyr, Doxey, & Vigna, 1988).
Interpretation. The test manual indicates that interpretation of a protocol is best begun at the global level, with a study of the GSI, PSI, and PSDI as indicators of overall distress. The next step is to analyze the nine primary symptom dimensions that provide a broad-brush profile of the patient's psychodynamic status. Finally, an analysis of the individual items can provide information as to whether the patient is suicidal or homicidal, what phobic symptoms are present, and so on.
Norms. Once the raw scores are calculated for a protocol, the raw scores can be changed to T scores by consulting the appropriate normative table in the manual. Such norms are available for male (n = 490) and female (n = 484) normal or nonpatient subjects, for male (n = 424) and female (n = 577) psychiatric outpatients, and for adolescent outpatients (n = 112). In addition, mean profiles are given for some 20 clinical samples, ranging from psychiatric outpatients and alcoholics seeking treatment to patients with sexual dysfunctions.
through the use of templates, and the results easily plotted on a graph, separately by gender. The raw scores can be changed to T scores either by using the table in the manual or by plotting the scores on the profile sheet. On most multivariate instruments, like the MMPI for example, scores that are within two SDs from the mean (above 30 and below 70 in T scores) are considered normal. On the PSI, scores that deviate by only 1 SD from the mean are considered significant. Thus one might expect a greater than typical number of false positives. However, the PSI manual explicitly instructs the user on how to determine a cutoff score, based on local norms, that will maximize the number of hits.
Reliability. Test-retest reliability coefficients range from .66 to .95, while internal consistency coefficients range from .51 to .85. These coefficients are based on normal college students and, as Golding (1978) points out, may be significantly different in a clinical population. Certainly the magnitude of these coefficients suggests that while the PSI is adequate for research purposes and group comparisons, extreme caution should be used in making individual decisions.
Validity. The initial validity data are based on comparisons of two small psychiatric cross-validation groups (Ns of 23 and 27), a prison group (N = 80), and subjects (N = 45; presumably college students) given fake good and fake bad instructions. In each case, the mean scores on the scales seem to work as they should. For example, the Alienation means for the psychiatric patients are T scores ranging from 66 to 74 (remember that 50 is average), while Social Nonconformity means for the prison subjects are at 67 for both genders. Under instructions to fake good, the mean Defensiveness scores jump to 63 and 67, while under instructions to fake bad the mean scores go down to 31.
The test manual (Lanyon, 1973; 1978) indicates considerable convergent and discriminant validity by presenting correlations of the PSI scales with those of inventories such as the MMPI. Other studies are presented using contrasted groups. For example, in one study of 48 psychiatric patients and 305 normal males, a cutoff score of 60 on the Alienation Scale achieved an 86% overall hit rate, with 17% of the psychiatric cases misidentified as false negatives and 14% of the normal individuals misidentified as false positives. Incidentally, the criterion for being normal is a difficult one; ordinarily it is defined either by self-report or by the absence of some criteria, such as that the person has not been hospitalized psychiatrically or has not sought psychological help. These operational definitions do not guarantee that subjects identified as normal are indeed normal.
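The hit-rate arithmetic in the contrasted-groups study above checks out:

```python
# 48 psychiatric patients (17% misidentified as false negatives) and
# 305 normal males (14% misidentified as false positives).
patients, normals = 48, 305
hits = patients * (1 - 0.17) + normals * (1 - 0.14)
hit_rate = hits / (patients + normals)
print(round(hit_rate * 100))   # 86, the overall hit rate reported
```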
A number of studies can be found in the literature that also support the validity of the PSI. One example is that of Kantor, Walker, and Hays (1976), who administered the PSI to two samples of adolescents: 1,123 13- to 16-year-old students in a school setting and 105 students of the same age who were on juvenile probation. Students in the first sample were also asked to answer anonymously a questionnaire that contained three critical questions: (1) Did you run away from home last year? (2) Have you been in trouble with the police? (3) Have you stolen anything worth more than $2? A subgroup was then identified of youngsters who answered one or more of the critical questions in the keyed direction. The PSI scales did not differentiate the subgroup from the larger normal group, but the
that represented the normative sample of normal adults, ages 16 to 60. The results yielded five factors. Factor I represents a dimension of serious psychopathology and contains items from 4 PSI scales. Factor II represents an extraversion-introversion dimension, with most of the items coming from the Expression Scale. The third factor seems to be an acting-out dimension, although it is made up of few items and does not seem to be a robust dimension. Factor IV represents the Protestant Ethic, defined as diligence in work and an attitude of responsibility; it too is made up of items from four of the PSI scales. Finally, Factor V is a general neuroticism factor, made up primarily of items from the Discomfort Scale. Note then, that two of the factors (extraversion and general neuroticism) parallel the two PSI scales that were developed on the basis of factor analysis. Two other factors (serious psychopathology and acting out) show some congruence to the Alienation and Social Nonconformity Scales, but the parallel is not very high.
Overall (1974) administered the PSI to 126 new patients at a psychiatric outpatient clinic and compared the results to those obtained on a sample of 800 normal subjects, originally collected as norms by the author of the PSI. What makes this study interesting and worth reporting here is that Overall (1974) scored each PSI protocol on the standard five scales, and then rescored each protocol on a set of five scales that resulted from a factor analysis (presumably those obtained in the Lanyon, J. H. Johnson, and Overall 1974 study cited above). Note that a factor analysis can yield the same number of dimensions as originally postulated by clinical insight, but the test items defining (or loading on) each factor dimension may not be the exact ones found on the original scales. Overall (1974) computed a discriminant function (very much like a regression equation), which was as follows:
Y = .417AL + .034Sn + .244Di + .046Ex + .307De
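This function can be sketched directly in code; the five scale scores below are invented T scores, used only to show the computation.

```python
# Overall's (1974) discriminant function over the five PSI scale scores:
# Alienation (AL), Social Nonconformity (Sn), Discomfort (Di),
# Expression (Ex), and Defensiveness (De).

def discriminant_y(al, sn, di, ex, de):
    return 0.417 * al + 0.034 * sn + 0.244 * di + 0.046 * ex + 0.307 * de

# A hypothetical patient's T scores on the five scales:
y = discriminant_y(al=70, sn=60, di=65, ex=50, de=45)
print(round(y, 1))   # a single composite value used for classification
```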
For each person then, we would take their scores
on the PSI, plug the values in the above equation,
do the appropriate calculations, and compute Y.
In this case, Y would be a number that would
predict whether the individual is normal or a
that the use of the PSI can raise serious ethical-legal issues. If the PSI is used with patients who are seeking treatment, and the resulting scores are used for potential assignment to different treatments, there seems to be little problem, other than to ensure that whatever decisions are taken are made on the basis of all available data. But if the individual is unwilling to seek treatment and is nevertheless screened, for example in a military setting or by a university that routinely administers this to all incoming freshmen, and is then identified as a potential misfit, serious ethical-legal issues arise. In fact, Bruch (1977) administered the PSI to all incoming freshmen over a 2-year period, as part of the regular freshman orientation program. Of the 1,815 students who completed the PSI, 377 were eventually seen in the Counseling Center. An analysis of the PSI scores indicated that students who became Counseling Center clients obtained higher scores on the Alienation, Social Nonconformity, and Discomfort Scales, and lower scores on the Defensiveness Scale.
number of grounds, including high intercorrelations among its scales and high correlations with measures of social desirability (Golding, 1978). Yet Pulliam (1975), for example, found support for the idea that the PSI scales are not strongly affected by social desirability. One study that used the PSI with adolescents in a juvenile-court-related agency found the PSI to be of little practical use and with poor discriminant validity (Feazell, Quay, & Murray, 1991). For a review of the PSI, see Streiner (1985) and Vieweg and Hedlund (1984).
THE MINNESOTA MULTIPHASIC
PERSONALITY INVENTORY (MMPI)
AND MMPI-2
The MMPI
Introduction. The MMPI was first published in 1943. Its clinical scales were developed empirically, by pooling together those items that differentiated significantly between a specific diagnostic group, normal subjects, and other diagnostic
groups. Thus, items for the Schizophrenia Scale included those for which the response rate of schizophrenics differed from that of normals and that of other diagnostic groups. Scales were then cross-validated by administering the scale to new samples, both normal subjects and psychiatric patients, and determining whether the total score on the scale statistically differentiated the various samples.
Thus eight clinical scales, each addressed to a particular psychiatric diagnosis, were developed. Later, two additional scales were developed that became incorporated into the standard MMPI profile. These were: (1) a Masculinity-Femininity (Mf) Scale, originally designed to distinguish between homosexual and heterosexual males, but composed primarily of items showing a gender difference; and (2) a Social Introversion (Si) Scale, composed of items whose response rate differed in a group of college women who participated in many extracurricular activities vs. a group who participated in few if any such activities. This empirical approach to scale construction resulted in many items that were subtle; that is, their manifest content is not directly related to the psychopathological dimension they are presumed to assess. This distinction between subtle and obvious items has become a major research focus; it has in fact been argued that such subtle items reduce the validity of the MMPI scales (Hollrah, Schlottmann, Scott, et al., 1995).
In addition to these 10 clinical scales, four other scales, called validity scales, were also developed. The purpose of these scales was to detect deviant test-taking attitudes. One scale is the Cannot Say Scale, which is simply the total number of items that are omitted (or, in rare cases, answered as both true and false). Obviously, if many items are omitted, the scores on the rest of the scales will tend to be lower. A second validity scale is the Lie Scale, designed to assess faking good. The scale is composed of 15 items that most people, if they are honest, would not endorse, for example, "I read every editorial in the newspaper every day." These items are face valid and in fact were rationally derived.
A third validity scale, the F Scale, is composed of 60 items that fewer than 10% of the
normal samples endorsed in a particular direction. These items cover a variety of content; a factor analysis of the original F Scale indicated some 19 content dimensions, such as poor physical health, hostility, and paranoid thinking (Comrey, 1958).
Finally, the fourth validity scale is called the K Scale and was designed to identify clinical defensiveness. The authors of this scale (Meehl & Hathaway, 1946) noticed that for some psychiatric patients, their MMPI profile was not as deviant as one might expect. They therefore selected for this scale 30 MMPI items that differentiated between a group of psychiatric patients whose MMPI profiles were normal (contrary to expectations) and a group of normal subjects whose MMPI profiles were also normal, as expected. A high K score was assumed to reflect defensiveness and hence lower the deviancy of the resulting MMPI profile. Therefore, the authors reasoned that the K score could be used as a correction factor to maximize the predictive validity of the other scales. Various statistical analyses indicated that this was the case, at least for 5 of the 10 scales, and so these scales are plotted on the profile sheet by adding to the raw score a specified proportion of the K raw score.
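The K-correction amounts to simple arithmetic on the profile sheet. The particular fractions below are the ones conventionally used for the five corrected scales; they are not given in this chapter, so treat them as an illustrative assumption (a Python sketch):

```python
# Sketch of the K-correction: 5 of the 10 clinical scales receive a
# specified proportion of the K raw score. The fractions here are the
# conventional profile-sheet values, not taken from this chapter.
K_FRACTIONS = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}

def k_corrected(scale, raw_score, k_raw):
    """Return the raw score plus the scale's proportion of K (0 if none)."""
    return raw_score + K_FRACTIONS.get(scale, 0.0) * k_raw

print(k_corrected("Sc", 18, 14))  # receives the full K raw score
print(k_corrected("D", 22, 14))   # D receives no K correction
```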
Original aim. The original aim of the MMPI was to provide differential psychiatric diagnosis. In medicine, a diagnosis
is usually followed by specific therapeutic procedures. But in psychology, this medical model is limited and misleading. Although a diagnosis is a shorthand for a particular constellation of symptoms, what is important is the etiology of the disorder and the resulting therapeutic regime; that is, how did the client get to be the way he or she is, and what can be done to change that? In psychopathology, the etiology is often complex, multidetermined, and open to different arguments, and the available therapies are often general and not target-specific.
Because, in fact, reliable differences in MMPI scores were obtained between individuals who differed in important ways, the focus shifted to what these differences meant. It became more important to understand the psychodynamic functioning of a client and that client's strengths and difficulties. The diagnostic names of the MMPI scales became less important, and in fact they were more or less replaced by a numbering system. As J. R. Graham (1993) states, each MMPI scale became an unknown to be studied and explored. Thousands of studies have now been carried out on the MMPI, and the clinician can use this wealth of data to develop a psychological portrait of the client and to generate hypotheses about the dynamic functioning of that client (see Caldwell, 2001).
Numerical designation. The 10 clinical scales are numbered 1 to 10 as follows:

1. Hypochondriasis
2. Depression
3. Hysteria
4. Psychopathic Deviate
5. Masculinity-Femininity
6. Paranoia
7. Psychasthenia
8. Schizophrenia
9. Hypomania
10. Social Introversion
As with the original MMPI, the MMPI-2 has found extensive applications in many cultures, not only in Europe and Latin America, but also in Asia and the Middle East (see Butcher, 1996).
The MMPI became the most widely used personality test in the United States and possibly in the world (Lubin, Larsen, & Matarazzo, 1984), but it has not been free of criticism. One major concern was that the original standardization sample, 724 persons who were visitors to the University of Minnesota hospital (often to visit a relative hospitalized with a psychiatric diagnosis), was not representative of the U.S. general population.
In 1989, a revised edition, the MMPI-2, was published (Butcher, Dahlstrom, Graham, et al., 1989). This revision included a rewrite of many of the original items to eliminate wording that had become obsolete, sexist language, and items that were considered inappropriate or potentially offensive. Of the original 550 items found in the 1943 MMPI, 82 were rewritten, even though most of the changes were slight. Another 154 new items were added to adequately assess such aspects as drug abuse, suicide potential, and marital adjustment. An additional change was to obtain a normative sample that was more truly representative of the U.S. population. Potential subjects were solicited in a wide range of geographical locations, using 1980 Census data. The final sample consisted of 2,600 community subjects, 1,138 males and 1,462 females, including 841 couples, and representatives of minority groups. Other groups were also tested, including psychiatric patients, college students, and clients in marital counseling. To assess test-retest reliability, 111 female and 82 male subjects were retested about a week later.
At the same time that the adult version of the MMPI was being revised, a separate form for adolescents was also pilot tested, and a normative sample of adolescents was also assessed. This effort, however, has been kept separate.
The resulting MMPI-2 includes 567 items and in many ways is not drastically different from its original. In fact, considerable effort was made to ensure continuity between the 1943 and the 1989 versions. Thus most of the research findings and clinical insights pertinent to the MMPI are
still quite applicable to the MMPI-2.
The MMPI-2 contains three new scales, all of which are validity scales rather than clinical scales. One is the
Backpage Infrequency Scale (Fb), which is similar to the F Scale, but is made up of 40 items that occur later in the test booklet. The intent here is to assess whether the client begins to answer items randomly somewhere after the beginning of the test. A second scale is the Variable Response Inconsistency Scale (VRIN). This scale consists of 67 pairs of items that have either similar or opposite content, and the scoring of this scale reflects the number of item pairs that are answered inconsistently. A third scale is the True Response Inconsistency Scale (TRIN), which consists of 23 pairs of items that are opposite in content, and the scoring, which is somewhat convoluted (see J. R. Graham, 1993), reflects the tendency to respond true or false indiscriminately.
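The logic of these inconsistency scales can be sketched in a few lines. The item pairs and responses below are invented; the real VRIN and TRIN pairs, and their keyed directions, come from the test manual:

```python
# Hypothetical sketch of inconsistency scoring in the spirit of VRIN/TRIN.
def count_inconsistent(answers, pairs, same_content=True):
    """Count item pairs answered inconsistently.

    answers: dict item_number -> True/False response
    pairs: list of (item_a, item_b) tuples
    same_content: if True, a pair is inconsistent when answered differently;
    if False (opposite-content pairs), inconsistent when answered the same.
    """
    count = 0
    for a, b in pairs:
        differ = answers[a] != answers[b]
        if differ == same_content:
            count += 1
    return count

answers = {1: True, 2: False, 3: True, 4: True}
# Similar-content pairs: (1, 2) is answered inconsistently, (3, 4) is not.
print(count_inconsistent(answers, [(1, 2), (3, 4)], same_content=True))
# Opposite-content pair answered the same way counts as inconsistent.
print(count_inconsistent(answers, [(3, 4)], same_content=False))
```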
Administration. The MMPI-2 is easily administered and scored; it can be scored by hand using templates or by computer. It is, however, a highly sophisticated psychological procedure, and interpretation of the results requires a well-trained professional.
The MMPI-2 is appropriate for adolescents as young as age 13, but is primarily for adults. Understanding of the items requires a minimum eighth-grade reading level, and completing the test requires some 60 to 90 minutes on the average. Originally, the MMPI items were printed individually on cards that the client then sorted; subsequently, items were printed in a reusable booklet with a separate answer sheet, and that is the format currently used. The 1943 MMPI was also available in a form with a hard back that could be used as a temporary writing surface. The MMPI-2 is available for administration on a personal computer, as well as in a tape-recorded version for subjects who are visually handicapped. The MMPI and MMPI-2 have been translated into many languages.
There is a shortened version of the MMPI-2, in that in order to score the standard scales only the first 370 items need to be answered.
1. Hypochondriasis (Hs). Designed to measure preoccupation with one's body (somatic concerns) and fear of illness. High scores reflect denial of good health, feelings of chronic fatigue, lack of energy, and sleep disturbances. High scorers are often complainers, self-centered, and cynical.
2. Depression (D). Designed to measure depression. High scorers feel depressed and lack hope in the future. They may also be irritable and high-strung, have somatic complaints, lack self-confidence, and show withdrawal from social and interpersonal activities.
3. Hysteria (Hy). This scale attempts to identify individuals who react to stress and responsibility by developing physical symptoms. High scorers are usually psychologically immature and self-centered. They are often interpersonally oriented but are motivated by the affection and attention they get from others, rather than a genuine interest in other people.
4. Psychopathic Deviate (Pd) (50). The Pd type of person is characterized by asocial or amoral behavior such as excessive drinking, sexual promiscuity, stealing, drug use, etc. High scorers have difficulty in incorporating the values of society and rebel toward authorities, including family members, teachers, and work supervisors. They often are impatient, impulsive, and poor planners.
5. Masculinity-Femininity (Mf) (56). Scores on this scale are related to intelligence, education, and socioeconomic status, and this scale, more than any other, seems to reflect personality and interests rather than psychopathology. Males who score high (in the feminine direction) may have problems of sexual identity or a more androgynous orientation. Females who score high (in the masculine direction) reject the traditional female role and have interests that in our culture are seen as more masculine.
6. Paranoia (Pa). Paranoia is marked by feelings of persecution, suspiciousness, grandiosity, and other evidence of disturbed thinking. In addition to these characteristics, high scorers may also be suspicious, hostile, and overly sensitive.
7. Psychasthenia (Pt). Psychasthenic (a term no longer used) individuals are characterized by excessive doubts, psychological turmoil, and obsessive-compulsive aspects. High scorers are typically anxious and agitated individuals who worry a great deal and have difficulties in concentrating. They are orderly and organized but tend to be meticulous and overreactive.
8. Schizophrenia (Sc). Schizophrenia is characterized by disturbances of thinking, mood, and behavior. High scorers, in addition to the psychotic symptoms found in schizophrenia, tend to report unusual thoughts, may show extremely poor judgment, and engage in bizarre behavior.
9. Hypomania (Ma). Hypomania is characterized by flight of ideas, accelerated motor activity and speech, and elevated mood. High scorers tend to exhibit an outward picture of confidence and poise, and are typically seen as sociable and outgoing. Underneath their facade, there are feelings of anxiousness and nervousness, and their interpersonal relations are usually quite superficial.
10. Social Introversion (Si). High scorers are socially introverted and tend to feel uncomfortable and insecure in social situations. They tend to be shy, reserved, and lacking in self-confidence.

Note: Most of the above is based on Graham (1990) and Dahlstrom, Welsh, and Dahlstrom (1972).
The validity of the MMPI is a very complex matter, not only because we are dealing with an entire set of scales rather than just one, but also because there are issues about the validity of configural patterns, of interpretations derived from the entire MMPI profile, of differential results with varying samples, and of the interplay of such aspects as base rates, gender, educational levels of the subjects, characteristics of the clinicians, and so on.
J. R. Graham (1993) indicates that validity studies of the MMPI fall into three general categories. The first are studies that have compared the MMPI profiles of relevant criterion groups. Most of these studies have found significant differences on one or more of the MMPI scales among groups that differed on diagnostic status or other criteria. A second category of studies tries to identify reliable nontest behavioral correlates of MMPI scales or configurations. The results of these studies suggest that there are such reliable correlates, but their generalizability is sometimes in question; i.e., the findings may be applicable to one type of sample, such as alcoholics, but not to another, such as adolescents who are suicidal. A third category of studies looks at the MMPI results and at the clinician who interprets those results as one unit, and focuses then on the accuracy of the interpretations. Here the studies are not as supportive of the validity of MMPI-based inferences, but the area is a problematic and convoluted one (see Garb, 1984; L. R. Goldberg, 1968).
Racial differences. There is a substantial body of literature on the topic of racial differences on the MMPI, but the results are by no means unanimous, and there is considerable disagreement as to the implications of the findings.
A number of studies have found differences between black and white subjects on some MMPI scales, with blacks tending to score higher than whites on scales F, 8, and 9, but the differences are small and, although statistically significant, may not be of clinical significance (W. G. Dahlstrom, Lachar, & L. E. Dahlstrom, 1986; Pritchard & Rosenblatt, 1980). Similar differences have been reported on the MMPI-2, but although
statistically significant, they are less than 5 T-score points and therefore not really clinically meaningful (Timbrook & Graham, 1994).
R. L. Greene (1987) reviewed 10 studies that compared Hispanic and white subjects on the MMPI. The differences seem to be even smaller than those between blacks and whites, and R. L. Greene (1987) concluded that there was no pattern to the obtained differences. R. L. Greene (1987) also reviewed seven studies that compared American Indians and whites, and three studies that compared Asian-American and white subjects. Here also there were few differences and no discernible pattern. Hall, Bansal, and Lopez (1999) did a meta-analytic review and concluded that the MMPI and MMPI-2 do not unfairly portray African Americans and Latinos as pathological. The issue is by no means a closed one, and the best that can be said for now is that great caution is needed when interpreting MMPI profiles of nonwhites.
MMPI manuals. There is a veritable flood of
to diagnose the specific nature of such impairment. Thus the MMPI clinical scales might have worked better diagnostically had they been developed to discriminate specific diagnostic groups from each other.
Usefulness of the MMPI. There are at least two
Development. In general terms, the develop-
with each other. Rather, the intent was to identify items that statistically correlated at least .30 with the total score on the scale they belonged to, as defined by the initial clinical judgment and theoretical stance. In fact, the median correlation of items that were retained was about .58. Items that showed extreme endorsement frequencies, less than 15% or greater than 85%, were eliminated (you recall from Chapter 2 that such items are not very useful from a psychometric point of view). These and additional screening steps resulted in a 289-item research form that included both true- and false-keyed responses, and items that were scored on multiple scales.
In the third stage, two major studies were carried out. In the first study, 200 experienced clinicians administered the experimental form of the MCMI to as many of their patients as feasible (a total of 682 patients), and rated each patient, without recourse to the MCMI responses, on a series of comprehensive and standard clinical descriptions that paralleled the 20 MCMI dimensions. An item analysis was then undertaken to determine whether each item correlated the highest with its corresponding diagnostic category. This resulted in 150 items being retained, each item apparently having an average scale overlap of about 4; that is, each item is scored on, or belongs to, about 4 scales on average, although on some scales the keyed response is true and on some scales the keyed response for the same item is false. Note, however, that this overlap of items, which occurs on such tests as the MMPI, is here not a function of mere correlation but is dictated by theoretical expectations.
The results of this first study indicated that three scales were not particularly useful, and so the three scales (hypochondriasis, obsession-compulsion, and sociopathy) were replaced by three new scales (hypomanic, alcohol abuse, and drug abuse). This meant that a new set of items was developed and added to the already available MCMI items, and most of the steps outlined above were repeated. This finally yielded 175 items, with 20 scales ranging in length from 16 items (Psychotic Delusion) to 47 items (Hypomanic).
Parallel with DSM. One advantage of the MCMI is its close parallel with the DSM. This is no accident, because Millon has played a substantial role in some of the work that resulted in the DSM.
Millon's theory. Millon's theory about disorders of the personality is deceptively simple and based on two dimensions. The first dimension involves positive or negative reinforcement, that is, gaining satisfaction vs. avoiding psychological discomfort. Patients who experience few satisfactions in life are detached types; those who evaluate satisfaction in terms of the reactions of others are dependent types. Where satisfaction is evaluated primarily by one's own values, with disregard for others, we have an independent type, and those who experience conflict between their values and those of others are ambivalent personalities. The second dimension has to do with coping, with maximizing satisfaction and minimizing discomfort. Some individuals are active, and manipulate or arrange events to achieve their goals; others are passive, and cope by being apathetic, resigned, or simply passive. The four patterns of reinforcement and the two patterns of coping result in eight basic personality styles: active detached, passive detached, active independent, and so on. These eight styles are of course assessed by each of the eight basic personality scales of the MCMI. Table 7.3 illustrates the parallel.
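The 4 x 2 structure of the theory, four reinforcement patterns crossed with two coping patterns, can be made concrete in a couple of lines; this is just a sketch of the combinatorics, not MCMI scoring:

```python
# Millon's two dimensions: four sources of reinforcement and two coping
# patterns, which cross to yield the eight basic personality styles.
from itertools import product

reinforcement = ["detached", "dependent", "independent", "ambivalent"]
coping = ["active", "passive"]

styles = [f"{c} {r}" for r, c in product(reinforcement, coping)]
print(len(styles))  # the eight basic styles
```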
Millon believes that such patterns or styles are deeply ingrained and that a patient is often unaware of the presence of such patterns and their maladaptiveness. If the maladjustment continues, the basic maladaptive personality pattern becomes more extreme, as reflected by the three personality disorder scales S, C, and P. Distortions of the basic personality patterns can also result in clinical-syndrome disorders, but these are by their very nature transient and depend upon the amount of stress present. Scales 12 to 20 assess these disorders, with scales 12 through 17 assessing those of moderate severity, and scales 18 through 20 assessing the more severe disorders. Although there is also a parallel between the eight basic personality types and the clinical-syndrome disorders, the correspondence is more complex and is not one-to-one. For example, neurotic depression, or what Millon (1987) calls dysthymia (scale 15), occurs more commonly among avoidant, dependent, and passive-aggressive personalities. Note that such a theory
Personality style       MCMI scale           Can become:
Passive detached        Schizoid             Schizotypal
Active detached         Avoidant             Schizotypal
Passive dependent       Dependent            Borderline
Active dependent        Histrionic           Borderline
Passive independent     Narcissistic         Paranoid
Active independent      Antisocial           Paranoid
Passive ambivalent      Compulsive           Borderline &/or Paranoid
Active ambivalent       Passive aggressive   Borderline &/or Paranoid
The MCMI is usually administered individually, but could be used in a group setting. As the manual indicates, the briefness of the test and its easy administration by an office nurse, secretary, or other personnel make it a convenient instrument. The instructions are clear and largely self-explanatory.
Hand-scoring templates are not available, so the user is required to use the commercially available scoring services. Although this
may seem to be driven by economic motives, and probably is, the manual argues that hand scoring so many scales leads to errors of scoring and, even more important, that as additional research data are obtained, refinements in scoring and in normative equivalence can be easily introduced into the computer-scoring procedure, but not so easily into outdated templates. The manual does include a description of the item composition of each scale, so a template for each scale could be constructed. Computer scoring services are available from the test publisher, including a computer-generated narrative report.
Coding system. As with the MMPI, there is a profile coding system that uses a shorthand notation to classify a particular profile, by listing the basic personality scales (1-8), the pathological personality disorder scales (S, C, P), the moderate clinical syndrome scales (A, H, N, D, B, T), and the severe clinical syndrome scales (SS, CC, PP), in order of elevation within each of these four sections.
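A minimal sketch of such a coding scheme, with invented elevations (the actual coding rules, including cutoffs and tie-breaking, are specified in the MCMI manual):

```python
# Order scale labels within one section of the profile code, highest
# elevation first. Scores here are invented for illustration.
def code_section(scores):
    """scores: dict mapping scale label -> elevation."""
    return "".join(sorted(scores, key=scores.get, reverse=True))

pathological = {"S": 60, "C": 85, "P": 70}  # the S, C, P section
print(code_section(pathological))
```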
Decision theory. We discussed in Chapter 3 the notions of hits and errors, including false positives and false negatives. The MCMI incorporates these into the scale guidelines, and its manual explicitly gives such information. For example, for scale 1, the Schizoid Scale, the base rate in the patient sample was .11 (i.e., 11% of the patients were judged to exhibit schizoid symptoms). Eighty-eight percent of patients who were diagnosed as schizoid, in fact, scored on the MCMI above the cutoff line on that scale. Five percent of those scoring above the cutoff line were incorrectly classified; that is, their diagnosis
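The hit-rate figures in this example can be turned into concrete counts. A hedged sketch: it assumes the 88% figure is the scale's sensitivity and the 5% figure is the share of false positives among all those above the cutoff, for a hypothetical sample of 1,000 patients:

```python
# Reconstruct classification counts from the base rate (.11), sensitivity
# (.88), and the proportion of positives that are false (.05).
def classification_counts(n, base_rate, sensitivity, false_positive_share):
    cases = n * base_rate                  # patients with the diagnosis
    true_pos = cases * sensitivity         # diagnosed cases above the cutoff
    above = true_pos / (1 - false_positive_share)  # everyone above the cutoff
    false_pos = above - true_pos
    return round(true_pos), round(false_pos)

print(classification_counts(1000, 0.11, 0.88, 0.05))
```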
A. Basic Personality Patterns. These reflect everyday ways of functioning that characterize patients. They are relatively enduring and pervasive traits.
1. Schizoid
2. Avoidant
3. Dependent
4. Histrionic
5. Narcissistic
6a. Antisocial
6b. Aggressive (Sadistic)
7. Compulsive
8a. Passive-Aggressive
8b. Self-Defeating
B. Pathological Personality Disorders. These scales describe patients with chronic severe pathology.
9. Schizotypal (S)
10. Borderline (C)
11. Paranoid (P) (44)
C. Clinical Symptom Syndromes. These nine scales represent symptom disorders, usually of briefer duration than the personality disorders, and often precipitated by external events.
12. Anxiety (A) (25)
13. Somatoform (H) (31)
14. Bipolar-manic (N) (37)
15. Dysthymia (D) (36)
16. Alcohol dependence (B) (46)
17. Drug dependence (T) (58)
18. Thought disorder (SS) (33)
19. Major depression (CC) (24)
20. Delusional disorder (PP) (23)
D. Validity Scales
21. Weight factor (or disclosure level). This is not really a scale as such, but a score adjustment applied under specific circumstances. It is designed to moderate the effects of either excessive defensiveness or excessive emotional complaining (i.e., fake-good and fake-bad response sets).
22. Validity index. Designed to identify patients who did not cooperate or did not answer relevantly because they were too disturbed. The scale is composed of 4 items that are endorsed by fewer than 1 out of 100 clinical patients. Despite its brevity, the scale seems to work as intended.
23. Desirability gauge. The degree to which the respondent places him/herself in a favorable light (i.e., fake good).
24. Debasement measure. The degree to which the person depreciates or devalues him/herself (i.e., fake bad).
values in the .80 to .85 range. For the second sample, the coefficients range from .61 to .85, with a median of about .77. Because all the patients were involved in psychotherapy programs, we would expect the 5-week reliabilities to be lower. We would also expect, and the results support this, the personality pattern scales to be highest in reliability, followed by the pathological personality scales, with the clinical syndrome scales least reliable (because they are the most changeable and transient).
Internal consistency reliability (KR-20) was also assessed in two samples totaling almost 1,000 patients. These coefficients range from .58 to .95, with a median of .88; only one scale, the 16-item PP scale, which is the shortest scale, has a KR-20 reliability of less than .70.
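KR-20, the internal consistency index reported here, can be computed directly for dichotomous items; the small data matrix below is invented for illustration:

```python
# Kuder-Richardson formula 20 for 0/1 item scores:
# KR20 = (k / (k - 1)) * (1 - sum(p*q) / variance(total scores))
def kr20(item_matrix):
    """item_matrix: one row of 0/1 item scores per respondent."""
    k = len(item_matrix[0])                    # number of items
    n = len(item_matrix)                       # number of respondents
    p = [sum(row[i] for row in item_matrix) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)        # sum of item variances
    totals = [sum(row) for row in item_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - pq / var_t)

data = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
print(round(kr20(data), 3))
```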
Validity. A number of authors, such as Loevinger (1957) and Jackson (1970), have argued that validation should not simply occur at the end of a test's development, but should be incorporated into all phases of test construction. That seems to be clearly the case with the MCMI; as we have seen above, its development incorporated three distinct validational stages. We can also ask, in a more traditional manner, about the validity of the resulting scales. The MCMI manual presents correlations of the MCMI scales with scales from other multivariate instruments, namely the MMPI, the Psychological Screening Inventory, and the SCL-90. It is not easy to summarize such a large matrix of correlations, but in general the pattern of correlations for each MCMI scale supports its general validity, and the specific significant correlations seem to be in line with both theoretical expectations and empirically observed clinical syndromes. (For a comparison of the MCMI-II and the MMPI, see McCann, 1991.)
Norms. Norms on the MCMI are based on a sample of 297 normal subjects ranging in age from 18 to 62, and 1,591 clinical patients ranging in age from 18 to 66. These patients came from more than 100 hospitals and outpatient centers, as well as from private psychotherapists in the United States and Great Britain. These samples are basically samples of convenience, chosen for their availability, but are also reflective of diversity in age, gender, educational level, and socioeconomic status.
results of two factor analyses, one done on a general psychiatric sample (N = 744) and one on a substance-abuse sample (N = 206). For the general psychiatric sample, the factor analysis suggested four factors, with the first three accounting for 85% of the variance. These factors are rather complex. For example, 13 of the 20 scales load significantly on the first factor, which is described as depressive and labile emotionality expressed in affective moodiness and neurotic complaints (Millon, 1987). In fact, the first three factors parallel a classical distinction found in the abnormal psychology literature among affective disorders, paranoid disorders, and schizophrenic disorders.
The results of the factor analysis of the substance-abuse patients also yielded a four-factor solution, but the pattern here is somewhat
develop an item pool that reflected DSM criteria and, in this case, reflected a particular theory of interpersonal behavior (L. S. Benjamin, 1993). One interesting approach used here was that the items were worded from the perspective of the respondent relating to others. For example, rather than having an item that says, "People say I am cold and aloof," the authors wrote, "When I have feelings I keep them to myself because others might use them against me." A total of 360 items were generated that covered the 11 personality-disorder categories, social desirability, and some other relevant dimensions. The authors do not indicate the procedures used to eliminate items, and indeed the impression one gets is that all items were retained. Respondents are asked to answer each item according to their usual self over the past 5 years or more, using a 10-point scale (where 1 is never or not at all true, and 10 is always or extremely true).
Content validity. The items were given to 4 clinicians to sort into the 11 personality-disorder categories. A variety of analyses were then carried out, basically showing agreement among the clinicians. Where there was disagreement in the sorting of items, the disagreement was taken as reflecting the fact that several of the personality disorders overlap in symptomatology.
Normative sample. The major normative sample is composed of 1,230 subjects: 368 patients and 862 normal subjects who were recruited from the general population by newspaper advertisements, classroom visits, solicitation of visitors to the university hospital, and so on. Although the authors give some standard demographic information such as gender, education, and age, little other information is given; for example, where did the patients come from (hospitalized? outpatients? community clinic? university hospital?), and what is their diagnosis? Presumably, the patients are in therapy, but what kind and at what stage is not given. Clearly these subjects are samples of convenience; the average age of the normal subjects is given as 24.4, which suggests a heavy percentage of captive college-aged students.
Reliability. Interitem consistency was calculated for each of the 11 scales; alpha coefficients range from a low of .84 to a high of .96, with an average of .90, in the normative sample. Test-retest coefficients for a sample of 40 patients and 40 nonpatients who were administered the WISPI twice within 2 weeks ranged from a low of .71 to a high of .94, with an average of .88. Two forms of the WISPI were used, one a paper-and-pencil form, the other a computer-interview version, with administration counterbalanced. The results suggest that the two forms are equally reliable.
Scale intercorrelations. The scales correlate sub-
Concurrent validity. Do the WISPI scales dis-
social situations, odd beliefs, eccentric behavior, and odd speech. A number of scales have been developed to assess this personality disorder, although most seem to focus on just a few of the nine criteria. An example of a relatively new and somewhat unknown scale that does cover all nine criteria is the SPQ (Raine, 1991). The SPQ is modeled on the DSM-III-R criteria, and thus the nine criteria served to provide a theoretical framework, a blueprint by which to generate items, and a source for items themselves.
Raine (1991) first created a pool of 110 items, some taken from other scales, some paraphrasing the DSM criteria, and some created new. These items, using a true-false response, were administered to a sample of 302 undergraduate student volunteers, with the sample divided randomly into two subsamples for purposes of cross-validation. Subscores were obtained for each of the nine criterion areas and item-total correlations computed. Items were deleted if fewer than 10% endorsed them or if the item-total correlation was less than .15.
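The two retention rules just described (at least 10% endorsement, and an item-total correlation of at least .15) can be sketched as a simple screen. The response matrix below is invented for illustration; it is not Raine's data:

```python
# Item-analysis screen of the kind described for the SPQ: drop an item
# if fewer than 10% of respondents endorse it, or if its correlation
# with the total score is below .15. Hypothetical true-false data.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def keep_items(responses, min_endorsement=0.10, min_item_total=0.15):
    """responses: respondents x items matrix of 0/1 (false/true)."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    kept = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        if sum(item) / len(item) < min_endorsement:
            continue  # too rarely endorsed
        if pearson_r(item, totals) < min_item_total:
            continue  # too weakly related to the total score
        kept.append(i)
    return kept

responses = [
    [0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1],
    [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 1, 0],
]
print(keep_items(responses))  # -> [1, 2]
```

Here item 0 is never endorsed and item 3 is essentially unrelated to the total, so only items 1 and 2 survive.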
A final scale of 74 items, taking 5 to 10 minutes to complete, was thus developed. Table 7.5 lists the nine subscales or areas and an illustrative example.
In addition to the pool of items, the subjects
completed four other scales, two that were measures of schizotypal aspects and two that were
not. This is of course a classical research design
to obtain convergent and discriminant validity
data (see Chapter 3). In addition, students who
scored in the lowest and highest 10% of the distribution of scores were invited to be interviewed
by doctoral students; the interviewers then independently assessed each of the 25 interviewees on
the diagnosis of schizotypal disorder and on each
of the nine dimensions.
Reliability. Coefficient alpha for the total score was computed as .90 and .91 in the two subsamples. Coefficient alpha for the nine subscales ranged from .71 to .78 for the final version. Note here somewhat of a paradox. The alpha values for each of the subscales are somewhat low, suggesting that each subscale is not fully homogeneous. When the nine subscales are united, we of course have both a longer test and a more heterogeneous test; the added length increases reliability, while the added heterogeneity decreases internal consistency. The result, in this
Psychopathology
[Table 7.5 residue. Illustrative items: "People are talking about me." "I get nervous in a group." "I have had experiences with the supernatural." "When I look in the mirror my face changes." "People think I am strange." "I don't have close friends." "I use words in unusual ways." "I keep my feelings to myself." "I am often on my guard."]
ning in 1964 through a series of steps and procedures somewhat too detailed to summarize
here (see the STAI manual for details; Spielberger, Gorsuch, Lushene, et al., 1983). Initially,
the intent was to develop a single scale that would
measure both state and trait anxiety, but because
of linguistic and other problems, it was eventually decided to develop different sets of items to
measure state and trait anxiety.
Basically, three widely used anxiety scales were administered to a sample of college students. Items that showed correlations of at least .25 with each of the three anxiety scale total scores were selected and rewritten so that the item could be used with both state and trait instructions. Items were then administered to another sample of college students, and items that correlated at least .35 with total scores (under both sets of instructions designed to elicit state and trait responses) were retained. Finally, a number of steps and studies were undertaken that resulted in the present form of two sets of items that functioned differently under different types of instructional sets (e.g., "Make believe you are about to take an important final examination").
8. Self-accusations
9. Suicidal wishes
10. Crying spells
11. Irritability
12. Social withdrawal
13. Indecisiveness
14. Distortion of body image
Administration. Initially, the BDI was administered by a trained interviewer who read aloud each item to the patient, while the patient followed on a copy of the scale. In effect then, the BDI began life as a structured interview. Currently, most BDIs are administered by having the patient read the items and circle the most representative option in each item; it is thus typically used as a self-report instrument, applicable to groups. In its original form, the BDI instructed the patient to respond in terms of how they were feeling at the present time, even though a number of items required by their very nature a comparison of recent functioning vs. usual functioning, i.e., over an extended period of time. The most recent revision asks the respondent to consider how they were feeling over the past few weeks. Incidentally, one advantage of such self-rating procedures is that they involve the patient in the assessment and may thus be therapeutic.
The raw scores are used directly without any transformation. Total raw scores can range from 0 to 63, and are used to categorize four levels of depression: none to minimal (scores of 0 to 9); mild to moderate (scores of 10 to 18); moderate to severe (19 to 29); and severe (30 to 63). Note that while there is a fairly wide range of potential scores, individuals who are not depressed should score below 10. There is thus a floor effect (as opposed to a ceiling effect, when the range of high scores is limited), which means that the BDI ought not to be used with normal subjects, and that low scores may be indicative of the absence of depression but not of the presence of happiness (for a scale that attempts to measure both depression and happiness see McGreal & Joseph, 1993).
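As a sketch, the cutoffs quoted above translate directly into a small classification function; the band labels are taken from the text, and the function itself is purely illustrative:

```python
# Mapping a BDI total raw score (0-63) onto the severity bands
# quoted in the text.

def bdi_band(total):
    if not 0 <= total <= 63:
        raise ValueError("BDI totals range from 0 to 63")
    if total <= 9:
        return "none to minimal"
    if total <= 18:
        return "mild to moderate"
    if total <= 29:
        return "moderate to severe"
    return "severe"

print(bdi_band(7))   # -> none to minimal
print(bdi_band(24))  # -> moderate to severe
```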
Reliability. A. T. Beck (1978) reports the results of an item analysis based on 606 protocols, showing significant positive correlations between each item and the total score. A corrected split-half reliability of .93 was also reported for a sample of 97 subjects.
Test-retest reliability presents some problems for instruments such as the BDI. Too brief an interval would reflect memory rather than stability per se, and too long an interval would mirror possible changes that might partly be the result of therapeutic interventions, remission, or other individual factors. A. T. Beck and Beamesderfer (1974) do report a test-retest study of 38
patients, retested with a mean interval of 4 weeks.
At both test and retest an assessment of depth of
depression was independently made by a psychiatrist. The authors report that the changes in BDI
scores paralleled the changes in the clinical ratings of depression, although no data are advanced
for this assertion. Oliver and Burkham (1979)
reported a test-retest r of .79 for a sample of college students retested over a 3-week interval. In
general, test-retest reliability is higher in nonpsychiatric samples than in psychiatric samples, as
one might expect, because psychiatric patients
would be expected to show change on retesting
due to intervening experiences, whether therapeutic or not; such experiences would not affect
all patients equally.
Internal consistency reliability seems quite
adequate; typical results are those of Lightfoot
Secondary validity. In our discussion of secondary validity (see Chapter 3), we saw that Gough (1965) suggested relating scores on a measure to important variables. A. T. Beck and Beamesderfer (1974) undertook just such an analysis and found a small but significant relationship of BDI scores with gender (females scoring higher), none with race, none with age (contrary to the popular belief that older patients are more likely to be depressed), a small but significant relationship with educational attainment (patients with lesser education tended to score higher), a slight (but presumably insignificant) correlation with vocabulary scores, and a significant but explainable negative correlation with social desirability, in that depressed patients do select unfavorable alternatives.
Cross-cultural studies. The BDI has been used
in a substantial number of cross-cultural studies in a variety of countries, ranging from the
former Czechoslovakia and Switzerland (A. T.
Beck & Beamesderfer, 1974) to Iran (Tashakkori,
Barefoot, & Mehryar, 1989) and Brazil (Gorenstein, Andrade, Filho, et al., 1999), and has been
translated into a wide range of languages including Chinese, German, Korean, and Turkish (see
Naughton & Wiklund, 1993, for a brief review
of these studies). The results are supportive of
its reliability and validity across various cultures,
with some minor exceptions.
Short form. A. T. Beck and Beamesderfer (1974)
and even 10 additional factors. Beck and Beamesderfer (1974) themselves obtained three factors that they labeled as negative view, physiological, and physical withdrawal. Weckowicz, Muir, and Cropley (1967) also found three factors, labeled as guilty depression, retarded depression, and somatic disturbance, and corresponding to those factors found by other investigators in more general investigations of depression. Recent studies (e.g., Byrne & Baron, 1994) also have reported three factors but with some cross-cultural differences, at least within French Canadian vs. English Canadian adolescents. Endler, Rutherford, and Denisoff (1999), however, reported two factors for a sample of Canadian students: a cognitive-affective dimension and a physiological dimension. Whether the BDI measures depression in a unidimensional global manner or whether the scale is composed of several replicable factors remains an open issue (e.g., Welch, Hall, & Walkey, 1990).
Sometimes, rather than create a new scale, investigators look at the variety of measures that have
been developed to assess a particular variable, and
select the best items as a new scale. The CES-D
illustrates this approach. This scale was designed
to measure symptoms of depression in community populations (Radloff, 1977); it is basically a
screening inventory, designed not as a diagnostic
tool but as a broad assessment device. The scale
consists of 20 items taken from other depression
been used with a wide variety of groups, including homeless persons (Wong, 2000). The construct validity of the CES-D also seems quite acceptable, but there is some question as to whether the CES-D measures depression, both depression and anxiety, or some other variable. Roberts, Vernon, and Rhoades (1989) suggest, for example, that the scale measures demoralization, which could be a precursor to either depression or anxiety.
Studies of the factor structure of the CES-D have typically found four factors, and although different investigators use different terms, the four factors reflect: (1) depressed affect, (2) positive affect, (3) somatic/vegetative signs, and (4) interpersonal distress; this last factor is composed of only two items and is a weak factor psychometrically (Kuo, 1984; Radloff, 1977). Four subscale scores can thus be obtained that include 18 of the 20 items.
Like the BDI, the CES-D has been translated into a number of languages including Chinese (Ying, 1988) and Greek (Madianos, Gournas, & Stefanis, 1992), and short forms have been developed (Melchior, Huba, Brown, et al., 1993; Santor & Coyne, 1997).
The Zung Self-Rating Depression
Scale (SDS)
The SDS was designed to provide a quantitative, objective assessment of the subjective experience of depression (Zung, 1965). The scale is
composed of 20 items that cover affective, cognitive, behavioral, and psychological symptoms of
depression. Respondents are asked to rate each
item using a 4-point scale from 1 (none or a little
of the time) to 4 (most or all of the time), as to how
it has applied to them during the past week. The
SDS is self-administered and takes about 5 minutes to complete.
The score on the SDS is calculated by summing the item scores, dividing by 80, and multiplying by 100. Scores below 50 are in the normal range, scores between 50 and 59 reflect mild depression, 60 to 69 marked depression, and 70 or above extreme depression.
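The scoring formula just described can be sketched in a few lines. The ratings below are invented for illustration, and any reverse-keying of individual SDS items (not described in the text) is ignored:

```python
# Zung SDS index: sum the twenty 1-4 item ratings, divide by the
# maximum possible raw score (80), and multiply by 100; then apply
# the cutoffs quoted in the text. Illustrative ratings only.

def sds_index(ratings):
    if len(ratings) != 20 or not all(1 <= r <= 4 for r in ratings):
        raise ValueError("expected twenty ratings of 1-4")
    return round(sum(ratings) / 80 * 100)

def sds_band(index):
    if index < 50:
        return "normal range"
    if index < 60:
        return "mild depression"
    if index < 70:
        return "marked depression"
    return "extreme depression"

ratings = [2, 3, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2]
idx = sds_index(ratings)
print(idx, sds_band(idx))  # -> 50 mild depression
```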
Although the SDS has been available for some
time and has been used in a variety of studies, there is relatively little reliability information available; what is available suggests adequate
[Figure residue: two 2 x 2 tables crossing test results (depressed vs. not depressed) against actual status. In the first table the cell frequencies are 90, 10, 10, and 90; in the second, the frequencies shown are 18, 18, and 162, illustrating how a lower base rate changes the balance of true and false positives.]
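The base-rate logic behind tables of this kind can be made explicit with a little arithmetic. For illustration we assume a test with 90% sensitivity and 90% specificity applied to 200 cases; these values reproduce the cell frequencies above but are not drawn from any particular study:

```python
# Effect of base rates on predictive power: a test with 90% sensitivity
# and 90% specificity looks very different when only 10% of those
# tested are actually depressed. Illustrative counts only.

def positive_predictive_value(sensitivity, specificity, base_rate, n=200):
    depressed = n * base_rate
    not_depressed = n - depressed
    true_pos = sensitivity * depressed            # correctly flagged
    false_pos = (1 - specificity) * not_depressed  # wrongly flagged
    return true_pos / (true_pos + false_pos)

# Base rate 50%: 90 true positives vs. 10 false positives.
print(round(positive_predictive_value(0.9, 0.9, 0.5), 2))  # -> 0.9
# Base rate 10%: 18 true positives vs. 18 false positives.
print(round(positive_predictive_value(0.9, 0.9, 0.1), 2))  # -> 0.5
```

With a 10% base rate, half of all positive test results are false positives, even though the test itself is unchanged; this is the base-rate problem discussed by Elwood (1993) in the Suggested Readings.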
SUMMARY
We have taken a brief look at a variety of instruments, and through them, at issues that underlie the area of testing for psychopathology. We have seen screening inventories, multivariate tests, and measures of specific dimensions such as anxiety and depression. In the past, a major issue has always been criterion validity, and diagnosis was seen as not highly reliable, and therefore not a valid criterion, even though it was a necessary one. As the focus has changed more to construct validity, the issue has focused more on the sensitivity and predictive power of a test. At the same time, we should not lose sight of the fact that a test, in the hands of a well-trained and sensitive clinician, is much more than a simple diagnostic tool.
SUGGESTED READINGS
Elwood, R. W. (1993). Psychological tests and clinical discriminations: Beginning to address the base rate problem. Clinical Psychology Review, 13, 409-419.
We used the arguments and examples given by this author in our discussion of base rates. This article is a very clear exposition of base rates as they affect clinical assessment.
DISCUSSION QUESTIONS
AIM This chapter looks at a variety of areas that reflect normal positive functioning.
SELF-CONCEPT
Perhaps a first question about normal functioning has to do with a person's self-concept. How do you feel about yourself? Do you like yourself? Do you have confidence in your abilities? Do you perceive yourself as being of value and worth? Or are you doubtful about your own worth, do you have little confidence in yourself, and often feel unhappy about yourself? This is the issue of self-esteem and/or self-concept. A person's self-concept is that person's view of self, and it is highly related to a wide variety of behaviors. Other terms such as self-esteem and self-image are used, and authors argue about the differences and similarities. Some differentiate between the various self-combinations, and others use all the terms synonymously. Paralleling the development of intelligence measurement, as well as the measurement of many other variables, self-concept research initially emphasized a general or unitary self-concept. Recently it has focused on the multidimensionality of the self-concept. For our purposes, we use the term self-concept to include both the global concept and specific dimensions.
One of the better-known scales of self-concept is the Tennessee Self-Concept Scale or TSCS (Fitts, 1965); a 1981 bibliography contained some 1,350 relevant references (P. F. Reed, Fitts, & Boehm, 1981). The TSCS consists of 100 self-descriptive items, such as "I am a happy person" and "I do what is right," by means of which an individual indicates what he or she likes, feels, does, and so on. The scale is basically designed to assess a person's self-image, and how realistic or deviant it is. Each item has five response options
FIGURE 8.1. Two-dimensional schema underlying the TSCS (rows such as Identity, Self-satisfaction, and Behavior crossed with columns such as Family self and Social self, with 6 items per cell).
the MMPI. Note that, in general, the development of this scale follows the pattern outlined in
Chapter 2.
Group                                    Mean   SD
Student leaders                          62.3   5.7
Varsity athletes                         57.1   5.0
Intro Psych students                     51.4   7.3
Counseling Center clients                46.3   5.1
Problem students as identified by Dean   43.4   4.8
Students on academic probation           41.4   5.4
Control (I-E) scale was thus developed by Rotter to operationalize his hypothesis. The I-E scale began as a set of 26 items, using a Likert-type response scale, and developed on a priori grounds; that is, the items were written to reflect the theoretical literature and to be used as is. This scale was used and further refined in two doctoral dissertations by students of Rotter, was then expanded to 60 items, and then, through a series of studies, was reduced to a 29-item forced-choice scale, which includes 6 filler items. The I-E scale then presents the respondent with sets of items from which the respondent selects the one that he or she most believes in. For example: (a) "Becoming a success is a matter of hard work," or (b) "Getting a good job depends on being at the right place at the right time." For each pair, one statement represents an internal locus of control and the matching statement an external locus of control. The score is the total number of external choices.
Originally, the I-E scale was intended to provide subscale scores in a variety of life areas such as social interactions, political affairs, and academic achievement; however, the subscales were all yielding similar results, so it was decided to abandon this effort and measure a single, overall expectancy.
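The forced-choice scoring just described can be sketched as follows. The scoring key and filler designation below are hypothetical, not Rotter's actual key:

```python
# Scoring a forced-choice locus-of-control scale: each pair offers an
# internal and an external statement, and the score is the number of
# external choices; filler items are skipped. Hypothetical key.

# For each scored pair, which option ("a" or "b") is the external one.
external_key = {1: "b", 2: "a", 3: "b", 4: "b", 5: "a"}
filler_items = {3}  # filler pairs are not scored

def score_ie(choices):
    """choices: dict mapping item number -> chosen option ("a" or "b")."""
    return sum(
        1
        for item, chosen in choices.items()
        if item not in filler_items and chosen == external_key[item]
    )

answers = {1: "b", 2: "b", 3: "b", 4: "b", 5: "a"}
print(score_ie(answers))  # -> 3  (items 1, 4, and 5 external; 3 is filler)
```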
need to consult the original doctoral dissertations for specific details, Rotter (1966) provides a rather extensive amount of information on the initial reliability and validity of the I-E scale. In addition, internal vs. external control of reinforcement, often referred to as locus of control, is probably one of the most studied variables in psychology, and numerous scales of locus of control are available. In fact, the area is so prolific that there are many reviews (e.g., Joe, 1971; Lefcourt, 1966; Strickland, 1989; Throop & MacDonald, 1971), and entire books devoted to the topic (e.g., Lefcourt, 1976; Phares, 1976).
Rotter (1966) reported corrected split-half reliabilities of .65 for males and .79 for females, and Kuder-Richardson coefficients for various samples in the .69 to .76 range. Rotter (1966) felt that the nature of the scale (brief, forced-choice, and composed of items covering a variety of situations) resulted in underestimates of its internal consistency. Test-retest reliability in various samples, with 1- and 2-month intervals, ranged from .49 to .83.
Correlations with a measure of social desirability ranged from .17 to .35, with a median of .22, and correlations with various measures of intelligence were essentially insignificant. Rotter (1966) also reported briefly two factor analyses, both of which suggested one general factor. A number of other studies are presented addressing the construct validity of the scale, such as correlations with story-completion and semistructured interview measures of locus of control, analyses of social-class differences, and controlled laboratory tasks.
The literature is replete with hundreds of studies that support the construct validity of the scale and of the concept. Locus of control scores are related to a wide variety of behaviors such as academic achievement (the more internal the orientation, the higher the achievement; e.g., Bar-Tal & Bar-Zohar, 1977) and various aspects of problem solving (e.g., Lefcourt, 1976). A number of studies support the hypothesis that internals show more initiative and effort in controlling both the physical environment and their own impulses (e.g., Joe, 1971).
As Wiederman and Allgeier (1993) state, "Sexuality is a vital part of being human." Sexuality covers a rather wide variety of variables, such as premarital intercourse, attitudes toward virginity, machismo, pornography, medical conditions, religious and philosophical views, homosexuality, and so on. All types of scales have been developed in this area, ranging from attitude scales toward the use of condoms (e.g., I. S. Brown, 1984) and masturbation (e.g., Abramson & Mosher, 1975) to sexual knowledge questionnaires (e.g., Gough, 1974; Moracco & Zeidan, 1982). From a psychometric point of view, this is a very broad and ill-defined area, and so the examples below cannot be taken as representative in the same way that the Stanford-Binet and the Wechsler tests represent intelligence tests.
The sexuality scale. Snell and Papini (1989)
Christensen and Carpenter (1962). These investigators were interested in exploring the relationship of premarital pregnancy to possible consequences such as having to get married, giving the baby up for adoption, etc., as a function of cultural differences, specifically the sexually restrictive Mormon culture of Utah, the more typical United States culture, and the sexually permissive culture of Denmark.
The authors attempted to develop a Guttman-type scale to assess intimacy permissiveness, that is, how permissive or restrictive a person's attitudes are toward premarital sexual intimacy. They began with 21 items but eventually reduced these to 10, after some statistical procedures were carried out to test for unidimensionality. The 10 items cover the desirability of marrying a virgin, petting, premarital intercourse, premarital pregnancy, and freedom of access to erotic literature. Presumably someone who has a permissive attitude toward premarital intercourse would also have a permissive attitude toward premarital petting. The subject is required to check each item with which he or she agrees, so scores can go from 0 to 10, with higher scores indicating greater permissiveness.
You will recall that in Guttman scaling, the coefficient of reproducibility is the important statistical procedure. The authors report such coefficients as ranging from .90 to .96 (these are basically reliability coefficients). In terms of validity, the authors present three lines of evidence. The mean for the Danish sample (8.3) is substantially higher than the mean for the U.S. sample (4.1), and the mean for the Mormon sample (2.4). Second, males have higher mean scores than females, in all three samples. Finally, the higher the intimacy-permissiveness score, the larger the percentage having premarital intercourse, again in all three samples. Unfortunately, the authors give no detail on how the original items were developed or on the precise statistical procedures used.
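For readers who want the mechanics, a minimal version of the coefficient of reproducibility can be sketched as follows. The response patterns are invented, and real scalogram analyses use more refined error-counting rules than this sketch:

```python
# A minimal sketch of Guttman's coefficient of reproducibility:
# order items from most- to least-endorsed, predict that a respondent
# with total score k agrees with the k "easiest" items, and count
# deviations from that ideal pattern. Invented data.

def reproducibility(responses):
    """responses: respondents x items matrix of 0/1 (disagree/agree)."""
    n_resp, n_items = len(responses), len(responses[0])
    # Order item indices from most endorsed (easiest) to least endorsed.
    order = sorted(range(n_items),
                   key=lambda i: -sum(r[i] for r in responses))
    errors = 0
    for r in responses:
        k = sum(r)
        ideal = [1 if rank < k else 0 for rank in range(n_items)]
        actual = [r[i] for i in order]
        errors += sum(a != b for a, b in zip(actual, ideal))
    return 1 - errors / (n_resp * n_items)

data = [
    [1, 1, 1, 0],  # perfectly scalable patterns...
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],  # ...and one deviant pattern (2 errors)
]
print(reproducibility(data))  # -> 0.875
```

Coefficients of .90 to .96, like those reported above, mean that very few responses deviate from the ideal cumulative pattern.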
CREATIVITY
least four ways in which we can study creativity. The first is to focus on creative persons. Suppose we were able to identify a group of highly creative individuals; we could then ask whether they differed on intelligence, personality, motivation, food preferences, and so on, from their less creative peers. Psychologists have been quite prolific in their study of such persons, and a wide variety of groups have been assessed, such as writers, architects, mathematicians, mothers of creative adolescents, and so on. The results suggest that creative persons do differ from their less creative peers in a number of significant ways that transcend field of enterprise (see Barron, 1969; G. A. Davis, 1986).
requires a well-trained individual, knowledgeable and experienced with the scoring guidelines. The test manual and scoring guides are quite clear and provide much guidance. There is also the possibility of sending the test protocols to the author for scoring (for a fee).
The subtests can be scored along four dimensions: fluency, flexibility, originality, and elaboration (not every subtest yields all four scores). These dimensions apply to many other creativity measures, especially the Guilford tests,
on the TTCT are positively related to other concurrent criteria, including teacher ratings and observed creative-leadership activities, but that predictive validity is a much more complex and controversial matter. TTCT scores have a modest but significant correlation with later creative achievement criteria (e.g., Torrance, 1981).
In the test manual, Torrance (1966) presents a wide variety of studies covering construct, concurrent, and predictive validity. Many more such studies have been published in the literature since then, and a majority are supportive of the validity of the TTCT. As far as content validity is concerned, Torrance (1966) argued that although the TTCT tasks do not sample the entire universe of creative abilities, they do sample a rather wide range of such abilities. The test stimuli were selected on the basis of an analysis of the literature regarding eminently creative individuals and educational theories regarding learning and creativity.
The manual also presents a number of findings that address the issue of whether TTCT scores are correlated with intelligence. The results seem to suggest that there is a very low pattern of correlations with intelligence tests (in the .20s), a slightly higher correlation with tests that assess reading
In one sense, the ACL should have been considered in Chapter 4 because it is primarily a
personality inventory, but it is presented in this
chapter because its beginnings and its use have
been closely interwoven with the field of creativity. The ACL is a very simple device and consists of 300 words, in alphabetical order, from absent-minded to zany. Subjects are typically asked to mark those adjectives that are self-descriptive. As with other personality inventories, the responses of the subject can then be translated into a personality profile over a number of scales.
The ACL actually began its professional life as an observer's checklist used to describe well-functioning subjects, such as architects and writers, who were being intensively studied at the Institute of Personality Assessment and Research (IPAR). The author of the ACL, Harrison Gough, published the 300-item version in 1952. Scales were then developed on the ACL, including a set of 15 scales based on Murray's need system, scales based on Transactional Analysis, and others. A test manual was published in 1965, and revised in 1983 (Gough & Heilbrun, 1983). The ACL has been quite popular, and has been translated into a substantial number of languages, from French to Vietnamese. The ACL can be computer or hand scored. Scoring basically involves subtracting the number of contraindicative items from the number of indicative items for each scale. Raw scores are then converted to T scores, according to both gender and the total number of adjectives checked.
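The raw-score arithmetic just described can be sketched in a few lines. The adjective keys below are hypothetical (not Gough's actual scale keys), and the conversion to gender-specific T scores, which requires the published norm tables, is not shown:

```python
# ACL-style scale scoring: raw score = number of indicative adjectives
# checked minus number of contraindicative ones. Hypothetical keys.

indicative = {"inventive", "original", "imaginative", "reflective"}
contraindicative = {"conventional", "cautious", "rigid"}

def raw_scale_score(checked):
    """checked: set of adjectives the respondent marked as self-descriptive."""
    return len(checked & indicative) - len(checked & contraindicative)

checked = {"inventive", "original", "cautious", "zany"}
print(raw_scale_score(checked))  # -> 1  (two indicative minus one contraindicative)
```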
In some ways, the ACL represents an ideal psychometric instrument. It is simple to administer, brief, nonthreatening and noninvasive to the respondent, scorable either by hand or computer, amenable to statistical and logical analyses, useful both as an observer- or self-descriptive instrument, and almost limitless in its range of applications (see Fekken, 1984, for a review). Because it is an open system, like the MMPI and CPI, new scales can be developed as needed, and in fact a number of investigators, including Gough himself, have developed scales of creativity on the ACL (G. Domino, 1994).
The Domino Creativity (Cr) Scale on the ACL.
                     Mean    SD
Artistic creative    54.48   8.71
Artistic control     45.52   8.94
Scientific creative  54.38   9.72
Scientific control   45.62   7.95

Females              Mean    SD
Artistic creative    52.37   9.60
Artistic control     47.63   9.62
Literary creative    54.14   9.66
Literary control     45.86   8.28
FIGURE 8.5. Illustrative responses on the Chinese Tangram: (a) a person, (b) a fox, (c) a rocket.
IMAGERY
Today probably the most commonly used measure of imagery is Marks' (1973) Vividness of Visual Imagery Questionnaire (VVIQ), although it has been severely criticized (e.g., Chara & Verplanck, 1986; Kaufman, 1983). This 16-item questionnaire asks the respondent to think, in turn, of a relative or friend seen frequently, a rising sun, a shop, and a country scene, and in each case visualize the picture that comes in the mind's eye. For each item, the respondent uses a 1 to 5 rating scale, where 1 is defined as "perfectly clear and as vivid as normal vision" and 5 is "no image at all, you only know that you are thinking of the object." The scale is administered twice, once with eyes open and once with eyes closed.
D. F. Marks (1973) reports a test-retest reliability of .74 for a sample of 68 subjects, with no time period indicated, and a split-half reliability of .85. He also reports three different studies, two with college students and one with high-school students, in which in each case the most extreme high scorers were compared with the most extreme low scorers in the degree of recall of colored photographs. In all three experiments, good visualizers (i.e., low scorers on the VVIQ) were more accurate in recall than poor visualizers (i.e., high scorers on the VVIQ). In two of the studies, D. F. Marks (1973) found that females had greater accuracy of recall than males.
Paivio's Individual Differences Questionnaire (IDQ)
Another relatively popular imagery questionnaire is the Individual Differences Questionnaire (IDQ). Paivio (1975) proposed that human
thinking involves a continuous interplay of
nonverbal imagery and verbal symbolic processes, which are interconnected but functionally distinct; this proposal led to a dual coding theory that spells out the implications of
such a model. As part of his efforts, Paivio (1971;
Paivio & Harshman, 1983) developed the IDQ
construct validity of the VVQ. Part of the difficulty may be that the VVQ considers verbal-visual abilities as opposite ends of a continuum, while the literature suggests that these two processes are parallel and independent.
Parrott (1986) also found the VVQ not to relate to visual imagery, spatial-ability level, and examination grades in a sample of mechanical engineering students. Boswell and Pickett (1991) suggested that the problem with the VVQ was its questionable internal consistency. They administered the VVQ to a sample of college students and computed K-R 20 coefficients; these were .42 for males, .50 for females, and .47 for the total sample. They carried out a factor analysis and found six factors, but only the first two were logically meaningful; these were labeled as vividness of dreams and enjoyment of word usage.
General concern about imagery questionnaires.
than .40. The internal consistency of this 20-item form was .90. Scores on this scale, which the authors call the Competitiveness Index, were significantly correlated with measures of achievement motivation. A factor analysis yielded three factors, labeled as emotion, argument, and games. Whether this competitiveness index is useful remains to be seen.
HOPE
14:26
P1: JZP
0521861810c08
CB1038/Domino
0 521 86181 0
tially (.71 and .82) with two measures of psychological well-being that presumably measured purpose and meaning in life, somewhat less (.69) with a one-item self-assessment of hope, and negatively (-.54) with a hopelessness scale. A factor analysis of the MHS suggested that a three-factor solution was the best, which the authors interpreted as satisfaction, avoidance of hope threats, and anticipation of the future.
Note a bit of a problem here. If a scale correlates substantially with other scales that presumably measure different constructs, then we can question whether in fact the scales are measuring the same construct. Here is a hope scale that correlates substantially with measures of psychological well-being; does the MHS measure hope or some other variable, such as purpose in life, psychological health, adjustment, or a desire to present oneself favorably? If we could arrive at a consensus (unlikely in reality, but thinkable theoretically) that all these scales are measuring hope, then we need to ask: why use the MHS? What advantages does this scale have over others? Is it shorter? Does it have better norms? Is it more easily available? Does it have higher reliability?
One of the most frequently used and psychometrically sound instruments for the measurement
of loneliness is the UCLA Loneliness scale, originally presented in 1978 (Russell, Peplau, & Ferguson, 1978) and revised in 1980 (Russell, Peplau,
& Cutrona, 1980).
The ULS was initially developed by using 25 items from another, longer (75-item) loneliness scale. This pool of items was administered to a sample of UCLA students who were members of a discussion group on loneliness (n = 12), students in a social psychology class (n = 35), or students in introductory psychology (n = 192). Correlations between item and total score indicated that 5 of the items correlated below .50 and hence were dropped. The final scale of 20 items was then assessed for reliability, and a coefficient alpha of .96 was obtained. A sample of 102 students, retested over a 2-month period, yielded a test-retest r of .73.
Scores on the ULS correlated .79 with self-ratings of loneliness, and the mean was significantly different for those students who had volunteered for the discussion group on loneliness than for a control sample. Furthermore, scores correlated significantly with self-ratings of depression, anxiety, and endorsement of several adjectives such as "empty" and "shy," and negatively with self-ratings of satisfaction and of happiness.
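The development sequence just described, item-total correlations with a .50 cutoff followed by coefficient alpha on the retained items, can be sketched as follows; the data and helper names here are illustrative, not the actual ULS figures:

```python
def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_total_keep(responses, cutoff=0.50):
    """Indices of items whose correlation with the total score is >= cutoff."""
    totals = [sum(person) for person in responses]
    keep = []
    for i in range(len(responses[0])):
        item = [person[i] for person in responses]
        if pearson_r(item, totals) >= cutoff:
            keep.append(i)
    return keep

def coefficient_alpha(responses):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    n = len(responses)

    def var(values):
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / n

    item_var = sum(var([p[i] for p in responses]) for i in range(k))
    total_var = var([sum(p) for p in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)
```

With four invented examinees answering three 1-to-4 items, an item that runs against the total score fails the cutoff and would be dropped, exactly the logic used to trim the original 25-item pool to 20.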
The ULS is thus a unidimensional measure composed of 20 items, 10 worded positively and 10 worded negatively. Subjects respond on a 4-point scale of never (1), rarely (2), sometimes (3), and often (4). Coefficient alphas are typically in the .80s range for early adolescents (Mahon & Yarcheski, 1990) and in the .90s for college students and adults (Hartshorne, 1993; Knight, Chisholm, Marsh, et al., 1988). The results of factor-analytic studies are, however, mixed. Some researchers have found one factor (e.g., Hartshorne, 1993), others two factors (e.g., Knight, Chisholm, Marsh, et al., 1988; Zakahi & Duran, 1982), and some, four or five factors (e.g.,
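Given this response format, 20 items answered 1 to 4 with half the items worded in each direction, scoring reduces to reverse-keying one set of items and summing. A minimal sketch; the ten positions reversed below are a hypothetical choice for illustration, not the published ULS key:

```python
def score_uls(answers, reversed_items):
    """Sum twenty 1-4 responses, reverse-keying the 0-based positions in
    `reversed_items` so that a higher total always means greater loneliness.
    Possible totals therefore run from 20 to 80."""
    if len(answers) != 20 or any(a not in (1, 2, 3, 4) for a in answers):
        raise ValueError("expected twenty responses coded 1 (never) to 4 (often)")
    return sum((5 - a) if i in reversed_items else a
               for i, a in enumerate(answers))

# Hypothetical key: suppose the first ten items are the ones to reverse.
REVERSED_ITEMS = set(range(10))
print(score_uls([2] * 20, REVERSED_ITEMS))   # 50: ten items reversed to 3, ten kept at 2
```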
This chapter looked at a selected number of topical areas to illustrate various uses of tests and some basic issues. In the area of self-concept, the Tennessee Self-Concept Scale illustrates some of the challenges that need to be met. In the area of locus of control, the Rotter scale occupies a central position and gave birth to a multitude of such scales. Sexuality is yet another active field of research where various scales and psychometric methodologies can provide potentially useful results. In the area of creativity, the challenges are many, and the measurement there is, if not still in its infancy, certainly in its childhood. Other areas such as hope and death anxiety illustrate a variety of approaches and issues.
SUGGESTED READINGS
DISCUSSION QUESTIONS
1. If you were writing this chapter on psychological tests that measure normal positive function-
9 Special Children
AIM This chapter looks at psychological testing in the context of special children. We first look at some specific issues, ranging from the laws that have affected the psychological testing of handicapped children to issues of infant intelligence. In the process, we look at several instruments that are broad-based and nicely illustrate some of these issues. Then we look at nine major categories of special children, not exhaustively, but to illustrate various issues and instruments. Finally, we return to some general issues of this rather broad and complicated area.
questions of technical adequacy in their development and in their reliability and validity. Tests
that may be quite useful in research settings or
with normal children may be of limited use with
special children. Even when tests are appropriate, the information they yield may be of limited
value to the development of a treatment plan or
to the prescription of special educational interventions. Sometimes two instruments that supposedly measure the same functions may yield
rather disparate results (e.g., Eippert & Azen,
1978; B. L. Phillips, Pasewark, & Tindall, 1978).
Sometimes the same instrument, when revised,
can yield substantially different normative interpretations of the same score (e.g., Covin, 1977;
R. J. Thompson, 1977).
In Chapter 2, we discussed the notions of age
norms and school-grade norms. We noted several
problems with these norms, which are particularly acute in the testing of special children. These
children are often developmentally delayed. Thus
a mentally retarded 8-year-old child may have
language abilities that are more appropriate for a
normal 3-year-old. But we should not therefore
conclude that the child functions at a 3-year-old
level. He or she probably functions at different
levels in different domains, and most likely his or
her language abilities may show more variability
of functioning than that of the typical 3-year-old.
The challenges to the appropriate use of tests with special children are many, but this is not to say that these tests are useless; far from it. Again, we make the point here that psychometrically valid instruments, in the hands of a well-trained professional, often represent the only objective data available, and they can be extremely useful for a variety of purposes.
Domains of assessment. There are all sorts of tests that can be and have been administered to special children, and there are many ways
to special children, and there are many ways
to categorize these tests and/or the underlying
domains. One way is to consider the following
five categories: (1) infant scales, used to diagnose early developmental delays that may occur in cognitive, physical, self-help, language, or psychosocial development; (2) preschool tests, used to
diagnose mental retardation and learning disabilities and to assess school readiness; (3) school-age
tests of intelligence and cognitive abilities; (4)
school-age tests of academic achievement; and
discussed in Chapter 8, the Self-Esteem Inventory (Coopersmith, 1967), the Piers-Harris Self-Concept Scale (Piers & Harris, 1969), and the Self-Perception Inventory (A. T. Soares & L. M. Soares, 1975). The majority of self-concept scales are self-report inventories; the child responds to stimuli by choosing the one that best represents himself or herself. These stimuli may be written
statements, statements that are read by the examiner, drawings of children, happy faces, etc. (see
Coller, 1971; Walker, 1973).
Age differentiation. From birth to 18 months
1. Comparable factor structure. If a test is modified in some way for administration to special children, it is possible that the test may no longer measure the same construct. A factor analysis of the modified test should yield relevant information as to whether the structure of the test has changed.
2. Comparable item functioning. Even if the overall factor structure is the same for the original test and the modified or nonstandard test, it is possible that specific items may be more (or less) difficult for someone with a particular impairment.
3. Comparable reliability. Obviously, the modified form needs to yield consistent results, i.e., to be reliable.
4. Comparable accuracy of prediction. Does the
nonstandard test discriminate between successful
Goodwin and Driscoll (1980) outline a number of issues that concern the measurement and evaluation of young children. The first issue is
whether young children should be tested. Some
people express fear that the very procedure of
testing a young child can be harmful and traumatic. There does not seem to be much evidence to support this, and most professionals
believe that the potential benefits of professionally administered tests far outweigh any possible negative effects. Often, fears are raised by misinformed individuals who imagine the wholesale administration of multiple-choice exams to
preschoolers; they do not realize that most tests at
this level are individually administered or require
an adult informant. Fleege, Charlesworth, Burts, et al. (1992) did report that kindergarten children being given a standardized achievement test exhibited more stress-related behaviors during the test than before or after it, but much of this behavior seemed to be related to how the adult authority figures around the children reacted.
A second issue is whether measurement is possible or meaningful with young children. In general, the younger the child, the greater the measurement problems; in addition, the measurement of affective constructs seems to be more difficult than the measurement of cognition.
A third issue is whether tests are used appropriately with young children. There is, of course, a growing dissatisfaction among both professionals and laypersons with the use of tests in American schools, but satisfactory substitutes have yet to appear.
Another issue is whether tests used with young children are fair, particularly with regard to ethnic minorities. This is a highly emotional issue, in which facts are often distorted to fit preconceived opinions. The issue is by no means an open-and-shut one, and there is much controversy. We take a closer look at this in Chapter 11.
Perhaps of greater concern than the above points is the possible negative consequence of labeling a child with a particular diagnostic label (C. D. Mercer, Algozzine, & Trifiletti, 1979). This also is a highly emotional issue, with little data available. From a psychometric point of view, to test a child merely to obtain a diagnostic label is like purchasing a house to have a mailbox. Although the mailbox may be central and may serve a valuable purpose, it is the house that is important.
Infant intelligence. The assessment of infant intelligence is a difficult task. The examiner must
not only establish rapport with both the infant
and the adult caretaker, but must be a superb
observer of behavior that does not appear on
command. With older children we can offer them
rewards, perhaps reason with them, or use our
adult status to obtain some kind of compliance;
infants, however, do not necessarily respond in a
cooperative manner.
Infant intelligence tests are often assessed from the viewpoint of content validity; that is, do they reflect a sample of behavior that is governed by intellect? They can also be judged by predictive validity, that is, how well they forecast a future criterion. This is a difficult endeavor because the intelligence of an infant is primarily exhibited through motor behavior, while the intelligence of a school-aged child is primarily exhibited through verbal school achievement. Finally, tests
title indicates, these are developmental schedules or frameworks; that is, they are not tests in the strict sense of the word, but a timetable of what is to be expected of a child in five areas of behavior: adaptive behavior, gross motor movement, fine motor movement, language, and personal-social behavior. The schedules provide a standardized means by which a child's behavior can be observed and evaluated; they are particularly useful to pediatricians and child psychologists who need to observe and evaluate young
children.
The scales were developed by Gesell (Gesell, Halverson, & Amatruda, 1940) and revised a number of times by various investigators (Ames, Gillespie, Haines, et al., 1979; Gesell, Ilg, & Ames, 1974; Knobloch & Pasamanick, 1974; Knobloch, Stevens, & Malone, 1980). The scales cover an age range of 4 weeks to 5 years and are basically a quantification of the qualitative observations that a pediatrician makes when evaluating an infant. The scales are used to assess developmental status, either from a normal, normative approach, or to identify developmental abnormalities that might be the result of neurological or other damage. In fact, part of the value of these schedules is their focus on neuromotor status, which covers such things as posture, locomotion, and muscle tone, and can provide valuable quantitative information for the identification of neuromotor disabilities.
sented, both in the test manuals and in the literature, that support the validity of the PIC,
particularly the concurrent, convergent, and discriminant validity. As Knoff (1989) stated, these
studies provide an excellent basis for the PIC,
but additional work is needed, especially a more
the analogy of an experiment as a way of thinking about tests. This is particularly true in testing
special children, and we want the testing environment to be free of extraneous conditions.
Ideally, the testing room should be well lit and free from noise, distractions, and interruptions. The furniture should be appropriate for the child.
A number of authors (e.g., Blau, 1991) have
described what the ideal testing environment
should be like, but the reality is that testing often
takes place in whatever room is available in the
school or clinic, with interruptions, scheduling
dren who are referred for testing present a particular challenge in that they may have poor self-concept, difficulties in maintaining attention, lack of motivation, substantial shyness, poor social skills, and so on. These deficits, of course, affect not only the child's coping with the demands of school, but may well affect test performance. Thus, establishing good rapport is particularly important.
Children are highly variable. Not only do they differ substantially from each other, but they also differ within themselves from day to day and from behavior to behavior. Their emotional states are highly volatile; a child may be very happy at 10 in the morning and very unhappy an hour later. Children's behaviors are often specific to a particular situation; a child may have great difficulties with school peers but may get along well with siblings at home. A particular child may be advanced motorically but may lag in social skills; another child may be more easily distracted, and a third may fatigue easily in the testing situation.
In fact, it is generally recommended that children
be tested in the morning when they are well rested
and alert.
Young children have a short attention span, so that sustained effort, such as that required by a test, can be difficult. Children's responses to a test can sometimes be more reflective of their anxiety or of their attempts to cope with a task they don't understand; thus a child may reply "yes" regardless of the question asked. Young children also may have limited (or no) reading and writing
CATEGORIES OF SPECIAL CHILDREN
We can probably identify some nine major categories of special children, although these categories are not mutually exclusive and, for some specific purposes, we may wish to use different divisions.
Mental Retardation
Examples of items, with the life age for each:

Category             Example of item                        Life age
Self-help general    Asks to go to toilet                      1.98
                     Tells time to quarter hour                7.28
Self-help eating     Drinks from cup or glass unassisted       1.40
                     Uses table knife for spreading            6.03
Self-help dressing   Dries own hands                           2.60
                     Exercises complete care of dress         12.38
Self-direction       Is trusted with money                     5.83
                     Buys own clothing accessories            13.00
Occupation           Uses skates, sled, wagon                  5.13
                     Performs responsible routine chores      14.65
Communication        Uses names of familiar objects            1.70
                     Communicates by letter                   14.95
Locomotion           Moves about on floor                       .63
                     Walks downstairs one step per tread       3.23
Socialization        Plays with other children                 1.50
                     Plays difficult games                    12.30
with those on the older edition, i.e., the Adaptive Behavior Composite vs. the Deviation Social Quotient. The two correlated only .55; correlations were higher in samples of mentally retarded adults and hearing-impaired children. One would expect the two versions of the same scale to correlate substantially higher, although the authors argue that the revised scale is in fact significantly different.
Similarly, scores on the survey form were correlated with the parallel scores on the classroom edition. This yielded correlation coefficients of .31 to .54; lower correlations were obtained on a sample of preschool children in a Head Start program, and higher correlations were obtained in a sample of mentally retarded students. Because the classroom edition contains some of the exact same items as the survey form, we would expect typical correlations to be higher, if for no other reason than item overlap. The relatively low correlations suggest various possibilities: that the two groups of informants, parents and teachers, perceive the same child quite differently; that adaptive behavior is not highly stable across situations, and what is observed at home is in fact different from what is observed at school; that adaptive behavior is a meaningful variable for mentally retarded children but not for normal children; or that the two scales use different methodologies (interview vs. questionnaire) that create different results.
Several studies have compared scores on the survey form with other measures of adaptive functioning. Most of the obtained coefficients are in the middle range, from about .20 to .70, with lower and higher coefficients obtained depending upon the nature of the sample, the test form used, and other aspects.
Correlations with standard measures of cognitive functioning are also relatively low, typically in the .20 to .50 range. This, of course, supports the idea that the Vineland is measuring something different from, yet related to, cognitive functioning.
Construct validity is supported in a number
of ways. For example, the results of factor analyses suggest one major factor at various ages and
support the use of the Adaptive Behavior Composite as an overall measure of adaptive skills.
Support is also found for the separate use of the
various areas. It is interesting to note that gender
Deficit Disorders Evaluation Scale (McCarney, 1989) and the Attention Deficit Disorder-Hyperactivity Comprehensive Teacher's Rating Scale (Ullmann, Sleator, & Sprague, 1991).
Motor Impairments
has been used as a substitute for a test of general intelligence. However, the evidence indicates that the PPVT-R, although useful as a screening instrument, should not be substituted for a more comprehensive measure of cognitive functioning. Some evidence suggests that PPVT IQs are lower than those obtained on the Stanford-Binet in the case of minority children, but higher for children who come from well-educated and verbally articulate families.
An example of a concurrent validity study is that by Argulewicz, Bingenheimer, and Anderson (1983), who studied a sample of Anglo-American and Mexican-American children in first through fourth grade. They found the Mexican-American children to score almost a standard deviation below the Anglo-American children on both forms of the PPVT-R. Only Form L correlated significantly with achievement measures of reading and mathematics in both groups (.31 and .41 with reading, and .29 and .36 with mathematics, for Mexican-American and Anglo-American children, respectively), with group differences not statistically significant.
The PPVT-R has also been used with adults, both mentally retarded (e.g., Prout & Schwartz, 1984) and normal (e.g., Altepeter & Johnson, 1989). The results are rather mixed and inconsistent, but the conclusion is the same: caution should be used when the PPVT-R is used with adults.
Norms. The original normative sample consisted of some 4,000 white persons residing in
or around Nashville, Tennessee. Norms for the
revised edition, however, are quite extensive,
based on stratified samples of 4,200 children and
adolescents and 828 adults, according to U.S.
Census data. The adults were tested in groups
using slides of the test plates.
Adaptations. Adaptations of the PPVT have
the , using the stimulus word. The performance of the normal children remained essentially the same on the two forms, but the autistic
children showed significant improvement on the modified form.
The PPVT-R and decision theory. If the PPVT-R
Speech Impairments
These include a wide variety of conditions and symptoms that interfere with spoken communication.
Communication skills are extremely important
achieve in the early school years. The test covers kindergarten to grade 2 and has three forms: two that are alternate forms, and one that is an "applications" form designed to assess mastery of basic concepts used in combination with other basic concepts. These concepts involve such basic notions as left and right, first and last, more and less, whole vs. part, and so on. There is a version of the test for use with blind children. The Boehm can serve as a screening test to identify children who have deficiencies in those basic concepts and who therefore might need special attention from the teacher.
Each of the alternate forms is divided into two booklets, with each booklet having 3 practice items and 25 operational pictorial items arranged in approximate order of increasing difficulty. The child looks at a picture composed of three objects, such as an ice cream cone, a piece of pie, and a shirt, and the teacher asks, "Which of these should a child never eat?" School-aged children can mark their answers with a pencil or crayon, while preschool children can simply point. The two alternate forms use the same concepts but different illustrations.
The manual is quite clear in its instructions, and the black-and-white line drawings are quite unambiguous. The test can actually be administered to a small group of children, with the teacher
reading each question, but more typically individual administration is needed. If a child has a
short attention span, the test can be divided into
sections, with each section administered separately. Administration of this test does not require
a high degree of training or expertise.
Scoring. Scoring is straightforward. A single test
Hearing Impairments
Children with hearing impairments can be classified into two broad categories: the hard of hearing and the deaf. The difference essentially lies in whether their hearing level can or cannot be enhanced and used. Many other terms have been used to try to distinguish among various types of hearing impairment. One major distinction is that between the congenitally deaf (i.e., born deaf) and the adventitiously deaf (i.e., those who became deaf later in life because of illness, accident, etc.). Another distinction is whether the person's hearing is nonfunctional for conducting ordinary aspects of everyday life (i.e., deaf) or functional with or without a hearing aid (i.e., hard of hearing). Another distinction is whether the impairment occurred before or after the development of language (i.e., prelingually or postlingually deaf).
Often, hearing-impaired children are evaluated by using nonverbal tests that have been standardized on the hearing population; for example, the performance portion of the WISC-R or an adaptation of the WPPSI for deaf children (Ray & Ulissi, 1982). There are a number of difficulties with this approach. First, if only one scale or subtest is used, the obtained information is quite limited. Second, hearing-impaired and normal-hearing children do not differ solely in their capacity to hear. They have had different experiences with language, which is intimately related to problem solving as well as a host of other aspects such as social skills. Even though performance tests are nonverbal, they still have verbal components, such as the ability to understand instructions or to respond within a time limit.
McQuaid and Alovisetti (1981) surveyed psychological services for hearing-impaired children in a portion of the northeastern United States and found that the Wechsler scales, especially the performance scale of the WISC-R, were commonly used. One of the measures also in use, developed specifically for the hearing impaired, is the Hiskey-Nebraska Test of Learning Aptitudes, which we discuss next. (For an overview of the assessment of auditory functioning, see Shah & Boyden, 1991.)
Hiskey-Nebraska Tests of Learning Aptitude. This test was first published in 1941 and was originally designed as a measure of learning ability for deaf children. In 1955, it was standardized on hearing children to provide a measure of intelligence for children who might be at a disadvantage on highly verbal tests of ability. The age range covers 2½ to 17½, with five subtests applicable to ages 3 to 10, four subtests applicable to ages 11 to 17, and three subtests that range across all age ranges. Table 9.2 indicates the 12 subtests that make up the Hiskey.
Instructions on each subtest may be presented orally or by pantomime, depending on the child's hearing acuity. The items were selected on the basis of several criteria. They needed to reflect school experience, but to be adaptable for use in a nonverbal test and administrable by pantomime. Performance on the item needed to be correlated with acceptable criteria of intelligence, and not be influenced by time limits. In some ways the
[Table 9.2. Appropriate age ranges for the 12 Hiskey subtests: five cover ages 3 to 10, three cover all ages, and four cover ages 11 to 17.]
Hiskey is analogous to the WISC; it is a broad-based measure of intellectual functioning, composed of a number of subtests.
mation on validity, basically addressing only concurrent validity for hearing children. Correlation
coefcients in the high .70s and low to middle
.80s are reported with the Stanford-Binet and the
WISC.
B. U. Watson and Goldgar (1985) assessed 71
hearing-impaired children and reported a correlation coefcient of .85 between the learning
quotient of the Hiskey and the WISC-R Full Scale.
Phelps and Branyan (1988) studied 31 hearing-impaired children and found correlations of .57
with the K-ABC nonverbal scale and .66 with the
WISC-R Performance score.
Norms. The norms are based on a standardization sample of 1,079 deaf children and 1,074 hearing children, ranging in age from 2½ to 17½. The majority of the deaf children attended schools for the deaf, while the hearing sample was stratified according to parental occupation to match U.S. Census figures. Norms for individuals older than 17 are based on extrapolation and cannot be considered reliable.
Criticisms. Some authors feel that the Hiskey
score on the nonverbal test might lead to erroneous decisions. That, of course, is not a criticism solely of nonverbal tests. Scores on the
Stanford-Binet and the WISC-R, for example,
can be quite discrepant with mentally retarded
and learning-disabled children (e.g., A. S. Bloom,
Reese, Altshuler, et al., 1983).
Other tests use cartoons as the items and ask the child to respond by pointing to a "thermometer" response format, happy faces, or similar nonverbal stimuli (for an example, see Praver, DiGiuseppe, Pelcovitz, et al., 2000).
Testing the limits. This phrase, often associated with the Rorschach (see Chapter 15), refers to the notion of psychologically pushing the subject to see what the limits of that person's abilities might be. This procedure can be useful with
special children, indeed with almost any subject.
Once a test has been administered in a standardized manner, the examiner can return to specic
items that the child missed, for example, and
provide cues and encouragement for the child
to complete the item. Here the intent is to see
whether in fact a particular child can use cues
to solve a problem, whether encouragement and
support can allow the child to really show and
stretch his or her abilities. Budoff and his colleagues (e.g., Budoff & Friedman, 1964) have in
fact taken testing to the limits and changed it into
a strategy to assess learning potential.
Test-retest stability of preschool measures. One major concern is the low stability of measurement associated with infant behavior. When cognitive tests are administered to children as they enter school at age 6 and are readministered a year later, we typically obtain coefficients in the .70s or higher. But with infants and preschoolers, such correlations over a year's time span range from zero to the high .50s. Interestingly, these coefficients are substantially higher for handicapped children (Kamphaus, 1993). The instability is a reflection of the behavior rather than the measure, but it nevertheless creates problems.
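The stability coefficients discussed here are ordinary Pearson correlations between the same children's scores on the two occasions. A minimal sketch, with invented scores:

```python
def stability(time1, time2):
    """Test-retest stability: Pearson r between scores on two occasions."""
    n = len(time1)
    m1, m2 = sum(time1) / n, sum(time2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    s1 = sum((a - m1) ** 2 for a in time1) ** 0.5
    s2 = sum((b - m2) ** 2 for b in time2) ** 0.5
    return cov / (s1 * s2)

# If every child gains the same number of points at retest, rank order is
# preserved and the stability coefficient is still a perfect 1.0.
entry = [95, 102, 110, 118, 125]
later = [score + 4 for score in entry]
print(round(stability(entry, later), 2))   # 1.0
```

Note that the coefficient indexes consistency of rank order, not of level; a uniform practice gain leaves it untouched, whereas the shifting rank orders typical of infant behavior drive it toward zero.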
dominated by two such batteries: the Reitan batteries, i.e., the Halstead-Reitan Neuropsychological Test Battery for children 9 to 14 and the Reitan-Indiana Test Battery for children aged 5 through 8 (Reitan, 1969; Reitan & Davison, 1974), and the Luria-Nebraska Children's Battery (Golden, 1981). We illustrate the Luria-Nebraska next.
This approach has been criticized because of the
time and cost requirements, because many of
the important behaviors are evaluated only in
a cursory manner, and because the procedures
used tend to be redundant (e.g., Goldstein, 1984;
Slomka & Tarter, 1984; Sutter & Battin, 1984).
A second approach consists of using a combination of traditional psychological and educational tests. Tests such as the K-ABC, the
Stanford-Binet, and the WISC-R are often used
as the main measure, together with other scales
that may measure oral and written language
skills (e.g., the Peabody Picture Vocabulary Test),
motor and visual-motor skills (e.g., the Developmental Test of Visual Motor Integration), academic achievement (discussed in Chapter 13),
and aspects of social-emotional behavior (e.g.,
the Child Behavior Checklist).
A third approach, of course, is to combine
the two. For example, Bigler and Nussbaum
(1989) describe the test battery used at a neurological clinic in Texas. The battery includes
selected sub-tests from the Halstead-Reitan batteries, the Reitan-Aphasia Screening Battery, the
Wide Range Achievement Test, the Boder Test
of Reading and Spelling Patterns, the Durrell
Analysis of Reading Difculty, the Beery Test
of Visual/Motor Integration, Ravens Coloured
Progressive Matrices, the WISC-R, a family history questionnaire, the Child Behavior Checklist, the Personality Inventory for Children, projective drawings, and a behavioral observation
inventory; additional measures, such as the KABC, are included as needed. For a thorough and
sophisticated critique of measurement and statistical problems associated with neuropsychological assessment of children, see Reynolds (1989).
major approaches to neuropsychological assessment in children. The rst is to use a standardized battery of tasks that were designed to identify brain impairment. The eld seems to be
developed by administering the adult version to children aged 5 to 12. Initially the authors found that
children 8 years and older could do a majority
of the procedures in the adult battery, and that
for 13- and 14-year-olds the adult battery was
quite appropriate. The authors therefore decided
to create a children's battery for ages 8 to 12. This was done by eliminating difficult items from the adult battery, substituting easier items where possible, and adding new items where needed. Three
versions of the battery were investigated, and the
fourth version was published (Golden, 1989).
The assignment of each item to one of the
11 basic scales was done on the basis of the
author's clinical judgment, followed by a correlational analysis. A further factor analysis on a separate sample of brain-damaged and normal children seemed in general to substantiate item
placement. However, further factor analyses of
each scale alone resulted in the creation of a new
set of 11 scales that, judging by their titles, have
some overlap but are not identical with the original 11 scales.
Each of the 11 scales is said to be multifactorial
in structure. Each scale covers not just a specific skill but a domain of skills in a given area. For example, the motor scale measures fine motor speed as well as unilateral and bilateral coordination, imitation skills, verbal control of motor movements, and construction skills.
Administration. The Luria-Nebraska requires a
skilled examiner and is typically used to assess
children who have known or suspected brain damage. Testing children with learning disabilities is now fairly common, given that the definition of learning disabilities includes minimal brain dysfunction. So it is quite appropriate to ask questions about the validity of the Luria-Nebraska with such children. Several studies have obtained findings supportive of the validity of the Luria-Nebraska (e.g., Geary & Gilger, 1984; Nolan, Hammeke, & Barkley, 1983), but primarily with the language and academic achievement subtests, which are adequately assessed by other instruments (Hynd, 1988).
A rather interesting study is reported by Snow
and Hynd (1985a). They administered the Luria-Nebraska, the WISC-R, and an achievement test to 100 children who had been previously identified as learning disabled on the basis of a discrepancy between aptitude and achievement; that is, they achieved less than what their abilities would predict. The authors analyzed the results using Q-factor analysis. In this type of factor analysis, rather than seek out the fewest dimensions among the test variables, the Q-technique clusters subjects with similar test score patterns. The intent then is to identify subgroups of subjects who belong together on the basis of the similarity of their test scores. The authors
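The core idea of the Q-technique can be sketched in a few lines: instead of correlating test variables across subjects, correlate subjects across their test scores and group those whose profiles agree. The sketch below uses simulated data, and the simple correlation-cutoff grouping stands in for the actual factor-analytic clustering; the `cluster` helper, the .80 cutoff, and the score profiles are all hypothetical illustrations, not the procedure Snow and Hynd used.

```python
import numpy as np

# Q-technique sketch: correlate SUBJECTS across their test scores
# (rather than variables across subjects), then group subjects whose
# score profiles agree. Data and grouping rule are hypothetical.
rng = np.random.default_rng(0)
profile_a = np.array([90, 60, 80, 55])   # one score pattern
profile_b = np.array([55, 85, 60, 90])   # a contrasting pattern
scores = np.vstack([profile_a + rng.normal(0, 3, 4) for _ in range(3)] +
                   [profile_b + rng.normal(0, 3, 4) for _ in range(3)])

subject_corr = np.corrcoef(scores)       # 6 x 6 subject-by-subject matrix

def cluster(corr, cutoff=0.8):
    """Greedy grouping: put together subjects whose profiles correlate above cutoff."""
    groups, assigned = [], set()
    for i in range(len(corr)):
        if i in assigned:
            continue
        group = [j for j in range(len(corr)) if corr[i, j] > cutoff]
        assigned.update(group)
        groups.append(group)
    return groups

print(cluster(subject_corr))   # two subgroups with similar score patterns
```

With real data one would factor the subject-by-subject correlation matrix rather than apply a fixed cutoff, but the transposition of the data matrix is the essential point.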
Special Children
validity of the CBCL, such as studies comparing normal children with children referred to
a clinic. The discriminative power of the test is
fairly high. By using the 90th percentile of the
behavior-problem scores, and the 10th percentile
of the social-competence scores, the authors were
able to correctly classify 91.2% of the normal children and 74.1% of the referred children.
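A percentile cutoff rule of this kind is easy to make concrete. The sketch below uses simulated, roughly T-score-like normative data, not the CBCL norms; the distributions and the `classify` helper are assumptions for illustration only.

```python
import numpy as np

# Percentile cutoff rule sketch (simulated normative data, not CBCL norms):
# flag a child if the behavior-problem score is above the normative 90th
# percentile or the social-competence score is below the 10th percentile.
rng = np.random.default_rng(1)
norm_problems = rng.normal(50, 10, 1000)     # normative behavior-problem scores
norm_competence = rng.normal(50, 10, 1000)   # normative social-competence scores

p90 = np.percentile(norm_problems, 90)
p10 = np.percentile(norm_competence, 10)

def classify(problem_score, competence_score):
    if problem_score > p90 or competence_score < p10:
        return "referred"
    return "normal"

print(classify(72, 48))   # behavior problems well above the 90th percentile
print(classify(51, 49))   # both scores within the normal range
```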
Concurrent validity results, by way of correlations with other rating scales, are also quite positive, with typical coefficients in the high .80s and
low .90s. The CBCL is a very popular instrument
and has been used in hundreds of studies.
Norms. The original CBCL provided norms on
children between ages 4 and 16, but the 1991 revision was extended upward through age 18, and a
separate form for 2- and 3-year-old children was
developed. The norms are based on some 1,300
children, and T scores can be obtained for the
subscales.
Evaluation. Reviewers of the CBCL see this
instrument as well documented psychometrically, with adequate reliability and validity (B. J. Freeman, 1985), and as one of the best standardized instruments of its kind (M. L. Kelley, 1985); others characterize the CBCL as one of the most sophisticated and well-researched broad-spectrum behavior rating scales (Merrell, 1994).
The Conners Rating Scales. Another set of scales
that has also proven popular is the Conners Rating Scales. Conners also developed separate scales
for parents and for teachers, designed to identify
behavior problems in school-aged children. The
CRS (Conners, 1990) consists of four scales, two
parent and two teacher rating scales, that have
many items in common and are also conceptually similar. The scales vary in length from 28
to 93 items. Originally developed in 1969 (Conners, 1969), they have been widely used since
then and have been revised as the need arises
and as research ndings accumulate (Conners,
Sitarenios, Parker, et al., 1998). For parents, then, there is the CPRS-93 (Conners Parent Rating Scale with 93 items) and the CPRS-48. For teachers, there is the CTRS-28 and the CTRS-39. The
CPRS-48 and the CTRS-39 seem to be the most
widely used versions. The rating scales were originally developed within an applied research setting, Johns Hopkins University Hospital, and
were intended as norm-referenced instruments
to be used widely. A practical result of this is that
in reading the literature, one nds different versions of these scales, sometimes with different
results as in the number of factors reported.
The Conners Parents' Questionnaire, or CPRS-93 (Conners, 1970), consists of 93 items that
assess a number of behavior problems ranging from hyperactivity to bowel problems. Each item is rated by the parent on a 4-point scale ranging from 0 ("not at all") to 3 ("very much"); all four scales use this format. Factor analysis of this scale yielded six factors, labeled: aggressive-conduct disorder, anxious-inhibited, antisocial, enuresis-encopresis, psychosomatic, and anxious-immature. However, the actual scale is scored on five dimensions: conduct problem, learning problem, psychosomatic, impulsive-hyperactive, and anxiety.
The Conners Teacher Rating Scale, or CTRS-39 (Conners, 1969), consists of 39 items that cover three areas: classroom behavior, group participation, and attitude toward authority. It also includes six subscales such as hyperactivity, conduct problems, and daydream-attention problem. The longer parent version (CPRS-48) contains only five subscales, such as conduct problems, similar to those on the CTRS-39, but with different others such as psychosomatic. The teacher rates each item on a 4-point scale, identical to the one used by parents. Factor analysis suggests four clusters: conduct problem, inattentive-passive, tension-anxiety, and hyperactivity. An abbreviated 10-item scale, using the items from the longer version that are most frequently checked by teachers, is also available. These 10 items include such aspects as restless, impulsive, short attention span, easily frustrated, and has temper outbursts; the scale seems quite useful in identifying hyperactive children.
Reliability. Test-retest reliability for the CTRS-39, at 1-month intervals, ranges from .72 to .91, but drops to .33 to .55 at 1-year intervals (R. A. Glow, P. A. Glow, & Rump, 1982).
Interrater agreement on the CTRS-39 ranges
from a low of .39 to a high of .94 on different
subscales. Agreement between parents on the
CPRS-48 averages in the low .50s. Sandberg, Wieselberg, and Shaffer (1980) reported an alpha coefficient of .92 for the 10-item Hyperactivity Index.
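An alpha coefficient such as the .92 reported above is straightforward to compute from an item-response matrix. The sketch below uses simulated 0-3 ratings, not the actual Conners data; the sample size and variance parameters are illustrative assumptions.

```python
import numpy as np

# Cronbach's alpha sketch:
#   alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
def cronbach_alpha(items):
    """items: subjects x items matrix of ratings."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated 10-item scale rated 0-3, driven by one latent severity trait,
# so the items hang together and alpha comes out high
rng = np.random.default_rng(2)
trait = rng.normal(1.5, 0.8, size=(200, 1))
ratings = np.clip(np.round(trait + rng.normal(0, 0.5, size=(200, 10))), 0, 3)
print(round(cronbach_alpha(ratings), 2))   # high internal consistency
```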
Validity. Numerous studies have supported the ability of the Conners scales to differentiate various diagnostic groups, such as learning-disabled, hyperactive, and juvenile-delinquent children, from their normal counterparts (e.g., Merrell, 1990).
Some reviewers note that evaluating the reliability and validity of the Conners scales is difficult because of the many forms, and they believe that other instruments such as the CBCL are psychometrically superior (e.g., Witt, Heffer, & Pfeiffer, 1990). The response choices of "just a little," "pretty much," and "very much" have been criticized as ambiguous and ill defined: what may be "pretty much" to one person may be "just a little" or "very much" to another. Many of the scale items have also been criticized because they are too abstract (e.g., "submissive") or contain two separate aspects (e.g., "temper outbursts, explosive and unpredictable behavior").
McCarthy Scales of Children's Abilities. Finally,
short form of the McCarthy for rapid screening of preschool, kindergarten, and first-grade children; Taylor, Slocumb, and O'Neill (1979) have proposed another short form by identifying the six subtests that correlated most highly with the GCI in a sample of 50 kindergarten children. For a comparison of three short forms, see Harrington and Jennings (1986). A set of six subtests has also been published as a separate test called the McCarthy Screening Test (McCarthy, 1978).
Interesting aspects. One interesting aspect of
Czeschlik, T. (1992). The Middle Childhood Temperament Questionnaire: Factor structure in a German sample.
In this study, the cross-cultural validity of a temperament questionnaire was assessed through factor analysis. The results do not support the validity of this instrument. The author concludes that if temperament research is to make progress, there is a need for sound measuring instruments.
Waksman, S. A. (1985). The development and psychometric properties of a rating scale for children's social skills. Journal of Psychoeducational Assessment, 3, 111-121.
A readable article, illustrating the development of a rating scale to assess children's social skills. You might want to compare and contrast this article with the one by Clark, Gresham,
DISCUSSION QUESTIONS
10 Older Persons
AIM This chapter looks at testing older persons. We first discuss some basic issues, such as defining who is an older person, practical issues of testing, and some general comments related to personality, cognitive, and attitude testing. Then we look at a number of specific areas of relevance: attitudes toward older persons, anxiety about aging, life satisfaction, marital satisfaction, morale, coping, death and dying, neuropsychological assessment, memory, and depression.
Currently, there is great interest in older persons, in part because the number of older persons and their relative frequency within the general population have increased substantially (John, Cavanaugh, & Krauss-Whitbourne, 1999). Exactly who is an older person? Chronological age is often used as the criterion, with age 65 as the cutoff point. Terms such as the young-old (ages 65 to 74) vs. the old-old (75 years and older) have been proposed (Neugarten, 1974), as related to various physical and sociopsychological characteristics. The term aging, as J. E. Birren and B. A. Birren (1990) so well state, implies something that is associated with chronological age but not identical with it. Thus the term is used in two ways: as an independent variable used to explain other phenomena (e.g., there are changes that occur in intellectual processes as one gets older) and as a dependent variable explained by other processes (e.g., lack of support by others creates aging difficulties). In contrast to chronological age, functional age has been suggested (Salthouse, 1986), i.e., the person's ability to be involved in directed activities, intellectual pursuits, and so on.
The fact that the population of the United
States is aging presents a number of challenges
Testing problems. D. Gallagher, Thompson, and
Suppose that we wanted to assess someone's ability to swim. We could ask them a relevant set
of college students toward old age. This scale consisted of 137 statements that covered 13 different categories, such as conservatism (e.g., "they are set in their ways"), mental deterioration (e.g., "they are absent-minded"), and personality traits (e.g., "they are kind"). The items were developed through a series of unstructured interviews with a small sample of adults, as well as discussions with social workers, study of case records of elderly clients, and a review of the literature. However, no item statistics are given as to how the final items were selected.
The scale can be administered in a group setting. There is no time limit, and completion typically takes 15 to 30 minutes. The authors experimented with a response scale of 0 to 100, where the respondent was instructed to use these numbers as percentages, i.e., if you think that 90% of older people are characterized by this statement, give that statement a response value of 90. The authors concluded that a yes-no response format was preferable, as it took less time and simplified the instructions.
Although the scale and various subsets of items
were used in a variety of studies with samples
ranging from college undergraduates to older
persons, the authors did not seem particularly
concerned about reliability, and so little reliability
evidence is available (Axelrod & Eisdorfer, 1961; Bekker & Taylor, 1966; Kilty & Feld, 1976; Lane, 1964; Tuckman & Lorge, 1953, 1958). In one
study, the scale was administered at the beginning of a course on the psychology of the adult, and a subset of 30 items was readministered with the final examination; the obtained r was .96. Spearman-Brown reliability coefficients from .73 to .88 are also reported.
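Spearman-Brown coefficients like these come from a simple prophecy formula: the reliability expected when a test is lengthened (or shortened) by a factor n. A minimal sketch, with an illustrative value from the range reported above:

```python
# Spearman-Brown prophecy formula: predicted reliability r_n when a test
# with reliability r is lengthened by a factor n (n = 2 doubles the test,
# which is how a split-half correlation is stepped up to full length).
def spearman_brown(r, n):
    return (n * r) / (1 + (n - 1) * r)

# e.g., stepping up a split-half correlation of .73 to full test length
print(round(spearman_brown(0.73, 2), 2))   # -> 0.84
```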
Although this scale has been used in one form or another in a variety of studies, the studies were typically isolated and do not present the cohesive portrait required by construct validity. In part, the difficulty may lie in the lack of a well-articulated theory about attitudes toward older persons. Axelrod and Eisdorfer (1961) administered the scale to a class of college students, with random fifths of the class asked to respond to different age groups (35, 45, 55, 65, and 75). If we assume that the negative stereotype of aging increases with the age of the stimulus target, then the sensitivity of the scale to such increases could be seen as evidence of the construct validity of
Table 10.1. Multidimensional Model of Anxiety About Aging (Lasher & Faulkender, 1993)
Fears: of aging (the process of aging), of being old (the state of being old), and of old people (the perception of others)
Dimensions of anxiety:
1. Physical: e.g., perceived changes in physical appearance as one gets older; worries about health
2. Psychological: e.g., self-esteem and life satisfaction; degree of personal control; fear of memory loss
3. Social: e.g., economic and social losses; worries about retirement
4. Transpersonal: e.g., search for the meaning of life; religious issues
Lasher and Faulkender (1993) proposed that anxiety about aging is a separate dimension from
other forms of anxiety such as death anxiety or
state-trait anxiety. They felt that this concept of aging anxiety is important both because it helps us understand how we react to the elderly and because it has not been adequately researched.

These authors began the development of their scale with a theoretical model composed of two major dimensions, as illustrated in Table 10.1. The intersection of the two dimensions yields 12 cells, and the authors used this theoretical blueprint to generate 7 items per cell, for a total of 84 items.
These items used a Likert 5-point response scale,
with half of the items phrased positively and half
of the items phrased negatively. The authors then asked three psychology graduate students to sort the items in terms of the two dimensions. This was done twice, and the authors concluded that no new items were needed, although no statistical data are offered in support of this conclusion. Note that this represents a variation from the usual procedure, where a pool of items is first developed and then a subset of the best items is chosen on the basis of some preliminary decision rules.
The 84-item Aging Anxiety Scale (AAS) was then administered to 312 volunteers, ranging in age from below 25 to over 74. A series of factor analyses were performed, and finally 20 items were retained, reflecting four factors, each composed of 5 items: fear of old people, psychological concerns, physical appearance, and fear of losses. (It is interesting to note that only 6 of the 20 items are worded negatively and that 5 of these 6 occur on the fear-of-losses factor.)
Although these results do not support the initial theoretical framework, the authors felt that the obtained 20-item AAS is a potentially useful scale. Total scores on the scale show higher mean scores on the part of males than females, and the four factors intercorrelate with each other, but not substantially, with coefficients ranging from .20 to .39. Scores on the AAS correlate significantly with two other measures of aging, and the overall pattern does suggest some construct validity.
LIFE SATISFACTION
One of the active areas of research with older persons has to do with their perceived life satisfaction
(subjective well-being, happiness, etc.). There are
a number of interesting issues in this area, such as
how happy are older Americans relative to some
comparison group, what are the sources of satisfaction and dissatisfaction, and what are the
dimensions and relative importance of such life
satisfaction (Doyle & Forehand, 1984).
A number of studies have found a negative relationship between age and self-reported happiness; that is, life satisfaction decreases with age (e.g., Bradburn & Caplovitz, 1965; Robinson & Shaver, 1973). However, other researchers (Herzog & Rodgers, 1981) have reported that such a relationship can be positive, that obtained correlation coefficients tend to be small, and that whether the results are positive or negative tends to be a function of how the variables are defined. For example, A. Campbell, Converse, and Rodgers (1976) found that younger people reported feeling happier than older persons, but reported lower life satisfaction.
A number of investigators have attempted to define and measure the psychological well-being of older people, quite often with the intent of using such a measure as an operational definition
The life satisfaction of older people is substantially related to their marital relationships (e.g., Dorfman & Moffett, 1987; Medley, 1980). S. N. Haynes and his colleagues (1992) felt that the marital satisfaction inventories available were not fully appropriate for older individuals, and therefore carried out a series of five studies to develop such a questionnaire, which they called the MSQFOP (Marital Satisfaction Questionnaire for Older Persons).
Study 1. The first step was to generate a pool of items based on the available literature, on other available questionnaires, and on structured interviews with a small sample of older individuals and professional workers. This procedure yielded an initial pool of 120 items that were then reviewed to remove redundancies, etc., and resulted in a preliminary version of 52 items answered on a 6-point response scale from "very dissatisfied" to "very satisfied." Note that the even number of responses was chosen purposely to minimize a central response tendency. The 52-item version was then administered to 110 older married persons, whose mean age was 69.9 years. An item analysis was then undertaken, and items were eliminated if less than 5% of the sample
MORALE
COPING OR ADAPTATION
Neuropsychological assessment basically involves the assessment of cognitive and behavioral factors that reflect neurological disease. As Kaszniak (1989) states, neuropsychological evaluation is playing an increasingly important role in the assessment of older adults. A wide variety of measures and approaches are used in neuropsychological testing, but at the risk of oversimplification, we can identify the following major categories of tests (as listed in Schmitt & Ranseen, 1989):
1. Brief screening procedures. A number of
procedures have been developed that are brief
and are used primarily as screening procedures
to be followed by more extensive testing where
appropriate. At the same time, these procedures are often used for other purposes, ranging from the assessment of changes over time to
tion. That is, it involves grossly impaired thinking, memory lapses, and faulty reasoning. Thus the term dementia does not refer to a single illness, but to a group of conditions, all of which involve the same basic symptoms, namely a progressive decline in intellectual functions. Two of these conditions account for most of the patients: one is Alzheimer's disease, where the neurons of the brain deteriorate, and the other is multi-infarct dementia (an infarct is a small stroke, and this condition is due to the effects of many small strokes that damage brain tissue).

Many current efforts are aimed at the diagnosis of Alzheimer's. For example, Volicer, Hurley,
DEPRESSION
There seems to be agreement that depression represents a major public health problem and that
depression can and does occur late in life, with
high rates of depression in clients over the age of
65. Thus, depression seems to be the most common functional psychiatric disorder among older
persons, although there is some question whether
the prevalence of depression increases with age. A
number of authors point out that what is called depression in older persons may in fact represent reactions to the economic and social difficulties they encounter, grief over the loss of friends and family, and reactions to physical illness and problems.
[Table: memory factors assessed, with an example item for each]
1. Remote personal memory
2. Numeric recall
3. Everyday task-oriented memory
4. Word recall/semantic memory
5. Spatial/topographic memory
The literature suggests that somatic complaints are more prominent in older depressed patients than in younger individuals. However, complaints of fatigue, pain, or lack of energy, which in a younger person may be reflections of depression, may in an older person be realistic evidence of being old, and not necessarily of being depressed. Part of the complexity of assessing depression in older persons is that many self-report scales of depression contain items that have to do with somatic symptoms, such as sleep disturbances and diminished energy levels. Because older persons do tend to have more physical illnesses, endorsement of these items may not necessarily be reflective of depression (Blazer, Hughes, & George, 1987; Himmelfarb, 1984; Newman, 1989).
A wide variety of procedures are used to assess
depression in older persons. These include the
depression scales discussed in Chapter 7 and others, particularly the three most common scales,
the Beck Depression Inventory (A. T. Beck, Ward,
Mendelson, et al., 1961), the Hamilton Rating
Scale for Depression (Hamilton, 1960), and the
Zung Self-Rating Depression Scale (Zung, 1965).
Other approaches include multivariate instruments such as the MMPI, projective tests such as
the Gerontological Apperception Test (R. L. Wolk
& R. B. Wolk, 1971) and structured interviews,
such as the Schedule for Affective Disorders and
Schizophrenia (Spitzer & Endicott, 1977).
SUMMARY
[Table examples of memory complaints: forgetting a word; having trouble concentrating; going into a room and forgetting why; forgetting an appointment; failing to recognize others]
Costa, P. T., & McCrae, R. R. (1984). Concurrent validation after 20 years: The implications of personality stability for its assessment. In N. W. Shock, R. G.
Greulich, R. Andres, D. Arenberg, P. T. Costa, E. G.
Lakatta, & J. D. Tobin (Eds.), Normal human aging:
The Baltimore longitudinal study of aging (NIH Publication No. 84-2450). Washington, D.C.: U.S. Public
Health Service.
What happens to personality as a person ages? One answer was given by what are called the Kansas City Studies (see Neugarten & Associates, 1964): as people aged, they became more preoccupied with themselves and more emotionally withdrawn, a pattern that eventually became known as disengagement theory. This suggested reading, covering what is known as the Baltimore study, gives a different answer: there is personality stability as one ages.
Libman, E., Creti, L., Amsel, R., Brender, W., & Fichten, C. S. (1997). What do older good and poor sleepers do during periods of nocturnal wakefulness? The Sleep Behaviors Scale: 60+. Psychology and Aging, 12, 170-182.
DISCUSSION QUESTIONS
AIM What are the problems associated with using psychological tests with minority individuals and those of another culture? If, for example, we wish to administer the WISC-R to a black child, or we translate the test into French for use with French children, will the test still be valid? Basically, this is the issue we look at in this chapter.
INTRODUCTION
In this chapter, we look at cross-cultural testing, that is, at some of the ways in which culture and testing can interact. We use the term culture in two different ways: (1) to delineate people living in different countries, for example, the United States vs. the People's Republic of China; and (2) to refer to minority groups within a particular country, for example, blacks and Hispanics living in the United States. There are of course many ways of defining culture. For our purpose, we can define culture as a set of shared values and behaviors that include beliefs, customs, morals, laws, etc., that are acquired by a person, shared in common with other members who are typically in close proximity, but different from those held by others who often live in a different geographical setting (D. W. Sue & D. Sue, 1990).
MEASUREMENT BIAS
The cultural test-bias hypothesis. This hypothesis contends that group differences on mental tests are due to artifacts of the tests themselves and do not reflect real differences between groups that differ on such demographic variables as ethnicity, race, or socioeconomic status. There is, in fact, almost no evidence to support such a hypothesis with tests that have been carefully designed, such as the Stanford-Binet or the Wechsler tests (A. R. Jensen, 1980).
English is not the first language for a substantial number of students in the United States, and they have limited English proficiency. Their number seems to be growing and, because of increased use of standardized tests in school systems, there is great concern that test results for these students may either be less valid or misused.

Lam (1993) indicates that test developers, particularly of standardized achievement tests, make five assumptions: (1) test takers have no linguistic barriers that might interfere with their performance on the test, i.e., they can follow instructions, understand the test items, and have adequate time to complete the test; (2) the test content is suitable and of appropriate difficulty level for the test taker; (3) test takers have the required test sophistication for taking standardized achievement tests; (4) test takers are properly motivated to do well on the test; and (5) test takers do not have strong negative psychological reactions (such as anxiety or feeling stressed) to testing.
Lam (1993) feels that these assumptions may at least be questionable with language-minority students, and therefore their test results may not be as reliable and valid. This issue is a well-recognized one and in fact is incorporated in the Standards for Educational and Psychological
Testing discussed in Chapter 1. Part of the solution consists of strategies that reduce the probability that any of the five assumptions are violated. These include translating tests into the child's native language, developing ethnic-specific norms, developing tests that accommodate the cultural differences of various people, extending time limits, using items that are relevant to the minority culture, and so on.
Matluck and Mace (1973) presented a number of suggestions regarding tests to be used with Mexican-American children. With regard to format, such tests should assess separately the child's receptive ability (e.g., listening comprehension) vs. productive ability. The test should use appropriate stimuli; for example, for younger children pictorial-visual stimuli are most likely appropriate, but verbal-auditory stimuli are not. Similarly, the number of items and the administration time should be appropriate; for example, for most children 15 to 25 minutes is considered appropriate. With regard to content, the items should be simple in language and not require linguistic skills that are beyond the child's age. Items should not have language or cultural bias. Concerning test materials, the authors point out that sometimes impressionistic or sketchy line drawings are used as stimuli, and a child may not have the appropriate experiential background to deal with such materials; actual objects or photographs might be better. Finally, with respect to the test examiner, the authors point to a need to be sensitive as to whether the examiner's gender, degree of experience, physical appearance as to ethnicity, and so on might influence the test performance of the child.
External vs. internal criteria. A number of
number of steps can be taken to attempt to eliminate bias as a test is being developed. First, a sensitive and knowledgeable test writer can eliminate
obviously biased items. Item statistics can be easily collected, and those items that are related to race, gender, or other irrelevant variables can be eliminated. Sometimes, matched samples are used for this purpose; that is, the pool of items is administered to a black sample and a white sample that have been matched on ability. The difficulty level of each item is computed for the two samples separately, and items that show a difference in difficulty rates are eliminated. It should be obvious, however, that when we match the samples we no longer have representative samples; also, an item that has different difficulty rates in different groups is not necessarily a biased item. We can compute the correlation of an item with the criterion and determine whether the item predicts the criterion equally well in the two samples (although item-criterion correlations are typically quite low).
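The matched-sample item screening described above can be sketched in a few lines. The data below are simulated purely for illustration (no real test data are involved): two ability-matched groups answer the same five items, with one item made artificially harder for the second group, and comparing per-group difficulty (the proportion passing) surfaces that item.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(ability, harder_item=None):
    """Simulate 0/1 item responses for 5 items of increasing difficulty."""
    p = 1.0 / (1.0 + np.exp(-(ability[:, None] - np.linspace(-1, 1, 5))))
    if harder_item is not None:
        p[:, harder_item] *= 0.6  # this item alone is harder for this group
    return (rng.random(p.shape) < p).astype(int)

# Two samples matched on ability (drawn from the same distribution)
group_a = simulate_responses(rng.normal(0, 1, 400))
group_b = simulate_responses(rng.normal(0, 1, 400), harder_item=2)

# Item difficulty = proportion of examinees passing each item
diff_a = group_a.mean(axis=0)
diff_b = group_b.mean(axis=0)
gap = diff_a - diff_b
print("difficulty gaps:", np.round(gap, 2))
print("item to inspect:", int(np.argmax(gap)))
```

As the text cautions, an item flagged this way is not necessarily biased; the comparison only identifies candidates for closer inspection.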
If a test is biased, that test results in systematic error related to one group but not to another. Specifically, there are two situations that can occur. The first is differential validity, or slope bias. Here we have a situation where the relationship between test scores and criterion scores (for example, SAT scores and predicted GPA) is substantially greater in one ethnic group than in another. Because the correlation, or regression line, is represented statistically by the slope of the graphed data, this is called slope bias. In fact, the literature suggests that slope bias is the result of poor experimental procedure, differences in the sizes of samples from majority and minority populations, or chance findings (e.g., J. E. Hunter, Schmidt, & R. F. Hunter, 1979).
To examine slope bias we look at group differences by using a regression equation to predict
from test scores (e.g., IQ) to criterion scores (e.g.,
GPA). Two questions are typically asked here:
(1) Can the same regression equation be used
for the two different groups? and (2) Does the
regression equation overpredict or underpredict
for either group?
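These two questions can be checked directly: fit the regression in each group and compare slopes and intercepts, then apply one group's equation to the other group and look at the mean residual. The data below are simulated under the assumption of no bias (the same true equation in both groups, with different mean test scores), so the fitted lines should agree and the mean residual should be near zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Simulated test scores (SAT-like) and GPA; same true relationship in both groups
x1 = rng.normal(500, 100, 500)
y1 = 1.0 + 0.004 * x1 + rng.normal(0, 0.3, 500)
x2 = rng.normal(450, 100, 500)          # lower mean score, same relationship
y2 = 1.0 + 0.004 * x2 + rng.normal(0, 0.3, 500)

b1, a1 = fit_line(x1, y1)
b2, a2 = fit_line(x2, y2)
print(f"group 1: GPA = {a1:.2f} + {b1:.4f} * score")
print(f"group 2: GPA = {a2:.2f} + {b2:.4f} * score")

# Question 2: does group 1's equation over- or underpredict for group 2?
mean_residual = float(np.mean(y2 - (a1 + b1 * x2)))
print(f"mean residual for group 2: {mean_residual:+.3f}")  # near zero here
```

A clearly positive mean residual would indicate underprediction for group 2, and a clearly negative one overprediction; systematic departures in either direction signal bias.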
Underprediction means that the estimated criterion value is lower than the actual value, while
overprediction means that the estimated criterion value is higher than the actual value. If
under- or overprediction occurs in a systematic
way, there is bias present.
In the second situation, the majority group
obtains a higher mean score on the test than
the minority group; the bias comes in that both
groups do equally well on the criterion. Thus
for example, if we could show that on a college
entrance exam whites obtained a mean score of
600 and blacks obtained a mean score of 400,
and both groups did equally well on academic
achievement as measured by grades, for example,
then we would have what is called intercept bias
(the term again referring to the regression line in a
bivariate graph). In this case, the predictive validity coefficients for each sample would be approximately equal, so that test scores would be equally
predictive of criterion performance. However, if
a college admissions committee were to use a particular cutoff score to admit or reject applicants, a
greater proportion of minority applicants would
be rejected. In fact, there is no evidence that supports intercept bias, and some studies have shown that there is a slight to moderate bias in favor of the minority group (e.g., Duran, 1983; J. E. Hunter, Schmidt, & Rauschenberger, 1977).
Intercept bias is often assessed through the analysis of item difficulty. Obviously, if there are mean differences between samples, there will be differences in item difficulty; in fact, we can think of the mean as reflecting average difficulty. To determine item bias we look for items that do not follow the expected pattern. Thus, for a minority sample, one item may be substantially more difficult than similar items; such an item needs to be inspected to determine what is causing these results, and it should possibly be eliminated from the test.
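One way to operationalize "items that do not follow the expected pattern" is to plot one group's item difficulties against the other's and flag large residuals from the fitted line. The difficulty values below are invented for illustration: every item is somewhat harder for the second sample (a mean difference), but one item departs sharply from that overall pattern.

```python
import numpy as np

# Hypothetical proportions passing each of 8 items in two samples.
majority = np.array([0.90, 0.80, 0.72, 0.65, 0.55, 0.48, 0.40, 0.33])
minority = np.array([0.82, 0.71, 0.64, 0.58, 0.47, 0.18, 0.31, 0.25])
# All items are a bit harder for the minority sample, but the item at
# index 5 is far harder than the overall pattern predicts.

slope, intercept = np.polyfit(majority, minority, 1)
residuals = minority - (intercept + slope * majority)
z = residuals / residuals.std()

flagged = np.where(np.abs(z) > 2)[0]
print("items to inspect:", flagged.tolist())
```

Note that the uniform shift itself is absorbed by the fitted line; only the item that breaks the pattern is flagged, matching the logic in the passage above.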
In fact, when these procedures are used in an objective, scientific manner, the finding is that cognitive tests are generally not biased against minorities.
Bias and decision theory. Tests are used by educational institutions and by some businesses to screen applicants for admission or hiring. Not entirely for unselfish reasons, most of these institutions would like to make decisions that are
It is sometimes argued that test bias is not so much a matter of the test being biased; rather, the bias occurs in the use of test results to label minority children and to place a disproportionate number of these in special-education programs, that is, to label these children as mentally retarded. In fact, the empirical evidence argues just the opposite. A number of studies show that black and low-socioeconomic-class children are less likely to be recommended for special-education-class placement than their white or higher-socioeconomic-class peers (C. R. Reynolds, 1982).
CROSS-CULTURAL ASSESSMENT
test-taking behavior. A test may well have content validity in one country, but not in another.
Holtzman (1968) divides such potential problems into three categories: (1) cross-national
differences, (2) cross-language differences, and
(3) subcultural differences (such as ethnic origin and degree of urbanization). He suggests that
studies can be undertaken that control for one or
more of these categories. For example, a comparison of college students in Mexico, Venezuela,
If we wanted to determine the utility of a particular test that was developed in the United States
in another country, such as Venezuela, we would translate the test into Spanish and
carry out the necessary assessments of reliability and validity in Venezuela. In fact, most of
the major commercially published tests, such
as the Stanford-Binet, the WISC, the MMPI,
and the CPI, are available in various languages.
However, translation is not a simple matter of
looking up the equivalent words in a dictionary.
Common objects in one culture may be uncommon in another; bland words in one language
may have strong affective meanings in another.
Phrases may be translated correctly in a literary
sense, but not be colloquially correct; maintaining the language equivalence of an instrument
across languages can be difficult.
Brislin (1970) suggested that the procedure to be used, called the back translation method, follow certain steps. The items to be translated should be simple, should not use hypothetical phrasings or the subjunctive mood, and should avoid metaphors and colloquialisms. The items are translated from the source language to the target language by a bilingual person. A second bilingual person translates the items back from the target language to the source language. The two source-language versions are then compared; any discrepancies can hopefully be resolved.
There is also a process known as decentering, which refers to a translation process in which both the source and the target language versions are equally important; i.e., both the source and the target versions contribute to the final set of questions, and both are open to revision.
A number of other procedures are also used. In addition to the back translation, bilinguals can take the same test in both languages. Items that yield discrepant responses can easily be identified and altered as needed or eliminated. Rather than use only two bilingual individuals to do the back translation, several such individuals can be used, either working independently or as a committee. It is also important to pretest any translated instrument to make sure that the translated items are indeed meaningful.
Quite often, tests that are developed in one
culture are not simply translated into the target
language, but they are changed and adapted. This
is the case, for example, with the Cuban version
of the Stanford-Binet (Jacobson, Prio, Ramirez,
et al., 1978).
One item asks: What language(s) do you read and speak? (a) only Spanish, (b) Spanish better than English, (c) both equally, (d) English better than Spanish, (e) only English. Other questions with similar response options ask what language(s) are spoken at home, with friends, as a child, and so on. The items were placed within the context of a 16-page questionnaire and administered to a sample of 363 Hispanics and 228 non-Hispanic whites. Both English and Spanish versions were available.
The responses for the two samples were factor analyzed separately. For the Hispanic sample three factors were obtained. The first accounted for 54.5% of the variance and was called language use and ethnic loyalty. This factor was made up of seven items that measured language use and the ethnicity of important others. The second factor accounted for only 7% of the variance and included four items on preference for media (e.g., Spanish-language TV). The third factor accounted for 6.1% of the variance and included four items that measured the ethnicity of friends for self and for one's children. Similar results were obtained for the non-Hispanic white sample, except that the second and third factors accounted for a greater portion of the variance. On the basis of some additional statistical decisions, the authors chose 12 items as their final scale. The 12-item scale was then analyzed for reliability (alpha coefficient = .92) and for validity. Scores on the scale correlated significantly with generational distance, with length of residence in the United States (taking age into account), and with the respondents' own evaluation of their degree of acculturation.
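Coefficient alpha, the reliability index reported for this 12-item scale, is straightforward to compute from an item-score matrix. Here is a minimal sketch with simulated ratings; the data are invented and only the formula matters.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(2)
trait = rng.normal(0, 1, 400)                          # latent level per respondent
items = trait[:, None] + rng.normal(0, 0.8, (400, 12)) # 12 correlated items
print(round(cronbach_alpha(items), 3))                 # high alpha: consistent items
```

Because all 12 simulated items reflect the same latent variable plus noise, alpha comes out high, much as it did for the actual 12-item acculturation scale.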
The Olmedo, Martinez, & Martinez (1978) Scale.
Another scale is the Acculturation Rating Scale for Mexican Americans, or ARSMA (Cuellar, Harris, & Jasso, 1980). The ARSMA consists of 20 questions, each scored on a 5-point scale ranging from Mexican/Spanish to Anglo/English. For example, one item asks what language you prefer. Available responses are (a) Spanish only; (b) mostly Spanish, some English; (c) Spanish and English about equally; (d) mostly English, some Spanish; (e) English only. The total scores on this scale yield a typology of five types: very Mexican; Mexican-oriented bicultural; true bicultural; Anglo-oriented bicultural; and very Anglicized. Four factors have been identified on the scale: (1) language preference; (2) ethnic identity and generation removed from Mexico; (3) ethnicity of friends and associates; and (4) direct contact with Mexico and ability to read and write in Spanish. Both internal reliability (alpha = .88) and test-retest reliability (.80 for a 4- to 5-week period) are adequate. This scale and its factors were cross-validated (Montgomery & Orozco, 1984), and the scale has been used in a variety of studies (e.g., Castro, Furth, & Karlow, 1984). Some of the items have been incorporated into a semistructured interview measure of acculturation (Burnam, Telles, Karno, et al., 1987).
SL-ASIA. Most of the acculturation scales that have been developed are for Hispanics. One scale developed for use with Asian respondents is the Suinn-Lew Asian Self-Identity Acculturation scale (SL-ASIA).
The Coloured PM (CPM). The CPM was designed
The Advanced PM (APM). The APM consists of 48 items similar to those on the SPM, but considerably more difficult; the test is intended for children aged 11 and older of above-average intellectual ability. The test consists of two sets of items. Set 1 can be used either as a practice test for Set 2 or as a rough screening test. Set 2 can be used either as a test of intellectual capacity, when administered without a time limit, or as a test of mental efficiency, when used with a time limit. Although the APM has been used in a number of studies (e.g., Ackerman, 1992), there seems to be relatively little psychometric information available. What is available suggests that the APM is much like the other two forms in terms of reliability and validity. One major difference between the APM and the other two is the lack of norms. The test manual reports estimated norms rather than actual norms. Another possible difference concerns the dimensionality of this form. Because the underlying theory is that of Spearman, a construct validity approach would require the PM to be unidimensional, i.e., a pure measure of g. That seems to be the case for the SPM and the CPM, but there seems to be some question about the APM: some studies have reported a single factor (e.g., Alderton & Larson, 1990; Arthur & Woehr, 1993), while others have reported two (Dillon, Pohlmann, & Lohman, 1981). A 12-item short form is also available (Arthur & Day, 1994).
The D-48. Like the Progressive Matrices, the D-48 is a nonverbal measure of general ability; its items are based on sequences of dominoes.
One reaction to the difficult issues of culture-fair tests has been to develop tests that are
culture-specific. Perhaps the best-known example is that of the Black Intelligence Test of Cultural Homogeneity, abbreviated as BITCH-100 (R. L. Williams, 1975). This test is composed of 100 multiple-choice items that assess knowledge about black culture and street wiseness.
The test was actually standardized on black high-school students. In one of the few studies designed to assess the validity of the BITCH-100, the test and the WAIS were administered to 116 white and 17 black applicants to a police department (Matarazzo & Wiens, 1977). As expected, the black applicants performed better on the BITCH-100 and less well on the WAIS than the white applicants. Scores on the two tests did not correlate, and the mean BITCH-100 score for the black applicants fell below the average of the standardization sample of high-school students, despite the fact that these police applicants were better educated and more intelligent as a group than the standardization sample. It would seem that the BITCH-100 is not a valid measure of intelligence, even within the black population, and may not even be a valid measure of street wiseness or knowledge of black slang (A. R. Jensen, 1980). In general, such culture-specific tests do yield higher mean scores for members of that minority, but they have little if any predictive validity. From the viewpoint of understanding, they can yield useful, if restricted, information. From the viewpoint of prediction, for example of scholastic attainment, they are typically not valid.
The TEMAS
STANDARDIZED TESTS
                    First cohort    Second cohort
White                   .38             .42
Asian                   .57             .54
Hispanic                .35             .52
Black                   .36             .40
American-Indian         .40             .42
Bracken, B. A., & Fouad, N. (1987). Spanish translation and validation of the Bracken Basic Concept Scale. School Psychology Review, 16, 94-102.
An interesting article that illustrates several steps that can be taken to assure that a test that is translated from a source to a target language remains equivalent across cultures.
Lam, T. C. M. (1993). Testability: A critical issue in testing language-minority students with standardized achievement tests. Measurement and Evaluation in Counseling and Development, 26, 179-191.
The concept of testability refers to the degree to which a test is appropriate and valid for a particular subgroup, in this case individuals who have limited English proficiency. An interesting and well-written article that discusses testing issues relevant to language-minority students.
Schwarz, P. A. (1963). Adapting tests to the cultural setting. Educational and Psychological Measurement, 23, 673-686.
Although this report is rather old, it still offers interesting insights into the challenges faced by an investigator who wishes to use tests in a different cultural context, in this case Africa.
AIM This chapter looks at testing in the context of disability and rehabilitation. Although we talk about the entire life span, the focus is on adults, as opposed to Chapter 9, where the focus was explicitly on children. We look at three major categories of disability: visual impairment, hearing impairment, and physical impairment. For each category, we look at some of the basic principles and some specific examples as illustrations of these principles.
Individuals with disabilities present a number of challenges with regard to tests. Because such disabilities can involve a wide range of conditions, such as mental retardation, visual impairments, chronic health problems, and so on, the challenges are numerous and varied. For example, typical paper-and-pencil inventories require a fair degree of hand and finger dexterity, so often clients cannot complete these without assistance. Often, disabilities can result in poor self-esteem, a high degree of frustration, depression, denial, poor adjustment, and other psychological aspects that may interfere with and/or color the results of a psychological test. Simeonsson, Huntington, and Parse (1980) stated that it is an unfortunate
The process of diagnostic assessment with rehabilitation clients can be described as having four stages. The first is to determine the relevant issue or question, that is, what are the goals of assessment for this particular client. Often, a battery of tests is administered routinely to all clients. Although such data may be quite useful for research purposes or may provide valuable local norms, the usefulness of such an approach for a particular client may be quite limited.
The second stage involves obtaining the necessary background data. This can involve medical records and related information.
MODIFIED TESTING
The Panel concluded that appropriate procedures needed to be developed and used. Other possibilities, such as pooling of data from several samples, needed to be explored. With regard to the testing situation, the Panel pointed out that little was known about the relationship of aspects of the disabled person and the testing situation, as reflected in such questions as: Why do some disabled persons take the standard form of a test, such as the SAT, rather than the modified version? Are there other approaches besides paper-and-pencil tests? How does the attitude of the examiner affect the test scores of the disabled person? How do coaching and training in test-taking strategies affect the performance of disabled persons? Finally, with regard to the actual decision-making process, how are test scores actually used in making decisions about disabled individuals, and how do flagged test scores (i.e., the results of a nonstandard test administration identified as such) influence the decision-making process? The Panel also concluded that tests can provide good objective measures of merit and provide the opportunity to show that a person has the required abilities. At the same time, members of the Panel felt that psychometric technology simply was not advanced enough to ensure that tests could measure skills independent of disabling conditions, and they recommended that multiple sources of information be used to supplement test scores.
Note that when we ask how changes in a test affect that test's reliability and validity, the implication is that the potential result is to lower the reliability and validity. Quite legitimately, we can ask what happens if we administer a test with no modification to a disabled individual. In many cases, the test would be invalid.
Much of what is known about the reliability and validity of modified testing was compiled by researchers at Educational Testing Service (ETS). From 1982 to 1986, ETS conducted a series of studies on the performance of individuals with disabilities on the SAT and the GRE. Four types of disabilities were considered: visual impairment, hearing impairment, physical impairment, and learning disabilities. The results of these studies were described in a comprehensive report (Willingham, Ragosta, Bennett, et al., 1988). These researchers assessed the comparability of standard and nonstandard (i.e., modified) test administrations.
World of work. Licensing or certification procedures that include a test are required for many occupations. In addition, civil-service systems rely heavily on screening through testing. Thus, disabled individuals may encounter such tests, especially in rehabilitation agencies that assist them with training and obtaining employment. Employment testing has been under close scrutiny since the mid-1960s, when it was observed that the use of such tests tended to screen out minority-group members. Such close scrutiny has had three major consequences: (1) The importance of tests in the total hiring process was overestimated. It is rare that decisions to hire are primarily driven by test results, and test results are typically seen as just one of many sources of information. (2) Much effort has been invested in employment testing, including litigation of results. (3) Use of tests in the world of work has declined, in large part because of confusion about
Many of the important decisions made about an individual, such as placement in particular educational programs, admission to college and professional schools, military assignment, and so on, are made in part upon the basis of test results. One can argue that such use of tests is suspect and negative, and that tests ought to be banned. It can also be argued that denying a person the opportunity to take a test and to show in an objective and standardized manner what the person can do is an infringement of that person's civil rights (Sherman & Robinson, 1982).
Full participation in American society has become a major goal of the disabled, one that has been supported through a variety of federal legislative acts. Thus, children with special-educational needs are no longer segregated but, where possible, are mainstreamed in the regular classroom, and such decisions are made in part on the basis of test results.
From a legal standpoint, it is assumed that disabled people can be tested in a way that will not reflect the effects of their disability, and that the obtained scores will be comparable with those of people who are not physically challenged. Whether in fact such assumptions can be met psychometrically is debatable.
Minimum competency testing. There has been
There is a substantial amount of technical material on the GATB, ranging from a very hefty test manual to government reports and studies in the scientific literature. For a description of other aspects of this program, see Droege (1987). Like other tests, the GATB is not free of criticism, and despite its wide use, it seems to have some significant weaknesses (see Keesling, 1985). We will return to this test in Chapter 14.
The Standards. The Standards for Educational and Psychological Testing
Civil Rights Act of 1964. This well-known legislative act made it illegal to discriminate among people on the basis of race, color, religion, sex, or national origin. The Act addressed five major areas in which blacks had suffered unequal treatment: (1) political participation; (2) access to public accommodations, such as restaurants; (3) access to publicly owned facilities, such as parks; (4) education; and (5) employment. Much of the language of the Act was prohibitory in nature (thou shalt not . . .), but there were a number of provisions (called titles) for implementing various sections of the Act. Title VII, for example, enumerated unlawful employment practices and established the Equal Employment Opportunity Commission as a means of implementing its provisions. Most legal challenges to the use of standardized tests have fallen under this Title. In effect, Title VII indicates that if a test results in a proportionally lower selection rate for minorities and females than for white males, the procedure is considered discriminatory, unless the test can be validated in accord with specific guidelines called the technical validation requirements.
Rehabilitation Act of 1973. This act extended civil rights protection to people with disabilities and was a direct descendant of the Civil Rights Act. One section of the 1973 Act, Section 504, specifically indicates that a disabled individual shall not be subjected to discrimination. Unfortunately, this act was enacted without specific indications of how the law was to be implemented. Eventually, guidelines for ending discrimination in areas such as education and employment practices on the basis of disability were developed. For example, employers cannot use tests that tend to screen out disabled individuals unless: (1) it can be shown that the test is job-related, and (2) alternative tests that do not screen out disabled individuals are not available (see Sherman & Robinson, 1982, for a detailed discussion).
With regard to admission to postsecondary educational institutions (such as college), Section 504 guidelines eventually evolved to cover four psychometric aspects:
Public Law 94-142, the Education for All Handicapped Children Act, mandates that a full and individual evaluation of a child's educational needs be carried out before the child is assigned to a special-education program. However, the specifics of the evaluation are not spelled out, so the number and nature of tests used can differ quite widely from one setting to another. The Act also mandates that tests and procedures used for the evaluation and placement of disabled children
Table 12-1. WAIS Verbal Subtest Findings on Visually Impaired (Vander Kolk, 1982)

                    Adventitious     Congenitally blind  No vision        Partial vision
Subtest             Mean     SD      Mean     SD         Mean     SD      Mean     SD
Information         10.36    2.25    10.02    2.10       10.59    4.31     9.87    4.35
Comprehension       10.91    2.23    10.05    2.54        9.81    4.18    10.29    2.68
Arithmetic          11.47    3.12     9.63    2.28        8.72    3.79    10.35    2.71
Similarities        11.99    3.20    10.90    2.24       11.04    4.17    11.30    2.75
Digit Span           9.68    3.33    10.59    2.30       11.69    3.44    10.28    2.40
Vocabulary          10.85    2.56     9.86    2.14       10.45    3.70     9.85    2.28
Reliability analyses yielded a Spearman-Brown coefficient of .87. A comparison of the difficulty level of the test items for these blind subjects vs. a sample of sighted children who took the standard form yielded a coefficient of .93, indicating a very close correspondence in the two samples of which items were difficult and which were easy. However, the mean obtained by these blind adults was substantially lower than the means obtained by fifth- and sixth-grade sighted children. The tactual form of the D-48 proved to be a difficult task, even though the sample's mean WAIS Verbal IQ was 108.9.
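The Spearman-Brown coefficient mentioned above corrects a split-half correlation up to full test length: r_kk = k * r / (1 + (k - 1) * r). As a check, a split-half correlation of about .77 (a value inferred here by inverting the formula; it is not reported in the text) yields the .87 full-length coefficient.

```python
def spearman_brown(r, factor=2):
    """Spearman-Brown prophecy: reliability of a test lengthened by
    `factor`, given the correlation `r` between comparable parts."""
    return factor * r / (1 + (factor - 1) * r)

print(round(spearman_brown(0.77), 2))  # -> 0.87
```

The same function with factor > 2 estimates how much adding items would improve reliability, which is the formula's more common use.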
The results indicated that scores on the D-48 correlated significantly with WAIS Verbal IQ (r = .58), with observers' ratings of personal adjustment (r = .39), with interview ratings of intellectual efficiency (r = .84), with age (r = .39), and with educational level (r = .42). Thus, the validity of the D-48 with blind adults seems substantial. The task, however, proved to be a rather frustrating situation for many clients. From a psychometric point of view, this is a negative aspect, but from a clinical point of view, the D-48 may well provide a situation where the ability to tolerate frustration and to cope can be observed.
A well-known test that addresses issues and problems in the area of visual perception is the Frostig (Frostig, Lefever, & Whittlesey, 1966; Maslow, Frostig, Lefever, et al., 1964). This test was developed based on the observation that children who had school difficulties often experienced difficulty in processing visual stimuli. These difficulties seemed to fall into five major areas: (1) eye-motor coordination (assessed by drawing lines between boundaries); (2) figure-ground (assessed by such items as hidden figures); (3) form constancy (discriminating specific geometric figures from other shapes); (4) position in space (correctly identifying rotated figures); and (5) spatial relationships (on the Frostig, assessed by copying patterns by linking dots).
The Frostig can be administered individually or in small groups and requires somewhere between 30 and 60 minutes to administer. The 1963 standardization sample consisted of California children aged 4 to 8, with 93% coming from middle-class homes and including very few minority children. Thus the sample does not seem to be representative. The test yields three sets of scores:
1. A perceptual-age score for each subtest and for the total test, defined in terms of the performance of the average child in each age group.
2. A scaled score for each subtest; these scores are not standard scores as found on tests like the WISC, and their rationale is not indicated. Some authors recommend that such scores not be used (e.g., Swanson & Watson, 1989).
3. A perceptual quotient, a deviation score based on the sum of the subtest scale scores corrected for age variation. Because this quotient is based on the scaled scores, it too is suspect.
Three test-retest reliability studies are reported by the test authors. In two of the studies, the examiners were trained psychologists, and the obtained test-retest coefficients ranged from .80 to .98 for the total test score, and from .42 to
General issues
Prelingual onset. Hearing impairment can also
Braden (1990) did an analysis of the literature and located 21 studies where a version of the Wechsler Performance Scale was administered to deaf subjects and means were given for at least 5 subtests. A meta-analysis of these 21 studies indicated that although deaf persons have mean Performance IQs slightly below normal-hearing norms, they are well within the average range. The only exception occurs on the Coding/Digit Symbol subtest, where the mean score is markedly lower (a mean of 8.77 versus the expected 10). These findings were quite consistent across studies. The significantly lower mean on Coding/Digit Symbol was consistent across gender, degree of hearing loss, and version of the Wechsler used, but was influenced by the type of administration procedure used (e.g., standard vs. oral administration), the presence of additional handicapping conditions, and the use of norms based on deaf persons. Several explanations for this finding have been proposed, including the failure of the person to understand the speeded nature of the task, underdeveloped language skills needed to mediate the task, the prevalence of neuropsychological deficits among deaf persons, and even test bias (Braden, 1990).
In discussing the Wechsler tests in Chapter 5, we discussed the concept of subtest variability. At a theoretical level, we would expect a person to perform equally well, or equally poorly, across all subtests, but in fact we typically obtain variability; i.e., a person might do better on some subtests and less well on others. We saw that there have been attempts to relate such variability to diagnostic status. Hearing-impaired individuals show a typical group profile. They perform relatively well on the Picture Completion subtest, suggesting above-average ability to discern essential and unessential details. They perform somewhat below average on the Picture Arrangement subtest, which requires the subject to correctly place in order a sequence of cartoon panels to make a story. Whether this reflects poorer social skills based on more restricted social experiences, or poorer sequential skills that reflect the lack of continuous auditory stimulation, is debatable. The hearing impaired perform above average on the Block Design subtest, where a design is to be reproduced using wooden blocks, and on the Object Assembly subtest, essentially a series of puzzles. They do less well on the Coding subtest, which requires a number of skills. Whether this
The WISC-R Performance Scale (PS) is probably the intelligence test used most often by psychologists assessing deaf children. Despite this, there are actually few studies of the validity of the WISC-R PS with deaf children. Hirshoren, Hurley, & Hunt (1977) administered the PS and the Hiskey-Nebraska to a sample of 59 prelingually deaf children, average age about 10. The average IQ on the WISC-R PS was 88, with IQs ranging from 52 to 129. The intercorrelations among PS subtests were quite similar to those obtained with hearing children. Scores on the PS correlated .89 with the Hiskey-Nebraska learning quotient scores; this and other analyses indicate the two tests are fairly interchangeable. Braden (1989) compared the WISC-R PS scores for a sample of 33 prelingually, severely deaf children with their scores on the Stanford Achievement Test (hearing-impaired edition). The correlations between the two sets of scores were rather low, ranging from a high of .37 to a low of .08, with a median r of about .14. In a second study, a number of nonverbal tests of intelligence (such as the Hiskey-Nebraska) were also used. Again, the correlation between nonverbal IQ and achievement was rather low. The author questions the criterion validity of the WISC-R PS with deaf children, but also points out that academic achievement may not be an appropriate criterion measure for nonverbal IQs; each involves different psychological processes, and academic achievement is attenuated (i.e., lowered) by hearing loss.
On the other hand, the WISC-R is one of the few standardized tests for which there is a standardized version, of the performance subscales only, for deaf children, based on a sample of
In 1974, the Office of Demographic Studies of Gallaudet College (a famous college for the hearing impaired) developed a special edition of the 1973 Stanford Achievement Test for use with hearing-impaired students (SAT-HI; Trybus & Karchmer, 1977). This special edition was standardized nationally on a stratified random sample of almost 7,000 hearing-impaired children and adolescents. Like the Stanford, this form is a full-range achievement test, composed of six different levels or batteries from primary level through advanced levels, covering grades 1 to 9. There are four core-subject areas: vocabulary, reading comprehension, mathematics concepts, and mathematics computation.
The SAT-HI is a group test to be administered in the classroom using whatever method of communication is normally used. Practice tests are
Various studies have suggested that there are a number of personality characteristics associated with hearing impairment, such as neurotic tendencies, excessive anxiety, social withdrawal, lack of sociability, depression, suspiciousness, social immaturity, and emotional instability. To what degree these findings reflect attempts to cope with the deafness, realistic repercussions of being hearing impaired, or limitations in the instruments used is an unresolved issue.
Paper-and-pencil inventories such as the MMPI and the 16 PF require a fairly high reading level, and also use idiomatic expressions that hearing-impaired persons may not understand. However, Brauer (1993) translated the MMPI into American Sign Language (ASL) and reported some basic data showing the linguistic equivalence of the ASL and English versions of the MMPI, adequate reliability, but no validity data. Rosen (1967) showed that hearing-impaired persons whose academic achievement scores were sufficient to understand MMPI items in fact did not understand many of these items due to their idiomatic nature; in addition, some of the MMPI test items are not appropriate for hearing-impaired persons. Thus, a number of researchers have turned to projective tests. Many of these projective techniques are of limited use because they emphasize language and communication, as in
Hearing-impaired persons have often been described in the literature as rigid and concrete, lacking imagination, and having limited abstract and divergent thinking (Myklebust, 1964). Recent research suggests that some of these findings are more reflective of the linguistic limitations of hearing-impaired children than of actual personality characteristics.
PHYSICAL-MOTOR DISABILITIES
Nature of physical disabilities. From a psychometric perspective, physical disabilities present a much more heterogeneous set of conditions, some requiring no particular modifications in test or test administration and others presenting substantial challenges. Even within a particular disability, clients may differ dramatically from each other in their test-taking capabilities.
Three of the major categories of physical disabilities that can present challenges in the testing situation are those due to neuromuscular diseases, major physical injuries, and severe
chronic health problems. Neuromuscular diseases include conditions such as cerebral palsy
and muscular dystrophy. These often involve
troublesome involuntary movements, clumsy
voluntary movements, impaired mobility, and
sometimes evidence of brain injury as reflected in
impairments in verbal skills and in motor coordination. Physical injuries can also be quite varied and may involve paralysis due to spinal-cord
injury or orthopedic disabilities, such as injuries
to a limb. Finally, chronic health problems can
range from cancer to severe allergies to conditions such as diabetes and asthma.
Motor impairment. In this chapter, we include
There are three major sets of considerations in testing physically disabled individuals: psychological, physical, and psychometric (K. O. White, 1978).
Psychological considerations include sensitivity to the fact that some disabled individuals have
limited opportunities for social interactions; for
these individuals, the testing situation may be
frightening and strange, and their test anxiety
may well cloud the results.
Physical considerations involve the verbal contents of a test and the performance requirements. The content of some items may be inappropriate for disabled individuals; for example, a personality-test item that reads, "I am in excellent health." The test itself may be inappropriate in that it may require speed of performance or other aspects not in keeping with the client's coordination, strength, or stamina. When standard instruments are used with disabled individuals, they are often modified to meet the requirements of the specific situation. There may be a need for a comfortable work space designed to accommodate a wheelchair; assistance with some of the manual tasks involved in a test, such as turning the pages of the test booklet; indicating responses by gestures, pointing, or pantomime; perhaps longer and/or more frequent test breaks; or extra time. Speed tests may also be inappropriate.
Psychometric considerations involve the impact that modifications of administration have both on the normative aspects of the test (e.g., does the same raw score obtained under a time limit vs. untimed mean the same thing?) and on psychometric aspects such as validity.
The client is the focus. Most experts agree that the examining procedures for physically disabled individuals need to be modified to reduce the physical barriers that can interfere with appropriate testing. Keep in mind that the aim of testing is typically to obtain a picture of what the individual can do. The examiner should be aware of the nature and extent of the client's disabilities, and the client should be contacted prior to testing to determine what testing modifications, if any, are required. As always, rapport is important. Patience, calmness, adaptability, and tact are desirable examiner characteristics, as well as eye contact, emphasis on ability as opposed to disability, and acceptance of the client as an independent, fully functioning individual (K. O. White, 1978).
Disabilities and personality. Years ago, some
The Ostomy Adjustment Scale. Some instruments, such as the Wechsler tests or the MMPI, are designed for a rather broad segment of the population (for example, all normal individuals who are at least 16, or all psychiatric patients). Nonetheless, they can be quite useful with specific populations, such as the physically disabled. Some instruments, however, are designed specifically for a target population, a somewhat narrower segment; an example of this is the Ostomy Adjustment Scale (Olbrisch, 1983).
More than 1.5 million persons in the United States and Canada have undergone ostomy surgery, with approximately 110,000 new ostomies performed each year. A stoma is a passageway that is surgically constructed through the abdominal wall as an exit for body waste. There are several specialized procedures here, including the colostomy, a rerouting of the large intestine, often performed with older persons who have been diagnosed with colorectal cancer. One of the key issues in recovery from this surgical procedure is the patient's emotional adjustment and acceptance of the stoma.
Olbrisch (1983) attempted to develop a reliable and valid measure of adjustment to ostomy surgery, both to evaluate the process of adjustment and to assess the effectiveness of mutual-aid groups for ostomy patients. The first step was to generate a pool of potential items based on a review of the literature, as well as the contributions of three ostomy patients and three expert professionals. From this pool of items, 39 were selected that reflected a wide range of situations, applicability to most potential respondents, and readability. The items were worded so they could be responded to on a 6-point Likert scale (e.g., "I can lead a productive and fulfilling life despite my ostomy," and "I feel embarrassed by my ostomy, as though it were something to hide"). The scale, together with several other instruments, was mailed to a sample of 120 ostomy patients; of these, 53 returned usable questionnaires. These patients ranged in age from 19 to 83, included 29 males and 24 females, and averaged about 2 1/2 years since surgery.
Five of the items were eliminated because of low item-total correlations or low variances, suggesting that the items were being misinterpreted or did not discriminate among participants. For the 34 remaining items, the Cronbach alpha was .87, indicating high internal consistency. A test-retest analysis with an interval ranging from 2 to 6 weeks yielded an r of .72.
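The two statistics behind this item screening, corrected item-total correlations and Cronbach's alpha, are easy to compute. The sketch below uses simulated data, not Olbrisch's; only the formulas are standard:

```python
# Minimal sketch (hypothetical data) of the two statistics used to screen
# scale items: corrected item-total correlations and Cronbach's alpha.
import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(size=(200, 1))
items = true_score + rng.normal(scale=1.0, size=(200, 8))  # 8 Likert-style items

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def corrected_item_total(x):
    # correlate each item with the sum of the remaining items
    total = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1]
                     for j in range(x.shape[1])])

print(round(cronbach_alpha(items), 2))
print(corrected_item_total(items).round(2))
```

In practice, items whose corrected item-total correlation hovers near zero (or whose variance is very low) are the ones flagged for removal, exactly the screening described above.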
Discriminant validity was established by showing that scores on the scale were not significantly correlated with measures of social desirability and measures of self-esteem. Convergent validity was shown by small but significant correlations with such variables as the number of months elapsed since surgery, whether the surgery had been elective or emergency, and whether the patient was able to work.
An exploratory factor analysis yielded 12 factors, with the first 5 factors accounting for 69% of the variance. These factors cover such dimensions as normal functioning and negative affect. Based on the factor analysis and other considerations, the author divided the items into two alternate forms of 17 items each.
SUMMARY
SUGGESTED READINGS
Nester, M. A. (1993). Psychometric testing and reasonable accommodation for persons with disabilities. Rehabilitation Psychology, 38, 75-85.
An excellent article that covers the legal and psychometric issues related to nondiscriminatory testing of persons with disabilities. The author is a psychometrician with the United States Office of Personnel Management who has written extensively on the topic.
DISCUSSION QUESTIONS
AIM Because much testing occurs in a school setting, this chapter looks at testing
in the context of school, from the primary grades through professional training. For
each level, we look at a representative test or test battery, as illustrative of some of the
issues, concerns, and purposes of testing. The intent here is not to be comprehensive,
but to use a variety of measures to illustrate some basic issues (see R. L. Linn, 1986).
PRESCHOOL ASSESSMENT
Assessment of preschool children should be prescriptive and should provide information relevant to school instruction (Neisworth & Bagnato, 1986). These authors advocate a test-teach-test approach, where test items become instructional objectives and vice versa. They urge the use of curriculum-based assessment measures, which are basically criterion-referenced tests that use the curricular items themselves as the assessment content.
Some general problems. Testing preschoolers
From a different and perhaps broader point of view, we can consider four methods available to assess preschool children:
1. individual tests, such as the Stanford-Binet;
2. multidimensional batteries; a wide variety of measures exist in this category, many covering such domains as fine and gross motor
The relative instability of preschoolers' test scores should not be taken as evidence of psychometric weakness, but rather as reflective of the rapid developmental changes characteristic of this age group. Preschool children comprise a unique population, and not simply a younger version of school-aged children. There is wide variability in their experiential background in terms of exposure to adults, preschool environments, peers, demands for responsible behavior, and so on.
Test requirements. As we discussed earlier, a number of preparatory steps should be taken before testing a preschool child. First, the examiner needs to have a clear and specific referral question. Often the referral is made in very general terms (for example, "assess this child for educational placement"), and so the referral source needs to be interviewed to determine what information is wanted and how findings and recommendations can be related to remedial action or placement decisions. A second step is to study the available data, determining, for example, whether the child has certain medical conditions, such as vision problems, that may alter the testing procedure. A third step is to identify all the relevant persons who need to be involved in the assessment, such as parents, school nurse, pediatrician, teachers, and so on. The fourth step is to determine which areas of assessment need to be emphasized. Testing often covers such areas as cognitive functioning, social-emotional behavior, motor coordination, adaptive competencies, and language, but in specific cases some of these may be of paramount importance. Finally, an overall testing strategy is developed. The examiner decides which specific tests will be used, in what order they will be administered, and what specific adaptations will need to be made. (See Nuttall, Romero, & Kalesnik, 1992, for an excellent overview on assessing and screening preschoolers.)
ASSESSMENT IN THE PRIMARY GRADES
The various concerns we have discussed typically continue to be relevant as the child
advances into the primary grades. However, a
new focus appears, and that is how much the
child achieves in school. Thus, achievement test
batteries become important. A typical example is
the California Achievement Tests (CAT) battery.
The California Achievement Tests (CAT)
The CAT is one of several nationally standardized, broad-spectrum achievement test batteries designed to assess the basic skills taught in elementary and secondary schools. The CAT measures basic skills in reading, language, spelling, mathematics, study skills, science, and social studies, and is designed for use in kindergarten through the twelfth grade. This test battery was first used in 1943 and has undergone a number of revisions, with the fifth edition published in 1992. There are two major uses for the CAT: first, to determine which specific skills students have or have not mastered, and second, to compare students' performance with that of a national sample.
Description. The items in the CAT are multiple-choice.
Interrelationship of subtests. Do the CAT subtests measure different domains? In fact, the subtests intercorrelate substantially with each other, with coefficients in the .50 to .80 range. There is thus substantial overlap, suggesting that perhaps the test battery is really assessing a general construct (g?) (Airasian, 1989). Indeed, scores on the CAT tend to correlate substantially with scores on tests of cognitive abilities, with coefficients in the .60 to .80 range. This could be a troublesome aspect, particularly because the newest edition of the CAT is said to measure more general understanding, skills, and processes, rather
than factual content; it is not clear, at a conceptual level, whether we have an achievement or an
aptitude test.
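One way to see why uniformly high subtest intercorrelations point to a single general construct: if k subtests all intercorrelate at about the same level r, a lone general factor (the first principal component of the correlation matrix) accounts for (1 + (k - 1)r) / k of the total variance. The sketch below is our illustration, using seven subtest areas and an r of .60 from the ranges cited above:

```python
# Share of variance carried by a single general factor when all k subtests
# intercorrelate at roughly the same level r (equicorrelation matrix).
import numpy as np

def general_factor_share(k, r):
    corr = np.full((k, k), r)       # build the k x k correlation matrix
    np.fill_diagonal(corr, 1.0)
    eigenvalues = np.linalg.eigvalsh(corr)
    return eigenvalues.max() / k    # largest eigenvalue / total variance

print(round(general_factor_share(7, 0.6), 2))  # 0.66 for 7 subtests at r = .60
```

With intercorrelations this high, roughly two thirds of the variance rides on one dimension, which is exactly the interpretive problem the text raises.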
Reliability. In general, the reliability estimates of
Content validity is of major importance to educational tests, and so it is not surprising that authors of achievement test batteries place great emphasis on content validity, often at the expense of other types of validity.
The content validity of the CAT was built into the test from the beginning, as it should be. Educational objectives to be measured were specified by reviewing curriculum guides, instructional programs, textbooks, and other relevant materials. Individual items were written by professional item writers, with vocabulary difficulty and readability closely monitored. Both teachers and curriculum experts reviewed the items, and special steps were taken to ensure that gender, ethnic, and racial bias were avoided. For the fifth revision, new items were administered and cross-validated with representative samples of students, and the results analyzed using item-response theory.
Unfortunately, there is less information available on other types of validity. The test manuals do contain evidence that mastery of a particular content area, as assessed by the CAT, increases within a grade level from Fall to Spring, and increases across grade levels.
Norms. Both Fall and Spring norms are available, so that depending on the time of test administration, a better comparison can be made. The norms are based on sizable samples of pupils at each level (typically around 10,000), selected on the basis of a stratified random-sampling procedure, with minority groups well represented, with Catholic and private schools included, and
These rating scales of childhood behavior problems are used widely by school psychologists.
The GED tests, developed by the American Council on Education, are used throughout the United States and Canada to award high-school-level equivalency credentials to adults who did not graduate from high school. Specific score requirements are set by each state or Canadian province. More than 700,000 adults take the GED tests each year, and nearly 470,000 are awarded a high-school equivalency diploma (Whitney, Malizio, & Patience, 1986).
Description. The GED battery contains five tests: Writing Skills, Social Studies, Science, Reading Skills, and Mathematics. The coefficient ranges reported for high-school seniors and for GED examinees are:

                     High-school seniors    GED examinees
Writing Skills           .93 to .94           .88 to .94
Social Studies           .91 to .93           .86 to .94
Science                  .90 to .93           .86 to .93
Reading Skills           .89 to .92           .85 to .93
Mathematics              .90 to .93           .81 to .92
Because the GED tests are intended to measure the major and lasting outcomes of a high-school program of study, the authors argue that content validity is of greatest importance. You will recall that content validation is carried out primarily by logical analyses of test items, and that ordinarily it is built into the test rather than analyzed afterwards. Items for the GED tests are written by teams of experienced educators, and test content is carefully analyzed by other teams of curriculum specialists and related professionals.
Concurrent validity. As indicated above, the
12. Some subject areas, such as reading and mathematics, are assessed every 2 years, while other
areas are assessed every 4 or 6 years. For sample
items and an overview of the development of the
NAEP, see Mullis (1992).
The items or exercises for each content area were developed using a consensus approach, in which large committees of experts, including concerned citizens, specified the objectives for each content area. The test results are then analyzed using item-response theory.
The NAEP was conceived as an information
system that would yield indicators of educational
progress, just as the Consumer Price Index is
one indicator of economic health (R. L. Linn &
Dunbar, 1992). Rather than report global test
scores, analyses of the NAEP are based on the
individual items or exercises; thus the basic scoring unit is the percentage of test takers who successfully complete a particular exercise.
One concern that test developers have is the degree of test sophistication, or test-taking familiarity, a person may have. We want a test score to reflect a person's standing on the variable tested, and not the degree of familiarity the subject has with multiple-choice items. This is a particular concern with tests such as the SAT where, on the one hand, sophisticated examinees should not "beat" the test and, on the other hand, naive examinees should not be penalized (D. E. Powers & Alderman, 1983).
Two approaches are usually taken to remove sophistication as an extraneous variable. The first is to make sure that the test items and directions are easy to read, are not complicated, and do not contain extraneous clues that can help test-wise examinees. The second approach is to make sure that all examinees are equally sophisticated by teaching all of them the precepts of good test-taking strategies (e.g., budget your time; don't spend too much time on any one item), and by providing practice examples of test items. In 1978, the College Board introduced a booklet called Taking the SAT, designed to familiarize students with the SAT and provide all candidates with the same basic information about the test. Before the booklet was made available to all students, prepublication copies were sent to a random sample of SAT candidates. A comparison was then made of the effects of the booklet on test scores. The results indicated that, although the booklet was useful, reading it had a minimal effect on subsequent test scores.
Gender gap. There is currently an average dif-
The selection of applicants for admission to college theoretically benefits both the institution and the individual. By identifying students who potentially will fail, the institution is less likely to waste its resources, and so is the individual. Both false positives (a selected individual who fails) and false negatives (a potentially successful student who is rejected) are of concern, however.
Thorndike (1971) argued that if members of
a minority group tend, on the average, to score
lower on the predictor (i.e., the SAT) than on
the criterion (i.e., GPA) as compared with the
majority, then there will be a relatively higher
incidence of false negatives, if the test is used as
a selection device. R. D. Goldman and Widawski
(1976) analyzed student scores at four universities and found that the use of the SAT in selection
of black and Mexican-American students shifted
the number of errors from the false positive category to the false negative category.
A number of studies have shown that minority or disadvantaged students, when admitted to college, can do quite well academically, despite relatively low Scholastic Aptitude Test scores. These findings raise issues about the validity of such scores for minority students, and a number of studies have shown that high-school achievement is a better predictor of college achievement than are test scores. In general, high-school GPA correlates about .30 with college GPA, with test scores adding little to the prediction.
Houston (1980) studied a small sample (n = 61) of black students given special admission to a university. A comparison of the students who had graduated within eight semesters vs. those who had been dismissed for academic reasons indicated significant differences in high-school rank, college GPA, and SAT-M scores, but not in SAT-V. Although the SAT-V difference (371 vs. 339) was in the right direction, the sample size was too small for it to reach significance.
             Year 1    Year 2
White          .38       .42
Asian          .57       .54
Hispanic       .35       .52
Black          .36       .40
Indian         .40       .42
             Year 1    Year 2
White          .22       .24
Asian          .26       .53
Hispanic       .16       .44
Black          .29       .39
Indian         .04       .23

High-school GPA
             Year 1    Year 2
White          .35       .37
Asian          .56       .45
Hispanic       .34       .47
Black          .29       .37
Indian         .40       .32
Goldman and Richards (1974) analyzed SAT scores and academic performance, as defined by second-quarter college GPA, for a sample of Mexican-American and Anglo-American students attending a large California university. On the SAT-V, SAT-M, and GPA, the Anglo students scored higher. For the Mexican-Americans, SAT-V correlated .33 with GPA, and SAT-M correlated .12 (the corresponding correlations for the Anglo group were .40 and .37). A regression equation developed on the Anglo group correlated .44 with GPA. When this regression equation was applied to the Mexican-American sample, there was an overprediction of GPA: the actual average GPA was 2.28, but the predicted GPA was 2.66. The entire study was replicated on a subsequent larger sample with similar results.
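The logic of this overprediction analysis can be sketched with simulated data: fit a regression in one group, apply it to a second group whose criterion scores run lower at the same predictor levels, and compare predicted with actual means. All numbers below are hypothetical, not Goldman and Richards' data:

```python
# Hypothetical illustration of a differential-prediction (overprediction)
# analysis: a GPA-on-SAT regression fitted in one group is applied to a
# second group whose GPAs run lower at the same SAT levels.
import numpy as np

rng = np.random.default_rng(1)

# Reference group: GPA = 1.0 + 0.003 * SAT + noise
sat_a = rng.normal(550, 80, size=300)
gpa_a = 1.0 + 0.003 * sat_a + rng.normal(0, 0.4, size=300)

# Second group: same slope but a lower intercept (criterion runs lower)
sat_b = rng.normal(470, 80, size=150)
gpa_b = 0.7 + 0.003 * sat_b + rng.normal(0, 0.4, size=150)

slope, intercept = np.polyfit(sat_a, gpa_a, 1)
predicted_b = intercept + slope * sat_b

overprediction = predicted_b.mean() - gpa_b.mean()
print(round(overprediction, 2))  # about 0.3: predictions exceed observed GPAs
```

The positive gap between mean predicted and mean actual GPA in the second group is what the text calls overprediction; in Goldman and Richards' data that gap was 2.66 vs. 2.28.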
One of the most prominent critics of the use of aptitude and achievement tests in college admissions has been James Crouse, who has recommended that colleges abandon the SAT and instead use standardized achievement tests to select incoming students (Gottfredson & Crouse, 1986).
Part of Crouse's argument is that the focus should be on the utility of a test rather than on its validity. Even if a test is unbiased and predicts desired criteria well, it does not necessarily follow that it should be used. Crouse argues that the practical benefits of the SAT for college admissions are minimal because the SAT provides predictions of success that are largely redundant with those made from high-school grades alone. SAT scores and high-school rank are moderately correlated (in the .40 to .50 range) with each other and with educational outcomes such as college GPA, so that outcomes predicted from high-school rank alone have a part-whole correlation of at least .80 with outcomes predicted from high-school rank plus SAT scores.
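This redundancy point is easy to verify by simulation. With all three pairwise correlations set to .45 (within the range the text cites), predictions from high-school rank alone correlate well above .80 with predictions from rank plus SAT; the data below are simulated:

```python
# Simulation of the part-whole redundancy argument: correlate least-squares
# predictions of GPA from rank alone with predictions from rank plus SAT.
# The .45 intercorrelation follows the text; everything else is simulated.
import numpy as np

rng = np.random.default_rng(2)
r = 0.45
cov = np.array([[1, r, r],
                [r, 1, r],
                [r, r, 1]])
hsr, sat, gpa = rng.multivariate_normal(np.zeros(3), cov, size=5000).T

# least-squares predictions from rank alone vs. rank + SAT
X1 = np.column_stack([np.ones_like(hsr), hsr])
X2 = np.column_stack([np.ones_like(hsr), hsr, sat])
pred1 = X1 @ np.linalg.lstsq(X1, gpa, rcond=None)[0]
pred2 = X2 @ np.linalg.lstsq(X2, gpa, rcond=None)[0]

part_whole_r = np.corrcoef(pred1, pred2)[0, 1]
print(round(part_whole_r, 2))  # well above .80
```

Under these assumptions the two sets of predictions correlate about .85, so adding the SAT reorders very few admission decisions, which is the practical core of Crouse's utility argument.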
Crouse argues that achievement tests should be
substituted for the SAT in that such tests are no
worse than the SAT in predicting academic success. Their advantage is that they would promote
diligence in high school. The SAT is seen as a
measure of how smart a person is, presumably,
in part, an innate characteristic. The achievement
tests, however, would reflect how hard a person is
willing to work (see Crouse, 1985, and a rebuttal
by Hanford, 1985).
Aptitude vs. achievement. In addition to the
SAT, some colleges, particularly the more selective ones, require applicants to present scores on
the CEEB achievement tests. There are a number of these, and often the candidate has some
degree of choice in which ones to take, so that
                              Correlation
High-school GPA                   .49
SAT                               .42
Calif. Achievement Tests          .38
ACT                               .37
One investigator raised an interesting question: Are measures of intelligence, specifically the Information and Vocabulary subtests of the WAIS, better predictors of college achievement than are tests such as the SAT? He was able to locate four relevant studies and summarized the results, given in Table 13.5. Note that in the first two studies, the WAIS subtests are better predictors of college GPA than are achievement-test scores. In one sense, this is not at all surprising. Academic achievement is a function of one's broad intellectual capabilities, rather than a specific degree of knowledge.
Decline in SAT scores. Because the SAT is used,
             Achievement    WAIS           WAIS
             test           Information    Vocabulary
Study 1         .38            .48            .38
Study 2         .30            .43            .45
Study 3         .46            .19            .25
Study 4         .25            .46
Note: Different achievement tests were used in the different studies; study 3 used the SAT.
Reliability. Test-retest, internal-consistency, and alternate-form reliability coefficients for the SAT range from the high .80s to the low .90s; the KR-20 reliability for the SAT-V is about .91 and for the SAT-M about .92.
Validity. Most of the validity information is pre-
Some candidates take the GRE more than once and therefore may present multiple scores in their application. Studies have shown that individuals who repeat the General Test show, on average, a score gain of about 25 to 30 points, but these individuals are a self-selected group who believe that repeating the test will increase their scores. ETS suggests that multiple scores can be averaged, or that only the most recent or highest score be used.
Computer version. In 1992, the GRE program
Most studies report low validity coefficients for the verbal and quantitative sections, regardless of academic department and of the criterion used to measure academic achievement. Typical coefficients range from .20 through the low .30s. GRE Subject Tests tend to be better predictors of first-year GPA for specific departments than the GRE General Test, and GRE quantitative scores tend to be better predictors in the mathematical and physical sciences. Validation studies by ETS (1977) show median validity coefficients that range from .02 to .36 for the verbal section and from .06 to .32 for the quantitative section. Similarly, Jaeger (1985) reported that the median predictive validity coefficients for the verbal score ranged from .02 to .36 across nine major fields of study, while the corresponding coefficients for the quantitative score ranged from .06 to .32. A 1988 review (Staff, 1988) of 492 validity studies done between 1983 and 1988 indicated that correlations between GRE scores and graduate grades were low, ranging from .15 to .31. When GRE scores were combined with undergraduate GPA, the correlations rose somewhat, to a high of .44. Jaeger (1985) also pointed to a time trend in the correlation of GRE scores and undergraduate GPA with graduate GPA: in studies done in the 1950s and 1960s the median r was about .45; in the 1970s it was .39; and in the 1980s it was about .35. The suggested explanation was restriction of range due to grade inflation.
How does the GRE predict graduate rst-year
GPA? The ETS Guide yields substantial information based on large national samples, and some
results are summarized in Table 13.6.
Note that undergraduate GPA is a better predictor of graduate GPA than are the GRE subtests, taken either individually or in combination. However, the best prediction is obtained
when both undergraduate GPA and GRE scores
are placed in a composite (essentially a regression equation). A similar pattern shows up when
[Table 13.6: median correlations of undergraduate GPA and GRE scores with graduate first-year GPA for Biology, Chemistry, Economics, and Psychology; the tabled values, which are garbled in this copy, range from about .22 to .51.]
coefficient is really an underestimate of the correlation for the entire group of applicants. There
are, however, statistical formulae to estimate the correlation for the entire sample. When this is
done, the obtained correlation coefficients are quite respectable, in the .35 to .70 range.
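The "statistical formulae" referred to here include Thorndike's Case 2 correction for direct restriction of range on the predictor. The sketch below is illustrative, with made-up values for the observed correlation and the restricted and unrestricted standard deviations.

```python
import math

def correct_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike's Case 2 correction for direct range restriction
    on the predictor (illustrative sketch)."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1.0 - r**2 + (r**2) * (u**2))

# e.g., an observed r of .25 in an admitted group whose GRE SD is
# only 60% of the applicant pool's SD (hypothetical numbers)
print(correct_restriction(0.25, 60.0, 100.0))  # ≈ 0.40
```

The corrected value is always at least as large as the observed one, which is why validity coefficients estimated for the full applicant pool look considerably more respectable.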
Restriction of range. Various issues are involved in why the validity coefficients for the GRE are so
low, and why there is substantial variation from study to study. One issue is that of restriction of
range. Cohn (1985) indicated that the GRE was the best-documented instrument of its type,
and that restriction of range was a major consideration in the small validity coefficients obtained.
Dollinger (1989) analyzed the GRE scores of 105 clinical psychology students admitted without regard for their GRE scores. Restriction of
range of GRE scores was not a problem in this sample; GRE-V scores ranged from 340 to 800,
and GRE-Q from 260 to 770. Dollinger (1989) used two criteria: (1) the number of failed
preliminary examinations, and (2) a composite that incorporated those failures plus several criteria
on timely progression in the program and faculty judgment. All three GRE scores (V, Q, and
Advanced) correlated significantly with both criteria, with correlation coefficients ranging from
.33 to .46, and better than graduate GPA did. However, when the data were analyzed for minority students, the coefficients dropped substantially; the only significant result was that GRE
Advanced Test scores correlated significantly with the criteria for both majority and minority
students.
Huitema and Stein (1993) report an interesting study of 204 applicants to the Department
of Psychology at Western Michigan University, where GRE scores were required but ignored in
the admissions process. The authors show that the variation in GRE scores for the 138 applicants who were accepted was essentially the same
as for the total pool of applicants; that is, there was no restriction of range. Under these circumstances, GRE total scores correlated between .55
and .70 with four criteria of graduate achievement, such as exam scores in Advanced Statistics courses and faculty ratings. Correlations between
undergraduate GPA and the four criteria were all nonsignificant. The authors argue that restriction
of range in fact severely limits the validity of the
Advanced Psychology Test. Although the predictive validity of the GRE Verbal and Quantitative sections has been studied extensively, there
are substantially fewer studies on the Advanced tests. In the area of psychology, as an example,
Advanced Psychology Test scores have been significant predictors of grades and of performance
on comprehensive examinations (e.g., Kirnan & Geisinger, 1981), but not of faculty ratings (e.g.,
Hackman, Wiggins, & Bass, 1970).
House and J. J. Johnson (1993b), in the study mentioned above, looked at GRE Advanced Psychology Test scores for 293 graduate students in
master's programs. For the entire sample, test scores were significantly correlated with grades
(r = .41), but the correlation coefficients showed substantial variation across program areas, from
a low of .10 for clinical psychology students to .56 for counseling psychology students (see Kalat
& Matlin, 2000, for an overview of this test).
[Table: correlation coefficients for clinical students vs. experimental students; the row labels are garbled in this copy.]
The dissertation is basically evidence that the student is able to conduct scholarly research in a
sound and competent manner. On the one hand, this represents a potentially useful criterion; on
the other, there are a number of problems, such as separating what portions reflect the student's
work vs. the mentor's work.

In terms of evidence of professional accomplishment, there are a number of criteria that
could be used, such as papers published or presentations at professional conferences. Such criteria have a number of problems. First, they may
not be routinely collected and thus may not be available for analysis. Second, such accomplishments may mirror a variety of factors other than
professional competence. Finally, such criteria are typically not normally distributed but are
highly positively skewed.

Among the specially constructed criteria might be considered global faculty ratings and performance work samples. Ratings are relatively easy
to obtain and provide a fairly convenient criterion. At the same time, ratings are limited, may
show restriction of range, and are open to biases such as the halo effect, where ratings are influenced by the observer's general impression of the
person being rated. With regard to work samples, graduate students are being trained both for specific tasks germane to their discipline, such as
analysis of water contamination, and for more generic tasks such as research, scholarly work,
and teaching. Theoretically, at least, one could develop work samples that could be used as criterion measures. In reality, such work samples are
quite rare and present a number of both practical and theoretical difficulties (Hartnett & Willingham, 1980).
GRE with Hispanics. Whitworth and Barrientos

on the MCAT, with a great emphasis on its criterion validity as a predictor of first-year medical
school grades, and to a lesser extent as a predictor of scores on the National Board of Medical Examiners examinations, particularly Part I (NBME-I), which examines knowledge of the basic sciences, and less frequently Part II (NBME-II),
which examines knowledge of the clinical
Table 13.9. Actual vs. Predicted Performance on NBME-I (Donnelly et al., 1986)

                        Predicted performance
Actual performance    High pass     Average pass    Low pass      Fail
High pass             26 (92.9%)     50 (24.2%)     0 (0.0%)      0 (0.0%)
Average pass           2 (7.1%)     140 (67.6%)    27 (42.9%)    10 (14.7%)
Low pass               0 (0.0%)      16 (7.7%)     30 (47.6%)    19 (27.9%)
Fail                   0 (0.0%)       1 (0.5%)      6 (9.5%)     39 (57.4%)
Totals:               28            207            63            68
composed of the MCAT (average of four subtests) and grades in anatomy courses (four such
courses). The equation was cross-validated, and the results were then sent to the currently enrolled
medical students who had not yet taken the NBME-I. This was meant as a counseling device,
so that students for whom failure was predicted could take remedial steps. Incidentally, the
regression equation correlated .85 with NBME-I scores, and .90 when it was cross-validated
(this is somewhat unusual, as correlation coefficients more typically drop in value when cross-validated). How well did the regression equation
predict actual NBME-I scores? Table 13.9 gives the results. The percentages indicate conditional probabilities. For example, 57.4% of those
for whom the regression equation predicted failure did in fact fail, while 27.9% of those for whom
the equation predicted failure obtained a low pass. Notice that overall the results indicate a high
degree of accuracy (92.6%), but that the accuracy is greater in predicting those who pass than
those who fail.
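The conditional probabilities reported in Table 13.9 can be recomputed from the cell counts by normalizing each predicted-performance column, as this short sketch shows:

```python
import numpy as np

# Rows = actual outcome (high pass, average pass, low pass, fail);
# columns = predicted category. Counts from Donnelly et al. (1986).
counts = np.array([[26,  50,  0,  0],
                   [ 2, 140, 27, 10],
                   [ 0,  16, 30, 19],
                   [ 0,   1,  6, 39]])

# P(actual outcome | predicted category): divide by column totals.
cond = counts / counts.sum(axis=0)

print(round(cond[3, 3], 3))  # P(fail | predicted fail)     → 0.574
print(round(cond[2, 3], 3))  # P(low pass | predicted fail) → 0.279
```

The two printed values reproduce the 57.4% and 27.9% figures quoted in the text.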
(1987) studied all students who entered the University of Virginia School of Medicine in the years
1978 through 1984 (n = 974). They determined that MCAT scores predicted NBME-I scores
well (rs in the .40 to .50 range), and predicted medical-school course grades less well (rs in the
.30 to .40 range). Undergraduate science GPA predicted medical-school course grades well (rs
in the .40 range), but predicted NBME-I scores less well (rs in the .30 range). These authors
suggested that there might be a method effect present: both the MCAT and the NBME-I are
long, multiple-choice, paper-and-pencil, standardized tests, whereas both undergraduate and graduate GPA include laboratory work, test data of
various types, subjective ratings, group projects, and other less well-defined activities. The correlational pattern may well reflect the underlying
nature of these variables.
Interstudy variability. There is a great deal of
Lloyd, Jones, et al. (1986) studied medical students at Howard University College of Medicine,
a predominantly black institution. The criterion of performance consisted of grades in all 4 years
of medical school and scores on both Parts I and II of the NBME exams. In general, the predictive
validities of the MCAT scores and of undergraduate GPA were found to be similar to those in studies
with white medical students, and the results supported the use of the MCAT as an admissions
criterion.
Coaching. Although most of the studies on
coaching have focused on the SAT, and to a lesser
extent on the Law School Admission Test (LSAT),
there is also concern about the MCATs. N. Cole
(1982) listed six components of test preparation
programs (or coaching), and each of these components has rather different implications for test
validity:
Quantitative subtest. The effect was small, however, and attributable to the science review component of these coaching programs.
The Dental Admission Testing
Program (DAT)
sections:
1. A subtest of 100 multiple-choice questions covers the natural sciences, specifically the
equivalent of college courses in biology and in chemistry (both organic and inorganic). This
section requires the simple recall of basic scientific information and is called Survey of Natural
Science.
2. The Perceptual Ability subtest contains 90 multiple-choice items that require the subject to visually discriminate two- and three-dimensional objects.
3. The Reading Comprehension subtest consists of 50 multiple-choice items based on a reading passage, similar to reading material in dental
school.
4. The Quantitative Reasoning subtest contains 50 multiple-choice questions that assess the person's ability to reason with numbers and to deal
with quantitative materials.
Scoring. The test scores for the DAT are reported
In the United States approximately 800 occupations are regulated by state governments, including occupations such as barber, physician, and
psychologist. Other occupations, travel agent or auto mechanic, for instance, are regulated by various boards and agencies (Shimberg, 1981). For
many of these occupations, licensure or certification involves a test or series of tests. Licensure
is a process whereby the government gives permission to an individual to engage in a particular
occupation. The granting of the license reflects minimal competency, and usually there are definitions of what a licensed practitioner may do.
Furthermore, it is illegal for someone who is not licensed to engage in any of the defined practices.
Validity. This is a challenging issue for licens-
Tests are used in the school context for a variety of purposes, many of which were discussed in
Chapters 9 and 12. In this chapter, we looked at the California Achievement Tests as applicable to
both elementary and secondary schools. At the high-school level we illustrated several content
areas, including the assessment of social competence, the GED tests used to award high-school
equivalency diplomas, and the NAEP used as a national thermometer of school achievement.
For college, the focus was on the SAT, while for graduate school it was the GRE. Finally, we briefly
covered some of the tests used for admission into professional schools and some of the issues concerned with licensure and certification.
Kaplan, R. M. (1982). Nader's raid on the testing industry. American Psychologist, 37, 15-23.
A rebuttal of two arguments used by Nader and others as a criticism of the SAT, namely that the SAT is no better than
chance in predicting college performance and that the use of SAT scores denies low-income students the opportunity to be
admitted to college.

Menges, R. J. (1975). Assessing readiness for professional practice. Review of Educational Research, 45, 173-207.
A review of how readiness for professional practice can be measured in a variety of ways, especially with regard to the
helping professions.

Powers, D. E. (1993). Coaching for the SAT: A summary of the summaries and an update. Educational
Measurement: Issues and Practice, 12, 24-39.
A review of some of the key issues about coaching and of several meta-analyses that have been done.
Wainer, H. (1993). Measurement problems. Journal of Educational Measurement, 30, 1-21.
An excellent article that looks at 16 unsolved problems in educational measurement and what might be potential solutions.
DISCUSSION QUESTIONS
14 Occupational Settings
AIM This chapter looks at some issues and examples involved in testing in occupational settings, including the military and the police. Many of the tests that are used
are tests we have already seen: for example, tests of personality such as the CPI
(Chapter 4), tests of intelligence such as the WAIS (Chapter 5), or tests to screen out
psychological problems such as the MMPI (Chapter 7). Our emphasis here will be on
issues and tests not discussed before.
Purposes of Testing
[Table: percentage of organizations using various assessment methods (93.8%, 82.7%, 78.4%, 77.8%, 38.2%, 34.0%); the method labels are garbled in this copy.]
The most frequently used test was the Watson-Glaser Critical Thinking Appraisal. Among the
most frequently used personality tests were the 16 PF, the CPI, and the MMPI (discussed in
Chapter 4). Of the simulation exercises, the most frequent was the in-basket (to be discussed
Do different methods of measuring job performance, such as work samples, ratings by supervisors, self-ratings, measures of production output, etc., result in different validity results for the
same tests?

Nathan and Alexander (1988) conducted meta-analyses of validity coefficients from tests
of clerical abilities for five criteria: supervisor ratings, supervisor rankings, work samples, production quantity, and production quality. They
found that for the first four criteria, high test validities were obtained, with validities resulting
from rankings and from work samples on average higher than the validities resulting from
ratings and from quantity of production. Only the fifth criterion, quality of production, had low
validity and did not generalize across situations.
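The core of such a meta-analysis is a sample-size-weighted average of the observed validity coefficients across studies. The study values below are hypothetical, used only to show the arithmetic.

```python
def weighted_mean_r(studies):
    """Sample-size-weighted mean validity coefficient: the first step
    of a bare-bones meta-analysis. `studies` is a list of (n, r) pairs."""
    total_n = sum(n for n, _ in studies)
    return sum(n * r for n, r in studies) / total_n

# (sample size, observed validity) for three hypothetical studies
studies = [(50, 0.30), (120, 0.45), (80, 0.25)]
print(weighted_mean_r(studies))  # → 0.356
```

Larger studies pull the mean toward their values; fuller meta-analytic procedures would additionally correct the coefficients for artifacts such as sampling error, unreliability, and range restriction.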
Criteria Are Dynamic
Ghiselli and Haire (1960) suggested that the criteria against which tests are validated are dynamic,
Quite often, different results are obtained in different studies because different methods were
used. For example, in the area of market research,
questions (i.e., test items) are often presented
either verbally or pictorially. Weitz (1950) wondered whether the two methods would yield
equal results. A sample of 200 adult women was
surveyed. One half were asked verbal questions
Ree and Earles (1991) studied some 78,000 airmen in 82 job specialties and found that g, or general cognitive ability, was the most valid predictor
Although cognitive-ability tests are valid indicators of on-the-job performance, they present
a serious challenge, namely that in the United States blacks score, on the average, about one SD
lower than whites (J. E. Hunter & R. F. Hunter, 1984). If such tests are used to select applicants,
the result is what the courts called adverse impact. When this problem arose in the 1960s, the solution seemed relatively straightforward: if cognitive tests were unfair to black applicants, one only
needed to make the tests fair, that is, to remove the items that were culturally biased.

The evidence collected since then has not supported this approach. As discussed in Chapter 11,
any test that is valid for one racial group is valid for the other. Single-group validity, where a test
is valid for one group but not another, and differential validity, where the test is less valid for
one group than for another, are artifacts of small sample size. If tests were in fact culturally biased,
that would mean that the test scores of blacks would be lower than their true ability scores, and
so their job performance would be higher than what is predicted by their test scores. In fact, however, the regression equations for blacks are either
equal to those of whites or overpredict the performance of blacks. The overwhelming evidence
is that differences in average test scores reflect real differences in abilities. These differences are likely
the result of societal forces such as poverty and prejudice. Eliminate these, and the test differences
should be eliminated.
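The mechanics of adverse impact can be sketched numerically. The example below is an illustration with assumed normal score distributions and an arbitrary cutoff, not data from the text: when two groups differ by one SD and a common cutoff is applied, the selection rates diverge sharply.

```python
from statistics import NormalDist

# Hypothetical setup: group B's mean is one SD below group A's,
# and everyone above a cutoff of +1 SD (on A's scale) is selected.
cutoff = 1.0
rate_a = 1 - NormalDist(mu=0.0, sigma=1.0).cdf(cutoff)   # selection rate, group A
rate_b = 1 - NormalDist(mu=-1.0, sigma=1.0).cdf(cutoff)  # selection rate, group B

# The "four-fifths rule" compares this ratio to 0.80.
impact_ratio = rate_b / rate_a
print(rate_a, rate_b, impact_ratio)  # ≈ 0.159, 0.023, 0.14
```

With these assumptions, roughly 16% of group A but only about 2% of group B would be selected: an impact ratio far below the four-fifths benchmark even though the test may be equally valid for both groups.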
Minimizing Adverse Impact
Supervisors' Ratings
A supervisor's rating is probably the most common measure of job performance, and the most
common criterion against which tests are evaluated. These ratings are used not only to validate
tests, but also for issues such as promotions and pay raises, as well as to assess the impact of training
programs. Other, more objective criteria such as salary or promotion history can be used instead
of ratings, but most of these criteria have some serious shortcomings. Rating scales can be constructed from a variety of points of view, and the
literature has not clearly identified one form as superior to others (e.g., Atkin & Conlon, 1978;
Dickinson & Zellinger, 1980).
Lawler (1967) suggested that the multitrait-multimethod approach (see Chapter 3) might
be a useful one for performance appraisal, where the multimethod is replaced by multirater. Rather than use a single supervisor's ratings, this approach calls for multiratings, i.e.,
ratings by supervisors, by peers, by subordinates, and by the individual: anyone who is familiar
with the aspects to be rated of the individual's performance.

The multitrait requirement is met by rating somewhere around three to five traits. One rating should be a global one on quality of job
performance. The nature of the other ratings depends upon the purpose of the rating procedure and the particular types of behaviors relevant to a specific job. Lawler (1967) suggests that
whatever these behaviors are, the rating scales should be behavior-description anchored scales.
Rather than rate dimensions such as friendliness and adaptability (which cannot be rated reliably),
what should be rated is effort put forth on the job and ability to perform the job.

Lawler (1967) gives as an example some data based on a group of managers, where superior, peer, and self-ratings were obtained on three
[Multitrait-multirater matrix of ratings by superiors, peers, and self on three traits (quality of job performance, ability to perform the job, and effort put forth on the job); the coefficients are garbled in this copy. Note: The larger squares include the heterotrait-monorater coefficients; the smaller circles include the monotrait-heterorater coefficients, i.e., the validity coefficients. The remaining coefficients are the heterotrait-heterorater coefficients.]
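The logic of classifying cells in such a multitrait-multirater matrix can be sketched in a few lines. The trait and rater labels below are hypothetical stand-ins for Lawler's variables; the point is the classification rule, not the particular correlations.

```python
import itertools

# Each measured variable is a (trait, rater) pair; hypothetical labels.
labels = [("quality", "superior"), ("ability", "superior"),
          ("quality", "peer"),     ("ability", "peer")]

def classify(i, j):
    """Name the cell type for the correlation between variables i and j
    in a multitrait-multirater matrix."""
    t1, r1 = labels[i]
    t2, r2 = labels[j]
    if t1 == t2 and r1 != r2:
        return "monotrait-heterorater (validity)"
    if t1 != t2 and r1 == r2:
        return "heterotrait-monorater"
    return "heterotrait-heterorater"

for i, j in itertools.combinations(range(len(labels)), 2):
    print(labels[i], labels[j], "->", classify(i, j))
```

Evidence for convergent validity requires the monotrait-heterorater (validity) coefficients to exceed the heterotrait coefficients; the assessment-center literature discussed later in the chapter often finds the opposite pattern.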
[Figure: example of a behaviorally anchored rating scale for the variable "list making": an 8-point scale from Low (1) to High (8), with behavioral anchors such as "Occasionally I make lists of the things I need to do" (scale value 6) and "I rarely make lists of the things I need to do" (scale value 3).]
The literature indicates that self-ratings are typically one half SD higher than supervisory ratings; subordinates tend to be more lenient in
evaluating themselves than are their supervisors (M. M. Harris & Schaubroeck, 1988). Farh,
Dobbins, and Cheng (1991) studied self-ratings vs. supervisory ratings in the Republic of China
(Taiwan). Their results indicated a modesty bias; that is, Chinese employees rated their
job performance less favorably than did their supervisors. Quite clearly, Western individualism
stresses individual achievement, self-sufficiency,
to experience. The results indicated that conscientiousness showed consistent relations with
all job-performance criteria for all occupational groups. Extraversion was a valid predictor for
managers and sales, the two occupational fields that involve social interaction. And both openness to experience and extraversion were valid
predictors of the training proficiency criterion across occupations.

Tett, Jackson, and Rothstein (1991) also conducted a meta-analysis of the literature. They
found that personality variables did correlate significantly with job criteria, although the typical
value of .24 was somewhat low. In addition, they found higher values as a function of several variables: for example, where job analyses were used
to select predictors, in studies of applicants vs. incumbents (i.e., people holding the job), and in
military vs. civilian samples. These authors concluded that current validational practices have a
number of correctable weaknesses; specifically, they pointed to the need to carry out job analyses and to use valid personality measures. In general,
most correlations between well-designed personality inventories and job-effectiveness criteria fall in the .40 to .60 range.
The Big Five revisited. As discussed in Chap-

ity tests as screening devices, especially in high-risk or sensitive occupations such as police officer
jobs, seems to be increasing. Without a doubt, the MMPI has dominated the field. Given that
the MMPI is so frequently given, it is not unlikely that a police officer, for example, may take the test
several times. What is the effect of such repeated administrations on the obtained test scores? Note
that this question is closely related to, but different from, the question of test-retest reliability.
As discussed in Chapter 7, the long-term test-retest reliability of the MMPI, where the intervening period ranges from several months to several
years, is significantly lower than its short-term reliability. Graham (1987) indicates that for normal samples, typical MMPI test-retest reliabilities for
retest periods of a day or less are about .80 to .85; for periods of 1 to 2 weeks they are .70 to .80; and
for periods of a year or more they are .35 to .45.
higher than the PRC managers. On Confucian Work Dynamism, PRC managers scored the
highest, next the Hong Kong managers, and lowest of all the U.S. managers. On the Human-heartedness dimension, where higher scores reflect a
greater task orientation, U.S. managers scored highest, followed by Hong Kong managers, and
lowest of all the PRC managers. Only on Moral Discipline were there no significant differences
among the three groups.
BIOGRAPHICAL DATA (BIODATA)
There is a saying in psychology that the best predictor of future behavior is past behavior. This
does not necessarily mean that people will behave in the future as they have in the past, but rather
that earlier behavior and experience make some future behaviors more likely (Mumford
& Owens, 1987). Past behavior can be assessed through biographical data, or what is called biodata. Biodata forms consist of a variety of items,
some of which are quite factual and verifiable (e.g., what was your undergraduate GPA?), and
others that are more subjective and less verifiable (e.g., what is your major strength?). Quite often,
such biodata forms are administered as job-application blanks that contain both descriptive
information (e.g., the person's address and phone number) and biodata items. The items or questions are then weighted or scored in some manner, and the resulting score is used for classification
or predictive purposes. The literature uses a number of labels for biodata, such as background data,
scored autobiographical data, or job-application blanks.

Biodata forms have been used quite successfully in a variety of areas ranging from creativity to executive evaluations. In the early 1920s,
there were a number of studies in the literature showing that such weighted application blanks
could be useful in distinguishing successful from unsuccessful employees in a variety of occupations. During and after the Second World War,
such items were formulated in a multiple-choice format that was quite convenient and efficient.
In general, these measures show relatively high validity, little or no adverse impact, and job applicants typically respond to them positively. One major
source of concern is the degree to which they are susceptible to faking (Kluger, Reilly, & Russell, 1991).
Many biodata forms are in-house forms developed by business or consulting companies for
use with specific client companies and are not available for public browsing. Others are made available to professionals. For example, the Candidate Profile Record (CPR) is a 145-item biodata
form used to identify high-potential candidates for secretarial and office personnel. All items are
multiple-choice and cover aspects such as academic history, work history, work-related attitudes, and self-esteem. The CPR is untimed and
can be administered individually or in groups, with an administration time of about 30 to 45
minutes. The CPR can be computer scored (Pederson, 1990). For those who prefer to build their
own, there is a vast body of literature on biodata forms, including a dictionary of biodata items
(Glennon, Albright, & Owens, 1966).
Items. Items for biodata forms typically reflect two approaches: (1) items that center on a particular domain or criterion, for example, items
that have to do with job-related knowledge and skills; and (2) items that reflect previous life experiences, such as prior job experiences or attitudes
toward particular work aspects. An example of the first type of item might be: How
many real-estate courses have you completed? The response options might go from "none" to "five
or more." An example of the second type might be: At what age did you begin working? The response
options might go from "14 and younger" to "never."
The nature of biodata items. It is not quite clear
unteering activities, leadership roles, participation in one's religious faith, etc. The indirect
approach also attempts to identify the information that describes performance on the criterion (e.g., sales volume), but an attempt is
made to identify the psychological constructs that might underlie such performance (e.g.,
sales motivation, feelings of inferiority, sociability, etc.), and items are developed to reflect such
constructs.

Usually, items for rational scales are retained based on their internal consistency, i.e., whether the
item correlates with other items and/or the total score. A particular biodata form may have several such
clusters of items. In general, a number of studies seem to suggest that such rationally developed
scales have adequate reliability and substantial predictive validity, although studies comparing
rational vs. empirical scaling techniques show the empirical scales to have higher initial criterion-related validity and similar results when
cross-validated (Hornick, James, & Jones, 1977; see below). This approach has not received much
attention, although it would seem to be most useful when content and construct validity are of
concern.
3. Factorial scales. Here the pool of biodata items is administered to a sample of respondents, and various factor-analytic techniques are
used to determine the smallest number of dimensions (i.e., factors) or groupings of items that are
also psychologically meaningful. Items that load (i.e., correlate) .30 or higher on a particular dimension are
usually retained and scored either with unitary weights (e.g., +1) or with weights that reflect the size
of the loading.
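The retention rule just described (keep items whose loading reaches .30, score with unit weights) can be sketched as follows; the item names and loadings are hypothetical.

```python
def retained_items(loadings, threshold=0.30):
    """Keep items whose absolute factor loading reaches the threshold;
    retained items would then be scored with unit weights."""
    return [item for item, loading in loadings if abs(loading) >= threshold]

# Hypothetical loadings of four biodata items on one factor
loadings = [("item1", 0.52), ("item2", 0.18),
            ("item3", -0.41), ("item4", 0.29)]
print(retained_items(loadings))  # → ['item1', 'item3']
```

Negatively loading items are retained here too (and would be reverse-scored); a variant would weight each retained item by the size of its loading instead of +1.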
Given that most item pools are quite heterogeneous and that specific items do not correlate
highly with each other, this approach is limited, because its usefulness depends upon homogeneity of item content and high internal consistency. On the other hand, the results of a limited
number of studies do suggest that factor-analytic results yield fairly stable factor structures across
time, across various groups, and even across cultures. Mumford and Owens (1987) point to
the paucity of studies on this approach, and suggest that factorial scales may not be as effective as
empirically keyed scales but may display greater stability and generality.
4. Subgrouping. This is a somewhat complicated technique that also begins with a pool of
items administered to a large sample of subjects. Item responses are then analyzed separately for
men and for women, and a special type of factor analysis is carried out to determine the basic components or dimensions. For each person,
a profile of scores is generated on the identified dimensions. These profiles are then analyzed
statistically to determine subgroups of individuals with highly similar profiles. The subgroups
are then analyzed to determine how they differ on the various biodata item responses. Much
of this approach has been developed by Owens and his colleagues, and most of the studies have
been carried out with college students, although the results seem quite promising (Mumford &
Owens, 1987).
Scoring. The scoring of empirical biodata forms
[Table: empirical scoring of one biodata item (response percentages by criterion group), and validities of empirical vs. rational scores; the weight column is garbled in this copy.]

Do you:                  Licensed   Not licensed   Difference   Weight
own your home?             81%          60%            21
rent a house?               3%           5%
rent an apartment?          9%          25%            16
live with relatives?        5%          10%

                   Original sample   Cross-validation sample
Empirical score         .59                  .46
Rational score          .35                  .36
Note that the empirical approach was superior to the rational one, even though the empirical
approach showed shrinkage (i.e., loss of validity) upon cross-validation, whereas the rational
approach did not.
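The empirical weighting illustrated above can be sketched in code: each response option is weighted by the difference in endorsement rates between the criterion groups. The percentages follow the table; the signed-difference scheme shown is one common variant (the source table appears to report magnitudes), so treat it as illustrative rather than as the exact method used.

```python
def item_weights(licensed_pct, not_licensed_pct):
    """Empirical weights for one biodata item: each option is weighted
    by the difference in endorsement rates between criterion groups
    (signed percentage-difference variant, illustrative only)."""
    return {opt: licensed_pct[opt] - not_licensed_pct[opt]
            for opt in licensed_pct}

licensed     = {"own home": 81, "rent house": 3, "rent apt": 9,  "relatives": 5}
not_licensed = {"own home": 60, "rent house": 5, "rent apt": 25, "relatives": 10}

print(item_weights(licensed, not_licensed))
# → {'own home': 21, 'rent house': -2, 'rent apt': -16, 'relatives': -5}
```

An applicant's empirical score is the sum of the weights of the options endorsed; because the weights are tuned to the development sample, such keys must be cross-validated, which is exactly where the shrinkage noted above appears.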
Accuracy. Biodata items rely on the self-report

looked at this issue with mixed results; some studies have found substantial agreement between
what the respondent reported and objective verification of that report, whereas other studies
have found disagreement (e.g., Cascio, 1975; I. L. Goldstein, 1971; Weiss & Dawis, 1960).

Mumford and Owens (1987) believe that the discrepancy among studies may reflect the
methodology used, and that although there may be a general self-presentation bias, item responses are accurate when there is no motive
for faking. They also indicate three techniques that can be used to minimize
faking: (1) use items that are less sensitive to faking; (2) develop scoring keys that predict
faking; and (3) use faking keys in the scoring (see Chapter 16).
Reliability. The reliability of biodata items can be enhanced by a number of approaches:

other, and are compared with a variety of criteria, the issue of validity can be quite complex.
however, when cross-validated, correlated significantly (.36 and .42) with criteria of creativity.
Biodata with unskilled workers. Scott and R. W.
validity of assessment centers from a predictive-validity perspective, but less so from a construct-validity point of view. In other words, less is
known about why assessment ratings predict
later performance than the fact that they do
(Gaugler, Rosenthal, Thornton, et al., 1987;
Klimoski & Brickner, 1987; N. Schmitt, Gooding, Noe, et al., 1984).
A number of researchers have reported lack of
evidence to support the convergent and discriminant validity of the specific dimensions assessed:
correlations among different dimensions within the same exercise are higher
than the correlations for the same dimension
across different exercises; that is, method (exercise) variance outweighs trait variance (e.g., Sackett & Dreher, 1982).
Others have shown that using behavior checklists increases the average convergent validity
and decreases the average discriminant validity
(Reilly, Henry, & Smither, 1990). Sackett (1987)
points out that the content validity leaves much
to be desired: it is not enough to pay attention to
the construction of the exercises; how these exercises are presented and evaluated is also crucial.
and tasks used are not standard and vary from one
assessment center to another. Reilly, Henry, and
Smither (1990) for example, report eight dimensions that include leadership, problem solving,
work orientation, and teamwork. Two group
exercises are briey described. In one, the candidates had to arrive at an assembly procedure for a
flashlight or a simple electrical device. In another,
the group was required to organize and plan an
approach to construct a prototype of a robot, after
being shown a model and given tools and parts.
Three assessors were assigned two candidates
each to observe and rate. T. H. Shore, Thornton, and L. M. Shore (1990) used 11 dimensions
falling into two major domains: the interpersonal
style domain that included amount of participation and understanding of people, and the
performance style domain that included originality, work drive (i.e., persistence), and
thoroughness of performance.
There is considerable evidence that assessment centers are useful predictors of subsequent managerial success. The main
issue is not so much whether they work, but why
they work (i.e., predictive vs. construct validity).
Typical conclusions are that assessment centers are useful tools to predict the future success
of potential managers, regardless of educational
level, race, gender, or prior assessment-center
experience (Klimoski & Brickner, 1987). They
seem to work in a wide variety of organizational
settings ranging from manufacturing companies
to educational and governmental institutions;
they can be useful not only for selection and promotion purposes, but also for training, for career
planning, and for improving managerial skills.
Why do these centers work? The traditional
answer is that they are standardized devices to
allow assessment of traits that are then used to
predict future success on the job. They work
because they do a good job of measuring and
Probably the best known situation or simulation exercise used in assessment centers is the
in-basket technique (Frederiksen, 1962). In this
technique the candidate is given an in-basket
that contains letters, memos, records of phone
calls, etc., and the candidate is asked to handle
these as he or she would in an everyday job. The
behavior (i.e., the actions and decisions the candidate makes) is scored according to content and
to style; content refers to what was done and style
to how a task was completed. Content scoring
procedures typically involve a simple counting,
for example, the number of memos completed or
number of decisions made. Stylistic scoring procedures evaluate the quality of the performance,
the quality of the decisions made. Unfortunately,
the technique is not standardized and a variety of
assessment materials and scoring procedures are
used.
Reliability. Three types of reliability have been
investigated: interrater reliability, alternate-form
reliability, and split-half reliability. Interrater
reliability coefficients vary substantially: in one
study, for example, they varied from .47 to .94,
and in another study from .49 to .95. Median rs,
however, appear to be fairly substantial: in one
study the median r was .80, and in another it
was .91. Schippmann, Prien, and Katz (1990)
concluded that, despite the fact that none of the
studies they reviewed used more than 4 raters
and sample sizes were typically quite small, scorers
and raters are responding fairly consistently to
the data.
Few studies have looked at alternate-form reliability, but the few that have report fairly abysmal
Work Samples
Lawshe introduced the concept of synthetic validity to refer to the notion that tests can be validated,
not against a single overall criterion, but against
job elements (Guion, 1965b; Lawshe, 1952; Lawshe & Steinberg, 1955). These job elements may
be common to many dissimilar jobs. By using
tests that are valid for specific job elements, one
can create a tailor-made battery of tests, even for
a new unique job.
Guion (1965b) presents an interesting application of this method. He studied an electrical
wholesale firm composed of 48 people, from
president to stock boy. In no case were more than
3 persons doing the same job. Clearly, a traditional approach where test scores are related to a
criterion would not work here. A detailed analysis of the various jobs and job descriptions indicated that there were seven major job elements,
such as sales ability, creative business judgment,
leadership, and work organization. The president and vice president of the company were
then asked to rate the employees on the seven
major job elements, as well as to give an overall
rating. The procedure used attempted to eliminate any halo ratings, and only employees for
whom a particular job element was relevant were
rated on that dimension. The interrater reliability was quite high, ranging from a low of .82
to a high of .95, with most of the coefficients
in the high .80s. Employees were subsequently
administered a test battery that yielded 19 scores.
These scores were then compared to the seven
Differences in total scores between short-tenure
and long-tenure employees were significant
within each racial group, but differences
across groups were not significant. Point-biserial
correlations between application-blank scores
and the dichotomy of long vs. short tenure were .77
for the nonminority group and .79 for the minority group. These coefficients dropped to
.56 and .58 in the cross-validation samples, as
expected. By using an expectancy chart of the
combined data, Cascio (1975) could increase the
predictive accuracy to 72%, as opposed to the
base rate of 52%.
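A point-biserial correlation of the kind Cascio computed is simply a Pearson correlation in which one variable is a 0/1 dichotomy. A minimal sketch, with invented scores and tenure codes:

```python
from math import sqrt

def point_biserial(scores, dichotomy):
    """Pearson r between a continuous score and a 0/1 group code
    (e.g., application-blank score vs. long/short tenure)."""
    n = len(scores)
    mx = sum(scores) / n
    my = sum(dichotomy) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, dichotomy))
    sx = sqrt(sum((x - mx) ** 2 for x in scores))
    sy = sqrt(sum((y - my) ** 2 for y in dichotomy))
    return cov / (sx * sy)

# Hypothetical data: 1 = long tenure, 0 = short tenure.
blank_scores = [12, 15, 9, 20, 22, 7, 18, 5]
tenure       = [1,  1,  0, 1,  1,  0, 1,  0]
r_pb = point_biserial(blank_scores, tenure)
```

A positive r_pb indicates that higher application-blank scores go with long tenure, which is the pattern Cascio exploited for prediction.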
Occupational Choice
Excessive Turnover
Between 1983 and 1988, Project A was developed to create a selection and classification system for all 276 entry-level positions in the United
States Army. The primary instrument used in this
project was the Armed Services Vocational Aptitude Battery (ASVAB), made up of 10 subtests; 4
of the subtests comprise a test of their own called
the Armed Forces Qualification Test (AFQT). For
each of the entry-level positions, critical scores on
the appropriate subtests were developed, and if
an individual scored above those critical scores,
that person would be assigned to the position,
depending on Army needs, individual preference,
and other aspects. Basically, Project A was one of
the largest validational studies ever conducted;
19 of the entry-level positions, such as motor
transport operator and combat engineer, were
studied with samples of 500 to 600 individuals
in each position (see J. P. Campbell, 1990, for
overall details; McHenry, Hough, Toquam, et al.,
1990, for specific validity results; and Drasgow &
Olson-Buchanan, 1999, for the history of a computerized version of the ASVAB).
The Dot Estimation Task (DOT)
FIGURE 14.2. Example of stimulus figure used by Dror, Kosslyn, and Waag (1993): the original stimulus, a rotated version, and a mirror image.
which of the two fields contains the greater number of dots. This is a self-paced test lasting 6
minutes and containing a maximum of 50 trials.
The DOT was intended to measure compulsivity
vs. decisiveness, with the underlying assumption
that compulsive individuals will either actually
count the dots or vacillate in their estimates and
thus have longer response time and attempt fewer
trials in a given time period. Decisive individuals will presumably work quickly and make their
estimates with some rapidity. Lambirth, Gibb,
and Alcorn (1986) administered the DOT and
a personality measure of obsessive-compulsive
behavior to 153 students and retested them 4
weeks later. The reliability of the DOT was low:
.64 for number of trials attempted, .46 for number of correct trials, and .64 for the number of
incorrect trials. All validity coefficients, that is,
all correlations between DOT scores and scores
on the obsessive-compulsive measure, were essentially zero, thus indicating that the DOT cannot
be used as a nonverbal indicator of obsessive-compulsive traits.
et al. (1971) reported on a study to design and validate a psychological test battery to select police
patrolmen. The battery consisted of 14 tests that
covered four broad domains: motivation, mental ability, aptitude, and behavior. For motivation, measures of background and experience
were used. Mental ability used standard tests
of reasoning, language facility, and perception.
Aptitude covered creative potential and social
insight. And behavior was assessed by various
aspects of temperament and personality. The test
battery was administered to 540 patrolmen who
represented both the upper-rated third and the
lower-rated third in a large urban police department. The sample reflected the racial composition of the department: 75% white and 25%
black.
The authors attempted to answer four basic
questions: (1) Are there any racial differences
on the predictor (i.e., test) variables? Significant differences in group means were obtained
in three of the four areas, with white subjects
scoring higher on motivation, mental ability, and
behavior; only in the aptitude area was there
no difference. (2) Are there any racial differences on the criterion (i.e., job performance)
variables? The criteria used here were semiannual performance ratings by supervisors, as well
as variables such as number of arrests made
and disciplinary actions taken against the patrolman. Only four criterion variables showed consistent results. White and black ofcers did not
differ on tenure (i.e., amount of time on the
police force) or absenteeism. Black patrolmen
made significantly more arrests and had significantly more disciplinary actions taken against
them, a result that might simply reect the area
of their patrol. (3) Do such differences lead to
discrimination? Using complex statistical procedures, the authors found that the best predictive validity was obtained when the test data on a
given racial group were used to predict criterion
scores for members of that same group; the poorest predictive validity was obtained when predicting across racial groups. When the groups were
treated separately statistically, better prediction
the task is, in the context of a specific occupation or task. The JCI takes about an hour or less
to administer, is easily understood by individuals with limited educational skills, and provides
wide coverage of job elements.
The JCI was administered to 100 individuals
in eight different job groups falling under one of
two occupational areas: engineering jobs, such
as drilling machine operator, and clerical jobs,
such as mailroom clerk. For six of the eight job
groups, it was possible to have supervisors complete the JCI as well. The overall pattern of results
indicated a high level of agreement between the
ratings given by the job holders and their supervisors, with most correlation coefcients above
.50. Correlations between composite ratings for
the entire JCI were higher in general than the correlations for the individual sections of the JCI.
A comparison of the engineering vs. clerical
occupational areas indicated signicant differences on both the total score and three of the
five JCI sections. Clerical jobs required a wider
variety of tools and equipment and used more
mathematical and communication components.
In addition, significant differences were obtained
across job groups within the same occupational
area.
The Supervisory Practices Test (SPT)
Item weights were determined by a rather complicated procedure. A new sample of 200 supervisors and 416 nonsupervisors was tested. The sample was then divided in two ways. The first was
supervisors vs. nonsupervisors. The second was
according to the majority response for each item;
that is, for each subject the number of responses
agreeing with the majority was calculated. Then
the entire sample was dichotomized into the
50% choosing the greatest number of majority
responses and the 50% choosing the fewest. The majority view
was thus combined with the supervisors' view
to develop positive weights for good responses
and negative weights for poor responses.
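The two-way keying described above can be sketched on invented data. The exact arithmetic the SPT authors used is not reported, so the +1/0/-1 rule below is an illustrative assumption:

```python
from collections import Counter

# Illustrative sketch of combining the majority view with the supervisors'
# view when keying one item. The +1/0/-1 assignment is an assumption; the
# text does not give the SPT's exact weighting arithmetic.

def build_key(answers, is_supervisor):
    """answers: each respondent's chosen option for ONE item.
    is_supervisor: parallel list of True/False flags.
    Returns a dict mapping option -> weight."""
    majority = Counter(answers).most_common(1)[0][0]
    sup = [a for a, s in zip(answers, is_supervisor) if s]
    non = [a for a, s in zip(answers, is_supervisor) if not s]
    key = {}
    for opt in set(answers):
        sup_rate = sup.count(opt) / len(sup)
        non_rate = non.count(opt) / len(non)
        if opt == majority and sup_rate > non_rate:
            key[opt] = 1    # "good": majority choice favored by supervisors
        elif opt != majority and sup_rate < non_rate:
            key[opt] = -1   # "poor": minority choice favored by nonsupervisors
        else:
            key[opt] = 0
    return key

answers       = ["A", "A", "A", "B", "B"]
is_supervisor = [True, True, False, False, False]
key = build_key(answers, is_supervisor)
```

Here option A is both the majority response and the one supervisors favor, so it is keyed positively; B is keyed negatively.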
Reliability. For a sample of 112 nonsupervisors
who were retested with an interval of seven
months, the obtained r was .77. The authors
argued that for this sample the range of test scores
was restricted (although they varied from 11 to
126), and a more correct estimate of reliability
was .86. Split-half reliability for a sample of 177
individuals was .82.
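The jump from .77 to .86 is the kind of adjustment produced by correcting a correlation for restriction of range. The authors' exact method is not reported; a standard version (Thorndike's Case 2) looks like this:

```python
from math import sqrt

def correct_for_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike's Case 2 correction for range restriction: estimates the
    correlation had the predictor kept its unrestricted spread. Applying
    this particular formula to the SPT's .77 is an assumption about what
    the authors did, not their reported procedure."""
    k = sd_unrestricted / sd_restricted
    return (r * k) / sqrt(1 - r * r + (r * k) ** 2)

# With no restriction (k = 1) the estimate is unchanged; the more the
# sample's spread is curtailed, the larger the upward correction.
```

The corrected value always moves toward 1 as the restriction ratio grows, which is why a restricted sample can understate a test's reliability or validity.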
Validity. The final form was administered to var-

The General Aptitude Test Battery (GATB)

In the years 1942 to 1945, the U.S. Employment Service decided to develop a general aptitude battery that could be used for screening
applicants for many occupations, with an emphasis on occupational counseling. An analysis of
Aptitude                     Specific subtests
G  General intelligence      Three-dimensional space; vocabulary; arithmetic reasoning
V  Verbal aptitude           Vocabulary
N  Numerical aptitude        Computation; arithmetic reasoning
S  Spatial aptitude          Three-dimensional space
P  Form perception           Tool matching; form matching
Q  Clerical perception       Name comparison
K  Motor coordination        Mark making
F  Finger dexterity          Assemble and disassemble
M  Manual dexterity          Place; turn
or scored by computer services. For each subtest, the raw score, defined as the number of items
correct, is calculated. Raw scores are then converted to scores that are referenced on the norming population. The specific conversion depends
upon the form of the GATB used, the type of
answer sheet used, and the purpose for which
the subtest score will be used. For example, the
arithmetic reasoning subtest is used both in the
calculation of intelligence and in the calculation of numerical aptitude; the same raw score
will be converted differently for each of these
purposes.
Raw scores are thus converted into aptitude
scores, with a mean of 100 and SD of 20. These
aptitude scores can be further converted to composite scores because the nine basic dimensions
form three composites: a cognitive composite
(G + V + N), a perceptual composite (S + P
+ Q), and a psychomotor composite (K + F +
M). These three composites are then given different relative weights to predict each of ve job
families, that is, these are regression equations
developed empirically. Finally, percentile scores
for the sum of the composites can be calculated
in accord with separate race norms for blacks,
Hispanics, and others.
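The conversion chain described above (raw score to aptitude standard score with mean 100 and SD 20, standard scores to composites, composites to a weighted job-family prediction) can be sketched as follows; the norming statistics, raw scores, and regression weights are invented placeholders, not actual GATB values:

```python
# Sketch of the GATB score pipeline. The composite definitions (G+V+N,
# S+P+Q, K+F+M) come from the text; all numbers below are hypothetical.

def to_aptitude_score(raw, norm_mean, norm_sd):
    """Convert a raw score to the GATB standard scale (mean 100, SD 20)."""
    return 100 + 20 * (raw - norm_mean) / norm_sd

# Hypothetical norming statistics (mean, SD) and raw scores for the nine aptitudes.
norms = {"G": (30, 8), "V": (25, 6), "N": (20, 5), "S": (18, 6), "P": (22, 7),
         "Q": (40, 10), "K": (60, 12), "F": (28, 6), "M": (35, 9)}
raw = {"G": 34, "V": 28, "N": 22, "S": 21, "P": 22, "Q": 45,
       "K": 66, "F": 25, "M": 40}

apt = {a: to_aptitude_score(raw[a], m, sd) for a, (m, sd) in norms.items()}

cognitive   = apt["G"] + apt["V"] + apt["N"]
perceptual  = apt["S"] + apt["P"] + apt["Q"]
psychomotor = apt["K"] + apt["F"] + apt["M"]

# Job-family prediction: a weighted sum of the three composites.
# These weights are illustrative, not the GATB's empirical regression weights.
job_family_score = 0.5 * cognitive + 0.3 * perceptual + 0.2 * psychomotor
```

Note how the same raw score feeds different conversions depending on purpose, just as the arithmetic reasoning subtest contributes to both G and N.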
Reliability. Temporal stability of the cognitive
aptitudes (G, V, and N) is quite acceptable, with
correlation coefficients typically above .80, and
seems to decline slightly as the test-retest interval
increases. In general, the results compare quite
well with those of similar measures in other test
batteries.
For the perceptual aptitudes (S, P, and Q), test-retest coefficients are in the .70s and .80s, while
for the psychomotor aptitudes (K, F, and M) the
test-retest coefficients are in the high .60s and
low .70s. In general, reliability is higher for samples of adults than for samples of ninth and tenth
graders, probably reflecting the lack of maturation of the younger examinees.
Equivalent-form reliability in general parallels
the test-retest findings, with most coefficients in the .80s and .90s. Because internal-consistency
estimates are not appropriate for speeded tests, and the GATB is
so heavily dependent on brief time limits, internal consistency should not be used to assess its reliability.
Validity. The GATB has been studied intensively,
able (Dobbins, 1990). One type involves the comparison of scores obtained by individuals who
occupy different levels of supervision, such as
higher-level vs. lower-level supervisors. In general, the results are that higher-level supervisors score higher than lower-level supervisors,
although confounding aspects such as age and
intelligence are typically not accounted for. A
second type of study looks at concurrent validity with on-the-job ratings. Typical correlation
coefficients are low but significant, in the .20s
and .30s range, although a number of studies also
report no signicant correlations. A third group
of studies focuses on the convergent and discriminant validity of the How Supervise?, showing that
scores on this instrument are indeed significantly
correlated with measures of intelligence. A typical
study is that by Weitz and Nuckols (1953), who
used a modied version of the How Supervise?
with district managers in a life insurance company. Scores were compared with various criteria
that reflected volume of sales, as well as personnel turnover. None of the correlation coefficients
were significant, except for a correlation of .41
between number of right answers on section 1
and educational level.
Work Orientation scale of the CPI (WO). One
of view. Subsequently, Sackett, Burris, and Callahan (1989) were able to locate 24 studies. One
of the findings is that little theft is detected.
Thus, although the score differences between
those individuals who are detected for theft and
those who are not are substantial, the correlation
coefficient is very small due to the lack of variance
on the criterion, i.e., few employees are detected
for theft.
An APA Task Force (Goldberg, Grenier, Guion,
et al., 1991) reviewed some 300 studies and concluded that for those few tests for which validity
information was available, their predictive validity was supported. In part, this may have been due
to the fact that validity criteria for integrity tests
were expanded to cover such aspects as absenteeism, personnel turnover, behavioral indicators such as grievances, and supervisory ratings.
However, validity information was available for
only a few of the tests, and much of what was available was judged to be fragmented and incomplete. It was suggested that integrity tests predict
more validly at the untrustworthy end of the scale
than at the trustworthy end.
Ones, Viswesvaran, and Schmidt (1993) also
conducted a meta-analysis of the validity of
integrity tests. They concluded that the best estimate of the mean true validity was .41, when the
criterion is supervisory ratings of overall job performance. They also concluded that the validity
for predicting counterproductive behavior on the
job (such as theft and absenteeism) is also fairly
substantial, but that validity is affected by several
moderator variables such as whether the sample
is composed of applicants or current employees.
Most integrity tests do not correlate significantly with measures of intelligence, with available coefficients typically in the −.10 to +.10
range. When scores on the attitude and the
admissions subsections are correlated with each
other, signicant coefcients are obtained, often
in the .50 to .70 range.
Construct validity. This is a much more com-
indicated that there were three concerns of interest to employers. The first is the extent to which
the use of integrity tests leads to a generally
improved workforce, e.g., having greater job satisfaction. Some studies do indeed present such
evidence. A second concern might be the usefulness of integrity tests in predicting the occurrence
of such events as theft. The conclusion here is that
few predictive studies are free from methodological difficulties and that most integrity tests have
not been used in predictive studies. For a few
tests, however, the available evidence would support their predictive validity. A third concern is
the extent to which losses due to employee theft
are actually decreased. Although the picture is
far from complete, the available evidence suggests that for some integrity tests there is documentation available to support their utility in this
respect.
Faking. A major concern with integrity tests is
their resistance to faking. Three lines of inquiry
are relevant here (Sackett, Burris, & Callahan,
1989): (1) the effects of direct instructions to fake
good; (2) the correlation of test scores with various indices of social desirability and lie scales
(see Chapter 16); and (3) studies that statistically
separate the effects of lying from the obtained
validity coefficients. Few studies exist in each category, and at least for category (1) the results
are contradictory, although in both studies the
instructions to fake did result in significantly
different mean scores. For the second category,
most available studies do show signicant correlations between integrity test scores and various
measures designed to assess faking. Under category (3) the very limited evidence suggests that
integrity tests do measure something beyond the
mere ability to fake.
Factor analysis. Studies of factor analyses of
integrity tests suggest that the instruments are
not unidimensional, but little is known about the
utility and validity of the individual factor scores.
A typical study. Collins and Schmidt (1993)
looked at white-collar crime, i.e., nonviolent
crimes for nancial gain that utilize deception
and are usually carried out by an individual rather
than an organization. They wondered whether
integrity and personality measures could discriminate between white-collar criminals and
their nonoffender colleagues.
They studied 329 federal prison inmates who
had been convicted of white-collar crimes, who
volunteered for the study, and 320 individuals
employed in white-collar positions of authority.
All subjects were administered the CPI, a biodata
questionnaire, and an integrity scale; all three
instruments yielded a total of 37 scores for each
subject.
For the statistical analyses, the total sample
was randomly divided into a validation sample and a cross-validation sample. The statistical analyses were conducted in three steps.
First, the number of variables was reduced to
16 in the validation sample. Second, a discriminant function was developed in the validation
sample. Third, the discriminant function was
cross-validated. Among the top four variables
that discriminated between criminals and noncriminals was the Performance subscale of the
integrity test (the Performance scale is said to
measure conscientious work attitudes and behavior). The other three variables were scales from
the CPI: Socialization, Responsibility, and Tolerance. As discussed in Chapter 4, the Socialization scale measures the degree to which individuals adhere to social norms; the Responsibility scale measures the degree to which an individual is conscientious, responsible, and attentive to duty; the Tolerance scale assesses the
degree to which a person is tolerant and trusting.
The authors suggested that the common theme
underlying these four scales is that of social
conscientiousness.
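The validate-then-cross-validate logic of the Collins and Schmidt analysis can be sketched with a two-class Fisher linear discriminant on synthetic data. The group means, predictor count, and classification cutoff below are all invented for illustration (the original study used 16 retained variables):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "offender" and "nonoffender" groups on 4 hypothetical predictors.
offenders    = rng.normal(0.0, 1.0, size=(200, 4))
nonoffenders = rng.normal(1.0, 1.0, size=(200, 4))

def fit_fisher(a, b):
    """Fisher discriminant weights: pooled-covariance^-1 (mean_a - mean_b)."""
    sa = np.cov(a.T) * (len(a) - 1)
    sb = np.cov(b.T) * (len(b) - 1)
    pooled = (sa + sb) / (len(a) + len(b) - 2)
    return np.linalg.solve(pooled, a.mean(0) - b.mean(0))

# Develop the discriminant function on the validation half...
w = fit_fisher(offenders[:100], nonoffenders[:100])
# ...with a cutoff midway between the projected group means...
cut = (offenders[:100].mean(0) @ w + nonoffenders[:100].mean(0) @ w) / 2

# ...then cross-validate on the held-out half.
hits = np.sum(offenders[100:] @ w > cut) + np.sum(nonoffenders[100:] @ w <= cut)
cross_validated_accuracy = hits / 200
```

Accuracy on the held-out half is the honest figure; accuracy on the development half overstates validity, which is exactly the shrinkage problem that cross-validation guards against.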
SUGGESTED READINGS
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology,
74, 478–494.
An excellent theoretical discussion of the concept of validity
as it pertains to personnel selection and decisions.
Cascio, W. F. (1995). Whither industrial and organizational psychology in a changing world of work? American Psychologist, 50, 928–939.
Although not directly related to psychological testing, this
is an excellent overview of the current and future state of
the world of work. The author discusses key changes in such
a world and some of the research questions that must be
answered. A careful reading of this article will indicate many
areas where the judicious application of psychological testing
could be most useful.
SUMMARY
DISCUSSION QUESTIONS
P1: JZP
0521861810c15
CB1038/Domino
0 521 86181 0
March 4, 2006
14:20
AIM This chapter looks at testing in two broad settings: testing in clinics or men-
tal health centers, and testing in forensic settings, settings that are legal in nature
such as jails, courtrooms, etc. Obviously, the two categories are not mutually exclusive.
A clinical psychologist may, for example, evaluate a client, with the evaluation mandated by the courts. Under clinical settings, we look at neuropsychological testing,
projective techniques, some illustrative clinical issues and clinical syndromes, as well as
applications in the area of health psychology. Under forensic settings, we look at some
illustrative applications of testing, as well as how legislation has affected testing.
CLINICAL PSYCHOLOGY:
NEUROPSYCHOLOGICAL TESTING
so that all test results can be studied with a minimum of paper shuffling. The entire battery takes
some 5 hours to administer, more or less depending on the patient's age, degree of brain damage,
and other aspects.
The aim of the Halstead-Reitan is not simply
to diagnose whether there is brain injury, but to
determine the severity of such injury, the specic
localization of the injury, the degree to which
right or left hemisphere functioning is affected,
and to provide some estimate of the effectiveness
of rehabilitation.
The battery requires some rather expensive
and bulky materials, so unlike the MMPI and
WAIS, it is not a portable test. Such testing is
usually carried out in a clinic. Numerous modifications of the battery are available; for example,
in the Category Test individual stimulus cards
rather than slides can be used.
Most of the subtests used in the Halstead-Reitan are actually borrowed from other procedures and tests. On the one hand, this means that
available psychometric information as to reliability and validity should generalize to this new
use. On the other hand, the entire field of neuropsychological assessment has paid little attention to such psychometric issues (Davison, 1974).
Perhaps more than any other test we have
considered, the utility of the Halstead-Reitan
is closely related to the competence of the test
administrator. This is not an easy test to administer, score, and interpret. Although most clinical
psychologists now receive some training in neuropsychological assessment, administration, and
interpretation of the Halstead-Reitan requires
considerable knowledge of neuropsychology and
supervised training.
Validity. The validation of the Halstead-Reitan
Halstead-Reitan in the literature, some substantive and some less so. The apparatus is not mass
produced and is seen by many as clumsy and
very expensive. Test administration is long and
can be stressful; the length of the test means, in
part, that it is a costly procedure to administer or
may not be appropriate for a patient whose medical condition places severe limits on their capacity to complete extensive testing. The scoring is
characterized as simplistic and primarily aimed
at assessing whether there is organicity or not,
rather than using a more normative approach.
Individual practitioners often modify the battery
or use only part of it, so that standardization is
not followed. Finally, the problem of false negatives can be a significant one (Whitworth, 1987).
Some criticisms may not be fully deserved. For
example, there are normative scores based on age,
education, and gender that allow the test results to
be changed to T scores for comparative purposes
(R. R. Heaton, Grant, & Matthews, 1991). Other
criticisms may not really be criticisms. For example, the Halstead-Reitan does not reect a particular theory of brain functioning; the subtests
were chosen empirically because they seemed to
work. Some see this as a weakness, others as a
strength. A theoretical framework is now available (e.g., Reitan & Wolfson, 1985).
Neuropsychological tests in general have a
number of limitations. Prigatano and Redner
(1993) identify four major ones: (1) not all
changes associated with brain injury are reflected
in changed test performance; (2) test findings
do not automatically indicate the reason for the
specific performance; (3) neuropsychological test
batteries are long to administer and therefore
expensive; and (4) a patient's performance is
influenced not just by brain dysfunction but also
by a variety of other variables such as age and
education.
PROJECTIVE TECHNIQUES
The Rorschach was developed in 1921 by Hermann Rorschach, a Swiss psychiatrist. Originally,
the test consisted of some 15 inkblots that had
been created by dropping blobs of ink on pieces of
paper and folding these to form somewhat symmetrical figures. Printing was in its infancy then,
and only 10 inkblots could be printed because
of budgetary constraints (in fact, the publisher
went broke after publishing the inkblots with
an accompanying monograph). The theoretical
background of the Rorschach was Jungian and
psychoanalytic, but much of the evidence was
empirical. Rorschach had administered inkblots
to different patients and people he knew and
had retained those whose responses and patterns
of responses seemed to be related to personality
and psychodynamic functioning. In the United
States, the Rorschach became extremely popular due primarily to five psychologists: Samuel
Beck, Marguerite Hertz, Bruno Klopfer, Zygmunt
Piotrowski, and David Rapaport, each of whom
symmetrical inkblots, each printed on a separate 6½ × 9½ inch plate. Five of the inkblots
are black and gray, and five have other colors, all
on white backgrounds. Often the instructions are
not standardized but simply require the subject
to tell what the blots remind him or her of, or
what they might represent, or what they could be
(B. Klopfer, Ainsworth, W. G. Klopfer, et al.,
1954). The inkblots are presented one at a time,
in the same order. This presentation is called the
free association or response phase. Questions
that the subject may have are typically responded
to in a noncommittal way (e.g., however you like).
The inkblots are then presented a second time
for the inquiry phase. Here the examiner asks
whatever questions are necessary to determine
what precisely the response was, where in the
inkblot it was seen, and why it was seen i.e., what
aspects of the inkblot contributed to the perception. The information obtained here is to be used
primarily for scoring the responses. Sometimes,
a third administration of the inkblots is used
(called testing the limits) where the examiner
explores whether other kinds of responses might
be elicited; for example, if the patient gave originally only bizarre responses, could that patient
perceive more normal aspects?
Scoring. There are five scores that are obtained
Perceptual vs. content approaches. One of the
When there are many studies on one topic and their results seem conflicting, as occurs with the Rorschach, a useful procedure is to analyze overall trends through a meta-analysis. Parker (1983) conducted such an analysis on 39 studies that had been published between 1971 and 1980 in one journal and that met certain experimental criteria. He reported that reliabilities on the order of .83 and higher, and validity coefficients of .45 and higher, can be expected with the Rorschach when hypotheses supported by empirical or theoretical rationales are tested using reasonably powerful statistics (for an example of a specific study, see Bornstein, Hill, Robinson, et al., 1996). Hiller, Rosenthal, Bornstein, et al. (1999) concluded that the validity of the Rorschach is not significantly different from that of the MMPI, with
both instruments showing mean validity coefficients of about .29 to .30. An opposite conclusion, however, was reached by Garb, Florio, and Grove (1998): that the Rorschach is not as valid as the MMPI.
Percentage of agreement.
agreement ranging from 88% to 100%. Experimental-group subjects (given special instructions) repeated about one third of their responses, while control subjects (given standard instructions) repeated about two thirds of their responses. The protocols were scored on 27 different variables, with only 5 variables showing significant changes from test to retest. Most of the test-retest correlations were above .70, even for the experimental group, which was instructed to give different responses.
G. J. Meyer (1997) concluded that Exner's Comprehensive System has excellent interrater reliability, with a mean coefficient of .86, but Wood, Nezworski, and Stejskal (1997) strongly disagreed.
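Percentage of agreement between two scorers reduces to the share of responses given the same code by both. A minimal sketch in Python; the location codes and the two scorers' data below are invented purely for illustration:

```python
def percent_agreement(scorer_a, scorer_b):
    """Percentage of responses on which two scorers assign the same code."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("need two equal-length, nonempty lists of codes")
    matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return 100.0 * matches / len(scorer_a)

# Hypothetical codes from two scorers for the same ten responses:
a = ["W", "D", "D", "Dd", "W", "D", "W", "D", "Dd", "W"]
b = ["W", "D", "D", "Dd", "W", "D", "W", "Dd", "Dd", "W"]
print(percent_agreement(a, b))  # 90.0
```

Note that percent agreement ignores chance-level matching, which is one reason later work (such as Meyer's) reports chance-corrected coefficients instead.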
The Holtzman Inkblot Technique (HIT)
The Draw-A-Person (DAP) is frequently used as a measure of gender-role identification. That is, it is assumed that the normal response to the instruction to "draw a person" is to draw a figure of one's own gender. Most of the time this is a reasonable assumption supported by the research literature. However, opposite-gender drawings are frequently obtained from women as well as from young school-aged boys. Numerous explanations have been proposed, including the notion that women in our culture are ambivalent about their gender identification. Farylo and Paludi (1985) point out a number of methodological problems, including the observation that masculinity-femininity is often assumed to be a bipolar dimension (vs. the notion of several types, including androgynous and undifferentiated individuals), that the gender of the administrator has an impact, and that in our culture "draw a person" may well be equated with "draw a man."
The DAP has also found substantial use in studies of attitudes toward target groups such as
Scoring systems have been developed for the Bender-Gestalt, primarily concerned with the accuracy and organization of the drawings. Some of the aspects considered are the relative size of the drawings compared to the stimuli, their location on the sheet of paper, and any inversions or rotations of the figures. The Pascal and Suttell (1951) scoring system for adult subjects is one of the better known and includes 106 different scorable features of the drawings, where each abnormal response is given a numerical value. Typical test-retest reliabilities over a short time period of 24 hours are in the .70s, and interscorer reliability of trained scorers is typically in the .90s.
Another popular scoring system, this one for the protocols of children, was developed by Koppitz (1964; 1975). Koppitz identified a group
Validity. The validity of the Bender-Gestalt as a
more confident than the psychologists. In addition, there was no relationship between a person's diagnostic accuracy and his or her degree of confidence. Goldberg also asked a Bender-Gestalt expert to diagnose the 30 protocols; the expert's diagnostic accuracy was 83%. L. R. Goldberg makes two points: (1) by rating all protocols as nonorganic, one could have obtained an accuracy of 80%; (2) by scoring the protocols according to the Pascal-Suttell method, one could also have obtained an 80% accuracy rate. These results do not necessarily argue against the usefulness of the Bender-Gestalt, but they do support the notions that psychometric analysis is better than impressionistic formulation, and that a test's usefulness is limited by the capabilities of the test user.
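Goldberg's first point is a base-rate argument: if 80% of the 30 protocols came from nonorganic patients, the blanket rule "call every protocol nonorganic" is correct 80% of the time, nearly matching the expert. A quick check in Python; the 24/6 split is implied by the 80% figure rather than stated directly:

```python
total = 30
nonorganic = 24          # implied by the reported 80% base rate
organic = total - nonorganic

# The blanket "nonorganic" rule classifies every nonorganic protocol
# correctly and every organic protocol incorrectly.
blanket_accuracy = nonorganic / total
expert_accuracy = 0.83   # reported for the Bender-Gestalt expert

print(round(blanket_accuracy, 2))  # 0.8
```

The lesson generalizes: any diagnostic sign must beat the base rate before it has earned its keep.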
A recent study. Bornstein (1999) carried out
We can enter her scores, as well as those of other candidates, into a regression equation and in fact compute her predicted GPA in graduate school. We can then accept or reject her on the basis of the regression-equation results. These two methods are known as the clinical and statistical prediction methods.
Meehl (1954) wondered about the relative accuracy of clinical judgment when compared with statistical prediction, particularly because studies of psychiatric diagnoses showed that such diagnoses were often somewhat unreliable; that is, there was less than unanimous agreement among psychiatrists. In diagnosing general conditions such as psychosis, agreement ranged from the mid 60% to the mid 80% range, but for specific conditions the agreement ranged from about the mid 30% to the mid 60% range. Meehl (1954) found 19 studies relevant to this issue. Nine of the studies found statistical prediction to be more accurate; 10 of the studies found no difference between clinical and statistical methods. No studies found the clinical method to be superior. A later study reviewed 45 such studies and essentially supported Meehl's findings (Sawyer, 1966). An analysis of 136 studies (Grove, Zald, Lebow, et al., 2000) indicated that on average mechanical prediction techniques (e.g., using regression equations) were about 10% more accurate than clinical (i.e., subjective) techniques. Mechanical prediction outperformed clinical prediction in 33% to 47% of the studies examined, while in only 6% to 16% were clinical predictions more accurate than mechanical predictions.
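The statistical method described above reduces to plugging an applicant's scores into a fitted regression equation. A minimal sketch with a single predictor; the test scores, GPAs, and the applicant's score of 625 are made-up numbers, not data from any real validation study:

```python
def fit_simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

# Hypothetical past students: admission-test score -> graduate GPA
test_scores = [500, 550, 600, 650, 700]
gpas        = [2.8, 3.0, 3.2, 3.4, 3.6]
b, a = fit_simple_regression(test_scores, gpas)

# A new applicant scoring 625 gets a mechanically computed prediction:
predicted = a + b * 625
print(round(predicted, 2))  # 3.3
```

With several predictors the same logic extends to multiple regression; the point of the mechanical method is that the equation, not the clinician, combines the scores.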
SVIB, and low on the Printer and Mathematics-Physical Science Teacher scales, whereas type Bs showed the opposite pattern. Type As were thus seen as individuals characterized by a problem-solving approach, one that included genuineness and respect, an ability to understand the patient's experiences, and an expectation of responsible self-determination.
Eventually, the hypothesis was suggested, based on some empirical evidence, of an interaction between therapist type and client diagnosis, such that type A therapists were more successful with schizophrenic patients, and type B therapists were more successful with neurotic patients. Whether this is the case is debatable (Chartier, 1971; Razin, 1971). Unfortunately, the research in this area has been complicated by the fact that there are nine different versions of this scale, and it is not always clear which version is being used, although at least four of the scales are so highly intercorrelated that they can be considered alternate forms (Kemp & Stephens, 1971).
Alcoholism
Definition
8. Maturity fears
samples, that correlated with the relevant subscale more than with other subscales, and that had alpha coefficients above .80. The EDI-2 contains an additional 27 items scored on 3 provisional subscales, as well as expanded norms totaling almost 1,000 patients. The EDI-2 items cover both specific eating behaviors, such as dieting, and more adjustment-oriented aspects, such as meeting parental expectations. Responses are given on a 6-point scale, ranging from "always," through "often" and "sometimes," to "never," although the actual scores range from 0 to 3, with responses of "never," "rarely," or "sometimes" all given a zero weight. The scores are recorded on a profile form that allows direct comparison with patient and female college-student norms.
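The collapsing of six response options into 0-to-3 item scores can be sketched as a simple lookup. Note two assumptions here: the label "usually" is filled in for the one response option the text does not name, and the mapping shown applies to items keyed in the symptomatic direction (reverse-keyed items would flip it):

```python
# 6-point response format collapsed to 0-3 scores, with "sometimes",
# "rarely", and "never" all weighted zero, as described in the text.
WEIGHTS = {"always": 3, "usually": 2, "often": 1,
           "sometimes": 0, "rarely": 0, "never": 0}

def subscale_raw_score(responses):
    """Sum the 0-3 item weights across a subscale's items."""
    return sum(WEIGHTS[r] for r in responses)

print(subscale_raw_score(["always", "often", "never", "usually"]))  # 6
```

This zero-weighting of the three mildest options is deliberately conservative: only clearly symptomatic response levels contribute to a subscale's raw score.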
The primary purpose of the EDI-2 is to aid clinicians in assessing patient symptoms, planning treatment, and evaluating the effectiveness of therapeutic interventions. It can also be used as a screening inventory to identify at-risk individuals, to assess the incidence of symptoms in a target sample, or to study from a research point of view what specific changes may be related to various therapeutic modalities.
The EDI-2 is relatively easy to administer and score. It is appropriate for both males and females, primarily for adolescents and adults, but can be used with children as young as 12. The EDI-2 takes about 20 minutes and can be administered individually or to groups. The answer sheet contains a carbon page that allows a direct translation of chosen responses into scores on the appropriate scales. Scoring is thus simply a clerical task. The obtained raw scores can be changed to percentile ranks using patient or nonpatient norms.
For the eight original EDI scales, internal-consistency reliability is quite adequate, ranging from the low .80s to the low .90s for patient samples both in the United States and in other countries (e.g., Norring & Sohlberg, 1988). For the three provisional scales, the reliabilities are significantly lower, with internal-consistency alpha coefficients falling below .70 for two of the three scales. Test-retest reliability is also adequate, with coefficients in the mid .80s for short periods (1 to 3 weeks), and substantially lower for longer time periods.
Much of the validity of the EDI-2 is criterion-related validity, with several studies showing eating-disorder patient samples scoring significantly higher than control samples. A number of studies
have also shown that specific subscales show significant changes in patient samples as a function of specific treatments. Concurrent validity data are also available showing that scores on the eight original subscales correlate significantly with scores on other eating-disorder inventories, with clinical ratings on the same dimensions, and with body-image measures. A representative study is that by Gross, Rosen, Leitenburg, et al. (1986), who administered the EDI and another eating-attitudes test to 82 women diagnosed as bulimic. Both instruments discriminated bulimic patients from control subjects, despite the fact that many subscales on both tests did not correlate significantly with each other; for example, the EDI subscale of bulimia did not correlate significantly with the other test's subscale of dieting. Many of the subscales also did not correlate significantly with behavioral measures such as the amount of calories consumed during three standardized meals. The authors concluded that both self-report measures and direct behavioral measures of eating and vomiting would be needed, and that their results supported the criterion validity of the tests but showed only partial support for the concurrent validity.
Some construct-validity data are also available; for example, several studies have obtained eight factors on the EDI that correspond to the eight original scales, but some have not (e.g., Welch, Hall, & Norring, 1990). Investigators in New Zealand administered the EDI to three samples of nonpatient women: 192 first-year psychology students, 253 student nurses, and 142 aerobic dance-class enrollees. A factor analysis did not confirm the original eight scales but indicated three factors in each of the samples: a factor focusing on concern with body shape, weight, and eating; a factor of self-esteem; and a factor of perfectionism (Welch, Hall, & Walkey, 1988). Not all EDI scales show an equal degree of validity, and further research is needed to indicate which scales are indeed valid and useful.
HEALTH PSYCHOLOGY
Psychology and medicine. Originally, the interface between psychology and medicine occurred in the area of mental health, but recently there has been great concern with the behavioral factors that affect physical health and illness. Thus in the 1970s the field of health psychology developed in
The Woodworth Personal Data Sheet was developed during World War I as a self-report inventory designed to screen out emotionally unstable recruits. This was the prototype of all subsequent self-report instruments, and although current ones are much more psychometrically sophisticated, they still reflect much of Woodworth's approach.
As I have noted before, self-reports are economical in that they do not require a well-trained professional for administration and scoring, and they are amenable to psychometric treatment, such as scoring and interpretation by computer. They are typically brief, usually inexpensive, and reflect the client's experience through his or her own assessment rather than through an observer. This last point can of course be detrimental, in that some professionals are skeptical about the accuracy of self-reports. For an overall review of self-report measures of stress, see Derogatis (1987).
Stress. One area of health psychology concerns
psychological symptoms. Part of the original theoretical framework was that life events, whether positive or negative, were stressful. Subsequent research has shown that the impact is a function of the degree of aversiveness of the event. Two events, such as marriage and divorce, may reflect equal amounts of disruption, but the stressfulness is related to the negative event and not the positive one. Indeed, an important component is how the individual perceives the event (e.g., Horowitz, Wilner, & Alvarez, 1979; Zeiss, 1980).
Another issue is the weighting of the individual items. Originally, differential weights were derived through a technique called direct magnitude estimation, in which each item was assigned a mean life-change score. Several investigators have shown that alternative scoring schemes, using unitary weights (i.e., every item endorsed is counted as one) or using factor weights (i.e., items are scored to reflect their loading on a factor), are just as predictive, if not more so (e.g., Grant, Sweetwood, Gerst, et al., 1978; Lei & Skinner, 1980). In most studies where different scoring techniques are compared, weighted scores and unitary-weight scores typically correlate in the low to mid .90s (e.g., G. R. Elliot & Eisdorfer, 1981; M. Zimmerman, 1983). Might the weights of the individual items change across time? Scully, Tosi, and Banning (2000) found that 14 of 43 events did show a significant shift in weight, but there was a correlation of .80 between the original and new weights.
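Comparing scoring schemes amounts to computing two total scores per respondent and correlating them across respondents. A sketch with invented weights and endorsements; real inventories have far more items and respondents, which is partly why the observed correlations reach the .90s rather than the modest value this toy example yields:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical life-change weights for five events
weights = [100, 73, 47, 29, 11]

# Each row: one respondent's endorsements (1 = event occurred)
endorsements = [
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1],
]

weighted = [sum(w * e for w, e in zip(weights, row)) for row in endorsements]
unitary  = [sum(row) for row in endorsements]   # every endorsed event counts as 1
print(round(pearson_r(weighted, unitary), 2))   # 0.63
```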
A third issue concerns the dimensionality of these life-event questionnaires. Originally, they were presented as unidimensional, with endorsed events contributing to an overall stress score. Subsequent studies have presented evidence of the multidimensionality of such scales (e.g., Skinner & Lei, 1980).
How stable are life-change scores over time? In a 2-year study, the rank ordering of the amount of readjustment required by life events remained quite consistent for both male psychiatric patients and normal controls, with correlation coefficients ranging from .70 to .96. There was also stability in the absolute weights given, but only in the normal controls. These results suggest that for normal individuals the perception of the impact of life changes is relatively stable, but the same cannot be said for psychiatric patients (Gerst, Grant, Yager, et al., 1978).
on intensity (i.e., how often: somewhat, moderately, or extremely often). Here also, no indication is given as to what psychometric considerations, if any, entered into the construction of the scale.
The two scales were administered once a month for 9 months to a sample of 100 adults. For the hassles scale, the average month-to-month reliability coefficient was .79 for the frequency score and .48 for intensity; for the uplifts scale the coefficients were .72 and .60, respectively. Thus, these subjects experienced roughly the same number of hassles and uplifts from month to month, but the amount of distress or pleasure varied considerably. From a psychometric point of view, these coefficients might be considered test-retest coefficients, and at least for intensity they fall short of what is required. Hassles and uplifts scores are also related to each other, with a mean r of .51 for frequency and .28 for intensity, perhaps reflecting a response style (see Chapter 16) or a tendency for people who indicate they have many hassles also to indicate they have many uplifts. These investigators and others (e.g., Weinberger, Hiner, & Tierney, 1987) have reported that hassles are better predictors of health status than major life-change events.
Health Status
differently by different investigators, and different terms are used as synonyms. For example, some investigators equate measures of quality of life with measures of health status, and some do not. Another issue is the sensitivity of these scales to clinical change. That is, a measure of health status should not only show the usual reliability and validity but should also be sensitive enough to detect those differences that may occur as a function of treatment, such as chemotherapy with cancer; in fact, the evidence suggests that many brief measures of health status are sensitive to such changes (e.g., J. N. Katz, Larson, Phillips, et al., 1992). Another issue is length. Even a 15-minute questionnaire may be too long when one is assessing the elderly, the infirm, people undergoing surgical treatments, and so on.
These health-status measures generally have four purposes: (1) to examine the health of general populations; (2) to examine the effects of clinical interventions; (3) to examine changes in the health-care delivery system; and (4) to examine the effects of health-promotion activities (Bergner & Rothman, 1987). Among the better-known health-status measures are the Activities of Daily Living (Katz, Ford, Moskowitz, et al., 1963), the General Health Survey (A. L. Stewart, Hays, & Ware, 1988), and the Sickness Impact Profile (Bergner, Bobbitt, Carter, et al., 1981; Bergner, Bobbitt, Kressel, et al., 1976); for a review of the major scales see R. T. Anderson, Aaronson, and Wilkin (1993).
The McGill PQ, published by Melzack, a psychologist at McGill University, was designed to provide a quantitative profile of clinical pain. Originally, it was intended as a way to evaluate the effectiveness of different pain therapies, but it is often used both clinically and in research applications as a diagnostic tool (Melzack, 1975). The McGill PQ stems from a theoretical basis that postulates three major psychological dimensions to pain: a sensory-discriminative dimension (for example, how long the pain lasts), a motivational-affective dimension (for example, the fear associated with pain), and a cognitive-evaluative dimension (for example, how intense the pain is).
Melzack realized that the experience of pain is a highly personal one, influenced by individual differences, perceptions, and cultural aspects, and developed a gate-control theory of pain to explain how the gating or modulation of pain is possible (Melzack & Wall, 1965). This theory is at the basis of the McGill PQ, although use of the test does not necessarily require acceptance of the theory.
The first step in the development of the McGill PQ was to establish an item pool, a set of 102 descriptive words related to pain, obtained from the literature, patients' pain descriptions, and other questionnaires. These words were sorted into the three categories, and then further sorted within each category into subgroups. For example, the word "throbbing" was categorized in the temporal category, which is part of the sensory dimension; similarly, the word "shooting" is part of the spatial category and also part of the sensory dimension (Melzack & Torgerson, 1971). Patients and physicians were then asked to rate each item as to intensity, using a 5-point numerical scale from least to worst. In addition to the pain-descriptor words, there are other components on the McGill PQ (e.g., questions about the patient's diagnosis, an anatomical drawing on which to indicate the location of pain), but these are not scored. On the actual test protocol, the items are presented in 20 groupings of 2 to 6 items each, listed in order of intensity. For scoring purposes, the 20 groupings are divided into
the three dimensions; in addition, there is a fourth miscellaneous category. Pain-rating scores can be obtained for each of the four areas, as well as for the total. There are also four indices that can be obtained: (1) Pain Rating Index, based upon the mean scale values of the words endorsed; (2) Pain Rating Index of ranks, based upon the ranks of the words endorsed; (3) total number of words checked; and (4) Present Pain Intensity, based on a rating scale of 0 (for no pain) to 5 (for excruciating pain). As might be expected, the first two indices correlate substantially (in excess of .90), and thus only the Pain Rating Index of ranks is used, because it is simpler to compute. The number of words chosen also correlates quite highly with the two pain ratings (r = .89 and above), while ratings of present pain intensity correlate less substantially with the other three indices. The initial scoring system was weak, and suggestions have been made to improve it (Kremer, Atkinson, & Ignelzi, 1982; Melzack, 1984).
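The main indices reduce to simple sums over the words a patient endorses. A sketch using a tiny, made-up descriptor set; the scale values and ranks shown are hypothetical stand-ins for the Melzack and Torgerson ratings:

```python
# Each endorsed word carries a mean scale value (rated intensity) and a
# rank within its grouping; all numbers here are invented for illustration.
endorsed = {
    "throbbing": {"scale_value": 2.6, "rank": 3},
    "shooting":  {"scale_value": 3.1, "rank": 2},
    "fearful":   {"scale_value": 3.3, "rank": 1},
}

# Pain Rating Index from mean scale values of the endorsed words:
pri_scale = sum(w["scale_value"] for w in endorsed.values())
# Pain Rating Index of ranks (the simpler, preferred computation):
pri_rank = sum(w["rank"] for w in endorsed.values())
# Total number of words checked:
nwc = len(endorsed)

print(pri_rank, nwc)  # 6 3
```

The rank-based index is favored precisely because it needs only the position of each word within its grouping, not the externally derived scale values.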
Reliability. Test-retest reliability is difficult to measure, because low reliability may in effect mirror the test's sensitivity: pain, in its various sources and manifestations, does change over time. The evidence in fact seems to suggest that the scale is sensitive to changes in pain over time (e.g., C. Graham, Bond, Gerkovich, et al., 1980).
Factor analysis. Studies of the factor struc-
arteries obstructed 50% or more scored significantly higher on all four JAS scales than the other 36 men with lesser atherosclerosis.
The JAS has been translated into numerous languages and has been the focus of a voluminous body of literature (see reviews by Glass, 1977, and by Goldband, Katkin, & Morell, 1979).
FORENSIC PSYCHOLOGY
In Dusky v. United States (1960), the legal standard for competency to stand trial was established. Such a standard holds that the defendant must have sufficient present ability to consult with his or her lawyer and have a rational as well as factual understanding of the legal proceedings. Mental-health professionals are often asked to
Voir dire. Voir dire is the process of jury selection, particularly the elimination of potential jurors whom the attorneys feel may be biased against the defendant or may not be open-minded to what is to be presented in the courtroom. Studies of jury selection attempt to identify those variables that can reliably predict juror verdicts. This is not only an important practical issue of interest to lawyers, who may wish to maximize the possibility of obtaining a specific verdict, but also an important theoretical issue, because understanding how people arrive at a verdict may be applicable to the understanding of how individuals arrive at solutions in general, and of how such aspects as personality interface with cognitive abilities. A wide variety of procedures are used in this process. In terms of psychological testing, there are three major categories: (1) scales that have been developed to measure attitudes and/or personality; (2) scales developed specifically for the voir dire process; and (3) biodata questionnaires (Saks, 1976).
Under category (1), measures of authoritarianism have proven to be popular, with many studies showing that potential jurors high on authoritarianism are more likely to convict or punish the defendant (e.g., McAbee & Cafferty, 1982). Under category (2), a number of instruments have been developed, but many of these also seem to focus on authoritarianism (e.g., Kassin & Wrightsman, 1983). Under category (3), studies that use biodata information generally show significant correlations between biographical variables such as
What are the legal standards applicable to psychological testing? There are two such sets. One is the Federal Rules of Evidence (FRE), which require mental-health professionals to offer evidence (e.g., results of psychological testing) that is reasonably relied upon by such professionals. A second set is known as the Daubert standard: evidence presented by experts must be reliable (testable, subject to peer review, with a known rate of error, and/or generally accepted within the field) and relevant (applicable to the specific case and helpful to the fact finders in resolving the legal question).
LEGAL CASES
This act, commonly known as the Equal Employment Opportunity Act, is probably the best-known piece of civil-rights legislation. The act created the Equal Employment Opportunity Commission, which eventually published guidelines concerning standards to be met in the construction and use of employment tests. Title
Myart v. Motorola. This was one of the first cases to focus on employment-testing discrimination. Mr. Myart was a black applicant for a job at a Motorola factory; he alleged that the qualifying test he was asked to take was discriminatory in that it required familiarity with white, middle-class culture. The Illinois Fair Employment Practices Commission agreed that the test was discriminatory, but the Illinois Supreme Court overturned the examiner's ruling.
Griggs v. Duke Power Company (1971). The
Connecticut
examinees the right to know more about examinations, particularly tests such as the SAT and the GRE. They cover issues such as privacy, information about the test's development, how test scores are
One form of liability sustained by court decisions occurs when an employer fails to exercise reasonable care in hiring an employee who subsequently commits a crime or does harm to others. In this case, a mailroom clerk who had a prior rape conviction followed a secretary home and murdered her. The parents of the victim sued Monsanto for negligent hiring. Quite clearly, the employer needs to take preventive measures that might include additional testing, yet such testing may well be challenged as discriminatory.
Target Stores (1989). A class-action suit was filed, also in California, against Target Stores, which used the CPI and the MMPI to screen prospective security guards. The plaintiffs argued that questions such as "I am fascinated by fire" were not
We have taken a brief look at testing in the context of clinical and forensic settings. The examples chosen were illustrative rather than comprehensive. We devoted a substantial amount of space to projective techniques; although these, as a group, seem to be less popular than they were years ago, they are, if nothing else, of historical importance and do illustrate a number of issues and challenges. Testing in the health-psychology area is growing, and substantial headway is being made in applying psychological principles and techniques to the maintenance of good health and the understanding of how the human body and psyche function in unison. Testing is also increasing in forensic settings, while at the same time being regulated and challenged by a number of court decisions.
SUGGESTED READINGS
DISCUSSION QUESTIONS
AIM In this chapter we begin by asking, What can interfere with the validity of a particular test score? If Rebecca scores at the 95th percentile on a test of vocabulary, can we really conclude that she possesses a high level of vocabulary? While there are many issues that can affect our conclusion, in this chapter we focus mainly on just one: faking. We also take a brief look at two additional issues: test anxiety and testwiseness. We use a variety of tests, such as the MMPI and CPI, to illustrate some basic points; you have met all of these tests in earlier chapters.
There are individual differences in the way different persons typically perceive, think, remember, solve problems, or perform other intellectual operations. These have typically been called cognitive styles, and a substantial number of them have been identified. For example, in decision making, some individuals prefer to take risks while others prefer to be cautious. In the area of memory, some individuals are "levelers" and others are "sharpeners"; that is, they either assimilate or exaggerate stimulus differences. If you ask a leveler what they did on their vacation, they'll reply, "I went to the beach"; if you ask a sharpener, they will give you all the details.
Although such dimensions are considered cognitive, they in fact involve not only cognition but personality, affect, temperament, and interpersonal domains as well. Research on some cognitive styles, such as field dependence vs. field independence, or reflection vs. impulsivity, has a fairly substantial base; other cognitive styles have not been investigated to any degree. Obviously, these styles can interact with test performance, yet at the same time they probably reflect the person's behavior. For example, an impulsive individual may do poorly on a multiple-choice test by not considering the alternatives carefully, but in fact that's how he or she may well behave in work situations.
We will not consider cognitive styles further, although they are very important, and a number of scales have been developed to assess specific cognitive styles. A consideration of these would simply take us too far afield. For a recent overview
Legal issues. The topic of faking is particularly
ing good and faking bad were thought to represent opposite poles of a unitary dimension,
but recently the two concepts have come to be seen as different dimensions to be assessed separately. Faking good is seen as composed of two independent concepts, typically labeled self-deceptive
enhancement and impression management.
Are there also two components in faking bad?
At present, the answer is unclear, but the two
major scales in this area, the MMPI F scale and
the CPI Sense of Well-being scale, seem to reflect
two different approaches. In general, the detection of faking good seems to be more difficult
than the detection of faking bad (R. L. Greene,
1988).
How does one fake good? Basically, by endorsing test items that portray personal honesty and
virtue. The MMPI L scale was developed to detect
this strategy. Another strategy of the test taker
is to overendorse items indicative of especially
good adjustment or mental health. The MMPI
Mp (positive malingering) scale was developed
to detect this strategy (Cofer, Chance, & Judson, 1949). Faking good is often manifested by
a failure to acknowledge commonly held weaknesses and endorsement of a constellation of
unusual virtues. If one considers self-deception
vs. other-deception, faking good focuses more on
the MMPI subtle scales have no predictive validity). An item that illustrates this type comes from
studies of deception in interviews; the item "disturbed affect: flat/blunted or inappropriate" is
in fact characteristic of most psychotics, but has
a low endorsement rate for malingerers (Cornell
& Hawk, 1989).
Type 2 items are endorsed by malingerers,
but not by psychiatric patients. An example
is the item "visual hallucinations." Few psychotic patients actually report such symptoms,
but many malingerers endorse this item.
One approach, then, is to empirically identify sets of items that statistically discriminate
between individuals who are truly mentally ill
and individuals who are instructed to answer as
if they were suffering from mental illness. Thus,
if only 2% of mentally ill persons respond "true"
to the item "I hear voices," but 86% of normals
instructed to fake bad do so, then the item would
be an excellent candidate for a fake-bad scale. If
an individual we are testing endorses the item,
the probabilities are quite low that the person is
truly mentally ill, but quite high that the person
is faking.
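The probability logic in this example can be sketched with Bayes' rule. In the snippet below, the 2% and 86% endorsement rates come from the text; the 50/50 base rate of faking is an illustrative assumption, not a figure from the source.

```python
# Posterior probability that a respondent is faking, given endorsement of
# a rare-symptom item, via Bayes' rule. The endorsement rates are those
# from the text; the 50/50 base rate is an illustrative assumption.
def p_faking_given_endorsed(p_endorse_ill, p_endorse_faker, base_rate_faker=0.5):
    p_endorse = (p_endorse_faker * base_rate_faker
                 + p_endorse_ill * (1 - base_rate_faker))
    return p_endorse_faker * base_rate_faker / p_endorse

# 2% of mentally ill respondents answer "true" to "I hear voices,"
# but 86% of normals instructed to fake bad do so.
posterior = p_faking_given_endorsed(p_endorse_ill=0.02, p_endorse_faker=0.86)
print(round(posterior, 3))  # 0.977
```

With these rates, a single endorsement shifts the odds overwhelmingly toward faking, which is exactly why such items are good candidates for a fake-bad scale.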
Generic vs. specific scales. One approach is to
suppressor variable is one that removes variance assumed to be irrelevant, thereby accentuating
the relationship between the predictor scale and
the criterion; it is significantly associated with a predictor, but
not associated with the criterion for which the
predictor is valid. For example, if we have a personality scale that predicts leadership behavior
well, we have a predictor (the scale) that correlates well with the criterion (leadership behavior,
however defined). Suppose, however, our scale is
heavily influenced by acquiescence; i.e., our scale
correlates with acquiescence, but the criterion
does not. If we can remove the acquiescence, then
the correlation between our scale and the criterion should increase. Generally, if we can identify
and measure the suppressor variable, then the
validity between our predictor and our criterion
should increase. The Christiansen, Goffin, Johnston, et al. (1994) study of assessment-center candidates indicated that correction for faking had
little effect on criterion-related validity (performance data based on job analyses), but would
have resulted in different hiring decisions than
those made on the basis of uncorrected scores.
These authors concluded that faking is not a serious threat to the validity of personality tests and
that the use of faking-corrected scores may be
unwarranted.
Oblong trapezoids. A rather different and
that is, personality scales that attempt to measure one's personality style, such as impulsivity?
RELATED ISSUES
Does format alter scores? In most personality
tions and discriminant functions (like regression equations that predict a categorical variable, such as neurotic vs. psychotic, or faking vs.
not faking) are very useful analyses to assess faking. Schretlen, Wilkins, Van Gorp, and Bobholz
(1992) developed a discriminant function from
combined MMPI, Bender-Gestalt, and malingering scale scores that correctly identified 93.3% of
their subjects, which included 20 prison inmates
instructed to fake insanity and 40 nonfaking control subjects. The discriminant function included
only three variables: the F-K index from the
MMPI, a Vocabulary subtest of the malingering
scale, and the sum of five Bender-Gestalt indicators of faking.
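At prediction time, a discriminant function of this kind is just a weighted sum of the indicator variables compared against a cutoff. The sketch below shows the mechanics only; the weights, cutoff, and example values are hypothetical, since Schretlen et al.'s actual coefficients are not reported in the text.

```python
# Two-group linear discriminant sketch: a weighted sum of three faking
# indicators compared with a cutoff. Weights and cutoff are hypothetical.
def discriminant_score(f_minus_k, vocab_errors, bender_signs,
                       weights=(0.05, 0.10, 0.30)):
    w1, w2, w3 = weights
    return w1 * f_minus_k + w2 * vocab_errors + w3 * bender_signs

def classify(score, cutoff=2.0):
    return "faking" if score > cutoff else "not faking"

# A profile with a high F-K index, many errors on the Vocabulary measure,
# and several Bender-Gestalt faking signs falls above the cutoff.
print(classify(discriminant_score(20, 8, 4)))   # faking
print(classify(discriminant_score(-5, 1, 0)))   # not faking
```

In practice the weights are estimated from the validation samples (here, the inmates instructed to fake and the nonfaking controls) so that the composite best separates the two groups.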
area is limited by a number of design restrictions. Many studies use college students rather
than samples of individuals who may be more
likely to malinger. Scores of subjects asked to
fake are compared with scores of subjects answering under standard conditions, rather than with
those of individuals who are genuinely disturbed.
Typically, no incentive is provided to subjects
Items can be omitted for a wide variety of reasons discussed in detail in the MMPI Handbook
(W. G. Dahlstrom, Welsh, & L. E. Dahlstrom,
1972). The Handbook suggests that if there is an
elevated Cannot Say score, the subject be interviewed to determine the reason (e.g., suspiciousness, lack of understanding, depression, fear of
loss of privacy, etc.).
The Lie (L) scale. This 15-item scale is designed
to identify deliberate efforts to lie on the test.
The items involve denial of aggression, prejudices, poor self-control, etc., that in fact are rather
common and that most people are willing to
admit. All the keyed responses are false. High
scores on the L scale represent a rather unsophisticated attempt to fake good, and so scores on this
scale are related to socioeconomic level, intelligence, and education, with more sophisticated
persons from higher socioeconomic levels scoring lower. Higher scorers on this scale tend to be
rigid, overly conventional, socially conforming
individuals who are moralistic and unoriginal.
The F scale. The earliest validity index on the
MMPI was the F scale, designed by the authors
to detect deviant response sets. The F scale was
derived from a set of 64 items endorsed by less
than 10% of the normal normative group. The
keyed responses are the infrequent answers; a
high score presumably reflects falsification or
random answering. The content of the items is
quite diverse, covering some 19 different areas
such as hostility, poor physical health, feelings of
isolation, and atypical attitudes toward religion
and authority. The scale does not assess why a
person is endorsing the rare answers. Does the
person not understand the directions? Is there a
deliberate intent to fake? Does the person's mental status interfere with honest completion of the
test? (Scores on the F scale do correlate with scores
on the schizophrenia scale.)
The F scale has often been used to detect the
presence of faking bad (e.g., Cofer, Chance, &
Judson, 1949). In one study, the F scale by itself
was the most effective index for identifying faking bad subjects (Exner, McDowell, Pabst, et al.,
1963).
One problem with the F scale is that elevated scores are also associated with blacks,
maladjusted individuals, individuals of lower
FIGURE 16-1. Comparison of MMPI-2 profiles for fake-bad instructions vs. psychiatric
patients (T scores, rounded to nearest whole number). [Based on J. R. Graham, D. Watts,
& R. E. Timbrook (1991). Detecting fake-good and fake-bad MMPI-2 profiles. Journal of
Personality Assessment, 57, 264-277. Reprinted with permission of the publisher; Minnesota Multiphasic Personality Inventory-2 (MMPI-2) Profile for Basic Scales. Copyright
1989 the Regents of the University of Minnesota. All rights reserved. "Minnesota Multiphasic Personality Inventory-2" and "MMPI-2" are trademarks owned by the University
of Minnesota. Reproduced by permission of the University of Minnesota Press.]
of the neurotic patients, but the F-K index identified 9 out of the 11 (82%) as faked. Four judges
(including three of the original MMPI authors)
were able to correctly identify from 55% to 73%
of the 11 profiles as faked. For the psychotic profiles, the F-K index again correctly identified 9 of
the 11 (82%) faked profiles, with the four judges
achieving success rates from 91% to 100%.
In general, differences greater than +10 on the
F-K index are associated with faking bad, and
differences less than -10 are associated with faking good, although the specific cutoff score varies
from sample to sample.
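As a screening rule, the F-K index reduces to a few lines of code. The +/-10 cutoff below is the general rule of thumb just stated; as noted, the appropriate cutoff in any application varies from sample to sample.

```python
# Flag a protocol using the F-K dissimulation index: raw F score minus
# raw K score. The +10 / -10 cutoffs follow the rule of thumb in the text.
def f_minus_k_flag(f_raw, k_raw, cutoff=10):
    index = f_raw - k_raw
    if index > cutoff:
        return "possible fake bad"
    if index < -cutoff:
        return "possible fake good"
    return "no indication"

print(f_minus_k_flag(28, 10))  # possible fake bad  (F-K = +18)
print(f_minus_k_flag(4, 22))   # possible fake good (F-K = -18)
print(f_minus_k_flag(12, 8))   # no indication      (F-K = +4)
```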
MMPI-2. Initial results on the MMPI-2 show that
concerned with possible faking, a common procedure is to administer, in addition to the usual
tests that do not have a faking index (such as the
Beck Depression Inventory), one or more scales
to measure the possible presence of such faking;
such scales often come from the MMPI.
At least two questions come to mind. First, how
valid are such scales when they are not embedded in their original instrument? And second,
how well do these scales work compared to each
other?
Cassisi and Workman (1992) asked college students to take the complete MMPI-2 and an additional short form that included the 102 items
that comprise the L, F, and K scales. They then
scored these two sets of scales (those in the standard MMPI-2 and those in the short form). They
found correlations of .78 for the L scale, .87 for
the F scale, and .83 for the K scale; these coefficients are quite equivalent to the test-retest correlation coefficients reported in the test manual.
They then asked another sample of students to
take the 102-item form, under either standard
inally called the Dissimulation scale, was developed through external criterion analyses that
compared the MMPI responses of actual neurotics with the responses of normal subjects who
were asked to fake neurosis. In the current CPI
(Form 434; Gough & Bradley, 1996) the Wb scale contains 38 items; these items originally showed large
differences between the neurotic samples and
the dissimulation samples, but the items also have
identical rates of endorsement between patients
and normals. The original scale was used with
the MMPI, but when the CPI was published,
the scoring was reversed and the scale retitled
Well-being.
Very low scores on this scale are indicative
of faking bad. The scale, however, is not simply a validity scale but also reflects personological aspects. Thus, high scores tend to be associated with behaviors that reflect productivity,
self-confidence, and getting along well with others;
in short, a person who has good health and a
positive outlook on life.
Good impression (Gi) scale. The Gi scale consists of 40 items obtained from a pool of 150
items that were administered to high-school students twice: first under standard instructions,
and second as if they were applying for a very
important job. Those items that showed significant response shifts under the two sets of instructions were retained for the scale. Most of the
item content reflects obvious claims to favorable attributes and virtues, or denial of negative aspects such as failings. As with other CPI
scales, there is also a rather complex layer of
personological implications attached to scores,
(1957b) showed that the proportion of endorsement was highly correlated with social desirability. Edwards (1957b) then developed a social-desirability scale of 39 MMPI items, each item
keyed for the socially desirable response. Thus, individuals who obtain high scores on the social-desirability scale have endorsed a relatively large
number of socially desirable responses, and those
with low scores have given few socially desirable
responses.
Ratings of social desirability are highly reliable,
whether we compare different groups of judges
rating the same items, or the same group of judges
rating the same items on two occasions, with typical coefficients in the .90s. In fact, even when the
groups of judges are quite different, such as college students vs. schizophrenic patients, or they
come from different cultures (e.g., Japan and the
United States), the degree of correspondence is
quite high (Iwawaki & Cowen, 1964; J. B. Taylor,
1959).
Meaning of social desirability. The concept of
have been developed to measure social desirability (Paulhus, 1984), but there are three major
ones: the Edwards scale (A. L. Edwards, 1957b),
the Marlowe-Crowne scale (Crowne & Marlowe,
1960), and the Jackson scale (D. N. Jackson,
1984).
The Edwards social desirability scale evaluates
the tendency of subjects to give socially desirable
responses. It consists of 39 MMPI items that were
unanimously responded to in a socially desirable
fashion by 10 judges, and that correlated substantially
with the scale total (i.e., the scale is internally consistent).
The Marlowe-Crowne SD scale measures the
tendency to give culturally approved responses,
and consists of 33 true-false items selected
from various personality inventories, items that
describe culturally approved behaviors with a low
probability of occurrence. These items reflect cultural approval, avoid the psychopathological content of the MMPI, are keyed for social desirability
on the basis of agreement by at least 9 out of 10
judges, and also showed substantial correlation
with the scale total.
Finally, Jackson's social desirability scale is a
scale from his Personality Research Form (see
Chapter 4); it assesses the tendency to describe
oneself in desirable terms and to present oneself favorably. This scale consists of 20 nonpsychopathological items selected from a larger
pool of items scaled for social desirability. These
items were chosen to avoid substantial content
homogeneity, as well as extreme endorsement
probabilities.
Intercorrelation of social-desirability scales.
The authors conducted a factor analysis and concluded that the Edwards and the Jackson scales
assess a dimension of social desirability that they
labeled a sense of one's own general capability,
and that the Marlowe-Crowne assessed a separate dimension of interpersonal sensitivity,
thus suggesting that SD has a "self" component
and an "other" component.
Components of social desirability. The various
measures of social desirability can be incorporated within a two-factor model, with at least
two such models present in the literature. In one,
the two dimensions are attribution vs. denial,
i.e., claiming socially desirable characteristics for
oneself and denying undesirable characteristics
(e.g., Jacobson, Kellogg, Cauce, et al., 1977).
A second model argues that the two dimensions are self-deception vs. impression management: in self-deception the respondent actually believes the positive self-report, whereas in
impression management there is conscious faking (e.g., Paulhus, 1986). Thus self-deception is a
response style that involves an unconscious tendency to see oneself in a favorable light, whereas
impression management is a conscious presentation of a false front (Zerbe & Paulhus, 1987). Most
of the social desirability scales address impression
management, or a combination of the two. Studies suggest that the impression management style
of responding is in fact used by few job applicants
and has a negligible effect on criterion validity
(Schmit & Ryan, 1993).
Ganster, Hennessey, and Luthans (1983)
argued that there are actually three types of
social desirability effects: spuriousness, suppression, and moderation. Social desirability may create spurious or misleading correlations between
variables, or suppress (that is, hide) the relationship between variables, or moderate (that
is, interact with) relationships between variables.
In spuriousness, social desirability is correlated
with both the predictor and the criterion. Any
observed correlation between the two results
from the shared variance in social desirability
rather than some other aspect. Statistically, we
can partial out the effects of such social desirability. In suppression, the social desirability masks
the true relationship. By statistically controlling
for social desirability, the relationship between
predictor and criterion increases in magnitude.
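Partialling out social desirability, as just described, is a first-order partial correlation. The formula below is the standard one; the score vectors are invented solely for illustration.

```python
import math

# Pearson correlation of two equal-length score lists.
def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Correlation of predictor x with criterion y, controlling for z
# (here, a social-desirability score).
def partial_r(x, y, z):
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical scores: predictor, criterion, social desirability.
x = [10, 12, 9, 15, 11, 14, 8, 13]
y = [20, 25, 18, 30, 22, 27, 17, 26]
z = [5, 7, 4, 9, 6, 8, 3, 7]
print(round(pearson_r(x, y), 2), round(partial_r(x, y, z), 2))
```

In the suppression case the partial correlation comes out larger than the zero-order correlation; in the spuriousness case it shrinks toward zero.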
absence of such symptoms? They therefore developed their scale from non-MMPI type items.
Their initial item pool consisted of 50 items
drawn from a variety of personality instruments.
They retained those items where there was at
least 90% agreement on the part of raters as
to the socially desirable response direction, and
they also retained those items that discriminated
between low and high scorers. The final scale
consisted of 33 items, 18 keyed true and 15
keyed false; for example, "I always try to practice what I preach" (true) and "I like to gossip at
times" (false).
Reliability. Crowne and Marlowe (1960) computed the internal consistency reliability (K-R 20)
to be .88 in a sample of 39 college students, and
the test-retest (1-month interval) to be .89 on a
sample of 31 college students.
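The K-R 20 coefficient reported here can be computed directly from the item-level true-false (0/1) responses. The formula below is standard; the small response matrix is invented, since the original protocols are of course not reproduced in the text.

```python
# Kuder-Richardson formula 20 for dichotomously scored items.
# rows: one list of 0/1 item scores per respondent.
def kr20(rows):
    n_items = len(rows[0])
    n_resp = len(rows)
    totals = [sum(r) for r in rows]
    mean_t = sum(totals) / n_resp
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_resp
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(r[j] for r in rows) / n_resp  # proportion endorsing item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_t)

data = [  # 5 hypothetical respondents x 5 items
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
print(round(kr20(data), 2))  # 0.78
```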
Validity. In the original article (Crowne & Mar-
through careful construct validation. In general, to the degree that there is an imbalance
in the true-false keying of the items in a scale,
the scale is susceptible to acquiescent responding. Thus the solution, at least on the surface,
would appear simple: any test constructor should
attempt to reduce such an imbalance. Items can
be rewritten, so that "I am happy" becomes
"I am not happy." There is, however, evidence
that regularly worded items are the most reliable and that negative and polar-opposite items
(e.g., "I am sad") may in fact lower the reliability, if not the validity, of a test (e.g., Benson &
Hocevar, 1985; Schriesheim, Eisenbach, & Hill,
1991).
Another way to reduce acquiescence is to make
sure that test items are clear, unambiguous, and
anchored in specific behaviors. "Do you like
food?" is an item that almost requires a positive
answer, but "Do you eat four or more meals per
day?" is much more specific and unambiguous.
A number of statistical procedures have also
been suggested as a way to measure and control
acquiescence. Logically, for example, we could
derive two scores from a scale, one based on
the keyed items, and one based on the number of true responses given regardless of the
keyed direction. D. N. Jackson (1967) argued
that such procedures do not generally control
acquiescence.
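The two-score procedure mentioned above can be sketched directly: from a single true-false protocol we derive both the conventional keyed score and an acquiescence count, i.e., the number of "true" responses regardless of keyed direction. The items and keyed directions below are hypothetical.

```python
# Derive (keyed score, acquiescence count) from one true-false protocol.
def scores(responses, key):
    keyed = sum(1 for r, k in zip(responses, key) if r == k)
    acquiescence = sum(1 for r in responses if r)  # "true" answers, any key
    return keyed, acquiescence

key = [True, False, True, False, True]       # hypothetical keyed directions
responses = [True, True, True, True, False]  # a yea-saying respondent
print(scores(responses, key))  # (2, 4): modest keyed score, high true-count
```

As the text notes, D. N. Jackson (1967) argued that such counts do not generally control acquiescence; the sketch only shows how the two scores are derived.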
We could also, of course, not use true-false
items, but other variants such as forced-choice or multiple-choice procedures, where the
various alternatives are matched for endorsement
frequency.
One way to assess whether acquiescence plays
a role on a particular scale is to rewrite the items
so that the keyed response is false, and to administer both the original items and the rewritten
items. This has been done for the MMPI, and
the correlations of the original scales with their
corresponding reversed forms are similar to their
test-retest reliabilities; that is, acquiescence does
not seem to play a major role (E. Lichtenstein &
Bryan, 1965).
OTHER ISSUES
Variance in personality tests. J. S. Wiggins
(1968) placed the matter in perspective by indicating that the variance associated with response
sets and/or styles covers, at best, 16% of the variance of any socially relevant criterion measure.
Why do scores on a personality scale differ
from person to person? The question may seem
trite and the answer obvious: people differ from
each other; that is, variance is due to individual
differences on the trait being measured. However, there are other sources of variance, such as:
(1) variance due to the particular strategy used to
construct the scale, (2) variance due to the specific item characteristics, and (3) variance due to
response styles (J. S. Wiggins, 1968).
1. Different strategies. Despite the proliferation of instruments devised by different strategies, there is very little by way of systematic comparison among strategies. The study by
Hase and Goldberg (1967) is somewhat unique.
You recall that they compared four strategies of
scale construction, all using the common item
pool of the CPI. The results indicated the four
main strategies to be equivalent.
2. Analyses of the variance due to specific
item characteristics have in large part focused on
social desirability. The procedure here, initiated
by Edwards (1957b), is to have judges estimate
the social-desirability scale value of specific items
using a Likert-type scale. These ratings are then
averaged across judges to provide a mean rating for each item. Such ratings are quite reliable
and show a fair amount of consistency for different groups of judges. There are, at the same
time, significant individual differences on these
ratings.
3. Variance due to stylistic consistencies or
response styles refers to aspects such as the tendency to be critical, extreme, acquiescent, and so
on. Here also, the focus is social desirability, but
this time the Social Desirability scale developed by Edwards (1957b) is typically used. This
39-item scale was developed from the MMPI item
pool and from the Taylor Manifest Anxiety Scale
(which is itself composed of MMPI items). In
fact, 22 of the 39 items come from the TMAS.
Because anxiety is in fact the major dimension
found in tests such as the MMPI, it is not surprising that the Social Desirability scale (composed of
anxiety items) should correlate significantly with
almost any MMPI scale. It is thus not surprising that Edwards could predict a person's MMPI
scores by equations derived solely from social-desirability indices.
of
the earlier literature used direct but simple
approaches to determine the occurrence of faking. Thus, the investigator either had a predetermined decision rule (e.g., all profiles that have an
F score above 80 are to be classified as fake)
or looked at a distribution of results and determined which cutoff score would give maximum
correct identification of profiles as either faked or
not. Recent studies combine indices, sometimes
from different tests, to assess faking. A representative example of a well-done empirical study is
that by Schretlen and Arkowitz (1990), who gave
a series of tests, including the MMPI and the Bender-Gestalt, to two groups of prison inmates who
were instructed to respond as if they were either
mentally retarded or insane. Their responses were
compared with those of three criterion groups:
psychiatric inpatients, mentally retarded adults,
and prison-inmate controls, all taking the tests
under standard conditions. Based on discriminant analyses, 92% to 95% of the subjects were
correctly classified as faking or not faking.
The Psychological Screening Inventory (PSI).
nonverifiable items. Some studies on the fakability of these items have found little response
bias (e.g., Cascio, 1975; Mosel & Cozan, 1952),
while others have (e.g., Goldstein, 1971; S. P.
Klein & Owens, 1965; Schrader & Osburn, 1977;
D. J. Weiss & Dawis, 1960). Some studies have
reported that subjects can improve their scores
when instructed to do so, while other studies show that, in actual practice, relatively little faking occurs, particularly on biodata items
that can be verified (T. E. Becker & Colquitt,
1992). Mumford and Owens (1987) speculated
that such differences may be due to the item-keying strategies used in scoring the particular
biodata questionnaire.
There are two major strategies: the item-keying strategy and the option-keying strategy. In
the item-keying strategy the alternatives to each
item are scored in a way that assumes a linear
relationship between item and criterion. Thus,
if the choices are the typical Likert responses
(e.g., "I am a take-charge person": strongly agree,
agree, not sure, disagree, strongly disagree), the
item is scored from 5 to 1 if there is a positive
correlation with the criterion (and 1 to 5 if the
correlation is negative).
With the option-keying strategy, each alternative is analyzed separately, and scored only if that
alternative significantly correlates with the criterion. Typically, we might analyze the responses
given by contrasted groups (e.g., successful insurance salespeople vs. unsuccessful ones), and score
those specific alternatives that show a differential
response endorsement, with either unit weights
(e.g., 1) or differential weights (perhaps reflecting
the percentage of endorsement or the magnitude
of the correlation coefficient).
Although both scoring procedures seem to
yield comparable results from a validity point
of view, items scored with the option-keying
procedure are less amenable to faking because
the respondent does not know which option
yields the maximal score for that item. That is,
this procedure could well yield items where a
"strongly agree" response is scored zero, but an
"agree" response is scored +1.
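The contrast between the two strategies can be sketched for a single Likert-format biodata item. Item-keying assigns the linear 5-to-1 weights; option-keying scores each alternative on its own. The option weights below are hypothetical examples of what a contrasted-groups analysis might yield.

```python
LIKERT = ["strongly agree", "agree", "not sure", "disagree", "strongly disagree"]

# Item-keying: assumes a linear item-criterion relationship, scoring
# 5..1 (or 1..5 if the item-criterion correlation is negative).
def item_key_score(response, positive_correlation=True):
    rank = LIKERT.index(response)  # 0..4 from "strongly agree" down
    return 5 - rank if positive_correlation else rank + 1

# Option-keying: each alternative carries its own empirically derived
# weight; alternatives that did not discriminate score zero.
OPTION_WEIGHTS = {"strongly agree": 0, "agree": 1, "not sure": 0,
                  "disagree": 0, "strongly disagree": -1}

def option_key_score(response):
    return OPTION_WEIGHTS[response]

print(item_key_score("strongly agree"), option_key_score("strongly agree"))  # 5 0
print(item_key_score("agree"), option_key_score("agree"))                    # 4 1
```

This is the situation the text describes: under option-keying the respondent cannot tell that "agree" outscores "strongly agree," which is what makes the procedure harder to fake.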
Kluger, Reilly, and Russell (1991) investigated
the fakability of these two procedures, as well as
fakability under two types of instructions:
general instructions of applying for a job vs.
specific instructions of applying for the job of
other instruments do not fare as well. For example, on the Philadelphia Geriatric Center Morale
Scale (discussed in Chapter 10), scores correlated
.70 with the Edwards Social Desirability Scale
in one study (Carstensen & Cone, 1983). As the
authors indicated, it is to be expected that a measure of psychological well-being should correlate
with social desirability, because it is socially desirable to be satisfied with life and to experience high
morale. But such a high correlation calls into
question the construct validity of the scale.
TEST ANXIETY
Test anxiety, like anxiety in general, is an unpleasant general emotion in which the individual feels
apprehensive and worried. Test anxiety is such an emotion attached to testing situations, situations which the individual perceives as evaluative. Test anxiety is a major problem for many
individuals; it can also be a major problem from a
psychometric point of view, because it can lower
a student's performance on many tests.
During the 1950s, there was a proliferation of
scales designed to assess general anxiety (Sarason, 1960). One of the major ones was the Taylor Manifest Anxiety Scale (J. A. Taylor, 1953).
This scale was a somewhat strange product. Clinical psychologists were asked to judge which
MMPI items reflected a definition of anxiety, and
these became the Taylor scale; but the scale was really
designed to measure drive, an important but
generic concept in learning theory. Subsequently,
research focused on specific types of anxiety,
such as social anxiety and test anxiety. Seymour
Sarason and his colleagues (Sarason, Davidson,
Lighthall, et al., 1960) developed the Test Anxiety
Scale for Children, which became the first widely
used test-anxiety instrument.
The Test Anxiety Inventory (TAI). This inventory (Spielberger, 1980) is one of the major measures used to assess test anxiety. The TAI yields
a total score, as well as two separate subscores
indicating worry and emotionality. The TAI has
been used widely with both high-school and college students (Gierl & Rogers, 1996) and has
been translated or adapted into many languages,
including Italian (Comunian, 1985) and Hungarian (K. Sipos, M. Sipos, & Spielberger, 1985);
these versions seem to have adequate reliability
and construct validity (see DeVito, 1984, for a
review).
TESTWISENESS
Testwiseness or test sophistication refers to a person's ability to use the characteristics and format
of a test or test situation to obtain a higher score,
independent of the knowledge that the person
has. Put more simply, there are individual differences in test-taking skills. Research suggests that
testwiseness is not a general trait and is not related to
intelligence, but is clue-specific, that is, related to
the particular type of clue found in the test items
(Diamond & Evans, 1972; T. F. Dunn & Goldstein,
1959). Apparently, even in sophisticated college
students, individual differences in testwiseness
may be significant; Fagley (1987) reported that
in a sample studied, testwiseness accounted for
about 16% of the variance in test scores.
Millman, Bishop, and Ebel (1965) analyzed
testwiseness into two major categories: those aspects
independent of the test and those dependent
on it. Independent aspects include strategies to
use test time wisely, to avoid careless errors, to
make a best guess, and to choose an answer using
deductive reasoning. Dependent aspects include
strategies to interpret the intent of the test constructor and the use of cues contained within the
test itself.
Experimental test. One of the tests developed to
measure testwiseness is the Gibb (1964) Experimental Test of Testwiseness, which is composed
of 70 multiple-choice questions. The questions
appear to be difficult history items, but can be
answered correctly by using cues given within
the test question or the test itself. There are seven
types of cues, including alliterative association
cues, in which a word in the correct answer alternative sounds like a word in the item stem, and
length cues, where the correct alternative is longer
than the incorrect alternatives. The reliability of
the test is adequate though low: .72 for K-R 20
and .64 for test-retest with a 2-week period
(Harmon, D. T. Morse, & L. W. Morse, 1996). The
validity has not yet been established, although a
factor analysis suggested that the test could be
characterized as tapping a general proficiency
in testwiseness (Harmon, D. T. Morse, & L. W.
Morse, 1996).
Eliminating testwiseness. Testwiseness is pri-
Ganellen, R. J. (1994). Attempting to conceal psychological disturbance: MMPI defensive response sets and the Rorschach. Journal of Personality Assessment, 63, 423–437.
The subjects of this study were commercial airline pilots who were required to undergo an independent psychological evaluation after completing a treatment program for alcohol or substance abuse. Because the results of the evaluation would have a bearing on whether their pilots' licenses would be reinstated, there was considerable incentive to fake good. Did they? Did the potential defensive response set affect the MMPI and the Rorschach? Read the article to find out!
AIM This chapter looks at the role of computers in psychological testing. Computers have been used as scoring machines, as test administrators, and recently as test interpreters. We look at the issue of computer-based test interpretation (CBTI) and questions about the validity of such interpretations. We consider ethical and legal issues, as well as a variety of other concerns. The role of computers in testing is a very hot topic currently, with new materials coming out frequently. Entire issues of professional journals have been devoted to this topic (e.g., the December 1985 issue of the Journal of Consulting and Clinical Psychology and the School Psychology Review, 1984, 13, [No. 4]), and there are entire journals that focus on computers and psychology (e.g., Computers in Human Behavior). At the same time, this is a relatively new field, and many issues have not yet been explored in depth.
HISTORICAL PERSPECTIVE
Evidence
view, there are two major approaches to assessing the equivalence of p-p and cf versions (F. R. Wilson, Genco, & Yager, 1985):
1. Classical test theory. Here we wish to show that the two versions yield equal means and variances, and that the pattern of correlations with other measures, as in criterion validity, is essentially identical. We may also wish to assess the convergent-discriminant validity of the two forms.
2. Generalizability theory. Here we ask whether the obtained results are generalizable across different conditions; that is, if we wish to diagnose an individual as having a particular psychiatric syndrome, it should not make any difference whether the MMPI was administered as a cf or as a p-p instrument. What is more important, we can identify various sources of potential variability. (For an example that combines the two approaches, see F. R. Wilson, Genco, & Yager, 1985.)
Equivalence of reliability. Few studies have addressed this issue. D. L. Elwood and Griffin (1972) compared the p-p and cf versions of the WAIS through a test-retest design and found the results to be virtually identical.
Equivalence of speeded tests. Many clerical and perceptual tests are highly speeded. They consist of items that are very easy, such as crossing out all the letter e's in a page of writing, but the score reflects the ability to perform the task rapidly; if sufficient time were given, all examinees would obtain perfect scores. With such tests, there are a number of potential differences between the p-p version and the cf version. In general, marking a bubble on an answer sheet takes considerably longer than pressing a computer key. Several studies do indicate that with speed tests, subjects are much faster in the cf version, that the reliability of scores obtained on the cf presentation is as high as that of the p-p version, and that despite the differences in mean performance, the correlation between the two forms (reflecting the rank order of individuals) can be quite high (e.g., Greaud & Green, 1986; Lansman, Donaldson, Hunt, et al., 1982).
One possible advantage of the cf version is that a fixed number of items can be administered, and elapsed time can be tracked simultaneously with the number of items attempted. This is much more difficult to do with p-p tests. Whether such information can be used in a predictive sense (e.g., to predict job performance) remains to be seen.
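The kind of tracking a computer makes possible can be sketched in a few lines. The response log and time limit below are invented; the sketch simply shows how per-item latencies, which a p-p answer sheet cannot capture, let us score both speed and accuracy.

```python
# Each entry: (answered_correctly, response_latency_in_seconds).
# The log itself is invented for illustration.
response_log = [
    (True, 0.8), (True, 0.7), (False, 1.4), (True, 0.9),
    (True, 0.6), (True, 1.1), (False, 1.3), (True, 0.8),
]

TIME_LIMIT = 6.0  # seconds allowed for the whole task

def score_speed_test(log, time_limit):
    """Count items attempted and answered correctly before time runs out."""
    elapsed = 0.0
    attempted = correct = 0
    for is_correct, latency in log:
        if elapsed + latency > time_limit:
            break
        elapsed += latency
        attempted += 1
        correct += int(is_correct)
    return attempted, correct, elapsed

attempted, correct, elapsed = score_speed_test(response_log, TIME_LIMIT)
print(attempted, correct, round(elapsed, 1))  # prints: 6 5 5.5
```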
Even with tests that are not highly speeded, equivalence can be problematic if speed is involved. For example, Van de Vijver and Harsveld (1994) analyzed the equivalence of the General Aptitude Test Battery as administered to Dutch applicants to a military academy. You will recall from Chapter 14 that the GATB is a general intelligence speed test that uses a multiple-choice format and is composed of seven subtests, such as vocabulary and form matching. Differences between the p-p and cf versions were small though noticeable, with the cf subtests producing faster but less accurate responses; simple clerical tasks were affected more than complex tasks.
A meta-analysis of the equivalence of p-p versus cf versions of cognitive ability tests indicated a significant difference between speeded tests and power tests. For power tests, the typical correlation was .97, whereas for speeded tests it was .72. Keep in mind that the definitions of speeded and of power tests, such as the ones we gave in Chapter 1, are idealized types; most tests are somewhere in between. Many power tests, such as the SAT and the GRE, do in fact have significant time limits.
through computer, it is relatively easy to program the computer to calculate raw scores and to change such scores into standard scores or derived scores, such as T scores and percentiles. The computer is an ideal instrument to carry out such calculations based on the extensive normative data that can be programmed into it.
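The conversions involved are simple to program. A minimal sketch using Python's standard library; the normative mean and standard deviation here are hypothetical, and the percentile assumes normally distributed scores.

```python
from statistics import NormalDist

# Hypothetical normative data for one scale.
NORM_MEAN, NORM_SD = 24.0, 6.0

def to_standard_scores(raw):
    """Convert a raw score to a z score, a T score, and a percentile."""
    z = (raw - NORM_MEAN) / NORM_SD
    t = 50 + 10 * z                      # T scores: mean 50, SD 10
    pct = NormalDist().cdf(z) * 100      # percentile, assuming normality
    return round(z, 2), round(t, 1), round(pct, 1)

print(to_standard_scores(30.0))  # one SD above the mean -> (1.0, 60.0, 84.1)
```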
Do examinees like the computer? Initially,
there was a great deal of concern that computers might be seen as impersonal and cold, and
that examinees would respond negatively to the
experience, and perhaps alter their test answers.
In fact, the research indicates that in most cases,
if not all, examinees like computer testing. These
studies have obtained similar results with a variety of clients and a variety of tests (e.g., M. J.
Burke, Normand, & Raju, 1987; Klingler, Johnson, & Williams, 1976; N. C. Moore, Summer, &
Bloor, 1984; F. L. Schmidt, Urry, & Gugel, 1978).
the MMPI (e.g., Drake & Oetting, 1959; Gilberstadt & Duker, 1965; P. A. Marks & Seeman, 1963), but these systems have failed when applied to samples other than those originally used. One of the major problems is that a large number of MMPI profiles cannot be classified following the complicated rules that such MMPI cookbooks require for actuarial interpretation.
2. The clinical method. The clinical method involves one or more experts who use the results of the research literature and actuarial data, combining these data with their own acuity and clinical skills to generate a library of potential interpretations associated with different scale scores and combinations. A variety of such clinical computer-based interpretive systems are now available for a wide variety of tests, but primarily for the MMPI.
One argument present in the literature (e.g., Matarazzo, 1986) is that the same test scores, as on the MMPI, can be interpreted differently given different demographic factors. For example, a high paranoia-scale score would be interpreted one way if obtained by a 45-year-old patient with schizophrenia, but rather differently if obtained by a 21-year-old college student. The clinician attempts to take into account the unique characteristics of the client, whereas the computer can only be programmed in a nomothetic fashion; for example, if the client is older than 28, it interprets a score one way, and if younger, another way.
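That nomothetic style of programming can be sketched as a simple conditional rule. Everything here is hypothetical: the scale name, the cutoff of T = 65, the age threshold, and the interpretive wording are invented for illustration and imply nothing about actual MMPI rules.

```python
def interpret_elevation(scale, t_score, age):
    """Toy nomothetic rule: the same elevation is read differently by age."""
    if t_score < 65:
        return f"{scale}: within normal limits"
    if age > 28:
        return f"{scale}: elevation consistent with a chronic clinical pattern"
    return f"{scale}: elevation may reflect transient situational distress"

# The same score yields different statements for different clients.
print(interpret_elevation("Scale X", 75, age=45))
print(interpret_elevation("Scale X", 75, age=21))
```

The limitation the text describes is visible in the code: the branching can use only the variables that were programmed in, whereas a clinician may weigh characteristics no rule anticipated.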
Unfortunately, the literature suggests that in attempting to take into account the individual uniqueness of the client, the validity of reports generated by clinicians usually decreases, primarily because clinicians are inconsistent in their judgment strategies, whereas computers are consistent. Virtually all of the 100 or so studies that compare the clinical vs. the actuarial methods show that the actuarial methods equal or exceed the accuracy of the clinical methods. The greater the extent to which clinicians rely on empirically established methods of data interpretation and collection, the greater their overall accuracy. The computer has tremendous potential to play a major role in increasing the accuracy of psychological evaluations and predictions as the programming becomes more sophisticated in evaluating an individual's test scores.
or computerized test interpretation is not equivalent to actuarial. Meehl (1954) specified that an actuarial method must be prespecified; that is, it must follow a set procedure and be based on empirically established relations. A computerized test interpretation may simply model or mimic subjective clinical judgment, or it may actually be based on actuarial relationships. As a matter of fact, very few computerized-report programs are based solely on actuarial relationships. Why is that? In large part because, at present, actuarial rules on tests such as the MMPI would tend to classify few protocols at best.
Study the clinician. Because we basically want
and indeed with almost every test, we cannot simply ask, "Are CBTIs valid?" We must ask, "Which CBTI, and for what purpose?" Not many studies are available to answer these questions, but hopefully they will be in the future. Ideally, to evaluate a CBTI, we need to evaluate the library of item statements and the interpretive rules by which statements are selected. Very few studies have been done in which experts are asked to look at the interpretive system and evaluate the accuracy of the decision rules that lead to the interpretive statements (e.g., Labeck, Johnson, & Harris, 1983). Such information is typically not available, and thus most studies evaluate the resulting CBTI, either in toto or broken down into sections.
A number of studies can be subsumed under the label of "consumer satisfaction," in that clinicians are asked to rate the accuracy of such reports, using a wide variety of experimental approaches, some of which control possible confounding effects and some of which do not (see Labeck, Johnson, & Harris, 1983; J. T. Webb, Miller, & Fowler, 1970, for examples). Moreland (1987) listed 11 such studies (although many others are available, especially on the MMPI) with accuracy ratings ranging from a low of 32% to a high of 91%, and a median of 78.5%. Although this consumer-satisfaction approach seems rather straightforward, it has a number of methodological problems (see D. K. Snyder, Widiger, & Hoover, 1990). A general limitation of these studies is that high ratings of satisfaction do not prove validity.
In a second category of studies, clinicians are asked to complete a symptom checklist, a Q sort, or some other means of capturing their evaluation, based on a CBTI. These judgments are then compared with those made by clinicians familiar with the client, or with some other criterion, such as an interview. Moreland (1987) listed five such studies, with mean correlations between sets of ratings (based on the CBTI vs. based on knowledge of the client) between .22 and .60; the median of the 18 such correlation coefficients was .33. Such studies are now relatively rare (see Moreland, 1991, for a review).
ranges of scores have been determined empirically. These ranges differ from scale to scale, so a clinician would need to know that, for example, a T score of 65 on one scale might well represent a moderate score, while on another scale it would represent a high score. The test author (D. K. Snyder, 1981) developed a CBTI composed of more than 300 interpretive paragraphs based on individual scale elevations as well as configural patterns, both within and across spouses. (For an interesting case study and an example of a CBTI, see D. K. Snyder, Lachar, & Wills, 1988.)
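Scale-specific cutoffs of this kind are straightforward to encode. A sketch with invented scale names and invented range boundaries; the actual empirically derived ranges differ from scale to scale, as noted above.

```python
# Hypothetical, empirically derived boundaries per scale, expressed as the
# T scores at which the "moderate" and "high" ranges begin.
SCALE_CUTOFFS = {
    "Scale A": (60, 70),  # a T of 65 falls in the moderate range
    "Scale B": (55, 63),  # a T of 65 falls in the high range
}

def classify(scale, t_score):
    """Look up the range a T score falls into for a particular scale."""
    moderate_at, high_at = SCALE_CUTOFFS[scale]
    if t_score >= high_at:
        return "high"
    if t_score >= moderate_at:
        return "moderate"
    return "low"

# The same T score is read differently depending on the scale.
print(classify("Scale A", 65))  # moderate
print(classify("Scale B", 65))  # high
```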
Neuropsychological Testing
and Computers
Report = work. A neuropsychological report represents a great deal of professional time, and so it would be most advantageous if such reports could be computer generated. Also, there is research indicating greater diagnostic accuracy for mechanical/statistical methods of data combination than for human clinical judgment, so the diagnostic accuracy of a well-designed computer program should exceed that of clinicians.
However, at present this potential has not been realized. Adams and Heaton (1985, 1987) point out that there are several obstacles to the development of computer programs to interpret neuropsychological test results:
1. There is little uniformity in clinical neuropsychology. The specific tests, the types of interpretations and recommendations, and the types of diagnostic decisions made are so varied that computer programs have great difficulty incorporating all these parameters.
2. We have an incomplete state of knowledge about brain-behavior relationships. Most research studies are based on samples that have well-defined conditions, but in most clinical settings the patients who need a diagnostic workup have conditions that are not so clearly defined.
3. In normal adults, performance on neuropsychological tests is correlated positively with education and negatively with age, indicating that such demographic variables are important in neuropsychological test interpretation. We do not yet have adequate norms to incorpo-
Test administration. Neuropsychological patients as a group have difficulties with such aspects as following directions and persisting with a task. Also, qualitative observations of the testing are very important in the scoring and test interpretation. Thus, the kind of automation that is possible with a test such as the MMPI may not be feasible with neuropsychological testing, more because of
ADAPTIVE TESTING AND COMPUTERS
The advent of computers in psychological testing has resulted in a number of advances, perhaps most visibly in the area of adaptive testing. In a typical test, all examinees are administered the same set of items. In Chapter 2 we discussed the bandwidth-fidelity dilemma. Do we measure something with a high degree of precision, but applicable to only some of the clients, or do we measure in a broad perspective, applicable to most, but with little precision? Adaptive testing solves that dilemma. In an adaptive test, different sets of items are administered to different individuals depending upon the individual's status on the trait being measured (Meijer & Nering, 1999; D. J. Weiss, 1985). For example, suppose we have a 100-item multiple-choice vocabulary test, with items ranging in difficulty from the word "cat" to the word "gribble." Rather than starting everybody with the word "cat," if we are testing a college student, we might begin with a word of middle difficulty such as "rickets." If the person gets that item correct, the next item to be presented would be of slightly higher difficulty. If the person answers incorrectly, then the next item to be presented would be of lower difficulty. Thus the computer calculates for each answered item whether the answer is right or wrong, and what the next appropriate item would be. As the person progresses through the test, the calculations become much more complex because, in effect, the computer must calculate a running average plus many other more technical aspects. The person might then experience a slight delay before the next appropriate item is presented. However, the computer is programmed to calculate the two possible outcomes (the answer is right or the answer is wrong) while the person is answering the item. When the answer is given, the computer selects the pertinent alternative and testing proceeds smoothly.
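The branching just described can be sketched in a few lines. This is a simplified up-down rule over a difficulty-ordered item bank, not a full adaptive algorithm with item response theory; the item bank, starting position, and the examinee's answers are all invented.

```python
# A toy item bank sorted from easiest to hardest.
items = [f"word_{i}" for i in range(10)]

def adaptive_sequence(answers_correct, start=5):
    """Present items by moving up after a correct answer, down after an error."""
    position = start
    administered = []
    for correct in answers_correct:
        administered.append(items[position])
        step = 1 if correct else -1
        # Stay within the bounds of the item bank.
        position = min(max(position + step, 0), len(items) - 1)
    return administered

# An examinee who gets the first two items right and the third wrong:
print(adaptive_sequence([True, True, False]))
# prints: ['word_5', 'word_6', 'word_7']
```

A real adaptive test would also maintain a running ability estimate and a stopping rule, which is where the "much more complex" calculations mentioned above come in.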
There are many synonyms in the literature for adaptive testing, including tailored, programmed, sequential, and individualized testing. Adaptive testing is of course efficient and can save considerable testing time. Adaptive tests are also more precise and, at least potentially, therefore more reliable and valid.
Adaptive testing actually began with the Binet-Simon tests, and its principles were incorporated
attempt to do this. Computers can handle extensive amounts of normative data, but humans are limited. Computers can use very complex ways of combining and scoring data, whereas most humans are quite limited in these capabilities. Computers can be programmed to continuously update the norms, predictive regression equations, and so on, as each new case is entered.
6. Greater standardization. In Chapter 1, we saw the importance of standardizing both test procedures and test interpretations. The computer demands a high degree of such standardization and, ordinarily, does not tolerate deviations from it.
7. Greater control. This relates to the previous point, but the issue here is that the error variance attributable to the examiner is greatly reduced, if not totally eliminated.
8. Greater utility with special clients or groups. There are obvious benefits in the computerized testing of special groups, such as the severely disabled, for whom p-p tests may be quite limited or inappropriate.
9. Long-term cost savings. Although the initial costs of purchasing computer equipment, developing program software, and so on can be quite high, once a test is automated it can be administered repeatedly at little extra cost.
10. Easier adaptive testing. This approach requires a computer and can result in a test that is substantially shorter and, therefore, more economical of time. The test can also be individualized for the specific examinee.
Are advantages really advantages? In reading the above list, you may conclude that some advantages may not really be advantages. There may be some empirical support for this, but relatively little work has been done on these issues. For example, immediate feedback would typically be seen as desirable. S. L. Wise and L. A. Wise (1987) compared a p-p version and two cf versions of a classroom achievement test with third and fourth graders. One cf version provided immediate item feedback and one did not. All three versions were equivalent to each other in mean scores. However, high math-achievers who were administered the cf version with immediate feedback showed significantly higher state anxiety; the authors recommended that such
just discussed, we might consider that computerized testing reduces the potential for observing the subject's behavior. As we have seen, one of the major advantages of tests such as the Stanford-Binet is that subjects are presented with a set of standardized stimuli, and the examiner can observe directly or indirectly the rich individual differences in human behavior that enhance the more objective interpretation of test results. With computerized testing, such behavioral observation is severely limited.
as expert witnesses in the courtroom. Unfortunately, studies that evaluate the accuracy of diagnosis and prediction show mixed results and cast substantial doubt on whether psychological evaluation can meet the legal standards for expertise. Ziskin and Faust (1988) cite 1,400 studies and articles that cast doubt on the reliability and validity of clinical evaluations conducted for legal purposes. Why are the reliability and validity of such psychological evaluations so limited? Faust and Ziskin (1989) cite several reasons: (1) psychological theory is not so advanced as to permit precise behavioral prediction; (2) available
information that is potentially useful is often misused, by disregarding base rates and by relying on subjective rather than objective procedures; and (3) clinicians, like people in general, have a limited capacity to manage complex data. These reasons once again point to the future potential of CBTIs.
Computer tests. So far, much effort has been devoted to translating p-p versions into cf versions, and relatively little effort has been devoted to creating new computer-administered tests. A number of tests have, however, been developed specifically for computerized use, and some of these take advantage of the graphic possibilities of the computer (e.g., Davey, Godwin, & Mittelholtz, 1997). As one might imagine, psychologists in the military, both in the U.S. and in other countries including England, have pioneered many such techniques (B. F. Green, 1988). For example, one technique used with Navy personnel is an Air Defense Game that involves a simulated radar screen with hostile air targets approaching one's ship. The examinee has to defend the ship by launching missiles. The effects of stress and other aspects on such test performance can be easily studied (Greitzer, Hershman, & Kelly, 1981). These approaches would seem to be particularly useful in the assessment of perceptual-motor coordination and decision making, as well as other domains of human abilities (Fleishman, 1988).
Barrett, Alexander, Doverspike, et al. (1982) reported on a battery of information-processing tests developed specifically for computer testing (although p-p versions of these tests are also available). For example, in the Linear Scanning test, 20 equilateral triangles are presented in a row. Each triangle has a line through it, with the exception of one to four triangles. The row of triangles is presented for 1.5 seconds and then erased, and the subject must indicate how many triangles did not have a line through them. The split-half reliability for most of these measures was above .80, but test-retest reliability over a 2-week to 1-month interval was rather low, with none of the 15 reported correlation coefficients higher than .60, and most substantially lower. To be sure, these measures are more complex in structure than the type of items found on personality inventories or traditional cognitive tests, but the
have been
made to computer score such projective techniques as the Rorschach and the Holtzman
Inkblot Test (e.g., Gorham, 1967; Gorham,
Moseley, & Holtzman, 1968; Piotrowski, 1964),
but in general these do not seem to be widely used
by clinicians.
As you might expect, computer administration and computer interpretation of projectives present major challenges, but an interesting example comes from a sentence-completion test. Veldman (1967) administered 36 sentence stems to more than 2,300 college freshmen; these stems required a one-word response. He thus obtained a response pool of more than 83,000 responses. By eliminating all words with frequencies of less than 1%, a pool of 616 common responses was retained. Veldman then developed a computer program that administers each sentence stem and waits for the subject to respond. If the response is a rare word, the computer program requests a synonym. A second rare response results in another request; if the third response is also rare, then the second sentence stem is presented. If the response is a common word, that is, one in the computer's memory, then follow-up questions are presented. For example, if the sentence stem "My work has been" is answered with the word "hard" or "difficult," the computer asks, "What do you find difficult about it?" If the response is "good," the
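The branching logic Veldman describes can be sketched roughly as follows. The word list and the follow-up prompt are invented stand-ins for his pool of 616 common responses and his actual question library.

```python
# Invented stand-in for the pool of common (non-rare) responses.
COMMON_WORDS = {"hard", "difficult", "good", "boring", "interesting"}
MAX_RARE_TRIES = 3

def respond_to_stem(stem, responses):
    """Walk one sentence stem through Veldman-style branching.

    `responses` is the sequence of words the examinee types for this stem
    (in a real session, `stem` would be displayed before each response).
    Returns the program's next action.
    """
    for attempt, word in enumerate(responses[:MAX_RARE_TRIES], start=1):
        if word in COMMON_WORDS:
            # Common word: present a follow-up question.
            return f"follow-up: What do you find {word} about it?"
        if attempt < MAX_RARE_TRIES:
            # Rare word: request a synonym and wait for the next response.
            continue
    # Three rare responses in a row: give up on this stem.
    return "move on to the next sentence stem"

print(respond_to_stem("My work has been", ["gribble", "difficult"]))
print(respond_to_stem("My work has been", ["zorp", "frabjous", "gribble"]))
```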
Acceptability to clients. Originally, some psy-
For an example of computer software that automates and simplifies the process of constructing such a test, see Walczyk (1993).
THE FUTURE OF COMPUTERIZED
PSYCHOLOGICAL TESTING
Klee, S. H., & Garfinkel, B. D. (1983). The computerized continuous performance task: A new measure of inattention. Journal of Abnormal Child Psychology, 11, 487–496.
An example of a study of a test administered by computer, with such aspects as sensitivity and specificity discussed.
Schoenfeldt, L. F. (1989). Guidelines for computer-based psychological tests and interpretations. Computers in Human Behavior, 5, 13–21.
DISCUSSION QUESTIONS
1. The book discusses three types of service associated with computer administration of tests.
What is available on your college campus or
community? Do these services deviate from the
three categories presented?
2. How might you carry out a study to determine whether a personality variable (such as impulsivity) interacts with some aspect of computerized presentation of a test?
3. To assess the validity of a CBTI, one could compare a clinician's judgment based on the CBTI vs. a second clinician's judgment based on personal knowledge of the client. What do you think of this procedure? What are some of the potential problems?
4. Could adaptive testing be used in the exams
you take in this class?
5. What are some of the ethical issues involved
in using computers in psychological testing?
AIM This chapter looks at a particular point of view called behavioral assessment and contrasts this with the more traditional point of view. We look at a variety of instruments developed or used in behavioral assessment to illustrate various issues. We then turn our focus to four broad areas of assessment that transcend the individual: program evaluation, the assessment of environments, the assessment of family functioning, and finally, some broad-based, flexible techniques.
TRADITIONAL ASSESSMENT
Behavioral observations can take place in a natural setting, such as a playground or a classroom (and are then called naturalistic observations), or in an analog setting, such as a laboratory task that tries to simulate a real-life procedure. The observation may be obtrusive, where the subject is aware of being observed, or unobtrusive. Unobtrusive observations often focus on aspects of the environment rather than the person, for example, counting the number of empty beer bottles in someone's trash as an indicator of drinking behavior. From a practical point of view, such observations can be easy in some cases and nearly impossible in others. We can easily observe a child on the playground or in the classroom to assess his or her aggressiveness, but it would be more difficult, for example, to observe an adult executive as he or she interacts with subordinates and peers to assess anxiety. From a psychometric point of view, however, there are a number of challenges. Is what is to be observed so specific that two observers will agree as to the occurrence of the behavior (i.e., interrater reliability)? Does the act of observing alter the behavior (e.g., if Alberto knows he is being observed, will he throw that spitball)? What is to be recorded? Is it the occurrence of a behavior (Alberto throws a spitball), the frequency (Alberto threw six spitballs in the past hour), the antecedents (Alberto throws spitballs when the teacher ignores him), and so on? What coding system shall be used (e.g., is throwing spitballs an instance of "aggressive" behavior)? (See Wahler, House, & Stanbaugh, 1976, for an example of a coding scale that covers 24 categories of behavior, for the direct observation of children in home or school settings.)
In studies of behavioral assessment, direct observation is the most common assessment procedure used, followed by self-report (Bornstein, Bridgwater, Hickey, et al., 1980).
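The recording choices listed above (occurrence, frequency, antecedents, and a coding category) can be captured in a simple record structure. The coding categories and the observed events below are invented for illustration, not taken from any published coding system.

```python
from dataclasses import dataclass

@dataclass
class ObservationEvent:
    behavior: str    # what occurred, e.g. "throws spitball"
    category: str    # coding-system category, e.g. "aggressive"
    antecedent: str  # what immediately preceded the behavior

# An invented hour of observation of one child.
events = [
    ObservationEvent("throws spitball", "aggressive", "teacher ignores him"),
    ObservationEvent("throws spitball", "aggressive", "peer laughs"),
    ObservationEvent("raises hand", "on-task", "teacher asks question"),
]

# Occurrence and frequency fall out of the event log directly.
frequency = sum(e.behavior == "throws spitball" for e in events)
occurred = frequency > 0
print(occurred, frequency)  # prints: True 2
```

Keeping antecedents alongside each event is what lets the observer later ask functional questions, such as whether the behavior tends to follow teacher inattention.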
2. Self-monitoring. Here the subject observes
Indirect Assessment
2. Interviews are conducted for different purposes, in different ways, by different examiners. Therefore, issues of reliability and validity cannot address interviewing in general, but must pay attention to the various aspects and circumstances. This is a clear situation where generalizability theory (as discussed in Chapter 3) could be useful.
3. Reliability would seem important to establish, but very few studies do so. In the area of behavioral assessment, a number of studies have looked at interrater agreement, for example, by having independent observers score taped interviews along specified dimensions. Such interrater agreement is often relatively high.
4. Internal-consistency reliability, for example, as assessed by repeating items within the same interview, is rarely evaluated.
5. A number of studies have looked at criterion-related validity, where interview results are compared with data from other assessment instruments. Results differ widely, with some studies reporting high validity coefficients and others low; what is needed is to determine under what conditions valid results are obtained.
6. Content validity seems particularly appropriate and adequate in structured interviews, which essentially represent a set of questions (much like the MMPI) read aloud by the examiner.
7. The validity of interviews can also be evaluated in a pre-post design, where an interview precedes and follows an experimental or therapeutic intervention. To the extent that the interview is valid, the data obtained in the interview should covary with the results of the intervention.
8. There are many potential sources of error in interviews; not only are these self-reports, but interactions between such aspects as the gender and ethnicity of the interviewer and the interviewee can introduce considerable error variance.
9. In general, then, from a psychometric point of view, interviews are a less preferable assessment method.
Structured vs. unstructured interviews. Interviews can vary from very unstructured to highly structured. In an unstructured interview, there is a goal (e.g., "Is this a good candidate for this position?"), but the format of the interview, the sequence of questions, and whether a topic is covered or not are all unstructured. Unstructured interviews are often used in therapeutic settings, for example when a clinician conducts an initial interview with a client. The unstructured interview allows a great deal of flexibility and the potential to go in new or unplanned-for directions.
In a highly structured interview, the goal may be the same, but the procedure, questions, and so on are all predetermined and standardized. Thus, the structured interview lacks flexibility but provides standardization. From a psychometric point of view this is a preferable strategy because it permits quantification and comparability across interviewers, situations, and clients.
Types of interviews. There are, of course, numerous types of interviews and different ways of categorizing them. There are employment interviews, whose goal is generally to decide on a person's qualifications for a particular position. There are psychiatric intake interviews, where initial information about a client is obtained. There is the mental status exam, which is an interview that ascertains what abnormalities are present in the client (see Chapter 7). There are exit interviews, polls, case histories, and various other types of interviews.
Regardless of the type of interview, we can conceptualize the interview as made up of three aspects: the interviewer, the interviewee, and the interview process itself. Each of these three aspects also represents a source of error in terms of reliability and validity.
2. Checklists and rating scales. These are used
widely in behavioral assessments and are completed either by the individual as a self-report or
by someone else as an observer's report. Both the
Child Behavior Checklist (Achenbach, 1991) and
the Conners rating scales (Conners, 1990), which
we covered in Chapter 9, are good examples of
behavioral checklists and rating scales. Both focus
on specific behaviors rather than hypothesized personality traits. In general, self-report scales
that emanate from a behavioral perspective differ in two major ways from their more traditional
counterparts. First, the items typically focus on
behavior and use behavioral terms. Second, many
(but not all) have been developed informally
so that their psychometric structure does not
have the degree of sophistication and complexity found in instruments such as the MMPI or
the SVIB. When behavioral-assessment instruments first became popular, there was a tendency to reject classical psychometric principles
of reliability and validity, and to be more concerned with other issues more directly relevant
to behavioral observation. Subsequently, psychometric concepts were seen as quite applicable, and while traditional testing was rejected,
traditional psychometric concepts were not.
Checklists and rating scales are economical,
can be easily administered, and serve to focus subsequent interviewing and observational efforts.
They are useful to quantify observations. In addition, these are typically normative instruments;
they allow for the comparison of a specific child
with norms. They also can be used to quantify any
change that is the result of intervention efforts. At
the same time, because they are indirect measures
of behavior, behaviorists tend to be suspicious of
them and are concerned that the obtained data
may be affected by social desirability, situational
aspects, and a host of other limitations.
We used the terminology of bandwidth and fidelity earlier, and there is a parallel between these terms and the direct-indirect dichotomy. Indirect methods are essentially broad-band, low-fidelity, whereas direct methods are narrow-band, high-fidelity. Broad-band, low-fidelity methods provide more information, but of lower quality.
Note also that these indirect methods could be considered self-reports. From a behavioral
assessment point of view, self-report has several
advantages. The observer, who is also the subject,
is always present when a specific behavior occurs,
and the observer can observe internal events
(thoughts and feelings).
A fourth difference is that behavioral assessment typically uses multiple sources of data collection. Not only are interviews conducted, but behavioral observations are made, and checklists and questionnaires are used.
Another difference is that in behavioral assessment there is typically a rather strong insistence on clarity of definition and precision of measurement. These are, of course, highly desirable, but in traditional testing they may be absent or not as strongly emphasized.
Finally, traditional assessment typically either
precedes treatment or uses a pre-post design.
For example, a patient is assessed as depressed, is given 6 months of psychotherapy and medication, and is evaluated again at the end of the 6 months. With behavioral assessment, the assessment is typically ongoing and multiple in nature.
For a more detailed analysis of the differences
between traditional and behavioral approaches,
see D. P. Hartmann, Roper, and Bradford (1979).
VALIDITY OF BEHAVIORAL ASSESSMENT
In Chapter 3, we discussed validity: specifically, content validity, criterion-related validity (predictive and concurrent), and construct validity. Initially, a number of behaviorists argued that these traditional
concepts did not apply to behavioral assessment,
but the more acceptable view now is that they do
(e.g., Cone, 1977).
Content validity is very important in behavioral assessment because it basically reflects adequate sampling (the observations made must
be representative of the behavior of interest).
Another way to look at this issue is to determine
to what degree an observed behavior is specific
to a particular situation (the same issue discussed
under generalizability).
Predictive validity refers to the accuracy of a
test to predict specific types of behaviors. This is also important in behavioral assessment, particularly in the use of checklists and rating scales.
In a number of studies, the predictive validity of
such scales, as defined by correlations with real-life behaviors, has been less than adequate.
In behavioral assessment, concurrent validity
is often defined as how well a particular scale or
observational method correlates with other scales
or methods designed to assess the same behavior.
Concurrent validity data are frequently available.
An alternate way to consider reliability and validity is through generalizability theory. That is, the relationships of a given measure with either itself (reliability) or other criteria (validity) reflect the fact that a given score can be generalized in different ways. Cone (1977) suggests that generalizability theory can provide a way to evaluate
behavioral assessment techniques and indicates
that there are six aspects to such generalizability:
The information gathered from an individual, i.e., self-report, has always been a controversial topic within psychology. For psychoanalysts, self-reports based on such techniques as free association (what comes to mind when . . .) and retrospection (tell me about your childhood) were presumed to be reliable, that is, replicable, but not particularly valid, because what the patient said was distorted by ego defenses, needs, early childhood experiences, and so on. For the behaviorists, as exemplified by John B. Watson, self-report was
The accuracy of observations is typically assessed by interobserver agreement, raising the issue of interrater reliability. In addition to the Pearson correlation coefficient, there are a number of other indices used
such as percentage agreement, discussed in Chapter 15. There are a number of statistical issues
that complicate the picture (see the 1977 Journal
of Applied Behavior Analysis). One of the issues
is that of chance agreement. If two observers are
independently observing the same behavior, they
may agree on the basis of chance alone. The kappa statistic (J. Cohen, 1960) takes into account such chance agreement.
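To make the chance correction concrete, kappa can be computed by hand; the two sets of observer codings below are hypothetical, not data from any study cited here.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's (1960) kappa: observed agreement corrected for
    the agreement expected by chance from the marginals."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of intervals on which the observers agree
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each observer's marginal proportions
    p_e = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two observers independently code 10 intervals (1 = behavior present)
obs_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
obs_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(obs_1, obs_2), 2))  # 0.58
```

Here the observers agree on 8 of 10 intervals (.80 raw agreement), but because both code the behavior as present most of the time, chance alone would produce substantial agreement, and kappa is noticeably lower than the raw percentage.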
BEHAVIORAL CHECKLISTS
Test-retest reliability may be low and, in essence, inappropriate. Where appropriate, test-retest reliability
is generally higher the shorter the time period
between test and retest. We also need to distinguish, at least theoretically, between test-retest
reliability and a pre-post research design where
the instrument is given twice to assess the effectiveness of an intervention. Good research practice would require the use of a control group as
well, which typically would yield a better measure
of reliability.
Internal consistency is relevant only to the
degree that the particular scale measures a unitary dimension or factor.
Interrater reliability can also, in an applied situation, be confounded with other aspects. For
example, the ratings of teachers may not fully
coincide with those given by mothers. However,
such differences may not necessarily represent
low interrater reliability or error, but may be legitimate sources of information.
Validity of checklists. Content validity as well as criterion-related validity (both concurrent and predictive) would seem to be of the essence.
What affects the validity of behavioral checklists? S. N. Haynes and Wilson (1979) identified eight types of variables that could affect the validity:
[Table: illustrative behavioral questionnaires and what each assesses: common fears, deviant sexual behaviors, heterosexual interactions, eating patterns, assertiveness, fear of spiders, and preschool behavior.]
A large number of questionnaires are now available to assess a variety of behaviors, such as sexual behaviors and interpersonal behaviors.

Measurement of Anger
Penile measurement techniques (PTM) provide information about a male's sexual interests by measuring penile response to various categories of sexual stimuli. One type of penile gauge is a device
that involves the encasement of the penis in a
glass cylinder. Sexual arousal of the penis causes
air displacement in the glass cylinder, and thus
volume changes can be recorded (McConaghy,
1967). Other types of gauges have also been developed (see R. C. Rosen & Beck, 1988).
O'Donohue and Letourneau (1992) reviewed the psychometric properties of such gauges in samples of child molesters, although fully realizing that sexual arousal to children is neither a necessary nor a sufficient condition for sexual offending. At the outset we should ask, as these investigators did, what kind of assessment method PTM represents. Is this a norm-referenced test, where the results for a subject
are compared to those for a group of subjects?
Is PTM a criterion-referenced test where we have
some criterion of what is healthy or deviant, and
we can assess an individual accordingly? Or is
PTM not really a test, but a direct observation of
behavior? If the latter, then the major criterion is whether this sample of behavior is representative and what we can infer from it. The writers
assumed that standard psychometric criteria (i.e.,
reliability and validity) were indeed relevant.
Reliability. Studies of child molesters usually
have few subjects, and the range of scores of PTM
is often quite restricted. Both of these aspects tend
to result in lower reliability coefficients. From a reliability perspective, we would want consistency of response to similar stimuli; in fact, the level of sexual arousal may be affected by a variety of factors such as time since last orgasm and
drug use.
An illustrative reliability study is that of Frenzel
and Lang (1989), who studied heterosexual and
homosexual sex offenders as well as a control
group of heterosexual volunteers. The stimuli were 27 film clips that were shown three times to assess test-retest reliability. The film clips varied as to the gender and age of the subject shown, and three items were sexually neutral. Coefficient alpha was .93 for the 27 stimuli. A factor analysis of the data
resulted in four factors: general heterosexual-
The criteria used are arrest records, criminal convictions, self-report, and the reports of others. All of these criteria are fallible. For example, O'Donohue and Letourneau cite one study where the investigators interviewed 561 nonincarcerated paraphiliacs (sex offenders); they found a ratio of arrests to
commission of rape and child molestation to be 1
to 30, indicating a very high rate of false negatives
(i.e., people who have not been arrested but have
committed a sexual offense).
An illustrative study is that of Freund (1965),
who used colored slides of nude males and
females of different ages to compare two groups:
20 heterosexual pedophiles and 20 heterosexual
patients in an alcohol-abuse program. All subjects showed the greatest sexual arousal to female
slides; sex offender subjects showed the greatest
arousal to slides of children and least arousal to
slides of adults. Of the 20 sex offenders, the results
misclassied only 1 subject, and of the controls,
only 1 subject was misclassied as a pedophile.
In general, studies in this category show that
child molesters can be differentiated from normal
controls, and there is good sensitivity in general
with few false positives and fewer false negatives
(Blanchard, Klassen, Dickey, et al., 2001).
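The classification counts Freund reports translate directly into sensitivity and specificity; the short sketch below simply restates his 1-in-20 misclassification rates as those two indices.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity: proportion of true cases detected.
    Specificity: proportion of non-cases correctly cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Freund (1965): 1 of 20 offenders and 1 of 20 controls misclassified
sens, spec = sensitivity_specificity(tp=19, fn=1, tn=19, fp=1)
print(sens, spec)  # 0.95 0.95
```

With such small samples, of course, these proportions carry wide confidence intervals, which is one reason replication across studies matters here.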
Differential diagnosis. One validity question is
Depression can be seen as a behavioral disorder characterized by dysphoric mood, accompanied by motoric changes (e.g., walking slowly)
and physiological concomitants such as loss of
appetite (see Lewinsohn & Graf, 1973). The
Pleasant Events Schedule (PES; MacPhillamy & Lewinsohn, 1976) was developed specifically from a behavioral point of view and reflects Lewinsohn's theory about depression, which essentially sees depression as a lack of positive
reinforcement. The PES is intended to assess the
amount of external positive reinforcement the
individual receives.
Lewinsohn (1976) believes that the general
goal for the behavioral treatment of depressed
individuals is to restore an adequate schedule of
positive reinforcement for the individual. Specific intervention techniques can vary from person to person, and one approach is to use activity schedules to increase the patient's rate of behaviors that are likely to be reinforced by others or
are intrinsically reinforcing for that patient.
The PES consists of 320 events that were
elicited from various groups of subjects; several
forms of the PES were developed, but form III
with 320 items seems to be the standard one. The
PES is used as a retrospective report of the events
of the last 30 days, as well as a daily log for ongoing behavior. Lewinsohn (1976) describes how
the PES can be administered to a patient: a subgroup of PES items judged to be most pleasant
are targeted, and the patient is asked to indicate
at the end of each day which activities he or she
has engaged in.
In using the PES as a retrospective report, subjects are asked to indicate how frequently each
item occurred within the last 30 days, on a 3-point
scale: 0 (not happened); 1 (a few, 1 to 6 times);
One of the major areas of deficit studied by behavior modification has been that of social skills, generally involving the ability to express both positive and negative feelings appropriately in interpersonal situations. The label social skills stands for a variety of behaviors that usually have an interpersonal component (i.e., relating to others) and an evaluative component (i.e., how effective the person is in relation to others). From a
behavioral-assessment stance, we might perceive
social skills as the abilities to emit behaviors that
are positively reinforced by others and not emit
behaviors that are punished by others. Among
these skills, assertiveness and heterosexual dating skills have been the focus of much research
and many self-report scales (for an early review of several self-report techniques used in the social-skills area, see Hersen & Bellack, 1977).
Assertiveness
There are many self-report measures of assertiveness for use with adult subjects, but relatively
few for children. Michelson and Wood (1982)
Phobias
Another popular measure is the FQ, a brief 15-item instrument that requires the subject to rate his or her avoidance of specific situations (I. M. Marks & Mathews, 1979). Typical items are: sight of blood, going into crowded shops, being criticized. Three subscales are available (blood/injury, agoraphobia, and social), as well as a total score.
This scale is probably one of the most frequently
used self-report instruments in the treatment of
agoraphobia (Trull, Neitzel, & Main, 1988). In
clinical samples, the FQ has excellent test-retest
reliability, normative data is available (Mizes &
Crawford, 1986; Trull & Hillerbrand, 1990), and
the scale has been translated into a number of
languages including French, Italian, and Spanish
(H. B. Lee & Oei, 1994).
Trull and Hillerbrand (1990) studied a sample of college students and a volunteer sample
of community adults (surveyed by phone). For
the collegiate sample, females scored higher than
males on the blood/injury and agoraphobia subscales, as well as the total phobia score. For the
community sample, females scored higher on all
three subscales, as well as the total score. Community subjects scored higher than college students on the social and agoraphobia subscales,
as well as on the total score. In both samples, the three subscales intercorrelated significantly, with coefficients from .28 to .48. Internal consistency (Cronbach's alpha) ranged from .44 to .73. A confirmatory factor analysis indicated that the
occurred, presumably reflective of the effectiveness of the program. Sometimes, tests are misused in this way. The teaching effectiveness of a
school and its staff is often evaluated based on
how the children do on a standardized achievement battery, without considering the fact that
the scores on these tests may be more highly
related to the socioeconomic background of the
children than to the expertise and efforts of the
teachers.
National programs such as Head Start have
had massive evaluations, although the conclusions derived from such evaluations do not necessarily reflect unanimous agreement (see W. L. Goodwin & Driscoll, 1980).
Program evaluations can vary from rather simple approaches where a single rating scale is used,
as in the typical course evaluation conducted
on collegiate campuses, to complex multiphasic
approaches that use cognitive measures, assessment of psychomotor changes, economic indicators, direct observation, use of archival data, and
so on. Thus, in evaluating a graduate program
in clinical psychology, the evaluating team looks
not only at information such as the publications
of the faculty, their rank, their teaching evaluations, etc., but also interviews graduate students,
deans and other administrators, obtains architectural information about the amount of space
available, library resources, and so on.
An excellent example and case study is the evaluation of Sesame Street, an extremely popular children's television program geared for children ages 3 to 5. The program was originally developed as a remedial program, especially for disadvantaged children, to prepare them for school. The intent was not only to entertain but to increase intellectual and cultural development (Ball & Bogatz, 1970). The first year of Sesame Street
was evaluated through a large sample of children
that included disadvantaged inner-city children,
Spanish-speaking children, and rural-area children. A special battery of tests was developed
for this program evaluation, basically consisting
of 12 major tests (such as Numbers and Sorting Skills) that took an average of 2 hours to
administer. The alpha coefficients for the subtests ranged from .17 to .93, with a median of .66
(see W. L. Goodwin & Driscoll, 1980, for a synopsis). To the extent that psychological tests are
used in such program evaluations, we need to be
You recall that Public Law 91-230, the Handicapped Children's Act (Chapter 12), mandated the
establishment of preschool and early-childhood
programs for disabled and high-risk children.
Thus, these children need to be identified, and there is a need for instruments. Traditional intelligence tests such as the Wechsler or the Stanford-Binet can be quite useful, but being at risk involves more than simply cognitive deficit. There is a
need to identify environmental aspects that can
either directly affect development, or can interact with other variables such as intelligence, to
produce greater developmental delay. Scientists
often use socioeconomic status (SES) as an index
of environmental quality. Although SES works
and is used widely, it is also a fairly limited index.
Within an SES grouping, for example, families
differ widely in the kinds and amounts of stimulation they provide their children.
Caldwell and her colleagues (e.g., Bradley &
Caldwell, 1977, 1979) pointed out that there is
a need to develop screening and diagnostic procedures that identify children at risk. A child's developmental environment can be hypothesized to play a major role, and thus should be included as a major assessment dimension. In particular, a child's developmental environment can be indexed by process measures, that is, the day-to-day, moment-to-moment transactions of the child with the environment: the type of
language stimulation received, the kind of play
materials available, the types of punishment and
rewards, etc. They therefore developed the Home
Observation for Measurement of the Environment (HOME) Inventory, which consists of items
(different numbers in different versions) that
cover such aspects as the emotional and verbal
responsivity of the mother to the child, whether
there are appropriate play materials, and so on.
The HOME inventory is administered in the
home by a trained observer and requires about
60 minutes.
Bradley and Caldwell (1979) presented validity data for the preschool version of the HOME
We could measure its mineral content, degree of pollution, viscosity, etc., but we usually leave such measurements to physical scientists. But the psychological perception is another matter. Bruvold
(1968) presented four rating scales to evaluate
the taste of water. These scales were developed
using the Thurstone method of equal-appearing
intervals, and for each of the scales, nine items
were retained. The first scale, called the hedonic scale, consists of nine items that range from "I like this water extremely" to "I dislike this water extremely." The second scale, the quality scale, consists of nine items that range from "This water has an excellent taste" to "This water has a horrible taste." The third scale is the action tendency scale, and its nine items range from "I would be very happy to accept this water as my everyday drinking water" to "I can't stand this water in my mouth and I could never drink it." Finally, the fourth scale, called the combination scale, has nine items that range from "This water tastes real good. I would be very happy to have it for my everyday drinking water" to "This water has a terrible taste. I would never drink it."
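As a sketch of the Thurstone scaling step itself (the statements and judge pile assignments below are invented for illustration, not Bruvold's data): a panel of judges sorts each candidate statement into equally spaced favorability piles, and an item's scale value is its median pile.

```python
from statistics import median

def thurstone_scale_values(judge_sorts):
    """Equal-appearing intervals: an item's scale value is the
    median favorability pile assigned to it across judges.
    judge_sorts: item text -> list of pile numbers, one per judge."""
    return {item: median(piles) for item, piles in judge_sorts.items()}

# Hypothetical pile assignments (1 = least favorable, 11 = most) by 5 judges
sorts = {
    "I like this water extremely": [11, 10, 11, 11, 10],
    "This water is acceptable": [6, 7, 6, 5, 6],
    "I could never drink this water": [1, 1, 2, 1, 1],
}
for item, value in thurstone_scale_values(sorts).items():
    print(value, item)
```

Items whose judgments scatter widely across piles are typically discarded as ambiguous, and the nine items retained per scale are chosen so that the surviving scale values are spaced at roughly equal intervals.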
Reliability of these scales was assessed in a laboratory study where 24 employees of the California
State Department of Public Health rated 10 water
samples, with order of water sample presentation
randomized, as well as a number of other experimental controls. For each subject, 28 reliability correlation coefficients were computed, and the average intercorrelations between and within scales ranged from .62 to .77. The validity of the
scales was assessed by comparing mean ratings
for samples of respondents from different towns,
and by showing an inverse relationship between
favorability toward water and mineral content of
the water.
their preferences for different environmental settings. The Environmental Preference Questionnaire (EPQ; R. Kaplan, 1977) was designed to
explore individual differences in such environmental preferences. The 1977 form of the EPQ
contains seven scales developed on the basis of
responses made by several hundred people, and
as a result of various factor-analytic procedures.
The seven EPQ scales are described next:
The majority of tests considered in earlier chapters have the individual as their focus. That is,
whether we administer an intelligence test, a personality inventory, a checklist of values, a measure
of creativity, etc., we are assessing an individual.
Tests can also be used to assess the psychosocial and/or physical environment, such as a college campus as we discussed earlier, or as we will
There are a number of techniques that are somewhat more difficult to categorize because they combine various aspects of projective and objective techniques. One category is figure placement techniques, which presumably reflect an individual's or a
family's perception of how the family is organized. Most of these instruments have, however, lacked standardization and have, at best, modest reliability and validity (Bagarozzi, 1991). An example is the Family Systems Test (FAST), developed in Germany, originally in a German version but also available in English. The FAST is based on a specific family theory that considers cohesion and
hierarchy as two important dynamic aspects of
family functioning. The FAST consists of a board divided into 81 squares, with 6 male and 6 female figures, 18 cylindrical blocks of 3 sizes, and 1 male and 1 female figure each in orange, violet, and green. The respondent, an individual, a couple, or
green. The respondent, an individual, a couple, or
an entire family, arranges the figures on the board to indicate cohesion, and elevates the figures with blocks to indicate hierarchy. This is done three times, to represent a typical situation, an ideal situation, and a conflict situation. Cohesion is operationally translated as proximity of figure placement, and hierarchy as the differences in height between figures. Cohesion and hierarchy scores are combined to reflect family structure in each situation, or can be combined across situations to assess perceived family flexibility. There are two additional options that involve using figures of varying colors and indicating the degree of eye contact between figures.
In a study of California adolescents, test-retest reliability over a 1-week period was adequate (.63 to .87), with higher coefficients associated with older adolescents. Test-retest reliability over a 4-month interval was poor (.42 to .59). The construct validity of the FAST is mixed: some findings are supportive of hypothesized relationships and some are not. There are a number of concerns about this technique, which nevertheless may prove useful.
Assessment of couples. What are the purposes
of couple assessment? Fruzzetti and Jacobson
(1992) suggest that there are three purposes:
interactions, such as the degree of demand-withdrawal (i.e., as one partner makes greater demands, the other withdraws emotionally).
BROAD-BASED INSTRUMENTS
2. Peer rating scales, which involve a Likert-type rating scale. Typically, the children are provided with a roster of their classmates, and rate each child on a 5-point scale indicating, for example, how much they like that child. Sometimes the verbal anchors are accompanied by or replaced with various sequences of happy faces.
3. Paired-comparison procedure. Here each child is asked to choose between every possible pair of classmates (excluding oneself), again in terms of a particular situation. If we have a classroom of 25 children, then each child will make 276 such judgments; that is the major drawback, although shortcuts are available (for illustrative studies see Asher, Singleton, Tinsley, et al., 1979; Deutsch, 1974; LaGreca, 1981).
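The judgment count grows quadratically with class size: each child faces every pair drawn from the other n - 1 children, i.e., C(n - 1, 2) pairs.

```python
from math import comb

def judgments_per_child(class_size):
    """Pairs each child must judge: all combinations of two
    classmates, with the judging child excluded."""
    return comb(class_size - 1, 2)

print(judgments_per_child(25))  # 24 classmates -> 276 pairs per child
```

This quadratic growth is precisely why the shortcuts mentioned above, such as having each child judge only a sample of the possible pairs, become attractive in larger classrooms.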
Because there are so many variations in sociometric techniques, it is difficult to generalize about their reliability and validity. At the risk of oversimplifying, however, we can draw some general, tentative conclusions. First, as to reliability:
1. Reliability generally seems to be adequate.
2. Reliability is greater with older children than
with younger children.
3. Reliability of positive nominations is greater
than that of negative nominations.
4. As usual, test-retest reliability decreases as the
time interval gets longer.
5. Peer rating procedures seem somewhat more
reliable than restricted nomination procedures.
6. Lack of reliability may not necessarily represent error, as in the classical test theory model, but may reflect legitimate sources of variation. For example, sociometric ratings made by less popular children are less stable; this may simply reflect their inability to form stable friendships.
With regard to validity, the following applies:
1. In general, there is a moderate to low relationship between sociometric indices and various
indices of behavior. Positive correlations are typically found between indices of social acceptance
and prosocial behavior, and between indices of
rejection and negative aggressive social behavior. For example, rejected children are twice as
likely to engage in aggressive behavior on the
playground.
Adjective Checklists
Natural language. Adjectives represent a natural descriptive language with relevance and utility for both lay persons and scientists. Several
adjective checklists are available in the psychological testing literature, ranging from broad-based personality assessment techniques such as the Adjective Check List (ACL; Gough & Heilbrun, 1965), to lists developed for specific populations such as children and the mentally retarded (G. Domino, 1965; G. Domino, Goldschmid, & Kaplan, 1964; Lipsitt, 1958), or to measure specific variables such as affect and mood (Lorr, Daston, & Smith, 1967; Zuckerman & Lubin, 1965), depression (Lubin, 1965, 1967), or complexity-simplicity (Jackson & Minton, 1963).
Assessing environments. Craik (1971) revi-
An example. Q sorts not only can yield useful
empirical data but, perhaps more important, can
provide a useful methodology to test theoretical ideas. For example, Metzger (1979) attempted, through Q methodology, to assess a theory proposed by Kubler-Ross (1969), who hypothesized
that a dying person goes through stages of denial,
anger, bargaining, depression, and acceptance in
coming to grips with impending death. Metzger
developed a 36-item structured Q set from a larger pool of items, such as "Why did God let this happen to me?" and "I pray to be well."
She administered the Q set to two couples, in
which the wife had a potentially terminal medical
diagnosis. Although the results did not support Kubler-Ross's stage theory, the author concluded
that the Q methodology was a useful procedure
to investigate terminal illness.
SUMMARY
In this chapter, we looked at behavioral assessment, various basic issues, and illustrative instruments like the Pleasant Events Schedule and the
Fear Survey Schedule. We also looked at how tests
can be used in program evaluation, the assessment of environments, and the assessment of systems like the family. Finally, we looked at sociometric techniques, adjective checklists, and Q sets, all broad-based techniques that can be used in a variety of ways.
SUGGESTED READINGS
Calvert, J. D., Moore, D. W., & Jensen, B. J. (1987). Psychometric evaluation of the Dating Anxiety Survey: A self-report questionnaire for the assessment of dating anxiety in males and females. Journal of Psychopathology and Behavioral Assessment, 9, 341–350.
This article evaluates an instrument designed to assess dating anxiety. The authors report the results of two studies in
which a factor analysis, an assessment of reliability (internal consistency), and concurrent validity were investigated.
methodological and theoretical assumptions. Psychological Bulletin, 77, 409–420.
A review of the assumptions that underlie traditional vs.
behavioral assessment, as these assumptions impact the selection of test items and the interpretation of the responses to
the test.
AIM In this chapter we take a brief look at the history of psychological testing and
a peek at the future. For didactic purposes, we consider the current status of psychological testing as reflecting four major strands: the French clinical tradition, the German nomothetic tradition, the British idiographic tradition, and the American applied
tradition.
INTRODUCTION
When you meet a new person at a party, for example, you want to know their name and a little
bit about their background: what's their family like, where did they grow up, and so on. In
this text, you have met psychological testing; it
is time to take a brief look backwards. Psychological testing is, in American society at least,
quite ubiquitous. Children are tested in a variety
of ways, from preschool days through middle-school graduation. High-school adolescents are
given aptitude tests to determine what they can
master, achievement tests to assess what they have
mastered, minimum competency exams to determine whether their diploma should be granted,
interest tests to help them choose careers, and
college-entrance examinations to allow them to
progress in the educational stream. Similarly, college students come face to face with a variety of
educational and psychological exams, and if you
think college graduation means the end of testing, you are in for a big surprise!
Psychological testing is a recent phenomenon,
closely interwoven with 20th-century American
culture, yet the use of systematic procedures for
comparing and evaluating individuals is quite
old. In China, for example, in the year 2200 B.C.,
The British, on the other hand, were vitally interested in these individual differences and viewed
them not as error, but as a fundamental reflection of evolution and natural selection, ideas
that had been given a strong impetus by the
work of Charles Darwin (1809–1882). It was Darwin's cousin, Sir Francis Galton (1822–1911),
who united the French clinical tradition, the
German experimental spirit, and the English concern for variation, when he launched the testing
movement on its course (A. R. Buss, 1976).
Galton studied eminent British men and
became convinced that intellectual genius was
fixed by inheritance. He then developed a number of tests to measure this inherited capacity
and to demonstrate individual differences. He
established a small anthropometric laboratory in
a London museum where, for a small fee, visitors could undergo tests of reaction time, hearing acuity, color vision, muscular strength, and
other basic sensorimotor functions. To summarize the information he collected on approximately 10,000 persons, Galton made use of, or
himself developed, statistical procedures. He also
realized that a person's score could be expressed
in a relative rather than an absolute manner (e.g.,
"John is taller than these 18 men," rather than "John
is 6 feet tall"). This is an extremely important and
fundamental concept in psychological measurement, yet one that can easily be neglected.
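Galton's relative-versus-absolute distinction can be illustrated in a few lines of Python; the heights below are invented purely for illustration:

```python
from statistics import mean, pstdev

# Hypothetical heights (inches) of a comparison group of 10 men.
group = [68, 70, 72, 71, 69, 73, 70, 71, 72, 74]
john = 72  # John's absolute height

# Absolute statement: "John is 72 inches tall."
# Relative statements locate John within the group instead:
taller_than = sum(h < john for h in group)   # rank relative to the group
z = (john - mean(group)) / pstdev(group)     # standard (z) score

print(f"John is taller than {taller_than} of the {len(group)} men")
print(f"John's z score: {z:+.2f}")
```

The raw figure of 72 inches means little by itself; the rank and z score carry the information that matters for comparison, which is exactly the logic behind modern norm-referenced test scores.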
Galton's ultimate objective was the early identification of geniuses so that they could be
encouraged to reproduce and thus improve
the intellectual level of the human race. This
grandiose idea, however, was not supported by
the test results, for although sensorimotor functions demonstrated individual differences, they
showed little relationship to other criteria of
intelligence.
Factor analysis. Although it was Galton who laid
the reaction time for a sound, and the measurement of dynamometric pressure (the pressure
exerted by one's grip). Cattell felt that differences in sensory acuity, reaction time, and similar functions would result in differences in intellectual achievement; hence he administered his
tests to Columbia University college freshmen
in the hope of predicting their college achievement (Sokal, 1987). Incidentally, Cattell and his
wife had seven children, one of whom, named
Psyche, also became a well-known psychologist
(Sokal, 1991).
Cattell's 1890 paper resulted in a tremendous
interest in mental testing. Psychologists at various
institutions began developing similar tests, and
in 1895 the American Psychological Association
formed a committee to investigate the possibility of having various psychological laboratories
cooperate in the collection of data.
Other American investigators. Cattell was neither
the first nor the only American investigator concerned with testing. Joseph Jastrow (1863–1944),
a student of G. Stanley Hall and holder of the
first PhD in psychology awarded in the United
States, had developed a set of 15 tests that he
demonstrated to visitors at the 1893 Columbian
Exposition held in Chicago. These tests, with a
heavy Wundtian flavor, included weight discrimination, reproduction of letters after tachistoscopic presentation, tests of color blindness, and
tests of reaction time (see Jastrow, 1930, for an
autobiography).
Lightner Witmer (1867–1956), who also studied with Wundt, established the first psychological clinic in the United States, in 1896 at the
University of Pennsylvania (McReynolds, 1987;
O'Donnell, 1979). As the number of clients
referred to the clinic grew, Witmer began collecting tests and using these in diagnostic evaluations (for a survey of the clinic's activities from
1896 to its closing in 1961, see Levine & Wishner,
1977). Soon other university clinics were established, and diagnostic evaluation based on psychological test results became part of the clinical
approach. Until the late 1920s, a large portion
of the clinic psychologist's activities consisted
of administering intelligence tests. Beginning in
the 1930s, however, the psychological horizons
expanded, and the use of projective tests to study
Although both studies had serious methodological limitations, and they were not meant
by their authors to be indictments of testing,
they were nevertheless taken by many as proof
that mental measurement could not be a reality.
As a result, when the 1905 Binet-Simon scale
appeared, psychologists in American universities
exhibited little interest; however, psychologists
and educators working in clinics and schools,
faced with the same practical problems of mental
classification that Binet was struggling with, were
quite receptive and rapidly enthusiastic.
Mental deficiency. One such man was Henry
Bayley, a psychologist at the University of California (Berkeley), began the Berkeley Growth Study
(her doctoral dissertation), a longitudinal study
of a sample of 61 newborns, devoted to the investigation of mental and motor development and
physical growth and maturing. Each newborn
was studied intensively and was seen monthly for
the first 15 months, then at 3-month intervals
until age 3, and less frequently after that. Mental and motor test scores, anthropometric measures, observations of mother and child, interview data, and projective test data were collected.
Reports from this study have been numerous and
have contributed much to our understanding of
psychological development (Bayley, 1968; 1986;
1991).
Other developments. Situational tests in the
study of personality were also developed. These
tests involved the close observation of a subject engaged in a task whose purpose was often
disguised. A classical example is the series of
investigations conducted by Hartshorne and May
testing was not without its pains. A large number of tests eagerly placed on the market did
not live up to the users' expectations and often
resulted in a negative attitude toward all testing.
Many individuals with little or no psychological training used complex instruments to make
decisions that often were not valid. For example,
many school teachers were taught administration
of the Binet scale during one summer course and
would then become official community examiners, frequently being labeled as psychologists. In
1914, surveys published by J. E. Wallin indicated
that the majority of psychological examiners were
teachers, principals, and even medical personnel,
most with very little training in psychology.
Acrimonious controversies over technical
issues also did not help. A classical example
is the Stanford-Iowa controversy over the constancy of the IQ, i.e., the question of whether
intelligence is a fixed personal attribute based
on hereditary givens, or a characteristic that not
only reflects one's culture and upbringing but can
also be altered by environmental manipulations
such as coaching, nursery experiences, intensive
teaching, and others (see McNemar, 1940; R. L.
Thorndike, 1940; Wellman, Skeels, & Skodak,
1940). Again, this was not a new question, but
one that went back to the time of Galton (Fancher,
1983). In 1932, Beth Wellman of Iowa published
the first of a series of studies reporting marked
changes in the IQs of children attending the Iowa
University Elementary School. The evidence to
support this contention consisted of mean IQ
increases from 110 to 119 and 124 on subsequent testing of a large group of children. Subsequent reports by Wellman and her co-workers
presented additional evidence that the stimulation in a child's environment is an important factor in mental development (see Minton, 1984).
Criticisms of the Iowa studies were swift and
often violent. B. R. Simpson published an article
in 1939, entitled "The wandering IQ: Is it time
to settle down?" in which he bitterly denounced
the work of Wellman as "worse than nonsense."
Florence Goodenough, also in 1939, published "A
critique of recent experiments on raising the IQ,"
a thoughtful evaluation based on the more rational arguments of unreliability, examiner bias, and
statistical regression.
David Wechsler. In 1939, David Wechsler (1896–1981) pointed out that available adult tests of
intelligence had typically been developed from
children's tests, and therefore their content was
often inappropriate for adults. He also indicated
that available tests overemphasized speed and
verbal abilities, and their standardization rarely
included adults. To correct these and other weaknesses, Wechsler created the Wechsler-Bellevue
Intelligence Scale, an individual point-scale for
adults. The Wechsler-Bellevue scale won wide
acceptance and was soon ranked second in frequency of use, following the Stanford-Binet. In
particular, the Wechsler-Bellevue found great
use in military hospitals during World War II.
Ten years later Wechsler published the Wechsler
Intelligence Scale for Children, and in 1955 the
Wechsler-Bellevue was replaced by the Wechsler
Adult Intelligence Scale (see Chapter 5).
World War II. World War II also made extensive demands on the skills and ingenuity of
psychologists and further stimulated psychological testing. The successful placement of personnel into specialized war activities, such as radar
observers, airplane navigators, and radio operators, became a crucial goal that generated much
systematic and sophisticated research. Problems
of adjustment, morale, and psychopathology also
stimulated interest in testing, particularly in the
use of projective techniques as tools for clinical
psychologists. The high rate of draftee difficulties emphasized the severity of the mental-health
problem and made greater demands upon clinical
federal government actively supported the training of clinical psychologists. This resulted both
in great interest in psychological tests designed to
measure the inner aspects of personality, and
in a reaction against the identification of clinical psychology as applied psychometrics. The
young clinicians felt they were not just testers
and often disavowed any connections with tests
The development and application of psychological tests have increased enormously; tests are routinely used in education, the military, civil service, industry, even religious life. The number
of tests available commercially is truly gigantic,
but probably surpassed by the number of noncommercial tests.
Large-scale testing programs involving thousands of subjects are relatively common. Complex statistical analyses, not possible before computers, have
now become routine matters. It is somewhat difficult to single out recent contributions as historically noteworthy because the objectivity of time
has not judged their worth, but the following are
probably illustrative.
The Authoritarian Personality. One of the major
his coworkers at the University of Southern California developed a unified theory of the human
intellect that organizes various intellectual abilities into a single system called the structure of
intellect (Guilford, 1959a; 1967b). As discussed
in Chapters 5 and 8, Guilford's model is a three-dimensional one that classifies human intellectual abilities according to the type of mental operations or processes involved, the kind of content,
and the type of outcome or product. Given five
categories of operations, four kinds of content,
and six kinds of products, the theory can be represented by a cube subdivided into 120 (5 × 4 × 6)
smaller cubes. Based on this model and a factor-analytic approach, Guilford and his coworkers
developed a large number of tests, each designed
to be a pure measure of a particular intellectual ability. Under the operation dimension of
divergent thinking, a number of measures have
been developed and used in the area of creativity,
despite their limited validity (see Guilford, 1967a,
for his autobiography).
The 1960s. The launching of the Soviet Sputnik in October 1957 brought to the consciousness of the American public the possibility that
the United States was no longer number one,
at least in some areas of endeavor. As always,
the high-school educational system was criticized, and a flurry of legislative actions resulted
in a search for talented students who could once
again bring the United States to the forefront of
science and technology. These searches involved
large-scale testing, exemplified by the National
Merit Scholarship Corporation program. Such
programs were facilitated by various technological advances that permitted machine scoring of
large numbers of answer sheets. This in turn facilitated the use of multiple-choice items, as opposed
to essay questions, which could not be scored by
machine.
The result was twofold. On the one hand, teams of
experts devoted their considerable talents to the
development of multiple-choice items that would
assess more than memory and recognition. On
the other hand, a number of critics protested
rather vocally against these tests, particularly
The third development was the truth-in-testing legislation. Just as there are a variety of
laws that protect the consumer when a product
is purchased or used, so a number of legislative efforts were made, particularly in relation
to admissions tests for professional and graduate
programs, to allow the test taker to have access to
his or her test results, to make publicly available
information on the psychometric aspects of the
test, and to give full disclosure to the test taker
prior to testing as to what use will be made of the
test scores. The basic rationale for such legislation was to permit the test taker the opportunity
to know explicitly the criteria by which decisions
are made about that individual; public disclosure
would presumably make the test publisher and
test administrator more accountable and, hence,
more careful. The intent was good, but the results,
it may be argued, punished test takers by resulting
in higher fees and fewer administration dates (for
a fascinating chronology of noteworthy events in
American psychology, many relevant to testing,
see Street, 1994).
The 1980s also saw a number of major attacks
on both specific tests and psychological testing
more generally, with the publication of books
such as The Reign of ETS: The Corporation That
Makes Up Minds (Nairn & Associates, 1980), The
Testing Trap (Strenio, 1981), and None of the Above:
Behind the Myth of Scholastic Aptitude (D. Owen,
1985).
A peek at the future. Following the example
We have taken a quick look at the history of psychological testing and a very brief peek at its
potential future. Testing has a rather long past,
perhaps because it seems to be an intrinsic human
activity. But testing today seems to have evolved
primarily due to the French clinical tradition and
the work of Binet, the German nomothetic tradition as represented by the work of Wundt, the
British idiographic approach, especially that of
Sir Francis Galton, and the American emphasis
on pragmatism.
SUGGESTED READINGS
Baars, J., & Scheepers, P. (1993). Theoretical and methodological foundations of the authoritarian personality. Journal of the History of the Behavioral
Sciences, 29, 345–353.
A fascinating review by two Dutch social scientists of the background, theoretical ideas, and methodological contributions
that resulted in the classic study of the authoritarian personality.
Buchanan, R. D. (1994). The development of the Minnesota Multiphasic Personality Inventory. Journal of
the History of the Behavioral Sciences, 30, 148–161.
A review of how the MMPI came to be. Very readable and an
interesting look at a chapter in the history of psychology that
in many ways revolutionized psychological testing.
Dennis, P. M. (1984). The Edison Questionnaire. Journal of the History of the Behavioral Sciences, 20, 23–37.
Thomas A. Edison, the famous inventor, developed a questionnaire of 48 questions to be administered to applicants
for positions as industrial chemists in his laboratories. The
questions included: "What countries bound France?" "What
is copra?" and "Who was Plutarch?" Although now unknown,
the questionnaire actually had quite an impact on the public's
perception of psychological testing.
DISCUSSION QUESTIONS
1. From a historical perspective, which one person had the most impact on the development of
psychological testing as a field?
2. If you could invite one person from those mentioned in the chapter to visit your class, which one
would it be?
3. If you are familiar with the musical My Fair
Lady (or the book Pygmalion on which it is
based), you might want to discuss the major
themes as they applied to England at the time
of Sir Francis Galton.
4. Could you take the work of Galton and
of Goddard and argue for an environmental
explanation?
5. Are there any recent events that in the future
may be incorporated in a chapter on the history
of psychological testing?
APPENDIX

Conversion of item difficulty levels to equivalent z scores. The difficulty
level is the percentage of examinees passing the item, i.e., the area of the
normal curve to the right of z. Very easy items (which most examinees pass)
have negative z scores; very difficult items have positive z scores.

Difficulty level       Equivalent     Difficulty level       Equivalent
(area to the right)    z score        (area to the right)    z score
99%                    -2.33          49%                    +.03
98                     -2.05          48                     +.05
97                     -1.88          47                     +.08
96                     -1.75          46                     +.10
95                     -1.65          45                     +.13
94                     -1.56          44                     +.15
93                     -1.48          43                     +.18
92                     -1.41          42                     +.20
91                     -1.34          41                     +.23
90                     -1.28          40                     +.25
89                     -1.23          39                     +.28
88                     -1.17          38                     +.31
87                     -1.13          37                     +.33
86                     -1.08          36                     +.36
85                     -1.04          35                     +.39
84                      -.99          34                     +.41
83                      -.95          33                     +.44
82                      -.92          32                     +.47
81                      -.88          31                     +.50
80                      -.84          30                     +.52
79                      -.81          29                     +.55
78                      -.77          28                     +.58
77                      -.74          27                     +.61
76                      -.71          26                     +.64
75                      -.67          25                     +.67
74                      -.64          24                     +.71
73                      -.61          23                     +.74
72                      -.58          22                     +.77
71                      -.55          21                     +.81
70                      -.52          20                     +.84
69                      -.50          19                     +.88
68                      -.47          18                     +.92
67                      -.44          17                     +.95
66                      -.41          16                     +.99
65                      -.39          15                     +1.04
64                      -.36          14                     +1.08
63                      -.33          13                     +1.13
62                      -.31          12                     +1.17
61                      -.28          11                     +1.23
60                      -.25          10                     +1.28
59                      -.23           9                     +1.34
58                      -.20           8                     +1.41
57                      -.18           7                     +1.48
56                      -.15           6                     +1.56
55                      -.13           5                     +1.65
54                      -.10           4                     +1.75
53                      -.08           3                     +1.88
52                      -.05           2                     +2.05
51                      -.02           1                     +2.33
50                       .00

Note: Because of rounding error and lack of interpolation, the values are approximate and may not
match normal-curve parameters exactly (e.g., the value for 84% is given as -.99 rather than the expected -1).
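The table can be reproduced (to rounding) with the inverse of the normal cumulative distribution function. A minimal Python sketch, using the standard library's NormalDist (Python 3.8+); the sample percentages are chosen only to illustrate:

```python
from statistics import NormalDist

def difficulty_to_z(percent_passing: float) -> float:
    """Convert an item's difficulty level (percentage of examinees
    passing, i.e., the area of the normal curve to the right of z)
    to the equivalent z score."""
    return NormalDist().inv_cdf(1 - percent_passing / 100)

# Easy items (most examinees pass) fall below the mean;
# difficult items fall above it.
print(round(difficulty_to_z(84), 2))  # about -0.99
print(round(difficulty_to_z(50), 2))  # 0.0
print(round(difficulty_to_z(16), 2))  # about +0.99
```

This makes the table's symmetry explicit: an item passed by 84% of examinees sits about one standard deviation below the mean, and one passed by only 16% sits about one standard deviation above it.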
References
Alderman, A. L., & Powers, P. E. (1980). The effects of special preparation on SAT verbal scores. American Educational Research Journal, 17, 239–251.
Alderton, D. L., & Larson, G. E. (1990). Dimensionality of Raven's Advanced Progressive Matrices items. Educational and Psychological Measurement, 50, 887–900.
Alfred, K. D., & Smith, T. (1989). The hardy personality: Cognitive and physiological responses to evaluative threat. Journal of Personality and Social Psychology, 56, 257–266.
Allen, J. P., & Litten, R. Z. (1993). Psychometric and laboratory measures to assist in the treatment of alcoholism. Clinical Psychology Review, 13, 223–239.
Allen, J. P., & Mattson, M. E. (1993). Psychometric instruments to assist in alcoholism treatment planning. Journal of Substance Abuse Treatment, 10, 289–296.
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Allen, R. M., & Jefferson, T. W. (1962). Psychological evaluation of the cerebral palsied person. Springfield, IL: Charles C Thomas.
Alliger, G. M., & Dwight, S. S. (2000). A meta-analytic investigation of the susceptibility of integrity tests to faking and coaching. Educational and Psychological Measurement, 60, 59–72.
Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt, Rinehart and Winston.
Allport, G. W. (1961). Pattern and growth in personality. New York: Holt, Rinehart and Winston.
Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycholexical study. Psychological Monographs, 47, (1, whole No. 211).
Allport, G. W., Vernon, P. E., & Lindzey, G. (1960). Study of values (3rd ed.). Boston: Houghton Mifflin.
Altemeyer, B. (1981). Right-wing authoritarianism. Manitoba, Canada: University of Manitoba Press.
Altepeter, T. S., & Johnson, K. A. (1989). Use of the PPVT-R for intellectual screening with adults: A caution. Journal of Psychoeducational Assessment, 7, 39–45.
Alter, J. B. (1984). Creativity profile of university and conservatory dance students. Journal of Personality Assessment, 48, 153–158.
Alter, J. B. (1989). Creativity profile of university and conservatory music students. Creativity Research Journal, 2, 184–195.
Alvino, J., McDonnel, R. C., & Richert, S. (1981). National survey of identification practices in gifted and talented education. Exceptional Children, 48, 124–132.
American Association on Mental Deficiency. (1974). Adaptive Behavior Scale: Manual. Washington, DC: Author.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Guidance Service, Inc. (1969). The Minnesota Rate of Manipulation Tests, examiner's manual. Circle Pines, MN: Author.
American Psychological Association. (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.
American Psychological Association. (1987). General guidelines for providers of psychological services. American Psychologist, 42, 712–723.
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597–1611.
American Psychological Association, Task Force on the Delivery of Services to Ethnic Minority Populations. (1993). Guidelines for providers of psychological services to ethnic, linguistic, and culturally diverse populations. American Psychologist, 48, 45–48.
American Psychological Association Committee on Psychological Tests and Assessment. (1996). Statement on the disclosure of test data. American Psychologist, 51, 644–648.
Ames, L. B., Gillespie, B. S., Haines, J., & Ilg, F. L. (1979). The Gesell Institute's child from one to six: Evaluating the behavior of the preschool child. New York: Harper & Row.
Anastasi, A. (1983). Evolving trait concepts. American Psychologist, 38, 175–184.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Anastasi, A., & Schaefer, C. E. (1969). Biographical correlates of artistic and literary creativity in adolescent girls. Journal of Applied Psychology, 53, 267–273.
Anderson, G. J., & Walberg, H. J. (1976). The assessment of learning environments. Chicago, IL: University of Illinois.
Anderson, R. D., & Sisco, F. H. (1977). Standardization of the WISC-R for deaf children (Series T, No. 1). Washington, DC: Gallaudet College, Office of Demographic Studies.
Anderson, R. T., Aaronson, N. K., & Wilkin, D. (1993). Critical review of the international assessments of health-related quality of life. Quality of Life Research, 2, 369–395.
Andrews, J. (1973). The relationship of values to identity achievement status. Journal of Youth and Adolescence, 2, 133–138.
Andrews, L. W., & Gutkin, T. B. (1991). The effects of human vs. computer authorship on consumers' perceptions of psychological reports. Computers in Human Behavior, 7, 311–317.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (pp. 508–600). Washington, DC: American Council on Education.
Aplin, D. Y. (1993). Psychological evaluation of adults in a cochlear implant program. American Annals of the Deaf, 138, 415–419.
Arbisi, P. A., & Ben-Porath, Y. S. (1998). The ability of Minnesota Multiphasic Personality Inventory-2 validity scales to detect fake-bad responses in psychiatric inpatients. Psychological Assessment, 10, 221–228.
Archer, R. P., Gordon, R. A., & Kirchner, F. H. (1987). MMPI response-set characteristics among adolescents. Journal of Personality Assessment, 51, 506–516.
Arellano, C. M., & Markman, H. J. (1995). The Managing Affect and Differences Scale (MADS): A self-report measure assessing conflict management in couples. Journal of Family Psychology, 9, 319–334.
Argulewicz, E. N., Bingenheimer, L. T., & Anderson, C. C. (1983). Concurrent validity of the PPVT-R for Anglo-American and Mexican-American students. Journal of Psychoeducational Assessment, 1, 163–167.
Arizmendi, T., Paulsen, K., & Domino, G. (1981). The Matching Familiar Figures Test: A primary, secondary, and tertiary evaluation. Journal of Clinical Psychology, 37, 812–818.
Armstrong, R. G. (1955). A reliability study of a short form of the WISC vocabulary subtest. Journal of Clinical Psychology, 11, 413–414.
Arnold, M. B. (1962). Story sequence and analysis. New York: Columbia University Press.
Aronow, E., & Reznikoff, M. (1976). Rorschach content interpretation. New York: Grune & Stratton.
Aronow, E., Reznikoff, M., & Moreland, K. L. (1995). The Rorschach: Projective technique or psychometric test? Journal of Personality Assessment, 64, 213–228.
Arrasmith, D. G., Sheehan, D. S., & Applebaum, W. R. (1984). A comparison of the selected-response strategy and the constructed-response strategy for assessment of a third-grade writing task. Journal of Educational Research, 77, 172–177.
Arthur, D. (1994). Workplace testing. New York: American Management Association.
Arthur, W., Jr., & Day, D. V. (1994). Development of a short form for the Raven Advanced Progressive Matrices Test. Educational and Psychological Measurement, 54, 394–403.
Arthur, W., Jr., & Woehr, D. J. (1993). A confirmatory factor analytic study examining the dimensionality of the Raven's Advanced Progressive Matrices. Educational and Psychological Measurement, 53, 471–478.
Asher, S. R., Singleton, L. C., Tinsley, B. R., & Hymel, S. (1979). A reliable sociometric measure for preschool children. Developmental Psychology, 15, 443–444.
Association of American Medical Colleges. (1977). New Medical College Admission Test interpretive manual. Washington, DC: Association of American Medical Colleges.
Astin, A. W. (1982). Minorities in American higher education. San Francisco: Jossey-Bass.
Astin, A. W., & Holland, J. L. (1961). The environmental assessment technique: A way to measure college environments. Journal of Educational Psychology, 52, 308–316.
Atkin, R. S., & Conlon, E. J. (1978). Behaviorally anchored rating scales: Some theoretical issues. Academy of Management Review, 3, 119–128.
Atkinson, D., & Gim, R. (1989). Asian-American cultural identity and attitudes toward mental health services. Journal of Counseling Psychology, 36, 209–212.
Atkinson, L. (1992). Concurrent validities of the Stanford-Binet (4th ed.), Leiter, and Vineland with developmentally delayed children. Journal of School Psychology, 30, 165–173.
Austin, J. S. (1992). The detection of fake good and fake bad on the MMPI-2. Educational and Psychological Measurement, 52, 669–674.
Averill, L. A. (1990). Recollections of Clark's G. Stanley Hall. Journal of the History of the Behavioral Sciences, 26, 125–130.
Axelrod, S., & Eisdorfer, C. (1961). Attitudes toward old people: An empirical analysis of the stimulus-group validity of the Tuckman-Lorge Questionnaire. Journal of Gerontology, 16, 75–80.
Aylward, E. H. (1991). Understanding children's testing. Austin, TX: Pro-ed.
Aylward, E. H., & Schmidt, S. (1986). An examination of three tests of visual-motor integration. Journal of Learning Disabilities, 19, 328–335.
Baars, J., & Scheepers, P. (1993). Theoretical and methodological foundations of the authoritarian personality. Journal of the History of the Behavioral Sciences, 29, 345–353.
Back, K. W. (1971). Metaphors as test of personal philosophy of aging. Sociological Focus, 5, 1–8.
Bachman, J. G., & O'Malley, P. M. (1984). Yea-saying, nay-saying, and going to extremes: Black-white differences in response styles. Public Opinion Quarterly, 48, 491–509.
Baehr, M. E. (1953). A simplified procedure for the measurement of employee attitudes. Journal of Applied Psychology, 37, 163–167.
Baehr, M. E., Saunders, D. R., Froemel, E. C., & Furcon, J. E. (1971). The prediction of performance for
black and for white police patrolmen. Professional
Psychology, 2, 4658.
Bagarozzi, D. (1991). The Family Apperception Test:
A review. American Journal of Family Therapy, 19,
177181.
Bagby, R. M., Gillis, J. R., & Dickens, S. (1990). Detection of dissimulation with the new generation of
objective personality measures. Behavioral Sciences
and the Law, 8, 93102.
Bagley, C., Wilson, G. D., & Boshier, R. (1970). The
Conservatism scale: A factor structure comparison
of English, Dutch, and New Zealand samples. The
Journal of Social Psychology, 81, 267268.
Bagnato, S. J. (1984). Team congruence in developmental diagnosis and intervention: Comparing clinical judgment and child performance measures.
School Psychology Review, 13, 716.
Bagnato, S. J., & Neisworth, J. T. (1991). Assessment
for early intervention: Best practices for professionals.
New York: Guilford Press.
Bahr, H. M., & Chadwick, B. A. (1974). Conservatism,
racial intolerance, and attitudes toward racial assimilation among whites and American Indians. Journal
of Social Psychology, 94, 4556.
Bailey, K. D. (1988). The conceptualization of validity:
Current perspectives. Social Science Research, 17,
117136.
Bailey-Richardson, B. (1988). Review of the Vineland
Adaptive Behavior Scales, Classroom ed. Journal of
Psychoeducational Assessment, 6, 8791.
Baillargeon, J., & Danis, C. (1984). Barnum meets
the computer: A critical test. Journal of Personality
Assessment, 48, 415419.
Baird, L. L. (1985). Do grades and tests predict adult
accomplishment? Research in Higher Education, 23,
385.
Baird, L. L. (1987). Do students think admissions tests are fair? Do tests affect their decisions? Research in Higher Education, 26, 373–388.
Baird, P. (1983). Assessing life events and physical health: Critique of the Holmes and Rahe scale. Psychology, 20, 38–40.
Baker, A. F. (1983). Psychological assessment of autistic children. Clinical Psychology Review, 3, 41–59.
Baker, E. L., O'Neill, H. F., Jr., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48, 1210–1218.
Baker, H. J., & Leland, B. (1959). Detroit Tests of Learning Aptitude. Indianapolis, IN: Bobbs-Merrill.
Baker, R. L., Mednick, B. R., & Hocevar, D. (1991). Utility of scales derived from teacher judgements of adolescent academic performance and psychosocial behavior. Educational and Psychological Measurement, 51, 271–286.
Baldessarini, R. J., Finkelstein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effects of prevalence of illness. Archives of General Psychiatry, 40, 569–573.
Baldwin, A. L., Cole, R. E., & Baldwin, C. (1982). Parental pathology, family interaction, and the competence of the child in school. Monographs of the Society for Research in Child Development, 47 (Number 5).
Ball, S., & Bogatz, G. A. (1970). The first year of Sesame Street: An evaluation. Princeton, NJ: Educational Testing Service.
Ballard, R., Crino, M. D., & Rubenfeld, S. (1988). Social desirability response bias and the Marlowe-Crowne Social Desirability Scale. Psychological Reports, 63, 227–237.
Balzer, W. K., & Sulsky, L. M. (1992). Halo and performance appraisal research: A critical examination. Journal of Applied Psychology, 77, 975–985.
Banks, M. H., Jackson, P. R., Stafford, E. M., & Warr, P. B. (1983). The Job Components Inventory and the analysis of jobs requiring limited skill. Personnel Psychology, 36, 57–66.
Bannatyne, A. D. (1971). Language, reading, and learning disabilities. Springfield, IL: Charles C Thomas.
Banta, T. J. (1961). Social attitudes and response styles. Educational and Psychological Measurement, 21, 543–557.
Barba, R. H., & Mason, C. L. (1992). Construct validity studies for the Draw-A-Computer User Test. Computers in Human Behavior, 8, 231–237.
Barker, H. R., Fowler, R. D., & Peterson, L. P. (1971). Factor analytic structure of the short form MMPI in a VA hospital population. Journal of Clinical Psychology, 27, 228–233.
Barkley, R. A. (1988). A review of Child Behavior Rating Scales and checklists for research in child psychopathology. In M. Rutter, H. Tuma, & I. Lann (Eds.), Assessment and diagnosis in child and adolescent psychopathology. New York: Guilford Press.
Barnett, A. J. (1982). Designing an assessment of the child with cerebral palsy. Psychology in the Schools, 19, 160–165.
Barnette, W. L., Jr., & McCall, J. N. (1964). Validation of the Minnesota Vocational Interest Inventory for vocational high school boys. Journal of Applied Psychology, 48, 378–382.
Baron, J. (1985). Rationality and intelligence. Cambridge, UK: Cambridge University Press.
Baron, J., & Norman, M. F. (1992). SATs, Achievement Tests, and High-School class rank as predictors of college performance. Educational and Psychological Measurement, 52, 1047–1055.
Barrett, E. T., & Gleser, G. C. (1987). Development and validation of the cognitive status examination. Journal of Consulting and Clinical Psychology, 55, 877–882.
Barrett, E. T., Wheatley, R. D., & LaPlant, R. J. (1983). A brief clinical neuropsychological battery: Clinical classification trials. Journal of Clinical Psychology, 39, 980–984.
Barrett, G. V., Alexander, R. A., Doverspike, D., Cellar, D., & Thomas, J. C. (1982). The development and application of a computerized information-processing test battery. Applied Psychological Measurement, 6, 13–29.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Barringer, D. G., Strong, C. J., Blair, J. C., Clark, T. C., & Watkins, S. (1993). Screening procedures used to identify children with hearing loss. American Annals of the Deaf, 138, 420–426.
Barron, F. (1953). An Ego-Strength Scale which predicts response to psychotherapy. Journal of Consulting Psychology, 17, 327–333.
Barron, F. (1969). Creative person and creative process. New York: Holt, Rinehart and Winston.
Barry, B., Klanderman, J., & Stipe, D. (1983). Study number one. In A. S. Kaufman & N. L. Kaufman (Eds.), Kaufman Assessment Battery for Children: Interpretive manual (p. 94). Circle Pines, MN: American Guidance Service.
Bar-Tal, D., & Bar-Zohar, Y. (1977). The relationship between perception of locus of control and academic achievement: Review and some educational implications. Contemporary Educational Psychology, 2, 181–199.
Barton, P. E., & Coley, R. J. (1994). Testing in America's schools. Princeton, NJ: Educational Testing Service.
Bass, B. M. (1956). Development and evaluation of a scale for measuring social acquiescence. The Journal of Abnormal and Social Psychology, 53, 296–299.
Baucom, D. H. (1985). Review of the California Psychological Inventory. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 250–252). Lincoln, NB: University of Nebraska Press.
Baucom, D. H., Epstein, N., Rankin, L. A., & Burnett, C. K. (1996). Assessing relationship standards: The Inventory of Specific Relationship Standards. Journal of Family Psychology, 10, 72–88.
Bauer, M. S., Crits-Christoph, P., Ball, W. A., Dewees, E., McAllister, T., Alahi, P., et al. (1991). Independent assessment of manic and depressive symptoms by self-rating: Scale characteristics and implications for the study of mania. Archives of General Psychiatry, 48, 807–812.
Baughman, E. E., & Dahlstrom, W. G. (1972). Racial differences on the MMPI. In S. S. Guterman (Ed.), Black psyche. Berkeley, CA: Glendessary Press.
Baum, D. D., & Kelly, T. J. (1979). The validity of Slosson Intelligence Test with learning disabled kindergarteners. Journal of Learning Disabilities, 12, 268–270.
Bauman, M. K., & Kropf, C. A. (1979). Psychological tests used with blind and visually handicapped persons. School Psychology Digest, 8, 257–270.
Bayley, N. (1968). Behavioral correlates of mental growth: Birth to thirty-six years. American Psychologist, 23, 1–17.
Bayley, N. (1969). Bayley Scales of Infant Development. New York: Psychological Corporation.
Bayley, N. (1986). Behavioral correlates of mental growth: Birth to thirty-six years. Advances in Infancy Research, 4, 16–37.
Bayley, N. (1991). Consistency and variability in the growth of intelligence from birth to eighteen years. Journal of Genetic Psychology, 152, 573–604.
Beaber, R., Marston, A., Michelli, J., & Mills, M. (1985). A brief test for measuring malingering in schizophrenic individuals. American Journal of Psychiatry, 142, 1478–1481.
Beaver, A. P. (1953). Personality factors in choice of nursing. Journal of Applied Psychology, 37, 374–379.
Bechtoldt, H. P. (1959). Construct validity: A critique. American Psychologist, 14, 619–629.
Bech, P., Gram, L. F., Dein, E., Jacobsen, O., Vitger, J., & Bolwig, T. G. (1975). Quantitative rating of depressive states: Correlations between clinical assessment, Beck's Self Rating Scale and Hamilton's Objective Rating Scale. Acta Psychiatrica Scandinavica, 51, 161–170.
Beck, A. T. (1967). Depression. New York: Harper & Row.
Beck, A. T. (1978). Beck inventory. Philadelphia, PA: Center for Cognitive Therapy.
Beck, A. T., & Beamesderfer, A. (1974). Assessment of depression: The depression inventory. In P. Pichot (Ed.), Psychological measurement in pharmacopsychiatry (Vol. 7). Basel: S. Karger.
Beck, A. T., Rial, W. Y., & Rickels, K. (1974). Short form of depression inventory: Cross validation. Psychological Reports, 34, 1184–1186.
Beck, A. T., & Steer, R. A. (1984). Internal consistencies of the original and revised Beck Depression Inventory. Journal of Clinical Psychology, 40, 1365–1367.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory manual (2nd ed.). San Antonio, TX: Psychological Corporation.
Beck, A. T., Steer, R. A., & Garbin, M. G. (1988). Psychometric properties of the Beck Depression
Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77–100.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
Beck, M. D., & Beck, C. K. (1980). Multitrait-multimethod validation of four personality measures with a high-school sample. Educational and Psychological Measurement, 40, 1005–1011.
Becker, J. T., & Sabatino, D. A. (1973). Frostig revisited. Journal of Learning Disabilities, 6, 180–184.
Becker, R. L. (1988). Reading-free Vocational Interest Inventory-Revised. Columbus, OH: Elbern.
Becker, T. E., & Colquitt, A. L. (1992). Potential vs. actual faking of a biodata form: An analysis along several dimensions of item type. Personnel Psychology, 45, 389–406.
Beery, K. (1989). The VMI (Developmental Test of Visual-Motor Integration) administration, scoring, and teaching manual. Cleveland, OH: Modern Curriculum Press.
Beery, K. K., & Buktenica, N. A. (1967). Developmental Test of Visual-Motor Integration. Chicago, IL: Follett Educational Corporation.
Behar, L. B. (1977). The Preschool Behavior Questionnaire. Journal of Abnormal Child Psychology, 5, 265–276.
Bejar, I. I., Embretson, S., & Mayer, R. E. (1987). Cognitive psychology and the SAT: A review of some implications. ETS Research Report, #87-28.
Bekker, L. D., & Taylor, C. (1966). Attitudes toward the aged in a multigenerational sample. Journal of Gerontology, 21, 115–118.
Bell, D. C., & Bell, L. G. (1989). Micro and macro measurement of family systems concepts. Journal of Family Psychology, 3, 137–157.
Bellack, A. S., & Hersen, M. (1977). Self-report inventories in behavioral assessment. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment (pp. 52–76). New York: Brunner/Mazel.
Bellack, A. S., Hersen, M., & Lamparski, D. (1979). Role play tests for social skills: Are they valid? Are they useful? Journal of Consulting and Clinical Psychology, 47, 335–342.
Bellak, L. (1975). The T.A.T., C.A.T. and S.A.T. in clinical use (3rd ed.). New York: Grune & Stratton.
Bellak, L. (1986). The Thematic Apperception Test, the Children's Apperception Test, and the Senior Apperception Technique in clinical use (4th ed.). Orlando, FL: Grune & Stratton.
Bender, L. (1938). A visual motor gestalt test and its clinical use. American Orthopsychiatric Association, Research Monographs, No. 3.
Bendig, A. W. (1959). Score reliability of dichotomous and trichotomous item responses on the Maudsley
Personality Inventory. Journal of Consulting Psychology, 23, 181.
Benjamin, L. S. (1993). Every psychopathology is a gift of love. Psychotherapy Research, 3, 1–24.
Benjamin, L. T. (1984). Staying with initial answers on objective tests: Is it a myth? Teaching of Psychology, 11, 133–141.
Bennett, R. E., & Ward, W. C. (1993). Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment. Hillsdale, NJ: Erlbaum.
Bennett, V. D. C. (1964). Development of a self concept Q sort for use with elementary age school children. Journal of School Psychology, 3, 19–25.
Benson, J., & Hocevar, D. (1985). The impact of item phrasing on the validity of attitude scales for elementary school children. Journal of Educational Measurement, 22, 213–240.
Bentler, P. (1972). Review of the Tennessee Self Concept Scale. In O. K. Buros (Ed.), The seventh mental measurements yearbook (pp. 366–367). Highland Park, NJ: Gryphon Press.
Bentler, P. M., & LaVoie, A. L. (1972). An extension of semantic space. Journal of Verbal Learning and Verbal Behavior, 11, 174–182.
Bentler, P. M., & Speckart, G. (1979). Attitude organization and the attitude-behavior relationship. Journal of Personality and Social Psychology, 37, 913–929.
Benton, A. L. (1974). The Visual Retention Test. New York: Psychological Corporation.
Benton, A. L., Hamsher, K. de S., Varney, N. R., & Spreen, O. (1983). Contributions to neuropsychological assessment: A clinical manual. New York: Oxford University Press.
Berdie, R. F. (1950). Scores on the SVIB and the Kuder Preference Record in relation to self-ratings. Journal of Applied Psychology, 34, 42–49.
Berg, I. A. (1955). Response bias and personality: The deviation hypothesis. Journal of Psychology, 40, 60–71.
Berg, I. A. (1959). The unimportance of test item content. In B. M. Bass & I. A. Berg (Eds.), Objective approaches to personality assessment (pp. 83–99). Princeton, NJ: D. Van Nostrand.
Bergan, J. R. (1977). Behavioral consultation. Columbus, OH: Charles E. Merrill.
Bergan, J. R., & Kratochwill, T. R. (1990). Behavioral consultation in applied settings. New York: Plenum Press.
Bergland, B. (1974). Career planning: The use of sequential evaluated experience. In E. L. Herr (Ed.), Vocational guidance and human development. Boston, MA: Houghton Mifflin.
Bergner, M., Bobbitt, R. A., Carter, W. B., & Gilson, B. S. (1981). The Sickness Impact Profile: Development and final revision of a health status measure. Medical Care, 19, 787–805.
Bergner, M., Bobbitt, R. A., Kressel, S., et al. (1976). The Sickness Impact Profile: Conceptual foundation and methodology for the development of a health status measure. International Journal of Health Services, 6, 393–415.
Bergner, M., & Rothman, M. L. (1987). Health status measures: An overview and guide for selection. Annual Review of Public Health, 8, 191–210.
Berk, R. A. (1982). Handbook of methods for detecting test bias. Baltimore, MD: Johns Hopkins University Press.
Berk, R. A. (Ed.) (1984). A guide to criterion-referenced test construction. Baltimore, MD: Johns Hopkins University Press.
Berk, R. A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137–172.
Berlinsky, S. (1952). Measurement of the intelligence and personality of the deaf: A review of the literature. Journal of Speech and Hearing Disorders, 17, 39–54.
Berman, A., & Hays, T. (1973). Relation between belief in after-life and locus of control. Journal of Consulting and Clinical Psychology, 41, 318.
Bernard, L. C., Houston, W., & Natoli, L. (1993). Malingering on neuropsychological memory tests: Potential objective indicators. Journal of Clinical Psychology, 49, 45–53.
Berry, D. T. R., Baer, R. A., & Harris, M. J. (1991). Detection of malingering on the MMPI: A meta-analysis. Clinical Psychology Review, 11, 585–598.
Berry, J. W. (1974). Radical cultural relativism and the concept of intelligence. In J. W. Berry & P. R. Dasen (Eds.), Culture and cognition: Readings in cross-cultural psychology (pp. 225–229). London: Methuen.
Berry, K., & Sherrets, S. (1975). A comparison of the WISC and WISC-R for special education. Pediatric Psychology, 3, 14.
Bersoff, D. N. (1970). The revised deterioration formula for the Wechsler Adult Intelligence Scale: A test of validity. Journal of Clinical Psychology, 26, 71–73.
Beutler, L. E., Arizmendi, T. G., Crago, M., Shanfield, S., & Hagaman, R. (1983). The effects of value similarity and clients' persuadability on value convergence and psychotherapy improvement. Journal of Social and Clinical Psychology, 1, 231–245.
Beutler, L. E., Crago, M., & Arizmendi, T. G. (1986). Research on therapist variables in psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of
psychotherapy and behavior change (pp. 257–310). New York: Wiley.
Bigler, E. D. (1986). Forensic issues in neuropsychology. In D. Wedding, A. M. Horton, Jr., & J. Webster (Eds.), The neuropsychology handbook (pp. 526–547). Berlin: Springer-Verlag.
Bigler, E. D., & Nussbaum, N. L. (1989). Child neuropsychology in the private medical practice. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (pp. 557–572). New York: Plenum Press.
Binder, L. M. (1992). Deception and malingering. In A. E. Puente & R. J. McCaffrey (Eds.), Handbook of neuropsychological assessment (pp. 353–374). New York: Plenum Press.
Binder, L. M. (1993). Assessment of malingering after mild head trauma with the Portland Digit Recognition Test. Journal of Clinical and Experimental Neuropsychology, 15, 170–182.
Binder, L. M., & Willis, S. C. (1991). Assessment of motivation after financially compensable minor head trauma. Psychological Assessment, 3, 175–181.
Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. Année Psychologique, 11, 191–244.
Birren, J. E., & Birren, B. A. (1990). The concepts, models, and history of the psychology of aging. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (3rd ed., pp. 3–20). New York: Academic Press.
Bishop, D., & Butterworth, G. E. (1979). A longitudinal study using the WPPSI and WISC-R with an English sample. British Journal of Educational Psychology, 49, 156–168.
Blanchard, R., Klassen, P., Dickey, R., Kuban, M. E., & Blak, T. (2001). Sensitivity and specificity of the phallometric test for pedophilia in nonadmitting sex offenders. Psychological Assessment, 13, 118–126.
Blankstein, K. R., Flett, G. L., & Koledin, S. (1991). The Brief College Student Hassles Scale: Development, validation, and relation with pessimism. Journal of College Student Development, 32, 258–264.
Blau, T. H. (1991). The psychological examination of the child. New York: Wiley.
Blazer, D., Hughes, D. C., & George, L. K. (1987). The epidemiology of depression in an elderly community population. The Gerontologist, 27, 281–287.
Block, J. (1957). A comparison between ipsative and normative ratings of personality. Journal of Abnormal and Social Psychology, 54, 50–54.
Block, J. (1961). The Q-sort method in personality assessment and psychiatric research. Springfield, IL: Charles C Thomas.
Block, J. (1978). The Q-sort method. Palo Alto, CA: Consulting Psychologists Press.
Block, J. (1981). Some enduring and consequential structures of personality. In A. I. Rabin (Ed.), Further explorations in personality. New York: Wiley InterScience.
Block, J., Weiss, D. S., & Thorne, A. (1979). How relevant is a semantic similarity interpretation of ratings? Journal of Personality and Social Psychology, 37, 1055–1074.
Bloom, A. S., & Raskin, L. M. (1980). WISC-R Verbal-Performance IQ discrepancies: A comparison of learning disabled children to the normative sample. Journal of Clinical Psychology, 36, 322–323.
Bloom, A. S., Reese, A., Altshuler, L., Meckler, C. L., & Raskin, L. M. (1983). IQ discrepancies between the Binet and WISC-R in children with developmental problems. Journal of Clinical Psychology, 39, 600–603.
Bloom, B. (Ed.) (1956). Taxonomy of educational objectives. Handbook 1: Cognitive domain. New York: Longmans, Green & Co.
Bloom, L., & Lahey, M. (1978). Language development and language disorders. New York: Wiley.
Blum, G. S. (1950). The Blacky Pictures. Cleveland, OH: The Psychological Corporation.
Blum, J. E., Fosshage, J. L., & Jarvik, L. F. (1972). Intellectual changes and sex differences in octogenarians: A twenty-year longitudinal study of aging. Developmental Psychology, 7, 178–187.
Blumenthal, M. (1975). Measuring depressive symptomatology in a general population. Archives of General Psychiatry, 34, 971–978.
Bochner, S. (1978). Reliability of the Peabody Picture Vocabulary Test: A review of 32 selected research studies published between 1965 and 1974. Psychology in the Schools, 15, 320–327.
Boehm, A. E. (1971). Boehm Test of Basic Concepts. New York: The Psychological Corporation.
Boehm, V. R. (1968). Mr. Prejudice, Miss Sympathy, and the authoritarian personality: An application of psychological measuring techniques to the problem of jury bias. Wisconsin Law Review, 734–750.
Bogardus, E. S. (1925). Measuring social distance. Journal of Applied Sociology, 9, 299–308.
Boldt, R. F. (1986). Generalization of SAT validity across colleges. ETS Research Report, #86-24.
Bolt, M. (1978). The Rokeach Value Survey: Preferred or preferable? Perceptual and Motor Skills, 47, 322.
Bolton, B. (Ed.) (1990). Special education and rehabilitation testing: Practical applications and test reviews. Austin, TX: Proed, Applied Testing Series.
Bolton, B. (1995). Review of the Inwald Personality Inventory. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 501–503). Lincoln, NB: The University of Nebraska Press.
Bondy, M. (1974). Psychiatric antecedents of psychological testing (before Binet). Journal of the History of the Behavioral Sciences, 10, 180–194.
Booth-Kewley, S., Edwards, J. E., & Rosenfeld, P. (1992). Impression management, social desirability, and computer administration of attitude questionnaires: Does the computer make a difference? Journal of Applied Psychology, 77, 562–566.
Booth-Kewley, S., Rosenfeld, P., & Edwards, J. E. (1992). Impression management and self-deceptive enhancement among Hispanic and non-Hispanic white Navy recruits. Journal of Social Psychology, 132, 323–329.
Borgatta, E. F. (1964). The structure of personality characteristics. Behavioral Science, 9, 8–17.
Borgen, F. H. (1986). New approaches to the assessment of interests. In W. B. Walsh & S. H. Osipow (Eds.), Advances in vocational psychology (Vol. 1, pp. 83–125). Hillsdale, NJ: Erlbaum.
Borgen, F. H., & Harper, G. T. (1973). Predictive validity of measured vocational interests with black and white college men. Measurement and Evaluation in Guidance, 6, 19–27.
Boring, E. G. (1957). A history of experimental psychology. New York: Appleton-Century-Crofts.
Boring, E. G. (1965). On the subjectivity of important historical dates: Leipzig, 1879. Journal of the History of the Behavioral Sciences, 1, 5–9.
Bornstein, P. H., Bridgwater, C. A., Hickey, J. S., & Sweeney, T. M. (1980). Characteristics and trends in behavioral assessment: An archival analysis. Behavioral Assessment, 2, 125–133.
Bornstein, R. F. (1999). Criterion validity of objective and projective dependency tests: A meta-analytic assessment of behavioral prediction. Psychological Assessment, 11, 48–57.
Bornstein, R. F., Hill, E. L., Robinson, K. J., Calabrese, C., & Bowers, K. S. (1996). Internal reliability of Rorschach Oral Dependency Scale scores. Educational and Psychological Measurement, 56, 130–138.
Bortner, M. (1965). Review of the Progressive Matrices. In O. K. Buros (Ed.), The sixth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Borum, R., & Grisso, T. (1995). Psychological test use in criminal forensic evaluations. Professional Psychology: Research and Practice, 26, 465–473.
Bosman, F., Hoogenboom, J., & Walpot, G. (1994). An interactive video test for pharmaceutical chemists' assistants. Computers in Human Behavior, 10, 51–62.
Boswell, D. L., & Pickett, J. A. (1991). A study of the internal consistency and factor structure of
the Verbalizer-Visualizer Questionnaire. Journal of Mental Imagery, 15, 33–36.
Boudreau, R. A., Killip, S. M., MacInnis, S. H., Milloy, D. G., & Rogers, T. B. (1983). An evaluation of Graduate Record Examinations as predictors of graduate success in a Canadian context. Canadian Psychology, 24, 191–199.
Bouman, T. K., & Luteijn, F. (1986). Relations between the Pleasant Events Schedule, depression, and other aspects of psychopathology. Journal of Abnormal Psychology, 95, 373–377.
Bowman, M. L. (1989). Testing individual differences in ancient China. American Psychologist, 44, 576–578.
Boyd, M. E., & Ward, G. (1967). Validities of the D-48 test for use with college students. Educational and Psychological Measurement, 27, 1137–1138.
Boyle, G. J. (1989). Confirmation of the structural dimensionality of the Stanford-Binet Intelligence Scale (4th ed.). Personality and Individual Differences, 10, 709–715.
Bracken, B. A. (1981). McCarthy Scales as a learning disability diagnostic aid: A closer look. Journal of Learning Disabilities, 14, 128–130.
Bracken, B. A. (1984). Bracken Basic Concept Scale. San Antonio, TX: The Psychological Corporation.
Bracken, B. A. (1985). A critical review of the Kaufman Assessment Battery for Children (K-ABC). School Psychology Review, 14, 21–36.
Bracken, B. A. (1987). Limitations of preschool instruments and standards for minimal levels of technical adequacy. Journal of Psychoeducational Assessment, 5, 313–326.
Bracken, B. A. (1991). The assessment of preschool children with the McCarthy Scales of Children's Abilities. In B. A. Bracken (Ed.), The psychoeducational assessment of preschool children (2nd ed., pp. 53–85). Boston, MA: Allyn & Bacon.
Bracken, B. A., Prasse, D. P., & McCallum, R. S. (1984). Peabody Picture Vocabulary Test-Revised: An appraisal and review. School Psychology Review, 13, 49–60.
Bradburn, N. M., & Caplovitz, D. (1965). Reports on happiness. Chicago, IL: Aldine.
Braden, J. P. (1984). The factorial similarity of the WISC-R Performance Scale in deaf and hearing samples. Journal of Personality and Individual Differences, 4, 403–410.
Braden, J. P. (1985). The structure of nonverbal intelligence in deaf and hearing subjects. American Annals of the Deaf, 130, 496–501.
Braden, J. P. (1989). The criterion-related validity of the WISC-R Performance Scale and other nonverbal IQ tests for deaf children. American Annals of the Deaf, 134, 329–332.
Braden, J. P. (1990). Do deaf persons have a characteristic psychometric profile on the Wechsler Performance Scales? Journal of Psychoeducational Assessment, 8, 518–526.
Braden, J. P. (1992). The Differential Ability Scales and Special Education. Journal of Psychoeducational Assessment, 10, 92–98.
Bradley, R. H., & Caldwell, B. M. (1974). Issues and procedures in testing young children. ERIC TM Report, #37. Princeton, NJ: Educational Testing Service.
Bradley, R. H., & Caldwell, B. M. (1977). Home observation for measurement of the environment: A validation study of screening efficiency. American Journal of Mental Deficiency, 81, 417–420.
Bradley, R. H., & Caldwell, B. M. (1979). Home observation for measurement of the environment: A revision of the Preschool Scale. American Journal of Mental Deficiency, 84, 235–244.
Bradley, R. H., Caldwell, B. M., & Elardo, R. (1977). Home environment, social status, and mental test performance. Journal of Educational Psychology, 69, 697–701.
Bragman, R. (1982). Effects of different methods of conveying test directions on deaf children's performance on pattern recognition tasks. Journal of Rehabilitation of the Deaf, 16, 17–26.
Bragman, R. (1982). Review of research on test instructions for deaf children. American Annals of the Deaf, 127, 337–346.
Braithwaite, V. A., & Law, H. G. (1985). Structure of human values: Testing the adequacy of the Rokeach Value Survey. Journal of Personality and Social Psychology, 49, 250–263.
Brannick, M. T., Michaels, C. E., & Baker, D. P. (1989). Construct validity of in-basket scores. Journal of Applied Psychology, 74, 957–963.
Brannigan, G. G., Aabye, S. M., Baker, L. A., & Ryan, G. T. (1995). Further validation of the qualitative scoring system for the modified Bender-Gestalt Test. Psychology in the Schools, 32, 24–26.
Brauer, B. A. (1993). Adequacy of a translation of the MMPI into American Sign Language for use with deaf individuals: Linguistic equivalency issues. Rehabilitation Psychology, 38, 247–260.
Braun, P. R., & Reynolds, D. N. (1969). A factor analysis of a 100-item fear survey inventory. Behaviour Research and Therapy, 7, 399–402.
Bray, D. W. (1982). The assessment center and the study of lives. American Psychologist, 37, 180–189.
Brazelton, T. B. (1973). Neonatal Behavioral Assessment Scale (Clinics in Developmental Medicine, No. 50). Philadelphia, PA: J. B. Lippincott.
Brazelton, T. B., Robey, J. S., & Collier, G. (1969). Infant development in the Zinacanteco Indians of Southern Mexico. Pediatrics, 44, 274–293.
Breckler, S. J. (1984). Empirical validation of affect, behavior, and cognition as distinct components of attitude. Journal of Personality and Social Psychology, 47, 1191–1205.
Breen, M. J., Carlson, M., & Lehman, J. (1985). The revised developmental test of visual-motor integration: Its relation to the VMI, WISC-R, and Bender Gestalt for a group of elementary aged learning disabled students. Journal of Learning Disabilities, 18, 136–138.
Breland, H. M., & Ironson, G. H. (1976). DeFunis reconsidered: A comparative analysis of alternative admissions strategies. Journal of Educational Measurement, 13, 89–99.
Brennan, R. L. (1983). Elements of generalizability theory. Iowa City, IA: ACT Publications.
Bretz, R. D., Jr., Ash, R. A., & Dreher, G. F. (1989). Do people make the place? An examination of the attraction-selection-attrition hypothesis. Personnel Psychology, 42, 561–581.
Bridgeman, B., & Lewis, C. (1994). The relationship of essay and multiple-choice scores with grades in college courses. Journal of Educational Measurement, 31, 37–50.
Brigance, A. H. (1978). Brigance Diagnostic Inventory of Early Development. Billerica, MA: Curriculum Associates.
Briggs, S. R. (1992). Assessing the five-factor model of personality description. Journal of Personality, 60, 253–293.
Bringmann, W. G., Balance, W. D. G., & Evans, R. B. (1975). Wilhelm Wundt 1832–1920: A brief biographical sketch. Journal of the History of the Behavioral Sciences, 11, 287–297.
Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1, 185–216.
Brodsky, S. (1989). Advocacy in the guise of scientific objectivity: An examination of Faust and Ziskin. Computers in Human Behavior, 5, 261–264.
Brody, N. (1985). The validity of tests of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 353–389). New York: Wiley.
Broen, W. E., Jr., & Wirt, R. D. (1958). Varieties of response sets. Journal of Consulting Psychology, 22, 237–240.
Brooks, C. M., Jackson, J. R., Hoffman, H. H., & Hand, G. S., Jr. (1981). Validity of the new MCAT for predicting GPA and NBME Part 1 Examination Performance. Journal of Medical Education, 56, 767–769.
Brooks, C. R. (1977). WISC, WISC-R, SB L & M, WRAT: Relationships and trends among children
ages six to ten referred for psychological evaluation.
Psychology in the Schools, 14, 3033.
Brown, A. L. (1978). Knowing when, where, and
how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in instructional
psychology (Vol. 1, pp. 77165). Hillsdale, NJ:
Erlbaum.
Brown, D. C. (1994). Subgroup norming: Legitimate
testing practice or reverse discrimination? American
Psychologist, 49, 927928.
Brown, F. (1979). The SOMPA: A system of measuring potential abilities? School Psychology Digest, 8,
3746.
Brown, F. G. (1976). Principles of educational and
psychological testing (2nd ed.). New York: Holt,
Rinehart and Winston.
Brown, H. S., & May, A. E. (1979). A test-retest reliability study of the Wechsler Adult Intelligence Scale.
Journal of Consulting and Clinical Psychology, 47,
601602.
Brown, I. S. (1984). Development of a scale to measure attitude toward the condom as a method of
birth control. The Journal of Sex Research, 20, 255
263.
Brown, S. A., Goldman, M. S., Inn, A., & Anderson,
L. R. (1980). Expectations of reinforcement from
alcohol: Their domain and relation to drinking patterns. Journal of Consulting and Clinical Psychology,
48, 419426.
Brown, S. H. (1978). Long-term validity of a personal
history item scoring procedure. Journal of Applied
Psychology, 63, 673676.
Brown, S. R. (1980). Political subjectivity: Application of
Q methodology in political science. New Haven, CT:
Yale University Press.
Brown, T. L. (1991). Concurrent validity of the
Stanford-Binet (4th ed.): Agreement with the WISC-R in classifying learning disabled children. Psychological Assessment, 3, 247–253.
Brozek, J. (1972). To test or not to test: Trends in the
Soviet views. Journal of the History of the Behavioral
Sciences, 8, 243–248.
Bruce, M. M., & Learner, D. B. (1958). A supervisory
practices test. Personnel Psychology, 11, 207–216.
Bruch, M. A. (1977). Psychological Screening Inventory as a predictor of college student adjustment.
Journal of Consulting and Clinical Psychology, 45,
237–244.
Bruck, M., Ceci, S. J., & Hembrooke, H. (1998). Reliability and credibility of young children's reports.
American Psychologist, 53, 136–151.
Bruhn, A. R., & Reed, M. R. (1975). Simulation of
brain damage on the Bender-Gestalt Test by college students. Journal of Personality Assessment, 3,
244–255.
Bruininks, R. H. (1978). Bruininks-Oseretsky Test of
Motor Proficiency, examiner's manual. Circle Pines,
MN: American Guidance Service.
Bruvold, W. H. (1968). Scales for rating the taste of
water. Journal of Applied Psychology, 52, 245–253.
Bruvold, W. H. (1975). Judgmental bias in the rating
of attitude statements. Educational and Psychological
Measurement, 35, 605–611.
Buck, J. N. (1966). The House-Tree-Person technique:
Revised manual. Beverly Hills, CA: Western Psychological Services.
Buckhalt, J. A. (1986). Test review of the British Ability
Scales. Journal of Psychoeducational Assessment, 4,
325–332.
Buckhalt, J. A. (1990). Criterion-related validity of
the British Ability Scales Short-Form for black
and white children. Psychological Reports, 66, 1059–1066.
Buckhalt, J. A. (1991). Test review of the Wechsler Preschool and Primary Scale of Intelligence
Revised. Journal of Psychoeducational Assessment, 9,
271–279.
Buckhalt, J. A., Denes, G. E., & Stratton, S. P. (1989).
Validity of the British Ability Scales Short-Form for
a sample of U.S. students. School Psychology International, 10, 185–191.
Budoff, M., & Friedman, M. (1964). Learning potential as an assessment approach to the adolescent
mentally retarded. Journal of Consulting Psychology,
28, 433–439.
Buechley, R., & Ball, H. (1952). A new test of validity
for the group MMPI. Journal of Consulting Psychology, 16, 299–301.
Burden, R. L., & Fraser, B. J. (1993). Use of classroom
environment assessments in school psychology: A
British perspective. Psychology in the Schools, 30,
232–240.
Burg, C., Quinn, P. O., & Rapoport, J. L. (1978). Clinical
evaluation of one-year-old infants: Possible predictors of risk for the hyperactivity syndrome. Journal
of Pediatric Psychology, 3, 164–167.
Burgemeister, B. B., Blum, L., & Lorge, I. (1972).
Columbia Mental Maturity Scale. New York: Harcourt, Brace,
Jovanovich.
Burger, G. K. (1975). A short form of the California
Psychological Inventory. Psychological Reports, 37,
179–182.
Burgess, A. (1991). Profile analysis of the Wechsler
Intelligence Scales: A new index of subtest scatter.
British Journal of Clinical Psychology, 30, 257–263.
Burgess, E. W., & Cottrell, L. S. (1939). Predicting success or failure in marriage. New York: Prentice Hall.
Burisch, M. (1984). Approaches to personality inventory construction. American Psychologist, 39, 214–227.
Burke, H. R. (1972). Raven's Progressive Matrices: Validity, reliability, and norms. The Journal of Psychology, 82, 253–257.
Burke, H. R., & Bingham, W. C. (1969). Raven's Progressive Matrices: More on construct validity. The Journal of Psychology, 72, 247–251.
Burke, M. J., & Normand, J. (1987). Computerized
psychological testing: Overview and critique. Professional Psychology: Research and Practice, 18, 42–51.
Burke, M. J., Normand, J., & Raju, N. S. (1987).
Examinee attitudes toward computer-administered
ability testing. Computers in Human Behavior, 3,
95–107.
Burkhart, B. R., Christian, W. L., & Gynther, M. D.
(1978). Item subtlety and the MMPI: A paradoxical
relationship. Journal of Personality Assessment, 42,
76–80.
Burnam, M. A., Telles, C. A., Karno, M., Hough, R. L., &
Escobar, J. I. (1987). Measurement of acculturation
in a community population of Mexican Americans.
Hispanic Journal of Behavioral Sciences, 9, 105–130.
Buros, O. K. (Ed.). The sixth mental measurements
yearbook (pp. 762–765). Highland Park, NJ: The
Gryphon Press.
Buss, A. H. (1989). Personality as traits. American Psychologist, 44, 1378–1388.
Buss, A. R. (1976). Galton and the birth of differential
psychology and eugenics: Social, political, and economic forces. Journal of the History of the Behavioral
Sciences, 12, 47–58.
Buss, D. M., & Scheier, M. F. (1976). Self-consciousness, self-awareness, and self-attribution.
Journal of Research in Personality, 10, 463–468.
Butcher, J. N. (1978). Automated MMPI interpretative
systems. In O. K. Buros (Ed.), Eighth mental measurements yearbook (pp. 942–945). Highland Park,
NJ: Gryphon Press.
Butcher, J. N. (1979). Use of the MMPI in personnel
selection. In J. N. Butcher (Ed.), New developments
in the use of the MMPI. Minneapolis, MN: University
of Minnesota Press.
Butcher, J. N. (1990). MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J. N. (Ed.) (1996). International adaptations of
the MMPI-2. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Manual for administration and scoring: Minnesota Multiphasic Personality Inventory-2 (MMPI-2). Minneapolis: University
of Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1990). Development and use of the
MMPI-2 content scales. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Keller, L. S., & Bacon, S. F. (1985). Current developments and future directions in computerized personality assessment. Journal of Consulting
and Clinical Psychology, 53, 803–815.
Butcher, J. N., & Owen, P. (1978). Objective personality inventories: Recent research and some contemporary issues. In B. B. Wolman (Ed.), Clinical diagnosis of mental disorders: A handbook (pp. 475–546).
New York: Plenum.
Byrne, B. M., & Baron, P. (1994). Measuring adolescent
depression: Tests of equivalent factorial structure for
English and French versions of the Beck Depression Inventory. Applied Psychology: An International
Review, 43, 33–47.
Byrnes, J. P., & Takahira, S. (1993). Explaining gender differences on SAT-Math items. Developmental
Psychology, 29, 805–810.
Bzoch, K. R., & League, R. (1971). Assessing language
skills in infancy: A handbook for multidimensional
analysis of emergent language. Gainesville, FL: Tree
of Life Press.
Cacioppo, J. T., Petty, R. E., & Geen, T. R. (1989). Attitude structure and function: From the tripartite to
the homeostasis model of attitudes. In A. R. Pratkanis, S. J. Breckler, & A. G. Greenwald (Eds.), Attitude
structure and function (pp. 275–309). Hillsdale, NJ:
Erlbaum.
Cain, L. F., Levine, S., & Elzey, F. F. (1977). Manual for
the Cain-Levine Social Competency Scale. Palo Alto,
CA: Consulting Psychologists Press.
Cairns, R. B., & Green, J. A. (1979). How to assess
personality and social patterns: Observations or
ratings? In R. B. Cairns & J. A. Green (Eds.),
The analysis of social interactions. Hillsdale, NJ:
Erlbaum.
Cairo, P. C. (1979). The validity of the Holland and
Basic Interest Scales of the Strong Vocational Interest Blank: Leisure activities vs. occupational membership as criteria. Journal of Vocational Behavior,
15, 68–77.
Caldwell, A. B. (2001). What do the MMPI scales fundamentally measure? Some hypotheses. Journal of
Personality Assessment, 76, 1–17.
Camara, W. L., & Schneider, D. L. (1994). Integrity
tests: Facts and unresolved issues. American Psychologist, 49, 112–119.
Campbell, A., Converse, P. E., & Rodgers, W. L. (1976).
The quality of American life. New York: Russell Sage
Foundation.
Campbell, D. P. (1966). The stability of vocational
interests within occupations over long time spans.
Personnel and Guidance Journal, 44, 1012–1019.
Campbell, D. P. (1971). Handbook for the Strong
Vocational Interest Blank. Stanford, CA: Stanford
University.
Campbell, D. P. (1974). Manual for the Strong-Campbell Interest Inventory. Stanford, CA: Stanford
University.
Campbell, D. P., & Hansen, J. C. (1981). Manual for
the SVIB-SCII (3rd ed.). Stanford, CA: Stanford
University.
Campbell, D. T. (1960). Recommendations for APA
test standards regarding construct, trait, and discriminant validity. American Psychologist, 15, 546–553.
Campbell, D. T., & Fiske, D. W. (1959). Convergent
and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Campbell, J. M., Amerikaner, M., Swank, P., & Vincent,
K. (1989). The relationship between the Hardiness
Test and the Personal Orientation Inventory. Journal
of Research in Personality, 23, 373–380.
Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel Psychology, 43, 231–239.
Canfield, A. A. (1951). The sten scale: A modified C scale. Educational and Psychological Measurement, 11, 295–297.
Canivez, G. L., & Watkins, M. W. (1998). Long-term stability of the Wechsler Intelligence Scale for Children-Third Edition. Psychological Assessment, 10, 285–291.
Canter, A. H. (1952). MMPI profile in multiple sclerosis. Journal of Consulting Psychology, 15, 353–356.
Canter, D. (1969). An intergroup comparison of connotative dimensions in architecture. Environment
and Behavior, 1, 37–48.
Cardno, J. A. (1955). The notion of attitude: An historical note. Psychological Reports, 1, 345–352.
Carey, W. B., & McDevitt, S. C. (1978). Stability and
change in individual temperament diagnoses from
infancy to early childhood. Journal of the American
Academy of Child Psychiatry, 17, 331–337.
Carline, J. D., Cullen, T. J., Scott, C. S., Shannon, N. F.,
& Schaad, D. (1983). Predicting performance during clinical years from the New Medical College
Admission Test. Journal of Medical Education, 58,
18–25.
Carlson, C. I., & Grotevant, H. D. (1987). A comparative review of family rating scales: Guidelines for
clinicians and researchers. Journal of Family Psychology, 1, 23–47.
Carlson, J. G. (1985). Recent assessments of the Myers-Briggs Type Indicator. Journal of Personality Assessment, 49, 356–365.
Carlson, J. H., & Williams, T. (1993). Perspectives on
the seriousness of crimes. Social Science Research,
22, 190–207.
Carlson, J. S., & Jensen, C. M. (1981). Reliability of the
Raven Colored Progressive Matrices Test: Age and
ethnic group comparisons. Journal of Consulting and
Clinical Psychology, 49, 320–322.
Carlyn, M. (1977). An assessment of the Myers-Briggs
Type Indicator. Journal of Personality Assessment, 41,
461–473.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and
validity assessment. Beverly Hills, CA: Sage.
Carney, R. N., & Schattgen, S. F. (1994). California
Achievement Tests (5th ed.). In D. J. Keyser, & R. C.
Sweetland (Eds.), Test critiques (Vol. X, pp. 110–119). Austin, TX: Pro-ed.
Carp, A. L., & Shavzin, A. R. (1950). The susceptibility
to falsification of the Rorschach psychodiagnostic technique. Journal of Consulting Psychology, 14, 230–233.
Carpenter, P. A., Just, M. A., & Shell, P. (1990).
What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97,
404–431.
Carr, A. C., Ghosh, A., & Ancill, R. J. (1983). Can a
computer take a psychiatric history? Psychological
Medicine, 13, 151–158.
Carr, A. C., Wilson, S. L., Ghosh, A., Ancill, R. J., &
Woods, R. T. (1982). Automated testing of geriatric patients using a microcomputer-based system.
International Journal of Man-Machine Studies, 17,
297–300.
Carraher, S. M. (1993). Another look at the dimensionality of a learning style questionnaire. Educational
and Psychological Measurement, 53, 411–415.
Carrier, M. R., Dalessio, A. T., & Brown, S. H. (1990).
Correspondence between estimates of content and
criterion-related validity values. Personnel Psychology, 43, 85–100.
Carroll, J. B. (1952). Ratings on traits measured by a
factored personality inventory. Journal of Abnormal
and Social Psychology, 47, 626–632.
Carroll, J. B. (1982). The measurement of intelligence.
In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 29–120). Cambridge, MA: Cambridge
University Press.
Carrow, E. (1973). Carrow Elicited Language Inventory.
Austin, TX: Learning Concepts.
Carson, K. P., & Gilliard, D. J. (1993). Construct validity of the Miner Sentence Completion Scale. Journal
of Occupational and Organizational Psychology, 66,
171–175.
Carstensen, L. L., & Cone, J. D. (1983). Social desirability and the measurement of psychological well-being in elderly persons. Journal of Gerontology, 38, 713–715.
Carter, B. D., Zelko, F. A. J., Oas, P. T., & Waltonen,
S. (1990). A comparison of ADD/H children and
clinical controls on the Kaufman Assessment Battery
for Children (K-ABC). Journal of Psychoeducational Assessment, 8, 155–164.
Caruso, J. C. (2000). Reliability generalization of the NEO Personality Scales. Educational and Psychological Measurement, 60, 236–254.
Carvajal, H. (1987a). Correlations between scores on
Stanford-Binet IV and Wechsler Adult Intelligence
Scale-Revised. Psychological Reports, 61, 83–86.
Carvajal, H. (1987b). 1986 Stanford-Binet abbreviated
forms. Psychological Reports, 61, 285–286.
Carvajal, H. (1987c). Relationship between scores of
young adults on Stanford-Binet IV and Peabody Picture Vocabulary Test-Revised. Perceptual and Motor
Skills, 65, 721–722.
Carvajal, H. (1988). Relationships between scores on
Stanford-Binet IV and scores on McCarthy Scales
of Children's Abilities. Bulletin of the Psychonomic
Society, 26, 349.
Carvajal, H. (1991). Relationships between scores
on Wechsler Preschool and Primary Scale of
Intelligence-Revised and Stanford-Binet IV. Psychological Reports, 69, 23–26.
Carver, R. P. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, 29,
512–518.
Carver, R. P. (1992). Reliability and validity of the
Speed of Thinking test. Educational and Psychological Measurement, 52, 125–134.
Cascio, W. F. (1975). Accuracy of verifiable biographical information blank responses. Journal of Applied Psychology, 60, 767–769.
Cascio, W. F., Alexander, R. A., & Barrett, G. V. (1988).
Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Personnel Psychology,
41, 1–24.
Cascio, W. F., Outtz, J., Zedeck, S., & Goldstein, I. L.
(1991). Statistical implications of six methods of
test score use in personnel selection. Human Performance, 4, 233–264.
Casey, T. A., Kingery, P. M., Bowden, R. G., & Corbett,
B. S. (1993). An investigation of the factor structure
of the Multidimensional Health Locus of Control
Scales in a health promotion program. Educational
and Psychological Measurement, 53, 491–498.
Cassisi, J. E., & Workman, D. E. (1992). The detection
of malingering and deception with a short form of
the MMPI-2 based on the L, F, and K scales. Journal
of Clinical Psychology, 48, 54–58.
Castro, F. G., Furth, P., & Karlow, H. (1984). The health
beliefs of Mexican, Mexican American, and Anglo
American women. Hispanic Journal of Behavioral
Sciences, 6, 365–383.
Cates, J. A. (1991). Comparison of human figure drawings by hearing and hearing-impaired children. The Volta Review, 93, 31–39.
Catron, D. W., & Thompson, C. C. (1979). Test-retest
gains in WAIS scores after four retest intervals. Journal of Clinical Psychology, 35, 352–357.
Cattell, R. B. (1943). The description of personality:
I. Foundations of trait measurement. Psychological
Review, 50, 559–594.
Cattell, R. B. (1950). Personality: A systematic theoretical and factual study. New York: McGraw-Hill.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Cattell, R. B. (1986). Structured tests and functional
diagnoses. In R. B. Cattell & R. C. Johnson (Eds.),
Functional psychological testing. New York: Brunner/Mazel.
Cattell, R. B. (1987). Intelligence: Its structure, growth
and action. New York: North Holland.
Cattell, R. B., Cattell, A. K., & Cattell, H. E. (1993).
Sixteen Personality Factor Questionnaire (5th ed.). Champaign,
IL: Institute for Personality and Ability Testing.
Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970).
Handbook for the Sixteen Personality Factor Questionnaire. Champaign, IL: Institute for Personality
and Ability Testing.
Cautilli, P. G., & Bauman, M. K. (1987). Assessment
of the visually impaired client. In B. Bolton (Ed.),
Handbook of measurement and evaluation in rehabilitation (2nd ed., pp. 249–262). Baltimore, MD:
Paul H. Brookes.
Cavell, T. A., & Kelley, M. L. (1994). The Checklist of
Adolescent Problem Situations. Journal of Clinical
Child Psychology, 23, 226–238.
Chambers, D. W. (1983). Stereotypic images of the
scientist: The Draw-A-Scientist Test. Science Education, 67, 255–265.
Champion, V. (1984). Instrument development for
health belief model constructs. Advances in Nursing Science, 6, 73–85.
Chan, C. K. K., Burtis, P. J., Scardamalia, M., & Bereiter,
C. (1992). Constructive activity in learning from
text. American Educational Research Journal, 29, 97–118.
Chara, P. J., Jr., & Verplanck, W. S. (1986). The Imagery
Questionnaire: An investigation of its validity. Perceptual and Motor Skills, 63, 915–920.
Chartier, G. M. (1971). A-B therapist variable: Real or
imagined? Psychological Bulletin, 75, 22–33.
Chase, C. I. (1985). Review of the Torrance Tests of
Creative Thinking. In J. V. Mitchell, Jr. (Ed.), The
ninth mental measurements yearbook (pp. 1486–1487). Lincoln, NE: University of Nebraska.
Chattin, S. H. (1989). School psychologists' evaluation
of the K-ABC, McCarthy scales, Stanford-Binet IV,
and WISC-R. Journal of Psychoeducational Assessment, 7, 112–130.
Chen, G. M. (1994). Social desirability as a predictor of
argumentativeness and communication apprehension. The Journal of Psychology, 128, 433–438.
Chernyshenko, O. S., & Ones, D. S. (1999). How selective are psychology graduate programs? The effect
of the selection ratio on GRE score validity. Educational and Psychological Measurement, 59, 951961.
Cherrick, H. M. (1985). Review of Dental Admission
Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 448449). Lincoln:
University of Nebraska Press.
Chinese Culture Connection. (1987). Chinese values
and the search for culture-free dimensions of culture. Journal of Cross-Cultural Psychology, 18, 143–164.
Christensen, H. T., & Carpenter, G. R. (1962). Value-behavior discrepancies regarding pre-marital coitus
in three Western cultures. American Sociological
Review, 27, 66–74.
Christiansen, N. D., Goffin, R. D., Johnston, N. G.,
& Rothstein, M. G. (1994). Correcting the 16PF for
faking: Effects on criterion-related validity and individual hiring decisions. Personnel Psychology, 47,
847–860.
Chua, S. L., Chen, D., & Wong, A. F. L. (1999). Computer anxiety and its correlates: A meta-analysis. Computers in Human Behavior, 15, 609–623.
Cibis, G. W., Maino, J. H., Crandall, M. A., Cress, P.,
Spellman, C. R., & Shores, R. E. (1985). The Parsons
Visual Acuity Test for screening children 18 to 48
months old. Annals of Ophthalmology, 17, 471–478.
Clarizio, H. F. (1982). Intellectual assessment of Hispanic children. Psychology in the Schools, 19, 61–71.
Clark, J. B., & Madison, C. L. (1984). Manual for the
Test of Oral Language. Austin, TX: Pro-ed.
Clark, L., Gresham, F. M., & Elliott, S. N. (1985). Development and validation of a social skills assessment
measure. Journal of Psychoeducational Assessment, 3,
347–356.
Clark, M. J., & Grandy, J. (1984). Sex differences in the
academic performance of Scholastic Aptitude Test
takers. College Board Report No. 84-8. New York:
College Board Publications.
Cleary, T. A. (1968). Test bias: Prediction of grades
of Negro and white students in integrated colleges.
Journal of Educational Measurement, 5, 115–124.
Cleary, T. A., Humphreys, L. G., Kendrick, S. A., &
Wesman, A. (1975). Educational uses of tests with
disadvantaged students. American Psychologist, 30,
15–41.
Coan, R. W., Fairchild, M. T., & Dobyns, Z. P. (1973).
Dimensions of experienced control. The Journal of
Social Psychology, 91, 53–60.
Coates, S., & Bromberg, P. M. (1973). Factorial structure of the Wechsler Preschool and Primary Scale of
Intelligence between the ages of 4 and 6 1/2. Journal
of Consulting and Clinical Psychology, 40, 365–370.
Cofer, C. N., Chance, J., & Judson, A. J. (1949). A
study of malingering on the Minnesota Multiphasic
Personality Inventory. The Journal of Psychology, 27,
491–499.
Coffin, T. E. (1941). Some conditions of suggestion
and suggestibility: A study of certain attitudinal and
situational factors influencing the process of suggestion. Psychological Monographs, 53 (Whole No.
241).
Coffman, W. E. (1985). Review of the Structure of
Intellect Learning Abilities Test. In J. V. Mitchell,
Jr. (Ed.), The ninth mental measurements yearbook (pp. 1486–1488). Lincoln, NE: University of
Nebraska Press.
Cohen, B. H., & Saslona, M. (1990). The advantage
of being an habitual visualizer. Journal of Mental
Imagery, 14, 101–112.
Cohen, J. (1950). Wechsler Memory Scale performance
of psychoneurotic, organic, and schizophrenic
groups. Journal of Consulting Psychology, 14, 371–375.
Cohen, J. (1957). The factorial structure of the WAIS
between early adulthood and old age. Journal of Consulting Psychology, 21, 283–290.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J., & Cohen, P. (1983). Applied multiple
regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J., & Lefkowitz, J. (1974). Development of a
biographical inventory blank to predict faking on
personality tests. Journal of Applied Psychology, 59,
404–405.
Cohen, J. B. (1990). Misuse of computer software to
detect faking on the Rorschach: A reply to Kahn,
Fox, and Rhode. Journal of Personality Assessment,
54, 58–62.
Cohn, S. J. (1985). Review of Graduate Record Examination. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 622–624).
Lincoln: University of Nebraska Press.
Cole, D. A. (1987). Utility of confirmatory factor analysis in test validation research. Journal of Consulting
and Clinical Psychology, 55, 584–594.
Cole, K. N., & Fewell, R. R. (1983). A quick language
screening test for young children: The token test.
Journal of Psychoeducational Assessment, 1, 149–154.
Cole, N. (1982). The implications of coaching for
ability testing. In A. Wigdor & W. Garner (Eds.),
Ability testing: Uses, consequences, and controversies.
(pp. 389–414). Washington, DC: National Academy
Press.
Cole, N. S. (1973). Bias in selection. Journal of Educational Measurement, 10, 237–255.
Coleman, J. S. (1966). Equality of Educational Opportunity. Washington, DC: U.S. Department of Health,
Education, and Welfare.
College Entrance Examination Board. (1988). 1988
profile of SAT and achievement test takers. New York:
Author.
Coller, A. R. (1971). Self-concept measures: An annotated bibliography. Princeton, NJ: Educational Testing Service.
Collins, J. M., & Schmidt, F. L. (1993). Personality,
integrity, and white collar crime: A construct validity
study. Personnel Psychology, 46, 295–311.
Compas, B. E., Davis, G. E., Forsythe, C. J., & Wagner,
B. M. (1987). Assessment of major and daily stressful events during adolescence: The Adolescent Perceived Events Scale. Journal of Consulting and Clinical Psychology, 55, 534–541.
Comrey, A. L. (1958). A factor analysis of items on the
F scale of the MMPI. Educational and Psychological
Measurement, 18, 621–632.
Comunian, A. L. (1985). The development and validation of the Italian form of the Test Anxiety Inventory. In H. M. Van Der Ploeg, R. Schwarzer, & C. D.
Spielberger (Eds.), Advances in Test Anxiety Research
(Vol. 4, pp. 215–220). Lisse, Netherlands: Swets &
Zeitlinger.
Cone, J. D. (1977). The relevance of reliability and
validity for behavioral assessment. Behavior Therapy, 8, 411–426.
Cone, J. D. (1978). The Behavioral Assessment Grid
(BAG): A conceptual framework and a taxonomy.
Behavior Therapy, 9, 882–888.
Cone, J. D., & Foster, S. L. (1991). Training in
measurement: Always the bridesmaid. American
Psychologist, 46, 653–654.
Conners, C. K. (1969). A Teacher Rating Scale for use
in drug studies with children. American Journal of Psychiatry, 126, 884–888.
Conners, C. K. (1970). Symptom patterns in hyperkinetic, neurotic, and normal children. Child Development, 41, 667–682.
Conners, C. K. (1990). Conners' Rating Scales manual.
North Tonawanda, NY: Multi-Health Systems.
Conners, C. K., Sitarenios, G., Parker, J. D. A., &
Epstein, J. N. (1998). The revised Conners' Parent
Rating Scale (CPRS-R): Factor structure, reliability, and criterion validity. Journal of Abnormal Child
Psychology, 26, 257–280.
Conoley, J. C., & Bryant, L. E. (1995). Multicultural family assessment. In J. C. Conoley &
E. B. Werth (Eds.), Family assessment (pp. 103–129). Lincoln, NE: Buros Institute of Mental
Measurements.
Coombs, C. H. (1950). Psychological scaling without
a unit of measurement. Psychological Review, 57,
145–158.
Coons, P. M., & Fine, C. G. (1990). Accuracy of the
MMPI in identifying multiple personality disorder.
Psychological Reports, 66, 831–834.
Cooper, E. (1991). A critique of six measures for
assessing creativity. Journal of Creative Behavior, 25,
194–204.
Cooper, K. L., & Gutmann, D. L. (1987). Gender
identity and ego mastery style in middle-aged, pre- and post-empty-nest women. The Gerontologist, 27, 347–352.
Coopersmith, S. (1959). A method of determining types of self-esteem. Journal of Abnormal and Social Psychology, 59, 87–94.
Coopersmith, S. (1967). The antecedents of self-esteem.
San Francisco, CA: W. H. Freeman.
Corah, N. L., Feldman, M. J., Cohen, I. S., Gruen, W.,
Meadow, A., & Ringwall, E. A. (1958). Social desirability as a variable in the Edwards Personal Preference Schedule. Journal of Consulting Psychology, 22,
70–72.
Corah, N. L., & Powell, B. L. (1963). A factor analytic study of the Frostig Developmental Test of
Visual Perception. Perceptual and Motor Skills, 16,
39–63.
Cornell, D. G., & Hawk, G. L. (1989). Clinical presentation of malingerers diagnosed by experienced
forensic psychologists. Law and Human Behavior,
13, 375–383.
Cornwell, J. M., Manfredo, P. A., & Dunlap, W. P.
(1991). Factor analysis of the 1985 revision of Kolb's Learning Style Inventory. Educational and Psychological Measurement, 51, 455–462.
Cortese, M., & Smyth, P. (1979). A note on the translation to Spanish of a measure of acculturation. Hispanic Journal of Behavioral Sciences, 1, 65–68.
Cortina, J. M. (1993). What is coefficient alpha? An
examination of theory and applications. Journal of
Applied Psychology, 78, 98–104.
Cortina, J. M., Doherty, M. L., Schmitt, N., Kaufman,
G., & Smith, R. G. (1992). The Big Five personality
factors in the IPI and MMPI: Predictors of police
performance. Personnel Psychology, 45, 119–140.
Cosden, M. (1987). Rotter Incomplete Sentences
Blank. In D. J. Keyser, & R. C. Sweetland (Eds.), Test
critiques compendium. Kansas City, MO: Test Corporation of America.
Costa, P. T., Jr., & McCrae, R. R. (1980). Still stable
after all these years: Personality as a key to some
issues in adulthood and old age. In P. B. Baltes &
O. G. Brim, Jr. (Eds.), Life span development and
behavior (Vol. 3, pp. 65–102). New York: Academic
Press.
Costa, P. T., Jr., & McCrae, R. R. (1985). The NEO
Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., & McCrae, R. R. (1989). NEO-PI/FFI manual supplement. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO
Personality Inventory. Psychological Assessment, 4, 5–13.
Costantino, G., Malgady, R., & Rogler, L. H. (1988).
Tell-Me-a-Story (TEMAS) manual. Los Angeles, CA:
Western Psychological Services.
Costantino, G., Malgady, R. G., Rogler, L. H., &
Tsui, E. C. (1988). Discriminant analysis of clinical outpatients and public school children by
TEMAS: A Thematic Apperception Test for Hispanics and Blacks. Journal of Personality Assessment, 52, 670–678.
Couch, A., & Keniston, K. (1960). Yeasayers and
naysayers: Agreeing response set as a personality
variable. Journal of Abnormal and Social Psychology,
60, 151–174.
Court, J. H. (1983). Sex differences in performance on
Raven's Progressive Matrices: A review. The Alberta
Journal of Educational Research, 29, 54–74.
Court, J. H. (1988). A researcher's bibliography for Raven's Progressive Matrices and Mill Hill Vocabulary
scales (7th ed.). Cumberland Park, South Australia:
Author.
Court, J. H., & Raven, J. C. (1982). Research and
references: 1982 update. London: H. K. Lewis.
Covin, T. M. (1977). Comparison of WISC and WISC-R full scale IQs for a sample of children in special education. Psychological Reports, 41, 237–238.
Craig, H. B. (1965). A sociometric investigation of the
self-concept of the deaf child. American Annals of
the Deaf, 110, 456–474.
Craig, R. J. (1999). Overview and current status of the
Millon Clinical Multiaxial Inventory. Journal of Personality Assessment, 72, 390–406.
Craik, K. H. (1971). The assessment of places. In P.
McReynolds (Ed.), Advances in psychological assessment (Vol. 2, pp. 40–62). Palo Alto, CA: Science &
Behavior Books.
Cramer, P. (1999). Future directions for the Thematic
Apperception Test. Journal of Personality Assessment,
72, 74–92.
Crary, W. G., & Johnson, C. W. (1975). The mental
status examination. In C. W. Johnson, J. R. Snibbe,
& L. E. Evans (Eds.), Basic psychopathology: A programmed text (pp. 50–89). New York: Spectrum.
Creaser, J. W. (1960). Factor analysis of a study-habits
Q-sort test. Journal of Counseling Psychology, 7, 298–300.
Cress, P. J. (1987). Visual assessment. In M. Bullis
(Ed.), Communication development in young children with deaf-blindness: Literature Review III. Monmouth, OR: Teaching Research Division of Oregon
State System of Higher Education.
Crites, J. O. (1978). Career Maturity Inventory. Monterey, CA: CTB/McGraw-Hill.
Crockett, B. K., Rardin, M. W., & Pasewark, R. A.
(1975). Relationship between WPPSI and Stanford-Binet IQs and subsequent WISC IQs in Headstart
children. Journal of Consulting and Clinical Psychology, 43, 922.
Crockett, B. K., Rardin, M. W., & Pasewark, R. A.
(1976). Relationship of WPPSI and subsequent
Metropolitan Achievement Test scores in Headstart
children. Psychology in the Schools, 13, 19–20.
Cronbach, L. J. (1942). Studies of acquiescence as a
factor in the true-false test. Journal of Educational
Psychology, 33, 401–415.
Cronbach, L. J. (1946). Response sets and test validity.
Educational and Psychological Measurement, 6, 475–494.
Cronbach, L. J. (1949). Statistical methods applied to
Rorschach scores: A review. Psychological Bulletin,
46, 393–429.
Cronbach, L. J. (1950). Further evidence on response
sets and test design. Educational and Psychological
Measurement, 10, 3–31.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Cronbach, L. J. (1980). Validity on parole: How can we
go straight? New Directions in Test Measurement, 5,
99–108.
Cronbach, L. J. (1988). Five perspectives on the validity
argument. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Erlbaum.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological
tests and personnel decisions. Urbana, IL: University
of Illinois Press.
Cronbach, L. J., Gleser, G. C., Rajaratnam, N., &
Nanda, H. (1972). The dependability of behavioral
measurements. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281302.
Crook, T. H. (1979). Psychometric assessment in the
elderly. In A. Raskin & L. F. Jarvik (Eds.), Psychiatric symptoms and cognitive loss in the elderly:
Evaluation and assessment techniques (pp. 207220).
Washington, DC: Hemisphere.
Crook, T. H., & Larrabee, G. J. (1988). Interrelationships among everyday memory tests: Stability of factor structure of age. Neuropsychology, 2, 112.
Crook, T. H., & Larrabee, G. J. (1990). A self-rating scale for evaluating memory in everyday life. Psychology and Aging, 5, 48–57.
Crosby, L. A., Bitner, M. J., & Gill, J. D. (1990). Organizational structure of values. Journal of Business Research, 20, 123–134.
Cross, L. H., Impara, J. C., Frary, R. B., & Jaeger, R. M. (1984). A comparison of three methods for establishing minimum standards on the National Teacher Examinations. Journal of Educational Measurement, 21, 113–129.
Cross, O. H. (1947). Braille adaptation of the Minnesota Multiphasic Personality Inventory for use with the blind. Journal of Applied Psychology, 31, 189–198.
Crouse, J. (1985). Does the SAT help colleges make better selection decisions? Harvard Educational Review, 55, 195–219.
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349–354.
Crowne, D. P., & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.
Cuellar, I., Harris, L. C., & Jasso, R. (1980). An acculturation scale for Mexican American normal and clinical populations. Hispanic Journal of Behavioral Sciences, 2, 199–217.
Cummings, J. A. (1986). Projective drawings. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 199–244). New York: Guilford Press.
Cummings, J. A. (1989). Review of the Structure of Intellect Learning Abilities Test. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 787–791). Lincoln, NE: University of Nebraska Press.
Cummings, N. A. (1986). Assessing the computer's impact: Professional concerns. Computers in Human Behavior, 1, 293–300.
Cummins, J. P., & Das, J. P. (1980). Cognitive processing, academic achievement, and WISC-R performance in EMR children. Journal of Consulting and Clinical Psychology, 48, 777–779.
Cunningham, M. R., Wong, D. T., & Barbee, A. P. (1994). Self-presentation dynamics on overt integrity tests: Experimental studies of the Reid Report. Journal of Applied Psychology, 79, 643–658.
Cureton, E. E. (1960). The rearrangement test. Educational and Psychological Measurement, 20, 31–35.
Cureton, E. E. (1965). Reliability and validity: Basic assumptions and experimental designs. Educational and Psychological Measurement, 25, 327–346.
Cureton, E. E., Cook, J. A., Fischer, R. T., Laser, S. A., Rockwell, N. J., & Simmons, J. W. (1973).
Length of test and standard error of measurement. Educational and Psychological Measurement, 33, 63–68.
Cyr, J. J., Doxey, N. C. S., & Vigna, C. M. (1988). Factorial composition of the SCL-90R. Journal of Social Behavior and Personality, 3, 245–252.
Dahlstrom, W. G. (1993). Tests: Small samples, large consequences. American Psychologist, 48, 393–399.
Dahlstrom, W. G., Brooks, J. D., & Peterson, C. D. (1990). The Beck Depression Inventory: Item order and the impact of response sets. Journal of Personality Assessment, 55, 224–233.
Dahlstrom, W. G., Lachar, D., & Dahlstrom, L. E. (1986). MMPI patterns of American minorities. Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972). An MMPI handbook: Vols. 1 and 2. Minneapolis: University of Minnesota Press.
Dana, R. H., Feild, K., & Bolton, B. (1983). Variations of the Bender Gestalt Test: Implications for training and practice. Journal of Personality Assessment, 47, 76–84.
D'Andrade, R. G. (1965). Trait psychology and componential analysis. American Anthropologist, 67, 215–228.
Dannenbaum, S. E., & Lanyon, R. I. (1993). The use of subtle items in detecting deception. Journal of Personality Assessment, 61, 501–510.
Darlington, R. B. (1976). A defense of rational personnel selection, and two new methods. Journal of Educational Measurement, 13, 43–52.
Das, J. P. (1989). Review of the Explorer. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 888–889). Lincoln, NE: University of Nebraska Press.
Das, J. P., Kirby, J. R., & Jarman, R. F. (1979). Simultaneous and successive cognitive processes. New York: Academic Press.
Das, J. P., Naglieri, J. A., & Kirby, J. R. (1994). Assessment of cognitive processes: The PASS. New York: Allyn & Bacon.
Davey, T., Godwin, J., & Mittelholtz, D. (1997). Developing and scoring an innovative computerized writing assessment. Journal of Educational Measurement, 34, 21–41.
Davidson, K., & MacGregor, M. W. (1996). Reliability of an idiographic Q-sort measure of defense mechanisms. Journal of Personality Assessment, 66, 624–639.
Davis, C. (1980). Perkins-Binet Tests of Intelligence for the Blind. Watertown, MA: Perkins School for the Blind.
Davis, G. A. (1986). Creativity is forever (2nd ed.). Dubuque, IA: Kendall-Hunt.
Davis, G. A., & Bull, K. S. (1978). Strengthening affective components of creativity in a college course. Journal of Educational Psychology, 70, 833–836.
Davis, W. E. (1975). Race and the differential power of the MMPI. Journal of Personality Assessment, 39, 141–145.
Davis, W. E., Beck, S. J., & Ryan, T. A. (1973). Race-related and educationally-related MMPI profile differences among hospitalized schizophrenics. Journal of Clinical Psychology, 29, 478–479.
Davison, L. A. (1974). Current status of clinical neuropsychology. In R. M. Reitan & L. A. Davison (Eds.), Clinical neuropsychology: Current status and applications. New York: Hemisphere.
Davison, M. L. (1985). Multidimensional scaling versus components analysis of test intercorrelations. Psychological Bulletin, 97, 94–105.
Dawes, R. M. (1962). A note on base rates and psychometric efficiency. Journal of Consulting Psychology, 26, 422–424.
Dean, R. S. (1977). Reliability of the WISC-R with Mexican-American children. Journal of School Psychology, 15, 267–268.
Dean, R. S. (1978). Distinguishing learning-disabled and emotionally disturbed children on the WISC-R. Journal of Consulting and Clinical Psychology, 46, 381–382.
Dean, R. S. (1979). Predictive validity of the WISC-R with Mexican-American children. Journal of School Psychology, 17, 55–58.
Dean, R. S. (1980). Factor structure of the WISC-R with Anglos and Mexican-Americans. Journal of School Psychology, 18, 234–239.
Dearborn, G. V. (1898). A study of imaginations. American Journal of Psychology, 9, 183–190.
Debra P. v. Turlington, 644 F. 2d 397 (5th Cir. 1981).
Deffenbacher, J. L. (1980). Worry and emotionality in test anxiety. In I. G. Sarason (Ed.), Test anxiety: Theory, research, and applications (pp. 111–128). Hillsdale, NJ: Erlbaum.
DeFrancesco, J. J., & Taylor, J. (1993). A validational note on the revised Socialization scale of the California Psychological Inventory. Journal of Psychopathology and Behavioral Assessment, 15, 53–56.
DeGiovanni, I. S., & Epstein, N. (1978). Unbinding assertion and aggression in research and clinical practice. Behavior Modification, 2, 173–192.
deJonghe, J. F. M., & Baneke, J. J. (1989). The Zung Self-Rating Depression Scale: A replication study on reliability, validity and prediction. Psychological Reports, 64, 833–834.
Dekker, R., Drenth, P. J. D., Zaal, J. N., & Koole, F. D. (1990). An intelligence test series for blind and low
vision children. Journal of Visual Impairment and Blindness, 84, 71–76.
Delaney, E. A., & Hopkins, T. F. (1987). Examiner's handbook: An expanded guide for fourth edition users. Chicago, IL: Riverside.
Delhees, K. H., & Cattell, R. B. (1971). Manual for the Clinical Analysis Questionnaire (CAQ). Champaign, IL: Institute for Personality and Ability Testing.
DeLongis, A., Coyne, J. C., Dakof, G., Folkman, S., & Lazarus, R. S. (1982). Relationship of daily hassles, uplifts, and major life events to health status. Health Psychology, 1, 119–136.
DeLongis, A., Folkman, S., & Lazarus, R. S. (1988). The impact of daily stress on health and mood: Psychological and social resources as mediators. Journal of Personality and Social Psychology, 54, 486–495.
DeLuty, R. H. (1988–1989). Physical illness, psychiatric illness, and the acceptability of suicide. Omega, 19, 79–91.
DeMinzi, M. C. R. (1990). A new multidimensional Children's Locus of Control Scale. The Journal of Psychology, 125, 109–118.
Denman, S. (1984). Denman Neuropsychological Memory Scale manual. Charleston, SC: Author.
Denney, D. R., & Sullivan, B. J. (1976). Desensitization and modeling treatments of spider fear using two types of scenes. Journal of Consulting and Clinical Psychology, 44, 573–579.
Dennis, K. E. (1986). Q methodology: Relevance and application to nursing research. Advances in Nursing Science, 8, 6–17.
Denton, C., & Postwaithe, K. (1985). Able children: Identifying them in the classroom. Philadelphia, PA: Nfer-Nelson.
DeRenzi, E., & Vignolo, L. (1962). The Token Test: A sensitive test to detect receptive disturbances in aphasics. Brain, 85, 665–678.
Derogatis, L. R. (1987). Self-report measures of stress. In L. Goldberger & S. Breznitz (Eds.), Handbook of stress (pp. 270–294). New York: The Free Press.
Derogatis, L. R. (1977). The SCL-90 manual: Scoring, administration, and procedures for the SCL-90. Baltimore, MD: Johns Hopkins University School of Medicine, Clinical Psychometrics Unit.
Derogatis, L. R. (1983). SCL-90R: Administration, scoring and procedures manual II. Towson, MD: Clinical Psychometrics Research.
Detterman, D. K. (Ed.). (1992). Current topics in human intelligence: Vol. 2. Is mind modular or unitary? Norwood, NJ: Ablex.
Deutsch, F. (1974). Observational and sociometric measures of peer popularity and their relationship
to egocentric communication in female preschoolers. Developmental Psychology, 10, 745–747.
DeVito, A. J. (1984). Review of Test Anxiety Inventory. In D. J. Keyser & R. C. Sweetland (Eds.), Test Critiques (Vol. 1, pp. 673–681). Kansas City, MO: Test Corporation of America.
DeVito, A. J. (1985). Review of Myers-Briggs Type Indicator. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1030–1032). Lincoln, NE: University of Nebraska Press.
Dewing, K. (1970). Family influences on creativity: A review and discussion. The Journal of Special Education, 4, 399–404.
Deyo, R. A., Diehl, A. K., Hazuda, H., & Stern, M. P. (1985). A simple language-based acculturation scale for Mexican Americans: Validation and application to health care research. American Journal of Public Health, 75, 51–55.
Diamond, J. J., & Evans, W. J. (1972). An investigation of the cognitive correlates of testwiseness. Journal of Educational Measurement, 9, 145–150.
Diamond, J. J., & Evans, W. J. (1973). The correction for guessing. Review of Educational Research, 43, 181–191.
Dicken, C. (1963). Good impression, social desirability, and acquiescence as suppressor variables. Educational and Psychological Measurement, 23, 699–720.
Dickinson, T. L., & Zellinger, P. M. (1980). A comparison of the behaviorally anchored rating and mixed standard scale formats. Journal of Applied Psychology, 65, 147–154.
Digman, J. M. (1989). Five robust trait dimensions: Development, stability, and utility. Journal of Personality, 57, 195–214.
Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–440.
Digman, J. M., & Inouye, J. (1986). Further specification of the five robust factors of personality. Journal of Personality and Social Psychology, 50, 116–123.
Digman, J. M., & Takemoto-Chock, N. K. (1981). Factors in the natural language of personality: Re-analysis, comparison, and interpretation of six major studies. Multivariate Behavioral Research, 16, 149–170.
Dillon, R. F., Pohlmann, J. T., & Lohman, D. F. (1981). A factor analysis of Raven's Advanced Progressive Matrices freed of difficulty factors. Educational and Psychological Measurement, 41, 1295–1302.
Di Vesta, F. (1965). Developmental patterns in the use of modifiers as modes of conceptualization. Child Development, 36, 185–213.
Dobbins, G. H. (1990). How Supervise? In J. Hogan & R. Hogan (Eds.), Business and Industry Testing (pp. 472–477). Austin, TX: Pro-ed.
Dodd, S. C. (1935). A social distance test in the Near East. American Journal of Sociology, 41, 194–204.
Dodrill, C. B., & Warner, M. H. (1988). Further studies of the Wonderlic Personnel Test as a brief measure of intelligence. Journal of Consulting and Clinical Psychology, 56, 145–147.
Dohmen, P., Doll, J., & Feger, H. (1989). A component theory for attitude objects. In A. Upmeyer (Ed.), Attitudes and behavioral decisions (pp. 19–59). New York: Springer-Verlag.
Dohrenwend, B. S., & Dohrenwend, B. P. (1978). Some issues in research on stressful life events. Journal of Nervous and Mental Disease, 166, 7–15.
Doll, E. A. (1953). The measurement of social competence. A manual for the Vineland Social Maturity Scale. Philadelphia: Educational Test Bureau.
Dollinger, S. J. (1989). Predictive validity of the Graduate Record Examination in a Clinical Psychology Program. Professional Psychology, 20, 56–58.
Dolliver, R. H. (1969). Strong Vocational Interest Blank vs. expressed vocational interests: A review. Psychological Bulletin, 72, 95–107.
Dolliver, R. H., Irwin, J. A., & Bigley, S. E. (1972). Twelve-year follow-up of the Strong Vocational Interest Blank. Journal of Counseling Psychology, 19, 212–217.
Domino, G. (1965). Personality traits in institutionalized mongoloids. American Journal of Mental Deficiency, 69, 541–547.
Domino, G. (1968). A non-verbal measure of intelligence for totally blind adults. The New Outlook for the Blind, 62, 247–252.
Domino, G. (1970). The identification of potentially creative persons from the Adjective Check List. Journal of Consulting and Clinical Psychology, 35, 48–51.
Domino, G. (1979). Creativity and the home environment. Gifted Child Quarterly, 23, 818–828.
Domino, G. (1980). Chinese Tangrams as a technique to assess creativity. Journal of Creative Behavior, 14, 204–213.
Domino, G. (1982). Get high on yourself: The effectiveness of a television campaign on self-esteem, drug use, and drug attitudes. Journal of Drug Education, 12, 163–171.
Domino, G. (1984). Measuring geographical environments through adjectives: The Environmental Check List. Psychological Reports, 55, 151–160.
Domino, G. (1992). Cooperation and competition in Chinese and American children. Journal of Cross-Cultural Psychology, 23, 456–467.
Domino, G. (1994). Assessment of creativity with the ACL: An empirical comparison of four scales. Creativity Research Journal, 7, 21–33.
Domino, G., & Acosta, A. (1987). The relation of acculturation and values in Mexican Americans. Hispanic Journal of Behavioral Sciences, 9, 131–150.
Domino, G., & Affonso, D. D. (1990). A personality measure of Erikson's life stages: The Inventory of Psychosocial Balance. Journal of Personality Assessment, 54, 576–588.
Domino, G., Affonso, D., & Hannah, M. T. (1991). Assessing the imagery of cancer: The Cancer Metaphors Test. Journal of Psychosocial Oncology, 9, 103–121.
Domino, G., & Blumberg, E. (1987). An application of Gough's conceptual model to a measure of adolescent self-esteem. Journal of Youth and Adolescence, 16, 179–190.
Domino, G., Fragoso, A., & Moreno, H. (1991). Cross-cultural investigations of the imagery of cancer in Mexican nationals. Hispanic Journal of Behavioral Sciences, 13, 422–435.
Domino, G., Goldschmid, M., & Kaplan, M. (1964). Personality traits of institutionalized mongoloid girls. American Journal of Mental Deficiency, 68, 498–502.
Domino, G., & Hannah, M. T. (1987). A comparative analysis of social values of Chinese and American children. Journal of Cross-Cultural Psychology, 18, 58–77.
Domino, G., & Hannah, M. T. (1989). Measuring effective functioning in the elderly: An application of Erikson's theory. Journal of Personality Assessment, 53, 319–328.
Domino, G., & Lin, J. (1991). Images of cancer: China and the United States. Journal of Psychosocial Oncology, 9, 67–78.
Domino, G., & Lin, W. Y. (1993). Cancer metaphors: Taiwan and the United States. Personality and Individual Differences, 14, 693–700.
Domino, G., & Pathanapong, P. (1993). Cancer imagery: Thailand and the United States. Personality and Individual Differences, 14, 693–700.
Domino, G., & Regmi, M. P. (1993). Attitudes toward cancer: A cross-cultural comparison of Nepalese and U.S. students. Journal of Cross-Cultural Psychology, 24, 389–398.
Donnelly, M., Yindra, K., Long, S. Y., Rosenfeld, P., Fleisher, D., & Chen, C. (1986). A model for predicting performance on the NBME Part I Examination. Journal of Medical Education, 61, 123–131.
Donovan, J. M. (1993). Validation of a Portuguese form of Templer's Death Anxiety Scale. Psychological Reports, 73, 195–200.
Dorans, N. J., & Lawrence, I. M. (1987). The internal construct validity of the SAT. ETS Research Report, No. 87-35.
Dorfman, L., & Moffett, M. (1987). Retirement satisfaction in married and rural women. The Gerontologist, 27, 215–221.
Dorken, H., & Greenbloom, G. H. (1953). Psychological investigation of senile dementia. II. The Wechsler-Bellevue Adult Intelligence Scale. Geriatrics, 8, 324–333.
Doyle, D., & Forehand, M. J. (1984). Life satisfaction and old age. Research on Aging, 6, 432–448.
Doyle, K. O., Jr. (1974). Theory and practice of ability testing in ancient Greece. Journal of the History of the Behavioral Sciences, 10, 202–212.
Dozois, D., Dobson, K., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83–89.
Drake, L. E., & Oetting, E. R. (1959). An MMPI codebook for counselors. Minneapolis: University of Minnesota Press.
Drasgow, F., & Olson-Buchanan, J. B. (Eds.). (1999). Innovations in computerized assessment. Mahwah, NJ: Erlbaum.
Drewes, D. W. (1961). Development and validation of synthetic dexterity tests based on elemental motion analysis. Journal of Applied Psychology, 45, 179–185.
Driver, B. L., & Knopf, R. C. (1977). Personality, outdoor recreation, and expected consequences. Environment and Behavior, 9, 169–193.
Droege, R. C. (1987). The USES Testing Program. In B. Bolton (Ed.), Handbook of measurement and evaluation in rehabilitation (2nd ed., pp. 169–182). Baltimore, MD: Paul H. Brookes.
Dror, I. E., Kosslyn, S. M., & Waag, W. L. (1993). Visual-spatial abilities of pilots. Journal of Applied Psychology, 78, 763–773.
Dubinsky, S., Gamble, D. J., & Rogers, M. L. (1985). A literature review of subtle-obvious items on the MMPI. Journal of Personality Assessment, 49, 62–68.
DuBois, L. M. (1985). Review of Dental Admission Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 449–450). Lincoln: University of Nebraska Press.
DuBois, P. H. (1966). A test-dominated society: China 1115 B.C.–1905 A.D. In A. Anastasi (Ed.), Testing problems in perspective (pp. 29–36). Washington, DC: American Council on Education.
DuBose, R. F. (1976). Predictive value of infant intelligence scales with multiple handicapped children. American Journal of Mental Deficiency, 81, 388–390.
DuBose, R. F. (1981). Assessment of severely impaired young children: Problems and recommendations. Topics in Early Childhood Special Education, 1, 9–21.
Dubuisson, D., & Melzack, R. (1976). Classification of clinical pain descriptions by multiple group discriminant analysis. Experimental Neurology, 51, 480–487.
Duckworth, J. C. (1991). The Minnesota Multiphasic Personality Inventory-2: A review. Journal of Counseling and Development, 69, 564–567.
Dufault, K. J., & Martocchio, B. (1985). Hope: Its spheres and dimensions. Nursing Clinics of North America, 20, 379–391.
Dumont, R., & Faro, C. (1993). A WISC-III short form for learning-disabled students. Psychology in the Schools, 30, 212–219.
Dunlap, W. R., & Sands, D. I. (1990). Classification of the hearing impaired for independent living using the Vineland Adaptive Behavior Scale. American Annals of the Deaf, 135, 384–388.
Dunn, L. M., & Dunn, L. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.
Dunn, T. F., & Goldstein, L. G. (1959). Test difficulty, validity, and reliability as functions of selected multiple-choice item construction principles. Educational and Psychological Measurement, 19, 171–179.
Dunn, T. G., Lushene, R. E., & O'Neil, H. F. (1972). Complete automation of the MMPI and a study of its response latencies. Journal of Consulting and Clinical Psychology, 39, 381–387.
Dunnett, S., Koun, S., & Barber, J. (1981). Social desirability in the Eysenck Personality Inventory. British Journal of Psychology, 72, 19–26.
Dunnette, M. D., McCartney, J., Carlson, H. C., & Kirchner, W. K. (1962). A study of faking behavior on a forced-choice self-description checklist. Personnel Psychology, 15, 13–24.
Dunst, C. J., & Gallagher, J. L. (1983). Piagetian approaches to infant assessment. Topics in Early Childhood Special Education, 3, 44–62.
DuPaul, G. J. (1992). Five common misconceptions about the assessment of attention deficit hyperactivity disorder. The School Psychologist, 10, 14–15.
Duran, R. P. (1983). Hispanics' education and background: Predictors of college achievement. New York: College Entrance Examination Board.
Dusek, J. B. (1980). The development of test anxiety in children. In I. G. Sarason (Ed.), Test anxiety. Hillsdale, NJ: Erlbaum.
Dvorak, B. J. (1947). The new USES General Aptitude Test Battery. Occupations, 26, 42–44.
Dwyer, C., & Wicenciak, S. (1977). A pilot investigation of three factors of the 16 P.F., Form E, comparing the standard written form with an Ameslan videotape revision. Journal of Rehabilitation of the Deaf, 10, 17–23.
Dyal, J. A. (1983). Cross-cultural research with the locus of control construct. In H. M. Lefcourt (Ed.), Research with the locus of control construct: Vol. 3.
Extensions and limitations (pp. 209–306). New York: Academic Press.
Dyer, C. O. (1985). Review of the Otis-Lennon School Ability Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1107–1111). Lincoln, NE: University of Nebraska Press.
Eagly, A. H., & Chaiken, S. (1992). The psychology of attitudes. San Diego, CA: Harcourt Brace Jovanovich.
Ebel, R. L. (1963). The social consequences of educational testing. Proceedings of the 1963 Invitational Conference on Testing Problems (pp. 130–143). Princeton, NJ: Educational Testing Service.
Ebel, R. L. (1969). Expected reliability as a function of choices per item. Educational and Psychological Measurement, 29, 565–570.
Ebel, R. L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice Hall.
Edelbrock, C. S., & Rancurello, M. D. (1985). Childhood hyperactivity: An overview of rating scales and their applications. Clinical Psychology Review, 5, 429–445.
Educational Testing Service. (1977). Graduate Record Examinations technical manual. Princeton, NJ: Author.
Educational Testing Service. (1994). College Board to recenter scoring scale for PSAT/NMSQT and SAT I. ETS Developments, 39, 13.
Edward, J. E., & Wilkins, W. (1981). Verbalizer-Visualizer Questionnaire: Relationship with imagery and verbal-visual ability. Journal of Mental Imagery, 5, 137–142.
Edwards, A. J. (1971). Individual Mental Testing. Part I: History and theories. Scranton, PA: Intext Educational.
Edwards, A. L. (1957a). Techniques of attitude scale construction. New York: Appleton-Century-Crofts.
Edwards, A. L. (1957b). The social desirability variable in personality assessment and research. New York: Dryden.
Edwards, A. L. (1959). The Edwards Personal Preference Schedule (Rev. ed.). New York: Psychological Corporation.
Edwards, A. L. (1970). The measurement of personality traits by scales and inventories. New York: Holt, Rinehart and Winston.
Edwards, A. L., & Diers, C. J. (1963). Neutral items as a measure of acquiescence. Educational and Psychological Measurement, 23, 687–698.
Edwards, A. L., & Kilpatrick, F. P. (1948). A technique for the construction of attitude scales. Journal of Applied Psychology, 32, 374–384.
Edwards, J. R., Baglioni, A. J., Jr., & Cooper, C. L. (1990). Examining relationships among self-report measures of Type A behavior pattern: The effects
of dimensionality, measurement error, and differences in underlying constructs. Journal of Applied Psychology, 75, 440–454.
Eichman, W. J. (1962). Factored scales for the MMPI: A clinical and statistical manual. Journal of Clinical Psychology, 18, 363–395.
Eippert, D. S., & Azen, S. P. (1978). A comparison of two developmental instruments in evaluating children with Down's syndrome. Physical Therapy, 58, 1066–1069.
Eisdorfer, C., & Altrocchi, J. (1961). A comparison of attitudes toward old age and mental illness. Journal of Gerontology, 21, 455–457.
Eiser, C. (1984). Communicating with sick and hospitalized children. Journal of Child Psychology and Psychiatry, 25, 181–189.
Eiser, J. R. (1987). The expression of attitude. New York: Springer-Verlag.
Ekman, P. (1965). Differential communication of affect by head and body cues. Journal of Personality and Social Psychology, 2, 726–735.
Ekstrom, R. B., French, J. W., & Harman, H. H. (1979). Cognitive factors: Their identification and replication. Multivariate Behavioral Research Monographs, No. 79-2.
Elbert, J. C. (1984). Training in child diagnostic assessment: A survey of clinical psychology graduate programs. Journal of Clinical Child Psychology, 13, 122–133.
Elksnin, L. K., & Elksnin, N. (1993). A review of picture interest inventories: Implications for vocational assessment of students with disabilities. Journal of Psychoeducational Assessment, 11, 323–336.
Elliot, G. R., & Eisdorfer, C. (1981). Stress and human health: Analysis and implications of research. New York: Springer-Verlag.
Elliott, C. D. (1990a). Differential Ability Scales: Introductory and technical handbook. San Antonio, TX: Psychological Corporation.
Elliott, C. D. (1990b). The nature and structure of children's abilities: Evidence from the Differential Ability Scales. Journal of Psychoeducational Assessment, 8, 376–390.
Elliott, C. D., & Tyler, S. (1987). Learning disabilities and intelligence test results: A principal components analysis of the British Ability Scales. British Journal of Psychology, 78, 325–333.
Ellis, D. (1978). Methods of assessment for use with the visually and mentally handicapped: A selective review. Childcare, Health and Development, 4, 397–410.
Elman, L., Blixt, S., & Sawacki, R. (1981). The development of cutoff scores on a WISC-R in the multidimensional assessment of gifted children. Psychology in the Schools, 18, 426–428.
Elwood, D. L. (1969). Automation of psychological testing. American Psychologist, 24, 287–289.
Elwood, D. L., & Griffin, R. H. (1972). Individual intelligence testing without the examiner: Reliability of an automated method. Journal of Consulting and Clinical Psychology, 38, 9–14.
Elwood, R. W. (1993). Psychological tests and clinical discriminations: Beginning to address the base rate problem. Clinical Psychology Review, 13, 409–419.
Embretson, S. (1985a). Review of the British Ability Scales. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 231–232). Lincoln, NE: University of Nebraska Press.
Embretson, S. E. (1985b). Introduction to the problem of test design. In S. E. Embretson (Ed.), Test design (pp. 3–17). New York: Academic Press.
Endler, N. S., & Magnusson, D. (1976). Interactional psychology and personality. New York: Wiley & Sons.
Endler, N. S., Rutherford, A., & Denisoff, E. (1999). Beck Depression Inventory: Exploring its dimensionality in a nonclinical population. Journal of Clinical Psychology, 55, 1307–1312.
Ensor, A., & Phelps, L. (1989). Gender differences on the WAIS-R Performance Scale with young deaf adults. Journal of the American Deafness and Rehabilitation Association, 22, 48–52.
Epstein, N., Baldwin, L., & Bishop, D. (1983). The McMaster family assessment device. Journal of Marital and Family Therapy, 9, 171–180.
Epstein, S. (1979). The stability of behavior. I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37, 1097–1126.
Erickson, R. C., Post, R., & Paige, A. (1975). Hope as a psychiatric variable. Journal of Clinical Psychology, 31, 324–329.
Erickson, R. C., & Scott, M. (1977). Clinical memory testing: A review. Psychological Bulletin, 84, 1130–1149.
Erikson, E. H. (1963). Childhood and society (2nd ed.). New York: Norton.
Erikson, E. H. (1980). Identity and the life cycle. New York: Norton.
Erikson, E. H. (1982). The life cycle completed: A review. New York: Norton.
Eron, L. D. (1950). A normative study of the Thematic Apperception Test. Psychological Monographs, 64 (No. 9).
Eron, L. D. (1955). Some problems in the research application of the Thematic Apperception Test. Journal of Projective Techniques, 19, 125–129.
Eron, L. D., Terry, D., & Callahan, R. (1950). The use of rating scales for emotional tone of TAT stories. Journal of Consulting Psychology, 14, 473–478.
Eslinger, P. J., Damasio, A. R., Benton, A. L., & Van Allen, M. (1985). Neuropsychologic detection of
abnormal mental decline in older persons. Journal
of the American Medical Association, 253, 670674.
Estes, G. D., Harris, J., Moers, F., & Wodrich, D. L.
(1976). Predictive validity of the Boehm Test of Basic
Concepts for achievement in rst grade. Educational
and Psychological Measurement, 36, 10311035.
Estes, W. K. (1974). Learning theory and intelligence.
American Psychologist, 29, 740749.
Evans, R. B., & Cohen, J. B. (1987). The American Journal of Psychology: A retrospective. American Journal
of Psychology, 100, 321362.
Evans, R. G. (1982). Clinical relevance of the MarloweCrowne Scale: A review and recommendations.
Journal of Personality Assessment, 46, 415425.
Ewedemi, F., & Linn, M. W. (1987). Health and hassles in older and younger men. Journal of Clinical
Psychology, 43, 347353.
Exner, J. E., Jr. (1974). The Rorschach: A comprehensive
system (Vol. 1). New York: Wiley.
Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system: Vol. 1, Basic Foundations (3rd ed.). New
York: Wiley.
Exner, J. E., Jr. (1999). The Rorschach: Measurement
concepts and issues of validity. In S. E. Embretson &
S. L. Hershberger (Eds.), The new rules of measurement. Mahwah, NJ: Erlbaum.
Exner, J. E., McDowell, E., Pabst, J., Stackman, W., &
Kirk, L. (1963). On the detection of willful falsications in the MMPI. Journal of Consulting Psychology,
27, 9194.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1991).
The validity of computer-based test interpretations
of the MMPI. In T. B. Gutkin & S. L. Wise (Eds.), The
computer and the decision-making process (pp. 75
123). Hillsdale, NJ: Lawrence Erlbaum.
Eysenck, H. J. (1985). Review of the California Psychological Inventory. In J. V. Mitchell, Jr. (Ed.),
The ninth mental measurements yearbook (Vol. 1,
pp. 252253). Lincoln, NE: University of Nebraska
Press.
Eysenck, H. J., & Eysenck, S. B. G. (1975). The Eysenck
Personality Questionnaire manual. London: Hodder
& Stoughton.
Fabian, L., & Ross, M. (1984). The development of
the Sports Competition Trait Inventory. Journal of
Sports Behavior, 7, 13–27.
Fagley, N. S. (1987). Positional response bias in
multiple-choice tests of learning: Its relation to testwiseness and guessing strategy. Journal of Educational Psychology, 79, 95–97.
Fancher, R. E. (1987). Henry Goddard and the Kallikak
family photographs. American Psychologist, 42,
585–590.
Fancher, R. E. (1983). Alphonse de Candolle, Francis
Galton, and the early history of the nature-nurture
controversy. Journal of the History of the Behavioral
Sciences, 19, 341–352.
Farh, J. L., Dobbins, G. H., & Cheng, B. S. (1991).
Cultural relativity in action: A comparison of self-ratings made by Chinese and U.S. workers. Personnel
Psychology, 44, 129–147.
Farmer, R., & Sundberg, N. D. (1986). Boredom
proneness: The development and correlates of
a new scale. Journal of Personality Assessment, 50,
4–17.
Farrugia, D. (1983). A study of vocational interests
and attitudes of hearing impaired clients. Journal
of Rehabilitation of the Deaf, 17, 1–7.
Farrugia, D., & Austin, G. F. (1980). A study of social-emotional adjustment patterns of hearing-impaired
students in different educational settings. American
Annals of the Deaf, 125, 535–541.
Farylo, B., & Paludi, M. A. (1985). Research with the
Draw-A-Person Test: Conceptual and methodological issues. The Journal of Psychology, 119, 575–580.
Faust, D., Hart, K., & Guilmette, T. J. (1988). Pediatric
malingering: The capacity of children to fake believable deficits on neuropsychological testing. Journal
of Consulting and Clinical Psychology, 56, 578–582.
Faust, D., & Ziskin, J. (1989). Computer-assisted psychological evaluation as legal evidence: Some day my
prints will come. Computers in Human Behavior, 5,
23–36.
Fazio, A. F. (1969). Verbal and overt-behavioral assessment of a specific fear. Journal of Consulting and
Clinical Psychology, 33, 703–709.
Feather, N. T. (1975). Values in education and society.
New York: Free Press.
Feather, N. T. (1986). Cross-cultural studies with the
Rokeach Value Survey: The Flinders program of
research on values. Australian Journal of Psychology,
38, 269–283.
Feather, N. T., & Peay, E. R. (1975). The structure of
terminal and instrumental values: Dimensions and
clusters. Australian Journal of Psychology, 27, 151–164.
Feazell, D. M., Quay, H. C., & Murray, E. J.
(1991). The validity and utility of Lanyon's Psychological Screening Inventory in a youth services
agency sample. Criminal Justice and Behavior, 18,
166–179.
Federici, L., & Schuerger, J. (1974). Prediction of success in an applied M.A. psychology program. Educational and Psychological Measurement, 34, 945–952.
Feingold, A. (1983). The validity of the Information
and Vocabulary Subtests of the WAIS for predicting
college achievement. Educational and Psychological
Measurement, 43, 1127–1131.
Fekken, G. C. (1984). Adjective Check List. In D. J.
Keyser & R. C. Sweetland (Eds.), Test critiques
(Vol. 1, pp. 34–40). Kansas City, MO: Test Corporation of America.
Feldman, K. A., & Newcomb, T. M. (1969). The impact
of colleges on students: Vol. 1. An analysis of four
decades of research. San Francisco, CA: Jossey-Bass.
Feldman, W. (1990). Learning disabilities: A review
of available treatments. Springfield, IL: Charles C
Thomas.
Fenigstein, A., Scheier, M. F., & Buss, A. H. (1975).
Public and private self-consciousness: Assessment
and theory. Journal of Consulting and Clinical Psychology, 43, 522–527.
Ferketich, S. L., Figueredo, A. J., & Knapp, T. R. (1991).
The multitrait-multimethod approach to construct
validity. Research in Nursing and Health, 14, 315
320.
Ferraro, L. A., Price, J. H., Desmond, S. M., & Roberts,
S. M. (1987). Development of a diabetes locus of
control scale. Psychological Reports, 61, 763–770.
Festinger, L. (1947). The treatment of qualitative
data by scale analysis. Psychological Bulletin, 44,
149–161.
Feuerstein, R. (1979). The dynamic assessment of
retarded performers: The learning potential assessment device, theory, instruments, and techniques. Baltimore, MD: University Park Press.
Fibel, B., & Hale, W. D. (1978). The generalized
expectancy for success scale: A new measure. Journal of Consulting and Clinical Psychology, 46, 924–931.
Figueroa, R. A. (1979). The system of Multicultural
Pluralistic Assessment. The School Psychology Digest,
8, 28–36.
Filskov, S. B. (1983). Neuropsychological screening.
In P. A. Keller & L. G. Ritt (Eds.), Innovations in clinical practice: A sourcebook (Vol. II,
pp. 17–25). Sarasota, FL: Professional Resource
Exchange.
Finch, A. J., & Rogers, T. R. (1984). Self-report instruments. In T. H. Ollendick & M. Hersen (Eds.), Child
behavioral assessment (pp. 106–123). New York:
Pergamon Press.
Finch, A. J., Thornton, L. J., & Montgomery, L. E.
(1974). WAIS short forms with hospitalized psychiatric patients. Journal of Consulting and Clinical
Psychology, 42, 469.
Finegan, J. E., & Allen, N. J. (1994). Computerized and
written questionnaires: Are they equivalent? Computers in Human Behavior, 10, 483–496.
Finger, M. S., & Ones, D. S. (1999). Psychometric
equivalence of the computer and booklet forms of
the MMPI: A meta-analysis. Psychological Assessment,
11, 58–66.
Fischer, D. G., & Fick, C. (1993). Measuring social
desirability: Short forms of the Marlowe-Crowne
Social Desirability scale. Educational and Psychological Measurement, 53, 417–424.
Fishbein, M. (1980). A theory of reasoned action:
Some applications and implications. In M. M. Page
(Ed.), Nebraska Symposium on Motivation, 1979
(pp. 65–115). Lincoln, NE: University of Nebraska
Press.
Fisher, S., & Fisher, R. (1950). Test of certain assumptions regarding figure drawing analysis. Journal of
Abnormal and Social Psychology, 45, 727–732.
Fitts, W. H. (1965). Manual for the Tennessee Self-Concept Scale. Nashville, TN: Counselor Recordings
and Tests.
Flaharty, R. (1976). EPEC: Evaluation and Prescription
for Exceptional Children. In E. R. Ritvo, B. J. Freeman, E. M. Ornitz, & P. E. Tanguay (Eds.), Autism:
Diagnosis, current research and management. New
York: Spectrum.
Flanagan, J., & Lewis, G. (1969). Comparison of Negro
and white lower class men on the General Aptitude
Test Battery and the Minnesota Multiphasic Personality Inventory. Journal of Social Psychology, 78, 289–291.
Flanagan, R., & DiGiuseppe, R. (1999). Critical review
of the TEMAS: A step within the development of
thematic apperception instruments. Psychology in
the Schools, 36, 21–30.
Fleege, P. O., Charlesworth, R., Burts, D. C., & Hart,
C. H. (1992). Stress begins in kindergarten: A look
at behavior during standardized testing. Journal of
Research in Childhood Education, 7, 20–26.
Fleenor, J. W., & Taylor, S. (1994). Construct validity of
three self-report measures of creativity. Educational
and Psychological Measurement, 54, 464–470.
Fleishman, E. A. (1988). Some new frontiers in personnel selection research. Personnel Psychology, 41,
679–702.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76,
378–382.
Fleiss, J. L. (1975). Measuring agreement between two
judges on the presence or absence of a trait. Biometrics, 31, 651–659.
Fleming, J., & Garcia, N. (1998). Are standardized tests
fair to African Americans? The Journal of Higher
Education, 69, 471–495.
Fleming, T. M., Perkins, D. V., & Lovejoy, M. C. (1991).
Development of a brief self-report measure of quality in roommate relationships. Behavioral Assessment, 13, 125–135.
Fletcher, T. (1989). A comparison of the Mexican version of the Wechsler Intelligence Scale for Children-Revised and the Woodcock Psycho-Educational
Battery in Spanish. Journal of Psychoeducational
Assessment, 7, 56–65.
Fluckiger, F. A., Tripp, C. A., & Weinberg, G. H. (1961).
A review of experimental research in graphology:
1933–1960. Perceptual and Motor Skills, 12,
67–90.
Folkins, C. H., & Sime, W. E. (1981). Physical fitness
training and mental health. American Psychologist,
36, 373–389.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975).
Mini-mental state: A practical method for grading
the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Forde, J. (1977). Data on the Peabody Picture Vocabulary Test. American Annals of the Deaf, 122, 38–43.
Fosberg, I. A. (1938). Rorschach reactions under varied
instructions. Rorschach Research Exchange, 3, 12–30.
Fosberg, I. A. (1941). An experimental study of the
reliability of the Rorschach psychodiagnostic technique. Rorschach Research Exchange, 5, 72–84.
Fosberg, I. A. (1943). How do subjects attempt to fake
results on the Rorschach test? Rorschach Research
Exchange, 7, 119–121.
Fowler, P. C. (1985). Factor structure of the Personality
Research Form-E: A maximum likelihood analysis.
Journal of Clinical Psychology, 41, 377–381.
Fowler, R. D. (1967). Computer interpretation of personality tests: The automated psychologist. Comprehensive Psychiatry, 8, 455–467.
Fowler, R. D. (1985). Landmarks in computer-assisted
psychological assessment. Journal of Consulting and
Clinical Psychology, 53, 748–759.
Fraboni, M., & Saltstone, R. (1992). The WAIS-R
number-of-factors quandary: A cluster analytic
approach to construct validation. Educational and
Psychological Measurement, 52, 603–613.
Fraiberg, S. (1977). Insights from the blind: Comparative studies of blind and sighted infants. New York: Basic
Books.
Franco, J. N. (1983). An acculturation scale for
Mexican-American children. Journal of General Psychology, 108, 175–181.
Frank, G. (1983). The Wechsler enterprise. Oxford:
Pergamon Press.
Frank, G. H. (1970). The measurement of personality from the Wechsler tests. In B. A. Mahrer (Ed.),
Progress in experimental personality research. New
York: Academic Press.
Frank, J. (1968). The role of hope in psychotherapy.
International Journal of Psychiatry, 6, 383–395.
Frankenburg, W. K., & Dodds, J. B. (1967). The Denver Developmental Screening Test. The Journal of
Pediatrics, 71, 181–191.
Franzen, M. D. (1987). Luria-Nebraska Neuropsychological Battery. In D. J. Keyser & R. C. Sweetland
(Eds.), Test critiques compendium (pp. 260272).
Kansas City, MO: Test Corporation of America.
Franzen, M. D. (1989). Reliability and validity in neuropsychological assessment. New York: Plenum Press.
Fraser, B. J. (1981). Using environmental assessments
to make better classrooms. Journal of Curriculum
Studies, 13, 131–144.
Frederick, R. I., Sarfaty, S. D., Johnston, D., & Powel, J.
(1994). Validation of a detector of response bias on
a forced-choice test of nonverbal ability. Neuropsychology, 8, 118–125.
Frederiksen, N. (1962). Factors in in-basket performance. Psychological Monographs, 76 (22, whole No.
541).
Freedman, D. G., & Freedman, N. (1969). Behavioral differences between Chinese-American
and European-American newborns. Nature, 224,
1227.
Freeman, B. J. (1976). Evaluating autistic children.
Journal of Pediatric Psychology, 1, 18–21.
Freeman, B. J. (1985). Review of Child Behavior Checklist. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 300301). Lincoln,
NE: University of Nebraska Press.
Freeman, F. S. (1984). A note on E. B. Titchener and
G. M. Whipple. Journal of the History of the Behavioral Sciences, 20, 177–179.
French, J. (1958). Validation of new item types against
four-year academic criteria. Journal of Educational
Psychology, 49, 67–76.
French, J. L. (1964). Manual: Pictorial Test of Intelligence. Boston, MA: Houghton Mifflin.
Frenzel, R. R., & Lang, R. A. (1989). Identifying sexual preferences in intrafamilial and extrafamilial
child sexual abusers. Annals of Sex Research, 2,
255–275.
Freund, K. (1965). Diagnosing heterosexual pedophilia by means of a test for sexual interest. Behaviour
Research and Therapy, 3, 229–234.
Fricke, B. G. (1956). Response set as a suppressor variable in the OAIS and the MMPI. Journal of Consulting Psychology, 20, 161–169.
Friedman, M., & Rosenman, R. H. (1959). Association of specific overt behavior pattern with blood
and cardiovascular findings. Journal of the American Medical Association, 169, 1286–1296.
Frisbie, D. A., & Becker, D. F. (1991). An analysis of
textbook advice about true-false tests. Applied Measurement in Education, 4, 67–83.
Frostig, M., Lefever, W., & Whittlesey, J. R. B. (1966).
Administration and scoring manual for the Marianne
Frostig Developmental Test of Visual Perception. Palo
Alto, CA: Consulting Psychologists Press.
Fruzzetti, A. E., & Jacobson, N. S. (1992). Assessment of couples. In J. C. Rosen, & P. McReynolds
(Eds.), Advances in Psychological Assessment (Vol. 8,
pp. 201–224). New York: Plenum Press.
Fullard, W., McDevitt, S. C., & Carey, W. B. (1984).
Assessing temperament in one- to three-year-old children. Journal of Pediatric Psychology, 9,
205–217.
Funk, S. C., & Houston, B. K. (1987). A critical analysis
of the hardiness scale's validity and utility. Journal
of Personality and Social Psychology, 53, 572–578.
Furey, W., & Forehand, R. (1983). The Daily Child
Behavior Checklist. Journal of Behavioral Assessment, 5, 83–95.
Furnham, A. (1986). Response bias, social desirability
and dissimulation. Personality and Individual Differences, 7, 385–400.
Furnham, A. (1987). A content and correlational analysis of seven locus of control scales. Current Psychological Research and Reviews, 6, 244–255.
Furnham, A., & Henderson, M. (1982). The good, the
bad and the mad: Response bias in self-report measures. Personality and Individual Differences, 3, 311–320.
Furnham, A., & Henry, J. (1980). Cross-cultural locus
of control studies: Experiment and critique. Psychological Reports, 47, 23–29.
Gaddes, W. H. (1994). Learning disabilities and brain
function: A neuropsychological approach (3rd ed.).
New York: Springer-Verlag.
Gage, N. L. (1959). Review of the Study of Values. In
O. K. Buros (Ed.), The fifth mental measurements
yearbook (pp. 199–202). Highland Park, NJ: The
Gryphon Press.
Gage, N. L., Leavitt, G. S., & Stone, G. C. (1957). The
psychological meaning of acquiescence for authoritarianism. Journal of Abnormal and Social Psychology, 55, 98–103.
Gagnon, S. G., & Nagle, R. J. (2000). Comparison of
the revised and original versions of the Bayley Scales
of Infant Development. School Psychology International, 21, 293–305.
Galassi, J. P., & Galassi, M. D. (1974). Validity of a
measure of assertiveness. Journal of Counseling Psychology, 21, 248–250.
Galen, R., & Gambino, S. (1975). Beyond normality:
The predictive value and efficiency of medical diagnoses. New York: Wiley.
Gallagher, D., Thompson, L. W., & Levy, S. M. (1980).
Clinical psychological assessment of older adults.
In L. W. Poon (Ed.), Aging in the 1980s (pp. 19–40). Washington, DC: American Psychological
Association.
Gallagher, J. J. (1966). Research summary on gifted child
education. Springfield, IL: Office of the Superintendent of Public Instruction.
Gallagher, J. J. (1989). A new policy initiative: Infants
and toddlers with handicapping conditions. American Psychologist, 44, 387–391.
Galton, F. (1884). Measurement of character. Fortnightly Review, 36, 179–185.
Galton, F. (1907). Inquiries into human faculty and its
development (2nd ed.). New York: E. P. Dutton.
Ganster, D. C., Hennessey, H. W., & Luthans, F. (1983).
Social desirability response effects: Three alternative
models. Academy of Management Journal, 26, 321–331.
Garb, H. N. (1984). The incremental validity of information used in personality assessment. Clinical Psychology Review, 4, 641–655.
Garb, H. N., Florio, C. M., & Grove, W. M.
(1998). The validity of the Rorschach and
the Minnesota Multiphasic Personality Inventory:
Results from meta-analyses. Psychological Science, 9,
402–404.
Garcia, M., & Lega, L. I. (1979). Development of
a Cuban Ethnic Identity Questionnaire. Hispanic
Journal of Behavioral Sciences, 1, 247–261.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic.
Gardner, M. (1974). Mathematical games. Scientic
American, 231, 98–103.
Garner, D. M. (1991). The Eating Disorder Inventory-2 professional manual. Odessa, FL: Psychological
Assessment Resources.
Garner, D. M., & Olmsted, M. P. (1984). The Eating
Disorder Inventory Manual. Odessa, FL: Psychological Assessment Resources.
Garry, R. (1953). Individual differences in ability to
fake vocational interests. Journal of Applied Psychology, 37, 33–37.
Garth, T. R., Eson, T. H., & Morton, M. M. (1936). The
administration of non-language intelligence tests to
Mexicans. Journal of Abnormal and Social Psychology, 31, 53–58.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C. III,
& Bentson, C. (1987). Meta-analysis of assessment
center validity. Journal of Applied Psychology, 72,
493–511.
Gay, M. L., Hollandsworth, J. G., Jr., & Galassi, J. P.
(1975). An assertiveness inventory for adults. Journal of Counseling Psychology, 22, 340–344.
Gear, G. H. (1976). Accuracy of teacher judgment in identifying intellectually gifted children: A
review of the literature. Gifted Child Quarterly, 20,
478–490.
Geary, D. C., & Gilger, J. W. (1984). The Luria-Nebraska Neuropsychological Battery Children's
revision: Comparison of learning-disabled and normal children matched on full scale IQ. Perceptual
and Motor Skills, 58, 115–118.
Geary, D. C., Jennings, S. M., Schultz, D. D., & Alper,
T. G. (1984). The diagnostic accuracy of the Luria-Nebraska Neuropsychological Battery Children's
revision for 9- to 12-year-old learning-disabled children. School Psychology Review, 13, 375–380.
Geer, J. H. (1965). The development of a scale to measure fear. Behaviour Research and Therapy, 3, 45–53.
Geisinger, K. F. (1992). The metamorphosis of test validation. Educational Psychologist, 27, 197–222.
Geist, H. (1959). Geist Picture Interest Inventory. Los
Angeles, CA: Western Psychological Services.
Geist, H. (1988). The Geist Picture Interest Inventory:
Revised. Los Angeles, CA: Western Psychological
Services.
Gelatt, H. B. (1967). Information and decision theories
applied to college choice and planning. In Preparing
school counselors in educational guidance. New York:
College Entrance Examination Board.
Gelb, S. A. (1986). Henry H. Goddard and the immigrants, 1910–1917: The studies and their social context. Journal of the History of the Behavioral Sciences,
22, 324–332.
Genshaft, J., & Ward, M. E. (1982). A review of the
Perkins-Binet tests of intelligence for the blind with
suggestions for administration. School Psychology
Review, 11, 338–341.
Gerardi, R., Keane, T. M., & Penk, W. (1989). Utility:
Sensitivity and specificity in developing diagnostic
tests of combat-related post-traumatic stress disorder (PTSD). Journal of Clinical Psychology, 45, 691–703.
Gerken, K. C. (1978). Performance of Mexican American children on intelligence tests. Exceptional Children, 44, 438–443.
Gerken, K. C., & Hodapp, A. F. (1992). Assessment
of preschoolers at-risk with the WPPSI-R and the
Stanford-Binet L-M. Psychological Reports, 71, 659–664.
Gerst, M. S., Grant, I., Yager, J., & Sweetwood, H.
(1978). The reliability of the social readjustment rating scale: Moderate and long-term stability. Journal
of Psychosomatic Research, 22, 519–523.
Gesell, A., Halverson, H. M., & Amatruda, C. S. (1940).
The first five years of life. New York: Harper.
Gesell, A., Ilg, F. L., & Ames, L. B. (1974). Infant and
child in the culture of today (Rev. ed.). New York:
Harper & Row.
Geschwind, N., & Levitsky, W. (1968). Human brain:
Left-right asymmetries in temporal speech regions.
Science, 161, 186–187.
Getter, H., & Sundland, D. M. (1962). The Barron Ego
Strength Scale and psychotherapy outcome. Journal
of Consulting Psychology, 26, 195.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York: Wiley.
Ghiselli, E. E. (1973). The validity of aptitude tests in
personnel selection. Personnel Psychology, 26, 461–477.
Ghiselli, E. E., & Haire, M. (1960). The validation of
selection tests in the light of the dynamic character of criteria. Personnel Psychology, 13, 225–231.
Gibb, B. (1964). Testwiseness as secondary cue
response (Doctoral dissertation, Stanford University, 1964). Ann Arbor, MI: University Microfilms
(UMI No. 64-7643).
Gibbins, K. (1968). Response sets and the semantic differential. British Journal of Social and Clinical Psychology, 7, 253–263.
Gibbins, S. (1989). The provision of school psychological assessment services for the hearing
impaired: A national survey. The Volta Review, 91,
95–103.
Gibson, J. J. (1966). The senses considered as perceptual
systems. New York: Houghton Mifflin.
Gibson-Harman, K., & Austin, G. F. (1985). A revised
form of the Tennessee Self-concept Scale for use with
deaf and hard of hearing persons. American Annals
of the Deaf, 130, 218–225.
Giedt, F. H., & Downing, L. (1961). An extraversion
scale for the MMPI. Journal of Clinical Psychology,
17, 156–159.
Gierl, M. J., & Rogers, W. T. (1996). A confirmatory factor analysis of the Test Anxiety Inventory
using Canadian high-school students. Educational
and Psychological Measurement, 56, 315–324.
Gilberstadt, H., & Duker, J. (1965). A handbook for
clinical and actuarial MMPI interpretation. Philadelphia, PA: W. B. Saunders.
Gilewski, M. J., & Zelinski, E. M. (1986). Questionnaire assessment of memory complaints. In L. W.
Poon, T. Crook, K. L. Davis, C. Eisdorfer, B. J. Gurland, A. W. Kaszniak, & L. W. Thompson (Eds.),
Clinical memory assessment of older adults (pp. 93–107). Washington, DC: American Psychological
Association.
Gilewski, M. J., Zelinski, E. M., & Schaie, K. W. (1990).
The Memory Functioning Questionnaire for assessment of memory complaints in adulthood and old
age. Psychology and Aging, 5, 482–490.
Gill, D. L., & Deeter, T. E. (1988). Development of the
Sports Orientation Questionnaire. Research Quarterly for Exercise and Sport, 59, 191–202.
Gillberg, I. C., & Gillberg, C. (1983). Three-year
follow-up at age 10 of children with minor neurodevelopmental disorders. I. Behavioral problems.
Developmental Medicine and Child Neurology, 25,
438–449.
Gillespie, B. L., & Eisler, R. M. (1992). Development of
the feminine gender role stress scale. Behavior Modification, 16, 426–438.
Ginzberg, E. (1951). Occupational choice. New York:
Columbia University Press.
Glaser, R. (1963). Instructional technology and the
measurement of learning outcomes. American Psychologist, 18, 519–521.
Glass, D. C. (1977). Stress, behavior patterns, and coronary disease. American Scientist, 65, 177–187.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Glaub, V. E., & Kamphaus, R. W. (1991). Construction of a nonverbal adaptation of the Stanford-Binet
fourth edition. Educational and Psychological Measurement, 51, 231–241.
Glennon, J. R., Albright, K. E., & Owens, W. A. (1966).
A catalog of life history items. Greensboro, NC: Center for Creative Leadership.
Glow, R. A., Glow, P. A., & Rump, E. E. (1982). The
stability of child behavior disorders: A one year test-retest study of Adelaide versions of the Conners
Teacher and Parent Rating Scales. Journal of Abnormal Child Psychology, 10, 33–60.
Glueck, B. C., & Reznikoff, M. (1965). Comparison
of computer-derived personality profile and projective psychological test findings. American Journal of
Psychiatry, 121, 1156–1161.
Glutting, J. J., & McDermott, P. A. (1989). Using
teaching items on ability tests: A nice idea but
does it work? Educational and Psychological Measurement, 49, 257–268.
Goddard, H. H. (1913). The Binet tests in relation to
immigration. Journal of Psycho-Asthenics, 18, 105–107.
Goddard, H. H. (1917). Mental tests and the immigrant. Journal of Delinquency, 2, 243–277.
Goebel, R. A. (1983). Detection of faking on the
Halstead-Reitan Neuropsychological Test Battery.
Journal of Clinical Psychology, 39, 731–742.
Goetzinger, C. P., Wills, R. C., & Dekker, L. C. (1967).
Non-language I.Q. tests used with deaf pupils. Volta
Review, 69, 500–506.
Goldband, S., Katkin, E. S., & Morell, M. A. (1979). Personality and cardiovascular disorder: Steps toward
demystification. In I. G. Sarason & C. D. Spielberger
(Eds.), Stress and anxiety (Vol. 6). New York: Wiley.
Goldberg, E. L., & Alliger, G. M. (1992). Assessing the
validity of the GRE for students in Psychology: A
validity generalization approach. Educational and
Psychological Measurement, 52, 1019–1027.
Goldberg, L. R. (1959). The effectiveness of clinicians'
judgments: The diagnosis of organic brain damage
from the Bender-Gestalt test. Journal of Consulting
Psychology, 23, 25–33.
Goldberg, L. R. (1968). Simple models or simple
processes? Some research on clinical judgments.
American Psychologist, 23, 483–496.
Goldberg, L. R. (1972). Review of the CPI. In O. K.
Buros (Ed.), The seventh mental measurements
yearbook (Vol. 1, pp. 94–96). Highland Park, NJ:
Gryphon Press.
Goldberg, L. R. (1974). Objective personality tests and
measures. Annual Review of Psychology, 25, 343–366.
Goldberg, L. R. (1981). Language and individual
differences: The search for universals in personality lexicons. In L. Wheeler (Ed.), Personality
and social psychology review (Vol. 2, pp. 141–165).
Beverly Hills, CA: Sage.
Goldberg, L. R. (1990). An alternative "description of
personality": The Big-Five factor structure. Journal
of Personality and Social Psychology, 59, 1216–1229.
Goldberg, L. R., Grenier, J. R., Guion, R. M.,
Sechrest, L. B., & Wing, H. (1991). Questionnaires
used in the prediction of trustworthiness in preemployment selection decisions: An A.P.A. Task Force
Report. Washington DC: American Psychological
Association.
Goldberg, L. R., Rorer, L. G., & Greene, M. M. (1970).
The usefulness of stylistic scales as potential suppressor or moderator variables in predictions from
the CPI. Oregon Research Institute Research Bulletin,
10, No. 3.
Goldberg, P. (1965). A review of sentence completion
methods in personality assessment. In B. I. Murstein
(Ed.), Handbook of projective techniques (pp. 777–822). New York: Basic Books.
Golden, C. J. (1981). The Luria-Nebraska Children's
Battery: Theory and formulation. In G. W. Hynd,
& J. E. Obrzut (Eds.), Neuropsychological assessment and the school-age child. New York: Grune &
Stratton.
Golden, C. J. (1989). The Nebraska Neuropsychological Children's Battery. In C. R. Reynolds &
E. Fletcher-Janzen (Eds.), Handbook of clinical
child neuropsychology (pp. 193–204). New York:
Plenum.
Golden, C. J., Purisch, A. D., & Hammeke, T. A.
(1985). Luria-Nebraska Neuropsychological Battery:
Forms I and II. Los Angeles: Western Psychological
Services.
Goldfried, M. R. (1964). A cross-validation of the
Marlowe-Crowne Social Desirability Scale items.
Journal of Social Psychology, 64, 137–145.
Goldfried, M. R., & Kent, R. N. (1972). Traditional
vs. behavioral personality assessment: A comparison of methodological and theoretical assumptions.
Psychological Bulletin, 77, 409–420.
Goldfried, M. R., & Zax, M. (1965). The stimulus value
of the TAT. Journal of Projective Techniques, 29, 46–57.
Golding, S. L. (1978). Review of the Psychological
Screening Inventory. In O. K. Buros (Ed.), The eighth
mental measurements yearbook (Vol. 1, pp. 1019–1022). Highland Park, NJ: Gryphon Press.
Goldman, M. E., & Berry, C. A. (1981). Comparative
predictive validity of the new MCAT using different
admissions criteria. Journal of Medical Education,
56, 981–986.
Goldman, R. D., & Hewitt, B. N. (1976). Predicting the
success of Black, Chicano, Oriental, and White college students. Journal of Educational Measurement,
13, 107–117.
Goldman, R. D., & Richards, R. (1974). The SAT
prediction of grades for Mexican-American versus
Anglo-American students at the University of California, Riverside. Journal of Educational Measurement, 11, 129–135.
Goldman, R. D., & Slaughter, R. E. (1976). Why college
grade point average is difficult to predict. Journal of
Educational Psychology, 68, 9–14.
Goldman, R. D., & Widawski, M. H. (1976). An analysis
of types of errors in the selection of minority college
students. Journal of Educational Measurement, 13,
185–200.
Goldman, R. L. (1992). The reliability of peer assessments of quality of care. Journal of the American
Medical Association, 267, 958–960.
Goldman, R. L. (1994). The reliability of peer assessments: A meta-analysis. Evaluation and the Health
Professions, 17, 3–21.
Goldschmidt, M. L., & Bentler, P. M. (1968). Manual:
Concept assessment kit: Conservation. San Diego:
Educational & Industrial Testing Service.
Goldsmith, D. B. (1922). The use of the personnel history blank as a salesmanship test. Journal of Applied
Psychology, 6, 149–155.
Goldstein, G. (1984). Comprehensive neuropsychological assessment batteries. In G. Goldstein
& M. Hersen (Eds.), Handbook of psychological assessment (pp. 181–210). Elmsford, NY:
Pergamon Press.
Goldstein, G., & Shelly, C. H. (1975). Similarities and
differences between psychological deficit in aging
and brain damage. Journal of Gerontology, 30, 448–455.
Goldstein, I. L. (1971). The application blank: How
honest are the responses? Journal of Applied Psychology, 55, 491–492.
Good, L. R., & Good, K. C. (1974). A preliminary measure of existential anxiety. Psychological Reports, 34,
72–74.
Good, R. H., III, Vollmer, M., Creek, R. J., Katz, L.,
& Chowdhri, S. (1993). Treatment utility of the
Kaufman Assessment Battery for Children: Effects
of matching instruction and student processing
strength. School Psychology Review, 22, 8–26.
Goodenough, F. L. (1926). Measurement of intelligence by drawings. New York: Harcourt, Brace, &
World.
Goodwin, F. K., & Jamison, K. R. (1990). Manic-depressive illness. New York: Oxford University Press.
Goodwin, W. L., & Driscoll, L. A. (1980). Handbook
for measurement and evaluation in early childhood
education. San Francisco, CA: Jossey-Bass.
Gordon, M. E., & Gross, R. H. (1978). A critique of
methods for operationalizing the concept of fakeability. Educational and Psychological Measurement,
38, 771–782.
Gorenstein, C., Andrade, L., Filho, A., Tung, T., &
Artes, R. (1999). Psychometric properties of the Portuguese version of the Beck Depression Inventory on
Brazilian college students. Journal of Clinical Psychology, 55, 553–562.
Gorham, D. R. (1967). Validity and reliability studies of a computer-based scoring system for inkblot
responses. Journal of Consulting Psychology, 31, 65–70.
Gorham, D. R., Moseley, E. C., & Holtzman, W. W.
(1968). Norms for the computer-scored Holtzman
Inkblot Technique. Perceptual and Motor Skills, 26,
1279–1305.
Gorman, B. (1968). Social desirability factors and the
Eysenck Personality Inventory. Journal of Psychology,
69, 75–83.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.).
Hillsdale, NJ: Lawrence Erlbaum.
Gorsuch, R. L., & Key, M. K. (1974). Abnormalities of
pregnancy as a function of anxiety and life stress.
Psychosomatic Medicine, 36, 352–372.
Gotlib, I. H. (1984). Depression and general psychopathology in university students. Journal of
Abnormal Psychology, 93, 19–30.
Gottesman, I. I., & Prescott, C. A. (1989). Abuses of
the MacAndrew MMPI Alcoholism Scale: A critical
review. Clinical Psychology Review, 9, 223–242.
Gottfredson, L. S., & Crouse, J. (1986). Validity versus
utility of mental tests: Example of the SAT. Journal
of Vocational Behavior, 29, 363–378.
Gottschalk, L. A. (1974). A Hope Scale applicable to
verbal samples. Archives of General Psychiatry, 30,
779–785.
Gottschalk, L. (1985). Hope and other deterrents to
illness. American Journal of Psychotherapy, 39, 515–524.
Gottschalk, L. A., & Gleser, G. C. (1969). The measurement of psychological states through the content
analysis of verbal behavior. Los Angeles, CA: University of California Press.
Gough, H. G. (1947). Simulated patterns on the MMPI.
Journal of Abnormal and Social Psychology, 42, 215–225.
Gough, H. G. (1950). The F minus K dissimulation
index for the MMPI. Journal of Consulting Psychology, 14, 408413.
Gough, H. G. (1954). Some common misconceptions
about neuroticism. Journal of Consulting Psychology,
18, 287292.
Gough, H. G. (1965). Conceptual analysis of psychological test scores and other diagnostic variables.
Journal of Abnormal Psychology, 70, 294302.
Gough, H. G. (1966). Appraisal of social maturity by
means of the CPI. Journal of Abnormal Psychology,
71, 189195.
Gough, H. G. (1968). An interpreter's syllabus
for the California Psychological Inventory. In P.
McReynolds (Ed.), Advances in psychological assessment (Vol. 1, pp. 55–79). Palo Alto, CA: Science and
Behavior Books.
Gough, H. G. (1974). A 24-item version of the Miller-Fisk sexual knowledge questionnaire. The Journal of
Psychology, 87, 183–192.
Gough, H. G. (1984). A managerial potential scale for
the California Psychological Inventory. Journal of
Applied Psychology, 69, 233240.
Gough, H. G. (1985). A Work Orientation scale for
the California Psychological Inventory. Journal of
Applied Psychology, 70, 505513.
Gough, H. G. (1987). CPI administrator's guide. Palo
Alto, CA: Consulting Psychologists Press.
Gough, H. G. (1989). The California Psychological
Inventory. In C. S. Newmark (Ed.), Major psychological assessment instruments (Vol. 2, pp. 67–98).
Boston, MA: Allyn & Bacon.
Gough, H. G. (1991). Some unfinished business. In
W. M. Grove & D. Cicchetti (Eds.), Thinking clearly
about psychology (Vol. 2: Personality and psychopathology, pp. 114–136). Minneapolis, MN: University
of Minnesota Press.
Gough, H. G. (1992). Assessment of creative potential
in psychology and the development of a Creative
Temperament Scale for the CPI. In J. C. Rosen & P.
McReynolds (Eds.), Advances in psychological assessment (Vol. 8, pp. 225–257). New York: Plenum.
Gough, H. G., & Bradley, P. (1992). Delinquent and
criminal behavior as assessed by the revised California Psychological Inventory. Journal of Clinical
Psychology, 48, 298308.
Gough, H. G. , & Bradley, P. (1996). CPI manual. Palo
Alto, CA: Consulting Psychologists Press.
Gough, H., & Domino, G. (1963). The D-48 test
as a measure of general ability among grade-school children. Journal of Consulting Psychology, 27,
344–349.
Gough, H. G., & Heilbrun, A. L. (1965). The Adjective Check List Manual. Palo Alto, CA: Consulting
Psychologists Press.
Gough, H. G., & Heilbrun, A. B., Jr. (1983). The Adjective Check List manual-1983 edition. Palo Alto, CA:
Consulting Psychologists Press.
Gould, J. (1982). A psychometric investigation of the
standard and long-form Beck Depression Inventory.
Psychological Reports, 51, 11671170.
Gould, S. J. (1981). The mismeasure of man. New York:
W. W. Norton.
Gowan, J. C., & Gowan, M. S. (1955). A teacher prognosis scale for the MMPI. Journal of Educational
Research, 49, 112.
Gracely, R. H., McGrath, P., & Dubner, R. (1978). Ratio
scales of sensory and affective verbal pain descriptions. Pain, 5, 5–18.
Graham, C., Bond, S. S., Gerkovich, M. M., & Cook,
M. R. (1980). Use of the McGill Pain Questionnaire
in the assessment of cancer pain: Replicability and
consistency. Pain, 8, 377387.
Graham, E. E., & Shapiro, E. (1963). Use of the Performance Scale of the WISC with the deaf child. Journal
of Consulting Psychology, 17, 396398.
Graham, J. R. (1987). The MMPI: A practical guide
(2nd ed.). New York: Oxford University Press.
Graham, J. R. (1990). MMPI-2: Assessing personality
and psychopathology. New York: Oxford University
Press.
Graham, J. R. (1993). MMPI-2: Assessing personality
and psychopathology (2nd ed.). New York: Oxford
University Press.
Graham, J. R., Watts, D., & Timbrook, R. E. (1991).
Detecting fake-good and fake-bad MMPI-2 profiles.
Journal of Personality Assessment, 57, 264277.
Grant, I., Sweetwood, H., Gerst, M. S., & Yager, J.
(1978). Scaling procedures in life events research.
Journal of Psychosomatic Research, 22, 525530.
Grassi, J. R. (1973). Grassi Basic Cognition Evaluation.
Miami, FL: University of Miami.
Greaud, V. A., & Green, B. F. (1986). Equivalence of
conventional and computer presentation of speed
tests. Applied Psychological Measurement, 10, 2334.
Green, B. F. (1954). Attitude measurement. In G. Lindzey (Ed.), Handbook of social psychology (pp. 335–369). Cambridge, MA: Addison-Wesley.
Green, B. F. (1988). Critical problems in computer-based psychological measurement. Applied Measurement in Education, 1, 223–231.
Green, B. F. (1991). Guidelines for computer testing.
In T. B. Gutkin & S. L. Wise (Eds.), The computer and
the decision-making process (pp. 245–273). Hillsdale,
NJ: Lawrence Erlbaum.
Green, C. J. (1982). The diagnostic accuracy and utility
of MMPI and MCMI computer interpretive reports.
Journal of Personality Assessment, 46, 359365.
Green, R. F., & Goldfried, M. R. (1965). On the bipolarity of semantic space. Psychological Monographs,
79, 6 (Whole No. 599).
Greenbaum, P. E., Dedrick, R. F., & Lipien, L. (2004).
The Child Behavior Checklist. In M. J. Hilsenroth
& D. L. Segal (Eds.), Comprehensive handbook of
psychological assessment. Hoboken, NJ: Wiley &
Sons.
Greene, A. C., Sapp, G. L., & Chissom, B. (1990).
Validation of the Stanford-Binet Intelligence Scale:
Fourth edition with exceptional black male students.
Psychology in the Schools, 27, 3541.
Greene, R. L. (1978). An empirically derived MMPI
Carelessness Scale. Journal of Clinical Psychology, 34,
407410.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune & Stratton.
Greene, R. L. (1987). Ethnicity and MMPI performance: A review. Journal of Consulting and Clinical Psychology, 55, 497512.
Greene, R. L. (1988). Assessment of malingering and
defensiveness by objective personality inventories.
In R. Rogers (Ed.), Clinical assessment of malingering
and deception (pp. 123–158). New York: Guilford
Press.
Greene, R. L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston, MA: Allyn & Bacon.
Greenspoon, J., & Gersten, C. D. (1967). A new look at
psychological testing: Psychological testing from the
standpoint of a behaviorist. American Psychologist,
22, 848853.
Greitzer, F. L., Hershman, R. L., & Kelly, R. T. (1981).
The air defense game: A microcomputer program for research in human performance. Behavior Research Methods and Instrumentation, 13, 57–59.
Gresham, F. M., & Elliott, S. N. (1990). Social Skills
Rating System: Manual. Circle Pines, MN: American
Guidance Service.
Gridley, B. E. (1991). Confirmatory factor analysis of the Stanford-Binet: Fourth edition for a
normal sample. Journal of School Psychology, 29,
237248.
Gridley, B. E., & Treloar, J. H. (1984). The validity of
the scales for rating the Behavioral Characteristics
of Superior Students for the identification of gifted
students. Journal of Psychoeducational Assessment, 2,
6571.
Griffin, C. H. (1959). The development of processes
for indirect or synthetic validity. 5. Application of
motion and time analysis to dexterity tests. Personnel
Psychology, 12, 418420.
Griffiths, R. (1970). The abilities of young children.
Chard, England: Young & Son.
Grigoriadis, S., & Fekken, G. C. (1992). Person reliability on the Minnesota Multiphasic Personality
Inventory. Personality and Individual Differences, 13,
491500.
Grisso, T. (1986). Evaluating competencies: Forensic
assessments and instruments. New York: Plenum.
Gronlund, N. E. (1959). Sociometry in the classroom.
New York: Harper & Brothers.
Gronlund, N. E. (1993). How to make achievement
tests and assessments (5th ed.). Boston, MA: Allyn
& Bacon.
Grønnerød, C. (1999). Rorschach interrater agreement
estimates: An empirical evaluation. Scandinavian
Journal of Psychology, 40, 115120.
Gross, J., Rosen, J. C., Leitenberg, H., & Willmuth,
M. E. (1986). Validity of the Eating Attitudes Test
and the Eating Disorders Inventory in bulimia nervosa. Journal of Consulting and Clinical Psychology,
54, 875876.
Grossman, F. M., & Johnson, K. M. (1983). Validity of
the Slosson and Otis-Lennon in predicting achievement of gifted students. Educational and Psychological Measurement, 43, 617622.
Grosze-Nipper, L. M. H., & Rebel, H. J. C. (1987).
Dimensions of pacifism and militarism. In H. J. C.
Rebel & L. Wecke (Eds.), Friends, foes, values and
fears (pp. 55–91). Amsterdam: Jan Mets.
Grotevant, H. D., Scarr, S., & Weinberg, R. A. (1977).
Patterns of interest similarity in adoptive and biological families. Journal of Personality and Social Psychology, 35, 667676.
Groth-Marnat, G. (1984). Handbook of psychological
assessment. New York: Van Nostrand Reinhold Co.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E.,
& Nelson, C. (2000). Clinical versus mechanical
prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
Guastello, S. J., Guastello, D. D., & Craft, L. L. (1989).
Assessment of the Barnum effect in computer-based
test interpretations. The Journal of Psychology, 123,
477484.
Guastello, S. J., & Rieke, M. L. (1990). The Barnum effect and validity of computer-based test
interpretations: The human resource development
report. Psychological Assessment, 2, 186190.
Guertin, W. H., Ladd, C. E., Frank, G. H., Rabin, A. I.,
& Hiester, D. S. (1966). Research with the Wechsler
intelligence scales for adults: 19601965. Psychological Bulletin, 66, 385409.
Guilford, J. P. (1954). Psychometric methods. New York:
McGraw-Hill.
Guilford, J. P. (1959a). Three faces of intellect. American Psychologist, 14, 469479.
Guilford, J. P. (1959b). Personality. New York:
McGraw-Hill.
Guilford, J. P. (1967a) Autobiography. In E. G. Boring
& G. Lindzey (Eds.), A history of psychology in autobiography (Vol. 5, pp. 167–192). New York: Appleton-Century-Crofts.
Guilford, J. P. (1967b). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P. (1988). Some changes in the Structure-of-Intellect Model. Educational and Psychological
Measurement, 48, 14.
Guilford, J. P., & Fruchter, B. (1978). Fundamental
statistics in psychology and education. New York:
McGraw-Hill.
Guion, R. M. (1965a). Personnel testing. New York:
McGraw-Hill.
Guion, R. M. (1965b). Synthetic validity in a small
company: A demonstration. Personnel Psychology,
18, 4963.
Guion, R. M. (1977). Content validity – the source of
my discontent. Applied Psychological Measurement,
1, 110.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385398.
Guion, R. M., & Gottier, R. F. (1965). Validity of personality measures in personnel selection. Personnel
Psychology, 18, 135164.
Gustafsson, J. E. (1984). A unifying model for the
structure of intellectual abilities. Intelligence, 8,
179203.
Gutkin, T. B., & Reynolds, C. R. (1980). Factorial similarity of the WISC-R for Anglos and Chicanos referred
for psychological services. Journal of School Psychology, 18, 3439.
Gutkin, T. B., & Reynolds, C. R. (1981). Factorial similarity of the WISC-R for white and black children
from the standardization sample. Journal of Educational Psychology, 73, 227231.
Gutterman, J. E. (1985). Correlations of scores of low
vision children on the Perkins-Binet Tests of Intelligence for the Blind, the WISC-R and the WRAT.
Journal of Visual Impairment and Blindness, 79, 55–58.
Guttman, L. (1944). A basis for scaling qualitative data.
American Sociological Review, 9, 139150.
Guttman, L. (1945). A basis for analysing test-retest
reliability. Psychometrika, 10, 255282.
Gynther, M. D. (1972). White norms and Black
MMPIs: A prescription for discrimination. Psychological Bulletin, 78, 386402.
Gynther, M. D. (1979). Aging and personality. In J. N.
Butcher (Ed.), New developments in the use of the
MMPI (pp. 39–68). Minneapolis: University of Minnesota Press.
Gynther, M. D., Fowler, R. D., & Erdberg, P. (1971).
False positives galore: The application of standard
MMPI criteria to a rural, isolated, Negro sample.
Journal of Clinical Psychology, 27, 234237.
Gynther, M. D., & Gynther, R. A. (1976). Personality
inventories. In I. B. Weiner (Ed.), Clinical methods
in psychology. New York: Wiley.
Hackman, J. R., Wiggins, N., & Bass, A. R. (1970).
Prediction of long-term success in doctoral work in
psychology. Educational and Psychological Measurement, 30, 365374.
Haemmerlie, F. M., & Merz, C. J. (1991). Concurrent validity between the California Psychological
Inventory-Revised and the Student Adaptation to
College questionnaire. Journal of Clinical Psychology, 47, 664668.
Hagen, E., Delaney, E., & Hopkins, T. (1987). Stanford-Binet Intelligence Scale examiner's handbook: An
expanded guide for fourth edition users. Chicago, IL:
Riverside.
Hainsworth, P. K., & Hainsworth, M. L. (1980).
Preschool Screening System. Pawtucket, RI: Early
Recognition Intervention Systems.
Hakstian, A. R., & Farrell, S. (2001). An openness scale
for the California Psychological Inventory. Journal
of Personality Assessment, 76, 107134.
Haladyna, T. M., & Downing, S. M. (1989a). A taxonomy of multiple-choice item-writing rules. Applied
Measurement in Education, 2, 3750.
Haladyna, T. M., & Downing, S. M. (1989b). Validity of
a taxonomy of multiple-choice item-writing rules.
Applied Measurement in Education, 2, 5178.
Haladyna, T., & Downing, S. (1994). How many
options is enough for a multiple-choice test item?
Educational and Psychological Measurement, 53,
9991009.
Hale, R. L. (1978). The WISC-R as a predictor of
WRAT performance. Psychology in the Schools, 15,
172175.
Hale, R. L., & Landino, S. A. (1981). Utility of WISC-R subtest analysis in discriminating among groups
of conduct problem, withdrawn, mixed, and nonproblem boys. Journal of Consulting and Clinical Psychology, 49, 9195.
Hall, C. S., & Lindzey, G. (1970). Theories of personality.
New York: Wiley.
Hall, G., Bansal, A., & Lopez, I. (1999). Ethnicity and
psychopathology: A meta-analytic review of 31 years
of comparative MMPI/MMPI-2 research. Psychological Assessment, 11, 186197.
Hall, H. V., & Pritchard, D. A. (1996). Detecting malingering and deception. Delray Beach, FL: St. Lucie
Press.
Haller, N., & Exner, J. E. Jr. (1985). The reliability of
Rorschach variables for inpatients presenting symptoms of depression and/or helplessness. Journal of
Personality Assessment, 49, 516521.
Halpin, G., Halpin, G., & Schaer, B. B. (1981). Relative
effectiveness of the California Achievement Tests
in comparison with the ACT Assessment, College
Board Scholastic Aptitude Test, and High school
grade point average in predicting college grade
point average. Educational and Psychological Measurement, 41, 821827.
Halverson, C. F. (1995). Measurement beyond the individual. In J. C. Conoley & E. B. Werth (Eds.), Family
assessment (pp. 3–18). Lincoln, NE: Buros Institute
of Mental Measurements.
Hambleton, R. K. (1984). Validating the test score. In
R. A. Berk (Ed.), A guide to criterion-referenced test
construction (pp. 199–230). Baltimore, MD: Johns
Hopkins University Press.
Hambleton, R. K., & Murphy, E. (1992). A psychometric perspective on authentic measurement. Applied
Measurement in Education, 5, 116.
Hambleton, R. K., & Swaminathan, H. (1985). Item
response theory: Principles and applications. Boston,
MA: Kluwer-Nijhoff.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23,
56–62.
Hammen, C. L. (1980). Depression in college students:
Beyond the Beck Depression Inventory. Journal of
Consulting and Clinical Psychology, 48, 126128.
Hammer, E. F. (1978). The clinical application of projective drawings. Springfield, IL: Charles C Thomas.
Hampton, N. H. (1986). Luria-Nebraska Neuropsychological Battery (LNNB) Forms I and II microcomputer diskette. Computers in Human Behavior,
2, 9293.
Haney, W. (1981). Validity, vaudeville, and values.
American Psychologist, 36, 1021–1034.
Haney, W., & Madaus, G. (1978). Making sense of
the competency testing movement. Harvard Educational Review, 48, 462484.
Haney, W. M., Madaus, G. F., & Lyons, R. (1993).
The fractured marketplace for standardized testing.
Boston, MA: Kluwer Academic Publishers.
Hanford, G. H. (1985). Yes, the SAT does help colleges.
Harvard Educational Review, 55, 324331.
Hannah, M. T., Domino, G., Figueredo, A. J., & Hendrickson, R. (1996). The prediction of ego integrity
in older persons. Educational and Psychological Measurement, 56, 930950.
Hansen, J. C. (1984). User's guide for the SVIB-SCII.
Stanford, CA: Stanford University Press.
Hansen, J. C. (1986). Strong Vocational Interest
Blank/Strong-Campbell Interest Inventory. In W. B.
Walsh & S. H. Osipow (Eds.), Advances in Vocational Psychology (Vol. 1, pp. 1–29). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Hansen, J. C., & Campbell, D. P. (1985). Manual for the
SVIB-SCII (4th ed.). Stanford, CA: Stanford University Press.
Hansen, R., Young, J., & Ulrey, G. (1982). Assessment considerations with the visually handicapped
child. In G. Ulrey & S. Rogers (Eds.), Psychological
assessment of handicapped infants and young children (pp. 108–114). New York: Thieme-Stratton.
Hanson, S., Buckelew, S. P., Hewett, J., & O'Neal,
G. (1993). The relationship between coping and
adjustment after spinal cord injury: A 5-year
follow-up study. Rehabilitation Psychology, 38,
4152.
Hardy, J. B., Welcher, D. W., Mellits, E. D., & Kagan, J.
(1976). Pitfalls in the measurement of intelligence:
Are standard intelligence tests valid instruments
for measuring the intellectual potential of urban
children? Journal of Psychology, 94, 4351.
Harkness, A. R., McNulty, J. L., & Ben-Porath, Y. S.
(1995). The Personality Psychopathology Five (PSY-5): Constructs and MMPI-2 scales. Psychological
Assessment, 7, 104114.
Harman, H. H. (1960). Modern factor analysis.
Chicago, IL: University of Chicago Press.
Harmon, M. G., Morse, D. T., & Morse, L. W. (1996).
Confirmatory factor analysis of the Gibb experimental test of testwiseness. Educational and Psychological Measurement, 56, 276–286.
Harrell, T. W. (1992). Some history of the Army General Classification Test. Journal of Applied Psychology,
77, 875878.
Harrington, R. G., & Follett, G. M. (1984). The
readability of child personality assessment instruments. Journal of Psychoeducational Assessment, 2,
3748.
Harrington, R. G., & Jennings, V. (1986). Comparison of three short forms of the McCarthy Scales of
Children's Abilities. Contemporary Educational Psychology, 11, 109–116.
Harris, D. B. (1963). Children's drawings as measures
of intellectual maturity. New York: Harcourt, Brace
& World.
Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor
ratings. Personnel Psychology, 41, 4362.
Harrower, M. R., & Herrmann, R. (1953). Psychological
factors in the care of patients with multiple scleroses:
For use of physicians. New York: National Multiple
Sclerosis Society.
Hart, S. D., Roesch, R., Corrado, R. R., & Cox,
D. N. (1993). The Referral Decision Scale. Law and
Human Behavior, 17, 611623.
Hartigan, J. A., & Wigdor, A. K. (1989). Fairness
in employment testing. Washington, DC: National
Academy Press.
Hartlage, L. C. (1987). Diagnostic assessment in rehabilitation. In B. Bolton (Ed.), Handbook of measurement and evaluation in rehabilitation (2nd ed.,
pp. 141–149). Baltimore, MD: Paul H. Brookes.
Hartlage, L. C., & Steele, C. T. (1977). WISC and WISC-R correlates of academic achievement. Psychology in
the Schools, 14, 1518.
Hartley, J., & Holt, J. (1971). A note on the validity
of the Wilson-Patterson measure of conservatism.
British Journal of Social and Clinical Psychology, 10,
8183.
Hartman, A. A. (1970). A basic T.A.T. set. Journal of
Projective Techniques and Personality Assessment, 34,
391396.
Hartman, D. E. (1986). Artificial intelligence or artificial psychologist? Conceptual issues in clinical
microcomputer use. Professional Psychology, 17,
528534.
Hartmann, D. P. (1977). Considerations in the choice
of interobserver reliability estimates. Journal of
Applied Behavior Analysis, 10, 103116.
Hartmann, D. P., Roper, B. L., & Bradford, D. C. (1979).
Source relationships between behavioral and traditional assessment. Journal of Behavioral Assessment,
1, 321.
Hartmann, G. (1938). The differential validity of items
in a Liberalism-Conservatism Test. Journal of Social
Psychology, 9, 6778.
Hartnett, R. T., & Willingham, W. W. (1980). The criterion problem: What measure of success in graduate education? Applied Psychological Measurement,
4, 281291.
Hartshorne, H., & May, M. A. (1928). Studies in deceit.
New York: Macmillan.
Hartshorne, T. S. (1993). Psychometric properties and
confirmatory factor analysis of the UCLA Loneliness
scale. Journal of Personality Assessment, 61, 182195.
Hase, H. D., & Goldberg, L. R. (1967). Comparative
validity of different strategies of constructing personality inventory scales. Psychological Bulletin, 67,
231248.
Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Personality Inventory. Minneapolis, MN: University of Minnesota Press.
Hathaway, S. R., McKinley, J. C., Butcher, J. N.,
Dahlstrom, W. G., Graham, J. R., Tellegen, A., &
Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis, MN: University
of Minnesota Press.
Hathaway, S. R., & Meehl, P. E. (1951). An atlas for the
clinical use of the MMPI. Minneapolis, MN: University of Minnesota Press.
Hawkins, N. C., Davies, R., & Holmes, T. H. (1957).
Evidence of psychosocial factors in the development of pulmonary tuberculosis. American Review
of Tuberculosis and Pulmonary Disorders, 75, 768
780.
Hayden, D. C., Furlong, M. J., & Linnemeyer, S.
(1988). A comparison of the Kaufman Assessment
Battery for Children and the Stanford-Binet IV for
the assessment of gifted children. Psychology in the
Schools, 25, 239243.
Hayes, F. B., & Martin, R. P. (1986). Effectiveness of the
PPVT-R in the screening of young gifted children.
Journal of Psychoeducational Assessment, 4, 2733.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1986).
Evaluating the quality of behavioral assessment. In
R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 463–503). New
York: Guilford Press.
Hayes, S. N., Floyd, F. J., Lemsky, C., Rogers, E., Winemiller, D., Heilman, N., Werle, M., Murphy, T., &
Cardone, L. (1992). The Marital Satisfaction Questionnaire for older persons. Psychological Assessment, 4, 473482.
Hayes, S. N., Richard, D. C. S., & Kubany, E. S. (1995).
Content validity in psychological assessment: A
functional approach to concepts and methods. Psychological Assessment, 7, 238247.
Hayes, S. P. (1929). The new revision of the Binet Intelligence Tests for the blind. Teachers Forum, 2, 24.
Haynes, J. P., & Bensch, M. (1981). The PV sign on the
WISC-R and recidivism in delinquents. Journal of
Consulting and Clinical Psychology, 49, 480481.
Haynes, S. N., & Wilson, C. C. (1979). Behavioral
assessment. San Francisco, CA: Jossey-Bass.
Hays, D. G., & Borgatta, E. P. (1954). An empirical
comparison of restricted and general latent distance
analysis. Psychometrika, 19, 271279.
Hays, R. D., & DiMatteo, M. R. (1987). A short-form
measure of loneliness. Journal of Personality Assessment, 51, 6981.
Hayslip, B., Jr. (1984). Idiographic assessment of the
self in the aged: A case for the use of the Q-sort. International Journal of Aging and Human Development,
20, 293311.
Heath, R. L., & Fogel, D. S. (1978). Terminal and
instrumental? An inquiry into Rokeach's Value Survey. Psychological Reports, 42, 1147–1154.
Heaton, R. K., Grant, I., Anthony, W. Z., & Lehman,
R. A. W. (1981). A comparison of clinical and automated interpretation of the Halstead-Reitan battery.
Journal of Clinical Neuropsychology, 3, 121141.
Heaton, R. K., Smith, H. H., Jr., Lehman, R. A. W.,
& Vogt, A. J. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal
of Consulting and Clinical Psychology, 46, 892900.
Heaton, R. K., Grant, I., & Matthews, C. G. (1991).
Comprehensive norms for an expanded Halstead-Reitan Battery. Odessa, FL: Psychological Assessment Resources.
Heaton, S. M., Nelson, A. V., & Nester, M. A. (1980).
Guide for administering examinations to handicapped individuals for employment purposes (PRR
8016). Washington, DC: Personnel Research and
Development Center.
Heckhausen, H. (1967). The anatomy of achievement
motivation. New York: Academic Press.
Hedlund, J. L., & Vieweg, B. W. (1979). The Zung Self-Rating Depression Scale: A comprehensive review.
Journal of Operational Psychiatry, 10, 5164.
Heil, J., Barclay, A., & Endres, J. M. (1978). A factor analytic study of WPPSI scores of educationally
deprived and normal children. Psychological Reports,
42, 727730.
Heilbrun, A. B., Jr. (1972). Review of the EPPS. In
O. K. Buros (Ed.), The seventh mental measurements
yearbook (Vol. 1, pp. 148–149). Highland Park, NJ:
Gryphon Press.
Heilbrun, A. B., & Goodstein, L. D. (1961). The relationships between individually defined and group
defined social desirability and performance on the
Edwards Personal Preference Schedule. Journal of
Consulting Psychology, 25, 200204.
Heilbrun, K. (1992). The role of psychological testing in
forensic assessment. Law and Human Behavior, 16,
257272.
Heim, A. W. (1975). Psychological testing. London:
Oxford University Press.
Helmes, E., & Reddon, J. R. (1993). A perspective on
developments in assessing psychopathology: A critical review of the MMPI and MMPI-2. Psychological
Bulletin, 113, 453471.
Helms, J. E. (1992). Why is there no study of cultural
equivalence in standardized cognitive ability testing?
American Psychologist, 47, 10831101.
Hendershott, J. L., Searight, H. R., Hatfield, J. L.,
& Rogers, B. J. (1990). Correlations between the
Stanford-Binet, fourth edition and the Kaufman
Assessment Battery for Children for a preschool
sample. Perceptual and Motor Skills, 71, 819
825.
Henderson, R. W., & Rankin, R. J. (1973). WPPSI reliability and predictive validity with disadvantaged
Mexican-American children. Journal of School Psychology, 11, 1620.
Henerson, M. E., Morris, L. L., & Fitz-Gibbon, C. T.
(1987). How to measure attitudes. Newbury Park,
CA: Sage.
Henry, P., Bryson, S., & Henry, C. A. (1990). Black
student attitudes toward standardized tests: Does
gender make a difference? College Student Journal,
23, 346354.
Henry, W. E. (1956). The analysis of fantasy: The thematic apperception technique in the study of personality. New York: Wiley.
Herrmann, D. J. (1982). Know thy memory: The use
of questionnaires to assess and study memory. Psychological Bulletin, 92, 434452.
Hersen, M. (1973). Self-assessment of fear. Behavior
Therapy, 4, 241257.
Hersen, M., & Bellack, A. S. (1977). Assessment of
social skills. In A. R. Ciminero, K. S. Calhoun, &
H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 509–554). New York: Wiley.
Hersen, M., & Bellack, A. S. (Eds.). (1981). Behavioral
assessment. Elmsford, NY: Pergamon.
Hertzog, C., & Schear, J. M. (1989). Psychometric considerations in testing the older person. In T. Hunt &
C. J. Lindley (Eds.), Testing older adults (pp. 24–50).
Austin, TX: Pro-ed.
Herzog, A. R., & Rodgers, W. L. (1981). Age
and satisfaction: Data from several large surveys.
Research on Aging, 3, 142165.
Hess, A. K. (1985). Review of Millon Clinical Multiaxial
Inventory. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 984–986).
Lincoln, NE: University of Nebraska Press.
Hess, D. W. (1969). Evaluation of the young deaf adult.
Journal of Rehabilitation of the Deaf, 3, 621.
Hess, E. H. (1965). Attitude and pupil size. Scientific
American, 212, 4654.
Hess, E. H., & Polt, J. M. (1960). Pupil size as related
to interest value of visual stimuli. Science, 132, 349–350.
Hetzler, S. A. (1954). Radicalism-Conservatism and
social mobility. Social Forces, 33, 161166.
Heveren, V. W. (1980). Recent validity studies of the
Halstead-Reitan approach to clinical neuropsychological assessment. Clinical Neuropsychology, 2, 49–61.
Hevner, K. (1930). An empirical study of three psychophysical methods. Journal of General Psychology,
4, 191212.
Hilgard, E. R. (1957). Lewis Madison Terman: 1877–1956. American Journal of Psychology, 70, 472–479.
Hill, A. B., Kemp-Wheeler, S. M., & Jones, S. A. (1986).
What does Beck Depression Inventory measure in
students? Personality and Individual Differences, 7,
3947.
Hill, K. T. (1972). Anxiety in the evaluative context.
In W. Hartup (Ed.), The young child (Vol. 2).
Washington, DC: National Association for the Education of Young Children.
Hill, K. T., & Sarason, S. B. (1966). The relation of test
anxiety and defensiveness to test and school performance over the elementary school years: A further longitudinal study. Monographs of the Society
for Research in Child Development, 31 (No. 2).
Hiller, J. B., Rosenthal, R., Bornstein, R., Berry, D.,
& Brunell-Neuleib, S. (1999). A comparative metaanalysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278296.
Hilliard, A. G. (1975). The strengths and weaknesses of
cognitive tests for young children. In J. D. Andrews
(Ed.), One child indivisible (pp. 17–33). Washington, DC: National Association for the Education of
Young Children.
Hills, J. R., Bush, M., & Klock, J. A. (1964). Predicting grades beyond the freshman year. College Board
Review, 54, 2224.
Hilton, T. L., & Korn, J. H. (1964). Measured change in
personal values. Educational and Psychological Measurement, 24, 609622.
Himmelfarb, S. (1984). Age and sex differences in the
mental health of older persons. Journal of Consulting
and Clinical Psychology, 52, 844856.
Hinckley, E. D. (1932). The influence of individual
opinion on construction of an attitude scale. Journal
of Social Psychology, 3, 283296.
Hirshoren, A., Hurley, O. L., & Hunt, J. T. (1977). The
WISC-R and the Hiskey-Nebraska Test with deaf
children. American Annals of the Deaf, 122, 392394.
Hirshoren, A., Hurley, O. L., & Kavale, K. (1979).
Psychometric characteristics of the WISC-R Performance Scale with deaf children. Journal of Speech
and Hearing Disorders, 44, 7379.
Hiscock, M. (1978). Imagery assessment through self-report: What do imagery questionnaires measure?
Journal of Consulting and Clinical Psychology, 46,
223230.
Hiskey, M. S. (1966). Hiskey-Nebraska Test of Learning
Aptitude. Lincoln, NE: Union College Press.
Hobbs, S. A., & Walle, D. L. (1985). Validation of the
Children's Assertive Behavior Scale. Journal of Psychopathology and Behavioral Assessment, 7, 145–153.
Hocevar, D. (1979). A comparison of statistical infrequency and subjective judgment as criteria in the
measurement of originality. Journal of Personality
Assessment, 43, 297299.
Hodges, W. F., & Spielberger, C. D. (1966). The effects
of threat of shock on heart rate for subjects who
differ in manifest anxiety and fear of shock. Psychophysiology, 2, 287294.
Hoemann, H. W. (1972). Communication accuracy
in a sign-language interpretation of a group test.
Journal of Rehabilitation of the Deaf, 5, 4043.
Hoffman, R. A., & Gellen, M. I. (1983). The Tennessee
Self Concept Scale: A revisit. Psychological Reports,
53, 11991204.
Hogan, R. (1969). Development of an empathy scale.
Journal of Consulting and Clinical Psychology, 33,
307316.
Hogan, R. (1989a). Review of the Personality Research
Form (3rd ed.). In J. C. Conoley & J. J. Kramer
(Eds.), The tenth mental measurements yearbook
(pp. 632–633). Lincoln, NE: University of Nebraska
Press.
Hogan, R. (1989b). Review of the NEO Personality Inventory. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 546–547). Lincoln, NE: University of Nebraska Press.
Hogan, R. (1990). What kinds of tests are useful in organizations? In J. Hogan & R. Hogan (Eds.), Business and industry testing (pp. 22–35). Austin, TX: Pro-Ed.
Hogan, R., DeSoto, C. B., & Solano, C. (1977). Traits, tests, and personality research. American Psychologist, 32, 255–264.
Hogan, R. T. (1983). A socioanalytic theory of personality. In M. Page (Ed.), 1982 Nebraska Symposium on Motivation (pp. 55–89). Lincoln, NE: University of Nebraska Press.
Hoge, D. R., & Bender, I. E. (1974). Factors influencing value change among college graduates in adult life. Journal of Personality and Social Psychology, 29, 572–585.
Hoge, S. K., Bonnie, R. J., Poythress, N., & Monahan, J. (1999). The MacArthur Competence Assessment Tool – Criminal Adjudication. Odessa, FL: Psychological Assessment Resources.
Hojat, M. (1982). Psychometric characteristics of the UCLA Loneliness Scale: A study with Iranian college students. Educational and Psychological Measurement, 42, 917–925.
Holaday, M., Smith, D. A., & Sherry, A. (2000). Sentence completion tests: A review of the literature and results of a survey of members of the Society for Personality Assessment. Journal of Personality Assessment, 74, 371–383.
Holahan, C. K., & Holahan, C. J. (1987). Life stress, hassles, and self-efficacy in aging: A replication and extension. Journal of Applied Social Psychology, 17, 574–592.
Holden, R. R., & Fekken, G. C. (1989). Three common social desirability scales: Friends, acquaintances, or strangers. Journal of Research in Personality, 23, 180–191.
Holland, J. L. (1966). The psychology of vocational choice. Waltham, MA: Blaisdell.
Holland, J. L. (1973). Making vocational choices: A theory of careers. Englewood Cliffs, NJ: Prentice-Hall.
Holland, J. L. (1985a). Making vocational choices: A theory of vocational personalities and work environments (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Holland, J. L. (1985b). Professional manual for the Self-Directed Search. Odessa, FL: Psychological Assessment Resources.
Holland, J. L., & Nichols, R. C. (1964). Prediction of academic and extracurricular achievement in college. Journal of Educational Psychology, 55, 55–65.
Hollandsworth, J. G., Galassi, J. P., & Gay, M. L. (1977). The Adult Self Expression Scale: Validation by the multitrait-multimethod procedure. Journal of Clinical Psychology, 33, 407–415.
Hollenbeck, G. P., & Kaufman, A. S. (1973). Factor analysis of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI). Journal of Clinical Psychology, 29, 41–45.
Hollinger, R. D., & Clark, J. P. (1983). Theft by employees. Lexington, MA: Lexington Books.
Hollrah, J. L., Schlottmann, R. S., Scott, A. B., & Brunetti, D. G. (1995). Validity of the MMPI subtle items. Journal of Personality Assessment, 65, 278–299.
Holm, C. S. (1987). Testing for values with the deaf: The language/cultural effect. Journal of Rehabilitation of the Deaf, 20, 7–19.
Holmes, T. H., & Rahe, R. H. (1967). The Social Readjustment Rating Scale. Journal of Psychosomatic Research, 11, 213–218.
Holt, R. R. (1951). The Thematic Apperception Test. In H. H. Anderson & G. L. Anderson (Eds.), An introduction to projective techniques. Englewood Cliffs, NJ: Prentice Hall.
Holt, R. R. (1958). Clinical and statistical prediction: A reformulation and some new data. Journal of Abnormal and Social Psychology, 56, 1–12.
Holt, R. R. (1971). Assessing personality. New York: Harcourt Brace Jovanovich.
Holtzman, W. H. (1968). Cross-cultural studies in psychology. International Journal of Psychology, 3, 83–91.
Holtzman, W. H., Thorpe, J. S., Swartz, J. D., & Herron, E. W. (1961). Inkblot perception and personality: Holtzman Inkblot Technique. Austin, TX: University of Texas Press.
Honaker, L. M., Hector, V. S., & Harrell, T. H. (1986). Perceived validity of computer- vs. clinician-generated MMPI reports. Computers in Human Behavior, 2, 77–83.
Hong, K. E., & Holmes, T. H. (1973). Transient diabetes mellitus associated with culture change. Archives of General Psychiatry, 29, 683–687.
Hopkins, K. D., & McGuire, L. (1966). Mental measurement of the blind: The validity of the Wechsler Intelligence Scale for Children. International Journal for the Education of the Blind, 15, 65–73.
Hops, H., & Lewin, L. (1984). Peer sociometric forms. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment (pp. 124–147). New York: Pergamon.
Horiguchi, J., & Inami, Y. (1991). A survey of the living conditions and psychological states of elderly people admitted to nursing homes in Japan. Acta Psychiatrica Scandinavica, 83, 338–341.
Horn, J. (1986). Intellectual ability concepts. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 3). Hillsdale, NJ: Erlbaum.
Horn, J. L., Wanberg, K. W., & Foster, F. M. (1987). Guide to the Alcohol Use Inventory. Minneapolis, MN: National Computer Systems.
Hornick, C. W., James, L. R., & Jones, A. P. (1977). Empirical item keying vs. a rational approach to analyzing a psychological climate questionnaire. Applied Psychological Measurement, 1, 489–500.
Horowitz, M. D., Wilner, N., & Alvarez, W. (1979). Impact of event scale: A measure of subjective stress. Psychosomatic Medicine, 41, 209–218.
Hough, L. M. (1992). The Big Five personality variables – construct confusion: Description vs. prediction. Human Performance, 5, 139–155.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581–585.
House, A. E., House, B. J., & Campbell, M. B. (1981). Measures of interobserver agreement: Calculation formulas and distribution effects. Journal of Behavioral Assessment, 3, 37–57.
House, J. D., & Johnson, J. J. (1993a). Graduate Record Examination scores and academic background variables as predictors of graduate degree completion. Educational and Psychological Measurement, 35, 551–556.
House, J. D., & Johnson, J. J. (1993b). Predictive validity of the Graduate Record Examination Advanced Psychology Test for graduate grades. Psychological Reports, 73, 184–186.
House, J. D., Johnson, J. J., & Tolone, W. L. (1987). Predictive validity of the Graduate Record Examination for performance in selected graduate psychology courses. Psychological Reports, 60, 107–110.
Houston, L. N. (1980). Predicting academic achievement among specially admitted black female college students. Educational and Psychological Measurement, 40, 1189–1195.
Houtz, J. C., & Shaning, D. J. (1982). Contribution of teacher ratings of behavioral characteristics to the prediction of divergent thinking and problem solving. Psychology in the Schools, 19, 380–383.
Hoyt, D. R., & Creech, J. C. (1983). The Life Satisfaction Index: A methodological and theoretical critique. Journal of Gerontology, 38, 111–116.
Hu, S., & Oakland, T. (1991). Global and regional perspectives on testing children and youth: An empirical study. International Journal of Psychology, 26, 329–344.
Hubert, N. C., Wachs, T. D., Peters-Martin, P., & Gandour, M. J. (1982). The study of early temperament: Measurement and conceptual issues. Child Development, 53, 571–600.
Hudgens, R. W. (1974). Personal catastrophe and depression: A consideration of the subject with respect to medically ill adolescents, and a requiem for retrospective life-event studies. In B. S. Dohrenwend & B. P. Dohrenwend (Eds.), Stressful life events: Their nature and effects (pp. 119–134). New York: Wiley.
Hui, C. H., & Triandis, H. C. (1985). Measurement in cross-cultural psychology. Journal of Cross-Cultural Psychology, 16, 131–152.
Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20, 296–309.
Huitema, B. E., & Stein, C. R. (1993). Validity of the GRE without restriction of range. Psychological Reports, 72, 123–127.
Hull, J. G., Van Treuren, R. R., & Virnelli, S. (1987). Hardiness and health: A critique and alternative approach. Journal of Personality and Social Psychology, 53, 518–530.
Humphreys, L. G. (1962). The organization of human abilities. American Psychologist, 17, 475–483.
Humphreys, L. G. (1985). Review of the System of Multicultural Pluralistic Assessment. In J. V. Mitchell (Ed.), The ninth mental measurements yearbook (pp. 1517–1519). Lincoln, NE: University of Nebraska Press.
Hunsley, J., Vito, D., Pinsent, C., James, S., & Lefebvre, M. (1996). Are self-report measures of dyadic relationships influenced by impression management biases? Journal of Family Psychology, 10, 322–330.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340–362.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–99.
Hunter, J. E., Schmidt, F. L., & Hunter, R. F. (1979). Differential validity of employment tests by race: A comprehensive review and analysis. Psychological Bulletin, 86, 721–735.
Hunter, J. E., Schmidt, F. L., & Rauschenberger, J. M. (1977). Fairness of psychological tests: Implications of four definitions for selection utility and minority hiring. Journal of Applied Psychology, 62, 245–260.
Huntley, C. W. (1965). Changes in Study of Values scores during the four years of college. Genetic Psychology Monographs, 71, 349–383.
Hurtz, G. M., & Hertz, N. M. R. (1999). How many raters should be used for establishing cutoff scores with the Angoff method? A generalizability theory study. Educational and Psychological Measurement, 59, 885–897.
Hutt, M. (1977). The Hutt adaptation of the Bender-Gestalt (3rd ed.). New York: Grune & Stratton.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107, 139–155.
Hynd, G. W. (1988). Neuropsychological assessment in clinical child psychology. Newbury Park, CA: Sage.
Illerbrun, D., Haines, L., & Greenough, P. (1985). Language identification screening test for kindergarten: A comparison with four screening and three diagnostic language tests. Language, Speech and Hearing Services in Schools, 16, 280–292.
Inglehart, R. (1985). Aggregate stability and individual-level flux in mass belief systems: The level of analysis paradox. American Political Science Review, 79, 97–116.
Insko, C. A., & Schopler, J. (1967). Triadic consistency: A statement of affective-cognitive-conative consistency. Psychological Review, 74, 361–376.
Inwald, R., Knatz, H., & Shusman, E. (1983). Inwald Personality Inventory manual. New York: Hilson Research.
Ireton, H., & Thwing, E. (1979). Minnesota Preschool Inventory. Minneapolis, MN: Behavior Science Systems.
Ironson, G. H., & Davis, G. A. (1979). Faking high or low creativity scores on the Adjective Check List. Journal of Creative Behavior, 13, 139–145.
Irvine, S. H. (1969). Figural tests of reasoning in Africa. International Journal of Psychology, 4, 217–228.
Irwin, R. B. (1914). A Binet scale for the blind. New Outlook for the Blind, 8, 95–97.
Iwawaki, S., & Cowen, E. L. (1964). The social desirability of trait descriptive terms: Applications to a Japanese sample. Journal of Social Psychology, 63, 199–205.
Iwao, S., & Triandis, H. C. (1993). Validity of auto- and heterostereotypes among Japanese and American students. Journal of Cross-Cultural Psychology, 24, 428–444.
Jaccard, J. (1981). Attributes and behavior: Implications of attitudes towards behavioral alternatives. Journal of Experimental Social Psychology, 17, 286–307.
Jackson, D. N. (1967). Personality research form manual. Goshen, NY: Research Psychologists Press.
Jackson, D. N. (1970). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61–96). New York: Academic Press.
Jackson, D. N. (1977). Manual for the Jackson Vocational Interest Survey. Port Huron, MI: Research Psychologists Press.
Jackson, D. N. (1984). Personality Research Form manual (3rd ed.). Port Huron, MI: Research Psychologists Press.
Jackson, D. N., & Messick, S. (1958). Content and style in personality assessment. Psychological Bulletin, 55, 243–252.
Jackson, D. N., & Minton, H. (1963). A forced-choice adjective preference scale for personality assessment. Psychological Reports, 12, 515–520.
Jackson, J. L. (1999). Psychometric considerations in self-monitoring assessment. Psychological Assessment, 11, 439–447.
Jacob, S., & Brantley, J. C. (1987). Ethical-legal problems with computer use and suggestions for best practices: A national survey. School Psychology Review, 16, 69–77.
Jacobs, J. W., Bernhard, M. R., Delgado, A., & Strain, J. J. (1977). Screening for organic mental syndromes in the medically ill. Annals of Internal Medicine, 86, 40–46.
Jacobson, L. I., Kellogg, R. W., Cauce, A. M., & Slavin, R. S. (1977). A multidimensional social desirability inventory. Bulletin of the Psychonomic Society, 9, 109–110.
Jacobson, L. I., Prio, M. A., Ramirez, M. A., Fernandez, A. J., & Hevia, M. L. (1978). Construcción y validación de una prueba de inteligencia para Cubanos. Interamerican Journal of Psychology, 12, 39–45.
Jaeger, R. M. (1985). Review of Graduate Record Examinations. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 624–626). Lincoln, NE: University of Nebraska Press.
Jaeger, R. M., Linn, R. L., & Tesh, A. S. (1989). A synthesis of research on some psychometric properties of the GATB. In J. A. Hartigan & A. K. Wigdor (Eds.), Fairness in employment testing (pp. 303–324). Washington, DC: National Academy Press.
Janda, L., & Galbraith, G. (1973). Social desirability and adjustment in the Rotter Incomplete Sentences Blank. Journal of Consulting and Clinical Psychology, 40, 337.
Janis, I. (1980). Personality differences in decision making under stress. In K. Blandenstein, P. Pliner, & J. Polivy (Eds.), Assessment and modification of emotional behavior (pp. 165–189). New York: Plenum.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11, 1–47.
Jastak, J. F., & Jastak, J. R. (1964). Short forms of the WAIS and WISC Vocabulary subtests. Journal of Clinical Psychology, 20, 167–199.
Jastak, J. F., & Jastak, S. R. (1972). Manual for the Wide Range Interest Opinion Test. Wilmington, DE: Guidance Associates of Delaware.
Jastak, J. F., & Jastak, S. (1979). Wide Range Interest-Opinion Test. Wilmington, DE: Jastak Associates.
Jastrow, J. (1930). Autobiography. In C. Murchison (Ed.), A history of psychology in autobiography (Vol. 1, pp. 135–162). Worcester, MA: Clark University Press.
Jencks, C. (1972). Inequality: A reassessment of the effect of family and schooling in America. New York: Harper & Row.
Jenkins, C. D., Rosenman, R. H., & Friedman, M. (1967). Development of an objective psychological test for the determination of the coronary-prone behavior pattern in employed men. Journal of Chronic Diseases, 20, 371–379.
Jenkins, C. D., Rosenman, R. H., & Zyzanski, S. J. (1974). Prediction of clinical coronary-prone behavior pattern. New England Journal of Medicine, 290, 1271–1275.
Jenkins, C. D., & Zyzanski, S. J. (1980). Behavioral risk factors and coronary heart disease. Psychotherapy and Psychosomatics, 34, 149–177.
Jensema, C. (1975a). A statistical investigation of the 16PF, Form E as applied to hearing-impaired college students. Journal of Rehabilitation of the Deaf, 9, 21–29.
Jensema, C. (1975b). Reliability of the 16 PF Form E for hearing-impaired college students. Journal of Rehabilitation of the Deaf, 8, 14–18.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185–244.
Jensen, A. R. (1976). Test bias and construct validity. Phi Delta Kappan, 58, 340–346.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Jensen, A. R. (1984). The black-white difference on the K-ABC: Implications for future tests. The Journal of Special Education, 18, 377–408.
Jensen, A. R. (1987). The g beyond factor analysis. In R. R. Ronning, J. C. Conoley, J. A. Glover, & J. C. Witt (Eds.), The influence of cognitive psychology on testing (pp. 87–142). Hillsdale, NJ: Erlbaum.
Jensen, A. R. (1998). The g factor. Westport, CT: Praeger.
Jensen, J. P., & Bergin, A. E. (1988). Mental health values of professional therapists: A national interdisciplinary survey. Professional Psychology: Research and Practice, 19, 290–297.
Joe, V. C. (1971). Review of the internal-external control construct as a personality variable. Psychological Reports, 28, 619–640.
Joe, V. C. (1974). Personality correlates of conservatism. Journal of Social Psychology, 93, 309–310.
Joe, V. C., & Kostyla, S. (1975). Social attitudes and
sexual behaviors of college students. Journal of Consulting and Clinical Psychology, 43, 430.
Johansson, C. B. (1975). Manual for the Career Assessment Inventory. Minneapolis, MN: National Computer Systems.
Johansson, C. B. (1986). Manual for the Career
Assessment Inventory-Enhanced Version. Minneapolis, MN: National Computer Systems.
Cavanaugh, J. C., & Whitbourne, S. K. (Eds.). (1999). Gerontology: An interdisciplinary perspective. New York: Wiley.
Johnson, D. G., Lloyd, S. M., Jr., Jones, R. F., & Anderson, J. (1986). Predicting academic performance at a predominantly Black medical school. Journal of Medical Education, 61, 629–639.
Johnson, D. F., & White, C. B. (1980). Effects of training on computerized test performance in the elderly. Journal of Applied Psychology, 65, 357–358.
Johnson, D. L., & McGowan, R. J. (1984). Comparison of three intelligence tests as predictors of academic achievement and classroom behaviors of Mexican-American children. Journal of Psychoeducational Assessment, 2, 345–352.
Johnson, E. G. (1992). The design of the National Assessment of Educational Progress. Journal of Educational Measurement, 29, 95–110.
Johnson, J. H., & Overall, J. E. (1973). Factor analysis of the Psychological Screening Inventory. Journal of Consulting and Clinical Psychology, 41, 57–60.
Johnson, J. H., & Williams, T. A. (1980). Using online computer technology in a mental health admitting system. In J. B. Sidowski, J. H. Johnson, & T. A. Williams (Eds.), Technology in mental health care delivery systems (pp. 237–249). Norwood, NJ: Ablex.
Johnson, J. R., Null, C., Butcher, J. N., & Johnson, K. N. (1984). Replicated item level factor analysis of the full MMPI. Journal of Personality and Social Psychology, 47, 105–114.
Johnson, R. W. (1972). Contradictory scores on the Strong Vocational Interest Blank. Journal of Counseling Psychology, 19, 487–490.
Johnson, W. L., & Johnson, A. M. (1993). Validity of the quality of school life scale: A primary and second-order factor analysis. Educational and Psychological Measurement, 53, 145–153.
Johnson, W. R. (1981). Basic interviewing skills. In C. E. Walker (Ed.), Clinical practice of psychology (pp. 83–128). New York: Pergamon Press.
Joncich, G. (1966). Complex forces and neglected acknowledgments in the making of a young psychologist: Edward L. Thorndike and his teachers. Journal of the History of the Behavioral Sciences, 2, 43–50.
Jones, D. H., & Ragosta, M. (1982). Predictive validity of the SAT on two handicapped groups: The deaf and the learning disabled (829). Princeton, NJ: Educational Testing Service.
Jones, E. E., & Sigall, H. (1971). The bogus pipeline: A new paradigm for measuring affect and attitude. Psychological Bulletin, 76, 349–364.
Jones, J. W. (Ed.). (1991). Preemployment honesty testing. New York: Quorum Books.
Jones, R. F. (1986). The effect of commercial coaching courses on performance on the MCAT. Journal of Medical Education, 61, 273–284.
Jones, R. F., & Adams, L. N. (1982). An annotated bibliography of research on the Medical College Admission Test. Washington, DC: Association of American Medical Colleges.
Jones, R. F., & Thomae-Forgues, M. (1984). Validity of the MCAT in predicting performance in the first two years of medical school. Journal of Medical Education, 59, 455–464.
Joynson, R. B. (1989). The Burt affair. London: Routledge.
Julian, J., Sotile, R., Henry, M., & Sotile, P. (1991). The Family Apperception Test: Manual. New York: Authors.
Jung, C. G. (1910). The association method. American Journal of Psychology, 21, 219–235.
Jung, C. G. (1923). Psychological types. New York: Harcourt, Brace, & Co.
Justen, J. E., & Brown, G. (1977). Definitions of severely handicapped: A survey of state departments of education. AAESPH Review, 2, 8–14.
Kabacoff, R. I., Miller, I. W., Bishop, D. S., Epstein, N. B., & Keitner, G. I. (1990). A psychometric study of the McMaster Family Assessment Device in psychiatric, medical, and nonclinical samples. Journal of Family Psychology, 3, 431–439.
Kahana, E., Fairchild, T., & Kahana, B. (1982). Adaptation. In D. J. Mangen & W. A. Peterson (Eds.), Research instruments in social gerontology (Vol. 1, pp. 145–193). Minneapolis, MN: University of Minnesota Press.
Kahle, L. (1984). Attitudes and social adaptation. Oxford, UK: Pergamon.
Kahn, M., Fox, H. M., & Rhode, R. (1988). Detecting faking on the Rorschach: Computer vs. expert clinical judgment. Journal of Personality Assessment, 52, 516–523.
Kahn, M. W., Fox, H., & Rhode, R. (1990). Detecting faking on the Rorschach: Computer versus expert clinical judgment. A reply to Cohen. Journal of Personality Assessment, 54, 63–66.
Kahn, R. L., Goldfarb, A. I., Pollack, M., & Peck, A. (1960). Brief objective measures for the determination of mental status in the aged. American Journal of Psychiatry, 117, 326–328.
Kalat, J. W., & Matlin, M. W. (2000). The GRE Psychology test: A useful but poorly understood test. Teaching of Psychology, 27, 24–27.
Kamphaus, R. W. (1993). Clinical assessment of children's intelligence. Boston, MA: Allyn & Bacon.
Kamphaus, R. W., & Lozano, R. (1984). Developing local norms for individually administered tests. School Psychology Review, 13, 491–498.
Kamphaus, R. W., & Pleiss, K. L. (1991). Draw-a-Person techniques: Tests in search of a construct. Journal of School Psychology, 29, 395–401.
Kamphaus, R. W., & Reynolds, C. R. (1987). Clinical and research applications of the K-ABC. Circle Pines, MN: American Guidance Service.
Kangas, J., & Bradway, K. (1971). Intelligence at middle-age: A thirty-eight year follow up. Developmental Psychology, 5, 333–337.
Kanner, A. D., Coyne, J. C., Schaefer, C., & Lazarus, R. S. (1981). Comparison of two modes of stress measurement: Daily hassles and uplifts vs. major life events. Journal of Behavioral Medicine, 4, 1–39.
Kanner, A. D., Feldman, S. S., Weinberger, D. A., & Ford, M. E. (1987). Uplifts, hassles, and adaptational outcomes in early adolescents. Journal of Early Adolescence, 7, 371–394.
Kantor, J. E., Walker, C. E., & Hays, L. (1976). A study of the usefulness of Lanyon's Psychological Screening Inventory with adolescents. Journal of Consulting and Clinical Psychology, 44, 313–316.
Kaplan, H. E., & Alatishe, M. (1976). Comparison of ratings by mothers and teachers on preschool children using the Vineland Social Maturity Scale. Psychology in the Schools, 13, 27–28.
Kaplan, R. (1977). Patterns of environmental preference. Environment and Behavior, 9, 195–216.
Kaplan, R. M. (1982). Nader's raid on the testing industry. American Psychologist, 37, 15–23.
Kardash, C. A., Amlund, J. T., & Stock, W. A. (1986). Structural analysis of Paivio's Individual Differences Questionnaire. Journal of Experimental Education, 55, 33–38.
Karnes, F. A., May, B., & Lee, L. A. (1982). Correlations between scores on Form A, Form B, and Form A+B of the Culture Fair Intelligence Test for economically disadvantaged students. Psychological Reports, 51, 417–418.
Karr, S. K., Carvajal, H., & Palmer, B. L. (1992). Comparison of Kaufman's short form of the McCarthy Scales of Children's Abilities and the Stanford-Binet Intelligence Scale: Fourth Edition. Perceptual and Motor Skills, 74, 1120–1122.
Kasmar, J. V. (1970). The development of a usable lexicon of environmental descriptors. Environment and Behavior, 2, 153–169.
Kassin, S. M., & Wrightsman, L. S. (1983). The construction and validation of a juror bias scale. Journal of Research in Personality, 17, 423–442.
Kaszniak, A. W. (1989). Psychological assessment of the aging individual. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (3rd ed., pp. 427–445). New York: Academic Press.
Katz, E. (1955). Success on Stanford-Binet Intelligence Scale test items of children with cerebral palsy as compared with non-handicapped children. Cerebral Palsy Review, 16, 18–19.
Katz, J. N., Larson, M. G., Phillips, C. B., Fossel, A. H., & Liang, M. H. (1992). Comparative measurement sensitivity of short and longer health status instruments. Medical Care, 30, 917–925.
Katz, L., & Dalby, J. T. (1981). Computer and manual administration of the Eysenck Personality Inventory. Journal of Clinical Psychology, 37, 586–588.
Katz, S., Ford, A. B., Moskowitz, R. W., Jackson, B. A., & Jaffe, M. W. (1963). The index of ADL: A standardized measure of biological and psychosocial function. Journal of the American Medical Association, 185, 914–919.
Kauffman, J. M. (1989). Characteristics of behavior disorders of children and youth (4th ed.). Columbus, OH: Merrill.
Kaufman, A. S. (1972). A short form of the Wechsler Preschool and Primary Scale of Intelligence. Journal of Consulting and Clinical Psychology, 39, 311–369.
Kaufman, A. S. (1973). The relationship of WPPSI IQs to SES and other background variables. Journal of Clinical Psychology, 29, 354–357.
Kaufman, A. S. (1975). Factor structure of the McCarthy Scales at five age levels between 2½ and 8½. Educational and Psychological Measurement, 35, 641–656.
Kaufman, A. S. (1977). A McCarthy short form for rapid screening of preschool, kindergarten, and first-grade children. Contemporary Educational Psychology, 2, 149–157.
Kaufman, A. S. (1979a). Intelligent testing with the WISC-R. New York: Wiley.
Kaufman, A. S. (1979b). WISC-R research: Implications for interpretation. The School Psychology Digest, 8, 5–27.
Kaufman, A. S. (1982). An integrated review of almost a decade of research on the McCarthy Scales. In T. R. Kratochwill (Ed.), Advances in school psychology (Vol. II, pp. 119–170). Hillsdale, NJ: Erlbaum.
Kaufman, A. S. (1983). Some questions and answers about the Kaufman Assessment Battery for Children (K-ABC). Journal of Psychoeducational Assessment, 1, 205–218.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston, MA: Allyn & Bacon.
Kaufman, A. S. (2000). Intelligence tests and school psychology: Predicting the future by studying the past. Psychology in the Schools, 37, 7–16.
Kaufman, A. S., & Applegate, B. (1988). Short forms of the K-ABC Mental Processing and Achievement scales at ages 4 to 12½ years for clinical and screening purposes. Journal of Clinical Child Psychology, 17, 359–369.
Kaufman, A. S., & Hollenbeck, G. P. (1974). Comparative structure of the WPPSI for blacks and whites. Journal of Clinical Psychology, 30, 316–319.
Kaufman, A. S., Ishikuma, T., & Kaufman-Packer, J. L. (1991). Amazingly short forms of the WAIS-R. Journal of Psychoeducational Assessment, 9, 4–15.
Kaufman, A. S., & Kaufman, N. L. (1975). Social class differences on the McCarthy Scales for black and white children. Perceptual and Motor Skills, 41, 205–206.
Kaufman, A. S., & Kaufman, N. L. (1977). Clinical evaluation of young children with the McCarthy Scales. New York: Grune & Stratton.
Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC interpretive manual. Circle Pines, MN: American Guidance Service.
Kaufman, G. (1983). How good are imagery questionnaires? A rejoinder to David Marks. Scandinavian Journal of Psychology, 24, 247–249.
Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427–452.
Kearns, N. P., Cruickshank, C., McGuigan, K., Riley, S., Shaw, S., & Snaith, R. (1982). A comparison of depression rating scales. British Journal of Psychiatry, 141, 45–49.
Keating, D. P., & MacLean, D. J. (1987). Cognitive processing, cognitive ability, and development: A reconsideration. In P. A. Vernon (Ed.), Speed of information processing and intelligence. New York: Ablex.
Keesling, J. W. (1985). Review of USES General Aptitude Test Battery. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1645–1647). Lincoln, NE: University of Nebraska Press.
Keith, T. (1985). McCarthy Scales of Children's Abilities. In D. Keyser & R. Sweetland (Eds.), Test critiques: Vol. IV. Austin, TX: PRO-ED.
Keith, T. Z. (1990). Confirmatory and hierarchical confirmatory analysis of the Differential Ability Scales. Journal of Psychoeducational Assessment, 8, 391–405.
Keith, T. Z., Cool, V. A., Novak, C. G., White, L. J., & Pottebaum, S. M. (1988). Confirmatory factor analysis of the Stanford-Binet Fourth Edition: Testing the theory-test match. Journal of School Psychology, 26, 253–274.
Kelleher, D. (1958). The social desirability factor in Edwards PPS. Journal of Consulting Psychology, 22, 100.
Keller, M. (1972). The oddities of alcoholics. Quarterly Journal of Studies on Alcohol, 33, 1147–1148.
Kelley, M. L. (1985). Review of Child Behavior Checklist. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 1, pp. 301–303). Lincoln, NE: University of Nebraska Press.
Kelley, P. L., Jacobs, R. R., & Farr, J. L. (1994). Effects of multiple administrations of the MMPI for employee screening. Personnel Psychology, 47, 575–591.
Kelley, T. L. (1927). Interpretation of educational measurements. New York: New World Book Company.
Kelley, T. L. (1928). Crossroads in the mind of man: A study of differentiable mental abilities. Stanford, CA: Stanford University Press.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17–24.
Kelly, T. A. (1990). The role of values in psychotherapy: A critical review of process and outcome effects. Clinical Psychology Review, 10, 171–186.
Kemp, D. E., & Stephens, J. H. (1971). Which AB scale? A comparative analysis of several versions. The Journal of Nervous and Mental Disease, 152, 23–30.
Keogh, B. K., & Smith, S. E. (1967). Visuo-Motor ability for school prediction: A seven-year study. Perceptual and Motor Skills, 25, 101–110.
Kerlinger, F. N. (1964). Foundations of behavioural research. New York: Holt.
Kerlinger, F. N. (1986). Foundations of behavioral research (3rd ed.). New York: Holt, Rinehart and Winston.
Kerlinger, F. N., & Pedhazur, E. J. (1973). Multiple regression in behavioral research. New York: Holt, Rinehart & Winston.
Kerr, W. A. (1952). Untangling the Liberalism-Conservatism continuum. Journal of Social Psychology, 35, 111–125.
Kessler, R. C., & Wethington, E. (1991). The reliability of life event reports in a community survey. Psychological Medicine, 21, 723–738.
Keston, J., & Jimenez, C. (1954). A study of the performance on English and Spanish editions of the Stanford-Binet Intelligence Test by Spanish-American children. Journal of Genetic Psychology, 85, 263–269.
Kidder, L. H., Judd, C. M., & Smith, E. R. (1986). Research methods in social relations. New York: Holt, Rinehart, & Winston.
Kiecolt-Glaser, J. K., & Glaser, R. (1986). Psychological influences on immunity. Psychosomatics, 27, 621–625.
Kiernan, R. J., Mueller, J., Langston, J. W., & Van Dyke, C. (1987). The Neurobehavioral Cognitive Status Examination: A brief but differentiated approach to cognitive assessment. Annals of Internal Medicine, 107, 481–485.
Kilpatrick, F. P., & Cantril, H. (1960). Self-anchoring scaling: A measure of individuals' unique reality worlds. Journal of Individual Psychology, 16, 158–173.
Kilty, K. M., & Feld, A. (1976). Attitudes toward aging and toward the needs of older people. Journal of Gerontology, 31, 586–594.
King, J. D., & Smith, R. A. (1972). Abbreviated forms of the Wechsler Preschool and Primary Scale of Intelligence for a kindergarten population. Psychological Reports, 30, 539–542.
King, M., & King, J. (1971). Some correlates of university performance in a developing country: The case of Ethiopia. Journal of Cross-Cultural Psychology, 2, 293–300.
King, W. C., Jr., & Miles, E. W. (1995). A quasi-experimental assessment of the effect of computerizing noncognitive paper-and-pencil measurements: A test of measurement equivalence. Journal of Applied Psychology, 80, 643–651.
Kirk, S. A., McCarthy, J. J., & Kirk, W. D. (1968). Illinois Test of Psycholinguistic Abilities. Urbana, IL: University of Illinois Press.
Kirkpatrick, E. A. (1900). Individual tests of school children. Psychological Review, 7, 274–280.
Kirnan, J. P., & Geisinger, K. F. (1981). The prediction of graduate school success in psychology. Educational and Psychological Measurement, 41, 815–820.
Kirnan, J. P., & Geisinger, K. F. (1990). General Aptitude Test Battery. In J. Hogan & R. Hogan (Eds.), Business and industry testing (pp. 140–157). Austin, TX: Pro-Ed.
Kivela, S. L., & Pahkala, K. (1986). Sex and age differences in the factor pattern and reliability of the Zung Self-Rating Depression Scale in a Finnish elderly population. Psychological Reports, 59, 587–597.
Klee, S. H., & Garfinkel, B. D. (1983). The computerized continuous performance task: A new measure of inattention. Journal of Abnormal Child Psychology, 11, 487–496.
Klein, M. H., Benjamin, L. S., Rosenfeld, R., Treece, C., Husted, J., & Greist, J. H. (1993). The Wisconsin
Personality Disorders Inventory: Development, reliability, and validity. Journal of Personality Disorders, 7, 285–303.
Klein, S. P., & Owens, W. A. (1965). Faking of a scored life history as a function of criterion objectivity. Journal of Applied Psychology, 49, 451–454.
Kleinmuntz, B. (1961). The College Maladjustment scale (Mt): Norms and predictive validity. Educational and Psychological Measurement, 21, 1029–1033.
Kleinmuntz, B. (1967). Personality measurement. Homewood, IL: The Dorsey Press.
Kleinmuntz, B. (1969). Personality test interpretation by computer and clinician. In J. N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 97–104). New York: McGraw-Hill.
Kleinmuntz, B. (1977). Personality measurement. New York: Krieger.
Kleinmuntz, B., & Szucko, J. J. (1984). Lie detection in ancient and modern times. American Psychologist, 39, 766–776.
Klieger, D. M., & Franklin, M. E. (1993). Validity of the Fear Survey Schedule in phobia research: A laboratory test. Journal of Psychopathology and Behavioral Assessment, 15, 207–217.
Klieger, D. M., & McCoy, M. L. (1994). Improving the concurrent validity of the Fear Survey Schedule-III. Journal of Psychopathology and Behavioral Assessment, 16, 201–220.
Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of assessment center validity. Personnel Psychology, 40, 243–260.
Kline, P. (1986). A handbook of test construction. New York: Methuen.
Kline, R. B. (1989). Is the Fourth Edition Stanford-Binet a four-factor test? Confirmatory factor analyses of alternative models for ages 2 through 23. Journal of Psychoeducational Assessment, 7, 4–13.
Kline, R. B., Snyder, J., Guilmette, S., & Castellanos, M. (1993). External validity of the profile variability index for the K-ABC, Stanford-Binet, and WISC-R: Another cul-de-sac. Journal of Learning Disabilities, 26, 557–567.
Klingler, D. E., Johnson, J. H., & Williams, T. A. (1976). Strategies in the evaluation of an on-line computer-assisted unit for intake assessment of mental health patients. Behavior Research Methods and Instrumentation, 8, 95–100.
Klingler, D. E., Miller, D. A., Johnson, J. H., & Williams, T. A. (1977). Process evaluation of an online computer-assisted unit for intake assessment of mental health patients. Behavior Research Methods and Instrumentation, 9, 110–116.
Klonoff, H., & Kennedy, M. (1965). Memory and perceptual functioning in octogenarians and
nonagenarians in the community. Journal of Gerontology, 20, 328–333.
Klonoff, H., & Kennedy, M. (1966). A comparative study of cognitive functioning in old age. Journal of Gerontology, 21, 239–243.
Klopfer, B., Ainsworth, M. D., Klopfer, W. G., & Holt, R. R. (1954). Developments in the Rorschach technique. Yonkers-on-Hudson, NY: World Book Co.
Klopfer, W. G. (1973). The short history of projective techniques. Journal of the History of the Behavioral Sciences, 9, 60–65.
Kluger, A. N., Reilly, R. R., & Russell, C. J. (1991). Faking biodata tests: Are option-keyed instruments more resistant? Journal of Applied Psychology, 76, 889–896.
Knapp, R. H., & Garbutt, J. J. (1958). Time imagery and the achievement motive. Journal of Personality, 26, 426–434.
Knapp, R. R. (1960). The effects of time limits on the intelligence test performance of Mexican and American subjects. Journal of Educational Psychology, 51, 14–20.
Knapp, R. R. (1976). Handbook for the Personal Orientation Inventory. San Diego, CA: EdITS.
Knapp, R. R., Shostrom, E. L., & Knapp, L. (1978). Assessment of the actualizing person. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 4, pp. 103–140). San Francisco, CA: Jossey-Bass.
Knight, B. C., Baker, E. H., & Minder, C. C. (1990). Concurrent validity of the Stanford-Binet: Fourth Edition and the Kaufman Assessment Battery for Children with learning disabled students. Psychology in the Schools, 27, 116–125.
Knight, H. V., Acosta, L. J., & Anderson, B. D. (1988). A comparison of textbook and microcomputer instruction in preparation for the ACT. Journal of Computer Based Instruction, 15, 83–87.
Knight, R. G., Chisholm, B. J., Marsh, N. V., & Godfrey, H. P. (1988). Some normative, reliability, and factor analytic data for the Revised UCLA Loneliness Scale. Journal of Clinical Psychology, 44, 203–206.
Knobloch, H., & Pasamanick, B. (Eds.). (1974). Gesell and Amatruda's developmental diagnosis (3rd ed.). New York: Harper & Row.
Knobloch, H., Stevens, F., & Malone, A. F. (1980). Manual of developmental diagnosis. Hagerstown, MD: Harper & Row.
Knoff, H. M. (1989). Review of the Personality Inventory for Children. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 625–630). Lincoln, NE: University of Nebraska Press.
Kobasa, S. C. (1979). Stressful life events, personality and health: An inquiry into hardiness. Journal of Personality and Social Psychology, 37, 1–11.
Kobasa, S. C., Maddi, S. R., & Kahn, S. (1982). Hardiness and health: A prospective study. Journal of Personality and Social Psychology, 42, 168–177.
Koch, H. L. (1933). Popularity in preschool children: Some related factors and a technique for its measurement. Child Development, 4, 164–175.
Koelle, W. H., & Convey, J. J. (1982). The prediction of the achievement of deaf adolescents from self-concept and locus of control measures. American Annals of the Deaf, 127, 769–779.
Kogan, N., & Wallach, M. A. (1961). Age changes in values and attitudes. Journal of Gerontology, 16, 272–280.
Kohlberg, L. (1979). The meaning and measurement of moral development. Worcester, MA: Clark University Press.
Komaroff, A. L., Masuda, M., & Holmes, T. H. (1968). The social readjustment rating scale: A comparative study of Negro, Mexican and White Americans. Journal of Psychosomatic Research, 12, 121–128.
Komorita, S. S., & Graham, W. K. (1965). Number of scale points and the reliability of scales. Educational and Psychological Measurement, 25, 987–995.
Koppitz, E. M. (1964). The Bender Gestalt Test for young children. New York: Grune & Stratton.
Koppitz, E. M. (1975). The Bender Gestalt Test for young children: Vol. 2. Research and applications, 1963–1973. New York: Grune & Stratton.
Korchin, S. J. (1976). Modern clinical psychology. New York: Basic Books.
Kornblith, S. J., Greenwald, D. P., Michelson, L., & Kazdin, A. E. (1984). Measuring the effects of demand characteristics on the Beck Depression Inventory responses of college students. Journal of Behavioral Assessment, 6, 45–49.
Koson, D., Kitchen, C., Kochen, M., & Stodolosky, D. (1970). Psychological testing by computer: Effect on response bias. Educational and Psychological Measurement, 30, 803–810.
Kosuth, T. F. (1984–1985). The pictorial inventory of careers. Jacksonville, FL: Talent Assessment.
Kramer, J., Shanks, K., Markely, R., & Ryabik, J. (1983). The seductive nature of WISC-R short forms: An analysis with gifted referrals. Psychology in the Schools, 20, 137–141.
Kranau, E. J., Green, V., & Valencia-Weber, G. (1982). Acculturation and the Hispanic woman: Attitudes toward women, sex-role attribution, sex-role behavior, and demographics. Hispanic Journal of Behavioral Sciences, 4, 21–40.
Kratochwill, T. R., Doll, E. J., & Dickson, W. P. (1985). Microcomputers in behavioral assessment:
Recent advances and remaining issues. Computers in Human Behavior, 1, 277–291.
Krause, J. S., & Dawis, R. V. (1992). Prediction of life satisfaction after spinal cord injury: A four-year longitudinal approach. Rehabilitation Psychology, 37, 49–60.
Kravitz, D. A., Cutler, B. L., & Brock, P. (1993). Reliability and validity of the original and revised Legal Attitudes Questionnaire. Law and Human Behavior, 17, 661–677.
Krech, D., Crutchfield, R. S., & Ballachey, E. L. (1962). Individual in society. New York: McGraw-Hill.
Kremer, E. F., Atkinson, J. H., Jr., & Ignelzi, R. J. (1982). Pain measurement: The affective dimensional measure of the McGill Pain Questionnaire with a cancer pain population. Pain, 12, 153–163.
Krohn, E. J., & Lamp, R. E. (1989). Concurrent validity of the Stanford-Binet Fourth Edition and K-ABC for Head Start children. Journal of School Psychology, 27, 59–67.
Krout, M. H. (1954). An experimental attempt to produce unconscious manual symbolic movements. Journal of General Psychology, 51, 93–120.
Krug, S. E. (1987). Psychware sourcebook (2nd ed.). Kansas City, MO: Test Corporation of America.
Krumboltz, J. C., Mitchell, A., & Gelatt, H. G. (1975). Applications of social learning theory of career selection. Focus on Guidance, 8, 1–16.
Kübler-Ross, E. (1969). On death and dying. New York: Macmillan.
Kuder, G. F., & Diamond, E. E. (1979). Occupational Interest Survey: General manual (2nd ed.). Chicago, IL: Science Research Associates.
Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151–160.
Kudoh, T., & Nishikawa, M. (1983). A study of the feeling of loneliness (I): The reliability and validity of the revised UCLA Loneliness Scale. The Japanese Journal of Experimental Social Psychology, 22, 99–108.
Kulberg, G. E., & Owens, W. A. (1960). Some life history antecedents of engineering interests. Journal of Educational Psychology, 51, 26–31.
Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.
Kumar, V. K., Rehill, K. B., Treadwell, T. W., & Lambert, P. (1986). The effects of disguising scale purpose on reliability and validity. Measurement and Evaluation in Counseling and Development, 18, 163–167.
Kunert, K. M. (1969). Psychological concomitants and determinants of vocational choice. Journal of Applied Psychology, 53, 152–158.
Kuo, W. H. (1984). Prevalence of depression among Asian-Americans. Journal of Nervous and Mental Disease, 172, 449–457.
Kutner, B., Fanshel, D., Togo, A. M., & Langner, T. S. (1956). Five hundred over sixty. New York: Russell Sage Foundation.
Labeck, L. J., Johnson, J. H., & Harris, W. G. (1983). Validity of a computerized on-line MMPI interpretive system. Journal of Clinical Psychology, 39, 412–416.
Laboratory of Comparative Human Cognition (1982). Culture and intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 642–719). Cambridge, UK: Cambridge University Press.
Lachar, D. (1974). The MMPI: Clinical assessment and automated interpretation. Los Angeles, CA: Western Psychological Services.
Lachar, D., & Gdowski, C. G. (1979). Actuarial assessment of child and adolescent personality: An interpretive guide for the Personality Inventory for Children profile. Los Angeles, CA: Western Psychological Services.
Lachar, D., & Wrobel, T. A. (1979). Validation of clinicians' hunches: Construction of a new MMPI critical item set. Journal of Consulting and Clinical Psychology, 47, 277–284.
LaGreca, A. M. (1981). Peer acceptance: The correspondence between children's sociometric scores and teachers' ratings of peer interactions. Journal of Abnormal Child Psychology, 9, 167–178.
Lah, M. I., & Rotter, J. B. (1981). Changing college student norms on the Rotter Incomplete Sentences Blank. Journal of Consulting and Clinical Psychology, 49, 985.
Lam, T. C. M. (1993). Testability: A critical issue in testing language minority students with standardized achievement tests. Measurement and Evaluation in Counseling and Development, 26, 179–191.
Lamb, R. R., & Prediger, D. J. (1981). Technical report for the unisex edition of the ACT Interest Inventory (UNIACT). Iowa City, IA: American College Testing Program.
Lambert, M. J., Hatch, D. R., Kingston, M. D., & Edwards, B. C. (1986). Zung, Beck, and Hamilton Rating Scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54–59.
Lambert, N. M. (1991). The crisis in measurement literacy in psychology and education. Educational Psychologist, 26, 23–35.
Lambirth, T. T., Gibb, G. D., & Alcorn, J. D. (1986). Use of a behavior-based personality instrument in aviation selection. Educational and Psychological Measurement, 46, 973–978.
Lamp, R. E., & Krohn, E. J. (1990). Stability of the Stanford-Binet Fourth Edition and K-ABC for young Black and White children from low-income families. Journal of Psychoeducational Assessment, 8, 139–149.
Landis, C. (1936). Questionnaires and the study of personality. Journal of Nervous and Mental Disease, 83, 125–134.
Landis, C., Zubin, J., & Katz, S. E. (1935). Empirical evaluation of three personality adjustment inventories. Journal of Educational Psychology, 26, 321–330.
Landy, F. J., Rastegary, H., Thayer, J., & Colvin, C. (1991). Time urgency: The construct and its measurement. Journal of Applied Psychology, 76, 644–657.
Lane, B. (1964). Attitudes of youth toward the aged. Journal of Marriage and the Family, 26, 229–231.
Lane, S., Liu, M., Ankenmann, R. D., & Stone, C. A. (1996). Generalizability and validity of a mathematics performance assessment. Journal of Educational Measurement, 33, 71–92.
Lang, P. J., & Lazovik, A. P. (1963). Experimental desensitization of a phobia. Journal of Abnormal and Social Psychology, 66, 519–525.
Lankford, J. S., Bell, R. W., & Elias, J. W. (1994). Computerized vs. standard personality measures: Equivalency, computer anxiety, and gender differences. Computers in Human Behavior, 10, 497–510.
Lanning, K. (1989). Detection of invalid response patterns on the California Psychological Inventory. Applied Psychological Measurement, 13, 45–56.
Lansman, M., Donaldson, G., Hunt, E., & Yantis, S. (1982). Ability factors and cognitive processes. Intelligence, 6, 347–386.
Lanyon, R. I. (1967). Measurement of social competence in college males. Journal of Consulting Psychology, 31, 493–498.
Lanyon, R. I. (1968). A Psychological Screening Inventory. Paper presented to the Eastern Psychological Association, Washington, DC.
Lanyon, R. I. (1973). Manual for the Psychological Screening Inventory. Port Huron, MI: Research Psychologists Press.
Lanyon, R. I. (1978). Psychological Screening Inventory: Manual (2nd ed.). Port Huron, MI: Research Psychologists Press.
Lanyon, R. I. (1984). Personality assessment. Annual Review of Psychology, 35, 667–701.
Lanyon, R. I. (1993). Development of scales to assess specific deception strategies on the Psychological Screening Inventory. Psychological Assessment, 5, 324–329.
Lanyon, R. I. (1995). Review of the Inwald Personality Inventory. In J. C. Conoley & J. C. Impara (Eds.),
The twelfth mental measurements yearbook (pp. 503–504). Lincoln, NE: University of Nebraska Press.
Lanyon, R. I., Johnson, J. H., & Overall, J. E. (1974). Factor structure of the Psychological Screening Inventory items in a normal population. Journal of Consulting and Clinical Psychology, 42, 219–223.
Laosa, L. M. (1993). Family characteristics as predictors of individual differences in Chicano children's emergent school readiness. ETS Research Report 93-34. Princeton, NJ: Educational Testing Service.
Lasher, K. P., & Faulkender, P. J. (1993). Measurement of aging anxiety: Development of the Anxiety About Aging Scale. International Journal of Aging and Human Development, 37, 247–259.
Lau, S. (1988). The value orientations of Chinese university students in Hong Kong. International Journal of Psychology, 23, 583–596.
Laughton, J. (1988). Strategies for developing creative abilities of hearing-impaired children. American Annals of the Deaf, 133, 258–263.
Lautenschlager, G. J. (1986). Within-subject measures for the assessment of individual differences in faking. Educational and Psychological Measurement, 46, 309–316.
LaVoie, A. L. (1978). Review of the Survey of Interpersonal Values. In O. K. Buros (Ed.), The eighth mental measurements yearbook (Vol. 1, pp. 1108–1110). Highland Park, NJ: Gryphon Press.
LaVoie, A. L., & Bentler, P. M. (1974). A short form measure of the expanded seven-dimensional semantic space. Journal of Educational Measurement, 11, 65–66.
Lavos, G. (1962). W.I.S.C. psychometric patterns among deaf children. Volta Review, 64, 547–552.
Lawler, E. E., III (1967). The multitrait-multirater approach to measuring managerial job performance. Journal of Applied Psychology, 51, 369–381.
Lawlis, G. F. (1971). Response styles of a patient population on the Fear Survey Schedule. Behaviour Research and Therapy, 9, 95–102.
Lawshe, C. H. (1952). What can industrial psychology do for small business? Personnel Psychology, 5, 31–34.
Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70, 237–238.
Lawshe, C. H., Kephart, N. C., & McCormick, E. J. (1949). The paired comparison technique for rating performance of industrial employees. Journal of Applied Psychology, 33, 69–77.
Lawshe, C. H., & Steinberg, M. D. (1955). Studies in synthetic validity. I. An exploratory investigation of clerical jobs. Personnel Psychology, 8, 291–301.
Lawson, J. S., & Inglis, J. (1985). Learning disabilities and intelligence test results: A model
based on a principal components analysis of the WISC-R. British Journal of Psychology, 76, 35–48.
Lawton, M. P. (1972). The dimensions of morale. In D. Kent, R. Kastenbaum, & S. Sherwood (Eds.), Research planning and action for the elderly (pp. 144–165). New York: Behavioral Publications.
Lawton, M. P. (1975). The Philadelphia Geriatric Center Morale Scale: A revision. Journal of Gerontology, 30, 85–89.
Lazaridis, E. N., Rudberg, M. A., Furner, S. E., & Cassel, C. K. (1994). Do Activities of Daily Living have a hierarchical structure? An analysis using the longitudinal study of aging. Journal of Gerontology, 49, 47–51.
Lazarsfeld, P. F. (1950). The logic and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Measurement and prediction (pp. 362–412). Princeton, NJ: Princeton University Press.
Lazarsfeld, P. F. (1954). A conceptual introduction to latent structure analysis. In P. F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences. Glencoe, IL: Free Press.
Lazarsfeld, P. F. (1959). Latent structure analysis. In S. Koch (Ed.), Psychology: A study of a science (Vol. 3, pp. 476–543). New York: McGraw-Hill.
Lazarus, R. S. (1966). Psychological stress and the coping process. New York: McGraw-Hill.
Lazarus, R. S. (1980). The stress and coping paradigm. In C. Eisdorfer, D. Cohen, & A. Kleinman (Eds.), Conceptual models for psychopathology (pp. 173–209). New York: Spectrum.
Leahy, J. M. (1992). Validity and reliability of the Beck Depression Inventory-Short Form in a group of adult bereaved females. Journal of Clinical Psychology, 48, 64–68.
Leckliter, I. N., Matarazzo, J. D., & Silverstein, A. B. (1986). A literature review of factor analytic studies of the WAIS-R. Journal of Clinical Psychology, 42, 332–342.
Leclercq, D. A., & Bruno, J. E. (1993). Item banking: Interactive testing and self-assessment. Berlin: Springer-Verlag.
Lee, H. B., & Oei, T. P. S. (1994). Factor structure, validity, and reliability of the Fear Questionnaire in a Hong Kong Chinese population. Journal of Psychopathology and Behavioral Assessment, 16, 189–199.
Lee, L. (1971). Northwestern Syntax Screening Test. Evanston, IL: Northwestern University Press.
Lees-Haley, P. R. (1992). Psychodiagnostic test usage by forensic psychologists. American Journal of Forensic Psychology, 10, 25–30.
Lefcourt, H. M. (1966). Internal vs. external control of reinforcement: A review. Psychological Bulletin, 65, 206–220.
Lefcourt, H. M. (1976). Locus of control: Current trends in theory and research. Hillsdale, NJ: Erlbaum.
Lei, H., & Skinner, H. A. (1980). A psychometric study of life events and social readjustment. Journal of Psychosomatic Research, 24, 57–66.
Leichsenring, F. (1999). Development and first results of the Borderline Personality Inventory: A self-report instrument for assessing borderline personality organization. Journal of Personality Assessment, 73, 45–63.
Leigh, I. W., Robins, C. J., Welkowitz, J., & Bond, R. N. (1989). Toward greater understanding of depression in deaf individuals. American Annals of the Deaf, 134, 249–254.
Leighton, A. H. (1959). Mental illness and acculturation. In I. Galdston (Ed.), Medicine and anthropology (pp. 108–128). New York: International University Press.
Leiter, R. G. (1952). Leiter International Performance Scale. Chicago, IL: Stoelting.
Lerner, B. (1981). The minimum competence testing movement: Social, scientific, and legal implications. American Psychologist, 36, 1057–1066.
Lerner, H., & Lerner, P. M. (1987). Rorschach Inkblot Test. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques compendium (pp. 372–401). Kansas City, MO: Test Corporation of America.
Lesser, G. (1959). Population differences in construct validity. Journal of Consulting Psychology, 23, 60–65.
Lettieri, D. J., Nelson, J. E., & Sayers, M. A. (1985). Alcoholism treatment assessment research instruments. Washington, DC: Metrotec.
Levenson, H. (1973). Perceived parental antecedents of internal, powerful others, and chance locus of control orientations. Developmental Psychology, 9, 260–265.
Levenson, H. (1974). Activism and powerful others: Distinctions within the concept of internal-external control. Journal of Personality Assessment, 38, 377–383.
Leventhal, A. M. (1966). An anxiety scale for the CPI. Journal of Clinical Psychology, 22, 459–461.
Levine, E. S. (1971). Mental assessment of the deaf child. The Volta Review, 73, 80–105.
Levine, E. S. (1974). Psychological tests and practices with the deaf: A survey of the state of the art. The Volta Review, 76, 298–319.
Levine, M., & Wishner, J. (1977). The case records of the psychological clinic at the University of Pennsylvania (1896–1961). Journal of the History of the Behavioral Sciences, 13, 59–66.
Levonian, E. (1961). A statistical analysis of the 16 Personality Factor Questionnaire. Educational and Psychological Measurement, 21, 589–596.
Levonian, E., Comrey, A., Levy, W., & Procter, D. (1959). A statistical evaluation of Edwards Personal Preference Schedule. Journal of Applied Psychology, 43, 355–359.
Levy, S. (1982). Use of the Peabody Picture Vocabulary Test with low-functioning autistic children. Psychology in the Schools, 19, 24–27.
Lewandowski, D. G., & Saccuzzo, D. P. (1975). Possible differential WISC patterns for retarded delinquents. Psychological Reports, 37, 887–894.
Lewinsohn, P. M. (1976). Activity schedules in treatment of depression. In J. D. Krumboltz & C. E. Thoresen (Eds.), Counseling methods (pp. 74–83). New York: Holt, Rinehart and Winston.
Lewinsohn, P. M., & Graf, M. (1973). Pleasant activities and depression. Journal of Consulting and Clinical Psychology, 41, 261–268.
Lewinsohn, P. M., Mermelstein, R. M., Alexander, C., & MacPhillamy, D. J. (1985). The Unpleasant Events Schedule: A scale for the measurement of aversive events. Journal of Clinical Psychology, 41, 483–498.
Lezak, M. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Lezak, M. D. (1988). IQ: R.I.P. Journal of Clinical and Experimental Neuropsychology, 10, 351–361.
Libbrecht, K., & Quackelbeen, J. (1995). On the early history of male hysteria and psychic trauma. Journal of the History of the Behavioral Sciences, 31, 370–384.
Lichtenstein, E., & Bryan, J. H. (1965). Acquiescence and the MMPI: An item reversal approach. Journal of Abnormal Psychology, 70, 290–293.
Lichtenstein, R., & Ireton, H. (1984). Preschool screening. Orlando, FL: Grune & Stratton.
Liebert, R. M., & Morris, L. W. (1967). Cognitive and emotional components of test anxiety: A distinction and some initial data. Psychological Reports, 20, 975–978.
Lightfoot, S. L., & Oliver, J. M. (1985). The Beck Inventory: Psychometric properties in university students. Journal of Personality Assessment, 49, 434–436.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, No. 140, 1–55.
Linde, T., & Patterson, C. H. (1958). The MMPI in cerebral palsy. Journal of Consulting Psychology, 22, 210–212.
Lindzey, G. (1959). On the classification of projective techniques. Psychological Bulletin, 56, 158–168.
Linn, M. R. (1993). A brief history for counselors . . . college entrance examinations in the United States. The Journal of College Admission, 140, 6–16.
Linn, R. L. (1986). Educational testing and assessment: Research needs and policy issues. American Psychologist, 41, 1153–1160.
Linn, R. L., & Dunbar, S. B. (1992). Issues in the design and reporting of the National Assessment of Educational Progress. Journal of Educational Measurement, 29, 177–194.
Linn, R. L., Graue, M. E., & Sanders, N. M. (1990). Comparing state and district test results to national norms: The validity of the claims that "everyone is above average." Educational Measurement: Issues and Practice, 9, 5–14.
Linn, R. L., & Harnisch, D. L. (1981). Interactions between item content and group membership on achievement test items. Journal of Educational Measurement, 18, 109–118.
Linn, R. L., Harnisch, D. L., & Dunbar, S. B. (1981). Validity generalization and situational specificity: An analysis of the prediction of first-year grades in law school. Applied Psychological Measurement, 5, 281–289.
Lipsitt, L. P. (1958). A self-concept scale for children and its relationship to the children's form of the Manifest Anxiety Scale. Child Development, 29, 463–473.
Lissitz, R. W., & Willhoft, J. L. (1985). A methodological study of the Torrance Tests of Creativity. Journal of Educational Measurement, 22, 1–11.
Littell, W. M. (1960). The Wechsler Intelligence Scale for Children: Review of a decade of research. Psychological Bulletin, 57, 132–156.
Livneh, H., & Livneh, C. (1989). The five-factor model of personality: Is evidence of its cross-measure validity premature? Personality and Individual Differences, 10, 75–80.
Locke, H. J., & Wallace, K. M. (1959). Short marital adjustment and prediction tests: Their reliability and validity. Marriage and Family Living, 21, 251–255.
Loeb, R., & Sarigiani, P. (1986). The impact of hearing impairment on self perceptions of children. The Volta Review, 88, 89–101.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694.
Loevinger, J. (1972). Some limitations of objective personality tests. In J. N. Butcher (Ed.), Objective personality assessment. New York: Academic Press.
Loevinger, J. (1976). Ego development. San Francisco, CA: Jossey-Bass.
Loevinger, J. (1979). Construct validity of the sentence completion test of ego development. Applied Psychological Measurement, 3, 281–311.
Lohmann, N. (1977). Correlations of life satisfaction, morale, and adjustment measures. Journal of Gerontology, 32, 73–75.
Longstaff, H. P., & Jurgensen, C. E. (1953). Fakability of the Jurgensen Classification Inventory. Journal of Applied Psychology, 37, 86–89.
Lord, F. M. (1944). Reliability of multiple-choice tests as a function of number of choices per item. Journal of Educational Psychology, 35, 175–180.
Lord, F. M. (1952). The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika, 17, 181–194.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (Eds.). (1968). Statistical theories of mental test scores. New York: Addison-Wesley.
Lorr, M., Daston, P., & Smith, I. R. (1967). An analysis of mood states. Educational and Psychological Measurement, 27, 89–96.
Lovie, A. D., & Lovie, P. (1993). Charles Spearman, Cyril Burt, and the origins of factor analysis. Journal of the History of the Behavioral Sciences, 29, 308–321.
Lowe, N. K., & Ryan-Wenger, N. M. (1992). Beyond Campbell and Fiske: Assessment of convergent and discriminant validity. Research in Nursing and Health, 15, 67–75.
Loyd, B. H. (1988). Implications of item response theory for the measurement practitioner. Applied Measurement in Education, 1, 135–143.
Lubin, B. (1965). Adjective checklists for the measurement of depression. Archives of General Psychiatry, 12, 57–62.
Lubin, B. (1967). Manual for the depression adjective check lists. San Diego, CA: Educational and Industrial Testing Service.
Lubin, B., Larsen, R. M., & Matarazzo, J. D. (1984). Patterns of psychological test usage in the United States: 1935–1982. American Psychologist, 39, 451–454.
Lubin, B., Wallis, R. R., & Paine, C. (1971). Patterns of psychological test usage in the United States: 1935–1969. Professional Psychology, 2, 70–74.
Lucas, R. W., Mullin, P. J., Luna, C. B. X., & McInroy, D. C. (1977). Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: A comparison. British Journal of Psychiatry, 131, 160–167.
Ludenia, K., & Donham, G. (1983). Dental outpatients: Health locus of control correlates. Journal of Clinical Psychology, 39, 854–858.
Lundy, A. (1988). Instructional set and Thematic Apperception Test validity. Journal of Personality Assessment, 52, 309–320.
Lunneborg, C. E., & Lunneborg, P. W. (1967). Pattern prediction of academic success. Educational and Psychological Measurement, 27, 945–952.
Lunneborg, P. W. (1979). The Vocational Interest Inventory: Development and validation. Educational and Psychological Measurement, 39, 445–451.
Luria, A. R. (1966). Higher cortical functions in man. New York: Basic Books.
Luria, A. R. (1973). The working brain. New York: Basic Books.
Lyman, H. B. (1978). Test scores and what they mean (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35, 382–385.
Lytton, H., Watts, D., & Dunn, B. E. (1986). Stability and predictability of cognitive and social characteristics from age 2 to age 9. Genetic Psychology Monographs, 112, 363–398.
MacAndrew, C. (1965). The differentiation of male alcoholic out-patients from nonalcoholic psychiatric patients by means of the MMPI. Quarterly Journal of Studies on Alcohol, 26, 238–246.
Maccoby, E. E., & Jacklin, C. N. (1974). The psychology of sex differences. Stanford, CA: Stanford University Press.
MacFarlane, J. W., & Tuddenham, R. D. (1951). Problems in the validation of projective techniques. In H. H. Anderson & G. L. Anderson (Eds.), An introduction to projective techniques (pp. 26–54). Englewood Cliffs, NJ: Prentice-Hall.
Machover, K. (1949). Personality projection in the drawing of the human figure: A method of personality investigation. Springfield, IL: Charles C Thomas.
MacPhillamy, D. J., & Lewinsohn, P. M. (1976). Manual for the pleasant events schedule. Unpublished manuscript, University of Oregon.
Madianos, M. G., Gournas, G., & Stefanis, C. N. (1992). Depressive symptoms and depression among elderly people in Athens. Acta Psychiatrica Scandinavica, 86, 320–326.
Mael, F. A. (1991). A conceptual rationale for the domain and attributes of biodata items. Personnel Psychology, 44, 763–792.
Mahakian, C. (1939). Measuring the intelligence and reading capacity of Spanish-speaking children. Elementary School Journal, 39, 760–768.
Mahon, N. E., & Yarcheski, A. (1990). The dimensionality of the UCLA Loneliness Scale. Research in Nursing and Health, 13, 45–52.
Malgady, R. G., Costantino, G., & Rogler, L. H. (1984). Development of a Thematic Apperception Test for urban Hispanic children. Journal of Consulting and Clinical Psychology, 52, 986–996.
Mallory, E., & Miller, V. (1958). A possible basis for the association of voice characteristics and personality traits. Speech Monographs, 25, 255–260.
Maloney, M. P., & Ward, M. P. (1976). Psychological assessment: A conceptual approach. New York: Oxford University Press.
Mann, I. T., Phillips, J. L., & Thompson, E. G. (1979). An examination of methodological issues relevant to the use and interpretation of the Semantic Differential. Applied Psychological Measurement, 3, 213–229.
Mapou, A. (1955). Development of general working population norms for the USES General Aptitude Test Battery. Journal of Applied Psychology, 39, 130–133.
Mardell, C., & Goldenberg, D. (1975). Developmental Indicators for the Assessment of Learning (DIAL). Edison, NJ: Childcraft Education Corp.
Marin, G., Gamba, R. J., & Marin, B. V. (1992). Extreme response style and acquiescence among Hispanics. Journal of Cross-Cultural Psychology, 23, 498–509.
Marin, G., Sabogal, F., Marin, B. V., Otero-Sabogal, R., & Perez-Stable, E. J. (1987). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183–205.
Maris, R. W. (1992). Overview of the study of suicide assessment and prediction. In R. W. Maris, A. L. Berman, J. T. Maltsberger, & R. I. Yufit (Eds.), Assessment and prediction of suicide. New York: Guilford Press.
Marks, D. F. (1973). Visual imagery differences in the recall of pictures. British Journal of Psychology, 64, 17–24.
Marks, I. M., & Mathews, A. M. (1979). Brief standard self-rating for phobic patients. Behaviour Research and Therapy, 17, 263–267.
Marks, P. A., & Seeman, W. (1963). The actuarial description of personality: An atlas for use with the MMPI. Baltimore, MD: Williams & Wilkins.
Marks, P. A., Seeman, W., & Haller, D. L. (1974). The actuarial use of the MMPI with adolescents and adults. Baltimore, MD: Williams & Wilkins.
Marland, S. (1972). Education of the gifted and talented, I. Report to Congress of the United States by the Commissioner of Education. Washington, DC: U.S. Office of Education.
Marquart, D. I., & Bailey, L. L. (1955). An evaluation of the culture-free test of intelligence. Journal of Genetic Psychology, 86, 353–358.
Marquart, J. M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument. Evaluation and Program Planning, 12, 37–43.
Marsh, H. W., & Richards, G. E. (1988). Tennessee Self-Concept Scale: Reliability, internal structure, and construct validity. Journal of Personality and Social Psychology, 55, 612–624.
Marshall, V. W. (1982). Death and dying. In D. J. Mangen & W. A. Peterson (Eds.), Research instruments in social gerontology (Vol. 1, pp. 303–381). Minneapolis, MN: University of Minnesota Press.
Marston, A. R. (1971). It is time to reconsider the Graduate Record Examination. American Psychologist, 26, 653–655.
Martin, D. (1986). Is my child gifted?: A guide for caring parents. Springfield, IL: Charles C Thomas.
Martin, R. P. (1986). Assessment of the social and emotional functioning of preschool children. School Psychology Review, 15, 216–232.
Martin, R. P. (1988). Assessment of personality and behavior problems. New York: Guilford Press.
Martinez, R., Norman, R. D., & Delaney, N. D. (1984). A Children's Hispanic Background Scale. Hispanic Journal of Behavioral Sciences, 6, 103–112.
Marx, M. B., Garrity, T. F., & Bowers, F. R. (1975). The influence of recent life experience on the health of college freshmen. Journal of Psychosomatic Research, 19, 87.
Maslach, C., & Jackson, S. E. (1981). Maslach Burnout Inventory. Palo Alto, CA: Consulting Psychologists Press, Inc.
Masling, J. (1960). The influence of situational and interpersonal variables in projective testing. Psychological Bulletin, 57, 65–85.
Maslow, A. H. (1954). Motivation and personality. New York: Harper & Row.
Maslow, A. H. (1962). Toward a psychology of being. Princeton, NJ: D. Van Nostrand.
Maslow, A. (1970). Motivation and personality. New York: Harper & Row.
Maslow, P., Frostig, M., Lefever, W., & Whittlesey, J. R. B. (1964). The Marianne Frostig Developmental Test of Visual Perception, 1963 standardization. Perceptual and Motor Skills, 19, 463–499.
Mason, E. M. (1992). Percent of agreement among raters and rater reliability of the Copying subtest of the Stanford-Binet Intelligence Scale: Fourth Edition. Perceptual and Motor Skills, 74, 347–353.
Masters, J. R. (1974). The relationship between number of response categories and reliability of Likert-type questionnaires. Journal of Educational Measurement, 11, 49–53.
Masuda, M., & Holmes, T. H. (1967). The social readjustment rating scale: A cross-cultural study of Japanese and Americans. Journal of Psychosomatic Research, 11, 227–237.
Matarazzo, J. D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). Baltimore, MD: Williams & Wilkins.
Matarazzo, J. D. (1980). Behavioral health and behavioral medicine: Frontiers for a new health psychology. American Psychologist, 35, 807–817.
Matarazzo, J. D. (1983). Computerized psychological testing. Science, 221, 323.
Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations. American Psychologist, 41, 14–24.
Matarazzo, J. D. (1992). Psychological testing and assessment in the 21st century. American Psychologist, 47, 1007–1018.
Matarazzo, J. D., & Wiens, A. N. (1977). Black Intelligence Test of Cultural Homogeneity and Wechsler Adult Intelligence Scale scores of black and white police applicants. Journal of Applied Psychology, 62, 57–63.
Matheny, A. P., Jr. (1980). Bayley's Infant Behavior Record: Behavioral components and twin analyses. Child Development, 51, 1157–1167.
Mathis, H. (1968). Relating environmental factors to aptitude and race. Journal of Counseling Psychology, 15, 563–568.
Matluck, J. H., & Mace, B. J. (1973). Language characteristics of Mexican-American children: Implications for assessment. Journal of School Psychology, 11, 365–386.
Mattis, S. (1976). Mental status examination for organic mental syndrome in the elderly patient. In L. Bellak & T. B. Karasu (Eds.), Geriatric psychiatry (pp. 77–121). New York: Grune & Stratton.
Mauger, P. A., & Kolmodin, C. A. (1975). Long-term predictive validity of the Scholastic Aptitude Test. Journal of Educational Psychology, 67, 847–851.
May, K. M., & Loyd, B. H. (1994). Honesty tests in academia and business: A comparative study. Research in Higher Education, 35, 499–511.
May, R. (1966). Sex differences in fantasy patterns. Journal of Projective Techniques and Personality Assessment, 30, 576–586.
McAbee, T. A., & Cafferty, T. P. (1982). Degree of prescribed punishment as a function of subjects' authoritarianism and offenders' race and social status. Psychological Reports, 50, 651–654.
McAdams, D. P. (1992). The five-factor model in personality: A critical appraisal. Journal of Personality, 60, 329–361.
McAllister, L. W. (1986). A practical guide to CPI interpretation. Palo Alto, CA: Consulting Psychologists Press.
McArthur, D. S., & Roberts, G. E. (1982). Roberts Apperception Test for Children: Manual. Los Angeles, CA: Western Psychological Services.
McCabe, S. P. (1985). Career Assessment Inventory. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (Vol. II, pp. 128–137). Kansas City, MO: Test Corporation of America.
McCallum, R. S., Karnes, F., & Crowell, M. (1988). Factor structure of the Stanford-Binet Intelligence Scale for gifted children (4th ed.). Contemporary Educational Psychology, 13, 331–338.
McCallum, R. S. (1990). Determining the factor structure of the Stanford-Binet Fourth Edition: The right choice. Journal of Psychoeducational Assessment, 8, 436–442.
McCallum, R. S., Karnes, F. A., & Edwards, R. P. (1984). The test of choice for assessment of gifted children: A comparison of the K-ABC, WISC-R, and Stanford-Binet. Journal of Psychoeducational Assessment, 2, 57–63.
McCann, J. T. (1991). Convergent and discriminant validity of the MCMI-II and MMPI personality disorder scales. Psychological Assessment, 3, 9–18.
McCarney, S. B. (1989). The Attention Deficit Disorders Evaluation Scale technical manual, school version. Columbia, MO: Hawthorne Educational Services, Inc.
McCarthy, B. W., & Rafferty, J. E. (1971). Effect of social desirability and self-concept scores on the measurement of adjustment. Journal of Personality Assessment, 35, 576–583.
McCarthy, D. (1972). Manual for the McCarthy Scales of Children's Abilities. New York: Psychological Corporation.
McCarthy, D. (1978). McCarthy Screening Test. New York: Psychological Corporation.
McCaulley, M. H. (1981). Jung's theory of psychological types and the Myers-Briggs Type Indicator. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 5, pp. 294–352). San Francisco, CA: Jossey-Bass.
McClelland, D. C. (1951). Personality. New York: Sloane.
McClelland, D. C., Atkinson, J. W., Clark, R. A., & Lowell, E. L. (1953). The achievement motive. New York: Appleton-Century-Crofts.
McClelland, D. C., & Boyatzis, R. E. (1982). Leadership motive pattern and long-term success in management. Journal of Applied Psychology, 67, 737–743.
McClenaghan, B., & Gallahue, D. (1978). Fundamental movement: A developmental and remedial approach. Philadelphia, PA: Saunders.
McConaghy, N. (1967). Penile volume change to moving pictures of male and female nudes in heterosexual and homosexual males. Behaviour Research and Therapy, 5, 43–48.
McCormack, R. L. (1983). Bias in the validity of predicted college grades in four ethnic minority groups. Educational and Psychological Measurement, 43, 517–522.
McCrae, R. R. (1982). Consensual validation of personality traits: Evidence from self-reports and ratings. Journal of Personality and Social Psychology, 43, 293–303.
McCrae, R., & Costa, P. (1983a). Social desirability scales: More substance than style. Journal of Consulting and Clinical Psychology, 51, 882–888.
McCrae, R. R., & Costa, P. T., Jr. (1983b). Joint factors in self-reports and ratings: Neuroticism, extraversion, and openness to experience. Personality and Individual Differences, 4, 245–255.
McCrae, R. R., & Costa, P. T., Jr. (1985). Updating Norman's "adequate taxonomy": Intelligence and personality dimensions in natural language and in questionnaires. Journal of Personality and Social Psychology, 49, 710–721.
McCrae, R. R., & Costa, P. T., Jr. (1986). Clinical assessment can benefit from recent advances in personality psychology. American Psychologist, 41, 1001–1003.
McCrae, R. R., & Costa, P. T., Jr. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52, 81–90.
McCrae, R. R., & Costa, P. T., Jr. (1989a). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57, 17–40.
McCrae, R. R., & Costa, P. T., Jr. (1989b). More reasons to adopt the five-factor model. American Psychologist, 44, 451–452.
McCrae, R. R., Costa, P. T., Jr., & Busch, C. M. (1986). Evaluating comprehensiveness in personality systems: The California Q-Set and the five-factor model. Journal of Personality, 54, 430–446.
McCrone, W. P., & Chambers, J. F. (1977). A national pilot study of psychological evaluation services to deaf vocational rehabilitation clients. Journal of Rehabilitation of the Deaf, 11, 1–4.
McDermid, C. D. (1965). Some correlates of creativity in engineering personnel. Journal of Applied Psychology, 49, 14–19.
McDougall, W. (1932). Of the words character and personality. Character and Personality, 1, 3–16.
McFall, R. M., & Lillesand, D. V. (1971). Behavior rehearsal with modeling and coaching in assertive training. Journal of Abnormal Psychology, 77, 313–323.
McFie, J. (1975). Assessment of organic intellectual impairment. London: Academic Press.
McGee, R. K. (1967). Response set in relation to personality: An orientation. In I. A. Berg (Ed.), Response set in personality assessment (pp. 1–31). Chicago, IL: Aldine.
McGill, J. C. (1980). MMPI score differences among Anglo, Black, and Mexican-American welfare recipients. Journal of Clinical Psychology, 36, 147–151.
McGreal, R., & Joseph, S. (1993). The Depression-Happiness Scale. Psychological Reports, 73, 1279–1282.
McGuire, F. L. (1980). The new MCAT and medical school performance. Journal of Medical Education, 55, 405–408.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.
McHorney, C. A., Ware, J. E., Jr., & Raczek, A. E. (1993). The MOS 36-item short-form Health Survey (SF-36) II: Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care, 31, 247–263.
McIntosh, D. E. (1999). Identifying at-risk preschoolers: The discriminant validity of the Differential Ability Scales. Psychology in the Schools, 36, 1–10.
McKee, M. G. (1972). Review of the EPPS. In O. K. Buros (Ed.), The seventh mental measurements yearbook (Vol. 1, pp. 1459151). Highland Park, NJ: Gryphon Press.
McKechnie, G. E. (1975). Manual for the Leisure Activities Blank. Palo Alto, CA: Consulting Psychologists Press.
McKinney, W. R. (1987). Public personnel selection: Issues and choice points. Public Personnel Management Journal, 16, 243–257.
McLain, D. L. (1993). The MSTAT-I: A new measure of an individual's tolerance for ambiguity. Educational and Psychological Measurement, 53, 183–189.
McMahon, R. J. (1984). Behavioral checklists and rating scales. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures. New York: Pergamon Press.
McNemar, Q. (1940). A critical examination of the University of Iowa studies of environmental influences upon the IQ. Psychological Bulletin, 37, 63–92.
McNemar, Q. (1950). On abbreviated Wechsler-Bellevue scales. Journal of Consulting Psychology, 14, 79–81.
McQuaid, M. F., & Alovisetti, M. (1981). School psychological services for hearing-impaired children in the New York and New England area. American Annals of the Deaf, 126, 37–42.
McQuitty, L. (1957). Elementary linkage analysis for isolating orthogonal and oblique types and typical relevancies. Educational and Psychological Measurement, 17, 207–229.
McReynolds, P. (1987). Lightner Witmer. American Psychologist, 42, 849–858.
McReynolds, P., & Ludwig, K. (1984). Christian Thomasius and the origin of psychological rating scales. ISIS, 75, 546–553.
McReynolds, P., & Ludwig, K. (1987). On the history of rating scales. Personality and Individual Differences, 8, 281–283.
McShane, D. A. (1980). A review of scores of American Indian children on the Wechsler Intelligence Scales. White Cloud Journal, 1, 3–10.
McShane, D. A., & Plas, J. M. (1982). Wechsler scale performance patterns of American Indian children. Psychology in the Schools, 19, 8–17.
McShane, S. L., & Karp, J. (1993). Employment following spinal cord injury: A covariance structure analysis. Rehabilitation Psychology, 38, 27–40.
Meador, D., Livesay, K. K., & Finn, M. H. (1983). Study Number 27. In A. S. Kaufman & N. L. Kaufman (Eds.), Kaufman Assessment Battery for Children: Interpretive manual. Circle Pines, MN: American Guidance Service.
Meadow, K. P. (1983). An instrument for assessment of social-emotional adjustment in hearing-impaired preschoolers. American Annals of the Deaf, 128, 826–884.
Medley, M. L. (1980). Life satisfaction across four stages of adult life. International Journal of Aging and Human Development, 11, 192–220.
Meehl, P. E. (1954). Clinical versus statistical prediction. Minneapolis, MN: University of Minnesota Press.
Meehl, P. E. (1956). Wanted - a good cookbook. American Psychologist, 11, 263–272.
Meehl, P. E. (1957). When shall we use our heads instead of the formula? Journal of Counseling Psychology, 4, 268–273.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the MMPI. Journal of Applied Psychology, 30, 525–564.
Meehl, P. E., Lykken, D. T., Schofield, W., & Tellegen, A. (1971). Recaptured-item technique (RIT): A method for reducing somewhat the subjective element in factor naming. Journal of Experimental Research in Personality, 5, 171–190.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Meeker, M. (1985). A teacher's guide for the Structure of Intellect Learning Abilities Test. Los Angeles, CA: Western Psychological Services.
Meeker, M., Meeker, R. J., & Roid, G. H. (1985). Structure of Intellect Learning Abilities Test (SOI-LA) manual. Los Angeles, CA: Western Psychological Services.
Meer, B., & Baker, J. A. (1965). Reliability of measurements of intellectual functioning of geriatric patients. Journal of Gerontology, 20, 110–114.
Megargee, E. I. (1972). The California Psychological Inventory handbook. San Francisco, CA: Jossey-Bass.
Mehryar, A. H., Hekmat, H., & Khajavi, F. (1977). Some personality correlates of contemplated suicide. Psychological Reports, 40, 1291–1294.
Meier, S. T. (1993). Revitalizing the measurement curriculum. American Psychologist, 48, 886–891.
Meijer, R. R., & Nering, M. L. (1999). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23, 187–194.
Melchior, L. A., Huba, G. J., Brown, V. B., & Reback, C. J. (1993). A short depression index for women. Educational and Psychological Measurement, 53, 1117–1125.
Melzack, R. (1975). The McGill Pain Questionnaire: Major properties and scoring methods. Pain, 1, 277–299.
Melzack, R. (1984). Measurement of the dimensions of pain. In B. Bromm (Ed.), Pain measurement in man: Neurophysiological correlates of pain. New York: Elsevier.
Melzack, R. (1987). The short-form McGill Pain Questionnaire. Pain, 30, 191–197.
Melzack, R., & Torgerson, W. S. (1971). On the language of pain. Anesthesiology, 34, 50–59.
Melzack, R., & Wall, P. D. (1965). Pain mechanisms: A new theory. Science, 150, 971–979.
Mendoza, R. H. (1989). An empirical scale to measure type and degree of acculturation in Mexican-American adolescents and adults. Journal of Cross-Cultural Psychology, 20, 372–385.
Mercer, C. D., Algozzine, B., & Trifiletti, J. J. (1979). Early identification issues and considerations. Exceptional Children, 46, 52–54.
Mercer, J. (1973). Labeling the mentally retarded. Berkeley, CA: University of California Press.
Mercer, J. R. (1976). A system of multicultural pluralistic assessment (SOMPA). In Proceedings: With bias toward none. Lexington, KY: Coordinating Office for Regional Resource Centers, University of Kentucky.
Mercer, J., Gomez-Palacio, M., & Padilla, E. (1986). The development of practical intelligence in cross-cultural perspective. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence (pp. 307–337). Cambridge, MA: Cambridge University Press.
Mercer, J., & Lewis, J. (1977). System of multicultural pluralistic assessment. New York: Psychological Corporation.
Merenda, P. F. (1958). Navy basic test battery validity for success in naval occupations. Personnel Psychology, 11, 567–577.
Merenda, P. F., & Reilly, R. (1971). Validation of selection criteria in determining success of graduate students in psychology. Psychological Reports, 28, 259–266.
Merrell, K. W. (1990). Teacher ratings of hyperactivity and self-control in learning disabled boys: A comparison with low achieving and average peers. Psychology in the Schools, 27, 289–296.
Merrell, K. W. (1994). Assessment of behavioral, social, and emotional problems. New York: Longman.
Mershon, B., & Gorsuch, R. L. (1988). Number of factors in the personality sphere: Does increase in factors increase predictability of real life criteria? Journal of Personality and Social Psychology, 55, 675–680.
Messick, S. (1960). Dimensions of social desirability. Journal of Consulting Psychology, 24, 279–287.
Messick, S. (1962). Response style and content measures from personality inventories. Educational and Psychological Measurement, 22, 41–56.
Messick, S. (1975). Meaning and values in measurement and evaluation. American Psychologist, 35, 1012–1027.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.
Messick, S. (1995). Validity of psychological assessment. American Psychologist, 50, 741–749.
Messick, S., & Anderson, S. (1970). Educational testing, individual development, and social responsibility. The Counseling Psychologist, 2, 80–88.
Messick, S. J., & Jackson, D. N. (1958). The measurement of authoritarian attitudes. Educational and Psychological Measurement, 18, 241–253.
Messick, S., & Jungeblut, A. (1981). Time and method in coaching for the SAT. Psychological Bulletin, 89, 191–216.
Metcalfe, M., & Goldman, E. (1965). Validation of an inventory for measuring depression. British Journal of Psychiatry, 111, 240–242.
Metzger, A. M. (1979). A Q-methodological study of the Kübler-Ross stage theory. Omega, 10, 291–301.
Meyer, A. B., Fouad, N. A., & Klein, M. (1987). Vocational inventories. In B. Bolton (Ed.), Handbook of measurement and evaluation in rehabilitation (2nd ed., pp. 119–138). Baltimore, MD: Paul H. Brookes.
Meyer, G. J. (1997). Assessing reliability: Critical corrections for a critical examination of the Rorschach Comprehensive System. Psychological Assessment, 9, 480–489.
Meyer, G. J., Finn, S. E., & Eyde, L. D. (2001). Psychological testing and psychological assessment. American Psychologist, 56, 128–165.
Meyer, P., & Davis, S. (1992). The California Psychological Inventory applications guide. Palo Alto, CA: Consulting Psychologists Press.
Meyers, C. E., Nihira, K., & Zetlin, A. (1979). The measurement of adaptive behavior. In N. R. Ellis (Ed.), Handbook of mental deficiency: Psychological theory and research (2nd ed., pp. 215–253). Hillsdale, NJ: Erlbaum.
Michelson, L., & Wood, R. (1982). Development and psychometric properties of the Children's Assertive Behavior Scale. Journal of Behavioral Assessment, 4, 3–13.
Miethe, T. D. (1985). The validity and reliability of value measurements. The Journal of Psychology, 119, 441–453.
Millard, K. A. (1952). Is How Supervise? an intelligence test? Journal of Applied Psychology, 36, 221–224.
Miller, B. C., Rollins, B. C., & Thomas, D. L. (1982). On methods of studying marriages and families. Journal of Marriage and the Family, 44, 851–873.
Miller, D., Wertz, O., & Counts, S. (1961). Racial differences on the MMPI. Journal of Clinical Psychology, 17, 159–161.
Miller, J. F., & Powers, M. J. (1988). Development of an instrument to measure hope. Nursing Research, 37, 6–10.
Miller, K. M., & Biggs, J. B. (1958). Attitude change through undirected group discussion. Journal of Educational Psychology, 49, 224–228.
Miller, L. C., Murphy, R., & Buss, A. H. (1981). Consciousness of body: Private and public. Journal of Personality and Social Psychology, 41, 397–406.
Miller, M. D., & Linn, R. L. (2000). Validation of performance-based assessments. Applied Psychological Measurement, 24, 367–378.
Miller, W. R. (1976). Alcoholism scales and objective assessment methods: A review. Psychological Bulletin, 83, 649–674.
Millman, J., Bishop, C. H., & Ebel, R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25, 707–726.
Millon, T. (1987). Millon Clinical Multiaxial Inventory-II: MCMI-II (2nd ed.). Minneapolis, MN: National Computer Systems.
Millon, T., & Green, C. (1989). Interpretive guide to the Millon Clinical Multiaxial Inventory (MCMI-II). In C. S. Newmark (Ed.), Major psychological assessment instruments (Vol. 2, pp. 5–43). Boston, MA: Allyn & Bacon.
Millon, T., Green, C. J., & Meagher, R. B., Jr. (1982a). Millon Adolescent Personality Inventory Manual. Minneapolis, MN: National Computer Systems.
Millon, T., Green, C. J., & Meagher, R. B., Jr. (1982b). Millon Behavioral Health Inventory Manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Miner, J. (1964). Scoring guide for the Miner Sentence Completion Scale. New York: Springer-Verlag.
Minton, H. L. (1984). The Iowa Child Welfare Research Station and the 1940 debate on intelligence: Carrying on the legacy of a concerned mother. Journal of the History of the Behavioral Sciences, 20, 160–176.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mischel, W. (1977). On the future of personality measurement. American Psychologist, 32, 246–254.
Mischel, W. (1981). Introduction to personality (3rd ed.). New York: Holt, Rinehart and Winston.
Mishra, S. P. (1971). Wechsler Adult Intelligence Scale: Examiner vs. machine administration. Psychological Reports, 29, 759–762.
Mishra, S. (1983). Evidence of item bias in the verbal subtests of the WISC-R for Mexican-American children. Journal of Psychoeducational Assessment, 1, 321–328.
Misiak, H., & Sexton, V. S. (1966). History of psychology. New York: Grune & Stratton.
Mitchell, A. J. (1937). The effect of bilingualism on the measurement of intelligence. Elementary School Journal, 38, 29–37.
Mitchell, R. (1973). Defining medical terms. Developmental Medicine and Child Neurology, 15, 279.
Mitchell, T. W., & Klimoski, R. J. (1982). Is it rational to be empirical? A test of methods for scoring biographical data. Journal of Applied Psychology, 67, 411–418.
Mittenberg, W., Azrin, R., Millsaps, C., & Heilbronner, R. (1993). Identification of malingered head injury on the Wechsler Memory Scale-Revised. Psychological Assessment, 5, 34–40.
Mittenberg, W., Rotholc, A., Russell, E., & Heilbronner, R. (1996). Identification of malingered head injury on the Halstead-Reitan Battery. Archives of Clinical Neuropsychology, 11, 271–281.
Mittenberg, W., Theroux-Fichera, S., Zielinski, R. E., & Heilbronner, R. L. (1995). Identification of malingered head injury on the Wechsler Adult Intelligence Scale-Revised. Professional Psychology: Research and Practice, 26, 491–498.
Mizes, J. S., & Crawford, J. (1986). Normative values on the Marks and Mathews Fear Questionnaire: A comparison as a function of age and sex. Journal of Psychopathology and Behavioral Assessment, 8, 253–262.
Montgomery, G. T., & Orozco, S. (1984). Validation of a measure of acculturation for Mexican Americans. Hispanic Journal of Behavioral Sciences, 6, 53–63.
Moore, M. (1975). Rating vs. ranking in the Rokeach Value Survey: An Israeli comparison. European Journal of Social Psychology, 5, 405–408.
Moore, N. C., Summer, K. R., & Bloor, R. N. (1984). Do patients like psychometric testing by computer? Journal of Clinical Psychology, 40, 875–877.
Moos, R. H. (1968). The assessment of the social climates of correctional institutions. Journal of Research in Crime and Delinquency, 5, 174–188.
Moos, R. H. (1972). Assessment of the psychosocial environments of community-oriented psychiatric treatment programs. Journal of Abnormal Psychology, 79, 9–18.
Moos, R. H. (1979). Evaluating educational environments. San Francisco, CA: Jossey-Bass.
Moos, R. H., & Gerst, M. S. (1974). University Residence Environment Scale. Palo Alto, CA: Consulting Psychologists Press.
Moos, R. H., & Houts, P. (1968). Assessment of the social atmosphere of psychiatric wards. Journal of Abnormal Psychology, 73, 595–604.
Moos, R. H., & Swindle, R. W., Jr. (1990). Stressful life circumstances: Concepts and measures. Stress Medicine, 6, 171–178.
Moos, R. H., & Trickett, E. J. (1974). Classroom Environment Scale manual. Palo Alto, CA: Consulting Psychologists Press.
Moracco, J., & Zeidan, M. (1982). Assessment of sex knowledge and attitude of non-Western medical students. Psychology, 19, 13–21.
Moran, A. (1986). The reliability and validity of Raven's Standard Progressive Matrices for Irish apprentices. International Review of Applied Psychology, 35, 533–538.
Moran, P. W., & Lambert, M. J. (1983). A review of current assessment tools for monitoring changes in depression. In M. S. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 263–303). New York: Wiley.
Moreland, K. L. (1987). Computerized psychological assessment: What's available. In J. N. Butcher (Ed.), Computerized psychological assessment (pp. 26–49). New York: Basic Books.
Moreland, K. L. (1991). Assessment of validity in computer-based test interpretations. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 43–74). Hillsdale, NJ: Erlbaum.
Moreland, K. L., & Onstad, J. A. (1987). Validity of Millon's computerized interpretation system for the MCMI: A controlled study. Journal of Consulting and Clinical Psychology, 55, 113–114.
Morey, L. C., & LeVine, D. J. (1988). A multitraitmultimethod examination of Minnesota Multiphasic Personality Inventory (MMPI) and Millon
Clinical Multiaxial Inventory (MCMI). Journal of
Psychopathology and Behavioral Assessment, 10, 333
344.
Morgan, W. G. (1995). Origin and history of the Thematic Apperception Test images. Journal of Personality Assessment, 65, 237254.
Morris, J. N., & Sherwood, S. (1975). A retesting and
modication of the Philadelphia Geriatric Center
Morale Scale. Journal of Gerontology, 30, 7784.
Mosel, J. N. (1952). The validity of rational ratings
on experience and training. Personnel Psychology, 5,
19.
Mosel, J. N., & Cozan, L. W. (1952). The accuracy of application blank work histories. Journal of Applied Psychology, 36, 365–369.
Mosher, D. L. (1965). Approval motive and acceptance of fake personality interpretations which differ in favorability. Psychological Reports, 17, 395–402.
Mosier, C. I. (1941). A short cut in the estimation of the split-halves coefficient. Educational and Psychological Measurement, 1, 407–408.
Mowrer, O. H. (1960). The psychology of hope. San
Francisco, CA: Jossey-Bass.
Mowrer, O. H. (Ed.). (1967). Morality and mental
health. Chicago, IL: Rand McNally & Co.
Mulaik, S. A. (1964). Are personality factors raters' conceptual factors? Journal of Consulting Psychology, 28, 506–511.
Mullis, I. V. S. (1992). Developing the NAEP content-area frameworks and innovative assessment methods in the 1992 assessments of mathematics, reading, and writing. Journal of Educational Measurement, 29, 111–131.
Mumford, M. D., & Owens, W. A. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11, 1–31.
Mumley, D. L., Tillbrook, C. E., & Grisso, T. (2003). Five year research update (1996–2000): Evaluations for Competence to Stand Trial (Adjudicative Competence). Behavioral Sciences and the Law, 21, 329–350.
Murphy, J. J. (1972). Current practices in the use of psychological testing by police agencies. Journal of Criminal Law, Criminology, and Police Science, 63, 570–576.
Murphy, K. R. (1994). Potential effects of banding as a function of test reliability. Personnel Psychology, 47, 477–495.
Murphy, K. R., & Constans, J. I. (1987). Behavioral anchors as a source of bias in rating. Journal of Applied Psychology, 72, 573–577.
Murphy, K. R., Jako, R. A., & Anhalt, R. L. (1993). Nature and consequences of halo error: A critical analysis. Journal of Applied Psychology, 78, 218–225.
Murray, H. A. (1938). Explorations in personality. New
York: Oxford University Press.
Murray, H. A. (1943). Thematic Apperception Test:
Pictures and manual. Cambridge, MA: Harvard
University Press.
Murray, H. A. (1967). Autobiography. In E. G. Boring & G. Lindzey (Eds.), A history of psychology in autobiography (Vol. 5, pp. 283–310). New York: Appleton-Century-Crofts.
Murray, J. B. (1990). Review of research on the Myers-Briggs Type Indicator. Perceptual and Motor Skills, 70, 1187–1202.
Murstein, B. J. (1963). Theory and research in projective
techniques (emphasizing the TAT). New York: Wiley.
Myers, A. M., Holliday, P. J., Harvey, K. A., & Hutchinson, K. S. (1993). Functional performance measures: Are they superior to self-assessments? Journal of Gerontology, 48, 196–206.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A
guide to the development and use of the Myers-Briggs
Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Myklebust, H. (1964). The psychology of deafness. New
York: Grune & Stratton.
Nagle, R. J. (1979). The McCarthy Scales of Children's Abilities: Research implications for the assessment of young children. School Psychology Digest, 8, 319–326.
Nagle, R. J., & Bell, N. L. (1995). Validation of an item-reduction short form of the Stanford-Binet Intelligence Scale: Fourth edition with college students. Journal of Clinical Psychology, 51, 63–70.
Naglieri, J. A. (1981). Factor structure of the WISC-R for children identified as learning disabled. Psychological Reports, 49, 891–895.
Naglieri, J. A. (1988). Draw A Person: A quantitative scoring system. San Antonio, TX: Psychological Corporation.
Naglieri, J. A., & Kaufman, A. S. (1983). How many factors underlie the WAIS-R? The Journal of Psychoeducational Assessment, 1, 113–120.
Naglieri, J. A., & Welch, J. A. (1991). Use of Raven's and Naglieri's Nonverbal Matrix Tests. Journal of the American Deafness and Rehabilitation Association, 24, 98–103.
Nairn, A., and associates (1980). The reign of ETS. Washington, DC: Ralph Nader.
Narby, D. J., Cutler, B. L., & Moran, G. (1993). A meta-analysis of the association between authoritarianism and jurors' perceptions of defendant culpability. Journal of Applied Psychology, 78, 34–42.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investigation. Personnel Psychology, 41, 517–535.
Nathan, B. R., & Tippins, N. (1990). The consequences of halo error in performance ratings: A field study of the moderating effect of halo on test validation results. Journal of Applied Psychology, 75, 290–296.
National Commission on Excellence in Education (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.
National Society to Prevent Blindness (1974). The Symbol Chart for Twenty Feet Snellen Scale. New York:
Author.
Naughton, M. J., & Wiklund, I. (1993). A critical review of dimension-specific measures of health-related quality of life in cross-cultural research. Quality of Life Research, 2, 397–432.
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3–19.
Needham, W. E., & Eldridge, L. S. (1990). Performance of blind vocational rehabilitation clients on the Minnesota Rate of Manipulation Tests. Journal of Visual Impairment and Blindness, 84, 182–185.
Neeper, R., & Lahey, B. B. (1984). Identification of two dimensions of cognitive deficits through the factor analysis of teacher ratings. School Psychology Review, 13, 485–490.
Neiner, A. G., & Owens, W. A. (1985). Using biodata to predict job choice among college graduates. Journal of Applied Psychology, 70, 127–136.
Neisser, U. (1976). General, academic, and artificial intelligence. In L. Resnick (Ed.), The nature of intelligence (pp. 135–144). Hillsdale, NJ: Erlbaum.
Neisser, U., Boodoo, G., Bouchard, T. J., Jr., & Boykin, A. W. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Neisworth, J. T., & Bagnato, S. J. (1986). Curriculum-based developmental assessment: Congruence of testing and teaching. School Psychology Review, 15, 180–199.
Nelson, R. O. (1977). Methodological issues in assessment via self-monitoring. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 217–240). New York: Brunner/Mazel.
Nelson, R. O., & Hayes, S. C. (1979). Some current dimensions of behavioral assessment. Behavioral Assessment, 1, 1–16.
Nelson-Gray, R. O. (1991). DSM-IV: Empirical guidelines for psychometrics. Journal of Abnormal Psychology, 100, 308–315.
Nester, M. A. (1984). Employment testing for handicapped persons. Public Personnel Management Journal, 13, 417–434.
Nester, M. A. (1993). Psychometric testing and reasonable accommodation for persons with disabilities. Rehabilitation Psychology, 38, 75–85.
Neugarten, B. (1974). Age groups in the young-old. Annals of the American Academy of Political and Social Science, 415, 187–198.
Neugarten, B. L., & Associates (Eds.). (1964). Personality in middle and late life. New York: Atherton.
Neugarten, B. L., Havighurst, R. J., & Tobin, S. S. (1961). The measurement of life satisfaction. Journal of Gerontology, 16, 134–143.
Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22, 287–293.
Newcomb, T. M. (1950). Social psychology. New York: Holt.
Newman, J. P. (1989). Aging and depression. Psychology and Aging, 4, 150–165.
Ng, S. H., Akhtar Hossain, A. B. M., Ball, P., Bond, M. H., Hayashi, K., Lim, S. P., O'Driscoll, M. P., Sinha, D., & Yang, K. S. (1982). Human values in nine countries. In R. Rath, H. S. Asthana, D. Sinha, & J. B. P. Sinha (Eds.), Diversity and unity in cross-cultural psychology. Lisse, the Netherlands: Swets & Zeitlinger.
Nichols, R. C., & Schnell, R. R. (1963). Factor scales for the California Psychological Inventory. Journal of Consulting Psychology, 27, 228–235.
Nicholson, C. L. (1977). Correlations between the Quick Test and the Wechsler Scale for Children-Revised. Psychological Reports, 40, 523–526.
Nicholson, R. A., Briggs, S. R., & Robertson, H. C. (1988). Instruments for assessing competency to stand trial: How do they work? Professional Psychology, 19, 383–394.
Nietzel, M. T., Martorano, R. D., & Melnick, J. (1977). The effects of covert modeling with and without reply training on the development and generalization of assertive responses. Behavior Therapy, 8, 183–192.
Nihira, K., Foster, R., Shellhass, M., & Leland, H. (1974). AAMD Adaptive Behavior Scale. Washington, DC: American Association of Mental Deficiency.
Nolan, D. R., Hameke, T. A., & Barkley, R. A. (1983). A comparison of the patterns of the neuropsychological performance in two groups of learning-disabled children. Journal of Clinical Child Psychology, 12, 13–21.
Noonan, J. V., & Sarvela, P. D. (1991). Implementation decisions in designing computer-based instructional testing programs. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 177–197). Hillsdale, NJ: Erlbaum.
Norman, R. D., & Daley, M. F. (1959). Senescent changes in intellectual ability among superior older women. Journal of Gerontology, 14, 457–464.
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66, 574–583.
Norring, C., & Sohlberg, S. (1988). Eating Disorder Inventory in Sweden: Description, cross-cultural comparison, and clinical utility. Acta Psychiatrica Scandinavica, 78, 567–575.
Norton, R. (1983). Measuring marital quality: A critical look at the dependent variable. Journal of Marriage and the Family, 45, 141–151.
Nowacek, G. A., Pullen, E., Short, J., & Blumner, H. N. (1987). Validity of MCAT scores as predictors of preclinical grades and NBME Part I examination scores. Journal of Medical Education, 62, 989–991.
Nowicki, S., & Duke, M. P. (1974). A locus of control scale for non-college as well as college adults. Journal of Personality Assessment, 38, 136–137.
Nowicki, S., & Strickland, B. R. (1973). A locus of control scale for children. Journal of Consulting Psychology, 40, 148–154.
Nunnally, J. (1962). The analysis of profile data. Psychological Bulletin, 59, 311–319.
Nunnally, J., & Husek, T. R. (1958). The phony language examination: An approach to the measurement of response bias. Educational and Psychological Measurement, 2, 275–282.
Nussbaum, K., Wittig, B. A., Hanlon, T. E., & Kurland, A. A. (1963). Intravenous nialamide in the treatment of depressed female patients. Comprehensive Psychiatry, 4, 105–116.
Nuttall, E. V., Romero, I., & Kalesnik, J. (Eds.). (1992).
Assessing and screening preschoolers. Boston, MA:
Allyn & Bacon.
Oakland, T. (1985a). Review of the Otis-Lennon School Ability Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1111–1112). Lincoln, NE: University of Nebraska Press.
Oakland, T. (1985b). Review of the Slosson Intelligence Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1401–1403). Lincoln, NE: University of Nebraska Press.
Oakland, T., & Matuszek, P. (1977). Using tests in nondiscriminatory assessment. In T. Oakland (Ed.), Psychological and educational assessment in minority group children. New York: Brunner/Mazel.
Obayuwana, A. O., Collins, J. L., Carter, A. L., Rao, M. S., Mathura, C. C., & Wilson, S. B. (1982). Hope Index Scale: An instrument for the objective assessment of hope. Journal of the National Medical Association of New York, 74, 761.
Oblowitz, N., Green, L., & Heyns, I. (1991). A self-concept scale for the hearing-impaired. The Volta Review, 93, 19–29.
O'Brien, N. P. (1988). Test construction. New York: Greenwood Press.
O'Dell, J. (1972). P. T. Barnum explores the computer. Journal of Consulting and Clinical Psychology, 38, 270–273.
O'Donnell, J. M. (1979). The clinical psychology of Lightner Witmer: A case study of institutional innovation and intellectual change. Journal of the History of the Behavioral Sciences, 15, 3–17.
O'Donohue, W., & Letourneau, E. (1992). The psychometric properties of the penile tumescence assessment of child molesters. Journal of Psychopathology and Behavioral Assessment, 14, 123–174.
Olbrisch, M. E. (1983). Development and validation of the Ostomy Adjustment Scale. Rehabilitation Psychology, 28, 3–12.
Olea, M. M., & Ree, M. J. (1994). Predicting pilot and navigator criteria: Not much more than g. Journal of Applied Psychology, 79, 845–851.
Oliver, J. M., & Burkham, R. (1979). Depression in university students: Duration, relation to calendar time, prevalence, and demographic correlates. Journal of Abnormal Psychology, 88, 667–670.
Oliveri, M. E., & Reiss, D. (1984). Family concepts and their measurement: Things are seldom what they seem. Family Process, 23, 33–48.
Ollendick, T. H. (1979). Discrepancies between Verbal and Performance IQs and subtest scatter on the WISC-R for juvenile delinquents. Psychological Reports, 45, 563–568.
Olmedo, E. L. (1979). Acculturation: A psychometric perspective. American Psychologist, 34, 1061–1070.
Olmedo, E. L. (1981). Testing linguistic minorities. American Psychologist, 36, 1078–1085.
Olmedo, E. L., Martinez, J. L., & Martinez, S. R. (1978). Measure of acculturation for Chicano adolescents. Psychological Reports, 42, 159–170.
Olmedo, E. L., & Padilla, A. M. (1978). Empirical and construct validation of a measure of acculturation for Mexican-Americans. Journal of Social Psychology, 105, 179–187.
Olson, D. H., McCubbin, H. I., Barnes, H., Larsen, A., Muxen, M., & Wilson, M. (1985). Family inventories. St. Paul, MN: University of Minnesota.
Olson, D. H., & Rabunsky, C. (1972). Validity of four measures of family power. Journal of Marriage and the Family, 34, 224–234.
Oltman, P. K., & Hartnett, R. T. (1984). The role of GRE General and Subject Test scores in graduate program admission. ETS Research Report, 84-14.
Ones, D. S., Chockalingam, V., & Schmidt, F. L. (1995). Integrity tests: Overlooked facts, resolved issues, and remaining questions. American Psychologist, 50, 456–457.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
Ong, J., & Marchbanks, R. L. (1973). Validity of selected academic and non-academic predictors of optometry grades. American Journal of Optometry, 50, 583–588.
Oosterhof, A. C. (1976). Similarity of various item discrimination indices. Journal of Educational Measurement, 13, 145–150.
Oppenheim, A. N. (1992). Questionnaire design, interviewing and attitude measurement. London: Pinter.
Ortiz, V. Z., & Gonzalez, A. (1989). Validation of a short form of the WISC-R with accelerated and gifted Hispanic students. Gifted Child Quarterly, 33, 152–155.
Ortiz, V. Z., & Volkoff, W. (1987). Identification of gifted and accelerated Hispanic students. Journal for the Education of the Gifted, 11, 45–55.
Orvik, J. M. (1972). Social desirability for the individual, his group, and society. Multivariate Behavioral Research, 7, 332.
Osborne, R. T. (1994). The Burt collection. Journal of the History of the Behavioral Sciences, 30, 369–379.
Osgood, C., & Luria, Z. (1954). A blind analysis of a case of multiple personality using the Semantic Differential. Journal of Abnormal and Social Psychology, 49, 579–591.
Osgood, C., Suci, G., & Tannenbaum, P. (1957). The
measurement of meaning. Urbana, IL: University of
Illinois Press.
Osipov, V. P. (1944). Malingering: The simulation of psychosis. Bulletin of the Menninger Clinic, 8, 39–42.
Osipow, S. H. (1983). Theories of career development
(3rd ed.). New York: Appleton-Century-Crofts.
Osipow, S. H. (Ed.). (1987). Manual for the Career
Decision Scale. Odessa, FL: Psychological Assessment Resources.
Oskamp, S. (1991). Attitudes and opinions (2nd ed.).
Englewood Cliffs, NJ: Prentice Hall.
OSS Assessment Staff (1948). Assessment of men: Selection of personnel for the Office of Strategic Services. New York: Rinehart.
Osterhouse, R. A. (1972). Desensitization and study-skills training as treatment for two types of test-anxious students. Journal of Counseling Psychology, 19, 301–307.
Osterlind, S. J. (1983). Test item bias. Newbury Park,
CA: Sage.
Osterlind, S. J. (1989). Constructing test items. Boston,
MA: Kluwer Academic.
Ostrom, T. M. (1969). The relationship between the affective, behavioral, and cognitive components of attitude. Journal of Experimental Social Psychology, 5, 12–30.
Ouellette, S. E. (1988). The use of projective drawing techniques in the personality assessment of prelingually deafened young adults: A pilot study. American Annals of the Deaf, 133, 212–218.
Overall, J. E. (1974). Validity of the Psychological Screening Inventory for psychiatric screening. Journal of Consulting and Clinical Psychology, 42, 717–719.
Overall, J. E., & Magee, K. N. (1992). Estimating individual rater reliabilities. Applied Psychological Measurement, 16, 77–85.
Owen, D. (1985). None of the above. Boston, MA: Houghton Mifflin.
Owen, S. V., & Baum, S. M. (1985). The validity of the measurement of originality. Educational and Psychological Measurement, 45, 939–944.
Owens, W. A. (1976). Background data. In M. D. Dunnette (Ed.), Handbook of industrial psychology. New
York: Rand-McNally.
Owens, W. A., Glennon, J. R., & Albright, L. E. (1962). Retest consistency and the writing of life history items: A first step. Journal of Applied Psychology, 46, 329–332.
Ownby, R. L., & Carmin, C. N. (1988). Confirmatory factor analyses of the Stanford-Binet Intelligence Scale (4th ed.). Journal of Psychoeducational Assessment, 6, 331–340.
Pace, C. R., & Stern, G. G. (1958). An approach to the measurement of psychological characteristics of college environments. Journal of Educational Psychology, 49, 269–277.
Padilla, A. (1980). The role of cultural awareness and ethnic loyalty in acculturation. In A. Padilla (Ed.), Acculturation: Theory, models, and some new findings. Boulder, CO: Westview Press.
Padilla, E. R., Olmedo, E. L., & Loya, R. (1982). Acculturation and the MMPI performance of Chicano and Anglo college students. Hispanic Journal of Behavioral Sciences, 4, 451–466.
Paget, K. D., & Nagle, R. J. (1986). A conceptual model of preschool assessment. School Psychology Review, 15, 154–165.
Paivio, A. (1971). Imagery and verbal processes. New
York: Holt, Rinehart and Winston.
Paivio, A. (1975). Imagery and synchronic thinking. Canadian Psychological Review, 16, 147–163.
Paivio, A., & Harshman, R. (1983). Factor analysis of a questionnaire on imagery and verbal habits and skills. Canadian Journal of Psychology, 37, 461–483.
Pankratz, L., Fausti, S., & Peed, S. (1975). A forced-choice technique to evaluate deafness in the hysterical or malingering patient. Journal of Consulting and Clinical Psychology, 43, 421–422.
Panton, J. H. (1958). Predicting prison adjustment with the MMPI. Journal of Clinical Psychology, 14, 308–312.
Parker, K. (1983). A meta-analysis of the reliability and validity of the Rorschach. Journal of Personality Assessment, 47, 227–231.
Parrott, C. A. (1986). Validation report on the Verbalizer-Visualizer Questionnaire. Journal of Mental Imagery, 10, 39–42.
Pascal, G. R., & Suttell, B. J. (1951). The Bender-Gestalt Test: Its quantification and validity for adults. New York: Grune & Stratton.
Passow, H. (1985). Review of School and College Ability Tests, Series III. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1317–1318). Lincoln, NE: University of Nebraska Press.
Pastore, N. (1978). The Army intelligence tests and Walter Lippmann. Journal of the History of the Behavioral Sciences, 14, 316–327.
Patterson, C. H. (1946). A comparison of various short forms of the Wechsler-Bellevue Scale. Journal of Consulting Psychology, 10, 260–267.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598–609.
Paulhus, D. L. (1986). Self-deception and impression management in test responses. In A. Angleitner & J. S. Wiggins (Eds.), Personality assessment via questionnaires (pp. 143–165). Berlin: Springer-Verlag.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). New York: Academic Press.
Payne, S. L. (1951). The art of asking questions. Princeton, NJ: Princeton University Press.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373–406.
Pearson, B. Z. (1993). Predictive validity of the Scholastic Aptitude Test (SAT) for Hispanic bilingual students. Hispanic Journal of Behavioral Sciences, 15, 342–356.
Pederson, K. (1990). Candidate Profile Record. In J. Hogan & R. Hogan (Eds.), Business and industry testing (pp. 357–359). Austin, TX: Pro-ed.
Pedhazur, E. (1982). Multiple regression in behavioral
research: Explanation and prediction (2nd ed.). New
York: Holt, Rinehart and Winston.
Penner, L. A., Homant, R., & Rokeach, M. (1968). Comparison of rank-order and paired-comparison methods for measuring value systems. Perceptual and Motor Skills, 27, 417–418.
Pennock-Roman, M. (1990). Test validity and language
background: A study of Hispanic American students at
six universities. New York: College Entrance Examination Board.
Peplau, L. A., & Perlman, D. (Eds.). (1982). Loneliness:
A sourcebook of current theory, research and therapy.
New York: Wiley.
Perry, G. G., & Kinder, B. N. (1990). The susceptibility of the Rorschach to malingering: A critical review. Journal of Personality Assessment, 54, 47–57.
Petersen, N. S., & Novick, M. R. (1976). An evaluation of some models for culture-fair selection. Journal of Educational Measurement, 13, 3–29.
Peterson, J. (1926). Early conceptions and tests of intelligence. Yonkers, NY: World Book Company.
Petrie, B. M. (1969). Statistical analysis of attitude scale scores. Research Quarterly, 40, 434–437.
Pfeiffer, E. (1975). SPMSQ: Short Portable Mental Status Questionnaire. Journal of the American Geriatric Society, 23, 433–441.
Phares, E. J. (1976). Locus of control in personality. Morristown, NJ: General Learning Press.
Phelps, L. (1989). Comparison of scores for intellectually gifted students on the WISC-R and the fourth edition of the Stanford-Binet. Psychology in the Schools, 26, 125–129.
Phelps, L., & Bell, M. C. (1988). Correlations between the Stanford-Binet: Fourth Edition and the WISC-R with a learning disabled population. Psychology in the Schools, 25, 380–382.
Phelps, L., & Branyan, L. T. (1988). Correlations among the Hiskey, K-ABC Nonverbal Scale, Leiter, and WISC-R Performance Scale with public-school deaf children. Journal of Psychoeducational Assessment, 6, 354–358.
Phillips, B. L., Pasewark, R. A., & Tindall, R. C. (1978). Relationship among McCarthy Scales of Children's Abilities, WPPSI, and Columbia Mental Maturity Scale. Psychology in the Schools, 15, 352–356.
Phillips, S. (1980). Children's perceptions of health and disease. Canadian Family Physician, 26, 1171–1174.
Piaget, J. (1952). The origins of intelligence in children.
New York: International Universities Press.
Piaget, J. (1967). Six psychological studies. New York:
Random House.
Piedmont, R. L. (1998). The revised NEO Personality Inventory: Clinical and research applications. The Plenum series in social/clinical psychology. New York: Plenum.
Piers, E., & Harris, D. (1969). Manual for the Piers-Harris Children's Self-concept Scale. Nashville, TN: Counselor Recordings and Tests.
Piersma, H. L. (1987). Millon Clinical Multiaxial Inventory (MCMI) computer-generated diagnoses: How do they compare to clinician judgment? Journal of Psychopathology and Behavioral Assessment, 9, 305–312.
Pintner, R. (1924). Results obtained with the non-language group tests. Journal of Educational Psychology, 15, 473–483.
Pintner, R., & Paterson, D. G. (1915). The Binet Scale and the deaf child. Journal of Educational Psychology, 6, 201–210.
Piotrowski, C. (1983). Factor structure on the Semantic Differential as a function of method of analysis. Educational and Psychological Measurement, 43, 283–288.
Piotrowski, C. (1995). A review of the clinical and research use of the Bender-Gestalt Test. Perceptual and Motor Skills, 81, 1272–1274.
Piotrowski, C., & Keller, J. W. (1984). Psychodiagnostic testing in APA-approved clinical psychology programs. Professional Psychology: Research and Practice, 15, 450–456.
Piotrowski, Z. A. (1964). Digital-computer interpretation of inkblot test data. Psychiatric Quarterly, 38, 1–26.
Plag, J. A., & Goffman, J. M. (1967). The Armed Forces Qualification Test: Its validity in predicting military effectiveness for naval enlistees. Personnel Psychology, 20, 323–340.
Plake, B. S., Reynolds, C. R., & Gutkin, T. B. (1981). A technique for comparison of the profile variability between independent groups. Journal of Clinical Psychology, 37, 142–146.
Plemons, G. (1977). A comparison of MMPI scores of Anglo- and Mexican-American psychiatric patients. Journal of Consulting and Clinical Psychology, 45, 149–150.
Pollard, W. E., Bobbitt, R. A., Bergner, M., Martin, D. P., & Gilson, B. S. (1976). The Sickness Impact Profile: Reliability of a health status measure. Medical Care, 14, 146–155.
Ponterotto, J. G., Pace, T. M., & Kavan, M. G. (1989). A counselor's guide to the assessment of depression. Journal of Counseling and Development, 67, 301–309.
Pope, K. S. (1992). Responsibilities in providing psychological test feedback to clients. Psychological Assessment, 4, 268–271.
Popham, W. J. (1981). The case for minimum competency testing. Phi Delta Kappan, 63, 89–91.
Poresky, R. H., Hendrix, C., Mosier, J. E., & Samuelson, M. L. (1988). The Companion Animal Semantic Differential: Long and short form reliability and validity. Educational and Psychological Measurement, 48, 255–260.
Porter, T. M. (1986). The rise of statistical thinking, 1820–1900. Princeton, NJ: Princeton University Press.
Poteat, G. M., Wuensch, K. L., & Gregg, N. B. (1988). An investigation of differential prediction with the WISC-R. Journal of School Psychology, 26, 59–68.
Powers, D. E. (1993). Coaching for the SAT: A summary of the summaries and an update. Educational Measurement: Issues and Practice, 12, 24–39.
Powers, D. E., & Alderman, D. L. (1983). Effect of test familiarization on SAT performance. Journal of Educational Measurement, 20, 71–79.
Powers, S., & Barkan, J. H. (1986). Concurrent validity of the Standard Progressive Matrices for Hispanic and non-Hispanic seventh-grade students. Psychology in the Schools, 23, 333–336.
Pratkanis, A. R., & Greenwald, A. G. (1989). A sociocognitive model of attitude structure and function. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 22, pp. 245–285). San Diego, CA: Academic Press.
Praver, F., DiGiuseppe, R., Pelcovitz, D., Mandel, F. S., & Gaines, R. (2000). A preliminary study of a cartoon measure for children's reactions to chronic trauma. Child Maltreatment, 5, 273–285.
Preston, J. (1978). Abbreviated forms of the WISC-R. Psychological Reports, 42, 883–887.
Prieto, E. J., & Geisinger, K. F. (1983). Factor-analytic studies of the McGill Pain Questionnaire. In R. Melzack (Ed.), Pain measurement and assessment (pp. 85–93). New York: Raven Press.
Prigatano, G. P. (1978). Wechsler Memory Scale: A selective review of the literature. Journal of Clinical Psychology, 34, 816–832.
Prigatano, G. P., & Amin, K. (1993). Digit Memory Test: Unequivocal cerebral dysfunction and suspected malingering. Journal of Clinical and Experimental Neuropsychology, 15, 537–546.
Prigatano, G. P., & Redner, J. E. (1993). Uses and abuses of neuropsychological testing in behavioral neurology. Neurologic Clinics, 11, 219–231.
Pritchard, D. A., & Rosenblatt, A. (1980). Racial bias in the MMPI: A methodological review. Journal of Consulting and Clinical Psychology, 48, 263–267.
Prout, H. T., & Ferber, S. M. (1988). Analogue assessment: Traditional personality assessment measures in behavioral assessment. In E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral assessment in schools: Conceptual foundations and practical applications (pp. 322–350). New York: Guilford Press.
Prout, H. T., & Schwartz, J. F. (1984). Validity of the Peabody Picture Vocabulary Test-Revised with mentally retarded adults. Journal of Clinical Psychology, 40, 584–587.
Pulliam, G. P. (1975). Social desirability and the
Psychological Screening Inventory. Psychological
Reports, 36, 522.
Quay, H. C., & Peterson, D. R. (1967). Manual for the Behavior Problem Checklist. Champaign, IL: Children's Research Center, University of Illinois.
Quay, L. C. (1974). Language dialect, age, and intelligence-test performance in disadvantaged black children. Child Development, 45, 463–468.
Quenk, N. L. (2000). Essentials of Myers-Briggs Type
Indicator Assessment. New York: Wiley.
Rabideau, G. F. (1955). Differences in visual acuity
measurements obtained with different types of targets. Psychological Monographs, 69, No. 10 (Whole
No. 395).
Rabkin, J. G., & Struening, E. L. (1976). Life events, stress, and illness. Science, 194, 1013–1020.
Radloff, L. S. (1977). The CES-D Scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.
Radloff, L. S., & Teri, L. (1986). Use of the Center for Epidemiological Studies-Depression Scale with older adults. Clinical Gerontologist, 5, 119–136.
Ragosta, M., & Nemceff, W. (1982). A research and
development program on testing handicapped people
(RM 82-2). Princeton, NJ: Educational Testing Service.
Rahe, R. H., & Lind, E. (1971). Psychosocial factors and sudden cardiac death: A pilot study. Journal of Psychosomatic Research, 15, 19–24.
Rahe, R. H., Lundberg, V., Bennett, L., & Theorell, T. (1971). The social readjustment rating scale: A comparative study of Swedes and Americans. Journal of Psychosomatic Research, 15, 241–249.
Rahe, R. H., Meyer, M., Smith, M., Kjaer, G., & Holmes, T. H. (1964). Social stress and illness onset. Journal of Psychosomatic Research, 8, 35–44.
Raine, A. (1991). The SPQ: A scale for the assessment of schizotypal personality based on DSM-III-R criteria. Schizophrenia Bulletin, 17, 555–564.
Rajecki, D. W. (1990). Attitudes. Sunderland, MA: Sinauer.
Ralston, D. A., Gustafson, D. J., Elsass, P. M., Cheung, F., & Terpstra, R. H. (1992). Eastern values: A comparison of managers in the United States, Hong Kong, and the People's Republic of China. Journal of Applied Psychology, 77, 664–671.
Ramanaiah, N. V., & Adams, M. L. (1979). Confirmatory factor analysis of the WAIS and the WPPSI. Psychological Reports, 45, 351–355.
Ramanaiah, N. V., & Martin, H. J. (1980). On the twodimensional nature of the Marlowe-Crowne Social
Desirability Scale. Journal of Personality Assessment,
44, 507514.
Ramirez, M., Garza, R. T., & Cox, B. G. (1980). Multicultural leader behaviors in ethnically mixed task
groups (Technical Report). Ofce of Naval Research:
Organizational Effectiveness Research Program.
Randt, C. T., Brown, E. R., & Osborne, D. P., Jr. (1980).
A memory test for longitudinal measurement of
mild to moderate decits. Clinical Neuropsychology,
2, 184194.
Rankin, W. L., & Grobe, J. W. (1980). A comparison
of ranking and rating procedures for value system
measurement. European Journal of Social Psychology,
10, 233246.
Raphael, E. G., Cloitre, M., & Dohrenwend, B. P.
(1991). Problems of recall and misclassication with
checklist methods of measuring stressful life events.
Health Psychology, 10, 6274.
Rasch, G. (1966). An item-analysis which takes
individual differences into account. British Journal of Mathematical and Statistical Psychology, 19,
49–57.
Rathus, S. A. (1973). A 30-item schedule for assessing
assertive behavior. Behavior Therapy, 4, 398–406.
Rathus, S. A., & Nevid, J. S. (1977). Concurrent validity
of the 30-item Assertiveness Schedule with a psychiatric population. Behavior Therapy, 8, 393–397.
Raven, J. C. (1938). Standard Progressive Matrices.
London: H. K. Lewis.
Raven, J. C. (1947a). Coloured Progressive Matrices.
London: H. K. Lewis.
Raven, J. C. (1947b). Advanced Progressive Matrices.
London: H. K. Lewis.
Raven, J. C., Court, J. H., & Raven, J. (1977). Coloured
Progressive Matrices. London: Lewis.
Raven, J., Raven, J. C., & Court, J. H. (1998). Standard
Progressive Matrices. 1998 edition. Oxford: Oxford
University Press.
Rawls, J. R., Rawls, D. J., & Harrison, C. W. (1969).
An investigation of success predictors in graduate
school in psychology. Journal of Psychology, 72, 125–129.
Ray, J. J. (1971). A new measure of conservatism: Its
limitations. British Journal of Social and Clinical Psychology, 10, 79–80.
Ray, S., & Ulissi, S. M. (1982). Adaptation of the Wechsler Preschool and Primary Scales of Intelligence for
deaf children. Natchitoches, LA: Steven Ray.
Razin, A. M. (1971). A-B variable in psychotherapy: A
critical review. Psychological Bulletin, 75, 1–21.
Reading, A. E., & Newton, J. R. (1978). A card sort
method of pain assessment. Journal of Psychosomatic
Research, 22, 503–512.
March 4, 2006
14:36
599
Reardon, R. C., Hersen, M., Bellack, A. S., & Foley,
J. M. (1979). Measuring social skill in grade school
boys. Journal of Behavioral Assessment, 1, 87–105.
Ree, M. J., & Earles, J. A. (1991). Predicting training
success: Not much more than g. Personnel Psychology, 44, 321–332.
Ree, M. J., Earles, J. A., & Teachout, M. (1994). Predicting job performance: Not much more than g.
Journal of Applied Psychology, 79, 518–524.
Reed, M. (1970). Deaf and partially hearing children. In P. Mittler (Ed.), The psychological assessment of mental and physical handicaps. London:
Tavistock.
Reed, P. F., Fitts, W. H., & Boehm, L. (1981). Tennessee
Self-Concept Scale: Bibliography of research studies
(Rev. ed.). Los Angeles, CA: Western Psychological
Services.
Rehfisch, J. M. (1958). A scale for personal rigidity.
Journal of Consulting Psychology, 22, 11–15.
Rehfisch, J. M. (1959). Some scale and test correlates
of a Personality Rigidity scale. Journal of Consulting
Psychology, 22, 372–374.
Reich, J. H. (1987). Instruments measuring DSM-III
and DSM-III-R personality disorders. Journal of Personality Disorders, 1, 220–240.
Reich, J. H. (1989). Update on instruments to measure DSM-III and DSM-III-R personality disorders.
Journal of Nervous and Mental Disease, 177, 366–370.
Reich, T., Robins, L. E., Woodruff, R. A., Jr., Taibleson,
M., Rich, C., & Cunningham, L. (1975). Computer-assisted derivation of a screening interview for alcoholism. Archives of General Psychiatry, 32, 847–852.
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness
of some alternative selection procedures. Personnel
Psychology, 35, 1–62.
Reilly, R. R., Henry, S., & Smither, J. W. (1990). An
examination of the effects of using behavior checklists on the construct validity of assessment center
dimensions. Personnel Psychology, 43, 71–84.
Reilly, R. R., & Knight, G. E. (1970). MMPI scores
of Mexican-American college students. Journal of
College Student Personnel, 11, 419–422.
Reissland, N. (1983). Cognitive maturity and the experience of fear and pain in the hospital. Social Science
Medicine, 17, 1389–1395.
Reitan, R. M. (1969). Manual for administration of neuropsychological test batteries for adults and children.
Indianapolis, IN: Author.
Reitan, R. M., & Davison, L. A. (Eds.). (1974). Clinical neuropsychology: Current status and applications.
Washington, DC: Winston.
Reitan, R. M., & Wolfson, D. (1985). Halstead-Reitan
Neuropsychological Test Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology
Press.
Remmers, H. H., & Ewart, E. (1941). Reliability of
multiple-choice measuring instruments as a function of the Spearman-Brown prophecy formula, III.
Journal of Educational Psychology, 32, 61–66.
Rentoul, A. J., & Fraser, B. J. (1979). Conceptualization
of enquiry-based or open classroom learning environments. Journal of Curriculum Studies, 11, 233–245.
Renzulli, J. S., Hartman, R. K., & Callahan, C. M.
(1971). Teacher identification of superior students.
Exceptional Children, 38, 211–214, 243–248.
Renzulli, J. S., Smith, L. H., White, A. J., Callahan,
C. M., & Hartman, R. K. (1976). Scales for rating the behavioral characteristics of superior students.
Wethersfield, CT: Creative Learning Press.
Reschly, D. J. (1978). WISC-R factor structures among
Anglos, Blacks, Chicanos, and Native-American
Papagos. Journal of Consulting and Clinical Psychology, 46, 417–422.
Reschly, D. J., & Reschly, J. E. (1979). Brief reports
on the WISC-R: I. Validity of WISC-R factor scores
in predicting achievement and attention for four
socio-cultural groups. Journal of School Psychology,
17, 355–361.
Reschly, D. J., & Sabers, D. (1979). Analysis of test bias
in four groups with the regression definition. Journal
of Educational Measurement, 16, 1–9.
Resnick, D. (1982). History of educational testing. In
A. K. Wigdor & W. R. Garner (Eds.), Ability testing:
Uses, consequences, and controversies: Part II. Documentation section (pp. 173–194). Washington, DC:
National Academy Press.
Resnick, P. J. (1988). Malingering of post-traumatic
disorders. In R. Rogers (Ed.), Clinical assessment of
malingering and deception (pp. 84–103). New York:
Guilford Press.
Reuter, J., Stancin, T., & Craig, P. (1981). Kent scoring
adaptation of the Bayley Scales of Infant Development.
Kent, OH: Kent Developmental Metrics.
Reynell, J. (1969). Reynell Developmental Language
Scales. Windsor, England: NFER.
Reynolds, C. R. (1982). The problem of bias in psychological assessment. In C. R. Reynolds & T. B.
Gutkin (Eds.), The Handbook of School Psychology
(pp. 178–208). New York: Wiley.
Reynolds, C. R. (1985). Review of the System
of Multicultural Pluralistic Assessment. In J. V.
Mitchell (Ed.), The ninth mental measurements yearbook (pp. 1519–1521). Lincoln, NE: University of
Nebraska Press.
Reynolds, C. R. (1989). Measurement and statistical
problems in neuropsychological assessment of children. In C. R. Reynolds & E. Fletcher-Janzen (Eds.),
Handbook of clinical child neuropsychology (pp. 147–166). New York: Plenum Press.
Reynolds, C. R., & Kamphaus, R. W. (1997). The
Kaufman Assessment Battery for Children: Development, structure, and applications in neuropsychology. In A. M. Horton, D. Wedding, et al. (Eds.),
The neuropsychology handbook: Vol. 1. Foundations
and assessment (2nd ed., pp. 290–330). New York,
NY: Springer-Verlag.
Reynolds, C. R., & Piersel, W. C. (1983). Multiple
aspects of bias on the Boehm Test of Basic Concepts (Forms A and B) for white and for Mexican-American children. Journal of Psychoeducational
Assessment, 1, 135–142.
Reynolds, C. R., Willson, V. L., & Chatman, S. R.
(1984). Item bias on the 1981 revision of the Peabody
Picture Vocabulary Test using a new method of
detecting bias. Journal of Psychoeducational Assessment, 2, 219–224.
Reynolds, C. R., Willson, V. L., & Chatman, S. R.
(1985). Regression analyses of bias on the Kaufman
Assessment Battery for Children. Journal of School
Psychology, 23, 195–204.
Reynolds, W. M. (1979). A caution against the use of
the Slosson Intelligence Test in the diagnosis of mental retardation. Psychology in the Schools, 16, 77–79.
Reynolds, W. M. (1982). Development of reliable and
valid short forms of the Marlowe-Crowne Social
Desirability Scale. Journal of Clinical Psychology, 38,
119–125.
Reynolds, W. M. (1985). Review of the Slosson Intelligence Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1403–1404).
Lincoln, NE: University of Nebraska Press.
Rhodes, L., Bayley, N., & Yow, B. (1984). Supplement
to the manual for the Bayley Scales of Infant Development. San Antonio, TX: The Psychological Corporation.
Rich, C. C., & Anderson, R. P. (1965). A tactual form of
the Progressive Matrices for use with blind children.
Personnel and Guidance Journal, 43, 912–919.
Rich, J. (1968). Interviewing children and adolescents.
London: Macmillan.
Richards, J. T. (1970). Internal consistency of the
WPPSI with the mentally retarded. American Journal of Mental Deficiency, 74, 581–582.
Richardson, A. (1977). Verbalizer-visualizer: A cognitive style dimension. Journal of Mental Imagery, 1,
109–126.
Richardson, A. (1978). Subject, task, and tester
variables associated with initial eye movement
responses. Journal of Mental Imagery, 2, 85–100.
Richardson, K. (1991). Understanding intelligence.
Milton Keynes, England: Open University Press.
Richman, N., Stevenson, J., & Graham, P. J. (1982).
Preschool to school: A behavioural study. New York:
Academic Press.
Richmond, B. O., & Kicklighter, R. H. (1980). Children's Adaptive Behavior Scale. Atlanta, GA: Humanics.
Ridgway, J., MacCulloch, M. J., & Mills, H. E. (1982).
Some experiences in administering a psychometric test with a light pen and microcomputer.
International Journal of Man-Machine Studies, 17,
265–278.
Riethmiller, R. J., & Handler, L. (1997). The great figure drawing controversy: The integration of research
and clinical practice. Journal of Personality Assessment, 69, 488–496.
Rigazio-DiGilio, S. A. (1993). The Family System Test
(FAST): A spatial representation of family structure
and flexibility. The American Journal of Family Therapy, 21, 369–375.
Ritzler, B. A., Sharkey, K. J., & Chudy, J. (1980). A comprehensive projective alternative to the TAT. Journal
of Personality Assessment, 44, 358–362.
Roach, A. J., Frazier, L. P., & Bowden, S. R. (1981). The
Marital Satisfaction Scale: Development of a measure for intervention research. Journal of Marriage
and the Family, 43, 537–545.
Roberts, J. S., Laughlin, J. E., & Wedell, D. H.
(1999). Validity issues in the Likert and Thurstone
approaches to attitude measurement. Educational
and Psychological Measurement, 59, 211–233.
Roberts, R. E., Vernon, S. W., & Rhoades, H. M. (1989).
Effects of language and ethnic status on reliability and validity of the Center for Epidemiologic
Studies Depression Scale with psychiatric patients.
Journal of Nervous and Mental Disease, 177, 581–592.
Robertson-Tchabo, E. A., & Arenberg, D. (1989).
Assessment of memory in older adults. In T. Hunt
& C. J. Lindley (Eds.), Testing older adults (pp. 200–231). Austin, TX: Pro-Ed.
Robertson, A., & Cochrane, R. (1973). The Wilson-Patterson Conservatism Scale: A reappraisal. British
Journal of Social and Clinical Psychology, 12, 428–430.
Robinson, E. L., & Nagle, R. J. (1992). The comparability of the Test of Cognitive Skills with the Wechsler
Intelligence Scale for Children–Revised and the
Stanford-Binet: Fourth edition with gifted children.
Psychology in the Schools, 29, 107–112.
Robinson, J. P., Athanasiou, R., & Head, K. B. (1969).
Measures of occupational attitudes and occupational
characteristics. Ann Arbor, MI: Institute for Social
Research.
Robinson, J. P., Rusk, J. G., & Head, K. B. (1968). Measures of political attitudes. Ann Arbor, MI: Institute
for Social Research.
Robinson, J. P., & Shaver, P. R. (1973). Measures of social
psychological attitudes. Ann Arbor, MI: Institute for
Social Research.
Robinson, J. P., Shaver, P. R., & Wrightsman, L. S.
(1990). Measures of personality and social psychological attitudes. San Diego, CA: Academic Press.
Roe, A., & Klos, D. (1969). Occupational classification.
Counseling Psychologist, 1, 84–92.
Roe, A., & Siegelman, M. (1964). The origin of interests.
Washington, DC: American Personnel and Guidance Association.
Rogers, R. (1984). Towards an empirical model of
malingering and deception. Behavioral Sciences and
the Law, 2, 93–111.
Rogers, R., Bagby, M., & Gillis, R. (1992). Improvements in the M test as a screening measure for malingering. Bulletin of the American Academy of Psychiatry and Law, 20, 101–104.
Rogers, R., Harris, M., & Thatcher, A. A. (1983). Identification of random responders on the MMPI: An
actuarial approach. Psychological Reports, 53, 1171–1174.
Rogers, R., Ustad, K. L., Sewell, K. W., & Reinhardt,
V. (1996). Dimensions of incompetency: A factor
analytic study of the Georgia Court Competency
Test. Behavioral Sciences and the Law, 14, 323–330.
Roid, G. H. (1985). Computer-based test interpretation: The potential of quantitative methods of test
interpretation. Computers in Human Behavior, 1,
207–219.
Rokeach, M. (1973). The nature of human values. New
York: Free Press.
Rome, H. P., Swenson, W. M., Mataya, P., McCarthy,
C. E., Pearson, J. S., & Keating, R. F. (1962). Symposium on automation techniques in personality
assessment. Proceedings of the Mayo Clinic, 37, 61–82.
Romero, I. (1992). Individual assessment procedures with preschool children. In E. V. Nuttall, I.
Romero, & J. Kalesnik (Eds.), Assessing and screening preschoolers (pp. 55–66). Boston: Allyn & Bacon.
Ronan, G. F., Colavito, V. A., & Hammontree, S. R.
(1993). Personal problem-solving system for scoring
TAT responses: Preliminary validity and reliability
data. Journal of Personality Assessment, 61, 28–40.
Roper, B. L., Ben-Porath, Y., & Butcher, J. N. (1995).
Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality
Assessment, 65, 358–371.
Rorer, L. G. (1965). The great response-style myth.
Psychological Bulletin, 63, 129–156.
Rosecranz, H. A., & McNevin, T. E. (1969). A factor
analysis of attitudes toward the aged. The Gerontologist, 9, 55–59.
Rosen, A. (1967). Limitations of personality inventories for assessment of deaf children and adults as
illustrated by research with the MMPI. Journal of
Rehabilitation of the Deaf, 1, 47–52.
Rosen, E. (1956). Self-appraisal, personal desirability
and perceived social desirability of personality traits.
Journal of Abnormal and Social Psychology, 52, 151–158.
Rosen, R. C., & Beck, J. G. (1988). Patterns of sexual
arousal. New York: Guilford Press.
Rosen, W. G., Mohs, R. C., & Davis, K. L. (1984). A
new rating scale for Alzheimer's disease. American
Journal of Psychiatry, 141, 1356–1364.
Rosenbaum, C. P., & Beebe, J. E. (1975). Psychiatric treatment: Crisis, clinic, consultation. New York:
McGraw-Hill.
Rosenberg, M. J., Hovland, C. I., McGuire, W. J., Abelson, R. P., & Brehm, J. W. (1960). Attitude organization and change. New Haven, CT: Yale University
Press.
Rosenstock, I. M. (1974). Historical origins of the
health belief model. Health Education Monographs,
2, 328–335.
Rosenthal, B. L., & Kamphaus, R. W. (1988). Interpretive tables for test scatter on the Stanford-Binet
Intelligence Scale: Fourth edition. Journal of Psychoeducational Assessment, 6, 359–370.
Ross, C. E., & Mirowsky, J. (1979). A comparison of
life-event weighting schemes: Change, undesirability,
and effect-proportional indices. Journal of Health
and Social Behavior, 20, 166–177.
Ross, D. R. (1970). A technique of verbal assessment of
deaf students. Journal of Rehabilitation of the Deaf,
3, 7–15.
Ross, L. M., & Pollio, H. R. (1991). Metaphors of death:
A thematic analysis of personal meanings. Omega,
23, 291–307.
Rossi, P. H., Wright, J. D., & Anderson, A. B. (Eds.).
(1983). Handbook of survey research. New York: Academic Press.
Rossini, E. D., & Kaspar, J. C. (1987). The validity of
the Bender-Gestalt emotional indicators. Journal of
Personality Assessment, 51, 254–261.
Rossman, J. (1931). The psychology of the inventor.
Washington, DC: Inventors.
Rothbart, M. K. (1981). Measurement of temperament
in infants. Child Development, 52, 569–578.
Rotter, J. B. (1946). The incomplete sentences test as a
method of studying personality. American Psychologist, 1, 286.
Rotter, J. B. (1966). Generalized expectancies for internal vs. external control of reinforcement. Psychological Monographs, 80(Whole No. 609).
Rotter, J. B. (1967). A new scale for the measurement of
interpersonal trust. Journal of Personality, 35, 651–665.
Rotter, J. B., & Rafferty, J. E. (1950). Manual: The Rotter
Incomplete Sentences Blank. New York: The Psychological Corporation.
Rounds, J. B. (1989). Review of the Career Assessment Inventory. In J. C. Conoley & J. J. Kramer
(Eds.), The tenth mental measurements yearbook
(pp. 139–141). Lincoln, NE: University of Nebraska
Press.
Rourke, B. P., & Fuerst, D. R. (1991). Learning disabilities and psychosocial functioning: A neuropsychological perspective. New York: Guilford Press.
Rowe, D. C., & Plomin, R. (1977). Temperament in
early childhood. Journal of Personality Assessment,
41, 150–156.
Rubenstein, C., & Shaver, P. (1982). In search of intimacy. New York: Delacorte Press.
Rubenzer, S. (1992). A comparison of traditional and
computer-generated psychological reports in an
adolescent inpatient setting. Journal of Clinical Psychology, 48, 817–826.
Ruebhausen, O. M., & Brim, O. G., Jr. (1966). Privacy
and behavioral research. American Psychologist, 21,
423–437.
Ruebush, B. K. (1963). Anxiety. In H. W. Stevenson, J.
Kagan, & C. Spiker (Eds.), NSSE sixty-second yearbook, Part I: Child psychology. Chicago, IL: University
of Chicago Press.
Ruesch, J., Loeb, M. B., & Jacobson, A. (1948). Acculturation and disease. Psychological Monographs:
General and Applied, 292, 1–40.
Rulon, P. J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99–103.
Runco, M. A., & Mraz, W. (1992). Scoring divergent
thinking tests using total ideational output and a
creativity index. Educational and Psychological Measurement, 52, 213–221.
Ruschival, M. A., & Way, J. G. (1971). The WPPSI and
the Stanford-Binet: A validity and reliability study
using gifted preschool children. Journal of Consulting and Clinical Psychology, 37, 163.
Russell, D., Peplau, L. A., & Cutrona, C. E. (1980).
The Revised UCLA Loneliness Scale: Concurrent
and discriminant validity evidence. Journal of Personality and Social Psychology, 39, 472–480.
Russell, D., Peplau, L. A., & Ferguson, M. L. (1978).
Developing a measure of loneliness. Journal of Personality Assessment, 42, 290–294.
Rust, J. O., & Lose, B. D. (1980). Screening for giftedness with the Slosson and the Scale for Rating
the Behavioral Characteristics of Superior Students.
Psychology in the Schools, 17, 446–451.
Rutter, M. (1973). The assessment and treatment of
preschool autistic children. Early Child Development
and Care, 3, 13–29.
Ryan, A. M., & Sackett, P. R. (1987a). A survey of individual assessment practices by I/O psychologists.
Personnel Psychology, 40, 455–488.
Ryan, A. M., & Sackett, P. R. (1987b). Pre-employment
honesty testing: Fakability, reactions of test takers,
and company image. Journal of Business and Psychology, 1, 248–256.
Ryan, J. J. (1981). Clinical utility of a WISC-R short
form. Journal of Clinical Psychology, 37, 389–391.
Ryan, R. M. (1987). Thematic Apperception Test. In
D. J. Keyser & R. C. Sweetland (Eds.), Test critiques
compendium (pp. 517–532). Kansas City, MO: Test
Corporation of America.
Rybstein-Blinchik, E. (1979). Effects of different cognitive strategies on chronic pain experience. Journal
of Behavioral Medicine, 2, 93–101.
Ryman, D. H., Naitoh, P., Englund, C., & Genser, S. G.
(1988). Computer response time measurements of
mood, fatigue, and symptom scale items: Implications for scale response time uses. Computers in
Human Behavior, 4, 95–109.
Sabatelli, R. M. (1988). Measurement issues in marital research: A review and critique of contemporary survey instruments. Journal of Marriage and
the Family, 50, 891–917.
Saccuzzo, D. P., Higgins, G., & Lewandowski, D.
(1974). Program for psychological assessment of law
enforcement officers: Initial evaluation. Psychological Reports, 35, 651–654.
Saccuzzo, D. P., & Lewandowski, D. G. (1976). The
WISC as a diagnostic tool. Journal of Clinical Psychology, 32, 115–124.
Sachs, B., Trybus, R., Koch, H., & Falberg, R. (1974).
Current developments in the psychological evaluation of deaf individuals. Journal of Rehabilitation of
the Deaf, 8, 136–140.
Sackett, P. R. (1987). Assessment centers and content
validity: Some neglected issues. Personnel Psychology, 40, 13–25.
Sackett, P. R., Burris, L. R., & Callahan, C. (1989).
Integrity testing for personnel selection: An update.
Personnel Psychology, 42, 491–529.
Sackett, P. R., & Dreher, G. F. (1982). Constructs
and assessment center dimensions: Some troubling
empirical findings. Journal of Applied Psychology, 67,
401–410.
Sackett, P. R., & Harris, M. M. (1984). Honesty testing for personnel selection: A review and critique.
Personnel Psychology, 37, 221–245.
Sackeim, H. A., & Gur, R. C. (1979). Self-deception,
other-deception, and self-reported psychopathology. Journal of Consulting and Clinical Psychology,
47, 213–215.
Saks, M. J. (1976). The limits of scientific jury selection:
Ethical and empirical. Jurimetrics Journal, 17, 3–22.
Salive, M. E., Smith, G. S., & Brewer, T. F. (1989). Suicide mortality in the Maryland state prison system,
1979–1987. Journal of the American Medical Association, 262, 365–369.
Salthouse, T. A. (1986). Functional age: Examination
of a concept. In J. E. Birren, P. K. Robinson, &
J. E. Livingston (Eds.), Age, health and employment
(pp. 78–92). Englewood Cliffs, NJ: Prentice Hall.
Saltzman, J., Strauss, E., Hunter, M., & Spellacy, F.
(1998). Validity of the Wonderlic Personnel Test as a
brief measure of intelligence in individuals referred
for evaluation of head injury. Archives of Clinical
Neuropsychology, 13, 611–616.
Samelson, F. (1977). World War I intelligence testing
and the development of psychology. Journal of the
History of the Behavioral Sciences, 13, 274–282.
Samuda, R. J. (1975). Psychological testing of American
minorities: Issues and consequences. New York: Dodd,
Mead, & Co.
Sanchez, G. I. (1932). Scores of Spanish-speaking children on repeated tests. Journal of Genetic Psychology,
40, 223–231.
Sanchez, G. I. (1934). Bilingualism and mental measures. Journal of Applied Psychology, 18, 765–772.
Sanchez, R., & Atkinson, D. (1983). Mexican-American cultural commitment, preference for
counselor ethnicity, and willingness to use counseling. Journal of Counseling Psychology, 30, 215–220.
Sandberg, S. T., Wieselberg, M., & Shaffer, D. (1980).
Hyperkinetic and conduct problem children in a
primary school population: Some epidemiological
considerations. Journal of Child Psychology and Psychiatry and Allied Disciplines, 21, 293–311.
Sandler, I. S., & Guenther, R. T. (1985). Assessment of
life stress events. In P. Karoly (Ed.), Measurement
strategies in health psychology (pp. 555–600). New
York: Wiley.
Sandoval, J. (1979). The WISC-R and internal evidence
of test bias with minority groups. Journal of Consulting and Clinical Psychology, 47, 919–927.
Sandoval, J. (1981). Format effects in two Teacher
Rating Scales of Hyperactivity. Journal of Abnormal
Child Psychology, 9, 203–218.
Sandoval, J. (1985). Review of the System of Multicultural Pluralistic Assessment. In J. V. Mitchell (Ed.),
The ninth mental measurements yearbook (pp. 1521–1525). Lincoln, NE: University of Nebraska Press.
Sanford, E. C. (1987). Biography of Granville Stanley
Hall. The American Journal of Psychology, 100, 365–375.
Santor, D. A., & Coyne, J. C. (1997). Shortening the
CES-D to improve its ability to detect cases of
depression. Psychological Assessment, 9, 233–243.
Sapinkopf, R. C. (1978). A computer adaptive testing
approach to the measurement of personality variables. Dissertation Abstracts International, 38, 10B,
4993.
Sapp, S. G., & Harrod, W. J. (1993). Reliability and
validity of a brief version of Levenson's Locus of
Control Scale. Psychological Reports, 72, 539–550.
Sarason, I. G. (1958). Interrelationships among individual difference variables, behavior in psychotherapy, and verbal conditioning. Journal of Abnormal
and Social Psychology, 56, 339–344.
Sarason, I. G. (1960). Empirical findings and theoretical problems in the use of anxiety scales. Psychological Bulletin, 57, 403–415.
Sarason, I. G. (Ed.). (1980). Test anxiety. Hillsdale, NJ:
Erlbaum.
Sarason, I. G., Johnson, J. H., & Siegel, J. M. (1979).
Assessing the impact of life changes. In I. G. Sarason
& C. D. Spielberger (Eds.), Stress and anxiety
(Vol. 6). New York: Wiley.
Sarason, S. B., Davidson, K. S., Lighthall, F. F., Waite,
R. R., & Ruebush, B. K. (1960). Anxiety in elementary
school children. New York: Wiley.
Sarason, S. B., & Mandler, G. (1952). Some correlates
of test anxiety. Journal of Abnormal and Social Psychology, 47, 810–817.
Sarbin, T. R. (1943). A contribution to the study
of actuarial and individual methods of prediction.
American Journal of Sociology, 48, 593–602.
Sattler, J. M. (1974). Assessment of children's
intelligence. Philadelphia, PA: W. B. Saunders.
Sattler, J. M. (1982). Assessment of children's intelligence
and special abilities (2nd ed.). Boston, MA: Allyn &
Bacon.
Sattler, J. M. (1988). Assessment of children (3rd ed.).
San Diego, CA: Author.
Sattler, J. M., & Gwynne, J. (1982). White examiners
generally do not impede the intelligence test performance of black children: To debunk a myth. Journal
of Consulting and Clinical Psychology, 50, 196–208.
Sattler, J. M., & Theye, F. (1967). Procedural, situational, and interpersonal variables in individual
intelligence testing. Psychological Bulletin, 68, 347–360.
Sattler, J. M., & Tozier, L. L. (1970). A review of intelligence test modifications used with cerebral palsied
and other handicapped groups. The Journal of Special Education, 4, 391–398.
Sauer, W. J., & Warland, R. (1982). Morale and life satisfaction. In D. J. Mangen & W. A. Peterson (Eds.),
Research instruments in social gerontology (Vol. 1,
pp. 195–240). Minneapolis, MN: University of Minnesota Press.
Sawicki, R. F., Leark, R., Golden, C. J., & Karras, D.
(1984). The development of the pathognomonic,
left sensorimotor and right sensorimotor scales for
the Luria-Nebraska Neuropsychological Battery
Children's Revision. Journal of Clinical Child Psychology, 13, 165–169.
Sawyer, J. (1966). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66, 178–200.
Saxe, S. J., & Reiser, M. (1976). A comparison of
three police applicant groups using the MMPI. Journal of Police Science and Administration, 4, 419–425.
Saylor, C. F., Finch, A. J. Jr., Furey, W., Baskin, C. H., &
Kelly, M. M. (1984). Construct validity for measures
of childhood depression: Application of multitrait-multimethod methodology. Journal of Consulting
and Clinical Psychology, 52, 977–985.
Schaefer, C. E. (1967). Biographical correlates of scientific and artistic creativity in adolescents. Unpublished doctoral dissertation, Fordham University,
New York.
Schaefer, C. E., & Anastasi, A. (1968). A biographical inventory for identifying creativity in adolescent
boys. Journal of Applied Psychology, 52, 42–48.
Schakel, J. A. (1986). Cognitive assessment of
preschool children. School Psychology Review, 15,
200–215.
Scherer, P. (1983). Psychoeducational evaluation of
hearing-impaired preschool children. American
Annals of the Deaf, 128, 118–124.
Schetz, K. (1985). Comparison of the Compton
Speech and Language Screening Evaluation with the
Fluharty Preschool Speech and Language Screening Test. Journal of the American Speech-Language-Hearing Association, 16, 16–24.
Schiff, M. (1977). Hazard adjustment, locus of control,
and sensation seeking: Some null findings. Environment and Behavior, 9, 233–254.
Schippmann, J. S., Prien, E. P., & Katz, J. A. (1990). Reliability and validity of in-basket performance measures. Personnel Psychology, 43, 837–859.
Schirmer, B. R. (1993). Constructing meaning from
narrative text. American Annals of the Deaf, 138, 397–403.
Schmid, K. D., Rosenthal, S. L., & Brown, E. D. (1988).
A comparison of self-report measures of two family
dimensions: Control and cohesion. The American
Journal of Family Therapy, 16, 73–77.
Schmidt, F. L. (1991). Why all banding procedures in
personnel selection are logically flawed. Human Performance, 4, 265–278.
Schmidt, F. L. (1985). Review of Wonderlic Personnel Test. In J. V. Mitchell, Jr. (Ed.), The ninth
mental measurements yearbook (Vol. II, pp. 1755–1757). Lincoln, NE: University of Nebraska
Press.
Schmidt, F. L., & Hunter, J. E. (1974). Racial and ethnic
bias in psychological tests: Divergent implications of
two definitions of test bias. American Psychologist,
29, 1–8.
Schmidt, F. L., & Hunter, J. E. (1977). Development of
a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529–540.
Schmidt, F. L., & Hunter, J. E. (1980). The future of
criterion-related validity. Personnel Psychology, 33,
41–60.
Schmidt, F. L., & Hunter, J. E. (1981). Employment
testing: Old theories and new research findings.
American Psychologist, 36, 1128–1137.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane,
G. S. (1979). Further tests of the Schmidt-Hunter
Bayesian validity generalization procedure. Personnel Psychology, 32, 257–281.
Schmidt, F. L., Urry, V. W., & Gugel, J. F. (1978). Computer assisted tailored testing: Examinee reactions
and evaluations. Educational and Psychological Measurement, 38, 265–273.
Schmidt, N., & Sermat, V. (1983). Measuring loneliness
in different relationships. Journal of Personality and
Social Psychology, 44, 1038–1047.
Schmit, M. J., & Ryan, A. M. (1993). The Big Five in
personnel selection: Factor structure in applicant
and nonapplicant populations. Journal of Applied
Psychology, 78, 966–974.
Schmitt, F. A., & Ranseen, J. D. (1989). Neuropsychological assessment of older adults. In T. Hunt &
C. J. Lindley (Eds.), Testing older adults (pp. 51–69).
Austin, TX: Pro-Ed.
Schmitt, N., Gilliland, S. W., Landis, R. S., & Devine, D.
(1993). Computer-based testing applied to selection
of secretarial applicants. Personnel Psychology, 46,
149–165.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M.
(1984). Meta-analysis of validity studies published
between 1964 and 1982 and the investigation of
study characteristics. Personnel Psychology, 37, 407–422.
Schmitt, N., & Robertson, I. (1990). Personnel selection. Annual Review of Psychology, 41, 289–319.
Schneider, B. (1983). An interactionist perspective on
organizational effectiveness. In K. S. Cameron &
D. A. Whetten (Eds.), Organizational effectiveness: A
comparison of multiple models (pp. 27–54). Orlando,
FL: Academic Press.
Schneider, W. H. (1992). After Binet: French intelligence testing, 1900–1950. Journal of the History of
the Behavioral Sciences, 28, 111–132.
Schneider, J. M., & Parsons, O. A. (1970). Categories on
the locus of control scale and cross-cultural comparisons in Denmark and the United States. Journal
of Cross-Cultural Psychology, 1, 131–138.
Schoenfeldt, L. F. (1985). Review of Wonderlic Personnel Test. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. II, pp. 1757–1758).
Lincoln, NE: University of Nebraska Press.
Schoenfeldt, L. F. (1989). Guidelines for computer-based psychological tests and interpretations. Computers in Human Behavior, 5, 13–21.
Scholl, G., & Schnur, R. (1976). Measures of psychological, vocational, and educational functioning in the
blind and visually handicapped. New York: American Foundation for the Blind.
Schopler, E., & Reichler, R. J. (1979). Individualized
assessment and treatment for autistic and developmentally disabled children: Vol. I. Psychoeducational
profile. Baltimore, MD: University Park Press.
Schrader, A. D., & Osburn, H. G. (1977). Biodata
faking: Effects of induced subtlety and position
specificity. Personnel Psychology, 30, 395–404.
Schrader, W. B. (1971). The predictive validity of College Board admissions tests. In W. H. Angoff (Ed.),
The College Board admissions testing program (pp.
117145). New York: College Entrance Examination
Board.
Schretlen, D. J. (1988). The use of psychological tests to
identify malingered symptoms of mental disorder.
Clinical Psychology Review, 8, 451476.
Schretlen, D., & Arkowitz, H. (1990). A psychological test battery to detect prison inmates who fake
insanity or mental retardation. Behavioral Sciences
and the Law, 8, 7584.
Schretlen, D., Wilkins, S. S., Van Gorp, W. G., & Bobholz, J. H. (1992). Cross-validation of a psychological test battery to detect faked insanity. Psychological
Assessment, 4, 7783.
Schriesheim, C. A. (1981). The effect of grouping or
randomizing items on leniency response bias. Educational and Psychological Measurement, 41, 401
411.
Schriesheim, C. A., & DeNisi, A. A. (1980). Item presentation as an inuence on questionnaire validity:
A eld experiment. Educational and Psychological
Measurement, 40, 175182.
Schriesheim, C. A., Eisenbach, R. J., & Hill, K. D.
(1991). The effect of negation and polar opposite item reversals on questionnaire reliability and
validity: An experimental investigation. Educational
and Psychological Measurement, 51, 6778.
Schriesheim, C. A., & Klich, N. R. (1991). Fiedlers
Least Preferred Coworker (LPC) Instrument: An
investigation of its true bipolarity. Educational and
Psychological Measurement, 51, 305315.
Schroeder, L. D., Sjoquist, D. L., & Stephan, P. E. (1986).
Understanding regression analysis. Newbury Park,
CA: Sage.
Schuessler, K. F. (1961). A note on statistical signicance of scalogram. Sociometry, 24, 312318.
Schuh, A. J. (1967). The predictability of employee
tenure: A review of the literature. Personnel Psychology, 20, 133152.
P1: JZP
0521861810rfa3
CB1038/Domino
0 521 86181 0
606
Schuldberg, D. (1988). The MMPI is less sensitive to the automated testing format than it is to repeated testing: Item and scale effects. Computers in Human Behavior, 4, 285–298.
Schuman, H., & Kalton, G. (1985). Survey methods. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology (Vol. 1, 3rd ed.). New York: Random House.
Schutte, N. S., & Malouff, J. M. (1995). Sourcebook of adult assessment strategies. New York: Plenum Press.
Schwab, D. P., & Oliver, R. L. (1974). Predicting tenure with biographical data: Exhuming buried evidence. Personnel Psychology, 27, 125–128.
Schwab, J. J., Bialow, M., Brown, J. M., & Holzer, C. E. (1967). Diagnosing depression in medical inpatients. Annals of Internal Medicine, 67, 695–707.
Scissons, E. H. (1976). Computer administration of the California Psychological Inventory. Measurement and Evaluation in Guidance, 9, 24–30.
Scogin, F., Schumacher, J., Gardner, J., & Chaplin, W. (1995). Predictive validity of psychological testing in law enforcement settings. Professional Psychology, 26, 68–71.
Scott, R. D., & Johnson, R. W. (1967). Use of the weighted application blank in selecting unskilled employees. Journal of Applied Psychology, 51, 393–395.
Scully, J. A., Tosi, H., & Banning, K. (2000). Life event checklists: Revisiting the Social Readjustment Rating Scale after 30 years. Educational and Psychological Measurement, 60, 864–876.
Seamons, D. T., Howell, R. J., Carlisle, A. L., & Roe, A. V. (1981). Rorschach simulation of mental illness and normality by psychotic and non-psychotic legal offenders. Journal of Personality Assessment, 45, 130–135.
Seashore, H. G., Wesman, A. G., & Doppelt, J. E. (1950). The standardization of the Wechsler Intelligence Scale for Children. Journal of Consulting Psychology, 14, 99–110.
Segal, D. L., Hersen, M., & Van Hasselt, V. B. (1994). Reliability of the Structured Clinical Interview for DSM-III-R: An evaluative review. Comprehensive Psychiatry, 35, 316–327.
Segal, D. L., Hersen, M., Van Hasselt, V. B., Kabacoff, R. I., & Roth, L. (1993). Reliability of diagnosis in older psychiatric patients using the Structured Clinical Interview for DSM-III-R. Journal of Psychopathology and Behavioral Assessment, 15, 347–356.
Seligman, M. (1975). Helplessness. New York: Freeman.
Selzer, M. L. (1971). The Michigan Alcoholism Screening Test: The quest for a new diagnostic instrument. American Journal of Psychiatry, 127, 1653–1658.
Serwer, B. J., Shapiro, B. J., & Shapiro, P. P. (1972). Achievement prediction of high risk children. Perceptual and Motor Skills, 35, 347–354.
Sewell, T. E. (1977). A comparison of WPPSI and Stanford-Binet Intelligence Scale (1972) among lower SES black children. Psychology in the Schools, 14, 158–161.
Shah, C. P., & Boyden, M. F. H. (1991). Assessment of auditory functioning. In B. A. Bracken (Ed.), The psychoeducational assessment of preschool children (2nd ed., pp. 341–378). Boston, MA: Allyn & Bacon.
Shapiro, E. S. (1987). Behavioral assessment in school psychology. Hillsdale, NJ: Erlbaum.
Sharp, S. E. (1898–1899). Individual psychology: A study in psychological method. American Journal of Psychology, 10, 329–391.
Shavelson, R. J., Webb, N. M., & Rowley, G. L. (1989). Generalizability theory. American Psychologist, 44, 922–932.
Shaycoft, M. F. (1979). Handbook of criterion-referenced testing: Development, evaluation, and use. New York: Garland STPM Press.
Sheehan, K. R., & Gray, M. R. (1991). Sex bias in the SAT and the DTMS. The Journal of General Psychology, 119, 5–14.
Shepard, L. (1980). Standard setting issues and methods. Applied Psychological Measurement, 4, 447–467.
Sherer, M., Parsons, O. A., Nixon, S., & Adams, R. L. (1991). Clinical validity of the Speech Sounds Perception Test and the Seashore Rhythm Test. Journal of Clinical and Experimental Neuropsychology, 13, 741–751.
Sherman, S. W., & Robinson, N. M. (Eds.). (1982). Ability testing of handicapped people: Dilemma for government, science, and the public. Washington, DC: National Academy Press.
Sherry, D. L., & Piotrowski, C. (1986). Consistency of factor structure on the Semantic Differential: An analysis of three adult samples. Educational and Psychological Measurement, 46, 263–268.
Shimberg, B. (1981). Testing for licensure and certification. American Psychologist, 36, 1138–1146.
Shneidman, E. S. (Ed.). (1951). Thematic test analysis. New York: Grune & Stratton.
Shore, T. H., Thornton, G. C., III, & Shore, L. M. (1990). Construct validity of two categories of assessment center dimension ratings. Personnel Psychology, 43, 101–116.
Shostrom, E. L. (1963). Personal Orientation Inventory. San Diego, CA: Educational and Industrial Testing Service.
Shostrom, E. L. (1964). A test for the measurement of self-actualization. Educational and Psychological Measurement, 24, 207–218.
Shrauger, J. S., & Osberg, T. M. (1981). The relative accuracy of self-prediction and judgments by others in psychological assessment. Psychological Bulletin, 90, 322–351.
Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry, 44, 172–177.
Shultz, K. S., & Chavez, D. V. (1994). The reliability and factor structure of a social desirability scale in English and in Spanish. Educational and Psychological Measurement, 54, 935–940.
Sibley, S. (1989). Review of the WISC-Riter Complete. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 892–893). Lincoln, NE: University of Nebraska Press.
Sicoly, F. (1992). Estimating the accuracy of decisions based on cutting scores. Journal of Psychoeducational Assessment, 10, 26–36.
Sidick, J. T., Barrett, G. V., & Doverspike, D. (1994). Three-alternative multiple choice tests: An attractive option. Personnel Psychology, 47, 829–835.
Siebel, C. C., Faust, W. L., & Faust, M. S. (1971). Administration of design copying tests to large groups of children. Perceptual and Motor Skills, 32, 355–360.
Siegel, D. J., & Piotrowski, R. J. (1985). Reliability of K-ABC subtest composites. Journal of Psychoeducational Assessment, 3, 73–76.
Silverman, B. I., Bishop, G. F., & Jaffe, J. (1976). Psychology of the scientist: XXXV. Terminal and instrumental values of American graduate students in psychology. Psychological Reports, 39, 1099–1108.
Silverstein, A. B. (1967). Validity of WISC short forms at three age levels. California Mental Health Research Digest, 5, 253–254.
Silverstein, A. B. (1968). Validity of a new approach to the design of WAIS, WISC, and WPPSI short forms. Journal of Consulting and Clinical Psychology, 32, 478–479.
Silverstein, A. B. (1970). Reappraisal of the validity of WAIS, WISC, and WPPSI short forms. Journal of Consulting and Clinical Psychology, 34, 12–14.
Silverstein, A. B. (1971). Deviation social quotients for the Vineland Social Maturity Scale. American Journal of Mental Deficiency, 76, 348–351.
Silverstein, A. B. (1973). Factor structure of the Wechsler Intelligence Scale for Children for three ethnic groups. Journal of Educational Psychology, 65, 408–410.
Silverstein, A. B. (1984). Pattern analysis: The question of abnormality. Journal of Consulting and Clinical Psychology, 52, 936–939.
Simeonsson, R. J., Bailey, D. B., Huntington, G. S., & Comfort, M. (1986). Testing the concept of goodness of fit in early intervention. Infant Mental Health Journal, 7, 81–94.
Simeonsson, R. J., Buckley, L., & Monson, L. (1979). Concepts of illness causality in hospitalized children. Journal of Pediatric Psychology, 4, 77–84.
Simeonsson, R. J., Huntington, G. S., & Parse, S. A. (1980). Expanding the developmental assessment of young handicapped children. New Directions for Exceptional Children, 3, 51–74.
Simola, S. K., & Holden, R. R. (1992). Equivalence of computerized and standard administration of the Piers-Harris Children's Self-Concept Scale. Journal of Personality Assessment, 58, 287–294.
Singer, E., & Presser, S. (1989). Survey research methods. Chicago, IL: University of Chicago Press.
Sipos, K., Sipos, M., & Spielberger, C. D. (1985). The development and validation of the Hungarian form of the Test Anxiety Inventory. In H. M. Van Der Ploeg, R. Schwarzer, & C. D. Spielberger (Eds.), Advances in Test Anxiety Research (Vol. 4, pp. 221–228). Lisse, The Netherlands: Swets & Zeitlinger.
Siu, A. L., Reuben, D. B., & Hayes, R. D. (1990). Hierarchical measures of physical function in ambulatory geriatrics. Journal of the American Geriatric Society, 38, 1113–1119.
Skinner, H. A., & Allen, B. A. (1983). Does the computer make a difference? Computerized vs. face-to-face vs. self-report assessment of alcohol, drug, and tobacco use. Journal of Consulting and Clinical Psychology, 51, 267–275.
Skinner, H. A., & Lei, H. (1980). Differential weights in life change research: Useful or irrelevant? Psychosomatic Medicine, 42, 367–370.
Slate, J. R., & Hunnicutt, L. C., Jr. (1988). Examiner errors on the Wechsler scales. Journal of Psychoeducational Assessment, 6, 280–288.
Slomka, G. T., & Tarter, R. E. (1984). Mental retardation. In R. E. Tarter & G. Goldstein (Eds.), Advances in clinical neuropsychology (Vol. 2, pp. 109–137). New York: Plenum Press.
Slosson, R. L. (1963). Slosson Intelligence Test (SIT) for children and adults. New York: Slosson Educational.
Slosson, R. L. (1991). Slosson Intelligence Test (SIT-R). East Aurora, NY: Slosson Educational.
Small, S. A., Zeldin, R. S., & Savin-Williams, R. C. (1983). In search of personality traits: A multimethod analysis of naturally occurring prosocial and dominance behavior. Journal of Personality, 51, 1–16.
Smith, A. L., Hays, J. R., & Solway, K. S. (1977). Comparison of the WISC-R and Culture Fair Intelligence Test in a juvenile delinquent population. The Journal of Psychology, 97, 179–182.
Smith, D. K., St. Martin, M. E., & Lyon, M. A. (1989). A validity study of the Stanford-Binet: Fourth edition with students with learning disabilities. Journal of Learning Disabilities, 22, 260–261.
Smith, J. D. (1985). Minds made feeble: The myth and legacy of the Kallikaks. Rockville, MD: Aspen Systems.
Smith, M. B. (1990). Henry A. Murray (1893–1988): Humanistic psychologist. Journal of Humanistic Psychology, 30, 6–13.
Smith, M. B., & Anderson, J. W. (1989). Obituary: Henry A. Murray (1893–1988). American Psychologist, 44, 1153–1154.
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155.
Smither, R. D., & Houston, J. M. (1992). The nature of competitiveness: The development and validation of the Competitiveness Index. Educational and Psychological Measurement, 52, 407–418.
Smyth, F. L. (1989). Commercial coaching and SAT scores. Journal of College Admissions, 123, 2–9.
Smyth, F. L. (1990). SAT coaching. Journal of College Admissions, 129, 7–17.
Snell, W. E., Jr. (1989). Development and validation of the Masculine Behavior Scale: A measure of behaviors stereotypically attributed to males vs. females. Sex Roles, 21, 749–767.
Snell, W. E., & Papini, D. R. (1989). The sexuality scale: An instrument to measure sexual-esteem, sexual-depression, and sexual-preoccupation. The Journal of Sex Research, 26, 256–263.
Snider, J. G., & Osgood, C. E. (Eds.). (1969). Semantic Differential technique: A sourcebook. Chicago, IL: Aldine.
Snow, J. H., & Hynd, G. W. (1985a). A multivariate investigation of the Luria-Nebraska Neuropsychological Battery Children's Revision with learning-disabled children. Journal of Psychoeducational Assessment, 3, 101–109.
Snow, J. H., & Hynd, G. W. (1985b). Factor structure of the Luria-Nebraska Neuropsychological Battery Children's Revision. Journal of School Psychology, 23, 271–276.
Snyder, C. R., Harris, C., Anderson, J. R., Holleran, S. A., Irving, L. M., Sigmon, S. T., Yoshinobu, L., Gibb, J., Langelle, C., & Harney, P. (1991). The Will and the Ways: Development and validation of an individual-differences measure of hope. Journal of Personality and Social Psychology, 60, 570–585.
Snyder, C. R., Shenkel, R. J., & Lowery, C. R. (1977). Acceptance of personality interpretations: The Barnum effect and beyond. Journal of Consulting and Clinical Psychology, 45, 104–114.
Snyder, D. K. (1979). Multidimensional assessment of marital satisfaction. Journal of Marriage and the Family, 41, 813–823.
Snyder, D. K. (1981). Manual for the Marital Satisfaction Inventory. Los Angeles, CA: Western Psychological Services.
Snyder, D. K., Lachar, D., & Wills, R. M. (1988). Computer-based interpretation of the Marital Satisfaction Inventory: Use in treatment planning. Journal of Marital and Family Therapy, 14, 397–409.
Snyder, D. K., Widiger, T. A., & Hoover, D. W. (1990). Methodological considerations in validating computer-based test interpretations: Controlling for response bias. Psychological Assessment, 2, 470–477.
Soares, A. T., & Soares, L. M. (1975). Self-Perception Inventory Composite manual. Bridgeport, CT: University of Bridgeport.
Sokal, M. M. (Ed.). (1987). Psychological testing and American society 1890–1930. New Brunswick, NJ: Rutgers University Press.
Sokal, M. M. (1990). G. Stanley Hall and the institutional character of psychology at Clark 1889–1920. Journal of the History of the Behavioral Sciences, 26, 114–124.
Sokal, M. M. (1991). Obituary: Psyche Cattell (1893–1989). American Psychologist, 46, 72.
Solomon, E., & Kopelman, R. E. (1984). Questionnaire format and scale reliability: An examination of three modes of item presentation. Psychological Reports, 54, 447–452.
Sonnenfeld, J. (1969). Personality and behavior in environment. Proceedings of the Association of American Geographers, 1, 136–140.
Space, L. G. (1981). The computer as a psychometrician. Behavior Research Methods and Instrumentation, 13, 596–606.
Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the quality of marriage and similar dyads. Journal of Marriage and the Family, 38, 15–28.
Sparrow, S. S., Balla, D. A., & Cicchetti, D. V. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service.
Spearman, C. (1904). General intelligence objectively determined and measured. American Journal of Psychology, 15, 201–293.
Spearman, C. (1927). The abilities of man. London: Macmillan.
Spearman, C. (1930). Autobiography. In C. Murchison (Ed.), A history of psychology in autobiography (Vol. 1, pp. 299–334). Worcester, MA: Clark University Press.
Spector, P. (1982). Behavior in organizations as a function of employees' locus of control. Psychological Bulletin, 91, 482–497.
Spector, P. E. (1988). Development of the Work Locus of Control Scale. Journal of Occupational Psychology, 61, 335–340.
Spielberger, C. D. (Ed.). (1966). Anxiety and behavior. New York: Academic Press.
Spielberger, C. D. (1980). Test Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). STAI Manual. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C. D., Gorsuch, R. L., Lushene, R., Vagg, P. R., & Jacobs, G. A. (1983). Manual for the State-Trait Anxiety Inventory: A self-evaluation questionnaire. Palo Alto, CA: Consulting Psychologists Press.
Spitzer, R. L., & Endicott, J. (1977). Schedule for Affective Disorders and Schizophrenia, Lifetime version (SADS-L). New York: New York State Psychiatric Institute.
Spitzer, R. L., & Williams, J. B. W. (1983). Instruction manual for the Structured Clinical Interview for DSM-III (SCID). New York: Biometrics Research Department, New York State Psychiatric Institute.
Spitzer, R. L., Williams, J. B. W., Gibbon, M., & First, M. B. (1992). The Structured Clinical Interview for DSM-III-R (SCID). Archives of General Psychiatry, 49, 624–629.
Spranger, E. (1928). Types of men. Translated from the 5th German edition of Lebensformen by P. J. W. Pigors. Halle: Max Niemeyer Verlag.
Spruill, J. (1988). Two types of tables for use with the Stanford-Binet Intelligence Scale: Fourth edition. Journal of Psychoeducational Assessment, 6, 78–86.
Spruill, J. (1991). A comparison of the Wechsler Adult Intelligence Scale–Revised with the Stanford-Binet Intelligence Scale (4th ed.) for mentally retarded adults. Psychological Assessment, 3, 133–135.
Staats, S. (1989). Hope: A comparison of two self-report measures for adults. Journal of Personality Assessment, 53, 366–375.
Staats, S. R., & Stassen, M. A. (1985). Hope: An affective cognition. Social Indicators Research, 17, 235–242.
Staff (1988). Validity service update. GRE Board Newsletter, p. 3.
Stanley, J. C., & Porter, A. C. (1967). Correlation of Scholastic Aptitude Test scores with college grades for Negroes vs. whites. Journal of Educational Measurement, 4, 199–218.
Stanton, J. M. (1956). Group personality profiles related to aspects of antisocial behavior. Journal of Criminal Law, Criminology and Police Science, 47, 340–349.
Steer, R. A., Beck, A. T., & Garrison, B. (1986). Applications of the Beck Depression Inventory. In N. Sartorius & T. A. Ban (Eds.), Assessment of depression (pp. 123–142). New York: Springer-Verlag.
Steer, R. A., Rissmiller, D. J., Ranieri, W. F., & Beck, A. T. (1994). Use of the computer-administered Beck Depression Inventory and Hopelessness Scale with psychiatric inpatients. Computers in Human Behavior, 10, 223–229.
Stein, K. B. (1968). The TSC scales: The outcome of a cluster analysis of the 550 MMPI items. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 1, pp. 80–104). Palo Alto, CA: Science and Behavior Books.
Stein, L. A. R., & Graham, J. R. (1999). Detecting fake-good MMPI-A profiles in a correctional facility. Psychological Assessment, 11, 386–395.
Stein, M. I. (1981). Thematic Apperception Test (2nd ed.). Springfield, IL: Charles C Thomas.
Stein, S. P., & Charles, P. (1971). Emotional factors in juvenile diabetes mellitus: A study of early life experience of adolescent diabetics. American Journal of Psychiatry, 128, 56–60.
Stephens, T. E., & Lattimore, J. (1983). Prescriptive checklist for positioning multihandicapped residential clients: A clinical report. Physical Therapy, 63, 1113–1115.
Stephenson, W. (1953). The study of behavior. Chicago, IL: University of Chicago Press.
Stern, W. (1930). Autobiography. In C. Murchison (Ed.), A history of psychology in autobiography (Vol. 1, pp. 335–388). Worcester, MA: Clark University Press.
Sternberg, R. J. (1984). The Kaufman Assessment Battery for Children: An information processing analysis and critique. The Journal of Special Education, 18, 269–279.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. Cambridge, MA: Cambridge University Press.
Sternberg, R. J. (Ed.). (1988a). Advances in the psychology of human intelligence (Vols. 1–4). Hillsdale, NJ: Erlbaum.
Sternberg, R. J. (1988b). The triarchic mind. New York: Viking.
Sternberg, R. J. (1990). Metaphors of mind: Conceptions of the nature of intelligence. Cambridge, MA: Cambridge University Press.
Sternberg, R. J., & Davidson, J. E. (1986). Conceptions of giftedness. New York: Cambridge University Press.
Sternberg, R. J., & Detterman, D. K. (1986). What is intelligence? Norwood, NJ: Ablex.
Sternberg, R. J., & Grigorenko, E. L. (1997). Are cognitive styles still in style? American Psychologist, 52, 700–712.
Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common sense. American Psychologist, 50, 912–927.
Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? American Psychologist, 52, 630–641.
Stewart, A. J., & Chester, N. L. (1982). Sex differences in human social motives: Achievement, affiliation, and power. In A. Stewart (Ed.), Motivation and society (pp. 172–218). San Francisco, CA: Jossey-Bass.
Stewart, A. L., Hays, R. D., & Ware, J. E. (1988). The MOS short-form General Health Survey: Reliability and validity in a patient population. Medical Care, 26, 724–735.
Stokols, D. (1992). Establishing and maintaining healthy environments. American Psychologist, 47, 6–22.
Stone, G. C., Cohen, F., & Adler, N. E. (Eds.). (1979). Health psychology: A handbook. Washington, DC: Jossey-Bass.
Stotland, E. (1969). The psychology of hope. San Francisco, CA: Jossey-Bass.
Stotland, E., & Blumenthal, A. L. (1964). The reduction of anxiety as a result of the expectation of making a choice. Canadian Journal of Psychology, 18, 139–145.
Strahan, R., & Gerbasi, K. C. (1972). Short, homogeneous versions of the Marlowe-Crowne Social Desirability Scale. Journal of Clinical Psychology, 28, 191–193.
Street, W. R. (1994). A chronology of noteworthy events in American psychology. Washington, DC: American Psychological Association.
Streiner, D. L. (1985). Psychological Screening Inventory. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (Vol. IV, pp. 509–515). Kansas City, MO: Test Corporation of America.
Streiner, D. L., & Miller, H. R. (1986). Can a good short form of the MMPI ever be developed? Journal of Clinical Psychology, 42, 109–113.
Strenio, A. J. (1981). The testing trap. New York: Rawson, Wade.
Stricker, L. J., & Ross, J. (1964a). Some correlates of a Jungian personality inventory. Psychological Reports, 14, 623–643.
Stricker, L. J., & Ross, J. (1964b). An assessment of some structural properties of the Jungian personality typology. Journal of Abnormal and Social Psychology, 68, 62–71.
Strickland, B. R. (1989). Internal-external control expectancies: From contingency to creativity. American Psychologist, 44, 1–12.
Strickland, B. R., & Crowne, D. P. (1963). Need for approval and the premature termination of psychotherapy. Journal of Consulting Psychology, 27, 95–101.
Strong, E. K., Jr. (1935). Predictive value of the Vocational Interest Test. Journal of Educational Psychology, 26, 331–349.
Strong, E. K., Jr. (1955). Vocational interests 18 years after college. Minneapolis, MN: University of Minnesota Press.
Sue, D. W., & Sue, D. (1990). Counseling the culturally different. New York: Wiley.
Suinn, R. M., Ahuna, C., & Khoo, G. (1992). The Suinn-Lew Asian Self-Identity Acculturation Scale: Concurrent and factorial validation. Educational and Psychological Measurement, 52, 1041–1046.
Suinn, R., Dauterman, W., & Shapiro, B. (1967). The WAIS as a predictor of educational and occupational achievement in the adult blind. New Outlook for the Blind, 61, 41–43.
Suinn, R., Rikard-Figueroa, K., Lew, S., & Vigil, P. (1987). The Suinn-Lew Asian Self-Identity Acculturation Scale: An initial report. Educational and Psychological Measurement, 47, 401–407.
Sullivan, P. M. (1982). Administration modifications on the WISC-R Performance Scale with different categories of deaf children. American Annals of the Deaf, 127, 780–788.
Sullivan, P. M., & Vernon, M. (1979). Psychological assessment of hearing-impaired children. School Psychology Digest, 8, 271–290.
Sutter, E. G., & Battin, R. R. (1984). Using traditional psychological tests to obtain neuropsychological information on children. International Journal of Clinical Neuropsychology, 6, 115–119.
Swallow, R. (1981). Fifty assessment instruments commonly used with blind and partially seeing individuals. Journal of Visual Impairment and Blindness, 75, 65–72.
Swanson, H. L., & Watson, B. L. (1989). Educational and psychological assessment of exceptional children (2nd ed.). Columbus, OH: Merrill.
Swenson, W. M. (1985). An aging psychologist assesses the impact of age on MMPI profiles. Psychiatric Annals, 15, 554–557.
Swerdlik, M. E. (1977). The question of the comparability of the WISC and WISC-R: Review of the research and implications for school psychologists. Psychology in the Schools, 14, 260–270.
Szapocznik, J., Kurtines, W. M., & Fernandez, T. (1980). Bicultural involvement and adjustment in Hispanic-American youths. International Journal of Intercultural Relations, 4, 353–365.
Szapocznik, J., Scopetta, M. A., Aranalde, M., & Kurtines, W. (1978). Cuban value structure: Treatment implications. Journal of Consulting and Clinical Psychology, 46, 961–970.
Tamkin, A. S., & Klett, C. J. (1957). Barron's Ego Strength Scale: A replication of an evaluation of its construct validity. Journal of Consulting Psychology, 21, 412.
Tashakkori, A., Barefoot, J., & Mehryar, A. H. (1989). What does the Beck Depression Inventory measure in college students? Evidence from a non-western culture. Journal of Clinical Psychology, 45, 595–602.
Tasto, D. L. (1977). Self-report schedules and inventories. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 153–193). New York: Wiley.
Tasto, D. L., & Hickson, R. (1970). Standardization, item analysis, and scaling of the 122-item Fear Survey Schedule. Behavior Therapy, 1, 473–484.
Tasto, D. L., Hickson, R., & Rubin, S. E. (1971). Scaled profile analysis of fear survey schedule factors. Behavior Therapy, 2, 543–549.
Tate, D. G., Forchheiner, M., Maynard, F., Davidoff, G., & Dijkers, M. (1993). Comparing two measures of depression in spinal cord injury. Rehabilitation Psychology, 38, 53–61.
Tatsuoka, M. (1970). Discriminant analysis: The study of group differences. Champaign, IL: Institute for Personality and Ability Testing.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection: Discussion and tables. Journal of Applied Psychology, 23, 565–578.
Taylor, J. A. (1953). A personality scale of manifest anxiety. Journal of Abnormal and Social Psychology, 48, 285–290.
Taylor, J. B. (1959). Social desirability and MMPI performance: The individual case. Journal of Consulting Psychology, 23, 514–517.
Taylor, R. L., Kauffman, D., & Partenio, I. (1984). The Koppitz developmental scoring system for the Bender-Gestalt: Is it developmental? Psychology in the Schools, 21, 425–428.
Taylor, R. L., Slocumb, P. R., & O'Neill, J. (1979). A short form of the McCarthy Scales of Children's Abilities: Methodological and clinical applications. Psychology in the Schools, 16, 347–350.
Taylor, S. E. (1990). Health psychology. American Psychologist, 45, 40–50.
Templer, D. I. (1970). The construction and validation of a Death Anxiety Scale. The Journal of General Psychology, 82, 165–177.
Tenopyr, M. L. (1977). Content-construct confusion. Personnel Psychology, 30, 47–54.
Tenopyr, M. L., & Oeltjen, P. D. (1982). Personnel selection and classification. Annual Review of Psychology, 33, 581–618.
Teplin, L. (1991). The criminalization hypothesis: Myth, misnomer, or management strategy. In S. A. Shah & B. D. Sales (Eds.), Law and mental health: Major developments and research needs (pp. 149–183). Rockville, MD: U.S. Department of Health and Human Services.
Teplin, L., & Swartz, J. (1989). Screening for severe mental disorder in jails: The development of the Referral Decision Scale. Law and Human Behavior, 13, 1–18.
ter Kuile, M. M., Linssen, A. C. G., & Spinhoven, P. (1993). The development of the Multidimensional Locus of Pain Control Questionnaire (MLPC): Factor structure, reliability, and validity. Journal of Psychopathology and Behavioral Assessment, 15, 387–404.
Terman, L. M. (1916). The measurement of intelligence. Boston, MA: Houghton-Mifflin.
Terman, L. M. (1932). Autobiography. In C. Murchison (Ed.), A history of psychology in autobiography (Vol. 2, pp. 297–332). Worcester, MA: Clark University Press.
Terman, L. M., & Childs, H. G. (1912). A tentative revision and extension of the Binet-Simon measuring scale of intelligence. Journal of Educational Psychology, 3, 61–74; 133–143; 198–208; 277–289.
Terman, L. M., & Merrill, M. A. (1937). Measuring intelligence. Boston, MA: Houghton-Mifflin.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703–742.
Thayer, P. W. (1977). Something's old, something's new. Personnel Psychology, 30, 513–524.
Thistlethwaite, D. L. (1960). College press and changes in study plans of talented students. Journal of Educational Psychology, 51, 222–234.
Thomas, A., & Chess, S. (1977). Temperament and development. New York: Brunner/Mazel.
Thomas, K. R., Wiesner, S. L., & Davis, R. M. (1982). Semantic differential ratings as indices of disability acceptance. Rehabilitation Psychology, 27, 245–247.
Thomas, P. J. (1980). A longitudinal comparison of the WISC and WISC-R with special education students. Psychology in the Schools, 17, 437–441.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for the construct validity of scores: A historical overview and some guidelines. Educational and Psychological Measurement, 56, 197–208.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174–195.
Thompson, R. J. (1977). Consequences of using the 1972 Stanford-Binet Intelligence Scale norms. Psychology in the Schools, 14, 445–448.
Thompson, R. J. (1980). The diagnostic utility of WISC-R measures with children referred to a developmental evaluation center. Journal of Consulting and Clinical Psychology, 48, 440–447.
Thorndike, E. L. (1926). Measurement of intelligence. New York: Teachers College, Columbia University.
Thorndike, R. L. (1940). Constancy of the IQ. Psychological Bulletin, 37, 167–186.
Thorndike, R. L. (1949). Personnel selection. New York: Wiley.
Thorndike, R. L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8, 63–70.
Thorndike, R. L. (1977). Causation of Binet IQ decrements. Journal of Educational Measurement, 14, 197–202.
Thorndike, R. L., & Hagen, E. P. (1977). Measurement and evaluation in psychology and education (4th ed.). New York: Wiley.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986a). The Stanford-Binet Intelligence Scale: Fourth edition, guide for administering and scoring. Chicago, IL: Riverside.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986b). The Stanford-Binet Intelligence Scale: Technical manual (4th ed.). Chicago, IL: Riverside.
Thorndike, R. M. (1990). Origins of intelligence and its measurement. Journal of Psychoeducational Assessment, 8, 223–230.
Thorne, A., & Gough, H. G. (1991). Portraits of type: An MBTI research compendium. Palo Alto, CA: Consulting Psychologists Press.
Thorne, F. C. (1966). The sex inventory. Journal of Clinical Psychology, 22, 367–374.
Thornton, G. C., III, & Byham, W. C. (1982). Assessment centers and managerial performance. New York: Academic Press.
Throop, W. F., & MacDonald, A. P. (1971). Internal-external locus of control: A bibliography. Psychological Reports, 28, 175–190.
Thurstone, L. L. (1934). The vectors of mind. Psychological Review, 41, 1–32.
Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monographs (No. 1).
Thurstone, L. L. (1946). Comment. American Journal of Sociology, 52, 39–50.
Thurstone, L. L. (1952). Autobiography. In E. G. Boring et al. (Eds.), A history of psychology in autobiography (Vol. 4, pp. 295–322). Worcester, MA: Clark University Press.
Thurstone, L. L., & Chave, E. J. (1929). The measurement of attitudes. Chicago, IL: University of Chicago Press.
Tiedeman, D. V., & O'Hara, R. P. (1963). Career development: Choice and adjustment. Princeton, NJ: College Entrance Examination Board.
Tillman, H. M. (1973). Intelligence scales for the blind: A review with implications for research. Journal of School Psychology, 11, 80–87.
Tilton, J. W. (1937). The measurement of overlapping. Journal of Educational Psychology, 28, 656–662.
Timbrook, R. E., & Graham, J. R. (1994). Ethnic differences on the MMPI-2. Psychological Assessment, 6, 212–217.
Timbrook, R. E., Graham, J. R., Keiller, S. W., & Watts, D. (1993). Comparison of the Wiener-Harmon subtle-obvious scales and the standard validity scales in detecting valid and invalid MMPI-2 profiles. Psychological Assessment, 5, 53–61.
Timmons, L. A., Lanyon, R. I., Almer, E. R., & Curran, P. J. (1993). Development and validation of sentence completion test indices of malingering during examination for disability. American Journal of Forensic Psychology, 11, 23–38.
Tolor, A. L., & Brannigan, G. G. (1980). Research and clinical applications of the Bender-Gestalt Test. Springfield, IL: Charles C Thomas.
Torrance, E. P. (1966). Torrance tests of creative thinking: Norms-technical manual. Princeton, NJ: Personnel Press.
Torrance, E. P. (1981). Empirical validation of criterion-referenced indicators of creative ability through a longitudinal study. Creative Child and Adult Quarterly, 6, 136–140.
Treffinger, D. J. (1985). Review of the Torrance Tests of Creative Thinking. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (pp. 1632–1634). Lincoln, NE: University of Nebraska.
Triandis, H. C. (1980). Values, attitudes, and interpersonal behavior. In M. M. Page (Ed.), Nebraska Symposium on Motivation 1979 (pp. 195–259). Lincoln, NE: University of Nebraska Press.
Triandis, H. C., Kashima, Y., Hui, C. H., Lisansky, J., & Marin, G. (1982). Acculturation and biculturalism indices among relatively acculturated Hispanic young adults. Interamerican Journal of Psychology, 16, 140–149.
Trickett, E. J., & Moos, R. H. (1970). Generality and specificity of student reactions in high school classrooms. Adolescence, 5, 373–390.
Trickett, E. J., & Moos, R. H. (1973). Social environment of junior high and high school classrooms. Journal of Educational Psychology, 65, 93–102.
Trieschmann, R. B. (1988). Spinal cord injuries: Psychological, social, and vocational rehabilitation (2nd ed.). New York: Pergamon Press.
Trochim, W. M. K. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 575–604.
Truch, S. (1989). WISC-R Companion. Seattle, WA: Special Child Publications.
Trull, T. J., & Hillerbrand, E. (1990). Psychometric properties and factor structure of the Fear Questionnaire Phobia Subscale items in two normative samples. Journal of Psychopathology and Behavioral Assessment, 12, 285–297.
Trull, T. J., Neitzel, M. T., & Main, A. (1988). The use of meta-analysis to assess the clinical significance of behavior therapy for agoraphobia. Behavior Therapy, 19, 527–538.
Trybus, R. (1973). Personality assessment of entering hearing-impaired college students using the 16PF, Form B. Journal of Rehabilitation of the Deaf, 6, 34–40.
Trybus, R., & Karchmer, M. A. (1977). School achievement scores of hearing impaired children: National data on achievement status and growth patterns. American Annals of the Deaf, 122, 62–69.
Tryon, G. S. (1980). The measurement and treatment of test anxiety. Review of Educational Research, 50, 343–372.
Tryon, W. W. (1991). Activity measurement in psychology and medicine. New York: Plenum Press.
Tsushima, W. T. (1994). Short form of the WPPSI and WPPSI-R. Journal of Clinical Psychology, 50, 877–880.
Tucker, M. F., Cline, V. B., & Schmitt, J. R. (1967). Prediction of creativity and other performance measures from biographical information among pharmaceutical scientists. Journal of Applied Psychology, 57, 131–138.
Tucker, W. H. (1994). Fact and fiction in the discovery of Sir Cyril Burt's flaws. Journal of the History of the Behavioral Sciences, 30, 335–347.
Tucker, W. H. (1997). Re-considering Burt: Beyond a reasonable doubt. Journal of the History of the Behavioral Sciences, 33, 145–162.
Tuckman, J., & Lorge, I. (1952). The attitudes of the aged toward the older worker for institutionalized and non-institutionalized adults. Journal of Gerontology, 7, 559–564.
Tuckman, J., & Lorge, I. (1953). Attitudes toward old people. Journal of Social Psychology, 37, 249–260.
Tuckman, J., & Lorge, I. (1958). Attitude toward aging of individuals with experiences with the aged. Journal of Genetic Psychology, 92, 199–204.
Tuddenham, R. D., Davis, L., Davison, L., & Schindler, R. (1958). An experimental group version for school children of the Progressive Matrices. Journal of Consulting Psychology, 22, 30.
Tupes, E. C., & Christal, R. E. (1961). Recurrent personality factors based on trait ratings (USAF ASD Technical Report No. 61-97). Lackland Air Force Base, TX: U.S. Air Force (cited in McCrae & Costa, 1986).
Turk, D. C., Rudy, T. E., & Salovey, P. (1985). The McGill Pain Questionnaire reconsidered: Confirming the factor structure and examining appropriate uses. Pain, 21, 385–397.
Turner, R. G. (1978). Individual differences in ability to image nouns. Perceptual and Motor Skills, 47, 423–434.
Turner, R. J., & Wheaton, B. (1995). Checklist measurement of stressful life events. In S. Cohen, R. C. Kessler, & L. U. Gordon (Eds.), Measuring stress (pp. 29–58). New York: Oxford University Press.
Tursky, B. (1976). The development of a pain perception profile: A psychophysical approach. In M. Weisenberg & B. Tursky (Eds.), Pain: New perspectives in therapy and research. New York: Plenum.
Tuttle, F. B., & Becker, L. A. (1980). Characteristics and identification of gifted and talented students. Washington, DC: National Educational Association.
Tyler, F. B., Dhawan, N., & Sinha, Y. (1989). Cultural contributions to constructing locus-of-control attributions. Genetic, Social, and General Psychology Monographs, 115, 205–220.
Tyler, S., & Elliott, C. D. (1988). Cognitive profiles of groups of poor readers and dyslexic children on the British Ability Scales. British Journal of Psychology, 79, 493–508.
Tzeng, O. C. S., Maxey, W. A., Fortier, R., & Landis, D. (1985). Construct evaluation of the Tennessee Self Concept Scale. Educational and Psychological Measurement, 45, 63–78.
Tzeng, O. C. S., Ware, R., & Bharadwaj, N. (1991). Comparison between continuous bipolar and unipolar ratings of the Myers-Briggs Type Indicator. Educational and Psychological Measurement, 51, 681–690.
Ulissi, S. M., Brice, P. J., & Gibbins, S. (1989). Use of the Kaufman Assessment Battery for Children with the hearing impaired. American Annals of the Deaf, 134, 283–287.
Ullman, L. P., & Krasner, L. (1975). A psychological approach to abnormal behavior (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Ullman, R. K., Sleator, E. K., & Sprague, R. L. (1991). ADDH Comprehensive Teacher's Rating Scale (ACTeRS). Champaign, IL: Metritech, Inc.
Urban, W. J. (1989). The Black scholar and intelligence testing: The case of Horace Mann Bond. Journal of the History of the Behavioral Sciences, 25, 323–334.
U.S. Department of Labor (1970). Manual for the USES General Aptitude Test Battery. Washington, DC: Manpower Administration, U.S. Department of Labor.
Valencia, R. R. (1979). Comparison of intellectual performance of Chicano and Anglo third-grade boys
on the Raven's Coloured Progressive Matrices. Psychology in the Schools, 16, 448–453.
Van den Brink, J., & Schoonman, W. (1988). First steps in computerized testing at the Dutch Railways. In F. J. Maarse, L. J. M. Mulder, W. P. B. Sjouw, & A. E. Akkerman (Eds.), Computers in psychology: Methods, instrumentation and psychodiagnostica (pp. 184–188). Amsterdam: Swets & Zeitlinger.
Vander Kolk, C. J. (1982). A comparison of intelligence test score patterns between visually impaired subgroups and the sighted. Rehabilitation Psychology, 27, 115–120.
Vandeveer, B., & Schweid, E. (1974). Infant assessment: Stability of mental functioning in young retarded children. American Journal of Mental Deficiency, 79, 1–4.
Van de Vijver, F. J. R., & Harsveld, M. (1994). The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery. Journal of Applied Psychology, 79, 852–859.
Van Gorp, W., & Meyer, R. (1986). The detection of faking on the Millon Clinical Multiaxial Inventory (MCMI). Journal of Clinical Psychology, 42, 742–747.
Van Hagan, J., & Kaufman, A. S. (1975). Factor analysis of the WISC-R for a group of mentally retarded children and adolescents. Journal of Consulting and Clinical Psychology, 43, 661–667.
Vansickle, T. R., & Kapes, J. T. (1993). Comparing paper-pencil and computer-based versions of the Strong-Campbell Interest Inventory. Computers in Human Behavior, 9, 441–449.
Varble, D. L. (1971). Current status of the Thematic Apperception Test. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 2, pp. 216–235). Palo Alto, CA: Science and Behavior Books, Inc.
Veldman, D. J. (1967). Computer-based sentence completion interviews. Journal of Counseling Psychology, 14, 153–157.
Vernon, M. (1968). Fifty years of research on the intelligence of the deaf and hard-of-hearing: A survey of the literature and discussion of implications. Journal of Rehabilitation of the Deaf, 1, 1–12.
Vernon, M. (1970). Psychological evaluation and interviewing of the hearing impaired. Rehabilitation Research and Practice Review, 1, 45–52.
Vernon, M., & Brown, D. W. (1964). A guide to psychological tests and testing procedures in the evaluation of deaf and hard-of-hearing children. Journal of Speech and Hearing Disorders, 29, 414–423.
Vernon, M. C., & Andrews, J. F. (1990). The psychology of deafness. New York: Longman.
Vernon, P. E. (1960). The structure of human abilities (Rev. ed.). London: Methuen.
Vernon, P. E., & Allport, G. W. (1931). A test for personal values. Journal of Abnormal and Social Psychology, 26, 231–248.
Vescovi, G. M. (1979). The emerging private psychologist practitioner as contributor to the vocational rehabilitation process for deaf clients. Journal of Rehabilitation of the Deaf, 13, 9–19.
Vieweg, B. W., & Hedlund, J. L. (1984). Psychological Screening Inventory: A comprehensive review. Journal of Clinical Psychology, 40, 1382–1393.
Vincent, K. R., & Cox, J. A. (1974). A re-evaluation of Raven's Standard Progressive Matrices. The Journal of Psychology, 88, 299–303.
Vinson, D. E., Munson, J. M., & Nakanishi, M. (1977). An investigation of the Rokeach Value Survey for consumer research applications. In W. D. Perreault (Ed.), Advances in consumer research (pp. 247–252). Provo, UT: Association for Consumer Research.
Vodanovich, S. J., & Kass, S. J. (1990). A factor analytic study of the boredom proneness scale. Journal of Personality Assessment, 55, 115–123.
Volicer, L., Hurley, A. C., Lathi, D. C., & Kowall, N. W. (1994). Measurement of severity in advanced Alzheimer's disease. Journal of Gerontology, 49, 223–226.
Volicer, L., Seltzer, B., Rheaume, Y., & Fabiszewski, K. (1987). Progression of Alzheimer-type dementia in institutionalized patients: A cross-sectional study. Journal of Applied Gerontology, 6, 83–94.
Volker, M. A., Guarnaccia, V., & Scardapane, J. R. (1999). Short forms of the Stanford-Binet Intelligence Scale: Fourth edition for screening potentially gifted preschoolers. Journal of Psychoeducational Assessment, 17, 226–235.
Von Mayrhauser, R. T. (1989). Making intelligence functional: Walter Dill Scott and applied psychological testing in World War I. Journal of the History of the Behavioral Sciences, 25, 60–72.
Vredenburg, K., Krames, L., & Flett, G. L. (1985). Re-examining the Beck Depression Inventory: The long and short of it. Psychological Reports, 57, 767–778.
Vulpe, S. G. (1982). Vulpe Assessment Battery. Toronto: National Institute on Mental Retardation.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Waddell, D. D. (1980). The Stanford-Binet: An evaluation of the technical data available since the 1972 restandardization. Journal of School Psychology, 18, 203–209.
Wagner, R. K., & Sternberg, R. J. (1986). Tacit knowledge and intelligence in the everyday world. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence (pp. 51–83). Cambridge, MA: Cambridge University Press.
Wahler, R. G., House, A. E., & Stanbaugh, E. E., II (1976). Ecological assessment of child problem behavior: A clinical package for home, school, and institutional settings. New York: Pergamon.
Wainer, H. (1987). The first four millennia of mental testing: From Ancient China to the computer age. Educational Testing Service Research Report, No. 87-34.
Wainer, H. (1993). Measurement problems. Journal of Educational Measurement, 30, 1–21.
Walczyk, J. J. (1993). A computer program for constructing language comprehension tests. Computers in Human Behavior, 9, 113–116.
Walk, R. D. (1956). Self ratings of fear in a fear-invoking situation. Journal of Abnormal and Social Psychology, 22, 171–178.
Walker, D. K. (1973). Socioemotional measures for preschool and kindergarten children. San Francisco, CA: Jossey-Bass.
Walker, N. W., & Myrick, C. C. (1985). Ethical considerations in the use of computers in psychological testing and assessment. Journal of School Psychology, 23, 51–57.
Wallace, W. L. (1950). The relationship of certain variables to discrepancy between expressed and inventoried vocational interest. American Psychologist, 5, 354 (abstract).
Wallas, G. (1926). The art of thought. London: Watts.
Wallbrown, F. H., & Jones, J. A. (1992). Reevaluating the factor structure of the Revised California Psychological Inventory. Educational and Psychological Measurement, 52, 379–386.
Walls, R. T., Werner, T. J., Bacon, A., & Zane, T. (1977). Behavior checklists. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment (pp. 77–146). New York: Brunner/Mazel.
Wallston, B. S., Wallston, K. A., Kaplan, G. D., & Maides, S. A. (1976). Development and validation of the Health Locus of Control Scale. Journal of Consulting and Clinical Psychology, 44, 580–585.
Wallston, K. A., Wallston, B. S., & DeVellis, R. (1978). Development of the Multidimensional Health Locus of Control (MHLC) Scales. Health Education Monographs, 6, 160–169.
Walsh, J. A. (1972). Review of the CPI. In O. K. Buros (Ed.), The seventh mental measurements yearbook (Vol. 1, pp. 96–97). Highland Park, NJ: Gryphon Press.
Walters, G. D. (1988). Schizophrenia. In R. L. Greene (Ed.), The MMPI: Use with specific populations (pp. 50–73). Philadelphia, PA: Grune & Stratton.
Walters, G. D., White, T. W., & Greene, R. L. (1988). Use of the MMPI to identify malingering and exaggeration of psychiatric symptomatology in male prison
inmates. Journal of Consulting and Clinical Psychology, 56, 111–117.
Wampler, K. S., & Halverson, C. F. (1990). The Georgia Marriage Q-sort: An observational measure of marital functioning. American Journal of Family Therapy, 18, 169–178.
Wang, K. A. (1932). Suggested criteria for writing attitude statements. Journal of Social Psychology, 3, 367–373.
Wang, M. W., & Stanley, J. C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40, 663–705.
Wapner, S. (1990). Introduction. Journal of the History of the Behavioral Sciences, 26, 107–113.
Ward, C. H., Beck, A. T., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). The psychiatric nomenclature: Reasons for diagnostic disagreement. Archives of General Psychiatry, 7, 198–205.
Wardrop, J. L. (1989). Review of the California Achievement Tests. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 128–133). Lincoln, NE: University of Nebraska Press.
Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item short-form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473–483.
Ware, J. E., Snow, K. K., & Kosinski, M. (1993). SF-36 Health Survey: Manual and interpretation guide. Boston, MA: The Health Institute, New England Medical Center Hospitals.
Waring, D., Farthing, C., & Kidder-Ashley, P. (1999). Impulsive response style affects computer-administered multiple choice test performance. Journal of Instructional Psychology, 26, 121–128.
Waters, L. K. (1965). A note on the fakability of forced-choice scales. Personnel Psychology, 18, 187–191.
Watkins, C. E., Jr. (1986). Validity and usefulness of WAIS-R, WISC-R, and WPPSI short forms: A critical review. Professional Psychology, 17, 36–43.
Watkins, E. O. (1976). The Watkins Bender-Gestalt scoring system. Novato, CA: Academic Therapy.
Watson, B. U. (1983). Test-retest stability of the Hiskey-Nebraska Test of Learning Aptitude in a sample of hearing-impaired children and adolescents. Journal of Speech and Hearing Disorders, 48, 145–149.
Watson, B. U., & Goldgar, D. E. (1985). A note on the use of the Hiskey-Nebraska Test of Learning Aptitude with deaf children. Language, Speech, and Hearing Services in Schools, 16, 53–57.
Watson, D. (1979). Guidelines for the psychological and vocational assessment of deaf rehabilitation clients. Journal of Rehabilitation of the Deaf, 13, 27–57.
Watson, D. (1989). Strangers' ratings of the five robust personality factors: Evidence of a surprising convergence with self-report. Journal of Personality and Social Psychology, 57, 120–128.
Watson, R. I. (1966). The role and use of history in the psychology curriculum. Journal of the History of the Behavioral Sciences, 2, 64–69.
Watts, K., Baddeley, A., & Williams, M. (1982). Automated tailored testing using Raven's matrices and the Mill Hill vocabulary test: A comparison with manual administration. International Journal of Man-Machine Studies, 17, 331–344.
Webb, J. T., Miller, M. L., & Fowler, R. D., Jr. (1970). Extending professional time: A computerized MMPI interpretation service. Journal of Clinical Psychology, 26, 210–214.
Webb, S. C. (1955). Scaling of attitudes by the method of equal-appearing intervals: A review. Journal of Social Psychology, 42, 215–239.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore, MD: Williams & Wilkins.
Wechsler, D. (1941). The measurement of adult intelligence (2nd ed.). Baltimore, MD: Williams & Wilkins.
Wechsler, D. (1958). The measurement and appraisal of adult intelligence. Baltimore, MD: Williams & Wilkins.
Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children–Revised. New York: Psychological Corporation.
Wechsler, D. (1975). Intelligence defined and undefined: A relativistic approach. American Psychologist, 30, 135–139.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale–Revised. New York: Psychological Corporation.
Wechsler, D. (1984). WISC-RM: escala de inteligencia para nivel escolar Wechsler. Mexico, DF: El Manual Moderno.
Wechsler, D. (1987). Wechsler Memory Scale–Revised manual. New York: The Psychological Corporation.
Wechsler, D. (1989). Wechsler Preschool and Primary Scale of Intelligence–Revised (WPPSI-R). San Antonio, TX: Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children–Third Edition: Manual. New York: The Psychological Corporation.
Weckowicz, T. E., Muir, W., & Cropley, A. J. (1967). A factor analysis of the Beck Inventory of Depression. Journal of Consulting Psychology, 31, 23–28.
Weinberger, M., Hiner, S. L., & Tierney, W. M. (1987). In support of hassles as a measure of stress in predicting health outcomes. Journal of Behavioral Medicine, 10, 19–31.
Weiner, B. (1980). Human motivation. New York: Holt, Rinehart and Winston.
Weiner, I. (1977). Approaches to Rorschach validation. In M. A. Rickers-Ovsiankina (Ed.), Rorschach psychology. Huntington, NY: Robert E. Krieger.
Weiner, I. B. (1994). The Rorschach Inkblot Method (RIM) is not a test: Implications for theory and practice. Journal of Personality Assessment, 62, 498–504.
Weiss, D. J. (1985). Adaptive testing by computer. Journal of Consulting and Clinical Psychology, 53, 774–789.
Weiss, D. J., & Davison, M. L. (1981). Test theory and methods. Annual Review of Psychology, 32, 629–658.
Weiss, D. J., & Dawis, R. V. (1960). An objective validation of factual interview data. Journal of Applied Psychology, 44, 381–385.
Weiss, R. L., & Margolin, G. (1977). Marital conflict and accord. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook for behavioral assessment. New York: Wiley.
Weiss, R. S. (1973). Loneliness: The experience of emotional and social isolation. Cambridge, MA: MIT Press.
Weissman, M. M., Sholomskas, D., Pottenger, M., Prusoff, B. A., & Locke, B. Z. (1977). Assessing depressive symptoms in five psychiatric populations: A validation study. American Journal of Epidemiology, 106, 203–214.
Weitz, J. (1950). Verbal and pictorial questionnaires in market research. Journal of Applied Psychology, 34, 363–366.
Weitz, J., & Nuckols, R. C. (1953). A validation study of "How Supervise?" Journal of Applied Psychology, 37, 7–8.
Welch, G., Hall, A., & Norring, C. (1990). The factor structure of the Eating Disorder Inventory in a patient setting. International Journal of Eating Disorders, 9, 79–85.
Welch, G., Hall, A., & Walkey, F. (1988). The factor structure of the Eating Disorders Inventory. Journal of Clinical Psychology, 44, 51–56.
Welch, G., Hall, A., & Walkey, F. (1990). The replicable dimensions of the Beck Depression Inventory. Journal of Clinical Psychology, 46, 817–827.
Wellman, B. L., Skeels, H. M., & Skodak, M. (1940). Review of McNemar's critical examination of Iowa studies. Psychological Bulletin, 37, 93–111.
Welsh, G. S. (1948). An extension of Hathaway's MMPI profile coding system. Journal of Consulting Psychology, 12, 343–344.
Welsh, G. S. (1956). Factor dimensions A & R. In G. S. Welsh & W. G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 264–281). Minneapolis, MN: University of Minnesota Press.
Welsh, G. S. (1966). Comparison of the D-48, Terman CMT, and Art Scale scores of gifted adolescents. Journal of Consulting Psychology, 30, 88.
Werner, E. E., Bierman, J. M., & French, F. E. (1971). The children of Kauai: A longitudinal study from the prenatal period. Honolulu, HI: University of Hawaii Press.
Werner, P. D. (1993). A Q-sort measure of beliefs about abortion in college students. Educational and Psychological Measurement, 53, 513–521.
Werner, S. H., Jones, J. W., & Steffy, B. D. (1989). The relationship between intelligence, honesty, and theft admissions. Educational and Psychological Measurement, 49, 921–927.
Wertheimer, M. (1958). Principles of perceptual organization. In D. C. Beardslee & M. Wertheimer (Eds.), Readings in perception. New York: Van Nostrand.
Wesman, A. G. (1968). Intelligent testing. American Psychologist, 23, 267–274.
Wesman, A. G. (1971). Writing the test item. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Westman, J. C. (1990). Handbook of learning disabilities: A multisystem approach. Boston, MA: Allyn & Bacon.
Wetzler, S. (1989a). Parameters of psychological assessment. In S. Wetzler & M. M. Katz (Eds.), Contemporary approaches to psychological assessment (pp. 3–15). New York: Brunner/Mazel.
Wetzler, S. (1989b). Self-report tests: The patient's vantage. In S. Wetzler & M. M. Katz (Eds.), Contemporary approaches to psychological assessment (pp. 98–117). New York: Brunner/Mazel.
Wetzler, S., Kahn, R., Strauman, T. J., & Dubro, A. (1989). Diagnosis of major depression by self-report. Journal of Personality Assessment, 53, 22–30.
Wetzler, S., & Marlowe, D. B. (1993). The diagnosis and assessment of depression, mania, and psychosis by self-report. Journal of Personality Assessment, 60, 1–31.
Wharton, Y. L. (1977). List of hypotheses advanced to explain the SAT score decline. New York: College Entrance Examination Board.
Whipple, G. M. (1910). Manual of mental and physical tests. Baltimore, MD: Warwick and York.
White, D. M., Clements, C. B., & Fowler, R. D. (1986). A comparison of computer administration with standard administration of the MMPI. Computers in Human Behavior, 1, 153–162.
White, D. R., & Jacobs, E. (1979). The prediction of first-grade reading achievement from WPPSI scores of preschool children. Psychology in the Schools, 16, 189–192.
White, K., Sheehan, P. W., & Ashton, R. (1977). Imagery assessment: A survey of self-report measures. Journal of Mental Imagery, 1, 145–170.
White, K. O. (1978). Testing the handicapped for employment purposes: Adaptations for persons with motor handicaps (PS 784). Washington, DC: Personnel Research and Development Center, U.S. Civil Service Commission.
Whitehorn, J. C., & Betz, B. J. (1960). Further studies of the doctor as a crucial variable in the outcome of treatment with schizophrenics. American Journal of Psychiatry, 117, 215–223.
Whitney, D. R., Malizio, A. G., & Patience, W. M. (1986). Reliability and validity of the GED tests. Educational and Psychological Measurement, 46, 689–698.
Whitworth, R. H. (1984). Bender Visual Motor Gestalt Test. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (Vol. I, pp. 90–98). Kansas City, MO: Test Corporation of America.
Whitworth, R. H. (1987). The Halstead-Reitan Neuropsychological battery and allied procedures. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques compendium (pp. 196–205). Kansas City, MO: Test Corporation of America.
Whitworth, R. H., & Barrientos, G. A. (1990). Comparison of Hispanic and Anglo Graduate Record Examination scores and academic performance. Journal of Psychoeducational Assessment, 8, 128–132.
Wicker, A. W. (1969). Attitudes versus actions: The relationship of verbal and overt behavioral responses to attitude objects. Journal of Social Issues, 25, 41–78.
Wider, A. (1948). The Cornell Medical Index. New York: Psychological Corporation.
Widiger, T. A., & Frances, A. (1987). Interviews and inventories for the measurement of personality disorders. Clinical Psychology Review, 7, 49–75.
Widiger, T. A., & Kelso, K. (1983). Psychodiagnosis of Axis II. Clinical Psychology Review, 3, 491–510.
Widiger, T. A., & Sanderson, C. (1987). The convergent and discriminant validity of the MCMI as a measure of the DSM-III personality disorders. Journal of Personality Assessment, 51, 228–242.
Widiger, T. A., Williams, J. B., Spitzer, R. L., & Frances, A. (1985). The MCMI as a measure of DSM-III. Journal of Personality Assessment, 49, 366–378.
Widiger, T. A., Williams, J. B., Spitzer, R. L., & Frances, A. (1986). The MCMI and DSM-III: A brief rejoinder to Millon (1985). Journal of Personality Assessment, 50, 198–204.
Wiederman, M. W., & Allgeier, E. R. (1993). The measurement of sexual-esteem: Investigation of Snell and Papini's (1989) Sexuality Scale. Journal of Research in Personality, 27, 88–102.
Wiechmann, G. H., & Wiechmann, L. A. (1973). Multiple factor analysis: An approach to attitude validation. Journal of Experimental Education, 41, 74–84.
Wiener, D. N. (1948). Subtle and obvious keys for the Minnesota Multiphasic Personality Inventory. Journal of Consulting Psychology, 12, 164–170.
Wierzbicki, M., & Daleiden, E. L. (1993). The differential responding of college students to subtle and obvious MCMI subscales. Journal of Clinical Psychology, 49, 204–208.
Wiggins, G. (1990). The case for authentic assessment. ERIC clearinghouse on tests, measurement, and evaluation. Washington, DC: American Institutes for Research.
Wiggins, J. S. (1968). Personality structure. Annual Review of Psychology, 19, 293–350.
Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.
Wiggins, J. S. (1982). Circumplex models of interpersonal behavior in clinical psychology. In P. C. Kendall & J. N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 183–221). New York: Wiley.
Wiggins, J. S. (1989). Review of the Personality Research Form (3rd ed.). In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 633–634). Lincoln, NE: University of Nebraska Press.
Wiggins, J. S., & Pincus, A. L. (1992). Personality: Structure and assessment. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual review of psychology (Vol. 43, pp. 473–504). Palo Alto, CA: Annual Reviews.
Wikoff, R. L. (1979). The WISC-R as a predictor of achievement. Psychology in the Schools, 16, 364–366.
Wilkening, G. N., Golden, C. J., MacInnes, W. D., Plaisted, J. R., & Hermann, B. P. (1981, August). The Luria-Nebraska Neuropsychological Battery Children's revision: A preliminary report. Paper presented at the meeting of the American Psychological Association, Los Angeles, CA.
Williams, J. B. W., Gibbon, M., First, M. B., Spitzer, R. L., Davies, M., Borus, J., Howes, M. J., Kane, J., Pope, H. G., Rounsaville, B., & Wittchen, H. (1992). The Structured Clinical Interview for DSM-III-R (SCID): Multiple test-retest reliability. Archives of General Psychiatry, 42, 630–636.
Williams, R. L. (1970). Black pride, academic relevance, and individual achievement. The Counseling Psychologist, 2, 18–22.
Williams, R. L. (1971). Abuses and misuses in testing black children. The Counseling Psychologist, 2, 62–73.
Williams, R. L. (1972). Abuses and misuses in testing black children. In R. L. Jones (Ed.), Black psychology. New York: Harper & Row.
Williams, R. L. (1974). Scientific racism and IQ: The silent mugging of the black community. Psychology Today, 7, 32ff.
Williams, R. L. (1975). The BITCH-100: A culture-specific test. Journal of Afro-American Issues, 3, 103–116.
Willingham, W. W. (1989). Standard testing conditions and standard score meaning for handicapped examinees. Applied Measurement in Education, 2, 97–103.
Willingham, W. W., Ragosta, M., Bennett, R. E., Braun, H., Rock, D. A., & Powers, D. E. (1988). Testing handicapped people. Boston, MA: Allyn & Bacon.
Wilson, E. L. (1980). The use of psychological tests in diagnosing the vocational potential of visually handicapped persons who enter supportive and unskilled occupations. In B. Bolton & D. W. Cook (Eds.), Rehabilitation client assessment (pp. 65–77). Baltimore, MD: University Park Press.
Wilson, F. R., Genco, K. T., & Yager, G. G. (1985). Assessing the equivalence of paper-and-pencil vs. computerized tests: Demonstration of a promising methodology. Computers in Human Behavior, 1, 265–275.
Wilson, G. D. (1973). The psychology of conservatism. New York: Academic Press.
Wilson, G. D., & Patterson, J. R. (1968). A new measure of conservatism. British Journal of Social and Clinical Psychology, 7, 264–269.
Wilson, K. M. (1974). The contribution of measures of aptitude (SAT) and achievement (CEEB Achievement Average), respectively, in forecasting college grades in several liberal arts colleges. ETS Research Bulletin, No. 7436.
Wilson, R. C., Christensen, P. R., Merrifield, P. R., & Guilford, J. P. (1960). Alternate uses: Manual of administration, scoring, and interpretation. Beverly Hills, CA: Sheridan Supply Co.
Wilson, R. S. (1975). Twins: Patterns of cognitive development as measured on the Wechsler Preschool and Primary Scale of Intelligence. Developmental Psychology, 11, 126–134.
Wilson, S. L. (1991). Microcomputer-based psychological assessment: An advance in helping severely physically disabled people. In P. L. Dann, S. H. Irvine, & J. M. Collis (Eds.), Advances in computer-based human assessment (pp. 171–187). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Wilson, S. L., Thompson, J. A., & Wylie, G. (1982). Automated psychological testing for the severely physically handicapped. International Journal of Man-Machine Studies, 17, 291–296.
Wing, J. K. (Ed.). (1966). Early childhood autism. Oxford: Pergamon Press.
Winter, W. D., Ferreira, A. J., & Olson, J. L. (1966). Hostility themes in the family TAT. Journal of Projective Techniques and Personality Assessment, 30, 270–274.
Winterling, D., Crook, T., Salama, M., & Gabert, J. (1986). A self-rating scale for assessing memory loss. In A. Bes, J. Cahn, S. Hoyer, J. P. Marc-Vergnes, & H. M. Wisniewski (Eds.), Senile dementias: Early detection (pp. 482–486). London: John Libbey Eurotext.
Winters, K. C., & Henley, G. A. (1989). Personal Experience Inventory (PEI) test and manual. Los Angeles, CA: Western Psychological Services.
Wirt, R. D., Lachar, D., Klinedinst, J. K., & Seat, P. D. (1984). Multidimensional description of child personality: A manual for the Personality Inventory for Children (Revised 1984 by D. Lachar). Los Angeles, CA: Western Psychological Services.
Wirt, R. D., Lachar, D., Klinedinst, J. K., & Seat, P. D. (1990). Personality Inventory for Children, 1990 edition. Los Angeles, CA: Western Psychological Services.
Wirtz, W., & Howe, W. (1977). On further examination: Report of the advisory panel on the Scholastic Aptitude Test score decline. New York: College Entrance Examination Board.
Wise, S. L., & Wise, L. A. (1987). Comparison of computer-administered and paper-administered achievement tests with elementary school children. Computers in Human Behavior, 3, 15–20.
Wisniewski, J. J., & Naglieri, J. A. (1989). Validity of the Draw A Person: A quantitative scoring system with the WISC-R. Journal of Psychoeducational Assessment, 7, 346–351.
Witt, J. C., Heffer, R. W., & Pfeiffer, J. (1990). Structured rating scales: A review of self-report and informant rating processes, procedures, and issues. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children (pp. 364–394). New York: Guilford Press.
Wodrich, D. L., & Kush, S. A. (1990). Children's psychological testing (2nd ed.). Baltimore, MD: Paul H. Brookes.
Wolf, T. H. (1969a). The emergence of Binet's conception and measurement of intelligence: A case history of the creative process. Journal of the History of the Behavioral Sciences, 5, 113–134.
Wolf, T. H. (1969b). The emergence of Binet's conception and measurement of intelligence: A case history of the creative process. Part II. Journal of the History of the Behavioral Sciences, 5, 207–237.
Wolf, T. H. (1973). Alfred Binet. Chicago, IL: University of Chicago Press.
Wolf, T. M., Elston, R. C., & Kissling, G. E. (1989). Relationship of hassles, uplifts, and life events to psychological well-being of freshman medical students. Behavioral Medicine, 15, 37–45.
Wolk, R. L. (1972). Refined projective techniques with the aged. In D. P. Kent, R. Kastenbaum, & S. Sherwood (Eds.), Research planning and action for the elderly (pp. 218–244). New York: Behavioral Publications.
Wolk, R. L., & Wolk, R. B. (1971). Manual: Gerontological Apperception Test. New York: Human Sciences Press.
Wolk, S., & Zieziula, F. R. (1985). Reliability of the 1973 Edition of the SAT-HI over time: Implications for assessing minority students. American Annals of the Deaf, 130, 285–290.
Wollersheim, J. P. (1970). Effectiveness of group therapy based upon learning principles in the treatment of overweight women. Journal of Abnormal Psychology, 76, 462–474.
Wolman, B. B. (Ed.). (1985). Handbook of intelligence. New York: Wiley.
Wolpe, J. (1973). The practice of behavior therapy (2nd ed.). New York: Pergamon.
Wolpe, J., & Lang, P. J. (1964). A fear survey schedule for use in behavior therapy. Behavior Research and Therapy, 2, 27–30.
Wolpe, J., & Lang, P. J. (1977). Manual for the Fear Survey Schedule (Rev.). San Diego, CA: Educational and Industrial Testing Service.
Wong, Y. I. (2000). Measurement properties of the Center for Epidemiologic Studies Depression Scale in a homeless population. Psychological Assessment, 12, 69–76.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3–10.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1997). The reliability of the Comprehensive System for the Rorschach: A comment on Meyer (1997). Psychological Assessment, 9, 490–494.
Woodworth, R. S. (1920). Personal Data Sheet. Chicago, IL: Stoelting.
Woon, T., Masuda, M., Wagner, N. N., & Holmes, T. H. (1971). The Social Readjustment Rating Scale: A cross-cultural study of Malaysians and Americans. Journal of Cross-Cultural Psychology, 2, 373–386.
Worchel, F. F., Aaron, L. L., & Yates, D. F. (1990). Gender bias on the Thematic Apperception Test. Journal of Personality Assessment, 55, 593–602.
Worthen, B. R., Borg, W. R., & White, K. R. (1993). Measurement and evaluation in the schools: A practical guide. White Plains, NY: Longman.
Wright, B. D., & Stone, M. H. (1985). Review of the British Ability Scales. In J. V. Mitchell, Jr. (Ed.),
The ninth mental measurements yearbook (Vol. 1, pp. 232–235). Lincoln, NE: University of Nebraska Press.
Wright, D., & DeMers, S. T. (1982). Comparison of the relationship between two measures of visual-motor coordination and academic achievement. Psychology in the Schools, 19, 473–477.
Wright, J. H., & Hicks, J. M. (1966). Construction and validation of a Thurstone scale of Liberalism-Conservatism. Journal of Applied Psychology, 50, 9–12.
Wylie, R. C. (1974). The self-concept: A review of methodological considerations and measuring instruments (Rev. ed., Vol. 1). Lincoln, NE: University of Nebraska Press.
Wytek, R., Opgenoorth, E., & Presslich, O. (1984). Development of a new shortened version of Raven's Matrices Test for application and rough assessment of present intellectual capacity within psychopathological investigation. Psychopathology, 17, 49–58.
Yachnick, M. (1986). Self-esteem in deaf adolescents. American Annals of the Deaf, 131, 305–310.
Yang, J., McCrae, R. R., Costa, P. T., Jr., Dai, X., Yao, S., Cai, T., & Gao, B. (1999). Cross-cultural personality assessment in psychiatric populations: The NEO-PI-R in the People's Republic of China. Psychological Assessment, 11, 359–368.
Yerkes, R. M., & Foster, J. C. (1923). A point scale for measuring mental ability. Baltimore, MD: Warwick and York.
Yin, P., & Fan, X. (2000). Assessing the reliability of Beck Depression Inventory scores: Reliability generalization across studies. Educational and Psychological Measurement, 60, 201–223.
Ying, Y. (1988). Depressive symptomatology among Chinese-Americans as measured by the CES-D. Journal of Clinical Psychology, 44, 739–746.
Yudin, L. W. (1966). An abbreviated form of the WISC for use with emotionally disturbed children. Journal of Consulting Psychology, 30, 272–275.
Zajonc, R. B., & Bargh, J. (1980). Birth order, family size, and decline of SAT scores. American Psychologist, 35, 662–668.
Zakahi, W. R., & Duran, R. L. (1982). All the lonely people: The relationship among loneliness, communicative competence, and communication anxiety. Communication Quarterly, 30, 203–209.
Zanna, M. P., Olson, J. M., & Fazio, R. H. (1980). Attitude-behavior consistency: An individual difference perspective. Journal of Personality and Social Psychology, 38, 432–440.
Zanna, M. P., & Rempel, J. K. (1988). Attitudes: A new look at an old concept. In D. Bar-Tal & A. W. Kruglanski (Eds.), The social psychology of knowledge (pp. 315–334). New York: Cambridge University Press.
Zapf, P. A., & Viljoen, J. L. (2003). Issues and considerations regarding the use of assessment instruments in the evaluation of competency to stand trial. Behavioral Sciences and the Law, 21, 351–367.
Zautra, A. J., & Reich, J. W. (1983). Life events and perceptions of life quality: Developments in a two-factor approach. Journal of Community Psychology, 1, 121–132.
Zea, M. C., & Tyler, F. B. (1994). Illusions of control: A factor-analytic study of locus of control in Colombian students. Genetic, Social and General Psychology Monographs, 120, 201–224.
Zebb, B. J., & Meyers, L. S. (1993). Reliability and validity of the Revised California Psychological Inventory's Vector 1 scale. Educational and Psychological Measurement, 53, 271–280.
Zehrbach, R. R. (1975). Comprehensive Identification Process. Bensenville, IL: Scholastic Testing Service.
Zeiss, A. M. (1980). Aversiveness vs. change in the assessment of life stress. Journal of Psychosomatic Research, 24, 15–19.
Zeleznik, C., Hojat, M., & Veloski, J. J. (1987). Predictive validity of the MCAT as a function of undergraduate institution. Journal of Medical Education, 62, 163–169.
Zelinski, E. M., Gilewski, M. J., & Thompson, L. W. (1980). Do laboratory tests relate to self-assessment of memory ability in the young and old? In L. W. Poon, J. L. Fozard, L. S. Cermak, D. Arenberg, & L. W. Thompson (Eds.), New directions in memory and aging (pp. 519–544). Hillsdale, NJ: Erlbaum.
Zenderland, L. (1997). The Bell Curve and the shape of history. Journal of the History of the Behavioral Sciences, 33, 135–139.
Zerbe, W. J., & Paulhus, D. L. (1987). Socially desirable responding in organizational behavior: A reconception. Academy of Management Review, 12, 250–264.
Zielinski, J. J. (1993). A comparison of the Wechsler Memory Scale-Revised and the Memory Assessment Scales: Administrative, clinical, and interpretive issues. Professional Psychology, 24, 353–359.
Zieziula, F. R. (Ed.). (1982). Assessment of hearing-impaired people. Washington, DC: Gallaudet College.
Zigler, E., Balla, D., & Hodapp, R. (1984). On the definition and classification of mental retardation. American Journal of Mental Deficiency, 89, 215–230.
Zigler, E., & Muenchow, S. (1992). Head Start: The inside story of America's most successful educational experiment. New York: Basic Books.
Zilboorg, G. (1941). A history of medical psychology. New York: Norton.
Zimiles, H. (1996). Rethinking the validity of psychological assessment. American Psychologist, 51, 980–981.
Zimmerman, I. L., & Woo-Sam, J. (1972). Research with the Wechsler Scale for Children: 1960–1970 (Special Monograph Supplement). Psychology in the Schools, 9, 232–271.
Zimmerman, I. L., Woo-Sam, J., & Glasser, A. J. (1973). The clinical interpretation of the Wechsler Adult Intelligence Scale. Orlando, FL: Grune & Stratton.
Zimmerman, M. (1983). Methodological issues in the assessment of life events: A review of issues and research. Clinical Psychology Review, 3, 339–370.
Zimmerman, M. (1983). Weighted versus unweighted life event scores: Is there a difference? Journal of Human Stress, 9, 30–35.
Zimmerman, M. (1986). The stability of the revised Beck Depression Inventory in college students: Relationship with life events. Cognitive Therapy and Research, 10, 37–43.
Zingale, S. A., & Smith, M. D. (1978). WISC-R patterns for learning disabled children at three SES levels. Psychology in the Schools, 15, 199–204.
Ziskin, J., & Faust, D. (1988). Coping with psychiatric and psychological testimony (Vols. 1–3, 4th ed.). Marina Del Rey, CA: Law and Psychology Press.
Zuckerman, M. (1985). Review of Sixteen Personality Factor Questionnaire. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. II, pp. 1392–1394). Lincoln, NE: University of Nebraska Press.
Zuckerman, M., & Lubin, B. (1965). Manual for the Multiple Affect Adjective Check List. San Diego, CA: Educational & Industrial Testing Service.
Zung, W. W. K. (1965). A Self-rating Depression Scale. Archives of General Psychiatry, 12, 63–70.
Zung, W. W. K. (1969). A cross-cultural survey of symptoms in depression. American Journal of Psychiatry, 126, 116–121.
Zwiebel, A., & Mertens, D. M. (1985). A comparison of intellectual structure in deaf and hearing children. American Annals of the Deaf, 130, 27–31.
Zytowski, D. G. (1981). Counseling with the Kuder Occupational Interest Survey. Chicago, IL: Science Research Associates.
Zytowski, D. G. (1985). Kuder DP manual supplement. Chicago, IL: Science Research Associates.
Zytowski, D. G., & Kuder, F. (1986). Advances in the Kuder Occupational Interest Survey. In W. B. Walsh & S. H. Osipow (Eds.), Advances in vocational psychology (Vol. 1, pp. 31–53). Hillsdale, NJ: Erlbaum.
Zytowski, D. G., & Laing, J. (1978). Validity of other-gender-normed scales on the Kuder Occupational Interest Survey. Journal of Counseling Psychology, 3, 205–209.
Zytowski, D. G., & Warman, R. E. (1982). The changing use of tests in counseling. Measurement and Evaluation in Guidance, 15, 147–152.
Test Index
Binet-Simon Scale
(1972), 101
(1986), 102
Black Intelligence Test of Cultural Homogeneity (BITCH-100), 292–293
College Characteristics Index, 505
Colorado Childhood Temperament Inventory, 230–231
Geriatric Scale of Recent Life Events, 265
Gerontological Apperception Test, 270, 398
Gesell Developmental Schedules, 228
Graduate Record Examination (GRE), 342
Hayes-Binet, 525
House-Tree-Person, 320
How Supervise?, 382
Kuhlmann-Binet, 525
MacArthur Competence Assessment Tool (MacCAT), 420
Marital Satisfaction Questionnaire for Older Persons (MSQFOP), 263
McCarney Attention Deficit Disorders Evaluation Scale, 237–238
McCarthy Scales of Children's Abilities, 254
Metropolitan Achievement Tests, 330
Miller Analogies Test, 347
Personality Inventory for Children (PIC), 231
Personality Research Form (PRF), 78
Pictorial Test of Intelligence, 238
Purdue Hand Precision Test, 357
Semantic Differential (SemD), 134
Sexuality Scale (SS), 204
Short-Form General Health Survey (SF-36), 415
Thematic Apperception Test, 398
Time urgency, 374
Index of Acronyms
ACL – Adjective Check List
ACL Cr – Domino Creativity Scale on the ACL
AFQT – Armed Forces Qualification Test
ARSMA – Acculturation Rating Scale for Mexican Americans
ASES
ASVAB – Armed Services Vocational Aptitude Battery
AUT – Alternate Uses Test
BAI – Beck Anxiety Inventory
BAS – British Ability Scales
BCSHS – Brief College Students' Hassles Scale
BDI – Beck Depression Inventory
BITCH-100 – Black Intelligence Test of Cultural Homogeneity
BP – Boredom Proneness Scale
C – Conservatism Scale
CABS – Children's Assertive Behavior Scale
CBCL – Child Behavior Checklist
CES – Classroom Environment Scale
CES-D – Center for Epidemiologic Studies-Depression
CIA – Children's Inventory of Anger
CPI – California Psychological Inventory
CVS – Chinese Value Survey
DAP – Draw-A-Person
DAS – Death Anxiety Scale
DAS – Differential Ability Scales
DAT – Dental Admission Testing Program
DCBC – Daily Child Behavior Checklist
ECL – Environmental Check List
EPPS – Edwards Personal Preference Schedule
EPQ – Environmental Preference Questionnaire
F – F (fascist) scale
FAD – Family Assessment Device (McMaster)
FAST – Family System Test
FQ – Fear Questionnaire
GATB – General Aptitude Test Battery
GED – General Educational Development (Tests)
GRE – Graduate Record Examination
HIT – Holtzman Inkblot Technique
HOME – Home Observation for Measurement of the Environment
ICEQ – Individualized Classroom Environment Questionnaire
IDQ – Individual Differences Questionnaire
I-E – Internal-External (locus of control) scale
IPB – Inventory of Psychosocial Balance
IPI – Inwald Personality Inventory
JCI – Job Components Inventory
K-ABC – Kaufman Assessment Battery for Children
KGIS – Kuder General Interest Survey
KOIS – Kuder Occupational Interest Survey
KVPR – Kuder Vocational Preference Record
LAQ – Legal Attitudes Questionnaire
LPAD – Learning Potential Assessment Device
LSQ – Life Situation Questionnaire
MAC-S – Memory Assessment Clinics Self-Rating Scale
MAST – Michigan Alcoholism Screening Test
MBTI – Myers-Briggs Type Indicator
MCAT – Medical College Admission Test
MCMI – Millon Clinical Multiaxial Inventory
MMPI – Minnesota Multiphasic Personality Inventory
MMPI-2 – Minnesota Multiphasic Personality Inventory-Revised
MRMT – Minnesota Rate of Manipulation Test
MSE – Mental Status Exam
MSQFOP – Marital Satisfaction Questionnaire for Older Persons
NAEP – National Assessment of Educational Progress
PES
P-F
PIC – Personality Inventory for Children
PM – Progressive Matrices (Raven)
PPVT-R – Peabody Picture Vocabulary Test-Revised
PQ – McGill Pain Questionnaire
PRF – Personality Research Form
PSI
PTM
RDS
RVS – Rokeach Value Survey
SADS – Schedule for Affective Disorders and Schizophrenia
SAT – Scholastic Aptitude Test
SCAT III – School and College Ability Tests, Series III
SCI
SCID – Structured Clinical Interview for DSM-III-R
SCII – Strong-Campbell Interest Inventory
SCL-90R – Symptom Checklist-90-R
SDS
SemD – Semantic Differential
SEQ
SF-36 – Short-Form General Health Survey
SHS
SII – Strong Interest Inventory
SIT
SL-ASIA – Suinn-Lew Asian Self-Identity Acculturation Scale
SOI-LA – Structure of Intellect Learning Abilities Test
SOMPA – System of Multicultural Pluralistic Assessment
SoV – Study of Values
SPQ
SPT
SS – Sexuality Scale
SSHI
STAI – State-Trait Anxiety Inventory
STT
SVIB – Strong Vocational Interest Blank
TAI – Test Anxiety Inventory
TASC – Test Anxiety Scale for Children
TEMAS – Tell-Me-A-Story
TSCS – Tennessee Self-Concept Scale
TTCT – Torrance Tests of Creative Thinking
ULS – UCLA Loneliness Scale
VVIQ – Vividness of Visual Imagery Questionnaire
VVQ – Verbalizer-Visualizer Questionnaire
WAIS – Wechsler Adult Intelligence Scale
WAIS-R – Wechsler Adult Intelligence Scale-Revised
WISC – Wechsler Intelligence Scale for Children
WISC-III – Wechsler Intelligence Scale for Children-III
WISC-R – Wechsler Intelligence Scale for Children-Revised
WISPI – Wisconsin Personality Disorders Inventory
WMS – Wechsler Memory Scale
WMS-R – Wechsler Memory Scale-Revised
WO
WPPSI – Wechsler Preschool and Primary Scale of Intelligence
WPPSI-R – Wechsler Preschool and Primary Scale of Intelligence-Revised
WPT – Wonderlic Personnel Test
16PF – Sixteen Personality Factor Questionnaire
Subject Index
Anger, 492
Children's Inventory, 492
Angoff method, 354
Anxiety
about aging, 261
state-trait, 187
test, 456
APA Committee on test bias, 276
APA ethics code, 9
APA Task Force on integrity tests, 386
Aptitude by treatment interaction, 120
Aptitudes, 303
Armed Services Vocational Aptitude Battery (ASVAB), 376
Army Alpha, 525–526
Army Beta, 525–526
Assertiveness, 496
Assessment
clinical, 2
direct, 22
performance, 22
vs. testing, 2
Assessment centers, 371
Attention Deficit Hyperactivity Disorder, 237
teacher's rating scale, 237
Attitude scale construction
Check Lists, 137, 138
Equal appearing intervals (Thurstone), 129
Scalogram analysis (Guttman), 133, 205,
260
Semantic Differential, 134
Social distance (Bogardus), 133
steps in development, 140
Summated ratings (Likert), 131
Writing items, 140, 141
Attitudes, 127
defined, 128
measurement, 129
theoretical components
towards elderly, 260
towards hearing impaired, 315
See also specific topics, tests
Attraction-selection-attrition model, 363
Authentic-measurement, 22
Authoritarian personality, 138, 421, 530
Back translation, 220, 284
Banding, 373
Bandwidth-fidelity, 30, 57, 487
Barnum effect, 470
Basal level, 103
Base rate, 63, 129, 195
Base rate scores, 182
Battery, 2, 162, 163
Bayley Scales of Infant Development, 228
Beck Depression Inventory (BDI), 189, 451
Bedford Alzheimer Nursing Scale, 267
Behavior Problem Checklist, 252
Behavior rating scales, 251, 320
Behavioral assessment, 69, 478, 483, 484
Bell Curve, The (Herrnstein and Murray), 531
Bender Visual Motor Gestalt Test, 404
Benton Visual Retention Test, 267
Berkeley Growth Study, 527
Bias, 272
intercept bias, 277
on SAT, 337
slope bias, 277
Binet, Alfred, 519
Binet-Simon Scale
(1905), 93, 100
(1908), 101
(1911), 101
(1916), 101
(1937), 101
(1960), 101
(1972), 101
(1986), 102
Biodata (biographical data), 365, 453
Black Intelligence Test of Cultural Homogeneity
(BITCH-100), 292–293
Blacky Pictures Test, 398
Boehm Test of Basic Concepts, 241
Bogardus method, 133
Bogardus Social Distance Scale, 133
Boredom Proneness Scale (BP), 87
Bracken Basic Concept Scale, 241
Branching, 474
Brazelton Behavioral Assessment Scale, 230–231
Brief College Students' Hassles Scale (BCSHS),
218
Brigance Inventory of Early Development, 327
British Ability Scales (BAS), 116
Bruininks-Oseretsky Test of Motor Proficiency, 238
Subject Index
Cain-Levine Social Competency Scale, 230–231
California Achievement Tests, 328
California F scale, 421
California Psychological Inventory (CPI), 39, 80
California Q set, 513
Candidate Profile Record, 365–366
Career Assessment Inventory, 159
Career Decision Scale, 159
Carrow Elicited Language Inventory, 241
Categories of tests, 5
Cattell 16 Personality Factors (16PF), 72
Cattell Culture Fair Intelligence Test, 287
CBTI Guidelines, 476
Ceiling level, 103
Center for Epidemiologic Studies-Depression
(CES-D), 193
Cerebral palsy, 238
Certification, 7, 352
Check lists, 487, 490
attitudes, 137, 138
reliability, 491
validity, 491
Child Behavior Checklist (CBCL), 246, 252
Children's Adaptive Behavior Scale, 234
Children's Apperception Test, 398
Children's Assertive Behavior Scale (CABS), 498
Children's Inventory of Anger (CIA), 492
Chinese Tangrams, 211
Chinese Value Survey (CVS), 365
Civil Rights Act, 306, 417, 422, 425
Clark-Madison Test of Oral Language, 241
Classification, 2, 7
Classroom Environment Scale (CES), 504
Clinical assessment, 162
Clinical intuition, 38
Clinical vs. statistical approaches, 38, 406
Closed response option, 141
Coaching, 340, 351
Coefficient alpha, 47
Coefficient kappa, 48
Coefficient of reproducibility, 134
Cognition, 92
abilities, 308
hearing impaired, 315
Cognitive Capacity Screening Examination, 267
Cognitive styles, 428
Cohen's kappa coefficient, 48, 490
Cohesion and control, 508
College Characteristics Index, 505
College Entrance Examination Board, 334
Colorado Childhood Temperament Inventory,
230–231
Columbia Mental Maturity Scale, 238
Combining test scores, 38
Competence to stand trial, 419
Competency Screening Test, 420
Competitiveness, 215
index, 216
Completion items, 20
Comprehensive Identification Process, 224
Computer, 460
anxiety, 466
configural scoring, 462
disabled, 479
Halstead-Reitan and, 473
test administration, 462
test scoring, 461
See also specic tests
Computer-based test interpretations (CBTI), 467
Computer testing, 461
GRE, 341
Concept Assessment Kit, 95
Concurrent validity, 54
Confidentiality, 10
Configural interpretation, 173, 175
Configural scoring, 462
Conners Rating Scales, 253
Conservatism Scale (C scale), 138
Consistency score, 77
Construct validity, 54
methods, 55
Consumer satisfaction studies, 469
Content analysis, 17
MMPI, 173, 175
Content validity, 53, 71
Contrasted groups, 55, 56
Convergent and discriminant validity, 56
Convergent thinking, 203
Coping, 265
Coronary-prone behavior, 411
Correction for attenuation, 49, 57
Correlation coefficient
Q sets, 514
rank order, 146
Countdown method, 474
Couples, assessment, 509
Creativity, 205
biodata, 370
hearing impaired, 320
Creativity index, 211
Criterion contamination, 54
Criterion keying, 23, 70, 150, 152
Criterion reference, 7, 37
Criterion validity, 54, 358
Cronbachs alpha, 47
Cross-cultural testing, 204, 272, 413
racial distance quotient, 133
racial norming, 281, 305
ratings, 363
response styles, 435
See also Minorities; specific groups
Cross-validation, 24, 57
Crystallized abilities, 102
intelligence, 287
Culture fair tests, 282
hearing impaired, 317
Cutoff score, 36, 39, 353
multiple, 39
Daily Child Behavior Checklist (DCBC), 493
Daubert standard, 422
Death and dying, 265
Death anxiety, 219
scale (DAS), 219
Death Images Scale, 265
Decentering, 284
Decision theory, 59, 181
bias, 277
Luria-Nebraska, 249
tests, 4
validity, 59
D48 test, 290
tactual form, 310
D index, 136
Delta scores, 30
Denman Neuropsychology Memory Scale, 267
Dental Admission Testing Program (DAT), 352
Denver Developmental Screening Test, 224, 252
Depression, 269, 495
Derived scores, 17, 25
percentiles, 25
Detroit Tests of Learning Aptitude, 241
Developmental Indicators for the Assessment of
Learning, 224
Developmental Test of Visual-Motor Integration, 250
Deviation IQ
Diagnostic and Statistical Manual (DSM), 161
Differential Ability Scales (DAS), 116, 117
Differential validity, 57
Difficulty of test items, 28
Direct assessment, 22
Direct magnitude estimation, 412
Disabilities
categories, 297
physical-motor disabilities, 321
Discriminant analysis, 40
examples, 169, 435, 436
Discriminant validity, 56
Distance-cluster analysis, 136
Distractors, 19
Divergent thinking, 206
Domino Creativity Scale on the ACL (ACL Cr),
210
Dot Estimation task, 376
Downey Will Temperament Test, 527
Drawing techniques, 250, 402
Draw-A-Man Test, 403
Draw-A-Person (DAP), 403
hearing impaired, 315
Dutch cognitive battery, 310
Dyadic Adjustment Scale, 508
Eating disorder, 408
inventory, 408
Ebel procedure, 354
Education for All Handicapped Children Act, 223
Educational Testing Service (ETS), 12, 28, 301
Edumetric, 37–38
Edwards Personal Preference Schedule (EPPS), 76,
450
Effective therapists, 406
Efficiency. See Predictive value
Elaboration, 207
Elderly, 257
Empiricism, 33, 311
Employee Polygraph Protection Act (1988), 387
Employment testing, 356
Environment
classroom, 503
college, 505
home, 502
physical, 505
Environmental Check List (ECL), 512
Environmental Personality Inventory, 512
Environmental Preference Questionnaire (EPQ), 506
Epistemology, 94
Equal appearing intervals, 129
Equal Employment Opportunity Act, 422
Equivalence, 466
Erikson, E., 84
Error variance, 251, 326
Estimated learning potential
Ethics
issues, 476
standards, 9
Etic and emic, 284
Examiner error, 109
Exner Comprehensive System, 397
Expectancy table, 35, 59, 373
Experimenter variables, 3
Expressed interests, 146
Extreme groups, 33
F (fascist) scale, 138, 421
Face validity, 56
Factor analysis, 17, 18, 24, 56
Factor weights, 87
Factorial invariance, 165
Factorial validity, 73, 209
Faking, 154, 427
impression management, 429
self-deceptive enhancement, 430, 431
False negative, 60
False positive, 60
Family Adaptability and Cohesion Evaluation Scales,
507
Family Apperception Test, 509
Family functioning, 506
Family systems, 506
test (FAST), 509
Fear Questionnaire (FQ), 501
Fear Survey Schedule, 499
Fear thermometer, 499
Federal Rules of Evidence (FRE), 422
Feedback, 11
Fiat, 22
Filler items, 22, 434
Five factor model, 88
job performance, 363–364
Flexibility, 207
Fluency, 207
Fluid analytic abilities, 102
Fluid intelligence, 287
Focal assessment, 194
Folk concepts, 80
Forced choice items, 20, 434
Forced distribution method, 362
Forensic psychology, 419
Frostig Developmental Test of Visual Perception, 311
Fullard Toddler Temperament Scale, 230–231
g (general) factor, 96, 288
occupational criteria, 359
Geist Picture Interest Inventory, 158–159, 302
Gender differences
fear, 501
SAT, 336
TAT, 401
General Aptitude Test Battery (GATB), 305, 380
General Educational Development (GED) Tests, 332
General Health Survey, 415
Generalizability theory, 47, 64, 165166, 463, 489
Georgia Court Competency Test, 420
revised, 420
Geriatric Scale of Recent Life Events, 265
Gerontological Apperception Test, 270, 398
Gesell Developmental Schedules, 228, 527
Gibb Test of Testwiseness, 458
Gifted children, 245
GPA
criterion, 341, 347
GRE, 344
medical school, 349
restriction of range, 345
SAT, 341
Grade norms, 35
Graduate Record Examination (GRE), 342
Grassi Basic Cognition Evaluation, 245, 246
Griggs v. Duke Power, 423
Group vs. individual tests, 5
Guessing, 31
correction, 31
Guidelines for Computer-based Tests and
Interpretations, 476
Guttman scale, 133, 205, 260
Hall, G. S., 525
Halo, 251, 361
Halstead-Reitan
computer and, 473
Neuropsychological Battery, 390, 455
Neuropsychological Test Battery for Children, 247
Hardiness, 410
test, 410
Hassles and uplifts, 414
Hassles scale, 218
scale, 414
Hayes-Binet, 525
Health belief model, 411
Health psychology, 409
Health status, 414
Healy-Fernald tests, 525
Hearing impairment, 242, 312
Hierarchical theories, 97
Hiskey-Nebraska Tests of Learning Aptitude, 243
Hispanics. See Latinos
History, 517
American applied orientation, 522
British idiographic approach, 520
French clinical tradition, 518
German nomothetic approach, 519
Holland Self-Directed Search, 158
Holland's theory, 149, 153
Holtzman Inkblot Technique (HIT), 397
Home Observation for Measurement of the
Environment Inventory (HOME), 502
Homogeneity, 45
groups, 57
items, 45
Hope, 216
House-Tree-Person, 320
How Supervise?, 382
Hysteria, 518
Illinois Tests of Psycholinguistic Abilities, 241
Imagery, 213
Impact of Life Event Scale, 411
Impression management, 508
Impressionistic interpretation, 7, 514
In-basket technique, 372
Index of discrimination, 32
Individual differences, 46
Questionnaire (IDQ), 214
Individualized Classroom Environment
Questionnaire (ICEQ), 504
Infant intelligence, 227
Infant Temperament Questionnaire, 230–231
Influences on tests, 3
experimenter variables, 3
method of administration, 3
situational variables, 3
subject variables, 3–4
Information about tests, 11
Informed consent, 1
Insanity, 420
Institute of Personality Assessment and Research
(IPAR), 24, 209, 210
Instrumental values, 143
Integrity tests, 384, 452
Intelligence
academic achievement, 97
academic vs. practical, 97
age scale vs. point scale, 98
anthropological model, 95
biological model, 94
cognitive theory, 95
computational model, 94
developmental periods, 94
epistemological model, 94
fluid, 287
geographic model, 94
global-multiple, 93
hierarchical theories, 97
infant, 227
job performance, 97
metaphors, 94
multiple intelligences, 95
nature-nurture, 278
Piaget's stages, 94
quotient, 99, 531
structure of intellect model, 95
See also specific topics, tests
Intelligent testing, 98
Intercept bias, 277
Interest inventories
disadvantaged, 158
Kuder inventories, 157
nonprofessional occupations, 159
Strong Interest Inventory, 148
Interests, 149
expressed, 157
inventoried, 157
Interitem consistency, 46
Internal consistency, 33, 45, 56
International Classification of Diseases, 161
Interquartile range, 130
Interrater reliability, 490
Interscorer reliability, 212
Interval scales, 8
Interview, 485
Intimacy-permissiveness scale, 205
Intrascorer reliability, 212
Inventory of Psychosocial Balance (IPB), 84
Inventory of Test Anxiety, 457
Inwald Personality Inventory (IPI), 378
Iowa Tests of Basic Skills, 330
Ipsative measurement, 7
Edwards Personal Preference Schedule, 76
Q sets, 513
Study of Values, 142
IQ. See Intelligence, quotient
Item
analysis, 17
banking, 474
categories, 18
difficulty, 28
discrimination, 31
homogeneity, 45
sampling, 44
selection, 99
structure, 6
types, 19
writing, 18, 141
Item keying
biodata, 366, 453
Item response distribution, 156
Item response theory (IRT), 34
Jenkins Activity Survey, 411
Job Components Inventory (JCI), 379
Job performance
intelligence, 97
personality, 363
prediction, 359
Job success, 358
Jungian theory, 75
Kappa coefcient, 48, 490
Kaufman Assessment Battery for Children (K-ABC),
118
Keyed response, 18–19
Kuder General Interest Survey (KGIS), 157
Kuder Occupational Interest Survey (KOIS),
157
Kuder-Richardson Formula, 47
Kuder Vocational Preference Record (KVPR), 157
Kuhlmann-Binet, 525
Lake Wobegon effect, 330
Lambda scores, 158
Latinos
GRE performance, 294, 347, 348
SAT, 294, 337
Learning disabilities, 237
Learning Environment Inventory, 505
Learning Potential Assessment Device (LPAD), 95
Legal Attitudes Questionnaire (LAQ), 421, 422
Legal cases
Albemarle Paper Co. v. Moody, 423
Connecticut v. Teal, 423
Daubert v. Merrell Dow Pharmaceuticals, 423
Debra P. v. Turlington, 424
DeFunis v. Odegaard, 423
Diana v. State Board of Education, 425
Gaines v. Monsanto, 425
Griggs v. Duke Power Company, 423
Guadalupe v. Tempe Elementary District, 425
Larry P. v. Riles, 425
Myart v. Motorola, 423
Parents in Action on Special Education v. Hannon,
425–426
Sharif v. NY State Education Department, 426
Target Stores, 425
Watson v. Fort Worth Bank and Trust, 423
Legal issues, 422, 477
Leisure Activities Blank, 512
Leiter International Performance Scale, 246
Liberalism-conservatism scale, 131
Licensure and certication, 352
Lie scales, 438, 450
Life events, 411
Life Experience Survey, 411
Life satisfaction, 261
Life Situation Questionnaire (LSQ), 322
Likert method, 131
Locator tests, 329
Locke-Wallace Marital Adjustment Test, 508
Locus of control, 202
scale, 202, 203
London House Personnel Selection Inventory, 385
Loneliness, 218
Luria-Nebraska Children's Neuropsychological Test
Battery, 247
MacArthur Competence Assessment Tool (MacCAT),
420
Marital satisfaction, 263
Inventory, 510
Marital quality, 508
Questionnaire for Older Persons (MSQFOP), 263
Marlowe-Crowne Social Desirability Scale, 446
Maslach Burnout Inventory, 434
Matching items, 20
Mattis Dementia Rating Scale, 266
Mayo clinic, 460
MacAndrew Alcoholism Scale of the MMPI, 408
McCarney Attention Deficit Disorders Evaluation Scale, 237–238
McCarthy Scales of Children's Abilities, 254
McClenaghan and Gallahue checklist, 238
McGill Pain Questionnaire (PQ), 416
McMaster Family Assessment Device (FAD), 508
Medical College Admission Test (MCAT), 348
Memory, 267
Memory Assessment Clinics Self-Rating Scale
(MAC-S), 269
Mental age, 99
Mental deficiency, 234, 524
Mental Measurements Yearbook, 5, 11–12
Mental Status Exam, 162, 163, 266
Meta-analysis, 57
Metropolitan Achievement Tests, 330
Mexican-Americans. See Latinos
Michigan Alcoholism Screening Test (MAST), 408
Military, 376
Armed Services Vocational Aptitude Battery, 376
Army Selection and Classification Project, 376
Dot estimation task, 376
Navy Basic Test Battery, 376
visual spatial abilities, 377
Miller Hope Scale, 216
Mill Hill Vocabulary Scale, 288
Millon Adolescent Personality Inventory, 184
Millon Behavioral Health Inventory, 184
Millon Clinical Multiaxial Inventory (MCMI), 179,
451, 455
Mini-Mental State Examination, 266
Minimum competency testing, 304
Minnesota Multiphasic Personality Inventory
(MMPI), 170
Minnesota Multiphasic Personality
Inventory-Revised (MMPI-2), 170
Minnesota Preschool Inventory, 224
Minnesota Rate of Manipulation Test (MRMT), 308
Minorities
cognitive abilities, 360
MCAT scores, 351
MMPI, 295
norms on GATB, 382
police, 378
See also Cross-cultural testing; specic groups, tests
Moderator variables, 433
Modied testing, 300, 308
hearing impaired, 314
Morale, 264
scale, 264
Motor impairments, 238, 321
Multiaxial approach, 161
Multiple choice items, 18–19
distractors, 19
keyed responses, 18–19
response options, 18, 21–22
vs. essay, 334
Multiple factors, 96
Multiple regression, 39
Multitrait-multimethod matrix, 55, 56, 360
Multivariate tests, 18
Muscular dystrophy, 238
Myers-Briggs Type Indicator (MBTI), 74, 211
National Assessment of Educational Progress
(NAEP), 333
Nativism, 311
Naturalistic observation, 485
Nature/nurture, 93, 528
Navy Basic Test Battery, 376
Nedelsky method, 354
NEO Personality Inventory-Revised (NEO-PI-R),
89
Neurobehavioral Cognitive Status Examination, 266
Neuropsychological assessment, 390
children, 245, 246
computerized, 472
elderly, 266
faking
Halstead-Reitan Neuropsychological battery, 390
NOIR system, 8
interval scales, 8
nominal scales, 8
ordinal scales, 8
ratio scales, 9
Nominal scales, 8
Nomothetic approach, 64, 519
Norm reference, 7
Norms, 17, 34, 304
age scale, 34
convenience samples, 34
local, 37
random sampling, 34
school grade, 35
selection, 34
special, 308
stratied samples, 34
Northwestern Syntax Screening Test, 241
Objective-subjective continuum, 21
Observer reports, 129
Occupational choice, 375
Occupational testing, 356
O'Connor Tweezer Dexterity Test, 357
Option keying, 368
biodata, 453
Ordinal scales, 8
Originality, 207
Ostomy Adjustment Scale, 322
Otis-Lennon School Ability Test (OLSAT), 123
Pain, 416
Paired comparison, 362
Paper-and-pencil tests, 6
equivalence with computer forms
Parsons Visual Acuity Test, 245
Pattern matching, 55
Peabody Picture Vocabulary Test-Revised (PPVT-R),
238, 239, 317
Pearson coefficient, 43
Penile Tumescence Measurement (PTM), 494
Percentage of agreement, 397
Percentage overlap, 153
Percentiles, 25
Perceptual-motor assessment, 311
Performance assessment, 22, 258
Performance tests, 6
Perkins-Binet Test of Intelligence for the Blind, 245,
309
Personal Rigidity Scale, 23
Personality, 67
content vs. style, 430
definition, 67
five factor model, 88
Inventory for Children (PIC), 231
Research Form (PRF), 78
stylistic scales, 444
theoretical models, 68
traits, 69
types, 67, 70
Personality test construction
content validity, 71
contrasted groups, 70
criterion keying, 70
deductive method, 70
external method, 70
fiat method, 71
inductive method, 70
rational method, 71
stylistic scales, 431
theoretical approach, 70
Personnel selection, 357
Philadelphia Geriatric Center Morale Scale, 264
Phobias, 499
Physical-motor disabilities, 321
Piaget theory, 94
Pictorial Inventory of Careers, 302
Pictorial Test of Intelligence, 238
Picture interest inventories, 302
Piers-Harris Self-Concept Scale, 226
Pilot testing, 16–17
Pintner-Paterson Performance Scale, 525
Pleasant Event Schedule (PES), 495
Police, 377
Polygraph, 420
Positional response bias, 435, 436
Postwar period, 526
Power tests, 6
Practical intelligence, 97
Predictive value, 58, 60
Predictors
college achievement, 339
Preemployment testing, 356
Preschool Screening System, 224
Preschool testing, 325
approaches, 326
Pretest, 17
Primary factors, 310
Primary validity, 65, 200, 201
Principal components analysis, 112
Privacy, 11
Profile stability, 78
Program evaluation, 23, 501
Project A, 376
Projective techniques, 69, 326, 392
family assessment, 509
sentence completion tests, 401
Proprietary tests, 5
Psychological Screening Inventory (PSI), 166
Psychophysical methods, 132
Public Law 88-352, 306
Public Law 94-142, 223, 307, 424
Public Law 99-457, 223
Purdue Hand Precision Test, 357
Q index of interquartile range, 130
Q index of overlap, 156
Q methodology, 513
Q sets, 513
California, 514
self-concept, 515
stages of Kübler-Ross, 515
Q sort, 513
Racial distance quotient, 133
Racial norming, 281, 305
Random responding, 431
Randt Memory Test, 267
Rank-order correlation coefcient, 146
Rapport, 25, 258
Rater reliability, 48
Rathus Assertiveness Schedule, 497
Rating errors, 361
bias, 361
central tendency, 362
corrections for, 361
halo, 361
leniency, 362
Rating scales, 139
behavioral anchors, 362
cultural differences, 363
graphic, 139
self vs. others, 361. See Self-rating scales
supervisors, 360
Ratio IQ, 99
Ratio scales, 9
Raven's Progressive Matrices (PM), 288
Raw scores, 17
Reactivity, 490
Readability, 246
Reading-Free Vocational Interest Inventory-Revised,
302
Recentering, 335
Receptive and Expressive Emergent Language Scale,
241
Referral Decision Scale (RDS), 420
Regression, 39
biodata, 370
CPI, 440, 443
equation for creativity, 211
generalizability, 350
SAT, 337
Rehabilitation Act of 1973, 306, 424
Reid Report, 385
Reitan-Indiana Test Battery, 247
Reliability, 42
alternate form, 44
correction for attenuation, 49
Cronbach's alpha, 47
difference scores, 51
generalizability theory, 47
interitem consistency, 46, 47
interobserver, 48
Kuder-Richardson formula, 47
property of, 42
rater, 48
Rulon formula, 46
scorer, 48
Spearman-Brown formula, 45
split-half, 44
standard error of differences, 50
standard error of measurement, 49
test-retest, 43
true vs. error, 42
Repeated administration, 364
Response range, 149
Response sets, 427
Response time, 479
Restriction of range, 150, 345
Retest reliability, 43
Reynell Developmental Language Scales,
241
Roberts Apperception Test for Children,
398
Rokeach Value Survey (RVS), 143
Role playing, 481, 485
Rorschach Inkblot Technique, 394, 526
Rothbart Infant Behavior Questionnaire, 230–231
Rotter Incomplete Sentences Blank, 402
Rotter's Interpersonal Trust Scale, 434
Rulon formula, 46
Sample size, 63
SAT, 334
decline, 339
hearing impaired, 317, 318
Scales for Rating the Behavioral Characteristics of
Superior Students, 245, 246
Scatter, 35, 112
Schedule for Affective Disorders and Schizophrenia
(SADS), 270
Schedule of Recent Experience, 411
Schemas, 314
Schizotypal personality
questionnaire (SPQ), 186
Scholastic Aptitude Test (SAT), 334
School and College Ability Tests III (SCAT III), 122
School testing, 325
Scientific inquiry
Score interpretation
criterion reference, 7
norm reference, 7
Scorer reliability, 48
Scores
combining, 38
delta, 30
derived, 25
raw, 17
sources of error, 47
standard, 26
stanines, 28
T, 28
z, 26
Scottish surveys, 522
Screening tests, 7
Second order factors, 73
Secondary traits, 73
Secondary validity, 65, 191, 200, 201
Security, 5
Selection ratio, 62
Self actualization, 71, 410
Self-anchoring scales, 139
Self concept, 197, 226, 318
hearing impaired, 318
Self-Consciousness Inventory (SCI), 86
Self-Esteem Inventory, 226
Self-Esteem Questionnaire (SEQ), 200
Self monitoring, 481, 485
Self-perception Inventory, 226
Self-Rating Depression Scale (SDS), 194
Self-rating scales
anchor points, 69
attitude measurement, 139
behavioral assessment, 487, 489
memory, 269
personality, 69
Self-report measures, 68, 194, 411, 489
social competence, 331
Self-understanding, 2
Semantic differential, 134, 321, 451
Sensitivity, 58, 60, 100, 490, 499
Sentence completion tests, 401
Sequential strategies, 62
Sesame Street, 502
Sexuality, 204, 494
scale, 204
Short-Form General Health Survey (SF-36),
415
Short forms, 18
Short Portable Mental Status Questionnaire,
266
Sickness Impact Prole, 415
Situational methods, 69
Situational variables, 3
Slope bias, 277
Slosson Intelligence Test (SIT), 124
Snellen chart, 245, 312
Snyder Hope Scale (SHS), 217
Social desirability, 76, 444
Edwards scale, 444
Personality Research Form, 78
scales of, 78, 445, 446
Social emotional behavior, 230
Social Readjustment Rating Scale, 412
Social Sciences Citation Index, 12
Social skills, 496
Social validity, 489
Sociometric procedures, 129, 510
Spearman Brown formula, 45
Specificity, 60, 100
Specimen set, 5
Speech impairments, 240
Speed of Thinking Test (STT), 125
Speed tests, 5
Spielberger Test Anxiety Inventory, 456
Spina bifida, 238
Spinal cord injury, 322
Spiral omnibus, 22
Split-half reliability, 44
Standard age scores, 103
Standard deviation, 26
Standard error of differences, 50
Standard error of estimate, 59
Standard error of measurement, 49
Standard scores, 26
Standardization, 1, 17
Standards for Educational and Psychological Testing,
10, 306
Stanford Achievement Tests, 330
Special Edition, 317
Stanford-Binet, 101
special forms, 105
Stanines, 28
Stanton Survey, 385
State-Trait Anxiety Inventory (STAI), 187
Stems, 19, 20
Sternberg Triarchic Abilities Test, 96
Stoma, 322
Stress, 411
Strong-Campbell Interest Inventory (SCII), 148,
149
Strong Interest Inventory (SII), 149
Strong Vocational Interest Blank (SVIB),
148–149
Structure of Intellect
Learning Abilities Test (SOI-LA), 120
model, 95, 120, 208, 530
Structured Clinical Interview for DSM-III (SCID),
163, 164
Study of Values (SoV), 142
Stylistic scales, 433
Subject variables, 34
Success criteria, 347, 348
Suinn-Lew Asian Self-Identity Acculturation Scale
(SL-ASIA), 286
Summated ratings, 131
Supervisory Practices Test (SPT), 379
Suppressor variables, 431
Symptom Checklist 90R (SCL-90R), 163,
164
Symptom validity testing, 435, 436
Synthetic validity, 374
System of Multicultural Pluralistic Assessment
(SOMPA), 291
Table of specications, 16
Tactual forms, 310
Taxonomy, 53
Taylor & Russell tables, 64
Taylor Manifest Anxiety Scale, 456
Teacher rating scales, 330
childhood behavior problems, 330
Tell-me-a-story (TEMAS), 293
Tennessee Self Concept Scale (TSCS), 197
Terman, Lewis M., 524
Terminal values, 143
Tertiary validity, 65, 202
Test
administration, 3, 25
affected by, 3
assessment, 2
battery, 2, 162, 163
bias, 272
categories, 5
commercially published, 5
construction, 15
definition, 1
equivalence with computer forms, 327
experimental procedure, 3
group, 5
individual, 5
information about, 11
invasiveness, 6
levels, 11
maximal vs. typical performance, 8
power, 5
predicted behavior, 5
proprietary, 5
security, 6, 11
short forms, 18
sophistication, 336, 457
specimen set, 5
speed, 5
users, 2
See also specic topics, tests
Test anxiety, 456
questionnaire, 456
scale, 456
scale for children, 457
Test functions, 7
certication, 7
classification, 2, 7
diagnosis, 7
placement, 7
prediction, 7
screening, 7
selection, 7
Testing and Selection Order, 356–357
Test purposes, 2
classification, 2
program evaluation, 2
scientific inquiry
self-understanding, 2
Test-retest reliability, 43
Testing the limits, 247
Tests in Microfiche, 12
Tests of General Educational Development (GED),
332
Testwiseness, 457. See Test sophistication
Thematic Apperception Test, 398
Thurstone method, 129, 527
Time urgency, 374
Token Test, 241
Torrance Test of Creative Thinking (TTCT), 206
True-false items, 19
Truth-in-testing, 424
T scores, 28
uniform, 173
Turnover, 375
Two-factor theory, 96
Type A behavior, 411
UCLA Loneliness Scale (ULS), 219
Uniform Guidelines on Employee Selection
Procedures, 356–357
Unitary weights, 87, 155
Unpleasant Events Schedule, 496
U.S. Civil Service Commission, 305
U.S. Department of Labor, 305
U.S. Office of Personnel Management, 305
Utility vs. validity, 338
Validity, 52
applications and, 65
coefficients, 58
concurrent, 54
construct, 54
content, 5
convergent and discriminant validity, 56
criterion, 54
differential, 57, 351
external criteria, 275
face, 56
factorial, 73, 209
generalization and, 57, 64, 342, 350
individual, 65
internal criteria, 275
multitrait-multimethod matrix, 55, 56
predictive, 54
primary, 65, 201
secondary, 65, 191, 201
social, 489
synthetic, 374
tertiary, 65, 202
utility and, 338
Validity scales, 69, 437, 438
CPI, 438
MMPI, 437, 438, 450
MMPI-2, 440, 443
Value Survey (Rokeach) (RVS), 143
Values, 141, 365
terminal vs. instrumental, 143
Variability, 46
Variables
experimenter variables, 3
moderator, 433
situational, 2
subject, 5
suppressor, 433
See also specic tests
Variance, 43
Vector scales (on CPI), 81–82
Verbalizer-Visualizer Questionnaire (VVQ),
215
Verification score (on Kuder), 157
Vignettes, 21
Vineland Adaptive Behavior Scale, 234, 320
Vineland Social Maturity Scale, 234
Visual acuity, 245, 312
Visual impairment, 244, 307
Vividness of Visual Imagery Questionnaire (VVIQ),
214
Vocational Interest Inventory, 158
Vocational Interest Survey, 158
Vocational placement, 298
Vocational rehabilitation, 297
Voir dire, 421
Vulpe Assessment Battery, 238
Washington University Sentence Completion Test,
402
Water Rating Scale, 505
Watson-Glaser Critical Thinking Appraisal,
357
Wechsler intelligence tests, 529
Adult Intelligence Scale-Revised (WAIS-R),
105
Adult Intelligence Scale (WAIS),
105
Deterioration Quotient, 108
hearing impaired, 314, 316
Intelligence Scale for Children (WISC), 110
Intelligence Scale for Children III (WISC-III), 110
Intelligence Scale for Children-Revised (WISC-R),
110
pattern analysis, 108, 112
Preschool and Primary Scale of Intelligence
(WPPSI), 114
Preschool and Primary Scale of
Intelligence-Revised (WPPSI-R), 114,
115
Wechsler-Bellevue Intelligence Scale, 105
Wechsler Memory Scale (WMS), 268
Wechsler Memory Scale-Revised (WMS-R), 268
Weighting
differential, 38, 87
unit, 38, 155
Wide Range Interest Opinion Test, 158–159,
302
Wisconsin Personality Disorders Inventory (WISPI),
185
Wonderlic Personnel Test (WPT), 383
Woodworth Personal Data Sheet, 526
Work Orientation scale of the CPI (WO), 383
Work samples, 373
Worry-Emotionality Questionnaire, 456
Yerkes Point Scale, 525
z scores, 26