
BASIC AND APPLIED SOCIAL PSYCHOLOGY

2017, VOL. 39, NO. 1, 1–18


http://dx.doi.org/10.1080/01973533.2016.1256288

Scale Imposition as Quantitative Alchemy: Studies on the Transitivity of Neuroticism Ratings

Stefanie Dorough Morrisᵃ, James W. Griceᵇ, and Ryan A. Coxᶜ
ᵃAve Maria University; ᵇOklahoma State University; ᶜUniversity of Oklahoma

ABSTRACT
It is common practice in psychology to devise “measurement” procedures by imposing rating scales
(e.g., Likert items) onto phenomena and treating the values they produce as quantities. The validity
of these procedures goes untested. Validity checks are instead performed on sets of these
measurement procedures (i.e., multi-item scales). We present results from three studies suggesting
that people cannot be assumed to preserve transitivity when comparing themselves and others on
NEO Neuroticism-domain trait items. As transitivity is one of the fundamental axioms of quantitative
measurement, these studies challenge the validity of Neuroticism scales at the level of individual
scale items.

CONTACT Stefanie Dorough Morris stefanie.dorough@avemaria.edu Psychology, Ave Maria University, 5050 Ave Maria Boulevard, Ave Maria, FL 34142-9505.
© 2017 Taylor & Francis

Introduction

Pfleiderer (2014), a finance professor at Stanford University’s Graduate School of Business, recently wrote an article in which he criticizes what he calls chameleon models in finance and economics. Chameleons are theoretical models that are presented as models of reality despite being based on unexamined and/or inaccurate assumptions. Pfleiderer begins the article with a joke:

    An engineer, a physicist and an economist are stranded on a deserted island with nothing to eat. A crate containing many cans of soup washes ashore and the three ponder how to open the cans.
    Engineer: Let’s climb that tree and drop the cans on the rocks.
    Physicist: Let’s heat each can over our campfire until the increase in internal pressure causes it to open.
    Economist: Let’s assume we have a can opener. (p. 1)

We are psychologists, not economists, and do not presume here to speak to the accuracy or urgency of Pfleiderer’s specific concerns. We repeat this joke because it communicates, with an efficiency that is often reserved to jokes, an overwhelmingly ignored criticism that has been leveled against modern “quantitative” psychology for the better part of a century: namely, that somewhere in our efforts to produce and utilize measures of psychological concepts, we have either failed to understand measurement or (worse?) understood it and failed to do it. Rather, we have made a series of unwarranted and, often, unnecessary assumptions about various so-called measurement procedures. The result is that many of the models produced and advanced by psychologists bear a dubious relationship to phenomena and processes in the real world. This is a problem that deserves attention and one for which psychologists should be held accountable. Society relies upon a division of labor. If psychology is an important enterprise (and we believe it is), then the people who are ostensibly pursuing its stated aims should actually be doing so. Bringing the magnifying glass a bit closer to the studies at hand, to the extent that psychological science depends upon the accurate measurement of psychological phenomena, the healthy development of the entire field is threatened by disregard for measurement theory. What, after all, could be more prohibitive to the actual advancement of science than merely apparent advances?

Is psychometrics a pathological science?

Michell (2008a) wrote that

    pathology of science occurs when the normal processes of scientific investigation break down and a hypothesis is accepted as true within the mainstream of a discipline without a serious attempt being made to test it and without any recognition that this is happening. (p. 7)

He diagnosed the subfield of psychometrics as pathological. Here is why: The stated goal of psychometrics is the quantitative measurement of
psychological phenomena, the first step of which is the demonstration of the quantitative structure of these target phenomena. The mainstream has accepted, in deed if not in thought, that psychological attributes are quantitative. Psychometricians have failed to address this unquestioning acceptance effectively; the cries of the few among them who seem to recognize that the problem even exists seem to have fallen on deaf ears. Barrett (2008), baffled by the lack of attention given to Michell’s substantial body of work on this topic (e.g., Michell, 1997, 2000, 2008a, 2008b), urged us to consider the practical consequences of the breakdown of serious dialogue between psychometricians and mainstream psychologists: “While psychometricians have advanced their thinking and technical sophistication in leaps and bounds over the past 40 years or so, the practical consequences have been almost nonexistent except in the domain of educational testing and various examination scenarios” (p. 79). Satisfied that the matter of quantitivity has been settled, tied up neatly with a bow, and passed down to us in the form of scale construction techniques, mainstream psychologists have “moved on.” Meanwhile, psychometricians who have persisted in their work seem to mainstream psychologists to be fighting yesterday’s battle. Distanced in this way from serious considerations of measurement theory, mainstream psychologists have nonetheless continued to engage in what only appears to meet the requirements of a quantitative science.

It is common practice among psychologists to devise (quantitative) “measurement” procedures by imposing rating scales (e.g., Likert-type scales) onto descriptions of various phenomena. People are then expected to use these imposed scales to make judgments regarding the extent to which the phenomena are possessed or expressed by themselves or other people. These judgments are assumed to be quantities with no steps taken to ensure the representational accuracy of the judgments or the scales themselves. These procedures are generally combined with others to form multi-item measurement scales the validity of which is established using a number of statistical procedures (e.g., convergent/discriminant validity, factor analysis, internal consistency) that treat the values of the scale items as quantities. Although scientific endeavors often require assumptions, the assumption that imposed scales produce quantitative values of real phenomena is an unnecessary one, as quantitative structure itself can serve as a clear guide for testing assumptions of quantitivity and quantitative measurement.

What exactly is quantitivity?

Quantitative properties or attributes are variables that meet conditions of ordinal, additive, and continuous structure, where a variable is defined as “anything relative to which objects may vary” (Barrett, 2003, p. 423). The axiomatic conditions of quantitative structure as outlined by Mill (1843/1974) and later by Hölder (1901) are as follows:

    Let X, Y, and Z be any three values of variable Q. Variable Q satisfies the conditions of ordinal structure if and only if:
    1. If X ≥ Y and Y ≥ Z, then X ≥ Z (transitivity; this extends to X < Y < Z);
    2. If X ≥ Y and Y ≥ X, then X = Y (antisymmetry);
    3. Either X ≥ Y or Y ≥ X (strong connexity).
    Let Variable Q be any ordinal variable such that for any of its values X, Y, and Z:
    4. X + (Y + Z) = (X + Y) + Z (associativity);
    5. X + Y = Y + X (commutativity);
    6. X ≥ Y if and only if X + Z ≥ Y + Z (monotonicity);
    7. If X > Y, then there exists a value of Z such that X = Y + Z (solvability);
    8. X + Y > X (positivity);
    9. There exists a natural number n such that nX ≥ Y (where 1X = X and (n + 1)X = nX + X) (the Archimedean or continuity condition).

For real relations of a property possessed in real instances (e.g., real values of an attribute possessed by real people) to be validly represented as ratios or magnitudes of quantities, the property itself must possess quantitative structure. Furthermore, the values produced by an accurate quantitative measurement procedure must, by definition, meet these axiomatic conditions.

This article presents three empirical studies that demonstrate the failure of imposed Likert-type rating scales to meet the most basic assumption that must be met in order to treat them as quantitative measurement procedures. The criticisms and methods outlined here are applicable to any scales that purport to measure psychological attributes through the imposition of “one-size-fits-all” fixed-format scales, but this particular set of studies addresses an untested measurement claim that is made by personality trait theorists, namely, that we can and regularly do validly represent quantitative relations of individuals with respect to dimensions of personality using self- and other-reports along Likert-type scales. Something is rotten in the state of psychometrics, which should be evident from the fact that accepted methods of assessing the validity of our scales as measurement procedures (e.g., convergent and discriminant validity, internal consistency) rely on the assumption that our scales are valid measurement procedures (i.e., that individual item ratings can be treated as quantitative values). “The real-world consequences of this systematic aversion to properly considering the presumed status of a psychological variable”—both in reality and as
measured—“is that our journals are now filled with studies that are largely trivial exemplars of mostly inaccurate explanations of phenomena” (Barrett, 2008, pp. 79–80). The studies presented here show how the axioms of quantitative structure can be used to falsify the hypothesis that imposed rating scales produce quantitative values. They also serve as the first step in what we expect to be a time-consuming journey toward procedures that produce accurate representations of psychological realities. Before turning to these studies, we reflect on the events that may have led us to the state we find ourselves in and on forces preventing us from moving forward.

Representational measurement theory

In 1932, the British Association for the Advancement of Science appointed a committee to investigate the claim that psychophysicists (e.g., Fechner, Thurstone, and Stevens) had demonstrated the continuous quantitative structure of human sensations (e.g., loudness, pitch). After 7 years of consideration, the committee released its conclusion that psychophysicists had indeed failed to demonstrate the existence of ratios and requisite units characteristic of quantitative properties in human sensory experiences and had de facto failed to demonstrate their measurability under the classical definition (A. Ferguson et al., 1940). These findings reflect a commitment to classical measurement theory, in which measurement is defined as “the estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute” (e.g., Michell, 1997, p. 358). Six years later, Stevens (1946) published a response to the committee’s findings in which he argued that scientists should be free to model psychological phenomena under a broader definition of measurement as the assignment of numerals to objects according to a rule, provided that the rule results in an accurate representation of an empirical reality. It can be noted here that Stevens was advocating a view of science in which the goal is the acquisition of knowledge about real things, structures, and processes.

Representational measurement theory is grounded in Tarski’s metamathematics, particularly his general theory of models, which holds that measurement is a homomorphism, or structure-preserving map, between finite sets of elements in a particular relational system (Trendler, 2009; see Sher, 1999, for a review of Tarski’s work). A relational system is purely formal, and the elements are abstract symbols. If the elements are applied to empirical objects, the corresponding set of objects is referred to as an empirical relational system or an empirical structure. Elements in an empirical structure may be expressed numerically (e.g., with respect to rank, frequency, quantity of properties possessed by the empirical objects).

Under a representational measurement paradigm, inasmuch as a procedure models empirical relationships, it may be considered measurement. Quantitative measurement, then, is the representation of one among many empirical systems; that is, it is a model of formal relationships among elements with respect to quantity. Many nonquantitative formal relationships holding between sets of entities may also be meaningfully represented by numerical structures. Mathematical operations performed on the values produced by quantitative and nonquantitative measurement procedures can serve as a proxy for in vivo operations performed on empirical systems. Stevens (1951) famously identified four “scales of measurement” (nominal, ordinal, interval, ratio) and outlined statistics that could be permissibly applied to numerals in each empirical system without disrupting their representational accuracy.

There are legitimate reasons to restrict the term “measurement” to refer solely to the estimation of quantities; the first that comes to mind is the misrepresentation of psychometric procedures to the public. Nonetheless, although the stated task of psychometrics proper may be to quantify psychological phenomena, inasmuch as the task of science concerns the acquisition of knowledge of real things and processes, it is not the primary task of psychological scientists to quantify anything. That is, the scientific value of models is contingent upon their representational accuracy, not whether they are quantitative. Redefinition of the term “measurement” aside, a strong case can be made for the legitimacy of a representational psychometrics, the goal of which is the accurate representation of relative relations of individual entities with respect to psychological attributes, including but not limited to the representation of quantitative relations where they are claimed or assumed. As this is consistent with Stevens’s “levels of measurement” and “permissive statistics,” which all students of psychology are still taught at both the undergraduate and graduate levels, one might expect that adopting a representational measurement approach would alleviate many of the concerns raised by Michell. What if Joel Michell and other classical psychometricians were simply to (perhaps begrudgingly) accept the goal of psychometrics as a purely representational task? If we have been following the guidelines of representational measurement passed down to us from Stevens, surely their concerns about present practices would be reduced to an argument over semantics.
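
The idea of measurement as a structure-preserving map, together with the transitivity axiom of ordinal structure, can be made concrete with a short sketch. The Python fragment below is only an illustration under stated assumptions: the objects (rods and items), the observed comparisons, the numeral assignments, and all function names are hypothetical and invented for this example, and none of it is taken from the studies reported in this article. It checks (a) whether a set of observed "at least as much as" comparisons is transitive and (b) whether an assignment of numerals reproduces those comparisons for every pair of objects.

```python
from itertools import product


def is_transitive(objects, at_least):
    """True if a >= b and b >= c always implies a >= c in the observed relation."""
    return all(
        at_least(a, c)
        for a, b, c in product(objects, repeat=3)
        if at_least(a, b) and at_least(b, c)
    )


def is_order_homomorphism(objects, at_least, value):
    """True if the numerals reproduce the observed relation for every pair:
    value[a] >= value[b] exactly when a is observed to be at least b."""
    return all(
        at_least(a, b) == (value[a] >= value[b])
        for a, b in product(objects, repeat=2)
    )


def relation_from_pairs(pairs):
    """Wrap a set of (a, b) pairs, read as 'a is at least b', as a predicate."""
    return lambda a, b: a == b or (a, b) in pairs


# Hypothetical, invented data: three rods compared side by side for length.
rods = ["r1", "r2", "r3"]
longer = relation_from_pairs({("r2", "r1"), ("r3", "r2"), ("r3", "r1")})

faithful = {"r1": 1, "r2": 2, "r3": 3}    # preserves the observed order
unfaithful = {"r1": 2, "r2": 1, "r3": 3}  # swaps r1 and r2

# Hypothetical cyclic judgments: a over b, b over c, yet c over a.
items = ["a", "b", "c"]
cyclic = relation_from_pairs({("a", "b"), ("b", "c"), ("c", "a")})

print(is_transitive(rods, longer))                      # True
print(is_order_homomorphism(rods, longer, faithful))    # True
print(is_order_homomorphism(rods, longer, unfaithful))  # False
print(is_transitive(items, cyclic))                     # False
```

Because the relation ≥ on numbers is itself transitive, no assignment of numerals can faithfully represent the cyclic judgments in the last example; under the representational view, a failure of transitivity in the empirical comparisons rules out even an ordinal representation.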

Psychology’s break with reality

Bizarrely, the adoption of a representational measurement paradigm does nothing to justify prevailing measurement procedures. If anything, the lens of representational measurement theory brings the pathological nature of current psychometric practices more clearly into focus. The four scales of measurement are a standard part of the psychology curriculum. We know, or at least we should know, that it is not permissible to add, subtract, multiply, or divide ordinal values, as to do so is to sever the representational bond between rank and reality. Armed with this knowledge and freed through appeal to the representational paradigm from the pressure to quantify everything, one could reasonably expect to see psychologists employing representational measurement techniques that specify and model a variety of relational systems, taking care to use only procedures that maintain their representational value. Rather, it is now common practice for psychologists to impose structure onto ill-defined phenomena and to assume, in the absence of reason or demonstration, that “measurement” procedures based on the imposed structures produce values that represent the quantitative structure of real variables. The pathology of psychometrics, it seems, goes far beyond the mere assumption that psychological attributes possess quantitative structure! The deepest, darkest error in mainstream psychology with respect to measurement does not lie in the assumption that psychological attributes themselves are quantitative but in the assumption that we can actually produce quantitative variables by simply imposing scales onto psychological phenomena, regardless of the structure of the phenomena or our ability to represent it. This reflects a profound break from reality! If science is concerned with reality, then misrepresenting reality intentionally or through neglect while pretending not to be doing so is to engage in pseudoscience. Other words that come to mind include “magic,” “alchemy,” and “delusion.”

To be clear, it is arguably magical, alchemical, and delusional thinking that permits doctoral-level psychologists to think that we can engage in procedures in which we make little to no effort to ensure the representative validity of our models and somehow still obtain results that are interpretable at the level of reality. That we, nonetheless, regularly engage in this thinking may be due in part to Stevens (1946) himself talking out of both sides of his mouth when, in one and the same article, he both established permissive statistics and suggested that it could sometimes be “fruitful” to treat values as quantitative even though they are not. Another “classic” article on permissive statistics that may have fueled our collective break from reality is Lord’s (1953) “On the Statistical Treatment of Football Numbers” in which he superficially appears to demonstrate that the quantitative analysis of nominal values can yield meaningful results. In this thought experiment, Lord imagines a vending machine that dispenses football numbers. These football numbers, in addition to being nominal data that simply distinguish each football player from the others, also represent desirability at an interval level. The football players to whom the machine is dispensing numbers desire low numbers over high numbers; as Lord sets up the scenario, it would seem that the numbers correspond to discrete units of desirability with equal intervals (e.g., 4 is as equally desirable over 5 as 10 is over 11). Lord demonstrates that one could fruitfully analyze the numbers distributed by the vending machine using quantitative parametric procedures to determine whether the machine is biased (i.e., not distributing numbers randomly); from this, he argues that we should not feel bound to “permissive statistics.” It is distressing that this article reached the light of publication as Lord shifts, in the middle of the thought experiment, from treating the numbers as representing one relational system to representing another. When estimating bias, he is no longer analyzing the numbers qua football numbers but as interval values of desirability. This error has been clearly and soundly addressed by Zand Scholten and Borsboom (2009).

Perhaps this seemingly sanctioned departure from reality on the basis of its potential “fruitfulness” is partly to blame for the present state of mainstream psychological measurement. Maybe psychologists gave in to the temptation to disregard the work of psychometricians because they were simply taking too long. It took the Ferguson committee 7 years to conclude that psychophysicists had failed to establish the additivity of psychophysical phenomena, and psychometricians are still working on ways to address their concerns (e.g., conjoint measurement theory, Rasch model), which they discuss among themselves using language that is increasingly specialized (e.g., Barrett, 2003; Bouyssou & Pirlot, 2002; Falmagne, 1979; Fishburn, 1991; Gonzales, 2000; Krantz, 1964; Luce, 1966; Luce & Narens, 1985; Luce & Steingrimsson, 2011; Luce & Tukey, 1964; Michell, 1988, 1997). Meanwhile, hundreds of “psychometric” procedures bearing dubious representational value have been constructed, marketed, and applied by decreasingly specialized persons using conventions of scale construction that do not take into account the true difficulty of modeling reality. On top of this, the worth of thousands of studies, textbooks, lectures, and conference presentations is predicated on
the validity of these measures. In almost every area of psychology, we see a failure to do the work necessary to demonstrate the representational value of “measurement procedures.” Instead, scale construction tends to rely on indicators of validity (e.g., convergent/discriminant validity, internal consistency) that presume representational validity of scale item values (e.g., Cloninger, Przybeck, & Svrakic, 1991; Cloninger, Svrakic, Przybeck, & Wetzel, 1994; Costa & McCrae, 1992; Digman, 1990; Guilford, 1975; Lee & Ashton, 2004; McCrae & Costa, 2003; Tellegen & Waller, 2008; Watson, Clark, & Tellegen, 1988).

If it is the case that psychologists lost patience with our psychometrician brothers and sisters, preferring fruitfulness to scientific modeling, then what was the “fruit” and what was the rush? According to Michell (2000), the pressure of scientism in the 19th century led psychologists to adopt and advance the rhetoric of (quantitative) measurement in the absence of actual (quantitative) measurement, and continuing pressures have prevented psychologists from acknowledging the resulting methodological shortcomings. One of the pressures he cites is the economic reliance of scientific research on grants from agencies that award financial support based upon the apparent rigor of quantitative methodology. Michell pointed out that the “new rigorism movement” in psychology emerged during the immediate post–World War II decades when government investment in scientific research was on the rise and that our methods have remained largely unchanged since that time.

Psychology historian Danziger (1990) presented a similar narrative of what he referred to as “the triumph of the aggregate” (p. 68). Prior to WWI, almost all of the published results in journals of experimental psychology were concerned with individual subjects; even in cases where responses were averaged across individuals, individual responses were generally provided and interpretations were usually based on these individual patterns of responses. During the period from WWI to WWII, the ratio of studies reporting individual data to aggregate data rapidly declined, particularly in journals geared toward “applied psychology.” Danziger made the case that this shift may be attributed to the higher marketability of aggregate studies relative to studies of individuals as a proposed means to assess and improve social conditions.¹ The increased focus on aggregate analyses may not at first glance appear to complicate the pursuit of valid measurements, as even aggregate analyses presume the valid representation of individuals within a specified system; however, reliance on aggregate procedures obscures fundamental measurement problems by shifting the focus of psychometrics from modeling psychological phenomena to modeling variance.

Both Michell (2000) and Barrett (2008) suggested that, having succeeded for so long in presenting psychology as a quantitative science, psychologists have become increasingly invested in preserving the appearance of measuring psychological phenomena. If this is the case, psychologists are by no means alone in this investment. As with any product, once measurement procedures have been marketed to the public (i.e., a client base has been identified), their creators, marketers, and users can easily find themselves in collusion to defend their worth.

The first step is acknowledging the problem: Three studies

This article presents the results of three studies illustrating a simple technique for determining whether a procedure satisfies the lowest level assumptions of quantitative measurement. More specifically, these studies address a measurement claim that is implicitly made by personality trait theorists, namely, that we can quantify personality traits (i.e., as captured in trait terms/phrases) in ourselves and others.

From the perspective of trait theory, personality is understood as a collection of observable patterns of behaviors (i.e., “traits”) that are attributable to a finite set of underlying structures (also referred to as “traits”), the latter of which are characterized by the production of forces that motivate engagement in category-specific behaviors (e.g., agreeableness, extraversion). Most, if not all, of the personality differences we observe from one individual to another are believed to be produced by different combinations of values along a small number of quantitative trait-dimensions that are applicable to all people. Trait models of personality rely heavily on the use of factor analysis, a statistical procedure that decomposes the shared variance among sets of observed measurement values into common and variable-specific portions and distributes common variance across a smaller set of linear functions called factors (Harman, 1967). As Lamiell (2013) pointed out, the personality trait-dimensions (i.e., in the case of the Big Five, these are Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) extracted using factor analysis do not refer to any attribute, quantitative or otherwise, possessed by individuals, and any conclusion to the contrary is based on a misunderstanding of what factor models actually represent. The error involved in mistaking linear models of variance for real properties or attributes existing at the level of individuals is grievous. Regardless, however, of the ontological
status one ascribes to latent factors, it cannot be denied that the operations (e.g., addition, multiplication, division) used in factor analysis presume that the values entered are valid representations of empirical systems that are knowable and thus subject to empirical validation.

Trait measures constructed in accordance with factor models generally consist of a number of items that load highly onto a single factor and on which people are asked to rate themselves with respect to a disposition or behavior using an imposed rating scale (e.g., a Likert-type rating scale; i.e., the same type of procedure used to obtain the raw data used to produce many factor-based scales). The following self-rating item is taken from the Impulsiveness factor of the NEO-PI-R Neuroticism subscale (Costa & McCrae, 1992), a personality measure that is used for a variety of purposes including personality research and employee selection. The instructions and rating scale are typical of personality measures that have been shown to have good psychometric properties.

    Instructions: Indicate the extent to which you agree with the following statement. Rank yourself in relation to others who are similar to you in age and sex.

    When I am upset I often act without thinking.

    Strongly                 Neither Agree                Strongly
    Disagree     Disagree    nor Disagree      Agree      Agree
       1            2             3              4          5

This kind of scale is so prevalent that it is easy to overlook the immense complexity involved in selecting an accurate response. Individual Likert items are attempted quantitative measurement procedures the validity of which is contingent upon the truth of many assumptions. One key assumption is that people are able to quantify properties possessed by themselves and other people. A person who cannot do this also cannot normalize their own quantitative scores with respect to a distribution of the quantitative scores of similar others (e.g., people similar in sex and age) and then convert that score into quantities of units of agreement. To give these demands more concreteness, here are just a few examples of what must be true if this scale produces quantitative values of agreement:

1. Strongly Agree = 5 × (Strongly Disagree).
2. Neither Agree nor Disagree = 3 × (Strongly Disagree).
3. Strongly Agree − Agree = Agree − Neither Agree nor Disagree = Neither Agree nor Disagree − Disagree = Disagree − Strongly Disagree.
4. The range of values provided must correspond to the actual range of agreement people experience with respect to the trait item.²
5. There exist an infinite number of possible values between each of the given values, but people who complete the scale happen to experience quantities of agreement that are whole numbers.³

Even more must be true if quantity of agreement is a valid measure of quantities of the trait “often acts without thinking when upset,” including the following:

1. Quantity of the trait “often acts without thinking when upset” and quantity of agreement with the phrase “often acts without thinking when upset” must have homomorphic structure.
2. The person completing the scale would have to know exactly what is intended by the “trait” item. A person who frequently gets upset could validly give the same response as someone who rarely gets upset, provided that they both act without thinking when upset.
3. The person completing the scale would have to know the quantity of this “trait” that he or she and similar others possess.
4. The person completing the scales must be willing to give an accurate response.

Finally, if we incorporate the instructions, the person completing the scale would have to be able to do all of the aforementioned things while accurately adjusting for their position in a distribution of an undefined group of people who share their same sex and age.⁴

Given the many abilities required to make a valid quantitative rating on a Likert-type scale and the precision these scales are presented as having in “scientific literature,” it would be reasonable to believe that careful attention had been given to the development of each Likert item. It would also be reasonable to assume that it would be relatively easy, compared to completing a Likert item rating, for people to make mere quantitative judgments with respect to the properties assessed by each scale (i.e., quantitative judgments in units of the property itself rather than normalized scores converted into units of agreement).

The fact that psychologists and other scientists purport to construct quantitative measurement procedures using sets of Likert-type scales without even doing so much as to define the “measured” properties to provide unambiguous measurement targets should scandalize both scientists and the general public. Yet it is the common practice in psychology to begin the construction of measurement procedures by pairing Likert-type rating scales with ambiguously stated traits; untrained raters are then asked to assign scale values to themselves and others, and these values are treated as quantities of something … something that may be correlated with something else of interest that can be revealed through the application of statistical procedures.

An exaggerated understanding of what statistics can do has led to a neglect of the basic measurement tasks that seem to be implied by Likert-type scales like
establishing a measurement target and demonstrating that a rater can “hit” the target. For example, a person who is aiming to quantify the extent to which they or others “worry a lot” needs to know (a) what it means to “worry” and (b) what it means to do it “a lot,” as this is a phrase that can refer to frequency, duration, or intensity. Once the target of a measurement procedure is established, it should be demonstrated that the values produced by the procedure are accurate. Only when the majority of a field believes that statistical procedures can correct for errors that people make in hitting their measurement target, even error that is due to not knowing what the target is, can these tasks appear unnecessary.

The poor state of our understanding of Neuroticism is an excellent example of the consequences of ignoring basic measurement tasks in favor of quantitative alchemy and statistical magic. Neuroticism is one of the most heavily researched personality constructs of the 20th century, yet there is still no clear consensus as to the referent of the term “Neuroticism” that can be identified as a unit applicable at the level of the individual (i.e., the only place where personality traits actually exist). It is associated with a variety of definitions, each so vague that it is difficult, if not impossible, to ascertain what a single researcher means by it in a single instance. It has been identified, for example, as a “relatively enduring disposition to experience negative affect,” “a dimension of psychopathology,” and a “susceptibility to psychological distress.” These definitions contain terms that are highly ambiguous, and depending upon how such terms are disambiguated (i.e., assigned a single interpretation out of the possible interpretations) the construct evoked by the term Neuroticism may be highly consistent or highly inconsistent within or across researchers.

A Neuroticism or Neuroticism-like factor dimension is reliably extracted from varied data sets and is recognizable in some form within almost all theoretically or empirically based models of general personality structure (e.g., Digman, 1990; McCrae & Costa, 2003). The misinterpretation of factor analysis has resulted in this fact being considered as evidence that Neuroticism qua statistical factor is a truly general structure, meaning present in all persons, as opposed to an aggregate structure. In Digman’s (1990) review of the five-factor model of general personality structure, he referred to personality factors from various models representing the presence and effects of negative emotionality using the generic term Dimension IV, a dimension generally referred to as Neuroticism “to line up with the vast work of Eysenck over the years” (p. 422). Other theorists employ the terms “emotional stability,” which often appears as the opposite of neuroticism in some models, or simply “stability” (Guilford, 1975; Goldberg, 1993), “emotionality” (Lee & Ashton, 2004), “negative emotionality/temperament” (Watson & Clark, 1994; Watson & Tellegen, 1985), and “harm avoidance” (Cloninger, 2000), but this is largely considered to reflect arbitrary preference rather than construct divergence because the scales themselves produce highly convergent scores. Table 1 lists the facet scales of the Neuroticism scale of the NEO Personality Inventory–Revised (NEO PI-R), an

Table 1. Facets of negative and/or unstable emotionality related subscales.


NEO PI-R PANAS MPQ HEXACO TCI EPP
Angry hostility Hostility Aggression
Anxiety Anxiety Fear of uncertainty Anxiety
Depression
Self-CSS
Vulnerability (to stress) Stress Reaction
Impulsivity
Irritability/anger
Fear Fear
Scared
Nervous
Jittery
Guilty Guilt
Ashamed
Upset/distressed
Distressed Unhappiness
Alienation
Sentimentality
Dependence Dependency
Fatigability
Anticipatory worry
Shyness
Inferiority
Hypochondriacal
Obsessive
Note. PANAS = Positive Affect Negative Affect Schedule; MPQ = Multidimensional Personality Questionnaire; TCI = Temperament and Character Inventory; EPP = Eysenck Personality Profiler; CSS = consciousness.
assessment of personality dimensions within the five-factor model, the Negative Emotionality scale of the Positive Affect Negative Affect Schedule (Watson et al., 1988), the Negative Affectivity scale of the Multidimensional Personality Questionnaire (Tellegen & Waller, 2008), the Negative Emotionality scale of the HEXACO Personality Inventory (Lee & Ashton, 2004), the Harm Avoidance Scale of the Temperament and Character Inventory (Cloninger et al., 1994), an expansion of the Tridimensional Personality Questionnaire (Cloninger et al., 1991), and the Neuroticism scale of the Eysenck Personality Profiler (Eysenck & Wilson, 1991). There is a readily recognizable theme both within and across these scales, but the lack of overlap between the facets of scales is striking.

Despite the prevalence of studies that report quantitative values of Neuroticism, there is no evidence that researchers have defined or (quantitatively) measured Neuroticism in individual persons (i.e., the only place where Neuroticism as a real, personal attribute could exist). This does not mean that the traits associated with Neuroticism or even Neuroticism itself do not exist at the level of the person. The wisdom in trait theory is that language captures something real in human interactions. The self- and/or other-rating scales currently in use by psychologists are halfhearted attempts to assess concepts the existence of which is so obvious that no failed measurement test could make the slightest dent in people’s belief in them. Our sense of the existential real-ness of Neuroticism, for example, along with a profound misunderstanding of factor analysis, is probably a main cause of our neglect to develop a single definition of Neuroticism.

In this article, we hope to test the basic assumption of existing Neuroticism scales and to initiate a systematic investigation into the nature and structure of Neuroticism in persons. It is difficult to target personality factors or subfacets (in persons) for measurement in the absence of an unambiguous definition. In the studies presented here, we have targeted trait terms/phrases that appear in the Neuroticism-domain Likert items of the NEO-PI-R. Trait terms are a natural point of departure for developing a scientific understanding of the real properties that people perceive in themselves and others. Trait theory is historically based in this Lexical Hypothesis, and trait terms in existing personality measurement procedures continue to reflect this; thus, trait terms taken from an existing Neuroticism measurement procedure serve as a doubly beneficial starting point for both testing assumptions of these existing measures and exploring the nature and structure of Neuroticism.

The validity of a Likert-type scale item rests on the assumption that people are able to quantify personality traits in themselves and other people. The first axiomatic condition of quantitative structure is transitivity, which means that all quantitative values are also ordinal values and that any measurement procedure that produces valid quantitative judgments of properties necessarily produces transitive values. Transitivity states that, for any triad of values X, Y, and Z of variable Q, if X ≥ Y and Y ≥ Z, then X ≥ Z. If, for example, (a) Tom is taller than Mike and (b) Mike is taller than Johnny, then (c) Tom must be taller than Johnny.

For a variable to be quantified, it must have quantitative structure. If it has quantitative structure, then all of the preceding axioms are true and will hold for entities possessing the variable. A representational measurement procedure results in values that maintain the relations between entities with respect to a particular variable. If the values produced by a quantitative measurement procedure are an accurate map of real relations among entities, the values will also follow the axioms of measurement. A valid measurement procedure will repeatedly maintain real relations. You can arbitrarily assign height to people and the numbers will follow the axioms of quantitivity; to utilize these axioms toward developing measurement procedures, the procedures need to be tested repeatedly on entities with the same relation to one another. If the variable in question has not previously been measured (i.e., the goal is not an alternative to an existing measure), making it impossible to select different entities with the same internal relations for measurement from one time to another, the same entities can be used provided that the variable is stable. If the values produced by a particular measurement procedure do not maintain the same relations with respect to a stable variable at different times, the measurement procedure is not a true relational map; this tells you that there is error in the measurement model, but it does not tell you the source of the error. Identifying sources of error in the measurement procedure is a task that takes what Freedman (1991) might call “shoe-leather.” You can decide to tolerate the error regardless of the source,5 you can attempt to remove it using a statistical procedure, or you can see it as an opportunity to engage with reality and learn more about the variable you are trying to measure. Although the first two options arguably have their place in science and practice, the last option has particular merit when the nature and structure of the target variable are ill-defined, as is the case with Neuroticism and Neuroticism-domain traits.

The studies we report here utilize a transitivity paradigm in which participants were asked to make pairwise judgments of the ordinal relations (i.e., greater, less than, equal) of a set of entities with respect to a
variable. The transitivity paradigm has been utilized in a variety of research areas including choice theory (e.g., Ace & Dawis, 1974; E. D. Ferguson, 1962, 1971; Rader, 1963; Tversky, 1969), perception of social structures (e.g., Delia & Crockett, 1973), and linguistics (e.g., Guest, Dell, & Cole, 2000), but personality researchers have not taken advantage of the simple methods implied by transitive structure.

A quantitative measurement procedure de facto produces transitive values. If people are able to quantify traits possessed by themselves and others, it is implied that they are able to make ordinal judgments of themselves and others with respect to the same traits. To test the hypothesis that people are able to make consistent ordinal judgments, we asked a group of participants to make a series of pairwise comparisons of themselves and people they know with respect to trait-items representing the NEO-PI-R Neuroticism subscales (i.e., Anger, Hostility, Depression, Self-consciousness, Impulsiveness, and Vulnerability) and examined their ratings for triplet transitivity; this we call Study 1. In Study 2 we conducted an exact replication with a second sample on a smaller scale. To test whether the proportion of transitivity violations found on this task exceeded that expected from random guessing, Study 2 also includes a comparison of violations made by participants and those produced by randomly generated pairwise rankings. Finally, we conducted a third study using the same paradigm substituting well-defined attributes with known quantitative structure and whole numbers for Neuroticism-domain traits. With respect to the first two studies, the hypothesis tested was not our own but the widespread assumption that self- and other-ratings with respect to personality traits can be treated as quantities. Study 3 was conducted to test the hypotheses that the proportion of triplet transitivity violations would be lower for ratings made along less ambiguous attributes and that the results were not due to participants’ failure to understand the pairwise ratings task.

In the case of variables like personality traits, the transitivity of repeated ordinal judgments has benefits over evaluating the consistency of repeated quantitative estimates. First, although trait theorists assume that people are able to quantify traits, they have not established how people do this. A rating on a Likert item is given in units of agreement with an imposed range and may also be normalized in some way to place the rated person in a distribution of similar others. We are interested here in the first assumption of the Likert item qua quantitative procedure, which is that people can quantify traits in self and others in relation to the traits themselves. The nature of the unit and range of traits are unknown. In short, there is no known quantitative scale we could use. Some might argue that we could simply use the Likert items from the NEO-PI-R as quantitative procedures and that we bear the burden of proof to demonstrate that they are not quantitative procedures. In response to that, consider an analogy to height. Height is a known quantitative variable, and many procedures have been devised to measure height in units of height. Imagine, however, that someone had only a vague concept of height and attempted to measure it using a Likert item in the same format as the items used on the NEO-PI-R to measure traits:

Instructions: Indicate the extent to which you agree with the following statement. Rank yourself in relation to others who are similar to you in age and sex.

I have a lot of height.

Strongly Disagree   Disagree   Neither Agree nor Disagree   Agree   Strongly Agree
        1               2                   3                 4            5

You cannot expect to obtain quantitative estimates, no matter how rough, of height (i.e., a judgment of height in units of height) using this scale. Although it might be possible with extensive clarification of terms to convert height into units of agreement with the statement “I have a lot of height,” it would be impossible to convert those units of agreement back into units of height. In the same way, if a trait is quantitative, unless it happens to be structured exactly like an imposed rating scale, ratings on a scale like this one cannot be treated as quantities of the trait. If that does happen to be the fact, it will not be discovered by beginning the inquiry into the structure of traits by imposing a rating scale. Because we have a clear understanding of the essential nature and structure of height (i.e., distance is an extensive, quantitative property; a person’s height is the distance from the bottom of their body to the top of their body) it is likewise clear that such an imposed scale is inappropriate for (quantitatively) measuring the height of persons. Unfortunately, it is necessary to overcome many unjustified beliefs that have been drilled into our heads regarding “good practice” in psychometrics to see that it is also inappropriate for (quantitatively) measuring poorly understood/defined psychological phenomena like personality traits.

Second, the transitivity paradigm can serve as a starting point for discovering many things regarding the structure and nature of a trait whether or not it happens to have quantitative structure. Transitivity is a more basic measurement task than quantification. Although people cannot make quantitative judgments without being able to maintain order, people can make pairwise judgments
without first making quantitative judgments (see Michell, 2011). People can, for example, make ordinal judgments about the relative hotness/coldness of water in different bowls (e.g., the water in this bowl is hotter/less hot, no discernable difference) without first making separate quantitative estimations of the temperature of the water in each bowl (e.g., the water in this bowl is 83 °F). If a person conducted all possible pairwise ordinal judgments of hotness/coldness as in the transitivity tasks we report here, provided that the temperature of the water in each bowl remains unchanged throughout the task, careful investigations of specific cases of intransitivity could bring us closer to an understanding of the nature and structure of both experienced and objective temperature. We could gather information that would lead to the knowledge that we experience temperature differently in different contexts (e.g., when our body is at various states of rest/arousal, when we make pairwise comparisons with different comparison items), to an understanding of various sense receptors, and could even contribute to the development of an objective definition of temperature as heat energy. Reactions to transitivity violations that would not bring us closer to understanding the nature and structure of experienced or objective temperature would be these: (a) dismissing the importance of intransitivity of “hotter/less hot” ratings as a source of scientific knowledge, and (b) factor analyzing sets of Likert-type scale ratings of the extent to which people agree that water in a given bowl “has a lot of heat,” and “has very little heat,” and “feels cold,” and so on, in comparison to water in similar other bowls to obtain a linear model of variation in trait ratings and treating it as if it were a quantitative model of a property possessed by the water in individual bowls.

In short, if people are able to maintain quantitative relations among a set of entities with respect to a trait, they are able to produce transitive pairwise judgments with respect to the same trait. Therefore, if people fail to make transitive judgments, we know that they have failed to make quantitative judgments. Furthermore, we can begin to form testable hypotheses regarding why people fail to make transitive judgments about traits that can lead us to further understand the nature and structure of traits. It may be the case, as with temperature, that people’s subjective experience of traits cannot serve as a representational map of quantitative relations of people with respect to traits regardless of whether they, like objective temperature, have quantitative structure. Because it is our subjective experience of traits that is coded into language and language is our chosen pathway toward the objective nature of traits, we do well to begin at the beginning (i.e., to begin with transitivity) rather than attempting to reverse engineer Likert item ratings of unknown representational value.

Study 1

Methods

Participants
One hundred eight women and 26 men (M age = 19.35 years, SD = 3.43) participated in this study. All participants were undergraduate volunteers drawn from a university subject pool.

Procedure
Name Elicitation Procedure. A Name Elicitation Procedure was used to generate a list of names to be used in a pairwise ranking procedure (the Appendix). Participants were asked to provide the first names or nicknames of seven people known personally to them. Participants were also asked to indicate (a) the sex and age of each person, and (b) the participants’ perceived familiarity with the person using three categories (“I know this person extremely well,” “I do not know this person extremely well, but he/she is more than an acquaintance,” and “This person is an acquaintance”). Participants were instructed to indicate each person’s exact age when known and to estimate his/her age as closely as possible when the exact age was not known.

Pairwise Ranking Task
Participants made a series of pairwise rankings in which the seven people listed on the Name Elicitation Procedure along with “Myself” were compared with respect to 10 items representing the NEO-PI-R Neuroticism subscales of Anger, Hostility, Depression, Self-Consciousness, Impulsiveness, and Vulnerability. These items (five positive indicators, five negative) were selected from the International Personality Item Pool (http://ipip.ori.org/) rather than directly from the NEO-PI-R. The seven people and “Myself” were also compared on three general and common descriptions of Neuroticism (e.g., prone to psychological distress). The 10 Neuroticism trait-items were as follows:

Often feels “blue” (positive indicator)
Seldom gets mad (negative indicator)
Not easily frustrated (negative indicator)
Has frequent mood swings (positive indicator)
Panics easily (positive indicator)
Rarely gets irritated (negative indicator)
Remains calm under pressure (negative indicator)
Feels comfortable with himself/herself (negative indicator)
Fears for the worst (positive indicator)
Feels threatened easily (positive indicator)

The three general descriptions of Neuroticism were as follows:

Emotionally unstable
Sensitive, emotional, and prone to experience feelings that are upsetting
Prone to psychological distress

Each comparison, or rating, was conducted on a computer and consisted of the presentation of two randomly drawn names along with one randomly drawn Neuroticism-item, as follows:

Consider the attribute: Prone to psychological distress
Who possesses MORE of this attribute, Janet or Myself? (If they exhibit the attribute equally, select Equal.)
Prone to psychological distress
□ Janet □ Myself □ Equal

A progress bar also appeared at the bottom of the screen to allow participants to track their progress through the task. Participants were instructed that they could use either the computer mouse or the keyboard to make ratings and that they could change their response as many times as they wanted before proceeding to the next item but would be unable to return to a previous item. All possible pairs of individuals (seven others and self) were compared on each of the 10 Neuroticism items and three general definitions, resulting in 364 responses. The computer program stored all of the responses along with the reaction times for each response.

Results

The reaction times to each pairwise judgment were examined to gauge whether the participants took the task seriously. For example, one participant responded to 66% of the pairwise comparisons in 1 s or less, suggesting that she did not take the time to read and/or attend to a majority of the statements in a meaningful way. As a conservative, arbitrary cut-point we omitted participants from the analyses if they responded to 5% or more of the pairwise comparisons in 1 s or less. Twenty such participants were identified and deleted, resulting in a final sample size of 114 participants. This cut-point was determined after data were collected and is intended to remove outlying cases that might lead to inflated proportions of transitivity violations by conflating considered pairwise comparisons with random responses, as random response patterns (see Study 2) show very high instances of intransitivity.

The ratings for all possible triplets of individuals were then examined for violations of transitivity. For example, if Janet was judged as more “prone to distress” than Self, and Self was more “prone to distress” than Mom, then Janet should be judged more “prone to distress” than Mom (i.e., Janet > Self > Mom). If Janet was judged as equally or less “prone to distress” than Mom, however, then the Janet–Self–Mom triplet was tallied as a transitivity violation. These transitivity violations were tallied across all traits and all persons, and transitivity violations for each of the 13 Neuroticism constructs and each of the eight persons were tallied as well. These sums were finally converted to proportions based on the total number of possible violations in each case.

The distribution of the proportion of transitivity violations across all 13 neuroticism items and eight persons was somewhat uniform with values ranging from .05 to .36. More than half of the participants were found to have between 12% and 25% transitivity violations in their pairwise comparisons (M = .20, Mdn = .19, SD = .08). As can be seen in Table 2, the mean proportions of violations for each Neuroticism-domain attribute ranged from .18 to .22, and the proportions of violations on each of the three broad Neuroticism definitions ranged from .19 to .22. The median values similarly showed very little variation across the Neuroticism attributes.

The proportions of transitivity violations were also computed for subgroups of triads that were considered as congruent or incongruent with respect to sex, age (i.e., no two people in the triad were more than 5 years

Table 2. Summary of individual transitivity indices, Study 1.

Attribute     Minimum   Maximum   Mdn   M
1             .00       .55       .18   .20
2             .00       .61       .16   .20
3             .00       .57       .18   .20
4             .00       .55       .16   .19
5             .00       .55       .16   .19
6             .00       .55       .20   .22
7             .00       .52       .16   .18
8             .00       .57       .21   .22
9             .00       .57       .16   .19
10            .00       .55       .21   .22
Definition
11            .00       .48       .20   .22
12            .00       .57       .18   .20
13            .00       .64       .16   .19
Note. N = 134.
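
As an illustration of the triplet tally described in the Study 1 Results, the following Python sketch counts transitivity violations for one rater on one attribute. It is not the program used in these studies. The names `CONSISTENT`, `violation_proportion`, and `random_judgments`, the +1/0/-1 coding of "more/equal/less," and the rule that a triplet counts as a violation whenever its three judgments cannot be produced by any single ranking (ties allowed) are all assumptions made for this example.

```python
import itertools
import random


def sign(x):
    """Return +1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


# Every pattern (a vs b, b vs c, a vs c) that some ranking with ties can produce.
# Three entities admit 13 such rankings, so 13 of the 27 patterns are orderable.
CONSISTENT = {
    (sign(ra - rb), sign(rb - rc), sign(ra - rc))
    for ra, rb, rc in itertools.product(range(3), repeat=3)
}


def violation_proportion(entities, judgments):
    """Proportion of triplets whose three pairwise judgments cannot be ordered.

    judgments maps each pair (a, b), taken in the order of `entities`,
    to +1 (a has more), 0 (equal), or -1 (b has more).
    """
    triplets = list(itertools.combinations(entities, 3))
    violations = sum(
        (judgments[a, b], judgments[b, c], judgments[a, c]) not in CONSISTENT
        for a, b, c in triplets
    )
    return violations / len(triplets)


def random_judgments(entities, rng):
    """Assumed random baseline: more / equal / less drawn with equal probability."""
    return {pair: rng.choice((1, 0, -1))
            for pair in itertools.combinations(entities, 2)}


# The example from the text: Janet > Self and Self > Mom, but Janet judged equal to Mom.
people = ["Janet", "Self", "Mom"]
example = {("Janet", "Self"): 1, ("Self", "Mom"): 1, ("Janet", "Mom"): 0}
print(violation_proportion(people, example))  # 1.0
```

Under this assumed rule, 14 of the 27 equally likely response patterns for a triplet cannot be ordered, so purely random responding would be expected to produce a violation rate near 14/27 (about .52). Calling `violation_proportion` on the output of `random_judgments` over many trials gives a rough check of that expectation.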
apart in reported age), and both sex and age. Proportions of violations were also computed for triads thought to represent the “best-case scenario” for transitivity (i.e., persons were congruent on sex and age variables and were judged as well known by the participant). No substantial differences in the proportions of transitivity violations were found, however, for any of these subgroups.6

Study 2: Exact replication and random comparison

Fifteen women and seven men (M age = 20.14 years, SD = 2.05) participated in a small-scale replication of the first study. Examination of the reaction times suggested that two participants may have provided random responses to many of the items. One participant made 127 judgments (35%) in 1 s or less, whereas a second participant made 106 judgments (29%) in 1 s or less. The transitivity violations for these two individuals were therefore not examined, and no other anomalies were noted for the other 20 participants.

The distribution of transitivity violations (Min = .04, Max = .33, Mdn = .20, M = .20, SD = .07) was similar to the distribution in the first study. Moreover, as in the original study, there was little variation in the proportion of transitivity violations across the 13 attributes (i.e., the Neuroticism trait-items; see Table 3).

To test whether participants were simply providing random responses to the pairwise ranking task, these aggregate results were compared to values supplied by a random number generator. Specifically, responses to the 364 pairwise comparisons were generated randomly such that A > B, A < B, or A = B could occur with equal frequency (probability). The Randomize and Random(x) functions in Embarcadero’s Delphi XE2 (Object Pascal) were used to generate the random numbers. One hundred randomized trials were conducted, and the minimum and maximum proportions of transitivity violations produced were .45 and .56, respectively. The distribution of proportion transitivity violations was unimodal and appeared to be slightly negatively skewed, even though the median and mean were equal (Mdn = .52, M = .52, SD = .02). It should be noted that the minimum proportion of violations from the random number generator (.45) was still greater than the highest proportion of violations for any person in the two studies.

Study 3

Methods

Participants
One hundred forty-four undergraduate students (82 women, 62 men) participated in this study in exchange for course credit. Ages ranged from 17 to 29, with the majority of participants falling between the ages 18 and 21 (M = 19.63, SD = 1.74).

Procedures
Participants completed the Name Elicitation task described in Studies 1 and 2. They were then asked to make a series of pairwise rankings in which the named people and Myself were compared with respect to four conceptually well-defined personal attributes with demonstrable quantitative structure: Height, Weight, Strength, and Education. The format of each item was identical to that used in the previous studies:

Consider the attribute: Height
Who possesses MORE of this attribute, William or Jenny? (If they exhibit the attribute equally, select Equal.)
Height:
□ William □ Jenny □ Equal
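
The comparison counts reported for these tasks (364 responses in Studies 1 and 2; 112 and 36 comparisons in Study 3) follow from counting unordered pairs. A minimal Python sketch of that arithmetic is given here; the helper name `n_comparisons` is hypothetical and introduced only for this illustration.

```python
from math import comb


def n_comparisons(n_entities, n_attributes=1):
    """Number of pairwise judgments when every unordered pair is rated on every attribute."""
    return comb(n_entities, 2) * n_attributes


print(n_comparisons(8, 13))  # 364: eight people, 10 trait items plus 3 definitions
print(n_comparisons(8, 4))   # 112: eight people, four attributes
print(n_comparisons(9))      # 36: nine numerical values
print(comb(8, 3))            # 56 triplets per attribute among eight people
```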
Table 3. Summary of individual transitivity indices for Replication Study 2.

Attribute     Minimum   Maximum   Mdn   M     SD
1             .00       .46       .28   .25   .12
2             .02       .34       .15   .17   .10
3             .02       .46       .23   .22   .12
4             .00       .46       .14   .20   .14
5             .00       .41       .14   .18   .14
6             .00       .45       .25   .23   .13
7             .00       .46       .24   .20   .12
8             .00       .45       .17   .18   .13
9             .00       .43       .15   .19   .13
10            .02       .54       .16   .20   .14
Definition
11            .00       .50       .20   .20   .15
12            .00       .39       .14   .20   .11
13            .00       .50       .19   .17   .13
Note. N = 20.

After comparing all possible combinations of eight people on the four properties (112 comparisons), participants were next asked to conduct the same task with nine numerical values. Specifically, the participants were asked to complete the pairwise comparisons with all possible pairs of the numbers 1, 5, 10, 28, 42, 42, 59, 75, 100. To present a comparison of equality, 42 was included twice. The same format and computer program used in Studies 1 and 2 were used to present the pairs (36 comparisons), and responses and reaction times were again recorded.

Results

Examination of the reaction times indicated that four participants may not have taken the task seriously. These
Table 4. Summary of individual transitivity indices for Study 3. dimensional, continuous structures in human nature cit-
Attribute Minimum Maximum Mdn M SD ing factor models of trait ratings as empirical evidence of
Height .00 .34 .04 .05 .06 this structure. Although it may be tempting to dismiss
Weight .00 .39 .02 .04 .06
Strength .00 .36 .02 .04 .06 the approximately 20% transitivity violations as an
Education .00 .22 .01 .03 .04 “acceptable error rate” and continue to treat self-ratings
Note. N ¼ 140. as quantities, it is important to remember that this 20%
inconsistency with respect to simple orders among tri-
participants responded to 8%, 12%, 41%, and 52% of the plets cannot be translated into an error rate with respect
comparisons of the well-defined, quantitative attributes to quantities of traits possessed by individuals. Recall
in 1 s or less, whereas none of the remaining 140 that transitivity at the level of an individual rater with
participants responded to a single comparison in 1 s or respect to a single attribute is a necessary but insufficient
less. Data for these four participants were therefore not demonstration that even that single rater can accurately
included in the analysis, and no anomalies with the estimate quantitative relations with respect to that single
comparisons of the numbers were noted. attribute. Furthermore, inasmuch as the goal of scientific
The distribution of the proportion of transitivity measurement is the modeling of real relations, the most
violations for all of the objective qualities was highly appropriate way to respond to measurement error when
skewed, with most of the participants (77%) showing dealing with a poorly defined variable is to seek its
less than 5% violations in their responses (Min ¼.00, sources so that it can be reduced.
Max ¼.29, Mdn ¼ .03, M ¼ .04, SD ¼ .04). Table 4 We attempted to establish rough “floor” and “ceiling”
shows that the proportions of violations across the rates of transitivity violation by analyzing pairwise judg-
height, weight, strength, and education qualities were ments of randomly generated pairwise judgments and
typically low and highly similar in the aggregate. whole numbers. The average rate of transitivity violation
The distribution of the proportion of transitivity over 100 randomly generated trials was higher than 50%,
violations for the numerical value comparisons was which is much higher than the rate for the Neuroticism-
even more skewed (Min ¼.00, Max ¼.17, Mdn ¼ .00, domain trait ratings, suggesting that the 20% violation
M ¼ .01, SD ¼ .02). One hundred seventeen of the rate identified in the trait rating task was not the result
participants (81%) showed no transitivity violations in of random patterns of data. The mean rate of transitivity
their responses. One extreme case revealed 14% violation for pairwise judgments of whole numbers was
violations, but nothing otherwise unusual about this only 1%, which is not surprising, as this is the easiest
persons’ data was noted. scenario for making pairwise judgments. Particularly
informative is the fact that individuals in the third
study also demonstrated a low degree of intransitivity
(M ¼ 4%) in their judgments of self and others on
Discussion
well-defined personal attributes with known quantitative
The results of the first two studies suggest that individual structure.
raters often do not preserve simple orders when judging Researchers seem to think that the average person can
pairs of persons on Neuroticism trait terms commonly quickly and mentally quantify traits in self and others,
used by psychologists. Specifically, individuals often organize themselves within a distribution of others
failed to preserve order in their judgments of self and similar to them in age and sex, and convert that score
others on NEO-PI-R Neuroticism-domain trait terms. into a unit of agreement with a statement about the trait,
As transitivity is the most basic axiom of continuous all with a level of measurement precision appropriate for
quantitative measurement, these results call into ques- use in scientific publication and for making ethical
tion the widely held belief that people quantify traits decisions related to the hiring and promotion of
when making self- and other-ratings. The use of individuals. If trait variables like the ones we targeted
statistical analyses in which continuity is assumed (e.g., analysis in Studies 1 and 2 are indeed well conceptualized—by
of variance, least squares regression, factor analysis) on raters!—quantitative attributes, easily recognizable in
such data is also brought into question, echoing Yule’s self and other at the quantitative level, then we should
(1921) concerns of nearly a century ago. Speaking more expect to see similar rates of transitivity violations in
generally, these results call for the reevaluation of any pairwise trait ratings as in pairwise ratings of attributes
theory, research, product, or process that presumes the like height, weight, and strength, assuming that the
quantitative, or even ordinal, structure of self- and/or people being rated are similarly distributed with respect
other-ratings with respect to personality traits, including to these properties. There are many possible causes of
personality theories that claim that traits are intransitivity and the possible differences between
Neuroticism-domain traits and quantitative traits like height and weight can help us focus on potential causes for future investigations. It may be the case that Neuroticism-domain traits are (a) not only ill-defined but also poorly conceptualized by raters, (b) not easily recognized in others, (c) not quantitative, (d) quantitative but not readily observable by people at the quantitative level, or (e) not as equally distributed among self and known others as height, weight, and so on. The last of these is most likely to be a cause of unequal intransitivity rates if people are making quantitative judgments of people prior to making pairwise judgments. Perhaps the people rated in these studies clustered closely together with respect to Neuroticism-domain traits and were more varied with respect to height. This, coupled with measurement error, could result in higher rates of intransitivity for the Neuroticism-domain traits. Of course, this hypothesis cannot be tested yet because a quantitative measurement procedure for the Neuroticism-domain traits does not yet exist; we have only Likert items that are unhelpful toward this end. We would need to develop a quantitative measurement procedure for Neuroticism-domain traits and then come back and test this assumption; by that time, this study would be obsolete.

Regarding “measurement error”

It has been suggested that this series of studies is dismissible, as transitivity violations can be explained by measurement error. Conceptually, measurement error refers to the difference between a measured value of quantity and its true value. True score theory holds that variability is an inherent aspect of measurement procedures; thus we should not expect measurement procedures to produce truly accurate representations of a particular system at any given moment. Statistical procedures have been devised to estimate a range of values within which true scores are likely to lie. An imperfect measurement procedure for a quantitative property could produce transitivity violations when used in a pairwise comparison task. Consider, for example, a measurement procedure designed to estimate the height of a set of individuals using a pairwise comparison format. In a single moment, let the true height of Person A be 66 in., the true height of Person B be 66.5 in., and the true height of Person C be 67 in. If the measurement procedure is accurate within .5 in., rounding to the nearest one-hundredth in., it could produce values falling in the following ranges:
Person A: 65.51–66.49 in.
Person B: 66.01–66.99 in.
Person C: 66.51–67.49 in.
These ranges overlap and, as a pairwise procedure involves multiple estimates, transitivity violations could occur despite the demonstrable quantitivity of height and the fact that the measurement procedure used is accurate within .5 in. The ranges could be even larger if the measurements were taken at multiple times during which the height of one, some, or all of the individuals could have increased or decreased. It is possible, then, that the properties estimated by the participants in our study possess quantitative structure and that participants were able to estimate values only to a degree of accuracy that did not prevent overlapping ranges of values.

It is not necessary to defend the value of the work we have presented here by arguing that measurement error could not result in transitivity violations in a pairwise judgment task but only to argue that it cannot be dismissed for this reason. On the contrary, the suggestion that transitivity violations could be due to “measurement error” is exactly the kind of suggestion that gives rise to further testable hypotheses regarding the nature and structure of personality traits, but only if it is considered within the context of the pursuit of representational measurement (i.e., modeling relations of real entities with respect to real properties). When the argument that observed transitivity violations could be due to “measurement error” is invoked as a means of protecting imposed scales from scientific scrutiny by denying the axiomatic structure of quantitivity as a basis for falsifiable hypotheses regarding quantitative relations, it quickly becomes circular and unhelpful toward refining representations of real relations.

Transitivity is a more basic measurement task than quantification. A measurement procedure that represents quantitative relations can be used to represent ordinal relations. On the other hand, ordinal values do not de facto represent quantitative relations. To argue that the observed transitivity violations are due to measurement error is to assume that our participants either had to or chose to make quantitative judgments prior to making transitive judgments. For each triplet in the studies represented here, we asked participants to make three pairwise ordinal judgments, yet the measurement error argument assumes that these three pairwise judgments actually required six quantitative judgments; if this were the case, and had they recorded these quantitative judgments, we could simply have counted the exact number of transitivity violations that were due to the reliability of their quantitative estimates. (To the extent that these traits exist as quantitative properties in real persons, we would need a great deal more information to judge the representational accuracy or validity of any of their quantitative estimates.) It cannot be assumed that people cannot make pairwise
judgments without first making quantitative judgments (see Michell, 2011); to do so is to express a bias in thinking about measurement. Appealing to a previous example, people can make ordinal judgments about the relative hotness/coldness of water in different bowls (e.g., the water in this bowl is hotter/less hot, no discernable difference) without first making separate quantitative estimations of the temperature of the water in each bowl (e.g., the water in this bowl is 83 °F). Careful investigations of specific cases of intransitivity in judging relative temperature bring us closer to an understanding of the nature and structure of experienced and objective temperature. Dismissing the importance of intransitivity of “hotter/less hot” ratings as a source of scientific knowledge simply because it could be caused by measurement error, on the other hand, halts progress toward understanding temperature. If people were not even familiar with the concept of temperature as degrees of heat, you certainly would prefer asking them to make pairwise ordinal judgments rather than asking them to estimate how many/how much “hotness” the water in each bowl possesses.

Although it is possible to make ordinal judgments without first making quantitative judgments, it is also possible that people do make quantitative estimates of the traits used in the pairwise judgment tasks. This can be established empirically only by observing how people make pairwise decisions. If it is the case that people make quantitative estimations prior to making ordinal judgments, transitivity violations are attributable to one of two sources: either the amount of the traits possessed by one or more people changed between judgments, which calls into question the nature of traits as stable properties, or the unit used to make the estimation changed one or more times over the course of the task. We can focus on the second possible source for two reasons: Trait theory holds that traits are stable and (a more compelling reason) there was no opportunity for the participants to observe any changes in the people they were rating during the transitivity task. By refining the definition of the trait, thereby making the unit more stable, we should expect to see less measurement error. It may also be the case that people make pairwise decisions differently with respect to traits versus height, weight, and so on. This is another thing we can learn by observing how people make decisions, paying special attention to the decision-making processes that result in intransitive triplets.

In short, transitivity errors violate the assumption that people are able to accurately represent quantities of traits possessed by themselves and others. Studies 1 and 2 show that people often fail to maintain transitivity when making pairwise judgments with respect to Neuroticism-domain traits. Various suggestions regarding the source of the failure to maintain transitivity are helpful toward refining our understanding of the nature and structure of these traits. The high rates of transitivity violations in Neuroticism-domain trait ratings relative to the ratings of properties like height and strength provide clues for further study but no definitive answers. Perhaps the mental units used to make quantitative estimations of Neuroticism-domain traits were less stable than those used to estimate height. In this case, refinement of the trait definitions should bring the rate of transitivity violations closer to that of height. It might also be the case that the people who were rated were more distinct from one another with respect to height than with respect to the Neuroticism-domain traits. This possibility brings to light what should already be a striking difference between personality traits and height: The absence of an established quantitative measurement procedure or normative definition of these traits (at the level of the person where they are proposed to exist) prevents us from determining whether this is the case. This absence is striking because the establishment of quantitative procedures would seem to be a necessary step toward the establishment of Likert items as quantitative procedures. After all, scientific measurement as an attempt to model real relations requires extensive consideration of representational accuracy, not mere reliability, as a means of establishing the validity of procedures. In the case of Likert-type scales, psychological scientists purport to have developed a method that allows the average person to produce standardized, quantitative values of real relations with respect to traits with no clear unit.

We are not warranted in treating self- or other-ratings on Likert items as ordinal, additive, or quantitative values; far from it! Along with clear thinking, simple measurement tasks, like the transitivity technique used here, are useful toward demonstrating the problematic nature of existing psychological measures. They can also serve as a basis for continued study regarding the nature and structure of poorly understood phenomena. In the future, we plan to observe the decision-making process people use during this task, paying close attention to (a) possible differences in decision making with respect to personality traits and quantitative attributes and (b) decision making in comparisons resulting in intransitivity. Used in the context of a hermeneutic procedure in which we actually speak at length to study participants, they can be used to clarify our language and develop valid procedures for representing real phenomena and processes. One way to begin this task is to identify triplet transitivity violations, show them to the rater, and ask them to explain
their inconsistency. This can help us eliminate ambiguity and nonessentials in our trait definitions, the progress of which could be demonstrated by a reduction in the proportion of intransitive ratings toward the level observed in Study 3.

It is true that scientists cannot avoid making some assumptions (the most basic assumption we must make is that the phenomena we study are both real and knowable), but we ought not build our scientific endeavors on unwarranted assumptions. This is like a sighted person with his eyes closed groping around a room on the assumption that the room is dark. To be clear, the criticisms that have been raised here are not directed toward statistics or statisticians; they concern a problem that is methodologically prior to statistics. These studies are presented as simple, concrete, and easy-to-replicate examples of how even nonspecialists can draw on the axioms of quantitative structure to put some of our unnecessary measurement assumptions to the test so that psychometric techniques that lead to merely apparent progress can be replaced with measurement initiatives that could result in actual progress. As long as psychologists fail to see that we have broken from or avoided reality in the most basic step of our methods (i.e., modeling target phenomena as they are), our discipline may continue in the appearance of progress while our actual progress remains stunted.

We need to get past the idea that quantification is as easy as asking people to circle numerals on Likert items (see Note 7). The quantitative measurement of nonextensive and/or poorly conceptualized properties is difficult, but quantitative measurement need not always be our goal. Perhaps psychologists sometimes opt for “quantitative” measures of various phenomena simply because quantitative science seems more prestigious and so easy to do. We may prefer to use quantitative variables even when our work does not demand it or when other types of variables might be more appropriate. Once we understand the challenges of developing representative measurement procedures and see how they are skirted in current psychometric practices, we are called to respond in some way. Some may undertake to meet these challenges, and others may opt out of using quantitative measures, finding them unnecessary for developing models of real processes and phenomena. Either response is better than denying that we have a problem.

Notes

1. This marks a turning away from the modeling of natural entities and systems to the modeling of trends.
2. I recently read an article in which researchers measured “trivialization of traffic violations” on an 11-point scale from 1 (extremely unimportant) to 11 (extremely important; Fointiat, Somat, & Grosbras, 2011). It is difficult to imagine that anyone attributes importance to traffic violations with a level of nuance that would require an 11-point scale.
3. Or we can accept this as acceptable measurement error even though we have no basis for determining the possible range of agreement and, by implication, how large these units are relative to the range.
4. Given the level of skill and motivation these complex tasks require, it is curious that we give so much credit to naïve subjects who complete these procedures yet also so little credit that we build in things like lie-scales and reverse-coded items to correct tendencies toward self-promotion and acquiescence.
5. There is no set amount of “measurement error” that can be judged as acceptable. This is a decision that requires the mental engagement of a person. A certain amount of error in a measurement procedure may or may not be acceptable in an applied situation depending upon the precision required by the task for which it is being used.
6. For more detail on violations for subgroups, refer to Badzinski (2012).
7. No multiplication in the number of Likert items or statistical procedures can alter the fact that individual Likert items are assumed to be quantitative procedures; any doubts regarding this can be cleared up by looking at the mathematical operations used to develop and validate multi-item scales.

References

Ace, M. E., & Dawis, R. V. (1974). Type of content, type of score, and response inconsistency in comparison measures of preference. Educational and Psychological Measurement, 34, 221–230. doi:10.1177/001316447403400202
Badzinski, S. I. (2012). Transitivity and inter-definition consistency of NEO neuroticism-domain ratings. (Unpublished doctoral dissertation). Retrieved from http://hdl.handle.net/11244/10947
Barrett, P. T. (2003). Beyond psychometrics: Measurement, non-quantitative structure, and applied numerics. Journal of Managerial Psychology, 3(18), 421–439. doi:10.1108/02683940310484026
Barrett, P. (2008). The consequence of sustaining a pathology: Scientific stagnation. A commentary on the target article ‘Is psychometrics a pathological science?’ by Joel Michell. Measurement, 6, 78–123. doi:10.1080/15366360802035521
Bouyssou, D., & Pirlot, M. (2002). Nontransitive decomposable conjoint measurement. Journal of Mathematical Psychology, 46(6), 677–702. doi:10.1006/jmps.2002.1419
Cloninger, C. R. (2000). A practical way to diagnose personality disorder: A proposal. Journal of Personality Disorders, 14, 98–108. doi:10.1521/pedi.2000.14.2.99
Cloninger, C. R., Przybeck, R., & Svrakic, D. (1991). The tridimensional personality questionnaire: U.S. normative data. Psychological Reports, 69, 1047–1057. doi:10.2466/pr0.1991.69.3.1047
Cloninger, C. R., Svrakic, D. M., Pryzbeck, T. R., & Wetzel, T. R. (1994). The temperament and character inventory
(TCI): A guide to its development and use. St. Louis, MO: Center for Psychobiology of Personality.
Costa, P. T., Jr., & McCrae, R. R. (1992). The NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources.
Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. New York, NY: Cambridge University Press.
Delia, J. G., & Crockett, W. H. (1973). Social schemas, cognitive complexity and the learning of social structures. Journal of Personality, 41(3), 413–429. doi:10.1111/j.1467-6494.1973.tb00103.x
Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–470. doi:10.1146/annurev.ps.41.020190.002221
Eysenck, H. J., & Wilson, G. (1991). The Eysenck personality profiler (1st ed.). Guilford, CT: Psi-Press.
Falmagne, J. C. (1979). On a class of probabilistic conjoint measurement models: Some diagnostic properties. Journal of Mathematical Psychology, 19(2), 73–88. doi:10.1016/0022-2496(79)90013-0
Ferguson, A., Myers, C. S., Bartlett, R. J., Banister, H., Bartlett, F. C., Brown, W., … Tucker, W. S. (1940). Quantitative estimates of sensory events: Final report of the committee appointed to consider and report upon the possibility of quantitative estimates of sensory events. Advancement of Science, 1, 331–349.
Ferguson, E. D. (1962). Ego involvement: A critical examination of some methodological issues. Journal of Abnormal and Social Psychology, 64(6), 407–417. doi:10.1037/h0046041
Ferguson, E. D. (1971). Role of individual differences in measures of ego-involvement. Psychological Reports, 29, 569–570. doi:10.2466/pr0.1971.29.2.569
Fishburn, P. C. (1991). Nontransitive additive conjoint measurement. Journal of Mathematical Psychology, 35(1), 1–40. doi:10.1016/0022-2496(91)90032-o
Fointiat, V., Somat, A., & Grosbras, J. (2011). Saying, but not doing: Induced hypocrisy, trivialization, and misattribution. Social Behavior and Personality, 39(4), 465–476. doi:10.2224/sbp.2011.39.4.465
Freedman, D. A. (1991). Statistical models and shoe leather. Sociological Methodology, 21, 291–313. doi:10.2307/270939
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34. doi:10.1037/0003-066X.48.1.26
Gonzales, C. (2000). Two factor additive conjoint measurement with one solvable component. Journal of Mathematical Psychology, 44(2), 285–309. doi:10.1006/jmps.1998.1248
Guest, D. J., Dell, G. S., & Cole, J. S. (2000). Violable constraints in language production: Testing the transitivity assumption of optimal theory. Journal of Memory and Language, 42, 272–299. doi:10.1006/jmla.1999.2679
Guilford, J. P. (1975). Factors and factors of personality. Psychological Bulletin, 82, 802–814. doi:apa.org/journals/bul/82/5/802.pdf
Harman, H. H. (1967). Modern factor analysis. Chicago, IL: University of Chicago Press.
Hölder, O. (1901). Die Axiome der Quantität und die Lehre vom Mass [The axioms of quantity and the theory of measurement]. Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig [Reports on the negotiations of the Saxon Society of Sciences in Leipzig]. Mathematisch-Physische Classe, 53, 1–64. doi:10.1007/bf03017642
Krantz, D. H. (1964). Conjoint measurement: The Luce-Tukey axiomatization and some extensions. Journal of Mathematical Psychology, 1(2), 248–277. doi:10.1016/0022-2496(64)90003-3
Lamiell, J. T. (2013). Statisticism in personality psychologists’ use of trait constructs; What is it? How was it contracted? Is there a cure? New Ideas in Psychology, 31, 65–71. doi:10.1016/j.newideapsych.2011.02.009
Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO Personality Inventory. Multivariate Behavioral Research, 39, 329–358. doi:10.1207/s15327906mbr3902_8
Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750–751. doi:10.1037/h0063675
Luce, D. (1966). Two extensions of conjoint measurement. Journal of Mathematical Psychology, 3(2), 348–370. doi:10.1016/0022-2496(66)90019-8
Luce, D., & Narens, L. (1985). Classification of concatenation measurement structures according to scale type. Journal of Mathematical Psychology, 29(1), 1–72. doi:10.1016/0022-2496(85)90018-5
Luce, D. & Steingrimsson. (2011). Theory and tests of the conjoint commutativity axiom for additive conjoint measurement. Journal of Mathematical Psychology, 55(5), 379–385. doi:10.1016/j.jmp.2011.05.004
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27. doi:10.1016/0022-2496(64)90015-x
McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A five-factor theory perspective. New York, NY: Guilford.
Michell, J. (1988). Some problems in testing the double cancellation condition in conjoint measurement theory. Journal of Mathematical Psychology, 32(4), 466–473. doi:10.1016/0022-2496(88)90024-7
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355–383. doi:10.1111/j.2044-8295.1997.tb02641.x
Michell, J. (2000). Normal science, pathological science and psychometrics. Theory and Psychology, 10, 639–667. doi:10.1177/0959354300105004
Michell, J. (2008a). Is psychometrics pathological science? Measurement, 6, 7–24. doi:10.1080/15366360802035489
Michell, J. (2008b). Rejoinder. Measurement, 6, 125–133. doi:10.1080/15366360802121917
Michell, J. (2011). Qualitative research meets the ghost of Pythagoras. Theory & Psychology, 21(2), 241–259. doi:10.1177/0959354310391351
Mill, J. S. (1974). A system of logic: Ratiocinative and inductive. Toronto, Canada: University of Toronto Press. (Original work published 1843).
Pfleiderer, P. (2014, March 25). Chameleons: The misuse of theoretical models in finance and economics. SSRN Electronic Journal, 1–36. doi:10.2139/ssrn.2414731
Rader, T. (1963). The existence of a utility function to represent preferences. The Review of Economic Studies, 30(3), 229–232. doi:10.2307/2296323
Sher, G. (1999). Is there a place for philosophy in Quine’s theory? The Journal of Philosophy, 96, 491–524. doi:10.2307/2564611
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680. doi:10.1126/science.103.2684.677
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York, NY: Wiley.
Tellegen, A., & Waller, N. G. (2008). Exploring personality through test construction: Development of the multidimensional personality questionnaire. In G. J. Boyle, G. Matthews, & D. H. Saklofske (Eds.), The Sage handbook of personality theory and assessment: Vol. II. Personality measurement and testing (pp. 261–292). London, UK: Sage.
Trendler, G. (2009). Measurement theory, psychology and the revolution that cannot happen. Theory & Psychology, 19(5), 579–599. doi:10.1177/0959354309341926
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76(1), 31–48. doi:10.1037/h0026750
Watson, D., & Clark, L. A. (1994). Manual for the positive and negative affect schedule: Expanded form. Iowa City: University of Iowa.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070. doi:10.1037/0022-3514.54.6.1063
Watson, D., & Tellegen, A. (1985). Towards a consensual structure of mood. Psychological Bulletin, 98, 219–235. doi:10.1037//0033-2909.98.2.219
Yule, G. U. (1921). Critical notice: Review of the essentials of mental measurement by W. Brown and G. H. Thomson. British Journal of Psychology, 12, 100–107. doi:10.1111/j.2044-8295.1921.tb00040.x
Zand Scholten, A., & Borsboom, D. (2009). A reanalysis of Lord’s statistical treatment of football numbers. Journal of Mathematical Psychology, 53(2), 69–75. doi:10.1016/j.jmp.2009.01.002

Appendix: Name Elicitation Procedure

Instructions: In the spaces below, please provide the first name, nickname, or initials of seven people you know personally. For confidentiality purposes, please, do not include both first and last names. Beside each name, please indicate each person’s sex (e.g., M = Male and F = Female) and approximate age. Below each name, please place a check next to the option that best describes how well you know the person.

Name/Nickname                     Sex         Age
1. ___________________________    ________    _________
___ I know this person extremely well.
___ I do not know this person extremely well, but he/she is more than an “acquaintance.”
___ This person is an acquaintance.
(Note. The above task was repeated six more times, for a total of seven known others.)
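
As a supplementary illustration, the triplet-level transitivity check that the Discussion relies on can be sketched in Python. This is only a minimal sketch under stated assumptions, not the authors' scoring procedure: it assumes each pairwise judgment is coded +1 (first person judged higher), 0 (no discernible difference), or -1 (first person judged lower), and the function names `is_transitive` and `violation_rate` are hypothetical.

```python
from itertools import product


def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_transitive(ab, bc, ac):
    """Check whether three pairwise judgments fit some ordering of A, B, C.

    Assumed coding (not taken from the article): +1 means the first person
    was judged higher than the second, -1 lower, 0 no difference.
    """
    # Try every assignment of ranks 0..2 to A, B, and C; the triplet is
    # transitive if at least one assignment reproduces all three judgments.
    for a, b, c in product(range(3), repeat=3):
        if (sign(a - b), sign(b - c), sign(a - c)) == (ab, bc, ac):
            return True
    return False


def violation_rate(triplets):
    """Proportion of triplets whose three judgments are intransitive."""
    triplets = list(triplets)
    violations = sum(1 for t in triplets if not is_transitive(*t))
    return violations / len(triplets)


# Every possible response pattern for one triplet under the assumed coding.
all_patterns = list(product((-1, 0, 1), repeat=3))
chance_rate = violation_rate(all_patterns)
print(f"{chance_rate:.3f}")  # prints 0.519
```

Under this assumed three-option coding, 14 of the 27 possible response patterns for a triplet are intransitive. That is in line with the reported finding that randomly generated judgments violated transitivity more than 50% of the time, although the article does not state that this coding was the one used.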
