
Australian Journal of Psychology

ISSN: 0004-9530 (Print) 1742-9536 (Online) Journal homepage: https://www.tandfonline.com/loi/raup20

Stevens's theory of scales of measurement and its place in modern psychology

Joel Michell

To cite this article: Joel Michell (2002) Stevens's theory of scales of measurement and
its place in modern psychology, Australian Journal of Psychology, 54:2, 99-104, DOI:
10.1080/00049530210001706563

To link to this article: https://doi.org/10.1080/00049530210001706563

Published online: 26 May 2010.


Stevens's Theory of Scales of Measurement and Its Place in Modern Psychology
Joel Michell
University of Sydney

Stevens's theory of scales of measurement has been an important methodological resource within psychology for half a century. It advanced the representational theory of measurement and promised to open up to scientific investigation the issue of the structure of psychological attributes. Its development by Suppes and Zinnes and the axiomatic measurement theory tradition showed how that promise could be fulfilled. However, neither Stevens nor the psychometricians who adopted his theory used it in that way. They used it to foreclose scientific investigation of that issue. This paper examines the way in which this was done and offers reasons for it.

The method of 'postulating' what we want has many advantages; they are the same as the advantages of theft over honest toil. (S.S. Stevens quoting Bertrand Russell).

Stevens's (1946) theory of scales of measurement was a component of psychology's post-Second World War methodological consensus. Repeatedly recited by its author (Stevens, 1946, 1951, 1959, 1968, 1975), it "stood like the Decalogue" (Newman, 1974, p. 137), defining psychology's paradigm of measurement. Elsewhere (Michell, 1999), its conceptual roots and limitations have been identified. This article discusses its effects. It begins with a sketch of Stevens's theory, although there is little need to describe it in detail. I then indicate how the axiomatic measurement tradition, beginning with Suppes and Zinnes (1963), revealed issues implied by Stevens's theory. Then the way in which Stevens's theory was actually used to avoid exploring those same issues is described. Finally, this article discusses why Stevens's theory was used in this way.

STEVENS'S THEORY
Stevens was a synthesiser. He took from Bertrand Russell (1903) (who made the numerical representation of order the centre of his theory of measurement); from H. M. Johnson (1936) (who considered both classification and ordering under the general heading of measurement); and from George Birkhoff, who provided Stevens with the theory of numerical transformations (Stevens, 1946). Stevens's basic idea was that measurement involves numerically modelling "aspects of the empirical world" (Stevens, 1951, p. 23). The aspects modelled may differ in complexity, giving rise to different kinds of scales: modelling a classification produces a nominal scale; modelling an order produces an ordinal scale; modelling differences between levels of an attribute produces an interval scale; and, on top of that, modelling ratios between levels of an attribute produces a ratio scale.

According to Stevens, these different kinds of scale not only differ with respect to the kind of relations numerically modelled; they also differ with respect to the group of numerical transformations that leave the kind of scale unchanged. Thus, any one-to-one transformation of the numbers used in a nominal scale maps them into a new nominal scale. Likewise, any increasing monotonic transformation of the numbers used in an ordinal scale maps them into a new ordinal scale. In the case of interval and ratio scales, the classes of "admissible" transformations are the positive linear transformations and positive similarities transformations, respectively. The identification of these groups of transformations was a unique feature of Stevens's theory.

Stevens's theory was a logically coherent contribution to the theory of numerical representations. It was not the last word, but it was a significant step forward. However, he had not characterised the precise structure that aspects of the "empirical world" must possess to sustain one or other of his kinds of measurement scales.

THE DEVELOPMENT OF STEVENS'S THEORY
In the 1950s and later, Stevens's theory was elaborated by a number of researchers, including Suppes (1951) and Coombs (1956, 1960, 1964). Suppes and Zinnes (1963) specified structures necessary and sufficient for each of Stevens's scale types. For example, an empirical structure only leads to an ordinal scale if it involves at least a weak order relation upon a set of objects, that is, a transitive and strongly connected binary relation. These are demanding conditions, for not all binary relations are transitive and not all transitive ones strongly connected. Take loving. If Othello loves Desdemona and Desdemona loves Iago, it does not follow that Othello loves Iago. Such triangles show that loving is not transitive. Or, take being an ancestor of in a family tree. This relation is transitive: if someone, say George, is an ancestor of someone else, say Elizabeth; and Elizabeth is an ancestor of another person, Charles; then it follows that George is an ancestor of Charles. However, it is not the case that the ancestor relation connects every pair of people in a family tree (e.g., Charles and his sister, Anne, are not connected).

As Suppes and Zinnes made clear, identifying an ordinal scale means first, identifying the objects involved; second, identifying a possibly suitable relation; and third, checking to see that the relation has the properties sustaining a weak order (i.e., transitivity and connexity). If it does not, then there is no possibility of having an ordinal scale based upon that relation. However, if it does, then an ordinal scale is secured.

Stevens (1951) specified conditions necessary and sufficient for a "serial order" (p. 13) although, not in his section on ordinal scales. However, his treatment of quantitative structure

This is a revised version of a paper delivered at the 20th Annual Conference of the European Society for the History of the Human Sciences, Amsterdam, The Netherlands, August, 2001.
Address for correspondence: Associate Professor Joel Michell, Department of Psychology, University of Sydney, Sydney NSW 2006, Australia. Email: joelm@psych.usyd.edu.au

Australian Journal of Psychology Vol. 54, No. 2, 2002 pp. 99-104


"in the ordinary sense of the word" (p. 27) (viz., interval and ratio scales) was incomplete. No definition of either the kind of structure involved or of clues to its occurrence was provided. This is where the contribution of Suppes and Zinnes was important. They specified conditions necessary and sufficient for interval and for ratio scales. This work was taken considerably further by R. D. Luce and his associates and also, in Europe, by Pfanzagl (1968). One of Luce's associates reformulated the distinctions between the various kinds of scales possible using the mathematical theory of automorphisms and proved that nothing lies between Stevens's ordinal and interval scales (see Narens 1985, 2002; Narens & Luce, 1986, for these and other significant results of the axiomatic measurement tradition).

Hölder's (1901) theorem was used to unify many of the results produced. Hölder had not only defined the structure necessary and sufficient for ratio scale measurement, but also the structure that differences must have to be measurable (i.e., Stevens's interval scales) (see Michell & Ernst, 1996, 1997). If a distinction is drawn between the structure of a quantitative attribute (quantitative deep structure, Niederée, 1994), on the one hand and, on the other, directly observable data structures (i.e., quantitative surface structures) indicative of underlying quantitative deep structure, then Hölder may be seen as defining quantitative deep structure, while the tradition of axiomatic measurement theory specified an array of quantitative surface structures. This distinction is much more important in psychology than in physics because the attributes that psychologists aspire to measure are not open to direct observation in the way that some physical attributes are. Thus, in psychology, evidence for underlying quantitative structure is inevitably indirect. For such evidence to be adequately evaluated, the character common to quantitative attributes must be explicitly specified, as also must the properties of observable data structures diagnostic of underlying quantitative structure. That is, in order to diagnose something, we need to know what it is we are attempting to diagnose and we need to know what is diagnostic of it. Hölder (1901) specified the first and Suppes and Zinnes (1963) made progress with the latter. The subsequent work of Luce and Tukey (1964) was vitally important to the prospects of measurement in psychology because it showed that under certain conditions merely ordinal data could be diagnostic of underlying quantitative structure (see Krantz, Luce, Suppes & Tversky, 1971; Michell, 1990).

Quantitative structure is best appreciated by considering the case of length. Let a, b, c, ... , be any specific lengths (i.e., levels of the attribute, length). The quantitative structure of length is given by the following conditions:

1. For every pair, a and b, one and only one of the following is true:
   (i) a = b;
   (ii) there exists a c, such that a = b + c;
   (iii) there exists a c, such that b = a + c.
2. For any a and b, a + b > a.
3. For any a and b, a + b = b + a.
4. For any a, b, and c, a + (b + c) = (a + b) + c.
5. For any a, there is a b, such that b < a.
6. For any a and b there is a c such that c = a + b.
7. For every non-empty class of lengths having an upper bound, there is a least upper bound.

In these conditions the symbols, 'a', 'b', and 'c' do not stand for numbers, they refer to specific lengths, that is, for example, specific properties of objects independently of any measurements being made. Physical objects have length (i.e., spatial extension) measured or not and the above conditions are true of lengths measured or not. Taking an expression like 'a + b = c', it is tempting to read it as a joined to b gives c. Such an operational reading is not intended. Most lengths in the universe are either too big or too small for human operations and, more importantly, if for any lengths a, b and c, a + b = c, then that is true regardless of human operations. If operations upon lengths serve any function it is in attempting to test the truth of such conditions. The expression, 'a + b = c', denotes a relation: namely that length c is entirely composed of discrete parts, a and b. The first four conditions state what it means for lengths to be additive. The remaining three ensure that no lengths are excluded. Condition 5 means that there is no least length; 6, that there is no greatest length; and 7, that there are no gaps in the length series (the concepts of upper bound and least upper bound being explained in most elementary algebra books, e.g., Birkhoff & MacLane, 1965).

Any attribute having this kind of structure is an unbounded continuous quantity and it is measurable on a ratio scale. If some attribute is not known to have this structure, but differences between its levels do, then it is measurable on an interval scale. When an attribute is hypothesised to be measurable on either kind of scale (as often happens in psychology), then this is the kind of structure that is implicated. Such a hypothesis is the same as any in science; namely, it is uncritical to accept it without relevant evidence.

In this respect, Stevens's theory was a map locating the door to an important opportunity for psychology and its elaboration by Suppes and Zinnes and others was the conceptual key that opened that door. For example, a scientist studying intellectual abilities would want to know how such abilities work in the production of intellectual behaviour, say how mathematical ability works in enabling someone to solve a set of numerical problems. Part of doing this, would be investigating how different levels of mathematical ability relate to one another, for example, whether they are quantitative, ordered weakly or only partially ordered. It might involve constructing a theory about how different levels of this ability relate to one another and putting this theory to the test using methods diagnostic of the various structural possibilities. Having done this successfully, a psychologist might have evidence for an interval or ordinal scale of mathematical ability. Stevens's theory of scales of measurement and its elaboration was an invaluable resource for psychologists interested in such scientific questions.

HOW STEVENS'S THEORY WAS USED
However, considering attributes that psychologists have famously claimed to measure (i.e., intellectual abilities, personality traits, and social attitudes), the search for research of this kind will largely prove vain. Indeed, the contribution of Suppes and Zinnes and its subsequent amplification by such as Krantz, Luce, Suppes and Tversky (1971) has typically been ignored in psychometrics (Cliff, 1992; Michell, 2000). Stevens did not ignore it. He explicitly rejected it, accusing Suppes and Zinnes of drifting off "into the vacuum of abstraction" (Stevens, 1968, p. 854). His assessment was misguided: Stevens's theory merely raises the issue, but provides the investigator with no explicit guidance as to what to look for.

However, Stevens's theory was not ignored. It was used, but not to investigate the issue of scale type scientifically.

Stevens (1951) used his theory to support his psychophysical methods. He was responding to Campbell's (1940) objection that no psychophysical methods, including Stevens's, are methods of measurement. Campbell argued that since psychophysicists had not demonstrated that sensation intensities are additive, they had no evidence that such intensities are

measurable (on a ratio scale). Stevens replied that ratio scale measurement could be achieved without a demonstration of additivity. He reasoned like this: pairs of weights that leave a beam balance horizontal for a fixed, non-central fulcrum position, will all be in the same non-unitary ratio; thus, the discovery of such pairs could be used to establish a ratio scale of weight without identifying any relation of additivity. Stevens saw this situation as analogous to his psychophysical method. In his method, participants were instructed to judge whether pairs of sensation intensities are in the same fixed ratio. If the former method leads to a ratio scale of weight then, Stevens concluded, his method leads to a ratio scale of sensation intensity.

However, Hölder (1901) had shown that an attribute only sustains ratios if it is quantitative and therefore additive, as Campbell claimed. Thus, Stevens's method could only establish a ratio scale if sensation intensities are additive in structure. In claiming that no one had shown this, Campbell was right. So Stevens's argument begged the question of the internal structure of the attribute of sensation intensity. The fact that experimental participants respond to instructions to judge equal ratios does not mean that there are ratios there to be judged. This question needs to be investigated in its own right. This is why Stevens preferred to ignore Suppes and Zinnes (1963). Unlike Stevens, they had specified a structure that attributes must have to sustain ratio scale measurements.

Stevens's reasoning was of this form: couched in the context of his theory of scales, a question-begging argument was used to conclude that the desired level of measurement had been attained. To those prepared to beg the same question, Stevens's reasoning seemed cogent and, so, this pattern became the model in the more high-profile area of psychometrics.

Consider Kerlinger (1964). He expounded Stevens's theory and then asked "What kinds of scales are used in behavioral and educational research?" (p. 425). His answer was: "Mostly nominal and ordinal are used, though the probability is good that many scales and tests used in psychological and educational measurement approximate interval measurement." (p. 425). He gave no argument for the claim that some are ordinal, simply asserting, "Intelligence, aptitude, and personality test scores are, basically and strictly speaking, ordinal" (p. 425, italics in original). Then, assuming that the underlying psychological attribute is actually quantitative, he argued that if two or more ordinal indices are linearly related, then "equal intervals can be assumed" (p. 425). Thus, he concluded: "The best procedure would seem to be to treat ordinal measurements as though they were interval measurements" (p. 427) citing Guilford (1954) as his authority. Guilford had asserted that psychological practices "often approach the condition of equal units" (p. 15), that is, interval scale measurement.

Now, test scores, considered in isolation, are neither ordinal nor interval. This is because, as such, they are not measurements of any kind. If, for example, they are a count of the number of correct answers, then they are simply frequencies, nothing more and nothing less. However, considered in relation to hypothetical psychological attributes (such as abilities, traits or attitudes), the issue of measurement can be raised. Clearly, test scores can only approach interval scale measurements of such hypothetical attributes on condition that differences within such attributes are quantitative. Thus, until the structure of such attributes is known, claims like those of Kerlinger and Guilford beg the question.

Nunnally (1967) also discussed this issue at length. He identified the "fundamentalist view"; namely, that "(1) measurement scales have empirical reality in addition to being theoretical constructions and (2) the investigator must show evidence of the scale properties of particular measures" (p. 21). Of course, this is just the view that in claiming a scale to be interval, something is being said about the structure of the attribute involved and, furthermore, that such claims cannot be accepted safely without evidence. Calling the standard, critical, scientific attitude fundamentalist was a sign of things to come.

Suppose, he says, that the "fundamentalist" view is true. This would imply, he thought, that only length and weight could be shown to be measurable on anything better than an ordinal scale, for it is only with these two attributes that the additive structure of the attribute can be directly seen. He claimed that in all other cases of measurement, a correlate is used to measure the attribute involved (as, e.g., length is used to measure temperature) and that the correlate cannot reveal the structure of the measured attribute. Now, while a fundamentalist may find this debilitating, treating a correlate as if it provided interval scale measurement loses very little, according to Nunnally. The reason he gave is that the product-moment correlation coefficient "between two variables is affected very little by monotonic transformations of the variables" (p. 25). So, Nunnally concluded, if the correlate is ordinally related to the attribute of interest, it will also be highly correlated with it (in the sense of the product-moment coefficient). Such an argument, if valid, would have obvious application in psychology: it would imply that if test scores were an ordinal correlate of psychological attributes, they could be treated as interval scale measures of such attributes.

Of course, this argument begs the familiar question. It assumes that the relevant attribute is quantitative and, furthermore, that the correlate is at least an ordinal index of it. However, if the structure of the correlate reveals nothing about the structure of the relevant attribute, neither of these assumptions is safe. Suppose that the psychological attribute were merely ordinal, then the correlate could be nothing better than an ordinal index of it. If the psychological attribute were merely a partial order, then even ordinal scaling would be impossible. If nothing could be known of the psychological attribute then, ipso facto, it could never be known that any correlate measures it on any of Stevens's four kinds of scales.

Nunnally's characterisation of the fundamentalist view is, however, a caricature. Standard accounts of extensive measurement (e.g., Helmholtz, 1887/1996; Hölder, 1901; Campbell, 1920; Suppes & Zinnes, 1963) show that the additive structure of quite a few attributes (e.g., "mass, volume, length, angle, period, force, electrical resistance, current, voltage" (Campbell, 1938, p. 127)) can be demonstrated directly. Furthermore, the then recently published work of Luce and Tukey (1964) showed that conjoint measurement provides an indirect route for identifying additive structure, one which covers all other physical attributes (Krantz, et al., 1971). Moreover, the theory of conjoint measurement specifies conditions under which the structure of a correlate provides information about the structure of underlying attributes. Nunnally's argument misrepresented the scientific approach so badly that the approach appeared unworkable. His misrepresentation set the scene for exposition of his preferred position.

Nunnally's preferred position was that there is no intrinsic interval scale structure, that is, that there are no "'real' scales" (Nunnally, 1967, p. 26). Thus, he concluded "It is much more appropriate to think of any measurement scale as a convention - an agreement among scientists that a particular scaling of an attribute is a 'good' scaling" (p. 26), 'good' scales being those that "work well in practice" (p. 28, italics in original). Three points can be made about this position. First, as noted, interval scale structure is non-trivial and the issue of whether any attributes possess it is an empirical one. Lacking this kind of structure, an attribute may still be ordered. It is known that

treating ordinal attributes as if they were interval structures may lead to invalid conclusions (Luce, Krantz, Suppes, & Tversky, 1990; Michell, 1986) and, in such cases, methods tailored to ordinal structures are preferable (Cliff, 1996). More specifically, ordinal attributes cannot relate quantitatively (e.g., linearly or multiplicatively) to other attributes and, so, methods presuming quantitative relations (e.g., factor analysis) would be of dubious value in identifying underlying attributes. Also, the hypothesis that the underlying attributes are ordinal cannot be taken for granted. If, for example, the psychological attributes that lie behind test performance are only partially ordered, then even ordinal methods of data analysis could be misleading. Thus, if psychological attributes do not possess interval structure, as Nunnally suggests, treating them as interval measures is unsafe. Furthermore, the scientific task of discovering their actual structure remains.

Second, Stevens's theory made the place of convention explicit. The numbers employed in a measurement scale are constrained more and more by nature as one proceeds from nominal to ratio scales and, as a result, convention plays a diminishing role. With ratio scales, only the unit of measurement is a matter of convention and once that is agreed to, numerical scale values are uniquely determined. With interval scales, not only the unit but also the zero point is a matter of convention. Once they are agreed to, numerical scale values are again uniquely determined. With ordinal and nominal scales it is simpler to specify what is not conventional about them, than what is. Nominal scales only reflect sameness or difference and, so, the sameness or difference of the numerals assigned to levels of the relevant attribute is the only non-conventional feature. As well as that, ordinal scales represent the order of levels of the attribute, so the order of the numerals assigned is non-conventional as well. Aside from conventional features, the numerical values used relate to real features of the attributes involved and, so, are determined by nature. To claim that a scale is interval, for example, is to claim that the attribute possesses the relevant natural structure. Since this is an empirical hypothesis, it can only be satisfactorily investigated using empirical methods. Of course, as already indicated, Nunnally believed that (with the exception of length and weight) the attribution of interval scale properties to measurements was always a matter of convention. This was a misunderstanding: the kind of scale obtained depends entirely upon the kind of empirical structure modelled.

Third, the criterion of working well in practice is generally interpreted in psychology to mean something like enabling useful predictions. Psychological tests are useful in many practical contexts. Such usefulness is often assessed using the product-moment correlation coefficient or the methods of linear regression. The idea that an approximate linear association between test scores and criterion variables supports the conclusion that such scores are interval scale measures is the main argument given in psychometrics. Lord and Novick (1968), in the most important psychometric work of the twentieth century, put it this way:

    If we construct a test score by counting up correct responses (zero-one scoring) and treating the resulting scale scores as having interval properties, the procedure may or may not produce a good predictor of some criterion. To the extent that this scaling produces a good empirical predictor the stipulated interval scaling is justified. (p. 22).

Such an argument overlooks the fact that the predictive usefulness of psychological tests is logically independent of the issue of whether test scores measure anything at all.

It is possible that scores on a test might not measure anything, and yet they could still be useful in making predictions. Test scores (generally the number of correct answers) are perfectly sensible quantities in their own right and do not have to be treated as having "interval properties" in order to be correlated with a criterion variable. The question of interval scaling only arises when test scores are hypothesised to be indices of latent psychological attributes. This fact was recognised by Cronbach and Gleser (1957), who recommended replacing the rhetoric of measurement with statistical decision theory in practical contexts. Likewise, even if tests measure psychological attributes, it does not follow that they are useful predictors. That depends upon the complexity of the causal processes linking those attributes to performance on the relevant criteria.

Of course, if a particular test is useful in predicting some criterion, one possible explanation of that usefulness is the hypothesis that the test measures a psychological attribute relevant to that criterion. However, the hypothesis that this is what is going on is not confirmed by the existence of a high product-moment correlation because test scores may correlate highly with criteria whether or not they measure anything on an interval scale. If, for example, the relevant psychological attribute was merely ordered and that attribute was causally connected to the criterion variable, then a high correlation would be expected because "product-moment correlation mainly is sensitive to the rank order of individuals" (Nunnally, 1967, p. 25). So, the issues of predictive usefulness and interval scale measurement are different and confusing them does not advance our understanding.

While this paper has focused upon the arguments of Kerlinger (1965) and Nunnally (1967), it should be noted that their arguments are entirely typical of those advanced in psychology, both in the past and today. Recent texts continue to draw the same conclusion. For example, Kline reassures his readers that "the vast majority of psychological tests measuring intelligence, ability, personality and motivation ... are interval scales" (Kline, 2000, p. 18) and Lehman insists that "interval measurement is probably the most common scale in psychology" (1991, p. 54), while Whitley concludes that "most measures of psychological states and traits and of constructs such as attitudes and people's interpretations of events are interval level" (1996, p. 117). The most recent edition of Kerlinger's text still claims that "the probability is high that many scales and tests used in psychological and educational measurement approximate interval measurement" (Kerlinger & Lee, 2000, p. 635). These are general texts covering quantitative methods and such conclusions perpetuate psychology's attitude of wishful thinking.

Wishful thinking about these issues is not confined to general texts on research methods. The texts of Nunnally (1967), Lord and Novick (1968), and Kline (2000) are specialist texts in psychometrics of varying levels of complexity. Experts in psychometrics typically fudge the issues as much as any psychologists. For example, in recent decades, item response theories (IRTs) have come to dominate psychometrics (see, e.g., Embretson & Hershberger, 1999). IRTs are theories that relate the probability of a response of a given kind upon a test item of some sort (say, a correct response to a dichotomous item on an ability test) to supposedly quantitative attributes of the person doing the test (say, to that person's level of the relevant ability) and the test item (say, the item's level of difficulty). Perhaps the best known of these theories is that proposed by Rasch (1960), the so-called Rasch model. Within the context of the present paper, it is interesting to note that the Rasch model is related to the theory of conjoint measurement (Keats, 1967) and, thus, is potentially useful in investigating whether the attributes involved (ability and difficulty) are quantitative, ordinal, or of some other structure. Some see the Rasch model as being conjoint measurement's "practical realization" (Wright, 1999, p. 80) and Wright
illustrated how the theory of conjoint measurement could be applied to data relevant to the Rasch model (Perline, Wright, & Wainer, 1979). Despite this and related work (e.g., Karabatsos, 2001; Michell, 2002; Scheiblechner, 1995, 1999), the reluctance of those applying IRTs to address the question of whether the relevant attributes are quantitative or something less is palpable. The theory of conjoint measurement describes a hierarchy of conditions, the so-called cancellation conditions (Michell, 1990), directly diagnostic of the ordinal-quantitative distinction. However, in general, those applying IRTs show no inclination to employ this conceptual resource. Instead, the prevailing tendency is to evaluate an IRT via indices of goodness of fit that are not necessarily diagnostic of quantitative structure (Karabatsos, 2000). Even in this potentially promising area of psychometrics, practitioners generally follow the time-honoured tradition of Stevens, begging the question that the attributes of interest are quantitative, instead of investigating it.

It is nothing short of a scientific scandal that psychology, at the commencement of the 21st century, should be no closer to treating this issue as an empirical research issue than it was at the beginning of the last century. At least then the excuse could be given that the science was young and the conceptual resources needed to investigate the issue undeveloped. Despite that, the issue was taken seriously then (e.g., Boring, 1920; Titchener, 1905). Now, following Stevens's example, the issue is deflected. The challenge for the current generation of psychologists is to stop pretending that they can measure and to come to grips with this fundamental, foundational issue.

PSYCHOLOGY AND THE VALUES OF SCIENCE

For 50 years psychologists have wished that their tests produced interval scale measurements and they have been prepared to claim, without any supporting scientific evidence, that their wish is fulfilled. Presentation of Stevens's theory of scales of measurement, with its distinctions between types of scales, is generally used only to create the illusion of a considered treatment of the issue. Within that context, the assertion that psychological tests provide ordinal indices of underlying attributes has an air of plausibility to minds uncluttered by knowledge of the structural conditions necessary for an ordinal scale. Once that is accepted, those wishing for interval scales are apparently satisfied by suitably framed question-begging arguments.

Of course, psychologists are free to believe whatever they like about test scores as measures of psychological attributes in the privacy of their own minds. Furthermore, the hypothesis that scores on some test are interval scale measures of some theoretical construct is a perfectly coherent speculation. However, as scientists, we are not free to claim this belief or speculation as a scientific result in the absence of evidence. Yet this is precisely what psychologists do when they present their tests to the scientific and lay communities as instruments capable of interval scale measurement. Science is committed to moderating its claims according to the weight of the relevant evidence. To abandon this commitment because one is more interested in wished-for possibilities than in what is the case, is to cease doing effective science. Members of the lay community whose lives are affected by the claims of scientists have an obvious interest in scientific moderation and none in scientists' wishful thinking. Within both communities, it is now widely believed that psychologists are able to measure all manner of attributes using psychological tests. On this issue, wishful thinking has been victorious and science the loser. Why?

Stevens's theory formed part of a methodological consensus that took shape in psychology in the decades immediately after the Second World War. Those decades witnessed an unprecedented expansion of government investment in scientific research, especially in the United States, which ushered in the era of Big Science. Suddenly, research grants became the main vehicle by which, not only individual careers, but also the aspirations of entire disciplines, progressed. This led to what Schorske has aptly called "the new rigorism" (1997, p. 309). By this, he does not mean that disciplines like psychology actually became more rigorous, only that they devised methodologies aping the rigour of the physical sciences, misrepresenting their resemblance to those sciences in the hope of attracting financial support. Thus the politically constructed economy of Big Science cemented the post-war methodological consensus. Psychology was already biased towards measurement (Michell, 1999) and this, combined with the new economy, explains why psychologists used Stevens's theory to shut the door upon the very questions that it promised to open up to investigation. In short, it was done in order to avoid facing the possibility that psychological attributes might not be quantitative, within a socially contrived economic context in which quantitative science was seen as favoured over non-quantitative.

ACKNOWLEDGEMENTS

These investigations were supported by the Federal Government of Australia via an ARC grant for research into the tensions between quantitative and qualitative methods in psychology. I am grateful to Fiona Hibberd, Agnes Petocz, David Grayson and to the anonymous reviewers appointed by this journal for their helpful comments.

REFERENCES

Birkhoff, G. & MacLane, S. (1965). A survey of modern algebra. New York: Macmillan.
Boring, E.G. (1920). The logic of the normal law of error in mental measurement. American Journal of Psychology, 31, 1-33.
Campbell, N.R. (1920). Physics, the elements. Cambridge: Cambridge University Press.
Campbell, N.R. (1938). Measurement and its importance for philosophy. Aristotelian Society supplementary vol. 17 (pp. 121-151). London: Harrison & Sons.
Campbell, N.R. (1940). Physics and psychology. Advancement of Science, 1, 347-348.
Cliff, N. (1992). Abstract measurement theory and the revolution that never happened. Psychological Science, 3, 186-190.
Cliff, N. (1996). Ordinal methods for behavioral data analysis. Mahwah, NJ: Erlbaum.
Coombs, C.H. (1956). The scale grid: Some interrelations of data models. Psychometrika, 21, 313-329.
Coombs, C.H. (1960). A theory of data. Psychological Review, 67, 143-159.
Coombs, C.H. (1964). A theory of data. New York: Wiley.
Cronbach, L.J., & Gleser, G.C. (1957). Psychological tests and personnel decisions. Urbana: University of Illinois Press.
Embretson, S.E., & Hershberger, S.L. (1999). The new rules of measurement: What every psychologist and educator should know. Mahwah, NJ: Erlbaum.
Guilford, J.P. (1954). Psychometric methods. New York: McGraw-Hill.
Helmholtz, H. (1996). Numbering and measuring from an epistemological viewpoint. In W. Ewald (Ed.), From Kant to Hilbert: A sourcebook in the foundations of mathematics: Vol. 2 (pp. 727-752). Oxford: Clarendon Press. (Original work published 1887)
Hölder, O. (1901). Die Axiome der Quantität und die Lehre vom Mass. Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematisch-Physische Klasse, 53, 1-46.
Johnson, H.M. (1936). Pseudo-mathematics in the social sciences. American Journal of Psychology, 48, 342-351.
Karabatsos, G. (2001). The Rasch model, additive conjoint measurement, and new models of probabilistic measurement theory. Journal of Applied Measurement, 2, 389-423.


Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1, 152-176.
Keats, J.A. (1967). Test theory. Annual Review of Psychology, 18, 217-238.
Kerlinger, F.N. (1964). Foundations of behavioral research: Educational and psychological inquiry. New York: Holt, Rinehart & Winston.
Kerlinger, F.N., & Lee, H.B. (2000). Foundations of behavioral research. Orlando, FL: Harcourt College Publishers.
Kline, P. (2000). A psychometrics primer. London: Free Association Books.
Krantz, D.H., Luce, R.D., Suppes, P., & Tversky, A. (1971). Foundations of measurement: Vol. 1. New York: Academic Press.
Lehman, R.S. (1991). Statistics and research design in the behavioral sciences. Belmont, CA: Wadsworth Publishing Co.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luce, R.D., Krantz, D.H., Suppes, P., & Tversky, A. (1990). Foundations of measurement: Vol. 3. San Diego: Academic Press.
Luce, R.D., & Tukey, J.W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27.
Michell, J. (1986). Measurement scales and statistics: A clash of paradigms. Psychological Bulletin, 100, 398-407.
Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Erlbaum.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
Michell, J. (1999). Measurement in psychology: Critical history of a methodological concept. Cambridge: Cambridge University Press.
Michell, J. (2000). Normal science, pathological science and psychometrics. Theory and Psychology, 10, 639-667.
Michell, J. (2002, April). Conjoint measurement and the Rasch model: Quantitative versus ordinal structure. Paper presented at the International Objective Measurement Workshop, New Orleans, LA, USA.
Michell, J., & Ernst, C. (1996). The axioms of quantity and the theory of measurement, Part I. An English translation of Hölder (1901), Part I. Journal of Mathematical Psychology, 40, 235-252.
Michell, J., & Ernst, C. (1997). The axioms of quantity and the theory of measurement, Part II. An English translation of Hölder (1901), Part II. Journal of Mathematical Psychology, 41, 345-356.
Narens, L. (1985). Abstract measurement theory. Cambridge, MA: MIT Press.
Narens, L. (2002). Theories of meaningfulness. Mahwah, NJ: Erlbaum.
Narens, L., & Luce, R.D. (1986). Measurement: the theory of numerical assignments. Psychological Bulletin, 99, 166-180.
Newman, E.B. (1974). On the origin of 'scales of measurement.' In H.R. Moskowitz, B. Scharf, & J.C. Stevens (Eds.), Sensation and measurement: Papers in honor of S.S. Stevens (pp. 137-145). Dordrecht: Reidel.
Niederée, R. (1994). There is more to measurement than just measurement: Measurement theory, symmetry, and substantive theorizing. A discussion of basic issues in the theory of measurement. Journal of Mathematical Psychology, 38, 527-594.
Nunnally, J.C. (1967). Psychometric theory. New York: McGraw-Hill.
Perline, R., Wright, B.D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237-255.
Pfanzagl, J. (1968). Theory of measurement. Würzburg: Physica-Verlag.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: MESA Press.
Russell, B. (1903). Principles of mathematics. Cambridge: Cambridge University Press.
Scheiblechner, H. (1995). Isotonic ordinal probabilistic models (ISOP). Psychometrika, 60, 281-304.
Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic models (ADISOP). Psychometrika, 64, 295-316.
Schorske, C.E. (1997). The new rigorism in the human sciences, 1940-1960. In T. Bender & C.E. Schorske (Eds.), American academic culture in transformation: Fifty years, four disciplines (pp. 309-329). Princeton: Princeton University Press.
Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In S.S. Stevens (Ed.), Handbook of experimental psychology (pp. 1-49). New York: Wiley.
Stevens, S.S. (1959). Measurement, psychophysics and utility. In C.W. Churchman & P. Ratoosh (Eds.), Measurement: definitions and theories (pp. 18-63). New York: Wiley.
Stevens, S.S. (1968). Measurement, statistics, and the schemapiric view. Science, 161, 849-856.
Stevens, S.S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: Wiley.
Suppes, P. (1951). A set of independent axioms for extensive quantities. Portugaliae Mathematica, 10, 163-172.
Suppes, P., & Zinnes, J. (1963). Basic measurement theory. In R.D. Luce, R.R. Bush & E. Galanter (Eds.), Handbook of mathematical psychology: Vol. 1 (pp. 1-76). New York: Wiley.
Titchener, E.B. (1905). Experimental psychology: a manual for laboratory practice. London: Macmillan.
Whitley, B.E. (1996). Principles of research in behavioral science. Mountain View, CA: Mayfield Publishing Co.
Wright, B.D. (1999). Fundamental measurement for psychology. In S.E. Embretson & S.L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 65-104). Mahwah, NJ: Erlbaum.
