
Person. individ. Diff. Vol. 8, No. 2, pp. 281-283, 1987
Printed in Great Britain
0191-8869/87 $3.00 + 0.00
Pergamon Journals Ltd

On the history of rating scales

PAUL MCREYNOLDS and KLAUS LUDWIG


Department of Psychology, University of Nevada-Reno, Reno, NV 89557, U.S.A.

(Received 8 April 1986)

Summary: Rating scales constitute one of the most widely employed techniques in research on personality
and individual differences. The historical background of rating scales is therefore a matter of considerable
interest. Though Galton has generally been given credit for originating rating scale methodology, several
applications of rating scales prior to Galton can be identified, and the seminal idea of rating scales can
be traced back to Galen.

INTRODUCTION
The use of rating scales as a means of quantifying subjective variables constitutes one of the more important and widely
used measurement techniques in contemporary personology. Indeed, rating scales of one kind or another are employed on
occasion in almost all fields of psychology, and a sophisticated literature (e.g. Guilford, 1954; Nunnally, 1967; Saal, Downey
and Lahey, 1980) has been generated concerning their applications.
For many years it was part of the general psychological wisdom that Francis Galton, the pioneering student of individual
differences, was the first person to devise and apply rating scales. Thus, Garrett and Schneck, in 1933, wrote that
“Historically, the rating scale goes back to Francis Galton, who in 1883 published a scale for rating the clearness of one’s
mental imagery” (p. 103). Guilford, in the first edition of his classic Psychometric Methods (1936), assigned the first
psychological rating scale to Galton, but this attribution was removed from the second edition (1954), presumably because
in the interim Ellson and Ellson (1953) had reported evidence that the British educator and utopian reformer Robert Owen
(1771-1858) had devised what could be considered a rating scale in 1826, and possibly earlier. More recently, McReynolds
and Ludwig (1984) have discussed the construction and application of psychological rating scales, as early as 1692, by the
German Enlightenment philosopher Christian Thomasius (1655-1728).
The purpose of this note is to briefly trace the development of rating scales from the earliest period up to the time in
which their use in psychology became relatively commonplace. Though it seems certain that Thomasius originated the idea
of using numbers to represent the intensity of the kind of personal dimensions that we customarily think of as psychological,
he was not the first to employ a numerical scale to reflect the subjective intensity of a variable. It appears that the practice
of doing this, at least in a crude way, goes well back into history, specifically to the Greco-Roman physician Galen (second
century A.D.).

EARLY HISTORY
The dichotomies of hot-cold and wet-dry were central concepts in ancient, pre-Galenic humorology, but it was evidently
Galen who recognized the need, with respect to the hot-cold dimension, for some kind of standard. This standard, or neutral
value, he suggested, should be the temperature, as reflected in direct sense-perception, of a mixture of equal quantities of
boiling water and ice (Taylor, 1942). Further, Galen proposed a convention of four degrees of heat and four degrees of
cold, on either side of the standard, that could be induced in patients by various drugs. Though not explicitly described
as such, this system amounts, by implication, to at least the nascent notion of a rating scale composed of nine points
(four degrees on each side of the neutral standard, plus the standard itself).
Galen’s interest in a hot-cold scale derived from his concern, as a physician, with body temperature. Though his efforts
to devise a rating scale for temperature were rudimentary, later physicians developed his theme to the extent that by the
seventeenth century the concept of degrees of hot and cold, as judged by physicians, was common in the medical literature.
Further, there was some concern about the logical nature of the scale. Thus, the ninth-century Islamic physician
Al-Kindi, and here we quote Taylor, “exercised his ingenuity on the question of whether the successive degrees of heat
and cold were equal, and, if not, in what numerical ratio they stood” (p. 130). In 1578 the Bern physician Johann Hasler
explicitly proposed and illustrated a nine-point subjective temperature scale.*
These various proposals, it should be noted, constituted true rating scales in that they depended upon psychological
judgments, in this case judgments based on sense perceptions, for the assignment of numerals to particular variables. They
differed, of course, from strictly psychological scales in that they were concerned with the measurement of a physical variable
(temperature).
Rating scales for temperature in its physical sense quickly became obsolete after the development of the thermometer.
It is, however, possible to conceive of temperature in its directly felt or perceived sense: how warm or cold does one feel?
And it is quite possible for individuals to rate such subjective temperature, not as a way of estimating the concomitant
physical temperature, but as a psychological variable in its own right. A number of studies on subjective temperature, as
well as on related meteorological topics such as wind velocity, were carried out in the late nineteenth and early twentieth
centuries. These studies were reviewed, in 1909, by Titchener, who was interested in their relevance to psychophysical
judgments. Titchener's review reports that in about 1805 Admiral Beaufort of the British Navy adopted a scale running
from 0 to 12 for recording subjective evaluations of wind strength (0 = calm; 12 = hurricane); that J. W. Osborn, in 1876,
collected data in which observers rated subjective climatic temperature on a scale of 1-20 (e.g. 1 = unbearably cold;
20 = intolerably hot); and that W. F. Tyler, in 1904, employed an 11-point scale (0-10) for observers to rate the degree of
comfort-discomfort of climate. Several other instances of numerically ordered subjective scales were also reported by
Titchener.

*Our account of Galen’s conception of a temperature scale, and of the contributions of later physicians, is based primarily
on Taylor (1942), Sambursky (1962) and Middleton (1966). Galen’s account of a temperature standard is found in his
De Temperamentis, Book 1. An English translation of the relevant portion can be found in Sambursky (1962, p. 40). Hasler’s
illustration of his temperature scale is reproduced in Taylor (1942, p. 131) and Middleton (1966, p. 2).

NOTES AND SHORTER COMMUNICATIONS
As noted above, it was Christian Thomasius (1692a, 1692b; McReynolds and Ludwig, 1984) who devised and applied
the first scales for rating psychological variables. Thomasius’s rating scales were an integral part of his overall theory of
personality, which posited four major dimensions: sensuousness, acquisitiveness, social ambition, and rational love. Each
dimension (Latin passiones, which can perhaps best be translated as “inclinations”) was assessed for a given individual
by judges using a 60-point scale. In practice, this amounted to 12-point scales, since Thomasius never used the zero point,
and only five-point intervals (e.g. 20, 25, 30) were employed. Thomasius published (1692b) numerical data, including
reliability data, on each dimension for five actual individuals as rated by himself and others. This work appears to
constitute the first systematic collection and analysis of quantitative empirical data in the entire history of psychology.
Thomasius’s conception, which antedated Galton’s rating scale by 189 years, was a strikingly innovative achievement,* but
it appears to have had little direct effect on the history of psychology. Thus, none of Thomasius’s students or philosophical
successors carried on his quantitative approach or developed it further. Two interrelated reasons can be offered for the
relative lack of influence of Thomasius’s invention of rating scales. First, it is probable that Thomasius himself did not
fully realize the broader significance of his methodology; and second, the technique was clearly far ahead of its time, by
which we mean that in Thomasius’s period the strain of thought that later was to become specialized as psychology had
not yet attained the degree of systematization and clarification that would permit it to incorporate and utilize objective
measurement technology.

MODERN HISTORY

The first use of rating scales after Thomasius appears to have been that by Robert Owen, referred to earlier in this paper.
Owen, a British reformer especially interested in the education of the young, in 1825 purchased 30,000 acres in Indiana
for an idealistic community, which he named New Harmony. It was there, presumably, that he developed his rating
system. This system was called to the attention of psychologists by Ellson and Ellson (1953). Their account was based on
a report in the travel journal of Karl Bernhard, Duke of Sachsen-Weimar-Eisenach (1828), who visited New Harmony in
1826. Bernhard (1828, Vol. 2) wrote that Owen showed him a metal plate, about 7 × 12 inches in size, according to which
“each child could be shown his capabilities, and upon which, after a mature self-examination, he can himself discover what
progress he has made” (p. 12). The plate had 10 scales, each marked off in 100 parts, and labeled, respectively,
Self-attachment, Affections, Judgment, Imagination, Memory, Reflection, Perception, Excitability, Courage and Strength.
A system of sliding markers was provided so that a child’s judged position on each scale could be graphically displayed.
This interesting apparatus was apparently designed by Owen solely for use as a pedagogical device. So far as is known,
it was used exclusively at the New Harmony colony,† and a search of Owen’s writings indicates that he never mentioned
it in print. It is, however, conceivable that Bernhard’s description of the apparatus, published (in English) in 1828,
stimulated later uses of rating scales.
If the method of Robert Owen represented a small, isolated instance of a rating scale, those of certain phrenologists,
developed a little later, were fairly widespread. Though phrenology was never accepted by scientific psychology,
it nevertheless had a significant influence on the popular culture of nineteenth-century Britain and America. Practicing
phrenologists, in order to evaluate and report the extent to which an individual was characterized by each of 37 postulated
characterological traits (e.g. Amativeness, Acquisitiveness, Benevolence), developed an extensive system of rating scales
(Bakan, 1966; McReynolds, 1975). Though it is not clear precisely when, or under what circumstances, phrenologists first
employed rating scales, it is evident that such devices were in widespread use before the mid-nineteenth century. One of
the leaders in this practice was O. S. Fowler. In his Practical Phrenology (1851) Fowler described the use of seven-point
rating scales in the following words:
“The proportionate size of the phrenological organs of the individual examined, and, consequently, the relative power
and energy of his primary mental powers; that is, his moral and intellectual character and manifestations, will be indicated
by the written figures 1, 2, 3, 4, 5, 6, 7: figure 1 signifying VERY SMALL; 2, SMALL; 3, MODERATE; 4, AVERAGE;
5, FULL; 6, LARGE; 7, VERY LARGE.” (p. 4)
Detailed definitions were provided for each category, and the rater was given the opportunity of indicating values between
adjacent points by the use of + and - signs. Fowler published ratings of 40 individuals, mostly well-known figures, on
each of the scales (pp. 344-348).

*While it is not known how Thomasius came upon the idea of a quantitative rating scale for psychological variables, it
is possible that his conception was developed through analogy with thermometers, which, though not yet standardized,
were by that time well known to scholars. The 60-point scale may have been adopted by analogy with mechanical
clocks, which were then widely disseminated throughout Europe. Thomasius’s invention of rating scales is also noted
in Ramul (1960) and McReynolds (1975).
†The apparatus, though badly discolored by fire at some earlier indeterminate time, and no longer intact, still exists, and
is on view in the museum in the library at New Harmony. According to Librarian Mary Olive Cook (personal
communication, 4 March 1983), “The apparatus appears to be made of brass. There are 20 strips 3/8 inches wide.
Every other strip has a ‘quality’ [indication] and markings from 0 to 100. These strips are fastened to a solid piece of
brass.”

The next application of the rating scale method, so far as we have been able to ascertain, was carried out by George
Fisher at Greenwich Hospital School in England, prior to 1864 (Chadwick, 1864). In this development, which was
earlier noted by E. L. Thorndike (1910; see also Ayres, 1918), teachers rated the performances of students in various
subjects, including writing, spelling and mathematics, on a scale of 1-5. The most interesting aspect of this rating system
was that Fisher provided a ‘Scale Book’, which included specimens of performance for each of the rated categories,
as a means of increasing rater reliability.
The person responsible for introducing rating scale methodology into mainstream psychology, and the man usually (but,
as we have seen, erroneously) considered to have been the first in history to have developed rating scales, was Francis
Galton. In 1880 Galton, in an important paper on mental imagery, reported data gathered by questionnaire from 100 male
adults and 172 boys. The raw data were descriptions of imagery in specific situations as written by the subjects. Galton
classified (i.e. rated) these responses in terms of seven categories, with an additional category at each extreme for
exceptional cases, making, in effect, a nine-point scale. An important feature of Galton’s scale was that the seven
basic points were designed to mark off equal intervals, based on statistical characteristics of normal distributions.
It is interesting to note that Galton, before becoming interested in psychological questions, was deeply involved in, and
an important contributor to, the developing science of meteorology. It is thus possible, as Guilford (1936) observed, that
Galton’s application of rating scale methods to psychology was suggested by experience he may have had with their use
in meteorology.
Galton’s prominence helped to make the idea of rating scales a familiar one in the rapidly developing discipline of
psychology, particularly to those psychologists primarily interested in the study of individual differences. Galton’s chief
student, Karl Pearson (1907), was apparently the first investigator to employ ratings (he devised a seven-point scale) in
research on intelligence. Other applications early in this century were reviewed by Guilford (1954). By 1921-22, when Rugg
published a four-part paper on rating scale methodology, the rating scale technique had become a standard tool in
psychological research, a position that it continues to enjoy.

REFERENCES

Ayres L. P. (1918) History and present status of educational measurements. Seventeenth Yearbook, National Society for
the Study of Education, Part II, pp. 9-15. Pub. School Pub. Co., Bloomington, IL.
Bakan D. (1966) The influence of phrenology on American psychology. J. hist. behav. Sci. 2, 200-220.
Bernhard K. (1828) Travels through North America, during the years 1825 and 1826, 2 vols. Carey, Lea & Carey,
Philadelphia.
Chadwick E. B. (1864) Statistics of educational results. The Museum, Q. Mag. Educ., Lit. Sci. 3, 48&484.
Ellson D. G. and Ellson E. C. (1953) Historical note on the rating scale. Psychol. Bull. 50, 383-384.
Fowler O. S. (1851) Practical Phrenology. Fowler & Wells, New York.
Galton F. (1880) Statistics of mental imagery. Mind 19, 301-318.
Garrett H. E. and Schneck M. R. (1933) Psychological Tests. Harper, New York.
Guilford J. P. (1936) Psychometric Methods. McGraw-Hill, New York.
Guilford J. P. (1954) Psychometric Methods, 2nd edn. McGraw-Hill, New York.
Hasler J. (1578) De Logistica Medica. Valentinus Schonik, Augustae.
McReynolds P. (1975) Historical antecedents of personality assessment. In Advances in Psychological Assessment; Vol. 3,
(Edited by McReynolds P.), pp. 477-532. Jossey-Bass, San Francisco.
McReynolds P. and Ludwig K. (1984) Christian Thomasius and the origin of psychological rating scales. Isis 75, 546-553.
Middleton W. E. K. (1966) A History of the Thermometer and its Use in Meteorology. Johns Hopkins, Baltimore.
Nunnally J. (1967) Psychometric Theory. McGraw-Hill, New York.
Pearson K. (1907) On the relationship of intelligence to size and shape of head. Biometrika 5, 106-146.
Ramul K. (1960) The problem of measurement in the psychology of the eighteenth century. Am. Psychol. 15, 256-265.
Rugg H. O. (1921, 1922) Is the rating of human character practicable? J. educ. Psychol. 12, 425-438, 485-501; 13, 30-42,
81-93.
Saal F. E., Downey R. G. and Lahey M. A. (1980) Rating the ratings: assessing the psychometric quality of rating data.
Psychol. Bull. 88, 413-428.
Sambursky S. (1962) The Physical World of Late Antiquity. Basic Books, New York.
Taylor F. S. (1942) The origin of the thermometer. Ann. Sci. 5, 129-156.
Thomasius C. (1692a) Die neue Erfindung einer wohlgegründeten und für das gemeine Wesen höchstnöthigen Wissenschaft
das Verborgene des Herzens anderer Menschen auch wider ihren Willen aus der täglichen Conversation zu erkennen.
Christoph Salfeld, Halle.
Thomasius C. (1692b) Weitere Erleuterung durch unterschiedene Exempel des ohnelängst gethanen Vorschlags wegen der
neuen Wissenschaft anderer Menschen Gemüther erkennen zu lernen. Christoph Salfeld, Halle.
Thorndike E. L. (1910) Educational measurements of fifty years ago. J. educ. Psychol. 4, 551-552.
Titchener E. B. (1909) The psychophysics of climate. Am. J. Psychol. 20, 1-14.
Wylie A. T. (1922) A brief history of mental tests. Teachers Coll. Rec. 23, 19-33.
