
Measurement and Scaling: Noncomparative Scaling Techniques

Overview

As discussed in Chapter 8, scaling techniques are classified as comparative or noncomparative.
The comparative techniques, consisting of paired comparison, rank order, constant
sum, and Q-sort scaling, were discussed in the last chapter. The subject of this chapter is
noncomparative techniques, which comprise continuous and itemized rating scales.
We discuss the popular itemized rating scales, the Likert, semantic differential, and Stapel
scales, as well as the construction of multi-item rating scales. We show how scaling tech-
niques should be evaluated in terms of reliability and validity and consider how the
researcher selects a particular scaling technique. Mathematically derived scales are also
presented. The considerations involved in implementing noncomparative scales when
researching international markets are discussed. Several ethical issues that arise in rating
scale construction are identified. The chapter also discusses the use of the Internet and
computers in developing continuous and itemized rating scales.

Real Research New York City Transit in Transit

The New York City Transit (NYCT) (www.mta.nyc.ny.us/nyct/subway) does not have a wholly captive
audience, as some people believe. Many people do not use the mass transit system when they have a choice.
A much needed rate hike brought fears that more people would avoid taking the bus or subway. Therefore,
research was undertaken to uncover ways to increase ridership.
In a telephone survey, respondents were asked to rate different aspects of the transit system using
five-point Likert scales. Likert scales were chosen because they are easy to administer over the
telephone and the respondents merely indicate their degree of (dis)agreement (1 = strongly disagree,
5 = strongly agree).
The results showed that personal safety was the major concern on subways. New Yorkers were afraid to
use a subway station in their own neighborhoods. NYCT was able to respond to riders' concerns by increasing
police presence, having a more visible NYCT staff, increasing lighting, and repositioning walls, columns, and
stairways for better visibility throughout the station.
Telephone surveys also revealed that cleanliness of subway stations and subway cars is related to the
perception of crime. In response, NYCT was able to concentrate more on ways to maintain a cleaner
appearance. Action was also taken to reduce the number of homeless people and panhandlers. They are
asked to leave, and sometimes transportation to shelters is provided.
Results of marketing research efforts have helped NYCT improve perceptions surrounding the system,
leading to increased ridership. As of 2008, the New York subway system had 468 stations, the largest
number of public transit subway stations for any system in the world. 1 •

Noncomparative Scaling Techniques


noncomparative scale
One of two types of scaling techniques in which each stimulus object is scaled independently of the other objects in the stimulus set.

Respondents using a noncomparative scale employ whatever rating standard seems appropriate
to them. They do not compare the object being rated either to another object or to some specified
standard, such as "your ideal brand." They evaluate only one object at a time, and for this reason
noncomparative scales are often referred to as monadic scales. Noncomparative techniques
consist of continuous and itemized rating scales, which are described in Table 9.1 and discussed
in the following sections.

306 PART II • RESEARCH DESIGN FORMULATION

Using Likert scales, the New York City Transit was able to determine people's perceptions of the subway system and address their concerns, leading to increased ridership.

TABLE 9.1
Basic Noncomparative Scales

Scale                    Basic Characteristics           Examples                 Advantages                Disadvantages

Continuous rating scale  Place a mark on a               Reaction to TV           Easy to construct         Scoring can be cumbersome
                         continuous line                 commercials                                        unless computerized

Itemized Rating Scales
Likert scale             Degree of agreement on a        Measurement of           Easy to construct,        More time consuming
                         1 (strongly disagree) to        attitudes                administer, and
                         5 (strongly agree) scale                                 understand
Semantic differential    Seven-point scale with          Brand, product, and      Versatile                 Controversy as to whether
                         bipolar labels                  company images                                     the data are interval
Stapel scale             Unipolar ten-point scale,       Measurement of           Easy to construct;        Confusing and difficult
                         -5 to +5, without a             attitudes and images     administered over         to apply
                         neutral point (zero)                                     telephone

Continuous Rating Scale

continuous rating scale
Also referred to as a graphic rating scale, this measurement scale has the respondents rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other.

In a continuous rating scale, also referred to as a graphic rating scale, respondents rate the
objects by placing a mark at the appropriate position on a line that runs from one extreme of the
criterion variable to the other. Thus, the respondents are not restricted to selecting from marks
previously set by the researcher. The form of the continuous scale may vary considerably. For
example, the line may be vertical or horizontal; scale points, in the form of numbers or brief
descriptions, may be provided; and, if provided, the scale points may be few or many. Three
versions of a continuous rating scale are illustrated.

Project Research Continuous Rating Scales

How would you rate Sears as a department store?

Version 1
Probably the worst ---------------I----------------------------------- Probably the best

Version 2
Probably the worst ---------------I----------------------------------- Probably the best
                   0    10   20   30   40   50   60   70   80   90   100

Version 3
                   Very bad           Neither good             Very good
                                        nor bad
Probably the worst ---------------I----------------------------------- Probably the best
                   0    10   20   30   40   50   60   70   80   90   100 •

Once the respondent has provided the ratings, the researcher divides the line into as many
categories as desired and assigns scores based on the categories into which the ratings fall. In the
department store project example, the respondent exhibits an unfavorable attitude toward Sears.
These scores are typically treated as interval data. Thus, continuous scales possess the character-
istics of description, order, and distance, as discussed in Chapter 8.
The advantage of continuous scales is that they are easy to construct. However, scoring is
cumbersome and unreliable. Moreover, continuous scales provide little new information. Hence,
their use in marketing research has been limited. Recently, however, with the increased popularity of
computer-assisted personal interviewing (CAPI), Internet surveys, and other technologies, their use
is becoming more frequent. Continuous rating scales can be easily implemented in CAPI or on the
Internet. The cursor can be moved on the screen in a continuous fashion to select the exact position
on the scale that best describes the respondent's evaluation. Moreover, the scale values can be
automatically scored by the computer, thus increasing the speed and accuracy of processing the data.
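The scoring step described above, in which the researcher divides the line into categories and assigns scores, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mark position is taken to be already measured on a 0-to-100 line, the categories are assumed to be of equal width, and the function name `score_mark` and the sample marks are hypothetical.

```python
def score_mark(position, n_categories=5, line_min=0.0, line_max=100.0):
    """Map a mark on a continuous line to a category score from 1 to n_categories.

    Hypothetical helper: assumes equal-width categories along the line.
    """
    if not line_min <= position <= line_max:
        raise ValueError("mark lies outside the rating line")
    width = (line_max - line_min) / n_categories
    # Floor division picks the category; the top endpoint falls in the last one.
    category = int((position - line_min) // width) + 1
    return min(category, n_categories)


# Illustrative mark positions (made up for this sketch).
marks = [12.5, 48.0, 100.0]
scores = [score_mark(m) for m in marks]
print(scores)
```

Changing `n_categories` shows how the same marks can be rescored into as many categories as the researcher desires, which is the flexibility the continuous scale offers.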

Real Research Continuous Measurement and Analysis of Perceptions: The Perception Analyzer

The Perception Analyzer (www.perceptionanalyzer.com) by MSinteractive is a computer-supported,
interactive feedback system composed of wireless or wired handheld dials for each participant, a console
(computer interface), and special software that edits questions, collects data, and analyzes participant
responses. Members of focus groups use it to record their emotional response to television commercials,
instantly and continuously. Each participant is given a dial and instructed to continuously record his or her
reaction to the material being tested. As the respondents turn the dials, the information is fed to a computer.
Thus, the researcher can determine the second-by-second response of the respondents as the commercial is
run. Furthermore, this response can be superimposed on the commercial to see the respondents' reactions to
the various frames and parts of the commercial.
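A second-by-second response curve of the kind described here could be computed along the following lines. This is a rough sketch, not the Perception Analyzer's actual software: the data layout (one dial value per participant per second), the 0-to-100 dial range, the function name `response_trace`, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean


def response_trace(readings):
    """Average dial readings per second across participants.

    `readings` maps a participant id to a list of dial values, one per second.
    Returns a list with the mean dial value for each second.
    """
    per_second = zip(*readings.values())  # regroup the values by second
    return [mean(values) for values in per_second]


# Made-up dial values (0-100) for three participants over four seconds.
readings = {
    "p1": [50, 60, 70, 40],
    "p2": [50, 80, 90, 20],
    "p3": [50, 70, 80, 30],
}
trace = response_trace(readings)
peak_second = trace.index(max(trace))  # second with the strongest average response
print(trace, peak_second)
```

The resulting trace is what would be superimposed on the commercial's timeline to see which frames drew the strongest reaction.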
The analyzer was recently used to measure responses to a series of "slice-of-life" commercials for
McDonald's. The researchers found that mothers and daughters had different responses to different aspects
of the commercial. Using the emotional response data, the researchers could determine which commercial
had the greatest emotional appeal across mother-daughter segments. McDonald's marketing efforts proved
successful with 2008 revenues of $23.52 billion. 2 •

Companies such as McDonald's have used the Perception Analyzer to measure consumers' reactions to commercials, company videos, and other audio/visual materials.

ACTIVE RESEARCH

Developing Hit Movies: Not a Mickey Mouse Business


Visit www.disney.com and search the Internet using a search engine as well as your library's online
databases to obtain information on consumer movie viewing habits and preferences.
How would you measure audience reaction to a new movie slated for release by the Walt Disney
Company?
As the marketing director for Disney movies, how would you develop "hit" movies?

Itemized Rating Scales

itemized rating scale
A measurement scale having numbers and/or brief descriptions associated with each category. The categories are ordered in terms of scale position.

In an itemized rating scale, the respondents are provided with a scale that has a number or brief
description associated with each category. The categories are ordered in terms of scale position,
and the respondents are required to select the specified category that best describes the object
being rated. Itemized rating scales are widely used in marketing research and form the basic
components of more complex scales, such as multi-item rating scales. We first describe the
commonly used itemized rating scales, the Likert, semantic differential, and Stapel scales, and then
examine the major issues surrounding the use of these scales.

Likert Scale

Likert scale
A measurement scale with five response categories ranging from "strongly disagree" to "strongly agree," which requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements related to the stimulus objects.

Named after its developer, Rensis Likert, the Likert scale is a widely used rating scale that
requires the respondents to indicate the degree of agreement or disagreement with each of a
series of statements about the stimulus objects. 3 Typically, each scale item has five response
categories, ranging from "strongly disagree" to "strongly agree." We illustrate with a Likert scale
for evaluating attitudes toward Sears in the context of the department store project.

Project Research Likert Scale

Instructions
Listed here are different opinions about Sears. Please indicate how strongly you agree or disagree with each
by using the following scale:

1 = Strongly disagree
2 = Disagree
3 = Neither agree nor disagree
4 = Agree
5 = Strongly agree

                                                                          Neither
                                                    Strongly              agree nor             Strongly
                                                    disagree   Disagree   disagree   Agree      agree
1. Sears sells high-quality merchandise.            1          2X         3          4          5
2. Sears has poor in-store service.                 1          2X         3          4          5
3. I like to shop at Sears.                         1          2          3X         4          5
4. Sears does not offer a good mix of different     1          2          3          4X         5
   brands within a product category.
5. The credit policies at Sears are terrible.       1          2          3          4X         5
6. Sears is where America shops.                    1X         2          3          4          5
7. I do not like the advertising done by Sears.     1          2          3          4X         5
8. Sears sells a wide variety of merchandise.       1          2          3          4X         5
9. Sears charges fair prices.                       1          2X         3          4          5 •

The data are typically treated as interval. Thus, the Likert scale possesses the characteristics of
description, order, and distance. To conduct the analysis, each statement is assigned a numerical
score, ranging either from -2 to +2 or 1 to 5. The analysis can be conducted on an item-by-item
basis (profile analysis), or a total (summated) score can be calculated for each respondent by
summing across items. Suppose the Likert scale in the department store example was used to
measure attitudes toward Sears as well as JCPenney. Profile analysis would involve comparing the
two stores in terms of the average respondent ratings for each item, such as quality of merchandise,
in-store service, and brand mix. The summated approach is most frequently used and, as a result,
the Likert scale is also referred to as a summated scale. 4 When using this approach to determine
the total score for each respondent on each store, it is important to use a consistent scoring proce-
dure so that a high (or low) score consistently reflects a favorable response. This requires that the
categories assigned to the negative statements by the respondents be scored by reversing the scale
when analyzing the data. Note that for a negative statement, an agreement reflects an unfavorable
response, whereas for a positive statement, agreement represents a favorable response.
Accordingly, a "strongly agree" response to a favorable statement and a "strongly disagree"
response to an unfavorable statement would both receive scores of 5. In the scale shown here, if a
higher score is to denote a more favorable attitude, the scoring of items 2, 4, 5, and 7 will be
reversed. Thus, the respondent in the department store project example has an attitude score of 22.
The reason for having both positive and negative statements is to control the tendency of some
respondents to mark one or the other end of the scale without reading the items. Each respondent's
total score for each store is calculated. A respondent will have the most favorable attitude toward
the store with the highest score. The procedure for developing summated Likert scales is described
later in the section on multi-item scales.
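The summated scoring with reversal of negative statements can be checked with a short Python sketch. The responses are those marked by the respondent in the department store example (items 1 to 9), and items 2, 4, 5, and 7 are reversed as stated in the text; the variable and function names are illustrative choices, not part of the original procedure.

```python
# Responses of the respondent in the department store example (items 1-9).
responses = {1: 2, 2: 2, 3: 3, 4: 4, 5: 4, 6: 1, 7: 4, 8: 4, 9: 2}
NEGATIVE_ITEMS = {2, 4, 5, 7}  # unfavorable statements, scored in reverse
SCALE_MIN, SCALE_MAX = 1, 5


def item_score(item, response):
    """Return the item score so that a higher value is always more favorable."""
    if item in NEGATIVE_ITEMS:
        return SCALE_MIN + SCALE_MAX - response  # 1<->5, 2<->4, 3 stays 3
    return response


def summated_score(responses):
    """Sum the (possibly reversed) item scores for one respondent."""
    return sum(item_score(item, r) for item, r in responses.items())


print(summated_score(responses))  # 22, the attitude score reported in the text
```

Without the reversal the raw responses would sum to 26, which would mix favorable and unfavorable agreement in a single total and could not be interpreted as an attitude score.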
The Likert scale has several advantages. It is easy to construct and administer. Respondents
readily understand how to use the scale, making it suitable for mail, telephone, personal or elec-
tronic interviews. Therefore, this scale was used in the NYCT telephone survey in the opening
example. The major disadvantage of the Likert scale is that it takes longer to complete than other
itemized rating scales because respondents have to read each statement. Sometimes, it may be
difficult to interpret the response to a Likert item, especially if it is an unfavorable statement.
In our example, the respondent disagrees with statement number 2 that Sears has poor in-store
service. In reversing the score of this item prior to summing, it is assumed that this respondent
would agree with the statement that Sears has good in-store service. This, however, may not be
true; the disagreement merely indicates that the respondent would not make statement number 2.
The following example shows another use of a Likert scale in marketing research.

Real Research How Concerned Are You About Your Online Privacy?

In spite of the enormous potential of e-commerce, its share of the total economy
still remains small: less than 3 percent worldwide as of 2009. The lack of consumer confidence in online
privacy is a major problem hampering the growth of e-commerce. A recent report showed that practically
all Americans (94.5 percent), including Internet users and non-Internet users, are concerned about "the
privacy of their personal information when or if they buy online." Therefore, the author and his colleagues
have developed a scale for measuring Internet users' information privacy concerns. This is a 10-item,
three-dimensional scale. The three dimensions are control, awareness, and collection. Each of the 10 items is
scored on a 7-point Likert-type agree-disagree scale. The scale has been shown to have good reliability and
validity. This scale should enable online marketers and policy makers to measure and address Internet
users' information privacy concerns, which should result in increased e-commerce. 5 Due to space
constraints we show only the items used to measure awareness.

Awareness (of Privacy Practices)


We used 7-point scales anchored with "strongly disagree" and "strongly agree."

1. Companies seeking information online should disclose the way the data are collected, processed,
and used.
2. A good consumer online privacy policy should have a clear and conspicuous disclosure.
3. It is very important to me that I am aware and knowledgeable about how my personal information
will be used. •

Semantic Differential Scale


semantic differential
A 7-point rating scale with endpoints associated with bipolar labels that have semantic meaning.

The semantic differential is a 7-point rating scale with endpoints associated with bipolar labels
that have semantic meaning. In a typical application, respondents rate objects on a number of
itemized, 7-point rating scales bounded at each end by one of two bipolar adjectives, such as
"cold" and "warm." 6 We illustrate this scale by presenting a respondent's evaluation of Sears on
five attributes.
The respondents mark the blank that best indicates how they would describe the object being
rated. 7 Thus, in our example, Sears is evaluated as somewhat weak, reliable, very old-fashioned,
warm, and careful. The negative adjective or phrase sometimes appears at the left side of the
scale and sometimes at the right. This controls the tendency of some respondents, particularly
those with very positive or very negative attitudes, to mark the right- or left-hand sides without
reading the labels. The methods for selecting the scale labels and constructing a semantic differential
scale have been described elsewhere by the author. A general semantic differential scale
for measuring self-concepts, person concepts, and product concepts is shown.

Project Research Semantic Differential Scale

Instructions
This part of the study measures what certain department stores mean to you by having you judge them on a
series of descriptive scales bounded at each end by one of two bipolar adjectives. Please mark (X) the blank
that best indicates how accurately one or the other adjective describes what the store means to you. Please
be sure to mark every scale; do not omit any scale.

Form
Sears is:
Powerful     ___:___:___:___:_X_:___:___  Weak
Unreliable   ___:___:___:___:___:_X_:___  Reliable
Modern       ___:___:___:___:___:___:_X_  Old-fashioned
Cold         ___:___:___:___:___:_X_:___  Warm
Careful      ___:_X_:___:___:___:___:___  Careless •

Real Research A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts 8

 1. Rugged          ___:___:___:___:___:___:___  Delicate
 2. Excitable       ___:___:___:___:___:___:___  Calm
 3. Uncomfortable   ___:___:___:___:___:___:___  Comfortable
 4. Dominating      ___:___:___:___:___:___:___  Submissive
 5. Thrifty         ___:___:___:___:___:___:___  Indulgent
 6. Pleasant        ___:___:___:___:___:___:___  Unpleasant
 7. Contemporary    ___:___:___:___:___:___:___  Noncontemporary
 8. Organized       ___:___:___:___:___:___:___  Unorganized
 9. Rational        ___:___:___:___:___:___:___  Emotional
10. Youthful        ___:___:___:___:___:___:___  Mature
11. Formal          ___:___:___:___:___:___:___  Informal
12. Orthodox        ___:___:___:___:___:___:___  Liberal
13. Complex         ___:___:___:___:___:___:___  Simple
14. Colorless       ___:___:___:___:___:___:___  Colorful
15. Modest          ___:___:___:___:___:___:___  Vain

Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1-to-7
scale. The resulting data are commonly analyzed through profile analysis. In profile analysis,
means or median values on each rating scale are calculated and compared by plotting or statistical
analysis. This helps determine the overall differences and similarities among the objects.
To assess differences across segments of respondents, the researcher can compare mean
responses of different segments. Although the mean is most often used as a summary statistic,
there is some controversy as to whether the data obtained should be treated as an interval scale. 9
On the other hand, in cases when the researcher requires an overall comparison of objects, such
as to determine store preference, the individual item scores are summed to arrive at a total
score. As in the case of the Likert scale, the scores for the negative items are reversed before
summing.
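The reversal and summing of semantic differential items can be sketched as follows, using the marks described for Sears in the project example (positions counted from 1 at the leftmost blank to 7 at the rightmost). Which adjective of each pair counts as favorable is an assumption made for this sketch, as are the names `FAVORABLE` and `favorable_score`.

```python
# Marked positions (1 = leftmost blank, 7 = rightmost) from the Sears example.
sears = {
    ("Powerful", "Weak"): 5,
    ("Unreliable", "Reliable"): 6,
    ("Modern", "Old-fashioned"): 7,
    ("Cold", "Warm"): 6,
    ("Careful", "Careless"): 2,
}
# Assumption for this sketch: these are the favorable adjectives.
FAVORABLE = {"Powerful", "Reliable", "Modern", "Warm", "Careful"}


def favorable_score(pair, position, points=7):
    """Score an item so that the favorable end is always the high end."""
    left, _right = pair
    if left in FAVORABLE:  # favorable adjective printed on the left: reverse
        return points + 1 - position
    return position


scores = {pair: favorable_score(pair, pos) for pair, pos in sears.items()}
total = sum(scores.values())
print(scores, total)
```

For profile analysis the same per-item scores would be averaged across respondents for each store rather than summed within a respondent.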
Its versatility makes the semantic differential a popular rating scale in marketing research. It
has been widely used in comparing brand, product, and company images. It has also been used to
develop advertising and promotion strategies and in new product development studies. 10 Several
modifications of the basic scale have been proposed.

Stapel Scale
Stapel scale
A scale for measuring attitudes that consists of a single adjective in the middle of an even-numbered range of values, from -5 to +5, without a neutral point (zero).

The Stapel scale, named after its developer, Jan Stapel, is a unipolar rating scale with 10 categories
numbered from -5 to +5, without a neutral point (zero). 11 This scale is usually
presented vertically. Respondents are asked to indicate how accurately or inaccurately each
term describes the object by selecting an appropriate numerical response category. The higher
the number, the more accurately the term describes the object, as shown in the department store
project. In this example, Sears is evaluated as not having high quality and having somewhat
poor service.

Project Research Stapel Scale

Instructions
Please evaluate how accurately each word or phrase describes each of the department stores. Select a plus
number for the phrases you think describe the store accurately. The more accurately you think the phrase
describes the store, the larger the plus number you should choose. You should select a minus number for
phrases you think do not describe it accurately. The less accurately you think the phrase describes the store,
the larger the minus number you should choose. You can select any number, from +5 for phrases you think
are very accurate, to -5 for phrases you think are very inaccurate.

Form

SEARS

      +5                 +5
      +4                 +4
      +3                 +3
      +2                 +2X
      +1                 +1
 HIGH QUALITY       POOR SERVICE
      -1                 -1
      -2X                -2
      -3                 -3
      -4                 -4
      -5                 -5 •

The data obtained by using a Stapel scale are generally treated as interval and can be analyzed in
the same way as semantic differential data. The Stapel scale produces results similar to the
semantic differential. The Stapel scale's advantages are that it does not require a pretest of the
adjectives or phrases to ensure true bipolarity, and it can be administered over the telephone.
However, some researchers believe the Stapel scale is confusing and difficult to apply. Of the
three itemized rating scales considered, the Stapel scale is used least. However, this scale merits
more attention than it has received.

ACTIVE RESEARCH

The Diet Craze: Attitude Toward Diet Soft Drinks


Visit www.dietcoke.com and search the Internet using a search engine as well as your library's online
databases to obtain information on consumers' attitudes toward diet drinks.
As the brand manager for Diet Coke, how would you use information on consumer attitudes to seg-
ment the market?
How would you use each of the three itemized scales to measure consumers' attitudes toward Diet
Coke and other diet soft drinks? Which scale do you recommend?

Noncomparative Itemized Rating Scale Decisions


As is evident from the discussion so far, noncomparative itemized rating scales need not be used
as originally proposed but can take many different forms. The researcher must make six major
decisions when constructing any of these scales.
1. The number of scale categories to use
2. Balanced versus unbalanced scale
3. Odd or even number of categories
4. Forced versus nonforced choice
5. The nature and degree of the verbal description
6. The physical form of the scale

Number of Scale Categories


Two conflicting considerations are involved in deciding the number of scale categories. The
greater the number of scale categories, the finer the discrimination among stimulus objects
that is possible. On the other hand, most respondents cannot handle more than a few cate-
gories. Traditional guidelines suggest that the appropriate number of categories should be
seven plus or minus two: between five and nine. 12 Yet there is no single optimal number of
categories. Several factors should be taken into account in deciding on the number of
categories.
If the respondents are interested in the scaling task and are knowledgeable about the
objects, a larger number of categories may be employed. On the other hand, if the respondents
are not very knowledgeable or involved with the task, fewer categories should be used.
Likewise, the nature of the objects is also relevant. Some objects do not lend themselves to fine
discrimination, so a small number of categories is sufficient. Another important factor is
the mode of data collection. If telephone interviews are involved, many categories may
confuse the respondents. Likewise, space limitations may restrict the number of categories in
mail questionnaires.
How the data are to be analyzed and used should also influence the number of categories. In
situations where several scale items are added together to produce a single score for each respondent,
five categories are sufficient. The same is true if the researcher wishes to make broad generalizations
or group comparisons. If, however, individual responses are of interest or the data will be
analyzed by sophisticated statistical techniques, seven or more categories may be required. The size
of the correlation coefficient, a common measure of relationship between variables (Chapter 17), is
influenced by the number of scale categories. The correlation coefficient decreases with a reduction
in the number of categories. This, in turn, has an impact on all statistical analysis based on the
correlation coefficient.13
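The effect of the number of categories on the correlation coefficient can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the text: two underlying continuous variables are generated with a true correlation of about 0.7, each is collapsed into k equal-width categories, and the Pearson correlation is recomputed. The sample size, the seed, the range of -3 to 3, and the helper names are all arbitrary choices for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def to_categories(value, k, low=-3.0, high=3.0):
    """Collapse a continuous value into one of k equal-width categories (1..k)."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * k)
    return min(index, k - 1) + 1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [rng.gauss(0, 1) for _ in range(5000)]
# y shares variance with x so that the underlying correlation is about 0.7.
y = [0.7 * xi + (1 - 0.7 ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]

continuous_r = pearson(x, y)
results = {}
for k in (2, 3, 5, 7, 9):
    xs_k = [to_categories(v, k) for v in x]
    ys_k = [to_categories(v, k) for v in y]
    results[k] = pearson(xs_k, ys_k)
    print(k, round(results[k], 3))
print("continuous", round(continuous_r, 3))
```

In runs of this kind the correlation computed from two categories is noticeably smaller than the correlation computed from the underlying continuous values, with more categories recovering more of it.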

Balanced Versus Unbalanced Scales


balanced scale
A scale with an equal number of favorable and unfavorable categories.

In a balanced scale, the numbers of favorable and unfavorable categories are equal; in an unbalanced
scale, they are unequal. 14 Examples of balanced and unbalanced scales are given in
Figure 9.1. In general, the scale should be balanced in order to obtain objective data. However,
if the distribution of responses is likely to be skewed, either positively or negatively, an
unbalanced scale with more categories in the direction of skewness may be appropriate. If an
unbalanced scale is used, the nature and degree of unbalance in the scale should be taken into
account in data analysis.

FIGURE 9.1
Balanced and Unbalanced Scales

Balanced Scale              Unbalanced Scale
Jovan Musk for Men is       Jovan Musk for Men is
Extremely good              Extremely good
Very good                   Very good
Good                        Good
Bad                         Somewhat good
Very bad                    Bad
Extremely bad               Very bad

Odd or Even Number of Categories


With an odd number of categories, the middle scale position is generally designated as neutral or
impartial. The presence, position, and labeling of a neutral category can have a significant influence
on the response. The Likert scale is a balanced rating scale with an odd number of categories and a
neutral point. 15
The decision to use an odd or even number of categories depends on whether some of the
respondents may be neutral on the response being measured. If a neutral or indifferent response is
possible from at least some of the respondents, an odd number of categories should be used. If, on
the other hand, the researcher wants to force a response or believes that no neutral or indifferent
response exists, a rating scale with an even number of categories should be used. A related issue is
whether the scale should be forced or nonforced.

Forced Versus Nonforced Scales


forced rating scales
A rating scale that forces the respondents to express an opinion because a "no opinion" or "no knowledge" option is not provided.

On forced rating scales, the respondents are forced to express an opinion, because a "no
opinion" option is not provided. In such a case, respondents without an opinion may mark the
middle scale position. If a sufficient proportion of the respondents do not have opinions on the
topic, marking the middle position will distort measures of central tendency and variance.
In situations where the respondents are expected to have no opinion, as opposed to simply being
reluctant to disclose it, the accuracy of data may be improved by a nonforced scale that includes
a "no opinion" category. 16
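The distortion described above can be made concrete with a small numerical sketch. All of the ratings and counts below are made up for illustration: eight respondents hold an opinion on a 5-point scale, and eight more have none. On a forced scale the latter are assumed to mark the midpoint; on a nonforced scale they are assumed to choose "no opinion" and drop out of the statistics.

```python
from statistics import mean, pvariance

# Made-up ratings on a 5-point scale from respondents who hold an opinion.
with_opinion = [4, 5, 5, 4, 2, 5, 4, 3]
N_NO_OPINION = 8  # respondents with no opinion (illustrative count)
MIDPOINT = 3

# Forced scale: respondents without an opinion mark the middle position.
forced = with_opinion + [MIDPOINT] * N_NO_OPINION
# Nonforced scale: they select "no opinion" and are excluded from the statistics.
nonforced = with_opinion

print("forced:   ", mean(forced), pvariance(forced))
print("nonforced:", mean(nonforced), pvariance(nonforced))
```

In this made-up case the forced scale pulls the mean toward the midpoint and shrinks the variance, even though none of the opinions actually held has changed.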

Nature and Degree of Verbal Description


The nature and degree of verbal description associated with scale categories varies considerably
and can have an effect on the responses. Scale categories may have verbal, numerical, or even
pictorial descriptions. Furthermore, the researcher must decide whether to label every scale
category, some scale categories, or only extreme scale categories. Surprisingly, providing a
verbal description for each category may not improve the accuracy or reliability of the data. Yet
an argument can be made for labeling all or many scale categories to reduce scale ambiguity. The
category descriptions should be located as close to the response categories as possible.
The strength of the adjectives used to anchor the scale may influence the distribution of the
responses. With strong anchors (1 = completely disagree, 7 = completely agree), respondents
are less likely to use the extreme scale categories. This results in less variable and more peaked
response distributions. Weak anchors (1 = generally disagree, 7 = generally agree), in contrast,
produce uniform or flat distributions. Procedures have been developed to assign values to cate-
gory descriptors so as to result in balanced or equal-interval scales. 17

Physical Form or Configuration


A number of options are available with respect to scale form or configuration. Scales can be
presented vertically or horizontally. Categories can be expressed by boxes, discrete lines, or units
on a continuum and may or may not have numbers assigned to them. If numerical values are used,
they may be positive, negative, or both. Several possible configurations are presented in Figure 9.2.
Two unique rating scale configurations used in marketing research are the thermometer
scale and the smiling face scale. For the thermometer scale, the higher the temperature, the more
favorable the evaluation. Likewise, happier faces indicate more favorable evaluations. These
scales are especially useful for children. 18 Examples of these scales are shown in Figure 9.3.

FIGURE 9.2
Rating Scale Configurations

A variety of scale configurations may be employed to measure the gentleness of Cheer
detergent. Some examples include:

Cheer detergent is:

1. Very harsh  ___  ___  ___  ___  ___  ___  ___  Very gentle

2. Very harsh   1    2    3    4    5    6    7   Very gentle

3. [ ] Very harsh
   [ ]
   [ ]
   [ ] Neither harsh nor gentle
   [ ]
   [ ]
   [ ] Very gentle

4. ___     ___     ___        ___             ___        ___      ___
   Very    Harsh   Somewhat   Neither harsh   Somewhat   Gentle   Very
   harsh           harsh      nor gentle      gentle              gentle

5. [-3]    [-2]    [-1]    [0]    [+1]    [+2]    [+3]
   Very                  Neither                  Very
   harsh                harsh nor                 gentle
                         gentle

Table 9.2 summarizes the six decisions in designing rating scales. Table 9.3 presents some
commonly used scales. Although we show these scales as having five categories, the number of
categories can be varied depending upon the judgment of the researcher.

Project Research Project Activities

1. Develop Likert, semantic differential, and Stapel scales for measuring customer satisfaction toward
Sears.
2. Illustrate the six itemized rating scale decisions of Table 9.2 in the context of measuring customer
satisfaction toward Sears. •

TABLE 9.2
Summary of Itemized Rating Scale Decisions

1. Number of categories
   Although there is no single, optimal number, traditional guidelines suggest that there should
   be between five and nine categories.
2. Balanced versus unbalanced
   In general, the scale should be balanced to obtain objective data.
3. Odd or even number of categories
   If a neutral or indifferent scale response is possible from at least some of the respondents,
   an odd number of categories should be used.
4. Forced versus nonforced
   In situations where the respondents are expected to have no opinion, the accuracy of data
   may be improved by a nonforced scale.
5. Verbal description
   An argument can be made for labeling all or many scale categories. The category
   descriptions should be located as close to the response categories as possible.
6. Physical form
   A number of options should be tried and the best one selected.
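The six decisions in Table 9.2 can be thought of as the fields of a scale specification. The sketch below is one possible encoding, not a standard one: the class name, field names, and the wording of the notes are all assumptions made for illustration, and the four-category example scale is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatingScaleDesign:
    labels: List[str]                # verbal descriptions, least favorable first
    n_unfavorable: int               # count of unfavorable categories
    n_favorable: int                 # count of favorable categories
    no_opinion_option: bool = False  # True makes the scale nonforced
    form: str = "horizontal"         # physical form, e.g. "horizontal" or "vertical"

    @property
    def n_categories(self) -> int:
        return len(self.labels)

    @property
    def is_balanced(self) -> bool:
        return self.n_favorable == self.n_unfavorable

    @property
    def has_neutral_point(self) -> bool:
        # An odd number of categories leaves a middle position
        return self.n_categories % 2 == 1

    @property
    def is_forced(self) -> bool:
        return not self.no_opinion_option

    def guideline_notes(self) -> List[str]:
        """List departures from the guidelines summarized in Table 9.2."""
        notes = []
        if not 5 <= self.n_categories <= 9:
            notes.append("outside the traditional five-to-nine category range")
        if not self.is_balanced:
            notes.append("unbalanced: favorable and unfavorable categories differ")
        if any(not label for label in self.labels):
            notes.append("some categories are unlabeled")
        return notes


satisfaction = RatingScaleDesign(
    labels=["Very Dissatisfied", "Dissatisfied",
            "Neither Dissatisfied nor Satisfied", "Satisfied", "Very Satisfied"],
    n_unfavorable=2,
    n_favorable=2,
)
# Hypothetical unbalanced, even-numbered scale for contrast
lopsided = RatingScaleDesign(
    labels=["Poor", "Good", "Very good", "Excellent"],
    n_unfavorable=1,
    n_favorable=3,
)
print(satisfaction.guideline_notes())
print(lopsided.guideline_notes())
```

The notes are prompts for the researcher's judgment rather than errors, since the table itself presents these as guidelines that depend on the research setting.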

FIGURE 9.3
Some Unique Rating Chart Configurations

Thermometer Scale

Instructions
Please indicate how much you like McDonald's hamburgers by coloring in the thermometer
with your blue pen. Start at the bottom and color up to the temperature level that best
indicates how strong your preference is for McDonald's hamburgers.

Form
[Thermometer graphic: "Like Very Much" at the top, "Dislike Very Much" at the bottom]

Smiling Face Scale

Instructions
Please tell me how much you like the Barbie Doll by pointing to the face that best shows
how much you like it. If you did not like the Barbie Doll at all, you would point to Face 1.
If you liked it very much, you would point to Face 5. Now tell me, how much did you like
the Barbie Doll?

Form
[Five faces, numbered 1 2 3 4 5]

ACTIVE RESEARCH

Dressing Consumers: Measuring Preferences for Dress Shoes


Visit www.rockport.com and search the Internet using a search engine as well as your library's online
databases to obtain information on consumers' preferences for dress shoes.
Develop an itemized scale to measure consumers' preferences for dress shoes and justify your rating
scale decisions.
As the marketing manager for Rockport, how would you use information on consumers' preferences
for dress shoes to increase your sales?

TABLE 9.3
Some Commonly Used Scales in Marketing

Construct            Scale
Attitude             Very Bad | Bad | Neither Bad Nor Good | Good | Very Good
Importance           Not at All Important | Not Important | Neutral | Important | Very Important
Satisfaction         Very Dissatisfied | Dissatisfied | Neither Dissatisfied nor Satisfied | Satisfied | Very Satisfied
Purchase Intent      Definitely Will Not Buy | Probably Will Not Buy | Might or Might Not Buy | Probably Will Buy | Definitely Will Buy
Purchase Frequency   Never | Rarely | Sometimes | Often | Very Often
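When responses on scales like those in Table 9.3 are recorded, the verbal categories are usually converted to numbers. The sketch below assumes the common convention of coding the five categories 1 through 5 from least to most favorable; that coding, the function name, and the sample responses are illustrative assumptions rather than something the table specifies.

```python
# Category labels from Table 9.3, ordered from least to most favorable
SCALES = {
    "Attitude": ["Very Bad", "Bad", "Neither Bad Nor Good", "Good", "Very Good"],
    "Importance": ["Not at All Important", "Not Important", "Neutral",
                   "Important", "Very Important"],
    "Satisfaction": ["Very Dissatisfied", "Dissatisfied",
                     "Neither Dissatisfied nor Satisfied",
                     "Satisfied", "Very Satisfied"],
    "Purchase Intent": ["Definitely Will Not Buy", "Probably Will Not Buy",
                        "Might or Might Not Buy", "Probably Will Buy",
                        "Definitely Will Buy"],
    "Purchase Frequency": ["Never", "Rarely", "Sometimes", "Often", "Very Often"],
}


def code_response(construct, label):
    """Return the 1-based position of `label` on the scale for `construct`.

    Raises KeyError for an unknown construct and ValueError for a label
    that is not one of the scale's categories.
    """
    return SCALES[construct].index(label) + 1


# Hypothetical responses from four respondents
responses = ["Satisfied", "Very Satisfied",
             "Neither Dissatisfied nor Satisfied", "Satisfied"]
codes = [code_response("Satisfaction", r) for r in responses]
print(codes)
```

Because every scale in the table has five ordered categories, the same function serves all five constructs; a scale with a different number of categories would only need a different label list.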

Multi-Item Scales
multi-item scales
A multi-item scale consists of multiple items, where an item is a single question or
statement to be evaluated.

construct
A specific type of concept that exists at a higher level of abstraction than do
everyday concepts.

A multi-item scale consists of multiple items, where an item is a single question or statement to
be evaluated. The Likert, semantic differential, and Stapel scales presented earlier to measure
attitudes toward Sears are examples of multi-item scales. Note that each of these scales has multiple
items. The development of multi-item rating scales requires considerable technical expertise.19
Figure 9.4 is a paradigm for constructing multi-item scales. The researcher begins by developing
the construct of interest. A construct is a specific type of concept that exists at a higher level of
abstraction than do everyday concepts, such as brand loyalty, product involvement, attitude,
satisfaction, and so forth. Next, the researcher must develop a theoretical definition of the construct
that states the meaning of the central idea or concept of interest. For this, we need an underlying
theory of the construct being measured. A theory is necessary not only for constructing the scale
but also for interpreting the resulting scores. For example, brand loyalty may be defined as the
consistent repurchase of a brand prompted by a favorable attitude toward the brand. The construct
must be operationalized in a way that is consistent with the theoretical definition. The operational
definition specifies which observable characteristics will be measured and the process of assigning
value to the construct. For example, in the context of toothpaste purchases, consumers will be
characterized as brand loyal if they exhibit a highly favorable attitude (top quartile) and have
purchased the same brand on at least four of the last five purchase occasions.
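The toothpaste example can be read as a rule with two observable conditions. The sketch below applies it to made-up data; the consumer records, the brand names "A" and "B", the attitude scores, and the simple rank-based way of finding the top quartile are all assumptions chosen for illustration.

```python
import math


def classify_brand_loyal(consumers, brand):
    """Flag consumers as brand loyal toward `brand`.

    `consumers` maps a consumer id to (attitude_score, purchases), where
    `purchases` lists brands bought in chronological order. A consumer is
    loyal if the attitude score is in the top quartile of the sample AND
    `brand` was bought on at least four of the last five occasions.
    """
    scores = sorted((score for score, _ in consumers.values()), reverse=True)
    top_n = max(1, math.ceil(len(scores) / 4))  # size of the top quartile
    cutoff = scores[top_n - 1]                  # lowest score still in it

    result = {}
    for name, (attitude, purchases) in consumers.items():
        recent = purchases[-5:]
        favorable = attitude >= cutoff
        repeat = len(recent) == 5 and recent.count(brand) >= 4
        result[name] = favorable and repeat
    return result


# Hypothetical sample: attitude score toward brand "A" and purchase history
consumers = {
    "c1": (34, ["B", "A", "A", "B", "A", "A"]),  # favorable, 4 of last 5
    "c2": (31, ["A", "B", "B", "A", "B"]),       # favorable, 2 of last 5
    "c3": (28, ["A", "A", "A", "A", "A"]),       # 5 of 5, attitude too low
    "c4": (25, ["B", "B", "A", "B", "B"]),
    "c5": (22, ["A", "A", "B", "A", "A"]),
    "c6": (20, ["B", "A", "B", "A", "B"]),
    "c7": (18, ["A", "A", "A"]),                 # fewer than five occasions
    "c8": (15, ["B", "B", "B", "B", "B"]),
}
loyal = classify_brand_loyal(consumers, brand="A")
print(loyal)
```

Consumer c3 shows why both conditions matter under this definition: repeated purchase without a highly favorable attitude does not count as brand loyalty.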
The next step is to generate an initial pool of scale items. Typically, this is done based on
theory, analysis of secondary data, and qualitative research. From this pool, a reduced set of
potential scale items is generated by the judgment of the researcher and other knowledgeable
individuals. Some qualitative criterion is adopted to aid their judgment. The reduced set of items
is still too large to constitute a scale. Thus, further reduction is achieved in a quantitative manner.

FIGURE 9.4
Development of
a Multi-Item Scale
