Scaling Techniques
The New York City Transit (NYCT) (www.mta.nyc.ny.us/nyct/subway) does not have a wholly captive
audience, as some people believe. Many people do not use the mass transit system when they have a choice.
A much needed rate hike brought fears that more people would avoid taking the bus or subway. Therefore,
research was undertaken to uncover ways to increase ridership.
In a telephone survey, respondents were asked to rate different aspects of the transit system using
five-point Likert scales. Likert scales were chosen because they are easy to administer over the
telephone and the respondents merely indicate their degree of (dis)agreement (1 = strongly disagree,
5 = strongly agree).
The results showed that personal safety was the major concern on subways. New Yorkers were afraid to
use a subway station in their own neighborhoods. NYCT was able to respond to riders' concerns by increasing
police presence, having a more visible NYCT staff, increasing lighting, and repositioning walls, columns, and
stairways for better visibility throughout the station.
Telephone surveys also revealed that cleanliness of subway stations and subway cars is related to the
perception of crime. In response, NYCT was able to concentrate more on ways to maintain a cleaner
appearance. Action was also taken to reduce the number of homeless people and panhandlers. They are
asked to leave, and sometimes transportation to shelters is provided.
Results of marketing research efforts have helped NYCT improve perceptions surrounding the system,
leading to increased ridership. As of 2008, the New York subway system has 468 stations, the largest
number of public transit subway stations for any system in the world. 1 •
306 PART II • RESEARCH DESIGN FORMULATION
Version 1
Probably the worst ---------------I----------------------------------- Probably the best
CHAPTER 9 • MEASUREMENT AND SCALING: NONCOMPARATIVE SCALING TECHNIQUES 307
Version 2
Probably the worst ---------------I----------------------------------- Probably the best
                   0    10   20   30   40   50   60   70   80   90   100
Version 3
                   Very bad        Neither good nor bad        Very good
Probably the worst ---------------I----------------------------------- Probably the best
                   0    10   20   30   40   50   60   70   80   90   100 •
Once the respondent has provided the ratings, the researcher divides the line into as many
categories as desired and assigns scores based on the categories into which the ratings fall. In the
department store project example, the respondent exhibits an unfavorable attitude toward Sears.
These scores are typically treated as interval data. Thus, continuous scales possess the character-
istics of description, order, and distance, as discussed in Chapter 8.
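The scoring step described above, dividing the line into categories and assigning a score by the category in which the mark falls, can be sketched as follows. This is a minimal illustration, not the text's own procedure: the function name `categorize`, the 100 mm line length, and the default of 10 equal-width categories are all assumptions chosen for the example.

```python
def categorize(position_mm, line_length_mm=100.0, n_categories=10):
    """Map a mark on a continuous rating line to a category score.

    position_mm is the distance of the respondent's mark from the left
    ("Probably the worst") end. The line is cut into n_categories
    equal-width categories, scored 1 (leftmost) to n_categories (rightmost).
    The line length and category count are illustrative assumptions.
    """
    if not 0 <= position_mm <= line_length_mm:
        raise ValueError("mark lies outside the rating line")
    width = line_length_mm / n_categories
    category = int(position_mm // width) + 1
    # A mark exactly at the right end belongs to the last category.
    return min(category, n_categories)


if __name__ == "__main__":
    print(categorize(35))                   # 10 categories
    print(categorize(35, n_categories=5))   # 5 categories
```

The same mark receives a different score depending on how many categories the researcher chooses, which is one reason the resulting scores are treated as interval data only by convention.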
The advantage of continuous scales is that they are easy to construct. However, scoring is
cumbersome and unreliable. Moreover, continuous scales provide little new information. Hence,
their use in marketing research has been limited. Recently, however, with the increased popularity of
computer-assisted personal interviewing (CAPI), Internet surveys, and other technologies, their use
is becoming more frequent. Continuous rating scales can be easily implemented in CAPI or on the
Internet. The cursor can be moved on the screen in a continuous fashion to select the exact position
on the scale that best describes the respondent's evaluation. Moreover, the scale values can be
automatically scored by the computer, thus increasing the speed and accuracy of processing the data.
Companies such as McDonald's have used the Perception Analyzer to measure consumers' reactions to
commercials, company videos, and other audio/visual materials.
of the commercial. Using the emotional response data, the researchers could determine which commercial
had the greatest emotional appeal across mother-daughter segments. McDonald's marketing efforts proved
successful with 2008 revenues of $23.52 billion. 2 •
ACTIVE RESEARCH
Instructions
Listed here are different opinions about Sears. Please indicate how strongly you agree or disagree with each
by using the following scale:
1 = Strongly disagree
2 = Disagree
3 = Neither agree nor disagree
4 = Agree
5 = Strongly agree
                                                                        Neither
                                                  Strongly              agree nor             Strongly
                                                  disagree   Disagree   disagree   Agree      agree
1. Sears sells high-quality merchandise.          1          2X         3          4          5
2. Sears has poor in-store service.               1          2X         3          4          5
3. I like to shop at Sears.                       1          2          3X         4          5
4. Sears does not offer a good mix of different   1          2          3          4X         5
   brands within a product category.
5. The credit policies at Sears are terrible.     1          2          3          4X         5
6. Sears is where America shops.                  1X         2          3          4          5
7. I do not like the advertising done by Sears.   1          2          3          4X         5
8. Sears sells a wide variety of merchandise.     1          2          3          4X         5
9. Sears charges fair prices.                     1          2X         3          4          5 •
The data are typically treated as interval. Thus, the Likert scale possesses the characteristics of
description, order, and distance. To conduct the analysis, each statement is assigned a numerical
score, ranging either from -2 to +2 or 1 to 5. The analysis can be conducted on an item-by-item
basis (profile analysis), or a total (summated) score can be calculated for each respondent by
summing across items. Suppose the Likert scale in the department store example was used to
measure attitudes toward Sears as well as JCPenney. Profile analysis would involve comparing the
two stores in terms of the average respondent ratings for each item, such as quality of merchandise,
in-store service, and brand mix. The summated approach is most frequently used and, as a result,
the Likert scale is also referred to as a summated scale. 4 When using this approach to determine
the total score for each respondent on each store, it is important to use a consistent scoring proce-
dure so that a high (or low) score consistently reflects a favorable response. This requires that the
categories assigned to the negative statements by the respondents be scored by reversing the scale
when analyzing the data. Note that for a negative statement, an agreement reflects an unfavorable
response, whereas for a positive statement, agreement represents a favorable response.
Accordingly, a "strongly agree" response to a favorable statement and a "strongly disagree"
response to an unfavorable statement would both receive scores of 5. In the scale shown here, if a
higher score is to denote a more favorable attitude, the scoring of items 2, 4, 5, and 7 will be
reversed. Thus, the respondent in the department store project example has an attitude score of 22.
The reason for having both positive and negative statements is to control the tendency of some
respondents to mark one or the other end of the scale without reading the items. Each respondent's
total score for each store is calculated. A respondent will have the most favorable attitude toward
the store with the highest score. The procedure for developing summated Likert scales is described
later in the section on multi-item scales.
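The reverse-scoring rule can be checked against the department store example. The sketch below takes the categories marked on the Sears form (2, 2, 3, 4, 4, 1, 4, 4, 2 for items 1 through 9), reverses items 2, 4, 5, and 7 as stated in the text, and sums. The function and variable names are illustrative assumptions, not part of any standard library.

```python
# Categories marked by the respondent on the Sears form (item number -> category).
responses = {1: 2, 2: 2, 3: 3, 4: 4, 5: 4, 6: 1, 7: 4, 8: 4, 9: 2}

# Negatively worded statements, whose scores are reversed before summing.
NEGATIVE_ITEMS = {2, 4, 5, 7}
SCALE_MIN, SCALE_MAX = 1, 5


def item_score(item, response):
    """Return the score oriented so that a higher value is more favorable."""
    if item in NEGATIVE_ITEMS:
        # On a 1-to-5 scale this maps 1->5, 2->4, 3->3, 4->2, 5->1.
        return SCALE_MIN + SCALE_MAX - response
    return response


def summated_score(responses):
    """Sum the favorability-oriented scores across all items."""
    return sum(item_score(item, r) for item, r in responses.items())


if __name__ == "__main__":
    print(summated_score(responses))  # 22, the attitude score given in the text
```

With nine items scored 1 to 5, the total can range from 9 to 45, so a score of 22 sits below the midpoint of 27.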
The Likert scale has several advantages. It is easy to construct and administer. Respondents
readily understand how to use the scale, making it suitable for mail, telephone, personal or elec-
tronic interviews. Therefore, this scale was used in the NYCT telephone survey in the opening
example. The major disadvantage of the Likert scale is that it takes longer to complete than other
itemized rating scales because respondents have to read each statement. Sometimes, it may be
difficult to interpret the response to a Likert item, especially if it is an unfavorable statement.
In our example, the respondent disagrees with statement number 2 that Sears has poor in-store
service. In reversing the score of this item prior to summing, it is assumed that this respondent
would agree with the statement that Sears has good in-store service. This, however, may not be
true; the disagreement merely indicates that the respondent would not make statement number 2.
The following example shows another use of a Likert scale in marketing research.
Real Research · How Concerned Are You About Your Online Privacy?
In spite of the enormous potential of e-commerce, its share compared to the total portion of the economy
still remains small: less than 3 percent worldwide as of 2009. The lack of consumer confidence in online
privacy is a major problem hampering the growth of e-commerce. A recent report showed that practically
all Americans (94.5 percent), including Internet users and non-Internet users, are concerned about "the
privacy of their personal information when or if they buy online." Therefore, the author and his colleagues
have developed a scale for measuring Internet users' information privacy concerns. This is a 10-item, three-
dimensional scale. The three dimensions are control, awareness, and collection. Each of the 10 items is
scored on a 7-point Likert-type agree-disagree scale. The scale has been shown to have good reliability and
validity. This scale should enable online marketers and policy makers to measure and address Internet
users' information privacy concerns, which should result in increased e-commerce. 5 Due to space
constraints we show only the items used to measure awareness.
1. Companies seeking information online should disclose the way the data are collected, processed,
and used.
2. A good consumer online privacy policy should have a clear and conspicuous disclosure.
3. It is very important to me that I am aware and knowledgeable about how my personal information
will be used. •
Instructions
This part of the study measures what certain department stores mean to you by having you judge them on a
series of descriptive scales bounded at each end by one of two bipolar adjectives. Please mark (X) the blank
that best indicates how accurately one or the other adjective describes what the store means to you. Please
be sure to mark every scale; do not omit any scale.
Form
Sears is:
Powerful    :---:---:---:---:-X-:---:---:  Weak
Unreliable  :---:---:---:---:---:-X-:---:  Reliable
Modern      :---:---:---:---:---:---:-X-:  Old-fashioned
Cold        :---:---:---:---:---:-X-:---:  Warm
Careful     :---:-X-:---:---:---:---:---:  Careless •
 1. Rugged         :---:---:---:---:---:---:---:  Delicate
 2. Excitable      :---:---:---:---:---:---:---:  Calm
 3. Uncomfortable  :---:---:---:---:---:---:---:  Comfortable
 4. Dominating     :---:---:---:---:---:---:---:  Submissive
 5. Thrifty        :---:---:---:---:---:---:---:  Indulgent
 6. Pleasant       :---:---:---:---:---:---:---:  Unpleasant
 7. Contemporary   :---:---:---:---:---:---:---:  Noncontemporary
 8. Organized      :---:---:---:---:---:---:---:  Unorganized
 9. Rational       :---:---:---:---:---:---:---:  Emotional
10. Youthful       :---:---:---:---:---:---:---:  Mature
11. Formal         :---:---:---:---:---:---:---:  Informal
12. Orthodox       :---:---:---:---:---:---:---:  Liberal
13. Complex        :---:---:---:---:---:---:---:  Simple
14. Colorless      :---:---:---:---:---:---:---:  Colorful
15. Modest         :---:---:---:---:---:---:---:  Vain •
Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1-to-7
scale. The resulting data are commonly analyzed through profile analysis. In profile analysis,
means or median values on each rating scale are calculated and compared by plotting or statis-
tical analysis. This helps determine the overall differences and similarities among the objects.
To assess differences across segments of respondents, the researcher can compare mean
responses of different segments. Although the mean is most often used as a summary statistic,
there is some controversy as to whether the data obtained should be treated as an interval scale. 9
On the other hand, in cases when the researcher requires an overall comparison of objects, such
as to determine store preference, the individual item scores are summed to arrive at a total
score. As in the case of the Likert scale, the scores for the negative items are reversed before
summing.
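A small sketch can make the two analyses concrete: profile analysis (item-by-item means per store) and the summated total with negative items reversed. It uses the five adjective pairs from the Sears form, scored 1 to 7 from left to right. The first Sears row follows the marks shown on that form; every other rating, the JCPenney data, and all function names are hypothetical values introduced only for illustration.

```python
from statistics import mean

# Adjective pairs from the Sears form. True means the favorable adjective is
# on the right, so a left-to-right 1..7 score already runs toward favorable.
ITEMS = {
    "Powerful/Weak": False,
    "Unreliable/Reliable": True,
    "Modern/Old-fashioned": False,
    "Cold/Warm": True,
    "Careful/Careless": False,
}
SCALE_MIN, SCALE_MAX = 1, 7

# Raw positions marked (1..7 from the left), one row per respondent.
# Only the first Sears row comes from the form; the rest are hypothetical.
ratings = {
    "Sears": [[5, 6, 7, 6, 2], [4, 5, 6, 5, 3], [6, 6, 6, 7, 2]],
    "JCPenney": [[3, 6, 4, 5, 2], [2, 5, 3, 6, 3], [3, 7, 4, 5, 1]],
}


def favorable_score(item, raw):
    """Orient a raw score so that 7 is always the favorable pole."""
    return raw if ITEMS[item] else SCALE_MIN + SCALE_MAX - raw


def profile(store):
    """Profile analysis: mean oriented score on each item for one store."""
    return {
        item: mean(favorable_score(item, row[i]) for row in ratings[store])
        for i, item in enumerate(ITEMS)
    }


def summated(store):
    """Total oriented score per respondent for one store."""
    return [
        sum(favorable_score(item, row[i]) for i, item in enumerate(ITEMS))
        for row in ratings[store]
    ]


if __name__ == "__main__":
    for store in ratings:
        print(store, profile(store), summated(store))
```

Comparing the two `profile` dictionaries item by item corresponds to the profile analysis described above, while comparing the `summated` totals corresponds to the overall comparison used to determine store preference.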
Its versatility makes the semantic differential a popular rating scale in marketing research. It
has been widely used in comparing brand, product, and company images. It has also been used to
develop advertising and promotion strategies and in new product development studies. 10 Several
modifications of the basic scale have been proposed.
Stapel Scale
Stapel scale
A scale for measuring attitudes that consists of a single adjective in the middle of an even-numbered
range of values, from -5 to +5, without a neutral point (zero).
The Stapel scale, named after its developer, Jan Stapel, is a unipolar rating scale with 10 categories
numbered from -5 to +5, without a neutral point (zero). 11 This scale is usually presented vertically.
Respondents are asked to indicate how accurately or inaccurately each term describes the object by
selecting an appropriate numerical response category. The higher the number, the more accurately the
term describes the object, as shown in the department store project. In this example, Sears is evaluated
as not having high quality and having somewhat poor service.
Instructions
Please evaluate how accurately each word or phrase describes each of the department stores. Select a plus
number for the phrases you think describe the store accurately. The more accurately you think the phrase
describes the store, the larger the plus number you should choose. You should select a minus number for
phrases you think do not describe it accurately. The less accurately you think the phrase describes the store,
the larger the minus number you should choose. You can select any number, from +5 for phrases you think
are very accurate, to -5 for phrases you think are very inaccurate.
Form
SEARS
     +5                 +5
     +4                 +4
     +3                 +3
     +2                 +2X
     +1                 +1
HIGH QUALITY       POOR SERVICE
     -1                 -1
     -2X                -2
     -3                 -3
     -4                 -4
     -5                 -5 •
The data obtained by using a Stapel scale are generally treated as interval and can be analyzed in
the same way as semantic differential data. The Stapel scale produces results similar to the
semantic differential. The Stapel scale's advantages are that it does not require a pretest of the
adjectives or phrases to ensure true bipolarity, and it can be administered over the telephone.
However, some researchers believe the Stapel scale is confusing and difficult to apply. Of the
three itemized rating scales considered, the Stapel scale is used least. However, this scale merits
more attention than it has received.
ACTIVE RESEARCH
FIGURE 9.1 Balanced and Unbalanced Scales
Balanced Scale              Unbalanced Scale
Jovan Musk for Men is       Jovan Musk for Men is
Extremely good              Extremely good
Very good                   Very good
Good                        Good
Bad                         Somewhat good
Very bad                    Bad
Extremely bad               Very bad
FIGURE 9.2 Rating Scale Configurations
A variety of scale configurations may be employed to measure the gentleness of Cheer
detergent. Some examples include:
Cheer detergent is:
1. Very harsh  ___ ___ ___ ___ ___ ___ ___  Very gentle
2. Very harsh   1   2   3   4   5   6   7   Very gentle
3. [ ] Very harsh
   [ ]
   [ ]
   [ ] Neither harsh nor gentle
   [ ]
   [ ]
   [ ] Very gentle
4. ____      ____   ________   ___________   ________   ______   ______
   Very      Harsh  Somewhat   Neither       Somewhat   Gentle   Very
   harsh            harsh      harsh nor     gentle              gentle
                               gentle
5. [-3]      [-2]   [-1]       [0]           [+1]       [+2]     [+3]
   Very                        Neither                           Very
   harsh                       harsh nor                         gentle
                               gentle
Table 9.2 summarizes the six decisions in designing rating scales. Table 9.3 presents some
commonly used scales. Although we show these scales as having five categories, the number of
categories can be varied depending upon the judgment of the researcher.
1. Develop Likert, semantic differential, and Stapel scales for measuring customer satisfaction toward
Sears.
2. Illustrate the six itemized rating scale decisions of Table 9.2 in the context of measuring customer
satisfaction toward Sears. •
FIGURE 9.3 Some Unique Rating Chart Configurations
Thermometer Scale
Instructions
Please indicate how much you like McDonald's hamburgers by coloring in the thermometer
with your blue pen. Start at the bottom and color up to the temperature level that best
indicates how strong your preference is for McDonald's hamburgers.
Form
[Thermometer graphic: top labeled "Like Very Much", bottom labeled "Dislike Very Much"]
ACTIVE RESEARCH
Multi-Item Scales
multi-item scales
A multi-item scale consists of multiple items, where an item is a single question or statement to be
evaluated.
construct
A specific type of concept that exists at a higher level of abstraction than do everyday concepts.
A multi-item scale consists of multiple items, where an item is a single question or statement to
be evaluated. The Likert, semantic differential, and Stapel scales presented earlier to measure
attitudes toward Sears are examples of multi-item scales. Note that each of these scales has multiple
items. The development of multi-item rating scales requires considerable technical expertise. 19
Figure 9.4 is a paradigm for constructing multi-item scales. The researcher begins by developing
the construct of interest. A construct is a specific type of concept that exists at a higher level of
abstraction than do everyday concepts, such as brand loyalty, product involvement, attitude,
satisfaction, and so forth. Next, the researcher must develop a theoretical definition of the construct
that states the meaning of the central idea or concept of interest. For this, we need an underlying
theory of the construct being measured. A theory is necessary not only for constructing the scale
but also for interpreting the resulting scores. For example, brand loyalty may be defined as the
consistent repurchase of a brand prompted by a favorable attitude toward the brand. The construct
must be operationalized in a way that is consistent with the theoretical definition. The operational
definition specifies which observable characteristics will be measured and the process of assigning
value to the construct. For example, in the context of toothpaste purchases, consumers will be
characterized as brand loyal if they exhibit a highly favorable attitude (top quartile) and have
purchased the same brand on at least four of the last five purchase occasions.
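The toothpaste example shows how an operational definition turns a construct into a rule that can be applied to data. The sketch below is one possible reading of that rule, under stated assumptions: "top quartile" is taken to mean an attitude score at or above the third quartile of the sample, computed with the default method of Python's `statistics.quantiles`. The function names, the brand labels, and the sample scores are hypothetical.

```python
from statistics import quantiles


def top_quartile_cutoff(attitude_scores):
    """Third quartile of the sample's attitude scores (assumed cutoff)."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(attitude_scores, n=4)[2]


def is_brand_loyal(brand, attitude_toward_brand, last_five_purchases, cutoff):
    """Apply the operational definition of brand loyalty from the text.

    A consumer is brand loyal if the attitude toward the brand is in the top
    quartile and the brand was bought on at least four of the last five
    purchase occasions.
    """
    repeat_purchases = last_five_purchases.count(brand)
    return attitude_toward_brand >= cutoff and repeat_purchases >= 4


if __name__ == "__main__":
    # Hypothetical attitude scores for a sample of eight consumers.
    sample_attitudes = [10, 12, 14, 15, 16, 18, 20, 22]
    cutoff = top_quartile_cutoff(sample_attitudes)
    purchases = ["A", "A", "B", "A", "A"]  # hypothetical brand labels
    print(is_brand_loyal("A", 22, purchases, cutoff))
    print(is_brand_loyal("A", 15, purchases, cutoff))
```

Both conditions must hold: a favorable attitude without repeat purchase, or repeat purchase without a favorable attitude, does not satisfy the definition.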
The next step is to generate an initial pool of scale items. Typically, this is done based on
theory, analysis of secondary data, and qualitative research. From this pool, a reduced set of
potential scale items is generated by the judgment of the researcher and other knowledgeable
individuals. Some qualitative criterion is adopted to aid their judgment. The reduced set of items
is still too large to constitute a scale. Thus, further reduction is achieved in a quantitative manner.
FIGURE 9.4 Development of a Multi-Item Scale