
>chapter 12 Measurement Scales 285

> Ranking Scales


Assume you are asked by Galaxy Department Stores to study the shopping habits and preferences of teen girls. Galaxy is seeking a way to compete with specialty stores that are far more successful in serving this market segment. Galaxy is considering the construction of an intrastore boutique catering to these teens. What measurement issues would determine your construction of measurement scales?

In ranking scales, the participant directly compares two or more objects and makes choices among them. Frequently, the participant is asked to select one as the “best” or the “most preferred.” When there are only two choices, this approach is satisfactory, but it often results in ties when more than two choices are found. For example, assume participants are asked to select the most preferred among three or more models of a product. In response, 40 percent choose model A, 30 percent choose model B, and 30 percent choose model C. Which is the preferred model? The analyst would be taking a risk to suggest that A is most preferred. Perhaps that interpretation is correct, but 60 percent of the participants chose some model other than A. Perhaps all B and C voters would place A last, preferring either B or C to A. This ambiguity can be avoided by using some of the techniques described in this section.

Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing between two objects. Typical of paired comparisons would be the sports car preference example in Exhibit 12-10. The number of judgments required in a

>Exhibit 12-10 Ranking Scales

Paired-Comparison “For each pair of two-seat sports cars listed, place a check beside the one
Scale you would most prefer if you had to choose between the two.”
data: ordinal
___ BMW Z4 M Coupe         ___ Chevrolet Corvette Z06

___ BMW Z4 M Coupe         ___ Porsche Cayman S

___ BMW Z4 M Coupe         ___ Dodge Viper SRT10

___ Chevrolet Corvette Z06 ___ Porsche Cayman S

___ Chevrolet Corvette Z06 ___ Dodge Viper SRT10

___ Porsche Cayman S       ___ Dodge Viper SRT10

Forced Ranking Scale “Rank the radar detection features in your order of preference. Place the
data: ordinal number 1 next to the most preferred, 2 by the second choice, and so forth.”

___ User programming


___ Cordless capability
___ Small size
___ Long-range warning
___ Minimal false alarms

Comparative Scale “Compared to your previous hair dryer’s performance, the new one is”:
data: ordinal
SUPERIOR ABOUT THE SAME INFERIOR
––– ––– ––– ––– –––
1 2 3 4 5

coo21507_ch12_266-293.indd 285 21/01/13 10:36 PM


286 >part III The Sources and Collection of Data

>Exhibit 12-11 Response Patterns of 200 Heavy Users’ Paired Comparisons on Five
Alternative Package Designs
Paired-comparison data may be treated in several ways. If there is substantial consistency, we will find that if A is
preferred to B, and B to C, then A will be consistently preferred to C. This condition of transitivity need not always be
true but should occur most of the time. When it does, take the total number of preferences among the comparisons as
the score for that stimulus. Assume a manager is considering five distinct packaging designs. She would like to know
how heavy users would rank these designs. One option would be to ask a sample of the heavy-users segment to pair-
compare the packaging designs. With a rough comparison of the total preferences for each option, it is apparent that
B is the most popular.

Designs

A B C D E

A — 164* 138 50 70

B 36 — 54 14 30

C 62 146 — 32 50

D 150 186 168 — 118

E 130 170 150 82 —

Total 378 666 510 178 268

Rank order 3 1 2 5 4

*Interpret this cell as 164 of 200 customers preferred suggested design B (column) to design A (row).

paired comparison is [(n)(n − 1)/2], where n is the number of stimuli or objects to be judged. When
four cars are evaluated, the participant evaluates six paired comparisons [(4)(3)/2 = 6].
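Both the pair count and the scoring of a preference matrix can be sketched in a few lines. Assuming we store the Exhibit 12-11 tallies in a nested dictionary (the variable and function names here are ours, for illustration only):

```python
from itertools import combinations

def num_pairs(n: int) -> int:
    """Number of paired comparisons for n objects: n(n - 1)/2."""
    return n * (n - 1) // 2

# Four sports cars -> six pairs, as in Exhibit 12-10.
cars = ["BMW Z4 M Coupe", "Chevrolet Corvette Z06",
        "Porsche Cayman S", "Dodge Viper SRT10"]
assert num_pairs(len(cars)) == len(list(combinations(cars, 2))) == 6

# Exhibit 12-11: prefs[row][col] = how many of the 200 heavy users
# preferred the COLUMN design to the ROW design.
designs = ["A", "B", "C", "D", "E"]
prefs = {
    "A": {"B": 164, "C": 138, "D": 50, "E": 70},
    "B": {"A": 36, "C": 54, "D": 14, "E": 30},
    "C": {"A": 62, "B": 146, "D": 32, "E": 50},
    "D": {"A": 150, "B": 186, "C": 168, "E": 118},
    "E": {"A": 130, "B": 170, "C": 150, "D": 82},
}

# A design's score is its column total: every vote it received.
totals = {d: sum(row[d] for r, row in prefs.items() if r != d)
          for d in designs}
ranking = sorted(designs, key=totals.get, reverse=True)
```

Running this reproduces the exhibit: design B leads with 666 preferences and the rank order is B, C, A, E, D.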
In another example we might compare packaging design proposals considered by a brand manager (see
Exhibit 12-11). Generally, there are more than two stimuli to judge, resulting in a potentially tedious task for
participants. If 15 suggestions for design proposals are available, 105 paired comparisons would be made.
Reducing the number of comparisons per participant without reducing the number of objects can
lighten this burden. You can present each participant with only a sample of the stimuli. In this way,
each pair of objects must be compared an equal number of times. Another procedure is to choose a
few objects that are believed to cover the range of attractiveness at equal intervals. All other stimuli
are then compared to these few standard objects. If 36 automobiles are to be judged, four may be
selected as standards and the others divided into four groups of eight each. Within each group, the
eight are compared to each other. Then the 32 are individually compared to each of the four standard
automobiles. This reduces the number of comparisons from 630 to 240.
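The arithmetic of this standard-object design can be verified with a short sketch (the function names are ours, not the text's):

```python
def full_pairs(n: int) -> int:
    """All paired comparisons among n objects: n(n - 1)/2."""
    return n * (n - 1) // 2

def reduced_pairs(n_objects: int, n_standards: int, group_size: int) -> int:
    """Comparisons when a few 'standard' objects anchor the judgments:
    the non-standards are split into groups that are fully compared
    internally, then each non-standard is compared to every standard."""
    non_standards = n_objects - n_standards            # e.g., 36 - 4 = 32
    n_groups = non_standards // group_size             # e.g., 4 groups of 8
    within = n_groups * full_pairs(group_size)         # 4 * 28 = 112
    versus_standards = non_standards * n_standards     # 32 * 4 = 128
    return within + versus_standards                   # 112 + 128 = 240
```

For the 36-automobile example, `full_pairs(36)` gives 630 and `reduced_pairs(36, 4, 8)` gives 240, the reduction described above.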
Paired comparisons run the risk that participants will tire to the point that they give ill-considered
answers or refuse to continue. Opinions differ about the upper limit, but five or six stimuli are not un-
reasonable when the participant has other questions to answer. If the data collection consists only of
paired comparisons, as many as 10 stimuli are reasonable. A paired comparison provides ordinal data.
The forced ranking scale, shown in Exhibit 12-10, lists attributes that are ranked relative to each
other. This method is faster than paired comparisons and is usually easier and more motivating to the
participant. With five items, it takes 10 paired comparisons to complete the task, and the simple forced
ranking of five is easier. Also, ranking has no transitivity problem where A is preferred to B, and B to
C, but C is preferred to A—although it also forces a false unidimensionality.
A drawback to forced ranking is the number of stimuli that can be handled by this method. Five ob-
jects can be ranked easily, but participants may grow careless in ranking 10 or more items. In addition,
rank ordering produces ordinal data since the distance between preferences is unknown.
Often the manager is interested in benchmarking. This calls for a standard by which other programs,
processes, brands, point-of-sale promotions, or people can be compared. The comparative scale is
ideal for such comparisons if the participants are familiar with the standard. In the Exhibit 12-10 ex-
ample, the standard is the participant’s previous hair dryer. The new dryer is being assessed relative


>snapshot
Paired Comparison Increases Hospitality

Should Northwest Airlines, Marriott, or Alaskan Airlines attempt to attract the business of Americans with disabilities? If so, what would it take to capture the segment? Eric Lipp, executive director of the Open Doors Organization (ODO), an advocacy organization for those with disabilities, sponsored a study to find out. High on his agenda was providing an incentive to the travel industry to make accommodations to attract the 22 million adults with disabilities who have traveled in the last two years on 63 million trips—and who may want to travel more. “We now estimate that Americans with disabilities currently spent $13.2 billion in travel expenditures and that amount would at least double [to $27.2 billion] if travel businesses were more attuned to the needs of those with disabilities.”

ODO hired Harris Interactive, a global market research and consulting firm best known for The Harris Poll and for pioneering the Internet method to conduct scientifically accurate market research. Harris Interactive conducted a hybrid study via both online and phone surveys to determine the magnitude of the disability travel segment, its purchasing power, and the accommodations the segment needed to increase travel. “Those with disabilities can’t all be reached with one method,” explained Laura Light, project director with Harris Interactive. “The nature of their physical limitation might preclude one method or the other.”

And how did the firm evaluate all the possible accommodations—from Braille safety cards on airplanes to a designated person to handle problems in a hotel? Harris Interactive used its proprietary COMPASS™ methodology, which uses paired comparisons as a measurement tool. “COMPASS™ saves the participant time and energy,” explained Light. “Even with a long list, COMPASS™ can be done quickly.” In the ODO study, COMPASS™ was used twice: once to measure 17 possible airline accommodations and once to measure 23 possible hotel accommodations. By having each participant evaluate only a portion of the large number of accommodation pairs rather than the full list (136 for airline accommodations and 253 for hotel accommodations), each question was answered in under four minutes. By using this process with all members of the sample, Harris Interactive is able to rank-order the items and measure the magnitude of difference between items. This makes it easier for Delta, Marriott, or Alaskan Airlines to make the right choices about accommodations for those with disabilities.

www.opendoorsnfp.org; www.harrisinteractive.com

To learn more about this research, read the case “Open Doors: Extending Hospitality to Travelers with Disabilities,” available from the Online Learning Center (www.mhhe.com/cooper12e).

to it. The provision to compare yet other dryers to the standard is not shown in the example but is
nonetheless available to the researcher.
Some researchers treat the data produced by comparative scales as interval data since the scoring
reflects an interval between the standard and what is being compared. We would treat the rank or po-
sition of the item as ordinal data unless the linearity of the variables in question could be supported.

> Sorting
Q-sorts require sorting of a deck of cards into piles that represent points along a continuum. The
participant—or judge—groups the cards based on his or her response to the concept written on the
card. Researchers using Q-sort resolve three special problems: item selection, structured or unstructured
choices in sorting, and data analysis. The basic Q-sort procedure involves the selection of a set of verbal
statements, phrases, single words, or photos related to the concept being studied. For statistical stability,
the number of cards should be no fewer than 60 and, for convenience, no more than 120. After the
cards are created, they are shuffled, and the participant is instructed to sort the cards into a set of piles
(usually 7 to 11), each pile representing a point on the judgment continuum. The left-most pile represents
the concept statements, which are “most valuable,” “favorable,” “agreeable,” and so forth. The right-most
pile contains the least favorable cards. The researcher asks the participant to fill the center, or neutral, pile
with the cards about which the participant is indecisive. In the case of a structured sort, the distribution of
cards allowed in each pile is predetermined. With an unstructured sort, only the number of piles will be
determined. Although the distribution of cards in most structured sorts resembles a normal distribution,
there is some controversy about analyzing the data as ranking (ordinal data) versus interval data.


>closeup
MindWriter Scaling

Jason has been working on scaling for MindWriter’s CompleteCare project for a week when the request comes to Myra Wines to report her progress to MindWriter’s general manager. He has narrowed the choice to the three scales in Exhibit 12-12: a Likert scale, a numerical rating scale with two verbal anchors, and a hybrid expectation scale. All are 5-point scales that are presumed to measure at the interval level.

He needs a statement that can accompany the scale for preliminary evaluation. Returning to their list of investigative questions, he finds a question that seems to capture the essence of the repair process: Are customers’ problems resolved? Translated into an assertion for the scale, the statement becomes “Resolution of problem that prompted service/repair.” He continues to labor over the wording of the verbal anchors. Appropriate versions of the investigative question are constructed, and then the scales are added.

After consulting with MindWriter’s research staff, Myra and Jason discuss the advantages of the scales. Myra suggests it is unlikely that CompleteCare would meet none of the customers’ expectations. And, with errors of leniency, “none” should be replaced by the term “few” so that the low end of the scale would be more relevant. Jason has read a Marketing News article that said Likert scales and scales similar to MindWriter’s numerical scale frequently produced a heavy concentration of 4s and 5s—a common problem in customer satisfaction research. Others suggest a 7-point scale to remedy this, but Jason thinks the term “exceeded” on the expectation scale could compensate for scores that cluster on the positive end.

Now ready for a pilot test, Jason decides to compare the expectation scale with MindWriter’s numerical rating scale. (The typical Likert scale requires that they create more potential items than they wanted to use on the online survey.) Using the CompleteCare database, names, addresses, emails, and phone numbers are selected, and 30 customers are chosen at random from those who have had recent service. Jason chooses the delayed equivalent forms method for reliability testing (see Chapter 11). Jason administers the expectation scale followed by the numerical scale to half of the participants and the numerical scale followed by the expectation scale to the other half. Each half sample experiences a time delay. No “order-of-presentation” effects are found. Subsequently, Jason correlates the numerical satisfaction scores with the expectation scores and plots the results, shown in Exhibit 12-13.

On the problem resolution question, the participants’ scores from the satisfaction scale and those from the expectation scale

>Exhibit 12-12 Alternative Scales Considered for MindWriter

Likert Scale

The problem that prompted service/repair was resolved.


Strongly Disagree     Disagree     Neither Agree nor Disagree     Agree     Strongly Agree
        1                 2                     3                   4              5

Numerical Scale (MindWriter’s Favorite)

To what extent are you satisfied that the problem that prompted service/repair was resolved?
Very Very
Dissatisfied Satisfied
1 2 3 4 5

Hybrid Expectation Scale

Resolution of the problem that prompted service/repair.


Met Few Met Some Met Most Met All Exceeded
Expectations Expectations Expectations Expectations Expectations
1 2 3 4 5


>closeup cont’d
are positively correlated. The correlation index is .90 (1.00 is a perfect positive correlation). This reveals the equivalence of both scales. On another reliability issue, stability, a check of the test-retest reliability over a one-week interval produces a correlation index of .93 for the expectation scale, but the MindWriter satisfaction scale has a lower index (r = .75). This implies that the expectation scale is more likely to produce stable and consistent results. Finally, as shown in the plot, the data for the resolution question measured on both scales are linear (they cluster around a straight line). It is logical to conclude that the expectation scale, which was originally thought to have advantages, performs well in initial tests and is a good replacement for MindWriter’s existing scale.

>Exhibit 12-13 Plot of MindWriter Scale Evaluation [scatter plot: satisfaction scores (1–5) on the vertical axis against expectation scores (1–5) on the horizontal axis; the points cluster around a straight line]

The decision is made. They will use the new hybrid expectation scale for the MindWriter research on the CompleteCare service program.
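The equivalence check Jason runs is an ordinary Pearson correlation. A minimal sketch, with invented pilot scores standing in for the actual CompleteCare data (the function name and data are ours):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented 5-point scores for eight pilot participants:
expectation  = [5, 4, 4, 3, 5, 2, 4, 3]
satisfaction = [5, 5, 4, 3, 4, 2, 4, 4]

r = pearson_r(expectation, satisfaction)
# An r near +1 means the two scales order participants almost
# identically -- the sense in which the scales are "equivalent."
```

An index such as the .90 reported in the closeup would be produced by exactly this computation over the 30 pilot customers' paired scores.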

The purpose of sorting is to get a conceptual representation of the sorter’s attitude toward the at-
titude object and to compare the relationships between people. The relative ranking of concepts allows
researchers to derive clusters of individuals possessing similar preferences. By varying the
instructions, researchers can use the technique to describe products, services, behavioral intentions, and a
host of other applications. In the example below, participants are asked to complete a structured sort
of cards containing the names of magazines. The scale values and the number of cards in each pile are
predetermined, although the distribution in this case represents a normal statistical distribution.
What magazines do you want Singapore Airlines to carry for its in-flight service?

Most Least
Preferred Preferred
10 9 8 7 6 5 4 3 2 1 0 (scale value)
3 4 7 10 13 16 13 10 7 4 3 (number of cards per pile)
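The structured distribution above can be checked mechanically: the pile sizes should sum to the deck size and form a symmetric, quasi-normal shape. A minimal sketch (the variable names and the `card_scores` helper are ours, for illustration):

```python
# Structured Q-sort layout from the Singapore Airlines example:
# 11 piles, scale values 10 (most preferred) down to 0 (least preferred).
scale_values = list(range(10, -1, -1))
pile_sizes   = [3, 4, 7, 10, 13, 16, 13, 10, 7, 4, 3]

total_cards = sum(pile_sizes)            # 90 cards in the deck
assert 60 <= total_cards <= 120          # within the recommended deck size
assert pile_sizes == pile_sizes[::-1]    # symmetric, quasi-normal shape

def card_scores(sorted_piles):
    """sorted_piles: 11 lists of card labels, left (most preferred) to
    right (least preferred). Returns {card: scale value} so that, e.g.,
    a magazine placed in the left-most pile scores 10."""
    return {card: value
            for value, pile in zip(scale_values, sorted_piles)
            for card in pile}
```

Comparing these per-card score dictionaries across participants is what lets the researcher cluster judges with similar preferences.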

> Cumulative Scales


In cumulative scales, equal total scores carry the same meaning: given a person’s total score, it is possible
to estimate which items were answered positively and negatively. A pioneering scale of this type was
the scalogram. Scalogram analysis is a procedure for determining whether a set of items forms a uni-
dimensional scale.18 A scale is unidimensional if the responses fall into a pattern in which endorsement
of the item reflecting the extreme position results in endorsing all items that are less extreme.
Assume we are surveying opinions regarding a new style of running shoe. We have developed a
preference scale of four items:
1. The Airsole is good-looking.
2. I will insist on Airsole next time because it is great-looking.
3. The appearance of Airsole is acceptable to me.
4. I prefer the Airsole style to other styles.


>Exhibit 12-14 Ideal Scalogram Response Pattern*

Item

2 4 1 3 Participant Score

X X X X 4

— X X X 3

— — X X 2

— — — X 1

— — — — 0

*X = agree; — = disagree.

Participants indicate whether they agree or disagree. If these items form a unidimensional scale, the
response patterns will approach the ideal configuration shown in Exhibit 12-14. Item 2 is the most
extreme position of the four attitude statements. A participant who agrees with item 2 will agree with
all four items. The items are ordered in the scalogram left to right from most to least extreme. If each
agreement renders a score of 1, a score of 4 indicates all statements are agreed upon and represents the
most favorable attitude. Persons with a score of 3 should disagree with item 2 but agree with all others,
and so on. According to scalogram theory, this pattern confirms that the universe of content (attitude
toward the appearance of this running shoe) is scalable.
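The ideal pattern in Exhibit 12-14 can be verified with a short predicate: ordering the items from most to least extreme, a perfect scalogram response is a run of disagreements followed by agreements, and the total score alone reconstructs the whole pattern. A minimal sketch (the function names are ours):

```python
def is_ideal_guttman(pattern):
    """pattern: tuple of 0/1 responses to items ordered from MOST to
    LEAST extreme (here: items 2, 4, 1, 3). In an ideal scalogram,
    agreeing with an item implies agreeing with every less extreme
    item, so the pattern never steps from 1 back down to 0."""
    return all(a <= b for a, b in zip(pattern, pattern[1:]))

# The five ideal response rows of Exhibit 12-14 (1 = agree, 0 = disagree):
ideal_rows = [(1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1),
              (0, 0, 0, 1), (0, 0, 0, 0)]
assert all(is_ideal_guttman(row) for row in ideal_rows)
assert not is_ideal_guttman((1, 0, 1, 1))   # a non-scale pattern

def pattern_from_score(score, n_items=4):
    """The cumulative-scale property: in a perfect scalogram the total
    score fully determines which items were endorsed."""
    return (0,) * (n_items - score) + (1,) * score
```

For example, `pattern_from_score(3)` yields `(0, 1, 1, 1)`: disagreement with item 2 but agreement with items 4, 1, and 3, exactly the second row of the exhibit.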
The scalogram and similar procedures for discovering underlying structure are useful for assessing
attitudes and behaviors that are highly structured, such as social distance, organizational hierarchies,
and evolutionary product stages.19 The scalogram is used much less often today, but retains potential
for specific applications.

>summary
1 Managers know that the measurement of attitudes is be analyzed statistically. Measurement scales are either
an important aspect of strategy and often the best tool unidimensional or multidimensional. A balanced rating
available because attitudes reflect past experience and scale has an equal number of categories above and below
shape future behavior. Attitudes are learned, stable pre- the midpoint, whereas an unbalanced rating scale has an
dispositions to respond to oneself, other persons, ob- unequal number of favorable and unfavorable response
jects, or issues in a consistently favorable or unfavorable choices. An unforced-choice rating scale provides par-
way. Attitudes are generally thought to be composed of ticipants with an opportunity to express no opinion when
three components: affective, cognitive, and behavioral they are unable to make a choice among the alternatives
intentions. offered. A forced-choice scale requires that they select
2 Selecting and constructing a measurement scale requires one of the offered alternatives. The ideal number of points
the consideration of several factors that influence the reli- for a rating scale should match the stimulus presented
ability, validity, and practicality of the scale. Two broad and extract information proportionate to the complexity
research objectives are to measure characteristics of the of the attitude object. The value of rating scales depends
individuals who participate in studies and to use partici- on the assumption that a rater can and will make good
pants as judges of the objects or indicants presented to judgments. Errors of central tendency, halo effect, and
them. Measurement scales fall into one of four general leniency adversely affect a precise understanding of the
response types: rating, ranking, categorization, and sort- measurement.
ing. The properties of data are classified in increasing 3 Rating scales have several uses, design features, and
order of power—nominal, ordinal, interval, or ratio—which requirements. The simple category scale offers two mu-
determines how a particular measurement scale’s data will tually exclusive response choices. The multiple-choice,

