
Graphic Era University Dehradun

Prepared By-Sanjeev Kumar(Faculty R.M)


Measurement & Scaling
Variables: A variable is anything we measure. This broad definition includes almost
everything we might be interested in for an experiment: the age or gender of
participants, their reaction times, and so on. Whenever we measure a variable, the
differences we record are either measurement (quantitative) differences or categorical
(qualitative) differences. You should know both terms for each type.
Measurement variables are things to which we can assign a number. Examples include age,
height, weight, time, or number of children in a household. They are also called
quantitative variables because they measure some quantity. Categorical variables are
measures of differences in type rather than amount. Examples include anything we can
categorize, such as race, gender, or color. They are also called qualitative variables
because some quality distinguishes the objects.
Another dimension on which variables differ is that they may be either continuous or
discrete. A continuous variable can take on any value on the scale used to measure it.
Thus a measure of 1 or 2 is valid, as are 1.5 and 1.25: any division of any unit on the
scale produces a valid possible measure. Examples include height and weight. An object
could weigh 1 pound, 1.5 pounds or 1.25 pounds; all are possible measures.
Discrete variables, on the other hand, can assume only certain separate values on the
scale used to measure them. Divisions of these values are usually not valid. Thus, if I
count the television sets in your home, the result could be 1 or 2 or 3, but you could
not have 1.5 or 1.25 televisions. You either have a television or you don’t. Another way
to keep the difference in mind is that a continuous variable is a measure of “how much,”
while a discrete variable is a measure of “how many.”
Measurement is the process of assigning numbers to objects or observations; in other
words, it is some form of quantification expressed in numbers. Scales of measurement are
grouped, in terms of their mathematical properties, as Nominal, Ordinal, Interval and
Ratio. Measuring abstract concepts like ‘happiness’ is much more difficult than
measuring physical objects: abstract concepts and non-standardized measurement tools
lead to less confidence about the accuracy of measurement.

Scales of Measure – Whenever we measure a variable, it has to be on some type of scale.

The following scales are presented in order of increasing complexity.

Nominal scales – These are not really scales at all, but are instead numbers used to
differentiate objects. The numbers are just labels. Real-world examples are common:
social security numbers, the channels on your television, eye colour, race, brands,
attributes and stores are all good examples of nominal variables.


Ordinal Scales – Ordinal scales use numbers to put objects in order. The scale provides
no information beyond more or less. A good example is class rank, or any type of
ranking. Someone ranked fourth had a higher GPA than someone ranked fifth, but we don’t
know how much better fourth is than fifth. Example: liking of ice cream, with the
categories dislike, indifferent, and like, has a natural ordering. Examples in
marketing are relative attitudes, opinions and preferences.

Interval Scales – Interval scales contain an ordinal scale (objects are in order), but
have the added feature that the distance between scale units is always the same. Class
rank would not qualify because we don’t know how much better one unit is than another,
but with an interval scale there is the same distance from one unit to the next anywhere
on the scale. Examples include temperature (in Fahrenheit or Celsius) or altitude. For
temperature, you know that a difference of ten degrees is the same no matter how hot or
cold it might be. Ratio statements, however, are not meaningful: if store A received a
rating of 2 and store B a rating of 4, this does not mean that store B was preferred
twice as much as store A.

Ratio Scales – Ratio scales contain an interval scale (equal intervals between units on
the scale), but have the added feature that there is a true zero point on the scale.
This zero point is necessary for ratio statements to have meaning. Examples include
height, weight and measures of amount of time. Notice that a measure below zero is not
valid on any of these scales: something could not weigh a negative amount. These scales
are much more common than interval scales because a scale usually has a zero point. In
fact, scientists invented the Kelvin temperature scale so that they would have a measure
of temperature on a ratio scale. Again, in order to make ratio statements, such as that
something is twice or half of another, the variable must be on a ratio scale. Common
examples are sales, costs, market share and number of customers. Few psychological
variables have an absolute or natural zero.
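
The difference between interval and ratio scales can be checked with a little arithmetic.
The following is only an illustrative sketch: the temperatures and weights are made-up
values and the helper names are not part of these notes. It shows that equal intervals
survive a change of temperature units, that a "twice as hot" ratio does not, and that a
weight ratio (ratio scale) does.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin (a scale with a true zero)."""
    return c + 273.15


cool_c, warm_c = 10.0, 20.0  # hypothetical readings

# The same two temperatures give a different "ratio" in every unit,
# so ratio statements are not meaningful on an interval scale.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 3))

# Equal intervals, however, stay equal: a 10-degree Celsius step is the
# same size in Fahrenheit wherever it sits on the scale.
gap_low_f = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
gap_high_f = celsius_to_fahrenheit(40.0) - celsius_to_fahrenheit(30.0)
print(gap_low_f, gap_high_f)

# On a ratio scale such as weight, "twice as heavy" holds in any unit.
KG_TO_LB = 2.20462
light_kg, heavy_kg = 2.0, 4.0  # hypothetical weights
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)
print(ratio_kg, ratio_lb)
```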
We can categorize all variables into one of the following groups.

Type          Scale
Categorical   Nominal/Ordinal
Continuous    Interval/Ratio
Scale: Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Social Security nos., numbering of football players
  Marketing examples: Brand nos., store types
  Permissible statistics (descriptive): Percentages, mode
  Permissible statistics (inferential): Chi-square, binomial test

Scale: Ordinal
  Basic characteristics: Nos. indicate the relative positions of objects but not the
    magnitude of differences between them
  Common examples: Quality rankings, rankings of teams in a tournament
  Marketing examples: Preference rankings, market position, social class
  Permissible statistics (descriptive): Percentile, median
  Permissible statistics (inferential): Rank-order correlation, Friedman ANOVA

Scale: Interval
  Basic characteristics: Differences between objects can be compared, zero point is
    arbitrary
  Common examples: Temperature (Fahrenheit, Celsius)
  Marketing examples: Attitudes, opinions, index nos.
  Permissible statistics (descriptive): Range, mean, standard deviation
  Permissible statistics (inferential): Product-moment correlation, t tests, regression

Scale: Ratio
  Basic characteristics: Zero point is fixed, ratios of scale values can be computed
  Common examples: Length, weight
  Marketing examples: Age, sales, income, costs
  Permissible statistics (descriptive): Geometric mean, harmonic mean
  Permissible statistics (inferential): Coefficient of variation
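
The link between scale type and permissible descriptive statistics can be sketched in
code. This is a minimal illustration, not a prescribed procedure: the function name
central_tendency and the sample data are hypothetical, and only one statistic per scale
is shown (mode for nominal, median for ordinal, mean for interval and ratio).

```python
from statistics import mean, median_low, mode


def central_tendency(values, scale):
    """Return a measure of central tendency that is permissible for the scale."""
    if scale == "nominal":
        # Labels only: the most frequent category is all we can report.
        return mode(values)
    if scale == "ordinal":
        # Order is meaningful, distances are not: use the median.
        # median_low always returns a value that actually occurs in the data.
        return median_low(values)
    if scale in ("interval", "ratio"):
        # Equal intervals make the arithmetic mean meaningful.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical data for each scale type
store_types = ["grocery", "pharmacy", "grocery", "clothing"]
preference_ranks = [1, 3, 2, 5, 4]
temperatures_f = [68.0, 70.0, 75.0]

print(central_tendency(store_types, "nominal"))
print(central_tendency(preference_ranks, "ordinal"))
print(central_tendency(temperatures_f, "interval"))
```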

Measurement means assigning numbers or other symbols to characteristics of objects
according to certain prespecified rules. There must be a one-to-one correspondence
between the numbers and the characteristics being measured. The rules for assigning
numbers should be standardized and applied uniformly; they must not change over objects
or time.

Scaling involves creating a continuum upon which measured objects are located.
Consider a scale from 1 to 100 measuring attitude toward department stores. Each
respondent is assigned a number from 1 to 100, with 1 = Extremely Unfavorable and
100 = Extremely Favorable. Measurement is the actual assignment of a number from 1 to
100 to each respondent. Scaling is the process of placing the respondents on a continuum
with respect to their attitude toward department stores.
Suppose one of the banks in town is interested in comparing its image to the image of its
competitors and has developed a list of statements that can be employed to describe each
of the banks. Now suppose that, when presented with the list of characteristics, a
respondent describes bank A as having a convenient location and convenient hours, but
generally discourteous service and higher service charges. Does this respondent have an
overall favourable or unfavourable attitude towards bank A? It is the purpose of scaling
procedures to develop values for the characteristics so that we can assess the person’s
attitude towards each bank.
The scaling techniques commonly employed in marketing and social research can be
classified into comparative and non-comparative scales. Comparative scales involve the
direct comparison of stimulus objects. For example, respondents might be asked whether
they prefer Coke or Pepsi. Comparative scales must be interpreted in relative terms and
have only ordinal or rank order properties. For this reason, comparative scaling is also
referred to as non-metric scaling. The main benefit of comparative scaling is that small
differences can be detected. In addition, respondents approach the rating task from the
same known reference points, and the scales are easily understood and easily applied.
The main disadvantages are the ordinal nature of the data and the inability to
generalize beyond the stimulus objects scaled. For instance, to compare Schweppes Cola
to Coke and Pepsi, the researcher would have to do a new study. These disadvantages are
substantially overcome by non-comparative scaling techniques.
In non-comparative scaling techniques, also referred to as metric scales, each object is
scaled independently of the others in the stimulus set. The resulting data are generally
presumed to be interval or ratio scaled. For example, respondents may be asked to
evaluate Coke on a 1 to 6 preference scale (1=not at all preferred,
Tests of Good Measurement
1. VALIDITY: The extent to which an instrument measures what it is intended to measure;
the extent to which differences found with a measuring instrument reflect true
differences among those being measured.
(i) Content Validity: Adequacy of coverage of the topic under study. The instrument
should contain a representative sample of the universe. No numerical expression is
possible; it is primarily intuitive and judgmental, and relies on a panel of experts.
(ii) Criterion-related Validity: Ability to predict some outcome or estimate the
existence of some current condition. The criterion must be relevant, free from bias
(equal opportunity to each subject) and reliable (stable or reproducible), and the
information specified must be available.
A. Predictive validity: Usefulness of a test in predicting some future performance
B. Concurrent validity: Usefulness of a test in closely relating to other measures of
known validity
Criterion-related validity is therefore expressed as the coefficient of correlation
between test scores and some measure of future performance, or scores on another measure
of known validity.
(iii) Construct Validity (most complex and abstract): Degree to which a measure
conforms to predicted correlations with other theoretical propositions; degree to which
scores on a test can be accounted for by the explanatory constructs of a sound theory.
In other words, measurements are correlated with a set of other propositions.
Validity can be tested by (A) judging responses in terms of common sense, for face or
logical validity; (B) obtaining jury or expert opinion; (C) testing the instrument on
known groups; (D) using independent criteria (ideal but difficult), i.e., several
criteria are combined into an index to check the validity of scale scores.

2. RELIABILITY: The instrument should provide consistent results. Reliability
contributes to validity (note that a reliable instrument need not be valid, but a valid
instrument is always reliable) and guards against interference from transient and
situational factors.
(i) Stability: Securing consistent results with repeated measurements of the same person
with the same instrument
(ii) Equivalence: How much error may be introduced by different investigators or
different samples of the items
To improve reliability:
(A) standardize the conditions of measurement (fatigue, boredom, etc.)
(B) carefully design directions for measurement, with no variation from group to group
(use trained and motivated persons, a broadened sample, etc.)
Testing reliability of a scale:
(I) Test-retest: The rating is repeated after an interval of time and the scores are
correlated
(II) Simultaneous rating or test in multiple forms: Simultaneous rating under similar
conditions by two or more competent investigators, or with two or more forms of the
instrument, and the scores are correlated
(III) Split-half technique: Scores on equal halves of the scale are correlated with
each other
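
The reliability tests above all come down to correlating two sets of scores. The sketch
below is an illustration under stated assumptions: the scores are invented, the Pearson
product-moment coefficient is used as the correlation measure, and the split into
odd-numbered and even-numbered items is just one possible way of forming two equal
halves.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Test-retest: the same five respondents rated twice (hypothetical scores)
first_rating = [12, 15, 9, 20, 17]
second_rating = [13, 14, 10, 19, 18]
test_retest_r = pearson(first_rating, second_rating)

# Split-half: each row is one respondent's answers to a six-item scale
# (hypothetical data); the halves are the odd- and even-numbered items.
responses = [
    [4, 5, 4, 5, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 1],
]
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]
split_half_r = pearson(odd_half, even_half)

print(round(test_retest_r, 3), round(split_half_r, 3))
```

A coefficient close to 1 indicates that the two sets of scores agree closely; how high it
needs to be is a judgment the researcher has to make.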

3. PRACTICALITY:
(i) Economy: Data collection methods should be practicable. A trade-off between ideal
research and affordable cost or available budget is necessary, e.g., in the length of
the measuring instrument (more items give greater reliability but increase cost).
(ii) Convenience: The instrument should be easy to administer, e.g., through a proper
layout.
(iii) Interpretability: Persons other than the designers should be able to interpret
the results. The instrument should give evidence about its reliability, detailed
instructions for administering it, scoring keys, and guidelines for using the test and
interpreting the results.

SCALING: The generation of a continuum on which measured objects are located.

Scaling may be considered an extension of measurement. Scaling techniques may be
divided into two categories.
1. Comparative Scaling Techniques
2. Non-Comparative Scaling Techniques


Comparative scales involve the direct comparison of stimulus objects. Comparative scale
data must be interpreted in relative terms and have only ordinal or rank order properties.

Advantages of Comparative Scales:

Small differences between stimulus objects can be detected.
Same known reference points for all respondents.
Easily understood and can be applied.
Involve fewer theoretical assumptions.
Tend to reduce halo or carryover effects from one judgment to another.

Disadvantages of Comparative Scales:

1. Ordinal nature of the data
2. Inability to generalize beyond the stimulus objects scaled.


Paired comparison scaling

As its name implies, a respondent is presented with two objects and asked to select one
according to some criterion. The data obtained are ordinal in nature. A respondent may
state that they shop at Target more than at K-Mart, or like Kellogg’s cereal better than
Nabisco’s. Coca-Cola is reported to have conducted more than 190,000 paired comparisons
before introducing New Coke. In Australia, Pepsi used paired comparisons to
demonstrate to consumers that people favoured the taste of Pepsi to Coke. Consumers
were given a blind taste test featuring Pepsi and Coke. Over 50% of consumers favoured
Pepsi in the test. Pepsi used the results in ongoing promotions. Paired comparison scaling
is the most widely used comparative scaling technique.
Example: Obtaining shampoo preferences using paired comparisons.
Instructions: We are going to present you with 10 pairs of shampoo brands. For each pair,
please indicate which one of the two brands of shampoo in the pair you would prefer for
personal use.
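Recording Form:

                 Decore   Sunsilk   Pert   J&J Baby   Agree
Decore              -        0        0       1         0
Sunsilk             1        -        0       1         0
Pert                1        1        -       1         1
J&J                 0        0        0       -         0
Agree               1        1        0       1         -
No. of times
preferred           3        2        0       4         1

A 1 in a cell means that the brand in that column was preferred over the brand in that
row; a 0 means that the row brand was preferred.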

Paired comparison data can be analyzed in several ways. The researcher can calculate the
percentage of respondents who prefer one stimulus over another by summing the matrices
for all respondents and dividing the sum by the number of respondents.
Simultaneous evaluation of all stimuli is also possible. Under the assumption of
transitivity, it is possible to convert paired comparison data to a rank order.
Transitivity of preference implies that if Brand A is preferred to Brand B, and Brand B
is preferred to Brand C, then Brand A is preferred to Brand C. To arrive at a rank
order, the researcher determines the number of times each brand is preferred by summing
the column entries on the recording form.
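
The conversion to a rank order can be sketched as follows, using the entries of the
recording form above. Two points are assumptions made for this illustration: a 1 is read
as "the column brand was preferred over the row brand" (which is what summing the columns
implies), and the empty diagonal is stored as None. The variable names are hypothetical.

```python
brands = ["Decore", "Sunsilk", "Pert", "J&J", "Agree"]

# matrix[i][j] == 1 means the column brand j was preferred over the row brand i.
# The diagonal (a brand compared with itself) is left empty.
matrix = [
    [None, 0, 0, 1, 0],
    [1, None, 0, 1, 0],
    [1, 1, None, 1, 1],
    [0, 0, 0, None, 0],
    [1, 1, 0, 1, None],
]

# Sum each column to get the number of times that brand was preferred.
times_preferred = {
    brand: sum(row[j] for row in matrix if row[j] is not None)
    for j, brand in enumerate(brands)
}

# Rank from most to least preferred.
rank_order = sorted(brands, key=lambda b: times_preferred[b], reverse=True)

# With n brands, one respondent's choices are fully transitive exactly when
# the column totals are 0, 1, ..., n-1 in some order.
is_transitive = sorted(times_preferred.values()) == list(range(len(brands)))

print(times_preferred)
print(rank_order)
print(is_transitive)
```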
Paired comparison scaling is useful when the number of brands is limited. However, with
a large number of brands, the number of comparisons becomes unwieldy. Other
disadvantages are that violations of the assumption of transitivity may occur, and that the
order in which objects are presented may bias results. Paired comparisons bear little
resemblance to the market place situation which involves selection from multiple
alternatives. Also, respondents may prefer one object over another, but they may not like
it in an absolute sense.
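
How quickly the task becomes unwieldy can be seen by counting the pairs: n brands require
n(n-1)/2 comparisons. The short sketch below lists the pairs for the five shampoo brands
of the example and counts them for larger sets; the function names are illustrative only.

```python
from itertools import combinations


def all_pairs(brands):
    """Every unordered pair of brands a respondent would have to judge."""
    return list(combinations(brands, 2))


def number_of_pairs(n):
    """Number of paired comparisons needed for n brands: n(n-1)/2."""
    return n * (n - 1) // 2


shampoos = ["Decore", "Sunsilk", "Pert", "J&J", "Agree"]
pairs = all_pairs(shampoos)
print(len(pairs))  # the 10 pairs mentioned in the instructions

for n in (5, 10, 20):
    print(n, number_of_pairs(n))
```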


Rank Order Scaling

After paired comparisons, the most popular comparative scaling technique is rank order
scaling. In rank order scaling, respondents are presented with several objects
simultaneously and asked to order or rank them according to some criterion. For
example, respondents may be asked to rank brands of toothpaste according to overall
preference.
Example: Preference for toothpaste brands using rank order scaling
Instructions: Rank the various brands of toothpaste in order of preference. Begin by
picking out the brand that you like the most and assign it number 1. Then find the second
most preferred brand and assign it number 2. Continue this process until you have ranked
all the brands in order of preference. The least preferred brand should be assigned a rank
of 5. No two brands should receive the same rank order. The criterion of preference is
entirely up to you. There is no right or wrong answer. Just try and be consistent.

Brand Rank
Ultra Brite
Close Up

Like paired comparison, this approach is comparative in nature, and it is possible
that the respondent may dislike the brand ranked 1 in an absolute sense. Rank order
scaling also results in ordinal data. Compared to paired comparisons, this type of
scaling process more closely resembles the shopping environment. It also takes less
time and eliminates intransitive responses.

Constant sum scaling

In constant sum scaling, respondents allocate a constant number of units, such as points
or dollars among a set of stimulus objects with respect to some criterion.
Example: Importance of toilet soap attributes using a constant sum scale
Instructions: Below are eight attributes of toilet soaps. Please allocate 100 points among
these attributes so that your allocation reflects the importance you attach to each attribute.
The more points an attribute receives, the more important the attribute is. If an attribute is
not at all important assign it zero points. If an attribute is twice as important as some
other attribute, it should receive twice as many points.


Form: Average responses of three segments

Attribute         Segment 1   Segment 2   Segment 3
Mildness               8           2           4
Lather                 2           4          17
Shrinkage              3           9           7
Price                 53          17           9
Fragrance              9           0          19
Packaging              7           5           9
Moisturising           5           3          20
Cleaning Power        13          60          15
Sum                  100         100         100

These results are presented for three groups, or segments, from the population. Note that
although the constant sum has an absolute zero, it is considered an ordinal scale because
of its comparative nature. It can be seen that the allocation of points is influenced by the
specific attributes included in the evaluation task.
The main advantage is that it allows for fine discrimination among the stimulus objects.
However, it has two disadvantages. Respondents may allocate more or fewer units than
those specified. Another problem is rounding error if too few units are used. On the
other hand, the use of a large number of units may be too taxing for the respondent and
cause confusion and fatigue.
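
Because respondents may allocate more or fewer units than specified, each allocation is
worth checking before analysis. The sketch below uses the average allocations from the
three segments in the form above; the function names are hypothetical, and treating
negative points as invalid is an assumption added for the illustration.

```python
attributes = [
    "Mildness", "Lather", "Shrinkage", "Price",
    "Fragrance", "Packaging", "Moisturising", "Cleaning Power",
]

# Points per attribute, in the same order as the attribute list
segments = {
    "Segment 1": [8, 2, 3, 53, 9, 7, 5, 13],
    "Segment 2": [2, 4, 9, 17, 0, 5, 3, 60],
    "Segment 3": [4, 17, 7, 9, 19, 9, 20, 15],
}


def is_valid_allocation(points, total=100):
    """An allocation is usable only if it adds up to exactly the constant sum."""
    return all(p >= 0 for p in points) and sum(points) == total


def most_important(points):
    """Attribute that received the most points."""
    return attributes[points.index(max(points))]


for name, points in segments.items():
    print(name, is_valid_allocation(points), most_important(points))

# A respondent who hands out 105 points would be flagged as invalid.
print(is_valid_allocation([10, 10, 10, 55, 5, 5, 5, 5]))
```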


Respondents using a non-comparative scale employ whatever rating standard seems
appropriate to them. They evaluate only one object at a time. Non-comparative
techniques consist of continuous and itemized rating scales.
Continuous rating scale
In a continuous rating scale, also referred to as a graphic rating scale, respondents
rate the objects by placing a mark at the appropriate position on a line that runs from
one extreme of the criterion variable to the other. The respondents are not restricted
to selecting from marks previously set by the researcher. The line may be vertical or
horizontal, and scale points in the form of brief descriptions or numbers may be
provided.
Examples: How would you rate Store G as a department store?
Version 1
Probably the worst----------------------------------------------------Probably the best
Version 2
Probably the worst--------------------------------------------------------Probably the best
0 10 20 30 40 50 60 70 80 90 100
Once the respondent has provided the ratings, the researcher divides the line into as
many categories as desired and assigns scores based on the categories into which the
ratings fall. These scores are typically treated as interval data. Continuous scales are
easy to construct, but scoring is cumbersome and unreliable.
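
The scoring step can be sketched as follows. This is only an illustration: it assumes the
position of the mark has been measured from the left end of the line (for example in
millimetres), that the categories are of equal width, and that a mark exactly on a
boundary counts toward the higher category. The function name score_mark is hypothetical.

```python
def score_mark(position, line_length, n_categories):
    """Convert the position of a mark on the line into a category score.

    position     -- distance of the mark from the left end of the line
    line_length  -- total length of the line, in the same unit
    n_categories -- number of equal-width categories chosen by the researcher
    """
    if not 0 <= position <= line_length:
        raise ValueError("mark lies outside the line")
    width = line_length / n_categories
    category = int(position // width) + 1
    # A mark at the very right end belongs to the last category.
    return min(category, n_categories)


# A mark 37 units along a 100-unit line, scored with 10 and with 5 categories
print(score_mark(37, 100, 10))
print(score_mark(37, 100, 5))
```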
Itemised rating scales
Semantic Differential scale
The semantic differential is a seven point rating scale with end points associated with
bipolar labels that have semantic meaning. In a typical application, respondents rate
objects on a number of itemised, seven point scales bounded at each end by one of two
bipolar adjectives such as cold and warm.
Example Retail Store Project
Instructions: This part of the study measures what certain retail stores mean to you by
having you judge them on a series of descriptive scales bounded at each end by one of
two bipolar adjectives. Please mark (X) the blank that best indicates how accurately one
or the other adjective describes what store G means to you.

Form: Store G is:

Modern--:--:--:--:--:--:--:Old fashioned

The negative adjective or phrase sometimes appears on the left and sometimes on the
right. This controls the tendency of some respondents, particularly those with strong
positive or negative attitudes, to mark one side without reading the labels.
Individual items may be scored on either a -3 to +3 or a 1 to 7 scale. The results are
commonly analysed through profile analysis. In profile analysis, means or medians on
each rating scale are calculated and compared by plotting or statistical analysis. While
the mean is most often used, there is some controversy as to whether the data obtained
should be treated as interval data. Where the researcher requires an overall comparison
of objects, such as to determine store preference, the individual item scores are summed
to arrive at a total score.
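
A small sketch of this scoring follows. It is an illustration under assumptions: the
marks are recorded as positions 1 to 7 counted from the left, items with the favourable
adjective on the left are recoded so that 7 is always the favourable end, and the data
and the second item ("unreliable / reliable") are invented for the example.

```python
from statistics import mean

SCALE_POINTS = 7

# True = the favourable adjective is printed on the LEFT of the item
favourable_on_left = {
    "modern / old fashioned": True,
    "unreliable / reliable": False,
}

# Position marked by each of three respondents (1 = leftmost blank)
raw_positions = {
    "modern / old fashioned": [2, 3, 1],
    "unreliable / reliable": [5, 6, 3],
}


def recode(position, left_is_favourable):
    """Recode a mark so that 7 always means the favourable adjective."""
    if left_is_favourable:
        return SCALE_POINTS + 1 - position
    return position


scores = {
    item: [recode(p, favourable_on_left[item]) for p in marks]
    for item, marks in raw_positions.items()
}

# Profile analysis: one mean per item, to be plotted or compared
profile = {item: mean(values) for item, values in scores.items()}

# The same scores on the -3 to +3 scale
centred = {item: [s - 4 for s in values] for item, values in scores.items()}

# Total score per respondent, for an overall comparison of objects
totals = [sum(per_respondent) for per_respondent in zip(*scores.values())]

print(scores)
print(profile)
print(totals)
```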

Stapel Scale
The Stapel scale is a unipolar rating scale with 10 categories numbered from -5 to +5,
without a neutral (zero) point. This scale is usually presented vertically. Respondents are
asked to indicate how accurately or inaccurately each term describes the object by
selecting an appropriate numerical response category. The higher the number, the more
accurately the term describes the object.
Example: Retail Study


Instructions: Evaluate how accurately each of the phrases (high quality, poor service)
describes the retail store. Select a plus number for the phrases you think describe the
store accurately. The less accurately you think the phrase describes the store, the
larger the minus number you should choose. You can select any number from +5 for phrases
you think are very accurate to -5 for phrases you think are very inaccurate.

High Quality Poor Service
The data obtained from a Stapel scale can be analysed in the same way as semantic
differential data. It produces similar results to the semantic differential. Its advantages are
that it does not require a pretest of the adjectives or phrases to ensure true bipolarity, and
it can be administered over the phone. However some researchers believe the Stapel scale
is confusing and difficult to apply.

Likert Scale
The Likert Scale is a widely used scale that requires the respondent to indicate a degree
of agreement or disagreement with each of a series of statements about the stimulus
objects. Typically each scale has five response categories, ranging from strongly agree to
strongly disagree.

Example: Retail Store Project

Instructions: Listed below are different opinions about Store G. Please indicate how
strongly you agree or disagree with each by circling the appropriate category.

1. Sears sells high quality merchandise. 1 2X 3 4 5

2. Sears has poor in-store service. 1 2X 3 4 5

3. I like to shop at Sears. 1 2X 3 4 5

To conduct the analysis, each statement is assigned a numerical score, ranging from
either -2 to +2 or 1 to 5. The analysis can be conducted on an item-by-item basis
(profile analysis), or a total summated score can be calculated for each respondent by
summing across items. The scale has several advantages. It is easy to construct and
administer. Respondents readily understand the scale, making it suitable for mail,
telephone or personal interviews. The main disadvantage is that it takes longer to
complete than other itemized rating scales, because respondents have to read each
statement.
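
The summated score can be sketched as follows, using the three statements of the example
and the response marked with an X (category 2 on each). Two things are assumptions made
for this illustration and are not stated in the form: category 1 is taken to mean
strongly disagree and 5 strongly agree, and the negatively worded statement is reverse
scored so that a higher score always means a more favourable attitude.

```python
SCALE_MAX = 5

# (statement, reverse) -- reverse is True for the negatively worded statement
statements = [
    ("Sears sells high quality merchandise.", False),
    ("Sears has poor in-store service.", True),
    ("I like to shop at Sears.", False),
]

# Category circled by one respondent for each statement
responses = [2, 2, 2]


def item_score(response, reverse):
    """Score one item on the 1 to 5 scale, flipping negatively worded items."""
    if reverse:
        return SCALE_MAX + 1 - response
    return response


item_scores = [
    item_score(response, reverse)
    for response, (_, reverse) in zip(responses, statements)
]

# Total summated score for this respondent
summated_score = sum(item_scores)

# The same item scores expressed on the -2 to +2 scale
centred_scores = [score - 3 for score in item_scores]

print(item_scores, summated_score, centred_scores)
```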

Summary of Itemized Rating Scale Decisions

1. Number of categories: While there is no single, optimal number, traditional ...

2. Balanced vs. unbalanced: In general, the scale should be balanced to obtain
objective data.

3. Odd or even number of categories: If a neutral or indifferent scale response is
possible for at least some of the respondents, an odd number of categories should be
used.

4. Forced versus nonforced: In situations where the respondents are expected to have no
opinion, the accuracy of data may be improved by a nonforced scale.

5. Verbal description: An argument can be made for labeling all or many scale
categories.

6. Physical form: A number of options should be tried and the best one selected.
