
Chapter 6

ASKING GOOD QUESTIONS
Learning Objectives:

After reading this chapter, the students should be able to:

• Differentiate each type of scale used in taking measurements;
• Discuss how each component of an attitude provides a different insight into a person's attitude;
• Identify other approaches in scaling techniques;
• Enumerate other considerations in designing scales; and
• Understand validity and reliability.
When conducting market research, aside from selecting the right target audience and research type, asking the right questions is crucial to receiving quality responses.

The purpose of market research is to gather sufficient information about your business so you can take tangible actions that move your company forward. Asking the right questions, therefore, is the first step in framing the appropriate actions.
SCALES of MEASUREMENT
Measurement can be defined as a
standardized process of assigning numbers or
other symbols to certain characteristics of the
objects of interest, according to some pre-
specified rules.
Measurement often deals with numbers because mathematical and statistical analyses can be performed only on numbers, and numbers can be communicated throughout the world in the same form without translation problems.
Two Characteristics of Measurement

1. There must be a one-to-one correspondence between the symbol and the characteristic in the object that is being measured.

2. The rules for assignment must be invariant over time and across the objects being measured.
Scaling is the process of creating a
continuum on which objects are located
according to the amount of the measured
characteristic they possess.

Measurement and scaling are basic tools used in the scientific method and are used in almost every marketing research situation.
Four Levels of Measurement
• Nominal Scales
• Ordinal Scales
• Interval Scales
• Ratio Scales
Nominal Scales
This is the crudest of the measurement scales. It classifies individuals, companies, products, brands, or other entities into categories where no order is implied. It is often referred to as a categorical scale. It is a system of classification and does not place the entity along a continuum.
• Example: Which of the following soft drinks do you like? (Multiple answers allowed.)

☐ Coke
☐ Mountain Dew
☐ Pepsi
☐ 7-Up
☐ Sprite
☐ Sarsi
☐ Royal
☐ RC Cola
☐ Others, please specify ________________
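
Because a nominal scale only classifies, the mode (the most frequently chosen category) is the only meaningful measure of central tendency. As an illustration, here is a minimal Python sketch, with hypothetical responses, that tallies multiple-answer nominal data:

    from collections import Counter

    # Hypothetical multiple-answer responses from four respondents.
    responses = [
        ["Coke", "Sprite"],
        ["Pepsi"],
        ["Coke", "Royal", "Sarsi"],
        ["Coke", "Pepsi"],
    ]

    # Frequency counts and the mode: the only "average" valid for nominal data.
    counts = Counter(brand for answer in responses for brand in answer)
    print(counts.most_common(1))   # [('Coke', 3)]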
Ordinal Scales
Ordinal scales involve the ranking of individuals, attitudes, or items along the continuum of the characteristic being scaled. The researcher may know the order of preference but nothing about how much more one brand is preferred to another; that is, there is no information about the interval between any two brands.
All of the information a nominal scale would have given is available from an ordinal scale. In addition, positional statistics such as the median, quartile, and percentile can be determined.

It is possible to test for order correlation with ranked data. The two main methods are:
1. Spearman's Rank Correlation Coefficient
2. Kendall's Coefficient of Concordance
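
As a sketch of how these two statistics can be computed in Python, the fragment below uses SciPy's spearmanr for Spearman's coefficient and evaluates Kendall's W directly from its definition (the rankings are hypothetical, and the W formula shown assumes no tied ranks):

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical rankings of five brands by three respondents (1 = most preferred).
    ranks = np.array([
        [1, 2, 3, 4, 5],
        [2, 1, 3, 5, 4],
        [1, 3, 2, 4, 5],
    ])

    # Spearman's rank correlation between the first two respondents.
    rho, p_value = spearmanr(ranks[0], ranks[1])

    # Kendall's coefficient of concordance W for m raters and n objects (no ties):
    # W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations of
    # the rank sums from their mean; W runs from 0 (no agreement) to 1 (perfect).
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    S = ((rank_sums - rank_sums.mean()) ** 2).sum()
    W = 12 * S / (m ** 2 * (n ** 3 - n))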
The only other permissible hypothesis-testing procedures are the runs test and the sign test.
The runs test (also known as the Wald-Wolfowitz test) is used to determine whether a sequence of binomial data (data that can take only one of two possible values, such as yes/no or male/female) is random or contains systematic runs of one value or the other.
Sign tests are employed when the objective is to determine whether there is a significant difference between matched pairs of data. The sign test tells the analyst whether the number of positive differences in ranking is approximately equal to the number of negative differences, in which case the distribution of rankings is random and the apparent differences are not significant.
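
Because the sign test reduces to asking whether positive signs occur with probability 0.5, it can be run as an exact binomial test. A minimal sketch follows; the matched-pair signs are hypothetical, and binomtest requires SciPy 1.7 or later:

    from scipy.stats import binomtest

    # Hypothetical matched pairs: '+' means brand A outranked brand B (ties dropped).
    signs = ["+", "+", "-", "+", "-", "+", "+", "-", "+", "+"]
    n_positive = signs.count("+")

    # H0: positive and negative differences are equally likely (p = 0.5).
    result = binomtest(n_positive, n=len(signs), p=0.5)
    print(result.pvalue)   # a large p-value means the differences are not significant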
• Example: Please rank the following soft drinks according to your degree of liking, assigning 5 to your most preferred drink and 1 to your least preferred.
____ Coke
____ Mountain Dew
____ Pepsi
____ 7-Up
____ Sprite
____ Sarsi
____ Royal
____ RC Cola
____ Others, please specify___________
Interval Scales
It is only with interval-scaled data that researchers can justify the use of the arithmetic mean as the measure of average. The interval or cardinal scale has equal units of measurement, thus making it possible to interpret not only the order of scale scores but also the distance between them. Interval scales may be either numeric or semantic.
Data obtained from an interval scale can be used to calculate the mean score of each attribute over all respondents. The standard deviation, a measure of dispersion, can also be calculated.
• Example: Please indicate your liking of the following soft drinks by encircling the number that best describes your preference.

                   Extremely                 Extremely
                   unfavorable               favorable
Coke                   5     4     3     2     1
Mountain Dew           5     4     3     2     1
Pepsi                  5     4     3     2     1
7-Up                   5     4     3     2     1
Sprite                 5     4     3     2     1
Sarsi                  5     4     3     2     1
Others, specify        5     4     3     2     1
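
To illustrate the statistics an interval scale permits, here is a short sketch, with made-up ratings, that computes the mean score and sample standard deviation of each brand over all respondents:

    import numpy as np

    # Hypothetical 5-to-1 ratings; rows are respondents,
    # columns are (Coke, Mountain Dew, Pepsi).
    ratings = np.array([
        [5, 3, 4],
        [4, 2, 5],
        [5, 4, 3],
    ])

    means = ratings.mean(axis=0)        # mean score of each brand
    stds = ratings.std(axis=0, ddof=1)  # sample standard deviation (dispersion)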
Ratio Scales
The highest level of measurement is a ratio
scale. This has the properties of an interval
scale together with a fixed origin or zero point.
Examples of variables which are ratio scaled
include weights, lengths, and times. It permits
the researcher to compare both differences in
scores and the relative magnitude of scores.
• Example: In the past seven days, approximately how many 12-ounce servings of the following soft drinks have you consumed?
____ Coke
____ Mountain Dew
____ Pepsi
____ 7-Up
____ Sprite
____ Sarsi
____ Royal
____ RC Cola
____ Others, please specify___________
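
Because a ratio scale has a true zero point, statements such as "twice as many" become meaningful, as this fragment with made-up consumption counts illustrates:

    # Hypothetical servings consumed in the past seven days.
    servings = {"Coke": 6, "Pepsi": 3, "Sprite": 2}

    # A fixed zero point makes ratios of scores interpretable.
    ratio = servings["Coke"] / servings["Pepsi"]   # 2.0: twice as many Cokes as Pepsis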
MEASURING ATTITUDES AND OTHER UNOBSERVABLE CONCEPTS
Attitudes are mental states used by individuals to structure the way they perceive their environment and respond to it. There is general acceptance that three related components form an attitude: a cognitive or knowledge component, a liking or affective component, and an intentions or actions component. Each component provides a different insight into a person's attitude.
• Cognitive or knowledge component – a person's information about an object.

• Affective or liking component – a person's overall feelings towards an object, situation, or person, on a scale of like-dislike or favorable-unfavorable.

• Intention or action component – refers to a person's expectations of future behavior toward an object.
There are other approaches that are simpler to understand. Here are the most important scales, which are easy to understand and use:

1. Graphic Rating Scale
2. Likert Scale
3. Semantic Differential Scale (Max Diff)
4. Side-by-Side Matrix
5. Stapel's Scale
1. Graphic Rating Scale – also known as a continuous rating scale. The ends of the continuum are sometimes labeled with opposite values. Respondents are required to make a mark at any point on the scale that they find appropriate. Its limitation is that coding and analysis require a substantial amount of time, since the physical distance on the scale must be measured for each respondent.
How satisfied are you with the following?

Very Unsatisfied 0 |----------------------------------------| 100 Very Satisfied

[Figure: respondents place a mark anywhere on the 0-100 line for each attribute: Product, Pricing, Customer Service, Website.]
2. Likert Scale – typically contains an odd number of options, usually 5 to 7. One end is labeled as the most positive while the other end is labeled as the most negative; the middle is labeled "neutral". End labels such as "purely negative" and "mostly negative" could also be phrased as "extremely disagree" and "slightly disagree".
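
In analysis, the Likert labels are typically coded as consecutive integers before averaging. A minimal sketch, with hypothetical label wording and responses:

    # Hypothetical coding of a 5-point agree/disagree Likert item.
    likert_codes = {
        "Extremely disagree": 1,
        "Slightly disagree": 2,
        "Neutral": 3,
        "Slightly agree": 4,
        "Extremely agree": 5,
    }

    answers = ["Extremely agree", "Neutral", "Slightly disagree", "Extremely agree"]
    codes = [likert_codes[a] for a in answers]
    mean_score = sum(codes) / len(codes)   # treats the codes as interval data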
3. Semantic Differential Scale – combines more than one semantic scale, each laid out along its own continuum. It usually contains an odd number of radio buttons with labels at opposite ends. Max Diff scales are often used in trade-off analyses such as conjoint. This can be used in new-product-feature research or even market segmentation research to get accurate orderings of the most important product features.
How satisfied are you with the following:

              Least                    Most
Ease-of-Use     O     O     O     O     O
Speed           O     O     O     O     O
Design          O     O     O     O     O
Size            O     O     O     O     O
Durability      O     O     O     O     O
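
A common and simple way to analyze Max Diff responses is count-based scoring: for each feature, subtract the number of times it was chosen least important from the number of times it was chosen most important, then divide by the number of tasks. A sketch with hypothetical choice tasks:

    from collections import Counter

    # Hypothetical Max Diff tasks: each picks the most and least important feature.
    tasks = [
        {"best": "Speed", "worst": "Size"},
        {"best": "Ease-of-Use", "worst": "Design"},
        {"best": "Speed", "worst": "Design"},
    ]

    best = Counter(t["best"] for t in tasks)
    worst = Counter(t["worst"] for t in tasks)
    features = ["Ease-of-Use", "Speed", "Design", "Size", "Durability"]

    # Count score in [-1, 1]; higher means more important overall.
    scores = {f: (best[f] - worst[f]) / len(tasks) for f in features}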
4. Side-by-Side Matrix – a common and powerful application of the side-by-side matrix is the importance/satisfaction type of question.
First, ask the respondent how important an attribute is, and then ask how satisfied they are with the performance in this area. This yields benchmark data that allow comparison of performance against other competing alternatives, as the sketch below illustrates.
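
One simple way to use the paired answers is an importance-satisfaction gap: attributes where importance exceeds satisfaction are candidates for improvement. A sketch with hypothetical mean scores:

    # Hypothetical mean importance and satisfaction scores (1-5 scale).
    attributes = ["Product", "Pricing", "Customer Service", "Website"]
    importance = [4.6, 4.2, 4.8, 3.5]
    satisfaction = [4.1, 3.0, 4.7, 3.6]

    # A large positive gap flags an attribute that matters more than it satisfies.
    gaps = {a: imp - sat for a, imp, sat in zip(attributes, importance, satisfaction)}
    print(max(gaps, key=gaps.get))   # 'Pricing' (gap of 1.2)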
5. Stapel's Scale – it was developed by Jan Stapel. This scale has some distinctive features:

a. Each item has only one word/phrase indicating the dimension it represents.
b. Each item has ten response categories.
c. Each item has an even number of categories.
d. The response categories have numerical labels but no verbal labels.
For example, suppose respondents are asked to rate the quality of halo-halo from +5 to -5. They select a plus number for words that describe the halo-halo accurately, and a minus number for words that do not describe its quality accurately. Thus, respondents can select any number from +5, for words they think are very accurate, to -5, for words they think are very inaccurate. This scale is usually presented vertically.
+5
+4
+3
+2
+1
High Quality
-1
-2
-3
-4
-5
This is a unipolar rating scale.
Other Considerations in Designing Scales
There are several issues that must be addressed when designing scales for measuring concepts in marketing research. Some of them are:
1. Number of items in a scale
2. Number of scale positions
3. Including an "I don't know" or "not applicable" response
1. Number of items in a scale – the number of items should be enough to fully capture the concept being measured.

2. Number of scale positions – five to nine response categories work quite well and are used routinely when designing measures for marketing research.

3. Including an "I don't know" or "N/A" response – oftentimes, researchers include "don't know", "no idea", or "not applicable" options together with the regular scale positions in an item. This is quite useful when only a small number of respondents is likely to have thought about the object or issue being addressed by the study.
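
When such options are offered, the "don't know"/"N/A" answers should be excluded from the scale statistics and reported separately. A minimal sketch (coding N/A as None is an assumption of this example):

    # Hypothetical 5-point ratings; None marks a "don't know / not applicable" answer.
    raw = [5, 4, None, 3, None, 5, 2]

    valid = [r for r in raw if r is not None]
    mean_score = sum(valid) / len(valid)          # statistics on substantive answers only
    na_rate = (len(raw) - len(valid)) / len(raw)  # report the N/A rate separately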
VALIDITY and RELIABILITY of MEASURES
Nothing in marketing research can be measured without error. As response errors decrease, the validity of the measures increases. There are two types of errors:

• Systematic error
• Random error
1. Systematic Error – it is an error that
affects the measurement in a constant
way. Personality traits and other stable
characteristics of an individual add
systematic error to the measurement
process.
a. Frame Error – example: in conducting telephone interviews, many households are not listed in a current telephone directory because they do not want to be listed, or are not listed accurately because they have recently changed their telephone number.
b. Population Specification Error – example: error in estimating the amount of travel on "employer's business" by personal vehicle. Employer's business travel is defined as travel on the business of the employer (e.g., trips to meetings, trips between worksites, etc.), but excludes travel between work and home (commuting).
c. Selection Error – example: door-to-door interviewers might decide to avoid houses that do not look neat and tidy because they think the inhabitants will not be agreeable to doing a survey. If people who live in messy houses are systematically different from those who live in tidy houses, then selection error will be introduced into the results of the survey.
2. Random Error – this is due to temporary aspects of the person or the measurement situation, which can affect the measurement in unpredictable ways.
Example: If a researcher wants to study the
attitudes of marketing students regarding library
services, it would not be enough to interview
every 100th person who walked into the library.
That technique would only measure the attitudes
of marketing students who use the library, not
those who do not.
VALIDITY
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees.
There are numerous statistical tests and measures to assess the validity of quantitative instruments, which generally involve pilot testing.
• External Validity is the extent to which the results of the study can be generalized from a sample to a population.

• Content Validity refers to the appropriateness and relevance of the content of an instrument.
RELIABILITY
Reliability can be thought of as consistency: does the instrument consistently measure what it is intended to measure? It is not possible to calculate reliability exactly; however, there are four general estimators that you may encounter in reading research:
• Inter-rater/observer reliability
The degree to which different raters or observers give consistent answers or estimates. This measure of reliability is used to assess the degree to which different judges or raters agree in their assessment decisions.
It is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill being assessed.
Example:
Inter-rater reliability might be employed
when different judges are evaluating the degree
to which Marketing Plans meet certain
standards. Inter-rater reliability is especially
useful when judgments can be considered
relatively subjective. Thus, the use of this type of reliability would probably be more likely when evaluating Marketing Plans as opposed to a Feasibility Study.
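
One widely used agreement statistic for this situation is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A sketch with hypothetical pass/fail judgments, using scikit-learn's cohen_kappa_score:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical ratings of eight Marketing Plans by two judges (1 = meets standard).
    judge_a = [1, 1, 0, 1, 0, 1, 1, 0]
    judge_b = [1, 0, 0, 1, 0, 1, 1, 1]

    # kappa = 1 is perfect agreement; 0 is agreement no better than chance.
    kappa = cohen_kappa_score(judge_a, judge_b)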
• Test-retest reliability
The consistency of a measure evaluated over time. It is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.
Example:
A test designed to assess student learning in
Marketing could be given to a group of students
twice, with the second administration perhaps
coming a week after the first. The obtained
correlation coefficient would indicate the stability
of the scores.
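
The stability coefficient described above is simply the correlation between the two administrations, as this sketch with made-up scores shows:

    from scipy.stats import pearsonr

    # Hypothetical Marketing test scores of six students, one week apart.
    time1 = [78, 85, 62, 90, 71, 66]
    time2 = [80, 83, 65, 92, 70, 69]

    r, p_value = pearsonr(time1, time2)   # a high r indicates stable scores over time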
• Parallel-forms reliability
The consistency of two tests constructed in the same way from the same content. It is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternative versions.
Example:
If you wanted to evaluate the reliability of a
critical thinking assessment, you might create a
large set of items that all pertain to critical
thinking and then randomly split the questions
up into two sets, which would represent the
parallel forms.
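
The parallel-forms estimate is likewise the correlation between the two forms' scores for the same respondents, as in this sketch with hypothetical scores:

    from scipy.stats import pearsonr

    # Hypothetical scores of five respondents on the two randomly split forms.
    form_a = [24, 30, 18, 27, 21]
    form_b = [25, 28, 19, 29, 20]

    r, _ = pearsonr(form_a, form_b)   # a high r means the two forms measure consistently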
