Questionnaire Design
Guiding Principle
 Respondents should be able and willing to
provide the information requested
 respondents may not be able to recall the information
 “How much did you spend on films in the last 3
 questions may be unclear or ambiguous
 “Do you agree with the government's philosophy?”
 questions may invade respondents' privacy
 “How much did you earn last year?”
 the “good subject” effect
Multiple Items
 Many theoretical constructs are multi-faceted;
multiple questions are needed to assess them
 average of multiple items = score on construct
 multiple measures of a single construct
increase reliability (freedom from noise)
 the multiple measures of one construct
should be “sprinkled” across the questionnaire
 responses to related questions “clump up”
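The rule "average of multiple items = score on construct" can be sketched in a few lines of Python. This is only an illustration: the item numbers, the 1-5 answers, and the helper name `construct_score` are hypothetical and do not come from any real instrument.

```python
from statistics import mean

# Hypothetical assignment of questionnaire items to constructs.
# Items for one construct are deliberately spread out ("sprinkled").
ITEMS_BY_CONSTRUCT = {
    "A": [1, 7, 11],
    "B": [4, 6, 9],
}


def construct_score(responses, items):
    """Average one respondent's answers over the items of a construct."""
    return mean(responses[item] for item in items)


# One respondent's answers on a 1-5 scale, keyed by item number (made-up data).
respondent = {1: 4, 4: 2, 6: 1, 7: 5, 9: 3, 11: 3}

scores = {
    name: construct_score(respondent, items)
    for name, items in ITEMS_BY_CONSTRUCT.items()
}
print(scores)
```

Averaging (rather than summing) keeps the construct score on the same scale as the individual items, which makes scores comparable across constructs with different numbers of items.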
Inter-relationship among items
 Measures of the same construct should show
strong association (“hang together”)
 let items 1, 7, and 11 measure Construct A and
items 4, 6, and 9 Construct B

                Construct A                 Construct B
        1        7        11       4        6        9
1       perfect  strong   strong   weak     weak     weak
7       strong   perfect  strong   weak     weak     weak
11      strong   strong   perfect  weak     weak     weak
4       weak     weak     weak     perfect  strong   strong
6       weak     weak     weak     strong   perfect  strong
9       weak     weak     weak     strong   strong   perfect
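The pattern in the table above can be checked on data by computing the correlation between every pair of items. The sketch below uses made-up responses from eight hypothetical respondents and a hand-written Pearson correlation; the data were constructed so that within-construct correlations are strong and cross-construct correlations are weak.

```python
from math import sqrt

# Made-up responses (1-5 scale) from eight respondents, keyed by item number.
# Items 1, 7, 11 are assumed to measure Construct A; items 4, 6, 9 Construct B.
responses = {
    1:  [1, 2, 4, 5, 1, 2, 4, 5],
    7:  [2, 1, 5, 4, 2, 1, 5, 4],
    11: [1, 2, 4, 5, 2, 1, 5, 4],
    4:  [1, 2, 2, 1, 5, 4, 4, 5],
    6:  [2, 1, 1, 2, 4, 5, 5, 4],
    9:  [1, 2, 1, 2, 5, 4, 5, 4],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


items = list(responses)
corr = {(i, j): pearson(responses[i], responses[j]) for i in items for j in items}

# Print the matrix in the same item order as the table.
print("     " + "".join(f"{j:>7}" for j in items))
for i in items:
    print(f"{i:>5}" + "".join(f"{corr[i, j]:7.2f}" for j in items))
```

With real data the blocks are rarely this clean; an item that correlates weakly with its own construct, or strongly with another one, is a candidate for rewording or removal.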

Open or closed-ended?
 Open-ended questions allow respondents
more freedom to express their thoughts
 time-consuming to respond to
 difficult to analyze
 if open-ended responses are to be “coded” into a
set of categories
 establish inter-rater reliability (Cohen's Kappa)
 aren't we better off with closed-ended questions?
 Closed-ended questions must anticipate
the common responses
 “other” category should be used infrequently
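For coded open-ended responses, Cohen's Kappa compares the agreement two raters actually achieved with the agreement expected by chance: kappa = (p_observed - p_chance) / (1 - p_chance). A minimal sketch follows; the category labels and the two raters' codes are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on each rater's marginal category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes assigned by two raters to the same ten open-ended answers.
rater_a = ["price", "price", "quality", "quality", "service",
           "price", "quality", "service", "service", "price"]
rater_b = ["price", "price", "quality", "service", "service",
           "price", "quality", "service", "quality", "price"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 answers, but kappa is lower than 0.8 because some of that agreement would occur by chance alone.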
Scaling of responses
 To measure the strength of attitudes
towards an issue, responses are located on a
continuum anchored by opposites, e.g.
“The NTU MBA program is …”
awful | not good | so-so | pretty good | awesome
 easy to establish ordinal nature of data
 are these interval data?
Response Biases
 Not enough variation among responses
 use scale with more points (7-point, 9-point, …)
 Too many “middle” responses
 use scale with even number of points
 Leniency bias (responses on the “generous” side)
 use asymmetrical anchors, e.g.
“The candidate's potential for graduate studies is”
quite good | good | very good | extremely good | best I've seen
Forced-choice questions
 Sometimes respondents choose high levels
of all attributes when researcher wants them
to choose among attributes
 forced-choice questions, e.g.
“Which characteristic best describes you –
intelligent or hard-working?”
 variation: “Allocate 100 points over the following
features – sound quality, build quality, weight,
style, converged features (camera, MP3, PDA)”
Questions to Avoid
 Double-barrelled questions
“Have you stopped beating your wife?”
 split into two or more separate questions
 Leading questions
“Don't you think REITs are going to take off?”
 research, not advocacy
 Questions with jargon
 Are RDBMS better for TPS or DW/BI?
Pilot Testing
 The best-laid plans can go haywire!
 Objective of pilot testing is to see if
respondents consistently interpret questions
in the same way as intended
 pilot test respondents might be invited to
comment on instrument and procedure
 presence of researcher during survey
administration helps spot problems quicker
 pilot testing “uses up” respondents
Using Existing Instruments
 Many researchers place their questionnaires
in the public domain
 such questionnaires (or parts thereof) can be
used (with proper credits) if our study examines
the same or similar constructs
 re-use of existing instruments ensures
 validity and reliability of measures
 comparability of results across studies
 Try to find existing measures
 Interviews provide
 better rapport
 clarification of complex items
 greater flexibility in wording and sequence
 However, interviews
 are costly in terms of time and effort
 do not offer the anonymity of mail surveys
 If you do interviews,
 develop a script and stick to it
Methods of scaling
Response scales
 rating scales: estimate the magnitude of a characteristic
 ranking scales: rank order of preference
 sorting scales: arrange or classify concepts
 choice scales: selection of the preferred alternative
Rating scale
Rating tasks ask the respondent to estimate the
magnitude of a characteristic, or quality, that an
object possesses. The respondent's position on a
scale(s) is where he or she would rate an object.
Ranking scale
Ranking tasks require that the respondent rank
order a small number of objects in overall
performance on the basis of some characteristic
or stimulus.
Other scales
Sorting might present the respondent with
several concepts typed on cards and require that
the respondent arrange the cards into a number
of piles or otherwise classify the concepts.

Choice between two or more alternatives is
another type of measurement - it is assumed that
the chosen object is preferred over the other.
Rating scales
 category scale
 Likert scale
 semantic differential
 numerical scale
 Stapel scale
 itemised rating scale
 constant sum rating scale
 graphic rating scale
Category Scale
 a category scale is a more sensitive measure
than a scale having only two response categories
- it provides more information.
 Nominal or ordinal (example is ordinal)
 if interval between each category is regarded as equal
– interval

 dichotomous scale - 2 response categories (yes
or no; agree or disagree) nominal

How important were the following in your decision to visit
Sydney (tick one response for each item)

CLIMATE ___________ ___________ ___________
COST OF TRAVEL ___________ ___________ ___________
FAMILY ORIENTED ___________ ___________ ___________
/HISTORICAL ASPECTS _________ ___________ ___________
AREA ___________ ___________ ___________
It is more fun to play a tough, competitive
tennis match than to play an easy one.
___Strongly Agree
___Neither agree nor disagree
___Strongly Disagree

Semantic Differential
 Bipolar adjectives to anchor each end of scale
(seven-point scale), e.g.
 good :__:__:__:__:__:__:__: bad
 sweet :__:__:__:__:__:__:__: sour
 hot :__:__:__:__:__:__:__: cold

 Rotation required to avoid halo effect ???
 Image profile - graphic representation for competing
brands, services to highlight comparison (based on
mean or median)
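An image profile can be built by summarising each adjective pair per brand with the mean or the median. The sketch below assumes a 7-point coding where 1 is the left-hand adjective and 7 the right-hand one; the brands and ratings are made up.

```python
from statistics import mean, median

# Made-up 7-point semantic differential ratings (1 = left adjective, 7 = right).
ratings = {
    "Brand X": {"good-bad": [2, 3, 2, 1, 2], "sweet-sour": [4, 5, 3, 4, 4]},
    "Brand Y": {"good-bad": [5, 6, 4, 5, 5], "sweet-sour": [3, 2, 3, 2, 5]},
}


def image_profile(brand_ratings, summary=mean):
    """Summarise each adjective pair with the chosen statistic (mean or median)."""
    return {pair: summary(scores) for pair, scores in brand_ratings.items()}


profiles = {brand: image_profile(r) for brand, r in ratings.items()}
median_profiles = {brand: image_profile(r, summary=median) for brand, r in ratings.items()}

# Crude text version of the profile: one marker per brand on a 1-7 line.
for pair in ratings["Brand X"]:
    print(pair)
    for brand, profile in profiles.items():
        position = round(profile[pair])
        line = "".join("X" if point == position else "-" for point in range(1, 8))
        print(f"  {brand}: {line}  (mean {profile[pair]:.1f})")
```

The median is the safer summary if the ratings are treated as ordinal; the mean assumes the scale points are equally spaced.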
Numerical Scale
Numerical scales have numbers as response options,
rather than “semantic space” or verbal descriptions, to
identify categories (response positions).
Similar to semantic differential – bipolar adjectives on a
5-point or 7-point scale

How satisfied are you with your new computer?
Extremely satisfied 7 6 5 4 3 2 1 Extremely dissatisfied
Stapel Scales
 measures both direction & intensity of an attitude
towards an object
 up to a 10 point scale +5 to -5
 presented vertically
 considered interval

A Stapel Scale for Measuring a Store's Image
Store Name
Wide Selection
Select a positive or negative number that you think
describes the store accurately for each descriptive word.
Itemised rating scale
 Similar to category scale
 5 or more point scale
 Each point is numbered and labelled
 1 = Very unlikely; 2 = Unlikely; 3 = neither unlikely nor
likely; 4 = Likely; 5 = Very likely
 A number of statements are rated using
 Interval scale
Constant sum rating scale
 Respondent is asked to distribute a given
number of points across various items
(attributes) of a product to indicate the
importance of each attribute.
 Example: distribute 100 points among the
following attributes to indicate the
importance of each for the product - soap.
fragrance; size; shape; texture; colour
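Constant sum answers are only usable if the points actually add up to the stated total, so a check at data-entry time is worthwhile. A minimal sketch, using the soap attributes above with an invented allocation; the function name `check_constant_sum` is hypothetical.

```python
def check_constant_sum(allocation, total=100):
    """Validate one respondent's constant-sum answer and return importance shares."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points must be non-negative")
    allocated = sum(allocation.values())
    if allocated != total:
        raise ValueError(f"points add up to {allocated}, expected {total}")
    # Convert raw points into shares of the total (0.0 to 1.0).
    return {attribute: points / total for attribute, points in allocation.items()}


# Made-up answer from one respondent for the soap example.
answer = {"fragrance": 40, "size": 10, "shape": 5, "texture": 25, "colour": 20}
shares = check_constant_sum(answer)
print(shares)
```

Because every respondent distributes the same total, the resulting shares can be averaged across respondents to compare the relative importance of the attributes.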

Graphic Rating Scale Stressing Pictorial Visual
[Figure: pictorial scale with points 3, 2, 1, running from Very Good to Very Poor]
Ranking Scales
 Paired comparison – helps to identify
 Forced choice – rank a set of objects (e.g.
destinations) from most preferred to least preferred
 Comparative scale - use a benchmark to
compare another product with.
 Ranking scales provide ordinal data
Other response sets
 Scenarios – then provide a set of possible
responses to select from
 Open-ended questions
Scale decisions
 type of response scale
 number of scale categories
 balanced versus unbalanced
 even/odd number of categories
 forced versus non-forced scales
 nature & degree of verbal description
 physical form of the scale
Type of response scale
 depends on research problem and objectives
 depends on the statistical analysis
techniques that may be used for both
descriptive and inferential statistics
Number of categories
 the more options, the greater the sensitivity
 most respondents can only handle 5 to 9
 options increase as object knowledge increases
 nature of object
 mode of data collection
 analysis of the data - correlation coefficient
decreases with the reduction of categories
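The last point, that the correlation coefficient shrinks as categories are reduced, can be illustrated with a small simulation. Everything here is an assumption made for the sketch: two latent attitudes with a true correlation of about 0.7, 5000 simulated respondents, and equal-width category boundaries between -2 and 2.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def to_category(value, k, low=-2.0, high=2.0):
    """Map a continuous score to one of k equal-width categories numbered 1..k."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * k)  # 0..k
    return min(index, k - 1) + 1


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000
# Two assumed latent attitudes with a true correlation of about 0.7.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.7 * xi + sqrt(1 - 0.7 ** 2) * rng.gauss(0, 1) for xi in x]

r_by_k = {}
for k in (2, 3, 5, 7, 9):
    xc = [to_category(v, k) for v in x]
    yc = [to_category(v, k) for v in y]
    r_by_k[k] = pearson(xc, yc)
    print(f"{k} categories: r = {r_by_k[k]:.2f}")
```

The simulated respondents answer without error, so any drop in the correlation comes purely from squeezing the same attitudes into fewer response categories.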
Balanced versus unbalanced
 balanced – equal no. of favourable & unfavourable categories
 to obtain objective data, a balanced scale is needed
 if you know the response will be skewed, use an
unbalanced scale in line with the skewness
 unbalanced scale has data analysis implications
Even/odd number of categories
 depends on the need for a central or neutral
position; odd number of categories results in a
neutral point
 example: Likert scale is a balanced rating with
an odd number of categories i.e. 5 or 7
 even scales will force respondent to a position
either positive or negative.
 if a neutral or indifferent response is possible
from some respondents – odd number of
categories should be used
Forced versus non-forced scales
 forced scale - the respondent is forced to give an opinion
 forced scale omits 'no opinion' or 'no knowledge'
 forced scale can distort the response & thus the
measures of central tendency & variance
 offering a 'no opinion' option can allow respondents to be
lazy and not respond
Nature & degree of verbal description
 degree of verbal description associated with the
scale can influence the response
 categorising helps the respondent understand the
 recommend that all or most scale points need
categorising/ description
 strength of adjectives to anchor scale: generally
agree vs strongly agree
Physical form of the scale
 presentation of scale can be in many formats
 in selecting a scale format - consider the
audience and the format likely to receive the
highest response rate
Selecting an appropriate scale
 no one scale is best - decision is situational
 want maximum information
 nature of item being measured
 ease of use of technique by respondent
 analysis required
 method of communication
Criteria for goodness of measure
 3 major criteria for evaluating good measurement
 reliability
 validity
 sensitivity
 Other factors to consider are
 relevant
 versatile
 ease of response
Reliability
 refers to the extent to which a scale (number of
items) produces consistent results if repeated
measurements are made
 degree to which the scale is free from random
error and yields consistent results
 Is the scale a stable measure of the concept?
and how well do the items in a scale hold together?
 main methods – test-retest; inter-item
consistency reliability
 reliability is a necessary but insufficient condition
of the test of goodness of a measure
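Both reliability methods can be sketched in a few lines. The slides do not name a statistic for inter-item consistency; Cronbach's alpha is used here as one commonly reported coefficient, and test-retest reliability is taken as the correlation between two administrations of the same scale. All scores are made up.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Inter-item consistency: three items of one construct, eight respondents.
items = [
    [1, 2, 4, 5, 1, 2, 4, 5],
    [2, 1, 5, 4, 2, 1, 5, 4],
    [1, 2, 4, 5, 2, 1, 5, 4],
]
alpha = cronbach_alpha(items)

# Test-retest: total scale scores of the same respondents at two points in time.
time_1 = [4, 5, 13, 14, 5, 4, 14, 13]
time_2 = [5, 5, 12, 14, 4, 5, 13, 14]
retest_r = pearson(time_1, time_2)

print(f"Cronbach's alpha = {alpha:.2f}, test-retest r = {retest_r:.2f}")
```

High values on both only show that the scale is consistent; they say nothing about whether it measures the intended concept.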
Validity
 ability of a scale to measure the intended
concept and not some other concept
 content validity – measure includes an
adequate & representative set of items that tap
the concept
 literature
 qualitative research
 judgement of a panel of experts

Note: other forms of validity
Reliability and validity on target
Old Rifle: neither reliable nor valid (Target A)
New Rifle: high reliability & validity (Target B)
New Rifle, Sunglare: reliable but not valid (Target C)