
Measurement: Scaling, Reliability, Validity
CHAPTER 12

Scale
 Is a tool or mechanism by which
individuals are distinguished as to how
they differ from one another on the
variables of interest to our study.

Scales
 There are four basic types of scales:
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale

Nominal Scale
 A nominal scale is one that allows the researcher to assign
subjects to certain categories or groups.

 What is your department?


O Marketing O Maintenance O Finance
O Production O Servicing O Personnel
O Sales O Public Relations O Accounting

 What is your gender?


O Male
O Female

Nominal Scale
 A nominal scale categorizes individuals or objects into mutually exclusive and collectively exhaustive groups.
 The information that can be generated from nominal scaling is the percentage (or frequency) of, for example, males and females in our sample of respondents.

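To make this concrete, here is a minimal Python sketch (the pandas library and the made-up responses are assumptions for illustration) that computes the frequency and percentage of each category, which is the only kind of summary a nominal scale supports.

```python
import pandas as pd

# Hypothetical nominal responses to "What is your gender?"
responses = pd.Series(["Male", "Female", "Female", "Male", "Female"])

# Frequency counts and percentages: the only meaningful
# arithmetic a nominal scale allows.
counts = responses.value_counts()
percentages = responses.value_counts(normalize=True) * 100

print(counts)
print(percentages.round(1))
```
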
Ordinal Scale
 An ordinal scale not only categorizes variables in such a way as to denote differences among the various categories, it also rank-orders the categories in some meaningful way.

 What is the highest level of education you have completed?
O Less than High School
O High School/GED Equivalent
O College Degree
O Master's Degree
O Doctoral Degree

Ordinal Scale
 The preferences would be ranked (from best to worst, or from first to last) and numbered 1, 2, 3, and so on.

Interval Scale
 The interval scale lets us measure the distance between any two points on the scale: equal differences in scale values represent equal differences in the attribute being measured (for example, the difference between ratings of 1 and 2 is the same as the difference between ratings of 4 and 5).

Example 3a
 Indicate the extent to which you agree with the following statements as they relate to your job, by circling the appropriate number against each, using the scale given below:
Strongly Disagree = 1, Disagree = 2, Neither Agree nor Disagree = 3, Agree = 4, Strongly Agree = 5.

Example 3a (Cont.)
 The following opportunities offered by the job are very important to me:
5 4 3 2 1  Interacting with others
5 4 3 2 1  Using a number of different skills
5 4 3 2 1  Completing a task from beginning to end
5 4 3 2 1  Serving others
5 4 3 2 1  Working independently

Example 3b
 Circle the number that best represents your feelings at this particular moment. There are no right or wrong answers. Please answer every question.

1. I invest more in my work than I get out of it
   I disagree completely 1 2 3 4 5 I agree completely
2. I exert myself too much considering what I get back in return
   I disagree completely 1 2 3 4 5 I agree completely
3. For the efforts I put into the organization, I get much in return
   I disagree completely 1 2 3 4 5 I agree completely

Ratio Scale
 The ratio scale is the most powerful of the four scales because it has a unique zero origin (not an arbitrary origin); ratios of scale values are therefore meaningful (for example, a person weighing 100 kg is twice as heavy as one weighing 50 kg).
 The differences between the scales are summarized in the next figure.

The Differences between Scales (figure)
Properties of the Four Scales (figure)

Developing Scales
 The four types of scales that can be used
to measure the operationally defined
dimensions and elements of a variable are:
Nominal, Ordinal, Interval, and Ratio
scales.
 It is necessary to examine the methods of
scaling (assigning numbers or symbols) to
elicit the attitudinal responses of subjects
toward objects, events, or persons.
Developing Scales
 Categories of attitudinal scales:
(not to be confused with the four
different types of scales)
 The Rating Scales
 The Ranking Scales

Developing Scales
 Rating scales
 have several response categories and
 are used to elicit responses with regard to the object, event, or person studied.
 Ranking scales
 make comparisons between or among objects, events, or persons and elicit the preferred choices and ranking among them.

Rating Scales
 The following rating scales are often used in organizational research:
1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scale
5. Semantic differential scale
6. Itemized rating scale
7. Fixed or constant sum rating scale
8. Stapel scale
9. Graphic rating scale
10. Consensus scale

Dichotomous Scale
 Is used to elicit a Yes or No answer.
(Note that a nominal scale is used to
elicit the response)
 Example 4
Do you own a car? Yes No

Category Scale
 It uses multiple items to elicit a single
response.
 Example 5
Where in Jordan do you reside?
Amman
Mafraq
Irbid
Zarqa
Other

Likert Scale
 Is designed to examine how strongly subjects agree or disagree with statements on a 5-point scale such as the following:
Strongly Disagree (1) | Disagree (2) | Neither Agree nor Disagree (3) | Agree (4) | Strongly Agree (5)

Likert Scale
 This is an Interval scale and the
differences in responses between any
two points on the scale remain the
same.

Semantic Differential Scale
 We use this scale when several bipolar attributes are identified at the extremes of the scale. For instance, the scale would employ pairs of terms such as:
Good – Bad
Strong – Weak
Hot – Cold

Semantic Differential Scale
 This scale is treated as an interval scale.
 Example 6
What is your opinion of your supervisor?
Responsive --------------- Unresponsive
Beautiful ----------------- Ugly
Courageous --------------- Timid

Numerical Scale
 Is similar to the semantic differential scale, with the difference that numbers on a 5-point or 7-point scale are provided, as illustrated in the following example:
How pleased are you with your new job?
Extremely pleased 5 4 3 2 1 Extremely displeased

Itemized Rating Scale
 A 5-point or 7-point scale is provided for each item, and the respondent indicates the appropriate number beside each item. This uses an interval scale.
 Example 7(i)
Respond to each item using the scale below, and indicate your response number on the line by each item.
1 = Very unlikely, 2 = Unlikely, 3 = Neither unlikely nor likely, 4 = Likely, 5 = Very likely
--------------------------------------------------------------------------------
I will be changing my job in the near future. --------

Itemized Rating Scale
 Note that the above is a balanced rating scale with a neutral point.
 An unbalanced rating scale, which does not have a neutral point, is presented in the following example.

Itemized Rating Scale
 Example 7(ii)
Circle the number that is closest to how you feel for the item below:
1 = Not at all interested, 2 = Somewhat interested, 3 = Moderately interested, 4 = Very much interested
--------------------------------------------------------------------------------
How would you rate your interest in changing current organizational policies?   1 2 3 4

Fixed or Constant Sum Scale
 The respondents are asked to distribute a given number of points across various items.
Example: In choosing a toilet soap, indicate the importance you attach to each of the following aspects by allotting points to each so that they total 100 in all.
Fragrance -----
Color -----
Shape -----
Size -----
_________
Total points 100
This is more in the nature of an ordinal scale.

Stapel Scale
 This scale simultaneously measures both the direction and the intensity of the attitude toward the items under study. The characteristic of interest to the study is placed at the center, with a numerical scale ranging, say, from +3 to -3 on either side of the item, as illustrated in the following example.

Example 8: Stapel Scale
 State how you would rate your supervisor's abilities with respect to each of the characteristics mentioned below, by circling the appropriate number.
Adopting Modern Technology:  +3 +2 +1 -1 -2 -3
Product Innovation:          +3 +2 +1 -1 -2 -3
Interpersonal Skills:        +3 +2 +1 -1 -2 -3

Graphic Rating Scale
 A graphical representation helps the
respondents to indicate on this scale
their answers to a particular question by
placing a mark at the appropriate point
on the line, as in the following example:

Graphic Rating Scale
 Example 9
 On a scale of 1 to 10, how would you rate your supervisor?
1 ------------------------------ 10

Ranking Scales
 Are used to tap preferences between two objects, or among more than two objects or items (ordinal in nature). However, such ranking may not give definitive clues to some of the answers sought.

Ranking Scales
 Example 10
There are four product lines, and the manager seeks information that would help decide which product line should get the most attention.
Assume:
35% of respondents choose the 1st product.
25% of respondents choose the 2nd product.
20% of respondents choose the 3rd product.
20% of respondents choose the 4th product.
Total: 100%

Ranking Scales
 The manager cannot conclude that the first
product is the most preferred. Why?
 Because 65% of respondents did not choose
that product. We have to use alternative
methods like Forced Choice, Paired
Comparisons, and the Comparative Scale.

Forced Choice
 The forced choice enables respondents to rank objects relative to one another, among the alternatives provided. This is easier for the respondents, particularly if the number of choices to be ranked is limited.

Forced Choice
 Example
 Rank the following magazines in the order in which you would prefer to subscribe to them, assigning 1 to the most preferred choice and the highest number to the least preferred.
 Fortune _____
 Time _____
 People _____
 Prevention _____

Goodness of Measures
 It is important to make sure that the instrument we develop to measure a particular concept is
 accurately measuring the variable, and is
 actually measuring the concept that we set out to measure.

Goodness of Measures
 How can we ensure that the measures developed are reasonably good?
 First, an item analysis of the responses to the questions tapping the variable is carried out.
 Then the reliability and validity of the measures are established.

Item Analysis
 Item analysis examines whether each item discriminates between respondents whose total scores are high and those whose total scores are low; the means of the two groups on each item are compared (for example, through a t-test).
 The items with a high t-value (those that discriminate well) are then included in the instrument.
 Thereafter, tests for the reliability of the instrument are done and the validity of the measure is established.

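As an illustration of this item-analysis step, the sketch below (Python, with purely hypothetical 5-point responses and scipy's independent-samples t-test) compares high and low total-score groups on each item; items with large, significant t-values discriminate well and would be retained.

```python
import numpy as np
from scipy import stats

# Hypothetical 5-point responses: rows = respondents, columns = items
scores = np.array([
    [5, 4, 5, 2],
    [4, 5, 4, 3],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [5, 5, 4, 2],
    [2, 2, 1, 3],
])

# Split respondents into high and low groups on their total score
totals = scores.sum(axis=1)
high = scores[totals >= np.median(totals)]
low = scores[totals < np.median(totals)]

# For each item, test whether the two groups differ; items with
# high (significant) t-values discriminate well and are kept.
for item in range(scores.shape[1]):
    t, p = stats.ttest_ind(high[:, item], low[:, item])
    print(f"Item {item + 1}: t = {t:.2f}, p = {p:.3f}")
```
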
Reliability
 The reliability of a measure indicates the extent to which it is without bias, and
 ensures consistent measurement across
 time (stability) and
 the various items in the instrument (internal consistency).

Stability
 Stability: the ability of a measure to remain the same over time, despite uncontrollable testing conditions or the state of the respondents themselves.
 Test-Retest Reliability: the reliability coefficient obtained with a repetition of the same measure on a second occasion.
 Parallel-Form Reliability: responses on two comparable sets of measures tapping the same construct are highly correlated.

Test-Retest Reliability
 When a questionnaire containing items that are supposed to measure a concept is administered to a set of respondents now, and again to the same respondents, say, several weeks to six months later, the correlation between the scores obtained on the two occasions is called the test-retest coefficient.
 The higher the coefficient, the better the test-retest reliability and, consequently, the stability of the measure across time.

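A minimal sketch of how the test-retest coefficient could be computed, assuming hypothetical scores for the same respondents on two occasions (Pearson correlation via scipy). Parallel-form reliability would be computed the same way, correlating scores from the two comparable forms.

```python
import numpy as np
from scipy import stats

# Hypothetical total scores of the same eight respondents, measured
# now (test) and again several weeks later (retest)
test = np.array([12, 18, 9, 15, 20, 11, 16, 14])
retest = np.array([13, 17, 10, 14, 19, 12, 15, 15])

# The test-retest reliability coefficient is the Pearson correlation
# between the two sets of scores; the closer to 1, the more stable
# the measure is over time.
r, p = stats.pearsonr(test, retest)
print(f"Test-retest reliability: r = {r:.2f}")
```
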
Parallel-Form Reliability
 What we try to establish in the parallel-form method is the error variability resulting from the wording and ordering of the questions.
 If two such comparable forms are highly correlated (say 0.8 and above), we may be fairly certain that the measures are reasonably reliable, with minimal error variance caused by wording, ordering, or other factors.

Internal Consistency
 Internal consistency of measures is indicative of the homogeneity of the items in the measure that tap the construct.
 Inter-Item Consistency Reliability: a test of the consistency of respondents' answers to all the items in a measure. The most popular test of inter-item consistency reliability is Cronbach's coefficient alpha.
 Split-Half Reliability: reflects the correlation between two halves of an instrument.

Example
 You want to find out how satisfied your customers are with the level of customer service they receive at your call center. You send out a survey with three questions designed to measure overall satisfaction. The choices for each question are: Strongly agree / Agree / Neutral / Disagree / Strongly disagree.
• I was satisfied with my experience.
• I will probably recommend your company to others.
• If I write an online review, it would be positive.

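For a three-item survey like the one above (coding Strongly disagree = 1 through Strongly agree = 5), Cronbach's coefficient alpha could be computed as in this Python sketch; the respondent data here are purely hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's coefficient alpha for a respondents x items matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical answers (1 = Strongly disagree ... 5 = Strongly agree)
# of six customers to the three satisfaction questions above
answers = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
])

print(f"Cronbach's alpha = {cronbach_alpha(answers):.2f}")
```
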
Validity
 Validity tests show how well an instrument
that is developed measures the particular
concept it is intended to measure. Validity
is concerned with whether we measure the
right concept.
 Several types of validity tests are used to
test the goodness of measures: content
validity, criterion-related validity, and
construct validity.
Content Validity
 Content validity ensures that the measure
includes an adequate and representative
set of items that tap the concept.
 The more the scale items represent the
domain of the concept being measured, the
greater the content validity.
 In other words, content validity is a
function of how well the dimensions and
elements of a concept have been
delineated.
Criterion-Related Validity
 Criterion-Related Validity is established
when the measure differentiates individuals on
a criterion it is expected to predict. This can
be done by establishing what is called
concurrent validity or predictive validity.
 Concurrent validity is established when the
scale discriminates individuals who are
known to be different; that is, they should
score differently on the instrument as in the
following example.

Criterion-Related Validity
 Example:
If a measure of work ethic is developed and administered to a group of welfare recipients, the scale should differentiate those who are enthusiastic about accepting a job and glad of an opportunity to be off welfare from those who would not want to work even when offered a job.

Example 12 (Cont.)
 Those with high work ethic values would not want to be on welfare and would ask for employment. Those who are low on work ethic values might exploit the opportunity to survive on welfare for as long as possible.
 If both types of individuals have the same score on the work ethic scale, then the test would not be a measure of work ethic, but of something else.

Construct Validity
 Construct Validity testifies to how well the results
obtained from the use of the measure fit the theories
around which the test is designed. This is assessed
through convergent and discriminant validity.
 Convergent validity is established when the scores
obtained with two different instruments measuring
the same concept are highly correlated.
 Discriminant validity is established when, based
on theory, two variables are predicted to be
uncorrelated, and the scores obtained by measuring
them are indeed empirically found to be so.
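A small illustrative sketch of how convergent and discriminant validity might be checked with simple correlations; the instruments and scores below are hypothetical and stand in for two measures of the same concept plus a measure of a theoretically unrelated concept.

```python
import numpy as np
from scipy import stats

# Hypothetical scores of the same six respondents on three instruments
scale_a = np.array([10, 14, 9, 16, 12, 15])   # instrument A for concept X
scale_b = np.array([11, 15, 10, 15, 13, 16])  # a different instrument for the same concept X
scale_c = np.array([5, 4, 6, 5, 3, 5])        # instrument for a concept predicted to be unrelated

# Convergent validity: two different measures of the same concept
# should correlate highly.
r_conv, _ = stats.pearsonr(scale_a, scale_b)

# Discriminant validity: measures of concepts predicted to be
# uncorrelated should show little correlation.
r_disc, _ = stats.pearsonr(scale_a, scale_c)

print(f"Convergent validity:   r = {r_conv:.2f}")
print(f"Discriminant validity: r = {r_disc:.2f}")
```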
