Reliability, Validity
CHAPTER 12
Scale
A scale is a tool or mechanism by which
individuals are distinguished as to how
they differ from one another on the
variables of interest to the study.
Scales
There are four basic types of scales:
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale
Nominal Scale
A nominal scale is one that allows the researcher to assign
subjects to certain categories or groups.
Nominal Scale
A nominal scale categorizes individuals or
objects into mutually exclusive and
collectively exhaustive groups.
The information that can be generated
from nominal scaling is the frequency
(or percentage) of each category, for
example, of males and females in our
sample of respondents.
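As a rough illustration with hypothetical data, the frequencies and percentages of a nominal variable's categories can be computed like this:

```python
from collections import Counter

# Hypothetical sample: gender of eight respondents (a nominal variable)
responses = ["male", "female", "female", "male",
             "female", "female", "male", "female"]

counts = Counter(responses)                       # frequency per category
total = len(responses)
percentages = {cat: 100 * n / total for cat, n in counts.items()}

print(dict(counts))   # {'male': 3, 'female': 5}
print(percentages)    # {'male': 37.5, 'female': 62.5}
```

Note that frequency and percentage are the only arithmetic operations that make sense here; the category labels carry no order or magnitude.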
Ordinal Scale
An ordinal scale not only categorizes variables in
such a way as to denote differences among the
various categories, it also rank-orders the
categories in some meaningful way.
Ordinal Scale
The preferences would be ranked (from
best to worst, or from first to last) and
numbered 1, 2, 3, and so on.
Interval Scale
An interval scale lets us measure the
distance between any two points on the
scale: the differences between scale
points are equal, but the zero point is
arbitrary.
Example 3a
Indicate the extent to which you agree
with the following statements as they
relate to your job, by circling the
appropriate number against each,
using the scale given below.
Strongly Disagree 1, Disagree 2,
Neither Agree Nor Disagree 3,
Agree 4, Strongly Agree 5.
Example 3a (Cont.)
The following opportunities offered by
the job are very important to me:
Example 3b
3. For the efforts I put into the organization, I get much in return.
Ratio Scale
A ratio scale overcomes the disadvantage of
the arbitrary origin of the interval scale:
it has an absolute zero point, which is a
meaningful measurement point. Examples
include weight, height, and income.
The differences between
scales
Properties of the Four Scales
Developing Scales
The four types of scales that can be used
to measure the operationally defined
dimensions and elements of a variable are:
Nominal, Ordinal, Interval, and Ratio
scales.
It is necessary to examine the methods of
scaling (assigning numbers or symbols) to
elicit the attitudinal responses of subjects
toward objects, events, or persons.
Developing Scales
Categories of attitudinal scales:
(not to be confused with the four
different types of scales)
The Rating Scales
The Ranking Scales
Developing Scales
Rating scales
have several response categories and are
used to elicit responses with regard to the
object, event, or person studied.
Ranking scales
make comparisons between or among
objects, events, or persons and elicit the
preferred choices and rankings among them.
Rating Scales
The following rating scales are often
used in organizational research.
1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scale
Rating Scales
5. Semantic differential scale
6. Itemized rating scale
7. Fixed or constant sum rating scale
8. Stapel scale
9. Graphic rating scale
10. Consensus scale
Dichotomous Scale
Is used to elicit a Yes or No answer.
(Note that a nominal scale is used to
elicit the response)
Example 4
Do you own a car? Yes No
Category Scale
It uses multiple items to elicit a single
response.
Example 5
Where in Jordan do you reside?
Amman
Mafraq
Irbid
Zarqa
Other
Likert Scale
Is designed to examine how strongly
subjects agree or disagree with
statements on a 5-point scale, as
follows:
Strongly Disagree (1), Disagree (2),
Neither Agree Nor Disagree (3),
Agree (4), Strongly Agree (5)
Likert Scale
This is an Interval scale and the
differences in responses between any
two points on the scale remain the
same.
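Because Likert items are treated here as interval data, item scores can legitimately be summed or averaged into a total attitude score. A minimal sketch with hypothetical responses:

```python
# Hypothetical responses of one subject to four Likert items
# (1 = strongly disagree ... 5 = strongly agree)
item_scores = [4, 5, 3, 4]

# Interval treatment permits arithmetic: a summated (total) score
# and a mean score across the items
total_score = sum(item_scores)
mean_score = total_score / len(item_scores)

print(total_score, mean_score)  # 16 4.0
```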
Semantic Differential Scale
We use this scale when several bipolar attributes
are identified at the extremes of the scale. For
instance, the scale would employ such terms as:
Good – Bad
Strong – Weak
Hot – Cold
Semantic Differential Scale
This scale is treated as an Interval
scale.
Example 6
What is your opinion of your supervisor?
Responsive--------------Unresponsive
Beautiful-----------------Ugly
Courageous-------------Timid
Numerical Scale
Is similar to the semantic differential scale,
with the difference that numbers on a 5-point
or 7-point scale are provided, as illustrated
in the following example:
How pleased are you with your new job?
Extremely pleased 5 4 3 2 1 Extremely displeased
Itemized Rating Scale
A 5-point or 7-point scale is provided for each item
and the respondent states the appropriate number
on the side of each item. This uses an Interval
Scale.
Example 7(i)
Respond to each item using the scale below, and indicate your
response number on the line by each item.
1 Very unlikely | 2 Unlikely | 3 Neither unlikely nor likely | 4 Likely | 5 Very likely
--------------------------------------------------------------------------------
I will be changing my job in the near future. --------
Itemized Rating Scale
Note that the above is a balanced rating
scale with a neutral point.
An unbalanced rating scale, which does
not have a neutral point, is presented
in the following example.
Itemized Rating Scale
Example 7(ii)
Circle the number that is closest to how you
feel for the item below:
Not at all Somewhat Moderately Very much
interested interested interested interested
1 2 3 4
--------------------------------------------------------------------------------
How would you rate your interest in       1 2 3 4
changing current organizational policies?
Fixed or Constant Sum Scale
The respondents are asked to distribute a given
number of points across various items.
Example: In choosing a toilet soap, indicate the importance you
attach to each of the following four aspects by allotting points to
each so that they total 100 in all.
Fragrance -----
Color -----
Shape -----
Size -----
_________
Total points 100
This is more in the nature of an ordinal scale.
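A constant-sum response is usable only if the allotted points total exactly 100. A minimal validity check, using hypothetical allocations from one respondent:

```python
# Hypothetical point allocations from one respondent (constant-sum scale)
allocation = {"Fragrance": 40, "Color": 25, "Shape": 20, "Size": 15}

total = sum(allocation.values())
is_valid = (total == 100)  # the constant-sum constraint

print(total, is_valid)  # 100 True
```

In practice, responses that fail this check would be returned to the respondent or discarded.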
Stapel Scale
This scale simultaneously measures
both the direction and intensity of
the attitude toward the items under
study. The characteristic of interest
to the study is placed at the center,
and a numerical scale ranging, say,
from +3 to -3 is placed on either side
of the item, as illustrated in the
following example:
Example 8: Stapel Scale
State how you would rate your supervisor’s abilities with respect
to each of the characteristics mentioned below, by circling the
appropriate number.
+3                +3                +3
+2                +2                +2
+1                +1                +1
Adopting Modern   Product           Interpersonal
Technology        Innovation        Skills
-1                -1                -1
-2                -2                -2
-3                -3                -3
Graphic Rating Scale
A graphical representation helps the
respondents to indicate on this scale
their answers to a particular question by
placing a mark at the appropriate point
on the line, as in the following example:
Graphic Rating Scale
Example 9
On a scale of 1 to 10, how would you
rate your supervisor?
(Graphic: a line numbered 1 to 10 on which the respondent places a mark.)
Ranking Scales
Are used to tap preferences between
two or among more objects or items
(ordinal in nature). However, such
ranking may not give definitive clues
to some of the answers sought.
Ranking Scales
Example 10
There are four product lines, and the manager
seeks information that would help decide which
product line should get the most attention.
Assume 35% of respondents choose the first product.
Ranking Scales
The manager cannot conclude that the first
product is the most preferred. Why?
Because 65% of respondents did not choose
that product. We have to use alternative
methods like Forced Choice, Paired
Comparisons, and the Comparative Scale.
Forced Choice
Forced choice enables respondents
to rank objects relative to one another,
among the alternatives provided. This is
easier for the respondents, particularly
if the number of choices to be ranked
is limited.
Forced Choice
Example
Rank the following newspapers that you would
like to subscribe to in order of preference,
assigning 1 to the most preferred choice and 4
to the least preferred.
Fortune _____
Times _____
Peoples _____
Prevention _____
Goodness of Measures
It is important to make sure that the
instrument we develop to measure a
particular concept is
accurately measuring the variable, and
actually measuring the concept that we
set out to measure.
Goodness of Measures
How can we ensure that the measures
developed are reasonably good?
First an item analysis of the
responses to the questions tapping the
variable is done.
Then the reliability and validity of
the measures are established.
Item Analysis
In item analysis, each item is examined for
its ability to discriminate between subjects
whose total scores are high and those whose
scores are low. The items with a high t-value
(i.e., those that discriminate well between
the two groups) are then included in the
instrument.
Thereafter, tests for the reliability of
the instrument are done and the
validity of the measure is established.
Reliability
The reliability of a measure indicates the
extent to which it is without bias and hence
ensures consistent measurement across
time (stability) and across the
various items in the instrument (internal
consistency).
Stability
Stability:
The ability of a measure to remain the same over
time, despite uncontrollable testing conditions
or the state of the respondents themselves.
Test-Retest Reliability
When a questionnaire containing some items
that are supposed to measure a concept is
administered to a set of respondents now,
and again to the same respondents, say
several weeks to 6 months later, then the
correlation between the scores obtained is
called the test-retest coefficient.
The higher the coefficient is, the better the
test-retest reliability, and consequently, the
stability of the measure across time.
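The test-retest coefficient is simply the correlation between the scores from the two administrations. A self-contained sketch with hypothetical scores (Pearson correlation implemented directly, so nothing beyond the standard library is needed):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores of the same six respondents, several weeks apart
time1 = [12, 15, 11, 18, 14, 16]
time2 = [13, 15, 10, 17, 15, 16]

r = pearson(time1, time2)  # the test-retest reliability coefficient
print(round(r, 2))
```

A coefficient close to 1 would indicate a highly stable measure; values drifting toward 0 would suggest the measure does not hold up across time.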
Parallel-Form Reliability
Parallel-form reliability is established when
responses on two comparable sets of measures
tapping the same construct are highly correlated.
What we try to establish in the parallel
form is the error variability resulting from
the wording and ordering of the questions.
If two such comparable forms are highly
correlated (say, .8 and above), we may be
fairly certain that the measures are
reasonably reliable, with minimal error
variance caused by wording, ordering, or
other factors.
Internal Consistency
Internal consistency of measures is indicative
of the homogeneity of the items in the measure
that tap the construct; the items should hang
together as a set. The most popular test of
inter-item consistency reliability is
Cronbach's coefficient alpha.
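As a rough sketch with hypothetical data, Cronbach's alpha can be computed from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix."""
    k = len(scores[0])                    # number of items
    items = list(zip(*scores))            # column-wise item scores
    item_vars = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical data: 4 respondents answering 3 items on a 1-5 scale
data = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

Higher values indicate greater inter-item consistency; by common convention, values of about .7 and above are considered acceptable.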
Validity
Validity tests show how well an instrument
that is developed measures the particular
concept it is intended to measure. Validity
is concerned with whether we measure the
right concept.
Several types of validity tests are used to
test the goodness of measures: content
validity, criterion-related validity, and
construct validity.
Content Validity
Content validity ensures that the measure
includes an adequate and representative
set of items that tap the concept.
The more the scale items represent the
domain of the concept being measured, the
greater the content validity.
In other words, content validity is a
function of how well the dimensions and
elements of a concept have been
delineated.
Criterion-Related Validity
Criterion-Related Validity is established
when the measure differentiates individuals on
a criterion it is expected to predict. This can
be done by establishing what is called
concurrent validity or predictive validity.
Concurrent validity is established when the
scale discriminates individuals who are
known to be different; that is, they should
score differently on the instrument as in the
following example.
Criterion-Related Validity
Example:
If a measure of work ethic is developed and
administered to a group of welfare recipients,
the scale should differentiate those who are
enthusiastic about accepting a job and glad of
an opportunity to be off welfare from those who
would not want to work even when offered a
job.
Example (Cont.)
Those with high work ethic values would not
want to be on welfare and would ask for
employment. Those who are low on work ethic
values might exploit the opportunity to survive
on welfare for as long as possible.
If both types of individuals have the same
score on the work ethic scale, then the test
would not be a measure of work ethic, but of
something else.
Construct Validity
Construct Validity testifies to how well the results
obtained from the use of the measure fit the theories
around which the test is designed. This is assessed
through convergent and discriminant validity.
Convergent validity is established when the scores
obtained with two different instruments measuring
the same concept are highly correlated.
Discriminant validity is established when, based
on theory, two variables are predicted to be
uncorrelated, and the scores obtained by measuring
them are indeed empirically found to be so.