
Measurement of Variables: Scaling, Reliability, Validity

1
Scale
 Scale: tool or mechanism by which individuals are
distinguished as to how they differ from one
another on the variables of interest to our study.
 Scaling is a procedure for the assignment of
numbers (or other symbols) to a property of
objects in order to impart some of the
characteristics of numbers to the properties in question.

2
Nominal Scale
 A nominal scale is one that allows the researcher to assign subjects to certain
categories or groups.

 What is your department?


O Marketing O Maintenance O Finance
O Production O Servicing O Personnel
O Sales O Public Relations O Accounting

 What is your gender?


O Male
O Female

3
Ordinal Scale
 Ordinal scale: not only categorizes variables in such a way as to
denote differences among various categories, it also rank-orders
categories in some meaningful way.

 What is the highest level of education you have completed?


O Less than High School
O High School/GED Equivalent
O College Degree
O Master's Degree
O Doctoral Degree

5
Interval Scale
 Interval scale: whereas the nominal scale allows us only
to qualitatively distinguish groups by categorizing them
into mutually exclusive and collectively exhaustive sets,
and the ordinal scale to rank-order the preferences, the
interval scale lets us measure the distance between any
two points on the scale.

7
Interval Scale
 Circle the number that represents your feelings at this particular moment best. There
are no right or wrong answers. Please answer every question.

1. I invest more in my work than I get out of it

I disagree completely 1 2 3 4 5 I agree completely

2. I exert myself too much considering what I get back in return

I disagree completely 1 2 3 4 5 I agree completely

3. For the efforts I put into the organization, I get much in return

I disagree completely 1 2 3 4 5 I agree completely

8
Ratio Scale
 Ratio scale: overcomes the disadvantage of the
arbitrary origin point of the interval scale, in that it has
an absolute (in contrast to an arbitrary) zero point,
which is a meaningful measurement point.

 What is your age?
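
The four scale definitions build on one another: each scale keeps the properties of the one before it and adds a new one. A minimal sketch of that cumulative structure follows; the property labels (`difference`, `order`, `distance`, `absolute_zero`) and the `supports` helper are illustrative names chosen here, not terms fixed by the slides.

```python
# Properties each scale supports, cumulative from nominal to ratio.
# The labels are illustrative assumptions, not terminology from the slides.
SCALE_PROPERTIES = {
    "nominal": {"difference"},
    "ordinal": {"difference", "order"},
    "interval": {"difference", "order", "distance"},
    "ratio": {"difference", "order", "distance", "absolute_zero"},
}


def supports(scale: str, prop: str) -> bool:
    """Return True if the given scale type has the given property."""
    return prop in SCALE_PROPERTIES[scale]


# Age has an absolute zero, so a ratio such as "twice as old" is meaningful.
age_a, age_b = 40, 20
if supports("ratio", "absolute_zero"):
    print(age_a / age_b)

# A 1-5 agreement item has no absolute zero, so ratios of its scores are not.
print(supports("interval", "absolute_zero"))
```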

10
Properties of the Four Scales

12
Methods of Scaling
 Rating scales
– Have several response categories and are
used to elicit responses with regard to the
object, event, or person studied.

 Ranking scales
– Make comparisons between or among
objects, events, persons and elicit the
preferred choices and ranking among
them.
13
Rating Scales
 Dichotomous scale
 Category scale
 Semantic differential scale
 Numerical scale
 Itemized rating scale
 Likert scale
 Fixed or constant sum rating scale
 Stapel scale
 Graphic rating scale
 Consensus Scale

14
Dichotomous Scale
 Is used to elicit a Yes or No answer
 Nominal scale
 It offers two mutually exclusive response choices.
 Other example: agree and disagree.

15
Dichotomous Scale

Do you own a car?

Yes
No

16
Category Scale
 Uses multiple items to elicit a single response.
 Nominal scale

Where in northern California do you reside?

North Bay
South Bay
East Bay
Peninsula
Other (specify:_____________)

17
Likert Scale
 The Likert scale was developed by Rensis Likert and is the
most frequently used variation of the summated rating scale.
 Summated rating scales consist of statements that express
either a favorable or unfavorable attitude toward the object
of interest. The participant is asked to agree or disagree with
each statement.
 Each response is given a numerical score to reflect its degree
of attitudinal favorableness and the scores may be summed to
measure the participant’s overall attitude.
 Likert scales may use 5, 7, or 9 scale points. They are quick
and easy to construct. The scale produces interval data.
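
The summation step can be sketched as below. This is a minimal illustration, assuming a 5-point coding (1 = Strongly disagree to 5 = Strongly agree) and assuming that unfavorably worded statements are reverse-scored so that a higher number always means a more favorable attitude. The item names and the `summated_score` helper are hypothetical.

```python
SCALE_POINTS = 5  # assumed 5-point scale: 1 = Strongly disagree ... 5 = Strongly agree


def summated_score(responses, unfavorable_items=()):
    """Sum item scores, reverse-scoring items that are worded unfavorably."""
    total = 0
    for item, score in responses.items():
        if not 1 <= score <= SCALE_POINTS:
            raise ValueError(f"{item}: score {score} is outside 1..{SCALE_POINTS}")
        if item in unfavorable_items:
            # Reverse-score: 1 <-> 5, 2 <-> 4, 3 stays 3.
            score = SCALE_POINTS + 1 - score
        total += score
    return total


# Hypothetical participant answering three statements.
answers = {"work_is_interesting": 4, "work_is_boring": 2, "proud_of_work": 5}
print(summated_score(answers, unfavorable_items={"work_is_boring"}))  # 4 + 4 + 5 = 13
```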

18
Likert Scale
 Originally, creating a Likert scale involved a procedure known
as item analysis. Item analysis assesses each item based on
how well it discriminates between those people whose total
score is high and those whose total score is low.
 It involves calculating the mean scores for each scale item
among the low scorers and the high scorers. The mean scores
for the high-score and low-score groups are then tested for
statistical significance by computing t values. Once the t
values for each statement have been found, the statements are
rank-ordered, and those with the highest t values are selected.
 Researchers have found that a larger number of items for each
attitude object improves the reliability of the scale.
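
The item-analysis procedure described above can be sketched as follows. The slides do not say how large the high and low groups should be or which t formula is used; this sketch assumes the top and bottom 25% of total scorers and an unequal-variance (Welch-style) t statistic. The data set and the `item_t_values` helper are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def item_t_values(data, fraction=0.25):
    """data: one list of item scores per respondent.

    Returns one t value per item, comparing the high-total-score group
    with the low-total-score group (group size is an assumption).
    """
    ranked = sorted(data, key=sum)            # order respondents by total score
    k = max(2, int(len(ranked) * fraction))   # at least 2 per group for a variance
    low, high = ranked[:k], ranked[-k:]
    t_values = []
    for i in range(len(data[0])):
        lo = [row[i] for row in low]
        hi = [row[i] for row in high]
        diff = mean(hi) - mean(lo)
        se = sqrt(variance(hi) / k + variance(lo) / k)
        if se == 0:
            t = 0.0 if diff == 0 else copysign(float("inf"), diff)
        else:
            t = diff / se
        t_values.append(t)
    return t_values


# Hypothetical responses: 8 respondents, 3 items on a 5-point scale.
data = [
    [1, 2, 3], [2, 1, 4], [2, 3, 3], [3, 3, 3],
    [3, 4, 3], [4, 4, 3], [5, 4, 3], [4, 5, 4],
]
t = item_t_values(data)
# Rank items from most to least discriminating.
ranking = sorted(range(len(t)), key=lambda i: t[i], reverse=True)
print([round(v, 3) for v in t], ranking)
```

In this made-up data the third item has the same mean in both groups, so its t value is zero and it would be the first candidate for removal.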
19
Likert Scale

My work is very interesting

Strongly disagree
Disagree
Neither agree nor disagree
Agree
Strongly agree

20
Semantic Differential Scale
 The semantic differential scale measures the psychological
meanings of an attitude object using bipolar adjectives.
 Researchers use this scale for studies of brand and
institutional image. The method consists of a set of bipolar
rating scales, usually with 7 points, by which one or more
participants rate one or more concepts on each scale item.
 The scale is based on the proposition that an object can have
several dimensions of connotative meaning. The meanings are
located in multidimensional property space, called semantic
space.

21
Semantic Differential Scale
 It is efficient and easy for securing attitudes from a large
sample.
 Attitudes may be measured in both direction and intensity.
 The total set of responses provides a comprehensive picture
of the meaning of an object and a measure of the person
doing the rating.
 It is standardized and produces interval data.

22
Numerical Scale
 Similar to the semantic differential scale, with the
difference that numbers on a 5-point or 7-point
scale are provided, with bipolar adjectives at both
ends.
 Interval scale

How pleased are you with your new real estate agent?

Extremely Pleased 7 6 5 4 3 2 1 Extremely Displeased

23
Itemized Rating Scale
 A 5-point or 7-point scale with anchors, as needed, is
provided for each item and the respondent states the
appropriate number on the side of each item, or circles
the relevant number against each item.
 Interval scale

1 = Very Unlikely, 2 = Unlikely, 3 = Neither Unlikely Nor Likely, 4 = Likely, 5 = Very Likely

1. I will be changing my job within the next 12 months

24
Fixed or Constant-Sum Scales
 The respondents are here asked to distribute a given number of
points across various items.
 The constant-sum scale helps researchers to discover
proportions. The participant allocates points to more than one
attribute or property indicant, such that they total a constant
sum, usually 100 or 10.
 Produces ordinal data
 Participant precision and patience suffer when too many stimuli
are proportioned and summed. A participant’s ability to add
may also be taxed.
 Its advantage is its compatibility with percentages and the fact that
alternatives that are perceived to be equal can be so scored.
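
A minimal sketch of how a constant-sum response might be checked and turned into proportions. The attribute names and the `constant_sum_proportions` helper are hypothetical; the constant of 100 follows the slide.

```python
def constant_sum_proportions(allocation, total=100):
    """Validate a constant-sum response and convert points to proportions."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points must be non-negative")
    allocated = sum(allocation.values())
    if allocated != total:
        raise ValueError(f"points sum to {allocated}, expected {total}")
    return {attribute: points / total for attribute, points in allocation.items()}


# Hypothetical participant distributing 100 points across three attributes.
response = {"price": 50, "quality": 30, "service": 20}
print(constant_sum_proportions(response))
```

Rejecting responses that do not add up to the constant reflects the point above that a participant's ability to add may be taxed.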
26
Stapel Scales
 It is used as an alternative to the semantic differential, especially
when it is difficult to find bipolar adjectives that match the
investigative question.
 This scale simultaneously measures both the direction and
intensity of the attitude toward the items under study.
 Interval data
 In the example, there are three attributes of corporate image.
The scale is composed of the word identifying the image
dimension and a set of 10 response categories for each of the
three attributes.

28
Stapel Scales

29
Graphic Rating Scales
 It was originally created to enable researchers to discern fine
differences.
 It helps the respondents to indicate on this scale their answers
to a particular question by placing a mark at the appropriate
point on the line.
 Ordinal scale
 Theoretically, an infinite number of ratings is possible if
participants are sophisticated enough to differentiate and
record them. They are instructed to mark their response at any
point along a continuum. Usually, the score is a measure of
length from either endpoint.
 The difficulty is in coding and analysis.
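
One way to handle the coding step is to measure the mark's distance from one endpoint and rescale it. The sketch below assumes a 100 mm printed line, measurement from the left endpoint, and a 0 to 10 score range; all three are illustrative choices, and `graphic_score` is a hypothetical helper.

```python
def graphic_score(mark_mm, line_mm=100.0, low=0.0, high=10.0):
    """Convert a mark's distance from the left endpoint into a score."""
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark lies outside the printed line")
    return low + (high - low) * mark_mm / line_mm


# A mark measured 73 mm from the left end of a 100 mm line.
print(graphic_score(73))
```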
30
Ranking Scale
 Paired Comparison
 Forced Choice
 Comparative Scale

32
Paired Comparison
 Used when, among a small number of objects, respondents are
asked to choose between two objects at a time.
 Using the paired-comparison scale, the participant can express
attitudes unambiguously by choosing between two objects. The
number of judgments required in a paired comparison is n(n-1)/2,
where n is the number of stimuli or objects to be judged.
Paired comparisons run the risk that participants will tire to the
point that they give ill-considered answers or refuse to continue.
 Paired comparisons provide ordinal data.
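
The judgment count and the resulting rank order can be sketched as follows. Ranking objects by the number of pairs they win is one common way to summarize paired comparisons and is an assumption here; the objects, the `utility` values standing in for a participant's preferences, and both helper functions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def n_judgments(n):
    """Number of pairs to judge for n objects: n(n-1)/2."""
    return n * (n - 1) // 2


def rank_from_pairs(objects, prefer):
    """Rank objects by how many paired comparisons each one wins.

    prefer(a, b) must return whichever of a or b the participant chooses.
    """
    wins = Counter({obj: 0 for obj in objects})
    for a, b in combinations(objects, 2):
        wins[prefer(a, b)] += 1
    return sorted(objects, key=lambda obj: wins[obj], reverse=True)


# Hypothetical participant whose choices follow a hidden preference strength.
utility = {"A": 3, "B": 1, "C": 4, "D": 2}
choose = lambda a, b: a if utility[a] > utility[b] else b

print(n_judgments(4))                          # 6 pairs for 4 objects
print(rank_from_pairs(list(utility), choose))  # ['C', 'A', 'D', 'B']
```

The count grows quickly: 5 objects need 10 judgments and 10 objects need 45, which is why the method suits only a small number of objects.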

33
Forced Choice
 Enables respondents to rank objects relative to one another,
among the alternatives provided.
 This method is faster than paired comparisons and is
usually easier and more motivating to the participant. With
five items, it takes ten paired comparisons to complete the
task, but the simple forced ranking of five is easier. A
drawback of this scale is the number of stimuli that can be
handled by the participant.
 This scale produces ordinal data.

35
Comparative Scales
 Provides a benchmark or a point of reference to assess
attitudes toward the current object, event, or situation
under study.
 When using a comparative scale, the participant compares
an object against a standard.
 It is ideal for such comparisons if the participants are
familiar with the standard.
 Some researchers treat the data produced by comparative
scales as interval data since the scoring reflects an interval
between the standard and what is being compared, but the
text recommends treating the data as ordinal unless the
linearity of the variables in question can be supported.
37
Goodness of Measures

39
Reliability
 The reliability of a measure indicates the extent to which it is
without bias and hence ensures consistent
measurement across time (stability) and across the
various items in the instrument (internal consistency).

41
Stability
 Stability: ability of a measure to remain the same over
time, despite uncontrollable testing conditions or the
state of the respondents themselves.
– Test–Retest Reliability: The reliability coefficient obtained with
a repetition of the same measure on a second occasion.
– Parallel-Form Reliability: Responses on two comparable sets
of measures tapping the same construct are highly
correlated.
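
Both stability checks come down to correlating two sets of scores from the same respondents. A minimal sketch, assuming the Pearson correlation is used as the reliability coefficient; the scores and the `pearson_r` helper are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for five respondents on two occasions.
first_occasion = [10, 12, 14, 16, 18]
second_occasion = [11, 12, 15, 15, 19]
print(round(pearson_r(first_occasion, second_occasion), 2))  # about 0.96
```

For parallel-form reliability the same function would be applied to scores from the two comparable forms instead of the two occasions.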

42
Internal Consistency
 Internal Consistency of Measures is indicative of the
homogeneity of the items in the measure that tap the
construct.
– Interitem Consistency Reliability: This is a test of the
consistency of respondents’ answers to all the items in a
measure. The most popular test of interitem consistency
reliability is Cronbach’s coefficient alpha.
– Split-Half Reliability: Split-half reliability reflects the
correlations between two halves of an instrument.
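
Cronbach's coefficient alpha can be computed from the item variances and the variance of the total score. The slides do not give the formula; the sketch below uses the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with sample variances. The response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(data):
    """data: one list of item scores per respondent (same items for everyone)."""
    k = len(data[0])  # number of items
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 items on a 5-point scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 5, 5],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 2))  # about 0.97
```

The value is high here because the made-up items rise and fall together across respondents, which is what homogeneity of items means in practice.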

43
Validity

 Ensures the ability of a scale to measure the
intended concept.
– Content validity
– Criterion-related validity
– Construct validity

44
Validity
 Content validity
– Ensures that the measure includes an
adequate and representative set of items
that tap the concept.
• A panel of judges

45
Validity (Cont’d)
 Criterion-related validity
– Is established when the measure
differentiates individuals on a criterion it is
expected to predict
• Concurrent validity: established when the
scale differentiates individuals who are known
to be different
• Predictive validity: indicates the ability of the
measuring instrument to differentiate among
individuals with reference to a future criterion
– Correlation
46
Validity
 Construct validity
– Testifies to how well the results obtained from the
use of the measure fit the theories around which the
test is designed.
• Convergent validity: established when the scores
obtained with two different instruments measuring the
same concept are highly correlated
• Discriminant validity: established when, based on
theory, two variables are predicted to be uncorrelated,
and the scores obtained by measuring them are indeed
empirically found to be so
– Correlation, factor analysis, convergent-discriminant
techniques, multitrait-multimethod analysis
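
The correlation approach to convergent and discriminant validity can be sketched as below: two instruments measuring the same concept should correlate highly, while a theoretically unrelated variable should not. All three score lists are made-up data, and the variable names and the `pearson_r` helper are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Six respondents measured three ways (hypothetical data).
satisfaction_a = [1, 2, 3, 4, 5, 6]   # instrument A for the concept
satisfaction_b = [2, 2, 3, 5, 5, 6]   # instrument B for the same concept
unrelated_var = [3, 1, 2, 2, 1, 3]    # variable predicted to be uncorrelated

convergent = pearson_r(satisfaction_a, satisfaction_b)
discriminant = pearson_r(satisfaction_a, unrelated_var)
print(round(convergent, 2), round(discriminant, 2))
```

A high first value supports convergent validity; a second value near zero supports discriminant validity.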

47
Understanding Validity and Reliability

48
Understanding Validity and Reliability
 This graph illustrates reliability and validity by using an
archer’s bow and target as an analogy.
 High reliability means that repeated arrows shot from the
same bow would hit the target in essentially the same place.
 If we had a bow with high validity as well, then every arrow
would hit the bull’s eye.
 If reliability is low, arrows would be more scattered.
 High validity means that the bow would shoot true every
time. It would not pull right or send an arrow careening into
the woods. Arrows shot from a high-validity bow will be
clustered around a central point even when they are
dispersed by reduced reliability.
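
The analogy can be made concrete by summarizing where a bow's arrows land. In this sketch the bull's eye sits at (0, 0); the distance of the cluster centre from the bull's eye stands in for (lack of) validity, and the average distance of arrows from their own centre stands in for (lack of) reliability. The shot coordinates and the `summarize` helper are hypothetical.

```python
from math import hypot
from statistics import mean


def summarize(shots):
    """Return (offset, spread) for a list of (x, y) arrow positions.

    offset: distance of the cluster centre from the bull's eye at (0, 0).
    spread: mean distance of the arrows from their own cluster centre.
    """
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    offset = hypot(cx, cy)
    spread = mean(hypot(x - cx, y - cy) for x, y in shots)
    return offset, spread


# Tight cluster far from the centre: high reliability, low validity.
reliable_not_valid = [(5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
# Scattered arrows centred on the bull's eye: high validity, low reliability.
valid_not_reliable = [(-3.0, 0.0), (3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]

for name, shots in [("reliable, not valid", reliable_not_valid),
                    ("valid, not reliable", valid_not_reliable)]:
    offset, spread = summarize(shots)
    print(f"{name}: offset={offset:.2f}, spread={spread:.2f}")
```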
49
