# Measurement: Scaling, Reliability, Validity

CHAPTER 7


Chapter Objectives

1. Know the characteristics and power of the four types of scales: nominal, ordinal, interval, and ratio.
2. Know how and when to use the different forms of rating scales and ranking scales.
3. Explain stability and consistency and how they are established.
4. Discuss what “goodness” of measures means, and why it is necessary to establish it in research.

Scale

A scale is a tool or mechanism by which individuals are distinguished as to how they differ from one another on the variables of interest to our study.


Scales

There are four basic types of scales:

1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale

The degree of sophistication to which the scales are fine-tuned increases progressively as we move from the nominal to the ratio scale. The information on the variables can be obtained in greater detail when we employ an interval or a ratio scale than with the other two scales.

With more powerful scales, increasingly sophisticated data analyses can be performed, which in turn means that more meaningful answers can be found to our research questions.

Nominal Scale

A nominal scale is one that allows the researcher to assign subjects to certain categories or groups.

What is your department?
O Marketing O Maintenance O Production O Servicing O Sales O Public Relations O Finance O Personnel O Accounting

What is your gender?
O Male O Female

For example, with respect to the variable of gender, respondents can be grouped into two categories: male and female. Notice that there is no third category into which respondents would normally fall.

The information that can be generated from nominal scaling is the percentage (or frequency) of males and females in our sample of respondents.

Example 1

Nominally scale the nationality of individuals in a group of tourists to a country during a certain year. We could nominally scale this variable in the following mutually exclusive and collectively exhaustive categories:

American, Japanese, Russian, Malaysian, Chinese, German, Arabian, Other

Note that every respondent has to fit into one of the above categories and that the scale will allow computation of the numbers and percentages of respondents that fit into them.
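The computation of numbers and percentages described for Example 1 can be sketched in a few lines of Python. This is only an illustration: the `sample` responses are hypothetical, and `nominal_summary` is a helper name invented here, not something taken from the chapter.

```python
from collections import Counter

# Categories from Example 1 (mutually exclusive and collectively exhaustive).
CATEGORIES = ["American", "Japanese", "Russian", "Malaysian",
              "Chinese", "German", "Arabian", "Other"]


def nominal_summary(responses):
    """Return {category: (count, percentage)} for a nominal variable."""
    if not responses:
        raise ValueError("no responses to summarize")
    unknown = set(responses) - set(CATEGORIES)
    if unknown:
        # Every respondent has to fit into one of the categories.
        raise ValueError(f"responses outside the category set: {sorted(unknown)}")
    counts = Counter(responses)
    total = len(responses)
    return {c: (counts[c], 100.0 * counts[c] / total) for c in CATEGORIES}


# Hypothetical sample of ten tourists (illustrative data only).
sample = ["American", "German", "Other", "American", "Chinese",
          "Japanese", "American", "German", "Other", "Arabian"]
summary = nominal_summary(sample)
for category, (count, pct) in summary.items():
    print(f"{category:<10} {count:>2} {pct:5.1f}%")
```

Counts and percentages are the only statistics computed here, because the category labels carry no order or distance.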

Ordinal Scale

An ordinal scale not only categorizes variables in such a way as to denote differences among various categories, it also rank-orders the categories in some meaningful way.

What is the highest level of education you have completed?
O Less than High School O High School/GED Equivalent O College Degree O Masters Degree O Doctoral Degree

The preference would be ranked (from best to worst, or from first to last) and numbered as 1, 2, 3, and so on.

Example 2

Rank the following five characteristics in a job in terms of how important they are for you. You should rank the most important item as 1, the next in importance as 2, and so on, until you have ranked each of them 1, 2, 3, 4, or 5.

Job Characteristic and Ranking

The opportunity provided by the job to:

1. Interact with others _____
2. Use different skills _____
3. Complete a task to the end _____
4. Serve others _____
5. Work independently _____

This scale helps the researcher to determine the percentage of respondents who consider interaction with others as most important, those who consider using a number of skills as most important, and so on. Such knowledge might help in designing jobs that would be seen as most enriched by the majority of the employees.

We can see that the ordinal scale provides more information than the nominal scale. Even though differences in the ranking of objects or persons are clearly known, we do not know their magnitude. This deficiency is overcome by interval scaling.
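As a rough sketch of how the rankings from Example 2 might be tabulated, the Python below computes the percentage of respondents who ranked each job characteristic first. The four `rankings` rows are hypothetical, and the helper name `share_ranked_first` is an assumption made for this illustration.

```python
ITEMS = ["Interact with others", "Use different skills",
         "Complete a task to the end", "Serve others", "Work independently"]


def share_ranked_first(rankings):
    """Percentage of respondents who gave each item rank 1.

    Each ranking is a list of ranks in the same order as ITEMS,
    where 1 is the most important and 5 the least important.
    """
    if not rankings:
        raise ValueError("no rankings supplied")
    expected = list(range(1, len(ITEMS) + 1))
    for ranks in rankings:
        # Each respondent must use every rank from 1 to 5 exactly once.
        if sorted(ranks) != expected:
            raise ValueError(f"not a complete ranking: {ranks}")
    firsts = [ranks.index(1) for ranks in rankings]
    return {item: 100.0 * firsts.count(i) / len(rankings)
            for i, item in enumerate(ITEMS)}


# Hypothetical answers from four respondents (illustrative data only).
rankings = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 5, 4],
    [2, 1, 3, 4, 5],
    [3, 2, 4, 5, 1],
]
shares = share_ranked_first(rankings)
for item, pct in shares.items():
    print(f"{item:<28} {pct:5.1f}%")
```

The ranks are only counted and ordered, never subtracted, since an ordinal scale says nothing about how far apart two ranks are.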

Interval Scale

Whereas the nominal scale allows us only to qualitatively distinguish groups by categorizing them into mutually exclusive and collectively exhaustive sets, and the ordinal scale to rank-order the preferences, the interval scale lets us measure the distance between any two points on the scale.

© 2009 John Wiley & Sons Ltd. www.wileyeurope.com/college/sekaran

Example 3a

Indicate the extent to which you agree with the following statements as they relate to your job, by circling the appropriate number against each, using the scale given below.

Strongly Disagree (1), Disagree (2), Neither Agree Nor Disagree (3), Agree (4), Strongly Agree (5)

The following opportunities offered by the job are very important to me:

Interacting with others: 1 2 3 4 5
Using a number of different skills: 1 2 3 4 5
Completing a task from beginning to end: 1 2 3 4 5
Serving others: 1 2 3 4 5
Working independently: 1 2 3 4 5

Suppose that the employees circle the numbers 3, 1, 2, 4, and 5 for the five items. The magnitude of difference represented by the space between points 1 and 2 on the scale is the same as the magnitude of difference represented by the space between points 4 and 5, or between any other two points. Any number can be added to or subtracted from the numbers on the scale, still retaining the magnitude of the difference.

If we add 6 to the five points on the scale, the interval scale will have the numbers 7, 8, ..., 11 (instead of 1 to 5). The magnitude of the difference between 7 and 8 is still the same as the magnitude of the difference between 9 and 10. It has an arbitrary origin.
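The arithmetic in this example is easy to check. The short Python sketch below adds 6 to each scale point and confirms that the distance between neighboring points is unchanged; the function name `adjacent_differences` is made up for the illustration.

```python
def adjacent_differences(points):
    """Distance between each pair of neighboring scale points."""
    return [b - a for a, b in zip(points, points[1:])]


original = [1, 2, 3, 4, 5]
shifted = [p + 6 for p in original]  # the scale now runs from 7 to 11

print(shifted)
print(adjacent_differences(original))
print(adjacent_differences(shifted))

# Shifting the origin leaves every interval intact.
assert adjacent_differences(original) == adjacent_differences(shifted)
```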

Example 3b

Circle the number that represents your feelings at this particular moment best. There are no right or wrong answers. Please answer every question.

1. I invest more in my work than I get out of it.
I disagree completely 1 2 3 4 5 I agree completely

2. I exert myself too much considering what I get back in return.
I disagree completely 1 2 3 4 5 I agree completely

3. For the efforts I put into the organization, I get much in return.
I disagree completely 1 2 3 4 5 I agree completely

Ratio Scale

The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, in that it has an absolute (in contrast to an arbitrary) zero point, which is a meaningful measurement point.

What is your age?


The ratio scale is the most powerful of the four scales because it has a unique zero origin (not an arbitrary origin). The differences between scales are summarized in the next Figure.
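One way to see what the unique zero origin buys is to compare ratios. In the sketch below the ages are hypothetical. The ratio between two ages is the same in years and in months, whereas the ratio between two points on a 1-to-5 agreement scale changes as soon as the arbitrary origin is moved.

```python
# Hypothetical ages in years (a ratio-scaled variable with a true zero).
age_a, age_b = 20, 40

ratio_in_years = age_b / age_a
ratio_in_months = (age_b * 12) / (age_a * 12)  # change of unit only
print(ratio_in_years, ratio_in_months)  # both ratios agree

# Two points on a 1-to-5 agreement scale (interval, arbitrary origin).
point_low, point_high = 2, 4
ratio_before_shift = point_high / point_low
ratio_after_shift = (point_high + 6) / (point_low + 6)  # origin moved by 6
print(ratio_before_shift, ratio_after_shift)  # the ratios no longer agree
```

Statements such as "twice as old" are therefore meaningful, while "twice as much agreement" is not supported by an interval scale.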

Figure: The differences between scales

Figure: Properties of the Four Scales

Developing Scales

The four types of scales that can be used to measure the operationally defined dimensions and elements of a variable are: Nominal, Ordinal, Interval, and Ratio scales. It is necessary to examine the methods of scaling (assigning numbers or symbols) to elicit the attitudinal responses of subjects toward objects, events, or persons.

Categories of attitudinal scales (not to be confused with the four different types of scales):

1. The Rating Scales
2. The Ranking Scales

Rating scales have several response categories and are used to elicit responses with regard to the object, event, or person studied. Ranking scales make comparisons between or among objects, events, or persons and elicit the preferred choices and ranking among them.

Rating Scales

The following rating scales are often used in organizational research:

1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scale

5. Semantic differential scale
6. Itemized rating scale
7. Fixed or constant sum rating scale
8. Stapel scale
9. Graphic rating scale
10. Consensus scale

Dichotomous Scale

The dichotomous scale is used to elicit a Yes or No answer. (Note that a nominal scale is used to elicit the response.)

Example 4

Do you own a car? Yes No


Category Scale
 

The category scale uses multiple items to elicit a single response.

Example 5

Where in Jordan do you reside?
Amman, Mafraq, Irbid, Zarqa, Other

Likert Scale

The Likert scale is designed to examine how strongly subjects agree or disagree with statements on a 5-point scale, as follows:

Strongly Disagree (1), Disagree (2), Neither Agree Nor Disagree (3), Agree (4), Strongly Agree (5)

This is an interval scale, and the differences in responses between any two points on the scale remain the same.

Semantic Differential Scale

We use this scale when several attributes are identified at the extremes of the scale. For instance, the scale would employ such terms as:

Good – Bad
Strong – Weak
Hot – Cold

This scale is treated as an interval scale.

Example 6

What is your opinion of your supervisor?

Responsive -------------- Unresponsive
Beautiful -------------- Ugly
Courageous -------------- Timid

Numerical Scale

The numerical scale is similar to the semantic differential scale, with the difference that numbers on a 5-point or 7-point scale are provided, as illustrated in the following example:

How pleased are you with your new job?

Extremely pleased 5 4 3 2 1 Extremely displeased

Itemized Rating Scale

A 5-point or 7-point scale is provided for each item and the respondent states the appropriate number on the side of each item. This uses an interval scale.

Example 7(i)

Respond to each item using the scale below, and indicate your response number on the line by each item.

Very unlikely (1), Unlikely (2), Neither unlikely nor likely (3), Likely (4), Very likely (5)

I will be changing my job in the near future. --------

Note that the above is a balanced rating scale with a neutral point. The unbalanced rating scale, which does not have a neutral point, will be presented in the following example.

Example 7(ii)

Circle the number that is closest to how you feel for the item below:

Not at all interested (1), Somewhat interested (2), Moderately interested (3), Very much interested (4)

How would you rate your interest in changing current organizational policies? 1 2 3 4

Fixed or Constant Sum Scale

The respondents are asked to distribute a given number of points across various items.

Example: In choosing a toilet soap, indicate the importance you attach to each of the following five aspects by allotting points for each to total 100 in all.

Fragrance -----
Color -----
Shape -----
Size -----
_________
Total points 100

This is more in the nature of an ordinal scale.
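A constant sum response is only usable if the allotted points add up to the required total. The Python sketch below checks that condition and then reads off the order of importance. The `allocation` is a hypothetical respondent's answer covering the aspects listed in the example, and both function names are assumptions made for the illustration.

```python
def validate_constant_sum(allocation, total=100):
    """Raise ValueError unless the points are non-negative and sum to `total`."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    allotted = sum(allocation.values())
    if allotted != total:
        raise ValueError(f"points sum to {allotted}, expected {total}")
    return allocation


def importance_order(allocation):
    """Aspects sorted from most to fewest points (an ordinal reading)."""
    return sorted(allocation, key=allocation.get, reverse=True)


# Hypothetical answer from one respondent (illustrative data only).
allocation = {"Fragrance": 40, "Color": 10, "Shape": 20, "Size": 30}
validate_constant_sum(allocation)
print(importance_order(allocation))
```

Reading the answer as an order of importance matches the remark that this scale is more in the nature of an ordinal scale.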

Stapel Scale

This scale simultaneously measures both the direction and intensity of the attitude toward the items under study. The characteristic of interest to the study is placed at the center, with a numerical scale ranging, say, from +3 to -3, on either side of the item, as illustrated in the following example:

Example 8: Stapel Scale

State how you would rate your supervisor’s abilities with respect to each of the characteristics mentioned below, by circling the appropriate number.

+3                           +3                   +3
+2                           +2                   +2
+1                           +1                   +1
Adopting Modern Technology   Product Innovation   Interpersonal Skills
-1                           -1                   -1
-2                           -2                   -2
-3                           -3                   -3

Graphic Rating Scale

A graphical representation helps the respondents to indicate on this scale their answers to a particular question by placing a mark at the appropriate point on the line, as in the following example:

Example 9

On a scale of 1 to 10, how would you rate your supervisor?

10   5   1

Ranking Scales

Ranking scales are used to tap preferences between two or among more objects or items (ordinal in nature). However, such ranking may not give definitive clues to some of the answers sought.

Example 10

There are 4 product lines, and the manager seeks information that would help decide which product line should get the most attention. Assume:

35% of respondents choose the 1st product.
25% of respondents choose the 2nd product.
20% of respondents choose the 3rd product.
20% of respondents choose the 4th product.
Total: 100%

The manager cannot conclude that the first product is the most preferred. Why? Because 65% of respondents did not choose that product. We have to use alternative methods like Forced Choice, Paired Comparisons, and the Comparative Scale. We will describe the Forced Choice as an example.

Forced Choice

The forced choice enables respondents to rank objects relative to one another, among the alternatives provided. This is easier for the respondents, particularly if the number of choices to be ranked is limited.

Example 11

Rank the following newspapers that you would like to subscribe to in the order of preference, assigning 1 for the most preferred choice and 5 for the least preferred.

• الرأي (Al Rai) -----
• أخبار اليوم (Akhbar Al-Youm) -----
• الدستور (Ad-Dustour) -----
• شيحان (Shihan) -----
• الغد (Al Ghad) -----

Goodness of Measures

It is important to make sure that the instrument that we develop to measure a particular concept is accurately measuring the variable, and that we are actually measuring the concept that we set out to measure.

We need to assess the goodness of the measures developed. That is, we need to be reasonably sure that the instruments we use in our research do indeed measure the variables they are supposed to, and that they measure them accurately.


How can we ensure that the measures developed are reasonably good? First, an item analysis of the responses to the questions tapping the variable is done. Then the reliability and validity of the measures are established.

Item Analysis

Item analysis is done to see if the items in the instrument belong there or not. Each item is examined for its ability to discriminate between those subjects whose total scores are high and those with low scores. In item analysis, the means between the high-score group and the low-score group are tested to detect significant differences through the t-values.

The items with a high t-value are then included in the instrument. Thereafter, tests for the reliability of the instrument are done and the validity of the measure is established.
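The item analysis described above can be sketched as follows. Several choices in this sketch are assumptions rather than part of the lecture: the high and low groups are taken as the top and bottom halves by total score, the t-value is computed with Welch's formula for two independent groups, and the cutoff `T_CUTOFF` for a "high" t-value is a hypothetical number. The response data are invented for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(high, low):
    """t-value for the difference in means of two independent groups (Welch)."""
    diff = mean(high) - mean(low)
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    if se == 0:
        # No spread in either group: identical means give 0, otherwise infinite t.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def item_t_values(responses, fraction=0.5):
    """One t-value per item, comparing high and low total scorers.

    responses: one list of item scores per respondent.
    fraction:  share of respondents placed in each extreme group (assumption).
    """
    ranked = sorted(responses, key=sum)          # order by total score
    k = max(2, int(len(ranked) * fraction))      # group size, at least 2
    low, high = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [welch_t([r[i] for r in high], [r[i] for r in low])
            for i in range(n_items)]


# Hypothetical answers of eight respondents to three items (illustrative only).
responses = [
    [5, 4, 3], [4, 5, 2], [5, 5, 3], [4, 4, 2],
    [2, 1, 3], [1, 2, 2], [2, 2, 3], [1, 1, 2],
]
T_CUTOFF = 2.0  # hypothetical threshold for a "high" t-value
t_values = item_t_values(responses)
kept_items = [i for i, t in enumerate(t_values) if t >= T_CUTOFF]
print(t_values, kept_items)
```

In this invented data set the third item has the same mean in both groups, so it does not discriminate between high and low scorers and would be dropped.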

Reliability

The reliability of a measure indicates the extent to which it is without bias and hence ensures consistent measurement across time (stability) and across the various items in the instrument (internal consistency).

Stability

Stability: the ability of a measure to remain the same over time, despite uncontrollable testing conditions or the state of the respondents themselves.

Test-Retest Reliability: The reliability coefficient obtained with a repetition of the same measure on a second occasion.

Parallel-Form Reliability: Responses on two comparable sets of measures tapping the same construct are highly correlated.

Test-Retest Reliability

When a questionnaire containing some items that are supposed to measure a concept is administered to a set of respondents now, and again to the same respondents, say several weeks to 6 months later, then the correlation between the scores obtained is called the test-retest coefficient. The higher the coefficient is, the better the test-retest reliability, and consequently, the stability of the measure across time.
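A minimal sketch of the test-retest coefficient, assuming it is computed as the Pearson correlation between the two sets of scores. The scores in `time1` and `time2` are hypothetical, and `pearson_r` is a helper written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of the same six respondents on two occasions.
time1 = [12, 15, 9, 20, 17, 11]   # first administration
time2 = [13, 14, 10, 19, 18, 10]  # same questionnaire, some weeks later

test_retest = pearson_r(time1, time2)
print(f"test-retest coefficient: {test_retest:.2f}")
```

The same function could be applied to scores from two comparable forms of a questionnaire to obtain a parallel-form coefficient.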

Parallel-Form Reliability

When responses on two comparable sets of measures tapping the same construct are highly correlated, we have parallel-form reliability. Both forms have similar items and the same response format, the only changes being the wording and the order or sequence of the questions.

What we try to establish in the parallel form is the error variability resulting from wording and ordering of the questions. If two such comparable forms are highly correlated (say .8 and above), we may be fairly certain that the measures are reasonably reliable, with minimal error variance caused by wording, ordering, or other factors.

Internal Consistency

Internal consistency of measures is indicative of the homogeneity of the items in the measure that tap the construct.

Inter-item Consistency Reliability: This is a test of the consistency of respondents’ answers to all the items in a measure. The most popular test of inter-item consistency reliability is Cronbach’s coefficient alpha.

Split-Half Reliability: Split-half reliability reflects the correlations between two halves of an instrument.
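Both internal consistency checks can be sketched in Python. The sketch assumes the standard formula for Cronbach's alpha, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and it splits the instrument into odd-numbered and even-numbered items, which is only one of several possible ways to form two halves. The `scores` matrix is hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-respondent item-score lists."""
    k = len(scores[0])                      # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(scores):
    """Correlation between the odd-item half and the even-item half."""
    half_a = [sum(row[0::2]) for row in scores]  # items 1, 3, ...
    half_b = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    return pearson_r(half_a, half_b)


# Hypothetical answers of five respondents to a four-item measure.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
alpha = cronbach_alpha(scores)
split_half = split_half_r(scores)
print(f"alpha = {alpha:.2f}, split-half r = {split_half:.2f}")
```

Higher values of either coefficient indicate that the items hang together as a set.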

Validity

Validity tests show how well an instrument that is developed measures the particular concept it is intended to measure. Validity is concerned with whether we measure the right concept. Several types of validity tests are used to test the goodness of measures: content validity, criterion-related validity, and construct validity.

Content Validity

Content validity ensures that the measure includes an adequate and representative set of items that tap the concept. The more the scale items represent the domain of the concept being measured, the greater the content validity. In other words, content validity is a function of how well the dimensions and elements of a concept have been delineated.

Criterion-Related Validity

Criterion-related validity is established when the measure differentiates individuals on a criterion it is expected to predict. This can be done by establishing what is called concurrent validity or predictive validity. Concurrent validity is established when the scale discriminates individuals who are known to be different; that is, they should score differently on the instrument, as in the following example.

Example 12

If a measure of work ethic is developed and administered to a group of welfare recipients, the scale should differentiate those who are enthusiastic about accepting a job and glad of an opportunity to be off welfare from those who would not want to work even when offered a job.

Those with high work ethic values would not want to be on welfare and would ask for employment. Those who are low on work ethic values might exploit the opportunity to survive on welfare for as long as possible. If both types of individuals have the same score on the work ethic scale, then the test would not be a measure of work ethic, but of something else.

Construct Validity

Construct validity testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed. This is assessed through convergent and discriminant validity. Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated. Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
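The two checks can be illustrated with correlations. In the sketch below all scores are hypothetical, and the two thresholds (`CONVERGENT_MIN`, `DISCRIMINANT_MAX`) are illustrative assumptions, not values given in the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six respondents (illustrative data only).
instrument_a = [10, 14, 8, 18, 12, 16]   # first instrument for the concept
instrument_b = [11, 13, 9, 17, 12, 15]   # second instrument, same concept
unrelated_var = [5, 3, 4, 4, 3, 5]       # variable predicted to be uncorrelated

CONVERGENT_MIN = 0.8     # hypothetical threshold for "highly correlated"
DISCRIMINANT_MAX = 0.2   # hypothetical threshold for "uncorrelated"

r_convergent = pearson_r(instrument_a, instrument_b)
r_discriminant = pearson_r(instrument_a, unrelated_var)

print(f"convergent r = {r_convergent:.2f}",
      "supported" if r_convergent >= CONVERGENT_MIN else "not supported")
print(f"discriminant r = {r_discriminant:.2f}",
      "supported" if abs(r_discriminant) <= DISCRIMINANT_MAX else "not supported")
```

A real study would rest these judgments on theory and on larger samples, not on fixed cutoffs alone.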

Goodness of measures is established through the different kinds of validity and reliability. The results of any research can only be as good as the measures that tap the concepts in the theoretical framework. Table 7.2 summarizes the kinds of validity discussed in the lecture.
