
MULTI-ITEM SCALE DEVELOPMENT

Single-Item vs Multiple-Item Scale

Single-item scale: In a single-item scale, there is only one item to measure a given construct. For example, consider the following question:

• How satisfied are you with your current job?


Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied

The problem with the above question is that a job has many aspects, such as pay, work environment, rules and regulations, job security, and communication with seniors. The respondent may be satisfied with some of these factors but not with others. A single question as stated above makes it difficult to identify the problem areas. To overcome this problem, a multiple-item scale is proposed.

Multiple-item scale: In a multiple-item scale, there are many items that together form the underlying construct the researcher is trying to measure, because each item captures some part of that construct (here, satisfaction). As an example, some of the following questions may be asked in a multiple-item scale.

• How satisfied are you with the pay you are getting on your current job?
Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied
• How satisfied are you with the rules and regulations of your organization?
Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied

• How satisfied are you with the job security in your current job?
Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied

MEASUREMENT ACCURACY/RELIABILITY AND VALIDITY (Multi-item scale)

There are two criteria for evaluating measurements: reliability and validity.

A. Reliability
Reliability is concerned with consistency, accuracy and predictability of the scale. It refers to
the extent to which a measurement process is free from errors. The reliability of a scale can
be measured using the following methods:

Test–retest reliability: In this method, repeated measurements of the same person or group
using the same scale under similar conditions are taken. A very high correlation between the
two scores indicates that the scale is reliable. However, the following issues should be kept in
mind before arriving at such a conclusion.

• The appropriate time gap between the two observations requires careful attention. If the gap between the two administrations is very small (say two or three weeks), respondents are likely to remember their previous answers and may simply repeat them when the instrument is administered the second time. This will make the instrument appear reliable, which may not actually be the case. However, if the gap is very large (say more than a year), respondents' answers to the various questions might have genuinely changed, resulting in poor apparent reliability of the scale. Therefore, the researcher has to be very careful in deciding on the time gap between the two observations. A gap of about five to six months is generally thought to be ideal.

• Another problem in this test is that the first measurement may change the response of the
subject to the second measurement.

• The situational factors at the two points in time may not be the same, which may result in different measurements in the two periods.

• A second reading on the same instrument from the same subject may produce boredom or anger, or prompt attempts to remember the answers given in the initial measurement.
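
With these caveats in mind, a minimal sketch of the test-retest computation, assuming two hypothetical arrays of scores collected from the same respondents at two points in time:

```python
# Test-retest reliability: correlate two administrations of the same scale.
# The score vectors below are hypothetical, for illustration only.
import numpy as np

time_1 = np.array([4, 3, 5, 2, 4, 3, 5, 4])  # first administration
time_2 = np.array([4, 3, 4, 2, 5, 3, 5, 4])  # second administration

# A Pearson correlation close to 1 suggests the scale is stable over time.
r = np.corrcoef(time_1, time_2)[0, 1]
print(f"Test-retest reliability (r) = {r:.2f}")
```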

Split-half reliability method: This method is used in the case of multiple-item scales. Here the items are randomly divided into two halves and a correlation coefficient between the scores on the two halves is obtained. A high correlation indicates that the internal consistency of the construct leads to greater reliability. Another common approach is the odd-even method, in which the subset of odd-numbered items is compared with the subset of even-numbered items. For an eight-item scale, the split would be:

A) Items 1, 3, 5, 7 form the odd-numbered subset;

B) Items 2, 4, 6, 8 form the even-numbered subset.
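
A minimal sketch of the odd-even split, assuming a hypothetical eight-item scale scored by several respondents; the sketch also applies the standard Spearman-Brown correction, which adjusts for the halved scale length:

```python
# Odd-even split-half reliability on a hypothetical respondents-by-items matrix
# (rows are respondents, columns are items).
import numpy as np

scores = np.array([
    [4, 5, 4, 4, 3, 4, 5, 4],
    [2, 1, 2, 2, 3, 2, 1, 2],
    [5, 4, 5, 5, 4, 5, 4, 5],
    [3, 3, 4, 3, 3, 3, 4, 3],
    [1, 2, 1, 2, 1, 1, 2, 1],
])

odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
# Spearman-Brown correction: estimates the reliability of the full-length
# scale from the half-to-half correlation.
r_full = 2 * r_half / (1 + r_half)
print(f"Split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```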

Another measure used to test the internal consistency of a multiple-item scale is the coefficient alpha (α), commonly known as Cronbach's alpha. Cronbach's alpha computes the average of all possible split-half reliabilities for a multiple-item scale; a high value indicates that these split-half reliabilities converge, i.e., that the items are internally consistent. The coefficient alpha does not address validity, although many researchers use it as a sole indicator of validity. The following values of alpha, with their interpretations, are suggested:

• 0.80 ≤ α ≤ 0.95 implies very good reliability among the various items of a multiple-item scale
• 0.70 ≤ α < 0.80 implies good reliability among the various items of a multiple-item scale
• 0.60 ≤ α < 0.70 implies fair reliability among the various items of a multiple-item scale
• α < 0.60 means poor reliability among the various items of a multiple-item scale.
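
A minimal sketch of the computation, assuming a hypothetical respondents-by-items score matrix:

```python
# Cronbach's alpha for a hypothetical four-item scale
# (rows are respondents, columns are items).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

scores = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```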

Inter-rater reliability
Inter-rater reliability is the extent to which two or more raters (or observers, coders, examiners) agree. It addresses the consistency with which a rating system is applied. Inter-rater reliability can be evaluated using a number of different statistics, such as percentage agreement. High values indicate a high degree of agreement between the examiners; low values indicate a low degree of agreement.
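
A minimal sketch of percentage agreement, assuming two hypothetical raters who have coded the same ten subjects:

```python
# Percentage agreement between two raters on hypothetical binary codings.
import numpy as np

rater_a = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
rater_b = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])

# Proportion of subjects on which the two raters gave the same code.
agreement = (rater_a == rater_b).mean()
print(f"Percentage agreement = {agreement:.0%}")
```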

B. Validity
The validity of a scale refers to the question of whether we are measuring what we want to measure. It refers to the extent to which the measurement process is free from both systematic and random errors. Validity is a more serious issue than reliability. There are different ways to assess validity.

Content validity: This is also called face validity. The content validity of a measuring
instrument is the extent to which it provides adequate coverage of the investigative questions
guiding the study. A determination of content validity involves judgment. First, the designer
may determine it through a careful definition of the topic, the items to be scaled, and the
scales to be used. This logical process is often intuitive and unique to each research designer.
A second way is to use a panel of persons to judge how well the instrument meets the
standards. The panel independently assesses the test items for an instrument as essential,
useful but not essential, or not necessary. In both informal judgments and this systematic
process, “content validity is primarily concerned with inferences about test construction
rather than inferences about test scores.”

For example, suppose a multiple-item scale with 15 proposed items is developed to measure customers' perception of Jet Airways. These items, when combined in an index, measure the perception of Jet Airways. To judge the content validity of these 15 items, a set of experts may be requested to examine how representative the items are. The items may lack content validity if, for example, crew behaviour, food quality, and food quantity have been omitted from the list. In fact, conducting exploratory research to exhaust the list of items measuring perception of the airline would be of immense help in such a case.

Criterion-related validity: This involves the success of measures used for prediction or estimation. It is the ability of a phenomenon measured at one point in time to predict another phenomenon at a future point in time. If the correlation coefficient between the two is high, the initial measure is said to have high predictive ability. As an example, consider the use of the Common Admission Test (CAT) to shortlist candidates for admission to the MBA programme in a business school: if CAT scores correlate highly with candidates' subsequent performance in the programme, the test has high predictive validity.
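
A minimal sketch of such a check, assuming hypothetical CAT scores and later programme grades for the same candidates:

```python
# Criterion-related (predictive) validity: correlate an earlier measure
# with a later outcome. Both data vectors below are hypothetical.
import numpy as np

cat_scores = np.array([92, 78, 85, 60, 70, 88])         # earlier measure
mba_grades = np.array([3.8, 3.1, 3.5, 2.4, 2.9, 3.6])   # later outcome

# A high correlation indicates high predictive ability of the initial measure.
r = np.corrcoef(cat_scores, mba_grades)[0, 1]
print(f"Predictive validity (r) = {r:.2f}")
```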

Construct validity: This refers to how well a measure actually measures the construct it is supposed to measure. It is of two types: convergent and discriminant validity.

• Convergent validity: the degree to which the items of an individual latent construct correlate positively with other measures of the same construct. Convergent validity can be established if the measures of the same construct are highly correlated.

• Discriminant validity: the degree to which the latent constructs are truly distinct from each other. Discriminant validity can be established when the correlations between the latent constructs are not excessively high; correlation values above 0.8 among the latent constructs indicate a lack of discriminant validity.
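
A minimal sketch of a discriminant-validity check, assuming hypothetical composite scores for three latent constructs (the construct names are illustrative only):

```python
# Flag construct pairs whose correlation exceeds 0.8, which would suggest
# the constructs are not truly distinct. All scores are hypothetical.
import numpy as np

construct_scores = {
    "pay_satisfaction": np.array([4.0, 2.5, 4.5, 3.0, 1.5]),
    "job_security":     np.array([3.5, 2.0, 4.0, 3.5, 2.0]),
    "work_environment": np.array([4.5, 2.0, 4.0, 3.0, 1.0]),
}

names = list(construct_scores)
corr = np.corrcoef([construct_scores[n] for n in names])

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        flag = "  <-- possible lack of discriminant validity" if abs(corr[i, j]) > 0.8 else ""
        print(f"{names[i]} vs {names[j]}: r = {corr[i, j]:.2f}{flag}")
```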

SCALE EVALUATION

1). Missing values: The amount of missing data can have a profound impact on assessing the constructs. One approach is to exclude the cases with missing values. Alternatively, you can substitute the neutral value of the scale (for instance, a construct measured on a 7-point Likert scale ranging from 1 = Strongly disagree, 2 = Disagree, 3 = Somewhat disagree, 4 = Neither agree nor disagree, 5 = Somewhat agree, 6 = Agree, 7 = Strongly agree would have a neutral value of 4). Another approach is to replace the missing values with the average value of the individual construct.
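
A minimal sketch of these three approaches, assuming a hypothetical 7-point item with missing responses:

```python
# Handling missing values on a hypothetical 7-point Likert item with pandas.
import pandas as pd

item = pd.Series([7, 5, None, 6, None, 4, 7])

listwise = item.dropna()                 # exclude cases with missing values
neutral_filled = item.fillna(4)          # substitute the neutral scale value
mean_filled = item.fillna(item.mean())   # substitute the observed average

print(neutral_filled.tolist())
print(mean_filled.round(2).tolist())
```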

2). Inter-item correlation

Inter-item correlations are an essential element in conducting an item analysis of a set of test questions. They examine the extent to which scores on one item are related to scores on all other items in a scale, and provide an assessment of item redundancy: the extent to which items on a scale assess the same content. Ideally, the average inter-item correlation for a set of items should be between 0.50 and 0.60. Values above 0.80 indicate that the items are so close as to be almost repetitive.
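
A minimal sketch of the computation, assuming a hypothetical respondents-by-items matrix for a four-item scale:

```python
# Average inter-item correlation for a hypothetical four-item scale.
import numpy as np

scores = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
])

corr = np.corrcoef(scores, rowvar=False)       # item-by-item correlation matrix
pairs = corr[np.triu_indices_from(corr, k=1)]  # unique off-diagonal pairs
print(f"Average inter-item correlation = {pairs.mean():.2f}")
```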

3). Item-to-total correlation

The item-to-total correlation is the correlation between the score on an item and the sum of all other items making up the dimension to which the item was assigned. Items with a low item-to-total correlation (below 0.50) should be deleted.
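
A minimal sketch using the same kind of hypothetical matrix, correlating each item with the sum of the remaining items:

```python
# Corrected item-to-total correlation: each item versus the sum of the others.
import numpy as np

scores = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
])

total = scores.sum(axis=1)
for i in range(scores.shape[1]):
    rest = total - scores[:, i]  # total of all items except item i
    r = np.corrcoef(scores[:, i], rest)[0, 1]
    note = "  <-- candidate for deletion" if r < 0.50 else ""
    print(f"Item {i + 1}: item-to-total r = {r:.2f}{note}")
```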

4). Reliability and validity

(Refer to the reliability and validity section above.)
