
Chapter 6

Measurement 1
Business Research / Marketing Research
Introduction
• Measurement is important to business
research.
• With appropriate measurement, the
researcher can comment on business
behaviour and business phenomena.
1. Levels of measurement
• In statistics, there are four data measurement
scales:
• nominal, ordinal, interval and ratio.

• These are simply ways to sub-categorize different
types of data.
i. Nominal scale:
• Nominal scales are used for labelling variables, without
any quantitative value. “Nominal” scales could simply be
called “labels.”
• A good way to remember all of this is that “nominal”
sounds a lot like “name” and nominal scales are kind of
like “names” or labels.
ii. Ordinal Scale
• The characteristic of an ordinal scale is that its categories have a
logical or ordered relationship to each other.

• For example, on a four-point satisfaction scale (4 = Very Happy,
3 = Happy, 2 = OK, 1 = Unhappy), we know that a #4 is better than a
#3 or #2, but we don’t know, and cannot quantify, how much better
it is. Is the difference between “OK” and “Unhappy” the same as the
difference between “Very Happy” and “Happy”? We can’t say.
iii. Interval Scale
• The interval scale is also known as a rating scale; variables /
attributes are measured on scales such as a scale of 1 to 5
or even 1 to 10.

• Interval scales are numeric scales in which we know both the order
and the exact differences between the values. The classic example
of an interval scale is Celsius temperature because the difference
between each value is the same. For example, the difference
between 60 and 50 degrees is a measurable 10 degrees, as is the
difference between 80 and 70 degrees.

• “Interval” itself means “space in between,” which is the important


thing to remember–interval scales not only tell us about order, but
also about the value between each item.
iv. Ratio scale
• Ratio scales are the ultimate joy when it comes to data
measurement scales because they tell us about the
order, they tell us the exact value between units, AND
they also have an absolute zero, which allows a wide
range of both descriptive and inferential statistics to be
applied.
• At the risk of repeating myself, everything above about
interval data applies to ratio scales, plus ratio scales have
a clear definition of zero.
• Good examples of ratio variables include height, weight,
and duration.
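The four levels above determine which statistical operations are meaningful. A minimal Python sketch of the distinction, using hypothetical data for each scale:

```python
from collections import Counter

# Hypothetical data for each of the four measurement scales.
nominal = ["red", "blue", "blue", "green"]   # labels only
ordinal = [1, 3, 2, 4]                       # ranks: order matters, distances unknown
interval = [60.0, 50.0, 80.0, 70.0]         # Celsius temperatures: differences meaningful
ratio = [1.72, 1.80, 1.65, 1.90]            # heights in metres: true zero, ratios meaningful

# Nominal: only counting / the mode is meaningful.
mode_colour = Counter(nominal).most_common(1)[0][0]

# Interval: differences are meaningful (60 - 50 equals 80 - 70)...
assert interval[0] - interval[1] == interval[2] - interval[3]
# ...but ratios are not: 80 C is not "twice as hot" as 40 C.

# Ratio: both differences and ratios are meaningful.
tallest_vs_shortest = max(ratio) / min(ratio)

print(mode_colour)            # the most frequent label
print(tallest_vs_shortest)    # a meaningful ratio of heights
```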
2. Criteria For Good Measurement
• A key indicator of the quality of a measure is the proper
assessment of the reliability and validity of the research. In
standard research, any score obtained by a measuring
instrument is the sum of both the ‘true score’, which is
unknown, and the ‘error’ in the measurement process.
• A good measurement must meet the tests of:

a. validity,

b. reliability, and

c. practicality.
continue
a. Test of Reliability
• Reliability refers to a measurement that supplies consistent
results.
• It reflects the consistency, precision, repeatability, and
trustworthiness of a measure.
• It indicates the extent to which a measure is free of bias, and
hence ensures consistent measurement across time and across the
various items in the instrument (the observed scores).
• Some qualitative researchers use the term ‘dependability’ instead
of reliability. It is the degree to which an assessment tool produces
stable (error-free) and consistent results. It indicates that the
observed score of a measure reflects the true score of that
measure. Reliability is a necessary, but not sufficient, component of validity.
continue
• In quantitative research, reliability refers to the
consistency, stability and repeatability of results; that is,
a result is considered reliable if consistent results have
been obtained in identical situations on different
occasions.
• In qualitative research, however, it refers to whether a
researcher’s approach is consistent across different
researchers and different projects.
Types of reliability
• Reliability is mainly divided into two types:

• i) stability, and

• ii) internal consistency reliability.

Stability
• It is defined as the ability of a measure to remain the
same over time, despite uncontrolled testing conditions
or changes in the respondents themselves.
• It refers to how much a person’s score can be expected
to change from one administration to the next. A
perfectly stable measure will produce exactly the same
scores time after time.
• Two methods to test stability are:

• i) test-retest reliability, and ii) parallel-form reliability.
i. Test-retest reliability
• Test-retest reliability is established when the reliability coefficient is
obtained by repeating the same measure on a second occasion.
• The higher the coefficient, the better the test-retest reliability and,
consequently, the stability of the measure across time.
• For example, employees of a company may be asked to complete
the same questionnaire about job satisfaction twice, with an
interval of three months, so that the test results can be compared
to assess the stability of scores. The correlation coefficient is
calculated between the two sets of data; if it is high, test-retest
reliability is good. The interval between the two tests should
not be very long, because the status of the company may change
before the second test, which affects the reliability of the research.
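The employee example above can be sketched in a few lines of Python; the scores below are hypothetical, invented purely for illustration:

```python
import numpy as np

# Hypothetical job-satisfaction scores (1-5) for eight employees,
# measured twice with an interval of three months.
time1 = np.array([4, 3, 5, 2, 4, 3, 5, 4])
time2 = np.array([4, 3, 4, 2, 5, 3, 5, 4])

# Test-retest reliability: the Pearson correlation between
# the two administrations. Values near 1 indicate a stable measure.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")
```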
ii. Parallel-form reliability
• It is a measure of reliability obtained by administering different
versions of an assessment tool to the same group of individuals.
• The scores from the two versions can then be correlated in order
to evaluate the consistency of results across alternate versions.
• If the scores are highly correlated, the instrument has parallel-
form reliability.
• For example, the level of employee satisfaction in a company
may be assessed with questionnaires, in-depth interviews and
focus groups, and the results may be highly correlated; then we may
be sure that the measures are reasonably reliable [Yarnold,
2014].
Internal Consistency of Measures
• It is a measure of reliability used to evaluate the degree
to which different test items that study the same
construct produce similar results.
• It examines whether or not the items within a scale or
measure are homogeneous.
• It can be established in one testing situation, thus avoiding
many of the problems associated with the repeated testing
found in other reliability estimates. It can be represented in
two main formats:
• i) inter-item consistency, and ii) split-half reliability.
i. Inter-item Consistency Reliability
• The inter-item consistency reliability is a test of the
consistency of respondents’ answers to all items in a
measure.
• The higher the coefficients, the better the measuring
instrument.
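One widely used inter-item consistency coefficient is Cronbach's alpha (the coefficient is not named above, but it is a standard choice). A minimal sketch with hypothetical item scores:

```python
import numpy as np

# Hypothetical responses of 5 people to 4 items tapping the same
# construct (rows = respondents, columns = items).
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
])

# Cronbach's alpha:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # higher -> more homogeneous items
```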
Split-half reliability
• The split-half reliability reflects the correlations between
two halves of an instrument.
• The estimates will vary depending on how the items in
the measure are split into two halves.
continue
• It measures the degree of internal consistency by checking one
half of the results of a set of scaled items against the other half.
It requires only one administration, especially appropriate when
the test is very long. It is done by comparing the results of one
half of a test with the results from the other half.
• A test can be split in half in several ways, for example, first half
and second half, or by odd- and even-numbered items. If the two
halves of the test provide similar results, this suggests that
the test has internal reliability. It is a quick and easy way to
establish reliability. However, it is only effective for long
questionnaires in which all items measure the same
construct; it is not appropriate for tests that
measure different constructs.
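The odd/even split described above can be sketched as follows. The Spearman-Brown correction at the end is a standard refinement (not covered above) that adjusts for each half being only half the length of the full test; all data are hypothetical:

```python
import numpy as np

# Hypothetical test with 6 items and 5 respondents
# (rows = respondents, columns = items).
scores = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 3, 4, 3, 3],
    [1, 2, 1, 2, 2, 1],
])

# Split by odd- and even-numbered items and sum each half per respondent.
odd_half = scores[:, ::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)

# Correlate the two halves...
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# ...then apply the Spearman-Brown correction, since each half
# is only half as long as the full test.
split_half = 2 * r_half / (1 + r_half)
print(f"split-half reliability = {split_half:.2f}")
```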
b. Test of Validity
• Validity is defined as the extent to which the
instrument measures what it claims to measure.
• It is the degree to which the results are truthful, which
requires the research instrument (questionnaire)
to correctly measure the concepts under study.
• Validity of research is the extent to which the
requirements of the scientific research method have
been followed during the process of generating
research findings. It is a compulsory requirement for
all types of studies.
Test of Validity includes
• a. Content validity,

• b. Face validity,

• c. Criterion validity,

• d. Concurrent validity,

• e. Predictive validity, and

• f. Construct validity.
a. Content Validity
• Content validity ensures that the measure includes a
suitable and representative set of items that tap the
concept being measured.
• The more the scale items represent the domain or
universe of the concept being measured, the greater the
content validity.
Example of content validity
b. Face validity
• It is considered as a basic and minimum index of content
validity, but it is determined after the test is constructed.
• Face validity indicates that the items that are intended to
measure a concept, do, on the face of it, look like they
measure the concept.
• Some researchers do not see fit to treat face validity as a
valid component of content validity.
• For example, estimating the speed of a car based on its
outward appearance (guesswork) is face validity.
c. Criterion validity
• Criterion-related validity is established when a measure
differentiates individuals on a criterion (standard) it is
expected to predict.
• It is used to predict future or current performance.

• It correlates test results with another criterion of interest.

• It deals with the relationship between scale scores and some
specific measurable criterion.
• It tests how well the scale differentiates individuals on a
criterion it is expected to predict.
• That is, when we expect a future performance based on the
scores currently obtained by the measure, we correlate the
scores obtained with the performance.
• For example, a hands-on driving test has been shown to
be an accurate test of driving skills. Its results can be
compared with those of a written test to assess the validity
of the test.
• It can be established by: i) concurrent validity, and ii)
predictive validity.
d. Concurrent validity
• Concurrent validity is established when the scale
discriminates individuals who are known to be different;
that is, they should score differently on the instrument.
• It is the degree to which the scores on a test are related
to the scores on another test, already established as valid
and designed to measure the same construct, administered
at the same time, or to some other valid criterion
available at the same time.
• It is necessary when a test for assessing skills is
constructed with a view to replacing a less efficient one
in use.
continue
• It is established by correlating one question with another
that has previously been validated against a standard.
• It examines the validity of a tool at a highly theoretical
level.
• For example, a new, simple test is to be used in place of an
old, troublesome one that is considered useful;
measurements are obtained on both tests at the same
time.
• An example is the driving test, with a written test and an
on-the-road test.
e. Predictive validity
• It is often used in program evaluation studies and is very
suitable for applied research.
• A test is constructed and developed for the purpose of
predicting some form of behaviour.
• It indicates the ability of the measuring instrument to
differentiate among individuals with reference to a future
criterion.
• For example, admission tests are constructed to pick the
applicants who are most likely to succeed in their training,
while rejecting those who are most likely to fail if given
admission.
continue
• Logically, predictive and concurrent validation are the
same; the term concurrent validation is used to indicate
that no time elapses between the measures.
• The higher the correlation between the criterion and the
predictor, the greater the predictive validity.
• If the correlation is perfect, that is 1, the prediction is also
perfect. Most such correlations are only modest,
somewhere between 0.3 and 0.6.
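The admission-test idea and the 0.3-0.6 rule of thumb noted above can be illustrated with a short sketch; the scores are hypothetical:

```python
import numpy as np

# Hypothetical admission-test scores and later training performance
# for eight applicants.
test_scores = np.array([55, 70, 62, 80, 48, 90, 66, 73])
performance = np.array([60, 68, 58, 75, 55, 82, 70, 71])

# Predictive validity: the correlation between the predictor
# (test score) and the future criterion (performance).
r = np.corrcoef(test_scores, performance)[0, 1]
print(f"predictive validity r = {r:.2f}")

# Interpret against the typical range for validity coefficients.
if r >= 0.6:
    print("stronger than most validity coefficients")
elif r >= 0.3:
    print("within the typical modest range (0.3-0.6)")
else:
    print("weak predictive validity")
```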
f. Construct validity
• It is especially important for empirical measures and for
hypothesis testing in the construction of theories.
• Researchers create theoretical constructs to better
understand, explain, and predict behaviour.
• It involves testing a scale in terms of theoretically derived
hypotheses concerning the nature of underlying variables
or constructs.
continue
• The operationalization of the construct develops a series of
measurable behaviours that are hypothesized to correspond to
the latent construct.
• Validating the construct involves establishing hypothesized
relationships between the construct of interest and other
related behaviours.
• Construct validity testifies to how well the results obtained
from the use of the measure fit the theories around which the
test is designed.
• This is assessed through:

• i. convergent and ii. discriminant validity.
g. Convergent Validity

• Convergent validity is established when the scores
obtained with two different instruments measuring the
same concept are highly correlated.
• It refers to the extent to which scores on a measure share a
high, medium or low relationship with scores obtained on a
different measure intended to assess a similar construct.
continue
• It is the degree to which two variables
measured separately bear a relationship to
one another.
• It is the actual general agreement among
ratings, gathered independently of one
another, where the measures should be
theoretically related.
h. Discriminant validity
• It is established when, based on theory, two variables are
predicted to be uncorrelated, and the scores obtained by
measuring them are indeed empirically found to be so; that
is, the measure differentiates one group from another.
• It is the lack of a relationship among measures which
theoretically should not be related.
• For example, surveys used to identify potential high-school
drop-outs would have discriminant validity if the students
who graduate score higher on the test than the students
who leave before graduation.

Attitude Rating Scale
• The following rating scales are often used in business
research (Chua Yan Piaw, 2016; Kumar, 2014; Sekaran &
Bougie, 2013):
• 1. Dichotomous scale

• The dichotomous scale is used to elicit a Yes or No answer.
A nominal scale is used to elicit the response.
• 2. Category scale

• The category scale uses multiple items to elicit a single
response. This also uses the nominal scale.
continue
3. Numerical scale
• The numerical scale is similar to the semantic differential scale,
with the difference that numbers on a five-point or seven-point
scale are provided, with bipolar adjectives at both ends. This is
also an interval scale.

4. Itemised rating scale

• A five-point or seven-point scale with anchors, as needed, is
provided for each item, and the respondent states the
appropriate number on the side of each item, or circles the
relevant number against each item. The responses to the items
are then summed. This uses an interval scale.
continue
• 5. Likert scale

• The Likert scale is designed to examine how strongly subjects
agree or disagree with statements. The responses over a
number of items tapping a particular concept or variable can
be analysed item by item, but it is also possible to calculate a
total or summated score for each respondent by summing
across items. The summated rating scale is based upon the
assumption that each statement/item on the scale has equal
attitudinal value, importance or weight in terms of reflecting
an attitude towards the issue in question. This assumption is
also the main limitation of this scale, as statements on a scale
seldom have equal attitudinal value.
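A minimal sketch of computing a summated Likert score for one respondent. Reverse-scoring negatively worded statements is standard practice, though it is not discussed above; the responses are hypothetical:

```python
# Hypothetical five-point Likert responses to 4 statements
# (1 = strongly disagree ... 5 = strongly agree).
# The third statement is negatively worded and must be reverse-scored.
responses = [4, 5, 2, 4]
negatively_worded = [False, False, True, False]

# Reverse-score negatively worded items on a 1-5 scale: new = 6 - old.
adjusted = [6 - r if neg else r for r, neg in zip(responses, negatively_worded)]

# Summated score: assumes each statement carries equal attitudinal
# weight, which is the main limitation noted above.
summated = sum(adjusted)
print(summated)  # ranges from 4 (most negative) to 20 (most positive)
```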
Likert scale
continue
6. Thurstone scale
• The Thurstone scale calculates a weight or attitudinal value
for each statement. The weight for each statement is
calculated on the basis of ratings assigned to a
statement by a group of judges. Each statement with
which respondents express agreement is given an
attitudinal score equivalent to the attitudinal value of
the statement.
Thurstone scale
continue
7. Semantic differential scale
• The semantic differential scale is a tri-dimensional scale
and its strength is that it allows a researcher to view
the respondent’s attitudes from various perspectives.
It is formed based on the semantic differential
psychology theory which states that every concept
regarding humans and other objects can be stated and
differentiated in three different dimensions. The three
dimensions in this scale are the evaluation dimension, the
potency dimension and the activity dimension.
Semantic differential scale
continue
8. Fixed or Constant Sum Rating Scale
• Here, the respondents are asked to distribute a given
number of points across various items. This is more in
the nature of an ordinal scale.
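A minimal sketch of checking and interpreting a constant-sum allocation; the attribute names and point totals are hypothetical:

```python
# Hypothetical constant-sum ratings: a respondent distributes 100 points
# across four product attributes.
allocation = {"price": 40, "quality": 30, "design": 20, "service": 10}

# The defining constraint of the scale: points must sum to the fixed total.
assert sum(allocation.values()) == 100

# The allocation yields a ranking (ordinal information) of the attributes.
ranked = sorted(allocation, key=allocation.get, reverse=True)
print(ranked)  # attributes from most to least important
```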
Fixed or Constant Sum Rating Scale
continue
9. Stapel Scale
• This scale simultaneously measures both the direction
and intensity of the attitude toward the items under
study. The characteristic of interest to the study is
placed at the centre with a numerical scale ranging, for
example, from +3 to -3, on either side of the item. This
gives an idea of how close or distant the individual
response to the stimulus is. Since this does not have an
absolute zero point, this is an interval scale.
Stapel Scale
10. Graphic Rating Scale
A graphic rating scale helps the respondents to indicate their answers to a
particular question by placing a mark at the appropriate point on a line.
This is an ordinal scale.
continue
11. Consensus Scale
• Scales can also be developed by consensus, where a
panel of judges selects certain items, which in its view
measure the relevant concept. The items are chosen
particularly based on their pertinence or relevance to
the concept. Such a consensus scale is developed
after the selected items have been examined and
tested for their validity and reliability.
continue
12. Cumulative or Guttman Scale

• The Guttman scale is one of the most difficult scales
to construct and is therefore rarely used. This scale
does not have much relevance for beginners in
research.
Group Activity

• Discuss with an example the following measuring
scales: Likert scale, Thurstone scale, Guttman scale,
Semantic differential scale.

• How do you increase the validity and reliability of a
measurement index?

• Obtain a research article from a journal. Discuss the
kinds of measurement scale used in the research to
collect data, and how the data can be analysed.
Comprehension Questions

• 1. What are the criteria for good
measurement?
• 2. What are the types of reliability test?

• 3. What are the rating scales often used in
business research?
• 4. What are the types of practicality test?