
BRM 701

MEASUREMENT SCALES 2
DR. NUR ARFAH MUSTAPHA
0176163501
arfahmustapha@uitm.edu.my
Random and Systematic Error
Random Error

Sources of random error:
1) fluctuations in the person’s current mood
2) misreading or misunderstanding the questions
3) measurement of the individuals on different days or in different places

These errors may cancel out as you collect many samples.

Systematic Error

Sources of error that operate consistently, including the style of
measurement, a tendency toward self-promotion, and cooperative reporting,
so that other conceptual variables are measured alongside the intended one.
We have to reduce these errors to support sound scientific findings.

How well do our measured variables “capture” the conceptual variables?

Reliability
The extent to which the variables are free from random error, usually
determined by measuring the variables more than once.

Construct Validity
The extent to which a measured variable actually measures the conceptual
variable that it is designed to assess; that is, the extent to which it is
known to reflect the conceptual variable.
CRITERIA FOR GOOD MEASUREMENT
Goodness of measurement
⚫ Make sure of the accuracy of the instrument. For that:
⚫ Ensure no dimension, element, or question is missing, and nothing irrelevant is included.
⚫ Assess the “goodness” of the measuring instrument.
⚫ Characteristics of a good measurement:
Validity, Reliability, and Sensitivity – the accuracy of the measuring instrument.


The Goal of Measurement:
Validity and Reliability
RELIABILITY
⚫The degree to which measures are free from random error
and therefore yield consistent results.
⚫Stability and consistency with which the instrument
measures the concept and helps to assess the goodness of a
measure.
⚫Maintains stability over time in the measurement of a
concept.
⚫Two important dimensions of reliability: (a) stability and
(b) consistency.
Quantitative data analysis:
Testing goodness of data

❑Reliability
- the reliability of a measure is established by testing for
both consistency and stability
- consistency indicates how well the items measuring a
concept hang together as a set
- Cronbach’s alpha is a reliability coefficient that indicates
how well the items in a set are positively correlated to one
another
- the closer Cronbach’s alpha is to 1, the higher the internal
consistency reliability
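The Cronbach’s alpha described above can be computed directly from the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal pure-Python sketch; the respondent scores below are invented for illustration:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha. items: one inner list per questionnaire item,
    one entry per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's total score
    sum_item_var = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))

# Hypothetical scores from 5 respondents on a 3-item scale
items = [
    [4, 3, 3, 5, 2],   # item 1
    [4, 2, 3, 5, 1],   # item 2
    [5, 3, 2, 5, 2],   # item 3
]
alpha = cronbach_alpha(items)
```

For these made-up data, alpha is close to 1, indicating high internal consistency; items that do not correlate with one another would pull it down toward 0.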
a. Stability of Measures

⚫Ability of the measure to remain the same over time.


⚫It attests to the “goodness” of measure because the measure
of the concept is stable, no matter when it is applied.
⚫Two tests of stability: (1) test-retest reliability, and (2)
parallel-form reliability
b. Internal Consistency of Measures

⚫Indicative of the homogeneity of the items in the measure.


⚫Items should ‘hang together as a set.’
⚫Each item should be capable of independently measuring the same
concept.
⚫Examine if the items and subsets of items in the instrument
are highly correlated.
⚫Two ways to do it.
1. Inter-item Consistency Reliability

⚫Test of consistency of respondents’ answers to all items in a


measure.
⚫To the degree that items are independent measures of the
same concept, they will be correlated.
1. Test-Retest Reliability:
⚫Administering the same test to the same respondents at two
separate times
⚫Example: use an instrument to measure job satisfaction at Time 1
(T-1); 64% are satisfied. Repeat after 4 weeks: same results, hence high
stability.
⚫The reliability coefficient is obtained by repeating the same measure
on a second occasion; it is the correlation between the two sets of
scores.
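Test-retest reliability is simply the correlation between the two administrations. A small sketch using Pearson’s r; the satisfaction scores are hypothetical:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical job-satisfaction scores at T-1 and 4 weeks later (T-2)
t1 = [4, 3, 2, 1, 4, 3]
t2 = [4, 4, 1, 1, 4, 4]
r = pearson_r(t1, t2)   # high r => high test-retest stability
```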
Two problems with test-retest
⚫It is a longitudinal approach.
So:
1. It may sensitize the respondents.
2. Time may change the attitude; the subjects may also mature.
⚫Hence the results may not show a high correlation – due to the time
factor rather than a lack of reliability.
Test-Retest Reliability
The extent to which scores on the same measured variable correlate with
each other on two different measurements given at two different times.

Questionnaire 9/20                              Questionnaire 9/27
4  I feel I do not have much to be proud of.    4  I feel I do not have much to be proud of.
3  On the whole, I am satisfied with myself.    4  On the whole, I am satisfied with myself.
2  I certainly feel useless at times.           1  I certainly feel useless at times.
1  At times I think I am no good at all.        1  At times I think I am no good at all.
4  I have a number of good qualities.           4  I have a number of good qualities.
3  I am able to do things as well as others.    4  I am able to do things as well as others.

(A retesting effect may change the second set of scores.)
Interrater Reliability
The extent to which the scores counted by different coders correlate with
each other.

How Do You Measure Interrater Reliability?

Aggression Code   Coder 1   Coder 2
Hit boy A           ___       ___
Hit boy B           ___       ___
Hit girl A          ___       ___
Hit girl B          ___       ___

The coders’ agreement is summarized with Cohen’s Kappa.
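Cohen’s Kappa corrects raw agreement for agreement expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal sketch with invented aggression codes:

```python
def cohens_kappa(coder1, coder2):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    n = len(coder1)
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n   # observed agreement
    cats = set(coder1) | set(coder2)
    # chance agreement from each coder's marginal category frequencies
    p_e = sum((coder1.count(c) / n) * (coder2.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical aggression codes from two coders for four children
c1 = ["hit", "hit", "none", "none"]
c2 = ["hit", "hit", "none", "hit"]
kappa = cohens_kappa(c1, c2)
```

Here the coders agree on 3 of 4 decisions, but because some agreement is expected by chance, kappa comes out well below the raw 75% figure.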
2. Parallel-Form Reliability
⚫Also called equivalent-form reliability.
⚫When responses on two comparable sets of measures
tapping the same construct are highly correlated.
⚫Both forms/sets have similar items, same response format.
Change the wording/ordering.
⚫Minimum error variance caused by wording, ordering, or
other factors
⚫Split-half reliability 🡪 correlation between two halves of an
instrument.
2. Split-Half Reliability

⚫Reflects the correlation between two halves of an instrument.


⚫One half could be the even-numbered items and the other half the
odd-numbered items.
⚫A high correlation tells us about the similarity among items.
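Split-half reliability can be sketched as the correlation between odd-item and even-item totals; the Spearman–Brown correction (standard practice, though not named on the slide) adjusts the half-test correlation up to full test length. All scores are hypothetical:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def split_half_reliability(items):
    """Correlate respondents' totals on odd-numbered items against
    even-numbered items, then apply the Spearman-Brown correction."""
    odd = [sum(s) for s in zip(*items[0::2])]    # totals on odd items
    even = [sum(s) for s in zip(*items[1::2])]   # totals on even items
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)                       # step up to full length

# Hypothetical scores from 5 respondents on a 4-item scale
items = [
    [4, 3, 3, 5, 2],
    [4, 2, 3, 5, 1],
    [5, 3, 2, 5, 2],
    [4, 3, 3, 5, 2],
]
r_sb = split_half_reliability(items)
```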
Equivalent-Forms Reliability
The extent to which two equivalent measures given at different times
correlate with each other.
Example: GRE, SAT, GMAT, TOEFL

Form A                       Form B
22 × 45 =                    32 × 45 =
85 × (23 − 11) =             85 × (41 − 11) =
(72 − 14) × 25 × (6 − 1) =   (72 − 14) × 12 × (7 − 1) =
Reliability as Internal Consistency
The extent to which the scores on the items correlate with each other and
thus are all measuring the true score rather than reflecting random error.

How Do You Measure Internal Consistency?
Questionnaire (9/20):
___ I feel I do not have much to be proud of.
___ On the whole, I am satisfied with myself.
___ I certainly feel useless at times.
___ At times I think I am no good at all.
___ I have a number of good qualities.
___ I am able to do things as well as others.
Assessed with split-half reliability or coefficient alpha.

(Figure: summary of the reliability types. Test-retest reliability – the
same items given at two times. Reliability as internal consistency –
correlations among the items within one questionnaire. Equivalent-forms
reliability – Questionnaire 1 and Questionnaire 2 given at different
times. Interrater reliability – agreement between coders.)
Note

⚫Reliability is a necessary but not a sufficient condition for the
goodness of a measure.
⚫A measure could be highly stable and consistent, but may
not be valid.
⚫Validity ensures the ability of the instrument to measure the
intended concept.
Sensitivity
⚫Instrument’s ability to accurately measure variability in
responses.
⚫A dichotomous response category, such as “agree or
disagree” does not allow the recording of subtle attitude
changes.
⚫A sensitive measure, with numerous items on the scale, may
be needed. For example:
⚫Increase the number of items and response categories (strongly agree,
agree, neutral, disagree, strongly disagree). This increases a scale’s
sensitivity.
Researchers ask questions:

⚫Do colleagues agree with my measurement?


⚫Does my measure correlate with others’ measures of the
same concept?
⚫Does it really measure what it is expected to measure?
⚫The answers provide some evidence of the measure’s
validity.
Validity

● The ability of an instrument to measure what is intended to


be measured.
● Validity of the indicator 🡪 Is it a true measure? Are we
tapping the concept?
● Abstract ideas (construct) but concrete observations. Degree
of fit between a construct and its indicators.
Types of Validity
1. Content validity
2. Criterion-related validity
3. Construct validity
1. Content Validity:

⚫Do the measures include an adequate and


representative set of items that tap the concept?
⚫How well have the dimensions and elements of the concept been
delineated?
⚫Let us take the example of measuring feminism
Example of Feminism:

⚫Implies a person’s commitment to a set of beliefs creating


full equality between men and women – in areas of:
⚫Arts, intellectual pursuits, family, work, politics, authority
relation. Dimensions
⚫Is there adequate coverage of dimensions?
⚫Do we have questions on each dimension?
⚫A panel of judges may attest to the content validity of an
instrument.
⚫Each panelist assesses the test items
Validity
Construct Validity

The extent to which a measured variable


actually measures the conceptual variable
(that is, the construct) that it is designed
to assess.
Criterion Validity
The extent to which a self-report measure
correlates with a behavioral measured
variable.
Criterion Validity
Predictive Validity

The extent to which the scores can predict the participants’ future
performance. Example: GRE, SAT...

Concurrent Validity

The extent to which the self-report measure correlates with a behavioral
measure that is assessed at the same time.
2. Criterion-Related Validity
⚫Uses some standard or criterion to indicate a construct
accurately.
⚫Compare the measure with another accepted measure of the
same construct.
⚫Does the measure differentiate individuals on the criterion it
is expected to predict?
⚫Two subtypes (a) concurrent validity, (b) predictive validity
a. Concurrent Validity:

⚫An indicator must be associated with a pre-existing valid


indicator. Example:
⚫Create a new intelligence test. Is it highly associated with
the existing IQ test?
⚫Do those who score high in old test also score high in new
test? If yes, then valid.
⚫A scale has concurrent validity when individuals who are known to be
different score differently on the instrument.
b. Predictive Validity
⚫Indicates the ability of the measuring instrument to differentiate
among individuals with regard to a future criterion.
⚫Aptitude test at the time of selection. Those who score high
at time one (T-1) should perform better in future (T-2) than
those who scored low.
3. Construct Validity

⚫Used for measures with multiple indicators.


⚫Do various indicators operate in a consistent manner?
⚫How well the results obtained from the use of the measure
fit the theories around which the test is designed? This is
assessed through (a) convergent and (b) discriminant
validity.
Construct Validity
Face Validity
The extent to which the measured variable appears to be an adequate
measure of the conceptual variable.

Example item: “I don’t like Japanese”
Strongly Disagree 1 2 3 4 5 6 7 8 Strongly Agree
(Measured Variable X is taken to reflect the Conceptual Variable:
discrimination towards Japanese)
Construct Validity
Content Validity
The degree to which the measured variable appears to have adequately
sampled from the potential domain of questions that might relate to the
conceptual variable of interest.

(Figure: the conceptual variable “intelligence” spans a domain that
includes verbal aptitude and math aptitude but not sympathy.)
Construct Validity
Convergent Validity
The extent to which a measured variable is found to be related to other
measured variables designed to measure the same conceptual variable
(e.g., an interdependence scale correlating with a collectivism scale).

Discriminant Validity
The extent to which a measured variable is found to be unrelated to other
measured variables designed to measure different conceptual variables
(e.g., an interdependence scale not correlating with an independence
scale).
a. Convergent Validity

⚫Multiple indicators of a concept converge or are associated


with one another.
⚫Multiple indicators hang together, or operate in similar
ways. For example, we measure “education” as a construct.
Construct “education”
⚫Ask the level of education completed.
⚫Verify the certificates.
⚫Give a test measuring school level knowledge.
⚫If the measures do not converge – e.g., people claim a college degree
that is not supported by college records, or those with a college degree
perform no better than high-school dropouts on the test – then
convergent validity is weak. Do not combine the three indicators into
one measure.
b. Discriminant Validity
⚫Divergent validity.
⚫Indicators of one concept hang together or converge, but
also diverge or are negatively associated with opposing
constructs.
⚫If two constructs A and B are very different then measures
of A and B should not be associated.
⚫Example of political conservatism.
Measuring Political Conservatism
⚫We have 10 questions to measure P C.
⚫People answer all 10 in similar ways.
⚫We put 5 additional questions that measure liberalism.
⚫Two scores are theoretically predicted to be different and
are empirically found to be so.
⚫If the 10 conservatism items hang together and are negatively
associated with the 5 liberalism items, the measure has discriminant
validity.
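The convergent/discriminant pattern can be checked with simple correlations between scale totals. A quick numerical sketch; all scores below are invented:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical scale totals for 6 respondents
conservatism_a = [40, 35, 22, 18, 30, 25]   # first 5 conservatism items
conservatism_b = [38, 36, 20, 19, 28, 27]   # second 5 conservatism items
liberalism     = [12, 15, 30, 33, 20, 26]   # the 5 liberalism items

convergent = pearson_r(conservatism_a, conservatism_b)   # strongly positive
discriminant = pearson_r(conservatism_a, liberalism)     # negative
```

A strongly positive correlation among the conservatism halves (convergence) together with a negative correlation with liberalism (divergence) is exactly the pattern described above.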
• Face Validity

⚫Basic and minimum index of content validity


⚫Items that are supposed to measure a concept, do on the
face look like they measure the concept. For example:
⚫Measure a college student’s math ability 🡪 ask: 2+2=? Not
a valid measure of college level math ability.
⚫Subjective agreement among professionals about the
measuring content.
Quantitative data analysis:
Testing goodness of data

❑Validity
- factor analysis will confirm whether or not
the theorized dimensions emerge
- measures are developed by first delineating
the dimensions so as to operationalise the
concept
- factor analysis reveals whether the
dimensions are indeed tapped by the items in
the measure, as theorized.
Quantitative data analysis: Hypothesis
testing

Regression analysis
❑ simple regression analysis is used in a situation
where one IV is hypothesized to affect one DV
❑ the coefficient of determination, R2, provides
information about the goodness of fit of the regression
model
❑R2 is the percentage of variance in the DV that is
explained by the variation in the IV
❑If R2 is near to 1, most of the variation in the DV can
be explained by the regression model, i.e. the
regression model fits the data well.
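The R2 described above can be computed by fitting the simple regression by least squares and comparing the residual and total sums of squares. A sketch with made-up IV/DV data:

```python
from statistics import mean

def r_squared(x, y):
    """R^2 for a simple least-squares regression of y (DV) on x (IV)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)                 # slope
    a = my - b * mx                                     # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot   # share of DV variance explained by the IV

# Hypothetical IV (e.g., training hours) and DV (e.g., performance score)
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
r2 = r_squared(x, y)   # near 1 => the regression model fits the data well
```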
(Figure: summary of validity over time. Face validity – the measured
variables appear to reflect the conceptual variables. Content validity –
the measured variables sample the domain of the conceptual variables.
Predictive validity – self-report scores predict future behaviors.
Concurrent validity – self-report measures correlate with behavioral
measures assessed at the same time. Convergent validity – correlation
with similar items and scales. Discriminant validity – lack of
correlation with other items and scales.)
Practicality
⚫Validity, reliability, and sensitivity are the scientific
requirements of a project.
⚫Operational requirements call for it to be practical in terms
of economy, convenience, and interpretability.
Reliability and validity in qualitative research

❑It is important that the conclusions that you have


drawn are verified; you must make sure that the
conclusions that you derive from your qualitative
data are plausible, reliable, and valid
❑Category reliability – relates to the extent to
which judges are able to use category definitions to
classify the qualitative data
❑Well-defined categories will lead to higher
category reliability and eventually to higher
interjudge reliability
Reliability and validity in qualitative research

❑Interjudge reliability can be defined as the


degree of consistency between coders
processing the same data
❑A commonly used measure of interjudge
reliability is the percentage of coding
agreements out of the total number of
coding decisions
❑As a general guideline, agreement rates at
or above 80% are considered to be
satisfactory.
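The percent-agreement measure can be sketched in a few lines; the judges’ category codes below are invented:

```python
def percent_agreement(coder1, coder2):
    """Interjudge reliability as the percentage of coding agreements
    out of the total number of coding decisions."""
    agreements = sum(a == b for a, b in zip(coder1, coder2))
    return 100.0 * agreements / len(coder1)

# Hypothetical category codes assigned by two judges to 5 passages
judge1 = ["price", "quality", "service", "quality", "price"]
judge2 = ["price", "quality", "service", "price", "price"]
rate = percent_agreement(judge1, judge2)   # 80.0, at the satisfactory threshold
```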
Reliability and validity in qualitative research

❑Validity is defined as the extent to which


an instrument measures what it purports to
measure.
❑Validity in qualitative research refers to
the extent to which the research results
i) accurately represent the collected data
(internal validity)
ii) can be generalised or transferred to
other contexts or settings (external validity)
Reliability and validity in qualitative research

❑Triangulation is a technique that is


associated with reliability and validity in
qualitative research
❑ One can be more confident in a result if the
use of different methods or sources leads to
the same results
❑Triangulation requires that research is
addressed from multiple perspectives

Reliability and validity in qualitative research

❑Method triangulation
- using multiple methods of data collection and analysis
❑Data triangulation
- collecting data from several sources and/or at different
time periods
❑Researcher triangulation
- multiple researchers collect and/or analyse the data
❑Theory triangulation
- multiple theories and/or perspectives are used to interpret
and explain the data

Other methods of gathering and analyzing
qualitative data

❑CONTENT ANALYSIS is an observational research


method that is used to systematically evaluate the
symbolic contents of all forms of recorded
communications
❑Used to analyze newspapers, websites,
advertisements, recordings of interviews, etc.
❑Enables the researcher to analyze (large amounts
of) textual information and systematically identify
its properties, such as the presence of certain
words, concepts, characters, themes, or sentences.
Other methods of gathering and analyzing
qualitative data

❑To conduct a content analysis on a text, the text is


coded into categories and then analyzed using
conceptual analysis or relational analysis

❑Conceptual analysis establishes the existence and


frequency of concepts (such as words, themes or
characters) in a text
❑Conceptual analysis analyses and interprets text by
coding the text into manageable content categories

❑Relational analysis builds on conceptual analysis by


examining the relationships among concepts in a text.
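A minimal sketch of the conceptual-analysis step as word-frequency counting over coded concept categories; the text and concept list are made up for illustration:

```python
import re
from collections import Counter

def concept_frequency(text, concepts):
    """Conceptual analysis sketch: count how often each concept word
    appears in a text (case-insensitive, whole-word matching)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {c: counts[c.lower()] for c in concepts}

doc = "Service quality matters; good service builds trust, and trust builds loyalty."
freq = concept_frequency(doc, ["service", "trust", "price"])
# freq == {"service": 2, "trust": 2, "price": 0}
```

Relational analysis would go a step further, e.g. counting how often two concepts co-occur within the same sentence.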
Other methods of gathering and analyzing
qualitative data

❑NARRATIVE ANALYSIS is an approach that aims to


elicit and scrutinize the stories we tell about ourselves
and their implication for our lives
❑Narrative data are often collected via interviews
❑These interviews are designed to encourage the
participant to describe a certain incident in the context
of his or her life history
❑Narrative analysis has been used to study impulsive
buying, customers’ response to advertisements, and
relationships between service providers and
consumers.
How Do You Improve the Reliability and Validity of
Your Measured Variables?

1. Conduct a pilot test, trying out a questionnaire or other


research instruments on a small group.
2. Use multiple measures.

3. Ensure that there is variability in your measures.

4. Write good items.


5. Get your respondents to take your questions
seriously.

6. Make your items nonreactive.

7. Be certain to consider face and content validity by choosing
reasonable items that cover a broad range of issues reflecting the
conceptual variables.

8. Use existing measures.
