
MEASUREMENT OF CONSTRUCTS
Evaluating Psychological Tests

Prof. Jemabel G. Sidayen, RPsy, RPm, LPT


Learning Objectives:
• Know how to evaluate evidence of validity measures
• Demonstrate knowledge of the validity limits of psychological tests through statistical analysis
• Explain various reliability measures
• Relate statistical concepts to test reliability measures
Evaluating a psychological test
• Before using a psychological test, clinicians should investigate and understand:
  • the theoretical orientation of the test,
  • practical considerations,
  • the appropriateness of the standardization sample,
  • and the adequacy of its reliability and validity.
Evaluating a psychological test
Theoretical Orientation
• 1. Do you adequately understand the theoretical construct the test is supposed to be measuring?
• 2. Do the test items correspond to the theoretical description of the construct?
Practical Considerations
• 1. If reading is required of the examinee, does his or her ability match the level required by the test?
• 2. How appropriate is the length of the test?
Evaluating a psychological test
Standardization
• 1. Is the population to be tested similar to the population the test was standardized on?
• 2. Was the size of the standardization sample adequate?
• 3. Have specialized subgroup norms been established?
• 4. How adequately do the instructions permit standardized administration?
Evaluating a psychological test
Reliability
• 1. Are reliability estimates sufficiently high (generally around .90 for clinical decision making and around .70 for research purposes)?
• 2. What implications do the relative stability of the trait, the method of estimating reliability, and the test format have on reliability?
Evaluating a psychological test
Validity
• 1. What criteria and procedures were used to validate the test?
• 2. Will the test produce accurate measurements in the context and for the purpose for which you would like to use it?
RECAP: Scales of Measurement
• Nominal
• Ordinal
• Interval
• Ratio
Basic Concepts
• Continuous variable: one that theoretically can have an infinite number of values between adjacent units on the scale
  • Examples: weight, height, time
• Discrete variable: one in which there are no possible values between adjacent units on the scale
  • Examples: number of children in the family, number of professors in CGHC
Basic Concepts
• Real Limits of a Continuous Variable
  • Values that are above and below the recorded value by one-half of the smallest measuring unit of the scale.
  • All measurements made on a continuous variable are approximate.
Measures of Central Tendency
• Central tendency = values that summarize or represent the majority of scores in a distribution
• Three main measures of central tendency:
  • Mean (Sample Mean; Population Mean)
  • Median (Mdn)
  • Mode (Mo)
Measures of Variability
• Measures of variability quantify the extent of dispersion.
• Three measures of variability are commonly used in the behavioral sciences:
  • The range
  • The standard deviation
  • The variance
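The measures above can be sketched with Python's standard statistics module; the sample scores are hypothetical, chosen only to illustrate the computations:

```python
import statistics

# Hypothetical distribution of test scores (illustrative data only)
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequent score

# Measures of variability
score_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 in the denominator)
sd = statistics.stdev(scores)           # sample standard deviation (square root of variance)

print(mean, median, mode, score_range)
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas; `pvariance` and `pstdev` give the population versions.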
VALIDITY OF MEASUREMENT
VALIDITY
• What is Validity?
  • "The validity of a test concerns what the test measures and how well it does so."
  —Anne Anastasi
VALIDITY
• The degree to which the measurement process measures the variable that it claims to measure
VALIDITY
Mathematical Definition of Validity

Val = s²co / s²t

• The validity coefficient is the ratio of the variance attributable to the trait measured (s²co) to the observed score variance (s²t).
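A worked instance of this ratio, using hypothetical variance components (the numbers are illustrative, not from the slides):

```python
# Hypothetical variance components (illustrative values only)
trait_variance = 16.0      # s²co: variance attributable to the trait of interest
observed_variance = 20.0   # s²t: total observed score variance

# Validity coefficient = trait variance / observed score variance
validity = trait_variance / observed_variance
print(validity)  # 0.8
```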
VALIDITY OF MEASUREMENT
• Types
  • Face validity
  • Content validity
  • Criterion-related validity
    • Concurrent validity
    • Predictive validity
  • Construct validity
    • Convergent validity
    • Divergent validity
VALIDITY
• Face validity: simplest, least scientific; superficial appearance or face value
Face Validity
• "It looks like a test of *#%*"
• Not validity in a technical sense
Content Validity
• Incorporates quantitative estimates
• Domain sampling
• The simple summing or averaging of dissimilar items is inappropriate
Criterion-Related Validity
• Represents performance in relation to particular tasks or discrete cognitive or behavioral objectives
  • Predictive validity
  • Concurrent validity
Criterion-Related Validity
• Predictive validity: the measurement is validated against future behavior on the same criteria as the construct
• Concurrent validity: consistency between two procedures for measuring the same variable; a new measurement is paired with an established, standardized measurement
Construct Validity
• Indicated by correspondence of scores to other known valid measures of the underlying theoretical trait
• Convergent Validity
• Discriminant/Divergent Validity
VALIDITY
• Construct validity:
  • correlational evidence established gradually as a measure is used in many studies;
  • over time, widely used psychological tests come to show high construct validity
VALIDITY
• Convergent validity: correlation of two different methods of measuring the same variable
• Divergent validity: two distinct constructs produce unrelated scores
Example:
• Aggression: measured by observation and by teacher's rating
• Level of Activity: measured by observation and by teacher's rating
Sources of Invalid Test Measures
• The procedure measures a dimension, but not the one intended by the researcher.
• The procedure measures a dimension, but there is more than one interpretation of its meaning.
RELIABILITY OF MEASUREMENT
Reliability
• Stability or consistency of the measurement
• Inconsistencies in a measurement come from error:
  • Observer error
  • Environmental changes
  • Participant changes
RELIABILITY
• Test of reliability / coefficient alpha
  • Cannot exceed 1.0 in absolute value
  • Test scores gain reliability as the number of items increases
  • The higher the coefficient alpha, the higher the reliability, and vice versa
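The claim that scores gain reliability as items are added is usually quantified with the Spearman-Brown prophecy formula. A minimal sketch, with hypothetical reliability values:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by `length_factor`
    (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test whose reliability is .70 (hypothetical value)
print(round(spearman_brown(0.70, 2), 3))  # a longer test yields higher reliability
```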
RELIABILITY
• Reliability analyses assume that test scores reflect two factors:
  • Stable characteristics (true characteristics of the individual)
  • Chance features (random measurement error)
RELIABILITY
• Standard error of measurement
  • Tool used to estimate or infer the extent to which an observed score deviates from a true score
  • If a test were a perfect measurement instrument, an individual would perform similarly on all tests of a particular attribute until the attribute changed
RELIABILITY
• Random measurement error
  • X = T + E
  • X = the person's test score (raw score)
  • T = the person's stable characteristics or knowledge (true score)
  • E = chance events (error score)
RELIABILITY
• In a reliable test, the value of E should be close to 0
• The value of T should be close to the actual test score X
• T/X = proportion of the test score reflecting the person's true (stable) characteristics
• E/X = proportion of the test score reflecting random error (chance events)
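The X = T + E model can be illustrated with a small simulation: reliability is the share of observed-score variance that comes from T. All parameters below are hypothetical:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Simulate the classical model X = T + E with hypothetical parameters
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # T: stable characteristics
errors = [random.gauss(0, 5) for _ in range(10_000)]          # E: chance events, mean 0
observed = [t + e for t, e in zip(true_scores, errors)]       # X = T + E

# Reliability = var(T) / var(X); with these parameters, about 15² / (15² + 5²) = 0.90
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))
```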
RELIABILITY
• Domain sampling model
  • Addresses the problem created by using a limited number of items to represent a larger, more complicated construct
  • The task in reliability analysis is to estimate how much error is made by using the score from the shorter test as an estimate of the true ability
  • Reliability is the ratio of the variance of the observed score on the shorter test to the variance of the long-run true score
  • Reliability can be estimated by correlating the observed score with the true score
  • T is not available, so we estimate what it would be
  • By sampling many tests from the same domain, we can get a normal distribution of unbiased estimates of T
  • To estimate reliability, we create many randomly parallel tests
Item Response Theory
• IRT focuses on item difficulty to assess ability
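One common IRT formulation (not spelled out on the slide) is the one-parameter Rasch model, in which the probability of a correct response depends only on the gap between a person's ability and an item's difficulty. A minimal sketch:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one-parameter (Rasch) IRT model."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# When ability exactly matches item difficulty, the probability of success is .50
print(rasch_probability(0.0, 0.0))  # 0.5
```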
RELIABILITY
• Sources of Error Variance
  • Test construction – content sampling
  • Test administration
  • Test scoring and interpretation
  • Other sources of error
RELIABILITY
• Types
• Test-retest reliability: comparing scores obtained from two successive measurements
  • Type of analysis used when the traits being measured do not change over time
  • Ideally, the testing interval is 6 months or more
  • Called the coefficient of stability when the testing interval is more than 6 months
  • Evaluates the error associated with administering a test at two different times
RELIABILITY
• Parallel-forms reliability:
  • different versions of the instrument are used and compared
  • Two equivalent forms of a test that measure the same attribute are statistically compared
  • Also known as 'alternate forms' reliability
RELIABILITY
• Split-half reliability:
  • internal consistency of the scores obtained;
  • rests on the premise that no single item or question can measure a construct on its own
  • Commonly known as the 'odd-even' system
RELIABILITY
• Inter-item consistency:
  • The degree of correlation among all items in a scale.
  • Calculated using a single administration of a single test form.
  • Methods used to obtain estimates of internal consistency:
    • KR-20
    • Cronbach's alpha
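Cronbach's alpha can be computed from a single administration as k/(k − 1) times one minus the ratio of summed item variances to the variance of total scores. A sketch with hypothetical Likert-type responses:

```python
import statistics

# Hypothetical Likert-type responses (rows = examinees, columns = 4 items)
items = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]

k = len(items[0])  # number of items
item_variances = [statistics.variance(col) for col in zip(*items)]
total_variance = statistics.variance([sum(row) for row in items])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(round(alpha, 2))
```

KR-20 is the special case of this formula for dichotomous (right/wrong) items.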
RELIABILITY
• Inter-rater reliability:
  • degree of agreement between two observers who are simultaneously obtaining measurements through observation
  • Observation is systematically performed with validated forms for observing behavior
  • Usually used for projective tests
  • Also known as 'inter-scorer reliability'
Take note!
• Reliability is a prerequisite of validity
• If a measurement has very low reliability, it cannot be valid because it is not even consistent or stable.
• A test can be reliable but not valid
• Maximum validity is the square root of reliability.
• Validity ≤ √Reliability
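A worked instance of the ceiling above, with a hypothetical reliability coefficient:

```python
import math

# Hypothetical reliability coefficient (illustrative value)
reliability = 0.81

# A test's validity coefficient can never exceed the square root of its reliability
max_validity = math.sqrt(reliability)
print(round(max_validity, 2))  # 0.9
```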
Reference:
Kaplan, R., & Saccuzzo, D. (2013). Psychological assessment and theory: Creating and using psychological tests (8th ed.). Philippines: Cengage Learning Asia.
THANK YOU
AND
GOOD DAY!
