III. Reliability and Validity
[Figure: scale for interpreting correlation coefficients]
+0.25 to +0.50 – moderately small positive correlation
0.00 to +0.25 – very small positive correlation
0.00 – no correlation
0.00 to −0.25 – very small negative correlation
−0.25 to −0.50 – moderately small negative correlation
−0.50 to −0.75 – high negative correlation
−0.75 to −1.00 – very high negative correlation
−1.00 – perfect negative correlation
It is established by comparing the scores obtained
from two successive measurements of the same
individuals and calculating a correlation between the
two sets of scores.
▪ The SRA Verbal Form has parallel forms A and B, and both
yield almost identical scores for the same test taker.
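Calculating a correlation between two sets of scores, as described above, can be sketched with a Pearson coefficient. The scores below are hypothetical, and `pearson_r` is an illustrative helper, not a standard function:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical scores for five examinees on two administrations of a test
first = [12, 15, 11, 18, 14]
second = [13, 16, 10, 19, 15]
r = pearson_r(first, second)  # close to +1 -> highly consistent scores
```

A coefficient near +1.00 on the scale above indicates strong reliability; a coefficient near 0.00 would mean the two administrations disagree.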
Examples:
▪ Two psychologists observe the aggressive behavior of
elementary school children. If their individual records of the
construct are almost the same, then the measure has a
good inter-rater reliability.
Construct-irrelevant variance
Happens when scores are influenced by factors irrelevant
to the construct (e.g. test anxiety, reading speed, reading
comprehension, illness)
Quantification of Content Validity
Lawshe (1975) proposed a structured and systematic way of
establishing the content validity of a test
He developed the content validity ratio (CVR) formula
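Lawshe's ratio is CVR = (nₑ − N/2) / (N/2), where nₑ is the number of panelists rating an item "essential" and N is the total panel size. A minimal sketch (the panel counts are hypothetical):

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe (1975): CVR = (n_e - N/2) / (N/2).

    n_essential: panelists who rated the item "essential"
    n_panelists: total number of panelists (N)
    """
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel: 8 of 10 experts rate an item "essential"
print(content_validity_ratio(8, 10))  # 0.6
```

CVR ranges from −1 (no one rates the item essential) through 0 (exactly half do) to +1 (everyone does); items with low CVR are candidates for removal.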
Criterion-related validity tells how well a test corresponds
with a particular criterion.
A judgment of how adequately a test score can be
used to infer an individual’s most probable standing on
some measure of interest.
Criterion – standard against which a test or a test score is
evaluated.
A criterion can be a test score, psychiatric diagnosis, training
cost, index of absenteeism, amount of time.
1. Relevant
2. Valid and Reliable
3. Uncontaminated
Criterion contamination – occurs when the criterion
measure is itself based, in whole or in part, on the
predictor measures, so the test is being checked against
a standard that already contains it.
Verbal Aptitude Test
Self Esteem Test
Managerial Skills Test
Extroversion Test
Visual-spatial skills test
Concurrent Validity
Both the test scores and the criterion
measures are obtained at the same time (at present).
Predictive Validity
Test scores may be obtained at one time and
the criterion measure may be obtained in the
future after an intervening event.
Construct – An informed scientific idea developed or
hypothesized to describe or explain a behavior; something
built by mental synthesis.
Unobservable, presupposed traits; something the researcher
presumes to correlate either highly or weakly with other variables.
Established through a series of activities in
which a researcher simultaneously defines
some construct and develops
instrumentation to measure it.
A judgment about the appropriateness of
inferences drawn from test scores regarding
individuals' standings on a variable called a
construct.
Required when no criterion or universe of
content is accepted as entirely adequate to
define the quality being measured.
Assembling evidence about what a test means.
A series of statistical analyses demonstrating that the
variable being measured is a separate, distinct variable.
A test has a good construct validity if there is an
existing psychological theory which can support what
the test items are measuring.
Divergent/Discriminant Evidence
Also called divergent/discriminant validity
A validity coefficient sharing little or no relationship
between the newly created test and an existing test.
▪ Ex. Scores on a Social Desirability test and a Marital
Satisfaction test should show little or no correlation.
The multitrait-multimethod matrix approach is a validation strategy
that requires the collection of data on two or
more distinct traits (e.g., anxiety, affiliation,
and dominance) by two or more different
methods (e.g., self-report questionnaires,
behavioral observations, and projective
techniques).
Can be used to obtain evidence for both
convergent and discriminant validity.
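As a toy illustration of the pattern the matrix looks for (all correlation values below are hypothetical), same-trait/different-method entries should exceed different-trait entries:

```python
# Hypothetical correlations from a 2-trait (anxiety A, dominance D)
# x 2-method (1 = self-report, 2 = observation) design.
corr = {
    ("A1", "A2"): 0.68,  # same trait, different method -> convergent
    ("D1", "D2"): 0.71,  # same trait, different method -> convergent
    ("A1", "D1"): 0.15,  # different traits, same method -> discriminant
    ("A2", "D2"): 0.12,  # different traits, same method -> discriminant
}

# Group entries by whether the trait letter matches
convergent = [r for (a, b), r in corr.items() if a[0] == b[0]]
discriminant = [r for (a, b), r in corr.items() if a[0] != b[0]]

# Expected evidence pattern: convergent correlations exceed discriminant ones
print(min(convergent) > max(discriminant))  # True for these values
```

When the convergent entries are high and the discriminant entries are low, the matrix supports both kinds of construct-validity evidence at once.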
Exploratory Factor Analysis – estimating or
extracting factors; deciding how many
factors to retain; and rotating factors to an
interpretable orientation
Looking for factors
Confirmatory Factor Analysis – researchers
test the degree to which a hypothetical
model fits the actual data.
Cross-validation – revalidation of a test against a
criterion based on a group different from the original
group on which the test was validated.
Validity Shrinkage – decrease in validity after
cross validation.
Co-validation – validation of more than one test
from the same group.
Co-norming – norming more than one test from
the same group.
1. Test bias – A factor inherent in a test that
systematically prevents accurate, impartial
measurement.
Test fairness – the extent to which a test is used in an
impartial, just, and equitable way
Adverse Impact – use of a test that systematically
rejects higher proportions of minority-group applicants
than non-minority applicants
Differential Validity – the extent to which a test has
different meaning for different people
▪ Ex. A test may be a valid predictor of college success for White
students but not for African-American students.
2. Rating Error – judgment resulting from
intentional and unintentional misuse of a
rating scale
Rating – a numerical or verbal judgment that
places a person or an attribute along a continuum
identified by a rating scale.
▪ Leniency Error/Generosity Error – An error in rating that
arises from the tendency on the part of the rater to be
lenient in scoring, marking, or grading.
▪ Severity Error – overly critical scoring
▪ Central Tendency Error – The rater is reluctant to give
ratings at either the positive or the negative extreme.
▪ Rater’s ratings would tend to cluster in the middle of the
continuum.
▪ Halo Effect – The tendency to give a ratee a higher rating
than he/she objectively deserves because of the rater's
failure to discriminate among conceptually distinct and
potentially independent aspects of a ratee's behavior.
▪ Tendency to ascribe positive attributes independently of the
observed behavior.
Reliability and validity are partially related
and partially independent.
Reliability is a prerequisite for validity,
meaning a measurement cannot be valid
unless it is reliable.
It is not necessary for a measurement to be
valid for it to be considered reliable.
r₁₂max = √(r₁₁ · r₂₂), where r₁₁ and r₂₂ are the reliability
coefficients of tests 1 and 2: the highest validity coefficient
obtainable between two measures is limited by the square root of
the product of their reliabilities.
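The ceiling given by r₁₂max = √(r₁₁ · r₂₂) can be computed directly; the reliability values below are hypothetical:

```python
import math

def max_validity(r11, r22):
    """Upper bound on the validity coefficient between two tests,
    given their reliability coefficients r11 and r22."""
    return math.sqrt(r11 * r22)

# Hypothetical reliabilities of 0.81 and 0.64
ceiling = max_validity(0.81, 0.64)  # ~0.72
```

Here no validity study pairing these two tests could yield a correlation above about 0.72, which is why low reliability in either measure caps the validity that can be observed.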