
Properties of a Standardized Test

Reliability
® Consistency of the scores obtained by the same persons when they are re-examined with the same test (Anastasi & Urbina, 1996).
® Extent to which a score is free from measurement error (Kaplan & Saccuzzo, 2011).
  o Ratio of true score variance to observed score variance.
® Consistency in measurement (Cohen & Swerdlik, 2009).
  o Measurement error = extent to which measurements differ from occasion to occasion.

Types of Reliability
1) Test Re-test Reliability
® Repeat the identical test on a second occasion.
® Reliability coefficient
  o Correlation between the scores obtained by the same persons upon the two administrations of the same test.
® Most valuable when we are measuring characteristics that do not change over time (e.g., IQ).
  o If an IQ test produces different scores, conclude that the difference is due to random measurement error, not that the person became smarter or less smart.
® Shows the extent to which scores can be generalized over different occasions (assuming the trait itself is unchanging).
® The retest interval should be specified in the test manual.
® Retest correlations decrease as the interval lengthens.

Two Possible Negative Effects of Test Re-test
• Carryover Effect
  ® The first testing session influences scores on the second session.
  ® Test takers may remember their answers from the first time they took the test.
  ® Systematic carryover does not harm reliability.
    o e.g., everyone's score improves by exactly 5 points (the changes are not random).
  ® Random carryover effects occur when changes are not predictable from earlier scores or when something affects only some test takers.
• Practice Effect
  ® Test takers tend to score better on a test that is given a second time.
  ® They have sharpened their skills since the first administration.
  ® Thus, the time interval must be evaluated carefully.
  ® The shorter the time interval, the greater the risk of both carryover and practice effects.

2) Alternate Forms Reliability
® Also called equivalent forms or parallel forms reliability.
® An alternative to test re-test reliability.
® It makes use of alternate forms of the same test.
® The same persons can be tested with one form on the first occasion and with an equivalent form on the second.
® Reliability coefficient
  o Correlation between the scores obtained by the same persons on the two forms of the test (see the sketch after this section).
® Ensure that the forms are truly parallel.
  o Both forms must have the same number of items, the same type and content, and an equal range or level of difficulty.
Limitations
® Can only reduce, not eliminate, practice effects.
® Random error and differences between the forms can produce score variation even when both forms are given on the same day.
® Can be burdensome (two forms of the same test have to be developed).
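In both the test re-test and alternate forms designs, the reliability coefficient is simply the Pearson correlation between the two sets of scores. A minimal sketch in Python with NumPy; the scores are made-up illustrative data, not from these notes:

```python
import numpy as np

# Hypothetical scores of the same five examinees on two administrations
# (or on two parallel forms) of a test.
form_a = np.array([12, 18, 25, 30, 22])
form_b = np.array([14, 17, 27, 29, 20])

# Reliability coefficient = Pearson correlation between the two score sets.
r_xx = np.corrcoef(form_a, form_b)[0, 1]
print(f"reliability estimate: {r_xx:.2f}")
```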
3) Split Half Reliability
® The test is divided into halves, and each half is scored separately.
® The results of one half of the test are compared to the results of the other half.
® How is the test divided into two halves?
  o Divide randomly: calculate a score for the first half and another score for the second half. Convenient, but it can cause problems if item difficulty is unevenly divided between the halves.
  o Use the odd-even system.
® The half-test correlation is an underestimate: each half is less reliable because it has fewer items.
  o Apply the Spearman-Brown formula, which estimates what the correlation between the two halves would be if each had the length of the full test (see the sketch after this list).
  o Use Spearman-Brown only when the two halves have equal variances.
  o Otherwise (unequal variances), Cronbach's coefficient alpha is used, which provides the lowest estimate of reliability.
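A short sketch of the split-half procedure with the Spearman-Brown correction, r_full = 2·r_half / (1 + r_half); the item matrix and the odd-even split are illustrative assumptions:

```python
import numpy as np

# Hypothetical item scores: 6 examinees x 8 dichotomous items (1 = correct, 0 = wrong).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
])

# Odd-even split: one half score from the odd-numbered items, one from the even-numbered.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# Correlation between the two half-test scores (an underestimate of full-test reliability).
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: estimated reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```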
4) KR20 Formula
® Kuder-Richardson 20.
® Calculates the reliability of tests whose items are dichotomous (right or wrong answers, scored 1 and 0).
® Formula: KR20 = [N/(N − 1)] × [(s² − Σpq)/s²]   (implemented in the sketch after this list)
  o KR20 = reliability estimate
  o N = number of items on the test
  o s² = variance of the total test score
  o p = proportion of people getting each item correct
  o q = proportion of people getting each item incorrect
    § For each item, q = 1 − p
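A direct implementation of the formula above, again on made-up dichotomous data (whether s² is computed as a sample or population variance varies by textbook; the sample form is used here):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR-20 for a persons x items matrix of 0/1 scores."""
    n_items = items.shape[1]                     # N
    total_var = items.sum(axis=1).var(ddof=1)    # s², variance of the total scores
    p = items.mean(axis=0)                       # proportion correct for each item
    q = 1 - p                                    # proportion incorrect for each item
    return (n_items / (n_items - 1)) * (total_var - np.sum(p * q)) / total_var

# Hypothetical responses: 5 examinees x 4 items.
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])
print(f"KR-20 = {kr20(scores):.2f}")
```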
5) Coefficient Alpha (Cronbach's coefficient alpha)
® Developed by Cronbach to estimate the internal consistency of tests whose items are not dichotomous.
® Applicable to personality and attitude scales.
® SPSS software provides a convenient way to compute coefficient alpha (a hand-computed sketch follows below).
® Used for essay tests, etc.
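Coefficient alpha generalizes KR-20 by replacing Σpq with the sum of the item variances. A minimal sketch on hypothetical 1-5 Likert responses (illustrative data only):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a persons x items matrix of (not necessarily 0/1) scores."""
    n_items = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents x 4 attitude items rated 1-5.
likert = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
])
print(f"alpha = {cronbach_alpha(likert):.2f}")
```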
How reliable is reliable?
® Reliabilities of .70 to .80 are good enough in basic research.
® In clinical settings, high reliability is vital (.90 to .95).

What to do about low reliability?
® Increase the number of items.
® Use factor and item analysis.
® Correction for attenuation – a formula used to estimate what the correlation between two variables would be if neither were affected by measurement error (see the sketch below).
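The usual attenuation correction divides the observed correlation by the square root of the product of the two measures' reliabilities; the numbers below are made up for illustration:

```python
import math

# Hypothetical values: observed correlation and the two tests' reliabilities.
r_observed = 0.40
r_xx, r_yy = 0.70, 0.75

# Estimated correlation if both variables were measured without error.
r_corrected = r_observed / math.sqrt(r_xx * r_yy)
print(f"corrected r = {r_corrected:.2f}")  # about 0.55
```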

Factors Affecting Test Reliability (For DO A SEA)
1. Test Format (structure, etc.)
2. Test Difficulty
3. Test Objectivity
4. Test Administration (who, when, where, and how the test was administered)
5. Test Scoring
6. Test Economy (cost and quality of test)
7. Test Adequacy

Validity
® The extent to which a test measures what it claims to measure; it attributes meaning to test scores (Gregory, 2011).
® What the test measures and how well it does so (Anastasi, 1996).
® The agreement between a test score and the quality it is believed to measure: "Does the test measure what it is supposed to measure?" (Kaplan & Saccuzzo, 2011).
® Characterized on a continuum ranging from weak to acceptable to strong.

Types of Validity
1) Face Validity
® Least stringent: does the test appear to be valid?
  o Whether a test appears, on a surface level, to measure what it intends to measure.
® Does not involve statistics.
® Done by face validators (registered psychometricians, psychologists, and guidance counselors).

2) Content Validity
® Built through the choice of appropriate content (questions, tasks, and items).
® The extent to which the test represents all aspects of a given construct or variable.
® Inspection of items: a panel of experts rates the test items on how well they match the domain specification.
® Considers the adequacy of representation.

3) Criterion Validity
® Criterion
  o The standard against which a test score is evaluated.
  o Can be another test score, a psychiatric diagnosis, training cost, an index of absenteeism, or an amount of time.
  o Characteristics of a good criterion:
    § Relevant
    § Valid and reliable
    § Uncontaminated (the criterion measure should not itself be influenced by the test it is meant to validate)
® The test's effectiveness in estimating behavior in a particular situation.
® Tells how well the test corresponds to a particular criterion.

Types of Criterion-Related Validity
• Concurrent Validity
  ® The extent to which test scores may be used to estimate an individual's present standing on a criterion.
• Predictive Validity
  ® The extent to which test scores can predict future behavior or scores on another test taken in the future.
• Incremental Validity
  ® Related to predictive validity.
  ® The extent to which an additional predictor explains something about the criterion measure that is not explained by the predictors already in use (see the sketch below).
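One common way to quantify incremental validity is the gain in R² when the new predictor is added to a regression that already contains the existing predictors. A sketch on simulated data; the variable names and effect sizes are illustrative assumptions:

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R² of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 200
existing = rng.normal(size=(n, 1))                    # predictor already in use
new_test = 0.5 * existing[:, 0] + rng.normal(size=n)  # the additional test being evaluated
criterion = existing[:, 0] + 0.8 * new_test + rng.normal(size=n)

r2_base = r_squared(existing, criterion)
r2_full = r_squared(np.column_stack([existing, new_test]), criterion)
print(f"incremental validity (delta R²) = {r2_full - r2_base:.3f}")
```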

4) Construct Validity
® Construct
  o An informed scientific idea developed to describe a behavior.
  o Built by mental synthesis.
  o An unobservable trait thought to correlate with other variables.
® The extent to which a test measures a theoretical construct.
® Estimates the existence of an inferred, underlying characteristic based on a limited sample of behavior.
® Established through activities in which the researcher defines a construct and develops instrumentation to measure it.
® Required when no criterion is accepted as adequate to define the quality being measured (e.g., "being woke").
® Established through statistical analysis.
® Construct validity is good when an existing psychological theory supports the test items.
® Involves both logical analysis and empirical data (see the sketch below).
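Empirical evidence for construct validity is often gathered by showing that the test correlates strongly with measures of related constructs (convergent evidence) and weakly with measures of unrelated ones (discriminant evidence). A minimal sketch on simulated scores; all names and data are illustrative assumptions, not from these notes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
trait = rng.normal(size=n)                        # the latent construct (simulated)
new_scale = trait + 0.5 * rng.normal(size=n)      # the test being validated
related_scale = trait + 0.7 * rng.normal(size=n)  # an established measure of a related construct
unrelated_scale = rng.normal(size=n)              # a measure the construct should not predict

# Convergent evidence: high correlation with the related measure.
# Discriminant evidence: near-zero correlation with the unrelated measure.
print(f"convergent r   = {np.corrcoef(new_scale, related_scale)[0, 1]:.2f}")
print(f"discriminant r = {np.corrcoef(new_scale, unrelated_scale)[0, 1]:.2f}")
```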

Factors Influencing Test Validity (ADRITLAP)
1. Appropriateness of the test
2. Directions/Instructions
3. Reading comprehension level
4. Item difficulty
5. Test construction factors
6. Length of test
7. Arrangement of items
8. Patterns of answers
