Reliability Estimates: Source of Error Variance Is Test Administration
Heterogeneous tests
- a test that measures several different factors; it is composed of items that measure more than one trait
- same scorers may have different abilities
1. KUDER-RICHARDSON FORMULAS
a. KR 20
- Statistic of choice for determining the inter-item consistency of dichotomous items, those that
can be scored right or wrong (e.g., multiple-choice items)
- If items are heterogeneous, KR-20 will yield a lower reliability estimate than the split-half estimate
b. KR 21
- May be used if we assume that all the test items have approximately the same level of difficulty
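The two Kuder-Richardson formulas above can be sketched in code. This is a minimal sketch, not an official implementation; the function names are mine. KR-20 works from the full matrix of 0/1 item responses, while KR-21 needs only the number of items, the test mean, and the test variance, under the equal-difficulty assumption noted above.

```python
import numpy as np

def kr20(scores):
    """KR-20 for a matrix of dichotomous (0/1) item scores.

    scores: 2-D array-like, rows = examinees, columns = items.
    Uses sample variances (ddof=1) for both items and totals; for 0/1
    items the sample item variance is n/(n-1) times p*q.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def kr21(k, mean, var):
    """KR-21: assumes all items have roughly the same difficulty, so only
    the item count, test mean, and test variance are needed."""
    return (k / (k - 1)) * (1.0 - mean * (k - mean) / (k * var))
```

For example, a test where every examinee answers all items the same way (perfectly consistent items) yields a KR-20 of 1.0.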
MEASURES OF INTER-SCORER RELIABILITY (source of error variance is test scoring and interpretation)
Inter-scorer reliability - degree of agreement or consistency between two or more scorers (or judges or
raters) with regard to a particular measure
- If the reliability coefficient is high, the prospective test user knows that test scores can be
derived in a systematic, consistent way by various scorers with sufficient training.
- Inter-rater consistency may be promoted by providing raters with the opportunity for group
discussion along with practice exercises and information on rater accuracy
The simplest way of determining the degree of consistency among scorers in the scoring of a test is to
calculate a coefficient of correlation (coefficient of inter-scorer reliability)
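The coefficient of inter-scorer reliability described above is an ordinary correlation between two scorers' ratings of the same examinees. A minimal sketch (the ratings here are hypothetical):

```python
import numpy as np

# Hypothetical ratings of five examinees by two independent scorers.
scorer_a = np.array([4, 3, 5, 2, 4])
scorer_b = np.array([4, 2, 5, 2, 3])

# Pearson correlation = coefficient of inter-scorer reliability.
r = np.corrcoef(scorer_a, scorer_b)[0, 1]
```

A value of r near 1 indicates that the two scorers rank and rate examinees in a highly consistent way.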
How high should the coefficient of reliability be? We need more of it in some tests, and we will
admittedly allow for less of it in others. If a test score is routinely used in combination with many other
test scores and typically accounts for only a small part of the decision process, that test will not be held
to the highest standards of reliability.
NATURE OF TESTS (could also be helpful in determining what estimate of reliability to use)
1. Homogeneity vs heterogeneity
- Homogeneous (internal consistency)
- Heterogeneous (test-retest estimate)
2. Dynamic vs static characteristics
a. Dynamic characteristic is a trait, state, or ability presumed to be ever-changing as a
function of situational and cognitive experiences – internal consistency
b. Static – test-retest or alternate-forms method
3. Restriction or inflation of range
a. If the range is restricted, correlation coefficients tend to be lower
b. If the range is inflated, correlation coefficients tend to be higher
4. Speed vs Power tests
a. Speed – test-retest, alternate-forms, or split-half estimates may be used; if split-half, use the
Spearman-Brown formula to adjust. KR-20 or an unadjusted split-half estimate computed from a single
timed administration will be spuriously high
5. Criterion-referenced tests
- how different the scores are from one another is seldom a focus of interest
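The Spearman-Brown adjustment mentioned for split-half estimates in the list above can be sketched as follows. This is a minimal sketch with a hypothetical function name: given the correlation between two comparable halves, it estimates the reliability of the full-length test (n = 2), or of a test lengthened n times.

```python
def spearman_brown(r_half, n=2):
    """Spearman-Brown prophecy formula: estimated reliability of a test
    lengthened n times, given the correlation r_half between comparable
    halves (n=2 gives the full-test estimate from a split-half r)."""
    return n * r_half / (1 + (n - 1) * r_half)
```

For example, a split-half correlation of .60 adjusts to a full-test reliability estimate of .75.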
A generalizability study examines how generalizable scores from a particular test are if the test is
administered in different situations. It examines how much of an impact different facets of the universe
have on the test score. The influence of particular facets on the test score is represented by coefficients
of generalizability.
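In its simplest form, a coefficient of generalizability can be thought of as the proportion of observed-score variance attributable to true (universe-score) differences among persons rather than to the facets of error. This is a deliberately simplified sketch with hypothetical names, assuming the variance components have already been estimated in a generalizability study:

```python
def generalizability_coefficient(var_persons, var_error):
    """Ratio of universe-score (person) variance to total observed-score
    variance -- analogous to a reliability coefficient. var_error stands in
    for the combined error variance from the facets being generalized over."""
    return var_persons / (var_persons + var_error)
```

The smaller the influence of a given facet (e.g., occasion or scorer) on the error term, the closer the coefficient is to 1 and the more generalizable the scores are across that facet.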
3. Item response theory
- provides a way to model the probability that a person with ability X will be able to perform at
level Y
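The probability model described above can be sketched with a logistic item response function. This is a minimal sketch of one common form (a two-parameter logistic model), not the only IRT model; the parameter names are the conventional ones (theta = person ability, b = item difficulty, a = item discrimination):

```python
import math

def irt_probability(theta, b, a=1.0):
    """Probability that a person with ability theta answers an item of
    difficulty b correctly, under a two-parameter logistic IRT model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability exactly matches item difficulty (theta = b), the probability of a correct response is .50, and it rises toward 1 as ability increasingly exceeds difficulty.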
Standard Error of Measurement
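The standard error of measurement named in the heading above is the standard deviation of the error expected around an observed score, computed from the test's standard deviation and its reliability coefficient. A minimal sketch (the function name is mine):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r): the standard deviation of measurement error
    around an observed score, given the test's reliability coefficient r."""
    return sd * math.sqrt(1.0 - reliability)
```

For example, a test with SD = 15 and reliability of .84 has an SEM of 6, so an observed score of 100 suggests a band of roughly 94 to 106 (plus or minus one SEM) around the true score.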
1. Taylor-Russell table
- Provides an estimate of the percentage of new hires who will be successful employees if a test is
adopted (organizational success)
2. Expectancy charts or Lawshe tables
- provide a probability of success for a particular applicant based on test scores (individual
success)
a. Organization Analysis
b. Task Analysis
c. Person Analysis