
Tuesday, 21 February 2017

Psych 162: Reliability

Classical Test Theory


- True score + error = test score (the observed score)
- Sources of error
• Time
• Items/Content
• Consistency
- The standard error of measurement (SEM) is used to create confidence intervals around
an observed score (e.g., 95% confident that the true score falls between the two bounds)
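
A minimal sketch of the confidence-interval idea above. The notes do not give the formula; this assumes the standard classical test theory expression SEM = SD × sqrt(1 − reliability) and a normal-theory multiplier of 1.96 for 95%. The function names and the example values (observed score 100, SD 15, reliability 0.8) are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); assumed standard CTT formula."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of +/- z * SEM around the observed score (z = 1.96 for ~95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical example values, not taken from the notes.
sem = standard_error_of_measurement(15, 0.8)
low, high = confidence_interval(100, 15, 0.8)
print(f"SEM = {sem:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A lower reliability gives a larger SEM and therefore a wider interval around the same observed score.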

Reliability
- Consistency of scores obtained by the same person across time, items, or other test
conditions
- Extent to which individual differences in test scores represent “true” differences or
chance errors
- Estimates what proportion of test score variance is error variance
• Error variance: differences in scores resulting from conditions that are irrelevant to the
purpose of the test

• Reliability of 0.8 = 80% of the score variance is true-score variance, and the remaining
20% is error variance

• Desired: Between 0.7 and 0.9


- No test is a perfectly reliable instrument
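
The variance interpretation above can be sketched as a ratio of true-score variance to total variance. The function name and the variance values are hypothetical; in practice the true-score variance is not observed directly and reliability has to be estimated with the methods in the following sections.

```python
def reliability_from_variances(true_var, error_var):
    """Reliability = true-score variance / (true-score + error variance)."""
    total_var = true_var + error_var
    return true_var / total_var

# Hypothetical variances chosen to match the 0.8 example in the notes.
r = reliability_from_variances(true_var=80.0, error_var=20.0)
error_share = 1.0 - r
print(f"reliability = {r:.2f}, error share = {error_share:.2f}")
```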

Test-Retest Reliability
- Repeat same test on the same person on another occasion
- Test for correlation between scores on the two separate testing occasions
- Source of error variance: fluctuations in performance between the two testing
occasions
- Shows how well scores on the test generalize across situations

- Higher reliability, lower susceptibility to random changes
• smaller error attributable to time factors
- Need to specify the length of interval (between two measurement situations)
• max. 6 months
- Disadvantage: practice effect
- Can only be applied to tests in which performance is not affected by repetition (e.g.,
sensorimotor, motor)
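
A minimal sketch of the test-retest correlation, assuming the usual Pearson product-moment coefficient (the notes only say "correlation"). The scores for the two occasions are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two occasions.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

retest_r = pearson_r(time1, time2)
print(f"test-retest reliability = {retest_r:.3f}")
```

The same computation applies to alternate-form reliability by correlating scores on Form A with scores on Form B.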

Alternate-Form Reliability
- 1 person, 2 equivalent forms
• Form A: original
• Form B: restated version
- Test for correlation of scores on the two forms
- Measure of:
• item factors
• stability over time
• consistency of responses to two different item samples
- Sources of error variance: time + content sampling
- Disadvantage: reduces but does not completely eliminate the practice effect
- Parallel forms must:
• Be independently constructed
• Express items in the same form
• Cover the same type of content
• Have an equivalent range and level of difficulty
• Have equivalent instructions, time limits, and sample items

Split-Half Reliability
- 1 form, 1 person, split into two sections

- Source of error variance: content sampling
- Test for coefficient of internal consistency
- Longer test = more reliable (usually, not always)
- Spearman-Brown formula: for estimating the effect of shortening or lengthening the
test

• Used because this type of reliability technically computes only the reliability of
half the test
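
A sketch of split-half reliability with the Spearman-Brown correction. The notes do not say how the test is split; an odd-even item split is assumed here. The general Spearman-Brown formula used is r_new = n·r / (1 + (n − 1)·r), with n = 2 to step up from half length to full length. The response data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)

def split_half_reliability(responses):
    """responses: one list of item scores per person (odd-even split assumed)."""
    odd_half = [sum(person[0::2]) for person in responses]   # items 1, 3, ...
    even_half = [sum(person[1::2]) for person in responses]  # items 2, 4, ...
    r_half = pearson_r(odd_half, even_half)
    return spearman_brown(r_half, 2.0)  # correct half-test r to full length

# Hypothetical right/wrong (1/0) answers: 4 people, 4 items.
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
full_r = split_half_reliability(responses)
print(f"corrected split-half reliability = {full_r:.3f}")
```

The uncorrected correlation between the two halves is lower than the corrected value because each half is only half as long as the full test.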

Inter-Item Consistency
- a.k.a. Kuder-Richardson reliability and coefficient alpha
- Single administration of a single form
- Consistency of all items in the test
- Source of error variance:
• content sampling
• Heterogeneity of behavior
- More homogeneous items, more consistency
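
A sketch of coefficient alpha from a single administration, assuming the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The formula is not spelled out in the notes, and the response data and function name are hypothetical. With right/wrong (1/0) items and population variances, as here, the result coincides with Kuder-Richardson formula 20.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one list of item scores per person, same items for everyone."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # regroup scores by item
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Hypothetical right/wrong (1/0) answers: 4 people, 4 items.
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
alpha = cronbach_alpha(responses)
print(f"coefficient alpha = {alpha:.2f}")
```

When items measure the same thing, total-score variance is large relative to the summed item variances, which pushes alpha upward.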

Scorer Reliability
- Correlate results obtained by two separate scorers
- Factors excluded from error variance:
• True variance (remains in scores)
• Irrelevant factors that can be controlled experimentally

Increasing Reliability
- Acceptable reliability: between 0.7 and 0.9
- Increase the number of items
