Test theories are frameworks or methodologies used in the field of psychometrics to develop,
evaluate, and analyze tests and assessments. These theories provide a systematic and
structured way to understand how tests are constructed, how they perform, and how they are
used to measure various attributes, skills, or traits in individuals.
IRT Assumptions
1) Monotonicity – As the trait level increases, the probability of a correct response also
increases.
2) Unidimensionality – The model assumes that there is one dominant latent trait being
measured and that this trait is the driving force for the responses observed for each item in the
measure.
3) Local Independence – Responses given to the separate items in a test are mutually
independent given a certain level of ability.
4) Invariance – Item parameters do not depend on the ability distribution of the examinees, so
they can be estimated from any position on the item response curve. Accordingly, we can
estimate the parameters of an item from any group of subjects who have answered the item.
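The assumptions above can be illustrated with a minimal sketch. It assumes the two-parameter logistic (2PL) model, which the text does not name; the function names and the item parameters (discrimination a, difficulty b) are hypothetical values chosen for illustration only.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    for ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(theta, responses, items):
    """Likelihood of a 0/1 response pattern at a given ability.
    Local independence lets us multiply the per-item probabilities."""
    likelihood = 1.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        likelihood *= p if u == 1 else (1.0 - p)
    return likelihood

# Hypothetical item parameters (a, b), not taken from any real test.
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]

# Monotonicity: the probability rises as the trait level rises.
probs = [p_correct(theta, 1.5, 0.0) for theta in (-2, -1, 0, 1, 2)]
print(probs)

# Local independence: joint probability of the pattern [1, 1, 0] at theta = 0.
print(pattern_likelihood(0.0, [1, 1, 0], items))
```

With a positive discrimination value the curve is strictly increasing, which is the monotonicity assumption in concrete form. A single theta drives every item, which reflects unidimensionality.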
In computer-adaptive testing based on IRT, the computer concentrates on the range of item
difficulty that is most informative about an individual’s ability level. For example, if the
person gets several easy items correct, the computer might quickly move to more difficult
items. If the person gets several difficult items wrong, the computer moves back to the area of
item difficulty where the person gets some items right and some wrong. The computer then
samples this region of difficulty intensively. The overall result is that a more reliable estimate
of ability is obtained using a shorter test with fewer items.
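The adaptive procedure described above can be sketched roughly as follows. This is one common approach, not necessarily the one the text has in mind: it assumes a 2PL model, picks the unused item with the highest Fisher information at the current ability estimate, and re-estimates ability by a grid search with a standard normal prior. The item bank, the answer rule, and all function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Index of the unused item that is most informative at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

def estimate_theta(responses, items):
    """Grid-search ability estimate. A standard normal prior keeps the
    estimate finite even when every answer so far is correct (or wrong)."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def log_posterior(theta):
        total = -0.5 * theta * theta
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            total += math.log(p if u == 1 else 1.0 - p)
        return total
    return max(grid, key=log_posterior)

def run_cat(bank, answer, n_items):
    """Administer n_items adaptively; answer(i) returns 0 or 1 for item i."""
    theta, administered, responses = 0.0, [], []
    for _ in range(n_items):
        i = next_item(theta, bank, administered)
        administered.append(i)
        responses.append(answer(i))
        theta = estimate_theta(responses, [bank[j] for j in administered])
    return theta, administered

# Hypothetical bank of (a, b) pairs and a deterministic toy examinee
# who answers correctly whenever the item difficulty is below 0.5.
bank = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
theta_hat, order = run_cat(bank, lambda i: 1 if bank[i][1] < 0.5 else 0, 3)
print(theta_hat, order)
```

In this toy run the test starts with the middle item, moves to a harder one after a correct answer, and drops back after an incorrect one, which mirrors the behaviour described in the paragraph. Operational adaptive tests add stopping rules and item exposure control, which this sketch omits.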
KEY DIFFERENCES BETWEEN THE MODELS:
Classical Test Theory (CTT) and Item Response Theory (IRT) are two distinct frameworks
used in psychometrics to assess the quality of tests and measurements. Here are the key
differences between the two:
1. Underlying Assumptions:
CTT: CTT assumes that the observed test score is the sum of a true score (the
person's actual level of the ability or trait being measured) and measurement
error. It summarizes that error with a single standard error of measurement
applied to every examinee and does not provide information about how the
error varies.
IRT: IRT assumes that item responses are a result of a person's underlying trait
(latent trait) and the specific characteristics of the test items, such as their
difficulty and discrimination. IRT considers the nature of measurement error
and how it varies across individuals and items.
2. Model Flexibility:
CTT: CTT centers on a single overall test score. Item statistics such as
proportion correct and item-total correlation can be computed, but they
depend on the sample tested, and the total score weights all items equally
regardless of their difficulty and discrimination.
IRT: IRT allows for individual item analysis, providing information about the
difficulty and discrimination of each item. It can accommodate items with
varying characteristics within the same test.
3. Scoring Methods:
CTT: CTT typically uses simple summation or averaging of item scores to
create a total test score. The total score doesn't consider the individual
characteristics of items.
IRT: IRT employs a more complex scoring approach. It calculates a person's
trait level (latent score) based on their responses to individual items. This
approach considers the properties of each item, adjusting for item difficulty
and the person's ability.
4. Adaptive Testing:
CTT: CTT does not naturally support computer-adaptive testing, where the
difficulty of items is adjusted based on a test-taker's previous responses.
IRT: IRT is well-suited for computer-adaptive testing, as it allows for real-
time adjustments of item difficulty, ensuring that each test-taker receives
questions that match their ability level.
5. Test Equating:
CTT: Equating different test forms or versions is challenging in CTT, as it
relies on observed scores and is influenced by variations in test difficulty.
IRT: IRT facilitates test equating because it uses item characteristics (item
parameters) rather than observed scores. Equating is more straightforward
when using IRT.
6. Precision and Reliability:
CTT: CTT provides a single estimate of reliability (e.g., Cronbach's alpha) for
the entire test but does not offer information about the reliability of individual
items.
IRT: IRT offers item-level information about precision through item and test
information functions, which show how measurement error varies across trait
levels and enable a more detailed analysis of measurement quality.
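The scoring and precision differences listed above can be made concrete with a small sketch. It assumes a 2PL model and a simple grid-search maximum-likelihood estimate; the item parameters and function names are hypothetical and the estimation method is chosen for brevity, not because the text prescribes it.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sum_score(responses):
    """CTT-style total score: every item counts equally."""
    return sum(responses)

def ml_theta(responses, items):
    """IRT-style score: grid-search maximum-likelihood ability estimate."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def log_likelihood(theta):
        total = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            total += math.log(p if u == 1 else 1.0 - p)
        return total
    return max(grid, key=log_likelihood)

def standard_error(theta, items):
    """Standard error of the ability estimate: 1 / sqrt(test information)."""
    info = 0.0
    for a, b in items:
        p = p_correct(theta, a, b)
        info += a * a * p * (1.0 - p)
    return 1.0 / math.sqrt(info)

# Hypothetical items (a, b) with increasing discrimination and difficulty.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

pattern_a = [1, 1, 0]  # missed the hardest, most discriminating item
pattern_b = [0, 1, 1]  # missed the easiest, least discriminating item

print(sum_score(pattern_a), sum_score(pattern_b))
print(ml_theta(pattern_a, items), ml_theta(pattern_b, items))
print(standard_error(1.0, items), standard_error(-3.0, items))
```

Both response patterns have the same sum score, yet the IRT estimate is higher for the pattern that includes the more discriminating item. The standard error also changes with the trait level, whereas a single CTT reliability coefficient gives one figure for the whole test.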
In summary, CTT and IRT are both used to assess tests and measurements, but they differ in
their underlying assumptions, model flexibility, scoring methods, and suitability for various
applications. IRT provides a more detailed and flexible framework for developing and
evaluating assessments, particularly in scenarios where precise measurement is critical.