
What are test theories?

Test theories are frameworks or methodologies used in the field of psychometrics to develop,
evaluate, and analyze tests and assessments. These theories provide a systematic and
structured way to understand how tests are constructed, how they perform, and how they are
used to measure various attributes, skills, or traits in individuals.

They allow for:


• Interpretation of test scores
• Evaluation of tests
• Understanding why tests behave the way they do (score change, distraction and context)

Two test theories are widely used today: Classical Test Theory (CTT, also called classical test score theory, CTST) and Item Response Theory (IRT).

Classical Test Theory


Classical Test Theory (CTT) is a framework used in psychometrics to understand and assess
the reliability and validity of psychological tests or assessments. It provides a foundation for
evaluating the accuracy, quality and consistency of measurement instruments.
Classical test score theory assumes that each person has a true score that would be obtained if
there were no errors in measurement. However, because measuring instruments are imperfect,
the score observed for each person almost always differs from the person’s true ability or
characteristic. The difference between the true score and the observed score results from
measurement error. In symbolic representation, the observed score (X) has two components, a
true score (T) and an error component (E):

X = T + E

A major assumption in classical test theory is that errors of measurement are random.
Although systematic errors are acknowledged in most measurement problems, they are less
likely than random errors to lead an investigator to the wrong conclusions. A carpenter
who always misreads a tape measure by 2 inches (that is, makes a systematic error of 2 inches)
would still be able to cut boards to the same length as one another.
Classical test theory assumes that the true score for an individual will not change with
repeated applications of the same test. Because of random error, however, repeated
applications of the same test can produce different scores.
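The relationship X = T + E can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the true score of 50, the error standard deviation of 5, and the normal error distribution are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return observed scores X = T + E for repeated administrations.

    The true score T stays fixed; only the random error E changes.
    The normal error distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]


true_score = 50.0  # hypothetical true score
scores = simulate_observed_scores(true_score, error_sd=5.0, n_administrations=10_000)

# Random errors average out: the mean observed score approaches T.
mean_observed = statistics.fmean(scores)
# The spread of observed scores reflects the size of the random error.
spread = statistics.stdev(scores)

# Systematic error (the carpenter's tape measure): every cut is off by the
# same 2 inches, so the boards still match one another.
intended_lengths = [36.0, 36.0, 36.0]
biased_cuts = [length + 2.0 for length in intended_lengths]

print(f"mean observed score: {mean_observed:.2f}")
print(f"spread of observed scores: {spread:.2f}")
print(f"biased cuts: {biased_cuts}")
```

Individual observed scores differ from one administration to the next, while their average stays close to the fixed true score. The systematic error shifts every cut by the same amount, so the boards remain equal in length.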
Item Response Theory
The item response theory (IRT), also known as latent trait theory, refers to a family of
mathematical models that attempt to explain the relationship between latent traits
(unobservable characteristics or attributes) and their manifestations (i.e. observed outcomes,
responses or performance). These models establish a link between the properties of items on an
instrument, individuals responding to these items and the underlying trait being measured.
IRT assumes that the latent construct (e.g. stress, knowledge, attitudes) and the items of a
measure are located on an unobservable continuum. Its main purpose is therefore to
establish the individual's position on that continuum.
In IRT, there are typically two primary item parameters associated with each test item:
1. Difficulty Parameter (b): This parameter represents the point on the latent trait
continuum (e.g., ability) where there is a 50% probability of a test-taker providing a
correct response to the item (in models without a guessing parameter).
2. Discrimination Parameter (a): This parameter measures how well the item
discriminates between individuals with different levels of the latent trait. It describes
the steepness of the item response curve at the item's difficulty.
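The two parameters can be made concrete with the two-parameter logistic (2PL) model, one common member of the IRT family. The notes do not name a specific model, so the choice of the 2PL form and the parameter values below are assumptions for illustration.

```python
import math


def prob_correct(theta, a, b):
    """2PL item response function: probability of a correct response.

    theta: person's latent trait level
    a:     item discrimination (steepness of the curve)
    b:     item difficulty (trait level where the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items with the same difficulty but different discrimination.
flat_item = {"a": 0.5, "b": 0.5}
steep_item = {"a": 2.0, "b": 0.5}

for theta in (-1.0, 0.5, 2.0):
    p_flat = prob_correct(theta, **flat_item)
    p_steep = prob_correct(theta, **steep_item)
    print(f"theta={theta:+.1f}  flat item: {p_flat:.3f}  steep item: {p_steep:.3f}")
```

When theta equals b, both items give a probability of exactly 0.5. Away from b, the item with the larger a separates low and high trait levels more sharply.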

IRT Assumptions
1) Monotonicity – As the trait level increases, the probability of a correct response
also increases.
2) Unidimensionality – The model assumes that there is one dominant latent trait being
measured and that this trait is the driving force for the responses observed for each item in the
measure.
3) Local Independence – Responses given to the separate items in a test are mutually
independent given a certain level of ability.
4) Invariance – Item parameters can be estimated from any portion of the item response
curve. Accordingly, in principle, the parameters of an item can be estimated from any group of
subjects who have answered the item.
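Local independence is what allows the probability of a whole response pattern to be written as a product of item probabilities at a given trait level. The sketch below assumes the 2PL model and three hypothetical items, and uses a simple grid search for the most likely trait level. It is illustrative only, not a production estimation routine.

```python
import math


def prob_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_likelihood(theta, items, responses):
    """Likelihood of a response pattern at a given theta.

    Under local independence this is the product of the per-item
    probabilities (p for a correct response, 1 - p for an incorrect one).
    """
    likelihood = 1.0
    for (a, b), u in zip(items, responses):
        p = prob_correct(theta, a, b)
        likelihood *= p if u == 1 else 1.0 - p
    return likelihood


def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: pattern_likelihood(t, items, responses))


# Hypothetical (a, b) pairs: an easy, weakly discriminating item up to a
# hard, strongly discriminating one.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

# Two response patterns with the same number of correct answers.
easy_two_correct = [1, 1, 0]
hard_two_correct = [0, 1, 1]

theta_easy = estimate_theta(items, easy_two_correct)
theta_hard = estimate_theta(items, hard_two_correct)
print(f"sum score 2, easy items correct: theta estimate = {theta_easy:.2f}")
print(f"sum score 2, hard items correct: theta estimate = {theta_hard:.2f}")
```

Both patterns have a sum score of 2, yet the pattern that includes the hard, highly discriminating item yields the higher trait estimate under this model. Monotonicity can be seen in `prob_correct`, which only increases as theta increases when a is positive.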
IRT also underpins computer-adaptive testing, in which the computer concentrates on the range
of item difficulty that best helps assess an individual's ability level. For example, if the
person gets several easy items correct, the
computer might quickly move to more difficult items. If the person gets several difficult
items wrong, the computer moves back to the area of item difficulty where the person gets
some items right and some wrong. This level of ability is then sampled intensively. The overall
result is that a more reliable estimate of ability can be obtained using a shorter test with fewer
items.
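The adaptive procedure described above can be sketched in a few lines. Everything specific here is an assumption made for illustration: a one-parameter (Rasch-type) model, a hypothetical item bank of 25 difficulties, a grid-search ability estimate, and a simulated test-taker who answers correctly whenever the item difficulty does not exceed a true ability of 1.0. Real test-takers respond probabilistically, and operational adaptive tests use more refined selection and stopping rules.

```python
import math


def prob_correct(theta, b):
    """One-parameter logistic probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(difficulties, responses, grid):
    """Grid-search maximum likelihood ability estimate, bounded by the grid."""
    def log_likelihood(theta):
        total = 0.0
        for b, u in zip(difficulties, responses):
            p = prob_correct(theta, b)
            total += math.log(p if u == 1 else 1.0 - p)
        return total
    return max(grid, key=log_likelihood)


def run_adaptive_test(item_bank, answer_item, max_items=8, start_theta=0.0):
    """Administer items one at a time, matching difficulty to the current estimate."""
    grid = [-4.0 + 0.01 * i for i in range(801)]
    remaining = list(item_bank)
    administered, responses = [], []
    theta = start_theta
    for _ in range(max_items):
        # Pick the unused item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        administered.append(item)
        responses.append(answer_item(item))
        # Re-estimate ability from all responses so far.
        theta = estimate_theta(administered, responses, grid)
    return theta, administered


# Hypothetical bank: difficulties from -3.0 to 3.0 in steps of 0.25.
item_bank = [-3.0 + 0.25 * i for i in range(25)]
true_theta = 1.0  # simulated test-taker's ability (assumption)

final_theta, administered = run_adaptive_test(
    item_bank, lambda b: 1 if b <= true_theta else 0
)
print("difficulties administered:", administered)
print(f"final ability estimate: {final_theta:.2f}")
```

After an initial jump toward harder items, the selected difficulties cluster around the simulated ability, which is the intensive sampling the paragraph describes. Only 8 of the 25 items in the bank are used.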
KEY DIFFERENCES BETWEEN THE MODELS:
Classical Test Theory (CTT) and Item Response Theory (IRT) are two distinct frameworks
used in psychometrics to assess the quality of tests and measurements. Here are the key
differences between the two:
1. Underlying Assumptions:
• CTT: CTT assumes that the observed test score is a combination of a true
score (the person's actual ability or trait being measured) and measurement
error. It treats measurement error as random and, in its basic form, assumes the
same standard error of measurement for every test-taker, so it does not describe
how measurement precision varies.
• IRT: IRT assumes that item responses are a result of a person's underlying trait
(latent trait) and the specific characteristics of the test items, such as their
difficulty and discrimination. IRT describes how measurement error
varies across trait levels and items.
2. Model Flexibility:
• CTT: CTT centers on a single overall test score. Item statistics such as the
proportion correct and the item-total correlation can be computed, but they depend
on the sample tested, and the basic model does not describe how individual items
behave across ability levels.
• IRT: IRT allows for individual item analysis, providing information about the
difficulty and discrimination of each item. It can accommodate items with
varying characteristics within the same test.
3. Scoring Methods:
• CTT: CTT typically uses simple summation or averaging of item scores to
create a total test score. The total score doesn't consider the individual
characteristics of items.
• IRT: IRT employs a more complex scoring approach. It calculates a person's
trait level (latent score) based on their responses to individual items. This
approach considers the properties of each item, adjusting for item difficulty
and the person's ability.
4. Adaptive Testing:
• CTT: CTT does not naturally support computer-adaptive testing, where the
difficulty of items is adjusted based on a test-taker's previous responses.
• IRT: IRT is well-suited for computer-adaptive testing, as it allows for
real-time adjustments of item difficulty, ensuring that each test-taker receives
questions that match their ability level.
5. Test Equating:
• CTT: Equating different test forms or versions is challenging in CTT, as it
relies on observed scores and is influenced by variations in test difficulty.
• IRT: IRT facilitates test equating because it uses item characteristics (item
parameters) rather than observed scores. Equating is more straightforward
when using IRT.
6. Precision and Reliability:
• CTT: CTT provides a single estimate of reliability (e.g., Cronbach's alpha) for
the entire test but does not offer information about the reliability of individual
items.
• IRT: IRT offers item-level information about precision (item and test
information functions), showing how measurement precision changes across
trait levels and enabling a more detailed analysis of measurement quality.
In summary, CTT and IRT are both used to assess tests and measurements, but they differ in
their underlying assumptions, model flexibility, scoring methods, and suitability for various
applications. IRT provides a more detailed and flexible framework for developing and
evaluating assessments, particularly in scenarios where precise measurement is critical.
