Psychological Test Construction and Standardization
I. Introduction
Psychological tests are tools used to measure various aspects of human
behaviour, cognition, and personality. Constructing and standardizing these
tests requires careful attention to detail to ensure their reliability and validity.
This guide aims to provide a comprehensive overview of the key aspects
involved in the construction and standardization of psychological tests.
II. Different Taxonomies of Test Development
1. Classical Test Theory (CTT): Focuses on the reliability and validity of
test scores. For example, in CTT, a researcher might assess the internal
consistency of a test using Cronbach's alpha coefficient.
2. Item Response Theory (IRT): Emphasizes the characteristics of
individual test items and the relationship between item responses and
latent traits. An example of IRT application is the Rasch model, which
estimates item difficulty and examinee ability. The Rasch model is
particularly useful for measuring abilities, attitudes, or traits along a
unidimensional continuum. It assumes that the probability of a person
endorsing an item or answering it correctly depends on only two factors:
• The person's ability or trait level.
• The item's difficulty.
3. Criterion-Referenced Testing: Evaluates an individual's performance
based on predefined criteria rather than in comparison to others. For
instance, a driving test may use criterion-referenced testing to determine
if a person meets the standards for obtaining a driver's license.
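The CTT and IRT frameworks above can be illustrated numerically. The sketch below uses hypothetical 0/1 response data (all values are invented for illustration): it computes Cronbach's alpha from the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and the Rasch probability of a correct response, exp(theta - b) / (1 + exp(theta - b)).

```python
import math

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: items x persons
    def var(xs):                            # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(var(item) for item in items)
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

def rasch_probability(theta, b):
    """Rasch model: P(correct) depends only on ability theta and difficulty b."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

# Hypothetical responses of 5 examinees to 4 items (1 = correct, 0 = incorrect)
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 2))                 # → 0.8
print(round(rasch_probability(theta=1.0, b=0.0), 2))  # → 0.73
```

Here theta is the person's ability and b the item's difficulty, matching the two factors the Rasch model assumes; an alpha around 0.80 or above is conventionally read as good internal consistency.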
III. Types of Items
1. Objective Items: These have correct or clearly best answers and
include multiple-choice, true/false, and matching items. For example,
"Which of the following is not a primary colour? a) Red b) Blue c) Green
d) Yellow"
2. Subjective Items: Require subjective judgment in scoring, such as essay
questions and short-answer items. An example of a subjective item could
be "Discuss the impact of social media on mental health."
IV. General Guidelines for Writing Items
1. Clarity: Items should be clear and unambiguous to avoid confusion.
Example: "What is the capital of France?" (Clear and concise)
2. Relevance: Items should assess the intended construct or trait. Example:
In a test measuring mathematical ability, a relevant item could be "Solve
for x: 2x + 5 = 15."
3. Conciseness: Keep items concise and focused to prevent unnecessary
complexity. Example: "What is the chemical symbol for gold?" (Concise
and to the point)
V. Item Writing and Item Analysis
1. Approaches to Item Writing:
a. Direct Approach: Directly assesses the construct or trait of interest.
Example: "What is your level of agreement with the statement: 'I enjoy working
in teams'?" (Directly assesses attitude towards teamwork)
b. Indirect Approach: Assesses the construct or trait indirectly, so that it
must be inferred from the examinee's response. Example: "Choose the picture
that best represents happiness." (Indirectly assesses emotional state)
2. Types of Item Analysis:
a. Difficulty Index: Measures the proportion of examinees who answered the
item correctly. Example: In a multiple-choice test, if 80% of examinees
answered a particular question correctly, its difficulty index is 0.80.
b. Discrimination Index: Assesses the extent to which an item differentiates
between high and low performers. Example: An item that is answered correctly
by high-performing students but not by low-performing students has a high
discrimination index.
VI. Item Difficulty Index
1. Calculation: Difficulty index is calculated as the proportion of
examinees who answered the item correctly. Example: If 50 out of 100
students answer an item correctly, its difficulty index is 0.50.
2. Interpretation: Because the index is the proportion answering correctly, a
high difficulty index means the item is easy, while a low index means it is
difficult. Example: If a test aimed at high school students has many items
with a difficulty index of 0.90, it may not effectively differentiate
between students of varying ability levels.
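The calculation in point 1 can be sketched directly (hypothetical responses; 1 = correct, 0 = incorrect):

```python
def difficulty_index(item_responses):
    """Proportion of examinees who answered the item correctly."""
    return sum(item_responses) / len(item_responses)

# Hypothetical item: 50 of 100 examinees answered correctly
responses = [1] * 50 + [0] * 50
print(difficulty_index(responses))  # → 0.5
```

A mid-range value such as 0.50 generally leaves the most room for an item to discriminate among examinees.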
VII. Item Discrimination Index
1. Calculation: Discrimination index is calculated by comparing the
performance of the upper and lower groups of examinees. Example: If the
top 25% of students answer an item correctly markedly more often
than the bottom 25%, the item has a high discrimination index.
2. Interpretation: Items with high discrimination index values effectively
distinguish between high and low performers. Example: An item that
consistently separates the top-performing students from the rest shows
good discriminating power.
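A minimal sketch of the upper-lower method described above (hypothetical data; the 25% cut matches the example, though 27% is also common in practice). The index is D = p_upper - p_lower, where each p is the proportion of that group answering the item correctly.

```python
def discrimination_index(scored, fraction=0.25):
    """Upper-lower discrimination index D = p_upper - p_lower.

    scored: list of (total_test_score, item_correct) pairs, item_correct 0/1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))   # size of each comparison group
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(correct for _, correct in upper) / n
    p_lower = sum(correct for _, correct in lower) / n
    return p_upper - p_lower

# Hypothetical data: (total test score, 1/0 on the item of interest)
examinees = [(95, 1), (90, 1), (85, 1), (80, 1),
             (50, 1), (45, 0), (40, 0), (35, 0),
             (20, 0), (15, 0), (10, 0), (5, 0)]
print(discrimination_index(examinees))  # → 1.0
```

D ranges from -1 to +1; values near +1 indicate that the item strongly separates high from low performers, while values near zero (or negative) flag items worth revising.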
VIII. Conclusion
Constructing and standardizing psychological tests require adherence to
established guidelines and principles. By carefully developing items and
analysing their performance, researchers can ensure the reliability and validity
of the tests, ultimately enhancing their utility in psychological assessment.