CHAPTER 6:
ITEM ANALYSIS AND VALIDATION
There are two important characteristics of an item that will be of interest to the teacher.
These are: (a) item difficulty and (b) discrimination index. We shall learn how to
measure these characteristics and apply our knowledge in making a decision about
the item in question.
Example: What is the item difficulty index of an item if 25 students are unable to answer it
correctly while 75 answered it correctly?
Here, the total number of students is 100, hence the item difficulty index is 75/100 or 75%.
Another example: 25 students answered the item correctly while 75 students did not. The total
number of students is 100, so the difficulty index is 25/100, or 25%.
This item is more difficult than the one with a difficulty index of 75%.
A high percentage indicates an easy item/question while a low percentage indicates a difficult item.
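The computation above can be sketched as a short Python function (the function name is my own, not from the text):

```python
def difficulty_index(num_correct, total_students):
    """Item difficulty index: the percentage of students who
    answered the item correctly."""
    return num_correct / total_students * 100

# The two examples from the text:
print(difficulty_index(75, 100))  # 75.0 -> an easy item
print(difficulty_index(25, 100))  # 25.0 -> a difficult item
```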
An easy way to derive such a measure of discrimination is to compare how difficult an item is for
those in the upper 25% of the class with how difficult it is for those in the lower 25% of the
class. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the
item can discriminate properly between these two groups. Thus:
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a
difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the
class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus index of discrimination = .60 - .20 = .40.
Discrimination index is the difference between the proportion of the top scorers who got
an item correct and the proportion of the lowest scorers who got the item right. The
discrimination index range is between -1 and +1. The closer the discrimination index is to +1, the more
effectively the item can discriminate or distinguish between the two groups of students. A negative
discrimination index means that more students from the lower group than from the upper group got the
item correct. Such an item is not good and must be discarded.
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0
(when DU = 1 and DL = 0). When the index of discrimination is equal to -1, then this means that all of the
lower 25% of the students got the correct answer while all of the upper 25% got the wrong answer. In
a sense, such an index discriminates correctly between the two groups but the item itself is highly
questionable.
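The subtraction DU - DL can be expressed as a one-line function (the names below are assumed for illustration, not from the text):

```python
def discrimination_index(d_upper, d_lower):
    """Index of discrimination: the difficulty index (proportion correct)
    of the upper 25% minus that of the lower 25% of the class.
    Ranges from -1.0 (DU = 0, DL = 1) to 1.0 (DU = 1, DL = 0)."""
    return d_upper - d_lower

# Example from the text: DU = 0.60, DL = 0.20 gives an index of about 0.40.
print(round(discrimination_index(0.60, 0.20), 2))
# The theoretical extremes:
print(discrimination_index(1.0, 0.0))  # 1.0
print(discrimination_index(0.0, 1.0))  # -1.0
```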
Example: Consider a multiple choice type of test for which the following data were obtained for one item:

                    Options
    Item 1      A     B*    C     D
    Total       0     40    20    20
    Upper 25%   0     15    5     0
    Lower 25%   0     5     10    5
The correct response is B. Let us compute the difficulty index and index of discrimination:
It is also instructive to note that the distracter A is not an effective distracter since this was never
selected by the students. It is an implausible distracter. Distracters C and D appear to have good
appeal as distracters. They are plausible distracters.
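The distracter check just described can be sketched as a small helper (a hypothetical function of my own naming, not from the text), which flags options that no student selected:

```python
def implausible_distracters(option_counts, correct_option):
    """option_counts: dict mapping each option label to the number of
    students who chose it.  Returns the distracters nobody selected,
    i.e. the implausible ones."""
    return [opt for opt, n in option_counts.items()
            if opt != correct_option and n == 0]

# Totals from the table above (the correct answer is B):
counts = {"A": 0, "B": 40, "C": 20, "D": 20}
print(implausible_distracters(counts, "B"))  # ['A']
```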
Index of Difficulty

    P = (Ru + RL) / T x 100

Where:
    P  = percentage who answered the item correctly (index of difficulty)
    Ru = number in the upper group who answered the item correctly
    RL = number in the lower group who answered the item correctly
    T  = total number of students in the upper and lower groups

For the item above, P = (15 + 5) / 40 x 100 = 50%.
The smaller the percentage figure, the more difficult the item.

The index of discrimination is obtained with the formula below:

    D = (Ru - RL) / (½T)

For the item above, D = (15 - 5) / 20 = 0.50.
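Applying the two formulas to the counts for the keyed option B in the table (15 correct in the upper group, 5 in the lower, 40 students in the two groups combined), a minimal sketch with my own function names:

```python
def difficulty(ru, rl, t):
    """P = (Ru + RL) / T x 100, with T the combined size of the
    upper and lower groups."""
    return (ru + rl) / t * 100

def discrimination(ru, rl, t):
    """D = (Ru - RL) / (half of T)."""
    return (ru - rl) / (t / 2)

ru, rl, t = 15, 5, 40   # upper correct, lower correct, combined group size
print(difficulty(ru, rl, t))      # 50.0 (%)
print(discrimination(ru, rl, t))  # 0.5
```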
For classroom achievement tests, most test constructors desire items with indices of difficulty no
lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of
60.
After performing the item analysis and revising the items which need revision, the next step is to
validate the instrument. The purpose of validation is to determine the characteristics of the whole
test itself, namely, the validity and reliability of the test. Validation is the process of collecting and
analyzing evidence to support the meaningfulness and usefulness of the test.
Validity. Validity is the extent to which a test measures what it purports to measure or as
referring to the appropriateness, correctness, meaningfulness and usefulness of the specific decisions
a teacher makes based on the test results. These two definitions of validity differ in the sense that the
first definition refers to the test itself while the second refers to the decisions made by the teacher
based on the test. A test is valid when it is aligned with the learning outcome.
There are two main types of criterion validity: concurrent validity and predictive validity.
Concurrent validity refers to a comparison between the measure in question and an outcome
assessed at the same time.
6.3. Reliability
Reliability refers to the consistency of the scores obtained — how consistent they are
for each individual from one administration of an instrument to another and from one set of
items to another.
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid
outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is
shown scientifically to be valid, then it is almost certain that it is also reliable.
Predictive validity compares the measure in question with an outcome assessed at a later time. An
example of predictive validity is a comparison of scores in the National Achievement Test
(NAT) with first-semester grade point average (GPA) in college. Do NAT scores predict college
performance? Construct validity refers to the ability of a test to measure what it is supposed to
measure. If, as a researcher, you intend to measure depression but actually measure anxiety, your
research is compromised.
The following table is a standard followed almost universally in educational testing and
measurement.

    Reliability     Interpretation
    .90 and above   Excellent reliability; at the level of the best standardized tests
    .80 - .90       Very good for a classroom test
    .70 - .80       Good for a classroom test; in the range of most classroom tests.
                    There are probably a few items which could be improved.
    .50 or below    Questionable reliability. This test should not contribute heavily
                    to the course grade, and it needs revision.
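The interpretation table can be expressed as a simple lookup (a sketch with my own naming; note that the table is silent on coefficients between .50 and .70):

```python
def interpret_reliability(r):
    """Map a reliability coefficient (0 to 1) to the standard
    interpretation from the table above."""
    if r >= 0.90:
        return "Excellent reliability; at the level of the best standardized tests"
    if r >= 0.80:
        return "Very good for a classroom test"
    if r >= 0.70:
        return "Good for a classroom test; a few items could be improved"
    if r <= 0.50:
        return "Questionable reliability; the test needs revision"
    return "Below the usual classroom-test range"  # gap not covered by the table

print(interpret_reliability(0.85))  # Very good for a classroom test
```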