
ITEM ANALYSIS AND VALIDATION

INTRODUCTION

The teacher normally prepares a draft of the test. Such a draft is subjected to item
analysis and validation to ensure that the final version of the test would be useful and
functional.

PHASES OF PREPARING A TEST

• Try-out phase

• Item analysis phase

• Item revision phase

ITEM ANALYSIS

• There are two important characteristics of an item that will be of interest to the
teacher:

• Item Difficulty

• Discrimination Index

Item Difficulty or the difficulty of an item is defined as the number of students who are able
to answer the item correctly divided by the total number of students. Thus:

$$\text{Item difficulty} = \frac{\text{Number of students with the correct answer}}{\text{Total number of students}}$$

• A high index indicates an easy item and a low index indicates a difficult item.

• The item difficulty is usually expressed as a percentage.

For the overall difficulty of the items,

$$\text{Item difficulty} = \frac{\text{Total score of all items}}{\text{Number of students} \times \text{Total given score}}$$

Interpretation and Action

Range of difficulty index    Interpretation      Action

0 – 0.25                     Difficult           Revise or discard
0.26 – 0.75                  Right difficulty    Retain
0.76 and above               Easy                Revise or discard

Example:

What is the item difficulty index of an item if 25 students are unable to answer it correctly
while 75 answered it correctly?

Here the total number of students is 100; hence, the item difficulty index is 75/100 = 0.75, or 75%.

Interpretation and Action: This item is of the right difficulty and can be retained.
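
The worked example above can be checked with a short script. This is a minimal sketch and not part of the original handout: the function names `item_difficulty` and `interpret_difficulty` are hypothetical, and the cut-offs assume that the upper bound of each range in the table is inclusive, since the table leaves small gaps (for instance between 0.25 and 0.26).

```python
from typing import Tuple


def item_difficulty(num_correct: int, num_students: int) -> float:
    """Proportion of students who answered the item correctly."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


def interpret_difficulty(index: float) -> Tuple[str, str]:
    """Map a difficulty index to (interpretation, action).

    Assumption: each range's upper bound is treated as an inclusive cut-off.
    """
    if index <= 0.25:
        return ("Difficult", "Revise or discard")
    if index <= 0.75:
        return ("Right difficulty", "Retain")
    return ("Easy", "Revise or discard")


# Example from the text: 75 of 100 students answered correctly.
index = item_difficulty(num_correct=75, num_students=100)
print(index, interpret_difficulty(index))
# prints: 0.75 ('Right difficulty', 'Retain')
```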

Weakness

One problem with this type of difficulty index is that it may not actually indicate that the
item is difficult or easy. A student who does not know the subject matter will naturally be
unable to answer the item correctly even if the question is easy. How do we decide on the
basis of this index whether the item is too difficult or too easy?

Discrimination Index

• Difficult items tend to discriminate between those who know and those who do
not know the answer.

• Easy items cannot discriminate between those two groups of students.

• A measure that will tell us whether an item can discriminate between these two
groups of students is called an index of discrimination.

Index of Discrimination: the difference between the proportion of the upper group who
got an item right and the proportion of the lower group who got the item right.

Index of discrimination = DU – DL

DU = item difficulty of those in the upper 25% of the class

DL = item difficulty of those in the lower 25% of the class

Example: Obtain the index of discrimination of an item if the upper 25% of the class had a
difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower
25% of the class had a difficulty index of 0.20.

DU = 0.60, DL = 0.20

Index of discrimination = .60 - .20 = .40
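
The same calculation can be sketched starting from raw class data. The snippet below is illustrative only: the function name `discrimination_index`, the sample scores, and the choice to rank students by total test score are assumptions, since the handout does not say how the upper and lower 25% are formed.

```python
def discrimination_index(total_scores, item_correct, fraction=0.25):
    """Return DU - DL for one item.

    total_scores: each student's total test score (assumed ranking criterion).
    item_correct: 1 if that student answered the item correctly, else 0.
    fraction: share of the class placed in each of the upper and lower groups.
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("both lists must have one entry per student")
    n = len(total_scores)
    group_size = int(n * fraction)
    if group_size < 1:
        raise ValueError("class is too small for the requested fraction")

    # Rank student positions from highest to lowest total score.
    # Tied scores keep their original order (an arbitrary tie-break).
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    du = sum(item_correct[i] for i in upper) / group_size
    dl = sum(item_correct[i] for i in lower) / group_size
    return du - dl


# Hypothetical class of 20 students with total scores 20, 19, ..., 1.
total_scores = list(range(20, 0, -1))
# Top 5: 3 correct (DU = 0.60); middle 10: mixed; bottom 5: 1 correct (DL = 0.20).
item_correct = [1, 1, 1, 0, 0] + [1, 0] * 5 + [1, 0, 0, 0, 0]

print(round(discrimination_index(total_scores, item_correct), 2))
# prints: 0.4
```

The result is rounded before printing because the subtraction is done in floating point.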

• Range of the discrimination index: -1 to 1

• When the index of discrimination is equal to -1, all of the lower 25% of the
students got the correct answer while all of the upper 25% got the wrong answer.

• Such an item does discriminate between the two groups, but in the wrong
direction, so the item itself is highly questionable.

• If the index of discrimination is 1.0, all of the lower 25% failed to get the
correct answer while all of the upper 25% got the correct answer. This is a
perfectly discriminating item and is the ideal item that should be included in
the test.

How to interpret Index of discrimination?

Index Range      Interpretation                                    Action

-1.0 to -.50     Can discriminate but the item is questionable     Discard
-.55 to .45      Non-discriminating                                Revise
.46 to 1.0       Discriminating item                               Include
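
As a rough illustration, the interpretation table can be turned into a lookup. This sketch rests on stated assumptions: the function name `interpret_discrimination` is hypothetical, the printed ranges overlap between -.55 and -.50, so the boundary is taken to be -.50 here, and values between .45 and .46 are treated as discriminating.

```python
from typing import Tuple


def interpret_discrimination(index: float) -> Tuple[str, str]:
    """Map a discrimination index to (interpretation, action).

    Assumptions: -0.50 is the cut-off between the first two rows of the
    table, and 0.45 is an inclusive upper bound for the middle row.
    """
    if not -1.0 <= index <= 1.0:
        raise ValueError("the discrimination index must lie between -1 and 1")
    if index <= -0.50:
        return ("Can discriminate but the item is questionable", "Discard")
    if index <= 0.45:
        return ("Non-discriminating", "Revise")
    return ("Discriminating item", "Include")


# The earlier example gave an index of .40.
print(interpret_discrimination(0.40))
# prints: ('Non-discriminating', 'Revise')
```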

The following table of reliability interpretations is a standard followed almost
universally in educational tests and measurement:

Reliability      Interpretation

.90 and above    Excellent reliability; at the level of the best standardized tests.

.80 - .90        Very good for a classroom test.

.70 - .80        Good for a classroom test; in the range of most. There are probably a
                 few items which could be improved.

.60 - .70        Somewhat low. This test should be supplemented by other measures
                 (e.g., more tests) for grading.

.50 - .60        Suggests need for revision of test, unless it is quite short (ten or
                 fewer items). The test definitely needs to be supplemented by other
                 measures (e.g., more tests) for grading.

.50 or below     Questionable reliability. This test should not contribute heavily to
                 the course grade, and it needs revision.
