
CHAPTER 6
ITEM ANALYSIS AND VALIDATION
INTRODUCTION
The teacher prepares a draft of the test, which is then subjected
to item analysis and validation. The teacher tries out the test on a
group of students of similar characteristics, and each item is
analyzed in terms of its ability to discriminate between those who
know and those who do not know and its level of difficulty. The
item analysis will provide information that will allow the teacher
to decide whether to revise or replace an item. Finally, the final
draft is subjected to validation if the intent is to make use of the
test as a standard test for the particular unit or grading period.
LEARNING OUTCOMES:

1 Explain the meaning of item analysis, item validity, reliability,
item difficulty, discrimination index

2 Determine the validity and reliability of given test items

3 Determine the quality of a test item by its difficulty index,
discrimination index and plausibility of options (for a
selected-response test)
Two important characteristics of an item:

1 ITEM DIFFICULTY

2 DISCRIMINATION INDEX
ITEM DIFFICULTY
The difficulty of an item or item difficulty is defined as the
number of students who are able to answer the item correctly
divided by the total number of students. Thus:

Item difficulty = number of students with correct answer / total number of students

The item difficulty is usually expressed as a percentage.


Example: 25 students answered the item correctly while 75
students did not. The total number of students is
100, so the difficulty index is 25/100 or 0.25, which is
25%.

It is a more difficult test item than one with a
difficulty index of 75%.

A high percentage indicates an easy item/question while a
low percentage indicates a difficult item.
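The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `item_difficulty` and the use of a list of True/False values (one per student) are assumptions, not part of the source.

```python
def item_difficulty(responses):
    """Return the proportion of students who answered the item correctly.

    `responses` is a list of booleans, one per student (True = correct).
    This representation is an assumption made for the sketch.
    """
    if not responses:
        raise ValueError("at least one response is required")
    # True counts as 1 and False as 0, so sum() gives the number correct.
    return sum(responses) / len(responses)


# The example from the text: 25 correct answers out of 100 students.
responses = [True] * 25 + [False] * 75
p = item_difficulty(responses)
print(f"difficulty index = {p:.2f} ({p:.0%})")  # difficulty index = 0.25 (25%)
```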
The following arbitrary rule is often used in the
literature:

Range of difficulty index    Interpretation      Action

0 - 0.25                     Difficult           Revise or discard

0.26 - 0.75                  Right difficulty    Retain

0.76 - above                 Easy                Revise or discard
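The rule of thumb can be written as a small lookup. This is a sketch under stated assumptions: the function name `interpret_difficulty` is hypothetical, an index of exactly 0.25 is treated as "Difficult", and any index above 0.75 is treated as "Easy", since the table does not say how values between the printed bounds should be handled.

```python
def interpret_difficulty(p):
    """Map a difficulty index (a proportion from 0 to 1) to (interpretation, action).

    Boundary handling is an assumption: p <= 0.25 is "Difficult",
    p <= 0.75 is "Right difficulty", anything higher is "Easy".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("difficulty index must be between 0 and 1")
    if p <= 0.25:
        return ("Difficult", "Revise or discard")
    if p <= 0.75:
        return ("Right difficulty", "Retain")
    return ("Easy", "Revise or discard")


for p in (0.25, 0.50, 0.90):
    interpretation, action = interpret_difficulty(p)
    print(f"{p:.2f}: {interpretation} -> {action}")
```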


DISCRIMINATION INDEX
Difficult items tend to discriminate between those
who know and those who do not, while easy items
cannot. We are interested in deriving a measure that will
tell us whether an item can discriminate between these
two groups of students. This measure is called an index
of discrimination.
An easy way to derive such a measure is to
measure how difficult an item is with respect to
those in the upper 25% of the class and how
difficult it is with respect to those in the lower 25%
of the class. If the upper 25% of the class found
the item easy yet the lower 25% found it difficult,
then the item can discriminate properly between
these two groups.
Thus:
Index of discrimination = DU - DL (U = upper group; L =
lower group)

Example:
Obtain the index of discrimination of an item if
the upper 25% of the class had a difficulty
index of 0.60 (i.e. 60% of the upper 25% got the
correct answer) while the lower 25% of the class
had a difficulty index of 0.20.

Here, DU = 0.60 while DL = 0.20, thus index
of discrimination = 0.60 - 0.20 = 0.40.
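The same calculation can be sketched starting from raw data. The function name `discrimination_index`, its parameters, and the sample scores are all hypothetical. The sketch assumes students are ranked by total test score, that the group size is rounded down, and that ties are broken by the original order of the list.

```python
def discrimination_index(total_scores, item_correct, fraction=0.25):
    """Return DU - DL for one item.

    total_scores[i] is student i's total test score and item_correct[i]
    is True if that student answered the item correctly.
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("both lists must have one entry per student")
    n = len(total_scores)
    k = int(n * fraction)  # size of the upper and lower groups
    if k < 1:
        raise ValueError("not enough students to form the groups")
    # Student indices sorted by total score, highest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    du = sum(item_correct[i] for i in upper) / k  # difficulty in upper group
    dl = sum(item_correct[i] for i in lower) / k  # difficulty in lower group
    return du - dl


# Made-up class of 20 students: 3 of the top 5 and 1 of the bottom 5
# answer the item correctly, so DU = 0.60 and DL = 0.20.
total_scores = list(range(20, 0, -1))
item_correct = ([True, True, True, False, False] + [True] * 5
                + [False] * 5 + [True, False, False, False, False])
print(round(discrimination_index(total_scores, item_correct), 2))  # 0.4
```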
Discrimination index is the difference
between the proportion of top scorers and
lowest scorers who got an item correct. It
ranges from -1 to +1; the closer it is to +1,
the more effectively the item can
discriminate or distinguish between the two
groups of students. A negative discrimination
index means that more students from the lower
group than from the upper group got the item
correct.
Theoretically, the index of discrimination can
range from -1.0 (when DU = 0 and DL = 1) to
1.0 (when DU = 1 and DL = 0). When the index
of discrimination is equal to -1, then this means
that all of the lower 25% of the students got the
correct answer while all of the upper 25% got the
wrong answer. On the other hand, if the index of
discrimination is 1.0, then this means that all of
the lower 25% failed to get the correct answer
while all of the upper 25% got the correct
answer.
As in the case of the index of difficulty, we have the
following rule of thumb:

Index Range       Interpretation                              Action

-1.0 to -0.56     Can discriminate but item is questionable   Discard

-0.55 to 0.45     Non-discriminating                          Revise

0.46 to 1.0       Discriminating item                         Include

The INDEX OF DISCRIMINATION is the difference
between the proportion of the upper group who got an
item right and the proportion of the lower group who
got the item right. This index is dependent upon the
difficulty of an item. It may reach a maximum value of
100% for an item with an index of difficulty of 50%, that
is, when 100% of the upper group and none of the
lower group answer the item correctly. For items with a
difficulty of less than or greater than 50%, the index of
discrimination has a maximum value of less than
100%.
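The dependence on difficulty can be made concrete with a short sketch. It assumes that the upper and lower groups are the same size and that the difficulty p is computed over those two groups combined, so that p = (DU + DL) / 2. Under that assumption the largest possible DU - DL is 2 * min(p, 1 - p). The function name `max_discrimination` is hypothetical, and values are proportions rather than percentages.

```python
def max_discrimination(p):
    """Largest possible DU - DL for an item of difficulty p.

    Assumes equal-sized upper and lower groups and p = (DU + DL) / 2.
    For p <= 0.5 the maximum is reached at DU = 2p, DL = 0;
    for p >= 0.5 it is reached at DU = 1, DL = 2p - 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return 2 * min(p, 1 - p)


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"difficulty {p:.2f} -> maximum discrimination {max_discrimination(p):.2f}")
```

Only at a difficulty of 0.50 does the maximum reach 1.00 (100%); it falls off symmetrically on either side.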
More Sophisticated Discrimination Index
Item discrimination is the ability of an
item to differentiate among students based
on how well they know the material being
tested. Hand calculation procedures have
traditionally been used to compare item
responses to total test scores, but
computerized analyses provide more
accurate assessment of the discrimination
power of items.
The item discrimination index provided by ScorePak®
is a Pearson Product Moment correlation between student
responses to a particular item and total scores on all other
items on the test. It provides an estimate of the degree to
which an individual item is measuring the same thing as the
rest of the items. Values of the coefficient will tend to be
lower for tests measuring a wide range of content areas
than for more homogeneous tests. Items with low
discrimination indices should be examined, and for items
with negative indices the teacher should determine
why a negative value was obtained.
Tests with high internal consistency consist of
items with mostly positive relationships with total test
score. ScorePak® classifies item discrimination as
"good" if the index is above .30, "fair" if it is between
.10 and .30, and "poor" if it is below .10. At the end of
the Item Analysis report, test items are listed
according to their degrees of difficulty and
discrimination. These distributions provide a quick
overview of the test and can be used to identify items
which are not performing well and which can be
improved or discarded.
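The correlation described above, between responses to one item and total scores on all other items, can be sketched as follows. This is not ScorePak® code: the function names, the 0/1 score matrix and the sample data are all assumptions made for illustration. Only the good/fair/poor cut-offs come from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


def item_rest_correlation(score_matrix, item):
    """Correlate one item with the total score on all OTHER items.

    score_matrix[s][i] is 1 if student s answered item i correctly, else 0.
    """
    item_scores = [row[item] for row in score_matrix]
    rest_scores = [sum(row) - row[item] for row in score_matrix]
    return pearson(item_scores, rest_scores)


def classify_discrimination(r):
    """Apply the cut-offs quoted in the text: above .30, .10 to .30, below .10."""
    if r > 0.30:
        return "good"
    if r >= 0.10:
        return "fair"
    return "poor"


# Made-up data: 5 students (rows) by 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
for item in (0, 3):
    r = item_rest_correlation(scores, item)
    print(f"item {item}: r = {r:.2f} ({classify_discrimination(r)})")
```

Leaving the item itself out of the total avoids inflating the correlation, which matters most on short tests.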
VALIDATION AND VALIDITY

VALIDATION
The purpose of validation is to determine the
characteristics of the whole test itself, namely, the
validity and reliability of the test.

Validation is the process of collecting and analyzing
evidence to support the meaningfulness and
usefulness of the test.

VALIDITY
Validity is the extent to which a test
measures what it purports to measure. It also
refers to the appropriateness,
correctness, meaningfulness and usefulness
of the specific decisions a teacher makes
based on the test results.

THREE MAIN TYPES OF EVIDENCE THAT MAY BE COLLECTED

1 Content-related evidence of validity
refers to the content and format of
the instrument.

2 Criterion-related evidence of validity
refers to the relationship between
scores obtained using the instrument
and scores obtained using one or
more other tests (often called the
criterion).

3 Construct-related evidence of validity
refers to the nature of the
psychological construct or
characteristic being measured by the
test.

RELIABILITY

Reliability refers to the consistency of the
scores obtained - how consistent they are
for each individual from one administration
of an instrument to another and from one
set of items to another.
Reliability and validity are related
concepts. If an instrument is unreliable, it
cannot produce valid outcomes. As reliability
improves, validity may improve (or it may not).
However, if an instrument is shown
scientifically to be valid, then it is almost
certain that it is also reliable.
The following table is a standard followed almost universally
in educational testing and measurement:

Reliability      Interpretation

.90 - above      Excellent reliability; at the level of the best
                 standardized tests.

.80 - .90        Very good for a classroom test.

.70 - .80        Good for a classroom test; in the range of most.
                 There are probably a few items which could be
                 improved.

.60 - .70        Somewhat low. This test needs to be supplemented
                 by other measures (e.g., more tests) to determine
                 grades. There are probably some items which could
                 be improved.

.50 - .60        Suggests need for revision of the test, unless it
                 is quite short (ten or fewer items). The test
                 definitely needs to be supplemented by other
                 measures (e.g., more tests) for grading.

.50 or below     Questionable reliability. This test should not
                 contribute heavily to the course grade, and it
                 needs revision.
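The table can be turned into a simple lookup, sketched below. The function name `interpret_reliability` is hypothetical and the labels are shortened. Because the printed ranges share their end points, the sketch assumes that a boundary value belongs to the higher band (for example, exactly .80 counts as "very good") and that only values below .50 count as "questionable".

```python
def interpret_reliability(r):
    """Return a short interpretation for a reliability coefficient.

    Assumption: a value on a boundary is placed in the higher band.
    """
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low; supplement with other measures"),
        (0.50, "Suggests need for revision of the test"),
    ]
    # Bands are listed from highest to lowest, so the first match wins.
    for lower_bound, label in bands:
        if r >= lower_bound:
            return label
    return "Questionable reliability"


for r in (0.93, 0.75, 0.42):
    print(f"{r:.2f}: {interpret_reliability(r)}")
```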
Thank You!
