
ADMINISTERING, ANALYZING, & IMPROVING TESTS

Prepared by:
FERNANDO R. SEQUETE JR., LPT
LEARNING OUTCOMES
At the end of the chapter, you should be able to:
1. Define the basic concepts regarding item analysis;
2. Identify the steps in improving test items;
3. Compute the difficulty index and the discrimination index of an item;
4. Identify the level of difficulty of an item;
5. Perform item analysis properly and correctly;
6. Identify the item to be rejected, revised, or retained; and
7. Interpret the results of item analysis.
INTRODUCTION
•The teacher normally prepares a draft of the test.
•For the draft to be useful and functional, it is subjected to item analysis and validation.
•Try-out phase – the draft test is tried out on a group of students with characteristics similar to those of the intended test takers.
•Item analysis phase – each item is analyzed in terms of its ability to discriminate and its level of difficulty.
•Item revision phase – allows the teacher to decide whether to revise or replace an item.
INTRODUCTION
•The final draft of the test is subjected to validation if the intent is to make use of the test as a standard test for the particular unit or grading period.
ITEM ANALYSIS: DIFFICULTY INDEX &
DISCRIMINATION INDEX
✓There are two important characteristics of an item that will be of interest
to the teacher: (1) item difficulty and (2) discrimination index.
✓Item difficulty – the number of students who are able to answer the item
correctly divided by the total number of students.
✓The item difficulty is usually expressed as a percentage.
✓A high percentage indicates an easy item/question while a low
percentage indicates a difficult item.
EXAMPLES:
1. What is the item difficulty index of an item
if 25 students are unable to answer it
correctly while 75 answered it correctly?
2. What is the item difficulty index of an item
if 25 students answered the item correctly
while 75 students did not?
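ANSWERS:
Each case involves 100 students in total, so:
1. Difficulty index = 75/100 = 0.75, or 75% (a relatively easy item).
2. Difficulty index = 25/100 = 0.25, or 25% (a relatively difficult item).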
❑One problem with this type of difficulty index is that it may not actually
indicate that the item is difficult or easy.
❑A student who does not know the subject matter will naturally be unable to
answer the item correctly even if the question is easy.

How do we decide on the basis of this index whether the item is too difficult or too easy?
THE FOLLOWING ARBITRARY RULE IS OFTEN USED
IN THE LITERATURE:

Range of Difficulty Index    Interpretation      Action to be done
0.00 – 0.25                  Difficult           Revise or discard
0.26 – 0.75                  Right difficulty    Retain
0.76 and above               Easy                Revise or discard
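Because the computation is a simple ratio followed by a table lookup, it can be sketched in a few lines of Python (a minimal sketch; the function names are illustrative, not from the source):

# Minimal sketch of the difficulty index and the interpretation table above.
def difficulty_index(num_correct: int, num_students: int) -> float:
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def interpret_difficulty(p: float) -> tuple[str, str]:
    """Map a difficulty index to the (interpretation, action) pairs above."""
    if p <= 0.25:
        return ("Difficult", "Revise or discard")
    if p <= 0.75:
        return ("Right difficulty", "Retain")
    return ("Easy", "Revise or discard")

# Example 1 above: 75 of 100 students answered correctly.
p = difficulty_index(75, 100)
print(p, interpret_difficulty(p))  # 0.75 ('Right difficulty', 'Retain')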
✓Difficult items tend to discriminate between those who know and those who do
not know the answer.
✓Easy items cannot discriminate between these two groups of students. We are
interested in deriving a measure that will tell us whether an item can
discriminate between these two groups of students.
✓Such a measure is called an index of discrimination.
✓An easy way to derive such a measure is to measure how difficult an item is
with respect to those in the upper 25% of the class and how difficult it is with
respect to those in the lower 25% of the class.
✓If the upper 25% of the class found the item easy yet the lower 25% found it
difficult, then the item can discriminate properly between these two groups.

THUS: Index of Discrimination = DU – DL, where U refers to the upper group and L to the lower group.
EXAMPLE:
Obtain the index of discrimination of an item if
the upper 25% of the class had a difficulty
index of 0.60 (i.e., 60% of the upper 25% got
the correct answer) while the lower 25% of the
class had a difficulty index of 0.20.
ANSWER:
Here, DU = 0.60 and DL = 0.20.

Index of discrimination = DU – DL = 0.60 – 0.20 = 0.40
✓Discrimination index is the difference between the proportion of the top scorers who got an item correct and the proportion of the lowest scorers who got the item right.
✓The discrimination index range is between -1 and +1.
✓The closer the discrimination index is to +1, the more
effectively the item can discriminate or distinguish between
the two groups of students.
✓A negative discrimination index means more from the
lower group got the item correctly.
✓The index of discrimination can range from -1.0 (when DU =
0 & DL = 1) to 1.0 (when DU = 1 & DL = 0).
✓When the index of discrimination is equal to -1, then this means that all of the
lower 25% of the students got the correct answer while all of the upper 25%
got the wrong answer; the item is highly questionable.
✓If the discrimination index is 1.0, then this means that all of the lower 25%
failed to get the correct answer while all of the upper 25% got the correct
answer.
✓As in the case of the index of difficulty, we have the following rule of thumb:

Index Range     Interpretation                               Action to be done
-1.0 – -0.50    Can discriminate but item is questionable    Discard
-0.55 – 0.45    Non-discriminating                           Revise
0.46 – 1.0      Discriminating item                          Include
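The same kind of sketch works for the discrimination index (again with illustrative names; the cutoffs follow the rule of thumb above):

# Minimal sketch of the discrimination index and the rule of thumb above.
def discrimination_index(du: float, dl: float) -> float:
    """D = DU - DL: difficulty index of the upper 25% minus that of the lower 25%."""
    return du - dl

def interpret_discrimination(d: float) -> tuple[str, str]:
    """Map a discrimination index to the (interpretation, action) pairs above."""
    if d <= -0.50:
        return ("Can discriminate but item is questionable", "Discard")
    if d <= 0.45:
        return ("Non-discriminating", "Revise")
    return ("Discriminating item", "Include")

# Earlier example: DU = 0.60, DL = 0.20
d = discrimination_index(0.60, 0.20)
print(d, interpret_discrimination(d))  # 0.4 ('Non-discriminating', 'Revise')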
EXAMPLE:
Consider a multiple-choice item (options A–D, with B as the correct answer) for which the following data were obtained: 40 of the 80 students who took the test chose the correct response; 15 of the 20 students in the upper 25% and 5 of the 20 students in the lower 25% answered correctly; option A was never chosen.
ANSWER:
Difficulty index = # of students getting correct response / total
= 40/80 = 0.50 or 50%, within the range of a "good item"

The discrimination index can be computed as:

DU = # of students in upper 25% with correct responses / # of students in the upper 25%
= 15/20 = 0.75 or 75%

DL = # of students in lower 25% with correct responses / # of students in the lower 25%
= 5/20 = 0.25 or 25%

Discrimination index = DU – DL = 0.75 – 0.25 = 0.50 or 50%
• It is also instructive to note that distracter A is not an effective distracter since it was never selected by the students; it is an implausible distracter.
• Distracters C and D appear to have good appeal as distracters; they are plausible distracters.
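Distracter analysis amounts to tallying how often each option was chosen. A minimal sketch with hypothetical response data:

# Count how often each option was chosen; a distracter that is never
# selected (like option A above) is implausible. Data are hypothetical.
from collections import Counter

responses = ["B", "C", "B", "D", "B", "C", "B", "D", "B", "B"]
correct = "B"

counts = Counter(responses)
for option in "ABCD":
    n = counts.get(option, 0)
    if option != correct and n == 0:
        print(f"Option {option}: never chosen -> implausible distracter")
    else:
        print(f"Option {option}: chosen by {n} student(s)")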
For example, if 8 of the 20 students who took a test answered an item correctly, its difficulty is

$$P = \frac{8}{20} \times 100\% = 40\%$$

The smaller the percentage figure, the more difficult the item.

If 6 students in the upper group and 2 students in the lower group (with 10 students per group) answered an item correctly, its discriminating power is

$$D = \frac{6 - 2}{10} = 0.40$$

• The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00.
• Maximum discrimination is usually found at the 50 percent level of difficulty: only when about half of the students answer an item correctly is it possible for all of the upper group to succeed while all of the lower group fail, which gives D = 1.00.
VALIDATION AND VALIDITY
▪After performing the item analysis and revising the items
which need revision, the next step is to validate the
instrument.
▪The purpose of validation is to determine the characteristics of the whole test itself: the validity and reliability of the test.
▪Validation is the process of collecting and analyzing
evidence to support the meaningfulness and usefulness of
the test.
VALIDATION AND VALIDITY
▪Validity is the extent to which a test measures what it
purports to measure or as referring to the appropriateness,
correctness, meaningfulness and usefulness of the specific
decisions a teacher makes based on the test results.
▪A test is valid when it is aligned with the learning
outcome.
▪A teacher who conducts test validation might want to
gather different kinds of evidence.
CONTENT-RELATED EVIDENCE OF VALIDITY

▪The content and format of the instrument.
▪How appropriate is the content?
▪How comprehensive?
▪Does it logically get at the intended variable?
▪How adequately does the sample of items or
questions represent the content to be assessed?
CRITERION-RELATED EVIDENCE OF VALIDITY

▪The relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion).
▪How strong is this relationship?
▪How well do such scores estimate present or
predict future performance of a certain type?
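The strength of such a relationship is usually summarized with a correlation coefficient (the validity coefficient). A minimal sketch in Python, assuming paired scores are available; the data below are hypothetical:

# Estimate criterion-related validity as the Pearson correlation between
# scores on the new test and scores on an established criterion measure.
from scipy.stats import pearsonr

test_scores      = [35, 42, 28, 50, 39, 44, 31, 47]  # hypothetical new-test scores
criterion_scores = [70, 85, 60, 95, 78, 88, 65, 90]  # hypothetical criterion scores

r, p_value = pearsonr(test_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f}")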
CONSTRUCT-RELATED EVIDENCE OF VALIDITY

▪The nature of the psychological construct or characteristic being measured by the test.
▪How well does a measure of the construct
explain differences in the behavior of the
individuals or their performance on a certain
task?
RELIABILITY
✓The consistency of the scores obtained – how consistent they are
for each individual from one administration of an instrument to
another and from one set of items to another.
✓If an instrument is unreliable, it cannot yield valid outcomes.
✓As reliability improves, validity may improve.
✓If an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
✓For internal consistency, we could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
SPLIT-HALF METHOD
$$r_{ot} = \frac{2\,r_{oe}}{1 + r_{oe}}$$

where
r_ot = reliability of the original (whole) test
r_oe = correlation between scores on the odd items and scores on the even items
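A minimal sketch of the procedure, assuming the item responses are stored as a 0/1 matrix (rows = students, columns = items); the data are hypothetical:

# Split-half reliability: correlate odd-item and even-item subscores, then
# step up with the Spearman-Brown formula above.
import numpy as np

items = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
])  # hypothetical 0/1 responses

odd_scores  = items[:, 0::2].sum(axis=1)  # items 1, 3, 5, ...
even_scores = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r_oe = np.corrcoef(odd_scores, even_scores)[0, 1]
r_ot = 2 * r_oe / (1 + r_oe)  # Spearman-Brown step-up
print(f"odd-even r = {r_oe:.2f}, whole-test reliability = {r_ot:.2f}")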
KR-20 & KR-21 FORMULAS
The KR-20 formula is also known as Kuder-Richardson formula 20. (Note the summation: pq is computed per item and summed over all items.)

$$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^2}\right)$$

where
k = number of items
p = proportion of students who got the item correct (the index of difficulty)
q = 1 – p
s² = variance of the total scores
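A minimal sketch, again assuming a 0/1 response matrix (rows = students, columns = items); whether the sample or population variance is used (ddof) varies across textbooks:

# KR-20 from item-level 0/1 data.
import numpy as np

def kr20(items: np.ndarray) -> float:
    k = items.shape[1]                   # number of items
    p = items.mean(axis=0)               # per-item proportion correct
    q = 1 - p
    s2 = items.sum(axis=1).var(ddof=1)   # variance of total scores (sample variance assumed)
    return (k / (k - 1)) * (1 - (p * q).sum() / s2)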
KR-20 & KR-21 FORMULAS
$$KR_{21} = \frac{k}{k-1}\left(1 - \frac{\bar{x}\,(k - \bar{x})}{k\,s^2}\right)$$

where
k = number of items
x̄ = mean of the total scores
s² = variance of the total scores
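Unlike KR-20, KR-21 needs only summary statistics, not item-level data. A minimal sketch with hypothetical values:

# KR-21 from the number of items, the mean, and the variance of total scores.
def kr21(k: int, mean: float, variance: float) -> float:
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

# Hypothetical 50-item test with mean 30 and variance 64:
print(f"KR-21 = {kr21(50, 30.0, 64.0):.2f}")  # ~0.83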
THE FOLLOWING TABLE IS A STANDARD FOLLOWED ALMOST UNIVERSALLY IN EDUCATIONAL TEST & MEASUREMENT

Reliability       Interpretation
0.90 and above    Excellent reliability; at the level of the best standardized tests
0.80 – 0.90       Very good for a classroom test
0.70 – 0.80       Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.
0.60 – 0.70       Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
0.50 – 0.60       Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
0.50 and below    Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
