
ITEM ANALYSIS AND VALIDATION
EDUC5. ASSESSMENT OF LEARNING
Group 5 | SMCB-PTC, 1st Semester, AY 2021-2022
16 November 2021
OBJECTIVES
At the end of this chapter, you will be able to:

• Explain the meaning of item analysis, item validity, reliability, item difficulty, and discrimination index;

• Determine the validity and reliability of a test item; and

• Determine the quality of a test item by its difficulty index and plausibility of options (for a selected-response test).


HOW DO TEACHERS PREPARE A TEST?

First draft → Pilot testing and item analysis → Revision/replacement → Validation

INTRODUCTION
• Item Analysis
• Item Difficulty
• Index of Difficulty
• Index of Discrimination
• Reliability
• Validity
• Discrimination

KEY TERMS

Discrimination
a measure based on the comparison of performance between stronger and weaker candidates in the exam as a whole
Source: Measuring Item Reliability Part 1 – Item Discrimination Index | Maxinity (blog post)
Part 1
ITEM ANALYSIS: DIFFICULTY INDEX &
DISCRIMINATION INDEX
There are two important characteristics of an item that will be of interest to a teacher:

DIFFICULTY INDEX
• The number of students who are able to answer the item correctly divided by the total number of students
• The difficulty index is usually expressed as a percentage (%)

DISCRIMINATION INDEX
• Measures the difficulty of an item with respect to those in the upper 25% of the class and how difficult it is with respect to the lower 25% of the class
• Discrimination Index = DU – DL
ITEM DIFFICULTY
• The number of students who are able to answer the item correctly divided by the total number of students
• The difficulty index is usually expressed as a percentage (%)

FORMULA:
Item Difficulty = (No. of Students with Correct Answer / Total No. of Students) x 100%
SAMPLE PROBLEM: What is the difficulty of an item if 25 students are unable to answer it
correctly while 75 answered it correctly?

Solution:
Item Difficulty = (75 / 100) x 100%
Item Difficulty = 0.75 x 100%
Item Difficulty = 75%
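The arithmetic above is easy to script. Here is a minimal Python sketch of the difficulty-index formula; the function name and input check are our own, not part of the source material:

```python
def difficulty_index(num_correct: int, num_students: int) -> float:
    """Share of students who answered the item correctly (0.0 to 1.0)."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    return num_correct / num_students

# Sample problem: 75 of 100 students answered correctly.
print(f"{difficulty_index(75, 100):.0%}")  # -> 75%
```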



IN A NUTSHELL…

Difficulty Index measures how easy a test item is.


WHAT TO DO WITH DIFFICULTY INDEX?
Reject – Revise – Retain

DIFFICULTY INDEX   PERCENTAGE   DESCRIPTION      ACTION
0 – 0.20           0–20%        Very difficult   Reject
0.21 – 0.40        21–40%       Difficult        Revise
0.41 – 0.60        41–60%       Moderate         Retain
0.61 – 0.80        61–80%       Easy             Revise
0.81 – 1.00        81–100%      Very easy        Reject
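A small Python sketch of this Reject–Revise–Retain rule; the function name and boundary handling are our own reading of the table:

```python
def difficulty_action(p: float) -> str:
    """Map a difficulty index (0.0 to 1.0) to the table's action."""
    if p <= 0.20 or p > 0.80:
        return "Reject"  # very difficult or very easy
    if p <= 0.40 or p > 0.60:
        return "Revise"  # difficult or easy
    return "Retain"      # moderate (0.41 - 0.60)

print(difficulty_action(0.75))  # -> Revise
```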
PRACTICE EXERCISES: Compute the item difficulty of the following:
Item No.   Total No. of Students   Students w/ Correct Answers   Item Difficulty   Teacher’s Action
1          40                      20                            50%               Retain
2          45                      9                             20%               Reject
3          50                      34                            68%               Revise
4          55                      47                            85%               Reject
5          30                      28                            93%               Reject
WEAKNESS OF DIFFICULTY INDEX:
 It may not actually indicate that the item is difficult or easy.

 A student who does not know the subject matter will naturally be unable to answer the item correctly.

If such is the case, then how do we decide, on the basis of this index, whether the item is too difficult or too easy?
ARBITRARY RULES OF DISCRIMINATION INDEX
 Difficult items tend to discriminate between those who know and those who do not know the answer.

 Easy items cannot discriminate between the two groups of students.

We are, therefore, interested in deriving a measure that can tell us whether an item can discriminate between these two groups of students. Hence the discrimination index, DU – DL.
DISCRIMINATION INDEX

FORMULA:
Discrimination Index = DU – DL

SAMPLE PROBLEM: Obtain the discrimination index of an item if the upper 25% of the class had a difficulty of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.

SOLUTION:
Here, DU = 0.60 while DL = 0.20

Discrimination Index = 0.60 – 0.20
Discrimination Index = 0.40
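In Python, the same computation is a one-line subtraction; this sketch (our own naming) also shows how DU and DL come from raw counts:

```python
def group_difficulty(correct: int, group_size: int) -> float:
    """Difficulty within one group, e.g., the upper or lower 25% of the class."""
    return correct / group_size

def discrimination_index(du: float, dl: float) -> float:
    """DU minus DL, per the formula above."""
    return du - dl

# Sample problem: DU = 0.60, DL = 0.20
print(discrimination_index(0.60, 0.20))  # -> 0.4
```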


Theoretically, the index of discrimination can range from -1.00 (when DU = 0 and DL = 1) to 1.00 (when DU = 1 and DL = 0).

WHAT TO LOOK FOR IN THE DISCRIMINATION INDEX?

a. If DU-DL = -1

b. If DU-DL = 0

c. If DU-DL = 1
A. WHEN DU – DL = -1

 When the index of discrimination is equal to -1.00, this means that all of the lower 25% of students got the correct answer while the upper 25% got the wrong answer.

• In a sense, the item discriminates correctly between the two groups, but the item itself is highly questionable.
• Why would the bright ones get the wrong answer, and the poor ones get the right answer?
B. WHEN DU – DL = 0

 The same proportion of the upper 25% and the lower 25% got the correct answer.

• The item did not discriminate between the two groups of students.
C. WHEN DU – DL = 1

 When the index of discrimination is equal to 1.00, this means that all of the lower 25% failed to get the correct answer while all of the upper 25% got the correct answer.

• This is a perfectly discriminating item and is the ideal item that should be included in the test.
WHAT TO DO WITH DISCRIMINATION INDEX?

Retain / Reject

DISCRIMINATION INDEX (DU – DL)   ACTION
Negative (-1)                    Reject
Zero (0)                         Reject
Positive (+1)                    Retain
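The rule is simple enough to express in one Python function (a sketch; the name is our own):

```python
def discrimination_action(d: float) -> str:
    """Map a discrimination index (DU - DL) to the table's action."""
    return "Retain" if d > 0 else "Reject"

print(discrimination_action(0.40))   # -> Retain
print(discrimination_action(-0.25))  # -> Reject
```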
PRACTICE EXERCISES: Compute the discrimination index of the following:
Item   DU (Upper 25%)   DL (Lower 25%)   Disc. Index   Teacher’s Action
1      0.45             0.70             -0.25         Reject
2      0.50             0.25             0.25          Retain
3      0.30             0.40             -0.10         Reject
4      0.75             0.75             0             Reject
5      0.60             0.40             0.20          Retain
APPLICATION OF DIFFICULTY & DISCRIMINATION INDICES

SAMPLE PROBLEM:
Consider a multiple-choice test item for which the following data were obtained:

Multiple-Choice Item No. 1 (Correct Answer: B)

                        A   B    C    D    TOTAL
Total No. of Students   0   40   20   20   80 students took the test
Upper 25%               0   15   5    0    20 students
Lower 25%               0   5    10   5    20 students

Compute the difficulty and discrimination indices.


DIFFICULTY INDEX
= (No. of Students with Correct Answer / No. of Students) x 100%
= (40 students / 80 students) x 100%
= 0.5 x 100%
= 0.5 or 50%

DISCRIMINATION INDEX
= DU – DL
= 0.75 – 0.25
= 0.50
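The whole worked example can be reproduced from the response counts. A minimal Python sketch follows; the dictionaries and names are our own encoding of the table above:

```python
# Counts for Multiple-Choice Item No. 1; the key (correct answer) is B.
total = {"A": 0, "B": 40, "C": 20, "D": 20}  # all 80 examinees
upper = {"A": 0, "B": 15, "C": 5, "D": 0}    # upper 25% (20 students)
lower = {"A": 0, "B": 5, "C": 10, "D": 5}    # lower 25% (20 students)
key = "B"

difficulty = total[key] / sum(total.values())  # 40 / 80 = 0.50
du = upper[key] / sum(upper.values())          # 15 / 20 = 0.75
dl = lower[key] / sum(lower.values())          # 5 / 20 = 0.25

print(f"Difficulty = {difficulty:.0%}, Discrimination = {du - dl:.2f}")
```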
Things to Notice (from the same response table):
• “A” is not a good distracter (implausible distracter): nobody chose it.
• “C” and “D” have good appeal as distracters (plausible distracters): each drew more students from the lower 25% than from the upper 25%.
MORE SOPHISTICATED DISCRIMINATION INDEX

Item Discrimination
• the ability of an item to differentiate among students on the basis of how well they know the material being tested.

Item Discrimination Index provided by ScorePak®
• is a Pearson product-moment correlation between student responses to a particular item and total scores on all other items on the test.
• is equivalent to a point-biserial coefficient.
• provides an estimate of the degree to which an individual item is measuring the same thing as the rest of the items.
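ScorePak® itself is a scoring service, so the following is only a sketch of the correlation it describes: each item's 0/1 responses are correlated with the total score on all other items. The function name and toy data are our own:

```python
import numpy as np

def item_discrimination(responses: np.ndarray, item: int) -> float:
    """Pearson correlation between one item's 0/1 responses and the
    total score on all *other* items (a point-biserial coefficient)."""
    item_scores = responses[:, item]
    rest_scores = responses.sum(axis=1) - item_scores
    return float(np.corrcoef(item_scores, rest_scores)[0, 1])

# Toy data: 6 students x 4 items (1 = correct, 0 = wrong), made up.
data = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
print(round(item_discrimination(data, 0), 2))  # ~0.73: "good" per the classification below
```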
Things to Remember when Dealing with Discrimination Indices

• Values of the coefficients will tend to be lower for tests measuring a wide range of content areas.
• Items with low indices are often ambiguously worded.
• Items with negative indices should be examined.
• Tests with high internal consistency consist of items with mostly positive relationships with the total score.
• Values of the discrimination index will seldom exceed 0.50 because of the differing shapes of the item and total-score distributions.
• A good item is one that has good discriminating ability.

ScorePak® Classification of Item Discrimination
0.30 and above   “good”
0.10 – 0.30      “fair”
below 0.10       “poor”
ITEM ANALYSIS IN GENERAL

Item analysis provides the following information:
• the difficulty of the item;
• the discriminating power of the item; and
• the effectiveness of each alternative (how plausible the options are).
ITEM ANALYSIS IN GENERAL

Benefits derived from item analysis:
1. It provides useful information for class discussion of the test.
2. It provides data that help students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the future.
Part 2
VALIDATION & VALIDITY
VALIDITY DEFINED

DEFINITION 1
the extent to which a test measures what it intends to
measure

DEFINITION 2
the appropriateness, correctness, meaningfulness, and
usefulness of the specific decisions a teacher makes based on
the test results.
TYPES OF VALIDITY

1. Content validity
2. Construct Validity
3. Criterion-related validity (Concurrent Validity)
4. Criterion-related validity (Predictive Validity)
5. Face Validity
1. CONTENT VALIDITY

• It relates to how adequately the content of the test samples the domain about which inferences are to be made (Calmorin, 2004).

• It is established through logical analysis; adequate sampling of test items is usually enough to assure that the test has content validity (Oriondo, 1984).

Example
A teacher wishes to validate a test in Mathematics. He requests experts in Mathematics to judge whether the items or questions measure the knowledge, skills, and values they are supposed to measure.
2. CONSTRUCT VALIDITY

• It is the extent to which a test measures a theoretical trait. This involves such tests as those of understanding and interpretation of data.

Example
A teacher might study whether an educational program increases artistic ability among preschool children. Construct validity is a measure of whether the research actually measures artistic ability (a slightly abstract label).
3. CONCURRENT VALIDITY

• It refers to the degree to which the test correlates with a criterion that is set up as an acceptable measure or standard other than the test itself. The criterion is always available at the time of testing.

4. PREDICTIVE VALIDITY

• It refers to the degree of accuracy with which a test predicts one’s performance at some subsequent outcome (Asaad, 2004).
5. FACE VALIDITY

• The test questions are said to have face validity when they appear to be related to the group being examined (Asaad, 2004).
• It is checked by simply examining whether the test looks like a good one; there is no common numerical method for face validity.

Example
Calculating the area of a rectangle when the given length and width are 4 ft and 6 ft, respectively.
FACTORS AFFECTING THE VALIDITY OF AN ASSESSMENT INSTRUMENT

1. Unclear directions
2. Reading vocabulary and sentence structure that are too difficult
3. Ambiguity
4. Inadequate time limits
5. Overemphasis of easy-to-assess aspects of the domain at the expense of important but hard-to-assess aspects
6. Test items inappropriate for the outcomes being measured
7. Poorly constructed test items
8. Test too short
9. Improper arrangement of items
10. Identifiable pattern of answers
Part 3
RELIABILITY
RELIABILITY DEFINED

Reliability refers to the consistency of the scores obtained—


how consistent they are for each individual from one
administration of the instrument to another and from one set
of items to another.
RELIABILITY AND VALIDITY

• If an instrument is unreliable, it cannot yield valid


outcomes.
• As reliability improves, validity may improve (or it may
not).
• If an instrument is shown scientifically to be valid, then it
is almost certain that it is also reliable.
RELIABILITY STANDARD

Reliability      Interpretation
0.90 and above   Excellent reliability
0.80 – 0.90      Very good for a classroom test
0.70 – 0.80      Good for a classroom test
0.60 – 0.70      Somewhat low
0.50 – 0.60      Needs revision of test
Below 0.50       Questionable reliability
TYPES OF RELIABILITY

1. Test-Retest Reliability
2. Parallel Forms Reliability
3. Inter-Rater Reliability
4. Internal Consistency Reliability (see the sketch after this list)
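The deck lists internal consistency as a type but gives no formula for it. As one common coefficient (our choice, not stated in the source), here is a minimal Cronbach’s alpha sketch in Python:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (n_students x n_items) score matrix."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return n_items / (n_items - 1) * (1 - item_vars / total_var)

# Made-up 0/1 scores for 6 students on 4 items.
data = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(data), 2))  # low alpha for this tiny toy sample
```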
1. TEST-RETEST RELIABILITY

• It is established by administering the same test twice to the same group of students, with a time interval between the two administrations, and then correlating the two sets of scores.

• A high correlation between the two administrations indicates that the scores are stable over time.

Example
A teacher administers the same Mathematics test to the same class two weeks apart and correlates the two sets of scores.
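Under that definition, the computation is just a correlation between two score vectors; a minimal sketch with made-up scores:

```python
import numpy as np

# Scores for the same six students on two administrations of the same test.
first = np.array([78, 85, 62, 90, 71, 88])   # first administration
second = np.array([75, 88, 65, 93, 70, 85])  # second administration (made up)

reliability = np.corrcoef(first, second)[0, 1]
print(round(reliability, 2))  # ~0.96: excellent per the reliability standard above
```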
