EDUC5. ASSESSMENT OF LEARNING
Group 5 | SMCB-PTC, 1st Semester, AY 2021-2022
16 November 2021
OBJECTIVES
You will be able to:
• Explain the meaning of item analysis, item validity, reliability, item difficulty, and discrimination
HOW DO TEACHERS PREPARE A TEST?
[Flowchart: test construction cycle — validation, then revision/replacement of weak items]
INTRODUCTION
• Item Analysis
  - Item Difficulty (index of difficulty)
  - Discrimination (index of discrimination)
• Validity
• Reliability

KEY TERMS
Discrimination
a measure based on the comparison of performance between stronger and weaker candidates in the exam as a whole
Source: Measuring Item Reliability Part 1 – Item Discrimination Index | Maxinity
Part 1
ITEM ANALYSIS: DIFFICULTY INDEX & DISCRIMINATION INDEX

DIFFICULTY INDEX
• The number of students who are able to answer the item correctly divided by the total number of students
• The Difficulty Index is usually expressed as a percentage (%)
FORMULA:
Item Difficulty = (No. of Students with Correct Answer / Total No. of Students) × 100%
SAMPLE PROBLEM: What is the difficulty of an item if 25 students are unable to answer it
correctly while 75 answered it correctly?
Solution:
Item Difficulty = (75 / 100) × 100%
Item Difficulty = 0.75 × 100%
Item Difficulty = 75%
IN A NUTSHELL…

WHAT TO DO WITH THE DIFFICULTY INDEX?
Reject – Revise – Retain
DIFFICULTY INDEX   PERCENTAGE   DESCRIPTION      ACTION
0 – 0.20           0–20%        Very difficult   Reject
0.21 – 0.40        21–40%       Difficult        Revise
0.41 – 0.60        41–60%       Moderate         Retain
0.61 – 0.80        61–80%       Easy             Revise
0.81 – 1.00        81–100%      Very easy        Reject
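The Reject – Revise – Retain rule above can be sketched in code. This is a minimal illustration; the function names are my own, and the index is taken as a proportion between 0 and 1.

```python
def difficulty_index(correct, total):
    """Proportion of students who answered the item correctly."""
    return correct / total

def action_for_difficulty(p):
    """Apply the Reject - Revise - Retain rule from the table above."""
    if p <= 0.20:
        return "Reject"  # very difficult
    if p <= 0.40:
        return "Revise"  # difficult
    if p <= 0.60:
        return "Retain"  # moderate
    if p <= 0.80:
        return "Revise"  # easy
    return "Reject"      # very easy

p = difficulty_index(75, 100)
print(f"{p:.0%} -> {action_for_difficulty(p)}")  # 75% -> Revise
```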
PRACTICE EXERCISES: Compute the Item Difficulty of the following:

Item No.   Total No. of Students   Students w/ Correct Answers   Item Difficulty   Teacher's Action
1          40                      20                            50%               Retain
2          45                      9                             20%               Reject
3          50                      34                            68%               Revise
4          55                      47                            85%               Reject
5          30                      28                            93%               Reject
WEAKNESS OF THE DIFFICULTY INDEX:
It may not actually indicate that the item is difficult or easy.
A student who does not know the subject matter will naturally be unable to answer the item correctly.
If such is the case, then how do we decide on the basis of this index whether the item is too difficult or too easy?
There are two important characteristics of an item that will be of interest to a teacher.
DISCRIMINATION INDEX
• Measures the difficulty of an item with respect to those in the upper 25% of the class and how difficult it is with respect to the lower 25% of the class
• Discrimination Index = DU − DL
ARBITRARY RULES OF THE DISCRIMINATION INDEX
Difficult items tend to discriminate between those who know the answer and those who do not.
We are, therefore, interested in deriving a measure that can tell us whether an item can discriminate between these two groups of students. Thus, the formula DU − DL.
FORMULA: Discrimination Index = DU − DL

SOLUTION:
Here, DU = 0.60 while DL = 0.20, so the Discrimination Index = 0.60 − 0.20 = 0.40.
Theoretically, the index of discrimination can range from −1.00 (when DU = 0 and DL = 1) to 1.00 (when DU = 1 and DL = 0).
a. If DU-DL = -1
b. If DU-DL = 0
c. If DU-DL = 1
A. WHEN DU − DL = −1
When the index of discrimination is equal to −1.00, this means that all of the lower 25% of students got the correct answer while the upper 25% got the wrong answer.
• Why would the bright ones get the wrong answer, and the poor ones get the right answer?
B. WHEN DU − DL = 0
The same proportion of the upper 25% and the lower 25% got the correct answer.
• The item did not discriminate between the two groups of students.
C. WHEN DU − DL = 1
When the index of discrimination is equal to 1.00, this means that all of
the lower 25% failed to get the correct answer while the upper 25% got
the correct answer.
PRACTICE EXERCISES: Compute the Discrimination Index of the following:
Item   DU (Upper 25%)   DL (Lower 25%)   Disc. Index   Teacher's Action
1      0.45             0.70             −0.25         Reject
2      0.50             0.25             0.25          Retain
3      0.30             0.40             −0.10         Reject
4      0.75             0.75             0             Reject
5      0.60             0.40             0.20          Retain
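As a sketch, the computation and the decision pattern in this table can be expressed in code. The retain/reject rule is my reading of the table (positive index → Retain, zero or negative → Reject); it is not stated explicitly in the slides.

```python
def discrimination_index(du, dl):
    """DU - DL: proportion correct in the upper 25% minus the lower 25%."""
    return du - dl

def action_for_discrimination(d):
    """Rule inferred from the practice table: positive index -> Retain,
    zero or negative -> Reject (an assumption, not stated in the slides)."""
    return "Retain" if d > 0 else "Reject"

for du, dl in [(0.45, 0.70), (0.50, 0.25), (0.75, 0.75)]:
    d = discrimination_index(du, dl)
    print(f"DU={du:.2f}, DL={dl:.2f} -> index {d:+.2f}, {action_for_discrimination(d)}")
```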
APPLICATION OF DIFFICULTY & DISCRIMINATION INDICES
SAMPLE PROBLEM:
Consider a multiple-choice item for which the following data were obtained:

                        A    B    C    D    TOTAL
Total No. of Students   0    40   20   20   80 students took the test
Upper 25%               0    15   5    0    20 students
Lower 25%               0    5    10   5    20 students

DIFFICULTY INDEX
= (40 students / 80 students) × 100%
= 0.5 × 100%
= 0.5 or 50%
DISCRIMINATION INDEX
= DU-DL
= 0.75 – 0.25
= 0.5 or 50%
Multiple-Choice Item No. 1 (Correct Answer: B)

                        A    B    C    D
Total No. of Students   0    40   20   20   80 students took the test
Upper 25%               0    15   5    0
Lower 25%               0    5    10   5

Things to Notice:
• "A" is not a good distracter (implausible distracter)
• "C" and "D" have good appeal as distracters (plausible distracters)
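The whole worked example above can be reproduced in a few lines. This is a sketch using the same response counts; the variable names are my own, and the distracter check simply flags options (other than the key) that nobody chose.

```python
# Response counts from the worked example above (correct answer: "B").
total   = {"A": 0, "B": 40, "C": 20, "D": 20}   # all 80 students
upper25 = {"A": 0, "B": 15, "C": 5,  "D": 0}    # upper 25% (20 students)
lower25 = {"A": 0, "B": 5,  "C": 10, "D": 5}    # lower 25% (20 students)
key = "B"

difficulty = total[key] / sum(total.values())    # 40 / 80 = 0.50
du = upper25[key] / sum(upper25.values())        # 15 / 20 = 0.75
dl = lower25[key] / sum(lower25.values())        # 5 / 20  = 0.25
discrimination = du - dl                         # 0.75 - 0.25 = 0.50

# An option other than the key that nobody chose is an implausible distracter.
implausible = [opt for opt, n in total.items() if opt != key and n == 0]

print(f"Difficulty: {difficulty:.0%}")           # Difficulty: 50%
print(f"Discrimination: {discrimination:.2f}")   # Discrimination: 0.50
print("Implausible distracters:", implausible)   # Implausible distracters: ['A']
```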
MORE SOPHISTICATED DISCRIMINATION INDEX

Item Discrimination
• the ability of an item to differentiate among the students on the basis of how well they know the material being tested
• Values of the coefficients will tend to be lower for tests measuring a wide range of content areas
• Items with low indices are often ambiguously worded
• Items with negative indices should be examined
• Tests with high internal consistency consist of items with mostly positive relationships with the total score
• Values of the discrimination index will seldom exceed 0.50 because of the differing shapes of the item and total-score distributions
• A good item is one that has good discriminating ability

ScorePak® Classification of Item Discrimination
0.30 and above   "good"
0.10 – 0.30      "fair"
below 0.10       "poor"
ITEM ANALYSIS IN GENERAL

Part 2
VALIDITY
DEFINITION 1
the extent to which a test measures what it intends to
measure
DEFINITION 2
the appropriateness, correctness, meaningfulness, and
usefulness of the specific decisions a teacher makes based on
the test results.
TYPES OF VALIDITY
1. Content validity
2. Construct Validity
3. Criterion-related validity (Concurrent Validity)
4. Criterion-related validity (Predictive Validity)
5. Face Validity
1. CONTENT VALIDITY
Example
A teacher wishes to validate a test in Mathematics. He requests experts in Mathematics to judge whether the items or questions measure the knowledge, skills, and values they are supposed to measure.
2. CONSTRUCT VALIDITY
Example
A teacher might investigate whether an educational program increases artistic ability amongst preschool children. Construct validity is a measure of whether the research actually measures artistic ability (a slightly abstract label).
3. CONCURRENT VALIDITY
4. PREDICTIVE VALIDITY
• This refers to the degree of accuracy with which a test predicts one's performance on some subsequent outcome (Asaad, 2004).
5. FACE VALIDITY
Example
Calculation of the area of a rectangle when the given length and width are 4 ft and 6 ft, respectively.
FACTORS AFFECTING THE VALIDITY OF
AN ASSESSMENT INSTRUMENT
1. Unclear Directions
2. Reading vocabulary and sentence structure are too
difficult
3. Ambiguity
4. Inadequate time limits
5. Overemphasis of easy-to-assess aspects of the domain at the expense of important but hard-to-assess aspects
6. Test items inappropriate for the outcomes being
measured.
7. Poorly constructed test items
8. Test too short
9. Improper arrangement of items
10. Identifiable pattern of answers.
Part 3
RELIABILITY
RELIABILITY DEFINED

Reliability       Interpretation
0.90 and above    Excellent reliability
0.80 – 0.90       Very good for a classroom test
0.70 – 0.80       Good for a classroom test
0.60 – 0.70       Somewhat low
0.50 – 0.60       Needs revision of the test
Below 0.50        Questionable reliability
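The interpretation table can be encoded as a simple lookup. A minimal sketch; the boundary handling (each cutoff treated as inclusive at the lower end) is my assumption, since the table's ranges share endpoints.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the interpretation table above.
    Treating each cutoff as inclusive at its lower bound is an assumption;
    the table lists overlapping endpoints."""
    if r >= 0.90:
        return "Excellent reliability"
    if r >= 0.80:
        return "Very good for a classroom test"
    if r >= 0.70:
        return "Good for a classroom test"
    if r >= 0.60:
        return "Somewhat low"
    if r >= 0.50:
        return "Needs revision of the test"
    return "Questionable reliability"

print(interpret_reliability(0.85))  # Very good for a classroom test
```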
TYPES OF RELIABILITY
1. Test-Retest Reliability
2. Parallel Forms Reliability
3. Inter-Rater Reliability
4. Internal Consistency Reliability
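The slides list internal consistency reliability but give no formula. One standard estimate for tests whose items are scored 0/1 is the Kuder–Richardson formula 20 (KR-20); the sketch below is an illustration under that assumption, with my own function name and toy data, not a method named in these slides.

```python
def kr20(scores):
    """Kuder-Richardson 20: an internal-consistency estimate for tests
    scored 0/1 per item. `scores` is a list of per-student item-score lists."""
    n = len(scores)          # number of students
    k = len(scores[0])       # number of items
    totals = [sum(s) for s in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # population variance of totals
    pq = 0.0
    for i in range(k):
        p = sum(s[i] for s in scores) / n            # difficulty of item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

# Four students, three dichotomously scored items (hypothetical data):
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(data), 2))  # 0.75
```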
1. TEST-RETEST RELIABILITY
Example
A teacher administers the same Mathematics test to the same group of students on two occasions a few weeks apart and correlates the two sets of scores. A high correlation indicates that the test yields stable results over time.