
GUESS WHAT?

Mechanics: The host will select two students, one from the upper part and one from the lower part of the class list, to answer a given basic question. Whoever gets the correct answer proceeds to the proper question about the topic. A clue will be flashed inside an illusion, and the student will try to find the correct answer. Players have only 10 seconds to answer; if a player fails, the other student may steal the chance.
How many EYELASHES
does SpongeBob have?
A process of collecting and analyzing evidence to support the meaningfulness and
usefulness of the test.

VALIDATION
What is the color of the
HEART in GMA 7?
It refers to the consistency of the scores obtained — how consistent they are for each individual
from one administration of an instrument to another and from one set of items to another.

RELIABILITY
How many BIRDS are
there in the Nestle logo?
It is a process which examines student responses to individual test items (questions)
in order to assess the quality of those items and of the test as a whole.

ITEM ANALYSIS
ITEM ANALYSIS
AND VALIDATION
CHAPTER 3



ITEM ANALYSIS

It is a statistical technique used for selecting and rejecting the items of a test on the basis of their difficulty value and discriminating power.



PHASES OF ITEM ANALYSIS

1. Try-out phase
   a. Test taking
   b. Arrange the scores from highest to lowest
   c. Determine the upper and lower groups.
2. Item analysis phase (level of difficulty)
3. Item revision phase



PURPOSE OF ITEM ANALYSIS

 Evaluates the quality of each item.
 May suggest ways of improving the measurement of a test.



3 IMPORTANT CHARACTERISTICS OF AN ITEM

1. Item Difficulty
2. Discrimination Index
3. Distractor Analysis
ITEM DIFFICULTY

Number of students who are able to answer the item correctly divided by the total number of students.

Item Difficulty = no. of students with correct answer / total no. of students
EXAMPLE

What is the difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?

Total number of students = 100
Number of students who answered correctly = 75

Item difficulty = 75/100 = 0.75 or 75%
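A minimal sketch in Python of the difficulty formula above, using the numbers from this example (75 of 100 students answered correctly); the function name is illustrative, not from the deck.

def item_difficulty(num_correct: int, num_students: int) -> float:
    # Proportion of students who answered the item correctly
    if num_students == 0:
        raise ValueError("total number of students must be positive")
    return num_correct / num_students

# Numbers from the worked example: 75 of 100 students answered correctly
p = item_difficulty(75, 100)
print(f"Item difficulty = {p:.2f} ({p:.0%})")  # 0.75 (75%)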



Range of Difficulty    Interpretation      Action
0 - 0.25               Difficult           Revise or Discard
0.26 - 0.75            Right Difficulty    Retain
0.76 and above         Easy                Revise or Discard
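As a small illustration, the cut-offs in the table above can be applied directly to a computed difficulty index; the ranges mirror the deck's table, while the function name is an assumption of this sketch.

def interpret_difficulty(p: float) -> tuple[str, str]:
    # Cut-offs taken from the table above (0 - 0.25, 0.26 - 0.75, 0.76 and above)
    if p <= 0.25:
        return "Difficult", "Revise or discard"
    if p <= 0.75:
        return "Right difficulty", "Retain"
    return "Easy", "Revise or discard"

print(interpret_difficulty(0.75))  # ('Right difficulty', 'Retain')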



DISCRIMINATION INDEX

It is the difference between the proportion of correct responses among the students in the upper group (DU) and the proportion among the students in the lower group (DL).
• Upper group (DU): top 27% of the class
• Lower group (DL): bottom 27% of the class
DISCRIMINATION INDEX

 Positive Discrimination Index
 Negative Discrimination Index
 Zero Discrimination Index



27% UG: got the correct answer
27% LG: got an incorrect answer

How do we get the 27% upper and lower groups of the class?

Formula: no. of students in the class x 0.27

Total no. of students = 54
54 x 0.27 = 14.58, or 15
UG = 15 students (upper 27%)
LG = 15 students (lower 27%)

Find the index of discrimination.

Formula: DU - DL

The UG had a proportion correct of 0.60 or 60%: DU = 0.60
The LG had a proportion correct of 0.20 or 20%: DL = 0.20

Index of discrimination = 0.60 - 0.20 = 0.40
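A sketch of the same computation in Python: take the top and bottom 27% of a score-ranked class and compute DU - DL for one item. The helper names and the counts of correct answers (9 and 3, chosen to reproduce DU = 0.60 and DL = 0.20) are assumptions for illustration.

def split_upper_lower(total_scores, fraction=0.27):
    # Indices of the upper and lower 27% of students, ranked by total score
    n = round(len(total_scores) * fraction)   # 54 * 0.27 = 14.58 -> 15
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    return ranked[:n], ranked[-n:]

def discrimination_index(correct_upper, n_upper, correct_lower, n_lower):
    # DU - DL: proportion correct in the upper group minus the lower group
    return correct_upper / n_upper - correct_lower / n_lower

# 9 of 15 correct in the UG (0.60) and 3 of 15 in the LG (0.20), as in the example
d = discrimination_index(9, 15, 3, 15)
print(f"Index of discrimination = {d:.2f}")   # 0.40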
RULE OF THUMB

Index Range      Interpretation                               Action
-1.0 to -0.50    Can discriminate but item is questionable    Discard
-0.55 to 0.45    Non-discriminating                           Revise
0.46 to 1.0      Discriminating item                          Include



INDEX OF DISCRIMINATION RANGE

-1.0 to 1.0

At -1.0, only the LG got the correct answers; at +1.0, only the UG got the correct answers.



STEPS:

1. Construct test items
2. Conduct an item try-out
   a) Test taking
   b) Arrange the scores
   c) Determine the lower and upper groups.
3. Construct an item analysis table



ITEM 1

                A    B    C    D
Total          12   25   13   10
Upper Group     3   10    2    1
Lower Group     5    6    5    1

STEPS IN ITEM ANALYSIS
1. Compute the item difficulty
2. Perform item discrimination
3. Analyse the distractors



FORMULAS
Item difficulty = no. of students with
correct answer/total no. of students.
Discrimination index = DU-DL
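A sketch that applies both formulas to the ITEM 1 table above. The deck does not state which option is the key, so treating B (the most-chosen option) as the correct answer is an assumption here.

# Option counts for ITEM 1 from the table: whole class, upper group, lower group
total = {"A": 12, "B": 25, "C": 13, "D": 10}
upper = {"A": 3, "B": 10, "C": 2, "D": 1}
lower = {"A": 5, "B": 6, "C": 5, "D": 1}
key = "B"  # assumed key; not stated in the deck

n_students = sum(total.values())          # 60
difficulty = total[key] / n_students      # 25/60, about 0.42 (right difficulty)

du = upper[key] / sum(upper.values())     # 10/16, about 0.63
dl = lower[key] / sum(lower.values())     # 6/17, about 0.35
discrimination = du - dl                  # about 0.27

print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")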



Range of Difficulty    Interpretation      Action
0 - 0.25               Difficult           Revise or Discard
0.26 - 0.75            Right Difficulty    Retain
0.76 and above         Easy                Revise or Discard

Index Range      Interpretation                               Action
-1.0 to -0.50    Can discriminate but item is questionable    Discard
-0.55 to 0.45    Non-discriminating                           Revise
0.46 to 1.0      Discriminating item                          Include



PRINCIPLES OF ANALYSING DISTRACTORS

1. If the upper group and the lower group did not choose the distractor.
2. If the distractor is more attractive to the upper group than to the lower group.
3. If both the upper group and the lower group have the same total number of answers on the distractor.
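A sketch of a distractor check along the lines of these principles: for each incorrect option, compare how often the upper and lower groups chose it. The flag wordings and the assumed key (B) are illustrative, not from the deck.

def analyse_distractors(upper, lower, key):
    # upper/lower are dicts of option -> number of students choosing it
    notes = {}
    for option in upper:
        if option == key:
            continue
        u, l = upper[option], lower[option]
        if u == 0 and l == 0:
            notes[option] = "chosen by neither group - implausible distractor, revise"
        elif u > l:
            notes[option] = "more attractive to the upper group - check the item and key"
        elif u == l:
            notes[option] = "equally attractive to both groups - not discriminating"
        else:
            notes[option] = "more attractive to the lower group - working as intended"
    return notes

# Upper/lower counts from the ITEM 1 table, key assumed to be B
print(analyse_distractors({"A": 3, "B": 10, "C": 2, "D": 1},
                          {"A": 5, "B": 6, "C": 5, "D": 1}, key="B"))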



BASIC ITEM ANALYSIS STATISTICS

The Michigan State University Measurement and Evaluation Department reports a number of item statistics which aid in evaluating the effectiveness of an item.



BASIC ITEM ANALYSIS STATISTICS

The first of these is the index of difficulty, which MSU defines as the proportion of the total group who got the item wrong. Thus a high index indicates a difficult item and a low index indicates an easy item.



BASIC ITEM ANALYSIS STATISTICS

Whichever index is selected is shown as the INDEX OF DIFFICULTY on the item analysis print-out. For classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.



INDEX OF DISCRIMINATION

It is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right.



INDEX OF DISCRIMINATING POWER

D = (RU - RL) / (½T)

Where:
P – percentage who answered the item correctly (index of difficulty)
R – number who answered the item correctly
T – total number who tried the item



INDEX OF DISCRIMINATING POWER

The smaller the percentage figure, the more difficult the item.

Estimate the item discriminating power using the formula below:

D = (RU - RL) / (½T) = (6 - 2) / 10 = 0.40
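A minimal sketch of this formula with the numbers on the slide (RU = 6, RL = 2); the total of T = 20 examinees is an assumption implied by ½T = 10.

def discriminating_power(ru: int, rl: int, t: int) -> float:
    # D = (RU - RL) / (T/2), where T is the total number who tried the item
    return (ru - rl) / (t / 2)

print(discriminating_power(6, 2, 20))  # 0.4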



MORE SOPHISTICATED
DISCRIMINATION INDEX



ITEM
DISCRIMINATION
Refers to the ability of an item to differentiate
among students on the basis of how well they
know the material being tested.



Various hand calculation procedures have traditionally been used to compare item responses to total test scores using high and low scoring groups of students.

Computerized analyses provide a more accurate assessment of the discrimination power of items because they take into account the responses of all students rather than just the high and low scoring groups.
The Item discrimination index provided by
ScorePak® is a Pearson Product Moment
correlation between student responses to a
particular item and total scores on all other
items on the test. It provides an estimate of the
degree to which an individual item is measuring
the same thing as the rest of the items.



Discrimination
Coefficients
• Point biserial – The point biserial (rpbis) correlation is used to find out if the right people are getting the items right, how much predictive power the item has, and how it would contribute to predictions.
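A sketch of the kind of item-total statistic described above: a Pearson correlation between the 0/1 scores on one item and the total score on all other items (a point biserial, since the item score is dichotomous). This is plain Python with made-up data and makes no claim to match ScorePak's exact computation.

import math

def pearson(x, y):
    # Plain Pearson product-moment correlation
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_total_correlation(responses, item):
    # Correlate one item's 0/1 scores with the total on all *other* items
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

# Toy score matrix: one row of 0/1 item scores per student (illustrative only)
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
print(round(item_total_correlation(responses, item=0), 2))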



 The discrimination index reflects the degree to which an item and the test as a whole are measuring a unitary ability or attribute; values of the coefficient will tend to be lower for tests measuring a wide range of content.

 Items with negative indices should be examined to determine why a negative value was obtained. Tests with high internal consistency consist of items with mostly positive relationships with the total test score.



ScorePak® Classifies Item Discrimination

ScorePak® classifies item discrimination as "good" if the index is above .30, "fair" if it is between .10 and .30, and "poor" if it is below .10.
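Those cut-offs translate directly into a small helper; the thresholds are the ones stated above, and the function name is an assumption of this sketch.

def classify_discrimination(index: float) -> str:
    # ScorePak-style label: good above .30, fair between .10 and .30, poor below .10
    if index > 0.30:
        return "good"
    if index >= 0.10:
        return "fair"
    return "poor"

print(classify_discrimination(0.27))  # fair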



At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test and can be used to identify items which are not performing well and which can perhaps be improved or discarded.



VALIDATION

Process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.



VALIDITY

It is the extent to which a test measures what it purports to measure; it refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.



3 TYPES OF EVIDENCE OF VALIDITY

• Content-related evidence of validity
• Criterion-related evidence of validity
• Construct-related evidence of validity



CONTENT-RELATED EVIDENCE OF VALIDITY

Refers to the content and format of the instrument.



CRITERION-RELATED EVIDENCE OF VALIDITY

Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests.



CONSTRUCT-RELATED EVIDENCE OF VALIDITY

Refers to the nature of the psychological construct or characteristic being measured by the test.



PROCEDURE FOR DETERMINING CONTENT VALIDITY

• The teacher writes out the objectives of the test.
• The teacher gives these to the experts together with the test.
• The experts look at the objectives and read the items in the test.
• They place a checkmark in front of each question or item that they feel does not measure one or more objectives.



PROCEDURE FOR DETERMINING CONTENT VALIDITY

• They also place a checkmark in front of each objective not assessed by any item in the test.
• The teacher rewrites any item so checked and resubmits it to the experts.
• This continues until the experts approve of all items and agree that all of the objectives are sufficiently covered by the test.



TO OBTAIN CRITERION-RELATED EVIDENCE OF VALIDITY

The teacher usually compares scores on the test in question with scores on some other independent criterion test which presumably already has high validity.



TYPES OF CRITERION-RELATED EVIDENCE OF VALIDITY

a. Concurrent Validity
b. Predictive Validity



  Grade Point Average
TEST SCORE    VERY GOOD    GOOD    NEEDS IMPROVEMENT
HIGH              20         10            5
AVERAGE           10         25            5
LOW                1         10           14

EXPECTANCY TABLE



RELIABILITY

Reliability refers to the consistency of the scores obtained — how consistent they are for each individual from one administration of an instrument to another and from one set of items to another.



VALIDITY AND RELIABILITY

These are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not).



However, if an instrument is shown to be scientifically valid, then it is almost certain that it is also reliable. Item reliability is the consistency of a set of items (variables); that is, to what extent they measure the same thing.



When a set of items is consistent, they can make a measurement scale such as a sum scale.

The table below is a standard followed almost universally in educational tests and measurement:

RELIABILITY      Interpretation
.90 and above    Excellent reliability; at the level of the best standardized tests.
.80 - .90        Very good for a classroom test.
.70 - .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 and below    Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
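The deck does not say which reliability coefficient these bands refer to. A common internal-consistency estimate for a set of scored items is Cronbach's alpha, sketched below under that assumption with made-up data.

def cronbach_alpha(scores):
    # scores: students x items matrix (list of lists of item scores)
    n_items = len(scores[0])

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Toy 0/1 score matrix (illustrative only)
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
]
print(round(cronbach_alpha(scores), 2))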
 
