
Item Analysis Made Easy in the New Normal

Becky May I. Sajo
Teacher I
“Exam is an essential tool that assesses various
factors such as mastery of content, acquisition of
knowledge and skills, among others.”

-Matthew Lynch
How well did my test distinguish among students according to how well they met my learning goals?

How well did my students perform on each item of my test?

Which questions in the test were easy, average, or difficult?
Session Objectives:
At the end of the session, the participants should be able
to:
• define item analysis, difficulty index,
discrimination index and reliability;
• conduct item analysis using the second
summative assessment; and
• use SPSS to conduct reliability testing.
What is Item Analysis?

It is described as a postmortem evaluation and a procedure for assessing the quality of examination items (Tamakloe, Atta & Amedahe, 1996).
What is Item Analysis?

Item analysis refers to a statistical technique that helps instructors identify the effectiveness of their test items (Anu, 2019).
What is Item Analysis?

Item analysis uses statistics and expert judgment to evaluate tests based on the quality of individual items, item sets, and entire sets of items, as well as the relationship of each item to other items (CDHS, 1999).
What is Item Analysis?

It “investigates the performance of items considered individually either in relation to some external criterion or in relation to the remaining items on the test” (Thompson & Levitov, 1985).
What is Item Analysis?

Item analysis refers to the process of collecting, summarizing, and using information from students’ responses to make decisions about each assessment task (Nitko, 1996).
In developing quality assessments, and specifically effective multiple-choice test items, item analysis plays an important role in contributing to the fairness of the test, along with identifying content areas that may be problematic for students (Anu, 2019).
Objectives of Item Analysis (Anu, 2019)
• Improves the quality of teaching and examination.
• Points out teachers’ strengths and weaknesses in teaching, helping ensure that the teacher does the right thing.
• Helps one identify items for retention, revision, or removal.
• Improves items of a test for future use, based on the indices of item difficulty and item discrimination as well as how effectively the distracters function.
• Helps identify potential mistakes in scoring, ambiguous items, and alternatives (distracters) that do not work.
Classical Item Analysis Statistics

• Item Difficulty
• Item Discrimination
• Distracter Analysis
• Reliability
Item Difficulty
• The proportion of students who answer an item correctly.
• Item difficulty is relevant for determining whether
students have learned the concept being tested. It also
plays an important role in the ability of an item to
discriminate between students who know the tested
material and those who do not.
(University of Washington – Office of Educational
Assessment, 2020)
Item Difficulty (p)

$$p = \frac{R_U + R_L}{T}$$

R_U = the number in the upper group who answered the item correctly
R_L = the number in the lower group who answered the item correctly
T = the total number of examinees
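
To make the computation concrete, here is a minimal Python sketch (not from the cited sources; the function and variable names are illustrative). The sample figures come from Example 1 later in the deck:

```python
def difficulty_index(correct_upper, correct_lower, total_examinees):
    """p = (R_U + R_L) / T: the proportion of examinees who answered correctly."""
    return (correct_upper + correct_lower) / total_examinees

# Example 1 data: 10 upper-group and 8 lower-group examinees correct, 20 examinees in all
p = difficulty_index(10, 8, 20)
print(p)  # 0.9 -> "Very Easy" under the Hopkins and Antes scale
```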
Difficulty Index Interpretation (Hopkins and Antes)

Difficulty Index   Description
.86 to 1.00        Very Easy
.71 to .85         Easy
.40 to .70         Desirable
.15 to .39         Difficult
0 to .14           Very Difficult

Items with a difficulty index above 0.85 are very easy and should be carefully reviewed against the instructor’s purpose.
Items with p-values of 0.14 and below are very difficult and should be reviewed for possible confusion.
Item Discrimination
• It is the difference between the proportion of the top scorers
who got an item correct and the proportion of the bottom
scorers who got the item right (Anu, 2019).
• It refers to the ability of an item to differentiate among
students based on how well they know the material being
tested. (University of Washington – Office of Educational
Assessment, 2020)
Item Discrimination
• When an item discriminates negatively, overall the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting it right. A negative discrimination index may indicate that the item is measuring something other than what the rest of the test is measuring. More often, it is a sign that the item has been mis-keyed (Anu, 2019).
Item Discrimination (D)

$$D = \frac{R_U}{N_U} - \frac{R_L}{N_L}$$

R_U = the number in the upper group who answered the item correctly
R_L = the number in the lower group who answered the item correctly
N_U = the number of examinees in the upper group
N_L = the number of examinees in the lower group
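
A matching Python sketch, again with illustrative names and the Example 1 figures (10 of 10 upper-group and 8 of 10 lower-group examinees correct):

```python
def discrimination_index(correct_upper, correct_lower, n_upper, n_lower):
    """D = R_U/N_U - R_L/N_L: the gap in proportion correct between the groups."""
    return correct_upper / n_upper - correct_lower / n_lower

d = discrimination_index(10, 8, 10, 10)
print(round(d, 2))  # 0.2 -> marginal item, needs improvement
```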
Discrimination Index Interpretation (Penn, 2009; McGahee & Ball, 2009)

Discrimination Index   Description
0.40 and above         Very Good
0.30 – 0.39            Good (Subject to Improvement)
0.20 – 0.29            Marginal Item (Needs Improvement)
Below 0.20             Poor
Example 1

Objective     Item  Upper  Lower  Total  Decision
Remembering   1     10     8      18     Improve Item
Total               10     8      18

The p-value of 0.9 indicates that the item is very easy. The D-value is 0.2, indicating marginal item discrimination. The item therefore needs improvement.
Example 2

Objective     Item  Upper  Lower  Total  Decision
Remembering   1     10     10     20     Revise
Total               10     10     20

The p-value of 1.00 indicates that the item is very easy, and the D-value of 0.00 indicates no item discrimination. The item should therefore be revised.
Example 3

Objective   Item  Upper  Lower  Total  Decision
Applying    1     9      4      13     Retain
Total             9      4      13

The p-value of 0.65 indicates that the item is desirable, and the D-value of 0.5 indicates very good item discrimination. The item is therefore good and should be retained.
Example 4

Objective   Item  Upper  Lower  Total  Decision
Analyzing   1     7      7      14     Revise
Total             7      7      14

The p-value of 0.70 indicates that the item is desirable. However, the D-value is 0, indicating no item discrimination. The item therefore needs to be revised.
Example 5

Objective       Item  Upper  Lower  Total  Decision
Understanding   1     4      8      12     Change
Total                 4      8      12

The p-value of 0.60 indicates that the item is desirable, but the D-value of -0.4 shows that lower achievers are answering the item correctly more often than higher achievers. A negative discrimination index may indicate that the item is measuring something other than what the rest of the test is measuring; more often, it is a sign that the item has been mis-keyed. The item is therefore very poor and needs to be changed.
Distracter Analysis
• Item quality depends partly on the effective functioning of the selected distracters. How the distracters really function is identified by a logical check of the options and of how examinees select them. Most importantly, every distracter should attract at least one examinee. In general, a good distracter must attract more examinees from the lower group than from the upper group (Tamakloe, Atta & Amedahe, 1996).
Distracter Analysis
• If no one selects a distracter, it is important to revise the option and attempt to make the distracter a more plausible choice.
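
A small Python sketch of a distracter tally (the data are hypothetical responses matching Example 1 below, and the flagging rules simply mirror the guidance above):

```python
# Hypothetical option choices, one letter per examinee (10 per group)
upper = ["C"] * 10
lower = ["A"] * 3 + ["C"] * 7
key = "C"  # the correct option

for option in "ABCD":
    u, l = upper.count(option), lower.count(option)
    note = ""
    if option != key and u + l == 0:
        note = "  <- attracted no one: revise or replace"
    elif option != key and u > l:
        note = "  <- drew more from the upper group: review"
    tag = "**" if option == key else "  "
    print(f"{option}{tag} upper={u:2d} lower={l:2d} total={u + l:2d}{note}")
```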
Example 1
Options Upper Lower Total
A 0 3 3
B 0 0 0
C** 10 7 17
D 0 0 0
Total 10 10 20

From the table above, it is clear that none of the examinees selected options “B” and “D”; not even the lower group chose them. These two purported distracters are not functioning, because they did not distract any of the examinees. They are therefore not good distracters and must be changed.
Example 2
Options Upper Lower Total
A** 9 8 17
B 0 2 2
C 1 0 1
D 0 0 0
Total 10 10 20

Option “D” did not really function as a distracter, because none of the examinees selected it, not even the lower group. That distracter therefore cannot be retained; it needs to be revised or changed.
Example 3
Options Upper Lower Total
A 1 4 5
B 0 2 2
C 1 0 1
D** 8 4 12
Total 10 10 20

The table above clearly shows that option “A” is a good distracter: it attracted 5 examinees, representing 25% of the total, mostly from the lower group. It is therefore appropriate to retain option “A”.
Reliability
• Reliability is a measure of consistency. It is the degree to
which student results are the same when they take the same
test on different occasions, when different scorers score the
same item or task, and when different but equivalent tests
are taken at the same time or at different times (The Center
on Standards & Assessment Implementation, 2018).
Types of Reliability (Trochim, nd)
• Stability
1. Test – Retest
Types of Reliability (Trochim, nd)
• Stability
2. Inter-rater/ Observer/ Scorer
- applicable to essay questions
Types of Reliability (Trochim, nd)
• Equivalence
3. Parallel-Forms/ Equivalent
- used to assess the consistency of the results of two tests constructed in the same way from the same content domain
Types of Reliability (Trochim, nd)
• Internal Consistency
- used to assess the consistency of results across items within a test
4. Split-Half
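
A minimal Python sketch of the split-half method, assuming a 0/1 score matrix with one row per student (the odd/even split and the Spearman-Brown step-up are standard, but the names here are illustrative):

```python
import numpy as np

def split_half_reliability(scores):
    """Correlate odd- and even-item half-test totals, then step the
    correlation up to full length with Spearman-Brown: 2r / (1 + r)."""
    scores = np.asarray(scores, dtype=float)  # rows = students, cols = items
    odd_half = scores[:, 0::2].sum(axis=1)    # totals on items 1, 3, 5, ...
    even_half = scores[:, 1::2].sum(axis=1)   # totals on items 2, 4, 6, ...
    r = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r / (1 + r)
```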
Types of Reliability (Trochim, nd)
• Internal Consistency
5. Cronbach’s Alpha
- equivalent to the average of all possible split-half correlations, although we would never actually calculate it that way
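
In practice, Cronbach's alpha is computed directly from the item and total-score variances. A minimal sketch using the standard formula (illustrative names):

```python
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    scores = np.asarray(scores, dtype=float)  # rows = students, cols = items
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```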
Reliability Index Interpretation (Taber, 2017)

Reliability Index   Description
0.91 and above      Excellent; at the level of a standardized test
0.81 – 0.90         Very good for a classroom test
0.71 – 0.80         Good for a classroom test; there are probably a few items which could be improved
0.61 – 0.70         Somewhat low; needs to be supplemented
0.51 – 0.60         Needs revision of the test
0.50 or below       Questionable reliability
Item Analysis Process (NORSU, 2019)

Prepare the Table of Specifications (TOS)
Steps in Item Analysis

1. Arrange the test scores from highest to lowest.
Steps in Item Analysis

2. Get the upper and lower 27% of the scores. These will be the upper group and lower group, respectively.
Note: use the top 50% and bottom 50% for a class with fewer than 30 students.
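
Steps 1 and 2 can be sketched in Python as follows (hypothetical scores; the cutoff fractions follow the note above):

```python
def split_groups(total_scores, fraction=0.27):
    """Sort scores from highest to lowest and take the upper and lower
    fractions. Use fraction=0.50 for classes with fewer than 30 students."""
    ranked = sorted(total_scores, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]

scores = [48, 45, 44, 41, 40, 38, 37, 35, 33, 30, 28, 27, 25, 22, 20]  # hypothetical
upper, lower = split_groups(scores, fraction=0.50)  # small class: top/bottom 50%
print(upper, lower)
```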
Steps in Item Analysis

3. Code the test items:
- 1 for correct and 0 for incorrect
- Horizontal rows: item numbers
- Vertical columns: respondents/students
Steps in Item Analysis

4. Solve for the DIFFICULTY INDEX of each item.
Steps in Item Analysis

5. Solve for the DISCRIMINATION INDEX and do a DISTRACTER ANALYSIS of each item.
Steps in Item Analysis

6. For reliability: in SPSS, click the Variable View and label each item (Q1, Q2, and so on).
Note: there should be no spaces in the Name column.
Steps in Item Analysis

7. Assign 1 for a correct answer and 0 for a wrong answer in the Values column.
Steps in Item Analysis

8. Click the Data View and encode the responses of the students.
Steps in Item Analysis

9. Click Analyze > Scale > Reliability Analysis > select all the variables > drag them to the Items box > Model: Alpha > Statistics: Descriptives with “Scale if item deleted”, ANOVA Table (F test) > OK.
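
If SPSS is unavailable, the alpha and “Scale if item deleted” figures can be approximated in Python. The alpha function from the earlier sketch is repeated so this snippet runs on its own; the names are illustrative:

```python
import numpy as np

def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)  # rows = students, cols = items
    k = scores.shape[1]
    return k / (k - 1) * (1 - scores.var(axis=0, ddof=1).sum()
                          / scores.sum(axis=1).var(ddof=1))

def alpha_if_item_deleted(scores):
    """Recompute alpha with each item left out, mirroring SPSS's
    'Scale if item deleted' column: an item whose removal raises
    alpha is a candidate for revision or removal."""
    scores = np.asarray(scores, dtype=float)
    return {f"Q{i + 1}": cronbach_alpha(np.delete(scores, i, axis=1))
            for i in range(scores.shape[1])}
```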
ITEM ANALYSIS is not an END in itself.
“Exam scores are not just supposed to separate students who pass or fail an exam. Through the scrutiny of the exam questions and scores, teachers can have insight into how much the students have learned.”

-Matthew Lynch
OUTPUT
