
Basics of Item Analysis

QUESTIONS? Contact me:
© James L. Paglinawan
e-mail: JLPaglinawan@cmu.edu.ph
        JLPaglinawan@teacher.deped.gov.ph
Item Analysis

Techniques to
improve test items
and instruction
Test Development Process

1. Review 2. Convene
13. Standard Setting
Study
National National
Advisory
14. Set Passing and Committee
Standard Profession 3. Develop Domain,
al Knowledge and
11. Administer Skills Statements
Tests
Standards 4. Conduct Job
Analysis
12. Conduct
Item 5. Establish Test
Analysis Specifications
6. Develop Test
9. Assemble Operational Design
Test Forms
10. Produce Printed Tests 7. Develop New Test
Mat. Questions
8. Review Test Questions
What is Item Analysis?

• A process that examines student responses to individual test items
  in order to assess the quality of those items and of the test as a whole.
• Valuable for improving items that will be used again in later tests
  and for eliminating ambiguous or misleading items.
• Valuable for increasing instructors' skills in test construction.
• Helps identify specific areas of course content that need greater
  emphasis or clarity.
Several Purposes

1. More diagnostic information on students
   – Classroom level:
     • determine which questions most students found very difficult or
       guessed on, and reteach that concept
     • determine which questions all students got right, and stop
       spending time on that area
     • find which wrong answers students are choosing, and identify the
       common misconceptions behind them
   – Individual level:
     • isolate the specific errors an individual student made
2. Build future tests and revise test items to make them better
   • you learn how much work goes into writing good questions
   • WHOLE TESTS SHOULD NOT BE REUSED: diagnostic teaching means
     responding to the needs of your students, so over a few years you
     build up a test bank and choose the items that fit each class
   • you can spread difficulty levels across your blueprint
     (Table of Specifications, TOS)
3. Part of continuing professional development
   – doing occasional item analysis will help you become a better test
     writer
   – it documents how sound your evaluation is
   – useful for dealing with parents or administrators if there is ever
     a dispute
   – statistics that back up your grading help parents and administrators
     understand why some students failed
Classical Item Analysis Statistics

• Reliability (test-level statistic)
• Difficulty (item-level statistic)
• Discrimination (item-level statistic)

Test-Level Statistics: Quality of the Test
• Reliability and Validity
  – Reliability: consistency of measurement
  – Validity: truthfulness of response
• Reliability and validity describe overall test quality; difficulty and
  discrimination describe individual item quality.
Reliability
refers to the extent to which the test is likely to produce consistent
scores.

Characteristics that affect reliability:
1. The intercorrelations among the items: the more numerous and the
   stronger the positive relationships among items, the greater the
   reliability.
2. The length of the test: a test with more items will have a higher
   reliability, all other things being equal.
3. The content of the test: generally, the more diverse the subject
   matter tested and the testing techniques used, the lower the
   reliability.
4. The group of test takers: heterogeneous groups of test takers
   generally yield higher reliability estimates.
Types of Reliability

• Stability
  1. Test-Retest
  2. Inter-rater / Observer / Scorer
     • applicable mostly to essay questions
     • use Cohen's Kappa statistic
• Equivalence
  3. Parallel-Forms / Equivalent Forms
     • used to assess the consistency of the results of two tests
       constructed in the same way from the same content domain
• Internal Consistency
  • used to assess the consistency of results across items within a test
  4. Split-Half
  5. Kuder-Richardson Formula 20 / 21
     • the correlation is determined from a single administration of a
       test through a study of score variances
  6. Cronbach's Alpha (α)
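A minimal sketch of how Cronbach's alpha can be computed from a
students-by-items matrix of 0/1 scores, assuming Python with NumPy; the
function name and the small response matrix below are illustrative, not
taken from these slides.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a students-by-items matrix of 0/1 scores."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 students x 4 items, coded 1 = correct, 0 = incorrect
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(responses), 3))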
Reliability Indices and Their Interpretation

.91 and above   Excellent reliability; at the level of the best
                standardized tests.
.81 - .90       Very good for a classroom test.
.71 - .80       Good for a classroom test; in the range of most. There are
                probably a few items which could be improved.
.61 - .70       Somewhat low. This test needs to be supplemented by other
                measures (e.g., more tests) to determine grades. There are
                probably some items which could be improved.
.51 - .60       Suggests need for revision of the test, unless it is quite
                short (ten or fewer items). The test definitely needs to be
                supplemented by other measures (e.g., more tests) for grading.
.50 or below    Questionable reliability. This test should not contribute
                heavily to the course grade, and it needs revision.
Test Item Statistics

• Item Difficulty
  Percent answering correctly
• Item Discrimination
  How well the item "functions";
  how "valid" the item is, based on the total test score as criterion
WHAT IS A WELL-FUNCTIONING TEST ITEM?

• How many students got it correct? (DIFFICULTY)
• Which students got it correct? (DISCRIMINATION)
Three important pieces of information on the quality of test items

• Item difficulty: measures whether an item was too easy or too hard.
• Item discrimination: measures whether an item discriminated between
  students who knew the material well and students who did not.
• Effectiveness of alternatives: determines whether distracters
  (incorrect but plausible answers) tend to be marked by the less able
  students and not by the more able students.
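A minimal sketch of a distracter analysis along these lines, assuming
Python; the 27% split, option letters, and scores below are illustrative
assumptions, not data from the slides.

from collections import Counter

def distracter_analysis(choices, scores, keyed):
    """Count how often each option was chosen by the upper and lower groups."""
    ranked = sorted(zip(scores, choices), key=lambda p: p[0])
    n = max(1, round(len(ranked) * 0.27))        # conventional 27% split
    lower = Counter(c for _, c in ranked[:n])
    upper = Counter(c for _, c in ranked[-n:])
    for option in sorted(set(choices)):
        flag = "key" if option == keyed else "distracter"
        print(f"{option} ({flag}): upper={upper[option]}, lower={lower[option]}")

# A distracter is working if it attracts more lower-group than upper-group students.
distracter_analysis(
    choices=['A', 'B', 'A', 'C', 'B', 'A', 'D', 'B', 'B', 'A'],
    scores=[35, 48, 30, 25, 44, 28, 22, 47, 41, 33],
    keyed='B',
)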
Item Difficulty

• Item difficulty is simply the percentage of students who answer an item
  correctly. In this case, it is also equal to the item mean.

  Difficulty = (number of students answering correctly / total number of
  students) x 100

• The item difficulty index ranges from 0 to 100; the higher the value,
  the easier the question.
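A minimal sketch of this computation, assuming Python and a list of 1/0
coded answers for one item; names and numbers are illustrative.

def difficulty_index(item_scores):
    """Item difficulty as the percentage of students answering correctly (0-100)."""
    return 100 * sum(item_scores) / len(item_scores)

# 50 students, 35 answered correctly -> difficulty index of 70
item = [1] * 35 + [0] * 15
print(difficulty_index(item))  # 70.0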
Item Difficulty Level: Definition

Based on the percentage of students who answered the item correctly:

High (Difficult):     <= 30%
Medium (Moderate):    > 30% and < 80%
Low (Easy):           >= 80%
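A small sketch applying these cut-offs, assuming Python; the function
name is illustrative, and the thresholds are exactly those on this slide.

def difficulty_level(pct_correct):
    """Classify an item by the percentage of students answering correctly."""
    if pct_correct <= 30:
        return "High (difficult)"
    if pct_correct >= 80:
        return "Low (easy)"
    return "Medium (moderate)"

print(difficulty_level(30))  # High (difficult)
print(difficulty_level(70))  # Medium (moderate)
print(difficulty_level(90))  # Low (easy)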
Item Difficulty Level: Sample
Number of students who answered each item = 50

Item No.   No. Correct Answers   % Correct   Difficulty Level
1          15                    30          High
2          25                    50          Medium
3          35                    70          Medium
4          45                    90          Low
Item Difficulty Level: Questions/Discussion

• Is a test that nobody failed too easy?
• Is a test on which nobody got 100% too difficult?
• Should items that are "too easy" or "too difficult" be thrown out?
Item Discrimination
• Traditionally computed by comparing high- and low-scoring groups
  (upper 27% and lower 27% of total scores).
• Computerized analyses provide a more accurate assessment of the
  discrimination power of items, since they account for all responses
  rather than just the high- and low-scoring groups.
• Equivalent to the point-biserial correlation: it estimates the degree
  to which an individual item measures the same thing as the rest of the
  items.
What is Item Discrimination?

• Generally, students who did well on the exam should select the correct
  answer to any given item on the exam.
• The Discrimination Index distinguishes, for each item, between the
  performance of students who did well on the exam and students who did
  poorly.
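A minimal sketch of the upper/lower 27% discrimination index described
above, assuming Python and a 1/0 scored item; the sample data are
illustrative.

def discrimination_index(item_scores, total_scores, group_frac=0.27):
    """D = (proportion correct in upper group) - (proportion correct in lower group)."""
    ranked = sorted(zip(total_scores, item_scores), key=lambda p: p[0])
    n = max(1, round(len(ranked) * group_frac))
    lower = [item for _, item in ranked[:n]]
    upper = [item for _, item in ranked[-n:]]
    return sum(upper) / n - sum(lower) / n

# An item answered correctly mostly by high scorers gives a positive index.
item =   [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
totals = [12, 15, 18, 20, 25, 30, 35, 40, 44, 48]
print(round(discrimination_index(item, totals), 2))  # 0.67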
Indices of Difficulty and Discrimination
(by Hopkins and Antes)

Index            Difficulty        Action
0.86 and above   Very Easy         To be discarded
0.71 - 0.85      Easy              To be revised
0.30 - 0.70      Moderate          Very good items
0.15 - 0.29      Difficult         To be revised
0.14 and below   Very Difficult    To be discarded


Item Discrimination: Questions / Discussion

• What factors could contribute to low item discrimination between the
  two groups of students?
• What is a likely cause for a negative discrimination index?
ITEM ANALYSIS PROCESS

Sample TOS (Table of Specifications)

Section                 Remember   Understand   Apply   Total
A (items 1,3,7,9)          4           6          10      20
B (items 2,5,8,11,15)      5           5           4      14
C (items 6,17,21)          3           7           6      16
Total                     12          18          20      50
Steps in Item Analysis

1. Code the test items:
   - 1 for correct and 0 for incorrect
   - Vertical (columns): item numbers
   - Horizontal (rows): respondents/students
TEST ITEMS
No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 . . . . 50

1 1 0 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1
2 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1
3 0 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 0
4 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0
5 1 0 1 1 1 0 1 1 1 0 1 1 0 1 1 1 0 1 0
6 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1
7 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
8 1 1 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1 0 0
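Outside SPSS, the same 0/1 coding can be held in a simple
students-by-items array. A minimal Python sketch follows; the numbers
are made up for illustration and are not the matrix shown above.

import numpy as np

# rows = students, columns = test items, coded 1 = correct, 0 = incorrect
responses = np.array([
    [1, 0, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
])

print(responses.mean(axis=0))  # per-item proportion correct (difficulty as a proportion)
print(responses.sum(axis=1))   # per-student total score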
2. In SPSS:

   Analyze → Scale → Reliability Analysis → (drag/place the item
   variables into the Items box) → Statistics → check "Scale if item
   deleted" → OK.
****** Method 1 (space saver) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)

Item-total Statistics

            Scale      Scale       Corrected
            Mean       Variance    Item-         Alpha
            if Item    if Item     Total         if Item
            Deleted    Deleted     Correlation   Deleted

VAR00001    14.4211    127.1053     .9401        .9502
VAR00002    14.6316    136.8440     .7332        .9542
VAR00003    14.4211    141.5695     .4774        .9574
VAR00004    14.4737    128.6109     .6511        .9508
VAR00005    14.4737    128.8252     .8274        .9509
VAR00006    14.0526    130.6579     .2236        .9525
VAR00007    14.2105    127.8835     .2533        .9511
VAR00008    14.1053    128.6673     .1906        .9515
VAR00009    14.4211    129.1410     .7311        .9513
.....................
VAR00022    14.4211    129.1410     .7311        .9513
VAR00023    14.4211    127.1053     .4401        .9502
VAR00024    14.6316    136.8440    -.0332        .9542
VAR00047    14.4737    128.6109     .8511        .9508
VAR00048    14.4737    128.8252     .8274        .9509
VAR00049    14.0526    130.6579     .5236        .9525
VAR00050    14.2105    127.8835     .7533        .9511

Reliability Coefficients
N of Cases = 57.0          N of Items = 50
Alpha = .9533
3. In the output:

   • Alpha, reported at the bottom of the output, is the index of test
     reliability.
   • The Corrected Item-Total Correlation column is the point-biserial
     correlation for each item.
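For readers without SPSS, a rough Python cross-check of these two
statistics (alpha and the corrected item-total correlation) might look
like the sketch below; it uses randomly generated data, not the SPSS
output shown above, and the names are illustrative.

import numpy as np

def item_total_statistics(scores: np.ndarray):
    """Alpha for the whole test and the corrected item-total correlation per item."""
    n_students, k = scores.shape
    totals = scores.sum(axis=1)
    item_vars = scores.var(axis=0, ddof=1)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / totals.var(ddof=1))

    corrected = []
    for j in range(k):
        rest = totals - scores[:, j]       # total score with item j removed
        corrected.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return alpha, corrected

rng = np.random.default_rng(0)
demo = (rng.random((20, 10)) > 0.4).astype(int)   # fake 20 x 10 response matrix
alpha, citc = item_total_statistics(demo)
print(round(alpha, 3), [round(r, 2) for r in citc])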
4. Count the number of items at each difficulty level (including those
   to be discarded) and fill in the summary item analysis table.
Test Item Reliability Analysis Summary (sample)

Test         Level of Difficulty   Number of Items   %    Item Numbers
Math         Very Easy                   1            2   1
(50 items)   Easy                        2            4   2, 5
             Moderate                   10           20   3, 4, 10, 15, ...
             Difficult                  30           60   6, 7, 8, 9, 11, ...
             Very Difficult              7           14   16, 24, 32, ...
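A small sketch of how such a summary could be tallied from per-item
difficulty indices, assuming Python; the cut-offs follow the Hopkins and
Antes table earlier in this deck, and the item numbers and values are
made up.

from collections import defaultdict

def difficulty_band(diff):
    """Map a difficulty index (proportion correct) to the bands used above."""
    if diff >= 0.86: return "Very Easy"
    if diff >= 0.71: return "Easy"
    if diff >= 0.30: return "Moderate"
    if diff >= 0.15: return "Difficult"
    return "Very Difficult"

difficulties = {1: 0.90, 2: 0.80, 5: 0.72, 3: 0.55, 4: 0.40, 16: 0.10}
summary = defaultdict(list)
for item_no, diff in difficulties.items():
    summary[difficulty_band(diff)].append(item_no)

for band, items in summary.items():
    pct = 100 * len(items) / len(difficulties)
    print(f"{band}: {len(items)} items ({pct:.0f}%) -> {items}")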
5. Count the number of items retained based on the cognitive domains in
   the TOS. Compute the percentage retained per cognitive level.

Section   Remember          Understand        Apply
          N      Retained   N      Retained   N      Retained
A         4      1          6      3          10     3
B         5      3          5      3           4     2
C         3      2          7      4           6     3
Total     12     6          18     10         20     8
%                50%               56%               40%
Overall: 24/50 = 48%
• Realistically: do item analysis on your most important tests
  – end-of-unit tests and final exams (summative evaluation)
  – common exams shared with other teachers (departmentalized exams)
• Common exams give you a bigger sample to work with, which is good.
• Item analysis also makes sure that questions another teacher wrote are
  working for YOUR class.
ITEM ANALYSIS is one area where even a lot of otherwise very good
classroom teachers fall down:

– they think they're doing a good job;
– they think they're doing good evaluation;
– but without doing item analysis, they can't really know.
ITEM ANALYSIS is not an end in itself:
– there is no point unless you use it to revise items, and
– to help students on the basis of the information you get out of it.
END OF PRESENTATION

THANK YOU FOR LISTENING

HAVE A RELIABLE AND ENJOYABLE DAY!