
RECAP

RELIABILITY & VALIDITY


Test-retest reliability
o Repeat with the same respondents
o Re-test within a short period
o Resistance, practice effects

Equivalent forms method
o Alternate forms required
o Requirement of alternate forms
o Adequacy of alternate forms

Internal consistency method
o Most popular method
o Cronbach's alpha
o Assumption of dimension and the items
o K-R20

Split-half method
o Measure is divided into two halves
o Dividing the test into half
o Random half
o Spearman-Brown formula
Types of Validity Evidence

Content Validity
o Expert judgment
o Well-balanced sample of the domain

Criterion-related Validity
o Well-validated measures of the same topic
o Concurrent Validity: at the same time
o Predictive Validity: future event

Construct Validity
o Test of theoretical or conceptual assumptions
o Convergent Validity
o Discriminant Validity
Content Validity:
Subject Tests, Achievement Assessment (in either
teacher-made or standardized assessments)
Criterion-related Validity:
Tests for Admissions, Selection, Placement,
Employment, Certification, Achievement
Assessment

Construct Validity:
Assessment of Psychological Constructs (e.g.,
effort, motivation), Achievement Assessment
(E&F, p90-91)
Content Validity:
How well the test content represents some defined domain of
abilities that the test intends to measure. How adequately does
the sample of assessment tasks represent the domain of tasks to
be measured?

Criterion-related Validity:
The relationship between the test scores and the scores in some
criterion measure of the relevant abilities. How accurately does
the test predict future performance or estimate present
performance on some other valued measure (called criterion)?

Construct Validity:
Overall meaning of the scores: How well can performance on the
assessment be explained in terms of psychological constructs?
Typically, expressed by relationship to other constructs.
(E&F, p90-91; W&G, p49)
Popham chapter 11
L & G chapter 14
E & F chapter 13
Item analysis
Expressing/Evaluating:
1. Whether the items in the assessment functioned
the way they were intended to function
2. Quality of assessments

Evaluating the assessments by analyzing
student responses on each item.
Using assessment data to improve quality
of the assessment/items
1. Two fundamental concepts of item
analysis: a classical test theory approach
2. How to calculate, interpret and evaluate
estimates of item difficulty and item
discrimination
3. How to use the information about item
difficulty and item discrimination to
improve the quality of your assessment
Why would you evaluate the results
of your test/assessment when you
have already given the
test/assessment scores to your
students?
1. Evaluate achievement level of each student.
2. Effectiveness of teacher's instruction
3. Prediction of student performance for next
year.
4. Locating the sources of unexpected results
5. Appropriateness of the assessments for your
students/Improve the assessment tools for
future use
o Errors in judgment (e.g., too difficult, too easy)
o Whether certain items did not perform as intended
o Technical flaws or simple mistakes
o Methods of evaluation
(E&F, p223)
How to evaluate quality of your assessments?

Item Difficulty

Item Discrimination
How do you measure them?
How do you evaluate them?
Classical Test Theory (CTT)
Item Response Theory (IRT)

Item Difficulty
Item Discrimination
Purposes
o Test construction
o Accurate Assessment
o Selection
o Prediction

Ability
Item Difficulty

Any assessment result is a product of the
ability of the individual student and the
content of the item (i.e., item difficulty).
(E&F, p231)
Definition: How difficult an item is for
a particular group of students.
In CTT, the item difficulty level is
calculated by: how many students
got the particular item correct.
It is expressed by percentages or
proportions of the students who
answered the item correctly.
Proportions & Percentages

Proportions: 0.00 to 1.00
Percentages: 0% to 100%

Proportions × 100 = Percentages


10 students took the test.
8 out of 10 got the item correct.
Item difficulty = 80% or .80
3 out of 10 got the item correct.
Item difficulty = 30% or .30
When item difficulty is expressed by proportions,
we call it p-value, difficulty p, or item
difficulty p-value.
A p-value ranges from 0.00 to 1.00.

p-value = C / T

C: The number of students who have responded
correctly to an item
T: The total number of students who have responded
to the item.

(P, p276)
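The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `item_difficulty` and its argument names are assumptions made here, not something taken from the cited texts.

```python
def item_difficulty(correct: int, total: int) -> float:
    """Return the item difficulty p-value, C / T.

    correct: number of students who answered the item correctly (C)
    total:   number of students who responded to the item (T)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


# The two examples from the slides: 10 students took the test.
print(item_difficulty(8, 10))  # 8 of 10 answered correctly
print(item_difficulty(3, 10))  # 3 of 10 answered correctly
```

Multiplying the result by 100 gives the same difficulty as a percentage.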
A p-value of .95 means that an item
was answered correctly by 95% of the
students in the group!
A p-value of .15 means that an item
was answered correctly by 15% of the
students in the group (and missed by
85% of the students in the group)!

(P, p276)
Number of students (out of 100)    p-value (proportion)
25 students got it right           .25
50 students got it right           .50
75 students got it right           .75

A higher p-value means that a greater
number of students answered the item
correctly.
The larger the value is, the easier the item.
(P, p277)
Higher number: easy item
Lower number: hard item
Option                  Number of students
a.                      1
b.                      1
c.                      0
d. (correct answer)*    8

What's the item difficulty of this item?


Option    High-Achiever    Low-Achiever
a.        1                3
b.        1                2
c.        0                1
d.*       8                4

What's the item difficulty of this item?
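One way to work the two option tables above is sketched below. The helper name `difficulty_from_options` is hypothetical, and the sketch assumes that every student chose exactly one option, so the option counts add up to the number of respondents.

```python
def difficulty_from_options(counts: dict[str, int], correct: str) -> float:
    """p-value = students choosing the correct option / all respondents."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return counts[correct] / total


# First table: one group of 10 students, option d is correct.
single_group = {"a": 1, "b": 1, "c": 0, "d": 8}
p_single = difficulty_from_options(single_group, "d")

# Second table: high- and low-achievers are pooled before computing p.
high = {"a": 1, "b": 1, "c": 0, "d": 8}
low = {"a": 3, "b": 2, "c": 1, "d": 4}
combined = {option: high[option] + low[option] for option in high}
p_combined = difficulty_from_options(combined, "d")

print(p_single, p_combined)
```

Item difficulty is computed over the whole group, which is why the second table is pooled first: 12 of the 20 students chose option d.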


1. Item difficulty should
be different for a
group of students.
2. Item difficulty should
be different for each
student in a class.
3. There is an optimal
value of item difficulty
that can be used for
most tests and for most
classes.
4. We cannot suggest an
optimal level of item
difficulty for any test.
Efforts to improve the accuracy with
which a test measures should be:
to have about medium item difficulty
for each item.
to reduce the range of item difficulty
(rather than increasing it).

(E&F, p231-4)
The item-difficulty p-value is linked
to the student's chance probability of
getting the answer correct (without
knowing the content).
If there were only two choices in a test item,
students should be able to answer it correctly
half of the time (50%): the chance p-value is .50.
In a four-choice item: The chance p-value is
.25.
(P, p277)
The goal is to obtain a mean score of
item difficulty that is about halfway
between a perfect score and the
expected chance score.
Example: in a 100-item multiple-choice test with 4
response options, the ideal mean score is about 62.5,
i.e., an item difficulty of about 62.5% for each item
((100 + 25) / 2 = 62.5, halfway between the perfect
score of 100 and the expected chance score of 25).

(E&F, p225)
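The halfway rule above generalizes to any number of response options. The sketch below assumes that the chance score is 1 / (number of options) and that a perfect score corresponds to a proportion of 1.0; the function name `target_mean_difficulty` is made up for this illustration.

```python
def target_mean_difficulty(n_options: int) -> float:
    """Midpoint between the chance proportion and a perfect proportion."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    chance = 1 / n_options       # expected proportion correct from guessing
    perfect = 1.0                # everyone answers correctly
    return (perfect + chance) / 2


for k in (2, 4, 5):
    print(k, "options ->", target_mean_difficulty(k))
```

With four options this gives 0.625, matching the 62.5 in the example.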
Item    Item Difficulty
1       0.5
2       0.4
3       0.3
4       0.2
Test difficulty = (0.5 + 0.4 + 0.3 + 0.2) / 4 = 0.35

What's the test difficulty of this test?
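The test difficulty in the table is simply the mean of the item difficulties. A minimal sketch, with the hypothetical helper name `mean_difficulty`:

```python
from statistics import mean


def mean_difficulty(item_difficulties: list[float]) -> float:
    """Test difficulty as the average of the item p-values."""
    if not item_difficulties:
        raise ValueError("need at least one item")
    return mean(item_difficulties)


four_items = [0.5, 0.4, 0.3, 0.2]
print(round(mean_difficulty(four_items), 2))
```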


Students who know the content should
perform well on each item as well as the
test overall.
Students who get any one question
correct should have a relatively high
score on the overall exam.
There is a problem if students are
getting correct answers on the individual
items, and the item performance is not
reflected in their total scores on the test.
(E&F, p225)
Students should be able to show their
ability/achievement level on each
item of the test.
A large proportion of the high-performing
students should get an item right.
A small proportion of the low-performing
students should get it right.

(E&F, p225)
Item Discrimination: in terms of the
item's discriminating power.
An item's effectiveness at discriminating those
who know the content from those who do not.
How well an item discriminates between high
and low scorers.
It indicates the ability of an item to discriminate
between high- and low-achievement
students.
(E&F, p225)
Item Discrimination: in terms of the
item's relationship to the total scores.
Item discrimination is the degree to
which students with high overall exam scores
also got a particular item correct.
Summarizes the relationship between students'
responses on the total test and their responses to
a particular test item.
How frequently an item is answered correctly
by those who have performed well on the total
test.
(P, p278; W&G, p85)
Item      1  2  3  4  5  6  7  8  9  10  Total Score
Tom       1  1  1  0  1  1  1  1  1  0   8
Gerry     1  1  0  1  1  1  1  1  0  0   7
Mary      1  0  1  1  1  1  1  0  0  0   6
Jane      0  0  0  0  1  1  1  1  0  0   4
Karl      0  0  0  0  1  1  0  1  1  0   4
Darren    0  0  0  0  1  0  0  1  1  0   3
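As a sketch of how the table above can be turned into numbers, the code below stores each student's ten item scores (1 = correct, 0 = incorrect) and computes the total score per student and the p-value per item. The variable and function names are assumptions for illustration only.

```python
# Rows are students, columns are Items 1 to 10 (1 = correct, 0 = incorrect).
responses = {
    "Tom":    [1, 1, 1, 0, 1, 1, 1, 1, 1, 0],
    "Gerry":  [1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    "Mary":   [1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    "Jane":   [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "Karl":   [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
    "Darren": [0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
}


def item_p_values(matrix: dict[str, list[int]]) -> list[float]:
    """Proportion of students answering each item correctly."""
    rows = list(matrix.values())
    n_students = len(rows)
    # zip(*rows) walks the table column by column, i.e. item by item.
    return [sum(column) / n_students for column in zip(*rows)]


totals = {name: sum(row) for name, row in responses.items()}
p_values = item_p_values(responses)

for number, p in enumerate(p_values, start=1):
    print(f"Item {number}: p = {p:.2f}")
```

In this small class, Item 5 is answered correctly by everyone and Item 10 by no one, so their p-values sit at the two extremes of the scale.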
Is there any pattern in the numbers to focus on?
Item      1  2  3  4  7  6  5  10  8  9  Total Score
Tom       1  1  1  0  1  1  1  0   1  1  8
Gerry     1  1  0  1  1  1  1  0   1  0  7
Mary      1  0  1  1  1  1  1  0   0  0  6
Jane      0  0  0  0  1  1  1  0   1  0  4
Karl      0  0  0  0  0  1  1  0   1  1  4
Darren    0  0  0  0  0  0  1  0   1  1  3
          0  1  1  1  1  2  3  3   4  4
What are the likely causes of lower-
achieving students (shown in the
overall test scores) getting this item
correct?
What are the likely sources of this
inconsistency?
What are the likely causes of high-
achieving students (shown in the
overall test scores) not getting this
item correct?
What are the likely sources of this
inconsistency?
No item discrimination!
What are the likely sources of
observing no item discrimination?
Was there a particular reason to
observe this negative relationship?
          Item 8  Item 9  Total Score
Higher total scores
Tom       1       1       8
Gerry     1       0       7
Mary      0       0       6
Lower total scores
Jane      1       0       4
Karl      1       1       4
Darren    1       1       3
Item      1  2  3  4  5  6  7  8  9  10  Total Score
Tom       1  1  1  0  1  1  1  1  1  0   8
Gerry     1  1  0  1  1  1  1  1  0  0   7
Mary      1  0  1  1  1  1  1  0  0  0   6
Jane      0  0  0  0  1  1  1  1  0  0   4
Karl      0  0  0  0  1  1  0  1  1  0   4
Darren    0  0  0  0  1  0  0  1  1  0   3
          0  1  1  1  3  2  1  4  3  3
Correlation-based:
Correlation coefficients between students'
total test scores and their performance on
a particular item.
Correlation-based Item Discrimination
ranges from -1.00 to 1.00
Proportion-based:
Difference between the proportions of high- and
low-scoring students getting the item correct.
Proportion-based Item Discrimination (a difference,
not a proportion!) ranges from -1.00 to 1.00
(P, p278)
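A correlation-based index can be sketched as the Pearson correlation between the 0/1 item scores and the total scores (for a right/wrong item this is the point-biserial correlation). The sketch below uses Items 1, 5, and 8 from the six-student table. The helper name `pearson` is an assumption, and the total score here still includes the item itself, which is a simplification.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation, so the correlation is undefined
    return sxy / sqrt(sxx * syy)


# Order: Tom, Gerry, Mary, Jane, Karl, Darren
totals = [8, 7, 6, 4, 4, 3]
item1 = [1, 1, 1, 0, 0, 0]   # only the three top scorers are correct
item5 = [1, 1, 1, 1, 1, 1]   # everyone is correct
item8 = [1, 1, 0, 1, 1, 1]   # missed only by a high scorer

r_item1 = pearson(item1, totals)
r_item5 = pearson(item5, totals)
r_item8 = pearson(item8, totals)
print(r_item1, r_item5, r_item8)
```

Item 1 correlates strongly and positively with the total score, Item 8 correlates negatively, and Item 5 has no defined correlation because every student answered it correctly.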
Proportion-based Calculation
1. Order the test scores based on the total
scores
2. Divide them to a high-scoring group and
a low-scoring group with an equal
number
3. Calculate a p-value for both groups

(P, p278)
4. Subtract the p-values between the
high- and low-scoring groups.

Item discrimination =
(p-value of the high-scoring group)
- (p-value of the low-scoring group)

(P, p278)
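The four steps of the proportion-based calculation can be sketched as follows. The function name `upper_lower_discrimination` is hypothetical. The sketch assumes the class is split into two equal halves by total score, drops the middle student when the class size is odd, and does nothing special about tied scores at the boundary. The data are Items 1 and 8 from the six-student table.

```python
def upper_lower_discrimination(records: list[tuple[int, int]]) -> float:
    """records holds one (total_score, item_correct) pair per student."""
    if len(records) < 2:
        raise ValueError("need at least two students")
    # Step 1: order the students by total score, highest first.
    ranked = sorted(records, key=lambda record: record[0], reverse=True)
    # Step 2: split into equal-sized high- and low-scoring groups.
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[-half:]
    # Step 3: p-value of the item within each group.
    p_high = sum(correct for _, correct in high) / half
    p_low = sum(correct for _, correct in low) / half
    # Step 4: subtract the low-group p-value from the high-group p-value.
    return p_high - p_low


# Order: Tom, Gerry, Mary, Jane, Karl, Darren
totals = [8, 7, 6, 4, 4, 3]
item1 = [1, 1, 1, 0, 0, 0]
item8 = [1, 1, 0, 1, 1, 1]

d_item1 = upper_lower_discrimination(list(zip(totals, item1)))
d_item8 = upper_lower_discrimination(list(zip(totals, item8)))
print(d_item1, round(d_item8, 2))
```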
20 students took the test;
10 higher-performers; 10 lower-performers
Out of 10 higher-performers, 8 got the item correct
Out of 10 lower-performers, 2 got the item correct
p-value for the higher-performers: 0.8;
p-value for the lower-performers: 0.2
Item discrimination is 0.8 - 0.2 = 0.6
p-value for the higher-performers: 0.8;
p-value for the lower-performers: 0.4
Item discrimination is 0.8 - 0.4 = 0.4
Option    High-Achiever    Low-Achiever
a.        1                3
b.        1                2
c.        0                1
d.*       8                4

What's the item discrimination of this item?
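The worked examples and the option table above can be checked with a short sketch. The function name `discrimination_from_counts` is an assumption, and results are rounded only for display because of floating-point arithmetic.

```python
def discrimination_from_counts(high_correct: int, high_n: int,
                               low_correct: int, low_n: int) -> float:
    """High-group p-value minus low-group p-value."""
    if high_n <= 0 or low_n <= 0:
        raise ValueError("both groups must contain students")
    return high_correct / high_n - low_correct / low_n


# 20 students: 8 of 10 higher-performers and 2 of 10 lower-performers correct.
first = discrimination_from_counts(8, 10, 2, 10)
# Same high group, but 4 of 10 lower-performers correct.
second = discrimination_from_counts(8, 10, 4, 10)

# Option table: d is the correct answer.
high = {"a": 1, "b": 1, "c": 0, "d": 8}
low = {"a": 3, "b": 2, "c": 1, "d": 4}
from_table = discrimination_from_counts(
    high["d"], sum(high.values()), low["d"], sum(low.values())
)

print(round(first, 2), round(second, 2), round(from_table, 2))
```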


Item discrimination ranges from -1.00
to 1.00.
1 - 0 = 1 (A perfect scenario)
1 - 1 = 0 (not good)
0 - 0 = 0 (not good)
0 - 1 = -1 (Perfectly wrong!)
Below 0: Negatively discriminating item
0: Non-discriminating item
Above 0: Positively discriminating item

(P, p278)
Positively discriminating item: An
item is answered correctly more often by those
who score well on the total test than by those
who score poorly on the total test.
Non-discriminating item: The item shows
no difference in the correct response proportions of
those who score well or poorly on the total test.
Negatively discriminating item: An item
is answered correctly less often by those who score
well on the total test than by those who score
poorly on the total test.
(P, p278)
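The three definitions above amount to checking the sign of the discrimination index. A minimal sketch, in which the function name and the small tolerance for floating-point noise are assumptions:

```python
def classify_discrimination(d: float, tolerance: float = 1e-9) -> str:
    """Label an item by the sign of its discrimination index."""
    if not -1.0 <= d <= 1.0:
        raise ValueError("discrimination must lie between -1 and 1")
    if d > tolerance:
        return "positively discriminating"
    if d < -tolerance:
        return "negatively discriminating"
    return "non-discriminating"


for value in (1.0, 0.4, 0.0, -1.0):
    print(value, "->", classify_discrimination(value))
```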
Item      1  2  3  4  7  6  5  10  8  9  Total Score
Tom       1  1  1  0  1  1  1  0   1  1  8
Gerry     1  1  0  1  1  1  1  0   1  0  7
Mary      1  0  1  1  1  1  1  0   0  0  6
Jane      0  0  0  0  1  1  1  0   1  0  4
Karl      0  0  0  0  0  1  1  0   1  1  4
Darren    0  0  0  0  0  0  1  0   1  1  3
          0  1  1  1  1  2  3  3   4  4
Positively discriminating items
Items 1, 2, 3, 4, 7, and 6: answered correctly more often by the
three highest scorers than by the three lowest scorers.
Items with No Discrimination Power
Item 5 (everyone correct) and Item 10 (no one correct).
Negatively discriminating items
Items 8 and 9: answered correctly more often by the three lowest
scorers than by the three highest scorers.
Item      1  2  3  4  7  6  5  10  8  9  Total Score
Tom       1  0  1  0  1  1  1  1   1  1  9
Gerry     1  0  1  0  1  1  1  1   1  0  7
Mary      1  0  1  0  1  1  1  1   0  0  6
Jane      0  1  1  0  1  1  1  0   0  0  4
Karl      0  1  1  0  1  1  0  0   0  0  4
Darren    0  1  1  0  0  0  0  1   0  0  3
Item Discrimination    Evaluation
0.40 or higher         Very good
0.30 to 0.39           Reasonably good
0.20 to 0.29           Needing improvement
0.19 or lower          Poor item - consider removing

(E&F, p234)
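The evaluation bands above translate directly into a lookup. In this sketch the function name is hypothetical, and the bands are treated as contiguous (for example, 0.395 falls under "Reasonably good"), which is an assumption because the table lists values to two decimals only.

```python
def evaluate_discrimination(d: float) -> str:
    """Map a discrimination index to the evaluation labels in the table."""
    if d >= 0.40:
        return "Very good"
    if d >= 0.30:
        return "Reasonably good"
    if d >= 0.20:
        return "Needing improvement"
    return "Poor item - consider removing"


for value in (0.60, 0.35, 0.21, 0.13):
    print(value, "->", evaluate_discrimination(value))
```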
Item difficulty and item discrimination are closely
related, both conceptually and mathematically!
If an item was very hard and no students
answered it correctly,
the total p-value is 0. In this case, item discrimination is 0 - 0 =
0.
If an item was very easy and all students
answered it correctly,
the total p-value is 1. In this case, item discrimination is 1 - 1 = 0.
Items that are too difficult or too easy are not
capable of discriminating between high- and low-
achievement students (or not as well as the items
of moderate difficulty).
(E&F, p225; W&G, p87; P, p278-80)
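The mathematical link can be made concrete. Assuming the class is split into equal-sized high and low groups that together contain every student, the overall p-value is the average of the two group p-values, so the largest discrimination an item of difficulty p can reach is 2 × min(p, 1 − p). The function name below is made up for this sketch.

```python
def max_discrimination(p: float) -> float:
    """Upper bound on (p_high - p_low) when (p_high + p_low) / 2 == p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # If p <= 0.5 the best case is p_high = 2p and p_low = 0.
    # If p > 0.5 the best case is p_high = 1 and p_low = 2p - 1.
    return 2 * min(p, 1 - p)


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"difficulty {p:.2f}: discrimination can be at most {max_discrimination(p):.2f}")
```

Only an item of medium difficulty (p = 0.5) can reach a discrimination of 1, while items with p = 0 or p = 1 are forced to 0, as in the two cases above.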
A wide range of item difficulty or item
discrimination is not an optimal test
condition.
Efforts to improve the accuracy with
which a test measures (to improve
reliability) should be to reduce the
range of item difficulty (rather than
increasing it).

(E&F, p231-4; W&G, p87)


The goal is not to have a wide spread
of item-difficulty.
The medium-level of item difficulty is
ideal.
Ideally, the item difficulty index of multiple-
choice items should range between 60% and
65% (0.60 to 0.65).

(E&F, p234)
The goal is not to have a wide spread
of item-discrimination.
The higher the item-discrimination
index is, the better it is.
Ideally, the item discrimination index should
be somewhere between 0.70 and 0.80.

(E&F, p234)
Both item difficulty and item
discrimination are sample-based indices.
If a test item is reused with a new group, the
results may be quite different.
Depends on a variety of factors:
Ability level and study habits of students
Instructional content and quality, etc.
Item difficulty and item discrimination are subject
to considerable sampling error. The smaller the
sample, the larger the sampling error.
(W&G, p87)
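To give a rough sense of how sample size matters, the sketch below uses the standard binomial approximation for the standard error of a proportion, sqrt(p(1 − p) / n). This formula is not taken from the cited texts, and it assumes the students can be treated as an independent random sample. The function name is hypothetical.

```python
from math import sqrt


def p_value_standard_error(p: float, n: int) -> float:
    """Approximate standard error of an item p-value from n students."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return sqrt(p * (1 - p) / n)


for n in (10, 25, 100, 400):
    print(f"n = {n:3d}: standard error of p = 0.5 is {p_value_standard_error(0.5, n):.3f}")
```

Quadrupling the number of students halves the standard error, so indices from a single small class should be read with caution.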
Evaluating the effectiveness of a multiple-
choice test by examining each alternative.
1. Ideally, all alternatives and all incorrect
answers should be chosen by some.
2. More students in the high-achievement
group should choose the correct answer.
3. Each incorrect answer should be more
frequently chosen by the low-scoring
group.

(W&G, p85)
See how many students in each group
selected the correct answers and
alternatives.
[Figure: percentages (50%, 25%, 10%) of high-scoring and low-scoring students]
(W&G, p85)
Option    High-Achiever    Low-Achiever
a.        1                3
b.        1                2
c.        0                1
d.*       8                4
Most high-achievers got the item correct.

If we put it more ideally:
Option    High-Achiever    Low-Achiever
a.        0                2
b.        0                3
c.        0                3
d.*       10               2
Low-achievers are evenly spread out throughout the alternatives.
(W&G, p87)
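The three checks for evaluating alternatives can be sketched as a small report over the two option tables above. The function name `distractor_report` and the report keys are assumptions made for this illustration.

```python
def distractor_report(high: dict[str, int], low: dict[str, int], correct: str) -> dict:
    """Check each alternative against the three rules for a good item."""
    distractors = [option for option in high if option != correct]
    return {
        # Rule 1: every alternative should be chosen by at least one student.
        "unused_options": [o for o in high if high[o] + low[o] == 0],
        # Rule 2: more high achievers than low achievers pick the correct answer.
        "correct_favors_high": high[correct] > low[correct],
        # Rule 3: no distractor should attract more high than low achievers.
        "distractors_favoring_high": [o for o in distractors if high[o] > low[o]],
    }


observed = distractor_report(
    {"a": 1, "b": 1, "c": 0, "d": 8}, {"a": 3, "b": 2, "c": 1, "d": 4}, "d"
)
idealized = distractor_report(
    {"a": 0, "b": 0, "c": 0, "d": 10}, {"a": 2, "b": 3, "c": 3, "d": 2}, "d"
)
print(observed)
print(idealized)
```

Both tables pass all three checks. The idealized table differs in degree: every high achiever picks the correct answer and the low achievers are spread across all four options.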
Item 1                    Item 2
a. (1-6)                  a. (17-13)
b. (33-51)                b. (15-26)
c. (43-30)*               c. (12-22)
d. (23-13)                d. (56-39)*

Item 1                    Item 2
a. (1-6)                  a. (12-28)
b. (33-51)                b. (30-43)
c. (43-30)*               c. (5-11)
d. (23-13)                d. (53-18)*
Difficulty of 37%         Difficulty of 36%
Discrimination of 0.13    Discrimination of 0.35

Item 3                    Item 4
a. (12-26)                a. (17-13)
b. (56-40)                b. (15-26)
c. (27-18)*               c. (12-22)
d. (5-16)                 d. (56-39)*
Difficulty of 23%         Difficulty of 48%
Discrimination of 0.09    Discrimination of 0.17

Item 5                    Item 6
a. (20-25)                a. (17-13)
b. (20-34)                b. (15-26)
c. (38-40)                c. (12-22)
d. (22-1)*                d. (56-39)*
Difficulty of 12%         Difficulty of 30%
Discrimination of 0.21    Discrimination of 0.15
Item 1 Higher Achievers Lower Achievers
a
b
C*
d
     Item 1      Item 2      Item 3      Item 4
a    (2-22)      (4-32)      (3-22)      (2-11)
b    (3-25)      (11-25)     (3-18)      (20-25)
c    (91-33)*    (78-20)*    (10-38)     (15-40)
d    (4-20)      (7-23)      (84-22)*    (63-23)*


Correct Responses
o Should attract a substantially larger number of high-
performing students (perhaps a ratio of 3:1 or higher)
o Should not attract about the equal number of high- and
low-performing students.
o It is not important for the incorrect answers by high-
performing students to be spread across distracters.

Incorrect Responses
o Should not attract more high-performing students than
low-performing students
o Should not attract a considerable number of good students
o Should see the spread of incorrect answers by low-
performing students across the available distracters nearly
evenly

(E&F p. 235-8)
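The suggested 3:1 ratio for correct responses can be checked against the four-item table of (high-low) counts shown earlier. This is only a sketch: the slides offer the ratio as a rough guide ("perhaps"), and the function name `high_low_ratio` is an assumption.

```python
def high_low_ratio(high: int, low: int) -> float:
    """Ratio of high- to low-performing students choosing an option."""
    return float("inf") if low == 0 else high / low


# (high, low) counts for the correct option of each item in the table.
correct_counts = {
    "Item 1": (91, 33),
    "Item 2": (78, 20),
    "Item 3": (84, 22),
    "Item 4": (63, 23),
}

meets_guideline = {
    name: high_low_ratio(high, low) >= 3 for name, (high, low) in correct_counts.items()
}
for name, (high, low) in correct_counts.items():
    print(f"{name}: ratio {high_low_ratio(high, low):.2f}, at least 3:1? {meets_guideline[name]}")
```

Items 2 and 3 clear the 3:1 guide, while Items 1 and 4 fall slightly short of it.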
Item Difficulty
o Checking for item difficulty: not too difficult and
not too easy
o Extremely hard or easy questions for a particular
group do not contribute to item discrimination!

Item Discrimination
o Items that all or most students answer correctly
should be deleted.
o Items that no students answered correctly can
also be deleted.
o Not discriminating items can be deleted.
Items missed by most students can be
discussed in more detail.
Frequencies with which each incorrect
answer is chosen may reveal common
errors and misconceptions and serve as a
basis for remedial work.
Discussion on the reasons students have selected
their answers (right and wrong) can have
instructional value.
(W&G, p87)
For some types of tests (such as content-referenced tests,
especially for mastery tests, minimum-competency tests,
professional certification tests), the rules of item difficulty
and item discrimination can be modified.

Item difficulty: If the pool of examinees is expected to be
well prepared, item/test difficulty is expected to be in the
70% to 100% range. In this case, having moderate item
difficulty is not the main concern.

Item discrimination: Many good items used in content-referenced
measures may have discrimination of zero or near zero (because
the score distributions tend to be quite negatively skewed and
low in variability).
Simple rules:
The highest discrimination should be
chosen.
Item difficulty is a secondary concern
because no item that is too easy or too
difficult will have good discrimination.

Examining the sources (why)
How it might be improved
(E&F, p230)
1. Two fundamental concepts of item
analysis: a classical test theory approach
2. How to calculate, interpret and evaluate
estimates of item difficulty and item
discrimination
3. How to use the information about item
difficulty and item discrimination to
improve the quality of the test
