Item Analysis and Test Revision
Among the tools test developers might employ to analyze and select items are:
Item-difficulty index - An index of an item’s difficulty is obtained by calculating the proportion of the
total number of test takers who answered the item correctly.
A lowercase italic p is used to denote item difficulty, and a subscript refers to the item number
(so p1 is read “item-difficulty index for item 1”).
The value of an item-difficulty index can theoretically range from 0 (if no one got the item
right) to 1 (if everyone got the item right).
Example:
If 50 of the 100 examinees answered item 2 correctly, then the item-difficulty index for
this item would be:
50/100 = 0.50
Then: p2 = 0.50
If 75 of the 100 examinees got item 3 right, then p3 would be equal to 0.75:
75/100 = 0.75
Then: p3 = 0.75
We could say that item 3 was easier than item 2. Note that the larger the item-difficulty index,
the easier the item.
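The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of any testing package; the function and variable names are made up for the example.

```python
# Item-difficulty index: the proportion of test takers who answered an item
# correctly. Scores are coded 1 (correct) or 0 (incorrect).

def item_difficulty(scores):
    """Return p, the proportion of correct responses (ranges from 0.0 to 1.0)."""
    return sum(scores) / len(scores)

# 100 examinees: 50 answered item 2 correctly, 75 answered item 3 correctly.
responses = {
    2: [1] * 50 + [0] * 50,
    3: [1] * 75 + [0] * 25,
}

p = {item: item_difficulty(scores) for item, scores in responses.items()}
print(p[2])  # 0.5
print(p[3])  # 0.75
```

The larger p value for item 3 (0.75 vs. 0.50) confirms that item 3 was the easier of the two.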
Ideal difficulty levels for multiple-choice items in terms of discrimination potential are:
Five-response multiple-choice   .70
Four-response multiple-choice   .74
Three-response multiple-choice  .77
(From Lord, F.M. “The Relationship of the Reliability of Multiple-Choice Tests to the Distribution of Item
Difficulties,” Psychometrika, 1952, 17, 181–194.)
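In practice, a developer might screen computed p values against these ideal levels. The sketch below is illustrative only: the lookup table comes from Lord's figures above, but the helper function, its name, and the 0.10 tolerance are assumptions for the example, not a standard procedure.

```python
# Lord's (1952) ideal difficulty levels, keyed by number of response options.
IDEAL_P = {5: 0.70, 4: 0.74, 3: 0.77}

def flag_items(p_values, n_options, tolerance=0.10):
    """Return items whose difficulty index deviates from the ideal level
    for this item format by more than `tolerance`."""
    ideal = IDEAL_P[n_options]
    return {item: p for item, p in p_values.items() if abs(p - ideal) > tolerance}

# Example: four-option items. Item 1 sits near the .74 ideal; item 7 is far
# too easy and would be flagged for review or revision.
flagged = flag_items({1: 0.72, 7: 0.95}, n_options=4)
print(flagged)  # {7: 0.95}
```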
Factor analysis - A statistical tool useful in determining whether items on a test appear to
be measuring the same thing(s).
Qualitative item analysis is a general term for various non-statistical procedures designed to explore
how individual test items work. The analysis compares individual test items to each other and to the test
as a whole.
Expert panels
Sensitivity Review is a study of test items, typically conducted during the test development
process, in which items are examined for fairness to all prospective test takers and for the presence of
offensive language, stereotypes, or situations.
Some of the possible forms of content bias that may find their way into any achievement test were
identified as follows (Stanford Special Report, 1992, pp. 3–4).
Status: Are the members of a particular group shown in situations that do not involve authority or
leadership?
Stereotype: Are the members of a particular group portrayed as uniformly having certain (1) aptitudes,
(2) interests, (3) occupations, or (4) personality characteristics?
Familiarity: Is there greater opportunity on the part of one group to (1) be acquainted with the
vocabulary or (2) experience the situation presented by an item?
Offensive Choice of Words: (1) Has a demeaning label been applied, or (2) has a male term been used
where a neutral term could be substituted?
Other: Panel members were asked to be specific regarding any other indication of bias they detected.
TEST REVISION
Many tests are deemed due for revision when any of the following conditions exists.
1. The stimulus materials look dated and current test takers cannot relate to them.
2. The verbal content of the test, including the administration instructions and the test items,
contains dated vocabulary that is not readily understood by current test takers.
3. As popular culture changes and words take on new meanings, certain words or expressions in the
test items or directions may be perceived as inappropriate or even offensive to a particular group
and must therefore be changed.
4. The test norms are no longer adequate as a result of group membership changes in the population
of potential test takers.
5. The test norms are no longer adequate as a result of age-related shifts in the abilities measured
over time, and so an age extension of the norms (upward, downward, or in both directions) is
necessary.
6. The reliability or the validity of the test, as well as the effectiveness of individual test items, can
be significantly improved by a revision.
7. The theory on which the test was originally based has been improved significantly, and these
changes should be reflected in the design and content of the test.
The steps to revise an existing test parallel those to create a brand-new one. In the test
conceptualization phase, the test developer must think through the objectives of the revision and how they
can best be met. In the test construction phase, the proposed changes are made. Test tryout, item analysis,
and test revision (in the sense of making final refinements) follow.
(1) Evaluating existing tests for the purpose of mapping test revisions