ITEM ANALYSIS
INSTRUCTOR
DR. TAHIRA BATOOL
Techniques of item analysis
• Distractor analysis
• Difficulty index
• Discrimination index
• Analysis of subjective items
What are methods of item analysis?
• Item analysis uses statistics and expert judgment to evaluate tests based on the
quality of individual items, of item sets, and of the test as a whole, as well as on
the relationship of each item to the other items.
• Item difficulty indicates the proportion of students who got the item right. A high
percentage indicates an easy item/question, and a low percentage indicates a difficult
item. In general, items should have difficulty values no lower than 20% correct and no
higher than 80%. Very difficult or very easy items contribute little to the
discriminating power of a test.
• Under classical test theory, item analysis focuses mainly on two statistics of the
correct choice: the proportion of students choosing it, known as item difficulty (or
item facility), and the item discrimination index. To improve item quality, related
statistics of the incorrect choices, or foils, are examined as well.
• Item Difficulty is a measure of the proportion of students/subjects who have answered an
item correctly.
• What is difficulty index in item analysis?
• For items with one correct alternative worth a single point, the item
difficulty is simply the percentage of students who answer an item
correctly. In this case, it is also equal to the item mean. The item
difficulty index ranges from 0 to 100; the higher the value, the easier
the question.
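As a minimal sketch (with invented 0/1 item scores, not data from the slides), the difficulty index described above is just the percentage of students who answered the item correctly:

```python
# Hypothetical scores for one item: 1 = correct, 0 = incorrect.
responses = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # 10 students, 7 correct

def item_difficulty(scores):
    """Item difficulty index: percentage of correct answers (0-100)."""
    return 100.0 * sum(scores) / len(scores)

print(item_difficulty(responses))  # 70.0 -> a fairly easy item
```

By the 20%–80% guideline above, this item (70% correct) falls in the acceptable range.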
DISCRIMINATION INDEX
• Discrimination index is the difference between the proportion of the
top scorers who got an item correct and the proportion of the bottom
scorers who got the item right (each of these groups consists of
twenty seven percent of the total group of students who took the test
and is based on the students' total score for the test). The
discrimination index range is between -1 and +1. The closer the index
is to +1, the more effectively the item distinguishes between the two
groups of students. Sometimes an item will discriminate negatively.
Such an item should be revised or removed from scoring, since it
indicates that the lower-performing students actually selected the key,
or correct response, more frequently than the top performers.
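The upper/lower 27% procedure described above can be sketched as follows. The data here are invented for illustration; a real analysis would use the class's actual total scores and item scores.

```python
def discrimination_index(total_scores, item_scores):
    """D = proportion correct in the top 27% of total scorers
           minus proportion correct in the bottom 27%."""
    n = len(total_scores)
    k = max(1, round(0.27 * n))  # size of each extreme group
    # Rank students by total test score, ascending.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]   # bottom 27%
    upper = [item_scores[i] for i in order[-k:]]  # top 27%
    return sum(upper) / k - sum(lower) / k

# Hypothetical class of 10: total test scores and 0/1 results on one item.
totals = [95, 90, 88, 85, 80, 75, 70, 60, 55, 50]
item   = [1,  1,  1,  1,  0,  1,  0,  0,  0,  0]
print(discrimination_index(totals, item))  # 1.0
```

Here every top-group student got the item right and every bottom-group student got it wrong, so D = +1, the strongest possible positive discrimination; a negative result would flag the item for revision or removal, as noted above.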
Calculating the Item Discrimination Index
Ideal average item difficulty (percent correct) for maximum discrimination, by item format:
• Five-response multiple-choice: 70
• Four-response multiple-choice: 74
• Three-response multiple-choice: 77
Interpreting test reliability coefficients:
• .60 – .70: Somewhat low. This test needs to be supplemented by other measures (e.g.,
more tests) to determine grades. There are probably some items which could be improved.
• .50 – .60: Suggests a need for revision of the test, unless it is quite short (ten or
fewer items). The test definitely needs to be supplemented by other measures (e.g.,
more tests) for grading.
• .50 or below: Questionable reliability. This test should not contribute heavily to the
course grade, and it needs revision.
Standard Error of Measurement
• The standard error of measurement is directly related to the reliability
of the test. It is an index of the amount of variability in an individual
student’s performance due to random measurement error.
• If it were possible to administer an infinite number of parallel tests, a
student’s score would be expected to change from one administration
to the next due to a number of factors.
• For each student, the scores would form a “normal” (bell-shaped)
distribution.
• The mean of the distribution is assumed to be the student’s “true
score,” and reflects what he or she “really” knows about the subject.
• The standard deviation of the distribution is called the standard error
of measurement and reflects the amount of change in the student’s
score which could be expected from one test administration to
another.
• Whereas the reliability of a test always varies between 0.00 and
1.00, the standard error of measurement is expressed in the same
scale as the test scores. For example, multiplying all test scores by
a constant will multiply the standard error of measurement by
that same constant, but will leave the reliability coefficient
unchanged.
• A general rule of thumb to predict the amount of change which
can be expected in individual test scores is to multiply the
standard error of measurement by 1.5. Only rarely would one
expect a student’s score to increase or decrease by more than
that amount between two such similar tests. The smaller the
standard error of measurement, the more accurate the
measurement provided by the test.
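The slides describe the standard error of measurement conceptually but do not give a formula. Under classical test theory it is commonly computed from the test's standard deviation and reliability coefficient as SEM = SD × √(1 − reliability); the sketch below also applies the 1.5 × SEM rule of thumb mentioned above (the numbers are invented for illustration):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test with standard deviation 10 and reliability .75:
print(sem(10, 0.75))        # 5.0 score points
print(1.5 * sem(10, 0.75))  # 7.5 -> expected maximum change between
                            #        two similar administrations
```

Note how the SEM is on the score scale: doubling all scores (SD becomes 20) doubles the SEM to 10.0, while the reliability coefficient stays at .75, exactly as the text states.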
• (Further discussion of the standard error of measurement can be
found in J. C. Nunnally, Psychometric Theory. New York: McGraw-Hill,
1967, pp. 172–235; see especially formulas 6-34, p. 201.)
• Item difficulty (P) = number of students choosing the correct
response in the upper + middle + lower groups, divided by the total
number of students who took the test
• Item discrimination (D) = fraction of the upper group answering the
item correctly − fraction of the lower group answering the item
correctly
• Item 1
• P1 = (10+11+8)/31 ≈ 0.94   D1 = 10/10 − 8/10 = 0.2
• Item 2
• P2 = (9+9+8)/31 ≈ 0.84   D2 = 9/10 − 8/10 = 0.1
• Item 3
• P3 = (10+9+10)/31 ≈ 0.94   D3 = 10/10 − 10/10 = 0.0
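The three worked items above can be checked in code. Recomputing shows that 29/31 rounds to 0.94 (items 1 and 3) and 26/31 rounds to 0.84 (item 2); the group sizes (10 per extreme group, 31 students total) are taken from the slides.

```python
def P(upper, middle, lower, n_total=31):
    """Item difficulty: correct answers across all groups / total students."""
    return (upper + middle + lower) / n_total

def D(upper_correct, lower_correct, group_size=10):
    """Item discrimination: upper-group fraction minus lower-group fraction."""
    return upper_correct / group_size - lower_correct / group_size

print(round(P(10, 11, 8), 2), round(D(10, 8), 2))    # item 1: 0.94 0.2
print(round(P(9, 9, 8), 2), round(D(9, 8), 2))       # item 2: 0.84 0.1
print(round(P(10, 9, 10), 2), round(D(10, 10), 2))   # item 3: 0.94 0.0
```

All three items are easy (P well above .80) and discriminate weakly; item 3, answered correctly by equal numbers in both extreme groups, does not discriminate at all.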