
1. Introduction

According to the Office of Educational Assessment (2005), item analysis is a
process which examines student responses to individual test items/questions
in order to assess the quality of those items and of the test as a whole. This
test item analysis report is based on a 20-question multiple-choice test
administered to 25 students (see Appendix B). The quality of individual items
is assessed by comparing students' item responses to their total test scores.
Statistical analysis is used to summarize the performance of the test as a
whole (see Appendix A). Four graphical representations of the statistical data
are also shown in the report (a histogram, a frequency polygon, a normal
distribution curve, and an ogive).

2. Purpose of the report


The purpose of this report is to present descriptive statistics and item
analysis for the 20 multiple-choice test items administered to 25 students.

3. Test analysis

3.1 Descriptive statistics

Descriptive statistics describe the basic features of the data; that is, they
describe what the data show. They provide simple summaries about the
sample and the measures, and present quantitative descriptions in a
manageable form. A set of test scores was used to calculate the mean,
mode, median and standard deviation, and a normal distribution graph for a
distribution with a mean, median and mode of 65 is drawn (see Figure 1).

Table 1: Descriptive statistics

Mean               65.79
Mode               65.00
Median             65.00
STDEV² (variance)  479.57
STDEV              21.90
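The Table 1 statistics can be reproduced with Python's `statistics` module. A minimal sketch follows; the score list below is hypothetical (the report's real scores would be derived from the responses in Appendix B) and is chosen only so the median and mode land on 65 as in Table 1.

```python
# Sketch of the Table 1 calculations using Python's statistics module.
# NOTE: this score list is hypothetical -- the report's actual raw scores
# come from the student responses in Appendix B.
import statistics

scores = [15, 28, 33, 47, 50, 52, 53, 58, 60, 63, 65, 65, 65,
          68, 70, 72, 78, 85, 86, 88, 90, 91, 93, 96, 100]

mean = statistics.mean(scores)           # arithmetic average
median = statistics.median(scores)       # middle value of the sorted scores
mode = statistics.mode(scores)           # most frequent score
variance = statistics.pvariance(scores)  # population variance (STDEV² row)
stdev = statistics.pstdev(scores)        # population standard deviation

print(mean, median, mode, round(stdev, 2))
```

Population (rather than sample) variance is assumed here, since the 25 students are treated as the whole group being described.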

This is treated as an approximately normal distribution because the measures
of central tendency, the mean, the median and the mode, are nearly the same.
If a distribution is truly normal (i.e., bell-shaped), the mean, median, and
mode are all equal to each other.

Refer to Figure 1 to see the normal distribution curve.

Figure 1: Normal distribution curve (x-axis: standard deviations from the mean of 65, from −4SD to +4SD; y-axis: density from 0.0 to 0.4; the mean, median and mode coincide at the centre)

The numbers on the x-axis represent the standard deviations from the mean.
The points where the curvature changes lie one standard deviation on either
side of the mean. For the normal distribution, values within one standard
deviation of the mean account for about 68% of the set, values within two
standard deviations account for about 95%, and values within three standard
deviations account for about 99.7%. The curve is symmetric. This is a
heterogeneous distribution because many of the values lie far from the mean.

3.2 Frequency graphs

Table 2: Grouped frequency table

H (highest score)    100
L (lowest score)     15
Range                85
Number of intervals  10
Size of interval     8.5
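The Table 2 values follow directly from the highest and lowest scores. A minimal sketch; the widening of the fractional interval size to a whole class width of 10 is an assumption, inferred from the 15-24, 25-34, ... classes actually used in Table 3.

```python
# Computing the grouped-frequency parameters of Table 2.
highest, lowest = 100, 15
score_range = highest - lowest               # 85
num_intervals = 10
interval_size = score_range / num_intervals  # 8.5

# ASSUMPTION: Table 3 uses a convenient whole-number class width of 10
# (classes 15-24, 25-34, ..., 95-104), which yields 9 classes rather than 10.
print(score_range, interval_size)
```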

Cumulative frequency is obtained by adding the frequency of a class interval
to the frequencies of all the preceding intervals up to and including that
class interval. This is shown in Table 3.

The following frequency distribution table gives the marks obtained by the 25
students:
Table 3: Cumulative frequency distribution

Lower Limit  Upper Limit  Interval  Middle Value  Frequency  Cumulative Frequency
15.00        24           15-24     19.5          1          1
25.00        34           25-34     29.5          2          3
35.00        44           35-44     39.5          0          3
45.00        54           45-54     49.5          4          7
55.00        64           55-64     59.5          3          10
65.00        74           65-74     69.5          6          16
75.00        84           75-84     79.5          1          17
85.00        94           85-94     89.5          6          23
95.00        104          95-104    99.5          2          25
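The cumulative-frequency column of Table 3 is simply a running total of the frequency column; a minimal sketch:

```python
# A running total of the class frequencies reproduces the last column of Table 3.
from itertools import accumulate

intervals = ["15-24", "25-34", "35-44", "45-54", "55-64",
             "65-74", "75-84", "85-94", "95-104"]
frequencies = [1, 2, 0, 4, 3, 6, 1, 6, 2]

# Each entry is the frequency of that class plus all preceding frequencies.
cumulative = list(accumulate(frequencies))

for interval, f, cf in zip(intervals, frequencies, cumulative):
    print(f"{interval:>7}  f={f}  cumulative={cf}")
```

The final cumulative value equals the total number of students (25), a useful consistency check on the table.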

Figure 2: Frequency histogram (y-axis: frequency, 0 to 7; x-axis: class intervals 15-24 through 95-104)

Referring to Figure 2, a histogram is drawn to represent each class interval
and its frequency in the form of a rectangle. The class intervals are marked
on the horizontal axis (x-axis) and the frequencies on the vertical axis
(y-axis). The intervals are equal; therefore the height of each rectangle is
proportional to the corresponding frequency.
Figure 3: Frequency polygon (y-axis: frequency, 0 to 7; x-axis: middle values of intervals, 19.5 through 99.5)

From Figure 3, the middle values of the class intervals of the given data are
plotted against the corresponding frequencies, and the points obtained are
joined by straight lines. The points plotted are (19.5, 1), (29.5, 2),
(39.5, 0), (49.5, 4), (59.5, 3), (69.5, 6), (79.5, 1), (89.5, 6), and (99.5, 2).

Refer to Table 3 to see the middle values of intervals and frequencies.

Figure 4: Cumulative frequency graph (an ogive) (y-axis: cumulative frequency, 0 to 30; x-axis: upper limits of intervals, 24 through 104)

An ogive has a characteristic 'S' curve shape (see Figure 4). The graph is
drawn by plotting points whose abscissae (x-axis) are the upper class limits
and whose ordinates (y-axis) are the cumulative frequencies: (24, 1), (34, 3),
(44, 3), (54, 7), (64, 10), (74, 16), (84, 17), (94, 23), and (104, 25).

Refer to Table 3 to see the data on cumulative frequency.

3.3 Reliability coefficient of a test

The reliability of a test refers to the extent to which the test is likely to produce
consistent scores. It theoretically ranges in value from zero (no reliability) to
1.00 (perfect reliability). The KR-20 measures test reliability in terms of
inter-item consistency. A higher value indicates a stronger relationship
between items on the test.

Table 4: Coefficients of reliability

k          20
k-1        19
Total pq   3.83
Stdev      21.90
(Stdev)²   479.57
KR20       1.04

From Table 4, the computed KR-20 is 1.04, which suggests a highly reliable
test. Strictly, however, KR-20 cannot exceed 1.00; the value above 1.00
arises because the test variance (479.57) was calculated on the 0-100
percentage scale while the item pq values were calculated on the 0-1 scale.
Recomputed on the raw 20-point scale, the coefficient is approximately 0.84,
which still indicates high reliability.
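The KR-20 value in Table 4 can be reproduced from the figures in the table. A sketch follows; the raw-score rescaling at the end is an assumption, dividing the percentage-scale standard deviation by 5 because each of the 20 items is worth 5 percentage points.

```python
# Kuder-Richardson formula 20: KR20 = k/(k-1) * (1 - sum(pq) / variance)
k = 20          # number of items
sum_pq = 3.83   # total of p*q over all items (Appendix A)
stdev = 21.90   # standard deviation of the percentage scores (Table 1)

kr20 = (k / (k - 1)) * (1 - sum_pq / stdev ** 2)
print(round(kr20, 2))  # 1.04, matching Table 4

# KR-20 cannot properly exceed 1.00; the value above arises because the
# variance is on the 0-100 percentage scale while the item pq values are on
# the 0-1 scale. ASSUMPTION: convert to the raw 20-point scale by dividing
# the standard deviation by 5 (each item = 5 percentage points).
raw_stdev = stdev / 5
kr20_raw = (k / (k - 1)) * (1 - sum_pq / raw_stdev ** 2)
print(round(kr20_raw, 2))  # about 0.84 -- still a highly reliable test
```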

4. Item analysis

Item analysis refers to the statistical analysis that measures the
effectiveness of individual test items.

4.1 Difficulty and discrimination indices of a set of test items

Using the questions and results from the test, the degree of difficulty of
each question and the corresponding discrimination index were calculated
(see Table 5 and Table 8).

Table 5: Difficulty index (p)

#Questions #Correct #Answered p
q1 21 25 0.84
q2 22 25 0.88
q3 17 25 0.68
q4 12 25 0.48
q5 21 25 0.84
q6 17 25 0.68
q7 11 25 0.44
q8 12 23 0.52
q9 13 25 0.52
q10 8 24 0.33
q11 23 25 0.92
q12 19 25 0.76
q13 15 25 0.6
q14 21 25 0.84
q15 20 25 0.8
q16 22 24 0.92
q17 15 24 0.63
q18 8 24 0.33
q19 13 25 0.52
q20 16 25 0.64

The difficulty index shows the proportion of students who answered an
item/question correctly. It is a measure of how difficult the question was to
answer: the higher the difficulty index, the easier the question. A value of
1.00 means that all of the students chose the correct response, and such a
question may be too easy. Following the classification applied in Table 6, an
item with a p-value greater than 0.75 is considered too easy, an item with a
p-value less than 0.25 is considered too difficult, and items in between are
acceptable.
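The p values in Table 5 and their interpretations can be sketched as below; the 0.25/0.75 cut-offs follow the classification applied in Table 6, and the function names are illustrative only.

```python
# Difficulty index: proportion of responding students answering correctly.
def difficulty_index(num_correct: int, num_answered: int) -> float:
    return num_correct / num_answered

def interpret(p: float) -> str:
    # Cut-offs as applied in Table 6: >0.75 too easy, <0.25 too difficult.
    if p > 0.75:
        return "Unacceptable (too easy)"
    if p < 0.25:
        return "Unacceptable (too difficult)"
    return "Acceptable (fine)"

p_q8 = difficulty_index(12, 23)  # q8: 23 students answered, 12 correctly
print(round(p_q8, 2), interpret(p_q8))  # 0.52 Acceptable (fine)
```

Note that the denominator is the number of students who answered the item (e.g. 23 for q8, 24 for q10), not the full class of 25, which is why some rows of Table 5 divide by fewer than 25.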

Table 6: Interpretation of the difficulty level of questions

#Questions Proportion Interpretation Reason


q1 0.84 Unacceptable Too easy
q2 0.88 Unacceptable Too easy
q3 0.68 Acceptable Fine
q4 0.48 Acceptable Fine
q5 0.84 Unacceptable Too easy
q6 0.68 Acceptable Fine
q7 0.44 Acceptable Fine
q8 0.52 Acceptable Fine
q9 0.52 Acceptable Fine
q10 0.33 Acceptable Fine
q11 0.92 Unacceptable Too easy
q12 0.76 Unacceptable Too easy
q13 0.6 Acceptable Fine
q14 0.84 Unacceptable Too easy
q15 0.8 Acceptable Fine
q16 0.92 Unacceptable Too easy
q17 0.63 Acceptable Fine
q18 0.33 Acceptable Fine
q19 0.52 Acceptable Fine
q20 0.64 Acceptable Fine

Table 6 shows that 35% of the questions (1, 2, 5, 11, 12, 14, and 16) are
unacceptable because they were too easy, and 65% (3, 4, 6, 7, 8, 9, 10, 13,
15, 17, 18, 19, and 20) are acceptable, which shows that they were fine.

Refer to Table 5 to see the difficulty indices on which these interpretations
are based.

The discrimination index was used to measure the ability of items/ questions
to distinguish between the lower and the upper group of students taking the
test (see Table 7).

Table 7: Number of students in upper and lower group

Upper 15
Lower 10

This is a measure of the ability of an item to discriminate between students
who obtained a high score on the test and those who obtained a low score. It
is the difference between the proportion of correct responses in the upper
group and the proportion of correct responses in the lower group.
Calculation procedures compare item responses to total test scores using
upper and lower groups of students.

Refer to Table 8 to see the calculation procedures on discrimination index.

Table 8: Discrimination index (D)

#Questions  #U  #L  D
q1          15  6   0.60
q2          15  7   0.53
q3          14  3   0.79
q4          8   4   0.50
q5          15  6   0.60
q6          12  5   0.58
q7          9   2   0.78
q8          10  2   0.80
q9          10  3   0.70
q10         8   0   1.00
q11         14  9   0.36
q12         14  5   0.64
q13         12  3   0.75
q14         15  6   0.60
q15         14  6   0.57
q16         15  7   0.53
q17         12  3   0.75
q18         5   3   0.40
q19         12  1   0.92
q20         11  5   0.55

All the discrimination values are positive, so each item's discrimination
ability is adequate. A positive value for this index means that
higher-scoring students tended to select the correct response more often
than lower-scoring students.

Refer to Table 8 to see the discrimination values.
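The values in Table 8 are consistent with computing D as (#U − #L)/#U, i.e. the lower-group count subtracted from the upper-group count, divided by the upper-group count; note this differs from the more common convention of dividing by the group size (15 upper, 10 lower in Table 7). A sketch reproducing two rows, with an illustrative function name:

```python
# Discrimination index as used in Table 8: D = (#U - #L) / #U, where #U and
# #L are the numbers of correct answers in the upper and lower scoring
# groups. (The conventional formula divides by the group size instead; this
# variant is the one that reproduces the report's values.)
def discrimination_index(upper_correct: int, lower_correct: int) -> float:
    return (upper_correct - lower_correct) / upper_correct

d_q1 = discrimination_index(15, 6)  # first row of Table 8
d_q10 = discrimination_index(8, 0)  # the item with perfect discrimination
print(round(d_q1, 2), round(d_q10, 2))  # 0.6 1.0
```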

5. Conclusion

The computed KR-20 of 1.04 (nominally above the 1.00 theoretical maximum,
a sign that the variance and the item statistics were calculated on different
scales, but still indicative of high reliability) means that the questions of
the test tended to "pull together". Students who answered a given question
correctly were more likely to answer other questions correctly. If a parallel
test were developed using similar items, the relative scores of students
would show little change.

6. References

1. A Guide to Interpreting the Item Analysis Report. (2004). Retrieved
September 12, 2007, from http://www.asu.edu/uts/InterpIAS.pdf

2. Image: Standard deviation diagram.svg [Image]. (n.d.). Retrieved
October 11, 2007, from
http://en.wikipedia.org/wiki/Image:Standard_deviation_diagram.svg#file

3. Introduction to Statistical Inference. (2005). Retrieved September 11,
2007, from http://students.washington.edu/hdevans/lec_11.doc

4. Kubiszyn, T., & Borich, G. (2007). Educational Testing and Measurement:
Classroom Application and Practice (8th ed.). United States of America:
John Wiley & Sons, Inc.

5. Normal Probability Distribution. (2005). Retrieved September 11, 2007,
from
http://palgrave.com/busines/taylor/taylor1/lectures/lectures/overheads/o
chap5.doc

6. Office of Educational Assessment (Understanding Item Analysis Reports).
(2005). Retrieved September 12, 2007, from
http://personal.gscit.monash.edu.au/~dengs/teaching/GCHE/part3-3.pdf

7. Test Item Analysis. (2005). Retrieved September 12, 2007, from
http://personal.gscit.monash.edu.au/~dengs/teaching/GCHE/part3-3.pdf

7. Appendices

7.1 Appendix A

#Questions  #Correct  #Incorrect  Prop Correct (p)  Prop Incorrect (q)  pq
q1 21 4 0.84 0.16 0.13
q2 22 3 0.88 0.12 0.11
q3 17 8 0.68 0.32 0.22
q4 12 13 0.48 0.52 0.25
q5 21 4 0.84 0.16 0.13
q6 17 8 0.68 0.32 0.22
q7 11 14 0.44 0.56 0.25
q8 12 11 0.52 0.48 0.25
q9 13 12 0.52 0.48 0.25
q10 8 16 0.33 0.67 0.22
q11 23 2 0.92 0.08 0.07
q12 19 6 0.76 0.24 0.18
q13 15 10 0.6 0.4 0.24
q14 21 4 0.84 0.16 0.13
q15 20 5 0.8 0.2 0.16
q16 22 2 0.92 0.08 0.08
q17 15 9 0.63 0.38 0.23
q18 8 16 0.33 0.67 0.22
q19 13 12 0.52 0.48 0.25
q20 16 9 0.64 0.36 0.23
Total 3.83

7.2 Appendix B

Key C B D D B C D A C B A C B D A A C D B C
St No Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
1 C B B A C D A D D A D A A A A C B D B
2 C B D D B D A A C B A C B D A A C D B C
3 C B D D B C D A C B A C B D A A C B D C
4 C B D B B C B A C B A C A D C A C B C C
5 C B D C B C B A C D A C B D A A A B B C
6 C A D D C C A D C D A C A D A A A B D C
7 B B A B B C B B D D A C B D C A A D D C
8 C B D B B C B D B C A C B D A A C A B A
9 C B D A B C D D B D A C B D A A C B D A
10 C B B A B C D C D C A B A D D A C D B C
11 C B D D B C D A C B A C B D A A C D B C
12 C B D D B C D D D A A C A D A A C B B D
13 C B D A B C D A C B A C B D A A A B B C
14 C B D A B C D A C B A C B D A A A B C
15 C B D D B B A A B D A C D A A C B B D D
16 C B D D B C D A C B A C B D A A C D B C
17 B B C C B A D D C A D B D A C A D
18 C B B D B A D D D D A C A D A A C B B C
19 D C A D B A B A D C C D A A D B B B A B
20 C B D D B C D A C A C D B D A A C D B C
21 C A D D C C A D C D A C A D A A A B D C
22 B B A B B C B B D D A C B D C A A D D C
23 C B D B B C B D B C A C B D A A C A B A
24 C B B A C D A D D A D A A A A C B D B
25 C B D D B D A A C B A C B D A A C D B C
