1.0 INTRODUCTION

For this project, our group was required to design one complete objective test paper consisting of 40 multiple choice items. We decided to build an English test paper with 40 items, each with 4 options. The test paper consists of 5 sections and must be answered by the students within 1 hour.

This multiple choice test paper is designed to fulfill several objectives:

1) It is part of the course's assessment (DPP 406 – Pentaksiran Pembelajaran).
2) It can be used to observe and evaluate the students' behaviour.
3) It can measure the students' performance and differentiate between the top performers (high scorers) and bottom performers (low scorers) on the test as a whole for the chosen class.
4) It can measure the students' factual and procedural knowledge.
5) It is useful for measuring whether the students can analyze the material in the items correctly.
6) It can be used to check whether the students have learned facts and routine procedures which have only one clear correct answer.
7) It encourages students to think critically before they choose the best option for a particular question.
8) It enables us (teachers) to analyze each item that we have built; the key aim of this item analysis is to increase the reliability and validity of the test. By analyzing the items, we will be able to identify the weak items and come up with solutions to improve them.

Our group chose to distribute the test papers to one Form 2 class in Sekolah Menengah Kebangsaan (SMK) Bukit Gambir, Pulau Pinang. We went to the school twice: on 9th June 2008 (Monday) to ask permission from the school's principal and to discuss with the English Language Coordinator a suitable time to distribute the test papers, and on 13th June 2008 (Friday) to distribute the test papers to the Form 2 students at the advanced level.

The students answered the test paper from 8.45 a.m. to 9.45 a.m. After all the students had answered the questions within the stipulated time, our group took the opportunity to ask for the students' feedback on the question paper we had distributed. Overall, the students felt that the items were at an average or medium level: there were some difficult questions which created confusion and some easy questions which could be answered without any problem. Generally, our group was very happy and satisfied with the school's and students' cooperation during the distribution of the test papers.

Besides that, our group has also prepared this project report, which contains four main topics: the test specification table, the students' performance (frequency, mean, mode, median, variance and standard deviation), item analysis, and suggestions for improving weak items.


2.0 TEST SPECIFICATION TABLE

Subject: English Language (B.I.)
Class: Form 2
Date: 13th June 2008 (Friday)
Time: 8.45 a.m. – 9.45 a.m. (1 Hour)

Each behavioral objective below corresponds to one item and is mapped against Bloom's taxonomy learning outcomes (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation); the distribution across the taxonomy is totalled at the end of this section.

Section A (Graphic Materials and Stimuli) – Students need to study the information found in the graphic materials and short texts and answer the questions based on them.

Item 1: Students will be able to identify the correct meaning of the sign given.
Item 2: Students will be able to identify the most accurate information on the label displayed in the given picture.
Item 3: Students will be able to identify the purpose of the letter correctly.
Item 4: Students will be able to differentiate the characteristics of the two types of bats in Malaysia as described in the given text.
Item 5: Students will be able to identify the correct synonym for the word "range" as used in the given text.
Item 6: Students will be able to identify the correct objective of the workshop as shown in the given poster.
Item 7: Students will be able to identify the food most likely to be served to the participants based on the given poster.
Item 8: Students will be able to identify the correct step to be taken before the bowl is covered, as described in the given label.
Item 9: Students will be able to identify the correct step that people should take when there is a tsunami warning, by referring to the given news report.
Item 10: Students will be able to identify the correct type of menu as shown in the given picture.

Section B (Rational Cloze) – This section tests the students' knowledge of grammar and vocabulary. Students need to learn how to apply and use the clues in the text to get the correct answer.

Item 11: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 12: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 13: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 14: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 15: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 16: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 17: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.
Item 18: Students will be able to choose the most accurate interrogative pronoun to construct a grammatically correct sentence.

Section C (Closest in Meaning) – In this section, students learn how to use available clues to answer the questions on similar expressions.

Item 19: Students will be able to choose the best meaning for the underlined phrase as provided in the given conversation.
Item 20: Students will be able to choose the best meaning for the underlined phrase as provided in the given conversation.
Item 21: Students will be able to choose the best meaning for the underlined phrase as provided in the given conversation.
Item 22: Students will be able to choose the best meaning for the underlined phrase as provided in the given conversation.
Item 23: Students will be able to choose the best meaning for the underlined phrase as provided in the given conversation.

Section D (Reading Comprehension) – This section provides various kinds of comprehension passages, and students should know how to read and understand different types of comprehension passages.

Item 24: Students will be able to choose the correct third step in making a kite by looking at the information in the given descriptive passage.
Item 25: Students will be able to identify the correct type of bamboo sticks for making the kite by looking at the information in the given descriptive passage.
Item 26: Students will be able to identify the correct tool for making the two sticks into a frame based on the information in the given descriptive passage.
Item 27: Students will be able to identify the correct synonym for the word "apply" as used in the given descriptive passage.
Item 28: Students will be able to choose the correct answer to fill in the blank in the sentence by referring to the information in the given descriptive passage.
Item 29: Students will be able to choose the correct purpose of the letter by looking at the information in the given letter.
Item 30: Students will be able to distinguish the correct answer from the false answers based on the information in the given letter.
Item 31: Students will be able to identify the correct reason for Connelia Eleanor to invite Emelda Allyn to her school, according to the information in the given letter.
Item 32: Students will be able to choose the closest meaning for the phrase "to request a favour" correctly by looking at the given letter.
Item 33: Students will be able to choose the best description of Emelda Allyn based on the information in the given letter.
Item 34: Students will be able to predict the main objective of the Drama Society.

Section E (Literature Component) – In this section, students need to have a good understanding of the literature texts.

Item 35: Students will be able to identify the writer's purpose in going to Innisfree, based on the given poem.
Item 36: Students will be able to choose the best answer to fill in the blank in the sentence by referring to the information in the given poem.
Item 37: Students will be able to identify the correct way for the writer to earn a living in Innisfree.
Item 38: Students will be able to choose the best meaning for the phrase "gone back to our ancestors" as shown in the extract of the given short story.
Item 39: Students will be able to identify the character who should, by custom, be the chief of the village, by referring to the extract of the given short story.
Item 40: Students will be able to identify the important moral value that the people of Dalat learned from the incident in their village, by looking at the extract of the given short story.

Total: 40 items / behavioral objectives (100%)
Distribution across Bloom's taxonomy: Knowledge 5 items (12.5%), Comprehension 24 items (60%), Application 9 items (22.5%), Analysis 2 items (5%), Synthesis –, Evaluation –.


3.0 THE OVERALL REPORT OF THE STUDENTS' PERFORMANCE THROUGH FREQUENCY, MEAN, MODE, VARIANCE AND STANDARD DEVIATION
Overall, the performance of the Form 2 students was very good: none of the students failed this exam. Most of them managed to score 30 marks (75%) or above. This can be seen in the performance table (Table 3.2) and the frequency table (Table 3.3) below: only one student got 22 out of 40 marks (55%), 3 students got 27 marks (67.5%), 5 students got 29 marks (72.5%), 13 students scored in the range of 30 to 34 marks, and 8 students scored in the range of 35 to 39 marks. In other words, the results show that the exam questions can be considered very easy and that the level of the students was excellent. This is supported by the histogram shown in Histogram 1.

Student   Score (x)   x − m    (x − m)²
S30       39           6.8      46.24
S11       38           5.8      33.64
S4        36           3.8      14.44
S16       36           3.8      14.44
S17       36           3.8      14.44
S22       36           3.8      14.44
S26       36           3.8      14.44
S9        35           2.8       7.84
S1        34           1.8       3.24
S7        34           1.8       3.24
S19       34           1.8       3.24
S21       34           1.8       3.24
S24       34           1.8       3.24
S28       34           1.8       3.24
S12       33           0.8       0.64
S18       33           0.8       0.64
S13       32          -0.2       0.04
S14       32          -0.2       0.04
S5        31          -1.2       1.44
S20       31          -1.2       1.44
S25       30          -2.2       4.84
S3        29          -3.2      10.24
S6        29          -3.2      10.24
S10       29          -3.2      10.24
S15       29          -3.2      10.24
S27       29          -3.2      10.24
S2        27          -5.2      27.04
S23       27          -5.2      27.04
S29       27          -5.2      27.04
S8        22         -10.2     104.04

Mean (m) = 32.20    Mode = 34    Variance (s²) = 14.16    Standard Deviation (SD) = 3.763

TABLE 3.1: Mean, mode, variance and standard deviation

From the table above, we can see that the mean, or the average mark obtained by the 30 students of this Form 2 class, is 32.20, which amounts to 80.5%. The mode, the score that occurs most frequently in this class, is 34; in other words, the score most commonly obtained by the students was 34 marks, or 85%. The variance, the average of the squared differences of the students' scores from the mean, is 14.16, and the standard deviation, the square root of the variance, is 3.763.
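The figures in Table 3.1 can be reproduced with a short script. The following is a minimal sketch in Python (our own illustration, using only the standard library); the score list is copied from the table, and the population variance is used, matching the (x − m)² column.

```python
# Reproduce the descriptive statistics in Table 3.1.
from statistics import mean, mode

scores = [39, 38, 36, 36, 36, 36, 36, 35, 34, 34, 34, 34, 34, 34, 33, 33,
          32, 32, 31, 31, 30, 29, 29, 29, 29, 29, 27, 27, 27, 22]

m = mean(scores)                                             # 32.20
mo = mode(scores)                                            # 34
variance = sum((x - m) ** 2 for x in scores) / len(scores)   # population variance, 14.16
sd = variance ** 0.5                                         # 3.763

print(m, mo, round(variance, 2), round(sd, 3))
```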


Scores (x/40)   Percentage (%)   Number of Students
40              100%             0
39              97.5%            1
38              95%              1
37              92.5%            0
36              90%              5
35              87.5%            1
34              85%              6
33              82.5%            2
32              80%              2
31              77.5%            2
30              75%              1
29              72.5%            5
28              70%              0
27              67.5%            3
26              65%              0
25              62.5%            0
24              60%              0
23              57.5%            0
22              55%              1
21              52.5%            0
20              50%              0

TABLE 3.2: Students' Performance Table

Histogram 1: Skewed Distribution


Bin     Frequency
22      1
25.4    0
28.8    3
32.2    10
35.6    9
More    7

TABLE 3.3: Frequency

The histogram above shows that the distribution of the students' results is negatively skewed. That is to say, the exam questions were quite easy for them, and therefore most of the students managed to score a very good grade on this test. This may be caused by the level of the students themselves: they are from the advanced class, so their mastery of the English Language is very good. Apart from that, the items designed for this exam paper can be considered unable to measure, evaluate or discriminate the real performance of each student in the classroom, as they cannot differentiate between the excellent students and the poor students.
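The frequency counts in Table 3.3 follow from the same score list and the bin edges shown above. Below is a small Python sketch (our own illustration; the bin edges are taken from the table, with scores above the last edge counted as "More"):

```python
# Recompute the frequency distribution of Table 3.3.
scores = [39, 38, 36, 36, 36, 36, 36, 35, 34, 34, 34, 34, 34, 34, 33, 33,
          32, 32, 31, 31, 30, 29, 29, 29, 29, 29, 27, 27, 27, 22]

bins = [22, 25.4, 28.8, 32.2, 35.6]   # upper edges of each bin
freq = {edge: 0 for edge in bins}
freq["More"] = 0

for x in scores:
    for edge in bins:
        if x <= edge:          # score falls into the first bin whose edge covers it
            freq[edge] += 1
            break
    else:
        freq["More"] += 1      # score above the last edge

print(freq)  # {22: 1, 25.4: 0, 28.8: 3, 32.2: 10, 35.6: 9, 'More': 7}
```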


4.0 ITEM ANALYSIS

Item analysis is carried out to help evaluate the effectiveness of each test item. In conducting the item analysis of this test, item difficulty, item discrimination and distracter quality are considered. The students are categorised into an upper group and a lower group: the scores were sorted in descending order, the 27% of high-scoring students (8 students) were placed in the upper group, and the 30% of low-scoring students (9 students) were placed in the lower group.

Upper group (8 students): S30 (39), S11 (38), S4 (36), S16 (36), S17 (36), S22 (36), S26 (36), S9 (35)
Middle group (13 students): S1 (34), S7 (34), S19 (34), S21 (34), S24 (34), S28 (34), S12 (33), S18 (33), S13 (32), S14 (32), S5 (31), S20 (31), S25 (30)
Lower group (9 students): S3 (29), S6 (29), S10 (29), S15 (29), S27 (29), S2 (27), S23 (27), S29 (27), S8 (22)

4.1 Item Difficulty

The index of difficulty (p-value) is the proportion of the total group who got an item right. It ranges from 0 to 1. The closer the p-value is to 1, the easier the item, as more students answered it correctly; conversely, the closer the value is to 0, the more difficult the item, as fewer students got the answer right. The index of difficulty (P) of each item of the test was calculated using the following formula:

P = Ncorrect / Ntotal

where:
P = index of difficulty
Ncorrect = number of students answering correctly
Ntotal = number of students taking the test

Item   Upper Group (8 Students)   Lower Group (9 Students)   Middle Group (13 Students)   Pall    Pupper   Plower   Discrimination
       A    B    C    D           A    B    C    D           No. Correct                                            Index
1      1    1   *5    1           1    5   *1    2           8                            0.467   0.625    0.111     0.514
2      0   *8    0    0           0   *8    0    1           13                           0.967   1.000    0.889     0.111
3      0   *8    0    0           0   *8    0    1           12                           0.933   1.000    0.889     0.111
4     *6    1    0    1          *7    2    0    0           10                           0.767   0.750    0.778    -0.028
5     *7    1    0    0          *5    4    0    0           4                            0.533   0.875    0.556     0.319
6      0    1    2   *5           2    3    1   *3           12                           0.667   0.625    0.333     0.292
7      0    0    0   *8           0    0    6   *3           9                            0.667   1.000    0.333     0.667
8      0   *8    0    0           1   *7    1    0           12                           0.900   1.000    0.778     0.222
9      0    0   *8    0           1    1   *7    0           12                           0.900   1.000    0.778     0.222
10     0    0    0   *8           1    1    0   *7           13                           0.933   1.000    0.778     0.222
11     0    0    0   *8           0    0    0   *9           13                           0.967   0.875    1.000    -0.125
12    *6    2    0    0          *6    2    0    1           7                            0.633   0.750    0.667     0.083
13    *7    0    0    1          *0    1    5    3           9                            0.533   0.875    0.000     0.875
14     0   *8    0    0           1   *7    1    0           11                           0.867   1.000    0.778     0.222
15     0    1    0   *7           2    1    3   *3           5                            0.500   0.875    0.333     0.542
16     0    1    0   *7           3    0    1   *5           10                           0.733   0.875    0.556     0.319
17    *7    0    1    0          *4    1    4    0           7                            0.600   0.875    0.444     0.431
18     0    0   *8    0           2    0   *7    0           10                           0.833   1.000    0.778     0.222
19     0    0   *8    0           2    1   *6    0           11                           0.833   1.000    0.667     0.333
20     0   *8    0    0           1   *8    0    0           13                           0.967   1.000    0.889     0.111
21     0   *7    0    1           0   *7    0    2           13                           0.900   0.875    0.778     0.097
22     0    0   *8    0           0    0   *9    0           13                           1.000   1.000    1.000     0.000
23     0    0    0   *8           0    0    0   *9           13                           1.000   1.000    1.000     0.000
24    *7    0    0    1          *9    0    0    0           11                           0.867   0.750    1.000    -0.250
25    *8    0    0    0          *5    1    0    3           10                           0.767   1.000    0.556     0.444
26     0   *8    0    0           1   *6    2    0           13                           0.900   1.000    0.667     0.333
27     0    0   *8    0           0    2   *7    0           13                           0.933   1.000    0.778     0.222
28    *8    0    0    0          *8    0    0    1           11                           0.900   1.000    0.889     0.111
29    *8    0    0    0          *7    0    2    0           12                           0.900   1.000    0.778     0.222
30    *5    0    3    0          *5    0    3    1           6                            0.533   0.625    0.556     0.069
31     2    0    0   *6           4    2    0   *3           8                            0.567   0.750    0.333     0.417
32    *8    0    0    0          *4    1    3    1           11                           0.767   1.000    0.444     0.556
33    *8    0    0    0          *4    2    1    2           9                            0.700   1.000    0.444     0.556
34     0   *8    0    0           0   *9    0    0           9                            0.867   1.000    1.000     0.000
35     0    0   *8    0           0    0   *9    0           13                           1.000   1.000    1.000     0.000
36    *8    0    0    0          *8    1    0    0           13                           0.967   1.000    0.889     0.111
37     0   *8    0    0           2   *7    0    1           11                           0.833   1.000    0.667     0.333
38     0    0   *8    0           1    1   *7    0           13                           0.967   1.000    0.889     0.111
39     0    0   *8    0           0    0   *9    0           13                           1.000   1.000    1.000     0.000
40    *4    0    0    4          *4    0    1    4           10                           0.633   0.500    0.556    -0.056

* Denotes Correct Answer

TABLE 4.1: Calculation of p-values and discrimination index for each item

From the table above, the p-value for each item is listed in the Pall column. The calculated p-values range from 0.467 to 1.000. Since this test is a norm-referenced test (NRT), the average difficulty index should be within 0.34 to 0.66. The difficulty indices were analysed using Henning's (1987) guidelines, as shown in the following table:

High (Difficult)      ≤ 0.33
Medium (Moderate)     0.34 – 0.66
Low (Easy)            ≥ 0.67

TABLE 4.2: Henning's Guidelines

An item with a p-value of 0.33 or less is considered a difficult item, an item with a p-value between 0.34 and 0.66 is a moderately difficult item, and an item with a p-value of 0.67 or more is an easy item.

Level of difficulty      Items                                                           Count
High (≤ 0.33)            –                                                               0
Medium (0.34 – 0.66)     1, 5, 12, 13, 15, 17, 30, 31, 40                                9
Low (≥ 0.67)             2, 3, 4, 6, 7, 8, 9, 10, 11, 14, 16, 18, 19, 20, 21, 22, 23,    31
                         24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39

TABLE 4.3: Item Categorisation Based on the Level of Difficulty

Based on the table above, most items fall into the low-difficulty category, which means this is an easy test. There is an imbalanced distribution of easy and difficult items: 78% of the items have a low level of difficulty, whereas none of the items has a high level of difficulty. However, the discrimination index should also be considered in analysing these test items.
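As an illustration (our own sketch, not the original worksheet), the difficulty index and Henning's bands can be computed as follows in Python; the example numbers for item 1 are taken from Table 4.1:

```python
# Difficulty index and Henning's (1987) difficulty bands.
def difficulty(n_correct: int, n_total: int) -> float:
    """P = Ncorrect / Ntotal, as defined in section 4.1."""
    return n_correct / n_total

def henning_band(p: float) -> str:
    if p <= 0.33:
        return "High (Difficult)"
    if p <= 0.66:
        return "Medium (Moderate)"
    return "Low (Easy)"

# Item 1 was answered correctly by 14 of the 30 students (5 upper + 1 lower + 8 middle).
p1 = difficulty(14, 30)
print(round(p1, 3), henning_band(p1))  # 0.467 Medium (Moderate)
```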

4.2 Item Discrimination

The index of discrimination (D) is the difference between the proportion of the upper group who answered an item correctly and the proportion of the lower group who answered it correctly. This index depends on the difficulty of the item, i.e. it is related to the item's index of difficulty. The D of each item in this test is calculated using the formula below:

D = Pupper – Plower

where:
D = item discrimination for an individual item
Pupper = item difficulty (p) for the upper group
Plower = item difficulty (p) for the lower group

Ebel's (1979) criteria for categorising discrimination indices are a widely quoted set of guidelines and are therefore used in this test analysis to categorise the 40 test items.

Discrimination index (D)     Description
≤ 0.19 (Bad)                 The item should be eliminated or completely revised.
0.20 – 0.29 (OK)             The item is marginal and needs revision.
0.30 – 0.39 (Good)           Little or no revision is required.
≥ 0.40 (Very Good)           The item is functioning quite satisfactorily.

TABLE 4.4: Ebel's Guidelines

Based on Ebel's guidelines in the table above, the 40 test items can be categorised as follows:

Discrimination Index (D)     Items                                                       Count
≤ 0.19 (Bad)                 2, 3, 4, 11, 12, 20, 21, 22, 23, 24, 28, 30, 34, 35, 36,    18
                             38, 39, 40
0.20 – 0.29 (OK)             6, 8, 9, 10, 14, 18, 27, 29                                 8
0.30 – 0.39 (Good)           5, 16, 19, 26, 37                                           5
≥ 0.40 (Very Good)           1, 7, 13, 15, 17, 25, 31, 32, 33                            9

TABLE 4.5: Item Categorisation Based on Discrimination Index

The results indicate that about 65% of the test items are weak in discriminating between the 'good' and 'weak' students; these items need to be looked at closely, as they may need considerable revision or elimination.
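The discrimination index and Ebel's bands can likewise be computed mechanically. A minimal Python sketch (our own illustration; the item 1 values come from Table 4.1):

```python
# Discrimination index D = Pupper - Plower and Ebel's (1979) bands.
def discrimination(p_upper: float, p_lower: float) -> float:
    return p_upper - p_lower

def ebel_band(d: float) -> str:
    if d >= 0.40:
        return "Very Good"
    if d >= 0.30:
        return "Good"
    if d >= 0.20:
        return "OK"
    return "Bad"

d1 = discrimination(0.625, 0.111)   # item 1
print(round(d1, 3), ebel_band(d1))  # 0.514 Very Good
```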

                          Item Difficulty
Item Discrimination       High (≤ 0.33)   Medium (0.34 – 0.66)   Low (≥ 0.67)
≤ 0.19 (Bad)              –               12, 30, 40             2, 3, 4, 11, 20, 21, 22, 23, 24, 28, 34, 35, 36, 38, 39
0.20 – 0.29 (OK)          –               –                      6, 8, 9, 10, 14, 18, 27, 29
0.30 – 0.39 (Good)        –               5                      16, 19, 26, 37
≥ 0.40 (Very Good)        –               1, 13, 15, 17, 31      7, 25, 32, 33

TABLE 4.6: Item Categorisation Based on the Relationship Between Item Discrimination and Item Difficulty

The table above shows the relationship between the item discrimination and the item difficulty of each item. The items in the 'Bad' discrimination row need to be revised or eliminated if they are intended to be kept for future use.

4.3 Distracter Analysis

Distracter analysis examines the proportion of students who selected each of the response options. According to Tucker (2007), “on a well-designed multiple choice item, high scoring students should select the correct option even from highly plausible distracters. Those who are ill-prepared should select randomly from available distracters. In this scenario, the item would be a good discriminator of knowledge and should be considered for future assessments. In other scenarios, a distracter analysis may reveal an item that was mis-keyed, contained a proofreading error, or contains a distracter that appears plausible even by those that scored well on an assessment”. The proportion for each option is calculated by using the following formula:

Prop = n / N

where:
Prop = proportion of students choosing the option
n = number of students choosing the option
N = total number of students in the group

The table below shows the proportion of students in the upper and lower groups who selected the correct answer, as well as the proportion of students choosing each alternative, for every item.


Item   Upper Group (8 Students)             Lower Group (9 Students)             Pall    Discrimination
       A       B       C       D            A       B       C       D                   Index
1      0.125   0.125  *0.625   0.125        0.111   0.556  *0.111   0.222        0.467   0.514
2      0.000  *1.000   0.000   0.000        0.000  *0.889   0.000   0.111        0.967   0.111
3      0.000  *1.000   0.000   0.000        0.000  *0.889   0.000   0.111        0.933   0.111
4     *0.750   0.125   0.000   0.125       *0.778   0.222   0.000   0.000        0.767  -0.028
5     *0.875   0.125   0.000   0.000       *0.556   0.444   0.000   0.000        0.533   0.319
6      0.000   0.125   0.250  *0.625        0.222   0.333   0.111  *0.333        0.667   0.292
7      0.000   0.000   0.000  *1.000        0.000   0.000   0.667  *0.333        0.667   0.667
8      0.000  *1.000   0.000   0.000        0.111  *0.778   0.111   0.000        0.900   0.222
9      0.000   0.000  *1.000   0.000        0.111   0.111  *0.778   0.000        0.900   0.222
10     0.000   0.000   0.000  *1.000        0.111   0.111   0.000  *0.778        0.933   0.222
11     0.000   0.000   0.000  *1.000        0.000   0.000   0.000  *1.000        0.967  -0.125
12    *0.750   0.250   0.000   0.000       *0.667   0.222   0.000   0.111        0.633   0.083
13    *0.875   0.000   0.000   0.125       *0.000   0.111   0.556   0.333        0.533   0.875
14     0.000  *1.000   0.000   0.000        0.111  *0.778   0.111   0.000        0.867   0.222
15     0.000   0.125   0.000  *0.875        0.222   0.111   0.333  *0.333        0.500   0.542
16     0.000   0.125   0.000  *0.875        0.333   0.000   0.111  *0.556        0.733   0.319
17    *0.875   0.000   0.125   0.000       *0.444   0.111   0.444   0.000        0.600   0.431
18     0.000   0.000  *1.000   0.000        0.222   0.000  *0.778   0.000        0.833   0.222
19     0.000   0.000  *1.000   0.000        0.222   0.111  *0.667   0.000        0.833   0.333
20     0.000  *1.000   0.000   0.000        0.111  *0.889   0.000   0.000        0.967   0.111
21     0.000  *0.875   0.000   0.125        0.000  *0.778   0.000   0.222        0.900   0.097
22     0.000   0.000  *1.000   0.000        0.000   0.000  *1.000   0.000        1.000   0.000
23     0.000   0.000   0.000  *1.000        0.000   0.000   0.000  *1.000        1.000   0.000
24    *0.875   0.000   0.000   0.125       *1.000   0.000   0.000   0.000        0.867  -0.250
25    *1.000   0.000   0.000   0.000       *0.556   0.111   0.000   0.333        0.767   0.444
26     0.000  *1.000   0.000   0.000        0.111  *0.667   0.222   0.000        0.900   0.333
27     0.000   0.000  *1.000   0.000        0.000   0.222  *0.778   0.000        0.933   0.222
28    *1.000   0.000   0.000   0.000       *0.889   0.000   0.000   0.111        0.900   0.111
29    *1.000   0.000   0.000   0.000       *0.778   0.000   0.222   0.000        0.900   0.222
30    *0.625   0.000   0.375   0.000       *0.556   0.000   0.333   0.111        0.533   0.069
31     0.250   0.000   0.000  *0.750        0.444   0.222   0.000  *0.333        0.567   0.417
32    *1.000   0.000   0.000   0.000       *0.444   0.111   0.333   0.111        0.767   0.556
33    *1.000   0.000   0.000   0.000       *0.444   0.222   0.111   0.222        0.700   0.556
34     0.000  *1.000   0.000   0.000        0.000  *1.000   0.000   0.000        0.867   0.000
35     0.000   0.000  *1.000   0.000        0.000   0.000  *1.000   0.000        1.000   0.000
36    *1.000   0.000   0.000   0.000       *0.889   0.111   0.000   0.000        0.967   0.111
37     0.000  *1.000   0.000   0.000        0.222  *0.778   0.000   0.111        0.833   0.333
38     0.000   0.000  *1.000   0.000        0.111   0.111  *0.778   0.000        0.967   0.111
39     0.000   0.000  *1.000   0.000        0.000   0.000  *1.000   0.000        1.000   0.000
40    *0.500   0.000   0.000   0.500       *0.444   0.000   0.111   0.444        0.633  -0.056

* Denotes Correct Answer

TABLE 4.7: The Proportion of Students in the Upper and Lower Group Who Selected Each Option
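As a sketch of how the proportions in Table 4.7 are derived (our own illustration; the item 1 counts come from Table 4.1, and each count is divided by the size of its group, Prop = n / N):

```python
# Per-option proportions for one item, computed separately for each group.
upper_counts = {"A": 1, "B": 1, "C": 5, "D": 1}   # 8 upper-group students
lower_counts = {"A": 1, "B": 5, "C": 1, "D": 2}   # 9 lower-group students

def proportions(counts: dict[str, int]) -> dict[str, float]:
    n_total = sum(counts.values())
    return {opt: round(n / n_total, 3) for opt, n in counts.items()}

print(proportions(upper_counts))  # {'A': 0.125, 'B': 0.125, 'C': 0.625, 'D': 0.125}
print(proportions(lower_counts))  # {'A': 0.111, 'B': 0.556, 'C': 0.111, 'D': 0.222}
```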

4.4 Kuder–Richardson 20

This statistic measures test reliability in terms of inter-item consistency. A KR-20 value ranges from 0 to 1: a higher value indicates a strong relationship between items on the test, while a lower value indicates a weak relationship between test items. Therefore, a test has better reliability when the KR-20 value is closer to 1. The KR-20 is calculated as follows:

KR-20 = (k / (k − 1)) × (1 − Σpq / S²)

where:
KR-20 = Kuder–Richardson 20
k = number of items in the test
p = item difficulty
q = 1 − p
S² = variance of the raw scores (standard deviation squared)

The value of KR-20 calculated for this test is 0.63, which suggests that the test is moderately reliable.
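A minimal Python sketch of the KR-20 computation above (our own illustration; p_values would be the 40 Pall values from Table 4.1 and the variance is the 14.16 reported in Table 3.1, so the result should land close to the reported 0.63, with small rounding differences expected):

```python
# KR-20 = (k / (k - 1)) * (1 - sum(p * q) / S^2), with q = 1 - p per item.
def kr20(p_values: list[float], raw_score_variance: float) -> float:
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)
    return (k / (k - 1)) * (1 - sum_pq / raw_score_variance)

# Usage: kr20(pall_values, 14.16) with the full list of 40 Pall values.
```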

5.0 COMMENTS AND SUGGESTIONS

In this chapter, we analyse the items in order to discover those that are ambiguous, miskeyed, too easy or too difficult, or non-discriminating. The purpose of this analysis is to enhance the technical quality of the examination by pointing out options that are nonfunctional and should be improved or eliminated.

5.1 INTERPRETING ITEM ANALYSIS DATA

5.1.1 Too easy items


Items that virtually everyone gets right are useless for discriminating among students and should be replaced by more difficult items or eliminated. This can be seen from the proportion of students answering an item correctly: items answered correctly by a large proportion of examinees have markedly reduced power to discriminate.

When all items are extremely easy, most test scores will be extremely high and will show very little variability; extreme p-values directly restrict the variability of test scores. When everyone taking the test chooses the correct response, as seen in Table 1, the item does not contribute to measuring individual differences: an item with a p-value of 0.0 or 1.0 is almost certain to be useless.

Example, Question 22:

Mei Ling   : Can I check out (21) this book?
Librarian  : Of course, you can. Is that all?
Mei Ling   : Yes, that's all. When must I return it?
Librarian  : The expiry date (22) is on the first page.
Mei Ling   : I can see it. Thank you.
Librarian  : Make sure when you bring back the book, it is as good as new. (23)

22. expiry date
A. date to go
B. time to leave
C. date to return
D. date of birth


Table 1: Maximum Item Difficulty Example Illustrating No Individual Differences

Group         A   B   C*  D
Upper group   0   0   8   0
Lower group   0   0   9   0

Note: * denotes correct response.
Item difficulty: (8 + 9)/17 = 1.00
Discrimination index: (8 − 9)/17 = −0.06

23. it is as good as new
A. it is still new
B. it is interesting
C. it is not dirty
D. it is in a good condition

Table 2: Minimum Item Difficulty Example Illustrating No Individual Differences

Group         A   B   C   D*
Upper group   0   0   0   8
Lower group   0   0   0   9

Note: * denotes correct response.
Item difficulty: (8 + 9)/17 = 1.00
Discrimination index: (8 − 9)/17 = −0.06

Based on Table 1 and Table 2, we suggest that these items must be improved by making the distracters more attractive, and that they should be replaced or eliminated in order to construct items that suit the students' level of knowledge.

5.1.2. The ambiguity of the options

One measure of item ambiguity is the extent to which students in the upper group select an incorrect option with about the same frequency as they select the correct one. Ambiguity defined in this way is the inability of the highest-scoring students on the test to discriminate between a "correct" alternative and one judged by the teacher to be "wrong". An ambiguous item could also be defined as one that allows for more than one "correct" alternative as judged by a group of experts, although a question that is clear to experts may be ambiguous to students who lack understanding of the item's content.

Example: Item 40.

40. What was the important moral value that the people of Dalat had learned from the incident that happened in their village?
A. to value peace
B. never listen to brothers
C. customs are a waste of time
D. they should obey their siblings

Table 3: Item Difficulty

Group         A*  B   C   D
Upper group   4   0   0   4
Lower group   4   0   1   4

Note: * denotes correct response. In Table 3, about equal numbers of top students went for A and D.

In the example, the item appears to be ambiguous because two of the options can be justified. When students in the upper portion of the class select a "correct" option and an "incorrect" option with about equal frequency, the item is ambiguous either because the students lack knowledge or because the options or the item itself are defective. Which of these reasons applies to a given item is determined by examining the highly selected but "incorrect" options to see whether more than one answer can be justified. Our suggestion for this item is to look at the favoured alternative again and see whether there is any reason students could justify choosing it.
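One way to flag such items automatically is sketched below (our own illustration, assuming per-option counts for the upper group are available; the item 40 counts come from Table 3):

```python
# Flag possibly ambiguous items: the upper group picks some incorrect
# option about as often as the keyed answer.
def looks_ambiguous(upper_counts: dict[str, int], key: str, tolerance: int = 1) -> bool:
    keyed = upper_counts[key]
    distracters = [n for opt, n in upper_counts.items() if opt != key]
    return any(abs(n - keyed) <= tolerance and n > 0 for n in distracters)

item40_upper = {"A": 4, "B": 0, "C": 0, "D": 4}  # key is A
print(looks_ambiguous(item40_upper, "A"))        # True: D drew as many top students as A
```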

5.1.3. Miskeying


When first marking the students' test papers, we identified a problem where students had given the correct answer but the item had been miskeyed by us. Miskeying is another common error that can be corrected before students' papers are returned. One way of detecting potentially miskeyed items is to examine the responses of the students in the upper portion of the class: an "incorrect" option selected by a large number of these students suggests a keying error, as in the following example:

28. A string must be tied to each __________ of the stick to make a frame.
*A. end (8)
B. left (0)
C. right (0)
D. bottom (0)

Because the majority of the most capable students selected "end" as the correct alternative and so few agreed with the "keyed" answer, the teacher should check the key for a possible error.
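A simple check for potential miskeying can be sketched as follows (our own illustration; the counts and key in the usage example are hypothetical, not taken from this test):

```python
# Flag the item when the upper group's modal choice is not the keyed option.
def possibly_miskeyed(upper_counts: dict[str, int], key: str) -> bool:
    modal = max(upper_counts, key=upper_counts.get)
    return modal != key

# Hypothetical counts: the key is B, but the top students cluster on A.
print(possibly_miskeyed({"A": 7, "B": 1, "C": 0, "D": 0}, "B"))  # True
```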

5.1.4. The effectiveness of distracters

Analyzing the distracters is useful in determining the relative usefulness of the decoys in each item.

Example: Item 5.

5. The word range means
A. vary
B. come
C. fly
D. watch

Table 4: Alternatives that aren't working

Group         A*  B   C   D
Upper group   7   1   0   0
Lower group   5   4   0   0

Note: * denotes correct response.

Based on the example, we can clearly identify that no one fell for options C and D. Option B is a good distracter; however, options C and D are not plausible alternatives. Our suggestion is that if a distracter elicits very few or no responses, it is not functioning as a distracter and should be replaced with a more attractive and realistic option.

Example: Item 6.

6. The objective of the workshop is to
A. visit parks
B. hold a talk
C. promoting membership
D. enlighten students

Table 5: Effective distracters

Group         A   B   C   D*
Upper group   0   1   2   5
Lower group   2   3   1   3

Note: * denotes correct response.
Item difficulty = 0.667
Discrimination index = 0.292

Distracters should be carefully examined when items show large positive p-values. Our comment is that items should be modified if students consistently fail to select certain multiple choice alternatives: such alternatives are probably totally implausible and therefore of little use as decoys. Some distracters may also be too appealing, making the item too difficult or hurting its discrimination; such an item can often be redeemed by revising one or two of the response options.

5.1.5. The presence of guessing

Based on the item analysis, we can identify that the students answered the questions without guessing or responding randomly. This is mainly because the test was given to an advanced class and the items given were easy.

6.0 CONCLUSIONS

Overall, we can say that this test is moderately reliable. However, most of the items are too easy to be given to advanced-level students. As a result, most of the items were not functioning very well and could not reliably differentiate between upper and lower students. Nevertheless, the item analysis statistics of item discrimination (ID) and item difficulty (IF) help us to decide which items to keep and which to discard in creating a new, revised version of the test. Moreover, the distracter efficiency analysis is also useful for spotting items that are miskeyed and for tuning up items that have options which are not working as expected.

7.0 REFERENCES

Ebel, R. L. (1979). Essentials of educational measurement (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Henning, G. (1987). A guide to language testing: Development, evaluation, research. London: Newbury House.

Tucker, S. (2007, September 21). Retrieved June 26, 2008, from University of Maryland: http://www.umaryland.edu/cits/testscoring/pdf/sop_deconstructingtestscoring.pdf


8.0 APPENDIX

