A221 SGDE4013 7 AllAboutItems 2 VNB

SGDE4013 Assessment in Learning
Items: The Building Blocks of Measurement Instruments 2
Nurliyana Bukhari, Ph.D.
School of Education
Semester A221 | First Semester 2022/2023
OUTLINE
Part 1: Item Difficulty (and Distractor Analyses)
Part 2: Item Discrimination
Part 3: Case Studies

Case Study 1
Case Study 2
Case Study 3
Part 4: Excel Calculations (MUST CHECK THE FILES PROVIDED!)
Part 1
Item Difficulty
(and Distractor Analyses)
Reviewing the Concept of Item Difficulty
• Recall that item difficulty reflects the portion of the trait continuum
for which an item provides information about person location.
• Low difficulty items provide information about individuals having
locations that are low on the trait continuum.
• High difficulty items provide information about individuals having
locations that are high on the trait continuum.
[Figure: two example trait continua, labeled "hedonism/hedonisme" and "basic arithmetic ability"]
Measuring Item Difficulty
• Item difficulty is measured by the mean item response across the
individuals in a sample or population.
• A low mean value indicates a high difficulty item (most people
score low on the item).
• A high mean value indicates a low difficulty item (most people
score high on the item).
The Scale of Item Difficulty: Dichotomous Items
• For the case of dichotomously scored items (i.e., multiple-choice items scored
as 0/1 for incorrect/correct) the mean item score is equal to the proportion
correct.
• We denote the proportion correct “p”, and call this the item’s p-value.
• Thus, the scale of difficulty for dichotomous items is:
p = 0 (High Difficulty) <-----> p = 1 (Low Difficulty)
• Note the inverse relationship between the p-value and difficulty.
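The p-value calculation can be sketched in a few lines of Python. This is only an illustration: the `responses` matrix and the function name `p_values` are hypothetical and are not part of the course files.

```python
# Hypothetical 0/1 scored responses: one row per examinee, one column per item.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]

def p_values(scored):
    """Return the proportion correct (the item mean) for each item column."""
    n_people = len(scored)
    n_items = len(scored[0])
    return [sum(row[j] for row in scored) / n_people for j in range(n_items)]

p = p_values(responses)
for j, value in enumerate(p, start=1):
    print(f"Item {j}: p = {value:.2f}")
```

In this made-up data the third item has the lowest p-value, so it is the most difficult item of the four.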
Interpreting Item Difficulty: Dichotomous Items
• p-value near 1 reflects very low difficulty (provides info about individuals
with very low trait level)
• p-value near 0 reflects very high difficulty (provides info about individuals
with very high trait level)
• p-value near 0.5 reflects moderate difficulty (provides info about individuals
with medium trait level)
Difficulty and Item Quality: Dichotomous Items
• If the p-value is too extreme, the item may not be providing information about
individuals in a relevant range of the trait continuum.
• Values near p = 0.9 reflect items that provide information about very low levels of the
target trait, and thus may not be a good use of an item.
• Typically, items in the 0.3 – 0.7 range generate info for a useful range of the trait
continuum (optimal level range).
• But sometimes you need a few items at more extreme difficulty values, so it’s a bit
of a judgment call that depends on the range of the trait continuum about which you
intend to make inferences.
Exercises
For each of the p-values for multiple-choice items scored correct-incorrect,
indicate the item difficulty (low, average, high) and the range of the trait
continuum (low, middle, high) about which the item provides information.
1. p = 0.5: average difficulty, middle range
2. p = 0.2: high difficulty, high range
3. p = 0.8: low difficulty, low range
4. p = 1.0: very low difficulty (extreme); no information, because everyone answered it correctly
Item Difficulty for Multiple-Choice Items
• Because the potential act of guessing on multiple-choice items can cause
complications, we anticipate that the p-value associated with the highest possible
difficulty item should not actually be zero.
• Rather, it should equal (roughly speaking) the chance of guessing (g) on the item.
• E.g., an item with 4 response options should have
• a lowest possible p-value ≈ 0.250 (the chance level, g = 1/4)
• an optimal level of p-value ≈ 0.625 (halfway between g and 1, i.e., (1 + g)/2).
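A small sketch of the two quantities above, assuming pure random guessing among equally attractive options. The function names `chance_level` and `optimal_p` are made up for illustration.

```python
def chance_level(n_options):
    """Chance of guessing correctly (g) when all options are equally likely."""
    return 1 / n_options

def optimal_p(n_options):
    """Optimal p-value: halfway between the chance level g and 1."""
    g = chance_level(n_options)
    return (1 + g) / 2

# Compare items with 2, 3, 4 and 5 response options.
for k in (2, 3, 4, 5):
    print(f"{k} options: g = {chance_level(k):.3f}, optimal p = {optimal_p(k):.3f}")
```

For four options this reproduces the values on the slide (g = 0.25 and an optimal level of 0.625).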
Exercises
• Consider a multiple-choice item with four response options (A, B, C, D). For what trait level does an
item with p = 0.3 provide information?
Answer: A fairly high trait level. The item is somewhat difficult, because p = 0.3 is only slightly above the
chance level (g = 0.25).
• Consider a multiple-choice item with two response options (e.g., a true/false item). What would you
expect the lowest possible p-value to be?
Answer: p = 0.50
• Consider a multiple-choice item with three response options (A, B, C) that has a p-value of 0.15. What
does this tell you about the item quality?
Answer: The p-value is below the chance level of 1/3 ≈ 0.33, which is roughly the lowest p-value expected
even for a very difficult item. This item should be flagged for further analyses (start with distractor analyses).
Distractors Analyses
Anatomy of a multiple-choice item:
STEM (statement of the question): Which number is both a factor of 100 and a multiple of 5?
OPTIONS (possible answers the students must select from), with KEY & DISTRACTOR ANALYSIS (rationale):
A. 4 (DISTRACTOR): Did not consider criteria of “multiple of 5”
B. 40 (DISTRACTOR): Did not consider criteria of “factor of 100”
C. 50 (KEY): Correct
D. 500 (DISTRACTOR): Multiplied 100 and 5. Also, it did not consider criteria of “factor of 100”
• In an educational test, if no student selects a particular distractor, is that distractor a
good or a bad distractor?
• What is your opinion on the stem of the item?
Item Analysis: Example (Item Difficulty)
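Abang Zaidi ke Kuala Lumpur untuk menghadiri temu duga dengan _____________ segulung ijazah yang dimilikinya.
(Malay-language item; roughly: "Abang Zaidi went to Kuala Lumpur to attend an interview _____ the degree he holds.")

A. berbekalkan    C. berlandaskan
B. bersandarkan   D. berpandukan

Number of students choosing each answer option (* = key; KT = high-scoring group, KR = low-scoring group):

Group    *A    B    C    D
KT        3    3    1    4
KR        1    2    1    5
TOTAL     4    5    2    9

Difficulty Index (p)
p = (number of students who answered correctly) / (total number of students) = (KT* + KR*) / N
p = 4 / 20 = 0.20

Chance level (g)
g = 1 / 4 = 0.25

Optimal level
opt = (1 + 0.25) / 2 = 0.625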
The Scale of Item Difficulty: Polytomous Items
• For the case of polytomously scored items (e.g., rating scale items, essay items,
performance-based items) the mean item score is used.
• Thus, the scale of difficulty for polytomous items is:
Mean = 0 (High Difficulty) <-----> Mean = J (Low Difficulty)
• Note that the item score levels are 0, 1, …, J.
• Note the inverse relationship between the mean and difficulty.
Interpreting Item Difficulty: Polytomous Items
• An item mean near J reflects very low difficulty (provides info
about individuals with very low trait level).
• An item mean near 0 reflects very high difficulty (provides info
about individuals with very high trait level).
• An item mean near J/2 reflects moderate difficulty (provides info
about individuals with medium trait level).
Of Course, It’s a Little More Complicated than that.
• Naturally, for polytomous items one also needs to consider all
response options, as the mean may not reflect everything that is
going on.
• So, it is important to examine the distribution of score levels
within each item to see which score levels are getting the most use.
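To see why the mean alone can mislead, here is a minimal sketch with made-up responses to one rating-scale item scored 0 to J (J = 4). The data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical responses to a single rating-scale item with score levels 0..J.
J = 4
item_scores = [0, 4, 0, 4, 4, 0, 0, 4, 2, 2]

# The item mean (the polytomous difficulty index).
mean_score = sum(item_scores) / len(item_scores)

# How often each score level was used, including levels nobody chose.
counts = Counter(item_scores)
distribution = {level: counts.get(level, 0) for level in range(J + 1)}

print("mean =", mean_score)
for level, n in distribution.items():
    print(f"score {level}: {'#' * n} ({n})")
```

The mean sits exactly at J/2, which looks like moderate difficulty, yet most respondents chose one of the two extreme levels and levels 1 and 3 were never used.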
Difficulty and Item Quality: Polytomous Items
• If the item mean is too extreme, the item may not be providing information
about individuals in a relevant range of the trait continuum.
• Values near 0 or J reflect items that provide information about very extreme
levels of the target trait, and thus may not be a good use of an item.
• But sometimes you need a few items at more extreme difficulty values, so
it’s a bit of a judgment call that depends on the range of the trait continuum
about which you intend to make inferences.
Exercises
For each situation, specify whether the item difficulty is low,
moderate, or high:
1. A rating scale item with five score levels (0,1,2,3,4) has a mean of 3.8.
Low difficulty (an easy item); the midpoint is J/2 = 4/2 = 2.00.
2. A rating scale item with three score levels has a mean of 0.5.
High difficulty (a difficult item); with score levels 0, 1, 2 the midpoint is J/2 = 2/2 = 1.00.
3. A rating scale item with seven score levels has a mean of 3.
Moderate difficulty (an average item); with score levels 0, 1, …, 6 the midpoint is J/2 = 6/2 = 3.00.
Consideration for Instrument Development
• Ultimately, you want your item difficulties to align with the intended uses of
the instrument.
• If you want to generate good information across a very wide range of trait
levels, you would want to have a very good range of item difficulties.
• If you want to generate very high information at a specific trait level (a
cut-score or a standard), then you would want to have lots of items with a
difficulty that differentiates between individuals at the location of the trait
level of interest.
Part 2
Item Discrimination
Measuring Item Discrimination
• Item discrimination concerns how the response options of the item
discriminate between people with different levels of the latent trait.
• Discrimination is usually measured by the correlation between Yi (the score on item i)
and X (the total score). This is the item-total correlation (ITC).
• Often, we use an X that is adjusted by removing Yi from the computation of X
(i.e., X = sum of all items other than Yi) to avoid a positive bias. This is called
the corrected item-total correlation.
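The difference between the two correlations can be sketched as follows. The data are hypothetical, and `pearson` and `item_total_correlations` are illustrative helper names, not SPSS functions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_total_correlations(scored, item):
    """Return (non-corrected ITC, corrected ITC) for one item column."""
    y = [row[item] for row in scored]            # Yi: scores on the item
    total = [sum(row) for row in scored]         # X: total score, item included
    rest = [t - yi for t, yi in zip(total, y)]   # X with Yi removed
    return pearson(y, total), pearson(y, rest)

# Hypothetical 0/1 scored responses (rows = persons, columns = items).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

uncorrected, corrected = item_total_correlations(responses, item=0)
print(f"non-corrected ITC = {uncorrected:.3f}, corrected ITC = {corrected:.3f}")
```

For a 0/1 item this Pearson correlation is the point-biserial correlation. In this example the corrected value is the smaller of the two, because the item no longer contributes to the total it is correlated with.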
Item Analysis: Example (Item Discrimination)
Abang Zaidi ke Kuala Lumpur untuk menghadiri temu duga dengan _____________ segulung ijazah yang dimilikinya.

A. berbekalkan    C. berlandaskan
B. bersandarkan   D. berpandukan

Number of students choosing each answer option (* = key; KT = high-scoring group, KR = low-scoring group):

Group    *A    B    C    D
KT        3    3    1    4
KR        1    2    1    5
TOTAL     4    5    2    9

Discrimination Index (D)
D = (number in KT who answered correctly − number in KR who answered correctly) / (number of students in KT)
D = (KT* − KR*) / N_KT
D = (3 − 1) / 11
D = 0.182
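The hand calculation can be checked with a short sketch that uses the option counts from the table above. The variable names are illustrative only.

```python
# Number of students choosing each option, copied from the table above.
high_group = {"A": 3, "B": 3, "C": 1, "D": 4}  # KT: high-scoring group
low_group = {"A": 1, "B": 2, "C": 1, "D": 5}   # KR: low-scoring group
key = "A"                                       # the correct answer

n_high = sum(high_group.values())
n_low = sum(low_group.values())
n_total = n_high + n_low

# Difficulty index: proportion of all students who chose the key.
p = (high_group[key] + low_group[key]) / n_total

# Discrimination index: difference in correct answers, relative to the high group size.
D = (high_group[key] - low_group[key]) / n_high

# Distractors that no student selected (candidates for revision).
unused = [opt for opt in high_group
          if opt != key and high_group[opt] + low_group[opt] == 0]

print(f"p = {p:.2f}, D = {D:.3f}, unused distractors = {unused}")
```

This gives p = 0.20 and D = 0.182 as on the slides, and shows that every distractor was chosen by at least one student.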
Distractors (and Item) Analyses 3 (cont…): Example of Interpretations
• The key is A, “berbekalkan”.
• Four students in the high-ability group chose distractor D, “berpandukan”, as the answer. This may
reflect confusion: they believed it was the correct answer.
• Every option was chosen by students in both the high-ability and the low-ability groups.
• The difficulty index (p = 0.20) falls outside the 0.30-0.70 range, which indicates a difficult item:
only 20% of the students answered it correctly.
• The discrimination index (D = 0.18, about 0.2) is low, that is, LESS than 0.30.
• Overall, this is a difficult item (p = 0.20) with fairly low discrimination between students
(D = 0.18). The item may need to be reviewed to decide whether to keep it or to discard it and
replace it with another item. As the item writers, we decided to keep the item because it is
difficult and does not have a negative discrimination index. All of the distractors were also
found to be functioning.
Correlation Coefficient: Point-biserial:
Dichotomous Item-Total Correlation (ITC)
[Figure: Example Data A]
Correlation Coefficient: Point-biserial:
Dichotomous ITC (cont…)
As mentioned before, when we conduct item analysis, we will use item-total correlation (ITC) a lot.
In SPSS, we can do item analysis easily: SPSS provides the corrected ITC instead of the pure bivariate
correlation between item score and total score (i.e., the regular, non-corrected ITC).
The corrected ITC is the correlation between each item and a scale score (i.e., SAS_score) that
excludes that item (uses all the other items, but not that one) to adjust for bias.
That is why the corrected ITC is usually smaller than the non-corrected ITC and is usually reported.
[Figure: SPSS output showing the corrected ITC next to the non-corrected ITC (bivariate correlation)]
Correlation Coefficient: Pearson’s r:
Polytomous Item-Total Correlation (ITC)
[Figure: Example Data B]
Correlation Coefficient: Pearson’s r:
Polytomous ITC (cont…)
Notice again: The corrected ITC is usually smaller than the non-corrected ITC and is usually reported.
[Figure: SPSS output showing the corrected ITC next to the non-corrected ITC (bivariate correlation)]
Item Discrimination Guidelines
• Guidelines:
• ITC < .2: very low (especially for polytomous items)
• ITC > .7: very high (especially for dichotomous items)
• Multiple-choice items tend to have lower discriminations, so .2 to .5 is a typical range
(some books say .3 and above).
• Items with very low (or zero) discriminations can be removed.
• Substantial negative discrimination usually indicates an error in coding.
Reviewing the Concept of Item Discrimination
(and item difficulty)
[Figure: items A, B, C, and D drawn as arrows along the Target Trait axis, from (low) to (high).
Item discrimination: height of arrow. Item difficulty: location of arrow.]
A : low difficulty, very low discrimination
B : moderate difficulty, high discrimination
C : moderate difficulty, low discrimination
D : high difficulty, moderate discrimination
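Let’s Do An Exercise
[Figure: the item and its four answer options (A, B, C, D)]

Group               A    B   *C    D
KT (high group)     0    0   10    0
KR (low group)      0    0   10    0
TOTAL               0    0   20    0
*Key

Suggested Revision
Distractor Analyses
[Figure: revised item, labeled 3 and 5, with answer options 9, 15, 16, 25]
A. 3² = 9
B. 3 × 5 = 15 (correct answer)
C. 3 + 5 + 3 + 5 = 16
D. 5² = 25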
Another Exercise
• The difficulty index is 0.556, which falls in the 0.41 - 0.60 band, so the item is
moderately difficult.
• The discrimination index is 0.444, which exceeds 0.40, indicating good discrimination.
• Distractor analysis: distractor A functions fairly well because one student from the low
group took it to be the answer. Distractor C functions well because more students from the
low group took it to be the answer. However, distractor D is a weak distractor because
students from both groups chose it. This may be due to confusion.
• The item can therefore be classed as a quality item because no more than 60% of the
students answered it correctly (only 56% did). The item discrimination is very high at
0.444, which shows the item is very good: more high-ability students than low-ability
students answered it correctly. This indicates that the distractors succeeded in
confusing the low-ability students.
Examining Item Difficulty and Discrimination Using SPSS
1. Go to the “Scale” option of the Analyze menu, and then select “Reliability
Analysis…”.
2. Once in the Reliability Analysis window, select the items of the instrument.
3. Click on “Statistics…”, and then, under “Descriptives for”, check “Item”, “Scale”, and
“Scale if item deleted”.
4. Click on “Continue” to close the “Statistics” window, and then “OK” to run
the analysis.
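For readers who want to see what the "Scale" and "Scale if item deleted" statistics involve, here is a rough sketch of Cronbach's alpha and alpha-if-item-deleted in plain Python. It is not the SPSS procedure itself; the data and function names are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scored):
    """Cronbach's alpha for a list of person rows (one score per item)."""
    k = len(scored[0])
    item_vars = [variance([row[j] for row in scored]) for j in range(k)]
    total_var = variance([sum(row) for row in scored])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(scored, item):
    """Alpha recomputed after dropping one item column."""
    reduced = [row[:item] + row[item + 1:] for row in scored]
    return cronbach_alpha(reduced)

# Hypothetical rating-scale data scored 0-3 (rows = persons, columns = items).
# The last item is deliberately unrelated to the other three.
data = [
    [3, 2, 3, 0],
    [2, 2, 2, 3],
    [2, 1, 2, 1],
    [1, 1, 1, 2],
    [1, 0, 1, 3],
    [0, 0, 0, 0],
]

print(f"alpha (all items) = {cronbach_alpha(data):.3f}")
for j in range(len(data[0])):
    print(f"alpha if item {j + 1} deleted = {alpha_if_item_deleted(data, j):.3f}")
```

In this made-up data, deleting the fourth item raises alpha, which is the pattern to look for in the "Cronbach's Alpha if Item Deleted" column.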
Part 3a
Case Study 1
Case Study 1: SPSS
A researcher has developed an instrument to measure social anxiety. This measure

is called the Social Anxiety Scale (SAS). The SAS consists of 20 rating scale items,
each having four score levels coded as 0, 1, 2, 3. To evaluate the properties of the
SAS, the researcher administered it to 500 individuals. The resulting data is
contained in the SPSS data file called “CaseStudy1_Data.sav”. In this file, the 20 SAS
items are labeled V1, V2, …, V20. There are no missing responses, so each
respondent has a response to each of the 20 items of the SAS.
Case Study 1: SPSS (cont…)
Based on this output, does everything look O.K. with respect to any error in data
entry (e.g., are there any values that fall outside of the acceptable range of 0-3)?
What is the minimum and maximum value of the observed scale assigned to the
anxiety continuum as defined by the SAS?
[SPSS output: Descriptive Statistics. Callout: The mean represents the item difficulty of each item.]
Case Study 1: SPSS (cont…)
Compute the observed summated SAS score for each individual in the data file
(Xp for each person). Do this using the “Sum” function in SPSS. You can call this
variable “SAS_Score”.
In SPSS, go to Transform > Compute Variable… > Target Variable: “SAS_Score” >
Function group: Statistical > Functions and Special Variables: Sum
Examine the distribution of SAS_Score for the sample of 500 respondents by

creating a histogram in SPSS.
To create a histogram for SAS_Score in SPSS, go to Analyze > Descriptive Statistics >
Frequencies > Variable(s): “SAS_Score” > Charts: Histogram, Show normal curve on histogram.
[Figure: histogram of SAS_Score]
The observed scale ranges from 0-60 (20 items x 3 max score = 60 total score).
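The same three steps (range check, summated score, histogram) can be sketched outside SPSS. The data below are randomly generated stand-ins, not the contents of CaseStudy1_Data.sav, and all names are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # simulated data for illustration only

N_ITEMS, MAX_SCORE, N_PEOPLE = 20, 3, 500
data = [[random.randint(0, MAX_SCORE) for _ in range(N_ITEMS)]
        for _ in range(N_PEOPLE)]

# Data-entry check: list any (person, item) cell outside the 0-3 range.
out_of_range = [(person, item)
                for person, row in enumerate(data)
                for item, value in enumerate(row)
                if not 0 <= value <= MAX_SCORE]

# Summated score per person (the SAS_Score variable).
sas_score = [sum(row) for row in data]

# Text histogram in bins of width 5; each '#' stands for 5 respondents.
bins = Counter((score // 5) * 5 for score in sas_score)
for start in sorted(bins):
    print(f"{start:2d}-{start + 4:2d} {'#' * (bins[start] // 5)}")

print("min =", min(sas_score), "max =", max(sas_score))
```

Whatever the data, the summated score can never fall outside 0 to 60 when every item is within 0 to 3, so a value outside that range would point to a data-entry error.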

Case Study 1: SPSS (cont…): Item Analysis in Detail
In SPSS, go to Analyze > Scale > Reliability Analysis. Click Statistics and check all
buttons Descriptives for:
Item
Scale
Scale if Item Deleted
We are now doing this to introduce you to some basic concepts for item analyses.
Examine the output especially the table with (Corrected) Item-Total Correlation
Case Study 1: SPSS
Organizing the SPSS Output based on APA format
Table 1. Descriptive Statistics and Item Analyses for Social Anxiety Scale (SAS)

Item   Mean    SD      Corrected Item-Total   Cronbach’s alpha
                       Correlation            if item deleted
V1     1.79    .975    .455                   .838
V2     1.50    1.419   .081                   .859
V3     1.01    1.226   .289                   .846
V4     1.74    .993    .507                   .836
V5     1.39    .864    .720                   .829
V6     1.73    1.004   .530                   .835
V7     1.03    1.191   .492                   .836
V8     1.71    1.146   .514                   .835
V9     2.64    .698    .421                   .840
V10    1.37    .942    .747                   .826
V11    2.44    .798    .537                   .836
V12    1.54    1.449   .075                   .860
V13    1.48    1.068   .629                   .830
V14    1.73    1.013   .482                   .836
V15    2.24    .893    .526                   .835
V16    .75     .942    .718                   .828
V17    1.50    1.230   .410                   .840
V18    .93     1.129   .368                   .841
V19    .32     .669    .469                   .839
V20    1.49    1.309   .235                   .849

Scale Reliability (Cronbach’s alpha for SAS)   .846
Scale Mean                                     30.330
Scale SD                                       10.780
N                                              500

Note: The mean represents the item difficulty of each item. The corrected item-total
correlation represents the item discrimination of each item.
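As an illustration of how the discrimination guidelines could be applied to Table 1, the sketch below flags items with a corrected ITC below .2 and items whose removal would raise alpha above the overall .846. The values are copied from the table. The cut-off and the variable names are choices made for this example, not a rule prescribed in the slides.

```python
# item: (corrected item-total correlation, Cronbach's alpha if item deleted)
table1 = {
    "V1": (.455, .838), "V2": (.081, .859), "V3": (.289, .846), "V4": (.507, .836),
    "V5": (.720, .829), "V6": (.530, .835), "V7": (.492, .836), "V8": (.514, .835),
    "V9": (.421, .840), "V10": (.747, .826), "V11": (.537, .836), "V12": (.075, .860),
    "V13": (.629, .830), "V14": (.482, .836), "V15": (.526, .835), "V16": (.718, .828),
    "V17": (.410, .840), "V18": (.368, .841), "V19": (.469, .839), "V20": (.235, .849),
}
SCALE_ALPHA = .846   # Cronbach's alpha for the full 20-item scale
ITC_CUTOFF = .2      # "very low" discrimination guideline

# Items with very low discrimination.
low_itc = [item for item, (itc, _) in table1.items() if itc < ITC_CUTOFF]

# Items whose deletion would raise the scale reliability.
raises_alpha = [item for item, (_, alpha) in table1.items() if alpha > SCALE_ALPHA]

print("Corrected ITC below .2:", low_itc)
print("Alpha improves if deleted:", raises_alpha)
```

V2 and V12 are flagged on both criteria, which makes them the first candidates for review. V20 appears only on the second list.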
Part 3b
Case Study 2
Case Study 2
A researcher is developing a 36-item test of heart health awareness (HHA test) that
will be used to evaluate whether patients having heart disease are aware of various
issues associated with improving the health of their heart. Each item is a multiple-
choice item. The researcher administers the HHA test to a sample of 2,000
individuals. The data are contained in “CaseStudy2_Data.sav”.
Case Study 2 (cont…)
Explore the data using “Descriptive Statistics”
Compute the observed summated score for each individual in the data file
Examine the distribution of the summated score for the sample of 2,000
respondents by creating a histogram in SPSS
How is the data in Case Study 2 different from the data in Case Study 1?
Part 3c
Case Study 3
Case Study 3: SPSS
Download the file “CaseStudy3_Data.sav”. The file contains responses to 35 test items by 565
test takers. There are 36 variables in the data file. PersonID is the subject identifier. Variables
A1 to D9 are the response data on the 35 test items.
The 35 items were written to measure four constructs, conveniently labeled A, B, C, and D. Items
(variables) A1 to A10 are intended to measure Construct A. Items B1 to B8 were written to
measure Construct B. Items C1 to C8 ideally measure Construct C and Items D1 to D9
supposedly measure Construct D.
Statistically analyze and evaluate these 35 items. That is, conduct an appropriate item analysis of
the items and of the four intended scales (A, B, C, and D). Do all of the items appear to work well
in measuring the four intended constructs? And, if not, which items might be discarded and
why? How do you know that throwing out those items would improve the scale properties?
How WELL is each of the constructs measured in terms of its reliability?
Case Study 3: SPSS (cont…)
Selected Results
(Reliability & Inter-Item Correlations)

Construct        Scale Mean   Scale SD   Scale Reliability   Number of Items
All Constructs   71.71        12.33      .93                 35
A                20.06        4.27       .87                 10
B                15.84        3.24       .82                 8
C                16.97        3.73       .85                 8
D                18.84        4.17       .87                 9

Inter-item correlations, Construct A
      A1     A2     A3     A4     A5     A6     A7     A8     A9     A10
A1    1
A2    .459   1
A3    .254   .252   1
A4    .331   .387   .231   1
A5    .417   .454   .291   .325   1
A6    .442   .530   .290   .344   .472   1
A7    .492   .505   .304   .400   .548   .550   1
A8    .396   .449   .253   .279   .433   .472   .452   1
A9    .477   .562   .281   .350   .461   .540   .527   .474   1
A10   .264   .321   .142   .268   .348   .319   .328   .272   .370   1

Inter-item correlations, Construct B
      B1     B2     B3     B4     B5     B6     B7     B8
B1    1
B2    .458   1
B3    .319   .338   1
B4    .288   .305   .206   1
B5    .351   .414   .249   .240   1
B6    .453   .476   .329   .236   .448   1
B7    .444   .448   .351   .291   .369   .428   1
B8    .330   .328   .261   .311   .379   .410   .389   1

Inter-item correlations, Construct C
      C1     C2     C3     C4     C5     C6     C7     C8
C1    1
C2    .408   1
C3    .328   .300   1
C4    .322   .212   .301   1
C5    .565   .437   .390   .332   1
C6    .564   .415   .410   .275   .573   1
C7    .485   .391   .369   .295   .558   .529   1
C8    .482   .398   .269   .300   .471   .481   .505   1

Inter-item correlations, Construct D
      D1     D2     D3     D4     D5     D6     D7     D8     D9
D1    1
D2    .393   1
D3    .390   .532   1
D4    .324   .443   .397   1
D5    .327   .369   .389   .341   1
D6    .428   .536   .492   .406   .354   1
D7    .399   .531   .498   .423   .336   .481   1
D8    .429   .608   .547   .481   .384   .568   .553   1
D9    .275   .454   .448   .340   .309   .418   .417   .481   1
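One way to sanity-check a reported scale reliability against its inter-item correlations is the standardized alpha, which depends only on the number of items k and the mean inter-item correlation. This is an assumed cross-check, not necessarily the statistic SPSS reported above, so small differences are expected. The sketch uses the Construct B correlations. The variable names are illustrative.

```python
# Inter-item correlations for Construct B (lower triangle), copied from the results above.
b_correlations = [
    .458,
    .319, .338,
    .288, .305, .206,
    .351, .414, .249, .240,
    .453, .476, .329, .236, .448,
    .444, .448, .351, .291, .369, .428,
    .330, .328, .261, .311, .379, .410, .389,
]
k = 8  # number of items in Construct B

# Mean of the k(k-1)/2 distinct inter-item correlations.
mean_r = sum(b_correlations) / len(b_correlations)

# Standardized alpha: k * mean_r / (1 + (k - 1) * mean_r).
alpha_std = k * mean_r / (1 + (k - 1) * mean_r)

print(f"mean inter-item r = {mean_r:.3f}, standardized alpha = {alpha_std:.3f}")
```

The result comes out at about .81, close to the reported scale reliability of .82 for Construct B.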
Case Study 3 SPSS (cont…)
Selected Results (Item Analyses: All Items & For Each Construct)
Paste & organize your SPSS outputs…

Folder & Files for Part 3
FOLDER: CaseStudies_SPSSExcel
CaseStudy1_Data.sav
CaseStudy1_Data.xlsx
CaseStudy2_Data.sav
CaseStudy3_Data.sav
links
SPSS & Excel Demo Case Study 1 (English)
Part 1 https://www.youtube.com/watch?v=AG5nCX8_Nxk&t=4s
Part 2 https://www.youtube.com/watch?v=4xRlLj8plkc&t=607s
Part 3 https://www.youtube.com/watch?v=hB3d0dpzXm0&t=97s
Part 4 https://www.youtube.com/watch?v=qpsox5xr6hc&t=253s
Part 5 https://www.youtube.com/watch?v=3zwfQYkr6fk&t=5s
Part 4
Excel Calculations
Folder & Files for Part 4
FOLDER: Data4ItemAnalyses_Excel_SPSS
A172_SGDY_GroupB_Quiz1_DataEntry_Scoring_Organization_v2.xlsx
CaseStudy1_Data.xlsx
A172_SGDY_GroupB_Quiz1_ItemAnalyses_v2.xlsx
ItemAnalyses_inSPSS_CopiedFromExcel.sav
links
Videos explaining the Excel files (English & Bahasa Malaysia)
Part 1 https://www.youtube.com/watch?v=1UWgODuRpYE&t=2s
Part 2 https://www.youtube.com/watch?v=VD_Nd6OxgJc
Additional links
SPSS & Excel Demo Data Entry (Bahasa Malaysia)
These videos provide materials on how to do item analysis calculations
(by hand or in Excel) based on formulas.
Part 1 https://www.youtube.com/watch?v=ez38bR7rvrM&t=625s
Part 2 https://www.youtube.com/watch?v=n4YSG8sX5io&t=16s
Finally!
RECAP
Part 1: Item Difficulty (and Distractor Analyses)
Part 2: Item Discrimination
Part 3: Case Studies

Case Study 1
Case Study 2
Case Study 3
Part 4: Excel Calculations (MUST CHECK THE FILES PROVIDED!)
nurliyana@uum.edu.my
nurliyana.bukhari@alumni.uncg.edu