Reliability
Characteristics of Good Assessments
- Reliability
- Validity
- Usability
Correlation coefficient
A statistic that indicates the degree of relationship
between any two sets of scores
r = [N ΣXY − (ΣX)(ΣY)] / √([N ΣX² − (ΣX)²][N ΣY² − (ΣY)²])
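The raw-score formula above can be sketched in a few lines of Python; the two score lists in the example call are hypothetical test/retest scores, not data from these slides:

```python
from math import sqrt

def pearson_r(xs, ys):
    # Raw-score form of the Pearson correlation coefficient:
    # r = [N*sum(XY) - sum(X)*sum(Y)]
    #     / sqrt([N*sum(X^2) - (sum X)^2][N*sum(Y^2) - (sum Y)^2])
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# hypothetical scores for five students on two administrations
r = pearson_r([70, 80, 90, 60, 75], [72, 78, 88, 65, 74])
print(round(r, 3))
```

A coefficient near +1 indicates the two sets of scores rank the students almost identically, which is what a reliability coefficient expresses.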
Test A              Test B
3 + 4×6 =           5×4 + 6 =
5 − 2(4 − 2) =      2(7 − 3) − 6 =
Split-half Method
Requires administration of a single form of assessment; easy to administer with a traditional test or quiz of 10 or more items.
Student   Item responses (1–10)   Odd half   Even half
Elsa      1 1 1 1 0 1 0 0 1 0        3          3
Anna      0 1 1 1 1 1 0 1 1 0        3          4
Olof      0 1 0 1 0 1 0 1 1 0        1          4
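The odd/even split shown for Elsa, Anna, and Olof can be reproduced directly; this minimal sketch sums the odd-numbered and even-numbered items for each student:

```python
# Split-half sketch: odd/even item scores from the example responses.
responses = {
    "Elsa": [1, 1, 1, 1, 0, 1, 0, 0, 1, 0],
    "Anna": [0, 1, 1, 1, 1, 1, 0, 1, 1, 0],
    "Olof": [0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
}

for name, items in responses.items():
    odd = sum(items[0::2])   # items 1, 3, 5, 7, 9
    even = sum(items[1::2])  # items 2, 4, 6, 8, 10
    print(name, odd, even)
```

In practice the two half-scores would then be correlated (and adjusted for test length) to estimate the reliability of the full test.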
k = number of items
n = number of examinees
p = proportion of correct answers for an item = (number of examinees answering the item correctly) / (total number of examinees)
m = mean total score = ΣX / n
s² = variance of total scores = Σ(X − m)² / n
Example data for KR20 (rows = students, columns = items; 1 = correct, 0 = incorrect):

Student   Item: 1 2 3 4 5
1               1 1 1 1 1
2               1 1 1 1 1
3               1 1 1 1 1
4               1 1 0 0 1
5               0 1 1 1 1
6               0 1 1 0 0
7               1 1 1 1 1
8               1 1 0 0 1
9               1 1 0 0 1
10              0 1 1 1 1
Worked computation (X = student total score; n = 10 students, k = 5 items):

Student   Item: 1 2 3 4 5    X     X − m   (X − m)²
1               1 1 1 1 1    5      1.1      1.21
2               1 1 1 1 1    5      1.1      1.21
3               1 1 1 1 1    5      1.1      1.21
4               1 1 0 0 1    3     −0.9      0.81
5               0 1 1 1 1    4      0.1      0.01
6               0 1 1 0 0    2     −1.9      3.61
7               1 1 1 1 1    5      1.1      1.21
8               1 1 0 0 1    3     −0.9      0.81
9               1 1 0 0 1    3     −0.9      0.81
10              0 1 1 1 1    4      0.1      0.01
Right           7 10 7 6 9   ΣX = 39        Σ(X − m)² = 10.90
p               0.7 1.0 0.7 0.6 0.9
1 − p           0.3 0.0 0.3 0.4 0.1
p(1 − p)        0.21 0.00 0.21 0.24 0.09   Σp(1 − p) = 0.75

m = ΣX / n = 39 / 10 = 3.9
s² = Σ(X − m)² / n = 10.90 / 10 = 1.09

KR20 = [k / (k − 1)] × [1 − Σp(1 − p) / s²]
     = (5 / 4) × (1 − 0.75 / 1.09)
     ≈ 0.39
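The whole KR-20 calculation can be checked with a short script; this is a minimal sketch that recomputes the item proportions, the score variance, and the KR-20 coefficient directly from the response matrix:

```python
# KR-20 sketch using the 10-student x 5-item response matrix from the example.
data = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
]

n = len(data)      # number of examinees
k = len(data[0])   # number of items

totals = [sum(row) for row in data]             # X for each student
m = sum(totals) / n                             # mean total score
s2 = sum((x - m) ** 2 for x in totals) / n      # population variance of totals

p = [sum(row[j] for row in data) / n for j in range(k)]  # per-item proportion correct
sum_pq = sum(pj * (1 - pj) for pj in p)

kr20 = (k / (k - 1)) * (1 - sum_pq / s2)
print(round(m, 2), round(s2, 2), round(kr20, 2))
```

Automating the arithmetic like this makes it easy to catch slips in the hand computation of Σ(X − m)².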
Interrater Consistency
Consistency of judgments across equally competent raters.
Factors Influencing the Reliability Coefficient:
- Number of assessment tasks
- Spread of scores
- Objectivity
- Method of estimating
Factors Influencing Reliability Measures
Number of assessment tasks (i.e., number of items)
- The larger the number of assessment tasks (the larger the number of items), the higher the reliability coefficient.
- A longer assessment provides a more adequate sample of the behavior being measured.
- Scores based on a larger number of tasks are more apt to reflect real differences in ability, and are thus more stable.
- Caveat: lengthen the assessment only by adding quality items.
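The relationship between test length and reliability can be quantified with the Spearman-Brown prophecy formula, a standard result not shown in these slides; the starting reliability of 0.60 below is an arbitrary illustration:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by a factor of n (assuming the added items are of
# comparable quality to the existing ones).
def spearman_brown(r, n):
    return n * r / (1 + (n - 1) * r)

# doubling a test whose reliability is 0.60
print(round(spearman_brown(0.60, 2), 2))  # -> 0.75
```

Note the formula assumes the new items behave like the old ones, which is exactly the caveat above: lengthening only helps when the added items are of comparable quality.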
Factors Influencing Reliability Measures
Spread of scores
- The greater the spread of scores, the larger the reliability coefficient.
- Larger reliability results when examinees stay in the same relative position in the group from one assessment to another.

Score A   Score B
  67        87
  68        45
  73        38
  72        78
  69        56
  70        90
  68        55
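The two score columns above can be compared numerically; a quick sketch using Python's statistics module shows that Score B is far more spread out than Score A:

```python
from statistics import pvariance

# Score columns from the spread-of-scores example.
score_a = [67, 68, 73, 72, 69, 70, 68]
score_b = [87, 45, 38, 78, 56, 90, 55]

# Population variance as a measure of spread: larger variance means
# more room for examinees to hold distinct relative positions.
print(round(pvariance(score_a), 1), round(pvariance(score_b), 1))
```

The tightly clustered Score A leaves little room to distinguish examinees reliably, while the widely spread Score B does.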
Factors Influencing Reliability Measures
Objectivity
- Objectivity = the degree to which equally competent scorers obtain the same results.
- Scores on objective items (true-false, MCQ) have higher reliability than performance assessments because they are not affected by subjective scoring procedures.
Factors Influencing Reliability Measures
Method of estimating reliability
- The equivalent-forms method with a time interval takes the most sources of variation into account. It is therefore more rigorous than the other methods; smaller reliability coefficients are expected, and it is unfair to compare them with coefficients from other methods.
- Larger reliability coefficients are typically reported with the split-half method.