
PGT 202E: BASIC MEASUREMENT AND EVALUATION

Reliability

Characteristics of Good Assessments: Reliability, Validity, Usability
Correlation Coefficient

A statistic that indicates the degree of relationship between any two sets of scores:

$$ r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left[N(\sum X^2) - (\sum X)^2\right]\left[N(\sum Y^2) - (\sum Y)^2\right]}} $$
The larger the value of r, the stronger the relationship between the two sets of scores.
23/2 (X)    27/2 (Y)    X²             Y²             XY
76          68          76² = 5776     68² = 4624     76 × 68 = 5168
54          46          54² = 2916     46² = 2116     54 × 46 = 2484
62          68          62² = 3844     68² = 4624     62 × 68 = 4216
80          88          80² = 6400     88² = 7744     80 × 88 = 7040
73          67          73² = 5329     67² = 4489     73 × 67 = 4891
ΣX = 345    ΣY = 337    ΣX² = 24265    ΣY² = 23597    ΣXY = 23799
N ( XY )  ( X )(  Y ) 2730 2730
r r r
[ N ( X )  ( X ) ][ N ( Y )  ( Y ) ]
2 2 2 2
10156800 3186.97
5(23799)  (345)(337)
r 2730
[5(24265)  (345) 2 ][5(23597)  (337) 2 ] r 
r = 0.85661
[2300][ 4416] = 0.86
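As a quick check, a minimal Python sketch of the raw-score formula above (the function name is illustrative); it reproduces r = 0.86 for this table:

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r, mirroring the formula above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [76, 54, 62, 80, 73]  # scores from 23/2
y = [68, 46, 68, 88, 67]  # scores from 27/2
print(round(pearson_r(x, y), 2))  # 0.86
```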
Methods of Estimating Reliability:
- Test-retest method
- Equivalent-forms method
- Split-half method
- Coefficient alpha
- Interrater consistency
Test-Retest Method

- The same assessment is administered twice to the same group of students, separated by a given time interval.
- The two sets of scores are correlated using the correlation coefficient.
- The resulting coefficient provides a measure of stability: how stable the results are over the given time interval.
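A minimal sketch of the procedure, assuming the two score columns of the earlier table (23/2 and 27/2) are the two administrations; `statistics.correlation` computes Pearson's r in Python 3.10+:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# Scores for the same five students on 23/2 and again on 27/2
first_administration = [76, 54, 62, 80, 73]
second_administration = [68, 46, 68, 88, 67]

stability = correlation(first_administration, second_administration)
print(f"Test-retest (stability) coefficient: {stability:.2f}")  # 0.86
```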
Equivalent-Forms Method

- Uses two different but equivalent forms of the assessment (called parallel or alternate forms).
- The parallel forms are administered to the same group in close succession.
- If a time interval separates the two administrations, the approach is called equivalent forms with interval.
- The two sets of results are correlated.
- The resulting coefficient indicates the degree to which the two forms measure the same aspects.
Parallel Tests

Two tests are considered parallel if their items are similar in content and difficulty.

Test A            Test B
3 + 4 × 6 =       5 × 4 + 6 =
5 - 2(4 - 2) =    2(7 - 3) - 6 =
Split-Half Method

- Requires administration of only a single form of the assessment.
- Easy to administer with a traditional test or quiz of 10 or more items.
- Administer the assessment in the usual manner, then divide it in half for scoring purposes.
- To split the test, the normal procedure is to score the odd-numbered and even-numbered items separately.
Example

Name   1  2  3  4  5  6  7  8  9  10   Odd  Even
Elsa   1  1  1  1  0  1  0  0  1  0    3    3
Anna   0  1  1  1  1  1  0  1  1  0    3    4
Olof   0  1  0  1  0  1  0  1  1  0    1    4

Find the correlation between the odd-item and even-item scores.
Split-Half Method (continued)

- The two half-test scores are then correlated.
- The correlation describes only half the test, so the reliability of the full assessment is estimated with the Spearman-Brown formula:

$$ r_{\text{full}} = \frac{2\, r_{\text{half}}}{1 + r_{\text{half}}} $$

- e.g., if the correlation between the half-test scores is .40, then the estimated reliability is 2(.40) / (1 + .40) = .57.
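A sketch of the full procedure under these definitions (the function name is illustrative); because the three-student example above is too small for a stable estimate, the demonstration applies only the Spearman-Brown step to the .40 value:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction.

    responses: one list of 0/1 item scores per student.
    """
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = correlation(odd, even)
    return 2 * r_half / (1 + r_half)  # full-test estimate

# The Spearman-Brown step alone, for the slide's example value:
r_half = 0.40
print(round(2 * r_half / (1 + r_half), 2))  # 0.57
```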
Coefficient Alpha

- Provides an index of internal consistency without having to split the assessment.
- Special cases: Kuder-Richardson Formula 20 (KR-20) and Kuder-Richardson Formula 21 (KR-21).

$$ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum p(1-p)}{s^2}\right) $$

$$ KR_{21} = \frac{k}{k-1}\left(1 - \frac{m(k-m)}{k\, s^2}\right) $$

where
k = number of items
N = total number of examinees
p = proportion of correct answers for an item = (number of examinees answering the item correctly) / N
m = mean total score = ΣX / N
s² = variance of total scores = Σ(X - m)² / N
Example: KR-20 (10 students, 5 items)

Student    Item 1  Item 2  Item 3  Item 4  Item 5   X    X - m   (X - m)²
1          1       1       1       1       1        5     1.1    1.21
2          1       1       1       1       1        5     1.1    1.21
3          1       1       1       1       1        5     1.1    1.21
4          1       1       0       0       1        3    -0.9    0.81
5          0       1       1       1       1        4     0.1    0.01
6          0       1       1       0       0        2    -1.9    3.61
7          1       1       1       1       1        5     1.1    1.21
8          1       1       0       0       1        3    -0.9    0.81
9          1       1       0       0       1        3    -0.9    0.81
10         0       1       1       1       1        4     0.1    0.01
Right      7       10      7       6       9        39           10.90
p          0.7     1       0.7     0.6     0.9
1 - p      0.3     0       0.3     0.4     0.1
p(1 - p)   0.21    0       0.21    0.24    0.09     Σp(1 - p) = 0.75

m = ΣX / N = 39 / 10 = 3.9

s² = Σ(X - m)² / N = 10.90 / 10 = 1.09

KR20 = (5/4) × (1 - 0.75 / 1.09) = 1.25 × 0.312 ≈ 0.39
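A short Python check of this worked example (variable names are illustrative); KR-21 is computed alongside from the same k, m, and s²:

```python
# KR-20 / KR-21 check for the 10-student x 5-item table above
scores = [
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1], [0, 1, 1, 1, 1], [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1], [1, 1, 0, 0, 1], [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
]
n = len(scores)     # examinees: 10
k = len(scores[0])  # items: 5

totals = [sum(row) for row in scores]
m = sum(totals) / n                         # mean total score: 3.9
s2 = sum((x - m) ** 2 for x in totals) / n  # variance: 10.90 / 10 = 1.09

# sum of p(1 - p) over items: 0.21 + 0 + 0.21 + 0.24 + 0.09 = 0.75
pq = sum((c / n) * (1 - c / n) for c in (sum(col) for col in zip(*scores)))

kr20 = k / (k - 1) * (1 - pq / s2)
kr21 = k / (k - 1) * (1 - m * (k - m) / (k * s2))
print(round(kr20, 2), round(kr21, 2))  # 0.39 0.27
```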
Interrater Consistency

- Consistency of judgments, typically in performance-based assessment.
- The question: is the score consistent when the performance is judged by another, equally qualified judge?
- Method: correlate the scores from rater 1 and rater 2.
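A minimal sketch with made-up ratings of eight performances by two judges; the method is simply the correlation coefficient applied to the two sets of ratings:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# Hypothetical ratings of the same eight performances by two judges
rater_1 = [8, 6, 9, 5, 7, 8, 4, 6]
rater_2 = [7, 6, 9, 4, 8, 7, 5, 6]
print(f"Interrater consistency: {correlation(rater_1, rater_2):.2f}")  # 0.87
```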
Factors Influencing the Reliability Coefficient:
- Number of assessment tasks
- Spread of scores
- Objectivity
- Method of estimating reliability
Factors Influencing Reliability Measures

Number of assessment tasks (i.e., number of items)
- The larger the number of assessment tasks (the larger the number of items), the higher the reliability coefficient.
- A longer assessment provides a more adequate sample of the behavior being measured.
- Scores based on more tasks are more apt to reflect real differences in ability, and are thus more stable.
- Reservation: lengthen the assessment only by adding quality items.
Factors Influencing Reliability Measures

Spread of scores
- The greater the spread of scores, the larger the reliability coefficient.
- Larger reliability coefficients result when examinees stay in the same relative position in the group from one assessment to another. In the scores below, relative positions are not preserved (see the check after the table):

Score A   Score B
67        87
68        45
73        38
72        78
69        56
70        90
68        55
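A quick check (a sketch) of the Score A / Score B columns above, confirming that shuffled relative positions push the coefficient toward zero:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

score_a = [67, 68, 73, 72, 69, 70, 68]  # narrow spread
score_b = [87, 45, 38, 78, 56, 90, 55]  # relative positions shuffled
print(f"r = {correlation(score_a, score_b):.2f}")  # about -0.20
```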
Factors Influencing Reliability Measures

Objectivity
- Objectivity is the degree to which equally competent scorers obtain the same results.
- Scores on objective items (true-false, MCQ) have higher reliability than scores on performance assessments, because their reliability is not affected by scoring procedures.
Factors Influencing Reliability Measures

Method of estimating reliability
- Equivalent forms with a time interval takes the most sources of variation into account and is thus more rigorous than the other methods; a smaller reliability coefficient is expected, so it is unfair to compare it against coefficients from other methods.
- The largest reliability coefficients are typically reported with the split-half method.