
PGT 202E: BASIC MEASUREMENT AND EVALUATION

Reliability

Characteristics of Good Assessments: Reliability, Validity, Usability
Correlation Coefficient

A statistic that indicates the degree of relationship between any two sets of scores:

$$ r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left[N(\sum X^2) - (\sum X)^2\right]\left[N(\sum Y^2) - (\sum Y)^2\right]}} $$
The larger the value of r, the stronger the relationship between the two sets of scores.
23/2 (X)    27/2 (Y)    X²             Y²             XY
76          68          76² = 5776     68² = 4624     76 × 68 = 5168
54          46          54² = 2916     46² = 2116     54 × 46 = 2484
62          68          62² = 3844     68² = 4624     62 × 68 = 4216
80          88          80² = 6400     88² = 7744     80 × 88 = 7040
73          67          73² = 5329     67² = 4489     73 × 67 = 4891
ΣX = 345    ΣY = 337    ΣX² = 24265    ΣY² = 23597    ΣXY = 23799
N ( XY )  ( X )(  Y ) 2730 2730
r r r
[ N ( X )  ( X ) ][ N ( Y )  ( Y ) ]
2 2 2 2
10156800 3186.97
5(23799)  (345)(337)
r 2730
[5(24265)  (345) 2 ][5(23597)  (337) 2 ] r 
r = 0.85661
[2300][ 4416] = 0.86
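As a quick check, a minimal Python sketch of the raw-score formula above (the function name is illustrative); it reproduces r = 0.86 for this table:

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r, mirroring the formula above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [76, 54, 62, 80, 73]  # scores from 23/2
y = [68, 46, 68, 88, 67]  # scores from 27/2
print(round(pearson_r(x, y), 2))  # 0.86
```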
Methods of Estimating Reliability:
- Test-retest method
- Equivalent-forms method
- Split-half method
- Coefficient alpha
- Interrater consistency
Test-Retest Method

- The same assessment is administered twice to the same group of students, separated by a given time interval.
- The two sets of scores are correlated using the correlation coefficient.
- The resulting coefficient provides a measure of stability: how stable the results are over the given time interval.
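A minimal sketch of the procedure, assuming the two score columns of the earlier table (23/2 and 27/2) are the two administrations; `statistics.correlation` computes Pearson's r in Python 3.10+:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# Scores for the same five students on 23/2 and again on 27/2
first_administration = [76, 54, 62, 80, 73]
second_administration = [68, 46, 68, 88, 67]

stability = correlation(first_administration, second_administration)
print(f"Test-retest (stability) coefficient: {stability:.2f}")  # 0.86
```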
Equivalent-Forms Method

- Uses two different but equivalent forms of the assessment (called parallel or alternate forms).
- The parallel forms are administered to the same group in close succession.
- If a time interval separates the two administrations, the approach is called equivalent forms with interval.
- The two sets of results are correlated.
- The resulting coefficient indicates the degree to which the two forms measure the same aspects.
Parallel Tests

Two tests are considered parallel if their items are similar in content and difficulty.

Test A            Test B
3 + 4 × 6 =       5 × 4 + 6 =
5 - 2(4 - 2) =    2(7 - 3) - 6 =
Split-Half Method

- Requires administration of only a single form of the assessment.
- Easy to administer with a traditional test or quiz of 10 or more items.
- Administer the assessment in the usual manner, then divide it in half for scoring purposes.
- To split the test, the normal procedure is to score the odd-numbered and even-numbered items separately.
Example

Name   1  2  3  4  5  6  7  8  9  10   Odd  Even
Elsa   1  1  1  1  0  1  0  0  1  0    3    3
Anna   0  1  1  1  1  1  0  1  1  0    3    4
Olof   0  1  0  1  0  1  0  1  1  0    1    4

Find the correlation between the odd-item and even-item scores.
Split-Half Method (continued)

- The two half-test scores are then correlated.
- The correlation describes only half the test, so the reliability of the full assessment is estimated with the Spearman-Brown formula:

$$ r_{\text{full}} = \frac{2\, r_{\text{half}}}{1 + r_{\text{half}}} $$

- e.g., if the correlation between the half-test scores is .40, then the estimated reliability is 2(.40) / (1 + .40) = .57.
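A sketch of the full procedure under these definitions (the function name is illustrative); because the three-student example above is too small for a stable estimate, the demonstration applies only the Spearman-Brown step to the .40 value:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction.

    responses: one list of 0/1 item scores per student.
    """
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = correlation(odd, even)
    return 2 * r_half / (1 + r_half)  # full-test estimate

# The Spearman-Brown step alone, for the slide's example value:
r_half = 0.40
print(round(2 * r_half / (1 + r_half), 2))  # 0.57
```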
Coefficient Alpha

- Provides an index of internal consistency without having to split the assessment.
- Special cases: Kuder-Richardson Formula 20 (KR-20) and Kuder-Richardson Formula 21 (KR-21).

$$ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum p(1-p)}{s^2}\right) $$

$$ KR_{21} = \frac{k}{k-1}\left(1 - \frac{m(k-m)}{k\, s^2}\right) $$

where
k = number of items
N = total number of examinees
p = proportion of correct answers for an item = (number of examinees answering the item correctly) / N
m = mean total score = ΣX / N
s² = variance of total scores = Σ(X - m)² / N
Example: KR-20 (10 students, 5 items)

Student    Item 1  Item 2  Item 3  Item 4  Item 5   X    X - m   (X - m)²
1          1       1       1       1       1        5     1.1    1.21
2          1       1       1       1       1        5     1.1    1.21
3          1       1       1       1       1        5     1.1    1.21
4          1       1       0       0       1        3    -0.9    0.81
5          0       1       1       1       1        4     0.1    0.01
6          0       1       1       0       0        2    -1.9    3.61
7          1       1       1       1       1        5     1.1    1.21
8          1       1       0       0       1        3    -0.9    0.81
9          1       1       0       0       1        3    -0.9    0.81
10         0       1       1       1       1        4     0.1    0.01
Right      7       10      7       6       9        39           10.90
p          0.7     1       0.7     0.6     0.9
1 - p      0.3     0       0.3     0.4     0.1
p(1 - p)   0.21    0       0.21    0.24    0.09     Σp(1 - p) = 0.75

m = ΣX / N = 39 / 10 = 3.9

s² = Σ(X - m)² / N = 10.90 / 10 = 1.09

KR20 = (5/4) × (1 - 0.75 / 1.09) = 1.25 × 0.312 ≈ 0.39
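A short Python check of this worked example (variable names are illustrative); KR-21 is computed alongside from the same k, m, and s²:

```python
# KR-20 / KR-21 check for the 10-student x 5-item table above
scores = [
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1], [0, 1, 1, 1, 1], [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1], [1, 1, 0, 0, 1], [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1],
]
n = len(scores)     # examinees: 10
k = len(scores[0])  # items: 5

totals = [sum(row) for row in scores]
m = sum(totals) / n                         # mean total score: 3.9
s2 = sum((x - m) ** 2 for x in totals) / n  # variance: 10.90 / 10 = 1.09

# sum of p(1 - p) over items: 0.21 + 0 + 0.21 + 0.24 + 0.09 = 0.75
pq = sum((c / n) * (1 - c / n) for c in (sum(col) for col in zip(*scores)))

kr20 = k / (k - 1) * (1 - pq / s2)
kr21 = k / (k - 1) * (1 - m * (k - m) / (k * s2))
print(round(kr20, 2), round(kr21, 2))  # 0.39 0.27
```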
Interrater Consistency

- Consistency of judgments, typically in performance-based assessment.
- The question: is the score consistent when the performance is judged by another, equally qualified judge?
- Method: correlate the scores from rater 1 and rater 2.
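A minimal sketch with made-up ratings of eight performances by two judges; the method is simply the correlation coefficient applied to the two sets of ratings:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# Hypothetical ratings of the same eight performances by two judges
rater_1 = [8, 6, 9, 5, 7, 8, 4, 6]
rater_2 = [7, 6, 9, 4, 8, 7, 5, 6]
print(f"Interrater consistency: {correlation(rater_1, rater_2):.2f}")  # 0.87
```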
Factors Influencing the Reliability Coefficient:
- Number of assessment tasks
- Spread of scores
- Objectivity
- Method of estimating reliability
Factors Influencing Reliability Measures

Number of assessment tasks (i.e., number of items)
- The larger the number of assessment tasks (the larger the number of items), the higher the reliability coefficient.
- A longer assessment provides a more adequate sample of the behavior being measured.
- Scores based on more tasks are more apt to reflect real differences in ability, and are thus more stable.
- Reservation: lengthen the assessment only by adding quality items.
Factors Influencing Reliability Measures

Spread of scores
- The greater the spread of scores, the larger the reliability coefficient.
- Larger reliability coefficients result when examinees stay in the same relative position in the group from one assessment to another. In the scores below, relative positions are not preserved (see the check after the table):

Score A   Score B
67        87
68        45
73        38
72        78
69        56
70        90
68        55
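A quick check (a sketch) of the Score A / Score B columns above, confirming that shuffled relative positions push the coefficient toward zero:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

score_a = [67, 68, 73, 72, 69, 70, 68]  # narrow spread
score_b = [87, 45, 38, 78, 56, 90, 55]  # relative positions shuffled
print(f"r = {correlation(score_a, score_b):.2f}")  # about -0.20
```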
Factors Influencing Reliability Measures

Objectivity
- Objectivity is the degree to which equally competent scorers obtain the same results.
- Scores on objective items (true-false, MCQ) have higher reliability than scores on performance assessments, because their reliability is not affected by scoring procedures.
Factors Influencing Reliability Measures

Method of estimating reliability
- Equivalent forms with a time interval takes the most sources of variation into account and is thus more rigorous than the other methods; a smaller reliability coefficient is expected, so it is unfair to compare it against coefficients from other methods.
- The largest reliability coefficients are typically reported with the split-half method.