A value of one indicates that the items are perfectly correlated: knowing the
value of one response provides complete information about the other items.
Now, what on Earth does that mean? Let’s start with reliability.
Say an individual takes your Happiness Survey. The happiness
score would be highly reliable (consistent) if the survey produces
the same or similar results when the same individual re-takes it
under the same conditions. However, suppose an individual, at the
same level of real happiness, takes the Happiness Survey twice
back-to-back, and one score shows high happiness while the other
shows low happiness. That measure would not be reliable at all.
Inter-Rater Reliability
Percent Agreement
Writing Sample   Judge 1   Judge 2   Judge 3
1                5         4         4
2                3         3         3
3                4         4         4
4                2         2         3
5                1         2         1
Next, count the number of agreements between pairs of judges in each row. With
three judges, there are three pairings and, hence, three possible agreements per
writing sample. I’ll add columns to record the rating agreements using 1s and 0s for
agreement and disagreement, respectively. The final column is the total number of
agreements for that writing sample.
Writing Sample   Judge 1   Judge 2   Judge 3   1 & 2   1 & 3   2 & 3   Total
1                5         4         4         0       0       1       1
2                3         3         3         1       1       1       3
3                4         4         4         1       1       1       3
4                2         2         3         1       0       0       1
5                1         2         1         0       1       0       1

Summing the Total column, the judges agreed on 9 of the 15 possible pairings,
so the percent agreement is 9/15 = 60%.
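If you’d rather not tally agreements by hand, the calculation is easy to
script. Here is a minimal Python sketch of the same computation; the variable
names are my own illustration, not from any particular library.

from itertools import combinations

# Scores from Judges 1, 2, and 3 for each of the five writing samples.
ratings = [
    (5, 4, 4),
    (3, 3, 3),
    (4, 4, 4),
    (2, 2, 3),
    (1, 2, 1),
]

agreements = 0
possible = 0
for row in ratings:
    # Compare every pair of judges (1 & 2, 1 & 3, 2 & 3) on this sample.
    for a, b in combinations(row, 2):
        possible += 1
        agreements += int(a == b)

print(agreements, "of", possible)                   # 9 of 15
print("Percent agreement:", agreements / possible)  # 0.6, i.e., 60%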
While this is the simplest form of inter-rater reliability, it falls short in several ways.
First, it doesn’t account for agreements that occur by chance, which causes the
percent agreement method to overestimate inter-rater reliability. Second, it
doesn’t factor in the degree of agreement, only exact agreement: a pair of ratings
either matches or it doesn’t. On a 1-to-5 scale, two judges scoring 4 and 5 are far
closer to agreement than two judges scoring 1 and 5, yet both pairs count as a
single disagreement!
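To see the first problem concretely, here is a quick simulation (a hypothetical
illustration, not from the text): two judges who assign ratings independently
and uniformly at random on a 1-to-5 scale will still “agree” about 20% of the
time, so a raw percent agreement figure is inflated by chance.

import random

random.seed(0)
trials = 100_000
# Two judges rating independently and uniformly at random on a 1-to-5 scale.
agree = sum(random.randint(1, 5) == random.randint(1, 5) for _ in range(trials))
print("Chance agreement:", agree / trials)  # roughly 0.20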