
Interrater reliability (Kappa)

Using SPSS

See www.stattutorials.com/SPSSDATA for files mentioned in this tutorial.

© TexaSoft, 2008

These SPSS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for medical, pharmaceutical, clinical trials, marketing, or scientific research. The examples include how-to instructions for SPSS software.

Interrater reliability (Kappa)

Interrater reliability is a measure used to examine the agreement between two people (raters/observers) on the assignment of categories of a categorical variable. It is an important measure in determining how well an implementation of some coding or measurement system works.

A statistical measure of interrater reliability is Cohen's Kappa, which generally ranges from 0 to 1.0 (although negative values are possible). Larger values indicate better reliability, while values near or below zero suggest that the observed agreement is attributable to chance alone.
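
For reference, Cohen's Kappa compares the observed agreement with the agreement expected by chance. In the usual notation,

    Kappa = (Po − Pe) / (1 − Pe)

where Po is the proportion of subjects on which the two raters agree and Pe is the proportion of agreement expected by chance, computed from the row and column totals of the agreement table.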

Example: Interrater reliability analysis

Using an example from Fleiss (1981, p. 213), suppose you have 100 subjects whose diagnosis is rated by two raters on a scale that classifies each subject's disorder as psychological, neurological, or organic. The data are given below (KAPPA.SAV):

                              RATER A
                Psychological  Neurological  Organic
RATER B
  Psychological       75             1           4
  Neurological         5             4           1
  Organic              0             0          10
The data set KAPPA.SAV contains the variables Rater_A, Rater_B, and Count; the data file stores the table above in count (summarized) form.
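
As a hand check of the value SPSS will report, the formula above can be applied directly to this table. The raters agree on 75 + 4 + 10 = 89 of the 100 subjects, and the marginal totals are 80, 5, and 15 for Rater A and 80, 10, and 10 for Rater B, so

    Po = 89 / 100 = 0.89
    Pe = (80×80 + 5×10 + 15×10) / (100 × 100) = 6,600 / 10,000 = 0.66
    Kappa = (0.89 − 0.66) / (1 − 0.66) ≈ 0.676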

To analyze these data, follow these steps:

1. Open the file KAPPA.SAV. Before performing the analysis on this summarized data, you must tell SPSS that the Count variable is a "weighted" variable. Select Data/Weight Cases..., choose the "Weight cases by" option, and specify Count as the Frequency Variable.

2. Select Analyze/Descriptive Statistics/Crosstabs.

3. Select Rater_A as the Row variable and Rater_B as the Column variable.

4. Click on the Statistics button, select Kappa and Continue.

5. Click OK to display the results for the Kappa test, interpreted below. (The equivalent SPSS syntax is sketched after these steps.)
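
If you prefer to work from an SPSS syntax window rather than the menus, the steps above correspond roughly to the following commands (a minimal sketch, assuming the variable names Rater_A, Rater_B, and Count from KAPPA.SAV):

    * Weight the summarized data by the Count variable.
    WEIGHT BY Count.

    * Crosstabulate the two raters and request the Kappa statistic.
    CROSSTABS
      /TABLES=Rater_A BY Rater_B
      /STATISTICS=KAPPA
      /CELLS=COUNT.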
The results of the interrater analysis are Kappa = 0.676 with p < 0.001. This measure of agreement, while statistically significant, is only marginally convincing. As a rule of thumb, values of Kappa from 0.41 to 0.60 are considered moderate, 0.61 to 0.80 substantial, and 0.81 and above almost perfect (Landis & Koch, 1977; see the table below). Most statisticians prefer Kappa values of at least 0.6, and more often above 0.7, before claiming a good level of agreement. Although it is not displayed in the output, you can find a 95% confidence interval using the generic formula for 95% confidence intervals:

Estimate ± 1.96 × SE

Using this formula and the standard error reported in the SPSS output, an approximate 95% confidence interval on Kappa is (0.504, 0.848). Some statisticians prefer a weighted Kappa, particularly if the categories are ordered. A weighted Kappa allows "close" ratings not to be counted simply as "misses." However, SPSS does not calculate weighted Kappa.
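
As a quick check of the arithmetic (the asymptotic standard error comes from the SPSS output, which is not reproduced here; a value of roughly 0.088 is consistent with the interval quoted above):

    0.676 ± 1.96 × 0.088 ≈ 0.676 ± 0.172 = (0.504, 0.848)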

A more complete list of how Kappa might be interpreted (Landis & Koch, 1977) is given in the following table:

Kappa          Interpretation

< 0            Poor agreement
0.00 – 0.20    Slight agreement
0.21 – 0.40    Fair agreement
0.41 – 0.60    Moderate agreement
0.61 – 0.80    Substantial agreement
0.81 – 1.00    Almost perfect agreement

Reporting the results of an interrater reliability analysis

The following illustrates how you might report this interrater analysis in a publication format.

Narrative for the methods section:

"An interrater reliability analysis using the Kappa statistic was performed to determine consistency among raters."

Narrative for the results section:


"The interrater reliability for the raters was found to be Kappa = 0.68 (p < 0.001), 95% CI (0.504, 0.848)."

Reference

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
