Fleiss’ Kappa
Cohen’s kappa is a measure of the agreement between two raters, where agreement due to chance is factored
out. We now extend Cohen’s kappa to the case where the number of raters can be more than two. This extension is
called Fleiss’ kappa. As with Cohen’s kappa, no weighting is used and the categories are considered to be unordered.
Let n = the number of subjects, k = the number of evaluation categories and m = the number of judges for each
subject. E.g. for Example 1 of Cohen’s Kappa, n = 50, k = 3 and m = 2. While for Cohen’s kappa both judges
evaluate every subject, in the case of Fleiss’ kappa, there may be many more than m judges and not every judge
needs to evaluate each subject; what is important is that each subject is evaluated m times.
For every subject i = 1, 2, …, n and evaluation category j = 1, 2, …, k, let xij = the number of judges that assign category j to subject i. Thus

$$\sum_{j=1}^{k} x_{ij} = m$$

The proportion of pairs of judges that agree in their evaluation of subject i is given by

$$p_i = \frac{1}{m(m-1)} \sum_{j=1}^{k} x_{ij}(x_{ij}-1)$$

Fleiss’ kappa is then defined to be

$$\kappa = \frac{\bar{p} - \bar{p}_e}{1 - \bar{p}_e}$$

where

$$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i \qquad\quad \bar{p}_e = \sum_{j=1}^{k} q_j^2 \qquad\quad q_j = \frac{1}{nm}\sum_{i=1}^{n} x_{ij}$$

Kappa for the jth category alone is given by

$$\kappa_j = 1 - \frac{\sum_{i=1}^{n} x_{ij}(m - x_{ij})}{nm(m-1)\,q_j(1-q_j)}$$
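The definitions above translate directly into a short function. The following is a minimal Python sketch (the function name `fleiss_kappa` and the plain list-of-lists input format are my own choices, not part of the Real Statistics add-in):

```python
def fleiss_kappa(x):
    """Fleiss' kappa for an n x k matrix of counts, where x[i][j] is the
    number of judges assigning category j to subject i (rows sum to m)."""
    n, k = len(x), len(x[0])
    m = sum(x[0])                       # judges per subject
    # proportion of agreeing judge pairs for each subject i
    p = [sum(c * (c - 1) for c in row) / (m * (m - 1)) for row in x]
    p_bar = sum(p) / n                  # mean observed agreement
    # q[j] = proportion of all assignments that went to category j
    q = [sum(row[j] for row in x) / (n * m) for j in range(k)]
    p_e = sum(qj ** 2 for qj in q)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

As a sanity check, two raters who always agree (e.g. `[[2, 0], [0, 2]]`) give κ = 1, while complete disagreement on every subject (`[[1, 1], [1, 1]]`) gives κ = −1.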
http://www.real-statistics.com/reliability/fleiss-kappa/ 1/29
25/09/2018 Fleiss' Kappa | Real Statistics Using Excel
There is an alternative calculation of the standard error provided in Fleiss’ original paper, namely the square root of the following:

$$s.e.^2 = \frac{2}{nm(m-1)} \cdot \frac{\left(\sum_{j=1}^{k} b_j\right)^{\!2} - \sum_{j=1}^{k} b_j(1-2q_j)}{\left(\sum_{j=1}^{k} b_j\right)^{\!2}}$$

where bj = qj(1 – qj).
The test statistics zj = κj/s.e.(κj) and z = κ/s.e.(κ) are generally approximated by a standard normal distribution, which allows us to calculate a p-value and confidence interval. E.g. the 1 – α confidence interval for kappa is therefore approximated as

$$\kappa \pm z_{crit} \cdot s.e.(\kappa)$$

where zcrit is the two-tailed critical value of the standard normal distribution for the given α.
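The z statistic, two-tailed p-value and confidence interval can be computed with the standard library alone; this is a sketch (the function name `kappa_test` is my own, not part of any package):

```python
from statistics import NormalDist

def kappa_test(kappa, se, alpha=0.05):
    """z statistic, two-tailed p-value and 1 - alpha confidence interval
    for a kappa estimate with standard error se."""
    z = kappa / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=.05
    return z, p, (kappa - zcrit * se, kappa + zcrit * se)
```

For instance, `kappa_test(1.0, 0.5)` gives z = 2, p ≈ .0455 and the interval (.020, 1.980).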
Example 1: Six psychologists (judges) evaluate 12 patients as to whether they are psychotic, borderline, bipolar
or none of these. The ratings are summarized in range A3:E15 of Figure 1. Determine the overall agreement
between the psychologists, subtracting out agreement due to chance, using Fleiss’ kappa. Also find Fleiss’ kappa
for each disorder.
For example, we see that 4 of the psychologists rated subject 1 as psychotic and 2 rated subject 1 as borderline, while no psychologist rated subject 1 as bipolar or none of these.
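These counts give the agreement proportion for subject 1 directly (with m = 6 judges):

$$p_1 = \frac{4(4-1) + 2(2-1) + 0 + 0}{6(6-1)} = \frac{14}{30} \approx .467$$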
We use the formulas described above to calculate Fleiss’ kappa in the worksheet shown in Figure 1. The formulas
in the ranges H4:H15 and B17:B22 are displayed in text format in column J, except that the formulas in cells H9
and B19 are not displayed in the figure since they are rather long. These formulas are:
H9 s.e. =B20*SQRT(SUM(B18:E18)^2-SUMPRODUCT(B18:E18,1-2*B17:E17))/SUM(B18:E18)
B19 κ1 =1-SUMPRODUCT(B4:B15,$H$4-B4:B15)/($H$4*$H$5*($H$4-1)*B17*(1-B17))
Note too that row 18 (labelled b) contains the formulas for qj(1–qj).
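The long cell formula in B19 can be mirrored in Python; this sketch (the function name and the 0-based category index are my own conventions) computes the kappa for a single category:

```python
def fleiss_kappa_j(x, j):
    """Fleiss' kappa for category j alone (j is 0-based), following
    kappa_j = 1 - sum_i x_ij(m - x_ij) / (n m (m-1) q_j (1 - q_j))."""
    n = len(x)
    m = sum(x[0])                                  # judges per subject
    qj = sum(row[j] for row in x) / (n * m)        # share of category j
    disagreements = sum(row[j] * (m - row[j]) for row in x)
    return 1 - disagreements / (n * m * (m - 1) * qj * (1 - qj))
```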
The p-values (and confidence intervals) show us that all of the kappa values are significantly different from zero.
Real Statistics Function: The Real Statistics Resource Pack contains the following supplemental function:
KAPPA(R1, j, lab, alpha, tails, orig): if lab = FALSE (default) returns a 6 × 1 range consisting of κ if j = 0
(default) or κj if j > 0 for the data in R1 (where R1 is formatted as in range B4:E15 of Figure 1), plus the
standard error, z-stat, z-crit, p-value and lower and upper bound of the 1 – alpha confidence interval, where
alpha = α (default .05) and tails = 1 or 2 (default). If lab = TRUE then an extra column of labels is included in
the output. If orig = TRUE then the original calculation for the standard error is used; default is FALSE.
For Example 1, KAPPA(B4:E15) = .2968 and KAPPA(B4:E15,2) = .28. The complete output
for KAPPA(B4:E15,,TRUE) is shown in Figure 3.
Real Statistics Data Analysis Tool: The Reliability data analysis tool supplied in the Real Statistics Resource
Pack can also be used to calculate Fleiss’ kappa.
To calculate Fleiss’ kappa for Example 1 press Ctrl-m and choose the Reliability option from the menu that
appears. Fill in the dialog box that appears (see Figure 7 of Cronbach’s Alpha) by inserting B4:E15 in the Input
Range, choosing the Fleiss’ kappa option and clicking on the OK button.