
RELIABILITY OF DISEASE CLASSIFICATION
Nigel Paneth
TERMINOLOGY
Reliability is analogous to precision

Validity is analogous to accuracy

Reliability is how well an observer
classifies the same individual under
different circumstances.
Validity is how well a given test reflects
another test of known greater accuracy.

RELIABILITY AND VALIDITY
Reliability includes:
assessments of the same observer at
different times - INTRA-OBSERVER
RELIABILITY
assessments of different observers at the
same time - INTER-OBSERVER
RELIABILITY
Reliability assumes that all tests or
observers are equal; Validity assumes that
there is a gold standard to which a test or
observer should be compared.

ASSESSING RELIABILITY
How do we assess reliability?
One way is to look simply at percent
agreement.
Percent agreement is the proportion
of all diagnoses classified the same
way by two observers.
EXAMPLE OF PERCENT
AGREEMENT
Two physicians are each given a
set of 100 X-rays to look at
independently and asked to judge
whether pneumonia is present or
absent. When both sets of
diagnoses are tallied, it is found that
95% of the diagnoses are the same.
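As a minimal Python sketch of this calculation, the paired readings below are reconstructed so that 95 of the 100 diagnoses match; the lists and variable names are illustrative, not the actual study data:

    def percent_agreement(ratings_1, ratings_2):
        """Proportion of subjects classified the same way by two observers."""
        matches = sum(r1 == r2 for r1, r2 in zip(ratings_1, ratings_2))
        return matches / len(ratings_1)

    # Hypothetical paired readings: 95 of the 100 X-ray calls match.
    md1 = ["yes"] * 3 + ["no"] * 97                          # MD #1's calls
    md2 = ["yes", "no", "no"] + ["yes"] * 3 + ["no"] * 94    # MD #2's calls

    print(percent_agreement(md1, md2))   # 0.95 -> 95% agreement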



IS PERCENT AGREEMENT
GOOD ENOUGH?
Do these two physicians exhibit high
diagnostic reliability?

Can there be 95% agreement between
two observers without really having
good reliability?

Compare the two tables below:

Table 1                  MD #1
                      Yes      No
MD #2     Yes           1       3
          No            2      94

Table 2                  MD #1
                      Yes      No
MD #2     Yes          43       3
          No            2      52
In both instances, the physicians agree
95% of the time. Are the two physicians
equally reliable in the two tables?
What is the essential difference between
the two tables?

The problem arises from the ease of
agreement on common events (e.g. not
having pneumonia in the first table).

So a measure of agreement should take
into account the ease of agreement
due to chance alone.

USE OF THE KAPPA
STATISTIC TO ASSESS
RELIABILITY
Kappa is a widely used measure of
inter- or intra-observer agreement
(or reliability) which corrects for
chance agreement.
KAPPA VARIES FROM +1 TO -1

+1 means that the two observers are perfectly
reliable. They classify everyone exactly the
same way.

0 means there is no relationship at all
between the two observers' classifications,
beyond the agreement that would be
expected by chance.

-1 means the two observers classify exactly
the opposite of each other. If one observer
says yes, the other always says no.
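These three extremes can be checked with scikit-learn's cohen_kappa_score, a standard implementation of Cohen's kappa; the toy rating lists below are invented for illustration, and this assumes scikit-learn is installed:

    from sklearn.metrics import cohen_kappa_score

    # Toy rating lists (illustrative only) for two observers.
    perfect_a = ["yes", "no", "yes", "no"]
    perfect_b = ["yes", "no", "yes", "no"]      # identical calls          -> kappa = +1

    chance_a = ["yes", "yes", "no", "no"]
    chance_b = ["yes", "no", "yes", "no"]       # no better than chance    -> kappa = 0

    opposite_a = ["yes", "no", "yes", "no"]
    opposite_b = ["no", "yes", "no", "yes"]     # always opposite          -> kappa = -1

    print(cohen_kappa_score(perfect_a, perfect_b))     # 1.0
    print(cohen_kappa_score(chance_a, chance_b))       # 0.0
    print(cohen_kappa_score(opposite_a, opposite_b))   # -1.0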

GUIDE TO USE OF KAPPAS IN
EPIDEMIOLOGY AND MEDICINE

Kappa > .80 is considered excellent
Kappa .60 - .80 is considered good
Kappa .40 - .60 is considered fair
Kappa < .40 is considered poor
1st WAY TO CALCULATE KAPPA
1. Calculate observed agreement (the number of
observations in the cells where the observers agree,
divided by the total number of observations). In
both Table 1 and Table 2 it is 95%.

2. Calculate expected agreement (chance
agreement) based on the marginal totals.

Table 1's marginal totals are:

OBSERVED                 MD #1
                      Yes      No
MD #2     Yes           1       3        4
          No            2      94       96
                        3      97      100
How do we calculate the N expected by chance in each cell?
We assume that each cell should reflect the marginal
distributions, i.e. the proportion of yes and no answers
should be the same within the four-fold table as in the
marginal totals.
OBSERVED                 MD #1
                      Yes      No
MD #2     Yes           1       3        4
          No            2      94       96
                        3      97      100

EXPECTED                 MD #1
                      Yes      No
MD #2     Yes           ?       ?        4
          No            ?       ?       96
                        3      97      100
To do this, we find the proportion of answers in either
the column (3% and 97%, Yes and No respectively, for
MD #1) or row (4% and 96%, Yes and No respectively,
for MD #2) marginal totals, and apply one of the two
proportions to the other marginal total. For example,
96% of the row totals are in the No category.
Therefore, by chance, 96% of MD #1's Nos should
also be rated No by MD #2. 96% of 97 is 93.12.

EXPECTED                 MD #1
                      Yes      No
MD #2     Yes           ?       ?        4
          No            ?   93.12       96
                        3      97      100
By subtraction, all other cells fill in
automatically, and each yes/no distribution
reflects the marginal distribution. Any cell
could have been used to make the calculation,
because once one cell is specified in a 2x2
table with fixed marginal distributions, all
other cells are also specified.
EXPECTED                 MD #1
                      Yes      No
MD #2     Yes        0.12    3.88        4
          No         2.88   93.12       96
                        3      97      100
Now you can see that, just by the
operation of chance, 93.24 of the 100
observations (0.12 + 93.12) should have been
agreed upon by the two observers.
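As a sketch, this row-total x column-total / grand-total rule can be applied to every cell at once in Python (the variable names are illustrative):

    # Observed counts for Table 1: rows = MD #2 (Yes, No), columns = MD #1 (Yes, No).
    observed = [[1, 3],
                [2, 94]]

    n = sum(sum(row) for row in observed)              # 100
    row_totals = [sum(row) for row in observed]        # [4, 96]
    col_totals = [sum(col) for col in zip(*observed)]  # [3, 97]

    # Expected count in each cell = row total x column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    print(expected)                                    # [[0.12, 3.88], [2.88, 93.12]]

    # Chance-expected agreement = sum of the diagonal (Yes/Yes and No/No) cells.
    print(f"{expected[0][0] + expected[1][1]:.2f}")    # 93.24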
Let's now compare the actual agreement with
the expected agreement.

Expected agreement falls 6.76% short of perfect
agreement of 100% (100 - 93.24).

Actual agreement falls 5.0% short of perfect
agreement (100 - 95).

So our two observers were 1.76% better
than chance, but if they had agreed perfectly
they would have been 6.76% better than
chance. So they were really only about a
quarter of the way (1.76/6.76 = 0.26) from
chance agreement to perfect agreement.


Below is the formula for calculating
Kappa from expected agreement:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

      = (95% - 93.24%) / (100% - 93.24%) = 1.76% / 6.76% = 0.26
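A minimal sketch of this formula in Python, reusing the observed and expected agreement worked out above (expressed as proportions):

    def kappa(observed_agreement, expected_agreement):
        """Cohen's kappa from observed and chance-expected agreement (proportions)."""
        return (observed_agreement - expected_agreement) / (1 - expected_agreement)

    print(f"{kappa(0.95, 0.9324):.2f}")   # 0.26 for Table 1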

How good is a Kappa of 0.26?

Kappa > .80 is considered excellent
Kappa .60 - .80 is considered good
Kappa .40 - .60 is considered fair
Kappa < .40 is considered poor
In the second example, the observed
agreement was also 95%, but the
marginal totals were very different.

EXPECTED                 MD #1
                      Yes      No
MD #2     Yes           ?       ?       46
          No            ?       ?       54
                       45      55      100
Using the same procedure as before,
we calculate the expected N in any one
cell, based on the marginal totals. For
example, the lower right cell is 54% of
55, which is 29.7






EXPECTED                 MD #1
                      Yes      No
MD #2     Yes           ?       ?       46
          No            ?    29.7       54
                       45      55      100
And, by subtraction, the other cells
are as below. The diagonal cells, which
indicate agreement (Yes/Yes and No/No),
add up to 50.4%.
EXPECTED                 MD #1
                      Yes      No
MD #2     Yes        20.7    25.3       46
          No         24.3    29.7       54
                       45      55      100
Enter the two agreements into the formula:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

      = (95% - 50.4%) / (100% - 50.4%) = 44.6% / 49.6% = 0.90

In this example, the observers have the
same percent agreement, but their agreement
is now far greater than would be expected by chance.
A Kappa of 0.90 is considered excellent.
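As a cross-check, both tables can be fed to scikit-learn's cohen_kappa_score by expanding the cell counts back into paired ratings; the helper below is illustrative and assumes scikit-learn is installed:

    from sklearn.metrics import cohen_kappa_score

    def ratings_from_table(a, b, c, d):
        """Expand 2x2 cell counts (MD #2 rows x MD #1 columns: a b / c d)
        into two paired lists of yes/no ratings."""
        md2 = ["yes"] * (a + b) + ["no"] * (c + d)
        md1 = ["yes"] * a + ["no"] * b + ["yes"] * c + ["no"] * d
        return md1, md2

    print(f"{cohen_kappa_score(*ratings_from_table(1, 3, 2, 94)):.2f}")   # 0.26 (Table 1)
    print(f"{cohen_kappa_score(*ratings_from_table(43, 3, 2, 52)):.2f}")  # 0.90 (Table 2)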

A 2nd WAY TO CALCULATE THE KAPPA STATISTIC

Kappa = 2(AD - BC) / (N1 x N4 + N2 x N3)

where the Ns are the marginal totals, labeled thus:

                         MD #1
                      Yes      No
MD #2     Yes           A       B       N1
          No            C       D       N2
                       N3      N4    total
Look again at Tables 1 and 2 shown earlier.

For Table 1:

Kappa = 2(94 x 1 - 2 x 3) / (4 x 97 + 3 x 96) = 176 / 676 = .26

For Table 2:

Kappa = 2(52 x 43 - 3 x 2) / (46 x 55 + 45 x 54) = 4460 / 4960 = .90
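A sketch of this shortcut in Python; the cell names a, b, c, d and the marginal totals follow the layout shown above:

    def kappa_2x2(a, b, c, d):
        """Kappa for a 2x2 table via the cross-product shortcut.
        Cell layout (MD #2 rows x MD #1 columns): a b / c d."""
        n1, n2 = a + b, c + d      # row (MD #2) marginal totals
        n3, n4 = a + c, b + d      # column (MD #1) marginal totals
        return 2 * (a * d - b * c) / (n1 * n4 + n2 * n3)

    print(f"{kappa_2x2(1, 3, 2, 94):.2f}")    # 0.26 (Table 1)
    print(f"{kappa_2x2(43, 3, 2, 52):.2f}")   # 0.90 (Table 2)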

Note parallels between:

THE ODDS RATIO

THE CHI-SQUARE STATISTIC

THE KAPPA STATISTIC

Note that the cross-products of the
four-fold table, and their relation to
the marginal totals, are central to all
three expressions.
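To make the parallel concrete, here is a sketch computing all three measures from the same 2x2 cell counts, using the standard shortcut formulas for a four-fold table (odds ratio = AD/BC; chi-square = N(AD - BC)^2 / (N1 x N2 x N3 x N4)); the function name is illustrative:

    def two_by_two_summaries(a, b, c, d):
        """Odds ratio, chi-square, and kappa for a 2x2 table (layout: a b / c d).
        All three are driven by the cross-products AD and BC."""
        n = a + b + c + d
        n1, n2, n3, n4 = a + b, c + d, a + c, b + d    # marginal totals
        odds_ratio = (a * d) / (b * c)
        chi_square = n * (a * d - b * c) ** 2 / (n1 * n2 * n3 * n4)
        kappa = 2 * (a * d - b * c) / (n1 * n4 + n2 * n3)
        return odds_ratio, chi_square, kappa

    or_, chi2, k = two_by_two_summaries(43, 3, 2, 52)   # Table 2
    print(f"OR = {or_:.1f}, chi-square = {chi2:.1f}, kappa = {k:.2f}")
    # OR = 372.7, chi-square = 80.9, kappa = 0.90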