We review methods of calculating the positive predictive value of a test (the probability of having a condition given
a positive test for it) in situations where there is no ‘gold standard’ way to determine the true classification. We show
that Bayesian methods lead to illogical results, and that a new approach using imprecise probabilities
is logically consistent.
Keywords: diagnostics, Bayes’ rule, false positives, prevalence, sensitivity, specificity, uncertainty, gold standard
such a profound disagreement could emerge and escape resolution for many years.

2. The Standard Bayesian Approach to Calculating Positive Predictive Value

Throughout this paper, we will refer to the following hypothetical dataset for $T_N$ trials of a diagnostic test for condition D: we have α true positives, β false positives, γ false negatives and δ true negatives. Let $T_+$ be the total number of positive tests, $T_-$ the total number of negative tests, $T_S$ the total number of sick people and $T_W$ the total number of well people. This allows us to construct the confusion matrix shown in Table 1.

Table 1. Sample data.
$$
\begin{array}{lccc}
 & \text{Has problem} & \text{No problem} & \text{Total} \\
\hline
\text{Positive test} & \alpha & \beta & T_+ \\
\text{Negative test} & \gamma & \delta & T_- \\
\hline
\text{Total} & T_S & T_W & T_N
\end{array}
$$

The probability that someone has the disease given they have had a positive test result is given by $\Pr(D \mid +)$, whilst the probability that they do not have the disease is $\Pr(\neg D \mid +)$. Throughout, we will only consider positive test results; however, all the arguments made could equally apply to negative test results. The prevalence is given by
$$p = \frac{T_S}{T_N} = \frac{\alpha + \gamma}{\alpha + \beta + \gamma + \delta}, \tag{1}$$
the sensitivity by
$$s = \frac{\alpha}{T_S} = \frac{\alpha}{\alpha + \gamma}, \tag{2}$$
and the specificity by
$$t = \frac{\delta}{T_W} = \frac{\delta}{\beta + \delta}. \tag{3}$$
We will also define the ratio of positive tests to the total number tested as
$$h = \frac{T_+}{T_N} = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta}. \tag{4}$$
Often the values of p, s and t are published independently of each other, in which case the following notation can also be used: $p = p_k/p_n$, $s = s_k/s_n$ and $t = t_k/t_n$.

Given the published values of p, s and t, Bayes' rule gives the probability that a patient has D given they have tested positive as
$$\Pr(D \mid +, p, s, t) = \frac{ps}{ps + (1 - p)(1 - t)} \tag{5}$$
(Baron 1994, Lesaffre & Lawson 2012). We are usually more interested in obtaining the probability that is only conditioned on a positive test outcome, $\Pr(D \mid +)$; this is also known as the positive predictive value (PPV). When p, s and t are available in scalar form, we can obtain $\Pr(D \mid +) = \Pr(D \mid +, p, s, t)$.

The Mossman & Berger (2001) paper takes this a step further and considers the following hypothetical situation: Mr Smith has tested positive for disorder D, and he asks his doctor the following: “Given the published estimates for prevalence, sensitivity and specificity, what is the 95% confidence interval for my probability of having D given my positive test results and the imprecision in the estimates?”

When there is uncertainty about the values of p, s and t, they can be described by distributions (Smith & Winkler 1999, Smith et al. 2000). There are a couple of ways in which the PPV can be determined. The simplest is to estimate it using Eq. 5 but replacing p, s and t with their expected values:
$$\Pr{}^*(D \mid +, p, s, t) = \frac{E(p)E(s)}{E(p)E(s) + (1 - E(p))(1 - E(t))}. \tag{6}$$

In order to obtain a distribution for the PPV, Mossman & Berger (2001) use a convolution of the distributions of p, s and t within Eq. 5. In their numerical calculation they sample random variables from the distributions of p, s and t and use Eq. 5 to find the distribution of the PPV. Both Mossman & Berger (2001) and Winkler & Smith (2004) use Jeffreys' prior for p, s and t:
$$f_j(x \mid a, b) = B(x \mid a + 1/2,\; b - a + 1/2), \tag{7}$$
where B is the beta distribution; however, alternatives are available (Bolstad 2007).

Using priors defined by $p_k = 5$, $p_n = 100$, $s_k = 9$, $s_n = 10$, $t_k = 9$ and $t_n = 10$, where the values of p, s and t have come from independent trials, the Mossman and Berger method gives the 95% confidence interval for the PPV as [0.075, 0.824].

3. The Winkler and Smith Method

Winkler & Smith (2004) diverge from Mossman & Berger (2001) and from the textbook method (see Lesaffre & Lawson (2012) for an example). They argue that the outcome of a patient's test should be used to update the distribution, and assert that the PPV of a positive medical test for a disease is not Eq. 5, Eq. 6 or Mossman and Berger's (2001) “objective Bayesian method.” Rather, it should be computed as a weighted average of two cases: assuming the positive result is a true positive (and accordingly augmenting the estimates for prevalence and sensitivity), and assuming the positive result is a false positive (and accordingly decrementing the estimates for prevalence and specificity).
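For reference, the Section 2 calculations (Eqs. 1 to 7 and the Mossman and Berger sampling step) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. p, s and t are treated as independent. Each is given the Eq. 7 distribution with a = k and b = n, which is an assumed reading of the notation. The 95% interval is taken from the empirical 2.5% and 97.5% quantiles of the simulated PPVs.

```python
import random


def rates_from_confusion(alpha, beta, gamma, delta):
    """Eqs. 1-4: prevalence p, sensitivity s, specificity t and positive-test ratio h."""
    total = alpha + beta + gamma + delta  # T_N
    p = (alpha + gamma) / total           # Eq. 1: T_S / T_N
    s = alpha / (alpha + gamma)           # Eq. 2: alpha / T_S
    t = delta / (beta + delta)            # Eq. 3: delta / T_W
    h = (alpha + beta) / total            # Eq. 4: T_+ / T_N
    return p, s, t, h


def ppv(p, s, t):
    """Eq. 5: Bayes' rule for Pr(D | +, p, s, t)."""
    return p * s / (p * s + (1 - p) * (1 - t))


def jeffreys_params(k, n):
    """Eq. 7 with a = k and b = n (an assumption): Beta(k + 1/2, n - k + 1/2)."""
    return k + 0.5, n - k + 0.5


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def ppv_expected(pk, pn, sk, sn, tk, tn):
    """Eq. 6: substitute the expected values of p, s and t into Eq. 5."""
    return ppv(beta_mean(*jeffreys_params(pk, pn)),
               beta_mean(*jeffreys_params(sk, sn)),
               beta_mean(*jeffreys_params(tk, tn)))


def ppv_interval_mc(pk, pn, sk, sn, tk, tn, draws=20000, level=0.95, seed=1):
    """Sampling sketch: draw p, s and t independently, push each draw
    through Eq. 5, and read off empirical quantiles of the PPV."""
    rng = random.Random(seed)
    a_p, b_p = jeffreys_params(pk, pn)
    a_s, b_s = jeffreys_params(sk, sn)
    a_t, b_t = jeffreys_params(tk, tn)
    samples = sorted(
        ppv(rng.betavariate(a_p, b_p),
            rng.betavariate(a_s, b_s),
            rng.betavariate(a_t, b_t))
        for _ in range(draws)
    )
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


if __name__ == "__main__":
    print(round(ppv_expected(5, 100, 9, 10, 9, 10), 3))  # → 0.267
    print(ppv_interval_mc(5, 100, 9, 10, 9, 10))
```

With the example inputs, the Jeffreys means are 5.5/101, 9.5/11 and 9.5/11, so Eq. 6 reduces to 52.25/195.5 ≈ 0.267. The sampled interval is not asserted here. For comparison, the text reports [0.075, 0.824] for the same inputs, and a sketch like this will vary with the seed and the number of draws.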
2630 Proceedings of the 29th European Safety and Reliability Conference
Starting with the same sample data as Table 1, Winkler and Smith suggest that we should construct two new confusion matrices, one assuming a true positive and one assuming a false positive (Table 2 and Table 3, respectively).

($p_k = 5$, $p_n = 100$, $s_k = 9$, $s_n = 10$, $t_k = 9$ and $t_n = 10$), we find that the 95% confidence interval is [0.062, 0.741]. Notice that this confidence interval is narrower than that of the Mossman and Berger method.
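The construction of the two updated confusion matrices can be sketched as follows. Tables 2 and 3 are not reproduced here, so this sketch assumes that the true-positive case adds one to α and the false-positive case adds one to β. That assumption is consistent with the description of the method. The counts (9, 1, 1, 9) and the function names are hypothetical. The weighting step of the Winkler and Smith method is deliberately not implemented.

```python
def rates(alpha, beta, gamma, delta):
    """Prevalence, sensitivity and specificity from a confusion matrix (Eqs. 1-3)."""
    total = alpha + beta + gamma + delta
    return (alpha + gamma) / total, alpha / (alpha + gamma), delta / (beta + delta)


def updated_cases(alpha, beta, gamma, delta):
    """Hypothetical reconstruction of the two updated matrices:
    one extra true positive (alpha + 1) and one extra false positive (beta + 1)."""
    true_pos_case = rates(alpha + 1, beta, gamma, delta)
    false_pos_case = rates(alpha, beta + 1, gamma, delta)
    return true_pos_case, false_pos_case


if __name__ == "__main__":
    base = rates(9, 1, 1, 9)                  # hypothetical sample counts
    tp, fp = updated_cases(9, 1, 1, 9)
    print("baseline       (p, s, t):", base)
    print("true positive  (p, s, t):", tp)
    print("false positive (p, s, t):", fp)
```

With these counts, the true-positive case raises p and s and leaves t unchanged. The false-positive case lowers p and t and leaves s unchanged. This matches the augment/decrement description of the Winkler and Smith method.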
$$t'_S = t. \tag{10}$$
Similarly, for the case assuming a false positive:

Table 5. New confusion matrix assuming true positives for X positive tests.
and the new sensitivity would not change,
$$s'_- = s, \tag{20}$$
and the specificity would be
$$t'_- = \frac{\delta}{\beta + X}. \tag{21}$$
We will now consider what happens to $p'_+$, $s'_+$, $t'_+$, $p'_-$, $s'_-$ and $t'_-$ when the number of tests becomes large ($X \to \infty$). We find
$$\lim_{X \to \infty} p'_+ = 1, \tag{22}$$
$$\lim_{X \to \infty} s'_+ = 1, \tag{23}$$
$$\lim_{X \to \infty} t'_+ = t, \tag{24}$$
$$\lim_{X \to \infty} p'_- = 1, \tag{25}$$
$$\lim_{X \to \infty} s'_- = s \tag{26}$$
and
$$\lim_{X \to \infty} t'_- = 0. \tag{27}$$

[Figure: cumulative distributions for the Imprecise Probability, Mossman−Berger and Winkler−Smith methods; axis marks at 1, 10, 100 and 1000.]

from the Section 2 example data set ($p_k = 5$, $p_n = 100$, $s_k = 9$, $s_n = 10$, $t_k = 9$ and $t_n = 10$). These results amount to a logical flaw in the Winkler and Smith method: we have not added any new information (apart from the number of tests), yet the uncertainty has reduced. It should be noted that this asymptote is due to the choice of prior. Interestingly, and perhaps worryingly, different priors would give different values in the limit $X \to \infty$.
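The limits in Eqs. 22 to 24, and the accompanying loss of uncertainty, can be illustrated numerically. This sketch makes several illustrative choices that are not taken from the paper. It assumes that Table 5 is formed by adding X to the true-positive count α. It uses hypothetical counts and function names. It measures uncertainty by the standard deviation of the Eq. 7 beta distribution for the updated sensitivity, taking a = α + X and b = α + γ + X.

```python
import math


def updated_rates_true_positive(alpha, beta, gamma, delta, x):
    """Rates after assuming x extra true positives (alpha -> alpha + x).
    Assumption: this is the construction behind Table 5."""
    a = alpha + x
    total = a + beta + gamma + delta
    p = (a + gamma) / total       # tends to 1 as x grows (Eq. 22)
    s = a / (a + gamma)           # tends to 1 as x grows (Eq. 23)
    t = delta / (beta + delta)    # unaffected by extra true positives (Eq. 24)
    return p, s, t


def jeffreys_sd(k, n):
    """Standard deviation of Beta(k + 1/2, n - k + 1/2), the Eq. 7 distribution."""
    a, b = k + 0.5, n - k + 0.5
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


if __name__ == "__main__":
    alpha, beta, gamma, delta = 9, 1, 1, 9  # hypothetical counts
    for x in (1, 10, 100, 1000):
        p, s, t = updated_rates_true_positive(alpha, beta, gamma, delta, x)
        sd_s = jeffreys_sd(alpha + x, alpha + gamma + x)
        print(f"X={x:5d}  p'={p:.4f}  s'={s:.4f}  t'={t:.4f}  sd(s')={sd_s:.4f}")
```

As X grows, p′ and s′ approach 1 and t′ stays at t. The spread of the sensitivity distribution also shrinks, even though the only thing that has changed is the number of tests assumed to be positive.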
$$t' = \frac{\delta}{\beta + \delta + X[0, 1]}. \tag{34}$$
$$\lim_{X \to \infty} s' = 1, \tag{36}$$
$$\lim_{X \to \infty} t' = 0. \tag{37}$$
6. Conclusion
We have shown that the Winkler and Smith method for dealing with the lack of a gold standard in a classification test is inappropriate: it leads to the illogical result of the test becoming less uncertain after more trials.
Acknowledgement