A Problem in the Bayesian Analysis of Data without Gold Standards
Nick Gray, Marco De Angelis, Dominic Calleja and Scott Ferson

Institute for Risk and Uncertainty, University of Liverpool, United Kingdom.
E-mail: nickgray@liverpool.ac.uk
We review methods of calculating the positive predictive value of a test (the probability of having a condition given
a positive test for it) in situations where there is no ‘gold standard’ way to determine the true classification. We show
that Bayesian methods lead to illogical results and instead show that a new approach using imprecise probabilities
is logically consistent.
Keywords: diagnostics, Bayes’ rule, false positives, prevalence, sensitivity, specificity, uncertainty, gold standard
Proceedings of the 29th European Safety and Reliability Conference.
Edited by Michael Beer and Enrico Zio
Copyright © 2019 European Safety and Reliability Association.
Published by Research Publishing, Singapore.
ISBN: 978-981-11-2724-3; doi:10.3850/978-981-11-2724-3 0458-cd

1. Introduction

In many applications, the problem with classification is the lack of a ‘gold standard’ of evidence. That is, we often cannot decisively assign an observation to a particular category. This phenomenon is pervasive, arising in many fields, from structural health in engineering, to supervised learning in computer science, or patient diagnosis in medicine. Tests, on which classifications rest, are often imperfect; they yield false alarms (false positives), fail to detect threats (false negatives), or are prone to other misclassifications.

For instance, medical practitioners commonly diagnose a patient’s health condition based on some diagnostic which, in isolation, is not definitive, although sometimes multiple tests can be combined in order to become definitive (Joseph et al. 1995, Albert 2009). The diagnostic result has some statistical uncertainty associated with detecting the true health state. Naively interpreting the result from a medical test can lead to an incorrect assessment of a patient’s true health condition. Bayes’ rule is commonly used to estimate the actual probability of some individual being a member of a class, subject to some piece of evidence (Mossman & Berger 2001).

It is often impractical or even impossible to know gold standard information about classification. In medicine there are many diseases for which there is no way to conclusively determine whether a patient has a particular disease. For instance, for patients with Giant Cell Arteritis, even after a biopsy has been undertaken there is still uncertainty about the true health state (Hunder et al. 1990).

There is also the situation where gold standard information can only be gathered from some classes, yet for others it is unknown. For example, some prison authorities use classification algorithms to assess whether a prisoner is likely to reoffend when released from prison on parole (Fry 2018). As authorities are unwilling to release prisoners if the test says that they are likely to return to crime, it would be impossible to know whether the recidivism test was accurate or not. If a prisoner fails the test, and thus remains imprisoned, there is no available data on whether they would have re-offended had they been released. Therefore, we could never know if it was a true negative or a false negative.

Winkler & Smith (2004) have argued that the traditional application of Bayes’ rule in medical counselling is inappropriate and represents a confusion in the medical decision-making literature. They propose in its place a radically different formulation that makes special use of the information about the test results for new patients, although not their actual disease status. As the ground truth cannot be established, they instead construct two new confusion matrices: one based upon the assumption the test is a true positive, and the other assuming the test is a false positive. They then make use of these alternative facts in order to update the test’s sensitivity, specificity and underlying prevalence of the disease, thus reducing the test’s uncertainty asymptotically as the test is applied.

According to Google Scholar, the Winkler and Smith paper has only been referenced 11 times since its publication: Jafar et al. (2007), Finkel (2008), Zuk (2008), Proeve (2009), Weber (2009), Raab (2010), Low-Choy et al. (2011), Cuevas (2015), Cuevas et al. (2016), Rzepiński (2018), Rushdi & Rushdi (2018). The majority of these are papers relating to medical decision making. However, Proeve (2009) concerns child abuse decision making and Low-Choy et al. (2011) concerns plant pest dispersal. Only Cuevas et al. (2016) appears to actually make use of their method. Nevertheless, we think their argument deserves a clear rebuttal because of the centrality of the issue in diagnostic testing, and the remarkable delicacy of Bayesian reasoning by which
such a profound disagreement could emerge and escape resolution for many years.

2. The Standard Bayesian Approach to Calculating Positive Predictive Value

Throughout this paper, we will refer to the following hypothetical dataset for T_N trials of a diagnostic test for condition D: we have α true positives, β false positives, γ false negatives and δ true negatives. Let T_+ be the total number of positive tests, T_− the total number of negative tests, T_S the total number of sick people and T_W the total number of well people. This allows us to construct the confusion matrix shown in Table 1.

Table 1. Sample data.
                 Has Problem   No Problem   Total
Positive Test    α             β            T_+
Negative Test    γ             δ            T_−
Total            T_S           T_W          T_N

The probability that someone has the disease given they have had a positive test result is given by Pr(D|+), whilst the probability that they do not have the disease is Pr(¬D|+). Throughout, we will only consider positive test results; however, all the arguments made could equally apply to negative test results. The prevalence is given by

$$p = \frac{T_S}{T_N} = \frac{\alpha + \gamma}{\alpha + \beta + \gamma + \delta}, \tag{1}$$

the sensitivity by

$$s = \frac{\alpha}{T_S} = \frac{\alpha}{\alpha + \gamma}, \tag{2}$$

and the specificity by

$$t = \frac{\delta}{T_W} = \frac{\delta}{\beta + \delta}. \tag{3}$$

We will also define the ratio of positive tests out of the total number tested as

$$h = \frac{T_+}{T_N} = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta}. \tag{4}$$

Often the values of p, s and t are published independently of each other, in which case the following notation can also be used: p = p_k/p_n, s = s_k/s_n and t = t_k/t_n.

Given the published values of p, s and t, Bayes’ rule gives the probability that a patient has D given they have tested positive as

$$\Pr(D \mid +, p, s, t) = \frac{p s}{p s + (1 - p)(1 - t)} \tag{5}$$

(Baron 1994, Lesaffre & Lawson 2012). We are usually more interested in obtaining the probability that is only conditioned on a positive test outcome, Pr(D|+), which is also known as the positive predictive value (PPV). When p, s and t are available in scalar form, we can obtain Pr(D|+) = Pr(D|+, p, s, t).

The Mossman & Berger (2001) paper takes this a step further and considers the following hypothetical situation:

    Mr Smith has tested positive for disorder D. He asks his doctor the following: “Given the published estimates for prevalence, sensitivity and specificity, what is the 95% confidence interval for my probability of having D given my positive test results and the imprecision in the estimates?”

When there is uncertainty about the values of p, s and t they can be described by distributions (Smith & Winkler 1999, Smith et al. 2000). There are a couple of ways in which the PPV can be determined. The simplest is to estimate it using Eq. 5 but replacing p, s and t with their expected values:

$$\Pr{}^{*}(D \mid +, p, s, t) = \frac{E(p)E(s)}{E(p)E(s) + (1 - E(p))(1 - E(t))}. \tag{6}$$

In order to obtain a distribution for the PPV, Mossman & Berger (2001) use a convolution of the distributions of p, s and t within Eq. 5. In their numerical calculation they sample random variables from the distributions of p, s and t and use Eq. 5 to find the distribution of the PPV.

Both Mossman & Berger (2001) and Winkler & Smith (2004) use the Jeffreys prior for p, s and t:

$$f_j(x \mid a, b) = B\left(x \mid a + \tfrac{1}{2},\; b - a + \tfrac{1}{2}\right), \tag{7}$$

where B is the beta distribution; however, alternatives are available (Bolstad 2007).

Using priors defined by p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10, where the values of p, s and t have come from independent trials, the Mossman and Berger method gives the 95% confidence interval for the PPV as [0.075, 0.824].

3. The Winkler and Smith Method

Winkler & Smith (2004) diverge from Mossman & Berger (2001) and the textbook method (see Lesaffre & Lawson (2012) for an example): they argue that the outcome of a patient’s test should be used to update the distribution. They assert that the PPV of a positive medical test for a disease is not Eq. 5, Eq. 6 or Mossman and Berger’s (2001) “objective Bayesian method”. Rather, it should be computed as a weighted average of assuming the positive result is a true positive (and accordingly augmenting the estimates for prevalence and sensitivity) and assuming the positive result is a false positive (and accordingly decrementing the estimates for prevalence and specificity).
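As a rough illustration of the Section 2 calculation that the Winkler and Smith method departs from, the sampling procedure can be sketched as below. This is only a minimal sketch under stated assumptions, not the implementation used by Mossman & Berger (2001): the function names (`ppv`, `jeffreys_draw`, `ppv_interval`), the number of draws, the seed and the simple order-statistic quantile rule are all choices made here for illustration. It draws p, s and t independently from the beta distributions of Eq. 7, pushes each draw through Eq. 5, and reads off the central 95% of the resulting PPV values.

```python
import random


def ppv(p, s, t):
    """Eq. 5: PPV from prevalence p, sensitivity s and specificity t."""
    return p * s / (p * s + (1.0 - p) * (1.0 - t))


def jeffreys_draw(k, n, rng):
    """One draw from the beta distribution of Eq. 7, Beta(k + 1/2, n - k + 1/2)."""
    return rng.betavariate(k + 0.5, n - k + 0.5)


def ppv_interval(pk, pn, sk, sn, tk, tn, draws=20000, level=0.95, seed=1):
    """Monte Carlo interval for the PPV (draws, level and seed are assumptions)."""
    rng = random.Random(seed)
    samples = sorted(
        ppv(
            jeffreys_draw(pk, pn, rng),
            jeffreys_draw(sk, sn, rng),
            jeffreys_draw(tk, tn, rng),
        )
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    # Simple order-statistic quantiles; other quantile rules would also do.
    low = samples[int(tail * (draws - 1))]
    high = samples[int((1.0 - tail) * (draws - 1))]
    return low, high


# Point value from Eq. 5 with p = 5/100, s = 9/10, t = 9/10.
point = ppv(5 / 100, 9 / 10, 9 / 10)

# Sampled interval for the same example data.
low, high = ppv_interval(5, 100, 9, 10, 9, 10)
print(point, low, high)
```

With the example values, Eq. 5 gives 0.045/0.14, roughly 0.32, as the point value. The sampled interval depends on the seed and the number of draws, so it should be compared with the published interval only loosely.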
Starting with the same sample data as Table 1, Winkler and Smith suggest that we should construct two new confusion matrices, one assuming a true positive and one assuming a false positive, Table 2 and Table 3 respectively.

Table 2. New confusion matrix assuming true positive.
                 Has Problem   No Problem   Total
Positive Test    α + 1         β            T_+ + 1
Negative Test    γ             δ            T_−
Total            T_S + 1       T_W          T_N + 1

Table 3. New confusion matrix assuming false positive.
                 Has Problem   No Problem   Total
Positive Test    α             β + 1        T_+ + 1
Negative Test    γ             δ            T_−
Total            T_S           T_W + 1      T_N + 1

Assuming a true positive, the prevalence, sensitivity and specificity become:

$$p'_S = \frac{T_S + 1}{T_N + 1}, \tag{8}$$

$$s'_S = \frac{\alpha + 1}{T_S + 1}, \tag{9}$$

$$t'_S = t. \tag{10}$$

Similarly, for the assumed false positive case:

$$p'_W = \frac{T_S}{T_N + 1}, \tag{11}$$

$$s'_W = s, \tag{12}$$

$$t'_W = \frac{\delta}{T_W + 1}. \tag{13}$$

They then say to construct a weighted average of the new confusion matrices using

$$\Pr(D \mid +) = f(p'_S, s'_S, t'_S)\,\Pr(D \mid +, p, s, t) + f(p'_W, s'_W, t'_W)\,\Pr(\neg D \mid +, p, s, t), \tag{14}$$

where f(p, s, t) is a convolution of the Jeffreys prior distributions (Eq. 7) for p, s and t:

$$f(p, s, t) = f_j(p \mid p_k, p_n)\, f_j(s \mid s_k, s_n)\, f_j(t \mid t_k, t_n), \tag{15}$$

Pr(D|+, p, s, t) is calculated using Eq. 6 and Pr(¬D|+, p, s, t) = 1 − Pr(D|+, p, s, t). Returning to the numerical example at the end of Section 2 (p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10), we find that the 95% confidence interval is [0.062, 0.741]. Notice that the width of this confidence interval is smaller than that of the Mossman and Berger method.

3.1. The Logical Inconsistency with the Winkler and Smith Method

In order to show that the Winkler and Smith method is flawed, we will explore the situation in which more than one test is conducted and show that it leads to a reductio ad absurdum as the number of tests approaches infinity. We will then contrast this with the imprecise probability approach which, whilst not providing useful information, at least makes some sense.

As Winkler and Smith make use of the test result in their calculations, instead of considering the effect of just one test, we can also consider what happens after X positive tests. Using the Winkler and Smith method we would have the following two confusion matrices: Table 4 and Table 5.

Table 4. New confusion matrix assuming true positive for X positive tests.
                 Has Problem   No Problem   Total
Positive Test    α + X         β            T_+ + X
Negative Test    γ             δ            T_−
Total            T_S + X       T_W          T_N + X

Table 5. New confusion matrix assuming false positive for X positive tests.
                 Has Problem   No Problem   Total
Positive Test    α             β + X        T_+ + X
Negative Test    γ             δ            T_−
Total            T_S           T_W + X      T_N + X

In the assumed true positive case the new prevalence would be given by

$$p'_+ = \frac{T_S + X}{T_N + X}, \tag{16}$$

the new sensitivity by

$$s'_+ = \frac{\alpha + X}{\alpha + X + \gamma}, \tag{17}$$

and the specificity would not change:

$$t'_+ = t. \tag{18}$$

Using Table 5 for the assumed false positive case, the new prevalence is

$$p'_- = \frac{T_S}{T_N + X}, \tag{19}$$
and the new sensitivity would not change,

$$s'_- = s, \tag{20}$$

and the specificity would be

$$t'_- = \frac{\delta}{\beta + \delta + X}. \tag{21}$$

We will now consider what happens to p'_+, s'_+, t'_+, p'_−, s'_− and t'_− when the number of tests becomes large (X → ∞). We find

$$\lim_{X \to \infty} p'_+ = 1, \tag{22}$$

$$\lim_{X \to \infty} s'_+ = 1, \tag{23}$$

$$\lim_{X \to \infty} t'_+ = t, \tag{24}$$

$$\lim_{X \to \infty} p'_- = 0, \tag{25}$$

$$\lim_{X \to \infty} s'_- = s, \tag{26}$$

and

$$\lim_{X \to \infty} t'_- = 0. \tag{27}$$

Hence, applying Equation 5 to the assumed true positive limits (Equations 22 to 24) gives Pr(D|+) = 1, which implies that Pr(¬D|+) = 0. Naively interpreting this result at face value would let you conclude that any positive test at this limit is a true positive or, equivalently, that there is no such thing as a false positive. Winkler and Smith do not say to use this result, though; they would next use Equation 14 along with Equation 15. At the limit this gives us:

$$\Pr(D \mid +, p, s, t) = f_\beta(p \mid T_S + X,\, T_N + X) \times f_\beta(s \mid \alpha + X,\, \alpha + \gamma + X) \times f_\beta(t \mid \delta,\, \beta + \delta). \tag{28}$$

As

$$\lim_{a, b \to \infty} f_\beta(x \mid a, b) = \begin{cases} \infty & x = 1/2 \\ 0 & x \neq 1/2 \end{cases} \tag{29}$$

and

$$\int_{-\infty}^{\infty} \Pr(x)\,dx = 1, \tag{30}$$

then the cumulative distribution for the PPV becomes

$$\Pr(D \mid +, p, s, t) = \begin{cases} 0 & x < 1/2 \\ 1 & x \geq 1/2 \end{cases}. \tag{31}$$

Figure 1 shows this migration from the first test, X = 1, towards the result in Equation 31, starting from the Section 2 example data set (p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10). These results amount to a logical flaw in the Winkler and Smith method. We have not added any new information (apart from the number of tests) and yet the uncertainty has reduced. It should be noted that this asymptote is due to the choice of prior. Interestingly, and perhaps worryingly, different priors would give different values in the limit X → ∞.

[Figure 1: cumulative distribution (vertical axis) against PPV (horizontal axis); legend entries 1, 10, 100 and 1000.]
Fig. 1. Plot of the CDF for the first 1000 tests using the Winkler and Smith method.

4. Imprecise Probability Approach

It is possible to reconsider the argument made by Winkler and Smith using a framework provided by the theory of imprecise probabilities (Walley 1991, 1996, Walley et al. 1996). Under this perspective, the prevalence p, sensitivity s, and specificity t can each be updated as prescribed by Winkler and Smith to yield a distribution for the PPV assuming the patient is actually sick (in which case the test was a true positive) and a distribution for the PPV assuming the patient is actually not sick (in which case the test was a false positive). However, the appropriate synthesis of these two contingent estimates of the PPV is not a weighted mixture as Winkler and Smith conceive it. Instead, because whether the patient is sick or not is precisely what is unknown in this problem, an envelope of the two distributions would be more appropriate.
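To make the idea of an envelope concrete, the sketch below bounds two empirical CDFs pointwise. It is a minimal illustration under assumptions made here, not the authors' code: the helper names (`ecdf`, `envelope`), the evaluation grid and, in particular, the two small sample lists are hypothetical placeholders rather than values from the paper. The envelope is taken as the pointwise minimum and maximum of the two CDFs, so any weighted mixture of the two distributions lies between the bounds.

```python
import bisect


def ecdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def envelope(cdf_a, cdf_b, grid):
    """Pointwise bounds of two CDFs on a grid: (x, lower CDF, upper CDF)."""
    return [
        (x, min(cdf_a(x), cdf_b(x)), max(cdf_a(x), cdf_b(x)))
        for x in grid
    ]


# Hypothetical PPV samples for the two contingent cases (illustrative only).
ppv_if_false_positive = [0.10, 0.20, 0.25, 0.40, 0.55]
ppv_if_true_positive = [0.15, 0.30, 0.45, 0.60, 0.80]

cdf_fp = ecdf(ppv_if_false_positive)
cdf_tp = ecdf(ppv_if_true_positive)

grid = [i / 10 for i in range(11)]  # PPV values 0.0, 0.1, ..., 1.0
box = envelope(cdf_fp, cdf_tp, grid)

for x, lower, upper in box:
    print(f"PPV <= {x:.1f}: probability in [{lower:.2f}, {upper:.2f}]")
```

Keeping both bounds, rather than averaging them, is what distinguishes this construction from a mixture with fixed weights: the gap between the bounds records that the true disease status behind the test result is unknown.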
Returning to the numerical example in Section 2, with priors for p, s and t implied by p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10, the envelope of the two contingent distributions yields a rather wide probability box (Ferson et al. 2003), which is shown as the outer bounds in black in Figure 2. The leftmost edge corresponds to the distribution that assumes the positive test result was a false positive, updating p and t according to Table 3. The rightmost edge corresponds to the distribution that assumes the test result was a true positive, updating p and s according to Table 2. This enveloping calculation is equivalent to a mixture with unknown weights characterised by the vacuous interval [0, 1] for both distributions. For comparison, the traditional Bayesian result and the Winkler and Smith result are both also shown in the figure. We see that the envelope encloses both the traditional and the Winkler and Smith distributions. The 95% confidence interval using imprecise probabilities is [0.057, 0.848], which as expected encompasses the intervals for both the traditional Bayesian result and the Winkler and Smith result. This envelope is reminiscent of the probabilistic dilation of uncertainty that sometimes accompanies the addition of weakly informative data in probabilistic calculations (Seidenfeld & Wasserman 1993). In this case, the unverified test result is certainly information, but it does not seem to be information that leads to a contraction of uncertainty about what the test result itself means.

[Figure 2: PPV (horizontal axis) from 0 to 1, vertical axis from 0 to 1; legend entries Imprecise Probability, Mossman-Berger and Winkler-Smith.]
Fig. 2. P-box showing the distribution envelope for the PPV.

4.1. Logical Consistency

Having said that the Winkler and Smith method becomes logically inconsistent when we consider it in the extreme scenario, we will now show that using imprecise probabilities leads to at least a logical result in the limit. The imprecise probability confusion matrix after X tests is shown in Table 6.

Table 6. New imprecise probability confusion matrix after X positive tests.
                 Has Problem    No Problem     Total
Positive Test    α + X[0, 1]    β + X[0, 1]    T'_+
Negative Test    γ[0, 1]        δ[0, 1]        T'_−
Total            T'_S           T'_W           T_N + X

The new prevalence would be

$$p' = \frac{T'_S}{T_N + X} = \frac{\alpha + \gamma + X[0, 1]}{T_N + X}, \tag{32}$$

the new sensitivity

$$s' = \frac{\alpha + X[0, 1]}{\alpha + \gamma + X[0, 1]}, \tag{33}$$

and the new specificity

$$t' = \frac{\delta}{\beta + \delta + X[0, 1]}. \tag{34}$$

Now at the X → ∞ limit:

$$\lim_{X \to \infty} p' = [0, 1], \tag{35}$$

$$\lim_{X \to \infty} s' = 1, \tag{36}$$

$$\lim_{X \to \infty} t' = 0. \tag{37}$$

Using these results along with Eq. 5 gives

$$\Pr(D \mid +) = [0, 1] \tag{38}$$

as the final value for the PPV. Figure 3 shows the migration to the vacuous p-box using the imprecise probability method, starting from the Mossman & Berger (2001) sample data.
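The step from the limits in Equations 35 to 37 to Equation 38 can be written out explicitly. The lines below are only a worked substitution into Eq. 5, treating the interval as the set of possible values of p' and taking the stated limits s' = 1 and t' = 0 as given:

```latex
\Pr(D \mid +)
  = \frac{p' s'}{p' s' + (1 - p')(1 - t')}
  = \frac{p' \cdot 1}{p' \cdot 1 + (1 - p')(1 - 0)}
  = \frac{p'}{p' + 1 - p'}
  = p' ,
\qquad p' \in [0, 1].
```

In other words, at the limit the PPV simply inherits the vacuous interval on the prevalence.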
[Figure 3: PPV (horizontal axis) from 0 to 1, vertical axis from 0 to 1; legend entries 1, 10, 100 and 1000.]
Fig. 3. Plot of the p-boxes for the first 1000 tests using the imprecise probability method.

5. Discussion

Let us first consider the difference between the Winkler and Smith method and the imprecise probability method. Figure 1 shows that as the number of tests increases the uncertainty on the PPV decreases. This amounts to a reductio ad absurdum, thus proving their method untenable. This uncertainty reduction happens even after one test, as demonstrated in the numerical example in Section 3.

In the imprecise version, we have also given the test no information but the uncertainty has increased, which we argue is reasonable. Although at the infinity limit the vacuous p-box result is not useful, it at least makes logical sense. It is perfectly reasonable to say “I don’t know” when one does not know.

Equation 15 assumes that the prevalence, sensitivity and specificity are independent of each other. In their data this is a fair assumption, as the values are stated to come from different independent studies; however, this may not always be the case. For example, when conducting non-invasive prenatal screening for foetal aneuploidy conditions, such as Down’s syndrome, the prevalence of these conditions changes with the age of the mother and, as the condition is rare, studies of the test statistics are often focused on higher risk categories (Badeau et al. 2015, Montgomery et al. 2017). Therefore, it is not unimaginable that there is dependence between p, s and t.

6. Conclusion

We have shown that the Winkler and Smith method for dealing with the lack of a gold standard in a classification test is inappropriate: it leads to the illogical result of the test becoming less uncertain after more trials even though no new information is added. We have shown that it is possible to reimagine their method using imprecise probabilities in order to create logically consistent results.

Acknowledgement

This research is partly funded by the UK Engineering & Physical Sciences Research Council (EPSRC) “Digital twins for improved dynamic design”, through grant number EP/R006768/1, and the UK Medical Research Council (MRC) “Treatment According to Response in Giant cEll arTeritis (TARGET)”, through grant number MR/N011775/1. The funding and support of EPSRC and MRC are gratefully acknowledged.

This paper benefited from discussion with many people, including Masatoshi Sugeno, Jack Siegrist, Michael Balch and Jason O’Rawe.

References

Albert, P. S. (2009), ‘Estimating diagnostic accuracy of multiple binary tests with an imperfect reference standard’, Statistics in Medicine 28, 780–797.
Badeau, M., Lindsay, C., Blais, J., Takwoingi, Y., Langlois, S., Légaré, F., Giguère, Y., Turgeon, A. F., William, W. & Rousseau, F. (2015), ‘Genomics-based non-invasive prenatal testing for detection of fetal chromosomal aneuploidy in pregnant women’, Cochrane Database of Systematic Reviews (7).
Baron, J. A. (1994), ‘Uncertainty in Bayes’, Medical Decision Making 14(1), 46–51.
Bolstad, W. M. (2007), Introduction to Bayesian Statistics, 2 edn, Wiley.
Cuevas, J. R. T., Bravo Melo, L. & Achcar, J. (2016), ‘Estimación del Valor Predictivo Positivo de la Colangiopancreatografía Magnética utilizando metodos de Bayes. (Estimation of the Positive Predictive Value of the Magnetic resonance Cholangiopancreatography using Bayes methods) (in Spanish)’, Revista Médica de Risaralda 22(1), 19–26.
Cuevas, T. (2015), ‘Inferencia Bayesiana e Investigación en salud: un caso de aplicación en diagnóstico clínico (Bayesian Inference and Health Research: a case of application in clinical diagnosis) (in Spanish)’, Revista Médica de Risaralda 21(3), 9–16.
Ferson, S., Kreinovich, V., Ginzburg, L., Myers, D. S. & Sentz, K. (2003), Constructing Probability Boxes and Dempster-Shafer Structures, Technical Report January, Sandia National Lab. (SNL-NM), Albuquerque, United States.
Finkel, A. M. (2008), Protecting People in Spite of–or Thanks to–the Veil of Ignorance, in R. R. Sharp & G. E. Marchant, eds, ‘Genomics and Environmental Regulation: Science, Ethics, and Law’, The Johns Hopkins University Press, Baltimore, pp. 290–342.
Fry, H. (2018), Hello World: How to be Human in the Age of the Machine, WW Norton & Company Inc, New York.
Hunder, G. G., Bloch, D. A., Michel, B. A., Stevens, M. B., Arend, W. P., Calabrese, L. H., Edworthy, S. M., Fauci, A. S., Leavitt, R. Y., Lie, J. T., Lightfoot Jr., R. W., Masi, A. T., McShane, D. J., Mills, J. A., Wallace, S. L. & Zvaifler, N. J. (1990), ‘The American College of Rheumatology 1990 criteria for the classification of giant cell arteritis’, Arthritis & Rheumatism 33(8), 1122–1128.
Jafar, T. H., Chaturvedi, N., Hatcher, J. & Levey, A. S. (2007), ‘Use of albumin creatinine ratio and urine albumin concentration as a screening test for albuminuria in an Indo-Asian population’, Nephrology Dialysis Transplantation 22(8), 2194–2200.
Joseph, L., Gyorkos, T. W. & Coupal, L. (1995), ‘Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard’, American Journal of Epidemiology 141(3), 263–272.
Lesaffre, E. & Lawson, A. B. (2012), Bayesian Biostatistics, John Wiley & Sons, Ltd, Chichester.
Low-Choy, S., Hammond, N., Penrose, L., Anderson, C. & Taylor, S. (2011), Dispersal in a hurry: Bayesian learning from surveillance to establish area freedom from plant pests with early dispersal, in ‘MODSIM2011, 19th International Congress on Modelling and Simulation’, Perth, Australia, pp. 2521–2527.
Montgomery, J., Caney, S., Clancy, T., Edwards, J., Gallagher, A., Greenfield, A., Haimes, E., Hughes, J., Jackson, R., Lawrence, D., Pattinson, S. D., Shakespeare, T., Siddiqui, M., Watson, C., Widdows, H., Wishart, A. & de Zulueta, P. (2017), Non-invasive prenatal testing: ethical issues, Technical report, Nuffield Council on Bioethics.
Mossman, D. & Berger, J. O. (2001), ‘Intervals for posttest probabilities: A comparison of 5 methods’, Medical Decision Making 21(6), 498–507.
Proeve, M. (2009), ‘Issues in the application of Bayes’ Theorem to child abuse decision making’, Child Maltreatment 14(1), 114–120.
Raab, S. (2010), Kidney and Urinary Tract, in W. Gray & G. Kocjan, eds, ‘Diagnostic Cytopathology’, 3 edn, Churchill Livingstone Elsevier, pp. 365–401.
Rushdi, R. A. & Rushdi, A. M. (2018), ‘Karnaugh-Map Utility in Medical Studies: The Case of Fetal Malnutrition’, International Journal of Mathematical, Engineering and Management Sciences (IJMEMS) 3(3), 220–244.
Rzepiński, T. (2018), ‘Twierdzenie Bayesa w projektowaniu strategii diagnostycznych w medycynie (Making Diagnostic Strategies in Medical Practice with the Use of Bayes’ Theorem) (in Polish)’, Diametros 57(57), 39–60.
Seidenfeld, T. & Wasserman, L. (1993), ‘Dilation for Sets of Probabilities’, The Annals of Statistics 21(3), 1139–1154.
Smith, J. E. & Winkler, R. L. (1999), ‘Casey’s Problem: Interpreting and Evaluating a New Test’, Interfaces 29(3), 63–76.
Smith, J. E., Winkler, R. L. & Fryback, D. G. (2000), ‘The First Positive: Computing Positive Predictive Value at the Extremes’, Annals of Internal Medicine 132(10), 804.
Walley, P. (1991), Statistical reasoning with imprecise probabilities, Chapman and Hall, London.
Walley, P. (1996), ‘Inferences from Multinomial Data: Learning about a Bag of Marbles’, Journal of the Royal Statistical Society. Series B (Methodological) 58(1), 3–57.
Walley, P., Gurrin, L. & Burton, P. (1996), ‘Analysis of Clinical Data Using Imprecise Prior Probabilities’, Journal of the Royal Statistical Society. Series D (The Statistician) 45(4), 457–485.
Weber, K. M. (2009), Making a treatment decision for breast cancer: Associations among marital qualities, couple communication, and breast cancer treatment decision outcomes, PhD thesis, The Pennsylvania State University. URL: https://etda.libraries.psu.edu/files/final submissions/3502
Winkler, R. L. & Smith, J. E. (2004), ‘On Uncertainty in Medical Testing’, Medical Decision Making 24(6), 654–658.
Zuk, T. (2008), Visualizing Uncertainty, PhD thesis, The University of Calgary. URL: https://prism.ucalgary.ca/bitstream/handle/1880/46780/Zuk 2008.pdf?sequence=1
