
# A brief introduction to Bayes' Rule

by Kevin Murphy.
## Intuition

Here is a simple introduction to Bayes' rule from an article in the Economist (9/30/00).
"The essence of the Bayesian approach is to provide a mathematical rule explaining how you should
change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine
new data with their existing knowledge or expertise. The canonical example is to imagine that a
precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He
assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and
one black marble into a bag. The following day, when the sun rises, the child places another white marble
in the bag. The probability that a marble plucked randomly from the bag will be white (i.e., the child's
degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the
child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds
to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each
morning is modified to become a near-certainty that the sun will always rise."
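The marble bookkeeping in the story is Laplace's rule of succession: after n observed sunrises the bag holds n+1 white marbles out of n+2. A minimal sketch (the function name is my own, not from the article):

```python
# Laplace's "rule of succession" from the sunset story: start with one
# white and one black marble (a uniform prior), then add one white
# marble after each observed sunrise.

def belief_in_sunrise(days_observed):
    """P(next marble is white) after `days_observed` sunrises."""
    white = 1 + days_observed  # one prior white marble plus one per sunrise
    total = 2 + days_observed  # the bag started with two marbles
    return white / total

print(belief_in_sunrise(0))  # 0.5 -- equal prior probabilities
print(belief_in_sunrise(2))  # 0.75 -- three-quarters after two sunrises
```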
## In symbols
Mathematically, Bayes' rule states
likelihood * prior
posterior = ------------------------------
marginal likelihood
or, in symbols,
P(e | R=r) P(R=r)
P(R=r | e) = -----------------
P(e)
where P(R=r | e) denotes the probability that random variable R has value r given evidence e. The
denominator is just a normalizing constant that ensures the posterior adds up to 1; it can be computed by
summing up the numerator over all possible values of R, i.e.,
P(e) = P(R=0, e) + P(R=1, e) + ... = sum_r P(e | R=r) P(R=r)
This is called the marginal likelihood (since we marginalize out over R), and gives the prior probability of
the evidence.
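The rule translates directly into code. A minimal sketch, assuming a discrete variable R with distributions given as plain dictionaries (the function name and example numbers are illustrative):

```python
# posterior = likelihood * prior / marginal likelihood, where the
# marginal likelihood P(e) = sum_r P(e | R=r) P(R=r) normalizes the result.

def posterior(prior, likelihood):
    """prior: {r: P(R=r)}; likelihood: {r: P(e | R=r)} for fixed evidence e."""
    joint = {r: likelihood[r] * prior[r] for r in prior}  # numerator terms
    marginal = sum(joint.values())                        # P(e)
    return {r: p / marginal for r, p in joint.items()}    # sums to 1

# Example with a binary R and a uniform prior:
post = posterior({0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8})
print(post)  # {0: 0.2, 1: 0.8}
```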
## Example of Bayes' rule

Here is a simple example, based on Mike Shor's Java applet. Suppose you have tested positive for a
disease; what is the probability that you actually have the disease? It depends on the accuracy and
sensitivity of the test, and on the background (prior) probability of the disease.
Let P(Test=+ve | Disease=true) = 0.95, so the false negative rate, P(Test=-ve | Disease=true), is 5%. Let
Bayes' Rule http://people.cs.ubc.ca/~murphyk/Bayes/bayesrule.html
1 of 4 9/24/2009 9:03 AM
P(Test=+ve | Disease=false) = 0.05, so the false positive rate is also 5%. Suppose the disease is rare:
P(Disease=true) = 0.01 (1%). Let D denote Disease (R in the above equation) and "T=+ve" denote the
positive Test (e in the above equation). Then
P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve|D=true) * P(D=true)+ P(T=+ve|D=false) * P(D=false)

0.95 * 0.01 0.0095
= ------------------- = ------- = 0.161
0.95*0.01 + 0.05*0.99 0.0590
So the probability of having the disease given that you tested positive is just 16%. This seems too low, but
here is an intuitive argument to support it. Of 100 people, we expect only 1 to have the disease, and that
person will probably test positive. But we also expect about 5% of the others (about 5 people in total) to
test positive by accident. So of the 6 people who test positive, we only expect 1 of them to actually have the
disease; and indeed 1/6 is approximately 0.16. (If you still don't believe this result, try reading An Intuitive
Explanation of Bayesian Reasoning by Eliezer Yudkowsky.)
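Both the formula and the frequency argument can be checked in a few lines (the variable names are mine; the numbers are those from the text):

```python
# Plugging the disease-test numbers into Bayes' rule:
# sensitivity 0.95, false positive rate 0.05, prior 0.01.
sens, fpr, prior = 0.95, 0.05, 0.01
marginal = sens * prior + fpr * (1 - prior)   # P(T=+ve)
post = sens * prior / marginal
print(round(post, 3))  # 0.161

# The frequency argument, scaled to 10,000 people: 100 have the disease
# and ~95 of them test positive; ~495 of the 9,900 healthy people also
# test positive, so 95 / (95 + 495) gives the same answer.
print(round(95 / (95 + 495), 3))  # 0.161
```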
In other words, the reason the number is so small is that you believed that this is a rare disease; the test has
made it 16 times more likely you have the disease (P(D=true | T=+ve) / P(D=true) = 0.16/0.01 = 16), but it is still
unlikely in absolute terms. If you want to be "objective", you can set the prior to uniform (i.e., effectively
ignore the prior), and then get
P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve)

0.95 * 0.5 0.475
= ------------------- = ------- = 0.95
0.95*0.5 + 0.05*0.5 0.5
This, of course, is just the true positive rate of the test. However, this conclusion relies on your belief that, if
you did not conduct the test, half the people in the world have the disease, which does not seem reasonable.
A better approach is to use a plausible prior (e.g., P(D=true) = 0.01), but then conduct multiple independent
tests; if they all show up positive, then the posterior will increase. For example, if we conduct two
(conditionally independent) tests T1, T2 with the same reliability, and they are both positive, we get
P(T1=+ve|D=true) * P(T2=+ve|D=true) * P(D=true)
P(D=true|T1=+ve,T2=+ve) = ------------------------------------------------------------
P(T1=+ve, T2=+ve)

          0.95 * 0.95 * 0.01              0.009025
  = --------------------------------- = ---------- = 0.785
    0.95*0.95*0.01 + 0.05*0.05*0.99        0.0115
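Under the conditional-independence assumption the likelihoods simply multiply, so the two-test posterior is a one-line calculation (variable names are mine):

```python
# Two conditionally independent positive tests: multiply the likelihoods.
sens, fpr, prior = 0.95, 0.05, 0.01
num = sens * sens * prior               # P(T1,T2 | D=true) * P(D=true)
den = num + fpr * fpr * (1 - prior)     # marginal P(T1=+ve, T2=+ve)
print(round(num, 6))       # 0.009025
print(round(num / den, 3)) # 0.785 -- a second positive test raises 16% to ~78%
```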
The assumption that the pieces of evidence are conditionally independent is called the naive Bayes
assumption. This model has been successfully used for classifying email as spam (D=true) or not (D=false)
given the presence of various key words (Ti=+ve if word i is in the text, else Ti=-ve). It is clear that the
words are not independent, even conditioned on spam/not-spam, but the model works surprisingly well
nonetheless.
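A toy version of such a spam filter fits in a few lines. All the word probabilities and the spam prior below are made-up illustration values, not from the text:

```python
# Toy naive Bayes spam filter: per-word likelihoods multiply, exactly as
# in the two-test example above. (All numbers here are hypothetical.)
spam_prior = 0.4
p_word_spam = {"viagra": 0.3,   "meeting": 0.05}  # P(word | spam)
p_word_ham  = {"viagra": 0.001, "meeting": 0.2}   # P(word | not spam)

def p_spam_given_words(words):
    ps = spam_prior          # running numerator for D=spam
    ph = 1 - spam_prior      # running numerator for D=not spam
    for w in words:
        ps *= p_word_spam[w]  # naive Bayes: likelihoods just multiply
        ph *= p_word_ham[w]
    return ps / (ps + ph)     # normalize, as in Bayes' rule

print(p_spam_given_words(["viagra"]))   # close to 1: strong spam evidence
print(p_spam_given_words(["meeting"]))  # well below 0.5: ham-like word
```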
## What is the relationship between graphical models and Bayes' rule?
For complicated probabilistic models, computing the normalizing constant P(e) is computationally
intractable, either because there are an exponential number of (discrete) values of R to sum over, or
because the integral over R cannot be solved in closed form (e.g., if R is a high-dimensional vector).
Graphical models can help because they represent the joint probability distribution as a product of local
terms, which can sometimes be exploited computationally (e.g., using dynamic programming or Gibbs
sampling). Bayes nets (directed graphical models) are a natural way to represent many hierarchical
Bayesian models.
## Articles in the popular press

- Science article, 19 November 1999, by David Malakoff. Excellent non-technical overview.
- Economist article (9/30/00) about Bayesian approaches to clinical trials.
- New York Times article (4/28/01) about Bayesian statistics.
## Entry-level Books

- "A First Course in Bayesian Statistical Methods", Peter Hoff, Springer 2009
- "Bayesian Computation with R", Jim Albert, Springer 2009 (2nd edition)
- "Bayesian Statistics: An Introduction", Peter M. Lee, Third Edition, 2004
- "Bayesian Data Analysis", A. Gelman et al.
- "The Bayesian Choice", C. Robert
- "Bayesian Theory", Bernardo et al.
- "Probability Theory: The Logic of Science", Jaynes
## History of Bayes' rule

(The following section was written by Alan Yuille.)
Bayes' Theorem is commonly ascribed to the Reverend Thomas Bayes (1701-1761), who left one hundred
pounds in his will to Richard Price, "now I suppose Preacher at Newington Green." Price discovered two
unpublished essays among Bayes's papers, which he forwarded to the Royal Society. This work made little
impact, however, until it was independently discovered a few years later by the great French
mathematician Laplace. English mathematicians then quickly rediscovered Bayes' work.

Little is known about Bayes, and he is considered an enigmatic figure. One leading historian of statistics,
Stephen Stigler, has even suggested that Bayes' Theorem was really discovered by Nicholas Saunderson, a
blind mathematician who was the fourth Lucasian Professor of Mathematics at Cambridge University.
(Saunderson was recommended to this chair by Isaac Newton, the second Lucasian Professor. Recent
holders of the chair include the great physicist Paul Dirac and the current holder, Stephen Hawking.)
Bayes' theorem, and in particular its emphasis on prior probabilities, has caused considerable
controversy. The great statistician Ronald Fisher was very critical of the "subjectivist" aspects of priors.
By contrast, a leading proponent, I. J. Good, argued persuasively that "the subjectivist (i.e. Bayesian)
states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions
knowledge, and he basks in the glorious objectivity of science."