
Bayes Decision Theory

Sargur Srihari

CSE 555
Introduction to Pattern Recognition
Reverend Thomas Bayes (1702-1761)

Bayes set out his theory of probability in "An Essay towards solving a Problem
in the Doctrine of Chances", published in the Philosophical Transactions of the
Royal Society of London in 1764. The paper was sent to the Royal Society by
Richard Price, a friend of Bayes, who wrote:
I now send you an essay which I have found among the papers of our deceased friend Mr Bayes,
and which, in my opinion, has great merit... In an introduction which he has writ to this Essay,
he says, that his design at first in thinking on the subject of it was, to find out a method by which
we might judge concerning the probability that an event has to happen, in given circumstances,
upon supposition that we know nothing concerning it but that, under the same circumstances,
it has happened a certain number of times, and failed a certain other number of times.
Bayes Rule
Two classes (A, ~A), single binary-valued feature (X, ~X)

Known data: p(A) = 0.330, p(~A) = 0.670, p(X & A) = 0.248, p(X & ~A) = 0.168

By the conditional probability rule:

$$p(X \mid A) = \frac{p(X \,\&\, A)}{p(A)} = \frac{0.248}{0.330} = 0.7515$$

$$p(X \mid \sim A) = \frac{p(X \,\&\, \sim A)}{p(\sim A)} = \frac{0.168}{0.670} = 0.2507$$

By Bayes rule:

$$P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X)} = \frac{P(X \mid A)\,P(A)}{P(X \,\&\, A) + P(X \,\&\, \sim A)} = \frac{P(X \mid A)\,P(A)}{P(X \mid A)\,P(A) + P(X \mid \sim A)\,P(\sim A)}$$

$$= \frac{0.75 \times 0.33}{0.75 \times 0.33 + 0.25 \times 0.67} = \frac{0.2475}{0.2475 + 0.1675} = \frac{0.2475}{0.415} = 0.596$$
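As a quick numerical check, here is a minimal Python sketch of the same computation (the variable names are ours, not from the slides):

```python
# Known joint and marginal probabilities from the slide
p_A, p_not_A = 0.330, 0.670
p_X_and_A, p_X_and_not_A = 0.248, 0.168

# Conditional probability rule: p(X|A) = p(X & A) / p(A)
p_X_given_A = p_X_and_A / p_A              # ~0.7515
p_X_given_not_A = p_X_and_not_A / p_not_A  # ~0.2507

# Bayes rule: P(A|X) = P(X|A) P(A) / P(X), with P(X) = P(X & A) + P(X & ~A)
evidence = p_X_given_A * p_A + p_X_given_not_A * p_not_A
print(p_X_given_A * p_A / evidence)        # ~0.596, matching the slide
```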
Bayes Decision Theory
• Fundamental statistical approach to the problem of
  pattern classification
• Quantifies the trade-offs between classification
  decisions using probabilities and the costs of those
  decisions
• Assumes all relevant probabilities are known



Prior Probabilities
State of nature, prior

• State of nature is a random variable

  P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

• Decision rule with only the prior information:

  Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2



Class-conditional Probabilities
p(x | ω1) and p(x | ω2)

• The p.d.f. p(x | ωj) gives the probability of measuring a
  particular feature value x given category ωj.

• For a given feature value x, the two curves describe the
  difference between the populations of the two classes.

• Density functions are normalized, so the area under each
  curve is 1.0.

[Figure: class-conditional p.d.f.s p(x | ω1) and p(x | ω2) plotted against feature x]


Bayes formula to combine prior and
class-conditional probabilities
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$$

• In the case of two categories:

$$p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\,P(\omega_j)$$

• Informally, Bayes rule says:

  posterior = likelihood × prior / evidence
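To make the formula concrete, here is a minimal Python sketch of the two-category computation (the function name and the example numbers are ours, for illustration):

```python
def posteriors(likelihoods, priors):
    """Return P(w_j | x) for each class, given p(x | w_j) and P(w_j)."""
    # evidence: p(x) = sum_j p(x | w_j) P(w_j)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Example: p(x|w1) = 0.6, p(x|w2) = 0.3 at some x, with priors 2/3 and 1/3
print(posteriors([0.6, 0.3], [2/3, 1/3]))  # -> [0.8, 0.2]
```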
Posterior probabilities

Posterior probabilities for the priors P(ω1) = 2/3, P(ω2) = 1/3.
For x = 14: P(ω1 | x) = 0.08, P(ω2 | x) = 0.92.

[Figures: left, class-conditional p.d.f.s p(x | ωj) vs. feature x;
right, posterior probabilities P(ωj | x) vs. feature x]
Bayes Decision Rule

x is an observation for which:

  if P(ω1 | x) > P(ω2 | x), decide the true state of nature is ω1
  if P(ω1 | x) < P(ω2 | x), decide the true state of nature is ω2

Therefore, whenever we observe a particular x, the probability
of error is:

  P(error | x) = P(ω1 | x) if we decide ω2
  P(error | x) = P(ω2 | x) if we decide ω1



Bayes Decision Rule minimizes
probability of error

Decide ω1 if P(ω1 | x) > P(ω2 | x);


otherwise decide ω2

Therefore:
P(error | x) = min [ P(ω1 | x), P(ω2 | x) ]
(Bayes decision)
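A minimal Python sketch of this minimum-error rule (the function name is ours):

```python
def decide_min_error(posteriors):
    """Pick the class with the largest posterior; return (class index, P(error | x))."""
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    return best, 1.0 - posteriors[best]  # for two classes, 1 - max = min

# Posteriors from the earlier figure at x = 14: P(w1|x) = 0.08, P(w2|x) = 0.92
print(decide_min_error([0.08, 0.92]))  # -> decide w2 (index 1), P(error|x) ~ 0.08
```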



Bayes Decision Theory – Continuous
Features

• Generalization of the preceding ideas


• Use of more than one feature
• Use of more than two states of nature
• Allowing actions other than merely deciding the state of
  nature
• Introducing a loss function that is more general than the
  probability of error



Loss Function

• Allowing actions other than classification primarily
  allows the possibility of rejection:

  refusing to make a decision in close or bad cases!

• The loss function states how costly each action is



Loss Function Definition

Let {ω1, ω2, …, ωc} be the set of c states of nature
(or "categories")

Let {α1, α2, …, αa} be the set of a possible actions

Let λ(αi | ωj) be the loss incurred for taking action αi
when the state of nature is ωj



Overall Risk
Overall risk: R = sum of all R(αi | x) for i = 1, …, a

Minimizing R ⇔ minimizing the conditional risk R(αi | x)
for i = 1, …, a

Conditional risk (expected loss for taking action αi):

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x)$$

Select the action αi for which R(αi | x) is minimum.
R is then minimum; this minimum overall risk is called the
Bayes risk, the best performance that can be achieved.
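A minimal Python sketch of the conditional-risk computation (the function names are ours; the zero-one loss matrix is just an example):

```python
def conditional_risks(loss, posteriors):
    """R(a_i | x) = sum_j loss[i][j] * P(w_j | x), for each action a_i."""
    return [sum(l_ij * p for l_ij, p in zip(row, posteriors)) for row in loss]

def bayes_action(loss, posteriors):
    """Index of the action with minimum conditional risk."""
    risks = conditional_risks(loss, posteriors)
    return min(range(len(risks)), key=lambda i: risks[i])

# Two actions, two states of nature; loss[i][j] = lambda(a_i | w_j)
zero_one = [[0, 1],
            [1, 0]]
print(bayes_action(zero_one, [0.3, 0.7]))  # -> 1: take action a2 (decide w2)
```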



Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:

R(α1 | x) = λ11P(ω1 | x) + λ12P(ω2 | x)


R(α2 | x) = λ21P(ω1 | x) + λ22P(ω2 | x)
Minimum Risk Decision Rule
Our rule is the following:

  if R(α1 | x) < R(α2 | x), take action α1: "decide ω1"

This results in the equivalent rule: decide ω1 if

  (λ21 − λ11) p(x | ω1) P(ω1) > (λ12 − λ22) p(x | ω2) P(ω2)

and decide ω2 otherwise
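Filling in the algebra behind this equivalence:

$$R(\alpha_1 \mid x) < R(\alpha_2 \mid x) \iff \lambda_{11}P(\omega_1 \mid x) + \lambda_{12}P(\omega_2 \mid x) < \lambda_{21}P(\omega_1 \mid x) + \lambda_{22}P(\omega_2 \mid x)$$

$$\iff (\lambda_{21} - \lambda_{11})\,P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\,P(\omega_2 \mid x)$$

Substituting $P(\omega_j \mid x) = p(x \mid \omega_j)P(\omega_j)/p(x)$ and multiplying both sides by the positive quantity $p(x)$ gives the rule above.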



Likelihood ratio Decision Rule

The preceding rule is equivalent to the following: if

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

then take action α1 (decide ω1);
otherwise take action α2 (decide ω2)
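A minimal Python sketch of this likelihood-ratio test (the function name is ours; it assumes λ21 > λ11 so the threshold is well defined):

```python
def likelihood_ratio_decision(px_w1, px_w2, prior1, prior2, loss):
    """Return 1 (take a1, decide w1) or 2 (take a2, decide w2).
    loss[i][j] = lambda(a_i | w_j); assumes loss[1][0] > loss[0][0]."""
    threshold = (loss[0][1] - loss[1][1]) / (loss[1][0] - loss[0][0]) * (prior2 / prior1)
    return 1 if px_w1 / px_w2 > threshold else 2

# With zero-one loss and equal priors the threshold is 1: pick the larger likelihood
zero_one = [[0, 1], [1, 0]]
print(likelihood_ratio_decision(0.6, 0.3, 0.5, 0.5, zero_one))  # -> 1
```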
Exercise
Select the optimal decision where:
Ω= {ω1, ω2}
p(x | ω1) ~ N(2, 0.5) (normal distribution)
p(x | ω2) ~ N(1.5, 0.2)

P(ω1) = 2/3
P(ω2) = 1/3

$$\lambda = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$
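One way to work the exercise in Python, a sketch assuming N(μ, σ²) notation (i.e., 0.5 and 0.2 are variances; if they are standard deviations, square them first) and λij = λ(αi | ωj) as defined earlier:

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def optimal_action(x):
    """Minimum-conditional-risk decision between w1 and w2 at observation x."""
    priors = [2 / 3, 1 / 3]
    likelihoods = [normal_pdf(x, 2.0, 0.5), normal_pdf(x, 1.5, 0.2)]
    loss = [[1, 2],   # lambda(a1 | w1), lambda(a1 | w2)
            [3, 4]]   # lambda(a2 | w1), lambda(a2 | w2)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    post = [l * p / evidence for l, p in zip(likelihoods, priors)]
    risks = [sum(loss[i][j] * post[j] for j in range(2)) for i in range(2)]
    return 1 if risks[0] < risks[1] else 2

print(optimal_action(1.8))  # -> 1 (see note below)
```

Note that for this particular λ, R(α2 | x) − R(α1 | x) = 2P(ω1 | x) + 2P(ω2 | x) = 2 > 0 for every x, so the minimum-risk rule selects α1 (decide ω1) regardless of the observation.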
