
Module 4

Reasoning with uncertainty


Will it rain tomorrow, given that it is cloudy today?
Uncertainty = Unknown



Nature of the World

• Unknown or unsure things.

• Randomness is all around us.

• This is an unfair world.

• Chances are not equal.


Non-monotonic Logic
• Traditional logic is monotonic
• The set of legal conclusions grows monotonically with the set of facts
appearing in our initial database
• When humans reason, we use defeasible logic
• Almost every conclusion we draw is subject to reversal
• If we find contradicting information later, we’ll want to retract earlier
inferences
• Nonmonotonic logic, or defeasible reasoning, allows a statement to be
retracted
• Solution: Truth Maintenance
• Keep explicit information about which facts/inferences support other
inferences
• If the foundation disappears, so must the conclusion
Uncertainty

• On the other hand, the problem might not be in the fact that T/F values can
change over time but rather that we are not certain of the T/F value
• Agents almost never have access to the whole truth about their environment
• Agents must act in the presence of uncertainty
• Some information ascertained from facts
• Some information inferred from facts and knowledge about environment
• Some information based on assumptions made from experience



Uncertainty
• Let action At = leave for airport t minutes before flight
• Will At get me there on time?
• Problems:
• Partial observability (road state, other drivers' plans, etc.)
• Noisy sensors (traffic reports)
• Uncertainty in action outcomes (flat tire, etc.)
• Complexity of modeling and predicting traffic
• Hence a purely logical approach either
• Risks falsehood: “A25 will get me there on time,” or
• Leads to conclusions that are too weak for decision making:
• A25 will get me there on time if there's no accident on the bridge and it doesn't rain and
my tires remain intact, etc., etc.
• A1440 might reasonably be said to get me there on time but I'd have to stay overnight in
the airport
Making decisions under uncertainty
• Suppose the agent believes the following:
P(A25 gets me there on time) = 0.04
P(A90 gets me there on time) = 0.70
P(A120 gets me there on time) = 0.95
P(A1440 gets me there on time) = 0.9999
Probability is a measure of how likely an event is to occur.
• Which action should the agent choose?
• Depends on preferences for missing flight vs. time spent waiting
• Encapsulated by a utility function
• The agent should choose the action that maximizes the expected utility:
P(At succeeds) * U(At succeeds) + P(At fails) * U(At fails)
• Utility theory is used to represent and reason with preferences
• Decision theory = probability theory + utility theory
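To make the expected-utility rule concrete, here is a minimal Python sketch. Only the success probabilities come from the slide; the utility values and the per-minute waiting cost are invented purely for illustration.

```python
# Minimal sketch of expected-utility decision making for the airport example.
# Success probabilities are from the slide; utilities are assumed values.
actions = {
    "A25":   0.04,
    "A90":   0.70,
    "A120":  0.95,
    "A1440": 0.9999,
}

U_SUCCESS = 100    # utility of catching the flight (assumed)
U_FAIL = -500      # utility of missing it (assumed)
WAIT_COST = 0.1    # utility lost per minute spent waiting at the airport (assumed)

def expected_utility(name, p_success):
    minutes = int(name[1:])            # e.g. "A90" -> leave 90 minutes early
    return (p_success * U_SUCCESS
            + (1 - p_success) * U_FAIL
            - WAIT_COST * minutes)

for name, p in actions.items():
    print(name, round(expected_utility(name, p), 1))
print("Best:", max(actions, key=lambda a: expected_utility(a, actions[a])))  # A120
```

Under these assumed utilities the agent picks A120: A25 almost certainly misses the flight, while A1440 wastes a day of waiting.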
Why do we need reasoning under uncertainty?
▪ There are many situations where uncertainty arises:

• When you travel you reason about the possibility of delays


• When an insurance company offers a policy it has calculated the risk that you will claim
• When your brain estimates what an object is, it filters random noise and fills in missing details
• When you play a game you cannot be certain what the other player will do
• A medical expert system that diagnoses disease has to deal with the results of tests that are sometimes
incorrect

▪ Systems which can reason about the effects of uncertainty should do better than
those that don’t

▪ But how should uncertainty be represented?

Examples

▪ I have toothache. What is the cause?


There are many possible causes of an observed event.

▪ If I go to the dentist and he examines me, and the probe catches, this indicates there may be a
cavity, rather than another cause.
The likelihood of a hypothesised cause will change as additional pieces of evidence arrive.

▪ Bob lives in San Francisco. He has a burglar alarm on his house, which can be triggered by burglars
and earthquakes. He has two neighbours, John and Mary, who will call him if the alarm goes off
while he is at work, but each is unreliable in their own way. All these sources of uncertainty can be
quantified. Mary calls, how likely is it that there has been a burglary?
Using probabilistic reasoning we can calculate how likely a hypothesised cause is.

Probability Theory: Variables and Events
▪ A random variable can be an observation, outcome or event, the value of which is uncertain.

▪ e.g. a coin. Let’s use Throw as the random variable denoting the outcome when we toss the coin.

▪ The set of possible outcomes for a random variable is called its domain.

▪ The domain of Throw is {head, tail}

▪ A Boolean random variable has two outcomes.

▪ Cavity has the domain {true, false}

▪ Toothache has the domain {true, false}

Probability Explanation
• P(event) is the probability in the absence of any additional information
• Probability depends on evidence.
• Before looking at the die: P(4) = 1/6
• After looking at the die: P(4) = 0 or 1, depending on what we see
• All probability statements must indicate the evidence with respect to which the probability is
being assessed.
• As new evidence is collected, probability calculations are updated.

• Before specific evidence is obtained, we refer to the prior or unconditional probability of
the event with respect to the evidence. After the evidence is obtained, we refer to the
posterior or conditional probability.


Random variables
• We describe the (uncertain) state of the world using random variables
▪ Denoted by capital letters
• R: Is it raining?
• W: What’s the weather?
• D: What is the outcome of rolling two dice?
• S: What is the speed of my car (in MPH)?
• Random variables take on values in a domain
▪ Domain values must be mutually exclusive and exhaustive
• R in {True, False}
• W in {Sunny, Cloudy, Rainy, Snow}
• D in {(1,1), (1,2), … (6,6)}
• S in [0, 200]



Random variables (Syntax for propositions)



Syntax for probability distributions

Syntax for joint probability distributions

Joint probability distributions
• Joint probabilities can be between any number of variables,
e.g. P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1
• Once you have the joint probability distribution, you can calculate any probability
involving A, B, and C
• Note: may need to use marginalization and Bayes rule (both of which are not discussed
in these slides)

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
                     (sums to 1)

Examples of things you can compute:

• P(A=true) = sum of P(A,B,C) in rows with A=true
• P(A=true, B=true | C=true) = P(A = true, B = true, C = true) / P(C = true)
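These two computations can be checked with a short Python sketch (not part of the original slides):

```python
# Sketch: answering queries directly from the full joint distribution above.
# Keys are (A, B, C) truth values; probabilities are taken from the table.
joint = {
    (False, False, False): 0.10, (False, False, True): 0.20,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.30, (True,  False, True): 0.10,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# Marginal: P(A=true) = sum of the rows with A=true
p_a = sum(p for (a, b, c), p in joint.items() if a)

# Conditional: P(A=true, B=true | C=true) = P(A,B,C all true) / P(C=true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
p_abc = joint[(True, True, True)]

print(p_a)          # 0.6
print(p_abc / p_c)  # 0.15 / 0.5 = 0.3
```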
The Problem with the Joint Distribution

• Lots of entries in the table to fill up!

• For k Boolean random variables, you need a table of size 2^k

• How do we use fewer numbers? Need the concept of independence

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
Events
• Probabilistic statements are defined over events, or sets of world states
▪ “It is raining”
▪ “The weather is either cloudy or snowy”
▪ “The sum of the two dice rolls is 11”
▪ “My car is going between 30 and 50 miles per hour”
• Events are described using propositions:
▪ R = True
▪ W = “Cloudy” ∨ W = “Snowy”
▪ D ∈ {(5,6), (6,5)}
▪ 30 ≤ S ≤ 50
• Notation: P(A) is the probability of the set of world states in which proposition A holds
• P(X = x), or P(x) for short, is the probability that random variable X has taken on the value x



Probability Theory: Atomic events
▪ We can create new events out of combinations of the outcomes of random variables

▪ An atomic event is a complete specification of the values of the random variables of interest

▪ e.g. if our world consists of only two Boolean random variables, then there are four possible
atomic events:

Toothache = true ∧ Cavity = true
Toothache = true ∧ Cavity = false
Toothache = false ∧ Cavity = true
Toothache = false ∧ Cavity = false

▪ The set of all possible atomic events has two properties:

• It is exhaustive (nothing else can happen)
• The events are mutually exclusive (only one of the four can happen at one time)
Probability theory: probabilities
▪ We can assign probabilities to the outcomes of a random variable.
P(Throw = heads) = 0.5
P(Mary_Calls = true) = 0.1
P(a) = 0.3

▪ Some simple rules governing probabilities (Kolmogorov’s axioms):

1. All probabilities are between 0 and 1 inclusive: 0 ≤ P(a) ≤ 1
2. Anything necessarily true has probability 1, and anything necessarily false has probability 0:
P(true) = 1, P(false) = 0
3. The probability of a disjunction is given by:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

From these three laws all of probability theory can be derived.
Probability theory: relation to set theory
▪ We can often intuitively understand the laws of probability
by thinking about sets

P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
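As a quick sanity check (not from the slides), the disjunction rule can be verified numerically for a single roll of a fair die:

```python
# Numeric check of the disjunction rule P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
# using one roll of a fair six-sided die.
from fractions import Fraction

omega = range(1, 7)  # sample space of the die
def P(event):
    return Fraction(sum(1 for w in omega if event(w)), 6)

a = lambda w: w % 2 == 0          # "roll is even",  P(a) = 1/2
b = lambda w: w >= 4              # "roll is >= 4",  P(b) = 1/2

lhs = P(lambda w: a(w) or b(w))                    # P(a ∨ b)
rhs = P(a) + P(b) - P(lambda w: a(w) and b(w))     # P(a) + P(b) − P(a ∧ b)
print(lhs, rhs, lhs == rhs)                        # 2/3 2/3 True
```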
Probability Theory: Conditional Probability
▪ A conditional probability expresses the likelihood that one event a will occur if b occurs. We
denote this as follows

P(a | b)
▪ e.g.
P(Toothache = true) = 0.2
P(Toothache = true | Cavity = true) = 0.6

▪ So conditional probabilities reflect the fact that some events make other events more (or less)
likely

▪ If one event doesn’t affect the likelihood of another event they are said to be independent and
therefore
P(a | b) = P(a)
▪ E.g. if you roll a 6 on a die, it doesn’t make it more or less likely that you will roll a 6 on the next
throw. The rolls are independent.
Combining Probabilities: the product rule
• How can we work out the likelihood of two events occurring together, given their base
and conditional probabilities?

P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

• So in our example:

P(toothache ∧ cavity) = P(toothache | cavity) P(cavity)
= P(cavity | toothache) P(toothache)

• But this doesn’t help us answer our question:

“I have toothache. Do I have a cavity?”
Conditional probability definitions

Chain rule

Inference by enumeration

Computing the probability of a proposition

Computing the probability of a logical sentence

Computing a conditional probability

Normalization

Inference by enumeration, summary

Inference by enumeration, issues

Independence

Conditional independence

Conditional independence (cont’d)
Independence: Example

• Suppose Norman and Martin each toss separate coins.
• Let A represent the variable "Norman's toss outcome".
• Let B represent the variable "Martin's toss outcome".
• Both A and B have two possible values (Heads and Tails).
• We can assume that A and B are independent.
• Evidence about B will not change our belief in A.


Conditional Independence
• Now suppose both Martin and Norman toss the same coin.
• Let A represent the variable "Norman's toss outcome".
• Let B represent the variable "Martin's toss outcome".
• Assume also that there is a possibility that the coin is biased towards heads but we do not know
this for certain.
• In this case A and B are not independent. For example, observing that B is Heads causes us to
increase our belief in A being Heads (in other words P(a|b)>P(a) in the case when a=Heads and
b=Heads).
• Here the variables A and B are both dependent on a separate variable C, "the coin is biased
towards Heads" (which has the values True or False).
• Although A and B are not independent, it turns out that once we know for certain the value of C
then any evidence about B cannot change our belief about A. Specifically:
P(A|B,C)=P(A|C)
• In such a case, we say that A and B are conditionally independent given C.


Independence

• Two events A and B are independent if and only if

P(A ∧ B) = P(A) P(B)

• In other words, P(A | B) = P(A) and P(B | A) = P(B)
• This is an important simplifying assumption for modeling, e.g.,
Toothache and Weather can be assumed to be independent
• Are two mutually exclusive events independent?
• No

• Conditional independence: A and B are conditionally independent given C iff

P(A ∧ B | C) = P(A | C) P(B | C)
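A small numeric sketch of the Norman/Martin shared-coin example above (the prior on "biased" and the biased coin's P(heads) are assumed values, not from the slides): A and B are dependent, yet independent given C.

```python
# Shared-coin sketch: C = "coin is biased towards heads"; A, B = two tosses of it.
p_biased = 0.5        # P(C): prior that the coin is biased (assumed)
p_h_biased = 0.9      # P(heads | biased)  (assumed)
p_h_fair = 0.5        # P(heads | fair)

def p_both_heads():
    # A and B are independent GIVEN C, so multiply within each case of C.
    return (p_biased * p_h_biased * p_h_biased
            + (1 - p_biased) * p_h_fair * p_h_fair)

p_a = p_biased * p_h_biased + (1 - p_biased) * p_h_fair   # P(A = heads) = 0.7
p_ab = p_both_heads()                                      # P(A = heads ∧ B = heads) = 0.53
print(p_ab, p_a * p_a)   # 0.53 vs 0.49 -> A and B are NOT independent
print(p_ab / p_a)        # P(B = heads | A = heads) ≈ 0.757 > 0.7, i.e. P(a|b) > P(a)
```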


Bayes’ rule

P(a | b) = P(b | a) P(a) / P(b)

• But why is it useful?
• Can get diagnostic probability P(cavity | toothache) from causal probability P(toothache | cavity)
• Can update our beliefs based on evidence
• Important tool for probabilistic inference
Bayes’ rule
▪ We can think about some events as being “hidden” causes: not necessarily directly observed
(e.g. a cavity).

▪ If we model how likely observable effects are given hidden causes (how likely toothache is
given a cavity)

▪ Then Bayes’ rule allows us to use that model to infer the likelihood of the hidden cause (and
thus answer our question)

P(cause | effect) = P(effect | cause) P(cause) / P(effect)

▪ In fact good models of P(effect | cause) are often available to us in real domains (e.g.
medical diagnosis)

Bayes’ rule can capture causal models
▪ Suppose a doctor knows that meningitis causes a stiff neck in 50% of cases
P(s | m) = 0.5
▪ She also knows that the probability in the general population of someone having a stiff neck at any
time is 1/20
P(s) = 0.05
▪ She also has to know the incidence of meningitis in the population (1/50,000)
P(m) = 0.00002
▪ Using Bayes’ rule she can calculate the probability the patient has meningitis:
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002 = 1/5000
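The doctor's calculation can be checked with a couple of lines of Python (a sketch, using the slide's numbers):

```python
# Bayes' rule as a one-line helper, applied to the meningitis example.
def bayes(p_effect_given_cause, p_cause, p_effect):
    return p_effect_given_cause * p_cause / p_effect

print(bayes(0.5, 0.00002, 0.05))   # 0.0002, i.e. 1/5000
```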
The power of causal models
▪ Why wouldn’t the doctor be better off if she just knew the likelihood of meningitis given a stiff neck?
I.e. information in the diagnostic direction from symptoms to causes?

▪ Because diagnostic knowledge is often more fragile than causal knowledge

▪ Suppose there is a meningitis epidemic, and the rate of meningitis goes up 20-fold

P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 0.0004) / 0.05 = 0.004 = 1/250
▪ The causal model is unaffected by the change in P(m), whereas the diagnostic model
P(m|s)=1/5000 is now badly wrong.

Bayes rule: the normalisation short cut
▪ If we know P(effect|cause) for every cause, we can avoid having to know P(effect)

P(c | e) = P(e | c) P(c) / P(e) = P(e | c) P(c) / Σ_h∈Causes P(e | h) P(h)

▪ Suppose there are two possible causes of a stiff neck: meningitis (m) and not meningitis (¬m)

P(Meningitis | s) = α ⟨ P(s | m) P(m), P(s | ¬m) P(¬m) ⟩

▪ We simply calculate the top line for each cause and then normalise (divide by the sum of
the top line over all hypotheses)

▪ But sometimes it’s harder to find out P(effect|cause) for all causes independently than it is simply to
find out P(effect)

▪ Note that Bayes’ rule here relies on the fact the effect must have arisen because of one of the
hypothesised causes. You can’t reason directly about causes you haven’t imagined.

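A sketch of the shortcut in Python. Since the slides don't give P(s | ¬m) directly, it is backed out here from P(s) = 0.05 so the numbers match the example; that derivation is an assumption of this sketch.

```python
# Normalisation shortcut for two hypotheses: meningitis (m) vs not (¬m).
p_m = 0.00002
p_s_given_m = 0.5
# Assumed: derive P(s|¬m) from P(s) = P(s|m)P(m) + P(s|¬m)P(¬m) with P(s) = 0.05.
p_s_given_not_m = (0.05 - p_s_given_m * p_m) / (1 - p_m)

top_lines = [p_s_given_m * p_m, p_s_given_not_m * (1 - p_m)]
alpha = 1 / sum(top_lines)                  # normalising constant
posterior = [alpha * t for t in top_lines]  # [P(m|s), P(¬m|s)]
print(posterior)                            # ≈ [0.0002, 0.9998]
```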
Bayes’ rule: combining evidence
▪ Suppose we have several pieces of evidence we want to combine:
• John rings and Mary rings
• I have toothache and the dental probe catches on my tooth

▪ How do we do this?
P(cavity | toothache ∧ catch) = α P(toothache ∧ catch | cavity) P(cavity)

▪ As we have more effects our causal model becomes very complicated (for N binary
effects there will be 2^N different combinations of evidence that we need to model
given a cause)

⟨ P(toothache ∧ catch | cavity), P(toothache ∧ catch | ¬cavity) ⟩
Bayes’ rule and conditional independence

Bayes rule + conditional independence
▪ In many practical applications there are not a few evidence variables but hundreds

▪ Thus 2^N is very big

▪ This nearly led everyone to give up and rely on approximate or qualitative methods for reasoning
about uncertainty

▪ But conditional independence helps

▪ Toothache and catch are not independent, but they are independent given the presence or absence
of a cavity.

▪ In other words we can use the knowledge that cavities cause toothache and they cause the catch,
but the catch and the toothache do not cause each other (they have a single common cause).

Joint probability distributions

• A joint distribution is an assignment of probabilities to every possible atomic event

Atomic event                            P
Cavity = false ∧ Toothache = false      0.8
Cavity = false ∧ Toothache = true       0.1
Cavity = true  ∧ Toothache = false      0.05
Cavity = true  ∧ Toothache = true       0.05


Marginal probability distributions

• Suppose we have the joint distribution P(X,Y) and we want to find the marginal
distribution P(Y)

P(Cavity, Toothache)
Cavity = false ∧ Toothache = false      0.8
Cavity = false ∧ Toothache = true       0.1
Cavity = true  ∧ Toothache = false      0.05
Cavity = true  ∧ Toothache = true       0.05

P(Cavity)                P(Toothache)
Cavity = false   ?       Toothache = false   ?
Cavity = true    ?       Toothache = true    ?


Conditional probability
P(Cavity, Toothache)
Cavity = false ∧ Toothache = false      0.8
Cavity = false ∧ Toothache = true       0.1
Cavity = true  ∧ Toothache = false      0.05
Cavity = true  ∧ Toothache = true       0.05

P(Cavity)                 P(Toothache)
Cavity = false   0.9      Toothache = false   0.85
Cavity = true    0.1      Toothache = true    0.15

• What is P(Cavity = true | Toothache = false)?
0.05 / 0.85 = 0.059
• What is P(Cavity = false | Toothache = true)?
0.1 / 0.15 = 0.667
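A short Python sketch (not from the slides) of how these marginals and conditionals fall out of the joint table:

```python
# Marginals and conditionals from the Cavity/Toothache joint distribution above.
joint = {
    (False, False): 0.80,   # keys are (Cavity, Toothache)
    (False, True):  0.10,
    (True,  False): 0.05,
    (True,  True):  0.05,
}

def p_cavity(value):
    return sum(p for (c, t), p in joint.items() if c == value)

def p_toothache(value):
    return sum(p for (c, t), p in joint.items() if t == value)

print(p_cavity(True), p_toothache(False))          # 0.1  0.85
print(joint[(True, False)] / p_toothache(False))   # P(Cavity=t | Toothache=f) ≈ 0.059
print(joint[(False, True)] / p_toothache(True))    # P(Cavity=f | Toothache=t) ≈ 0.667
```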


Bayes’ Rule example
Q.1 Annie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it
has rained only 5 days each year (5/365 = 0.014). Unfortunately, the weatherman has predicted
rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the
time. When it doesn't rain, he incorrectly forecasts rain 10% of the time. What is the
probability that it will rain on Annie's wedding?
P(Rain) = 5/365 = 0.014                P(¬Rain) = 1 − P(Rain) = 1 − 0.014 = 0.986

P(Predict | Rain) = 90/100 = 0.9       P(Predict | ¬Rain) = 10/100 = 0.1

P(Rain | Predict) = ?


Bayes’ Rule example
• Annie is getting married tomorrow, at an outdoor ceremony in the desert. In recent
years, it has rained only 5 days each year (5/365 = 0.014). Unfortunately, the
weatherman has predicted rain for tomorrow. When it actually rains, the weatherman
correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts
rain 10% of the time. What is the probability that it will rain on Annie's wedding?

P(Rain | Predict) = P(Predict | Rain) P(Rain) / P(Predict)

= P(Predict | Rain) P(Rain) / [ P(Predict | Rain) P(Rain) + P(Predict | ¬Rain) P(¬Rain) ]

= (0.9 × 0.014) / (0.9 × 0.014 + 0.1 × 0.986) = 0.111
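The calculation above, sketched as a small reusable Python helper:

```python
# P(H | E) for a Boolean hypothesis H, expanding P(E) by total probability.
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

print(posterior(0.9, 5 / 365, 0.1))   # ≈ 0.111
```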
Bayes’ rule: Another example
Q.2 1% of women at age forty who participate in routine screening have cancer. 80% of
women with cancer will get positive mammographies. 9.6% of women without cancer
will also get positive mammographies. A woman in this age group had a positive
mammography in a routine screening. What is the probability that she actually has
cancer?
P(Cancer) = 1/100 = 0.01               P(¬Cancer) = 1 − 0.01 = 0.99

P(Positive | Cancer) = 80/100 = 0.8    P(Positive | ¬Cancer) = 9.6/100 = 0.096

P(Cancer | Positive) = ?


Bayes’ rule: Another example
1% of women at age forty who participate in routine screening have cancer. 80% of
women with cancer will get positive mammographies. 9.6% of women without cancer
will also get positive mammographies. A woman in this age group had a positive
mammography in a routine screening. What is the probability that she actually has
cancer?

P(Cancer | Positive) = P(Positive | Cancer) P(Cancer) / P(Positive)

= P(Positive | Cancer) P(Cancer) / [ P(Positive | Cancer) P(Cancer) + P(Positive | ¬Cancer) P(¬Cancer) ]

= (0.8 × 0.01) / (0.8 × 0.01 + 0.096 × 0.99) = 0.0776
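The same posterior() helper sketched for the rain example reproduces this result (repeated here so the snippet runs on its own):

```python
# Same total-probability helper as in the rain example.
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

print(posterior(0.8, 0.01, 0.096))   # ≈ 0.0776
```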


Bayesian reasoning

• Probability theory
• Bayesian inference
• Use probability theory and information about independence
• Reason diagnostically (from evidence (effects) to conclusions (causes))
or causally (from causes to effects)
• Bayesian networks
• Compact representation of probability distribution over a set of
propositional random variables
• Take advantage of independence relationships

Bayesian Networks: Introduction
Suppose you are trying to determine if a patient has inhalational
anthrax. You observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty breathing

You would like to determine how likely the patient is infected with
inhalational anthrax given that the patient has a cough, a fever, and
difficulty breathing

We are not 100% certain that the patient has anthrax because of these
symptoms. We are dealing with uncertainty!
Bayesian Networks: Introduction
Now suppose you order an x-ray and observe that the patient has
a wide mediastinum.
Your belief that the patient is infected with inhalational anthrax
is now much higher.
Mediastinum is a space in your chest that holds your heart and other
important structures. It's the middle compartment within your
thoracic cavity, nestled between your lungs.

• Now here, what you observed affected your belief that the patient is infected with anthrax
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? Well in
fact, we do…

Bayesian Networks: Introduction

HasAnthrax

HasCough HasFever HasDifficultyBreathing HasWideMediastinum

• In the opinion of many AI researchers, Bayesian networks are the most significant
contribution to AI in the last 10 years
• They are used in many applications, e.g. spam filtering, speech recognition, robotics,
diagnostic systems and even syndromic surveillance
A Bayesian Network
A Bayesian network is made up of:

1. A directed acyclic graph (DAG): a directed graph in which each node is annotated with a
probability distribution. Each node Xi has a conditional probability distribution
P(Xi | Parents(Xi)), i.e. the effect of the parents of Xi on Xi.

2. A set of tables, one for each node in the graph. The parameters are the probabilities in
these conditional probability tables (CPTs).

[Graph: A → B; B → C; B → D]

A      P(A)       A      B      P(B|A)      B      D      P(D|B)      B      C      P(C|B)
false  0.6        false  false  0.01        false  false  0.02        false  false  0.4
true   0.4        false  true   0.99        false  true   0.98        false  true   0.6
                  true   false  0.7         true   false  0.05        true   false  0.9
                  true   true   0.3         true   true   0.95        true   true   0.1
A Directed Acyclic Graph

• Each node in the graph is a random variable

• A node X is a parent of another node Y if there is an arrow from node X to node Y,
e.g. A is a parent of B

• Informally, an arrow from node X to node Y means X has a direct influence on Y

[Graph: A → B; B → C; B → D]
A Set of Tables for Each Node
Conditional probability distribution for C given B:

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

For a given combination of values of the parents (B in this example), the entries for
P(C=true | B) and P(C=false | B) must add up to 1,
e.g. P(C=true | B=false) + P(C=false | B=false) = 1

If you have a Boolean variable with k Boolean parents, this table has 2^(k+1) probabilities
(but only 2^k need to be stored)
Bayesian Networks
Two important properties:

1. Encodes the conditional independence relationships between the variables in the graph
structure
2. Is a compact representation of the joint probability distribution over the variables

The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent
of its non-descendants (ND1, ND2)
The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability distribution
over all the variables X1, …, Xn in the Bayesian net using the formula:
n
P(X1 = x1, …, Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))

where Parents(Xi) means the values of the parents of the node Xi with respect to the graph
Using a Bayesian Network Example
Using the network in the example, suppose you want to calculate:

P(A = true, B = true, C = true, D = true)

= P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)

= (0.4) * (0.3) * (0.1) * (0.95) = 0.0114

The factorization comes from the graph structure (A → B; B → C; B → D); the numbers come
from the conditional probability tables given earlier.
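A minimal Python sketch of this network (CPTs stored as nested dictionaries, an encoding chosen here for brevity) and the joint product:

```python
# The A→B, B→C, B→D network as CPT dictionaries, and the joint-probability product.
P_A = {True: 0.4, False: 0.6}
P_B = {True: {True: 0.3,  False: 0.7},  False: {True: 0.99, False: 0.01}}  # P_B[a][b]
P_C = {True: {True: 0.1,  False: 0.9},  False: {True: 0.6,  False: 0.4}}   # P_C[b][c]
P_D = {True: {True: 0.95, False: 0.05}, False: {True: 0.98, False: 0.02}}  # P_D[b][d]

def joint(a, b, c, d):
    # P(A,B,C,D) = P(A) P(B|A) P(C|B) P(D|B): the network factorization.
    return P_A[a] * P_B[a][b] * P_C[b][c] * P_D[b][d]

print(joint(True, True, True, True))   # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114
```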
Joint Probability Factorization
Our example graph carries additional independence information, which simplifies the
joint distribution:

P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)     (chain rule)
             = P(A) P(B | A) P(C | B) P(D | B)               (using the graph structure)

This is why we only need the tables for P(A), P(B|A), P(C|B), and P(D|B), and why we computed

P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
= (0.4) * (0.3) * (0.1) * (0.95)
Inference

• Using a Bayesian network to compute probabilities is called inference

• In general, inference involves queries of the form:

P(X | E)

X = the query variable(s)
E = the evidence variable(s)
Inference

HasAnthrax

HasCough HasFever HasDifficultyBreathing HasWideMediastinum

• An example of a query would be:

P(HasAnthrax = true | HasFever = true, HasCough = true)

• Note: even though HasDifficultyBreathing and HasWideMediastinum are in the
Bayesian network, they are not given values in the query (i.e. they appear neither as
query variables nor as evidence variables)
• They are treated as unobserved variables and summed out.
Inference Example
Suppose we know that A = true.
What is more probable: C = true or D = true?
For this we need to compute P(C = t | A = t) and P(D = t | A = t).
Let us compute the first one.

P(C = t | A = t) = P(A = t, C = t) / P(A = t)
                 = [ Σ_{b,d} P(A = t, B = b, C = t, D = d) ] / P(A = t)

(using the CPTs given earlier)
What is P(A=true)?

P(A = t) = Σ_{b,c,d} P(A = t, B = b, C = c, D = d)
         = Σ_{b,c,d} P(A = t) P(B = b | A = t) P(C = c | B = b) P(D = d | B = b)
         = P(A = t) Σ_b P(B = b | A = t) Σ_c P(C = c | B = b) Σ_d P(D = d | B = b)
         = P(A = t) Σ_b P(B = b | A = t) Σ_c P(C = c | B = b) × 1
         = P(A = t) Σ_b P(B = b | A = t) × 1
         = P(A = t) = 0.4

(each sum over a complete conditional distribution equals 1)
What is P(C=true, A=true)?
P(A = t, C = t) = Σ_{b,d} P(A = t, B = b, C = t, D = d)
               = Σ_{b,d} P(A = t) P(B = b | A = t) P(C = t | B = b) P(D = d | B = b)
               = P(A = t) Σ_b P(B = b | A = t) P(C = t | B = b) Σ_d P(D = d | B = b)
               = 0.4 × ( P(B = t | A = t) P(C = t | B = t) × 1 + P(B = f | A = t) P(C = t | B = f) × 1 )
               = 0.4 × (0.3 × 0.1 + 0.7 × 0.6)
               = 0.4 × (0.03 + 0.42) = 0.4 × 0.45 = 0.18

So P(C = t | A = t) = P(A = t, C = t) / P(A = t) = 0.18 / 0.4 = 0.45.
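Inference by enumeration can be sketched in a few lines of Python, repeating the CPT dictionaries from the earlier joint-probability sketch so the snippet runs on its own:

```python
# Inference by enumeration for the A→B, B→C, B→D network above.
from itertools import product

P_A = {True: 0.4, False: 0.6}
P_B = {True: {True: 0.3,  False: 0.7},  False: {True: 0.99, False: 0.01}}
P_C = {True: {True: 0.1,  False: 0.9},  False: {True: 0.6,  False: 0.4}}
P_D = {True: {True: 0.95, False: 0.05}, False: {True: 0.98, False: 0.02}}

def joint(a, b, c, d):
    return P_A[a] * P_B[a][b] * P_C[b][c] * P_D[b][d]

def marginal(**fixed):
    """Sum the joint over every variable not pinned down by `fixed`."""
    total = 0.0
    for a, b, c, d in product([True, False], repeat=4):
        values = {"a": a, "b": b, "c": c, "d": d}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(a, b, c, d)
    return total

p_a = marginal(a=True)                  # 0.4
print(marginal(a=True, c=True) / p_a)   # P(C=t | A=t) = 0.18 / 0.4 = 0.45
print(marginal(a=True, d=True) / p_a)   # P(D=t | A=t) = 0.971, so D=true is more probable
```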
The Bad News
• Exact inference is feasible in small to medium-sized networks
• Exact inference in large networks takes a very long time
• We resort to approximate inference techniques which are much faster and
give pretty good results

We still haven’t said where we get the Bayesian network from. There are
two options:
• Get an expert to design it
• Learn it from data

Example: Burglar Alarm

• I have a burglar alarm that is sometimes set off by minor earthquakes. My two
neighbors, John and Mary, promised to call me at work if they hear the alarm
• Example inference task: suppose Mary calls and John doesn't call.
What is the probability of a burglary?
• What are the random variables?
• Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• What are the direct influence relationships?
• A burglar can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call


Example: Burglar Alarm
Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably
to a burglary, but also responds to minor earthquakes. Harry has two neighbors, David and
Sophia, who have taken responsibility to inform Harry at work when they hear the alarm. David
always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with
the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so she
sometimes misses the alarm. Here we would like to compute probabilities in this burglary-alarm
network.

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake
has occurred, and both David and Sophia called Harry.
Example: Burglar Alarm
List of all events occurring in this network:
Burglary (B), Earthquake (E), Alarm (A), David calls (D), Sophia calls (S)

[Network: B → A ← E; A → D; A → S, with tables P(B), P(E), P(A|B,E), P(D|A), P(S|A)]

P(D, S, A, B, E) = P(D | S, A, B, E) P(S, A, B, E)
                 = P(D | S, A, B, E) P(S | A, B, E) P(A, B, E)
                 = P(D | A) P(S | A, B, E) P(A, B, E)
                 = P(D | A) P(S | A) P(A | B, E) P(B, E)
                 = P(D | A) P(S | A) P(A | B, E) P(B | E) P(E)
                 = P(D | A) P(S | A) P(A | B, E) P(B) P(E)
Example: Burglar Alarm
Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both David and Sophia called Harry.

P(S, D, A, ¬B, ¬E)
= P(S | A) × P(D | A) × P(A | ¬B ∧ ¬E) × P(¬B) × P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045

Hence, a Bayesian network can answer any query about the domain by using the joint
distribution.
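A sketch of this query in Python. The CPT entries are the ones used on the slide; the priors P(B) = 0.002 and P(E) = 0.001 are assumed here to match the complements P(¬B) = 0.998 and P(¬E) = 0.999.

```python
# Burglary-network query: P(S, D, A, ¬B, ¬E) as a product of CPT entries.
p_s_given_a = 0.75            # P(S | A): Sophia calls when the alarm sounds
p_d_given_a = 0.91            # P(D | A): David calls when the alarm sounds
p_a_given_no_b_no_e = 0.001   # P(A | ¬B, ¬E): alarm with no burglary, no earthquake
p_not_b = 0.998               # P(¬B), i.e. assumed prior P(B) = 0.002
p_not_e = 0.999               # P(¬E), i.e. assumed prior P(E) = 0.001

p = p_s_given_a * p_d_given_a * p_a_given_no_b_no_e * p_not_b * p_not_e
print(p)   # ≈ 0.00068045
```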
A more realistic Bayes Network: Car diagnosis
• Initial observation: car won’t start
• Orange: “broken, so fix it” nodes
• Green: testable evidence
• Gray: “hidden variables” to ensure sparse structure, reduce parameters



Summary

• Bayesian networks provide a natural representation for (causally induced)
conditional independence
• Topology + conditional probability tables
• Generally easy for domain experts to construct
