Module 4: Reasoning With Uncertainty (15-03-2024)
• On the other hand, the problem may not be that truth values change over time, but rather
that we are uncertain what the truth value is
• Agents almost never have access to the whole truth about their environment
• Agents must act in the presence of uncertainty
• Some information ascertained from facts
• Some information inferred from facts and knowledge about environment
• Some information based on assumptions made from experience
▪ Systems which can reason about the effects of uncertainty should do better than
those that don’t
14-03-2024 BMEE407L 9
Examples
▪ If I go to the dentist and he examines me, when the probe catches, this indicates that there may be a
cavity, rather than some other cause.
The likelihood of a hypothesised cause will change as additional pieces of evidence arrive.
▪ Bob lives in San Francisco. He has a burglar alarm on his house, which can be triggered by burglars
and earthquakes. He has two neighbours, John and Mary, who will call him if the alarm goes off
while he is at work, but each is unreliable in their own way. All these sources of uncertainty can be
quantified. Mary calls, how likely is it that there has been a burglary?
Using probabilistic reasoning we can calculate how likely a hypothesised cause is.
Probability Theory: Variables and Events
▪ A random variable can be an observation, outcome or event, the value of which is uncertain.
▪ e.g. a coin. Let's use Throw as the random variable denoting the outcome when we toss the
coin.
▪ The set of possible outcomes for a random variable is called its domain.
Probability Explanation
• P(event) is the probability in the absence of any additional information
• Probability depends on evidence.
• Before looking at the die: P(4) = 1/6
• After looking at the die: P(4) = 0 or 1, depending on what we see
• All probability statements must indicate the evidence with respect to which the probability is
being assessed.
• As new evidence is collected, probability calculations are updated.
Syntax for joint probability distributions
Joint probability distributions
• Joint probabilities can be defined over any number of variables,
  e.g. P(A = true, B = true, C = true)
• For each combination of variable values, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1
• Once you have the joint probability distribution, you can calculate any probability
  involving A, B, and C
• Note: this may need marginalization and Bayes' rule (both of which are not discussed
  in these slides)

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15

Examples of things you can compute:
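The table above can be sketched in code. This is a minimal Python sketch (not part of the slides): the joint distribution is stored as a dictionary keyed by value triples, and any probability involving A, B, and C is read off by summing entries.

```python
# Joint distribution over three Boolean variables, values taken from the table above.
# Keys are (A, B, C) value triples; values are probabilities.
joint = {
    (False, False, False): 0.1,
    (False, False, True):  0.2,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.3,
    (True,  False, True):  0.1,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

# The eight entries must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Example computation by marginalization:
# P(A = true) = 0.3 + 0.1 + 0.05 + 0.15 = 0.6
p_a = sum(p for (a, b, c), p in joint.items() if a)
```

Any other query (e.g. a conditional probability) reduces to sums and ratios of such entries.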
Events
• Probabilistic statements are defined over events, or sets of world states
▪ “It is raining”
▪ “The weather is either cloudy or snowy”
▪ “The sum of the two dice rolls is 11”
▪ “My car is going between 30 and 50 miles per hour”
• Events are described using propositions:
▪ R = True
▪ W = "Cloudy" ∨ W = "Snowy"
▪ D ∈ {(5,6), (6,5)}
▪ 30 ≤ S ≤ 50
• Notation: P(A) is the probability of the set of world states in which proposition A holds
• P(X = x), or P(x) for short, is the probability that random variable X has taken on the value x
▪ An atomic event is a complete specification of the values of the random variables of interest
▪ e.g. if our world consists of only two Boolean random variables, then there are four possible
atomic events
Probability theory: probabilities
▪ We can assign probabilities to the outcomes of a random variable.
P(Throw = heads) = 0.5
P(Mary_Calls = true) = 0.1
P(a) = 0.3
Probability theory: relation to set theory
▪ We can often intuitively understand the laws of probability
by thinking about sets
Probability Theory: Conditional Probability
▪ A conditional probability expresses the likelihood that one event a will occur if b occurs. We
denote this as follows
P ( a | b)
▪ e.g.
P(Toothache = true) = 0.2
P(Toothache = true | Cavity = true) = 0.6
▪ So conditional probabilities reflect the fact that some events make other events more (or less)
likely
▪ If one event doesn’t affect the likelihood of another event they are said to be independent and
therefore
P(a | b) = P(a)
▪ E.g. if you roll a 6 on a die, it doesn’t make it more or less likely that you will roll a 6 on the next
throw. The rolls are independent.
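The definitions above can be checked by brute-force enumeration. This is a minimal Python sketch (not from the slides) using two fair dice: it computes a conditional probability as P(a ∧ b)/P(b), and verifies the independence of the two rolls.

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (die1, die2)."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

def cond_prob(a, b):
    """Conditional probability P(a | b) = P(a and b) / P(b)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

# Evidence changes probabilities:
p_sum11 = prob(lambda o: o[0] + o[1] == 11)            # P(sum = 11) = 2/36
p_sum11_given5 = cond_prob(lambda o: o[0] + o[1] == 11,
                           lambda o: o[0] == 5)        # P(sum = 11 | first = 5) = 1/6

# Independence: the second roll does not depend on the first,
# so P(second = 6 | first = 6) = P(second = 6)
p_six = prob(lambda o: o[1] == 6)
p_six_given_six = cond_prob(lambda o: o[1] == 6, lambda o: o[0] == 6)
assert abs(p_six - p_six_given_six) < 1e-9
```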
Combining Probabilities: the product rule
• How can we work out the likelihood of two events occurring together, given their
base and conditional probabilities?
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
• So in our example:
Conditional probability definitions
Chain rule
Inference by enumeration
Computing the probability of a proposition
Computing the probability of a logical sentence
Computing a conditional probability
Normalization
Inference by enumeration, summary
Inference by enumeration, issues
Independence
Conditional independence
Conditional independence (cont'd)
Independence: Example
▪ If we model how likely observable effects are given hidden causes (how likely toothache is
given a cavity)
▪ Then Bayes’ rule allows us to use that model to infer the likelihood of the hidden cause (and
thus answer our question)
▪ In fact good models of P(effect | cause) are often available to us in real domains (e.g.
medical diagnosis)
Bayes’ rule can capture causal models
▪ Suppose a doctor knows that meningitis causes a stiff neck in 50% of cases
P(s | m) = 0.5
▪ She also knows that the probability in the general population of someone having a stiff neck at any
time is 1/20
P(s) = 0.05
▪ She also has to know the incidence of meningitis in the population (1/50,000)
P(m) = 0.00002
▪ Using Bayes' rule she can calculate the probability that the patient has meningitis:
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002 = 1/5000
▪ In general:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
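The meningitis calculation above is a one-liner. A minimal Python sketch using exactly the numbers from the slide:

```python
# Bayes' rule with the numbers from the slide.
p_s_given_m = 0.5      # P(stiff neck | meningitis)
p_s = 0.05             # P(stiff neck) in the general population
p_m = 0.00002          # P(meningitis), incidence 1/50,000

# P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
# p_m_given_s ≈ 0.0002, i.e. 1/5000
```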
The power of causal models
▪ Why wouldn’t the doctor be better off if she just knew the likelihood of meningitis given a stiff neck?
I.e. information in the diagnostic direction from symptoms to causes?
▪ Suppose there is a meningitis epidemic and the rate of meningitis goes up 20 times within a group:
the causal model P(s | m) still holds, whereas directly learned diagnostic knowledge P(m | s) would be
out of date
Bayes rule: the normalisation short cut
▪ If we know P(effect|cause) for every cause, we can avoid having to know P(effect)
P(c | e) = P(e | c) P(c) / P(e) = P(e | c) P(c) / Σ_{h ∈ Causes} P(e | h) P(h)
▪ Suppose there are two possible causes of a stiff neck: meningitis (m) and not meningitis (¬m).
We compute P(s | m) P(m) and P(s | ¬m) P(¬m)
▪ We simply calculate the top line for each cause and then normalise (divide by the sum of the top line
over all hypotheses)
▪ But sometimes it’s harder to find out P(effect|cause) for all causes independently than it is simply to
find out P(effect)
▪ Note that Bayes’ rule here relies on the fact the effect must have arisen because of one of the
hypothesised causes. You can’t reason directly about causes you haven’t imagined.
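The normalisation shortcut can be sketched in a few lines of Python. Note an assumption: the slide does not give P(s | ¬m), so the value ≈ 0.05 used below is inferred (since meningitis is rare, P(s | ¬m) is close to P(s)).

```python
def normalize(unnorm):
    """Turn unnormalized scores P(e|h)P(h) into a posterior over hypotheses h."""
    total = sum(unnorm.values())  # this sum plays the role of P(e)
    return {h: v / total for h, v in unnorm.items()}

# Meningitis example: the slides give P(s|m) = 0.5 and P(m) = 0.00002.
# P(s|¬m) ≈ 0.05 is an assumption here (close to P(s), since m is rare).
unnorm = {'m': 0.5 * 0.00002, 'not_m': 0.05 * 0.99998}
posterior = normalize(unnorm)
# posterior['m'] ≈ 0.0002, matching Bayes' rule with an explicit P(s)
```

The point of the shortcut: we never had to state P(s) itself; it emerged as the sum of the numerators.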
Bayes’ rule: combining evidence
▪ Suppose we have several pieces of evidence we want to combine:
• John rings and Mary rings
• I have toothache and the dental probe catches on my tooth
▪ How do we do this?
P(cavity | toothache ∧ catch) ∝ P(toothache ∧ catch | cavity) P(cavity)
▪ As we have more effects our causal model becomes very complicated (for N binary
effects there will be 2^N different combinations of evidence that we need to model
given a cause)
Bayes’ rule and conditional independence
Bayes rule + conditional independence
▪ In many practical applications there are not a few evidence variables but hundreds
▪ This nearly led everyone to give up and rely on approximate or qualitative methods for reasoning
about uncertainty
▪ Toothache and catch are not independent, but they are independent given the presence or absence
of a cavity:
P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
▪ In other words, we can use the knowledge that cavities cause toothache and cause the catch,
but the catch and the toothache do not cause each other (they have a single common cause).
Joint probability distributions

P(Cavity, Toothache) — each row is an atomic event:
Cavity = false, Toothache = false   0.8
Cavity = false, Toothache = true    0.1
Cavity = true,  Toothache = false   0.05
Cavity = true,  Toothache = true    0.05

What are the marginals P(Cavity) and P(Toothache)?
Cavity = false   ?          Toothache = false   ?
Cavity = true    ?          Toothache = true    ?

Summing the matching rows of the joint gives:
Cavity = false   0.9        Toothache = false   0.85
Cavity = true    0.1        Toothache = true    0.15
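The marginalization step above can be sketched in Python, summing out the other variable from the joint table:

```python
# Joint distribution P(Cavity, Toothache) from the table above; keys are (cavity, toothache).
joint = {
    (False, False): 0.8,
    (False, True):  0.1,
    (True,  False): 0.05,
    (True,  True):  0.05,
}

# Marginals: sum the joint over the variable being removed.
# P(Cavity = false) = 0.8 + 0.1 = 0.9,   P(Cavity = true) = 0.05 + 0.05 = 0.1
p_cavity = {v: sum(p for (c, t), p in joint.items() if c == v) for v in (False, True)}
# P(Toothache = false) = 0.8 + 0.05 = 0.85,   P(Toothache = true) = 0.1 + 0.05 = 0.15
p_toothache = {v: sum(p for (c, t), p in joint.items() if t == v) for v in (False, True)}
```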
P(Rain | Predict) = ?

P(Positive | Cancer) = 80/100 = 0.8      P(Positive | ¬Cancer) = 9.6/100 = 0.096
P(Cancer | Positive) = ?
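Answering P(Cancer | Positive) needs a prior P(Cancer), which these slides do not give. A minimal Python sketch, where the 1% prior is purely an illustrative assumption:

```python
# Bayes' rule with normalization for the cancer-test question above.
p_pos_given_cancer = 0.8       # P(Positive | Cancer), from the slide
p_pos_given_no_cancer = 0.096  # P(Positive | ¬Cancer), from the slide
p_cancer = 0.01                # prior P(Cancer): ASSUMED, not on the slide

num = p_pos_given_cancer * p_cancer
den = num + p_pos_given_no_cancer * (1 - p_cancer)  # P(Positive), by normalization
p_cancer_given_pos = num / den
# ≈ 0.078 under the assumed 1% prior: even a positive test leaves cancer unlikely
```

The qualitative lesson holds for any small prior: a good likelihood ratio cannot overcome a rare disease on its own.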
• Probability theory
• Bayesian inference
• Use probability theory and information about independence
• Reason diagnostically (from evidence (effects) to conclusions (causes))
or causally (from causes to effects)
• Bayesian networks
• Compact representation of probability distribution over a set of
propositional random variables
• Take advantage of independence relationships
Bayesian Networks: Introduction
Suppose you are trying to determine if a patient has inhalational
anthrax. You observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty breathing
You would like to determine how likely the patient is infected with
inhalational anthrax given that the patient has a cough, a fever, and
difficulty breathing
We are not 100% certain that the patient has anthrax because of these
symptoms. We are dealing with uncertainty!
Bayesian Networks: Introduction
Now suppose you order an x-ray and observe that the patient has
a wide mediastinum.
Your belief that the patient is infected with inhalational anthrax
is now much higher.
The mediastinum is a space in your chest that holds your heart and other
important structures: the middle compartment within your thoracic cavity,
nestled between your lungs.
• Now here, what you observed affected your belief that the patient is infected with anthrax
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? Well in
fact, we do…
Bayesian Networks: Introduction
(Figure: a Bayesian network with a HasAnthrax node and its symptom nodes.)
A Bayesian Network
A Bayesian network is made up of:
1. A directed acyclic graph (DAG) in which each node Xi is annotated with a conditional
probability distribution P(Xi | Parents(Xi)) — that is, the effect of the parents of Xi on Xi.
(Example graph: A → B, B → C, B → D)
2. A set of tables, one for each node in the graph. The parameters of the network are the
probabilities in these conditional probability tables (CPTs):

A      P(A)  |  A      B      P(B|A)  |  B      D      P(D|B)  |  B      C      P(C|B)
false  0.6   |  false  false  0.01    |  false  false  0.02    |  false  false  0.4
true   0.4   |  false  true   0.99    |  false  true   0.98    |  false  true   0.6
             |  true   false  0.7     |  true   false  0.05    |  true   false  0.9
             |  true   true   0.3     |  true   true   0.95    |  true   true   0.1
A Directed Acyclic Graph
If you have a Boolean variable with k Boolean parents, its conditional probability table has 2^(k+1)
probabilities (but only 2^k need to be stored, since each row's two entries sum to 1)
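The CPT-size formula is worth checking with concrete numbers. A minimal Python sketch:

```python
# CPT size for a Boolean node with k Boolean parents:
# 2^(k+1) entries in total, of which only 2^k must be stored,
# because P(X=false | parents) = 1 - P(X=true | parents).
def cpt_entries(k):
    return 2 ** (k + 1)

def cpt_stored(k):
    return 2 ** k

# In the example network, B has one parent (A): 4 entries, 2 stored.
# A node with 10 Boolean parents already needs 1024 stored probabilities.
sizes = [(k, cpt_entries(k), cpt_stored(k)) for k in (1, 2, 10)]
```

This exponential growth in k is why sparse networks (few parents per node) are so much cheaper than a full joint table.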
Bayesian Networks
Two important properties:
1. A Bayesian network encodes the conditional independence relationships between the
variables in the graph structure
2. It is a compact representation of the joint probability distribution over the variables
Both follow from the Markov condition: given its parents (P1, P2), a node (X) is
conditionally independent of its non-descendants (ND1, ND2). (Its children C1, C2 are
descendants, so this does not apply to them.)
The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability distribution
over all the variables X1, …, Xn in the Bayesian net using the formula:
P(X_1 = x_1, ..., X_n = x_n) = ∏_{i=1}^{n} P(X_i = x_i | Parents(X_i))
Where Parents (Xi) means the values of the Parents of the node Xi with respect to the graph
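The product formula can be coded directly. A minimal Python sketch for the example network A → B → {C, D}, using the CPTs shown earlier in the slides:

```python
from itertools import product

# CPTs from the example network; conditional tables are keyed (parent value, child value).
p_a = {True: 0.4, False: 0.6}
p_b_given_a = {(True, True): 0.3, (True, False): 0.7,
               (False, True): 0.99, (False, False): 0.01}
p_c_given_b = {(True, True): 0.1, (True, False): 0.9,
               (False, True): 0.6, (False, False): 0.4}
p_d_given_b = {(True, True): 0.95, (True, False): 0.05,
               (False, True): 0.98, (False, False): 0.02}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(a) P(b|a) P(c|b) P(d|b), by the Markov condition."""
    return p_a[a] * p_b_given_a[(a, b)] * p_c_given_b[(b, c)] * p_d_given_b[(b, d)]

# Sanity check: the induced joint distribution sums to 1 over all 16 assignments.
total = sum(joint(a, b, c, d) for a, b, c, d in product([True, False], repeat=4))
assert abs(total - 1.0) < 1e-9
```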
Using a Bayesian Network Example
Using the network in the example, suppose you want to calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) × P(B = true | A = true) × P(C = true | B = true) × P(D = true | B = true)
= 0.4 × 0.3 × 0.1 × 0.95 = 0.0114
The structure of this product comes from the graph (A → B, B → C, B → D);
the numbers come from the conditional probability tables:

A      P(A)  |  A      B      P(B|A)  |  B      D      P(D|B)  |  B      C      P(C|B)
false  0.6   |  false  false  0.01    |  false  false  0.02    |  false  false  0.4
true   0.4   |  false  true   0.99    |  false  true   0.98    |  false  true   0.6
             |  true   false  0.7     |  true   false  0.05    |  true   false  0.9
             |  true   true   0.3     |  true   true   0.95    |  true   true   0.1
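As a quick numeric check of the calculation above:

```python
# P(A=t, B=t, C=t, D=t) read straight off the four CPT entries:
p = 0.4 * 0.3 * 0.1 * 0.95
# p = 0.0114
```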
Joint Probability Factorization
Our example graph carries additional independence information, which simplifies the
joint distribution:
P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
              = P(A) P(B | A) P(C | B) P(D | B)
This is why we only need the tables for P(A), P(B|A), P(C|B), and P(D|B),
and why we computed
P(A = true, B = true, C = true, D = true)
= P(A = true) × P(B = true | A = true) × P(C = true | B = true) × P(D = true | B = true)
= 0.4 × 0.3 × 0.1 × 0.95 = 0.0114
Inference
(Figure: the HasAnthrax network again, now used for answering queries.)
Inference Example
Suppose we know that A = true.
What is more probable: C = true or D = true?
For this we need to compute P(C=t | A=t) and P(D=t | A=t).
Let us compute the first one:
P(C = t | A = t) = P(A = t, C = t) / P(A = t)
                 = [ Σ_{b,d} P(A = t, B = b, C = t, D = d) ] / P(A = t)
What is P(A=true)?
P(A = t) = Σ_{b,c,d} P(A = t, B = b, C = c, D = d)
         = Σ_{b,c,d} P(A = t) P(B = b | A = t) P(C = c | B = b) P(D = d | B = b)
         = P(A = t) Σ_b P(B = b | A = t) Σ_c P(C = c | B = b) Σ_d P(D = d | B = b)
         = P(A = t) Σ_b P(B = b | A = t) Σ_c P(C = c | B = b) × 1
         = P(A = t) = 0.4          (each remaining sum over a CPT column is 1)
For the numerator we fix C = t instead of summing over it:
P(A = t, C = t) = 0.4 ( P(B = t | A = t) P(C = t | B = t) Σ_d P(D = d | B = t)
                      + P(B = f | A = t) P(C = t | B = f) Σ_d P(D = d | B = f) )
                = 0.4 × (0.3 × 0.1 + 0.7 × 0.6) = 0.4 × 0.45 = 0.18
so P(C = t | A = t) = 0.18 / 0.4 = 0.45
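Inference by enumeration is easy to check by brute force. A minimal Python sketch for the query P(C=t | A=t) in the example network:

```python
from itertools import product

# CPTs from the example network (A -> B -> {C, D}); keys are (parent value, child value).
p_a = {True: 0.4, False: 0.6}
p_b = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.99, (False, False): 0.01}
p_c = {(True, True): 0.1, (True, False): 0.9, (False, True): 0.6, (False, False): 0.4}
p_d = {(True, True): 0.95, (True, False): 0.05, (False, True): 0.98, (False, False): 0.02}

def joint(a, b, c, d):
    """Joint probability via the network factorization."""
    return p_a[a] * p_b[(a, b)] * p_c[(b, c)] * p_d[(b, d)]

# Numerator: sum over the hidden variables B and D with A=t, C=t fixed.
num = sum(joint(True, b, True, d) for b, d in product([True, False], repeat=2))
# Denominator: sum over B, C, D with A=t fixed (the marginal P(A=t)).
den = sum(joint(True, b, c, d) for b, c, d in product([True, False], repeat=3))
p_c_given_a = num / den  # ≈ 0.18 / 0.4 = 0.45
```

Enumerating the full joint like this is exponential in the number of variables, which is exactly the "issues" the earlier slides allude to.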
We still haven’t said where we get the Bayesian network from. There are
two options:
• Get an expert to design it
• Learn it from data
Example: Burglar Alarm
List of all events occurring in this network:
Burglary (B), Earthquake (E), Alarm (A), David calls (D), Sophia calls (S)
The network needs the tables P(B), P(E), P(A | B, E), P(D | A), P(S | A).
By the chain rule and the network's conditional independences:
P(D, S, A, B, E) = P(D | S, A, B, E) P(S, A, B, E)
= P(D | S, A, B, E) P(S | A, B, E) P(A, B, E)
= P(D | A) P(S | A, B, E) P(A, B, E)
= P(D | A) P(S | A) P(A | B, E) P(B, E)
= P(D | A) P(S | A) P(A | B, E) P(B | E) P(E)
= P(D | A) P(S | A) P(A | B, E) P(B) P(E)
Example: Burglar Alarm
Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and David and Sophia have both called Harry:
P(D, S, A, ¬B, ¬E) = P(D | A) P(S | A) P(A | ¬B, ¬E) P(¬B) P(¬E)
= 0.00068045
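A numeric check of the calculation above. Note an assumption: the CPT values below are not shown on these slides; they are the ones commonly used with this textbook example, and they reproduce the stated result.

```python
# CPT values for the burglar-alarm network -- ASSUMED (the usual values for this
# textbook example; they are not given on the slides).
p_b = 0.002            # P(Burglary)
p_e = 0.001            # P(Earthquake)
p_a_nb_ne = 0.001      # P(Alarm | ¬Burglary, ¬Earthquake)
p_d_a = 0.91           # P(David calls | Alarm)
p_s_a = 0.75           # P(Sophia calls | Alarm)

# P(D, S, A, ¬B, ¬E) = P(D|A) P(S|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = p_d_a * p_s_a * p_a_nb_ne * (1 - p_b) * (1 - p_e)
# p ≈ 0.00068045, matching the slide
```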