
UCCD2063

Artificial Intelligence Techniques

Unit 11:
Uncertainty
Outline
• Probability Distribution
• Joint Distribution
• Marginal Distribution
• Conditional Distribution
• Product Rule
• Probabilistic Inference
• Bayes’ Rule

References:
• Chapter 13 in Russell & Norvig
• CS188 Lecture Note: Probability [link]
Uncertainty
There are many sources of uncertainty in the real world:
▪ Medical diagnosis
• temperature, blood pressure, types of pain -> disease?
▪ Speech recognition
• sound signals -> sentence?
▪ Tracking objects
• current position, speed, acceleration -> next position?
▪ Genetics
• gene expression data -> gene interactions?
▪ Error-correcting codes
• data corrupted with noise -> original message?
▪ … lots more!
Random Variable

• A random variable is an aspect of the problem domain which we may
  have uncertainty about:
  – D: the result of rolling a die
  – R: Is it raining?
  – M: winning a football match
• Domain: Each random variable has a domain (the set of all possible
  values):
  – D in {1, 2, 3, 4, 5, 6}
  – R in {yes, no}
  – M in {win, lose}

Notation: We denote random variables with capital letters and their
values with small letters.

Example: R = {r, ¬r}
• R is a variable
• r denotes R = True
• ¬r denotes R = False
Probability Distribution

• A probability distribution is a TABLE specifying the probabilities
  for all the values (outcomes) of ONE random variable.

  Temperature:             Weather:
  T      P(T)              W       P(W)
  hot    0.6               hazy    0.6
  cold   0.4               sunny   0.1
                           rainy   0.3

  P(T = hot) or P(hot) = 0.6        P(hazy) = 0.6, P(sunny) = 0.1,
  P(T = cold) or P(cold) = 0.4      P(rainy) = 0.3

• The probability values of a random variable X must fulfil the
  following requirements:

  ∀x ∈ X: 0 ≤ P(x) ≤ 1          Σ_{x∈X} P(x) = 1
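The two requirements can be checked mechanically. Below is a minimal Python sketch (the helper name is_valid_distribution is ours; the table values are the ones from this slide):

```python
def is_valid_distribution(dist, tol=1e-9):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    return (all(0.0 <= p <= 1.0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) <= tol)

P_T = {"hot": 0.6, "cold": 0.4}                    # Temperature table
P_W = {"hazy": 0.6, "sunny": 0.1, "rainy": 0.3}    # Weather table

print(is_valid_distribution(P_T))   # True
print(is_valid_distribution(P_W))   # True
```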
Joint Distribution

• A joint distribution is a TABLE specifying the probabilities for
  all combinations of values (outcomes) of MULTIPLE random variables.

  W = {hazy, sunny, rainy}
  T = {hot, cold}

  P(T, W) as a list:            P(T, W) as a grid:
  hot,  hazy   0.25                    hazy   sunny  rainy
  cold, hazy   0.15             hot    0.25   0.30   0.05
  hot,  sunny  0.30             cold   0.15   0.05   0.20
  cold, sunny  0.05
  hot,  rainy  0.05
  cold, rainy  0.20

• The probability values in a joint distribution must fulfil the
  following requirements:

  0 ≤ P(t, w) ≤ 1          Σ_{t,w} P(t, w) = 1

• A probabilistic model is a joint distribution over a set of
  random variables.
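One common way to hold a joint distribution in code is a dictionary keyed by outcome tuples. A minimal sketch with the P(T, W) values above (the variable name P_TW is ours):

```python
# Joint distribution P(T, W), keyed by (t, w) outcome tuples.
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

# The same two requirements hold: each entry in [0, 1], total equal to 1.
assert all(0.0 <= p <= 1.0 for p in P_TW.values())
assert abs(sum(P_TW.values()) - 1.0) < 1e-9
```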
Events

▪ What can we do with a joint distribution? We can infer the
  probability of certain events happening.
▪ An event E is a set of outcomes of the set of random variables.
  For example, the following are some events that we can query
  from the joint distribution:

  P(T, W)                         Example events:
         hazy   sunny  rainy      • hazy
  hot    0.25   0.30   0.05       • cold
  cold   0.15   0.05   0.20       • not sunny (¬sunny)
                                  • hazy OR rainy (hazy ∨ rainy)
                                  • hazy AND cold (hazy ∧ cold)
                                  • hazy AND rainy (hazy ∧ rainy)
                                  • (hazy ∨ rainy) ∧ cold
Probability of an Event

▪ The probability of an event E is computed by adding up all entries
  in the joint distribution that are consistent with the event:

  P(E) = Σ_{(x₁, x₂, …, xₙ) ∈ E} P(x₁, x₂, …, xₙ)

  For example: find the probability of the weather (W) being hazy.

  P(T, W)
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

  P(hazy) = P(hazy, hot) + P(hazy, cold)
          = 0.25 + 0.15 = 0.40
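In code, this is a filtered sum over the joint table. A minimal sketch (prob_of_event and the predicate style are our own choices, reusing the P(T, W) values above):

```python
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def prob_of_event(joint, in_event):
    """Add up all joint entries (outcomes) consistent with the event."""
    return sum(p for outcome, p in joint.items() if in_event(outcome))

# P(hazy): every outcome whose weather component (index 1) is 'hazy'.
print(prob_of_event(P_TW, lambda o: o[1] == "hazy"))                    # 0.40
# P(¬hazy ∨ cold), as on the next slide:
print(prob_of_event(P_TW, lambda o: o[1] != "hazy" or o[0] == "cold"))  # 0.75
```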
Example Events

  P(T, W)
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

▪ P(hot)? 0.25 + 0.3 + 0.05 = 0.6
▪ P(¬hazy)? 0.3 + 0.05 + 0.05 + 0.2 = 0.6
▪ P(¬hazy ∨ cold)? 0.3 + 0.05 + 0.05 + 0.2 + 0.15 = 0.75
▪ P(hot ∧ ¬sunny)? 0.25 + 0.05 = 0.30
▪ P(hazy ∧ rainy)? Not possible (W cannot take both values), so 0
Marginal Distributions
▪ Marginal distributions are sub-tables in which some variables have
  been eliminated.
▪ From the full joint distribution, we extract the distribution over
  some subset of variables. This process is called summing out or
  marginalization.

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Sum out W to get the marginalized distribution P(T):
  P(hot)  = P(hot, hazy) + P(hot, sunny) + P(hot, rainy)    = 0.6
  P(cold) = P(cold, hazy) + P(cold, sunny) + P(cold, rainy) = 0.4

Sum out T to get the marginalized distribution P(W):
        hazy   sunny  rainy
  P(W)  0.40   0.35   0.25
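Summing out reduces, in code, to accumulating probability over the variable position we keep. A minimal sketch (the helper name marginalize is ours):

```python
from collections import defaultdict

P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def marginalize(joint, keep):
    """Sum out every variable except the one at tuple position `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[keep]] += p
    return dict(out)

print(marginalize(P_TW, keep=0))  # P(T): hot 0.6, cold 0.4 (up to float rounding)
print(marginalize(P_TW, keep=1))  # P(W): hazy 0.4, sunny 0.35, rainy 0.25
```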
Conditional Distributions
▪ Sometimes, we may know the values of some variables.
▪ For example, suppose we know about the temperature, e.g. hot:
  • Then, there is no more uncertainty about the temperature.
  • But there remains uncertainty over the weather (hazy? sunny? rainy?)
▪ Conditional distributions are probability distributions over some
  variables given fixed (or known) values of other variables:

  P(W | T): the probability of W given T

Full joint distribution P(T, W):
       hazy   sunny  rainy       How do we get the conditional
hot    0.25   0.30   0.05        distribution P(W | T) from the
cold   0.15   0.05   0.20        full joint distribution P(T, W)?
Conditional Distributions
Suppose the temperature is hot:

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Retain the entries conforming to the given fact:

Part of the full joint distribution, P(T = hot, W):
       hazy   sunny  rainy      Does not sum to 1, so it is not a
hot    0.25   0.30   0.05       probability distribution.

Normalize (divide by 0.6) so that it sums to 1:

Conditional distribution P(W | hot):
       hazy   sunny  rainy      Sums to 1. This is a
hot    0.42   0.50   0.08       probability distribution.

These entries are P(hazy | hot), P(sunny | hot) and P(rainy | hot).
Conditional Distributions
Suppose the temperature is cold:

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Retain the entries conforming to the given fact:

Part of the full joint distribution, P(T = cold, W):
       hazy   sunny  rainy      Does not sum to 1, so it is not a
cold   0.15   0.05   0.20       probability distribution.

Normalize (divide by 0.4) so that it sums to 1:

Conditional distribution P(W | cold):
       hazy   sunny  rainy      Sums to 1. This is a
cold   0.375  0.125  0.50       probability distribution.

These entries are P(hazy | cold), P(sunny | cold) and P(rainy | cold).


Conditional Distributions
We can combine the two conditional distributions into one single table:

Conditional distribution P(W | T):

       hazy   sunny  rainy
hot    0.42   0.50   0.08      ← P(W | hot), sums to 1
cold   0.375  0.125  0.50      ← P(W | cold), sums to 1

The probabilities change under different conditions.
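The select-then-normalize recipe of the last three slides translates directly into code. A minimal sketch (the helper name condition is ours):

```python
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def condition(joint, position, value):
    """Return P(remaining variables | variable at `position` == value)."""
    # 1. Retain the entries conforming to the given fact.
    selected = {o: p for o, p in joint.items() if o[position] == value}
    # 2. Normalize so that the retained entries sum to 1.
    total = sum(selected.values())          # e.g. P(hot) = 0.6
    return {o: p / total for o, p in selected.items()}

print(condition(P_TW, 0, "hot"))    # P(W | hot):  ≈0.417, 0.5, ≈0.083
print(condition(P_TW, 0, "cold"))   # P(W | cold): 0.375, 0.125, 0.5
```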
Product Rule

▪ A simple relation between joint and conditional probabilities can
  be expressed in terms of the product rule:

  P(x, y) = P(x | y) P(y)       equivalently      P(x | y) = P(x, y) / P(y)
   (joint)          (marginal)

▪ Example:
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

  P(hazy) = P(hazy, hot) + P(hazy, cold) = 0.25 + 0.15 = 0.4

  P(hot | hazy) = P(hot, hazy) / P(hazy) = 0.25 / 0.4 = 0.625

  P(hazy, hot) = P(hot | hazy) P(hazy) = 0.625 × 0.4 = 0.25
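A quick numeric check of the product rule on the same table (the variable names are ours):

```python
p_hazy_hot, p_hazy_cold = 0.25, 0.15      # joint entries from the table

p_hazy = p_hazy_hot + p_hazy_cold         # marginal: P(hazy) = 0.4
p_hot_given_hazy = p_hazy_hot / p_hazy    # conditional: P(hot | hazy) = 0.625

# Product rule: joint = conditional × marginal, recovering P(hazy, hot).
assert abs(p_hot_given_hazy * p_hazy - p_hazy_hot) < 1e-9
```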
Example 1
Given the joint distribution table below, compute the following
conditional probabilities:

  X    Y    P(X, Y)
  ¬x   ¬y   0.1
  ¬x   y    0.4
  x    ¬y   0.3
  x    y    0.2

▪ P(x | y)  = P(x, y) / P(y)  = 0.2 / (0.2 + 0.4) = 0.3333
▪ P(¬x | y) = P(¬x, y) / P(y) = 0.4 / (0.2 + 0.4) = 0.6667
▪ P(¬y | x) = P(x, ¬y) / P(x) = 0.3 / (0.3 + 0.2) = 0.6
Example 2
Compute P(W), P(W | winter) and P(W | summer, hot) given the
following joint distribution:

  S        T     W     P(S, T, W)
  summer   hot   sun   0.30
  summer   hot   rain  0.05
  summer   cold  sun   0.10
  summer   cold  rain  0.05
  winter   hot   sun   0.10
  winter   hot   rain  0.05
  winter   cold  sun   0.15
  winter   cold  rain  0.20

▪ P(W)?
  P(sun)  = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
  P(rain) = 0.05 + 0.05 + 0.05 + 0.20 = 0.35

▪ P(W | winter)?
  P(W | winter) = P(W, winter) / P(winter)
  P(winter) = 0.10 + 0.05 + 0.15 + 0.20 = 0.5
  P(sun | winter)  = (0.10 + 0.15) / 0.5 = 0.5
  P(rain | winter) = (0.05 + 0.20) / 0.5 = 0.5

▪ P(W | summer, hot)?
  P(W | summer, hot) = P(W, summer, hot) / P(summer, hot)
  P(summer, hot) = 0.30 + 0.05 = 0.35
  P(sun | summer, hot)  = 0.30 / 0.35 = 0.86
  P(rain | summer, hot) = 0.05 / 0.35 = 0.14

Probabilities change with new evidence!
Example 3

There are two production lines (line 1 and line 2) in a factory,
both with the same output capacity.

▪ Given that 30% of the products from line 1 are faulty, what is the
  probability that a product is faulty and comes from line 1?

  P(line1) = 0.5
  P(faulty | line1) = 0.3
  P(faulty, line1) = ?

  P(faulty, line1) = P(faulty | line1) P(line1) = (0.3)(0.5) = 0.15

▪ 10% of the products are faulty and come from line 2. Given a
  product from line 2, what is the probability that the product
  is faulty?

  P(line2) = 0.5
  P(faulty, line2) = 0.1
  P(faulty | line2) = ?

  P(faulty | line2) = P(faulty, line2) / P(line2) = 0.1 / 0.5 = 0.20
Probabilistic Inference

▪ Probabilistic inference: compute a desired probability from other
  known probabilities (e.g. a conditional from the joint).

▪ We generally compute the conditional probability of an event given
  certain evidence (using the product rule):

  P(event | evidence)

Probabilities change with new evidence!
Inference by Enumeration
Given a full joint distribution over a set of random variables, we
want to find the probability of a query event given some evidence:
▪ Query event: a set of outcomes of the query variables
▪ Query variables: Q
▪ Evidence variables: E₁ = e₁, …, Eₖ = eₖ (the known values)
▪ Hidden variables: H₁, …, Hᵣ (all remaining variables; don't care)

The inference process (a runnable sketch follows below):

1. Select consistent entries: select the entries of the joint
   distribution that are consistent with the evidence e₁, …, eₖ.
2. Marginalize: sum out the hidden variables H₁, …, Hᵣ to get the
   joint distribution of query and evidence:
   P(Q, e₁, …, eₖ) = Σ_{h₁,…,hᵣ} P(Q, h₁, …, hᵣ, e₁, …, eₖ)
3. Normalize to get the conditional distribution:
   P(Q | e₁, …, eₖ) = P(Q, e₁, …, eₖ) / P(e₁, …, eₖ)
4. Get the event: sum the entries of P(Q | e₁, …, eₖ) that are
   consistent with the query event.
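The four steps combine into one small routine. The sketch below is our own generic implementation, assuming the joint table is a dict keyed by outcome tuples (as in the earlier sketches):

```python
from collections import defaultdict

def enumerate_inference(joint, query_pos, evidence):
    """P(Query | evidence) by enumeration over a full joint table.

    joint:     dict mapping outcome tuples to probabilities
    query_pos: tuple position of the query variable
    evidence:  dict {tuple position: required value}
    """
    # 1. Select the entries consistent with the evidence.
    consistent = {o: p for o, p in joint.items()
                  if all(o[i] == v for i, v in evidence.items())}
    # 2. Marginalize: sum out the hidden variables (everything else).
    summed = defaultdict(float)
    for o, p in consistent.items():
        summed[o[query_pos]] += p
    # 3. Normalize to get the conditional distribution.
    total = sum(summed.values())
    return {q: p / total for q, p in summed.items()}
```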
Probabilistic Inference Example
▪ For example, consider the joint probability of having a toothache
  (Toothache), having a cavity in the teeth (Cavity), and the dentist
  catching (detecting) a damaged part of the teeth (Catch):

                 toothache            ¬toothache
             catch    ¬catch      catch    ¬catch
  cavity     0.108    0.012       0.072    0.008
  ¬cavity    0.016    0.064       0.144    0.576

  We may want to find:
  P(cavity | toothache)         P(catch | cavity)
Example 1
Suppose we want to find the probability of a person having no cavity
if he has a toothache.

Query event: ¬cavity
Query variable: Cavity = {cavity, ¬cavity}
Evidence variable: Toothache = true
Hidden variable: Catch
Find: P(¬cavity | toothache)

1. Select consistent entries:
   P(Cavity, toothache, Catch)
              catch    ¬catch
   cavity     0.108    0.012
   ¬cavity    0.016    0.064

2. Sum out the hidden variable:
   P(Cavity, toothache)
   cavity     0.108 + 0.012 = 0.120
   ¬cavity    0.016 + 0.064 = 0.080

3. Normalize:
   P(Cavity | toothache)
   cavity     0.12 / (0.12 + 0.08) = 0.6
   ¬cavity    0.08 / (0.12 + 0.08) = 0.4

4. Get the event:
   P(¬cavity | toothache) = 0.4
Example 2
If there is a cavity in the patient's teeth, what is the probability
that the dentist catches (detects) the damage?

Query event: catch
Query variable: Catch = {catch, ¬catch}
Evidence variable: Cavity = true
Hidden variable: Toothache
Find: P(catch | cavity)

1. Select consistent entries:
   P(Catch, Toothache, cavity)
                toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
   cavity   0.108    0.012       0.072    0.008

2. Sum out the hidden variable:
   P(Catch, cavity)
   catch    0.108 + 0.072 = 0.18
   ¬catch   0.012 + 0.008 = 0.02

3. Normalize:
   P(Catch | cavity)
   catch    0.18 / (0.18 + 0.02) = 0.90
   ¬catch   0.02 / (0.18 + 0.02) = 0.10

4. Get the event:
   P(catch | cavity) = 0.90
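Both examples can be reproduced with the enumerate_inference sketch from the Inference by Enumeration slide (run the two blocks together); the boolean tuple encoding below is our own choice:

```python
# Joint P(Cavity, Toothache, Catch) from the slides,
# encoded as (cavity?, toothache?, catch?) tuples of booleans.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Example 1: P(Cavity | toothache) -> {True: 0.6, False: 0.4}
print(enumerate_inference(joint, query_pos=0, evidence={1: True}))
# Example 2: P(Catch | cavity)     -> {True: 0.9, False: 0.1}
print(enumerate_inference(joint, query_pos=2, evidence={0: True}))
```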
Causal and Diagnostic Probability
▪ Most of the time, we have the causal probability:
  P(effect | cause)
  For example, P(fever | dengue) = the probability that a dengue
  patient has fever. This can easily be computed from historical data.
▪ But more often than not, what we really want is the diagnostic
  probability:
  P(cause | effect)
  For example, P(dengue | fever) = given that a patient has a fever
  symptom, the probability that he has dengue. A doctor can use this
  information to help in his diagnosis.
▪ Is it possible to compute the diagnostic probability from the
  causal probability?
Bayes’ Rule

▪ There are two ways to factor a joint distribution over two
  variables using the product rule:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

▪ Dividing both sides by P(y), we get Bayes’ rule:

  P(x | y) = P(y | x) P(x) / P(y)

▪ Why is this at all helpful?
  • It allows us to build one conditional from its reverse.
  • Often one conditional is tricky but the other one is simple to get.
  • It is the foundation of many AI systems (e.g. diagnosis, Automatic
    Speech Recognition (ASR), object tracking, …)
▪ This is one of the most important equations in AI!
Example 1

Assume that the probability of getting meningitis (brain fever) is
0.01%. Among those who contracted meningitis, 80% have a stiff neck,
whereas only 1% of those without meningitis have a stiff neck. What
is the probability of a person having meningitis (m) if he has a
stiff neck (s)?

P(m) = 0.0001
P(s | m) = 0.8
P(s | ¬m) = 0.01
P(m | s) = ?

P(s) = P(s | m) P(m) + P(s | ¬m) P(¬m)
     = (0.8)(0.0001) + (0.01)(0.9999) = 0.010079

P(m | s) = P(s | m) P(m) / P(s)
         = 0.00008 / 0.010079 ≈ 0.0079

Less than 1% of stiff-neck patients have meningitis.
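The same computation as a few lines of Python (the variable names are ours):

```python
p_m = 0.0001            # P(m): prior probability of meningitis
p_s_given_m = 0.80      # P(s | m)
p_s_given_not_m = 0.01  # P(s | ¬m)

# Total probability of a stiff neck, then Bayes' rule.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # 0.010079
p_m_given_s = p_s_given_m * p_m / p_s

print(p_m_given_s)      # ≈ 0.0079: well under 1% of stiff-neck patients
```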
Example 2
Given:

  P(W):                P(D | W):
  W       P            D     W       P
  sunny   0.8          wet   sunny   0.1
  rainy   0.2          dry   sunny   0.9
                       wet   rainy   0.7
                       dry   rainy   0.3

What is P(W | dry)?

P(W | dry) = P(dry | W) P(W) / P(dry)

P(dry) = P(sunny, dry) + P(rainy, dry)
       = P(dry | sunny) P(sunny) + P(dry | rainy) P(rainy)
       = (0.9)(0.8) + (0.3)(0.2) = 0.72 + 0.06 = 0.78

P(sunny | dry) = P(dry | sunny) P(sunny) / P(dry)
               = (0.9)(0.8) / 0.78 = 0.9231

P(rainy | dry) = P(dry | rainy) P(rainy) / P(dry)
               = (0.3)(0.2) / 0.78 = 0.0769
Example 3
In ABC College, 25% of all students obtained a good CGPA, 50% an
average CGPA, and 25% a bad CGPA. Suppose that in any examination, a
student with a good CGPA has a 5% chance of failing, a student with
an average CGPA a 15% chance of failing, and a student with a bad
CGPA a 25% chance of failing. For a student who failed an examination
last year, what are the probabilities of the different categories of
CGPA?

Given:                          Query:
P(good) = 0.25                  P(good | fail)
P(average) = 0.50               P(average | fail)
P(bad) = 0.25                   P(bad | fail)
P(fail | good) = 0.05
P(fail | average) = 0.15
P(fail | bad) = 0.25

P(fail) = P(fail | good) P(good) + P(fail | average) P(average)
          + P(fail | bad) P(bad)
        = (0.05)(0.25) + (0.15)(0.50) + (0.25)(0.25)
        = 0.15

P(good | fail) = P(fail | good) P(good) / P(fail)
               = (0.05)(0.25) / 0.15 = 0.083

P(average | fail) = P(fail | average) P(average) / P(fail)
                  = (0.15)(0.50) / 0.15 = 0.075 / 0.15 = 0.5

P(bad | fail) = P(fail | bad) P(bad) / P(fail)
              = (0.25)(0.25) / 0.15 = 0.0625 / 0.15 = 0.417
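All three posteriors follow the same pattern, so they can be computed in one comprehension. A minimal sketch with our own variable names:

```python
prior = {"good": 0.25, "average": 0.50, "bad": 0.25}    # P(CGPA)
p_fail = {"good": 0.05, "average": 0.15, "bad": 0.25}   # P(fail | CGPA)

# Total probability of failing: P(fail) = Σ_c P(fail | c) P(c) = 0.15
evidence = sum(p_fail[c] * prior[c] for c in prior)

# Bayes' rule applied to every CGPA category at once.
posterior = {c: p_fail[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'good': ≈0.083, 'average': 0.5, 'bad': ≈0.417}
```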
Next:

Bayesian Network
