
UCCD2063

Artificial Intelligence Techniques

Unit 11:
Uncertainty
Outline
• Probability Distribution
• Joint Distribution
• Marginal Distribution
• Conditional Distribution
• Product Rule
• Probabilistic Inference
• Bayes’ Rule

References:
• Chapter 13 in Russell & Norvig
• CS188 Lecture Note: Probability [link]
Uncertainty
There are many sources of uncertainty in the real world:
▪ Medical diagnosis
• temperature, blood pressure, types of pain -> disease?
▪ Speech recognition
• sound signals -> sentence?
▪ Tracking objects
• current position, speed, acceleration -> next position?
▪ Genetics
• gene expression data -> gene interactions?
▪ Error-correcting codes
• data corrupted with noise -> original message?
▪ … lots more!
Random Variable

• A random variable is an aspect of the problem domain which we may
  have uncertainty about:
  – D: the result of rolling a die
  – R: Is it raining?
  – M: winning a football match
• Domain: Each random variable has a domain (the set of all possible
  values):
  – D in {1, 2, 3, 4, 5, 6}
  – R in {yes, no}
  – M in {win, lose}

Notation: We denote random variables with capital letters and their
values with small letters.

Example: R = {r, ¬r}
• R is a variable
• r denotes R = True
• ¬r denotes R = False
Probability Distribution

• A probability distribution is a TABLE specifying the probabilities
  for all the values (outcomes) of ONE random variable.

  Temperature:             Weather:
  T      P(T)              W       P(W)
  hot    0.6               hazy    0.6
  cold   0.4               sunny   0.1
                           rainy   0.3

  P(T = hot) or P(hot) = 0.6        P(hazy) = 0.6, P(sunny) = 0.1,
  P(T = cold) or P(cold) = 0.4      P(rainy) = 0.3

• The probability values of a random variable X must fulfil the
  following requirements:

  ∀x ∈ X: 0 ≤ P(x) ≤ 1          Σ_{x∈X} P(x) = 1
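The two requirements can be checked mechanically. Below is a minimal Python sketch (the helper name is_valid_distribution is ours; the table values are the ones from this slide):

```python
def is_valid_distribution(dist, tol=1e-9):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    return (all(0.0 <= p <= 1.0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) <= tol)

P_T = {"hot": 0.6, "cold": 0.4}                    # Temperature table
P_W = {"hazy": 0.6, "sunny": 0.1, "rainy": 0.3}    # Weather table

print(is_valid_distribution(P_T))   # True
print(is_valid_distribution(P_W))   # True
```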
Joint Distribution

• A joint distribution is a TABLE specifying the probabilities for
  all combinations of values (outcomes) of MULTIPLE random variables.

  W = {hazy, sunny, rainy}
  T = {hot, cold}

  P(T, W) as a list:            P(T, W) as a grid:
  hot,  hazy   0.25                    hazy   sunny  rainy
  cold, hazy   0.15             hot    0.25   0.30   0.05
  hot,  sunny  0.30             cold   0.15   0.05   0.20
  cold, sunny  0.05
  hot,  rainy  0.05
  cold, rainy  0.20

• The probability values in a joint distribution must fulfil the
  following requirements:

  0 ≤ P(t, w) ≤ 1          Σ_{t,w} P(t, w) = 1

• A probabilistic model is a joint distribution over a set of
  random variables.
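One common way to hold a joint distribution in code is a dictionary keyed by outcome tuples. A minimal sketch with the P(T, W) values above (the variable name P_TW is ours):

```python
# Joint distribution P(T, W), keyed by (t, w) outcome tuples.
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

# The same two requirements hold: each entry in [0, 1], total equal to 1.
assert all(0.0 <= p <= 1.0 for p in P_TW.values())
assert abs(sum(P_TW.values()) - 1.0) < 1e-9
```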
Events

▪ What can we do with a joint distribution? We can infer the
  probability of certain events happening.
▪ An event E is a set of outcomes of the set of random variables.
  For example, the following are some events that we can query
  from the joint distribution:

  P(T, W)                         Example events:
         hazy   sunny  rainy      • hazy
  hot    0.25   0.30   0.05       • cold
  cold   0.15   0.05   0.20       • not sunny (¬sunny)
                                  • hazy OR rainy (hazy ∨ rainy)
                                  • hazy AND cold (hazy ∧ cold)
                                  • hazy AND rainy (hazy ∧ rainy)
                                  • (hazy ∨ rainy) ∧ cold
Probability of an Event

▪ The probability of an event E is computed by adding up all entries
  in the joint distribution that are consistent with the event:

  P(E) = Σ_{(x₁, x₂, …, xₙ) ∈ E} P(x₁, x₂, …, xₙ)

  For example: find the probability of the weather (W) being hazy.

  P(T, W)
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

  P(hazy) = P(hazy, hot) + P(hazy, cold)
          = 0.25 + 0.15 = 0.40
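In code, this is a filtered sum over the joint table. A minimal sketch (prob_of_event and the predicate style are our own choices, reusing the P(T, W) values above):

```python
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def prob_of_event(joint, in_event):
    """Add up all joint entries (outcomes) consistent with the event."""
    return sum(p for outcome, p in joint.items() if in_event(outcome))

# P(hazy): every outcome whose weather component (index 1) is 'hazy'.
print(prob_of_event(P_TW, lambda o: o[1] == "hazy"))                    # 0.40
# P(¬hazy ∨ cold), as on the next slide:
print(prob_of_event(P_TW, lambda o: o[1] != "hazy" or o[0] == "cold"))  # 0.75
```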
Example Events

  P(T, W)
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

▪ P(hot)? 0.25 + 0.3 + 0.05 = 0.6
▪ P(¬hazy)? 0.3 + 0.05 + 0.05 + 0.2 = 0.6
▪ P(¬hazy ∨ cold)? 0.3 + 0.05 + 0.05 + 0.2 + 0.15 = 0.75
▪ P(hot ∧ ¬sunny)? 0.25 + 0.05 = 0.30
▪ P(hazy ∧ rainy)? Not possible (W cannot take both values), so 0
Marginal Distributions
▪ Marginal distributions are sub-tables in which some variables have
  been eliminated.
▪ From the full joint distribution, we extract the distribution over
  some subset of variables. This process is called summing out or
  marginalization.

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Sum out W to get the marginalized distribution P(T):
  P(hot)  = P(hot, hazy) + P(hot, sunny) + P(hot, rainy)    = 0.6
  P(cold) = P(cold, hazy) + P(cold, sunny) + P(cold, rainy) = 0.4

Sum out T to get the marginalized distribution P(W):
        hazy   sunny  rainy
  P(W)  0.40   0.35   0.25
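Summing out reduces, in code, to accumulating probability over the variable position we keep. A minimal sketch (the helper name marginalize is ours):

```python
from collections import defaultdict

P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def marginalize(joint, keep):
    """Sum out every variable except the one at tuple position `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[keep]] += p
    return dict(out)

print(marginalize(P_TW, keep=0))  # P(T): hot 0.6, cold 0.4 (up to float rounding)
print(marginalize(P_TW, keep=1))  # P(W): hazy 0.4, sunny 0.35, rainy 0.25
```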
Conditional Distributions
▪ Sometimes, we may know the values of some variables.
▪ For example, suppose we know about the temperature, e.g. hot:
  • Then, there is no more uncertainty about the temperature.
  • But there remains uncertainty over the weather (hazy? sunny? rainy?)
▪ Conditional distributions are probability distributions over some
  variables given fixed (or known) values of other variables:

  P(W | T): the probability of W given T

Full joint distribution P(T, W):
       hazy   sunny  rainy       How do we get the conditional
hot    0.25   0.30   0.05        distribution P(W | T) from the
cold   0.15   0.05   0.20        full joint distribution P(T, W)?
Conditional Distributions
Suppose the temperature is hot:

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Retain the entries conforming to the given fact:

Part of the full joint distribution, P(T = hot, W):
       hazy   sunny  rainy      Does not sum to 1, so it is not a
hot    0.25   0.30   0.05       probability distribution.

Normalize (divide by 0.6) so that it sums to 1:

Conditional distribution P(W | hot):
       hazy   sunny  rainy      Sums to 1. This is a
hot    0.42   0.50   0.08       probability distribution.

These entries are P(hazy | hot), P(sunny | hot) and P(rainy | hot).
Conditional Distributions
Suppose the temperature is cold:

Full joint distribution P(T, W):
       hazy   sunny  rainy
hot    0.25   0.30   0.05
cold   0.15   0.05   0.20

Retain the entries conforming to the given fact:

Part of the full joint distribution, P(T = cold, W):
       hazy   sunny  rainy      Does not sum to 1, so it is not a
cold   0.15   0.05   0.20       probability distribution.

Normalize (divide by 0.4) so that it sums to 1:

Conditional distribution P(W | cold):
       hazy   sunny  rainy      Sums to 1. This is a
cold   0.375  0.125  0.50       probability distribution.

These entries are P(hazy | cold), P(sunny | cold) and P(rainy | cold).


Conditional Distributions
We can combine the two conditional distributions into one single table:

Conditional distribution P(W | T):

       hazy   sunny  rainy
hot    0.42   0.50   0.08      ← P(W | hot), sums to 1
cold   0.375  0.125  0.50      ← P(W | cold), sums to 1

The probabilities change under different conditions.
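The select-then-normalize recipe of the last three slides translates directly into code. A minimal sketch (the helper name condition is ours):

```python
P_TW = {
    ("hot", "hazy"): 0.25, ("hot", "sunny"): 0.30, ("hot", "rainy"): 0.05,
    ("cold", "hazy"): 0.15, ("cold", "sunny"): 0.05, ("cold", "rainy"): 0.20,
}

def condition(joint, position, value):
    """Return P(remaining variables | variable at `position` == value)."""
    # 1. Retain the entries conforming to the given fact.
    selected = {o: p for o, p in joint.items() if o[position] == value}
    # 2. Normalize so that the retained entries sum to 1.
    total = sum(selected.values())          # e.g. P(hot) = 0.6
    return {o: p / total for o, p in selected.items()}

print(condition(P_TW, 0, "hot"))    # P(W | hot):  ≈0.417, 0.5, ≈0.083
print(condition(P_TW, 0, "cold"))   # P(W | cold): 0.375, 0.125, 0.5
```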
Product Rule

▪ A simple relation between joint and conditional probabilities can
  be expressed in terms of the product rule:

  P(x, y) = P(x | y) P(y)       equivalently      P(x | y) = P(x, y) / P(y)
   (joint)          (marginal)

▪ Example:
         hazy   sunny  rainy
  hot    0.25   0.30   0.05
  cold   0.15   0.05   0.20

  P(hazy) = P(hazy, hot) + P(hazy, cold) = 0.25 + 0.15 = 0.4

  P(hot | hazy) = P(hot, hazy) / P(hazy) = 0.25 / 0.4 = 0.625

  P(hazy, hot) = P(hot | hazy) P(hazy) = 0.625 × 0.4 = 0.25
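A quick numeric check of the product rule on the same table (the variable names are ours):

```python
p_hazy_hot, p_hazy_cold = 0.25, 0.15      # joint entries from the table

p_hazy = p_hazy_hot + p_hazy_cold         # marginal: P(hazy) = 0.4
p_hot_given_hazy = p_hazy_hot / p_hazy    # conditional: P(hot | hazy) = 0.625

# Product rule: joint = conditional × marginal, recovering P(hazy, hot).
assert abs(p_hot_given_hazy * p_hazy - p_hazy_hot) < 1e-9
```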
Example 1
Given the joint distribution table below, compute the following
conditional probabilities:

  X    Y    P(X, Y)
  ¬x   ¬y   0.1
  ¬x   y    0.4
  x    ¬y   0.3
  x    y    0.2

▪ P(x | y)  = P(x, y) / P(y)  = 0.2 / (0.2 + 0.4) = 0.3333
▪ P(¬x | y) = P(¬x, y) / P(y) = 0.4 / (0.2 + 0.4) = 0.6667
▪ P(¬y | x) = P(x, ¬y) / P(x) = 0.3 / (0.3 + 0.2) = 0.6
Example 2
Compute P(W), P(W | winter) and P(W | summer, hot) given the
following joint distribution:

  S        T     W     P(S, T, W)
  summer   hot   sun   0.30
  summer   hot   rain  0.05
  summer   cold  sun   0.10
  summer   cold  rain  0.05
  winter   hot   sun   0.10
  winter   hot   rain  0.05
  winter   cold  sun   0.15
  winter   cold  rain  0.20

▪ P(W)?
  P(sun)  = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
  P(rain) = 0.05 + 0.05 + 0.05 + 0.20 = 0.35

▪ P(W | winter)?
  P(W | winter) = P(W, winter) / P(winter)
  P(winter) = 0.10 + 0.05 + 0.15 + 0.20 = 0.5
  P(sun | winter)  = (0.10 + 0.15) / 0.5 = 0.5
  P(rain | winter) = (0.05 + 0.20) / 0.5 = 0.5

▪ P(W | summer, hot)?
  P(W | summer, hot) = P(W, summer, hot) / P(summer, hot)
  P(summer, hot) = 0.30 + 0.05 = 0.35
  P(sun | summer, hot)  = 0.30 / 0.35 = 0.86
  P(rain | summer, hot) = 0.05 / 0.35 = 0.14

Probabilities change with new evidence!
Example 3

There are two production lines (line 1 and line 2) in a factory,
both with the same output capacity.

▪ Given that 30% of the products from line 1 are faulty, what is the
  probability that a product is faulty and comes from line 1?

  P(line1) = 0.5
  P(faulty | line1) = 0.3
  P(faulty, line1) = ?

  P(faulty, line1) = P(faulty | line1) P(line1) = (0.3)(0.5) = 0.15

▪ 10% of the products are faulty and come from line 2. Given a
  product from line 2, what is the probability that the product
  is faulty?

  P(line2) = 0.5
  P(faulty, line2) = 0.1
  P(faulty | line2) = ?

  P(faulty | line2) = P(faulty, line2) / P(line2) = 0.1 / 0.5 = 0.20
Probabilistic Inference

▪ Probabilistic inference: compute a desired probability from other
  known probabilities (e.g. a conditional from the joint).

▪ We generally compute the conditional probability of an event given
  certain evidence (using the product rule):

  P(event | evidence)

Probabilities change with new evidence!
Inference by Enumeration
Given a full joint distribution over a set of random variables, we
want to find the probability of a query event given some evidence:
▪ Query event: a set of outcomes of the query variables
▪ Query variables: Q
▪ Evidence variables: E₁ = e₁, …, Eₖ = eₖ (the known values)
▪ Hidden variables: H₁, …, Hᵣ (all remaining variables; don't care)

The inference process (a runnable sketch follows below):

1. Select consistent entries: select the entries of the joint
   distribution that are consistent with the evidence e₁, …, eₖ.
2. Marginalize: sum out the hidden variables H₁, …, Hᵣ to get the
   joint distribution of query and evidence:
   P(Q, e₁, …, eₖ) = Σ_{h₁,…,hᵣ} P(Q, h₁, …, hᵣ, e₁, …, eₖ)
3. Normalize to get the conditional distribution:
   P(Q | e₁, …, eₖ) = P(Q, e₁, …, eₖ) / P(e₁, …, eₖ)
4. Get the event: sum the entries of P(Q | e₁, …, eₖ) that are
   consistent with the query event.
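The four steps combine into one small routine. The sketch below is our own generic implementation, assuming the joint table is a dict keyed by outcome tuples (as in the earlier sketches):

```python
from collections import defaultdict

def enumerate_inference(joint, query_pos, evidence):
    """P(Query | evidence) by enumeration over a full joint table.

    joint:     dict mapping outcome tuples to probabilities
    query_pos: tuple position of the query variable
    evidence:  dict {tuple position: required value}
    """
    # 1. Select the entries consistent with the evidence.
    consistent = {o: p for o, p in joint.items()
                  if all(o[i] == v for i, v in evidence.items())}
    # 2. Marginalize: sum out the hidden variables (everything else).
    summed = defaultdict(float)
    for o, p in consistent.items():
        summed[o[query_pos]] += p
    # 3. Normalize to get the conditional distribution.
    total = sum(summed.values())
    return {q: p / total for q, p in summed.items()}
```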
Probabilistic Inference Example
▪ For example, consider the joint probability of having a toothache
  (Toothache), having a cavity in the teeth (Cavity), and the dentist
  catching (detecting) a damaged part of the teeth (Catch):

                 toothache            ¬toothache
             catch    ¬catch      catch    ¬catch
  cavity     0.108    0.012       0.072    0.008
  ¬cavity    0.016    0.064       0.144    0.576

  We may want to find:
  P(cavity | toothache)         P(catch | cavity)
Example 1
Suppose we want to find the probability of a person having no cavity
if he has a toothache.

Query event: ¬cavity
Query variable: Cavity = {cavity, ¬cavity}
Evidence variable: Toothache = true
Hidden variable: Catch
Find: P(¬cavity | toothache)

1. Select consistent entries:
   P(Cavity, toothache, Catch)
              catch    ¬catch
   cavity     0.108    0.012
   ¬cavity    0.016    0.064

2. Sum out the hidden variable:
   P(Cavity, toothache)
   cavity     0.108 + 0.012 = 0.120
   ¬cavity    0.016 + 0.064 = 0.080

3. Normalize:
   P(Cavity | toothache)
   cavity     0.12 / (0.12 + 0.08) = 0.6
   ¬cavity    0.08 / (0.12 + 0.08) = 0.4

4. Get the event:
   P(¬cavity | toothache) = 0.4
Example 2
If there is a cavity in the patient's teeth, what is the probability
that the dentist catches (detects) the damage?

Query event: catch
Query variable: Catch = {catch, ¬catch}
Evidence variable: Cavity = true
Hidden variable: Toothache
Find: P(catch | cavity)

1. Select consistent entries:
   P(Catch, Toothache, cavity)
                toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
   cavity   0.108    0.012       0.072    0.008

2. Sum out the hidden variable:
   P(Catch, cavity)
   catch    0.108 + 0.072 = 0.18
   ¬catch   0.012 + 0.008 = 0.02

3. Normalize:
   P(Catch | cavity)
   catch    0.18 / (0.18 + 0.02) = 0.90
   ¬catch   0.02 / (0.18 + 0.02) = 0.10

4. Get the event:
   P(catch | cavity) = 0.90
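Both examples can be reproduced with the enumerate_inference sketch from the Inference by Enumeration slide (run the two blocks together); the boolean tuple encoding below is our own choice:

```python
# Joint P(Cavity, Toothache, Catch) from the slides,
# encoded as (cavity?, toothache?, catch?) tuples of booleans.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Example 1: P(Cavity | toothache) -> {True: 0.6, False: 0.4}
print(enumerate_inference(joint, query_pos=0, evidence={1: True}))
# Example 2: P(Catch | cavity)     -> {True: 0.9, False: 0.1}
print(enumerate_inference(joint, query_pos=2, evidence={0: True}))
```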
Causal and Diagnostic Probability
▪ Most of the time, we have the causal probability:
  P(effect | cause)
  For example, P(fever | dengue) = the probability that a dengue
  patient has fever. This can easily be computed from historical data.
▪ But more often than not, what we really want is the diagnostic
  probability:
  P(cause | effect)
  For example, P(dengue | fever) = given that a patient has a fever
  symptom, the probability that he has dengue. A doctor can use this
  information to help in his diagnosis.
▪ Is it possible to compute the diagnostic probability from the
  causal probability?
Bayes’ Rule

▪ There are two ways to factor a joint distribution over two
  variables using the product rule:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

▪ Dividing both sides by P(y), we get Bayes’ rule:

  P(x | y) = P(y | x) P(x) / P(y)

▪ Why is this at all helpful?
  • It allows us to build one conditional from its reverse.
  • Often one conditional is tricky but the other one is simple to get.
  • It is the foundation of many AI systems (e.g. diagnosis, Automatic
    Speech Recognition (ASR), object tracking, …)
▪ This is one of the most important equations in AI!
Example 1

Assume that the probability of getting meningitis (brain fever) is
0.01%. Among those who contracted meningitis, 80% have a stiff neck,
whereas only 1% of those without meningitis have a stiff neck. What
is the probability of a person having meningitis (m) if he has a
stiff neck (s)?

P(m) = 0.0001
P(s | m) = 0.8
P(s | ¬m) = 0.01
P(m | s) = ?

P(s) = P(s | m) P(m) + P(s | ¬m) P(¬m)
     = (0.8)(0.0001) + (0.01)(0.9999) = 0.010079

P(m | s) = P(s | m) P(m) / P(s)
         = 0.00008 / 0.010079 ≈ 0.0079

Less than 1% of stiff-neck patients have meningitis.
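The same computation as a few lines of Python (the variable names are ours):

```python
p_m = 0.0001            # P(m): prior probability of meningitis
p_s_given_m = 0.80      # P(s | m)
p_s_given_not_m = 0.01  # P(s | ¬m)

# Total probability of a stiff neck, then Bayes' rule.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # 0.010079
p_m_given_s = p_s_given_m * p_m / p_s

print(p_m_given_s)      # ≈ 0.0079: well under 1% of stiff-neck patients
```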
Example 2
Given:

  P(W):                P(D | W):
  W       P            D     W       P
  sunny   0.8          wet   sunny   0.1
  rainy   0.2          dry   sunny   0.9
                       wet   rainy   0.7
                       dry   rainy   0.3

What is P(W | dry)?

P(W | dry) = P(dry | W) P(W) / P(dry)

P(dry) = P(sunny, dry) + P(rainy, dry)
       = P(dry | sunny) P(sunny) + P(dry | rainy) P(rainy)
       = (0.9)(0.8) + (0.3)(0.2) = 0.72 + 0.06 = 0.78

P(sunny | dry) = P(dry | sunny) P(sunny) / P(dry)
               = (0.9)(0.8) / 0.78 = 0.9231

P(rainy | dry) = P(dry | rainy) P(rainy) / P(dry)
               = (0.3)(0.2) / 0.78 = 0.0769
Example 3
In ABC College, 25% of all students obtained a good CGPA, 50% an
average CGPA, and 25% a bad CGPA. Suppose that in any examination, a
student with a good CGPA has a 5% chance of failing, a student with
an average CGPA a 15% chance of failing, and a student with a bad
CGPA a 25% chance of failing. For a student who failed an examination
last year, what are the probabilities of the different categories of
CGPA?

Given:                          Query:
P(good) = 0.25                  P(good | fail)
P(average) = 0.50               P(average | fail)
P(bad) = 0.25                   P(bad | fail)
P(fail | good) = 0.05
P(fail | average) = 0.15
P(fail | bad) = 0.25

P(fail) = P(fail | good) P(good) + P(fail | average) P(average)
          + P(fail | bad) P(bad)
        = (0.05)(0.25) + (0.15)(0.50) + (0.25)(0.25)
        = 0.15

P(good | fail) = P(fail | good) P(good) / P(fail)
               = (0.05)(0.25) / 0.15 = 0.083

P(average | fail) = P(fail | average) P(average) / P(fail)
                  = (0.15)(0.50) / 0.15 = 0.075 / 0.15 = 0.5

P(bad | fail) = P(fail | bad) P(bad) / P(fail)
              = (0.25)(0.25) / 0.15 = 0.0625 / 0.15 = 0.417
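All three posteriors follow the same pattern, so they can be computed in one comprehension. A minimal sketch with our own variable names:

```python
prior = {"good": 0.25, "average": 0.50, "bad": 0.25}    # P(CGPA)
p_fail = {"good": 0.05, "average": 0.15, "bad": 0.25}   # P(fail | CGPA)

# Total probability of failing: P(fail) = Σ_c P(fail | c) P(c) = 0.15
evidence = sum(p_fail[c] * prior[c] for c in prior)

# Bayes' rule applied to every CGPA category at once.
posterior = {c: p_fail[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'good': ≈0.083, 'average': 0.5, 'bad': ≈0.417}
```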
Next:

Bayesian Network
