
CS 480

Introduction to Artificial Intelligence

October 24, 2023


Announcements / Reminders
 Please follow the Week 09 To Do List instructions (if you
haven't already)
 Written Assignment #03 posted
 Programming Assignment #01 due on Sunday
(10/29/23) at 11:59 PM CST

 Final Exam date:


– Thursday 11/30/2023 (last week of classes!)
 Ignore the date provided by the Registrar

2
Plan for Today
 Bayes (Belief) Networks
 Decision Networks

3
Joint Probability
The joint probability of event A and event B (or of more than two events) is the probability that they occur together: the probability of the intersection of the events (represented by random variables).

For example (a specific probability, using the dental distribution on slide 7):

P(Cavity = true ∧ Toothache = true) = 0.12

For any propositions (random variables) X and Y:

P(X ∧ Y), also written P(X, Y)

4
Complex Joint Probability Distribution
Consider a complex joint probability distribution involving N random variables f1, f2, f3, ..., f(N-1), fN (the values can be other than true/false; the variables need not be binary).

N random variables, 2^N possible worlds (models), and therefore 2^N probability values:

 f1     f2     f3     ...   f(N-1)  fN     | Joint Probability
 true   true   true   ...   true    true   | 0.0011
 true   true   true   ...   true    false  | 0.0451
 true   true   false  ...   false   true   | 0.1011
 ...    ...    ...    ...   ...     ...    | ...
 false  false  true   ...   true    false  | 0.0909
 false  false  true   ...   false   true   | 0.0651
 false  false  false  ...   false   false  | 0.2021

5
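As a concrete illustration, a minimal Python sketch that stores a full joint distribution as a dictionary with one entry per possible world; the three variables and their probabilities below are made up for illustration, not slide data:

# Full joint distribution as a dictionary keyed by value tuples.
# With N binary variables the table has 2**N rows (possible worlds).
variables = ["f1", "f2", "f3"]                       # N = 3 (illustrative)
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.05,
    (True,  False, True):  0.20, (True,  False, False): 0.05,
    (False, True,  True):  0.15, (False, True,  False): 0.10,
    (False, False, True):  0.05, (False, False, False): 0.30,
}

assert len(joint) == 2 ** len(variables)             # 2^N possible worlds
assert abs(sum(joint.values()) - 1.0) < 1e-9         # probabilities sum to 1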
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn and their values:

P(f1 ∧ f2 ∧ ... ∧ fn) = P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2) * ... * P(fn | f1 ∧ ... ∧ f(n-1))
                      = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))
6
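A quick numeric check of the two-variable chain rule, P(A ∧ B) = P(A) * P(B | A), as a Python sketch using the Cavity/Toothache numbers from the dental table on the next slide:

# P(cavity ∧ toothache), summed over Catch
p_cavity_toothache = 0.108 + 0.012                        # 0.12
# P(cavity), summed over Toothache and Catch
p_cavity = 0.108 + 0.012 + 0.072 + 0.008                  # 0.2
# P(toothache | cavity) from the definition of conditional probability
p_toothache_given_cavity = p_cavity_toothache / p_cavity  # 0.6

# Chain rule: P(cavity ∧ toothache) = P(cavity) * P(toothache | cavity)
assert abs(p_cavity * p_toothache_given_cavity - p_cavity_toothache) < 1e-12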
Independent Variable / Factoring
Toothache Toothache
Cloudy

Catch Catch Catch Catch


Cavity 0.108 0.012 0.072 0.008
Cavity 0.016 0.064 0.144 0.576
Toothache Toothache
Catch Catch
Cloudy

Catch Catch
Cavity 0.108 0.012 0.072 0.008
Cavity 0.016 0.064 0.144 0.576

It’s hard to imagine Cloudy influencing other variables, so:

This shows that Cloudy is INDEPENDENT of other


variables and factoring can be applied.
7
Factoring / Decomposition

8
Use Chain Rule To Decompose

9
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2) * ... * P(fn | f1 ∧ ... ∧ f(n-1))
10
Expansion
Expanding the chain rule step by step:

P(f1 ∧ f2 ∧ ... ∧ fn)
= P(fn | f1 ∧ ... ∧ f(n-1)) * P(f1 ∧ ... ∧ f(n-1))
= P(fn | f1 ∧ ... ∧ f(n-1)) * P(f(n-1) | f1 ∧ ... ∧ f(n-2)) * P(f1 ∧ ... ∧ f(n-2))
= ...
= P(fn | f1 ∧ ... ∧ f(n-1)) * ... * P(f2 | f1) * P(f1)

11
Conditional Independence
Random variable X is conditionally independent of random variable Y given Z if for all x ∈ Dx, for all y ∈ Dy, and for all z ∈ Dz such that
P(Y = y ∧ Z = z) > 0 and P(Y = y’ ∧ Z = z) > 0:

P(X = x | Y = y ∧ Z = z) = P(X = x | Y = y’ ∧ Z = z)

In other words, given a value of Z, knowing Y’s value DOES NOT affect your belief in the value of X.
12
Conditional Independence
Consider three random variables: P(owerful), H(appy), R(ich)
with domains:

DP = {powerful, powerless}, DH = {happy, unhappy}, DR = {rich, poor}

Now, when:
P(H = happy ∧ R = rich) > 0 and P(H = unhappy ∧ R = rich) > 0
and:

P(P = powerful | H = happy ∧ R = rich) = P(P = powerful | H = unhappy ∧ R = rich)

then P is conditionally independent of H given R. In other words, given a value of R, knowing H’s value DOES NOT affect your belief in the value of P.
“Being un/happy does not make you less powerful, if you are rich.”
13
Conditional Independence
Causal Chain:
Burglary (B) → Alarm (A) → MaryCalls (M)

Burglary and MaryCalls are CONDITIONALLY independent given Alarm.

If Alarm is given, what “happened before” Alarm does not directly influence MaryCalls.

14
Conditional Independence
Causal Chain:
Burglary (B) → Alarm (A) → MaryCalls (M)

By the chain rule, the conditional probabilities give the joint probability:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i P(fi | parents(fi))

Burglary and MaryCalls are CONDITIONALLY independent given Alarm.

If Alarm is given, what “happened before” Alarm does not directly influence MaryCalls.

15
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))

However, it can be rewritten as:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | parents(fi))

because with conditional independence(s) considered:

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))
16
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))

                      = ∏ i=1..n P(fi | parents(fi))   ← Enabled by conditional independence

17
Parents of Random Variable fi
Parents of random variable fi, parents(fi), is a minimal set of predecessors of fi in the total ordering such that the other predecessors of fi are conditionally independent of fi given parents(fi).

A set of all predecessors of fi: {f1, ..., f(i-1)}

A set of all parents of fi: parents(fi) ⊆ {f1, ..., f(i-1)}
A set of all non-parents (predecessors NOT in parents(fi)) of fi: {f1, ..., f(i-1)} \ parents(fi)

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))

when parents(fi) are given (all their values are known).


18
Parents of Random Variable fi
Parents of random variable fi, parents(fi), is a minimal set of predecessors of fi in the total ordering such that the other predecessors of fi are conditionally independent of fi given parents(fi).

So: when parents(fi) are given, fi probabilistically depends on each of its parents, but is independent of its other predecessors. That is, parents(fi) ⊆ {f1, ..., f(i-1)} is a minimal set such that:

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))
19
Bayesian (Belief) Network
A Bayesian belief network describes the joint probability
distribution for a set of variables.
A Bayesian network is an acyclic, directed graph (DAG), where the nodes are random variables (propositions). There is an edge (arc) from each element of parents(Xi) into Xi. Associated with the Bayesian network is a set of conditional probability distributions: the conditional probability of each variable given its parents (which includes the prior probabilities of variables with no parents).
Consists of:
 a graph (DAG) with nodes corresponding to random variables
 a domain for each random variable
 a set of conditional probability distributions P(Xi | parents(Xi))

20
Bayes Network: Factorization
Chain rule AND the definition of parents(Xi) give us:

P(X1 ∧ X2 ∧ ... ∧ Xn) = ∏ i=1..n P(Xi | parents(Xi))

Joint probability distribution = product of conditional probabilities after factorization of the joint probability distribution.

A Bayes network is a graph representation of this factorization of the joint probability distribution.
21
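To make the factorization concrete, a minimal Python sketch for the burglary network used on the following slides; the CPT values are the ones printed on that network figure, and True/False stand for true/false:

# CPTs of the burglary network (values from the slides).
P_B = {True: 0.001, False: 0.999}                     # P(B)
P_E = {True: 0.002, False: 0.998}                     # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def joint(b, e, a, j, m):
    # P(B=b, E=e, A=a, J=j, M=m) = P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

print(joint(True, False, True, True, True))   # P(b ∧ ¬e ∧ a ∧ j ∧ m) ≈ 0.000591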
Inference In Bayes Networks

B E P(A|B,E)
t t 0.95
t f 0.94
f t 0.29
f f 0.001

22
Inference by Enumeration: Example
Query (what is the probability distribution for the following conditional P()?):

23
Joint Probability: Marginalization
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

Probability P(H = true): “sum of all probabilities where H is true”

P(grad) = P(grad ∧ female) + P(grad ∧ ¬female) = 0.074 + 0.148 = 0.222

24
Joint Probability: Marginalization
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

Probability P(e = true): “sum of all probabilities where e is true”

P(female) = P(grad ∧ female) + P(¬grad ∧ female) = 0.074 + 0.086 = 0.160

25
Joint Probability: Conditionals
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

From the product rule:

P(H ∧ e) = P(H | e) * P(e)

we can derive:

P(H | e) = P(H ∧ e) / P(e)
26
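A small Python sketch of marginalization and conditioning over the grad/female joint distribution above (the four probabilities are the ones in the table):

# Joint distribution P(H, e); keys are (H = grad?, e = female?).
joint = {(True, True): 0.074, (True, False): 0.148,
         (False, True): 0.086, (False, False): 0.691}

# Marginalization: P(H = true) = sum over e of P(H = true, e)
p_grad = sum(p for (h, e), p in joint.items() if h)        # 0.074 + 0.148 = 0.222
# Marginalization: P(e = true) = sum over H of P(H, e = true)
p_female = sum(p for (h, e), p in joint.items() if e)      # 0.074 + 0.086 = 0.160

# Conditioning (from the product rule): P(H = true | e = true) = P(H ∧ e) / P(e)
p_grad_given_female = joint[(True, True)] / p_female       # 0.074 / 0.160 = 0.4625

print(p_grad, p_female, p_grad_given_female)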
General Inference Procedure
Given:
 a query involving a single variable X
 a list of evidence variables E
 a list of observed values e for E,
 a list of remaining unobserved variables Y (in our example: just Catch),
where X, E, and Y together are a COMPLETE set of variables for the domain, the probability distribution P(X | E) can be evaluated as:

P(X | E = e) = α * P(X, e) = α * Σ_y P(X, e, y)

where the y are all possible value combinations of Y, and α is a normalization constant.

P(X, e, y) is a subset of probabilities from the joint distribution.
27
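A generic sketch of this procedure in Python, assuming the joint distribution is stored as a dictionary keyed by value tuples (as in the earlier sketch); the variable names and numbers in the usage example are made up:

def enumeration_ask(X, evidence, variables, joint):
    """P(X | evidence) = alpha * sum over unobserved Y of P(X, evidence, Y)."""
    idx = {v: i for i, v in enumerate(variables)}
    dist = {}
    for x_val in (True, False):                      # assume a Boolean query variable
        total = 0.0
        for world, p in joint.items():
            if world[idx[X]] != x_val:
                continue
            if all(world[idx[v]] == val for v, val in evidence.items()):
                total += p                           # sums out the unobserved Y automatically
        dist[x_val] = total
    alpha = 1.0 / sum(dist.values())                 # normalization constant
    return {val: alpha * p for val, p in dist.items()}

# Tiny usage example with two Boolean variables (made-up numbers):
toy = {(True, True): 0.2, (True, False): 0.1, (False, True): 0.3, (False, False): 0.4}
print(enumeration_ask("X1", {"X2": True}, ["X1", "X2"], toy))   # {True: 0.4, False: 0.6}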
General Inference Procedure: Example
Query (note: full probability distribution):
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given:
 a query involving a single variable X,
 a list of evidence variables K,
 a list of observed values k for K,
 a list of remaining unobserved variables Y,
the probability distribution P(X | K) can be evaluated as:

P(X | K = k) = α * Σ_y P(X, k, y)

where the y are all possible value combinations of Y, and α is a normalization constant.

[Bayes network figure: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of JohnCalls (J) and MaryCalls (M). CPTs: P(B) = 0.001, P(E) = 0.002; P(A|B,E): tt 0.95, tf 0.94, ft 0.29, ff 0.001; P(J|A): t 0.90, f 0.05; P(M|A): t 0.70, f 0.01]
28
General Inference Procedure: Example
Query (note: full probability distribution):
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

For this network:
 the query involves the single variable X: Burglary,
 the evidence variables K: JohnCalls, MaryCalls,
 the observed values k for K: JohnCalls = true, MaryCalls = true,
 the remaining unobserved variables Y: Earthquake, Alarm,
so the probability distribution P(Burglary | j, m) can be evaluated as:

P(Burglary | j, m) = α * Σ_e Σ_a P(Burglary, j, m, e, a)

where the sums run over all possible values of Earthquake and Alarm, and α is a normalization constant.

(Same network figure and CPTs as on the previous slide.)
31
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given:
 the query variable X: Burglary (with value b = true),
 the evidence variables K: JohnCalls, MaryCalls,
 the observed values k for K: j = true, m = true,
 the remaining unobserved variables Y: Earthquake, Alarm,
the query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b, j, m, e, a)

By the chain rule (Bayes network factorization):

P(b, j, m, e, a) = P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
32
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given the same X, K, k, and Y as before, the query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
33
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

The query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)

and, moving terms outside the sums they do not depend on:

            = α * P(b) * Σ_e P(e) * Σ_a P(a|b,e) * P(j|a) * P(m|a)
34
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Query rewritten:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)

            = α * P(b) * Σ_e P(e) * Σ_a P(a|b,e) * P(j|a) * P(m|a)

36
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Query rewritten and expanded into the evaluation tree (the sums over e and a written out; each branch is a product of CPT entries, and branches are added):

P(b | j, m) = α * P(b) * [ P(e)  * ( P(a|b,e)  * P(j|a) * P(m|a)  +  P(¬a|b,e)  * P(j|¬a) * P(m|¬a) )
                         + P(¬e) * ( P(a|b,¬e) * P(j|a) * P(m|a)  +  P(¬a|b,¬e) * P(j|¬a) * P(m|¬a) ) ]
43
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Substituting the CPT values into the evaluation tree:

P(b | j, m) = α * 0.001 * [ 0.002 * ( 0.95 * 0.90 * 0.70  +  0.05 * 0.05 * 0.01 )
                          + 0.998 * ( 0.94 * 0.90 * 0.70  +  (1 - 0.94) * 0.05 * 0.01 ) ]
45
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

We can now calculate:

P(b | j, m) = α * 0.00059224

The complementary probability can be calculated through a similar process (not shown), and we get:

P(¬b | j, m) = α * 0.0014919

We need it, because the answer to our original query

𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

is a vector (conditional probability distribution):

𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒) = < P(b | j, m), P(¬b | j, m) >
46
General Inference Procedure: Example
Query:
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

We can now get the full distribution:

𝑷(𝐵 | 𝑗, 𝑚) = α * < 0.00059224, 0.0014919 >

which after normalization becomes:

𝑷(𝐵 | 𝑗, 𝑚) ≈ < 0.284, 0.716 >
47
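The whole example as a short Python sketch (CPT values taken from the network figure); it reproduces the unnormalized values 0.00059224 and 0.0014919 and the normalized distribution of roughly <0.284, 0.716>:

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def unnormalized(b):
    """Sum over the unobserved variables E and A of P(b, j=true, m=true, e, a)."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
print(scores)                                  # {True: ~0.00059224, False: ~0.0014919}
alpha = 1.0 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})   # roughly {True: 0.284, False: 0.716}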
More On Conditional Independence
Common Cause:
JohnCalls (J) ← Alarm (A) → MaryCalls (M)
JohnCalls and MaryCalls are NOT independent.
JohnCalls and MaryCalls are CONDITIONALLY independent given Alarm.

Common Effect:
Burglary (B) → Alarm (A) ← Earthquake (E)
Burglary and Earthquake are independent.
Burglary and Earthquake are NOT CONDITIONALLY independent given Alarm.
48
More On Conditional Independence
[Figure: left, a node X with its parents, non-descendants, and descendants; right, a node X with its Markov blanket]

Node X is conditionally independent of its non-descendants given its parents.

Node X is conditionally independent of ALL other nodes in the network given its Markov blanket.

Why do we care?
An unconstrained joint probability distribution with N binary variables involves 2^N probabilities.
A Bayesian network with at most k parents per node involves N * 2^k probabilities (k < N).

49
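For example (illustrative numbers, not from the slide): with N = 30 binary variables, a full joint table needs 2^30, i.e. over a billion, probabilities, while a Bayesian network in which every node has at most k = 3 parents needs only about 30 * 2^3 = 240 conditional probabilities.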
Playing Minesweeper with Bayes’ Rule
[Figure: Minesweeper board for the same cell X, showing the prior probability / belief on the left and the posterior probability / belief on the right]

50
Naive Bayes Spam Filter
Naive Bayes network: Email = Spam is the parent node of Word 1, Word 2, ..., Word N.

P(Email = spam | Word1) = 0.09


P(Email = spam | Word2) = 0.01
...
P(Email = spam | WordN) = 0.03

51
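A minimal sketch of one common naive Bayes formulation in Python. Note the hedge: it works from an assumed class prior and per-class word likelihoods P(word | class) rather than the per-word values P(Email = spam | Word) listed above, and every number in it is made up for illustration:

import math

prior = {"spam": 0.4, "ham": 0.6}                          # assumed class prior
likelihood = {                                              # assumed P(word | class)
    "spam": {"free": 0.30, "meeting": 0.02, "offer": 0.25},
    "ham":  {"free": 0.03, "meeting": 0.20, "offer": 0.02},
}

def posterior(words):
    """P(class | words) ∝ P(class) * Π P(word | class), assuming word independence given the class."""
    score = {}
    for c in prior:
        log_p = math.log(prior[c])
        for w in words:
            log_p += math.log(likelihood[c].get(w, 1e-6))   # tiny floor for unseen words
        score[c] = math.exp(log_p)
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

print(posterior(["free", "offer"]))    # heavily favors "spam" with these made-up numbers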
Agents and Belief State
[Figure: agent architecture with a Sensor and an Actuator connecting the Agent to a partially observable Environment]

Sensor reading: A = 4, B = 7 (the environment is only partially observable; C is not sensed).

Agent’s model of the world: A = 4, B = 7, C = ?   (assume DC = {0, 1, 2, 3})

The environment could be in one of these states, and the agent can consult its internal representation of the world / environment to choose an action:
S1: A = 4, B = 7, C = 0 → Plan X
S2: A = 4, B = 7, C = 1 → Plan Y
S3: A = 4, B = 7, C = 2 → Plan X
S4: A = 4, B = 7, C = 3 → Plan Z

Plans are sequences of actions. The chosen plan is executed through the Actuator: ACTION(S)

52
Decision Theory
 Decisions: every plan (actions) leads to an
outcome (state)
 Agents have preferences (preferred outcomes)
 Preferences → outcome utilities
 Agents have degrees of belief (probabilities) for
actions

Decision theory = probability theory + utility theory

53
Decision Theory
 Decisions: every plan (actions) leads to an
outcome (state)
 Agents have preferences (preferred outcomes)
 Preferences → outcome utilities
 Agents have degrees of belief (probabilities) for
actions

Decision theory = probability theory + utility theory


BELIEFS DESIRES

54
Maximum Expected (Average) Utility

The environment could be in one of these states (each mapped to a plan):
A = 4, B = 7, C = 0 → Plan X
A = 4, B = 7, C = 1 → Plan Y
A = 4, B = 7, C = 2 → Plan X
A = 4, B = 7, C = 3 → Plan Z

Action M → Outcome S10: P(S10) = 0.1, U(S10) = 2
         → Outcome S15: P(S15) = 0.4, U(S15) = 3
         → Outcome S55: P(S55) = 0.5, U(S55) = 1

Action N → Outcome S20: P(S20) = 0.7, U(S20) = 8
         → Outcome S15: P(S15) = 0.2, U(S15) = 4
         → Outcome S12: P(S12) = 0.1, U(S12) = 5

Pick the action with the highest MEU (maximum expected utility).

55
Agents Decisions
Recall that agent ACTIONS change the state:
 if we are in state s
 action a is expected to
 lead to another state s’ (outcome)

Given uncertainty about the current state s and action outcome s’


we need to define the following:
 probability (belief) of being in state s: P(s)
 probability (belief) of action a leading to outcome s’: P(s’ | s, a)
Now:

EU(a) = Σ_s' P(s' | s, a) * U(s')

56
Expected Action Utility
The expected utility of an action a given the evidence is the average utility value of all possible outcomes s' of action a, weighted by their probability (belief) of occurrence:

EU(a | e) = Σ_s' P(s' | a, e) * U(s')

A rational agent should choose an action that maximizes the expected utility:

a* = argmax_a EU(a | e)
57
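A small Python sketch of this calculation, using the outcome probabilities and utilities from the Action M / Action N example two slides back:

outcomes = {
    "M": [(0.1, 2), (0.4, 3), (0.5, 1)],      # (P(s'), U(s')) for outcomes S10, S15, S55
    "N": [(0.7, 8), (0.2, 4), (0.1, 5)],      # (P(s'), U(s')) for outcomes S20, S15, S12
}

def expected_utility(action):
    # EU(a) = sum over outcomes s' of P(s') * U(s')
    return sum(p * u for p, u in outcomes[action])

eu = {a: expected_utility(a) for a in outcomes}
print(eu)                                      # {'M': 1.9, 'N': 6.9}
print(max(eu, key=eu.get))                     # MEU action: 'N'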
State Utility Function
Agent’s preferences (desires) are captured by
the Utility function U(s).

Utility function assigns a value to each state s


to express how desirable this state is to the
agent.

58
OK, But What is Utility Function?

59
How Did We Get Here?
Let’s start with relationships (and related notation)
between agent’s preferences:
 agent prefers A over B:
A ≻ B
 agent is indifferent between A and B:
A ~ B
 agent prefers A over B or is indifferent between A and B (weak preference):
A ≿ B
60
The Concept of Lottery
Let’s assume the following:
 an action a is a lottery ticket
 the set of outcomes (resulting states) is a lottery
A lottery L with possible outcomes S1, ..., Sn that
occur with probabilities p1, ..., pn is written as:

L = [p1, S1; p2, S2; ... ; pn, Sn]

Lottery outcome Si: atomic state or another lottery.


61
Lottery Constraints: Orderability
Given two lotteries A and B, a rational agent must
either prefer one or else rate them as equally
preferable:

Exactly one of (A ≻ B), (B ≻ A), or (A ~ B) holds

62
Lottery Constraints: Transitivity
Given three lotteries A, B, and C, if an agent
prefers A to B AND prefers B to C, then the agent
must prefer A to C:

(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)

63
Lottery Constraints: Continuity
If some lottery B is between A and C in preference,
then there is some probability p for which the
rational agent will be indifferent between getting B
for sure or some other lottery that yields A with
probability p and C with probability 1 - p:

(A ≻ B ≻ C) ⇒ ∃p [p, A; 1 - p, C] ~ B

64
Lottery Constraints: Substitutability
If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same, except that B is substituted for A in one of them:

(A ~ B) ⇒ [p, A; 1 - p, C] ~ [p, B; 1 - p, C]

65
Lottery Constraints: Monotonicity
Suppose two lotteries have the same two possible
outcomes, A and B. If an agent prefers A to B, then
the agent must prefer the lottery that has a higher
probability for A:

(A ≻ B) ⇒ (p > q ⇔ [p, A; 1 - p, B] ≻ [q, A; 1 - q, B])

66
Lottery Constraints: Decomposability
Compound lotteries can be reduced to smaller ones
using the laws of probability:

[p, A; 1- p, [q, B; 1- q, C]] ~ [p, A; (1- p)*q, B; (1- p)*(1- q), C]

67
Preferences and Utility Function
An agent whose preferences between lotteries follow the set of axioms (of utility theory) below:
 Orderability
 Transitivity
 Continuity
 Substitutability
 Monotonicity
 Decomposability
can be described as possessing a utility function, which the agent acts to maximize.
68
Preferences and Utility Function
If an agent’s preferences obey the axioms of utility theory, then there exists a function U such that:

U(A) = U(B) if and only if (A ~ B)

and

U(A) > U(B) if and only if (A ≻ B)

69
Multiattribute Outcomes
Outcomes can be characterized by more than one
attribute. Decisions in such cases are handled by
Multiattribute Utility Theory.
Attributes: X = X1, ..., Xn
Assigned values: x = <x1, ..., xn>

70
Strict Dominance: Deterministic

B strictly dominates A
B is better than A
for both X1 and X2

71
Strict Dominance: Deterministic

D doesn’t strictly dominate A;
D is better than A only for X1

72
Strict Dominance: Uncertain
B strictly dominates A
B is better than A
for both X1 and X2

73
Strict Dominance: Uncertain

D doesn’t strictly dominate A;
D is better than A only for X1

74
Decision Network (Influence Diagram)
Decision networks (also called influence diagrams)
are structures / mechanisms for making rational
decisions.

Decision networks are based on Bayesian


networks, but include additional nodes that
represent actions and utilities.

75
Decision Networks
The most basic decision network needs to include:
 information about current state s
 possible actions
 resulting state s’ (after applying chosen action a)
 utility of the resulting state U(s’)

76
Decision Network Nodes
Decision networks are built using the following
nodes:
 chance nodes:
X

 decision nodes:
Y

 utility (or value) nodes


Z

77
Decision Network: Example

78
Decision Network: Example
Nodes describing
current state

79
Decision Network: Example
Evidence

80
Decision Network: Example
The Airport Site decision changes the conditional distributions of the Safety, Quietness, and Frugality nodes

81
Decision Network: Example

Parents of the
utility node

82
Decision Network: Example

Outcome nodes

83
Decision Network: Example

Nodes directly
influencing utility

84
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the utility for the action / value a
3. Return the action with highest utility

85
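A schematic Python sketch of this loop; posterior_given and utility are placeholders for whatever Bayes-net inference routine and utility table are available (they are assumptions, not a real API), as is the "decision" key:

def evaluate_decision_network(decision_values, evidence, posterior_given, utility):
    """Return the decision value with the highest expected utility.

    decision_values: possible values of the decision node
    evidence:        dict of observed chance variables (the current state)
    posterior_given: function(evidence) -> {assignment: probability} over the utility node's parents
    utility:         function(assignment) -> utility value (the utility table)
    """
    best_value, best_eu = None, float("-inf")
    for a in decision_values:                           # step 2: try each action
        extended = dict(evidence, decision=a)           # step 2a: fix the decision node
        parents_dist = posterior_given(extended)        # step 2b: posterior over the utility node's parents
        eu = sum(p * utility(assignment)                # step 2c: expected utility of action a
                 for assignment, p in parents_dist.items())
        if eu > best_eu:
            best_value, best_eu = a, eu
    return best_value                                   # step 3: action with highest utility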
Decision Network: Example

Utility table is used


to get utility

86
Decision Network: Simplified Form

Action-Utility table
is used to get
expected utility

87
(Single-Stage) Decision Networks
General Structure Simplified Structure
[Figure: two decision networks, each consisting of a Bayes Network plus a Decision Node and a Utility Node]

88
(Single-Stage) Decision Networks
General Structure Simplified Structure

Utility Table (general structure):

S  | low  low  low  low  high high high high
Q  | low  low  high high low  low  high high
F  | low  high low  high low  high low  high
U  | 10   20   5    50   70   150  100  200

Action-Utility Table (simplified structure; not all columns shown):

AT | low  low  low  ---  ---  high high high
L  | low  low  high ---  ---  low  high high
C  | low  high low  ---  ---  high low  high
AS | A    A    A    ---  ---  B    B    B
U  | 10   20   5    ---  ---  150  100  200

89
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the utility for the action / value a
3. Return the action with highest utility

90
