
CS 480

Introduction to Artificial Intelligence

October 24, 2023


Announcements / Reminders
 Please follow the Week 09 To Do List instructions (if you
haven't already)
 Written Assignment #03 posted
 Programming Assignment #01 due on Sunday
(10/29/23) at 11:59 PM CST

 Final Exam date:


– Thursday 11/30/2023 (last week of classes!)
 Ignore the date provided by the Registrar

2
Plan for Today
 Bayes (Belief) Networks
 Decision Networks

3
Joint Probability
The joint probability of event A and event B (or of more than two events) is the probability that they occur together: the probability of the intersection of the events (represented by random variables).

For example (a specific probability, using the dental distribution on slide 7):

P(Cavity = true ∧ Toothache = true) = 0.12

For any propositions (random variables) X and Y:

P(X ∧ Y), also written P(X, Y)

4
Complex Joint Probability Distribution
Consider a complex joint probability distribution involving N random variables f1, f2, f3, ..., f(N-1), fN (the values can be other than true/false; the variables need not be binary).

N random variables, 2^N possible worlds (models), and therefore 2^N probability values:

 f1     f2     f3     ...   f(N-1)  fN     | Joint Probability
 true   true   true   ...   true    true   | 0.0011
 true   true   true   ...   true    false  | 0.0451
 true   true   false  ...   false   true   | 0.1011
 ...    ...    ...    ...   ...     ...    | ...
 false  false  true   ...   true    false  | 0.0909
 false  false  true   ...   false   true   | 0.0651
 false  false  false  ...   false   false  | 0.2021

5
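As a concrete illustration, a minimal Python sketch that stores a full joint distribution as a dictionary with one entry per possible world; the three variables and their probabilities below are made up for illustration, not slide data:

# Full joint distribution as a dictionary keyed by value tuples.
# With N binary variables the table has 2**N rows (possible worlds).
variables = ["f1", "f2", "f3"]                       # N = 3 (illustrative)
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.05,
    (True,  False, True):  0.20, (True,  False, False): 0.05,
    (False, True,  True):  0.15, (False, True,  False): 0.10,
    (False, False, True):  0.05, (False, False, False): 0.30,
}

assert len(joint) == 2 ** len(variables)             # 2^N possible worlds
assert abs(sum(joint.values()) - 1.0) < 1e-9         # probabilities sum to 1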
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn and their values:

P(f1 ∧ f2 ∧ ... ∧ fn) = P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2) * ... * P(fn | f1 ∧ ... ∧ f(n-1))
                      = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))
6
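A quick numeric check of the two-variable chain rule, P(A ∧ B) = P(A) * P(B | A), as a Python sketch using the Cavity/Toothache numbers from the dental table on the next slide:

# P(cavity ∧ toothache), summed over Catch
p_cavity_toothache = 0.108 + 0.012                        # 0.12
# P(cavity), summed over Toothache and Catch
p_cavity = 0.108 + 0.012 + 0.072 + 0.008                  # 0.2
# P(toothache | cavity) from the definition of conditional probability
p_toothache_given_cavity = p_cavity_toothache / p_cavity  # 0.6

# Chain rule: P(cavity ∧ toothache) = P(cavity) * P(toothache | cavity)
assert abs(p_cavity * p_toothache_given_cavity - p_cavity_toothache) < 1e-12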
Independent Variable / Factoring
Toothache Toothache
Cloudy

Catch Catch Catch Catch


Cavity 0.108 0.012 0.072 0.008
Cavity 0.016 0.064 0.144 0.576
Toothache Toothache
Catch Catch
Cloudy

Catch Catch
Cavity 0.108 0.012 0.072 0.008
Cavity 0.016 0.064 0.144 0.576

It’s hard to imagine Cloudy influencing other variables, so:

This shows that Cloudy is INDEPENDENT of other


variables and factoring can be applied.
7
Factoring / Decomposition

8
Use Chain Rule To Decompose

9
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2) * ... * P(fn | f1 ∧ ... ∧ f(n-1))
10
Expansion
Expanding the chain rule step by step:

P(f1 ∧ f2 ∧ ... ∧ fn)
= P(fn | f1 ∧ ... ∧ f(n-1)) * P(f1 ∧ ... ∧ f(n-1))
= P(fn | f1 ∧ ... ∧ f(n-1)) * P(f(n-1) | f1 ∧ ... ∧ f(n-2)) * P(f1 ∧ ... ∧ f(n-2))
= ...
= P(fn | f1 ∧ ... ∧ f(n-1)) * ... * P(f2 | f1) * P(f1)

11
Conditional Independence
Random variable X is conditionally independent of random variable Y given Z if for all x ∈ Dx, for all y ∈ Dy, and for all z ∈ Dz such that
P(Y = y ∧ Z = z) > 0 and P(Y = y’ ∧ Z = z) > 0:

P(X = x | Y = y ∧ Z = z) = P(X = x | Y = y’ ∧ Z = z)

In other words, given a value of Z, knowing Y’s value DOES NOT affect your belief in the value of X.
12
Conditional Independence
Consider three random variables: P(owerful), H(appy), R(ich)
with domains:

DP = {powerful, powerless}, DH = {happy, unhappy}, DR = {rich, poor}

Now, when:
P(H = happy ∧ R = rich) > 0 and P(H = unhappy ∧ R = rich) > 0
and:

P(P = powerful | H = happy ∧ R = rich) = P(P = powerful | H = unhappy ∧ R = rich)

then P is conditionally independent of H given R. In other words, given a value of R, knowing H’s value DOES NOT affect your belief in the value of P.
“Being un/happy does not make you less powerful, if you are rich.”
13
Conditional Independence
Causal Chain:
Burglary (B) → Alarm (A) → MaryCalls (M)

Burglary and MaryCalls are CONDITIONALLY independent given Alarm.

If Alarm is given, what “happened before” Alarm does not directly influence MaryCalls.

14
Conditional Independence
Causal Chain:
Burglary (B) → Alarm (A) → MaryCalls (M)

By the chain rule, the conditional probabilities give the joint probability:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i P(fi | parents(fi))

Burglary and MaryCalls are CONDITIONALLY independent given Alarm.

If Alarm is given, what “happened before” Alarm does not directly influence MaryCalls.

15
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))

However, it can be rewritten as:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | parents(fi))

because with conditional independence(s) considered:

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))
16
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, f2, ..., fn:

P(f1 ∧ f2 ∧ ... ∧ fn) = ∏ i=1..n P(fi | f1 ∧ ... ∧ f(i-1))

                      = ∏ i=1..n P(fi | parents(fi))   ← Enabled by conditional independence

17
Parents of Random Variable fi
Parents of random variable fi, parents(fi), is a minimal set of predecessors of fi in the total ordering such that the other predecessors of fi are conditionally independent of fi given parents(fi).

A set of all predecessors of fi: {f1, ..., f(i-1)}

A set of all parents of fi: parents(fi) ⊆ {f1, ..., f(i-1)}
A set of all non-parents (predecessors NOT in parents(fi)) of fi: {f1, ..., f(i-1)} \ parents(fi)

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))

when parents(fi) are given (all their values are known).


18
Parents of Random Variable fi
Parents of random variable fi, parents(fi), is a minimal set of predecessors of fi in the total ordering such that the other predecessors of fi are conditionally independent of fi given parents(fi).

So: when parents(fi) are given, fi probabilistically depends on each of its parents, but is independent of its other predecessors. That is, parents(fi) ⊆ {f1, ..., f(i-1)} is a minimal set such that:

P(fi | f1 ∧ ... ∧ f(i-1)) = P(fi | parents(fi))
19
Bayesian (Belief) Network
A Bayesian belief network describes the joint probability
distribution for a set of variables.
A Bayesian network is an acyclic, directed graph (DAG), where the nodes are random variables (propositions). There is an edge (arc) from each element of parents(Xi) into Xi. Associated with the Bayesian network is a set of conditional probability distributions: the conditional probability of each variable given its parents (which includes the prior probabilities of variables with no parents).
Consists of:
 a graph (DAG) with nodes corresponding to random variables
 a domain for each random variable
 a set of conditional probability distributions P(Xi | parents(Xi))

20
Bayes Network: Factorization
Chain rule AND the definition of parents(Xi) give us:

P(X1 ∧ X2 ∧ ... ∧ Xn) = ∏ i=1..n P(Xi | parents(Xi))

Joint probability distribution = product of conditional probabilities after factorization of the joint probability distribution.

A Bayes network is a graph representation of this factorization of the joint probability distribution.
21
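To make the factorization concrete, a minimal Python sketch for the burglary network used on the following slides; the CPT values are the ones printed on that network figure, and True/False stand for true/false:

# CPTs of the burglary network (values from the slides).
P_B = {True: 0.001, False: 0.999}                     # P(B)
P_E = {True: 0.002, False: 0.998}                     # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def joint(b, e, a, j, m):
    # P(B=b, E=e, A=a, J=j, M=m) = P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

print(joint(True, False, True, True, True))   # P(b ∧ ¬e ∧ a ∧ j ∧ m) ≈ 0.000591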
Inference In Bayes Networks

B E P(A|B,E)
t t 0.95
t f 0.94
f t 0.29
f f 0.001

22
Inference by Enumeration: Example
Query (what is the probability distribution for the following conditional P()?):

23
Joint Probability: Marginalization
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

Probability P(H = true): “sum of all probabilities where H is true”

P(grad) = P(grad ∧ female) + P(grad ∧ ¬female) = 0.074 + 0.148 = 0.222

24
Joint Probability: Marginalization
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

Probability P(e = true): “sum of all probabilities where e is true”

P(female) = P(grad ∧ female) + P(¬grad ∧ female) = 0.074 + 0.086 = 0.160

25
Joint Probability: Conditionals
H (grad)   e (female)   P(H, e) = P(H ∧ e) = P(grad ∧ female)
true       true         0.074
true       false        0.148
false      true         0.086
false      false        0.691
                        SUM = 1

From the product rule:

P(H ∧ e) = P(H | e) * P(e)

we can derive:

P(H | e) = P(H ∧ e) / P(e)
26
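A small Python sketch of marginalization and conditioning over the grad/female joint distribution above (the four probabilities are the ones in the table):

# Joint distribution P(H, e); keys are (H = grad?, e = female?).
joint = {(True, True): 0.074, (True, False): 0.148,
         (False, True): 0.086, (False, False): 0.691}

# Marginalization: P(H = true) = sum over e of P(H = true, e)
p_grad = sum(p for (h, e), p in joint.items() if h)        # 0.074 + 0.148 = 0.222
# Marginalization: P(e = true) = sum over H of P(H, e = true)
p_female = sum(p for (h, e), p in joint.items() if e)      # 0.074 + 0.086 = 0.160

# Conditioning (from the product rule): P(H = true | e = true) = P(H ∧ e) / P(e)
p_grad_given_female = joint[(True, True)] / p_female       # 0.074 / 0.160 = 0.4625

print(p_grad, p_female, p_grad_given_female)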
General Inference Procedure
Given:
 a query involving a single variable X
 a list of evidence variables E
 a list of observed values e for E,
 a list of remaining unobserved variables Y (in our example: just Catch),
where X, E, and Y together are a COMPLETE set of variables for the domain, the probability distribution P(X | E) can be evaluated as:

P(X | E = e) = α * P(X, e) = α * Σ_y P(X, e, y)

where the y are all possible value combinations of Y, and α is a normalization constant.

P(X, e, y) is a subset of probabilities from the joint distribution.
27
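A generic sketch of this procedure in Python, assuming the joint distribution is stored as a dictionary keyed by value tuples (as in the earlier sketch); the variable names and numbers in the usage example are made up:

def enumeration_ask(X, evidence, variables, joint):
    """P(X | evidence) = alpha * sum over unobserved Y of P(X, evidence, Y)."""
    idx = {v: i for i, v in enumerate(variables)}
    dist = {}
    for x_val in (True, False):                      # assume a Boolean query variable
        total = 0.0
        for world, p in joint.items():
            if world[idx[X]] != x_val:
                continue
            if all(world[idx[v]] == val for v, val in evidence.items()):
                total += p                           # sums out the unobserved Y automatically
        dist[x_val] = total
    alpha = 1.0 / sum(dist.values())                 # normalization constant
    return {val: alpha * p for val, p in dist.items()}

# Tiny usage example with two Boolean variables (made-up numbers):
toy = {(True, True): 0.2, (True, False): 0.1, (False, True): 0.3, (False, False): 0.4}
print(enumeration_ask("X1", {"X2": True}, ["X1", "X2"], toy))   # {True: 0.4, False: 0.6}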
General Inference Procedure: Example
Query (note: full probability distribution):
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given:
 a query involving a single variable X,
 a list of evidence variables K,
 a list of observed values k for K,
 a list of remaining unobserved variables Y,
the probability distribution P(X | K) can be evaluated as:

P(X | K = k) = α * Σ_y P(X, k, y)

where the y are all possible value combinations of Y, and α is a normalization constant.

[Bayes network figure: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of JohnCalls (J) and MaryCalls (M). CPTs: P(B) = 0.001, P(E) = 0.002; P(A|B,E): tt 0.95, tf 0.94, ft 0.29, ff 0.001; P(J|A): t 0.90, f 0.05; P(M|A): t 0.70, f 0.01]
28
General Inference Procedure: Example
Query (note: full probability distribution):
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

For this network:
 the query involves the single variable X: Burglary,
 the evidence variables K: JohnCalls, MaryCalls,
 the observed values k for K: JohnCalls = true, MaryCalls = true,
 the remaining unobserved variables Y: Earthquake, Alarm,
so the probability distribution P(Burglary | j, m) can be evaluated as:

P(Burglary | j, m) = α * Σ_e Σ_a P(Burglary, j, m, e, a)

where the sums run over all possible values of Earthquake and Alarm, and α is a normalization constant.

(Same network figure and CPTs as on the previous slide.)
31
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given:
 the query variable X: Burglary (with value b = true),
 the evidence variables K: JohnCalls, MaryCalls,
 the observed values k for K: j = true, m = true,
 the remaining unobserved variables Y: Earthquake, Alarm,
the query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b, j, m, e, a)

By the chain rule (Bayes network factorization):

P(b, j, m, e, a) = P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
32
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Given the same X, K, k, and Y as before, the query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)
33
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

The query can be evaluated as:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)

and, moving terms outside the sums they do not depend on:

            = α * P(b) * Σ_e P(e) * Σ_a P(a|b,e) * P(j|a) * P(m|a)
34
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Query rewritten:

P(b | j, m) = α * Σ_e Σ_a P(b) * P(e) * P(a|b,e) * P(j|a) * P(m|a)

            = α * P(b) * Σ_e P(e) * Σ_a P(a|b,e) * P(j|a) * P(m|a)

36
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Query rewritten and expanded into the evaluation tree (the sums over e and a written out; each branch is a product of CPT entries, and branches are added):

P(b | j, m) = α * P(b) * [ P(e)  * ( P(a|b,e)  * P(j|a) * P(m|a)  +  P(¬a|b,e)  * P(j|¬a) * P(m|¬a) )
                         + P(¬e) * ( P(a|b,¬e) * P(j|a) * P(m|a)  +  P(¬a|b,¬e) * P(j|¬a) * P(m|¬a) ) ]
43
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

Substituting the CPT values into the evaluation tree:

P(b | j, m) = α * 0.001 * [ 0.002 * ( 0.95 * 0.90 * 0.70  +  0.05 * 0.05 * 0.01 )
                          + 0.998 * ( 0.94 * 0.90 * 0.70  +  (1 - 0.94) * 0.05 * 0.01 ) ]
45
General Inference Procedure: Example
Query (note: one, specific, probability):
𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

We can now calculate:

P(b | j, m) = α * 0.00059224

The complementary probability can be calculated through a similar process (not shown), and we get:

P(¬b | j, m) = α * 0.0014919

We need it, because the answer to our original query

𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

is a vector (conditional probability distribution):

𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒) = < P(b | j, m), P(¬b | j, m) >
46
General Inference Procedure: Example
Query:
𝑷(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 | 𝐽𝑜ℎ𝑛𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒 ∧ 𝑀𝑎𝑟𝑦𝐶𝑎𝑙𝑙𝑠 = 𝑡𝑟𝑢𝑒)

We can now get the full distribution:

𝑷(𝐵 | 𝑗, 𝑚) = α * < 0.00059224, 0.0014919 >

which after normalization becomes:

𝑷(𝐵 | 𝑗, 𝑚) ≈ < 0.284, 0.716 >
47
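The whole example as a short Python sketch (CPT values taken from the network figure); it reproduces the unnormalized values 0.00059224 and 0.0014919 and the normalized distribution of roughly <0.284, 0.716>:

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def unnormalized(b):
    """Sum over the unobserved variables E and A of P(b, j=true, m=true, e, a)."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
print(scores)                                  # {True: ~0.00059224, False: ~0.0014919}
alpha = 1.0 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})   # roughly {True: 0.284, False: 0.716}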
More On Conditional Independence
Common Cause:
JohnCalls (J) ← Alarm (A) → MaryCalls (M)
JohnCalls and MaryCalls are NOT independent.
JohnCalls and MaryCalls are CONDITIONALLY independent given Alarm.

Common Effect:
Burglary (B) → Alarm (A) ← Earthquake (E)
Burglary and Earthquake are independent.
Burglary and Earthquake are NOT CONDITIONALLY independent given Alarm.
48
More On Conditional Independence
[Figure: left, a node X with its parents, non-descendants, and descendants; right, a node X with its Markov blanket]

Node X is conditionally independent of its non-descendants given its parents.

Node X is conditionally independent of ALL other nodes in the network given its Markov blanket.

Why do we care?
An unconstrained joint probability distribution with N binary variables involves 2^N probabilities.
A Bayesian network with at most k parents per node involves N * 2^k probabilities (k < N).

49
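For example (illustrative numbers, not from the slide): with N = 30 binary variables, a full joint table needs 2^30, i.e. over a billion, probabilities, while a Bayesian network in which every node has at most k = 3 parents needs only about 30 * 2^3 = 240 conditional probabilities.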
Playing Minesweeper with Bayes’ Rule
[Figure: Minesweeper board for the same cell X, showing the prior probability / belief on the left and the posterior probability / belief on the right]

50
Naive Bayes Spam Filter
Naive Bayes network: Email = Spam is the parent node of Word 1, Word 2, ..., Word N.

P(Email = spam | Word1) = 0.09


P(Email = spam | Word2) = 0.01
...
P(Email = spam | WordN) = 0.03

51
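A minimal sketch of one common naive Bayes formulation in Python. Note the hedge: it works from an assumed class prior and per-class word likelihoods P(word | class) rather than the per-word values P(Email = spam | Word) listed above, and every number in it is made up for illustration:

import math

prior = {"spam": 0.4, "ham": 0.6}                          # assumed class prior
likelihood = {                                              # assumed P(word | class)
    "spam": {"free": 0.30, "meeting": 0.02, "offer": 0.25},
    "ham":  {"free": 0.03, "meeting": 0.20, "offer": 0.02},
}

def posterior(words):
    """P(class | words) ∝ P(class) * Π P(word | class), assuming word independence given the class."""
    score = {}
    for c in prior:
        log_p = math.log(prior[c])
        for w in words:
            log_p += math.log(likelihood[c].get(w, 1e-6))   # tiny floor for unseen words
        score[c] = math.exp(log_p)
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

print(posterior(["free", "offer"]))    # heavily favors "spam" with these made-up numbers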
Agents and Belief State
[Figure: agent architecture with a Sensor and an Actuator connecting the Agent to a partially observable Environment]

Sensor reading: A = 4, B = 7 (the environment is only partially observable; C is not sensed).

Agent’s model of the world: A = 4, B = 7, C = ?   (assume DC = {0, 1, 2, 3})

The environment could be in one of these states, and the agent can consult its internal representation of the world / environment to choose an action:
S1: A = 4, B = 7, C = 0 → Plan X
S2: A = 4, B = 7, C = 1 → Plan Y
S3: A = 4, B = 7, C = 2 → Plan X
S4: A = 4, B = 7, C = 3 → Plan Z

Plans are sequences of actions. The chosen plan is executed through the Actuator: ACTION(S)

52
Decision Theory
 Decisions: every plan (actions) leads to an
outcome (state)
 Agents have preferences (preferred outcomes)
 Preferences → outcome utilities
 Agents have degrees of belief (probabilities) for
actions

Decision theory = probability theory + utility theory

53
Decision Theory
 Decisions: every plan (actions) leads to an
outcome (state)
 Agents have preferences (preferred outcomes)
 Preferences → outcome utilities
 Agents have degrees of belief (probabilities) for
actions

Decision theory = probability theory + utility theory


BELIEFS DESIRES

54
Maximum Expected (Average) Utility

The environment could be in one of these states (each mapped to a plan):
A = 4, B = 7, C = 0 → Plan X
A = 4, B = 7, C = 1 → Plan Y
A = 4, B = 7, C = 2 → Plan X
A = 4, B = 7, C = 3 → Plan Z

Action M → Outcome S10: P(S10) = 0.1, U(S10) = 2
         → Outcome S15: P(S15) = 0.4, U(S15) = 3
         → Outcome S55: P(S55) = 0.5, U(S55) = 1

Action N → Outcome S20: P(S20) = 0.7, U(S20) = 8
         → Outcome S15: P(S15) = 0.2, U(S15) = 4
         → Outcome S12: P(S12) = 0.1, U(S12) = 5

Pick the action with the highest MEU (maximum expected utility).

55
Agents Decisions
Recall that agent ACTIONS change the state:
 if we are in state s
 action a is expected to
 lead to another state s’ (outcome)

Given uncertainty about the current state s and action outcome s’


we need to define the following:
 probability (belief) of being in state s: P(s)
 probability (belief) of action a leading to outcome s’: P(s’ | s, a)
Now:

EU(a) = Σ_s' P(s' | s, a) * U(s')

56
Expected Action Utility
The expected utility of an action a given the evidence is the average utility value of all possible outcomes s' of action a, weighted by their probability (belief) of occurrence:

EU(a | e) = Σ_s' P(s' | a, e) * U(s')

A rational agent should choose an action that maximizes the expected utility:

a* = argmax_a EU(a | e)
57
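A small Python sketch of this calculation, using the outcome probabilities and utilities from the Action M / Action N example two slides back:

outcomes = {
    "M": [(0.1, 2), (0.4, 3), (0.5, 1)],      # (P(s'), U(s')) for outcomes S10, S15, S55
    "N": [(0.7, 8), (0.2, 4), (0.1, 5)],      # (P(s'), U(s')) for outcomes S20, S15, S12
}

def expected_utility(action):
    # EU(a) = sum over outcomes s' of P(s') * U(s')
    return sum(p * u for p, u in outcomes[action])

eu = {a: expected_utility(a) for a in outcomes}
print(eu)                                      # {'M': 1.9, 'N': 6.9}
print(max(eu, key=eu.get))                     # MEU action: 'N'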
State Utility Function
Agent’s preferences (desires) are captured by
the Utility function U(s).

Utility function assigns a value to each state s


to express how desirable this state is to the
agent.

58
OK, But What is Utility Function?

59
How Did We Get Here?
Let’s start with relationships (and related notation)
between agent’s preferences:
 agent prefers A over B:
A ≻ B
 agent is indifferent between A and B:
A ~ B
 agent prefers A over B or is indifferent between A and B (weak preference):
A ≿ B
60
The Concept of Lottery
Let’s assume the following:
 an action a is a lottery ticket
 the set of outcomes (resulting states) is a lottery
A lottery L with possible outcomes S1, ..., Sn that
occur with probabilities p1, ..., pn is written as:

L = [p1, S1; p2, S2; ... ; pn, Sn]

Lottery outcome Si: atomic state or another lottery.


61
Lottery Constraints: Orderability
Given two lotteries A and B, a rational agent must
either prefer one or else rate them as equally
preferable:

Exactly one of (A ≻ B), (B ≻ A), or (A ~ B) holds

62
Lottery Constraints: Transitivity
Given three lotteries A, B, and C, if an agent
prefers A to B AND prefers B to C, then the agent
must prefer A to C:

(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)

63
Lottery Constraints: Continuity
If some lottery B is between A and C in preference,
then there is some probability p for which the
rational agent will be indifferent between getting B
for sure or some other lottery that yields A with
probability p and C with probability 1 - p:

(A ≻ B ≻ C) ⇒ ∃p [p, A; 1 - p, C] ~ B

64
Lottery Constraints: Substitutability
If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same, except that B is substituted for A in one of them:

(A ~ B) ⇒ [p, A; 1 - p, C] ~ [p, B; 1 - p, C]

65
Lottery Constraints: Monotonicity
Suppose two lotteries have the same two possible
outcomes, A and B. If an agent prefers A to B, then
the agent must prefer the lottery that has a higher
probability for A:

(A ≻ B) ⇒ (p > q ⇔ [p, A; 1 - p, B] ≻ [q, A; 1 - q, B])

66
Lottery Constraints: Decomposability
Compound lotteries can be reduced to smaller ones
using the laws of probability:

[p, A; 1- p, [q, B; 1- q, C]] ~ [p, A; (1- p)*q, B; (1- p)*(1- q), C]

67
Preferences and Utility Function
An agent whose preferences between lotteries follow the set of axioms (of utility theory) below:
 Orderability
 Transitivity
 Continuity
 Substitutability
 Monotonicity
 Decomposability
can be described as possessing a utility function, which the agent acts to maximize.
68
Preferences and Utility Function
If an agent’s preferences obey the axioms of utility theory, then there exists a function U such that:

U(A) = U(B) if and only if (A ~ B)

and

U(A) > U(B) if and only if (A ≻ B)

69
Multiattribute Outcomes
Outcomes can be characterized by more than one
attribute. Decisions in such cases are handled by
Multiattribute Utility Theory.
Attributes: X = X1, ..., Xn
Assigned values: x = <x1, ..., xn>

70
Strict Dominance: Deterministic

B strictly dominates A
B is better than A
for both X1 and X2

71
Strict Dominance: Deterministic

D doesn’t strictly dominate A;
D is better than A only for X1

72
Strict Dominance: Uncertain
B strictly dominates A
B is better than A
for both X1 and X2

73
Strict Dominance: Uncertain

D doesn’t strictly dominate A;
D is better than A only for X1

74
Decision Network (Influence Diagram)
Decision networks (also called influence diagrams)
are structures / mechanisms for making rational
decisions.

Decision networks are based on Bayesian


networks, but include additional nodes that
represent actions and utilities.

75
Decision Networks
The most basic decision network needs to include:
 information about current state s
 possible actions
 resulting state s’ (after applying chosen action a)
 utility of the resulting state U(s’)

76
Decision Network Nodes
Decision networks are built using the following
nodes:
 chance nodes:
X

 decision nodes:
Y

 utility (or value) nodes


Z

77
Decision Network: Example

78
Decision Network: Example
Nodes describing
current state

79
Decision Network: Example
Evidence

80
Decision Network: Example
The Airport Site decision changes the conditional distributions of the Safety, Quietness, and Frugality nodes

81
Decision Network: Example

Parents of the
utility node

82
Decision Network: Example

Outcome nodes

83
Decision Network: Example

Nodes directly
influencing utility

84
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the utility for the action / value a
3. Return the action with highest utility

85
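A schematic Python sketch of this loop; posterior_given and utility are placeholders for whatever Bayes-net inference routine and utility table are available (they are assumptions, not a real API), as is the "decision" key:

def evaluate_decision_network(decision_values, evidence, posterior_given, utility):
    """Return the decision value with the highest expected utility.

    decision_values: possible values of the decision node
    evidence:        dict of observed chance variables (the current state)
    posterior_given: function(evidence) -> {assignment: probability} over the utility node's parents
    utility:         function(assignment) -> utility value (the utility table)
    """
    best_value, best_eu = None, float("-inf")
    for a in decision_values:                           # step 2: try each action
        extended = dict(evidence, decision=a)           # step 2a: fix the decision node
        parents_dist = posterior_given(extended)        # step 2b: posterior over the utility node's parents
        eu = sum(p * utility(assignment)                # step 2c: expected utility of action a
                 for assignment, p in parents_dist.items())
        if eu > best_eu:
            best_value, best_eu = a, eu
    return best_value                                   # step 3: action with highest utility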
Decision Network: Example

Utility table is used


to get utility

86
Decision Network: Simplified Form

Action-Utility table
is used to get
expected utility

87
(Single-Stage) Decision Networks
General Structure Simplified Structure
[Figure: two decision networks, each consisting of a Bayes Network plus a Decision Node and a Utility Node]

88
(Single-Stage) Decision Networks
General Structure Simplified Structure

Utility Table (general structure):

S  | low  low  low  low  high high high high
Q  | low  low  high high low  low  high high
F  | low  high low  high low  high low  high
U  | 10   20   5    50   70   150  100  200

Action-Utility Table (simplified structure; not all columns shown):

AT | low  low  low  ---  ---  high high high
L  | low  low  high ---  ---  low  high high
C  | low  high low  ---  ---  high low  high
AS | A    A    A    ---  ---  B    B    B
U  | 10   20   5    ---  ---  150  100  200

89
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the utility for the action / value a
3. Return the action with highest utility

90
