CS480 Lecture, October 24th
Plan for Today
Bayes (Belief) Networks
Decision Networks
Joint Probability
The joint probability of event A and event B (or of more than two events) is the probability that they occur together: the probability of the intersection of two or more events, represented by random variables.
Complex Joint Probability Distribution
Consider a complex joint probability distribution involving N random variables f1, f2, f3, ..., fN-1, fN. (Values can be other than true/false; variables need not be binary.) For N binary variables, the full table holds 2^N values.
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule. For any propositions (random variables) f1, ..., fN and their values:

P(f1, f2, ..., fN) = P(f1) · P(f2 | f1) · P(f3 | f1, f2) · ... · P(fN | f1, ..., fN-1)
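As a quick numeric check, here is a minimal Python sketch (the joint values are made up for illustration, not from the lecture) verifying the chain-rule decomposition on a three-variable distribution:

```python
from itertools import product

# A small joint distribution over three binary variables x, y, z
# (illustrative values only -- not from the lecture; they sum to 1).
joint = {
    (0, 0, 0): 0.02, (0, 0, 1): 0.08, (0, 1, 0): 0.10, (0, 1, 1): 0.20,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.12, (1, 1, 1): 0.28,
}

def marginal(**fixed):
    """Sum of joint entries consistent with a partial assignment."""
    names = ("x", "y", "z")
    return sum(p for key, p in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

# Chain rule: P(x, y, z) = P(x) * P(y | x) * P(z | x, y)
for x, y, z in product((0, 1), repeat=3):
    p_x = marginal(x=x)
    p_y_given_x = marginal(x=x, y=y) / p_x
    p_z_given_xy = joint[(x, y, z)] / marginal(x=x, y=y)
    assert abs(p_x * p_y_given_x * p_z_given_xy - joint[(x, y, z)]) < 1e-12
```

The conditionals come straight from the joint, so the product recovers every joint entry exactly.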
Independent Variable / Factoring

The full joint distribution over Toothache, Catch, and Cavity:

                toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
 cavity     0.108    0.012       0.072    0.008
¬cavity     0.016    0.064       0.144    0.576

An independent variable such as Cloudy factors out of the joint:
P(Toothache, Catch, Cavity, Cloudy) = P(Toothache, Catch, Cavity) · P(Cloudy)
Use Chain Rule To Decompose
Expansion
Conditional Independence
Random variable X is conditionally independent of random variable Y given Z if for all x ∈ Dx, for all y ∈ Dy, and for all z ∈ Dz such that
P(Y = y ∧ Z = z) > 0 and P(Y = y' ∧ Z = z) > 0:

P(X = x | Y = y ∧ Z = z) = P(X = x | Y = y' ∧ Z = z)

For example, when P(H = happy ∧ R = rich) > 0 and P(H = unhappy ∧ R = rich) > 0, X is conditionally independent of H given R if
P(X = x | H = happy ∧ R = rich) = P(X = x | H = unhappy ∧ R = rich).

If Alarm is given, what “happened before” Alarm does not directly influence MaryCalls.
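The Toothache/Catch/Cavity table from the factoring slide gives a concrete check: Catch is conditionally independent of Toothache given Cavity. A small Python sketch (the key order and helper names are this sketch's own):

```python
# Conditional independence check on the Toothache/Catch/Cavity joint table:
# is Catch conditionally independent of Toothache given Cavity?
# Keys: (toothache, catch, cavity) -> probability.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def p(pred):
    """Sum of joint entries satisfying a predicate over (t, c, cav)."""
    return sum(v for k, v in joint.items() if pred(*k))

def p_catch_given(toothache, cavity):
    """P(Catch = true | Toothache = toothache, Cavity = cavity)."""
    num = p(lambda t, c, cav: t == toothache and c and cav == cavity)
    den = p(lambda t, c, cav: t == toothache and cav == cavity)
    return num / den

# For each value of Cavity, changing Toothache leaves P(Catch | ...) unchanged:
assert abs(p_catch_given(True, True) - p_catch_given(False, True)) < 1e-9    # both 0.9
assert abs(p_catch_given(True, False) - p_catch_given(False, False)) < 1e-9  # both 0.2
```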
Conditional Independence
Causal Chain:

Burglary (B) → Alarm (A) → MaryCalls (M)

By the chain rule with conditional independence:

P(f1, f2, ..., fN) = ∏i P(fi | parents(fi))

so, for the chain: P(B, A, M) = P(B) · P(A | B) · P(M | A)
Chain Rule
Conditional probabilities can be used to decompose conjunctions using the chain rule; conditional independence lets each factor condition only on the variable's parents:

P(f1, f2, ..., fN) = ∏i P(fi | parents(fi))

Enabled by conditional independence.
Parents of Random Variable fi
The parents of a random variable fi, parents(fi), are a minimal set of predecessors of fi in the total ordering such that the other predecessors of fi are conditionally independent of fi given parents(fi).
Bayes Network: Factorization
The chain rule AND the definition of parents(fi) give us the factorization:

P(f1, f2, ..., fN) = ∏i P(fi | parents(fi))

B E P(A|B,E)
t t 0.95
t f 0.94
f t 0.29
f f 0.001
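A minimal Python sketch of this factorization for the burglary network, using the CPT values from these slides (the helper name `joint` is this sketch's own):

```python
# Burglary-network factorization sketch:
#   P(b, e, a, j, m) = P(b) * P(e) * P(a | b, e) * P(j | a) * P(m | a)
# CPT values taken from the tables on these slides.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def joint(b, e, a, j, m):
    """Joint probability of a full assignment via the factorization."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# e.g. probability that the alarm sounds and both neighbors call,
# with no burglary and no earthquake:
print(round(joint(False, False, True, True, True), 6))  # 0.000628
```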
Inference by Enumeration: Example
Query (what is the probability distribution for the conditional P(Burglary | JohnCalls = true ∧ MaryCalls = true)?):
Joint Probability: Marginalization
H: grad, e: female; the joint P(H, e) = P(H ∧ e): e.g. P(grad ∧ female) = 0.074, and all of the joint entries SUM to 1.

Marginal probability P(H): “sum of all probabilities where H is true”:
P(H) = Σe P(H, e)
Likewise, the marginal probability P(e): “sum of all probabilities where e is true”:
P(e) = ΣH P(H, e)
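Marginalization can be sketched in Python with the Toothache/Catch/Cavity joint table shown earlier (the key order is this sketch's assumption):

```python
# Marginalization sketch: P(X) is the sum of all joint entries where X is true.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}  # keys: (toothache, catch, cavity)

p_cavity = sum(v for (t, c, cav), v in joint.items() if cav)
p_toothache = sum(v for (t, c, cav), v in joint.items() if t)

print(round(p_cavity, 3), round(p_toothache, 3))  # 0.2 0.2
```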
Joint Probability: Conditionals
H: grad, e: female; the joint P(H, e) = P(H ∧ e): e.g. P(grad ∧ female) = 0.074, and all of the joint entries SUM to 1.

Conditional probability from the joint: P(H | e) = P(H ∧ e) / P(e).

In general, P(X | e) = α Σy P(X, e, y), where the ys are all possible values for the hidden variables Y and α is the normalization constant.
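The normalization-constant form can be sketched the same way; here P(Cavity | toothache) is computed as α · <P(cavity ∧ toothache), P(¬cavity ∧ toothache)> over the same Toothache/Catch/Cavity table:

```python
# Conditional probability via the normalization constant alpha.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}  # keys: (toothache, catch, cavity)

num_true = sum(v for (t, c, cav), v in joint.items() if t and cav)       # P(cavity, toothache)  = 0.12
num_false = sum(v for (t, c, cav), v in joint.items() if t and not cav)  # P(~cavity, toothache) = 0.08
alpha = 1 / (num_true + num_false)  # 1 / P(toothache)

print(round(alpha * num_true, 2), round(alpha * num_false, 2))  # 0.6 0.4
```

Note that α never has to be computed from P(e) separately: the unnormalized entries already sum to P(e).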
General Inference Procedure: Example
Query (note: full probability distribution):
P(Burglary | JohnCalls = true ∧ MaryCalls = true)

Evidence variables: JohnCalls and MaryCalls. In the network, Burglary (B) and Earthquake (E) are parents of Alarm (A); A is the parent of JohnCalls (J) and MaryCalls (M).

B E P(A|B,E)
t t 0.95
t f 0.94
f t 0.29
f f 0.001
General Inference Procedure: Example
Query (note: one specific probability):
P(Burglary = true | JohnCalls = true ∧ MaryCalls = true)

By the chain rule:
P(b, j, m, e, a) = P(b) · P(e) · P(a | b, e) · P(j | a) · P(m | a)

Summing out the hidden variables E and A:
P(b | j, m) = α Σe Σa P(b) · P(e) · P(a | b, e) · P(j | a) · P(m | a)
General Inference Procedure: Example
Query:
P(Burglary | JohnCalls = true ∧ MaryCalls = true)

We can now get the full distribution:
P(B | j, m) = α · <0.00059224, 0.0014919> ≈ <0.284, 0.716>

P(B) = 0.001, P(¬B) = 0.999    P(E) = 0.002, P(¬E) = 0.998

A P(J|A)    A P(M|A)
t 0.90      t 0.70
f 0.05      f 0.01
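The whole enumeration fits in a few lines of Python; it reproduces the unnormalized pair <0.00059224, 0.0014919> and the normalized distribution (helper names are this sketch's own):

```python
# Inference by enumeration for P(Burglary | j, m), with the CPTs as given.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def p_b_jm(b):
    """Unnormalized P(B=b, j, m): sum over the hidden variables E and A."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_E[e] * pa * P_J[a] * P_M[a]
    return P_B[b] * total

unnorm = [p_b_jm(True), p_b_jm(False)]
alpha = 1 / sum(unnorm)
print([round(x, 8) for x in unnorm])          # [0.00059224, 0.00149186]
print([round(alpha * x, 3) for x in unnorm])  # [0.284, 0.716]
```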
More On Conditional Independence
Common Cause: Alarm (A) is the common parent of JohnCalls (J) and MaryCalls (M). JohnCalls and MaryCalls are CONDITIONALLY independent given Alarm.

Common Effect: Burglary (B) and Earthquake (E) are both parents of Alarm (A). Burglary and Earthquake are NOT CONDITIONALLY independent given Alarm.
More On Conditional Independence
[Figure: a node shown with its Parents, Non-descendants, and Descendants. A node is conditionally independent of its non-descendants given its parents.]
Playing Minesweeper with Bayes’ Rule
[Figure: Minesweeper board, showing the prior probability / belief for a mine at cell X and the posterior probability / belief after new evidence.]
Naive Bayes Spam Filter
Classification query: is Email = Spam?
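Since this slide's body is not preserved here, the following is only a generic naive Bayes sketch (all priors, word likelihoods, and the unseen-word floor are made-up illustration values, not from the lecture): the class score is P(Class) · ∏ P(word | Class), computed in log space.

```python
import math

# Minimal naive Bayes spam sketch with assumed, made-up parameters.
prior = {"spam": 0.4, "ham": 0.6}
word_given = {  # P(word | class), assumed word likelihoods
    "spam": {"free": 0.30, "meeting": 0.01, "viagra": 0.20},
    "ham":  {"free": 0.02, "meeting": 0.15, "viagra": 0.001},
}

def classify(words):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    scores = {}
    for cls in prior:
        log_score = math.log(prior[cls])
        for w in words:
            # tiny floor for unseen words (a crude stand-in for smoothing)
            log_score += math.log(word_given[cls].get(w, 1e-6))
        scores[cls] = log_score
    return max(scores, key=scores.get)

print(classify(["free", "viagra"]))  # spam
print(classify(["meeting"]))         # ham
```

Log space avoids underflow when the product runs over many words.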
Agents and Belief State
The environment is only partially observable. The agent's sensors report A=4 and B=7; C cannot be observed. The agent's model of the world holds A=4, B=7, and C unknown. Assume DC = {0,1,2,3}: the environment could be in one of those states, and the agent can consult its belief state over them.
Decision Theory
Decisions: every plan (sequence of actions) leads to an outcome (state).
Agents have preferences (preferred outcomes).
Preferences → outcome utilities.
Agents have degrees of belief (probabilities) about the outcomes of actions.
Maximum Expected (Average) Utility
A rational agent chooses the action that maximizes its expected utility:
action = argmaxa EU(a | e)
Agent Decisions
Recall that agent ACTIONS change the state: if we are in state s, action a is expected to lead to another state s' (the outcome).
Expected Action Utility
The expected utility of an action a given the evidence e is the average utility value of all possible outcomes s' of action a, weighted by their probability (belief) of occurrence:

EU(a | e) = Σs' P(s' | a, e) · U(s')
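A minimal Python sketch of this formula (the outcome states, probabilities, and utilities below are made-up illustration values, not from the lecture):

```python
# Expected utility of an action: EU(a | e) = sum over s' of P(s' | a, e) * U(s').
def expected_utility(outcome_probs, utility):
    """outcome_probs: state -> P(s' | a, e); utility: state -> U(s')."""
    return sum(p * utility[s] for s, p in outcome_probs.items())

U = {"on_time": 100, "late": 20, "accident": -500}
actions = {  # assumed outcome distributions per action
    "drive_fast": {"on_time": 0.7, "late": 0.1, "accident": 0.2},
    "drive_slow": {"on_time": 0.4, "late": 0.6, "accident": 0.0},
}

eus = {a: expected_utility(p, U) for a, p in actions.items()}
best = max(eus, key=eus.get)
print(best)  # drive_slow
```

Picking `best` here is exactly the maximum-expected-utility rule from the previous slide.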
State Utility Function
Agent’s preferences (desires) are captured by
the Utility function U(s).
OK, But What is a Utility Function?
How Did We Get Here?
Let’s start with the relationships (and related notation) between an agent’s preferences:

agent prefers A over B:
A ≻ B

agent is indifferent between A and B:
A ~ B

agent prefers A over B or is indifferent between A and B (weak preference):
A ≿ B
The Concept of Lottery
Let’s assume the following:
an action a is a lottery ticket
the set of outcomes (resulting states) is a lottery

A lottery L with possible outcomes S1, ..., Sn that occur with probabilities p1, ..., pn is written as:
L = [p1, S1; p2, S2; ...; pn, Sn]
Lottery Constraints: Transitivity
Given three lotteries A, B, and C, if an agent
prefers A to B AND prefers B to C, then the agent
must prefer A to C:
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Lottery Constraints: Continuity
If some lottery B is between A and C in preference,
then there is some probability p for which the
rational agent will be indifferent between getting B
for sure or some other lottery that yields A with
probability p and C with probability 1 - p:
(A ≻ B ≻ C) ⇒ ∃p [p, A; 1−p, C] ~ B
Lottery Constraints: Substitutability
If an agent is indifferent between two lotteries A
and B, then the agent is indifferent between two
more complex lotteries that are the same, except that B is substituted for A in one of them:
(A ~ B) ⇒ [p, A; 1−p, C] ~ [p, B; 1−p, C]
Lottery Constraints: Monotonicity
Suppose two lotteries have the same two possible
outcomes, A and B. If an agent prefers A to B, then
the agent must prefer the lottery that has a higher
probability for A:
(A ≻ B) ⇒ (p > q ⇔ [p, A; 1−p, B] ≻ [q, A; 1−q, B])
Lottery Constraints: Decomposability
Compound lotteries can be reduced to smaller ones
using the laws of probability:
[p, A; 1−p, [q, B; 1−q, C]] ~ [p, A; (1−p)·q, B; (1−p)·(1−q), C]
Preferences and Utility Function
An agent whose preferences between lotteries follow the set of axioms (of utility theory) below:
Orderability
Transitivity
Continuity
Substitutability
Monotonicity
Decomposability
can be described as possessing a utility function and maximizing it.
Preferences and Utility Function
If an agent’s preferences obey the axioms of utility theory, then there exists a function U such that:
U(A) > U(B) ⇔ A ≻ B
and
U(A) = U(B) ⇔ A ~ B
Multiattribute Outcomes
Outcomes can be characterized by more than one
attribute. Decisions in such cases are handled by
Multiattribute Utility Theory.
Attributes: X = X1, ..., Xn
Assigned values: x = <x1, ..., xn>
Strict Dominance: Deterministic
B strictly dominates A: B is better than A for both X1 and X2.
Strict Dominance: Deterministic
D doesn’t strictly dominate A: D is better than A only for X1.
Strict Dominance: Uncertain
B strictly dominates A: B is better than A for both X1 and X2.
Strict Dominance: Uncertain
D doesn’t strictly dominate A: D is better than A only for X1.
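The deterministic notion of strict dominance can be sketched as a one-line check (the attribute vectors are made-up illustrations matching the B/D descriptions above):

```python
# B strictly dominates A if B is strictly better on every attribute.
def strictly_dominates(b, a):
    return all(x > y for x, y in zip(b, a))

A = (3, 5)   # (X1, X2)
B = (4, 6)   # better on both attributes
D = (9, 2)   # better only on X1

print(strictly_dominates(B, A))  # True
print(strictly_dominates(D, A))  # False
```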
Decision Network (Influence Diagram)
Decision networks (also called influence diagrams)
are structures / mechanisms for making rational
decisions.
Decision Networks
The most basic decision network needs to include:
information about the current state s
possible actions a
the resulting state s' (after applying the chosen action a)
the utility of the resulting state, U(s')
Decision Network Nodes
Decision networks are built using the following nodes:
chance nodes: X
decision nodes: Y
utility nodes: U
Decision Network: Example

[Figure: airport-siting decision network. Some chance nodes describe the current state (the evidence). The Airport Site decision node changes the conditional distributions of the Safety, Quietness, and Frugality nodes. These outcome nodes are the parents of the utility node and directly influence the utility.]
Decision Network: Evaluation
The algorithm for decision network evaluation is as follows:
1. Set the evidence variables for the current state
2. For each possible value a of the decision node:
   a. Set the decision node to that value
   b. Calculate the posterior probabilities for the parent nodes of the utility node
   c. Calculate the utility for the action / value a
3. Return the action with the highest utility
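The steps above can be sketched as a loop; the inference and utility details are abstracted behind hypothetical helper callables (`posterior_fn`, `utility_fn` are this sketch's own names, not from the lecture):

```python
# Sketch of the decision-network evaluation algorithm (step numbers in comments).
def evaluate_decision_network(decision_values, posterior_fn, utility_fn, evidence):
    """Return the decision value with the highest expected utility.

    posterior_fn(a, evidence) -> dict mapping each joint assignment of the
    utility node's parents to its posterior probability (steps 2a-2b);
    utility_fn(assignment) -> utility of that assignment (step 2c).
    """
    best_action, best_eu = None, float("-inf")
    for a in decision_values:                                      # step 2
        posterior = posterior_fn(a, evidence)                      # steps 2a-2b
        eu = sum(p * utility_fn(s) for s, p in posterior.items())  # step 2c
        if eu > best_eu:
            best_action, best_eu = a, eu
    return best_action, best_eu                                    # step 3

# Toy usage: one chance parent of the utility node, distribution depends on a.
posteriors = {"go": {"good": 0.8, "bad": 0.2}, "stay": {"good": 0.5, "bad": 0.5}}
best = evaluate_decision_network(
    ["go", "stay"],
    lambda a, e: posteriors[a],
    {"good": 10, "bad": -10}.get,
    evidence=None)
print(best)  # ('go', 6.0)
```

Step 2b is where the real work hides: each posterior is itself a Bayes-network inference (e.g. by enumeration, as earlier in the lecture).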
Decision Network: Simplified Form
An Action-Utility table is used to get the expected utility directly.
(Single-Stage) Decision Networks

General Structure (Decision Network):
Q: low low high high low low high high
F: low high low high low high low high

Simplified Structure (Decision Network):
L: low low high --- --- low high high
C: low high low --- --- high low high