Bayes-Nets c19
Fall 2002
Class #19 Monday, November 4
Today's class
(Probability theory)
Bayesian inference
From the joint distribution
Using independence/factoring
From sources of evidence
Bayesian networks
Network structure
Conditional probability tables
Conditional independence
Inference in Bayesian networks
Bayesian Reasoning /
Bayesian Networks
Chapters 14, 15.1-15.2
P(true) = 1 ; P(false) = 0
Joint probability distribution over alarm, burglary, and earthquake:

                 alarm                      ~alarm
           earthquake  ~earthquake    earthquake  ~earthquake
 burglary     .001        .008           .0001       .0009
~burglary     .01         .09            .001        .79
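As a sketch of inference from the joint, the table above can be queried directly by summing entries. The dictionary layout and the `prob` helper below are illustrative, and the numbers are taken from the table as printed:

```python
# Full joint distribution over (burglary, alarm, earthquake),
# with values copied from the table above.
joint = {
    (True,  True,  True):  0.001,  (True,  True,  False): 0.008,
    (True,  False, True):  0.0001, (True,  False, False): 0.0009,
    (False, True,  True):  0.01,   (False, True,  False): 0.09,
    (False, False, True):  0.001,  (False, False, False): 0.79,
}

def prob(pred):
    """Sum the joint over all assignments satisfying the predicate."""
    return sum(p for (b, a, e), p in joint.items() if pred(b, a, e))

# Marginal: P(burglary) is the sum of the burglary row.
p_burglary = prob(lambda b, a, e: b)

# Conditional: P(burglary | alarm) = P(burglary ^ alarm) / P(alarm).
p_b_given_a = prob(lambda b, a, e: b and a) / prob(lambda b, a, e: a)
```

Any query over these three variables reduces to sums of table entries in this way; the cost is that the table has 2^n entries for n variables.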
Independence
When two sets of propositions do not affect each other's
probabilities, we call them independent, and can easily
compute their joint and conditional probabilities:
Independent(A, B) iff P(A ∧ B) = P(A) P(B) and P(A | B) = P(A)
Conditional independence
Absolute independence:
A and B are independent if P(A ∧ B) = P(A) P(B); equivalently,
P(A) = P(A | B) and P(B) = P(B | A)
Bayes' rule
Bayes' rule is derived from the product rule:
P(Y | X) = P(X | Y) P(Y) / P(X)
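A minimal numeric sketch of Bayes' rule; the disease/test scenario and all three numbers below are hypothetical, chosen only for illustration:

```python
# Bayes' rule: P(Y | X) = P(X | Y) P(Y) / P(X), where P(X) comes from
# the law of total probability. Hypothetical numbers for illustration.
p_disease = 0.01                  # prior P(Y)
p_pos_given_disease = 0.9         # likelihood P(X | Y)
p_pos_given_no_disease = 0.05     # P(X | ~Y)

# P(X) = P(X | Y) P(Y) + P(X | ~Y) P(~Y)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Posterior P(Y | X)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

Note how a rare hypothesis stays fairly unlikely even after positive evidence: the posterior here is about 0.15, far below the 0.9 likelihood.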
Bayesian inference
In the setting of diagnostic/evidential reasoning:
Hypotheses H1, ..., Hn; evidence/manifestations E1, ..., Em
Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
Priors: P(Hi)
Conditional probabilities: P(Ej | Hi), i = 1, ..., n; j = 1, ..., m
Goal: compute the posteriors P(Hi | Ej)
Example BN
Nodes a, b, c, d, e with the structure implied by the CPTs below (a → b, a → c; b, c → d; c → e):
P(A) = 0.001
P(B|A) = 0.3      P(B|~A) = 0.001
P(C|A) = 0.2      P(C|~A) = 0.005
P(D|B,C) = 0.1    P(D|B,~C) = 0.01    P(D|~B,C) = 0.01    P(D|~B,~C) = 0.00001
P(E|C) = 0.4      P(E|~C) = 0.002
Note that we only specify P(A) etc., not P(~A), since the two have to sum to one
Topological semantics
A node is conditionally independent of its nondescendants given its parents
A node is conditionally independent of all other nodes in
the network given its parents, children, and children's
parents (also known as its Markov blanket)
The method called d-separation can be applied to decide
whether a set of nodes X is independent of another set Y,
given a third set Z
Chain rule for Bayesian networks: P(x1, ..., xn) = ∏ i=1..n P(xi | πi), where πi denotes the values of the parents of Xi
Chaining: Example
Applying the chain rule to the example network above:
P(a, b, c, d, e) = P(a) P(b | a) P(c | a) P(d | b, c) P(e | c)
Computing a marginal: Example
Compute P(d) in the example network:
P(d) = Σb,c P(d | b, c) P(b, c)
(conditioning d on its parents, summing over their four value combinations)
P(b, c) = P(a, b, c) + P(~a, b, c)
(marginalizing over a)
= P(b | a, c) P(a, c) + P(b | ~a, c) P(~a, c)
(product rule)
= P(b | a) P(c | a) P(a) + P(b | ~a) P(c | ~a) P(~a)
(b and c are conditionally independent given a)
If some variables are instantiated, we can plug in their values and
reduce the amount of marginalization
We still have to marginalize over all values of the uninstantiated
parents, which is not computationally feasible for large networks
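The marginalization above can be sketched as brute-force enumeration: sum the joint over every assignment of the unobserved ancestors. The CPT values are the ones given for the example network, and the structure (a → b, a → c; b, c → d) is the assumed one:

```python
from itertools import product

# P(d) by enumeration: sum P(a) P(b|a) P(c|a) P(d|b,c) over a, b, c.
P_a = 0.001
P_b = {True: 0.3, False: 0.001}                  # P(b | a)
P_c = {True: 0.2, False: 0.005}                  # P(c | a)
P_d = {(True, True): 0.1, (True, False): 0.01,   # P(d | b, c)
       (False, True): 0.01, (False, False): 0.00001}

def given(p_true, x):
    """P(X = x) from P(X = true)."""
    return p_true if x else 1.0 - p_true

p_d = sum(given(P_a, a) * given(P_b[a], b) * given(P_c[a], c) * P_d[(b, c)]
          for a, b, c in product([True, False], repeat=3))
```

With n uninstantiated ancestors this loop has 2^n terms, which is exactly the blow-up the note above warns about.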
Representational extensions
Compactly representing CPTs
Noisy-OR
Noisy-MAX
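Noisy-OR compresses a CPT to one inhibition probability per parent: each active parent independently fails to cause the child with probability q_i, so P(child | parents) = 1 - ∏ q_i over the active parents. A sketch with illustrative fever/cause parameters (the node names and q values are assumptions, not from these notes):

```python
# Noisy-OR CPT: n parameters instead of 2^n table entries.
# (No leak term here; one can be added as an always-active parent.)

def noisy_or(q, active):
    """q: inhibition probability per parent; active: which parents are true."""
    prob_all_inhibited = 1.0
    for qi, on in zip(q, active):
        if on:
            prob_all_inhibited *= qi
    return 1.0 - prob_all_inhibited

# Illustrative: fever caused by cold, flu, malaria with
# inhibition probabilities 0.6, 0.2, 0.1.
p_fever = noisy_or([0.6, 0.2, 0.1], [False, True, True])
```

With no active parent the result is 0, and each extra active parent can only raise the probability, which is the qualitative behavior noisy-OR is meant to capture.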
Inference tasks
Simple queries: Compute the posterior marginal P(Xi | E=e)
E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
Conjunctive queries:
P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
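This chaining identity can be checked numerically on a small joint distribution; the sketch below reuses the (burglary, alarm, earthquake) table values as printed earlier in these notes:

```python
# Check P(Xi, Xj | e) = P(Xi | e) P(Xj | Xi, e) on a small joint
# distribution over (burglary, alarm, earthquake).
joint = {
    (True,  True,  True):  0.001,  (True,  True,  False): 0.008,
    (True,  False, True):  0.0001, (True,  False, False): 0.0009,
    (False, True,  True):  0.01,   (False, True,  False): 0.09,
    (False, False, True):  0.001,  (False, False, False): 0.79,
}

def prob(pred):
    """Sum the joint over all assignments satisfying the predicate."""
    return sum(p for (b, a, e), p in joint.items() if pred(b, a, e))

# Conjunctive query: P(burglary, earthquake | alarm).
p_a = prob(lambda b, a, e: a)
direct = prob(lambda b, a, e: b and e and a) / p_a

# Chained: P(burglary | alarm) * P(earthquake | burglary, alarm).
chained = (prob(lambda b, a, e: b and a) / p_a
           * prob(lambda b, a, e: e and b and a)
           / prob(lambda b, a, e: b and a))
```

Both routes give the same number, so any conjunctive query reduces to a sequence of simple posterior-marginal queries.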
Approaches to inference
Exact inference
Enumeration
Variable elimination
Clustering / join tree algorithms
Approximate inference