Bayes-Nets c19

CMSC 471
Fall 2002
Class #19 Monday, November 4
Today's class
(Probability theory)
Bayesian inference
From the joint distribution
Using independence/factoring
From sources of evidence
Bayesian networks
Network structure
Conditional probability tables
Conditional independence
Inference in Bayesian networks
Bayesian Reasoning /
Bayesian Networks
Chapters 14, 15.1-15.2
Why probabilities anyway?
Kolmogorov showed that three simple axioms lead to the rules of probability theory
De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
1. All probabilities are between 0 and 1:
0 <= P(a) <= 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:
P(true) = 1 ; P(false) = 0
3. The probability of a disjunction is given by:
P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
Inference from the joint: Example

                     alarm                       ¬alarm
             earthquake  ¬earthquake     earthquake  ¬earthquake
burglary       .001        .008            .0001       .0009
¬burglary      .01         .09             .001        .79

P(Burglary | alarm) = α P(Burglary, alarm)
= α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
= α [ (.001, .01) + (.008, .09) ]
= α [ (.009, .1) ]
Since P(burglary | alarm) + P(¬burglary | alarm) = 1, α = 1/(.009+.1) = 9.174
(i.e., P(alarm) = 1/α = .109; quizlet: how can you verify this?)
P(burglary | alarm) = .009 * 9.174 = .0826
P(¬burglary | alarm) = .1 * 9.174 = .9174
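The normalization step above can be checked numerically. The following is a minimal sketch, not part of the original slides: it copies the four alarm entries of the joint table into a dictionary (the layout and the function name are assumptions made for illustration), sums out earthquake, and rescales so the two posteriors add to one.

```python
# Joint entries from the table, keyed by (burglary, alarm, earthquake).
# Only the alarm=True entries are needed for this query.
joint = {
    (True, True, True): 0.001,
    (True, True, False): 0.008,
    (False, True, True): 0.01,
    (False, True, False): 0.09,
}


def posterior_burglary_given_alarm(table):
    """Return ({burglary value: P(burglary | alarm)}, normalizing constant)."""
    # Sum out earthquake for each value of burglary.
    unnormalized = {
        b: sum(p for (bb, alarm, _eq), p in table.items() if bb == b and alarm)
        for b in (True, False)
    }
    # The normalizing constant is 1 / P(alarm).
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}, alpha


posterior, alpha = posterior_burglary_given_alarm(joint)
print(posterior, alpha)
```

The sum of the unnormalized values is P(alarm), which is one way to answer the quizlet question.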
Independence
When two sets of propositions do not affect each other's probabilities, we call them independent, and can easily compute their joint and conditional probability:
Independent(A, B) ⇔ P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
Then again, it might not: Burglars might be more likely to burglarize houses when there's a new moon (and hence little light)
But if we know the light level, the moon phase doesn't affect whether we are burglarized
Once we're burglarized, light level doesn't affect whether the alarm goes off
We need a more complex notion of independence, and methods for reasoning about these kinds of relationships
Conditional independence
Absolute independence:
A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
A and B are conditionally independent given C if
P(A ∧ B | C) = P(A | C) P(B | C)
This lets us decompose the joint distribution:
P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
Moon-Phase and Burglary are conditionally independent given Light-Level
Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution
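To make the decomposition concrete, here is a small sketch that builds a joint distribution from P(A | C), P(B | C), and P(C) and then checks the conditional independence property on it. All numbers are hypothetical and chosen only for illustration (A: new moon, B: burglary, C: low light level); none of them come from the slides.

```python
from itertools import product

# Hypothetical parameters, for illustration only.
p_c = 0.3                                  # P(C = true)
p_a_given_c = {True: 0.9, False: 0.2}      # P(A = true | C)
p_b_given_c = {True: 0.05, False: 0.01}    # P(B = true | C)


def bern(p_true, value):
    """Probability of a binary value given P(value = true)."""
    return p_true if value else 1.0 - p_true


# Decomposed joint: P(A, B, C) = P(A | C) P(B | C) P(C)
joint = {
    (a, b, c): bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b) * bern(p_c, c)
    for a, b, c in product((True, False), repeat=3)
}


def prob(pred):
    """Sum the joint over all worlds (a, b, c) that satisfy pred."""
    return sum(p for world, p in joint.items() if pred(*world))


p_c_true = prob(lambda a, b, c: c)
p_ab_given_c = prob(lambda a, b, c: a and b and c) / p_c_true
p_a_c = prob(lambda a, b, c: a and c) / p_c_true
p_b_c = prob(lambda a, b, c: b and c) / p_c_true

# Conditional independence holds given C ...
print(p_ab_given_c, p_a_c * p_b_c)
# ... but with these numbers A and B are not absolutely independent.
p_ab = prob(lambda a, b, c: a and b)
p_a = prob(lambda a, b, c: a)
p_b = prob(lambda a, b, c: b)
print(p_ab, p_a * p_b)
```

The decomposed form needs five numbers here instead of the seven required for an unrestricted joint over three binary variables.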
Bayes' rule
Bayes' rule is derived from the product rule:
P(Y | X) = P(X | Y) P(Y) / P(X)
Often useful for diagnosis:
If X are (observed) effects and Y are (hidden) causes,
We may have a model for how causes lead to effects (P(X | Y))
We may also have prior beliefs (based on experience) about the frequency of occurrence of causes (P(Y))
Which allows us to reason abductively from effects to causes (P(Y | X)).
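A short sketch of this diagnostic use of Bayes' rule for a single binary cause Y and a single observed effect X. The function name and the numbers are hypothetical; P(X) is obtained by summing over the two values of Y.

```python
def bayes_posterior(prior_y, p_x_given_y, p_x_given_not_y):
    """Return P(Y | X) for a binary cause Y and an observed effect X."""
    # P(X) = P(X | Y) P(Y) + P(X | not Y) P(not Y)
    p_x = p_x_given_y * prior_y + p_x_given_not_y * (1.0 - prior_y)
    # Bayes' rule: P(Y | X) = P(X | Y) P(Y) / P(X)
    return p_x_given_y * prior_y / p_x


# Hypothetical numbers: a rare cause (1%) with a fairly reliable effect.
posterior = bayes_posterior(prior_y=0.01, p_x_given_y=0.9, p_x_given_not_y=0.05)
print(posterior)
```

With these assumed numbers the posterior is about 0.15: observing the effect raises belief in the cause well above its 1% prior, but the cause remains unlikely because the effect also occurs without it.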
Bayesian inference
In the setting of diagnostic/evidential reasoning
[Figure: hypothesis Hi, with prior P(Hi), linked to evidence/manifestations E1, ..., Ej, ..., Em by the conditional probabilities P(Ej | Hi)]
Know the prior probability of the hypothesis, P(Hi), and the conditional probability, P(Ej | Hi)
Want to compute the posterior probability, P(Hi | Ej)
Bayes' theorem (formula 1):
P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
Simple Bayesian diagnostic reasoning

Knowledge base:
Evidence / manifestations: E1, ..., Em
Hypotheses / disorders: H1, ..., Hn
Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
Conditional probabilities: P(Ej | Hi), i = 1, ..., n; j = 1, ..., m
Cases (evidence for a particular instance): E1, ..., El

Goal: Find the hypothesis Hi with the highest posterior
max_i P(Hi | E1, ..., El)
Bayesian diagnostic reasoning II

Bayes' rule says that
P(Hi | E1, ..., El) = P(E1, ..., El | Hi) P(Hi) / P(E1, ..., El)
Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
P(E1, ..., El | Hi) = ∏_{j=1}^{l} P(Ej | Hi)
If we only care about relative probabilities for the Hi, then we have:
P(Hi | E1, ..., El) = α P(Hi) ∏_{j=1}^{l} P(Ej | Hi), where α is a normalizing constant
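The ranking procedure above can be sketched as follows. The hypotheses, evidence names, and probabilities are all hypothetical; the only assumptions taken from the slides are that hypotheses are mutually exclusive and exhaustive and that the pieces of evidence are conditionally independent given each hypothesis.

```python
from math import prod


def rank_hypotheses(priors, likelihoods, observed):
    """Return normalized posteriors P(H | observed evidence).

    priors:      {hypothesis: P(H)}
    likelihoods: {hypothesis: {evidence: P(E = true | H)}}
    observed:    {evidence: True or False}
    """
    scores = {}
    for h, prior in priors.items():
        # Product of P(Ej | Hi) over the observed evidence.
        terms = [
            likelihoods[h][e] if present else 1.0 - likelihoods[h][e]
            for e, present in observed.items()
        ]
        scores[h] = prior * prod(terms)
    # Normalize so the posteriors sum to one.
    alpha = 1.0 / sum(scores.values())
    return {h: alpha * s for h, s in scores.items()}


# Hypothetical knowledge base, for illustration only.
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihoods = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "cold": {"fever": 0.2, "cough": 0.7},
    "healthy": {"fever": 0.01, "cough": 0.05},
}
posteriors = rank_hypotheses(priors, likelihoods, {"fever": True, "cough": True})
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

If only the best hypothesis is needed, the normalization can be skipped, since it does not change the ranking.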
Limitations of simple Bayesian inference
Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
Disease D causes syndrome S, which causes correlated manifestations M1 and M2
Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
P(H1 ∧ H2 | E1, ..., El) = α P(E1, ..., El | H1 ∧ H2) P(H1 ∧ H2)
= α P(E1, ..., El | H1 ∧ H2) P(H1) P(H2)
= α ∏_{j=1}^{l} P(Ej | H1 ∧ H2) P(H1) P(H2)
How do we compute P(Ej | H1 ∧ H2)?
Limitations of simple Bayesian inference II
Assume H1 and H2 are independent, given E1, ..., El?
P(H1 ∧ H2 | E1, ..., El) = P(H1 | E1, ..., El) P(H2 | E1, ..., El)
This is a very unreasonable assumption
Earthquake and Burglar are independent, but not given Alarm:
P(burglar | alarm, earthquake) << P(burglar | alarm)
Another limitation is that simple application of Bayes' rule doesn't allow us to handle causal chaining:
A: year's weather; B: cotton production; C: next year's cotton price
A influences C indirectly: A → B → C
P(C | B, A) = P(C | B)
Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
Next time: conditional independence and Bayesian networks!
Bayesian Belief Networks (BNs)

Definition: BN = (DAG, CPD)
DAG: directed acyclic graph (BN's structure)
Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
Arcs: indicate probabilistic dependencies between nodes (lack of link signifies conditional independence)
CPD: conditional probability distribution (BN's parameters)
Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT)
P(xi | πi), where πi is the set of all parent nodes of xi
Root nodes are a special case: no parents, so just use priors in CPD:
πi = ∅, so P(xi | πi) = P(xi)
Example BN
[Figure: network with arcs a → b, a → c, b → d, c → d, c → e]
P(A) = 0.001
P(B|A) = 0.3
P(B|~A) = 0.001
P(C|A) = 0.2
P(C|~A) = 0.005
P(D|B,C) = 0.1
P(D|B,~C) = 0.01
P(D|~B,C) = 0.01
P(D|~B,~C) = 0.00001
P(E|C) = 0.4
P(E|~C) = 0.002
Note that we only specify P(A) etc., not P(~A), since they have to add to one
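One possible way to hold this network in code is a pair of dictionaries: one for the graph structure and one for the CPTs. This is only a sketch; the dictionary layout and the helper name `p_node` are assumptions made for illustration, while the numbers are the ones listed above. As the note says, only the probability of "true" is stored and the complement is computed on demand.

```python
# Structure: each node maps to the tuple of its parents.
parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}

# Parameters: each node maps parent values to P(node = True | parents).
cpt = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}


def p_node(node, value, assignment):
    """Return P(node = value | parent values taken from assignment)."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    # The "false" entry is never stored; it is 1 minus the "true" entry.
    return p_true if value else 1.0 - p_true


print(p_node("D", True, {"B": True, "C": False}))
print(p_node("A", False, {}))
```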
Topological semantics
A node is conditionally independent of its nondescendants given its parents
A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents (also known as its Markov blanket)
The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z
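The Markov blanket can be read directly off the graph structure. The sketch below assumes the network is stored as a dictionary from each node to its parents (an illustrative choice, not something the slides prescribe) and uses the structure of the example BN, as implied by its CPTs.

```python
def markov_blanket(node, parents):
    """Return the parents, children, and children's other parents of node."""
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])
    # The node itself is a parent of its own children; remove it.
    blanket.discard(node)
    return blanket


# Structure of the example network: a -> b, a -> c, b -> d, c -> d, c -> e
parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}

for node in parents:
    print(node, sorted(markov_blanket(node, parents)))
```

For instance, the blanket of C contains B even though there is no arc between them, because B and C share the child D.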
Independence and chaining

Independence assumption
P(xi | πi, q) = P(xi | πi)
where q is any set of variables (nodes) other than xi and its successors
πi blocks influence of other nodes on xi and its successors (q influences xi only through variables in πi)
With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) the local CPDs by chaining these CPDs:
P(x1, ..., xn) = ∏_{i=1}^{n} P(xi | πi)
Chaining: Example
[Figure: the example network over a, b, c, d, e]
Computing the joint probability for all variables is easy:

P(a, b, c, d, e)
= P(e | a, b, c, d) P(a, b, c, d)   (by the product rule)
= P(e | c) P(a, b, c, d)   (by indep. assumption)
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
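The chained product can be evaluated mechanically for any full assignment. The sketch below uses the CPT values from the example BN; the dictionary layout and the function name `joint` are illustrative assumptions.

```python
from itertools import product

parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
cpt = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}


def joint(assignment):
    """Return P(a, b, c, d, e) as the product of the local CPT entries."""
    p = 1.0
    for node, node_parents in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in node_parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


# All five variables true: P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
p_all_true = joint(dict.fromkeys("ABCDE", True))
print(p_all_true)

# Sanity check: the 32 joint entries should sum to 1.
total = sum(
    joint(dict(zip("ABCDE", values)))
    for values in product((True, False), repeat=5)
)
print(total)
```

The network stores 10 numbers, while an explicit joint table over five binary variables would need 31.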
Direct inference with BNs

Now suppose we just want the probability for one variable
Belief update method
Original belief (no variables are instantiated): Use prior probability P(xi)
If xi is a root, then P(xi) is given directly in the BN (CPT at xi)
Otherwise,
P(xi) = Σ_{πi} P(xi | πi) P(πi)
In this equation, P(xi | πi) is given in the CPT, but computing P(πi) is complicated
Computing πi: Example
[Figure: the example network over a, b, c, d, e]
P(d) = Σ_{b,c} P(d | b, c) P(b, c)
P(b, c) = P(a, b, c) + P(¬a, b, c)   (marginalizing)
= P(b | a, c) P(a, c) + P(b | ¬a, c) P(¬a, c)   (product rule)
= P(b | a) P(c | a) P(a) + P(b | ¬a) P(c | ¬a) P(¬a)
If some variables are instantiated, can plug that in and reduce amount of marginalization
Still have to marginalize over all values of uninstantiated parents: not computationally feasible with large networks
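The marginalization above can be carried out numerically with the CPT values from the example BN. This is a sketch under that assumption; the variable and function names are illustrative.

```python
from itertools import product

# CPT values from the example network (probability of "true").
p_a = 0.001
p_b_given_a = {True: 0.3, False: 0.001}
p_c_given_a = {True: 0.2, False: 0.005}
p_d_given_bc = {(True, True): 0.1, (True, False): 0.01,
                (False, True): 0.01, (False, False): 0.00001}


def bern(p_true, value):
    """Probability of a binary value given P(value = true)."""
    return p_true if value else 1.0 - p_true


def p_bc(b, c):
    """P(b, c): sum out a, using P(b | a) P(c | a) P(a) for each value of a."""
    return sum(
        bern(p_b_given_a[a], b) * bern(p_c_given_a[a], c) * bern(p_a, a)
        for a in (True, False)
    )


# P(d): sum over all values of the parents b and c.
p_d = sum(
    p_d_given_bc[(b, c)] * p_bc(b, c)
    for b, c in product((True, False), repeat=2)
)
print(p_bc(True, True), p_d)
```

Here the parent set has only four joint values; with k uninstantiated binary parents the outer sum has 2^k terms, which is the feasibility problem noted above.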
Representational extensions
Compactly representing CPTs
Noisy-OR
Noisy-MAX
Adding continuous variables

Discretization
Use density functions (usually mixtures of Gaussians) to build hybrid Bayesian networks (with discrete and continuous variables)
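As an illustration of compact CPTs, the standard noisy-OR model gives each cause an independent chance of producing the effect, so a node with n binary parents needs n parameters instead of 2^n table rows. The sketch below assumes that standard definition, with an optional leak term; the function name and the numbers are hypothetical.

```python
from math import prod


def noisy_or(active_probs, leak=0.0):
    """P(effect = true) under noisy-OR.

    active_probs: for each cause that is present, the probability that
                  this cause alone would produce the effect.
    leak:         probability of the effect when no listed cause is present.
    """
    # The effect is absent only if the leak and every active cause all fail.
    return 1.0 - (1.0 - leak) * prod(1.0 - p for p in active_probs)


# Hypothetical parameters for two causes of the same symptom.
print(noisy_or([0.4, 0.8]))
print(noisy_or([]))
print(noisy_or([], leak=0.01))
```

Every row of the full CPT can be generated from these per-cause parameters by passing in the parameters of the causes that are present in that row.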
Inference tasks
Simple queries: Compute posterior marginal P(Xi | E=e)
E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
Conjunctive queries:
P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
Optimal decisions: Decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
Value of information: Which evidence should we seek next?
Sensitivity analysis: Which probability values are most critical?
Explanation: Why do I need a new starter motor?
Approaches to inference
Exact inference
Enumeration
Variable elimination
Clustering / join tree algorithms
Approximate inference
Stochastic simulation / sampling methods

Markov chain Monte Carlo methods
Genetic algorithms
Neural networks
Simulated annealing
Mean field theory
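As a sketch of the simplest exact method, enumeration, the code below answers a simple query on the example BN by summing the chained joint over every assignment of the hidden variables and then normalizing. The data layout and function names are illustrative assumptions; the CPT values are those of the example BN.

```python
from itertools import product

parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
cpt = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}


def joint(assignment):
    """Product of local CPT entries for a full assignment."""
    p = 1.0
    for node, node_parents in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in node_parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(var, evidence):
    """Return P(var = True | evidence) by enumeration."""
    hidden = [n for n in parents if n != var and n not in evidence]
    totals = {}
    for value in (True, False):
        # Sum the joint over all assignments of the hidden variables.
        totals[value] = sum(
            joint({**evidence, var: value, **dict(zip(hidden, values))})
            for values in product((True, False), repeat=len(hidden))
        )
    # Normalize over the two values of the query variable.
    return totals[True] / (totals[True] + totals[False])


print(query("A", {"D": True}))
print(query("D", {}))
```

The cost grows exponentially with the number of hidden variables, which is what motivates variable elimination, join tree algorithms, and the approximate methods listed above.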
