
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

**Inference in Bayesian networks**

[Figure: burglary network with nodes Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

Rewrite full joint entries using product of CPT entries:

P(B|j, m) = α Σ_e Σ_a P(B) P(e) P(a|B, e) P(j|a) P(m|a)
          = α P(B) Σ_e P(e) Σ_a P(a|B, e) P(j|a) P(m|a)

Recursive depth-first enumeration: O(n) space, O(d^n) time

Simple query on the burglary network:

P(B|j, m) = P(B, j, m)/P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)

Chapter 14.4–5


Outline

♦ Exact inference by enumeration
♦ Exact inference by variable elimination
♦ Approximate inference by stochastic simulation
♦ Approximate inference by Markov chain Monte Carlo

Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    extend e with value xi for X
    Q(xi) ← Enumerate-All(Vars[bn], e)
  return Normalize(Q(X))

function Enumerate-All(vars, e) returns a real number
  if Empty?(vars) then return 1.0
  Y ← First(vars)
  if Y has value y in e
    then return P(y | parents(Y)) × Enumerate-All(Rest(vars), e)
    else return Σ_y P(y | parents(Y)) × Enumerate-All(Rest(vars), e_y)
         where e_y is e extended with Y = y
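The pseudocode above can be sketched directly in Python on the burglary network. The CPT values are the ones from the slides; the dict-based data structures and function names are illustrative assumptions, not from any library.

```python
# Burglary network: B, E are roots; A has parents B, E; J and M have parent A.
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
cpt = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    'J': {(True,): 0.90, (False,): 0.05},
    'M': {(True,): 0.70, (False,): 0.01},
}
ORDER = ['B', 'E', 'A', 'J', 'M']  # topological order

def prob(var, value, event):
    """P(var = value | parents(var)), read from the CPT."""
    p_true = cpt[var][tuple(event[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, event):
    """Recursive depth-first enumeration over the remaining variables."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in event:  # Y has a value in the (extended) evidence
        return prob(Y, event[Y], event) * enumerate_all(rest, event)
    return sum(prob(Y, y, event) * enumerate_all(rest, {**event, Y: y})
               for y in (True, False))

def enumeration_ask(X, evidence):
    q = {x: enumerate_all(ORDER, {**evidence, X: x}) for x in (True, False)}
    total = sum(q.values())  # normalization constant 1/α
    return {x: p / total for x, p in q.items()}

posterior = enumeration_ask('B', {'J': True, 'M': True})
print(round(posterior[True], 4))  # → 0.2842, the textbook P(b | j, m)
```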


Inference tasks

Simple queries: compute posterior marginal P(Xi|E = e)
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries: P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e)

Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?

Evaluation tree

[Figure: depth-first evaluation tree for P(b | j, m), branching on e and then a; leaves carry the CPT entries P(b) = .001, P(e) = .002, P(¬e) = .998, P(a|b, e) = .95, P(¬a|b, e) = .05, P(a|b, ¬e) = .94, P(¬a|b, ¬e) = .06, P(j|a) = .90, P(j|¬a) = .05, P(m|a) = .70, P(m|¬a) = .01]

**Enumeration is inefficient: repeated computation, e.g., computes P(j|a)P(m|a) for each value of e**


**Inference by variable elimination**

Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation

P(B|j, m)
  = α P(B) Σ_e P(e) Σ_a P(a|B, e) P(j|a) P(m|a)
  = α P(B) Σ_e P(e) Σ_a P(a|B, e) P(j|a) f_M(a)
  = α P(B) Σ_e P(e) Σ_a P(a|B, e) f_J(a) f_M(a)
  = α P(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
  = α P(B) Σ_e P(e) f_ĀJM(b, e)    (sum out A)
  = α P(B) f_ĒĀJM(b)               (sum out E)
  = α f_B(b) × f_ĒĀJM(b)

Irrelevant variables

Consider the query P(JohnCalls | Burglary = true):

P(J|b) = α P(b) Σ_e P(e) Σ_a P(a|b, e) P(J|a) Σ_m P(m|a)

Sum over m is identically 1; M is irrelevant to the query

Thm 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

Here, X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant.
(Compare this to backward chaining from the query in Horn clause KBs.)


**Variable elimination: Basic operations**

Summing out a variable from a product of factors:
– move any constant factors outside the summation
– add up submatrices in pointwise product of remaining factors

**Irrelevant variables contd.**

Defn: moral graph of Bayes net: marry all parents and drop arrows

Defn: A is m-separated from B by C iff separated by C in the moral graph

Thm 2: Y is irrelevant if m-separated from X by E


Σ_x f1 × · · · × f_k = f1 × · · · × f_i × Σ_x (f_{i+1} × · · · × f_k) = f1 × · · · × f_i × f_X̄

assuming f1, . . . , f_i do not depend on X

Pointwise product of factors f1 and f2:

f1(x1, . . . , xj, y1, . . . , yk) × f2(y1, . . . , yk, z1, . . . , zl) = f(x1, . . . , xj, y1, . . . , yk, z1, . . . , zl)

E.g., f1(a, b) × f2(b, c) = f(a, b, c)
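The two basic operations can be sketched in a few lines of Python. The dict-based factor representation ('vars' lists the factor's variables, 'table' maps assignments to numbers) is an illustrative assumption, not from any library.

```python
from itertools import product

def make_factor(variables, table):
    return {'vars': list(variables), 'table': dict(table)}

def pointwise_product(f1, f2):
    """f1(x.., y..) × f2(y.., z..) = f(x.., y.., z..), joining on shared vars."""
    vs = f1['vars'] + [v for v in f2['vars'] if v not in f1['vars']]
    table = {}
    for vals in product([True, False], repeat=len(vs)):
        row = dict(zip(vs, vals))
        table[vals] = (f1['table'][tuple(row[v] for v in f1['vars'])]
                       * f2['table'][tuple(row[v] for v in f2['vars'])])
    return make_factor(vs, table)

def sum_out(var, f):
    """Add up the entries of f that differ only in var's value."""
    vs = [v for v in f['vars'] if v != var]
    table = {}
    for vals, p in f['table'].items():
        row = dict(zip(f['vars'], vals))
        key = tuple(row[v] for v in vs)
        table[key] = table.get(key, 0.0) + p
    return make_factor(vs, table)

# f1(a, b) × f2(b, c) = f(a, b, c); then sum out B to get a factor over A, C
f1 = make_factor(['A', 'B'], {(True, True): .3, (True, False): .7,
                              (False, True): .9, (False, False): .1})
f2 = make_factor(['B', 'C'], {(True, True): .2, (True, False): .8,
                              (False, True): .6, (False, False): .4})
g = sum_out('B', pointwise_product(f1, f2))
print(g['vars'], round(g['table'][(True, True)], 2))  # → ['A', 'C'] 0.48
```

Here g(A=T, C=T) = .3 × .2 + .7 × .6 = 0.48, i.e., the two B-rows of the product added pointwise.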

For P (JohnCalls|Alarm = true), both Burglary and Earthquake are irrelevant


**Variable elimination algorithm**

function Elimination-Ask(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X1, . . . , Xn)
  factors ← [ ]; vars ← Reverse(Vars[bn])
  for each var in vars do
    factors ← [Make-Factor(var, e) | factors]
    if var is a hidden variable then factors ← Sum-Out(var, factors)
  return Normalize(Pointwise-Product(factors))
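Putting the basic operations together, Elimination-Ask can be sketched as follows for the burglary network. The tuple-based factor representation and all names are illustrative assumptions; a factor is a pair (scope, table).

```python
from itertools import product as assignments

parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
cpt = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    'J': {(True,): 0.90, (False,): 0.05},
    'M': {(True,): 0.70, (False,): 0.01},
}
ORDER = ['B', 'E', 'A', 'J', 'M']

def make_factor(var, e):
    """CPT of var as a factor, with evidence variables fixed and dropped."""
    scope = [v for v in [var] + parents[var] if v not in e]
    table = {}
    for vals in assignments([True, False], repeat=len(scope)):
        row = {**e, **dict(zip(scope, vals))}
        p = cpt[var][tuple(row[q] for q in parents[var])]
        table[vals] = p if row[var] else 1.0 - p
    return scope, table

def multiply(f1, f2):
    """Pointwise product, joining on shared variables."""
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + [v for v in s2 if v not in s1]
    table = {}
    for vals in assignments([True, False], repeat=len(scope)):
        row = dict(zip(scope, vals))
        table[vals] = (t1[tuple(row[v] for v in s1)]
                       * t2[tuple(row[v] for v in s2)])
    return scope, table

def sum_out(var, factors):
    """Multiply the factors mentioning var, then sum var out of the product."""
    relevant = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope, table = relevant[0]
    for f in relevant[1:]:
        scope, table = multiply((scope, table), f)
    new_scope = [v for v in scope if v != var]
    new_table = {}
    for vals, p in table.items():
        row = dict(zip(scope, vals))
        key = tuple(row[v] for v in new_scope)
        new_table[key] = new_table.get(key, 0.0) + p
    return rest + [(new_scope, new_table)]

def elimination_ask(X, e):
    factors = []
    for var in reversed(ORDER):                # right-to-left
        factors.append(make_factor(var, e))
        if var != X and var not in e:          # hidden variable
            factors = sum_out(var, factors)
    scope, table = factors[0]
    for f in factors[1:]:
        scope, table = multiply((scope, table), f)
    total = sum(table.values())
    return {vals[0]: p / total for vals, p in table.items()}

dist = elimination_ask('B', {'J': True, 'M': True})
print(round(dist[True], 4))  # → 0.2842, matching enumeration
```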

**Complexity of exact inference**

Singly connected networks (or polytrees):
– any two nodes are connected by at most one (undirected) path
– time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
– can reduce 3SAT to exact inference ⇒ NP-hard
– equivalent to counting 3SAT models ⇒ #P-complete

[Figure: reduction from 3SAT to Bayes net inference: variables A, B, C, D each with prior 0.5, one node per clause (1, 2, 3), and an AND node conjoining the clauses]


**Inference by stochastic simulation**

Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P

Outline:
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior

Example

[Figure: sprinkler network. Cloudy is the root; Sprinkler and Rain are its children; Wet Grass is the child of Sprinkler and Rain.]

P(C) = .50

C | P(S|C)      C | P(R|C)
T | .10         T | .80
F | .50         F | .20

S R | P(W|S,R)
T T | .99
T F | .90
F T | .90
F F | .01

**Sampling from an empty network**

function Prior-Sample(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X1, . . . , Xn)
  x ← an event with n elements
  for i = 1 to n do
    xi ← a random sample from P(Xi | parents(Xi)) given the values of Parents(Xi) in x
  return x
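Prior-Sample is a few lines of Python on the sprinkler network. The CPT numbers are the ones from the slides; the data structures and names are an illustrative sketch.

```python
import random

parents = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}
cpt = {
    'C': {(): 0.50},
    'S': {(True,): 0.10, (False,): 0.50},
    'R': {(True,): 0.80, (False,): 0.20},
    'W': {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.01},
}
ORDER = ['C', 'S', 'R', 'W']  # topological order

def prior_sample():
    """Sample each variable in topological order given its sampled parents."""
    x = {}
    for var in ORDER:
        p_true = cpt[var][tuple(x[p] for p in parents[var])]
        x[var] = random.random() < p_true
    return x

# Sanity check: estimate P(Rain = true) from samples.
# The exact marginal is 0.5 × 0.8 + 0.5 × 0.2 = 0.5.
random.seed(0)
n = 100_000
count = sum(prior_sample()['R'] for _ in range(n))
print(round(count / n, 2))  # ≈ 0.5
```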

[Figures: the following example slides step through one run of Prior-Sample on the sprinkler network, sampling Cloudy, then Sprinkler, then Rain, then Wet Grass.]
**Sampling from an empty network contd.**

Probability that PriorSample generates a particular event:

S_PS(x1 . . . xn) = Π_{i=1}^n P(xi | parents(Xi)) = P(x1 . . . xn)

i.e., the true prior probability

E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

Let N_PS(x1 . . . xn) be the number of samples generated for event x1, . . . , xn. Then we have

lim_{N→∞} P̂(x1, . . . , xn) = lim_{N→∞} N_PS(x1, . . . , xn)/N = S_PS(x1, . . . , xn) = P(x1 . . . xn)

That is, estimates derived from PriorSample are consistent

Shorthand: P̂(x1, . . . , xn) ≈ P(x1 . . . xn)



Rejection sampling

P̂(X|e) estimated from samples agreeing with e

function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X|e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x ← Prior-Sample(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1 where x is the value of X in x
  return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples
27 samples have Sprinkler = true
Of these, 8 have Rain = true and 19 have Rain = false
P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure
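A rejection-sampling sketch on the sprinkler network, estimating P(Rain | Sprinkler = true). The CPTs are from the slides; the names and structure are illustrative.

```python
import random

parents = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}
cpt = {
    'C': {(): 0.50},
    'S': {(True,): 0.10, (False,): 0.50},
    'R': {(True,): 0.80, (False,): 0.20},
    'W': {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.01},
}
ORDER = ['C', 'S', 'R', 'W']

def prior_sample():
    x = {}
    for var in ORDER:
        p_true = cpt[var][tuple(x[p] for p in parents[var])]
        x[var] = random.random() < p_true
    return x

def rejection_sampling(X, evidence, n):
    counts = {True: 0, False: 0}
    for _ in range(n):
        x = prior_sample()
        if all(x[v] == val for v, val in evidence.items()):  # consistent with e
            counts[x[X]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

random.seed(0)
est = rejection_sampling('R', {'S': True}, 100_000)
print(round(est[True], 2))  # ≈ 0.30; exactly P(r, s)/P(s) = 0.09/0.30
```

Note that roughly 70% of the samples are thrown away here; with more evidence variables the accepted fraction shrinks exponentially, which is the problem analyzed on the next slide.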


**Analysis of rejection sampling**

P̂(X|e) = α N_PS(X, e)            (algorithm defn.)
        = N_PS(X, e)/N_PS(e)      (normalized by N_PS(e))
        ≈ P(X, e)/P(e)            (property of PriorSample)
        = P(X|e)                  (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates

Problem: hopelessly expensive if P(e) is small

P(e) drops off exponentially with number of evidence variables!


Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X|e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
    x, w ← Weighted-Sample(bn)
    W[x] ← W[x] + w where x is the value of X in x
  return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if Xi has a value xi in e
      then w ← w × P(Xi = xi | parents(Xi))
      else xi ← a random sample from P(Xi | parents(Xi))
  return x, w

**Likelihood weighting example**

[Figures: the example slides run Weighted-Sample on the sprinkler network with evidence Sprinkler = true, Wet Grass = true. The weight starts at w = 1.0; sampling the nonevidence variables leaves it unchanged; fixing Sprinkler = true multiplies it by P(s|c) = 0.1, and fixing Wet Grass = true multiplies it by P(w|s, r) = 0.99, giving w = 1.0 × 0.1 × 0.99 = 0.099.]
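The two functions can be sketched in Python on the sprinkler network, estimating P(Rain | Sprinkler = true, WetGrass = true). CPTs are from the slides; names and structure are illustrative.

```python
import random

parents = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}
cpt = {
    'C': {(): 0.50},
    'S': {(True,): 0.10, (False,): 0.50},
    'R': {(True,): 0.80, (False,): 0.20},
    'W': {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.01},
}
ORDER = ['C', 'S', 'R', 'W']

def weighted_sample(evidence):
    x, w = {}, 1.0
    for var in ORDER:
        p_true = cpt[var][tuple(x[p] for p in parents[var])]
        if var in evidence:
            x[var] = evidence[var]
            w *= p_true if x[var] else 1.0 - p_true  # likelihood of the evidence
        else:
            x[var] = random.random() < p_true        # sample nonevidence variable
    return x, w

def likelihood_weighting(X, evidence, n):
    W = {True: 0.0, False: 0.0}  # weighted counts over X
    for _ in range(n):
        x, w = weighted_sample(evidence)
        W[x[X]] += w
    total = sum(W.values())
    return {v: c / total for v, c in W.items()}

random.seed(0)
est = likelihood_weighting('R', {'S': True, 'W': True}, 100_000)
print(round(est[True], 2))  # ≈ 0.32; the exact posterior is 0.0891/0.2781 ≈ 0.3204
```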



**Approximate inference using MCMC**

“State” of network = current assignment to all variables. Generate the next state by sampling one variable given its Markov blanket. Sample each variable in turn, keeping evidence fixed.


function MCMC-Ask(X, e, bn, N) returns an estimate of P(X|e)
  local variables: N[X], a vector of counts over X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi | mb(Zi)) given the values of MB(Zi) in x
      N[x] ← N[x] + 1 where x is the value of X in x
  return Normalize(N[X])
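A Gibbs-sampling sketch of MCMC-Ask on the sprinkler network, estimating P(Rain | Sprinkler = true, WetGrass = true). The sampling distribution P(Zi | mb(Zi)) is computed directly from the Markov-blanket formula; all names and data structures are illustrative.

```python
import random

parents = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}
children = {'C': ['S', 'R'], 'S': ['W'], 'R': ['W'], 'W': []}
cpt = {
    'C': {(): 0.50},
    'S': {(True,): 0.10, (False,): 0.50},
    'R': {(True,): 0.80, (False,): 0.20},
    'W': {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.01},
}

def prob(var, value, state):
    p = cpt[var][tuple(state[q] for q in parents[var])]
    return p if value else 1.0 - p

def markov_blanket_sample(var, state):
    """Sample var from P(var | mb(var)) ∝ P(var | parents) × Π_children P(child | its parents)."""
    weights = {}
    for value in (True, False):
        s = {**state, var: value}
        w = prob(var, value, s)
        for child in children[var]:
            w *= prob(child, s[child], s)
        weights[value] = w
    return random.random() < weights[True] / (weights[True] + weights[False])

def mcmc_ask(X, evidence, n):
    nonevidence = [v for v in parents if v not in evidence]
    state = {**evidence, **{v: random.random() < 0.5 for v in nonevidence}}
    counts = {True: 0, False: 0}
    for _ in range(n):
        for z in nonevidence:          # sample each variable in turn
            state[z] = markov_blanket_sample(z, state)
            counts[state[X]] += 1      # count every visited state
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

random.seed(0)
est = mcmc_ask('R', {'S': True, 'W': True}, 50_000)
print(round(est[True], 2))  # ≈ 0.32, agreeing with likelihood weighting
```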


**Can also choose a variable to sample at random each time**



**The Markov chain**

With Sprinkler = true, W etGrass = true, there are four states:

[Figure: the four states of the Markov chain, one per assignment to (Cloudy, Rain) with Sprinkler and Wet Grass clamped to true, with arrows for the possible transitions between states]

**Wander about for a while, average what you see**


**Likelihood weighting analysis**

Sampling probability for WeightedSample is

S_WS(z, e) = Π_{i=1}^l P(zi | parents(Zi))

Note: pays attention to evidence in ancestors only ⇒ somewhere “in between” prior and posterior distribution

Weight for a given sample z, e is

w(z, e) = Π_{i=1}^m P(ei | parents(Ei))

Weighted sampling probability is

S_WS(z, e) w(z, e) = Π_{i=1}^l P(zi | parents(Zi)) × Π_{i=1}^m P(ei | parents(Ei)) = P(z, e)
(by standard global semantics of network)

Hence likelihood weighting returns consistent estimates, but performance still degrades with many evidence variables because a few samples have nearly all the total weight


**MCMC example contd.**

Estimate P(Rain | Sprinkler = true, WetGrass = true)


Sample Cloudy or Rain given its Markov blanket, repeat. Count number of times Rain is true and false in the samples.


E.g., visit 100 states
31 have Rain = true, 69 have Rain = false

P̂(Rain | Sprinkler = true, WetGrass = true) = Normalize(⟨31, 69⟩) = ⟨0.31, 0.69⟩

Theorem: chain approaches stationary distribution: long-run fraction of time spent in each state is exactly proportional to its posterior probability


**Markov blanket sampling**

Markov blanket of Cloudy is Sprinkler and Rain
Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass


Probability given the Markov blanket is calculated as follows:

P(xi | mb(Xi)) = P(xi | parents(Xi)) Π_{Zj ∈ Children(Xi)} P(zj | parents(Zj))

Easily implemented in message-passing parallel systems, brains

Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if Markov blanket is large: P(Xi | mb(Xi)) won't change much (law of large numbers)


Summary

Exact inference by variable elimination:
– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology

Approximate inference by LW, MCMC:
– LW does poorly when there is lots of (downstream) evidence
– LW, MCMC generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous variables

