Inference in Bayesian networks
Chapter 14.4–5

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation

Simple query on the burglary network:
P(B | j, m)
   = P(B, j, m)/P(j, m)
   = α P(B, j, m)
   = α Σe Σa P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:
P(B | j, m)
   = α Σe Σa P(B) P(e) P(a|B, e) P(j|a) P(m|a)
   = α P(B) Σe P(e) Σa P(a|B, e) P(j|a) P(m|a)

Recursive depth-first enumeration: O(n) space, O(d^n) time
Outline

♦ Exact inference by enumeration
♦ Exact inference by variable elimination
♦ Approximate inference by stochastic simulation
♦ Approximate inference by Markov chain Monte Carlo

Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, observed values for variables E
           bn, a Bayesian network with variables {X} ∪ E ∪ Y
   Q(X) ← a distribution over X, initially empty
   for each value xi of X do
       extend e with value xi for X
       Q(xi) ← Enumerate-All(Vars[bn], e)
   return Normalize(Q(X))

function Enumerate-All(vars, e) returns a real number
   if Empty?(vars) then return 1.0
   Y ← First(vars)
   if Y has value y in e
       then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
       else return Σy P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
            where e_y is e extended with Y = y
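As a concrete companion to the pseudocode, here is a minimal Python sketch of enumeration on the burglary network. The CPT values are the standard ones for that example (the ¬B rows of the Alarm CPT do not appear on these slides, so 0.29 and 0.001 are taken on the usual assumption); all identifiers (CPTS, PARENTS, ORDER, enumerate_all, enumeration_ask) are illustrative, not from any library.

# A minimal sketch of inference by enumeration on the burglary network.
# Each CPT entry maps an assignment of the parents to P(var = True | parents).
CPTS = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
ORDER = ["B", "E", "A", "J", "M"]          # topological order

def prob(var, value, event):
    """P(var = value | parents(var)) read off the CPT."""
    p_true = CPTS[var][tuple(event[p] for p in PARENTS[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, event):
    """Sum the joint over all unassigned variables, depth-first."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in event:
        return prob(Y, event[Y], event) * enumerate_all(rest, event)
    return sum(prob(Y, y, {**event, Y: y}) * enumerate_all(rest, {**event, Y: y})
               for y in (True, False))

def enumeration_ask(X, evidence):
    """Return the normalized distribution P(X | evidence)."""
    q = {x: enumerate_all(ORDER, {**evidence, X: x}) for x in (True, False)}
    norm = sum(q.values())
    return {x: v / norm for x, v in q.items()}

# Example: P(Burglary | JohnCalls = true, MaryCalls = true) ≈ ⟨0.284, 0.716⟩
print(enumeration_ask("B", {"J": True, "M": True}))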
Inference tasks

Simple queries: compute posterior marginal P(Xi | E = e)
   e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries: P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e)

Optimal decisions: decision networks include utility information;
   probabilistic inference required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?

Evaluation tree

[Figure: evaluation tree for the enumeration of P(b | j, m), branching on e and a,
with leaf products of P(b) = .001, P(e) = .002 / P(¬e) = .998,
P(a|b, e) = .95 / P(¬a|b, e) = .05 / P(a|b, ¬e) = .94 / P(¬a|b, ¬e) = .06,
P(j|a) = .90 / P(j|¬a) = .05, and P(m|a) = .70 / P(m|¬a) = .01]

Enumeration is inefficient: repeated computation
   e.g., computes P(j|a) P(m|a) for each value of e
Inference by variable elimination

Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation

P(B | j, m)
   = α P(B) Σe P(e) Σa P(a|B, e) P(j|a) P(m|a)    (one factor per variable: B, E, A, J, M)
   = α P(B) Σe P(e) Σa P(a|B, e) P(j|a) fM(a)
   = α P(B) Σe P(e) Σa P(a|B, e) fJ(a) fM(a)
   = α P(B) Σe P(e) Σa fA(a, b, e) fJ(a) fM(a)
   = α P(B) Σe P(e) fĀJM(b, e)      (sum out A)
   = α P(B) fĒĀJM(b)                (sum out E)
   = α fB(b) × fĒĀJM(b)

Irrelevant variables

Consider the query P(JohnCalls | Burglary = true)
   P(J | b) = α P(b) Σe P(e) Σa P(a|b, e) P(J|a) Σm P(m|a)

Sum over m is identically 1; M is irrelevant to the query

Thm 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

Here, X = JohnCalls, E = {Burglary}, and
Ancestors({X} ∪ E) = {Alarm, Earthquake},
so MaryCalls is irrelevant

(Compare this to backward chaining from the query in Horn clause KBs)
Variable elimination: Basic operations

Summing out a variable from a product of factors:
   move any constant factors outside the summation
   add up submatrices in pointwise product of remaining factors

   Σx f1 × · · · × fk = f1 × · · · × fi Σx fi+1 × · · · × fk = f1 × · · · × fi × f_X̄

   assuming f1, . . . , fi do not depend on X

Pointwise product of factors f1 and f2:
   f1(x1, . . . , xj, y1, . . . , yk) × f2(y1, . . . , yk, z1, . . . , zl)
      = f(x1, . . . , xj, y1, . . . , yk, z1, . . . , zl)
   E.g., f1(a, b) × f2(b, c) = f(a, b, c)

Irrelevant variables contd.

Defn: moral graph of Bayes net: marry all parents and drop arrows

Defn: A is m-separated from B by C iff separated by C in the moral graph

Thm 2: Y is irrelevant if m-separated from X by E

For P(JohnCalls | Alarm = true), both
Burglary and Earthquake are irrelevant
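To make the two operations concrete, here is a minimal Python sketch of pointwise product and summing out, assuming Boolean variables and representing a factor as a (variable list, table) pair; all names and the example numbers are illustrative.

# A minimal sketch of the two basic factor operations for Boolean variables.
# A factor is (variables, table), where table maps a tuple of values
# (one per variable, in order) to a number.
from itertools import product

def pointwise_product(f1, f2):
    """Multiply two factors, matching entries on their shared variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product((True, False), repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        v1 = t1[tuple(assign[v] for v in vars1)]
        v2 = t2[tuple(assign[v] for v in vars2)]
        table[values] = v1 * v2
    return (out_vars, table)

def sum_out(var, factor):
    """Sum a factor over all values of var, dropping it from the scope."""
    vars_, t = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for values, p in t.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

# E.g., f1(a, b) × f2(b, c) = f(a, b, c); then Σb f(a, b, c) is a factor on (a, c)
f1 = (["A", "B"], {(True, True): 0.3, (True, False): 0.7,
                   (False, True): 0.9, (False, False): 0.1})
f2 = (["B", "C"], {(True, True): 0.2, (True, False): 0.8,
                   (False, True): 0.6, (False, False): 0.4})
print(sum_out("B", pointwise_product(f1, f2)))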
Variable elimination algorithm

function Elimination-Ask(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, evidence specified as an event
           bn, a belief network specifying joint distribution P(X1, . . . , Xn)
   factors ← [ ]; vars ← Reverse(Vars[bn])
   for each var in vars do
       factors ← [Make-Factor(var, e)|factors]
       if var is a hidden variable then factors ← Sum-Out(var, factors)
   return Normalize(Pointwise-Product(factors))

Complexity of exact inference

Singly connected networks (or polytrees):
– any two nodes are connected by at most one (undirected) path
– time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
– can reduce 3SAT to exact inference ⇒ NP-hard
– equivalent to counting 3SAT models ⇒ #P-complete

[Figure: a 3-CNF formula over variables A, B, C, D encoded as a multiply connected
network: one root node per variable with prior 0.5, one node per three-literal
clause (clauses 1, 2, 3), and an AND node combining the clauses]
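A minimal sketch of the elimination loop itself, reusing CPTS/PARENTS/ORDER and prob from the enumeration sketch and pointwise_product/sum_out from the factor sketch above, so it runs only together with those. For simplicity it multiplies all accumulated factors before summing out a hidden variable rather than only the factors that mention it, which is correct but less efficient than a real Sum-Out; names are illustrative.

# A minimal sketch of Elimination-Ask on the burglary network.
from itertools import product

def make_factor(var, evidence):
    """Build the factor for var's CPT, restricted to the evidence."""
    scope = [v for v in [var] + PARENTS[var] if v not in evidence]
    table = {}
    for values in product((True, False), repeat=len(scope)):
        assign = {**evidence, **dict(zip(scope, values))}
        table[values] = prob(var, assign[var], assign)
    return (scope, table)

def elimination_ask(X, evidence):
    factors = []
    for var in reversed(ORDER):                      # right-to-left
        factors.insert(0, make_factor(var, evidence))
        if var != X and var not in evidence:         # hidden variable: sum it out
            f = factors[0]
            for g in factors[1:]:
                f = pointwise_product(f, g)
            factors = [sum_out(var, f)]
    result = factors[0]
    for g in factors[1:]:
        result = pointwise_product(result, g)
    scope, table = result
    norm = sum(table.values())
    return {values[scope.index(X)]: p / norm for values, p in table.items()}

# Same query as before: P(Burglary | JohnCalls = true, MaryCalls = true)
print(elimination_ask("B", {"J": True, "M": True}))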
Inference by stochastic simulation

Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P

Outline:
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
  whose stationary distribution is the true posterior

Example

[Figure: the Cloudy/Sprinkler/Rain/WetGrass network used in the sampling examples.
P(C) = .50; P(S|C): C = T → .10, C = F → .50; P(R|C): C = T → .80, C = F → .20;
P(W|S, R): TT → .99, TF → .90, FT → .90, FF → .01.
The Example slides that follow step through Prior-Sample generating the event
Cloudy = true, Sprinkler = false, Rain = true, WetGrass = true.]
Sampling from an empty network

function Prior-Sample(bn) returns an event sampled from bn
   inputs: bn, a belief network specifying joint distribution P(X1, . . . , Xn)
   x ← an event with n elements
   for i = 1 to n do
       xi ← a random sample from P(Xi | parents(Xi))
             given the values of Parents(Xi) in x
   return x
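A minimal Python sketch of Prior-Sample on the Cloudy/Sprinkler/Rain/WetGrass network, using the CPT values from the figure; the identifiers (CPTS, PARENTS, ORDER, prior_sample) are illustrative.

# A minimal sketch of Prior-Sample on the Cloudy network.
import random

CPTS = {
    "Cloudy":    {(): 0.50},
    "Sprinkler": {(True,): 0.10, (False,): 0.50},
    "Rain":      {(True,): 0.80, (False,): 0.20},
    "WetGrass":  {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01},
}
PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]   # topological order

def prior_sample():
    """Sample each variable in topological order from P(Xi | parents(Xi))."""
    event = {}
    for var in ORDER:
        p_true = CPTS[var][tuple(event[p] for p in PARENTS[var])]
        event[var] = random.random() < p_true
    return event

print(prior_sample())   # e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True}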
Sampling from an empty network contd.

Probability that PriorSample generates a particular event
   S_PS(x1 . . . xn) = Π_{i=1..n} P(xi | parents(Xi)) = P(x1 . . . xn)
i.e., the true prior probability

E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

Let N_PS(x1 . . . xn) be the number of samples generated for event x1, . . . , xn

Then we have
   lim_{N→∞} P̂(x1, . . . , xn) = lim_{N→∞} N_PS(x1, . . . , xn)/N
                               = S_PS(x1, . . . , xn)
                               = P(x1 . . . xn)

That is, estimates derived from PriorSample are consistent

Shorthand: P̂(x1, . . . , xn) ≈ P(x1 . . . xn)
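A quick sketch of the consistency claim in code: the fraction of Prior-Sample events equal to (t, f, t, t) should approach S_PS = 0.324. It reuses prior_sample from the sketch above; the sample count is arbitrary.

# Empirical check that Prior-Sample hits (t, f, t, t) with frequency ≈ 0.324.
N = 100_000
target = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
hits = sum(prior_sample() == target for _ in range(N))
print(hits / N)   # ≈ 0.324 for large N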
Rejection sampling

P̂(X|e) estimated from samples agreeing with e

function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X|e)
   local variables: N, a vector of counts over X, initially zero
   for j = 1 to N do
       x ← Prior-Sample(bn)
       if x is consistent with e then
           N[x] ← N[x] + 1 where x is the value of X in x
   return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples
   27 samples have Sprinkler = true
   Of these, 8 have Rain = true and 19 have Rain = false.

P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure
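A minimal sketch of rejection sampling for the query on this slide, reusing prior_sample from the sketch above; the sample count and names are illustrative.

# A minimal sketch of rejection sampling on the Cloudy network.
def rejection_sampling(X, evidence, n_samples=10000):
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[X]] += 1           # keep only samples consistent with e
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

# Roughly ⟨0.3, 0.7⟩, matching the hand count ⟨0.296, 0.704⟩ on the slide
print(rejection_sampling("Rain", {"Sprinkler": True}))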
Analysis of rejection sampling

P̂(X|e) = α N_PS(X, e)         (algorithm defn.)
        = N_PS(X, e)/N_PS(e)   (normalized by N_PS(e))
        ≈ P(X, e)/P(e)         (property of PriorSample)
        = P(X|e)               (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates

Problem: hopelessly expensive if P(e) is small

P(e) drops off exponentially with number of evidence variables!
Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X|e)
   local variables: W, a vector of weighted counts over X, initially zero
   for j = 1 to N do
       x, w ← Weighted-Sample(bn, e)
       W[x] ← W[x] + w where x is the value of X in x
   return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
   x ← an event with n elements; w ← 1
   for i = 1 to n do
       if Xi has a value xi in e
           then w ← w × P(Xi = xi | parents(Xi))
           else xi ← a random sample from P(Xi | parents(Xi))
   return x, w

Likelihood weighting example

[Figure sequence: the Cloudy/Sprinkler/Rain/WetGrass network with evidence
Sprinkler = true, WetGrass = true. Weighted-Sample starts with w = 1.0, samples
Cloudy = true, fixes Sprinkler = true (w = 1.0 × 0.1), samples Rain = true, then
fixes WetGrass = true, giving w = 1.0 × 0.1 × 0.99 = 0.099.]
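A minimal sketch of likelihood weighting on the same network, reusing CPTS/PARENTS/ORDER from the Prior-Sample sketch above; names and the sample count are illustrative.

# A minimal sketch of likelihood weighting on the Cloudy network.
import random

def weighted_sample(evidence):
    """Fix evidence variables, sample the rest, and accumulate the weight."""
    event, w = {}, 1.0
    for var in ORDER:
        p_true = CPTS[var][tuple(event[p] for p in PARENTS[var])]
        if var in evidence:
            event[var] = evidence[var]
            w *= p_true if evidence[var] else 1.0 - p_true
        else:
            event[var] = random.random() < p_true
    return event, w

def likelihood_weighting(X, evidence, n_samples=10000):
    weights = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        event, w = weighted_sample(evidence)
        weights[event[X]] += w
    total = sum(weights.values())
    return {x: v / total for x, v in weights.items()}

# P(Rain | Sprinkler = true, WetGrass = true)
print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True}))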
Approximate inference using MCMC

“State” of network = current assignment to all variables.

Generate next state by sampling one variable given its Markov blanket
Sample each variable in turn, keeping evidence fixed

function MCMC-Ask(X, e, bn, N) returns an estimate of P(X|e)
   local variables: N[X], a vector of counts over X, initially zero
                    Z, the nonevidence variables in bn
                    x, the current state of the network, initially copied from e
   initialize x with random values for the variables in Z
   for j = 1 to N do
       for each Zi in Z do
           sample the value of Zi in x from P(Zi | mb(Zi))
                given the values of MB(Zi) in x
           N[x] ← N[x] + 1 where x is the value of X in x
   return Normalize(N[X])

Can also choose a variable to sample at random each time
The Markov chain

With Sprinkler = true, WetGrass = true, there are four states:

[Figure: the four states of the chain, one copy of the Cloudy/Sprinkler/Rain/WetGrass
network per joint value of the nonevidence variables Cloudy and Rain, with transition
arrows between the states]

Wander about for a while, average what you see
Likelihood weighting analysis

Sampling probability for WeightedSample is
   S_WS(z, e) = Π_{i=1..l} P(zi | parents(Zi))
Note: pays attention to evidence in ancestors only
   ⇒ somewhere “in between” prior and posterior distribution

Weight for a given sample z, e is
   w(z, e) = Π_{i=1..m} P(ei | parents(Ei))

Weighted sampling probability is
   S_WS(z, e) w(z, e)
      = Π_{i=1..l} P(zi | parents(Zi)) Π_{i=1..m} P(ei | parents(Ei))
      = P(z, e)    (by standard global semantics of network)

Hence likelihood weighting returns consistent estimates
   but performance still degrades with many evidence variables
   because a few samples have nearly all the total weight

MCMC example contd.

Estimate P(Rain | Sprinkler = true, WetGrass = true)

Sample Cloudy or Rain given its Markov blanket, repeat.
Count number of times Rain is true and false in the samples.

E.g., visit 100 states
   31 have Rain = true, 69 have Rain = false

P̂(Rain | Sprinkler = true, WetGrass = true)
   = Normalize(⟨31, 69⟩) = ⟨0.31, 0.69⟩

Theorem: chain approaches stationary distribution:
long-run fraction of time spent in each state is exactly
proportional to its posterior probability
Markov blanket sampling

Markov blanket of Cloudy is Sprinkler and Rain
Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass

Probability given the Markov blanket is calculated as follows:
   P(x'i | mb(Xi)) = P(x'i | parents(Xi)) Π_{Zj ∈ Children(Xi)} P(zj | parents(Zj))

Easily implemented in message-passing parallel systems, brains

Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if Markov blanket is large:
   P(Xi | mb(Xi)) won’t change much (law of large numbers)
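A minimal sketch of Markov blanket (Gibbs) sampling on the same network, reusing CPTS/PARENTS/ORDER from the Prior-Sample sketch above. The conditional for each variable is computed from the formula on this slide, renormalized over its two values; names are illustrative, and counting one state per resample follows the MCMC-Ask pseudocode.

# A minimal sketch of Gibbs sampling (MCMC-Ask) on the Cloudy network.
import random

CHILDREN = {v: [c for c in ORDER if v in PARENTS[c]] for v in ORDER}

def cond_prob(var, value, state):
    """P(var = value | parents(var)) read off the CPT, given the current state."""
    p_true = CPTS[var][tuple(state[p] for p in PARENTS[var])]
    return p_true if value else 1.0 - p_true

def markov_blanket_sample(var, state):
    """Sample var from P(var | mb(var)) ∝ P(var | parents) Π_children P(child | parents)."""
    weights = {}
    for value in (True, False):
        s = {**state, var: value}
        w = cond_prob(var, value, s)
        for child in CHILDREN[var]:
            w *= cond_prob(child, s[child], s)
        weights[value] = w
    return random.random() < weights[True] / (weights[True] + weights[False])

def mcmc_ask(X, evidence, n_steps=10000):
    nonevidence = [v for v in ORDER if v not in evidence]
    state = dict(evidence)
    for v in nonevidence:                       # random initial state
        state[v] = random.choice([True, False])
    counts = {True: 0, False: 0}
    for _ in range(n_steps):
        for v in nonevidence:
            state[v] = markov_blanket_sample(v, state)
            counts[state[X]] += 1               # count a state after each resample
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

# P(Rain | Sprinkler = true, WetGrass = true): close to the ⟨0.31, 0.69⟩ estimate on the slide
print(mcmc_ask("Rain", {"Sprinkler": True, "WetGrass": True}))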
Summary
Exact inference by variable elimination:
– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology
Approximate inference by LW, MCMC:
– LW does poorly when there is lots of (downstream) evidence
– LW, MCMC generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous variables