Daniel M. Roy
University of Toronto | Vector Institute
[Figure: a diagram built up in stages: a model over the unit square maps forward, via simulation, to data, here the pair (71, 9); inference runs the arrow in reverse, from the observed (71, 9) back to the inputs that could have produced it.]
def QUERY(guesser, checker):
    # guesser: Unit -> S
    # checker: S -> Boolean
    accept = False
    while not accept:
        guess = guesser()
        accept = checker(guess)
    return guess
As a first step towards understanding QUERY, consider the trivial predicate that accepts every guess: in that case, QUERY(guesser, checker) simply returns a single sample from guesser.
Consider a slightly more interesting example:
def N():
    return uniformInt(range(1, 180))

def div235(n):
    return isDivBy(n, 2) or isDivBy(n, 3) or isDivBy(n, 5)
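A minimal runnable version of this example, reusing QUERY as defined above and substituting Python's standard random module for the uniformInt and isDivBy primitives (an assumption about their intended behavior):

import random

def N():
    return random.randint(1, 179)  # uniform on {1, ..., 179}

def div235(n):
    return n % 2 == 0 or n % 3 == 0 or n % 5 == 0

# QUERY(N, div235) samples uniformly from the multiples of 2, 3, or 5
# in {1, ..., 179}
print([QUERY(N, div235) for _ in range(5)])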
QUERY represents the higher-order operation of conditioning. When checker is deterministic, QUERY denotes the map
\[
(P, \mathbf{1}_A) \mapsto P(\,\cdot \mid A) := \frac{P(\,\cdot\, \cap A)}{P(A)}. \tag{1}
\]
EXAMPLE: INFERRING BIAS OF A COIN

Given x1, . . . , xn ∈ {0, 1}, infer the bias p of the coin.

def guesser():
    p = uniform()
    return p

def checker(p):
    return [0,0,1,0,0] == bernoulli(p, 5)
[Figure: histograms of the accepted guesses p (x-axis from 0.0 to 1.0) after roughly 2, 7, and 5000 acceptances; the histogram concentrates on the posterior distribution of p given the data 0,0,1,0,0.]
Let \(s = x_1 + \cdots + x_n\) and let U be uniformly distributed on [0, 1].
For all t ∈ [0, 1], we have P(U ≤ t) = t and

\[
P(\texttt{checker}(t, x)\ \text{is True}) = P\bigl(\forall i\ (U_i \le t \iff x_i = 1)\bigr) = t^s (1 - t)^{n - s}.
\]

[Figure: Pr(accept) as a function of t ∈ [0, 1], plotted for n = 6 and s ∈ {1, 3, 5}.]

\[
P(\texttt{checker}(U, x)\ \text{is True}) = \int_0^1 t^s (1 - t)^{n - s}\, dt = \frac{s!\,(n - s)!}{(n + 1)!} =: Z(s)
\]

Let p(t)dt be the probability that the accepted θ ∈ [t, t + dt). Then

\[
p(t)\,dt \approx t^s (1 - t)^{n - s}\,dt + \bigl(1 - Z(s)\bigr)\,p(t)\,dt,
\qquad\text{hence}\qquad
p(t)\,dt \approx \frac{t^s (1 - t)^{n - s}}{Z(s)}\,dt.
\]

The probability that the accepted X = 1 is then \(\int_0^1 t\, p(t)\, dt = \frac{s + 1}{n + 2}\).
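As a numerical sanity check, the following sketch reuses QUERY from above, with Python's random module standing in for the uniform and bernoulli primitives; for the data 0,0,1,0,0 (n = 5, s = 1), the empirical mean of the accepted guesses should approach (s + 1)/(n + 2) = 2/7 ≈ 0.286:

import random

def bernoulli(p, n):
    # n independent flips of a coin with bias p
    return [1 if random.random() <= p else 0 for _ in range(n)]

def guesser():
    return random.random()  # uniform prior on [0, 1]

def checker(p):
    return [0, 0, 1, 0, 0] == bernoulli(p, 5)

accepted = [QUERY(guesser, checker) for _ in range(5000)]
print(sum(accepted) / len(accepted))  # ≈ 2/7 ≈ 0.286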
EXAMPLE: INFERRING OBJECTS FROM AN IMAGE
def guesser():
    k = geometric()
    blocks = [ randomblock() for _ in range(k) ]
    colors = [ randomcolor() for _ in range(k) ]
    return (k, blocks, colors)

def checker(guess):
    k, blocks, colors = guess
    # compare against the observed image embedded on the slide
    return rasterize(blocks, colors) == observed_image
How many objects in this image?
[Figure: histograms of k = # blocks among accepted guesses (x-axis k = 0, . . . , 7); as more runs are accepted (peak counts roughly 4, 40, and 80), the histogram concentrates on the posterior over the number of blocks in the image.]
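The primitives randomblock, randomcolor, rasterize, and the observed image itself are left unspecified on the slide; the following one-dimensional stand-in (binary "images", intervals as "blocks", and a hypothetical observation) makes the rejection-sampling structure concrete without claiming to reproduce the original example:

import random

WIDTH = 8
observed_image = [0, 1, 1, 0, 0, 1, 0, 0]  # hypothetical observed "image"

def geometric(p=0.5):
    k = 1
    while random.random() >= p:
        k += 1
    return k

def randomblock():
    a = random.randrange(WIDTH)
    b = random.randrange(a, WIDTH)
    return (a, b)

def rasterize(blocks):
    image = [0] * WIDTH
    for a, b in blocks:
        for i in range(a, b + 1):
            image[i] = 1
    return image

def guesser():
    return [randomblock() for _ in range(geometric())]

def checker(blocks):
    return rasterize(blocks) == observed_image

# posterior over the number of blocks, as in the histograms above
ks = [len(QUERY(guesser, checker)) for _ in range(100)]
print(sum(ks) / len(ks))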
[Diagram: checker maps guessed scenes forward to images; inference, as performed by QUERY, runs in the reverse direction, from the observed image back to the scenes that render to it.]
Key point: QUERY is not a serious proposal for an algorithm, but it denotes
the operation we care about in Bayesian analysis.

Corollary
If pred(model()) is efficient and P(A) is not too small, then
QUERY(model,pred) is efficient.
MIT-CHURCH AKA TRACE-MCMC

def geometric(p):
    if bernoulli(p) == 1: return 1
    else: return 1 + geometric(p)

def aliased_geometric(p):
    g = geometric(p)
    return 1 if g < 3 else 0
[Figure: executions of aliased_geometric drawn as trees over the program's
random choices (if1, if2, . . . , with bernoulli draws B, returns RET, and
increments +1, +2). The trace returning g = 1 has probability p, the trace
returning g = 2 has probability p(1 − p), the trace returning g = 3 has
probability p(1 − p)², and so on. Trace-MCMC proposes a change to a single
random choice in the current trace, re-executes the remainder of the
program, and accepts or rejects the proposed trace with a
Metropolis–Hastings correction.]

Writing p̄ = 1 − p, the two traces on which aliased_geometric returns 1
(namely g = 1 and g = 2) have prior probabilities p and pp̄. The resulting
Metropolis–Hastings chain over these two traces satisfies

\[
\begin{pmatrix} 1 - \tfrac{1}{2}\bar{p} & \tfrac{1}{2}\bar{p} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}^{k}
\longrightarrow
\begin{pmatrix} \frac{p}{p + p\bar{p}} & \frac{p\bar{p}}{p + p\bar{p}} \\ \frac{p}{p + p\bar{p}} & \frac{p\bar{p}}{p + p\bar{p}} \end{pmatrix}
\quad \text{as } k \to \infty,
\]

i.e., the chain converges to the conditional distribution over the two
accepted traces.
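A minimal numerical check of this convergence; identifying the two accepted traces with states 1 and 2 and simulating the two-state chain directly is of course a simplification, not the general trace-MCMC algorithm:

import random

def mh_step(state, p):
    # one step of the transition matrix [[1 - p̄/2, p̄/2], [1/2, 1/2]] above
    pbar = 1 - p
    if state == 1:
        return 2 if random.random() < pbar / 2 else 1
    return 1 if random.random() < 0.5 else 2

p, pbar = 0.3, 0.7
state, visits = 1, {1: 0, 2: 0}
for _ in range(100000):
    state = mh_step(state, p)
    visits[state] += 1

print(visits[1] / 100000, p / (p + p * pbar))         # both ≈ 0.588
print(visits[2] / 100000, p * pbar / (p + p * pbar))  # both ≈ 0.412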
QUERY CAN CAPTURE A WIDE RANGE OF AI BEHAVIORS
Despite the apparent simplicity of the QUERY construct, we will see that it
captures the essential structure of a range of common-sense inferences. We
now demonstrate the power of the QUERY formalism by exploring its
behavior in a medical diagnosis example.
The remainder of the tutorial will use a medical diagnosis task as a running
example. The goal is to link observed symptoms to (unobserved) diseases.
The stochastic inference problem is:
QUERY(diseasesAndSymptoms,checkSymptoms)
All that remains is to define the two procedures that define the model.
diseasesAndSymptoms() will produce a random (possibly empty) set
of diseases and associated symptoms, modeling the distribution of diseases
and symptoms of a randomly chosen patient arriving at a clinic.
checkSymptoms(...) checks the hypothesized symptoms against the
list of observed symptoms, accepting if there is a match.
n Disease pn
1 Arthritis 0.06
2 Asthma 0.04
3 Diabetes 0.11
4 Epilepsy 0.002
5 Giardiasis 0.006
6 Influenza 0.08
7 Measles 0.001
8 Meningitis 0.003
9 MRSA 0.001
10 Salmonella 0.002
11 Tuberculosis 0.003
m Symptom ℓm
1 Fever 0.06
2 Cough 0.04
3 Hard breathing 0.001
4 Insulin resistant 0.15
5 Seizures 0.002
6 Aches 0.2
7 Sore neck 0.006
cn,m 1 2 3 4 5 6 7
1 .1 .2 .1 .2 .2 .5 .5
2 .1 .4 .8 .3 .1 .0 .1
3 .1 .2 .1 .9 .2 .3 .5
4 .4 .1 .0 .2 .9 .0 .0
5 .6 .3 .2 .1 .2 .8 .5
6 .4 .2 .0 .2 .0 .7 .4
7 .5 .2 .1 .2 .1 .6 .5
8 .8 .3 .0 .3 .1 .8 .9
9 .3 .2 .1 .2 .0 .3 .5
10 .4 .1 .0 .2 .1 .1 .2
11 .3 .2 .1 .2 .2 .3 .5
Let Dn, Cn,m, and Lm be independent Bernoulli variables with means pn, cn,m, and ℓm, respectively. Each symptom is then generated as

\[
S_m = \max\{\, D_1 C_{1,m},\ \ldots,\ D_{11} C_{11,m},\ L_m \,\},
\]

hence Sm ∈ {0, 1}. (The max operator is playing the role of a logical OR operation.)
Every term of the form Dn · Cn,m is interpreted as indicating whether (or not)
the patient has disease n and disease n has caused symptom m. The term
Lm captures the possibility that the symptom may present itself despite the
patient having none of the listed diseases.
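A runnable sketch of diseasesAndSymptoms and checkSymptoms, reusing QUERY from above and the parameter tables pn, ℓm, and cn,m; the observed symptom list here (fever alone) is a hypothetical observation chosen for illustration:

import random

p_n = [.06, .04, .11, .002, .006, .08, .001, .003, .001, .002, .003]
l_m = [.06, .04, .001, .15, .002, .2, .006]
c_nm = [[.1, .2, .1, .2, .2, .5, .5],
        [.1, .4, .8, .3, .1, .0, .1],
        [.1, .2, .1, .9, .2, .3, .5],
        [.4, .1, .0, .2, .9, .0, .0],
        [.6, .3, .2, .1, .2, .8, .5],
        [.4, .2, .0, .2, .0, .7, .4],
        [.5, .2, .1, .2, .1, .6, .5],
        [.8, .3, .0, .3, .1, .8, .9],
        [.3, .2, .1, .2, .0, .3, .5],
        [.4, .1, .0, .2, .1, .1, .2],
        [.3, .2, .1, .2, .2, .3, .5]]

def flip(q):
    return 1 if random.random() < q else 0

def diseasesAndSymptoms():
    D = [flip(q) for q in p_n]
    L = [flip(q) for q in l_m]
    C = [[flip(q) for q in row] for row in c_nm]
    # noisy-OR: max plays the role of a logical OR
    S = [max([L[m]] + [D[n] * C[n][m] for n in range(11)])
         for m in range(7)]
    return (D, S)

observed = [1, 0, 0, 0, 0, 0, 0]  # hypothetical: fever only

def checkSymptoms(hypothesis):
    D, S = hypothesis
    return S == observed

accepted = [QUERY(diseasesAndSymptoms, checkSymptoms) for _ in range(200)]
# posterior frequency of influenza (D6) given fever alone
print(sum(D[5] for D, S in accepted) / len(accepted))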
This model is a toy version of the real diagnostic model QMR-DT [Shwe
et al., 1991], built from the Quick Medical Reference (QMR) knowledge base
of hundreds of diseases and thousands of findings (such as symptoms or test
results). A key aspect of this model is the disjunctive relationship between the
diseases and the symptoms, known as a “noisy-OR”.
Shortcomings
As a model of natural patterns of diseases and symptoms in a random
patient, it leaves much to be desired:
- The model assumes that the presence or absence of any two diseases is
  independent, although, as we will see later in our analysis, diseases
  are (as expected) typically not independent conditioned on symptoms.
- In reality, diseases may cause other diseases, and symptoms may cause
  diseases.
- Still, QMR-DT, like our toy model, was a major advance over earlier expert
  systems and probabilistic models, allowing the simultaneous occurrence of
  multiple diseases [Shwe et al., 1991].
These caveats notwithstanding, a close inspection of this simplified model
will demonstrate a surprising range of common-sense reasoning phenomena.
INCORPORATING OBSERVED SYMPTOMS
\[
\mu(A \mid B) = \frac{\mu(B \mid A)\,\mu(A)}{\mu(B)}. \tag{3}
\]
\[
P(S_1 = S_7 = 1 \mid D = d) = P(S_1 = 1 \mid D = d) \cdot P(S_7 = 1 \mid D = d), \tag{5}
\]

where the equality follows from the observation that once the Dn variables
are fixed, the variables S1 and S7 are independent. To see this, recall that

\[
P(S_m = 1 \mid D = d) = 1 - (1 - \ell_m) \prod_{n=1}^{11} (1 - c_{n,m})^{d_n}. \tag{6}
\]

By Bayes' rule, the posterior odds of two hypotheses d and d′ are

\[
\frac{P(D = d \mid S_1 = S_7 = 1)}{P(D = d' \mid S_1 = S_7 = 1)}
= \frac{P(S_1 = S_7 = 1 \mid D = d) \cdot P(D = d)}{P(S_1 = S_7 = 1 \mid D = d') \cdot P(D = d')}, \tag{7}
\]
where \(P(D = d) = \prod_{n=1}^{11} P(D_n = d_n)\) by independence. Using (5), (6) and (7), one may calculate the relative posterior probabilities of any two hypotheses d and d′.
Further investigation reveals some subtle aspects of the model. For example,
Observation 1
Once we have observed some symptoms, diseases are no longer
independent.
Observation 2
Once the symptoms have been “explained” (e.g., as coming from meningitis),
there is little pressure to posit further causes (the posterior probability of
influenza is essentially the prior probability of influenza).
Consider, for example, the probability that no symptoms are present given influenza,

\[
P(S_1 = \cdots = S_7 = 0 \mid D_6 = 1),
\]

represented, via QUERY, by the predicate for the condition D6 = 1. Unlike the
earlier examples, where we reasoned backwards from effects
(symptoms) to their likely causes (diseases), here we are reasoning in
the same forward direction in which the model diseasesAndSymptoms is
expressed.
The possibilities are effectively inexhaustible, including more complex states
of knowledge such as "there are at least two symptoms present" or "the patient
does not have both salmonella and tuberculosis". Later, we will consider the
vast number of predicates, and the resulting inferences supported by QUERY
and diseasesAndSymptoms, and contrast this with the compact size of
diseasesAndSymptoms and its table of parameters.
In this section, we illustrated the basic behavior of QUERY, and began to
explore how QUERY can be used to update beliefs in light of observations.
- Inferences need not be explicitly described in terms of rules, but can
  arise implicitly via other mechanisms, like QUERY, paired with
  appropriate models and predicates.
- In the working example, the diagnostic rules were determined by the
  definition of diseasesAndSymptoms and the table of its parameters.
  The inference mechanism, QUERY, however, is fixed.
- We will examine how the underlying table of probabilities might be learned
  from data.
- The structure of diseasesAndSymptoms itself encodes strong
  structural relationships among the diseases and symptoms. We will study
  how to learn this in part 2.
- Finally, many common-sense reasoning tasks involve making a decision,
  and not just determining what to believe. Towards the end, we will describe
  how to use QUERY to make decisions under uncertainty.
In this section, we return to the medical diagnosis example, and explain the
way in which conditional independence leads to compact representations,
and conversely, the fact that efficient probabilistic programs, like
diseasesAndSymptoms, exhibit many conditional independencies. We
will do so through connections with the Bayesian network formalism, whose
introduction by Pearl [1988] was a major advancement in AI.
In fact, the small number of diseases and symptoms considered in our simple
medical diagnosis model already leads to a combinatorial number of potential
scenarios: among 11 potential diseases and 7 potential symptoms there are
2^18 = 262,144 probabilities, one for each complete assignment of the 18
binary variables. This number is exponential in the number of diseases and
symptoms. Even if we discretize the probabilities to some fixed accuracy, a
simple counting argument shows that most such distributions have no short
description.
FEW DISTRIBUTIONS HAVE COMPACT REPRESENTATIONS
Our model, by contrast, is specified by just 11 + 7 + 11 · 7 = 95 probabilities.
Full independence is rare, however, and so this factorization is not the whole story.
[Figure: the Bayes net, a grid of vertices D1–D11 (diseases), C1,1–C11,7 (causes), L1–L7 (leaks), and S1–S7 (symptoms).]
Bayes net for the medical diagnosis example. (Note that the directionality of
the arrows is not rendered for clarity. All arrows point to the symptoms Sm .)
PROTO-LANGUAGES FOR BAYES NETS
[Figure: plate-style diagram showing Dn (diseases, n = 1, . . . , 11), Cn,m, Lm, and Sm (symptoms, m = 1, . . . , 7).]
d-separation
A pair (x, y) of vertices is d-separated by a subset E of vertices as follows:
first, mark each vertex in E with a ×, which we will indicate by the symbol ⊗.
If a vertex with (any type of) mark has an unmarked parent, mark the parent
with a +, which we will indicate by the symbol ⊕. Repeat until a fixed point is
reached. Let ○ indicate unmarked vertices.

Definition. x and y are d-separated by E if, for all (undirected) paths from x
to y, one of the following patterns appears:

○ → ⊗ → ○        ○ ← ⊗ ← ○
○ ← ⊗ → ○        ○ → ○ ← ○
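A sketch of the marking phase of this procedure (the check of the path patterns themselves is omitted); parents is assumed to map each vertex to the list of its parents:

def mark(parents, E):
    # mark each vertex of E with 'x' (the ⊗ above); repeatedly mark unmarked
    # parents of marked vertices with '+' (the ⊕ above) until a fixed point;
    # vertices absent from the result are the unmarked ○ vertices
    marks = {v: 'x' for v in E}
    frontier = list(E)
    while frontier:
        v = frontier.pop()
        for u in parents[v]:
            if u not in marks:
                marks[u] = '+'
                frontier.append(u)
    return marks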
[Figure: the medical-diagnosis Bayes net, to which the marking procedure can be applied to decide d-separation.]
By deciding d-separation in the Bayes net, we can determine that (1) the
diseases {D1, . . . , D11} are independent (i.e., conditionally independent
given E = ∅), that (2) the symptoms {S1, . . . , S7} are conditionally
independent given the diseases {D1, . . . , D11}, and many more such
statements.
Bayes nets specify a factorization of the joint distribution of the vertex set.

\[
P(X_1 = x_1, \ldots, X_k = x_k)
= P(X_1 = x_1) \cdot P(X_2 = x_2 \mid X_1 = x_1) \cdots P(X_k = x_k \mid X_j = x_j,\ j < k)
= \prod_{j=1}^{k} P(X_j = x_j \mid X_i = x_i,\ i < j).
\]

Let G be a Bayes net over {X1, . . . , Xk}, and write Pa(Xj) for the set of
indices i such that (Xi, Xj) ∈ G, i.e., Pa(Xj) indexes the parent vertices of
Xj. Then the joint p.m.f. may be expressed as

\[
P(X_1 = x_1, \ldots, X_k = x_k) = \prod_{j=1}^{k} P(X_j = x_j \mid X_i = x_i,\ i \in \mathrm{Pa}(X_j)).
\]

This factorization is determined by only \(\sum_{j=1}^{k} 2^{|\mathrm{Pa}(X_j)|}\) probabilities, potentially an exponential savings.
Assuming the 95 tabulated probabilities are dyadic rationals of the form k/2^m, we may
represent diseasesAndSymptoms as a small boolean circuit whose
inputs are random bits and whose 18 output lines represent the disease and
symptom indicators. Each circuit element can be restricted to
constant fan-in, and the total number of circuit elements grows only linearly in
the number of diseases and in the number of symptoms, assuming fixed
accuracy for the base probabilities.
For large n, the hypothesized marginal probability of a disease is relatively
close to the frequency observed in the simulated disease–symptom data.
Conditioned on the n sampled values of Dj matching the historical record (in
which disease j appears k times), the marginal probability of disease j is
Beta distributed with mean

\[
\frac{\alpha_1}{\alpha_1 + \alpha_0} = \frac{k + 1}{n + 2}
\]

and concentration parameter α1 + α0 = n + 2.

Similar approaches can be used when the patients come from multiple
distinct populations where you do not expect the patterns of diseases and
symptoms to agree.
Solution
allDiseasesAndSymptoms performs probabilistic inference over the
probabilities to learn the best noisy-OR network from data.
Solution
Identify the correct structure of the dependence between symptoms and
diseases by probabilistic inference over random conditional independence
structures among the model variables.
\[
P(X_1 = x_1, \ldots, X_k = x_k) = \prod_{j=1}^{k} P(X_j = x_j \mid X_i = x_i,\ i \in \mathrm{Pa}(X_j)).
\]
Approach
Be “Bayesian”... put prior distributions on graphs and conditional probabilities.
Consider a prior program, which we will call RPD (for Random Probability
Distribution), that takes two positive integer inputs, n and D, and produces as
output n independent samples drawn from a random probability distribution
on {0, 1}^D.
QUERY(RPD(n+1,18), checkAllSymptoms)
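A hedged sketch of an RPD-style prior program; for simplicity it draws each variable's parents from among earlier variables under a fixed ordering (one way to guarantee acyclicity; the slide does not pin down the graph prior at this level of detail):

import random

def RPD(n, D):
    # random graph: each earlier variable is a parent with probability 1/2
    parents = {j: [i for i in range(j) if random.random() < 0.5]
               for j in range(D)}
    cpt = {j: {} for j in range(D)}  # conditional probabilities p_{j|v}

    def sample_once():
        x = [0] * D
        for j in range(D):
            v = tuple(x[i] for i in parents[j])
            if v not in cpt[j]:
                cpt[j][v] = random.random()  # uniform prior on p_{j|v}
            x[j] = 1 if random.random() < cpt[j][v] else 0
        return x

    return [sample_once() for _ in range(n)]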
On accepted runs, the value of each conditional probability p_{j|v} concentrates around

\[
\frac{k_{j|v} + 1}{n_{j|v} + 2},
\]

where n_{j|v} is the number of times in the historical data that the event
{X_i = v_i, i ∈ Pa(X_j)} occurs, and k_{j|v} is the number of times when,
moreover, X_j = 1. This is simply the "smoothed" empirical frequency. In fact,
the probability p_{j|v} is conditionally Beta distributed with concentration
n_{j|v} + 2.
Informally,
- the smaller the parental sets (a property of G), the more certain we are
  likely to be regarding the correct parameterization;
- in terms of QUERY, this is reflected in a smaller range of values of p_{j|v}
  on accepted runs.
This is our first glimpse at a subtle balance between the simplicity (sparsity)
of the graph G and how well it captures hidden structure in the data.
\[
P(X_1 = x_1, \ldots, X_k = x_k) = \prod_{j=1}^{k} P(X_j = x_j \mid X_i = x_i,\ i \in \mathrm{Pa}(X_j)).
\]
Every time the pattern {Xi = vi , i ∈ Pa(Xj )} arises in historical data, the
generative process produces the historical value Xj with probability pj|v if
Xj = 1 and 1 − pj|v if Xj = 0.
Conditional on the p_{j|v}'s, the probability that the historical data is reproduced is

\[
\prod_{j=1}^{D} \prod_{v} p_{j|v}^{\,k_{j|v}} (1 - p_{j|v})^{\,n_{j|v} - k_{j|v}},
\]

where v ranges over the possible {0, 1} assignments to Pa(X_j), and k_{j|v}
and n_{j|v} are defined as above.
\[
\mathrm{score}(G) = \mathbb{E}\!\left[\, \prod_{j=1}^{D} \prod_{v} p_{j|v}^{\,k_{j|v}} (1 - p_{j|v})^{\,n_{j|v} - k_{j|v}} \right]
= \prod_{j=1}^{D} \prod_{v} (n_{j|v} + 1)^{-1} \binom{n_{j|v}}{k_{j|v}}^{-1}
\]
Because the graph G was chosen uniformly at random, it follows that the
posterior probability of a particular graph G is proportional to score(G).
We can study the preference for one graph G over another G′ by studying
the ratio of their scores:

\[
\frac{\mathrm{score}(G)}{\mathrm{score}(G')}.
\]
This score ratio is known as the Bayes factor, which I.J. Good termed the
Bayes–Jeffreys–Turing factor [Good, 1968, 1975], and which Turing himself
called the factor in favor of a hypothesis (see [Good, 1968], [Zabell, 2012,
§1.4], and [Turing, 2012]). Its logarithm is sometimes known as the weight of
evidence [Good, 1968].
The form of the score is a product over the local structure of the graph; thus
the Bayes factor will depend only on the contributions from those parts of the
graphs G and G′ that differ.
In the simplest special case, consider two binary variables X and Y, and
compare the graph G with no edges to the graph G′ with an edge from X to Y.
The score ratio is then

\[
\frac{\mathrm{score}(G)}{\mathrm{score}(G')}
= \frac{(n_1 + 1)\binom{n_1}{k_1}\,(n_0 + 1)\binom{n_0}{k_0}}{(n + 1)\binom{n}{k}},
\]

where (see the sketch after this list for a direct computation)
- n counts the total number of observations;
- k counts Y = 1;
- n1 counts X = 1;
- k1 counts X = 1 and Y = 1;
- n0 counts X = 0; and
- k0 counts X = 0 and Y = 1.
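A sketch computing this score ratio in log space from the six counts; math.comb and math.log are standard-library functions:

import math

def log_score_ratio(n, k, n1, k1, n0, k0):
    # log(score(G) / score(G')): negative values favor the edge X -> Y
    log_G = -math.log(n + 1) - math.log(math.comb(n, k))
    log_Gp = (-math.log(n1 + 1) - math.log(math.comb(n1, k1))
              - math.log(n0 + 1) - math.log(math.comb(n0, k0)))
    return log_G - log_Gp

# perfectly correlated data (Y = X) over 50 observations: favors G'
print(log_score_ratio(50, 25, 25, 25, 25, 0))   # ≈ -29.9
# balanced, independent-looking data over 50 observations: mildly favors G
print(log_score_ratio(50, 25, 25, 13, 25, 12))  # ≈ +1.0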
SIMPLE SPECIAL CASE OF GRAPH POSTERIOR, PART II
First consider the case where G′ is the true underlying graph, i.e., when Y is
indeed dependent on X. By the law of large numbers and Stirling's
approximation, we can reason that the evidence for G′ accumulates rapidly,
satisfying

\[
\log \frac{\mathrm{score}(G)}{\mathrm{score}(G')} \sim -C \cdot n \quad \text{a.s.},
\]

for some constant C > 0 that depends only on the joint distribution of (X, Y).
When, instead, X and Y are independent, so that the sparser graph G is
correct,

\[
\log \frac{\mathrm{score}(G)}{\mathrm{score}(G')} \sim \frac{1}{2} \log n \quad \text{a.s.},
\]

and thus evidence accumulates much more slowly.
The following plots show the typical evolution of the Bayes factors under G′
and under G. Evidence accumulates much more rapidly under G′.
[Figure: two plots of the weight of evidence against the number of observations n = 20, . . . , 100: on the left, rapidly decreasing to roughly −1000; on the right, slowly increasing to roughly 4.]
Weight of evidence for dependence versus independence (positive values support independence) of
a sequence of pairs of random variables sampled from RPD(n, 2).
(left) When presented with data from a distribution where (X, Y ) are indeed dependent, the weight of
evidence rapidly accumulates for the dependent model, at an asymptotically linear rate in the amount
of data.
(right) When presented with data from a distribution where (X, Y ) are independent, the weight of
evidence slowly accumulates for the independent model, at an asymptotic rate that is logarithmic in
the amount of data.
Note that the dependent model can imitate the independent model, but, on average over random
parameterizations of the conditional probability mass functions, the dependent model is worse at
modeling independent data.