Chapter 12
Probabilistic Graphical Models

EEE 485/585 Statistical Learning and Data Analytics
Cem Tekin, Bilkent University

Outline: Introduction · Directed graphical models · Undirected graphical models

Cannot be distributed outside this class without the permission of the instructor.

Introduction

A graphical model represents a set of conditional independencies and factorizes the joint distribution.

Figure 1.1 from "Probabilistic Graphical Models: Principles and Techniques" by Daphne Koller and Nir Friedman.

Directed Graphical Models (Bayesian Networks)

[Figure: a three-node DAG with edges a → b, a → c, and b → c]
a is a parent of b; b is a child of a.

p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a)

The Chain Rule

p(x_1, x_2, ..., x_D) = p(x_D | x_1, ..., x_{D-1}) × p(x_{D-1} | x_1, ..., x_{D-2}) × ... × p(x_2 | x_1) × p(x_1)

According to which graphical model does the chain rule factorize the joint distribution p(x_1, x_2, ..., x_D)? (It is the fully connected DAG in which each x_j has all of x_1, ..., x_{j-1} as parents.)

The Factorization Property

We must have a directed acyclic graph (no directed cycles)!

Let pa_j denote the parents of node j. Given a directed acyclic graph with D nodes and x = (x_1, x_2, ..., x_D),

p(x) = \prod_{j=1}^{D} p(x_j | pa_j)

Example

[Figure: an example DAG and the corresponding factorization of its joint distribution]
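
As an illustration (this DAG is ours, not necessarily the one on the original slide), consider four nodes with edges x_1 → x_2, x_1 → x_3, and x_2 → x_4. The factorization property gives

p(x_1, x_2, x_3, x_4) = p(x_1) p(x_2 | x_1) p(x_3 | x_1) p(x_4 | x_2)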

An Application: Ancestral Sampling

Goal: sample \hat{x} = (\hat{x}_1, \hat{x}_2, ..., \hat{x}_D) from the joint distribution p(x_1, x_2, ..., x_D).

Assume that p(x_1, x_2, ..., x_D) factorizes according to a graph in which x_1 has no parent, and that the nodes are numbered so that every node appears after its parents.

Ancestral Sampling (each conditional is evaluated at the parent values sampled in earlier steps):
sample \hat{x}_1 from p(x_1)
sample \hat{x}_2 from p(x_2 | pa_2)
...
sample \hat{x}_j from p(x_j | pa_j)
...
sample \hat{x}_D from p(x_D | pa_D)
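
A minimal sketch of ancestral sampling on a toy chain x_1 → x_2 → x_3; the network and its CPT values are illustrative assumptions, not from the slides:

```python
import random

# Toy chain x1 -> x2 -> x3 with Bernoulli conditionals
# (all probability values below are made up for illustration).
p_x1 = 0.6                        # p(x1 = 1)
p_x2_given_x1 = {0: 0.3, 1: 0.8}  # p(x2 = 1 | x1)
p_x3_given_x2 = {0: 0.1, 1: 0.9}  # p(x3 = 1 | x2)

def bernoulli(p):
    """Return 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def ancestral_sample():
    # Visit the nodes in topological order, conditioning each node
    # on the values already drawn for its parents.
    x1 = bernoulli(p_x1)
    x2 = bernoulli(p_x2_given_x1[x1])
    x3 = bernoulli(p_x3_given_x2[x2])
    return (x1, x2, x3)

print([ancestral_sample() for _ in range(5)])  # e.g. [(1, 1, 1), (0, 0, 0), ...]
```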

Conditional Independence Properties

a and b are independent: p(a, b) = p(a) p(b)
a and b are conditionally independent given c: p(a, b | c) = p(a | c) p(b | c)

Check independence and conditional independence of a and b on the graphs below.

[Figure: three graphs connecting a and b through c — tail-to-tail (a ← c → b), head-to-tail (a → c → b), and head-to-head (a → c ← b)]
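
As one way to carry out the check, the head-to-tail case can be verified numerically by enumerating a toy joint distribution (the Bernoulli CPT values are illustrative assumptions of ours):

```python
from itertools import product

# Enumeration check for the head-to-tail chain a -> c -> b.
p_a = {1: 0.3, 0: 0.7}
p_c_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p(c | a)
p_b_given_c = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.4, 0: 0.6}}  # p(b | c)

def joint(a, c, b):
    return p_a[a] * p_c_given_a[a][c] * p_b_given_c[c][b]

# Marginally, a and b are dependent: p(a=1, b=1) != p(a=1) p(b=1).
p_ab = sum(joint(1, c, 1) for c in (0, 1))
p_b = sum(joint(a, c, 1) for a, c in product((0, 1), repeat=2))
print(p_ab, p_a[1] * p_b)  # 0.201 vs 0.157: dependent

# Given c = 1, they become independent: p(a, b | c) = p(a | c) p(b | c).
p_c1 = sum(joint(a, 1, b) for a, b in product((0, 1), repeat=2))
p_ab_c = joint(1, 1, 1) / p_c1
p_a_c = sum(joint(1, 1, b) for b in (0, 1)) / p_c1
p_b_c = sum(joint(a, 1, 1) for a in (0, 1)) / p_c1
print(p_ab_c, p_a_c * p_b_c)  # equal: conditionally independent
```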

Explaining Away

B: battery, F: fuel, G: fuel gauge. Graph: B → G ← F.

p(B=1) = 0.9
p(F=1) = 0.9
p(G=1 | B=1, F=1) = 0.8
p(G=1 | B=1, F=0) = 0.2
p(G=1 | B=0, F=1) = 0.2
p(G=1 | B=0, F=0) = 0.1

Before any observation: p(F=0) = 0.1
After observing G=0: p(F=0 | G=0) = 0.257 (increased)
After additionally observing B=0: p(F=0 | G=0, B=0) = 0.111 (decreased)

Finding that the battery is flat (B=0) explains away the observation that the fuel gauge reads empty (G=0).
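
These posteriors can be verified by direct enumeration over the joint p(B) p(F) p(G | B, F); a minimal sketch (function and variable names are ours):

```python
# Enumeration over the collider B -> G <- F using the slide's CPTs.
p_B = {1: 0.9, 0: 0.1}
p_F = {1: 0.9, 0: 0.1}
p_G1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(G=1 | B, F)

def joint(b, f, g):
    pg1 = p_G1[(b, f)]
    return p_B[b] * p_F[f] * (pg1 if g == 1 else 1 - pg1)

# p(F=0 | G=0) = p(F=0, G=0) / p(G=0)
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)  # ≈ 0.257

# p(F=0 | G=0, B=0) = p(B=0, F=0, G=0) / p(B=0, G=0)
print(joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))  # ≈ 0.111
```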

D-separation

A, B, C: arbitrary non-intersecting sets of nodes. How do we check A ⊥⊥ B | C? (⊥⊥ denotes conditional independence.)

Let p be a path from a node in A to a node in B.

[Figure: node sets A, B, C with paths running from A to B]

D-separation

Definition: p is blocked if there exists a node v ∈ p such that
1. the arrows meet head-to-tail or tail-to-tail at v and v ∈ C, or
2. the arrows meet head-to-head at v, and neither v nor any of its descendants is in the set C.

[Figure: head-to-tail (→ v →) and tail-to-tail (← v →) meetings with v ∈ C, and a head-to-head (→ v ←) meeting with the descendants of v marked]

D-separation

Let P be the set of all paths from A to B. If every p ∈ P is blocked, then A is d-separated from B by C:

A ⊥⊥ B | C

D-separation Example 1

[Figure: a DAG on nodes a, f, e, b, c (cf. Bishop Fig. 8.22: a → e, f → e, f → b, e → c)]

a ⊥⊥ b | c?

D-separation Example 2

[Figure: the same DAG as in Example 1]

a ⊥⊥ b | f?
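
Assuming the reconstructed edge list above (a → e, f → e, f → b, e → c), both queries can be checked with networkx's built-in d-separation test; this is a sketch, and on networkx before 3.3 the function is named d_separated rather than is_d_separator:

```python
import networkx as nx

# DAG assumed from the example figure (cf. Bishop Fig. 8.22).
G = nx.DiGraph([("a", "e"), ("f", "e"), ("f", "b"), ("e", "c")])

# networkx >= 3.3 names this is_d_separator; older releases, d_separated.
d_sep = getattr(nx, "is_d_separator", None) or nx.d_separated

# Example 1: c is a descendant of the head-to-head node e, so observing c
# unblocks the path a -> e <- f -> b: a and b are NOT d-separated.
print(d_sep(G, {"a"}, {"b"}, {"c"}))  # False

# Example 2: f is tail-to-tail on that path and f is observed, so every
# path from a to b is blocked.
print(d_sep(G, {"a"}, {"b"}, {"f"}))  # True
```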

Naive Bayes Classifier

x = (x_1, x_2, ..., x_D), y ∈ {1, ..., C}
Parameters: θ = {θ_{jc}}, j ∈ {1, ..., D}, c ∈ {1, ..., C}
The distribution of the features depends on the label!

p(x | y = c, θ) = \prod_{j=1}^{D} p(x_j | y = c, θ_{jc})

p(y = c) = π_c
p(y = c | x) ∝ p(y = c) p(x | y = c, θ)

Features are conditionally independent given the class label!
What is the graphical model for the naive Bayes classifier?

Naive Bayes Classifier

Gaussian features: p(x_j | y = c, θ_{jc}) = N(x_j | μ_{jc}, σ_{jc}^2)

Bernoulli features: p(x_j | y = c, θ_{jc}) = Ber(x_j | θ_{jc})

Classification with the Naive Bayes Classifier for Bernoulli Features

Estimate π_c and θ_{jc} from the data \mathcal{D} = {(x_i, y_i)}_{i=1}^{n}.

p(y = c | x, \mathcal{D}) ∝ p(y = c | \mathcal{D}) \prod_{j=1}^{D} p(x_j | y = c, \mathcal{D})
                          ∝ \hat{π}_c \prod_{j=1}^{D} \hat{θ}_{jc}^{I(x_j = 1)} (1 - \hat{θ}_{jc})^{I(x_j = 0)}

Then,

\hat{c} = arg max_{c ∈ {1, ..., C}} \hat{π}_c \prod_{j=1}^{D} \hat{θ}_{jc}^{I(x_j = 1)} (1 - \hat{θ}_{jc})^{I(x_j = 0)}

Estimating the Parameters

Maximum likelihood estimation!

L(θ, π) = \prod_{i=1}^{n} p(x_i, y_i | θ, π)
        = \prod_{i=1}^{n} [ p(y_i | π) \prod_{j=1}^{D} p(x_{ij} | y_i, θ_j) ]
        = \prod_{i=1}^{n} [ ( \prod_{c=1}^{C} π_c^{I(y_i = c)} ) ( \prod_{j=1}^{D} \prod_{c=1}^{C} p(x_{ij} | y_i = c, θ_{jc})^{I(y_i = c)} ) ]

Estimating the Parameters

Log-likelihood:

l(θ, π) = log L(θ, π)
        = \sum_{i=1}^{n} \sum_{c=1}^{C} I(y_i = c) log π_c + \sum_{i=1}^{n} \sum_{j=1}^{D} \sum_{c=1}^{C} I(y_i = c) log p(x_{ij} | y_i = c, θ_{jc})

If x_{ij} ~ Bernoulli(θ_{jc}), then

p(x_{ij} | y_i = c, θ_{jc}) = θ_{jc}^{I(x_{ij} = 1)} (1 - θ_{jc})^{I(x_{ij} = 0)}

Hence,

l(θ, π) = \sum_{i=1}^{n} \sum_{c=1}^{C} I(y_i = c) log π_c + \sum_{i=1}^{n} \sum_{j=1}^{D} \sum_{c=1}^{C} I(y_i = c) [ I(x_{ij} = 1) log θ_{jc} + I(x_{ij} = 0) log(1 - θ_{jc}) ]

Estimating the Parameters: Bernoulli Case

(\hat{θ}_{MLE}, \hat{π}_{MLE}) = arg max_{θ, π} l(θ, π) subject to \sum_{c=1}^{C} π_c = 1

The optimization problem above is solved by the method of Lagrange multipliers.

Result:

\hat{π}_c = n_c / n, where n_c = \sum_{i=1}^{n} I(y_i = c)
\hat{θ}_{jc} = n_{jc} / n_c, where n_{jc} = \sum_{i=1}^{n} I(x_{ij} = 1, y_i = c)
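
A minimal sketch putting these MLE formulas and the classification rule above together, assuming numpy, binary feature matrices, and 0-indexed labels (all names here are ours):

```python
import numpy as np

def fit(X, y, n_classes):
    """MLE for Bernoulli naive Bayes: pi_c = n_c / n, theta_jc = n_jc / n_c."""
    n, D = X.shape
    pi = np.zeros(n_classes)
    theta = np.zeros((D, n_classes))
    for c in range(n_classes):
        Xc = X[y == c]                       # the n_c rows with label c
        pi[c] = len(Xc) / n
        theta[:, c] = Xc.sum(axis=0) / len(Xc)
    return pi, theta

def predict(x, pi, theta, eps=1e-12):
    # Maximize the log posterior (log space for numerical stability;
    # eps guards against log(0) when a count is zero).
    log_post = (np.log(pi + eps)
                + x @ np.log(theta + eps)
                + (1 - x) @ np.log(1 - theta + eps))
    return int(np.argmax(log_post))

# Toy usage: two classes, three binary features.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 1]])
y = np.array([0, 0, 0, 1, 1])
pi, theta = fit(X, y, n_classes=2)
print(predict(np.array([1, 0, 1]), pi, theta))  # -> 0
```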

Markov Blanket

For a node x_i, its Markov blanket is the minimal set of nodes that isolates x_i from the rest of the graph:

p(x_i | x_{j ≠ i}) = p(x_1, ..., x_D) / ∫ p(x_1, ..., x_D) dx_i

All factors not involving x_i cancel in this ratio, so in a directed graph the Markov blanket of x_i consists of its parents, its children, and its co-parents (the other parents of its children).

[Figure: x_i with its parents, children, and co-parents highlighted]

Undirected Graphical Models (Markov Random Fields)

No arrows!

[Figure: an undirected graph on nodes a, b, c]

Conditional independence properties?
Markov blanket?

Conditional Independence

When x_i and x_j are NOT neighbors:

p(x_i, x_j | all other nodes) = ?

Factorization of the Joint Distribution

Clique: a subset of fully connected nodes in a graph.
Cliques?

[Figure: Graph 1 on nodes x_1, x_2, x_3 and Graph 2 on nodes x_1, x_2, x_3, x_4]

Maximal clique: a clique that cannot be extended by adding one more node.
Maximal cliques?

Factorization of the Joint Distribution

x = (x_1, x_2, ..., x_D): all variables
M: set of maximal cliques
x_c, c ∈ M: the variables in clique c
ψ_c(·): non-negative potential function

p(x) = (1/Z) \prod_{c ∈ M} ψ_c(x_c)

where (for discrete variables) Z = \sum_{x} \prod_{c ∈ M} ψ_c(x_c) is the normalization constant.

Example: Boltzmann distribution, ψ_c(x_c) = exp(−E(x_c))

How hard is it to compute Z? (For D binary variables the sum runs over 2^D configurations, so brute force is exponential in D.)
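
To make the exponential cost concrete, here is a brute-force computation of Z for a tiny pairwise MRF with Boltzmann potentials; the chain structure and the coupling value J are illustrative assumptions:

```python
from itertools import product
import math

# Pairwise chain x1 - x2 - x3 with potentials psi(xi, xj) = exp(J * xi * xj).
J = 0.5
edges = [(0, 1), (1, 2)]

def unnormalized(x):
    return math.exp(sum(J * x[i] * x[j] for i, j in edges))

# The sum over x has 2^D terms, so this scales exponentially in D.
Z = sum(unnormalized(x) for x in product((-1, +1), repeat=3))
print(Z)
```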

Image Denoising

[Figure: original binary image and its noisy version]

y_i ∈ {−1, +1}: pixel i of the noisy image
x_i ∈ {−1, +1}: pixel i of the original image

Image Denoising

Cliques? Energy function?

Goal: given {y_i}, find {\hat{x}_i} such that {\hat{x}_i} has maximum probability.

Coordinate-wise descent on the energy (iterated conditional modes; the variables are discrete, so there is no gradient):
Take pixel j. Evaluate the energy for x_j = +1 and for x_j = −1 with the other variables held fixed.
Choose the x_j that minimizes the energy.
Stop when there is no change during a full sweep through the pixels.

This guarantees only a local maximum of the probability. A sketch follows below.
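
The slide leaves the cliques and the energy function as questions; a common choice (used for this figure in Bishop Sec. 8.3.3, here without the bias term) is the Ising-style energy E(x, y) = −β \sum_{(i,j) neighbors} x_i x_j − η \sum_i x_i y_i. A minimal sketch of the coordinate-wise (ICM) sweep under that assumed energy, with illustrative β and η values:

```python
import numpy as np

def icm_denoise(y, beta=1.0, eta=2.0):
    """Iterated conditional modes under the assumed Ising-style energy."""
    x = y.copy()                     # initialize with the noisy image
    H, W = y.shape
    changed = True
    while changed:                   # sweep until no pixel flips
        changed = False
        for i in range(H):
            for j in range(W):
                # Only the terms touching pixel (i, j) change when it flips.
                nbrs = sum(x[a, b]
                           for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                           if 0 <= a < H and 0 <= b < W)
                def local_energy(v):
                    return -beta * v * nbrs - eta * v * y[i, j]
                best = +1 if local_energy(+1) < local_energy(-1) else -1
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
    return x

# Toy usage: a 16x16 image of -1s with a +1 square, 10% flip noise.
rng = np.random.default_rng(0)
orig = -np.ones((16, 16), dtype=int)
orig[4:12, 4:12] = 1
noisy = orig * rng.choice([1, -1], size=orig.shape, p=[0.9, 0.1])
print((icm_denoise(noisy) != orig).mean())  # fraction of wrong pixels
```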

Image Denoising

[Figure: original image, noisy image, and denoised image]

Figure 8.30 from "Pattern Recognition and Machine Learning" by Bishop.