Graphical Models
Chapter 12
Probabilistic Graphical Models
EEE 485/585 Statistical Learning and Data Analytics
Outline: Introduction; Directed graphical models; Undirected graphical models
Cem Tekin
Bilkent University
Introduction
Figure 1.1 from "Probabilistic Graphical Models: Principles and Techniques" by Daphne Koller and Nir Friedman
Directed Graphical Models (Bayesian Networks)
[Figure: directed graph over the nodes a, b, c]
a is a parent of b
b is a child of a
The Chain Rule
$$p(x_1, x_2, \dots, x_D) = p(x_D \mid x_1, x_2, \dots, x_{D-1}) \, p(x_{D-1} \mid x_1, \dots, x_{D-2}) \cdots p(x_2 \mid x_1) \, p(x_1)$$
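As a quick illustration, the chain rule for D = 3 reads as follows. It holds for any joint distribution and any ordering of the variables, since no independence assumption is involved.

```latex
p(x_1, x_2, x_3) = p(x_3 \mid x_1, x_2) \, p(x_2 \mid x_1) \, p(x_1)
```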
The Factorization Property
Let $\mathrm{pa}_j$ denote the parents of node $j$.
Given a directed acyclic graph with $D$ nodes and $x = (x_1, x_2, \dots, x_D)$:
$$p(x) = \prod_{j=1}^{D} p(x_j \mid \mathrm{pa}_j)$$
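A minimal sketch of the factorization for binary variables. The graph (x1 -> x2, x1 -> x3) and all table entries are made up for illustration and do not come from the slides; the names `parents`, `cpt` and `joint` are hypothetical.

```python
from itertools import product

# Hypothetical DAG: x1 -> x2 and x1 -> x3 (x1 has no parents).
parents = {"x1": (), "x2": ("x1",), "x3": ("x1",)}

# cpt[node][parent_values] = p(node = 1 | parents); numbers are illustrative only.
cpt = {
    "x1": {(): 0.3},
    "x2": {(0,): 0.2, (1,): 0.7},
    "x3": {(0,): 0.5, (1,): 0.9},
}

def joint(assignment):
    """p(x) = prod_j p(x_j | pa_j) for a full assignment {node: 0 or 1}."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

nodes = list(parents)
total = sum(joint(dict(zip(nodes, values)))
            for values in product((0, 1), repeat=len(nodes)))
print(total)  # sums to 1 up to floating-point rounding
```

Each factor is a normalized conditional distribution, so the product is a valid joint distribution without any extra normalization.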
Example
[Figure not recovered]
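An Application - Ancestral Sampling
Ancestral Sampling (nodes numbered so that every parent precedes its children):
sample $\hat{x}_1$ from $p(x_1)$
sample $\hat{x}_2$ from $p(x_2 \mid \mathrm{pa}_2)$
...
sample $\hat{x}_j$ from $p(x_j \mid \mathrm{pa}_j)$
...
sample $\hat{x}_D$ from $p(x_D \mid \mathrm{pa}_D)$
Each conditional uses the values already sampled for the parents.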
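The sampling steps above can be sketched as follows for binary variables. The three-node graph and its probability tables are assumptions made for illustration, and `ancestral_sample` is a hypothetical helper name.

```python
import random

# Hypothetical DAG, listed so that every parent appears before its children.
parents = {"x1": (), "x2": ("x1",), "x3": ("x1", "x2")}

# cpt[node][parent_values] = p(node = 1 | parents); numbers are illustrative only.
cpt = {
    "x1": {(): 0.3},
    "x2": {(0,): 0.2, (1,): 0.7},
    "x3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
}

def ancestral_sample(rng):
    """Draw one joint sample by visiting the nodes in parents-first order."""
    sample = {}
    for node, pa in parents.items():  # dicts preserve insertion order
        p_one = cpt[node][tuple(sample[q] for q in pa)]
        sample[node] = 1 if rng.random() < p_one else 0
    return sample

rng = random.Random(0)
samples = [ancestral_sample(rng) for _ in range(20000)]
freq_x2 = sum(s["x2"] for s in samples) / len(samples)
print(freq_x2)  # should be close to p(x2 = 1) = 0.7 * 0.3 + 0.2 * 0.7 = 0.35
```

Dropping some coordinates of each sample gives samples from the corresponding marginal distribution.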
Conditional Independence Properties
Independence: $p(a, b) = p(a)\,p(b)$
Conditional independence given $c$: $p(a, b \mid c) = p(a \mid c)\,p(b \mid c)$
[Figures: three directed graphs over the nodes a, b, c]
Explaining Away
B: battery, F: fuel, G: gauge
[Figure: directed graph B → G ← F]
p(B=1) = 0.9
p(F=1) = 0.9
p(G=1 | B=1, F=1) = 0.8
p(G=1 | B=1, F=0) = 0.2
p(G=1 | B=0, F=1) = 0.2
p(G=1 | B=0, F=0) = 0.1
[Figures: two copies of the graph labeled G=0]
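The effect can be checked numerically by brute-force enumeration over the three binary variables. This sketch uses the probabilities listed on the slide; reading the value 1 as "charged / full / reads full" follows Bishop's version of the example and is an assumption here, and `posterior_F0` is a hypothetical helper name.

```python
from itertools import product

# Probabilities from the slide; key of p_G1 is (B, F).
p_B1, p_F1 = 0.9, 0.9
p_G1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    """p(B=b, F=f, G=g) = p(b) p(f) p(g | b, f)."""
    pb = p_B1 if b else 1 - p_B1
    pf = p_F1 if f else 1 - p_F1
    pg = p_G1[(b, f)] if g else 1 - p_G1[(b, f)]
    return pb * pf * pg

def posterior_F0(**evidence):
    """p(F=0 | evidence), where evidence may fix b and/or g."""
    def matches(b, g):
        return evidence.get("b", b) == b and evidence.get("g", g) == g
    num = sum(joint(b, 0, g) for b, g in product((0, 1), repeat=2) if matches(b, g))
    den = sum(joint(b, f, g) for b, f, g in product((0, 1), repeat=3) if matches(b, g))
    return num / den

print(posterior_F0())          # prior p(F=0), about 0.1
print(posterior_F0(g=0))       # about 0.257
print(posterior_F0(g=0, b=0))  # about 0.111
```

Observing G=0 raises the probability of F=0 from 0.1 to about 0.257. Observing B=0 as well lowers it to about 0.111: the flat battery explains the gauge reading away, although B and F are independent a priori.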
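D-separation
[Figures: paths from A meeting nodes of C; a node v on a path together with the descendants of v]
A, B, C: non-intersecting sets of nodes. P: set of all paths from A to B.
A path in P is blocked if it contains a node v such that either
(a) the arrows on the path meet head-to-tail or tail-to-tail at v, and v is in C, or
(b) the arrows meet head-to-head at v, and neither v nor any of the descendants of v is in C.
If all paths in P are blocked, then A is d-separated from B by C, and
$$A \perp\!\!\!\perp B \mid C$$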
D-separation Example 1
[Figure: directed graph with nodes a, f, e, b, c]
Is $a \perp\!\!\!\perp b \mid c$?
D-separation Example 2
[Figure: directed graph with nodes a, f, e, b, c]
Is $a \perp\!\!\!\perp b \mid f$?
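Both questions can be checked mechanically. The sketch below does not enumerate paths. It uses an equivalent test instead: keep only A, B, C and their ancestors, moralize that subgraph (connect co-parents and drop directions), remove C, and check whether A can still reach B. The edge list is an assumption: the extracted slides do not show the edges, so the graph of Bishop's Figure 8.22 (a → e, f → e, f → b, e → c) is used here. `d_separated` is a hypothetical helper name.

```python
def d_separated(parents, A, B, C):
    """True if A and B are d-separated by C in the DAG given as {node: parents}."""
    A, B, C = set(A), set(B), set(C)
    # 1. Keep only A, B, C and their ancestors.
    keep, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])
    # 2. Moralize: link each node to its parents, marry co-parents, drop directions.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pa = list(parents[v])
        for p in pa:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Remove C and test whether any node of B is reachable from A.
    seen, stack = set(), [a for a in A if a not in C]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(n for n in nbrs[v] if n not in C and n not in seen)
    return not (seen & B)

# Assumed graph (Bishop, Figure 8.22): a -> e, f -> e, f -> b, e -> c.
parents = {"a": [], "f": [], "e": ["a", "f"], "b": ["f"], "c": ["e"]}
print(d_separated(parents, {"a"}, {"b"}, {"c"}))  # False
print(d_separated(parents, {"a"}, {"b"}, {"f"}))  # True
```

Under the assumed graph, conditioning on c (a descendant of the head-to-head node e) unblocks the path a, e, f, b, so Example 1 fails. Conditioning on f blocks the path at the tail-to-tail node f, so Example 2 holds.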
Naive Bayes Classifier
Features $x = (x_1, x_2, \dots, x_D)$, label $y \in \{1, \dots, C\}$
Parameters: $\theta = \{\theta_{jc}\}_{j \in \{1,\dots,D\},\, c \in \{1,\dots,C\}}$
The distribution of the features depends on the label!
$$p(x \mid y = c, \theta) = \prod_{j=1}^{D} p(x_j \mid y = c, \theta_{jc})$$
$$p(y = c) = \pi_c$$
$$p(y = c \mid x) \propto p(y = c)\, p(x \mid y = c, \theta)$$
Naive Bayes Classifier
Gaussian features:
$$p(x_j \mid y = c, \theta_{jc}) = \mathcal{N}(x_j \mid \mu_{jc}, \sigma_{jc}^2)$$
Bernoulli features:
$$p(x_j \mid y = c, \theta_{jc}) = \theta_{jc}^{\,I(x_j = 1)} (1 - \theta_{jc})^{I(x_j = 0)}$$
Classification with Naive Bayes Classifier for Bernoulli Features
With $\mathcal{D}$ denoting the training data (and $D$ the number of features):
$$p(y = c \mid x, \mathcal{D}) \propto p(y = c \mid \mathcal{D}) \prod_{j=1}^{D} p(x_j \mid y = c, \mathcal{D})$$
$$\propto \hat{\pi}_c \prod_{j=1}^{D} \hat{\theta}_{jc}^{\,I(x_j = 1)} (1 - \hat{\theta}_{jc})^{I(x_j = 0)}$$
Then,
$$\hat{c} = \arg\max_{c \in \{1,\dots,C\}} \hat{\pi}_c \prod_{j=1}^{D} \hat{\theta}_{jc}^{\,I(x_j = 1)} (1 - \hat{\theta}_{jc})^{I(x_j = 0)}$$
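A minimal sketch of the decision rule. It maximizes the logarithm of the product, which has the same maximizer and avoids numerical underflow when D is large. The parameter values are made up, classes are indexed from 0 rather than 1, and `predict` is a hypothetical helper name.

```python
import math

def predict(x, pi_hat, theta_hat):
    """Return argmax_c of log pi_c + sum_j log p(x_j | y = c) for binary x.

    Classes are indexed 0..C-1 here (the slides use 1..C).
    Assumes every theta_hat[j][c] lies strictly between 0 and 1.
    """
    scores = []
    for c in range(len(pi_hat)):
        s = math.log(pi_hat[c])
        for j, xj in enumerate(x):
            t = theta_hat[j][c]
            s += math.log(t) if xj == 1 else math.log(1.0 - t)
        scores.append(s)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up estimates for D = 3 features and C = 2 classes.
pi_hat = [0.6, 0.4]
theta_hat = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]  # theta_hat[j][c]
print(predict([1, 1, 0], pi_hat, theta_hat))  # 0
print(predict([0, 0, 1], pi_hat, theta_hat))  # 1
```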
Estimating the parameters
$$L(\theta, \pi) = \prod_{i=1}^{n} p(x_i, y_i \mid \theta, \pi)$$
Estimating the parameters
Log-likelihood: $l(\theta, \pi) = \log L(\theta, \pi)$
Hence,
$$l(\theta, \pi) = \sum_{i=1}^{n} \sum_{c=1}^{C} I(y_i = c) \log \pi_c + \sum_{i=1}^{n} \sum_{j=1}^{D} \sum_{c=1}^{C} I(y_i = c) \left[ I(x_{ij} = 1) \log \theta_{jc} + I(x_{ij} = 0) \log(1 - \theta_{jc}) \right]$$
Estimating the parameters - Bernoulli case
$$(\hat{\theta}^{MLE}, \hat{\pi}^{MLE}) = \arg\max_{\theta, \pi} \; l(\theta, \pi) \quad \text{subject to} \quad \sum_{c=1}^{C} \pi_c = 1$$
The optimization problem above is solved by the method of Lagrange multipliers.
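A sketch of the Lagrangian step, with $n_c$ the number of training examples of class $c$ and $n_{jc}$ the number of those with $x_{ij} = 1$. The multiplier $\lambda$ is notation introduced here, not taken from the slides. The constraint only involves $\pi$, so the stationarity condition for $\theta_{jc}$ is unconstrained.

```latex
\begin{aligned}
\mathcal{L}(\theta, \pi, \lambda) &= l(\theta, \pi) + \lambda \Big( 1 - \sum_{c=1}^{C} \pi_c \Big) \\
\frac{\partial \mathcal{L}}{\partial \pi_c} &= \frac{n_c}{\pi_c} - \lambda = 0
  \;\Rightarrow\; \pi_c = \frac{n_c}{\lambda} \\
\sum_{c=1}^{C} \pi_c = 1 &\;\Rightarrow\; \lambda = \sum_{c=1}^{C} n_c = n \\
\frac{\partial \mathcal{L}}{\partial \theta_{jc}} &= \frac{n_{jc}}{\theta_{jc}} - \frac{n_c - n_{jc}}{1 - \theta_{jc}} = 0
  \;\Rightarrow\; \theta_{jc} = \frac{n_{jc}}{n_c}
\end{aligned}
```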
Result:
$$\hat{\pi}_c = \frac{n_c}{n}, \qquad n_c = \sum_{i=1}^{n} I(y_i = c)$$
$$\hat{\theta}_{jc} = \frac{n_{jc}}{n_c}, \qquad n_{jc} = \sum_{i=1}^{n} I(x_{ij} = 1, y_i = c)$$
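The closed-form estimates amount to counting. A minimal sketch on a made-up data set, with classes indexed from 0 and `fit_bernoulli_nb` a hypothetical helper name:

```python
def fit_bernoulli_nb(X, y, num_classes):
    """Closed-form MLE: pi_hat[c] = n_c / n and theta_hat[j][c] = n_jc / n_c.

    X is a list of binary feature lists, y a list of labels in 0..num_classes-1
    (the slides index classes from 1). Assumes every class appears at least once.
    """
    n, D = len(X), len(X[0])
    n_c = [0] * num_classes
    n_jc = [[0] * num_classes for _ in range(D)]
    for xi, yi in zip(X, y):
        n_c[yi] += 1
        for j, xij in enumerate(xi):
            n_jc[j][yi] += xij  # counts x_ij = 1 within class y_i
    pi_hat = [nc / n for nc in n_c]
    theta_hat = [[n_jc[j][c] / n_c[c] for c in range(num_classes)] for j in range(D)]
    return pi_hat, theta_hat

# Tiny made-up data set: 5 examples, 2 binary features, 2 classes.
X = [[1, 0], [1, 1], [0, 1], [0, 1], [1, 1]]
y = [0, 0, 1, 1, 1]
pi_hat, theta_hat = fit_bernoulli_nb(X, y, num_classes=2)
print(pi_hat)     # [0.4, 0.6]
print(theta_hat)  # [[1.0, 1/3], [0.5, 1.0]] with 1/3 shown as a float
```

With so little data some estimates are exactly 0 or 1, which assigns zero probability to feature values not seen in a class. Smoothing the counts is a common remedy, but it is not part of the slides.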
Markov Blanket
$$p(x_i \mid x_{\{j \neq i\}}) = \frac{p(x_1, \dots, x_D)}{\int p(x_1, \dots, x_D)\, dx_i}$$
[Figure: node $x_i$ and its Markov blanket, with a co-parent of $x_i$ labeled]
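In a directed graph, the factors that do not involve $x_i$ cancel in the ratio above, so the conditional depends only on the parents, the children and the co-parents (other parents of the children) of $x_i$. A small sketch that collects this set; the example graph is hypothetical, as is the helper name `markov_blanket`.

```python
def markov_blanket(parents, node):
    """Parents, children and co-parents of `node` in a DAG given as {child: parents}."""
    children = [v for v, pa in parents.items() if node in pa]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])  # co-parents (and `node` itself)
    blanket.discard(node)
    return blanket

# Hypothetical DAG: a -> x, x -> c, b -> c, c -> d.
parents = {"a": [], "b": [], "x": ["a"], "c": ["x", "b"], "d": ["c"]}
print(sorted(markov_blanket(parents, "x")))  # ['a', 'b', 'c']
```

Node d is not in the blanket of x: once c is known, d carries no further information about x.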
Undirected Graphical Models (Markov Random Field)
No arrows!
[Figure: undirected graph over the nodes a, b, c]
Conditional Independence
[Figure not recovered]
Factorization of the Joint Distribution
Cliques?
[Figures: undirected graph over x1, x2, x3; undirected graph over x1, x2, x3, x4]
Maximal cliques?
Factorization of the Joint Distribution
$$p(x) = \frac{1}{Z} \prod_{c \in M} \psi_c(x_c)$$
where $M$ is the set of maximal cliques and (for discrete variables) $Z = \sum_{x} \prod_{c \in M} \psi_c(x_c)$ (normalization constant)
Example: Boltzmann distribution
$$\psi_c(x_c) = \exp(-E(x_c))$$
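For a handful of discrete variables, Z can be computed by enumerating every configuration. The sketch below does this for Boltzmann potentials. The clique list and the energy function are made up for illustration and are not taken from the slides.

```python
import math
from itertools import product

# Hypothetical maximal cliques of a 4-node graph; variables take values in {-1, +1}.
cliques = [(0, 1, 2), (1, 2, 3)]

def energy(values):
    """Made-up clique energy: low when the variables in the clique agree."""
    return -sum(values[i] * values[j]
                for i in range(len(values)) for j in range(i + 1, len(values)))

def unnormalized(x):
    """prod_c psi_c(x_c) with psi_c(x_c) = exp(-E(x_c))."""
    return math.prod(math.exp(-energy([x[i] for i in c])) for c in cliques)

states = list(product((-1, 1), repeat=4))
Z = sum(unnormalized(x) for x in states)      # brute-force normalization constant
p = {x: unnormalized(x) / Z for x in states}  # p(x) = (1/Z) prod_c psi_c(x_c)
print(sum(p.values()))  # 1 up to floating-point rounding
print(p[(1, 1, 1, 1)])  # the two all-equal configurations are the most probable
```

The sum over states has 2^D terms for D binary variables, which is why computing Z exactly is only feasible for small models or special graph structures.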
Image Denoising
[Figure not recovered]
Image Denoising
Cliques?
Energy function?
Goal: Given $\{y_i\}$ find $\{\hat{x}_i\}$ such that $\{\hat{x}_i\}$ has maximum probability.
Coordinate-wise gradient descent:
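A minimal sketch of a coordinate-wise scheme for binary images. The energy function does not survive in the extracted slides, so the sketch assumes the Ising-type energy from Bishop's treatment of this example, $E(x, y) = h \sum_i x_i - \beta \sum_{i \sim j} x_i x_j - \eta \sum_i x_i y_i$ with pixels in $\{-1, +1\}$. Since the pixels are discrete, each step picks the lower-energy state of one pixel with all others fixed (iterated conditional modes) rather than following a gradient. The default parameter values are the ones Bishop reports for his figure, to the best of my reading, and `icm_denoise` is a hypothetical name.

```python
import random

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    """Coordinate-wise minimization of the assumed energy

    E(x, y) = h * sum_i x_i - beta * sum_{i~j} x_i x_j - eta * sum_i x_i y_i,
    with pixels in {-1, +1} and i~j ranging over 4-connected neighbours.
    """
    rows, cols = len(y), len(y[0])
    x = [row[:] for row in y]  # initialize with the noisy image
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nb = sum(x[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols)
                # The terms of E involving x[r][c] equal x[r][c] * coeff,
                # so the lower-energy state has the opposite sign of coeff.
                coeff = h - beta * nb - eta * y[r][c]
                best = -1 if coeff > 0 else 1
                if best != x[r][c]:
                    x[r][c], changed = best, True
        if not changed:
            break  # local minimum: no single-pixel change lowers the energy
    return x

# Toy example: an all +1 image with roughly 10% of the pixels flipped.
rng = random.Random(0)
clean = [[1] * 20 for _ in range(20)]
noisy = [[-v if rng.random() < 0.1 else v for v in row] for row in clean]
restored = icm_denoise(noisy)
errors_before = sum(n != c for rn, rc in zip(noisy, clean) for n, c in zip(rn, rc))
errors_after = sum(r != c for rr, rc in zip(restored, clean) for r, c in zip(rr, rc))
print(errors_before, errors_after)
```

The procedure only reaches a local minimum of the energy, so the result depends on the initialization and on the order in which pixels are visited.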
Image Denoising
[Figure: denoised image]
Figure 8.30 from "Pattern Recognition and Machine Learning" by Bishop