Chapter 12
Probabilistic Graphical Models

EEE 485/585 Statistical Learning and Data Analytics
Cem Tekin, Bilkent University

Outline: Introduction · Directed graphical models · Undirected graphical models

Cannot be distributed outside this class without the permission of the instructor.

Introduction

A graphical model represents a set of conditional independencies and factorizes the joint distribution.

Figure 1.1 from "Probabilistic Graphical Models: Principles and Techniques" by Daphne Koller and Nir Friedman.

Directed Graphical Models (Bayesian Networks)

[Figure: a three-node DAG with edges a → b, a → c, and b → c]
a is a parent of b; b is a child of a.

p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a)

The Chain Rule

p(x_1, x_2, ..., x_D) = p(x_D | x_1, ..., x_{D-1}) × p(x_{D-1} | x_1, ..., x_{D-2}) × ... × p(x_2 | x_1) × p(x_1)

According to which graphical model does the chain rule factorize the joint distribution p(x_1, x_2, ..., x_D)? (It is the fully connected DAG in which each x_j has all of x_1, ..., x_{j-1} as parents.)

The Factorization Property

We must have a directed acyclic graph (no directed cycles)!

Let pa_j denote the parents of node j. Given a directed acyclic graph with D nodes and x = (x_1, x_2, ..., x_D),

p(x) = \prod_{j=1}^{D} p(x_j | pa_j)

Example

[Figure: an example DAG and the corresponding factorization of its joint distribution]
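
As an illustration (this DAG is ours, not necessarily the one on the original slide), consider four nodes with edges x_1 → x_2, x_1 → x_3, and x_2 → x_4. The factorization property gives

p(x_1, x_2, x_3, x_4) = p(x_1) p(x_2 | x_1) p(x_3 | x_1) p(x_4 | x_2)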

An Application: Ancestral Sampling

Goal: sample \hat{x} = (\hat{x}_1, \hat{x}_2, ..., \hat{x}_D) from the joint distribution p(x_1, x_2, ..., x_D).

Assume that p(x_1, x_2, ..., x_D) factorizes according to a graph in which x_1 has no parent, and that the nodes are numbered so that every node appears after its parents.

Ancestral Sampling (each conditional is evaluated at the parent values sampled in earlier steps):
sample \hat{x}_1 from p(x_1)
sample \hat{x}_2 from p(x_2 | pa_2)
...
sample \hat{x}_j from p(x_j | pa_j)
...
sample \hat{x}_D from p(x_D | pa_D)
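
A minimal sketch of ancestral sampling on a toy chain x_1 → x_2 → x_3; the network and its CPT values are illustrative assumptions, not from the slides:

```python
import random

# Toy chain x1 -> x2 -> x3 with Bernoulli conditionals
# (all probability values below are made up for illustration).
p_x1 = 0.6                        # p(x1 = 1)
p_x2_given_x1 = {0: 0.3, 1: 0.8}  # p(x2 = 1 | x1)
p_x3_given_x2 = {0: 0.1, 1: 0.9}  # p(x3 = 1 | x2)

def bernoulli(p):
    """Return 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def ancestral_sample():
    # Visit the nodes in topological order, conditioning each node
    # on the values already drawn for its parents.
    x1 = bernoulli(p_x1)
    x2 = bernoulli(p_x2_given_x1[x1])
    x3 = bernoulli(p_x3_given_x2[x2])
    return (x1, x2, x3)

print([ancestral_sample() for _ in range(5)])  # e.g. [(1, 1, 1), (0, 0, 0), ...]
```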

Conditional Independence Properties

a and b are independent: p(a, b) = p(a) p(b)
a and b are conditionally independent given c: p(a, b | c) = p(a | c) p(b | c)

Check independence and conditional independence of a and b on the graphs below.

[Figure: three graphs connecting a and b through c — tail-to-tail (a ← c → b), head-to-tail (a → c → b), and head-to-head (a → c ← b)]
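
As one way to carry out the check, the head-to-tail case can be verified numerically by enumerating a toy joint distribution (the Bernoulli CPT values are illustrative assumptions of ours):

```python
from itertools import product

# Enumeration check for the head-to-tail chain a -> c -> b.
p_a = {1: 0.3, 0: 0.7}
p_c_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p(c | a)
p_b_given_c = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.4, 0: 0.6}}  # p(b | c)

def joint(a, c, b):
    return p_a[a] * p_c_given_a[a][c] * p_b_given_c[c][b]

# Marginally, a and b are dependent: p(a=1, b=1) != p(a=1) p(b=1).
p_ab = sum(joint(1, c, 1) for c in (0, 1))
p_b = sum(joint(a, c, 1) for a, c in product((0, 1), repeat=2))
print(p_ab, p_a[1] * p_b)  # 0.201 vs 0.157: dependent

# Given c = 1, they become independent: p(a, b | c) = p(a | c) p(b | c).
p_c1 = sum(joint(a, 1, b) for a, b in product((0, 1), repeat=2))
p_ab_c = joint(1, 1, 1) / p_c1
p_a_c = sum(joint(1, 1, b) for b in (0, 1)) / p_c1
p_b_c = sum(joint(a, 1, 1) for a in (0, 1)) / p_c1
print(p_ab_c, p_a_c * p_b_c)  # equal: conditionally independent
```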

Explaining Away

B: battery, F: fuel, G: fuel gauge. Graph: B → G ← F.

p(B=1) = 0.9
p(F=1) = 0.9
p(G=1 | B=1, F=1) = 0.8
p(G=1 | B=1, F=0) = 0.2
p(G=1 | B=0, F=1) = 0.2
p(G=1 | B=0, F=0) = 0.1

Before any observation: p(F=0) = 0.1
After observing G=0: p(F=0 | G=0) = 0.257 (increased)
After additionally observing B=0: p(F=0 | G=0, B=0) = 0.111 (decreased)

Finding that the battery is flat (B=0) explains away the observation that the fuel gauge reads empty (G=0).
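
These posteriors can be verified by direct enumeration over the joint p(B) p(F) p(G | B, F); a minimal sketch (function and variable names are ours):

```python
# Enumeration over the collider B -> G <- F using the slide's CPTs.
p_B = {1: 0.9, 0: 0.1}
p_F = {1: 0.9, 0: 0.1}
p_G1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(G=1 | B, F)

def joint(b, f, g):
    pg1 = p_G1[(b, f)]
    return p_B[b] * p_F[f] * (pg1 if g == 1 else 1 - pg1)

# p(F=0 | G=0) = p(F=0, G=0) / p(G=0)
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)  # ≈ 0.257

# p(F=0 | G=0, B=0) = p(B=0, F=0, G=0) / p(B=0, G=0)
print(joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))  # ≈ 0.111
```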

D-separation

A, B, C: arbitrary non-intersecting sets of nodes. How do we check A ⊥⊥ B | C? (⊥⊥ denotes conditional independence.)

Let p be a path from a node in A to a node in B.

[Figure: node sets A, B, C with paths running from A to B]

D-separation

Definition: p is blocked if there exists a node v ∈ p such that
1. the arrows meet head-to-tail or tail-to-tail at v and v ∈ C, or
2. the arrows meet head-to-head at v, and neither v nor any of its descendants is in the set C.

[Figure: head-to-tail (→ v →) and tail-to-tail (← v →) meetings with v ∈ C, and a head-to-head (→ v ←) meeting with the descendants of v marked]

D-separation

Let P be the set of all paths from A to B. If every p ∈ P is blocked, then A is d-separated from B by C:

A ⊥⊥ B | C

D-separation Example 1

[Figure: a DAG on nodes a, f, e, b, c (cf. Bishop Fig. 8.22: a → e, f → e, f → b, e → c)]

a ⊥⊥ b | c?

D-separation Example 2

[Figure: the same DAG as in Example 1]

a ⊥⊥ b | f?
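
Assuming the reconstructed edge list above (a → e, f → e, f → b, e → c), both queries can be checked with networkx's built-in d-separation test; this is a sketch, and on networkx before 3.3 the function is named d_separated rather than is_d_separator:

```python
import networkx as nx

# DAG assumed from the example figure (cf. Bishop Fig. 8.22).
G = nx.DiGraph([("a", "e"), ("f", "e"), ("f", "b"), ("e", "c")])

# networkx >= 3.3 names this is_d_separator; older releases, d_separated.
d_sep = getattr(nx, "is_d_separator", None) or nx.d_separated

# Example 1: c is a descendant of the head-to-head node e, so observing c
# unblocks the path a -> e <- f -> b: a and b are NOT d-separated.
print(d_sep(G, {"a"}, {"b"}, {"c"}))  # False

# Example 2: f is tail-to-tail on that path and f is observed, so every
# path from a to b is blocked.
print(d_sep(G, {"a"}, {"b"}, {"f"}))  # True
```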

Naive Bayes Classifier

x = (x_1, x_2, ..., x_D), y ∈ {1, ..., C}
Parameters: θ = {θ_{jc}}, j ∈ {1, ..., D}, c ∈ {1, ..., C}
The distribution of the features depends on the label!

p(x | y = c, θ) = \prod_{j=1}^{D} p(x_j | y = c, θ_{jc})

p(y = c) = π_c
p(y = c | x) ∝ p(y = c) p(x | y = c, θ)

Features are conditionally independent given the class label!
What is the graphical model for the naive Bayes classifier?

Naive Bayes Classifier

Gaussian features: p(x_j | y = c, θ_{jc}) = N(x_j | μ_{jc}, σ_{jc}^2)

Bernoulli features: p(x_j | y = c, θ_{jc}) = Ber(x_j | θ_{jc})

Classification with the Naive Bayes Classifier for Bernoulli Features

Estimate π_c and θ_{jc} from the data \mathcal{D} = {(x_i, y_i)}_{i=1}^{n}.

p(y = c | x, \mathcal{D}) ∝ p(y = c | \mathcal{D}) \prod_{j=1}^{D} p(x_j | y = c, \mathcal{D})
                          ∝ \hat{π}_c \prod_{j=1}^{D} \hat{θ}_{jc}^{I(x_j = 1)} (1 - \hat{θ}_{jc})^{I(x_j = 0)}

Then,

\hat{c} = arg max_{c ∈ {1, ..., C}} \hat{π}_c \prod_{j=1}^{D} \hat{θ}_{jc}^{I(x_j = 1)} (1 - \hat{θ}_{jc})^{I(x_j = 0)}

Estimating the Parameters

Maximum likelihood estimation!

L(θ, π) = \prod_{i=1}^{n} p(x_i, y_i | θ, π)
        = \prod_{i=1}^{n} [ p(y_i | π) \prod_{j=1}^{D} p(x_{ij} | y_i, θ_j) ]
        = \prod_{i=1}^{n} [ ( \prod_{c=1}^{C} π_c^{I(y_i = c)} ) ( \prod_{j=1}^{D} \prod_{c=1}^{C} p(x_{ij} | y_i = c, θ_{jc})^{I(y_i = c)} ) ]

Estimating the Parameters

Log-likelihood:

l(θ, π) = log L(θ, π)
        = \sum_{i=1}^{n} \sum_{c=1}^{C} I(y_i = c) log π_c + \sum_{i=1}^{n} \sum_{j=1}^{D} \sum_{c=1}^{C} I(y_i = c) log p(x_{ij} | y_i = c, θ_{jc})

If x_{ij} ~ Bernoulli(θ_{jc}), then

p(x_{ij} | y_i = c, θ_{jc}) = θ_{jc}^{I(x_{ij} = 1)} (1 - θ_{jc})^{I(x_{ij} = 0)}

Hence,

l(θ, π) = \sum_{i=1}^{n} \sum_{c=1}^{C} I(y_i = c) log π_c + \sum_{i=1}^{n} \sum_{j=1}^{D} \sum_{c=1}^{C} I(y_i = c) [ I(x_{ij} = 1) log θ_{jc} + I(x_{ij} = 0) log(1 - θ_{jc}) ]

Estimating the Parameters: Bernoulli Case

(\hat{θ}_{MLE}, \hat{π}_{MLE}) = arg max_{θ, π} l(θ, π) subject to \sum_{c=1}^{C} π_c = 1

The optimization problem above is solved by the method of Lagrange multipliers.

Result:

\hat{π}_c = n_c / n, where n_c = \sum_{i=1}^{n} I(y_i = c)
\hat{θ}_{jc} = n_{jc} / n_c, where n_{jc} = \sum_{i=1}^{n} I(x_{ij} = 1, y_i = c)
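
A minimal sketch putting these MLE formulas and the classification rule above together, assuming numpy, binary feature matrices, and 0-indexed labels (all names here are ours):

```python
import numpy as np

def fit(X, y, n_classes):
    """MLE for Bernoulli naive Bayes: pi_c = n_c / n, theta_jc = n_jc / n_c."""
    n, D = X.shape
    pi = np.zeros(n_classes)
    theta = np.zeros((D, n_classes))
    for c in range(n_classes):
        Xc = X[y == c]                       # the n_c rows with label c
        pi[c] = len(Xc) / n
        theta[:, c] = Xc.sum(axis=0) / len(Xc)
    return pi, theta

def predict(x, pi, theta, eps=1e-12):
    # Maximize the log posterior (log space for numerical stability;
    # eps guards against log(0) when a count is zero).
    log_post = (np.log(pi + eps)
                + x @ np.log(theta + eps)
                + (1 - x) @ np.log(1 - theta + eps))
    return int(np.argmax(log_post))

# Toy usage: two classes, three binary features.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 1]])
y = np.array([0, 0, 0, 1, 1])
pi, theta = fit(X, y, n_classes=2)
print(predict(np.array([1, 0, 1]), pi, theta))  # -> 0
```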

Markov Blanket

For a node x_i, its Markov blanket is the minimal set of nodes that isolates x_i from the rest of the graph:

p(x_i | x_{j ≠ i}) = p(x_1, ..., x_D) / ∫ p(x_1, ..., x_D) dx_i

All factors not involving x_i cancel in this ratio, so in a directed graph the Markov blanket of x_i consists of its parents, its children, and its co-parents (the other parents of its children).

[Figure: x_i with its parents, children, and co-parents highlighted]

Undirected Graphical Models (Markov Random Fields)

No arrows!

[Figure: an undirected graph on nodes a, b, c]

Conditional independence properties?
Markov blanket?

Conditional Independence

When x_i and x_j are NOT neighbors:

p(x_i, x_j | all other nodes) = ?

Factorization of the Joint Distribution

Clique: a subset of fully connected nodes in a graph.
Cliques?

[Figure: Graph 1 on nodes x_1, x_2, x_3 and Graph 2 on nodes x_1, x_2, x_3, x_4]

Maximal clique: a clique that cannot be extended by adding one more node.
Maximal cliques?

Factorization of the Joint Distribution

x = (x_1, x_2, ..., x_D): all variables
M: set of maximal cliques
x_c, c ∈ M: the variables in clique c
ψ_c(·): non-negative potential function

p(x) = (1/Z) \prod_{c ∈ M} ψ_c(x_c)

where (for discrete variables) Z = \sum_{x} \prod_{c ∈ M} ψ_c(x_c) is the normalization constant.

Example: Boltzmann distribution, ψ_c(x_c) = exp(−E(x_c))

How hard is it to compute Z? (For D binary variables the sum runs over 2^D configurations, so brute force is exponential in D.)
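
To make the exponential cost concrete, here is a brute-force computation of Z for a tiny pairwise MRF with Boltzmann potentials; the chain structure and the coupling value J are illustrative assumptions:

```python
from itertools import product
import math

# Pairwise chain x1 - x2 - x3 with potentials psi(xi, xj) = exp(J * xi * xj).
J = 0.5
edges = [(0, 1), (1, 2)]

def unnormalized(x):
    return math.exp(sum(J * x[i] * x[j] for i, j in edges))

# The sum over x has 2^D terms, so this scales exponentially in D.
Z = sum(unnormalized(x) for x in product((-1, +1), repeat=3))
print(Z)
```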

Image Denoising

[Figure: original binary image and its noisy version]

y_i ∈ {−1, +1}: pixel i of the noisy image
x_i ∈ {−1, +1}: pixel i of the original image

Image Denoising

Cliques? Energy function?

Goal: given {y_i}, find {\hat{x}_i} such that {\hat{x}_i} has maximum probability.

Coordinate-wise descent on the energy (iterated conditional modes; the variables are discrete, so there is no gradient):
Take pixel j. Evaluate the energy for x_j = +1 and for x_j = −1 with the other variables held fixed.
Choose the x_j that minimizes the energy.
Stop when there is no change during a full sweep through the pixels.

This guarantees only a local maximum of the probability. A sketch follows below.
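
The slide leaves the cliques and the energy function as questions; a common choice (used for this figure in Bishop Sec. 8.3.3, here without the bias term) is the Ising-style energy E(x, y) = −β \sum_{(i,j) neighbors} x_i x_j − η \sum_i x_i y_i. A minimal sketch of the coordinate-wise (ICM) sweep under that assumed energy, with illustrative β and η values:

```python
import numpy as np

def icm_denoise(y, beta=1.0, eta=2.0):
    """Iterated conditional modes under the assumed Ising-style energy."""
    x = y.copy()                     # initialize with the noisy image
    H, W = y.shape
    changed = True
    while changed:                   # sweep until no pixel flips
        changed = False
        for i in range(H):
            for j in range(W):
                # Only the terms touching pixel (i, j) change when it flips.
                nbrs = sum(x[a, b]
                           for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                           if 0 <= a < H and 0 <= b < W)
                def local_energy(v):
                    return -beta * v * nbrs - eta * v * y[i, j]
                best = +1 if local_energy(+1) < local_energy(-1) else -1
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
    return x

# Toy usage: a 16x16 image of -1s with a +1 square, 10% flip noise.
rng = np.random.default_rng(0)
orig = -np.ones((16, 16), dtype=int)
orig[4:12, 4:12] = 1
noisy = orig * rng.choice([1, -1], size=orig.shape, p=[0.9, 0.1])
print((icm_denoise(noisy) != orig).mean())  # fraction of wrong pixels
```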

Image Denoising

[Figure: original image, noisy image, and denoised image]

Figure 8.30 from "Pattern Recognition and Machine Learning" by Bishop.