
Discrete Inference & Learning in Artificial Vision
Lecture 2
Reparameterization and dynamic programming

M. Nikos Paragios & M. Pawan Kumar

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Undirected Graph
Graph G
Vertices V = {V1, V2, ..., V9}
Edges E

Markov Random Field (MRF)

[Figure: graph with vertices V1 to V9]

Vertices are associated with random variables X

Markov Random Field (MRF)

[Figure: the graph with a random variable at each vertex, X1 to X9]

Vertices are associated with random variables X

Markov Random Field (MRF)

X1 to X9 are unobserved random variables
Edges define a neighborhood over random variables

MRF

Variable $X_p$ takes a value, or label, $x_p$ from a discrete, finite set $L = \{l_1, l_2, \ldots, l_h\}$
X = x is called a labeling

MRF

Total number of labelings is $h^n$ for n random variables
Probability of a labeling is P(x)

MRF

MRF assumes the Markovian property for P(x)

MRF

$X_p$ is conditionally independent of $X_q$ given $X_p$'s neighbors
Hammersley-Clifford Theorem

MRF
Potential $\psi_{12}(x_1, x_2)$ on the edge between X1 and X2
Potential $\psi_{56}(x_5, x_6)$ on the edge between X5 and X6
Probability P(x) can be decomposed into clique potentials

MRF
Potentials $\psi_{12}(x_1, x_2)$, $\psi_{56}(x_5, x_6)$, and so on for every edge
Probability $P(x) \propto \prod_{(p,q)} \psi_{pq}(x_p, x_q)$

Outline
Models
Incorporating Data
Conditional Random Fields
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

MRF
[Figure: each variable X1 to X9 is attached to its observed data d1 to d9]
Potential $\psi_1(x_1, d_1)$ between X1 and its observed data d1
Probability $P(x) \propto \prod_{(p,q)} \psi_{pq}(x_p, x_q)$
Probability $P(d \mid x) \propto \prod_p \psi_p(x_p, d_p)$

MRF
Probability $P(x, d) = \frac{1}{Z} \prod_p \psi_p(x_p, d_p) \prod_{(p,q)} \psi_{pq}(x_p, x_q)$
Z is known as the partition function

MRF
High-order potential $\psi_{4578}(x_4, x_5, x_7, x_8)$ over the variables X4, X5, X7, X8

Pairwise MRF
Unary potential $\psi_1(x_1, d_1)$
Pairwise potential $\psi_{56}(x_5, x_6)$
Probability $P(x, d) = \frac{1}{Z} \prod_p \psi_p(x_p, d_p) \prod_{(p,q)} \psi_{pq}(x_p, x_q)$
Z is known as the partition function

Outline
Models
Incorporating Data
Conditional Random Fields
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Conditional Random Fields (CRF)


[Figure: random variables X1 to X9, each attached to its observed data d1 to d9]

CRF assumes the Markovian property for P(x|d)


Hammersley-Clifford Theorem

CRF
Probability $P(x \mid d) \propto \prod_p \psi_p(x_p; d) \prod_{(p,q)} \psi_{pq}(x_p, x_q; d)$
Clique potentials that depend on the data

CRF
Probability $P(x \mid d) = \frac{1}{Z} \prod_p \psi_p(x_p; d) \prod_{(p,q)} \psi_{pq}(x_p, x_q; d)$
Z is known as the partition function

MRF and CRF

Probability $P(x) = \frac{1}{Z} \prod_p \psi_p(x_p) \prod_{(p,q)} \psi_{pq}(x_p, x_q)$

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Exponential Family
Probability $P(x) = \frac{1}{Z} \prod_p \psi_p(x_p) \prod_{(p,q)} \psi_{pq}(x_p, x_q) = \frac{1}{Z} \exp(-E(x))$
Energy $E(x) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
Unary parameters (unary potentials) $\theta_p(x_p)$: analogous to $-\log \psi_p(x_p)$
Pairwise parameters (pairwise potentials) $\theta_{pq}(x_p, x_q)$: analogous to $-\log \psi_{pq}(x_p, x_q)$
Lower energy corresponds to higher probability
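The link between the product of potentials and the energy can be written out in one short derivation. This is a sketch that assumes the symbols $\psi$ for clique potentials and $\theta$ for energy parameters, and assumes the potentials are strictly positive so that the logarithm is defined.

```latex
\begin{aligned}
\theta_p(x_p) &= -\log \psi_p(x_p), \qquad
\theta_{pq}(x_p, x_q) = -\log \psi_{pq}(x_p, x_q) \\
\prod_p \psi_p(x_p) \prod_{(p,q)} \psi_{pq}(x_p, x_q)
  &= \exp\Big(-\sum_p \theta_p(x_p) - \sum_{(p,q)} \theta_{pq}(x_p, x_q)\Big)
   = \exp(-E(x)) \\
P(x) &= \frac{\exp(-E(x))}{Z}, \qquad Z = \sum_{x'} \exp(-E(x')) \\
\arg\max_x P(x) &= \arg\min_x E(x)
\end{aligned}
```

Since Z does not depend on x, finding the most probable labeling is the same problem as minimizing the energy.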

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Energy Function
Random variables X = {Xp, Xq, Xr, Xs}
Labels L = {l0, l1, ...}
Labelling x

Energy Function
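[Figure: chain Xp, Xq, Xr, Xs with a unary potential for each variable and each label]
$E(x) = \sum_p \theta_p(x_p)$
Unary potential $\theta_p(x_p)$
Easy to minimize: each variable can be minimized independently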

Energy Function
[Figure: chain Xp, Xq, Xr, Xs; consecutive variables are joined by an edge]
Neighbors = { (p,q), (q,r), (r,s) }

Energy Function
Pairwise potential $\theta_{pq}(x_p, x_q)$
$E(x) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$

Energy Function
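Unary potentials $\theta_p(i)$:
label | Xp | Xq | Xr | Xs
l0 | 5 | 2 | 3 | 7
l1 | 2 | 4 | 6 | 3

Pairwise potentials $\theta_{pq}(i,k)$:
(i,k) | edge (p,q) | edge (q,r) | edge (r,s)
(l0,l0) | 0 | 1 | 0
(l0,l1) | 1 | 3 | 1
(l1,l0) | 1 | 2 | 4
(l1,l1) | 0 | 0 | 1

$E(x) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$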

Energy Function
[Figure: chain Xp, Xq, Xr, Xs with its unary and pairwise potentials]
$E(x; \theta) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
Parameter $\theta$

Outline
Models
Exponential Family
Problem Formulation
Energy Minimization
Computing Min-Marginals
Reparameterization
Dynamic Programming

Energy Minimization
$E(x; \theta) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
Labelling (xp, xq, xr, xs) = (1, 0, 0, 1): E = 2 + 1 + 2 + 1 + 3 + 1 + 3 = 13

Energy Minimization
$E(x; \theta) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
Labelling (xp, xq, xr, xs) = (0, 1, 1, 0): E = 5 + 1 + 4 + 0 + 6 + 4 + 7 = 27

Energy Minimization
$E(x; \theta) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
$x^* = \arg\min_x E(x; \theta)$
$e^* = \min_x E(x; \theta) = E(x^*; \theta)$

Energy Minimization

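x* = {1, 0, 0, 1}
e* = 13

16 possible labellings
xp | xq | xr | xs | E(x)
0 | 0 | 0 | 0 | 18
0 | 0 | 0 | 1 | 15
0 | 0 | 1 | 0 | 27
0 | 0 | 1 | 1 | 20
0 | 1 | 0 | 0 | 22
0 | 1 | 0 | 1 | 19
0 | 1 | 1 | 0 | 27
0 | 1 | 1 | 1 | 20
1 | 0 | 0 | 0 | 16
1 | 0 | 0 | 1 | 13
1 | 0 | 1 | 0 | 25
1 | 0 | 1 | 1 | 18
1 | 1 | 0 | 0 | 18
1 | 1 | 0 | 1 | 15
1 | 1 | 1 | 0 | 23
1 | 1 | 1 | 1 | 16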
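The exhaustive search over the 16 labellings can be sketched in a few lines of Python. The potential values below are an assumption: they are inferred from the worked sums (13 and 27) and the energies listed for the four-variable chain. The names `unary`, `pairwise`, `energy` and `brute_force_map` are illustrative only.

```python
from itertools import product

# Assumed potentials for the chain Xp - Xq - Xr - Xs (labels l0 = 0, l1 = 1),
# inferred from the energies listed for the 16 labellings.
unary = [[5, 2], [2, 4], [3, 6], [7, 3]]   # unary[v][label]
pairwise = [                               # pairwise[e][left label][right label]
    [[0, 1], [1, 0]],   # edge (p, q)
    [[1, 3], [2, 0]],   # edge (q, r)
    [[0, 1], [4, 1]],   # edge (r, s)
]

def energy(x):
    """E(x; theta): sum of unary terms plus sum of pairwise terms on the chain."""
    e = sum(unary[v][x[v]] for v in range(len(x)))
    e += sum(pairwise[v][x[v]][x[v + 1]] for v in range(len(x) - 1))
    return e

def brute_force_map(num_labels=2):
    """Enumerate all h^n labellings and return (x*, e*)."""
    best = min(product(range(num_labels), repeat=len(unary)), key=energy)
    return best, energy(best)

x_star, e_star = brute_force_map()
print(x_star, e_star)  # (1, 0, 0, 1) 13
```

The cost of this search grows as h^n, which is what motivates the dynamic programming approach later in the lecture.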

Outline
Models
Exponential Family
Problem Formulation
Energy Minimization
Computing Min-Marginals
Reparameterization
Dynamic Programming

Min-Marginals
Min-marginal $e_p(i) = \min_x E(x; \theta)$ such that $x_p = i$

Min-Marginals

$e_p(0) = 15$
Minimum energy over the 8 labellings with xp = 0, attained at (xp, xq, xr, xs) = (0, 0, 0, 1)

Min-Marginals

$e_p(1) = 13$
Minimum energy over the 8 labellings with xp = 1, attained at (xp, xq, xr, xs) = (1, 0, 0, 1)

Min-Marginals and Energy Minimization


Minimum min-marginal of any variable = energy of MAP labelling
$\min_i e_p(i) = \min_i \left( \min_x E(x; \theta) \text{ such that } x_p = i \right) = \min_x E(x; \theta)$
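A brute-force computation of min-marginals makes the identity above easy to check. This is a sketch under the assumption that the chain's potentials are the ones inferred from the lecture's energies; `min_marginal` is a hypothetical helper name.

```python
from itertools import product

# Assumed potentials for the chain Xp - Xq - Xr - Xs (labels l0 = 0, l1 = 1).
unary = [[5, 2], [2, 4], [3, 6], [7, 3]]
pairwise = [
    [[0, 1], [1, 0]],   # edge (p, q)
    [[1, 3], [2, 0]],   # edge (q, r)
    [[0, 1], [4, 1]],   # edge (r, s)
]

def energy(x):
    """E(x; theta) for a labelling x of the chain."""
    e = sum(unary[v][x[v]] for v in range(len(x)))
    e += sum(pairwise[v][x[v]][x[v + 1]] for v in range(len(x) - 1))
    return e

def min_marginal(v, i, num_labels=2):
    """e_v(i): minimum energy over all labellings with x_v fixed to label i."""
    labellings = product(range(num_labels), repeat=len(unary))
    return min(energy(x) for x in labellings if x[v] == i)

e_p = [min_marginal(0, i) for i in range(2)]
print(e_p)  # [15, 13]

# The smallest min-marginal of every variable equals the MAP energy.
map_energy = min(energy(x) for x in product(range(2), repeat=4))
print(all(min(min_marginal(v, i) for i in range(2)) == map_energy for v in range(4)))  # True
```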

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Reparameterization
Add a constant to all $\theta_p(i)$
Subtract that constant from all $\theta_q(k)$

Example with constant 2: $\theta_p(0)$: 5 becomes 5 + 2, $\theta_p(1)$: 2 becomes 2 + 2, $\theta_q(0)$: 2 becomes 2 - 2, $\theta_q(1)$: 4 becomes 4 - 2

xp | xq | E(x; θ')
0 | 0 | 7 + 2 - 2
0 | 1 | 10 + 2 - 2
1 | 0 | 5 + 2 - 2
1 | 1 | 6 + 2 - 2

$E(x; \theta') = E(x; \theta)$

Reparameterization
Add a constant to one $\theta_q(k)$
Subtract that constant from $\theta_{pq}(i,k)$ for all i

Example with constant 3 and k = 1: $\theta_q(1)$: 4 becomes 4 + 3, $\theta_{pq}(0,1)$: 1 becomes 1 - 3, $\theta_{pq}(1,1)$: 0 becomes 0 - 3

xp | xq | E(x; θ')
0 | 0 | 7
0 | 1 | 10 - 3 + 3
1 | 0 | 5
1 | 1 | 6 - 3 + 3

$E(x; \theta') = E(x; \theta)$

Reparameterization
[Figure: worked examples of reparameterizing the edge (p,q)]
$\theta'_p(i) = \theta_p(i) + M_{qp}(i)$
$\theta'_q(k) = \theta_q(k) + M_{pq}(k)$
$\theta'_{pq}(i,k) = \theta_{pq}(i,k) - M_{pq}(k) - M_{qp}(i)$
$E(x; \theta') = E(x; \theta)$

Reparameterization
$\theta'$ is a reparameterization of $\theta$, iff
$E(x; \theta') = E(x; \theta)$, for all x

Equivalently (Kolmogorov, PAMI, 2006):
$\theta'_p(i) = \theta_p(i) + M_{qp}(i)$
$\theta'_q(k) = \theta_q(k) + M_{pq}(k)$
$\theta'_{pq}(i,k) = \theta_{pq}(i,k) - M_{pq}(k) - M_{qp}(i)$
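The invariance of the energy under reparameterization can be checked numerically. The sketch below assumes the two-variable potentials used in the lecture's examples (unaries 5, 2 for Xp and 2, 4 for Xq) and picks arbitrary illustrative values for the constants M_pq and M_qp; the function names are hypothetical.

```python
from itertools import product

# Assumed two-variable example: theta_p, theta_q and theta_pq[i][k].
theta_p = [5, 2]
theta_q = [2, 4]
theta_pq = [[0, 1], [1, 0]]

def energy(x, tp, tq, tpq):
    """E(x; theta) for a single edge (p, q) with x = (x_p, x_q)."""
    i, k = x
    return tp[i] + tq[k] + tpq[i][k]

def reparameterize(tp, tq, tpq, m_pq, m_qp):
    """theta'_p(i)    = theta_p(i) + M_qp(i)
       theta'_q(k)    = theta_q(k) + M_pq(k)
       theta'_pq(i,k) = theta_pq(i,k) - M_pq(k) - M_qp(i)"""
    new_p = [tp[i] + m_qp[i] for i in range(len(tp))]
    new_q = [tq[k] + m_pq[k] for k in range(len(tq))]
    new_pq = [[tpq[i][k] - m_pq[k] - m_qp[i] for k in range(len(tq))]
              for i in range(len(tp))]
    return new_p, new_q, new_pq

# Arbitrary constants, chosen only for illustration.
m_pq = [3, -1]
m_qp = [2, 7]
new_p, new_q, new_pq = reparameterize(theta_p, theta_q, theta_pq, m_pq, m_qp)

unchanged = all(
    energy(x, theta_p, theta_q, theta_pq) == energy(x, new_p, new_q, new_pq)
    for x in product(range(2), repeat=2)
)
print(unchanged)  # True
```

Every labelling uses exactly one entry from each of the three tables, so each constant is added once and subtracted once, whatever its value.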

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming

Dynamic Programming
Some problems are easy
Dynamic programming is exact for chains
Exact for trees

Clever Reparameterization

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming
Two Variables
Three Variables
Chains and Trees

Two Variables
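[Figure: edge (p,q) with unary potentials $\theta_p = (5, 2)$ and $\theta_q = (2, 4)$ for labels (l0, l1), and pairwise potentials $\theta_{pq}(0,0) = 0$, $\theta_{pq}(0,1) = 1$, $\theta_{pq}(1,0) = 1$, $\theta_{pq}(1,1) = 0$]
Add a constant to one $\theta_q(k)$
Subtract that constant from $\theta_{pq}(i,k)$ for all i
Choose the right constant: $\theta'_q(k) = e_q(k)$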

Two Variables
$M_{pq}(0) = \min \{ \theta_p(0) + \theta_{pq}(0,0),\ \theta_p(1) + \theta_{pq}(1,0) \} = \min \{ 5 + 0,\ 2 + 1 \} = 3$
Choose the right constant: $\theta'_q(k) = e_q(k)$

Two Variables
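After reparameterizing with the constant 3 for label 0: $\theta'_q(0) = 2 + 3 = 5$, $\theta'_{pq}(0,0) = 0 - 3 = -3$, $\theta'_{pq}(1,0) = 1 - 3 = -2$
$\theta'_q(0) = e_q(0)$, attained with xp = 1
Potentials along the red path add up to 0: $\theta_p(1) + \theta'_{pq}(1,0) = 2 - 2 = 0$
Choose the right constant: $\theta'_q(k) = e_q(k)$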

Two Variables
$M_{pq}(1) = \min \{ \theta_p(0) + \theta_{pq}(0,1),\ \theta_p(1) + \theta_{pq}(1,1) \} = \min \{ 5 + 1,\ 2 + 0 \} = 2$
Choose the right constant: $\theta'_q(k) = e_q(k)$

Two Variables
After reparameterizing for both labels: $\theta'_q = (5, 6)$, $\theta'_{pq}(0,0) = -3$, $\theta'_{pq}(1,0) = -2$, $\theta'_{pq}(0,1) = -1$, $\theta'_{pq}(1,1) = -2$
$\theta'_q(0) = e_q(0) = 5$, attained with xp = 1
$\theta'_q(1) = e_q(1) = 6$, attained with xp = 1
We get all the min-marginals of Xq
Minimum of min-marginals = MAP estimate: xq = 0, xp = 1
Choose the right constant: $\theta'_q(k) = e_q(k)$

Computational Complexity
Number of reparameterization constants = h
Complexity for each constant = O(h)
Total complexity = $O(h^2)$
Same complexity as brute force
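The two-variable procedure amounts to one pass over the edge (p,q): for each label k of Xq, take the minimum over the labels of Xp, add it to the unary of Xq and subtract it from the pairwise column. The sketch below assumes the potentials of the lecture's two-variable example; `pass_message` is a hypothetical name, not one used in the lecture.

```python
# Assumed two-variable example (labels l0 = 0, l1 = 1).
theta_p = [5, 2]
theta_q = [2, 4]
theta_pq = [[0, 1], [1, 0]]   # theta_pq[i][k]

def pass_message(tp, tq, tpq):
    """Reparameterize edge (p, q) so that the new unary of Xq holds its min-marginals.

    Returns (new_q, new_pq, argmin_p), where argmin_p[k] is the best label
    of Xp given x_q = k.
    """
    new_q = []
    new_pq = [row[:] for row in tpq]        # copy, so the input stays untouched
    argmin_p = []
    for k in range(len(tq)):                # h constants ...
        costs = [tp[i] + tpq[i][k] for i in range(len(tp))]   # ... each O(h)
        m = min(costs)                      # the constant M_pq(k)
        argmin_p.append(costs.index(m))
        new_q.append(tq[k] + m)             # add the constant to theta_q(k)
        for i in range(len(tp)):
            new_pq[i][k] = tpq[i][k] - m    # subtract it from theta_pq(i, k)
    return new_q, new_pq, argmin_p

new_q, new_pq, argmin_p = pass_message(theta_p, theta_q, theta_pq)
print(new_q)     # [5, 6]
print(new_pq)    # [[-3, -1], [-2, -2]]
print(argmin_p)  # [1, 1]

x_q = new_q.index(min(new_q))
print((argmin_p[x_q], x_q))  # (1, 0)
```

The nested loops make the O(h^2) cost explicit: h constants, each found with an O(h) minimization.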

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming
Two Variables
Three Variables
Chains and Trees

Three Variables

[Figure: chain Xp, Xq, Xr with unary potentials $\theta_p = (5, 2)$, $\theta_q = (2, 4)$, $\theta_r = (3, 6)$ for labels (l0, l1)]
Reparameterize the edge (p,q) as before

Three Variables

Reparameterize the edge (p,q) as before
Result: $\theta'_q = (5, 6)$, $\theta'_{pq}(0,0) = -3$, $\theta'_{pq}(1,0) = -2$, $\theta'_{pq}(0,1) = -1$, $\theta'_{pq}(1,1) = -2$
For both labels of Xq the best label of Xp is xp = 1
Potentials along the red path add up to 0

Three Variables

Reparameterize the edge (q,r) as before
Result: $\theta'_r = (9, 12)$, $\theta'_{qr}(0,0) = -5$, $\theta'_{qr}(1,0) = -4$, $\theta'_{qr}(0,1) = -3$, $\theta'_{qr}(1,1) = -6$
For xr = 0 the best label of Xq is xq = 0; for xr = 1 it is xq = 1; in both cases xp = 1
Potentials along the red path add up to 0

Three Variables

$\theta'_r(0) = e_r(0) = 9$, attained with xq = 0, xp = 1
$\theta'_r(1) = e_r(1) = 12$, attained with xq = 1, xp = 1
Minimum of min-marginals gives xr = 0; backtracking gives xq = 0, then xp = 1

Computational Complexity
Number of reparameterization constants = 2h
Complexity for each constant = O(h)
Total complexity = $O(2h^2) = O(h^2)$
Better than brute-force $O(h^3)$
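The three-variable case chains two such passes and then backtracks. The sketch below assumes the potentials of the lecture's three-variable example (the first three variables of the four-variable chain); the helper name `pass_message` is hypothetical.

```python
# Assumed three-variable example (labels l0 = 0, l1 = 1).
theta_p, theta_q, theta_r = [5, 2], [2, 4], [3, 6]
theta_pq = [[0, 1], [1, 0]]   # theta_pq[i][k]
theta_qr = [[1, 3], [2, 0]]   # theta_qr[k][l]

def pass_message(unary_src, pairwise):
    """Return (M, argmin) with M[k] = min_i unary_src[i] + pairwise[i][k]."""
    msgs, args = [], []
    for k in range(len(pairwise[0])):
        costs = [unary_src[i] + pairwise[i][k] for i in range(len(unary_src))]
        m = min(costs)
        msgs.append(m)
        args.append(costs.index(m))
    return msgs, args

# Edge (p, q): the reparameterized unary of Xq.
m_pq, best_p = pass_message(theta_p, theta_pq)
theta_q_new = [theta_q[k] + m_pq[k] for k in range(2)]

# Edge (q, r): the reparameterized unary of Xr holds the min-marginals e_r.
m_qr, best_q = pass_message(theta_q_new, theta_qr)
e_r = [theta_r[k] + m_qr[k] for k in range(2)]

# Pick the minimum min-marginal and backtrack.
x_r = e_r.index(min(e_r))
x_q = best_q[x_r]
x_p = best_p[x_q]
print(theta_q_new)             # [5, 6]
print(e_r, (x_p, x_q, x_r))    # [9, 12] (1, 0, 0)
```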

Outline
Models
Exponential Family
Problem Formulation
Reparameterization
Dynamic Programming
Two Variables
Three Variables
Chains and Trees

Chains

[Figure: chain X1, X2, X3, ..., Xn]
Reparameterize the edge (1,2)
Reparameterize the edge (2,3)
Reparameterize the edge (3,4)
...
Reparameterize the edge (n-1,n)
Min-marginals $e_n(i)$ for all labels

Start from left and move towards right
Pick the minimum of min-marginals
Backtrack to find the best labeling x

Computational Complexity
Number of reparameterization constants = (n-1)h
Complexity for each constant = O(h)
Total complexity = $O(nh^2)$
Better than brute-force $O(h^n)$
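The left-to-right sweep with backtracking can be sketched as a small min-sum dynamic program. The function `chain_map` is a hypothetical name. The example potentials are assumed to be those of the lecture's four-variable chain, and the code tracks only the running unaries rather than storing the reparameterized pairwise terms.

```python
def chain_map(unary, pairwise):
    """Min-sum dynamic programming on a chain.

    unary[v][i] is the unary potential of variable v at label i;
    pairwise[v][i][k] is the potential of edge (v, v+1).
    Returns (best labelling, its energy, min-marginals of the last variable).
    """
    cost = list(unary[0])
    back = []                      # back[v-1][k]: best label of v-1 given x_v = k
    for v in range(1, len(unary)):
        new_cost, args = [], []
        for k in range(len(unary[v])):
            cands = [cost[i] + pairwise[v - 1][i][k] for i in range(len(cost))]
            m = min(cands)
            new_cost.append(unary[v][k] + m)
            args.append(cands.index(m))
        cost = new_cost
        back.append(args)
    # Pick the minimum of the min-marginals, then backtrack right to left.
    x = [cost.index(min(cost))]
    for args in reversed(back):
        x.append(args[x[-1]])
    x.reverse()
    return x, min(cost), cost

# Assumed potentials for the chain Xp - Xq - Xr - Xs.
unary = [[5, 2], [2, 4], [3, 6], [7, 3]]
pairwise = [
    [[0, 1], [1, 0]],   # edge (p, q)
    [[1, 3], [2, 0]],   # edge (q, r)
    [[0, 1], [4, 1]],   # edge (r, s)
]
print(chain_map(unary, pairwise))  # ([1, 0, 0, 1], 13, [16, 13])
```

Each of the n-1 edges costs O(h^2), which gives the O(nh^2) total stated above.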

Trees
[Figure: tree rooted at X1 with children X2 and X3; X2 has children X4 and X5; X3 has children X6 and X7]
Reparameterize the edge (4,2)
Reparameterize the edge (5,2)
Reparameterize the edge (6,3)
Reparameterize the edge (7,3)
Reparameterize the edge (2,1)
Reparameterize the edge (3,1)
Min-marginals $e_1(i)$ for all labels

Start from leaves and move towards root
Pick the minimum of min-marginals
Backtrack to find the best labeling x

Computational Complexity
Number of reparameterization constants = (n-1)h
Complexity for each constant = O(h)
Total complexity = $O(nh^2)$
Better than brute-force $O(h^n)$
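The leaves-to-root sweep can be sketched in the same way as the chain case. The tree shape follows the edges listed in the lecture, (4,2), (5,2), (6,3), (7,3), (2,1), (3,1). The potential values, however, are entirely hypothetical and chosen only for illustration, as is the function name `tree_map`.

```python
def tree_map(unary, children, pairwise, root):
    """Min-sum dynamic programming on a rooted tree.

    unary[v][i]: unary potential; children[v]: list of children of v;
    pairwise[(c, v)][i][k]: potential for child c at label i, parent v at label k.
    Returns (labels dict, minimum energy, min-marginals of the root).
    """
    best_child_label = {}   # (child, parent label) -> best child label

    def upward(v):
        # Cost of the subtree rooted at v for each label of v.
        cost = list(unary[v])
        for c in children.get(v, []):
            c_cost = upward(c)
            for k in range(len(cost)):
                cands = [c_cost[i] + pairwise[(c, v)][i][k]
                         for i in range(len(c_cost))]
                m = min(cands)
                cost[k] += m
                best_child_label[(c, k)] = cands.index(m)
        return cost

    root_cost = upward(root)
    # Pick the minimum of the root's min-marginals, then backtrack to the leaves.
    labels = {root: root_cost.index(min(root_cost))}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            labels[c] = best_child_label[(c, labels[v])]
            stack.append(c)
    return labels, min(root_cost), root_cost

# Tree structure from the lecture; all potential values below are hypothetical.
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
unary = {1: [1, 3], 2: [2, 0], 3: [4, 1], 4: [0, 2], 5: [3, 1], 6: [2, 2], 7: [5, 0]}
potts = [[0, 2], [2, 0]]
pairwise = {(c, v): potts for v, cs in children.items() for c in cs}

labels, e_star, root_cost = tree_map(unary, children, pairwise, root=1)
print(labels, e_star, root_cost)
```

The recursion visits each edge once and does O(h^2) work per edge, matching the O(nh^2) bound. Very deep trees would need an iterative upward pass to avoid Python's recursion limit.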
