
Torr Vision Group, Engineering Department

Semantic Image Segmentation with Deep Learning
Sadeep Jayasumana

07/10/2015

Collaborators:
Bernardino Romera-Paredes
Shuai Zheng
Philip Torr

Live Demo - http://crfasrnn.torr.vision/



Outline

• Semantic segmentation
• Why?
• CNNs for pixel-wise prediction
• CRFs
• CRF as RNN
• Conclusion

Semantic Segmentation
• Recognizing and delineating objects in an image
• Classifying each pixel in the image

Why Semantic Segmentation?


• To help partially sighted people by highlighting
important objects in their glasses

Why Semantic Segmentation?


• To let robots segment objects so that they can grasp
them

Why Semantic Segmentation?


• Road scene understanding
• Useful for autonomous navigation of cars and drones

Image taken from the Cityscapes dataset.

Why Semantic Segmentation?


• Useful tool for editing images

Why Semantic Segmentation?


• Medical purposes: e.g. segmenting
tumours, dental cavities, ...

Image courtesy of Mauricio Reyes

ISBI Challenge 2015, dental X-ray images

But How?
• Deep convolutional neural networks are successful at learning good representations of visual inputs.

• However, semantic segmentation requires a structured output: a label for every pixel, not a single class per image.


CNN for Pixel-wise Labelling


• Usual convolutional networks

• Fully convolutional networks

Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
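A minimal sketch of the fully convolutional idea (our illustration in PyTorch, not the authors' implementation; all layer sizes are made up): the fully connected classifier is recast as a 1×1 convolution, so the network accepts any input size and emits a coarse score map that is then upsampled to the input resolution.

```python
# Toy fully convolutional network: conv backbone + 1x1 conv classifier
# + bilinear upsampling back to the input size. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):           # 21 = PASCAL VOC labels
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 1/2 resolution
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 1/4 resolution
        )
        # a "fully connected" classifier recast as a 1x1 convolution
        self.classifier = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        scores = self.classifier(self.features(x))     # coarse score map
        return F.interpolate(scores, size=(h, w),      # upsample to input size
                             mode="bilinear", align_corners=False)

logits = TinyFCN()(torch.randn(1, 3, 224, 224))        # -> (1, 21, 224, 224)
```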


Fully Convolutional Networks


[Long et al., CVPR 2015]

+ Significantly improved the state of the art in semantic segmentation.
- Poor object delineation: e.g., spatial consistency is neglected.

[Figure: input image, FCN result, ground truth]

Conditional Random Fields (CRFs)


• A CRF can account for contextual information in the
image

[Figure: coarse output from the pixel-wise classifier → MRF/CRF modelling → output after CRF inference]


Conditional Random Fields (CRFs)


X_i ∈ {bg, cat, tree, person, …}   (e.g. X_i = bg, X_j = cat)

• Define a discrete random variable X_i for each pixel i.
• Each X_i can take a value from the label set.
• Connect the random variables to form a random field (MRF).
• Most probable assignment given the image → segmentation.

Finding the Best Assignment


Pr(X_1 = x_1, X_2 = x_2, …, X_n = x_n | I) = Pr(X = x | I)

Pr(X = x | I) = (1/Z(I)) exp(−E(x | I))

• Maximizing Pr(X = x | I) ⇔ minimizing the energy E(x | I).
• So we have formulated the problem as an energy minimization.
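A quick numerical sanity check (our illustration) that the Gibbs form makes the two views equivalent:

```python
# Pr(X = x | I) = exp(-E(x | I)) / Z: the lowest-energy labelling is
# exactly the most probable one.
import numpy as np

energies = np.array([2.0, 0.5, 3.1])                 # E(x | I) for 3 labellings
probs = np.exp(-energies) / np.exp(-energies).sum()  # divide by Z to normalise
assert probs.argmax() == energies.argmin()           # same winner either way
print(probs)                                         # ~[0.17, 0.77, 0.06]
```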

E(x | I) = E_unary(x) + E_pairwise(x)

Unary energy
• ψ_u(X_i = x_i) = ?
• Your label doesn't agree with the initial classifier → you pay a penalty.

Pairwise energy
• ψ_p(X_i = x_i, X_j = x_j) = ?
• You assign different labels to two very similar pixels → you pay a penalty.
• How do you measure similarity? (a Gaussian kernel on position and colour is sketched below)
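A sketch of evaluating these two terms for a given labelling (our illustration; the Gaussian position-and-colour kernel and the parameters w, theta_p, theta_c are assumptions in the spirit of dense CRFs, not the exact form used in the talk):

```python
# Energy of a labelling = unary cost (disagreeing with the classifier)
# + pairwise cost (different labels on similar pixels).
import numpy as np

def crf_energy(labels, unary, pos, rgb, w=1.0, theta_p=3.0, theta_c=10.0):
    """labels: (N,) ints; unary: (N, L) per-pixel label costs;
    pos: (N, 2) coordinates; rgb: (N, 3) colours."""
    n = len(labels)
    e_unary = unary[np.arange(n), labels].sum()
    e_pair = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:        # Potts: only label disagreements pay
                # similarity: nearby pixels with similar colour score high
                sim = np.exp(-((pos[i] - pos[j]) ** 2).sum() / (2 * theta_p ** 2)
                             - ((rgb[i] - rgb[j]) ** 2).sum() / (2 * theta_c ** 2))
                e_pair += w * sim
    return e_unary + e_pair
```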



Dense CRF Formulation


[Krähenbühl & Koltun, NIPS 2011]

• Pairwise energies are defined for every pixel pair in the image:

E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)

• Exact inference is not feasible.
• Use approximate mean-field inference: approximate
  Pr(X = x) = (1/Z) exp(−E(x)) by a factorized distribution Q(x) = Π_i Q_i(x_i).
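A naive version of that inference (our illustration; the explicit (N, N) kernel matrix makes this O(N²), whereas the paper's key trick is computing the same message-passing step with fast high-dimensional filtering):

```python
# Mean-field for a dense CRF: repeatedly filter the current marginals Q,
# apply the label compatibility, add the unaries, and renormalise.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, kernel, compat, n_iters=5):
    """unary: (N, L) costs; kernel: (N, N) similarities (zero diagonal);
    compat: (L, L) label compatibility, e.g. Potts; returns Q: (N, L)."""
    Q = softmax(-unary)                  # initialise from the unaries
    for _ in range(n_iters):
        msg = kernel @ Q                 # message passing under the kernel
        pairwise = msg @ compat.T        # compatibility transform
        Q = softmax(-unary - pairwise)   # add unaries, normalise
    return Q
```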

Fully Connected CRFs as a CNN

Q, I → Bilateral filtering → Conv → Conv → + (add unaries) → SoftMax → Q

• One mean-field update of the marginals Q, conditioned on the image I, can be built entirely from standard CNN operations.

CRF as a Recurrent Neural Network

Q, I → Bilateral filtering → Conv → Conv → + (add unaries) → SoftMax → Q

Mean-field Iteration

• Each of these blocks is differentiable → we can backprop (see the sketch below).
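A sketch of the same update in differentiable tensor ops (ours, in PyTorch; a dense kernel matrix stands in for the bilateral filtering block, and the slide's two Conv blocks are folded into one matrix product):

```python
import torch

def mf_iteration(Q, unary, kernel, compat):
    """One mean-field step. Q, unary: (N, L); kernel: (N, N); compat: (L, L)."""
    msg = kernel @ Q                        # "Bilateral": filter Q under the kernel
    pairwise = msg @ compat.T               # "Conv": compatibility transform
    return torch.softmax(-unary - pairwise, dim=1)   # "+" unaries, "SoftMax"
```

Because every operation here is a standard differentiable tensor op, autograd gives gradients for the kernel and compatibility parameters for free.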

CRF as a Recurrent Neural Network


[Figure: image → unaries from the CNN → SoftMax → CRF mean-field iteration, applied repeatedly → CRF output; the repeated block is the "CRF as RNN" layer]

• Each of these blocks is differentiable → the whole pipeline can be trained end-to-end by backpropagation (a loop sketch follows).
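Unrolling that block for a fixed number of iterations with shared parameters is exactly a recurrent network; a self-contained sketch (ours):

```python
import torch

def mf_iteration(Q, unary, kernel, compat):
    # one differentiable mean-field step (as in the previous sketch)
    return torch.softmax(-unary - (kernel @ Q) @ compat.T, dim=1)

def crf_as_rnn(unary, kernel, compat, n_iters=5):
    """Recurrent state = marginals Q; parameters shared across iterations."""
    Q = torch.softmax(-unary, dim=1)       # initial marginals from FCN scores
    for _ in range(n_iters):
        Q = mf_iteration(Q, unary, kernel, compat)
    return Q

# toy end-to-end check: gradients flow back through all iterations
N, L = 4, 3
unary = torch.randn(N, L, requires_grad=True)
kernel = torch.rand(N, N).fill_diagonal_(0.0)   # similarities, no self-loops
compat = 1.0 - torch.eye(L)                     # Potts compatibility
crf_as_rnn(unary, kernel, compat).sum().backward()
```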

Putting Things Together

[Figure: image → FCN → CRF-RNN → segmentation]

Experiments

FCN [Long et al., 2015]           68.3
FCN + CRF [Chen et al., 2015]     69.5
FCN + CRF-RNN (Ours)              72.9

Try our demo: http://crfasrnn.torr.vision


Code & model: https://github.com/torrvision/crfasrnn

Shuai Zheng

Bernardino Romera-Paredes

Philip Torr

Examples

http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg

Examples

http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg

Examples

http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg

Examples

http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg

Not-so-good examples

http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg

Not-so-good examples

http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500

Tricky examples

http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png

Tricky examples

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg

Tricky examples

http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg


Conclusion
• CNNs yield a coarse prediction on pixel-labelling tasks.
• CRFs improve the result by accounting for the contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.

[Figure: CNN → CRF]
Thank You!
