Semantic Image Segmentation with Deep Learning
Sadeep Jayasumana
07/10/2015
Collaborators:
Bernardino Romera-Paredes
Shuai Zheng
Philip Torr
Torr Vision Group, Engineering Department
Outline
Semantic segmentation
Why?
CNNs for Pixelwise prediction
CRFs
CRF as RNN
Conclusion
Semantic Segmentation
• Recognizing and delineating objects in an image
• Classifying each pixel in the image
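To make "classifying each pixel" concrete, here is a minimal sketch in Python with NumPy. It assumes a classifier has already produced one score per label per pixel; the label set, the array layout, and the `segment` helper are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical label set, chosen only for illustration.
LABELS = ["background", "cat", "tree"]

def segment(scores: np.ndarray) -> np.ndarray:
    """Assign each pixel the label with the highest score.

    scores has shape (num_labels, height, width); the result has shape
    (height, width) and holds one label index per pixel.
    """
    return np.argmax(scores, axis=0)

# Toy 2x3 "image": random scores stand in for a real classifier's output.
rng = np.random.default_rng(0)
scores = rng.random((len(LABELS), 2, 3))
label_map = segment(scores)
print(label_map.shape)  # (2, 3): one label index per pixel
```

The output is a label map of the same spatial size as the input, which is what distinguishes segmentation from whole-image classification.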
But How?
• Deep convolutional neural networks are successful at
learning a good representation of the visual inputs.
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
[Figure: three stages. Coarse output from the pixel-wise classifier → MRF/CRF modelling → Output after the CRF inference]
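$\Pr(\mathbf{X} = \mathbf{x} \mid \mathbf{I}) = \frac{1}{Z} \exp\big(-E(\mathbf{x} \mid \mathbf{I})\big)$
Example label: $X_i = \text{cat}$
• Maximize $\Pr(\mathbf{X} = \mathbf{x} \mid \mathbf{I})$ → Minimize $E(\mathbf{x} \mid \mathbf{I})$
• So we have formulated the problem as an energy minimization.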
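The equivalence between maximizing the probability and minimizing the energy can be checked by brute force on a tiny problem. The sketch below is an illustration under stated assumptions, not the method from the talk: the three-pixel chain, the two labels, the `preferred` labeling, and the cost values 1.0 and 0.5 are all hypothetical.

```python
import itertools
import numpy as np

def gibbs_distribution(energy, num_pixels, num_labels):
    """Enumerate every labeling and return (labelings, probabilities)."""
    labelings = list(itertools.product(range(num_labels), repeat=num_pixels))
    energies = np.array([energy(x) for x in labelings], dtype=float)
    unnormalized = np.exp(-energies)
    return labelings, unnormalized / unnormalized.sum()  # the sum plays the role of Z

# Hypothetical toy energy: each pixel pays 1.0 for disagreeing with a
# preferred label, and adjacent pixels pay 0.5 for taking different labels.
preferred = (1, 1, 0)

def toy_energy(x):
    unary = sum(1.0 for xi, pi in zip(x, preferred) if xi != pi)
    pairwise = sum(0.5 for a, b in zip(x, x[1:]) if a != b)
    return unary + pairwise

labelings, probs = gibbs_distribution(toy_energy, num_pixels=3, num_labels=2)
most_probable = labelings[int(np.argmax(probs))]
lowest_energy = min(labelings, key=toy_energy)
print(most_probable == lowest_energy)  # True
```

Enumeration is only feasible here because there are 2^3 labelings; for a real image the number of labelings is far too large, which is why approximate inference is needed.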
$E(\mathbf{x} \mid \mathbf{I}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$
Unary energy
$\psi_u(X_i = l) = \;?$
Your label doesn’t agree with the initial classifier → you pay a penalty.
Pairwise energy
$\psi_p(X_i = l, X_j = l') = \;?$
You assign different labels to two very similar pixels → you pay a penalty.
How do you measure similarity?
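One possible way to fill in the two question marks is sketched below. These forms are assumptions for illustration, not taken from the slides: the unary energy is the negative log-probability from the classifier, and the pairwise energy is a Potts-style penalty scaled by a Gaussian similarity over pixel position and color. The bandwidths `theta_pos` and `theta_rgb` and the `weight` parameter are hypothetical.

```python
import numpy as np

def unary_energy(probs: np.ndarray, label: int) -> float:
    """Cost of giving a pixel `label`, given the classifier's probabilities.

    Assumed form: negative log-probability, so labels the classifier
    considers unlikely are expensive.
    """
    return float(-np.log(probs[label]))

def similarity(pos_i, rgb_i, pos_j, rgb_j, theta_pos=3.0, theta_rgb=10.0):
    """Gaussian similarity in (0, 1]: close to 1 for nearby, similarly colored pixels.

    theta_pos and theta_rgb are hypothetical bandwidths.
    """
    d_pos = np.sum((np.asarray(pos_i, float) - np.asarray(pos_j, float)) ** 2)
    d_rgb = np.sum((np.asarray(rgb_i, float) - np.asarray(rgb_j, float)) ** 2)
    return float(np.exp(-d_pos / (2 * theta_pos**2) - d_rgb / (2 * theta_rgb**2)))

def pairwise_energy(label_i, label_j, sim, weight=1.0):
    """Potts-style cost: zero for equal labels, otherwise proportional to similarity."""
    return 0.0 if label_i == label_j else weight * sim

# Adjacent pixels with almost the same color, then with very different colors.
s_close = similarity((0, 0), (200, 30, 30), (0, 1), (198, 32, 29))
s_far = similarity((0, 0), (200, 30, 30), (0, 1), (20, 180, 240))
print(pairwise_energy(0, 1, s_close) > pairwise_energy(0, 1, s_far))  # True

p = np.array([0.1, 0.9])
print(unary_energy(p, 0) > unary_energy(p, 1))  # True
```

With these choices, disagreeing labels on two similar pixels cost more than on two dissimilar ones, and a label the classifier dislikes costs more than one it favors.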
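$E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$
$\frac{1}{Z} \exp\big(-E(\mathbf{x})\big) = \Pr(\mathbf{X} = \mathbf{x}) \approx \prod_i Q_i(x_i)$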
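For reference, the standard mean-field update for a fully factorized approximation $Q(\mathbf{x}) = \prod_i Q_i(x_i)$ is written out below. It is not stated on the slides; it is the textbook result for an energy with unary and pairwise terms, and it assumes the pairwise term is symmetric in the two pixels.

```latex
% Mean-field update for pixel i and label l, holding all other Q_j fixed.
% Each Q_i is renormalized so that \sum_l Q_i(l) = 1.
Q_i(l) \;\propto\; \exp\Big(
    -\psi_u(X_i = l)
    \;-\; \sum_{j \neq i} \sum_{l'} Q_j(l')\, \psi_p(X_i = l,\, X_j = l')
\Big)
```

The first term keeps the marginal close to the classifier's output, and the second is the expected pairwise penalty under the current beliefs about every other pixel.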
[Figure: block diagram. Inputs Q and I → Bilateral → Conv]
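CRF as RNN
[Figure: one Mean-field Iteration drawn as a CRF Iteration block that takes the Unaries (through a SoftMax) and produces the Output; the full network is an FCN followed by the CRF-RNN.]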
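The recurrent view can be sketched as a loop that applies the same mean-field step repeatedly, feeding each output back in as the next input. This is a naive illustration under stated assumptions, not the authors' implementation: it uses a dense N×N similarity matrix where a practical system would use fast bilateral filtering, and the names `mean_field_step`, `crf_as_rnn`, `kernel`, and `compat` are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_field_step(q, unary, kernel, compat):
    """One mean-field iteration for N pixels and L labels.

    q:      (N, L) current marginals
    unary:  (N, L) unary energies
    kernel: (N, N) pixel similarities with a zero diagonal
    compat: (L, L) label compatibility penalties (Potts: 1 - identity)
    """
    messages = kernel @ q             # similarity-weighted sum of other pixels' marginals
    penalty = messages @ compat       # expected cost of disagreeing with them
    return softmax(-unary - penalty)  # add the unaries, then normalize

def crf_as_rnn(unary, kernel, compat, num_iterations=5):
    """Apply the same step repeatedly, like an RNN with shared weights."""
    q = softmax(-unary)               # initialize from the unaries alone
    for _ in range(num_iterations):
        q = mean_field_step(q, unary, kernel, compat)
    return q

# Four mutually similar pixels; the classifier is wrong about pixel 1.
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]])
unary = -np.log(probs)
kernel = np.ones((4, 4)) - np.eye(4)
compat = 1.0 - np.eye(2)

q = crf_as_rnn(unary, kernel, compat)
print(softmax(-unary).argmax(axis=1))  # [0 1 0 0]
print(q.argmax(axis=1))                # [0 0 0 0]
```

Because every operation in the step is differentiable, the loop can in principle be trained together with the network that produces the unaries, which is the point of treating the CRF as a recurrent layer.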
Experiments
[Figure: results compared across FCN alone, FCN with CRF, and FCN with CRF-RNN]
Shuai Zheng
Bernardino Romera-Paredes
Philip Torr
Examples
http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg
Examples
http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg
Examples
http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg
Examples
http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg
Not-so-good examples
http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg
Not-so-good examples
http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500
Tricky examples
http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png
Tricky examples
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg
Tricky examples
http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg
Conclusion
• CNNs yield a coarse prediction on pixel-labeling tasks.
• CRFs improve the result by accounting for the contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.
[Figure: CNN followed by CRF]
Thank You!