
Week 4: Convolutional Networks

Unit 1: Introduction to CNNs


Introduction to CNNs
Overview

Content:
▪ Biological inspiration
▪ Challenges for computer vision
▪ The need for spatial invariance
▪ A naïve approach to neural networks for computer vision
▪ Convolutional neural networks

[Figure: example CNN architecture – Conv → ReLU → Pool → Conv → ReLU → (Global) Pool → Dense → ReLU → Dropout → Dense → ReLU → Dropout → Dense → Softmax → predicted labels “ship”, “dog”, “truck”]
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2
Introduction to CNNs
Biological inspiration

Example: The primate visual cortex

▪ Signals arriving from the retina are processed hierarchically by subsequent brain areas
▪ This is reminiscent of processing by subsequent layers of a deep artificial neural network

Of course, the correspondence is not perfect:
▪ In the primate visual cortex, there are many forward and backward connections

David Daniel Cox, Thomas Dean. "Neural Networks and Neuroscience-Inspired Computer Vision." Current Biology, Volume 24, Issue 18, 2014, Pages R921-R929.
Introduction to CNNs
Biological inspiration

Hubel and Wiesel (1950s and ’60s) studied the feline visual cortex.
Two types of cells were identified:
▪ Simple cells fire in response to stimuli of a particular shape and orientation
▪ Complex cells additionally fire only when the stimulus moves in a particular direction

Hubel, David H., and Torsten N. Wiesel. "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex." The Journal of Physiology 160.1 (1962): 106.




Introduction to CNNs
Challenges for computer vision – From images to abstract representations

Object recognition involves combining many irregular features into a whole.
For example, to recognize a dog, an animal’s brain must:
▪ Identify edges of shapes, like the arcs forming the outline of an eye
▪ Combine those component parts into an abstract representation of an eye
▪ Identify other body parts of the dog, like a second eye, ears, snout, legs, etc.
▪ Combine these parts into an abstract representation of a dog


Introduction to CNNs
Challenges for computer vision – Data dimensionality

Image data is very high-dimensional:
▪ A 512 × 512 RGB image has 512 ⋅ 512 ⋅ 3 = 786,432 features




Introduction to CNNs
Challenges for computer vision – Invariance

Does this image contain a dog?

Many invariances must be learned by a “naïve” machine learning model, e.g.:
▪ the position of the dog in the image (spatial invariance)
▪ the size of the dog in the image (scale invariance)
▪ the angle at which the image was taken (2D rotational invariance)
▪ the orientation of the dog and the parts of its body in 3D space
▪ and more…


Introduction to CNNs
The need for spatial invariance

Does this image contain a dog?

Let’s focus on spatial invariance.
▪ Where the dog appears is irrelevant
▪ The same is true for the component questions:
  – Is there an edge in some orientation?
  – Is this an eye?
  – Does it have legs?
▪ However, the spatial relationships between the components of the image remain important (a scrambled arrangement of dog parts is “not a dog”)


Introduction to CNNs
A naïve approach to neural networks for computer vision

A first idea for dealing with spatial invariance:
Augment the data by generating many translations of the images in the training set, then apply a feed-forward neural network.

Problems:
▪ Lengthens training time by massively increasing the size of the data set
▪ The feed-forward network requires many parameters (for a 512 × 512 RGB image, 786,432 × the size of the first hidden layer in weights for the first layer alone)
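A minimal sketch of this kind of translation augmentation with NumPy (the function name, the 4×4 toy image, and the ±1-pixel shift range are illustrative choices, not from the slides):

```python
import numpy as np

def translate(image, dy, dx):
    """Shift a 2D image by (dy, dx) pixels; vacated pixels are zero-filled."""
    shifted = np.zeros_like(image)
    h, w = image.shape[:2]
    dst_y = slice(max(dy, 0), min(h + dy, h))
    dst_x = slice(max(dx, 0), min(w + dx, w))
    src_y = slice(max(-dy, 0), min(h - dy, h))
    src_x = slice(max(-dx, 0), min(w - dx, w))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

# Nine translated copies of a single training image
image = np.arange(16, dtype=float).reshape(4, 4)
augmented = [translate(image, dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```

Even this tiny example multiplies the data set by nine; covering larger shifts multiplies it much further, which is exactly the training-time problem noted above.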


Introduction to CNNs
A naïve approach to neural networks for computer vision

A feed-forward neural network connects each pixel to each hidden node.

A lot of parameters:
image width × image height × size of hidden layer


Introduction to CNNs
Convolutional neural networks

CNNs apply the same transformation (a feature map) to each patch of the image.
Each small patch is connected to the first hidden layer of the network.

Contrast with a feed-forward neural network, which connects each pixel to each hidden node.
Significantly more parameters:
image width × image height × size of hidden layer

instead of:
patch width × patch height × size of hidden layer
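The two formulas can be made concrete with a quick count (the hidden size of 256 and the 3×3 patch are illustrative choices; the RGB channel dimension is included explicitly, which the slide formulas leave implicit):

```python
# Weights needed to connect a 512 x 512 RGB image to 256 hidden units
image_width, image_height, channels = 512, 512, 3
hidden_size = 256

# Feed-forward: every pixel connects to every hidden node
dense_weights = image_width * image_height * channels * hidden_size

# Convolutional: one shared 3x3 patch of weights per hidden channel
patch_width, patch_height = 3, 3
conv_weights = patch_width * patch_height * channels * hidden_size

print(dense_weights)  # 201326592
print(conv_weights)   # 6912
```

Roughly 200 million weights versus about seven thousand for the first layer alone.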




Introduction to CNNs
Convolutional neural networks

CNNs apply the same transformation (a feature map) to each patch of the image.
In deeper layers of the network, the features become effectively larger and more abstract.

Advantages:
▪ Fewer parameters to tune
▪ Spatial invariance built in mathematically
▪ Retains spatial relationships between features

The next lecture will explain CNNs and these advantages in detail.


Introduction to CNNs
Coming up next

CNN Architecture I
▪ Convolutions
▪ Non-linearity
▪ Weight initialization



Thank you.
Contact information:

open@sap.com
© 2017 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components
of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated
companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are
set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release
any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products,
and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The
information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various
risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements,
and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company)
in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.
See http://global.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
Week 4: Convolutional Networks
Unit 2: CNN Architecture Part I
CNN Architecture Part I
What we covered in the last unit

Introduction to Convolutional Networks


▪ Biological inspiration
▪ Spatial invariance
▪ Basic idea: convolutional network



CNN Architecture Part I
Overview

Content:
▪ Convolutions
▪ ReLU non-linearity (activation function)
▪ Weight initialization
▪ Pooling / global pooling
▪ Dense (fully connected) layers
▪ Dropout
▪ Softmax

[Figure: example CNN architecture – Conv → ReLU → Pool → Conv → ReLU → (Global) Pool → Dense → ReLU → Dropout → Dense → ReLU → Dropout → Dense → Softmax → predicted labels “ship”, “dog”, “truck”]
CNN Architecture Part I
Convolutions

$y_{ij} = \sum_{k,l} w_{kl} \, x_{(k+i)(l+j)}$

Input $x$, kernel (weights) $w$, output $y$ – a “valid” convolution.
The kernel defines the “field of view” for each output pixel.

Weights are shared between pixels (neurons)!
▪ Imparts spatial invariance to the network
▪ Reduces the number of weights
CNN Architecture Part I
Convolutions

$y_{ij} = \sum_{k,l} w_{kl} \, x_{(k+i)(l+j)}$

With zero padding around the input, the output keeps the input’s size – a “same” convolution.

Weights are shared between pixels (neurons)!
▪ Imparts spatial invariance to the network
▪ Reduces the number of weights
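The formula above can be sketched directly in NumPy. As is conventional in deep learning, the slide’s formula is technically a cross-correlation (the kernel is not flipped); `conv2d` and the averaging kernel below are illustrative, not from the slides:

```python
import numpy as np

def conv2d(x, w, mode="valid"):
    """Direct 2D convolution: y[i, j] = sum_kl w[k, l] * x[k + i, l + j].

    mode="same" zero-pads the input so the output keeps the input's size
    (exact for odd kernel sizes)."""
    kh, kw = w.shape
    if mode == "same":
        x = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(w * x[i:i + kh, j:j + kw])
    return y

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3)) / 9.0                # 3x3 averaging kernel
print(conv2d(x, w, "valid").shape)       # (2, 2): output shrinks
print(conv2d(x, w, "same").shape)        # (4, 4): size preserved
```

Real frameworks use vectorized or FFT-based implementations; the nested loops here only make the index arithmetic explicit.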
CNN Architecture Part I
Convolutions in 1D & 3D

The same ideas generalize to 1D & 3D as well!

[Figure: a 1D convolution sliding over the one-hot encoded text “This is cool!”, and a 3D convolution sliding over a volume]

▪ 1D convolution: e.g. audio signals, natural language processing
▪ 3D convolution: e.g. video processing
CNN Architecture Part I
Convolutions: popular kernel shapes

▪ 3×3 kernel: most widely used in deep models
▪ 1×1 kernel: purpose will become clear shortly
▪ 1×3 kernel (asymmetric)
▪ 3×3 kernel, stride = 2: downsampling
▪ 3×3 kernel, dilation = 2: increased field of view, but same complexity
▪ 2×2 kernel, stride = 2, transposed convolution (aka deconvolution, aka fractionally strided convolution): upsampling


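A single formula covers how kernel size, stride, dilation, and padding affect the output size along one spatial dimension; the input size of 32 below is an arbitrary example:

```python
def conv_output_size(n, k, stride=1, dilation=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    effective_k = dilation * (k - 1) + 1   # footprint of the dilated kernel
    return (n + 2 * padding - effective_k) // stride + 1

print(conv_output_size(32, 3))                       # 30: plain 3x3, "valid"
print(conv_output_size(32, 3, padding=1))            # 32: "same"
print(conv_output_size(32, 3, stride=2, padding=1))  # 16: downsampling
print(conv_output_size(32, 3, dilation=2))           # 28: 5x5 field of view, 3x3 cost
```

Transposed convolutions invert this relationship, mapping a small input to a larger output.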




CNN Architecture Part I
Convolutions

$y_{ij} = \sum_{C} \sum_{k,l} w_{Ckl} \, x_{C(k+i)(l+j)}$

We are not just summing over pixels, but also across input channels!
There is one “shared” set of weights per input channel.


CNN Architecture Part I
Convolutions

Multiple output channels:

$y^{I}_{ij} = \sum_{C} \sum_{k,l} w^{I}_{Ckl} \, x_{C(k+i)(l+j)}$

We can have multiple output channels (feature maps) as well!
There is one “shared” set of weights per input channel and per output channel.
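A direct (unoptimized) sketch of the multi-channel, multi-output-map formula. The (channels, height, width) layout and the example shapes are assumptions for illustration:

```python
import numpy as np

def conv2d_multichannel(x, w):
    """y[I, i, j] = sum_C sum_kl w[I, C, k, l] * x[C, k + i, l + j] ("valid").

    x: (C_in, H, W), w: (C_out, C_in, kh, kw) -> y: (C_out, H - kh + 1, W - kw + 1)
    """
    c_out, c_in, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    y = np.zeros((c_out, oh, ow))
    for I in range(c_out):          # one kernel stack per output feature map
        for i in range(oh):
            for j in range(ow):
                y[I, i, j] = np.sum(w[I] * x[:, i:i + kh, j:j + kw])
    return y

x = np.random.rand(3, 8, 8)       # e.g. a small RGB patch
w = np.random.rand(16, 3, 3, 3)   # 16 feature maps, one 3x3 kernel per input channel
print(conv2d_multichannel(x, w).shape)  # (16, 6, 6)
```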


CNN Architecture Part I
Convolutions: going deeper

We can chain multiple convolutions to build a deep convolutional neural network!




CNN Architecture Part I
Convolutions: going deeper

Low-level (local) features → high-level (global) features
CNN Architecture Part I
Convolutions: going deeper

Note: After each layer, the effective field of view within the input image is increased!




CNN Architecture Part I
Non-linearity: ReLUs

After each convolution, we want to have a non-linearity to increase expressive power!
The activation can be included as part of a single convolutional layer (if desired).


CNN Architecture Part I
Non-linearity: rectified linear unit (ReLU)

Classical activation functions saturate (derivative ≈ 0) for large inputs; the ReLU does not.

▪ ReLUs do not saturate (reduced vanishing gradient problem; see RNNs, Week 3)
▪ Induce sparsity

Xavier Glorot, Antoine Bordes, Yoshua Bengio. "Deep Sparse Rectifier Neural Networks." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:315-323, 2011.
CNN Architecture Part I
Non-linearity: parametric rectified linear unit (PReLU)

▪ ReLUs do not saturate
▪ Induce sparsity
▪ But they may “die” during training (never activate again); the parametric (leaky) ReLU (PReLU) does not “die”

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Pages 1026-1034.
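Both activations are one-liners. The fixed `alpha=0.1` is an illustrative value; in a real PReLU it is a learned parameter (a leaky ReLU fixes it instead):

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x); zero for all negative inputs."""
    return np.maximum(0.0, x)

def prelu(x, alpha=0.1):
    """Parametric ReLU: identity for x > 0, slope alpha for x <= 0.

    The nonzero negative slope keeps a gradient flowing, so the unit
    cannot "die" the way a plain ReLU can."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))    # negatives clipped to 0; positives unchanged
print(prelu(x))   # negatives scaled by alpha instead of clipped
```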
CNN Architecture Part I
Weight initialization

▪ Weights in each layer are randomly initialized
▪ But they should be initialized such that an input signal is neither amplified nor damped!
▪ This property depends on the number of input/output connections and the non-linearity

For ReLU, use “He’s” method (aka Gaussian / variance scaling initialization):

$w \sim P_{\text{Gaussian}}, \quad \mathrm{Var} = \frac{2}{N}$

N: number of input or output channels, or the average of the two

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Pages 1026-1034.
CNN Architecture Part I
Coming up next

CNN Architecture II
▪ Pooling
▪ Dense layers
▪ Dropout
▪ Softmax



Week 4: Convolutional Networks
Unit 3: CNN Architecture Part II
CNN Architecture Part II
What we covered in the last unit

CNN Architecture Part I


▪ Convolutions
▪ Non-linearity
▪ Weight initialization

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


CNN Architecture Part II
Overview

Conv
Content: ReLU
▪ Convolutions Pool
▪ ReLU non-linearity (activation function)
Conv
▪ Weight initialization
ReLU
▪ Pooling / global pooling
(Global) Pool
▪ Dense (fully connected) layers
Dense
▪ Dropout
ReLU
▪ Softmax Dropout

Dense
ReLU
Dropout

Dense
Softmax

“ship” “dog”
“truck”
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 3
CNN Architecture Part II
Reducing resolution: pooling

Input ReLU ReLU ReLU ReLU

After each convolution (block), we may want to decrease resolution


but increase the number of channels

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 4


CNN Architecture Part II
Reducing resolution: pooling

Number of channels increased

Input ReLU ReLU ReLU ReLU ReLU

Pooling Pooling
(downsampling) (downsampling)

After each convolution (block), we may want to decrease resolution


but increase the number of channels:
▪ Increase spatial invariance
▪ Increase level of abstraction
▪ Decrease computational load
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 5
CNN Architecture Part II
Reducing resolution: pooling

Example: 2x2 pooling, 2x2 stride

Input (4x4):      Average Pooling → Output (2x2):
0 1 2 1           0.5  1
1 0 0 1           2    2
2 3 1 0
1 2 5 2
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 6


CNN Architecture Part II
Reducing resolution: pooling

Example: 2x2 pooling, 2x2 stride

Input (4x4):      Average Pooling → Output (2x2):
0 1 2 1           0.5  1
1 0 0 1           2    2
2 3 1 0
1 2 5 2

Input (4x4):      Max Pooling → Output (2x2):
0 1 2 1           1  2
1 0 0 1           3  5     (better at capturing edges)
2 3 1 0
1 2 5 2
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 7
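The 2x2 average- and max-pooling examples above can be checked with a short NumPy sketch (not part of the original slides; the reshape trick is one common way to pool non-overlapping patches without loops):

```python
import numpy as np

# The 4x4 input from the slide
x = np.array([[0, 1, 2, 1],
              [1, 0, 0, 1],
              [2, 3, 1, 0],
              [1, 2, 5, 2]], dtype=float)

def pool2x2(x, op):
    """2x2 pooling with stride 2: apply `op` to each non-overlapping 2x2 patch."""
    h, w = x.shape
    # Reshape to (h/2, 2, w/2, 2) so each 2x2 patch sits on axes 1 and 3
    patches = x.reshape(h // 2, 2, w // 2, 2)
    return op(patches, axis=(1, 3))

avg = pool2x2(x, np.mean)   # [[0.5, 1.], [2., 2.]]
mx  = pool2x2(x, np.max)    # [[1., 2.], [3., 5.]]
```

The outputs match the slide's two result grids exactly.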
CNN Architecture Part II
Reducing resolution: global pooling

Input (4x4):      Global Average Pooling → 1.375
0 1 2 1
1 0 0 1
2 3 1 0
1 2 5 2
Take the average over an entire channel

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 8


CNN Architecture Part II
Reducing resolution: global pooling

Input (4x4):      Global Average Pooling → 1.375
0 1 2 1
1 0 0 1
2 3 1 0
1 2 5 2
Take the average over an entire channel

The outcome of global pooling can be used as input for a dense layer to reduce complexity

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 9
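A minimal NumPy sketch of global average pooling on the slide's 4x4 example (the two-channel feature map below is an invented illustration, not from the slides):

```python
import numpy as np

# The 4x4 channel from the slide
x = np.array([[0, 1, 2, 1],
              [1, 0, 0, 1],
              [2, 3, 1, 0],
              [1, 2, 5, 2]], dtype=float)

gap = x.mean()        # global average pooling: 22 / 16 = 1.375

# For an (H, W, C) feature map, global average pooling keeps one value per
# channel, so a following dense layer sees only C inputs per neuron.
fmap = np.stack([x, 2 * x], axis=-1)   # toy 4x4 map with 2 channels
pooled = fmap.mean(axis=(0, 1))        # shape (2,)
```

This is why global pooling before the dense layers shrinks the weight matrix so drastically.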


CNN Architecture Part II
From convolutions to dense layers

Number of channels increased

Input ReLU ReLU ReLU ReLU ReLU

Pooling Pooling
(downsampling) (downsampling)

For classification tasks, we want to add dense layers

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 10


CNN Architecture Part II
From convolutions to dense layers

Every pixel in all output channels is connected to every neuron in the dense layer

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 11


CNN Architecture Part II
From convolutions to dense layers

Every pixel in all output channels is connected to every neuron in the dense layer
⇒ the weight matrix may become very large!

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 12


CNN Architecture Part II
From convolutions to dense layers

Every pixel in all output channels is connected to every neuron in the dense layer
⇒ the weight matrix may become very large!
⇒ use global pooling
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 13
CNN Architecture Part II
From convolutions to dense layers
Number of channels
Number of channels increased
= number of neurons

Input ReLU ReLU ReLU ReLU ReLU

Pooling Pooling
(downsampling) (downsampling)

For classification tasks, we want to add dense layers Global Pooling


(downsampling /
flattening)

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 14


CNN Architecture Part II
From convolutions to dense layers

Number of channels increased Number of channels


= number of neurons

Input ReLU ReLU ReLU ReLU ReLU

ReLU
Pooling Pooling Dense layer
(downsampling) (downsampling) (fully connected)

Global Pooling
For classification tasks, we want to add dense layers (downsampling /
flattening)

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 15


CNN Architecture Part II
Risk of overfitting: dropout

Number of channels increased Number of channels


= number of neurons

Input ReLU ReLU ReLU ReLU ReLU


ReLU
Dropout
Pooling Pooling Dense layer
(downsampling) (downsampling) (fully connected)

Global Pooling
Dense layers have a lot of parameters and can therefore easily overfit (downsampling /
flattening)
Remedy: Dropout

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 16


CNN Architecture Part II
Risk of overfitting: dropout

Dropout Layer Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov


Journal of Machine Learning Research 15 (2014) 1929-1958

During training, randomly set neurons to zero with probability P (e.g. a 50% chance)

Dense layers have a lot of parameters and can therefore easily overfit
Remedy: Dropout

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 17




CNN Architecture Part II
Risk of overfitting: dropout

Dropout Layer Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov


Journal of Machine Learning Research 15 (2014) 1929-1958

During inference, keep all neurons but rescale activations so their expected value matches training (equivalently, “inverted dropout” applies the 1 / (1 − P) scaling during training, so inference needs no change)

Dense layers have a lot of parameters and can therefore easily overfit
Remedy: Dropout

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 19
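The train/inference behavior above can be sketched in NumPy. Note this follows the “inverted dropout” convention used by most modern implementations: the rescaling happens during training, so inference is a no-op (the function name and shapes are illustrative, not from the slides):

```python
import numpy as np

def dropout(x, p_drop, training, rng):
    """Inverted dropout: during training, zero each unit with probability
    p_drop and rescale the survivors by 1/(1 - p_drop); inference is a no-op."""
    if not training:
        return x
    keep = rng.random(x.shape) >= p_drop      # True with probability 1 - p_drop
    return x * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
x = np.ones(10_000)
y = dropout(x, p_drop=0.5, training=True, rng=rng)
# About half of y is zero, yet its mean stays close to 1.0
```

The rescaling keeps the expected activation identical between training and inference, which is the whole point of the correction factor.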


CNN Architecture Part II
Risk of overfitting: dropout

Dropout individual pixels

Dropout can also be applied to convolutional layers

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 20


CNN Architecture Part II
Risk of overfitting: dropout

Dropout entire channels


(aka spatial dropout)

Dropout individual pixels

Dropout can also be applied to convolutional layers

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 21


CNN Architecture Part II
Multiple dense layers

Number of channels increased Number of channels


= number of neurons

Input ReLU ReLU ReLU ReLU ReLU


ReLU
Dropout
Pooling Pooling Dense layer
(downsampling) (downsampling) (fully connected)

Global Pooling
Depending on the problem, multiple dense layers may be added (downsampling /
flattening)

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 22


CNN Architecture Part II
Multiple dense layers

Number of channels increased Number of channels


= number of neurons

Input ReLU ReLU ReLU ReLU ReLU


ReLU ReLU
Pooling Pooling Dense layer Dropout Dropout
(downsampling) (downsampling) (fully connected)
Dense layer
(fully connected)
Global Pooling
(downsampling /
flattening)
Depending on the problem, multiple dense layers may be added

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 23


CNN Architecture Part II
Final dense & classification layer

Number of channels increased Number of channels


= number of neurons

Input ReLU ReLU ReLU ReLU ReLU


ReLU ReLU
Pooling Pooling Dense layer Dropout Dropout
(downsampling) (downsampling) (fully connected) Dense layer Dense layer
(fully connected) (fully
Global Pooling
connected)
(downsampling /
flattening)
Softmax
(class probabilities)
Finally, we add a dense layer with softmax nonlinearity
The number of neurons in the final dense layer must equal the number of classes

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 24


CNN Architecture Part II
Final dense & classification layer

Argmax: final class prediction

Softmax output: ∑ᵢ Pᵢ = 1

Training label:

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 25
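The softmax/argmax step can be sketched in a few lines of NumPy (the logits are made-up scores for three classes; subtracting the max before exponentiating is a standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max before exponentiating."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for three classes
probs = softmax(logits)              # class probabilities, sum to 1
pred = int(np.argmax(probs))         # argmax: final class prediction
```

During training, these probabilities are compared against the (one-hot) training label; at prediction time, only the argmax is reported.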


CNN Architecture Part II
Coming up next

Accelerating Deep CNN Training


▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 26


Thank you.
Contact information:

open@sap.com
© 2017 SAP SE or an SAP affiliate company. All rights reserved.

Week 4: Convolutional Networks
Unit 4: Accelerating Deep CNN Training
Accelerating Deep CNN Training
What we covered in the last unit

CNN Architecture Part II


▪ Pooling
▪ Dense layers
▪ Dropout
▪ Softmax

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Accelerating Deep CNN Training
Overview

Content:
▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 3


Accelerating Deep CNN Training
Overview

Content:
▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 4


Accelerating Deep CNN Training
Computational considerations: convolutional filter size

𝑥
𝑤
𝑦

7 x 7 Convolution
100 channels on each layer
49 x 1002 parameters to optimize

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 5


Accelerating Deep CNN Training
Computational considerations: convolutional filter size

𝑥
𝑤 𝑥 𝑦
𝑦 𝑤

7 x 7 Convolution 3 x 3 Convolution
80% less
100 channels on each layer 100 channels on each layer
49 x 1002 parameters to optimize 9 x 1002 parameters to optimize

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 6


Accelerating Deep CNN Training
Computational considerations: convolutional filter size

𝑥
𝑤 𝑥 𝑦
𝑤

Stacked 3x3 convolutional layers have the same field of view as a single larger filter (two stacked 3x3 layers cover 5x5, three cover 7x7).
Still fewer parameters than 7 x 7, less overfitting, more nonlinearity!

VGG Net: Simonyan, K. & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 7
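The parameter counts on this slide can be reproduced directly (biases are ignored, as in the slide's counting):

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k convolution layer (biases ignored, as on the slide)."""
    return k * k * c_in * c_out

p7 = conv_params(7, 100, 100)   # 49 * 100^2 = 490,000
p3 = conv_params(3, 100, 100)   #  9 * 100^2 =  90,000
saving = 1 - p3 / p7            # ~0.82 -> roughly "80% less"
```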
Accelerating Deep CNN Training
Computational considerations: replacing fully connected (dense) layers

VGG Net (19-layer CNN):              Inception (22-layer CNN):

Max Pooling – 512                    Global Pooling – 1024  (w1 = 0)
FC – 4096  (w1 = 512 × 4096)         Dropout – 1024
FC – 4096  (w2 = 4096²)              FC – 1000  (w2 = 1024 × 1000)
FC – 1000  (w3 = 4096 × 1000)        Softmax
Softmax                              ≈ 1M parameters
≈ 23M parameters

VGG Net: Simonyan, K. & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556.
Inception: Szegedy C. et al. (2015). Going Deeper with Convolutions. CoRR. abs/1409.4842.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 8
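A quick sanity check of the ~23M vs. ~1M comparison, using the slide's simplified per-layer weight counts (biases are ignored, and VGG's first FC input is simplified to 512 rather than the full flattened feature map):

```python
# Classifier-head weight counts, following the slide's simplified accounting
vgg_head = 512 * 4096 + 4096 * 4096 + 4096 * 1000   # = 22,970,368  (~23M)
inception_head = 1024 * 1000                        # = 1,024,000   (~1M)
```

Replacing the big FC stack with global pooling removes over 95% of the classifier-head parameters.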
Accelerating Deep CNN Training
Overview

Content:
▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 9


Accelerating Deep CNN Training
Batch normalization: motivation

▪ During training, the output distribution of each layer changes due to weight updates

T=0: Conv 1 → Conv 2 → Conv 3 → …

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 10


Accelerating Deep CNN Training
Batch normalization: motivation

▪ During training, the output distribution of each layer changes due to weight updates

T=0: Conv 1 → Conv 2 → Conv 3 → …

T=1: Conv 1 → Conv 2 → Conv 3 → …

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 11


Accelerating Deep CNN Training
Batch normalization: motivation

▪ During training, the output distribution of each layer changes due to weight updates
▪ Each following layer has to adapt to the new output distribution of the previous layer
▪ This makes training rather hard

T=0: Conv 1 → Conv 2 → Conv 3 → …

T=1: Conv 1 → Conv 2 → Conv 3 → …

Aka “internal covariate shift”


© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 12
Accelerating Deep CNN Training
Batch normalization: motivation

▪ Idea: Normalize output distribution after each layer

T=0: Conv 1 → Conv 2 → Conv 3 → …

T=1: Conv 1 → Conv 2 → Conv 3 → …

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 13


Accelerating Deep CNN Training
Batch normalization algorithm

Input: Values of x over a mini-batch B = {x₁, …, xₘ}

Parameters to be learned: γ, β

Output: {yᵢ = BN_γ,β(xᵢ)}

Batch Normalization: Ioffe, S. & Szegedy C. (2015). Batch Normalization: Accelerating


Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 14
Accelerating Deep CNN Training
Batch normalization algorithm

Input: Values of x over a mini-batch B = {x₁, …, xₘ}
Parameters to be learned: γ, β
Output: {yᵢ = BN_γ,β(xᵢ)}

1. Normalize every batch by the mean and variance of the batch:
   x̂ᵢ ← (xᵢ − μ_B) / √(σ_B² + ε)

Batch Normalization: Ioffe, S. & Szegedy C. (2015). Batch Normalization: Accelerating


Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 15
Accelerating Deep CNN Training
Batch normalization algorithm

Input: Values of x over a mini-batch B = {x₁, …, xₘ}
Parameters to be learned: γ, β
Output: {yᵢ = BN_γ,β(xᵢ)}

1. Normalize every batch by the mean and variance of the batch:
   x̂ᵢ ← (xᵢ − μ_B) / √(σ_B² + ε)
2. Introduce two new parameters γ, β.

Batch Normalization: Ioffe, S. & Szegedy C. (2015). Batch Normalization: Accelerating


Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 16
Accelerating Deep CNN Training
Batch normalization algorithm

Input: Values of x over a mini-batch B = {x₁, …, xₘ}
Parameters to be learned: γ, β
Output: {yᵢ = BN_γ,β(xᵢ)}

1. Normalize every batch by the mean and variance of the batch:
   x̂ᵢ ← (xᵢ − μ_B) / √(σ_B² + ε)
2. Introduce two new parameters γ, β.
3. Scale and shift the activations with γ and β:
   yᵢ ← γ x̂ᵢ + β ≡ BN_γ,β(xᵢ)

Batch Normalization: Ioffe, S. & Szegedy C. (2015). Batch Normalization: Accelerating


Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 17
Accelerating Deep CNN Training
Batch normalization algorithm

Input: Values of x over a mini-batch B = {x₁, …, xₘ}
Parameters to be learned: γ, β
Output: {yᵢ = BN_γ,β(xᵢ)}

1. Normalize every batch by the mean and variance of the batch:
   x̂ᵢ ← (xᵢ − μ_B) / √(σ_B² + ε)
2. Introduce two new parameters γ, β.
3. Scale and shift the activations with γ and β:
   yᵢ ← γ x̂ᵢ + β ≡ BN_γ,β(xᵢ)
4. Learn the parameters γ and β during training.

Batch Normalization: Ioffe, S. & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 18
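The four steps above can be sketched as a NumPy forward pass (training-time statistics only; γ and β would be learned by backpropagation, and inference would typically use running averages of μ_B and σ_B² instead of per-batch statistics):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass over a mini-batch (rows = examples)."""
    mu = x.mean(axis=0)                       # step 1: batch mean
    var = x.var(axis=0)                       # step 1: batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalize
    return gamma * x_hat + beta               # steps 2-3: scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))   # batch of 64, 8 features
y = batch_norm(x, gamma=1.0, beta=0.0)
# Each feature of y now has ~zero mean and ~unit variance
```

With γ = 1 and β = 0 the layer is a pure normalizer; learning γ and β lets the network undo the normalization where that helps.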
Accelerating Deep CNN Training
Batch normalization algorithm

Benefits:
▪ Higher accuracy in earlier training steps
▪ Reduction in overfitting
▪ Less need for dropout layers

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 19


Accelerating Deep CNN Training
Overview

Content:
▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 20


Accelerating Deep CNN Training
Transfer learning: motivation

Problem: Deep neural networks require very large data sets to train, and large computational resources to
build from scratch

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 21


Accelerating Deep CNN Training
Transfer learning: motivation

Problem: Deep neural networks require very large data sets to train, and large computational resources to
build from scratch

Famous CNN models: AlexNet, ZFNet, VGGNet, GoogleNet, Microsoft ResNet…

▪ Trained on millions of images


▪ Powerful GPUs used for training
▪ Training takes days or even weeks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 22


Accelerating Deep CNN Training
Transfer learning: motivation

Problem: Deep neural networks require very large data sets to train, and large computational resources to
build from scratch

Famous CNN models: AlexNet, ZFNet, VGGNet, GoogleNet, Microsoft ResNet…

▪ Trained on millions of images


▪ Powerful GPUs used for training
▪ Training takes days or even weeks

Solution: Transfer learning (recycle pretrained model)

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 23


Accelerating Deep CNN Training
Transfer learning: reusing joint features

Low-level filters
conv-1 AlexNet

Low-level features
Feature map-1 AlexNet

DeCAF: Donahue, J. et al. (2014). DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. CoRR, abs/. 1310.1531
AlexNet: Krizhevsky, A. et al.(2012). ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 60, 6.
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 24
Accelerating Deep CNN Training
Transfer learning: reusing joint features
Deeper

Low-level activations Mid-level activations High-level activations


VGG Net: Simonyan, K. & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 25
Accelerating Deep CNN Training
Transfer learning: last layer activations

Dog Cat Elephant

VGG Net: Simonyan, K. & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 26
Accelerating Deep CNN Training
Transfer learning: application
Existing model New model
Conv Conv
Scenario 1:
ReLU ReLU
Replacing last layer with new classifier
Pool Pool

Conv Conv
▪ Retrain on small data set
ReLU ReLU
▪ Similar features
(Global) Pool (Global) Pool

Dense Dense
ReLU ReLU
Dropout Dropout

Dense Dense
ReLU ReLU
Dropout Dropout

Dense replace Dense


Softmax Softmax

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 27


Accelerating Deep CNN Training
Transfer learning: application
… …
Conv Conv
Scenario 2:
ReLU ReLU
Fine-tuning more layers
Pool Pool

Conv Conv
▪ Retrain on large data set
ReLU retrain ReLU
▪ Similar features
(Global) Pool (Global) Pool

… …

Dense Dense
ReLU ReLU
Dropout Dropout

Dense replace Dense


Softmax Softmax

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 28


Accelerating Deep CNN Training
Overview

Content:
▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 29


Accelerating Deep CNN Training
Deep residual networks: Extremely Deep Networks [He, Zhang, Ren, Sun, 2015]

▪ Intuition: Deeper networks are more expressive

▪ Problem: Deep networks are hard to train

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 30


Accelerating Deep CNN Training
Deep residual networks: Extremely Deep Networks [He, Zhang, Ren, Sun, 2015]

▪ Intuition: Deeper networks are more expressive


▪ Problem: Deep networks are hard to train

▪ Idea: The modeled function of each building block has a higher resemblance to the identity function (than to the zero function)

[Diagram: x → Weight → ReLU → Weight gives G(x); a skip connection adds x, and the block outputs ReLU(G(x) + x)]

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 31


Accelerating Deep CNN Training
Deep residual networks: Extremely Deep Networks [He, Zhang, Ren, Sun, 2015]
▪ Intuition: Deeper networks are more expressive

▪ Problem: Deep networks are hard to train

▪ Idea: The modeled function of each building block has a higher resemblance to the identity function (than to the zero function)
▪ The signal can directly skip through multiple layers (the gradient flows more easily from top to bottom layers)
▪ Allows very deep architectures (typically more than 100 layers)

[Diagram: stacked residual blocks; each computes G(x) via Weight → ReLU → Weight, adds the skip connection x, and outputs ReLU(G(x) + x)]
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 32
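A plain-NumPy sketch of the residual idea (the weight matrices below are stand-ins for the convolutions used in real ResNets): if G collapses to zero, the block reduces to the identity, which is why the signal and gradient can skip straight through.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Output = ReLU(G(x) + x), with G(x) = ReLU(x @ w1) @ w2.
    If G collapses to zero, the block is the identity (up to the final
    ReLU), so very deep stacks of such blocks remain trainable."""
    g = relu(x @ w1) @ w2      # G(x): weight -> ReLU -> weight
    return relu(g + x)         # skip connection, then ReLU

x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))
out = residual_block(x, zero, zero)   # equals x: the block defaults to identity
```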
Accelerating Deep CNN Training
Coming up next

Applications of CNNs
▪ Object detection
▪ Semantic image segmentation

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 33


Thank you.
Contact information:

open@sap.com
© 2017 SAP SE or an SAP affiliate company. All rights reserved.

Week 4: Convolutional Networks
Unit 5: Applications of CNNs
Applications of CNNs
What we covered in the last unit

Accelerating Deep CNN Training


▪ Computational considerations
▪ Batch normalization
▪ Transfer learning
▪ Residual networks

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Applications of CNNs
Overview

Content:
▪ Object detection
– Two-stage detectors
– One-stage detectors
▪ Segmentation

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 3


Applications of CNNs
Overview

Content:
▪ Object detection
– Two-stage detectors
– One-stage detectors
▪ Segmentation

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 4


Applications of CNNs
Object detection: example

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 5


Applications of CNNs
Object detection – Naïve approach

1. Specify sliding window size

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 6




Applications of CNNs
Object detection – Naïve approach

1. Specify sliding window size


2. Run pre-trained object-classification CNN on each window (softmax output)

CNN

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 10


Applications of CNNs
Object detection – Naïve approach

1. Specify sliding window size


2. Run pre-trained object-classification CNN on each window (softmax output)
3. Associate current sliding window with object class if probability is sufficiently high (e.g. > 0.5)

CNN

Classification
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 11
Applications of CNNs
Object detection – Naïve approach

1. Specify sliding window size


2. Run pre-trained object-classification CNN on each window (softmax output)
3. Associate current sliding window with object class if probability is sufficiently high (e.g. > 0.5)

Problem: intractable even for small images

CNN

Classification
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 13
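To see why the naïve approach is intractable, it helps to count the windows for just one window size at stride 1 (the numbers below are illustrative, not from the slides):

```python
def num_windows(image, window, stride=1):
    """How many sliding-window crops a single window size produces."""
    (H, W), (h, w) = image, window
    return ((H - h) // stride + 1) * ((W - w) // stride + 1)

# Even a small 224x224 image with a single 32x32 window at stride 1 needs
# one full CNN forward pass per crop:
n = num_windows((224, 224), (32, 32))   # 37,249 crops -- and a real detector
                                        # would repeat this for many scales
```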
Applications of CNNs
Two-stage detectors: key idea

▪ Use proposal algorithm (e.g. “selective search”)

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 14


Applications of CNNs
Two-stage detectors: key idea

▪ Use proposal algorithm (e.g. “selective search”)


▪ Boxes are proposed based on e.g.:
  – Texture
  – Color
  – Intensity

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 15




Applications of CNNs
Two-stage detectors: key idea

▪ Use proposal algorithm (e.g. “selective search”)


▪ Boxes are proposed based on e.g.:
  – Texture
  – Color
  – Intensity
▪ Evaluate only on “proposals”

CNN

Classification
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 18
Applications of CNNs
Two-stage detectors: R-CNN [Girshick, Donahue, Darrell, Malik, 2013]

Proposal algorithm Warping Features


Object
SVM classification
CNN Improved
Lin. Reg bounding boxes

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 19


Applications of CNNs
Two-stage detectors: R-CNN [Girshick, Donahue, Darrell, Malik, 2013]

Proposal algorithm Warping Features


Object
SVM classification
CNN Improved
Lin. Reg bounding boxes

Advantages:
▪ Several orders of magnitude faster
▪ Allows improvement of bounding boxes

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 20


Applications of CNNs
Two-stage detectors: R-CNN [Girshick, Donahue, Darrell, Malik, 2013]

Proposal algorithm Warping Features


Object
SVM classification
CNN Improved
Lin. Reg bounding boxes

Advantages:
▪ Several orders of magnitude faster
▪ Allows improvement of bounding boxes

Disadvantages:
▪ CNN cannot adapt to changes in the SVM / linear regression
  ⇒ Implication: degrades the power of the model
▪ CNN often evaluates boxes that have large overlaps
  ⇒ Implication: suboptimal performance
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 21
Applications of CNNs
Two-stage detectors: Fast R-CNN [Girshick 2015]

Key Idea:
▪ Softmax output replaces SVM
▪ Combine regression and classification into one neural net (end-to-end)

Fully connected

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 22


Applications of CNNs
Two-stage detectors: Fast R-CNN [Girshick 2015]

Key Idea:
▪ Softmax output replaces SVM
▪ Combine regression and classification into one neural net (end-to-end)
▪ Evaluate the CNN only once per image (share results among proposals)
Region of Interest Pooling:

(same padding) Feature map Max Pooling Fully connected

Region of Interest Pooling


© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 23
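A hypothetical minimal sketch of Region of Interest pooling: crop the proposal from the shared feature map and max-pool it onto a fixed grid, so every proposal (whatever its size) produces a same-sized input for the fully connected layers. The function name, grid size, and example values are invented for illustration; real Fast R-CNN pools each channel of larger feature maps the same way.

```python
import numpy as np

def roi_max_pool(fmap, roi, out_size=2):
    """Crop a region of interest and max-pool it to a fixed out_size x out_size
    grid, giving the dense layers a constant-sized input per proposal."""
    r0, c0, r1, c1 = roi                      # region bounds (end-exclusive)
    region = fmap[r0:r1, c0:c1]
    h, w = region.shape
    rows = np.array_split(np.arange(h), out_size)   # near-equal row bins
    cols = np.array_split(np.arange(w), out_size)   # near-equal column bins
    return np.array([[region[np.ix_(r, c)].max() for c in cols] for r in rows])

fmap = np.arange(36, dtype=float).reshape(6, 6)
pooled = roi_max_pool(fmap, (1, 1, 5, 6))     # a 4x5 proposal -> fixed 2x2 output
```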
Applications of CNNs
Two-stage detectors: Faster R-CNN [Ren, He, Girshick, Sun, 2015]

Key problem: the region proposal algorithm becomes the bottleneck

Solution: add a so-called Region Proposal Network (RPN)

Pipeline: CNN → feature map → RPN proposals → RoI pooling / warping → classifier / regressor
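The RPN itself is just a small head on the shared feature map: at every spatial position it emits an objectness score and box deltas (per anchor; a single anchor here). A toy, fully linear sketch, not the original implementation; all names are illustrative:

```python
import numpy as np

def rpn_head(feature_map, w_cls, w_reg):
    """Toy Region Proposal Network head with one anchor per position.
    feature_map: (H, W, C); w_cls: (C,); w_reg: (C, 4)."""
    # Objectness score per position via a sigmoid over a linear projection
    scores = 1.0 / (1.0 + np.exp(-(feature_map @ w_cls)))   # (H, W)
    # Four box-regression deltas per position
    deltas = feature_map @ w_reg                             # (H, W, 4)
    return scores, deltas
```

The highest-scoring positions (after decoding the deltas into boxes) serve as the proposals fed into RoI pooling, replacing the external proposal algorithm.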


Applications of CNNs
Two-stage detectors: comparison [Li, Karpathy, Johnson]

[Bar charts, PASCAL VOC 2007: test time per image in seconds (scale 0–60) and mAP (scale 65.4–67) for R-CNN, Fast R-CNN, and Faster R-CNN]
Applications of CNNs
Overview

Content:
▪ Object detection
– Two-stage detectors
– One-stage detectors
▪ Segmentation



Applications of CNNs
Object detection: one-stage detectors

The R-CNN class of detectors is accurate – but very slow!

How can we make this faster? The two-stage methodology is:
▪ First stage: generate region proposals
▪ Second stage: classify and refine the bounding-box proposals

One-stage detectors change this methodology:
▪ Idea: take advantage of fully convolutional networks to compute bounding-box regression and class probabilities in one pass
▪ Implication: the architecture is considerably simpler!
Applications of CNNs
One-stage detectors, e.g. YOLO(9000) – You Only Look Once [Redmon, Divvala, Girshick, Farhadi, 2015]

CNN (encoder) → upsample (decoder), with two parallel output heads:
▪ Box coordinates: each pixel predicts 4 coordinates
▪ Mask: each pixel predicts a class

Also see: Single Shot MultiBox Detector (SSD)
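Reading detections out of such dense per-position output maps can be sketched as follows; this is a simplified illustration, not YOLO's actual decoding, and all names are hypothetical:

```python
import numpy as np

def decode_dense_predictions(class_map, box_map, score_thresh=0.5):
    """Turn per-position predictions into a list of detections.
    class_map: (H, W, num_classes) class probabilities per position
    box_map:   (H, W, 4) box coordinates per position (x0, y0, x1, y1)."""
    detections = []
    scores = class_map.max(axis=-1)        # best class score per position
    classes = class_map.argmax(axis=-1)    # best class index per position
    for i, j in zip(*np.where(scores > score_thresh)):
        detections.append((int(classes[i, j]), float(scores[i, j]), box_map[i, j]))
    return detections
```

In a real detector the surviving boxes would additionally be filtered by non-maximum suppression, since neighbouring positions tend to predict overlapping boxes.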
Applications of CNNs
One-stage detectors: comparison

Problem: one-stage detectors are less accurate than two-stage detectors:

▪ The number of bounding-box predictions is considerably larger (cf. R-CNN's ~2k proposals vs. ~100k boxes in most one-stage detectors)
▪ This leads to more class imbalance, since most of the bounding boxes belong to the background
Applications of CNNs
One-stage detectors: RetinaNet [Lin, Goyal, Girshick, He, Dollár, 2017]

Remedy: introduce the so-called focal loss:
▪ Down-weight the contribution of examples that are particularly easy to learn (e.g. background patches)
▪ Cross-entropy loss:
  L(p, y) = −log(p)        if y = +1
  L(p, y) = −log(1 − p)    if y = −1
▪ Focal loss:
  F(p, y) = −(1 − p)^γ · log(p)    if y = +1
  F(p, y) = −p^γ · log(1 − p)      if y = −1

For more details: RetinaNet – https://arxiv.org/abs/1708.02002
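The two cases of the focal loss can be written directly in code; with γ = 0 it reduces to plain cross-entropy, and larger γ shrinks the loss of well-classified examples. A sketch, not the RetinaNet reference implementation:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss.
    p: predicted probability of the positive class, y in {+1, -1}.
    gamma=0 recovers the ordinary cross-entropy loss."""
    if y == +1:
        return -((1 - p) ** gamma) * np.log(p)
    return -(p ** gamma) * np.log(1 - p)
```

For an easy positive example (p = 0.9), the γ = 2 loss is (1 − 0.9)² = 0.01 times the cross-entropy loss, which is exactly the down-weighting of easy background patches described above.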


Applications of CNNs
Overview

Content:
▪ Object detection
– Two-stage detectors
– One-stage detectors
▪ Segmentation



Applications of CNNs
Image segmentation: example

Applications of CNNs
Image segmentation: generic architecture

CNN (encoder) → upsample (decoder)
Applications of CNNs
Image segmentation, e.g. the U-Net architecture

▪ Encoder / decoder structure
▪ Upsampling: transposed convolutions (see Unit 3)
▪ Pixelwise classification at the output
▪ Skip connections merge high-resolution encoder features into the decoder

Also see: SegNet, Pyramid Scene Parsing Network
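A single U-Net decoder step (upsample the coarse features, then concatenate the matching high-resolution encoder features) can be sketched in NumPy; nearest-neighbour upsampling is used here for simplicity, whereas U-Net itself uses transposed convolutions, and the function names are illustrative:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling: (H, W, C) -> (2H, 2W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_block(low_res, skip):
    """U-Net-style decoder step: upsample the low-resolution features and
    concatenate the matching high-resolution encoder features (skip connection)."""
    up = upsample2x(low_res)
    return np.concatenate([up, skip], axis=-1)
```

The skip connection is what lets the decoder recover sharp object boundaries that were lost during encoder downsampling.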
Applications of CNNs
Image segmentation: challenges

▪ Efficiency / accuracy
  – Encoder downsampling (low resolution) versus dense prediction
  – Make use of dilated convolutions
  – Feature fusion

▪ Data availability
  – Ground truth: hand-segmented pictures (very costly to acquire!)
  – Data augmentation is essential!
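A dilated (atrous) convolution spaces the kernel taps `dilation` samples apart, enlarging the receptive field without pooling away resolution. A 1-D NumPy sketch, for illustration only:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=2):
    """1-D dilated ('atrous') convolution, valid padding.
    The kernel taps are `dilation` samples apart, so a k-tap kernel
    covers a span of (k - 1) * dilation + 1 input samples."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])
```

With dilation = 2, a 3-tap kernel already sees 5 input samples per output, which is why stacking dilated layers grows the receptive field quickly while keeping the feature map dense.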


Applications of CNNs
Image segmentation: data augmentation

▪ Data augmentation is indispensable when there is little training data
▪ Use randomized "on-the-fly" modifications
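For segmentation, any random transformation must be applied identically to the image and its ground-truth mask, or the labels drift away from the pixels. A minimal on-the-fly sketch with a random flip and crop (the crop fraction and function name are illustrative assumptions):

```python
import numpy as np

def augment(image, mask, rng):
    """Randomized on-the-fly augmentation for segmentation.
    The same random flip and crop are applied to image and mask."""
    if rng.random() < 0.5:                    # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    h, w = mask.shape
    ch, cw = int(0.8 * h), int(0.8 * w)       # random crop to 80% of each side
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return image[y:y + ch, x:x + cw], mask[y:y + ch, x:x + cw]
```

Called once per training step with a `numpy.random.Generator`, this produces a fresh variant of each hand-segmented example on every epoch.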


Applications of CNNs
Coming up next

Industry Applications of Deep Learning


▪ Machine learning in customer service
▪ Machine learning in banking
▪ Medical image segmentation



Thank you.
Contact information:

open@sap.com
© 2017 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components
of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated
companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are
set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release
any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products,
and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The
information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various
risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements,
and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company)
in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.
See http://global.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
