Convolutional Neural Networks - Annotated
Introductory
Artificial Intelligence
College of Engineering
Deep Learning - Convolutional Neural Networks
What are they?
• A class of neural networks that can take in an input image, assign importance to parts of it, and differentiate one part from another
• Not fully connected
• A much lower number of weights and biases to learn, making them more manageable
A Typical Convolutional Neural Network Architecture
Applications
• Classification and Retrieval
• Object Detection and Segmentation
• Self-Driving Cars
• Captioning
AlexNet: smaller compute, still memory heavy, lower accuracy
From An Analysis of Deep Neural Network Models for Practical Applications, 2017. Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.
Architecture
Overview
Convolution Layer
A 32x32x3 image (32 height, 32 width, 3 depth): preserve spatial structure.
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products."
Filters always extend the full depth of the input volume.
Convolution Layer
32x32x3 image, 5x5x3 filter
Each position yields 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias).
Convolution Layer
Convolving the 5x5x3 filter over the 32x32x3 image produces a 28x28x1 activation map.
Now consider a second, green 5x5x3 filter: it produces a second, separate 28x28x1 activation map.
For example, if we had 6 5x5x3 filters, we'd get 6 separate activation maps: the 32x32x3 input passes through CONV, ReLU (e.g. 6 5x5x3 filters) to give a 28x28x6 output volume.
Preview
A ConvNet is a sequence of convolution layers, interspersed with activation functions:
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> ...
A closer look at spatial dimensions:
7x7 input (spatially), assume a 3x3 filter.
Applied with stride 1, the filter fits at 5 positions along each axis, giving a 5x5 output.
Applied with stride 2, the filter fits at 3 positions along each axis, giving a 3x3 output.
Output size: (N - F) / stride + 1
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33, which doesn't fit!
In practice: it is common to zero-pad the border.
e.g. input 7x7, 3x3 filter applied with stride 1, padded with a 1-pixel border of zeros => what is the output?
7x7 output! (With padding P, the formula becomes (N - F + 2P) / stride + 1; here (7 - 3 + 2)/1 + 1 = 7.)
In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 pixels (this will preserve size spatially).
e.g. F = 3 => zero-pad with 1; F = 5 => zero-pad with 2; F = 7 => zero-pad with 3
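The output-size rule above is easy to check with a short helper. This is a minimal sketch (the name `conv_output_size` is mine, not from the slides) that evaluates (N - F + 2P) / stride + 1 and flags strides that don't fit:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: (N - F + 2P) / stride + 1.

    Raises ValueError when the filter does not step evenly across the
    input, as with N=7, F=3, stride=3 in the slides.
    """
    span = n - f + 2 * pad
    if span % stride != 0:
        raise ValueError(f"filter does not fit: ({n} - {f} + 2*{pad}) is not a multiple of stride {stride}")
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))         # 5
print(conv_output_size(7, 3, stride=2))         # 3
print(conv_output_size(7, 3, stride=1, pad=1))  # 7: padding (F-1)/2 preserves size
print(conv_output_size(32, 5))                  # 28, as in the earlier 32x32x3 example
```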
Remember back to the earlier example: a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially (32 -> 28 -> 24 -> ...). Shrinking too fast is not good and doesn't work well.
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> ...
Convolutional Layer
Example
(figure: a worked convolution example)
Pooling Layer
Summary
• Accepts a volume of size W1 x H1 x D1
• Requires two hyperparameters: their spatial extent F and the stride S
• Produces a volume of size W2 x H2 x D2, where:
W2 = (W1 - F)/S + 1
H2 = (H1 - F)/S + 1
D2 = D1
• Introduces zero parameters since it computes a fixed function of the input.
• Note that it is not common to use zero-padding for pooling layers.
Common settings: F = 2, S = 2 and F = 3, S = 2
Fully Connected Layer
Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
CNN
Example
1. Convolution Layer
Slide the 2x2 filter over the 5x5 input with stride 1, taking a dot product at each position:

Input:        Filter:
1 1 1 0 0     1 -1
0 1 1 1 0     1  0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Output ((5 - 2)/1 + 1 = 4, so 4x4):
 0  1  2  1
-1  0  1  2
 0 -1  1  1
 0  0  1  1
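The sliding-window computation above can be reproduced in a few lines of NumPy. The input, filter, and output values are taken directly from the example; only the helper name `convolve2d` is mine:

```python
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

kernel = np.array([[1, -1],
                   [1,  0]])

def convolve2d(img, k, stride=1):
    """Valid cross-correlation: slide the kernel over the image, taking dot products."""
    f = k.shape[0]
    out = (img.shape[0] - f) // stride + 1
    result = np.zeros((out, out), dtype=img.dtype)
    for i in range(out):
        for j in range(out):
            window = img[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(window * k)
    return result

print(convolve2d(image, kernel))
# [[ 0  1  2  1]
#  [-1  0  1  2]
#  [ 0 -1  1  1]
#  [ 0  0  1  1]]
```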
CNN Example
2. ReLU Activation
Apply ReLU, max(0, x), elementwise to the convolution output:

 0  1  2  1          0 1 2 1
-1  0  1  2   ReLU   0 0 1 2
 0 -1  1  1   =>     0 0 1 1
 0  0  1  1          0 0 1 1
CNN Example
3. Pooling Layer
Max pooling with a 2x2 filter and stride 2 takes the maximum of each 2x2 block:

0 1 2 1
0 0 1 2        1 2
0 0 1 1   =>   0 1
0 0 1 1
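Steps 2 and 3 can be sketched in NumPy as an elementwise ReLU followed by 2x2 max pooling with stride 2; the feature map is the one produced by the convolution step, and the helper name `max_pool` is mine:

```python
import numpy as np

feature_map = np.array([[ 0,  1, 2, 1],
                        [-1,  0, 1, 2],
                        [ 0, -1, 1, 1],
                        [ 0,  0, 1, 1]])

# ReLU: clamp negative values to zero, elementwise.
activated = np.maximum(feature_map, 0)

def max_pool(x, f=2, stride=2):
    """Max pooling: keep the maximum of each f x f block."""
    out = (x.shape[0] - f) // stride + 1
    pooled = np.zeros((out, out), dtype=x.dtype)
    for i in range(out):
        for j in range(out):
            pooled[i, j] = x[i*stride:i*stride+f, j*stride:j*stride+f].max()
    return pooled

print(max_pool(activated))
# [[1 2]
#  [0 1]]
```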
CNN Example
4. Fully Connected Layer
Flattening the pooled matrix
1 2
0 1
row by row gives the vector [1, 2, 0, 1], which is fed as input to the fully connected layer.
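Flattening and the fully connected step can be sketched as below. The pooled matrix comes from the example; the weight matrix and the two output classes are made up for illustration, since the slides give no FC weights:

```python
import numpy as np

pooled = np.array([[1, 2],
                   [0, 1]])

# Flatten row by row into the vector the FC layer consumes.
x = pooled.flatten()
print(x)  # [1 2 0 1]

# A fully connected layer is just a matrix multiply plus bias.
# W and b are hypothetical (2 output classes, random weights).
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
b = np.zeros(2)
scores = W @ x + b
print(scores.shape)  # (2,)
```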
Convolutional Neural Networks
Summary
• ConvNets stack CONV, POOL, FC layers
• Trend towards smaller filters and deeper architectures
• Trend towards getting rid of POOL/FC layers (just CONV)
• Typical architectures look like:
[(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K, SOFTMAX
where N is usually up to ~5, M is large, and 0 <= K <= 2,
but recent advances such as ResNet/GoogLeNet challenge this paradigm.
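As a sanity check on this pattern, the output-size rule from earlier can trace spatial sizes through a small (CONV-RELU)*N-POOL stack. The layer configuration below is illustrative, not from the slides:

```python
def trace_shapes(size, layers):
    """Trace the spatial size through a list of (kind, F, stride, pad) layers.

    CONV and POOL both follow (N - F + 2P) / stride + 1; POOL uses pad 0.
    """
    shapes = [size]
    for kind, f, stride, pad in layers:
        size = (size - f + 2 * pad) // stride + 1
        shapes.append(size)
    return shapes

# Illustrative stack: two size-preserving 3x3 convs (pad 1), then a
# 2x2/stride-2 max pool, repeated twice -- i.e. [(CONV-RELU)*2-POOL]*2.
stack = [("conv", 3, 1, 1), ("conv", 3, 1, 1), ("pool", 2, 2, 0)] * 2
print(trace_shapes(32, stack))  # [32, 32, 32, 16, 16, 16, 8]
```

Each padded conv preserves the spatial size, and each pool halves it, matching the "shrink slowly" advice above.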