What is convolution, really?
▪ There is a subtle difference between deep learning's convolution and convolution as defined in mathematics.
▪ They start with a series of pooling and convolution layers, called the convolutional base
of the model.
▪ They end with a densely connected classifier.
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018
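That "subtle difference" is the kernel flip: mathematical convolution flips the kernel before sliding it, while deep-learning "convolution" does not (it is really a cross-correlation). A minimal NumPy check on a 1D signal, as an illustrative sketch:

```python
import numpy as np

# Mathematical convolution flips the kernel before sliding it;
# deep-learning "convolution" (cross-correlation) does not.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])

math_conv = np.convolve(x, w, mode="valid")    # kernel is flipped
cross_corr = np.correlate(x, w, mode="valid")  # kernel is not flipped

# Cross-correlating with the flipped kernel reproduces true convolution.
assert np.allclose(math_conv, np.correlate(x, w[::-1], mode="valid"))
print(math_conv, cross_corr)  # [2. 2. 2.] vs. [-2. -2. -2.]
```

The two results differ unless the kernel happens to be symmetric; since a CNN learns its kernels anyway, the missing flip makes no practical difference.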
Convolution: locality and translation invariance
▪ Locality: if we want to recognize patterns corresponding to objects, like an airplane in the sky, we will likely need to look at how nearby pixels are arranged, and we will be less interested in how pixels that are far from each other appear in combination. Essentially, it doesn't matter whether our image of a Spitfire has a tree, a cloud, or a kite in the corner.
▪ To generate images, we need some way to go from a lower-resolution image to a higher-resolution one. We generally do this with the deconvolution operation. Roughly, deconvolution layers allow the model to use every point in the small image to "paint" a square in the larger one.
https://distill.pub/2016/deconv-checkerboard/
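The "painting" intuition can be made concrete in a few lines of NumPy. This is an illustrative sketch of a stride-2 transposed convolution (the function name and shapes are my own, not from the article):

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    """Each input value 'paints' a stride-spaced copy of the kernel
    onto the (larger) output; overlapping squares are summed."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

small = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
kernel = np.ones((3, 3))
big = transposed_conv2d(small, kernel, stride=2)
print(big.shape)  # (5, 5): a 2x2 "image" painted up to 5x5
```

Note how the painted squares overlap wherever the kernel is larger than the stride; those uneven overlaps are the source of the checkerboard artifacts discussed at the link above.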
CNN: wider, deeper, and higher resolution
▪ A deeper network means more convolutional layers.
▪ A wider network means more feature maps (filters) in the convolutional layers, essentially more channels per layer.
▪ A network with higher resolution means that it processes input images with larger width and height (spatial resolutions), so the produced feature maps have higher spatial dimensions.
https://theaisummer.com/cnn-architectures/
CNNs vs. fully connected NNs: #1
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018.
Two key characteristics of convolution nets
▪ The patterns they learn are translation invariant: after learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere, for example in the upper-left corner. A densely connected network would have to learn the pattern anew if it appeared at a new location. This makes convnets data-efficient when processing images.
▪ They can learn spatial hierarchies of patterns: a first layer learns small local patterns such as edges, and later layers learn larger patterns made of the features of the earlier layers.
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018.
Convolution reduces overfitting
▪ A single, small set of weights can train over a much
larger set of training examples, because even though
the dataset hasn’t changed, each mini-kernel is
forward propagated multiple times on multiple
segments of data, thus changing the ratio of weights
to datapoints on which those weights are being
trained.
Trask, Andrew W. Grokking Deep Learning. Manning, 2019.
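Trask's point about the ratio of weights to data can be checked with simple arithmetic: a single small kernel is forward-propagated at every valid position of every image, so its few weights are trained against many patches. (The 28x28 image size below is just an illustrative assumption.)

```python
# A 3x3 kernel (9 weights) slides over a 28x28 image: the same 9
# weights participate in every valid position, so they are trained
# on hundreds of patches per image.
h = w = 28
k = 3
positions = (h - k + 1) * (w - k + 1)
print(positions)       # 676 patches per image
print(positions * 9)   # 6084 weight-applications from only 9 weights
```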
Convolutional Neural Network (CNNs / ConvNets): #0
▪ A densely connected network architecture does not take into account the spatial structure of the images. For instance, it treats input pixels which are far apart and close together on exactly the same footing. Such concepts of spatial structure must instead be inferred from the training data.
▪ But what if, instead of starting with a network architecture which is tabula rasa, we used an
architecture which tries to take advantage of the spatial structure? The name convolutional
comes from the fact that the operation in the used equation is sometimes known as a
convolution.
▪ ConvNet architectures make the explicit assumption that the inputs are images.
http://neuralnetworksanddeeplearning.com/chap6.html#other_techniques_for_regularization
Convolutional Neural Network (CNNs / ConvNets): #0.1
http://www.deeplearningbook.org/slides/09_conv.pdf
Convolutional Neural Network (CNNs / ConvNets): #1
▪ Convolutional Neural Networks take advantage of the fact that the input consists of images
and they constrain the architecture in a more sensible way.
▪ In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons
arranged in 3 dimensions: width, height, depth.
https://cs231n.github.io/convolutional-networks/
Convolutional Neural Network (CNNs / ConvNets): #2
Regular NN
https://cs231n.github.io/convolutional-networks/
Convolutional Neural Network (CNNs / ConvNets): #3
Mueller, John Paul, and Luca Massaron. Machine learning for dummies. John Wiley & Sons, 2016.
Why are convolutions so good for images compared to fully connected layers?
▪ Parameter sharing: a feature detector, such as a vertical detector, that’s useful in one part of
the image is probably useful in another part of the image
▪ Sparsity of connections: in each layer, each output value depends only on a small number of
inputs
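A quick parameter count shows why sharing and sparsity matter; the layer sizes below are illustrative assumptions, not from the slides:

```python
# Connecting a 32x32x3 image densely to 100 hidden units:
dense_params = (32 * 32 * 3) * 100 + 100   # one weight per pixel per unit, plus biases

# A conv layer with 100 filters of size 3x3 over the same 3-channel input:
conv_params = (3 * 3 * 3 + 1) * 100        # each filter: 27 shared weights + 1 bias

print(dense_params)  # 307300
print(conv_params)   # 2800
```

The convolutional layer has roughly 100x fewer parameters, and each output still depends only on a small 3x3 neighborhood of the input.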
What are convolutions?
▪ Convolutions are a component within CNNs. They are defined as a layer within the CNNs. In
a convolution layer, we slide a filter matrix over the entire image matrix from left to right
and from top to bottom, and we take the dot product of the filter, with this patch spanning
the size of the filter over the image channel.
▪ If the two matrices have high values in the same positions, the dot product's output will be
high, and vice versa.
▪ The output of the dot product is a scalar value that identifies the correlation between the pixel pattern in the image and the pixel pattern expressed by the filter. If we were in a pedantic mood, we could call convolutions discrete cross-correlations.
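The sliding dot product described above can be written directly in NumPy. This is an illustrative sketch, and the vertical-edge image used to exercise it is my own toy example:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image from left to right and top to
    bottom, taking the dot product with each patch (a discrete
    cross-correlation, since the kernel is never flipped)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i+kh, j:j+kw]
            out[i, j] = np.sum(patch * kernel)  # dot product = correlation score
    return out

# An image with a vertical edge (dark left half, bright right half) ...
image = np.zeros((4, 6))
image[:, 3:] = 1.0
# ... responds strongly to a vertical-edge kernel exactly at the edge.
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)
response = conv2d(image, vertical_edge)
print(response)  # each row: [0. 3. 3. 0.], strongest where the kernel straddles the edge
```

The output is high exactly where image patch and filter have high values in the same positions, as described above.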
How the convolution operator works (CNNs)
▪ Essentially an element-wise multiplication of each entry, followed by a sum.
https://towardsdatascience.com/what-is-wrong-with-convolutional-neural-networks-75c2ba8fbd6f
Edge detection and convolution detector
Edge detection: options
Padding: #1
Stride
Dilations
▪ Dilations can be used to control the output size, but
their main purpose is to expand the range of what a
kernel sees. In the animation on the right the dilation
is set to 2.
https://medium.com/@marsxiang/convolutions-transposed-and-deconvolution-6430c358a5b6
Transpose convolution
▪ Transposed convolution layers are upsampling in nature. They are usually used in autoencoders and GANs, or wherever the network must reconstruct an image.
https://medium.com/@marsxiang/convolutions-transposed-and-deconvolution-6430c358a5b6
Transposed convolution vs. deconvolution
▪ They are not the same thing, although the terms are often used interchangeably. A true deconvolution mathematically reverses the convolution, recovering the original input.
▪ A transposed convolution layer, on the other hand, only reconstructs the spatial dimensions of the input; it does not reproduce the same values as the input.
https://medium.com/@marsxiang/convolutions-transposed-and-deconvolution-6430c358a5b6
Convolution over volumes
Convolution operator and output dimensions
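The usual output-size formula ties together input size, kernel size, padding, stride, and dilation. A small helper to compute it, as my own sketch:

```python
import math

def conv_output_size(n, k, padding=0, stride=1, dilation=1):
    """Spatial output size of a convolution, using the usual formula:
    floor((n + 2p - d*(k-1) - 1) / s) + 1."""
    effective_k = dilation * (k - 1) + 1  # dilation widens what the kernel "sees"
    return math.floor((n + 2 * padding - effective_k) / stride) + 1

print(conv_output_size(28, 3))             # 26: "valid" convolution shrinks the input
print(conv_output_size(28, 3, padding=1))  # 28: "same" padding preserves the size
print(conv_output_size(28, 3, stride=2))   # 13: stride downsamples
print(conv_output_size(28, 3, dilation=2)) # 24: the effective kernel size is 5
```

The same formula covers the earlier padding, stride, and dilation slides as special cases.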
Biggest problem with CNN
▪ Commonly cited issues: overfitting, exploding gradients, and class imbalance are the major challenges when training a CNN.
▪ But there is more: it turned out that pooling is very bad, and the fact that it works so well is a disaster.
https://towardsdatascience.com/what-is-wrong-with-convolutional-neural-networks-75c2ba8fbd6f
Pooling: #1
Pooling is a destructive or generalizing process used to reduce overfitting. BUT CNNs have a habit of overfitting even with pooling layers, so dropout should also be used, for example between fully connected layers and perhaps after pooling layers.
Pooling: #2
▪ There is more than one type of pooling layer (max pooling, average pooling, …). The most common these days is max pooling, because it gives translational invariance (poor, but good enough for some tasks) and because it reduces the dimensionality of the network so cheaply (with no parameters). Max pooling layers are actually very simple: you predefine a filter (a window) and sweep this window across the input, taking the max of the values contained in the window as the output.
https://towardsdatascience.com/what-is-wrong-with-convolutional-neural-networks-75c2ba8fbd6f
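The procedure just described is easy to write out. A NumPy sketch, with an illustrative window size and input of my own choosing:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Slide a size x size window over x and keep the max of each window."""
    h, w = x.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

x = np.array([[1.0, 3.0, 2.0, 4.0],
              [5.0, 6.0, 1.0, 2.0],
              [1.0, 2.0, 8.0, 7.0],
              [3.0, 4.0, 6.0, 5.0]])
print(max_pool2d(x))  # [[6. 4.] [4. 8.]]
```

Note that nothing here is learned: the operation has no parameters, which is exactly why it reduces dimensionality so cheaply.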
Pooling: #3
▪ The pooling layer is used to reduce the spatial dimension of an input, preserving its depth.
▪ As we move from the initial layer to the later layers in a CNN, we want to identify more
conceptual meaning in the image compared to actual pixel by pixel information, and so we
want to identify and keep key pieces of information from the input and throw away the rest.
https://towardsdatascience.com/what-is-wrong-with-convolutional-neural-networks-75c2ba8fbd6f
Visualizing what convnets learn
▪ We won’t survey all of them, but we’ll cover three of the most accessible and useful ones:
▪ Visualizing intermediate convnet outputs (intermediate activations): useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters.
▪ Visualizing convnet filters: useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to.
▪ Visualizing heatmaps of class activation in an image: useful for understanding which parts of an image were identified as belonging to a given class, thus allowing you to localize objects in images.
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018.
What do hidden layers in a CNN learn?
Some of the activations created by the fifth convolution layer. We can see that the early layers detect lines and edges.
Rao, Delip, and Brian McMahan. Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O'Reilly Media, 2019.
Training a new model from scratch using
what little data you have
▪ Having to train an image-classification model using very little data is a common situation, which you’ll
likely encounter in practice if you ever do computer vision in a professional context.
▪ What’s more, deep-learning models are by nature highly repurposable: you can take, say, an image-classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes. Specifically, in the case of computer vision, many pretrained models (usually trained on the ImageNet dataset) are now publicly available for download and can be used to bootstrap powerful vision models out of very little data.
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018.
How to combat overfitting in computer vision
when you have a small dataset?
▪ Because you have relatively few training samples (2,000), overfitting will be your
number-one concern.
▪ Dropout and L2 regularization are both valid techniques to combat overfitting.
▪ However, when it comes to computer vision, data augmentation is used almost universally.
Chollet, Francois. Deep learning with Python. Vol. 361. New York: Manning, 2018.
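Data augmentation can be as simple as random label-preserving transforms. A NumPy sketch with flip-and-crop only (real pipelines add rotations, shifts, zooms, and more; sizes here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop=24):
    """Random horizontal flip plus a random crop: two simple,
    label-preserving transforms used to stretch a small dataset."""
    if rng.random() < 0.5:
        image = image[:, ::-1]               # horizontal flip
    h, w = image.shape
    top = rng.integers(0, h - crop + 1)      # random crop position
    left = rng.integers(0, w - crop + 1)
    return image[top:top+crop, left:left+crop]

image = rng.random((28, 28))
print(augment(image).shape)  # (24, 24), a slightly different sample each call
```

Each call yields a new plausible variant of the same image, so the model never sees exactly the same input twice.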
1D convnets
▪ In the same way that 2D convnets perform well for processing visual patterns in 2D space, 1D convnets perform well for processing temporal patterns.
▪ They offer a faster alternative to RNNs on some problems, in particular natural-language processing tasks.
▪ Typically, 1D convnets are structured much like their 2D equivalents from the world of computer vision: they consist of stacks of Conv1D layers and MaxPooling1D layers, ending in a global pooling operation or flattening operation.
▪ Because RNNs are extremely expensive for processing very long sequences, but 1D convnets are cheap, it can be a good idea to use a 1D convnet as a preprocessing step before an RNN, shortening the sequence and extracting useful representations for the RNN to process.
Chollet, Francois. Deep learning with Python. Simon and Schuster, 2017.
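The 1D case is the same sliding dot product as in 2D, just along one axis. A toy NumPy example with a step-detecting kernel of my own choosing:

```python
import numpy as np

# A 1D convolution slides a short kernel along a sequence, just as
# its 2D counterpart slides a filter over an image.
sequence = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
step_kernel = np.array([-1.0, 1.0])  # responds to upward steps

# np.correlate computes the sliding dot product without flipping the kernel.
response = np.correlate(sequence, step_kernel, mode="valid")
print(response)  # [ 0.  1.  0.  0. -1.  0.]
```

The filter fires positively where the sequence rises and negatively where it falls: the temporal analogue of the 2D edge detector.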
Scaling images