CV - Deep Convolutional Neural Networks

Deep Convolutional Neural Networks for Image Classification
CNN basic elements
Faculty of Engineering Sciences, ESAT-PSI

Convolutional Layers
A. Rosebrock: Deep Learning for Computer Vision with Python - Starter Bundle
Pooling Features (“Subsampling”)
• The job of complex cells

• Max Pooling
• Is there a diagonal edge somewhere in an area of the image?
• Take the maximum over the responses to the feature detector in the area
• Average Pooling
• Is there a blob pattern in an area of the image?
• Take the average over the responses to the feature detectors in the area
• Max Pooling generally works better
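A minimal sketch of both pooling variants, assuming non-overlapping square areas and a feature map stored as a plain list of rows. The function name `pool2d` and the example responses are illustrative, not taken from the slides.

```python
def pool2d(feature_map, size, mode="max"):
    """Pool non-overlapping size x size areas of a 2D feature map (list of rows)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the feature detector responses inside one area.
            area = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(area))              # "is the feature anywhere here?"
            else:
                pooled_row.append(sum(area) / len(area))  # "how strong is it on average?"
        pooled.append(pooled_row)
    return pooled


# Hypothetical responses of one feature detector on a 4x4 grid.
responses = [
    [0, 1, 0, 0],
    [0, 9, 0, 0],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
max_pooled = pool2d(responses, 2, "max")   # [[9, 0], [2, 4]]
avg_pooled = pool2d(responses, 2, "avg")   # [[2.5, 0.0], [2.0, 4.0]]
```

The single strong response (9) survives max pooling unchanged, while average pooling dilutes it to 2.5.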
CSC411: Machine Learning and Data Mining
http://cs231n.github.io/convolutional-networks/
Max Pooling as Hierarchical Invariance
• Max Pooling is a little bit like an “OR”: at each level of the hierarchy, it gives features that are invariant across a bigger range of transformations.
• Average Pooling is a little bit like an “AND”
Putting it all together
• Different types of layers: convolution and subsampling.

• Convolution layers compute feature maps: the response to multiple feature detectors on a grid in the lower layer
• Subsampling layers pool the features from a lower layer into a smaller feature map
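To make the feature map idea concrete, here is a small sketch of one feature detector sliding over an image grid. It assumes stride 1 and no padding, and computes a cross-correlation (the operation deep learning libraries usually call convolution). The name `conv2d_valid` and the toy image are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide one feature detector (kernel) over the image grid; stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]          # responds to dark-to-bright transitions
feature_map = conv2d_valid(image, vertical_edge)   # [[0, 2, 0], [0, 2, 0]]
```

The detector responds only in the column where the edge sits. A convolution layer applies many such detectors and stacks the resulting feature maps.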
Why Convolutional Nets
It’s possible to compute the same outputs in a fully connected neural network, but:
• The network is much harder to train: more weights, more data, slower convergence
• There is more danger of overfitting if we try it with a really big network
• A convolutional network has fewer parameters due to weight sharing *
• It makes sense to detect features and then combine them
• That’s what the brain seems to be doing
* Small fully connected networks can work very well, but are hard to train
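The effect of weight sharing can be checked with a quick parameter count. The sizes below (a 32x32 RGB input mapped to 16 feature maps of the same spatial size, 3x3 kernels) are assumptions chosen for illustration only.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened input."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel, in_c, filters):
    """Weights plus biases of a convolution layer; the kernel is shared over all positions."""
    return kernel * kernel * in_c * filters + filters


# Hypothetical layer: 32x32x3 input, 32x32x16 output.
fc_count = dense_params(32, 32, 3, 32 * 32 * 16)   # 50,348,032 parameters
conv_count = conv_params(3, 3, 16)                 # 448 parameters
```

Both layers produce an output of the same shape, but the convolution needs about five orders of magnitude fewer parameters in this example.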
LeNet (1998): The origin of convolutional neural networks
S. Banerjee, SlideShare

AlexNet (2012)

Layer of Neurons: matrix multiplication

CUDA
Historically, GPUs were used for graphics processing.

But people realized that the fine-grained parallelism inherent in GPU architecture could be exploited for general-purpose computing.
CUDA (Compute Unified Device Architecture)

Parallel computing platform
Programming model and API
Allows CUDA-enabled GPUs to be used for general-purpose processing

GPU acceleration
• CPU: 7th gen i7-7500U, 2.7 GHz
• GPU: NVIDIA GeForce 940MX, 2 GB (laptop)
• GPU: NVIDIA GeForce 1070, 8 GB (desktop)
• 2 x AMD Opteron 6168 1.9 GHz processors (2x12 cores total), taken from a PowerEdge R715 server

CPU vs GPU
http://www.hpcadvisorycouncil.com/events/2017/stanford-workshop/pdf/JBernauer__MLIntro_Tutorial_Tuesday_02072017.pdf
https://www.slideshare.net/0xdata/intro-to-machine-learning-for-gpus
ReLU Non-Linearity – Simpler Activation

VGGNet (2014)
• Only 3x3 convolutions
• Doubling number of filters per “layer”

• Halving layer height and width while doubling layer thickness keeps the computation per layer roughly constant
• Pretraining
• Training smaller versions of network
• Use converged weights as initialization for larger network layers
https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11
VGGNet
• Deep (16/19 layers) networks
• On par with GoogleNet on ILSVRC
• Large (500MB!)
• Achieves higher classification accuracy (compared to GoogleNet) in practice
• Generalizes better
• Better fit for transfer learning and fine-tuning

GoogleNet or Inception (2014)

https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Heterogeneous Set of Convolutions
Learn multi-scale features
Do additional max-pooling (at the time max-pooling was claimed essential)
Super expensive if we want a decent number of filters in each layer
Inception Module
The 1x1 convolutions at the bottom of the module reduce the number of inputs
by a factor of
Decreases computation cost dramatically
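A rough count of multiplications shows why the 1x1 reduction pays off. The sizes below (28x28 feature map, 192 input channels, 16 reduced channels, 32 output channels of 5x5) are assumptions for illustration, as is the helper name `conv_mults`.

```python
def conv_mults(h, w, in_c, out_c, k):
    """Multiplications of a k x k convolution with stride 1 and 'same' padding."""
    return h * w * out_c * k * k * in_c


h, w, in_c, out_c = 28, 28, 192, 32   # hypothetical feature map and filter counts
reduced_c = 16                        # hypothetical width after the 1x1 reduction

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_mults(h, w, in_c, out_c, 5)                       # 120,422,400

# 1x1 reduction to 16 channels, followed by the same 5x5 convolution.
bottleneck = (conv_mults(h, w, in_c, reduced_c, 1)
              + conv_mults(h, w, reduced_c, out_c, 5))          # 12,443,648

saving = direct / bottleneck   # a bit less than 10x in this example
```

Under these assumptions the reduced branch needs roughly a tenth of the multiplications, while still producing 32 feature maps of 5x5 responses.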
GoogleNet Key Ideas

Inception v1
Auxiliary Classifiers
Deep Network: risk of vanishing gradients

Add auxiliary classifiers
Softmax outputs in the middle of the network, the same as at the top
Encourages the network to learn features that are useful for classification in the middle
The total loss function is a weighted sum of the auxiliary loss and the real loss.
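The weighted sum can be sketched as follows. The function names and the example probabilities are illustrative, and the default weight of 0.3 is an assumption (it is the value reported in the GoogLeNet paper, but the slides do not state one).

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one softmax output against the true class index."""
    return -math.log(probs[label])


def total_loss(main_probs, aux_probs_list, label, aux_weight=0.3):
    """Real loss at the top of the network plus down-weighted auxiliary losses."""
    real = cross_entropy(main_probs, label)
    auxiliary = sum(cross_entropy(p, label) for p in aux_probs_list)
    return real + aux_weight * auxiliary


# Hypothetical softmax outputs for a 3-class problem, true class 0.
main_output = [0.7, 0.2, 0.1]
aux_outputs = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]   # two classifiers in the middle
loss = total_loss(main_output, aux_outputs, label=0)
```

At inference time the auxiliary classifiers are typically discarded. They only inject extra gradient into the middle of the network during training.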
Why go deeper?
• According to the universal approximation theorem, a feedforward network with a single hidden layer is, given enough capacity, sufficient to approximate any continuous function.
• However, that layer might be massive, and the network is prone to overfitting the data.
• Therefore, the common trend in the research community is to make network architectures deeper.
• AlexNet: 5 convolutional layers
• VGGNet: 19 layers
• GoogleNet: 22 layers

Why go deeper?
• A wide network is good for memorization, but not so good for generalization
• Multiple layers can learn features at various levels of abstraction
• Deep layers can provide features with global semantic meaning and abstract details (relations of relations ... of relations of objects), while using only small kernels
• Small kernels keep the number of parameters low

How to go Deep?

https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
ResNet (Residual Neural Network) (2015)
• Extremely deep networks are feasible
• Can be trained using standard SGD and a reasonable initialization
• Relies on a micro-architecture called the Residual Module
A. Rosebrock: Deep Learning for Computer Vision with Python - Practitioner Bundle
Nested function classes
• Adding layers doesn’t only make the network more expressive, it also changes it in sometimes not quite so predictable ways
• Only if larger function classes contain the smaller ones are we guaranteed that increasing them strictly increases the expressive power of the network.
• At the heart of ResNet is the idea that every additional layer should contain the identity function as one of its elements. This means that if we can train the newly-added layer into an identity mapping 𝑓(𝐱)=𝐱, the new model will be as effective as the original model. As the new model may get a better solution to fit the training data set, the added layer might make it easier to reduce training errors.
https://d2l.ai/chapter_convolutional-modern/resnet.html

Residual Learning
• Learn y = f(x) + x
• These residual layers start at the identity function and evolve to become more complex as the network learns.
• This type of residual learning framework allows us to train networks that are substantially deeper than previously proposed network architectures.
• Furthermore, since the input is included in every residual module, it turns out the network can learn faster and with larger learning rates.
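A minimal sketch of y = f(x) + x, assuming f is two small fully connected layers with a ReLU in between. Real residual modules use convolutions and batch normalization; the names and weights here are illustrative.

```python
def matvec(weights, x):
    """Multiply a weight matrix (list of rows) with a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, vi) for vi in v]


def residual_module(x, w1, w2):
    """y = f(x) + x, with f(x) = w2 * relu(w1 * x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return [fi + xi for fi, xi in zip(fx, x)]   # the skip connection adds the input back


x = [1.0, -2.0, 3.0]

# With all-zero weights f(x) = 0, so the module is exactly the identity.
zeros = [[0.0] * 3 for _ in range(3)]
y_identity = residual_module(x, zeros, zeros)          # [1.0, -2.0, 3.0]

# With non-zero weights the module learns a correction on top of the identity.
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
y_learned = residual_module(x, eye, half)              # [1.5, -2.0, 4.5]
```

Because the identity is available for free, stacking more such modules cannot make the representable function class smaller.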
ResNet

ResNet pre-activation variant
• Deeper ResNets still outperform shallower ResNets

The deeper the better!

DCNN as feature extractors
1. Use a pre-trained Convolutional Neural Network as feature extractor.
2. Using this feature extractor, forward propagate your dataset of images through the network and extract the activations at a given layer.
3. A standard machine learning classifier can then be trained on top of the CNN features.
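The three steps can be sketched in miniature. Everything here is a stand-in: `extract_features` is a hypothetical fixed function playing the role of the frozen pre-trained network, and a nearest-centroid rule plays the role of the standard classifier. In practice the features would come from a real pre-trained CNN.

```python
def extract_features(image):
    """Hypothetical stand-in for the activations of a pre-trained CNN layer (step 1)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    left = sum(row[0] for row in image)
    right = sum(row[-1] for row in image)
    return [mean, right - left]


def fit_centroids(features, labels):
    """A very simple classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        members = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, feature):
    """Return the class whose centroid is closest to the feature vector."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], feature))


# Toy 2x2 "images" with made-up labels.
train_images = [
    [[0, 0], [0, 0]], [[0, 1], [0, 0]],
    [[9, 9], [9, 9]], [[9, 8], [9, 9]],
]
train_labels = ["dark", "dark", "bright", "bright"]

train_features = [extract_features(img) for img in train_images]   # step 2
centroids = fit_centroids(train_features, train_labels)            # step 3
prediction = predict(centroids, extract_features([[8, 9], [9, 8]]))
```

The extractor is never updated. Only the classifier on top sees the labels.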
Transfer Learning and fine-tuning
• But there is another type of transfer learning, one that can actually outperform the feature extraction method if you have sufficient data.
• This method is called fine-tuning.
• First, cut off the final set of fully-connected layers from a pre-trained Convolutional Neural Network.
• Replace them with a new set of fully-connected layers with random initializations.
• While the new layers train, all pre-FC layers are frozen so their weights cannot be updated.
• Then un-freeze the pre-FC layers and continue training with a very low learning rate.
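The freeze-then-unfreeze schedule can be illustrated on a deliberately tiny model, assuming a single scalar weight for the pre-trained body and one for the new head. The model, data, learning rates and step counts are all made up for the sketch.

```python
def mse(body, head, data):
    """Mean squared error of the model y_hat = head * (body * x)."""
    return sum((head * body * x - y) ** 2 for x, y in data) / len(data)


def train(body, head, data, lr, steps, freeze_body):
    """Plain gradient descent; the body weight is only updated when not frozen."""
    for _ in range(steps):
        grad_body = grad_head = 0.0
        for x, y in data:
            hidden = body * x
            err = head * hidden - y
            grad_head += 2 * err * hidden / len(data)
            grad_body += 2 * err * head * x / len(data)
        head -= lr * grad_head
        if not freeze_body:
            body -= lr * grad_body
    return body, head


data = [(x, 3.0 * x) for x in (-1.0, 0.5, 1.0, 2.0)]   # hypothetical new task
pretrained_body = 2.0    # stands in for the pre-trained pre-FC layers
new_head = 0.1           # stands in for the freshly initialized FC layers

# Phase 1: body frozen, only the new head is trained.
frozen_body, warm_head = train(pretrained_body, new_head, data,
                               lr=0.05, steps=3, freeze_body=True)
loss_warmup = mse(frozen_body, warm_head, data)

# Phase 2: un-freeze and train everything with a very low learning rate.
tuned_body, tuned_head = train(frozen_body, warm_head, data,
                               lr=0.001, steps=50, freeze_body=False)
loss_finetuned = mse(tuned_body, tuned_head, data)
```

In phase 1 the pre-trained weight is untouched, so the large early gradients of the random head cannot damage it. In phase 2 it moves only slightly.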
Transfer Learning
• Three ways in which transfer might improve learning.
https://machinelearningmastery.com/transfer-learning-for-deep-learning/

Generative Adversarial Networks (2014)
• GANs can be used to generate synthetic (i.e., fake) images that are perceptually near identical to their ground-truth, authentic originals.
• In order to generate synthetic images, we make use of two neural networks during training:
• A generator that accepts an input vector of randomly generated noise and produces an output “imitation” image that looks similar, if not identical, to an authentic image
• A discriminator or adversary which attempts to determine if a given image is “authentic” or “fake”
• By training both of these networks at the same time, one giving feedback to the other, we can learn to generate synthetic images.
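The feedback between the two networks is carried by their losses. The sketch below only evaluates these losses for given discriminator outputs; it assumes the binary cross-entropy formulation and the non-saturating generator loss, and it contains no networks or training loop. All names and numbers are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: authentic images should score 1, generated images 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(G(z)) close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that an image is authentic).
d_real = [0.9, 0.8]            # scores on authentic images
d_fake_caught = [0.1, 0.2]     # scores on fakes the discriminator sees through
d_fake_fooled = [0.6, 0.7]     # scores on fakes that fool the discriminator

g_bad = generator_loss(d_fake_caught)
g_good = generator_loss(d_fake_fooled)
d_winning = discriminator_loss(d_real, d_fake_caught)
d_losing = discriminator_loss(d_real, d_fake_fooled)
```

Whatever lowers the generator's loss raises the discriminator's, which is the adversarial part of the setup.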
Image credit: Thalles Silva

A. Rosebrock: Deep Learning for Computer Vision with Python - Practitioner Bundle
Goodfellow et al. (2014) Generative Adversarial Networks
Radford et al. (2015) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
GAN
GANs’ potential is huge, because they can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is impressive – poignant even.
In a surreal turn, Christie’s sold a portrait for $432,000 that had been generated by a GAN, based on open-source code written by Robbie Barrat of Stanford. Like most true artists, he didn’t see any of the money, which instead went to the French company, Obvious.
https://skymind.ai/wiki/generative-adversarial-network-gan

GAN
• Discriminative algorithms try to classify input data; that is, given the features (x) of an instance of data, they predict the label or category (y) to which that data belongs, i.e. the posterior p(y|x).
• Discriminative models learn the boundary between classes
• Generative algorithms attempt to predict the features (x) given a certain label (y), i.e. the likelihood p(x|y).
• Generative models model the distribution of individual classes
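The difference between p(y|x) and p(x|y) is easy to see on a small table of counts. The features, labels and counts below are invented for illustration.

```python
# Hypothetical counts of (feature x, label y) pairs in a data set.
counts = {
    ("stripes", "zebra"): 45, ("plain", "zebra"): 5,
    ("stripes", "horse"): 15, ("plain", "horse"): 135,
}


def p_y_given_x(y, x):
    """Posterior p(y|x): among instances with feature x, the fraction with label y."""
    total_x = sum(n for (xi, _), n in counts.items() if xi == x)
    return counts[(x, y)] / total_x


def p_x_given_y(x, y):
    """Likelihood p(x|y): among instances with label y, the fraction showing feature x."""
    total_y = sum(n for (_, yi), n in counts.items() if yi == y)
    return counts[(x, y)] / total_y


posterior = p_y_given_x("zebra", "stripes")    # 45 / 60 = 0.75
likelihood = p_x_given_y("stripes", "zebra")   # 45 / 50 = 0.9
```

The same cell of the table gives two different numbers, depending on whether we condition on the feature (discriminative view) or on the label (generative view).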

Object Detection using CNN
• Use traditional object detection procedure
1. Sliding windows
2. Image pyramids
3. Non-maxima suppression
4. Batch processing
• Substitute conventional classifier by CNN classifier
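The first two steps of the traditional procedure can be sketched on image sizes alone. The scale factor, window size and step size are assumed values, and the classifier call on each window is left out.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=32):
    """Yield successively smaller (width, height) levels of an image pyramid."""
    while width >= min_size and height >= min_size:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, window=32, step=16):
    """Yield the top-left corner (x, y) of every window that fits in the image."""
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            yield x, y


# Hypothetical 128x96 input image.
levels = list(pyramid_sizes(128, 96))          # [(128, 96), (85, 64), (56, 42)]
windows_per_level = [len(list(sliding_windows(w, h))) for w, h in levels]   # [35, 12, 2]
total_windows = sum(windows_per_level)         # 49 crops to classify
```

Each of the 49 crops would be passed to the CNN classifier (in batches), and overlapping detections would then be merged with non-maxima suppression.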
Downsides
• There are many downsides to treating a neural network trained for classification as an object detector, namely:
• Sliding windows + image pyramids are incredibly slow, even when utilizing a GPU for inference
• It can be tedious to tune the scale for the image pyramid and step size for sliding window
• Due to the tediousness of the parameter selection, we can easily miss objects in our images
• With these negatives in mind, it raises the question: “Is there a way to build an end-to-end object detector with deep learning? And if so, why even bother studying the fundamentals of object detection?”
End-to-End DCNN Object Detection
• The answer to the first part of the question is yes: we can train end-to-end deep learning object detectors, but we need to leverage specific network architectures and frameworks to do so, namely Faster R-CNNs and SSDs.
• To answer the second question, we need to understand the concept of sliding windows to understand how traditional methods localized objects. Deep learning-based object detectors utilize either:
• Region proposal methods to zero in on the areas of an image that look “interesting” and therefore deserve closer attention and more computation.
• Image division where an image is partitioned into regions, passed into a CNN, and then the regions are modified and grouped together based on the output predictions.
• It would be extremely challenging, if not impossible, to understand and appreciate these approaches to object detection without first understanding the classical approach of image pyramids and sliding windows.
Bounding/Anchor Boxes
• Bounding Boxes
• Anchor Boxes

Labeling training set anchor boxes
• In the training set, we consider each anchor box as a training example.
• Each training anchor box gets two types of labels
• the category of the target contained in the anchor box (category)
• the offset of the ground-truth bounding box relative to the anchor box
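A sketch of how one anchor box could receive its two labels. It assumes boxes given as (x1, y1, x2, y2), an IoU threshold of 0.5 for assigning a category, and a commonly used offset parameterization (centre shift relative to the anchor size, log of the size ratio). Exact conventions differ between detectors.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def offsets(anchor, truth):
    """Centre shift relative to the anchor size, and log ratio of the sizes."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    tw, th = truth[2] - truth[0], truth[3] - truth[1]
    ax, ay = anchor[0] + aw / 2, anchor[1] + ah / 2
    tx, ty = truth[0] + tw / 2, truth[1] + th / 2
    return ((tx - ax) / aw, (ty - ay) / ah, math.log(tw / aw), math.log(th / ah))


def label_anchor(anchor, truth, category, threshold=0.5):
    """Return (category, offset); anchors with too little overlap become background."""
    if iou(anchor, truth) >= threshold:
        return category, offsets(anchor, truth)
    return "background", (0.0, 0.0, 0.0, 0.0)


ground_truth = (2.0, 0.0, 12.0, 10.0)   # hypothetical ground-truth box of a "dog"
near_label = label_anchor((0.0, 0.0, 10.0, 10.0), ground_truth, "dog")
far_label = label_anchor((20.0, 20.0, 30.0, 30.0), ground_truth, "dog")
```

The nearby anchor overlaps the ground truth with IoU 2/3, so it is labeled "dog" with an offset of 0.2 anchor widths to the right. The distant anchor becomes background.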

Output Bounding Boxes for prediction
• In object detection, we first generate multiple anchor boxes, predict the categories and offsets for each anchor box, adjust the anchor box position according to the predicted offset to obtain the predicted bounding boxes, and finally filter these with non-maximum suppression (NMS) to keep only the bounding boxes that need to be output.
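The last two steps of this pipeline can be sketched as follows. The offset convention (centre shift relative to the anchor size, log size ratio), the IoU threshold of 0.5 and the greedy NMS variant are assumptions; the boxes and scores are made up.

```python
import math


def apply_offsets(anchor, offset):
    """Shift and rescale an anchor (x1, y1, x2, y2) by a predicted (dx, dy, dw, dh)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx = anchor[0] + aw / 2 + offset[0] * aw
    cy = anchor[1] + ah / 2 + offset[1] * ah
    w, h = aw * math.exp(offset[2]), ah * math.exp(offset[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


shifted = apply_offsets((0.0, 0.0, 10.0, 10.0), (0.2, 0.0, 0.0, 0.0))

# Hypothetical predicted boxes: the first two cover the same object.
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # indices of the boxes to output
```

The second box overlaps the higher-scoring first box with IoU of about 0.68 and is suppressed, so only boxes 0 and 2 are output.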

Single Shot Multibox Detection
• Single-Shot: localization and detection performed in the same forward inference pass
• Multibox: multiple objects at the same time
• Detector: both category and position
• Multi-scale
• Base network as feature generator (e.g. ResNet)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21-37). Springer, Cham.
SSD
• We progressively reduce the volume size in deeper layers (cf. standard CNN)
• Each of the CONV layers connects to the final detection layer (varying scale detection)
• Trained with a categorical cross-entropy loss (labels) and an L1 loss (location)
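The combined training objective can be sketched for a single anchor box. The weighting factor, the convention that class index 0 means background, and the example numbers are assumptions for illustration.

```python
import math


def detection_loss(class_probs, true_class, pred_offsets, true_offsets, loc_weight=1.0):
    """Cross-entropy on the class label plus L1 on the box offsets, for one anchor."""
    class_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Assumption: index 0 is background, which has no box to regress.
        return class_loss
    loc_loss = sum(abs(p - t) for p, t in zip(pred_offsets, true_offsets))
    return class_loss + loc_weight * loc_loss


# Hypothetical prediction for an anchor whose true class is 1.
probs = [0.1, 0.8, 0.1]
predicted = (0.1, 0.0, 0.2, -0.1)
target = (0.2, 0.0, 0.0, 0.0)

object_loss = detection_loss(probs, 1, predicted, target)        # -ln(0.8) + 0.4
background_loss = detection_loss([0.9, 0.05, 0.05], 0, predicted, target)
```

For the full network this quantity would be summed over all anchors at all scales.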
