
Deep Learning for

Computer Vision
CHAPTER 3

INTRODUCTION TO
CONVOLUTIONAL
NEURAL NETWORKS

Prof. G.S. Jison Hsu 徐繼聖


• Artificial Vision Laboratory
• National Taiwan University of
Science and Technology

Deep Learning for Computer Vision


Outline



Hardware and Software – University of Michigan
OUTLINE:
• 07:20 – 09:28 CPU vs GPU
• 17:02 – 19:55 Example: Matrix Multiplication
• 19:55 – 21:14 Programming GPUs
• 29:32 – 30:05 Recall: Computational Graphs
• 30:06 – 31:50 The Point of Deep Learning Frameworks
• 33:22 – 51:08 PyTorch: Fundamental Concepts
• 51:09 – 54:29 PyTorch: nn Defining Modules
• 54:30 – 55:01 PyTorch: Pretrained Models



CPU vs GPU

https://reurl.cc/YXRpRO [07:20 – 09:28 ]


PyTorch: Fundamental Concepts

https://reurl.cc/9RrgNV [33:22 – 51:08 ]



ImageNet Challenge

https://reurl.cc/y6ExVy [1:15 - 2:54]


ImageNet Challenge
• What is ImageNet? Official ImageNet website: https://reurl.cc/XVpA2R
• ImageNet is a benchmark dataset for object category classification and detection,
featuring hundreds of object categories and millions of images.
• The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) includes:
  • A detection challenge on fully labeled data for 200 categories of objects.
  • An image classification plus object localization challenge with 1000 categories.



ImageNet Challenge

https://www.youtube.com/watch?v=taC5pMCm70U&ab_channel=codebasics


AlexNet

https://reurl.cc/edEjNR [2:54 - 20:20]


AlexNet
• Uses ReLU as the non-linearity, which speeds up training by about 6 times at the
same accuracy.
• Uses dropout as regularization, which effectively addresses overfitting
concerns. However, the training time is roughly doubled with a dropout rate of 0.5.
• Uses overlapping pooling to downsample the feature maps, which reduces the
top-1 and top-5 error rates on ImageNet by 0.4% and 0.3%, respectively.
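The top-1 and top-5 error rates mentioned above can be computed from a batch of network outputs. The following is a minimal sketch in PyTorch; the function name `topk_error` and the toy scores are illustrative assumptions, not part of the course notebooks.

```python
import torch


def topk_error(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true label is NOT among the k highest scores.

    logits:  (N, num_classes) raw scores or probabilities
    targets: (N,) integer class labels
    """
    topk_indices = logits.topk(k, dim=1).indices            # (N, k)
    hit = (topk_indices == targets.unsqueeze(1)).any(dim=1)  # (N,) bool
    return 1.0 - hit.float().mean().item()


# Toy example with 3 classes and 2 samples (hypothetical scores).
scores = torch.tensor([[0.1, 0.5, 0.4],
                       [0.7, 0.2, 0.1]])
labels = torch.tensor([2, 1])
top1 = topk_error(scores, labels, k=1)  # both samples are wrong at top-1
top2 = topk_error(scores, labels, k=2)  # both true labels are within the top-2
```

With k = 1 this is the usual classification error; with k = 5 a prediction counts as correct whenever the true class is among the five highest-scoring classes.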



AlexNet
• Copy the convolutional layers to each GPU; distribute the fully connected
layers across the GPUs.
• Feed one batch of training data into the convolutional layers on every GPU (data
parallelism).
• Feed the results of the convolutional layers into the distributed fully connected
layers batch by batch (model parallelism).
• When the last step is done for every GPU, backpropagate the gradients batch by
batch and synchronize the weights of the convolutional layers.



AlexNet Parameters

Dropout



Training Techniques - Dropout

https://www.youtube.com/watch?v=gxrnkqa9amo https://www.youtube.com/watch?v=ARq74QuavAo [7:04]
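As a complement to the videos, the behavior of dropout can be checked with a few lines of PyTorch. This is a minimal sketch assuming the standard `torch.nn.Dropout` layer with a rate of 0.5; the input of all ones is only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)   # each element is zeroed with probability 0.5
x = torch.ones(1000)

# Training mode: surviving elements are scaled by 1 / (1 - p) = 2.0
# so that the expected value of the output matches the input.
drop.train()
y_train = drop(x)

# Evaluation mode: dropout is disabled and the input passes through unchanged.
drop.eval()
y_eval = drop(x)

print("values seen in training mode:", y_train.unique().tolist())
print("fraction dropped:", (y_train == 0).float().mean().item())
```

Because the scaling is done at training time, no correction is needed at test time, which is why `model.eval()` should be called before running predictions.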



VGGNet

https://reurl.cc/a49pv4 [23:10 - 34:22]


VGGNet
➢ Task: classification of 1000 object categories in the ImageNet competition
➢ Layer
◆ Convolutional layer
◆ Max pooling layer
◆ Dropout layer
◆ Fully connected layer
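The four layer types listed above can be combined into a small VGG-style network. The sketch below is a hypothetical, heavily reduced model (`TinyVGG` is not the real VGGNet); it assumes 32x32 color inputs and 10 output classes purely for illustration.

```python
import torch
import torch.nn as nn


class TinyVGG(nn.Module):
    """A reduced VGG-style network: 3x3 convolutions, max pooling, dropout, FC."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),        # max pooling: 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),                            # dropout layer
            nn.Linear(64 * 16 * 16, num_classes),         # fully connected layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def count_parameters(model: nn.Module) -> int:
    """Total number of weights and biases in the model."""
    return sum(p.numel() for p in model.parameters())


model = TinyVGG()
output = model(torch.randn(2, 3, 32, 32))  # batch of 2 random images
print(output.shape, count_parameters(model))
```

Even in this small model most parameters sit in the fully connected layer, which is also the case for the full VGG architectures.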



VGG Parameters



Example 3.1 Use VGGNet pretrained on ImageNet
1. Please download “3-1_VGGNet_ImageNet.zip” from Moodle and unzip it.
2. Upload “3-1_VGGNet_ImageNet.ipynb” and “imagenet1000_clsidx_to_labels.txt”
to Google Colab.
3. Compare the top-5 prediction probabilities for ice_bear.jpg using the pretrained
VGGNet model.
4. Please show your code, results, and observations in a Word file and upload it to Moodle.

Figure: original image (ice_bear.jpg), probability of the classes, and predicted class (ice bear).
Example 3.1 Use VGGNet pretrained on ImageNet

Use the VGG16 pretrained model

Load the 1000 class labels

Define Image path



The CIFAR-10 and CIFAR-100 Dataset
What is the CIFAR-10 dataset?
• The CIFAR-10 dataset is a collection of images commonly used in the computer
vision field.
• It contains 60,000 32x32 color images in 10 different classes, with 6,000 images in
each class.
• The 10 classes are airplanes, cars, birds, cats, deer, dogs,
frogs, horses, ships, and trucks.
What is the CIFAR-100 dataset?
• This dataset is just like CIFAR-10,
except that it has 100 classes containing 600
images each.
• There are 500 training images and
100 testing images per class.

CIFAR-10, CIFAR-100 official website: https://reurl.cc/5pZY2R
Example 3.2a Train VGGNet on CIFAR100
1. Please download “3-2_VGGNet_CIFAR100.zip” from Moodle and unzip it.
2. Upload “3-2_VGGNet_CIFAR100.ipynb” to Google Colab.
3. Use the VGG-16 model pretrained on ImageNet and, given the apple, dolphin, and dog images,
print the top-3 predicted results without training.
4. Train on the CIFAR-100 dataset with the following parameters: input size = 32 (color image),
batch size = 256, learning rate = 0.00001, and print the top-3 predicted results after training.
5. Contrast the predictive outcomes with and without the additional training.

The CIFAR-100 dataset contains 60,000 images in 100 classes.



Example 3.2a Train VGGNet on CIFAR100

Resize input images to 32x32

Training setting
Download dataset


Example 3.2b Train VGGNet on CIFAR100 from scratch
1. Disable the pretrained weights of the VGG-16 model and, given the apple, dolphin, and dog images, print the top-3
predicted results without training.
2. Without the ImageNet-pretrained weights, train VGG-16 on the CIFAR-100 dataset with the
following parameters: input size = 32 (color image), batch size = 256, learning rate = 0.00001.
3. Given these images as input to your trained model, what are the top-3 probabilities in the
output layer?
4. Contrast the predictive outcomes with and without any additional training.
5. Compare the prediction results of using the pretrained model in Example 3.2a with those of training from scratch.
6. Please show your code, results, and observations in a Word file and upload it to Moodle.



ResNet

https://reurl.cc/A0kE0E [45:32 ~ 59:11]



ResNet by Andrew Ng

https://reurl.cc/3YRKWO [07:07]



ResNet
• Since AlexNet, state-of-the-art CNN architectures have been going deeper and
deeper.
• However, increasing network depth does not work by simply stacking
layers together. Deep networks are hard to train because of the notorious
vanishing gradient problem.
• The core idea of ResNet is to introduce a so-called “identity
shortcut connection” that skips one or more layers, as shown in the
following figure:



ResNet18



ResNet18 Parameters



Example 3.3 Train ResNet on CIFAR-100
1. Please download “3-3_ResNet_CIFAR100.zip” from Moodle and unzip it.
2. Use the ResNet model pretrained on ImageNet, without any additional training, and, given the apple, dolphin, and dog
images, print the top-5 predicted results.
3. Use the ResNet model pretrained on ImageNet to train on the CIFAR-100 dataset with the
following parameters: input size = 32 (color image), batch size = 64, learning rate = 0.001.
4. Contrast the predictive outcomes with and without the additional training.
5. Please show your code, results, and observations in a Word file and upload it to Moodle.



Example 3.3 Train ResNet on CIFAR-100

Residual block structure
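A residual block with an identity shortcut can be sketched in PyTorch as follows. This is a simplified version that assumes equal input and output channels and stride 1; the class name `BasicResidualBlock` is illustrative and the notebook's implementation may differ.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block input."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                               # identity shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))            # residual F(x)
        return self.relu(out + identity)           # F(x) + x, then ReLU


block = BasicResidualBlock(channels=8).eval()
x = torch.randn(2, 8, 16, 16)
with torch.no_grad():
    y = block(x)
print(y.shape)  # same shape as the input
```

Because the shortcut adds the input directly to the output, gradients can flow through the addition unchanged, and the block only has to learn the residual F(x) rather than the full mapping.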



What Does a Filter Learn? - Feature Visualization

https://reurl.cc/3YRKyl [6:08]



Example 3.4 Feature Map Visualization
1. Please download “3-4_Feature_map_visualization.zip” from Moodle; it is built on
the VGG-16 model pretrained on ImageNet.
2. Upload “3-4_Feature_map_visualization.ipynb” and “imagenet1000_clsidx_to_labels.txt” to
Google Colab.
3. Choose your own images from the Internet.
4. Using the cat image as the input, show the shape of the output and visualize the feature
maps extracted from layer index 5 (red box in the right figure below).

Figure: original image (cat.jpg), feature map (cat_feature_5.jpg), and part of the VGG-16 structure.
Example 3.4 Feature Map Visualization

• Compare with the feature maps extracted from layer indices 7, 8, and 9.

Extracted from layer 7: size 112 x 112
Extracted from layer 8: size 112 x 112
Extracted from layer 9: size 56 x 56



Example 3.4 Feature Map Visualization

Define the functions to get the feature map that we want.
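One possible way to obtain the feature map at a given layer index is to run the input through the layers one by one and stop at the requested index. The sketch below assumes the model's convolutional part is an `nn.Sequential`, as is the case for `torchvision.models.vgg16(...).features`; the function name `extract_feature_map` is hypothetical.

```python
import torch
import torch.nn as nn


def extract_feature_map(features: nn.Sequential, image_batch: torch.Tensor,
                        layer_index: int) -> torch.Tensor:
    """Return the output of `features[layer_index]` for the given input batch."""
    x = image_batch
    with torch.no_grad():
        for index, layer in enumerate(features):
            x = layer(x)
            if index == layer_index:
                return x
    raise IndexError(f"layer_index {layer_index} is out of range")
```

For VGG-16 with a 224x224 input, `extract_feature_map(model.features, batch, 5)` yields a tensor of shape (1, 128, 112, 112), that is, 128 feature maps of size 112 x 112, while index 9 is a max pooling layer and yields 56 x 56 maps.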



Example 3.4 Feature Map Visualization

Define a function that transforms the feature map into an image and outputs the image.
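Turning a feature map into an image usually means rescaling its values to the 0 to 255 range. The following is a minimal sketch using min-max scaling on a single 2-D channel; the function name and the scaling choice are assumptions, and the notebook may normalize differently.

```python
import numpy as np


def feature_map_to_image(feature_map: np.ndarray) -> np.ndarray:
    """Min-max scale a 2-D feature map (H, W) to an 8-bit grayscale image."""
    fmap = feature_map.astype(np.float64)
    value_range = fmap.max() - fmap.min()
    if value_range == 0:
        # A constant map carries no contrast; return a black image.
        return np.zeros(fmap.shape, dtype=np.uint8)
    scaled = (fmap - fmap.min()) / value_range       # values in [0, 1]
    return (scaled * 255).round().astype(np.uint8)   # values in [0, 255]


# Example with a random stand-in for one 112 x 112 channel.
rng = np.random.default_rng(0)
image = feature_map_to_image(rng.normal(size=(112, 112)))
print(image.shape, image.dtype, image.min(), image.max())
```

The resulting array can be saved with Pillow, for example `Image.fromarray(image).save("cat_feature_5.jpg")`, after selecting one channel from the extracted tensor and converting it to NumPy.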



What is Batch Size?

https://reurl.cc/GKorZZ [3:54]
Batch Normalization explained

https://www.youtube.com/watch?v=DtEq44FTPM4 [8:48]



Batch Normalization explained
Without Batch Normalization, the activation values
fluctuate significantly during the first iterations.



Batch Normalization
• Batch Normalization normalizes the layer inputs using the mean and
variance of the current batch. The normalized data is then scaled
and shifted by learnable parameters.
• Batch Normalization can be viewed as a special
kind of preprocessing applied inside the network. The
mathematical procedure can be
seen on the right.
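The procedure described above (batch mean, batch variance, normalization, then scale and shift) can be written out directly. This is a minimal sketch for a 2-D input of shape (batch, features) in training mode; the names `gamma`, `beta`, and `eps` follow common usage and are assumptions about the notation in the figure.

```python
import torch


def batch_norm_forward(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
                       eps: float = 1e-5) -> torch.Tensor:
    """Training-mode batch normalization over the batch dimension (dim 0)."""
    mean = x.mean(dim=0)                       # per-feature batch mean
    var = x.var(dim=0, unbiased=False)         # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta                # scale and shift


torch.manual_seed(0)
x = torch.randn(8, 4) * 3.0 + 2.0   # batch of 8 samples, 4 features
gamma = torch.ones(4)               # learnable scale, initialized to 1
beta = torch.zeros(4)               # learnable shift, initialized to 0
y = batch_norm_forward(x, gamma, beta)
print(y.mean(dim=0), y.var(dim=0, unbiased=False))
```

With `gamma = 1` and `beta = 0`, each feature of the output has approximately zero mean and unit variance over the batch. At inference time, frameworks replace the batch statistics with running averages collected during training, which this sketch does not cover.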



Advantages of Batch Normalization
The figure below shows experimental results for a VGG network with
batch normalization trained on the CIFAR-10 dataset. The benefits of using
batch normalization include:
1. The model converges faster.
2. Higher learning rates can be used.
3. The strong dependence on initialization is reduced.

Figure: training the VGG network on CIFAR-10 with learning rates lr = 0.0015, 0.0075, and 0.045.
Source: https://gradientscience.org/batchnorm/


Example 3.5 Batch Normalization
• Please download “3-5_Batch_Normalization.zip” from Moodle; it is built on the
VGG-16 model pretrained on ImageNet.
• Use the pretrained VGG-16 model with and without batch normalization to retrain on the CIFAR-100
dataset with the following parameters: input size = 32 (color image), batch size = 64, learning
rate = 0.001.
• Please show the accuracies of the first five epochs, and compare the accuracy with and without
batch normalization.
• Please compare batch sizes of 32, 64, and 128, and show the results.
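When comparing batch sizes, it helps to see how the batch size changes the number of parameter updates per epoch. The sketch below uses a small random `TensorDataset` as a stand-in for CIFAR-100; the helper names are hypothetical, and the figure of 50,000 training images follows from 500 training images for each of the 100 classes.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


def make_loader(dataset, batch_size: int) -> DataLoader:
    """Shuffled loader, as is typical for training."""
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)


# Stand-in dataset: 1000 random 32x32 color images with labels in [0, 100).
dummy = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 100, (1000,)))

for batch_size in (32, 64, 128):
    loader = make_loader(dummy, batch_size)
    print(batch_size, len(loader), iterations_per_epoch(50_000, batch_size))
```

Smaller batches give more (and noisier) updates per epoch, so accuracies after the same number of epochs are not directly comparable in terms of update count.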



Example 3.5 Batch Normalization

Loading the CIFAR100 dataset


Loading the VGG16 with batch normalization pretrained model

Batch normalization layer



Example 3.5 Batch Normalization
• Compare batch sizes of 32, 64, and 128.

Figure panels: batch size = 32, batch size = 64, batch size = 128
