DLCV Ch3 Convolutional Neural Network

Deep Learning for Computer Vision
CHAPTER 3
INTRODUCTION TO CONVOLUTIONAL NEURAL NETWORK
Prof. G.S. Jison Hsu 徐繼聖

• Artificial Vision Laboratory
• National Taiwan University of Science and Technology

Outline

Hardware and Software – University of Michigan
OUTLINE:
• 07:20 – 09:28 CPU vs GPU
• 17:02 – 19:55 Example: Matrix Multiplication
• 19:55 – 21:14 Programming GPUs
• 29:32 – 30:05 Recall: Computational Graphs
• 30:06 – 31:50 The Point of Deep Learning Frameworks
• 33:22 – 51:08 PyTorch: Fundamental Concepts
• 51:09 – 54:29 PyTorch: nn Defining Modules
• 54:30 – 55:01 PyTorch: Pretrained Models

CPU vs GPU
https://reurl.cc/YXRpRO [07:20 – 09:28 ]

PyTorch: Fundamental Concepts
https://reurl.cc/9RrgNV [33:22 – 51:08 ]

ImageNet Challenge
https://reurl.cc/y6ExVy [1:15 - 2:54]

ImageNet Challenge
• What is ImageNet? Official ImageNet website: https://reurl.cc/XVpA2R
• ImageNet is a benchmark dataset for object category classification and detection, featuring hundreds of object categories and millions of images.
• The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) includes:
  • A detection challenge on fully labeled data for 200 object categories.
  • An image classification plus object localization challenge with 1000 categories.

ImageNet Challenge
https://www.youtube.com/watch?v=taC5pMCm70U&ab_channel=codebasics
ImageNet Challenge
https://reurl.cc/XVpA2R

ImageNet Challenge

AlexNet
https://reurl.cc/edEjNR [2:54 - 20:20]

AlexNet
• ReLU is used as the non-linearity. Compared with tanh units, training reaches the same accuracy about 6 times faster.
• Dropout is used as the regularizer and effectively reduces overfitting. However, with a dropout rate of 0.5 the training time roughly doubles.
• Overlapping pooling, used to downsample the feature maps, reduces the top-1 and top-5 error rates on ImageNet by 0.4% and 0.3%, respectively.
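
The three techniques in the bullets above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the AlexNet implementation: the function names and the toy input values are assumptions, and the dropout shown is the "inverted" variant (survivors are rescaled during training), which is an assumption about the variant rather than something the slides specify.

```python
import random

def relu(values):
    """ReLU non-linearity: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]

def dropout(values, rate=0.5, rng=None, training=True):
    """Inverted dropout (assumed variant): zero each unit with probability
    `rate` during training and rescale the survivors by 1 / (1 - rate).
    At test time the input is returned unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def pool_output_size(size, kernel, stride):
    """Spatial size after pooling without padding."""
    return (size - kernel) // stride + 1

def max_pool_1d(values, kernel, stride):
    """1-D max pooling; windows overlap whenever stride < kernel."""
    n_out = pool_output_size(len(values), kernel, stride)
    return [max(values[i * stride: i * stride + kernel]) for i in range(n_out)]

# Toy values (hypothetical) to exercise each function.
activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
overlapping = max_pool_1d([1, 5, 2, 4, 3, 6, 0], kernel=3, stride=2)
print(activations, dropped, overlapping)
```

With a 3-wide window and stride 2, neighbouring pooling windows share one element, which is what "overlapping" refers to; `pool_output_size(55, 3, 2)` and `pool_output_size(55, 2, 2)` both give 27, so the overlap does not change the output size in that case.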

AlexNet
• Replicate the convolutional layers on every GPU; partition the fully connected layers across the GPUs.
• Feed each GPU its own batch of training data through its copy of the convolutional layers (Data Parallel).
• Feed the outputs of the convolutional layers into the partitioned fully connected layers batch by batch (Model Parallel).
• Once the last step has finished on every GPU, backpropagate the gradients batch by batch and synchronize the weights of the convolutional layers.
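
The weight synchronization in the last bullet can be illustrated with a toy, single-process simulation. This is a sketch under stated assumptions, not multi-GPU code: the one-weight model, the data, and the two "GPUs" (plain Python lists) are all hypothetical. It shows only the data-parallel part: each replica computes a gradient on its own shard, and averaging those gradients reproduces the full-batch gradient when the shards have equal size.

```python
def gradient(weight, batch):
    """Mean gradient of the squared error 0.5 * (w * x - y)^2 over a batch."""
    return sum((weight * x - y) * x for x, y in batch) / len(batch)

# Hypothetical toy data: (input, target) pairs for a one-weight model.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
weight = 0.5

# Data parallelism: each simulated GPU holds a copy of the weight
# and computes the gradient on its own shard of the batch.
shards = [data[:2], data[2:]]
local_grads = [gradient(weight, shard) for shard in shards]

# Synchronization: average the per-GPU gradients so that every replica
# applies the same update and the copies stay identical.
synced_grad = sum(local_grads) / len(local_grads)
full_batch_grad = gradient(weight, data)

learning_rate = 0.01
new_weight = weight - learning_rate * synced_grad
print(synced_grad, full_batch_grad, new_weight)
```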

AlexNet Parameters
Dropout

Training Techniques - Dropout
https://www.youtube.com/watch?v=gxrnkqa9amo https://www.youtube.com/watch?v=ARq74QuavAo [7:04]

VGGNet
https://reurl.cc/a49pv4 [23:10 - 34:22]

VGGNet
➢ Task: classification of 1000 object categories in the ImageNet competition
➢ Layers
◆ Convolutional layer
◆ Max pooling layer
◆ Dropout layer
◆ Fully connected layer

VGG Parameters
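
As a rough check on where the parameters of VGG come from, the count can be computed from the layer configuration. The sketch below assumes the standard VGG-16 layout (thirteen 3x3 convolutions, five 2x2 max-pooling layers, three fully connected layers, 224x224 RGB input); the configuration list and function name are written out here for illustration and are not taken from the slides.

```python
# Assumed configuration: VGG-16 for a 224x224 RGB input.
# Numbers are output channels of 3x3 convolutions; "M" is 2x2 max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]

def count_vgg_parameters(conv_cfg, fc_cfg, in_channels=3, input_size=224):
    """Return (convolutional parameters, fully connected parameters)."""
    conv_params = 0
    channels, size = in_channels, input_size
    for item in conv_cfg:
        if item == "M":
            size //= 2                      # pooling halves the spatial size
        else:
            # 3x3 kernel weights plus one bias per output channel
            conv_params += 3 * 3 * channels * item + item
            channels = item
    fc_params = 0
    features = channels * size * size       # flattened input to the first FC layer
    for units in fc_cfg:
        fc_params += features * units + units
        features = units
    return conv_params, fc_params

conv_params, fc_params = count_vgg_parameters(VGG16_CONV, VGG16_FC)
print(f"conv: {conv_params:,}  fc: {fc_params:,}  total: {conv_params + fc_params:,}")
```

Under these assumptions most of the roughly 138 million parameters sit in the fully connected layers, not in the convolutions.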

Example 3.1 Use VGGNet pretrained on ImageNet
1. Download “3-1_VGGNet_ImageNet.zip” from Moodle and unzip it.
2. Upload “3-1_VGGNet_ImageNet.ipynb” and “imagenet1000_clsidx_to_labels.txt” to Google Colab.
3. Compare the top-5 predicted class probabilities for ice_bear.jpg using the pretrained VGGNet model.
4. Show your code, results, and observations in a Word file and upload it to Moodle.
Figure: original image (ice_bear.jpg), probabilities of the classes, and predicted class (ice bear).
Example 3.1 Use VGGNet pretrained on ImageNet
Use the VGG16 pretrained model
Load the 1000 class labels
Define Image path
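
The "top-5 prediction probability" step of this example boils down to a softmax over the network's output scores followed by a sort. The sketch below shows just that step in plain Python; the logits and the six labels are hypothetical stand-ins for the 1000 outputs of a real model, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in order[:k]]

# Hypothetical logits for a hypothetical subset of class labels.
labels = ["ice bear", "Arctic fox", "white wolf", "Samoyed", "kuvasz", "tabby"]
logits = [9.1, 4.2, 3.7, 3.5, 2.0, -1.0]

probs = softmax(logits)
top5 = top_k(probs, labels, k=5)
for label, prob in top5:
    print(f"{label:12s} {prob:.4f}")
```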

The CIFAR-10 and CIFAR-100 Dataset
What is the CIFAR-10 dataset?
• The CIFAR-10 dataset is a collection of images commonly used in computer vision.
• It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class.
• The 10 classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
What is the CIFAR-100 dataset?
• This dataset is just like CIFAR-10, except that it has 100 classes containing 600 images each.
• There are 500 training images and 100 testing images per class.
CIFAR-10, CIFAR-100 official website: https://reurl.cc/5pZY2R
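
The dataset sizes quoted above follow from the per-class counts. A small sketch of the arithmetic is given below; the CIFAR-100 split (500 training and 100 testing images per class) is from the slide, while the CIFAR-10 split of 5,000 training and 1,000 testing images per class is stated here as an assumption taken from the official dataset description. The dictionary layout and function name are illustrative.

```python
# Per-class image counts; the CIFAR-10 train/test split is an assumption
# based on the official dataset description, not on the slide.
CIFAR = {
    "CIFAR-10": {"classes": 10, "train_per_class": 5000, "test_per_class": 1000},
    "CIFAR-100": {"classes": 100, "train_per_class": 500, "test_per_class": 100},
}

def split_sizes(spec):
    """Return (number of training images, number of testing images)."""
    train = spec["classes"] * spec["train_per_class"]
    test = spec["classes"] * spec["test_per_class"]
    return train, test

for name, spec in CIFAR.items():
    train, test = split_sizes(spec)
    print(f"{name}: {train} train + {test} test = {train + test} images")
```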
Example 3.2a Train VGGNet on CIFAR100
1. Download “3-2_VGGNet_CIFAR100.zip” from Moodle and unzip it.
2. Upload “3-2_VGGNet_CIFAR100.ipynb” to Google Colab.
3. Use the VGG-16 model pretrained on ImageNet: given the apple, dolphin, and dog images, print the top-3 predicted results without training.
4. Train on the CIFAR-100 dataset with the following parameters: input size = 32 (color image), batch size = 256, learning rate = 0.00001, and print the top-3 predicted results after training.
5. Contrast the predictions with and without the additional training.
The CIFAR-100 dataset contains 60,000 images in 100 classes.

Example 3.2a Train VGGNet on CIFAR100
Resize input images to 32x32
Training settings
Download dataset

Example 3.2b Train VGGNet on CIFAR100 from scratch
1. Turn off the pretrained weights of VGG-16, give it the apple, dolphin, and dog images, and print the top-3 predicted results without training.
2. With the ImageNet-pretrained weights turned off, train VGG-16 on the CIFAR-100 dataset with the following parameters: input size = 32 (color image), batch size = 256, learning rate = 0.00001.
3. Given these images as input to your trained model, what are the top-3 probabilities in the output layer?
4. Compare the prediction results of the pretrained model from Example 3.2a with those of the model trained from scratch.

ResNet
https://reurl.cc/A0kE0E [45:32 ~ 59:11]

ResNet by Andrew Ng
https://reurl.cc/3YRKWO [07:07]

ResNet
• Since AlexNet, state-of-the-art CNN architectures have grown deeper and deeper.
• However, simply stacking more layers does not work: deep networks are hard to train because of the notorious vanishing gradient problem.
• The core idea of ResNet is to introduce a so-called “identity shortcut connection” that skips one or more layers, as shown in the following figure:

ResNet18
• Since AlexNet, state-of-the-art CNN architectures have grown deeper and deeper.
• Simply stacking more layers does not work: deep networks are hard to train because of the notorious vanishing gradient problem.
• The core idea of ResNet is to introduce a so-called “identity shortcut connection” that skips one or more layers, as shown in the following figure:
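
The identity shortcut can be written as output = ReLU(F(x) + x), where F is the stack of layers being skipped. The sketch below is a toy illustration in plain Python, not ResNet18 itself: the vectors, the zero-valued residual branch, and the scalar "local derivative" of 0.1 are all hypothetical values chosen to make the effect visible.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, residual_fn):
    """Compute relu(F(x) + x): the shortcut adds the input back unchanged."""
    fx = residual_fn(x)
    return relu([f + xi for f, xi in zip(fx, x)])

# If the residual branch F outputs zeros, the block reduces to relu(x),
# so an added block can fall back to an identity mapping for x >= 0.
x = [0.5, 1.0, 2.0]
identity_out = residual_block(x, lambda v: [0.0] * len(v))

def chain_gradient(local_derivative, depth, residual):
    """Toy gradient through `depth` scalar layers, each with derivative w.
    A plain stack multiplies w per layer; a residual stack multiplies
    (1 + w), because d(F(x) + x)/dx = w + 1."""
    factor = 1.0 + local_derivative if residual else local_derivative
    return factor ** depth

plain = chain_gradient(0.1, depth=20, residual=False)
with_shortcut = chain_gradient(0.1, depth=20, residual=True)
print(identity_out, plain, with_shortcut)
```

In this toy setting the gradient through the plain stack shrinks towards zero with depth, while the shortcut keeps a direct path of derivative 1 through every block.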

ResNet18 Parameters

Example 3.3 Train ResNet on CIFAR-100
1. Please download the “3-3_ResNet_CIFAR100.zip” from Moodle and unzip it.
2. Use the pretrained ResNet model without any additional training: given the apple, dolphin, and dog images, print the top-5 predicted results.
3. Use the ResNet model pretrained on ImageNet to train on the CIFAR-100 dataset with the following parameters: input size = 32 (color image), batch size = 64, learning rate = 0.001.

Example 3.3 Train ResNet on CIFAR-100
Residual block structure

What Does a Filter Learn? - Feature Visualization
https://reurl.cc/3YRKyl [6:08]

Example 3.4 Feature Map Visualization
1. Download “3-4_Feature_map_visualization.zip” from Moodle; it is built on VGG-16 pretrained on ImageNet.
2. Upload “3-4_Feature_map_visualization.ipynb” and “imagenet1000_clsidx_to_labels.txt” to Google Colab.
3. Choose your own images from the Internet.
4. Using the cat image as the input, show the shape of the output and visualize the feature maps extracted from layer index 5 (red box in the right figure below).
Figure: original image (cat.jpg), feature map (cat_feature_5.jpg), and part of the VGG-16 structure.
• Compare with the feature maps extracted from layer indices 7, 8, and 9.
Figure: feature maps extracted from layer 7 (size 112 * 112), layer 8 (size 112 * 112), and layer 9 (size 56 * 56).
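
The feature map sizes reported for layer indices 7, 8, and 9 can be reproduced by tracking the spatial size through the first layers of VGG-16. The sketch below assumes a 224x224 input and the layer ordering used by torchvision's VGG-16 `features` block (conv, ReLU, conv, ReLU, pool, ...); both are assumptions made for illustration, and the list and function names are hypothetical.

```python
# Assumed layer order for indices 0..9 of the VGG-16 feature extractor.
# "conv" is a 3x3 convolution with padding 1 (keeps the spatial size);
# "pool" is 2x2 max pooling with stride 2 (halves the spatial size).
VGG16_FEATURES_HEAD = [
    ("conv", 64), ("relu", None), ("conv", 64), ("relu", None), ("pool", None),
    ("conv", 128), ("relu", None), ("conv", 128), ("relu", None), ("pool", None),
]

def feature_map_shapes(layers, input_size=224, in_channels=3):
    """Return the (channels, height, width) of the output of every layer."""
    shapes = []
    channels, size = in_channels, input_size
    for kind, out_channels in layers:
        if kind == "conv":
            channels = out_channels
        elif kind == "pool":
            size //= 2
        shapes.append((channels, size, size))
    return shapes

shapes = feature_map_shapes(VGG16_FEATURES_HEAD)
for index in (5, 7, 8, 9):
    print(f"layer index {index}: {shapes[index]}")
```

Under these assumptions, indices 5, 7, and 8 all produce 112 x 112 maps, and the pooling layer at index 9 halves them to 56 x 56, which agrees with the sizes listed above.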

Define the functions to get the feature map that we want.

Define a function that transforms the feature map into an image and outputs the image.

What is Batch Size
https://reurl.cc/GKorZZ [3:54]
Batch Normalization explained
https://www.youtube.com/watch?v=DtEq44FTPM4 [8:48]


Without Batch Normalization, the activation values fluctuate significantly during the first iterations.

Batch Normalization
• Batch Normalization normalizes the layer inputs using the mean and variance computed over the current batch. The normalized data is then scaled and shifted.
• Batch Normalization can be seen as a special kind of preprocessing applied inside the network. The mathematical procedure can be seen on the right.
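
The procedure described above (compute the batch mean and variance, normalize, then scale and shift) can be sketched for a single feature as follows. This is a minimal training-mode illustration in plain Python; the parameter names `gamma`, `beta`, and `eps` follow common convention, and the input values are hypothetical. Running statistics for inference are deliberately left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift.

    x_hat = (x - mean) / sqrt(var + eps);  y = gamma * x_hat + beta
    """
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased variance
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]

activations = [2.0, 4.0, 6.0, 8.0]                      # hypothetical values
normalized = batch_norm(activations)                    # mean ~0, variance ~1
rescaled = batch_norm(activations, gamma=2.0, beta=1.0)  # mean ~1, std ~2
print(normalized)
print(rescaled)
```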

Advantages of Batch Normalization
The figure below displays the experimental results of the VGG network with batch normalization applied to the CIFAR-10 dataset. The benefits of using batch normalization include:
1. The model converges faster.
2. It allows higher learning rates.
3. It reduces the strong dependence on initialization.
Figure: training a VGG network on CIFAR-10 with lr = 0.0015, lr = 0.0075, and lr = 0.045. Source: https://gradientscience.org/batchnorm/

Example 3.5 Batch Normalization
• Download “3-5_Batch_Normalization.zip” from Moodle; it is built on the VGG-16 model pretrained on ImageNet.
• Use the pretrained VGG-16 model with and without batch normalization to retrain on the CIFAR-100 dataset with the following parameters: input size = 32 (color image), batch size = 64, learning rate = 0.001.
• Show the accuracies of the first five epochs, and compare the accuracy with and without batch normalization.
• Compare batch sizes of 32, 64, and 128, and show the results.
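
When comparing batch sizes, it helps to keep in mind how many weight updates one epoch contains. The sketch below does that arithmetic for the CIFAR-100 training set (100 classes with 500 training images each); it assumes the last, smaller batch is kept rather than dropped, and the function name is illustrative.

```python
import math

TRAIN_IMAGES = 100 * 500  # CIFAR-100: 100 classes x 500 training images

def steps_per_epoch(num_images, batch_size):
    """Number of weight updates per epoch, keeping the last partial batch."""
    return math.ceil(num_images / batch_size)

for batch_size in (32, 64, 128):
    print(f"batch size = {batch_size}: "
          f"{steps_per_epoch(TRAIN_IMAGES, batch_size)} steps per epoch")
```

A smaller batch size therefore means more, noisier updates per epoch at the same learning rate, which is one reason the accuracy curves for the three settings can differ.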

Loading the CIFAR100 dataset

Loading the VGG16 with batch normalization pretrained model
Batch normalization layer

• Compare different batch sizes: 32, 64, and 128
Figure: results for batch size = 32, batch size = 64, and batch size = 128.
