
Building Convolutional Neural Networks for Image Classification

Janani Ravi
CO-FOUNDER, LOONYCORN
www.loonycorn.com
Narrow and wide convolution
Zero-padding and feature map sizes
Overview of convolutional layers
Calculating feature map dimensions
Batch normalization of input images
Building and training a CNN for image classification
Changing model hyperparameters
Convolutional Neural Networks
Two Kinds of Layers in CNNs

Convolution: local receptive field
Pooling: subsampling of inputs
Typical CNN Architecture

[Diagram: Convolutional -> ReLU -> Pooling -> Convolutional -> ReLU]

Alternating convolutional and pooling layers


Typical CNN Architecture

[Diagram: the same Convolutional -> ReLU -> Pooling -> Convolutional -> ReLU stack]

This entire set of layers is then fed into a regular, feed-forward NN
Typical CNN Architecture

[Diagram: the CNN layers (with ReLU activations) feed into feed-forward layers: Fully Connected -> Fully Connected -> SoftMax, producing a prediction with probabilities P(Y = 0), P(Y = 1), ..., P(Y = 9)]

This is the output layer, emitting probabilities


Typical CNN Architecture

[Diagram: an image enters the CNN; the outputs are the probabilities P(Y = 0) ... P(Y = 9)]

The input to the CNN is an image; the outputs are probabilities
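A rough sketch of this architecture in PyTorch (all layer sizes are illustrative assumptions for 28x28 grayscale inputs and 10 classes, not values from the course):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer
        nn.ReLU(),
        nn.MaxPool2d(2),                              # pooling layer
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # convolutional layer
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),                                 # hand off to the feed-forward layers
        nn.Linear(32 * 7 * 7, 64),                    # fully connected
        nn.ReLU(),
        nn.Linear(64, 10),                            # fully connected: one output per class
    )

    x = torch.randn(1, 1, 28, 28)                   # a batch containing one image
    probabilities = torch.softmax(model(x), dim=1)  # P(Y = 0) ... P(Y = 9)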
Feature Maps

[Diagram: a 3x3 kernel with weights x1 x0 x1 / x0 x1 x0 / x1 x0 x1 slides over the image pixels to produce a feature map]
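A hand-rolled convolution makes this concrete; a minimal sketch (the helper name feature_map is hypothetical) that slides the kernel from the diagram over a toy image:

    import numpy as np

    # The 3x3 kernel from the diagram above.
    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])

    def feature_map(image, kernel):
        # Slide the kernel over the image: stride 1, no padding.
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # Multiply the patch elementwise by the kernel and sum.
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(6, 6)             # a toy 6x6 "image"
    print(feature_map(image, kernel).shape)  # (4, 4)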
Zero-padding, Stride Size
Narrow vs. Wide Convolution

[Figures: the input matrix, i.e. the image, and the convolution result]

Narrow convolution: little zero-padding; output narrower than the input
Wide convolution: lots of zero-padding; output wider than the input
Without Zero Padding

[Figure: a 6x6 input matrix convolved with a 3x3 kernel, with no zero-padding, produces a 4x4 convolution result]
Zero Padding

[Figure: the same 6x6 matrix padded with two rings of zeros to 10x10; convolving with the 3x3 kernel produces an 8x8 result]
Zero Padding

[Figure: padding the 6x6 matrix with three rings of zeros gives a 12x12 matrix; convolving with the 3x3 kernel produces a 10x10 result]
Zero Padding

With zero-padding, every element of the matrix is passed through the filter

You can decide the number of zero rows and columns to pad with

Use zero-padding to get an output larger than the input
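A quick way to see this in PyTorch: the padding argument of nn.Conv2d counts the rings of zeros added on each side, and with enough padding the output grows larger than the input (the sizes mirror the 6x6 example above):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 6, 6)  # a 6x6 single-channel input, as in the example

    narrow = nn.Conv2d(1, 1, kernel_size=3, padding=0)  # no zero-padding
    wide = nn.Conv2d(1, 1, kernel_size=3, padding=3)    # three rings of zeros

    print(narrow(x).shape)  # torch.Size([1, 1, 4, 4])   narrower than the input
    print(wide(x).shape)    # torch.Size([1, 1, 10, 10]) wider than the input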


Stride Size

[Figure: the 3x3 kernel overlaid on the top-left corner of the 6x6 matrix]
Stride Size

[Figure: the kernel shifted one column to the right]

Horizontal stride of 1
Stride Size

[Figure: the kernel back at the top-left corner]
Stride Size

[Figure: the kernel shifted one row down]

Vertical stride of 1
Stride Size

[Figure: the kernel one row down from the top-left corner]

Stride size is an important hyperparameter in CNNs
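In PyTorch, the stride is just another argument to nn.Conv2d; a small sketch showing how larger strides shrink the feature map for the same 6x6 input:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 6, 6)

    # The same 3x3 kernel with different strides: a larger stride moves
    # the kernel further on each step, so the feature map shrinks.
    for stride in (1, 2, 3):
        conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride)
        print(stride, conv(x).shape)
    # 1 torch.Size([1, 1, 4, 4])
    # 2 torch.Size([1, 1, 2, 2])
    # 3 torch.Size([1, 1, 2, 2])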
Batch Normalization
Training via Back Propagation

[Diagram: pixels -> edges -> corners -> object parts feed into an ML-based classifier; the error is fed back through an optimiser]
Vanishing and Exploding Gradients

Back propagation fails if


- gradients are vanishing
- gradients are exploding
Vanishing Gradient Problem

[Figure: a loss surface over weight W and bias b, descending from the initial value of the loss toward its smallest value; the gradient becomes zero and stops changing]


Exploding Gradient Problem

[Figure: the same loss surface; the gradient changes abruptly and “explodes”]


Coping with Vanishing/Exploding Gradients

- Proper initialization
- Non-saturating activation function
- Batch normalization
- Gradient clipping (see the sketch below)
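Gradient clipping, for one, is a single call in PyTorch; a minimal training-step sketch (the model, optimizer, and max_norm value are illustrative assumptions):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)  # a stand-in for any model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(8, 10), torch.randn(8, 1)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale the gradients so their overall norm never exceeds 1.0,
    # keeping any single update from "exploding".
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()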




Batch Normalization

Just before applying the activation function:


First, “normalize” inputs
Second, “scale and shift” inputs
Batch Normalization

“Normalize” inputs
- subtract the mean
- divide by the standard deviation

“Scale and shift” inputs
- scale = multiply by a constant
- shift = add a constant
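The two steps are short enough to write out directly; a minimal sketch in PyTorch, where gamma and beta stand for the learned scale and shift:

    import torch

    x = torch.randn(32, 10)  # a batch of 32 inputs with 10 features each
    gamma, beta = torch.ones(10), torch.zeros(10)  # learned scale and shift
    eps = 1e-5  # guards against division by zero

    # "Normalize": subtract the mean, divide by the standard deviation.
    normalized = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + eps)

    # "Scale and shift": multiply by a constant, add a constant.
    out = gamma * normalized + beta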
Batch Normalization

Supported in PyTorch

Many other benefits:
- allows a much larger learning rate
- reduces overfitting
- speeds up the convergence of training
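In PyTorch this is nn.BatchNorm2d, typically placed between a convolutional layer and its activation (the channel counts here are illustrative):

    import torch.nn as nn

    block = nn.Sequential(
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),  # normalizes each of the 32 output channels
        nn.ReLU(),           # activation applied after normalization
    )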
Choice of Activation Function
A Neural Network

Once a neural network is trained, all edges have weights which help it make predictions
Operation of a Single Neuron

[Diagram: inputs X1 ... Xn with weights W1 ... Wn and bias b feed an affine transformation Wx + b, followed by an activation function max(Wx + b, 0)]

Each neuron only applies two simple functions to its inputs


Operation of a Single Neuron

[Diagram: the same neuron, with the affine transformation highlighted]

The affine transformation alone can only learn linear relationships between the inputs and the output
Operation of a Single Neuron

[Diagram: the same neuron]

The affine transformation is just a weighted sum with a bias added: W1x1 + W2x2 + ... + Wnxn + b
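That weighted sum and activation take only a couple of lines; a sketch with random stand-in values:

    import torch

    x = torch.randn(4)  # inputs X1 ... Xn
    w = torch.randn(4)  # weights W1 ... Wn
    b = torch.randn(1)  # bias

    affine = torch.dot(w, x) + b             # W1x1 + W2x2 + ... + Wnxn + b
    activation = torch.clamp(affine, min=0)  # max(Wx + b, 0)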
The weights and biases of
individual neurons are determined
during the training process
Operation of a Single Neuron

[Diagram: the same neuron, with the activation function highlighted]

The combination of the affine transformation and the activation function can learn any arbitrary relationship
Activation Function

ReLU, logit, tanh, step

Various choices of activation functions exist and drive the design of your neural network
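All four are available in (or trivial to express with) PyTorch; a quick comparison on the same inputs, assuming “logit” here refers to the sigmoid:

    import torch

    x = torch.linspace(-2.0, 2.0, 5)

    print(torch.relu(x))     # ReLU: max(x, 0)
    print(torch.sigmoid(x))  # logit/sigmoid: squashes values into (0, 1)
    print(torch.tanh(x))     # tanh: squashes values into (-1, 1)
    print((x > 0).float())   # step: 0 below the threshold, 1 above it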
Importance of Activation

The choice of activation function is crucial in determining performance
Feature Map Size Calculations
Without Zero Padding

[Figure: the 6x6 input matrix convolved with the 3x3 kernel, no zero-padding, produces a 4x4 result]
Zero Padding

[Figure: the 6x6 matrix padded with zeros to 10x10; convolving with the 3x3 kernel produces an 8x8 result]
O = (W - K + 2P) / S + 1

Formula for dimension calculations

Handy in getting the dimensions of CNN layers right
O = (W - K + 2P) / S + 1

O = output dimension: height/width of the output
O = (W - K + 2P) / S + 1

W = input dimension: height/width of the input image
O = (W - K + 2P) / S + 1

K = kernel size: height/width of the kernel
O = (W - K + 2P) / S + 1

P = padding (if any); may be zero
O = (W - K + 2P) / S + 1

S = stride: how far the kernel advances in each step
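The formula is easy to encode as a helper; this sketch (the name output_dim is hypothetical) reproduces the examples above:

    def output_dim(w, k, p=0, s=1):
        # O = (W - K + 2P) / S + 1, with floor division for non-exact fits
        return (w - k + 2 * p) // s + 1

    print(output_dim(w=6, k=3, p=0))  # 4: the no-padding example
    print(output_dim(w=6, k=3, p=2))  # 8: padded to 10x10
    print(output_dim(w=6, k=3, p=3))  # 10: padded to 12x12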
Stride Size

[Figures: the kernel advancing across the 6x6 matrix one column at a time (horizontal stride of 1) and one row at a time (vertical stride of 1)]
O = (W - K + 2P) / S + 1

Formula for dimension calculations

Handy in getting the dimensions of CNN layers right
Demo
Image classification using
convolutional neural networks (CNNs)
Hyperparameter tuning
Narrow and wide convolution
Zero-padding and feature map sizes
Summary of convolutional layers
Calculating feature map dimensions
Batch normalization of input images
Building and training a CNN for image classification
Changing model hyperparameters
