CNN For Computer Vision Problem (Session 1)

Convolutional Neural Network for Computer Vision Problems
Lecturer: Dr. Chuang-Jan Chang

TA: Haryanto
MCUT Omnidirectional Surveillance Imaging Laboratory (MOIL)
Convolutional Neural Network (CNN)
In deep learning, a convolutional neural network (CNN, or ConvNet) is a
class of deep neural networks most commonly applied to analyzing visual imagery.
The architecture of a convolutional neural network:
❑ Convolution (CONV) + Rectified Linear Unit (ReLU)
❑ Pooling (POOL)
❑ Fully connected (FC)
❑ Convolution (CONV)
A convolution operation combines two matrices: one is the image, and the
other is the filter or kernel that turns the image into something else. The
filter slides over the image; at each position it is multiplied element-wise
with the patch of the image beneath it, and the products are summed into a
single output value. The result is the convolved image, also called a feature
map. Examples of filters include sharpening and Sobel (edge-detection) filters.
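The slide-multiply-sum step can be sketched in plain Python. This is a minimal illustration, assuming a single-channel image stored as a list of lists, a square kernel, and no padding. Like CNN libraries, it does not flip the kernel (strictly, this is cross-correlation). The name `conv2d` and the example values are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no-padding) 2D convolution as used in CNNs (kernel not flipped)."""
    n_rows, n_cols = len(image), len(image[0])
    k = len(kernel)  # assume a square k x k kernel
    out = []
    for i in range(0, n_rows - k + 1, stride):
        row = []
        for j in range(0, n_cols - k + 1, stride):
            # multiply the overlapping elements and sum the products
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 6x6 image: bright left half, dark right half (a vertical edge)
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# A vertical-edge filter
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
for row in conv2d(image, kernel):
    print(row)  # [0, 30, 30, 0] on each of the 4 rows
```

The large values in the middle columns of the 4x4 output mark where the vertical edge sits.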
Figure: filter or feature detector
❑ Padding
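Padding refers to the number of pixels added around the border of an
image when it is processed by the kernel of a CNN. There are two kinds
of padding:
- Valid: no padding is added, so the output is smaller than the input
- Same: zeros are added so that the output has the same spatial size as the input (with stride 1)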
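Here is a minimal sketch of zero padding, assuming a single-channel image stored as a list of lists. The helper names `zero_pad` and `output_size` are hypothetical. With a 3x3 filter and stride 1, adding p = 1 pixel of zeros on every side ("same") keeps a 5x5 input at 5x5, while p = 0 ("valid") shrinks it to 3x3. For an odd filter size f, "same" padding is p = (f - 1) / 2.

```python
def zero_pad(image, p):
    """Surround a 2D list-of-lists image with p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]                    # top rows
    padded += [[0] * p + list(row) + [0] * p for row in image]  # left/right zeros
    padded += [[0] * width for _ in range(p)]                   # bottom rows
    return padded

def output_size(n, f, p):
    """Output side length for an n x n input, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1

image = [[1] * 5 for _ in range(5)]  # hypothetical 5x5 input
padded = zero_pad(image, 1)
print(len(padded), len(padded[0]))   # 7 7
print(output_size(5, 3, 0))          # valid: 3
print(output_size(5, 3, 1))          # same: 5
```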
❑ Stride
Stride is the number of pixels by which the filter
shifts over the input matrix. When the stride
is 1, we move the filter 1 pixel at a time;
when the stride is 2, we move it 2 pixels at
a time, and so on.
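The sketch below shows the effect of stride. It lists the positions where the filter's left edge lands and applies the usual output-size formula floor((n + 2p - f) / s) + 1, where n is the input size, f the filter size, p the padding, and s the stride. The function names are hypothetical.

```python
def filter_positions(n, f, s):
    """Starting indices of an f-wide filter sliding over n pixels with stride s."""
    return list(range(0, n - f + 1, s))

def output_size(n, f, p=0, s=1):
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(filter_positions(7, 3, 1))  # [0, 1, 2, 3, 4]
print(filter_positions(7, 3, 2))  # [0, 2, 4]
print(output_size(7, 3, s=1))     # 5
print(output_size(7, 3, s=2))     # 3
```

The output size always equals the number of positions the filter visits.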
Convolution in 2D
Mathematically, one output value is:
$$ (2 \times 1) + (0 \times 0) + (1 \times 1) + (0 \times 0) + (1 \times 0) + (0 \times 0) + (0 \times 0) + (0 \times 1) + (1 \times 0) = 3 $$
Figure: input image (no padding) * filter (3x3) = output image (stride 1)
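The sum above gives one output pixel: the 3x3 filter is multiplied element-wise with the 3x3 input patch beneath it, and the products are summed. The original figure is not reproduced, so this sketch assumes that the first factor in each product is the image pixel and the second is the filter weight, read row by row. The total is the same either way.

```python
# Assumed 3x3 input patch and filter, read off the products in the sum above
patch = [
    [2, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
kernel = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]

# Element-wise multiply, then sum: one output value of the convolution
value = sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))
print(value)  # 3
```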
Rectified Linear Unit (ReLU)
Following each convolution operation, the
CNN applies a Rectified Linear Unit (ReLU)
transformation to the convolved feature
in order to introduce nonlinearity into the
model. The ReLU function returns x for all
values of x > 0 and returns 0 for all
values of x ≤ 0, i.e. ReLU(x) = max(0, x).
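Applied element-wise to a feature map, ReLU keeps positive values and zeroes out the rest. This is a minimal sketch on a hypothetical 2x2 feature map stored as a list of lists.

```python
def relu(x):
    """Return x for x > 0 and 0 for x <= 0, i.e. max(0, x)."""
    return max(0, x)

def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map (list of lists)."""
    return [[relu(v) for v in row] for row in feature_map]

print(relu_map([[3, -2], [0, -7]]))  # [[3, 0], [0, 0]]
```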
Figure: feature map before and after ReLU
Convolutions over volume (Convolutions on RGB images)
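A 6x6x3 RGB input convolved with one 3x3x3 filter (27 parameters) gives a 4x4 output:
6x6x3 * 3x3x3 = 4x4
Multiple filters: convolving the same 6x6x3 input with two 3x3x3 filters, one for vertical edges and one for horizontal edges, gives two 4x4 outputs that are stacked into a 4x4x2 volume:
6x6x3 * 3x3x3 (vertical edges) = 4x4
6x6x3 * 3x3x3 (horizontal edges) = 4x4
Stacked output: 4x4x2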
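Over a volume, each filter has as many channels as the input. One output value is the sum of the products across all three channels (3 × 3 × 3 = 27 products, one per filter weight). This sketch assumes nested lists indexed [row][col][channel]. The helper name `conv_volume` and the all-ones data are hypothetical and serve only to check the shapes.

```python
def conv_volume(image, filters):
    """Valid, stride-1 convolution of an H x W x C input with f x f x C filters.

    Returns an H' x W' x len(filters) output, one channel per filter.
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    f = len(filters[0])
    out_h, out_w = h - f + 1, w - f + 1
    out = [[[0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for k, filt in enumerate(filters):
        for i in range(out_h):
            for j in range(out_w):
                # sum over the whole f x f x C block: f * f * C products
                out[i][j][k] = sum(
                    image[i + di][j + dj][ch] * filt[di][dj][ch]
                    for di in range(f) for dj in range(f) for ch in range(c)
                )
    return out

image = [[[1, 1, 1] for _ in range(6)] for _ in range(6)]  # 6x6x3 input
ones = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]    # 3x3x3 filter
out = conv_volume(image, [ones, ones])                      # two filters
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
print(out[0][0])                              # [27, 27]
```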
ONE LAYER OF A CONVOLUTIONAL NETWORK
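Each 3x3x3 filter of layer 1, $w^{[1]}$, is convolved with the 6x6x3 input $x^{[0]}$. A bias is then added and ReLU is applied, giving one 4x4 output per filter:
$$ \mathrm{ReLU}\left(w^{[1]}_1 * x^{[0]} + b_1\right) \in \mathbb{R}^{4\times4}, \qquad \mathrm{ReLU}\left(w^{[1]}_2 * x^{[0]} + b_2\right) \in \mathbb{R}^{4\times4} $$
Stacking the two 4x4 outputs gives the 4x4x2 output of the layer.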
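Counting parameters makes the layer concrete. Each 3x3x3 filter has 27 weights plus one bias, so two filters give (27 + 1) × 2 = 56 learnable parameters, whatever the input's height and width. This is a minimal sketch with hypothetical helper names. The second function adds the bias and applies ReLU to one filter's output, as in the formula above.

```python
def conv_layer_params(f, in_channels, num_filters):
    """Weights plus one bias per filter for a conv layer with f x f filters."""
    return (f * f * in_channels + 1) * num_filters

def layer_output(z, b):
    """Add bias b to every element of a 2D conv output z, then apply ReLU."""
    return [[max(0, v + b) for v in row] for row in z]

print(conv_layer_params(3, 3, 2))              # 56
print(conv_layer_params(3, 3, 10))             # 280
print(layer_output([[1, -3], [4, 0]], b=-1))   # [[0, 0], [3, 0]]
```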
https://www.youtube.com/watch?v=hxA0wxibv8g&list=PLNgy4gid0G9cbw5OjwG2jxvFqYDqkGnpJ&index=11
https://www.coursera.org/lecture/convolutional-neural-networks/one-layer-of-a-convolutional-network-nsiuW
Pooling (POOL)
The pooling layer serves to progressively reduce the spatial size of the
representation, to reduce the number of parameters and amount of
computation in the network, and hence to also control overfitting. The
intuition is that the exact location of a feature is less important than its
rough location relative to other features.
Types of Pooling
➢ Mean pooling
➢ Max pooling
➢ Sum pooling
Figure: max pooling with hyperparameters F = 2x2 (filter size) and S = 2 (stride)
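The three pooling types differ only in how each window is reduced. This sketch uses the slide's hyperparameters F = 2x2 and S = 2. The function name `pool2d` and the 4x4 feature map are hypothetical.

```python
def pool2d(feature_map, f=2, s=2, mode="max"):
    """Pool a 2D feature map with an f x f window and stride s.

    mode: "max", "mean" or "sum".
    """
    reducers = {"max": max, "sum": sum, "mean": lambda xs: sum(xs) / len(xs)}
    combine = reducers[mode]
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, n_rows - f + 1, s):
        row = []
        for j in range(0, n_cols - f + 1, s):
            window = [feature_map[i + di][j + dj] for di in range(f) for dj in range(f)]
            row.append(combine(window))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(pool2d(fmap, mode="max"))   # [[6, 5], [7, 9]]
print(pool2d(fmap, mode="mean"))  # [[3.5, 2.0], [2.5, 6.0]]
print(pool2d(fmap, mode="sum"))   # [[14, 8], [10, 24]]
```

Pooling has no learnable weights. It halves each spatial dimension here (4x4 to 2x2), which is how it reduces computation in later layers.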
Flattening
Flattening converts the data into
a 1-dimensional array for input
to the next layer. We flatten the
output of the convolutional layers to
create a single long feature vector.
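Flattening just reads every value of the final feature maps into one list. This sketch assumes an H x W x C nested list, read row by row with channels last. Libraries may use a different ordering, and the name `flatten` and the data are hypothetical.

```python
def flatten(volume):
    """Flatten an H x W x C nested list into a 1D list (row-major, channels last)."""
    return [v for row in volume for pixel in row for v in pixel]

pooled = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # hypothetical 2x2x2 feature maps
vector = flatten(pooled)
print(vector)       # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(vector))  # 8 = 2 * 2 * 2
```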
Fully connected (FC) or Dense Layer
The full-connection step has three layers:
➢ Input layer
➢ Fully-connected layer
➢ Output layer
The feature map matrix is converted into a vector. The fully connected
layers combine these features to create a model. Finally, an
activation function such as softmax or sigmoid classifies the output,
for example as cat or dog.
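Each neuron in a fully connected layer computes a weighted sum of every input plus a bias. This minimal sketch uses hypothetical weights for two output neurons ("cat" and "dog"). The resulting scores would then be passed to softmax or sigmoid.

```python
def dense(x, weights, biases):
    """Fully connected layer: output[k] = sum_i weights[k][i] * x[i] + biases[k]."""
    return [sum(w_i * x_i for w_i, x_i in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]

x = [1.0, 2.0, 3.0]              # hypothetical flattened features
weights = [[0.5, -1.0, 0.0],     # weights of the "cat" neuron
           [0.0, 1.0, 0.5]]      # weights of the "dog" neuron
biases = [0.0, -1.0]
print(dense(x, weights, biases))  # [-1.5, 2.5]
```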
Activation Function
When learning logistic regression, a common point of confusion is which
function is used to predict the target class. The two principal functions
are softmax and sigmoid.
Softmax is used for multi-class classification in the logistic regression
model, whereas sigmoid is used for binary classification in the
logistic regression model.
Sigmoid
The sigmoid function takes any real number
and returns a value in the range 0 to 1:
$\sigma(x) = 1 / (1 + e^{-x})$. Because the
output lies between 0 and 1, it can be read
as the probability of the positive class.
The sigmoid function is used for binary
classification in the logistic regression model.
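Here is a minimal sketch of sigmoid for binary classification. The split into two branches avoids overflow in `math.exp` for large negative inputs. The 0.5 decision threshold and the "dog" logit are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)); output between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe for large negative x
    return z / (1.0 + z)

print(sigmoid(0.0))                       # 0.5
p_dog = sigmoid(2.5)                      # hypothetical logit for "dog"
print("dog" if p_dog >= 0.5 else "cat")   # dog
```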
Softmax
Softmax is a function that provides
probabilities for each possible class in
a multi-class classification model:
$\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$.
The probabilities add up to exactly
1.0.
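This is a minimal softmax sketch. Subtracting the largest score before exponentiating is a standard numerical-stability trick and does not change the result. The three class scores are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])               # hypothetical scores for 3 classes
print([round(p, 3) for p in probs])            # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))                    # 1.0
```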
https://poloclub.github.io/cnn-explainer/
Glossary
Some deep learning terms you should be familiar with:
1. Convolution
2. Padding
3. Stride
4. Pooling
5. Flatten
6. Fully connected layer
7. Dropout
8. Softmax
9. Training set
10. Validation set
11. Test set
12. Ground truth
13. Backpropagation
14. Epoch
15. Batch size
16. Bias
17. Fine tuning
18. Hidden layer
19. Hyperparameter
20. Model
21. Normalization
22. Optimizer
Example CNN architectures
1. AlexNet
2. VGGNet
3. GoogLeNet
4. ResNet
5. etc.
Computer vision problems: image classification, object
detection, and segmentation
Session 1
(Image Classification)
Image classification
Image Classification is a
fundamental task that
attempts to comprehend an
entire image as a whole. The
goal is to classify the image by
assigning it to a specific label.
https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
Practice
Open the link given by the TA:
https://gist.github.com/anto112/22a15f8a982569906edf65a61841aa1b
Practice
Open Google Colab
https://drive.google.com/drive/folders/19STUv4SwltwhnFPjqh8wTFXn9sJT2PUp?usp=sharing
Practice (Mounting dataset)
1. When you first mount Google Drive, you cannot find the cat and dog dataset folder.
Open the shared Google Drive link and go to "Shared with me".
3. Now you have the dataset folder in your Drive.
Practice (Save your code in GitHub)
1. Click here. First, you need to log in to your GitHub account.
2. Define the repository name.
3. Check the box.
4. Click here.
Go to Google Colab:
5. Click here.
6. Select the repository name.
Practice (Submit your code to GitHub classroom)
*Accept the assignment by following the link shared by the TA:
https://classroom.github.com/a/pC0raJi_
Click here.
2. Import code.
3. Select repository.
4. Click here.
Session 2
(Object Detection and Instance segmentation)
To be continued…
Thank You
Contact person:
GitHub: anto112
Email: m07158031@o365.mcut.edu.tw
Line ID: Haryanto_96
