
Convolutional Neural Network For

Computer Vision Problem

Lecturer: Dr. Chuang-Jan Chang


TA: Haryanto
MCUT Omnidirectional Surveillance Imaging Laboratory (MOIL)
Convolutional Neural Network (CNN)
In deep learning, a convolutional neural network (CNN, or ConvNet) is a
class of deep neural networks, most commonly applied to analyzing visual imagery.

The architecture of a convolutional neural network:
❑ Convolutional (CONV) + Rectified Linear Unit (ReLU)
❑ Pooling (POOL)
❑ Fully connected (FC)

❑ Convolution (CONV)
A convolution operation slides a small matrix, the filter or kernel, over the
image. At each position, the filter and the image patch under it are multiplied
element-wise and the products are summed into one output value. The result is
the convolved image, also called a feature map. Examples of filters include
sharpen and Sobel filters.
Convolution (CONV)

Filter or feature detector



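❑ Padding
Padding refers to the number of pixels added around the border of an
image when it is processed by the kernel of a CNN. There are two kinds
of padding:
- Valid: no padding is added, so the output is smaller than the input
- Same: enough padding is added that, with stride 1, the output has the same size as the input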
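The two padding modes can be sketched in plain Python. This is a minimal illustration, not code from the lecture: the helper names `pad2d` and `same_padding` are hypothetical, zero is assumed as the padding value, and the "same" rule shown assumes an odd filter size and stride 1.

```python
def pad2d(image, p, value=0):
    """Return a copy of a 2D list with p rows/columns of `value` added on every side."""
    width = len(image[0]) + 2 * p
    padded = [[value] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([value] * p + list(row) + [value] * p)
    padded.extend([value] * width for _ in range(p))      # bottom border
    return padded


def same_padding(f):
    """Padding per side that keeps the output size equal to the input size
    (assumes an odd filter size f and stride 1)."""
    return (f - 1) // 2


# Made-up 3x3 image, used only for illustration.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

valid_input = pad2d(image, 0)                # "valid": nothing added, still 3x3
same_input = pad2d(image, same_padding(3))   # "same" for a 3x3 filter: 5x5
```

With a 3x3 filter, the "valid" input yields a 1x1 output, while the "same" input yields a 3x3 output that matches the original image size.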

❑ Stride
Stride is the number of pixels by which
the filter shifts over the input matrix.
When the stride is 1, we move the filter
1 pixel at a time. When the stride is 2,
we move the filter 2 pixels at a time,
and so on.
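How the stride changes the filter positions and the output size can be sketched as follows. The function names are hypothetical, and the size formula floor((n + 2p - f) / s) + 1 is the standard one for an input of length n, filter size f, padding p and stride s along one axis.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def window_starts(n, f, s=1):
    """Start index of every filter position along one axis (no padding)."""
    return list(range(0, n - f + 1, s))


# A 7-pixel-wide input with a 3-pixel-wide filter (sizes chosen for illustration).
starts_stride_1 = window_starts(7, 3, s=1)   # filter moves 1 pixel at a time
starts_stride_2 = window_starts(7, 3, s=2)   # filter moves 2 pixels at a time
```

A larger stride visits fewer positions, so the output becomes smaller.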
Convolution in 2D

Mathematically, it’s: (2 * 1) + (0 * 0) + (1 * 1) + (0 * 0) + (1 * 0) + (0 * 0) + (0 * 0) + (0 * 1) + (1 * 0) = 3

Figure: an input image (no padding) * a filter (3x3) = output image (stride 1)
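The sum above can be reproduced with a small sketch. The slide's full input image is not available here, so the code only rebuilds the single 3x3 window from the products listed above, assuming the first factor of each product is the image pixel and the second is the filter weight. The function name `conv2d` is hypothetical, and, as is usual in CNNs, the filter is not flipped.

```python
def conv2d(image, kernel, stride=1):
    """2D convolution without padding, as used in CNNs (no kernel flip).
    Assumes a square kernel."""
    f = len(kernel)
    rows = (len(image) - f) // stride + 1
    cols = (len(image[0]) - f) // stride + 1
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            total = 0
            # Multiply the window element-wise with the kernel and sum the products.
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            out_row.append(total)
        out.append(out_row)
    return out


# The 3x3 window and filter implied by the products in the worked sum (assumed ordering).
patch = [[2, 0, 1],
         [0, 1, 0],
         [0, 0, 1]]
kernel = [[1, 0, 1],
          [0, 0, 0],
          [0, 1, 0]]

result = conv2d(patch, kernel)  # one window, so one output value
```

On a larger image the same function slides the filter over every window and returns one value per position.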
Rectified Linear Unit (ReLU)

Following each convolution operation, the
CNN applies a Rectified Linear Unit (ReLU)
transformation to the convolved feature
in order to introduce nonlinearity into the
model. The ReLU function returns x for all
values of x > 0 and returns 0 for all
values of x ≤ 0.
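The ReLU rule is short enough to write out directly. A minimal sketch with hypothetical function names and made-up feature-map values:

```python
def relu(x):
    """Return x for x > 0 and 0 for x <= 0."""
    return x if x > 0 else 0


def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


# Made-up convolved feature, used only for illustration.
feature_map = [[-3, 2],
               [0, -1]]
activated = relu_map(feature_map)
```

Negative responses are set to zero while positive responses pass through unchanged.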
Figure: feature map before and after ReLU
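Convolutions over volume (Convolutions on RGB images)

6x6x3 input * 3x3x3 filter (27 parameters) = 4x4 output

Multiple filters

6x6x3 input * 3x3x3 filter (vertical edges) = 4x4
6x6x3 input * 3x3x3 filter (horizontal edges) = 4x4
Stacking the two 4x4 outputs gives a 4x4x2 volume.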
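The shapes on these slides can be checked with a small sketch. The image and the two edge filters below are made-up values for illustration (a simple +1/0/-1 pattern repeated over the three channels), and the function name `conv_volume` is hypothetical.

```python
def conv_volume(volume, kernel):
    """Convolve an H x W x C volume with an f x f x C filter (no padding, stride 1).
    The products over all channels are summed, so the result is a 2D map."""
    f = len(kernel)
    channels = len(kernel[0][0])
    rows = len(volume) - f + 1
    cols = len(volume[0]) - f + 1
    return [
        [
            sum(
                volume[i + a][j + b][c] * kernel[a][b][c]
                for a in range(f)
                for b in range(f)
                for c in range(channels)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Hypothetical 6x6x3 image: bright (1) left half, dark (0) right half, in all 3 channels.
image = [[[1 if col < 3 else 0] * 3 for col in range(6)] for _ in range(6)]

# Assumed filters: columns +1, 0, -1 (vertical edges) and rows +1, 0, -1 (horizontal edges).
vertical = [[[1] * 3, [0] * 3, [-1] * 3] for _ in range(3)]
horizontal = [[[v] * 3 for _ in range(3)] for v in (1, 0, -1)]

n_params = 3 * 3 * 3  # 27 weights per 3x3x3 filter

vertical_map = conv_volume(image, vertical)      # 4x4
horizontal_map = conv_volume(image, horizontal)  # 4x4

# Stack the two 4x4 maps along a new last axis: 4x4x2.
stacked = [[[vertical_map[i][j], horizontal_map[i][j]] for j in range(4)]
           for i in range(4)]
```

The vertical-edge filter responds where the image changes from bright to dark, while the horizontal-edge filter stays at zero because this image has no horizontal edge.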
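ONE LAYER OF A CONVOLUTIONAL NETWORK

x[0] is the 6x6x3 input and w[1] holds the two 3x3x3 filters of layer 1:
ReLU(x[0] * filter 1 + b1) = 4x4
ReLU(x[0] * filter 2 + b2) = 4x4
Stacking the two 4x4 results gives the 4x4x2 output of the layer.
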
https://www.youtube.com/watch?v=hxA0wxibv8g&list=PLNgy4gid0G9cbw5OjwG2jxvFqYDqkGnpJ&index=11
https://www.coursera.org/lecture/convolutional-neural-networks/one-layer-of-a-convolutional-network-nsiuW
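One layer, as drawn on this slide, is convolution plus a bias followed by ReLU, once per filter. The sketch below uses made-up inputs, weights and biases, and the names `conv_volume` and `conv_layer` are hypothetical.

```python
def conv_volume(x, w):
    """Convolve an H x W x C input with an f x f x C filter (no padding, stride 1)."""
    f, channels = len(w), len(w[0][0])
    rows, cols = len(x) - f + 1, len(x[0]) - f + 1
    return [
        [
            sum(x[i + a][j + b][c] * w[a][b][c]
                for a in range(f) for b in range(f) for c in range(channels))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def conv_layer(x, filters, biases):
    """For each filter: convolve, add that filter's bias, apply ReLU; then stack the maps."""
    maps = []
    for w, b in zip(filters, biases):
        z = conv_volume(x, w)
        maps.append([[max(0, v + b) for v in row] for row in z])
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[[m[i][j] for m in maps] for j in range(cols)] for i in range(rows)]


# Made-up values: a 6x6x3 input of ones and two 3x3x3 filters.
x0 = [[[1] * 3 for _ in range(6)] for _ in range(6)]
filter_1 = [[[1] * 3 for _ in range(3)] for _ in range(3)]    # all +1 weights
filter_2 = [[[-1] * 3 for _ in range(3)] for _ in range(3)]   # all -1 weights
b1, b2 = 1, 2

a1 = conv_layer(x0, [filter_1, filter_2], [b1, b2])  # 4x4x2

# Each filter has 27 weights plus 1 bias.
n_params = 2 * (3 * 3 * 3 + 1)
```

The first filter gives 27 + 1 = 28 at every position, and the second gives -27 + 2 = -25, which ReLU clips to 0.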
Pooling (POOL)

The pooling layer serves to progressively reduce the spatial size of the
representation, to reduce the number of parameters and amount of
computation in the network, and hence to also control overfitting. The
intuition is that the exact location of a feature is less important than its
rough location relative to other features.

Types of Pooling
➢ Mean pooling
➢ Max pooling
➢ Sum pooling

Figure: max pooling example with hyperparameters F = 2x2 (filter size) and S = 2 (stride)
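The three pooling types differ only in how each window is reduced to one number. A minimal sketch using F = 2 and S = 2 from the slide; the function name `pool2d` and the feature-map values are made up for illustration.

```python
def pool2d(feature_map, f=2, s=2, mode="max"):
    """Reduce each f x f window (moved with stride s) to one value."""
    reducers = {
        "max": max,
        "sum": sum,
        "mean": lambda window: sum(window) / len(window),
    }
    reduce_fn = reducers[mode]
    rows = (len(feature_map) - f) // s + 1
    cols = (len(feature_map[0]) - f) // s + 1
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [feature_map[i * s + a][j * s + b]
                      for a in range(f) for b in range(f)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


# Made-up 4x4 feature map.
feature_map = [[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]]

max_pooled = pool2d(feature_map, mode="max")
mean_pooled = pool2d(feature_map, mode="mean")
sum_pooled = pool2d(feature_map, mode="sum")
```

With F = 2x2 and S = 2 the windows do not overlap, so a 4x4 map shrinks to 2x2.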
Flattening

Flattening converts the data into
a 1-dimensional array for input
to the next layer. We flatten the
output of the convolutional layers to
create a single long feature vector.
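Flattening can be sketched as a recursive walk over nested lists. The function name `flatten` and the 2x2x2 example volume are made up for illustration.

```python
def flatten(volume):
    """Turn arbitrarily nested lists into one flat list, keeping the element order."""
    flat = []
    for item in volume:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Made-up 2x2x2 output of the convolutional layers.
pooled = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]

feature_vector = flatten(pooled)  # length 2 * 2 * 2 = 8
```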
Fully connected (FC) or Dense Layer
The full connection step has three layers:
➢ Input layer
➢ Fully-connected layer
➢ Output layer

The feature map matrix is
converted into a vector. With the fully connected
layers, we combine these features to
create a model. Finally, an activation
function such as softmax or sigmoid classifies
the outputs, for example as cat or dog.
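A fully connected layer computes one weighted sum of all input features per output, plus a bias. The sketch below uses a hypothetical function name and made-up numbers; in a trained network the weights and biases would be learned.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: output k = sum_i(weights[k][i] * inputs[i]) + biases[k]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Made-up flattened feature vector and parameters for two outputs ("cat", "dog").
features = [1.0, 2.0, 3.0]
weights = [[0.5, 0.0, -0.5],   # weights for the "cat" score
           [1.0, 1.0, 1.0]]    # weights for the "dog" score
biases = [0.0, -1.0]

scores = dense(features, weights, biases)
```

The resulting scores are raw values; an activation function such as softmax or sigmoid is what turns them into class probabilities.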
Activation Function

When learning logistic regression, a common source of
confusion is which function is used to compute the prediction of
the target class. The two principal functions we frequently hear
about are the Softmax and Sigmoid functions.

Softmax is used for multi-class classification in the logistic regression
model, whereas Sigmoid is used for binary classification in the
logistic regression model.
Sigmoid

The sigmoid function takes any
real number and returns an output
value in the range 0 to 1. (An output
range of -1 to 1 belongs to a different
S-shaped function, tanh, not to the
sigmoid.) The sigmoid function is used
for binary classification in the logistic
regression model.
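The sigmoid function is 1 / (1 + e^(-x)). A minimal sketch; the two-branch form is an implementation choice made here to avoid overflow for large negative inputs, and the 0.5 decision threshold is a common convention assumed for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def predict_binary(score, threshold=0.5):
    """Assumed convention: class 1 if the sigmoid output reaches the threshold, else class 0."""
    return 1 if sigmoid(score) >= threshold else 0


midpoint = sigmoid(0.0)      # a score of 0 maps to the middle of the range
```

Large positive scores approach 1, large negative scores approach 0, and the output never leaves the range 0 to 1.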
Softmax

Softmax is a function that provides
a probability for each possible class in
a multi-class classification model.
The probabilities add up to exactly
1.0.
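Softmax exponentiates each class score and divides by the sum of all exponentials. A minimal sketch with made-up scores; subtracting the maximum score first is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The class with the largest score receives the largest probability, and in floating-point arithmetic the probabilities sum to 1.0 up to rounding error.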
Convolutional Neural Network (CNN)

https://poloclub.github.io/cnn-explainer/
Glossary

Some terms that you should be familiar with in deep learning:

1. Convolution
2. Padding
3. Stride
4. Pooling
5. Flatten
6. Fully connected layer
7. Dropout
8. Softmax
9. Training set
10. Validation set
11. Test set
12. Ground truth
13. Backpropagation
14. Epoch
15. Batch size
16. Bias
17. Fine tuning
18. Hidden layer
19. Hyperparameter
20. Model
21. Normalization
22. Optimizer
Example CNN architectures

1. AlexNet
2. VGGNet
3. GoogLeNet
4. ResNet
5. etc.
Computer Vision Problem: image classification, object
detection and segmentation
Session 1
(Image Classification)
Image classification

Image Classification is a
fundamental task that
attempts to comprehend an
entire image as a whole. The
goal is to classify the image by
assigning it to a specific label.

https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
Practice

Open the link given by the TA:

https://gist.github.com/anto112/22a15f8a982569906edf65a61841aa1b
Practice
Open Google Colab

https://drive.google.com/drive/folders/19STUv4SwltwhnFPjqh8wTFXn9sJT2PUp?usp=sharing
Practice (Mounting dataset)

1. The first time you mount Google Drive, you will not find the cat and dog dataset folder.
2. Open the link to Google Drive and go to "Shared with me".
3. Now you have the dataset folder in your drive.
Practice (Save your code in GitHub)

First, you need to log in to your GitHub account.

1. Click here
2. Define the repository name
3. Check the box
4. Click here

Go to Google Colab

5. Click here
6. Select repository name
Practice (Submit your code to GitHub classroom)
*Accept the assignment by following the link shared by the TA:
https://classroom.github.com/a/pC0raJi_

2. Import code
3. Select repository
4. Click here

Click here
Session 2
(Object Detection and Instance segmentation)
To be continued…
Thank You
Contact person:
GitHub: anto112
Email = m07158031@o365.mcut.edu.tw
Line-Id = Haryanto_96
