Professional Documents
Culture Documents
• Output
24x16 matrix
• A convolutional kernal is a
small 2D matrix
• The kernal maps on to the
input image by matrix
multiplication and addition
• The output is a matrix of
lower dimensions
Sliding window protocol
where stride =1
CNNs | simplilearn
45*0
+ 12*(-1)
+ 5*0
+ 22*(-1)
+ 10*5
+ 35*(-1)
+ 88*0
+ 26*(-1)
+ 51*0
= - 45
Activation Step Rectified
Linear
Unit
• Activation function takes the
output of a neuron and maps it
to the highest positive value
• If output is negative, the
function maps it to zero
• ReLU is a commonly used
activation function in deep
learning
CNNs | simplilearn
CNN Overview
Pooling Step New
Feature
Map
• Pooling reduces dimensionality
of features map by using
different filters
• Condenses regions of neurons
into a single output
• Simplifies model by reducing
the number of parameters the
model needs to learn
• Pooling retains the most
important information but
lowers resolution
CNNs | simplilearn
Pooling Enhances Edges Three iterations of
max pooling using a
(2, 2) kernel
or object localization
https://arxiv.org/abs/1506.02640
YOLO
https://pjreddie.com/darknet/yolo/
https://arxiv.org/abs/1506.02640
Previous Model for Image Detection: R-CNN
• Regions with CNN features
• Published Oct 2014
• link to article
• Splits an image into 2000
regions in boundary boxes
then classify each region
• Drawbacks:
• Long time to train – classify
2000 regions per image
• Detection not in real-time: 47
sec for test image
• Boundary box inaccuracies
https://arxiv.org/abs/1506.02640
What is Object
Detection?
First let’s talk about
object localization
36
What is object localization?
width (bw)
Object localization is
finding what and where a
(single) object exists in a
single image
height
(bh)
(bx, by)
How is object localization described
numerically in YOLO?
• The coordinates of a bounding x_train
y_train
Pc 1
Probability Bx 0.5
of class By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0
C1 = car class
C2 = motorcycle class
How is object localization described
numerically in YOLO? (0.5,0.6)
• The coordinates of a bounding (0,0) x_train
y_train
Pc 1 (bx,by)
Probability Bx 0.5 bh
of class By 0.6
0.3
Bw 0.4
Bh 0.3 bw
C1 1
C2 0 (1,1)
C1 = car class 0.4
C2 = motorcycle class
How is object localization described
numerically in YOLO? (0.5,0.6)
• The coordinates of a bounding (0,0)
box are described as a vector
Output of
Neural Network
Pc 1 (bx,by)
Probability Bx 0.5 bh
of class By 0.6
0.3
Bw 0.4
Bh 0.3 bw
C1 0.97
C2 0.03 (1,1)
C1 = car class 0.4
C2 = motorcycle class
How is object localization described
numerically in YOLO?
• The coordinates of a bounding x_train
y_train
Pc 0
Probability Bx -
of class By -
Bw -
Bh -
C1 -
C2 -
C1 = car class
C2 = motorcycle class
What about multiple objects?
Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = dog class
C2 = person class
Pc 1
Bx 0.05
By 0.3
Bw 2
Bh 1.3
C1 1
C2 0
C1 = dog class
C2 = person class
Pc 1
Bx 0.32
By 0.02
Bw 2.2
Bh 1.7
C1 0
C2 1
C1 = dog class
C2 = person class
Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = dog class
C2 = person class
from:
• Error matrix
• Intersection over union (IoU)
ratio for bounding box
image source