
TECHIN513 – Managing

Signal and Data Processing


Week 8
Today’s Agenda
• CNN
• YOLO
• ICTE
• FPDAWT
Today’s Agenda
• Convolutional Neural Network
• You Only Look Once
• In Class Team Exercise
• Final Project Discussion And Work Time
Announcement
• Purchasing supplies for final project
• Budget of $40 per team
• Requests must be made by Monday, February 26 at 9:59am

Link to Request Form:


TECHIN513 Final Project Supply Request Form - Google Sheets
What is a convolutional neural network?
• A network architecture for deep learning
• CNNs can have tens or hundreds of hidden layers
• Includes a typical artificial neural network architecture
• Useful for finding patterns in images to recognize objects
Stages of a CNN
• Input image
• Convolution
• Activation
• Pooling
• Flattening
• Fully Connected ANN
• Activation
• Output

Convolutional Operations | Medium


Greyscale Image Data
• Pixel values range from 0 to 255
• The example image is a 24x16 matrix

How Do Machines Read and Store Images? | Analytics Vidhya


Color Image Data
• One image has three matrices or “channels”
• Pixel values range from 0 to 255

How Do Machines Read and Store Images? | Analytics Vidhya
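
The two image layouts above can be sketched with NumPy arrays. This is only an illustration: the pixel values are random, and treating 24x16 as rows x columns is an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical greyscale image: a single 24x16 matrix of 8-bit values (0 to 255).
grey = rng.integers(0, 256, size=(24, 16), dtype=np.uint8)

# Hypothetical color image of the same size: three matrices ("channels").
color = rng.integers(0, 256, size=(24, 16, 3), dtype=np.uint8)

print(grey.shape)           # (24, 16)
print(color.shape)          # (24, 16, 3)
print(color[..., 0].shape)  # (24, 16): one channel is itself a matrix
```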


CNN Overview
Feature Extraction

Feature Extraction with CNNs | Towards Data Science


Typical Artificial Neural Network
• Each neuron in the input layer
is connected to every neuron in
the hidden layer
• Each connection has a weight
value
• Each neuron has a bias value
• The model learns these values
during the training process
• Values are updated with each
new training example

Introduction to Deep Learning - MATLAB
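
A minimal sketch of the bullets above, assuming a tiny layer with 3 inputs and 4 hidden neurons and a squared-error loss (the sizes, values, learning rate, and loss are all made up for illustration). It shows one weight per connection, one bias per neuron, and a single update from one training example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

x = np.array([0.2, 0.7, 0.1])     # 3 input neurons (made-up values)
W = rng.normal(size=(4, 3))       # one weight per connection: 4 hidden x 3 inputs
b = np.zeros(4)                   # one bias per hidden neuron
target = np.ones(4)               # made-up training target

def forward(W, b, x):
    """Weighted sum plus bias for every hidden neuron."""
    return W @ x + b

def loss(W, b, x, target):
    """Squared-error loss (an assumption; other losses are common)."""
    return 0.5 * np.sum((forward(W, b, x) - target) ** 2)

loss_before = loss(W, b, x, target)

# One gradient-descent step on this single training example.
error = forward(W, b, x) - target
learning_rate = 0.1
W = W - learning_rate * np.outer(error, x)
b = b - learning_rate * error

loss_after = loss(W, b, x, target)
print(loss_before, loss_after)    # the loss should shrink after the update
```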


Convolutional Neural Network
• The weights and bias values are
the same for all neurons in a
hidden layer
• All neurons in a hidden layer are
detecting the same feature (e.g. edge)
in different regions of an image
• The network is better equipped
to detect the feature regardless
of its location in an image

Introduction to Deep Learning - MATLAB


Convolutional Operation

An operation on two functions that produces a third, combined function

Convolution Integral | Statistics How To
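
For sampled signals the integral becomes a sum. The sketch below uses two made-up sequences, `f` and `g`, computes the sum by hand, and compares it with NumPy's `np.convolve`.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])   # first function (made-up samples)
g = np.array([0.0, 1.0, 0.5])   # second function (made-up samples)

# Discrete convolution: h[n] = sum over k of f[k] * g[n - k]
h_manual = np.zeros(len(f) + len(g) - 1)
for n in range(len(h_manual)):
    for k in range(len(f)):
        if 0 <= n - k < len(g):
            h_manual[n] += f[k] * g[n - k]

h = np.convolve(f, g)           # the third, combined function
print(h)                        # values: 0, 1, 2.5, 4, 1.5
```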


Convolutional Operation
• A convolutional kernel is a
small 2D matrix
• The kernel maps onto the
input image by element-wise
multiplication and addition
• The kernel moves across the
image as a sliding window,
here with stride = 1
• The output is a matrix of
lower dimensions (the feature map)

Convolutional Operations | Medium
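
The sliding-window step can be sketched as below. This is a minimal version, assuming no padding and a single-channel image; the helper name `conv2d_valid` and the 5x5 test image are made up. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image; multiply element-wise and add at each stop."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            top, left = r * stride, c * stride
            patch = image[top:top + kh, left:left + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # made-up 5x5 image
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]], dtype=float)

feature_map = conv2d_valid(image, kernel, stride=1)
print(feature_map.shape)   # (3, 3): lower dimensions than the 5x5 input
```

With a 3x3 kernel and stride 1, each output dimension shrinks by 2, which is why the feature map is smaller than the input.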
Convolving to Create Feature Maps

CNNs | simplilearn
45*0 + 12*(-1) + 5*0
+ 22*(-1) + 10*5 + 35*(-1)
+ 88*0 + 26*(-1) + 51*0
= -45
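
The arithmetic above can be checked in a few lines. Arranging the nine products as a 3x3 image patch and a 3x3 kernel, read row by row, is an assumption about the figure.

```python
import numpy as np

# Image patch and kernel, read row by row from the nine products above.
patch = np.array([[45, 12, 5],
                  [22, 10, 35],
                  [88, 26, 51]])
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])

# Element-wise multiplication followed by addition gives one feature-map value.
value = int(np.sum(patch * kernel))
print(value)  # -45
```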
Activation Step: Rectified Linear Unit (ReLU)
• The activation function takes the
output of a neuron and maps it
to a new value
• If the output is positive, ReLU
passes it through unchanged
• If the output is negative, the
function maps it to zero
• ReLU is a commonly used
activation function in deep
learning

Introduction to Deep Learning - MATLAB
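
ReLU is short enough to write out directly. The sketch below applies it to a made-up feature map that includes the -45 from the convolution example.

```python
import numpy as np

def relu(x):
    """Keep positive values unchanged; map negative values to zero."""
    return np.maximum(0, x)

feature_map = np.array([[-45, 12],
                        [7, -3]])   # made-up values, including the -45 from earlier
activated = relu(feature_map)
print(activated.tolist())  # [[0, 12], [7, 0]]
```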


ReLU activation retains only positive values

CNNs | simplilearn
CNN Overview
Pooling Step: New Feature Map
• Pooling reduces the dimensionality
of a feature map by applying a
filter, such as a maximum, over
small regions
• Condenses regions of neurons
into a single output
• Simplifies model by reducing
the number of parameters the
model needs to learn
• Pooling retains the most
important information but
lowers resolution

Introduction to Deep Learning - MATLAB
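
A minimal sketch of max pooling, assuming non-overlapping 2x2 regions and a made-up 4x4 feature map. The helper name `max_pool2d` is hypothetical.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Condense each size x size region into its maximum value."""
    h, w = fmap.shape
    h2, w2 = h // size, w // size
    trimmed = fmap[:h2 * size, :w2 * size]       # drop rows/columns that do not fit
    blocks = trimmed.reshape(h2, size, w2, size)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 5, 1],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
pooled = max_pool2d(fmap, size=2)
print(pooled.tolist())  # [[6, 5], [7, 9]]
```

The 4x4 map becomes 2x2: the strongest response in each region survives, and the exact position within the region is lost.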


Pooling Applies Various Filters

CNNs | simplilearn
Pooling Enhances Edges
• Three iterations of max pooling using a (2, 2) kernel
• Features (edges) are enhanced, but resolution is reduced

Pooling In Convolutional Neural Networks | paperspace


CNN Overview
Flattening
• The flatten layer lies
between the CNN and the
ANN
• Converts the feature map
from the pooling layer into
an input that the ANN can
understand
• The ANN requires a
one-dimensional array as input

Feature Maps | educative.io , Dense layers | Pysource
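
Flattening is a reshape. The sketch below assumes three pooled 2x2 feature maps with made-up values.

```python
import numpy as np

# Hypothetical pooling output: 3 feature maps of 2x2 each.
pooled = np.arange(12, dtype=float).reshape(3, 2, 2)

# Flatten into the one-dimensional array the ANN expects.
flat = pooled.flatten()
print(pooled.shape)  # (3, 2, 2)
print(flat.shape)    # (12,)
```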


Softmax Activation Step
• Often used as the last
activation function to
normalize the output of a
network to a probability
distribution over predicted
output classes
• The output of a Softmax is a
vector with probabilities of
each possible outcome
(figure: mathematical representation of Softmax, applied to the last fully connected layer)

Softmax Activation Function | Towards Data Science
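
Softmax exponentiates each score and divides by the sum of the exponentials. A minimal sketch with made-up scores for K = 3 classes (subtracting the maximum first is a common numerical-stability step and does not change the result):

```python
import numpy as np

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    z = np.asarray(scores, dtype=float)
    exp = np.exp(z - z.max())   # shift by the max for numerical stability
    return exp / exp.sum()

scores = [2.0, 1.0, 0.1]        # made-up outputs of the last fully connected layer
probs = softmax(scores)
predicted_class = int(np.argmax(probs))

print(probs)                    # one probability per class, summing to 1
print(predicted_class)          # 0
```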


CNN Output Layer
The final layer of the CNN architecture provides the final
classification output: a vector of length K, equal to the
number of classes

Introduction to Deep Learning - MATLAB


Classification, Detection, & Segmentation

or object localization

Object Segmentation vs. Object Detection | LinkedIn


You Only Look Once
• "You Only Look Once" (YOLO)
• YOLOv1 paper posted to arXiv June 2015 (last revised May 2016)
• Uses CNN as its backbone
network architecture
• YOLO predicts bounding boxes
and class probabilities for these
boxes simultaneously
• Improvement on previous model:
R-CNN

https://arxiv.org/abs/1506.02640
YOLO

https://pjreddie.com/darknet/yolo/

https://arxiv.org/abs/1506.02640
Previous Model for Image Detection: R-CNN
• Regions with CNN features
• Published Oct 2014
• link to article
• Proposes about 2000 regions
(bounding boxes) per image,
then classifies each region
• Drawbacks:
• Long time to train – classifies
2000 regions per image
• Detection not in real time: 47
sec per test image
• Bounding box inaccuracies

R-CNN | Towards Data Science


How does YOLO work?
• Resizes the input image to
448x448
• 1x1 convolutions are applied
to reduce the number of
channels
• 24 convolutional layers
• 4 max pooling layers
• The activation function is leaky ReLU
(linear in the final layer)
• Two fully connected layers
(figure: YOLO architecture)

https://arxiv.org/abs/1506.02640
What is Object Detection?
First let’s talk about object localization

What is object localization?
Object localization is finding what and where a (single) object
exists in a single image
(figure: bounding box with center (bx, by), width (bw), and height (bh))
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• x_train is the image; y_train is its label vector

y_train
Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0
C1 = car class
C2 = motorcycle class
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• Image corners are (0,0) and (1,1); the box center (bx,by) is at (0.5,0.6), with bw = 0.4 and bh = 0.3

y_train
Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 1
C2 0
C1 = car class
C2 = motorcycle class
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• Image corners are (0,0) and (1,1); the box center (bx,by) is at (0.5,0.6), with bw = 0.4 and bh = 0.3

Output of Neural Network
Pc 1 (probability of class)
Bx 0.5
By 0.6
Bw 0.4
Bh 0.3
C1 0.97
C2 0.03
C1 = car class
C2 = motorcycle class
How is object localization described numerically in YOLO?
• The coordinates of a bounding box are described as a vector
• x_train is the image; y_train is its label vector

y_train
Pc 0 (probability of class)
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = car class
C2 = motorcycle class
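
The label vectors on these slides can be built with a small helper. This is a sketch, not the course's code: the function name `make_label` is hypothetical, and storing the "-" (don't care) entries as 0 is an assumption.

```python
import numpy as np

def make_label(box=None, class_index=None, num_classes=2):
    """Build [Pc, Bx, By, Bw, Bh, C1, ..., Cn] for one image."""
    y = np.zeros(5 + num_classes)
    if box is None:
        return y                      # Pc = 0; remaining "-" entries stored as 0
    bx, by, bw, bh = box
    y[0] = 1.0                        # Pc: an object is present
    y[1:5] = [bx, by, bw, bh]         # box center, width, height
    y[5 + class_index] = 1.0          # one-hot class entry
    return y

car = make_label(box=(0.5, 0.6, 0.4, 0.3), class_index=0)  # C1 = car
empty = make_label()                                        # no object in the image

print(car.tolist())    # [1.0, 0.5, 0.6, 0.4, 0.3, 1.0, 0.0]
print(empty.tolist())  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```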
What about multiple objects?

YOLO algorithm | YouTube


What about multiple objects?

Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?
Person’s object belongs to this cell

Pc 1
Bx 0.05
By 0.3
Bw 2
Bh 1.3
C1 1
C2 0

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?

Pc 1
Bx 0.32
By 0.02
Bw 2.2
Bh 1.7
C1 0
C2 1

C1 = dog class
C2 = person class

YOLO algorithm | YouTube


What about multiple objects?

All other cells (the full label is a 4x4x7 matrix)

Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -

C1 = dog class
C2 = person class

YOLO algorithm | YouTube
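
A sketch of how a 4x4x7 label could be assembled: each object is assigned to the grid cell containing its center, and every other cell keeps Pc = 0. The encoding here (center measured relative to the cell, width and height in units of cells, so they can exceed 1) is an assumption consistent with the values shown on the slides, and the object coordinates are made up.

```python
import numpy as np

S = 4             # the image is divided into an S x S grid
NUM_CLASSES = 2   # C1 = dog, C2 = person

def encode(objects, S=S, num_classes=NUM_CLASSES):
    """objects: (cx, cy, w, h, class_index), all in whole-image coordinates (0 to 1)."""
    target = np.zeros((S, S, 5 + num_classes))
    for cx, cy, w, h, cls in objects:
        col = min(int(cx * S), S - 1)       # cell that contains the box center
        row = min(int(cy * S), S - 1)
        target[row, col, 0] = 1.0           # Pc
        target[row, col, 1] = cx * S - col  # Bx, relative to the cell
        target[row, col, 2] = cy * S - row  # By, relative to the cell
        target[row, col, 3] = w * S         # Bw, in cell widths
        target[row, col, 4] = h * S         # Bh, in cell heights
        target[row, col, 5 + cls] = 1.0     # one-hot class
    return target

objects = [
    (0.30, 0.60, 0.50, 0.325, 0),   # made-up dog box
    (0.70, 0.40, 0.55, 0.425, 1),   # made-up person box
]
target = encode(objects)

print(target.shape)               # (4, 4, 7)
print(int(target[..., 0].sum()))  # 2 cells contain an object
```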


Training the YOLO Model

YOLO algorithm | YouTube


YOLO Prediction

YOLO algorithm | YouTube


Evaluating Image Detection Models
• Common Objects in Context
(COCO) dataset
• Published by Microsoft
• Used to evaluate the
performance of real-time
object detection algorithms
• 330,000 images
• 200,000 are labeled
• 1.5 million object instances
• 5 captions per image

COCO Dataset | viso.ai


Evaluating Image Detection Models

• Mean Average Precision (mAP)
• Benchmark metric used to
evaluate the robustness of
object detection models
• Incorporates mathematics
from:
• Error matrix (confusion matrix)
• Intersection over union (IoU)
ratio for bounding box


Understanding Confusion Matrix | Towards Data Science
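
The IoU ratio can be sketched as follows, assuming boxes are given as corner coordinates (x_min, y_min, x_max, y_max). That format and the example boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)   # zero when the boxes do not overlap
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
score = iou(predicted, ground_truth)
print(score)  # 1/7: overlap area 1, union area 7
```

An IoU of 1 means the predicted box matches the ground truth exactly, and 0 means no overlap. A threshold on IoU is typically what decides whether a detection counts as a true positive in the error matrix.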


Best Object Detection Models

Object Detection | viso.ai


YOLOv8

YOLOv8 Tutorial - Colaboratory (google.com)


YOLOv8

Ultralytics YOLOv8 | GitHub


ICTE
