Object Detection - Week 1 - Object Detection in 20 Years - Final

Chapter 1: Introduction
AI VIETNAM
All-in-One Course
Object Detection
Dr. Vinh Dinh Nguyen

Shaoxi Li et al. “Survey on Deep Learning-Based Marine Object Detection”, 2021

Tentative Calendar for Research Course (Feb, 2023)
How to write and publish a research paper in reputable international conferences and journals
What is this?
Application
Car and Pedestrian Detection
1. V. D. Nguyen et al., “Real-time Vehicle Detection Using an Effective Region Proposal-based Depth and 3-channel Pattern”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2019 (ISI Q1).
2. V. D. Nguyen et al., “Learning Framework for Robust Obstacle Detection, Recognition, and Tracking”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2017 (ISI Q1).
Our Experimental Results
Hyundai Driving Contest, South Korea 2017
1st rank in Software Algorithms Contest
Pedestrian, Car, Lane, Traffic Light and Traffic Sign Detection
DISPARITY ESTIMATION
Left image | Right image | Proposed | Existing
1. V. D. Nguyen et al., “Feature Engineering and Deep Learning for Stereo Matching Under Adverse Driving Conditions”, IEEE Transactions on Intelligent Transportation Systems, 2021 (ISI Q1).
2. V. D. Nguyen et al., “Robust Stereo Matching with Learning Strategy”, IEEE Transactions on Intelligent Transportation Systems (ITS), Vol. 18, No. 2, pp. 248-258, 2017 (ISI Q1).
TRAFFIC VIOLATION DETECTION
1. V. D. Nguyen et al., “Robust Traffic Light Detection and Classification Under Day and Night Conditions”, International Conference on Control, Automation and Systems, IEEE Xplore, 2020.
2. V. D. Nguyen et al., “Robust and Real-Time Obstacle Region Detection Based on Depth Feature for Vehicle Detection”, Advances in Intelligent Systems and Computing, Springer, 2020.
SOCIAL DISTANCING ALARM
1. V. D. Nguyen et al., “Robust Face Mask Detection Using Local Binary Pattern and Deep Learning”, Lecture Notes on Data Engineering and Communications Technologies, Springer, 2022.
OBSTACLE DETECTION UNDER HOSTILE DRIVING CONDITIONS
1. V. D. Nguyen et al., “Triangular Pattern-based Sigmoid Algorithm for A Robust Raspberry Pi-based Autonomous Driving System under Various Driving Conditions”, IEEE Transactions on Intelligent Transportation Systems, under minor revision, 2023.
2. V. D. Nguyen et al., “A Deep Learning Framework for Robust and Real-Time Taillight Detection Under Various Road Conditions”, IEEE Transactions on Intelligent Transportation Systems, 2022 (ISI Q1).
3. V. D. Nguyen et al., “Local Tetra Pattern and Its Benefits to Improve the Performance of Car and Pedestrian Detection Under Hostile Conditions”, International Conference on Control, Automation and Systems, IEEE Xplore, 2021.
Blueberry Leaves Diseases Classification
(Joint research with the Department of Plant Physiology-Biochemistry, College of Agriculture, Can Tho University)
1. V. D. Nguyen et al., “Robust Plant Leaves Diseases Classification Using EfficientNet and Residual Block”, World Conference on Information Systems for Business Management, 2022 (Scopus Q3).
SKIN DISEASE DETECTION
EIU Student’s Project (under development)
1. V. D. Nguyen et al., “Mobile Application for Robust Skin Cancer Detection-based Deep Learning Model”, Eastern International University Scientific Research Conference (EIUSRC 2022), accepted.
2. V. D. Nguyen et al., “Efficient Deep Learning Model for Skin Disease Detection and Classification”, in preparation for submission to IEEE Transactions on Medical Imaging (ISI Q1).
The objective of object detection is to develop computational models and techniques that provide one of the most basic pieces of information needed by computer vision applications:
What objects are where?
Traditional Object Detection
What kind of patches?
Planes
Edges
Corners
Corner Detection
Traditional Object Detection
Viola-Jones Detectors
This algorithm was developed by Paul Viola and Michael Jones in 2001.
It achieved real-time detection of human faces for the first time without any constraints (e.g., skin color segmentation).
The VJ detector follows the most straightforward way of detection, i.e., sliding windows: it goes through all possible locations and scales in an image to see if any window contains a human face.
The VJ detector dramatically improved detection speed by incorporating three important techniques: “integral image”, “feature selection”, and “detection cascades”.
Integral Image
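The integral image (summed-area table) is what lets the VJ detector evaluate a rectangular Haar-like feature with a handful of lookups instead of summing every pixel. The following is a minimal pure-Python sketch of that idea; the 4x4 image and the two-rectangle feature are hypothetical values chosen only for illustration, not the detector's actual features.

```python
def integral_image(img):
    """Return the summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Hypothetical 4x4 image, used only for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
table = integral_image(image)

# A two-rectangle Haar-like feature: left half minus right half.
left_half = box_sum(table, 0, 0, 4, 2)
right_half = box_sum(table, 0, 2, 4, 4)
haar_feature = left_half - right_half
print(haar_feature)
```

Once the table is built, every rectangle sum costs four lookups regardless of the rectangle size, which is why scanning many windows and scales stays cheap.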
RANDOM FOREST
TREES ARE CREATED INDEPENDENTLY
AdaBoost: FOREST OF STUMPS
Stumps are not equally weighted in the final decision.
Stumps that make more errors contribute less to the final decision.
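A small sketch can make the weighting concrete. It assumes the standard AdaBoost formula for a stump's weight, 0.5 * ln((1 - error) / error), which the slide does not spell out; the stump predictions and error rates below are hypothetical.

```python
import math


def amount_of_say(total_error, eps=1e-10):
    """Weight of a stump in the final vote: 0.5 * ln((1 - error) / error)."""
    # Clamp to avoid division by zero or log(0) for perfect/useless stumps.
    total_error = min(max(total_error, eps), 1 - eps)
    return 0.5 * math.log((1 - total_error) / total_error)


def weighted_vote(stump_predictions, stump_errors):
    """Combine +1/-1 stump predictions, each weighted by its amount of say."""
    score = sum(amount_of_say(e) * p
                for p, e in zip(stump_predictions, stump_errors))
    return 1 if score >= 0 else -1


# Hypothetical stumps: two weak ones vote -1, one accurate stump votes +1.
prediction = weighted_vote([-1, -1, +1], [0.45, 0.45, 0.05])
print(prediction)
```

Here the single accurate stump outvotes the two weak ones, which a random forest with equal votes would not allow.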
Influence
THREE MAIN DIFFERENCES BETWEEN RANDOM FOREST AND ADABOOST
Traditional Object Detection
HOG + Linear SVM
Histogram of Oriented Gradients
An image gradient is a directional change in the intensity or color in an image. The gradient of the image is one of the fundamental building blocks in image processing. For example, the Canny edge detector uses the image gradient for edge detection.
Sobel Kernel
Code: https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
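As a rough illustration of what the Sobel kernel computes (this is not the notebook's code), the sketch below applies the two standard 3x3 Sobel kernels in pure Python and derives the gradient magnitude and an unsigned angle in [0, 180), the form HOG typically uses. The 5x5 test image is a hypothetical vertical edge.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical change


def correlate3x3(img, kernel, y, x):
    """Weighted sum of the 3x3 neighbourhood centred on (y, x)."""
    return sum(kernel[i][j] * img[y + i - 1][x + j - 1]
               for i in range(3) for j in range(3))


def sobel_gradient(img):
    """Return (magnitude, angle in degrees) maps; border pixels stay 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = correlate3x3(img, SOBEL_X, y, x)
            gy = correlate3x3(img, SOBEL_Y, y, x)
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang


# Hypothetical 5x5 image with a vertical edge: dark left, bright right.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
magnitude, angle = sobel_gradient(image)
print(magnitude[2])
```

The magnitude is large only next to the edge and zero in the flat bright region, and the angle of 0 degrees indicates a purely horizontal intensity change.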
Histogram of Oriented Gradients:
Step 1: Preprocessing
In the case of the HOG feature descriptor, the input image is of size 64 x 128 x 3 = 24576 and the output feature vector is of length 3780.
Step 2: Calculate the Gradient Images
Step 3: Calculate Histogram of Gradients in 8×8 cells
Step 4 : 16×16 Block Normalization
105x36x1 = 3780x1
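The 3780 figure can be checked with a few lines of arithmetic. The sketch below assumes the usual HOG settings that the slides do not state explicitly: 9 orientation bins per cell and a block stride of one cell (8 pixels).

```python
def hog_descriptor_shape(width=64, height=128, cell=8, block=16,
                         stride=8, bins=9):
    """Return (blocks_x, blocks_y, values_per_block, descriptor_length).

    bins=9 and stride=8 are assumed defaults, not values from the slides.
    """
    cells_per_block = (block // cell) ** 2          # 2x2 cells in a 16x16 block
    blocks_x = (width - block) // stride + 1        # block positions across
    blocks_y = (height - block) // stride + 1       # block positions down
    values_per_block = cells_per_block * bins       # 4 histograms of 9 bins
    length = blocks_x * blocks_y * values_per_block
    return blocks_x, blocks_y, values_per_block, length


shape = hog_descriptor_shape()
print(shape)
```

With these assumptions there are 7 x 15 = 105 overlapping blocks, each contributing 36 normalized values, which gives the 105 x 36 = 3780 vector.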
HOG + Linear SVM
The first key ingredient of HOG + Linear SVM is the use of image pyramids.
The second key ingredient we need is sliding windows.
Combined with image pyramids, sliding windows allow us to localize objects at different locations and multiple scales of the input image.
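The combination of the two ingredients can be sketched without any image library by working on sizes and coordinates only. The scale factor of 1.5, the 32-pixel step, and the 256x256 input are hypothetical choices; the 64x128 window matches the HOG input size mentioned earlier.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(64, 128)):
    """Yield successively smaller (width, height) pairs until below min_size."""
    while width >= min_size[0] and height >= min_size[1]:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, window=(64, 128), step=32):
    """Yield the top-left corner (x, y) of every window that fits."""
    win_w, win_h = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y


# Hypothetical 256x256 input image.
levels = list(pyramid_sizes(256, 256))
windows_per_level = [len(list(sliding_windows(w, h))) for w, h in levels]
print(levels, windows_per_level)
```

Because the window size is fixed, a window on a smaller pyramid level covers a larger part of the original image, which is how larger objects are found.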
The final key ingredient we need is non-maxima suppression.
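A minimal sketch of greedy non-maxima suppression follows. It relies on Intersection over Union (IoU) to decide when two boxes describe the same object. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 threshold and the sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: two overlapping boxes and one separate box.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.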
Measuring object detection accuracy with Intersection over Union (IoU)
Limitations of NMS
Soft NMS
The idea is very simple: “instead of completely removing the proposals with high IoU and high confidence, reduce the confidences of the proposals proportional to the IoU value”.
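The sketch below illustrates one common form of this idea, the linear decay variant, in which a box overlapping the current best box has its score multiplied by (1 - IoU). The thresholds and sample boxes are hypothetical, and other decay functions (for example Gaussian) are also used in practice.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, iou_threshold=0.3, score_threshold=0.001):
    """Return (index, rescored confidence) pairs, best first."""
    scores = list(scores)                 # work on a copy
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break                         # everything left is negligible
        keep.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                scores[i] *= 1.0 - overlap   # decay instead of deleting
    return keep


# Hypothetical detections: two overlapping boxes and one separate box.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
result = soft_nms(boxes, [0.9, 0.8, 0.7])
print(result)
```

Unlike hard NMS, the overlapping box is kept with a reduced score, so a genuinely separate but heavily overlapping object still has a chance of being reported.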
Most traditional object detection algorithms, such as Viola–Jones and Histogram of Oriented Gradients (HOG), rely on extracting handcrafted features like edges, corners, and gradients from the image and on classical machine learning algorithms.
For example, the Viola–Jones detector, one of the first object detectors, was designed only to detect frontal human faces and did not do well on faces turned sideways or tilted up or down.
Traditional computer vision approaches dominated until around 2010.
From 2012, a new era of convolutional neural networks started when AlexNet (an image classification network) won the ImageNet Visual Recognition challenge.
Then, in 2012, came a new era: a revolution that changed the game for computer vision entirely. AlexNet, a deep Convolutional Neural Network (CNN) architecture born out of the need to improve results on the ImageNet challenge, achieved a top-5 accuracy of 84.7% on the ImageNet LSVRC-2012 challenge, compared with 73.8% for the second-best entry.
ImageNet Dataset
CNN for Image Classification
https://poloclub.github.io/cnn-explainer/
LeNet (1998), AlexNet (2012)
AlexNet 2012
ZFNet in 2013
VGG in 2014
GoogLeNet in 2014
ResNet in 2015
SqueezeNet
ALEXNET IMPLEMENTATION
https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
Assignment
Implement object detection using the traditional approach
ResNet50
Code: https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
Girshick et al. (2014) showed how we could use convolutional features for object detection, introducing R-CNN (applying a CNN to region proposals). Since then, object detection has evolved at an unprecedented speed.
In the deep learning era, object detection can be grouped into two genres: “two-stage detection” and “one-stage detection”, where the former frames detection as a “coarse-to-fine” process while the latter frames it as “complete in one step”.
The single-stage and two-stage detector
Image classification vs. object detection
Classification Pipeline
Classification
Softmax Classification
Ideas for Localization using ConvNets
[Figure: bounding box labelled with the points (x0, y0), (x1, y1), (x2, y2), width w and height h]
BOUNDING BOX REGRESSION TRAINING
Ideas for Detection
Localization CNN -> Confidence scores, BBox
We know neither the number of objects nor the location of those objects.
Ideas for Detection – Sliding Window
Localization CNN -> Confidence scores, BBox
Ideas for Detection – Sliding Window + Image Pyramid
Sliding Window: location
Image Pyramid: scale (from smaller objects to larger objects)
Ideas for Detection using ConvNets
Crop + Resize with Sliding Window + Image Pyramid
Sliding Window: location
Image Pyramid: scale
AlexNet/VGG Conv and Pool layers act as feature extractors and produce feature maps.
Get class scores using Softmax.
Get bounding boxes (x1, y1, x2, y2) using L2 loss.
Image Credit - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/examples/index.html
For example, to process an 800x800 image with a sliding window of size 224 moved one pixel at a time, we end up with roughly (800 - 224)^2 = 331,776 crops.
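The crop count is simple arithmetic, sketched below. The slide's figure corresponds to (800 - 224)^2; counting the last position on each axis as well gives a slightly larger number. The stride of 32 is a hypothetical value added to show how quickly the count drops.

```python
def positions_per_axis(image_size, window, stride=1):
    """Number of window positions along one axis, including the last one."""
    return (image_size - window) // stride + 1


image_size, window = 800, 224

slide_estimate = (image_size - window) ** 2                  # figure on the slide
exact_stride_1 = positions_per_axis(image_size, window) ** 2
with_stride_32 = positions_per_axis(image_size, window, stride=32) ** 2

print(slide_estimate, exact_stride_1, with_stride_32)
```

Either way, hundreds of thousands of crops per image at a single scale is far too many to run through a CNN one by one, which motivates the fully convolutional approach.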
Problem: ConvNets input size constraints

Solution
Implement the Fully Connected layer operation as a convolution operation.
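A toy sketch of this equivalence: a fully connected layer over a flattened k x k feature map gives the same numbers as using each FC weight vector as a k x k filter. On a larger feature map the same filters simply produce one output per position. The weights and feature maps below are hypothetical single-channel examples.

```python
def fully_connected(feature_map, weights):
    """weights[n] is a flat list with one weight per feature-map value."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * v for w, v in zip(wn, flat)) for wn in weights]


def fc_as_conv(feature_map, weights, k):
    """Apply each FC weight vector as a k x k filter at every valid position."""
    h, w = len(feature_map), len(feature_map[0])
    outputs = []
    for wn in weights:
        kernel = [wn[r * k:(r + 1) * k] for r in range(k)]   # reshape to k x k
        outputs.append([[sum(kernel[i][j] * feature_map[y + i][x + j]
                             for i in range(k) for j in range(k))
                         for x in range(w - k + 1)]
                        for y in range(h - k + 1)])
    return outputs


# Hypothetical FC layer with 4 inputs and 2 outputs.
weights = [[1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]]
small = [[1, 2], [3, 4]]                    # the size the FC layer expects
large = [[1, 2, 0], [3, 4, 1], [0, 2, 5]]   # a larger input

fc_out = fully_connected(small, weights)
conv_small = fc_as_conv(small, weights, k=2)   # 1x1 output per filter
conv_large = fc_as_conv(large, weights, k=2)   # 2x2 spatial output per filter
print(fc_out, conv_large)
```

The FC layer cannot accept the larger input at all, while the convolutional form accepts it and returns a 2x2 spatial output whose top-left value equals the FC result on the top-left 2x2 patch.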
Problem: ConvNets input size constraints – FC as Conv
[Figure: Image -> Weights/Filter -> Feature Maps -> Pool -> Pooled Feature Maps -> FV -> FC Layers, drawn for two zero-padded inputs of different sizes, with the labels H and V]
1. Does this make sense?

2. If so, what does this mean?
Receptive Field
4x4 -> (2x2 Pool, Stride = 2) -> 2x2 -> (2x2 Pool, Stride = 2) -> 1x1
8x8 -> (2x2 Pool, Stride = 2) -> 4x4 -> (2x2 Pool, Stride = 2) -> 2x2
Every value in the output encodes information from some 4x4 patch of the image.
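The 4x4 figure follows from the standard receptive-field recurrence, sketched below for a stack of (kernel, stride) layers without padding. The helper names are illustrative only.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Return (receptive field, jump).

    'jump' is the distance in input pixels between neighbouring outputs.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


def output_size(input_size, layers):
    """Spatial size after applying every layer with no padding."""
    size = input_size
    for kernel, stride in layers:
        size = (size - kernel) // stride + 1
    return size


two_pools = [(2, 2), (2, 2)]      # two 2x2 pools with stride 2
rf, jump = receptive_field(two_pools)
print(rf, jump, output_size(4, two_pools), output_size(8, two_pools))
```

Each output value sees a 4x4 patch and neighbouring outputs are 4 pixels apart, so an 8x8 input yields a 2x2 output covering four non-overlapping 4x4 patches.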
ConvNets input size constraints
[Figure: the same localization CNN (Image -> Weights/Filter -> Feature Maps -> Pooled Feature Maps -> FC Layers) applied to a smaller and a larger zero-padded input; the larger input yields a spatial output]
1. Does this make sense? -> Yes.
2. If so, what does this mean? -> The spatial output represents the computations on different portions of the image.
Spatial Output as Sliding Window
ConvNets and Sliding Window Efficiency
[Figure: the localization CNN produces confidence scores and a BBox for each sliding-window position]
ConvNets and Sliding Window Efficiency
Example: a 3x3 filter whose three rows are all (1 0 -1), slid over images in which every row holds the same values.
8x8 input, each row = 0 0 255 255 0 0 255 255 -> 6x6 output, each row = -765 -765 765 765 -765 -765
10x10 input, each row = 0 0 255 255 0 0 255 255 0 0 -> 8x8 output, each row = -765 -765 765 765 -765 -765 765 765
8x8 input, each row = 255 255 0 0 255 255 0 0 -> 6x6 output, each row = 765 765 -765 -765 765 765
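The numbers in this example can be reproduced with a small pure-Python sketch. It also checks the efficiency argument: filtering the 10x10 image once gives the same values as filtering 8x8 crops of it separately. The helper name is illustrative, and the operation is cross-correlation (no kernel flip), as in most deep learning frameworks.

```python
def correlate2d_valid(img, kernel):
    """Slide the kernel over the image without padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


kernel = [[1, 0, -1] for _ in range(3)]
row10 = [0, 0, 255, 255, 0, 0, 255, 255, 0, 0]
big = [row10[:] for _ in range(10)]          # 10x10 image

left_crop = [r[:8] for r in big[:8]]         # 8x8 window at offset (0, 0)
right_crop = [r[2:] for r in big[2:]]        # 8x8 window at offset (2, 2)

out_big = correlate2d_valid(big, kernel)     # 8x8, computed once
out_left = correlate2d_valid(left_crop, kernel)
out_right = correlate2d_valid(right_crop, kernel)
print(out_big[0])
```

Each 6x6 result for a crop is just a sub-block of the 8x8 result for the full image, so running the network once on the larger input shares all the overlapping computation.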
Spatial Output for Image Pyramids
With spatial outputs, we can detect different objects at different locations of the image. The figure shows a 2x3 spatial output for a sample image.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Overfeat
Sliding Window Crop -> FC as Conv (No input size constraint) + Spatial Output + Image Pyramid
Resolution = 36. How to modify localization framework to convert FC as Conv?
Input size    Spatial output
245x245       1x1
281x317       2x3
317x389       3x5
389x461       5x7
425x497       6x7
461x569       7x10
Smaller inputs suit larger objects; larger inputs suit smaller objects.
To detect even smaller objects, use an even bigger image pyramid. The trade-off is an increase in computation.
Intuition behind OverFeat Network
1. Use the same localization network, without taking sliding-window crops at different locations.
2. With no input size constraint, we are able to use image pyramids.
3. With image pyramids, we get a spatial output, which gives us detections at different locations of the image.
4. Because the entire network uses convolution operations, it is far more efficient than taking crops.
This network won the ImageNet 2013 localization task (ILSVRC2013) and obtained very competitive results for the detection and classification tasks.
Resolution
Overfeat - Classification
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks – Sermanet et al
Images Credit – Overfeat paper & https://towardsdatascience.com/object-localization-in-overfeat-5bb2f7328b62
Let’s look at the FC layers in detail
But how would you change the design to get a depth of 4096 at the 2nd and 3rd FC layers?
N layer Conv – M Feature Maps

Overfeat
Fully Connected layer implemented as a convolution layer
[Figure: Overfeat detection network. A 245x245 input passes through the first 5 layers of a modified AlexNet (Conv + Pool layers), giving a 5x5 feature map with 256 channels. 4096 filters of size 5x5x256 give a 1x1x4096 output, 4096 filters of size 1x1x4096 give another 1x1x4096 output, and C filters of size 1x1x4096 give the final 1x1 output for C classes.]
Problem of Multiple Detections
281x317
Non Max Suppression
Spatial Output
Softmax instead of SVM

Human detection as an example
Results
Won the ImageNet Localization challenge in 2013.
Spatial output   3x denser per side   Rows   Cols   Rows x Cols
1x1xC            3x3xC                3      3      9
2x3xC            6x9xC                6      9      54
3x5xC            9x15xC               9      15     135
5x7xC            15x21xC              15     21     315
6x7xC            18x21xC              18     21     378
7x10xC           21x30xC              21     30     630
Total                                                1521
1521 x 21 = 31941
Model Size
Model Sizes - AlexNet
First conv layer: 11*11*3*96 = 34,848 weights (~34KB)
First FC layer: 13*13*256*4096 = 177,209,344 weights (~177MB)
The approximate sizes count one byte per weight.
Conv layers:
Input channels   Conv filter   Output channels   Total weights   Approx
3                11x11         96                34848           34KB
96               5x5           256               614400          600KB
256              3x3           384               884736          900KB
384              3x3           384               1327104         1.3MB
384              3x3           256               884736          900KB
Total (5 conv layers)                            3745824         3.7MB
FC layers:
FC input   FC output   Total weights   Approx
43264      4096        177209344       177MB
4096       4096        16777216        16MB
4096       1000        4096000         4MB
Total                  198082560       ~198MB
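The weight counts in the AlexNet table can be recomputed with the short sketch below. Like the table, it ignores bias terms; the layer lists are copied from the rows above and the helper names are illustrative.

```python
def conv_weights(in_channels, kernel, out_channels):
    """Weights in a conv layer with square kernels, ignoring biases."""
    return in_channels * kernel * kernel * out_channels


def fc_weights(inputs, outputs):
    """Weights in a fully connected layer, ignoring biases."""
    return inputs * outputs


# (input channels, kernel size, output channels), as listed in the table.
alexnet_conv = [(3, 11, 96), (96, 5, 256), (256, 3, 384),
                (384, 3, 384), (384, 3, 256)]
# (inputs, outputs); the first FC input is the 13x13x256 feature map.
alexnet_fc = [(13 * 13 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_weights(*layer) for layer in alexnet_conv)
fc_total = sum(fc_weights(*layer) for layer in alexnet_fc)
print(conv_total, fc_total, round(fc_total / conv_total, 1))
```

The three FC layers hold roughly 50 times more weights than the five conv layers combined, which is the point the table makes about where the model size comes from.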
Model Sizes - VGGNet
Conv layers:
Input channels   Conv filter   Output channels   Total weights   Approx
3                3x3           64                1728            1.7KB
64               3x3           64                36864           36KB
64               3x3           128               73728           73KB
128              3x3           128               147456          150KB
128              3x3           256               294912          300KB
256              3x3           256               589824          600KB
256              3x3           256               589824          600KB
256              3x3           512               1179648         1.2MB
512              3x3           512               2359296         2.4MB
512              3x3           512               2359296         2.4MB
512              3x3           512               2359296         2.4MB
512              3x3           512               2359296         2.4MB
512              3x3           512               2359296         2.4MB
Total (13 conv layers)                           14710464        14MB
FC layers:
FC input   FC output   Total weights   Approx
25088      4096        102760448       102MB
4096       4096        16777216        16MB
4096       1000        4096000         4MB
Total                  123633664       123MB
• You can increase the depth of your CNN without significantly increasing model size.
• But even for a 3-layer FC network, you need significant memory for weights.
• How can we do classification/BBox regression without significantly increasing model size?
LIMITATIONS of CNN
LIMITATIONS
Considering candidates on Object Regions only?
REGION PROPOSAL METHODS
Xiangteng He, Yuxin Peng, Junjie Zhao, “Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN”, 2023
Edge Density
Segmentation techniques
Region Proposal Method Comparisons
Jan Hosang, Rodrigo Benenson, Bernt Schiele, “How good are detection proposals, really?”, 2014
EDGE BOXES
Zitnick, C.L., Dollár, P. (2014). Edge Boxes: Locating Object Proposals from Edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_26
R-CNN
The R-CNN was described in the 2014 paper by Ross Girshick et al. from UC Berkeley, titled “Rich feature hierarchies for accurate object detection and semantic segmentation.”
Region Proposals
Selective Search
CNN Model
RCNN - Region proposals with CNNs
[Figure: Selective Search produces 2000 regions; each region is cropped and warped to 227x227 and passed through the localization CNN (AlexNet/VGG). The 6x6x256 = 9216 feature map feeds fc6 (4096) and fc7 (4096). The C class scores, first drawn with Softmax, are obtained with a linear SVM per class, and bounding boxes (x1, y1, x2, y2) are obtained per class.]
Rich feature hierarchies for accurate object detection and semantic segmentation - Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
RCNN
3 Stage Training
Stage 1 (Classical CV): SS/EB produces 2000 region proposals.
Stage 2 (CNN): AlexNet/VGG, pre-trained on ImageNet and fine-tuned on region proposals using log loss (training only).
Get class scores from the fc6/fc7 features using a linear SVM per class.
Get bounding boxes, per class (x1, y1, x2, y2).
VOC 07 results with AlexNet features:
Layer   W/O FT   W FT
pool5   44.2     47.3
fc6     46.2     53.1
fc7     44.7     54.2
• Before finetune: 44%
• After finetune: 54%
• Adding bounding box regression: 58%
• VGG: 66%
RCNN
[Figure: the same pipeline. Stage 1 (Classical CV): SS/EB, 2000 region proposals. Stage 2 (CNN): AlexNet/VGG pre-trained on ImageNet and fine-tuned on region proposals using log loss (training only); class scores using SVM; bounding boxes per class (x1, y1, x2, y2).]
• Why don’t we need the sliding window and image pyramid?
• Didn’t we end up with too many inputs to the localization network?
Results
Assignment
https://pyimagesearch.com/2020/07/13/r-cnn-object-detection-with-keras-tensorflow-and-deep-learning/
Fast R-CNN
Fast R-CNN is proposed as a single model, instead of a pipeline, that learns and outputs regions and classifications directly.
https://arxiv.org/abs/1504.08083
Histograms of Images
[Figure: image histogram; axis ticks 0, 150, 255; counts 72 and 48]
Histograms of Images - Bins
Pixel values: 0 25 150 175 255 225
Codebook:
Range     Bin   Count
0-49      0     48
50-99     1     0
100-149   2     0
150-199   3     48
200-255   4     48
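The binning step can be sketched in a few lines. The bin ranges follow the codebook above; the pixel list is a hypothetical image in which each of the six values occurs 24 times, chosen only so that the counts match the table.

```python
def bin_index(value, bin_width=50, num_bins=5):
    """Map a pixel value in 0..255 to its codebook bin (last bin is 200-255)."""
    return min(value // bin_width, num_bins - 1)


def histogram(pixels, bin_width=50, num_bins=5):
    """Count how many pixels fall into each bin."""
    counts = [0] * num_bins
    for value in pixels:
        counts[bin_index(value, bin_width, num_bins)] += 1
    return counts


# Hypothetical image: each of the six values appears 24 times.
pixels = []
for value in (0, 25, 150, 175, 255, 225):
    pixels.extend([value] * 24)

counts = histogram(pixels)
print(counts)
```

Two different pixel values that fall in the same range end up in the same bin, which is the same quantization idea a visual codebook applies to feature descriptors.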
Histogram Examples
Bag of Visual Words
Generate HOG/SIFT Feature Descriptors
Bag of Visual Words – K Means Clustering
[Figure: feature descriptors are grouped by K Means Clustering into a codebook with entries 0 to 5]
BOW – Codebook generation for different textures
Image Credit - https://littlecheesecake.wordpress.com/2013/04/24/research-bag-of-features-for-visual-recognition/

BOW – Example – Codebook generation for Faces
Slide Credit - http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf
BOW - Examples
Slide Credit - http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf
Spatial Pyramid Matching
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories - Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
Image Credit - https://homepages.inf.ed.ac.uk/rbf/HIPR2/translte.htm

[Figure: spatial pyramid over feature maps of size 8x8, 8x12, 12x8 and 12x12; the levels give 3x1 = 3, 3x4 = 12 and 3x16 = 48 values, concatenated into 3x21]
Classification & Localization
[Figure: Alex/VGG up to the last pool layer produces feature maps with 256 channels; SPP feeds two 4096 FC layers. One head gets C class scores using Softmax; the other gets 4xC bounding box values (x1, y1, x2, y2) using L2 loss.]
Replace last pooling layer by SPP
Spatial Pyramid Pooling
• Identifying features
• K-means clustering
• Codebooks
• Histograms
Just Max-Pool
8x8 Feature Maps -> 1x1 + 4x1 + 16x1 = 21x1
SPP for Feature Maps outputs
With 256 feature maps: 1x256 + 4x256 + 16x256 = 21x256
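A pure-Python sketch of this pooling shows why the output length is fixed. It assumes max pooling over a 1x1, 2x2 and 4x4 grid (1 + 4 + 16 = 21 bins per channel); the way bin boundaries are rounded and the synthetic feature maps are illustrative choices, not the paper's exact implementation.

```python
import math


def spp_channel(fmap, levels=(1, 2, 4)):
    """Max-pool one 2D feature map over an n x n grid for each level."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for by in range(n):
            y0, y1 = (by * h) // n, math.ceil((by + 1) * h / n)
            for bx in range(n):
                x0, x1 = (bx * w) // n, math.ceil((bx + 1) * w / n)
                pooled.append(max(fmap[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled


def spp(feature_maps, levels=(1, 2, 4)):
    """Concatenate the pooled bins of every channel into one vector."""
    out = []
    for fmap in feature_maps:
        out.extend(spp_channel(fmap, levels))
    return out


def make_maps(channels, h, w):
    """Synthetic feature maps used only to exercise the function."""
    return [[[c + y * w + x for x in range(w)] for y in range(h)]
            for c in range(channels)]


sizes = [(8, 8), (8, 12), (12, 8), (12, 12)]
lengths = [len(spp(make_maps(256, h, w))) for h, w in sizes]
print(lengths)
```

Every input size produces the same 21 x 256 = 5376 values, so the FC layers that follow always receive a vector of the same length.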
Any Size and Aspect Ratio
[Figure: feature maps of size 8x8, 8x12, 12x8 and 12x12 are each pooled to 1x1 + 4x1 + 16x1 = 21x1]
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition - Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
SPPNet = SPP + Overfeat for Classification
ILSVRC 2014: ranked #3 in image classification
[Figure: a 245x245 input passes through Alex/VGG up to the last pool layer (feature maps with 256 channels); SPP is followed by two 1x1x4096 layers. Outputs: 1x1xC class scores using Softmax and 1x1x4xC bounding boxes (x1, y1, x2, y2) using L2 loss.]
Replace last pooling layer by SPP
RCNN – 2 Stage Network
[Figure: SS/EB gives 2000 region proposals; AlexNet/VGG is pre-trained on ImageNet and fine-tuned on region proposals using log loss (training only); class scores are obtained using SVM, and bounding boxes per class (x1, y1, x2, y2) using L2 loss.]
SPP – 2 Stage Network
[Figure: SS/EB region proposals (Region Of Interest proposals, ROI proposals) are applied to the feature maps of Alex/VGG (pre-trained on ImageNet), with SPP in place of the last pool layer; fine-tuning uses log loss (training only); class scores are obtained using SVM, and bounding boxes per class (x1, y1, x2, y2) using L2 loss.]
1. How do you translate ROI proposals onto the Feature Maps?
2. How do you pool the ROI proposals from the Feature Map?
3. How to train the BBox regressor?
Subsampling Ratio
1. How do you translate ROI proposals onto the Feature Maps
3x3 Pool 3x3 Pool 2x2 Pool

Stride = 3 Stride = 3 Stride = 2
2x2 1x1
6x6
18x18
3x3 Pool 3x3 Pool 2x2 Pool

Stride = 3 Stride = 1 Stride = 2
3x3
6x6 5x5
18x18
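ROI Projection
1. How do you translate ROI proposals onto the Feature Maps?
[Figure: a 688x920 image with origin (0, 0) and an SS/EB region proposal (ROI) of size 320x128 at (340, 450). With a subsampling ratio of 1/16, the image maps to a 43x58 feature map and the ROI to a 20x8 region at (21, 28), which goes through SPP to the classifier and the BBox regressor.]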
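The projection is a division by the subsampling ratio. The sketch below reproduces the numbers in the figure under a simple assumed rounding rule (positions rounded down, sizes rounded up); real implementations differ in how they round, so treat the rule and the function names as illustrative.

```python
import math


def feature_map_size(width, height, ratio=16):
    """Size of the feature map for an image, rounding up."""
    return math.ceil(width / ratio), math.ceil(height / ratio)


def project_roi(x, y, w, h, ratio=16):
    """Map an ROI from image coordinates to feature-map coordinates.

    Assumed rounding: positions are floored, sizes are rounded up.
    """
    return (x // ratio, y // ratio,
            math.ceil(w / ratio), math.ceil(h / ratio))


fmap = feature_map_size(688, 920)
roi_on_fmap = project_roi(340, 450, 320, 128)
print(fmap, roi_on_fmap)
```

Because only coordinates are scaled, the convolutional features are computed once for the whole image and every ROI just selects a window of that shared feature map.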

AlexNet Subsampling Ratio
2. How do you pool the ROI proposals from the Feature Map?
13x13 Feature Maps
SPP on Region Proposals
688 -> 43x58 Feature Map
SPP for Feature Maps outputs
With 256 feature maps: 1x256 + 4x256 + 16x256 = 21x256
Any Size and Aspect Ratio, Overlapping ROIs
SPP on Region Proposals

In Practice - {6x6, 3x3, 2x2, 1x1}
2. How do you pool the ROI proposals from the Feature Map
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition - Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
Feature Maps Sizes
224x224
Image scale   Feature map size
480           30x40
576           36x48
688           43x58
864           54x72
1200          75x100
BBox Regression Training
3. How to train the BBox regressor?
[Figure: features from the higher layers of the ConvNet feed a BBox regressor that outputs (dx, dy, dw, dh) for an ROI in image coordinates with origin (0, 0)]
ROI centre/width/height: (x, y, w, h)
BBox regressor output: (dx, dy, dw, dh)
Ground truth: (xg, yg, wg, hg)
Training drives $(x + dx - x_g)^2$ towards $0$.
SPP – 2 Stage Network - Inference
[Figure: SS/EB region proposals; Alex/VGG pre-trained on ImageNet with SPP in place of the last pool layer; fine-tuning with log loss (training only); class scores using SVM; bounding boxes per class (x1, y1, x2, y2) using L2 loss]
2000 computations - Old: 0.7s; New: 0.9s
Just 1 computation - Old: 9s; New: 0.3s
ILSVRC 2014: ranked #2 in object detection
RCNN -> SPPNet -> Fast RCNN
3 Stage to 1 Stage
[Figure: Fast RCNN. SS/EB region proposals; Alex/VGG pre-trained on ImageNet and fine-tuned from conv3 (labels: Basic Shapes, Complex Shapes); single scale, no image pyramid; ROI pooling, a 1-level SPP, in place of the last pool layer; FC + Softmax with log loss for classification in place of the linear SVM per class; bounding boxes per class (x, y, h, w) with Smooth L1 loss in place of L2 loss.]
Smooth L1 loss:
$$\mathrm{smooth}_{L1}(a, b) = \begin{cases} (a-b)^2 / 2, & \text{if } |a-b| < 1 \\ |a-b| - 1/2, & \text{otherwise} \end{cases}$$
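The piecewise definition translates directly into code. The sketch below sums the loss over the four box values; the predicted and target values are hypothetical.

```python
def smooth_l1(a, b):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(a - b)
    return 0.5 * diff * diff if diff < 1 else diff - 0.5


def bbox_loss(predicted, target):
    """Sum of Smooth L1 over the box values, e.g. (x, y, h, w)."""
    return sum(smooth_l1(p, t) for p, t in zip(predicted, target))


# Hypothetical regression outputs against a zero target.
loss = bbox_loss((0.5, 3.0, 0.0, -1.0), (0.0, 0.0, 0.0, 0.0))
print(loss)
```

Compared with L2 loss, a large error such as 3.0 contributes 2.5 instead of 4.5, so outlier boxes pull on the gradients less strongly.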
Role of Region Proposals
CNN
Region Proposal: RCNN, SPPNet, Fast RCNN
Dense Sampling: Overfeat
They conclude: “Of course, there may exist yet undiscovered techniques that allow dense boxes to perform as well as sparse proposals”.
Good Looking Fast RCNN ☺
[Figure: SS/EB proposals; pretrained Alex/VGG up to the last pool layer; ROI POOL; log loss + Softmax for classification; bounding boxes per class (x, y, h, w) using Smooth L1 loss]
Faster R-CNN
AI VIETNAM
All-in-One Course
217
Faster R-CNN
AI VIETNAM
All-in-One Course
218
Criteria for replacing SS

• < 2000 Region Proposals
• As fast as SS or better
• As accurate as SS or better
• Should be able to propose overlapping ROIs with different aspect ratios and scales
[Diagram: Fast RCNN pipeline; SS/EB -> pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

Criteria for replacing SS

Should be able to propose:

• Overlapping ROIs
• Aspect Ratios
• Scales

Fast RCNN + Sliding Window + Image Pyramid
Dense Sampling
[Diagram: dense sampling -> pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, w, h)]

Fast RCNN + Feature Pyramid
Dense Sampling With Feature Pyramid
[Diagram: dense sampling -> pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, w, h) using Smooth L1 loss]
• At least 40 x 60 x 9 = 21,600 (~20,000) proposals -> time consuming.
• Backpropagating through that many proposals is difficult and time consuming.

Fast RCNN + Neural Network?
OverFeat
Simple CNN[-Classifier-NMS]
BBox Regressor
[Diagram: Fast RCNN pipeline; pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

Fast RCNN + RPN

[Diagram: RPN on pretrained Alex/VGG (up to last pool) -> proposals -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]
https://www.youtube.com/watch?v=po59qI5LJGU&list=PL1GQaVhO4f_jLxOokW7CS5kY_J1t1T17S&index=79

Ideas for Localization using ConvNets

Case #1 – Only one object per image
[Figure: 800x600 image (corners 0,0; 800,0; 0,600) with one box: top-left (x1, y1), bottom-right (x2, y2), centre (x0, y0), width w, height h]
AlexNet/VGG etc. feeds two heads:
• Get class scores using Softmax: Human, Car, Dog, Cat, Bicycle, etc.
• Get bounding boxes using L2 loss: (x1, y1, x2, y2)
Image Credit - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/examples/index.html

BBox Regression - Relative
[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg -> dx, dy, dh, dw, relative to a Reference Box x, y, h, w]

                               x     y     h     w
ROI reference                  160   240   150   150
Bbox deltas (dx, dy, dh, dw)   18    -22   -30   -125
Predicted                      178   218   120   25
Expected                       180   220   120   30
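The table on this slide uses a purely additive scheme: predicted = reference + deltas. A minimal sketch with the slide's own numbers follows; the helper names are illustrative, and the second function shows which deltas would have reproduced the expected box exactly.

```python
def apply_deltas(reference, deltas):
    """Additive scheme from the slide: predicted = reference + delta."""
    return tuple(r + d for r, d in zip(reference, deltas))

def target_deltas(reference, expected):
    """Deltas the regressor should output to hit the expected box exactly."""
    return tuple(e - r for r, e in zip(reference, expected))

reference = (160, 240, 150, 150)   # x, y, h, w of the ROI reference
deltas = (18, -22, -30, -125)      # dx, dy, dh, dw from the regressor
expected = (180, 220, 120, 30)     # ground-truth box

predicted = apply_deltas(reference, deltas)
ideal = target_deltas(reference, expected)
print(predicted)
print(ideal)
```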

Sliding Window as Reference Box

[Figure: image axes (x, y) with origin at 0,0; the sliding window acts as the Reference Box x, y, w, h; Higher Layers/ConvNets -> BBox Reg -> dx, dy, dw, dh. The following frames repeat the figure with several BBox Reg outputs.]

Reference Boxes of fixed sizes

[Figure: image axes (x, y) with origin at 0,0; one regressor per reference box shape]
• BBox Reg 1:1
• BBox Reg 1:2
• BBox Reg 2:1

Square ROI to Rectangular Proposal

[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg 2:1; Tall Box Centre/W/H (x, y, w, h) + deltas (dx, dy, dw, dh) -> proposal (x, y, w, h)]

                               x     y     h     w
ROI reference                  160   240   130   65
Bbox deltas (dx, dy, dh, dw)   18    -22   -10   -32
Predicted                      178   218   120   33
Expected                       180   220   120   30
Note: This is not the exact formula used in the Faster RCNN paper. I just used this for simpler explanation. Please see the paper for details.
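For comparison with the simplified additive scheme, the Faster R-CNN paper parameterizes the regression targets relative to the anchor: centre offsets are divided by the anchor's width and height, and sizes are regressed in log space. The sketch below encodes and decodes the tall reference box and the expected box from this slide; the function names and the (x, y, w, h) tuple order are choices made for this example.

```python
import math

def encode(anchor, box):
    """Targets t = (tx, ty, tw, th) of a box relative to an anchor."""
    xa, ya, wa, ha = anchor
    x, y, w, h = box
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(anchor, t):
    """Inverse of encode: recover the box from anchor and targets."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = t
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (160.0, 240.0, 65.0, 130.0)   # x, y, w, h of the tall reference box
gt = (180.0, 220.0, 30.0, 120.0)       # expected box
t = encode(anchor, gt)
recovered = decode(anchor, t)
print(t)
print(recovered)
```

Dividing by the anchor size makes the targets comparable across anchor scales, and the log keeps predicted widths and heights positive.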

[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg 1:1; Square Box Centre/W/H (x, y, w, h)]
[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg 1:2; Wide Box Centre/W/H (x, y, w, h)]

Multiple BBox Reg using Reference Boxes

[Figure: 60x40 grid with nine BBox Reg outputs; three are labelled 1:1, 1:2, 2:1]
• These boxes are called Anchor Boxes
• This is different from Feature Pyramid

Bigger Objects?
[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg; SW Centre/W/H vs Ground Truth; Tall Box Centre/W/H (x, y, w, h); regressors BBox Reg 1:1, 1:2, 2:1]
Note: This is not the exact formula used in the Faster RCNN paper. I just used this for simpler explanation. Please see the paper for details.

Multiple BBox Reg using Anchor Boxes

Feature map: 60x40
Aspect ratios: 1:1, 1:2, 2:1
Scales: 128sq, 256sq, 512sq
3 ratios x 3 scales = 9 BBox Reg per position
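The nine anchor shapes and the total anchor count for a 60x40 feature map can be sketched as follows. Several details are assumptions made for this example rather than taken from the slide: ratios are read as height:width (matching the tall 2:1 and wide 1:2 boxes in the figures), each anchor keeps an area of scale squared, the feature map is 60 wide by 40 high, the stride is 16, and anchors are centred on each cell.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return (w, h) for every scale/ratio pair; ratio = h / w, area = scale**2."""
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return shapes

def all_anchors(fm_w=60, fm_h=40, stride=16):
    """Place every base anchor at the centre of each feature-map cell."""
    shapes = base_anchors()
    anchors = []
    for j in range(fm_h):
        for i in range(fm_w):
            cx, cy = (i + 0.5) * stride, (j + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx, cy, w, h))
    return anchors

shapes = base_anchors()
anchors = all_anchors()
print(len(shapes), len(anchors))
```

The total, 60 x 40 x 9, is the same dense count that the criteria for replacing SS flag as too many, which is why the proposals have to be scored and filtered afterwards.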

Fast RCNN + RPN

[Diagram: RPN (x9 anchor boxes) on pretrained Alex/VGG (up to last pool) -> proposals -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

How to reduce number of proposals?

[Figure: image axes (x, y) with origin at 0,0; Higher Layers/ConvNets -> BBox Reg (dx, dy, dw, dh) + Classifier (FG / BG)]
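One way to use the FG/BG classifier to cut down the proposals is to rank boxes by foreground score, keep the top few, and drop near-duplicates with non-maximum suppression. The sketch below illustrates that idea; the function names, the `top_k` value, and the NMS threshold are illustrative choices, not values given on the slide.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_proposals(boxes, fg_scores, top_k=2000, nms_thresh=0.7):
    """Keep the top_k boxes by foreground score, then apply greedy NMS."""
    order = sorted(range(len(boxes)), key=lambda i: fg_scores[i], reverse=True)
    keep = []
    for i in order[:top_k]:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filter_proposals(boxes, scores, top_k=3)
print(kept)
```

Here the second box overlaps the first too heavily and is suppressed, and the low-scoring fourth box never makes the top-k cut.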

Faster RCNN – Training Anchor Boxes - Labelling

[Figure: image axes (x, y) with origin at 0,0; anchor boxes for BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1]
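Anchor labelling for RPN training is usually done by overlap with the ground truth. The sketch below uses the IoU thresholds from the Faster R-CNN paper (above 0.7 is foreground, below 0.3 is background, anything in between is ignored); the slide itself does not state them. It is a simplification: the paper also marks the highest-IoU anchor for each ground-truth box as foreground, which is omitted here.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) per anchor."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

gt_boxes = [(100, 100, 200, 200)]
anchors = [(100, 100, 200, 200), (120, 120, 220, 220),
           (300, 300, 400, 400), (110, 100, 210, 200)]
labels = label_anchors(anchors, gt_boxes)
print(labels)
```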

Fast RCNN + RPN

[Diagram: RPN (x9 anchor boxes, FG/BG) on pretrained Alex/VGG (up to last pool) -> proposals -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]
Criterion: As fast as SS or better

Fast RCNN + RPN = Faster RCNN

[Diagram: pretrained Alex/VGG (up to last pool) -> RPN (3x3 conv, then two 1x1 convs: FG/BG and x9 box outputs; dense sampling) -> sparse sampling -> ROI POOL -> classification and per-class boxes (x, y, h, w); the Fast RCNN part is unchanged]
Open questions:
• Different image sizes?
• As fast as SS or better?
• As accurate as SS or better?

Faster & Accurate than SS?
[Table: time in ms and mAP; only the mAP values 66.9 and 69.9 survive]

Faster RCNN - Training

[Diagram: RPN (3x3 conv, then 1x1 FG/BG and 1x1 x9 box outputs) on pretrained Alex/VGG (up to last pool) -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]
Unshared:
1. Train RPN using ConvNet1
2. Train Fast-RCNN using ConvNet2 & RPN Proposals
Shared:
3. Fine-Tune RPN using ConvNet2
4. Fine-Tune Fast-RCNN using ConvNet2 & new RPN Proposals
Joint Training done later.

Fast RCNN + Neural Network?
OverFeat [-Classifier-NMS]
[Diagram: Fast RCNN pipeline; pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

RPN ~= Overfeat + Anchor Boxes – Class agnostic

[Diagram: RPN (3x3 conv, then 1x1 FG/BG and 1x1 x9 box outputs) -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

Quirks about Anchor Boxes

Effective Receptive Field of VGGNet is 228x228
[Figure: 800x600 image (x from 0 to 800, y from 0 to 600) with a 228x228 window marked; axis labels 400 and 300; RPN: 3x3 conv, then 1x1 FG/BG and 1x1 x9 box outputs]
See: http://zike.io/posts/calculate-receptive-field-for-vgg-16/
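The 228x228 figure can be checked with the standard receptive-field recurrence: each layer adds (kernel - 1) times the current stride product. The sketch below assumes the usual VGG-16 layout up to conv5_3 (3x3 convolutions with stride 1 and 2x2 max-pools with stride 2), followed by the RPN's 3x3 convolution.

```python
def receptive_field(layers):
    """layers: iterable of (kernel, stride); returns receptive field in pixels."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth depends on the accumulated stride
        jump *= stride
    return rf

CONV, POOL = (3, 1), (2, 2)
vgg16_to_conv5_3 = (
    [CONV] * 2 + [POOL] +
    [CONV] * 2 + [POOL] +
    [CONV] * 3 + [POOL] +
    [CONV] * 3 + [POOL] +
    [CONV] * 3
)
rpn_conv = [CONV]

rf_conv5_3 = receptive_field(vgg16_to_conv5_3)
rf_rpn = receptive_field(vgg16_to_conv5_3 + rpn_conv)
print(rf_conv5_3, rf_rpn)
```

The quirk is that the 256sq and 512sq anchors are larger than this 228-pixel window, so the network predicts boxes that extend beyond what it can see.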

Faster RCNN Network – Micro Code Walkthrough in TF

https://github.com/endernewton/tf-faster-rcnn
[Diagram: RPN (3x3 conv, then 1x1 FG/BG and 1x1 x9 box outputs) -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

Faster RCNN
[Diagram: RPN (3x3 conv, then 1x1 FG/BG and 1x1 x9 box outputs) -> Fast RCNN: pretrained Alex/VGG (up to last pool) -> ROI POOL -> classification and per-class boxes (x, y, h, w)]

Faster RCNN Implementation
https://www.analyticsvidhya.com/blog/2018/11/implementation-faster-r-cnn-python-object-detection/

Chain Of Influences
[Diagram: influence graph, listed top to bottom]
Convolution, Colour Detection, Edge Detection, Superpixels, Histograms, Corner Detection, Gradient
AlexNet, VGG, ResNet; SIFT; HOG
Selective Search; BOW; DPM
MultiBox; OverFeat; RCNN
Fast-RCNN
Faster-RCNN

Understanding Region of Interest (RoI Pooling)
Sample RoIs
How to get RoIs from the feature map?
• width: 200/32 = 6.25
• height: 145/32 ≈ 4.53
• x: 296/32 = 9.25
• y: 192/32 = 6
Quantization of coordinates on the feature map
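Projecting an RoI onto the feature map and then quantizing it can be sketched with the numbers from the preceding slide (an RoI of 200x145 pixels at (296, 192), with a stride of 32). Quantization by rounding down is an assumption made for this example; the helper names are illustrative.

```python
import math

def project_roi(x, y, w, h, stride=32):
    """Map an RoI from image pixels to feature-map coordinates."""
    return x / stride, y / stride, w / stride, h / stride

def quantize(coords):
    """Snap fractional feature-map coordinates to whole cells (floor)."""
    return tuple(math.floor(c) for c in coords)

roi = (296, 192, 200, 145)        # x, y, width, height in image pixels
projected = project_roi(*roi)
quantized = quantize(projected)
print(projected)
print(quantized)

# Area dropped by quantization, measured in feature-map cells.
lost_cells = projected[2] * projected[3] - quantized[2] * quantized[3]
print(lost_cells)
```

The fractional parts discarded here are the information loss that quantization introduces into RoI pooling.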
Mask R-CNN
Why is there a need for a large amount of data?
What is Data Augmentation?
Data Augmentation
• Resize and rescale
• Random rotate and flip
• Flip
• Grayscale
• Adjusting the saturation
• Adjusting the brightness
• Central crop
• 90-degree rotation
• Applying random brightness
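A few of the augmentations listed above can be sketched on a tiny grayscale image stored as nested lists. This is a dependency-free illustration only; a real pipeline would use an image library, and the helper names and the `max_delta` default are hypothetical. For object detection, geometric transforms such as flips must also be applied to the bounding boxes, which is not shown here.

```python
import random

def hflip(img):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def central_crop(img, fraction=0.5):
    """Keep the central fraction of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]

def adjust_brightness(img, delta):
    """Add delta to every pixel, clamped to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def random_brightness(img, max_delta=32, rng=random):
    """Brightness shift drawn uniformly from [-max_delta, max_delta]."""
    return adjust_brightness(img, rng.randint(-max_delta, max_delta))

img = [[0, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 150]]

print(hflip(img)[0])
print(rot90(img)[0])
print(central_crop(img))
print(adjust_brightness(img, 200)[3])
```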
