
Chapter 1: Introduction

AI VIETNAM
All-in-One Course

Object Detection

Dr. Vinh Dinh Nguyen


Shaoxi Li et al. “Survey on Deep Learning-Based Marine Object Detection”, 2021


Tentative Calendar for Research Course (Feb, 2023)

How to write and publish research papers in reputable international conferences and journals

What is this?
Application
Car and Pedestrian Detection

1. V. D. Nguyen et al., “Real-time Vehicle Detection Using an Effective Region Proposal-based Depth and 3-channel Pattern”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2019 (ISI Q1).
2. V. D. Nguyen et al., “Learning Framework for Robust Obstacle Detection, Recognition, and Tracking”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2017 (ISI Q1).

V. D. Nguyen et al., “Real-time Vehicle Detection Using an Effective Region Proposal-based Depth and 3-channel Pattern”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2019


V. D. Nguyen et al., “Learning Framework for Robust Obstacle Detection, Recognition, and Tracking”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2017


Our Experimental Results


Hyundai Driving Contest, South Korea 2017

1st rank in Software Algorithms Contest


Pedestrian, Car, Lane, Traffic Light and Traffic Sign Detection


DISPARITY ESTIMATION

[Figure panels: Left image, Right image, Proposed, Existing]

1. V. D. Nguyen et al., “Feature Engineering and Deep Learning for Stereo Matching Under Adverse Driving Conditions”, IEEE Transactions on Intelligent Transportation Systems, 2021 (ISI Q1).
2. V. D. Nguyen et al., “Robust Stereo Matching with Learning Strategy”, IEEE Transactions on Intelligent Transportation Systems (ITS), Vol. 18, No. 2, pp. 248-258, 2017 (ISI Q1).

V. D. Nguyen et al., “Feature Engineering and Deep Learning for Stereo Matching Under Adverse Driving Conditions”, IEEE Transactions on Intelligent Transportation Systems, 2021.

V. D. Nguyen et al., “Robust Stereo Matching with Learning Strategy”, IEEE Transactions on Intelligent Transportation Systems (ITS), 2017

TRAFFIC VIOLATION DETECTION

1. V. D. Nguyen et al., “Robust Traffic Light Detection and Classification Under Day and Night Conditions”, International Conference on Control, Automation and Systems, IEEE Xplore, 2020.
2. V. D. Nguyen et al., “Robust and Real-Time Obstacle Region Detection Based on Depth Feature for Vehicle Detection”, Advances in Intelligent Systems and Computing, Springer, 2020.

SOCIAL DISTANCING ALARM

1. V. D. Nguyen et al., “Robust Face Mask Detection Using Local Binary Pattern and Deep Learning”, Lecture Notes on Data Engineering and Communications Technologies, Springer, 2022.

OBSTACLE DETECTION UNDER HOSTILE DRIVING CONDITIONS

1. V. D. Nguyen et al., “Triangular Pattern-based Sigmoid Algorithm for a Robust Raspberry Pi-based Autonomous Driving System under Various Driving Conditions”, IEEE Transactions on Intelligent Transportation Systems, under minor revision, 2023.
2. V. D. Nguyen et al., “A Deep Learning Framework for Robust and Real-Time Taillight Detection Under Various Road Conditions”, IEEE Transactions on Intelligent Transportation Systems, 2022 (ISI Q1).
3. V. D. Nguyen et al., “Local Tetra Pattern and Its Benefits to Improve the Performance of Car and Pedestrian Detection Under Hostile Conditions”, International Conference on Control, Automation and Systems, IEEE Xplore, 2021.

Blueberry Leaves Diseases Classification


(Joint Research with Department of Plant Physiology-Biochemistry, College of Agriculture, Can Tho University)

1. V. D. Nguyen et al., “Robust Plant Leaves Diseases Classification Using EfficientNet and Residual Block”, World Conference on Information Systems for Business Management, 2022 (Scopus Q3).

SKIN DISEASE DETECTION


EIU Student’s Project (under development)

1. V. D. Nguyen et al., “Mobile Application for Robust Skin Cancer Detection-based Deep Learning Model”, Eastern International University Scientific Research Conference (EIUSRC 2022), accepted.
2. V. D. Nguyen et al., “Efficient Deep Learning Model for Skin Disease Detection and Classification”, in preparation for submission to IEEE Transactions on Medical Imaging (ISI Q1).

The objective of object detection is to develop computational models and techniques that provide one of the most basic pieces of information needed by computer vision applications:
What objects are where?

Traditional Object Detection

What kind of patches?
Planes
Edges
Corners

Corner Detection

Traditional Object Detection

Viola Jones Detectors


This algorithm was developed by Paul Viola and Michael Jones in 2001.

It achieved real-time detection of human faces for the first time without any constraints (e.g., skin color segmentation).

The VJ detector follows the most straightforward way of detection, i.e., sliding windows: it goes through all possible locations and scales in an image to see whether any window contains a human face.

The VJ detector dramatically improved detection speed by incorporating three important techniques: “integral image”, “feature selection”, and “detection cascades”.


Integral Image
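
As an illustration of why the integral image speeds up the VJ detector, the sketch below builds a summed-area table and reads the sum of any rectangle with four lookups. The function names `integral_image` and `box_sum` and the toy 4x4 image are assumptions made for this example, not code from the course notebook.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Toy image (hypothetical data): values 0..15 in a 4x4 grid.
img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())
```

Haar-like features are differences of such rectangle sums, so each feature costs a constant number of lookups regardless of its size.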

RANDOM FOREST
TREES ARE INDEPENDENTLY CREATED

AdaBoost: FOREST OF STUMPS

Stumps are not equally weighted in the final decision.

Stumps that make more errors contribute less to the final decision.
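
The weighting of stumps can be sketched with the standard AdaBoost formulas: a stump with weighted error e gets influence 0.5 * ln((1 - e) / e), and sample weights are then increased for misclassified samples. The function names and the clipping constant `eps` below are illustrative assumptions.

```python
import math

def amount_of_say(weighted_error, eps=1e-10):
    """Influence of one stump; larger error gives smaller influence."""
    err = min(max(weighted_error, eps), 1 - eps)  # avoid log(0) and division by zero
    return 0.5 * math.log((1 - err) / err)

def update_weights(weights, misclassified, alpha):
    """Raise weights of misclassified samples, lower the rest, then normalize."""
    new = [w * math.exp(alpha if wrong else -alpha)
           for w, wrong in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

alpha_good = amount_of_say(0.1)   # accurate stump, large say
alpha_weak = amount_of_say(0.4)   # weak stump, small say
weights = update_weights([0.25] * 4, [True, False, False, False], alpha_good)
print(alpha_good, alpha_weak, weights)
```

A stump with 50% weighted error gets zero influence, which matches the intuition that it is no better than guessing.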

Influence of each stump

THREE MAIN DIFFERENCES BETWEEN RANDOM FOREST AND ADABOOST
Traditional Object Detection

HOG + Linear SVM


Histogram of Oriented Gradients

An image gradient is a directional change in the intensity or color in an image. The gradient of the image is one of the fundamental building blocks in image processing. For example, the Canny edge detector uses the image gradient for edge detection.


Sobel Kernel
Code: https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
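
For readers without access to the notebook, here is a minimal NumPy sketch of computing image gradients with the Sobel kernels. It is not the notebook's code: the helper `filter2d_valid`, the unsigned 0 to 180 degree orientation, and the toy step-edge image are assumptions for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(img, kernel):
    """Cross-correlate img with kernel, keeping only fully covered positions."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h, j:j + w]
    return out

def gradient(img):
    """Return gradient magnitude and unsigned orientation (degrees in [0, 180))."""
    gx = filter2d_valid(img, SOBEL_X)
    gy = filter2d_valid(img, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180
    return magnitude, angle

# Toy image (hypothetical): dark left half, bright right half, i.e. a vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 255.0
mag, ang = gradient(img)
print(mag)
```

The magnitude is large only in the columns that straddle the edge, which is exactly the signal HOG accumulates into orientation histograms.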
Histogram of Oriented Gradients, Step 1: Preprocessing

In the case of the HOG feature descriptor, the input image is of size 64 x 128 x 3 = 24576 and the output feature vector is of length 3780.
Histogram of Oriented Gradients, Step 2: Calculate the Gradient Images

Histogram of Oriented Gradients, Step 3: Calculate Histogram of Gradients in 8×8 cells

Histogram of Oriented Gradients, Step 4: 16×16 Block Normalization

105 x 36 x 1 = 3780 x 1
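
The 3780 figure can be reproduced with a short calculation. The sketch assumes the usual HOG settings (8x8 cells, 2x2-cell blocks sliding one cell at a time, and 9 orientation bins per cell); the 9 bins are an assumption consistent with the 36 values per block quoted above.

```python
def hog_feature_length(width=64, height=128, cell=8, block_cells=2,
                       bins=9, stride_cells=1):
    """Return (number of blocks, values per block, total descriptor length)."""
    cells_x, cells_y = width // cell, height // cell            # 8 x 16 cells
    blocks_x = (cells_x - block_cells) // stride_cells + 1      # 7 block positions
    blocks_y = (cells_y - block_cells) // stride_cells + 1      # 15 block positions
    per_block = block_cells * block_cells * bins                # 4 cells x 9 bins = 36
    n_blocks = blocks_x * blocks_y
    return n_blocks, per_block, n_blocks * per_block

print(hog_feature_length())
```

With the defaults this gives 105 blocks of 36 values each, hence a 3780-dimensional descriptor for a 64 x 128 window.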

The first key ingredient from HOG + Linear SVM is to use image pyramids.

The second key ingredient we need is sliding windows:

Combined with image pyramids, sliding windows allow us to localize objects at different locations and multiple scales of the input image.
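
A minimal sketch of the two ingredients working together. The scale factor of 1.5, the 16-pixel step, the 64 x 128 window, and the nearest-neighbour downscaling are all assumptions chosen to keep the example dependency-free; a real pipeline would typically resize with proper interpolation.

```python
import numpy as np

def image_pyramid(image, scale=1.5, min_size=(64, 128)):
    """Yield progressively smaller copies until the window no longer fits."""
    current = image
    while current.shape[1] >= min_size[0] and current.shape[0] >= min_size[1]:
        yield current
        new_h = int(current.shape[0] / scale)
        new_w = int(current.shape[1] / scale)
        if new_h < 1 or new_w < 1:
            break
        rows = (np.arange(new_h) * scale).astype(int)  # nearest-neighbour sampling
        cols = (np.arange(new_w) * scale).astype(int)
        current = current[rows][:, cols]

def sliding_window(image, step=16, window=(64, 128)):
    """Yield (x, y, patch) for every window position on one pyramid level."""
    win_w, win_h = window
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]

image = np.zeros((256, 192))            # hypothetical grayscale image (H x W)
levels = list(image_pyramid(image))
counts = [len(list(sliding_window(level))) for level in levels]
print([level.shape for level in levels], counts)
```

Each patch would then be described with HOG and scored by the linear SVM; the pyramid handles scale while the window handles location.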


The final key ingredient we need is non-maxima suppression.
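
A minimal NumPy sketch of greedy non-maxima suppression. It assumes boxes are given as (x1, y1, x2, y2) with positive area and uses an IoU threshold of 0.5; both choices are illustrative, not taken from the slides.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # survivors go to the next round
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In the toy example the second box overlaps the first with IoU of about 0.68, so it is suppressed while the distant third box survives.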

Measuring object detection accuracy with Intersection over Union (IoU)
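
IoU is the area of overlap between two boxes divided by the area of their union. A small sketch, assuming boxes in (x1, y1, x2, y2) format with x2 >= x1 and y2 >= y1:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes
```

Identical boxes give 1.0, disjoint boxes give 0.0, and two equal boxes shifted by half their width give 1/3.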


Limitations of NMS
Soft NMS

The idea is very simple: instead of completely removing the proposals that have a high IoU with a higher-confidence proposal, reduce their confidences in proportion to the IoU value.
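
A sketch of this idea using the linear decay variant, where an overlapping box keeps its score multiplied by (1 - IoU). The thresholds and the helper names are assumptions for illustration; a Gaussian decay is another common choice.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, iou_threshold=0.3, score_threshold=0.001):
    """Return the selection order and the decayed scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break                      # everything left scores even lower
        keep.append(best)
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                scores[i] *= 1.0 - overlap   # decay instead of deleting
    return keep, scores

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
keep, new_scores = soft_nms(boxes, [0.9, 0.8, 0.7])
print(keep, new_scores)
```

Unlike hard NMS, the overlapping second box is still returned, only with a lower score, which helps when two real objects sit close together.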


Most traditional object detection algorithms, such as Viola–Jones and Histogram of Oriented Gradients (HOG), rely on extracting handcrafted features (edges, corners, gradients) from the image and on classical machine learning algorithms.

For example, Viola–Jones, an early object detector, was designed only to detect frontal human faces and did not do well on faces turned sideways or tilted up or down.


The traditional computer vision approaches dominated until about 2010.

From 2012, a new era of convolutional neural networks started when AlexNet (an image classification network) won the ImageNet Large Scale Visual Recognition Challenge.


Then, in 2012, came a new era: a revolution that changed the game for computer vision entirely. AlexNet, a deep convolutional neural network (CNN) architecture born out of the need to improve results on the ImageNet challenge, achieved a top-5 accuracy of 84.7% on ILSVRC-2012, compared with 73.8% for the second-best entry.


ImageNet Dataset

CNN for Image Classification

https://poloclub.github.io/cnn-explainer/

LeNet (1998), AlexNet (2012)


AlexNet 2012


ZFNet in 2013


VGG in 2014


GoogLeNet in 2014


ResNet in 2015


SqueezeNet

ALEXNET IMPLEMENTATION

https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
Assignment

Implement object detection using the traditional approach

ResNet50

Code: https://colab.research.google.com/drive/1EtxlG4XRgrs0F7N8ivHQ6JZvVRboQx-8?usp=sharing
Girshick et al. (2014) showed how convolutional features could be used for object detection, introducing R-CNN (applying a CNN to region proposals). Since then, object detection has evolved at an unprecedented speed.

In the deep learning era, object detectors can be grouped into two genres, “two-stage detection” and “one-stage detection”: the former frames detection as a “coarse-to-fine” process, while the latter frames it as “complete in one step”.

Single-stage and two-stage detectors
Image classification vs. object detection
Classification Pipeline
Classification
Softmax Classification
Ideas for Localization using ConvNets

[Figure: bounding box annotated with the points (x0, y0), (x1, y1), (x2, y2), width w and height h]
BOUNDING BOX REGRESSION TRAINING

Ideas for Detection

Confidence scores

Localization CNN
BBox

We know neither the number of objects nor the locations of those objects.

Ideas for Detection – Sliding Window

Confidence scores

Localization CNN
BBox

Ideas for Detection – Sliding Window + Image Pyramid

Smaller objects Sliding Window – Location Larger objects


Image Pyramid - Scale

Ideas for Detection using ConvNets

Crop + Resize with Sliding Window + Image Pyramid
• Sliding Window: location
• Image Pyramid: scale
• AlexNet/VGG conv and pool layers serve as feature extractors and produce feature maps
• Get class scores using softmax
• Get bounding boxes (x1, y1, x2, y2) using L2 loss

Image Credit - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/examples/index.html

For example, to process an 800x800 image with a 224x224 sliding window moved one pixel at a time, we end up with (800 - 224 + 1)^2 = 332,929 crops at a single scale.

Problem: ConvNets input size constraints


Solution

Implement the Fully Connected layer operation as a convolution operation
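
The equivalence can be checked numerically. The sketch below reshapes the weights of a fully connected layer into convolution kernels that cover the whole 5x5x256 feature map; the sizes (8 outputs instead of 4096) and the helper `conv_valid` are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((5, 5, 256))        # pooled feature map, H x W x C
W_fc = rng.standard_normal((5 * 5 * 256, 8))   # FC weights: flattened input -> 8 outputs

fc_out = feat.reshape(-1) @ W_fc               # ordinary fully connected layer

def conv_valid(x, kernels):
    """Valid cross-correlation of x (H x W x C) with kernels (kh x kw x C x n_out)."""
    kh, kw, _, n_out = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, n_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernels, axes=3)
    return out

kernels = W_fc.reshape(5, 5, 256, 8)           # same weights viewed as 8 filters of 5x5x256
conv_out = conv_valid(feat, kernels)           # shape (1, 1, 8)

bigger = rng.standard_normal((6, 7, 256))      # feature map from a larger input image
spatial = conv_valid(bigger, kernels)          # shape (2, 3, 8): a spatial output
print(conv_out.shape, spatial.shape)
```

On the original size the convolution reproduces the FC output exactly; on a larger feature map the same weights produce one output per window position, which removes the fixed input size constraint.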


Problem: ConvNets input size constraints – FC as Conv


[Figure: Image → Weights/Filter → Feature Maps → Pool → Pooled Feature Maps → FV → FC Layers, drawn once for a zero-padded image and once for a larger zero-padded image]

1. Does this make sense?
2. If so, what does this mean?

Receptive Field

4x4 → (2x2 Pool, Stride = 2) → 2x2 → (2x2 Pool, Stride = 2) → 1x1
8x8 → (2x2 Pool, Stride = 2) → 4x4 → (2x2 Pool, Stride = 2) → 2x2

Every value in the output encodes information from some 4x4 patch of the image.
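
The 4x4 figure follows from the standard receptive-field recurrence: each layer adds (kernel - 1) times the accumulated stride. A small sketch, with function names chosen for this example:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns (receptive field, total stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth is scaled by the stride accumulated so far
        jump *= stride
    return rf, jump

def output_size(input_size, layers):
    """Spatial size after applying the layers without padding."""
    size = input_size
    for kernel, stride in layers:
        size = (size - kernel) // stride + 1
    return size

two_pools = [(2, 2), (2, 2)]        # two 2x2 pooling layers with stride 2
print(receptive_field(two_pools), output_size(4, two_pools), output_size(8, two_pools))
```

Two 2x2 stride-2 pools give a receptive field of 4 and a total stride of 4, so an 8x8 input yields a 2x2 output whose values each summarize a different 4x4 patch.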

ConvNets input size constraints


[Figure: Image → Weights/Filter → Feature Maps → Pooled Feature Maps → FC Layers. The same localization CNN applied to a larger zero-padded image produces a spatial output.]

1. Does this make sense? -> Yes.
2. If so, what does this mean? -> Each value of the spatial output represents the computation on a different portion of the image.

Spatial Output as Sliding Window

CNN

ConvNets and Sliding Window Efficiency


Confidence scores

Localization CNN
BBox

Localization CNN


ConvNets and Sliding Window Efficiency


In each example every image row is identical, and the 3x3 filter has three identical rows [1 0 -1]:

• 8x8 image with rows 0 0 255 255 0 0 255 255 → 6x6 output with rows -765 -765 765 765 -765 -765
• 10x10 image with rows 0 0 255 255 0 0 255 255 0 0 → 8x8 output with rows -765 -765 765 765 -765 -765 765 765
• 8x8 image with rows 255 255 0 0 255 255 0 0 → 6x6 output with rows 765 765 -765 -765 765 765
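
The numbers above can be reproduced with a few lines of NumPy. The helper `cross_correlate_valid` is written for this example (no kernel flip, no padding, stride 1), which is the operation CNN layers compute.

```python
import numpy as np

def cross_correlate_valid(img, kernel):
    """Slide the kernel over img without padding and sum the elementwise products."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w), dtype=img.dtype)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h, j:j + w]
    return out

kernel = np.array([[1, 0, -1]] * 3)                 # vertical-edge filter
row8 = [0, 0, 255, 255, 0, 0, 255, 255]
img8 = np.tile(row8, (8, 1))                        # 8x8 image
img10 = np.tile(row8 + [0, 0], (10, 1))             # same pattern extended to 10x10

out8 = cross_correlate_valid(img8, kernel)          # 6x6
out10 = cross_correlate_valid(img10, kernel)        # 8x8
print(out8[0], out10[0])
```

The top-left 6x6 part of the 10x10 result equals the 8x8 result: running the filter once on the larger image already contains the answers for every smaller window, which is why convolution is more efficient than cropping and re-running the network.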

Spatial Output for Image Pyramids


With Spatial Outputs, we can detect different objects at different locations of the image. The figure below shows a 2x3 Spatial Output for a sample image.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Overfeat
Sliding Window Crop replaced by: FC as Conv (no input size constraint) + Spatial Output + Image Pyramid

Resolution = 36. How to modify the localization framework to convert FC as Conv?

Input size   Spatial output
245x245      1x1
281x317      2x3    (larger objects)
317x389      3x5
389x461      5x7
425x497      6x7
461x569      7x10   (smaller objects)

If you want to detect even smaller objects, use even bigger image pyramids. The trade-off is an increase in computation.
Intuition behind OverFeat Network

1. Use the same localization network, without taking Sliding Window crops at different locations.
2. With no input size constraint, we are able to use Image Pyramids.
3. Using Image Pyramids, we get the Spatial Output, which gives us detections at different locations of the image.
4. Because the entire network uses convolution operations, it is far more efficient than taking crops.

This network won the ImageNet 2013 localization task (ILSVRC2013) and obtained very competitive results for the detection and classification tasks.

Resolution
Overfeat - Classification

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks – Sermanet et al
Images Credit – Overfeat paper & https://towardsdatascience.com/object-localization-in-overfeat-5bb2f7328b62
Let’s look at the FC layers in detail

But how would you change the design to get a depth of 4096 at the 2nd and 3rd FC layers?

N layer Conv – M Feature Maps



Overfeat
Fully Connected layer implemented as a convolution layer

245x245 input
→ first 5 layers of AlexNet (modified), Conv + Pool layers
→ 5x5 x256 feature map from Conv + Pool
→ filters: 5x5, 256*4096 → 1x1 x4096 feature map
→ filters: 1x1, 4096*4096 → 1x1 x4096 feature map
→ filters: 1x1, 4096*C → 1x1 xC final output for C classes

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks – Sermanet et al
Images Credit – Overfeat paper & https://towardsdatascience.com/object-localization-in-overfeat-5bb2f7328b62
Overfeat Detection Network

Problem of Multiple Detections

281x317

Non Max Suppression

Spatial Output

Softmax instead of SVM


Human detection as an example

Results

Won the ImageNet Localization challenge in 2013.

Spatial output   Output x3 per axis   Locations
1x1xC            3x3xC                3 x 3 = 9
2x3xC            6x9xC                6 x 9 = 54
3x5xC            9x15xC               9 x 15 = 135
5x7xC            15x21xC              15 x 21 = 315
6x7xC            18x21xC              18 x 21 = 378
7x10xC           21x30xC              21 x 30 = 630
Total                                 1521

1521 x 21 = 31941

Model Size

Model Sizes - AlexNet

First conv layer: 11*11*3*96 = 34,848 weights (~34KB). First FC layer: 13*13*256*4096 = 177,209,344 weights (~177MB).

Conv layers
Input   Conv Filter   Output   Total Weights   Approx
3       11 x 11       96       34848           34KB
96      5 x 5         256      614400          600KB
256     3 x 3         384      884736          900KB
384     3 x 3         384      1327104         1.3MB
384     3 x 3         256      884736          900KB
5 Conv Layers                  3745824         3.7MB

FC layers
FC input   FC Output   Total Weights   Approx
43264      4096        177209344       177MB
4096       4096        16777216        16MB
4096       1000        4096000         4MB
Total                  198082560       ~198MB

Model Sizes - VGGNet

Conv layers
Input   Conv Filter   Output   Total Weights   Approx
3       3 x 3         64       1728            1.7KB
64      3 x 3         64       36864           36KB
64      3 x 3         128      73728           73KB
128     3 x 3         128      147456          150KB
128     3 x 3         256      294912          300KB
256     3 x 3         256      589824          600KB
256     3 x 3         256      589824          600KB
256     3 x 3         512      1179648         1.2MB
512     3 x 3         512      2359296         2.4MB
512     3 x 3         512      2359296         2.4MB
512     3 x 3         512      2359296         2.4MB
512     3 x 3         512      2359296         2.4MB
512     3 x 3         512      2359296         2.4MB
13 Conv Layers                 14710464        14MB

FC layers
FC input   FC Output   Total Weights   Approx
25088      4096        102760448       102MB
4096       4096        16777216        16MB
4096       1000        4096000         4MB
Total                  123633664       123MB

• You can increase the depth of your CNN without significantly increasing model size.
• But even for a 3 layer FC network, you need significant memory for weights.
• How can we do classification/bbox regression without significantly increasing model size?
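
The totals in the two model-size tables can be recomputed with a few lines. The layer lists below are copied from the tables (biases are ignored there, and so here); the function and variable names are assumptions for this example.

```python
def conv_weights(c_in, k, c_out):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return c_in * k * k * c_out

def fc_weights(n_in, n_out):
    """Weights of a fully connected layer (no bias)."""
    return n_in * n_out

# (input channels, kernel size, output channels), as listed in the tables.
alexnet_conv = [(3, 11, 96), (96, 5, 256), (256, 3, 384), (384, 3, 384), (384, 3, 256)]
vgg_conv = ([(3, 3, 64), (64, 3, 64), (64, 3, 128), (128, 3, 128), (128, 3, 256),
             (256, 3, 256), (256, 3, 256), (256, 3, 512)] + [(512, 3, 512)] * 5)
alexnet_fc = [(43264, 4096), (4096, 4096), (4096, 1000)]
vgg_fc = [(25088, 4096), (4096, 4096), (4096, 1000)]

totals = {
    "alexnet_conv": sum(conv_weights(*layer) for layer in alexnet_conv),
    "alexnet_fc": sum(fc_weights(*layer) for layer in alexnet_fc),
    "vgg_conv": sum(conv_weights(*layer) for layer in vgg_conv),
    "vgg_fc": sum(fc_weights(*layer) for layer in vgg_fc),
}
print(totals)
```

In both networks the three FC layers hold far more weights than all conv layers combined, which motivates the question of how to classify and regress boxes without large FC layers.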
LIMITATIONS of CNN
LIMITATIONS

Considering candidates on Object Regions only?

REGION PROPOSAL METHODS

Xiangteng He, Yuxin Peng, Junjie Zhao, “Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN”, 2023

Edge Density


Segmentation techniques

Region Proposal Method Comparisons

Jan Hosang, Rodrigo Benenson, Bernt Schiele, “How good are detection proposals, really?”, 2014
EDGE BOXES

Zitnick, C.L., Dollár, P. (2014). Edge Boxes: Locating Object Proposals from Edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_26
R-CNN

R-CNN was described in the 2014 paper by Ross Girshick et al. from UC Berkeley titled “Rich feature hierarchies for accurate object detection and semantic segmentation.”
Region Proposals

Selective Search

CNN Model

RCNN - Region proposals with CNNs

Selective Search regions → 227x227 input → AlexNet/VGG (localization CNN) → 6x6 x256 feature map (6x6x256 = 9216) → fc6 (4096) → fc7 (4096) →
• Get class scores using softmax (C class scores)
• Get bounding boxes, per class (x1, y1, x2, y2)

Rich feature hierarchies for accurate object detection and semantic segmentation - Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
RCNN - Region proposals with CNNs

Selective Search → 2000 cropped & warped regions → 227x227 input → AlexNet/VGG (localization CNN) → 6x6 x256 feature map (6x6x256 = 9216) → fc6 (4096) → fc7 (4096) →
• Get class scores using SVM instead of softmax (linear SVM per class)
• Get bounding boxes, per class (x1, y1, x2, y2)

Rich feature hierarchies for accurate object detection and semantic segmentation - Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

RCNN: 3 Stage Training

Stage 1 (Classical CV): SS/EB produces 2000 region proposals.
Stage 2 (CNN): AlexNet/VGG, pre-trained on ImageNet and fine-tuned on region proposals using log loss (training only); features are taken from fc6/fc7.
Then: get class scores using a linear SVM per class, and get bounding boxes per class (x1, y1, x2, y2).

AlexNet, VOC 07
Layer   W/O FT   W FT
pool5   44.2     47.3
fc6     46.2     53.1
fc7     44.7     54.2

• Before finetune: 44%
• After finetune: 54%
• Adding bounding box regression: 58%
• VGG: 66%

RCNN

• Why don’t we need the sliding window & image pyramid?
• Didn’t we end up with too many inputs to the localization network?

Results
Assignment

https://pyimagesearch.com/2020/07/13/r-cnn-object-detection-with-keras-tensorflow-and-deep-learning/
Fast R-CNN

Fast R-CNN is proposed as a single model, instead of a pipeline, to learn and output regions and classifications directly.

https://arxiv.org/abs/1504.08083

Histograms of Images

[Figure: example image and its histogram; labels 0, 150 and 255 on the intensity axis, counts 72 and 48]

Histograms of Images - Bins

Intensity values in the image: 0, 25, 150, 175, 225, 255

Codebook
Range     Bin   Count
0-49      0     48
50-99     1     0
100-149   2     0
150-199   3     48
200-255   4     48
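
A short NumPy sketch of binning intensities into the five ranges above. The 12x12 toy image with 24 pixels per intensity value is a hypothetical layout chosen so that the counts match the codebook table.

```python
import numpy as np

values = [0, 25, 150, 175, 225, 255]
image = np.repeat(values, 24).reshape(12, 12)      # hypothetical image, 144 pixels

edges = [0, 50, 100, 150, 200, 256]                # bins: 0-49, 50-99, ..., 200-255
counts, _ = np.histogram(image, bins=edges)

# The codebook maps every pixel to the index of its bin.
codes = np.digitize(image, edges[1:-1])
print(counts, np.unique(codes))
```

Replacing each pixel by its bin index is the same quantization step that bag of visual words later applies to feature descriptors instead of raw intensities.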

Histogram Examples

Credits – See Description



Bag of Visual Words


Generate HOG/SIFT Feature Descriptors

Codebook of visual words: 0, 1, 2, 3, 4, 5

Bag of Visual Words – K Means Clustering


Generate HOG/SIFT Feature Descriptors → K Means Clustering → codebook of visual words: 0, 1, 2, 3, 4, 5
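
A minimal sketch of building a codebook with k-means and turning an image's descriptors into a bag-of-visual-words histogram. Random vectors stand in for HOG/SIFT descriptors, and the plain NumPy k-means (fixed iteration count, random initial centers) is an illustrative assumption rather than a tuned implementation.

```python
import numpy as np

def nearest_word(descriptors, codebook):
    """Index of the closest codebook entry for every descriptor."""
    dist = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest_word(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):                 # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    """Count how often each visual word occurs among the descriptors."""
    return np.bincount(nearest_word(descriptors, codebook), minlength=len(codebook))

rng = np.random.default_rng(1)
all_descriptors = rng.standard_normal((200, 8))   # stand-in for descriptors of many images
codebook = kmeans(all_descriptors, k=6)           # visual words 0..5
hist = bow_histogram(all_descriptors[:50], codebook)  # one image with 50 descriptors
print(hist)
```

Whatever the number of descriptors in an image, its histogram always has one entry per visual word, which gives a fixed-length representation.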

BOW – Codebook generation for different textures

Image Credit - https://littlecheesecake.wordpress.com/2013/04/24/research-bag-of-features-for-visual-recognition/



BOW – Example – Codebook generation for Faces

Slide Credit - http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf



BOW - Examples

Slide Credit - http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf



Spatial Pyramid Matching

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories - Svetlana Lazebnik, Cordelia Schmid, Jean Ponce

Image Credit - https://homepages.inf.ed.ac.uk/rbf/HIPR2/translte.htm



Spatial Pyramid Matching

[Figure: spatial pyramid matching on Feature Maps of size 8x8, 8x12, 12x8 and 12x12]
Level 0: 3x1 = 3
Level 1: 3x4 = 12
Level 2: 3x16 = 48
Concatenated: 3x21

Classification & Localization


Alex/VGG (-Last Pool) -> Feature Maps (x256) -> SPP -> 4096 -> 4096 -> two outputs:
• C: get class scores using Softmax
• 4xC: get bounding boxes using L2 loss (x1, y1, x2, y2)

Replace last pooling layer by SPP

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks – Sermanet et al
Images Credit – Overfeat paper & https://towardsdatascience.com/object-localization-in-overfeat-5bb2f7328b62

Spatial Pyramid Pooling

8x8 Feature Maps -> 1x1, 4x1, 16x1 -> 21x1

Instead of:
• Identifying features
• K-means clustering
• Codebooks
• Histograms

Just Max-Pool

SPP for Feature Maps outputs


With 256 Feature Maps (x256): 1x256, 4x256, 16x256 -> 21x256
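
The pooling step can be sketched as follows for a single feature map, assuming pyramid levels of 1x1, 2x2 and 4x4 bins (1 + 4 + 16 = 21 values). The bin boundaries use floor/ceil so that every bin is non-empty; this is one common choice and is an assumption here, as is the function name.

```python
import math

def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into sum(n*n) values."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for row in range(n):
            top = math.floor(row * height / n)
            bottom = math.ceil((row + 1) * height / n)
            for col in range(n):
                left = math.floor(col * width / n)
                right = math.ceil((col + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(top, bottom)
                                  for c in range(left, right)))
    return pooled

# Feature maps of different sizes all give a 21-value output.
for height, width in [(8, 8), (8, 12), (12, 8), (12, 12)]:
    fmap = [[r * width + c for c in range(width)] for r in range(height)]
    print((height, width), len(spp_max_pool(fmap)))
```

Applying the same function to each of the 256 feature maps gives the 21x256 output on the slide.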

Any Size and Aspect Ratio

Feature Maps of 8x8, 8x12, 12x8 or 12x12 -> 1x1, 4x1, 16x1 -> 21x1

Spatial Pyramid Pooling

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition - Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun

SPPNet = SPP + Overfeat for Classification


ILSVRC 2014: ranked #3 in image classification

245x245 input -> Alex/VGG (-Last Pool) -> Feature Maps (x256) -> SPP -> 1x1x4096 -> 1x1x4096 -> two outputs:
• 1x1xC: get class scores using Softmax
• 1x1x4xC: get bounding boxes using L2 loss (x1, y1, x2, y2)

Replace last pooling layer by SPP

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks – Sermanet et al
Images Credit – Overfeat paper & https://towardsdatascience.com/object-localization-in-overfeat-5bb2f7328b62

RCNN – 2 Stage Network

SS/EB -> 2000 Region Proposals -> AlexNet/VGG (pre-trained on ImageNet + fine-tuned on Region Proposals) -> two outputs:
• Get class scores using SVM: linear SVM per class. Fine-tune using log loss (training only).
• Get bounding boxes, per class, using L2 loss (x1, y1, x2, y2)

SPP – 2 Stage Network


SS/EB -> Region Proposals (Region Of Interest Proposals – ROI Proposal)
Alex/VGG (-Last Pool, pre-trained on ImageNet) -> SPP -> two outputs:
• Get class scores using SVM: linear SVM per class. Fine-tune using log loss (training only).
• Get bounding boxes, per class, using L2 loss (x1, y1, x2, y2)

1. How do you translate ROI proposals onto the Feature Maps
2. How do you pool the ROI proposals from the Feature Map
3. How to train the BBox regressor

Subsampling Ratio
1. How do you translate ROI proposals onto the Feature Maps

Example 1: 18x18 -> 3x3 Pool, Stride = 3 -> 6x6 -> 3x3 Pool, Stride = 3 -> 2x2 -> 2x2 Pool, Stride = 2 -> 1x1
Example 2: 18x18 -> 3x3 Pool, Stride = 3 -> 6x6 -> 3x3 Pool, Stride = 1 -> 5x5 -> 2x2 Pool, Stride = 2 -> 3x3
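
As a rough sketch, the subsampling ratio of a stack of pooling layers is the product of their strides, and the final map side is approximately the input side divided by that ratio. The helper names are illustrative, and the calculation deliberately ignores kernel size and padding, which affect the intermediate sizes.

```python
from math import prod

def subsampling_ratio(strides):
    """Overall downsampling factor of a stack of strided layers."""
    return prod(strides)

def output_side(input_side, strides):
    """Approximate output side length (ignores kernel size and padding)."""
    return input_side // subsampling_ratio(strides)

print(subsampling_ratio([3, 3, 2]), output_side(18, [3, 3, 2]))  # 18 1
print(subsampling_ratio([3, 1, 2]), output_side(18, [3, 1, 2]))  # 6 3
```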

ROI Projection
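Image: 688x920, axes x and y from (0,0)
SS/EB Region Proposal or ROI: point (340, 450), size 320x128
Subsampling ratio: 1/16
Feature Map: 43x58
Projected ROI: point (21, 28), size 20x8
Projected ROI -> SPP -> Classifier and BBox Reg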

1. How do you translate ROI proposals onto the Feature Maps
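
The numbers on this slide follow from dividing image coordinates by the subsampling ratio of 16. The sketch below reproduces them with floor division for the ROI and a ceiling for the feature map size; the exact rounding rule differs between papers and implementations, so treat the rounding as an assumption.

```python
import math

def feature_map_size(image_w, image_h, stride=16):
    """Feature map size for an image, rounding up partial cells."""
    return math.ceil(image_w / stride), math.ceil(image_h / stride)

def project_roi(x, y, w, h, stride=16):
    """Map an ROI from image coordinates onto the feature map."""
    return x // stride, y // stride, w // stride, h // stride

print(feature_map_size(688, 920))       # (43, 58)
print(project_roi(340, 450, 320, 128))  # (21, 28, 20, 8)
```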



AlexNet Subsampling Ratio

2. How do you pool the ROI proposals from the Feature Map

13x13 Feature Maps

SPP on Region Proposals

688

43x58 - Feature Map



Any Size and Aspect Ratio, Overlapping ROIs



SPP on Region Proposals


In Practice - {6x6, 3x3, 2x2, 1x1}

2. How do you pool the ROI proposals from the Feature Map

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition - Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun

Feature Maps Sizes

224x224

Image scale   Feature map size
480           30x40
576           36x48
688           43x58
864           54x72
1200          75x100

BBox Regression Training


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg -> dx, dy, dw, dh

3. How to train the BBox regressor



BBox Regression Training

Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg
ROI Centre/W/H: x, y, w, h
BBox Reg output: dx, dy, dw, dh
Ground Truth: xg, yg, wg, hg

$(x + dx - x_g)^2 = 0$

3. How to train the BBox regressor



SPP – 2 Stage Network - Inference


SS/EB -> Region Proposals
Alex/VGG (-Last Pool, pre-trained on ImageNet): just 1 computation. Old: 9s; New: 0.3s
Per region proposal: 2000 computations. Old: 0.7s; New: 0.9s
SPP -> two outputs:
• Get class scores using SVM: linear SVM per class. Fine-tune using log loss (training only).
• Get bounding boxes, per class, using L2 loss (x1, y1, x2, y2)

ILSVRC 2014: ranked #2 in object detection

RCNN -> SPPNet -> Fast RCNN


Feature Maps: Basic Shapes -> Complex Shapes
SS/EB -> Region Proposals
Alex/VGG (-Last Pool), pre-trained on ImageNet; FT from conv3

Changes from SPPNet to Fast RCNN:
• 3 Stage to 1 Stage
• SPP -> ROI POOL (1L)
• Get class scores using SVM (linear SVM per class) -> FC + Softmax; fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, on (x, y, h, w): L2 loss -> Smooth L1 loss
• Single Scale, no Image Pyramid

Smooth L1 loss:
$\mathrm{smooth}_{L1}(a, b) = \begin{cases} (a-b)^2 / 2 & \text{if } |a-b| < 1 \\ |a-b| - 1/2 & \text{otherwise} \end{cases}$
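
The Smooth L1 rule stated on this slide translates directly into code. The sketch below is a plain-Python version for a single pair of values; the function name is illustrative.

```python
def smooth_l1(a, b):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(a - b)
    if diff < 1:
        return diff * diff / 2
    return diff - 0.5

print(smooth_l1(0.5, 0.0))  # 0.125 (quadratic region)
print(smooth_l1(3.0, 0.0))  # 2.5   (linear region)
```

The two branches meet at |a-b| = 1, where both give 1/2, so large errors grow linearly rather than quadratically as they would under an L2 loss.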

Role of Region Proposals

CNN detectors by sampling strategy:
• Region Proposal: RCNN, SPPNet, Fast RCNN
• Dense Sampling: Overfeat

They conclude – “Of course, there may exist yet undiscovered techniques that allow dense boxes to perform as well as sparse proposals”.

Good Looking Fast RCNN ☺

SS/EB -> ROIs
Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)
Faster R-CNN


Criteria for replacing SS


• < 2000 Region Proposals
• As fast as SS or better
• As accurate as SS or better
• Should be able to propose overlapping ROIs with different aspect ratios and scales

SS/EB -> ROIs
Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Criteria for replacing SS


< 2000 Region Proposals
As fast as SS or better
As Accurate as SS or better

Should be able to propose:


• Overlapping ROIs
• Aspect Ratios
• Scales

Fast RCNN + Sliding Window + Image Pyramid

Dense Sampling -> ROIs
Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, w, h)

Fast RCNN + Feature Pyramid

Dense Sampling With Feature Pyramid -> ROIs
Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, w, h)

• At least 40x60x9 =~ 20,000 proposals -> time consuming.
• Backpropagating through that many proposals is difficult/time consuming.

Fast RCNN + Neural Network?

OverFeat: Simple CNN [-Classifier -NMS] + BBox Regressor -> ROIs
Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Fast RCNN + RPN


RPN: Pretrained Alex/VGG (-Last Pool) -> ROIs
• < 2000 Region Proposals
• As fast as SS or better
• As accurate as SS or better

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

https://www.youtube.com/watch?v=po59qI5LJGU&list=PL1GQaVhO4f_jLxOokW7CS5kY_J1t1T17S&index=79

Ideas for Localization using ConvNets


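Case #1 – Only one object per image

Image: 800x600, with corners (0,0), (800, 0) and (0, 600)
Box: corners (x1, y1) and (x2, y2), or centre (x0, y0) with width w and height h

AlexNet/VGG etc -> two outputs:
• Get class scores using Softmax: Human, Car, Dog, Cat, Bicycle, etc
• Get bounding boxes using L2 loss (x1, y1, x2, y2): Human, Car, Dog, Cat, Bicycle, etc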
Image Credit - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/examples/index.html

BBox Regression - Relative

Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg -> dx, dy, dh, dw
Reference Box: x, y, h, w

                 x     y     h     w
ROI reference    160   240   150   150
Bbox deltas      18    -22   -30   -125
Predicted        178   218   120   25
Expected         180   220   120   30
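
The table can be checked with a short sketch: the predicted box is the reference box plus the regressed deltas, and training pushes the squared error against the expected box towards zero. This uses the simple additive form shown on the slide, not the exact parameterisation of any paper; the function names are illustrative.

```python
def apply_deltas(reference, deltas):
    """Predicted box = reference box + deltas, element by element."""
    return tuple(r + d for r, d in zip(reference, deltas))

def l2_loss(predicted, expected):
    """Sum of squared differences between two boxes."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected))

reference = (160, 240, 150, 150)   # x, y, h, w
deltas = (18, -22, -30, -125)      # dx, dy, dh, dw
expected = (180, 220, 120, 30)

predicted = apply_deltas(reference, deltas)
print(predicted)                     # (178, 218, 120, 25)
print(l2_loss(predicted, expected))  # 33
```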

Sliding Window as Reference Box


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg -> dx, dy, dw, dh
Reference Box: x, y, w, h

Sliding Window as Reference Box


Image axes: x and y from (0,0)
Three BBox Reg outputs, one per sliding window shown
Reference Box: x, y, w, h

Reference Boxes of fixed sizes


Image axes: x and y from (0,0)
BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1
Reference Box: x, y, w, h

Square ROI to Rectangular Proposal


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg 2:1
ROI Centre/W/H: x, y, w, h
Tall Box Centre/W/H: x, y, w, h
BBox Reg output: dx, dy, dw, dh
Ground Truth: x, y, w, h

                 x     y     h     w
ROI reference    160   240   130   65
Bbox deltas      18    -22   -10   -32
Predicted        178   218   120   33
Expected         180   220   120   30

Note: This is not the exact formula used in the Faster RCNN paper. I just used this for simpler explanation. Please see the paper for details.

Square ROI to Rectangular Proposal


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg 1:1
ROI Centre/W/H: x, y, w, h
Square Box Centre/W/H: x, y, w, h
BBox Reg output: dx, dy, dw, dh
Ground Truth: x, y, w, h

Square ROI to Rectangular Proposal


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg 1:2
ROI Centre/W/H: x, y, w, h
Wide Box Centre/W/H: x, y, w, h
BBox Reg output: dx, dy, dw, dh
Ground Truth: x, y, w, h

Multiple BBox Reg using Reference Boxes


Feature Map: 60x40
At each location: BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1

This is different from Feature Pyramid
These boxes are called Anchor Boxes

Bigger Objects?
Image axes: x and y from (0,0)
Higher Layers/ConvNets -> BBox Reg
SW Centre/W/H: x, y, w, h
Tall Box Centre/W/H: x, y, w, h
BBox Reg output: dx, dy, dw, dh
Ground Truth: x, y, w, h

Bigger Objects?
Image axes: x and y from (0,0)
BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1
Reference Box: x, y, w, h

Note: This is not the exact formula used in the Faster RCNN paper. I just used this for simpler explanation. Please see the paper for details.

Multiple BBox Reg using Anchor Boxes


Feature Map: 60x40
Aspect ratios: 1:1, 1:2, 2:1
Scales: 128sq, 256sq, 512sq
One BBox Reg per scale and aspect ratio: 3 x 3 = 9 per location
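
A minimal sketch of how the nine anchor shapes could be generated from the three scales and three aspect ratios on this slide. It assumes each ratio is height:width (the slides call the 2:1 box "tall") and that every anchor keeps the area of its square scale; both are readings of the slide rather than details it states.

```python
import math

def anchor_shapes(scales=(128, 256, 512), ratios=((1, 1), (1, 2), (2, 1))):
    """Return (height, width) for every scale and height:width ratio."""
    shapes = []
    for scale in scales:
        area = scale * scale
        for ratio_h, ratio_w in ratios:
            height = math.sqrt(area * ratio_h / ratio_w)
            width = area / height
            shapes.append((height, width))
    return shapes

def count_anchors(map_w=60, map_h=40, per_location=9):
    """Total anchors when every feature map cell carries all shapes."""
    return map_w * map_h * per_location

shapes = anchor_shapes()
print(len(shapes))      # 9
print(count_anchors())  # 21600
```

With a 60x40 feature map this gives 21,600 anchors, in line with the "40x60x9 =~ 20,000 proposals" estimate earlier in the deck.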

Fast RCNN + RPN


RPN (x9): Pretrained Alex/VGG (-Last Pool) -> ROIs
• < 2000 Region Proposals
• As fast as SS or better
• As accurate as SS or better

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

How to reduce number of proposals?


Image axes: x and y from (0,0)
Higher Layers/ConvNets -> two outputs:
• BBox Reg: dx, dy, dw, dh
• Classifier: FG / BG

Faster RCNN – Training Anchor Boxes - Labelling


Image axes: x and y from (0,0)
BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1

Fast RCNN + RPN


RPN (x9, FG/BG): Pretrained Alex/VGG (-Last Pool) -> ROIs
• < 2000 Region Proposals
• As fast as SS or better
• As accurate as SS or better

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Faster RCNN – Training Anchor Boxes - Labelling

BBox Reg 1:1, BBox Reg 1:2, BBox Reg 2:1
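
The slide itself is a picture, so the following is only a sketch of one common labelling rule, the one described in the Faster R-CNN paper: an anchor is foreground (FG) if its IoU with some ground-truth box is at least 0.7, background (BG) if its best IoU is below 0.3, and ignored otherwise. The paper's additional rule (the highest-IoU anchor for each ground-truth box is also FG) is left out for brevity, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, ground_truths, fg_threshold=0.7, bg_threshold=0.3):
    """Return 'FG', 'BG' or 'ignore' for one anchor box."""
    best = max((iou(anchor, gt) for gt in ground_truths), default=0.0)
    if best >= fg_threshold:
        return "FG"
    if best < bg_threshold:
        return "BG"
    return "ignore"

ground_truths = [(0, 0, 10, 10)]
print(label_anchor((0, 0, 10, 10), ground_truths))    # FG
print(label_anchor((0, 0, 10, 20), ground_truths))    # ignore (IoU = 0.5)
print(label_anchor((50, 50, 60, 60), ground_truths))  # BG
```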

Fast RCNN + RPN = Faster RCNN


Different Image Sizes?

RPN (Dense Sampling): Pretrained Alex/VGG (-Last Pool) -> 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9
• < 2000 Region Proposals
• As fast as SS or better?
• As accurate as SS or better?

Fast RCNN (Sparse Sampling, unchanged): Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Faster & Accurate than SS?

[Table: columns Time in ms and mAP; surviving mAP values: 66.9 and 69.9]

Faster RCNN - Training


RPN: Pretrained Alex/VGG (-Last Pool) -> 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9

Unshared:
1. Train RPN using ConvNet1
2. Train Fast-RCNN using ConvNet2 & RPN Proposals
Shared:
3. Fine-Tune RPN using ConvNet2
4. Fine-Tune Fast-RCNN using ConvNet2 & new RPN Proposals
Joint Training done later.

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Fast RCNN + Neural Network?

OverFeat [-Classifier -NMS] -> ROIs
Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

RPN ~= Overfeat + Anchor Boxes – Class agnostic


RPN: 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Quirks about Anchor Boxes


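Effective Receptive Field of VGGNet is 228x228

[Figure: 800x600 image with axes x and y from (0, 0); a 228x228 receptive field is marked, with the point (400, 300) labelled.]
RPN: 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9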

See: http://zike.io/posts/calculate-receptive-field-for-vgg-16/

Faster RCNN Network – Micro Code Walkthrough in TF


https://github.com/endernewton/tf-faster-rcnn

RPN: 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Faster RCNN
RPN: 3x3 -> 1x1 (FG/BG) and 1x1 (BBox Reg), x9

Fast RCNN: Pretrained Alex/VGG (-Last Pool) -> ROI POOL -> two outputs:
• Fine-tune using log loss + Softmax for classification
• Get bounding boxes, per class, using Smooth L1 loss (x, y, h, w)

Faster RCNN Implementation

https://www.analyticsvidhya.com/blog/2018/11/implementation-faster-r-cnn-python-object-detection/

Chain Of Influences
Convolution | Colour Detection | Edge Detection | Superpixels | Histograms | Corner Detection | Gradient

AlexNet, VGG, ResNet | SIFT | HOG

Selective Search | BOW | DPM

Spatial Pyramid Matching

MultiBox | OverFeat | RCNN

Spatial Pyramid Pooling

Fast-RCNN

Faster-RCNN

Understanding Region of Interest (RoI Pooling)


Sample RoIs


How to get RoIs from the feature map?


• width: 200/32 = 6.25
• height: 145/32 = ~4.53
• x: 296/32 = 9.25
• y: 192/32 = 6
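
The four divisions above can be reproduced with a small helper, assuming the RoI is given in image pixels and the feature map is 32 times smaller than the image. The function name is illustrative.

```python
def to_feature_map(x, y, width, height, stride=32):
    """Scale an RoI from image pixels to feature map units (not yet rounded)."""
    return tuple(value / stride for value in (x, y, width, height))

print(to_feature_map(296, 192, 200, 145))  # (9.25, 6.0, 6.25, 4.53125)
```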


Quantization of coordinates on the feature map
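
A sketch of what quantization and the pooling that follows might look like, assuming the fractional RoI is truncated to whole cells and then max-pooled into a fixed grid. The 3x3 output size and the 16x16 toy feature map are assumptions chosen for illustration, and the bin boundaries are one simple choice among several used in practice.

```python
def quantize_roi(x, y, width, height):
    """Drop the fractional part of each coordinate (non-negative inputs)."""
    return int(x), int(y), int(width), int(height)

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the RoI region of a 2-D feature map into an out_h x out_w grid."""
    x, y, width, height = roi
    if height < out_h or width < out_w:
        raise ValueError("RoI is smaller than the output grid")
    pooled = []
    for i in range(out_h):
        top = y + (i * height) // out_h
        bottom = y + ((i + 1) * height) // out_h
        row = []
        for j in range(out_w):
            left = x + (j * width) // out_w
            right = x + ((j + 1) * width) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(top, bottom)
                           for c in range(left, right)))
        pooled.append(row)
    return pooled

roi = quantize_roi(9.25, 6.0, 6.25, 4.53125)
print(roi)  # (9, 6, 6, 4)

# Toy feature map whose value encodes its position: row * 16 + column.
feature_map = [[r * 16 + c for c in range(16)] for r in range(16)]
print(roi_max_pool(feature_map, roi, 3, 3))
```

Truncation discards part of the original region (here 0.25 cells in x and width, and about 0.53 in height), which is the information loss that quantization introduces.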

Mask R-CNN

Why is there a need for a large amount of data?


What is Data Augmentation?

Data Augmentation

Resize and rescale


Random rotate and flip


• Flip
• Grayscale
• Adjusting the saturation
• Adjusting the brightness

• Central Crop
• 90-degree rotation
• Applying random brightness
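
To make a few of these operations concrete, here is a dependency-free sketch that treats a grayscale image as a list of rows of 0-255 integers. The function names are illustrative and are not tied to any particular library; the rotation direction (counter-clockwise) and the clipping to 0-255 are assumptions.

```python
import random

def flip_left_right(image):
    """Mirror each row."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees counter-clockwise."""
    return [list(column) for column in zip(*image)][::-1]

def central_crop(image, fraction):
    """Keep the central `fraction` of the height and width."""
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height * fraction))
    crop_w = max(1, int(width * fraction))
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clipping to the 0-255 range."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in image]

def random_brightness(image, max_delta, rng=random):
    """Shift brightness by a random amount in [-max_delta, max_delta]."""
    return adjust_brightness(image, rng.randint(-max_delta, max_delta))

image = [[1, 2], [3, 4]]
print(flip_left_right(image))  # [[2, 1], [4, 3]]
print(rotate_90(image))        # [[2, 4], [1, 3]]
```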
