Professional Documents
Culture Documents
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Object
deeplearning.ai
localization
What are localization and detection?
Image classification Classification with Detection
localization
Andrew Ng
Classification with localization
⋯ ⋮
1- pedestrian
2- car
3- motorcycle
4- background
Andrew Ng
Defining the target label y
1- pedestrian Need to output #$ , #& , #' , #( , class label (1-4)
2- car
3- motorcycle
4- background
Andrew Ng
Object Detection
Landmark
deeplearning.ai
detection
Landmark detection ConvNet
!" , !$ , !% , !&
Andrew Ng
Object Detection
Object
deeplearning.ai
detection
Car detection example
Training set:
x y
1
Andrew Ng
Sliding windows detection
Andrew Ng
Object Detection
Convolutional
deeplearning.ai implementation of
sliding windows
Turning FC layer into convolutional layers
MAX POOL FC FC
5×5 2×2 ⋮ ⋮
y
14 × 14 × 3 10 × 10 × 16 5 × 5 × 16 400 400 softmax (4)
MAX POOL FC FC
Andrew Ng
Convolution implementation of sliding windows
MAX POOL FC FC FC
MAX POOL FC FC FC
MAX POOL
MAX POOL
Andrew Ng
Object Detection
Bounding box
deeplearning.ai
predictions
Output accurate bounding boxes
Andrew Ng
YOLO algorithm
Labels for training
For each grid cell:
100
100
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection] Andrew Ng
Specify the bounding boxes
100
100
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection] Andrew Ng
Object Detection
Intersection
deeplearning.ai
over union
Evaluating object localization
More generally, IoU is a measure of the overlap between two bounding boxes.
Andrew Ng
Object Detection
Non-max
deeplearning.ai
suppression
Non-max suppression example
Andrew Ng
Non-max suppression example
0.6
0.8
0.9
0.3
0.5
Andrew Ng
Non-max suppression example
0.6
0.8
0.9
0.7
0.7
Andrew Ng
Non-max suppression algorithm
$%
&'
Each output prediction is: &(
&)
&*
Discard all boxes with $% ≤ 0.6
While there are any remaining boxes:
• Pick the box with the largest $%
Output that as a prediction.
19×19
• Discard any remaining box with
IoU ≥ 0.5 with the box output
in the previous step Andrew Ng
Object Detection
Anchor boxes
deeplearning.ai
Overlapping objects:
Anchor box 1: Anchor box 2:
!"
#$
#%
#&
y = #'
()
(*
(+
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection] Andrew Ng
Anchor box algorithm
Previously: With two anchor boxes:
Each object in training Each object in training
image is assigned to grid image is assigned to grid
cell that contains that cell that contains object’s
object’s midpoint. midpoint and anchor box
for the grid cell with
highest IoU.
Andrew Ng
Anchor box example !"
#$
#%
#&
#'
()
(*
(+
y = !"
#$
#%
#&
#'
Anchor box 1: Anchor box 2:
()
(*
(+
Andrew Ng
Object Detection
Putting it together:
deeplearning.ai
YOLO algorithm
Training 1 - pedestrian
'( 0 0
2 - car )* ? ?
)+ ? ?
3 - motorcycle
)- ? ?
). ? ?
/0 ? ?
/1 ? ?
/2 ? ?
y = '( 0 1
)* ? )*
)+ ? )+
)- ? )-
). ?
/0
).
? 0
/1 ?
/2 1
? 0
y is 3×3×2×8
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection] Andrew Ng
Making predictions
'(
)*
)+
)-
).
/0
⋯ 4= /1
/2
'(
)*
3×3×2×8 )+
)-
).
/0
/1
/2
Andrew Ng
Outputting the non-max supressed outputs
Andrew Ng
Object Detection
Region proposals
deeplearning.ai
(Optional)
Region proposal: R-CNN
[Girshik et. al, 2013, Rich feature hierarchies for accurate object detection and semantic segmentation] Andrew Ng
Faster algorithms
[Girshik et. al, 2013. Rich feature hierarchies for accurate object detection and semantic segmentation]
[Girshik, 2015. Fast R-CNN]
[Ren et. al, 2016. Faster R-CNN: Towards real-time object detection with region proposal networks] Andrew Ng
Convolutional
Neural Networks
Semantic segmentation
with U-Net
Object Detection vs. Semantic Segmentation
Andrew Ng
Motivation for U-Net
[Novikov et al., 2017, Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs]
[Dong et al., 2017, Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks ] Andrew Ng
Per-pixel class labels
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
1. Car
000000011111100000000000 0. Not Car
001111111111111100000000
001111111111111111111110
001111111111111111111110
000011100000000000111000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
Andrew Ng
Per-pixel class labels
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
2 2 2 2 2 2 21 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1. Car 2 2 2 2 2 2 21 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2. Building 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3. Road 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
333311133333333333111 333 333311133333333333111 333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
Segmentation Map
Andrew Ng
Deep Learning for Semantic Segmentation
𝑦ො
Andrew Ng
Transpose Convolution
Normal Convolution
* =
Transpose Convolution
* =
Andrew Ng
Transpose Convolution
231 231 231
1 2 1
231 231 231
2 0 1 0 24
+2 0 1
231 231 231 2 +0
0 2 1 410
+6 7
+3+2
+4 1 3
26 +2
2 1
0
0 37
+4 0
0 2
2
3 2 weight filter
6 33
+0 4 2
2x2
4x4
𝑦ො
Andrew Ng
U-Net
Conv, RELU
Max Pool
Trans Conv
Skip Connection
Conv (1x1)
[Ronneberger et al., 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation] Andrew Ng
U-Net
hxwx3 h x w x # classes
Conv, RELU
Max Pool
Trans Conv
Skip Connection
Conv (1x1)
[Ronneberger et al., 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation] Andrew Ng