Deeplearning - Ai Deeplearning - Ai

Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode

Object Detection
Object
deeplearning.ai
localization
What are localization and detection?
Image classification Classification with Detection
localization
Andrew Ng
Classification with localization
⋯ ⋮
1- pedestrian
2- car
3- motorcycle
4- background
Andrew Ng
Defining the target label y
1- pedestrian Need to output #$ , #& , #' , #( , class label (1-4)
2- car
3- motorcycle
4- background
Andrew Ng
Object Detection
Landmark
deeplearning.ai
detection
Landmark detection ConvNet
!" , !$ , !% , !&
Andrew Ng
Object Detection
Object
deeplearning.ai
detection
Car detection example
Training set:
x y
1
Andrew Ng
Sliding windows detection
Andrew Ng
Object Detection
Convolutional
deeplearning.ai implementation of
sliding windows
Turning FC layer into convolutional layers
MAX POOL FC FC
5×5 2×2 ⋮ ⋮
y
14 × 14 × 3 10 × 10 × 16 5 × 5 × 16 400 400 softmax (4)
MAX POOL FC FC
5×5 2×2 5×5 1×1
14 × 14 × 3 10 × 10 × 16 5 × 5 × 16 1 × 1× 400 1 × 1 × 400 1×1×4
Andrew Ng
Convolution implementation of sliding windows
MAX POOL FC FC FC
5×5 2×2 5×5 1×1 1×1
14×14 ×3 10×10×16 5×5×16 1×1×400 1×1×400 1×1×4
MAX POOL FC FC FC
5×5 2×2 5×5 1×1 1×1
16×16×3 12×12×16 6×6×16 2×2×400 2×2×400 2×2×4
MAX POOL
5×5 2×2 5×5 1×1 1×1
28×28×3 24×24×16 12×12×16 8×8×400 8×8×400 8×8×4

[Sermanet et al., 2014, OverFeat: Integrated recognition, localization and detection using convolutional networks] Andrew Ng
Convolution implementation of sliding windows
MAX POOL
5×5 2×2 5×5 1×1 1×1
28×28 16×16 12×12 8×8×400 8×8×400 8×8×4
Andrew Ng
Object Detection
Bounding box
deeplearning.ai
predictions
Output accurate bounding boxes
Andrew Ng
YOLO algorithm
Labels for training
For each grid cell:
100
100
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection] Andrew Ng
Specify the bounding boxes
100
100
Object Detection
Intersection
deeplearning.ai
over union
Evaluating object localization
“Correct” if IoU ≥ 0.5
More generally, IoU is a measure of the overlap between two bounding boxes.
Andrew Ng
Object Detection
Non-max
deeplearning.ai
suppression
Non-max suppression example
Andrew Ng
0.6
0.8
0.9
0.3
0.5
Andrew Ng
0.6
0.8
0.9
0.7
0.7
Andrew Ng
Non-max suppression algorithm
$%
&'
Each output prediction is: &(
&)
&*
Discard all boxes with $% ≤ 0.6
While there are any remaining boxes:
• Pick the box with the largest $%
Output that as a prediction.
19×19
• Discard any remaining box with
IoU ≥ 0.5 with the box output
in the previous step Andrew Ng
Object Detection
Anchor boxes
deeplearning.ai
Overlapping objects:
Anchor box 1: Anchor box 2:
!"
#$
#%
#&
y = #'
()
(*
(+
Anchor box algorithm
Previously: With two anchor boxes:
Each object in training Each object in training
image is assigned to grid image is assigned to grid
cell that contains that cell that contains object’s
object’s midpoint. midpoint and anchor box
for the grid cell with
highest IoU.
Andrew Ng
Anchor box example !"
#$
#%
#&
#'
()
(*
(+
y = !"
#$
#%
#&
#'
Anchor box 1: Anchor box 2:
()
(*
(+
Andrew Ng
Object Detection
Putting it together:
deeplearning.ai
YOLO algorithm
Training 1 - pedestrian
'( 0 0
2 - car )* ? ?
)+ ? ?
3 - motorcycle
)- ? ?
). ? ?
/0 ? ?
/1 ? ?
/2 ? ?
y = '( 0 1
)* ? )*
)+ ? )+
)- ? )-
). ?
/0
).
? 0
/1 ?
/2 1
? 0
y is 3×3×2×8
Making predictions
'(
)*
)+
)-
).
/0
⋯ 4= /1
/2
'(
)*
3×3×2×8 )+
)-
).
/0
/1
/2
Andrew Ng
Outputting the non-max supressed outputs
• For each grid call, get 2 predicted bounding

boxes.
• Get rid of low probability predictions.
• For each class (pedestrian, car, motorcycle)

use non-max suppression to generate final
predictions.
Andrew Ng
Object Detection
Region proposals
deeplearning.ai
(Optional)
Region proposal: R-CNN
[Girshik et. al, 2013, Rich feature hierarchies for accurate object detection and semantic segmentation] Andrew Ng
Faster algorithms
R-CNN: Propose regions. Classify proposed regions one at a

time. Output label + bounding box.
Fast R-CNN: Propose regions. Use convolution implementation

of sliding windows to classify all the proposed
regions.
Faster R-CNN: Use convolutional network to propose regions.
[Girshik et. al, 2013. Rich feature hierarchies for accurate object detection and semantic segmentation]
[Girshik, 2015. Fast R-CNN]
[Ren et. al, 2016. Faster R-CNN: Towards real-time object detection with region proposal networks] Andrew Ng
Convolutional
Neural Networks
Semantic segmentation
with U-Net
Object Detection vs. Semantic Segmentation
Input image Object Detection Semantic Segmentation
Andrew Ng
Motivation for U-Net
Chest X-Ray Brain MRI
[Novikov et al., 2017, Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs]
[Dong et al., 2017, Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks ] Andrew Ng
Per-pixel class labels
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
1. Car
000000011111100000000000 0. Not Car
001111111111111100000000
001111111111111111111110
001111111111111111111110
000011100000000000111000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
Andrew Ng
Per-pixel class labels
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
222222222222222222222222 222222222222222222222222
2 2 2 2 2 2 21 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1. Car 2 2 2 2 2 2 21 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2. Building 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3. Road 2 21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
333311133333333333111 333 333311133333333333111 333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
333333333333333333333333 333333333333333333333333
Segmentation Map
Andrew Ng
Deep Learning for Semantic Segmentation
𝑦ො
Andrew Ng
Transpose Convolution
Normal Convolution
* =
* =
Andrew Ng
231 231 231
1 2 1
231 231 231
2 0 1 0 24
+2 0 1
231 231 231 2 +0
0 2 1 410
+6 7
+3+2
+4 1 3
26 +2
2 1
0
0 37
+4 0
0 2
2
3 2 weight filter
6 33
+0 4 2
2x2
4x4
filter f x f = 3 x 3 padding p = 1 stride s = 2

Andrew Ng
Deep Learning for Semantic Segmentation
Skip connection
𝑦ො
Andrew Ng
U-Net
Conv, RELU
Max Pool
Trans Conv
Skip Connection
Conv (1x1)
[Ronneberger et al., 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation] Andrew Ng
U-Net
hxwx3 h x w x # classes
Conv, RELU
Max Pool
Trans Conv
Skip Connection
Conv (1x1)
[Ronneberger et al., 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation] Andrew Ng

Deeplearning - Ai Deeplearning - Ai

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Deeplearning - Ai Deeplearning - Ai

Uploaded by

Copyright:

Available Formats

Copyright Notice

These slides are distributed under the Creative Commons License.

For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode

5×5 2×2 5×5 1×1

14 × 14 × 3 10 × 10 × 16 5 × 5 × 16 1 × 1× 400 1 × 1 × 400 1×1×4

5×5 2×2 5×5 1×1 1×1

14×14 ×3 10×10×16 5×5×16 1×1×400 1×1×400 1×1×4

5×5 2×2 5×5 1×1 1×1

16×16×3 12×12×16 6×6×16 2×2×400 2×2×400 2×2×4

5×5 2×2 5×5 1×1 1×1

28×28×3 24×24×16 12×12×16 8×8×400 8×8×400 8×8×4

5×5 2×2 5×5 1×1 1×1

28×28 16×16 12×12 8×8×400 8×8×400 8×8×4

“Correct” if IoU ≥ 0.5

• For each grid call, get 2 predicted bounding

• Get rid of low probability predictions.

• For each class (pedestrian, car, motorcycle)

R-CNN: Propose regions. Classify proposed regions one at a

Fast R-CNN: Propose regions. Use convolution implementation

Faster R-CNN: Use convolutional network to propose regions.

Input image Object Detection Semantic Segmentation

Chest X-Ray Brain MRI

filter f x f = 3 x 3 padding p = 1 stride s = 2

You might also like