
Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.

For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode


Case Studies

Why look at case studies?
Outline
Classic networks:
• LeNet-5
• AlexNet
• VGG

ResNet

Inception

Case Studies

Classic networks
LeNet-5

32×32×1
→ CONV 5×5, s=1 → 28×28×6
→ avg pool, f=2, s=2 → 14×14×6
→ CONV 5×5, s=1 → 10×10×16
→ avg pool, f=2, s=2 → 5×5×16
→ FC 120 → FC 84 → ŷ

[LeCun et al., 1998. Gradient-based learning applied to document recognition]
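As a minimal Keras sketch of this layout (illustrative, not the 1998 original: modern ReLU/softmax units stand in for the paper's sigmoid/tanh units and Gaussian connections):

import tensorflow as tf
from tensorflow.keras import layers, models

lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 5, strides=1, activation='relu'),   # 28x28x6
    layers.AveragePooling2D(pool_size=2, strides=2),     # 14x14x6
    layers.Conv2D(16, 5, strides=1, activation='relu'),  # 10x10x16
    layers.AveragePooling2D(pool_size=2, strides=2),     # 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax'),              # y-hat: 10 digit classes
])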


AlexNet

227×227×3
→ CONV 11×11, s=4 → 55×55×96
→ MAX-POOL 3×3, s=2 → 27×27×96
→ CONV 5×5, same → 27×27×256
→ MAX-POOL 3×3, s=2 → 13×13×256
→ CONV 3×3, same → 13×13×384
→ CONV 3×3, same → 13×13×384
→ CONV 3×3, same → 13×13×256
→ MAX-POOL 3×3, s=2 → 6×6×256
→ flatten 9216 → FC 4096 → FC 4096 → Softmax 1000

[Krizhevsky et al., 2012. ImageNet classification with deep convolutional neural networks]
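The same layout as a hedged Keras sketch (a single-stream version; the original paper split computation across two GPUs and used details like local response normalization, omitted here):

from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation='relu'),       # 55x55x96
    layers.MaxPooling2D(3, strides=2),                         # 27x27x96
    layers.Conv2D(256, 5, padding='same', activation='relu'),  # 27x27x256
    layers.MaxPooling2D(3, strides=2),                         # 13x13x256
    layers.Conv2D(384, 3, padding='same', activation='relu'),  # 13x13x384
    layers.Conv2D(384, 3, padding='same', activation='relu'),  # 13x13x384
    layers.Conv2D(256, 3, padding='same', activation='relu'),  # 13x13x256
    layers.MaxPooling2D(3, strides=2),                         # 6x6x256
    layers.Flatten(),                                          # 9216
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(1000, activation='softmax'),
])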
VGG-16
CONV = 3×3 filter, s=1, same; MAX-POOL = 2×2, s=2

224×224×3
→ [CONV 64]×2 → 224×224×64 → POOL → 112×112×64
→ [CONV 128]×2 → 112×112×128 → POOL → 56×56×128
→ [CONV 256]×3 → 56×56×256 → POOL → 28×28×256
→ [CONV 512]×3 → 28×28×512 → POOL → 14×14×512
→ [CONV 512]×3 → 14×14×512 → POOL → 7×7×512
→ FC 4096 → FC 4096 → Softmax 1000

[Simonyan & Zisserman, 2015. Very deep convolutional networks for large-scale image recognition]
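Because the whole network is built from the two primitives defined above, a Keras sketch can generate it from a block helper (assuming the standard 1000-class ImageNet head):

from tensorflow.keras import layers, models

def vgg_block(n_convs, n_filters):
    # n_convs CONV layers (3x3, s=1, same) followed by one MAX-POOL (2x2, s=2)
    return [layers.Conv2D(n_filters, 3, padding='same', activation='relu')
            for _ in range(n_convs)] + [layers.MaxPooling2D(2, strides=2)]

vgg16 = models.Sequential(
    [layers.Input(shape=(224, 224, 3))]
    + vgg_block(2, 64)    # -> 112x112x64
    + vgg_block(2, 128)   # -> 56x56x128
    + vgg_block(3, 256)   # -> 28x28x256
    + vgg_block(3, 512)   # -> 14x14x512
    + vgg_block(3, 512)   # -> 7x7x512
    + [layers.Flatten(),
       layers.Dense(4096, activation='relu'),
       layers.Dense(4096, activation='relu'),
       layers.Dense(1000, activation='softmax')]
)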
Case Studies

Residual Networks (ResNets)
Residual block

a^[l] → a^[l+1] → a^[l+2], with a shortcut carrying a^[l] forward past the two layers:

z^[l+1] = W^[l+1] a^[l] + b^[l+1],   a^[l+1] = g(z^[l+1])
z^[l+2] = W^[l+2] a^[l+1] + b^[l+2],   a^[l+2] = g(z^[l+2] + a^[l])

[He et al., 2015. Deep residual networks for image recognition]
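A sketch of one residual block with the Keras functional API, assuming the "same convolution" case where a^[l] and z^[l+2] already have matching shapes:

from tensorflow.keras import layers

def residual_block(a_l, n_filters):
    # main path: z[l+1] = W[l+1] a[l] + b[l+1],  a[l+1] = g(z[l+1])
    z1 = layers.Conv2D(n_filters, 3, padding='same')(a_l)
    a1 = layers.Activation('relu')(z1)
    # z[l+2] = W[l+2] a[l+1] + b[l+2]
    z2 = layers.Conv2D(n_filters, 3, padding='same')(a1)
    # shortcut: add a[l] before the nonlinearity, a[l+2] = g(z[l+2] + a[l]);
    # assumes a_l has n_filters channels so the addition is shape-compatible
    return layers.Activation('relu')(layers.Add()([z2, a_l]))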
Residual Network

x → [residual block] → [residual block] → … → ŷ

Plain network: in practice, training error falls and then rises again as the number of layers grows.
ResNet: training error keeps decreasing as the number of layers grows, even for very deep networks.

[He et al., 2015. Deep residual networks for image recognition]
Case Studies

Why ResNets work
Why do residual networks work?

Because a^[l+2] = g(z^[l+2] + a^[l]), if regularization drives W^[l+2] and b^[l+2] toward zero, then a^[l+2] = g(a^[l]) = a^[l] (with ReLU and non-negative activations). The identity function is easy for a residual block to learn, so adding the block to a big network doesn't hurt performance — and it can only help when the extra layers learn something useful.

To turn a plain network into a ResNet, add a skip connection every two layers. The 3×3 "same" convolutions keep dimensions equal so that z^[l+2] + a^[l] is well defined; when dimensions differ, a matrix W_s is applied to a^[l] to match them.

[He et al., 2015. Deep residual networks for image recognition]
Case Studies

Network in Network and 1×1 convolutions
What does a 1×1 convolution do?

On a 6×6×1 input, convolving with a 1×1 filter holding the number 2 just multiplies every element by 2:

1 2 3 6 5 8
3 5 5 1 3 4
2 1 3 4 9 3      ∗ [2] = the same 6×6 grid with every entry doubled
4 7 8 5 7 9
1 5 3 7 4 8
5 4 9 8 3 5

On a 6×6×32 volume, though, a 1×1×32 filter takes a dot product across all 32 channels at each position (followed by a nonlinearity), so it acts like a small fully connected network applied at every pixel:

6×6×32 ∗ 1×1×32 → 6×6×(# filters)

[Lin et al., 2013. Network in network]
Using 1×1 convolutions

One use is shrinking the number of channels: 28×28×192 → CONV 1×1, 32 filters, ReLU → 28×28×32. Height and width are unchanged; only the channel count changes.

[Lin et al., 2013. Network in network]
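As a one-layer Keras sketch, the channel-shrinking step above is:

from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 192))
outputs = layers.Conv2D(32, kernel_size=1, activation='relu')(inputs)  # 28x28x32
shrink = models.Model(inputs, outputs)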


Case Studies

Inception network motivation
Motivation for inception network

Instead of committing to one filter size or to pooling, do them all on the same 28×28×192 input and stack the outputs along the channel dimension:

• 1×1 → 28×28×64
• 3×3, same → 28×28×128
• 5×5, same → 28×28×32
• MAX-POOL, same, s=1 → 28×28×32

Concatenated output: 28×28×256

[Szegedy et al., 2014. Going deeper with convolutions]


The problem of computational cost

28×28×192 → CONV 5×5, same, 32 filters → 28×28×32
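The multiplication count works out as follows: each of the 28 × 28 × 32 output values takes 5 × 5 × 192 multiplications, so the total is 28 × 28 × 32 × 5 × 5 × 192 ≈ 120 million multiplications.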
Using 1×1 convolution

28×28×192 → CONV 1×1, 16 filters (1×1×192) → 28×28×16 "bottleneck layer" → CONV 5×5, 32 filters (5×5×16) → 28×28×32

So long as the bottleneck size is chosen within reason, it shrinks the computation dramatically without seeming to hurt performance.
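The two cheaper stages cost 28 × 28 × 16 × 1 × 1 × 192 ≈ 2.4 million multiplications for the 1×1 layer, plus 28 × 28 × 32 × 5 × 5 × 16 ≈ 10.0 million for the 5×5 layer — about 12.4 million in total, roughly a tenth of the 120 million above.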
Case Studies

Inception network
Inception module
The previous activation feeds four parallel branches:

• 1×1 CONV
• 1×1 CONV → 3×3 CONV
• 1×1 CONV → 5×5 CONV
• MAXPOOL 3×3, s=1, same → 1×1 CONV

The branch outputs are stacked with channel concatenation.
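A functional-API sketch of one module; the per-branch filter counts are parameters you choose, and the defaults below are illustrative (they echo the motivation slide, summing to 256 output channels):

from tensorflow.keras import layers

def inception_module(x, f1=64, f3_in=96, f3=128, f5_in=16, f5=32, f_pool=32):
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3_in, 1, padding='same', activation='relu')(x)    # bottleneck
    b2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)
    b3 = layers.Conv2D(f5_in, 1, padding='same', activation='relu')(x)    # bottleneck
    b3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(f_pool, 1, padding='same', activation='relu')(b4)  # shrink channels
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])                  # channel concat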
Inception network

The full network chains many inception modules together, with occasional max-pooling layers to reduce height and width, plus side branches that make softmax predictions from intermediate activations.

[Szegedy et al., 2014. Going deeper with convolutions]

The name comes from the "we need to go deeper" meme: http://knowyourmeme.com/memes/we-need-to-go-deeper
Convolutional
Neural Networks

MobileNet
Motivation for MobileNets
• Low computational cost at deployment
• Useful for mobile and embedded vision applications
• Key idea: normal vs. depthwise-separable convolutions

[Howard et al., 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications]
Normal Convolution

6×6×3 ∗ 3×3×3 (5 filters) = 4×4×5

Computational cost = # filter params × # filter positions × # of filters
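For this example: (3 × 3 × 3 filter params) × (4 × 4 positions) × (5 filters) = 27 × 16 × 5 = 2160 multiplications.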
Depthwise Separable Convolution

Normal convolution: input ∗ filter = output, in one step.

Depthwise separable convolution: input ∗ depthwise filter ∗ pointwise filter = output, in two cheaper steps.
Depthwise Convolution

6×6×3 ∗ 3×3 (one filter per input channel) = 4×4×3

Computational cost = # filter params × # filter positions × # of filters
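Here each of the 3 channels gets its own 3×3 filter: (3 × 3 params) × (4 × 4 positions) × (3 filters) = 9 × 16 × 3 = 432 multiplications.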
Depthwise Separable Convolution

Step 1, depthwise convolution: 6×6×3 ∗ 3×3 = 4×4×3
Step 2, pointwise convolution: 4×4×3 ∗ 1×1×3 = 4×4×5
Pointwise Convolution

4×4×3 ∗ 1×1×3 (5 filters) = 4×4×5

Computational cost = # filter params × # filter positions × # of filters
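For this example: (1 × 1 × 3 params) × (4 × 4 positions) × (5 filters) = 3 × 16 × 5 = 240 multiplications, so the depthwise separable total is 432 + 240 = 672, against 2160 for the normal convolution.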
Cost Summary

Cost of normal convolution (example above): 2160 multiplications
Cost of depthwise separable convolution: 432 + 240 = 672 multiplications

[Howard et al., 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications]
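In general, with n_c′ output channels and f × f filters, the cost ratio of depthwise separable to normal convolution is 1/n_c′ + 1/f². For the example above, 1/5 + 1/9 = 672/2160 ≈ 0.31; for a more typical layer with n_c′ = 512 and f = 3, the ratio is about 1/512 + 1/9 ≈ 1/9.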
Convolutional
Neural Networks

MobileNet
Architecture
MobileNet

MobileNet v1: a stack of 13 depthwise-separable convolution blocks, followed by pooling, a fully connected layer, and a softmax.

MobileNet v2: a stack of 17 bottleneck blocks, each with a residual connection around Expansion → Depthwise → Projection, followed by similar pooling, FC, and softmax layers.

[Sandler et al., 2019. MobileNetV2: Inverted Residuals and Linear Bottlenecks]
MobileNet v2 Bottleneck

A residual connection carries the block input straight to its output, around three steps:
• Expansion: a 1×1 convolution that increases the number of channels (by a factor of 6 in the paper)
• Depthwise: a 3×3 depthwise convolution, same padding
• Projection (pointwise): a 1×1 convolution that shrinks the channels back down

The expansion lets the block learn a richer function while the projection keeps the activations passed between blocks small.

[Sandler et al., 2019. MobileNetV2: Inverted Residuals and Linear Bottlenecks]
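A hedged Keras sketch of the block (the paper's batch normalization and ReLU6 are simplified to plain ReLU here; the expansion factor of 6 follows the paper):

from tensorflow.keras import layers

def bottleneck_block(x, out_channels, expansion=6):
    in_channels = x.shape[-1]
    # expansion: 1x1 conv that increases the channel count
    h = layers.Conv2D(expansion * in_channels, 1, activation='relu')(x)
    # depthwise: one 3x3 filter per channel
    h = layers.DepthwiseConv2D(3, padding='same', activation='relu')(h)
    # projection: linear 1x1 conv back down to out_channels (no activation)
    h = layers.Conv2D(out_channels, 1)(h)
    # residual connection when input and output shapes match
    if out_channels == in_channels:
        h = layers.Add()([h, x])
    return h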
MobileNet v2 Full Architecture

conv2d → (bottleneck block, ×17) → conv2d 1×1 → avgpool 7×7 → conv2d 1×1 → ŷ

[Sandler et al., 2019. MobileNetV2: Inverted Residuals and Linear Bottlenecks]
Convolutional
Neural Networks

EfficientNet
EfficientNet

Starting from a baseline network that outputs ŷ, there are three ways to scale it up:
• Wider: more channels per layer
• Deeper: more layers
• Higher resolution: larger input images

Compound scaling adjusts all three together, trading them off to get the best accuracy for a given compute budget.

[Tan and Le, 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks]
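In the paper, compound scaling ties the three dimensions to a single coefficient φ: depth scales as α^φ, width as β^φ, and resolution as γ^φ, where α, β, γ ≥ 1 are found by a small grid search under the constraint α · β² · γ² ≈ 2, so that each unit increase of φ roughly doubles the compute.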
Practical advice for using ConvNets

Transfer Learning
Practical advice for using ConvNets

Data augmentation
Common augmentation methods

• Mirroring
• Random cropping
• Rotation
• Shearing
• Local warping

Mirroring and random cropping are the most commonly used in practice; rotation, shearing, and local warping are used somewhat less.
Color shifting

Add different offsets to the R, G, B channels, for example:
• (R, G, B) + (+20, −20, +20)
• (R, G, B) + (−20, +20, +20)
• (R, G, B) + (+5, 0, +50)

This makes the learning algorithm more robust to changes in the colors of images. (The AlexNet paper describes a principled version of this, PCA color augmentation.)
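A sketch of mirroring, random cropping, and color shifting with TensorFlow's image ops (the ±20 shift range and the 224×224 crop size are illustrative choices, and the input is assumed to be at least 224×224 with [0, 255] pixel values):

import tensorflow as tf

def color_shift(image):
    # add a random offset per R, G, B channel, like the +20/-20/+20 examples above
    shift = tf.random.uniform([3], minval=-20.0, maxval=20.0)
    return tf.clip_by_value(tf.cast(image, tf.float32) + shift, 0.0, 255.0)

def augment(image):
    image = tf.image.random_flip_left_right(image)           # mirroring
    image = tf.image.random_crop(image, size=[224, 224, 3])  # random cropping
    return color_shift(image)                                # color shifting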
Implementing distortions during training

A common pattern: one or more CPU threads load images from disk and apply the distortions, forming mini-batches that are passed on to training (often on a GPU), so that data loading/augmentation and training run in parallel.
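One hedged way to set this up is a tf.data input pipeline, which runs the decode-and-distort work in parallel worker threads and prefetches batches while the model trains (filenames and labels here are hypothetical placeholders):

import tensorflow as tf

def load_and_distort(path, label):
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    return augment(image), label  # augment() as sketched above

dataset = (tf.data.Dataset.from_tensor_slices((filenames, labels))  # hypothetical lists
           .shuffle(10_000)
           .map(load_and_distort, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))  # overlap loading with training
# model.fit(dataset, epochs=...) then consumes mini-batches as they are produced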
Practical advice for using ConvNets

The state of computer vision
Data vs. hand-engineering

Two sources of knowledge:
• Labeled data
• Hand-engineered features / network architectures / other components

The more labeled data a task has, the less hand-engineering it needs; when data is scarce, hand-engineering and transfer learning matter more.
Tips for doing well on benchmarks/winning competitions

Ensembling
• Train several networks independently and average their outputs

Multi-crop at test time
• Run the classifier on multiple versions of the test images (e.g. the 10-crop scheme) and average the results
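Both tricks reduce to averaging predictions; a minimal NumPy/Keras sketch (the trained models, test batch x, and list of crops are assumed given):

import numpy as np

def ensemble_predict(models, x):
    # average the softmax outputs of independently trained networks
    return np.mean([m.predict(x) for m in models], axis=0)

def multi_crop_predict(model, crops):
    # crops: multiple versions of one test image, e.g. the 10-crop scheme
    # (center + four corners, each also mirrored); average the class scores
    return np.mean(model.predict(np.stack(crops)), axis=0)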
Use open source code

• Use architectures of networks published in the literature

• Use open source implementations if possible

• Use pretrained models and fine-tune on your dataset
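For the last point, a minimal fine-tuning sketch with a Keras pretrained model (MobileNetV2 is just an example choice; num_classes stands in for your dataset's class count):

import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # replace with your dataset's class count
base = tf.keras.applications.MobileNetV2(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation='softmax'),  # new head for your classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(your_dataset, ...) trains only the new softmax head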

