
MLDL-I

Machine Learning and Deep Learning - I


[Figure: a DNN builds a feature hierarchy. Low-level features (horizontal, vertical, and angular edges; changes in value; sharp turns) combine into mid-level features (circular edges) and object parts (eye, beak), ending in class scores such as Eye 0.1 and Beak 0.7]

ImageNet
• 1.2 M Training Data*
• 50 K Validation Data*
• 100 K Test Data*
• 1000 Classes*

*indicates the 2010 Competition


[Figure: the same feature hierarchy, reused with gradient calculation turned off, so the pretrained ImageNet features stay frozen]
Transfer Learning

[Figure: a pretrained encoder feeds a new DNN head (Encoder → DNN → Output)]
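The frozen-encoder idea can be sketched without a deep learning framework. A minimal NumPy example (the shapes, the toy task, and the random stand-in for a pretrained encoder are all assumptions for illustration) in which only the new head receives gradient updates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained, frozen encoder: a fixed ReLU projection.
W_enc = rng.normal(size=(20, 32))
def encoder(x):
    return np.maximum(x @ W_enc, 0.0)   # gradients are never computed for W_enc

# New task head: the only trainable parameters.
W_head = np.zeros((32, 1))

X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)   # toy binary task

def loss(p):
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

W_enc_before = W_enc.copy()
loss_before = loss(1.0 / (1.0 + np.exp(-(encoder(X) @ W_head))))

for _ in range(300):
    H = encoder(X)                        # forward through the frozen encoder
    p = 1.0 / (1.0 + np.exp(-(H @ W_head)))
    grad = H.T @ (p - y) / len(X)         # gradient w.r.t. the head only
    W_head -= 0.1 * grad                  # encoder weights are never touched

loss_after = loss(p)
```

The same pattern, with a real pretrained backbone in place of the random projection, is what the frozen-encoder diagram above describes.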
CNN Architectures
AlexNet
- Used ReLU
- Around 60 M parameters
- Trained on 2× GTX 580 (6 GB VRAM total)
- Overlapping pooling
- Used dropout

2012 ImageNet Classification with Deep Convolutional Neural Networks
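Overlapping pooling simply means the pooling window is larger than the stride (AlexNet used 3×3 windows with stride 2). A minimal NumPy sketch (the input size is an arbitrary choice for illustration):

```python
import numpy as np

def max_pool(x, size=3, stride=2):
    """Max pooling over a 2D map; overlapping when size > stride."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(36.0).reshape(6, 6)
pooled = max_pool(x)   # adjacent 3x3 windows share a row/column of inputs
```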
VGG

- Simplified architecture
- Consists of 3x3 Conv, 2x2 MaxPool
- At each stage, resolution goes down by a factor of 2 and channel count goes up by a factor of 2

2015 Very deep convolutional networks for large-scale image recognition
[Figure: two stacked 3x3 convolutional kernels vs. one 5x5 kernel]

Applying a series of two 3x3 kernels gives a similar output (the same 5x5 receptive field) to applying one 5x5 kernel, but with fewer parameters:
5x5 – 2x(3x3) = 25 – 18 = 7 parameters less per kernel
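The arithmetic behind this claim can be checked directly, counting weights per input/output channel pair (biases ignored):

```python
# Weights per (input channel, output channel) pair, ignoring biases.
two_3x3 = 2 * 3 * 3   # two stacked 3x3 kernels
one_5x5 = 5 * 5       # a single 5x5 kernel

# Receptive field of two stacked 3x3 convolutions with stride 1:
# the second 3x3 sees 3 outputs, each covering 3 inputs overlapping by 2.
receptive_field = 3 + (3 - 1)
```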
GoogLeNet

- 22 layers
- (27 including pooling)

2015 Going Deeper with Convolutions
Inception

2015 Going Deeper with Convolutions


Global Average Pooling

[Figure: a 5 x 5 x 512 feature map is averaged per channel down to 1 x 1 x 512]

2015 Going Deeper with Convolutions
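Global average pooling is one line in NumPy: average each channel over its spatial grid, turning a feature map into one value per channel (the 5 x 5 x 512 shape matches the slide):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(5, 5, 512))  # final conv feature map
gap = x.mean(axis=(0, 1))                              # average over the 5x5 spatial grid
```

This replaces the large fully connected layers of earlier architectures, which is a big parameter saving.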
1x1 Convolutions

[Figure: a 1x1 convolution mixing activations across all channels at a single spatial position]

1x1 convolutions can combine the information across all the input channels and represent it in a smaller or larger number of channels, depending on the number of filters specified
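Concretely, a 1x1 convolution is a dense layer over channels, shared across every spatial position, so it can be written as a single matrix product (shapes taken from the slide's example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(254, 254, 256))   # H x W x C_in feature map
W = rng.normal(size=(256, 64))         # one 1x1 filter per output channel

y = x @ W   # the same channel-mixing matrix applied at every pixel
```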
1x1 Convolutions

1x1 Conv2D (64): 254 x 254 x 256 → 254 x 254 x 64
Parameters: (1x1x256) x 64 = 16,384

5x5 Conv2D (64): 254 x 254 x 256 → 254 x 254 x 64
Parameters: (5x5x256) x 64 = 409,600
Bottleneck

1x1 Conv2D (64) → 5x5 Conv2D (64): 254 x 254 x 256 → 254 x 254 x 64 → 254 x 254 x 64
Parameters: (1x1x256) x 64 + (5x5x64) x 64 = 118,784

Single 5x5 Conv2D (64) for comparison: (5x5x256) x 64 = 409,600
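The bottleneck saving is pure arithmetic and easy to verify, reproducing the slide's numbers:

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k convolution, ignoring biases."""
    return k * k * c_in * c_out

direct     = conv_params(5, 256, 64)                           # plain 5x5 conv
bottleneck = conv_params(1, 256, 64) + conv_params(5, 64, 64)  # 1x1 reduce, then 5x5
```

The 1x1 layer first shrinks the channel dimension from 256 to 64, so the expensive 5x5 filters only see 64 channels.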
GoogLeNet

2015 Going Deeper with Convolutions


ResNet

2016 Deep Residual Learning for Image Recognition


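ResNet's core mechanism is the residual (skip) connection, y = x + F(x): when the residual branch F outputs zero, the block is an identity mapping, which is what lets very deep stacks train. A minimal NumPy sketch (layer sizes are illustrative assumptions):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x), with F a small two-layer ReLU branch."""
    h = np.maximum(x @ W1, 0.0)
    return x + h @ W2              # the skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))
W1 = rng.normal(size=(64, 64)) * 0.01
W2 = np.zeros((64, 64))            # zero residual branch -> identity mapping
y = residual_block(x, W1, W2)
```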
2012 ImageNet Classification with Deep Convolutional Neural Networks

2015 Very deep convolutional networks for large-scale image recognition

2015 Going Deeper with Convolutions

2016 Deep Residual Learning for Image Recognition


MobileNet

2017 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
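As the paper title suggests, MobileNet's efficiency comes from depthwise separable convolutions, which split a standard convolution into a per-channel spatial filter plus a 1x1 pointwise channel mix. A parameter-count sketch (the channel counts are assumptions for illustration):

```python
k, c_in, c_out = 3, 256, 256   # illustrative sizes

standard  = k * k * c_in * c_out   # one k x k filter per (input, output) channel pair
depthwise = k * k * c_in           # one k x k spatial filter per input channel
pointwise = c_in * c_out           # 1x1 conv mixing the channels
separable = depthwise + pointwise

ratio = separable / standard       # equals 1/c_out + 1/k**2
```

For a 3x3 kernel the separable version costs roughly 1/9 of the standard parameters, which is the saving MobileNet exploits.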
MobileNetV2

2018 MobileNetV2: Inverted Residuals and Linear Bottlenecks


SENet

2018 Squeeze-and-Excitation Networks




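Per the paper title, the squeeze-and-excitation block squeezes each channel to a scalar with global average pooling, passes those scalars through a small gating network, and rescales the channels. A minimal NumPy sketch (the layer widths and reduction ratio are assumptions):

```python
import numpy as np

def se_block(x, W1, W2):
    """x: (H, W, C) feature map; returns a channel-rescaled map of the same shape."""
    z = x.mean(axis=(0, 1))                                         # squeeze: GAP -> (C,)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ W1, 0.0) @ W2)))       # excitation gate in (0, 1)
    return x * s                                                    # channel-wise rescaling

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 7, 64))
W1 = rng.normal(size=(64, 16))   # reduction ratio 4, an assumed choice
W2 = rng.normal(size=(16, 64))
y = se_block(x, W1, W2)
```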
EfficientNet

2019 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks




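EfficientNet's compound scaling grows depth, width, and resolution together: depth = α^φ, width = β^φ, resolution = γ^φ, under the constraint α·β²·γ² ≈ 2 so that FLOPs roughly double per unit of φ. A quick check using the coefficients reported in the paper (α = 1.2, β = 1.1, γ = 1.15):

```python
alpha, beta, gamma = 1.2, 1.1, 1.15        # coefficients reported in the paper

# FLOPs scale roughly as depth * width^2 * resolution^2.
flops_factor = alpha * beta**2 * gamma**2  # should be close to 2

phi = 3                                     # example compound coefficient
depth, width, resolution = alpha**phi, beta**phi, gamma**phi
```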
Any Questions?
