CNN Architectures
Fei-Fei Li, Justin Johnson, Serena Yeung. Stanford Lecture 9, May 2, 2017.
Case Studies
- AlexNet
- VGG
- GoogLeNet
- ResNet
Also:
- NiN (Network in Network)
- Wide ResNet
- ResNeXt
- Stochastic Depth
- DenseNet
- FractalNet
- SqueezeNet
Review: LeNet-5
[LeCun et al., 1998]
Conv filters were 5x5, applied at stride 1; pooling layers were 2x2, applied at stride 2. Overall architecture: [CONV-POOL-CONV-POOL-FC-FC].
Case Study: AlexNet
[Krizhevsky et al. 2012]
Architecture:
CONV1
MAX POOL1
NORM1
CONV2
MAX POOL2
NORM2
CONV3
CONV4
CONV5
MAX POOL3
FC6
FC7
FC8
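For concreteness, here is a minimal PyTorch sketch of this layer sequence. Filter sizes and strides follow the 2012 paper; the two-GPU grouping and exact normalization constants are omitted, so treat it as an illustration rather than a faithful reimplementation.

    import torch
    import torch.nn as nn

    class AlexNetSketch(nn.Module):
        def __init__(self, num_classes=1000):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, kernel_size=11, stride=4),    # CONV1
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),         # MAX POOL1
                nn.LocalResponseNorm(5),                       # NORM1
                nn.Conv2d(96, 256, kernel_size=5, padding=2),  # CONV2
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),         # MAX POOL2
                nn.LocalResponseNorm(5),                       # NORM2
                nn.Conv2d(256, 384, kernel_size=3, padding=1), # CONV3
                nn.ReLU(inplace=True),
                nn.Conv2d(384, 384, kernel_size=3, padding=1), # CONV4
                nn.ReLU(inplace=True),
                nn.Conv2d(384, 256, kernel_size=3, padding=1), # CONV5
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),         # MAX POOL3
            )
            self.classifier = nn.Sequential(
                nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),  # FC6
                nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC7
                nn.Linear(4096, num_classes),                         # FC8
            )

        def forward(self, x):                  # x: [N, 3, 227, 227]
            x = self.features(x)               # -> [N, 256, 6, 6]
            return self.classifier(torch.flatten(x, 1))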
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
ZFNet: improved hyperparameters over AlexNet
ZFNet [Zeiler and Fergus, 2013]
Like AlexNet, but CONV1 uses 7x7 filters at stride 2 instead of 11x11 at stride 4, and CONV3,4,5 use 512, 1024, 512 filters instead of 384, 384, 256. ILSVRC'13 winner.
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
Deeper Networks
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
Small filters, deeper networks: only 3x3 conv (stride 1) and pooling layers throughout.
Why small filters? A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as a single 7x7 conv layer, but is deeper, has more non-linearities, and uses fewer parameters: 3*(3^2*C^2) vs. 7^2*C^2 for C channels per layer.
[Figure: VGG16 and VGG19 layer diagrams: stacks of 3x3 conv layers separated by pools, topped by FC 4096, FC 4096, FC 1000, Softmax.]
Case Study: VGGNet (VGG16, layer by layer)
[Simonyan and Zisserman, 2014]
(memory = activation values per image; params do not count biases; common layer names in parentheses)

INPUT:     [224x224x3]   memory: 224*224*3=150K    params: 0
CONV3-64:  [224x224x64]  memory: 224*224*64=3.2M   params: (3*3*3)*64 = 1,728          (conv1-1)
CONV3-64:  [224x224x64]  memory: 224*224*64=3.2M   params: (3*3*64)*64 = 36,864        (conv1-2)
POOL2:     [112x112x64]  memory: 112*112*64=800K   params: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M  params: (3*3*64)*128 = 73,728       (conv2-1)
CONV3-128: [112x112x128] memory: 112*112*128=1.6M  params: (3*3*128)*128 = 147,456     (conv2-2)
POOL2:     [56x56x128]   memory: 56*56*128=400K    params: 0
CONV3-256: [56x56x256]   memory: 56*56*256=800K    params: (3*3*128)*256 = 294,912     (conv3-1)
CONV3-256: [56x56x256]   memory: 56*56*256=800K    params: (3*3*256)*256 = 589,824     (conv3-2)
CONV3-256: [56x56x256]   memory: 56*56*256=800K    params: (3*3*256)*256 = 589,824     (conv3-3)
POOL2:     [28x28x256]   memory: 28*28*256=200K    params: 0
CONV3-512: [28x28x512]   memory: 28*28*512=400K    params: (3*3*256)*512 = 1,179,648   (conv4-1)
CONV3-512: [28x28x512]   memory: 28*28*512=400K    params: (3*3*512)*512 = 2,359,296   (conv4-2)
CONV3-512: [28x28x512]   memory: 28*28*512=400K    params: (3*3*512)*512 = 2,359,296   (conv4-3)
POOL2:     [14x14x512]   memory: 14*14*512=100K    params: 0
CONV3-512: [14x14x512]   memory: 14*14*512=100K    params: (3*3*512)*512 = 2,359,296   (conv5-1)
CONV3-512: [14x14x512]   memory: 14*14*512=100K    params: (3*3*512)*512 = 2,359,296   (conv5-2)
CONV3-512: [14x14x512]   memory: 14*14*512=100K    params: (3*3*512)*512 = 2,359,296   (conv5-3)
POOL2:     [7x7x512]     memory: 7*7*512=25K       params: 0
FC:        [1x1x4096]    memory: 4096              params: 7*7*512*4096 = 102,760,448  (fc6)
FC:        [1x1x4096]    memory: 4096              params: 4096*4096 = 16,777,216      (fc7)
FC:        [1x1x1000]    memory: 1000              params: 4096*1000 = 4,096,000       (fc8)

TOTAL memory: 24M * 4 bytes ~= 96MB / image (forward only; ~2x for backward)
TOTAL params: 138M parameters

Note: most of the memory is in the early CONV layers; most of the parameters are in the late FC layers.
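The parameter column can be checked with a few lines of Python, assuming 'same'-padded 3x3 convs and 2x2 stride-2 pools as in VGG16, and ignoring biases as the table does:

    # cfg lists conv output depths; 'P' marks a 2x2 stride-2 max pool.
    cfg = [64, 64, 'P', 128, 128, 'P', 256, 256, 256, 'P',
           512, 512, 512, 'P', 512, 512, 512, 'P']
    c, params = 3, 0
    for layer in cfg:
        if layer != 'P':
            params += 3 * 3 * c * layer      # (3*3*C_in)*C_out weights per conv
            c = layer
    for f_in, f_out in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
        params += f_in * f_out               # FC weight matrices
    print(f"{params:,}")                     # 138,344,128 ~= 138M, as above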
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
Details:
- Use VGG16 or VGG19 (VGG19 is only slightly better, and uses more memory)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
Deeper Networks
Case Study: GoogLeNet
[Szegedy et al., 2014]
- 22 layers
- Efficient "Inception" module
- No FC layers
- Only 5 million parameters! (12x fewer than AlexNet)
- ILSVRC'14 classification winner (6.7% top-5 error)
Case Study: GoogLeNet
[Szegedy et al., 2014]
Inception module: apply parallel filter operations (1x1, 3x3, and 5x5 conv, plus 3x3 pooling) to the input from the previous layer, and concatenate all the filter outputs together depth-wise.
Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this? [Hint: computational complexity]
Example: the naive Inception module applied to a 28x28x256 input, with parallel 1x1 conv (128 filters), 3x3 conv (192), 5x5 conv (96), and 3x3 pool:
- 1x1 conv, 128 -> 28x28x128
- 3x3 conv, 192 -> 28x28x192
- 5x5 conv, 96 -> 28x28x96
- 3x3 pool -> 28x28x256
Output size after filter concatenation: 28x28x(128+192+96+256) = 28x28x672
Conv ops:
[1x1 conv, 128] 28x28x128x1x1x256
[3x3 conv, 192] 28x28x192x3x3x256
[5x5 conv, 96] 28x28x96x5x5x256
Total: 854M ops
Very expensive to compute. The pooling branch also preserves the input's feature depth, so the total depth after concatenation can only grow at every layer!
Solution: "bottleneck" layers that use 1x1 convolutions to reduce feature depth.
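The op counts above are just (number of output locations) x (filter size) x (input depth); a short Python check, with conv_ops as a hypothetical helper:

    def conv_ops(h, w, c_out, k, c_in):
        # each of the h*w*c_out outputs is a k*k*c_in dot product
        return h * w * c_out * k * k * c_in

    naive = (conv_ops(28, 28, 128, 1, 256)    # 1x1 conv, 128: ~26M
             + conv_ops(28, 28, 192, 3, 256)  # 3x3 conv, 192: ~347M
             + conv_ops(28, 28, 96, 5, 256))  # 5x5 conv, 96:  ~482M
    print(f"{naive / 1e6:.0f}M ops")          # ~854M ops

With a 64-channel bottleneck in front, the same 3x3 and 5x5 branches cost only conv_ops(28, 28, 192, 3, 64) ~= 87M and conv_ops(28, 28, 96, 5, 64) ~= 120M, which is where the savings below come from.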
Reminder: 1x1 convolutions
A 1x1 CONV with 32 filters maps a 56x56x64 input to a 56x56x32 output: each filter has size 1x1x64 and performs a 64-dimensional dot product at every spatial position. It preserves the spatial dimensions but reduces the depth!
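A quick PyTorch check of those shapes:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 56, 56)               # NCHW: 64 channels, 56x56 spatial
    conv1x1 = nn.Conv2d(64, 32, kernel_size=1)   # 32 filters, each 1x1x64
    print(conv1x1(x).shape)                      # torch.Size([1, 32, 56, 56])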
Case Study: GoogLeNet
[Szegedy et al., 2014]
The fix: add 1x1 conv "bottleneck" layers before the expensive 3x3 and 5x5 convolutions, and after the pooling branch.
[Figure: the naive Inception module next to the Inception module with dimension reduction, which inserts 1x1 conv bottlenecks into each branch after the previous layer.]
Case Study: GoogLeNet
[Szegedy et al., 2014]
Using the same parallel layers as the naive example, and adding "1x1 conv, 64 filter" bottlenecks (input 28x28x256; branch outputs 28x28x128, 28x28x192, 28x28x96, 28x28x64; concatenated output 28x28x480):
Conv ops:
[1x1 conv, 128] 28x28x128x1x1x256
[1x1 conv, 64] 28x28x64x1x1x256 (bottleneck before the 3x3 conv)
[3x3 conv, 192] 28x28x192x3x3x64
[1x1 conv, 64] 28x28x64x1x1x256 (bottleneck before the 5x5 conv)
[5x5 conv, 96] 28x28x96x5x5x64
[1x1 conv, 64] 28x28x64x1x1x256 (after the 3x3 pool)
Total: 358M ops
Far cheaper than the naive module's 854M ops, and the 1x1 conv after the pool also keeps the concatenated depth from growing without bound.
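Put together, the module looks like the following PyTorch sketch. The filter counts are the ones from this worked example, not the exact per-stage GoogLeNet configuration, and ReLUs are omitted for brevity:

    import torch
    import torch.nn as nn

    class InceptionModule(nn.Module):
        def __init__(self, in_ch=256):
            super().__init__()
            self.branch1 = nn.Conv2d(in_ch, 128, kernel_size=1)      # 1x1 conv, 128
            self.branch3 = nn.Sequential(
                nn.Conv2d(in_ch, 64, kernel_size=1),                 # 1x1 bottleneck
                nn.Conv2d(64, 192, kernel_size=3, padding=1))        # 3x3 conv, 192
            self.branch5 = nn.Sequential(
                nn.Conv2d(in_ch, 64, kernel_size=1),                 # 1x1 bottleneck
                nn.Conv2d(64, 96, kernel_size=5, padding=2))         # 5x5 conv, 96
            self.branch_pool = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),    # 3x3 pool
                nn.Conv2d(in_ch, 64, kernel_size=1))                 # 1x1 conv, 64

        def forward(self, x):
            # filter concatenation along depth: 128 + 192 + 96 + 64 = 480
            return torch.cat([self.branch1(x), self.branch3(x),
                              self.branch5(x), self.branch_pool(x)], dim=1)

    print(InceptionModule()(torch.randn(1, 256, 28, 28)).shape)  # [1, 480, 28, 28]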
Case Study: GoogLeNet
[Szegedy et al., 2014]
Full GoogLeNet architecture:
- Stem network at the start: Conv-Pool-2x Conv-Pool
- Stacked Inception modules in the middle
- Classifier output at the end (removed expensive FC layers!)
- 22 total layers with weights (including each parallel layer in an Inception module)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
“Revolution of Depth”
Case Study: ResNet
[He et al., 2015]
Very deep networks using residual connections:
- 152-layer model for ImageNet
- ILSVRC'15 classification winner (3.57% top-5 error)
- Swept all classification and detection competitions in ILSVRC'15 and COCO'15
[Figure: a residual block computes F(x) + x via a conv-relu-conv path F(x) plus an identity shortcut x; the full network stacks many 3x3 conv blocks (64, 128, ... filters) from the Input and a 7x7 conv, 64, /2 stem up to the Softmax.]
Case Study: ResNet
[He et al., 2015]
What happens when we keep stacking layers in a "plain" convolutional network?
[Plots: training error and test error vs. iterations for 20-layer and 56-layer plain nets; the 56-layer net shows higher error than the 20-layer net on both.]
The 56-layer net does worse on both training and test error, so the degradation is not caused by overfitting.
Case Study: ResNet
[He et al., 2015]
Hypothesis: this is an optimization problem; deeper models are simply harder to optimize. A deeper model should be able to perform at least as well as a shallower one, e.g. by copying the shallower model's learned layers and setting the extra layers to the identity.
Case Study: ResNet
[He et al., 2015]
Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping.
H(x) = F(x) + x: use the layers to fit the residual F(x) = H(x) - x instead of fitting H(x) directly.
[Figure: "plain" layers stack conv-relu-conv to output H(x); a residual block passes x through conv-relu-conv to get F(x), adds the identity shortcut x, and applies a relu to F(x) + x.]
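As a PyTorch sketch, a basic residual block is only a few lines (batch normalization, which the paper applies after each conv, is included):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Basic two-layer residual block: out = relu(F(x) + x)."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))  # first 3x3 conv + relu
            out = self.bn2(self.conv2(out))        # second 3x3 conv
            return F.relu(out + x)                 # add identity shortcut, then relu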
Case Study: ResNet
[He et al., 2015]
Full ResNet architecture:
- Stack residual blocks
- Every residual block has two 3x3 conv layers
- Periodically double the number of filters and downsample spatially using stride 2 (/2 in each dimension), e.g. a 3x3 conv, 128, /2 block following a stage of 3x3 conv, 64 blocks
- Additional conv layer at the beginning (7x7 conv, 64, /2)
- No FC layers at the end besides the FC 1000 that outputs the classes
[Diagram: Input -> 7x7 conv, 64, /2 -> Pool -> stages of 3x3 conv residual blocks (64, 128, ..., 512 filters, with stride-2 downsampling between stages) -> Pool -> FC 1000 -> Softmax]
Case Study: ResNet
[He et al., 2015]
For deeper networks (ResNet-50+), use a "bottleneck" layer to improve efficiency (similar to GoogLeNet):
- 1x1 conv, 64 filters reduces the 28x28x256 input to 28x28x64
- 3x3 conv operates over only 64 feature maps
- 1x1 conv, 256 filters projects back to 256 feature maps (28x28x256 output)
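The corresponding PyTorch sketch (ReLUs after each conv; the shortcut is the plain identity since input and output depths match):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BottleneckBlock(nn.Module):
        """ResNet bottleneck block: 1x1 reduce -> 3x3 -> 1x1 restore, plus shortcut."""
        def __init__(self, channels=256, bottleneck=64):
            super().__init__()
            self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
            self.conv3 = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1)
            self.restore = nn.Conv2d(bottleneck, channels, kernel_size=1)

        def forward(self, x):
            out = F.relu(self.reduce(x))   # 28x28x256 -> 28x28x64
            out = F.relu(self.conv3(out))  # 3x3 conv over only 64 feature maps
            out = self.restore(out)        # project back to 256 feature maps
            return F.relu(out + x)         # residual connection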
Case Study: ResNet
[He et al., 2015]
Experimental results:
- Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR)
- Deeper networks now achieve lower training error, as expected
- Swept 1st place in all ILSVRC and COCO 2015 competitions
- ILSVRC 2015 classification winner (3.6% top-5 error): better than "human performance"! (Russakovsky 2014)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
Comparing complexity...
[Figures comparing the accuracy, operation counts, and memory footprints of the architectures. Copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.]
- Inception-v4: ResNet + Inception!
- VGG: highest memory, most operations
- GoogLeNet: most efficient
- AlexNet: smaller compute, but still memory heavy and lower accuracy
- ResNet: moderate efficiency depending on model, highest accuracy

Forward pass time and power consumption
[Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.]
Other architectures to know...
Network in Network (NiN)
[Lin et al. 2014]
- A "micronetwork" (a multilayer perceptron, i.e. 1x1 conv layers) inside each conv layer computes more abstract features for local patches
- Precursor to the GoogLeNet and ResNet "bottleneck" layers
Improving ResNets...
Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)
[Xie et al. 2016]
- Increases the width of the residual block through multiple parallel pathways ("cardinality", e.g. 32)
- Parallel pathways are similar in spirit to the Inception module
[Figure: a 256-d bottleneck residual block (1x1 conv, 64 -> 3x3 conv, 64 -> 1x1 conv back to 256-d) next to a ResNeXt block with 32 parallel pathways of 1x1 conv, 4 -> 3x3 conv, 4 -> 1x1 conv, summed into the 256-d output.]
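In code, the 32 parallel pathways reduce to a single grouped convolution; a minimal sketch assuming the 32x4d variant (cardinality 32, width 4 per pathway):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResNeXtBlock(nn.Module):
        def __init__(self, channels=256, cardinality=32, width=4):
            super().__init__()
            inner = cardinality * width                    # 32 * 4 = 128
            self.reduce = nn.Conv2d(channels, inner, kernel_size=1)
            self.grouped = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                                     groups=cardinality)   # 32 parallel 3x3 paths
            self.restore = nn.Conv2d(inner, channels, kernel_size=1)

        def forward(self, x):
            out = F.relu(self.reduce(x))
            out = F.relu(self.grouped(out))
            return F.relu(self.restore(out) + x)           # residual connection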
Improving ResNets...
Deep Networks with Stochastic Depth
[Huang et al. 2016]
- Randomly drop a subset of layers during each training pass, bypassing them with the identity function; use the full deep network at test time
Beyond ResNets...
FractalNet: Ultra-Deep Neural Networks without Residuals
[Larsson et al. 2017]
Beyond ResNets...
Densely Connected Convolutional Networks (DenseNet)
[Huang et al. 2017]
- Dense blocks in which each layer is connected to every other layer in a feedforward fashion
- Alleviates vanishing gradient, strengthens feature propagation, encourages feature reuse
[Figure: dense blocks of conv layers, each taking the concatenation of all previous layers' outputs as input, joined by conv/pool transition layers from Input to Softmax.]
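A sketch of the dense connectivity (the growth rate and depth here are illustrative choices, not the paper's settings):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseBlock(nn.Module):
        def __init__(self, in_ch=64, growth_rate=12, num_layers=4):
            super().__init__()
            # layer i sees the block input plus all i previous layers' outputs
            self.layers = nn.ModuleList([
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1)
                for i in range(num_layers)])

        def forward(self, x):
            features = [x]
            for conv in self.layers:
                out = conv(F.relu(torch.cat(features, dim=1)))  # concat all previous
                features.append(out)
            return torch.cat(features, dim=1)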
Efficient networks...
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
[Iandola et al. 2017]
- "Fire" modules: a squeeze layer of 1x1 filters feeding an expand layer of 1x1 and 3x3 filters
Summary: CNN Architectures
Case Studies
- AlexNet
- VGG
- GoogLeNet
- ResNet
Also:
- NiN (Network in Network)
- Wide ResNet
- ResNeXt
- Stochastic Depth
- DenseNet
- FractalNet
- SqueezeNet
Summary: CNN Architectures
- VGG, GoogLeNet, and ResNet are all in wide use and available in model zoos
- ResNet is the current best default
- Trend towards extremely deep networks
- Significant research centers on the design of layer/skip connections and on improving gradient flow
- An even more recent trend examines the necessity of depth vs. width, and of residual connections