Deep Residual Learning
Kaiming He
with Xiangyu Zhang, Shaoqing Ren, Jifeng Dai, & Jian Sun
Microsoft Research Asia (MSRA)
MSRA @ ILSVRC & COCO 2015 Competitions
1st places in all five main tracks
ImageNet Classification: "ultra-deep" (quoting Yann LeCun) 152-layer nets
ImageNet Detection: 16% better than 2nd
ImageNet Localization: 27% better than 2nd
COCO Detection: 11% better than 2nd
COCO Segmentation: 12% better than 2nd
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. arXiv 2015.
Revolution of Depth
[Figure: ILSVRC top-5 error (%) vs. year and depth: shallow nets at 28.2 and 25.8; 8-layer nets at 16.4 (AlexNet, 2012) and 11.7 (2013); 19-layer VGG at 7.3 and 22-layer GoogLeNet at 6.7 (2014); 152-layer ResNet at 3.57 (2015)]
Revolution of Depth
AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000
Revolution of Depth
[Figure: GoogLeNet, 22 layers (ILSVRC 2014): 7x7+2(S) conv, LocalRespNorm, 3x3+1(S) conv, 3x3+2(S) MaxPool, then stacked Inception modules (DepthConcat of 1x1, 3x3, 5x5 convs and pooling), ending in 7x7+1(V) AveragePool, FC, and softmax, with auxiliary softmax classifiers partway down the stack]
Revolution of Depth
[Figure: ResNet architecture: 7x7 conv, 64, /2, pool/2, followed by a long stack of repeated 1x1 conv, 64 and 3x3 conv, 64 layers, continuing into 3x3 conv, 128 and deeper stages]
Is learning better networks
as simple as stacking more layers?
Simply stacking layers?
[Figure: CIFAR-10 train error (%) and test error (%) vs. iter. (1e4): the 56-layer plain net has higher error than the 20-layer plain net on both train and test]
Simply stacking layers?
[Figure: error (%) vs. iter. (1e4); solid: test/val, dashed: train. Left, CIFAR-10 plain nets (plain-20/32/44/56): deeper is worse (56-layer above 44-layer above 32-layer above 20-layer). Right, ImageNet-1000 plain nets (plain-18/34): the 34-layer net has higher error than the 18-layer net]
A deeper model and its shallower counterpart
[Figure: a plain net (7x7 conv, 64, /2 followed by stacks of 3x3 conv, 64 ... 3x3 conv, 512, fc 1000) next to a deeper counterpart built by inserting extra 3x3 conv layers]
A richer solution space, yet the solver cannot find the solution when going deeper.
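The construction argument on this slide (a deeper model can in principle represent the shallower one by making the extra layers identities) can be checked with a toy numpy sketch; the dense layers and dimensions here are illustrative, not the actual conv nets:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(0.0, x)

def forward(x, weights):
    """A plain stack: each layer is relu(x @ W)."""
    for W in weights:
        x = relu(x @ W)
    return x

d = 4
shallow = [rng.standard_normal((d, d)) for _ in range(3)]

# Insert identity weight layers: relu(h @ I) = relu(h) = h whenever h >= 0
# (which holds after any relu), so the deeper net computes the same function.
deeper = shallow[:1] + [np.eye(d), np.eye(d)] + shallow[1:]

x = np.abs(rng.standard_normal(d))
assert np.allclose(forward(x, shallow), forward(x, deeper))
```

So the deeper model's solution space contains the shallower solution; the slide's point is that plain-net solvers still fail to find it.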
Deep Residual Learning
Plain net: H(x) is any desired mapping; hope the two stacked weight layers fit H(x).
[Diagram: x → weight layer → relu → weight layer → relu → H(x)]
Deep Residual Learning
Residual net: H(x) is any desired mapping; hope the two weight layers fit the residual F(x), with H(x) = F(x) + x.
[Diagram: x → weight layer → relu → weight layer → (+ identity shortcut from x) → relu → H(x) = F(x) + x]
Deep Residual Learning
F(x) is a residual mapping w.r.t. identity:
If identity were optimal, it is easy to drive the weights to 0 (F(x) = 0).
If the optimal mapping is closer to identity, it is easier to find the small fluctuations.
[Diagram: x → weight layer → relu → weight layer → (+ identity shortcut) → relu → H(x) = F(x) + x]
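The residual block described here can be sketched with a toy fully-connected version in numpy (the actual nets use conv layers; dimensions here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Two weight layers fit the residual F(x); the block outputs relu(F(x) + x)."""
    f = relu(x @ W1) @ W2   # F(x): weight layer -> relu -> weight layer
    return relu(f + x)      # add the identity shortcut, then relu

d = 8
x = rng.standard_normal(d)

# If identity were optimal, setting the weights to 0 gives F(x) = 0,
# so the block reduces to relu(x): the identity on non-negative inputs.
W_zero = np.zeros((d, d))
assert np.allclose(residual_block(x, W_zero, W_zero), relu(x))
```

This is exactly the slide's argument: the shortcut makes "do nothing" trivially learnable, so layers only have to model deviations from identity.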
Related Works Residual Representations
VLAD & Fisher Vector [Jegou et al 2010], [Perronnin et al 2007]
Encoding residual vectors; powerful shallower representations.
ResNet
[Figure: a deep plain net vs. the corresponding ResNet: both begin with 7x7 conv, 64, /2 and pool, /2, then stacks of 3x3 conv layers (64 up through 256) and fc 1000; the ResNet adds identity shortcut connections around every two conv layers]
Training
All plain/residual nets are trained from scratch
CIFAR-10 experiments
[Figure: error (%) vs. iter. (1e4); solid: test, dashed: train. Left, CIFAR-10 plain nets (plain-20/32/44/56): deeper is worse, with the 56-layer net the highest. Right, CIFAR-10 ResNets (ResNet-20/32/44/56/110): deeper is better, down through the 110-layer net]
ImageNet experiments
[Figure: error (%) vs. iter. (1e4); solid: test, dashed: train. Left, ImageNet plain nets: plain-34 is worse than plain-18. Right, ImageNet ResNets: ResNet-34 is better than ResNet-18]
ImageNet experiments
A practical design of going deeper: the bottleneck block (for ResNet-50/101/152).
[Figure: two blocks of similar complexity. Left, all-3x3 block on a 64-d input: 3x3, 64 → relu → 3x3, 64 → relu. Right, bottleneck block on a 256-d input: 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256 → relu]
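A quick way to see why the two designs have similar complexity is to count weights; this is a back-of-the-envelope sketch (biases and batch norm omitted), not a figure from the talk:

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k conv from c_in to c_out channels."""
    return k * k * c_in * c_out

# All-3x3 block on a 64-d input: two 3x3 convs.
all_3x3 = conv_params(3, 64, 64) + conv_params(3, 64, 64)

# Bottleneck block on a 256-d input: reduce, 3x3 on the thin 64-d
# representation, then restore to 256-d.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(all_3x3, bottleneck)  # 73728 69632
```

Despite operating on a 4x wider (256-d) input, the bottleneck costs slightly fewer weights than the all-3x3 block, which is what makes the 50/101/152-layer nets affordable.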
ImageNet experiments
Deeper ResNets have lower error.
[Figure: ImageNet error (%) for increasingly deep ResNets: 7.4, 6.7, 6.1, 5.7; the deepest model still has lower time complexity than VGG-16/19]
Features matter. (quoting [Girshick et al. 2014], the R-CNN paper)
[Table: 2015 competition tasks, with the winner (MSRA) and the relative margin over the 2nd place for each task]
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
Object Detection (brief)
The RPN learns proposals with extremely deep nets
We use only 300 proposals (no SS/EB/MCG!)
All are based on CNN features; all are end-to-end (training and/or inference)
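The RPN scores a fixed set of anchor boxes at every position of the conv feature map and keeps only the top proposals. A minimal numpy sketch of anchor generation (the stride, scales, and ratios here are illustrative defaults, not the exact settings used in the competition entries):

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Center one anchor per (scale, ratio) pair at every feature-map cell.

    Returns an (N, 4) array of (x1, y1, x2, y2) boxes in image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = x * stride + stride / 2
            cy = y * stride + stride / 2
            for s in scales:
                for r in ratios:          # r is the aspect ratio h/w
                    w = s / np.sqrt(r)
                    h = s * np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

# A 40x60 feature map (stride 16, roughly a 640x960 image) yields
# 40 * 60 * 9 = 21600 anchors; the RPN scores each of these and keeps
# only the best-scoring proposals (e.g. 300, as on this slide).
boxes = make_anchors(40, 60)
print(boxes.shape)  # (21600, 4)
```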
Our results on COCO: too many objects, let's check carefully!
*the original image is from the COCO dataset
Instance Segmentation (brief)
Solely CNN-based (features matter)
[Figure: Multi-task Network Cascades: input image → CONVs → conv feature map; RoI warping and pooling for each RoI → FCs → box instances (RoIs); masking for each RoI → mask instances; FCs → categorized instances (e.g. person, person, person, horse)]
Jifeng Dai, Kaiming He, & Jian Sun. Instance-aware Semantic Segmentation via Multi-task Network Cascades. arXiv 2015.
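The "RoI warping, pooling" step in the cascade turns each variable-sized region into a fixed-size feature for the FC layers. A toy single-channel numpy stand-in for RoI max pooling (the real layer operates on multi-channel conv features with sub-cell interpolation):

```python
import numpy as np

def roi_max_pool(feat, roi, out=2):
    """Max-pool one RoI of a 2-D feature map into a fixed out x out grid."""
    x1, y1, x2, y2 = roi
    region = feat[y1:y2, x1:x2]
    h, w = region.shape
    ys = np.linspace(0, h, out + 1).astype(int)  # row bin edges
    xs = np.linspace(0, w, out + 1).astype(int)  # column bin edges
    return np.array([[region[ys[i]:ys[i+1], xs[j]:xs[j+1]].max()
                      for j in range(out)] for i in range(out)])

feat = np.arange(36.0).reshape(6, 6)
print(roi_max_pool(feat, (0, 0, 4, 4)))  # [[ 7.  9.] [19. 21.]]
```

Whatever the RoI's size, the output is a fixed out x out grid, which is what lets one set of downstream FC weights serve every instance.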
Conclusions
Features matter!
MSRA team: Kaiming He, Xiangyu Zhang, Shaoqing Ren
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. arXiv 2015.
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
Jifeng Dai, Kaiming He, & Jian Sun. Instance-aware Semantic Segmentation via Multi-task Network Cascades. arXiv 2015.