
STAT 453: Introduction to Deep Learning and Generative Models

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching

Lecture 13
Introduction to Convolutional Neural Networks
with Applications in Python


Sebastian Raschka STAT 453: Intro to Deep Learning 1
Lecture Overview

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 2


Next Lecture

1. Padding (control output size in addition to stride)


2. Spatial Dropout and BatchNorm
3. Considerations for CNNs on GPUs
4. Common Architectures
A. VGG16 (simple, deep CNN)
B. ResNet and skip connections
C. Fully convolutional networks (no fully connected layers)
D. Inception (parallel convolutions and auxiliary losses)
5. Transfer learning

Sebastian Raschka STAT 453: Intro to Deep Learning 3


Common Applications of CNNs

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 4


CNNs for Image Classification

output: p(y=cat)

Image sources: twitter.com%2Fcats&psig=AOvVaw30_o-PCM-K21DiMAJQimQ4&ust=1553887775741551
and https://www.pinterest.com/pin/244742560974520446

Sebastian Raschka STAT 453: Intro to Deep Learning 5


Object Detection

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).

Sebastian Raschka STAT 453: Intro to Deep Learning 6


Object Segmentation


Figure 2. Mask R-CNN results on the COCO test set. These results are based on ResNet-101 [15], achieving a mask AP of 35.7 and
running at 5 fps. Masks are shown in color, and bounding box, category, and confidences are also shown.

He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969. 2017.

Sebastian Raschka STAT 453: Intro to Deep Learning 7
Face Recognition

x^[1], x^[2] → Similarity/Distance Score
Sebastian Raschka STAT 453: Intro to Deep Learning 8


Image Synthesis

Noise → Generator → Generated image
Training set / Generated image → Discriminator → Real / Generated

Sebastian Raschka STAT 453: Intro to Deep Learning 9


Why Computer Vision is (was) Hard:
Challenges with Image Classification
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 10


Why Image Classification is Hard

Different lighting, contrast, viewpoints, etc.

Image sources: twitter.com%2Fcats&psig=AOvVaw30_o-PCM-K21DiMAJQimQ4&ust=1553887775741551
and https://www.123rf.com/photo_76714328_side-view-of-tabby-cat-face-over-white.html

Or even simple translation. This is hard for traditional methods like multi-layer perceptrons, because the prediction is basically based on a sum of pixel intensities.

Sebastian Raschka STAT 453: Intro to Deep Learning 11


Traditional Approaches

a) Use hand-engineered features

Sebastian Raschka STAT 453: Intro to Deep Learning 12


Traditional Approaches

a) Use hand-engineered features

Sasaki, K., Hashimoto, M., & Nagata, N. (2016). Person Invariant Classification of Subtle Facial Expressions Using Coded Movement Direction of
Keypoints. In Video Analytics. Face and Facial Expression Recognition and Audience Measurement (pp. 61-72). Springer, Cham.

Sebastian Raschka STAT 453: Intro to Deep Learning 13


Traditional Approaches

b) Preprocess images (centering, cropping, etc.)

Image Source: https://www.tokkoro.com/2827328-cat-animals-nature-feline-park-green-trees-grass.html

Sebastian Raschka STAT 453: Intro to Deep Learning 14


The Main Concepts Behind
Convolutional Neural Networks
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 15


https://sgfin.github.io/2020/06/22/Induction-Intro/

Sebastian Raschka STAT 453: Intro to Deep Learning 16


Convolutional Neural Networks

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to
Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, Winter 1989.

Sebastian Raschka STAT 453: Intro to Deep Learning 17


Convolutional Neural Networks

[LeNet-5 architecture: INPUT 32x32 → Convolutions → C1: feature maps 6@28x28 → Subsampling → S2: f. maps 6@14x14 → Convolutions → C3: f. maps 16@10x10 → Subsampling → S4: f. maps 16@5x5 → Full connection → C5: layer 120 → F6: layer 84 → Gaussian connections → OUTPUT 10]

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.

Sebastian Raschka STAT 453: Intro to Deep Learning 18


Hidden Layers

[LeNet-5 architecture figure, as on the previous slide]

"Automatic feature extractor" (convolution + subsampling layers) | "Regular classifier" (fully connected layers)

Sebastian Raschka STAT 453: Intro to Deep Learning 19


Hidden Layers

[LeNet-5 architecture figure, as on the previous slides]

Each "bunch" of feature maps represents one hidden layer in the neural network.

Counting the FC layers, this network has 5 layers.

Sebastian Raschka STAT 453: Intro to Deep Learning 20


Convolutional Neural Networks
Annotations on the LeNet-5 figure:

• "6@28x28" = number of feature detectors @ size of the resulting layers
• Subsampling is nowadays called "pooling"
• "Feature detectors" (weight matrices) that are being reused ("weight sharing") => also called "kernel" or "filter"
• The final stage is a multi-layer perceptron: basically a fully-connected layer + MSE loss (nowadays it is common to use an fc-layer + softmax + cross entropy)

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.

Sebastian Raschka STAT 453: Intro to Deep Learning 21
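For concreteness, here is a minimal PyTorch sketch of this LeNet-5-style layout. It is a hedged reading of the figure, not the original 1998 implementation: tanh activations and average pooling stand in for the original subsampling scheme, and a plain linear output layer replaces the Gaussian connections.

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> C1: 6@28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> S2: 6@14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> C3: 16@10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> S4: 16@5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),       # C5
            nn.Tanh(),
            nn.Linear(120, 84),               # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),       # OUTPUT
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)               # 16@5x5 -> 400-dim vector
        return self.classifier(x)

model = LeNet5()
print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])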


Main Concepts Behind
Convolutional Neural Networks

• Sparse connectivity: A single element in the feature map is connected to only a small patch of pixels. (This is very different from connecting to the whole input image, as in multi-layer perceptrons.)

• Parameter sharing: The same weights are used for different patches of the input image.

• Many layers: Combining extracted local patterns into global patterns

Sebastian Raschka STAT 453: Intro to Deep Learning 22


A Closer Look at the Convolutional Layer

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 23


Weight Sharing
A "feature detector" ( lter, kernel) slides over the inputs to generate
a feature map

9
X
w j xj
The pixels are j=1
referred to
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>

as "receptive eld"

"feature map"

Rationale: A feature detector that works well in one region


may also work well in another region

Plus, it is a nice reduction in parameters to t

Sebastian Raschka STAT 453: Intro to Deep Learning 24


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 25


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 26


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 27


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 28


Multiple "feature detectors" (kernels) are used to create multiple feature maps:

$\sum_{j=1}^{9} w_j^{(1)} x_j, \qquad \sum_{j=1}^{9} w_j^{(2)} x_j, \qquad \sum_{j=1}^{9} w_j^{(3)} x_j$

Sebastian Raschka STAT 453: Intro to Deep Learning 29


Size Before and After Convolutions

Feature map size:

$O = \frac{W - K + 2P}{S} + 1$

where $W$ = input width, $K$ = kernel width, $P$ = padding, $S$ = stride, and $O$ = output width.

Sebastian Raschka STAT 453: Intro to Deep Learning 30
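As a small sanity check, the formula is easy to turn into a helper function (a sketch; the floor division mirrors how frameworks handle strides that do not divide evenly):

def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) // S + 1."""
    return (w - k + 2 * p) // s + 1

print(conv_output_size(32, 5))       # 28, as in LeNet's C1 layer
print(conv_output_size(28, 5, p=2))  # 28, "same" padding
print(conv_output_size(28, 2, s=2))  # 14, a 2x2 pooling step with stride 2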


Kernel Dimensions and Trainable Parameters

For a grayscale image with a 5x5 feature detector (kernel), we have the following dimensions (number of parameters to learn).

What do you think is the output size for this 28x28 image?

Sebastian Raschka STAT 453: Intro to Deep Learning 31
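PyTorch can answer both questions for us (a sketch with a single input channel and a single output channel, matching the grayscale example):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5)
print(sum(p.numel() for p in conv.parameters()))  # 26 = 5*5 weights + 1 bias

x = torch.randn(1, 1, 28, 28)
print(conv(x).shape)  # torch.Size([1, 1, 24, 24]), since (28 - 5)/1 + 1 = 24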


CNNs and Translation/Rotation/Scale Invariance

Note that CNNs are not really invariant to scale, rotation, translation, etc. The activations are still dependent on the location, etc.

Sebastian Raschka STAT 453: Intro to Deep Learning 32


Pooling Layers Can Help With Local Invariance

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Downside: Information is lost. This may not matter for classification, but it does for applications where relative position is important (like face recognition). In practice for CNNs: some image preprocessing is still recommended.

Sebastian Raschka STAT 453: Intro to Deep Learning 33


Pooling Layers Can Help With Local Invariance

Note that typical pooling layers do not have any learnable parameters

Downside: Information is lost. This may not matter for classification, but it does for applications where relative position is important (like face recognition). In practice for CNNs: some image preprocessing is still recommended.

Sebastian Raschka STAT 453: Intro to Deep Learning 34
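Both points are easy to verify in PyTorch (a toy sketch: a single bright pixel shifted by one position still lands in the same pooled cell):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
print(list(pool.parameters()))  # [] -- no learnable parameters

x = torch.zeros(1, 1, 4, 4)
x[0, 0, 0, 0] = 1.0
x_shifted = torch.roll(x, shifts=1, dims=3)   # move the pattern one pixel right
print(torch.equal(pool(x), pool(x_shifted)))  # True -> local translation invariance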


Why are Convolutional Nets
Using Cross-Correlation?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 35


Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation


Cross-correlation is our sliding dot product over the image

$Z[i, j]$ — the "feature map"

Sebastian Raschka STAT 453: Intro to Deep Learning 36


Cross-Correlation vs Convolution

$Z[i, j]$ — the "feature map"

Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Sebastian Raschka STAT 453: Intro to Deep Learning 37
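A direct translation of this sliding dot product (a sketch using 0-based offsets instead of the centered -k..k indexing; note that torch.nn.functional.conv2d computes exactly this cross-correlation):

import torch
import torch.nn.functional as F

def cross_correlate2d(A, K):
    h, w = K.shape
    out_h, out_w = A.shape[0] - h + 1, A.shape[1] - w + 1
    Z = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            Z[i, j] = (A[i:i + h, j:j + w] * K).sum()  # sliding dot product
    return Z

A = torch.arange(16.0).reshape(4, 4)
K = torch.tensor([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(A, K))
print(F.conv2d(A.view(1, 1, 4, 4), K.view(1, 1, 2, 2)).squeeze())  # same result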


Cross-Correlation vs Convolution

Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Looping direction over the kernel offsets (u, v), indicated via the red numbers:

1) (-1,-1)  2) (-1,0)  3) (-1,1)
4) (0,-1)   5) (0,0)   6) (0,1)
7) (1,-1)   8) (1,0)   9) (1,1)
Sebastian Raschka STAT 453: Intro to Deep Learning 38


Cross-Correlation vs Convolution
Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Convolution:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i-u, j-v], \qquad Z = K * A$

Basically, we are flipping the kernel (or the receptive field) horizontally and vertically; the looping direction is reversed:

9) (-1,-1)  8) (-1,0)  7) (-1,1)
6) (0,-1)   5) (0,0)   4) (0,1)
3) (1,-1)   2) (1,0)   1) (1,1)

Sebastian Raschka STAT 453: Intro to Deep Learning 39
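The relationship is easy to check numerically (a sketch: "real" convolution equals cross-correlation with a spatially flipped kernel):

import torch
import torch.nn.functional as F

A = torch.randn(1, 1, 5, 5)
K = torch.randn(1, 1, 3, 3)

cross_corr = F.conv2d(A, K)                          # what DL frameworks compute
true_conv = F.conv2d(A, torch.flip(K, dims=[2, 3]))  # flipped kernel -> "real" convolution
print(torch.allclose(cross_corr, true_conv))         # False for a generic (asymmetric) kernel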


Sebastian Raschka STAT 453: Intro to Deep Learning 40
Sebastian Raschka STAT 453: Intro to Deep Learning 41
Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation

"Real" convolution has the nice associative property:


$(A * B) * C = A * (B * C)$

In DL, we usually don't care about that (as opposed to many traditional
computer vision and signal processing applications).

Also, cross-correlation is easier to implement.

Maybe the term "convolution" for cross-correlation became popular because "Cross-Correlational Neural Network" sounds weird ;)

Sebastian Raschka STAT 453: Intro to Deep Learning 42


How does Backpropagation
Work in CNNs?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 43


Backpropagation in CNNs

Same overall concept as before: the multivariable chain rule, but now with an additional weight-sharing constraint.

Sebastian Raschka STAT 453: Intro to Deep Learning 44


Remember Lecture 6? Graph with Weight
Sharing
Forward pass with a shared weight $w_1$ feeding two paths:

$z_1 = w_1 \cdot x_1, \qquad z_2 = w_1 \cdot x_2$
$a_1 = \sigma_1(z_1), \qquad a_2 = \sigma_2(z_2)$
$o = \sigma_3(a_1, a_2), \qquad l = L(y, o)$

$\frac{\partial l}{\partial w_1} = \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_1} \cdot \frac{\partial a_1}{\partial w_1}}_{\text{upper path}} + \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_2} \cdot \frac{\partial a_2}{\partial w_1}}_{\text{lower path}}$ (multivariable chain rule)

Sebastian Raschka STAT 453: Intro to Deep Learning 45


Backpropagation in CNNs

Same overall concept as before: the multivariable chain rule, but now with an additional weight-sharing constraint.

Due to weight sharing: $w_1 = w_2$

Weight update (with optional averaging):

$w_1 := w_2 := w_1 - \eta \cdot \frac{1}{2}\left(\frac{\partial L}{\partial w_1} + \frac{\partial L}{\partial w_2}\right)$
Sebastian Raschka STAT 453: Intro to Deep Learning 46
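PyTorch's autograd handles this constraint automatically: when one tensor is used in several places, the gradient contributions from all paths are summed. A toy sketch:

import torch

w = torch.tensor(0.5, requires_grad=True)  # one weight, used twice
x1, x2 = torch.tensor(1.0), torch.tensor(2.0)

loss = (w * x1).relu() + (w * x2).relu()   # two paths share w
loss.backward()
print(w.grad)  # tensor(3.) = x1 + x2, the sum over both paths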
What Are Some of the Common
CNN Architectures?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 47


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 2 from the paper: an illustration of the two-GPU AlexNet architecture. The network's input is 150,528-dimensional.]

AlexNet achieved a 15.4% top-5 error in 2012. The 2nd best entry was not even close: 26.2%. (Nowadays, models reach ~3% error on ImageNet.)

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Sebastian Raschka STAT 453: Intro to Deep Learning 48
Main Breakthrough for CNNs:
AlexNet & ImageNet

Stevens, Eli, Luca Antiga, and Thomas Viehmann. Deep learning with PyTorch. Manning Publications, 2020

Sebastian Raschka STAT 453: Intro to Deep Learning 49


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 3 from the paper: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images.]

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255).

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 50
Main Breakthrough for CNNs:
AlexNet & ImageNet
The ImageNet set that was used has ~1.2 million images and 1000 classes.

Accuracy is measured as top-5 performance: a prediction is correct if the true label matches one of the top 5 predictions of the model.

[Figure 4 (left) from the paper: eight ILSVRC-2010 test images and the five labels considered most probable by the model.]

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 51
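Top-5 accuracy is straightforward to compute in PyTorch (a sketch with random logits standing in for model outputs):

import torch

def top5_accuracy(logits, targets):
    top5 = logits.topk(k=5, dim=1).indices  # (batch, 5) class indices
    return (top5 == targets.unsqueeze(1)).any(dim=1).float().mean()

logits = torch.randn(8, 1000)               # e.g., 1000 ImageNet classes
targets = torch.randint(0, 1000, (8,))
print(top5_accuracy(logits, targets))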
Main Breakthrough for CNNs:
AlexNet & ImageNet
Note that the actual network inputs were still 224x224 images (random crops from downsampled 256x256 images).

224x224 is still a good/reasonable size today (224*224*3 = 150,528 features).

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 52
Common CNN Architectures

[Figure 1 from the paper: single-crop top-1 validation accuracy for top-scoring single-model architectures. Figure 2: top-1 accuracy vs. amount of operations required for a single forward pass; blob size is proportional to the number of network parameters, from 5×10^6 to 155×10^6.]

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

Sebastian Raschka STAT 453: Intro to Deep Learning 53
Convolutions with Color Channels

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Image dimension: $X \in \mathbb{R}^{n_1 \times n_2 \times c_{in}}$ in NHWC format; CUDA & PyTorch use NCHW.

Sebastian Raschka STAT 453: Intro to Deep Learning 54
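Converting between the two memory layouts is a one-liner (a sketch; permute only changes strides, so a .contiguous() call materializes the new layout if needed):

import torch

x_nhwc = torch.randn(16, 224, 224, 3)  # batch, height, width, channels
x_nchw = x_nhwc.permute(0, 3, 1, 2)    # PyTorch convolutions expect NCHW
print(x_nchw.shape)                    # torch.Size([16, 3, 224, 224])
x_nchw = x_nchw.contiguous()           # materialize the permuted layout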


Interpreting (the Hidden)
ConvNet Layers
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 55


What a CNN Can See
Simple example: vertical edge detector

(From classical computer vision research)

Sebastian Raschka STAT 453: Intro to Deep Learning 56


What a CNN Can See
Simple example: vertical edge detector

Sebastian Raschka STAT 453: Intro to Deep Learning 57
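Such a hand-crafted filter can be applied with a single conv2d call (a sketch using a Sobel-style vertical edge detector; a CNN would learn kernels like this from data instead):

import torch
import torch.nn.functional as F

kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).view(1, 1, 3, 3)

img = torch.zeros(1, 1, 8, 8)
img[..., :, 4:] = 1.0                    # left half dark, right half bright
edges = F.conv2d(img, kernel, padding=1)
print(edges[0, 0])                       # strong response along the vertical edge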


What a CNN Can See
Simple example: horizontal edge detector

A CNN can learn whatever it finds best based on optimizing the objective (e.g., minimizing a particular loss to achieve good classification accuracy).

Sebastian Raschka STAT 453: Intro to Deep Learning 58


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 3 from Krizhevsky et al. (2012): 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images.]

Sebastian Raschka STAT 453: Intro to Deep Learning 59
What a CNN Can See

Which patterns from the training set activate the feature map?

[Fig. 4 from the paper: evolution of a randomly chosen subset of model features through training, one block per layer (layers 1-5), shown at epochs 1, 2, 5, 10, 20, 30, 40, 64. The visualization shows the strongest activation (across all training examples) for a given feature map, projected down to pixel space using the deconvnet approach.]

Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Method: backpropagate strong activation signals in hidden layers to the input images, then apply "unpooling" to map the values to the original pixel space for visualization.

Sebastian Raschka STAT 453: Intro to Deep Learning 60


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Layer 1

Layer 2

Sebastian Raschka STAT 453: Intro to Deep Learning 61


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
Layer 2

Layer 3

Sebastian Raschka STAT 453: Intro to Deep Learning 62


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Layer 3, Layer 4, Layer 5

[Fig. 2 from the paper: visualization of features in a fully trained model.]
Sebastian Raschka STAT 453: Intro to Deep Learning 63
https://thegradient.pub/a-visual-history-of-interpretation-for-image-recognition/

Sebastian Raschka STAT 453: Intro to Deep Learning 64


Let's Implement CNNs in
PyTorch
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 65
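As a preview, a minimal end-to-end sketch (assumed shapes: MNIST-style 1x28x28 inputs and 10 classes; an illustrative toy model, not the lecture's official notebook code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 1@28x28 -> 8@14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16@7x7
        return self.fc(torch.flatten(x, 1))         # logits for cross-entropy

model = SimpleCNN()
logits = model(torch.randn(32, 1, 28, 28))
loss = F.cross_entropy(logits, torch.randint(0, 10, (32,)))
loss.backward()  # gradients flow through the shared convolution weights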

