
STAT 453: Introduction to Deep Learning and Generative Models

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching

Lecture 13
Introduction to Convolutional Neural Networks
with Applications in Python


Sebastian Raschka STAT 453: Intro to Deep Learning 1
Lecture Overview

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 2


Next Lecture

1. Padding (control output size in addition to stride)


2. Spatial Dropout and BatchNorm
3. Considerations for CNNs on GPUs
4. Common Architectures
A. VGG16 (simple, deep CNN)
B. ResNet and skip connections
C. Fully convolutional networks (no fully connected layers)
D. Inception (parallel convolutions and auxiliary losses)
5. Transfer learning

Sebastian Raschka STAT 453: Intro to Deep Learning 3


Common Applications of CNNs

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 4


CNNs for Image Classification

output: p(y=cat)

Image sources: twitter.com%2Fcats&psig=AOvVaw30_o-PCM-K21DiMAJQimQ4&ust=1553887775741551
and https://www.pinterest.com/pin/244742560974520446

Sebastian Raschka STAT 453: Intro to Deep Learning 5


Object Detection

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).

Sebastian Raschka STAT 453: Intro to Deep Learning 6


Object Segmentation


Figure 2. Mask R-CNN results on the COCO test set. These results are based on ResNet-101 [15], achieving a mask AP of 35.7 and
running at 5 fps. Masks are shown in color, and bounding box, category, and confidences are also shown.

He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969. 2017.

Sebastian Raschka STAT 453: Intro to Deep Learning 7
Face Recognition

x^[1], x^[2] → Similarity/Distance Score
Sebastian Raschka STAT 453: Intro to Deep Learning 8


Image Synthesis

Noise → Generator → Generated image
Training set / Generated image → Discriminator → Real / Generated

Sebastian Raschka STAT 453: Intro to Deep Learning 9


Why Computer Vision is (was) Hard:
Challenges with Image Classification
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 10


Why Image Classification is Hard

Different lighting, contrast, viewpoints, etc.

Image sources: twitter.com%2Fcats&psig=AOvVaw30_o-PCM-K21DiMAJQimQ4&ust=1553887775741551
and https://www.123rf.com/photo_76714328_side-view-of-tabby-cat-face-over-white.html

Or even simple translation. This is hard for traditional methods like multi-layer perceptrons, because the prediction is basically based on a sum of pixel intensities.

Sebastian Raschka STAT 453: Intro to Deep Learning 11


Traditional Approaches

a) Use hand-engineered features

Sebastian Raschka STAT 453: Intro to Deep Learning 12


Traditional Approaches

a) Use hand-engineered features

Sasaki, K., Hashimoto, M., & Nagata, N. (2016). Person Invariant Classification of Subtle Facial Expressions Using Coded Movement Direction of
Keypoints. In Video Analytics. Face and Facial Expression Recognition and Audience Measurement (pp. 61-72). Springer, Cham.

Sebastian Raschka STAT 453: Intro to Deep Learning 13


Traditional Approaches

b) Preprocess images (centering, cropping, etc.)

Image Source: https://www.tokkoro.com/2827328-cat-animals-nature-feline-park-green-trees-grass.html

Sebastian Raschka STAT 453: Intro to Deep Learning 14


The Main Concepts Behind
Convolutional Neural Networks
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 15


https://sgfin.github.io/2020/06/22/Induction-Intro/

Sebastian Raschka STAT 453: Intro to Deep Learning 16


Convolutional Neural Networks

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to
Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, Winter 1989.

Sebastian Raschka STAT 453: Intro to Deep Learning 17


Convolutional Neural Networks

[LeNet-5 architecture: INPUT 32x32 → Convolutions → C1: feature maps 6@28x28 → Subsampling → S2: f. maps 6@14x14 → Convolutions → C3: f. maps 16@10x10 → Subsampling → S4: f. maps 16@5x5 → Full connection → C5: layer 120 → F6: layer 84 → Gaussian connections → OUTPUT 10]

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.

Sebastian Raschka STAT 453: Intro to Deep Learning 18


Hidden Layers

[LeNet-5 architecture figure, as on the previous slide]

"Automatic feature extractor" (convolution + subsampling layers) | "Regular classifier" (fully connected layers)

Sebastian Raschka STAT 453: Intro to Deep Learning 19


Hidden Layers

[LeNet-5 architecture figure, as on the previous slides]

Each "bunch" of feature maps represents one hidden layer in the neural network.

Counting the FC layers, this network has 5 layers.

Sebastian Raschka STAT 453: Intro to Deep Learning 20


Convolutional Neural Networks
Annotations on the LeNet-5 figure:

• "6@28x28" = number of feature detectors @ size of the resulting layers
• Subsampling is nowadays called "pooling"
• "Feature detectors" (weight matrices) that are being reused ("weight sharing") => also called "kernel" or "filter"
• The final stage is a multi-layer perceptron: basically a fully-connected layer + MSE loss (nowadays it is common to use an fc-layer + softmax + cross entropy)

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.

Sebastian Raschka STAT 453: Intro to Deep Learning 21
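For concreteness, here is a minimal PyTorch sketch of this LeNet-5-style layout. It is a hedged reading of the figure, not the original 1998 implementation: tanh activations and average pooling stand in for the original subsampling scheme, and a plain linear output layer replaces the Gaussian connections.

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> C1: 6@28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> S2: 6@14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> C3: 16@10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> S4: 16@5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),       # C5
            nn.Tanh(),
            nn.Linear(120, 84),               # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),       # OUTPUT
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)               # 16@5x5 -> 400-dim vector
        return self.classifier(x)

model = LeNet5()
print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])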


Main Concepts Behind
Convolutional Neural Networks

• Sparse connectivity: A single element in the feature map is connected to only a small patch of pixels. (This is very different from connecting to the whole input image, as in multi-layer perceptrons.)

• Parameter sharing: The same weights are used for different patches of the input image.

• Many layers: Combining extracted local patterns into global patterns

Sebastian Raschka STAT 453: Intro to Deep Learning 22


A Closer Look at the Convolutional Layer

1. What CNNs Can Do


2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 23


Weight Sharing
A "feature detector" ( lter, kernel) slides over the inputs to generate
a feature map

9
X
w j xj
The pixels are j=1
referred to
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>

as "receptive eld"

"feature map"

Rationale: A feature detector that works well in one region


may also work well in another region

Plus, it is a nice reduction in parameters to t

Sebastian Raschka STAT 453: Intro to Deep Learning 24


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 25


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 26


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 27


Weight Sharing
A "feature detector" (kernel) slides over the inputs to generate
a feature map

9
X
w j xj
<latexit sha1_base64="A0KexUBWYzFCrOQ6nv7KbgccmW0=">AAAB/3icbVDLSgMxFM34rPU1KrhxEyyCqzJTBXUhFN24rGAf0I5DJk3btElmSDJqGWfhr7hxoYhbf8Odf2PazkJbD1w4nHMv994TRIwq7Tjf1tz8wuLScm4lv7q2vrFpb23XVBhLTKo4ZKFsBEgRRgWpaqoZaUSSIB4wUg8GlyO/fkekoqG40cOIeBx1Be1QjLSRfHu3pWLuJ/1zN71NzlJ47/fhg9/37YJTdMaAs8TNSAFkqPj2V6sd4pgToTFDSjVdJ9JegqSmmJE034oViRAeoC5pGioQJ8pLxven8MAobdgJpSmh4Vj9PZEgrtSQB6aTI91T095I/M9rxrpz6iVURLEmAk8WdWIGdQhHYcA2lQRrNjQEYUnNrRD3kERYm8jyJgR3+uVZUisV3aNi6fq4UL7I4siBPbAPDoELTkAZXIEKqAIMHsEzeAVv1pP1Yr1bH5PWOSub2QF/YH3+AHSflbs=</latexit>
j=1

Sebastian Raschka STAT 453: Intro to Deep Learning 28


Multiple "feature detectors" (kernels) are used to create multiple feature maps:

$\sum_{j=1}^{9} w_j^{(1)} x_j, \qquad \sum_{j=1}^{9} w_j^{(2)} x_j, \qquad \sum_{j=1}^{9} w_j^{(3)} x_j$

Sebastian Raschka STAT 453: Intro to Deep Learning 29


Size Before and After Convolutions

Feature map size:

$O = \frac{W - K + 2P}{S} + 1$

where $W$ = input width, $K$ = kernel width, $P$ = padding, $S$ = stride, and $O$ = output width.

Sebastian Raschka STAT 453: Intro to Deep Learning 30
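As a small sanity check, the formula is easy to turn into a helper function (a sketch; the floor division mirrors how frameworks handle strides that do not divide evenly):

def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) // S + 1."""
    return (w - k + 2 * p) // s + 1

print(conv_output_size(32, 5))       # 28, as in LeNet's C1 layer
print(conv_output_size(28, 5, p=2))  # 28, "same" padding
print(conv_output_size(28, 2, s=2))  # 14, a 2x2 pooling step with stride 2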


Kernel Dimensions and Trainable Parameters

For a grayscale image with a 5x5 feature detector (kernel), we have the following dimensions (number of parameters to learn).

What do you think is the output size for this 28x28 image?

Sebastian Raschka STAT 453: Intro to Deep Learning 31
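PyTorch can answer both questions for us (a sketch with a single input channel and a single output channel, matching the grayscale example):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5)
print(sum(p.numel() for p in conv.parameters()))  # 26 = 5*5 weights + 1 bias

x = torch.randn(1, 1, 28, 28)
print(conv(x).shape)  # torch.Size([1, 1, 24, 24]), since (28 - 5)/1 + 1 = 24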


CNNs and Translation/Rotation/Scale Invariance

Note that CNNs are not really invariant to scale, rotation, translation, etc. The activations are still dependent on the location, etc.

Sebastian Raschka STAT 453: Intro to Deep Learning 32


Pooling Layers Can Help With Local Invariance

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Downside: Information is lost. This may not matter for classification, but it does for applications where relative position is important (like face recognition). In practice for CNNs: some image preprocessing is still recommended.

Sebastian Raschka STAT 453: Intro to Deep Learning 33


Pooling Layers Can Help With Local Invariance

Note that typical pooling layers do not have any learnable parameters

Downside: Information is lost. This may not matter for classification, but it does for applications where relative position is important (like face recognition). In practice for CNNs: some image preprocessing is still recommended.

Sebastian Raschka STAT 453: Intro to Deep Learning 34
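Both points are easy to verify in PyTorch (a toy sketch: a single bright pixel shifted by one position still lands in the same pooled cell):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
print(list(pool.parameters()))  # [] -- no learnable parameters

x = torch.zeros(1, 1, 4, 4)
x[0, 0, 0, 0] = 1.0
x_shifted = torch.roll(x, shifts=1, dims=3)   # move the pattern one pixel right
print(torch.equal(pool(x), pool(x_shifted)))  # True -> local translation invariance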


Why are Convolutional Nets
Using Cross-Correlation?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 35


Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation


Cross-correlation is our sliding dot product over the image

$Z[i, j]$ — the "feature map"

Sebastian Raschka STAT 453: Intro to Deep Learning 36


Cross-Correlation vs Convolution

$Z[i, j]$ — the "feature map"

Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Sebastian Raschka STAT 453: Intro to Deep Learning 37
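A direct translation of this sliding dot product (a sketch using 0-based offsets instead of the centered -k..k indexing; note that torch.nn.functional.conv2d computes exactly this cross-correlation):

import torch
import torch.nn.functional as F

def cross_correlate2d(A, K):
    h, w = K.shape
    out_h, out_w = A.shape[0] - h + 1, A.shape[1] - w + 1
    Z = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            Z[i, j] = (A[i:i + h, j:j + w] * K).sum()  # sliding dot product
    return Z

A = torch.arange(16.0).reshape(4, 4)
K = torch.tensor([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(A, K))
print(F.conv2d(A.view(1, 1, 4, 4), K.view(1, 1, 2, 2)).squeeze())  # same result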


Cross-Correlation vs Convolution

Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Looping direction over the kernel offsets (u, v), indicated via the red numbers:

1) (-1,-1)  2) (-1,0)  3) (-1,1)
4) (0,-1)   5) (0,0)   6) (0,1)
7) (1,-1)   8) (1,0)   9) (1,1)
Sebastian Raschka STAT 453: Intro to Deep Learning 38


Cross-Correlation vs Convolution
Cross-Correlation:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i+u, j+v], \qquad Z = K \otimes A$

Convolution:

$Z[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u,v] \, A[i-u, j-v], \qquad Z = K * A$

Basically, we are flipping the kernel (or the receptive field) horizontally and vertically; the looping direction is reversed:

9) (-1,-1)  8) (-1,0)  7) (-1,1)
6) (0,-1)   5) (0,0)   4) (0,1)
3) (1,-1)   2) (1,0)   1) (1,1)

Sebastian Raschka STAT 453: Intro to Deep Learning 39
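The relationship is easy to check numerically (a sketch: "real" convolution equals cross-correlation with a spatially flipped kernel):

import torch
import torch.nn.functional as F

A = torch.randn(1, 1, 5, 5)
K = torch.randn(1, 1, 3, 3)

cross_corr = F.conv2d(A, K)                          # what DL frameworks compute
true_conv = F.conv2d(A, torch.flip(K, dims=[2, 3]))  # flipped kernel -> "real" convolution
print(torch.allclose(cross_corr, true_conv))         # False for a generic (asymmetric) kernel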


Sebastian Raschka STAT 453: Intro to Deep Learning 40
Sebastian Raschka STAT 453: Intro to Deep Learning 41
Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation

"Real" convolution has the nice associative property:


$(A * B) * C = A * (B * C)$

In DL, we usually don't care about that (as opposed to many traditional
computer vision and signal processing applications).

Also, cross-correlation is easier to implement.

Maybe the term "convolution" for cross-correlation became popular because "Cross-Correlational Neural Network" sounds weird ;)

Sebastian Raschka STAT 453: Intro to Deep Learning 42


How does Backpropagation
Work in CNNs?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 43


Backpropagation in CNNs

Same overall concept as before: the multivariable chain rule, but now with an additional weight-sharing constraint.

Sebastian Raschka STAT 453: Intro to Deep Learning 44


Remember Lecture 6? Graph with Weight
Sharing
Forward pass with a shared weight $w_1$ feeding two paths:

$z_1 = w_1 \cdot x_1, \qquad z_2 = w_1 \cdot x_2$
$a_1 = \sigma_1(z_1), \qquad a_2 = \sigma_2(z_2)$
$o = \sigma_3(a_1, a_2), \qquad l = L(y, o)$

$\frac{\partial l}{\partial w_1} = \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_1} \cdot \frac{\partial a_1}{\partial w_1}}_{\text{upper path}} + \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_2} \cdot \frac{\partial a_2}{\partial w_1}}_{\text{lower path}}$ (multivariable chain rule)

Sebastian Raschka STAT 453: Intro to Deep Learning 45


Backpropagation in CNNs

Same overall concept as before: the multivariable chain rule, but now with an additional weight-sharing constraint.

Due to weight sharing: $w_1 = w_2$

Weight update (with optional averaging):

$w_1 := w_2 := w_1 - \eta \cdot \frac{1}{2}\left(\frac{\partial L}{\partial w_1} + \frac{\partial L}{\partial w_2}\right)$
Sebastian Raschka STAT 453: Intro to Deep Learning 46
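PyTorch's autograd handles this constraint automatically: when one tensor is used in several places, the gradient contributions from all paths are summed. A toy sketch:

import torch

w = torch.tensor(0.5, requires_grad=True)  # one weight, used twice
x1, x2 = torch.tensor(1.0), torch.tensor(2.0)

loss = (w * x1).relu() + (w * x2).relu()   # two paths share w
loss.backward()
print(w.grad)  # tensor(3.) = x1 + x2, the sum over both paths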
What Are Some of the Common
CNN Architectures?
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 47


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 2 from the paper: an illustration of the two-GPU AlexNet architecture. The network's input is 150,528-dimensional.]

AlexNet achieved a 15.4% top-5 error in 2012. The 2nd best entry was not even close: 26.2%. (Nowadays, models reach ~3% error on ImageNet.)

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Sebastian Raschka STAT 453: Intro to Deep Learning 48
Main Breakthrough for CNNs:
AlexNet & ImageNet

Stevens, Eli, Luca Antiga, and Thomas Viehmann. Deep learning with PyTorch. Manning Publications, 2020

Sebastian Raschka STAT 453: Intro to Deep Learning 49


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 3 from the paper: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images.]

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255).

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 50
Main Breakthrough for CNNs:
AlexNet & ImageNet
The ImageNet set that was used has ~1.2 million images and 1000 classes.

Accuracy is measured as top-5 performance: a prediction is correct if the true label matches one of the top 5 predictions of the model.

[Figure 4 (left) from the paper: eight ILSVRC-2010 test images and the five labels considered most probable by the model.]

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 51
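Top-5 accuracy is straightforward to compute in PyTorch (a sketch with random logits standing in for model outputs):

import torch

def top5_accuracy(logits, targets):
    top5 = logits.topk(k=5, dim=1).indices  # (batch, 5) class indices
    return (top5 == targets.unsqueeze(1)).any(dim=1).float().mean()

logits = torch.randn(8, 1000)               # e.g., 1000 ImageNet classes
targets = torch.randint(0, 1000, (8,))
print(top5_accuracy(logits, targets))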
Main Breakthrough for CNNs:
AlexNet & ImageNet
Note that the actual network inputs were still 224x224 images (random crops from downsampled 256x256 images).

224x224 is still a good/reasonable size today (224*224*3 = 150,528 features).

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Sebastian Raschka STAT 453: Intro to Deep Learning 52
Common CNN Architectures

[Figure 1 from the paper: single-crop top-1 validation accuracy for top-scoring single-model architectures. Figure 2: top-1 accuracy vs. amount of operations required for a single forward pass; blob size is proportional to the number of network parameters, from 5×10^6 to 155×10^6.]

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

Sebastian Raschka STAT 453: Intro to Deep Learning 53
Convolutions with Color Channels

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Image dimension: $X \in \mathbb{R}^{n_1 \times n_2 \times c_{in}}$ in NHWC format; CUDA & PyTorch use NCHW.

Sebastian Raschka STAT 453: Intro to Deep Learning 54
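Converting between the two memory layouts is a one-liner (a sketch; permute only changes strides, so a .contiguous() call materializes the new layout if needed):

import torch

x_nhwc = torch.randn(16, 224, 224, 3)  # batch, height, width, channels
x_nchw = x_nhwc.permute(0, 3, 1, 2)    # PyTorch convolutions expect NCHW
print(x_nchw.shape)                    # torch.Size([16, 3, 224, 224])
x_nchw = x_nchw.contiguous()           # materialize the permuted layout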


Interpreting (the Hidden)
ConvNet Layers
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 55


What a CNN Can See
Simple example: vertical edge detector

(From classical computer vision research)

Sebastian Raschka STAT 453: Intro to Deep Learning 56


What a CNN Can See
Simple example: vertical edge detector

Sebastian Raschka STAT 453: Intro to Deep Learning 57
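Such a hand-crafted filter can be applied with a single conv2d call (a sketch using a Sobel-style vertical edge detector; a CNN would learn kernels like this from data instead):

import torch
import torch.nn.functional as F

kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).view(1, 1, 3, 3)

img = torch.zeros(1, 1, 8, 8)
img[..., :, 4:] = 1.0                    # left half dark, right half bright
edges = F.conv2d(img, kernel, padding=1)
print(edges[0, 0])                       # strong response along the vertical edge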


What a CNN Can See
Simple example: horizontal edge detector

A CNN can learn whatever it finds best based on optimizing the objective (e.g., minimizing a particular loss to achieve good classification accuracy).

Sebastian Raschka STAT 453: Intro to Deep Learning 58


Main Breakthrough for CNNs:
AlexNet & ImageNet

[Figure 3 from Krizhevsky et al. (2012): 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images.]

Sebastian Raschka STAT 453: Intro to Deep Learning 59
What a CNN Can See

Which patterns from the training set activate the feature map?

[Fig. 4 from the paper: evolution of a randomly chosen subset of model features through training, one block per layer (layers 1-5), shown at epochs 1, 2, 5, 10, 20, 30, 40, 64. The visualization shows the strongest activation (across all training examples) for a given feature map, projected down to pixel space using the deconvnet approach.]

Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Method: backpropagate strong activation signals in hidden layers to the input images, then apply "unpooling" to map the values to the original pixel space for visualization.

Sebastian Raschka STAT 453: Intro to Deep Learning 60


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Layer 1

Layer 2

Sebastian Raschka STAT 453: Intro to Deep Learning 61


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
Layer 2

Layer 3

Sebastian Raschka STAT 453: Intro to Deep Learning 62


What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.

Layer 3, Layer 4, Layer 5

[Fig. 2 from the paper: visualization of features in a fully trained model.]
Sebastian Raschka STAT 453: Intro to Deep Learning 63
https://thegradient.pub/a-visual-history-of-interpretation-for-image-recognition/

Sebastian Raschka STAT 453: Intro to Deep Learning 64


Let's Implement CNNs in
PyTorch
1. What CNNs Can Do
2. Image Classification
3. Convolutional Neural Network Basics
4. Convolutional Filters and Weight-Sharing
5. Cross-Correlation vs Convolution
6. CNNs & Backpropagation
7. CNN Architectures
8. What a CNN Can See
9. CNNs in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 65
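As a preview, a minimal end-to-end sketch (assumed shapes: MNIST-style 1x28x28 inputs and 10 classes; an illustrative toy model, not the lecture's official notebook code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 1@28x28 -> 8@14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16@7x7
        return self.fc(torch.flatten(x, 1))         # logits for cross-entropy

model = SimpleCNN()
logits = model(torch.randn(32, 1, 28, 28))
loss = F.cross_entropy(logits, torch.randint(0, 10, (32,)))
loss.backward()  # gradients flow through the shared convolution weights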

