Theoretical Explanation
Convolution Process
Max Pooling
Case Studies
Conclusion
Research Scope
Learning Hierarchical Representations
› Image Recognition
  – Pixel -> Edge -> Texton -> Motif -> Part -> Object
› Text
  – Character -> Word -> Word Group -> Clause -> Sentence -> Story
› Speech
  – Sample -> Spectral Band -> Sound -> Phone -> Phoneme -> Word
DEC 28, 2023
Human View
Human View Vs. Machine View
Color Image to RGB Matrix
What is Deep Learning?
› “Representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.”
Ref: LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Flow of Image Classification:
Convolution and Transposed Convolution
Ref: Celard, P., Iglesias, E. L., Sorribes-Fdez, J. M., Romero, R., Vieira, A. S., & Borrajo, L. (2023). A survey on deep learning applied to medical images: from simple artificial neural networks to generative models. Neural Computing and Applications, 35(3), 2291-2323.
Convolutional Neural Networks
Evolution of Neural Networks
Convolution

These are the network parameters to be learned: each filter detects a small pattern (3 x 3).

6 x 6 image:      Filter 1:      Filter 2:
1 0 0 0 0 1        1 -1 -1       -1  1 -1
0 1 0 0 1 0       -1  1 -1       -1  1 -1
0 0 1 1 0 0       -1 -1  1       -1  1 -1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Convolution, stride=1

Placing Filter 1 on the top-left 3 x 3 patch of the 6 x 6 image and taking the dot product gives 3; sliding one column to the right gives -1:

3 -1 ...
Convolution, if stride=2

With stride 2 the filter moves two columns at a time, so the first output row of Filter 1 is:

3 -3
Convolution, stride=1

Sliding Filter 1 over the whole 6 x 6 image produces a 4 x 4 output:

 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
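The 4 x 4 output above can be reproduced with a few lines of NumPy. This is a minimal sketch of "valid" cross-correlation (what deep-learning libraries call convolution) with the stride exposed as a parameter; conv2d is a made-up helper name, not from the slides.

```python
# Minimal sketch of the slide's convolution: "valid" cross-correlation of
# the 6x6 binary image with the 3x3 Filter 1 (conv2d is a made-up helper).
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filter1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

def conv2d(img, kernel, stride=1):
    """Slide the kernel over the image, taking a dot product at each position."""
    k = kernel.shape[0]
    n = (img.shape[0] - k) // stride + 1
    return np.array([[np.sum(img[i*stride:i*stride+k, j*stride:j*stride+k] * kernel)
                      for j in range(n)] for i in range(n)])

print(conv2d(image, filter1))            # 4 x 4 map; first row: 3 -1 -3 -1
print(conv2d(image, filter1, stride=2))  # 2 x 2 map; first row: 3 -3
```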
Convolution, stride=1: repeat this for each filter

Filter 2 over the same 6 x 6 image gives a second 4 x 4 feature map:

-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Two 4 x 4 images, forming a 2 x 4 x 4 matrix.
Color image: RGB 3 channels

A color image has 3 values per pixel, so the input is a 3 x 6 x 6 tensor. Each filter is correspondingly 3 channels deep (3 x 3 x 3): Filter 1 and Filter 2 each slide over all three channels at once.
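A toy sketch can make the point that a 3-channel filter still produces a single feature map: the channel dimension is summed out. The image values and filter weights below are made up for illustration, not from the slides.

```python
# Toy sketch of multi-channel convolution: one 3 x 3 x 3 filter spans all
# RGB channels of a 3 x 6 x 6 image and yields a single 4 x 4 feature map.
# (Image values and filter weights are made up for illustration.)
import numpy as np

rgb_image = np.ones((3, 6, 6))     # channels-first: 3 channels, 6 x 6 each
rgb_filter = np.zeros((3, 3, 3))   # one filter, 3 channels deep
rgb_filter[0, 1, 1] = 1.0          # single weight: center pixel of the R channel

out = np.array([[np.sum(rgb_image[:, i:i+3, j:j+3] * rgb_filter)
                 for j in range(4)] for i in range(4)])
print(out.shape)  # (4, 4): channels are summed, not stacked
```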
Convolution v.s. Fully Connected

Convolution is a fully-connected layer with most connections removed. Flatten the 6 x 6 image into inputs x1, x2, ..., x36: a fully-connected neuron would connect to all 36 inputs, while a convolution output connects only to the 9 inputs under the filter.
Fewer parameters!

Number the flattened 6 x 6 image 1 to 36. The first output of Filter 1 (value 3) connects only to 9 inputs (1, 2, 3, 7, 8, 9, 13, 14, 15), not to all 36, so it needs far fewer parameters than a fully-connected neuron.

Even fewer parameters: shared weights

The next output (value -1) connects to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 using the very same 9 weights, so all outputs of one filter share a single set of parameters.
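The counting argument above can be made concrete with simple arithmetic, assuming no bias terms and the 4 x 4 "valid" output from the earlier slides:

```python
# Parameter counts for the 6 x 6 image and one 3 x 3 filter (no biases,
# "valid" convolution producing a 4 x 4 output of 16 neurons).
inputs = 6 * 6    # 36 flattened pixels
outputs = 4 * 4   # 16 output neurons

fully_connected   = inputs * outputs  # every output sees every input
locally_connected = 9 * outputs       # each output sees only 9 inputs
shared_weights    = 9                 # one filter: all 16 outputs share 9 weights

print(fully_connected, locally_connected, shared_weights)  # 576 144 9
```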
The Whole CNN

image -> Convolution -> Max Pooling -> (can repeat many times) -> Flattened -> Fully Connected Feedforward network -> cat, dog, ...
Max Pooling

Outputs of Filter 1 and Filter 2:

 3 -1 -3 -1      -1 -1 -1 -1
-3  1  0 -3      -1 -1 -2  1
-3 -3  0  1      -1 -1 -2  1
 3 -2 -2 -1      -1  0 -4  3
Why Pooling

• Subsampling pixels will not change the object: a subsampled bird is still a bird.

2 x 2 max pooling keeps the maximum of each 2 x 2 block, turning the 4 x 4 conv output into a new but smaller 2 x 2 image:

Conv output (Filter 1):   After 2 x 2 max pooling:
 3 -1 -3 -1               3 0
-3  1  0 -3               3 1
-3 -3  0  1
 3 -2 -2 -1
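A minimal sketch of that pooling step (max_pool is a made-up helper, assuming the pool size divides the map size evenly):

```python
# 2 x 2 max pooling over the 4 x 4 feature map from the slides:
# keep the maximum of each non-overlapping 2 x 2 block.
import numpy as np

feature_map = np.array([
    [ 3, -1, -3, -1],
    [-3,  1,  0, -3],
    [-3, -3,  0,  1],
    [ 3, -2, -2, -1],
])

def max_pool(fmap, size=2):
    n = fmap.shape[0] // size
    return np.array([[fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
                      for j in range(n)] for i in range(n)])

print(max_pool(feature_map))  # [[3 0] [3 1]]
```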
The Whole CNN

Each filter is a channel. After convolution and max pooling we get a new image, smaller than the original, with as many channels as there are filters:

Channel 1:   Channel 2:
3 0          -1 1
3 1           0 3

Convolution and Max Pooling can repeat many times; the final maps are Flattened into a vector and fed to a Fully Connected Feedforward network.
CNN in Keras

Only the network structure and the input format (vector -> 3-D tensor) are modified.

input_shape = (28, 28, 1)
Convolution: there are 25 3 x 3 filters.
Max Pooling
Convolution
Max Pooling
CNN in Keras (vector -> 3-D array)

Input: 1 x 28 x 28
Convolution -> 25 x 26 x 26 (how many parameters for each filter? 9)
Max Pooling -> 25 x 13 x 13
Convolution -> 50 x 11 x 11 (how many parameters for each filter? 225 = 25 x 9)
Max Pooling -> 50 x 5 x 5
CNN in Keras (vector -> 3-D array)

Input: 1 x 28 x 28
Convolution -> 25 x 26 x 26
Max Pooling -> 25 x 13 x 13
Convolution -> 50 x 11 x 11
Max Pooling -> 50 x 5 x 5
Flattened -> 1250 -> Fully connected feedforward network -> Output
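The shape and parameter bookkeeping on these slides can be checked with a small pure-Python sketch, assuming "valid" 3 x 3 convolutions with stride 1 and 2 x 2 max pooling (the helper names are made up):

```python
# Shape bookkeeping for the slides' Keras-style CNN: "valid" 3x3 convolutions
# (stride 1) and 2x2 max pooling, channels-first shapes (c, h, w).
def conv_shape(c_in, h, w, n_filters, k=3):
    # each filter spans all c_in input channels: c_in * k * k weights per filter
    return (n_filters, h - k + 1, w - k + 1), c_in * k * k

def pool_shape(c, h, w, size=2):
    return (c, h // size, w // size)

shape, p1 = conv_shape(1, 28, 28, 25)                     # (25, 26, 26), 9 weights/filter
shape = pool_shape(*shape)                                # (25, 13, 13)
shape, p2 = conv_shape(shape[0], shape[1], shape[2], 50)  # (50, 11, 11), 225 = 25 x 9
shape = pool_shape(*shape)                                # (50, 5, 5)
flattened = shape[0] * shape[1] * shape[2]

print(shape, flattened, p1, p2)  # (50, 5, 5) 1250 9 225
```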
AlphaGo

A Neural Network maps the board (a 19 x 19 matrix: black = 1, white = -1, none = 0) to the next move (19 x 19 positions). A fully-connected feedforward network can be used, but a CNN performs much better.
AlphaGo’s policy network
The following is a quotation from their Nature article:
Note: AlphaGo does not use Max Pooling.
CNN in Speech Recognition

A spectrogram can be treated as an image (one axis is time, the other frequency) and fed to a CNN.
THANK YOU
"Programs must be written for people to read, and only incidentally for machines to execute."