
Neural networks and CNN

Biplab Banerjee

Thanks to towardsdatascience, Vicente Ordonez


Perceptron Model
Frank Rosenblatt (1957) - Cornell University
Activation Functions
• Step(x)
• Sigmoid(x)
• Tanh(x)
• ReLU(x) = max(0, x)
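As a minimal NumPy sketch of these activation functions (the function names and implementations below are illustrative, not taken from the slides):

```python
import numpy as np

def step(x):
    # Heaviside step: 1 where x >= 0, else 0
    return (x >= 0).astype(float)

def sigmoid(x):
    # logistic function, squashes inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # hyperbolic tangent, squashes inputs to (-1, 1)
    return np.tanh(x)

def relu(x):
    # rectified linear unit: max(0, x) element-wise
    return np.maximum(0.0, x)
```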


Two-layer Multi-layer Perceptron (MLP)
• one "hidden" layer
• Loss / Criterion

[Diagram: inputs x1, x2, x3, x4 feed hidden activations a1, a2, a3, a4, which are summed into the prediction ŷ1 and compared against the target y1 by the loss]

- Reducing the number of layers below the minimum requires an exponentially larger network to express the function fully
- A network with fewer than the minimum required number of neurons cannot model the function
Linear Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c$
$g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d$
$g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b$

$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
Linear Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c$
$g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d$
$g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b$

$w = \begin{bmatrix} w_{c1} & w_{c2} & w_{c3} & w_{c4} \\ w_{d1} & w_{d2} & w_{d3} & w_{d4} \\ w_{b1} & w_{b2} & w_{b3} & w_{b4} \end{bmatrix}$,  $b = [b_c\ b_d\ b_b]$

$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
Linear Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$g = w x^T + b^T$, where
$w = \begin{bmatrix} w_{c1} & w_{c2} & w_{c3} & w_{c4} \\ w_{d1} & w_{d2} & w_{d3} & w_{d4} \\ w_{b1} & w_{b2} & w_{b3} & w_{b4} \end{bmatrix}$,  $b = [b_c\ b_d\ b_b]$

$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
Linear Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$g = w x^T + b^T$, where
$w = \begin{bmatrix} w_{c1} & w_{c2} & w_{c3} & w_{c4} \\ w_{d1} & w_{d2} & w_{d3} & w_{d4} \\ w_{b1} & w_{b2} & w_{b3} & w_{b4} \end{bmatrix}$,  $b = [b_c\ b_d\ b_b]$

$f = \mathrm{softmax}(g)$
Linear Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$f = \mathrm{softmax}(w x^T + b^T)$
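As a compact illustration, here is a NumPy sketch of this linear-softmax prediction; the shapes and variable names are assumptions chosen to match the slide's 4 inputs and 3 classes (c, d, b):

```python
import numpy as np

x = np.array([0.2, 1.0, -0.5, 0.3])   # x_i, shape (4,)
w = np.random.randn(3, 4) * 0.01      # weight matrix, one row per class
b = np.zeros(3)                       # bias vector [b_c, b_d, b_b]

def softmax(g):
    g = g - g.max()                   # subtract the max for numerical stability
    e = np.exp(g)
    return e / e.sum()

g = w @ x + b                         # g = w x^T + b^T
f = softmax(g)                        # predicted class probabilities, sums to 1
```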
Two-layer MLP + Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$
$f = \mathrm{softmax}(w_{[2]} a_1^T + b_{[2]}^T)$
N-layer MLP + Softmax

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$
$a_2 = \mathrm{sigmoid}(w_{[2]} a_1^T + b_{[2]}^T)$
…
$a_k = \mathrm{sigmoid}(w_{[k]} a_{k-1}^T + b_{[k]}^T)$
…
$f = \mathrm{softmax}(w_{[n]} a_{n-1}^T + b_{[n]}^T)$
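A minimal NumPy sketch of this N-layer forward pass, assuming per-layer parameter lists (the helper names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, weights, biases):
    # weights, biases: lists of per-layer parameters w_[1..n], b_[1..n]
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(w @ a + b)                    # hidden layers use sigmoid
    return softmax(weights[-1] @ a + biases[-1])  # last layer feeds the softmax
```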
Why is non-linearity important?
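Without a non-linearity, a stack of linear layers collapses into a single linear map, so extra depth adds no expressive power. A small NumPy check of this fact (the matrices here are arbitrary illustrations):

```python
import numpy as np

w1 = np.random.randn(5, 4)
w2 = np.random.randn(3, 5)
x = np.random.randn(4)

two_linear_layers = w2 @ (w1 @ x)    # two linear layers, no activation in between
single_layer = (w2 @ w1) @ x         # one equivalent linear layer
print(np.allclose(two_linear_layers, single_layer))  # True
```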
How to train the parameters?

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$
$a_2 = \mathrm{sigmoid}(w_{[2]} a_1^T + b_{[2]}^T)$
…
$a_k = \mathrm{sigmoid}(w_{[k]} a_{k-1}^T + b_{[k]}^T)$
…
$f = \mathrm{softmax}(w_{[n]} a_{n-1}^T + b_{[n]}^T)$
How to train the parameters?

$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$,  $y_i = [1\ 0\ 0]$,  $\hat{y}_i = [f_c\ f_d\ f_b]$

$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$
$a_2 = \mathrm{sigmoid}(w_{[2]} a_1^T + b_{[2]}^T)$
…
$a_k = \mathrm{sigmoid}(w_{[k]} a_{k-1}^T + b_{[k]}^T)$
…
$f = \mathrm{softmax}(w_{[n]} a_{n-1}^T + b_{[n]}^T)$

$l = \mathrm{loss}(f, y)$

We can still use SGD. We need the gradients $\dfrac{\partial l}{\partial w_{[k]ij}}$ and $\dfrac{\partial l}{\partial b_{[k]i}}$.
Backpropagation – repeated application of the chain rule
Two-layer Neural Network – Forward Pass
Two-layer Neural Network – Backward Pass
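The slides illustrate these passes with diagrams. As a rough companion, here is a NumPy sketch of the forward and backward pass for a two-layer network with a sigmoid hidden layer, softmax output, and cross-entropy loss; all names and shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, w1, b1, w2, b2):
    # ---- forward pass ----
    a1 = sigmoid(w1 @ x + b1)              # hidden activations
    g = w2 @ a1 + b2                       # class scores
    e = np.exp(g - g.max())
    f = e / e.sum()                        # softmax probabilities
    loss = -np.sum(y * np.log(f + 1e-12))  # cross-entropy with one-hot y

    # ---- backward pass (chain rule) ----
    dg = f - y                             # dl/dg for softmax + cross-entropy
    dw2 = np.outer(dg, a1)                 # dl/dw2
    db2 = dg
    da1 = w2.T @ dg                        # propagate back to the hidden layer
    dz1 = da1 * a1 * (1 - a1)              # multiply by the sigmoid derivative
    dw1 = np.outer(dz1, x)
    db1 = dz1
    return loss, (dw1, db1, dw2, db2)
```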
Basic building blocks of the CNN architecture
• Input layer
• Convolutional layer
• Fully connected layer
• Loss layer

• Convolutional layer
• Convolutional kernel
• Pooling layer
• Non-linearity
Convolution operation
The same pattern appears in different places, so its detectors can be compressed into shared parameters.
Rather than training a lot of such "small" detectors separately, one detector "moves around" the image.

[Figure: an "upper-left beak" detector and a "middle beak" detector can be compressed to the same parameters]
Convolution vs. Fully Connected

6×6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:        Filter 2:
 1 -1 -1         -1  1 -1
-1  1 -1         -1  1 -1
-1 -1  1         -1  1 -1

Convolution slides these small filters over the image; a fully-connected layer instead flattens the image into x1 … x36 and connects every input to every output.
Convolutional Layer (with 4 filters)
Weights: 4×1×9×9
Input: 1×224×224  →  Output: 4×224×224
(with zero padding and stride = 1)
Convolutional Layer (with 4 filters)
Weights: 4×1×9×9
Input: 1×224×224  →  Output: 4×112×112
(with zero padding but stride = 2)
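A quick way to check these shapes is with PyTorch; the use of nn.Conv2d here is my own illustration, with padding=4 keeping a 9×9 kernel "same"-sized at stride 1:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 224, 224)   # one 1x224x224 input

conv_s1 = nn.Conv2d(1, 4, kernel_size=9, stride=1, padding=4)
conv_s2 = nn.Conv2d(1, 4, kernel_size=9, stride=2, padding=4)

print(conv_s1(x).shape)   # torch.Size([1, 4, 224, 224])
print(conv_s2(x).shape)   # torch.Size([1, 4, 112, 112])
```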
Color image: RGB 3 channels – convolution over depth

[Figure: Filter 1, Filter 2, Filter 3, … each have 3 channel slices; every filter is convolved over the full depth (R, G, B planes) of the color image]
Different types of convolution

Parameters:
✓ Kernel stride
✓ Size
✓ Padding

Normal vs. dilated convolution (dilation width = 2)
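A hedged PyTorch illustration of a dilated convolution: dilation=2 inserts one gap between kernel taps, enlarging the receptive field without adding weights (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

normal = nn.Conv2d(1, 1, kernel_size=3, padding=1)                # 3x3 receptive field
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)   # covers a 5x5 area with 9 weights

print(normal(x).shape, dilated(x).shape)   # both torch.Size([1, 1, 32, 32])
```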
Is ReLU helpful? https://github.com/bhattbhavesh91/why-is-relu-non-linear/
Spatially Separable convolution
Depthwise separable convolution

Convolving 256 5×5 kernels over the input volume


Depthwise separable convolution – step1

Along depth
Depthwise separable convolution – step2

Pointwise 1x1 conv
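A sketch of the two steps in PyTorch; the channel counts are illustrative assumptions, with `groups=in_channels` giving the per-channel depthwise step followed by the 1×1 pointwise step:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)   # example input volume

# step 1: depthwise – one 5x5 filter per input channel (groups = in_channels)
depthwise = nn.Conv2d(64, 64, kernel_size=5, padding=2, groups=64)

# step 2: pointwise – 1x1 convolution mixes channels to the desired output depth
pointwise = nn.Conv2d(64, 256, kernel_size=1)

out = pointwise(depthwise(x))
print(out.shape)   # torch.Size([1, 256, 56, 56])
```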


Transpose convolution
Convolution as a matrix multiplication
Many to one mapping – 9 values to 1 value
One to many mapping
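To make the many-to-one vs. one-to-many contrast concrete, a small PyTorch sketch where a strided convolution shrinks a feature map and a transpose convolution maps it back up (the shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 14, 14)

# ordinary convolution: many-to-one, here halving the spatial size
down = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
# transpose convolution: one-to-many, doubling it back
up = nn.ConvTranspose2d(16, 16, kernel_size=3, stride=2, padding=1, output_padding=1)

print(down(x).shape)       # torch.Size([1, 16, 7, 7])
print(up(down(x)).shape)   # torch.Size([1, 16, 14, 14])
```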
The whole CNN

[Pipeline: image → Convolution → Max Pooling → (Convolution and Max Pooling can repeat many times) → Flattened → Fully Connected Feedforward network → outputs such as "cat", "dog", …]
Pooling
• Down-sample the image – this keeps the number of parameters of the CNN model under control
Why Pooling
• Subsampling pixels will not change the object (a subsampled bird is still a bird)

✓ We can subsample the pixels to make the image smaller
✓ Fewer parameters are then needed to characterize the image
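A minimal PyTorch illustration of this down-sampling, where 2×2 max pooling halves each spatial dimension (shapes chosen to match the earlier convolutional-layer example):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 224, 224)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)   # torch.Size([1, 4, 112, 112])
```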
Pooling or strided convolution?
Unpool
The whole CNN

[Pipeline: image → Convolution → Max Pooling → a new (smaller) image → Convolution → Max Pooling → a new image → Flattened → Fully Connected Feedforward network → outputs such as "cat", "dog", …]
Flattening

[Figure: the small output feature maps are flattened into a single vector that feeds the Fully Connected Feedforward network]
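Putting the pieces together, a hedged PyTorch sketch of such a pipeline (conv → ReLU → pool repeated, then flatten and a fully connected classifier; all layer sizes and the two-class output are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                 # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                 # 16x16 -> 8x8
    nn.Flatten(),                    # 32 * 8 * 8 = 2048 features
    nn.Linear(32 * 8 * 8, 2),        # two classes, e.g. cat vs. dog
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)   # torch.Size([1, 2])
```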
Conv Net Topology
• 5 convolutional layers
• 3 fully connected layers + soft-max
• 650K neurons, 60 million weights
Why do we need a deep CNN?

Courtesy: ICRI
Suggested reading
