Deep Learning
Handwritten Sequence
Human Brain[2]: Stimulus → Receptors → Neural Net → Effectors → Response
• Receptors – convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain)
• Neural Net – continually receives information, perceives it, and makes appropriate decisions
• Effectors – convert electrical impulses generated by the neural net into discernible responses as system output
• Arrows pointing from left to right indicate the forward transmission of information-bearing signals through the system
• Arrows pointing from right to left signify the presence of feedback in the system
Feedforward Neural Networks
• Multilayer Perceptrons
• Deep Feedforward Networks
• A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
• There are no feedback connections in which outputs of the model are fed back into the model itself.
• When feedforward neural networks are extended to include feedback connections, they are called recurrent neural networks.[7]
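A minimal sketch of such a mapping y = f(x; θ) as a two-layer network in NumPy; the layer sizes and the ReLU nonlinearity are illustrative assumptions, not from the slides:

import numpy as np

def feedforward(x, params):
    """Forward pass y = f(x; theta) for a two-layer network (no feedback)."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU activation
    return W2 @ h + b2                 # linear output layer

rng = np.random.default_rng(0)
# theta: randomly initialized weights and biases (3 inputs -> 4 hidden -> 2 outputs)
params = (rng.standard_normal((4, 3)), np.zeros(4),
          rng.standard_normal((2, 4)), np.zeros(2))
print(feedforward(np.array([1.0, 2.0, 3.0]), params))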
Feedforward Neural Network
Convolutional Neural Network
• Do we really need all the edges of the network?
• Can some of these be shared?
Local Receptive Field
• The region of the input image that a hidden neuron is connected to is called the local receptive field for that hidden neuron.
• Each connection learns a weight.
Hidden Layer
• Slide the local receptive field across the entire input image: sliding one pixel to the right (i.e., by one neuron) connects to a second hidden neuron, and so on, building the first hidden layer.
• For each local receptive field, there is a different hidden neuron in the first hidden layer.
First Hidden Layer: a “beak” detector
• The same pattern appears in different places, so the detectors can be compressed!
• What about training many such “small” detectors, each of which must “move around” the image? Otherwise, separate detectors (e.g., an “upper-left beak” detector and a “middle beak” detector) would learn the same pattern twice.
A Convolutional Layer
• A convolutional layer has a number of filters that perform the convolution operation.
• Each filter is a small detector (e.g., the “beak” detector).
Convolution
These are the network parameters to be learned.

6 × 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

…

Each filter detects a small pattern (3 × 3).
Convolution (Filter 1, stride = 1)
Place Filter 1 over the top-left 3 × 3 patch of the 6 × 6 image and take the dot product: the result is 3. Sliding the filter one pixel to the right gives -1.
Convolution (Filter 1, stride = 1)
Sliding Filter 1 across the entire 6 × 6 image gives a 4 × 4 feature map:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
Convolution (Filter 1, stride = 2)
If stride = 2, the filter moves two pixels at a time and the output is smaller; the first row of the result is:
3 -3
Convolution (Filter 2, stride = 1)
Repeat this for each filter. Filter 2 gives a second 4 × 4 feature map:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
The two 4 × 4 images together form the feature map of the convolutional layer.
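The worked example can be checked in a few lines of NumPy. This sketch implements the unpadded sliding-window operation used above (as in most CNN libraries, it is technically cross-correlation):

import numpy as np

def conv2d(image, filt, stride=1):
    """Slide filt over image (no padding) and take dot products."""
    H, W = image.shape
    k = filt.shape[0]
    return np.array([[np.sum(image[i:i+k, j:j+k] * filt)
                      for j in range(0, W - k + 1, stride)]
                     for i in range(0, H - k + 1, stride)])

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])
filter2 = np.array([[-1, 1,-1],
                    [-1, 1,-1],
                    [-1, 1,-1]])

print(conv2d(image, filter1))            # 4 x 4 map; first row 3 -1 -3 -1
print(conv2d(image, filter1, stride=2))  # 2 x 2 map; first row 3 -3
print(conv2d(image, filter2))            # second 4 x 4 feature map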
Color image: RGB, 3 channels
• A color image is a 6 × 6 × 3 tensor: the three RGB channels are stacked.
• Each filter is correspondingly a 3 × 3 × 3 tensor, so one filter spans all three channels at once.
Convolution vs. Fully Connected
• Convolution is a fully connected layer with some connections removed and weights shared: fewer parameters!
• In a fully connected layer, the 6 × 6 image is flattened into a 36-dimensional vector and every hidden neuron connects to all 36 inputs.
• In a convolutional layer, each output connects only to a 3 × 3 patch of the image, and the same 9 filter weights are reused at every position.
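To make “fewer parameters” concrete, a quick count for the 6 × 6 example; the 16-unit fully connected hidden layer is a hypothetical size chosen for illustration:

# Parameter counts for the 6 x 6 example (hidden-layer size is an assumption).
image_pixels = 6 * 6                  # flattened input for the FC layer
conv_params = 3 * 3                   # one shared 3 x 3 filter: 9 weights,
                                      # reused at every position of the image
n_hidden = 16                         # hypothetical fully connected layer
fc_params = image_pixels * n_hidden   # every input connects to every unit
print(conv_params, fc_params)         # 9 vs. 576 weights (biases omitted)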
Convolution and Shallow NN
Subsampling
Filter 1:            Filter 2:
 1 -1 -1             -1  1 -1
-1  1 -1             -1  1 -1
-1 -1  1             -1  1 -1

Filter 1 feature map:   Filter 2 feature map:
 3 -1 -3 -1             -1 -1 -1 -1
-3  1  0 -3             -1 -1 -2  1
-3 -3  0  1             -1 -1 -2  1
 3 -2 -2 -1             -1  0 -4  3
Max Pooling
• 2 × 2 max pooling keeps the largest value in each 2 × 2 block of a feature map.
• The 6 × 6 image, after Conv and Max Pooling, becomes a new but smaller image: each 4 × 4 feature map is reduced to a 2 × 2 image.
Filter 1:   Filter 2:
3 0         -1 1
3 1          0 3
• Each filter is a channel.
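Continuing the NumPy sketch, 2 × 2 max pooling over the Filter 1 feature map reproduces the 2 × 2 result above:

import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each block."""
    H, W = fmap.shape
    return np.array([[fmap[i:i+size, j:j+size].max()
                      for j in range(0, W, size)]
                     for i in range(0, H, size)])

fmap1 = np.array([[ 3,-1,-3,-1],
                  [-3, 1, 0,-3],
                  [-3,-3, 0, 1],
                  [ 3,-2,-2,-1]])
print(max_pool(fmap1))  # [[3 0]
                        #  [3 1]]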
The whole CNN
input image → Convolution → Max Pooling → (can repeat many times) → Flattened → Fully Connected Feedforward network → cat, dog, ……
The whole CNN
• After Convolution and Max Pooling we get a new image, smaller than the original image.
• The number of channels in the new image is the number of filters.
• Convolution and Max Pooling can be repeated many times before the final classification (cat, dog, ……).
Flattened
• The output of the final max-pooling layer (here two 2 × 2 images) is flattened into a single vector.
• This vector is fed into the fully connected feedforward network.
A CNN compresses a fully connected network in two ways:
• Reducing the number of connections
• Sharing weights on the edges
• Max pooling further reduces the complexity
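A minimal PyTorch sketch of the whole pipeline; the single input channel, two 3 × 3 filters, and two-class output mirror the running example, while everything else is an assumption:

import torch
import torch.nn as nn

# Convolution -> Max Pooling (repeatable) -> Flatten -> Fully Connected.
cnn = nn.Sequential(
    nn.Conv2d(1, 2, kernel_size=3),  # 2 filters, 3 x 3, as in the example
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 2 x 2 max pooling
    nn.Flatten(),                    # 2 channels x 2 x 2 -> 8-dim vector
    nn.Linear(8, 2),                 # feedforward classifier: cat vs. dog
)

x = torch.randn(1, 1, 6, 6)          # one 6 x 6 single-channel image
print(cnn(x).shape)                  # torch.Size([1, 2])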
Deep Learning
[Figures: overfitting of data and selection error]
http://wiki.bethanycrane.com/overfitting-of-data
https://www.neuraldesigner.com/images/learning/selection_error.svg
Regularization
Dropout
• Randomly drop units (along with their connections) during training
• Each unit is retained with a fixed probability p, independent of the other units
• The hyperparameter p must be chosen (tuned)
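A minimal sketch of dropout in NumPy. The rescaling by 1/p (“inverted dropout”) is a conventional assumption; the slides describe only the random dropping:

import numpy as np

def dropout(h, p, rng):
    """Inverted dropout: keep each unit with probability p during training."""
    mask = rng.random(h.shape) < p   # independent Bernoulli(p) per unit
    return h * mask / p              # rescale so E[output] matches test time

rng = np.random.default_rng(0)
h = np.ones(10)
print(dropout(h, p=0.8, rng=rng))    # roughly 80% of units survive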
L2 regularization = weight decay
f(x) = x
Cost (loss) functions: cross-entropy, mean squared error
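Written out for N training examples with targets y and predictions ŷ (standard definitions, stated here for reference):

MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)²
Cross-entropy = −(1/N) Σᵢ Σₖ yᵢₖ log ŷᵢₖ   (one-hot targets y, predicted class probabilities ŷ)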
Batch Normalization
• When later layers in the network are learning well, early layers often get stuck during training, learning almost nothing at all.
• If the mean and standard deviation of each input feature are calculated over the mini-batch, then the batch size must be large enough to be representative of the range of each variable.
• In a batch-normalized model, higher learning rates give a training speedup with no ill side effects.
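A minimal NumPy sketch of the per-mini-batch normalization step; the learnable scale gamma and shift beta follow the standard batch-normalization formulation, and the numbers are illustrative:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)              # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta      # learnable scale and shift

x = np.random.default_rng(0).normal(5.0, 2.0, size=(32, 4))  # batch of 32
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1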
Residual Networks
• The identity mapping does not have any parameters; it is there to add the output from the previous layer to the layer ahead.
• However, x and F(x) may not have the same dimensions, because a convolution operation typically shrinks the spatial resolution of an image.
• The identity mapping is therefore multiplied by a linear projection Ws so that x and F(x) can be combined as input to the next layer.
• The Ws term can be implemented with 1 × 1 convolutions, introducing additional parameters to the model.
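A minimal PyTorch sketch of a residual block with the 1 × 1 convolution projection Ws described above; the channel counts and stride are assumptions:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + Ws x, where Ws is a 1 x 1 convolution matching dimensions."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.F = nn.Sequential(       # residual branch F(x)
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1 x 1 convolution Ws: projects x to the shape of F(x)
        self.Ws = nn.Conv2d(in_ch, out_ch, 1, stride=stride)

    def forward(self, x):
        return torch.relu(self.F(x) + self.Ws(x))

block = ResidualBlock(16, 32)
print(block(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 32, 4, 4])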