CP -6
Machine Learning
M S Prasad
INPUT -> [[CONV -> ReLU]*N -> POOL]*M -> [FC -> ReLU]*K -> OUTPUT
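To make the bracket notation concrete, here is a minimal sketch that expands the pattern into a flat list of layer names for chosen N, M and K. The function name `expand_pattern` and the final `OUTPUT` entry are illustrative assumptions. In practice the output stage is often one last fully connected layer that produces the class scores.

```python
def expand_pattern(n: int, m: int, k: int) -> list[str]:
    """Expand INPUT -> [[CONV -> ReLU]*N -> POOL]*M -> [FC -> ReLU]*K -> OUTPUT."""
    layers = ["INPUT"]
    for _ in range(m):              # M repetitions of the conv/pool group
        for _ in range(n):          # N conv + ReLU pairs before each pool
            layers += ["CONV", "ReLU"]
        layers.append("POOL")
    for _ in range(k):              # K fully connected + ReLU pairs
        layers += ["FC", "ReLU"]
    layers.append("OUTPUT")         # hypothetical label for the final stage
    return layers


print(" -> ".join(expand_pattern(n=2, m=2, k=1)))
```

With N = 2, M = 2 and K = 1 this yields four CONV layers, two POOL layers and one hidden FC layer.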
[Figure: convolution on a sample data set (54 weight parameters, 2 biases)]
Stride: the number of pixels the filter moves to the right at each step as it slides across the image, and the number of pixels it moves down once it reaches the end of a row.
[Figure: filter positions with stride 1 and stride 2]
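The effect of the stride can be seen in a minimal NumPy sketch, assuming a single-channel image, no padding, and the cross-correlation form that CNN libraries usually call "convolution" (the kernel is not flipped). `conv2d_valid` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` without padding, moving `stride` pixels right, then down."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + f_h, c:c + f_w] * kernel)
    return out


image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))
print(conv2d_valid(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d_valid(image, kernel, stride=2).shape)  # (3, 3)
```

Doubling the stride roughly halves each output dimension, because the filter visits only every second position.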
Padding: Sometimes we want to make use of all the pixels in the image, including those at the border. Padding specifies how many rows and columns of zeros are added around the border of the image. To apply a 'full' convolution, you need a padding of (wf - 1) on each side, where wf is the width of the filter.
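A small 1-D sketch illustrates the (wf - 1) rule: after padding wf - 1 zeros on each side, sliding a flipped filter over the input reproduces NumPy's `np.convolve(..., mode="full")`, and the output length grows to n + wf - 1. The helper `full_correlation_1d` and the sample values are assumptions chosen for illustration.

```python
import numpy as np


def full_correlation_1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Pad wf - 1 zeros on each side of x, then slide w over every position."""
    wf = len(w)
    padded = np.pad(x, wf - 1)            # zero ("constant") padding on both sides
    n_out = len(padded) - wf + 1          # equals len(x) + wf - 1
    return np.array([np.dot(padded[i:i + wf], w) for i in range(n_out)])


x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])
y = full_correlation_1d(x, w[::-1])       # flip w so the result is a true convolution
print(y)                                  # [ 1.  2.  2.  2. -3. -4.]
print(np.convolve(x, w, mode="full"))     # same values

# In 2-D, padding p adds p rows/columns on every side of the image.
print(np.pad(np.ones((3, 3)), 1).shape)   # (5, 5)
```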
BIAS
Relation between the bias and the result of a convolution: the bias adds a specific value to every element of the result in its channel. Therefore, for the error received from an upper layer, each bias value must be updated according to the error of its related channel; its gradient is the sum of that error over all positions of the channel.
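The per-channel bias update can be sketched in NumPy, assuming feature maps stored as (channels, height, width) and a hypothetical upstream error array standing in for the gradient from the layer above. `bias_forward`, `bias_backward` and the learning rate are illustrative names and values.

```python
import numpy as np


def bias_forward(conv_out: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Broadcast one scalar per channel over all (height, width) positions.
    return conv_out + bias[:, None, None]


def bias_backward(upstream: np.ndarray) -> np.ndarray:
    # dLoss/dbias[c] is the sum of the upstream error over every position of channel c.
    return upstream.sum(axis=(1, 2))


rng = np.random.default_rng(0)
conv_out = rng.standard_normal((2, 4, 4))   # (channels, height, width), before bias
bias = np.array([0.5, -1.0])                # one bias per output channel
out = bias_forward(conv_out, bias)

upstream = rng.standard_normal(out.shape)   # hypothetical error from the layer above
grad_bias = bias_backward(upstream)

learning_rate = 0.01                        # hypothetical step size
new_bias = bias - learning_rate * grad_bias
```

Because a bias touches every position of its channel, its gradient collects the error from all of those positions.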
Pooling layer
Down-sampling of the volume (non-linear in the case of max pooling) using small filters that take, for example, the maximum or average value in a rectangular area of the previous layer's output. Pooling reduces the spatial size, which reduces the number of parameters and computations in later layers, and also helps to control overfitting, i.e. high training accuracy but low validation accuracy.
Pooling is a down-sampling operation that reduces the dimensionality of the feature map: the rectified feature map (after ReLU) goes through a pooling layer to generate a pooled feature map.
The filters of the preceding convolution layers identify different parts of the image, such as edges, corners, body, feathers, eyes and beak; pooling keeps a summary of each region (for example its strongest response) while shrinking the map.
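A minimal NumPy sketch of 2 x 2 pooling with stride 2 on a single 4 x 4 feature map shows both variants mentioned above. The function `pool2d` and the sample feature map are assumptions for illustration.

```python
import numpy as np


def pool2d(x: np.ndarray, size: int = 2, stride: int = 2, mode: str = "max") -> np.ndarray:
    """Summarise each size x size window with its maximum or its mean."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out


fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]], dtype=float)
print(pool2d(fmap, mode="max"))       # window maxima: 6, 5, 7, 9
print(pool2d(fmap, mode="average"))   # window means: 3.5, 2.0, 2.5, 6.0
```

Pooling has no weights to learn; the 4 x 4 map simply becomes 2 x 2, so later layers see four times fewer values.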
Normalisation layer Different kinds of normalisation layers have been proposed
to normalise the data, but have not proven useful in practice and have therefore
not gained any solid ground.
Fully connected layer Neurons in this layer are fully connected to all activations
in the previous layers, as in regular neural networks. These are usually at the
end of the network, e.g. outputting the class probabilities.
Loss layer Often the last layer in the network; it computes the objective of the
task, such as classification, for example by applying the softmax function.
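The last two layer types can be sketched together in NumPy: a fully connected layer computes y = Wx + b, and a softmax loss layer turns the scores into class probabilities and a cross-entropy loss. The feature size (8), the three classes, the label and the random weights are hypothetical.

```python
import numpy as np


def fully_connected(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Every output neuron sees every input activation: y = W x + b."""
    return W @ x + b


def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = scores - np.max(scores)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


def cross_entropy(probs: np.ndarray, label: int) -> float:
    return float(-np.log(probs[label]))


rng = np.random.default_rng(0)
features = rng.standard_normal(8)        # e.g. a flattened feature vector
W = rng.standard_normal((3, 8)) * 0.1    # 3 hypothetical classes
b = np.zeros(3)

probs = softmax(fully_connected(features, W, b))
loss = cross_entropy(probs, label=1)
print(probs, loss)
```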
[Figure: Sample CNN, from input to output]
A key property of the convolution layer is that all spatial locations share the same convolution kernel (parameter sharing), which greatly reduces the number of parameters needed for a convolution layer.
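A quick parameter count makes the saving from parameter sharing concrete. The sketch assumes a 32 x 32 x 3 input mapped to a 32 x 32 x 16 output with 3 x 3 filters and 'same' padding, and compares it with a fully connected layer producing the same output size. Two 3 x 3 x 3 filters give 54 weights and 2 biases, which would match the "54 weight parameters, 2 biases" label of the earlier figure if it used that configuration (an assumption). The helper names are hypothetical.

```python
def conv_params(f: int, c_in: int, c_out: int) -> tuple[int, int]:
    """Weights and biases of a conv layer with c_out filters of size f x f x c_in."""
    return f * f * c_in * c_out, c_out


def dense_params(n_in: int, n_out: int) -> tuple[int, int]:
    """Weights and biases of a fully connected layer."""
    return n_in * n_out, n_out


# Two 3x3x3 filters; the count does not depend on the image size.
print(conv_params(3, 3, 2))                      # (54, 2)

# 32x32x3 input -> 32x32x16 output:
print(conv_params(3, 3, 16))                     # (432, 16)
print(dense_params(32 * 32 * 3, 32 * 32 * 16))   # (50331648, 16384)
```

The conv layer reuses the same 432 weights at every position, while the dense layer needs a separate weight for every input/output pair.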
The combination of convolution kernels with deep, hierarchical structures is very effective at learning good representations (features) from images for visual recognition tasks.
A key concept in CNNs (or, more generally, deep learning) is distributed representation. For example, suppose our task is to recognize N different types of objects and a CNN extracts M features from any input image. It is most likely that any one of the M features is useful for recognizing all N object categories, and recognizing one object type requires the joint effort of all M features.
Why CNN
Say the initial image is 224 x 224 x 3. Without convolution, the input layer needs 224 x 224 x 3 = 150,528 neurons.
After a series of convolution (and pooling) layers, the input tensor can be reduced to 1 x 1 x 1000, which means only 1000 neurons are needed in the first layer of the feedforward neural network.
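The arithmetic above can be checked in a few lines. The sketch also assumes a hypothetical first dense layer with 1000 neurons to show how the input size drives the weight count.

```python
height, width, channels = 224, 224, 3
n_inputs = height * width * channels
print(f"{n_inputs:,}")                   # 150,528 input neurons

hidden = 1000                            # hypothetical width of the first dense layer
print(f"{n_inputs * hidden:,}")          # 150,528,000 weights on the raw image
print(f"{1 * 1 * 1000 * hidden:,}")      # 1,000,000 weights after the conv stack
```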
Data Size Consideration
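Conv input: $n \times n$; filter: $f \times f$
Output: $(n - f + 1) \times (n - f + 1)$, so each convolution reduces the image size.
Padding: $p$
Output: $(n + 2p - f + 1) \times (n + 2p - f + 1)$
Padding that makes the output size equal to the input size: $n + 2p - f + 1 = n \Rightarrow p = \frac{f - 1}{2}$ (for odd $f$)
Stride: $s$
Output: $\left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right)$
No. of channels
Input: $n \times n \times n_c$, padding $p$, stride $s$. Each filter must be $f \times f \times n_c$, and with $n_c'$ filters the output is $\left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right) \times n_c'$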
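The size formulas can be wrapped in a small helper to check layer shapes. `conv_output_shape` and `same_padding` are hypothetical names, and the example sizes are arbitrary. Integer division `//` implements the floor in the stride formula.

```python
def conv_output_shape(n: int, f: int, p: int = 0, s: int = 1, n_filters: int = 1) -> tuple[int, int, int]:
    """Output volume of an n x n input convolved with n_filters filters of size f x f."""
    side = (n + 2 * p - f) // s + 1      # floor((n + 2p - f) / s) + 1
    return side, side, n_filters


def same_padding(f: int) -> int:
    """Padding p = (f - 1) / 2 that keeps the output size equal to the input (stride 1, odd f)."""
    if f % 2 == 0:
        raise ValueError("'same' padding needs an odd filter size")
    return (f - 1) // 2


print(conv_output_shape(6, 3))                             # (4, 4, 1): n - f + 1
print(conv_output_shape(6, 3, p=same_padding(3)))          # (6, 6, 1): size preserved
print(conv_output_shape(7, 3, s=2))                        # (3, 3, 1)
print(conv_output_shape(224, 7, p=3, s=2, n_filters=64))   # (112, 112, 64)
```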
Flattening. Flattening is used to convert all the resultant 2-Dimensional arrays from pooled
feature maps into a single long continuous linear vector.
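In NumPy, flattening is a reshape. This sketch assumes pooled maps stored as (channels, height, width) and, for a batch, keeps the first (example) axis while flattening the rest; the sizes are arbitrary.

```python
import numpy as np

pooled = np.arange(2 * 3 * 3).reshape(2, 3, 3)   # 2 pooled feature maps, each 3 x 3
flat = pooled.reshape(-1)                          # one vector of 2 * 3 * 3 = 18 values
print(flat.shape)                                  # (18,)

batch = np.zeros((4, 2, 3, 3))                     # 4 examples, (channels, height, width)
flat_batch = batch.reshape(batch.shape[0], -1)     # keep the batch axis, flatten the rest
print(flat_batch.shape)                            # (4, 18)
```

The flattened vector is what the first fully connected layer receives as input.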
In July 2012, researchers at Google exposed an advanced neural network to a
series of unlabelled, static images sliced from YouTube videos.
To their surprise, they discovered that the neural network learned a cat-detecting
neuron on its own, supporting the popular assertion that “the internet is made of
cats”.
U-Nets
A U-Net is a convolutional neural network architecture that was developed for biomedical image segmentation. U-Nets have been found to be very effective for tasks where the output is of similar size to the input and needs a comparable spatial resolution. This makes them very good for creating segmentation masks and for image processing/generation tasks such as super-resolution.
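The U shape can be traced with a shape-only NumPy sketch: a contracting path halves the spatial size while adding channels, and an expanding path upsamples and concatenates the saved high-resolution features (skip connections), so the output mask matches the input size. This is not the original U-Net: random 1 x 1 convolutions stand in for learned 3 x 3 conv blocks, nearest-neighbour repetition stands in for up-convolution, and `tiny_unet`, `base` and `depth` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_block(x: np.ndarray, c_out: int, relu: bool = True) -> np.ndarray:
    """Stand-in for a learned conv block: a random 1x1 convolution (+ ReLU). x is (C, H, W)."""
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    y = np.einsum("oc,chw->ohw", w, x)
    return np.maximum(y, 0.0) if relu else y


def down(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2: halves height and width."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def up(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling by 2: doubles height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def tiny_unet(image: np.ndarray, base: int = 8, depth: int = 3) -> np.ndarray:
    skips, x, c = [], image, base
    # Contracting path: more channels, smaller maps.
    for _ in range(depth):
        x = conv_block(x, c)
        skips.append(x)                      # saved for the matching decoder level
        x = down(x)
        c *= 2
    x = conv_block(x, c)                     # bottleneck
    # Expanding path: upsample, then concatenate the saved features along channels.
    for skip in reversed(skips):
        c //= 2
        x = np.concatenate([up(x), skip], axis=0)
        x = conv_block(x, c)
    return conv_block(x, 1, relu=False)      # one-channel mask logits


image = rng.standard_normal((3, 64, 64))
mask = tiny_unet(image)
print(image.shape, "->", mask.shape)         # (3, 64, 64) -> (1, 64, 64)
```

The skip connections are what let the decoder recover the fine spatial detail that pooling discards.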
PRACTICAL: Step by Step Guide