
MACHINE LEARNING

UNIT 3
CONVOLUTIONAL NEURAL NETWORK
• A Convolutional Neural Network, or CNN, is a class of deep learning models and neural networks most commonly used to analyze visual imagery.
• In machine learning, a convolution is a mathematical operation that measures how much two functions overlap as one slides over the other.
• A ConvNet usually has 3 types of layers:
1) Convolutional Layer (CONV)
2) Pooling Layer (POOL)
3) Fully Connected Layer (FC)
• Convolution
The main building block of a CNN is the convolution layer. Convolution is a mathematical operation that merges two sets of information.
• Pooling
After a convolution operation we usually perform pooling to reduce the dimensionality.
• Fully Connected
After the convolution and pooling layers we add a couple of fully connected layers to wrap up the CNN architecture.
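As an illustration of the convolution operation described above, here is a minimal NumPy sketch; the image values, the averaging kernel, and the `conv2d` helper name are all invented for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # merge the two sets of information: window times kernel
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0   # simple averaging filter
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (2, 2)
```

Note that the 4x4 input shrinks to a 2x2 feature map, which is exactly the shrinking effect the Padding slides below address.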
Flattening
• Flattening converts the data into a 1-dimensional array for input to the next layer.
• We are building a classification model, so the processed data must be good input to the model: a 1-dimensional linear vector. Rectangular or cubic shapes can't be direct inputs.
• We flatten the output of the convolution layers to create a single long feature vector, which is connected to the final classification model, i.e. the fully connected layer.
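The flattening step can be sketched in NumPy; the 5x5 spatial size and 8 channels of the pooled output are hypothetical values chosen for illustration:

```python
import numpy as np

# hypothetical pooled output: 5x5 spatial map with 8 channels
pooled = np.random.rand(5, 5, 8)

# flatten the cube into one long feature vector for the dense layer
flat = pooled.reshape(-1)
print(flat.shape)  # (200,)
```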
Padding
• Padding is used in CNN to preserve the size of
feature maps.
• If we take a neural net with hundreds of layers, repeated filtering leaves us a very small image in the end. So there are two main downsides of applying filters:
I. Shrinking outputs
II. Losing information at the corners of the image
Padding
• If we want to maintain the same dimensionality, we
use padding to surround the input with zeros. 
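A minimal NumPy sketch of zero padding (the 3x3 image is a made-up example): surrounding a 3x3 input with one ring of zeros lets a 3x3 filter produce a 3x3 output again, preserving the feature-map size.

```python
import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)

# pad one ring of zeros on every side ("same" padding for a 3x3 filter)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (5, 5)
```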

Figure 1 : Apply Padding

Figure 2 : Implementation of Padding


Stride 

• Stride specifies how much we move the convolution filter at each step.
• In other words, stride is the number of pixel shifts over the input matrix.
• Stride controls how the filter convolves around the image volume.
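A NumPy sketch of strided convolution (the helper name and sizes are invented for the example); it illustrates the usual output-size rule, (W - F) / S + 1 for input width W, filter size F, and stride S:

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Valid cross-correlation, moving the filter `stride` pixels per step."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

image = np.ones((6, 6))
kernel = np.ones((2, 2))
print(conv2d_strided(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d_strided(image, kernel, stride=2).shape)  # (3, 3)
```

A larger stride skips positions, so it shrinks the output faster.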
Stride
The following figure demonstrates a stride of 1.  

(The figure shows steps 1 through 9 of the filter sliding across the input.)
Pooling
• The pooling layer is a building block of a CNN. Its function is to progressively reduce the size of the representation, which reduces the number of parameters and the amount of computation in the network.
• The pooling layer operates on each feature map independently.
• Types of Pooling :
i. Max Pooling
ii. Average Pooling
Pooling
• Sliding a window, we take only the maximum value inside the box in the left picture. This is 'max pooling.'
• We can also take the average of the values, as in the picture on the right. This is 'average pooling.'
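Both pooling types can be sketched with NumPy; the `pool2d` helper and the 4x4 feature map are invented for illustration:

```python
import numpy as np

def pool2d(x, size=2, mode='max'):
    """Non-overlapping pooling (stride equals the window size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # group the map into size x size windows
    windows = x[:h*size, :w*size].reshape(h, size, w, size)
    if mode == 'max':
        return windows.max(axis=(1, 3))      # max pooling
    return windows.mean(axis=(1, 3))         # average pooling

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 1.],
                 [3., 4., 0., 5.]])
print(pool2d(fmap, mode='max'))  # [[6. 4.] [7. 9.]]
print(pool2d(fmap, mode='avg'))  # [[3.75 2.25] [4.   3.75]]
```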
Loss Layer
• The loss layer is the layer in a CNN that calculates the deviation between the predicted output and the expected (target) output.
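As a small illustration, one common choice of loss is the cross-entropy between a softmax output and the true class; the probability values below are made up:

```python
import numpy as np

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[target_index])

probs = np.array([0.7, 0.2, 0.1])   # hypothetical softmax output
loss = cross_entropy(probs, 0)       # true class is index 0
print(round(loss, 4))  # 0.3567
```

The loss is small when the predicted probability of the true class is high, and grows as the prediction deviates from the target.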

Dense Layer
• A Dense layer is a regular layer of neurons in a neural network, in which each neuron receives input from all the neurons in the previous layer.
• A densely connected layer learns features from all the combinations of the outputs of the previous layer.
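A dense layer reduces to a matrix multiplication plus a bias; a minimal NumPy sketch with invented weights and inputs:

```python
import numpy as np

def dense(x, W, b):
    """Every output neuron sees every input: y = xW + b."""
    return x @ W + b

x = np.array([1.0, 2.0, 3.0])   # 3 inputs
W = np.ones((3, 2))             # 3 inputs -> 2 neurons
b = np.array([0.5, -0.5])
print(dense(x, W, b))  # [6.5 5.5]
```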
Dense layer
1 * 1 Convolution
• 1*1 convolution simply means the filter has a size of 1*1. This 1*1 filter convolves over the entire input image pixel by pixel.
• A 1*1 convolution maps an input pixel, with all of its channels, to an output pixel.
• 1*1 convolution is used for dimensionality reduction in CNNs.
• It reduces the depth of the input volume, i.e. the number of channels. The 1*1 convolutional layer is also called a Pointwise Convolution.
• If we want to reduce the depth but keep the Height x Width of the feature maps the same, we can choose 1x1 filters (where: number of filters = output channels) to achieve this effect.
• This effect of cross-channel down-sampling is called 'dimensionality reduction'.
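The channel-reducing effect of a 1x1 convolution can be sketched in NumPy; the shapes (an 8x8 map with 64 channels reduced to 16) are arbitrary examples:

```python
import numpy as np

# feature map: height 8, width 8, 64 channels
x = np.random.rand(8, 8, 64)

# 1x1 convolution reducing 64 channels to 16: one weight matrix (64, 16)
W = np.random.rand(64, 16)

# each pixel's 64-channel vector is mapped independently to 16 channels
y = x @ W   # equivalent to np.einsum('hwc,cd->hwd', x, W)
print(y.shape)  # (8, 8, 16)
```

The spatial size stays 8x8; only the depth changes, which is exactly the cross-channel down-sampling described above.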
Inception Network
• Inception Modules are used in Convolutional Neural
Networks to allow for more efficient computation and
deeper Networks through a dimensionality reduction with
stacked 1×1 convolutions.

• The solution, in short, is to take multiple filter sizes within the CNN and, rather than stacking them sequentially, order them to operate on the same level.
• The main difference between Inception models and regular CNNs is the inception block, which applies convolutions with multiple filter sizes to the same input and concatenates their results.

Figure : Basic structure of Inception Network with multiple filters on same level
Using the bottleneck approach we can rebuild the inception module with fewer parameters. A max pooling layer is also added to summarize the content of the previous layer. All the results are concatenated one after the other and given to the next layer.
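The concatenation at the heart of an inception block can be sketched in NumPy; the branch outputs below are random placeholders with hypothetical channel counts rather than real convolutions:

```python
import numpy as np

h, w = 28, 28
# hypothetical outputs of the parallel branches, all with the same spatial size
branch_1x1  = np.random.rand(h, w, 64)    # 1x1 convolutions
branch_3x3  = np.random.rand(h, w, 128)   # 3x3 convolutions
branch_5x5  = np.random.rand(h, w, 32)    # 5x5 convolutions
branch_pool = np.random.rand(h, w, 32)    # max pooling branch

# inception block output: concatenate the branches along the channel axis
out = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool],
                     axis=-1)
print(out.shape)  # (28, 28, 256)
```

Because all branches preserve the spatial size, only the channel counts add up (64 + 128 + 32 + 32 = 256).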
Versions of the Inception Network

• Its constant evolution led to the creation of several versions of the network. The popular versions are as follows:
• Inception v1.
• Inception v2 and Inception v3.
• Inception v4 and Inception-ResNet.
Inception v1
• This is popularly known as GoogLeNet (Inception v1).
• GoogLeNet has:
 9 inception modules;
 it is 22 layers deep, not counting the pooling layers.
• Inception v1 combines 1x1, 3x3, and 5x5 convolutional layers with a 3x3 pooling layer in parallel and concatenates their outputs.
Inception v2
• The Inception v2 network improves on the basis of Inception v1.
• It replaces the 5x5 convolution in the inception module with two stacked 3x3 convolutions, which reduces the number of parameters and speeds up the computation.
Figure : Inception module for Inception V2
Inception v3
• One of the most important improvements in
Inception v3 is Factorization.
• Factorization splits a convolution with an nxn filter into a combination of 1xn and nx1 convolutions. For example, a 3x3 convolution can be replaced by a 1x3 convolution followed by a 3x1 convolution on its output (exactly equivalent when the 3x3 kernel is separable).
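The factorization can be checked numerically. The sketch below (with an invented `conv2d` helper) confirms that, for a separable 3x3 kernel, a 1x3 convolution followed by a 3x1 convolution gives the same result as one 3x3 convolution:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6))
row = rng.random((1, 3))   # 1x3 kernel
col = rng.random((3, 1))   # 3x1 kernel

full = conv2d(image, col @ row)             # one separable 3x3 kernel
factored = conv2d(conv2d(image, row), col)  # 1x3 followed by 3x1
assert np.allclose(full, factored)
print(factored.shape)  # (4, 4)
```

The factored form needs 3 + 3 = 6 weights instead of 9, which is the parameter saving Inception v3 exploits.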
Figure : Inception module for Inception V3
Inception v4
• Inception v4 is a combination of Microsoft's Residual Networks (ResNet) and Inception v3.
• ResNet's residual structure can greatly accelerate training while improving performance; combining it with Inception yields the Inception-ResNet v2 network, and a deeper, more optimized Inception v4 model was also designed.

• The stem here refers to the initial set of operations performed before introducing the Inception blocks.
The left image is the stem of Inception-ResNet v1. The right image is the stem of Inception v4 
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
Input Channels
• CNNs have multiple input channels for an RGB image, where each channel carries the intensity of one colour: red, green, or blue.
• The input layer or input volume is an image that has the following
dimensions: [width x height x depth].
• For example INPUT [32x32x3] will hold the pixel values of the image,
in this case an image of width 32, height 32, and with 3 colour
channels R,G,B.
• In computer vision the input is often a 3-channel RGB image. A greyscale image, in contrast, has one channel (a two-dimensional matrix), over which a 3x3 convolution kernel (also a two-dimensional matrix) slides directly.
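A short NumPy sketch of how a kernel handles multiple input channels, with shapes matching the 32x32x3 example above (the kernel values are random placeholders): the products over all three channels are summed into a single output value per position.

```python
import numpy as np

# RGB input volume: width 32, height 32, depth 3
x = np.random.rand(32, 32, 3)

# a 3x3 kernel for a 3-channel input has shape (3, 3, 3)
kernel = np.random.rand(3, 3, 3)

# one 3x3 receptive field, across all channels
patch = x[0:3, 0:3, :]

# products over ALL channels collapse into a single feature-map value
value = np.sum(patch * kernel)
print(x.shape, kernel.shape)  # (32, 32, 3) (3, 3, 3)
```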
Transfer Learning
• Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task.

• Why Transfer Learning?
 Building every model from scratch is time consuming and expensive (data collection and time to train).
 Reusing common knowledge from an existing system is more practical.

• How to Use Transfer Learning?
Two common approaches for transfer learning are as follows:
i. Develop Model Approach
ii. Pre-trained Model Approach
One – Shot Learning
• One-shot learning aims to learn information about object
categories from one, or only a few, training images.
• Learn a concept from one or only a few examples.
• In machine learning using CNNs, one of the biggest challenges and limitations is the requirement for a big set of labeled data. In many applications, collecting this much data is not feasible. One-shot learning aims to solve this problem.
Dimension/ Dimensionality Reduction
• Dimensionality reduction is simply the process of reducing the dimension of your feature set.
• In machine learning, the final classification often depends on a large number of factors, known as variables or features. The higher the number of features, the harder it gets to visualize the training set and then work on it. Moreover, many of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play.
• Methods of Dimensionality Reduction
The various methods used for dimensionality
reduction include:
I. Principal Component Analysis (PCA)
II. Linear Discriminant Analysis (LDA)
I. Principal Component Analysis (PCA)
It works on the condition that while the data in a higher-dimensional space is mapped to data in a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximized.
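A minimal NumPy sketch of PCA (the data is random and the sizes are arbitrary): center the data, compute the covariance, and project onto the top eigenvectors, i.e. the directions of maximum variance.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((100, 5))                 # 100 samples, 5 features

Xc = X - X.mean(axis=0)                  # 1. center the data
cov = np.cov(Xc, rowvar=False)           # 2. covariance matrix (5x5)
eigvals, eigvecs = np.linalg.eigh(cov)   # 3. eigendecomposition
order = np.argsort(eigvals)[::-1]        # 4. sort by explained variance
components = eigvecs[:, order[:2]]       # 5. keep the top 2 directions
X_reduced = Xc @ components              # 6. project to 2 dimensions
print(X_reduced.shape)  # (100, 2)
```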

II. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (also called Normal Discriminant Analysis or Discriminant Function Analysis) is a dimensionality reduction technique. It is used for modelling differences between groups, i.e. separating two or more classes.
• Two criteria are used by LDA to create a new axis:
i. Maximize the distance between the means of the two classes.
ii. Minimize the variation within each class.
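Fisher's two criteria can be combined into a single projection direction, w = Sw^-1 (m1 - m0), where Sw is the within-class scatter and m0, m1 are the class means. A sketch with synthetic two-class data (all values invented):

```python
import numpy as np

rng = np.random.default_rng(2)
class0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))   # class 0 samples
class1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))   # class 1 samples

m0, m1 = class0.mean(axis=0), class1.mean(axis=0)
# within-class scatter: variation inside each class (criterion ii)
Sw = np.cov(class0, rowvar=False) + np.cov(class1, rowvar=False)
# Fisher's direction: separate the means relative to the scatter (criterion i)
w = np.linalg.solve(Sw, m1 - m0)
w /= np.linalg.norm(w)

# the projected class means are separated along the new axis
print(class1.mean(axis=0) @ w > class0.mean(axis=0) @ w)  # True
```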
PCA vs. LDA:
• Principal Component Analysis (PCA) finds the direction that maximizes the variance in the data.
• Linear Discriminant Analysis (LDA) finds the direction that maximizes the differences (separation) between the classes.
Implementation of CNNs with libraries such as TensorFlow, Keras, etc.

Tensors

• Sometimes we need to organize information with more than 2 dimensions; we call an n-dimensional array a tensor.
• For example, a 1D tensor is a vector, a 2D tensor is a matrix, a 3D tensor is a cube, a 4D tensor is a vector of cubes, and a 5D tensor is a matrix of cubes.
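The examples above, expressed with NumPy's `ndim` attribute (the shapes are arbitrary):

```python
import numpy as np

vector = np.zeros(3)              # 1D tensor
matrix = np.zeros((3, 3))         # 2D tensor
cube   = np.zeros((3, 3, 3))      # 3D tensor
batch  = np.zeros((10, 3, 3, 3))  # 4D tensor: a "vector of cubes"
print(vector.ndim, matrix.ndim, cube.ndim, batch.ndim)  # 1 2 3 4
```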
• We can merge the weights and biases (the 'bias trick') and solve the linear classification as a single matrix multiplication.
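The bias trick can be verified directly in NumPy (the weight and input values are invented for the example): append the bias vector as an extra column of W and a constant 1 to x, and a single matrix multiplication reproduces Wx + b.

```python
import numpy as np

W = np.array([[1., 2.],
              [3., 4.]])
b = np.array([0.5, -1.0])
x = np.array([1.0, 1.0])

plain = W @ x + b                   # usual score computation

# bias trick: fold b into W and append a 1 to x
W_aug = np.hstack([W, b[:, None]])  # shape (2, 3)
x_aug = np.append(x, 1.0)           # shape (3,)
merged = W_aug @ x_aug              # one matrix multiplication
assert np.allclose(plain, merged)
print(merged)  # [3.5 6. ]
```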
