
DEEP LEARNING(R20)

Unit-III
CONVOLUTIONAL NEURAL NETWORKS
CNN Architecture
A Convolutional Neural Network (CNN) consists of multiple layers: convolutional layers, pooling layers, and fully connected
layers.

Convolutional Layer (CONV): Convolutional layers are the foundation of a CNN and carry out the convolution operations.
The component that performs the convolution is the kernel (or filter), a small matrix of learnable weights. The kernel slides
across the image horizontally and vertically, moving by the stride each time, until the complete image has been scanned. The
kernel is smaller than the image in height and width, but it extends through the full depth of the input. This means that if
the image has three (RGB) channels, the kernel's height and width will be small, but its depth will span all three channels.
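The sliding-kernel operation described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name conv2d, the 4x4 three-channel image and the all-ones kernel are assumptions made for the example, and no padding is applied. Like most deep learning libraries, the sketch does not flip the kernel (strictly speaking it computes a cross-correlation).

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both nested lists indexed [channel][row][col]), no padding."""
    channels = len(image)
    img_h, img_w = len(image[0]), len(image[0][0])
    k_h, k_w = len(kernel[0]), len(kernel[0][0])
    out_h = (img_h - k_h) // stride + 1
    out_w = (img_w - k_w) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Sum over every channel and every position covered by the kernel.
            for c in range(channels):
                for di in range(k_h):
                    for dj in range(k_w):
                        total += image[c][i * stride + di][j * stride + dj] * kernel[c][di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3-channel 4x4 image: every channel holds the values 0..15, row by row.
channel = [[r * 4 + c for c in range(4)] for r in range(4)]
image = [channel, channel, channel]
# A 2x2 kernel of ones that spans all three channels.
kernel = [[[1, 1], [1, 1]] for _ in range(3)]

feature_map = conv2d(image, kernel, stride=2)
print(feature_map)  # [[30, 54], [126, 150]]
```

With a stride of 2 the 4x4 input produces a 2x2 feature map; with a stride of 1 the same kernel would produce a 3x3 feature map.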

Besides convolution, the other important part of a convolutional layer is the non-linear activation function.
The outputs of linear operations such as convolution are passed through a non-linear activation function. Smooth
non-linear functions such as the sigmoid or hyperbolic tangent (tanh) were formerly used because they are
mathematical representations of biological neuron behavior. The rectified linear unit (ReLU) is now the most commonly used
non-linear activation function:
f(x) = max(0, x)
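The three activation functions mentioned above can be compared with a few lines of plain Python. This is only an illustrative sketch; the helper names relu and sigmoid and the sample inputs are chosen for the example.

```python
import math


def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, which squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# tanh is available directly as math.tanh and squashes values into (-1, 1).
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), round(sigmoid(x), 4), round(math.tanh(x), 4))
```

ReLU passes positive values through unchanged and sets negative values to zero, whereas sigmoid and tanh saturate for inputs of large magnitude.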
Pooling Layer (POOL): This layer reduces the spatial dimensions of the feature maps, which lowers the amount of computing power
required to process the data. The two common types are max pooling and average pooling. Max pooling returns the maximum value
from the area of the feature map covered by the kernel. Average pooling returns the average of all the values in that
area.

Fully Connected Layer (FC): The fully connected layer works with a flattened input, in which every input value is connected to
every neuron. The flattened vector is usually passed through a few more FC layers, each applying a weighted sum followed by an
activation function. The classification process starts at this point. When FC layers are present, they are usually found near the
end of a CNN architecture.

ASCET GUDUR 1

Along with the above layers, there are some additional terms that are part of a CNN architecture.

Activation Function: The activation function of the last fully connected layer is often different from the others, and it must be
chosen to suit the task. For multiclass classification the usual choice is the softmax function, which normalizes the real-valued
outputs of the last fully connected layer into target class probabilities: each value lies between 0 and 1, and all values sum
to 1.
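A minimal sketch of the softmax function in plain Python follows. The function name and the three example scores are assumptions for illustration; subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last fully connected layer for three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The largest score receives the largest probability, and the order of the classes is preserved.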

Dropout Layers: A Dropout layer is a mask that nullifies the contributions of some neurons to the following layer while leaving all
others unchanged. Applied to the input vector, it nullifies some input features; applied to a hidden layer, it nullifies some
hidden neurons. Dropout is active only during training, and the neurons to drop are chosen at random at each training step. Dropout
layers matter in CNN training because they help prevent the model from overfitting the training data. Without them, the first batch
of training samples can influence learning disproportionately.
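The masking behaviour of a Dropout layer can be sketched as follows. This is an illustrative plain-Python version, not the Keras layer: the function name, the drop rate of 0.5 and the sample activations are assumptions. The sketch uses the common "inverted dropout" variant, which is also an assumption here: surviving activations are divided by the keep probability so that their expected value stays the same.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [0.5, 1.2, -0.3, 0.8, 2.0, -1.1]  # hypothetical hidden-layer outputs
dropped = dropout(hidden, rate=0.5, rng=rng)
print(dropped)
```

Calling the function with training=False returns the activations unchanged, which mirrors how dropout is switched off when the trained model is used for prediction.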

CONVOLUTION
• A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in
Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to understand and
interpret images and other visual data.
• A CNN is an extension of the Artificial Neural Network (ANN) that is predominantly used to extract features from
grid-like data, for example visual datasets such as images or videos, where spatial patterns play an important role.
• Artificial Neural Networks perform well on many machine learning tasks and are used on various kinds of data, such as
images, audio, and text. Different types of neural networks serve different purposes: for predicting a sequence of
words we use Recurrent Neural Networks (more precisely, LSTMs), and for image classification we use Convolutional
Neural Networks. This section describes the basic building blocks of a CNN.

In a regular Neural Network there are three types of layers:


Input Layer: The layer through which we give input to the model. The number of neurons in this layer is equal to the total
number of features in our data (the number of pixels in the case of an image).
Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many hidden layers, depending
on the model and the size of the data. Each hidden layer can have a different number of neurons. The output of each layer is
computed by multiplying the output of the previous layer by the learnable weights of that layer, adding the learnable biases,
and applying an activation function, which makes the network non-linear.
Output Layer: The output from the last hidden layer is then fed into a logistic function such as sigmoid or softmax, which
converts the output for each class into a probability score.
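The layer computation described above (weights times inputs, plus bias, followed by an activation) can be sketched for a tiny network in plain Python. The function name dense and all weights, biases and inputs below are hypothetical values chosen for illustration.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights of neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 2.0]  # input layer: two features
# Hidden layer with three neurons and ReLU activation.
hidden = dense(x, weights=[[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]],
               biases=[0.0, -1.0, 0.5], activation=relu)
# Output layer with one neuron and sigmoid activation.
output = dense(hidden, weights=[[1.0, -1.0, 2.0]], biases=[0.0], activation=sigmoid)

print(hidden)  # [0.0, 2.0, 0.5]
print(output)
```

The single sigmoid output can be read as the probability of the positive class in a two-class problem.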


Feeding the data into the model and computing the output of each layer as described above is called feedforward. We then
calculate the error using an error function; common choices are cross-entropy and squared-error loss. The error
function measures how well the network is performing. After that, we propagate the error backward through the model by
calculating the derivatives of the error with respect to the weights. This step is called backpropagation, and the derivatives
are used to update the weights so as to minimize the loss.
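The cycle of feedforward, error calculation and weight update can be sketched for the simplest possible case: a single sigmoid neuron trained with cross-entropy loss. The four-sample dataset, the learning rate and the number of epochs are hypothetical values chosen for illustration; a real network repeats the same idea layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction p and one label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Hypothetical 1-D dataset: the label is 1 when x is positive.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, lr = 0.0, 0.0, 0.5
losses = []

for epoch in range(50):
    grad_w = grad_b = total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)        # feedforward
        total += cross_entropy(p, y)  # error
        grad_w += (p - y) * x         # derivative of the loss with respect to w
        grad_b += (p - y)             # derivative of the loss with respect to b
    n = len(data)
    w -= lr * grad_w / n              # gradient-descent update
    b -= lr * grad_b / n
    losses.append(total / n)

print(round(losses[0], 4), round(losses[-1], 4))
```

The average loss starts at ln 2 (the neuron predicts 0.5 for every sample) and decreases as the weight moves in the direction that reduces the error.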

POOLING LAYERS
• Pooling layers are used to reduce the dimensions of the feature maps. This reduces the amount of computation performed
in the network and the number of parameters to learn in later layers.
• The pooling layer summarises the features present in a region of the feature map generated by a convolution layer.
Further operations are therefore performed on summarised features instead of the precisely positioned features generated
by the convolution layer. This makes the model more robust to variations in the position of features in the input image.
• In a CNN, a pooling layer is typically added after convolutional layers. It reduces the spatial dimensions (the width
and height) of the feature maps while preserving the depth (the number of channels).
• The pooling layer works by dividing the input feature map into a set of regions, called pooling regions, which are
usually non-overlapping. Each pooling region is then reduced to a single output value that represents the presence of a
particular feature in that region. The most common pooling operations are max pooling and average pooling.
• In max pooling, the output value for each pooling region is simply the maximum of the input values within that
region. This preserves the most salient features in each pooling region while discarding less relevant information.
Max pooling is often used in CNNs for object recognition tasks, as it helps to identify the most distinctive features
of an object, such as its edges and corners.
• In average pooling, the output value for each pooling region is the average of the input values within that region. This
preserves more of the overall information in the region, though it can dilute the most salient features.
Types of pooling layers:
1. Max Pooling:
Max pooling selects the maximum element from the region of the feature map covered by the filter. The output of a max-pooling
layer is therefore a feature map containing the most prominent features of the previous feature map.

2. Average Pooling:
Average pooling computes the average of the elements in the region of the feature map covered by the filter. Thus, while max
pooling gives the most prominent feature in a particular patch of the feature map, average pooling gives the average of
the features present in that patch.
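Both pooling types can be sketched with one small plain-Python function. The function name pool2d and the 4x4 feature map are assumptions for illustration, and the sketch covers only the non-overlapping case in which the stride equals the window size.

```python
def pool2d(feature_map, size, mode="max"):
    """Non-overlapping pooling (stride == size) over a nested list indexed [row][col]."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current pooling region.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

max_pooled = pool2d(feature_map, 2, "max")
avg_pooled = pool2d(feature_map, 2, "avg")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.5, 5.0]]
```

In both cases the 4x4 feature map shrinks to 2x2: max pooling keeps the strongest response in each region, while average pooling blends all four values.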

Advantages :
a) Dimensionality reduction
b) Translation invariance
c) Feature selection
Disadvantages :
a) Information loss
b) Over-smoothing
c) Hyperparameter tuning


TRANSFER LEARNING
• Transfer learning is the reuse of a pre-trained model on a new problem. It is currently very popular in deep learning
because it makes it possible to train deep neural networks with comparatively little data. This is very useful in data
science, since most real-world problems do not come with millions of labeled data points for training such complex models.

• In transfer learning, the knowledge of an already trained machine learning model is applied to a different but related
problem. For example, if you trained a simple classifier to predict whether an image contains a backpack, you could use
the knowledge the model gained during training to recognize other objects, such as sunglasses.
• With transfer learning, we try to exploit what has been learned in one task to improve generalization in another.
We transfer the weights that a network has learned at "task A" to a new "task B."
• The general idea is to take the knowledge a model has learned from a task with a lot of labeled training data and use it
in a new task that doesn't have much data. Instead of starting the learning process from scratch, we start with patterns
learned from solving a related task.
• Transfer learning is mostly used in computer vision and in natural language processing tasks such as sentiment analysis,
because training models for these tasks from scratch requires a huge amount of computational power.

Working Of Transfer Learning:


In computer vision, for example, neural networks usually learn to detect edges in the earlier layers, shapes in the middle
layers, and task-specific features in the later layers. In transfer learning, the early and middle layers are reused and
only the later layers are retrained. This leverages the labeled data of the task the model was initially trained on.

Let's go back to the example of a model trained to recognize a backpack in an image, which will now be used to identify
sunglasses. In the earlier layers, the model has learned general features for recognizing objects, so we retrain only the
later layers so that it learns what separates sunglasses from other objects.
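The idea of reusing frozen early layers and retraining only the later ones can be sketched in plain Python. This is a toy stand-in, not a real pre-trained network: FROZEN_WEIGHTS, extract_features, train_head, predict and the four-sample dataset are all hypothetical names and values chosen for illustration. The frozen weights play the role of the early and middle layers, and the small logistic-regression "head" plays the role of the retrained later layers.

```python
import math

# Stand-in for pre-trained early/middle layers: these weights are never updated.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def extract_features(x):
    """Frozen layers: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new output layer (the head) on top of the frozen features."""
    head_w = [0.0] * len(FROZEN_WEIGHTS)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = extract_features(x)
            p = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            err = p - y  # gradient of the cross-entropy loss at the head's input
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


def predict(x, head_w, head_b):
    feats = extract_features(x)
    return 1 if sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b) >= 0.5 else 0


# Hypothetical new task with only four labeled examples.
samples = [[2.0, 0.1], [1.5, 0.3], [0.1, 2.0], [0.2, 1.6]]
labels = [1, 1, 0, 0]

head_w, head_b = train_head(samples, labels)
predictions = [predict(x, head_w, head_b) for x in samples]
print(predictions)
```

Only head_w and head_b change during training. Because the feature extractor is fixed, very few parameters have to be learned, which is why a small labeled dataset can be enough.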

In transfer learning, we try to transfer as much knowledge as possible from the previous task the model was trained on to
the new task at hand. This knowledge can be in various forms depending on the problem and the data. For example, it
could be how models are composed, which allows us to more easily identify novel objects.

Approaches to Transfer Learning:

1. TRAINING A MODEL TO REUSE IT:

Imagine you want to solve task A but don't have enough data to train a deep neural network. One way around this
is to find a related task B with an abundance of data. Train the deep neural network on task B and use the model as a starting
point for solving task A. Whether you need the whole model or only a few layers depends heavily on the problem
you are trying to solve. If both tasks have the same kind of input, you may be able to reuse the model as it is and make
predictions for your new input. Alternatively, you can change and retrain the task-specific layers and the output layer.

2. USING A PRE-TRAINED MODEL:


The second approach is to use an already pre-trained model. Many such models are available, so it is worth doing a little
research. How many layers to reuse and how many to retrain depends on the problem. Keras, for example, provides numerous
pre-trained models that can be used for transfer learning, prediction, feature extraction and fine-tuning. These
models, along with brief tutorials on how to use them, are listed in the Keras documentation. Many research institutions also
release trained models.

This is the most commonly used type of transfer learning in deep learning.

3. FEATURE EXTRACTION:

Another approach is to use deep learning to discover the best representation of your problem, which means finding the most
important features. This approach is also known as representation learning, and it can often give much better performance
than hand-designed representations. It is mostly used in computer vision because it can reduce the size of your dataset,
which decreases computation time and also makes the data more suitable for traditional algorithms.

Use of transfer Learning:


Transfer learning has several benefits; the main ones are shorter training time, better neural network performance,
and not needing a lot of data.

Usually, a lot of data is needed to train a neural network from scratch, but access to that data isn't always available. This is
where transfer learning comes in handy. With transfer learning, a solid machine learning model can be built with comparatively
little training data because the model is already pre-trained. This is especially valuable in natural language processing, because
expert knowledge is usually required to create large labeled datasets. Additionally, training time is reduced, because training
a deep neural network from scratch on a complex task can take days or even weeks.

IMAGE CLASSIFICATION USING TRANSFER LEARNING:

• Image classification is a task in which a computer predicts which class an image belongs to. Before deep
learning became popular, image classification models could not reach human-level performance, because the
machine learning models of the time could not learn the neighborhood information of an image; they only
received pixel-level information.
• Image classification can reach human-level performance using a model called a Convolutional Neural
Network (CNN).

• A CNN is a type of deep learning model that learns representations from images. It can learn features from
low level to high level without human involvement.
• The model learns more than pixel-level information: it also learns the neighborhood information of an
image through a mechanism called convolution. Convolution aggregates neighborhood information by
multiplying the pixels in a region by the kernel weights and summing the products into a single value. The
resulting features are used to classify the image into a class.
• Although deep learning can achieve human-level performance, it needs a large amount of data. What if we
don't have it? We can use a concept called transfer learning.
• Transfer learning is a method in which we use a model that has already been trained on large-scale data for
our own problem, so we only need to fine-tune it. The benefit is that the model trains in a short time.
• Now let's see how to use transfer learning for image classification with TensorFlow. Because preprocessing
is an essential part of the process, we start with the libraries and the data.

The Implementation
1.Import Libraries:

• The first step is to import the libraries. We need TensorFlow, NumPy, os, and pandas. If you haven't
installed these packages yet, you can install them with the pip command (os is part of the Python standard
library and needs no installation). Here is the code for loading the libraries.
import os
import numpy as np
import pandas as pd
import tensorflow as tf

2.Prepare The Data:

• After loading the libraries, the next step is to prepare the dataset. In this case, we will use a dataset called
Food-5K. This dataset consists of 5000 images in two classes, food and non-food. The data is already divided
into training, validation, and test (evaluation) sets. The folder structure of the dataset looks like this:
\---Food-5K
    +---evaluation
    |       0_0.jpg
    |       1_0.jpg
    |
    +---training
    |       0_0.jpg
    |       1_1.jpg
    |
    \---validation
            0_0.jpg
            1_0.jpg

• Each folder contains images, and each image filename contains the class and an identifier, separated by an
underscore. From that folder structure, we need to generate a dataframe whose columns are the image filename
and the label.
• The code for preparing the dataset looks like this:
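def prepare_df(data_type):
    """Return a dataframe of image filenames and labels for one data split."""
    X = []
    y = []
    path = os.path.join('Food-5K', data_type)

    # Sort so that the row order does not depend on the file system.
    for i in sorted(os.listdir(path)):
        # Image filename, e.g. '0_12.jpg'
        X.append(i)
        # Label: the part of the filename before the underscore
        y.append(i.split('_')[0])

    X = np.array(X)
    y = np.array(y)

    df = pd.DataFrame()
    df['filename'] = X
    df['label'] = y

    return df

df_train = prepare_df('training')
df_val = prepare_df('validation')
df_test = prepare_df('evaluation')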

And here is a preview of the data frame:

   filename    label
0  0_0.jpg     0
1  0_1.jpg     0
2  0_10.jpg    0
3  0_100.jpg   0
4  0_1000.jpg  0

• The next step is to prepare an object that feeds the images into the model. We will use the ImageDataGenerator
class from the tf.keras.preprocessing.image module.
• With that object, we generate batches of images. We can also augment the images to enlarge the effective size
of the dataset, so we set the parameters of the image augmentation methods as well.
• Because a dataframe holds the information about the dataset, we use the flow_from_dataframe method to
generate the batches and augment the images.
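What a batch generator with augmentation does can be sketched conceptually in plain Python. This is not the Keras ImageDataGenerator API: batch_generator, horizontal_flip and fake_load_image are hypothetical helpers, the "images" are tiny nested lists instead of real pictures, and a random horizontal flip stands in for the full set of augmentations.

```python
import random


def horizontal_flip(image):
    """Mirror an image, stored as a list of pixel rows, from left to right."""
    return [row[::-1] for row in image]


def batch_generator(rows, load_image, batch_size, rng, flip_prob=0.5):
    """Yield (images, labels) batches from (filename, label) rows in shuffled order."""
    order = list(rows)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        images, labels = [], []
        for filename, label in order[start:start + batch_size]:
            image = load_image(filename)
            if rng.random() < flip_prob:  # random augmentation
                image = horizontal_flip(image)
            images.append(image)
            labels.append(label)
        yield images, labels


def fake_load_image(filename):
    """Hypothetical loader: returns the same tiny 2x3 'image' for every file."""
    return [[1, 2, 3], [4, 5, 6]]


# Rows shaped like the dataframe above: (filename, label).
rows = [("0_0.jpg", "0"), ("0_1.jpg", "0"), ("1_0.jpg", "1"),
        ("1_1.jpg", "1"), ("1_2.jpg", "1")]
batches = list(batch_generator(rows, fake_load_image, batch_size=2,
                               rng=random.Random(42)))
print([len(labels) for _, labels in batches])  # [2, 2, 1]
```

Five rows with a batch size of 2 give three batches, the last one smaller. Because the flips are drawn at random each time the data is traversed, the model sees slightly different versions of the same images in every epoch.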

3.Train the model:

• After generating the batches, we can train the model with the transfer learning method. Because we use
that method, we don't need to implement a CNN architecture from scratch. Instead, we use an existing,
pre-trained architecture.
• We will use ResNet-50 as the backbone of our new model. We create the input and replace the final
linear layer of ResNet-50 with a new one sized for the number of classes.
• We use the fit method to train it.
4.Test the model:

• After training the model, let's test it on the test data. For this we also need the Pillow library to
load and resize the images, and scikit-learn to calculate the model's performance.
• We use classification_report from the scikit-learn library to generate a report on the model's performance,
and we also visualize the confusion matrix.
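The quantities behind such a report can be sketched in plain Python. This is not scikit-learn's implementation: the helper functions and the eight true and predicted labels are hypothetical and only illustrate how a confusion matrix, precision, recall and F1-score are computed.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples whose true class is labels[i] and predicted class is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall_f1(matrix, k):
    """Precision, recall and F1-score for the class at index k."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column sum: predicted as class k
    actual = sum(matrix[k])                    # row sum: truly class k
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical true and predicted labels for eight test images.
y_true = ["0", "0", "0", "1", "1", "1", "1", "0"]
y_pred = ["0", "0", "1", "1", "1", "0", "1", "0"]

matrix = confusion_matrix(y_true, y_pred, labels=["0", "1"])
print(matrix)                          # [[3, 1], [1, 3]]
print(precision_recall_f1(matrix, 1))  # (0.75, 0.75, 0.75)
```

The diagonal of the confusion matrix holds the correct predictions, and every off-diagonal entry is a misclassification.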
5.Save the model:

• If you want to keep the model for later use or deployment, you can save it with the save method like this:
model.save('./resnet50_food_model')

• If you want to load the model, you can use the load_model function like this:
model = tf.keras.models.load_model('./resnet50_food_model')


ASSIGNMENT-3

SECTION-A 1*5=5M
1. Define Pooling.
2. Write a short note on Activation Function.
3. Define Dropout layers.


4. Briefly explain about Feature extraction.

SECTION -B 5*5=25M
1. Explain about CNN architectures in detail.
2. Explain about convolution in detail.
3. Explain about Pooling Layers in detail.
4. Explain about Transfer Learning in detail.
5. Explain about Image Classification using Transfer Learning in detail.
