
5 Layers of a Convolutional Neural Network

1. Convolutional Layer:

This layer performs the convolution operation on the input data, which extracts various features from the
data.

Convolutional layers are among the most vital components of a CNN architecture. These layers are
responsible for extracting features from the input data and forming the basis for further
processing and learning.

A convolutional layer consists of a set of filters (also known as kernels) applied to the input data in a
sliding window fashion. Each filter extracts a specific set of features from the input data based on the
weights associated with it.

The number of filters used in the convolutional layer is one of the key hyperparameters in the architecture.
It is determined based on the type of data being processed as well as the desired accuracy of the model.
Generally, more filters extract more features from the input data, allowing the network to represent the
data better at the cost of additional computation.

The convolution operation consists of multiplying each filter with the data within the sliding window and
summing up the results. This operation is repeated for all the filters, resulting in multiple feature maps for
a single convolutional layer. These feature maps are then used as input for the following layers, allowing
the network to learn more complex features from the data.
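The multiply-and-sum operation described above can be sketched in plain Python. This is a minimal illustration, assuming no padding and a stride of 1; the function names and the two hand-set filters are hypothetical (a trained network would learn its filter weights). As in most deep learning usage, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return one feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the window element-wise with the filter and sum the products.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


def conv_layer(image, filters):
    """Apply every filter to the same input: one feature map per filter."""
    return [conv2d_valid(image, kernel) for kernel in filters]


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
# Two hypothetical hand-set 2x2 filters, chosen only for illustration.
vertical_edge = [[1, -1], [1, -1]]
box_sum = [[1, 1], [1, 1]]

feature_maps = conv_layer(image, [vertical_edge, box_sum])
```

With a 4x4 input and 2x2 filters, each of the two feature maps is 3x3; these maps would then be passed to the next layer.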

Convolutional layers are the foundation of many deep learning architectures and are used in various applications,
such as image recognition, natural language processing, and speech recognition. By extracting the most
critical features from the input data, convolutional layers enable the network to learn more complex
patterns and make better predictions.

2. Pooling Layer:

This layer performs a downsampling operation on the feature maps, which reduces the amount of
computation required and also helps to reduce overfitting.

The pooling layer is a vital component of a CNN architecture. It is typically used to reduce the size of
the input volume while retaining meaningful information from the data. Pooling layers are usually placed
after convolutional layers, allowing the later stages of the network to focus on more abstract features of
an image or other type of input. The pooling layer operates by sliding a window over the input volume and
computing a summary statistic for the values within the window.

Common statistics include taking the maximum, average, or sum of the values within the window. This
reduces the input volume’s size while preserving important information about the data.
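A minimal sketch of the sliding-window summary described above, assuming a 2x2 window with stride 2 (a common but not universal choice). The function names are illustrative, not taken from any library.

```python
def pool2d(feature_map, size=2, stride=2, statistic=max):
    """Summarise each size x size window of `feature_map` with `statistic`."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(statistic(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

max_pooled = pool2d(feature_map)                  # maximum of each window
avg_pooled = pool2d(feature_map, statistic=mean)  # average of each window
sum_pooled = pool2d(feature_map, statistic=sum)   # sum of each window
```

Each 4x4 map shrinks to 2x2, so later layers process a quarter of the values.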

The pooling layer is also typically used to introduce a degree of spatial invariance, meaning that the
network produces similar output when a feature shifts slightly within the image. This allows the network
to learn more general features about the image rather than simply memorizing their exact location.

3. Activation Layer:

This layer adds non-linearity to the model by applying a non-linear activation function such as ReLU or
tanh.

An activation layer in a CNN is a layer that serves as a non-linear transformation on the output of the
convolutional layer. It is a primary component of the network, allowing it to learn complex relationships
between the input and output data.

The activation layer can be thought of as a function that takes the output of the convolutional layer and
maps it to a different set of values. This enables the network to learn more complex patterns in the data
and generalize better.

Common activation functions used in CNNs include ReLU (Rectified Linear Unit), sigmoid, and tanh.
Each activation function serves a different purpose and can be used in different scenarios.

ReLU is the most commonly used activation function in convolutional networks. It is a non-linear
transformation that outputs 0 for all negative values and the same value as the input for all positive
values. This allows the network to learn more complex patterns in the data.

Sigmoid is another commonly used activation function, which outputs values between 0 and 1 for any
given input. This helps the network to model complex relationships between the input and output
data but is more computationally expensive than ReLU.

Tanh is less commonly used in CNNs. It outputs values between -1 and 1 for any given input.
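The three activation functions can be written in a few lines. This is a small sketch using only the standard library; the sample inputs are arbitrary.

```python
import math


def relu(x):
    """0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any input into the open interval (-1, 1)."""
    return math.tanh(x)


inputs = [-2.0, 0.0, 3.0]
relu_out = [relu(x) for x in inputs]
sigmoid_out = [sigmoid(x) for x in inputs]
tanh_out = [tanh(x) for x in inputs]
```

ReLU needs only a comparison, while sigmoid and tanh each evaluate an exponential, which is one reason ReLU is cheaper to compute.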

The activation layer is an essential component of the CNN, as it introduces non-linearity into the output.
Without it, a stack of layers would reduce to a single linear transformation. Choosing the right activation
function for the network is essential, as each activation function serves a different purpose. Selecting a
suitable activation function can lead to better performance of the CNN.

4. Fully Connected Layer:

This layer connects each neuron in one layer to every neuron in the next layer, resulting in a fully-connected
network.

A fully connected layer in a CNN is a layer of neurons connected to every neuron in the previous layer in
the network. This is in contrast to convolutional layers, where neurons are only connected to a subset of
neurons in the previous layer based on a specific pattern.

By connecting every neuron in one layer to every neuron in the next layer, the fully connected layer
allows information from the previous layer to be shared across the entire network, thus providing the
opportunity for a more comprehensive understanding of the data.

Fully connected layers in CNN are typically used towards the end of a CNN model architecture, after the
convolutional layers and pooling layers, as they help to identify patterns and correlations that the
convolutional layers may not have recognized.

Additionally, fully connected layers are used to generate a non-linear decision boundary that can be used
for classification. In conclusion, fully connected layers are an integral part of any CNN and provide a
powerful tool for identifying patterns and correlations in the data.
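As a rough sketch of what "every neuron connected to every neuron in the previous layer" means computationally: each output neuron takes a weighted sum of all inputs plus a bias. The helper names and the numbers below are hypothetical and chosen for illustration only.

```python
def flatten(feature_map):
    """Turn a 2D feature map into the flat vector a fully connected layer expects."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """One output per weight row: the weighted sum of ALL inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(weight_row, inputs)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


flat = flatten([[1.0, 2.0], [3.0, 4.0]])

features = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.0],  # weights of output neuron 0
    [1.0, 1.0, 1.0],   # weights of output neuron 1
]
biases = [0.5, -1.0]

outputs = dense(features, weights, biases)
```

A layer with 3 inputs and 2 outputs therefore holds 3 x 2 weights, whereas a convolutional filter reuses the same few weights at every position.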

5. Output Layer:

This is the final layer of the network, which produces the output labels or values.

The output layer of a CNN is the final layer in the network and is responsible for producing the output. It
is the layer that takes the features extracted from previous layers and combines them in a way that allows
it to produce the desired output.

A single-neuron layer is typically used when the output is a single value, such as in a regression or
binary classification problem. A fully connected layer with one neuron per class is generally used when the
outcome is a vector, such as a probability distribution.

A softmax activation function is used when the output is a probability distribution, such as a probability
distribution over classes. The output layer of a CNN is also responsible for performing the necessary
computations to obtain the desired output. This includes applying the linear or non-linear transformations
needed to turn its inputs into the required output.
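A minimal sketch of softmax, which turns the raw scores of the output layer into a probability distribution over classes. Subtracting the maximum score first is a standard numerical-stability trick and does not change the result; the scores below are made up.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that are positive and sum to 1."""
    shift = max(scores)  # avoids overflow in exp for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]          # hypothetical scores for three classes
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
```

The predicted class is simply the index with the highest probability.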

Finally, regularization techniques such as dropout or batch normalization are usually applied to the layers
before the output layer to improve the network’s performance.

………………………………………………………………………………………………………………

Convolutional Neural Network


A Convolutional Neural Network (CNN) is an extended version of the artificial neural network
(ANN) that is predominantly used to extract features from grid-like matrix datasets,
for example visual datasets like images or videos, where data patterns play an
extensive role.

CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.

The convolutional layer applies filters to the input image to extract features, the pooling
layer downsamples the image to reduce computation, and the fully connected layer makes
the final prediction. The network learns the optimal filters through backpropagation and
gradient descent.

Concept of Convolution (1D and 2D) layers

In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural network that
is typically used to recognize patterns in images, but is also used for spatial data analysis,
computer vision, natural language processing, signal processing, and various other purposes.

What Is a Convolution?
Convolution is an orderly procedure where two sources of information are intertwined; it’s an operation that
changes a function into something else. Convolutions have long been used in image processing to blur and
sharpen images, but also to perform other operations (e.g. enhancing edges and embossing).
A CNN is built from four main operations: convolution, non-linearity (ReLU), pooling or subsampling, and
classification (fully connected layer). The first hidden layer of a Convolutional Neural Network is typically
a convolutional layer. Convolutional layers apply a convolution operation to the input, passing the result
to the next layer. A convolution converts all the pixels in its receptive field into a single value. For
example, if you apply a convolution to an image without padding, you decrease the image size while bringing
all the information in the field together into a single pixel. The final output of the convolutional layer
is a set of feature maps. Based on the type of problem we need to solve and on the kind of features we are
looking to learn, we can use different kinds of convolutions.

The 2D Convolution Layer


The most common type of convolution is the 2D convolution layer, usually abbreviated as conv2D. A filter
or kernel in a conv2D layer “slides” over the 2D input data, performing an element-wise multiplication with
the part of the input it currently covers and summing the results into a single output pixel. The kernel
performs the same operation at every location it slides over, transforming a 2D matrix of features into a
different 2D matrix of features.
The Dilated or Atrous Convolution
This operation expands the window size without increasing the number of weights by inserting zero values
into the convolution kernels. Dilated or atrous convolutions can be used in real-time applications and in
applications with limited processing power, as their RAM requirements are less intensive.
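To make the zero-insertion concrete, the sketch below dilates a square kernel and computes its effective window size, k + (k - 1)(d - 1) for kernel size k and dilation rate d. The function names are hypothetical, and practical implementations usually skip input positions rather than storing the zeros explicitly.

```python
def effective_kernel_size(kernel_size, dilation):
    """Window covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilate_kernel(kernel, dilation):
    """Spread the weights of a square kernel apart, filling the gaps with zeros."""
    k = len(kernel)
    size = effective_kernel_size(k, dilation)
    dilated = [[0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dilated[i * dilation][j * dilation] = kernel[i][j]
    return dilated


kernel = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
dilated = dilate_kernel(kernel, 2)
nonzero_weights = sum(1 for row in dilated for value in row if value != 0)
```

A 3x3 kernel with dilation 2 covers a 5x5 window while still holding only 9 weights.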
Separable Convolutions
There are two main types of separable convolutions: spatial separable convolutions, and depthwise separable
convolutions. The spatial separable convolution deals primarily with the spatial dimensions of an image and
kernel: the width and the height. Compared to spatial separable convolutions, depthwise separable
convolutions work with kernels that cannot be “factored” into two smaller kernels. As a result, they are
more frequently used.
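As an illustration of "factoring" a kernel, the well-known 3x3 Sobel edge kernel can be written as the outer product of a 3x1 column and a 1x3 row, so 6 weights do the work of 9. The second helper compares weight counts for a standard and a depthwise separable convolution, ignoring biases; the channel counts are arbitrary example values.

```python
def outer(column, row):
    """Outer product: combine a k x 1 kernel and a 1 x k kernel into a k x k kernel."""
    return [[c * r for r in row] for c in column]


# Spatial separability: the Sobel x-kernel factors into two 1D kernels.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
column_kernel = [1, 2, 1]
row_kernel = [-1, 0, 1]
rebuilt = outer(column_kernel, row_kernel)


def standard_conv_weights(k, in_channels, out_channels):
    """Every output channel has its own k x k kernel for every input channel."""
    return k * k * in_channels * out_channels


def depthwise_separable_weights(k, in_channels, out_channels):
    """One k x k kernel per input channel, then a 1 x 1 convolution to mix channels."""
    return k * k * in_channels + in_channels * out_channels


standard = standard_conv_weights(3, 16, 32)
separable = depthwise_separable_weights(3, 16, 32)
```

For this example configuration the depthwise separable layer needs roughly a seventh of the weights.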
Transposed Convolutions
These types of convolutions are also known as deconvolutions or fractionally strided convolutions. A
transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.
Develop 1D Convolutional Neural Network
This section develops a one-dimensional convolutional neural network model (1D CNN) for a human activity
recognition dataset. Convolutional neural network models were originally developed for image classification
problems, where the model learns an internal representation of a two-dimensional input, in a process
referred to as feature learning. The same process can be applied to one-dimensional sequences of data.

This data is collected from an accelerometer worn on a person’s arm. The data represents the
acceleration along all 3 axes. A 1D CNN can perform activity recognition from accelerometer data, such
as whether the person is standing, walking, or jumping. This data has 2 dimensions: the first is the
time steps and the second is the acceleration values along the 3 axes.
The following plot illustrates how the kernel moves over the accelerometer data. Each row represents the
time series of acceleration for one axis. The kernel can only move in one dimension, along the time axis.
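A minimal sketch of a 1D convolution over such data, assuming the signal is stored as a list of time steps with three axis values each. The readings and the kernel are invented for illustration; the kernel spans all 3 axes but slides only along time.

```python
def conv1d(signal, kernel):
    """Slide `kernel` (kernel_size x n_axes) along the time dimension of `signal`."""
    kernel_size = len(kernel)
    n_axes = len(kernel[0])
    outputs = []
    for t in range(len(signal) - kernel_size + 1):
        total = 0.0
        for dt in range(kernel_size):
            for axis in range(n_axes):
                total += signal[t + dt][axis] * kernel[dt][axis]
        outputs.append(total)
    return outputs


# Hypothetical accelerometer readings: one row per time step, columns are x, y, z.
signal = [
    [0, 0, 10],
    [1, 0, 10],
    [5, 2, 9],
    [1, 0, 10],
    [0, 0, 10],
]
# Hand-set kernel over 2 time steps that measures the change in x-acceleration.
kernel = [
    [-1, 0, 0],
    [1, 0, 0],
]

feature = conv1d(signal, kernel)
```

Five time steps and a kernel of size 2 give four outputs, one per kernel position along the time axis.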

………………………………………………………………………………………………………..
Case study of CNN: Diabetic Retinopathy

Diabetic retinopathy (DR) is the leading cause of blindness in the working-age population of the developed
world. Presently, detecting DR is a manual, time-consuming process that requires a trained ophthalmologist
to examine and evaluate digital fundus photographs of the retina. Machine learning technologies
such as Convolutional Neural Networks (CNNs) have emerged as an effective tool in medical image
analysis for the detection and classification of DR in real time.
Diabetes mellitus, commonly known as diabetes, causes high blood sugar. A persistently high blood sugar
level leads to various complications and general vascular deterioration of the heart, eyes, kidneys, and
nerves. Diabetic retinopathy (DR) is one of the leading diseases caused by diabetes. It damages the blood
vessels of the retina in people who have type-I or type-II diabetes. DR is classified into two major classes:
nonproliferative (NPDR) and proliferative (PDR).
Figure. Computer Vision through Convolutional Neural Network

CNN for Diabetic Retinopathy detection


A Convolutional Neural Network is a feed-forward neural network. It mainly consists of an input layer,
many hidden layers (such as convolutional, ReLU, pooling, flatten, fully connected and softmax layers)
and a final multi-class classification layer. CNN methodology involves two stages of processing: a
time-consuming training stage, where millions of images go through many iterations of the CNN architecture
to finalize the model parameters of each layer, and a second real-time prediction stage, where each image
in the test dataset is fed into the trained model to score and validate the model.

However, there are two issues with CNN methods for DR detection. One is achieving a desirable balance between
sensitivity (patients correctly identified as having DR) and specificity (patients correctly identified as not
having DR). This is significantly harder for a five-class problem containing normal, mild, moderate, severe,
and proliferative DR classes. The second problem is overfitting. Skewed datasets cause the network to
overfit to the class most prominent in the dataset. Large datasets are often massively skewed.
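The two measures can be computed from predictions as sketched below. This is a simplified illustration that assumes binary labels (1 = DR, 0 = no DR) rather than the five-class setting, and the label lists are invented. It also shows why skew is a problem: a model that always predicts the majority class scores high accuracy but zero sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false negatives, true negatives and false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """Share of patients with DR that are correctly identified."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Share of patients without DR that are correctly identified."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical skewed dataset: 9 healthy patients, 1 with DR.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
always_healthy = [0] * 10  # a model that overfits to the majority class

acc = accuracy(y_true, always_healthy)
sens = sensitivity(y_true, always_healthy)
spec = specificity(y_true, always_healthy)
```

Here accuracy is 0.9 even though the single DR patient is missed, which is why sensitivity and specificity are reported separately.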

…………………………………………………………
Unsupervised Learning – SOM Algorithm and its variant
A Self-Organizing Map (also Kohonen Map or SOM) is a type of artificial neural network inspired by
biological models of neural systems from the 1970s. It follows an unsupervised learning
approach and trains its network through a competitive learning algorithm. SOM is used for clustering and
mapping (or dimensionality reduction), projecting multidimensional data onto a lower-dimensional space,
which makes complex problems easier to interpret. SOM has two layers: the input layer and the output layer.
The architecture of the Self Organizing Map with two clusters and n input features of any sample is given
below:
How does SOM work?
Suppose the input data has size (m, n), where m is the number of training examples and n is the number of
features in each example. First, the weights of size (n, C) are initialized, where C is the number of
clusters. Then, iterating over the input data, for each training example the winning vector (the weight
vector with the shortest distance, e.g. Euclidean distance, from the training example) is updated. The
weight update rule is given by:

$$ w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha(t)\,\bigl(x_i^{k} - w_{ij}(\text{old})\bigr) $$

where alpha is the learning rate at time t, j denotes the winning vector, i denotes the ith feature of the
training example, and k denotes the kth training example from the input data. After training the SOM
network, the trained weights are used for clustering new examples. A new example falls in the cluster of
its winning vector.
SOM Algorithm
Steps involved are:
1. Weight initialization
2. For 1 to N number of epochs
3. Select a training example
4. Compute the winning vector
5. Update the winning vector
6. Repeat steps 3, 4, 5 for all training examples.
7. Clustering the test sample
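The steps above can be sketched as follows. This is a minimal winner-only version under stated assumptions: each winning weight vector is moved towards the training example by a fraction alpha of their difference, the initial weights are fixed instead of random so the run is reproducible, and alpha is halved after each epoch (one possible schedule, not prescribed by the text). The data and function names are hypothetical.

```python
def winner(weights, sample):
    """Index of the weight vector with the smallest squared Euclidean distance."""
    def squared_distance(vector):
        return sum((x - w) ** 2 for x, w in zip(sample, vector))

    return min(range(len(weights)), key=lambda j: squared_distance(weights[j]))


def train_som(data, initial_weights, epochs=10, alpha=0.5, decay=0.5):
    """Competitive learning: only the winning vector is updated for each example."""
    weights = [list(vector) for vector in initial_weights]  # leave the input untouched
    for _ in range(epochs):
        for sample in data:
            j = winner(weights, sample)
            # Move the winner a fraction alpha of the way towards the example.
            weights[j] = [w + alpha * (x - w) for w, x in zip(weights[j], sample)]
        alpha *= decay  # assumed learning-rate schedule
    return weights


# Two loose groups of 2-feature examples, near (0, 0) and near (1, 1).
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
initial_weights = [[0.2, 0.2], [0.8, 0.8]]  # one weight vector per cluster

weights = train_som(data, initial_weights)

# Clustering new samples: each falls in the cluster of its winning vector.
cluster_a = winner(weights, [0.05, 0.05])
cluster_b = winner(weights, [0.95, 0.95])
```

Note that the weights are stored here as C vectors of n features each, the transpose of the (n, C) layout mentioned earlier.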
…………………………………………………………………………………………………
What are the advantages and disadvantages of Artificial Neural Networks?

Ans. Advantages of Artificial Neural Networks (ANN):


1. Attribute-value pairs are used in ANN to represent problems.
2. ANNs are suited to problems where the output of the target function can be real-valued,
discrete-valued, or a vector of real or discrete-valued features.
3. Learning techniques for ANNs are fairly resistant to noise in training data. The training examples
may contain errors without significantly affecting the results.
4. They are employed in situations where quick evaluation of the learned target function is necessary.
5. ANNs suit problems where long training times are acceptable. Training time depends on variables like
the number of weights in the network, the number of training instances taken into account, and the
settings of various learning algorithm parameters.
Disadvantages of Artificial Neural Networks (ANN):
1. Hardware dependence:
a. Due to the nature of artificial neural networks, parallel processing power is needed.
b. This makes the network dependent on the hardware on which it is realised.
2. Unexplained functioning of the network:
a. This is the most important problem of ANN.
b. When ANN offers a perplexing solution, it doesn’t explain why or how.
c. This reduces trust in the network.
3. Assurance of proper network structure:
a. There is no set formula for figuring out how artificial neural networks should be structured.
b. By experience and trial-and-error, the ideal network structure is attained.
4. The difficulty of showing the problem to the network:
a. ANNs can work with numerical information.
b. Problems have to be translated into numerical values before being introduced to ANN.
c. The network’s performance will be directly impacted by the display mechanism chosen.
d. This is dependent on the user’s ability.
5. The duration of training is unknown:
a. Training is considered finished when the error on the sample has decreased to a specific value.
b. Reaching this value does not guarantee optimum results.
……………………………………………………………………………………………………….
Discuss the benefits of artificial neural network.
Ans.
1. Artificial neural networks are flexible and adaptive.
2. Systems for pattern and sequence recognition, data processing, robotics, modeling, etc. all use artificial
neural networks.
3. ANN solves complicated problems that are challenging to manage by adapting to internal and external
elements and learning from their surroundings.
4. It expands knowledge to create appropriate reactions to unidentified functions.
5. Artificial neural networks are flexible and able to learn, generalize, and adapt to conditions based on
their discoveries.
6. This adaptability is what lets the network learn. This method of efficiently acquiring information has
a clear benefit over a linear network, which is generally insufficient for modeling non-linear data.
7. An artificial neural network handles faults better than a conventional network. It can recover from a
failure in any of its components without losing stored data.
8. An artificial neural network is based on adaptive learning.
……………………………………………………………………………………………………..

Define convolutional networks.


Ans.

1. Convolutional networks, sometimes referred to as Convolutional Neural Networks (CNNs), are an
advanced class of neural network used to process input that has a predetermined, grid-like structure.

2. The term “convolutional neural network” refers to a neural network that uses the mathematical
operation called convolution.
3. Convolution is a specialized kind of linear operation.
4. Convolutional networks are simply neural networks that, in at least one of their layers, substitute
convolution for conventional matrix multiplication.
5. CNNs (ConvNets) are quite similar to regular neural networks.
6. They are still composed of neurons with learnable weights. Each neuron receives some inputs and
computes a dot product.
7. They still have a loss function on the last fully connected layer.
8. Each dot product is optionally followed by a non-linearity. A typical neural network takes a single
vector of input data and processes it through several hidden layers.
9. Each hidden layer is made up of neurons, each of which is fully connected to every neuron
in the previous layer.
10. Within a single layer, each neuron is completely independent and shares no connections with the others.
11. In an image classification problem, the class scores are produced by the last fully connected layer
(the output layer). Simple ConvNets have three primary layer types: convolutional, pooling, and fully
connected.
………………………………………………………………………………
Describe briefly activation function, pooling and fully connected layer.
Ans. Activation function:
1. To assist an artificial neural network in learning complex patterns in the data, activation functions are
functions that are introduced to the network.
2. As in the neuron-based model of our brains, the activation function determines which signals
should be sent on to the following neuron.
3. An ANN’s activation function does exactly the same task.
4. It receives the output signal from the cell before it and transforms it into a format that may be used as the
input for the cell after it.
Pooling layer:
1. A pooling layer is a new layer added after the convolutional layer, specifically after a non-linearity
(for example ReLU) has been applied to the feature maps output by the convolutional layer. For example,
the layers in a model may look as follows:
a. Input image
b. Convolutional layer
c. Non-linearity
d. Pooling layer
2. A frequent strategy for arranging layers within a convolutional neural network that may be repeated one
or more times in a given model is the addition of a pooling layer after the convolutional layer.
3. To build a new set of the same number of pooled feature maps, the pooling layer operates on each feature
map separately.
Fully connected layer:
1. Fully connected layers are an essential part of Convolutional Neural Networks (CNNs), which have been
demonstrated to be particularly successful in detecting and classifying pictures for computer vision.
2. The CNN process begins with convolution and pooling, which break the image down into features and
analyze them independently.
3. A fully connected neural network structure receives the output of this procedure and uses it to determine
the final classification.
……………………………………………………………………………………………………………
How do we train a network? Explain.
Ans.
1. A network is prepared to be trained once it has been set up for a specific application.
2. The initial weights are picked at random to begin this process. The training or learning process then starts.
3. There are two approaches to training:
a. During supervised training, both inputs and outputs are provided. The network compares the generated
outputs to the desired outputs after processing the inputs.
b. The weights that control the network are modified as errors are then relayed back through the system.
This procedure is repeated as the weights are changed repeatedly.
c. The set of data used for training is known as the “training set”. While a network is being trained,
the connection weights are continuously refined as the same data is processed many times.
d. The alternative approach is unsupervised training, in which the network is given inputs but not the
desired outputs.
e. The system must then decide which features it will use to organize the input data into groups. This is
frequently referred to as adaptation or self-organization.
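The supervised loop described above (process the inputs, compare with the desired outputs, feed the error back into the weights, repeat over the training set) can be sketched for a single linear neuron. This is a deliberately tiny illustration, not a full backpropagation implementation: the weights start at zero rather than random values so the run is reproducible, and the training set is made up from the rule y = 2x + 1.

```python
def train_linear_neuron(inputs, targets, learning_rate=0.05, epochs=1000):
    """Fit output = weight * x + bias by repeatedly correcting the error."""
    weight, bias = 0.0, 0.0  # assumption: zero start instead of random weights
    for _ in range(epochs):  # the same training set is processed many times
        for x, target in zip(inputs, targets):
            output = weight * x + bias       # process the input
            error = target - output          # compare with the desired output
            weight += learning_rate * error * x  # relay the error back to the weights
            bias += learning_rate * error
    return weight, bias


# Hypothetical training set generated from y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [2.0 * x + 1.0 for x in inputs]

weight, bias = train_linear_neuron(inputs, targets)
prediction = weight * 4.0 + bias  # an input the neuron has not seen
```

After training, the learned weight and bias approach 2 and 1, so the neuron generalizes to inputs outside the training set.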
………………………………………………………………………………………………..
How do we recognize a speaker using an artificial neural network?
Ans.
1. Voice control and automation are essential elements of the smart home sector’s technological
breakthroughs that can significantly improve people’s lives.
2. Because speaker recognition functionality is present in almost all modern smart home products, the
voice recognition technology industry is still expanding quickly.
3. Unfortunately, the majority of them use very deep neural networks or cloud-based solutions for speaker
detection, which are unsuitable for smart home devices.
4. In this section, we compare relatively small Convolutional Neural Networks (CNNs) and assess how well
these models identify speakers on edge devices. We also use the transfer learning technique to
address the issue of insufficient training data.
5. We address the well-known problems associated with cloud computing, such as data privacy and network
latency, by creating a method that is appropriate for executing inference locally on edge devices.
6. According to the first findings, the selected model uses a CNN and spectrograms to perform speaker
classification with accuracy and recall of 84% in less than 60 ms on a mobile device with an Atom Cherry
Trail CPU.
……………………………………………………………………………………………………………..
