An Artificial Neural Network is built from artificial neurons, which are called units. These units are arranged
in a series of layers that together constitute the whole network. A layer can have only a dozen units or millions
of units, depending on how complex the network needs to be to learn the hidden patterns in the dataset.
Commonly, an Artificial Neural Network has an input layer, an output layer, and one or more hidden layers. The
input layer receives data from the outside world that the network needs to analyze or learn about. This data
then passes through one or more hidden layers that transform the input into a representation that is useful for
the output layer. Finally, the output layer produces the network’s response to the input data.
In the majority of neural networks, units are interconnected from one layer to the next. Each of these connections
has a weight that determines the influence of one unit on another. As the data flows from one unit to
another, each layer extracts more information from it, which eventually results in an output from the
output layer.
The structure and operation of biological neurons serve as the basis for artificial neural networks, which are also
known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it
receives input from external sources and passes it to the hidden layer, which is the second layer. In the hidden
layer, each neuron receives input from the neurons of the previous layer, computes a weighted sum, and sends the
result to the neurons in the next layer. The connections are weighted, which means that the effect of each input
from the previous layer is scaled up or down by the weight assigned to it. These weights are adjusted during
the training process to improve model performance.
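The weighted-sum step described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `layer_forward`, the example weights, and the use of a sigmoid activation on each weighted sum are assumptions made for the example, not something the text prescribes.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one layer: weights[j][i] connects input i to neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of everything arriving from the previous layer.
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [1.0, 0.5]
hidden = layer_forward(x, weights=[[0.2, -0.4], [0.7, 0.1]], biases=[0.0, 0.0])
output = layer_forward(hidden, weights=[[0.5, -0.5]], biases=[0.0])
```

Training would then adjust the numbers in `weights` and `biases`; the forward computation itself stays the same.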
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from the biological neurons found in animal brains, so the two
share many similarities in structure and function.
Structure: The structure of artificial neural networks is inspired by biological neurons. A biological neuron
has a cell body or soma to process the impulses, dendrites to receive them, and an axon that transfers them
to other neurons. The input nodes of artificial neural networks receive input signals, the hidden layer nodes
process these input signals, and the output layer nodes compute the final output by processing the hidden
layer’s results using activation functions.
Biological Neuron    Artificial Neuron
Dendrite             Inputs
Synapses             Weights
Axon                 Output
Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from
one neuron to the dendrites of the next. In artificial neural networks, the weights play the role of synapses:
they join the nodes of one layer to the nodes of the next layer, and the weight value determines the strength
of each link.
Learning: In biological neurons, the impulses are processed in the cell body or soma, which contains the
nucleus. If the impulses are strong enough to reach the threshold, an action potential is produced and
travels along the axon. Learning is made possible by synaptic plasticity, which is the ability of synapses
to become stronger or weaker over time in reaction to changes in their activity. In artificial neural
networks, backpropagation is the technique used for learning; it adjusts the weights between nodes according
to the error, that is, the difference between predicted and actual outcomes.
Activation: In biological neurons, activation is the firing of the neuron, which happens when the
impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function
known as an activation function maps the input of a node to its output.
Perceptron
The perceptron, a single-layer feedforward neural network, was introduced in the late 1950s by Frank Rosenblatt.
It marked the starting phase of deep learning and artificial neural networks. At that time, prediction relied on
statistical machine learning or conventionally written programs. The perceptron is one of the first and most
straightforward models of artificial neural networks. Despite being a straightforward model, the perceptron
has proven successful at solving certain classification problems.
Architecture
The perceptron is one of the simplest artificial neural network architectures. It was introduced by Frank Rosenblatt
in 1957. It is the simplest type of feedforward neural network, consisting of a single layer of input nodes that
are fully connected to a layer of output nodes. It can learn linearly separable patterns. It uses a slightly
different type of artificial neuron known as the threshold logic unit (TLU), which was first introduced by Warren
McCulloch and Walter Pitts in the 1940s.
A weight is assigned to each input of a perceptron, indicating the significance of that input to the output.
The perceptron’s output is the weighted sum of the inputs passed through an activation function that
decides whether or not the perceptron will fire. It computes the weighted sum of its inputs as:
z = w_1 x_1 + w_2 x_2 + ... + w_n x_n = X^T W
The activation function that perceptrons use most frequently is the step function, which compares this weighted
sum to a threshold and outputs 1 if the sum reaches the threshold and 0 otherwise. The most common step function
used in perceptrons is the Heaviside step function:
heaviside(z) = 0 if z < 0, and 1 if z >= 0
A perceptron has a single layer of threshold logic units with each TLU connected to all inputs.
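As a minimal sketch of a single threshold logic unit, the code below computes the weighted sum and applies a Heaviside step. The bias term `b` (which plays the role of a negative threshold), the hand-picked AND-gate weights, and the convention of outputting 1 when z is exactly 0 are assumptions made for illustration.

```python
def heaviside(z):
    """Step activation: 1 when the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def tlu_output(x, w, b=0.0):
    """Weighted sum z = w_1 x_1 + ... + w_n x_n + b, followed by the step."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return heaviside(z)

# Hand-picked (not learned) weights that make the unit behave like logical AND.
w, b = [1.0, 1.0], -1.5
and_table = {(a, c): tlu_output([a, c], w, b) for a in (0, 1) for c in (0, 1)}
```

Logical AND is linearly separable, which is why a single unit is enough here; a pattern such as XOR could not be represented this way.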
This entire procedure is known as gradient descent, which is also known as steepest descent. The main objective
of the gradient descent algorithm is to minimize the cost function iteratively. To achieve this goal, it
performs two steps repeatedly:
o Calculate the first-order derivative of the function to obtain the gradient, or slope, at the current point.
o Move in the direction opposite to the gradient, that is, away from the direction in which the slope increases,
by alpha times the gradient, where alpha is the learning rate. The learning rate is a tuning parameter of the
optimization process that determines the length of the steps.
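The two steps above can be sketched as a short loop. This is an illustrative example under simple assumptions: the cost J(w) = (w - 3)^2, its derivative, the starting point, and the function name `gradient_descent` are all hypothetical choices, not taken from the text.

```python
def gradient_descent(gradient, start, alpha=0.1, n_steps=100):
    """Repeat the two steps: evaluate the slope, then step against it."""
    w = start
    for _ in range(n_steps):
        slope = gradient(w)      # step 1: first-order derivative at w
        w = w - alpha * slope    # step 2: move opposite to the gradient
    return w

# Hypothetical cost J(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
```

For this cost the minimum is at w = 3, and each step shrinks the remaining distance to it by a constant factor, so `w_final` ends up very close to 3.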
What is a Cost Function?
The cost function measures the difference, or error, between actual values and expected values at the current
position, expressed as a single real number. It improves machine learning efficiency by providing feedback to
the model so that it can reduce the error and find a local or global minimum. Gradient descent iterates along
the direction of the negative gradient until the cost function is close to a minimum, ideally near zero. At
that point, the model stops learning. Although cost function and loss function are often treated as synonyms,
there is a minor difference between them: a loss function refers to the error of one training example, while
a cost function calculates the average error across the entire training set.
The cost function is calculated after making a hypothesis with initial parameters; these parameters are then
modified using gradient descent over known data to reduce the cost function.
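The distinction between a loss and a cost can be made concrete with a small sketch. Squared error is used here only as an assumed example of an error measure, and the data values are made up.

```python
def squared_error_loss(actual, predicted):
    """Loss: the error of a single training example."""
    return (actual - predicted) ** 2

def mean_squared_error_cost(actuals, predictions):
    """Cost: the average loss over the whole training set."""
    losses = [squared_error_loss(a, p) for a, p in zip(actuals, predictions)]
    return sum(losses) / len(losses)

# Hypothetical targets and model predictions.
actuals = [1.0, 2.0, 3.0]
predictions = [1.0, 2.5, 2.0]
losses = [squared_error_loss(a, p) for a, p in zip(actuals, predictions)]
cost = mean_squared_error_cost(actuals, predictions)
```

Here `losses` holds one number per example, while `cost` collapses them into the single real number that gradient descent tries to reduce.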
How does Gradient Descent work?
Before looking at the working principle of gradient descent, we should recall how the slope of a line is found
in linear regression. The equation for simple linear regression is given as:
Y = mX + c
where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.
The starting point (shown in the figure above) is just an arbitrary point used to evaluate the performance. At
this starting point, we take the first derivative, or slope, and use a tangent line to measure the steepness of
this slope. This slope then informs the updates to the parameters (weights and bias).
The slope is steep at the arbitrary starting point, but as new parameters are generated the steepness gradually
reduces until the curve reaches its lowest point, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, or the error between expected and actual
values. To minimize the cost function, two things are required:
o Direction & Learning Rate
These two factors determine the partial derivative calculations of future iterations and allow the algorithm to
arrive at the point of convergence, a local minimum or the global minimum. Let's discuss the learning rate in brief.
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. This is typically a small value that is
evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps
but risks overshooting the minimum. A low learning rate gives small step sizes, which compromises overall
efficiency but gives the advantage of more precision.
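The trade-off can be seen on a toy problem. The sketch below assumes a hypothetical cost J(w) = w^2 (gradient 2w, minimum at 0), a starting point of 1.0, and three arbitrarily chosen learning rates; none of these values come from the text.

```python
def run_descent(alpha, start=1.0, n_steps=20):
    """Run a fixed number of gradient descent steps on J(w) = w^2."""
    w = start
    for _ in range(n_steps):
        w -= alpha * 2.0 * w  # the gradient of w^2 is 2w
    return w

# Small, moderate, and too-large learning rates (illustrative values).
results = {alpha: run_descent(alpha) for alpha in (0.01, 0.4, 1.1)}
```

With alpha = 0.01 the iterate has moved only part of the way toward 0 after 20 steps, with 0.4 it is essentially at the minimum, and with 1.1 every step overshoots so far that the iterate ends up further from the minimum than where it started.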
Whenever the slope of the cost function is at or close to zero, the model stops learning. Apart from the
global minimum, two other situations can produce this slope: saddle points and local minima. A local minimum
has a shape similar to the global minimum, where the slope of the cost function increases on both sides of the
current point.
In contrast, at a saddle point the negative gradient only occurs on one side of the point: the point is a local
maximum along one direction and a local minimum along another. The saddle point is named after the shape of a
horse's saddle.
A local minimum is so named because the value of the loss function is minimal at that point within a local region.
In contrast, the global minimum is so named because the value of the loss function is minimal there across the
entire domain of the loss function.
2. Vanishing and Exploding Gradient
In a deep neural network trained with gradient descent and backpropagation, two more issues can occur besides
local minima and saddle points.
Vanishing Gradients:
A vanishing gradient occurs when the gradient becomes too small. During backpropagation, the gradient shrinks as
it moves backward through the network, so the earlier layers learn more slowly than the later layers.
Once this happens, the weight updates in the earlier layers become insignificant.
Exploding Gradient:
An exploding gradient is the opposite of a vanishing gradient: it occurs when the gradient is too large and creates
an unstable model. In this scenario, the model weights grow too large and are eventually represented as NaN. This
problem can be addressed with dimensionality reduction, which helps to reduce complexity within the model;
gradient clipping is another common remedy.
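A rough way to see both effects is to look at how a gradient is scaled as it passes backward through many layers. The sketch below makes strong simplifying assumptions: every layer has the same single weight and the same activation slope (0.25, the maximum slope of a sigmoid), so the overall scale is just a power. It also shows gradient clipping by norm, a widely used remedy for exploding gradients; the helper names are hypothetical.

```python
import math

def backprop_scale(weight, activation_slope, depth):
    """Factor by which a gradient is multiplied after passing through `depth` layers."""
    return (weight * activation_slope) ** depth

vanishing = backprop_scale(1.0, 0.25, 10)  # per-layer factor 0.25: shrinks
exploding = backprop_scale(8.0, 0.25, 10)  # per-layer factor 2.0: grows

def clip_by_norm(gradient, max_norm):
    """Rescale a gradient vector so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    return [g * max_norm / norm for g in gradient]

clipped = clip_by_norm([3.0, 4.0], 1.0)  # norm 5 is rescaled to norm 1
```

Clipping keeps the direction of the gradient and only limits its length, so a single oversized update cannot push the weights toward overflow.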
Delta Learning Rule
It was developed by Bernard Widrow and Marcian Hoff. It relies on supervised learning and uses a
continuous activation function. It is also known as the Least Mean Square (LMS) method, and it minimizes the
error over all the training patterns.
It is based on a gradient descent approach, which can be continued indefinitely. It states that the modification
in the weight of a node is equal to the product of the error and the input, where the error is the difference
between the desired and actual output.
Computed as follows:
Assume (x1, x2, x3, ..., xn) –> set of input vectors
and (w1, w2, w3, ..., wn) –> set of weights
y = actual output
ti = target (desired) output
wo = initial weight
wnew = new weight
δw = change in weight
α = learning rate
Error = ti - y
Learning signal (ej) = (ti - y) y’
y = f(net input) = f(Σ wi xi), where y’ is the derivative of f
δw = α xi ej = α xi (ti - y) y’
wnew = wo + δw
The weights are updated only when there is a difference between the target and the actual output (i.e., an
error is present):
case I: when t=y
then there is no change in weight
case II: when t≠y
wnew=wo+δw
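A single update of this rule can be sketched as follows. The sigmoid is assumed as the continuous activation function (so y’ = y(1 - y)), and the inputs, target, and learning rate are made-up values for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def delta_rule_step(w, x, t, alpha):
    """One delta-rule update; returns the new weights and the output before the update."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    y = sigmoid(net)                 # y = f(net input)
    y_prime = y * (1.0 - y)          # derivative of the sigmoid
    e = (t - y) * y_prime            # learning signal
    w_new = [wi + alpha * xi * e for wi, xi in zip(w, x)]  # wnew = wo + alpha*xi*e
    return w_new, y

x, t, alpha = [1.0, 0.5], 1.0, 0.5
first_w, first_y = delta_rule_step([0.0, 0.0], x, t, alpha)

# Repeating the update drives the output toward the target.
w = [0.0, 0.0]
for _ in range(200):
    w, last_y = delta_rule_step(w, x, t, alpha)
```

When the target already equals the output, the learning signal is zero and the weights are left unchanged, which matches case I above.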
Backpropagation
Backpropagation is an algorithm that propagates the errors backward from the output nodes to the input nodes;
the name is short for backward propagation of errors. It is used in a wide range of neural network
applications in data mining, such as character recognition and signature verification.
Neural Network:
Neural networks are an information processing paradigm inspired by the human nervous system. Just as the
human nervous system has biological neurons, neural networks have artificial neurons, which are mathematical
functions modeled on biological neurons. The human brain is estimated to have roughly 86 billion neurons, each
connected to thousands of other neurons. Each neuron receives a signal through a synapse, which controls the
effect of the signal on the neuron.
Backpropagation is a widely used algorithm for training feedforward neural networks. It computes the gradient
of the loss function with respect to the network weights, and it does so far more efficiently than naively
computing the gradient with respect to each weight separately. This efficiency makes it possible to use
gradient methods to train multi-layer networks, updating the weights to minimize the loss; gradient descent or
variants such as stochastic gradient descent are often used.
The backpropagation algorithm works by computing the gradient of the loss function with respect to each
weight via the chain rule, computing the gradient layer by layer, and iterating backward from the last layer to
avoid redundant computation of intermediate terms in the chain rule.
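The chain-rule computation can be illustrated on the smallest possible network: one input, one hidden unit, and one output unit, each with a single weight. Everything in this sketch is an assumption chosen for brevity (sigmoid activations, no biases, a squared-error loss, and the specific numbers). The backward pass is checked against a numerical finite-difference gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)   # hidden activation
    y = sigmoid(w2 * h)   # output activation
    return h, y

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return 0.5 * (t - y) ** 2

def backward(w1, w2, x, t):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(w1, w2, x)
    delta_out = (y - t) * y * (1.0 - y)               # error term at the output layer
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1.0 - h)     # reuses delta_out: no recomputation
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

w1, w2, x, t = 0.5, -0.3, 1.5, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, t)

# Numerical check with central differences.
eps = 1e-6
num_w1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
num_w2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
```

The hidden-layer term is built from `delta_out`, which is the saving the paragraph above refers to: intermediate chain-rule factors are computed once at the later layer and reused by the earlier one.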
Features of Backpropagation:
1. It is the gradient descent method, as used in the simple perceptron network with differentiable
units.
2. It differs from other networks in the process by which the weights are calculated during the
learning period of the network.
3. Training is done in three stages:
o the feed-forward of the input training pattern
o the calculation and backpropagation of the error
o the updating of the weights
Working of Backpropagation:
Neural networks use supervised learning to generate output vectors from the input vectors that the network
operates on. The network compares the generated output with the desired output and computes an error if the two
do not match. It then adjusts the weights according to this error to move toward the desired output.
Backpropagation Algorithm:
Backpropagation stands for “backward propagation of errors” and is very useful for training neural networks.
It is fast, simple, and easy to implement. Apart from the number of inputs, backpropagation does not require
any parameters to be set. It is a flexible method because no prior knowledge of the network is required.
Types of Backpropagation
Advantages:
Disadvantages:
It is sensitive to noisy data and irregularities. Noisy data can lead to inaccurate results.
Performance is highly dependent on input data.
Training can take a long time.
The matrix-based approach is preferred over a mini-batch.
Example
Input values
x1=0.05
x2=0.10
Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55
Bias values
b1=0.35 b2=0.60
Target values
T1=0.01
T2=0.99
Forward Pass
To find the value of H1, we multiply the input values by the weights and add the bias:
H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925
Each hidden value is then passed through the sigmoid function, which gives 0.593269992 for H1 and 0.596884378
for H2.
Now we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.
To find the value of y1, we multiply the sigmoid outputs of H1 and H2 by the weights and add the bias:
y1=H1×w5+H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597
To calculate the final result of y1, we apply the sigmoid function:
y1 = 1/(1+e^(-1.10590597)) = 0.75136507
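The arithmetic of this forward pass can be reproduced with a short script. Note that the values 0.593269992 and 0.596884378 used for y1 are the sigmoid outputs of H1 and H2. The wiring of y2 (weights w7 and w8 with bias b2) is an assumption made by analogy with y1, since the text only writes out y1; the variable names are also illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values from the example above.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sums, then sigmoid.
h1_net = x1 * w1 + x2 * w2 + b1
h2_net = x1 * w3 + x2 * w4 + b1
h1_out = sigmoid(h1_net)
h2_out = sigmoid(h2_net)

# Output layer: weighted sums of the hidden outputs, then sigmoid.
y1_net = h1_out * w5 + h2_out * w6 + b2
y2_net = h1_out * w7 + h2_out * w8 + b2  # assumed wiring, by analogy with y1
y1_out = sigmoid(y1_net)
y2_out = sigmoid(y2_net)
```

Comparing `y1_out` and `y2_out` with the targets T1 and T2 gives the error that the backward pass would then propagate.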
SOM Algorithm
A Kohonen self-organizing feature map (SOM) is a neural network that is trained using competitive
learning. Basic competitive learning implies that a competition process takes place before each cycle of learning.
In the competition process, some criterion selects a winning processing element. After the winning
processing element is selected, its weight vector is adjusted according to the learning law in use (Hecht-Nielsen
1990).
The self-organizing map makes topologically ordered mappings between input data and processing elements of
the map. Topologically ordered means that if two inputs have similar characteristics, the most active processing
elements responding to those inputs are located close to each other on the map. The weight vectors of the
processing elements are organized in ascending or descending order: W_i < W_{i+1} for all values of i, or
W_i > W_{i+1} for all values of i (this definition is valid for a one-dimensional self-organizing map only).
The self-organizing map is typically represented as a two-dimensional sheet of processing elements, as described in
the figure given below. Each processing element has its own weight vector, and learning in a SOM (self-organizing
map) depends on the adaptation of these vectors. The processing elements of the network are made competitive
in a self-organizing process, and a specific criterion picks the winning processing element whose weights are updated.
Generally, this criterion is to minimize the Euclidean distance between the input vector and the weight vector.
A SOM differs from basic competitive learning in that, instead of adjusting only the weight vector of the
winning processing element, the weight vectors of neighboring processing elements are adjusted as well.
At first, the neighborhood is large, which produces a rough ordering of the SOM, and its size is diminished as time
goes on. In the end, only the winning processing element is adjusted, which makes fine-tuning of the SOM possible.
The use of a neighborhood makes the topological ordering possible and, together with competitive learning, makes
the process non-linear.
The SOM was introduced by the Finnish professor and researcher Dr. Teuvo Kohonen in 1982. It is
an unsupervised learning model proposed for applications in which maintaining a topology between the input and
output spaces is important. The notable attribute of this algorithm is that input vectors that are close and
similar in high-dimensional space are also mapped to nearby nodes in the 2D space. It is fundamentally a method
for dimensionality reduction, as it maps high-dimensional inputs to a low-dimensional discretized representation
and preserves the basic structure of its input space.
The entire learning process occurs without supervision because the nodes are self-organizing. SOMs are also
known as feature maps, as they essentially retain the features of the input data and simply group
themselves according to the similarity between one another. This has practical value for visualizing complex or
large quantities of high-dimensional data and showing the relationships within them in a low-dimensional, usually
two-dimensional, field to check whether the given unlabeled data has any structure.
A Self-Organizing Map (SOM) differs from typical artificial neural networks (ANNs) both in its architecture and
algorithmic properties. Its structure consists of a single-layer 2D grid of neurons, rather than a series of
layers. All the nodes on this lattice are connected directly to the input vector, but not to each other. This means
the nodes do not know the values of their neighbors and only update the weights of their connections as a function
of the given input. The grid itself is the map, which organizes itself at each iteration as a function of the input
data. As such, after clustering, each node has its own coordinate (i, j), which enables one to calculate the
Euclidean distance between two nodes by means of the Pythagorean theorem.
A Self-Organizing Map uses competitive learning instead of error-correction learning to modify its weights.
This implies that only a single node is activated at each cycle in which the features of an instance of the
input vector are presented to the neural network, as all nodes compete for the privilege to respond to the input.
The selected node, the Best Matching Unit (BMU), is chosen according to the similarity between the current
input values and the weights of all the nodes in the network. The node with the smallest Euclidean distance to
the input vector is selected, and it and its neighboring nodes within a specific radius have their weights
slightly adjusted to match the input vector. By going through all the nodes present on the grid, the whole grid
eventually matches the entire input dataset, with similar nodes gathered in one area and dissimilar ones
kept apart.
Algorithm:
Step:1
Initialize each node weight w_ij to a random value.
Step:2
Step:3
Step:4
Calculate the Euclidean distance between weight vector wij and the input vector x(t) connected with the first node,
where t, i, j =0.
Step:5
Step:6
Determine the overall Best Matching Unit (BMU), that is, the node with the smallest distance among all those
calculated.
Step:7
Determine the topological neighborhood β_ij(t) and its radius σ(t) around the BMU in the Kohonen map.
Step:8
Repeat for all nodes in the BMU neighborhood: update the weight vector w_ij of each node in the
neighborhood of the BMU by adding a fraction of the difference between the input vector x(t) and the weight
w(t) of the neuron.
Step:9
Repeat the complete iteration until reaching the selected iteration limit t=n.
Here, step 1 represents the initialization phase, while steps 2 to 9 represent the training phase.
Where;
t = current iteration.
W= weight vector
β_ij = the neighborhood function, decreasing and representing node i,j distance from the BMU.
σ(t) = The radius of the neighborhood function, which calculates how far neighbor nodes are examined in the 2D
grid when updating vectors. It gradually decreases over time.
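The surviving steps above can be turned into a compact training loop. This is a sketch under several assumptions that the text does not specify: inputs are drawn at random from the dataset at each iteration, the neighborhood function β is a Gaussian of the grid distance to the BMU, and both the radius σ(t) and the learning rate decay exponentially. The function names and the sample data are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: euclidean(weights[node], x))

def train_som(data, rows, cols, n_iter, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialization: every node weight starts at a random value.
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    for t in range(n_iter):                       # repeat until the iteration limit
        x = rng.choice(data)                      # assumed: random input selection
        bmu = best_matching_unit(weights, x)      # smallest Euclidean distance wins
        decay = math.exp(-3.0 * t / n_iter)       # assumed decay schedule
        sigma, lr = sigma0 * decay, lr0 * decay   # shrinking radius and step size
        for (i, j), w in weights.items():
            grid_dist_sq = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            beta = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))  # neighborhood function
            for k in range(dim):
                w[k] += lr * beta * (x[k] - w[k])  # move toward the input
    return weights

data = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
som = train_som(data, rows=3, cols=3, n_iter=300)
```

After training, `best_matching_unit(som, x)` maps any input to a grid coordinate, which is the low-dimensional representation the map provides.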
Pros
Techniques like dimensionality reduction and grid clustering make the data simple to interpret and
understand.
Self-organizing maps can handle a variety of classification problems while simultaneously producing an
insightful and practical summary of the data.
Cons
The model does not build a generative model of the data, so it cannot capture how the data is generated.
Self-Organizing Maps perform poorly on categorical data and even worse on mixed types of data.
In comparison, the model preparation process is extremely slow, making it challenging to train against slowly
evolving data.
Advantages of Deep Learning:
1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in various tasks, such
as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover and learn relevant
features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can learn from
massive amounts of data.
4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle various types of
data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their performance as more data
becomes available.
Disadvantages of Deep Learning:
1. High computational requirements: Deep Learning models require large amounts of data and computational
resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large amount of labeled data
for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult to understand
how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in poor performance
on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it difficult to understand
how they work and how they arrived at their predictions.
Convolution Neural Network
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly
used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to
understand and interpret images or other visual data.
When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural Networks are used
on various kinds of data such as images, audio, and text. Different types of Neural Networks are used for
different purposes: for predicting a sequence of words we use Recurrent Neural Networks, more precisely
an LSTM, while for image classification we use Convolutional Neural Networks. In this blog, we are going to
build a basic building block for a CNN.
In a regular Neural Network there are three types of layers:
1. Input Layers: This is the layer in which we give input to our model. The number of neurons in this layer is
equal to the total number of features in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many
hidden layers depending upon our model and data size. Each hidden layer can have a different number
of neurons, which is generally greater than the number of features. The output from each layer is
computed by matrix multiplication of the output of the previous layer with the learnable weights of that
layer, followed by the addition of learnable biases and then an activation function, which makes the
network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or
softmax, which converts the output of each class into the probability score of each class.
Feeding the data into the model and obtaining the output of each layer as described above is called feedforward.
We then calculate the error using an error function; some common error functions are cross-entropy and squared
loss. The error function measures how well the network is performing. After that, we backpropagate through
the model by calculating the derivatives. This step is called backpropagation and is used to
minimize the loss.
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is
predominantly used to extract features from grid-like data, for example visual datasets such as
images or videos, where patterns in the data play an important role.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling
layer, and fully connected layers.
The Convolutional layer applies filters to the input image to extract features, the Pooling layer downsamples
the image to reduce computation, and the fully connected layer makes the final prediction. The network learns
the optimal filters through backpropagation and gradient descent.
Convolutional Neural Networks, or ConvNets, are neural networks that share their parameters. Imagine you have an
image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a height
(the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on
it, with, say, K outputs, and representing them vertically. Now slide that neural network across the whole image;
as a result, we get another image with a different width, height, and depth. Instead of just R, G, and B
channels we now have more channels but a smaller width and height. This operation is called convolution. If the
patch size is the same as that of the image, it is a regular neural network. Because of this small patch, we
have fewer weights.
Now let’s talk about the bit of mathematics that is involved in the whole convolution process.
Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the
same depth as the input volume (3 if the input layer is an image input).
For example, suppose we have to run convolution on an image with dimensions 34x34x3. The possible size of
filters can be a x a x 3, where ‘a’ can be something like 3, 5, or 7 but small compared to the image
dimension.
During the forward pass, we slide each filter across the whole input volume step by step, where the size of each
step is called the stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute
the dot product between the kernel weights and the patch from the input volume.
As we slide our filters we get a 2-D output for each filter; stacking these together, we
get an output volume with a depth equal to the number of filters. The network learns all the filters.
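The sliding dot product described above can be written out directly. The sketch below assumes no padding, for which the output width and height are (n - a) / stride + 1 for an n x n input and an a x a filter; this formula and the helper names are not taken from the text, and a tiny 4x4x3 volume of ones is used so the result is easy to verify by hand.

```python
def conv_output_size(n, a, stride=1):
    """Output width/height for an n x n input and a x a filter, no padding."""
    return (n - a) // stride + 1

def convolve_volume(volume, kernel, stride=1):
    """Slide one a x a x C kernel over an H x W x C volume; returns a 2-D feature map."""
    height, width, depth = len(volume), len(volume[0]), len(volume[0][0])
    a = len(kernel)
    out_h = conv_output_size(height, a, stride)
    out_w = conv_output_size(width, a, stride)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel weights and the current input patch.
            total = 0.0
            for di in range(a):
                for dj in range(a):
                    for c in range(depth):
                        total += volume[i * stride + di][j * stride + dj][c] * kernel[di][dj][c]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]   # 4x4x3 volume of ones
kernel = [[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(3)]  # 3x3x3 filter of ones
feature_map = convolve_volume(image, kernel)

# One 2-D map per filter; stacking them gives a depth equal to the number of filters.
output_volume = [convolve_volume(image, k) for k in (kernel, kernel)]
```

Under the same no-padding assumption, a 3x3x3 filter with stride 1 on the 34x34x3 image from the example gives `conv_output_size(34, 3)`, that is, a 32x32 map per filter.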
Layers used to build ConvNets
A complete Convolutional Neural Network architecture is also known as a ConvNet. A ConvNet is a sequence of
layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let’s take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
Input Layers: This is the layer in which we give input to our model. In a CNN, the input is generally an
image or a sequence of images. This layer holds the raw input of the image with width 32, height 32, and
depth 3.
Convolutional Layers: This layer extracts features from the input dataset. It applies
a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices,
usually of shape 2×2, 3×3, or 5×5. Each filter slides over the input image data and computes the dot product
between the kernel weights and the corresponding input image patch. The output of this layer is referred to as
feature maps. Suppose we use a total of 12 filters for this layer; with padding that preserves the width and
height, we get an output volume of dimension 32 x 32 x 12.
Activation Layer: By applying an activation function to the output of the preceding layer, activation layers
add nonlinearity to the network. The layer applies an element-wise activation function to the output of the
convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc.
The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
Pooling layer: This layer is periodically inserted in the ConvNet, and its main function is to reduce the
size of the volume, which makes the computation faster, reduces memory use, and also helps prevent overfitting.
Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2
filters and stride 2, the resultant volume will be of dimension 16x16x12.
Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution
and pooling layers so they can be passed into a fully connected layer for classification or regression.
Fully Connected Layers: This layer takes the input from the previous layer and computes the final classification
or regression output.
Output Layer: For classification tasks, the output from the fully connected layers is then fed into a logistic
function such as sigmoid or softmax, which converts the output of each class into the probability
score of each class.
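The volume sizes quoted in this walk-through can be checked with a few helper functions. The sketch assumes a 3x3 kernel with stride 1 and one pixel of zero padding for the convolutional layer, since that is one configuration that keeps the 32 x 32 width and height; the text itself does not state the kernel size or padding. The helper names are hypothetical.

```python
def conv_shape(height, width, n_filters, kernel, stride=1, padding=0):
    """Output shape (H, W, depth) of a convolutional layer."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return (out_h, out_w, n_filters)

def pool_shape(height, width, depth, size=2, stride=2):
    """Output shape of a pooling layer; depth is unchanged."""
    return ((height - size) // stride + 1, (width - size) // stride + 1, depth)

def relu(x):
    """Element-wise activation: max(0, x)."""
    return max(0.0, x)

def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over a single 2-D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

after_conv = conv_shape(32, 32, n_filters=12, kernel=3, padding=1)  # assumed 3x3, pad 1
after_relu = after_conv                     # activation leaves the shape unchanged
after_pool = pool_shape(*after_relu)        # 2 x 2 max pool with stride 2
flattened_length = after_pool[0] * after_pool[1] * after_pool[2]

pooled = max_pool2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

The flattened vector of length `flattened_length` is what the fully connected layers receive.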
Convolutional Layer
In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural networks that is
typically used to recognize patterns present in images, but is also used for spatial data analysis, computer
vision, natural language processing, signal processing, and various other purposes. The architecture of a
Convolutional Network resembles the connectivity pattern of neurons in the human brain and was inspired by
the organization of the visual cortex. This specific type of artificial neural network gets its name from one of
the most important operations in the network: convolution.
What Is a Convolution?
Convolution is an orderly procedure where two sources of information are intertwined; it’s an operation that
changes a function into something else. Convolutions have long been used in image processing to blur and
sharpen images, and also to perform other operations (e.g. enhancing edges and embossing). CNNs enforce
a local connectivity pattern between neurons of adjacent layers.
CNNs make use of filters (also known as kernels), to detect what features, such as edges, are present throughout
an image. There are four main operations in a CNN:
Convolution
Non Linearity (ReLU)
Pooling or Sub Sampling
Classification (Fully Connected Layer)
The first layer of a Convolutional Neural Network is typically a Convolutional Layer. Convolutional layers apply
a convolution operation to the input, passing the result to the next layer. A convolution converts all the pixels in
its receptive field into a single value. For example, if you apply a convolution to an image without padding, you
decrease the image size as well as bring all the information in the field together into a single pixel. The final
output of the convolutional layer is a set of feature maps, one per filter. Based on the type of problem we need to
solve and on the kind of features we are looking to learn, we can use different kinds of convolutions.
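To make the "receptive field into a single value" idea concrete, here is a minimal pure-Python sketch of a sliding-window filter with no padding and stride 1. The image and the vertical-edge kernel are made-up illustrations. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Every pixel in the receptive field contributes to one output value
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4 x 4 image: dark left half, bright right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2 x 2 filter that responds to a dark-to-bright vertical edge
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The 4 x 4 input shrinks to a 3 x 3 feature map, and the only non-zero responses sit in the middle column, where the edge is.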
Reinforcement Learning
Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward
in a particular situation. It is employed by various software and machines to find the best possible behavior or
path it should take in a specific situation. Reinforcement learning differs from supervised learning: in
supervised learning the training data comes with an answer key, so the model is trained with the correct
answers, whereas in reinforcement learning there is no answer key and the reinforcement agent decides what
to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.
Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an
environment to obtain maximum reward. In RL, the data is accumulated by the learning system itself through
trial and error; it is not supplied up front as input, as it would be in supervised or unsupervised
machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next. After
each action, the algorithm receives feedback that helps it determine whether the choice it made was correct,
neutral or incorrect. It is a good technique to use for automated systems that have to make a lot of small
decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error. It
performs actions with the aim of maximizing rewards, or in other words, it is learning by doing in order to
achieve the best outcomes.
Example:
The problem is as follows: We have an agent and a reward, with many hurdles in between. The agent is supposed
to find the best possible path to reach the reward. The following example explains the problem more easily.
The above image shows the robot, diamond, and fire. The goal of the robot is to get the reward, which is the
diamond, and avoid the hurdles, which are the fires. The robot learns by trying all the possible paths and then
choosing the path which gives it the reward with the fewest hurdles. Each right step will give the robot a reward
and each wrong step will subtract from the robot's reward. The total reward will be calculated when it reaches
the final reward, which is the diamond.
Main points in Reinforcement learning –
Input: The input should be an initial state from which the model will start.
Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
Training: The training is based upon the input. The model will return a state, and the user will decide to
reward or punish the model based on its output.
The model continues to learn.
The best solution is decided based on the maximum reward.
Difference between Reinforcement learning and Supervised learning:
Types of Reinforcement:
There are two types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event that follows a particular behavior increases
the strength and the frequency of that behavior. In other words, it has a positive effect on behavior.
Advantages of positive reinforcement:
Maximizes performance
Sustains change for a long period of time
Disadvantage:
Too much reinforcement can lead to an overload of states, which can diminish the results
2. Negative: Negative reinforcement is defined as the strengthening of behavior because a negative condition is
stopped or avoided.
Advantages of negative reinforcement:
Increases behavior
Provides defiance to a minimum standard of performance
Disadvantage:
It only provides enough to meet the minimum behavior
Credit assignment problem: Reinforcement learning algorithms learn to generate an internal value for
the intermediate states, indicating how good they are in leading to the goal. The learning decision maker is
called the agent. The agent interacts with the environment, which includes everything outside the agent.
The agent has sensors to determine its state in the environment and takes actions that modify its state.
The reinforcement learning problem model is an agent continuously interacting with an environment. The
agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state
of the environment and a scalar numerical reward for the previous action, and then the agent selects an
action.
Reinforcement learning is a technique for solving Markov decision problems.
Reinforcement learning uses a formal framework defining the interaction between a learning agent and
its environment in terms of states, actions, and rewards. This framework is intended to be a simple way of
representing essential features of the artificial intelligence problem.
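The interaction loop described above (state in, action out, reward and next state back) can be sketched as follows. The corridor environment, its reward of 1 at the last cell, and the method names `reset` and `step` are assumptions made for illustration, not something defined in these notes.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..4, reward +1 on reaching the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """At each time step the agent observes the state, acts, and receives a reward."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(state)                  # agent selects an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def always_right(state):
    return 1


print(run_episode(CorridorEnv(), always_right))
```

A learning agent would replace `always_right` with a policy that it improves from the rewards it collects.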
Various Practical Applications of Reinforcement Learning –
Types of reinforcement:
o Positive Reinforcement
o Negative Reinforcement
Positive Reinforcement:
Positive reinforcement means adding something to increase the tendency that the expected behavior
will occur again. It impacts the behavior of the agent positively and increases the strength of the behavior.
This type of reinforcement can sustain the changes for a long time, but too much positive reinforcement may lead
to an overload of states that can diminish the results.
Negative Reinforcement:
Negative reinforcement is the opposite of positive reinforcement, as it increases the tendency that the
specific behavior will occur again by avoiding the negative condition.
It can be more effective than positive reinforcement depending on the situation and behavior, but it provides
reinforcement only to meet the minimum behavior.
We can represent the agent state using the Markov state, which contains all the required information from the
history. A state St is a Markov state if it satisfies the following condition:
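In the standard notation (assumed here, with S_t denoting the state at time step t), the Markov condition is usually written as:

```latex
\[
P\left[S_{t+1} \mid S_t\right] = P\left[S_{t+1} \mid S_1, S_2, \ldots, S_t\right]
\]
```

That is, conditioning on the whole history gives the same prediction for the next state as conditioning on the current state alone.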
The Markov state follows the Markov property, which says that the future is independent of the past given
the present. RL works on fully observable environments, where the agent can observe
the environment and act on the new state. The complete process is known as a Markov Decision Process, which is
explained below:
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the
environment is completely observable, then its dynamics can be modeled as a Markov Process. In an MDP, the agent
constantly interacts with the environment and performs actions; at each action, the environment responds and
generates a new state.
An MDP is used to describe the environment for RL, and almost all RL problems can be formalized using
an MDP.
An MDP uses the Markov property, so to better understand the MDP, we need to learn about that property.
Markov Property:
It says that "If the agent is in the current state s1, performs an action a1, and moves to the state s2, then
the state transition from s1 to s2 depends only on the current state and action; it does not depend
on past actions, rewards, or states."
Or, in other words, as per the Markov Property, the current state transition does not depend on any past action or
state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a game of chess, the
players only focus on the current state and do not need to remember past actions or states.
Finite MDP:
A finite MDP is when there are finite states, finite rewards, and finite actions. In RL, we consider only the finite
MDP.
Markov Process:
A Markov Process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov
Property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a
transition function P. These two components (S and P) define the dynamics of the system.
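As a minimal sketch of the tuple (S, P), the following simulates a two-state Markov chain. The weather states and the transition probabilities are invented for illustration only.

```python
import random

# Hypothetical state set S and transition function P (each row sums to 1)
states = ["sunny", "rainy"]
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current, rng):
    """Sample the next state using only the current state (Markov property)."""
    candidates = list(P[current].keys())
    weights = list(P[current].values())
    return rng.choices(candidates, weights=weights, k=1)[0]

def simulate(start, steps, seed=0):
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain

print(simulate("sunny", 10))
```

Note that `next_state` never looks at the earlier part of the chain, which is exactly what "memoryless" means here.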
Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The most commonly
used algorithms are:
o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference learning.
Temporal difference learning methods work by comparing temporally successive
predictions.
o It learns the value function Q(s, a), which means how good it is to take action "a" in a particular state
"s".
o The below flowchart explains the working of Q-learning:
o State Action Reward State Action (SARSA):
o SARSA stands for State Action Reward State Action, which is an on-policy temporal difference
learning method. The on-policy control method selects the action for each state while learning
using a specific policy.
o The goal of SARSA is to calculate Qπ(s, a) for the currently selected policy π and all
state-action pairs (s, a).
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does
not need the maximum Q-value over the next state's actions to update the Q-value in the table.
o In SARSA, the new action and reward are selected using the same policy that determined the
original action.
o SARSA is so named because it uses the quintuple (s, a, r, s', a'), where
s: original state
a: original action
r: reward observed after taking action a in state s
s' and a': new state-action pair.
o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For an environment with a big state space, it is a challenging and complex task to define and update
a Q-table.
o To solve this issue, we can use a DQN algorithm, where, instead of defining a Q-table, a neural
network approximates the Q-values for each action and state.
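The difference between the off-policy and on-policy updates in the list above can be sketched side by side. This uses the standard temporal difference update with a learning rate; the values of ALPHA and GAMMA and the state and action names are assumptions chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: bootstrap from the best action available in the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the policy actually chose next."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

actions = ["left", "right"]
Q_ql = defaultdict(float)      # Q-table for the Q-learning update
Q_sarsa = defaultdict(float)   # Q-table for the SARSA update
for table in (Q_ql, Q_sarsa):
    table[("s1", "left")] = 0.0
    table[("s1", "right")] = 1.0

# Same transition (s0, right, reward 0, s1); the policy then picks "left" in s1
q_learning_update(Q_ql, "s0", "right", 0.0, "s1", actions)
sarsa_update(Q_sarsa, "s0", "right", 0.0, "s1", "left")

print(Q_ql[("s0", "right")], Q_sarsa[("s0", "right")])
```

Q-learning credits the transition with the value of the best next action ("right"), while SARSA credits it with the value of the action that was actually selected ("left"), so the two tables diverge after a single update.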
Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that tells the agent which actions
should be taken under which circumstances to maximize the reward.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q can be derived from the Bellman equation. Consider the Bellman equation given
below:
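In its common state-value form (a standard statement, with R(s, a) the reward, γ the discount factor, and P(s, a, s') the probability of ending in state s' after taking action a in state s), the Bellman equation reads:

```latex
\[
V(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s, a, s')\, V(s') \Big]
\]
```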
In the equation, we have various components, including the reward, the discount factor (γ), the transition
probability, and the end states s'. No Q-value appears yet, so first consider the below image:
In the above image, we can see an agent that has three value options, V(s1), V(s2), V(s3). As this is an MDP,
the agent only cares about the current state and the future state. The agent can go in any direction (Up, Left,
or Right), so it needs to decide where to go for the optimal path. Here the agent makes a move on a probability
basis and changes state. But if we want exact moves, we need to restate the problem in terms of the
Q-value. Consider the below image:
Q represents the quality of the actions at each state. So instead of using a value at each state, we will use a pair
of state and action, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and
according to the best Q-value, the agent takes its next move. The Bellman equation can be used for deriving the
Q-value. For performing any action, the agent gets a reward R(s, a), and it also ends up in a certain state, so
the Q-value equation will be:
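Using the same symbols as the Bellman equation (reward R(s, a), discount factor γ, transition probability P(s, a, s')), the usual form of the Q-value equation is:

```latex
\[
Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s')\, V(s'),
\qquad V(s') = \max_{a'} Q(s', a')
\]
```

Substituting the second relation into the first gives the equation purely in terms of Q:

```latex
\[
Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s')\, \max_{a'} Q(s', a')
\]
```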
The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.
Q-table:
A Q-table or matrix is created while performing Q-learning. The table is indexed by state-action pairs, i.e.,
[s, a], and its values are initialized to zero. After each action, the table is updated, and the Q-values are
stored within the table.
The RL agent uses this Q-table as a reference table to select the best action based on the Q-values.
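A minimal sketch of tabular Q-learning shows the Q-table being initialized to zero, updated after each action, and then used to select actions. The corridor environment, the epsilon-greedy exploration, and the hyperparameter values are assumptions made for illustration.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = [0, 1]        # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed hyperparameters

def step(state, action):
    """Move one cell; reaching the last cell gives reward 1 and ends the episode."""
    nxt = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[s][a], initialized to zero
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q_table[state])
                # Break ties randomly so the untrained agent does not get stuck
                action = rng.choice([a for a in ACTIONS if q_table[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrap from the best next Q-value unless the episode has ended
            target = reward + (0.0 if done else GAMMA * max(q_table[nxt]))
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = nxt
    return q_table

q_table = train()
# Greedy action per non-terminal state, read straight from the table
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
print(greedy_policy)
```

After training, reading the largest entry in each row gives the agent's preferred action in that state, which is how the Q-table serves as a reference table.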
Reinforcement Learning and Supervised Learning are both parts of machine learning, but the two types of
learning are very different from each other. RL agents interact with the environment, explore it, take action, and
get rewarded, whereas supervised learning algorithms learn from a labeled dataset and, on the basis of that
training, predict the output.
Reinforcement Learning | Supervised Learning
RL works by interacting with the environment. | Supervised learning works on the existing dataset.
The RL algorithm works like the human brain works when making some decisions. | Supervised learning works as when a human learns things in the supervision of a guide.
No previous training is provided to the learning agent. | Training is provided to the algorithm so that it can predict the output.
RL helps to take decisions sequentially. | In supervised learning, decisions are made when input is given.
Reinforcement Learning Applications
1. Robotics:
RL is used in robot navigation, Robo-soccer, walking, juggling, etc.
2. Control:
RL can be used for adaptive control such as factory processes, admission control in
telecommunication, and helicopter piloting.
3. Game Playing:
RL can be used in game playing such as tic-tac-toe, chess, etc.
4. Chemistry:
RL can be used for optimizing chemical reactions.
5. Business:
RL is now used for business strategy planning.
6. Manufacturing:
In various automobile manufacturing companies, robots use deep reinforcement learning to pick goods and
put them in containers.
7. Finance Sector:
RL is currently used in the finance sector for evaluating trading strategies.
Genetic Algorithm
A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory of evolution in
nature. It is used to solve optimization problems in machine learning. It is an important algorithm because it
helps solve complex problems that would otherwise take a long time to solve.
Genetic algorithms are widely used in different real-world applications, for example, designing electronic
circuits, code-breaking, image processing, and artificial creativity.
In this topic, we will explain the genetic algorithm in detail, including the basic terminology used in genetic
algorithms, how it works, the advantages and limitations of genetic algorithms, etc.
What is a Genetic Algorithm?
Before looking at how the genetic algorithm works, let's first go through the basic terminology:
o Population: The population is a subset of all possible or probable solutions to the given
problem.
o Chromosomes: A chromosome is one of the solutions in the population for the given problem; a
collection of genes makes up a chromosome.
o Gene: A gene is an element of the chromosome; a chromosome is divided into different genes.
o Allele: An allele is the value provided to the gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine an individual's fitness level in the population.
It measures the ability of an individual to compete with other individuals. In every iteration, individuals are
evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring better than
the parents. Here genetic operators play a role in changing the genetic composition of the next generation.
o Selection: After calculating the fitness of every individual in the population, a selection process is used
to determine which of the individuals in the population will get to reproduce and produce the offspring that
will form the next generation.
So, now we can define a genetic algorithm as a heuristic search algorithm to solve optimization problems. It is a
subset of evolutionary algorithms, which is used in computing. A genetic algorithm uses genetic and natural
selection concepts to solve optimization problems.
The genetic algorithm works on an evolutionary generational cycle to generate high-quality solutions. It
uses different operations that either enhance or replace the population to give a fitter solution.
It basically involves five phases to solve complex optimization problems, which are given below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, which is called the population. Here
each individual is a solution to the given problem. An individual contains or is characterized by a set of
parameters called genes. Genes are combined into a string to form a chromosome, which is the solution to
the problem. One of the most popular techniques for initialization is the use of random binary strings.
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, meaning the ability of an individual to compete
with other individuals. In every iteration, individuals are evaluated based on their fitness function. The fitness
function provides a fitness score to each individual. This score further determines the probability of being selected
for reproduction. The higher the fitness score, the greater the chance of being selected for reproduction.
3. Selection
The selection phase involves the selection of individuals for the reproduction of offspring. All the selected
individuals are then arranged in pairs for reproduction. Then these individuals transfer their genes
to the next generation.
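One common way to turn fitness scores into selection probabilities is fitness-proportionate (roulette-wheel) selection. The notes do not name a specific scheme, so the choice of scheme, the bit-counting fitness function, and the sample population below are all assumptions for illustration.

```python
import random

def fitness(chromosome):
    """Hypothetical fitness: number of 1-bits in the chromosome."""
    return sum(chromosome)

def selection_probabilities(population):
    """Probability of each individual being picked, proportional to its fitness."""
    scores = [fitness(c) for c in population]
    total = sum(scores)
    if total == 0:
        # No individual has any fitness yet: fall back to a uniform choice
        return [1 / len(population)] * len(population)
    return [s / total for s in scores]

def select_parents(population, rng):
    """Pick one pair of parents; fitter individuals are chosen more often."""
    weights = selection_probabilities(population)
    return rng.choices(population, weights=weights, k=2)

population = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
probs = selection_probabilities(population)
parents = select_parents(population, random.Random(0))
print(probs, parents)
```

With scores 3, 1, 0, and 2, the first individual is selected half of the time and the all-zero individual is never selected.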
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step, the genetic
algorithm uses two variation operators that are applied to the parent population. The two operators involved in
the reproduction phase are given below:
o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm.
In this process, a crossover point is selected at random within the genes. Then the crossover operator
swaps genetic information of two parents from the current generation to produce a new individual
representing the offspring.
The genes of the parents are exchanged among themselves until the crossover point is met. These newly
generated offspring are added to the population. Types of
crossover styles available:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Inheritable Algorithms crossover
o Mutation
The mutation operator inserts random genes in the offspring (new child) to maintain diversity in the
population. It can be done by flipping some bits in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances diversification. The below
image shows the mutation process:
Types of mutation styles available:
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
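The two variation operators can be sketched on binary chromosomes as follows. This shows one-point crossover and flip-bit mutation only; the function names, parent chromosomes, and mutation rate are illustrative assumptions.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a random crossover point."""
    point = rng.randint(1, len(parent_a) - 1)   # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def flip_bit_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

rng = random.Random(42)
parent_a = [0] * 6
parent_b = [1] * 6
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutated = flip_bit_mutation(child_a, rate=0.1, rng=rng)
print(child_a, child_b, mutated)
```

Because the parents here are all zeros and all ones, the position where each child switches from one value to the other marks the crossover point.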
5. Termination
After the reproduction phase, a stopping criterion is applied as a base for termination. The algorithm terminates
after the threshold fitness solution is reached. It will identify the final solution as the best solution in the
population.
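Putting the five phases together, a minimal sketch of the whole cycle might look like this. The toy objective (maximize the number of 1-bits), the parameter values, and the elitism step that copies the best individual into the next generation are assumptions added for illustration; they are not prescribed by the notes.

```python
import random

def fitness(chromosome):
    return sum(chromosome)   # toy objective: count of 1-bits

def run_ga(length=20, pop_size=30, mutation_rate=0.02, max_generations=200, seed=1):
    rng = random.Random(seed)
    # 1. Initialization: random binary strings
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(max_generations):
        # 2. Fitness assignment
        best = max(population, key=fitness)
        # 5. Termination: stop once the threshold fitness is reached
        if fitness(best) == length:
            return best, generation
        # 3. Selection (fitness-proportionate); +1 avoids an all-zero weight list
        weights = [fitness(c) + 1 for c in population]
        next_population = [best]   # elitism (assumed): keep the best individual
        while len(next_population) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            # 4. Reproduction: one-point crossover followed by flip-bit mutation
            point = rng.randint(1, length - 1)
            child = mother[:point] + father[point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), max_generations

best, generations = run_ga()
print(fitness(best), generations)
```

The loop also stops after `max_generations` even if the threshold is never reached, which is a common practical safeguard alongside the fitness threshold.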
General Workflow of a Simple Genetic Algorithm
Genetic Programming
Genetic programming is a form of artificial intelligence that mimics natural selection in order to find an optimal
result. Genetic programming is iterative, and at each new stage of the algorithm, it chooses only the fittest of
the “offspring” to cross and reproduce in the next generation; the measure used to rank them is referred to as a
fitness function. Just like in biological evolution, evolutionary algorithms can sometimes have randomly mutating
offspring, but since only the offspring that have the highest fitness measure are reproduced, the fitness will
almost always improve over generations. Genetic programming will generally terminate once it reaches a
predefined fitness measure. Additionally, architecture-altering operations can be introduced to an already
running program in order to allow for new sources of information to be analyzed for a given fitness function.
Although originally proposed in 1950 by Alan Turing, it wasn’t until the 1980s that successful genetic
algorithms were first implemented. The first patented algorithm for genetic operations was in 1988 by John
Koza, who remains a leader in the field. As the study of genetic operations continued to evolve, so has the
literature around it, such as with annual conferences like Genetic Algorithms and dedicated journals
like Genetic Programming and Evolvable Machines. There has also been a series of 19 books published by
MIT Press called Genetic and Evolutionary Computation. As knowledge around evolutionary programs has
expanded, so has the population of computer programs that can run them.
Genetic programming systems utilize a type of machine learning technique that can include automatic
programming without the need for manual interaction. This means that genetic algorithms can utilize automatic
program inductions to run as new information is ingested, so that the programs can be optimized automatically.
Genetic or evolutionary algorithms have a variety of uses, particularly around domains where an exact solution
is not known in advance, or when finding an approximate solution is deemed appropriate. Genetic programming
is often used in conjunction with other forms of machine learning, as it is useful for performing symbolic
regressions and feature classifications.
Saving time: Genetic algorithms are able to process large amounts of data much more quickly than
humans can. Additionally, these algorithms run free of human biases, and are thereby able to come up
with ideas that might otherwise not have been considered.
Data and text classification: Genetic programming can quickly identify and classify various forms of
data without the need for human oversight. Genetic programming can use data tree construction in order
to optimize these classifications, especially when dealing with big data.
Ensuring network security: Rule evolution approaches have been successfully applied to identify new
attacks on networks. By quickly identifying intrusions, businesses and organizations can ensure that
they can respond to such attacks before they are able to access confidential information.
Supporting other machine learning methods: Genetic programming can be included in larger systems
of machine learning, such as with neural networks. By having genetic programming focus on only
specific subsets of data, organizations can ensure that this data is quickly processed for ingestion into
larger or different learning methods. This allows organizations to gain as much useful and actionable
information as possible.
Genetic Programming (GP) is a type of Evolutionary Algorithm (EA), a subset of machine learning. EAs are used
to discover solutions to problems humans do not know how to solve directly. Free of human preconceptions or
biases, the adaptive nature of EAs can generate solutions that are comparable to, and often better than, the best
human efforts.
Inspired by biological evolution and its fundamental mechanisms, GP software systems implement an algorithm
that uses random mutation, crossover, a fitness function, and multiple generations of evolution to resolve a
user-defined task. GP can be used to discover a functional relationship between features in data (symbolic
regression), to group data into categories (classification), and to assist in the design of electrical circuits,
antennae, and quantum algorithms. GP is applied to software engineering through code synthesis, genetic
improvement, automatic bug-fixing, and in developing game-playing strategies, among other uses.
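For symbolic regression, candidate programs are often represented as expression trees and scored by how well they fit sample data. The sketch below shows only that representation and the fitness measurement, not the evolutionary search itself; the nested-tuple encoding, the target function y = x^2 + 1, and the two candidates are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Evaluate an expression tree: a number, the variable 'x', or (op, left, right)."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(tree, samples):
    """Sum of squared errors over (x, y) samples; a lower error means a fitter program."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

# Hypothetical data drawn from the target y = x^2 + 1
samples = [(x, x * x + 1) for x in range(-3, 4)]

candidate_a = ("+", ("*", "x", "x"), 1)   # x * x + 1
candidate_b = ("+", "x", 1)               # x + 1

print(error(candidate_a, samples), error(candidate_b, samples))
```

A GP system would apply crossover (swapping subtrees between candidates) and mutation (replacing a subtree) and keep the candidates with the lowest error.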
Types of GP include:
Genetic algorithms find use in various real-world applications. In this segment, we have elaborated on some areas
that utilize the genetic algorithms in machine learning.
1. Neural networks
Genetic optimization has many applications for neural networks in machine learning. We
use it in use cases such as inheriting qualities of neurons, neural network
pipeline optimization, finding the best-fit set of parameters for a given neural network, and others.
2. Data mining and clustering
Data mining and clustering use genetic algorithms to find the centre point of the clusters with an optimal error
rate, thanks to their great searching capability for an optimal value. Clustering is known as an unsupervised
learning process in machine learning, where we categorize the data based on the characteristics of the data points.
3. Image processing
Image processing tasks, such as image segmentation, are one of the major use cases of genetic optimization.
However, genetic algorithms can also be used in different areas of image analysis to resolve complex optimization
problems.
4. Wireless sensor networks
A wireless sensor network (WSN) is one that includes dedicated and spatially dispersed sensors that
record the physical conditions of an environment and pass the recorded data to a central storage
system.
WSN utilizes genetic machine learning to simulate the sensors. We can optimize and even customize all the
operational stages with the help of the fitness function from genetic algorithms in wireless sensor networks.
5. Traveling salesman problem
TSP, or the traveling salesman problem, is one of the real-life combinatorial optimization problems that have been
solved using genetic optimization. It helps in finding an optimal route on a given map, given the distances between
points and the places to be covered by the salesman.
Several iterations take place, each generating offspring solutions that inherit the qualities of the parent
solutions. In this way, the search yields not just a single solution but many candidates, which offers the
opportunity to choose the best route structure. It finds application in real-world processes like planning,
manufacturing, and logistics.
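For the TSP, a chromosome is typically an ordering of the cities, the fitness is the total route length (shorter is better), and mutation must keep the tour a valid permutation. The city coordinates and function names below are assumptions for illustration.

```python
import math
import random

# Hypothetical city coordinates (corners of a 3 x 4 rectangle)
cities = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def tour_length(tour):
    """Total length of a closed tour that returns to its starting city."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def swap_mutation(tour, rng):
    """Exchange two cities so the offspring is still a valid permutation."""
    i, j = rng.sample(range(len(tour)), 2)
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

perimeter_tour = ["A", "B", "C", "D"]   # follows the rectangle's edges
crossing_tour = ["A", "C", "B", "D"]    # uses both diagonals
print(tour_length(perimeter_tour), tour_length(crossing_tour))
print(swap_mutation(perimeter_tour, random.Random(0)))
```

Swap (exchange) mutation is used here instead of flip-bit mutation because flipping a gene would duplicate one city and drop another.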
6. Vehicle routing problem
One of the generalisations of the travelling salesman problem discussed above is the vehicle routing problem.
Genetic algorithms help in finding the optimal weight of goods to be delivered through the optimal set of delivery
routes. Factors such as depot points, waiting time, and distance are taken into consideration while solving such
problems when they carry restrictions. Moreover, the genetic algorithm approach is competitive in terms of solution
quality and time with simulated annealing algorithms and tabu search.
7. Mechanical engineering design
Genetic algorithms find application in many design procedures for mechanical components. For instance,
aircraft wing design is a design problem
that takes multiple disciplines into consideration. It requires improving the ratio of lift to drag for a complex
wing.
The fitness function in genetic optimization is flexible enough to accommodate the considerations demanded by a
particular design.
8. Manufacturing system
The manufacturing arena includes various examples of cost functions, and finding an
optimal set of parameters for such functions poses a problem.
Genetic optimization performs the task of finding the set of parameters that minimizes the cost function. It
also finds application in product manufacturing to achieve optimum production plans by considering dynamic
conditions like capacity, inventories, or material quality.
9. Financial markets
A variety of issues can be solved using genetic optimization in the financial market. It helps in finding an optimal
combination of parameters that can affect the trades or market rules. It can also find near-optimal values
for such a set of parameters.
10. Medical science
Medical science has an array of use cases for genetic optimization. The areas of predictive analysis include protein
prediction, RNA structure prediction, operon prediction, and others.
Other processes that use genetic optimisation include protein folding, gene expression profiling analysis, and
multiple sequence alignment in bioinformatics.
11. Task scheduling
Genetic machine learning algorithms are used to derive optimal schedules that satisfy certain constraints related
to a problem.
For instance, assume that you have to prepare the schedule for the semester exams of a university. The genetic
algorithm will find the best schedule for the university considering all the constraints, like the number of
classrooms, the number of students, the total number of courses and subjects, and others.
12. Economics
Economics is the study of resource utilization in the production, distribution, and consumption of goods
and services over a time period. Genetic algorithms are used to create models of supply and demand, asset
pricing, game theory, and others.
13. Robotics
Robotics comprises the construction, design, and working of the autonomous robot. Genetic algorithms contribute
to the robotics field by providing the necessary insight into the decisions made by the robot. It generates optimal
routes for the robot so that it can use the least amount of resources to get to the desired position.