
Artificial Neural Networks

Artificial Neural Networks contain artificial neurons, called units, which are arranged in a series of layers that together constitute the whole network. A layer can have anywhere from a dozen units to millions of units, depending on how complex the network must be to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, and one or more hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or more hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides the network's response to the input data.
In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another. As data passes from unit to unit, the neural network learns more and more about the data, which eventually results in an output from the output layer.

The structure and operation of human neurons serve as the basis for artificial neural networks, which are also known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it receives input from external sources and passes it to the hidden layer, the second layer. In the hidden layer, each neuron receives input from the previous layer's neurons, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning the effect of each input from the previous layer is scaled by a weight assigned to it; these weights are adjusted during the training process to improve model performance.
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share many similarities in structure and function:
 Structure: The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a cell body, or soma, to process the impulses, dendrites to receive them, and an axon that transfers them to other neurons. The input nodes of an artificial neural network receive the input signals, the hidden layer nodes process these signals, and the output layer nodes compute the final output by processing the hidden layer's results with activation functions.
Biological Neuron        Artificial Neuron
Dendrite                 Inputs
Cell nucleus or soma     Nodes
Synapses                 Weights
Axon                     Output
 Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from the dendrites to the cell body. In artificial neurons, the weights that join one layer's nodes to the next layer's nodes play the role of synapses; the strength of each link is determined by its weight value.
 Learning: In biological neurons, learning happens in the cell body, or soma, whose nucleus helps to process the impulses. An action potential is produced and travels through the axon if the impulses are powerful enough to reach the threshold. This is made possible by synaptic plasticity: the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, backpropagation is the technique used for learning; it adjusts the weights between nodes according to the error, the difference between predicted and actual outcomes.
Biological Neuron        Artificial Neuron
Synaptic plasticity      Backpropagation

 Activation: In biological neurons, activation is the firing rate of the neuron, which happens when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and performs the activation.
Perceptron
A single-layer feedforward neural network was introduced in the late 1950s by Frank Rosenblatt. It marked the starting phase of Deep Learning and artificial neural networks; at that time, statistical machine learning or traditional programming was used for prediction. The perceptron is one of the first and most straightforward models of artificial neural networks, and despite its simplicity it has proven successful in solving specific classification problems.
Architecture
The perceptron is one of the simplest artificial neural network architectures. Introduced by Frank Rosenblatt in 1957, it is the simplest type of feedforward neural network, consisting of a single layer of input nodes that are fully connected to a layer of output nodes. It can learn linearly separable patterns. It uses a slightly different type of artificial neuron known as a threshold logic unit (TLU), first introduced by McCulloch and Walter Pitts in the 1940s.
A weight is assigned to each input node of a perceptron, indicating the significance of that input to the output. The perceptron's output is the weighted sum of the inputs passed through an activation function that decides whether or not the perceptron will fire. It computes the weighted sum of its inputs as:

z = w1x1 + w2x2 + ... + wnxn = xᵀw

The activation function perceptrons use most frequently is the step function, which compares this weighted sum to a threshold and outputs 1 if the input is larger than the threshold and 0 otherwise. The most common step function used in the perceptron is the Heaviside step function:

h(z) = 1 if z ≥ 0, and h(z) = 0 otherwise
A perceptron has a single layer of threshold logic units, with each TLU connected to all inputs.
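To make the TLU concrete, here is a minimal NumPy sketch of a perceptron with the Heaviside step activation, trained with the classic perceptron rule on the linearly separable AND function. The function names, data, and learning-rate value are illustrative choices, not part of the original text.

```python
import numpy as np

def heaviside(z):
    """Heaviside step function: outputs 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

def perceptron_output(x, w, b):
    """Weighted sum z = x . w + b passed through the step function."""
    return heaviside(np.dot(x, w) + b)

# Perceptron learning rule: w <- w + lr * (target - prediction) * x
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])              # AND: linearly separable
w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(10):
    for xi, ti in zip(X, t):
        err = ti - perceptron_output(xi, w, b)
        w, b = w + lr * err * xi, b + lr * err

print(perceptron_output(X, w, b))       # -> [0 0 0 1]
```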

What is MultiLayer Perceptron Neural Network?


A multilayer perceptron (MLP) neural network belongs to the feedforward class of neural networks: an artificial neural network in which the nodes of each layer are connected to the nodes of adjacent layers.
The word perceptron was first defined by Frank Rosenblatt in his perceptron program. The perceptron is the basic unit of an artificial neural network: it defines the artificial neuron. It is a supervised learning algorithm that combines node values, activation functions, inputs, and node weights to calculate the output.
The multilayer perceptron neural network works only in the forward direction. All nodes are fully connected to the network, and each node passes its value to the next layer's nodes only in the forward direction. The MLP neural network uses the backpropagation algorithm to increase the accuracy of the trained model.

Structure of MultiLayer Perceptron Neural Network


This network has three main layers that combine to form a complete Artificial Neural Network. These layers are as follows:
Input Layer
It is the initial or starting layer of the multilayer perceptron. It takes input from the training dataset and forwards it to the hidden layer. There are n input nodes in the input layer, where n is the number of features in the dataset. Each input variable is distributed to each of the nodes of the hidden layer.
Hidden Layer
It is the heart of every artificial neural network: this layer carries out all the computations of the network. The edges entering the hidden layer have weights that are multiplied by the node values, and this layer applies the activation function. There can be one or more hidden layers in the model.
The number of hidden-layer nodes must be chosen carefully: too few nodes make the model unable to work efficiently with complex data, while too many nodes lead to an overfitting problem.
Output Layer
This layer gives the estimated output of the neural network. The number of nodes in the output layer depends on the type of problem: for a single target variable, use one node; for an N-class classification problem, the ANN uses N nodes in the output layer.
Working of MultiLayer Perceptron Neural Network
 The input nodes represent the features of the dataset.
 Each input node passes its vector input value to the hidden layer.
 In the hidden layer, each edge has a weight that is multiplied by the corresponding input variable, and all the products arriving at a hidden node are summed together to generate its output (see the forward-pass sketch after this list).
 The activation function is applied in the hidden layer to identify the active nodes.
 The output is passed to the output layer.
 The difference between the predicted and the actual output is calculated at the output layer.
 After calculating the predicted output, the model uses backpropagation.
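The forward computation in the bullets above can be sketched in a few lines of NumPy. This is only an illustration with made-up layer sizes, and it assumes the sigmoid as the activation function; the original text does not fix either choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Weighted sums at the hidden layer, an activation function,
    then the same at the output layer."""
    hidden = sigmoid(W_hidden @ x + b_hidden)   # hidden-node values
    output = sigmoid(W_out @ hidden + b_out)    # predicted output
    return output

rng = np.random.default_rng(0)
x = rng.random(4)                               # 4 input features
W_h, b_h = rng.random((3, 4)), rng.random(3)    # 3 hidden nodes
W_o, b_o = rng.random((2, 3)), rng.random(2)    # 2 output nodes
print(mlp_forward(x, W_h, b_h, W_o, b_o))
```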
BackPropagation Algorithm
The backpropagation algorithm is used in a multilayer perceptron neural network to increase the accuracy of the output by reducing the error between the predicted and the actual output.
According to this algorithm:
 After calculating the output of the multilayer perceptron neural network, calculate the error: the difference between the output generated by the network and the actual output.
 The calculated error is fed back into the network, from the output layer toward the hidden layer.
 The model reduces the error by adjusting the weights in the hidden layer.
 The predicted output is recalculated with the adjusted weights, and the error is checked again. The process is repeated until the error is minimal or zero.
 This algorithm helps increase the accuracy of the neural network.
Advantages of MultiLayer Perceptron Neural Network
 Multilayer perceptron neural networks can easily handle non-linear problems.
 They can handle complex problems and large datasets.
 Developers use this model to deal with the fitting problems of neural networks.
 They achieve a high accuracy rate and reduce prediction error by using backpropagation.
 Once trained, the multilayer perceptron neural network predicts the output quickly.
Disadvantages of MultiLayer Perceptron Neural Network
 This neural network involves heavy computation, which can increase the overall cost of the model.
 The model performs well only when it is trained thoroughly.
 Due to the model's dense connections, the number of parameters and the node redundancy increase.
Gradient Descent in Machine Learning
Gradient descent is one of the most commonly used optimization algorithms for training machine learning models by minimizing the error between actual and predicted results. Gradient descent is also used to train neural networks.
In mathematical terminology, an optimization algorithm refers to the task of minimizing or maximizing an objective function f(x) parameterized by x. Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize a convex function through iterative parameter updates. Once optimized, these machine learning models can be used as powerful tools for artificial intelligence and various computer-science applications.
In this tutorial on gradient descent in machine learning, we will learn in detail about gradient descent, the role of cost functions (specifically as a barometer within machine learning), types of gradient descent, learning rates, etc.
Gradient descent is one of the most commonly used iterative optimization algorithms for training machine learning and deep learning models. It helps find the local minimum of a function.
o If we move in the direction of the negative gradient of the function at the current point, i.e., away from the gradient, we will reach the local minimum of that function. This procedure is known as gradient descent, also called steepest descent.
o If instead we move in the direction of the positive gradient of the function at the current point, we will reach the local maximum of that function; that procedure is known as gradient ascent.

The main objective of using a gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively (see the sketch below):
o Calculate the first-order derivative of the function to obtain the gradient, or slope, at the current point.
o Move a step away from the direction of the gradient, i.e., in the direction in which the slope decreases, by alpha times the gradient, where alpha is the learning rate: a tuning parameter in the optimization process that helps decide the length of the steps.
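These two iterative steps can be written directly in code. Below is a minimal Python sketch for a one-dimensional function; the example function f(x) = (x - 3)^2 and the step count are illustrative assumptions.

```python
def gradient_descent(grad_f, x0, alpha=0.1, n_steps=100):
    """Repeatedly compute the first-order derivative (the slope)
    and step in the opposite direction, scaled by the learning rate."""
    x = x0
    for _ in range(n_steps):
        x = x - alpha * grad_f(x)   # x_new = x_old - alpha * slope
    return x

# Minimize f(x) = (x - 3)^2; its derivative is 2*(x - 3); minimum at x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # converges to ~3.0
```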
What is Cost-function?
The cost function is defined as the measurement of the difference, or error, between actual values and predicted values at the current position, expressed as a single real number. It helps improve machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum. The algorithm iterates continuously along the direction of the negative gradient until the cost function approaches zero; at that point the model stops learning further. Although the cost function and the loss function are often treated as synonymous, there is a minor difference between them: the loss function refers to the error of one training example, while the cost function calculates the average error across the entire training set.
The cost function is calculated after making a hypothesis with initial parameters; these parameters are then modified with the gradient descent algorithm over known data to reduce the cost function.
How does Gradient Descent work?
Before examining the working principle of gradient descent, we should recall how to find the slope of a line in linear regression. The equation for simple linear regression is:
Y = mX + c
where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.

The starting point is just an arbitrary point used to evaluate the performance. At this starting point, we take the first derivative, i.e., the slope of the tangent line, to measure the steepness at that point. This slope informs the updates to the parameters (the weights and bias). The slope is steepest at the starting point; as new parameters are generated, the steepness gradually decreases until the algorithm approaches the lowest point, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, i.e., the error between the predicted and actual values. Minimizing the cost function requires two things:
o direction and learning rate.
These two factors determine the partial-derivative calculations of future iterations, allowing the algorithm to reach the point of convergence: the local or global minimum. Let's discuss the learning rate briefly.
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum; a low learning rate yields small step sizes, which compromises overall efficiency but gives the advantage of more precision.

Types of Gradient Descent


Based on how many training examples are used to compute each update, the gradient descent learning algorithm can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Let's understand these different types of gradient descent:
1. Batch Gradient Descent:
Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after evaluating all training examples. One such full pass is known as a training epoch. In simple words, it is a greedy approach in which we sum over all examples for each update.
Advantages of Batch gradient descent:
o It produces less noise than other types of gradient descent.
o It produces stable gradient descent convergence.
o It is computationally efficient, since all resources are used to process the whole training set at once.
2. Stochastic gradient descent
Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration; in other words, it updates the parameters for each training example within the dataset, one at a time. Because it requires only one training example at a time, it is easy to fit in allocated memory. However, it loses some computational efficiency compared with batch gradient descent, since its frequent updates cost extra processing. These frequent updates also produce a noisy gradient; on the other hand, that noise can sometimes help escape local minima and find the global minimum.
Advantages of Stochastic gradient descent:
In Stochastic gradient descent (SGD), learning happens on every example, and it consists of a few advantages
over other gradient descent.
o It is easier to allocate in desired memory.
o It is relatively fast to compute than batch gradient descent.
o It is more efficient for large datasets.
3. MiniBatch Gradient Descent:
Mini-batch gradient descent combines batch gradient descent and stochastic gradient descent: it divides the training dataset into small batches and performs an update on each batch separately. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent (see the sketch after the list below). Hence, we achieve a form of gradient descent with high computational efficiency and a less noisy gradient.
Advantages of Mini Batch gradient descent:
o It is easier to fit in allocated memory.
o It is computationally efficient.
o It produces stable gradient descent convergence.
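All three variants differ only in how many examples feed each update, which a single NumPy sketch can show: batch_size = len(X) gives batch gradient descent, batch_size = 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent. The linear-regression setup, loss, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, alpha=0.2, epochs=500, seed=0):
    """Gradient descent on linear regression with MSE loss.
    batch_size=len(X) -> batch GD; batch_size=1 -> SGD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))              # shuffle each epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # MSE gradient
            w -= alpha * grad                        # one update per batch
    return w

X = np.c_[np.ones(100), np.linspace(0, 1, 100)]      # bias column + feature
y = 4 + 3 * X[:, 1]                                  # true weights: [4, 3]
print(minibatch_gd(X, y, batch_size=16))             # -> approx. [4. 3.]
```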
Challenges with the Gradient Descent
Although gradient descent is one of the most popular methods for optimization problems, it still has some challenges, including the following:
1. Local Minima and Saddle Point:
For convex problems, gradient descent can find the global minimum easily, but for non-convex problems it can be difficult to find the global minimum, where the machine learning model achieves its best results.

Whenever the slope of the cost function is at or near zero, the model stops learning. Apart from the global minimum, this zero slope also appears at saddle points and local minima. A local minimum has a shape similar to the global minimum, with the slope of the cost function increasing on both sides of the current point.
At a saddle point, by contrast, the gradient is zero but the point is not a true minimum: it is a local maximum along one direction and a local minimum along another. The name comes from the shape of a horse's saddle.
A local minimum is named so because the value of the loss function is minimal at that point within a local region. In contrast, the global minimum is named so because the value of the loss function is minimal there globally, across the entire domain of the loss function.
2. Vanishing and Exploding Gradient
In a deep neural network trained with gradient descent and backpropagation, two further issues can occur besides local minima and saddle points.
Vanishing Gradients:
A vanishing gradient occurs when the gradient is smaller than expected. During backpropagation, the gradient becomes progressively smaller, causing the earlier layers of the network to learn more slowly than the later layers. When this happens, the weight updates of the early layers become so small that they are insignificant, and the network may stop learning.
Exploding Gradient:
The exploding gradient is the opposite of the vanishing gradient: it occurs when the gradient is too large, which creates an unstable model. In this scenario the model weights grow too large and may eventually be represented as NaN. This problem can be addressed with techniques such as dimensionality reduction, which helps reduce complexity within the model.
Delta Learning Rule
It was developed by Bernard Widrow and Marcian Hoff. It depends on supervised learning and requires a continuous activation function. It is also known as the least mean square (LMS) method, and it minimizes the error over all the training patterns.
It is based on a gradient descent approach that continues indefinitely. It states that the modification in the weight of a node is proportional to the product of the error and the input, where the error is the difference between the desired and the actual output.
It is computed as follows:
Assume (x1, x2, ..., xn) is the set of input vectors
and (w1, w2, ..., wn) is the set of weights,
y = actual output,
wo = old weight, wnew = new weight, δw = change in weight, α = learning rate.
Error = ti − y
Learning signal (ej) = (ti − y) y′
y = f(net input) = f(Σ wi xi)
δw = α xi ej = α xi (ti − y) y′
wnew = wo + δw
Weights are updated only when there is a difference between the target and the actual output (i.e., an error):
Case I: when t = y, there is no change in weight.
Case II: otherwise, wnew = wo + δw.
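As a rough illustration, the update above can be coded in a few lines of NumPy. The sigmoid activation and the concrete numbers are assumptions made for this sketch; the rule itself only requires a continuous, differentiable f.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def delta_rule_step(w, x, t, alpha=0.5):
    """One delta-rule update: dw = alpha * (t - y) * y' * x,
    with y = f(net) and net = sum_i w_i x_i."""
    y = sigmoid(np.dot(w, x))
    error = t - y                     # difference between target and actual
    if error == 0:                    # case I: t == y, no change in weight
        return w
    y_prime = y * (1.0 - y)           # derivative of the sigmoid
    return w + alpha * error * y_prime * x   # case II: w_new = w_old + dw

w, x = np.array([0.2, -0.1]), np.array([1.0, 0.5])
for _ in range(200):                  # repeat until the error is minimized
    w = delta_rule_step(w, x, t=0.8)
print(sigmoid(np.dot(w, x)))          # approaches the target 0.8
```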
Backpropagation
Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes; hence it is simply referred to as the backward propagation of errors. It is used in many applications of neural networks in data mining, such as character recognition and signature verification.

Neural Network:

Neural networks are an information processing paradigm inspired by the human nervous system. Just as the human nervous system has biological neurons, neural networks have artificial neurons: mathematical functions derived from biological neurons. The human brain is estimated to have about 10 billion neurons, each connected to an average of 10,000 other neurons. Each neuron receives a signal through a synapse, which controls the effect of the signal on the neuron.
Backpropagation is a widely used algorithm for training feedforward neural networks. It computes the gradient of the loss function with respect to the network weights, and it is far more efficient than naively computing the gradient with respect to each weight directly. This efficiency makes it feasible to use gradient methods to train multi-layer networks and update the weights to minimize the loss; variants such as gradient descent or stochastic gradient descent are often used.
The backpropagation algorithm works by computing the gradient of the loss function with respect to each
weight via the chain rule, computing the gradient layer by layer, and iterating backward from the last layer to
avoid redundant computation of intermediate terms in the chain rule.

Features of Backpropagation:

1. It is a gradient descent method, as used in the case of a simple perceptron network with differentiable units.
2. It differs from other networks in the process by which the weights are calculated during the learning period of the network.
3. Training is done in three stages:
 the feed-forward of the input training pattern,
 the calculation and backpropagation of the error,
 the updating of the weights.
Working of Backpropagation:
Neural networks use supervised learning to generate output vectors from the input vectors the network operates on. The network compares the generated output with the desired output and computes an error if the result does not match. It then adjusts the weights according to this error to obtain the desired output.

Backpropagation Algorithm:

Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using true weights W; the weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error in the outputs:
Error = Desired Output − Actual Output
Step 5: From the output layer, go back to the hidden layer and adjust the weights to reduce the error.
Step 6: Repeat the process until the desired output is achieved.

 x = input training vector x = (x1, x2, ..., xn)
 t = target vector t = (t1, t2, ..., tn)
 δk = error at output unit yk
 δj = error at hidden unit zj
 α = learning rate
 v0j = bias of hidden unit j
Training Algorithm:
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 10.
Step 3: For each training pair, do steps 4 to 9 (feed-forward).
Step 4: Each input unit receives the input signal xi and transmits it to all units in the hidden layer.
Step 5: Each hidden unit zj (j = 1 to a) sums its weighted input signals to calculate its net input:
zinj = v0j + Σ xi vij    (i = 1 to n)
applies the activation function zj = f(zinj), and sends this signal to all units in the layer above, i.e., the output units.
Each output unit yk (k = 1 to m) sums its weighted input signals:
yink = w0k + Σ zj wjk    (j = 1 to a)
and applies its activation function to calculate the output signal:
yk = f(yink)
Backpropagation of error:
Step 6: Each output unit yk (k = 1 to m) receives a target pattern corresponding to the input pattern, and its error term is calculated as:
δk = (tk − yk) f′(yink)
Step 7: Each hidden unit zj (j = 1 to a) sums its delta inputs from all units in the layer above:
δinj = Σ δk wjk    (k = 1 to m)
and its error information term is calculated as:
δj = δinj f′(zinj)
Updating of weights and biases:
Step 8: Each output unit yk (k = 1 to m) updates its bias and weights (j = 0 to a). The weight correction term is given by:
Δwjk = α δk zj
and the bias correction term is given by:
Δw0k = α δk
Therefore, wjk(new) = wjk(old) + Δwjk and w0k(new) = w0k(old) + Δw0k.
Each hidden unit zj (j = 1 to a) updates its bias and weights (i = 0 to n). The weight correction term is:
Δvij = α δj xi
and the bias correction term is:
Δv0j = α δj
Therefore, vij(new) = vij(old) + Δvij and v0j(new) = v0j(old) + Δv0j.
Step 9: Test the stopping condition. The stopping condition can be the minimization of the error or a fixed number of epochs. A sketch of this training loop follows.
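Here is a compact NumPy sketch of steps 1 to 9 for a single hidden layer, assuming the sigmoid as f (so f′(net) = f(net)(1 − f(net))) and using one training pair with a fixed epoch count as the stopping condition. The shapes and data are illustrative.

```python
import numpy as np

def f(net):                          # sigmoid activation
    return 1.0 / (1.0 + np.exp(-net))

def train(x, t, V, v0, W, w0, alpha=0.5, epochs=5000):
    for _ in range(epochs):
        # Feed-forward (steps 4-5)
        z = f(v0 + x @ V)            # z_inj = v0j + sum_i xi*vij, zj = f(z_inj)
        y = f(w0 + z @ W)            # y_ink = w0k + sum_j zj*wjk, yk = f(y_ink)
        # Backpropagation of error (steps 6-7)
        dk = (t - y) * y * (1 - y)   # delta_k = (tk - yk) f'(y_ink)
        dj = (dk @ W.T) * z * (1 - z)    # delta_j = (sum_k delta_k wjk) f'(z_inj)
        # Weight and bias updates (step 8)
        W += alpha * np.outer(z, dk); w0 += alpha * dk
        V += alpha * np.outer(x, dj); v0 += alpha * dj
    return V, v0, W, w0

rng = np.random.default_rng(1)
x, t = np.array([0.05, 0.10]), np.array([0.01, 0.99])
V, v0 = rng.normal(size=(2, 2)), rng.normal(size=2)
W, w0 = rng.normal(size=(2, 2)), rng.normal(size=2)
V, v0, W, w0 = train(x, t, V, v0, W, w0)
print(f(w0 + f(v0 + x @ V) @ W))     # moves toward the target [0.01, 0.99]
```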

Need for Backpropagation:

Backpropagation, the "backward propagation of errors", is very useful for training neural networks. It is fast, simple, and easy to implement. Backpropagation does not require any parameters to be set except the number of inputs, and it is a flexible method because no prior knowledge of the network is required.

Types of Backpropagation

There are two types of backpropagation networks.


 Static backpropagation: a network designed to map static inputs to static outputs. Such networks can solve static classification problems such as OCR (optical character recognition).
 Recurrent backpropagation: a network used for fixed-point learning. In recurrent backpropagation, activation is fed forward until it reaches a fixed value. Static backpropagation provides an instant mapping, while recurrent backpropagation does not.

Advantages:

 It is simple, fast, and easy to program.
 Only the number of inputs needs tuning; no other parameters.
 It is flexible and efficient.
 Users do not need to learn any special functions.

Disadvantages:

 It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
 Performance is highly dependent on the input data.
 Training can consume a great deal of time.
 A matrix-based approach is preferred over a mini-batch approach.

Example

Input values

X1=0.05
X2=0.10

Initial weight

W1=0.15 w5=0.40
W2=0.20 w6=0.45
W3=0.25 w7=0.50
W4=0.30 w8=0.55

Bias Values

b1=0.35 b2=0.60

Target Values
T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass

To find the value of H1, we first multiply the input values by the weights:

H1 = x1×w1 + x2×w2 + b1
H1 = 0.05×0.15 + 0.10×0.20 + 0.35
H1 = 0.3775

To calculate the final result of H1, we apply the sigmoid function:
out H1 = 1/(1 + e^(−0.3775)) = 0.593269992

We calculate the value of H2 in the same way as H1:

H2 = x1×w3 + x2×w4 + b1
H2 = 0.05×0.25 + 0.10×0.30 + 0.35
H2 = 0.3925

To calculate the final result of H2, we apply the sigmoid function:
out H2 = 1/(1 + e^(−0.3925)) = 0.596884378

Now we calculate the values of y1 and y2 in the same way as H1 and H2.

To find the value of y1, we multiply the outputs of H1 and H2 by the weights:

y1 = H1×w5 + H2×w6 + b2
y1 = 0.593269992×0.40 + 0.596884378×0.45 + 0.60
y1 = 1.10590597

To calculate the final result of y1, we apply the sigmoid function:
out y1 = 1/(1 + e^(−1.10590597)) = 0.75136507

We calculate the value of y2 in the same way as y1:

y2 = H1×w7 + H2×w8 + b2
y2 = 0.593269992×0.50 + 0.596884378×0.55 + 0.60
y2 = 1.22492140
out y2 = 1/(1 + e^(−1.22492140)) = 0.77292847
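The whole forward pass can be checked with a short Python script that reproduces the numbers above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

H1 = sigmoid(x1 * w1 + x2 * w2 + b1)   # sigmoid(0.3775)     = 0.593269992
H2 = sigmoid(x1 * w3 + x2 * w4 + b1)   # sigmoid(0.3925)     = 0.596884378
y1 = sigmoid(H1 * w5 + H2 * w6 + b2)   # sigmoid(1.10590597) = 0.75136507
y2 = sigmoid(H1 * w7 + H2 * w8 + b2)   # sigmoid(1.22492140) = 0.77292847
print(H1, H2, y1, y2)
```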

SOM Algorithm

The Kohonen Self-Organizing feature map (SOM) is a neural network trained using competitive learning. Basic competitive learning implies that a competition process takes place before each cycle of learning: some criterion selects a winning processing element, and after the winner is selected, its weight vector is adjusted according to the learning law used (Hecht-Nielsen, 1990).

The self-organizing map creates topologically ordered mappings between the input data and the processing elements of the map. Topological ordering means that if two inputs have similar characteristics, the most active processing elements responding to them are located close to each other on the map. The weight vectors of the processing elements are organized in ascending or descending order, i.e., Wi < Wi+1 for all values of i or Wi > Wi+1 for all values of i (this definition is valid for a one-dimensional self-organizing map only).

The self-organizing map is typically represented as a two-dimensional sheet of processing elements. Each processing element has its own weight vector, and learning in the SOM depends on the adaptation of these vectors. The processing elements of the network are made competitive in a self-organizing process, and a specific criterion picks the winning processing element whose weights are updated. Generally, this criterion is the minimum Euclidean distance between the input vector and the weight vector. The SOM differs from basic competitive learning in that, instead of adjusting only the weight vector of the winning processing element, the weight vectors of neighboring processing elements are also adjusted. At first, the neighborhood is large, producing a rough ordering of the SOM; its size diminishes as time goes on. In the end, only the winning processing element is adjusted, making fine-tuning of the SOM possible. The use of a neighborhood makes the topological-ordering procedure possible, and together with competitive learning it makes the process non-linear.

The SOM was developed by the Finnish professor and researcher Dr. Teuvo Kohonen in 1982. The self-organizing map is an unsupervised learning model proposed for applications in which maintaining a topology between the input and output spaces matters. A notable attribute of this algorithm is that input vectors that are close and similar in the high-dimensional space are also mapped to nearby nodes in the 2D space. It is fundamentally a method of dimensionality reduction, as it maps high-dimensional inputs to a low-dimensional discretized representation while preserving the basic structure of the input space.
The entire learning process occurs without supervision because the nodes are self-organizing. SOMs are also known as feature maps, as they essentially retain the features of the input data and group the data according to the similarity between items. This has practical value for visualizing complex or huge quantities of high-dimensional data and showing their relationships in a low (usually two-dimensional) field, to check whether the given unlabeled data has any structure.

A self-organizing map differs from typical artificial neural networks (ANNs) both in its architecture and in its algorithmic properties. Its structure consists of a single-layer linear 2D grid of neurons rather than a series of layers. All the nodes on this grid are connected directly to the input vector, but not to each other, meaning the nodes do not know the values of their neighbors and only update the weights of their own connections as a function of the given input. The grid itself is the map, which organizes itself at each iteration as a function of the input data. After clustering, each node has its own coordinate (i, j), which makes it possible to calculate the Euclidean distance between two nodes by means of the Pythagorean theorem.

A self-organizing map uses competitive learning rather than error-correction learning to modify its weights. Each time an input vector is presented to the network, all nodes compete for the privilege of responding to it, and only one node is activated per cycle.

The selected node, the best matching unit (BMU), is chosen according to the similarity between the current input values and all the nodes in the network: the node whose weight vector has the smallest Euclidean distance to the input vector wins. The BMU and its neighboring nodes within a specific radius then have their weight vectors slightly adjusted toward the input vector. By processing all the nodes on the grid this way, the whole grid eventually matches the input dataset, with similar nodes gathered in one area and dissimilar ones isolated.

Algorithm:

Step 1: Initialize each node's weight w_ij to a random value.
Step 2: Choose a random input vector x_k.
Step 3: Repeat steps 4 and 5 for all nodes on the map.
Step 4: Calculate the Euclidean distance between the weight vector w_ij of the node and the input vector x(t).
Step 5: Track the node that generates the smallest distance.
Step 6: Select the overall best matching unit (BMU), i.e., the node with the smallest distance among all those calculated.
Step 7: Determine the topological neighborhood β_ij(t), with its radius σ(t), of the BMU in the Kohonen map.
Step 8: Repeat for all nodes in the BMU neighborhood: update the weight vector w_ij of each node in the neighborhood of the BMU by adding a fraction of the difference between the input vector x(t) and the neuron's weight w(t).
Step 9: Repeat the complete iteration until the selected iteration limit t = n is reached.

Here, step 1 is the initialization phase, while steps 2 to 9 form the training phase.

Where:

t = current iteration
i = row coordinate of the node grid
j = column coordinate of the node grid
W = weight vector
w_ij = association weight of the node (i, j) in the grid
X = input vector
X(t) = the input-vector instance at iteration t
β_ij = the neighborhood function, decreasing with the distance of node (i, j) from the BMU
σ(t) = the radius of the neighborhood function, which determines how far neighboring nodes in the 2D grid are examined when updating vectors; it gradually decreases over time
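Putting the steps and notation together, here is a minimal NumPy sketch of SOM training. The exponential decay schedules for α and σ and the Gaussian form of the neighborhood function β are common choices assumed for this sketch, not prescribed by the text above.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=100, alpha0=0.5, sigma0=5.0, seed=0):
    """Find the BMU by Euclidean distance, then pull the BMU and its
    grid neighbors toward the input vector (steps 1-9 above)."""
    rng = np.random.default_rng(seed)
    w = rng.random((grid[0], grid[1], data.shape[1]))    # step 1: random w_ij
    # (i, j) coordinates of every node, for neighborhood distances
    coords = np.stack(np.meshgrid(np.arange(grid[0]), np.arange(grid[1]),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        alpha = alpha0 * np.exp(-t / epochs)             # decays over time
        sigma = sigma0 * np.exp(-t / epochs)
        for x in data[rng.permutation(len(data))]:       # step 2: pick inputs
            dist = np.linalg.norm(w - x, axis=-1)        # steps 3-5
            bmu = np.unravel_index(dist.argmin(), dist.shape)   # step 6
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=-1)     # step 7
            h = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian neighborhood
            w += alpha * h[..., None] * (x - w)          # step 8: update
    return w

# Example: organize 200 random RGB colors on a 10 x 10 map.
som = train_som(np.random.default_rng(1).random((200, 3)))
```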

Pros

 Techniques like dimensionality reduction and grid clustering make the data simple to interpret and understand.
 Self-organizing maps can handle a variety of classification problems while simultaneously producing an insightful and practical summary of the data.

Cons

 The model cannot explain how the data is formed, since it does not build a generative model of the data.
 Self-organizing maps perform poorly on categorical data, and even worse on mixed types of data.
 Model preparation is comparatively slow, making it challenging to train against slowly evolving data.

Introduction to Deep Learning


Deep learning is a branch of machine learning based on artificial neural networks. It is capable of learning complex patterns and relationships within data; in deep learning, we do not need to explicitly program everything. It has become increasingly popular in recent years due to advances in processing power and the availability of large datasets. It is based on artificial neural networks (ANNs), also known (in their deep form) as deep neural networks (DNNs). These neural networks are inspired by the structure and function of the human brain's biological neurons, and they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of neural networks to model and
solve complex problems. Neural networks are modeled after the structure and function of the human brain
and consist of layers of interconnected nodes that process and transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks, which have multiple layers of
interconnected nodes. These networks can learn complex representations of data by discovering hierarchical
patterns and features in the data. Deep Learning algorithms can automatically learn and improve from data
without the need for manual feature engineering.
3. Deep Learning has achieved significant success in various fields, including image recognition, natural
language processing, speech recognition, and recommendation systems. Some of the popular Deep
Learning architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and computational resources.
However, the availability of cloud computing and the development of specialized hardware, such as
Graphics Processing Units (GPUs), has made it easier to train deep neural networks.
In summary, Deep Learning is a subfield of Machine Learning that involves the use of deep neural networks to
model and solve complex problems. Deep Learning has achieved significant success in various fields, and its
use is expected to continue to grow as more data becomes available, and more powerful computing resources
become available.
What is Deep Learning?
Deep learning is the branch of machine learning which is based on artificial neural network architecture. An
artificial neural network or ANN uses layers of interconnected nodes called neurons that work together to
process and learn from the input data.
In a fully connected Deep neural network, there is an input layer and one or more hidden layers connected one
after the other. Each neuron receives input from the previous layer neurons or the input layer. The output of
one neuron becomes the input to other neurons in the next layer of the network, and this process continues until
the final layer produces the output of the network. The layers of the neural network transform the input data
through a series of nonlinear transformations, allowing the network to learn complex representations of the
input data.
Applications of Deep Learning :
The main applications of deep learning can be divided into computer vision, natural language processing (NLP),
and reinforcement learning.
Computer vision
In computer vision, Deep learning models can enable machines to identify and understand visual data. Some
of the main applications of deep learning in computer vision include:
 Object detection and recognition: Deep learning models can identify and locate objects within images and videos, making it possible for machines to perform tasks such as self-driving, surveillance, and robotics.
 Image classification: Deep learning models can be used to classify images into categories such as animals,
plants, and buildings. This is used in applications such as medical imaging, quality control, and image
retrieval.
 Image segmentation: Deep learning models can segment images into different regions, making it possible to identify specific features within images.
Natural language processing (NLP):
In NLP, deep learning models enable machines to understand and generate human language. Some of the main applications of deep learning in NLP include:
 Automatic text generation: Deep learning models can learn from a corpus of text, and new text such as summaries or essays can then be generated automatically using these trained models.
 Language translation: Deep learning models can translate text from one language to another, making it
possible to communicate with people from different linguistic backgrounds.
 Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text, making it possible
to determine whether the text is positive, negative, or neutral. This is used in applications such as customer
service, social media monitoring, and political analysis.
 Speech recognition: Deep learning models can recognize and transcribe spoken words, making it possible
to perform tasks such as speech-to-text conversion, voice search, and voice-controlled devices.
Reinforcement learning:
In reinforcement learning, deep learning is used to train agents to take actions in an environment so as to maximize a reward. Some of the main applications of deep learning in reinforcement learning include:
 Game playing: Deep reinforcement learning models have been able to beat human experts at games such as
Go, Chess, and Atari.
 Robotics: Deep reinforcement learning models can be used to train robots to perform complex tasks such
as grasping objects, navigation, and manipulation.
 Control systems: Deep reinforcement learning models can be used to control complex systems such as
power grids, traffic management, and supply chain optimization.
Challenges in Deep Learning
Deep learning has made significant advancements in various fields, but there are still some challenges that need
to be addressed. Here are some of the main challenges in deep learning:
1. Data availability: Deep learning requires large amounts of data to learn from, and gathering that much training data is a major concern.
2. Computational resources: Training deep learning models is computationally expensive and requires specialized hardware such as GPUs and TPUs.
3. Time-consuming: Training on sequential data can take a very long time, even days or months, depending on the computational resources.
4. Interpretability: Deep learning models are complex and work like a black box, so it is very difficult to interpret their results.
5. Overfitting: When the model is trained again and again on the same data, it becomes too specialized for the training data, leading to overfitting and poor performance on new data.

Advantages of Deep Learning:

1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in various tasks, such
as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover and learn relevant
features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can learn from
massive amounts of data.
4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle various types of
data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their performance as more data
becomes available.

Disadvantages of Deep Learning:

1. High computational requirements: Deep Learning models require large amounts of data and computational
resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large amount of labeled data for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult to understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit the training data, resulting in poor performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it difficult to understand how they work and how they arrive at their predictions.
Convolution Neural Network
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly
used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to
understand and interpret the image or visual data.
When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural networks are used on various kinds of data, such as images, audio, and text, and different types of neural networks are used for different purposes: for example, for predicting a sequence of words we use Recurrent Neural Networks (more precisely, an LSTM), and for image classification we use Convolutional Neural Networks. In this blog, we are going to build the basic building blocks of a CNN.
In a regular Neural Network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The number of neurons in this layer is
equal to the total number of features in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many hidden layers, depending on our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the previous layer's output with the learnable weights of that layer, followed by the addition of learnable biases and an activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or
softmax which converts the output of each class into the probability score of each class.
Feeding the data into the model and obtaining the output of each layer as in the steps above is called feedforward. We then calculate the error using an error function; some common error functions are cross-entropy, squared loss, etc. The error function measures how well the network is performing. After that, we backpropagate through the model by calculating the derivatives. This step, called backpropagation, is basically used to minimize the loss.
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN), used predominantly to extract features from grid-like matrix datasets, for example visual datasets like images or videos, where data patterns play an extensive role.

CNN architecture

Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling
layer, and fully connected layers.

The Convolutional layer applies filters to the input image to extract features, the Pooling layer downsamples
the image to reduce computation, and the fully connected layer makes the final prediction. The network learns
the optimal filters through backpropagation and gradient descent.

How Convolutional Layers Work

Convolutional neural networks, or convnets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a height (the channels: images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural network on it, called a filter or kernel, with say K outputs, represented vertically. Now slide that neural network across the whole image; as a result, we get another image, with different width, height, and depth. Instead of just the R, G, and B channels we now have more channels, but smaller width and height. This operation is called convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
 Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the same depth as the input volume (3 if the input layer is an image input).
 For example, to run a convolution on an image with dimensions 34×34×3, the possible filter sizes are a×a×3, where 'a' can be 3, 5, 7, etc., but small compared to the image dimensions.
 During the forward pass, we slide each filter across the whole input volume, step by step, where each step is called a stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute the dot product between the kernel weights and the corresponding patch of the input volume.
 As we slide each filter we get a 2-D output per filter; stacking them together yields an output volume with a depth equal to the number of filters. The network learns all the filters.
Layers used to build ConvNets
A complete Convolutional Neural Network architecture is also known as a convnet. A convnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a convnet on an image of dimensions 32 × 32 × 3.
 Input Layer: It's the layer in which we give input to our model. In a CNN, the input is generally an image or a sequence of images. This layer holds the raw input image, with width 32, height 32, and depth 3.
 Convolutional Layers: This layer extracts features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of shape 2×2, 3×3, or 5×5. Each filter slides over the input image data and computes the dot product between the kernel weights and the corresponding patch of the input image. The output of this layer is referred to as feature maps. If we use a total of 12 filters for this layer, we get an output volume of dimension 32 × 32 × 12.
 Activation Layer: By applying an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. An element-wise activation function is applied to the output of the convolution layer. Some common activation functions are ReLU (max(0, x)), Tanh, Leaky ReLU, etc. The volume remains unchanged, so the output volume has dimensions 32 × 32 × 12.
 Pooling Layer: This layer is periodically inserted into the convnet; its main function is to reduce the size of the volume, which makes the computation faster, reduces memory use, and also prevents overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 × 2 filters and stride 2, the resulting volume has dimension 16 × 16 × 12.
 Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
 Fully Connected Layers: This layer takes the input from the previous layer and computes the final classification or regression output.
 Output Layer: The output of the fully connected layers is fed into a logistic function for classification tasks, such as sigmoid or softmax, which converts the score of each class into its probability. A sketch of this layer stack follows this list.
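Assuming TensorFlow/Keras is available, the layer stack just described might look like the following sketch; the 10-class softmax output and the 64-unit dense layer are illustrative assumptions, not part of the original example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# The 32 x 32 x 3 input and 12 filters match the example above.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),             # input layer
    layers.Conv2D(12, (3, 3), padding="same"),   # convolution -> 32 x 32 x 12
    layers.Activation("relu"),                   # activation layer
    layers.MaxPooling2D((2, 2), strides=2),      # pooling -> 16 x 16 x 12
    layers.Flatten(),                            # flattening
    layers.Dense(64, activation="relu"),         # fully connected layer
    layers.Dense(10, activation="softmax"),      # output probabilities
])
model.summary()
```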
Convolutional Layer
In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural networks typically used to recognize patterns present in images, but also for spatial data analysis, computer vision, natural language processing, signal processing, and various other purposes. The architecture of a convolutional network resembles the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. This specific type of artificial neural network gets its name from one of the most important operations in the network: convolution.

What Is a Convolution?
Convolution is an orderly procedure in which two sources of information are intertwined; it is an operation that changes a function into something else. Convolutions have long been used in image processing, typically to blur and sharpen images, but also to perform other operations (e.g., enhancing edges and embossing). CNNs enforce a local connectivity pattern between neurons of adjacent layers.

CNNs make use of filters (also known as kernels), to detect what features, such as edges, are present throughout
an image. There are four main operations in a CNN:

Convolution
Non Linearity (ReLU)
Pooling or Sub Sampling
Classification (Fully Connected Layer)
The first layer of a Convolutional Neural Network is always a convolutional layer. Convolutional layers apply a convolution operation to the input and pass the result to the next layer. A convolution converts all the pixels in its receptive field into a single value. For example, if you apply a convolution to an image, you decrease the image size while bringing all the information in the receptive field together into a single pixel. The final output of the convolutional layer is a vector. Based on the type of problem we need to solve and the kind of features we want to learn, we can use different kinds of convolutions.

The 2D Convolution Layer


The most common type of convolution is the 2D convolution layer, usually abbreviated as conv2D. A filter or kernel in a conv2D layer "slides" over the 2D input data, performing an elementwise multiplication and summing the results into a single output pixel. The kernel performs the same operation for every location it slides over, transforming a 2D matrix of features into a different 2D matrix of features, as in the sketch below.
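A bare-bones NumPy version of this sliding-window operation (strictly speaking, the cross-correlation that deep learning libraries call "convolution") might look like this; the image and kernel values are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image, multiply
    elementwise, and sum each window into a single output pixel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])    # a simple vertical-edge filter
print(conv2d(image, kernel))          # 3 x 3 feature map
```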
UNIT 5

Reinforcement Learning
Reinforcement learning is an area of Machine Learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that in supervised learning the training data carries the answer key, so the model is trained with the correct answers themselves, whereas in reinforcement learning there is no answer: the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its own experience.
Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an
environment to obtain maximum reward. In RL, the data is accumulated from machine learning systems that
use a trial-and-error method. Data is not part of the input that we would find in supervised or unsupervised
machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next. After
each action, the algorithm receives feedback that helps it determine whether the choice it made was correct,
neutral or incorrect. It is a good technique to use for automated systems that have to make a lot of small
decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error. It
performs actions with the aim of maximizing rewards, or in other words, it is learning by doing in order to
achieve the best outcomes.
Example:
The problem is as follows: we have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward. Consider a robot, a diamond, and fire: the goal of the robot is to get the reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each right step gives the robot a reward, and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, the diamond.
Main points in Reinforcement learning –

 Input: The input should be an initial state from which the model will start.
 Output: There are many possible outputs, as there is a variety of solutions to a particular problem.
 Training: The training is based on the input. The model returns a state, and the user decides whether to reward or punish the model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward.
Difference between Reinforcement learning and Supervised learning:

 Reinforcement learning is all about making decisions sequentially: the output depends on the state of the current input, and the next input depends on the output of the previous input. In supervised learning, the decision is made on the initial input, i.e., the input given at the start.
 In reinforcement learning, decisions are dependent, so we give labels to sequences of dependent decisions. In supervised learning, the decisions are independent of each other, so a label is given to each decision.
 Examples: chess and text summarization (reinforcement learning); object recognition and spam detection (supervised learning).

Types of Reinforcement:
There are two types of Reinforcement:
1. Positive: Positive Reinforcement occurs when an event, arising from a particular behavior, increases the
strength and the frequency of that behavior. In other words, it has a positive effect on behavior.
Advantages:
 Maximizes performance
 Sustains change for a long period of time
A drawback is that too much reinforcement can lead to an overload of states, which can diminish the results.
2. Negative: Negative Reinforcement is defined as the strengthening of a behavior because a negative condition
is stopped or avoided.
Advantages:
 Increases behavior
 Enforces a minimum standard of performance
A drawback is that it only provides enough to meet the minimum behavior.

Elements of Reinforcement Learning

Reinforcement learning elements are as follows:


1. Policy
2. Reward function
3. Value function
4. Model of the environment
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping from perceived states
of the environment to actions to be taken when in those states.
Reward function: The reward function defines the goal in a reinforcement learning problem. It is a function
that provides a numerical score based on the state of the environment.
Value function: Value functions specify what is good in the long run. The value of a state is the total amount
of reward an agent can expect to accumulate over the future, starting from that state.
Model of the environment: Models mimic the behavior of the environment and are used for planning.
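In standard notation (a sketch of the usual textbook definition; the symbols are assumed rather than drawn from this text), the value of a state s under a policy \(\pi\) is the expected discounted sum of future rewards:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0} = s\right]$$

where \(\gamma \in [0, 1]\) is a discount factor that weights rewards received further in the future less heavily.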

Credit assignment problem: Reinforcement learning algorithms learn to generate an internal value for
intermediate states reflecting how good they are at leading to the goal. The learning decision maker is called the
agent. The agent interacts with the environment, which includes everything outside the agent.
The agent has sensors to decide on its state in the environment and takes actions that modify its state.

The reinforcement learning problem is modeled as an agent continuously interacting with an environment. The
agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state
of the environment and a scalar numerical reward for the previous action, and then selects an action.
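A minimal, self-contained sketch of this interaction loop (the corridor environment mirrors the robot-and-diamond example above; all class names, methods, and numbers are invented for illustration, not taken from the text):

import random

class CorridorEnv:
    """Toy environment: move along cells 0..4; the goal cell 4 pays +10."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):                  # action: +1 (right) or -1 (left)
        self.state = min(max(self.state + action, 0), 4)
        done = self.state == 4
        reward = 10 if done else -1          # per-step cost, goal bonus
        return self.state, reward, done

env = CorridorEnv()
state = env.reset()
total = 0
for t in range(20):                          # a sequence of time steps
    action = random.choice([-1, 1])          # placeholder policy: act at random
    state, reward, done = env.step(action)   # environment responds with state + reward
    total += reward
    if done:
        break
print("return:", total)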
Reinforcement learning is a technique for solving Markov decision problems.

Reinforcement learning uses a formal framework defining the interaction between a learning agent and
its environment in terms of states, actions, and rewards. This framework is intended to be a simple way of
representing essential features of the artificial intelligence problem.
Various Practical Applications of Reinforcement Learning –

 RL can be used in robotics for industrial automation.


 RL can be used in machine learning and data processing
 RL can be used to create training systems that provide custom instruction and materials according to the
requirement of students.
Applications of Reinforcement Learning
1. Robotics: Robots with pre-programmed behavior are useful in structured environments, such as the assembly
line of an automobile manufacturing plant, where the task is repetitive in nature.
2. A master chess player makes a move; the choice is informed by planning, anticipating possible replies
and counter-replies.
3. An adaptive controller adjusts parameters of a petroleum refinery's operation in real time.
RL can be used in large environments in the following situations:

1. A model of the environment is known, but an analytic solution is not available;
2. Only a simulation model of the environment is given (the subject of simulation-based optimization);
3. The only way to collect information about the environment is to interact with it.

Advantages of Reinforcement learning
1. Reinforcement learning can be used to solve very complex problems that cannot be solved by
conventional techniques.
2. The model can correct the errors that occurred during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the environment.
4. Reinforcement learning can handle environments that are non-deterministic, meaning that the
outcomes of actions are not always predictable. This is useful in real-world applications where the
environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a wide range of problems, including those that involve
decision making, control, and optimization.
6. Reinforcement learning is a flexible approach that can be combined with other machine learning
techniques, such as deep learning, to improve performance.

Disadvantages of Reinforcement learning
1. Reinforcement learning is not preferable for solving simple problems.
2. Reinforcement learning needs a lot of data and a lot of computation.
3. Reinforcement learning is highly dependent on the quality of the reward function. If the reward
function is poorly designed, the agent may not learn the desired behavior.
4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is
behaving in a certain way, which can make it difficult to diagnose and fix problems.
Types of Reinforcement learning

There are mainly two types of reinforcement learning, which are:

o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement:

Positive reinforcement means adding something to increase the tendency that the expected behavior will occur
again. It impacts the agent's behavior positively and increases the strength of the behavior.

This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead
to an overload of states, which can reduce its effectiveness.

Negative Reinforcement:

Negative reinforcement is the opposite of positive reinforcement: it increases the tendency that a specific
behavior will occur again by avoiding a negative condition.

It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides
reinforcement only to meet the minimum behavior.

How to represent the agent state?

We can represent the agent state using the Markov state, which contains all the required information from the
history. A state St is a Markov state if it satisfies the following condition:

P[St+1 | St] = P[St+1 | S1, ..., St]

The Markov state follows the Markov property, which says that the future is independent of the past and can
be defined using only the present. RL works in fully observable environments, where the agent can observe
the environment and act on the new state. The complete process is known as a Markov Decision Process, which
is explained below:

Markov Decision Process

A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the
environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the
agent constantly interacts with the environment and performs actions; at each action, the environment responds
and generates a new state.

MDPs are used to describe the environment for RL, and almost all RL problems can be formalized using an
MDP.

An MDP is a tuple of four elements (S, A, Pa, Ra):

o A set of finite states S
o A set of finite actions A
o A reward function Ra: the reward received after transitioning from state S to state S' due to action a
o A transition probability Pa: the probability of moving from state S to state S' due to action a
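As an illustrative sketch, the tuple (S, A, Pa, Ra) can be written down directly as plain data (a hypothetical two-state MDP; the states, actions, probabilities, and rewards are invented for demonstration):

# A tiny hand-built MDP: the four elements (S, A, Pa, Ra) as Python data.
S = ["s0", "s1"]                       # finite states
A = ["stay", "go"]                     # finite actions

# P[(s, a)] -> {s': probability}: the transition probabilities Pa
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s1": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 1.0},
}

# R[(s, a, s')] -> reward received after the transition, the Ra element
R = {
    ("s0", "go", "s1"): 5.0,
    ("s1", "go", "s0"): 0.0,
}

reward = R.get(("s0", "go", "s1"), 0.0)   # look up one transition's reward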

MDPs rely on the Markov property, so to better understand MDPs, we first need to understand it.

Markov Property:

It says: "If the agent is present in the current state s1 and performs an action a1 to move to the state s2, then
the state transition from s1 to s2 depends only on the current state; future actions and states do not depend
on past actions, rewards, or states."

In other words, as per the Markov property, the current state transition does not depend on any past action or
state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a chess game, the
players only focus on the current board position and do not need to remember past actions or states.

Finite MDP:

A finite MDP is one in which the sets of states, rewards, and actions are all finite. In RL, we consider only
finite MDPs.

Markov Process:

A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the
Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S
and a transition function P. These two components (S and P) define the dynamics of the system.

Reinforcement Learning Algorithms

Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The most
widely used algorithms are:

o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference learning. Temporal
difference learning methods are ways of comparing temporally successive predictions.
o It learns the value function Q(s, a), which measures how good it is to take action "a" at a particular
state "s".
o State Action Reward State Action (SARSA):
o SARSA stands for State Action Reward State Action, an on-policy temporal difference
learning method. The on-policy control method selects the action for each state while learning,
using a specific policy.
o The goal of SARSA is to calculate Qπ(s, a) for the selected current policy π and all pairs
of (s, a).
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, the
maximum reward for the next state is not required for updating the Q-value in the table.
o In SARSA, the new action and reward are selected using the same policy that determined the
original action.
o SARSA is so named because it uses the quintuple (s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed while following the states
s' and a': new state and action pair
o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For a big state-space environment, defining and updating a Q-table is a challenging and complex
task.
o To solve such an issue, we can use a DQN algorithm, where, instead of defining a Q-table, a neural
network approximates the Q-values for each action and state.
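To make the off-policy/on-policy distinction concrete, here is a sketch of the two update rules side by side (the dict-based tables, function names, and the values of the learning rate alpha and discount factor gamma are assumptions for illustration, not from the text):

# Q is a dict mapping (state, action) -> value; alpha is the learning rate
# and gamma the discount factor (assumed hyperparameters).
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the BEST next action, whatever the policy does
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action the current policy ACTUALLY takes next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0))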

Now, we will look at Q-learning in more detail.

Q-Learning Explanation:

o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that can inform the agent what actions should be
taken to maximize the reward under what circumstances.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q-learning can be derived from the Bellman equation. Consider the Bellman equation for
the value of a state:

V(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V(s') ]

In the equation we have various components: the reward R(s, a), the discount factor (γ), the transition
probability P, and the next states s'. But no Q-value appears yet, so first consider how the agent evaluates its
options.

Suppose an agent has three value options, V(s1), V(s2), and V(s3). Since this is an MDP, the agent only cares
about the current state and the future states. The agent can go in any direction (up, left, or right), so it needs to
decide where to go for the optimal path. The agent moves on a probability basis and changes its state; but if we
want some exact moves, we need to work in terms of the Q-value instead.

Q represents the quality of the actions at each state. So instead of using a value at each state, we use a pair
of state and action, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and
according to the best Q-value, the agent takes its next move. The Bellman equation can be used for deriving the
Q-value.

On performing an action, the agent will get a reward R(s, a), and it will end up in a certain state, so the Q-value
equation will be:

Q(s, a) = R(s, a) + γ Σ_{s'} P(s' | s, a) max_{a'} Q(s', a')

Hence, we can say that V(s) = max_a [Q(s, a)].

The above formula is used to estimate the Q-values in Q-learning.

What is 'Q' in Q-learning?

The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.

Q-table:

A Q-table or matrix is created while performing Q-learning. The table is indexed by state and action pairs, i.e.,
[s, a], with all values initialized to zero. After each action, the table is updated, and the Q-values are stored
within it.

The RL agent uses this Q-table as a reference to select the best action based on the Q-values.
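A minimal sketch of building and consulting such a table (the zero-initialized dict, the action set, and the epsilon-greedy exploration rate are illustrative assumptions):

import random
from collections import defaultdict

Q = defaultdict(float)        # Q-table: (state, action) -> value, zeros by default
ACTIONS = [-1, 1]
EPSILON = 0.1                 # assumed exploration rate

def act(state):
    """Epsilon-greedy: usually take the best action from the Q-table,
    occasionally explore at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])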

Difference between Reinforcement Learning and Supervised Learning

Reinforcement learning and supervised learning are both part of machine learning, but the two types of
learning differ greatly from each other. RL agents interact with the environment, explore it, take actions, and
get rewarded, whereas supervised learning algorithms learn from a labeled dataset and, on the basis of that
training, predict the output.

The difference table between RL and Supervised learning is given below:

Reinforcement Learning: Works by interacting with the environment.
Supervised Learning: Works on an existing dataset.

Reinforcement Learning: The RL algorithm works like the human brain when making decisions.
Supervised Learning: Works as when a human learns things under the supervision of a guide.

Reinforcement Learning: No labeled dataset is present.
Supervised Learning: A labeled dataset is present.

Reinforcement Learning: No previous training is provided to the learning agent.
Supervised Learning: Training is provided to the algorithm so that it can predict the output.

Reinforcement Learning: Helps make decisions sequentially.
Supervised Learning: Decisions are made when input is given.
Reinforcement Learning Applications

1. Robotics: RL is used in robot navigation, Robo-soccer, walking, juggling, etc.
2. Control: RL can be used for adaptive control, such as factory processes and admission control in
telecommunication; a helicopter pilot is an example of reinforcement learning.
3. Game Playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.
4. Chemistry: RL can be used for optimizing chemical reactions.
5. Business: RL is now used for business strategy planning.
6. Manufacturing: In various automobile manufacturing companies, robots use deep reinforcement learning
to pick goods and put them in containers.
7. Finance Sector: RL is currently used in the finance sector for evaluating trading strategies.

Genetic Algorithm
A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's theory of evolution in
Nature." It is used to solve optimization problems in machine learning. It is one of the important algorithms as it
helps solve complex problems that would take a long time to solve.

Genetic Algorithms are being widely used in different real-world applications, for example, Designing electronic
circuits, code-breaking, image processing, and artificial creativity.

In this topic, we will explain Genetic algorithm in detail, including basic terminologies used in Genetic algorithm,
how it works, advantages and limitations of genetic algorithm, etc.
What is a Genetic Algorithm?

Before understanding the Genetic algorithm, let's first understand basic terminologies to better understand this
algorithm:

o Population: The population is the subset of all possible or probable solutions which can solve the given
problem.
o Chromosomes: A chromosome is one of the solutions in the population for the given problem, and a
collection of genes makes up a chromosome.
o Gene: A gene is an element of a chromosome; a chromosome is divided into genes.
o Allele: An allele is the value given to a gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine an individual's fitness level in the population.
It measures the ability of an individual to compete with other individuals. In every iteration, individuals
are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring better than the
parents. Genetic operators change the genetic composition of the next generation.
o Selection: After calculating the fitness of every individual in the population, a selection process is used to
determine which individuals in the population get to reproduce and create the offspring that will form the
next generation.

Types of selection methods available:

o Roulette wheel selection
o Tournament selection
o Rank-based selection

So, now we can define a genetic algorithm as a heuristic search algorithm to solve optimization problems. It is a
subset of evolutionary algorithms, which is used in computing. A genetic algorithm uses genetic and natural
selection concepts to solve optimization problems.

How Does a Genetic Algorithm Work?

The genetic algorithm works on the evolutionary generational cycle to generate high-quality solutions. These
algorithms use different operations that either enhance or replace the population to give an improved fit solution.

It basically involves five phases to solve the complex optimization problems, which are given as below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination

1. Initialization

The process of a genetic algorithm starts by generating a set of individuals, called the population. Here
each individual is a solution for the given problem. An individual is characterized by a set of
parameters called genes. Genes are joined into a string to form a chromosome, which is a solution to
the problem. One of the most popular techniques for initialization is the use of random binary strings.

2. Fitness Assignment

The fitness function is used to determine how fit an individual is, i.e., the ability of an individual to compete
with other individuals. In every iteration, individuals are evaluated based on their fitness function. The fitness
function provides a fitness score to each individual, and this score determines the probability of being selected
for reproduction: the higher the fitness score, the greater the chances of getting selected.

3. Selection

The selection phase involves the selection of individuals for the reproduction of offspring. All the selected
individuals are then arranged in pairs of two, and these individuals transfer their genes
to the next generation.

There are three types of selection methods available (a sketch of roulette wheel selection follows this list):

o Roulette wheel selection
o Tournament selection
o Rank-based selection
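As a minimal sketch of roulette wheel selection (the function name and the assumption of non-negative fitness scores are illustrative choices, not from the text):

import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:                           # degenerate case: choose uniformly
        return random.choice(population)
    spin = random.uniform(0, total)          # where the "wheel" stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return population[-1]                    # guard against float rounding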

4. Reproduction

After the selection process, the creation of children occurs in the reproduction step. In this step, the genetic
algorithm uses two variation operators that are applied to the parent population (a sketch of both operators
follows this list):

o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm.
In this process, a crossover point is selected at random within the genes. Then the crossover operator
swaps the genetic information of two parents from the current generation to produce a new individual
representing the offspring.
The genes of the parents are exchanged among themselves until the crossover point is reached. These
newly generated offspring are added to the population. This process is also called recombination. Types of
crossover methods available:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Mutation: The mutation operator inserts random genes into the offspring (new child) to maintain
diversity in the population. It can be done by flipping some bits in the chromosome.
Mutation helps in solving the issue of premature convergence and enhances diversification. Types of
mutation methods available:
o Flip-bit mutation
o Gaussian mutation
o Exchange/swap mutation
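A minimal sketch of one-point crossover and flip-bit mutation on bit-string chromosomes (the list-of-bits encoding and the mutation rate are assumptions for illustration):

import random

def one_point_crossover(parent_a, parent_b):
    """Swap the tails of two bit-string parents at a random crossover point."""
    point = random.randrange(1, len(parent_a))   # random crossover point
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def flip_bit_mutation(chromosome, rate=0.01):
    """Flip each bit with a small probability to maintain diversity."""
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]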

5. Termination

After the reproduction phase, a stopping criterion is applied as a base for termination. The algorithm terminates
after the threshold fitness solution is reached. It will identify the final solution as the best solution in the
population.
General Workflow of a Simple Genetic Algorithm
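Putting the five phases together, here is a minimal end-to-end sketch (it maximizes the number of 1-bits in a binary string, a standard toy problem; all sizes and rates are assumed, and it reuses the roulette_wheel_select, one_point_crossover, and flip_bit_mutation sketches above):

import random

GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 100

def fitness(chrom):                      # fitness assignment: count of 1-bits
    return sum(chrom)

# 1. Initialization: random binary strings
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):           # the evolutionary generational cycle
    scores = [fitness(c) for c in population]
    if max(scores) == GENOME_LEN:        # 5. Termination: threshold fitness reached
        break
    next_pop = []
    while len(next_pop) < POP_SIZE:
        # 2-3. Fitness assignment + selection (roulette wheel, defined earlier)
        pa = roulette_wheel_select(population, scores)
        pb = roulette_wheel_select(population, scores)
        # 4. Reproduction: crossover then mutation (operators defined earlier)
        child_a, child_b = one_point_crossover(pa, pb)
        next_pop += [flip_bit_mutation(child_a), flip_bit_mutation(child_b)]
    population = next_pop[:POP_SIZE]

best = max(population, key=fitness)      # best solution in the final population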

Advantages of Genetic Algorithm


o Genetic algorithms have strong parallel capabilities.
o It helps in optimizing various problems such as discrete functions, multi-objective problems, and
continuous functions.
o It provides a solution for a problem that improves over time.
o A genetic algorithm does not need derivative information.

Limitations of Genetic Algorithms


o Genetic algorithms are not efficient algorithms for solving simple problems.
o It does not guarantee the quality of the final solution to a problem.
o Repetitive calculation of fitness values may generate some computational challenges.

Difference between Genetic Algorithms and Traditional Algorithms


o A search space is the set of all possible solutions to the problem. A traditional algorithm maintains only
one set of solutions, whereas a genetic algorithm can use several sets of solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas genetic algorithms
need only one objective function to calculate the fitness of an individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can (calculating the fitness
of the individuals is independent).
o One big difference is that rather than operating directly on candidate solutions, genetic algorithms
operate on their representations (or encodings), frequently referred to as chromosomes.
o Traditional algorithms can only generate one result in the end, whereas genetic algorithms can generate
multiple optimal results from different generations.
o A traditional algorithm is not likely to generate optimal results for such problems, whereas genetic
algorithms, while not guaranteed to find the global optimum, have a great possibility of producing an
optimal result for a problem, as they use genetic operators such as crossover and mutation.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are probabilistic and
stochastic in nature.

Genetic Programming

Genetic programming is a form of artificial intelligence that mimics natural selection in order to find an optimal
result. Genetic programming is iterative: at each new stage of the algorithm, it uses a measure of quality,
referred to as a fitness function, to choose only the fittest of the “offspring” to cross and reproduce in the next
generation. Just as in biological evolution, evolutionary algorithms can sometimes have randomly mutating
offspring, but since only the offspring with the highest fitness measure are reproduced, fitness will almost
always improve over generations. Genetic programming will generally terminate once it reaches a predefined
fitness measure. Additionally, architecture-altering operations can be introduced to an already running program
in order to allow new sources of information to be analyzed for a given fitness function.

Although originally proposed in 1950 by Alan Turing, it wasn’t until the 1980s that successful genetic
algorithms were first implemented. The first patented algorithm for genetic operations was in 1988 by John
Koza, who remains a leader in the field. As the study of genetic operations continued to evolve, so has the
literature around it, such as with annual conferences like Genetic Algorithms and dedicated journals
like Genetic Programming and Evolvable Machines. There has also been a series of 19 books published by
MIT Press called Genetic and Evolutionary Computation. As knowledge around evolutionary programs has
expanded, so has the population of computer programs that can run them.

Genetic programming systems utilize a type of machine learning technique that can include automatic
programming without the need for manual interaction. This means that genetic algorithms can utilize automatic
program inductions to run as new information is ingested, so that the programs can be optimized automatically.
Genetic or evolutionary algorithms have a variety of uses, particularly around domains where an exact solution
is not known in advance, or when finding an approximate solution is deemed appropriate. Genetic programming
is often used in conjunction with other forms of machine learning, as it is useful for performing symbolic
regressions and feature classifications.

Genetic programming can help organizations and businesses by:

 Saving time: Genetic algorithms are able to process large amounts of data much more quickly than
humans can. Additionally, these algorithms run free of human biases, and are thereby able to come up
with ideas that might otherwise not have been considered.
 Data and text classification: Genetic programming can quickly identify and classify various forms of
data without the need for human oversight. Genetic programming can use data tree construction in order
to optimize these classifications, especially when dealing with big data.

 Ensuring network security: Rule evolution approaches have been successfully applied to identify new
attacks on networks. By quickly identifying intrusions, businesses and organizations can ensure that
they can respond to such attacks before they are able to access confidential information.

 Supporting other machine learning methods: Genetic programming can be included in larger systems
of machine learning, such as with neural networks. By having genetic programming focus on only
specific subsets of data, organizations can ensure that this data is quickly processed for ingestion into
larger or different learning methods. This allows organizations to gain as much useful and actionable
information as possible.

Genetic Programming (GP) is a type of Evolutionary Algorithm (EA), a subset of machine learning. EAs are used
to discover solutions to problems humans do not know how to solve directly. Free of human preconceptions or
biases, the adaptive nature of EAs can generate solutions that are comparable to, and often better than, the best
human efforts.

Inspired by biological evolution and its fundamental mechanisms, GP software systems implement an algorithm
that uses random mutation, crossover, a fitness function, and multiple generations of evolution to resolve a user-
defined task. GP can be used to discover a functional relationship between features in data (symbolic regression),
to group data into categories (classification), and to assist in the design of electrical circuits, antennae, and
quantum algorithms. GP is applied to software engineering through code synthesis, genetic improvement,
automatic bug-fixing, the development of game-playing strategies, and more.
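As a toy illustration of these mechanisms (a hypothetical tree-based symbolic-regression setup; the operator set, target function, and all rates are invented for demonstration):

import random

OPS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}

def random_tree(depth=2):
    """Grow a random expression tree over the variable 'x' and small constants."""
    if depth <= 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.randint(0, 3)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate an expression tree at a value of x."""
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Lower is better: squared error against the assumed target x*x + 1
    return sum((evaluate(tree, x) - (x * x + 1)) ** 2 for x in range(-5, 6))

def mutate(tree):
    # Random mutation: occasionally replace a subtree with a fresh random one
    if not isinstance(tree, tuple) or random.random() < 0.2:
        return random_tree(2)
    op, left, right = tree
    return (op, mutate(left), mutate(right))

# One selection step: keep the fitter of a parent and its mutated offspring
parent = random_tree()
survivor = min([parent, mutate(parent)], key=fitness)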

Types of GP include:

 Tree-based Genetic Programming


 Stack-based Genetic Programming
 Linear Genetic Programming (LGP)
 Grammatical Evolution
 Extended Compact Genetic Programming (ECGP)
 Cartesian Genetic Programming (CGP)
 Probabilistic Incremental Program Evolution (PIPE)
 Strongly Typed Genetic Programming (STGP)
 Genetic Improvement of Software for Multiple Objectives (GISMO)
Applications of genetic algorithm in machine learning

Genetic algorithms find use in various real-world applications. In this segment, we elaborate on some areas
that utilize genetic algorithms in machine learning.

1. Neural networks

Genetic algorithms find great application in neural networks. We use them for genetic optimization of neural
networks, in use cases such as inheriting the qualities of neurons, neural network pipeline optimization, finding
the best-fit set of parameters for a given neural network, and others.

2. Data mining and clustering

Data mining and clustering use genetic algorithms to find the centre points of clusters with an optimal error
rate, owing to their great capability for searching out an optimal value. Clustering is an unsupervised learning
process in machine learning, in which we categorize data based on the characteristics of the data points.

3. Image processing

Image processing tasks, such as image segmentation, are among the major use cases of genetic optimization.
However, genetic algorithms can also be used in other areas of image analysis to resolve complex optimization
problems.

4. Wireless sensor network

A wireless sensor network (WSN) is one that includes dedicated, spatially dispersed sensors that maintain a
record of the physical conditions of an environment and pass that record to a central storage
system.

WSNs utilize genetic machine learning to simulate the sensors. We can optimize and even customize all the
operational stages with the help of the fitness function from genetic algorithms in wireless sensor networks.

5. Traveling salesman problem (TSP)

TSP, or the traveling salesman problem, is one of the real-life combinatorial optimization problems that can be
solved using genetic optimization. It helps in finding an optimal route on a given map, given the distances
between points and the stops to be covered by the salesman.
Several iterations take place, each generating offspring solutions that inherit the qualities of the parent
solutions. In this way we do not get just a single solution, which offers the opportunity to choose the best
route structure (a route-encoding sketch is given below). TSP finds application in real-time processes like
planning, manufacturing, and logistics.
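As a small sketch of how a TSP route can be encoded and varied in a genetic algorithm (the city coordinates are invented, and swap mutation is one standard permutation-safe operator):

import math, random

CITIES = [(0, 0), (3, 4), (6, 0), (3, -4)]    # hypothetical city coordinates

def route_length(route):
    """Total length of a closed tour visiting every city once."""
    return sum(math.dist(CITIES[route[i]], CITIES[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def swap_mutation(route):
    """Exchange two stops: a permutation-safe mutation for TSP chromosomes."""
    i, j = random.sample(range(len(route)), 2)
    route = list(route)
    route[i], route[j] = route[j], route[i]
    return route

tour = random.sample(range(len(CITIES)), len(CITIES))   # random initial chromosome
print(route_length(tour), route_length(swap_mutation(tour)))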

6. Vehicle routing problems

One generalisation of the travelling salesman problem discussed above is the vehicle routing problem.
Genetic algorithms help in finding the optimal weight of goods to be delivered through an optimal set of delivery
routes. Factors such as depot points, waiting times, and distances are taken into consideration when such
problems carry restrictions. Moreover, the genetic algorithm approach is competitive with simulated annealing
and tabu search in terms of solution quality and time.

7. Mechanical engineering design

Genetic algorithms find application in many design procedures for mechanical components. For instance,
consider aircraft wing design, a design problem that takes multiple disciplines into consideration: it requires
improving the lift-to-drag ratio of a complex wing.

The fitness function in genetic optimization is flexible enough to accommodate whatever considerations a
particular design demands.

8. Manufacturing system

The manufacturing arena includes many examples of cost functions, and finding an optimal set of parameters
for such functions poses a problem.

Genetic optimization performs this task, using the optimized set of parameters to minimize the cost function. It
also finds application in product manufacturing to achieve optimum production plans by considering dynamic
conditions like capacity, inventories, or material quality.

9. Financial markets

A variety of issues can be solved using genetic optimization in the financial market. It helps in finding an optimal
combination of parameters that can affect the trades or market rules. You can also find out the near-optimal value
for the optimal set of parameters.

10. Medical science

Medical science has an array of use cases for genetic optimization. The areas of predictive analysis include
protein prediction, RNA structure prediction, operon prediction, and others.

Other processes, such as protein folding, gene expression profiling analysis, and multiple sequence alignment
in bioinformatics, also use genetic optimization.

11. Task scheduling

Genetic machine learning algorithms are used to derive optimal schedules that satisfy certain constraints related
to a problem.
For instance, assume that you have to prepare the schedule of semester exams for a university. The genetic
algorithm will find the best optimal schedule for the university, considering all the constraints like the number
of classrooms, the number of students, the total number of courses and subjects, and others.

12. Economics

Economics is the study of resource utilization in the production, distribution, and complete consumption of goods
and services over a time period. The genetic algorithm creates models of demand and supply that derive asset
pricing, game theory, and others.

13. Robotics

Robotics comprises the construction, design, and working of the autonomous robot. Genetic algorithms contribute
to the robotics field by providing the necessary insight into the decisions made by the robot. It generates optimal
routes for the robot so that it can use the least amount of resources to get to the desired position.
