
INDEX

S. No. Name of Experiment Date Sign Remark

1. Explain the machine learning concept by taking an example. Describe the perspective and issues in machine learning.
2. Discuss linear regression with an example.
3. Explain the role of the hypothesis function in machine learning.
4. What are the different types of neural network? Explain the convolutional neural network model in detail.
5. Explain the concept of reinforcement learning and its framework in detail.
6. What are the basic design issues and approaches to machine learning?
7. Define statistical theory and how it is performed in machine learning.
8. What is a support vector machine? Discuss in detail.
9. What is reinforcement learning? Explain its detailed concepts.
10. Describe the concept of MDP.

EXPERIMENT-1

OBJECTIVE:
Explain the machine learning concept by taking an example. Describe the perspective and
issues in machine learning.

THEORY:
Machine learning is a machine's ability to learn from data. It has been around for decades, but it is now being applied in nearly every industry and job function. This section gives a detailed introduction to what machine learning (ML) is, including different definitions, the different types of machine learning tasks and algorithms, and real-world examples.
What is machine learning & how does it work?
Simply speaking, machine learning can be used to model our beliefs about real-world events. For example, suppose a person comes to a doctor with a certain blood report. The doctor, using a belief system built from experience and knowledge, predicts (essentially decides) whether the person is suffering from a disease or not. When the human belief system is not good enough, for instance because a large amount of diverse data must be evaluated to arrive at a decision, we can replace the "human belief system" with an AI / machine learning system (one or more models) and "experience and knowledge" with the data fed into this AI / ML system. Doctors can also use ML models trained on past data together with their own experience and intelligence to predict whether the person suffers from a disease.

When human and machine learning intelligence are used in combination, the result is termed an augmented system. When the system can rely solely on AI-based decisions, it is called an autonomous system. How well these beliefs correspond with reality is what the doctor learns over a period of time. In the machine learning world, we have a "cost function" or "loss function" that is minimized during learning to ensure that the predictions stay close to reality.
Technically speaking, machine learning is a technology in which a machine learns to perform a prediction/estimation task based on past experience represented by a historical data set. There are three key aspects of machine learning:

o Task: The task can be a prediction problem such as classification, regression, clustering, etc.
o Experience: Experience represents the historical dataset.
o Performance: The goal is to perform better on the prediction task based on the past datasets. Different performance measures exist for different kinds of machine learning problems.
Mathematically speaking, building machine learning models is about approximating mathematical functions (equations) that represent real-world scenarios. These mathematical functions are also referred to as "mathematical models" or just models. Machine learning models are mathematical equations/functions that represent or model real-world problems/scenarios. They are called function approximations because it is extremely difficult to find exact functions that represent the real world and predict or estimate real-world scenarios exactly. A simple mathematical function of this kind can be learned from data.
The function could, for example, map an image to the image content or label. Here is the illustration:

imageContent = f(image)
The picture below represents how the above function could be used to map the cat in a picture to the label "cat" and the dog in a picture to the label "dog".

The diagram below represents two different kinds of mathematical functions: one representing a line (left) that divides the data points, and the other representing a line (right – regression) that can be used to predict the data points. The left line can be called a classification function or model, learned from the given data points. The right line (regression – the best-fit line) can be called a regression function or model, learned from the given data points. A small code sketch of both kinds of models follows.
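To make this concrete, here is a minimal sketch in Python (assuming scikit-learn is available; the data points are toy values) that learns both a classification function and a regression function:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: one feature per point.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])               # class of each point
targets = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.9])  # continuous value of each point

clf = LogisticRegression().fit(X, labels)  # learns the dividing line (classification)
reg = LinearRegression().fit(X, targets)   # learns the best-fit line (regression)

print(clf.predict([[2.5]]))  # predicted class for a new point
print(reg.predict([[2.5]]))  # predicted value for a new point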
Here is a philosophical way to look at machine learning.
Machine learning can be defined as a technology that can be used to create a "mathematical form" or "mathematical being" or a "machine" that is capable of performing certain pre-defined tasks (estimate or predict) given a set of inputs. The "mathematical form" or the "machine" is composed of a set of levers that work together to create a pre-defined desired output. The "levers" can be thought of as "features". How the levers of the "mathematical form" or "machine" work together depends upon different algorithms, called machine learning algorithms. Different "machines" or models get created based on the historical dataset and the algorithm used. These "mathematical forms" are created by what we call "data scientists" or machine learning engineers, and there are well-defined processes and tools for creating these "machines" (the training/testing process). The most fundamental aspect of building machine learning models is computation on the given data, which is used to learn a model representing real-world problems/scenarios.

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts
the output.

The system creates a model from the labeled data to understand the dataset and learn about each data point. Once training and processing are done, we test the model with sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories
of algorithms:

o Clustering
o Association

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to get the most reward points, and hence it improves its performance.

A robotic dog that automatically learns the movement of its arms is an example of reinforcement learning.

Issues in Machine Learning


"Machine Learning" is one of the most popular technology among all data scientists and
machine learning enthusiasts. It is the most effective Artificial Intelligence technology that
helps create automated learning systems to take future decisions without being constantly
programmed. It can be considered an algorithm that automatically constructs various
computer software using past experience and training data. It can be seen in every industry,
such as healthcare, education, finance, automobile, marketing, shipping, infrastructure,
automation, etc. Almost all big companies like Amazon, Facebook, Google, Adobe, etc., are
using various machine learning techniques to grow their businesses. But everything in this
world has bright as well as dark sides. Similarly, Machine Learning offers great opportunities,
but some issues need to be solved.

Machine Learning is the study of learning algorithms using past experience and making
future decisions. Although, Machine Learning has a variety of models, here is a list of the
most commonly used machine learning algorithms by all data scientists and professionals
in today's world.

o Linear Regression
o Logistic Regression
o Decision Tree
o Bayes Theorem and Naïve Bayes Classification
o Support Vector Machine (SVM) Algorithm
o K-Nearest Neighbor (KNN) Algorithm
o K-Means
o Gradient Boosting algorithms
o Dimensionality Reduction Algorithms
o Random Forest

Common issues in Machine Learning


Although machine learning is used in every industry and helps organizations make more informed and data-driven choices that are more effective than classical methodologies, it still has problems that cannot be ignored. Common issues that professionals face when building ML skills and creating applications from scratch include:

o Inadequate or poor-quality training data
o Non-representative training data
o Overfitting and underfitting of models
o Slow implementation and results that take time to improve
o Monitoring and maintenance of deployed models
EXPERIMENT-2

OBJECTIVE:
Discuss linear regression with an example.

THEORY:

Linear Regression in Machine Learning


Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

where y is the dependent variable, x is the independent variable, a0 is the intercept, a1 is the slope coefficient, and ε is the random error.


Examples of Linear Regression
The weight of a person is linearly related to their height, so there is a linear relationship between a person's height and weight: as the height increases, the weight tends to increase as well. It is not necessary that one variable is dependent on the other, or that one causes the other, but there is some critical relationship between the two variables. In such cases, we use a scatter plot to visualize the strength of the relationship between the variables. If there is no relation or link between the variables, the scatter plot does not indicate any increasing or decreasing pattern, and linear regression is not suitable for the given data.

Linear Regression Equation


The measure of the relationship between two variables is given by the correlation coefficient. The coefficient ranges between -1 and +1, and it shows the strength of the association between the two variables in the observed data.

Linear Regression Equation is given below:

Y=a+bX

where X is the independent variable and it is plotted along the x-axis

Y is the dependent variable and it is plotted along the y-axis

Here, the slope of the line is b, and a is the intercept (the value of Y when X = 0).

Properties of Linear Regression

For the regression line with regression parameters b0 and b1, the following properties hold:
o The regression line minimizes the sum of squared differences between observed values and predicted values.
o The regression line passes through the mean of the X and Y variable values.
o The regression constant b0 is equal to the y-intercept of the linear regression.
o The regression coefficient b1 is the slope of the regression line. Its value is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X).

Linear regression is a statistical approach for modelling the relationship between a dependent variable and a given set of independent variables.
Here, for simplicity, we refer to the dependent variable as the response and to the independent variables as features.
To provide a basic understanding of linear regression, we start with its most basic version, simple linear regression, sketched below.

o Simple Linear Regression
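As an illustrative sketch (toy numbers, assuming NumPy is available), simple linear regression can be fitted with ordinary least squares to recover the intercept a and slope b of Y = a + bX:

import numpy as np

X = np.array([150.0, 155.0, 160.0, 165.0, 170.0, 175.0])  # e.g. height in cm
Y = np.array([50.0, 53.0, 55.0, 60.0, 62.0, 66.0])        # e.g. weight in kg

b, a = np.polyfit(X, Y, deg=1)  # least-squares slope b and intercept a
print(f"Y = {a:.2f} + {b:.2f}X")
print(a + b * 168)              # predicted weight for a height of 168 cm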


EXPERIMENT-3

OBJECTIVE:
Explain the role of hypothesis function in machine learning.

THEORY:
A hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is just a guess based on some known facts but has not yet been proven. A good hypothesis is testable, resulting in either true or false.
The hypothesis is one of the commonly used concepts of statistics in machine learning. It is specifically used in supervised machine learning, where an ML model learns a function that best maps inputs to corresponding outputs with the help of an available dataset.
The hypothesis function, also known as a model, is a crucial element in the process of
building a machine learning system. It is responsible for mapping inputs to outputs based
on the training data provided.
The hypothesis function can take various forms, such as linear regression, logistic
regression, decision trees, neural networks, and more. The choice of the hypothesis function
depends on the nature of the problem and the type of data being used.
Once a hypothesis function is selected, it is trained using the available training data. The
training process involves adjusting the parameters of the model to minimize the error
between the predicted output and the actual output.
Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised machine learning algorithms. It is primarily based on the data as well as the bias and restrictions applied to the data.
Hence a hypothesis (h) is a single mapping from inputs to proper outputs that can be evaluated and used to make predictions.
The hypothesis (h) can be formulated in machine learning as follows:

y = mx + b

where,
y: range (output)
m: slope of the line that divides the test data, i.e. the change in y divided by the change in x
x: domain (input)
b: intercept (constant)
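A minimal sketch of such a hypothesis in Python (the parameter values here are hypothetical, standing in for values produced by training):

def hypothesis(x, m, b):
    """Return the prediction y = m*x + b for the input x."""
    return m * x + b

m, b = 2.0, 0.5  # hypothetical learned slope and intercept
for x in [1.0, 2.0, 3.0]:
    print(x, "->", hypothesis(x, m, b))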

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-
dimensional coordinate plane showing the distribution of data as follows:

Now, assume we have some test data by which ML algorithms predict the outputs for input
as follows:

If we divide this coordinate plane in such a way that it can help us predict the output or result as follows:
Based on the given test data, the output result will be as follows:
However, based on the data, algorithm, and constraints, this coordinate plane can also be divided in the following ways:

With the above example, we can conclude that:

Hypothesis space (H) is the composition of all legal best possible ways to divide the
coordinate plane so that it best maps input to proper output.

Further, each individual best possible way is called a hypothesis (h). Hence, the hypothesis
and hypothesis space would be like this:
Conclusion
In the series of mapping instances of inputs to outputs in supervised machine learning, the hypothesis is a very useful concept that helps to approximate a target function. It is present in all analytics domains and is also considered one of the important factors for checking whether a change should be introduced or not. It covers the entire training data set to check the efficiency as well as the performance of the model.
EXPERIMENT-4

OBJECTIVE:

What are the different types of Neural network? Explain convolution neural network model
in detail.

Theory: A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like matrix datasets, for example visual datasets such as images or videos, where spatial data patterns play an extensive role.

There are several types of neural networks, each designed to solve specific problems. Some
of the most commonly used neural networks are:

Feedforward neural networks: These are the simplest type of neural network, where the input data is passed through several layers of neurons to produce an output. They are used for tasks such as classification and regression.

Recurrent neural networks: These networks are designed to handle sequential data, where the output at each time step depends on the previous output. They are commonly used for tasks such as speech recognition and language translation.

Convolutional neural networks: These networks are designed to process data with a grid-like topology, such as images. They use convolutional layers to extract features from the input data, followed by pooling layers to reduce the dimensionality of the features.

Generative adversarial networks: These networks consist of a generator and a discriminator, both trained together to produce realistic output data. They are commonly used for tasks such as image and video synthesis.

A typical CNN consists of three main types of layers: convolutional layers, pooling
layers, and fully connected layers.

Convolutional Layers: These layers are the core of the CNN and contain a set of
learnable filters or kernels that are used to extract features from the input image. Each filter
applies a convolution operation to a small region of the image, and the resulting output is
passed to the next layer.

Pooling Layers: These layers are used to reduce the dimensionality of the features
obtained from the convolutional layers. The most common type of pooling is max pooling,
where the maximum value in each region of the feature map is selected and passed to the
next layer.

Fully Connected Layers: These layers are used to classify the features obtained from
the convolutional and pooling layers. They are similar to the layers in a feedforward neural
network and are used to make the final predictions.

During training, the CNN learns the filters in the convolutional layers that are most effective
at recognizing the features in the input image. The goal is to minimize the error between the
predicted output and the actual output. Once the CNN is trained, it can be used to classify
new images by passing them through the network and obtaining the final predictions from
the fully connected layers.
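A minimal sketch of such a CNN, written with the Keras API (this assumes TensorFlow is installed; the input shape and class count are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional layer: 32 learnable 3x3 filters extract features.
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    # Pooling layer: max pooling reduces the feature-map dimensionality.
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    # Fully connected layers: classify the extracted features.
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()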

Conclusion :- Convolutional neural networks are a powerful type of neural network that
can be used for image recognition and classification tasks. They use convolutional layers to
extract features from the input image, pooling layers to reduce the dimensionality of the
features, and fully connected layers to make the final predictions.
EXPERIMENT-5

OBJECTIVE:
Explain the concept of reinforcement learning and its framework in detail.

THEORY: Reinforcement learning is a type of machine learning where an agent learns to behave in an environment by performing actions and receiving rewards or punishments based on its actions. It is inspired by how humans learn from trial and error, and its applications include gaming, robotics, and autonomous vehicles.
Framework:
The framework of reinforcement learning can be described as follows:
1. Agent: The agent is the entity that interacts with the environment and learns from
it. The agent takes actions based on the state of the environment and receives
rewards or punishments based on those actions.
2. Environment: The environment is the external world in which the agent operates.
It is the source of information and feedback that the agent receives. The
environment can be deterministic or stochastic.
3. State: The state is the current condition of the environment that the agent
observes. The state can be represented by a set of features or variables that
describe the environment's characteristics.
4. Action: The action is the decision made by the agent based on the state of the
environment. The action can be discrete or continuous.
5. Reward: The reward is the feedback provided by the environment to the agent
based on its actions. The reward can be positive or negative, and it reflects the
agent's performance.
6. Policy: The policy is the strategy used by the agent to decide its actions based on
the state of the environment. The policy can be deterministic or stochastic.
7. Value Function: The value function is a function that estimates the long-term
value of being in a certain state and taking a certain action. The value function is
used to evaluate the quality of the agent's decisions.
8. Q-Function: The Q-function is a function that estimates the long-term value of
taking a certain action in a certain state. The Q-function is used to determine the
best action to take in a given state.
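The interaction of these elements can be sketched as a simple loop in Python (the environment, policy, and reward below are toy stand-ins, not part of any real framework):

import random

def environment_step(state, action):
    """Toy environment: return (next_state, reward) for the agent's action."""
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 5 else 0.0  # state 5 is the goal
    return next_state, reward

def policy(state):
    """Toy stochastic policy: pick an action given the current state."""
    return random.choice(["left", "right"])

state, total_reward = 0, 0.0
for step in range(20):
    action = policy(state)                           # agent chooses an action
    state, reward = environment_step(state, action)  # environment responds
    total_reward += reward                           # reward feedback
print("total reward:", total_reward)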

Reinforcement Learning Algorithms:


There are several reinforcement learning algorithms, including:

1. Q-Learning: Q-learning is a model-free algorithm that uses the Q-function to determine the optimal action to take in a given state. It uses a table to store the Q-values for each state-action pair.
2. SARSA: SARSA is a model-free algorithm that uses the Q-function to determine the optimal action to take in a given state. It updates the Q-values based on the action actually taken and the reward received.
3. Deep Q-Networks (DQN): DQN is a model-free algorithm that uses neural networks to estimate the Q-function. It has been successful in playing Atari games.

Conclusion:
Reinforcement Learning is a powerful paradigm for learning from experience. Its
framework provides a structure for understanding how agents interact with their
environments and how they learn to make decisions based on rewards and
punishments. Understanding the different reinforcement learning algorithms and their
applications can enable us to develop intelligent systems that can learn from experience
and improve over time.
EXPERIMENT-6

OBJECTIVE :- What are the basic design issues and approaches to machine
learning?

THEORY:- Machine learning is a field of computer science that focuses on developing algorithms and models that can learn patterns from data without being explicitly programmed.
There are different approaches to machine learning, and different design issues need to be considered when developing a machine learning system.

Basic Design Issues:


1. Data Collection: Data collection is the process of gathering data to train the
machine learning model. The quality and quantity of the data used to train the
model are critical factors in determining the model's performance. The data must
be diverse, representative, and relevant to the problem being solved.

2. Data Preprocessing: Data preprocessing is the process of cleaning, transforming, and reducing the data to make it suitable for analysis. This includes removing missing values, outliers, and irrelevant features.

3. Feature Selection: Feature selection is the process of selecting the most relevant
features to be used in the machine learning model. This is important to reduce
the dimensionality of the data and to avoid overfitting.

4. Model Selection: Model selection is the process of choosing the most appropriate
algorithm or model to solve the problem. This depends on the type of problem,
the size of the data, and the available resources.

5. Training the Model: Training the model involves using the selected algorithm or
model to learn patterns from the data. This involves optimizing the model's
parameters to minimize the error or loss function.

6. Model Evaluation: Model evaluation is the process of assessing the performance of the trained model on unseen data, using performance metrics such as accuracy, precision, recall, and F1 score. A minimal sketch of these steps follows.
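These steps can be sketched end to end in Python (assuming scikit-learn; feature selection is omitted for brevity):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                        # 1. data collection
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)                   # 2. data preprocessing
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
model = LogisticRegression(max_iter=200)                 # 4. model selection
model.fit(X_train, y_train)                              # 5. training the model
y_pred = model.predict(X_test)                           # 6. model evaluation
print("accuracy:", accuracy_score(y_test, y_pred))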

Approaches to Machine Learning:


1. Supervised Learning: Supervised learning is a type of machine learning where
the algorithm learns from labeled data. The goal is to learn a mapping function
that can predict the output for new inputs. Examples include classification and
regression problems.

2. Unsupervised Learning: Unsupervised learning is a type of machine learning where the algorithm learns from unlabeled data. The goal is to discover hidden patterns and structures in the data. Examples include clustering and dimensionality reduction.

3. Semi-Supervised Learning: Semi-supervised learning is a type of machine learning where the algorithm learns from a combination of labeled and unlabeled data. This approach is useful when labeled data is scarce or expensive to obtain.

4. Reinforcement Learning: Reinforcement learning is a type of machine learning where the algorithm learns from trial and error. The goal is to learn a policy that maximizes the cumulative reward over time. Examples include game playing and robotics.

Conclusion:
Machine learning is a complex field that requires careful consideration of different
design issues and approaches. Data collection, preprocessing, feature selection, model
selection, training the model, and model evaluation are all important factors to consider
when developing a machine learning system. The choice of approach depends on the
type of problem and the available data. Understanding the different approaches to
machine learning can enable us to develop intelligent systems that can learn from data
and improve over time.
EXPERIMENT-7

OBJECTIVE: Define statistical theory and how it is performed in machine learning.

THEORY: Statistical theory is a branch of mathematics that deals with the study of
probability and statistical inference. Machine learning algorithms are heavily influenced by
statistical theory, as they rely on probability theory to make predictions and inferences
from data.
Statistical Theory in Machine Learning:
1. Probability Theory: Probability theory is the foundation of statistical theory and
machine learning. It provides the mathematical framework for modeling uncertain
events and making predictions based on observed data. In machine learning,
probability theory is used to model the distribution of the data and to estimate the
parameters of the model.
2. Statistical Inference: Statistical inference is the process of drawing conclusions
from data using statistical methods. It involves making inferences about the
population based on a sample of the data. In machine learning, statistical
inference is used to estimate the model parameters and to make predictions
about unseen data.
3. Hypothesis Testing: Hypothesis testing is a statistical technique used to test the
validity of a hypothesis using data. In machine learning, hypothesis testing can
be used to test the significance of the model parameters and to compare the
performance of different models.
4. Bayesian Inference: Bayesian inference is a statistical technique that involves
updating prior beliefs based on observed data. In machine learning, Bayesian
inference is used to estimate the posterior distribution of the model parameters
and to make predictions about unseen data.
5. Maximum Likelihood Estimation: Maximum likelihood estimation is a statistical
technique used to estimate the parameters of a model by maximizing the
likelihood function. In machine learning, maximum likelihood estimation is used to
estimate the parameters of the model and to make predictions about unseen
data.
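As a small illustration of maximum likelihood estimation (a sketch with simulated data): for normally distributed observations, the MLE of the mean is the sample mean and the MLE of the variance is the (biased) sample variance:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # simulated observations

mu_mle = data.mean()                     # maximizes the Gaussian likelihood
var_mle = ((data - mu_mle) ** 2).mean()  # MLE of the variance (biased estimator)
print(f"estimated mean = {mu_mle:.2f}, estimated variance = {var_mle:.2f}")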
Application of Statistical Theory in Machine Learning:
1. Regression Analysis: Regression analysis is a statistical technique used to model
the relationship between a dependent variable and one or more independent
variables. In machine learning, regression analysis is used to predict a
continuous output variable based on one or more input variables.

2. Classification Analysis: Classification analysis is a statistical technique used to classify data into different categories based on a set of predefined criteria. In machine learning, classification analysis is used to predict a categorical output variable based on one or more input variables.

3. Clustering Analysis: Clustering analysis is a statistical technique used to group similar data points into clusters based on their similarity. In machine learning, clustering analysis is used to discover hidden patterns in the data and to segment the data into meaningful groups.

4. Dimensionality Reduction: Dimensionality reduction is a statistical technique used to reduce the number of features or variables in the data while preserving the most important information. In machine learning, dimensionality reduction is used to improve the performance of the model and to reduce the computational cost.

Conclusion:

Statistical theory provides a rigorous mathematical framework for modeling uncertainty and making predictions based on observed data. Machine learning algorithms rely heavily on statistical theory to model the distribution of the data, estimate the parameters of the model, and make predictions about unseen data. Understanding statistical theory is essential for developing effective machine learning algorithms and interpreting their results. The application of statistical theory in machine learning has enabled us to develop intelligent systems that can learn from data and improve over time.
EXPERIMENT-8

OBJECTIVE:-
What is a support vector machine? Explain in detail.

THEORY:-

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which
is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example:-

SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors), so it will see the extreme cases of cat and dog. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:

The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: A linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: A non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

How Does SVM Work:-

Linear SVM:

o The working of the SVM algorithm can be understood using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.

o Since this is a 2-D space, we can easily separate these two classes using a straight line. But there can be multiple lines that separate these classes. Consider the below image:

o Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM:

o If data is linearly arranged, then we can separate it using a straight line, but for non-linear data, we cannot draw a single straight line. Consider the below image:

o So to separate these data points, we need to add one more dimension. For linear data, we have used the two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as:

z = x² + y²

o By adding the third dimension, the sample space becomes as in the below image:

o Now SVM will divide the datasets into classes in the following way. Consider the below image:

o Since we are in 3-D space, the boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes:

o Hence we get a circle of radius 1 for the non-linear data. A minimal code sketch of both SVM types follows.
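Both types can be sketched in Python (assuming scikit-learn; the datasets are synthetic). The RBF kernel plays the role of the added dimension z = x² + y² for the circular data:

from sklearn.svm import SVC
from sklearn.datasets import make_blobs, make_circles

# Linearly separable data -> Linear SVM.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)

# Non-linearly separable data (concentric circles) -> kernel SVM.
X_nl, y_nl = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
nonlinear_svm = SVC(kernel="rbf").fit(X_nl, y_nl)

print(linear_svm.predict(X_lin[:3]))     # predicted classes for a few points
print(nonlinear_svm.predict(X_nl[:3]))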

Conclusion:-
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
EXPERIMENT-9

OBJECTIVE:-
What is reinforcement learning? Explain its detailed concepts.
THEORY:-
Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones. In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions, and learn through trial and error.

How does reinforcement learning work?

Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.

In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.

Since there is no labeled data, the agent is bound to learn from its experience alone.

RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.

The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive rewards.

The agent learns through a process of trial and error, and based on that experience, it learns to perform the task in a better way. Hence, we can say that "reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
It is a core part of artificial intelligence, and all AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.

Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.

The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.

The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.


Terms used in Reinforcement Learning:-

o Agent: An entity that can perceive/explore the environment and act upon it.
o Environment: The situation in which an agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action: Actions are the moves taken by an agent within the environment.
o State: A state is the situation returned by the environment after each action taken by the agent.
o Reward: Feedback returned to the agent from the environment to evaluate the agent's action.
o Policy: A policy is the strategy applied by the agent to choose the next action based on the current state.
o Value: The expected long-term return with the discount factor, as opposed to the short-term reward.
o Q-value: Mostly similar to the value, but it takes one additional parameter, the current action (a).

To understand the working process of RL, we need to consider two main things:
o Environment: It can be anything such as a room, a maze, a football ground, etc.
o Agent: An intelligent agent such as an AI robot.

Let's take an example of a maze environment that the agent needs to explore. Consider the below
image:

In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a wall, an S8 block, which is a fire pit, and an S4 block, which is a diamond block.

The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward point. It can take four actions: move up, move down, move left, and move right.

The agent can take any path to reach the final point, but it needs to do so in the fewest possible steps. Suppose the agent follows the path S9-S5-S1-S2-S3; then it will get the +1 reward point.

The agent will try to remember the preceding steps it has taken to reach the final step. To memorize the steps, it assigns a value of 1 to each previous step. Consider the below step:
Now, the agent has successfully stored the previous steps by assigning the value 1 to each previous block. But what will the agent do if it starts moving from a block that has blocks of value 1 on both sides? Consider the below diagram:

It will be a difficult situation for the agent to decide whether it should go up or down, as each block has the same value. So the above approach is not suitable for the agent to reach the destination. Hence, to solve the problem, we will use the Bellman equation, which is the main concept behind reinforcement learning.

The Bellman Equation:-

The Bellman equation was introduced by the mathematician Richard Ernest Bellman in the year 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of previous states.

It is a way of calculating the value functions in dynamic programming, and it leads to modern reinforcement learning.

The key-elements used in Bellman equations are:

o The action performed by the agent is referred to as "a".
o The state occupied by the agent is "s".
o The reward/feedback obtained for each good and bad action is "R".
o The discount factor is gamma "γ".

The Bellman equation can be written as:

V(s) = max[R(s, a) + γV(s')]

where,

V(s) = the value calculated at a particular point.

R(s, a) = the reward obtained at state s by performing action a.

γ = the discount factor.

V(s') = the value of the next state.

In the above equation, we take the maximum over all values because the agent always tries to find the optimal solution.

So now, using the Bellman equation, we will find the value at each state of the given environment. We will start from the block that is next to the target block.

For the 1st block:

V(s3) = max[R(s, a) + γV(s')]; here V(s') = 0 because there is no further state to move to.

V(s3) = max[R(s, a)] => V(s3) = max[1] => V(s3) = 1.

For the 2nd block:

V(s2) = max[R(s, a) + γV(s')]; here γ = 0.9 (say), V(s') = 1, and R(s, a) = 0, because there is no reward at this state.

V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9

For the 3rd block:

V(s1) = max[R(s, a) + γV(s')]; here γ = 0.9, V(s') = 0.9, and R(s, a) = 0, because there is no reward at this state either.

V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81

For the 4th block:

V(s5) = max[R(s, a) + γV(s')]; here γ = 0.9, V(s') = 0.81, and R(s, a) = 0, because there is no reward at this state either.

V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73

For the 5th block:

V(s9) = max[R(s, a) + γV(s')]; here γ = 0.9, V(s') = 0.73, and R(s, a) = 0, because there is no reward at this state either.

V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66

Consider the below image:

Now we move to the 6th block, and here the agent may change its route because it always tries to find the optimal path. Let us consider the block next to the fire pit.

The agent has three options to move: if it moves to the blue box, it will feel a bump; if it moves to the fire pit, it will get the -1 reward. But here we consider only positive rewards, so it will move upwards only. The complete block values are calculated using this formula; a small code sketch of the calculation follows. Consider the below image:
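The backward calculation above can be reproduced with a few lines of Python (a sketch for this maze only, with γ = 0.9 as assumed above):

gamma = 0.9
path = ["s3", "s2", "s1", "s5", "s9"]  # blocks ordered from the target outwards

V = {}
next_value = 0.0  # no further state beyond the target
for i, s in enumerate(path):
    reward = 1.0 if i == 0 else 0.0      # only the move into the diamond block is rewarded
    V[s] = reward + gamma * next_value   # V(s) = max[R(s,a) + gamma * V(s')]
    next_value = V[s]

print(V)  # {'s3': 1.0, 's2': 0.9, 's1': 0.81, 's5': 0.729, 's9': 0.6561}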
Types of Reinforcement learning:-
There are mainly two types of reinforcement learning, which are:

o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement:
Positive reinforcement learning means adding something to increase the tendency that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of the behavior.

This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.

Negative Reinforcement:

Negative reinforcement learning is the opposite of positive reinforcement: it increases the tendency that a specific behavior will occur again by avoiding a negative condition.

It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides reinforcement only up to the minimum required behavior.

How to represent the agent state?

We can represent the agent state using a Markov state that contains all the required information from the history. A state St is a Markov state if it satisfies the following condition:

P[St+1 | St] = P[St+1 | S1, ..., St]

The Markov state follows the Markov property, which says that the future is independent of the past and can be defined using only the present. RL works in fully observable environments, where the agent can observe the environment and act in the new state. The complete process is known as the Markov decision process, explained below:
Reinforcement Learning Algorithms
Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The main algorithms used are:

o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference learning. Temporal difference learning methods are ways of comparing temporally successive predictions.
o It learns the value function Q(s, a), which tells how good it is to take action "a" at a particular state "s".
o The below flowchart explains the working of Q-learning:

o State Action Reward State Action (SARSA):

o SARSA stands for State Action Reward State Action; it is an on-policy temporal difference learning method. The on-policy control method selects the action for each state while learning, using a specific policy.
o The goal of SARSA is to calculate Qπ(s, a) for the selected current policy π and all pairs of (s, a).
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not require the maximum reward for the next state to update the Q-value in the table.
o In SARSA, the new action and reward are selected using the same policy that determined the original action.
o SARSA is named after the quintuple Q(s, a, r, s', a') that it uses, where s is the original state, a the original action, r the reward observed while following the states, and s' and a' the new state-action pair.

o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For a big state-space environment, defining and updating a Q-table is a challenging and complex task.
o To solve this issue, we can use the DQN algorithm, in which, instead of defining a Q-table, a neural network approximates the Q-values for each action and state.

Now, we will expand on Q-learning.

Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that can inform the agent what actions should be taken, under what circumstances, to maximize the reward.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q-learning can be derived from the Bellman equation. Consider the Bellman equation given below:

In the equation, we have various components, including the reward, the discount factor (γ), the probability, and the end state s'. But no Q-value is given yet, so first consider the below image:
In the above image, we can see an agent who has three value options: V(s1), V(s2), and V(s3). As this is an MDP, the agent only cares about the current state and the future state. The agent can go in any direction (up, left, or right), so it needs to decide where to go for the optimal path. Here the agent makes a move on a probability basis and changes the state. But if we want exact moves, we need to reformulate things in terms of Q-values. Consider the below image:

Q represents the quality of the actions at each state. So instead of using a value at each state, we use a pair of state and action, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and according to the best Q-value, the agent takes its next move. The Bellman equation can be used to derive the Q-value.

To perform any action, the agent gets a reward R(s, a), and it ends up in a certain state, so the Q-value equation becomes:

Q(s, a) = R(s, a) + γ max[Q(s', a')]

Hence, we can say that V(s) = max[Q(s, a)].

The above formula is used to estimate the Q-values in Q-learning.

What is 'Q' in Q-learning?

The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the
agent.
Q-table:
A Q-table or matrix is created while performing Q-learning. The table follows the state and action pair, i.e., [s, a], and initializes the values to zero. After each action, the table is updated, and the Q-values are stored within the table.

The RL agent uses this Q-table as a reference table to select the best action based on the Q-values; a minimal sketch of the update follows.
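Here is a small sketch of the standard tabular Q-learning update Q(s,a) ← Q(s,a) + α[R + γ·max Q(s',a') − Q(s,a)], with a learning rate α, a detail the text above omits; the states and actions are illustrative:

from collections import defaultdict

alpha, gamma = 0.1, 0.9
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)  # Q-table: every (state, action) pair starts at zero

def q_update(s, a, reward, s_next):
    """Apply one Q-learning update for the pair (s, a)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max over next actions
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# Example: the agent in state "s5" moves up to "s1" and receives no reward.
q_update("s5", "up", 0.0, "s1")
print(Q[("s5", "up")])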

Conclusion:-
Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
EXPERIMENT-10

OBJECTIVE:-
Describe concept of MDP?

THEORY:-

Markov Decision Process:-


A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state.

MDPs are used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.

An MDP contains a tuple of four elements (S, A, Pa, Ra):

o A set of finite states S
o A set of finite actions A
o A reward Ra, received after transitioning from state S to state S' due to action a
o A state transition probability Pa

An MDP uses the Markov property, and to better understand the MDP, we need to learn about it.
Markov Property:
It says: "If the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state and action; future states do not depend on past actions, rewards, or states."

In other words, as per the Markov property, the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a chess game, the players only focus on the current state and do not need to remember past actions or states.

Finite MDP:
A finite MDP is one with finite states, finite rewards, and finite actions. In RL, we consider only finite MDPs.

Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) can define the dynamics of the system; a small code sketch of a finite MDP built on these ideas follows.
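Here is a small sketch of a finite MDP as the tuple (S, A, Pa, Ra), with a few Bellman backups run over it (all names and numbers are toy values for illustration):

S = ["s1", "s2"]    # finite set of states
A = ["stay", "go"]  # finite set of actions
P = {("s1", "stay"): "s1", ("s1", "go"): "s2",  # deterministic transitions for simplicity
     ("s2", "stay"): "s2", ("s2", "go"): "s1"}
R = {("s1", "stay"): 0.0, ("s1", "go"): 1.0,    # reward for action a in state s
     ("s2", "stay"): 0.5, ("s2", "go"): 0.0}

gamma = 0.9
V = {s: 0.0 for s in S}
for _ in range(50):  # repeated Bellman backups converge to the optimal values
    V = {s: max(R[(s, a)] + gamma * V[P[(s, a)]] for a in A) for s in S}
print(V)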

Conclusion:-
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state.

NAME OF FACULTY: VANDANA MA'AM


SIGNATURE:
DATE:
