
UNIT-5 LEARNING

Dr. Vikas Khare


B.E, M.TECH, MBA, PhD (NIT Bhopal)
Associate Professor & Placement Coordinator
STME, NMIMS, INDORE
Certified Energy Manager, Bureau of Energy Efficiency, India
Supervised Machine Learning
Supervised learning is a type of machine learning in which
machines are trained on well-"labelled" training data and, on the
basis of that data, predict the output. Labelled
data means that each input is already tagged with the correct
output.
In supervised learning, the training data acts as the
supervisor that teaches the machine to
predict the output correctly, much as a
student learns under the supervision of a teacher.
Supervised learning is the process of providing
both input data and the correct output data to the
machine learning model. The aim of a
supervised learning algorithm is to find a
mapping function that maps the input
variable (x) to the output variable (y).
Steps Involved in Supervised Learning:
•Determine the type of training dataset.
•Collect the labelled training data.
•Split the dataset into a training set, a validation set, and a test
set.
•Determine the input features of the training dataset, which should carry enough
information for the model to predict the output accurately.
•Choose a suitable algorithm for the model, such as a support vector
machine or a decision tree.
•Run the algorithm on the training set. A validation set, held out from the
training data, is sometimes needed to tune control parameters.
•Evaluate the accuracy of the model on the test set. If the model
predicts the correct outputs for the test set, the model is accurate.
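The split and evaluation steps above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 60/20/20 fractions, the toy data, and the helper names `split_dataset` and `accuracy` are assumptions made for the example.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

def accuracy(predict, labelled_rows):
    """Fraction of (x, y) pairs for which predict(x) equals the label y."""
    correct = sum(1 for x, y in labelled_rows if predict(x) == y)
    return correct / len(labelled_rows)

# Toy labelled data (assumed): the label says whether x is at least 5.
data = [(x, x >= 5) for x in range(10)]
train, val, test = split_dataset(data)
test_accuracy = accuracy(lambda x: x >= 5, test)
```

The test set is touched only once, at the end, so that the reported accuracy reflects data the model has not seen during training or tuning.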
Regression
Regression algorithms are used when there is a relationship between the input variable
and the output variable and the output is a continuous value, as in
weather forecasting or market trend prediction. Below are some popular regression
algorithms that come under supervised learning:

•Linear Regression
•Regression Trees
•Non-Linear Regression
•Bayesian Linear Regression
•Polynomial Regression
Classification
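Classification algorithms are used when the output variable is
categorical, which means it takes one of a fixed set of classes, for
example two classes such as Yes-No, Male-Female, or True-False.
A typical application is spam filtering. Popular classification
algorithms that come under supervised learning include: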
•Random Forest
•Decision Trees
•Logistic Regression
•Support vector Machines
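As a rough sketch of one algorithm from the list, logistic regression with a single input can be trained by plain gradient descent. The toy data, learning rate, epoch count, and function names are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors (probability minus label) for the current w, b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def classify(x, w, b):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Assumed toy data: small inputs belong to class 0, large inputs to class 1.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
predictions = [classify(x, w, b) for x in xs]
```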
Advantages of Supervised learning:
•With the help of supervised learning, the model can
predict the output on the basis of prior experience.
•In supervised learning, we can have an exact idea
about the classes of objects.
•Supervised learning models help us to solve various
real-world problems such as fraud detection and spam
filtering.
Disadvantages of supervised learning:
•Supervised learning models are not well suited to
very complex tasks.
•Supervised learning may not predict the correct output
if the test data differs substantially from the training data.
•Training requires a lot of computation time.
•In supervised learning, we need enough knowledge
about the classes of objects.
Decision Tree Pruning

Pruning is a data compression technique in machine learning and search
algorithms that reduces the size of decision trees by removing sections of the
tree that are non-critical or redundant for classifying instances. Pruning reduces
the complexity of the final classifier and hence improves predictive
accuracy by reducing overfitting.
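To make the idea of removing redundant sections concrete, here is a minimal sketch that collapses any split whose two branches end in the same class. The nested-dictionary tree layout, the feature names, and the function names are assumptions. Practical pruning methods (for example reduced-error or cost-complexity pruning) also remove splits that do not help accuracy on held-out data, which this sketch does not attempt.

```python
def is_leaf(node):
    return "label" in node

def prune_redundant(node):
    """Return a copy of the tree with splits removed when both branches agree."""
    if is_leaf(node):
        return node
    left = prune_redundant(node["left"])
    right = prune_redundant(node["right"])
    if is_leaf(left) and is_leaf(right) and left["label"] == right["label"]:
        return {"label": left["label"]}  # the split changes nothing, so drop it
    return {"feature": node["feature"], "threshold": node["threshold"],
            "left": left, "right": right}

def count_nodes(node):
    if is_leaf(node):
        return 1
    return 1 + count_nodes(node["left"]) + count_nodes(node["right"])

def predict(node, sample):
    """Follow the splits until a leaf is reached and return its label."""
    while not is_leaf(node):
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

# Assumed toy tree: the split on "products" is redundant (both leaves say "low").
tree = {
    "feature": "sales", "threshold": 5,
    "left": {"feature": "products", "threshold": 10,
             "left": {"label": "low"}, "right": {"label": "low"}},
    "right": {"label": "high"},
}
pruned = prune_redundant(tree)
```

The pruned tree is smaller but classifies every sample exactly as the original tree does.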
CLUSTER ANALYSIS
Cluster analysis or clustering is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some sense) to
each other than to those in other groups (clusters). It is a main task of
exploratory data mining, and a common technique for statistical data analysis, used
in many fields, including machine learning, pattern recognition, image
analysis, information retrieval, bio-informatics, data compression, and computer
graphics. The following are the important points about clusters:
• A cluster of data objects can be treated as one group.
• While doing cluster analysis, we first partition the set of data into groups based
on data similarity and then assign labels to the groups.
The main advantage of clustering over classification is that it is adaptable to
changes and helps single out useful features that distinguish different groups.
Table: Data of Organization

Organization   Number of Products   Sales (in thousands)
A              4                    8
B              16                   4
C              18                   6
D              2                    10
E              17                   2
Month     Overall Performance Index-1   Overall Performance Index-1
Jan.      1.4                           6
Feb.      4.5                           7.4
March     5                             7.8
April     1.3                           6.1
May       5.5                           7.5
June      1.2                           6.3
July      6                             7.6
August    2.2                           8.1
Sep.      4                             7.3
Oct.      2.3                           8.3
Nov.      2.4                           8.2
Dec.      2.5                           8.4
K-MEANS:
K-Means clustering partitions n objects into k clusters in which each object belongs to the cluster with
the nearest mean. The method produces exactly k clusters that are as distinct from each other as possible. The
number of clusters k that gives the greatest separation (distance) is not known a priori and must be determined
from the data. The objective of K-Means clustering is to minimize the total intra-cluster variance, that is, the squared
error function.
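The squared error function mentioned above is commonly written as follows. The symbols are assumptions introduced here: C_j is the set of objects in cluster j, x_i is an object, and mu_j is the mean (centroid) of cluster j.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A smaller J means that objects lie closer to the mean of their own cluster.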
Algorithm:
1. Fix the number of groups k into which the data will be clustered.
2. Select k points at random as the initial cluster centers.
3. Assign objects to their closest cluster center, according to the Euclidean distance function.
4. Recalculate the centroid (mean) of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
K-Means Clustering-
•K-Means clustering is an unsupervised iterative clustering
technique.
• It partitions the given data set into k predefined distinct
clusters.
• A cluster is defined as a collection of data points exhibiting
certain similarities.
It partitions the data set such that-
• Each data point belongs to a cluster with the nearest mean.
• Data points belonging to one cluster have a high degree of similarity.
• Data points belonging to different clusters have a high degree of dissimilarity.
K-Means Clustering Algorithm-
K-Means Clustering Algorithm involves the following steps-
Step-01:
• Choose the number of clusters K.
Step-02:
• Randomly select any K data points as the initial cluster centers.
• Where possible, choose the cluster centers so that they are as far as possible from each other.
Step-03:
• Calculate the distance between each data point and each cluster center.
• The distance may be calculated either with a given distance function or with the
Euclidean distance formula.
Step-04:
• Assign each data point to some cluster.
• A data point is assigned to that cluster whose center is nearest to that data point.

Step-05:
• Re-compute the center of newly formed clusters.
• The center of a cluster is computed by taking mean of all the data points contained in that cluster.

Step-06:
Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping criteria is met-
• The centers of the newly formed clusters do not change
• Data points remain in the same cluster
• The maximum number of iterations is reached
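The steps above can be sketched in a few lines and applied to the "Data of Organization" table (number of products, sales in thousands). This is an illustrative sketch: for repeatability it takes the first K points as initial centers instead of selecting them at random, and the function name `kmeans` and the choice K = 2 are assumptions.

```python
import math

def kmeans(points, k, max_iters=100):
    """Plain K-Means with Euclidean distance; returns (centers, assignments)."""
    # Assumption: first k points as initial centers (not random), for repeatability.
    centers = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iters):              # stop: maximum number of iterations
        # Steps 03/04: assign each point to the nearest center.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:  # stop: points stay in same cluster
            break
        assignments = new_assignments
        # Step 05: recompute each center as the mean of its cluster members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignments

# Rows of the organization table: (number of products, sales in thousands).
organizations = {"A": (4, 8), "B": (16, 4), "C": (18, 6), "D": (2, 10), "E": (17, 2)}
points = list(organizations.values())
centers, labels = kmeans(points, k=2)
clusters = dict(zip(organizations, labels))
```

With these starting centers, organizations A and D (few products, higher sales) end up in one cluster with center (3, 9), and B, C, and E in the other with center (17, 4).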
REINFORCEMENT LEARNING
Reinforcement Learning is a feedback-based
machine learning technique in which an agent
learns to behave in an environment by
performing actions and observing their
results. For each good action, the agent gets
positive feedback, and for each bad action, the
agent gets negative feedback or a penalty.
The agent learns by trial and error and,
based on that experience, learns to perform the task
better. Hence, we can say
that "Reinforcement learning is a type of machine
learning method where an intelligent agent
(computer program) interacts with the
environment and learns to act within that." How a
robotic dog learns to move its limbs is an
example of reinforcement learning.
Example: We have an agent and a reward, with many hurdles in between. The agent
is supposed to find the best possible path to reach the reward. Consider a robot, a diamond,
and fire as a concrete case.

The goal of the robot is to get the reward, which
is the diamond, and to avoid the hurdles, which are
fires. The robot learns by trying all the
possible paths and then choosing the path
that reaches the reward with the fewest
hurdles. Each right step gives the robot a
reward and each wrong step subtracts from the
robot's reward. The total reward is
calculated when the robot reaches the final reward,
that is, the diamond.
Terms used in Reinforcement Learning
•Agent: An entity that can perceive/explore the environment and act upon it.
•Environment: The situation in which an agent is present or by which it is surrounded. In RL,
we assume a stochastic environment, which means it is random in nature.
•Action: Actions are the moves taken by an agent within the environment.
•State: The situation returned by the environment after each action taken by
the agent.
•Reward: Feedback returned to the agent from the environment to evaluate the
action of the agent.
•Policy: The strategy applied by the agent to choose the next action based on the
current state.
•Value: The expected long-term return with the discount factor, as opposed to
the short-term reward.
•Q-value: Mostly similar to the value, but it takes one additional parameter,
the current action.
Key Features of Reinforcement Learning
•In RL, the agent is not instructed about the environment or
about which actions need to be taken.
•It is based on a trial-and-error process.
•The agent takes the next action and changes state according
to the feedback from the previous action.
•The agent may get a delayed reward.
•The environment is stochastic, and the agent needs to
explore it to obtain the maximum positive reward.
Approaches to implement Reinforcement Learning
Value-based: The value-based approach aims to find the
optimal value function, which is the maximum value achievable at a state
under any policy. The agent therefore expects the long-term
return at any state s under policy π.
Policy-based: The policy-based approach aims to find the optimal
policy for the maximum future reward without using the
value function. In this approach, the agent tries to apply
a policy such that the action performed at each step helps to
maximize the future reward.
Model-based: In the model-based approach, a virtual
model of the environment is created, and the agent
explores that model to learn the environment. There is no
single solution or algorithm for this approach
because the model representation is different for each
environment.
The Bellman Equation
The Bellman equation was introduced by the mathematician Richard
Ernest Bellman in the year 1953 and is named after him.
It is associated with dynamic programming and is used to
calculate the value of a decision problem at a certain point in
terms of the values of the states that follow.
The key elements used in the Bellman equation are:
•The action performed by the agent is referred to as "a".
•The state in which the action is performed is "s", and the state reached by performing it is "s'".
•The reward/feedback obtained for each good and bad action is "R".
•The discount factor is gamma, "γ".
The Bellman equation can be written as:
V(s) = max_a [R(s,a) + γV(s')]
Where,
V(s) = value of the current state s.
R(s,a) = reward received in state s for performing action a.
γ = discount factor.
V(s') = value of the next state s', reached from s by performing action a.
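As a worked illustration of the equation, the sketch below applies it repeatedly to a tiny corridor of four states in which the rightmost state holds the reward. The corridor, the reward of 1 for stepping into the goal, γ = 0.9, and all names are assumptions made for this example.

```python
GAMMA = 0.9      # discount factor (assumed value)
N_STATES = 4     # states 0, 1, 2, 3 in a row
GOAL = 3         # terminal state holding the reward

def step(state, action):
    """Deterministic move: action is -1 (left) or +1 (right), clipped at the walls."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def value_iteration(sweeps=50):
    """Repeatedly apply V(s) = max_a [R(s,a) + gamma * V(s')] to every state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: no further reward, value stays 0
            values[s] = max(
                reward + GAMMA * values[next_state]
                for next_state, reward in (step(s, a) for a in (-1, +1))
            )
    return values

values = value_iteration()
```

The state next to the goal gets value 1, and each step further away multiplies the value by γ, so the values fall off as 1, 0.9, 0.81 when moving away from the goal.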
Passive Learning
In passive learning, the policy π is fixed and the goal
of the agent is to evaluate how good that
policy is. To do so, the agent
needs to learn the expected
utility Uπ(s) for each state s. This can be
done in three ways.
Direct Utility Estimation:
In this method, the agent executes a sequence of trials or
runs (sequences of state-action transitions that continue until
the agent reaches the terminal state). Each trial gives a sample
value for every state visited, and the agent estimates the utility
of a state from these sample values, for example as a running
average. The main drawback is that this method wrongly
assumes that state utilities are independent, while in reality
they are Markovian: the utility of a state depends on the utilities
of its successor states. It is also slow to converge.
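A minimal sketch of this running-average estimate follows. Each trial is assumed to be a list of (state, reward) pairs ending in the terminal state, and the sample value for a state is taken to be the discounted sum of rewards from that state to the end of the trial. The trials, state names, and function name are invented for illustration.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Estimate each state's utility as the running average of its sample returns."""
    utility = defaultdict(float)
    visits = defaultdict(int)
    for trial in trials:
        reward_to_go = 0.0
        # Walk backwards so the reward-to-go of each state is built up incrementally.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            visits[state] += 1
            # Running average: shift the old mean toward the new sample.
            utility[state] += (reward_to_go - utility[state]) / visits[state]
    return dict(utility)

# Assumed trials: each step costs 1 and reaching the goal pays 10.
trials = [
    [("A", -1.0), ("B", -1.0), ("goal", 10.0)],
    [("A", -1.0), ("goal", 10.0)],
]
utilities = direct_utility_estimation(trials)
```

State A receives the samples 8 and 9 from the two trials, so its estimate is their average, 8.5. Note that the estimate for A ignores what was learned about B, which is the independence assumption criticized above.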
Adaptive Dynamic Programming(ADP)
ADP is a smarter method than Direct Utility Estimation, as it runs trials
to learn a model of the environment and estimates the utility of a state
as the sum of the reward for being in that state and the expected discounted
utility of the next state:

Uπ(s) = R(s) + γ Σ_{s'} P(s'|s, π(s)) Uπ(s')

Where R(s) = reward for being in state s, P(s'|s, π(s)) = transition model, γ =
discount factor and Uπ(s') = utility of being in state s'.
It can be solved using the value-iteration algorithm. The algorithm converges fast but
can become quite costly to compute for large state spaces. ADP is a model-based
approach and requires the transition model of the environment. A model-free
approach is Temporal Difference Learning.
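The two parts of ADP, learning a transition model from observed transitions and then solving for the utilities under the fixed policy, can be sketched as follows. The observed transitions, the rewards, γ = 0.9, and the function names are assumptions made for illustration, and simple repeated sweeps stand in for a full solver.

```python
from collections import defaultdict

def estimate_model(transitions):
    """Estimate P(s' | s, pi(s)) from counts of observed (state, next_state) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, s_next in transitions:
        counts[s][s_next] += 1
    model = {}
    for s, nexts in counts.items():
        total = sum(nexts.values())
        model[s] = {s_next: c / total for s_next, c in nexts.items()}
    return model

def evaluate_policy(rewards, model, gamma=0.9, sweeps=100):
    """Iterate U(s) = R(s) + gamma * sum_s' P(s'|s) U(s') until it settles."""
    utility = {s: 0.0 for s in rewards}
    for _ in range(sweeps):
        for s in rewards:
            # States with no recorded successors (terminal) contribute nothing here.
            expected_next = sum(p * utility[s2] for s2, p in model.get(s, {}).items())
            utility[s] = rewards[s] + gamma * expected_next
    return utility

# Assumed experience under the fixed policy: from A the agent usually reaches B.
observed = [("A", "B"), ("A", "B"), ("A", "A"), ("A", "B"),
            ("B", "goal"), ("B", "goal")]
rewards = {"A": 0.0, "B": 0.0, "goal": 1.0}
model = estimate_model(observed)
utility = evaluate_policy(rewards, model)
```

Every sweep touches every state and every successor, which is why the cost grows quickly with the size of the state space.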
Temporal Difference Learning (TD)
TD learning does not require the agent to learn the transition model.
The update occurs between successive states, and the agent only updates
states that are directly affected:

Uπ(s) ← Uπ(s) + α [R(s) + γ Uπ(s') − Uπ(s)]

Where α = learning rate, which controls how quickly the estimates converge to the true utilities. While
ADP adjusts the utility of s using all its successor states, TD learning adjusts it
using a single successor state s'. TD is slower to converge but much
simpler in terms of computation.
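A single TD update can be sketched as below. The starting utilities, α = 0.1, γ = 0.9, the state names, and the function name `td_update` are assumptions chosen for illustration.

```python
def td_update(utility, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Move U(state) a fraction alpha toward reward + gamma * U(next_state)."""
    current = utility.get(state, 0.0)
    target = reward + gamma * utility.get(next_state, 0.0)
    utility[state] = current + alpha * (target - current)
    return utility[state]

# Assumed starting estimates; only the goal state has a known utility.
utility = {"A": 0.0, "B": 0.0, "goal": 1.0}

# Observed transition B -> goal with R(B) = 0: only U(B) changes.
u_b = td_update(utility, "B", 0.0, "goal")
# Observed transition A -> B with R(A) = 0: only U(A) changes.
u_a = td_update(utility, "A", 0.0, "B")
```

Each update uses only the one successor state that was actually observed, so no transition model is needed, but many such updates are required before the estimates settle.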
