
Course Contents:

Unit | Contents | Hours
CO1 | Introduction: Basic Definitions, Types of learning (Supervised Learning, Unsupervised Learning, Reinforcement Learning), Hypothesis Space and Inductive Bias, Evaluation and Cross-validation | 6
CO2 | Regressions: Linear regression, Decision trees, overfitting, Instance based learning, Feature reduction, Collaborative filtering-based recommendation | 8
CO3 | Probability, Bayes learning, Logistic Regression, Support Vector Machine, Kernel function and Kernel SVM | 8
CO4 | Computational learning: Theory and applications, PAC learning model, Sample complexity, VC Dimension, Ensemble learning | 6
CO5 | Clustering, k-means, adaptive hierarchical clustering, Gaussian mixture model | 6

Text Books
1. Tom Mitchell, Machine Learning. McGraw Hill, 1997.
2. Ethem Alpaydin, Introduction to Machine Learning, 2nd ed., The MIT Press, Cambridge, Massachusetts, London, England.
3. Chris Bishop, Pattern Recognition and Machine Learning.

E-Books
1. Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David.
   URL: https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/index.html
2. Bayesian Reasoning and Machine Learning, by David Barber.
   URL: http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/091117.pdf

Reference Books
1. Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
2. Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, Wiley, New York, 2001.
3. Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data (1st Edition).

Online TL Material
1. NPTEL: Introduction to Machine Learning by Prof. Sudeshna Sarkar, IIT Kharagpur.
   URL: https://onlinecourses.nptel.ac.in/noc21_cs85/preview
2. NPTEL: Essential Mathematics for Machine Learning by Prof. Sanjeev Kumar, Prof. S. K. Gupta, IIT Roorkee.
   URL: https://onlinecourses.nptel.ac.in/noc21_ma38/
3. DataCamp: Machine Learning for Everyone.
   URL: https://www.datacamp.com/courses/machine-learning-for-everyon

Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically learn from
data, improve performance from past experiences, and make predictions. Machine learning
contains a set of algorithms that work on a huge amount of data. Data is fed to these algorithms to
train them, and on the basis of training, they build the model & perform a specific task.

These ML algorithms help to solve different business problems like Regression, Classification,
Forecasting, Clustering, and Associations, etc.
Based on the methods and way of learning, machine learning is divided into mainly four types,
which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning


As its name suggests, supervised machine learning is based on supervision. In the supervised
learning technique, we train the machine using a "labelled" dataset, and based on that training,
the machine predicts the output. Here, labelled data means that the inputs are already mapped to
the correct outputs. More precisely, we first train the machine with inputs and their corresponding
outputs, and then we ask the machine to predict the output for a test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of cat and
dog images. First, we train the machine to understand the images using features such as the shape
and size of the tail, the shape of the eyes, colour, and height (dogs are taller, cats are
smaller). After training, we input the picture of a cat and ask the machine to identify the object
and predict the output. Since the machine is now well trained, it checks all the features of the
object, such as height, shape, colour, eyes, ears, and tail, and finds that it's a cat. So, it puts
it in the Cat category. This is how the machine identifies objects in supervised learning.
The main goal of the supervised learning technique is to map the input variable(x) with the
output variable(y). Some real-world applications of supervised learning are Risk Assessment,
Fraud Detection, Spam filtering, etc.
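
The train-then-predict workflow described above can be sketched with a tiny nearest-neighbour classifier. This is only an illustration: the feature values (height and tail length in cm) are invented, and the helper name predict_1nn is hypothetical, not something from the course material.

```python
import math

# Labelled training data: (height_cm, tail_length_cm) -> label.
# The numbers are invented purely for illustration.
training_data = [
    ((25.0, 28.0), "cat"),
    ((23.0, 30.0), "cat"),
    ((60.0, 35.0), "dog"),
    ((55.0, 33.0), "dog"),
]


def predict_1nn(sample, labelled):
    """Return the label of the training example closest to `sample`."""
    _, nearest_label = min(labelled, key=lambda pair: math.dist(pair[0], sample))
    return nearest_label


# "Test" inputs the model has not seen during training.
print(predict_1nn((24.0, 29.0), training_data))
print(predict_1nn((58.0, 34.0), training_data))
```

The labelled pairs play the role of the training dataset, and the two calls at the end play the role of the test data.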

Categories of Supervised Machine Learning


Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification
Classification algorithms are used to solve classification problems, in which the output variable
is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. The classification
algorithms predict the categories present in the dataset. Some real-world examples of classification
algorithms are Spam Detection, Email filtering, etc.
b) Regression

Regression algorithms are used to solve regression problems, in which the output variable is
continuous and the model learns a relationship (linear in the simplest case) between the input and
output variables. They are used to predict continuous output variables, such as market trends,
weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
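
As a minimal sketch of the first algorithm in the list, simple linear regression with one input variable can be fitted with the closed-form least-squares formulas. The function name fit_line and the sample data (generated from y = 2x + 1) are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Predict a continuous output for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```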

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about
the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process, image
classification is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis. This is done by
using medical images and past data labelled with disease conditions. With such a
process, the machine can identify a disease for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying
fraud transactions, fraud customers, etc. It is done by using historic data to identify the
patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam
folder.
o Speech Recognition - Supervised learning algorithms are also used in speech recognition.
The algorithm is trained with voice data, and various identifications can be done using the
same, such as voice-activated passwords, voice commands, etc.

2. Unsupervised Machine Learning

Unsupervised learning is different from the Supervised learning technique; as its name suggests,
there is no need for supervision. It means, in unsupervised machine learning, the machine is trained
using the unlabeled dataset, and the machine predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor labelled,
and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted
dataset according to similarities, patterns, and differences. Machines are instructed to find
the hidden patterns in the input dataset.

Let's take an example to understand it more precisely. Suppose there is a basket of fruit images,
and we input it into the machine learning model. The images are totally unknown to the model, and
the task of the machine is to find the patterns and categories of the objects.

The machine will then discover patterns and differences on its own, such as differences in colour
and shape, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is a way
to group the objects into a cluster such that the objects with the most similarities remain in one
group and have fewer or no similarities with the objects of other groups. An example of the
clustering algorithm is grouping the customers by their purchasing behaviour.
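Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm

Principal Component Analysis and Independent Component Analysis are often listed alongside these,
but they are unsupervised dimensionality-reduction techniques rather than clustering algorithms.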
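
To make the idea of grouping similar objects concrete, here is a minimal K-Means sketch in plain Python. The points, the choice of initial centroids, and the function names assign and k_means are assumptions for illustration; real implementations usually pick the initial centroids randomly and stop when the assignments no longer change.

```python
import math


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Two visually separate groups of 2-D points (invented data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between points.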

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset. The main aim of this learning algorithm is to find the
dependency of one data item on another data item and map those variables accordingly so that it can
generate maximum profit. This algorithm is mainly applied in Market Basket analysis, Web usage
mining, continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
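
The dependency of one item on another is usually measured with support and confidence, the two quantities that algorithms such as Apriori build on. The sketch below computes both for a toy set of market baskets; the basket contents and the function names support and confidence are invented for this illustration.

```python
# Each transaction is the set of items bought together (invented data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given `antecedent` (rule A -> B)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets that contain bread also contain milk (confidence about 0.67).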

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for complicated tasks compared to the supervised ones
because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled dataset is
easier as compared to the labelled dataset.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate because the dataset is not labelled
and the algorithm is not trained with the exact output in advance.
o Working with unsupervised learning is more difficult because it works with an unlabelled dataset
that is not mapped to any output.

Applications of Unsupervised Learning


o Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright
in document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised learning
techniques for building recommendation applications for different web applications and e-
commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning,
which can identify unusual data points within the dataset. It is used to discover fraudulent
transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract
particular information from the database. For example, extracting information of each user
located at a particular location.

3. Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning. It represents the intermediate ground between
Supervised (with labelled training data) and Unsupervised (with no labelled training data) learning
algorithms and uses a combination of labelled and unlabelled datasets during the training period.

Although semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on data that contains a few labels, the data is mostly unlabelled.
Labels are costly to obtain, so in practice an organisation may have only a few of them.
Semi-supervised learning differs from both supervised and unsupervised learning, which are defined
by the presence and the absence of labels respectively.

To overcome the drawbacks of supervised and unsupervised learning algorithms, the
concept of semi-supervised learning was introduced. The main aim of semi-supervised learning is
to effectively use all the available data, rather than only the labelled data as in supervised
learning. Initially, similar data is clustered with an unsupervised learning algorithm, and the
clusters then help to assign labels to the unlabelled data. This is done because labelled data is
comparatively more expensive to acquire than unlabelled data.
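
One simple way to picture this is pseudo-labelling: a couple of labelled points seed one centroid per class, each unlabelled point receives the label of its nearest centroid, and the centroids are then recomputed from all the data. The sketch below is an assumption-laden illustration (one-dimensional invented data, hypothetical function name pseudo_label), not a specific published algorithm.

```python
def pseudo_label(labelled, unlabelled, rounds=3):
    """Spread a few known labels to unlabelled 1-D points via class centroids."""
    labels = list(dict.fromkeys(label for _, label in labelled))

    def centroid(pairs, label):
        values = [x for x, l in pairs if l == label]
        return sum(values) / len(values)

    # Start from centroids computed on the labelled points only.
    centroids = {label: centroid(labelled, label) for label in labels}
    guesses = []
    for _ in range(rounds):
        # Give every unlabelled point the label of its nearest centroid.
        guesses = [
            (x, min(labels, key=lambda l: abs(x - centroids[l]))) for x in unlabelled
        ]
        # Recompute centroids from labelled and pseudo-labelled points together.
        centroids = {label: centroid(labelled + guesses, label) for label in labels}
    return guesses, centroids


labelled = [(1.0, "low"), (9.0, "high")]      # only two labels are available
unlabelled = [1.5, 2.0, 2.5, 8.0, 8.5, 7.5]   # most of the data has no label
guesses, centroids = pseudo_label(labelled, unlabelled)
print(guesses)
print(centroids)
```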

We can imagine these algorithms with an example. Supervised learning is where a student is under
the supervision of an instructor at home and college. Further, if that student is self-analysing the
same concept without any help from the instructor, it comes under unsupervised learning. Under
semi-supervised learning, the student has to revise himself after analyzing the same concept under
the guidance of an instructor at college.

Advantages and disadvantages of Semi-supervised Learning

Advantages:

o It is simple and easy to understand the algorithm.


o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by trial and error: taking actions, learning
from experience, and improving its performance. The agent is rewarded for each good action and
punished for each bad action; hence the goal of a reinforcement learning agent is to maximize its
rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents learn from
their experiences only.
The reinforcement learning process is similar to a human being; for example, a child learns various
things by experiences in his day-to-day life. An example of reinforcement learning is to play a
game, where the Game is the environment, moves of an agent at each step define states, and the
goal of the agent is to get a high score. Agent receives feedback in terms of punishment and
rewards.
Due to its way of working, reinforcement learning is employed in different fields such as Game
theory, Operation Research, Information theory, multi-agent systems.
A reinforcement learning problem can be formalized using a Markov Decision Process (MDP). In
an MDP, the agent constantly interacts with the environment and performs actions; after each
action, the environment responds and generates a new state.
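
The agent-environment loop can be sketched with tabular Q-learning on a tiny, made-up MDP: a corridor of five states where reaching the last state gives a reward of 1. Q-learning is one common RL algorithm, chosen here only as an example; the environment, the hyperparameters, and the names step and q_learning are all assumptions for illustration.

```python
import random

N_STATES = 5                    # states 0..4; state 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Environment: respond to an action with (next_state, reward)."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from experience gathered by random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore by trial and error
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# Greedy policy: the best-valued action in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy chooses "right" in every state, which is the shortest path to the reward.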

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning specifies increasing the
tendency that the required behaviour would occur again by adding something. It enhances
the strength of the behaviour of the agent and positively impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour would
occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning


o Video Games:
RL algorithms are very popular in gaming applications, where they are used to reach super-human
performance. Well-known game-playing systems built with RL are AlphaGo and AlphaGo
Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how RL can be used
to automatically learn to allocate and schedule computer resources to waiting jobs
in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial and
manufacturing area, and these robots are made more powerful with reinforcement learning.
There are different industries that have their vision of building intelligent robots using AI
and Machine learning technology.
o Text Mining
Text-mining, one of the great applications of NLP, is now being implemented with the help
of Reinforcement Learning by Salesforce company.

Advantages and Disadvantages of Reinforcement Learning


Advantages

o It helps in solving complex real-world problems which are difficult to be solved by general
techniques.
o The learning model of RL is similar to the learning of human beings; hence most accurate
results can be found.
o Helps in achieving long term results.

Disadvantage

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken the
results.
ML | Understanding Hypothesis
In most supervised machine learning algorithms, our main goal is to find a hypothesis
from the hypothesis space that maps the inputs to the proper outputs.
The following figure shows the common method to find out the possible hypothesis from
the Hypothesis space:

Hypothesis Space (H):

Hypothesis space is the set of all the possible legal hypotheses. This is the set from which
the machine learning algorithm selects the single hypothesis that best describes the target
function or the outputs.
Hypothesis (h):

A hypothesis is a function that best describes the target in supervised machine learning. The
hypothesis that an algorithm comes up with depends upon the data and also upon
the restrictions and bias that we have imposed on the data. To better understand the
Hypothesis Space and Hypothesis, consider the following coordinate plane that shows the
distribution of some data:
Suppose we have test data for which we have to determine the outputs or results.
The test data is as shown below:

We can predict the outcomes by dividing the coordinate plane as shown below:


So the test data would yield the following result:

But note here that we could have divided the coordinate plane as:

The way in which the coordinate plane is divided depends on the data, algorithm and
constraints.
o All the legal possible ways in which we can divide the coordinate plane to predict the outcome
of the test data make up the Hypothesis Space.
o Each individual possible way is known as a hypothesis.
Hence, in this example the hypothesis space is the set of all such possible divisions of the
coordinate plane.
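
The same idea can be shown in one dimension, where each hypothesis is a threshold t that divides the number line: h_t(x) is positive when x >= t. In the sketch below, the training points, the range of candidate thresholds, and the test point are all invented for illustration.

```python
# Training data: (x, is_positive). Values are invented.
training = [(1, False), (2, False), (6, True), (7, True)]

# Hypothesis space H: one hypothesis per candidate threshold t in 0..8.
hypothesis_space = range(0, 9)


def predict(t, x):
    """Hypothesis h_t: classify x as positive when it is at or above t."""
    return x >= t


# Hypotheses that agree with every training example.
consistent = [
    t for t in hypothesis_space
    if all(predict(t, x) == y for x, y in training)
]

# Several hypotheses fit the training data but disagree on a new test point.
test_point = 4
predictions = {t: predict(t, test_point) for t in consistent}
print(consistent)
print(predictions)
```

Thresholds 3, 4, 5 and 6 all fit the training data, yet they split on how to classify x = 4. Which one the learner picks depends on the algorithm and the constraints (bias) imposed, as described above.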
Inductive Learning Algorithm

Inductive Learning Algorithm (ILA) is an iterative and inductive machine learning
algorithm used for generating a set of classification rules of the form "IF-THEN"
from a set of examples. It produces rules at each iteration and appends them
to the set of rules.
Basic Idea:

There are basically two methods for knowledge extraction: first from domain experts, and
then with machine learning.
For a very large amount of data, relying on domain experts is neither practical nor reliable, so
we move towards the machine learning approach for this work.
One way to use machine learning is to replicate the expert's logic in the form of
algorithms, but this work is very tedious, time-consuming and expensive.
So we move towards inductive algorithms, which generate the strategy for
performing a task themselves and need not be instructed separately at each step.

Need of ILA in presence of other machine learning algorithms:

ILA was introduced as a new algorithm even though other inductive learning algorithms like
ID3 and AQ were available.
● The need arose from the pitfalls of the previous algorithms; one of
the major pitfalls was the lack of generalisation of rules.

● ID3 and AQ used the decision tree production method, which was too specific,
difficult to analyse, and very slow for basic short classification
problems.

● The decision tree-based algorithm was unable to work for a new problem if some
attributes were missing.

● ILA produces a general set of rules instead of decision trees,
which overcomes the above problems.
THE ILA ALGORITHM:

General requirements at start of the algorithm:-

1. List the examples in the form of a table ‘T’ where each row corresponds to an example
and each column contains an attribute value.
2. Create a set of m training examples, each example composed of k attributes and a class
attribute with n possible decisions.
3. Create a rule set, R, that is initially empty (it contains no rules yet).
4. Initially all rows in the table are unmarked.

Steps in the algorithm:-


Step 1:
Divide the table ‘T’ containing m examples into n sub-tables (t1, t2, ..., tn), one table for each
possible value of the class attribute. (Repeat steps 2-8 for each sub-table.)
Step 2:
Initialize the attribute combination count ‘j’ = 1.
Step 3:
For the sub-table being processed, divide the attribute list into distinct combinations,
each combination with ‘j’ distinct attributes.
Step 4:
For each combination of attributes, count the number of occurrences of attribute values that appear
under the same combination of attributes in unmarked rows of the sub-table under consideration
and, at the same time, do not appear under the same combination of attributes in the other
sub-tables. Call the first combination with the maximum number of occurrences the
max-combination ‘MAX’.
Step 5:
If ‘MAX’ == null, increase ‘j’ by 1 and go to Step 3.
Step 6:
Mark all rows of the sub-table being processed in which the values of ‘MAX’ appear as classified.
Step 7:
Add a rule (IF attribute = “XYZ” -> THEN decision is YES/NO) to R whose left-hand side has
the attribute names of ‘MAX’ with their values separated by AND, and whose right-hand side
contains the decision attribute value associated with the sub-table.
Step 8:
If all rows are marked as classified, move on to process another sub-table and go to Step 2;
otherwise, go to Step 4. If no sub-tables remain, exit with the set of rules obtained so far.

An example showing the use of ILA


Suppose we have an example set with the attributes Place type, Weather, Location and Decision, and
seven examples. Our task is to generate a set of rules that tell us under what conditions each
decision is taken.

Example no. | Place type | Weather | Location | Decision
I | Hilly | Winter | Kullu | Yes
II | Mountain | Windy | Mumbai | No
III | Mountain | Windy | Shimla | Yes
IV | Beach | Windy | Mumbai | No
V | Beach | Warm | Goa | Yes
VI | Beach | Windy | Goa | No
VII | Beach | Warm | Shimla | Yes

Step 1
Subset 1
S.no | Place type | Weather | Location | Decision
1 | Hilly | Winter | Kullu | Yes
2 | Mountain | Windy | Shimla | Yes
3 | Beach | Warm | Goa | Yes
4 | Beach | Warm | Shimla | Yes

Subset 2
S.no | Place type | Weather | Location | Decision
5 | Mountain | Windy | Mumbai | No
6 | Beach | Windy | Mumbai | No
7 | Beach | Windy | Goa | No
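
Steps 1 to 8 of the ILA algorithm can be sketched in Python and run on the seven examples above. This is one possible reading of the steps, not a reference implementation: the function name ila and the dictionary keys are hypothetical, "first combination with the maximum" is taken to mean first in attribute order, and the guard that stops when j exceeds the number of attributes is an addition to avoid looping forever on contradictory data.

```python
from collections import Counter
from itertools import combinations

ATTRIBUTES = ("place", "weather", "location")
EXAMPLES = [
    {"place": "hilly", "weather": "winter", "location": "kullu", "decision": "yes"},
    {"place": "mountain", "weather": "windy", "location": "mumbai", "decision": "no"},
    {"place": "mountain", "weather": "windy", "location": "shimla", "decision": "yes"},
    {"place": "beach", "weather": "windy", "location": "mumbai", "decision": "no"},
    {"place": "beach", "weather": "warm", "location": "goa", "decision": "yes"},
    {"place": "beach", "weather": "windy", "location": "goa", "decision": "no"},
    {"place": "beach", "weather": "warm", "location": "shimla", "decision": "yes"},
]


def ila(rows, attributes, class_attr):
    """Return a list of (conditions, decision) rules."""
    rules = []
    # Step 1: one sub-table per class value, in order of first appearance.
    for decision in dict.fromkeys(row[class_attr] for row in rows):
        sub_table = [r for r in rows if r[class_attr] == decision]
        others = [r for r in rows if r[class_attr] != decision]
        marked = [False] * len(sub_table)
        j = 1  # Step 2
        while not all(marked) and j <= len(attributes):
            best_count, best_combo, best_values = 0, None, None
            for combo in combinations(attributes, j):  # Step 3
                excluded = {tuple(r[a] for a in combo) for r in others}
                # Step 4: count value tuples in unmarked rows of this sub-table.
                counts = Counter(
                    tuple(r[a] for a in combo)
                    for r, done in zip(sub_table, marked) if not done
                )
                for values, count in counts.items():
                    if values not in excluded and count > best_count:
                        best_count, best_combo, best_values = count, combo, values
            if best_combo is None:  # Step 5: no usable combination of size j
                j += 1
                continue
            for i, r in enumerate(sub_table):  # Step 6: mark the covered rows
                if tuple(r[a] for a in best_combo) == best_values:
                    marked[i] = True
            # Step 7: add the rule; Step 8 is the surrounding while/for loop.
            rules.append((dict(zip(best_combo, best_values)), decision))
    return rules


rules = ila(EXAMPLES, ATTRIBUTES, "decision")
for conditions, decision in rules:
    left = " AND ".join(f"{name} = {value}" for name, value in conditions.items())
    print(f"IF {left} THEN decision is {decision}")
```

Under these assumptions the sketch produces five rules: three for the "yes" sub-table (weather = warm, place = hilly, location = shimla) and two for the "no" sub-table (location = mumbai, and place = beach AND weather = windy).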

Cross Validation in Machine Learning


In machine learning, we cannot simply fit the model on the training data and assume that
it will work accurately on real data. We must make sure that our model has learned
the correct patterns from the data and is not picking up too much noise. For this purpose,
we use the cross-validation technique.
Cross-Validation

Cross-validation is a technique in which we train our model using the subset of the data-set
and then evaluate using the complementary subset of the data-set.
The three steps involved in cross-validation are as follows :
1. Reserve some portion of the sample data-set.
2. Train the model using the rest of the data-set.
3. Test the model using the reserved portion of the data-set.

Methods of Cross Validation


Validation
In this method, we train on 50% of the given data-set and use the remaining 50%
for testing.
The major drawback of this method is that, because we train on only 50% of the data-set,
the remaining 50% may contain important information that the model never sees
during training, i.e. higher bias.
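
A 50/50 validation split can be sketched in a few lines of plain Python. The function name holdout_split, the fixed seed, and the use of the integers 0 to 9 as stand-in data are assumptions for this illustration.

```python
import random


def holdout_split(data, test_fraction=0.5, seed=0):
    """Shuffle the data, then reserve `test_fraction` of it for testing."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]   # (training part, testing part)


train, test = holdout_split(range(10))
print(train, test)
```

Every observation lands in exactly one of the two halves, so the half reserved for testing is never seen during training.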

LOOCV (Leave One Out Cross Validation)


In this method, we train on the whole data-set except for a single data point, which is held
out for testing, and we repeat this for every data point.

It has some advantages as well as disadvantages.


An advantage of this method is that we make use of all data points, and hence it has low
bias.
The major drawback is that it leads to higher variation in the test results, because
we test against a single data point each time. If that data point is an outlier, the
variation can be even higher. Another drawback is that it takes a lot of execution time, as it
iterates as many times as there are data points.
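
The effect of an outlier on LOOCV can be seen with a deliberately trivial "model" that predicts the mean of its training values. The data and the function name loocv_errors are invented for this sketch.

```python
def loocv_errors(values):
    """Squared error for each point when it alone is held out for testing."""
    errors = []
    for i, held_out in enumerate(values):
        training = values[:i] + values[i + 1:]        # everything except point i
        prediction = sum(training) / len(training)    # "model": mean of training data
        errors.append((held_out - prediction) ** 2)
    return errors


values = [2.0, 4.0, 6.0, 8.0, 30.0]   # 30.0 acts as an outlier
errors = loocv_errors(values)
print(errors)
print(sum(errors) / len(errors))      # LOOCV estimate of the mean squared error
```

The loop runs once per data point, and the fold that holds out the outlier contributes a far larger error than the others, which is the variation described above.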

K-Fold Cross Validation


In this method, we split the data-set into k subsets (known as folds). We then
train on k-1 of the subsets and leave the remaining one for the evaluation of the
trained model. We iterate k times, with a different subset reserved for testing
each time.
Note:
A value of k = 10 is commonly suggested, as a lower value of k moves towards the simple
validation method and a higher value of k moves towards the LOOCV method.

Example
The table below shows an example of the training subsets and evaluation subsets
generated in k-fold cross-validation. Here, we have 25 instances in total (indexed 0 to 24). In
the first iteration we use the first 20 percent of the data for evaluation and the remaining 80
percent for training ([0-4] testing and [5-24] training), while in the second iteration we use the
second subset of 20 percent for evaluation and the remaining four subsets of the data for training
([5-9] testing and [0-4 and 10-24] training), and so on.

Total instances: 25
Value of k :5

No. Iteration Training set observations Testing set observations


1 [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4]
2 [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9]
3 [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14]
4 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19]
5 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24]
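
The index sets in the table can be generated with a short helper. This is a plain-Python sketch with a hypothetical function name (k_fold_indices); it assumes contiguous folds without shuffling and a number of instances divisible by k.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_size = n_samples // k   # assumes n_samples is divisible by k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test


folds = list(k_fold_indices(25, 5))
for number, (train, test) in enumerate(folds, start=1):
    print(number, train, test)
```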

Comparison of train/test split to cross-validation

Advantages of train/test split:


1. This runs K times faster than K-fold cross-validation, because K-fold cross-validation
repeats the train/test split K times.
2. Simpler to examine the detailed results of the testing process.
Advantages of cross-validation:
1. More accurate estimate of out-of-sample accuracy.
2. More “efficient” use of data as every observation is used for both training and testing.

Python code for k fold cross-validation.

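# This code needs the scikit-learn package to be installed.

# KFold lives in sklearn.model_selection; the older sklearn.cross_validation
# module has been removed from scikit-learn.
from sklearn.model_selection import KFold

# value of K is 10.
kf = KFold(n_splits=10)

# train_set is assumed to be an indexable data-set defined elsewhere.
# Each element of data is a (train_indices, test_indices) pair for one fold.
data = list(kf.split(train_set))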
