
NUTAN MAHARASHTRA VIDYA PRASARAK MANDAL’S

NUTAN COLLEGE OF ENGINEERING & RESEARCH (NCER)


Department of Computer Science & Engineering
-------------------------------------------------------------------------------------------------------------------------- - -

BTCOC503

Lecture No. | Topic to be covered

Unit 1: Introduction to Machine Learning (07 Hrs)

1 | Introduction: Basic definitions, types of learning
2 | Hypothesis space and inductive bias, Evaluation
3 | Cross-validation, Linear regression
4 | Decision trees, Overfitting
5 | Instance-based learning
6 | Collaborative filtering-based recommendation
7 | Feature reduction

Submitted by:
Prof. S. B. Mehta

Machine Learning

Unit 1: Introduction

Machine Learning Introduction:


 Machine learning is an application of artificial intelligence (AI) that provides systems the ability to
automatically learn and improve from experience without being explicitly programmed. Machine
learning focuses on the development of computer programs that can access data and use it to learn for
themselves.
 Machine learning is a branch of artificial intelligence that aims to create intelligent systems which
perform human-like jobs by learning from large amounts of relevant data, giving computers the ability
to learn without being explicitly programmed.
 Although machine learning is a field within computer science, it differs from traditional computational
approaches. In traditional computing, algorithms are sets of explicitly programmed instructions used by
computers to calculate or solve problems. Machine learning algorithms instead allow computers to train
on data inputs and use statistical analysis in order to output values that fall within a specific range. Because
of this, machine learning lets computers build models from sample data in order to automate
decision-making processes based on data inputs.
 Machine learning teaches computers to do what comes naturally to humans and animals: learn from
experience. Machine learning algorithms use computational methods to "learn" information directly from
data without relying on a predetermined equation as a model. The algorithms adaptively improve their
performance as the number of samples available for learning increases.

How do Machines learn?

Machines learn much as humans do. First, we receive knowledge about a certain thing, and then, keeping this
knowledge in mind, we are able to identify that thing in the future. Past experiences also help us take
decisions accordingly in the future. Our brain trains itself by identifying the features and patterns in the
knowledge/data received, thus enabling itself to successfully identify or distinguish between various
things.

How Does Machine Learning Work?

There are basic steps used to perform a machine learning task:

1. Collecting and preparing data:


The first step in machine learning basics is that we feed knowledge/data to the machine. This data is
divided into two parts, namely training data and testing data. Be it raw data from Excel, Access, text
files, etc., this step (gathering past data) forms the foundation of the future learning. The better the
variety, density and volume of relevant data, the better the learning prospects for the machine become.

2. Training a model: This step involves choosing the appropriate algorithm and representation of data
in the form of a model. The cleaned data is split into two parts, train and test (the proportion depending
on the prerequisites); the first part (training data) is used for developing the model, and the second part
(test data) is used as a reference.

3. Evaluating the model:


The machine learns the patterns and features from the training data and trains itself to take decisions
such as identifying, classifying or predicting new data. To check how accurately the machine is able to take
these decisions, its predictions are tested on the testing data. For example, in a face-recognition task we
first work on the training data, and once the model is sufficiently trained, we use it on the testing data to
understand how successful it is in recognizing the faces in the photos.

4. Improving the performance: This step might involve choosing a different model altogether or
introducing more variables to augment the efficiency. That is why a significant amount of time needs to be
spent in data collection and preparation.
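
Below is a minimal sketch of these steps in code, assuming Python with scikit-learn is installed and using its built-in Iris dataset as a stand-in for collected data; the model choice and split ratio are arbitrary illustrative assumptions.

# Step 1: collect and prepare data (here, a ready-made dataset split into train/test parts).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: train a model on the training part.
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

# Step 3: evaluate the model on the held-out test part.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 4: improve performance, e.g. try a different model or more/better features,
# and keep whichever configuration scores best on held-out data.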

Advantages of Machine Learning

 Easily identifies trends and patterns


Machine Learning can review large volumes of data and discover specific trends and patterns that would
not be apparent to humans. For instance, for an e-commerce website like Amazon, it serves to understand
the browsing behaviors and purchase histories of its users to help cater to the right products, deals, and
reminders relevant to them. It uses the results to reveal relevant advertisements to them.

 No human intervention needed (automation)


A very powerful utility of Machine Learning is its ability to automate various decision-making tasks.
With machine learning we do not need to assist the system or give it commands to follow certain
instructions or to control its decision-making ability. Rather, we let it take decisions on its own without
our interference. This helps systems develop and improve their decision-making ability by themselves
and also to rectify their errors.

A common example of this is anti-virus software: it learns to filter new threats as they are recognized.
ML is also good at recognizing spam.

 Continuous Improvement
Machine Learning algorithms are capable of learning from the data we provide. As new data arrives, the
learning algorithm helps the system continuously detect its errors and rectify them, which increases
efficiency and accuracy over time. Giants like Amazon, Walmart, etc. collect a
huge volume of new data every day. The accuracy of finding associated products, or of the recommendation
engine, improves with this huge amount of training data available.

 Handling multi-dimensional and multi-variety data


Machine learning algorithms are good at handling data that is multi-dimensional and of many varieties,
and they can manage and analyze such large amounts of data in dynamic or uncertain environments.

 Wide Applications
ML can be helpful to those in fields such as e-commerce and healthcare: they can
make use of ML to get immense help in growing their market, and it also helps increase the efficiency of
human work. The use of such applications gives customers a very personal experience while
targeting the right customers.

Disadvantages of Machine Learning

 Data acquisition
In the process of machine learning, a large amount of data is used for training and learning.
This data should be of good quality and unbiased, yet it might contain a large volume of bogus and
incorrect data. Many times we also face a situation where we find an imbalance in the data, which leads to
poor accuracy of models. These reasons make data acquisition a massive disadvantage.

 Time-consuming
Machine Learning models are capable of processing huge amounts of data, but the larger the volume of data,
the more time it takes to learn from the data and process it. Sometimes it might also mean additional
resources for computing.
 Algorithm Selection
A Machine Learning problem can be tackled with various algorithms to find a solution. It is a manual and
tedious task to run models with different algorithms and identify the most accurate algorithm based on
the results. This is a disadvantage.
 High error susceptibility
In the process of machine learning, a high amount of data is used, and on the other hand, many
algorithms are used and tested, so there is a high chance of experiencing errors. If there is any mistake in
the data the model was trained on or in the chosen algorithm, it can produce irrelevant results, for example
leading the user to several irrelevant advertisements.

Real-World Applications of Machine Learning

 Virtual Personal Assistants

There are various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name
suggests, they help us in finding information using our voice instructions. These assistants record our
voice instructions, send them over a server on the cloud, decode them using ML algorithms, and act
accordingly. A few of the major applications of Machine Learning here are:

 Speech Recognition
 Speech to Text Conversion
 Natural Language Processing
 Text to Speech Conversion

 Transportation Prediction: If we want to visit a new place, we take help of Google Maps, which
shows us the correct path with the shortest route and predicts the traffic conditions.

It predicts the traffic conditions such as whether traffic is cleared, slow-moving, or heavily congested
with the help of two ways:

 Real-time location of the vehicle from the Google Maps app and sensors
 Average time taken on past days at the same time of day

Everyone who is using Google Maps is helping to make the app better. It takes information from the
user and sends it back to its database to improve the performance.

 Email Spam and Malware Filtering : Whenever we receive a new email, it is filtered automatically
as important, normal, and spam. We always receive an important mail in our inbox with the
important symbol and spam emails in our spam box, and the technology behind this is Machine
learning.

 Social Media Services : Social media platforms use machine learning algorithms and approaches to
create some attractive and excellent features. For instance, Facebook notices and records your
activities, your chats, likes, and comments, and the time you spend on specific kinds of posts.
Machine learning learns from your own experience and makes friends and page suggestions for your
profile.

 Image Recognition: Image recognition is one of the most common applications of machine learning.
It is used to identify objects, persons, places, digital images, etc. The popular use case of image
recognition and face detection is, Automatic friend tagging suggestion:
Facebook provides us a feature of auto friend tagging suggestion. Whenever we upload a photo
with our Facebook friends, then we automatically get a tagging suggestion with name, and the
technology behind this is machine learning's face detection and recognition algorithm.

 Product Recommendations: Machine learning is widely used by various e-commerce and
entertainment companies such as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for some product on Amazon, we then start getting advertisements for the
same product while surfing the internet in the same browser, and this is because of machine learning.
Google understands the user's interest using various machine learning algorithms and suggests
products as per the customer's interest.
Similarly, when we use Netflix, we find some recommendations for entertainment series, movies,
etc., and this is also done with the help of machine learning.

 Automatic Language Translation: Nowadays, if we visit a new place and we are not aware of
the language, it is not a problem at all, as machine learning helps here too by converting
the text into languages we know. Google's GNMT (Google Neural Machine Translation) provides
this feature; it is a neural machine translation model that translates text into our familiar language,
and this is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm,
which is used with image recognition and translates the text from one language to another
language.

 Online Fraud Detection: Machine learning is making our online transactions safe and secure by
detecting fraudulent transactions. Whenever we perform an online transaction, there are various
ways that a fraudulent transaction can take place, such as fake accounts, fake IDs, and stealing
money in the middle of a transaction. To detect this, a feed-forward neural network helps us
by checking whether a transaction is genuine or fraudulent. The company uses a set of
tools that helps it compare the millions of transactions taking place and distinguish between
legitimate and illegitimate transactions between buyers and sellers.

 Self-driving cars: One of the most exciting applications of machine learning is self-driving cars.
Machine learning plays a significant role in self-driving cars. Tesla, a well-known car
manufacturing company, is working on self-driving cars. It uses unsupervised learning methods
to train the car models to detect people and objects while driving.
 Medical Diagnosis: In medical science, machine learning is used for disease diagnosis. With this, medical
technology is growing very fast and is able to build 3D models that can predict the exact position of
lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.

 Stock Market trading: Machine learning is widely used in stock market trading. In the stock
market, there is always a risk of ups and downs in shares, so for this, machine learning's long short-
term memory (LSTM) neural network is used for the prediction of stock market trends.
 Search Engine Result Refining: Google and other search engines use machine learning to
improve the search results for you. Every time you execute a search, the algorithms at the backend
keep a watch on how you respond to the results. If you open the top results and stay on the web page
for long, the search engine assumes that the results it displayed were in accordance with the query.
Similarly, if you reach the second or third page of the search results but do not open any of the
results, the search engine estimates that the results served did not match your requirement. This way, the
algorithms working at the backend improve the search results.

Life Cycle of Machine Learning

The machine learning life cycle is a cyclic process used to build an efficient machine learning project. The
main purpose of the life cycle is to find a solution to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

1. Gathering Data
2. Data preparation
3. Data Wrangling
4. Analyze Data
5. Train the model
6. Test the model
7. Deployment

1. Gathering Data:

Data Gathering is the first step of the machine learning life cycle. The goal of this step is to identify and
obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources
such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle.
The quantity and quality of the collected data will determine the efficiency of the output: the more data there
is, the more accurate the prediction will be.

This step includes the below tasks:


o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in
further steps.

2. Data preparation

After collecting the data, we need to prepare it for further steps. Data preparation is a step where we put
our data into a suitable place and prepare it to use in our machine learning training.

In this step, first, we put all data together, and then randomize the ordering of data.

This step can be further divided into two processes:


o Data exploration:

It is used to understand the nature of data that we have to work with. We need to understand the
characteristics, format, and quality of data.

A better understanding of data leads to an effective outcome. In this, we find Correlations, general
trends, and outliers.

o Data pre-processing:

Now the next step is preprocessing of data for its analysis.

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process
of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make
it more suitable for analysis in the next step. It is one of the most important steps of the complete process.
Cleaning of data is required to address quality issues.

The data we have collected is not necessarily always of use to us, as some of the data may not be useful.
In real-world applications, collected data may have various issues, including:

o Missing Values
o Duplicate data
o Invalid data
o Noise
So, we use various filtering techniques to clean the data. It is mandatory to detect and remove the above
issues because they can negatively affect the quality of the outcome.

4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques


o Building models
o Review the result

The aim of this step is to build a machine learning model to analyze the data using various analytical
techniques and review the outcome. It starts with determining the type of problem, where
we select machine learning techniques such as Classification, Regression, Cluster analysis,
Association, etc., then build the model using the prepared data, and evaluate the model.

Hence, in this step, we take the data and use machine learning algorithms to build the model.

5. Train Model

Now the next step is to train the model. In this step, we train our model to improve its performance for a
better outcome of the problem.
We use datasets to train the model using various machine learning algorithms. Training a model is
required so that it can understand the various patterns, rules, and features.

6. Test Model

Once our machine learning model has been trained on a given dataset, then we test the model. In this
step, we check for the accuracy of our model by providing a test dataset to it.
Testing the model determines the percentage accuracy of the model as per the requirement of project
or problem.

7. Deployment

The last step of machine learning life cycle is deployment, where we deploy the model in the real-
world system.
If the above-prepared model is producing an accurate result as per our requirement with acceptable
speed, then we deploy the model in the real system. But before deploying the project, we will check
whether it is improving its performance using available data or not. The deployment phase is similar
to making the final report for a project.

Types of Machine Learning:


1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning

1. Supervised Learning:

 Supervised learning is the type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output. Labelled data
means that some input data is already tagged with the correct output.
 In supervised learning, the training data provided to the machines works as the supervisor that
teaches the machines to predict the output correctly. It applies the same concept as a student learning
under the supervision of a teacher.
 Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping function
to map the input variable(x) with the output variable(y).
 Consider yourself as a student sitting in a math class wherein your teacher is supervising how
you are solving a problem, or whether you are doing it correctly or not. This situation is similar
to what a supervised learning algorithm follows, i.e., with input provided as a labeled dataset, a
model can learn from it. A labeled dataset means that, for each example given, an answer or solution to it
is given as well. This helps the model in learning and hence in providing the result of the
problem easily.

 So, a labeled dataset of animal images would tell the model whether an image is of a dog, a cat, etc.
Using this, the model gets trained, and whenever a new image comes to the model, it can
compare that image with the labeled dataset to predict the correct label.

How Supervised Learning Works?

 In supervised learning, models are trained using a labelled dataset, where the model learns about each
type of data. Once the training process is completed, the model is tested on the basis of test data (data
held out from the training set), and then it predicts the output.
 The working of Supervised learning can be easily understood by the below example and diagram:
 For example 1:



Suppose we have a dataset of different types of shapes which includes squares, rectangles, triangles, and
polygons. Now the first step is that we need to train the model for each shape.

o If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify the
shape.

The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the
shape on the basis of its number of sides, and predicts the output.

 For example 2: facial recognition with supervised learning. Once we have identified the people
in the photos, we try to classify them as baby, teenager or adult. Here baby, teenager and adult will
be our labels, and our training dataset will already be classified into the given labels based on certain
parameters, through which the machine will learn these features and patterns and classify new
input data based on the learning from this training data.
 For example 3: Suppose there is a basket filled with some fresh fruits, and the task is to arrange
the same type of fruits in one place. Also, suppose that the fruits are apple, banana, cherry and grape.
Suppose one already knows from previous work (or experience) the shape of each and every
fruit present in the basket, so it is easy to arrange the same type of fruits in one place.
 Here, the previous work is called training data in data mining terminology. So, the model learns
from the training data. This is because there is a response variable y which says that if some fruit has such
and such features then it is a grape, and similarly for each and every fruit.
 This type of information is deciphered from the data that is used to train the model.
This type of learning is called Supervised Learning. Such problems are listed under
classical Classification Tasks.

Steps Involved in Supervised Learning:

 First Determine the type of training dataset


 Collect/Gather the labelled training data.
 Split the training dataset into training dataset, test dataset, and validation dataset.
 Determine the input features of the training dataset, which should have enough knowledge so that
the model can accurately predict the output.
 Determine the suitable algorithm for the model, such as support vector machine, decision tree, etc.
 Execute the algorithm on the training dataset. Sometimes we need validation sets as the control
parameters, which are the subset of training datasets.
 Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output,
then our model is accurate.
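
A minimal sketch of the steps listed above, assuming scikit-learn is available; the candidate algorithms, dataset and split proportions are arbitrary choices made only for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather labelled data and split it into training, validation and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Try candidate algorithms and keep the one that does best on the validation set.
candidates = {"svm": SVC(), "decision_tree": DecisionTreeClassifier()}
best_name, best_model, best_score = None, None, -1.0
for name, candidate in candidates.items():
    candidate.fit(X_train, y_train)
    score = accuracy_score(y_val, candidate.predict(X_val))
    if score > best_score:
        best_name, best_model, best_score = name, candidate, score

# Final evaluation of the chosen model on the untouched test set.
print(best_name, "test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))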

In the real-world, supervised learning can be used for Risk Assessment, Image classification, Fraud
Detection, spam filtering, etc.

Some supervised algorithms are: Linear Regression, Logistic Regression, KNN classification,
Support Vector Machine (SVM), Decision Trees, Random Forest, and Naive Bayes.

Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems (two broad families of
algorithms): Regression and Classification.
1. Regression

 These algorithms are used to determine the mathematical relationship between two or more
variables and the level of dependency between variables. These can be used for predicting an
output based on the interdependency of two or more variables
 Regression is a process of finding the correlations between dependent and independent variables.
It helps in predicting the continuous variables such as prediction of Market Trends, prediction of
House prices, etc.

The task of the Regression algorithm is to find the mapping function to map the input variable(x) to
the continuous output variable(y).

Example: Suppose we want to do weather forecasting, so for this, we will use the Regression
algorithm. In weather prediction, the model is trained on the past data, and once the training is
completed, it can easily predict the weather for future days.

Types of Regression Algorithm:

 Simple Linear Regression


 Multiple Linear Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree Regression
 Random Forest Regression

For example, an increase in the price of a product will decrease its consumption, which means, in this
case, the amount of consumption depends on the price of the product. Here, the amount of
consumption is called the dependent variable and the price of the product is called the
independent variable. The level of dependency of the amount of consumption on the price of the product
will help us predict the future value of the amount of consumption based on changes in the price of
the product.

Below are some popular Regression algorithms which come under supervised learning:

 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression

2. Classification techniques

 Classification algorithms are used when the output variable is categorical, which means there are
classes such as Yes-No, Male-Female, True-False, etc.
 These algorithms are used to classify data into predefined classes or labels; they predict discrete
responses.
 Classification is a process of finding a function which helps in dividing the dataset into classes
based on different parameters. In Classification, a computer program is trained on the training
dataset and based on that training, it categorizes the data into different classes.
 The task of the classification algorithm is to find the mapping function to map the input(x) to the
discrete output(y).

Example: The best example to understand the Classification problem is Email Spam Detection. The
model is trained on the basis of millions of emails on different parameters, and whenever it receives a
new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the
Spam folder.
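
A toy sketch of this spam example, assuming scikit-learn is available; the tiny message list and its labels are invented purely for illustration (a real spam filter is trained on millions of emails, as noted above).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",               # spam
    "lowest price guaranteed offer",      # spam
    "meeting rescheduled to monday",      # not spam
    "please review the attached report",  # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Turn the raw text into word-count features, then fit a Naive Bayes classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
classifier = MultinomialNB().fit(X, labels)

# Predict the class of a new, unseen email.
new_email = vectorizer.transform(["free offer just for you"])
print(classifier.predict(new_email))  # expected to lean towards 1 (spam)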

Types of ML Classification Algorithms:

Classification Algorithms can be further divided into the following types:

 Logistic Regression
 K-Nearest Neighbours
 Support Vector Machines
 Kernel SVM
 Naïve Bayes
 Decision Tree Classification
 Random Forest Classification

Advantages of Supervised learning:


 With the help of supervised learning, the model can predict the output on the basis of prior
experiences.
 In supervised learning, we can have an exact idea about the classes of objects.
 Supervised learning model helps us to solve various real-world problems such as fraud detection,
spam filtering, etc.

Disadvantages of supervised learning:


 Supervised learning models are not suitable for handling very complex tasks.
 Supervised learning cannot predict the correct output if the test data is different from the training
dataset.
 Training requires a lot of computation time.
 In supervised learning, we need enough knowledge about the classes of objects.

2. Unsupervised Learning:

 In supervised learning, the dataset is properly labeled, meaning, a set of data is provided to train the
algorithm. The major difference between supervised and unsupervised learning is that there is no
complete and clean labeled dataset in unsupervised learning .
 Unsupervised learning is a type of self-organized learning that helps find previously unknown
patterns in data set without pre-existing labels.

 Unsupervised learning is a machine learning technique in which models are not supervised using a
training dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It
can be compared to the learning which takes place in the human brain while learning new things. It can
be defined as:
 Unsupervised learning is a type of machine learning in which models are trained using an unlabeled
dataset and are allowed to act on that data without any supervision.
 Unsupervised learning cannot be directly applied to a regression or classification problem because
unlike supervised learning, we have the input data but no corresponding output data. The goal of
unsupervised learning is to find the underlying structure of dataset, group that data according to
similarities, and represent that dataset in a compressed format.
For Example 1: Consider the animal photo example used in supervised learning. Suppose there is no
labeled dataset provided. Then, how can the model find out whether an animal is a cat, a dog or a bird?
Well, the model can be provided some information, such as: if an animal has feathers, a beak and
wings, it is a bird; in the same way, if an animal has fluffy fur, floppy ears, a curly tail, and maybe
some spots, it is a dog, and so on.
Hence, according to this information, the model can distinguish the animals successfully. But if it is
not able to do so correctly, the model follows backward propagation to reconsider the image.

Working of Unsupervised Learning

Working of unsupervised learning can be understood by the below diagram:

Example 1:

Here, we have taken unlabeled input data, which means it is not categorized and the corresponding
outputs are also not given. Now, this unlabeled input data is fed to the machine learning model in order
to train it. First, the model will interpret the raw data to find the hidden patterns in the data and then will
apply a suitable algorithm such as k-means clustering, hierarchical clustering, etc.

Once it applies the suitable algorithm, the algorithm divides the data objects into groups according to
the similarities and differences between the objects.

Example 2 of Unsupervised Learning


Again, suppose there is a basket filled with some fresh fruits, and the task is to arrange the same
type of fruits in one place.
This time there is no information about those fruits beforehand; it is the first time that the fruits are
being seen or discovered.
So how do we group similar fruits without any prior knowledge about them?
First, any physical characteristic of a particular fruit is selected. Suppose color.
Then the fruits are arranged on the basis of color. The groups will be something as shown below:
RED COLOR GROUP: apples & cherries.
GREEN COLOR GROUP: bananas & grapes.
Now, take another physical characteristic, say size, so the groups will be something like this:
RED COLOR AND BIG SIZE: apple.
RED COLOR AND SMALL SIZE: cherries.
GREEN COLOR AND BIG SIZE: bananas.
GREEN COLOR AND SMALL SIZE: grapes.
The job is done!
Here, there is no need to know or learn anything beforehand. That means no training data and no
response variable. This type of learning is known as Unsupervised Learning.
Types of Unsupervised Learning Algorithm:
Unsupervised learning algorithms can be further categorized into two types of problems; in other
words, unsupervised learning uses two techniques:

1. Clustering: Clustering is a method of grouping objects into clusters such that objects with the
most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them as
per the presence and absence of those commonalities.
2. Association: An association rule is an unsupervised learning method which is used for finding
relationships between variables in a large database. It determines the set of items that occur
together in the dataset. Association rules make marketing strategy more effective; for example, people
who buy item X (say, bread) also tend to purchase item Y (butter/jam). A typical example
of an association rule is Market Basket Analysis.
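
To make clustering concrete, here is a minimal sketch assuming scikit-learn is available; the fruits and their (color, size) feature values are invented for illustration, echoing the color/size grouping in the fruit example above.

import numpy as np
from sklearn.cluster import KMeans

# feature 1: color code (0 = red, 1 = green); feature 2: size in cm (illustrative values)
fruits = np.array([
    [0, 8],   # apple
    [0, 2],   # cherry
    [1, 18],  # banana
    [1, 2],   # grape
    [0, 7],   # another apple
    [1, 17],  # another banana
])

# Ask for 4 clusters; with no labels given, the algorithm groups the points by similarity.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
print(kmeans.fit_predict(fruits))  # cluster index assigned to each fruit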

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

 K-means clustering
 KNN (k-nearest neighbors)
 Hierarchical clustering
 Anomaly detection
 Neural Networks
 Principal Component Analysis
 Independent Component Analysis
 Apriori algorithm
 Singular value decomposition

Advantages of Unsupervised Learning

 Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
 Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled
data.

Disadvantages of Unsupervised Learning

 Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.
 The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.

Reinforcement Learning

 Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns


to behave in an environment by performing the actions and seeing the results of actions. For each
good action, the agent gets positive feedback, and for each bad action, the agent gets negative
feedback or penalty.

 In Reinforcement Learning, the agent learns automatically using feedback, without any labeled
data, unlike supervised learning.

 Since there is no labeled data, the agent is bound to learn from its experience only.

 RL solves a specific type of problem where decision making is sequential, and the goal is long-
term, such as game-playing, robotics, etc.
 The agent interacts with the environment and explores it by itself. The primary goal of an agent in
reinforcement learning is to improve the performance by getting the maximum positive rewards.

 The agent learns through the process of trial and error, and based on the experience, it learns to perform
the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine
learning method where an intelligent agent (computer program) interacts with the environment
and learns to act within it." How a robotic dog learns the movement of its arms is an example
of Reinforcement Learning.
 It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement
learning. Here we do not need to pre-program the agent, as it learns from its own experience
without any human intervention.

Example 1:

Consider an example of a child trying to take his/her first steps. What will be the instructions he/she
follows to start walking?
 Observing others walking and trying to replicate the same
 Standing still
 Remaining still
 Trying to balance the body weight, along with deciding which foot to advance first to start
walking. It sounds like a difficult and challenging task for a child to get up and walk, right? But for
us, it is easy since we have become used to it over time.

Now, putting it together, a child is an agent who is trying to manipulate the environment (the surface or
floor) by trying to walk, going from one state to another (taking a step). A child gets a
reward (appreciation) when he/she takes a few steps, but will not receive any reward or
appreciation if he/she is unable to walk. This is a simplified description of a reinforcement learning
problem.

Example 2: The problem is as follows: We have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following image
explains the problem more easily.

 The above image shows the robot, diamond, and fire. The goal of the robot is to get the reward,
that is the diamond, and avoid the hurdles, that is the fire. Suppose there is an AI agent present
within a maze environment, and its goal is to find the diamond. The agent interacts with the
environment by performing some actions, and based on those actions, the state of the agent
gets changed, and it also receives a reward or penalty as feedback.
 The agent continues doing these three things (take action, change state/remain in the same
state, and get feedback), and by doing these actions, it learns and explores the environment.
 The agent learns which actions lead to positive feedback (rewards) and which actions lead
to negative feedback (penalty). As a positive reward, the agent gets a positive point, and as a
penalty, it gets a negative point.
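
To make this action/state/feedback loop concrete, below is a tiny sketch in plain Python of Q-learning, a standard reinforcement learning algorithm used here only for illustration (the text does not prescribe a specific algorithm); the corridor world, rewards and hyperparameters are all invented assumptions.

import random

# States 0..4 form a corridor: state 0 is the "fire" (penalty -1),
# state 4 is the "diamond" (reward +1); the agent starts in the middle.
n_states, actions = 5, [-1, +1]          # actions: move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(200):
    state = 2
    while state not in (0, n_states - 1):          # stop at fire or diamond
        if random.random() < epsilon:
            action = random.choice(actions)                      # explore
        else:
            action = max(actions, key=lambda a: Q[(state, a)])   # exploit
        next_state = state + action
        reward = 1 if next_state == n_states - 1 else (-1 if next_state == 0 else 0)
        # Q-learning update: learn from the reward plus the best estimated future value.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the learned values should favour moving right, towards the diamond.
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(1, n_states - 1)})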

Types of Reinforcement learning

There are mainly two types of reinforcement learning, which are:


o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement:

The positive reinforcement learning means adding something to increase the tendency that expected
behavior would occur again. It impacts positively on the behavior of the agent and increases the strength
of the behavior.

This type of reinforcement can sustain the changes for a long time, but too much positive reinforcement
may lead to an overload of states that can reduce the consequences.

Negative Reinforcement:

Negative reinforcement learning is the opposite of positive reinforcement, as it increases the
tendency that the specific behaviour will occur again by avoiding a negative condition.
It can be more effective than positive reinforcement, depending on the situation and behaviour, but it
provides reinforcement only to meet the minimum behaviour.


Difference between Supervised vs Unsupervised vs Reinforcement Learning

Criteria | Supervised Learning | Unsupervised Learning | Reinforcement Learning
Definition | The machine learns by using labeled data | The machine is trained on unlabeled data without any guidance | An agent interacts with its environment by performing actions and learning from errors or rewards
Type of Problems | Regression and Classification | Association and Clustering | Reward based
Type of Data | Labeled data | Unlabeled data | No predefined data
Training | External supervision | No supervision | No supervision
Approach | Maps labeled inputs to known outputs | Understands patterns and discovers the output | Follows the trial-and-error method
Applications | Fraud Detection, Email Spam Detection, Image Classification, Diagnostics, Score Prediction | Text mining, Face Recognition, Image Recognition, Big Data Visualization | Gaming, Inventory management, Finance Sector, Robot navigation

Hypothesis Space (H):
The hypothesis space is the set of all possible legal hypotheses. This is the set from which the machine
learning algorithm determines the best possible hypothesis (only one) that would best describe the target
function or the outputs.

Hypothesis:
A hypothesis is a function that best describes the target in supervised machine learning. The
hypothesis that an algorithm comes up with depends upon the data and also upon the
restrictions and bias that we have imposed on the data.
Supervised machine learning is often described as the problem of approximating a target function that
maps inputs to outputs.

This description is characterized as searching through and evaluating candidate hypotheses from the
hypothesis space.

A hypothesis in machine learning:

1. Covers the available evidence: the training dataset.

2. Is falsifiable (kind of): a test harness is devised beforehand and used to estimate performance
and compare it to a baseline model to see if it is skillful or not.

3. Can be used in new situations: make predictions on new data.

Review of Hypothesis

We can summarize the three definitions again as follows:


 Hypothesis in Science: Provisional explanation that fits the evidence and can be confirmed or
disproved.
 Hypothesis in Statistics: Probabilistic explanation about the presence of a relationship between
observations.
 Hypothesis in Machine Learning: Candidate model that approximates a target function for mapping
examples of inputs to outputs.

Inductive Bias:
The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions
that the learner uses to predict outputs given inputs that it has not encountered.
In machine learning, one aims to construct algorithms that are able to learn to predict a certain
target output.

In machine learning, the term inductive bias refers to a set of (explicit or implicit) assumptions made
by a learning algorithm in order to perform induction, that is, to generalize a finite set of observations
(training data) into a general model of the domain. Without a bias of that kind, induction would not be
possible, since the observations can normally be generalized in many ways. If all these
possibilities were treated equally, i.e., without any bias in the sense of a preference for specific types of
generalization (reflecting background knowledge about the target function to be learned), predictions
for new situations could not be made.

An inductive bias encodes some prior assumptions about the task. No single bias is best on all problems,
and there have been many research efforts to automatically discover a suitable inductive bias.

The following is a list of common inductive biases in machine learning algorithms.

Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to
maximize conditional independence. This is the bias used in the Naive Bayes classifier.

Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis
with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no
free lunch" theorems show that cross-validation must be biased.

Maximum margin: when drawing a boundary between two classes, attempt to maximize the width
of the boundary. This is the bias used in support vector machines. The assumption is that distinct
classes tend to be separated by wide boundaries.

Minimum description length: when forming a hypothesis, attempt to minimize the length of the
description of the hypothesis. The assumption is that simpler hypotheses are more likely to be true.
See Occam's razor.
Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is
the assumption behind feature selection algorithms.

Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to
the same class. Given a case for which the class is unknown, guess that it belongs to the same class as
the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm.
The assumption is that cases that are near each other tend to belong to the same class.

Cross-validation:

 Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate
it using the complementary subset of the dataset.
 Cross-validation is a resampling procedure used to evaluate machine learning models on a limited
data sample. The procedure has a single parameter called k that refers to the number of groups that
a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.

 Cross-validation is a statistical technique for testing the performance of a Machine Learning model.
In particular, a good cross-validation method gives us a comprehensive measure of our model's
performance throughout the whole dataset.

All cross-validation methods follow the same basic procedure:

(1) Divide the dataset into 2 parts: training and testing

(2) Train the model on the training set

(3) Evaluate the model on the testing set

(4) Optionally, repeat steps 1 to 3 for a different split of data points

More thorough cross-validation methods will include step 4, since such a measurement is more
robust to the biases that may come with selecting a particular split. Bias that comes from selecting a
particular part of the data is known as Selection Bias.

Such methods will take more time since the model will be trained and validated multiple times. But it
does offer the significant advantage of being more thorough as well as having the chance to potentially
find a split that squeezes out that last bit of accuracy.

Cross-validation Technique:

1. Holdout

Holdout cross validation is the simplest and most common. We simply split the data into two sets:
train and test. The train and test data must not have any of the same data points. Generally, this split
will be close to 85% of the data for training and 15% of the data for testing. The diagram below
illustrates how holdout cross validation would work.

The advantage of using a very simple holdout cross validation is that we only need to train one model.
If it performs well enough, we can go ahead and use it in whatever application we intended to. This is
perfectly suitable as long as your dataset is relatively uniform in terms of distribution and “difficulty.”

The danger and disadvantage of holdout cross validation arises when the dataset is not completely
even. In splitting our dataset we may end up splitting it in such a way that our training set is very
different from the test, or easier, or harder.

Thus, the single test that we perform with holdout isn’t comprehensive enough to properly evaluate
our model. We end up with bad things like overfitting or inaccurately measuring our model’s projected
real-world performance.
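
A minimal holdout sketch, assuming scikit-learn is available and using its built-in Iris data as a stand-in dataset; the 85/15 split follows the proportion mentioned above, and the model choice is arbitrary.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# 85% of the data for training, 15% for testing; train and test share no data points.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)

model = LogisticRegression(max_iter=500).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))  # a single model, a single evaluation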

2. K-Fold Cross Validation

 In this method, we split the dataset into k subsets (known as folds), then we perform
training on k-1 of the subsets and leave one subset out for the evaluation of the trained model. In
this method, we iterate k times, with a different subset reserved for testing each time.
 This technique involves randomly dividing the dataset into k groups, or folds, of approximately
equal size. The first fold is kept for testing and the model is trained on the other k-1 folds.
 The process is repeated k times, and each time a different fold, i.e. a different group of data points, is
used for validation.

Example
The table below shows the training subsets and evaluation subsets generated in k-
fold cross-validation. Here, we have a total of 25 instances. In the first iteration we use the first 20 percent of
the data for evaluation and the remaining 80 percent for training (observations [0-4] for testing and
[5-24] for training), while in the second iteration we use the second subset of 20 percent for evaluation
and the remaining four subsets for training (observations [5-9] for testing and [0-4] plus [10-24] for training), and so on.

Total instances: 25
Value of k :5
No. Iteration Training set observations Testing set observations
1. [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4]
2. [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9]
3. [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14]
4. [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19]
5. [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24]
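
The same 5-fold split of 25 instances can be reproduced with scikit-learn's KFold (the library is an assumption; the text does not name one). With shuffling disabled, the folds are consecutive blocks of indices, exactly as in the table above.

import numpy as np
from sklearn.model_selection import KFold

data = np.arange(25)                      # observations 0..24, as in the table
kfold = KFold(n_splits=5, shuffle=False)

for i, (train_idx, test_idx) in enumerate(kfold.split(data), start=1):
    print(i, "train:", train_idx, "test:", test_idx)

# In practice, each iteration fits the model on data[train_idx], evaluates it on
# data[test_idx], and the k scores are then averaged into one estimate.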

A related method, repeated random sub-sampling (repeating a random train-test split several times), has a
clear advantage over K-Fold: the proportion of the train-test split is not
dependent on the number of iterations. We can even set different percentages for the test set on each
iteration if we want to. Randomization may also be more robust to selection bias.

The disadvantage of that method is that some points may never be selected to be in the test subset at
all, while at the same time some points might be selected multiple times. This is a direct result of the
randomization. With K-Fold, by contrast, there is a guarantee that all points will at some time be tested on.
Advantages of K-fold or 10-fold cross-validation
 Computation time is reduced, as we repeat the process only 10 times when the value of k is 10.
 Reduced bias.
 Every data point gets to be tested exactly once and is used in training k-1 times.
 The variance of the resulting estimate is reduced as k increases.
Disadvantages of K-fold or 10-fold cross-validation
 The training algorithm is computationally intensive, as the algorithm has to be rerun from scratch
k times.

3. LOOCV (Leave One Out Cross Validation):

In this method, we perform training on the whole dataset but leave out only one data point of the
available dataset, and we then iterate this for each data point. It has some advantages as well as
disadvantages.
Leave-one-out cross-validation is a special case of cross-validation where the number of folds equals
the number of instances in the data set. Thus, the learning algorithm is applied once for each
instance, using all other instances as a training set and using the selected instance as a single-
item test set.

The major drawback of this method is that it leads to higher variation in the testing model as we are
testing against one data point. If the data point is an outlier it can lead to higher variation.
Advantages of Leave-one-out cross-validation
 An advantage of using this method is that we make use of all data points, and hence it has low bias.
 It is a good way to validate.

Disadvantages of Leave-one-out cross-validation

It takes a lot of execution time, as it iterates over "the number of data points" times (i.e. high
computation time).
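
A minimal LOOCV sketch, assuming scikit-learn is available; the dataset and the classifier choice (k-nearest neighbors) are arbitrary and only for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# One fit per data point: each point takes a turn as the single-item test set.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print("Number of fits:", len(scores))   # equals the number of instances
print("LOOCV accuracy:", scores.mean())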

Advantages of cross-validation:
1. More accurate estimate of out-of-sample accuracy.
2. More “efficient” use of data as every observation is used for both training and testing.

Application of cross-validation:

(a) You can use cross-validation to compare the performances of a set of predictive modelling
procedures.
(b) It has excellent use in the field of medical research. Consider that we use the expression levels of
a certain number of proteins, say 15, for predicting whether a cancer patient will respond to a specific
drug.
The ideal way is to determine which subset of the 15 features produces the best predictive model.
Using cross-validation, you can determine the exact subset that provides the best results.
(c) Recently, data analysts have used cross-validation in the field of medical statistics. These
procedures are useful in meta-analysis.

Linear Regression:

 Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
supervised learning algorithm.
 It is a statistical method that is used for predictive analysis. Linear regression makes predictions
for continuous/real or numeric variables such as sales, salary, age, product price, etc.
 The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or
more independent variables (x), hence it is called linear regression. Since linear regression shows a
linear relationship, it finds how the value of the dependent variable changes
according to the value of the independent variable.
 It is a commonly used type of predictive analysis. It is a statistical approach for modelling
relationship between a dependent variable and a given set of independent variables.
 Linear regression is a linear model, e.g. a model that assumes a linear relationship between the
input variables (x) and the single output variable (y). More specifically, that y can be calculated
from a linear combination of the input variables (x).
 Different techniques can be used to prepare or train the linear regression equation from data, the
most common of which is called Ordinary Least Squares. It is common to therefore refer to a
model prepared this way as Ordinary Least Squares Linear Regression or just Least Squares
Regression

 The linear regression model provides a sloped straight line representing the relationship between
the variables. Consider the below image:
Mathematically, we can represent a linear regression as:

Y = b0 + b1X + ε
or Y = mX + b

Here,

Y= Dependent Variable (Target Variable or label to data i.e output data)


X= Independent Variable (predictor Variable or input Training Data)
b0= intercept of the line (Gives an additional degree of freedom)
b1 = Linear regression coefficient (scale factor to each input value).
ε = random error (For a good model it will be negligible)

The values for x and y variables are training datasets for Linear Regression model representation.

Once we find the best b0 and b1 values, we get the best-fit line. So when we finally use our
model for prediction, it will predict the value of y for an input value of x.
The general equation for linear regression is y = mx + b. In both forms, the value on the left
is always the dependent variable, which equals the independent variable multiplied by the slope
(m) plus the intercept (b).

Example: The most common use of linear regression is to predict results for a given data set. For
example, suppose we have 3 houses with sizes of 400, 800, and 1200 square feet respectively, and
the costs of these three are 100, 200, and 300 dollars. Say we want to buy a house that has a size
of 600 square feet and we want to know its price. The price will most likely be between
100 and 200, and in fact will be 150: since 600 square feet lies halfway between 400
square feet and 800 square feet, which cost 100 and 200 respectively, a 600 square foot house
will have a price of exactly 150 dollars on the fitted line.
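
A small sketch of this house-price example, assuming scikit-learn is available; the three (size, price) pairs are taken directly from the text, and the fit is an ordinary least squares linear regression.

import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[400], [800], [1200]])   # square feet (independent variable X)
prices = np.array([100, 200, 300])         # dollars (dependent variable Y)

model = LinearRegression().fit(sizes, prices)       # ordinary least squares fit
print("slope m:", model.coef_[0])                   # 0.25 dollars per square foot
print("intercept b:", model.intercept_)             # 0.0 for this perfectly linear data
print("price for 600 sq ft:", model.predict([[600]])[0])   # 150.0, as argued above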

There are two types of linear regression.

1. Simple Linear Regression:


 One of the most interesting and common regression techniques is simple linear regression. If a
single independent variable is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Simple Linear Regression.
 It is a statistical method that allows us to summarize and study relationships between two
continuous (quantitative) variables. One variable denoted x is regarded as an independent variable
and other one denoted y is regarded as a dependent variable. It is assumed that the two variables
are linearly related. Hence, we try to find a linear function that predicts the response value(y) as
accurately as possible as a function of the feature or independent variable(x).

2. Multiple Linear Regression:

 Multiple linear regression (MLR/multiple regression) is a statistical technique. It can use


several variables to predict the outcome of a different variable.

 If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Linear Regression Line
A straight line showing the relationship between the dependent and independent variables is called
a regression line. A regression line can show two types of relationship:

 Positive Linear Relationship:


If the dependent variable increases on the Y-axis and independent variable increases on X-axis,
then such a relationship is termed as a Positive linear relationship.

 Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable increases on the X-
axis, then such a relationship is called a negative linear relationship.

Advantages of Linear Regression

 Linear Regression performs well when the dataset is linearly separable. We can use it to find the
nature of the relationship among the variables.

 Linear Regression is easier to implement, interpret and very efficient to train.

 Linear Regression is prone to over-fitting, but this can be easily avoided using dimensionality
reduction techniques, regularization (L1 and L2) techniques, and cross-validation.

Disadvantages of Linear Regression


 Main limitation of Linear Regression is the assumption of linearity between the dependent
variable and the independent variables. In the real world, the data is rarely linearly separable. It
assumes that there is a straight-line relationship between the dependent and independent variables
which is incorrect many times.

 Prone to noise and overfitting: If the number of observations is less than the number of
features, Linear Regression should not be used, otherwise it may lead to overfitting, because it starts
considering noise in this scenario while building the model.

 Prone to outliers: Linear regression is very sensitive to outliers (anomalies). So, outliers should
be analyzed and removed before applying Linear Regression to the dataset.

 Prone to multicollinearity: Before applying Linear Regression, multicollinearity should be
removed (using dimensionality reduction techniques), because the model assumes that there is no
relationship among the independent variables.

Decision Trees:

 Decision Tree is a Supervised learning technique that can be used for both Classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent
the decision rules and each leaf node represents the outcome.
 In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make decisions and have multiple branches, whereas Leaf nodes are the outputs of
those decisions and do not contain any further branches.
 It is a graphical representation for getting all the possible solutions to a problem/decision based
on given conditions.
 It is called a decision tree because, similar to a tree, it starts with the root node, which expands into
further branches and constructs a tree-like structure.

Why use Decision Trees?


There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset
and problem is the main point to remember while creating a machine learning model. Below are the
two reasons for using the Decision tree:
 Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand.
 The logic behind the decision tree can be easily understood because it shows a tree-like structure.

How does the Decision Tree algorithm Work?


In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root node
of the tree. This algorithm compares the values of root attribute with the record (real dataset) attribute
and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and
moves further. It continues this process until it reaches a leaf node of the tree. The complete process
can be better understood using the below algorithm:
 Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
 Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
 Step-3: Divide S into subsets that contain possible values for the best attribute.
 Step-4: Generate the decision tree node, which contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where you cannot further classify the nodes; call
the final node a leaf node.

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should
accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office)
and one leaf node based on the corresponding labels. The next decision node further splits into
one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf
nodes (Accepted offer and Declined offer). Consider the below diagram:

Types of decision tree algorithms are:-

 Iterative Dichotomiser 3 (ID3): This algorithm uses Information Gain to decide which attribute
is to be used to classify the current subset of the data. For each level of the tree, information gain is
calculated for the remaining data recursively. It uses the Entropy function and Information Gain as
metrics.

 C4.5: This algorithm is the successor of the ID3 algorithm. This algorithm uses either Information
gain or Gain ratio to decide upon the classifying attribute. It is a direct improvement from the ID3
algorithm as it can handle both continuous and missing attribute values.

 Classification and Regression Tree (CART): It is a dynamic learning algorithm which can
produce a regression tree as well as a classification tree depending upon the dependent variable.
In classification trees (Yes/No types) the decision variable is categorical; in regression trees
(continuous data types) the decision or outcome variable is continuous, e.g. a number
like 123.

There are many algorithms for constructing Decision Trees, but one of the best known is the
ID3 Algorithm. ID3 stands for Iterative Dichotomiser 3. The most important part of a Decision
Tree algorithm is deciding the best attribute; "best" here means the attribute with the highest
information gain. Before discussing the ID3 algorithm, we'll go through a few definitions.

Entropy:

 Entropy is the measure of impurity, disorder or uncertainty in a bunch of examples.


 Entropy controls how a Decision Tree decides to split the data. It actually affects how
a Decision Tree draws its boundaries.
 Entropy, also called Shannon Entropy and denoted by H(S) for a finite set S, is the measure of
the amount of uncertainty or randomness in data. In the machine learning sense, and especially in
this case, entropy is a measure of the homogeneity of the data. Its value ranges from 0 to 1. It
is close to 0 if all the examples belong to the same class, and close to 1 if there is an almost
equal split of the data into different classes.

Now the formula to calculate entropy is:

H(S) = − Σ (i = 1 to c) pi log2(pi)

Here pi represents the proportion of the data with the i-th classification and c represents the number of
different classes. A short code sketch of this calculation is given below.
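As a rough illustration (an addition, not from the notes), the entropy formula above can be computed in plain Python as follows; the label lists are invented toy examples:

import math
from collections import Counter

def entropy(labels):
    # H(S) = - sum over classes i of p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(['yes', 'yes', 'no', 'no']))    # 1.0: an equal split is maximally impure
print(entropy(['yes', 'yes', 'yes', 'yes']))  # 0.0 (printed as -0.0): a pure set has no impurity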

Information Gain:

 Information gain, also called Kullback-Leibler divergence and denoted by IG(S, A) for a set S, is
the effective change in entropy after deciding on a particular attribute A. Information gain is the
measurement of the change in entropy after the segmentation of a dataset based on an attribute.
 It calculates how much information a feature provides us about a class.
 According to the value of information gain, we split the node and build the decision tree.
 A decision tree algorithm always tries to maximize the value of information gain, and a
node/attribute having the highest information gain is split first. It can be calculated using the below
formula:

The formula to calculate the Gain obtained by splitting the dataset 'S' on the attribute 'A' is:

IG(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)

Alternatively,

IG(S, A) = H(S) − H(S | A)

Here Entropy(S) represents the entropy of the dataset, and the second term on the right is the weighted
entropy of the different subsets obtained after the split. Now the goal is to maximize this
information gain. The attribute which has the maximum information gain is selected as the parent
node, and the data is successively split on that node. A sketch of this calculation in code follows.
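Continuing the illustrative sketch (it reuses the entropy function defined above; the salary/decision values are invented toy data), information gain can be computed as:

def information_gain(labels, attribute_values):
    # IG(S, A) = Entropy(S) - sum over values v of A of (|S_v| / |S|) * Entropy(S_v)
    n = len(labels)
    weighted_entropy = 0.0
    for v in set(attribute_values):
        subset = [labels[i] for i in range(n) if attribute_values[i] == v]
        weighted_entropy += (len(subset) / n) * entropy(subset)
    return entropy(labels) - weighted_entropy

labels = ['accept', 'accept', 'decline', 'decline']   # decision for each example
salary = ['high', 'high', 'low', 'low']               # value of the Salary attribute
print(information_gain(labels, salary))               # 1.0: Salary perfectly separates the classes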

Working of the Iterative Dichotomiser 3 (ID3) Algorithm: it performs the following tasks recursively

 Create root node for the tree


 If all examples are positive, return leaf node ‘positive’
 Else if all examples are negative, return leaf node ‘negative’
 Calculate the entropy of current state H(S)
 For each attribute, calculate the entropy with respect to the attribute ‘x’ denoted by H(S, x)
 Select the attribute which has maximum value of IG(S, x)
 Remove the attribute that offers highest IG from the set of attributes
 Repeat until we run out of all attributes, or the decision tree has all leaf nodes.

Advantages of the Decision Tree


 It is simple to understand, as it follows the same process which a human follows while making
any decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes for a problem.
 There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

 The decision tree contains lots of layers, which makes it complex.


 It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
 For more class labels, the computational complexity of the decision tree may increase.

Overfitting:

 A statistical model is said to be overfitted when we train it with a lot of data (just like fitting
ourselves in oversized pants!). When a model gets trained with so much data, it starts learning
from the noise and inaccurate data entries in our data set. Then the model does not categorize the data
correctly, because of too many details and noise. Overfitting is often caused by non-parametric and
non-linear methods, because these types of machine learning algorithms have more freedom in
building the model based on the dataset and can therefore build unrealistic models.
 Overfitting refers to a model that models the training data too well. Overfitting happens when a
model learns the detail and noise in the training data to the extent that it negatively impacts the
performance of the model on new data.
 This means that when a model fits more data than it needs, it starts catching the noisy data and
inaccurate values in the data; as a result, the efficiency and accuracy of the model decrease.

For example, decision trees are a nonparametric machine learning algorithm that is very flexible and
is subject to overfitting training data. This problem can be addressed by pruning a tree after it has
learned in order to remove some of the detail it has picked up

The line seen in the image above can give a very efficient outcome for a new data point. In the case
of overfitting, when we run the training algorithm on the data set, we allow the cost to keep reducing
with each iteration.

How to Avoid Overfitting In Machine Learning?


There are several techniques to avoid overfitting in Machine Learning altogether listed below.
1. Cross-Validation
2. Training With More Data
3. Removing Features
4. Early Stopping
5. Regularization
6. Ensembling

1. Cross-Validation

One of the most powerful techniques to avoid/prevent overfitting is cross-validation. The idea behind this
is to use the initial training data to generate mini train-test splits, and then use these splits to tune your
model.

In standard k-fold cross-validation, the data is partitioned into k subsets, also known as folds. The
algorithm is then trained iteratively on k−1 folds while using the remaining fold as the test set, also known
as the holdout fold.

The cross-validation helps us to tune the hyperparameters with only the original training set. It
basically keeps the test set separately as a true unseen data set for selecting the final model. Hence,
avoiding overfitting altogether.
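A small sketch of k-fold cross-validation with scikit-learn (assumed installed); the Iris dataset, the decision-tree model and k=5 are arbitrary illustrative choices, not part of the original notes:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3)

# 5-fold CV: train on 4 folds, evaluate on the held-out fold, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())   # average accuracy and its spread across folds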

2. Training with More Data


This technique might not work every time, as we have also discussed in the example above; however,
training with a significantly larger amount of data generally helps the model. It basically helps the
model in identifying the signal better.
But in some cases, the increased data can also mean feeding more noise to the model. When we are
training the model with more data, we have to make sure the data is clean and free from randomness
and inconsistencies.

3. Removing Features
Some algorithms have automatic feature selection built in. For the many algorithms that
do not have built-in feature selection, we can manually remove a few irrelevant features from
the input features to improve the generalization.
One way to do it is by deriving a conclusion as to how a feature fits into the model. It is quite similar
to debugging the code line-by-line.
In case a feature is unable to explain its relevancy in the model, we can simply identify and remove those features.
We can even use a few feature selection heuristics for a good starting point.

4. Early Stopping

When the model is training, you can actually measure how well the model performs with each
iteration. We can keep training up to the point where further iterations stop improving the model's performance.
After this point, the model overfits the training data, as the generalization weakens with each additional iteration.

5. Regularization

It basically means artificially forcing your model to be simpler by using a broader range of techniques.
It totally depends on the type of learner that we are using. For example, we can prune a decision tree,
use dropout on a neural network, or add a penalty parameter to the cost function in regression.

Quite often, regularization is a hyperparameter as well. This means it can also be tuned through cross-
validation.
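For example, L2 regularization in regression can be added with scikit-learn's Ridge estimator; this is only an illustrative sketch on random toy data, and alpha (the penalty strength) would normally be tuned by cross-validation as noted above:

from sklearn.linear_model import Ridge
import numpy as np

X = np.random.rand(20, 5)                                     # 20 samples, 5 features (toy data)
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + 0.1 * np.random.randn(20)

model = Ridge(alpha=1.0)   # alpha controls the strength of the L2 penalty
model.fit(X, y)
print(model.coef_)         # coefficients are shrunk toward zero compared with plain least squares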

6. Ensembling

This technique basically combines predictions from different Machine Learning models. Two of the
most common methods for ensembling are listed below:

 Bagging attempts to reduce the chance of overfitting complex models


 Boosting attempts to improve the predictive flexibility of simpler models

Even though they are both ensemble methods, the two approaches start from opposite directions.
Bagging uses complex base models and tries to smooth out their predictions, while boosting uses
simple base models and tries to boost their aggregate complexity.

(Subject In-charge)
(Prof.S.B.Mehta)

Instance Based Learning
 Instance: An instance is an example in the training data. An instance is described by a number of
attributes. One attribute can be a class label. Attribute/Feature: An attribute is an aspect of an
instance (e.g. temperature, humidity). Attributes are often called features in Machine Learning
 In machine learning, instance-based learning (sometimes called memory-based learning) is a family
of learning algorithms that, instead of performing explicit generalization, compares new problem
instances with instances seen in training, which have been stored in memory.
 Instance-based methods are sometimes referred to as lazy learning methods because they delay
processing until a new instance must be classified.
 Also known as memory-based learning, instance-based learning is a supervised classification
learning approach that performs its operation by comparing the current instance with previously
trained instances, which have been stored in memory. Its name derives from the fact that it builds its
hypotheses directly from the training data instances.
 Time complexity of Instance based learning algorithm depends upon the size of training data. Time
complexity of this algorithm in worst case is O (n), where n is the number of training items to be
used to classify a single new instance.
 To improve the efficiency of the instance-based learning approach, a preprocessing phase is required.
The preprocessing phase builds a data structure that enables efficient run-time modeling of test
instances.
 Advantage of using Instance based learning over others is that it has the ability to adapt to
previously unseen data, which means that one can store a new instance or drop the old instance.

Example: Spam Email: The system uses email features to measure the similarity between two mails. A similarity
measure between two emails could be to count the number of words they have in common. The system would flag
an email as spam if it has many words in common with a known spam email.

Technique of Instance Based Learning


1. K-Nearest Neighbor Learning
2. Locally Weighted Regression
3. Case-Based Reasoning

1.K-Nearest neighbor Learning


● K-Nearest Neighbors is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.

● K-NN algorithm assumes the similarity between the new case/data and available cases and put the
new case into the category that is most similar to the available categories.

● K-NN algorithm stores all the available data and classifies a new data point based on similarity.
This means when new data appears it can be easily classified into a well-suited category by using
the K-NN algorithm.

● K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.

● K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying
data.

● It is also called a lazy learner algorithm because it does not learn from the training set immediately
instead it stores the dataset and at the time of classification, it performs an action on the dataset.

● KNN algorithm at the training phase just stores the dataset and when it gets new data, then it
classifies that data into a category that is much similar to the new data.

● Example: Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to
know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works
on a similarity measure. Our KNN model will find the features of the new data set that are similar to the cat
and dog images, and based on the most similar features it will put it in either the cat or the dog category.

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

 Step-1: Select the number K of the neighbors

 Step-2: Calculate the Euclidean distance of K number of neighbors

 Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

 Step-4: Among these k neighbors, count the number of the data points in each category.

 Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.

 Step-6: Our model is ready.
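These steps correspond to what scikit-learn's KNeighborsClassifier does internally; the following sketch (an illustration, not from the notes) uses made-up two-dimensional points for Category A and Category B:

from sklearn.neighbors import KNeighborsClassifier

# Toy training data: points labelled Category A (0) and Category B (1)
X_train = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=5)   # Step-1: choose K = 5
knn.fit(X_train, y_train)                   # lazy learner: this just stores the data

# Steps 2-5 happen at prediction time: distances, nearest neighbours, majority vote
print(knn.predict([[3, 4]]))                # -> [0], i.e. Category A (3 of its 5 neighbours are A)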

Example:

Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1,
so this data point will lie in which of these categories. To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular dataset.

Consider the below diagram:

Suppose we have a new data point and we need to put it in the required category.
Consider the below image:

Firstly, we will choose the number of neighbors, so we will choose the k=5.
 Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is
the distance between two points, which we have already studied in geometry. For two points (x1, y1)
and (x2, y2) it can be calculated as d = √((x2 − x1)² + (y2 − y1)²).

 By calculating the Euclidean distance, we got the nearest neighbors, as three nearest neighbors in
category A and two nearest neighbors in category B. Consider the below image:

 As we can see the 3 nearest neighbors are from category A, hence this new data point must belong
to category A.

How to select the value of K in the K-NN Algorithm?


Below are some points to remember while selecting the value of K in the K-NN algorithm:
 There is no particular way to determine the best value for "K", so we need to try some values to find
the best out of them. The most preferred value for K is 5.
 A very low value for K, such as K=1 or K=2, can be noisy and lead to the effects of outliers in the
model.
 Large values for K are generally more robust to noise, but too large a value may cause the model to
miss local patterns and increases computation.

Advantages of KNN Algorithm:


 It is simple to implement.
 It is robust to the noisy training data
 It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


 The value of K always needs to be determined, which may be complex at times.
 The computation cost is high because of calculating the distance between the data points for all the
training samples.

2. Locally weighted regression


 Locally weighted regression (LWR) attempts to fit the training data only in a region around the
location of a query example. LWR is a type of lazy learning, therefore the processing of training
data is often postponed until the target value of a query example needs to be predicted
 Locally weighted regression is also called LOESS or LOWESS. It’s inspired by cases when linear
regression, which simply fits a line, isn’t sufficient, but we don’t want to overfit either
 Locally weighted linear regression is a non-parametric algorithm, that is, the model does not learn a
fixed set of parameters as is done in ordinary linear regression
 LWR depends on the distance function used to recover the nearest neighbours of a given query
example. However, the distance function does not need to satisfy the formal mathematical
requirements for a distance metric. LWR enables several ways to use a distance function, for
instance: (I) one distance function is used in all parts of the input space (global distance function),
(II) the parameters of a distance function are set for each query example by an optimization process
(query-based local distance function), or (III) each training example has a distance function and its
corresponding parameter values (point-based local distance function).

The blue dots in the plot are the training data. We have a test point, and we want to predict its target value.
Obviously, fitting one line to this whole dataset will lead to a value that is way off the real one. Let's use this
weighting concept and only look at a few nearby points, performing the regression using those nearby points only.

That is significantly better: the predicted value is what we would expect given how the curve looks.
A rough sketch of how this can be implemented is given below.
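A rough NumPy sketch of locally weighted regression (an illustrative addition; one input feature, a Gaussian kernel for the weights, and a hand-picked bandwidth tau are all simplifying assumptions):

import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    # Weight each training point by its closeness to the query point (Gaussian kernel)
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    A = np.vstack([np.ones_like(X), X]).T          # design matrix with an intercept column
    W = np.diag(w)
    # Weighted least squares: solve (A^T W A) theta = A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return theta[0] + theta[1] * x_query

X = np.linspace(0, 10, 50)
y = np.sin(X) + 0.1 * np.random.randn(50)          # noisy non-linear data
print(lwr_predict(3.0, X, y))                      # close to sin(3.0) for a reasonable tau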

3.Case Based Reasoning (CBR) Classifier:


 Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on
the solutions of similar past problems. It deals with very specific data from previous situations
and reuses results and experience to fit a new problem situation.
 CBR is a problem-solving technique that matches a new case against previously solved cases and their
solutions, both of which are stored in a database.
How CBR works?
When a new case arises to classify, a Case-based Reasoner (CBR) will first check if an identical training
case exists. If one is found, then the accompanying solution to that case is returned. If no identical case
is found, then the CBR will search for training cases having components that are similar to those of the
new case. Conceptually, these training cases may be considered as neighbors of the new case. If cases
are represented as graphs, this involves searching for subgraphs that are similar to subgraphs within the
new case. The CBR tries to combine the solutions of the neighboring training cases to propose a
solution for the new case. If incompatibilities arise among the individual solutions, then backtracking to
search for other solutions may be necessary. The CBR may employ background knowledge and
problem-solving strategies to propose a feasible solution.

Case-based reasoning consists of a cycle of the following four steps:
1. Retrieve: Gathering from memory an experience closest to the current problem. i.e. Given a new
case, retrieve similar cases from the case base.
2. Reuse: Suggesting a solution based on that experience and adapting it to meet the demands of the
new situation, i.e. adapt the retrieved cases to fit the new case.
3. Revise: Evaluate the solution and revise it based on how well it works.
4. Retain: Storing this new problem-solving method in the memory system.
If the case retrieved works for the current situation, it should be used. Otherwise, it may need
to be adapted. The revision may involve other reasoning techniques, such as using the proposed
solution as a starting point to search for a solution, or a human could do the adaptation in an
interactive system. The new case and the solution can then be saved if retaining it will help in the
future.

Applications of CBR includes:


1. Problem resolution for customer service help desks, where cases describe product-related diagnostic
problems.
2. It is also applied to areas such as engineering and law, where cases are either technical designs or
legal rulings, respectively.
3. Medical education, where patient case histories and treatments are used to help diagnose and treat
new patients.

Advantages of CBR

 Remembering past experiences helps learners avoid repeating previous mistakes, and the reasoner
can discern what features of a problem are significant and focus on them.
 CBR is intuitive because it reflects how people work. Because no knowledge must be elicited to
create rules or methods, development is easier.
 Systems learn by acquiring new cases through use, which makes both maintenance and further
development easier.

Disadvantages of CBR

 Can take large storage space for all the cases


 Can take large processing time to find similar cases in case-base
 Cases may need to be created by hand
 Adaptation may be difficult
 Needs case-base, case selection algorithm, and possibly case-adaptation algorithm

Recommender System:
A recommender system makes predictions based on users' historical behavior. Specifically, it
predicts user preferences for a set of items based on past experience.
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such web
services, recommender systems have taken more and more place in our lives. From e-commerce
(suggesting to buyers articles that could interest them) to online advertisement (suggesting to users the right
content, matching their preferences), recommender systems are today unavoidable in our daily online
journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users
(items being movies to watch, text to read, products to buy or anything else depending on industries).

The two most popular approaches are:


1. Content-based Recommender System
2. Collaborative Filtering Recommender System.

1. Content-based Recommender System


 Content based filtering algorithms are based on the assumption that users are going to give similar
rating to object with similar objective features.
 A Content-based recommendation system tries to recommend items to users based on their profile.
The user's profile revolves around that user's preferences and tastes. It is shaped based on user
ratings, including the number of times that user has clicked on different items or perhaps even liked
those items.
 Content-based approaches use additional information about users and/or items. If we consider the
example of a movie recommender system, this additional information can be, for example, the age,
the sex, the job or any other personal information for users, as well as the category, the main actors,
the duration or other characteristics of the movies (items).

2. Collaborative Filtering Recommender System:

 Collaborative Filtering is a technique which is widely used in recommendation systems and is a
rapidly advancing research area.
 Collaborative filtering models try to find similarities between items / users through commonly rated
/owned items.
 Collaborative Filtering is the process of filtering or evaluating items using the opinions of other
people. This filtering is done by using profiles. Collaborative filtering techniques collect and
establish profiles, and determine the relationships among the data according to similarity models.
The possible categories of the data in the profiles include user preferences, user behavior patterns, or
item properties.
 For each user, recommender systems recommend items based on how similar users liked the item.
 Example: Alice and Bob are users who have similar interests in video games.
 Collaborative filtering can be seen as an unsupervised learning approach in which we make predictions
from ratings supplied by people. Each row of the ratings matrix represents one person's ratings of the
movies and each column indicates the ratings of one movie.
 Collaborative filtering is a technique that can filter out items that a user might like on the basis of
reactions by similar users.
 It works by searching a large group of people and finding a smaller set of users with tastes similar to
a particular user. It looks at the items they like and combines them to create a ranked list of
suggestions.
 The functionalities of Collaborative filtering recommendations system can be stated as

A. Recommendations and predictions


1) Recommendation
The recommendation functionality displays a list of items to a user. The items are listed in the order of
usefulness to the user. For example, Amazon's recommendation algorithm aggregates items similar to a
user's purchases and ratings without ever computing a predicted rating.
2) Prediction
In prediction, a calculation of the predicted rating is made for a particular item. Prediction is more
demanding than recommendation because, in order to make predictions, the system must be able to say
something about the required item. Some algorithms take advantage of this to be more scalable by
saving memory and computation time.
B. Prediction versus Recommendation
 Prediction and Recommendation tasks place different requirements on a CF system. To recommend
items, information regarding all items is not required.
 To provide predictions for a particular item, information regarding every item, even rarely rated
ones, is required.
 The algorithms used for recommendations have lower memory and computation time requirements
when compared to algorithms used for making predictions.
 Recommendation tasks require calculation of predictions or some scoring function for many (if not
all) items.

Therefore, a single prediction request can afford a more expensive prediction calculation than a
recommendation request.

User and Item based collaborative filtering

Collaborative filtering uses different methods to calculate the similarity between two products or two
users. In an item-based approach, a product is compared to other products. The more similar the
interactions of customers between these two products are, the more they fit together. With the user-
based approach, the same happens, but instead of products, customers are compared with each other.
With the help of the similarity matrix, a predict function can be used to create a predicted rating for each
product with which a customer has not yet interacted. Based on these predicted ratings, products can
then be recommended.

The two most popular collaborative filtering algorithms are categorized as:
1.Memory-based
2.Model-based.

1.Memory-based :
 Memory-based algorithms approach the collaborative filtering problem by using the entire
database. Memory-based techniques use the data (likes, votes, clicks, etc.) that you have to
establish correlations (similarities) between either users (collaborative filtering) or items
(content-based recommendation) in order to recommend an item i to a user u who has never seen it
before.
 Memory-based models calculate the similarities between users / items based on user-item rating
pairs.
 Memory-based recommendation generalizes directly from the stored data at the time a prediction is
made, which is why it is also referred to as lazy learning. In memory-based learning, users are
divided into groups based on their interests. When a new user comes into the system, we determine the
neighbors of that user to make predictions for him. Memory-based recommendation uses the entire
user-item database, or a sample of it, to make predictions.
 The main idea behind UB-CF is that people with similar characteristics share similar taste. For
example, if you are interested in recommending a movie to our friend Bob, suppose Bob and I
have seen many movies together and we rated them almost identically. It makes sense to think
that in future as well we would continue to like similar movies and use this similarity metric to
recommend movies.

The two approaches


User-based:
 In user-based, similar users which have similar ratings for similar items are found and then
target user's rating for the item which target user has never interacted is predicted.
 user based finds similar users and gives them recommendations based on what other people with
similar consumption patterns appreciated.
 The report is focusing on the “nearest neighbors” approach for recommendations, which looks at
the users rating patterns and finds the “nearest neighbors”, i.e users with ratings similar to yours.
The algorithm then proceeds to give you recommendations based on the ratings of these
neighbors.

 For a user U, with a set of similar users determined based on rating vectors consisting of given
item ratings, the rating for an item I, which hasn’t been rated, is found by picking out N users
from the similarity list who have rated the item I and calculating the rating based on these N
ratings.

Item-based:
 Item based collaborative filtering finds similarity patterns between items and recommends them
to users based on the computed information
 Item-based collaborative filtering was introduced in 1998 by Amazon [6]. Unlike user-based
collaborative filtering, item-based filtering looks at the similarity between different items, and
does this by taking note of how many users that bought item X also bought item Y. If the
correlation is high enough, a similarity can be presumed to exist between the two items, and they
can be assumed to be similar to one another. Item Y will from there on be recommended to users
who bought item X, and vice versa.

The picture depicts a graph of how users' ratings affect their recommendations.
Amazon currently uses item-to-item collaborative filtering, which scales to massive data sets and
produces high-quality recommendations in real time. This type of filtering matches each of the user's
purchased and rated items to similar items, then combines those similar items into a recommendation
list for the user
For an item I, with a set of similar items determined based on rating vectors consisting of received user
ratings, the rating by a user U, who hasn’t rated it, is found by picking out N items from the similarity
list that have been rated by U and calculating the rating based on these N ratings.
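A toy sketch of user-based collaborative filtering using cosine similarity (all users, items and ratings below are invented for illustration; a 0 entry means "not rated"):

import numpy as np

# Rows = users (Alice, Bob, Carol), columns = items
ratings = np.array([
    [5, 4, 0, 1],   # Alice has not rated item 2
    [4, 5, 2, 1],   # Bob has tastes similar to Alice
    [1, 0, 5, 4],   # Carol has very different tastes
])

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target_user, target_item = 0, 2   # predict Alice's rating for item 2
neighbours = [u for u in range(len(ratings))
              if u != target_user and ratings[u, target_item] > 0]
weights = np.array([cosine_sim(ratings[target_user], ratings[u]) for u in neighbours])
neighbour_ratings = np.array([ratings[u, target_item] for u in neighbours])

# Similarity-weighted average of the neighbours' ratings for the item
predicted = weights @ neighbour_ratings / weights.sum()
print(predicted)   # closer to Bob's rating (2) than Carol's (5), since Bob is more similar to Alice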

2.Model-based.:
 Model-based recommendation systems involve building a model based on the dataset of ratings.
In other words, we extract some information from the dataset, and use that as a "model" to make
recommendations without having to use the complete dataset every time. This approach
potentially offers the benefits of both speed and scalability.
 Model-based collaborative filtering is a two-stage process for recommendations: in the first stage a
model is learned offline; in the second stage a recommendation is generated for a new user based on the
learned model.
 Model-based techniques on the other hand try to further fill out this matrix. They tackle the task
of “guessing” how much a user will like an item that they did not encounter before. For that they
utilize several machine learning algorithms to train on the vector of items for a specific user,
then they can build a model that can predict the user’s rating for a new item that has just been
added to the system.
 Popular model-based techniques are Bayesian Networks, Singular Value Decomposition, and
Probabilistic Latent Semantic Analysis (or Probabilistic Latent Semantic Indexing). For some
reason, all model-based techniques do not enjoy particularly happy-sounding names.

Features Reduction:
 Feature reduction, also known as dimensionality reduction, is the process of reducing the
number of features in a resource heavy computation without losing important information.
 Reducing the number of features means the number of variables is reduced making the
computer’s work easier and faster.
 In machine learning classification problems, there are often too many factors on the basis of
which the final classification is done. These factors are basically variables called features.
 The higher the number of features, the harder it gets to visualize the training set and then work
on it. Sometimes, most of these features are correlated, and hence redundant. This is where
dimensionality reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set of principal
variables.
Feature reduction can be divided into two processes:

1.Feature selection:
Feature selection is a way of selecting the subset of the most relevant features from the original
features set by removing the redundant, irrelevant, or noisy features.
1. While developing the machine learning model, only a few variables in the dataset are useful for
building the model, and the rest features are either redundant or irrelevant.
2. If we input the dataset with all these redundant and irrelevant features, it may negatively impact and
reduce the overall performance and accuracy of the model.
3. Hence it is very important to identify and select the most appropriate features from the data and
remove the irrelevant or less important features, which is done with the help of feature selection in
machine learning.
4. Choosing the important features for the model is known as feature selection. Each machine learning
process depends on feature engineering, which mainly contains two processes: Feature
Selection and Feature Extraction.
5. The main difference between them is that feature selection is about selecting the subset of the
original feature set, whereas feature extraction creates new features
6. Feature selection is a way of reducing the input variable for the model by using only relevant data in
order to reduce overfitting in the model.
7. "It is a process of automatically or manually selecting the subset of most appropriate and relevant
features to be used in model building.

It usually involves three ways:

1. Filter
1. In the Filter Method, features are selected on the basis of statistical measures. This method does not
depend on the learning algorithm and chooses the features as a pre-processing step.
2. The filter method filters out the irrelevant features and redundant columns from the model by
using different metrics through ranking.
3. The advantage of using filter methods is that they need low computational time and do not
overfit the data.

Some common techniques of Filter methods are as follows:


o Information Gain
o Chi-square Test

o Fisher's Score
o Missing Value Ratio

 Information Gain: Information gain determines the reduction in entropy while transforming the
dataset. It can be used as a feature selection technique by calculating the information gain of each
variable with respect to the target variable.
 Chi-square Test: Chi-square test is a technique to determine the relationship between the
categorical variables. The chi-square value is calculated between each feature and the target
variable, and the desired number of features with the best chi-square value is selected.
 Fisher's Score: Fisher's score is one of the popular supervised techniques for feature selection. It
returns the rank of each variable on Fisher's criterion in descending order. Then we can select the
variables with a large Fisher's score.
 Missing Value Ratio:

The missing value ratio can be used for evaluating each feature against a threshold
value. The missing value ratio is calculated as the number of missing values in a column
divided by the total number of observations. A variable having a ratio greater than the threshold value can be
dropped. A short sketch of a filter method in code is given below.
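As an illustration only (an addition, not from the notes), scikit-learn's SelectKBest can apply the chi-square test described above as a filter method; the Iris dataset and k=2 are arbitrary choices:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)           # 150 samples, 4 features
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)        # keep the 2 features with the best chi-square scores
print(selector.scores_)                     # chi-square score of each original feature
print(X_new.shape)                          # (150, 2)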

2.Wrapper
In the wrapper methodology, selection of features is done by treating it as a search problem, in
which different combinations are made, evaluated, and compared with other combinations. It trains
the algorithm by using subsets of features iteratively. On the basis of the output of the model,
features are added or removed, and the model is trained again with the new feature set.

Some common examples of wrapper methods are forward feature selection, backward
feature elimination, recursive feature elimination, etc.

 Forward Selection: Forward selection is an iterative method in which we start with no
features in the model. In each iteration, we keep adding the feature which best improves our
model, until the addition of a new variable no longer improves the performance of the model.
 Backward Elimination: In backward elimination, we start with all the features and remove the
least significant feature at each iteration, which improves the performance of the model. We
repeat this until no improvement is observed on removal of a feature.
 Recursive Feature elimination: It is a greedy optimization algorithm which aims to find the
best performing feature subset. It repeatedly creates models and keeps aside the best or the worst
performing feature at each iteration. It constructs the next model with the left features until all
the features are exhausted. It then ranks the features based on the order of their elimination.
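Recursive feature elimination as described above is available in scikit-learn as RFE; the sketch below (an illustrative addition) wraps a logistic regression model on the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)                 # repeatedly drops the weakest feature until 2 remain
print(rfe.support_)           # boolean mask of the selected features
print(rfe.ranking_)           # 1 = kept; higher numbers were eliminated earlier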

2. Embedded:
Embedded methods combine the qualities of filter and wrapper methods. They are implemented by
algorithms that have their own built-in feature selection methods.

The key difference between feature selection and extraction is that feature selection keeps a subset of
the original features while feature extraction creates brand new ones.
Top reasons to use feature selection are:
 It enables the machine learning algorithm to train faster.
 It reduces the complexity of a model and makes it easier to interpret.
 It improves the accuracy of a model if the right subset is chosen.
 It reduces overfitting.
2.Feature extraction:
Feature Extraction aims to reduce the number of features in a dataset by creating new features from the
existing ones (and then discarding the original features). This new, reduced set of features should then
be able to summarize most of the information contained in the original set of features.
It commonly uses the following technique:
Principal Component Analysis (PCA):
 Principal Component Analysis (PCA) is a common feature extraction method in data
science. PCA is a dimensionality-reduction method that is
often used to reduce the dimensionality of large data sets by transforming a large set of
variables into a smaller one that still contains most of the information in the large set.
 PCA is a statistical procedure that orthogonally transforms the original n coordinates of a data set
into a new set of n coordinates called principal components.
 PCA is standard tool in modern data analysis in diverse fields from neuroscience to computer
graphics.
 It is very useful method for extracting relevant information from confusing data sets.
 Principal Component Analysis (PCA) is a common feature extraction method in data science.
Technically, PCA finds the eigenvectors of the covariance matrix with the highest eigenvalues and
then uses those to project the data into a new subspace of equal or fewer dimensions. Practically,
PCA converts a matrix of n features into a new dataset of (hopefully) fewer than n features. That
is, it reduces the number of features by constructing a new, smaller number of variables which
capture a significant portion of the information found in the original features.
 The goals of PCA are to identify patterns in data and to detect the correlation between
variables, while attempting to reduce the dimensionality. A short code sketch follows below.
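A short sketch of PCA with scikit-learn (an illustrative addition; the Iris dataset and the choice of 2 components are arbitrary):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 150 samples, 4 features
pca = PCA(n_components=2)                # project onto the top 2 principal components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                   # (150, 2)
print(pca.explained_variance_ratio_)     # share of variance captured by each component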

Advantages of Dimensionality Reduction


 It helps in data compression, and hence reduced storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction


 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some rules of thumb are
applied.

(Subject In-charge)
(Prof.S.B.Mehta)
