
Department of Computer Engineering

Machine Learning (01CE0711)
Sem 7 | 4 Credits

Unit #2

Prof. Ankita Mishra


Course Outcomes
After completion of this course, students will be able to:
• Understand machine-learning concepts. (Understand)
• Understand optimization theory and concepts. (Understand)
• Understand and analyze the different methods of Gradient Descent. (Analyze)
• Apply the concepts of Supervised and Unsupervised Learning. (Apply)
• Apply the concepts of machine learning and optimization in designing intelligent systems. (Apply)
Topics
• Evaluating a Learning Algorithm: Deciding What to Try Next
• Evaluating a Hypothesis; Model Selection and Train/Validation/Test Sets
• Bias vs. Variance: Diagnosing Bias vs. Variance; Regularization and Bias/Variance; Learning Curves
• Building a Spam Classifier: Prioritizing What to Work On; Error Analysis
• Handling Skewed Data: Error Metrics for Skewed Classes; Trading Off Precision and Recall
• Data for Machine Learning
• Support Vector Machines: Large Margin Classification; Optimization Objective; Large Margin Intuition; Kernels
Evaluating a Learning Algorithm: Deciding What to Try Next

Problem Statement:
It can happen that, upon applying a learning algorithm to a problem, the predictions made by the model have unacceptably large errors. There are various options that can potentially improve the performance of the model, and in turn its accuracy, such as:
• Acquiring more training data
• Filtering and reducing the number of features
• Increasing the number of features
• Adding polynomial features
• Decreasing the regularization parameter λ
• Increasing the regularization parameter λ
• It is often hard to decide what is right and what is not when evaluating the effectiveness of an algorithm: which step should one try, among the heap of probable options that could help?
• Given the number of options, it can be cumbersome to decide which path to follow.
• One or more of the listed techniques might simply not work in a given case, and trying them blindly leads to wasted resources.
• Machine learning diagnostics are tests that help gain insight into what would or would not work with a learning algorithm, and hence give guidance on how to improve its performance. They can take time to implement, but are still worth venturing into in times of uncertainty.
Overfitting and Train-Test Split
• In the case of overfitting in linear regression, we have seen that even though the hypothesis has a low error on the training data, it is still inaccurate on new data because it overfits.
• This is where the standard technique of the train-test split comes in handy.
• In this method, the given dataset is split into two sets: a training set and a test set.
• Typically, the training set holds 70% of the data and the test set the remaining 30% (see the sketch below).
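
A minimal sketch of the 70/30 split using scikit-learn's train_test_split (the toy arrays here are illustrative, not from the slides):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix: 10 examples, 2 features
y = np.arange(10)                  # toy targets

# 70% of the rows go to the training set, the remaining 30% to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))   # -> 7 3

train_test_split shuffles the rows before splitting; fixing random_state makes the split reproducible.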
• The training process is then defined as:
➢ Learn Θ by minimizing J(Θ) on the training set.
➢ Compute the error J_test(Θ) on the test set.
• The test set error can be defined as follows. For linear regression, it is the mean-squared error (MSE):

J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\Theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2

For logistic regression, it is the cross-entropy cost function:

J_{test}(\Theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)} \log h_\Theta(x_{test}^{(i)}) + \left(1 - y_{test}^{(i)}\right) \log\left(1 - h_\Theta(x_{test}^{(i)})\right) \right]
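
As a sketch, the linear-regression test error above translates directly into NumPy (the theta, X_test, and y_test values below are made-up illustrations, not from the slides):

import numpy as np

def j_test_linear(theta, X_test, y_test):
    """Mean-squared test error for linear regression:
    J_test = 1/(2*m_test) * sum((h_theta(x) - y)^2)."""
    m_test = len(y_test)
    predictions = X_test @ theta            # h_theta(x) for every test example
    return np.sum((predictions - y_test) ** 2) / (2 * m_test)

# Tiny illustrative example
theta = np.array([1.0, 2.0])                 # intercept and slope
X_test = np.array([[1.0, 0.0], [1.0, 1.0]])  # first column is the bias term
y_test = np.array([1.5, 2.5])
print(j_test_linear(theta, X_test, y_test))  # -> 0.125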

Model Selection: The Issue with the Train/Test Split
Why Train/Validation/Test Splits?
• Suppose we have n models with varying candidate hyperparameters (such as the number of polynomial terms in a regression, or the number of hidden layers and neurons in a neural network), and one is chosen because it reports the lowest test error after training.
• Would it be correct to report this error as an indicator of the generalized performance of the selected model?
• In practice, many practitioners do report this metric as the model's performance, but it is advised against.
• This is because, just as the model parameters were fit to the training samples and hence report a lower error on the training set, the hyperparameters were fit to the test set and would likewise report a deceptively low error on it.
• To overcome this issue, it is recommended to split the dataset into three parts: train, cross-validation (or validation), and test.
• The training set is used to optimize the model parameters; the cross-validation set is then used to select the best model among those with varying hyperparameters.
• Finally, the generalized performance can be calculated on the test set, which is not seen during training or model selection.
• This gives a truly unbiased report of the model's performance metrics, since the hyperparameters have never been tuned using the test set. A sketch of the procedure follows.
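
A minimal sketch of the three-way procedure with scikit-learn, assuming the candidate hyperparameter is the polynomial degree and using an illustrative 60/20/20 split (neither the ratio, the model, nor the synthetic data is prescribed by the slides):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=300)

# 60% train, 20% cross-validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Fit each candidate hyperparameter on the train set and
# pick the one with the lowest cross-validation error...
best_degree, best_cv_err, best_model = None, np.inf, None
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    cv_err = mean_squared_error(y_cv, model.predict(X_cv))
    if cv_err < best_cv_err:
        best_degree, best_cv_err, best_model = degree, cv_err, model

# ...and only then report generalization error on the untouched test set.
print(best_degree, mean_squared_error(y_test, best_model.predict(X_test)))

Only the degree chosen on the cross-validation set ever touches the test set, so the final reported error is not biased by the selection.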
Model Selection: Solution with the Train/Validation/Test Split
Errors in Machine Learning
There are mainly two types of errors in machine learning:
• Reducible errors: These errors can be reduced to improve the model accuracy. They can be further classified into bias and variance.
• Irreducible errors: These errors will always be present in the model. Their cause is unknown variables, and their value cannot be reduced.
Bias
• A machine learning model analyses the data, finds patterns in it, and makes predictions.
• While training, the model learns these patterns in the dataset and applies them to the test data when predicting.
• When making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
• A model has either:
Low bias: A low-bias model makes fewer assumptions about the form of the target function.
High bias: A high-bias model makes more assumptions and becomes unable to capture the important features of the dataset. A high-bias model also performs poorly on new data.
Variance
• Variance specifies the amount by which the prediction would vary if different training data were used.
• In simple words, variance tells how much a random variable differs from its expected value.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at capturing the hidden mapping between the input and output variables.
• A model has either:
Low variance: a small variation in the prediction of the target function with changes in the training dataset.
High variance: a large variation in the prediction of the target function with changes in the training dataset.
Different Combinations of Bias and Variance
There are four possible combinations of bias and variance, represented by the diagram below:

[Figure: the four bias-variance combinations]
1. Low Bias, Low Variance: The combination of low bias and low variance gives an ideal machine learning model. However, it is practically very hard to achieve.
2. Low Bias, High Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns a large number of parameters, and it leads to overfitting.
3. High Bias, Low Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when the model does not learn well from the training dataset or uses too few parameters. It leads to underfitting.
4. High Bias, High Variance: With high bias and high variance, predictions are both inconsistent and inaccurate on average.
Overfitting and Underfitting

Overfitting
• Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than necessary, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and these factors reduce the efficiency and accuracy of the model.
• An overfitted model has low bias and high variance.
• The chances of overfitting increase the more we train our model: the longer we train, the higher the chance of producing an overfitted model.
Underfitting
• Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.
• To avoid overfitting, the feeding of training data can be stopped at an early stage, but the model may then not learn enough from the training data.
• As a result, it may fail to find the best fit for the dominant trend in the data.
• In the case of underfitting, the model does not learn enough from the training data, which reduces accuracy and produces unreliable predictions.
• An underfitted model has high bias and low variance.
Diagnosing Bias vs. Variance
Comparing the training error with the cross-validation error tells us which problem we have: if both errors are high and close together, the model suffers from high bias (underfitting); if the training error is low but the cross-validation error is much higher, the model suffers from high variance (overfitting). A heuristic sketch of this diagnosis follows.
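
A rough sketch of the diagnosis as code; the numeric thresholds are illustrative assumptions, not fixed rules:

def diagnose(j_train, j_cv):
    """Heuristic bias/variance diagnosis from train and cross-validation errors."""
    gap = j_cv - j_train
    if gap > 0.5 * j_cv:       # most of the CV error comes from the train/CV gap
        return "high variance (overfitting): more data or stronger regularization may help"
    if j_train > 0.2:          # illustrative threshold for a 'high' training error
        return "high bias (underfitting): more features or weaker regularization may help"
    return "errors look balanced"

print(diagnose(j_train=0.05, j_cv=0.40))   # -> high variance ...
print(diagnose(j_train=0.45, j_cv=0.50))   # -> high bias ...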
Bias-Variance Trade-Off
• While building a machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting.
• If the model is very simple, with few parameters, it may have low variance and high bias.
• Whereas if the model has a large number of parameters, it will have high variance and low bias.
• So it is required to strike a balance between the bias and variance errors, and this balance is known as the bias-variance trade-off.
• For an accurate prediction, algorithms need low variance and low bias. But this is not fully achievable, because bias and variance are related to each other:
➢ If we decrease the variance, the bias will increase.
➢ If we decrease the bias, the variance will increase.
• So we need to find a sweet spot between bias and variance to make an optimal model.
• Hence, the bias-variance trade-off is about finding that sweet spot, balancing the bias error against the variance error.
Bias/Variance as a Function of the Regularization Parameter λ
A large λ penalizes the parameters heavily, giving a simple model that underfits (high bias); a small λ barely constrains the model, which can then overfit (high variance). Plotting the training and cross-validation errors against λ helps pick an intermediate value, as in the sketch below.
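
A minimal sketch of such a sweep, using Ridge regression as the regularized model (the λ grid, the degree-8 features, and the synthetic data are illustrative assumptions; in scikit-learn's Ridge the regularization parameter is named alpha, playing the role of λ here):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=60)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=1)

# Small lambda -> low bias / high variance; large lambda -> high bias / low variance.
for lam in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    model = make_pipeline(PolynomialFeatures(degree=8), Ridge(alpha=lam))
    model.fit(X_train, y_train)
    j_train = mean_squared_error(y_train, model.predict(X_train))
    j_cv = mean_squared_error(y_cv, model.predict(X_cv))
    print(f"lambda={lam:>7}: J_train={j_train:.3f}  J_cv={j_cv:.3f}")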
Learning Curves
A learning curve plots the training error and the cross-validation error as functions of the training set size m. For a high-bias model, both errors converge to a high value, so acquiring more data does not help; for a high-variance model, there is a large gap between the two errors, and more data is likely to help. A sketch of tabulating a learning curve follows.
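
A sketch using scikit-learn's learning_curve helper to tabulate both errors as m grows (the model and the synthetic data are illustrative assumptions):

import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=200)

# Train on growing subsets and score each size with 5-fold cross-validation.
sizes, train_scores, cv_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error")

# Scores are negated MSE, so flip the sign to get the errors back.
for m, tr, cv in zip(sizes, -train_scores.mean(axis=1), -cv_scores.mean(axis=1)):
    print(f"m={m:>3}: J_train={tr:.3f}  J_cv={cv:.3f}")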
Building a Spam Classifier: Prioritizing What to Work On
• Supervised learning formulation:
x = features of the email
y = spam (1) or not spam (0)
• Features x: choose 100 words indicative of spam / not spam.
E.g., indicative words in spam emails: Buy, Discount, Deal
Indicative words in non-spam emails: Andrew, Now
• Step 1: Arrange the 100 indicative words in alphabetical order, e.g.: Andrew, Buy, Deal, Discount, Now.
• Step 2: Check the email, map the words from the list described in Step 1, and define the feature vector X, where X_j = 1 if word j appears in the email and 0 otherwise. A sketch of this mapping follows.

For an email containing "Buy" and "Deal": X = [0 1 1 0 0 …]
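
A minimal sketch of Steps 1 and 2, using the slide's five example words in place of the full 100-word list:

import re

# Step 1: the indicative words, in alphabetical order
vocabulary = ["andrew", "buy", "deal", "discount", "now"]

def email_to_features(email_text):
    """Step 2: binary feature vector; x_j = 1 iff vocabulary word j occurs."""
    words = set(re.findall(r"[a-z]+", email_text.lower()))
    return [1 if w in words else 0 for w in vocabulary]

print(email_to_features("Buy now! Great deal on laptops"))  # -> [0, 1, 1, 0, 1]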
Error Analysis
The recommended approach is to start with a simple algorithm, implement it quickly, and test it on the cross-validation set; then manually examine the examples it misclassifies, look for systematic patterns among them, and use those patterns to decide which new features or fixes to prioritize.
Thank you! Any queries?