Presentation by:
Akanksha Shangloo
Asst. Professor
Dept. of CSE (Lecture 1)
School of Engineering and Technology
Supervised Learning
• Supervised learning is a type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, the machine predicts the output.
• Labelled data means the input data is already tagged with the correct output.
• Supervised learning is the process of providing both input data and the correct output data to the
machine learning model.
• The aim of a supervised learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).
• Supervised learning is classified into two categories of algorithms:
✓ Classification: A classification problem is when the output variable is a category, such as "red" or
"blue", or "disease" or "no disease".
✓ Regression: A regression problem is when the output variable is a real value, such as "dollars" or
"weight".
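The distinction between the two problem types can be sketched in a few lines of code. This is a minimal illustration, not a real learner: it uses made-up data and a toy 1-nearest-neighbour rule, and all names (`nearest`, `fruit`, `weights`) are invented for this example.

```python
# Toy sketch: the same nearest-neighbour idea used for classification
# (output is a category) and for regression (output is a real value).

def nearest(x, examples):
    # examples: list of (input, output) pairs; return the output
    # belonging to the closest input.
    return min(examples, key=lambda pair: abs(pair[0] - x))[1]

# Classification: outputs are discrete categories.
fruit = [(1.0, "banana"), (1.2, "banana"), (3.0, "apple"), (3.3, "apple")]
print(nearest(3.1, fruit))      # "apple"

# Regression: outputs are real numbers.
weights = [(1.0, 120.0), (2.0, 150.0), (3.0, 180.0)]
print(nearest(2.1, weights))    # 150.0
```

The only change between the two uses is the type of the output column; the mapping from x to y is learned the same way.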
How Do Supervised Learning Algorithms Work?
• In supervised learning, models are trained using a labelled dataset, from which the model learns about each type
of data.
• Once the training process is completed, the model is tested on test data (a held-out subset of the
data), and then it predicts the output.
• For instance, suppose you are given a basket filled with different kinds of fruits.
• Now the first step is to train the machine with all the different fruits one by one.
• If the shape of the object is rounded and has a depression at the top, is red in color, then it will be labeled as
–Apple.
• If the shape of the object is a long curving cylinder having Green-Yellow color, then it will be labeled as –
Banana.
• Now suppose that, after training, you give the machine a new fruit from the basket, say a banana, and
ask it to identify it.
• Since the machine has already learned from the previous data, it will first classify the fruit by its
shape and colour, confirm the fruit's name as BANANA, and put it in the banana category.
Errors in Machine Learning
• Reducible errors: These errors can be
reduced to improve the model
accuracy.
• Such errors can further be classified
into bias and Variance.
• Irreducible errors: These errors will
always be present in the model,
regardless of which algorithm has
been used.
• They are caused by unknown
variables whose influence on the
output cannot be measured or
reduced.
What is Bias?
• The bias error is the difference between the values predicted by the machine
learning model and the correct values.
• It can be described as the inability of a machine learning algorithm, such as linear regression, to
capture the true relationship between the data points.
• A model has either:
• Low Bias: A low bias model will make fewer assumptions about the form of the target function.
(e.g. Decision Trees, k-Nearest Neighbours and Support Vector Machines)
• High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset. A high bias model also cannot perform well on
new data. (e.g. Linear Regression, Linear Discriminant Analysis and Logistic Regression)
• With high bias, the predictions form a straight line that does not fit the data in the dataset
accurately (underfitting).
• High bias gives a large error on the training data as well as the testing data.
• It is recommended that an algorithm be low-biased to avoid the problem of
underfitting.
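The underfitting behaviour described above can be demonstrated with a short sketch: fitting a straight line (a high-bias model) to clearly non-linear data leaves a large error even on the training set itself. The data here is made up for illustration.

```python
# Sketch: a straight line fitted by ordinary least squares to quadratic
# data underfits, so even the training error stays large.

xs = [0, 1, 2, 3, 4, 5]
ys = [x * x for x in xs]        # the true relation is quadratic, not linear

# Ordinary least squares for y = b0 + b1*x
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
     sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

# Mean squared error on the training data itself
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / n
print(round(mse, 2))            # about 6.22: large even on the training data
```

A straight line simply has no parameter setting that can follow the curvature, which is exactly what "high bias gives a large error in training as well as testing data" means.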
What is Variance?
• The variability of model prediction for a given data point which tells us the spread
of our data is called the variance of the model.
• The variance would specify the amount of variation in the prediction if the
different training data was used.
• In simple words, variance tells how much a random variable differs from
its expected value.
• Variance errors are either of low variance or high variance.
• Low variance: It means there is a small variation in the prediction of the target
function with changes in the training data set.(e.g. Linear Regression, Logistic
Regression, and Linear discriminant analysis)
• High variance: It shows a large variation in the prediction of the target function
with changes in the training dataset.(e.g. decision tree, SVM, and KNN)
• With high variance, the model learns too much from the dataset, which leads to
overfitting. A model with high variance has the following problems:
• It leads to overfitting.
• It increases model complexity.
Reducing Bias/Variance
• Ways to reduce High Bias:
✓Increase the input features as the
model is underfitted.
✓Decrease the regularization term.
✓Use a more complex model, for
example one that includes
polynomial features.
• Ways to Reduce High Variance:
✓Reduce the input features or
number of parameters as a model
is overfitted.
✓Avoid using an overly complex model.
✓Increase the training data.
Bias-Variance Trade-Off
• If the model is very simple with fewer parameters, it may have low variance
and high bias.
• Whereas, if the model has a large number of parameters, it will have high
variance and low bias.
• A balance is required to be maintained between bias and variance errors,
and this balance between the bias error and variance error is known as the
Bias-Variance trade-off.
• Bias-Variance trade-off is a central issue in supervised learning. Ideally, we
need a model that accurately captures the regularities in training data and
simultaneously generalizes well with the unseen dataset.
• A high-variance algorithm may perform well on training data, but it may
overfit the noise in that data.
• Whereas a high-bias algorithm generates a much simpler model that may not
even capture important regularities in the data.
Bias-Variance Trade-Off
• For an accurate prediction of the model,
algorithms need a low variance and low
bias. But this is not possible because bias
and variance are related to each other:
• If we decrease the variance, it will increase
the bias.
• If we decrease the bias, it will increase the
variance.
• Therefore, we need to find a sweet spot
between bias and variance to build an
optimal model.
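One way to see the trade-off concretely is to refit two models on many different noisy training sets and measure how much each model's prediction at a fixed point varies. The sketch below uses illustrative data and two deliberately extreme models: a straight line (simple, high bias) and a 1-nearest-neighbour lookup (flexible, high variance). All names and numbers are made up for this demonstration.

```python
import random

random.seed(0)

def fit_line(xs, ys):
    # Ordinary least squares for y = b0 + b1*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

preds_simple, preds_flexible = [], []
for _ in range(200):                       # 200 different noisy training sets
    xs = [i / 10 for i in range(10)]
    ys = [x + random.gauss(0, 0.5) for x in xs]
    b0, b1 = fit_line(xs, ys)
    preds_simple.append(b0 + b1 * 0.45)    # the line's prediction at x = 0.45
    # The flexible model just memorises: nearest training point's y
    preds_flexible.append(min(zip(xs, ys), key=lambda p: abs(p[0] - 0.45))[1])

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# The flexible model's prediction swings much more across training sets.
print(variance(preds_simple) < variance(preds_flexible))   # True
```

The line averages out the noise (low variance, at the cost of some bias), while the nearest-neighbour lookup inherits the full noise of whichever training set it saw, which is the high-variance half of the trade-off.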
Assumption for Linear Regression Model
• Linear regression is a powerful tool for understanding and predicting the behaviour
of a variable; however, it needs to meet a few conditions in order to produce
accurate and dependable results.
1.Linearity: The independent and dependent variables have a linear relationship
with one another. This implies that changes in the dependent variable follow
those in the independent variable(s) in a linear fashion.
2.Independence: The observations in the dataset are independent of each other.
This means that the value of the dependent variable for one observation does
not depend on the value of the dependent variable for another observation.
3.No multicollinearity: There is little or no correlation between the independent
variables.
4.Normality: The errors in the model are normally distributed.
5.Homoscedasticity: Across all levels of the independent variable(s), the variance
of the errors is constant. This indicates that the amount of the independent
variable(s) has no impact on the variance of the errors.
Logistic Regression
• Logistic regression predicts the output of a categorical dependent
variable.
• Therefore the outcome must be a categorical or discrete value.
• The output of logistic regression must lie between 0 and 1; since it
cannot go beyond this limit, it forms an "S"-shaped curve.
• The S-shaped curve is called the sigmoid (logistic) function, and it is
used to map the predicted values to probabilities.
• Assumptions for Logistic Regression:
✓The dependent variable must be categorical in nature.
✓The independent variables should not exhibit multicollinearity.
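The sigmoid function mentioned above is simple enough to write out directly. The sketch below shows its defining property: any real-valued input is squashed into the interval (0, 1), with 0 mapping to probability 0.5.

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))     # 0.5: the decision boundary
print(sigmoid(6))     # close to 1
print(sigmoid(-6))    # close to 0
```

In logistic regression, z is the linear combination of the inputs and coefficients, so the model's raw score is converted into a probability by exactly this mapping.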
Overfitting and Underfitting
• Overfitting is a phenomenon that occurs when a machine learning
model fits the training set too closely and is not able to perform well
on unseen data. This happens when the model learns the noise in the
training data as well, i.e. when it memorises the training data instead
of learning the patterns in it.
• Underfitting, on the other hand, is the case when the model is not able
to learn even the basic patterns in the dataset. An underfitting model
performs poorly even on the training data, so we cannot expect it to
perform well on the validation data. In this case we should increase
the complexity of the model or add more features to the feature set.
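Both failure modes can be seen in a small experiment. The sketch below uses made-up noisy data and two deliberately extreme models: one that ignores the input entirely (underfitting) and one that memorises every training point (overfitting). The memorising model scores perfectly on the training set but worse on unseen data.

```python
import random

random.seed(1)

# Illustrative noisy data: y is roughly equal to x.
train = [(x / 10, x / 10 + random.gauss(0, 0.2)) for x in range(20)]
test = [(x / 10 + 0.05, x / 10 + 0.05 + random.gauss(0, 0.2))
        for x in range(20)]

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

mean_y = sum(y for _, y in train) / len(train)

def underfit(x):
    # Too simple: always predicts the average, ignoring x.
    return mean_y

def memorise(x):
    # Too flexible: returns the y of the nearest training point verbatim.
    return min(train, key=lambda p: abs(p[0] - x))[1]

print(mse(memorise, train))                        # 0.0: perfect on training data
print(mse(memorise, test) > 0)                     # True: errors appear on unseen data
print(mse(underfit, train) > mse(memorise, test))  # True: underfit model bad everywhere
```

A well-chosen model sits between these extremes: complex enough to capture the pattern, simple enough not to memorise the noise.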
Regularization
• Sometimes a model learns noise from the training data and is
therefore not able to predict the output correctly when it deals with
unseen data; such a model is called overfitted.
• This problem can be dealt with using a regularization technique.
• Regularization is a technique to prevent the model from overfitting by
adding extra information to it.
• It allows us to keep all the variables or features in the model while
reducing the magnitude of their coefficients, thereby maintaining
accuracy as well as the generalization of the model.
• In other words, in a regularization technique we reduce the
magnitude of the features while keeping the same number of features.
Lasso Regression
• It stands for Least Absolute Shrinkage and Selection Operator.
• It is a regression model that uses the L1 regularization technique.
• Lasso regression adds the "absolute value of magnitude" of the coefficients as a
penalty term to the loss function (L):

Cost = (1/n) * Σ_{i=1..n} (y_i − ŷ_i)^2 + λ * Σ_{j=1..m} |w_j|

Where,
• m – Number of features
• n – Number of examples
• y_i – Actual target value
• ŷ_i – Predicted target value
• w_j – Weight (coefficient) of the j-th feature
• λ – Regularization strength
• Lasso regression also helps us achieve feature selection by penalizing the weights
to approximately equal to zero if that feature does not serve any purpose in the
model.
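The penalised loss above can be written out directly. The sketch below is illustrative only: the data, the weights, and the helper name `lasso_loss` are made up, and the λ (here `lam`) value is arbitrary.

```python
# Sketch of the Lasso (L1-penalised) loss: mean squared error plus
# lam times the sum of absolute weights.

def lasso_loss(w, X, y, lam):
    n = len(y)
    preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
    return mse + lam * sum(abs(wj) for wj in w)

# Toy data with two features; the second feature's weight has been
# shrunk to exactly zero, which is the feature selection Lasso enables.
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]]
y = [2.0, 3.9, 6.1]
w = [2.0, 0.0]

print(lasso_loss(w, X, y, lam=0.0))   # plain MSE, no penalty
print(lasso_loss(w, X, y, lam=0.1))   # MSE plus 0.1 * (|2.0| + |0.0|)
```

Because the penalty grows with every nonzero weight, minimising this loss pushes useless features' weights to exactly zero, which is how Lasso performs feature selection.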
Ridge Regression
• Ridge regression is one of the types of linear regression in which a
small amount of bias is introduced so that we can get better long-
term predictions.
• Ridge regression is a regularization technique, which is used to reduce
the complexity of the model. It is also called L2 regularization.
• In this technique, the cost function is altered by adding a penalty
term to it.
• The amount of bias added to the model is called the ridge regression
penalty. It is calculated by multiplying lambda by the squared weight
of each individual feature:

Penalty = λ * Σ_{j=1..m} w_j^2
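For a single feature with no intercept, ridge regression has a one-line closed-form solution, which makes the shrinkage effect easy to see. This is an illustrative sketch with made-up data; the helper name `ridge_weight` is invented here.

```python
# Ridge regression with one feature (no intercept): the closed form is
# w = sum(x*y) / (sum(x^2) + lam), so a larger lam shrinks the weight.

def ridge_weight(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]

print(ridge_weight(xs, ys, lam=0.0))    # the ordinary least squares weight
print(ridge_weight(xs, ys, lam=10.0))   # smaller: bias added, variance reduced
```

The λ in the denominator is exactly the "small amount of bias" the slide mentions: it pulls the weight toward zero, trading a little training accuracy for more stable long-term predictions.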
Ordinary Least Squares
• The ordinary least squares (OLS) algorithm is a method for estimating
the parameters of a linear regression model.
• The OLS algorithm aims to find the values of the linear regression
model’s parameters (i.e., the coefficients) that minimize the sum of
the squared residuals.
• A linear regression model establishes the relation between a
dependent variable (y) and at least one independent variable (x) as:

y = b_0 + b_1 * x

• In the OLS method, we choose the values of b_1 and b_0 that
minimise the total sum of squares of the differences between the
calculated and observed values of y:

S = Σ_{i=1..n} (y_i − (b_0 + b_1 * x_i))^2

To get the values of b_0 and b_1 which minimise S, we take the
partial derivative of S with respect to each coefficient and equate it to zero.
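Setting those partial derivatives to zero yields the familiar closed-form estimates b_1 = cov(x, y) / var(x) and b_0 = mean(y) − b_1 * mean(x). The sketch below implements them and checks that, on exactly linear data, OLS recovers the true line.

```python
# Closed-form OLS estimates for simple linear regression y = b0 + b1*x,
# obtained by setting the partial derivatives of S to zero.

def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

# On noise-free data from y = 1 + 2x, OLS recovers the true coefficients.
b0, b1 = ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)   # 1.0 2.0
```

On noisy data the same formulas give the line with the smallest possible sum of squared residuals, which is exactly the S being minimised above.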
Normalization
• Normalization is a scaling technique in Machine Learning applied
during data preparation to change the values of numeric columns in
the dataset to use a common scale.
• It is not necessary for all datasets in a model.
• It is required only when the features of a machine learning model
have different ranges.
• Data normalization consists of rescaling numeric columns to a
standard scale, and it is generally considered part of preparing clean
data.
Normalization techniques in Machine Learning
• Min-max normalization: In this technique of data normalization, a
linear transformation is performed on the original data. The minimum
and maximum values of the data A are fetched, and each value v is
replaced according to the following formula:

v' = (v − min(A)) / (max(A) − min(A))
• Normalization by decimal scaling: This normalizes by moving the
decimal point of the data values. Each data value v_i is divided by
10^j, where j is the smallest integer such that every scaled value is
below 1 in absolute value:

v_i' = v_i / 10^j
• Z-score normalization (zero-mean normalization): In this technique,
values are normalized based on the mean (μ_A) and standard
deviation (σ_A) of the data A:

v' = (v − μ_A) / σ_A
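The three formulas above can be applied to a toy column of data in a few lines each; the values below are made up for illustration.

```python
# Sketch: the three normalization techniques applied to one toy column.

data = [10.0, 20.0, 30.0, 40.0, 50.0]

# Min-max: rescale to the range [0, 1].
lo, hi = min(data), max(data)
min_max = [(v - lo) / (hi - lo) for v in data]

# Decimal scaling: divide by 10^j, where j is the number of digits of
# the largest magnitude, so every value falls inside (-1, 1).
j = len(str(int(max(abs(v) for v in data))))
decimal_scaled = [v / 10 ** j for v in data]

# Z-score: subtract the mean and divide by the standard deviation.
mean = sum(data) / len(data)
std = (sum((v - mean) ** 2 for v in data) / len(data)) ** 0.5
z_scores = [(v - mean) / std for v in data]

print(min_max)          # [0.0, 0.25, 0.5, 0.75, 1.0]
print(decimal_scaled)   # [0.1, 0.2, 0.3, 0.4, 0.5]
print(sum(z_scores))    # 0.0: z-scores are centred on zero
```

Min-max and decimal scaling confine the values to a fixed range, while z-scores are centred on zero with unit spread but are not bounded, which is the distinction the comparison table below draws.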
Difference between Normalization and Standardization

| Normalization | Standardization |
| --- | --- |
| Uses the minimum and maximum values for scaling. | Uses the mean and standard deviation for scaling. |
| Helpful when features are on different scales. | Helpful when the mean of a variable should be set to 0 and the standard deviation to 1. |
| Scaled values range between [0, 1] or [-1, 1]. | Scaled values are not restricted to a specific range. |