
Supervised machine learning

Linear Regression
Univariate Linear Regression
• Let’s say we want to predict the price of a
house that is 1750 square feet.
• We would need to first draw some line of best
fit.
• This is an example of a regression problem
where we are trying to predict a real-valued
output
• The line of best fit drawn represents a simple
linear regression model
Notation:
• n – number of training examples
• m – number of features in our dataset
• x – input variable/ feature
• y – output variable/target variable
• (xᵢ, yᵢ) – the i-th training example
• In supervised machine learning, we feed a
training set to a learning algorithm
• The learning algorithm has to give us a
function that maps the input to a predicted
output
Cost function
• Our function f can be represented as:
f(x) = ω₀ + ω₁ · x
• ω₀ and ω₁ are called weights
• When trying to come up with a model that
best fits our training data, we have to choose
the right values for ω₀ and ω₁
• Let’s look at how different choices of ω₀ and ω₁
would affect our model…
• We need to choose values of ω₀ and ω₁ so that
f(xᵢ) (the predicted value) is as close as possible to yᵢ for
our training examples
• So the goal is to minimize the squared prediction
errors, which leads us to the MSE (mean squared error) cost function:
J(ω₀, ω₁) = (1/2n) Σ (f(xᵢ) − yᵢ)², summing over the n training examples
(the extra factor of ½ just simplifies the derivatives later)
• Hence we need to minimize J(ω₀, ω₁) with respect to ω₀ and ω₁
• Let’s assume we have the following examples
from a training set:

• Using f(x) = ω₁ · x (with ω₀ = 0), let’s try to plot a few points
to determine the shape of our cost function:
ω₁    J(ω₁)
-1    44
0     3
1     0
2     11
3     44
• The plot of J(ω₁) shows that the minimum value is
obtained when ω₁ = 1.
• Hence the best model to fit our data would
be: f(x) = x (i.e. ω₀ = 0, ω₁ = 1)
Gradient descent algorithm
• Procedure:
– Start with some ω₀, ω₁, …
– Keep changing ω₀, ω₁, … to reduce J(ω₀, ω₁) until we hopefully
end at a minimum
• Weights are changed using the formula below:
ωⱼ := ωⱼ − α · ∂J/∂ωⱼ , for j = 0, 1, …
α – learning rate
NB: The weights must be simultaneously
updated!
• Let’s look at an example:
• Using f(x) = ω₀ + ω₁ · x, let’s try to find the ω₀ and ω₁ that will give us
the minimum value of our MSE loss function
for the training set below:

Speed (x)   Range (y)
55          316
60          292
65          268
70          246
75          227
80          207
• We have J(ω₀, ω₁) = (1/2n) Σ (f(xᵢ) − yᵢ)², but f(xᵢ) = ω₀ + ω₁ · xᵢ,
so J(ω₀, ω₁) = (1/2n) Σ (ω₀ + ω₁ · xᵢ − yᵢ)²
• Finding the partial derivatives of J with respect
to ω₀ and ω₁ respectively, we get:
– Partial derivative w.r.t ω₀: ∂J/∂ω₀ = (1/n) Σ (f(xᵢ) − yᵢ)
– Partial derivative w.r.t ω₁: ∂J/∂ω₁ = (1/n) Σ (f(xᵢ) − yᵢ) · xᵢ
• So to update our weights, we use:
ω₀ := ω₀ − α · (1/n) Σ (f(xᵢ) − yᵢ)
ω₁ := ω₁ − α · (1/n) Σ (f(xᵢ) − yᵢ) · xᵢ

Let’s try to implement these steps using Python in three steps:
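Below is one possible implementation, a minimal numpy sketch on the speed/range data above; the learning rate and epoch count are our own starting values, not prescriptions:

```python
import numpy as np

# Training data from the speed/range table above
x = np.array([55, 60, 65, 70, 75, 80], dtype=float)
y = np.array([316, 292, 268, 246, 227, 207], dtype=float)
n = len(x)

w0, w1 = 0.0, 0.0  # initial weights
lr = 0.0001        # learning rate (alpha) - our own starting value
epochs = 100_000

for epoch in range(epochs):
    # Step 1: compute predictions with the current weights
    f = w0 + w1 * x
    # Step 2: compute the partial derivatives of the cost function
    dw0 = (1 / n) * np.sum(f - y)
    dw1 = (1 / n) * np.sum((f - y) * x)
    # Step 3: simultaneously update both weights
    w0, w1 = w0 - lr * dw0, w1 - lr * dw1

cost = (1 / (2 * n)) * np.sum((w0 + w1 * x - y) ** 2)
print(f"w0={w0:.3f}, w1={w1:.3f}, cost={cost:.3f}")
```

With these settings convergence is quite slow, which is exactly what the task below asks you to improve.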
Task

Play around with the number of epochs and the learning rate so that the model can converge faster.
• It is worth remembering that if α is too small,
gradient descent can be slow, and if α is too
large, gradient descent can overshoot the
minimum; it may fail to converge or even
diverge.
• The derivative of the cost function gets
smaller as we move towards the local minimum,
so the gradient descent algorithm starts taking
smaller steps.
Might it be wiser to exit our loop when the change in J is
sufficiently small? Try it
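One possible convergence check, a sketch of the same loop with early stopping; the tolerance value is our own choice:

```python
import numpy as np

x = np.array([55, 60, 65, 70, 75, 80], dtype=float)
y = np.array([316, 292, 268, 246, 227, 207], dtype=float)
n = len(x)
w0, w1, lr = 0.0, 0.0, 0.0001
tol = 1e-9  # our own choice of what counts as "sufficiently small"

prev_cost = float("inf")
for epoch in range(1_000_000):
    f = w0 + w1 * x
    cost = (1 / (2 * n)) * np.sum((f - y) ** 2)
    if prev_cost - cost < tol:  # the change in J is sufficiently small
        print(f"Converged after {epoch} epochs")
        break
    prev_cost = cost
    dw0 = (1 / n) * np.sum(f - y)
    dw1 = (1 / n) * np.sum((f - y) * x)
    w0, w1 = w0 - lr * dw0, w1 - lr * dw1
```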
• Please do a review of the following topics
before going on to the next chapter:
– What are matrices and vectors?
– Addition, subtraction and multiplication with matrices
and vectors
– Matrix inverse and transpose
Only a basic understanding will be required


Linear regression with multiple
variables
• What if we had more than one variable or
feature affecting the value of our dependent
variable?
• In real life you’ll hardly come across datasets
with only one feature; consider the table
below:
• With multiple variables, our predicted value
will take the form:
f(x) = ω₀x₀ + ω₁x₁ + ω₂x₂ + … + ωₘxₘ
[for convenience of form we’ve added x₀ = 1]
• Our feature vector x = (x₀, x₁, …, xₘ) and our weight vector
ω = (ω₀, ω₁, …, ωₘ)
• So now in vector format: f(x) = ωᵀx
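As a quick illustration of the vector form, here is a numpy sketch; the weight and feature values are purely hypothetical:

```python
import numpy as np

# Hypothetical weights (w0..w3) and one example with 3 features; x0 = 1 is the bias term
w = np.array([5.0, 2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 4.0, 8.0])

f = w @ x  # vectorized prediction: f(x) = w^T x = w0*1 + w1*x1 + w2*x2 + w3*x3
print(f)   # 5 + 6 - 4 + 4 = 11.0
```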


• Our cost function now takes a bowl (convex) shape
Linear Regression using sklearn
What is scikit-learn
• Scikit-learn is an open-source Python machine
learning library built on top of SciPy
(scientific Python).
• It features various classification, regression
and clustering algorithms
• Some of the popular models provided by
sklearn are:
Cont…
• Supervised learning algorithms – Almost all the popular
supervised learning algorithms, like Linear Regression,
Support Vector Machine (SVM), Decision Tree etc., are
part of scikit-learn.
• Unsupervised learning algorithms – On the other
hand, it also has all the popular unsupervised learning
algorithms, from clustering, factor analysis and PCA
(Principal Component Analysis) to unsupervised neural
networks.
• Clustering – This model is used for grouping unlabeled
data.
• Cross validation – It is used to check the accuracy of
supervised models on unseen data.
• Dimensionality reduction – It is used for reducing the
number of attributes in data, which can be further used
for summarisation, visualisation and feature selection.
• Ensemble methods – As the name suggests, it is used for
combining the predictions of multiple supervised models.
• Feature extraction – It is used to extract features from
data to define the attributes in image and text data.
• Feature selection – It is used to identify useful attributes
to create supervised models.
sklearn.linear_model.LinearRegression

• Computes the regression line using the
ordinary least squares method
• The ordinary least squares method uses the
following formulae to calculate the weight and bias:
ω₁ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)² and ω₀ = ȳ − ω₁ · x̄
(where x̄ and ȳ are the means of x and y)
Try to compute the regression line for our
speed/range dataset by hand using the ordinary
least squares method.
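To check your hand computation, here is a quick numpy sketch of the same closed-form formulae:

```python
import numpy as np

x = np.array([55, 60, 65, 70, 75, 80], dtype=float)
y = np.array([316, 292, 268, 246, 227, 207], dtype=float)

# Closed-form ordinary least squares for one feature
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()
print(w0, w1)  # roughly 553.25 and -4.354
```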
• First we need to import all the necessary libraries:
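A typical set of imports for this example might look like the following; the exact list depends on which of the steps below you follow:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
```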

• numpy is a library for mathematical computations involving
n-dimensional arrays. It has many functions for linear algebra and
matrices
• matplotlib is a library for plotting and embedding plots in
applications
• pandas is a library for analyzing, cleaning, exploring and
manipulating data
• train_test_split is a function provided by sklearn for splitting data
arrays into two subsets, i.e. training data and testing data
• Next we need to read our data into a
dataframe and extract the relevant columns
and store them in numpy array objects.
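A sketch of this step; the file name speed_range.csv and the column names Speed and Range are our own assumptions, so adjust them to match your dataset:

```python
# Hypothetical file and column names - adjust to your dataset
df = pd.read_csv("speed_range.csv")
x = df["Speed"].values.reshape(-1, 1)  # sklearn expects a 2-D array of features
y = df["Range"].values
```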

• csv stands for comma-separated values, a
common file format used to store large
datasets
• We also need to visualize our data before we
start training
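One way to do this, continuing with the x and y arrays loaded above:

```python
# Scatter plot of the raw data (x, y as loaded above)
plt.scatter(x, y)
plt.xlabel("Speed")
plt.ylabel("Range")
plt.show()
```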

• Split our data into an 80% training set and a 20%
testing set
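A sketch using train_test_split; test_size=0.2 gives the 20% test split, and random_state is our own choice for reproducibility:

```python
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
```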
• Training our model
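This step might look like:

```python
model = LinearRegression()
model.fit(x_train, y_train)
print(model.intercept_, model.coef_)  # learned bias (w0) and weight (w1)
```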

• Making predictions
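And making predictions on the held-out test set:

```python
y_pred = model.predict(x_test)
print(y_pred)
```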

• Plotting our regression line
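Finally, one way to draw the regression line over the data:

```python
plt.scatter(x, y)                           # original data points
plt.plot(x, model.predict(x), color="red")  # fitted regression line
plt.xlabel("Speed")
plt.ylabel("Range")
plt.show()
```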


sklearn.linear_model.SGDRegressor
• Computes the regression line using the stochastic
gradient descent method
• Stochastic gradient descent is just like regular
gradient descent, except we use one random
sample per step instead of the whole dataset.
• Let’s look at some of the parameters of the
SGDRegressor constructor in the
documentation.
Repeat the previous example using the
SGDRegressor model
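A possible starting point, assuming the x_train/x_test arrays from the previous example; SGD is sensitive to feature scale, so we scale first, and the hyperparameter values shown are our own starting choices:

```python
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features, then fit with stochastic gradient descent
sgd_model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=10_000, random_state=0))
sgd_model.fit(x_train, y_train)
print(sgd_model.predict(x_test))
```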
