
Linear Regression

What is Linear Regression?


• Linear regression is an algorithm that models a linear relationship
between an independent variable and a dependent variable in order to
predict the outcome of future events.
• It is a statistical method used in data science and machine learning for
predictive analysis.
Simple Linear Regression
• Linear regression shows the linear relationship between the
independent (predictor) variable, plotted on the X-axis, and the
dependent (output) variable, plotted on the Y-axis.
• If there is a single input (independent) variable X, such linear
regression is called simple linear regression.
Simple Linear Regression
• The graph presents the linear relationship between the output (y)
variable and the predictor (X) variable. The blue line is referred to as
the best fit straight line. Based on the given data points, we attempt
to plot the line that fits the points best.
• This algorithm describes the linear relationship between the
dependent (output) variable y and the independent (predictor) variable
X using a straight line Y = B0 + B1X.
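As a minimal sketch of this prediction equation (the coefficient values below are illustrative placeholders, not fitted from data):

```python
# Evaluate the regression line Y = B0 + B1 * X
# (B0 and B1 are illustrative placeholder values, not fitted coefficients)
B0 = 2.0   # intercept
B1 = 0.5   # slope

def predict(x):
    """Predicted Y for a given X on the line Y = B0 + B1 * X."""
    return B0 + B1 * x

print(predict(10))   # 2.0 + 0.5 * 10 = 7.0
```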
Simple Linear Regression
• The slope indicates the steepness of the line and the intercept
indicates where the line crosses the Y-axis. Together, the slope and
the intercept define the linear relationship between the two variables
and can be used to estimate an average rate of change. The greater
the magnitude of the slope, the steeper the line and the greater the
rate of change.
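• As a purely illustrative example: for the line Y = 2 + 3X, the intercept 2 is the value of Y where the line crosses the Y-axis (at X = 0), and the slope 3 means Y rises by 3 units for every one-unit increase in X; a line with slope 5 would be steeper and would change faster.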
Simple Linear Regression: Example

So how do we know which of these lines is the best fit line? That is the problem we
will solve next. For this, we will first look at the cost function.
Simple Linear Regression
But how does linear regression find the best fit line?
• The goal of the linear regression algorithm is to get the best values for
B0 and B1 to find the best fit line. The best fit line is a line that has the
least error which means the error between predicted values and actual
values should be minimum.
• In regression, the difference between the observed value of the
dependent variable (yi) and the predicted value (ypredicted) is called the
residual.
• εi = ypredicted – yi
• where ypredicted = B0 + B1Xi
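• As a purely illustrative example: with B0 = 2, B1 = 3 and Xi = 4, the prediction is ypredicted = 2 + 3 × 4 = 14; if the observed value is yi = 13, the residual is εi = 14 – 13 = 1.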
Cost Function for Linear Regression
• The cost function helps to work out the optimal values for B0 and B1,
which provide the best fit line for the data points.
• In linear regression, the Mean Squared Error (MSE) cost function is
generally used; it is the average of the squared errors between
ypredicted and yi.
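A minimal sketch of this cost in code, assuming the usual definition MSE = (1/n) Σ (ypredicted – yi)²; the data points and coefficients below are invented for illustration:

```python
# Minimal sketch of the MSE cost for a candidate line Y = B0 + B1 * X
# (data points and coefficients are illustrative placeholders)
def mse(B0, B1, xs, ys):
    """Mean squared error between predictions B0 + B1*x and observed y."""
    n = len(xs)
    return sum((B0 + B1 * x - y) ** 2 for x, y in zip(xs, ys)) / n

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # generated from y = 1 + 2x for illustration
print(mse(1.0, 2.0, xs, ys))   # 0.0 -- this line fits the points exactly
print(mse(0.0, 2.0, xs, ys))   # 1.0 -- a worse fit gives a higher cost
```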

Using the MSE function, we will update the values of B0 and B1 so that
the MSE settles at its minimum. These parameters can be determined
using the gradient descent method such that the value of the cost
function is minimized.
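A sketch of a single gradient-descent update for B0 and B1, assuming the MSE cost above; its partial derivatives are dMSE/dB0 = (2/n) Σ (ypredicted – yi) and dMSE/dB1 = (2/n) Σ (ypredicted – yi) · Xi, and the learning rate alpha below is an illustrative choice:

```python
# One gradient-descent update of B0 and B1 under the MSE cost
# (alpha and the example data are illustrative placeholders)
def gradient_step(B0, B1, xs, ys, alpha=0.01):
    n = len(xs)
    errors = [(B0 + B1 * x) - y for x, y in zip(xs, ys)]    # ypredicted - yi
    dB0 = (2 / n) * sum(errors)                             # dMSE/dB0
    dB1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dMSE/dB1
    return B0 - alpha * dB0, B1 - alpha * dB1               # step against the gradient

# Example: one step starting from B0 = 0, B1 = 0 on a tiny dataset
print(gradient_step(0.0, 0.0, [1, 2, 3], [3, 5, 7]))
```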
Gradient Descent
[Figure: the cost function plotted as a surface over the parameters B0 and B1]
Gradient Descent
Convex and Non-Convex cost function
Gradient Descent
Then there are two things that we need to determine:
1. Which direction to go (direction of update)
2. How big a step to take (amount of update)
Gradient Descent

• Positive derivative -> decrease the parameter
• Negative derivative -> increase the parameter
• High absolute derivative -> take a large step
• Low absolute derivative -> take a small step
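As a brief sketch of the rule behind these bullets, each parameter is moved against its derivative, θ := θ − α · (dJ/dθ), where α is the learning rate: a positive derivative pushes θ down, a negative one pushes it up, and the magnitude of the derivative (scaled by α) sets the step size.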
⮚ Gradient Descent Algorithm

Correct: a simultaneous update, where the updates for B0 and B1 are both computed from the current parameter values and only then applied together. Incorrect: updating B0 first and then using the already-updated B0 when computing the update for B1.
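A sketch contrasting the two orderings, reusing the MSE partial derivatives from earlier; the data, starting values, and learning rate are illustrative:

```python
# Illustrative partial derivatives of the MSE cost (data are placeholders)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
n = len(xs)

def dJ_dB0(B0, B1):
    return (2 / n) * sum((B0 + B1 * x) - y for x, y in zip(xs, ys))

def dJ_dB1(B0, B1):
    return (2 / n) * sum(((B0 + B1 * x) - y) * x for x, y in zip(xs, ys))

alpha, B0, B1 = 0.01, 0.0, 0.0

# Correct: simultaneous update -- both gradients are evaluated at the current B0, B1
temp0 = B0 - alpha * dJ_dB0(B0, B1)
temp1 = B1 - alpha * dJ_dB1(B0, B1)
B0, B1 = temp0, temp1

# Incorrect: B0 is overwritten first, so dJ_dB1 already sees the new B0
# B0 = B0 - alpha * dJ_dB0(B0, B1)
# B1 = B1 - alpha * dJ_dB1(B0, B1)
```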


Check Gradient Descent
Types of Gradient Descent
Batch Gradient Descent:
• Let there be ‘n’ observations in a dataset. Using all these ‘n’ observations to update the
coefficient values B0 and B1 is called batch gradient descent. It requires the entire
dataset to be available in memory to the algorithm.
Stochastic Gradient Descent (SGD):
• SGD, in contrast, updates the values of B0 and B1 for each observation in the dataset.
These frequent updates of the coefficients provide a good rate of improvement.
However, they are more computationally expensive than batch gradient descent.
Mini-Batch Gradient Descent:
• Mini-batch gradient descent is a combination of SGD and batch gradient descent. It
splits the dataset into batches, and the coefficients are updated at the end of each
batch.
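A sketch of all three variants under stated assumptions (toy data generated from y = 1 + 2x, an illustrative learning rate, and the MSE gradient step used earlier); the only difference between them is how many observations feed each coefficient update:

```python
import random

# Toy data generated from y = 1 + 2x (values are illustrative, no noise)
data = [(float(x), 1.0 + 2.0 * x) for x in range(10)]

def grad_step(B0, B1, batch, alpha=0.005):
    """One MSE gradient step computed on a batch of (x, y) pairs."""
    n = len(batch)
    errors = [(B0 + B1 * x) - y for x, y in batch]
    dB0 = (2 / n) * sum(errors)
    dB1 = (2 / n) * sum(e * x for e, (x, _) in zip(errors, batch))
    return B0 - alpha * dB0, B1 - alpha * dB1

def run(batch_size, epochs=2000):
    B0, B1 = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            B0, B1 = grad_step(B0, B1, data[i:i + batch_size])
    return round(B0, 2), round(B1, 2)

print(run(batch_size=len(data)))  # batch GD: one update per pass over the data
print(run(batch_size=1))          # SGD: one update per observation
print(run(batch_size=5))          # mini-batch: one update per chunk of 5
# all three should end up near the generating values B0 = 1, B1 = 2
```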
Gradient Descent
To summarize, the steps are:
1. Estimate θ (the parameters)
2. Compute the cost / loss function
3. Tweak θ
4. Repeat steps 2 and 3 until you reach convergence.
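A sketch of this loop with an explicit convergence check; the data, learning rate, and tolerance below are illustrative, and the cost and gradients follow the MSE formulas used earlier:

```python
# Gradient descent with an explicit convergence check (illustrative values)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]       # generated from y = 1 + 2x

def cost(B0, B1):
    return sum(((B0 + B1 * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

B0, B1 = 0.0, 0.0                     # step 1: initial estimate of the parameters
alpha, tol = 0.01, 1e-9
prev = cost(B0, B1)                   # step 2: compute the cost
while True:
    n = len(xs)
    errors = [(B0 + B1 * x) - y for x, y in zip(xs, ys)]
    dB0 = (2 / n) * sum(errors)
    dB1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    B0, B1 = B0 - alpha * dB0, B1 - alpha * dB1   # step 3: tweak the parameters
    curr = cost(B0, B1)
    if abs(prev - curr) < tol:        # step 4: stop once the cost stops improving
        break
    prev = curr
print(B0, B1)                         # should be close to 1 and 2
```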
Preparing Data For Linear Regression
• Linear Assumption: Linear regression assumes that the relationship between
your input and output is linear. It does not support anything else. This may be
obvious, but it is good to remember when you have a lot of attributes. You may
need to transform data to make the relationship linear (e.g. log transform for an
exponential relationship).
• Remove Noise: Linear regression assumes that your input and output variables
are not noisy. Consider using data cleaning operations that let you better
expose and clarify the signal in your data. This is most important for the output
variable and you want to remove outliers in the output variable (y) if possible.
• Remove Collinearity: Linear regression will over-fit your data when you have
highly correlated input variables. Consider calculating pairwise correlations for
your input data and removing the most correlated inputs.
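A minimal sketch of the log-transform and collinearity points above, assuming pandas and NumPy are available; the column names and values are invented for this illustration:

```python
import numpy as np
import pandas as pd

# Illustrative feature table (column names and values are made up)
df = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000, 2600],
    "size_sqm":  [74, 111, 139, 186, 242],     # nearly a rescaled copy of size_sqft
    "age_years": [30, 12, 8, 5, 2],
    "price":     [120_000, 210_000, 260_000, 380_000, 520_000],
})

# Linear assumption: a log transform can straighten an exponential relationship
df["log_price"] = np.log(df["price"])

# Remove collinearity: inspect pairwise correlations and drop near-duplicates
corr = df[["size_sqft", "size_sqm", "age_years"]].corr().abs()
print(corr)
df = df.drop(columns=["size_sqm"])   # almost perfectly correlated with size_sqft
```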
Preparing Data For Linear Regression
• Gaussian Distributions: Linear regression will make more reliable
predictions if your input and output variables have a Gaussian
distribution. You may get some benefit from using transforms (e.g. log or
Box-Cox) on your variables to make their distribution more Gaussian-
looking.
• Rescale Inputs: Linear regression will often make more reliable
predictions if you rescale input variables using standardization or
normalization.
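A minimal sketch of both rescaling options, assuming scikit-learn is available; the input values are invented for this illustration:

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative inputs on very different scales (values are made up)
X = [[1200.0, 3.0], [1500.0, 2.0], [2000.0, 4.0], [2600.0, 5.0]]

standardized = StandardScaler().fit_transform(X)  # each column: mean 0, std. dev. 1
normalized   = MinMaxScaler().fit_transform(X)    # each column rescaled to [0, 1]
print(standardized)
print(normalized)
```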
Key benefits of linear regression
• Easy implementation
The linear regression model is computationally simple to implement, as
it does not demand much engineering overhead either before the
model launch or during its maintenance.
• Interpretability
Unlike deep learning models such as neural networks, linear regression
is relatively straightforward to interpret. As a result, this algorithm stands
ahead of black-box models that fall short in justifying which input variable
causes the output variable to change.
Key benefits of linear regression
• Scalability
Linear regression is not computationally heavy and, therefore, fits well in cases where
scaling is essential. For example, the model scales well with increasing data volume
(big data).
• Optimal for online settings
The ease of computation of these algorithms allows them to be used in online settings.
The model can be trained and retrained with each new example to generate predictions in
real time, unlike neural networks or support vector machines, which are computationally
heavy and require plenty of computing resources and substantial waiting time to retrain
on a new dataset. All these factors make such compute-intensive models expensive and
unsuitable for real-time applications.
Multiple Linear Regression
• Multiple Linear Regression (MLR) means that we have several input features, such as
f1, f2, f3 and f4, and an output feature f5. If we take the same house price example as
discussed earlier, suppose:
• f1 is the size of the house,
• f2 is the number of bedrooms in the house,
• f3 is the locality of the house,
• f4 is the condition of the house, and
• f5 is our output feature, which is the price of the house.

• y = A + B1x1 + B2x2 + B3x3 + B4x4
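A minimal sketch of fitting such a model, assuming scikit-learn is available; the feature values, prices, and the extra house to predict are all invented for this illustration:

```python
from sklearn.linear_model import LinearRegression

# Illustrative data: [size, bedrooms, locality score, condition score] -> price
X = [
    [1200, 2, 7, 6],
    [1500, 3, 8, 7],
    [2000, 3, 6, 8],
    [2600, 4, 9, 9],
    [3000, 5, 8, 7],
]
y = [200_000, 260_000, 310_000, 450_000, 500_000]

model = LinearRegression().fit(X, y)
print(model.intercept_)                   # A in y = A + B1x1 + B2x2 + B3x3 + B4x4
print(model.coef_)                        # [B1, B2, B3, B4]
print(model.predict([[1800, 3, 7, 8]]))   # predicted price for a new house
```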
Multiple Linear Regression
• Jobs we lose due to ML:
https://www.youtube.com/watch?v=gWmRkYsLzB4&list=PLobzMSC-r
aKifQd9vHHPkMam_jrQEyzCX&index=7
• How AI can enhance our memory, work and social lives:
https://youtu.be/DJMhz7JlPvA
